Sorting a data.frame in R
I frequently find myself having to re-order rows of a data.frame based on the levels of an ordered factor in R.
For example, I want to take this data.frame:
product store sales 1 a s1 12 2 b s1 24 3 a s2 32 4 c s2 12 5 a s3 9 6 b s3 2 7 c s3 29
And sort it so that the sales data from the stores with the most sales occur first:
product store sales 3 a s2 32 4 c s2 12 5 a s3 9 6 b s3 2 7 c s3 29 1 a s1 12 2 b s1 24
I keep forgetting the exact semantics of how its done and Google never offers any assistance on the topic, so here is a quick post to get it down once and for all, both for my own benefit and the greater good.
First we need some data:
productSalesByStore = data.frame(
product = c('a', 'b', 'a', 'c', 'a', 'b', 'c'),
store = c('s1', 's1', 's2', 's2', 's3', 's3', 's3'),
sales = c(12, 24, 32, 12, 9, 2, 29)
)
Now construct a sorted summary of sales by store:
storeSalesSummary =
aggregate(
productSalesByStore$sales,
list(store = productSalesByStore$store),
sum)
storeSalesSummary =
storeSalesSummary[
order(storeSalesSummary$x, decreasing=TRUE),
]
storeSalesSummary should look like this:
store x 2 s2 44 3 s3 40 1 s1 36
Use that summary data to construct an ordered factor of store names:
storesBySales =
ordered(
storeSalesSummary$store,
levels=storeSalesSummary$store
)
storesBySales is now an ordered factor that looks like this:
[1] s2 s3 s1 Levels: s2 < s3 < s1
Re-construct productSalesByStore$store so that it is an ordered factor with the same levels as storesBySales
productSalesByStore$store =
ordered(productSalesByStore$store, levels=storesBySales)
Note that neither the contents nor the order of productSalesByStore has changed (yet). Just the datatype of the store column.
Finally, we use the implicit ordering of store to generate an explicit permutation of productSalesByStore so that we can sort the rows in a stable manner:
productSalesByStore =
productSalesByStore[ order(productSalesByStore$store), ]
And we are done!
ap.
Comments
I have found this useful.
http://www.ens.gu.edu.au/ROBERTK/R/HELP/00B/0128.HTML
Posted by: Atte Tenkanen | July 16, 2007 04:42 AM
A possible shorter way is
do.call('rbind', by(productSalesByStore, productSalesByStore$store, function(x) x[sort.list(x$sales, dec=TRUE),]))
Posted by: Rajarshi Guha | July 16, 2007 04:59 AM