Commit 8e59f1e

Added plyr and reshape comparisons
1 parent 2485e09 commit 8e59f1e

1 file changed

+183
-0
lines changed

doc/source/comparison_with_r.rst

Lines changed: 183 additions & 0 deletions
@@ -5,6 +5,8 @@
   :suppress:

   from pandas import *
   import numpy.random as random
   from numpy import *
   options.display.max_rows=15

Comparison with R / R libraries
@@ -98,12 +100,193 @@ xts
plyr
----

``plyr`` is an R library for the split-apply-combine strategy for data
analysis. The functions revolve around three data structures in R, ``a``
for ``arrays``, ``l`` for ``lists``, and ``d`` for ``data.frame``. The
table below shows how these data structures could be mapped in Python.

+------------+-------------------------------+
| R          | Python                        |
+============+===============================+
| array      | list                          |
+------------+-------------------------------+
| lists      | dictionary or list of objects |
+------------+-------------------------------+
| data.frame | dataframe                     |
+------------+-------------------------------+

|ddply|_
~~~~~~~~

An expression using a data.frame called ``df`` in R where you want to
summarize ``x`` by ``month`` and ``week``:

.. code-block:: r

   require(plyr)
   df <- data.frame(
     x = runif(120, 1, 168),
     y = runif(120, 7, 334),
     z = runif(120, 1.7, 20.7),
     month = rep(c(5,6,7,8),30),
     week = sample(1:4, 120, TRUE)
   )

   ddply(df, .(month, week), summarize,
         mean = round(mean(x), 2),
         sd = round(sd(x), 2))

In ``pandas`` the equivalent expression, using the
:meth:`~pandas.DataFrame.groupby` method, would be:

.. ipython:: python

   df = DataFrame({
       'x': random.uniform(1., 168., 120),
       'y': random.uniform(7., 334., 120),
       'z': random.uniform(1.7, 20.7, 120),
       'month': [5,6,7,8]*30,
       'week': random.randint(1, 5, 120)  # upper bound is exclusive, so weeks 1-4
   })

   grouped = df.groupby(['month','week'])
   print(grouped['x'].agg([mean, std]))

For more details and examples see :ref:`the groupby documentation
<groupby.aggregate>`.
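
The R call also rounds each statistic to two decimals. A minimal sketch of
the same step in pandas, assuming explicit ``np``/``pd`` imports instead of
the star imports used on this page; the fixed seed and the names ``rng`` and
``summary`` are illustrative only:

.. code-block:: python

   import numpy as np
   import pandas as pd

   rng = np.random.RandomState(0)  # fixed seed so the sketch is repeatable
   df = pd.DataFrame({
       'x': rng.uniform(1., 168., 120),
       'month': [5, 6, 7, 8] * 30,
       'week': rng.randint(1, 5, 120),  # upper bound is exclusive: weeks 1-4
   })

   # Split by (month, week), apply mean and std to x, then combine and round.
   summary = (df.groupby(['month', 'week'])['x']
                .agg(['mean', 'std'])
                .round(2)
                .reset_index())

Passing the aggregations as strings uses pandas' own ``mean`` and ``std``.
The latter defaults to the sample standard deviation (``ddof=1``), which is
also what R's ``sd`` computes.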

reshape / reshape2
------------------

|meltarray|_
~~~~~~~~~~~~~

An expression using a 3 dimensional array called ``a`` in R where you want to
melt it into a data.frame:

172+
.. code-block:: r
173+
174+
a <- array(c(1:23, NA), c(2,3,4))
175+
data.frame(melt(a))
176+
In Python, since ``a`` is a NumPy array, you can build the rows with a
list comprehension over ``ndenumerate``:

.. ipython:: python

   a = array(list(range(1, 24)) + [nan]).reshape(2,3,4)
   DataFrame([tuple(list(x)+[val]) for x, val in ndenumerate(a)])
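
The result of the comprehension has unnamed columns and 0-based indices, and
NumPy fills arrays row by row while R fills them column by column. The sketch
below is one way to get closer to the R output. The column names ``Var1``,
``Var2``, ``Var3`` and ``value`` are an assumption about what ``reshape2``
prints, and the row order still follows NumPy's iteration order:

.. code-block:: python

   import numpy as np
   import pandas as pd

   values = np.array(list(range(1, 24)) + [np.nan])
   a = values.reshape((2, 3, 4), order='F')  # column-major fill, as in R

   # One row per cell: the 0-based index tuple followed by the cell value.
   rows = [idx + (val,) for idx, val in np.ndenumerate(a)]
   melted = pd.DataFrame(rows, columns=['Var1', 'Var2', 'Var3', 'value'])

   # Shift the index columns to R's 1-based convention.
   index_cols = ['Var1', 'Var2', 'Var3']
   melted[index_cols] = melted[index_cols] + 1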

|meltlist|_
~~~~~~~~~~~~

An expression using a list called ``a`` in R where you want to melt it
into a data.frame:

.. code-block:: r

   a <- as.list(c(1:4, NA))
   data.frame(melt(a))

In Python, this list would be a list of tuples, so the
:class:`~pandas.DataFrame` constructor would convert it to a dataframe as required.

.. ipython:: python

   a = list(enumerate(list(range(1, 5)) + [nan]))
   DataFrame(a)

For more details and examples see :ref:`the Intro to Data Structures
documentation <basics.dataframe.from_items>`.
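
A possible variant names the columns explicitly and numbers the elements from
1, as R does. The column names ``value`` and ``L1`` are an assumption about
the R output, chosen here only for illustration:

.. code-block:: python

   import numpy as np
   import pandas as pd

   a = list(range(1, 5)) + [np.nan]

   # One row per list element: its value and its 1-based position.
   melted = pd.DataFrame({'value': a, 'L1': range(1, len(a) + 1)})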

|meltdf|_
~~~~~~~~~~~~~~~~

An expression using a data.frame called ``cheese`` in R where you want to
reshape the data.frame:

.. code-block:: r

   cheese <- data.frame(
     first = c('John', 'Mary'),
     last = c('Doe', 'Bo'),
     height = c(5.5, 6.0),
     weight = c(130, 150)
   )
   melt(cheese, id=c("first", "last"))

In Python, the :func:`~pandas.melt` function is the equivalent of R's ``melt``:

.. ipython:: python

   cheese = DataFrame({'first' : ['John', 'Mary'],
                       'last' : ['Doe', 'Bo'],
                       'height' : [5.5, 6.0],
                       'weight' : [130, 150]})
   melt(cheese, id_vars=['first', 'last'])
   cheese.set_index(['first', 'last']).stack() # alternative way

For more details and examples see :ref:`the reshaping documentation
<reshaping.melt>`.
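
The two approaches return different shapes: ``melt`` gives a flat frame, while
``stack`` gives a Series with a MultiIndex. As a hedged sketch, the stacked
result can be flattened into the same long layout. The labels ``measure`` and
``reading`` and the variable names are illustrative, not part of the example
above:

.. code-block:: python

   import pandas as pd

   cheese = pd.DataFrame({'first': ['John', 'Mary'],
                          'last': ['Doe', 'Bo'],
                          'height': [5.5, 6.0],
                          'weight': [130, 150]})

   # melt: flat frame with custom names for the variable and value columns.
   long_df = pd.melt(cheese, id_vars=['first', 'last'],
                     var_name='measure', value_name='reading')

   # stack: MultiIndexed Series, renamed and flattened to the same layout.
   stacked = (cheese.set_index(['first', 'last'])
                    .stack()
                    .rename_axis(['first', 'last', 'measure'])
                    .rename('reading')
                    .reset_index())

The rows come out in a different order (``melt`` groups by variable, ``stack``
by original row), so sort both before comparing them.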

|cast|_
~~~~~~~

An expression using a data.frame called ``df`` in R to cast into a higher
dimensional array:

.. code-block:: r

   df <- data.frame(
     x = runif(12, 1, 168),
     y = runif(12, 7, 334),
     z = runif(12, 1.7, 20.7),
     month = rep(c(5,6,7),4),
     week = rep(c(1,2), 6)
   )

   mdf <- melt(df, id=c("month", "week"))
   acast(mdf, week ~ month ~ variable, mean)

In Python the best way is to make use of :func:`~pandas.pivot_table`:

.. ipython:: python

   df = DataFrame({
       'x': random.uniform(1., 168., 12),
       'y': random.uniform(7., 334., 12),
       'z': random.uniform(1.7, 20.7, 12),
       'month': [5,6,7]*4,
       'week': [1,2]*6
   })
   mdf = melt(df, id_vars=['month', 'week'])
   pivot_table(mdf, values='value', index=['variable','week'],
               columns=['month'], aggfunc=mean)

For more details and examples see :ref:`the reshaping documentation
<reshaping.pivot>`.
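
``pivot_table`` returns a two-dimensional frame, whereas ``acast`` returns a
three-dimensional array. One way to bridge that gap, sketched under the
assumption that every (variable, week, month) combination is present, is to
reshape the pivoted values with NumPy. The seed and the names ``table`` and
``cube`` are illustrative:

.. code-block:: python

   import numpy as np
   import pandas as pd

   rng = np.random.RandomState(0)  # fixed seed so the sketch is repeatable
   df = pd.DataFrame({
       'x': rng.uniform(1., 168., 12),
       'y': rng.uniform(7., 334., 12),
       'z': rng.uniform(1.7, 20.7, 12),
       'month': [5, 6, 7] * 4,
       'week': [1, 2] * 6,
   })
   mdf = pd.melt(df, id_vars=['month', 'week'])

   # Rows are sorted (variable, week) pairs; columns are the months.
   table = pd.pivot_table(mdf, values='value', index=['variable', 'week'],
                          columns=['month'], aggfunc='mean')

   n_var = table.index.get_level_values('variable').nunique()
   n_week = table.index.get_level_values('week').nunique()

   # Valid only because the table is complete: variable x week x month.
   cube = table.to_numpy().reshape(n_var, n_week, table.shape[1])
   # Reorder the axes to week x month x variable, as in the acast formula.
   cube = cube.transpose(1, 2, 0)

With this layout, ``cube[0, 0, 0]`` holds the mean of ``x`` for week 1 and
month 5.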

.. |with| replace:: ``with``
.. _with: http://finzi.psych.upenn.edu/R/library/base/html/with.html

.. |subset| replace:: ``subset``
.. _subset: http://finzi.psych.upenn.edu/R/library/base/html/subset.html


.. |ddply| replace:: ``ddply``
.. _ddply: http://www.inside-r.org/packages/cran/plyr/docs/ddply

.. |meltarray| replace:: ``melt.array``
.. _meltarray: http://www.inside-r.org/packages/cran/reshape2/docs/melt.array

.. |meltlist| replace:: ``melt.list``
.. _meltlist: http://www.inside-r.org/packages/cran/reshape2/docs/melt.list

.. |meltdf| replace:: ``melt.data.frame``
.. _meltdf: http://www.inside-r.org/packages/cran/reshape2/docs/melt.data.frame

.. |cast| replace:: ``cast``
.. _cast: http://www.inside-r.org/packages/cran/reshape2/docs/cast
