python - How to resample a pandas DataFrame of mixed type?

- August 15, 2010

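I generate a mixed-type (floats and strings) pandas DataFrame, df3, with the following Python code:

    import numpy as np
    import pandas as pd

    # `dates` is a DatetimeIndex defined elsewhere (not shown in this post);
    # the output below implies quarter-end dates starting 2014-03-31.
    df1 = pd.DataFrame(np.random.randn(dates.shape[0], 2), index=dates, columns=list('ab'))
    df1['c'] = 'a'
    df1['d'] = 'pickles'
    df2 = pd.DataFrame(np.random.randn(dates.shape[0], 2), index=dates, columns=list('ab'))
    df2['c'] = 'b'
    df2['d'] = 'ham'
    df3 = pd.concat([df1, df2], axis=0)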

When I resample df3 to a higher frequency, I don't get the frame resampled at the higher rate. Instead, the how argument seems to be ignored and I just get missing values:

    df4 = df3.groupby(['c']).resample('M', how={'a': 'mean', 'b': 'mean', 'd': 'ffill'})
    df4.head()

Result:

                          a          b        d
    c
    a 2014-03-31 -0.4640906 -0.2435414  pickles
      2014-04-30        NaN        NaN      NaN
      2014-05-31        NaN        NaN      NaN
      2014-06-30 -0.5626360  0.6679614  pickles
      2014-07-31        NaN        NaN      NaN
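For context, NaN rows like these are what an aggregation produces for bins that contain no observations: quarterly data has nothing to average in April or May. The sketch below illustrates this on a small hypothetical series, using the method-chaining API of recent pandas releases (where `how` is no longer accepted by `resample`). The quarter-end `dates` and the sample values are assumptions for illustration, not the data from this post.

```python
import pandas as pd

# Assumed quarter-end dates; the post does not show how `dates` was built.
dates = pd.date_range("2014-03-31", periods=4,
                      freq=pd.offsets.QuarterEnd(startingMonth=3))
s = pd.Series([1.0, 2.0, 3.0, 4.0], index=dates)

monthly = pd.offsets.MonthEnd()

# Aggregating: each monthly bin is averaged, and empty bins come back as NaN.
aggregated = s.resample(monthly).mean()

# Filling: each new monthly row takes the last observed value instead.
filled = s.resample(monthly).ffill()

print(aggregated.head())
print(filled.head())
```

Aggregation and filling are different operations when upsampling, which is consistent with the mean leaving gaps in the output above.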

When I resample df3 to a lower frequency, I don't get any resampling at all:

    df5 = df3.groupby(['c']).resample('A', how={'a': np.mean, 'b': np.mean, 'd': 'ffill'})
    df5.head()

Result:

                          a          b        d
    c
    a 2014-03-31        NaN        NaN  pickles
      2014-06-30        NaN        NaN  pickles
      2014-09-30        NaN        NaN  pickles
      2014-12-31 -0.7429617 -0.1065645  pickles
      2015-03-31        NaN        NaN  pickles

I'm pretty sure this has something to do with the mixed types, because if I redo the annual down-sampling with just the numerical columns, it works as expected:

    df5b = df3[['a', 'b', 'c']].groupby(['c']).resample('A', how={'a': np.mean, 'b': np.mean})
    df5b.head()

Result:

                          a          b
    c
    a 2014-12-31 -0.7429617 -0.1065645
      2015-12-31 -0.6245030 -0.3101057
    b 2014-12-31  0.4213621 -0.0708263
      2015-12-31 -0.0607028  0.0110456

But when I switch to just the numerical types, resampling to a higher frequency still doesn't work as expected:

    df4b = df3[['a', 'b', 'c']].groupby(['c']).resample('M', how={'a': 'mean', 'b': 'mean'})
    df4b.head()

Result:

                          a          b
    c
    a 2014-03-31 -0.4640906 -0.2435414
      2014-04-30        NaN        NaN
      2014-05-31        NaN        NaN
      2014-06-30 -0.5626360  0.6679614
      2014-07-31        NaN        NaN

Which leaves me with two questions:

1. What is the proper way to resample a DataFrame of mixed type?
2. When resampling from a lower frequency to a higher frequency, what is the proper way to resample so that the new values are interpolated?

Even if you can't provide a full answer to both parts, a partial solution or an answer to either question would be appreciated.

When resampling from a lower frequency to a higher frequency, I realized I was specifying how when I wanted to specify fill_method. When I do that, things seem to work:

    df4c = df3.groupby(['c']).resample('M', fill_method='ffill')
    df4c.head()

                          b          a        d
    c
    a 2014-03-31 -0.2435414 -0.4640906  pickles
      2014-04-30 -0.2435414 -0.4640906  pickles
      2014-05-31 -0.2435414 -0.4640906  pickles
      2014-06-30  0.6679614 -0.5626360  pickles
      2014-07-31  0.6679614 -0.5626360  pickles
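Recent pandas releases no longer accept `fill_method` (or `how`) in `resample`; the fill is chained as a method instead. The following is a minimal sketch of a per-group forward fill under that newer API. The quarter-end `dates`, the seeded random generator, and the `make_frame` helper are assumptions introduced for illustration; they are not part of the original post.

```python
import numpy as np
import pandas as pd

# Assumed quarter-end dates; the post does not show how `dates` was built.
dates = pd.date_range("2014-03-31", periods=8,
                      freq=pd.offsets.QuarterEnd(startingMonth=3))
rng = np.random.default_rng(0)

def make_frame(label, food):
    """Hypothetical helper: two float columns plus two string columns."""
    frame = pd.DataFrame(rng.standard_normal((len(dates), 2)),
                         index=dates, columns=list("ab"))
    frame["c"] = label
    frame["d"] = food
    return frame

df3 = pd.concat([make_frame("a", "pickles"), make_frame("b", "ham")], axis=0)

monthly = pd.offsets.MonthEnd()

# Upsample each group separately and forward-fill every column,
# floats and strings alike, then stack the groups back together.
df4c = pd.concat(
    {key: group.drop(columns="c").resample(monthly).ffill()
     for key, group in df3.groupby("c")}
).rename_axis(["c", "date"])

print(df4c.head())
```

Looping over the groups explicitly keeps the grouping column out of the resampled frame and sidesteps version-specific behaviour of `groupby(...).resample(...)`.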

You have a more limited set of interpolation choices, but it does handle the mixed types.
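If true interpolation is wanted for the numeric columns, one option is to treat the two kinds of columns separately: interpolate the floats and forward-fill the strings. This is only a sketch under assumptions: `upsample_mixed` is a hypothetical helper name, the sample values are made up so the result is easy to check by eye, and linear interpolation treats the new rows as equally spaced.

```python
import pandas as pd

# Assumed quarter-end dates and made-up values for illustration.
dates = pd.date_range("2014-03-31", periods=3,
                      freq=pd.offsets.QuarterEnd(startingMonth=3))
df = pd.DataFrame({"a": [0.0, 3.0, 6.0], "b": [6.0, 3.0, 0.0], "d": "pickles"},
                  index=dates)

def upsample_mixed(frame, rule):
    """Hypothetical helper: interpolate numeric columns, forward-fill the rest."""
    numeric_cols = frame.select_dtypes(include="number").columns
    other_cols = frame.columns.difference(numeric_cols)
    # Numeric columns: insert the new rows as NaN, then interpolate linearly.
    numeric = frame[numeric_cols].resample(rule).asfreq().interpolate(method="linear")
    # Non-numeric columns: carry the last observed value forward.
    other = frame[other_cols].resample(rule).ffill()
    # Restore the original column order.
    return pd.concat([numeric, other], axis=1)[frame.columns]

monthly = upsample_mixed(df, pd.offsets.MonthEnd())
print(monthly)
```

For a grouped frame like df3, the same helper could be applied to each group in turn and the pieces concatenated.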

When resampling to a lower frequency, using no how option (I believe it defaults to mean), the down-sampling does work:

    df5c = df3.groupby(['c']).resample('A')
    df5c.head()

                          b          a
    c
    a 2014-12-31 -0.1065645 -0.7429617
      2015-12-31 -0.3101057 -0.6245030
    b 2014-12-31 -0.0708263  0.4213621
      2015-12-31  0.0110456 -0.0607028

Therefore, the problem seems to be with passing in a dictionary of how options, or with one of the option choices, presumably ffill, but I'm not sure.
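One plausible reading is that ffill is a fill rather than a reduction, so it has no single value to return per bin when down-sampling. A sketch of per-column down-sampling in recent pandas is shown below, with 'last' standing in for 'ffill' on the string column; that substitution is an assumption about the intent. The quarter-end `dates`, the seeded generator, and the `make_frame` helper are also assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Assumed quarter-end dates; the post does not show how `dates` was built.
dates = pd.date_range("2014-03-31", periods=8,
                      freq=pd.offsets.QuarterEnd(startingMonth=3))
rng = np.random.default_rng(0)

def make_frame(label, food):
    """Hypothetical helper: two float columns plus two string columns."""
    frame = pd.DataFrame(rng.standard_normal((len(dates), 2)),
                         index=dates, columns=list("ab"))
    frame["c"] = label
    frame["d"] = food
    return frame

df3 = pd.concat([make_frame("a", "pickles"), make_frame("b", "ham")], axis=0)

# Group by the label column and by calendar year of the index, then reduce
# each column with its own aggregation: mean for floats, last for strings.
annual = df3.groupby(["c", pd.Grouper(freq=pd.offsets.YearEnd())]).agg(
    {"a": "mean", "b": "mean", "d": "last"}
)

print(annual)
```

Each entry in the dictionary is a genuine reduction here, so every column yields exactly one value per group and year.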
