How to resample a Pandas DataFrame of mixed type?


I generate a mixed-type (floats and strings) pandas DataFrame, df3, with the following Python code:

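    import numpy as np
    import pandas as pd

    # 'dates' is not defined in the question; the outputs below suggest a
    # quarter-end DatetimeIndex.
    df1 = pd.DataFrame(np.random.randn(dates.shape[0], 2), index=dates, columns=list('ab'))
    df1['c'] = 'a'
    df1['d'] = 'pickles'
    df2 = pd.DataFrame(np.random.randn(dates.shape[0], 2), index=dates, columns=list('ab'))
    df2['c'] = 'b'
    df2['d'] = 'ham'
    df3 = pd.concat([df1, df2], axis=0)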

When I resample df3 to a higher frequency, I don't get the frame resampled at the higher rate; the how argument is ignored and I get missing values:

    df4 = df3.groupby(['c']).resample('M', how={'a': 'mean', 'b': 'mean', 'd': 'ffill'})
    df4.head()

Result:

                          b          a        d
    c
    a 2014-03-31 -0.4640906 -0.2435414  pickles
      2014-04-30        NaN        NaN      NaN
      2014-05-31        NaN        NaN      NaN
      2014-06-30 -0.5626360  0.6679614  pickles
      2014-07-31        NaN        NaN      NaN
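
A minimal sketch of why the gaps appear, using a hypothetical single-group Series s with quarter-end dates (the question does not show how dates was built, so this is an assumption). When up-sampling, an aggregation such as mean only has data in the bins that contain an original observation, so every newly created month comes out as NaN. The sketch uses method chaining, which replaced the how= keyword in pandas 0.18, and DateOffset objects rather than frequency strings, because the month-end and year-end string aliases have changed across pandas versions.

```python
import pandas as pd

# Hypothetical stand-in for one group of df3: quarter-end observations.
dates = pd.date_range('2014-03-31', periods=3, freq=pd.offsets.QuarterEnd())
s = pd.Series([1.0, 4.0, 7.0], index=dates)

# Up-sampling with an aggregation: only the months that contain an
# original observation have anything to average, so the rest are NaN.
upsampled_mean = s.resample(pd.offsets.MonthEnd()).mean()

# asfreq() shows the same thing without any aggregation: it just
# places the existing values on the finer monthly grid.
upsampled_asfreq = s.resample(pd.offsets.MonthEnd()).asfreq()

print(upsampled_mean)
```

In other words, mean is the right tool for down-sampling; for up-sampling you need a fill or interpolation step instead.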

When I resample df3 to a lower frequency, I don't get any resampling at all:

    df5 = df3.groupby(['c']).resample('A', how={'a': np.mean, 'b': np.mean, 'd': 'ffill'})
    df5.head()

Result:

                          b          a        d
    c
    a 2014-03-31        NaN        NaN  pickles
      2014-06-30        NaN        NaN  pickles
      2014-09-30        NaN        NaN  pickles
      2014-12-31 -0.7429617 -0.1065645  pickles
      2015-03-31        NaN        NaN  pickles
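
For question 1, one way to down-sample a mixed-type frame is to group on the key column plus a pd.Grouper time bin and give every column its own aggregation. This is a sketch against a hypothetical df3 with deterministic values in place of the random ones. Since 'ffill' fills gaps rather than reducing a bin to one value, the sketch swaps in 'last' for the string column; that choice is an assumption about what you want kept.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df3: quarter-end dates, two float columns,
# the group key 'c' and a string column 'd'.
dates = pd.date_range('2014-03-31', periods=8, freq=pd.offsets.QuarterEnd())
df3 = pd.concat([
    pd.DataFrame({'a': np.arange(8.0), 'b': np.arange(8.0) * 10,
                  'c': 'a', 'd': 'pickles'}, index=dates),
    pd.DataFrame({'a': -np.arange(8.0), 'b': -np.arange(8.0) * 10,
                  'c': 'b', 'd': 'ham'}, index=dates),
])

# Bin by (group key, calendar year) and aggregate each column separately:
# floats are averaged, the string column keeps the last value in the year.
annual = df3.groupby(['c', pd.Grouper(freq=pd.offsets.YearEnd())]).agg(
    {'a': 'mean', 'b': 'mean', 'd': 'last'}
)
print(annual)
```

The result is indexed by (c, year end), like the numeric-only output further down, but keeps the string column.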

I'm pretty sure this has to do with the mixed types, because if I redo the annual down-sampling with only the numerical columns, it works as expected:

    df5b = df3[['a', 'b', 'c']].groupby(['c']).resample('A', how={'a': np.mean, 'b': np.mean})
    df5b.head()

Result:

                          b          a
    c
    a 2014-12-31 -0.7429617 -0.1065645
      2015-12-31 -0.6245030 -0.3101057
    b 2014-12-31  0.4213621 -0.0708263
      2015-12-31 -0.0607028  0.0110456
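
The numeric-only workaround can be sketched without hard-coding the column names: select_dtypes picks out the float columns, so the string columns never reach the aggregation. The data is again a hypothetical stand-in with the question's layout; the column list is computed here, not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df3 (same layout as in the question).
dates = pd.date_range('2014-03-31', periods=8, freq=pd.offsets.QuarterEnd())
df3 = pd.concat([
    pd.DataFrame({'a': np.arange(8.0), 'b': np.arange(8.0) * 10,
                  'c': 'a', 'd': 'pickles'}, index=dates),
    pd.DataFrame({'a': -np.arange(8.0), 'b': -np.arange(8.0) * 10,
                  'c': 'b', 'd': 'ham'}, index=dates),
])

# Only the numeric columns ('a' and 'b' here); 'c' and 'd' are excluded.
numeric_cols = df3.select_dtypes(include='number').columns.tolist()

# Annual mean per group over the numeric columns only.
annual_numeric = df3.groupby(
    ['c', pd.Grouper(freq=pd.offsets.YearEnd())]
)[numeric_cols].mean()
print(annual_numeric)
```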

But when I switch to only the numerical columns, resampling to a higher frequency still doesn't work as expected:

    df4b = df3[['a', 'b', 'c']].groupby(['c']).resample('M', how={'a': 'mean', 'b': 'mean'})
    df4b.head()

Result:

                          b          a
    c
    a 2014-03-31 -0.4640906 -0.2435414
      2014-04-30        NaN        NaN
      2014-05-31        NaN        NaN
      2014-06-30 -0.5626360  0.6679614
      2014-07-31        NaN        NaN
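
For the numeric-only up-sampling case (question 2), here is a sketch that resamples each group separately and lets Resampler.interpolate fill the new months. The data is hypothetical. The explicit loop over df3.groupby('c') is used instead of df3.groupby('c').resample(...) because the way that call treats the grouping column has shifted between pandas versions. interpolate() defaults to linear interpolation on position; method='time' would weight by the actual day spacing instead.

```python
import pandas as pd

# Hypothetical numeric-only stand-in for df3[['a', 'b', 'c']].
dates = pd.date_range('2014-03-31', periods=3, freq=pd.offsets.QuarterEnd())
df3 = pd.concat([
    pd.DataFrame({'a': [0.0, 3.0, 6.0], 'b': [0.0, 30.0, 60.0], 'c': 'a'}, index=dates),
    pd.DataFrame({'a': [0.0, -3.0, -6.0], 'b': [0.0, -30.0, -60.0], 'c': 'b'}, index=dates),
])

# Up-sample each group to month end, then fill the new rows by linear
# interpolation between the surrounding quarter-end values.
pieces = {
    key: grp[['a', 'b']].resample(pd.offsets.MonthEnd()).interpolate()
    for key, grp in df3.groupby('c')
}

# Stack the groups back together with 'c' as the outer index level.
monthly = pd.concat(pieces, names=['c'])
print(monthly)
```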

Which leaves me with two questions:

  1. What is the proper way to resample a DataFrame of mixed type?
  2. When resampling from a lower frequency to a higher frequency, what is the proper way to have the new values interpolated?

Even if you can't provide a full answer to both parts, a partial solution or an answer to either question would be appreciated.

When resampling from a lower frequency to a higher frequency, I realized I was specifying the how argument when I wanted to specify fill_method. When I do that, things seem to work:

    df4c = df3.groupby(['c']).resample('M', fill_method='ffill')
    df4c.head()

                          a          b        d
    c
    a 2014-03-31 -0.2435414 -0.4640906  pickles
      2014-04-30 -0.2435414 -0.4640906  pickles
      2014-05-31 -0.2435414 -0.4640906  pickles
      2014-06-30  0.6679614 -0.5626360  pickles
      2014-07-31  0.6679614 -0.5626360  pickles
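
In newer pandas the fill_method= keyword no longer exists; the corresponding call is .ffill() on the resampler. Below is a sketch on hypothetical data, again looping over the groups and dropping the key column before resampling. Forward filling works for mixed types because it only copies existing values forward.

```python
import pandas as pd

# Hypothetical mixed-type stand-in for df3.
dates = pd.date_range('2014-03-31', periods=3, freq=pd.offsets.QuarterEnd())
df3 = pd.concat([
    pd.DataFrame({'a': [0.0, 3.0, 6.0], 'b': [0.0, 30.0, 60.0],
                  'c': 'a', 'd': 'pickles'}, index=dates),
    pd.DataFrame({'a': [0.0, -3.0, -6.0], 'b': [0.0, -30.0, -60.0],
                  'c': 'b', 'd': 'ham'}, index=dates),
])

# Up-sample each group to month end and forward-fill every column,
# floats and strings alike.
pieces = {
    key: grp.drop(columns='c').resample(pd.offsets.MonthEnd()).ffill()
    for key, grp in df3.groupby('c')
}
df4c = pd.concat(pieces, names=['c'])
print(df4c.head())
```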

You do have a more limited set of interpolation choices that can handle mixed types, though.
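
One way around that limit is to split the columns by dtype. This sketch uses a hypothetical helper, upsample_mixed, which interpolates the numeric columns, forward-fills everything else, and then restores the original column order. Treating "interpolate" as "forward-fill" for string columns is my assumption, not something pandas decides for you.

```python
import pandas as pd

# Hypothetical mixed-type stand-in for df3.
dates = pd.date_range('2014-03-31', periods=3, freq=pd.offsets.QuarterEnd())
df3 = pd.concat([
    pd.DataFrame({'a': [0.0, 3.0, 6.0], 'b': [0.0, 30.0, 60.0],
                  'c': 'a', 'd': 'pickles'}, index=dates),
    pd.DataFrame({'a': [0.0, -3.0, -6.0], 'b': [0.0, -30.0, -60.0],
                  'c': 'b', 'd': 'ham'}, index=dates),
])


def upsample_mixed(grp, freq):
    """Hypothetical helper: up-sample one group, interpolating numeric
    columns and forward-filling all other (e.g. string) columns."""
    numeric = grp.select_dtypes(include='number').columns.tolist()
    other = [col for col in grp.columns if col not in numeric]
    interpolated = grp[numeric].resample(freq).interpolate()
    filled = grp[other].resample(freq).ffill()
    # Both pieces share the same new index, so they line up column-wise;
    # the final selection restores the original column order.
    return pd.concat([interpolated, filled], axis=1)[list(grp.columns)]


pieces = {
    key: upsample_mixed(grp.drop(columns='c'), pd.offsets.MonthEnd())
    for key, grp in df3.groupby('c')
}
monthly = pd.concat(pieces, names=['c'])
print(monthly)
```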

When resampling to a lower frequency without any how option (I believe it defaults to mean), the down-sampling does work:

    df5c = df3.groupby(['c']).resample('A')
    df5c.head()

                          a          b
    c
    a 2014-12-31 -0.1065645 -0.7429617
      2015-12-31 -0.3101057 -0.6245030
    b 2014-12-31 -0.0708263  0.4213621
      2015-12-31  0.0110456 -0.0607028

So it seems the problem lies either in passing a dictionary of how options or in one of the option choices (presumably ffill), but I'm not sure.

