python - How to resample a Pandas DataFrame of mixed type?
I generate a mixed-type (floats and strings) pandas DataFrame, df3, with the following Python code:
    import numpy as np
    import pandas as pd

    # dates is a DatetimeIndex defined elsewhere (not shown)
    df1 = pd.DataFrame(np.random.randn(dates.shape[0], 2), index=dates, columns=list('ab'))
    df1['c'] = 'a'
    df1['d'] = 'pickles'
    df2 = pd.DataFrame(np.random.randn(dates.shape[0], 2), index=dates, columns=list('ab'))
    df2['c'] = 'b'
    df2['d'] = 'ham'
    df3 = pd.concat([df1, df2], axis=0)

When I resample df3 to a higher frequency, I don't get the frame resampled at the higher rate; instead, how is ignored and I just get missing values:
    df4 = df3.groupby(['c']).resample('M', how={'a': 'mean', 'b': 'mean', 'd': 'ffill'})
    df4.head()

Result:
                          b          a        d
    c
    a 2014-03-31 -0.4640906 -0.2435414  pickles
      2014-04-30        NaN        NaN      NaN
      2014-05-31        NaN        NaN      NaN
      2014-06-30 -0.5626360  0.6679614  pickles
      2014-07-31        NaN        NaN      NaN

When I resample df3 to a lower frequency, I don't get any resampling at all:
    df5 = df3.groupby(['c']).resample('A', how={'a': np.mean, 'b': np.mean, 'd': 'ffill'})
    df5.head()

Result:
                          b          a        d
    c
    a 2014-03-31        NaN        NaN  pickles
      2014-06-30        NaN        NaN  pickles
      2014-09-30        NaN        NaN  pickles
      2014-12-31 -0.7429617 -0.1065645  pickles
      2015-03-31        NaN        NaN  pickles

I'm pretty sure this has something to do with the mixed types, because if I redo the annual down-sampling with only the numerical columns, it works as expected:
    df5b = df3[['a', 'b', 'c']].groupby(['c']).resample('A', how={'a': np.mean, 'b': np.mean})
    df5b.head()

Result:
                          b          a
    c
    a 2014-12-31 -0.7429617 -0.1065645
      2015-12-31 -0.6245030 -0.3101057
    b 2014-12-31  0.4213621 -0.0708263
      2015-12-31 -0.0607028  0.0110456

But when I switch to only numerical types, resampling to a higher frequency still doesn't work as expected:
    df4b = df3[['a', 'b', 'c']].groupby(['c']).resample('M', how={'a': 'mean', 'b': 'mean'})
    df4b.head()

Results:
                          b          a
    c
    a 2014-03-31 -0.4640906 -0.2435414
      2014-04-30        NaN        NaN
      2014-05-31        NaN        NaN
      2014-06-30 -0.5626360  0.6679614
      2014-07-31        NaN        NaN

Which leaves me with two questions:
- What is the proper way to resample a DataFrame of mixed type?
- When resampling from a lower frequency to a higher frequency, what is the proper way to resample so that the new values are interpolated?
Even if you can't provide a full answer to both parts, a partial solution or an answer to either question would be appreciated.
When resampling from a lower frequency to a higher frequency, I realized I was specifying how when I wanted to specify fill_method. When I do that, things seem to work.
    df4c = df3.groupby(['c']).resample('M', fill_method='ffill')
    df4c.head()

                          a          b        d
    c
    a 2014-03-31 -0.2435414 -0.4640906  pickles
      2014-04-30 -0.2435414 -0.4640906  pickles
      2014-05-31 -0.2435414 -0.4640906  pickles
      2014-06-30  0.6679614 -0.5626360  pickles
      2014-07-31  0.6679614 -0.5626360  pickles

You get a more limited set of interpolation choices, but it does handle mixed types.
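As far as I know, later pandas releases no longer accept the how and fill_method keywords; the fill is chained after resample() instead. The following is only a sketch under stated assumptions: the dates values are hypothetical (the post does not show them), the helper upsample_group is an illustrative name, and each group is resampled in an explicit loop rather than through groupby(...).resample(...). It shows a plain forward fill of every column, and a mixed treatment that linearly interpolates the numeric columns while forward-filling the string column.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the post's `dates`, which is not shown: quarter ends.
dates = pd.to_datetime(["2014-03-31", "2014-06-30", "2014-09-30", "2014-12-31"])
rng = np.random.default_rng(0)

df1 = pd.DataFrame(rng.standard_normal((len(dates), 2)), index=dates, columns=list("ab"))
df1["c"] = "a"
df1["d"] = "pickles"
df2 = pd.DataFrame(rng.standard_normal((len(dates), 2)), index=dates, columns=list("ab"))
df2["c"] = "b"
df2["d"] = "ham"
df3 = pd.concat([df1, df2], axis=0)

# An offset object is used instead of an alias string such as 'M'.
month_end = pd.offsets.MonthEnd()


def upsample_group(grp):
    """Upsample one group to month ends: interpolate numbers, forward-fill strings."""
    numeric = grp[["a", "b"]].resample(month_end).interpolate("linear")
    text = grp[["d"]].resample(month_end).ffill()
    return numeric.join(text)


# Plain forward fill of every column, the counterpart of fill_method='ffill'.
filled = pd.concat(
    {key: grp.drop(columns="c").resample(month_end).ffill()
     for key, grp in df3.groupby("c")},
    names=["c"],
)

# Mixed treatment: linear interpolation for a and b, forward fill for d.
mixed = pd.concat(
    {key: upsample_group(grp) for key, grp in df3.groupby("c")},
    names=["c"],
)

print(filled.head())
print(mixed.head())
```

Splitting the columns by type is a design choice here: interpolation only makes sense for numbers, so the string column needs a fill rule of its own.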
When resampling to a lower frequency, using no how option (I believe it defaults to mean) makes the down-sampling work:
    df5c = df3.groupby(['c']).resample('A')
    df5c.head()

                          a          b
    c
    a 2014-12-31 -0.1065645 -0.7429617
      2015-12-31 -0.3101057 -0.6245030
    b 2014-12-31 -0.0708263  0.4213621
      2015-12-31  0.0110456 -0.0607028

Therefore it seems the problem lies in passing a dictionary of how options, or in one of the option choices, presumably ffill, but I'm not sure.
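One possible explanation for the down-sampling behaviour is that 'ffill' fills gaps rather than reducing a bin to a single value, so it may not belong in a dictionary of aggregations. A minimal sketch, assuming a recent pandas where per-column functions are passed to .agg() instead of how, and assuming 'last' is an acceptable stand-in for the string column. The dates values are hypothetical because the post does not show them, and each group is resampled in an explicit loop.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the post's `dates`: eight quarter ends over two years.
dates = pd.to_datetime([
    "2014-03-31", "2014-06-30", "2014-09-30", "2014-12-31",
    "2015-03-31", "2015-06-30", "2015-09-30", "2015-12-31",
])
rng = np.random.default_rng(0)

df1 = pd.DataFrame(rng.standard_normal((len(dates), 2)), index=dates, columns=list("ab"))
df1["c"] = "a"
df1["d"] = "pickles"
df2 = pd.DataFrame(rng.standard_normal((len(dates), 2)), index=dates, columns=list("ab"))
df2["c"] = "b"
df2["d"] = "ham"
df3 = pd.concat([df1, df2], axis=0)

# An offset object is used instead of an alias string such as 'A'.
year_end = pd.offsets.YearEnd()

# One reducing function per column: mean for numbers, last value for strings.
# 'last' is an assumption; 'first' would serve equally well for a constant column.
agg_spec = {"a": "mean", "b": "mean", "d": "last"}

# Down-sample each group separately, then stack the pieces under their group key.
annual = pd.concat(
    {key: grp.resample(year_end).agg(agg_spec) for key, grp in df3.groupby("c")},
    names=["c"],
)

print(annual)
```

Columns left out of the dictionary (here the grouping column c) are simply not aggregated, which is why c only appears as an index level in the result.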