python - How to resample a Pandas dataframe of mixed type?
I generate a mixed-type (floats and strings) pandas DataFrame, df3, with the following Python code:
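    import numpy as np
    import pandas as pd

    # dates is a DatetimeIndex defined earlier (not shown here);
    # the outputs below suggest quarter-end dates
    df1 = pd.DataFrame(np.random.randn(dates.shape[0], 2), index=dates, columns=list('ab'))
    df1['c'] = 'a'
    df1['d'] = 'pickles'
    df2 = pd.DataFrame(np.random.randn(dates.shape[0], 2), index=dates, columns=list('ab'))
    df2['c'] = 'b'
    df2['d'] = 'ham'
    df3 = pd.concat([df1, df2], axis=0)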
When I resample df3 to a higher frequency, I don't get the frame resampled at the higher rate. Instead, the how argument seems to be ignored and I just get missing values:

    df4 = df3.groupby(['c']).resample('M', how={'a': 'mean', 'b': 'mean', 'd': 'ffill'})
    df4.head()
Result:

                          b          a        d
    c
    a 2014-03-31 -0.4640906 -0.2435414  pickles
      2014-04-30        NaN        NaN      NaN
      2014-05-31        NaN        NaN      NaN
      2014-06-30 -0.5626360  0.6679614  pickles
      2014-07-31        NaN        NaN      NaN
When I resample df3 to a lower frequency, I don't get any resampling at all:

    df5 = df3.groupby(['c']).resample('A', how={'a': np.mean, 'b': np.mean, 'd': 'ffill'})
    df5.head()
Result:

                          b          a        d
    c
    a 2014-03-31        NaN        NaN  pickles
      2014-06-30        NaN        NaN  pickles
      2014-09-30        NaN        NaN  pickles
      2014-12-31 -0.7429617 -0.1065645  pickles
      2015-03-31        NaN        NaN  pickles
I'm pretty sure this has something to do with the mixed types, because if I redo the annual down-sampling with only the numerical columns, it works as expected:

    df5b = df3[['a', 'b', 'c']].groupby(['c']).resample('A', how={'a': np.mean, 'b': np.mean})
    df5b.head()
Result:

                          b          a
    c
    a 2014-12-31 -0.7429617 -0.1065645
      2015-12-31 -0.6245030 -0.3101057
    b 2014-12-31  0.4213621 -0.0708263
      2015-12-31 -0.0607028  0.0110456
But even when I switch to only numerical types, resampling to a higher frequency still doesn't work as expected:

    df4b = df3[['a', 'b', 'c']].groupby(['c']).resample('M', how={'a': 'mean', 'b': 'mean'})
    df4b.head()
Results:

                          b          a
    c
    a 2014-03-31 -0.4640906 -0.2435414
      2014-04-30        NaN        NaN
      2014-05-31        NaN        NaN
      2014-06-30 -0.5626360  0.6679614
      2014-07-31        NaN        NaN
Which leaves me with two questions:
- What is the proper way to resample a DataFrame of mixed type?
- When resampling from a lower frequency to a higher frequency, what is the proper way to do the resampling so that the new values are interpolated?
Even if you can't provide a full answer to both parts, a partial solution or an answer to either question would be appreciated.
When resampling from a lower frequency to a higher frequency, I realized I was specifying how when I wanted to specify fill_method. When I do that, things seem to work.
    df4c = df3.groupby(['c']).resample('M', fill_method='ffill')
    df4c.head()

                          a          b        d
    c
    a 2014-03-31 -0.2435414 -0.4640906  pickles
      2014-04-30 -0.2435414 -0.4640906  pickles
      2014-05-31 -0.2435414 -0.4640906  pickles
      2014-06-30  0.6679614 -0.5626360  pickles
      2014-07-31  0.6679614 -0.5626360  pickles
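For readers on a recent pandas release, where (to my knowledge) resample no longer accepts how or fill_method and instead returns an object whose methods do the work, the same forward-fill upsampling might look like the sketch below. The dates index, the seeded random data, and the make_frame helper are made up for illustration and are not the original data. Looping over the groups is just one way to get the same behaviour across pandas versions.

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the question's `dates`: eight quarter-end timestamps.
dates = pd.date_range("2014-01-01", periods=8, freq=pd.offsets.QuarterEnd())
rng = np.random.default_rng(0)


def make_frame(label, food):
    """Hypothetical helper: two float columns plus two string columns."""
    frame = pd.DataFrame(
        rng.standard_normal((len(dates), 2)), index=dates, columns=list("ab")
    )
    frame["c"] = label
    frame["d"] = food
    return frame


df3 = pd.concat([make_frame("a", "pickles"), make_frame("b", "ham")], axis=0)

# Upsample each group to month-end frequency and forward-fill every column.
# Offset objects are used instead of alias strings because the string
# aliases have changed between pandas versions.
monthly = pd.concat(
    {
        key: grp.drop(columns="c").resample(pd.offsets.MonthEnd()).ffill()
        for key, grp in df3.groupby("c")
    },
    names=["c"],
)
print(monthly.head())
```

Forward-filling works for the string column as well as the floats, which is why it is a convenient choice for a mixed-type frame.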
You just have a more limited set of interpolation choices that can handle mixed types.
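If the new rows should be interpolated rather than repeated, one possible approach is to treat the column types separately: interpolate the floats and forward-fill the strings. The sketch below assumes a recent pandas release and uses made-up data (a single group with the values 0, 3, 6, 9 at quarter ends) so the result is easy to check by eye.

```python
import pandas as pd

# Assumed stand-in data: one group only, with simple quarterly values.
dates = pd.date_range("2014-01-01", periods=4, freq=pd.offsets.QuarterEnd())
df = pd.DataFrame({"a": [0.0, 3.0, 6.0, 9.0], "d": "pickles"}, index=dates)

# asfreq() inserts the new month-end rows as missing values.
monthly = df.resample(pd.offsets.MonthEnd()).asfreq()

# Numeric column: fill the gaps by linear interpolation between known rows
# (method="linear" treats the rows as equally spaced).
monthly["a"] = monthly["a"].interpolate(method="linear")

# String column: interpolation is undefined, so carry the last label forward.
monthly["d"] = monthly["d"].ffill()
print(monthly)
```

With this data the numeric column steps evenly from 0 to 9 across the ten month ends, while the string column is simply repeated.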
When resampling to a lower frequency, using no how option (I believe it defaults to mean) makes the down-sampling work:
    df5c = df3.groupby(['c']).resample('A')
    df5c.head()

                          a          b
    c
    a 2014-12-31 -0.1065645 -0.7429617
      2015-12-31 -0.3101057 -0.6245030
    b 2014-12-31 -0.0708263  0.4213621
      2015-12-31  0.0110456 -0.0607028
Therefore it seems the problem is with passing a dictionary of how options, or with one of the option choices, presumably ffill, but I'm not sure.
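One plausible explanation is that 'ffill' is a fill method rather than an aggregation, so it does not fit in a per-column dictionary used for down-sampling. As a hedged sketch for a recent pandas release (the data and the choice of 'last' for the string column are my assumptions, not part of the original question), a per-column dictionary can be passed to agg on the resampled object:

```python
import numpy as np
import pandas as pd

# Assumed stand-in data: eight quarter-end rows, two float columns, one string column.
dates = pd.date_range("2014-01-01", periods=8, freq=pd.offsets.QuarterEnd())
rng = np.random.default_rng(0)
df = pd.DataFrame(
    rng.standard_normal((len(dates), 2)), index=dates, columns=list("ab")
)
df["d"] = "pickles"

# Downsample to year-end: average the floats, keep the last string in each year.
annual = df.resample(pd.offsets.YearEnd()).agg(
    {"a": "mean", "b": "mean", "d": "last"}
)
print(annual)
```

Each column gets a reduction that makes sense for its type, so the string column survives the down-sampling instead of being dropped or turned into missing values.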