python - Adding column headers to a pandas DataFrame, but all the data becomes NaN even though the headers have the same dimension
I am trying to add column headers to a CSV file that I have parsed into a DataFrame within pandas.
    import pandas as pd

    dftrades = pd.read_csv('pnl1.txt', delim_whitespace=True, header=None)
    dftrades = dftrades.drop(dftrades.columns[[3,4,6,8,10,11,13,15,17,18,25,27,29,32]], axis=1)  # note: 0 indexed
    dftrades = dftrades.set_index([dftrades.index])
    df = pd.DataFrame(dftrades, columns=['tradedate', 'tradetime', 'cumpnl', 'dailycumpnl', 'realisedpnl', 'unrealisedpnl', 'ccyccy', 'ccyccypnldaily', 'position', 'candleopen', 'candlehigh', 'candlelow', 'candleclose', 'candledir', 'candledirswings', 'tradeamount', 'rate', 'pnl/trade', 'venue', 'ordertype', 'orderid' 'code'])
    print(df)
The structure of the data is:
01/10/2015 05:47.3 190 190 -648 838 eurnok -648 0 0 611 -1137 -648 h 2 -1000000 9.465 -648 internal ioc 287
What pandas returns is:
       tradedate  tradetime  cumpnl  dailycumpnl  realisedpnl  unrealisedpnl  \
    0        NaN        NaN     NaN          NaN          NaN            NaN
    ...
I would appreciate any advice on this issue.
Thanks
PS. Regarding Ed's answer: I have tried the suggestion with
df = dftrades.columns=['tradedate', 'tradetime', 'cumpnl', 'dailycumpnl', 'realisedpnl', 'unrealisedpnl', 'ccyccy', 'ccyccypnldaily', 'position', 'candleopen', 'candlehigh', 'candlelow', 'candleclose', 'candledir', 'candledirswings', 'tradeamount', 'rate', 'pnl/trade', 'venue', 'ordertype', 'orderid' 'code'];
but the problem has morphed into:
    ValueError: Length mismatch: Expected axis has 22 elements, new values have 21 elements
I have taken the shape of the matrix and got:
    dftrades.shape
    (12056, 22)
So sadly I still need help :(
Assign directly to the columns:
    df.columns = ['tradedate', 'tradetime', 'cumpnl', 'dailycumpnl', 'realisedpnl', 'unrealisedpnl', 'ccyccy', 'ccyccypnldaily', 'position', 'candleopen', 'candlehigh', 'candlelow', 'candleclose', 'candledir', 'candledirswings', 'tradeamount', 'rate', 'pnl/trade', 'venue', 'ordertype', 'orderid', 'code']
What you're doing is reindexing, and because the columns don't agree you get NaNs: you're passing the df as the data, so pandas aligns on the existing column names and index values.
You can see the same semantic behaviour here:
    In [240]:
    df = pd.DataFrame(data=np.random.randn(5,3), columns=np.arange(3))
    df

    Out[240]:
              0         1         2
    0  1.037216  0.761995  0.153047
    1 -0.602141 -0.114032 -0.323872
    2 -1.188986  0.594895 -0.733236
    3  0.556196  0.363965 -0.893846
    4  0.547791 -0.378287 -1.171706

    In [242]:
    df1 = pd.DataFrame(df, columns=list('abc'))
    df1

    Out[242]:
        a   b   c
    0 NaN NaN NaN
    1 NaN NaN NaN
    2 NaN NaN NaN
    3 NaN NaN NaN
    4 NaN NaN NaN
Alternatively, you can pass the np array as the data:
    df = pd.DataFrame(dftrades.values, columns=['tradedate', ...])

    In [244]:
    df1 = pd.DataFrame(df.values, columns=list('abc'))
    df1

    Out[244]:
              a         b         c
    0  1.037216  0.761995  0.153047
    1 -0.602141 -0.114032 -0.323872
    2 -1.188986  0.594895 -0.733236
    3  0.556196  0.363965 -0.893846
    4  0.547791 -0.378287 -1.171706
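On the length mismatch reported in the PS: a likely cause is that the list in the question ends with 'orderid' 'code' with no comma in between. Python concatenates adjacent string literals, so those two names collapse into one, and the list has 21 entries instead of 22. Here is a minimal sketch of that behaviour using a shortened name list and a toy 4-column frame; both are stand-ins for illustration, not the real dftrades data.

```python
import pandas as pd

# Adjacent string literals are concatenated by Python, so a missing comma
# silently merges two names into one.
names_missing_comma = ['venue', 'ordertype', 'orderid' 'code']
names_fixed = ['venue', 'ordertype', 'orderid', 'code']

print(names_missing_comma)                          # last entry is 'orderidcode'
print(len(names_missing_comma), len(names_fixed))   # 3 4

# Toy frame standing in for dftrades (hypothetical data, 4 columns).
toy = pd.DataFrame([[0, 0, 0, 0], [0, 0, 0, 0]])

# Compare the number of names with the number of columns before assigning.
if len(names_fixed) == toy.shape[1]:
    toy.columns = names_fixed
cols_after_fix = list(toy.columns)

# The short list reproduces the "Length mismatch" ValueError.
try:
    toy.columns = names_missing_comma
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)
```

Comparing len(names) with dftrades.shape[1] before assigning is a cheap way to catch this kind of typo in a long list.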