Going out of memory for a Python dictionary when the numbers are integers
I have Python code that is supposed to read large files into a dictionary in memory and then do some operations. What puzzles me is that it goes out of memory in only one case: when the values in the file are integers...
The structure of the file is this:
string value_1 .... value_n
The files vary in size from 2 GB to 40 GB, and I have 50 GB of memory to read the file into. When I have this:
string 0.001334 0.001473 -0.001277 -0.001093 0.000456 0.001007 0.000314 ...
with n = 100 and the number of rows equal to 10M, I am able to read it into memory relatively fast. The file size is about 10 GB. However, when I have:
string 4 -2 3 1 1 1 ...
with the same dimension (n = 100) and the same number of rows, I'm not able to read it into memory. This is the code I use to read the file:
matrix = {}
for line in f:
    tokens = line.strip().split()
    if len(tokens) <= 5:  # ignore the word2vec header line
        continue
    word = tokens[0]
    number_of_columns = len(tokens) - 1
    features = {}
    for dim, val in enumerate(tokens[1:]):
        val = float(val)
        features[dim] = val
    matrix[word] = features
This results in the process being killed in the second case, while it works in the first case.
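For reference, here is a minimal sketch of how one might estimate what a single parsed row costs in memory (the sample values are illustrative; exact sizes depend on the Python version and build):

import sys

# Build one row's feature dict the same way the loading loop does.
tokens = "4 -2 3 1 1 1".split()
features = {dim: float(val) for dim, val in enumerate(tokens)}

# sys.getsizeof only counts the dict container itself, so add the
# reported sizes of the keys and values on top of it.
total = sys.getsizeof(features)
total += sum(sys.getsizeof(k) + sys.getsizeof(v) for k, v in features.items())
print(total, "bytes for", len(features), "features")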
I know this does not answer your question specifically, but it offers a better solution to the problem you are looking to resolve:
May I suggest you use pandas for this kind of work? It seems a lot more appropriate for what you're trying to do. http://pandas.pydata.org/index.html
import pandas as pd
df = pd.read_csv('file.txt', sep=' ', skiprows=1)
Then do your manipulations with the pandas package, which is designed to handle and process large datasets. It has tons of useful features you will end up needing if you're dealing with big data.
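As a rough sketch of how that could look for this word-vector file (the float32 downcast and the index choice are assumptions on my part, not something from the answer above):

import pandas as pd

# Treat the first column (the word) as the index and the remaining n columns
# as the vector; skiprows=1 drops the word2vec-style header line.
df = pd.read_csv('file.txt', sep=' ', header=None, skiprows=1, index_col=0)

# Optional: downcast from float64 to float32 to roughly halve memory use.
df = df.astype('float32')

# Inspect the actual memory footprint, including the string index.
df.info(memory_usage='deep')

# Look up one word's vector (similar to the dict-of-dicts, but as a NumPy row).
# vector = df.loc['some_word'].values   # 'some_word' is a hypothetical key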