Going out of memory for a Python dictionary when the numbers are integers
I have Python code that is supposed to read large files into a dictionary in memory and then do some operations. What puzzles me is that it goes out of memory in only one case: when the values in the file are integers...
The structure of the file is this:
string value_1 .... value_n
The files vary in size from 2 GB to 40 GB, and I have 50 GB of memory to read the file into. When I have this:
string 0.001334 0.001473 -0.001277 -0.001093 0.000456 0.001007 0.000314 ...
with n = 100 and the number of rows equal to 10M, I am able to read it into memory relatively fast. The file size is about 10 GB. However, when I have:
string 4 -2 3 1 1 1 ...
with the same dimension (n = 100) and the same number of rows, I'm not able to read it into memory. This is the code I use to read the file:
matrix = {}
for line in f:
    tokens = line.strip().split()
    if len(tokens) <= 5:  # ignore the word2vec header line
        continue
    word = tokens[0]
    number_of_columns = len(tokens) - 1
    features = {}
    for dim, val in enumerate(tokens[1:]):
        val = float(val)
        features[dim] = val
    matrix[word] = features
This results in the process being killed in the second case, while it works in the first case.
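For reference, here is a minimal sketch of how one might estimate what a single parsed row costs in memory (the sample values are illustrative; exact sizes depend on the Python version and build):

import sys

# Build one row's feature dict the same way the loading loop does.
tokens = "4 -2 3 1 1 1".split()
features = {dim: float(val) for dim, val in enumerate(tokens)}

# sys.getsizeof only counts the dict container itself, so add the
# reported sizes of the keys and values on top of it.
total = sys.getsizeof(features)
total += sum(sys.getsizeof(k) + sys.getsizeof(v) for k, v in features.items())
print(total, "bytes for", len(features), "features")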
I know this does not answer your question specifically, but it offers a better solution to the problem you are looking to resolve:
May I suggest you use pandas for this kind of work? It seems a lot more appropriate for what you're trying to do. http://pandas.pydata.org/index.html
import pandas as pd
df = pd.read_csv('file.txt', sep=' ', skiprows=1)
Then do your manipulations with the pandas package, which is designed to handle and process large datasets. It has tons of useful features you will end up needing if you're dealing with big data.
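As a rough sketch of how that could look for this word-vector file (the float32 downcast and the index choice are assumptions on my part, not something from the answer above):

import pandas as pd

# Treat the first column (the word) as the index and the remaining n columns
# as the vector; skiprows=1 drops the word2vec-style header line.
df = pd.read_csv('file.txt', sep=' ', header=None, skiprows=1, index_col=0)

# Optional: downcast from float64 to float32 to roughly halve memory use.
df = df.astype('float32')

# Inspect the actual memory footprint, including the string index.
df.info(memory_usage='deep')

# Look up one word's vector (similar to the dict-of-dicts, but as a NumPy row).
# vector = df.loc['some_word'].values   # 'some_word' is a hypothetical key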