Can anyone advise how to use the function `elasticsearch.helpers.streaming_bulk` instead of `elasticsearch.helpers.bulk` for indexing data into Elasticsearch? If I simply swap in `streaming_bulk` for `bulk`, nothing gets indexed, so I guess it needs to be used in a different form.

The code below creates an index and a type, then indexes data from a CSV file into Elasticsearch in chunks of 500 elements. It works, but I'm wondering whether it's possible to increase performance, which is why I want to try out the `streaming_bulk` function.

Currently it takes about 10 minutes to index a 200 MB CSV document with 1 million rows. I use 2 machines: CentOS 6.6, 8 CPUs, x86_64, CPU MHz: 2499.902, Mem: 15.574 GB total. I'm not sure whether it can go any faster.

```python
import json

import elasticsearch

es = elasticsearch.Elasticsearch([{'host': 'uxmachine-test', 'port': 9200}])
index_name = 'new_index'
type_name = 'new_type'

# Read the mapping from a JSON file (config is defined elsewhere in the script)
with open(config["index_mapping"]) as f:
    mapping = json.load(f)

es.indices.create(index_name)
es.indices.put_mapping(index=index_name, doc_type=type_name, body=mapping)

open(file_to_index, ...
```
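A likely reason nothing gets indexed: `streaming_bulk` is a lazy generator that yields an `(ok, item)` tuple per action, and no requests are sent to Elasticsearch until that generator is iterated. Below is a minimal sketch of consuming it, assuming the CSV's first row is a header whose column names match the mapping fields. `csv_actions` and `index_with_streaming_bulk` are hypothetical helper names, not part of the library, and `_type` only applies to older Elasticsearch versions that still have mapping types (as the `doc_type` argument above suggests).

```python
import csv
from typing import Dict, Iterable, Iterator


def csv_actions(path: str, index_name: str, type_name: str) -> Iterator[Dict]:
    """Hypothetical helper: yield one bulk action per CSV row.

    Assumes the first row is a header whose names match the mapping fields.
    """
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield {
                "_index": index_name,
                "_type": type_name,  # assumption: only for ES versions with mapping types
                "_source": row,
            }


def index_with_streaming_bulk(es, actions: Iterable[Dict], chunk_size: int = 500) -> int:
    """Hypothetical helper: send actions via streaming_bulk, return the success count.

    streaming_bulk is a generator, so the for-loop below is what actually
    triggers the bulk requests; calling it without iterating sends nothing.
    """
    # Imported lazily so csv_actions can be used without the client installed.
    from elasticsearch.helpers import streaming_bulk

    succeeded = 0
    for ok, item in streaming_bulk(es, actions, chunk_size=chunk_size,
                                   raise_on_error=False):
        if ok:
            succeeded += 1
        else:
            # With raise_on_error=False, failed actions are yielded instead of raised.
            print("Failed:", item)
    return succeeded
```

Usage would look roughly like `index_with_streaming_bulk(es, csv_actions(file_to_index, index_name, type_name))`. One caveat on performance: in elasticsearch-py, `helpers.bulk()` is itself implemented by iterating over `streaming_bulk()`, so switching from one to the other is unlikely to speed up indexing by itself; the main difference is that `streaming_bulk` lets you inspect each result as it arrives.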