Iterating through Python dictionary very slow


I'm trying to compute the distance between pairs of users based on the values of the items assigned to them. The distance calculation should be null when two users do not have any intersecting items. I'm only calculating the lower half of the distance matrix (e.g. userA-userB is equivalent to userB-userA, so I only calculate one).

I have the following Python script, which works but starts chugging when I feed it more than a few hundred users. The sample script below shows the input structure, but I'm trying to do this with thousands of users, not the 4 I have shown here.

The line s = {k:v for k,v in data.items() if k in (user1,user2)} seems to add overhead.

    import math
    from decimal import Decimal

    def has_matching_product(data, user1, user2):
        c1 = set(data[user1].keys())
        c2 = [k for k in data[user2].keys()]
        return any([x in c1 for x in c2])

    def get_euclidean_dist(data, user1, user2):
        # Tried subsetting to run quicker?
        s = {k: v for k, v in data.items() if k in (user1, user2)}

        # Ignore users with no overlapping items
        if has_matching_product(s, user1, user2):
            items = set()
            for k, v in s.items():
                for ki in v.keys():
                    items.add(ki)

            rs = Decimal(0)
            for i in items:
                p1 = s.get(user1).get(i)
                p2 = s.get(user2).get(i)
                v1 = p1 or 0
                v2 = p2 or 0

                rs += Decimal((v1 - v2) ** 2)
            return math.sqrt(rs)
        else:
            return None

    # User/Product/Value
    raw_data = {
        'u1': {
            'i1': 5,
            'i4': 2
        },
        'u2': {
            'i1': 1,
            'i3': 6
        },
        'u3': {
            'i3': 11
        },
        'u4': {
            'i4': 9
        }
    }

    users = sorted(raw_data.keys())
    l = len(users)

    data_out = set()
    # Compute the lower half of the distance matrix (unique pairs only)
    for u1 in range(0, l - 1):
        for u2 in range(1 + u1, l):
            dist = get_euclidean_dist(raw_data, users[u1], users[u2])
            print('{x} | {y} | {d}'.format(x=users[u1], y=users[u2], d=dist))

What the proper output should look like:

    u1 | u2 | 7.483314773547883
    u1 | u3 | None
    u1 | u4 | 8.602325267042627
    u2 | u3 | 5.0990195135927845
    u2 | u4 | None
    u3 | u4 | None

The issue is that you're walking the entire dictionary every time, just to find the 2 items you want. And, by the looks of it, you're pulling out the users and then spending time trying to go and find them again in the data. @Peter Wood's suggestion helps a bunch - just grab the 2 users you want in the first place. But that's sort of missing the forest for the trees - you don't need to slim down the dictionary in the first place at all. Just keep it all together:

    import itertools

    for kv1, kv2 in itertools.combinations(data.items(), 2):
        # kv1 and kv2 are (user, items) pairs; calculate the distance directly here
        pass
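As a rough sketch of what "calculate the distance directly here" could look like, the version below fills in the loop body using the sample data from the question. The name pairwise_distances is made up for illustration. It also makes two assumptions that go beyond the answer itself: the overlap test is done by intersecting the two key views instead of calling has_matching_product, and plain numbers are summed instead of Decimal values.

```python
import itertools
import math


def pairwise_distances(data):
    """Yield (user_a, user_b, distance) for every unique pair of users.

    The distance is None when the two users share no items.
    (Hypothetical helper; not part of the original script.)
    """
    # Sort by user name so the pairs come out as u1-u2, u1-u3, ...
    ordered = sorted(data.items(), key=lambda kv: kv[0])
    for (name_a, items_a), (name_b, items_b) in itertools.combinations(ordered, 2):
        # Dictionary key views support set operations directly.
        if not (items_a.keys() & items_b.keys()):
            yield name_a, name_b, None
            continue
        # An item missing for one user counts as 0, as in the question.
        all_items = items_a.keys() | items_b.keys()
        squared = sum((items_a.get(i, 0) - items_b.get(i, 0)) ** 2
                      for i in all_items)
        yield name_a, name_b, math.sqrt(squared)


# Sample data from the question.
raw_data = {
    'u1': {'i1': 5, 'i4': 2},
    'u2': {'i1': 1, 'i3': 6},
    'u3': {'i3': 11},
    'u4': {'i4': 9},
}

for a, b, dist in pairwise_distances(raw_data):
    print('{x} | {y} | {d}'.format(x=a, y=b, d=dist))
```

Each pair now only touches the two item dictionaries it needs, so the per-pair cost no longer grows with the total number of users. The number of pairs is still quadratic in the number of users, so with thousands of users the loop itself remains the dominant cost.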
