python - Unable to generate network data in tree format while retaining node attributes


I'm trying to generate a network graph that visualises data lineage (a cluster graph such as this one). Please keep in mind that I'm new to the NetworkX library, so the code below might be far from optimal.

My data consist of 2 pandas DataFrames:

  • df_objs: contains the UUID and name of the different items (these become the nodes).
  • df_calls: contains the calling and called UUIDs (these UUIDs reference the UUIDs of the items in df_objs).

Here's how I initialise the directed graph and create the nodes:

    import networkx as nx

    objs = df_objs.set_index('uuid').to_dict(orient='index')
    g = nx.DiGraph()
    for obj_id, obj_attrs in objs.items():
        g.add_node(obj_id, attr_dict=obj_attrs)
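The attr_dict keyword belongs to the NetworkX 1.x API. In NetworkX 2.x and later, add_node() takes attributes as keyword arguments (add_node(n, **attr)), so attr_dict=obj_attrs is stored as one attribute literally named 'attr_dict' rather than as 'name'. A minimal sketch, assuming a recent NetworkX and a hypothetical two-item objs dict shaped like the output of to_dict(orient='index'):

```python
import networkx as nx

# Hypothetical stand-in for df_objs.set_index('uuid').to_dict(orient='index')
objs = {
    'a': {'name': 'foo'},
    'b': {'name': 'bar'},
}

# NetworkX 2.x+: attr_dict is not special any more, it becomes an attribute itself
g_wrong = nx.DiGraph()
for obj_id, obj_attrs in objs.items():
    g_wrong.add_node(obj_id, attr_dict=obj_attrs)

# Unpacking the dict into keyword arguments stores 'name' directly...
g = nx.DiGraph()
for obj_id, obj_attrs in objs.items():
    g.add_node(obj_id, **obj_attrs)

# ...and add_nodes_from accepts (node, attribute dict) pairs in one call
g2 = nx.DiGraph()
g2.add_nodes_from(objs.items())

print(g_wrong.nodes['a'])  # {'attr_dict': {'name': 'foo'}}
print(g.nodes['a'])        # {'name': 'foo'}
```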

and generate the edges:

g.add_edges_from(df_calls.drop_duplicates().to_dict(orient='split')['data']) 
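to_dict(orient='split')['data'] returns each row as a list in the DataFrame's column order, so these edges only point from caller to callee if df_calls stores calling before called. A minimal sketch with a hypothetical df_calls whose columns are in the opposite order, showing how selecting the columns explicitly pins the edge direction:

```python
import pandas as pd
import networkx as nx

# Hypothetical df_calls whose columns happen to be stored as (called, calling)
df_calls = pd.DataFrame({'called': ['b', 'c', 'c'], 'calling': ['a', 'a', 'a']})

# Relying on column order: rows come out as [called, calling], i.e. reversed edges
rows = df_calls.drop_duplicates().to_dict(orient='split')['data']

# Selecting the columns explicitly yields (calling, called) tuples
pairs = df_calls[['calling', 'called']].drop_duplicates().itertuples(index=False, name=None)

g = nx.DiGraph()
g.add_edges_from(pairs)
print(rows)               # [['b', 'a'], ['c', 'a']]
print(sorted(g.edges()))  # [('a', 'b'), ('a', 'c')]
```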

Next, I want to know the lineage of a single item using its UUID:

    g_tree = nx.DiGraph(nx.bfs_edges(g, 'f6e214b1bba34a01bd0c18f232d6aee2', reverse=True))
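With reverse=True, bfs_edges() follows predecessors instead of successors, so for a graph whose edges mean "u calls v" it walks upstream to everything that (directly or indirectly) calls the root. The yielded edges point away from the root, which is the orientation tree_data() expects. A minimal sketch on a small hypothetical call graph:

```python
import networkx as nx

# Hypothetical call graph: an edge u -> v means "u calls v"
g = nx.DiGraph([('a', 'b'), ('c', 'b'), ('d', 'c'), ('b', 'e')])

# reverse=True walks predecessors, i.e. everything upstream of 'b'
up_edges = list(nx.bfs_edges(g, 'b', reverse=True))

# Without reverse, BFS follows successors (downstream of 'b')
down_edges = list(nx.bfs_edges(g, 'b'))

# Tree edges point away from the root 'b'
g_tree = nx.DiGraph(up_edges)

print(up_edges)    # [('b', 'a'), ('b', 'c'), ('c', 'd')]
print(down_edges)  # [('b', 'e')]
```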

So far so good. The last step is to generate the JSON graph so that I can feed the resulting JSON file to d3.js in order to perform the visualisation:

    # create the json data structure
    from networkx.readwrite import json_graph
    data = json_graph.tree_data(g_tree, root='f6e214b1bba34a01bd0c18f232d6aee2')

    # write the tree to a json file
    import json
    with open('./tree.json', 'w') as f:
        json.dump(data, f)

All of the above works; however, instead of node names, I'm left with only the UUIDs in the JSON data, because the node attributes are dropped in the call to nx.bfs_edges().
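bfs_edges() yields bare (u, v) pairs, so a graph built from them has nodes but no node attributes, and tree_data() can then only emit the ids. A minimal sketch reproducing this on a hypothetical two-node graph:

```python
import networkx as nx
from networkx.readwrite import json_graph

g = nx.DiGraph()
g.add_nodes_from([('a', {'name': 'foo'}), ('b', {'name': 'bar'})])
g.add_edge('a', 'b')

# The new graph only sees the (u, v) pairs, not the 'name' attributes
g_tree = nx.DiGraph(nx.bfs_edges(g, 'a'))

data = json_graph.tree_data(g_tree, root='a')

print(g.nodes['a'])       # {'name': 'foo'}
print(g_tree.nodes['a'])  # {}
```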

For example:

[Image: tree example]

Not a problem (at least that's what I thought); I'll just update the nodes in g_tree with the attributes from g.

    obj_names = nx.get_node_attributes(g, 'name')
    for obj_id, obj_name in obj_names.items():
        try:
            g_tree[obj_id]['name'] = obj_name
        except Exception:
            pass

Note: I can't use set_node_attributes() because g contains more nodes than g_tree, which causes a KeyError.
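The KeyError reflects older NetworkX behaviour. In NetworkX 2.x and later, set_node_attributes() given a dict of dicts updates each node's attributes and silently skips keys that are not nodes of the target graph, so it can copy attributes from the larger g into the smaller g_tree. A minimal sketch, assuming a recent NetworkX and a hypothetical graph where node 'z' is outside the tree:

```python
import networkx as nx

g = nx.DiGraph()
g.add_nodes_from([('a', {'name': 'foo'}), ('b', {'name': 'bar'}), ('z', {'name': 'unused'})])
g.add_edges_from([('a', 'b'), ('z', 'a')])

# The BFS tree from 'a' contains only 'a' and 'b'
g_tree = nx.DiGraph(nx.bfs_edges(g, 'a'))

# Dict of dicts keyed by node: 'z' is not in g_tree and is skipped (NetworkX 2.x+)
nx.set_node_attributes(g_tree, {n: d for n, d in g.nodes(data=True)})

print(g_tree.nodes(data=True))
```

Because only the attribute dicts are touched, the edge count is unchanged and g_tree remains a tree.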

If I try to generate the JSON data again:

data = json_graph.tree_data(g_tree, root='f6e214b1bba34a01bd0c18f232d6aee2') 

it throws an error:

    TypeError: G is not a tree.

This is because the number of nodes != the number of edges + 1.

Before setting the attributes, the number of nodes is 81 and the number of edges is 80. After setting the attributes, the number of edges increases to 120 (the number of nodes remains the same).

OK, my questions:

  1. Am I taking the long way around, and is there a shorter/better/faster way to generate the same result?
  2. What is causing the number of edges to increase when I'm setting attributes on the nodes?
  3. Is there a way to retain the node attributes when generating the tree?

Per the warning in the docs regarding the dict G[node]:

Do not change the returned dict – it is part of the graph data structure and direct manipulation may leave the graph in an inconsistent state.

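Thus, the assignment to g_tree[obj_id] is a no-no:

    g_tree[obj_id]['name'] = obj_name

g_tree[obj_id] is not the node's attribute dict but its adjacency dict, keyed by the node's neighbours. Assigning a 'name' key there registers 'name' as a neighbour of obj_id in the adjacency structure without adding a node called 'name' to the graph. The edge count is computed from that structure, so it grows while the node count stays the same, which is why tree_data() then reports that g_tree is not a tree.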
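The difference between g_tree[n] (neighbours) and g_tree.nodes[n] (attributes) can be checked directly. A minimal sketch, assuming NetworkX 2.x or later, where g_tree[n] returns a read-only view, so the same assignment raises TypeError instead of silently corrupting the adjacency:

```python
import networkx as nx

g_tree = nx.DiGraph([('a', 'b'), ('a', 'c')])
g_tree.nodes['a']['name'] = 'foo'

# g_tree['a'] is the adjacency view: its keys are successors, not attributes
print(list(g_tree['a']))  # ['b', 'c']
# Node attributes live in a separate dict
print(g_tree.nodes['a'])  # {'name': 'foo'}

# In NetworkX 2.x+ the adjacency view is read-only, so the bad assignment fails loudly
try:
    g_tree['a']['name'] = 'foo'
    assignment_failed = False
except TypeError:
    assignment_failed = True

print(assignment_failed)         # True
print(g_tree.number_of_edges())  # 2
```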

Instead, use G.node to modify the attributes:

    g_tree.node[obj_id]['name'] = obj_name

Also, once you have g_tree, you can obtain the list of nodes in g_tree with

    In [220]: g_tree.nodes()
    Out[220]: ['a', 'c', 'b']

and you can use

    for obj_id in g_tree.nodes():
        g_tree.node[obj_id] = g.node[obj_id]

to copy the attributes from g to g_tree.
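G.node is gone in recent NetworkX releases (removed in 2.4; G.nodes[n] is the attribute dict), and G.nodes does not support item assignment. A minimal sketch of the same copy loop for a recent NetworkX, on a hypothetical three-node graph; using update() copies the values, so the two graphs do not end up sharing one attribute dict object:

```python
import networkx as nx

g = nx.DiGraph()
g.add_nodes_from([('a', {'name': 'foo'}), ('b', {'name': 'bar'}), ('c', {'name': 'baz'})])
g.add_edges_from([('a', 'b'), ('a', 'c')])
g_tree = nx.DiGraph(nx.bfs_edges(g, 'a'))

# Copy every attribute of each tree node from g into g_tree's own dict
for obj_id in g_tree.nodes():
    g_tree.nodes[obj_id].update(g.nodes[obj_id])

print(g_tree.nodes(data=True))
```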


    import json
    import pandas as pd
    import networkx as nx
    from networkx.readwrite import json_graph

    df_objs = pd.DataFrame({'uuid': list('abcd'), 'name': ['foo', 'bar', 'baz', 'quux']})
    df_calls = pd.DataFrame({'calling': ['a', 'a'], 'called': ['b', 'c']})

    objs = df_objs.set_index('uuid').to_dict(orient='index')
    g = nx.DiGraph()
    g.add_nodes_from(objs.items())
    g.add_edges_from(df_calls[['calling', 'called']].drop_duplicates().values)

    g_tree = nx.DiGraph(nx.bfs_edges(g, 'a'))

    for obj_id in g_tree.nodes():
        g_tree.node[obj_id] = g.node[obj_id]

    print(g_tree.nodes(data=True))
    # [('a', {'name': 'foo'}), ('c', {'name': 'baz'}), ('b', {'name': 'bar'})]

    data = json_graph.tree_data(g_tree, root='a')
    print(json.dumps(data))
    # {"children": [{"name": "baz", "id": "c"}, {"name": "bar", "id": "b"}],
    #  "name": "foo", "id": "a"}
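The same example ported to a recent NetworkX (2.x or later) might look like the sketch below, using the same toy DataFrames. It swaps in nx.bfs_tree(), which builds the tree graph in one call (still without attributes), and then copies the attributes with set_node_attributes(); both are existing NetworkX functions, but this is one possible arrangement, not the only one. For an upstream lineage, pass reverse=True to bfs_tree() as with bfs_edges().

```python
import json

import networkx as nx
import pandas as pd
from networkx.readwrite import json_graph

df_objs = pd.DataFrame({'uuid': list('abcd'), 'name': ['foo', 'bar', 'baz', 'quux']})
df_calls = pd.DataFrame({'calling': ['a', 'a'], 'called': ['b', 'c']})

g = nx.DiGraph()
# (uuid, {'name': ...}) pairs become nodes with attributes
g.add_nodes_from(df_objs.set_index('uuid').to_dict(orient='index').items())
# Explicit column order keeps edges pointing from caller to callee
g.add_edges_from(df_calls[['calling', 'called']].drop_duplicates().itertuples(index=False, name=None))

# bfs_tree returns a DiGraph of the BFS tree rooted at 'a' (no node attributes)
g_tree = nx.bfs_tree(g, 'a')
# Copy the attributes of the nodes that made it into the tree
nx.set_node_attributes(g_tree, {n: g.nodes[n] for n in g_tree})

data = json_graph.tree_data(g_tree, root='a')
print(json.dumps(data))
```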
