python - Unable to generate network data in tree format while retaining node attributes
I'm trying to generate a network graph that visualises data lineage (a cluster graph such as this). Please keep in mind that I'm new to the networkx library, and the code below might be far from optimal.
My data consists of 2 pandas dataframes:
df_objs: this df contains the uuid and name of the different items (these become the nodes)
df_calls: this df contains the calling and called uuid (these uuids are references to the uuids of the items in df_objs)
Here's how I initialise the directed graph and create the nodes:

    import networkx as nx

    objs = df_objs.set_index('uuid').to_dict(orient='index')
    g = nx.DiGraph()
    for obj_id, obj_attrs in objs.items():
        g.add_node(obj_id, attr_dict=obj_attrs)

And this is how I generate the edges:

    g.add_edges_from(df_calls.drop_duplicates().to_dict(orient='split')['data'])
Next, I want to know the lineage of a single item using its uuid:

    g_tree = nx.DiGraph(nx.bfs_edges(g, 'f6e214b1bba34a01bd0c18f232d6aee2', reverse=True))
So far so good. The last step is to generate the JSON graph so that I can feed the resulting JSON file to D3.js in order to perform the visualisation:

    # create the JSON data structure
    from networkx.readwrite import json_graph
    data = json_graph.tree_data(g_tree, root='f6e214b1bba34a01bd0c18f232d6aee2')

    # write the tree to a JSON file
    import json
    with open('./tree.json', 'w') as f:
        json.dump(data, f)
All of the above works. However, instead of node names, I'm left with only the uuid in the JSON data, because the node attributes are dropped in the call to nx.bfs_edges().
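To make the attribute loss concrete, here is a minimal sketch on a toy graph (the node ids and names are illustrative, not the real data). It assumes a NetworkX version where attributes are read through g.nodes; the point is only that bfs_edges yields bare (u, v) tuples, so a graph rebuilt from them starts with empty attribute dicts.

```python
import networkx as nx

# Toy graph; the node ids and attributes here are illustrative only.
g = nx.DiGraph()
g.add_node("a", name="foo")
g.add_node("b", name="bar")
g.add_edge("a", "b")

# bfs_edges yields bare (u, v) tuples, with no attribute data attached.
edges = list(nx.bfs_edges(g, "a"))

# A graph rebuilt from those tuples has the nodes but empty attribute dicts.
g_tree = nx.DiGraph()
g_tree.add_edges_from(edges)
tree_attrs = dict(g_tree.nodes(data=True))

print(edges)
print(tree_attrs)
```

The original graph g still holds the names; only the rebuilt g_tree lacks them.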
Not a problem (at least that's what I thought); I'll just update the nodes in g_tree with the attributes from g.

    obj_names = nx.get_node_attributes(g, 'name')
    for obj_id, obj_name in obj_names.items():
        try:
            g_tree[obj_id]['name'] = obj_name
        except Exception:
            pass
Note: I can't use set_node_attributes() because g contains more nodes than g_tree, which causes a KeyError.
If I then try to generate the JSON data again:

    data = json_graph.tree_data(g_tree, root='f6e214b1bba34a01bd0c18f232d6aee2')

it throws an error:

    TypeError: G is not a tree.

This is due to number of nodes != number of edges + 1.
Before setting the attributes, the number of nodes was 81 and the number of edges was 80. After setting the attributes, the number of edges increased to 120 (the number of nodes remained the same).
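The count condition quoted above can be checked directly before calling tree_data. This is a small sketch only: looks_like_tree is a hypothetical helper name, and it tests just the nodes == edges + 1 condition from the error description, not every property of a tree.

```python
import networkx as nx

def looks_like_tree(graph):
    """Return True when the counts satisfy nodes == edges + 1."""
    return graph.number_of_nodes() == graph.number_of_edges() + 1

# Three nodes, two edges: the counts match.
g_tree = nx.DiGraph()
g_tree.add_edges_from([("a", "b"), ("a", "c")])
ok_before = looks_like_tree(g_tree)

# One extra edge and the condition no longer holds.
g_tree.add_edge("b", "c")
ok_after = looks_like_tree(g_tree)

print(ok_before, ok_after)
```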
OK, so my questions:
- Am I taking the long way around, and is there a shorter/better/faster way to generate the same result?
- What is causing the number of edges to increase when I'm only setting attributes on the nodes?
- Is there a way to retain the node attributes when trying to generate the tree?
Per the warning in the docs regarding the dict g[node]:

    Do not change the returned dict – it is part of the graph data structure and direct manipulation may leave the graph in an inconsistent state.

Thus, assignment to g_tree[obj_id] is a no-no:

    g_tree[obj_id]['name'] = obj_name
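A rough way to see why this inflates the edge count: g[node] is the dict of that node's neighbors, not its attribute dict, so writing a 'name' key into it records something that looks like an outgoing edge. The sketch below uses plain dicts as a simplified stand-in for the graph's internal structure (an assumption for illustration, not NetworkX's actual code).

```python
# Plain-dict stand-in for a directed graph (hypothetical model, not NetworkX).
node_attrs = {"a": {}, "b": {}, "c": {}}             # node -> attribute dict
succ = {"a": {"b": {}, "c": {}}, "b": {}, "c": {}}   # node -> {neighbor: edge data}

def count_out_entries(adjacency):
    """Count neighbor entries, i.e. what would be tallied as outgoing edges."""
    return sum(len(nbrs) for nbrs in adjacency.values())

edges_before = count_out_entries(succ)

# Equivalent of g_tree[obj_id]['name'] = obj_name: this writes into the
# neighbor dict, so each node gains a bogus "edge" to a neighbor called 'name'.
for obj_id, obj_name in {"a": "foo", "b": "bar", "c": "baz"}.items():
    succ[obj_id]["name"] = obj_name

edges_after = count_out_entries(succ)
nodes_after = len(node_attrs)

print(edges_before, edges_after, nodes_after)
```

The node count is unchanged and the attribute dicts are still empty, which matches the symptoms in the question: more edges, the same number of nodes, and no names in the output.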
Instead, use g.node to modify attributes:

    g_tree.node[obj_id]['name'] = obj_name
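In NetworkX 2.x and later the attribute dict is reached through g.nodes rather than g.node. A minimal sketch under that assumption, with toy data standing in for the real uuids:

```python
import networkx as nx

# Toy data; node "d" exists in g but is not reachable from "a".
g = nx.DiGraph()
g.add_nodes_from([("a", {"name": "foo"}), ("b", {"name": "bar"}), ("d", {"name": "quux"})])
g.add_edge("a", "b")

g_tree = nx.DiGraph()
g_tree.add_edges_from(nx.bfs_edges(g, "a"))

# graph.nodes[n] is the node's attribute dict, so this sets an attribute
# rather than touching the adjacency data.
for obj_id in g_tree.nodes:
    g_tree.nodes[obj_id]["name"] = g.nodes[obj_id]["name"]

print(dict(g_tree.nodes(data=True)))
```

Looping over the nodes of g_tree rather than g also sidesteps the problem of g containing nodes that the tree lacks.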
Also, once you have g_tree, you can obtain a list of the nodes in g_tree with

    In [220]: g_tree.nodes()
    Out[220]: ['a', 'c', 'b']

and you can use

    for obj_id in g_tree.nodes():
        g_tree.node[obj_id] = g.node[obj_id]

to copy the attributes from g to g_tree.
    import json
    import pandas as pd
    import networkx as nx
    from networkx.readwrite import json_graph

    df_objs = pd.DataFrame({'uuid': list('abcd'), 'name': ['foo', 'bar', 'baz', 'quux']})
    df_calls = pd.DataFrame({'calling': ['a', 'a'], 'called': ['b', 'c']})
    objs = df_objs.set_index('uuid').to_dict(orient='index')

    g = nx.DiGraph()
    g.add_nodes_from(objs.items())
    g.add_edges_from(df_calls[['calling', 'called']].drop_duplicates().values)

    g_tree = nx.DiGraph(nx.bfs_edges(g, 'a'))
    for obj_id in g_tree.nodes():
        g_tree.node[obj_id] = g.node[obj_id]

    print(g_tree.nodes(data=True))
    # [('a', {'name': 'foo'}), ('c', {'name': 'baz'}), ('b', {'name': 'bar'})]

    data = json_graph.tree_data(g_tree, root='a')
    print(json.dumps(data))
    # {"children": [{"name": "baz", "id": "c"}, {"name": "bar", "id": "b"}],
    #  "name": "foo", "id": "a"}
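On NetworkX 2.x or later, where g.node is not available, the same pipeline could be sketched roughly as follows. Plain dicts and tuples stand in for the two DataFrames here, and copying attributes with add_nodes_from is one possible approach rather than the only one.

```python
import json
import networkx as nx
from networkx.readwrite import json_graph

# Same toy data as above, as plain Python structures instead of DataFrames.
objs = {"a": {"name": "foo"}, "b": {"name": "bar"},
        "c": {"name": "baz"}, "d": {"name": "quux"}}
calls = [("a", "b"), ("a", "c")]

g = nx.DiGraph()
g.add_nodes_from(objs.items())
g.add_edges_from(calls)

# Build the BFS tree; adding the root first keeps it present even with no edges.
g_tree = nx.DiGraph()
g_tree.add_node("a")
g_tree.add_edges_from(nx.bfs_edges(g, "a"))

# Copy each tree node's attributes from g (a copy, so the dicts are not shared).
g_tree.add_nodes_from((n, dict(g.nodes[n])) for n in list(g_tree.nodes))

data = json_graph.tree_data(g_tree, root="a")
print(json.dumps(data, sort_keys=True))
```

Node 'd' never appears in the output because it is not reachable from the root, so no KeyError handling is needed.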