The only blog not featuring an ipod.

networkx gephi and graphml: missing node attributes

Are you exporting GraphML files with networkx, loading them into Gephi, and finding that some of your node attributes are missing?
Read on!

There's one dirty secret of Gephi: some attribute ids in GraphML files have reserved status.
The problem is that networkx's write_graphml function sometimes produces ids that fall into this category.
Here's the scenario:

You create a graph with networkx and store attributes on the nodes, treating each node as a dictionary.

Let G be a networkx graph object with num_nodes nodes (assumed here to be the integers 0 to num_nodes-1, since they index the matrix columns), and let attMatrix be a num_attributes x num_nodes (numpy) matrix containing the numbers you want to put on the nodes:

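    import networkx

    # Store row i of attMatrix on every node as an integer attribute named "0", "1", ...
    # (networkx 1.x spells this G.node[v]; from 2.x on it is G.nodes[v])
    for v in G.nodes():
        for i in range(num_attributes):
            G.nodes[v][str(i)] = int(attMatrix[i, v])

    networkx.write_graphml(G, "myGraph.graphml")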


It turns out that if num_attributes >= 4, there is one attribute you won't be able to use: the fourth. If you peek into the GraphML file, you will see that this attribute is assigned id="d3", which Gephi chooses to ignore.
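
To peek into the file without opening it by hand, something like the following could list which key id each attribute was given. This is only a sketch using the standard library; the function name list_graphml_keys and the SAMPLE document are made up for illustration (SAMPLE is hand-written, not real exporter output).

```python
import io
import xml.etree.ElementTree as ET

# Namespace used by GraphML documents
GRAPHML_NS = "{http://graphml.graphdrawing.org/xmlns}"

def list_graphml_keys(source):
    """Return (id, attr.name, for) for each <key> element in a GraphML file.

    `source` is a filename or a binary file object, as accepted by ET.parse.
    (Hypothetical helper, not part of networkx or Gephi.)
    """
    root = ET.parse(source).getroot()
    return [
        (key.get("id"), key.get("attr.name"), key.get("for"))
        for key in root.iter(GRAPHML_NS + "key")
    ]

# Hand-written sample, an assumption about the general shape of the file
SAMPLE = b"""<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="node" attr.name="0" attr.type="int"/>
  <key id="d3" for="node" attr.name="3" attr.type="int"/>
  <graph edgedefault="undirected">
    <node id="0"><data key="d0">5</data><data key="d3">7</data></node>
  </graph>
</graphml>"""

for key_id, name, domain in list_graphml_keys(io.BytesIO(SAMPLE)):
    print(key_id, name, domain)
```

On a real export you would pass the file name instead, e.g. list_graphml_keys("myGraph.graphml"), and look for the attribute that ended up with id d3.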

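The simplest workaround: open your GraphML file and rename the attribute key ids, replacing the "d" prefix with "dd" in both the id="d..." of the key elements and the key="d..." of the data elements (so id="d3" becomes id="dd3").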
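
If you would rather script the manual fix, here is a minimal sketch. It assumes the goal is the same rename the networkx patch described next performs ("d3" becomes "dd3"), applied to both the key definitions and the data elements that refer to them. The function name rename_key_ids is hypothetical, and the regular expression assumes the ids look like a prefix followed by digits.

```python
import re

def rename_key_ids(graphml_text, old_prefix="d", new_prefix="dd"):
    """Rename ids such as "d3" to "dd3" in <key id="..."> and <data key="...">.

    Node ids (<node id="...">) are left untouched.
    (Hypothetical helper, not part of networkx or Gephi.)
    """
    pattern = re.compile(
        r'(<key\b[^>]*?\bid="|<data\b[^>]*?\bkey=")'
        + re.escape(old_prefix)
        + r'(\d+")'
    )
    # A function replacement avoids any backreference surprises in new_prefix
    return pattern.sub(lambda m: m.group(1) + new_prefix + m.group(2), graphml_text)

# Small hand-written fragment to show the effect
fragment = (
    '<key id="d3" for="node" attr.name="3" attr.type="int"/>'
    '<node id="0"><data key="d3">7</data></node>'
)
print(rename_key_ids(fragment))
```

To use it on a file, read myGraph.graphml as text, pass the contents through rename_key_ids, and write the result to a new file before loading it into Gephi.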

The slightly better workaround: open networkx/readwrite/graphml.py and, at line 260 (the line number may differ in other networkx versions), change

    new_id = "d%i" % len(list(self.keys))

to

    new_id = "dd%i" % len(list(self.keys))

Then reload the networkx module and re-export (you may have to restart Python altogether). This makes more sense if you are going to be exporting more graphs.

There is, for sure, a better fix: modify the write_graphml function so that it accepts the id prefix ("d" or "dd") as a parameter. But that should be done by the networkx people, I guess.






Someone spoke into my ear every day of my life, slowly, softly. It said: live, live, live! It was death. (JS)