Visualizing Quality Attributes with Neo4J and VisJS

Posted by Alan Barr on Sun 22 January 2017

When I start a new software project for my own needs, I focus on solving my problem. Every software project has quality attributes, and personal projects have a short list of them. In a business, anything built to solve business needs has a much longer list. A startup is likely to ship its minimum lovable product and, more likely than not, take on technical debt as a trade-off against the future. For completely greenfield products it is difficult to predict how the software should be built to prevent future waste. For products that are being rewritten the path is clearer, and it is much easier to integrate quality attributes because the existing system already shows what is lacking.

I first heard about quality attributes from my team's architect and borrowed his book, Continuous Architecture. A theme I have seen others comment on is that architecture and quality in software are becoming a "Whole Team" effort. I found this poster and decided to use it to define the metrics we would use for our rewrite.

This poster defines a hierarchy of quality, and depending on your stakeholders and your project you would add or remove different qualities. I think the most difficult thing about this poster is that the number of metrics one can create is overwhelming. It might be easier to show each team only the realms it needs to solve for rather than the entirety of the attributes.

The tedious part was transforming the content into a standard structure. I went with XML and based it on the webdev checklist. While a bit dated, most of it was helpful, though it goes into lower-level implementation details than I wanted for this project.

The structure looks like this:

<quality>
  <category name="Capability">
    <rule name="Completeness">
      <example>All important functions wanted by end users are available.</example>
      <metric>Future features are story written and an estimated time frame is planned out.</metric>
    </rule>
  </category>
</quality>

From this format I parsed it into different formats to visualize it: first as HTML, with checkboxes next to every category and item, then as Markdown list items. Still, the list is daunting. Because the list is primarily a hierarchy, it could be displayed as a tree or as a graph: a network of connections.
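The Markdown step could look roughly like the sketch below. It is not the original code (that was written in C#). It assumes the XML has already been parsed into a plain object, and the names quality and toMarkdownList are made up for illustration.

```javascript
// Hypothetical in-memory form of the <quality> document after parsing.
const quality = {
  categories: [
    {
      name: "Capability",
      rules: [
        {
          name: "Completeness",
          example: "All important functions wanted by end users are available.",
          metric: "Future features are story written and an estimated time frame is planned out.",
        },
      ],
    },
  ],
};

// Render the category -> rule -> example/metric hierarchy as nested list items.
function toMarkdownList(doc) {
  const lines = [];
  for (const category of doc.categories) {
    lines.push(`- ${category.name}`);
    for (const rule of category.rules) {
      lines.push(`  - ${rule.name}`);
      lines.push(`    - Example: ${rule.example}`);
      lines.push(`    - Metric: ${rule.metric}`);
    }
  }
  return lines.join("\n");
}

console.log(toMarkdownList(quality));
```

Even for one rule this prints four lines, which hints at why the full list feels so long.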

Recently, my architect coworker and I had been discussing displaying more of our data in graph formats. While we might know all the systems that we work on in some fashion or another, it is helpful to be explicit about where those connections occur.

Most of my programming experience so far has been with PHP and JavaScript, working on web marketing projects with simple functionality. Now that I work at a finance company with lots of data traveling between systems, correctness, safety, and robustness have come to the fore. I understand the idea of object-oriented software but very rarely had a need for objects. This project revealed a need for objects created with default properties.

While investigating how to display graph data I tried a few different options and settled on vis.js. I had my data in an XML document, but to display the network graph I needed to convert the data into relationships. The format that vis.js needs is similar to that of other software, though the terms vary. Generally, nodes describe the items in your data set, each with a unique id, a label as a friendly name, and potentially more attributes. Edges (or links) define the relationships between the nodes. Each edge has properties that describe a from and a to (in other tools, a source and a target). Both nodes and edges are lists of objects.

"nodes" : [{id:1,label:"first"},{id:2,label:"second"}],
"edges" : [{from: 1, to: 2}]

I had already written code to transform my XML to other formats. I started creating a direct JSON version, but when I began to research importing this data into a graph database such as Neo4J, I did not want to untangle the data a second time during the import. The JSON version did help with displaying the data immediately in VisJS. The problem with documentation is that someone has to be responsible for maintaining it, and sometimes changes happen faster than the documentation process. If systems are self-describing, then documentation is more valuable and more up to date. My ultimate goal would be an interface that not only shows the graph but also uses the VisJS editing features to send updates to a graph database backend, saving changes and sharing them with the group.
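A direct JSON version like the one described above could be produced along these lines. This is only a sketch: the input shape, the "Quality" root label, the counter-based ids, and the function name toGraph are assumptions for illustration. The group values mirror the group property used later in the import.

```javascript
// Flatten a category/rule hierarchy into vis.js-style nodes and edges.
function toGraph(doc) {
  const nodes = [];
  const edges = [];
  let nextId = 1;

  // Register a node and hand back its id so children can point at it.
  const addNode = (label, group) => {
    const node = { id: nextId++, label, group };
    nodes.push(node);
    return node.id;
  };

  const rootId = addNode("Quality", "root");
  for (const category of doc.categories) {
    const categoryId = addNode(category.name, "category");
    edges.push({ from: rootId, to: categoryId });
    for (const rule of category.rules) {
      const ruleId = addNode(rule.name, "rule");
      edges.push({ from: categoryId, to: ruleId });
    }
  }
  return { nodes, edges };
}

const graph = toGraph({
  categories: [{ name: "Capability", rules: [{ name: "Completeness" }] }],
});
console.log(JSON.stringify(graph, null, 2));
```

A simple counter is enough for a single file, but it is not stable across runs, which is one reason to prefer GUIDs once the data moves into a database.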

I compared a few different choices for graph databases and went with Neo4J since it has lots of documentation and support. While reading about it I learned about its Cypher queries and how they describe relationships between the data, which is a bit more complex than just JSON. One key point in most of the documentation was that uploading data is best done with LOAD CSV.

I reused the same code that generates my HTML/Markdown/JSON, with a twist. I needed the ids for the elements to be unique, and creating new GUIDs in C# inside a bunch of loops was producing wrong data. My architect coworker suggested that I use objects to provide default properties. I begrudgingly added them, but it ended up working very well for this situation. I created a GraphNode object and a GraphEdge object; when created, each makes its own private id, while everything else is public. The only extra attribute I added was a hidden attribute so that not all data would show up in vis.js.
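The original objects were written in C#, so the following is only a JavaScript sketch of the same idea, not the actual code. The class names come from the post; the constructor arguments, the default values, and the toRecord helper are assumptions.

```javascript
const { randomUUID } = require("node:crypto");

class GraphNode {
  // Each instance generates its own id exactly once; callers cannot set it.
  #id = randomUUID();

  constructor({ label, content = "", group = "rule", hidden = false }) {
    this.label = label;
    this.content = content;
    this.group = group;
    this.hidden = hidden;
  }

  get id() {
    return this.#id;
  }

  // Flat copy including the id, convenient for serializing.
  toRecord() {
    return { id: this.#id, label: this.label, content: this.content, hidden: this.hidden, group: this.group };
  }
}

class GraphEdge {
  #id = randomUUID();

  constructor(fromNode, toNode, { hidden = false } = {}) {
    this.from = fromNode.id;
    this.to = toNode.id;
    this.hidden = hidden;
  }

  get id() {
    return this.#id;
  }

  toRecord() {
    return { id: this.#id, from: this.from, to: this.to, hidden: this.hidden };
  }
}

const root = new GraphNode({ label: "Quality", group: "root" });
const capability = new GraphNode({ label: "Capability", group: "category" });
const edge = new GraphEdge(root, capability);
console.log(edge.toRecord());
```

Because the id is created inside the object, a loop cannot accidentally reuse or overwrite it, which is the kind of bug described above.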

Once the data had its default object properties, I iterated through each node and each edge and wrote them out to their own tab-separated files. From there I could upload them to Neo4J.
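Writing the tab-separated files might look something like this sketch. The column names match the ones the import statements in this post read (id, label, content, hidden, group for nodes; id, from, to, hidden for edges). The function name toTsv is made up, and replacing tabs and newlines inside values with a space is an assumption about how to keep each record on one line. Quoting is not handled.

```javascript
const NODE_COLUMNS = ["id", "label", "content", "hidden", "group"];
const EDGE_COLUMNS = ["id", "from", "to", "hidden"];

// Serialize records to TSV text with a header row.
function toTsv(records, columns) {
  // Tabs and line breaks inside a value would break the row, so flatten them.
  const clean = (value) => String(value ?? "").replace(/[\t\r\n]+/g, " ");
  const header = columns.join("\t");
  const rows = records.map((record) =>
    columns.map((column) => clean(record[column])).join("\t")
  );
  return [header, ...rows].join("\n") + "\n";
}

const nodesTsv = toTsv(
  [{ id: "n1", label: "Capability", content: "", hidden: false, group: "category" }],
  NODE_COLUMNS
);
console.log(nodesTsv);
// In Node.js the text could then be saved with
// require("node:fs").writeFileSync("nodes.tsv", nodesTsv)
```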

One challenge was that I wanted all this data imported with the connections established. While googling around, the common pattern I saw suggested was this: when uploading the nodes, create a default label for all of them, then iterate through the different properties you really want to establish connections with, establish those, and finally remove the original label.

There were some gotchas with Neo4J. For example, files to upload are restricted to the import folder unless a setting is changed. I messed up the import more than a few times and found a suggested Cypher query for cleaning up all the data. Another problem is that, at this time, using the browser's fetch with credentials is not supported because of how fetch works with cross-origin headers, so I disabled auth for my local db.

Upload the nodes

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///nodes.tsv" AS node FIELDTERMINATOR "\t"
CREATE (n:group { id: node.id, label: node.label, content: node.content, hidden: node.hidden, group: node.group })

Create the index

CREATE INDEX ON :group(id)

Upload the links and create a connection between the two nodes, labeled group, whose ids match each edge's from and to.

USING PERIODIC COMMIT
LOAD CSV WITH HEADERS FROM "file:///edges.tsv" AS edge FIELDTERMINATOR "\t"
MATCH (a:group { id: edge.from }), (b:group { id: edge.to })
CREATE (a)-[:CONNECTS { id: edge.id, hidden: edge.hidden }]->(b)

Create new labels based on other properties. These are properties that my data already has set. Repeat for all labels.

MATCH (n:group { group: 'root' })
SET n:root
REMOVE n:group

If you need to remove edges and nodes but not indexes, use this. It deletes in batches of 10,000 nodes, so repeat it until the returned count is 0.

MATCH (a)
WITH a
LIMIT 10000
OPTIONAL MATCH (a)-[r]-()
DELETE a,r
RETURN COUNT(*)

Once all this data was in the database I could begin to query it. The Neo4J documentation has a common JavaScript example of how to query the Neo4J transaction commit endpoint to grab the data. The last step for displaying the data with vis.js came down to transforming the data that the Neo4J query returns. Based on their provided code, the endpoint gives a lot of extra data about each node and its connections, and it has to be filtered out. There was a bit of additional work to include the extra data that I had uploaded for each node and to refer to the right relationships in my graph. I wanted to use the unique GUID ids and not the index position of the edges to refer to the nodes.
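The transformation could be sketched as follows. This is not the code from the Neo4J documentation or from this project. It assumes the request asks for the "graph" result format, that each node and relationship carries the GUID in an id property (as in the import above), and that hidden arrives as text because LOAD CSV reads every field as a string. The function names and the localhost URL in the trailing comment are assumptions.

```javascript
// Request options for the transactional HTTP endpoint, asking for the
// "graph" result format so each row lists its nodes and relationships.
function buildCypherRequest(statement) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json", Accept: "application/json" },
    body: JSON.stringify({
      statements: [{ statement, resultDataContents: ["graph"] }],
    }),
  };
}

// Convert the endpoint's response into vis.js nodes and edges keyed by GUID.
function toVisData(response) {
  const nodesByInternalId = new Map();
  const edgesByInternalId = new Map();
  const rows = response.results.flatMap((result) => result.data);

  // First pass: keep only the uploaded properties, dropping Neo4J metadata.
  for (const row of rows) {
    for (const node of row.graph.nodes) {
      nodesByInternalId.set(node.id, {
        ...node.properties,
        hidden: String(node.properties.hidden).toLowerCase() === "true",
      });
    }
  }

  // Second pass: translate internal start/end ids into the GUIDs.
  for (const row of rows) {
    for (const rel of row.graph.relationships) {
      const from = nodesByInternalId.get(rel.startNode);
      const to = nodesByInternalId.get(rel.endNode);
      if (!from || !to) continue; // endpoint node was not returned
      edgesByInternalId.set(rel.id, { id: rel.properties.id, from: from.id, to: to.id });
    }
  }

  return {
    nodes: [...nodesByInternalId.values()],
    edges: [...edgesByInternalId.values()],
  };
}

// Possible usage against a local database with auth disabled:
// fetch("http://localhost:7474/db/data/transaction/commit",
//       buildCypherRequest("MATCH (n)-[r]->(m) RETURN n, r, m"))
//   .then((res) => res.json())
//   .then(toVisData);
```

Using Maps keyed by the internal id also removes the duplicates that appear when the same node is returned in several rows.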

Once all that data was in the right format I loaded it into the vis.DataSet format and drew the graph. At first it showed the whole set of data and was still overwhelming. With tweaks I was able to limit the amount of data using the hidden attribute that vis.js understands. The last feature that I wanted to implement for this interface was hiding and showing data based on the user's preferences. I did not want to use buttons at this time, only clicks on nodes in the graph.

Vis.js has many events that allow hooking into clicks, selected nodes, and other actions. With these I was able to display tooltips with the extra information I wanted to provide for nodes, show or hide nodes when a node is held long enough, and show the whole graph on a double click anywhere. The biggest challenge was hiding nodes that were not directly connected. Since my data is a hierarchy I could find all the remote edges, but if this graph connected back to itself I would probably need to rework the hide and show logic.
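One way to approach the hide and show logic is sketched below. It is not the code from this project. collectDescendants walks the edges outward from a node and keeps a set of visited ids, so it would also terminate on a graph that connects back to itself. wireToggle assumes network is a vis.Network and that nodes and edges are vis.DataSet instances; both function names and the toggle rule are made up for illustration.

```javascript
// Return every node id reachable from startId by following from -> to edges.
function collectDescendants(startId, edgeList) {
  const childrenOf = new Map();
  for (const edge of edgeList) {
    if (!childrenOf.has(edge.from)) childrenOf.set(edge.from, []);
    childrenOf.get(edge.from).push(edge.to);
  }

  const seen = new Set([startId]); // guards against cycles
  const queue = [startId];
  const found = [];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const child of childrenOf.get(current) ?? []) {
      if (seen.has(child)) continue;
      seen.add(child);
      found.push(child);
      queue.push(child);
    }
  }
  return found;
}

// Hook the traversal up to vis.js events.
function wireToggle(network, nodes, edges) {
  // Holding a node hides its descendants, or shows them if all are hidden.
  network.on("hold", (params) => {
    const nodeId = params.nodes[0];
    if (nodeId === undefined) return;
    const ids = collectDescendants(nodeId, edges.get());
    const anyVisible = ids.some((id) => !nodes.get(id).hidden);
    nodes.update(ids.map((id) => ({ id, hidden: anyVisible })));
  });

  // Double clicking anywhere shows the whole graph again.
  network.on("doubleClick", () => {
    nodes.update(nodes.getIds().map((id) => ({ id, hidden: false })));
  });
}
```

The start node itself is never included in the result, so holding a node only affects what hangs below it.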

I learned a bunch from these various projects and from getting everything working together. The practicality for an organization is still unknown. Some people found the data interesting to view, but ideally when creating visualizations one knows their audience and how to communicate complex ideas in a simple manner. The features to manipulate and save the graph are implementable, but as an end user I am not sure how much I would need this kind of functionality.

I would ultimately like to be at a place where systems describe themselves and their connections programmatically, and where I can easily update and add connections without having to learn Cypher queries and whatnot. If I need to add connections I can use a UI for that, and otherwise the data lives in the database.

tags: how-to