A series on how the decentralised XRP ledger can be used to explore relationships between wallets using a graph database. This first part focuses on graph databases versus “normal database” and my process of establishing the database.
If you have read my previous articles, you know one of my interests is to look deep in the ledger, in the transaction history and relations. I have mainly been working from a graph database perspective to explore and visualise the ledger but without an actual graph database.
A family tree
About a week ago, I had an idea to use a family tree (or organisational chart) as a way to illustrate and explore the relationships between wallets. Who made who, and when?
I used my usual approach, querying the necessary data from BigQuery, exporting it to CSV or JSON to manipulate it using node.js locally, and in this case using Graphviz to make a graphic representation:
I made an experiment last night to create family trees from $XRP wallets. So I established a relational database of all wallets (who activated who) using BigQuery. The tree below is from genesis wallet X.Distributor.1 and 5 generations down. Go explore: https://t.co/8z0XgXmWe9 pic.twitter.com/6VJdOplGPf— Thomas Silkjær (@Silkjaer) December 4, 2018
The usefulness of this representation led me to research how I could move the data I was interested in from the XRP ledger to a graph database.
What is a graph database?
The “normal” relational databases are storing data in tables, with rows and columns, where relations are expressed through identifiers or a joining table. However, a graph database is based on nodes and relations—and this makes queries cheaper because where you would have to do multiple lookups in a relational database (lookup a parent, parent-child relationships, child), in a graph database you only do one lookup (find parent and traverse relations).
So graph databases are built for working with relational data in a completely different way and makes it more suitable for applying algorithms, e.g. finding sub-communities (clustering) and shortest paths between nodes.
In the hands of really brilliant people, it can be helpful in locating suspicious network activity, fraud, money laundering and much more (e.g. look at Linkurious)
Looking for suitable databases, I came across and tested both TigerGraph and Neo4j, but eventually chose the latter: Open source, large developer community, excellent documentation and a fantastic browser. After downloading the developer version, I had a database running within a day, with more than 34 million nodes and 66 million relations and even with a neat little browser guide with query examples.
In my tests, I acquired all data from BigQuery, but I still wanted to extend the database to also contain non-XRP payments, with additional transaction metadata than available in BigQuery and make sure it could stay synchronised with the ledger in near real time.
Setting up the real thing
- Successful payments (in XRP or any IOU) with metadata such as memos, destination tags and much more.
- If a payment created a new account, create a wallet and an activation relationship.
- AccountSet transactions to catch domain names on wallets, for better reference when using the graph database to do research.
Running tests, I populated a database with data from the first 500,000 closed ledgers, but calculating the total time it would take to populate the database with the complete history, it would take several months.
What's next? Speeding up the database population, exploring what it can be used for and making it available, stay tuned for part two!