After working with BigQuery to explore the nature of wallets and payment relationships, I had a crazy idea: visualising the full XRP ledger and all wallet relationships.
A network of 1.4 million wallets, connected by 2.6 million payment relationships, all drawn together in one complex network graph. This article is a short walkthrough of how the data turns into visuals, with a few tips and tricks along the way. My computer choked on creating a graph with 50,000 wallets, so working with 1.4 million required some creativity.
Step 1: Get the wallet balance data
I wanted the graph nodes to represent wallets, sized by the current balance. However, using the transaction history alone, it is not possible to extract an exact wallet balance. So I found it elsewhere. On ledger.exposed Wietse Wind makes a complete ledger wallet export available every 15 minutes. Perfect!
Step 2: Get the payment relationships
Not knowing exactly what data I would need to make a nice graph, except for relationships, I extracted a few extra parameters. I compiled a query in BigQuery to export all distinct payment relationships (Wallet A has sent payments to Wallet B), but also included the number of payments, the sum of all payments made and the size of the largest payment made.
Here is a small BigQuery snippet I used to extract all the payment relationships. I have wrapped the main query in a WITH clause to be able to order or further query the results (e.g. by adding WHERE Amount > 1000000). If you would like to break the results down by destination tag, you can add DestinationTag to the GROUP BY clause.
If you would like to export more than 100 results, you can change or remove the LIMIT 100 clause at the end of the query.
WITH payments AS (
  SELECT
    Account AS Sender,
    Destination AS Receiver,
    COUNT(*) AS Count,
    MAX(AmountXRP) / 1000000 AS MaxPayment,
    SUM(AmountXRP) / 1000000 AS Amount
  FROM xrpledgerdata.fullhistory.transactions
  WHERE TransactionType = "Payment"
    AND TransactionResult = "tesSUCCESS"
    AND AmountXRP IS NOT NULL
    AND Destination IS NOT NULL
    AND Destination != Account
  GROUP BY Account, Destination
)
SELECT * FROM payments
LIMIT 100
Step 3: Preparing the data
No matter which library or software is used for preparing graphs (I use Gephi), the data has to be prepared correctly. A network is made up of nodes (the dots) and edges (the connecting lines).
I wrote a small node.js script to read the JSON wallet list and export it in the format I needed, with only the information I needed (wallet address and balance).
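As a rough illustration of that conversion, here is a minimal sketch, not the original script. The field names account and balance are assumptions about the wallet export, and the sample addresses are made up. The output is a CSV node table with an Id column, which is the column Gephi uses to identify nodes when importing a spreadsheet.

```javascript
// Sketch: turn a wallet list into a Gephi-style node table (CSV).
// ASSUMPTION: each wallet looks like { account: "r...", balance: 123.45 };
// the real export may use different field names.
function walletsToNodeCsv(wallets) {
  const rows = ["Id,Balance"]; // "Id" identifies the node in Gephi
  for (const wallet of wallets) {
    rows.push(`${wallet.account},${wallet.balance}`);
  }
  return rows.join("\n");
}

// Example with made-up addresses and balances:
const sampleWallets = [
  { account: "rExampleWalletA", balance: 1500.5 },
  { account: "rExampleWalletB", balance: 20 },
];
console.log(walletsToNodeCsv(sampleWallets));
```

For 1.4 million wallets it would probably be better to write each row to a file stream instead of building one large string in memory.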
I also wrote a small node.js script to run through all the payment relationships and to collate, merge and export them (merging the cases where Wallet B had also sent payments to Wallet A), together with all the extra parameters I exported.
Note that big exports from BigQuery cannot be saved directly from the browser. They have to be saved to a new BigQuery table and then exported as a text file with one JSON object per line, so you will have to read the file line by line.
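Reading such a file line by line and merging the two directions of a relationship could look roughly like the sketch below. It is an illustration under assumptions, not the original script: it assumes each line carries the columns from the query above (Sender, Receiver, Count, MaxPayment, Amount), and the function names and the file path passed to mergeFile are hypothetical.

```javascript
const fs = require("fs");
const readline = require("readline");

// Fold one exported line into the map of merged relationships.
// ASSUMPTION: the line is a JSON object with Sender, Receiver, Count,
// MaxPayment and Amount, as produced by the query above.
function addPaymentRow(edges, line) {
  if (!line.trim()) return; // skip blank lines
  const row = JSON.parse(line);
  // Sort the two addresses so that A->B and B->A end up under the same key.
  const [source, target] = [row.Sender, row.Receiver].sort();
  const key = `${source}|${target}`;
  const edge = edges.get(key) ||
    { source, target, count: 0, amount: 0, maxPayment: 0 };
  // Number() also copes with values that were exported as strings.
  edge.count += Number(row.Count);
  edge.amount += Number(row.Amount);
  edge.maxPayment = Math.max(edge.maxPayment, Number(row.MaxPayment));
  edges.set(key, edge);
}

// Stream the file so that it never has to fit in memory as one string.
async function mergeFile(path) {
  const edges = new Map();
  const lines = readline.createInterface({
    input: fs.createReadStream(path),
    crlfDelay: Infinity,
  });
  for await (const line of lines) addPaymentRow(edges, line);
  return [...edges.values()];
}
```

Calling mergeFile with the path of the exported file returns a promise for the merged list, one entry per pair of wallets regardless of payment direction.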
These processes resulted in two datasets:
- A node dataset of 1.4 million unique wallets with balances.
- An edge dataset of 2.6 million unique payment relationships with the total amount, the total number of payments and the size of the largest payment.
Step 4: Algorithms and layout
I like colourful graphs, and I find the concept of the Modularity algorithm very interesting. So I used this algorithm to partition the network, with different colours to highlight the largest partitions. From Wikipedia:
Modularity is one measure of the structure of networks or graphs. It was designed to measure the strength of division of a network into modules (also called groups, clusters or communities). Networks with high modularity have dense connections between the nodes within modules but sparse connections between nodes in different modules.
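To make the definition concrete, here is a small sketch that scores a given partition of an undirected, unweighted graph. It only computes the modularity value Q for a partition you hand it; it does not search for a good partition, which is the part Gephi's Modularity tool does for you. The function name and the toy graph are illustrative.

```javascript
// Sketch: modularity Q of a given partition (undirected, unweighted graph).
// Q = sum over communities c of [ L_c / m - (d_c / (2m))^2 ]
//   m   = total number of edges
//   L_c = number of edges with both ends inside community c
//   d_c = sum of the degrees of the nodes in community c
function modularity(edges, communityOf) {
  const m = edges.length;
  const internal = new Map(); // L_c per community
  const degree = new Map();   // d_c per community
  for (const [a, b] of edges) {
    const ca = communityOf[a];
    const cb = communityOf[b];
    degree.set(ca, (degree.get(ca) || 0) + 1);
    degree.set(cb, (degree.get(cb) || 0) + 1);
    if (ca === cb) internal.set(ca, (internal.get(ca) || 0) + 1);
  }
  let q = 0;
  for (const [c, d] of degree) {
    q += (internal.get(c) || 0) / m - (d / (2 * m)) ** 2;
  }
  return q;
}

// Toy graph: two triangles joined by a single bridge edge (c-d).
const toyEdges = [
  ["a", "b"], ["b", "c"], ["a", "c"],
  ["d", "e"], ["e", "f"], ["d", "f"],
  ["c", "d"],
];
const twoGroups = { a: 0, b: 0, c: 0, d: 1, e: 1, f: 1 };
console.log(modularity(toyEdges, twoGroups)); // 5/14, roughly 0.357
```

Putting all six nodes into a single community gives Q = 0, so the split into two triangles is the better division by this measure.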
To create an aesthetically-pleasing graph that is also usable for larger networks, I wanted to use a force-directed graph layout. From Wikipedia:
Force-directed graph drawing algorithms are a class of algorithms for drawing graphs in an aesthetically-pleasing way. Their purpose is to position the nodes of a graph in two-dimensional or three-dimensional space so that all the edges are of more or less equal length and there are as few crossing edges as possible, by assigning forces among the set of edges and the set of nodes, based on their relative positions, and then using these forces either to simulate the motion of the edges and nodes or to minimize their energy.
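The idea of "assigning forces" can be sketched in a few lines. The example below is one iteration of a simple spring-style layout with Fruchterman-Reingold-style forces. It is not Gephi's Force Atlas implementation, and the parameters k (ideal edge length) and maxMove (step cap) are illustrative choices.

```javascript
// Sketch: one iteration of a simple force-directed layout.
// NOTE: Fruchterman-Reingold-style forces, NOT Gephi's Force Atlas.
// positions: { nodeId: { x, y } }, edges: [[nodeId, nodeId], ...]
function layoutStep(positions, edges, k = 1, maxMove = 0.1) {
  const ids = Object.keys(positions);
  const force = {};
  for (const id of ids) force[id] = { x: 0, y: 0 };

  // Push node a and node b apart (sign = +1) or together (sign = -1).
  const apply = (a, b, strength, sign) => {
    const dx = positions[a].x - positions[b].x;
    const dy = positions[a].y - positions[b].y;
    const dist = Math.hypot(dx, dy) || 1e-9; // avoid division by zero
    const f = sign * strength(dist);
    force[a].x += (dx / dist) * f;
    force[a].y += (dy / dist) * f;
    force[b].x -= (dx / dist) * f;
    force[b].y -= (dy / dist) * f;
  };

  // Repulsion: every pair of nodes pushes apart with strength k^2 / distance.
  for (let i = 0; i < ids.length; i++) {
    for (let j = i + 1; j < ids.length; j++) {
      apply(ids[i], ids[j], (d) => (k * k) / d, +1);
    }
  }
  // Attraction: connected nodes pull together with strength distance^2 / k.
  for (const [a, b] of edges) {
    apply(a, b, (d) => (d * d) / k, -1);
  }

  // Move each node along its net force, capped at maxMove per iteration.
  const next = {};
  for (const id of ids) {
    const len = Math.hypot(force[id].x, force[id].y) || 1e-9;
    const move = Math.min(len, maxMove);
    next[id] = {
      x: positions[id].x + (force[id].x / len) * move,
      y: positions[id].y + (force[id].y / len) * move,
    };
  }
  return next;
}
```

Repeating this step many times lets the layout settle. The all-pairs repulsion loop grows quadratically with the number of nodes, which gives a feeling for why layouts of very large networks take so long to compute.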
However, knowing that my computer (MacBook Pro, 2.9 GHz Intel Core i7) already had issues working with a much smaller network of 50,000 nodes, I had to find a way to run the algorithm and layout without using the Gephi graphical user interface.
Using the Gephi toolkit it is possible to work with the Gephi algorithms and layouts, using Java programming. So I had to “dust off my old Java” and start programming.
First attempt: a headless script that loads the edges and nodes into a graph model, runs 1,000 iterations of a Force Atlas layout with specific settings, runs the Modularity algorithm, defines node sizes from wallet balance and saves the result in a format I could open in Gephi for further refinement. It worked, running non-stop for hours through the night and maxing out my CPUs.
However, 1,000 iterations were not enough to lay out the graph accurately. So, next attempt: reusing the current state of the layout and running 2,500 more iterations on an Amazon EC2 compute instance. Perfect solution!
Step 5: Gephi refinement
For the finishing touches, I loaded the export from the headless process and started the refinement process: adding colours to some of the communities discovered by running the Modularity algorithm, working with node sizes, edge weights and opacity settings, and finally rendering.
With a bit of patience, it is possible to do smaller adjustments on such a vast network, using the GUI.
Result: The full XRP ledger, visualised
Made on a black background, all the white stars that can be spotted through the haze of payment relationships are wallets. The 12 colours are 4 different shades of yellow, blue and purple. The radius of the “stars” identifies the wallet's size:
Also, an “XRP” themed, blue haze version with both black and white backgrounds:
And a last one with a little broader colour palette:
To make the whole exercise of wasting many CPU cycles a bit more useful, I did a render without the edges and added some data to it as well: labelling the wallets that play central roles in the graph. Included with and without data, because it also looks quite cool by itself.
What is it good for?
The goal of this visualisation from the beginning was the visualisation itself: creating a piece of art. However, it is good for more than art. Large networks are hard to comprehend when you try to grasp the mass of things by only looking at numbers. Sure, the numbers can be related to other things we have a better idea about, such as population, but a visual representation is often more helpful.
In a vast network like this, the most dominant connections always shine through. But the small sub-networks, connected by smaller amounts and made up of wallets with lower balances, also create their own patterns.
If you would like to explore it yourself, I have made an interactive version of the graph available here. Please note that not all connections are shown. However, it is an easy way to dig into the graph and find the wallets behind the nodes (or locate your own wallet in the “XRP galaxy”).
Update: I have made a zoomable high resolution version available here.