Like many of you who also hold XRP, I am often on the lookout for information about both XRP and Ripple.

I am on Twitter, where I follow a few who work at Ripple, and try to keep up with various company announcements and some of the more interesting tweets and exchanges. I watch YouTube videos about Ripple's presentations at conferences or interviews. For entertainment, I also scan a few channels by crypto enthusiasts who display some welcome humor. I confess that, in an environment rife with the rants of immature maximalists, FUDsters, and others of equally unsound persuasions, I prefer those who do not take themselves too seriously. I am on Quora, where I try to answer XRP and Ripple-related questions, on the XRPChat board, and, clearly, I am on the XRPCommunity blog, which I am finding quite useful in elevating my knowledge about a variety of 'things XRP.' For this, I want to thank the other contributors.

Again, like you, I have visited Ripple's site many times. I read Ripple Insights, and I try to learn about xCurrent, xRapid, xVia, liquidity, payment processing, decentralization, and validators. I also look at their Careers page, trying to deduce something useful about where the company may be headed from specific postings -- how many, to do what, where located, and how long they take to fill.

Of course, I also peruse data. One of the pages I was originally attracted to on Ripple's website is XRPCharts.

XRP Charts: the 30,000 ft overview

Many here are probably quite familiar with the site in question. For those who may not be, here's a bit of a recap of the site's hierarchical structure, which may not be immediately obvious.

When you land on the XRPCharts page, you can see four main tabs: Markets, Network, Accounts, and Transactions. The top level view is then:

XRPCharts - top level view.

  • At the Transactions tab, you see a real-time feed of -- you guessed it -- transactions. You can also enter a transaction hash, and get additional information.

  • Under the Accounts tab, you can use the Account Explorer. This lets you select an address and view a trustline visualization, along with information about balances and transaction history.

  • Under the Markets tab, the focus is on real-time XRP price info, a view of several markets in the US and abroad (in various currencies), trading volume, and so on.

  • Finally, we come to the Network tab, to me the most interesting. Here are its subtabs:

XRPCharts - Network tab view.

Under the subtab for Value Trends, we find a chart for payment volume, trade volume, and capitalization, all tied into a number of trading platforms and exchanges, and also click-selectable as to currency. The Historical Volume subtab displays both exchange and payments volume, again selectable as to time frame, currency, etc.

I find myself spending time on three specific Network subtabs -- Metrics, Topology, and Validators. Topology gives you a map view of nodes, weighted by uptime or connectivity. One can see the average number of connections (both inbound and outbound), the version of 'rippled' being used, and how long (in days) the longest connection has been up. The Validators subtab shows their IP address, domain, and how closely their individual validation of ledgers agrees with the consensus.

Under Metrics, we find Transactions, Ledgers, Ledger Close Interval, Payments, Exchanges, Network Fees (average and total), Transactions by Type, and Transactions by Result -- the latter with several categories indicating success or a number of types/reasons for failure or non-completion. The Metrics subtab view within the Network tab is this:

XRPCharts - Metrics subtab (under Network tab.)

Several charts have various 'configuration' levels built in, and some have legends that involve a degree of guesswork, as a screenshot of the Transactions by Result chart shows:

Metrics - Transactions by Result (screenshot.)

This is not meant as a criticism, but more as an illustration.

These data are all collected and put out by Ripple. A disclaimer at the bottom of each page states:

XRP Charts provides information based on public data.
Information is provided "as is" and solely for informational purposes only. XRP Charts is not a trading advisor. Ripple does not endorse, recommend, or make any representations with respect to the gateways and exchanges that appear on XRP Charts. Data may be delayed or incorrect. Ripple reserves the right not to include transactions in XRP Charts that it believes are not bona fide, e.g., wash sales where there is no change in beneficial ownership.

When you view all this, are you confused, or do you find it all intuitive and easy to follow? Do duplication and overlap exist among the charts? Most importantly, having viewed these charts, does a clear picture emerge for you, the stakeholder in XRP, of the current health of this digital asset (DA) and its prospects? Or, are you still dependent on the written quarterly reports for a business-focused interpretation of these data in the aggregate?

Looking for direction.

The four Vs of big data

There's little doubt that payments data, especially on a global scale, qualify as big data. Big data is an umbrella term used to denote the vast amounts of data that organizations collect, store, and analyze to arrive at business decisions.

Big data are often characterized by what are known as the 'four Vs' (#1-4 below), and, increasingly, the 'five Vs' (#5 below), to wit:

  1. Volume: amount of data collected -- 1 zettabyte = 1,000 exabytes = 1,000,000 petabytes = 1,000,000,000 terabytes = 1,000,000,000,000 gigabytes. Various forecasts of yearly Internet traffic put it at about 2 zettabytes for 2019.
  2. Variety: types of data and sources (structured, unstructured, multimedia, etc.)
  3. Velocity: how fast the data change, and what response time is needed.
  4. Veracity: how uncertain are the data collected (due to latency, misdirection, ambiguity, incompleteness, approximations in models, undocumented filtering, etc.)
  5. Value: how data and their intelligent use can impact decisions, profits, etc.
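
To put the volume numbers in perspective, here is a quick sanity check of the unit conversions in item 1, as a minimal Python sketch (the ~2 zettabytes of yearly traffic is the forecast figure quoted above):

```python
# Sanity check of the storage-unit conversions above
# (decimal SI units: each step up is a factor of 1,000).
GB = 10**9    # gigabyte, in bytes
TB = 10**12   # terabyte
PB = 10**15   # petabyte
EB = 10**18   # exabyte
ZB = 10**21   # zettabyte

assert ZB == 1_000 * EB == 10**6 * PB == 10**9 * TB == 10**12 * GB

# The ~2 ZB/year traffic forecast, spread evenly, is a sustained rate of:
rate = 2 * ZB / (365 * 24 * 3600)
print(f"about {rate:.2e} bytes per second")  # on the order of 10**13
```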

How might these categories apply to the data behind XRPCharts?

From raw data to analytics to information to insight

I've posted before about quality, defects, and processes. Good or bad as they may be, and recognized as such or not, processes go hand in hand with data.

Good processes yield useful data, that is, data that can be used for decision making. Good data allow one to redesign processes properly and act based on them with some hope of success. It is a virtuous circle of mutually reinforcing and interacting elements. On the other hand, bad data are useless, although many decisions today are still based on poor data. Bad processes reflect many ills, including corporate apathy, broken thinking, resistance to change, and a general lack of understanding of a business. Bad data and broken processes tend to go together as well.

To get to the stage where one can truly develop meaningful insights requires collecting raw data (from reliable processes), manipulating them into useful derived data, and from there via analytics into aggregate information a business can use to act upon and grow.

When I look at all the graphs on XRPCharts, I wonder about the data behind them, and what this may mean for the charts themselves. How do the 4-5 dimensions of big data apply here? Can looking at data under this lens add anything to our understanding?

The 4Vs revisited

Simplifying, on the site we have current and historical price data, transaction data, network health data, and customer/wallet data. Their volume is high and will undoubtedly increase, at rates one cannot easily forecast, once the network grows from several hundred to thousands and tens of thousands of nodes, and to a proportionate volume of transactions. The velocity is high for some data (price), perhaps less so for other data (network connectivity and node uptime).

Veracity of the data is interesting. Disclaimers notwithstanding, how accurate and reliable is the information, how complete and unambiguous, how proofed against bad actors, and how well vetted are the sources? If, indeed, these are 'public data' tagged with a number of caveats and qualifiers, do better data exist somewhere? Might it be interesting to classify incoming data as high/low veracity via machine learning, and use them accordingly in strategic planning?

As to variety, there may be many data out there which simply haven't been collected. What about all the sentiment data, derivable from Twitter, Reddit, and other social media platforms? What is being ignored that could be helpful to strategy? And is machine learning being used for data visualization, unsupervised segmentation, and finding patterns that may make higher-level analysis more robust and eventually adaptive?

A word about validators and basic stats

What type of statistical analyses are being conducted on the data? For example, are there statistically significant differences between unverified and verified (or Ripple) validators as to their respective percentages of validated (or rejected) ledgers that eventually passed (or didn't pass) consensus? What does unverified mean or imply? Do all unverified validators eventually become verified? If so, what is that process? Is geographical location of the validators a factor in their degree of agreement with consensus? Would a clustering algorithm find something other than unverified and verified categories to classify validators by? Is it possible to predict how new validators may perform based on knowledge of current ones? This would seem useful to find out for nodes in general, as 'good' network growth is central to survival. What else is at play? Given Ripple's focus on decentralization, how can one plan a validator addition strategy that, going forward, is optimal in minimizing time to achieve a target number while also not weakening the robustness of consensus?
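
To make the first of those questions concrete, here is a minimal sketch of a two-proportion z-test in Python. The counts, and the split into 'verified' and 'unverified' groups, are entirely made up for illustration; this is one standard way such a comparison could be run, not a description of anything Ripple does.

```python
import math

def two_proportion_z(k_a, n_a, k_b, n_b):
    """Two-sided two-proportion z-test: do groups A and B differ in rate?"""
    p_a, p_b = k_a / n_a, k_b / n_b
    p_pool = (k_a + k_b) / (n_a + n_b)                    # pooled proportion
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal CDF (via erf).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical counts: ledgers validated in agreement with consensus.
# Say verified validators agreed on 9,850 of 10,000 ledgers,
# and unverified validators on 9,700 of 10,000.
z, p = two_proportion_z(9850, 10_000, 9700, 10_000)
print(f"z = {z:.2f}, p-value = {p:.4g}")  # a large z flags a real difference
```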

Averages are mentioned (uptime, etc.). What about deviations from them? Might an understanding of these not yield additional insight? Surely, it is all derivable from the raw data. A lower average node uptime with a narrower deviation might be 'better' than a somewhat higher uptime with a much broader deviation, the latter indicating a process that is not well understood or under statistical control. What insight could be derived from this?
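
A toy example of that point, using made-up daily uptime figures for two hypothetical groups of nodes:

```python
import statistics

# Made-up daily uptime percentages for two hypothetical groups of nodes.
group_a = [97.0, 97.5, 96.8, 97.2, 97.1, 96.9, 97.3]       # tight spread
group_b = [100.0, 100.0, 100.0, 100.0, 88.0, 100.0, 96.0]  # wide spread

mean_a, sd_a = statistics.mean(group_a), statistics.stdev(group_a)
mean_b, sd_b = statistics.mean(group_b), statistics.stdev(group_b)

print(f"group A: mean={mean_a:.2f}%  stdev={sd_a:.2f}")
print(f"group B: mean={mean_b:.2f}%  stdev={sd_b:.2f}")

# Group B has the higher average, but its much larger deviation hints at a
# process that is not under statistical control.
assert mean_b > mean_a and sd_b > sd_a
```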

What does the big picture look like?

How are the data being put together with an understanding of current processes to get a complete picture of what goes on, which also would make future process changes sustainable? Is data governance in place to oversee all this, and to sort out the quality and utility of different data streams and to define processes for integrating data from a variety of sources and extracting maximum value from them?

Reasons for collecting data, or fitness for purpose

Another point to keep in mind is this: why does one collect data? It seems a simple enough question, yet not everyone can answer it clearly. Presumably, it is for a combination of reasons: auditing and compliance, developing a historical record as part of the intrinsic value-add of an organization, and, clearly, acting upon the data for process improvement and decision making. To act upon them means knowing what data to collect, how often and from where to collect them, and with what granularity.

I have been in situations where the customer had confidently and at great expense collected vast amounts of data over the years, storing them without giving much thought to what they would be used for. In one case, certain data were only collected and time-stamped at the day level, which later prevented useful historical analysis and an understanding of sequencing -- modeling the order of events -- at the hour or minute level. Sometimes a gargantuan effort can fix that type of problem, but not always.
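
A toy illustration of the granularity problem (the event names and times are invented):

```python
from datetime import datetime

# Two hypothetical events that actually occurred in a known order.
events = [
    ("payment_initiated", datetime(2018, 6, 1, 9, 15)),
    ("payment_settled",   datetime(2018, 6, 1, 14, 40)),  # later the same day
]

# Full timestamps preserve the ordering...
assert events[0][1] < events[1][1]

# ...but if collection had time-stamped only at day granularity, both events
# would carry identical timestamps, and the true within-day sequence could
# no longer be recovered by sorting.
truncated = [(name, ts.date()) for name, ts in events]
assert truncated[0][1] == truncated[1][1]
```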

The other side of the coin is knowing what not to collect, and where to focus one's analytic efforts, considering resources are typically limited.

In both instances, understanding fitness for purpose where data are concerned is essential and should not be considered separately from process aspects.


These are some of the questions that come to mind as I scan the XRPCharts site, although I admit I often seem to think more clearly when I'm out for a walk in nature and I can come at things from a different angle.

A different angle.

As a holder of XRP, and someone who is trying to make sense of what I see, I want to have access to the best information available that, from a business standpoint, will allow me to reach intelligent decisions as to my admittedly speculative investment. I contrast some of the information found here -- or not found here -- and its presentation with that usually brought together in the format of a balanced scorecard. I hope to address this in more detail in a coming post.

From the perspective of XRPCharts as a window of sorts into what Ripple is collecting and posting, I wonder how representative these data may be of what is used internally at the company. The data are all 'public', 'as is', and come with caveats, and it is clear they are still closer to a dump of raw time-series transactional data than to derived data or to analytic, heavily processed information. It is unclear who the intended audience may be, despite the well-intentioned effort and stated purpose, which is informational. But information means different things to different people. More clickable features do not necessarily translate to greater clarity, and what may be deemed 'informational' depends on who the viewer is, the degree to which raw data are processed, and how they are presented for a specific purpose. In that sense, I believe work remains to be done, and I hope some of these comments are helpful in drawing attention to the topic.

I will preface my last comment by saying I am not trying to leap to unwarranted conclusions. Still, given the above and the specific areas of focus of Ripple's job postings to date, I do wonder if data and processes are being managed internally in a complementary manner so as to be of maximum benefit, what emphasis is placed on data governance and analytics at this time, and what role technologies such as machine learning may play going forward.



All images © 2000-2018 Dario Boriani.