DevOps @ Fiverr – Scaling the System Monitoring Platform

Fiverr® is a global online marketplace serving millions of buyers and sellers, who generate millions of visits, messages, and transactions. For the business, and for us as engineers, it is crucial to have high-level, queryable visibility into everything that goes on at Fiverr, from big numbers such as the number of visitors at any given moment to fine technical details such as trends in the actions of our internal services.

Monitoring this marketplace and gaining real-time visibility into traffic, actions, and the user experience is one of the central tasks here at Fiverr Engineering. One of our core monitoring tools is the popular Graphite, around which we have built a custom integration using several other open source tools and a few tricks of our own. Alongside our other monitoring tools, Graphite provides part of the Fiverr picture. This post gives an overview of that integration, which plays a central role in our monitoring toolbox.

A simple installation of Graphite provides a powerful graphing engine with a frankly uninspiring user interface, many points of failure, and a complex API that lets an advanced user generate graphs for specific metrics via long, complicated URLs. Using Graphite in its basic form is best left to system administrators. However, with clustering, a few open source tools, and a little of our own development, we have turned Graphite into a tool that makes it easy to send metrics and pull data, and to build monitoring dashboards that serve many teams here at Fiverr.

Basic Graphite Frontend
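
To give a feel for the "long, complicated URLs" mentioned above, here is a minimal Python sketch that builds a request to Graphite's render API. The host name and the metric path are hypothetical placeholders, not values from our setup; `target`, `from`, `until`, and `format` are standard render API parameters.

```python
from urllib.parse import urlencode


def render_url(base_url, targets, start="-1h", until="now", fmt="json"):
    """Build a Graphite render API URL for one or more targets."""
    # Each target becomes its own repeated "target" query parameter.
    params = [("target", target) for target in targets]
    params += [("from", start), ("until", until), ("format", fmt)]
    return f"{base_url.rstrip('/')}/render?{urlencode(params)}"


# Hypothetical host and metric name, used for illustration only.
url = render_url(
    "http://graphite.example.com",
    ["summarize(stats.web.requests, '1min', 'sum')"],
)
print(url)
```

Even a single summarized series produces a URL that is awkward to write by hand, which is a large part of why a dashboard layer on top of Graphite is worth having.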
 
The first stage in our Graphite flow is statsd, a service that makes it extremely easy to send metrics to Graphite from our applications. We run two statsd listeners for redundancy, each running two instances of statsd that serve different types of metrics. Statsd passes the metrics it receives to the Graphite internals: two redundant carbon-relays, which pass the metrics to two carbon-caches, which write to two whisper databases serving two Graphite web frontends. Redundancy is very important in our Graphite integration.
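
As an illustration of how little an application needs in order to emit a metric, here is a minimal Python sketch of the statsd line format (`name:value|type`, with an optional `|@sample_rate`) sent over UDP. The metric names and the listener address are hypothetical, and the sketch says nothing about how traffic is spread across the two listeners, which this post does not describe.

```python
import socket


def format_metric(name, value, metric_type="c", sample_rate=None):
    """Format one metric in the statsd line protocol.

    metric_type is "c" (counter), "ms" (timer), "g" (gauge) or "s" (set).
    """
    line = f"{name}:{value}|{metric_type}"
    if sample_rate is not None:
        line += f"|@{sample_rate}"
    return line


def send_metric(line, address):
    """Send one formatted metric to a statsd listener over UDP.

    UDP is fire-and-forget: there is no delivery confirmation, so a
    monitoring hiccup does not slow down the calling application.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("ascii"), address)
    finally:
        sock.close()


# Hypothetical metric names, for illustration only.
print(format_metric("orders.created", 1))
print(format_metric("page.render_time", 320, "ms", sample_rate=0.1))

# Hypothetical listener address; 8125 is the default statsd port.
# send_metric(format_metric("orders.created", 1), ("statsd.example.com", 8125))
```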

As it stands now, no part of the Graphite system is a single point of failure. This has already proven itself: when an attempt to upgrade one of the frontends failed, the redundant frontend remained available until we got the faulty web application back online. So far, there is nothing particularly special about the Graphite integration as described, aside from the cluster configuration. The really cool stuff is up next.

Graphite Cluster Diagram
 
Now that we have our powerful and redundant Graphite cluster up and running, we need to take advantage of it by making it useful to as many people as possible. The first step is installing the excellent Grafana dashboard tool.

Grafana is a web application that lets the user create informative and useful dashboards quickly and easily, using Graphite as the data source. Using Grafana we have created personalized dashboards for various teams at Fiverr, and each team can easily edit its dashboard to display the data it needs. Most importantly, everyone at Fiverr can get data independently: they can turn to Grafana for crucial business information at a glance, and can also use the tool to request and display the data they personally need.

Grafana
 
The cherry on top of this monitoring system is our events management system, which is closely integrated with Graphite. A dynamic company like Fiverr needs to manage major internal and external events. Knowing when a major infrastructure replacement occurred, or when a marketing campaign started, gives better visibility into how our systems and our business respond to the things we do here at Fiverr; overlaying this data on our graphs lets us read each event directly against the metrics it affected.

This system uses two features and one application. First, the event management database is an experimental feature of Graphite, which again provides an excellent data backend but no comfortable frontend. Using simple calls to Graphite, we automatically submit events such as major application deployments to this backend. Second, Grafana can display events stored in Graphite on its dashboards through its ‘annotations’ feature. The final touch is a simple in-house application, the events submitter, which allows anyone at Fiverr to manually submit an event with details to the backend for display on the graphs.
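
A "simple call to Graphite" for recording an event can be sketched roughly as follows. This is a minimal Python illustration rather than our actual deployment hook: the host name and event text are hypothetical, and it assumes Graphite's events endpoint accepts a JSON body with `what`, `tags`, `data`, and `when` fields posted to `/events/`. Tags are sent here as a space-separated string; depending on the Graphite version a list may also be accepted, so check your installation.

```python
import json
import time
import urllib.request


def build_event(what, tags, data="", when=None):
    """Build the JSON-serializable payload for one Graphite event."""
    return {
        "what": what,    # short title shown with the annotation
        "tags": tags,    # space-separated tags, used to filter events
        "data": data,    # free-form details
        "when": int(when if when is not None else time.time()),
    }


def submit_event(graphite_url, event, timeout=5):
    """POST an event to Graphite and return the HTTP status code."""
    request = urllib.request.Request(
        f"{graphite_url.rstrip('/')}/events/",
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Hypothetical event, for illustration only.
event = build_event("Deployed web application", "deploy web", "release details go here")
print(json.dumps(event))

# Hypothetical host; uncomment to send against a real Graphite frontend.
# submit_event("http://graphite.example.com", event)
```

A deployment script could call something like `submit_event` as its last step, and a Grafana annotation query filtering on the `deploy` tag would then draw a marker on the relevant graphs.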

The Basic Graphite Events Frontend
 
The Fiverr Events Submitter
 
The final construct gives us informative and tailored dashboards that serve many teams at Fiverr, containing useful data on events, with minimal engineer intervention. And if we do say so ourselves, it’s pretty cool.

Example Event
 

The post DevOps @ Fiverr – Scaling the System Monitoring Platform appeared first on Official Fiverr Blog.
