When making a decision regarding which technology to use, we at Fiverr® take into account the following:
- Scalability – Is the technology scalable enough to support our growth?
- Rapid Development – Is the technology able to support our fast development cycle?
- Safety and Security for End Users – Is the technology capable to support our high standards of safety and security?
- Open Source – We love open source technologies.
Here is an example of how we decide which database technology to use.
When choosing the right one for the job, there are some things that should be considered. But before jumping into that, let’s take a moment to better understand the difference between a relational database and what is a NOSQL database.
Relational Database
- SQL stands for Structured Query Language, which handles and processes data in rational databases.
- Its definition was first made in 1970 by Edgar Codd of IBM.
- The data in relational databases is stored on tables with rows and columns.
- Tables have relations between them that can be fetched together by a join.
- These database schemas are predefined and normalized to prevent data duplication.
- Relational databases support atomic operations.
- ACID = Atomicity, Consistency, Isolation and Durability.
NOSQL
- NoSql does not support JOINS.
- No atomic inserts (for multi documents).
- Flexible schema.
- BASE = Basically Available, Soft state and Eventually consistent.
When choosing your database, you need to ask yourself, “What will be the most popular queries on this data?” If the answer is,“I will need to get all the data given this ID” then NoSql is the answer for you.
Let’s look a real world example. Let’s say your site has users, and those users have a profile page. That page may contain some general information about them, like their site preference, the languages they can speak, their description, their skills, etc. I would assume that 100 percent of the time, if the user or other users are accessing it’s profile page, you would want to read all that data at once, given the user id.
That would be a nice data model to use NoSql database — let’s say mongo database. That way all that data will be stored on one place on the hard drive, in one json document, and you won’t need to search for it in various places as you would have in a relational database.
A user profile is usually something that can change over time. So let’s say you would add a link to the user like articles. Now, in a NoSql database what you would do is to simply adjust you data scheme without any “ceremonies”. The data will still save in one place on the hard drive, as if you would have used a relational database, and you would probably have to add an additional column or create a different table (that would save in a different place on the hard drive) and join all this data together.
You wouldn’t be too worried about atomic operations (all or nothing save, which are not supported in NoSql), because this data, as important as it is, is not critical to keep your site running. So if I’m updating the user profile and its address (which can be stored in another model), I don’t care, relatively speaking, if one fails and the other doesn’t.
(Side note: The story gets complicated if you think you would have to query for the most liked articles on you site, and that type of query would be difficult to achieve on the NoSql data structure)
Now, let’s say that you charge our users a monthly fee to use your website. When a user gets to a payment page, you would want to create a record (let’s say a token) stating that user got there and still hasn’t paid.
When a user pays, we get a confirmation from the payment provider and we:
- Create a transaction for the user (notating that he paid).
- Create a transaction for the payment provider (Their commission).
- Create a transaction for any referrer that might send the customer your way.
- Create the subscription for him.
This is straightforward important financial data, the kind you will probably want in a relational data since the type of questions you would ask it are:
1. How many users got to payment page and didn’t pay?
(Cross-reference that with a cut around the payment provider—you might have more than one.)
2. What kinds of users make purchases and where are they from?
3. How many subscriptions I have? Above a month? Above a year?
4. How much money did I pay to this referrer plan?
These are all cross-table queries that will be close to impossible to perform using NoSql.
On top of that, once a user pays, you would want to create all of the records (1 – 4), or nothing at all, since you won’t want to pass only some of the money around, would you? Money is very important to people so you would want to make sure data integrity.
A side note: After notating these, examples, you should now better understand the considerations in selecting our databases.
Of course real world examples are not all black and white and you should analyze your needs carefully before choosing.
Let’s move on with our example. Your site has articles in it. Those articles:
- Have or don’t have videos.
- Have or don’t have pictures.
- Have read count.
- Have a category they belong to.
- Have rating on each one of them.
- Have other attributes like their language (if you site enables multi languages), or number of outside URLs.
When thinking about the type of questions you would ask a database, think about the top 10 highest rating articles in that category, which articles have videos, which have photos and videos, which articles have been read the most, and the articles with the most number of outside URLs.
Of course, all of these questions can be cross-referenced with the language they are written in (for example, “What are the top French articles in this category?”).
One solution could be is sql. You would create the articles, article_photos, article_videos and article_languages tables, connect them with a foreign key, and join those four tables when you would want to query that data.
Assuming that you pull all or nothing when you want to pull an article data (meaning that, mostly, when you show it on your site, you would want to load its images, videos, languages, author, etc.), there is a better solution.
Store all that data in a vertical JSON document (like mongoDB), access it by ID only. Now, index all that in a Search specialized DB like SOLR or ElasticSearch (which are also included in the NoSql family). That way, you can ask all your questions and get your answers very fast, and when you fetch your article from the database by it’s ID, you know that would be fast because all that data will be in one document.
Of course here, at Fiverr, we use all these technologies as engineering solutions for our fast growing marketplace.
The post How We Choose the Right Tools at Fiverr appeared first on Official Fiverr Blog.