One question I am frequently asked is: why do we offer so many different database products? From my point of view, the answer is simple: developers want their applications to have a solid architecture and to be able to scale efficiently. To do this, they need to be able to use multiple databases and data models within the same application.
A single database can rarely serve the needs of several distinct use cases. The days of the one-size-fits-all monolithic database are behind us. Today, developers build highly distributed applications on top of a multitude of purpose-built databases. They keep doing what they have always done best: breaking complex applications down into smaller pieces and then choosing the best tool for each problem. And the best tool for a job usually depends on the use case.
For decades, because relational databases were the only option available, we modeled data relationally regardless of its shape or its function in the application. Instead of the use case defining the requirements the database had to meet, it was the other way around: the database imposed a data model, and the application had to adopt it. Are relational databases purpose-built for normalized schemas and for enforcing referential integrity in the database? Of course. But the key point is that not every application or use case has a data model that fits the relational model.
As I have discussed before, one of the reasons we built DynamoDB was that Amazon was pushing the limits of what was possible with one of the leading commercial databases of its time. We could not meet the availability, scalability, and performance demands that the rapid growth of Amazon.com placed on it. When we investigated, we found that about 70% of our operations were key-value lookups, in which a single primary key was used and a single item was returned. Because these access patterns needed neither referential integrity nor transactions, we concluded that a database built on a different model would serve them better. Given the rapid growth and massive scale of Amazon.com, virtually unlimited horizontal scalability also became a key design requirement: scaling up onto ever-larger single servers was simply not an option. That is ultimately what led to DynamoDB, a nonrelational database service designed to scale beyond the limits of relational databases.
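To make the key-value argument concrete, here is a minimal sketch of why lookups by a single primary key scale out so easily: a stable hash of the key selects exactly one partition, so each read or write touches one machine and capacity grows by adding partitions. The `PartitionedKeyValueStore` class and its methods are hypothetical, written only for illustration; this is not how DynamoDB is actually implemented.

```python
import hashlib
from typing import Any, Optional


class PartitionedKeyValueStore:
    """Toy key-value store that hash-partitions items by primary key.

    Hypothetical illustration only; not DynamoDB's implementation.
    """

    def __init__(self, num_partitions: int) -> None:
        if num_partitions < 1:
            raise ValueError("num_partitions must be at least 1")
        # Each partition is an independent dict; in a real system each
        # partition would live on its own server.
        self.partitions: list[dict[str, Any]] = [{} for _ in range(num_partitions)]

    def _partition_for(self, key: str) -> dict[str, Any]:
        # A stable hash of the primary key picks exactly one partition,
        # so a lookup never needs to consult the others.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.partitions)
        return self.partitions[index]

    def put(self, key: str, item: Any) -> None:
        self._partition_for(key)[key] = item

    def get(self, key: str) -> Optional[Any]:
        # Key-value access pattern: one primary key in, at most one item out.
        return self._partition_for(key).get(key)


store = PartitionedKeyValueStore(num_partitions=4)
store.put("customer#42", {"name": "Ana", "cart_items": 3})
print(store.get("customer#42"))  # {'name': 'Ana', 'cart_items': 3}
print(store.get("customer#99"))  # None
```

The modulo placement keeps the sketch short, but changing the partition count would move most keys. Real systems typically use consistent hashing or a partition map instead, so that adding capacity relocates only a fraction of the data.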
This does not mean that relational databases have no place in modern development, or that they cannot deliver high availability, scalability, or performance. Quite the contrary: our customers have made this clear, and Amazon Aurora remains the fastest-growing service in AWS history. What we experienced at Amazon.com was a database being pushed beyond its intended purpose. That lesson is at the heart of this article: databases are built for a purpose, and matching the use case to the right database will help you build high-performance, fully functional, and highly available applications faster.
Purpose-built databases
The world is changing, and the categories of nonrelational databases continue to grow. We increasingly see customers building internet-scale applications that require a variety of data models. To meet these needs, developers today can choose among relational, key-value, document, graph-oriented, in-memory, and search databases. Each type is designed to solve a specific problem or set of problems.
Let’s take a closer look at the purpose of each type:
- Relational databases: a relational database is self-describing because it lets developers define the database schema as well as the relationships and constraints between the rows and tables that make it up. With a relational database, developers rely on the database itself, rather than on application code, to enforce the schema and preserve the referential integrity of the data. Common use cases for relational databases include web and mobile applications, enterprise applications, and online gaming. Airbnb is a great example of a customer that has built highly scalable, high-performance applications with Amazon Aurora, which gives Airbnb a fully managed, scalable, and functional service for its MySQL workloads.
- Key-value databases: these databases are highly partitionable and scale horizontally with an ease that other types of databases cannot match. Use cases such as gaming, advertising, and the Internet of Things lend themselves well to the key-value model, which calls for very low-latency reads and writes against known keys. DynamoDB is designed to deliver consistent latencies of a few milliseconds for workloads of any scale. That consistent performance is a big part of the success of the Snapchat Stories feature, which has the highest storage write volume of any feature Snapchat operates and which the company recently migrated to DynamoDB.
- Document databases: document databases are intuitive for developers because application-level data is typically represented as JSON documents. Developers can persist data using the same document format and model that they use in their application code. Tinder is one example of a customer using the flexible schema model of DynamoDB to make its developers more efficient.
- Graph-oriented databases: these databases make it easier to build and run applications that work with highly connected datasets. Typical use cases include social networks, recommendation engines, fraud detection, and knowledge graphs. Amazon Neptune is a fully managed graph database service that supports both the Property Graph and RDF (Resource Description Framework) models, giving developers a choice of two APIs: Apache TinkerPop Gremlin and RDF/SPARQL. Customers use Neptune to build knowledge graphs, make in-game recommendations, and detect fraud. Thomson Reuters, for example, uses Neptune to help its customers navigate the complex global web of tax policies and regulations.
- In-memory databases: financial services, ecommerce, websites, and apps often have use cases such as real-time leaderboards, session stores, and real-time analytics that require microsecond response times and can see large traffic spikes at any moment. For these cases we built Amazon ElastiCache, which offers Memcached and Redis for low-latency, high-throughput workloads, such as those at McDonald’s, that disk-based data stores cannot serve. Amazon DynamoDB Accelerator (DAX) is another great example of a purpose-built data store: DAX was built to make DynamoDB reads several times faster.
- Search databases: many applications emit logs to help developers find and fix problems. Amazon Elasticsearch Service (Amazon ES) is purpose-built to provide near-real-time visualization and analytics of machine-generated data by indexing, aggregating, and searching semi-structured logs and metrics. Amazon ES is also a powerful, high-performance engine for full-text search. Expedia, for example, uses more than 150 Amazon ES domains, 30 TB of data, and 30 billion documents for a range of mission-critical use cases, from operational monitoring and troubleshooting to distributed application stack tracing and pricing optimization.
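The core structure behind search engines like the one described for log search is an inverted index: every term maps to the set of documents that contain it, so a query intersects small sets instead of scanning every log line. The sketch below assumes simple lowercase tokenization and AND semantics; the `InvertedIndex` class is hypothetical and is not the Elasticsearch API.

```python
import re
from collections import defaultdict


class InvertedIndex:
    """Toy full-text index mapping each term to the IDs of documents containing it.

    Hypothetical illustration of the idea behind search engines, not Elasticsearch.
    """

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.documents: dict[int, str] = {}

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Lowercase, then split on anything that is not a letter or digit.
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id: int, text: str) -> None:
        self.documents[doc_id] = text
        for term in self.tokenize(text):
            self.postings[term].add(doc_id)

    def search(self, query: str) -> list[int]:
        # AND semantics: return only documents that contain every query term.
        terms = self.tokenize(query)
        if not terms:
            return []
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(matches)


index = InvertedIndex()
index.add(1, "ERROR payment-service timeout after 30s")
index.add(2, "INFO payment-service request completed")
index.add(3, "ERROR search-service timeout")
print(index.search("error timeout"))  # [1, 3]
print(index.search("payment error"))  # [1]
```

A production engine adds relevance scoring, compressed postings lists, and sharding across nodes, but the lookup principle is the same.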
Building applications with purpose-built databases
Developers keep building highly distributed, decoupled applications, and AWS lets them build these cloud-native applications with multiple AWS services. Take Expedia as an example. To the consumer, Expedia.com looks like a single application, but behind it are many components, each with a specific function. By breaking an application like Expedia.com into multiple purpose-built components (such as microservices, containers, and AWS Lambda functions), developers become more productive: they improve scalability and performance, reduce operational work, deploy with greater agility, and can evolve each component independently. When building applications, developers can also choose the database that best fits each use case.
To see what this looks like in practice, here is how some of our customers use different databases to build their applications:
- Airbnb uses DynamoDB to store users’ search histories for fast lookups as part of personalized search. Airbnb also uses ElastiCache to store sessions in memory for faster page rendering, and MySQL on Amazon RDS as its primary transactional database.
- Capital One uses Amazon RDS to store transaction data and manage state, Amazon Redshift to store web logs for analytics that require aggregations, and DynamoDB to store user data so that customers can access their information quickly through the Capital One app.
- Expedia built a real-time data warehouse of lodging market prices and availability for internal analysis using Aurora, Amazon Redshift, and ElastiCache. The data warehouse uses ElastiCache for Redis to perform a union of multiple data streams and a self-join over a 24-hour lookback window. It also persists the processed data directly into Aurora MySQL and Amazon Redshift, so Expedia can run both operational and analytical queries.
- Zynga migrated the Zynga Poker database from MySQL servers in its data center to DynamoDB and saw a dramatic performance improvement: queries that used to take 30 seconds now take one. Zynga also uses ElastiCache (Memcached and Redis) in place of the self-managed equivalents it previously ran for in-memory caching. Aurora’s automation and serverless scalability make it Zynga’s first choice for new services that need a relational database.
- Johnson & Johnson uses Amazon RDS, DynamoDB, and Amazon Redshift to minimize the time and effort it takes to gather and provision its data and to extract insights from it more quickly. These database services help Johnson & Johnson simplify the work of medical staff, optimize its supply chain, and accelerate the discovery of new medicines.
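Several of these architectures put an in-memory store in front of a primary database, as Airbnb does for sessions and Zynga does for caching. A common way to wire that up is the cache-aside pattern: read from memory first, fall back to the database on a miss, then populate the cache with an expiry. The sketch below assumes that pattern; it does not claim these customers implement it this way. `TTLCache`, `get_session`, and the dict standing in for the database are hypothetical, and the cache is a plain in-process stand-in rather than the ElastiCache, Redis, or Memcached API.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """Minimal in-memory cache with per-entry expiry.

    Hypothetical stand-in for a store such as Redis or Memcached.
    """

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested deterministically
        self._entries: dict[str, tuple[float, dict]] = {}

    def get(self, key: str) -> Optional[dict]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            # Expired: evict lazily when the entry is read.
            del self._entries[key]
            return None
        return value

    def set(self, key: str, value: dict) -> None:
        self._entries[key] = (self.clock() + self.ttl_seconds, value)


def get_session(session_id: str, cache: TTLCache,
                load_from_db: Callable[[str], Optional[dict]]) -> Optional[dict]:
    """Cache-aside read: memory first, then the database, then fill the cache."""
    session = cache.get(session_id)
    if session is not None:
        return session
    session = load_from_db(session_id)
    if session is not None:
        cache.set(session_id, session)
    return session


db = {"s-123": {"user": "ana"}}  # hypothetical primary store
db_reads = []


def load_from_db(session_id: str) -> Optional[dict]:
    db_reads.append(session_id)
    return db.get(session_id)


cache = TTLCache(ttl_seconds=300)
get_session("s-123", cache, load_from_db)  # miss: reads the database
get_session("s-123", cache, load_from_db)  # hit: served from memory
print(len(db_reads))  # 1
```

The expiry bounds how stale a cached session can get; writes that must be visible immediately would also need to update or invalidate the cache entry.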
Just as developers are moving away from monolithic applications, they are moving away from using a single database for every need in their applications. Instead, they use several. Relational databases remain healthy and are a good fit for many applications, but databases purpose-built for key-value, document, graph-oriented, in-memory, and search models can help you optimize functionality, performance, and scale and, most importantly, your customers’ experience. Keep building!