There has been a lot of discussion in the tech community about NoSQL databases lately.
The way that people have been talking, you would think that IBM, Oracle, SAP, and Microsoft’s relational database management systems (RDBMS) businesses are dead.
The truth is that relational databases are not going away. We have simply been given access to more data than ever, and we need new ways to store, manage, and process it.
Why Is NoSQL (or Not Only SQL) Growing?
Traditional RDBMS providers are still the kings of the operational database market. The data they hold is organized in a structured schema that simplifies transactions and reporting.
Each data type has a home and a defined relationship with other data types. Relational databases are still ideal when you have very structured data that you can control, as in your ERP system.
However, we are dealing with more and more unstructured data. As you may have experienced already, storing, managing, and analyzing that data can become a nightmare. Consider a messaging application that spans multiple mobile devices: a traditional RDBMS would be incredibly difficult to work with.
By contrast, Cassandra, a wide column NoSQL database, can provide a powerful, scalable platform.
Organizations in every industry are looking for ways to gain new insights from social media. The problem is that this data streams in from multiple sources: friends lists, post content, subjects liked, and places visited. To maximize the value of that information, it needs to be analyzed in real time. That workload could crush an RDBMS, but a graph database such as Neo4j could be ideal.
Web 2.0 organizations drove the charge. Facebook, Google, and Amazon began generating and collecting massive stores of data. They wanted the ability to do more with their data in real time. And now most organizations want to use that data as well. A study by Accenture revealed that 84% of organizations believe that big data analytics will shift the competitive landscape of their industry.
To take advantage of the data that does not fit nicely into a structured, relational model, you will need new tools. And unlike with an RDBMS, you will most likely need more than one tool, depending on the applications you are deploying and the data you are using.
Let’s take a look at the four primary NoSQL database categories and some of their use cases.
Graph NoSQL Database
We increasingly live in a connected world. All you need to do is look at the effect social media has had on news distribution. Facebook, Twitter, and LinkedIn are built on the relationships in our lives. They rely on our declared connections (the friends list). They also use additional information to infer connections based on likes, hobbies, or mutual friends (the suggested friends list).
Based on the name, you might think that relational databases would be the platform of choice when defining datasets based on relationships, but that is very misleading. RDBMS were developed to replace paper forms and reports. While they use relationships, they do this to connect one table to another. This view does not go very deep into the intricacies of the relationship.
Graph NoSQL databases, on the other hand, are based on the relationship. Everything else is secondary. Because of this, graph databases are extremely efficient at processing and analyzing relationships or potential relationships.
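To make the idea concrete, here is a minimal Python sketch of relationship-first data. It is not how Neo4j or any other graph database is implemented; the people, the `friends` structure, and the `suggest_friends` function are all made up for illustration. It simply shows how a suggested friends list can fall out of walking declared connections.

```python
from collections import Counter

# Hypothetical friend graph: each person maps to the set of declared friends.
friends = {
    "ann": {"bob", "cara"},
    "bob": {"ann", "dev"},
    "cara": {"ann", "dev", "eli"},
    "dev": {"bob", "cara"},
    "eli": {"cara"},
}

def suggest_friends(graph, person):
    """Rank people who are not yet friends by how many mutual friends they share."""
    mutual_counts = Counter()
    for friend in graph[person]:
        for candidate in graph[friend]:
            # Skip the person themselves and anyone already connected.
            if candidate != person and candidate not in graph[person]:
                mutual_counts[candidate] += 1
    # Most mutual friends first; ties are broken alphabetically.
    return sorted(mutual_counts.items(), key=lambda item: (-item[1], item[0]))

print(suggest_friends(friends, "ann"))  # [('dev', 2), ('eli', 1)]
```

A real graph database stores and indexes these connections directly, so this kind of traversal stays fast even when the graph is far too large for a simple in-memory dictionary.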
Platforms: Neo4j, OrientDB
Use Cases: Neo4j is a leader in the graph NoSQL space. On its website, the company lists these use case examples: matchmaking, network and IT operations, fraud detection, software analytics, scientific research, routing, organizational and project management, recommendations, and social networks.
Document Store NoSQL Database
As mentioned, relational databases were designed to replace paper forms and reports. They are a way to take tables of information and efficiently store that information. The data schema needs to be defined up front, as it is time consuming to change.
In a document database, the record for each item is considered a document. Document databases store data within the document itself. Unlike relational databases where you define all of the data up front, in document databases, new categories or datasets can be defined within that document. This frees you from having to define every possible dataset from the start.
Consider a product catalog for an online retailer. Relational databases have been used in these scenarios for years. Unfortunately, they are not very agile and can limit your ability to grow.
Maybe you started your business originally as an online book retailer. Your data schemas were defined around those needs. You defined title, author, ISBN, and publish date, among others. Here is one of your titles:
Title: The Hitchhiker's Guide to the Galaxy, 25th Anniversary Edition
Author: Douglas Adams
Publish Date: 08/03/2004
This model works just fine as long as you are only selling books. But perhaps you want to expand into records. You could probably do that fairly easily by making some changes or adding a couple of fields. But what happens when you begin carrying baby clothes, fine wine, hiking boots, Epsom salts, and canned unicorn meat?
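Here is a rough Python sketch of how a document model handles that growth. It is an illustration only, not MongoDB's or Couchbase's API: the `catalog` list, the `find` helper, and every product other than the book are invented for the example. The point is that each document carries its own fields, so a new product line does not require a schema change.

```python
# Hypothetical catalog: one collection, where each document defines its own fields.
catalog = [
    {
        "category": "book",
        "title": "The Hitchhiker's Guide to the Galaxy, 25th Anniversary Edition",
        "author": "Douglas Adams",
        "publish_date": "08/03/2004",
    },
    {
        "category": "hiking boots",
        "title": "Trail Boot",        # made-up product
        "sizes": [8, 9, 10],
        "waterproof": True,
    },
    {
        "category": "fine wine",
        "title": "House Red",         # made-up product
        "vintage": 2012,
    },
]

def find(collection, **criteria):
    """Return documents that contain every requested field with a matching value."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]

books = find(catalog, category="book")
waterproof_items = find(catalog, waterproof=True)
print(books[0]["author"], len(waterproof_items))
```

Documents that lack a field, such as a book with no `waterproof` entry, are simply skipped by a query on that field. Nothing has to be backfilled with empty values.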
Platforms: MongoDB, Couchbase.
Use Cases: MongoDB is a leader in the document store NoSQL space. On their website, they list these use case examples:
- Product Data Management: product catalog, inventory management, and category hierarchy
- Content Management Systems: metadata, asset management, and storing comments
- Operational Intelligence: storing log data, pre-aggregated reports, and hierarchical aggregation
Key Value Store (or Key Value) NoSQL Database
The key value store database is among the most popular of the NoSQL movement. In an RDBMS, the data structure is a defined set of tables and fields for each data type. Key value databases instead treat the data as a single collection of records, each stored under a unique key, and those records can carry many different fields.
Because the data structure is far less rigid, key value databases lend themselves to today’s agile application development processes. They also tend to store less data than a relational database does because there are no forced placeholders, such as empty fields for values a record does not have.
Smaller databases, along with increased RAM capacity, have cleared the way for in-memory versions of key value databases. These databases run datasets completely in RAM or a similar high-speed layer.
This database is often called a cache server because that is an ideal use case. As a cache server, sessions, shopping carts, or pages can be cached to greatly improve customer experience.
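As a sketch of the cache server idea, the Python class below keeps values in memory under a key and lets them expire after a time limit. It is a toy, not Redis or DynamoDB: the `TTLCache` name, its methods, and the injectable `clock` parameter are all assumptions made for this example.

```python
import time

class TTLCache:
    """A tiny in-memory key value store whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so expiry can be exercised without sleeping
        self._data = {}       # key -> (expires_at, value)

    def set(self, key, value):
        self._data[key] = (self._clock() + self._ttl, value)

    def get(self, key, default=None):
        entry = self._data.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            # The entry is stale: drop it and report a miss.
            del self._data[key]
            return default
        return value

# Example: cache a shopping cart under a session key for 30 minutes.
sessions = TTLCache(ttl_seconds=1800)
sessions.set("session:42", {"cart": ["book"]})
print(sessions.get("session:42"))
```

The application looks in the cache first and only falls back to the slower system of record on a miss, which is where the improvement in customer experience comes from.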
Platforms: Redis, DynamoDB
Use Cases: Redis is a leader in the key value NoSQL space. It is an in-memory, key value store. Some use case examples include: cache server, session store, key worker, counters, leaderboards, and job management.
By the way: Redis can also exploit some technologies from IBM that can allow it to view 40TB of flash as if it were RAM. This allows users to achieve amazing performance on larger databases than they may have thought possible.
Wide Column Store NoSQL Database
This category of database is the most similar to an RDBMS. Data is stored in a table style, with both columns and rows. Research firm Gartner actually calls this category “table-style” NoSQL. The primary differences are the large amount of data that can be stored in each column and the fact that the columns can differ across rows.
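One way to picture columns that differ across rows is the small Python sketch below. It is a simplification, not how Cassandra or Bigtable actually store data, and the `table`, `put`, and `get_row` names plus the sample users are invented for the example.

```python
from collections import defaultdict

# Row key -> {column name -> value}. Rows do not have to share the same columns.
table = defaultdict(dict)

def put(row_key, column, value):
    """Write one column value into a row, creating the row if needed."""
    table[row_key][column] = value

def get_row(row_key):
    """Return a copy of the row, or an empty dict if the row does not exist."""
    return dict(table.get(row_key, {}))

put("user:1", "name", "Ann")
put("user:1", "email", "ann@example.com")
put("user:2", "name", "Bob")
put("user:2", "last_login", "2016-01-01")

print(get_row("user:1"))
print(get_row("user:2"))
```

Here `user:1` has an `email` column and `user:2` has a `last_login` column, yet both live in the same table and neither row stores a placeholder for the column it lacks.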
These databases are designed to handle very large volumes of data while being highly available with high performance. These databases can run on clusters of many servers.
They are best suited for large databases that require more than just a single server. Two major organizations that have driven wide column database advancement are Google and Facebook.
Google uses Bigtable as the backend for many of their products including Google Earth and Google Finance. They refer to it as a distributed storage system for managing structured data. It is designed to scale to incredible sizes with petabytes of data spread across thousands of servers.
Cassandra is a wide column NoSQL database that was designed by Facebook to build on the advancements made by Bigtable. Cassandra is highly available while providing extreme read/write performance. It is also decentralized. Google has demonstrated that performance by benchmarking Cassandra on Google Compute Engine.
In that test, Google was able to achieve one million writes per second spread across 330 virtual machines with a median latency of 10.3 ms. Even after losing one-third of the environment, the cluster could maintain one million writes per second, at a higher latency.
Platforms: Cassandra, Bigtable
Use Cases: Cassandra is probably the most widely used wide column NoSQL database. Here are some use cases from its website: fraud detection, messaging, sensor data and internet of things (IoT), playlists, and web application personalization.
The Right Tool for the Job
As technology advances and our dependence on data increases, we will continually be forced to look for new ways to handle the information overload. NoSQL databases are just additional tools in the developer’s tool belt, and developers are being freed up to use the right tool for the job.
Relational database management systems are not going away. That is why NoSQL is often called Not Only SQL. Multiple data stores will continue to be used.
NoSQL databases are built on open systems. They will continue to evolve, and new distributions will emerge.
In future blogs, we will examine how NoSQL databases fit into the big data picture. We will explore their relationships with Hadoop and Spark for analytics.
The Importance of Infrastructure
NoSQL databases can help you store, access, and analyze data in high performance environments. These systems are critical to your business.
They are the foundation for your competitive differentiation and customer experience.
To maximize your NoSQL deployments, we have put together an eBook for you on how to use optimized infrastructure to get the most out of your data.