Growing data is changing the way that companies operate.
Business leaders view their data, and the ability to analyze it, as a competitive advantage. That data now arrives from many different sources, and a growing share of it is unstructured.
Traditionally, an enterprise housed most of its data in an ERP system running on a relational database management system (RDBMS) such as Oracle, DB2, or Microsoft SQL Server.
That data was highly structured and fit neatly into carefully planned rows and columns. It was relatively easy to deal with, and application performance could be forecast, managed, and optimized.
Today there are new data sources, such as the Internet of Things (IoT), mobile devices with geo-data, and social networks.
This data is no longer as predictable and clean as your structured data. It no longer fits neatly in your RDBMS.
Database and application developers are turning to new tools to work with and extract value from this data. Those tools are often NoSQL databases.
What Are NoSQL Databases?
NoSQL does not mean anti-SQL. It is short for Not Only SQL, suggesting that you do not abandon your enterprise RDBMS but add to it with NoSQL. Many NoSQL databases are open source, which helps increase adoption. There are enterprise versions available that offer improved security, reliability, scalability, and management.
Because these databases are open source, it is hard to track how many deployments are in production. However, the organizations that provide enterprise-class distributions are growing extremely quickly. MongoDB claims to be the fastest growing database ecosystem in the world. Redis Labs claims that Redis is the world’s most popular NoSQL database. And Neo4j states that it is the most used graph database.
There are many reasons why NoSQL has gained so much traction. Some of those are:
- Better flexibility – They can handle a wide array of structured and unstructured data.
- Easily adaptable – You do not have to define a complex data schema that can be difficult to change.
- Highly scalable – The database can grow with your business and data requirements.
- Incredibly fast – Many of these run in RAM, which reduces storage lag and latency.
NoSQL is a broad category of databases. There are four primary categories of NoSQL databases: graph, key value store, document store, and wide column store.
Graph Databases
Graph databases are based on connections. An easy way to picture a graph database is to picture your connections on a social network like LinkedIn. You can see how many first, second, and third level connections exist. Another use case is recommendations on your favorite eCommerce site. These systems compare your data, previous purchases, and items you’ve viewed to those of other shoppers with similar behavior. They then create suggestions for products that you might be interested in.
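To make the idea of first, second, and third level connections concrete, here is a minimal sketch in plain Python. It is not how any particular graph database works internally; the `connection_levels` function and the tiny `network` of made-up names are assumptions for illustration only. It uses a breadth-first search to label each person with the number of hops from a starting member.

```python
from collections import deque


def connection_levels(graph, start, max_level=3):
    """Return {person: level} for everyone within max_level hops of start."""
    levels = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        # Do not expand people who already sit at the outermost level.
        if levels[person] == max_level:
            continue
        for neighbor in graph.get(person, ()):
            if neighbor not in levels:
                levels[neighbor] = levels[person] + 1
                queue.append(neighbor)
    del levels[start]  # the starting member is not their own connection
    return levels


# Hypothetical social network: each name maps to its direct connections.
network = {
    "ana": ["ben", "cara"],
    "ben": ["ana", "dev"],
    "cara": ["ana", "dev"],
    "dev": ["ben", "cara", "eli"],
    "eli": ["dev", "fay"],
    "fay": ["eli"],
}

levels = connection_levels(network, "ana")
print(levels)
```

In this sketch, "fay" is four hops away from "ana" and so falls outside the three levels that are reported. A graph database answers this kind of question natively, without the application walking the connections itself.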
Neo4j is, by its own account, the most used graph database. Development began in 2000, but it was not released to the open source community until 2010. Since then, the company has been very devoted to that community. It recently released its Cypher graph query language to the open source community for use in other graph databases.
Key Value Store Databases
Key value store databases are often the entry point for developers exploring NoSQL. These databases treat the entire data set as a single collection of keys, each of which maps to a value that can hold many different fields and records. They originally rose in popularity as cache databases. When a lookup goes straight to a known key, they can be incredibly fast.
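The cache use case can be sketched with an in-memory dictionary standing in for the key value store. This is an illustration under stated assumptions, not the API of Redis or any other product: `KeyValueCache`, `slow_lookup`, and `get_user` are hypothetical names, and the expiry time is an arbitrary choice.

```python
import time


class KeyValueCache:
    """Toy key value store: every value is found by its key, with an expiry."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}  # key -> (expires_at, value)

    def set(self, key, value):
        self._items[key] = (self._clock() + self._ttl, value)

    def get(self, key, default=None):
        entry = self._items.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired entries are dropped on read
            return default
        return value


def slow_lookup(user_id):
    """Stand-in for a slower query against the system of record."""
    return {"id": user_id, "name": f"user-{user_id}"}


def get_user(cache, user_id, backend_calls):
    """Check the cache first and fall back to the slow lookup on a miss."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    backend_calls.append(user_id)  # record that the slow path was taken
    value = slow_lookup(user_id)
    cache.set(key, value)
    return value


cache = KeyValueCache(ttl_seconds=30)
backend_calls = []
first = get_user(cache, 42, backend_calls)
second = get_user(cache, 42, backend_calls)
print(first, second, backend_calls)
```

The second call is served from the cache, so the slow lookup runs only once. That is the pattern that made these databases popular in front of slower systems.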
Redis is one of the most popular databases in this category. Redis, along with several other NoSQL databases, operates in RAM. As data sets grow, unique hardware challenges can arise. It can get very expensive to run a 40 TB database in RAM. That is why infrastructure managers are turning to solutions such as IBM’s, which allows you to extend RAM to flash storage through its CAPI I/O.
Document Store Databases
Unlike relational databases where you have to define the data schema up front, document databases store data within individual documents. Documents can contain many different data types and can even contain additional nested documents. This allows your developers to be more agile. They can incorporate new data or modify how they work with data mid-stream without completely rebuilding the database.
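A rough sketch of what schema flexibility looks like in practice follows. It uses plain Python dictionaries as stand-ins for documents and is not the query API of MongoDB or any other document database; the `orders` data, the `find` helper, and its dotted-path convention are assumptions made for illustration.

```python
_MISSING = object()  # sentinel for "this document has no such field"

# Each document carries its own structure, including nested documents.
orders = [
    {
        "_id": 1,
        "customer": {"name": "Ana", "city": "Austin"},
        "items": [{"sku": "A1", "qty": 2}],
    },
    {
        "_id": 2,
        "customer": {"name": "Ben", "city": "Boston"},
        "items": [{"sku": "B7", "qty": 1}],
        "gift_note": "Happy birthday",  # only this document has the field
    },
]


def find(collection, path, expected):
    """Return documents whose value at a dotted path equals expected."""
    matches = []
    for doc in collection:
        value = doc
        for part in path.split("."):
            if isinstance(value, dict) and part in value:
                value = value[part]
            else:
                value = _MISSING
                break
        if value is not _MISSING and value == expected:
            matches.append(doc)
    return matches


# A new field is introduced mid-stream; older documents are left untouched.
orders.append(
    {"_id": 3, "customer": {"name": "Cy", "city": "Boston"}, "loyalty_tier": "gold"}
)

boston_ids = [doc["_id"] for doc in find(orders, "customer.city", "Boston")]
gold_ids = [doc["_id"] for doc in find(orders, "loyalty_tier", "gold")]
print(boston_ids, gold_ids)
```

The third order adds a `loyalty_tier` field without any migration of the first two, which is the kind of change that would require altering a table definition in an RDBMS.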
One of the most popular document store databases is MongoDB. Like Redis, MongoDB leans heavily on memory, and it performs best when the data it is actively using fits in RAM. As these databases increase in popularity and move toward mission-critical applications, it will be important for you to plan for that large increase in RAM requirements.
Wide Column Store Databases
Wide column store databases can be similar to an RDBMS. They store data in rows and columns resembling a traditional data table. The difference is that wide column store databases can store incredibly large amounts of data in each column. The type of data stored in each column can also differ across rows.
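The row-and-column layout described above can be sketched as a dictionary of rows, where each row holds only the columns it actually uses. This is a simplified model under stated assumptions, not the storage engine or API of any real wide column database; `WideColumnTable` and the sensor data are hypothetical.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy wide column table: each row key owns its own set of columns."""

    def __init__(self):
        self._rows = defaultdict(dict)  # row key -> {column name: value}

    def put(self, row_key, column, value):
        self._rows[row_key][column] = value

    def get_row(self, row_key):
        # Return a copy so callers cannot modify the stored row by accident.
        return dict(self._rows.get(row_key, {}))

    def columns(self):
        """All column names used by at least one row, sorted by name."""
        return sorted({name for row in self._rows.values() for name in row})


table = WideColumnTable()
table.put("sensor-1", "temp_c", 21.5)
table.put("sensor-1", "humidity", 40)
table.put("sensor-2", "temp_c", "offline")  # same column, different data type
table.put("sensor-2", "firmware", "2.1")    # column that sensor-1 never uses

print(table.get_row("sensor-1"))
print(table.get_row("sensor-2"))
print(table.columns())
```

Here the `temp_c` column holds a number in one row and a string in another, and `firmware` exists only where it was written, so rows never pay for columns they do not use.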
These databases lend themselves to large clusters of servers with massive databases. In a clustered environment they can achieve incredible performance with extremely high availability. Google and Facebook are two organizations that use and improve wide column databases.
What Does This Mean for the Hardware Team?
We know data is growing, and extracting value from that data is a business imperative. The tools that do this are changing the way we look at infrastructure. Commodity, scale-out servers could prove to be too expensive in this new environment.
IBM has invested heavily in the NoSQL community. It works closely with independent software vendors to optimize their solutions on IBM Power Systems running Linux. Redis Labs sees the benefits in increased performance and provides a detailed solution overview on its site.