In this article, I will establish a base level of understanding about data and database technology and describe how they relate to each other. Then, I will discuss the many complete data management solutions that are available and optimized for Linux on IBM Power Systems.
Before diving into these topics, it is important to understand the two types of data most commonly encountered, stored, retrieved, and processed by businesses and individuals.
Structured Data and Unstructured Data
1) Structured Data
Data that resides in a fixed field within a record or file is called structured data. Examples of structured data include entries in a table containing individual employee information (e.g., first name, last name, address, title, date of birth), data in spreadsheets (issue, owner, next action, date), or other types of data that can be described and then stored in a fixed, relatively static format.
Structured data also includes definitions or rules around what fields of data will be stored and how that data will be stored: data type (numeric, currency, alphabetic, name, date, address) and any restrictions on the data input (number of characters; restricted to certain terms such as Mr., Ms. or Dr.; Male or Female).
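To make the idea of field definitions and input restrictions concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and the 30-character limit are hypothetical; the allowed titles come from the example above. A record that breaks one of the rules is rejected rather than stored.

```python
import sqlite3

# Hypothetical employee table. Column names and limits are illustrative
# assumptions. SQLite uses loose type affinity, so the NOT NULL and CHECK
# constraints are what actually enforce the input rules here.
SCHEMA = """
CREATE TABLE employee (
    employee_id   INTEGER PRIMARY KEY,
    title         TEXT NOT NULL CHECK (title IN ('Mr.', 'Ms.', 'Dr.')),
    first_name    TEXT NOT NULL CHECK (length(first_name) <= 30),
    last_name     TEXT NOT NULL CHECK (length(last_name) <= 30),
    address       TEXT,
    date_of_birth TEXT
)
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database containing the employee table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    return conn


def add_employee(conn, title, first, last, address, dob) -> bool:
    """Insert one record; return False if a constraint rejects it."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO employee"
                " (title, first_name, last_name, address, date_of_birth)"
                " VALUES (?, ?, ?, ?, ?)",
                (title, first, last, address, dob),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = make_db()
    print(add_employee(db, "Dr.", "Jane", "Smith", "1 Main St", "1980-04-01"))  # True
    print(add_employee(db, "Sir", "John", "Smith", None, "1975-09-12"))  # False
```

The rules live in the table definition itself, so every application that writes to the table is held to the same format.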
2) Unstructured Data
Unstructured data refers to information that either does not have a pre-defined data model or can’t be organized in a pre-defined, static, or consistent manner. Examples of unstructured data are pictures, videos, movies, sensor data, sound recordings, and streams of weather data.
While many Fortune 5000 businesses are well-versed in storing, retrieving, processing, and analyzing structured data, unstructured data accounts for by far the largest volume of data being generated and processed, as well as the biggest growth.
Storing Data with Relational Database Management Systems
Relational Database Management Systems (RDBMSs) store data in tables and use a relational model for the data. Applications, developers, and programmers retrieve data using Structured Query Language (SQL). Relational databases are often visualized as very large tables consisting of row after row of unique, structured records accessed by primary and secondary keys.
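As a rough illustration of tables, SQL queries, and keys, the sketch below uses Python's built-in sqlite3 module (an embedded database chosen only because it ships with Python, not one of the products discussed here). The employee table, its rows, and the index name are hypothetical. A primary-key lookup returns at most one row, while a lookup through the secondary index on last_name may return several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Primary key: uniquely identifies each row in the table.
conn.execute(
    "CREATE TABLE employee ("
    " employee_id INTEGER PRIMARY KEY,"
    " first_name TEXT NOT NULL,"
    " last_name TEXT NOT NULL,"
    " title TEXT)"
)
# Secondary index: speeds up lookups on a non-unique column.
conn.execute("CREATE INDEX idx_employee_last_name ON employee (last_name)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        (1, "Jane", "Smith", "Dr."),
        (2, "Maria", "Garcia", "Ms."),
        (3, "John", "Smith", "Mr."),
    ],
)
conn.commit()


def find_by_id(employee_id):
    """Primary-key lookup: at most one row."""
    return conn.execute(
        "SELECT first_name, last_name FROM employee WHERE employee_id = ?",
        (employee_id,),
    ).fetchone()


def find_by_last_name(last_name):
    """Secondary-key lookup: zero or more rows."""
    return conn.execute(
        "SELECT employee_id, first_name FROM employee"
        " WHERE last_name = ? ORDER BY employee_id",
        (last_name,),
    ).fetchall()


if __name__ == "__main__":
    print(find_by_id(2))  # ('Maria', 'Garcia')
    print(find_by_last_name("Smith"))  # [(1, 'Jane'), (3, 'John')]
```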
The mathematical relational model on which RDBMSs are based, combined with decades of research and development by a massive ecosystem of partners and customers, has produced a set of efficient and reliable commercial and open source products and solutions. A large community of database administrators with RDBMS expertise works alongside experienced programmers and developers who use SQL. Together, these factors make RDBMSs well suited to applications that require the highest levels of performance and transactional integrity.
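Transactional integrity means, among other things, that a group of changes is applied completely or not at all. The following sketch, again using Python's sqlite3 module with a hypothetical account table, moves funds between two accounts inside one transaction. If the debit would violate the no-negative-balance rule, the credit that already ran is rolled back as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical accounts table; the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE account ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(src, dst, amount):
    """Move amount from src to dst as one atomic transaction."""
    try:
        with conn:  # commit if both updates succeed, otherwise roll back both
            # The credit runs first, so a failing debit must undo it.
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


if __name__ == "__main__":
    print(transfer("A", "B", 30), balances())  # True {'A': 70, 'B': 80}
    print(transfer("A", "B", 500), balances())  # False {'A': 70, 'B': 80}
```

Ordering the credit before the debit is deliberate: it forces the failure case to exercise the rollback instead of failing before any change is made.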
Relational Database Management Systems are available in two forms. The first is commercial, which comprises proprietary products from software vendors (for example, IBM's DB2 or Oracle's Oracle DB). The other is Open Source, which includes products such as MariaDB, EnterpriseDB, and Postgres. Newer database technology, like that found in SAP's HANA product, stores and retrieves data in a columnar format.
One downside of RDBMSs is that they can be very expensive to purchase and maintain. They are also not optimized to store and retrieve many of the types of unstructured data commonly found in today's business environments.
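The difference between row-oriented and columnar storage can be sketched with plain Python data structures. This only models the layout under stated assumptions (hypothetical sales records, every record sharing the same fields); it is not how SAP HANA or any other product is implemented. In a real columnar engine, the values of one column sit next to each other in memory or on disk, so an aggregate over one column reads far less data than scanning whole rows.

```python
# Hypothetical sales records used only to contrast two physical layouts.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120},
    {"order_id": 2, "region": "US", "amount": 80},
    {"order_id": 3, "region": "EU", "amount": 200},
]


def to_columns(records):
    """Pivot row-oriented records into one list per column (columnar layout).

    Assumes every record has the same fields, as structured data does.
    """
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


def total_amount_row_store(records):
    # Row layout: every whole record is visited to read one field.
    return sum(record["amount"] for record in records)


def total_amount_column_store(columns):
    # Column layout: only the contiguous "amount" values are read.
    return sum(columns["amount"])


if __name__ == "__main__":
    columns = to_columns(rows)
    print(columns["region"])  # ['EU', 'US', 'EU']
    print(total_amount_row_store(rows), total_amount_column_store(columns))  # 400 400
```

Both layouts give the same answer; the columnar one simply touches less unrelated data, which is where its speed advantage for analytic queries comes from.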
The NoSQL Approach
The NoSQL approach stores and retrieves data modeled in ways other than the tabular relations used in relational databases. NoSQL databases are optimized to store and retrieve unstructured data, and they do not rely solely on SQL for queries. By design, NoSQL databases and management systems do not use relations and generally do not require a fixed schema (they are often described as schema-less). Unlike RDBMSs, which all share the relational model, they are not based on a single model; depending on its target functionality, each database adopts a different approach to storing and retrieving data.
Many NoSQL databases are offered both as Open Source variants and as commercially supported versions with more advanced features.
NoSQL databases offer additional, specialized functionality that tends to be optimized for specific use cases, and they have a newer, different application ecosystem than traditional Relational Database Management Systems.
The NoSQL approach has become incredibly popular with global clients for a wide variety of use cases. NoSQL databases are used to store and retrieve documents, videos, voice recordings, and streaming data, to hold key-value pairs for short periods as a cache, and to store complex graphs. They can also be easier to use, because they allow end users to interact with and analyze data without deep skills in SQL or database administration.
Google, Facebook, Amazon, Twitter, LinkedIn, most cloud providers, and a huge number of other customers see value in deploying multiple types of NoSQL solutions to meet different needs and use cases. NoSQL databases are often grouped into a taxonomy based on what they are typically used for. Here is one example of such a taxonomy, with a representative database for each category:
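The schema-less idea can be illustrated with a minimal, hypothetical in-memory document collection in Python. The insert and find helpers below are invented for illustration and are not the API of MongoDB or any other product; they only show that documents in one collection can carry different fields and still be queried together.

```python
# A hypothetical in-memory "collection": documents are plain dictionaries
# and, unlike rows in a relational table, need not share the same fields.
collection = [
    {"_id": 1, "type": "photo", "owner": "jane", "tags": ["beach"], "width": 4000},
    {"_id": 2, "type": "video", "owner": "john", "duration_s": 95},
    {"_id": 3, "type": "photo", "owner": "john", "tags": ["city"]},
]


def insert(coll, doc):
    """Add a document as-is; no schema change is needed for new fields."""
    coll.append(doc)


def find(coll, **criteria):
    """Return documents whose fields equal every criterion.

    A document that lacks a queried field simply does not match.
    """
    return [
        doc
        for doc in coll
        if all(field in doc and doc[field] == value for field, value in criteria.items())
    ]


if __name__ == "__main__":
    insert(collection, {"_id": 4, "type": "sensor", "owner": "jane", "celsius": 21.5})
    print([d["_id"] for d in find(collection, owner="jane")])  # [1, 4]
    print([d["_id"] for d in find(collection, type="photo", owner="john")])  # [3]
```

Adding the sensor reading required no ALTER TABLE-style step; the trade-off is that the application, rather than the database, must cope with fields that may or may not be present.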
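Caching with short-lived key-value pairs usually relies on a time-to-live (TTL) attached to each key; Redis, for example, supports per-key expiration. The class below is a hypothetical, single-process sketch of that behavior, not a Redis client: each entry stores an expiry time and is treated as missing once that time has passed. The clock is injectable so expiry can be demonstrated without waiting.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """A minimal in-memory key-value cache whose entries expire after a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested deterministically
        self._data: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._data[key] = (value, self._clock() + ttl_seconds)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict the expired entry on read
            return None
        return value


if __name__ == "__main__":
    now = [0.0]
    cache = TTLCache(clock=lambda: now[0])
    cache.set("session:42", {"user": "jane"}, ttl_seconds=30)
    print(cache.get("session:42"))  # {'user': 'jane'}
    now[0] = 31.0
    print(cache.get("session:42"))  # None
```

Lazy eviction on read keeps the sketch short; production caches typically also purge expired keys in the background so memory is reclaimed even for keys that are never read again.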
- Key/Value Based (ex: Redis)
- Column Based (ex: Cassandra)
- Document Based (ex: MongoDB)
- Graph Based (ex: Neo4j)
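Graph-based databases treat relationships as first-class data and answer questions by traversing them. The sketch below models a hypothetical "follows" graph as a Python adjacency list and runs a breadth-first search to find everyone within a given number of hops (for two hops, a "friends of friends" query). It illustrates the traversal only; it does not reflect Neo4j's query language or storage engine.

```python
from collections import deque

# Hypothetical "follows" graph stored as an adjacency list: node -> neighbors.
FOLLOWS = {
    "ann": ["bob", "cara"],
    "bob": ["dan"],
    "cara": ["dan", "eve"],
    "dan": [],
    "eve": ["ann"],
}


def within_hops(graph, start, max_hops):
    """Breadth-first search returning {node: distance} for nodes 1..max_hops away."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, []):
            if neighbor not in distance:  # first visit is the shortest path in BFS
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance


if __name__ == "__main__":
    print(within_hops(FOLLOWS, "ann", 2))  # {'bob': 1, 'cara': 1, 'dan': 2, 'eve': 2}
```

In a relational schema the same question would typically need one self-join per hop, which is why multi-hop queries are a common motivation for graph databases.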
Benefits of NoSQL Databases
Traditional Relational Database Management Systems are optimized to store and retrieve structured data with defined attributes. They continue to excel in high-volume transaction environments that require the highest levels of transactional integrity. When an issue arises, it is generally addressed by the RDBMS provider.
With their advanced functionality and application ecosystems, NoSQL databases are being deployed pervasively by most cloud providers, next-gen IT companies, and traditional Fortune 5000 clients. Many large enterprise clients run a large number of NoSQL databases while also adding applications and capacity to their traditional RDBMS-based databases. Newer technologies, such as Apache Hadoop and Spark, focus on managing and mining the very large amounts of unstructured data these companies collect.
NoSQL is not a replacement for SQL; it is an alternative for workloads that are not well suited to traditional RDBMSs.
Cost is a key factor for companies (including ISVs and start-ups) when choosing a database. Open source SQL databases can be made to scale and perform extremely well, and many companies prefer open source technologies over commercial operating systems and RDBMSs. They can access entire software and hardware solution stacks (including their NoSQL and/or SQL databases) on-premises or in the cloud. Additionally, open source technologies are often supported by a large ecosystem of contributors who release new versions of the software quickly.
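Hadoop's MapReduce model splits a job into a map step that emits key-value pairs, a shuffle that groups them by key, and a reduce step that combines each group; Spark offers similar map and reduce operations. The single-process Python sketch below walks through those phases for a word count over a few hypothetical log lines. In Hadoop or Spark the same phases run in parallel across many nodes, which this sketch does not attempt.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(document: str) -> List[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group the intermediate values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    intermediate = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    logs = ["sensor offline", "Sensor online", "sensor offline again"]
    print(word_count(logs))  # {'sensor': 3, 'offline': 2, 'online': 1, 'again': 1}
```

Because each map call looks at only one document and each reduce call at only one key, both phases can be distributed independently, which is what lets these frameworks scale to very large data sets.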
Enhancing Database Performance with IBM Power Systems
The good news is that a diverse range of both types of solutions is available and optimized on IBM Power Systems. IBM POWER8 is an ideal processor for Linux: it is optimized for cloud and big data environments and supports a vast ecosystem of Open Source, ISV, and IBM software products. IBM DB2, Oracle DB, SAP HANA, MariaDB, MongoDB, Redis, EnterpriseDB, Cassandra, Neo4j, and many other databases are optimized for performance, scalability, and lower TCO on IBM Power Systems.
IBM Power Systems gives users a single, industry-leading open architecture where they can derive value from the gold mine of next-generation applications and data.
The waitless world is one in which the explosive growth of both structured and unstructured data from multiple sources requires businesses to derive insights faster than ever to remain competitive. IBM delivers dramatic cost reductions for large NoSQL databases and for traditional RDBMSs from many vendors.