What Is a Graph Database?
Everything around us is connected in one way or another. How these things, people, and places are connected can provide an incredible amount of insight when those relationships are understood. That is the basis for graph databases — relationships.
Relational database management systems (RDBMS), by contrast, are built around tables of rows and columns rather than around the relationships themselves. Related pieces of data are linked through foreign keys and join tables, and the connections have to be reconstructed with JOIN operations at query time.
A graph database at its core is made up of nodes and the relationships that connect them. A node can represent almost anything. For instance, if you are trying to understand traffic flow, the nodes could be the roads, traffic lights, traffic jams, and railroad crossings. In a graph database for a hotel, the nodes could be the individual properties, the rooms, and calendar availability.
What Does a Graph Database Look Like?
The easiest graph database to visualize is a social network. My user profile is a single node under the name Steve. I add a friend, Jasmine, who is another node, and then a relationship is established. We can begin to visualize all of the other nodes that are connected. The connections extend rapidly as we click the like button on similar items. All of those brands, pictures, books, songs, or posts that we interact with are also nodes.
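The social network described above can be sketched in a few lines of Python. This is only an illustration of the node-and-relationship idea, not how any particular graph database stores data; the `Graph` class, the relationship names `FRIEND` and `LIKES`, and the song node are all assumptions made up for the example.

```python
class Graph:
    """A tiny in-memory graph: nodes with properties plus typed, directed relationships."""

    def __init__(self):
        self.nodes = {}   # node id -> dict of properties
        self.edges = {}   # node id -> list of (relationship type, target id)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.edges.setdefault(node_id, [])

    def relate(self, source, rel_type, target):
        self.edges[source].append((rel_type, target))

    def neighbors(self, node_id, rel_type):
        # Follow only relationships of the requested type.
        return [target for rel, target in self.edges.get(node_id, []) if rel == rel_type]


def shared_likes(graph, a, b):
    """Items that both people have liked."""
    return sorted(set(graph.neighbors(a, "LIKES")) & set(graph.neighbors(b, "LIKES")))


graph = Graph()
graph.add_node("steve", label="Person", name="Steve")
graph.add_node("jasmine", label="Person", name="Jasmine")
graph.add_node("song_1", label="Song", title="Hypothetical Song")

# Friendship goes both ways, so it is stored as two directed relationships.
graph.relate("steve", "FRIEND", "jasmine")
graph.relate("jasmine", "FRIEND", "steve")
graph.relate("steve", "LIKES", "song_1")
graph.relate("jasmine", "LIKES", "song_1")

print(shared_likes(graph, "steve", "jasmine"))
```

Every like, follow, or friendship simply adds another relationship, which is why the graph grows so quickly.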
These relationships can create a very powerful tool to understand me as an individual beyond the basic demographics that I entered such as my age, gender, and relationship status. But the uses go well beyond social media.
- Google’s ad platform is based on a graph of web pages, users, and interconnections.
- PayPal creates a graph of peer-to-peer transactions.
Graph projects are currently underway across many industries to address many different types of challenges. These industries include finance, healthcare, retail, social media, manufacturing, logistics, oil and gas, and any other area where the relationships between multiple nodes are important.
Additional Use Cases for Graph Databases
Real-time Recommendations: The relationships between people who have viewed and bought similar items are an excellent fit for graph databases. If I view the same item that you viewed and eventually bought (a bicycle), there is a good chance that I will be interested in related items that you purchased as well (a helmet).
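As a rough sketch of that bicycle-and-helmet logic, the snippet below walks from the items a shopper has viewed, to the people who bought them, to the other things those people bought. The shopper names, the sample data, and the `recommend` function are hypothetical, and a real recommendation engine would weigh far more signals.

```python
from collections import Counter

# Hypothetical sample data: who bought what, and what I have viewed so far.
purchases = {
    "you": {"bicycle", "helmet"},
    "other_shopper": {"bicycle", "helmet", "water bottle"},
    "someone_else": {"tent"},
}
viewed = {"me": {"bicycle"}}


def recommend(user, viewed, purchases):
    """Suggest items bought by people who purchased something the user viewed."""
    seen = viewed.get(user, set())
    scores = Counter()
    for buyer, items in purchases.items():
        if buyer == user:
            continue
        overlap = seen & items
        if not overlap:
            continue  # this buyer shares nothing with the user
        for item in items - seen:
            scores[item] += len(overlap)
    # Highest score first; ties broken alphabetically for a stable order.
    return [item for item, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


print(recommend("me", viewed, purchases))
```

With this sample data the helmet ranks first because two bicycle buyers also bought one.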
Fraud Detection and Prevention: A typical fraud ring consists of two or more people. They share a subset of legitimate information such as phone numbers or addresses. They open new accounts with fictional identities. They make regular purchases and payments as their credit increases. Then in a coordinated effort, they “bust out” by maxing out all of their credit lines and vanishing.
To detect a fraud ring, checks can run on relationships at key points, such as when an account is created, during an investigation, or when credit balances change. Used alongside existing fraud detection, a graph database can help investigators understand the relationships and possibly catch a ring before or during the “bust out” moment.
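One simplified way to picture such a relationship check: treat accounts as nodes, link any two accounts that share a phone number or address, and look for connected groups. The account records and the `find_rings` helper below are invented for illustration, and shared details alone do not prove fraud, so a result like this would only be a starting point for review.

```python
from collections import defaultdict

# Hypothetical account records; the values are made up for the example.
accounts = {
    "acct_1": {"phone": "555-0100", "address": "1 Main St"},
    "acct_2": {"phone": "555-0100", "address": "9 Oak Ave"},
    "acct_3": {"phone": "555-0199", "address": "9 Oak Ave"},
    "acct_4": {"phone": "555-0142", "address": "22 Elm Rd"},
}


def find_rings(accounts, min_size=2):
    """Group accounts that are linked, directly or indirectly, by shared details."""
    # Index every (attribute, value) pair to the accounts that use it.
    by_value = defaultdict(set)
    for acct, attrs in accounts.items():
        for key, value in attrs.items():
            by_value[(key, value)].add(acct)

    # Two accounts are linked if they share at least one value.
    linked = defaultdict(set)
    for group in by_value.values():
        for acct in group:
            linked[acct] |= group - {acct}

    # Collect connected components with a depth-first traversal.
    seen, rings = set(), []
    for start in accounts:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(linked[current] - component)
        seen |= component
        if len(component) >= min_size:
            rings.append(sorted(component))
    return rings


print(find_rings(accounts))
```

Note that `acct_1` and `acct_3` share nothing directly; they end up in the same group only through `acct_2`, which is the kind of indirect link that is hard to spot row by row.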
IT and Network Operations: IT teams regularly use a tool like Visio to map out complicated technology systems, including switches, routers, physical servers, virtual machines, power, PCs, and applications. By moving this effort to a graph database, IT Ops can understand the interdependencies of their systems, anticipate points of failure, and recover more quickly in the event of an outage.
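To illustrate the point-of-failure question, the sketch below stores a small dependency map and asks what would be affected if one component went down. The component names and the `impacted_by` function are hypothetical; the traversal is a plain breadth-first search over reversed dependencies.

```python
from collections import defaultdict, deque

# Hypothetical infrastructure: each component lists what it depends on.
depends_on = {
    "billing_app": ["vm_1"],
    "intranet_app": ["vm_2"],
    "vm_1": ["server_a"],
    "vm_2": ["server_a"],
    "server_a": ["switch_1", "power_feed_1"],
    "server_b": ["switch_1", "power_feed_2"],
}


def impacted_by(failed, depends_on):
    """Return every component that directly or indirectly depends on `failed`."""
    # Reverse the map: component -> the things that depend on it.
    dependents = defaultdict(set)
    for component, requirements in depends_on.items():
        for requirement in requirements:
            dependents[requirement].add(component)

    impacted, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return sorted(impacted)


print(impacted_by("power_feed_1", depends_on))
```

Running the same function for `switch_1` would also pull in `server_b`, which shows how a shared component widens the blast radius.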
The Whiteboard View of Data
In a SQL data world, modeling and storing relationships is complex. Queries that follow those relationships can get long and hard to read, and join-heavy queries can slow down as the amount of data increases.
This image illustrates the fields and their relationships in an RDBMS Northwind database.
In contrast, the following represents the same relationships in a graph database. The image below is courtesy of Tech Target.
That is way more approachable. You can think of a graph database basically as a whiteboard. Every executive in the room can understand the visual relationships between the nodes. In fact, you can go directly from whiteboarding the problem to programming the database.
SQL is the primary language spoken by the RDBMS. This was not always the case, though. In the beginning, every product on the market had its own query tools and code. In order to survive, vendors had to adopt a common language. The same thing could be happening with graph databases.
Neo4j is a leader in the graph database space. In October of 2015, the company announced the openCypher project with support from industry leaders like Oracle, Tableau, and Databricks (the company behind Apache Spark). This brings Cypher, their third revision of a graph query language, to the entire database community. As you make your way from SQL to Cypher, there are resources that can help, including this post from Neo4j: SQL Guide to Cypher.
Additional Resources for Graph Databases and Neo4j
There are a lot of graph and Neo4j resources available to you. For starters, you can download the Community Edition at no charge. As you implement production databases, you will most likely want the Enterprise Edition for enterprise-grade availability, scalability, and management.
Here you can get a glimpse of a Neo4j graph database that is a little more personal. The site allows you to log in with your Twitter account to get a visual representation of all of your connections, follows, tweets, and other factors.
5 more extremely useful resources:
- At the Neo4j developer page, you get tools, documentation, and even a sandbox to test drive Neo4j.
- With Neo4j GraphGists, you can see real-life use cases, share code, and join the community. This is one of the best developer communities around.
- The Neo4j GraphAcademy provides full online training along with certification.
- Neo4j Intro to Graph Databases Video Series.
- At www.graphdatabases.com, you can download a free copy of the 2nd Edition of O’Reilly’s Graph Databases: The Definitive Book on Graph Databases.