Big data analytics is growing. The IT industry trade association, CompTIA, recently surveyed business and IT professionals about their goals and experience with big data analytics. 51% of respondents stated their organizations had some form of big data initiative, up from 42% in 2013. And 72% of companies that have invested in big data analytics say that their results surpassed their expectations.

Companies are using insights in a variety of ways. 63% of those surveyed say they rely on data from day-to-day operations to make more informed decisions faster. 60% are using their data to better understand their customers. And 59% analyze their data to measure performance against business objectives.

There are several challenges that these organizations needed to overcome in order to take advantage of big data analytics. Below we go through some of the primary technological challenges and solutions that are available.

Data Challenges

These real business gains do not come without challenges. A major issue is that data is stored in silos. 45% of the companies surveyed stated that they have a high degree of data fragmentation, and another 42% stated that they have a moderate level of fragmentation.

Hadoop and Spark Data Solutions

Hadoop made quite a wave when it came onto the scene. It was viewed as an ideal solution to the challenges organizations were facing. It could be used to store and access just about any type of data, and it could run quickly on clusters of servers. Spark became the natural successor, as its analytics applications can run up to 100x faster than MapReduce in memory and 10x faster when run on disk.
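For readers unfamiliar with the MapReduce model that Spark is being compared against, here is a minimal single-process sketch in plain Python. It is not Hadoop or Spark code; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and only illustrate the three stages of the pattern.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data analytics", "big data talent"]
    print(word_count(sample))
```

In Hadoop MapReduce, the output of each job is written to disk before the next job reads it, whereas Spark can keep intermediate results in memory across steps. That difference is the usual explanation for the in-memory speedup quoted above.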

Management Complexity

These solutions scale easily, and data clusters can grow quite quickly as data and performance requirements grow. That growth can present organizations with some common challenges. System resources can be over- or under-utilized. Growing clusters also add complexity in managing the compute, storage, and networking components.

Talent Shortage

Another challenge that many organizations face is a shortage of big data talent. This has been well documented by consulting firms such as McKinsey. They believe that by 2018 there will be a 50% to 60% gap between big data talent supply and demand.

The Data Engine Solution

Analyst firm Ovum believes that the answer is in appliances with workloads running on Hadoop and Spark. IBM agrees and has announced the Data Engine for Hadoop and Spark. This is a fully integrated infrastructure solution that combines compute, network, storage, and RAM along with cluster management and analytics software. This greatly reduces the complexity of the solution, which makes it easier to deploy and manage.

Built on Power

The key to big data analytics is to process large amounts of data as quickly and efficiently as possible. IBM's POWER8 processor is being promoted as the first processor designed for big data. The processor excels in memory bandwidth, cache architecture, and thread density. GitHub reports that Spark performs 2x faster on IBM's POWER8 processor than on competing commodity processors.

The IBM Data Engine for Hadoop and Spark comes in a variety of configurations built on a new storage-dense line of IBM POWER8 servers that are optimized for analytics. Each server can hold up to 14 drives and up to 1TB of memory. There is a five-node starter cluster that can handle up to 216TB of raw data, and there are multi-rack configurations that offer up to 1.3PB of raw data per rack.
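Raw capacity is not the same as the amount of data a Hadoop cluster can actually hold, so a rough sizing sketch may help put these figures in context. The function below is hypothetical and rests on two labeled assumptions: HDFS's default replication factor of 3, and a rule-of-thumb 25% of disk reserved for temporary and intermediate data. Neither figure comes from IBM's specifications for this product.

```python
def usable_capacity_tb(raw_tb, replication_factor=3, reserved_fraction=0.25):
    """Estimate usable HDFS capacity in TB from raw disk capacity.

    replication_factor: copies kept of each block (3 is the HDFS default).
    reserved_fraction: share of raw disk set aside for temporary and
        intermediate data (0.25 is an assumed rule of thumb, not a vendor figure).
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    if not 0 <= reserved_fraction < 1:
        raise ValueError("reserved_fraction must be in [0, 1)")
    return raw_tb * (1 - reserved_fraction) / replication_factor


if __name__ == "__main__":
    # Raw figures quoted in the article: 216TB starter cluster, 1.3PB per rack.
    print(usable_capacity_tb(216))    # starter cluster
    print(usable_capacity_tb(1300))   # one rack, taking 1.3PB as 1300TB
```

Under these assumptions, the 216TB starter cluster would hold roughly 54TB of unique data and a 1.3PB rack roughly 325TB. Actual figures depend on the replication, compression, and reserve settings chosen for a given deployment.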

Integrated Software

The IBM Data Engine for Hadoop and Spark comes standard with preloaded advanced cluster management software. It can also be preloaded with optional advanced IBM Analytics software. IBM Open Platform for Apache Hadoop includes core Apache Hadoop and Apache Ambari for simple and efficient deployment and management. It also bundles Apache Spark.

IBM also offers an option for IBM BigInsights for Apache Hadoop to be pre-installed. BigInsights allows users to run extremely complex SQL queries right on the Hadoop cluster. It also provides the power to run these advanced analytics directly in an intuitive, browser-based environment.

Infrastructure Matters for Big Data

Big data analytics has moved from niche, departmental use to an organization-wide necessity. This has driven the need for purpose-built, enterprise-class analytics appliances that provide a powerful, scalable solution. The IBM Data Engine for Hadoop and Spark provides a pre-configured solution that is ready to deploy, shortening time to value by weeks or even months. The integrated, intelligent management software allows organizations to scale without adding complexity.

More information on the IBM Data Engine for Hadoop and Spark is available here.

 

Written by Steve Erickson