BIG DATA is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools. As such, there are huge challenges related to its acquisition, storage, maintenance and analysis. According to IBM (http://www-01.ibm.com/software/data/bigdata), companies face four main challenges when it comes to BIG DATA: volume, velocity, variety and veracity.
However, most people don’t realize just how much of a part BIG DATA plays in their everyday lives, and the impact its correct management and processing has on their daily activities. Let’s look at some examples.
Financial Transactions – Data resulting from banks, insurance companies and stock markets
Air Travel Data – Data resulting from aircraft, flight operations, airlines and airports. A single flight from London’s Heathrow Airport to John F. Kennedy International Airport in New York generates about 650 TB of data (IBM)
Population Data – Records of people, families, births and deaths
Telecommunication Data – Data on subscribers, calls, messaging, and usage of internet and other services
Internet Data – Facebook posts, tweets, video uploads, news, blogs, emails, cloud storage, etc.
Industrial Data – Data generated by industrial manufacturing (e.g. food, pharmaceuticals, garments, automobiles, etc.)
Sensor Data – Data received from sensors is usually continuous and massive in size. For example, a single oil well can have more than 20,000 individual sensors generating multiple terabytes per day
Astronomical Data – from telescopes, observatories, monitoring stations, space agencies, etc.
Weather Data – Data related to temperature, humidity, wind speed, wind direction, precipitation levels, etc.
Travel/Tourism Data – Aside from air travel, tourism itself generates huge amounts of data from hotels, car rentals and other branches of the travel industry
Shopping Data – Data from department stores, retailers, e-commerce websites, social media platforms, etc. Wal-Mart, for example, logs one million transactions per hour at its retail locations
Healthcare Data – Medical records, medical imagery & medical equipment sensors also build up a vast quantity of data
Utilities Data – Billing data, sensor data from grids & electronic meters that record consumption
Considering how big a role BIG DATA plays in our everyday lives, it is vital for companies that handle it to have processes and technologies in place to store, manage and leverage this data. That is where Big Data Solutions, or Business Intelligence Services, come in: they help companies harvest, organize, store and analyze incoming data, and extract actionable information that the organization can leverage to its advantage.
Business Intelligence (BI) services typically focus on four key areas: data mining, data warehousing, information analytics and reporting. Within these areas, Business Intelligence or Big Data Solutions offer a variety of sub-services such as Custom, Large & ERP Data Warehousing, ETL Services (Extraction, Transformation, Loading), Performance Management Solutions, Query & Analytics Services, and Periodic & Ad Hoc Reporting.
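To make the ETL acronym concrete, here is a minimal Python sketch of the three stages. It is an illustration under assumptions, not how any particular BI product implements ETL: the CSV export, its column names and the `transactions` table are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system, including one bad row.
RAW_CSV = """txn_id,amount,currency
1,100.50,usd
2,not-a-number,usd
3,42.00,eur
"""


def extract(text):
    """Extraction: read raw rows from the CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: cast types, normalize currency codes, drop unparseable rows."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        clean.append((int(row["txn_id"]), amount, row["currency"].upper()))
    return clean


def load(rows, conn):
    """Loading: write the cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS transactions "
        "(txn_id INTEGER PRIMARY KEY, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM transactions").fetchone())
# (2, 142.5)
```

Keeping the three stages as separate functions means a different source can be plugged into `extract` without touching the cleaning rules in `transform` or the schema in `load`.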
Creating such solutions, however, is not easy. It requires a well-planned strategy for developing the solution’s architecture, which in turn requires looking at the big picture of how the data needs to be managed.
To understand how a Big Data Solution’s architecture is conceived, we’ll compare it to a real-life example: a library, since many of a library’s operations resemble those of a BIG DATA Solution’s architecture.
Let’s say we were to build a huge library. What would be the requirements, what would be the areas we would need to consider?
To set up a library, we would need to:
– Acquire a suitable space, making sure there’s ample room to start the library and to expand, as the collection of books increases.
– Create a structure and a boundary to outline the limits of the library.
– Define the process of acquiring books.
– Categorize and store the books in a way that allows quick retrieval.
– Define a process for library membership and a workflow for library members to access the books they desire.
– Have security protocols in place to prevent theft and to ensure books are returned on time.
– Have protocols in place for disaster recovery.
– Define a process for checking and maintaining the relevancy and health of titles in the library.
The above list of activities bears a surprising resemblance to the activities involved in building a BIG DATA store, which can be summarized as follows:
Data Acquisition – the collection of data from a variety of different sources. Fetching data from these sources also involves a data validation procedure, to ensure that the data being gathered meets the required criteria. A well-designed BIG DATA Solution will also keep a good deal of metadata about the acquisition process itself.
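The validation and metadata ideas above can be sketched in a few lines of Python. This is a hypothetical example: the weather-sensor field names, the plausibility ranges and the `AcquisitionBatch` structure are assumptions chosen for illustration, not part of any specific product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical validation criteria for incoming weather-sensor readings.
REQUIRED_FIELDS = {"station_id", "temperature_c", "humidity_pct"}


@dataclass
class AcquisitionBatch:
    """Accepted and rejected records, plus metadata about the acquisition run."""
    source: str
    accepted: list = field(default_factory=list)
    rejected: list = field(default_factory=list)
    started_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def metadata(self):
        # Metadata describes the acquisition process itself, not the readings.
        return {
            "source": self.source,
            "started_at": self.started_at,
            "accepted": len(self.accepted),
            "rejected": len(self.rejected),
        }


def is_valid(record):
    """Check that a record has every required field and plausible values."""
    if not REQUIRED_FIELDS.issubset(record):
        return False
    return (-90 <= record["temperature_c"] <= 60
            and 0 <= record["humidity_pct"] <= 100)


def acquire(source, records):
    """Validate each incoming record and sort it into the batch."""
    batch = AcquisitionBatch(source=source)
    for record in records:
        target = batch.accepted if is_valid(record) else batch.rejected
        target.append(record)
    return batch


readings = [
    {"station_id": "A1", "temperature_c": 21.5, "humidity_pct": 40},
    {"station_id": "A2", "temperature_c": 150.0, "humidity_pct": 40},  # implausible
    {"station_id": "A3", "humidity_pct": 55},  # missing temperature
]
batch = acquire("weather-feed", readings)
print(batch.metadata())
```

Rejected records are kept rather than discarded, so the metadata can report how much of each feed failed validation.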
Data Storage – the storing, tagging and indexing of the acquired data in the data store, and the implementation of techniques to optimize data storage and retrieval. This stage also involves setting up data clustering or data replication processes.
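The following toy Python store illustrates tagging, indexing and replication together. It is a deliberately simplified sketch under assumptions: in-memory dictionaries stand in for storage nodes, and placing replicas on the nodes after a hash-chosen primary is just one simple replication scheme, not the one any particular big data platform uses. The `TaggedStore` class and its methods are hypothetical.

```python
import hashlib
from collections import defaultdict


class TaggedStore:
    """Toy data store: documents are tagged, indexed by tag, and each write
    is replicated to several in-memory "nodes"."""

    def __init__(self, node_count=4, replicas=2):
        self.nodes = [{} for _ in range(node_count)]
        self.replicas = replicas
        self.tag_index = defaultdict(set)  # tag -> ids of documents with that tag

    def placement(self, doc_id):
        """Primary node chosen by hashing the id; replicas on the next nodes."""
        digest = hashlib.sha256(doc_id.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, doc_id, document, tags):
        for node in self.placement(doc_id):
            self.nodes[node][doc_id] = document
        for tag in tags:
            self.tag_index[tag].add(doc_id)

    def get(self, doc_id):
        # Try each replica in turn, so a lost node does not lose the document.
        for node in self.placement(doc_id):
            if doc_id in self.nodes[node]:
                return self.nodes[node][doc_id]
        return None

    def find_by_tag(self, tag):
        # Answered from the index instead of scanning every node.
        return sorted(self.tag_index.get(tag, set()))


store = TaggedStore()
store.put("txn-1", {"amount": 100.5}, tags=["finance"])
store.put("scan-7", {"modality": "MRI"}, tags=["healthcare"])
store.nodes[store.placement("txn-1")[0]].clear()  # simulate losing the primary node
print(store.get("txn-1"))  # {'amount': 100.5}, served by the replica
print(store.find_by_tag("finance"))  # ['txn-1']
```

Because every document lives on two nodes, losing one node does not lose the data, and the tag index lets a query such as "all finance records" avoid scanning every node.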
Data Analysis – analysis of the stored data to help identify trends and outliers, in order to uncover facts about the business (risks, operations, customers, etc.).
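As a sketch of the kind of analysis meant here, the Python below uses two basic techniques: a moving average to expose a trend and a z-score test to flag outliers. The daily figures are made up for illustration, and the window of 3 and threshold of 2.0 are arbitrary assumptions; a real analysis would choose methods suited to its data.

```python
from statistics import mean, stdev


def moving_average(values, window):
    """Average each run of `window` consecutive values to smooth out noise."""
    return [mean(values[i - window + 1 : i + 1])
            for i in range(window - 1, len(values))]


def outliers(values, threshold=2.0):
    """Return (index, value) pairs whose z-score exceeds `threshold`."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Made-up daily transaction counts for a single store.
daily = [120, 125, 130, 128, 135, 400, 140, 145, 150]
print(moving_average(daily, 3))
print(outliers(daily))  # [(5, 400)]
```

A single large spike also inflates the mean and standard deviation it is measured against, so median-based measures are often preferred on noisy data.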
Data Governance – the framework for keeping data secure, defining its access privileges, making sure the information is available and accessible, and carrying out actions on the basis of the facts that are uncovered.
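Access privileges, one part of the framework described above, can be sketched as a simple role-based check. In this hypothetical Python example the roles, dataset names and permissions are invented for illustration; every decision is recorded in an audit log, and anything not explicitly granted is denied.

```python
from enum import Enum


class Permission(Enum):
    READ = "read"
    WRITE = "write"
    ADMIN = "admin"


# Hypothetical roles and the permissions each one grants.
ROLE_PERMISSIONS = {
    "analyst": {Permission.READ},
    "engineer": {Permission.READ, Permission.WRITE},
    "steward": {Permission.READ, Permission.WRITE, Permission.ADMIN},
}

# Hypothetical datasets and the roles allowed to access them.
DATASET_ROLES = {
    "sales_summary": {"analyst", "engineer", "steward"},
    "customer_pii": {"steward"},
}

audit_log = []  # every access decision is recorded here


def is_allowed(role, dataset, permission):
    """Grant access only if the role may use the dataset AND holds the permission.
    Unknown roles or datasets fall back to empty sets, so they are denied."""
    allowed = (role in DATASET_ROLES.get(dataset, set())
               and permission in ROLE_PERMISSIONS.get(role, set()))
    audit_log.append((role, dataset, permission.value, allowed))
    return allowed


print(is_allowed("analyst", "sales_summary", Permission.READ))    # True
print(is_allowed("analyst", "customer_pii", Permission.READ))     # False
print(is_allowed("engineer", "sales_summary", Permission.ADMIN))  # False
```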
BIG DATA Solutions are developed and managed by highly skilled staff and typically involve the following two roles:
Solution Architects – These individuals are software architects with vast experience in the design and development of data projects and solutions. They are responsible for evaluating and developing the Big Data Solution’s architecture, ensuring that enterprise data technology services are designed with an optimal balance of industry best practices, vendor recommendations, domain requirements, future orientation and tactical pragmatism. Based on these considerations, they recommend the appropriate Enterprise Development Services Solution Architecture to support the organization’s corporate and business goals.
Data Scientists – In addition to solution architects, BIG DATA Solutions also require engineers to run and manage their day-to-day operations. These individuals, known as “Data Scientists”, are multi-faceted computer engineers who are also experts in a variety of other subjects such as statistics, analytics, mathematics, data engineering, pattern recognition, advanced computing, visualization, uncertainty modeling and data warehousing.
They help companies mine and analyze their data for actionable information, by employing their deep expertise in one or more of the above scientific disciplines.
However, data scientists and solution architects form only part of the equation. The other key component in a big data solution is the technology used: the platforms, tools and scripts that drive the entire process. The next part of this series will look at these technologies in more detail.
(to be continued … )