Real Time Data Ingestion – Kinesis Overview

Author: Wavicle Data Solutions

Analytics, BI, and data integration are together changing the way decisions are made. Data science is evolving rapidly: we generate large volumes of data every second, and we build systems and applications to integrate and analyze that data. BI and predictive analytics are important and provide actionable insights, but bringing the data together from various sources is the bedrock for business intelligence and data mining.

 

Broadly, the most common data sources available today fall into three categories:

 

  • Organizational data that is stable and historical and changes slowly.
  • Real-time data that changes rapidly, such as mobile app-generated data, video or audio, machine sensor data, or GPS data.
  • Consumer data generated by people on social media platforms.

When data from all of these sources is combined, it can lead to better analysis, better predictive models, and higher precision. To reach this stage, however, data ingestion is an essential step. Today, several data platform options are available, depending on the type of data. One platform that ingests and processes real-time data generated by machines, apps, sensors, and similar sources is Amazon Kinesis.

 

Amazon Kinesis offers several applications on its platform for different needs. The following chart highlights the offerings and high-level use cases of Amazon Kinesis applications.

 

 

One of the simplest Amazon Kinesis applications is Amazon Kinesis Data Analytics. It processes and analyzes streaming data using standard SQL. Because it is designed to read and process streaming data continuously, it is best suited for time series analytics and real-time dashboards.
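
To make the idea of time series analytics on a stream more concrete, here is a minimal sketch in plain Python of a fixed-window ("tumbling window") average, the kind of aggregation a streaming SQL query would compute continuously. It is not Kinesis Data Analytics code: the function name, the record layout (timestamp in seconds, sensor id, value), and the sample readings are all assumptions made for illustration.

```python
from collections import defaultdict


def tumbling_window_avg(records, window_seconds):
    """Average values per (window start, sensor id) over fixed, non-overlapping windows.

    `records` is an iterable of (timestamp_seconds, sensor_id, value) tuples.
    This record layout is hypothetical and chosen only for this sketch.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for ts, sensor_id, value in records:
        # Align the timestamp to the start of its window.
        window_start = ts - (ts % window_seconds)
        bucket = sums[(window_start, sensor_id)]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}


# Hypothetical sensor readings: (timestamp in seconds, sensor id, value)
readings = [
    (0, "pump-1", 10.0),
    (20, "pump-1", 20.0),
    (65, "pump-1", 30.0),
    (70, "pump-2", 5.0),
]

averages = tumbling_window_avg(readings, window_seconds=60)
for (window_start, sensor_id), avg in averages.items():
    print(window_start, sensor_id, avg)
```

A real streaming engine would emit each window's result as the window closes instead of waiting for all the input, but the grouping logic is the same.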

 

There is some similarity between how we work with relational databases and how we work with Amazon Kinesis Data Analytics.

 

RDBMS:

 

 

Amazon Kinesis:

 

 

There are a few interesting things to note about Kinesis.

 

  • Data collection is separate from data processing. Systems that send data in through a web service are called “producers”, and these producers push data into “streams”.
  • Several Kinesis applications can consume data from a single stream without interfering with one another. This separation makes processing easier and caters to varying reporting needs.

Kinesis, as of now, provides four methods to process rapidly collected data:

  • Kinesis API application
  • Kinesis Client Library application
  • Elastic MapReduce (EMR) connector
  • Apache Storm spout
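
The separation between producers, streams, and independent consumers described above can be sketched with a toy in-memory model. This is not the AWS SDK or any Kinesis API: `InMemoryStream`, `Consumer`, `append`, `read_from`, and `poll` are hypothetical names invented for this illustration. The point it shows is that each consumer keeps its own read position, so one application reading the stream does not affect another.

```python
class InMemoryStream:
    """Toy stand-in for a stream: an append-only, ordered list of records."""

    def __init__(self):
        self._records = []

    def append(self, data):
        """Producer side: add a record and return its position in the stream."""
        self._records.append(data)
        return len(self._records) - 1

    def read_from(self, position, limit=10):
        """Return up to `limit` records starting at `position`, plus the next position."""
        batch = self._records[position:position + limit]
        return batch, position + len(batch)


class Consumer:
    """Toy consumer that tracks its own read position independently of others."""

    def __init__(self, stream):
        self.stream = stream
        self.position = 0
        self.seen = []

    def poll(self, limit=10):
        batch, self.position = self.stream.read_from(self.position, limit)
        self.seen.extend(batch)
        return batch


stream = InMemoryStream()
for reading in ["temp=21", "temp=22", "temp=23"]:
    stream.append(reading)  # the producer pushes data into the stream

dashboard = Consumer(stream)
archiver = Consumer(stream)

dashboard.poll()          # reads all three records
archiver.poll(limit=2)    # reads only the first two, at its own pace
stream.append("temp=24")  # the producer keeps writing meanwhile
dashboard.poll()
archiver.poll()

print(dashboard.seen == archiver.seen)
```

Both consumers end up with the same full sequence of records even though they read at different speeds, because reading never removes data from the stream.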