Data, data, data. Data is so popular that there should be a song about it, or failing that a poem or two. In fact, there are AI-generated poems about it, if that’s what floats your boat.
What does data need?
The reality is that data needs to be handled with care. It needs to be cleaned (or cleansed), ordered and structured, all at speed, volume and scale. LinkedIn knew this when it open sourced what is now called Apache Kafka in early 2011 (named, yes, after Franz Kafka). Kafka is now a very popular message bus system for handling data, but why is that?
What are the alternatives?
There are many message bus systems (or ESBs) on the market. Some ship with the operating system, like MSMQ on Windows, while others, like RabbitMQ, are installed as separate brokers. Each message bus system is built for a particular purpose, and Kafka was built to handle events coming from millions of users on a platform like LinkedIn. IoT systems can take advantage of this by feeding real-time (millisecond-level) data from the ‘edge’ to the ‘cloud’, and this is what we do with Atlas Edge and Atlas Boost.
How does Kafka do this?
In a very simplistic view, Kafka, like other service buses, is a producer/consumer message bus: producers write data into named topics, where it is collected and organised, like the Sorting Hat in Harry Potter. Once the data is sorted, the consumers of the data can ‘subscribe’ to a topic. Not only that, each consumer keeps its own pointer (an offset) into the data, so it can move that pointer back and replay any of the data still held in the queue, time and time again.
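To make the pointer idea concrete, here is a minimal in-memory sketch in Python. It is a toy model, not the Kafka client API: the class `ToyTopicLog`, its methods and the consumer names are all made up for illustration, and a real Kafka topic adds partitions, persistence and retention on top of this. It only shows that each consumer tracks its own position (which Kafka calls an offset) in an append-only log, so one consumer can rewind without disturbing another.

```python
from collections import defaultdict


class ToyTopicLog:
    """Toy in-memory stand-in for a single topic (hypothetical, not Kafka's API)."""

    def __init__(self):
        self._records = []                # append-only list of messages
        self._offsets = defaultdict(int)  # next offset to read, per consumer

    def produce(self, message):
        """Append a message and return the offset it was stored at."""
        self._records.append(message)
        return len(self._records) - 1

    def poll(self, consumer, max_records=10):
        """Return unread messages for this consumer and advance its offset."""
        start = self._offsets[consumer]
        batch = self._records[start:start + max_records]
        self._offsets[consumer] = start + len(batch)
        return batch

    def seek(self, consumer, offset):
        """Move one consumer's pointer without affecting any other consumer."""
        if not 0 <= offset <= len(self._records):
            raise ValueError("offset out of range")
        self._offsets[consumer] = offset


log = ToyTopicLog()
for reading in ["temp=20.1", "temp=20.4", "temp=21.0"]:
    log.produce(reading)

first_pass = log.poll("dashboard")   # the dashboard reads everything so far
log.seek("dashboard", 1)             # rewind the dashboard to the second message
replayed = log.poll("dashboard")     # the same data is delivered again
archive_view = log.poll("archiver")  # a second consumer starts from the beginning
```

Because reading never removes anything from the log, the rewind is just a change to one consumer's number, which is why replay is cheap in this model.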
Remember tape recorders?
It is like having hundreds of old tape recorders recording data in real time while, at the same time, playing that data to hundreds of different speakers, with each user of the system free to rewind and replay different parts of the data. This flexibility at speed and scale is what makes Kafka ideal for data architects to use to solve problems for customers.
It doesn’t stop there. With Kafka Streams, the data from various topics can also be combined, filtered and manipulated to turn the source data into information. This is where we get the term streaming analytics from: real-time analysis of the data to generate KPIs in a flash.
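As a rough illustration of combine, filter and manipulate, the sketch below chains three plain Python generators. It is not Kafka Streams itself (which is a Java library); the function names, the `(timestamp_ms, sensor, value)` event shape and the valid range are all assumptions chosen for the example. Each event is processed as it arrives and the KPI, here a running average, is updated immediately.

```python
import heapq


def combine(*streams):
    """Merge several streams of (timestamp_ms, sensor, value) by timestamp.

    Assumes each input stream is already ordered by timestamp.
    """
    return heapq.merge(*streams, key=lambda event: event[0])


def only_valid(events, low=-40.0, high=125.0):
    """Filter out readings outside a plausible range (range is an assumption)."""
    for ts, sensor, value in events:
        if low <= value <= high:
            yield ts, sensor, value


def running_average(events):
    """Yield (timestamp_ms, mean so far) after every event: a toy KPI."""
    total = 0.0
    count = 0
    for ts, _sensor, value in events:
        total += value
        count += 1
        yield ts, total / count


# Two hypothetical edge devices, each standing in for a topic.
edge_a = [(1, "a", 20.0), (3, "a", 22.0)]
edge_b = [(2, "b", 999.0), (4, "b", 24.0)]  # 999.0 is an out-of-range glitch

kpis = list(running_average(only_valid(combine(edge_a, edge_b))))
print(kpis)
```

The generators are lazy, so nothing is buffered beyond the current event, which is the same reason a streaming pipeline can produce results while data is still arriving.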