Generalized Data Collection Setup leveraging Apache Kafka to store incoming events and process them using Apache Flink

This is a simplified, easy-to-understand deployment of Apache Kafka and Apache Flink in a multi-container setup. It illustrates how a simple Kafka producer can be configured to push generated data to a Kafka topic, from which the data can then be retrieved in Apache Flink for processing.
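As a minimal sketch of the producer side (assuming a broker reachable at localhost:9092 and a hypothetical topic named events; neither name is taken from this setup), a producer along these lines could push JSON-encoded events:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class EventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            for (int i = 0; i < 10; i++) {
                String event = String.format("{\"id\": %d, \"value\": %.2f}", i, Math.random());
                // Send each event to the (assumed) "events" topic, keyed by its id.
                producer.send(new ProducerRecord<>("events", Integer.toString(i), event));
            }
            producer.flush();
        }
    }
}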

In this example setup, we rely on the Flink SQL interface to query data from a Kafka topic. The setup illustrates how to connect the various pieces: Apache Kafka, the Confluent Schema Registry (for automatic validation of incoming messages against a schema), and the Apache Flink stream-processing system.
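To sketch the Flink side (again assuming the hypothetical events topic and broker address from the producer sketch above; the table schema and connector options are illustrative, not taken from this setup), a Flink SQL query over the topic could be issued through Flink's Table API:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class EventQuery {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Declare the (assumed) "events" topic as a dynamic table. In a
        // Schema Registry deployment the 'avro-confluent' format would
        // validate records against the registered schema; plain 'json' is
        // used here to match the producer sketch above.
        tEnv.executeSql(
            "CREATE TABLE events (" +
            "  id INT," +
            "  `value` DOUBLE" +
            ") WITH (" +
            "  'connector' = 'kafka'," +
            "  'topic' = 'events'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'scan.startup.mode' = 'earliest-offset'," +
            "  'format' = 'json'" +
            ")");

        // Run a continuous query over the stream and print the results.
        tEnv.executeSql("SELECT id, `value` FROM events").print();
    }
}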

Authored by: Adam Clark, Michael Gottlieb, Rachel Terry, Karan Vahi, and Mike Stults
