This post and the ones that follow will not only give you a kick start on how Kafka works but also a fair idea of how it can be used to scale out your application in production.
Kafka is a pub-sub (publish-subscribe) messaging system that can be used for message passing to build highly scalable and fault-tolerant applications.
I will start with a basic introduction to Kafka in this post and move on to the design and development parts later.
Let's say you have a food delivery app where different systems need to interact with each other and do some kind of message passing; this is easier to understand from the architectural diagram below:
This is an overly simplified architectural diagram. You have your consumer application, from which a customer places an order for a particular food item from a particular restaurant. The matching logic in the middle block matches a driver to the order optimally, by fetching the drivers who are available to take an order from the Tracking Delivery Boys application, and then finally sends the details to the delivery boys' / drivers' application.
Imagine your business has expanded and you now receive thousands of orders a minute!
What if some of the messages being passed get lost for some reason, such as a system being down? And how would you scale this system? That is where Kafka comes into the picture.
Kafka is a unified platform for handling real-time data feeds. It supports low-latency message delivery and guarantees fault tolerance in the presence of machine failures. It can handle a large number of diverse consumers, and it is very fast: a well-known benchmark clocked it at around 2 million writes per second.
The main concepts in Kafka are topics, brokers, producers and consumers.
Topic - Messages belonging to the same category are stored in the same topic.
For example, all the messages for new orders flowing from the consumer app to the matching-logic app could go to a topic named "orders". A topic has at least one partition and can have multiple partitions for parallelism (we will talk about this later).
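As a mental model (this is not real Kafka code), a topic can be pictured as a set of append-only logs, one per partition. The topic name "orders" and the partition count of 3 below are just illustrative:

```python
# A toy model of a Kafka topic: one append-only log per partition.
# Illustrative only -- real Kafka persists these logs on disk and
# spreads them across brokers.

class Topic:
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, partition, message):
        """Append a message to one partition; its index is its offset."""
        self.partitions[partition].append(message)
        return len(self.partitions[partition]) - 1  # offset of the message

orders = Topic("orders", num_partitions=3)
offset = orders.append(0, {"order_id": 42, "item": "pizza"})
print(offset)  # 0 -- first message in partition 0
```

Each message gets an offset, its position within its partition; consumers use offsets to keep track of how far they have read.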
Brokers - Brokers maintain the published data; this is where the partitions of your topics are stored. The partitions of a topic are distributed evenly amongst the brokers to spread the load. Each partition also has a replication factor, which controls how many copies of its data are kept.
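To make this concrete, here is a toy sketch of spreading a topic's partitions (and their replicas) across brokers round-robin. This is similar in spirit to Kafka's default assignment, though the real algorithm is more involved; the broker names and counts are made up for illustration:

```python
# Toy round-robin assignment of partitions and replicas to brokers.
# Not Kafka's actual algorithm -- just the general idea.

def assign_partitions(num_partitions, brokers, replication_factor):
    assignment = {}
    for p in range(num_partitions):
        # The leader and its replicas land on consecutive brokers,
        # so no single broker holds all the copies of one partition.
        replicas = [brokers[(p + r) % len(brokers)]
                    for r in range(replication_factor)]
        assignment[p] = replicas
    return assignment

layout = assign_partitions(num_partitions=3,
                           brokers=["broker-1", "broker-2", "broker-3"],
                           replication_factor=2)
print(layout)
# {0: ['broker-1', 'broker-2'], 1: ['broker-2', 'broker-3'], 2: ['broker-3', 'broker-1']}
```

With a replication factor of 2, each partition survives the loss of one broker, which is exactly the fault tolerance discussed above.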
Consumers - Consumers read data from the partitions inside the brokers by subscribing to a particular topic. A consumer can belong to a consumer group, identified by a group id; if we have 4 consumers in the group with id 1, any one of those consumers will read a given message, i.e. each message is delivered to exactly one consumer in a group (consumers sharing the same group id).
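The exactly-one-consumer-per-group behaviour falls out of how partitions are divided among the group's members: each partition is read by exactly one consumer in the group. A simplified sketch of such an assignment (consumer names are illustrative; Kafka's actual assignors are more sophisticated):

```python
# Toy sketch of how a consumer group divides a topic's partitions:
# each partition goes to exactly one consumer in the group, so every
# message is processed by exactly one group member.

def assign_to_group(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

group = assign_to_group(partitions=[0, 1, 2, 3],
                        consumers=["consumer-a", "consumer-b"])
print(group)  # {'consumer-a': [0, 2], 'consumer-b': [1, 3]}
```

Note the consequence: a group can have at most as many active consumers as the topic has partitions, which is why the partition count matters for parallelism.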
Producers - Producers publish messages to one or more Kafka topics by sending data to the Kafka brokers. Every time a producer publishes a message to a broker, it is appended to a particular partition of the topic; by default Kafka tries to distribute the messages published to a topic evenly amongst its partitions, but a producer can also specify the exact partition it wants to publish to.
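When a message carries a key (for example, an order id), the producer hashes the key to pick the partition, so all messages with the same key land in the same partition and stay in order. A simplified version of that logic (real Kafka clients use murmur2 hashing; MD5 is used here only as a deterministic stand-in):

```python
import hashlib

# Simplified version of a producer's key-based partitioning:
# the same key always maps to the same partition. Real Kafka
# clients use murmur2; MD5 here is just a deterministic example.

def choose_partition(key, num_partitions):
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

p1 = choose_partition("order-42", num_partitions=3)
p2 = choose_partition("order-42", num_partitions=3)
print(p1 == p2)  # True -- the same key always maps to the same partition
```

For our food app, keying messages by order id would guarantee that every update for a given order is processed in sequence by a single consumer.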
In the next post we will look more closely at how we would use Kafka to scale our myFood application: achieving parallelism with multiple partitions, the use case for a shared group id, and the concept of "partitioning" and its use case for the myFood app, in full detail.
Please comment below if you have any questions!