The Solution
SecureKloud recommended a big data architecture for stream processing using Kafka on AWS. SecureKloud architected and implemented the following system and provides managed services for it:
- The Confluent Kafka platform was configured to process the messages.
- A secure Amazon Virtual Private Cloud (VPC) was instantiated with enough capacity to handle the current data sources and to scale to future data sizes running into multiple TB. The entire platform runs on Amazon EC2 and Amazon EBS.
- The Kafka cluster was set up across multiple Availability Zones within an AWS Region for high availability.
- Kafka is configured with ZooKeeper for:
  - Electing a controller. The controller is one of the brokers and is responsible for maintaining the leader/follower relationship for all the partitions. When a node shuts down, the controller tells other replicas to become partition leaders, replacing the partition leaders on the node that is going away. ZooKeeper is used to elect the controller, make sure there is only one, and elect a new one if it crashes.
  - Cluster membership. ZooKeeper tracks which brokers are alive and part of the cluster.
  - Topic configuration. ZooKeeper records which topics exist, how many partitions each has, where the replicas are, which replica is the preferred leader, and what configuration overrides are set for each topic.
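The controller and leader failover behaviour described above can be illustrated with a small in-memory model. This is a minimal sketch and not Kafka's actual implementation: the class names, the topic name "events" and the broker ids are hypothetical, the controller is chosen as the lowest live broker id purely to keep the example deterministic (real brokers race to register in ZooKeeper), and leaders are picked from the replica list rather than from the in-sync replica set.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Partition:
    topic: str
    index: int
    replicas: List[int]            # broker ids, preferred leader first
    leader: Optional[int] = None


@dataclass
class ClusterModel:
    """Toy stand-in for the cluster state that Kafka keeps in ZooKeeper."""
    live_brokers: Set[int] = field(default_factory=set)
    controller: Optional[int] = None
    partitions: List[Partition] = field(default_factory=list)

    def elect_controller(self) -> Optional[int]:
        # Keep the current controller while it is alive; otherwise pick one.
        # Assumption: lowest live id wins (a simplification for this sketch).
        if self.controller not in self.live_brokers:
            self.controller = min(self.live_brokers, default=None)
        return self.controller

    def assign_leaders(self) -> None:
        # For every partition whose leader is gone, promote the first
        # replica that is still alive (None if no replica survives).
        for p in self.partitions:
            if p.leader not in self.live_brokers:
                p.leader = next(
                    (b for b in p.replicas if b in self.live_brokers), None
                )

    def broker_down(self, broker_id: int) -> None:
        # Membership change first, then controller, then partition leaders.
        self.live_brokers.discard(broker_id)
        self.elect_controller()
        self.assign_leaders()


cluster = ClusterModel(
    live_brokers={1, 2, 3},
    partitions=[
        Partition("events", 0, [1, 2, 3]),
        Partition("events", 1, [2, 3, 1]),
    ],
)
cluster.elect_controller()
cluster.assign_leaders()
cluster.broker_down(1)   # broker 1 was both controller and a partition leader
print(cluster.controller, [p.leader for p in cluster.partitions])
```

In this run, broker 1 starts as controller and as leader of partition 0. After it goes away, broker 2 takes over both roles, while partition 1 keeps its existing leader.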
Hortonworks Ambari is used for Hadoop management. It makes Hadoop management simpler by providing a consistent, secure platform for operational control. Ambari provides an intuitive web UI as well as a robust REST API, which is particularly useful for automating cluster operations. With Ambari, Hadoop operators get the following core benefits:
- Simplified installation, configuration and management. Easily and efficiently create, manage and monitor clusters at scale. Smart Configs and Cluster Recommendations take the guesswork out of configuration.
- Centralized security setup. Reduce the complexity of administering and configuring cluster security across the entire platform.
- Full visibility into cluster health. Ensure your cluster is healthy and available with a holistic approach to monitoring.
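As an illustration of how the Ambari REST API can support automation, the sketch below builds an authenticated request using only the Python standard library. It is a hedged example, not the configuration used in this engagement: the host name, cluster name and credentials are placeholders, and the helper `ambari_request` is a hypothetical name. It assumes Ambari's usual `/api/v1` root, HTTP basic authentication, and the `X-Requested-By` header that Ambari expects on modifying calls.

```python
import base64
import urllib.request


def ambari_request(base_url, path, user, password, method="GET", body=None):
    """Build (but do not send) a request against Ambari's /api/v1 REST root."""
    url = f"{base_url.rstrip('/')}/api/v1/{path.lstrip('/')}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        # Ambari expects this header on POST/PUT/DELETE calls.
        "X-Requested-By": "ambari",
    }
    return urllib.request.Request(url, data=body, headers=headers, method=method)


# Placeholder host, cluster name and credentials (assumptions, not real values).
req = ambari_request(
    "http://ambari.example.internal:8080",
    "clusters/demo/services",
    "admin",
    "secret",
)
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send the call to a reachable Ambari server. Building the request separately from sending it keeps the URL and header logic easy to check without a live cluster.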