Kafka Policies

Tip

Policy names are prefixed with Kafka -

| Policy name | Duration | Condition 1 | (and) Condition 2 | Category | Description |
|---|---|---|---|---|---|
| Depressed Number of Zookeeper Connections | 30 min | kafka.zookeeper.zk_num_alive_connections has a lower baseline deviation | | WARNING | The number of active connections to Zookeeper has been lower than expected for at least the past 30 minutes. |
| Elevated Consumer Lag | 15 min | kafka.zookeeper.consumer_groups.*.consumer_lag has an upper baseline deviation | | WARNING | Consumer lag has been higher than expected for at least 15 minutes. |
| Elevated Consumer Purgatory Size | 15 min | kafka.server.DelayedOperationPurgatory.Fetch.PurgatorySize has an upper baseline deviation | | WARNING | The purgatory size for consumer fetch requests is higher than expected. This may be increasing consumer request latency. |
| Elevated Consumer Servicing Time | 15 min | kafka.network.RequestMetrics.FetchConsumer.TotalTimeMs.Mean has an upper baseline deviation | | WARNING | The broker is taking longer than usual to service consumer requests. |
| Elevated Number of Outstanding Zookeeper Requests | 15 min | kafka.zookeeper.zk_outstanding_requests has an upper baseline deviation | | WARNING | The number of outstanding Zookeeper requests has been higher than expected for at least the past 15 minutes. This could be causing performance issues. |
| Elevated Producer Purgatory Size | 15 min | kafka.server.DelayedOperationPurgatory.Produce.PurgatorySize has an upper baseline deviation | | WARNING | The purgatory size for producer requests is higher than expected. This may be increasing producer request latency. |
| Elevated Producer Servicing Time | 15 min | kafka.network.RequestMetrics.Produce.TotalTimeMs.Mean has an upper baseline deviation | | WARNING | The broker is taking longer than usual to service producer requests. |
| Elevated Topic Activity | 30 min | BrokerTopicMetrics._all.BytesInPerSec.Count has an upper baseline deviation | BrokerTopicMetrics._all.BytesOutPerSec.Count has an upper baseline deviation | WARNING | Topic activity has been higher than expected for at least the past 30 minutes. |
| Elevated Zookeeper Latency | 15 min | kafka.zookeeper.zk_avg_latency has an upper baseline deviation | | WARNING | The average latency for Zookeeper requests has been higher than expected for at least the past 15 minutes. |
| Extended Period of Consumer Lag | 1 hour and 15 min | kafka.zookeeper.consumer_groups.*.consumer_lag has an upper baseline deviation | | CRITICAL | Consumer lag has been higher than expected for over an hour. |
| No Active Controllers | 5 min | kafka.controller.ActiveControllerCount has a static threshold < 1 | | CRITICAL | There are no active controllers in the Kafka cluster. |
| Unclean Leader Election Rate Greater Than 0 | 5 min | kafka.controller.UncleanLeaderElectionsPerSec.Count has a static threshold > 0 | | CRITICAL | An out-of-sync replica was chosen as leader because none of the available replicas were in sync. Some data loss has occurred as a result. |
| Under Replicated Partition Count Greater Than 0 | 30 min | kafka.server.ReplicaManager.UnderReplicatedPartitions has a static threshold > 0 | | CRITICAL | The number of under-replicated partitions has been greater than 0 for at least 30 minutes. |
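
Each static-threshold policy above pairs a condition with a duration. As a rough illustration of how such a rule could be evaluated, the sketch below fires only when every sample in the trailing window meets the condition and the samples span the full duration. This is an assumed model, not the product's actual evaluation logic: the `Sample` type, the `breached_for` helper, and the one-minute sample interval are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Sequence


@dataclass(frozen=True)
class Sample:
    """One metric observation (hypothetical type for this sketch)."""
    timestamp: datetime
    value: float


def breached_for(
    samples: Sequence[Sample],
    condition: Callable[[float], bool],
    duration: timedelta,
) -> bool:
    """Return True if `condition` held for every sample in the trailing
    `duration` window and the samples cover that whole window.

    Assumes `samples` is sorted by timestamp, oldest first.
    """
    if not samples:
        return False
    window_start = samples[-1].timestamp - duration
    # Not enough history to cover the full duration yet.
    if samples[0].timestamp > window_start:
        return False
    recent = [s for s in samples if s.timestamp >= window_start]
    return all(condition(s.value) for s in recent)


# "No Active Controllers": ActiveControllerCount < 1 for 5 min.
start = datetime(2024, 1, 1, 12, 0)
counts = [1, 0, 0, 0, 0, 0, 0]  # one sample per minute (assumed interval)
series = [Sample(start + timedelta(minutes=i), c) for i, c in enumerate(counts)]

no_controller = breached_for(series, lambda v: v < 1, timedelta(minutes=5))
```

The same helper would cover the other static-threshold rows by swapping the predicate and duration, for example `lambda v: v > 0` with `timedelta(minutes=30)` for under-replicated partitions. Baseline-deviation policies would need a learned expected range in place of the fixed predicate, which this sketch does not attempt to model.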