Container Observability – South Deployment Guide
In this deployment, you install the Virtana CO cluster components into your Kubernetes or OpenShift cluster. These components continuously collect metrics, logs, and Kubernetes metadata and securely forward that telemetry to the Virtana backend. A South deployment is required to enable end-to-end observability for a specific cluster. Without it, the platform cannot discover workloads, export metrics, or collect node and container signals, so dashboards, health status, alerting, and troubleshooting views in the UI can be incomplete or unavailable for that cluster.
Prerequisites
Ensure the following requirements are met before you start the deployment or configuration process:
Cluster-admin access to the target Kubernetes cluster.
Credentials for Docker Hub (or a private registry) and for the Keycloak client used by the CO backend.
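A quick preflight check can confirm these prerequisites before you start. The following is a minimal sketch, not part of the Virtana tooling: the function names missing_tools and check_cluster_admin are hypothetical, and it assumes kubectl and helm are the CLIs you plan to use.

```shell
# Print every required CLI that is not found on PATH (hypothetical helper).
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
  return 0
}

# Ask the API server whether the current identity may perform any verb on any
# resource in all namespaces, which approximates cluster-admin access.
check_cluster_admin() {
  kubectl auth can-i '*' '*' --all-namespaces
}
```

Running missing_tools kubectl helm prints nothing when both tools are installed, and check_cluster_admin prints yes or no for the current kubeconfig context.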
Get South values.yaml
The South values.yaml file contains the tenant-specific and cluster-specific configuration needed to deploy CO South correctly. It also includes Org identifiers, backend endpoints, and any pre-configured module defaults expected for your environment. This ensures that the deployment connects the South components to the correct Virtana tenant and applies the right settings for your selected cluster.
Perform the following steps to get the South values.yaml file:
Open the following URL, https://GLOBAL_VIEW_HOSTNAME/ui.
Log in to Virtana Platform using your org email and password.
Navigate to Container Observability > Cluster.
In the top right of the CO default page, click System Status and select South Deployment Guide.
Download the South values.yaml file by clicking Generate Token to Download YAML. Copy the generated URL and run it on your machine to download the YAML file.
Run the commands provided under Deploy Opscruise, or use the following commands.
helm repo add virtana-repo https://virtana.gitlab.io/helm-charts
helm repo update
helm search repo virtana-repo/virtana-co
Save the file as <ORG_ID>-<CLUSTER_NAME>-virtana-co-values.yaml.
Deploy the South components directly from your terminal using native Helm command-line tools.
helm upgrade --install opscruise-bundle virtana-repo/virtana-co \
  --namespace opscruise \
  --create-namespace \
  -f <ORG_ID>-<CLUSTER_NAME>-virtana-co-values.yaml \
  --version <LATEST_VERSION>
| Field | Description |
|---|---|
| --namespace | Target namespace for all components. |
| --create-namespace | Creates the namespace if absent. |
| --version | Specific chart version to deploy. |
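To review the exact command before running it, the placeholders can be filled in by a small helper. This is only a sketch: south_values_file and south_helm_command are hypothetical names, and the optional fourth argument assumes you want to pass one extra Helm flag such as --dry-run.

```shell
# Build the values file name from the organization ID and cluster name
# (hypothetical helper).
south_values_file() {
  printf '%s-%s-virtana-co-values.yaml\n' "$1" "$2"
}

# Print the Helm command for a given org, cluster, and chart version so it can
# be reviewed before it is run. An optional fourth argument is appended as-is.
south_helm_command() {
  local org_id="$1" cluster="$2" version="$3" extra="${4:-}"
  printf '%s' 'helm upgrade --install opscruise-bundle virtana-repo/virtana-co'
  printf '%s' ' --namespace opscruise --create-namespace'
  printf ' -f %s --version %s' "$(south_values_file "$org_id" "$cluster")" "$version"
  if [ -n "$extra" ]; then
    printf ' %s' "$extra"
  fi
  printf '\n'
}
```

For example, south_helm_command acme prod 1.2.3 --dry-run prints the full command for a hypothetical org acme, cluster prod, and chart version 1.2.3, which you can inspect and then run.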
Create an Argo CD Application to manage the Helm chart declaratively.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: virtana-CLUSTER_NAME-south
  namespace: argocd
  finalizers:
    - resources-finalizer.argocd.argoproj.io
spec:
  project: default
  destination:
    server: https://kubernetes.default.svc
    namespace: opscruise
  source:
    chart: virtana-co
    repoURL: "https://virtana.gitlab.io/helm-charts"
    targetRevision: <LATEST_VERSION>
    helm:
      releaseName: opscruise-bundle
      valueFiles:
        - values.yaml
      values: |
        global:
          gatewayCreds:
            environment:
              DOCKER_SERVER: "https://index.docker.io/v2/"
              DOCKER_USERNAME: "xxxxx"
              DOCKER_PASSWORD: "xxxxx"
              OPSCRUISE_ENDPOINT: "xxxxxxxxxxxxx-xxxxxxxxxxxxx.elb.us-east-2.amazonaws.com:443"
              KEYCLOAK_ENABLED: "true"
              KEYCLOAK_URL: "https://xxxxxx.example.com:443"
              KEYCLOAK_CLIENT_ID: "xxxxxx"
              KEYCLOAK_CLIENT_SECRET: "xxxxxx"
              KEYCLOAK_REALM: "xxxxxx"
              OPSCRUISE_ACCOUNT_ID: "xxxxxx"
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
The following table describes each field in the configuration file.
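| Field | Description | Default value |
|---|---|---|
| apiVersion | The Kubernetes API version for the Argo CD Application Custom Resource. | argoproj.io/v1alpha1 |
| metadata.name | The name of the Argo CD Application object. | virtana-CLUSTER_NAME-south |
| metadata.namespace | The namespace where Argo CD is installed. | argocd |
| metadata.finalizers | Prevents the Application from being deleted until Argo CD has cleaned up the resources it created. | resources-finalizer.argocd.argoproj.io |
| spec.project | The Argo CD Project this Application belongs to. | default |
| spec.destination.server | The Kubernetes API server address for the destination cluster. | https://kubernetes.default.svc |
| spec.source.chart | The Helm chart name to install. | virtana-co |
| spec.source.repoURL | The Helm repository URL hosting the chart. | https://virtana.gitlab.io/helm-charts |
| spec.source.targetRevision | The chart version Argo CD should deploy. | <LATEST_VERSION> |
| spec.source.helm | Helm-specific settings under Argo CD. | - |
| spec.syncPolicy.automated | Enables automatic sync without manual clicking. | - |
| spec.syncPolicy.syncOptions | Tells Argo CD to create the destination namespace if it doesn’t exist. | CreateNamespace=true |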
You can also integrate the deployment into your Infrastructure as Code pipelines by using the Terraform Helm provider.
resource "helm_release" "south" {
  create_namespace = true
  chart            = "virtana-co"
  name             = "opscruise-bundle"
  namespace        = "opscruise"
  repository       = "https://virtana.gitlab.io/helm-charts"
  version          = var.helm_version
  values = [
    templatefile("${path.module}/../values/opscruise-values.yaml", {
      docker_password          = var.docker_password
      docker_username          = var.docker_username
      keycloak_hostname        = var.keycloak_hostname
      opscruise_kafka_endpoint = var.opscruise_kafka_endpoint
      kafka_client_id          = var.kafka_client_id
      kafka_client_secret      = var.kafka_client_secret
      tenant_name              = var.tenant_name
      cluster_name             = var.cluster_name
    })
  ]
}
The following table describes each field in the configuration file.
| Field | Description | Default value |
|---|---|---|
| create_namespace | Creates the opscruise namespace automatically. | true |
| chart, repository, version | Chart identity and version to install. | - |
| name, namespace | Helm release name and namespace. | - |
| values | Rendered values file using Terraform templatefile with variables for credentials and cluster metadata. | - |
Optional settings
Use the following optional settings to pull images from a private registry or to set the resource profile for the South modules.
Using a private image registry
Add the following settings to <ORG_ID>-<CLUSTER_NAME>-virtana-co-values.yaml to pull images from a private image registry.
global:
  useGlobalRepository: true
  globalRepositoryName: example.io
The following table describes each field in the configuration file.
| Field | Description | Default value |
|---|---|---|
| global.useGlobalRepository | Enables a global override of the container image registry for supported components. | false |
| global.globalRepositoryName | Fully qualified registry/repository path. | "" |
Update resource profile
Virtana CO South supports resource profiles to simplify sizing across modules. Instead of manually configuring CPU or memory for every component, you can select a machine_type profile, such as small, medium, or large, and the chart applies the corresponding default resource limits.
Global-level resource profile
Use this when you want a single sizing profile to apply across CO South components, unless a module overrides it.
global:
  machine_type: "small"
The global.machine_type setting selects the default sizing profile used by modules that do not define their own machine_type. Valid values are small, medium, and large.
Module-level override
Use a module-level override when a particular component needs different resources from those provided by the global profile. To change the machine_type for a specific module, add it to that module's section.
loggw-loki:
  machine_type: "large"
The loggw-loki.machine_type setting overrides global.machine_type for that module. Valid values are small, medium, and large.
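The global and module-level settings can also be kept in a small override file and passed to Helm as an additional -f argument after the main values file, since Helm gives later files precedence. The sketch below generates such a file; is_profile and profile_overrides are hypothetical helper names, and the MODULE=PROFILE argument format is an assumption made for illustration.

```shell
# Succeed only for the three documented profile names.
is_profile() {
  case "$1" in
    small|medium|large) return 0 ;;
    *) return 1 ;;
  esac
}

# Print a YAML override fragment.
# Usage: profile_overrides GLOBAL_PROFILE [MODULE=PROFILE ...]
profile_overrides() {
  [ "$#" -ge 1 ] || return 1
  local global_profile="$1" pair
  shift
  # Validate everything first so no partial output is produced.
  is_profile "$global_profile" || return 1
  for pair in "$@"; do
    is_profile "${pair#*=}" || return 1
  done
  printf 'global:\n  machine_type: "%s"\n' "$global_profile"
  for pair in "$@"; do
    printf '%s:\n  machine_type: "%s"\n' "${pair%%=*}" "${pair#*=}"
  done
}
```

For example, profile_overrides small loggw-loki=large > profile-overrides.yaml writes a file that sets the global profile to small and the loggw-loki profile to large.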
Custom resource values for a module
When the predefined resource profiles (small, medium, or large) do not meet the requirements of a specific module, you can define custom resource values. To do this, configure the machine_type and resources fields under the corresponding machine type section (small_machine, medium_machine, or large_machine). Ensure that the value of machine_type matches the machine type section where the custom resources are defined.
loggw-loki:
  machine_type: "large"
  large_machine:
    resources:
      requests:
        memory: 2Gi
        cpu: 1
      limits:
        memory: 4Gi
        cpu: 2
    xss: -Xss4m
    xms: -Xms1500m
    xmx: -Xmx3500m
The following table describes each field in the configuration file.
| Field | Description | Default value |
|---|---|---|
| machine_type | Chooses which profile is active for this module. | |
| large_machine | A profile-specific override block for the large profile. | |
| resources | Kubernetes resource configuration to apply to the module’s pods. | |
| resources.requests | Guaranteed minimum resources the scheduler uses for placement. | |
| resources.limits | The maximum resources the container is allowed to consume. | |
| xss, xms, xmx | JVM tuning options, such as thread stack size, initial heap, and max heap. These apply only to Java Gateways such as loggw, tracegw, gcpgw, and azuregw. | |
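When you set custom JVM options, it is generally advisable to keep the maximum heap below the container memory limit, because a JVM also uses non-heap memory and a container that exceeds its limit is terminated. In the example above, -Xmx3500m sits below the 4Gi (4096 MiB) limit. The following sketch checks that relationship; to_mib and heap_fits_limit are hypothetical helpers that only understand the Mi and Gi suffixes and the -Xmx<N>m form.

```shell
# Convert a Kubernetes memory quantity with an Mi or Gi suffix to MiB.
to_mib() {
  case "$1" in
    *Gi) echo $(( ${1%Gi} * 1024 )) ;;
    *Mi) echo "${1%Mi}" ;;
    *) return 1 ;;
  esac
}

# Succeed when the -Xmx heap (for example -Xmx3500m) is below the memory limit.
heap_fits_limit() {
  local xmx_mb="${1#-Xmx}" limit_mib
  xmx_mb="${xmx_mb%m}"
  limit_mib="$(to_mib "$2")" || return 1
  [ "$xmx_mb" -lt "$limit_mib" ]
}
```

For example, heap_fits_limit -Xmx3500m 4Gi succeeds, while the same heap against a 2Gi limit fails.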
OpenShift deployment specifics
OpenShift-specific deployment steps are only required if your Kubernetes platform is Red Hat OpenShift, because OpenShift enforces additional security and runtime constraints compared to upstream Kubernetes. In particular, OpenShift uses Security Context Constraints (SCC) that can prevent CO South components from running with the permissions they need unless the correct SCCs are granted to their service accounts.
Follow the steps below to prepare the cluster before running the normal South deployment flow.
Download the virtana-co-values.yaml file generated by the Virtana UI. This file includes cluster-specific settings required for the deployment.
Update the node-exporter port to 9200 instead of the default 9100 in virtana-co-values.yaml.
global:
  nodeExporterPort: 9200
The global.nodeExporterPort setting sets the port used by Node Exporter.
Enter the following command to create the South namespace if it is not already present, where opscruise is the namespace in which the CO South components are deployed.
kubectl create ns opscruise
Grant required SCC permissions to the CO South service accounts. OpenShift uses SCCs to control privileges.
The following commands bind SCCs to the service accounts used by South modules so pods can run correctly.
oc adm policy add-scc-to-user anyuid -z k8sgw-service-account -n opscruise
oc adm policy add-scc-to-user anyuid -z loggw-service-account -n opscruise
oc adm policy add-scc-to-user anyuid -z promgw-service-account -n opscruise
oc adm policy add-scc-to-user anyuid -z prometheus-service-account -n opscruise
oc adm policy add-scc-to-user privileged -z loggw-service-account -n opscruise
oc adm policy add-scc-to-user privileged -z prometheus-service-account -n opscruise
oc adm policy add-scc-to-user privileged -z ne-service-account -n opscruise
oc adm policy add-scc-to-user privileged -z opscruise-bundle-loki -n opscruise
oc adm policy add-scc-to-user privileged -z opscruise-bundle-promtail -n opscruise
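If you script this step, the same nine bindings can be expressed as data and applied in a loop. This is a sketch under the assumption that the service account names above are unchanged in your release; south_scc_bindings and apply_south_scc_bindings are hypothetical helper names.

```shell
# Print the SCC and service account pairs listed above, one pair per line.
south_scc_bindings() {
  cat <<'EOF'
anyuid k8sgw-service-account
anyuid loggw-service-account
anyuid promgw-service-account
anyuid prometheus-service-account
privileged loggw-service-account
privileged prometheus-service-account
privileged ne-service-account
privileged opscruise-bundle-loki
privileged opscruise-bundle-promtail
EOF
}

# Apply every binding in the given namespace (defaults to opscruise).
apply_south_scc_bindings() {
  local namespace="${1:-opscruise}" scc account
  south_scc_bindings | while read -r scc account; do
    # Redirect stdin so oc cannot consume the remaining lines of the list.
    oc adm policy add-scc-to-user "$scc" -z "$account" -n "$namespace" </dev/null
  done
}
```

Calling apply_south_scc_bindings with no arguments targets the opscruise namespace; pass a different namespace as the first argument if your deployment uses one.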
After completing the OpenShift-specific prerequisite steps above, deploy CO South using your preferred method, such as Helm CLI, Argo CD, or Terraform.
Create secrets manually
The Virtana CO South components require secret values, such as Keycloak client credentials and Docker registry credentials, to authenticate to the Opscruise backend and pull images. If these secrets are not provided through the values file, they must already exist in the cluster namespace with the expected names.
Create the secrets manually before deploying South, especially when your organization’s security policy requires secrets to be managed outside Helm, such as through Vault, Fuze, or a dedicated secrets pipeline.
Create a Keycloak client secret oc-kc-secret
If secret_source is set to none, you can create the Keycloak client secret oc-kc-secret manually. The secret contains KEYCLOAK_CLIENT_SECRET, KEYCLOAK_CLIENT_ID, and KEYCLOAK_CLIENT_TOKEN.
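The KEYCLOAK_CLIENT_TOKEN value is the word Basic followed by the Base64 encoding of the client ID and secret joined by a colon. The commands in this section compute it with platform-specific base64 flags (-w 0 and -b 0). As a portable sketch, the same value can be produced by encoding with plain base64 and stripping any line breaks; keycloak_basic_token is a hypothetical helper name.

```shell
# Build the Basic token from a client ID and secret without relying on
# platform-specific base64 flags: encode, then strip any line wraps.
keycloak_basic_token() {
  printf 'Basic %s\n' "$(printf '%s' "$1:$2" | base64 | tr -d '\n')"
}
```

For example, keycloak_basic_token "${KEYCLOAK_CLIENT_ID}" "${KEYCLOAK_CLIENT_SECRET}" prints the value to store under KEYCLOAK_CLIENT_TOKEN.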
On Linux (GNU base64, which uses -w 0 to disable line wrapping):
export KEYCLOAK_CLIENT_ID="xxxx"
export KEYCLOAK_CLIENT_SECRET="xxxx"
kubectl create secret generic oc-kc-secret \
  --from-literal=KEYCLOAK_CLIENT_SECRET="${KEYCLOAK_CLIENT_SECRET}" \
  --from-literal=KEYCLOAK_CLIENT_ID="${KEYCLOAK_CLIENT_ID}" \
  --from-literal=KEYCLOAK_CLIENT_TOKEN="Basic $(echo -n "${KEYCLOAK_CLIENT_ID}:${KEYCLOAK_CLIENT_SECRET}" | base64 -w 0)" \
  -n opscruise
cat <<EOF
{
  "KEYCLOAK_CLIENT_SECRET": "${KEYCLOAK_CLIENT_SECRET}",
  "KEYCLOAK_CLIENT_ID": "${KEYCLOAK_CLIENT_ID}",
  "KEYCLOAK_CLIENT_TOKEN": "Basic $(echo -n "${KEYCLOAK_CLIENT_ID}:${KEYCLOAK_CLIENT_SECRET}" | base64 -w 0)"
}
EOF
On macOS (BSD base64, which uses -b 0 instead):
export KEYCLOAK_CLIENT_ID="xxxx"
export KEYCLOAK_CLIENT_SECRET="xxxx"
kubectl create secret generic oc-kc-secret \
  --from-literal=KEYCLOAK_CLIENT_SECRET="${KEYCLOAK_CLIENT_SECRET}" \
  --from-literal=KEYCLOAK_CLIENT_ID="${KEYCLOAK_CLIENT_ID}" \
  --from-literal=KEYCLOAK_CLIENT_TOKEN="Basic $(echo -n "${KEYCLOAK_CLIENT_ID}:${KEYCLOAK_CLIENT_SECRET}" | base64 -b 0)" \
  -n opscruise
cat <<EOF
{
  "KEYCLOAK_CLIENT_SECRET": "${KEYCLOAK_CLIENT_SECRET}",
  "KEYCLOAK_CLIENT_ID": "${KEYCLOAK_CLIENT_ID}",
  "KEYCLOAK_CLIENT_TOKEN": "Basic $(echo -n "${KEYCLOAK_CLIENT_ID}:${KEYCLOAK_CLIENT_SECRET}" | base64 -b 0)"
}
EOF
Create Docker registry credentials
Enter the following command to create a secret directly in Kubernetes.
export DOCKER_USERNAME="xxxx"
export DOCKER_PASSWORD="xxxx"
kubectl create secret docker-registry oc-ns-docker-creds \
--docker-server=https://index.docker.io/v2/ \
--docker-username="${DOCKER_USERNAME}" \
--docker-password="${DOCKER_PASSWORD}" \
-n opscruise
Enter the following command to create a secret in Fuze or Vault using JSON.
registry_server="https://index.docker.io/v2/"
registry_username="xxxx"
registry_password="xxxx"
# Quote the credentials and strip line wraps so long values stay on one line.
encoded_registry_auth=$(printf '%s' "${registry_username}:${registry_password}" | base64 | tr -d '\n')
DOCKER_CONFIG=$(echo -n "{\"auths\": {\"${registry_server}\": {\"auth\": \"${encoded_registry_auth}\"}}}")
cat <<EOF
{
"dockerconfigjson": "${DOCKER_CONFIG}"
}
EOF
Environment-specific settings
This section covers three optional configurations for CO South deployments: enabling TLS and Basic Authentication for Prometheus, deploying the Zenoss Kubernetes Agent, and enabling GKE Autopilot compatibility. Apply only the subsections that match your environment or requirements by updating your virtana-co-values.yaml.
Prometheus: Enable TLS and Basic Authentication
Use this configuration when you want Prometheus endpoints protected using HTTPS (TLS) and Basic Authentication.
prometheus:
  security:
    authentication:
      enabled: true
      password: prom123
    tls:
      enabled: true
      selfSignedCerts: true
      hostname: ""
The following table describes each field in the configuration file.
| Field | Description | Default value |
|---|---|---|
| prometheus | Configuration section for the Prometheus component deployed as part of CO South. | |
| authentication.enabled | Enables or disables Basic Authentication. | |
| authentication.password | Password used for Basic Auth. | |
| tls.enabled | Enables or disables TLS (HTTPS). | |
| selfSignedCerts | When true, self-signed certificates are used for TLS. | |
| hostname | Optional DNS name for Prometheus. | "" |
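The password prom123 in the example is a placeholder. One way to generate a stronger value for the Basic Authentication password is sketched below; random_password is a hypothetical helper, and it assumes /dev/urandom and base64 are available on your workstation.

```shell
# Print a random 32-character password: 24 random bytes encode to exactly
# 32 Base64 characters with no padding.
random_password() {
  head -c 24 /dev/urandom | base64 | tr -d '\n'
  printf '\n'
}
```

Quote the generated value in the values file, because Base64 output can contain characters such as + and /.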
Deploy the Zenoss Kubernetes agent
Enable this when you want CO South to deploy the Zenoss Kubernetes Agent and connect it to your Zenoss environment.
zenoss-agent-kubernetes:
  enabled: true
  zenoss:
    clusterName: ""
    address: ""
    apiKey: ""
The following table describes each field in the configuration file.
| Field | Description | Default value |
|---|---|---|
| enabled | Turns the Zenoss agent deployment on or off. | |
| zenoss.clusterName | Cluster identifier or name as it should appear in Zenoss. | "" |
| zenoss.address | Zenoss endpoint or address used by the agent to communicate. | "" |
| zenoss.apiKey | API key or token used to authenticate to Zenoss. | "" |
GKE Autopilot support
Enable this only when your CO South target cluster is a GKE Autopilot cluster (as opposed to standard GKE). The default value of gkeAutoPilot is false.
global:
  gkeAutoPilot: false
When gkeAutoPilot is false, the chart assumes standard GKE behavior. When you set it to true, the chart uses Autopilot-compatible behavior.
Container Observability South base values file for reference
This section provides a curated snapshot of the most important virtana-co-values.yaml settings used to deploy and operate CO South, focusing on the parameters you most commonly review or tune during installation and troubleshooting. Use it as a quick reference to understand how global credentials, registry settings, backend connectivity, scheduling controls (tolerations or affinity), and key module configurations, such as k8sgw, promgw, and Prometheus, fit together. Adjust only what your environment requires, and keep the UI-generated tenant and cluster values intact.
##### Opscruise credentials #####
global:
opscruiseChartVersion: TO_BE_DEFINED
secret_source: "valuesfile"
imagePullSecrets:
- name: oc-ns-docker-creds
##### AWS Credentials #####
awsCredentials:
regions:
- us-east-1
aws_access_key_id: aws_access_key_id
aws_secret_access_key: aws_secret_access_key
roleArn: ""
gkeAutoPilot: false
##### gateway Credentials #####
gatewayCreds:
environment:
DOCKER_SERVER: "https://index.docker.io/v1/"
DOCKER_USERNAME: "<DOCKER_USERNAME>"
DOCKER_PASSWORD: "<DOCKER_PASSWORD>"
DOCKER_EMAIL: "<DOCKER_EMAIL>"
OPSCRUISE_ENDPOINT: "<OPSCRUISE_BACKEND_KAFKA_ENDPOINT>:443"
KEYCLOAK_ENABLED: "true"
KEYCLOAK_URL: "https://auth.opscruise.io:443"
KEYCLOAK_CLIENT_ID: "<KAFKA_CLIENT_ID>"
KEYCLOAK_CLIENT_SECRET: "<KEYCLOAK_CLIENT_SECRET>"
KEYCLOAK_REALM: "<KEYCLOAK_REALM>"
OPSCRUISE_ACCOUNT_ID: "<KEYCLOAK_CLUSTERID>"
externalCadvisor: false
nodeExporterPort: 9100
useGlobalRepository: false
globalRepositoryName: ""
k8sClusterFqdn: "cluster.local"
metricScraper: "prometheus"
metricScrapeInterval: 60
machine_type: "small"
## namespace filtering ##
# namespaceFiltering:
# namespaceAllowList:
# - kube-system
# - collectors
# - opscruise
## allow whitelisted pod labels ##
# whitelistedPodLabels:
# - app
tolerations:
- key: opscruise
effect: NoSchedule
operator: Exists
daemonsetTolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
operator: Exists
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- preference:
matchExpressions:
- key: opscruise
operator: In
values:
- "true"
weight: 1
daemonsetAffinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- preference:
matchExpressions:
- key: opscruise
operator: In
values:
- "true"
weight: 1
# awsgw:
# enabled: false
# azuregw:
# enabled: false
# gcpgw:
# enabled: false
# k8sgw:
# enabled: true
# promgw:
# enabled: true
# loggw-loki:
# enabled: true
# tracegw:
# enabled: false
# eventgw:
# enabled: false
# trace-router:
# enabled: false
# opscruise-node-exporter:
# enabled: false
# opscruise-node-exporter-new:
# enabled: true
# otel-metric-collector:
# enabled: true
# kube-state-metrics:
# enabled: true
# prometheus:
# enabled: true
# loki-stack:
# enabled: true
# prometheus-yace-exporter:
# enabled: false
# jaeger:
# enabled: false
# jaeger-operator:
# enabled: false
# prometheus-postgres-exporter:
# enabled: false
# prometheus-mongodb-exporter:
# enabled: false
# kafka-exporter:
# enabled: false
# fluent-bit:
# enabled: false
# prometheus-mysql-exporter:
# enabled: false
# influxdb-exporter:
# enabled: false
# x509-certificate-exporter:
# enabled: false
# prometheus-redis-exporter:
# enabled: false
# nginx-prometheus-exporter:
# enabled: false
# beyla:
# enabled: false
# alloy:
# enabled: true
# otel-trace-collector:
# enabled: false
##### Awsgw configs #####
awsgw:
enabled: false
logLevel: "info"
labels:
#key: value
annotations:
#key: value
tolerations:
# - key: node-role.kubernetes.io/master
# effect: NoSchedule
affinity:
# nodeAffinity:
# preferredDuringSchedulingIgnoredDuringExecution:
# - preference:
# matchExpressions:
# - key: opscruise
# operator: In
# values:
# - "true"
# weight: 1
priorityClassName: ""
# Resources
resources:
limits:
cpu: 500m
memory: 250Mi
requests:
cpu: 50m
memory: 50Mi
##### K8sgw configs #####
k8sgw:
logLevel: "info"
labels:
#key: value
annotations:
#key: value
tolerations:
# - key: node-role.kubernetes.io/master
# effect: NoSchedule
affinity:
# nodeAffinity:
# preferredDuringSchedulingIgnoredDuringExecution:
# - preference:
# matchExpressions:
# - key: opscruise
# operator: In
# values:
# - "true"
# weight: 1
priorityClassName: ""
# Resources
resources:
limits:
cpu: 500m
memory: 500Mi
requests:
cpu: 50m
memory: 50Mi
## namespace filtering
# configMap:
# config:
# kubernetes:
# namespace_allow_list:
# - kube-system
# - collectors
# - opscruise
##### Promgw configs #####
promgw:
logLevel: "info"
labels:
#key: value
annotations:
#key: value
tolerations:
# - key: node-role.kubernetes.io/master
# effect: NoSchedule
affinity:
# nodeAffinity:
# preferredDuringSchedulingIgnoredDuringExecution:
# - preference:
# matchExpressions:
# - key: opscruise
# operator: In
# values:
# - "true"
# weight: 1
priorityClassName: ""
# Resources
resources:
limits:
cpu: 500m
memory: 300Mi
requests:
cpu: 50m
memory: 50Mi
##### Loggw config #####
loggw-loki:
enabled: true
logLevel: "INFO"
config:
oauthAcceptUnsecureServer: "true"
jgateway:
lokiHost: "opscruise-bundle-loki.opscruise.svc.K8S_CLUSTER_FQDN:3100"
labels:
#key: value
annotations:
#key: value
tolerations:
# - key: node-role.kubernetes.io/master
# effect: NoSchedule
affinity:
# nodeAffinity:
# preferredDuringSchedulingIgnoredDuringExecution:
# - preference:
# matchExpressions:
# - key: opscruise
# operator: In
# values:
# - "true"
# weight: 1
priorityClassName: ""
# Resources
resources:
limits:
cpu: 500m
memory: 1024Mi
requests:
cpu: 50m
memory: 256Mi
##### Azuregw configs #####
azuregw:
enabled: false
logLevel: "INFO"
azureCredentials:
- azureauth_clientId: azureauth_clientId
azureauth_tenantId: azureauth_tenantId
azureauth_clientSecret: azureauth_clientSecret
azureauth_subId: azureauth_subId
name: "credential_name"
labels:
#key: value
annotations:
#key: value
tolerations:
# - key: node-role.kubernetes.io/master
# effect: NoSchedule
affinity:
# nodeAffinity:
# preferredDuringSchedulingIgnoredDuringExecution:
# - preference:
# matchExpressions:
# - key: opscruise
# operator: In
# values:
# - "true"
# weight: 1
priorityClassName: ""
#Resources
resources:
limits:
cpu: 500m
memory: 1024Mi
requests:
cpu: 50m
memory: 256Mi
##### Gcpgw configs #####
gcpgw:
enabled: false
logLevel: "INFO"
labels:
#key: value
annotations:
#key: value
tolerations:
# - key: node-role.kubernetes.io/master
# effect: NoSchedule
affinity:
# nodeAffinity:
# preferredDuringSchedulingIgnoredDuringExecution:
# - preference:
# matchExpressions:
# - key: opscruise
# operator: In
# values:
# - "true"
# weight: 1
priorityClassName: ""
#Resources
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 50m
memory: 128Mi
##### Tracegw configs #####
tracegw:
enabled: false
logLevel: "INFO"
persistentType: ebs
storageClassName: ""
slo_storage_size: 20Gi
labels:
#key: value
annotations:
#key: value
tolerations:
# - key: node-role.kubernetes.io/master
# effect: NoSchedule
affinity:
# nodeAffinity:
# preferredDuringSchedulingIgnoredDuringExecution:
# - preference:
# matchExpressions:
# - key: opscruise
# operator: In
# values:
# - "true"
# weight: 1
priorityClassName: ""
service:
type: ClusterIP
#Resources
resources:
limits:
cpu: 500m
memory: 3Gi
requests:
cpu: 50m
memory: 128Mi
# env config
config:
tracegw:
filterTagsKey: notag
filterTagsValue: notag
traceDataFromJeager: "true"
mode: poll #or listen
##### eventgw #####
eventgw:
enabled: false
labels:
#key: value
annotations:
#key: value
tolerations:
# - key: node-role.kubernetes.io/master
# effect: NoSchedule
affinity:
# nodeAffinity:
# preferredDuringSchedulingIgnoredDuringExecution:
# - preference:
# matchExpressions:
# - key: opscruise
# operator: In
# values:
# - "true"
# weight: 1
priorityClassName: ""
##### trace-router #####
trace-router:
enabled: false
labels:
#key: value
annotations:
#key: value
tolerations:
# - key: node-role.kubernetes.io/master
# effect: NoSchedule
affinity:
# nodeAffinity:
# preferredDuringSchedulingIgnoredDuringExecution:
# - preference:
# matchExpressions:
# - key: opscruise
# operator: In
# values:
# - "true"
# weight: 1
priorityClassName: ""
service:
type: ClusterIP
#Resources
resources:
limits:
cpu: 500m
memory: 3Gi
requests:
cpu: 50m
memory: 128Mi
##### Node-Exporter-New configs #####
opscruise-node-exporter-new:
## Only lowercase accepted
#logLevel: "info" - Keep the logLevel as info for GKE Autopilot clusters
labels:
#key: value
annotations:
#key: value
tolerations:
# - key: node-role.kubernetes.io/master
# effect: NoSchedule
# operator: Exists
priorityClassName: ""
#args:
#btfFilePath: path-to-btf-file
#kconfigFilePath: path-to-kconfig
#customArgs:
#- "--collector.ocflowbpfcollector.skip-bpf-verification"
#- "--collector.ocflowbpfcollector.enable-dns-tracking"
#- "--no-collector.ocflowbpfcollector.retain-original-public-ip-sources"
# publicIPAggregationSubnetPatterns:
# - pattern: "172.16.0.86/32"
# aggregate_to: "1.2.3.4"
# aggregate_name: ""
# - pattern: "1.1.1.1/0" 1"
# aggregate_name: ""
##### BEYLA #####
beyla:
# podLabels:
# key: value
annotations:
# #key: value
# podAnnotations:
# key: value
tolerations:
# # - key: node-role.kubernetes.io/master
# # effect: NoSchedule
# # operator: Exists
priorityClassName: ""
affinity:
##### OTEL Metric Collector #####
otel-metric-collector:
labels:
#key: value
annotations:
#key: value
tolerations:
# - key: node-role.kubernetes.io/master
# effect: NoSchedule
# operator: Exists
priorityClassName: ""
# additional_receivers_configs:
# prometheus:
# config:
# scrape_configs:
# - job_name: new-job-exporter
# static_configs:
# - targets:
# - '172.16.71.143:9256'
# - '172.16.73.102:9256'
# scheme: http
# tls_config:
# insecure_skip_verify: true
##### cadvisor configs #####
cadvisor:
labels:
#key: value
annotations:
#key: value
tolerations:
# - key: node-role.kubernetes.io/master
# effect: NoSchedule
# operator: Exists
priorityClassName: ""
affinity:
# nodeAffinity:
# preferredDuringSchedulingIgnoredDuringExecution:
# - preference:
# matchExpressions:
# - key: opscruise
# operator: In
# values:
# - "true"
# weight: 1
hostNetwork: false
#Resources
resources:
limits:
cpu: 300m
memory: 512Mi
requests:
cpu: 50m
memory: 128Mi
# customArgs:
# - --enable_load_reader=false
##### KSM configs #####
kube-state-metrics:
labels:
#key: value
annotations:
#key: value
tolerations:
# - key: node-role.kubernetes.io/master
# effect: NoSchedule
affinity:
# nodeAffinity:
# preferredDuringSchedulingIgnoredDuringExecution:
# - preference:
# matchExpressions:
# - key: opscruise
# operator: In
# values:
# - "true"
# weight: 1
priorityClassName: ""
#Resources
resources:
requests:
cpu: 50m
memory: 30Mi
limits:
cpu: 300m
memory: 250Mi
##### prometheus configs #####
prometheus:
# logLevel: info
enableIstio: false
enablePersistent: false
prometheusEcsDiscovery:
enabled: false
awsCredentials:
regions:
- us-east-1
aws_access_key_id: aws_access_key_id
aws_secret_access_key: aws_secret_access_key
roleArn: ""
# scrape_interval: 30
labels:
#key: value
annotations:
#key: value
tolerations:
# - key: node-role.kubernetes.io/<node>
# effect: NoSchedule
affinity:
# nodeAffinity:
# preferredDuringSchedulingIgnoredDuringExecution:
# - preference:
# matchExpressions:
# - key: opscruise
# operator: In
# values:
# - "true"
# weight: 1
priorityClassName: ""
#Resources
resources:
requests:
cpu: 50m
memory: 1000Mi
limits:
memory: 5Gi
nonIstioConfigMap:
enabledScrapeJobs:
- oc-kubernetes-pods
- oc-app-exporters
- kubernetes-nodes
- kubernetes-nodes-cadvisor
- kubernetes-apiservers
- kube-scheduler
additionalScrapeConfigs:
# - job_name: ecs-task-targets
# file_sd_configs:
# - files: ['/mnt/*.yml']
# refresh_interval: 1m
# - job_name: kubernetes-nodes-cadvisor
# scheme: https
# tls_config:
# ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
# bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
# kubernetes_sd_configs:
# - role: node
# relabel_configs:
# - action: labelmap
# regex: __meta_kubernetes_node_label_(.+)
# - target_label: __address__
# replacement: kubernetes.default.svc:443
# - source_labels: [__meta_kubernetes_node_name]
# regex: (.+)
# target_label: __metrics_path__
# replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
# metric_relabel_configs:
# - action: replace
# source_labels: [id]
# regex: '^/machine.slice/machine-rkt\x2d([^\]+)\.+/([^/]+).service$'
# target_label: rkt_container_name
# replacement: '${2}-${1}'
# - action: replace
# source_labels: [id]
# regex: '^/system.slice/(.+).service$'
# target_label: systemd_service_name
# replacement: '${1}'
# - action: replace
# source_labels: [container]
# regex: (.*)
# target_label: container_label_io_kubernetes_container_name
# replacement: ${1}
# - action: replace
# source_labels: [pod]
# regex: (.*)
# target_label: container_label_io_kubernetes_pod_name
# replacement: ${1}
# - action: replace
# source_labels: [namespace]
# regex: (.*)
# target_label: container_label_io_kubernetes_pod_namespace
# replacement: ${1}
# - action: replace
# source_labels: [id]
# regex: '.+?pod([^\.g-z]+?)[\.\/\s](.*)'
# target_label: container_label_io_kubernetes_pod_uid
# replacement: ${1}
##### Prometheus Postgres Exporter #####
prometheus-postgres-exporter:
enabled: false
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 50m
memory: 128Mi
config:
datasource:
# Specify either datasource or datasourceSecret
host: "<POSTGRESQL_SERVICENAME.NAMESPACE.svc.cluster.local>"
user: "postgres_exporter"
# Only one of password and passwordSecret can be specified
password: "<POSTGRES_EXPORTER_PASSWORD>"
# Specify passwordSecret if DB password is stored in secret.
passwordSecret: {}
# name: <Secret name>
# key: <Password key inside secret>
database: "<DB_NAME>"
sslmode: disable
autoDiscoverDatabases: false
excludeDatabases: []
includeDatabases: []
tolerations:
# - key: node-role.kubernetes.io/<node>
# effect: NoSchedule
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- preference:
matchExpressions:
- key: opscruise
operator: In
values:
- "true"
weight: 1
##### Prometheus MongoDB Exporter #####
prometheus-mongodb-exporter:
enabled: false
mongodb:
uri: "mongodb://${USERNAME}:${PASSWORD}@<SERVICE_NAME>.<NAMESPACE>.svc.cluster.local"
existingSecret:
name: ""
key: "mongodb-uri"
resources:
limits:
cpu: 250m
memory: 192Mi
requests:
cpu: 50m
memory: 128Mi
tolerations:
# - key: node-role.kubernetes.io/<node>
# effect: NoSchedule
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- preference:
matchExpressions:
- key: opscruise
operator: In
values:
- "true"
weight: 1
##### Kafka Exporter #####
kafka-exporter:
enabled: false
args:
- --kafka.server=<KAFKA_SERVICE>.<NAMESPACE>.svc.cluster.local:<KAFKA_SERVICE_PORT>
- --zookeeper.server=<ZOOKEEPER_SERVICE>.<NAMESPACE>.svc.cluster.local:<ZOOKEEPER_SERVICE_PORT>
mutiple_kafka_zookeepers:
# - <KAFKA_SERVICE_1>.<NAMESPACE>.svc.cluster.local:<KAFKA_SERVICE_PORT>,<ZOOKEEPER_SERVICE_1>.<NAMESPACE>.svc.cluster.local:<ZOOKEEPER_SERVICE_PORT>
# - <KAFKA_SERVICE_2>.<NAMESPACE>.svc.cluster.local:<KAFKA_SERVICE_PORT>,<ZOOKEEPER_SERVICE_2>.<NAMESPACE>.svc.cluster.local:<ZOOKEEPER_SERVICE_PORT>
tolerations:
# - key: node-role.kubernetes.io/<node>
# effect: NoSchedule
affinity:
# nodeAffinity:
# preferredDuringSchedulingIgnoredDuringExecution:
# - preference:
# matchExpressions:
# - key: opscruise
# operator: In
# values:
# - "true"
# weight: 1
resources:
requests:
cpu: 50m
memory: 256Mi
limits:
cpu: "0.5"
memory: 256Mi
##### Prometheus MYSQL Exporter #####
prometheus-mysql-exporter:
enabled: false
resources:
limits:
cpu: 100m
memory: 200Mi
requests:
cpu: 50m
memory: 128Mi
mysql:
db: "<DB_NAME>"
host: "<MYSQL_SERVICE>.<NAMESPACE>.svc.cluster.local"
param: ""
# If "existingPasswordSecret" is specified, "pass" can be ignored
pass: ""
port: 3306
user: "mysql_exporter"
# If "pass" is specified, "existingPasswordSecret" can be ignored
existingPasswordSecret:
name: ""
key: ""
tolerations:
# - key: node-role.kubernetes.io/<node>
# effect: NoSchedule
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- preference:
matchExpressions:
- key: opscruise
operator: In
values:
- "true"
weight: 1
##### InfluxDB Exporter #####
influxdb-exporter:
enabled: false
# common_dns: If InfluxDB is running within cluster but different namespace/externally on a VM and is accessible with DNS
# external_ip: If InfluxDB is running externally on a VM and is accessible with IP address only
endpoint_type: common_dns
common_dns_name: ""
common_dns_port: ""
external_ip_address: ""
external_ip_port: ""
serviceLabels: {}
annotations: {}
##### Loki-stack Promtail Integration #####
loki-stack:
promtail:
tolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
resources:
limits:
cpu: 200m
memory: 512Mi
requests:
cpu: 50m
memory: 128Mi
pipelineStages:
- docker: {}
- replace:
# Example to remove: "2025-09-25T07:18:51.353495749Z stderr F "
expression: '(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z (?:stdout|stderr) [A-Z] )'
replace: ''
- multiline:
# Combine multiple patterns for detecting the first line of a multiline log:
# Pattern 1: ISO8601 with milliseconds and Zulu timezone (e.g., 2025-09-25T07:18:51.353Z)
# Pattern 2: Date + time without T separator (e.g., 2025-09-25 07:18:51)
# Pattern 3: Date + time with space separator enclosed in square bracket (e.g., [2025-09-25 07:18:51])
# Pattern 4: Custom prefix starting with "Unexpected character" (example pattern)
# Pattern 5: Matches log level prefixes like "info: " or "error: "
# Pattern 6: Identify zero-width space (not visible space) as first line of a multiline block.
firstline: '(^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)|(^\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2})|(^\[\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}\])|(^\w{10} \w{9} \(\?\) \w{2} \w{8} \d{1,2})|(^(?:info|error): )|(^\x{200B}\[)'
max_wait_time: 3s
# affinity:
# nodeAffinity:
# preferredDuringSchedulingIgnoredDuringExecution:
# - preference:
# matchExpressions:
# - key: opscruise
# operator: In
# values:
# - "true"
# weight: 1
# config:
# client:
# external_labels:
# cluster_name: CLUSTER_NAME
extraVolumes:
- name: varlog
hostPath:
path: /var/log
extraVolumeMounts:
- name: varlog
mountPath: /var/log
readOnly: true
extraScrapeConfigs:
- job_name: kubernetes-nodes-debian
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- replacement: kubelet-logs
target_label: namespace
- replacement: /var/log/syslog
target_label: __path__
- source_labels: [kubernetes_io_hostname]
target_label: host
- job_name: kubernetes-nodes-redhat
kubernetes_sd_configs:
- role: node
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- replacement: kubelet-logs
target_label: namespace
- replacement: /var/log/messages
target_label: __path__
- source_labels: [kubernetes_io_hostname]
target_label: host
loki:
resources:
limits:
cpu: 300m
memory: 4Gi
requests:
cpu: 50m
memory: 512Mi
# config:
# limits_config:
# reject_old_samples_max_age: 24h
# table_manager:
# retention_deletes_enabled: true
# retention_period: 24h
# tolerations:
# - key: node-role.kubernetes.io/<node>
# effect: NoSchedule
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- preference:
matchExpressions:
- key: opscruise
operator: In
values:
- "true"
weight: 1
##### Prometheus Redis Exporter #####
prometheus-redis-exporter:
enabled: false
redisAddress: redis://<REDIS_IP/FQDN>:6379
tolerations:
# - key: node-role.kubernetes.io/<node>
# effect: NoSchedule
affinity:
nodeAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- preference:
matchExpressions:
- key: opscruise
operator: In
values:
- "true"
weight: 1
##### Nginx Prometheus Exporter #####
nginx-prometheus-exporter:
enabled: false
args:
- -nginx.scrape-uri=http://NGINX_ENDPOINT:PORT/METRIC_ENDPOINT
tolerations:
# - key: node-role.kubernetes.io/<node>
# effect: NoSchedule
affinity:
# nodeAffinity:
# preferredDuringSchedulingIgnoredDuringExecution:
# - preference:
# matchExpressions:
# - key: opscruise
# operator: In
# values:
# - "true"
# weight: 1
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 20m
memory: 128Mi
##### prometheus-yace-exporter #####
prometheus-yace-exporter:
enabled: false
image:
repository: ghcr.io/nerdswords/yet-another-cloudwatch-exporter
tag: v0.61.2
pullPolicy: IfNotPresent
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 20m
memory: 128Mi
Global Configuration
The global section is the foundation of the CO South values file. It defines site-wide defaults, such as chart versioning, secret management strategy, image pull credentials, cluster type, metric collection settings, and resource sizing profiles, that are inherited by all modules unless explicitly overridden at the individual module level.
Chart version and secret source
The global section contains settings that apply across all deployed modules unless overridden at the individual module level.
global:
  opscruiseChartVersion: TO_BE_DEFINED
  secret_source: "valuesfile"
  imagePullSecrets:
    - name: oc-ns-docker-creds
  gkeAutoPilot: false
  externalCadvisor: false
  nodeExporterPort: 9100
  useGlobalRepository: false
  globalRepositoryName: ""
  k8sClusterFqdn: "cluster.local"
  metricScraper: "prometheus"
  metricScrapeInterval: 60
  machine_type: "small"

| Field | Description | Default value |
|---|---|---|
| opscruiseChartVersion | The version of the OpsCruise Helm chart being deployed. | TO_BE_DEFINED |
| secret_source | Defines where the deployment reads secrets from. | "valuesfile" |
| imagePullSecrets | A list of Kubernetes secrets used to authenticate with private container image registries when pulling images. | oc-ns-docker-creds |
| gkeAutoPilot | Enables or disables configuration adjustments specific to GKE Autopilot clusters. | false |
| externalCadvisor | When set to … | false |
| nodeExporterPort | The port on which the Prometheus Node Exporter listens for host-level metrics. | 9100 |
| metricScraper | Specifies the metrics collection backend. | "prometheus" |
| metricScrapeInterval | The interval (in seconds) at which metrics are scraped from targets. | 60 |
| useGlobalRepository | When set to true, images are pulled from the repository path in globalRepositoryName. | false |
| globalRepositoryName | The global image repository path used when useGlobalRepository is true. | "" (an empty string means no global override is active) |
| k8sClusterFqdn | The fully qualified domain name (FQDN) of the Kubernetes cluster's internal DNS. | "cluster.local" |
| machine_type | A sizing profile that controls default resource allocations (CPU, memory) across services. | "small" |
Global AWS credentials
This subsection configures the AWS credentials used by cloud-aware modules to authenticate with AWS services and specify which regions to monitor.
global:
  awsCredentials:
    regions:
      - us-east-1
    aws_access_key_id: aws_access_key_id
    aws_secret_access_key: aws_secret_access_key
    roleArn: ""

| Field | Description | Default value |
|---|---|---|
| awsCredentials | Configuration for authenticating with AWS services. | |
| regions | A list of AWS regions the deployment interacts with. | us-east-1 |
| aws_access_key_id | The AWS access key ID used for authentication. Replace with your actual key. | |
| aws_secret_access_key | The AWS secret access key paired with the access key ID. Replace with your actual secret. | |
| roleArn | An optional AWS IAM Role ARN to assume for cross-account access or scoped permissions. | "" |
Global gateway credentials
This subsection provides the backend connectivity and image-registry credentials that all South gateway modules use to authenticate with the Opscruise or Virtana backend (via Keycloak) and to pull container images from the Docker registry.
global:
  gatewayCreds:
    environment:
      DOCKER_SERVER: "https://index.docker.io/v1/"
      DOCKER_USERNAME: "<DOCKER_USERNAME>"
      DOCKER_PASSWORD: "<DOCKER_PASSWORD>"
      DOCKER_EMAIL: "<DOCKER_EMAIL>"
      OPSCRUISE_ENDPOINT: "<OPSCRUISE_BACKEND_KAFKA_ENDPOINT>:443"
      KEYCLOAK_ENABLED: "true"
      KEYCLOAK_URL: "https://auth.opscruise.io:443"
      KEYCLOAK_CLIENT_ID: "<KAFKA_CLIENT_ID>"
      KEYCLOAK_CLIENT_SECRET: "<KEYCLOAK_CLIENT_SECRET>"
      KEYCLOAK_REALM: "<KEYCLOAK_REALM>"
      OPSCRUISE_ACCOUNT_ID: "<KEYCLOAK_CLUSTERID>"

| Field | Description | Default value |
|---|---|---|
| DOCKER_SERVER | Docker registry server URL. | https://index.docker.io/v1/ |
| DOCKER_USERNAME | Docker registry username. | <DOCKER_USERNAME> |
| DOCKER_PASSWORD | Docker registry password. | <DOCKER_PASSWORD> |
| DOCKER_EMAIL | Docker registry email. | <DOCKER_EMAIL> |
| OPSCRUISE_ENDPOINT | Backend Kafka endpoint to which South components send telemetry. | <OPSCRUISE_BACKEND_KAFKA_ENDPOINT>:443 |
| KEYCLOAK_ENABLED | Enables or disables Keycloak-based authentication. | "true" |
| KEYCLOAK_URL | Keycloak server URL. | https://auth.opscruise.io:443 |
| KEYCLOAK_CLIENT_ID | Keycloak client ID. | <KAFKA_CLIENT_ID> |
| KEYCLOAK_CLIENT_SECRET | Keycloak client secret. | <KEYCLOAK_CLIENT_SECRET> |
| KEYCLOAK_REALM | Keycloak realm name. | <KEYCLOAK_REALM> |
| OPSCRUISE_ACCOUNT_ID | Opscruise account or cluster identifier. | <KEYCLOAK_CLUSTERID> |
Global namespace filtering and pod label whitelisting
These optional settings let you restrict which namespaces are monitored and control which pod labels are collected or forwarded by CO South. Both settings are commented out by default, so all namespaces are monitored and no pod label allow list is applied.
global:
  # namespaceFiltering:
  #   namespaceAllowList:
  #     - kube-system
  #     - collectors
  #     - opscruise
  # whitelistedPodLabels:
  #   - app

| Field | Description |
|---|---|
| namespaceAllowList | Restricts monitoring to only the listed namespaces. |
| whitelistedPodLabels | Allows specific pod labels to be collected or forwarded. |
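As a rough sketch, uncommenting the block could limit monitoring to two namespaces and forward a single pod label. The nesting is an assumption read from the commented sample (namespaceAllowList under namespaceFiltering, whitelistedPodLabels directly under global), and the namespace names payments and checkout are hypothetical.

```yaml
global:
  namespaceFiltering:
    namespaceAllowList:
      - payments   # hypothetical namespace
      - checkout   # hypothetical namespace
  whitelistedPodLabels:
    - app          # only the "app" pod label is collected or forwarded
```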
Global tolerations and affinity
These settings control pod scheduling across the cluster. Global tolerations and affinity rules determine which nodes CO South pods can (or prefer to) run on. Separate daemonset variants apply only to daemonset-based modules that need to run on every node, including master or control-plane nodes.
global:
  tolerations:
    - key: opscruise
      effect: NoSchedule
      operator: Exists
  daemonsetTolerations:
    - key: node-role.kubernetes.io/master
      effect: NoSchedule
      operator: Exists
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - preference:
            matchExpressions:
              - key: opscruise
                operator: In
                values:
                  - "true"
          weight: 1
  daemonsetAffinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - preference:
            matchExpressions:
              - key: opscruise
                operator: In
                values:
                  - "true"
          weight: 1

| Field | Description |
|---|---|
| tolerations | Global tolerations for non-daemonset modules. |
| daemonsetTolerations | Additional tolerations for daemonsets. |
| affinity | Global node affinity for non-daemonset modules. |
| daemonsetAffinity | Additional node affinity for daemonsets. |
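Because global scheduling settings are inherited unless a module overrides them, a single module can be given its own tolerations. The following is a sketch only: the taint key dedicated and its value observability are hypothetical, and it assumes the module-level tolerations list replaces the global one for that module.

```yaml
promgw:
  tolerations:
    - key: dedicated          # hypothetical taint key
      operator: Equal
      value: observability    # hypothetical taint value
      effect: NoSchedule
```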
Global module enable/disable flags
This subsection provides a centralized toggle for every CO South module. Modules enabled by default include node-exporter, KSM, loggw-loki, loki, promgw, k8sgw, promtail, and prometheus. You can override these flags here or via the Helm CLI.
global:
  # awsgw:
  #   enabled: false
  # k8sgw:
  #   enabled: true
  # promgw:
  #   enabled: true
  # loggw-loki:
  #   enabled: true
  # tracegw:
  #   enabled: false
  # eventgw:
  #   enabled: false
  # trace-router:
  #   enabled: false
  # opscruise-node-exporter:
  #   enabled: false
  # opscruise-node-exporter-new:
  #   enabled: true
  # otel-metric-collector:
  #   enabled: true
  # kube-state-metrics:
  #   enabled: true
  # prometheus:
  #   enabled: true
  # loki-stack:
  #   enabled: true
  # prometheus-yace-exporter:
  #   enabled: false
  # jaeger:
  #   enabled: false
  # jaeger-operator:
  #   enabled: false
  # prometheus-postgres-exporter:
  #   enabled: false
  # prometheus-mongodb-exporter:
  #   enabled: false
  # kafka-exporter:
  #   enabled: false
  # fluent-bit:
  #   enabled: false
  # prometheus-mysql-exporter:
  #   enabled: false
  # influxdb-exporter:
  #   enabled: false
  # x509-certificate-exporter:
  #   enabled: false
  # prometheus-redis-exporter:
  #   enabled: false
  # nginx-prometheus-exporter:
  #   enabled: false
  # beyla:
  #   enabled: false
  # alloy:
  #   enabled: true
  # otel-trace-collector:
  #   enabled: false
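A minimal sketch of how these toggles might be overridden in a separate file rather than by editing the generated values.yaml. The file name overrides.yaml is hypothetical, and the placement of the module keys under global follows the commented sample.

```yaml
# overrides.yaml (hypothetical file name)
global:
  tracegw:
    enabled: true    # turn on an optional module
  prometheus-redis-exporter:
    enabled: true
  eventgw:
    enabled: false   # keep an optional module off explicitly
```

Such a file can be passed to Helm with an additional -f flag after the main values file; when several values files are given, later files take precedence over earlier ones.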
Core gateway modules
The Core Gateway Modules serve as the essential data conduits between your monitored environment and the Opscruise/Virtana backend. These modules function as specialized collectors that gather metrics, metadata, logs, events, and traces from diverse sources that include Kubernetes clusters and major cloud providers (AWS, Azure, GCP), standardizing and streaming the data for unified observability and analysis.
AWS gateway
The AWS Gateway collects AWS cloud metrics from the configured AWS regions and forwards them to the Opscruise backend. Enable this module only if your environment includes AWS resources you want to monitor alongside your Kubernetes workloads.
awsgw:
  enabled: false
  logLevel: "info"
  labels: {}
  annotations: {}
  tolerations: []
  affinity: {}
  priorityClassName: ""
  resources:
    limits:
      cpu: 500m
      memory: 250Mi
    requests:
      cpu: 50m
      memory: 50Mi

| Field | Description | Default value |
|---|---|---|
| enabled | Enables or disables the AWS Gateway module. | false |
| logLevel | Log verbosity level. | "info" |
| labels | Custom Kubernetes labels applied to awsgw pods. | {} |
| annotations | Custom Kubernetes annotations applied to awsgw pods. | {} |
| tolerations | Module-specific tolerations. | [] |
| affinity | Module-specific node affinity. | {} |
| priorityClassName | Kubernetes PriorityClass name for pod scheduling priority. | "" |
| resources | CPU and memory requests and limits for the pod. | Limits: 500m CPU, 250Mi memory. Requests: 50m CPU, 50Mi memory. |
Kubernetes gateway
The Kubernetes Gateway is a core CO South module that discovers and collects Kubernetes cluster metadata, including pods, deployments, services, nodes, and events, and streams it to the Opscruise backend. It supports optional namespace-level filtering to limit which namespaces are monitored.
k8sgw:
  logLevel: "info"
  labels: {}
  annotations: {}
  tolerations: []
  affinity: {}
  priorityClassName: ""
  resources:
    limits:
      cpu: 500m
      memory: 500Mi
    requests:
      cpu: 50m
      memory: 50Mi
  # configMap:
  #   config:
  #     kubernetes:
  #       namespace_allow_list:
  #         - kube-system
  #         - collectors
  #         - opscruise

| Field | Description |
|---|---|
| logLevel | Sets the logging verbosity level for the Kubernetes Gateway. Common values are "info", "debug", "warn", and "error". |
| labels | Custom Kubernetes labels to apply to the k8sgw pod(s). Useful for organizing, filtering, or selecting resources. |
| annotations | Custom Kubernetes annotations to attach to the k8sgw pod(s). Often used for integrations with monitoring tools, ingress controllers, or policy engines. |
| tolerations | A list of Kubernetes tolerations that allow the k8sgw pod(s) to be scheduled on nodes with matching taints. |
| affinity | Kubernetes affinity or anti-affinity rules that control which nodes the k8sgw pod(s) can be scheduled on, based on node labels or other pod locations. |
| priorityClassName | The name of a Kubernetes PriorityClass to assign to the k8sgw pod(s). Determines scheduling and eviction priority relative to other pods. |
| resources | CPU and memory requests and limits for the k8sgw pod(s). |
| configMap.config.kubernetes.namespace_allow_list | Module-level namespace filtering for k8sgw only. |
Prometheus gateway
The Prometheus Gateway receives scraped metrics from Prometheus and forwards them to the Opscruise backend over the authenticated Kafka channel. It acts as the bridge between the local Prometheus instance and the Virtana cloud platform.
promgw:
  logLevel: "info"
  labels: {}
  annotations: {}
  tolerations: []
  affinity: {}
  priorityClassName: ""
  resources:
    limits:
      cpu: 500m
      memory: 300Mi
    requests:
      cpu: 50m
      memory: 50Mi

| Field | Description |
|---|---|
| logLevel | Sets the logging verbosity level for the Prometheus Gateway. Common values are "info", "debug", "warn", and "error". |
| labels | Custom Kubernetes labels to apply to the promgw pod(s). Useful for organizing, filtering, or selecting resources. |
| annotations | Custom Kubernetes annotations to attach to the promgw pod(s). Often used for integrations with monitoring tools, ingress controllers, or policy engines. |
| tolerations | A list of Kubernetes tolerations that allow the promgw pod(s) to be scheduled on nodes with matching taints. |
| affinity | Kubernetes affinity or anti-affinity rules that control which nodes the promgw pod(s) can be scheduled on, based on node labels or other pod locations. |
| priorityClassName | The name of a Kubernetes PriorityClass to assign to the promgw pod(s). Determines scheduling and eviction priority relative to other pods. |
| resources.limits | The maximum CPU and memory the promgw container is allowed to consume. |
| resources.requests | The minimum CPU and memory guaranteed to the promgw container. |
Log gateway - Loki
The Log Gateway is a Java-based gateway that reads logs from the in-cluster Loki instance and forwards them to the Opscruise backend. It handles OAuth authentication and connects to Loki using the cluster-internal DNS endpoint derived from global.k8sClusterFqdn.
loggw-loki:
  enabled: true
  logLevel: "info"
  config:
    oauthAcceptUnsecureServer: "true"
    jgateway:
      lokiHost: "opscruise-bundle-loki.opscruise.svc.K8S_CLUSTER_FQDN:3100"
  labels: {}
  annotations: {}
  tolerations: []
  affinity: {}
  priorityClassName: ""
  resources:
    limits:
      cpu: 500m
      memory: 1024Mi
    requests:
      cpu: 50m
      memory: 256Mi

| Field | Description | Default value |
|---|---|---|
| enabled | Enables or disables the Log Gateway. | true |
| oauthAcceptUnsecureServer | Allows OAuth connections to non-TLS servers. | "true" |
| lokiHost | Internal Loki endpoint used by the log gateway. | opscruise-bundle-loki.opscruise.svc.K8S_CLUSTER_FQDN:3100 |
Azure gateway
The Azure Gateway collects Azure cloud metrics from one or more Azure subscriptions and forwards them to the Opscruise backend. It supports multiple credential sets for monitoring resources across different subscriptions or tenants.
azuregw:
  enabled: false
  logLevel: "INFO"
  azureCredentials:
    - azureauth_clientId: azureauth_clientId
      azureauth_tenantId: azureauth_tenantId
      azureauth_clientSecret: azureauth_clientSecret
      azureauth_subId: azureauth_subId
      name: "credential_name"
  labels: {}
  annotations: {}
  tolerations: []
  affinity: {}
  priorityClassName: ""
  resources:
    limits:
      cpu: 500m
      memory: 1024Mi
    requests:
      cpu: 50m
      memory: 256Mi

| Field | Description | Default value |
|---|---|---|
| enabled | Enables or disables the Azure Gateway. | false |
| azureCredentials | List of Azure credential sets. | |
| azureauth_clientId | Azure AD application (client) ID. | |
| azureauth_tenantId | Azure AD tenant ID. | |
| azureauth_clientSecret | Azure AD client secret. | |
| azureauth_subId | Azure subscription ID. | |
| name | Readable name for the Azure credential set. | "credential_name" |
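To illustrate the multiple-credential support, the sketch below lists two credential sets, one per subscription. The names prod-subscription and dev-subscription and all angle-bracket placeholders are hypothetical; substitute the values from your own Azure AD application registrations.

```yaml
azuregw:
  enabled: true
  azureCredentials:
    - name: "prod-subscription"                  # hypothetical label
      azureauth_clientId: "<CLIENT_ID_1>"
      azureauth_tenantId: "<TENANT_ID_1>"
      azureauth_clientSecret: "<CLIENT_SECRET_1>"
      azureauth_subId: "<SUBSCRIPTION_ID_1>"
    - name: "dev-subscription"                   # hypothetical label
      azureauth_clientId: "<CLIENT_ID_2>"
      azureauth_tenantId: "<TENANT_ID_2>"
      azureauth_clientSecret: "<CLIENT_SECRET_2>"
      azureauth_subId: "<SUBSCRIPTION_ID_2>"
```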
GCP gateway
The GCP Gateway collects Google Cloud Platform metrics and forwards them to the Opscruise backend. Enable this module only if your environment includes GCP resources you want to monitor.
gcpgw:
  enabled: false
  logLevel: "INFO"
  labels: {}
  annotations: {}
  tolerations: []
  affinity: {}
  priorityClassName: ""
  resources:
    limits:
      cpu: 500m
      memory: 512Mi
    requests:
      cpu: 50m
      memory: 128Mi
Trace gateway
The Trace Gateway collects distributed tracing data and forwards trace spans to the Opscruise backend for analysis and SLO tracking. It supports persistent storage for SLO data and can operate in either poll (pull) or listen (push) mode.
tracegw:
  enabled: false
  logLevel: "INFO"
  persistentType: ebs
  storageClassName: ""
  slo_storage_size: 20Gi
  labels: {}
  annotations: {}
  tolerations: []
  affinity: {}
  priorityClassName: ""
  service:
    type: ClusterIP
  resources:
    limits:
      cpu: 500m
      memory: 3Gi
    requests:
      cpu: 50m
      memory: 128Mi
  config:
    tracegw:
      filterTagsKey: notag
      filterTagsValue: notag
      traceDataFromJeager: "true"
      mode: poll

| Field | Description | Default value |
|---|---|---|
| enabled | Enables or disables the Trace Gateway. | false |
| persistentType | Persistent volume type. | ebs |
| storageClassName | Kubernetes StorageClass name. Leave blank for default. | "" |
| slo_storage_size | Persistent volume size for SLO data. | 20Gi |
| service.type | Kubernetes Service type. | ClusterIP |
| filterTagsKey | Tag key used for trace filtering. | notag |
| filterTagsValue | Tag value used for trace filtering. | notag |
| traceDataFromJeager | Whether trace data is sourced from Jaeger. | "true" |
| mode | Data retrieval mode. Supported: poll (pull) and listen (push). | poll |
Event gateway
The Event Gateway captures Kubernetes cluster events, such as pod scheduling, node conditions, and resource warnings, and forwards them to the Opscruise backend for correlation with metrics and logs.
eventgw:
  enabled: false
  labels: {}
  annotations: {}
  tolerations: []
  affinity: {}
  priorityClassName: ""
Trace router
The Trace Router acts as an intermediary routing layer for distributed traces, directing trace data between trace sources and the Trace Gateway. It is typically used in complex tracing topologies where traces need to be filtered, sampled, or routed before reaching the gateway.
trace-router:
  enabled: false
  labels: {}
  annotations: {}
  tolerations: []
  affinity: {}
  priorityClassName: ""
  service:
    type: ClusterIP
  resources:
    limits:
      cpu: 500m
      memory: 3Gi
    requests:
      cpu: 50m
      memory: 128Mi

| Field | Description |
|---|---|
| enabled | Controls whether the Trace Router module is deployed. Set to true to enable or false to disable. |
| labels | Custom Kubernetes labels to apply to the trace-router pod(s). Useful for organizing, filtering, or selecting resources. |
| annotations | Custom Kubernetes annotations to attach to the trace-router pod(s). Often used for integrations with monitoring tools, ingress controllers, or policy engines. |
| tolerations | A list of Kubernetes tolerations that allow the trace-router pod(s) to be scheduled on nodes with matching taints. |
| affinity | Kubernetes affinity or anti-affinity rules that control which nodes the trace-router pod(s) can be scheduled on, based on node labels or other pod locations. |
| priorityClassName | The name of a Kubernetes PriorityClass to assign to the trace-router pod(s). Determines scheduling and eviction priority relative to other pods. |
| service.type | Specifies the Kubernetes Service type used to expose the Trace Router. |
| resources.limits | The maximum CPU and memory the trace-router container is allowed to consume. |
| resources.requests | The minimum CPU and memory guaranteed to the trace-router container. |
Metric collection components
The Metric Collection Components are the specialized agents and collectors responsible for gathering raw performance data from your infrastructure. These modules operate at the host, container, and application levels, utilizing technologies like eBPF and OpenTelemetry to capture granular resource usage, network flows, and request-level metrics. They provide the raw data foundation that allows Opscruise to visualize health and troubleshoot performance bottlenecks across the entire stack.
Node Exporter (New)
The new Node Exporter is the recommended replacement for the legacy version. It collects host-level metrics and additionally supports eBPF-based flow collection (network flow visibility, DNS tracking) via custom arguments. It also supports public IP aggregation patterns for network analytics.
opscruise-node-exporter-new:
  labels: {}
  annotations: {}
  tolerations: []
  priorityClassName: ""
  # args:
  #   btfFilePath: path-to-btf-file
  #   kconfigFilePath: path-to-kconfig
  #   customArgs:
  #     - "--collector.ocflowbpfcollector.skip-bpf-verification"
  #     - "--collector.ocflowbpfcollector.enable-dns-tracking"
  # publicIPAggregationSubnetPatterns:
  #   - pattern: "172.16.0.86/32"
  #     aggregate_to: "1.2.3.4"
  #     aggregate_name: ""

| Field | Description |
|---|---|
| logLevel | Log verbosity. Keep as … |
| btfFilePath | Path to BTF file for eBPF-based collectors. |
| kconfigFilePath | Path to kernel config file. |
| customArgs | Additional CLI arguments. |
| publicIPAggregationSubnetPatterns | Patterns for aggregating public IPs into representative addresses. |
| pattern | CIDR pattern to match. |
| aggregate_to | IP address to aggregate matched traffic to. |
| aggregate_name | An optional readable name for the aggregated IP. |
Beyla
Beyla provides automatic, zero-code instrumentation of applications using eBPF. It captures HTTP/gRPC request metrics and traces without requiring any changes to application code or container images, making it ideal for gaining instant observability into services that are not yet manually instrumented.
beyla:
  annotations: {}
  tolerations: []
  priorityClassName: ""
  affinity: {}

| Field | Description |
|---|---|
| | Custom labels applied to Beyla pods. |
| annotations | Custom annotations on the Beyla resource. |
| | Custom annotations applied to Beyla pods. |
OTEL metric collector
The OpenTelemetry Metric Collector receives, processes, and exports metrics using the OpenTelemetry protocol. It can be extended with additional Prometheus scrape configurations to collect metrics from custom exporters or services not covered by the default CO South modules.
otel-metric-collector:
  labels: {}
  annotations: {}
  tolerations: []
  priorityClassName: ""
  # additional_receivers_configs:
  #   prometheus:
  #     config:
  #       scrape_configs:
  #         - job_name: new-job-exporter
  #           static_configs:
  #             - targets:
  #                 - '172.16.71.143:9256'
  #           scheme: http
  #           tls_config:
  #             insecure_skip_verify: true
Use additional_receivers_configs to add custom Prometheus scrape jobs to the OTEL collector alongside the default targets.
cAdvisor
cAdvisor runs as a daemonset on every node and collects container-level resource usage and performance metrics (CPU, memory, filesystem, network) for all running containers. It provides the per-container granularity that complements node-level metrics from Node Exporter.
cadvisor:
  labels: {}
  annotations: {}
  tolerations: []
  priorityClassName: ""
  affinity: {}
  hostNetwork: false
  resources:
    limits:
      cpu: 300m
      memory: 512Mi
    requests:
      cpu: 50m
      memory: 128Mi

| Field | Description | Default value |
|---|---|---|
| hostNetwork | Whether cAdvisor pods use the host network namespace. | false |
| | Additional CLI arguments. | |
| resources | CPU and memory requests and limits for the container. | Limits: 300m CPU, 512Mi memory. Requests: 50m CPU, 128Mi memory. |
Kube State metrics
Kube State Metrics (KSM) generates Prometheus-format metrics about the state of Kubernetes objects, such as deployments, pods, nodes, jobs, and config maps, by listening to the Kubernetes API server. It provides the "desired vs. actual" state visibility that raw resource metrics alone cannot offer.
kube-state-metrics:
  labels: {}
  annotations: {}
  tolerations: []
  affinity: {}
  priorityClassName: ""
  resources:
    requests:
      cpu: 50m
      memory: 30Mi
    limits:
      cpu: 300m
      memory: 250Mi
Prometheus
Prometheus is the primary metric scraping engine in CO South. It discovers and scrapes metrics from Kubernetes nodes, pods, cAdvisor, kube-state-metrics, and any custom exporters, then makes them available to the Prometheus Gateway (promgw) for forwarding to the backend. It also supports Istio integration and ECS service discovery for hybrid environments.
prometheus:
  enableIstio: false
  enablePersistent: false
  prometheusEcsDiscovery:
    enabled: false
    awsCredentials:
      regions:
        - us-east-1
      aws_access_key_id: aws_access_key_id
      aws_secret_access_key: aws_secret_access_key
      roleArn: ""
  resources:
    requests:
      cpu: 50m
      memory: 1000Mi
    limits:
      memory: 5Gi
  nonIstioConfigMap:
    enabledScrapeJobs:
      - oc-kubernetes-pods
      - oc-app-exporters
      - kubernetes-nodes
      - kubernetes-nodes-cadvisor
      - kubernetes-apiservers
      - kube-scheduler
    additionalScrapeConfigs: []

| Field | Description | Default value |
|---|---|---|
| enableIstio | Enables or disables Istio service mesh integration for Prometheus. | false |
| enablePersistent | Enables persistent storage for Prometheus data. | false |
| | Module-level scrape interval. | |
| prometheusEcsDiscovery.enabled | Enables ECS service discovery for Prometheus. | false |
| awsCredentials | AWS credentials for ECS discovery. | |
| enabledScrapeJobs | List of default scrape job names enabled when Istio is not used. | |
| additionalScrapeConfigs | Custom Prometheus scrape configurations, for example, ECS targets or a custom cAdvisor. | [] |
Database and middleware exporters
The Database and Middleware Exporters are specialized bridge components designed to extract deep visibility from stateful services and messaging systems. Since databases and middleware often operate as "black boxes" with their own internal telemetry formats, these exporters scrape service-specific data, such as query latency, connection pools, and queue depths, and translate them into a unified Prometheus-compatible format. This allows Opscruise to correlate the health of your data layer directly with the performance of your application services.
Prometheus PostgreSQL exporter
The PostgreSQL Exporter scrapes database-level metrics, such as connections, transactions, locks, replication lag, or table/index statistics, from a PostgreSQL instance and exposes them in Prometheus format. It supports direct password configuration or Kubernetes Secret-based credential management.
prometheus-postgres-exporter:
  enabled: false
  resources:
    limits:
      cpu: 100m
      memory: 128Mi
    requests:
      cpu: 50m
      memory: 128Mi
  config:
    datasource:
      host: "<POSTGRESQL_SERVICENAME.NAMESPACE.svc.cluster.local>"
      user: "postgres_exporter"
      password: "<POSTGRES_EXPORTER_PASSWORD>"
      passwordSecret: {}
      database: "<DB_NAME>"
      sslmode: disable
    autoDiscoverDatabases: false
    excludeDatabases: []
    includeDatabases: []

| Field | Description | Default value |
|---|---|---|
| enabled | Enables or disables the PostgreSQL exporter. | false |
| host | PostgreSQL service DNS or IP. | |
| user | Database user for the exporter. | "postgres_exporter" |
| password | Database password. Only one of password or passwordSecret should be used. | |
| passwordSecret.name | Kubernetes Secret name containing the password. | |
| passwordSecret.key | Key inside the Secret. | |
| database | Database name to connect to. | |
| sslmode | PostgreSQL SSL mode. | disable |
| autoDiscoverDatabases | Automatically discover and monitor all databases. | false |
| excludeDatabases | Databases to exclude from auto-discovery. | [] |
| includeDatabases | Databases to include in auto-discovery. | [] |
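As a sketch of the Secret-based option, the fragment below omits the inline password and points passwordSecret at an existing Kubernetes Secret. The Secret name postgres-exporter-credentials, its key password, the host, and the database name are all hypothetical; the name and key fields are assumed from the table descriptions above.

```yaml
prometheus-postgres-exporter:
  enabled: true
  config:
    datasource:
      host: "postgres.databases.svc.cluster.local"   # hypothetical service DNS
      user: "postgres_exporter"
      passwordSecret:
        name: "postgres-exporter-credentials"        # hypothetical Secret name
        key: "password"                              # hypothetical key in the Secret
      database: "orders"                             # hypothetical database
      sslmode: disable
```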
Prometheus MongoDB exporter
The MongoDB Exporter scrapes database-level metrics from a MongoDB instance and exposes them in Prometheus format. It supports both inline connection URI and Kubernetes Secret-based credential management.
prometheus-mongodb-exporter:
  enabled: false
  mongodb:
    uri: "mongodb://${USERNAME}:${PASSWORD}@<SERVICE_NAME>.<NAMESPACE>.svc.cluster.local"
  existingSecret:
    name: ""
    key: "mongodb-uri"
  resources:
    limits:
      cpu: 250m
      memory: 192Mi
    requests:
      cpu: 50m
      memory: 128Mi

| Field | Description | Default value |
|---|---|---|
| enabled | Enables or disables the MongoDB exporter. | false |
| mongodb.uri | MongoDB connection URI. Ignored if existingSecret.name is set. | |
| existingSecret.name | Name of an existing Kubernetes Secret containing the URI. | "" |
| existingSecret.key | Key inside the Secret holding the connection URI. | "mongodb-uri" |
Kafka exporter
The Kafka Exporter scrapes Kafka broker and consumer group metrics and exposes them in Prometheus format. It connects to both Kafka brokers and ZooKeeper, and supports multi-broker/multi-ZooKeeper configurations.
kafka-exporter:
  enabled: false
  args:
    - --kafka.server=<KAFKA_SERVICE>.<NAMESPACE>.svc.cluster.local:<KAFKA_SERVICE_PORT>
    - --zookeeper.server=<ZOOKEEPER_SERVICE>.<NAMESPACE>.svc.cluster.local:<ZOOKEEPER_SERVICE_PORT>
  mutiple_kafka_zookeepers: []
  resources:
    requests:
      cpu: 50m
      memory: 256Mi
    limits:
      cpu: "0.5"
      memory: 256Mi

| Field | Description | Default value |
|---|---|---|
| enabled | Enables or disables the Kafka exporter. | false |
| args | CLI arguments specifying Kafka and ZooKeeper server endpoints. | |
| mutiple_kafka_zookeepers | List of additional Kafka or ZooKeeper endpoint pairs for multi-broker setups. | [] |
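For a multi-broker setup, the commented sample in the full values file writes each entry as one comma-separated Kafka and ZooKeeper endpoint pair. The sketch below follows that shape; the service names, the messaging namespace, and the ports are hypothetical, and whether args must also stay populated in this mode is not stated in the source.

```yaml
kafka-exporter:
  enabled: true
  mutiple_kafka_zookeepers:
    # <kafka endpoint>,<zookeeper endpoint> per entry (hypothetical services)
    - kafka-a.messaging.svc.cluster.local:9092,zookeeper-a.messaging.svc.cluster.local:2181
    - kafka-b.messaging.svc.cluster.local:9092,zookeeper-b.messaging.svc.cluster.local:2181
```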
Prometheus MySQL exporter
The MySQL Exporter scrapes database-level metrics from a MySQL instance and exposes them in Prometheus format. It supports both inline password and Kubernetes Secret-based credential management.
prometheus-mysql-exporter:
  enabled: false
  resources:
    limits:
      cpu: 100m
      memory: 200Mi
    requests:
      cpu: 50m
      memory: 128Mi
  mysql:
    db: "<DB_NAME>"
    host: "<MYSQL_SERVICE>.<NAMESPACE>.svc.cluster.local"
    param: ""
    pass: ""
    port: 3306
    user: "mysql_exporter"
    existingPasswordSecret:
      name: ""
      key: ""

| Field | Description | Default value |
|---|---|---|
| enabled | Enables or disables the MySQL exporter. | false |
| db | Database name. | |
| host | MySQL service DNS or IP. | |
| param | Additional DSN parameters. | "" |
| pass | Database password. Ignored if existingPasswordSecret is specified. | "" |
| port | MySQL port. | 3306 |
| user | Database user. | "mysql_exporter" |
| existingPasswordSecret.name | Kubernetes Secret name containing the password. | "" |
| existingPasswordSecret.key | Key inside the Secret. | "" |
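A minimal sketch of the Secret-based variant, leaving pass empty and referencing an existing Secret instead. The Secret name mysql-exporter-credentials, its key password, the host, and the database name are hypothetical.

```yaml
prometheus-mysql-exporter:
  enabled: true
  mysql:
    db: "inventory"                              # hypothetical database
    host: "mysql.databases.svc.cluster.local"    # hypothetical service DNS
    port: 3306
    user: "mysql_exporter"
    pass: ""                                     # left empty; the Secret is used instead
    existingPasswordSecret:
      name: "mysql-exporter-credentials"         # hypothetical Secret name
      key: "password"                            # hypothetical key in the Secret
```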
InfluxDB exporter
The InfluxDB Exporter collects InfluxDB metrics and exposes them in Prometheus format. It supports two connectivity modes: DNS-based access (for in-cluster or DNS-resolvable instances) and direct IP-based access.
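influxdb-exporter:
  enabled: false
  endpoint_type: common_dns
  common_dns_name: ""
  common_dns_port: ""
  external_ip_address: ""
  external_ip_port: ""
  serviceLabels: {}
  annotations: {}

| Field | Description | Default value |
|---|---|---|
| enabled | Enables or disables the InfluxDB exporter. | false |
| endpoint_type | How to reach InfluxDB. Use common_dns when InfluxDB is reachable by DNS name (in another namespace or on an external VM), or external_ip when it is reachable by IP address only. | common_dns |
| common_dns_name | DNS name of the InfluxDB instance. Used when endpoint_type is common_dns. | "" |
| common_dns_port | Port of the InfluxDB instance. Used when endpoint_type is common_dns. | "" |
| external_ip_address | IP address. Used when endpoint_type is external_ip. | "" |
| external_ip_port | Port used when endpoint_type is external_ip. | "" |
| serviceLabels | Custom labels on the exporter Service. | {} |
| annotations | Custom annotations. | {} |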
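As an illustration of the IP-based mode, the sketch below points the exporter at an InfluxDB instance on an external VM. The IP address is hypothetical, and 8086 is used only because it is InfluxDB's usual HTTP port; the port is quoted because the defaults are string values.

```yaml
influxdb-exporter:
  enabled: true
  endpoint_type: external_ip
  external_ip_address: "10.0.12.34"   # hypothetical VM address
  external_ip_port: "8086"            # usual InfluxDB HTTP port; adjust as needed
```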
Loki-stack promtail integration
The Loki Stack deploys Promtail and Loki. Promtail tails container and node logs, applies pipeline stages, such as parsing, multiline merging, or timestamp stripping, and ships them to Loki. The Log Gateway (loggw-loki) then reads from Loki and forwards logs to the Opscruise backend.
Promtail
Promtail is the log shipping agent that runs on every node, discovers pod and node log files, applies configurable pipeline stages, and pushes the processed logs to the in-cluster Loki instance.
loki-stack:
  promtail:
    tolerations:
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
    resources:
      limits:
        cpu: 200m
        memory: 512Mi
      requests:
        cpu: 50m
        memory: 128Mi
    pipelineStages:
      - docker: {}
      - replace:
          expression: '(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+Z (?:stdout|stderr) [A-Z] )...'
          replace: ''
      - multiline:
          firstline: '(^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)|(^\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2})|(^\[\d{4}-\d{2}-\d{2} \d{1,2}:\d{2}:\d{2}\])|(^\w{10} \w{9} \(\?\) \w{2} \w{8} \d{1,2})|(^(?:info|error): )|(^\x{200B}\[)...'
          max_wait_time: 3s
    extraVolumes:
      - name: varlog
        hostPath:
          path: /var/log
    extraVolumeMounts:
      - name: varlog
        mountPath: /var/log
        readOnly: true
    extraScrapeConfigs:
      - job_name: kubernetes-nodes-debian
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - replacement: kubelet-logs
            target_label: namespace
          - replacement: /var/log/syslog
            target_label: __path__
          - source_labels: [kubernetes_io_hostname]
            target_label: host
      - job_name: kubernetes-nodes-redhat
        kubernetes_sd_configs:
          - role: node
        relabel_configs:
          - action: labelmap
            regex: __meta_kubernetes_node_label_(.+)
          - replacement: kubelet-logs
            target_label: namespace
          - replacement: /var/log/messages
            target_label: __path__
          - source_labels: [kubernetes_io_hostname]
            target_label: host
| Field | Description | Default value |
|---|---|---|
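| loki-stack.promtail.tolerations | Tolerations for Promtail DaemonSet pods. | key node-role.kubernetes.io/master, effect NoSchedule |
| loki-stack.promtail.resources | CPU and memory requests and limits for the container. | |
| loki-stack.promtail.pipelineStages | Promtail log processing pipeline. Includes docker (CRI parsing), replace (strip kubelet prefix timestamps), and multiline (combine multi-line logs). | |
| loki-stack.promtail.pipelineStages[].multiline.firstline | Regex patterns to detect the first line of a multi-line log entry. | |
| loki-stack.promtail.pipelineStages[].multiline.max_wait_time | Maximum time to wait for additional lines before flushing. | 3s |
| loki-stack.promtail.extraVolumes | Additional volumes mounted into Promtail pods. | varlog (hostPath /var/log) |
| loki-stack.promtail.extraVolumeMounts | Mount points for extra volumes. | varlog at /var/log (read-only) |
| loki-stack.promtail.extraScrapeConfigs | Additional Promtail scrape configurations for node-level logs. | Debian + RedHat node log jobs |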
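The default toleration above covers only the legacy node-role.kubernetes.io/master taint. On clusters whose control-plane nodes carry the node-role.kubernetes.io/control-plane taint, Promtail may not be scheduled there unless a matching toleration is added. The following is a minimal override sketch, not a confirmed Virtana default. Because Helm replaces list values rather than merging them, the existing master toleration is repeated alongside the new one.

```yaml
loki-stack:
  promtail:
    tolerations:
      # Default shown in this guide: tolerate the legacy master taint.
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      # Assumption: your control-plane nodes use the newer taint key.
      - key: node-role.kubernetes.io/control-plane
        effect: NoSchedule
```

Check the taints actually present on your nodes before applying this, since the second entry is only needed when that taint exists.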
Loki
Loki is the log aggregation backend that stores and indexes logs shipped by Promtail. It provides the query interface used by the Log Gateway (loggw-loki) to retrieve and forward logs to the Opscruise backend. It supports configurable retention policies and sample age limits.
loki-stack:
  loki:
    resources:
      limits:
        cpu: 300m
        memory: 4Gi
      requests:
        cpu: 50m
        memory: 512Mi
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
                - key: opscruise
                  operator: In
                  values:
                    - "true"
            weight: 1
| Field | Description |
|---|---|
| | Maximum age of log samples accepted. |
| | Enables automatic deletion of old log data. |
| | Duration to retain log data. |
| loki-stack.loki.affinity | Node affinity for Loki pods. |
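The retention and sample-age settings described in the table are not shown in the values excerpt above. As a rough sketch only, the fragment below uses option names from the upstream Loki configuration (limits_config and table_manager). The loki-stack.loki.config key path and the 168h durations are assumptions, so confirm the actual keys in your downloaded values.yaml before using them.

```yaml
loki-stack:
  loki:
    # Assumption: the chart passes standard Loki settings through `config`.
    config:
      limits_config:
        # Reject samples older than the configured maximum age.
        reject_old_samples: true
        reject_old_samples_max_age: 168h   # illustrative value (7 days)
      table_manager:
        # Delete log data once it is older than the retention period.
        retention_deletes_enabled: true
        retention_period: 168h             # illustrative value (7 days)
```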
Prometheus Redis exporter
The Redis Exporter scrapes Redis server metrics, such as memory usage, connected clients, commands processed, keyspace statistics, and replication info, and exposes them in Prometheus format so that you can monitor Redis instances alongside your Kubernetes workloads.
prometheus-redis-exporter:
  enabled: false
  redisAddress: redis://<REDIS_IP/FQDN>:6379
| Field | Description |
|---|---|
| prometheus-redis-exporter.enabled | Enables or disables the Redis exporter. |
| prometheus-redis-exporter.redisAddress | Redis connection URI. |
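For illustration, a values override that turns the exporter on might look like the following. The Service DNS name is hypothetical; substitute the IP address or FQDN of your own Redis instance.

```yaml
prometheus-redis-exporter:
  enabled: true
  # Hypothetical in-cluster Service name (service "redis" in namespace "example").
  redisAddress: redis://redis.example.svc.cluster.local:6379
```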
Nginx Prometheus exporter
The Nginx Exporter scrapes Nginx server metrics from the Nginx stub_status or metrics endpoint and exposes them in Prometheus format for monitoring Nginx instances running in or alongside your cluster.
nginx-prometheus-exporter:
  enabled: false
  args:
    - -nginx.scrape-uri=http://NGINX_ENDPOINT:PORT/METRIC_ENDPOINT
  resources:
    limits:
      cpu: 100m
      memory: 128Mi
    requests:
      cpu: 20m
      memory: 128Mi
| Field | Description |
|---|---|
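| nginx-prometheus-exporter.enabled | Enables or disables the Nginx exporter. |
| nginx-prometheus-exporter.args | CLI arguments. |
| nginx-prometheus-exporter.resources | CPU and memory requests and limits for the server. |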
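As a sketch of a filled-in configuration, the fragment below enables the exporter and points the scrape URI at a stub_status location. The host name, port, and path are hypothetical placeholders for NGINX_ENDPOINT, PORT, and METRIC_ENDPOINT; Nginx itself must already expose stub_status at that address.

```yaml
nginx-prometheus-exporter:
  enabled: true
  args:
    # Hypothetical host, port, and path; replace with your own stub_status URL.
    - -nginx.scrape-uri=http://nginx.example.svc.cluster.local:8080/stub_status
```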
YACE - AWS CloudWatch exporter
YACE (Yet Another CloudWatch Exporter) scrapes AWS CloudWatch metrics and exposes them in Prometheus format. It enables monitoring of AWS-managed services (for example, RDS, ELB, Lambda, and SQS) that do not run inside the Kubernetes cluster but are part of the overall application infrastructure.
prometheus-yace-exporter:
  enabled: false
  image:
    repository: ghcr.io/nerdswords/yet-another-cloudwatch-exporter
    tag: v0.61.2
    pullPolicy: IfNotPresent
  resources:
    limits:
      cpu: 100m
      memory: 128Mi
    requests:
      cpu: 20m
      memory: 128Mi
| Field | Description | Default value |
|---|---|---|
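| prometheus-yace-exporter.enabled | Enables or disables the YACE CloudWatch exporter. | false |
| prometheus-yace-exporter.image.repository | Container image repository. | ghcr.io/nerdswords/yet-another-cloudwatch-exporter |
| prometheus-yace-exporter.image.tag | Container image tag/version. | v0.61.2 |
| prometheus-yace-exporter.image.pullPolicy | Kubernetes image pull policy. | IfNotPresent |
| prometheus-yace-exporter.resources | CPU and memory requests and limits for the server. | |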