Container Observability upgrade

Container Observability runs as a set of Helm releases across shared services, the frontend and backend, and the South components. Upgrade Container Observability whenever you upgrade the Virtana Platform to 2026.6.1.

This topic describes how to upgrade an existing Container Observability deployment to version 2026.6.1. Complete the procedures in the order shown: first upgrade the shared services, then the frontend and backend, and finally the South components.

Prerequisites

Before you upgrade Container Observability to 2026.6.1, confirm that your environment meets the following requirements:

  • A running Container Observability deployment.

  • A Global View deployment that is already upgraded to 2026.6.1. The frontend upgrade reads the organization ID from the upgraded Global View and Keycloak.

  • A Kubernetes cluster with Helm 3.x installed, and access to the cluster from the machine that runs the upgrade.

  • Access to the Virtana Helm repository and valid Docker registry credentials.

  • For the shared services upgrade, storage classes for the Solr and Zookeeper persistent volume claims that support volume expansion (allowVolumeExpansion: true).
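As a quick check of the Helm requirement, the following sketch classifies a version string such as the one printed by helm version --short. The helper name is_helm3 is hypothetical, and the sample version string is illustrative only.

```shell
#!/bin/bash
# Return success when a Helm version string reports major version 3.
# is_helm3 is a hypothetical helper name, not part of Helm.
is_helm3() {
  case "$1" in
    v3.*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Example with an illustrative version string:
if is_helm3 "v3.14.0+g3fc9f4b"; then
  echo "Helm 3.x detected"
fi
```

On the machine that runs the upgrade, you could call it as is_helm3 "$(helm version --short)".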

What's new in 2026.6.1

Version 2026.6.1 changes several parts of the Container Observability configuration. Review these changes before you prepare your values files:

  • Shared services deployment tag changed: The shared services deployment tag changes from tags.oc_shared_services=true to tags.platform=true. Update your deployment commands to use the new tag.

  • Solr and Zookeeper images replaced: The Bitnami Solr and Zookeeper images are replaced. Because the new images store data differently, the upgrade runs a one-time data migration for the shared services. Plan for this migration before you begin.

  • New frontend namespace variable: A new variable, FRONTEND_NAMESPACE, identifies the namespace that hosts the Container Observability frontend. You set this variable for each backend deployment.

The following table shows the deployment-tag change for the shared services:

| Previous release | Current release |
| --- | --- |
| --set tags.oc_shared_services=true | --set tags.platform=true |

Deployments that use the previous tag (tags.oc_shared_services=true) must be updated to use tags.platform=true to remain compatible with this release.
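If you keep deployment commands in scripts or pipeline files, a small filter can rewrite the old tag for you. This is a minimal sketch; migrate_tag is a hypothetical helper name, and you should review the rewritten text before you use it.

```shell
#!/bin/bash
# Rewrite the previous shared services tag to the 2026.6.1 tag.
# Reads text on stdin and writes the updated text to stdout.
# migrate_tag is a hypothetical helper name.
migrate_tag() {
  sed 's/tags\.oc_shared_services=true/tags.platform=true/g'
}

# Example: update a saved deployment flag.
echo "--set tags.oc_shared_services=true" | migrate_tag
```

To update a whole file, redirect it through the function and write the result to a new file, then compare the two before you replace the original.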

Upgrade the shared services

Upgrade the Container Observability shared services first. This stage migrates the Solr and Zookeeper data, updates the deployment tag, and redeploys the shared services with the 2026.6.1 chart.

Migrate Solr and Zookeeper data

Version 2026.6.1 replaces the Bitnami Solr and Zookeeper images used by the shared services. Because the replacement images store data differently, the upgrade runs a one-time job that migrates your existing Solr and Zookeeper data to the new images. The migration applies only to a Solrcloud instance installed in the oc-shared-services namespace.

Warning

This upgrade migrates data. Read this section before you start, and complete the storage, traffic, and security checks. If you skip the checks, the migration job can fail or cause data inconsistency.

Before you start the upgrade, confirm that the Solr and Zookeeper persistent volume claims (PVCs) can be resized:

  • Confirm that the storage class used by the Solr and Zookeeper PVCs supports volume expansion (allowVolumeExpansion: true).

  • If a Solr or Zookeeper PVC is more than 45% full, the migration job attempts to resize it automatically. Automatic resizing requires a storage class that supports volume expansion.

  • If your storage class already supports volume expansion, you can skip the usage check below.
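One way to confirm volume expansion support is to list every storage class together with its allowVolumeExpansion setting and flag the ones that cannot expand. The sketch below assumes kubectl access to the cluster; non_expandable_classes is a hypothetical helper name that only filters the listing.

```shell
#!/bin/bash
# Print the storage classes that do not allow volume expansion.
# Expects "NAME EXPANSION" lines on stdin. kubectl prints <none> when
# allowVolumeExpansion is not set on a storage class.
non_expandable_classes() {
  awk '$2 != "true" { print $1 }'
}

# Against a live cluster, feed it the kubectl output:
#   kubectl get storageclass --no-headers \
#     -o custom-columns=NAME:.metadata.name,EXPANSION:.allowVolumeExpansion \
#     | non_expandable_classes
```

The output covers all storage classes in the cluster. Compare it with the storage class shown for the Solr and Zookeeper PVCs in kubectl -n oc-shared-services get pvc.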

To measure how full the Solr and Zookeeper PVCs are, run the following script from a machine that has access to the cluster where the shared services are deployed. The script writes a solr_pvc_usage.csv report:

#!/bin/bash
# Report disk usage of the Solr and Zookeeper data volumes as a CSV file.
OUTPUT_FILE="solr_pvc_usage.csv"
NAMESPACE="oc-shared-services"
echo "Pod,Filesystem,Size,Used,Available,UsePercent" > "$OUTPUT_FILE"

# Append one CSV row for the volume mounted at $2 in pod $1.
# df -P keeps each filesystem on one line, so awk can read row 2 reliably.
report_usage() {
  local pod="$1" mount="$2" usage
  if usage=$(kubectl -n "$NAMESPACE" exec "$pod" -- df -hP "$mount" 2>/dev/null); then
    echo "$usage" | \
      awk -v pod="$pod" 'NR==2 {print pod "," $1 "," $2 "," $3 "," $4 "," $5}' \
      >> "$OUTPUT_FILE"
  else
    # Warn instead of silently omitting the pod from the report.
    echo "Warning: could not read $mount on $pod" >&2
  fi
}

# Solr pods
for pod in solrcloud-solr-0 solrcloud-solr-1 solrcloud-solr-2
do
  report_usage "$pod" /bitnami/solr
done

# Zookeeper pods
for pod in solrcloud-zookeeper-0 solrcloud-zookeeper-1 solrcloud-zookeeper-2
do
  report_usage "$pod" /bitnami/zookeeper
done
echo "Results written to $OUTPUT_FILE"

After the script runs, review the report and resize any PVC that exceeds the threshold:

  • Open solr_pvc_usage.csv and check whether any PVC is more than 45% full.

  • If a PVC is more than 45% full and your storage class does not support volume expansion, resize the PVC manually before you start the upgrade. If a PVC exceeds the threshold and is not resized, the migration job can fail.

  • Migration typically takes a few minutes, but the duration depends on how much data is stored in Solrcloud. Run the helm upgrade command for the control plane service with a timeout of at least one hour (for example, --timeout 1h) so the migration has time to complete.
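Instead of reading the report by eye, you can filter it for volumes above the threshold. The following sketch assumes the CSV layout written by the script above; pods_over_threshold is a hypothetical helper name, and the threshold defaults to 45 to match pvcResizeThreshold.

```shell
#!/bin/bash
# List pods in the usage report whose volume is above a threshold.
# Usage: pods_over_threshold <csv-file> [threshold-percent]
# pods_over_threshold is a hypothetical helper name.
pods_over_threshold() {
  local csv="$1" threshold="${2:-45}"
  awk -F',' -v t="$threshold" '
    NR > 1 {                       # skip the header row
      pct = $6
      sub(/%/, "", pct)            # "52%" becomes "52"
      if (pct + 0 > t) print $1 " " $6
    }' "$csv"
}

# Run against the report if it exists in the current directory.
if [ -f solr_pvc_usage.csv ]; then
  pods_over_threshold solr_pvc_usage.csv
fi
```

An empty result means no volume in the report is more than 45% full.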

Plan for the following before you begin the migration:

  • Stop all traffic to the cluster before the migration to avoid data inconsistency. Resume traffic only after the migration completes and Kubernetes events data is visible in Container Observability.

  • The migration job creates a Kubernetes role. Confirm that no cluster policy blocks the creation of this role or the execution of the job. The role needs: get, list, watch, patch, and update on the solrcloud-solr, solrcloud-zookeeper, apache-solr, and apache-zookeeper statefulsets; get, list, watch, create, delete, and exec on pods; get, list, watch, create, update, patch, and delete on PVCs; and get, list, and watch on deployments.

When you set co-config-events-service-shared.solr.migration.enabled to true in app-mon-shared-services-values.yaml, the upgrade runs the migration as follows:

  1. The job creates a PVC named solrcloud-backup-pvc to use as temporary backup storage.

  2. The job checks the Solr and Zookeeper PVC usage. If usage exceeds the threshold, the job resizes the PVCs, then patches the existing Solrcloud statefulset and waits for the pods to come online.

  3. The job backs up the data from each Solr pod to the backup PVC, one pod at a time, then merges the data and writes a backup.complete marker so a re-run skips the backup step.

  4. The deployment deletes the existing Solr and Zookeeper pods and creates new pods that use the new images.

  5. A second job restores the data from the backup PVC into the new Solr pods through the Solr API, then removes the temporary data from the pods.

  6. When the job completes, your data is available in the Global View UI.

To force a fresh backup on a re-run, delete the backup.complete marker from the solrcloud-backup-pvc PVC. Deleting the PVC itself removes all backed-up data, so the job must back up the data again.

Migration is enabled by default in 2026.6.1. To tune the migration, set the following values in app-mon-shared-services-values.yaml:

co-config-events-service-shared:
  solr:
    migration:
      enabled: true
      storageSize: "100Gi"      
      pvcResizeThreshold: 45   
      resources:
        requests:
          cpu: 100m
          memory: 1Gi
        limits:
          cpu: 500m
          memory: 1Gi

The following table describes the migration fields. The co-config-events-service-shared prefix is omitted from the field names:

| Field | Description | Default value |
| --- | --- | --- |
| solr.migration.enabled | Enables the Solr and Zookeeper data migration during the upgrade. | true |
| solr.migration.storageSize | Size of the PVC created for the migration job. Set it to the same size as the Solrcloud PVC. | "100Gi" |
| solr.migration.pvcResizeThreshold | Usage percentage above which the job resizes the Solr and Zookeeper PVCs. Applies only when migration is enabled. | 45 |
| solr.migration.resources.requests.cpu | Minimum CPU reserved for the migration job. | 100m |
| solr.migration.resources.requests.memory | Minimum memory reserved for the migration job. | 1Gi |
| solr.migration.resources.limits.cpu | Maximum CPU the migration job can consume. | 500m |
| solr.migration.resources.limits.memory | Maximum memory the migration job can consume. | 1Gi |

When the migration is complete, finish these cleanup tasks:

  • Set co-config-events-service-shared.solr.migration.enabled to false in app-mon-shared-services-values.yaml so the migration job does not run again during future upgrades.

  • Keep the solrcloud-backup-pvc PVC for a few days as a backup. After you confirm that all data appears in Global View, delete it to free space.

  • The Solr and Zookeeper PVCs from the previous deployment remain in the cluster. Delete them to free space after you confirm a successful migration.

Create the shared services values file for upgrade

Create a file named app-mon-shared-services-values.yaml with the following content, and change the values to match your environment. The following sections describe each block in the file.

After the VictoriaMetrics cluster block, add the Solr migration toggle block at the end of the file. This block enables the Solr and Zookeeper data migration.

global:
  environment: "app"
  machine_type: "small" 
  secret_source: "valuesfile" 
  dockerRegistryCredentials: 
    DOCKER_SERVER: "https://index.docker.io/v2/"
    DOCKER_USERNAME: "username"
    DOCKER_PASSWORD: "password"

oc-shared-kafka: 
  controller:
    persistence:
      storageClass: ""  
      size: 100Gi       
    
    resources:
      requests:
        cpu: 500m
        memory: 1536Mi
      limits:
        cpu: 1
        memory: 1536Mi
    heapOpts: -Xmx1g -Xms512m
    provisioning:
      enabled: true
      topics:
        - name: ops_ingester_tsdb
          partitions: 40
    extraConfig: |
      num.partitions=40
      default.replication.factor=1
      log.retention.hours=1
      log.segment.bytes=1073741824
      message.max.bytes=20981520
      socket.send.buffer.bytes=102400
      socket.receive.buffer.bytes=102400
      socket.request.max.bytes=104857600
      offsets.topic.replication.factor=1
      transaction.state.log.min.isr=1
      transaction.state.log.replication.factor=1
    nodeSelector: {}
    tolerations: []

cp-metrics-service:
  global:
    nodeSelector: &nodeSelector {}
  env:
    KAFKA_LISTENER_CONCURRENCY: "2"
  horizontalPodAutoscaler:
    enabled: true
    maxReplicas: 5
    minReplicas: 2

victoria-metrics-cluster:
  vmstorage:
    replicaCount: 4
    
    retentionPeriod: 1
    persistentVolume:
      storageClassName: "" 
      size: 100Gi
    resources:
      limits:
        cpu: 500m
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 1Gi
    nodeSelector: {}
    tolerations: []
  vmselect:
    horizontalPodAutoscaler:
      minReplicas: 2
      maxReplicas: 10
    nodeSelector: {}
    tolerations: []
  vminsert:
    horizontalPodAutoscaler:
      minReplicas: 2
      maxReplicas: 10
    nodeSelector: {}
    tolerations: []

co-config-events-service-shared:
  solr:
    migration:
      enabled: true

Deployment-wide settings

This block sets the environment, machine size, secret source, and Docker registry credentials shared across the shared services:

global:
  environment: "app"
  machine_type: "small"          
  secret_source: "valuesfile"    
  dockerRegistryCredentials:     
    DOCKER_SERVER: "https://index.docker.io/v2/"
    DOCKER_USERNAME: "username"
    DOCKER_PASSWORD: "password"

The following table describes the deployment-wide fields:

| Field | Description | Default value |
| --- | --- | --- |
| global.environment | Logical environment name applied to the deployment. | "app" |
| global.machine_type | Resource profile applied to the shared services. | "small" (small/medium/large) |
| global.secret_source | Where secrets come from. valuesfile creates secrets from this file; none expects secrets to already exist in the namespace. | "valuesfile" |
| global.dockerRegistryCredentials.DOCKER_SERVER | Docker registry endpoint used to pull images. | "https://index.docker.io/v2/" |
| global.dockerRegistryCredentials.DOCKER_USERNAME | User name for the Docker registry. | "username" |
| global.dockerRegistryCredentials.DOCKER_PASSWORD | Password for the Docker registry. | "password" |

Shared Kafka parameters

This block configures the shared Kafka controller, including persistence, resource sizing, JVM heap, topic provisioning, and Kafka server properties:

oc-shared-kafka:
  controller:
    persistence:
      storageClass: ""    
      size: 100Gi         
    resources:
      requests:
        cpu: 500m
        memory: 1536Mi
      limits:
        cpu: 1
        memory: 1536Mi
    heapOpts: -Xmx1g -Xms512m
    provisioning:
      enabled: true
      topics:
        - name: ops_ingester_tsdb
          partitions: 40
    extraConfig: |
      num.partitions=40
      default.replication.factor=1
      log.retention.hours=1
      ...
    nodeSelector: {}
    tolerations: []

The following table describes the shared Kafka fields:

| Field | Description | Default value |
| --- | --- | --- |
| oc-shared-kafka.controller.persistence.storageClass | StorageClass for Kafka persistent volumes. If blank, the default StorageClass is used. | "" |
| oc-shared-kafka.controller.persistence.size | Size of the Kafka PVC. Size it as 50 GB per backend. | 100Gi |
| oc-shared-kafka.controller.resources.requests.cpu | Minimum CPU reserved for the Kafka container. | 500m |
| oc-shared-kafka.controller.resources.requests.memory | Minimum memory reserved for Kafka. | 1536Mi |
| oc-shared-kafka.controller.resources.limits.cpu | Maximum CPU Kafka can consume. | 1 |
| oc-shared-kafka.controller.resources.limits.memory | Maximum memory Kafka can consume. | 1536Mi |
| oc-shared-kafka.controller.heapOpts | Kafka JVM heap options, where -Xmx is the maximum heap size and -Xms is the initial heap size. | -Xmx1g -Xms512m |
| oc-shared-kafka.controller.provisioning.enabled | Enables automatic creation of Kafka topics during deployment. | true |
| oc-shared-kafka.controller.provisioning.topics | Topics to create. Each entry has a name and a partitions count. | ops_ingester_tsdb, 40 partitions |
| oc-shared-kafka.controller.extraConfig | Additional Kafka server properties injected into the controller configuration. | |
| oc-shared-kafka.controller.nodeSelector | Constrains Kafka pods to nodes with matching labels. | {} |
| oc-shared-kafka.controller.tolerations | Allows Kafka pods to schedule onto tainted nodes. | [] |
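The sizing rule for the Kafka PVC (50 GB per backend) is simple arithmetic, sketched below. kafka_pvc_size is a hypothetical helper name, and the Gi suffix is an assumption chosen to match the 100Gi default in the values file.

```shell
#!/bin/bash
# Compute a Kafka PVC size from the number of backends, at 50 per backend.
# kafka_pvc_size is a hypothetical helper; the Gi suffix is an assumption
# that mirrors the 100Gi default in the values file.
kafka_pvc_size() {
  local backends="$1"
  echo "$((backends * 50))Gi"
}

kafka_pvc_size 2   # two backends, prints 100Gi
```

Use the result as the value of oc-shared-kafka.controller.persistence.size.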

Metrics service parameters

This block configures the metrics service, including the Kafka listener concurrency and horizontal pod auto-scaling. It also defines a reusable nodeSelector anchor:

cp-metrics-service:
  global:
    nodeSelector: &nodeSelector {}
  env:
    KAFKA_LISTENER_CONCURRENCY: "2"
  horizontalPodAutoscaler:
    enabled: true
    maxReplicas: 5
    minReplicas: 2

The following table describes the metrics service fields:

| Field | Description | Default value |
| --- | --- | --- |
| cp-metrics-service.global.nodeSelector | Node selection rules for the metrics service, defined as a reusable anchor. | &nodeSelector {} |
| cp-metrics-service.env.KAFKA_LISTENER_CONCURRENCY | Number of concurrent Kafka listener threads the service uses. | "2" |
| cp-metrics-service.horizontalPodAutoscaler.enabled | Enables horizontal pod auto-scaling for the metrics service. | true |
| cp-metrics-service.horizontalPodAutoscaler.minReplicas | Minimum number of replicas when auto-scaling is enabled. | 2 |
| cp-metrics-service.horizontalPodAutoscaler.maxReplicas | Maximum number of replicas when auto-scaling is enabled. | 5 |

VictoriaMetrics cluster parameters

This block configures the VictoriaMetrics cluster that stores time-series metrics, including storage retention, persistence, resources, and auto-scaling for the select and insert components:

victoria-metrics-cluster:
  vmstorage:
    replicaCount: 4
    retentionPeriod: 1    
    persistentVolume:
      storageClassName: ""    
      size: 100Gi
    resources:
      limits:
        cpu: 500m
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 1Gi
    nodeSelector: {}
    tolerations: []
  vmselect:
    horizontalPodAutoscaler:
      minReplicas: 2
      maxReplicas: 10
  vminsert:
    horizontalPodAutoscaler:
      minReplicas: 2
      maxReplicas: 10

The following table describes the VictoriaMetrics cluster fields:

| Field | Description | Default value |
| --- | --- | --- |
| victoria-metrics-cluster.vmstorage.replicaCount | Number of vmstorage replicas. | 4 |
| victoria-metrics-cluster.vmstorage.retentionPeriod | Data retention period. Supported values: 1w, 1d, or a number that means months (2 = 2 months). | 1 |
| victoria-metrics-cluster.vmstorage.persistentVolume.storageClassName | StorageClass for vmstorage volumes. If blank, the default StorageClass is used. | "" |
| victoria-metrics-cluster.vmstorage.persistentVolume.size | Size of the vmstorage PVC. | 100Gi |
| victoria-metrics-cluster.vmstorage.resources | CPU and memory requests and limits for vmstorage. | 500m/1Gi |
| victoria-metrics-cluster.vmselect.horizontalPodAutoscaler | Minimum and maximum vmselect replicas when auto-scaling. | 2/10 |
| victoria-metrics-cluster.vminsert.horizontalPodAutoscaler | Minimum and maximum vminsert replicas when auto-scaling. | 2/10 |
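To make the retentionPeriod forms concrete, the sketch below turns a value into a readable duration. It assumes that any whole number can precede the w or d suffix, which goes slightly beyond the 1w and 1d examples in the table; describe_retention is a hypothetical helper name.

```shell
#!/bin/bash
# Describe a retentionPeriod value: a w suffix means weeks, a d suffix
# means days, and a bare number means months.
# describe_retention is a hypothetical helper name.
describe_retention() {
  local value="$1" number unit
  case "$value" in
    *w) number="${value%w}"; unit="week(s)" ;;
    *d) number="${value%d}"; unit="day(s)" ;;
    *)  number="$value";     unit="month(s)" ;;
  esac
  # Reject empty or non-numeric amounts.
  case "$number" in
    ''|*[!0-9]*) echo "unsupported retentionPeriod: $value" >&2; return 1 ;;
  esac
  echo "$number $unit"
}

describe_retention 1   # the default in the values file, prints 1 month(s)
```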

Solr migration toggle

This block enables the Solr and Zookeeper data migration described in Migrate Solr and Zookeeper data. Keep it enabled for the upgrade, then disable it afterward:

co-config-events-service-shared:
  solr:
    migration:
      enabled: true

The following table describes the migration toggle field:

| Field | Description | Default value |
| --- | --- | --- |
| co-config-events-service-shared.solr.migration.enabled | Runs the Solr and Zookeeper data migration during the upgrade. Set to false after the migration completes. | true |

Deploy the shared services

Deploy the shared services with Helm, Argo CD, or Terraform. Before you begin, update the Helm repository and check the latest version of the virtana-repo/virtana-co-controller chart:

helm repo update
helm search repo virtana-repo/virtana-co-controller
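For a Helm-based deployment, the upgrade command might look like the following sketch. The release name co-shared-services is hypothetical, the namespace assumes the oc-shared-services namespace used earlier in this topic, and the chart is assumed to be the virtana-repo/virtana-co-controller chart named above. Confirm all three against Deploy shared services before you run the command.

```shell
#!/bin/bash
# Assumed defaults; override them through the environment.
RELEASE="${RELEASE:-co-shared-services}"      # hypothetical release name
NAMESPACE="${NAMESPACE:-oc-shared-services}"  # assumed namespace
VALUES_FILE="${VALUES_FILE:-app-mon-shared-services-values.yaml}"

# Run the upgrade with the new deployment tag and a one-hour timeout.
# Any arguments are placed before the command, so "run_upgrade echo"
# prints the command instead of running it.
run_upgrade() {
  "$@" helm upgrade "$RELEASE" virtana-repo/virtana-co-controller \
    --namespace "$NAMESPACE" \
    --values "$VALUES_FILE" \
    --set tags.platform=true \
    --timeout 1h
}

# Dry run: print the command for review.
run_upgrade echo
```

After you review the printed command, call run_upgrade without arguments to perform the upgrade.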

See Deploy shared services for detailed configuration settings.

Re-deploy the frontend and backend

After you upgrade the shared services, see Deploy Container Observability Frontend and Backend to re-deploy the Container Observability frontend and one backend for each monitored cluster.

Re-deploy the South components

After you re-deploy the frontend and backend, see Deploy Container Observability South to re-deploy the Container Observability South components, which collect data from your monitored clusters.