Growth is not an accident. It’s a perfect fit.


GitOps Best Practices

What is GitOps?

GitOps is an operational model where Git is the single source of truth for your infrastructure and applications. Instead of manually configuring servers or clicking through dashboards, you define everything in code, store it in Git, and let automation handle the rest.

The simple rule: What's in Git = What runs in production.

How It Works

  • All configuration lives in Git — infrastructure, applications, policies
  • Changes go through pull requests — code review, approval, audit trail
  • GitOps operator watches the repo — tools like ArgoCD or Flux
  • Automatic sync — operator ensures the live environment matches Git
  • Self-healing — if someone makes manual changes, the system reverts them
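The steps above can be made concrete with an operator configuration. Below is a minimal Argo CD Application manifest, a sketch with a hypothetical repo URL and path, showing how the operator is pointed at a Git path and told to prune and self-heal:

```yaml
# Minimal Argo CD Application (illustrative; repo URL, path, and names are placeholders)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-a
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/org/gitops-repo.git
    targetRevision: main
    path: apps/app-a
  destination:
    server: https://kubernetes.default.svc
    namespace: app-a
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual changes made directly in the cluster
```

With `selfHeal` enabled, the operator continuously reconciles the cluster back to what is committed, which is exactly the self-healing behaviour described above.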

Key Advantages

  • Faster deployments — minutes instead of hours, with no manual steps
  • Complete audit trail — every change tracked: who, what, when, why
  • Easy rollbacks — revert to any previous state with git revert
  • Reduced human error — no more “fat finger” mistakes in production
  • Enhanced security — no direct production access needed; all changes reviewed
  • Disaster recovery — rebuild the entire environment from Git in minutes
  • Drift detection — the system alerts when reality doesn’t match the desired state
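The easy-rollback point is worth making concrete: because every change is a commit, undoing a bad deployment is a plain git revert followed by a push, which the operator then syncs. A self-contained sketch you can run in a scratch directory (file name and values are made up):

```shell
# Demo of a GitOps-style rollback: a bad change is undone with `git revert`,
# producing a new commit the operator would sync. Safe to run anywhere.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email demo@example.com && git config user.name demo
echo "replicas: 3" > deploy.yaml
git add deploy.yaml && git commit -qm "scale to 3"
echo "replicas: 30" > deploy.yaml            # the "fat finger" change
git commit -qam "oops: one zero too many"
git revert --no-edit HEAD                    # undo it, keeping full history
cat deploy.yaml                              # prints: replicas: 3
```

The revert is itself an auditable commit, so the history records both the mistake and its correction.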

Common Use Cases

  • Kubernetes Deployments — manage apps, scaling, and updates across clusters
  • Infrastructure as Code — provision cloud resources (VMs, networks, storage)
  • Multi-Environment Management — consistent config across dev/staging/prod
  • Compliance & Governance — enforce policies, maintain audit trails
  • Configuration Management — centralize and version all system configs

Architecture Components

  • Git repository — source of truth for all configs (GitHub, GitLab, Bitbucket)
  • GitOps operator — watches Git and applies changes (ArgoCD, Flux, Jenkins X)
  • CI pipeline — builds, tests, and validates (GitHub Actions, GitLab CI)
  • Target environment — where workloads run (Kubernetes, AWS, Azure)

GitOps vs Traditional Deployment

  • Change process — SSH/console access (traditional) vs. Git commit + PR (GitOps)
  • Audit trail — scattered logs vs. full Git history
  • Rollback — manual and error-prone vs. git revert
  • Environment drift — undetected vs. auto-corrected
  • Access control — production credentials needed vs. Git permissions only

Implementation Recommendations

Start with:

  • One non-critical application or environment
  • ArgoCD or Flux as the GitOps operator
  • Clear repository structure (separate repos for apps vs infrastructure)

Repository structure example:

├── apps/
│   ├── app-a/
│   └── app-b/
├── infrastructure/
│   ├── networking/
│   └── storage/
└── environments/
    ├── dev/
    ├── staging/
    └── production/
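One common way to connect the environments/ directories to the shared app definitions is Kustomize overlays. A minimal sketch, assuming the structure above (file names, the replica count, and the app name are illustrative):

```yaml
# environments/staging/kustomization.yaml (illustrative overlay)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../apps/app-a          # reuse the shared base manifests
patches:
  - target:
      kind: Deployment
      name: app-a
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 2              # staging runs fewer replicas than production
```

Each environment then differs only by its overlay, so dev/staging/prod stay consistent by construction.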

Most Common Mistakes in GitOps Implementation

  • Not Using Linting/Code Checks: Failing to use linting or code checks often results in inconsistent quotation marks, indentation, and overall messy code. This makes maintenance difficult and increases the risk of errors.
  • Poor Secrets Management: Whether it’s committing secrets directly to the repository or not using an external secrets store, poor secrets management complicates audits and creates potential security risks. It also makes password rotation and automation more difficult.
  • “All-in-One” Approach: We often encounter clients who keep all their Ansible playbooks or Terraform manifests in a single file. This makes the code difficult to read, maintain, and scale.
  • Insufficient Security for Repository Pushes: Allowing direct pushes to the main branch, merging pull requests without approvals, or not enforcing checks can lead to unauthorized or untested changes being deployed.
  • Poor Code Segregation for Different Environments: Poorly designed separation of code between environments can lead to changes being applied to environments they were never intended for, increasing the risk of misconfigurations.
  • Manual Interventions and Ignoring GitOps: Making quick, temporary fixes directly on servers without updating the repository undermines the GitOps approach. These changes are not tracked, leading to configuration drift and inconsistency.
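The first mistake above is cheap to prevent: run linters in CI before anything merges. A minimal GitLab CI sketch, with illustrative job and path names:

```yaml
# .gitlab-ci.yml — illustrative lint job for a GitOps repository
lint-manifests:
  stage: test
  image: python:3.12-slim
  script:
    - pip install --quiet yamllint
    - yamllint apps/ infrastructure/ environments/   # fail the pipeline on style violations
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
```

Combined with protected branches and required approvals, this also addresses the "insufficient security for repository pushes" mistake.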

The Bottom Line

GitOps brings the same rigor we apply to application code to infrastructure management. The result: faster, safer, and more reliable deployments with complete visibility and control.

Key takeaway: GitOps reduces deployment risk and operational overhead while improving speed and compliance — benefiting both engineering teams and business stakeholders.

Learn More About Our Approach

GitOps is part of our Techfitting™ framework, which aligns technology, people, and processes to drive business growth. If you’re looking to implement GitOps or enhance your DevOps practices, explore our DevOps services to see how we can help.


Training Course: Legislation in Payment Services Provision in Slovakia and the EU

Course Overview

This course provides a comprehensive overview of the regulatory framework for payment services in Slovakia and the EU. It covers the evolution of the Payment Services Directives (PSD1, PSD2, PSD3), the Payment Services Regulation (PSR), Open Banking, and the Financial Data Access (FIDA) framework. The course is designed for professionals in banking, compliance, and financial services who need to understand the legal and practical implications of these regulations.

Venue, Date, and Time

  • Venue: Grow2FIT s.r.o., Nivy Tower, Mlynské Nivy 5, Bratislava
  • Date: 29.04.2026
  • Time: 09:00 – 15:00 (including a 1-hour lunch break from 11:30 to 12:30)
  • Price: 199€ incl. VAT for 1 participant

Part 1: Payment Services Directives and Regulation

  • What are PSD1, PSD2, and PSD3?
    Overview of each directive, their scope, and what they regulate.
  • What is PSR (Payment Services Regulation)?
    Explanation of PSR, its purpose, and its scope.
  • Goals of PSR
    Key objectives of PSR and its impact on the payment services market.
  • Key Contributions of PSD2 and the Need for PSD3
    Why PSD3 was developed and its main improvements over PSD2.
  • Key Differences Between PSD3 and PSR
    Comparison of PSD3 and PSR, highlighting their unique features and overlaps.
  • Impact of PSD3 and PSR on Consumers and Banks
    How these regulations will affect consumers and financial institutions.
  • New Rulebooks vs. New Legislation
    How new rulebooks align with legislative changes, with practical examples.

Part 2: Open Banking and FIDA

  • Open Banking and API
    What Open Banking means and how APIs facilitate it.
  • FIDA: Objectives and Benefits
    Goals of the Financial Data Access (FIDA) framework and its advantages.
  • How FIDA Works
    Practical explanation of FIDA’s functionality.

Who Should Attend?

  • Compliance officers
  • Banking professionals
  • Financial consultants
  • Legal advisors
  • Payment service providers

Course Benefits

  • Gain a deep understanding of the regulatory landscape for payment services in the EU and Slovakia.
  • Learn about the practical implications of PSD3, PSR, and Open Banking.
  • Stay updated on the latest legislative changes and their impact on your organization.

Your Lecturer


Mária Gardianová
Senior Banking Consultant

Mária brings over 32 years of experience in banking, specializing in domestic and cross-border payment services. She has worked in prominent commercial banks, including Creditanstalt, Hypovereinsbank, HVB Bank, Tatrabanka, and BKS Bank, and played a key role in two major mergers: BACA-HVB and HVB-Unibanka. She also spent 4.5 years at the National Bank of Slovakia as a chief inspector overseeing banking and payment services, as well as AML compliance in payment institutions.

Mária was a long-term member of the Slovak Banking Association (2010–2018) and served as a delegate for the Slovak Republic in Brussels at the SEPA Working Group, contributing to the implementation of IBAN+, SCT, and SDD (2012–2016). Additionally, she represented the National Bank of Slovakia in the EBA SCPS (Standing Committee on Payment Services) from 2019 to 2021. Currently, she works at UniCredit S.p.A., focusing on IT service projects and payment service products across CEE countries within the UniCredit Group.

Training Course Registration

    Full Name (required)

    Email (required)

    Company Name (required)

    Phone Number (optional)

    Notes/Specific Requests (optional)

    By submitting this form, you agree to the processing of your personal data in accordance with our privacy policy for the purpose of organizing the training.


    Case Study: KvaPay – Infrastructure Enhancements & GitOps Deployment

    The Challenge

    KvaPay needed to modernise its infrastructure to support reliable scaling, faster application deployments, and improved operational visibility. Their existing setup lacked automation, advanced monitoring, and structured management of Kubernetes clusters. The company aimed to adopt GitOps principles, strengthen observability, and implement disaster recovery capabilities to ensure stability and efficiency.

    KvaPay crypto payment platform dashboard with transactions, balances and online POS

    The Solution

    We applied our Techfitting™ methodology—tailoring technology, people, and processes to customers’ real business needs. Instead of generic tooling or over-engineering, we designed an infrastructure that “fits” exactly what the business required to grow.

    The architecture was built around two main layers:

    • Kubernetes Infrastructure – Application workloads managed through Rancher, with automated provisioning via Terraform and Ansible.
    • Support Infrastructure – Monitoring, CI/CD runners, error tracking, and backup services.

    Key technologies: Kubernetes, GitLab, Rancher, Terraform, Ansible, Proxmox, ArgoCD.

    All changes were version-controlled and reproducible via GitOps, reducing operational risk. Networks were segmented into facility, production, staging, and backup environments to ensure security and resilience.

    Key Deliverables

    • GitOps automation – CI/CD pipelines with GitLab CI, Terraform, Ansible, and Packer for VM provisioning and configuration.
    • Kubernetes clusters – Centralized management via Rancher, with a monitoring stack (Prometheus, Grafana, Loki) for unified observability.
    • Monitoring & observability – Prometheus/Grafana for clusters and Sentry for application-level exceptions.
    • Resilient infrastructure – Segmented and isolated networks with automated provisioning of masters, workers, and runners.
    • Backup & disaster recovery – External storage integrated as a backup target with tested recovery workflows.
    • Documentation & training – Comprehensive design documentation, workflows, and handover to the KvaPay team.

    Kubernetes cluster architecture with management, production, staging and development environments

    Impact

    By applying Techfitting™, Grow2FIT delivered an infrastructure that was neither “too much” nor “too little,” but exactly what KvaPay needed to grow:

    • Faster, more reliable deployments with full GitOps control.
    • A single pane of glass for infrastructure and application monitoring.
    • Developers empowered by real-time exception tracking through Sentry.
    • Secure, isolated environments ensuring production stability.
    • Future-proof scalability with automated provisioning and centralized management.

    Client Statement

    When we first approached Grow2FIT, we only had rough ideas of what we wanted to achieve. Their team guided us through the analysis and design process, helping us transform those ideas into a robust, well-structured solution perfectly tailored to our needs. Throughout the project, they demonstrated deep expertise, clear communication, and a professional approach that made every phase smooth and effective. We see Grow2FIT as a reliable long-term partner bringing valuable experience and structure to our projects.

    Marián Babušek
    CEO, KvaPay


    Appello: Delivery & Testing Assessment

    The Challenge

    Appello, a leading provider of innovative Modular/Microservice products for the financial sector, reached out to Grow2FIT with the goal of improving efficiency across its delivery, testing, and tooling practices. Like many innovative technology companies, Appello faced the challenge of scaling its development and testing processes while maintaining flexibility and speed. In particular, the company needed stronger test automation in a low-code environment, unified governance in Jira, and practical recommendations to streamline software delivery and team collaboration.


    Our Approach

    With our Techfitting™ mindset – combining the right people, technology, and processes – Grow2FIT designed and delivered a series of Quick Assessments. These focused on the most critical areas of Appello’s IT landscape:

    • Jira governance and optimisation – from security and upgrade options to standardised workflows and reporting.
    • Test automation strategy for low-code platforms – introducing a metadata-driven approach to make automated tests faster to build, easier to maintain, and future-proof.
    • Security improvements – ensuring the tooling environment was aligned with best practices ahead of upcoming audits.

    The Outcome

    The result was a clear and actionable roadmap that provided Appello with both immediate wins and long-term direction. By adopting our recommendations, Appello is now positioned to:

    • speed up testing while reducing maintenance effort,
    • improve governance and security of its tooling,
    • and strengthen its overall software delivery capability.

    Appello final report example

    Impact

    Appello gained not only expert recommendations but also confidence in the next steps for scaling its operations. The collaboration demonstrated the value of Techfitting™: when the right expertise, tools, and processes come together at the right time, growth is no accident – it’s a perfect fit.

    Client Statement

    Grow2FIT has proven to be a trusted partner, delivering quick and focused support exactly when we needed it most. Their expert insights and responsive approach helped us streamline our delivery and testing processes with confidence.

    János Szilágyi – Head of PMO


    Case Study: SkyToll – Automation of Selenium Grid Environment Setup

    How we transformed sequential testing into parallel testing by replacing a single Selenium instance with a distributed Selenium Grid and automating its deployment on MS Azure using an Infrastructure-as-Code approach.

    About the Project

    For SkyToll, we designed and implemented a transition from a single Selenium instance to a scalable Selenium Grid. The project involved designing a distributed architecture on MS Azure, creating automated deployment scripts using an Infrastructure-as-Code approach, and migrating the existing PHP test scripts from the single-instance setup to the parallel grid environment.

    Current State (AS-IS)

    • Testing was performed in a single Selenium instance with sequential test execution
    • Limited testing capacity without parallelization capability
    • Lack of scalable infrastructure for distributed testing
    • Dependency on a single test node without redundancy

    Implemented Solution

    • Distributed platform: Transition to Selenium Grid with parallel testing capability
    • Infrastructure as Code: Automated scripts for reproducible deployment of entire grid infrastructure
    • Test parallelization: Transformation from sequential testing to parallel execution
    • Cloud-native approach: Utilization of Azure services for optimal scalability

    Technical Solution

    Deployment and Infrastructure

    • Cloud-native solution: Deployment on MS Azure
    • Automated deployment: Ansible scripts for completely automated grid infrastructure creation
    • Scalable architecture: Hub-node topology enabling dynamic addition of test nodes
    • Parallel execution: Capability for concurrent execution of multiple tests across nodes

    Integrations and Tools

    • Selenium Grid: Central coordination of distributed testing instead of single instance
    • PHP scripts: Migration of existing test scripts for grid architecture with parallelization support
    • MS Azure: Cloud platform for hosting and scaling
    • Ansible: Infrastructure as Code for automated deployment of entire infrastructure
    • DevOps tools: Automation of deployment processes
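    A hub-and-node grid of this kind can be sketched with Docker Compose for local experimentation. This is an illustrative Selenium 4 setup, not SkyToll's actual Azure deployment; image tags and the scale count are assumptions:

```yaml
# Illustrative Selenium Grid: one hub, N chrome nodes.
# Scale nodes with: docker compose up --scale chrome=4
version: "3"
services:
  selenium-hub:
    image: selenium/hub:latest
    ports:
      - "4444:4444"   # tests point RemoteWebDriver at http://<host>:4444
  chrome:
    image: selenium/node-chrome:latest
    depends_on:
      - selenium-hub
    environment:
      - SE_EVENT_BUS_HOST=selenium-hub
      - SE_EVENT_BUS_PUBLISH_PORT=4442
      - SE_EVENT_BUS_SUBSCRIBE_PORT=4443
```

    The same hub-node topology carries over to Azure VMs provisioned by Ansible, with the hub address as the single endpoint the PHP test scripts target.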

    Results and Benefits

    Operational Advantages

    • Parallel testing: Capability for parallel execution of multiple tests simultaneously
    • Scalability: Ability to dynamically add test nodes as needed
    • Increased capacity: Elimination of single Selenium instance bottleneck
    • Redundancy: Elimination of dependency on single test node
    • Rapid deployment: Automated scripts enable quick creation of new grid environment

    Technical Benefits

    • Distributed platform: Transition from monolithic to distributed testing architecture
    • Reproducibility: Infrastructure as Code ensures consistent deployment
    • Cloud benefits: Utilization of Azure services for optimal performance and cost efficiency

    Result: SkyToll obtained a modern, scalable testing environment with parallel test execution capability, which dramatically accelerated the entire testing process and increased capacity without dependency on a single node.


    Case Study: ČSOB CZ – DevOps Capabilities Assessment

    How we conducted a detailed DevOps assessment for ČSOB, delivering actionable recommendations and clear roadmaps across GitOps, Service Mesh, and Observability to accelerate their digital transformation.

    About the Project

    For ČSOB CZ (Czechoslovak Commercial Bank), we conducted a structured DevOps Capabilities Discovery Phase to assess current practices and design transformation strategies across three key areas: GitOps and Versioning, Service Mesh, and Observability. The engagement delivered concrete recommendations with clear prioritisation to enable ČSOB’s teams to implement meaningful improvements while respecting banking sector regulatory requirements.

    Our proven assessment methodology

    Phase 1: Introductory Workshop & Current State Analysis

    • Infrastructure assessment: Detailed evaluation of existing DevOps tools, including GitLab, GitHub Actions, ArgoCD, OpenShift, and Broadcom monitoring stack.
    • DevOps process evaluation: In-depth analysis of development and deployment workflows, CI/CD configurations, release management, and operational procedures to identify optimisation opportunities.
    • Team engagement: Direct collaboration with ČSOB’s development and operations teams to understand real pain points and organisational constraints.
    • Baseline establishment: Documentation of current capabilities across all focus areas.

    Phase 2: Gap Analysis & Solution Design

    • Industry benchmarking: Comparison against banking sector best practices and modern DevOps standards.
    • Technology evaluation: Assessment of tools like OpenTelemetry, Prometheus, Elastic Stack, HashiCorp Vault, and Consul for their fit within ČSOB’s environment.
    • TO-DO recommendations development: Creation of detailed activity lists for each focus area, with specific recommendations ranked by importance and implementation complexity.
    • Roadmap prioritisation: Development of prioritised implementation roadmaps that identify quick wins and strategic initiatives, with the selection of immediate next steps based on impact assessment.

    Key outcomes and value delivered

    GitOps & Versioning Optimisation

    Our assessment provided specific recommendations for consolidating CI/CD workflows, implementing Software Bill of Materials (SBOM) for container security, and establishing automated certificate lifecycle management. We designed a clear path from their current multi-tool environment (GitLab, UrbanCode, GitHub Actions, ArgoCD) to a standardised, secure deployment process.

    Service Mesh Enhancement

    We delivered a modular implementation strategy, breaking down their complex service mesh into manageable components (API Gateway, Ingress/Egress Gateway, Service Discovery), with specific guidance for gradual integration, starting with 5-10 applications. Our recommendations addressed their existing Consul, Envoy, and Vault implementations while planning for OpenShift integration.

    Comprehensive Observability

    We designed a hybrid approach, maintaining their Broadcom investments while introducing modern capabilities through OpenTelemetry and Prometheus. This addressed their vendor lock-in concerns while improving troubleshooting by better correlating data between metrics, logs, and traces.

    Key benefits achieved

    • Enhanced security through container signing and SBOM implementation.
    • Improved operational efficiency via automated deployment processes.
    • Vendor flexibility through open standards adoption.
    • Faster problem resolution with better observability correlation.

    Why our approach works

    This engagement demonstrates our systematic methodology for DevOps transformation in highly regulated environments. We combine in-depth technical assessments with collaborative workshops to ensure that recommendations are both technically sound and organizationally achievable.

    Key success factors

    • Banking sector expertise: Understanding of regulatory constraints and compliance requirements.
    • Collaborative approach: Working directly with client teams rather than imposing external solutions.
    • Practical focus: Delivering actionable recommendations with clear implementation paths.
    • Risk-aware planning: Phased implementation minimising operational disruption.

    The result: ČSOB received not just an assessment report, but a practical transformation roadmap they could immediately begin executing with their existing teams and resources.

    Client Statement

    The external study has provided us with a valuable independent perspective on our processes and brought in new insights that go beyond our existing in-house know-how. Thanks to the objective and pragmatic assessment of further development options, we now have a clearer understanding of potential innovations and risks that we might have overlooked internally. I especially appreciate the perspective and independence that the external team brought to the evaluation, enabling us to make more qualified decisions about our future direction.

    Roman Mašek, Director – IT Digital Services


    Case Study: SkyToll – Automated Central Monitoring

    How we transformed infrastructure monitoring by consolidating 7 fragmented systems into one monitoring platform with advanced automation – environment preparation shortened from 1.5 days to 25 minutes, replacing 7 dashboards with just 1 and automatic Jira ticket creation.

    About the Project

    For SkyToll, we designed and implemented a comprehensive automated central infrastructure monitoring solution aimed at consolidating and automating oversight of client systems. The project involved migration from fragmented monitoring to a unified platform with advanced analytical and reporting capabilities.

    Key Challenges

    Current State (AS-IS)

    • Separate Zabbix installations for each customer with different OS versions, databases, and configurations.
    • No unified dashboard across different Zabbix installations.
    • Primarily technical monitoring without service-level perspective.
    • Manual process for creating Jira tickets.
    • Employees monitoring 7 different Zabbix dashboards.

    Implemented Solution

    • Consolidated Platform: Independently built monitoring platform with Grafana dashboards consolidating status from local Zabbix installations.
    • Client Customization: Zabbix plugins customized according to specific requirements of individual customers.
    • Process Automation: Automated Jira ticket creation for detected issues.
    • Extended Monitoring: In addition to technical monitoring, service-level monitoring was implemented (planned application-level monitoring was not implemented).
    • Centralized Management: Consolidation of all Zabbix installations into one control center.

    Technical Solution

    Deployment and Infrastructure

    • Infrastructure as Code: Deployment using Terraform, Ansible, and GitLab CI/CD.
    • Configuration Automation: Zabbix automatically configured using Git and Ansible.
    • 98% time savings: Dramatic improvement in deployment efficiency – complete installation shortened from 1-1.5 days to 25 minutes.
    • Scalability: Ability to rapidly deploy for any number of new clients.
    • Note: Only automated installations were addressed; Zabbix configuration remained manual.

    Integrations and Tools

    • Grafana: Central dashboards and visualizations with automated deployment.
    • Zabbix: Advanced automation, noise reduction, JMX monitoring with plugin customization.
    • Jira Integration: Automatic incident creation and replication.
    • ELK Stack / New Relic: Service level monitoring (application-level monitoring not implemented).
    • API Verification: ACK messages via API interface.
    • DevOps Tools: Terraform, Ansible, GitLab CI/CD.

    Results and Benefits

    Operational Advantages

    • Monitoring Consolidation: One central dashboard instead of monitoring 7 different systems.
    • Process Automation: Elimination of manual processes in incident creation.
    • Dramatic deployment time reduction: New environment installation accelerated from 1-1.5 days to 25 minutes (98% time savings).
    • Increased Efficiency: Faster identification and resolution of issues.
    • Better Visibility: Comprehensive view of all system states.

    Technical Benefits

    • Unified Platform: Standardized environment for all monitoring activities.
    • Standardization: Strategy for upgrading and consolidating Zabbix versions (4.x → 5.x → 6.x → 6.4).
    • High Availability: Production deployment with HA architecture.
    • Extended Capabilities: Advanced analytical and reporting functions.

    Strategic Direction

    The project represents the first step in a long-term monitoring modernization strategy with planned gradual migration to the latest tool versions and expansion with advanced application monitoring features. The solution ensures scalability for future growth and integration of new technologies.

    Contact Person

    Roman Minár, Head of IT Operations


    🏃‍♂️💙 Grow2FIT Ran Strong at the ČSOB Bratislava Marathon

    🏃‍♂️💙 At Grow2FIT, we love tech — and we love outdoor sports just as much!

    That’s why we were proud to take part in this year’s ČSOB Bratislava Marathon with two relay teams.

    It was an amazing experience to spend the whole Sunday running through the streets of Bratislava, cheering each other on, and celebrating the results together afterward. The energy, the teamwork, and the city vibes were unforgettable.

    💬 Big shoutout to the event organizers – everything was well-prepared and professional. We’ll definitely be back next year!


    Disaster Recovery in Ceph with cephadm, Ceph-CSI, and RBD Mirror

    Introduction

    Ceph is a highly available, scalable, and resilient storage solution widely used in cloud and enterprise environments. However, even with its built-in redundancy, disaster recovery (DR) strategies are essential to ensure business continuity in case of data center failures, network outages, or hardware failures. Ceph provides robust disaster recovery options, including RBD mirroring, to replicate block storage volumes across geographically separated Ceph clusters.

    With the introduction of cephadm, Ceph cluster management has become more straightforward, making it easier to deploy and maintain disaster recovery setups. Additionally, Ceph-CSI enables Kubernetes clusters to consume Ceph storage efficiently. In this article, we will explore how to set up disaster recovery in Ceph using cephadm, Ceph-CSI, and RBD Mirror to protect RBD volumes used by Kubernetes clusters deployed across two data centers.

    Disaster Recovery Architecture

    We have two geographically separated data centers:

    • Primary Data Center (Production): Hosts a Kubernetes cluster and a Ceph cluster.
    • Secondary Data Center (Disaster Recovery – DR): Hosts another Kubernetes cluster and a Ceph cluster where data is replicated using RBD mirroring.

    Kubernetes workloads in the primary DC store their persistent data in Ceph RBD volumes via Ceph-CSI. These volumes are mirrored asynchronously to the secondary DC using RBD mirroring, ensuring data availability in case of a failure in the primary DC.

    Deploying Ceph with cephadm in Both Data Centers

    Bootstrap the Ceph Cluster

    On each Ceph cluster (Primary and Secondary):

    cephadm bootstrap --mon-ip <mon-ip>

    Add Additional Nodes

    cephadm shell -- ceph orch host add <hostname> <host-ip>

    Deploy Required Services

    ceph orch apply mon
    ceph orch apply mgr
    ceph orch apply osd --all-available-devices
    ceph orch apply rbd-mirror

    Ensure that the rbd-mirror daemon is running on both clusters:

    ceph orch ps | grep rbd-mirror

    Configure RBD Mirroring

    On the primary Ceph cluster:

    rbd mirror pool enable <pool> snapshot

    Export and import the authentication key:

    ceph auth get client.rbd-mirror -o rbd-mirror.key
    scp rbd-mirror.key <secondary-host>:
    ssh <secondary-host> 'ceph auth import -i rbd-mirror.key'

    On the secondary Ceph cluster, add a peer connection:

    rbd mirror pool peer add <pool> client.rbd-mirror@<primary-cluster-name>

    Verify peering status:

    rbd mirror pool status <pool>

    Installing Ceph-CSI in Kubernetes Clusters

    Now that the Ceph clusters in both data centers are ready, we can deploy Ceph-CSI on our Kubernetes clusters. Ceph-CSI must be deployed in both locations; note that the secondary site's configuration has to be adjusted during failover, as described below.

    Deploy Ceph-CSI Driver

    kubectl apply -f https://raw.githubusercontent.com/ceph/ceph-csi/devel/deploy/rbd/kubernetes/csi-rbdplugin.yaml
    kubectl apply -f https://raw.githubusercontent.com/ceph/ceph-csi/devel/deploy/rbd/kubernetes/csi-rbdplugin-provisioner.yaml

    Enable RBD Mirroring on the Pool

    rbd mirror pool enable <pool> snapshot

    Configure StorageClass to Use the Mirrored Pool

    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: ceph-rbd-mirrored
    provisioner: rbd.csi.ceph.com
    parameters:
      clusterID: <cluster-id>
      pool: <mirrored-pool>
      imageFormat: "2"
      imageFeatures: layering
      csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
      csi.storage.k8s.io/provisioner-secret-namespace: default
      csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
      csi.storage.k8s.io/node-stage-secret-namespace: default
    reclaimPolicy: Delete
    allowVolumeExpansion: true

    Apply this StorageClass:

    kubectl apply -f storageclass.yaml

    Create a PersistentVolumeClaim (PVC)

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: ceph-rbd-pvc
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
      storageClassName: ceph-rbd-mirrored

    Failover Process (Switching to the Secondary Data Center)

    Promote the Secondary Ceph Cluster

    rbd mirror pool promote <pool-name>

    Update ClusterID and PoolID Mappings

    Ensure that the Kubernetes cluster in the DR site correctly maps the Ceph cluster’s ClusterID and PoolID using the predefined mapping.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ceph-csi-config
    data:
      cluster-mapping.json: |-
        [
          {
            "clusterIDMapping": {
              "primary-cluster-id": "secondary-cluster-id"
            },
            "RBDPoolIDMapping": [
              {
                "1": "2"
              },
              {
                "11": "12"
              }
            ]
          }
        ]

    Apply this updated mapping:

    kubectl apply -f ceph-csi-config.yaml
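The mapping works as a lookup table: given a primary-site clusterID and pool ID, Ceph-CSI substitutes the secondary-site equivalents so that volume handles created on the primary resolve on the DR site. A small Python sketch of that lookup (not part of Ceph-CSI itself), using the mapping structure above:

```python
import json

# Same structure as cluster-mapping.json in the ConfigMap above.
MAPPING_JSON = """
[
  {
    "clusterIDMapping": {
      "primary-cluster-id": "secondary-cluster-id"
    },
    "RBDPoolIDMapping": [
      {"1": "2"},
      {"11": "12"}
    ]
  }
]
"""

def resolve(cluster_id: str, pool_id: str):
    """Return the (cluster_id, pool_id) pair to use on the secondary site."""
    for entry in json.loads(MAPPING_JSON):
        if cluster_id in entry["clusterIDMapping"]:
            mapped_cluster = entry["clusterIDMapping"][cluster_id]
            for pool_map in entry["RBDPoolIDMapping"]:
                if pool_id in pool_map:
                    return mapped_cluster, pool_map[pool_id]
    return cluster_id, pool_id  # no mapping: use the IDs unchanged

print(resolve("primary-cluster-id", "11"))  # ('secondary-cluster-id', '12')
```

Pool IDs are the numeric IDs reported by `ceph osd pool ls detail`; they usually differ between clusters even when the pool names match, which is why the mapping is required.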

    Modify Ceph-CSI Config to Update Monitor Addresses on Secondary Cluster

    To use a mirrored and promoted RBD image on the secondary site during a failover, you need to replace the primary monitor addresses with the IP addresses of the secondary cluster in the ceph-csi-config ConfigMap. Otherwise, Ceph-CSI won’t be able to attach the volumes, and application pods will be stuck in the ContainerCreating state. As a result, during failover both clusterID entries in the csi-config on the secondary site point to the same (secondary) monitor addresses.

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ceph-csi-config
    data:
      config.json: |-
        [
          {
            "clusterID": "ceph1",
            "rbd": {
              "radosNamespace": ""
            },
            "monitors": [
              "192.168.39.82:6789"
            ],
            "cephFS": {
              "subvolumeGroup": ""
            }
          },
          {
            "clusterID": "ceph2",
            "rbd": {
              "radosNamespace": ""
            },
            "monitors": [
              "192.168.39.82:6789"
            ],
            "cephFS": {
              "subvolumeGroup": ""
            }
          }
        ]

    Apply the updated configuration:

    kubectl apply -f ceph-csi-config.yaml
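If you keep the ConfigMap outside Git, the monitor substitution can be scripted instead of edited by hand. A hypothetical Python helper (not official Ceph-CSI tooling) that rewrites every entry's monitor list to the secondary site's addresses:

```python
import json

def point_all_clusters_at(config_json: str, monitors: list) -> str:
    """Rewrite every clusterID entry in a ceph-csi config.json so that
    it uses the given (secondary-site) monitor addresses."""
    config = json.loads(config_json)
    for cluster in config:
        cluster["monitors"] = monitors
    return json.dumps(config, indent=2)

# Illustrative input: primary and secondary entries with different monitors.
original = ('[{"clusterID": "ceph1", "monitors": ["10.0.0.1:6789"]},'
            ' {"clusterID": "ceph2", "monitors": ["192.168.39.82:6789"]}]')
updated = point_all_clusters_at(original, ["192.168.39.82:6789"])
print(updated)
```

The rewritten JSON can then be placed back into the ConfigMap's config.json key and applied with kubectl.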

    Modify StorageClass to Point to the Secondary Cluster

    parameters:
      clusterID: secondary-cluster-id

    Apply the modified StorageClass:

    kubectl apply -f storageclass.yaml

    Restart Affected Workloads

    kubectl rollout restart deployment <deployment-name>

    Validate Data Accessibility

    Ensure the applications can access data stored in the secondary Ceph cluster.
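
One lightweight check, assuming the PVC name from the earlier example: run a short Job that mounts the claim and lists its contents. If the promoted RBD image maps correctly on the secondary cluster, the Job completes; if the monitor addresses or ID mappings are wrong, its pod stays in ContainerCreating. The Job name and image are illustrative:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: dr-data-check
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: check
          image: busybox:1.36        # illustrative image
          command: ["sh", "-c", "ls -l /data"]
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: ceph-rbd-pvc
```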

    Failback Process (Restoring to the Primary Data Center)

    Demote the Secondary Cluster and Re-enable Mirroring

    rbd mirror pool demote <pool-name>

    Update ClusterID and PoolID Mappings Back to Primary

    Reverse the mapping applied during failover so that volume handles resolve on the primary cluster again (the IDs below mirror the earlier example):

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: ceph-csi-config
    data:
      cluster-mapping.json: |-
        [
          {
            "clusterIDMapping": {
              "secondary-cluster-id": "primary-cluster-id"
            },
            "RBDPoolIDMapping": [
              {
                "2": "1"
              },
              {
                "12": "11"
              }
            ]
          }
        ]

    Modify StorageClass to Point Back to the Primary Cluster

    parameters:
      clusterID: primary-cluster-id

    Restart Workloads to Use the Primary Cluster

    kubectl rollout restart deployment <deployment-name>

    Verify Mirroring and Data Integrity

    rbd mirror pool status <pool-name>

    Conclusion

    By configuring ClusterID and PoolID mappings and ensuring proper Ceph monitor address updates during failover, you enable seamless disaster recovery for Kubernetes workloads using Ceph-CSI. This approach maintains data accessibility and consistency, facilitating a smoother failover and failback process. Using cephadm, deploying and managing mirroring has become significantly easier, enabling organizations to set up failover mechanisms efficiently. By following the above steps, you can ensure data integrity, minimize downtime, and enhance business continuity in the event of a disaster.

    Author

    Kamil Madáč
    Grow2FIT Cloud&DevOps Consultant

    Kamil is a Senior Cloud / Infrastructure consultant with 20+ years of experience and strong know-how in designing, implementing, and administering private cloud solutions (primarily built on OpenSource solutions such as OpenStack). He has many years of experience with application development in Python and currently also with development in Go. Kamil has substantial know-how in SDS (Software-defined storage), SDN (Software-defined networking), Data storage (Ceph, NetApp), administration of Linux servers and operation of deployed solutions.
    Kamil regularly contributes to OpenSource projects (OpenStack, Kuryr, Requests Lib – Python).

    The entire Grow2FIT consulting team: Our Team
