The role of Disaster Recovery architectures for the protection of critical data in cloud environments
Technological attacks or failures are not always foreseeable, and it is in this context where DR (Disaster Recovery) architectures play a fundamental role, guaranteeing business continuity even in the face of unexpected events that can affect structures and data.
Natural disasters, technological failures and even human error can compromise the integrity and persistence of data, highlighting the importance of having a robust strategy in place to minimise their impact and ensure the availability of affected resources.
Technologies: Azure DevOps | Visual Studio Code | Terraform | Docker | Kubernetes | DR Architectures
⭐Best Capstone Award 2024
What is the motivation?
Cloud environments have great potential to help ensure business continuity and protect critical data from unplanned adverse events, minimising both the downtime needed to return to operation and the amount of data lost. Used correctly, cloud tooling can help guarantee data integrity and availability, strengthening business resilience and the ongoing protection of critical business assets. The project set out four objectives:
Design an active-passive DR architecture between Azure and GCP that is scalable and resilient to failures.
Achieve a Recovery Time Objective (RTO) of 30 minutes for business-critical infrastructure and resources.
Achieve a Recovery Point Objective (RPO) of 30 minutes for the business database.
Design an automated deployment and management strategy.
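The 30-minute RTO and RPO targets above can be verified programmatically once outage and backup timestamps are available. A minimal sketch (function names and thresholds are illustrative, not from the project's code):

```python
from datetime import datetime, timedelta, timezone

RTO = timedelta(minutes=30)  # max tolerated downtime
RPO = timedelta(minutes=30)  # max tolerated data-loss window

def rpo_met(last_backup_utc: datetime, now_utc: datetime) -> bool:
    """True if the newest backup is recent enough to honour the RPO."""
    return now_utc - last_backup_utc <= RPO

def rto_met(outage_start: datetime, service_restored: datetime) -> bool:
    """True if the service was restored within the RTO window."""
    return service_restored - outage_start <= RTO

if __name__ == "__main__":
    now = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    print(rpo_met(now - timedelta(minutes=20), now))  # backup is 20 min old
    print(rto_met(now, now + timedelta(minutes=45)))  # 45 min outage
```

Checks like these can run as part of periodic DR testing to confirm the objectives are still being met.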
Development
The development of this solution for disasters affecting critical infrastructure moved away from the usual practice of relying on the native resources offered by cloud providers, focusing instead on non-native migration tools. The implementation of the solution was based on:
Security measures
Security measures are one of the key factors in a resilient architecture. The first major group of measures concerns data security, such as automating database backups with Azure DevOps. If a security breach does occur, the second group of measures, dedicated to an action protocol, is focused on achieving a fast and efficient recovery of the service after an interruption.
Pipeline in Azure DevOps
This approach enabled not only the deployment of the infrastructure but also the configuration and automation of the backups, allowing them to be transferred from Azure (the active environment) to GCP.
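The backup hand-off step a pipeline like this runs can be sketched as command construction in Python. All resource names (server, resource group, storage account, bucket) are hypothetical, and credential flags for `az sql db export` are omitted for brevity:

```python
from datetime import datetime

def backup_name(db: str, now: datetime) -> str:
    """Timestamped artifact name, one per scheduled pipeline run."""
    return f"{db}-{now:%Y%m%dT%H%M%S}.bacpac"

def build_transfer_cmds(artifact: str, gcs_bucket: str) -> list[list[str]]:
    """Commands a pipeline step could run: export the Azure SQL database,
    then copy the resulting backup into the passive GCP storage bucket.
    Names are illustrative; auth flags are omitted."""
    return [
        # Export the database to blob storage (az CLI; credentials omitted)
        ["az", "sql", "db", "export", "--name", "business-db",
         "--server", "dr-sql-server", "--resource-group", "dr-rg",
         "--storage-uri",
         f"https://drbackups.blob.core.windows.net/bak/{artifact}"],
        # Hand the backup over to the single bucket in the passive cloud
        ["gsutil", "cp", f"bak/{artifact}", f"gs://{gcs_bucket}/{artifact}"],
    ]
```

In the actual pipeline, each command list would be executed as a scheduled step (e.g. via `subprocess.run` or native CLI tasks).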
Deployment of the architecture
This type of deployment relies on basic tools, such as Visual Studio Code, but also on more specialised ones, such as Terraform, which make it possible to work with Infrastructure as Code (IaC). Services from Google Cloud and Azure, such as managed Kubernetes, were also used to support the design, deployment and management process.
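As an illustration of the IaC approach, a minimal Terraform fragment declaring the single backup bucket kept on the passive side might look like the following. The project id, region, bucket name and retention period are assumptions, not the project's actual code:

```hcl
# Hypothetical sketch: the only resource kept warm in the passive cloud
# is one storage bucket that receives the database backups.
provider "google" {
  project = "dr-passive-project" # assumed project id
  region  = "europe-west1"
}

resource "google_storage_bucket" "dr_backups" {
  name          = "dr-sql-backups" # assumed bucket name
  location      = "EU"
  force_destroy = false

  lifecycle_rule {
    condition {
      age = 30 # drop backups older than 30 days (assumed retention)
    }
    action {
      type = "Delete"
    }
  }
}
```

Keeping only this bucket deployed is what keeps the passive-cloud cost low, as described in the results below.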
Scenario planning
To anticipate unforeseen events, a multitude of scenarios were considered. They were divided into natural, technical and human, each covering a range of possible events leading to service interruption and data loss, such as cloud provider failure or accidental data corruption.
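A scenario catalogue like the one described can be represented as simple structured data so that DR rehearsals can be driven per category. The categories come from the text; the individual events and the helper are illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Scenario:
    category: str  # "natural", "technical" or "human"
    event: str
    impact: str    # what is lost or interrupted

# Categories follow the text; the concrete events are examples.
SCENARIOS = [
    Scenario("natural", "regional datacentre outage after a storm", "service interruption"),
    Scenario("technical", "cloud provider failure", "service interruption"),
    Scenario("technical", "storage corruption", "data loss"),
    Scenario("human", "accidental data deletion", "data loss"),
]

def by_category(category: str) -> list[Scenario]:
    """Filter the catalogue when rehearsing one class of disaster."""
    return [s for s in SCENARIOS if s.category == category]
```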
Creating a recovery plan
Once an event occurs, a rapid response is needed to minimise the damage. To this end, processes such as healthchecks and SQL database backups were automated. The plan itself also needs reviewing, so periodic tests were carried out, and performance metrics and success criteria were established.
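The automated healthcheck logic can be sketched as a simple threshold rule: only declare the active environment down after several consecutive failed probes, so a single transient blip does not trigger the recovery plan. The threshold and probe representation are assumptions:

```python
def should_fail_over(probe_results: list[bool], threshold: int = 3) -> bool:
    """True when the last `threshold` healthchecks all failed, i.e. the
    active (Azure) environment is considered down and recovery on the
    passive (GCP) side should begin. `probe_results` holds probe
    outcomes in chronological order (True = healthy)."""
    if len(probe_results) < threshold:
        return False  # not enough evidence yet
    return not any(probe_results[-threshold:])
```

For example, two failures followed by a success keeps the system waiting, while three failures in a row would start the recovery procedure.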
Results
Moving away from the usual practice of using native resources presented several challenges, but the project achieved its main goal: transferring database backups from Azure to GCP, always automatically and periodically as set out in the recovery plan. All this made it possible to minimise the cost of infrastructure on Google Cloud Platform with a single storage bucket. The benefits that the project brings to an organisation are:
Automation of the entire process through pipelines.
Low implementation cost: at any given time only the infrastructure actually in operation is paid for, as only one backup storage bucket is deployed in the passive cloud.
Future scalability, with further phases of security development still required.
Conclusions
This project has demonstrated a departure from the typical options offered by cloud providers in order to build a disaster recovery (DR) solution in cloud environments that ensures business continuity and the protection of critical data and systems. The active-passive architecture between Azure and Google Cloud Platform reduced both the Recovery Time Objective (RTO) and the Recovery Point Objective (RPO) to as little as 30 minutes, ensuring the availability and integrity of infrastructure and business resources.
The use of tools such as Terraform, Azure DevOps and Kubernetes enabled an Infrastructure as Code (IaC) approach, facilitating both the deployment and the management of the system. In addition, database protection was strengthened through a focus on data security and the automation of backups via pipelines, keeping databases protected and recoverable in the event of a disaster.