This course teaches key resilience concepts and strategies for designing robust cloud architectures on AWS. It focuses on building fault-tolerant, scalable systems that can withstand failures while maintaining high availability.

Building Resilient Architectures on AWS
What you'll learn
Understand resilience concepts and apply them to AWS architecture for fault tolerance.
Implement auto-scaling and secure data backup strategies to ensure high availability.
Design and orchestrate disaster recovery plans, chaos testing, and resiliency monitoring on AWS.
Details to know

June 2026
15 assignments
There are 15 modules in this course
This module explores the essential principles and strategies for building resilient cloud applications. Learners will discover how to leverage AWS services, implement isolation techniques, and address common threats to ensure reliability and continuous improvement in cloud environments.
What's included
1 video•8 readings•1 assignment
1 video•Total 1 minute
- Overview•1 minute
8 readings•Total 39 minutes
- Introduction•5 minutes
- Cloud Resilience•6 minutes
- Resilient Foundations•5 minutes
- Decoupling Systems through Isolation Principles•6 minutes
- Facing the Cloud's Storms•4 minutes
- Software Bugs and Security Threats•4 minutes
- Empowering Yourself with AWS Services•3 minutes
- The Continuous Resilience Journey•6 minutes
1 assignment•Total 16 minutes
- Resilience in Application Design and Management•16 minutes
This module explores strategies for building resilient and cost-effective compute infrastructures on AWS, including the use of auto scaling, redundancy, and fault tolerance. Learners will discover how to optimize resource allocation with Spot and Reserved Instances, monitor system health using both AWS-native and open source observability tools, and extend resilience practices to containers and serverless environments.
What's included
1 video•6 readings•1 assignment
1 video•Total 1 minute
- Overview•1 minute
6 readings•Total 32 minutes
- Introduction•7 minutes
- Embracing Auto Scaling for Dynamic Resource Management•5 minutes
- Optimizing Cost-Efficiency with Spot and Reserved Instances•6 minutes
- Monitoring and Maintaining a Healthy Infrastructure•6 minutes
- AWS-Managed Open Source Observability Services•4 minutes
- Extending Resilience to Containers and Serverless•4 minutes
1 assignment•Total 16 minutes
- Resilient Systems and Auto Scaling Fundamentals•16 minutes
This module explores essential strategies for protecting organizational data, including encryption, intrusion detection, and robust backup practices. Learners will discover how to leverage AWS services for disaster recovery, multi-region replication, and resilient web application architectures. Practical scenarios and tools for monitoring, incident response, and disaster recovery drills are also covered.
What's included
1 video•7 readings•1 assignment
1 video•Total 1 minute
- Overview•1 minute
7 readings•Total 41 minutes
- Introduction•6 minutes
- Encryption, Intrusion Detection, and Prevention•7 minutes
- Backup Validation and Disaster Recovery Testing•7 minutes
- AWS Services for Multi-Region and Geo-Replication•4 minutes
- A Simple, Resilient, Global Web Application Architecture•6 minutes
- Sudden Spike in Database CPU Utilization•6 minutes
- AWS Tools to Power Your DR Drills•5 minutes
1 assignment•Total 16 minutes
- Data Protection and Resilience in Cloud Infrastructure•16 minutes
This module explores strategies for maintaining system functionality during partial failures, focusing on monitoring, log analysis, and proactive issue detection. Learners will discover how to leverage Amazon CloudWatch, machine learning, and generative AI to enhance operational resilience and automate recovery processes. Practical techniques for streamlining incident response and reducing false alarms are also covered.
What's included
1 video•6 readings•1 assignment
1 video•Total 1 minute
- Overview•1 minute
6 readings•Total 31 minutes
- Introduction•5 minutes
- Log Analysis through Amazon CloudWatch•5 minutes
- Predicting Issues Before They Occur•7 minutes
- Streamlining Recovery with Preconfigured Actions•6 minutes
- Leveraging ML and GenAI to Enhance Issue Detection and Response•4 minutes
- ML for Issue Identification•4 minutes
1 assignment•Total 16 minutes
- Mastering System Reliability and Failure Handling•16 minutes
This module delves into the collaborative nature of resilience in AWS environments, emphasizing the division of responsibilities between AWS and its customers. Learners will explore best practices for managing database infrastructure, securing cloud resources, and implementing continuous testing to ensure critical infrastructure resilience. Practical tools and techniques for ongoing validation and improvement of AWS environments are also covered.
What's included
1 video•4 readings•1 assignment
1 video•Total 1 minute
- Overview•1 minute
4 readings•Total 27 minutes
- Introduction•7 minutes
- Customer Responsibilities•6 minutes
- The Importance of Continuous Testing for Critical Infrastructure Resilience in AWS Environments•6 minutes
- Tools and Techniques to Perform Continuous Testing of AWS Environments•8 minutes
1 assignment•Total 16 minutes
- Exploring AWS Security and Responsibility Frameworks•16 minutes
This module explores strategies for building resilient cloud applications using AWS Well-Architected principles. Learners will discover how to implement frequent, reversible infrastructure changes, refine operational procedures, anticipate failures, and apply security measures to protect systems and data. By the end, participants will be equipped to enhance the reliability and security of their AWS-based solutions.
What's included
1 video•5 readings•1 assignment
1 video•Total 1 minute
- Overview•1 minute
5 readings•Total 40 minutes
- Introduction•4 minutes
- Making Frequent, Small, Reversible Changes•3 minutes
- Refining Operations Procedures Frequently•4 minutes
- Anticipating Failure•23 minutes
- Protection•6 minutes
1 assignment•Total 16 minutes
- AWS Resiliency Best Practices•16 minutes
This module explores strategies for building resilient applications that can withstand component failures and maintain high availability. Learners will examine techniques such as load balancing, redundancy, state management, microservices, and event-driven architectures to enhance fault tolerance. Practical guidance on data backup, limits, and timeouts is also provided to ensure robust system performance.
What's included
1 video•8 readings•1 assignment
1 video•Total 1 minute
- Overview•1 minute
8 readings•Total 42 minutes
- Introduction•6 minutes
- Load Balancing Workloads Across Redundant Systems•6 minutes
- State Management: Stateless Versus Stateful Approaches•4 minutes
- Applying Redundancy for File Storage•8 minutes
- Backing Up Data Regularly•4 minutes
- Using Microservices for Decoupling Services•5 minutes
- Limits and Timeouts•4 minutes
- Event-Driven Architecture (EDA)•5 minutes
1 assignment•Total 16 minutes
- Fault Tolerance and System Resilience•16 minutes
This module explores strategies for enhancing the resilience of serverless applications, including the use of dead-letter queues, handling service quotas, and implementing effective monitoring and observability. Learners will gain practical knowledge to ensure fault tolerance and high availability in cloud-native environments.
What's included
1 video•5 readings•1 assignment
1 video•Total 1 minute
- Overview•1 minute
5 readings•Total 27 minutes
- Introduction•4 minutes
- Building Resilience into Serverless•6 minutes
- DLQs for Failed Asynchronous Invocations•5 minutes
- Handling Quotas with Queues: SQS Example•4 minutes
- Monitoring and Observability for Serverless Applications•8 minutes
1 assignment•Total 16 minutes
- Resiliency in Serverless Architectures•16 minutes
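To illustrate the dead-letter queue (DLQ) idea this module names, here is a minimal sketch in plain Python. It does not call any AWS API: the queue is an in-memory `deque`, and `process_with_dlq`, `flaky_handler`, and the retry budget of 3 are hypothetical names and values chosen for illustration, not taken from the course material or from AWS defaults.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed retry budget for this sketch, not an AWS default


def process_with_dlq(messages, handler, max_attempts=MAX_ATTEMPTS):
    """Run handler on each message; park messages that keep failing in a DLQ."""
    # Each queue entry is (message, attempt number), starting at attempt 1.
    queue = deque((msg, 1) for msg in messages)
    processed, dead_letters = [], []
    while queue:
        msg, attempt = queue.popleft()
        try:
            processed.append(handler(msg))
        except Exception as exc:
            if attempt < max_attempts:
                # Re-enqueue at the back so other messages are not blocked.
                queue.append((msg, attempt + 1))
            else:
                # Retry budget exhausted: keep the message for later inspection.
                dead_letters.append(
                    {"message": msg, "error": str(exc), "attempts": attempt}
                )
    return processed, dead_letters


def flaky_handler(msg):
    """Hypothetical handler that always fails on the message 'bad'."""
    if msg == "bad":
        raise ValueError("cannot parse message")
    return msg.upper()


if __name__ == "__main__":
    done, dlq = process_with_dlq(["a", "bad", "b"], flaky_handler)
    print("processed:", done)
    print("dead letters:", dlq)
```

The point of the pattern is that a message that can never succeed stops consuming retries and is set aside with its error, while healthy messages continue to flow.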
This module explores how container technologies enhance application resiliency in cloud environments. Learners will examine orchestration platforms, scaling strategies, inter-service communication, service mesh architectures, and runtime security best practices for containerized applications.
What's included
1 video•6 readings•1 assignment
1 video•Total 1 minute
- Overview•1 minute
6 readings•Total 34 minutes
- Introduction•10 minutes
- Container Orchestration Platforms•5 minutes
- Scaling and Load-Balancing Containerized Applications•6 minutes
- Inter-Service Communication with Containers•4 minutes
- Service Mesh•5 minutes
- Securing Container Runtimes•4 minutes
1 assignment•Total 16 minutes
- Containerization and Resiliency in Cloud Architectures•16 minutes
This module explores strategies for building highly available and fault-tolerant cloud architectures by leveraging multi-region deployments. Learners will examine serverless failover, content delivery networks, active-active and hub-and-spoke models, and advanced cell-based designs to ensure continuous service and data consistency. Practical examples illustrate how to enhance resilience and performance in distributed systems.
What's included
1 video•6 readings•1 assignment
1 video•Total 1 minute
- Overview•1 minute
6 readings•Total 29 minutes
- Introduction•7 minutes
- Simplified Failover with Serverless•4 minutes
- Using CloudFront•4 minutes
- Delving into Active-Active Regional Architectures•5 minutes
- Using Centralized Data Sharing•4 minutes
- Introducing Cell-Based Architectures•5 minutes
1 assignment•Total 16 minutes
- Resilient Architectures Across Regions•16 minutes
This module examines practical examples of resilient cloud architectures, focusing on strategies to ensure system reliability and security across single and multiple availability zones and regions. Learners will explore best practices for deploying workloads, configuring multi-site architectures, and implementing security measures to withstand failures and attacks.
What's included
1 video•5 readings•1 assignment
1 video•Total 1 minute
- Overview•1 minute
5 readings•Total 26 minutes
- Introduction•4 minutes
- Reliability Configurations in Single AZ•5 minutes
- Reliability Considerations in Multi-AZ Architecture•7 minutes
- Multi-site Architecture Deployments•4 minutes
- An Example of DDoS/Security Resilient Architecture•6 minutes
1 assignment•Total 16 minutes
- Resilient Architecture in Practice•16 minutes
This module guides learners through the essential practices of monitoring, alerting, and auditing cloud environments to ensure system reliability and resilience. Learners will explore how to design effective observability strategies, implement alerting mechanisms, and audit their AWS environments for continuous improvement.
What's included
1 video•5 readings•1 assignment
1 video•Total 1 minute
- Overview•1 minute
5 readings•Total 25 minutes
- Introduction•5 minutes
- Steps in Designing Observability•7 minutes
- Alerting•3 minutes
- Logging Key Metrics and Events•4 minutes
- Auditing Environments for Resilience•6 minutes
1 assignment•Total 16 minutes
- Observability and System Resilience in Cloud Infrastructure•16 minutes
This module introduces the fundamentals of chaos engineering testing, focusing on how to define steady state, inject faults, and validate system resilience. Learners will explore practical techniques for simulating failures and monitoring system responses to ensure robust and reliable environments.
What's included
1 video•4 readings•1 assignment
1 video•Total 1 minute
- Overview•1 minute
4 readings•Total 26 minutes
- Introduction•5 minutes
- Defining Steady State•6 minutes
- Introducing Faults•6 minutes
- API Gateway Failure•9 minutes
1 assignment•Total 16 minutes
- Chaos Engineering Fundamentals•16 minutes
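The three steps this module names (define steady state, inject a fault, validate resilience) can be sketched as a tiny, fully simulated experiment in Python. Nothing here touches AWS: `make_service`, `steady_state_ok`, `run_experiment`, and the 99% success threshold are assumptions made for illustration, and the "fault" is just a flag that takes a simulated primary backend offline.

```python
def make_service(primary, fallback_enabled):
    """Build a simulated service that optionally falls back when the primary is down."""
    def service(request_id):
        if primary["up"]:
            return f"primary:{request_id}"
        if fallback_enabled:
            return f"fallback:{request_id}"
        return None  # request failed
    return service


def steady_state_ok(service, requests=100, min_success_rate=0.99):
    """Steady state here means: at least min_success_rate of requests succeed."""
    successes = sum(1 for i in range(requests) if service(i) is not None)
    return successes / requests >= min_success_rate


def run_experiment(fallback_enabled):
    """Check steady state before, during, and after an injected primary outage."""
    primary = {"up": True}
    service = make_service(primary, fallback_enabled)

    baseline = steady_state_ok(service)      # 1. confirm steady state first
    primary["up"] = False                    # 2. inject the fault
    try:
        during_fault = steady_state_ok(service)  # 3. validate the hypothesis
    finally:
        primary["up"] = True                 # always roll the fault back
    recovered = steady_state_ok(service)

    return {"baseline": baseline, "during_fault": during_fault, "recovered": recovered}


if __name__ == "__main__":
    print("with fallback:   ", run_experiment(fallback_enabled=True))
    print("without fallback:", run_experiment(fallback_enabled=False))
```

In this sketch the experiment passes only when the steady-state check also holds during the fault, which is the case with the fallback enabled and not without it.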
This module explores the essential strategies and techniques for developing, implementing, and testing disaster recovery plans in cloud environments. Learners will examine hot standby configurations, the importance of regular DR testing, and methods for conducting security assessments to ensure business continuity. By the end, participants will be equipped to enhance organizational resilience against disruptions.
What's included
1 video•4 readings•1 assignment
1 video•Total 1 minute
- Overview•1 minute
4 readings•Total 28 minutes
- Introduction•11 minutes
- Hot Standby•4 minutes
- Testing Disaster Recovery Plans•6 minutes
- Security Testing•7 minutes
1 assignment•Total 16 minutes
- Disaster Recovery Fundamentals•16 minutes
This module explores key AWS resilience services and frameworks for building robust cloud architectures. Learners will discover how to implement immutable backups, utilize AWS Resilience Hub, and understand the components of AWS Elastic Disaster Recovery (AWS DRS) to enhance system reliability and availability.
What's included
1 video•5 readings•1 assignment
1 video•Total 1 minute
- Overview•1 minute
5 readings•Total 31 minutes
- Introduction•6 minutes
- Immutable Backups with AWS Backup Vault Lock•6 minutes
- How Does the AWS Resilience Lifecycle Framework Work?•3 minutes
- How Does AWS Resilience Hub Work?•6 minutes
- AWS DRS Components•10 minutes
1 assignment•Total 16 minutes
- Building Resilient Architectures with AWS•16 minutes
Frequently asked questions
Yes, you can preview the first video and view the syllabus before you enroll. You must purchase the course to access content not included in the preview.
If you decide to enroll in the course before the session start date, you will have access to all of the lecture videos and readings for the course. You’ll be able to submit assignments once the session starts.
Once you enroll and your session begins, you will have access to all videos and other resources, including reading items and the course discussion forum. You’ll be able to view and submit practice assessments, and complete required graded assignments to earn a grade and a Course Certificate.
If you complete the course successfully, your electronic Course Certificate will be added to your Accomplishments page. From there, you can print your Course Certificate or add it to your LinkedIn profile.
This course is currently available only to learners who have paid or received financial aid, when available.
Yes. In select learning programs, you can apply for financial aid or a scholarship if you can’t afford the enrollment fee. If financial aid or a scholarship is available for your learning program selection, you’ll find a link to apply on the description page.