University of Colorado Boulder

Mastering Classic Reinforcement Learning Algorithms

Instructor: Ashutosh Trivedi

Gain insight into a topic and learn the fundamentals.
Intermediate level

1 week to complete at 10 hours a week
Flexible schedule: learn at your own pace

What you'll learn

  • Formulate sequential decision-making problems as deterministic decision processes, Markov chains, and finite Markov decision processes.

  • Explain and apply core reinforcement-learning concepts, including discounting, value functions, policies, Bellman equations, and optimality.

  • Implement planning algorithms for finite Markov decision processes, including value iteration, policy iteration, and linear programming formulations.

  • Compare tabular reinforcement-learning algorithms, including bandits, Monte Carlo methods, temporal-difference learning, SARSA, and Q-learning.

Details to know

Shareable certificate

Add to your LinkedIn profile

Recently updated!

June 2026

Assessments

6 assignments

Taught in English

There are 5 modules in this course

This module introduces the modeling and optimization foundations for sequential decision-making in their simplest form: deterministic decision processes with discounted rewards. We begin with states, actions, transitions, and rewards as a language for representing decision problems over time. We then develop value functions and Bellman equations as tools for optimizing long-term return. The goal is to build intuition for why dynamic programming is correct in the simpler setting of deterministic decision processes before introducing stochastic transitions, learning from sampled experience, and bootstrapping in later modules.
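
As a rough illustration of the Bellman equation in the deterministic setting, the sketch below repeatedly applies the backup V(s) = max over actions of [reward + gamma * V(next state)] until the values stop changing. The two states, the actions, the rewards, and the discount factor are invented for this example and are not taken from the course materials.

```python
# Hypothetical deterministic decision process (all names and numbers are
# assumptions made for illustration).
# TRANSITIONS[state][action] = (next_state, reward)
TRANSITIONS = {
    "unprepared": {"study": ("prepared", -1.0), "relax": ("unprepared", 1.0)},
    "prepared": {"study": ("prepared", 2.0), "relax": ("unprepared", 3.0)},
}
GAMMA = 0.9  # discount factor, assumed


def optimal_values(transitions, gamma, tol=1e-10):
    """Apply deterministic Bellman backups until the values converge."""
    values = {s: 0.0 for s in transitions}
    while True:
        # One backup per state: best immediate reward plus discounted
        # value of the (single, certain) successor state.
        new_values = {
            s: max(r + gamma * values[s2] for (s2, r) in actions.values())
            for s, actions in transitions.items()
        }
        if max(abs(new_values[s] - values[s]) for s in values) < tol:
            return new_values
        values = new_values


if __name__ == "__main__":
    print(optimal_values(TRANSITIONS, GAMMA))
```

With these made-up numbers the values settle near 17 for "unprepared" and 20 for "prepared": studying once and then staying prepared beats relaxing forever.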

What's included

11 videos, 12 readings, 2 assignments

This module adds stochasticity to the deterministic picture developed in the previous module. Learners continue with the surprise-quiz example, now with uncertain outcomes: studying usually helps but not always, and relaxing may reduce preparation but does not always do so. The module first introduces stochastic transitions as probability distributions over next states, then studies Markov chains as stochastic systems without choices, and finally adds actions to obtain Markov decision processes. The goal is to make expected discounted reward, policies, and Bellman equations feel like natural extensions of the deterministic setting.
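
To make "expected discounted reward" concrete, here is a minimal sketch for a Markov chain with rewards, where there are no actions to choose. It iterates v = R + gamma * P v, so each state's value is its reward plus the discounted average of its successors' values. The transition matrix, rewards, and discount factor are assumptions chosen only for illustration.

```python
# Hypothetical two-state Markov chain with rewards (all numbers invented).
P = [
    [0.8, 0.2],  # from state 0: stay with prob. 0.8, move to state 1 with 0.2
    [0.3, 0.7],  # from state 1: move to state 0 with prob. 0.3, stay with 0.7
]
R = [1.0, 0.0]  # reward collected in each state
GAMMA = 0.5     # discount factor, assumed


def expected_discounted_reward(P, R, gamma, sweeps=200):
    """Iterate v <- R + gamma * P v, the Bellman equation without actions."""
    n = len(R)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [
            R[i] + gamma * sum(P[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
    return v


if __name__ == "__main__":
    print(expected_discounted_reward(P, R, GAMMA))
```

Replacing the certain successor of the deterministic case with a probability-weighted sum is the only change needed, which is the sense in which the stochastic setting extends the deterministic one.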

What's included

8 videos, 8 readings, 1 assignment

This module focuses on known-model optimization. Learners use Bellman equations as computational tools for policy evaluation, policy improvement, value iteration, policy iteration, and linear programming formulations of discounted MDPs.
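
One way to picture policy evaluation and policy improvement working together is the policy iteration sketch below. It assumes a tiny made-up MDP with a known model; the states, actions, probabilities, and rewards are hypothetical, and policy evaluation is done by repeated sweeps rather than by solving a linear system.

```python
GAMMA = 0.9  # discount factor, assumed

# Hypothetical MDP: MDP[state][action] = [(probability, next_state, reward), ...]
MDP = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 0.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 1.0)], "go": [(1.0, 0, 0.0)]},
}


def q_value(mdp, values, s, a, gamma):
    """Expected one-step return of taking action a in state s."""
    return sum(p * (r + gamma * values[s2]) for p, s2, r in mdp[s][a])


def evaluate_policy(mdp, policy, gamma, sweeps=500):
    """Approximate the value of a fixed policy by repeated Bellman backups."""
    values = {s: 0.0 for s in mdp}
    for _ in range(sweeps):
        values = {s: q_value(mdp, values, s, policy[s], gamma) for s in mdp}
    return values


def policy_iteration(mdp, gamma):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    policy = {s: next(iter(actions)) for s, actions in mdp.items()}
    while True:
        values = evaluate_policy(mdp, policy, gamma)
        improved = {
            s: max(mdp[s], key=lambda a: q_value(mdp, values, s, a, gamma))
            for s in mdp
        }
        if improved == policy:
            return policy, values
        policy = improved


if __name__ == "__main__":
    print(policy_iteration(MDP, GAMMA))
```

Value iteration would fold the two steps into a single backup that takes the maximum over actions at every sweep; which variant is preferable depends on the problem size and the cost of evaluation.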

What's included

9 videos, 8 readings, 1 assignment

This module begins the transition from planning to reinforcement learning. In planning, the MDP model is known and Bellman backups compute expectations exactly. In reinforcement learning, the model is replaced by sampled experience. Learners first view reinforcement learning as sample-based dynamic programming, then study rewards, uncertainty, agent-environment interaction, bandit estimation, exploration versus exploitation, Monte Carlo policy evaluation, and Monte Carlo control.
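
The exploration-versus-exploitation trade-off in bandit estimation can be sketched with an epsilon-greedy agent that keeps a running sample mean per arm. The Bernoulli arms, the step count, the value of epsilon, and the function name are assumptions made for this example, not details from the course.

```python
import random


def epsilon_greedy_bandit(true_means, steps, epsilon, seed=0):
    """Estimate arm values from sampled rewards with epsilon-greedy choices.

    true_means holds the hidden success probability of each Bernoulli arm;
    the agent never reads it directly, only through sampled rewards.
    """
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms
    counts = [0] * n_arms
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental sample mean: new = old + (reward - old) / count
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


if __name__ == "__main__":
    print(epsilon_greedy_bandit([0.2, 0.8], steps=5000, epsilon=0.1))
```

Here the exact expectation used in planning is replaced by an average over sampled rewards, and epsilon controls how often the agent tries an arm that currently looks worse.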

What's included

9 videos, 11 readings, 1 assignment

This module completes the tabular reinforcement-learning part of Course 1. Module 4 introduced sample-based learning through bandits and Monte Carlo methods. Module 5 introduces temporal-difference learning: updating after one sampled transition by combining an observed reward with a bootstrapped value estimate. The module ends by summarizing tabular reinforcement learning and motivating the transition to function approximation and deep RL.
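
The one-transition update described above can be written in a few lines. The sketch below shows a tabular TD(0) update for state values and, for comparison, the Q-learning update for action values. The function names, the dictionary-based tables, and the step size are assumptions for illustration only.

```python
def td0_update(values, s, reward, s_next, alpha, gamma):
    """TD(0): move V(s) toward the target reward + gamma * V(s_next)."""
    target = reward + gamma * values[s_next]  # bootstrapped target
    values[s] += alpha * (target - values[s])
    return values[s]


def q_learning_update(q, s, a, reward, s_next, alpha, gamma):
    """Q-learning: bootstrap from the best action value in the next state."""
    target = reward + gamma * max(q[s_next].values())
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


if __name__ == "__main__":
    # Hypothetical tables for two states, "A" and "B".
    values = {"A": 0.0, "B": 1.0}
    td0_update(values, "A", reward=1.0, s_next="B", alpha=0.5, gamma=0.9)
    print(values)

    q = {"A": {"left": 0.0, "right": 0.0}, "B": {"left": 2.0, "right": 1.0}}
    q_learning_update(q, "A", "right", reward=0.0, s_next="B", alpha=0.1, gamma=0.5)
    print(q)
```

SARSA has the same shape as the Q-learning update but bootstraps from the value of the action actually taken in the next state instead of the maximum.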

What's included

8 videos, 9 readings, 1 assignment

Instructor

Ashutosh Trivedi
University of Colorado Boulder
2 courses, 47 learners
