What are Diffusion Models?

[Updated on 2021-09-19: Highly recommend this blog post on score-based generative modeling by Yang Song (author of several key papers in the references)]. So far, I’ve written about three types of generative models, GAN, VAE, and Flow-based models. They have shown great success in generating high-quality samples, but each has some limitations of its own. GAN models are known for potentially unstable training and less diversity in generation due to their adversarial training nature....

July 11, 2021 · 17 min · Lilian Weng

Flow-based Deep Generative Models

So far, I’ve written about two types of generative models, GAN and VAE. Neither of them explicitly learns the probability density function of real data, $p(\mathbf{x})$ (where $\mathbf{x} \in \mathcal{D}$) — because it is really hard! Taking the generative model with latent variables as an example, $p(\mathbf{x}) = \int p(\mathbf{x}\vert\mathbf{z})p(\mathbf{z})d\mathbf{z}$ can hardly be calculated as it is intractable to go through all possible values of the latent code $\mathbf{z}$. Flow-based deep generative models conquer this hard problem with the help of normalizing flows, a powerful statistics tool for density estimation....

October 13, 2018 · 21 min · Lilian Weng

Policy Gradient Algorithms

[Updated on 2018-06-30: add two new policy gradient methods, SAC and D4PG.] [Updated on 2018-09-30: add a new policy gradient method, TD3.] [Updated on 2019-02-09: add SAC with automatically adjusted temperature]. [Updated on 2019-06-26: Thanks to Chanseok, we have a version of this post in Korean]. [Updated on 2019-09-12: add a new policy gradient method SVPG.] [Updated on 2019-12-22: add a new policy gradient method IMPALA....

April 8, 2018 · 52 min · Lilian Weng

A (Long) Peek into Reinforcement Learning

[Updated on 2020-09-03: Updated the algorithm of SARSA and Q-learning so that the difference is more pronounced. [Updated on 2021-09-19: Thanks to 爱吃猫的鱼, we have this post in Chinese]. A couple of exciting news in Artificial Intelligence (AI) has just happened in recent years. AlphaGo defeated the best professional human player in the game of Go. Very soon the extended algorithm AlphaGo Zero beat AlphaGo by 100-0 without supervised learning on human knowledge....

February 19, 2018 · 31 min · Lilian Weng

The Multi-Armed Bandit Problem and Its Solutions

The algorithms are implemented for Bernoulli bandit in lilianweng/multi-armed-bandit. Exploitation vs Exploration The exploration vs exploitation dilemma exists in many aspects of our life. Say, your favorite restaurant is right around the corner. If you go there every day, you would be confident of what you will get, but miss the chances of discovering an even better option. If you try new places all the time, very likely you are gonna have to eat unpleasant food from time to time....

January 23, 2018 · 10 min · Lilian Weng