Writings
Notes on research, things I’m reading, and ideas I’m still working out.
-
Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
PEARL infers a belief over the current task from context using a probabilistic encoder, making off-policy meta-RL sample-efficient and quick to adapt to new tasks.
-
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
MAML trains a model's initial parameters so that a few gradient steps on a new task are enough to adapt, in both supervised learning and RL.
-
Duelling Architecture for DQNs
Duelling networks estimate the state value and the action advantages in separate streams, which makes state-value learning cleaner and more stable.
-
Double DQNs
Double DQN reduces Q-value overestimation by selecting the action with the online network and evaluating it with the target network.
-
Playing Atari with Deep Reinforcement Learning
The original DQN setup for learning Atari control directly from pixels: Q-learning with a deep network, experience replay, and fixed targets.