[LG Aimers] Reinforcement Learning 4 - Function Approximation


This post summarizes what I learned in the LG Aimers course.

Last Lecture : Tabular Representations - assuming a finite set of states and actions, the transition model and the reward function can be written as matrices or vectors, and the value function can likewise be represented as a vector or a matrix.

Motivation for Function Approximation : fundamentally, we do not want to learn a separate value for every individual state.

Feature Vector : represents a state

Linear Value Function Approximation(VFA) for Prediction With An Oracle
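A minimal sketch of this idea, assuming an oracle that returns the true value V^π(s) for any state and a linear approximation x(s)^T w. Each SGD step moves w along the feature vector in proportion to the prediction error. The toy states, features, oracle values, and the function name `sgd_oracle_vfa` are hypothetical and only serve the illustration.

```python
import random


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def sgd_oracle_vfa(features, oracle_values, alpha=0.1, steps=5000, seed=0):
    """Fit w so that x(s)^T w approximates the oracle value of each state."""
    rng = random.Random(seed)
    states = list(features)
    w = [0.0] * len(features[states[0]])
    for _ in range(steps):
        s = rng.choice(states)                 # sample a state
        x = features[s]
        error = oracle_values[s] - dot(x, w)   # oracle value minus prediction
        w = [wi + alpha * error * xi for wi, xi in zip(w, x)]
    return w


# Hypothetical toy problem: 3 states, 2 features, values exactly linear in x(s).
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
oracle_values = {0: 1.0, 1: 2.0, 2: 3.0}
w = sgd_oracle_vfa(features, oracle_values)
```

Because the toy values are exactly representable by the two features, w should settle close to [1, 2] here. In general the fit is only the best linear approximation.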

Incremental Prediction Algorithm (when no oracle is available) : the target is the TD target
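As a rough sketch of the oracle-free case, the update below replaces the oracle value with the TD(0) target r + γ x(s')^T w. The two-state episode, the one-hot features, and the name `td0_linear` are assumptions made for the example, not something taken from the lecture.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def td0_linear(episodes, features, n_features, gamma=0.9, alpha=0.1):
    """TD(0) policy evaluation with a linear value function x(s)^T w."""
    w = [0.0] * n_features
    for episode in episodes:
        # Each transition is (state, reward, next_state); next_state is None at the end.
        for s, r, s_next in episode:
            x = features[s]
            v_next = 0.0 if s_next is None else dot(features[s_next], w)
            td_error = r + gamma * v_next - dot(x, w)   # TD target minus prediction
            w = [wi + alpha * td_error * xi for wi, xi in zip(w, x)]
    return w


# Hypothetical chain: state 0 -> state 1 -> terminal, rewards 0 then 1.
features = {0: [1.0, 0.0], 1: [0.0, 1.0]}
episode = [(0, 0.0, 1), (1, 1.0, None)]
w = td0_linear([episode] * 500, features, n_features=2)
```

With one-hot features this reduces to tabular TD(0), so the weights should approach the true values 0.9 for state 0 and 1.0 for state 1.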

MC Linear VFA for Policy Evaluation
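A small sketch of the Monte Carlo variant, assuming every-visit updates and the same linear form x(s)^T w. The target is the sampled return G_t, computed backward through the episode. The episode data and the name `mc_linear` are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def mc_linear(episodes, features, n_features, gamma=0.9, alpha=0.1):
    """Every-visit Monte Carlo policy evaluation with a linear value function."""
    w = [0.0] * n_features
    for episode in episodes:
        g = 0.0
        # Walk the episode backward so that g is the return from each step.
        for s, r in reversed(episode):
            g = r + gamma * g
            x = features[s]
            error = g - dot(x, w)          # sampled return minus prediction
            w = [wi + alpha * error * xi for wi, xi in zip(w, x)]
    return w


# Hypothetical episode of (state, reward) pairs: state 0 gives 0, state 1 gives 1.
features = {0: [1.0, 0.0], 1: [0.0, 1.0]}
episode = [(0, 0.0), (1, 1.0)]
w = mc_linear([episode] * 500, features, n_features=2)
```

The return does not depend on w, so each step is an ordinary SGD step on the squared error between G_t and the prediction.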

Control with VFA

Incremental Model-Free Control Approaches
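For control, the same idea can be applied to an action-value function Q(s, a) = x(s, a)^T w. The sketch below shows single-transition SARSA and Q-learning updates under that assumption. The state-action feature vectors and both function names are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def sarsa_update(w, x, r, x_next, gamma=0.9, alpha=0.1, done=False):
    """SARSA: bootstrap from the action actually taken in the next state."""
    target = r if done else r + gamma * dot(x_next, w)
    error = target - dot(x, w)
    return [wi + alpha * error * xi for wi, xi in zip(w, x)]


def q_learning_update(w, x, r, next_xs, gamma=0.9, alpha=0.1, done=False):
    """Q-learning: bootstrap from the best next action under the current w."""
    target = r if done else r + gamma * max(dot(xn, w) for xn in next_xs)
    error = target - dot(x, w)
    return [wi + alpha * error * xi for wi, xi in zip(w, x)]


# Hypothetical transition: x is x(s, a); next_xs lists x(s', a') for each action a'.
w = [0.5, 1.0]
x = [1.0, 0.0]
next_xs = [[1.0, 0.0], [0.0, 1.0]]
w_sarsa = sarsa_update(w, x, r=1.0, x_next=next_xs[0])
w_q = q_learning_update(w, x, r=1.0, next_xs=next_xs)
```

The only difference between the two is the bootstrap term: SARSA uses the next action that was actually chosen, while Q-learning takes the maximum over next actions.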

Convergence of TD Methods with VFA

: Through Bellman backups, we try to fit a value function that has a particular feature representation.

- The Bellman operators themselves are contractions.

- Without function approximation, the TD, SARSA, and Q-learning algorithms all converge to the correct solution thanks to the contraction property,

but with function approximation through a feature representation, an extra step is added after each Bellman backup that fits the result to the function approximator. This step can be an expansion rather than a contraction, so the convergence of these algorithms is hard to prove with certainty.
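To make the expansion concrete, here is a sketch of a standard two-state illustration, under assumptions that go beyond the notes above: a single weight w, a first state with feature 1 (value w), a next state with feature 2 (value 2w), reward 0, and updates applied only to the transition from the first state to the second. The function name `diverging_td_updates` is hypothetical.

```python
def diverging_td_updates(w=1.0, gamma=0.9, alpha=0.1, steps=100):
    """Repeat the semi-gradient TD(0) update on one transition and record w."""
    history = [w]
    for _ in range(steps):
        v_s = 1.0 * w        # value of the first state (feature 1)
        v_next = 2.0 * w     # value of the next state (feature 2)
        td_error = 0.0 + gamma * v_next - v_s   # reward is 0
        w = w + alpha * td_error * 1.0          # gradient of v_s w.r.t. w is 1
        history.append(w)
    return history


history = diverging_td_updates()
```

Each step multiplies w by 1 + α(2γ - 1), which is greater than 1 whenever γ > 0.5, so w grows without bound instead of contracting toward the true value of 0.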

True SGD TD Algorithm

: MC algorithms are true SGD methods. (TD algorithms are semi-gradient methods.)
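The distinction can be written out for a linear value function. The notation here (x(s) for the feature vector, w for the weights, G_t for the return, δ_t for the TD error) is assumed for illustration and may differ from the lecture slides.

```latex
% MC: the target G_t is a sampled return and does not depend on w,
% so the update direction is a true stochastic gradient of the squared error.
-\tfrac{1}{2}\nabla_w \big(G_t - x(s_t)^\top w\big)^2
  = \big(G_t - x(s_t)^\top w\big)\, x(s_t)

% TD(0): the target r_t + \gamma x(s_{t+1})^\top w itself depends on w.
\delta_t = r_t + \gamma\, x(s_{t+1})^\top w - x(s_t)^\top w

% Semi-gradient update: treat the target as a constant.
\Delta w = \alpha\, \delta_t\, x(s_t)

% Full gradient of the squared TD error, for comparison.
-\tfrac{1}{2}\nabla_w \delta_t^2 = \delta_t \big(x(s_t) - \gamma\, x(s_{t+1})\big)
```

The semi-gradient update drops the γ x(s_{t+1}) term, which is why TD is not SGD on any fixed objective.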

Minimizing the Bellman Error

Evaluating the gradient requires what is called double sampling: drawing two next states from the same state as independent samples. → Residual-gradient algorithm (not used in practice.)
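A minimal sketch of one residual-gradient step with double sampling, assuming a linear value function x(s)^T w. The gradient of the squared Bellman error is a product of two expectations over the next state, so the TD error is computed from the first sample and the gradient direction from the second. The function name and the sample values are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def residual_gradient_update(w, x, r1, x_next1, x_next2, gamma=0.9, alpha=0.1):
    """One residual-gradient step using two independent next-state samples."""
    # Bellman error estimated from the first sample (r1, s'_1).
    delta = r1 + gamma * dot(x_next1, w) - dot(x, w)
    # Gradient of the error w.r.t. w, estimated from the second sample s'_2.
    grad = [gamma * xn - xi for xn, xi in zip(x_next2, x)]
    # Gradient descent on the squared Bellman error.
    return [wi - alpha * delta * g for wi, g in zip(w, grad)]


# Hypothetical samples: two different next states drawn from the same state.
w = [0.5, 1.0]
w_new = residual_gradient_update(
    w, x=[1.0, 0.0], r1=1.0, x_next1=[0.0, 1.0], x_next2=[1.0, 0.0]
)
```

Reusing the same next state for both factors would bias the estimate when transitions are stochastic, and that is why the second independent sample is needed.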

Source : https://www.lgaimers.ai/
