GF-01 · SCHOOL — 2025
Q-Learner
REINFORCEMENT LEARNING · PYTHON
FIG. 01 — REAL SCREENSHOT PENDING. THE PATTERN STANDS IN.
A tabular Q-learning agent dropped into a 2D gridworld with nothing but a reward signal and an ε-greedy disposition. It wanders, it bumps into walls, and — after enough episodes — it stops embarrassing itself.
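For the mechanics, here is a minimal sketch of tabular Q-learning with an ε-greedy policy. It is not the project's source: the 4×4 grid, the reward values (+1 at the goal, -0.01 per step), the hyperparameters, and the names `step`, `train`, and `greedy_path` are all assumptions made for illustration.

```python
import numpy as np

# Illustrative setup only: a 4x4 grid, start top-left, goal bottom-right.
N = 4
START, GOAL = (0, 0), (N - 1, N - 1)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Move one cell; walking into a wall leaves the agent where it was."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < N and 0 <= nc < N):
        nr, nc = r, c  # bumped into a wall
    nxt = (nr, nc)
    done = nxt == GOAL
    reward = 1.0 if done else -0.01  # assumed reward values
    return nxt, reward, done


def train(episodes=2000, alpha=0.5, gamma=0.95, epsilon=0.1, max_steps=200, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N, N, len(MOVES)))  # one value per (row, col, action)
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            r, c = state
            if rng.random() < epsilon:
                action = int(rng.integers(len(MOVES)))  # explore
            else:
                action = int(np.argmax(q[r, c]))  # exploit
            nxt, reward, done = step(state, action)
            # Bootstrapped target; no future value past the terminal state.
            future = 0.0 if done else gamma * np.max(q[nxt])
            q[r, c, action] += alpha * (reward + future - q[r, c, action])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=50):
    """Follow the learned greedy policy from START, with a step cap."""
    state, path = START, [START]
    for _ in range(max_steps):
        state, _reward, done = step(state, int(np.argmax(q[state])))
        path.append(state)
        if done:
            break
    return path
```

Calling `q = train()` and then `q.max(axis=2)` gives one value per cell, which is the sort of array a value heatmap could be drawn from; `greedy_path(q)` traces the route the trained agent takes.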
The interesting part was reward shaping: small tweaks to the signal changed the personality of the policy more than any hyperparameter. Convergence plots and a value heatmap document the journey from random walk to competence.
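One common way to shape a reward is potential-based shaping, which adds γΦ(s') − Φ(s) to the base reward. The sketch below assumes that form with a Manhattan-distance potential; nothing here states which shaping the project actually used, and `potential`, `shaped_reward`, the goal cell, and the numbers are hypothetical.

```python
GOAL = (3, 3)  # assumed goal cell on a 4x4 grid


def potential(state, goal=GOAL):
    """Hypothetical potential: negative Manhattan distance to the goal."""
    return -(abs(goal[0] - state[0]) + abs(goal[1] - state[1]))


def shaped_reward(base, state, next_state, gamma=0.95):
    """Base reward plus the shaping term gamma * phi(s') - phi(s)."""
    return base + gamma * potential(next_state) - potential(state)


toward = shaped_reward(-0.01, (1, 1), (1, 2))  # one step closer to the goal
away = shaped_reward(-0.01, (1, 1), (0, 1))    # one step further away
```

Under these assumptions a step toward the goal scores higher than a step away, so the agent gets directional feedback long before it first reaches the goal.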
INCLUDING
- PYTHON + NUMPY
- ε-GREEDY POLICY
- REWARD SHAPING
- VALUE HEATMAP
SOURCE — AVAILABLE UPON REQUEST.
COURSEWORK FALLS UNDER THE HONOR CODE.