GF-01 · SCHOOL — 2025

Q-Learner

REINFORCEMENT LEARNING · PYTHON

FIG. 01 — REAL SCREENSHOT PENDING. THE PATTERN STANDS IN.

A tabular Q-learning agent dropped into a 2D gridworld with nothing but a reward signal and an ε-greedy disposition. It wanders, it bumps into walls, and — after enough episodes — it stops embarrassing itself.
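For the curious, the loop described above can be sketched in a few lines of NumPy. This is a minimal illustration and not the coursework source: the 4x4 grid, the goal cell, the reward values, the hyperparameters, and the names `step`, `train`, and `greedy_path` are all assumptions made for the example.

```python
import numpy as np

# Hypothetical 4x4 grid: start top-left, goal bottom-right (assumed layout).
SIZE = 4
GOAL = (3, 3)
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def step(state, action):
    """Apply a move; bumping into a wall leaves the agent where it was."""
    row = min(max(state[0] + MOVES[action][0], 0), SIZE - 1)
    col = min(max(state[1] + MOVES[action][1], 0), SIZE - 1)
    nxt = (row, col)
    reward = 1.0 if nxt == GOAL else -0.01  # assumed reward signal
    return nxt, reward, nxt == GOAL


def train(episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = np.random.default_rng(seed)
    q = np.zeros((SIZE, SIZE, len(MOVES)))
    for _ in range(episodes):
        state, done, steps = (0, 0), False, 0
        while not done and steps < 200:  # cap episode length
            if rng.random() < epsilon:
                action = int(rng.integers(len(MOVES)))  # explore
            else:
                action = int(np.argmax(q[state]))       # exploit
            nxt, reward, done = step(state, action)
            # Bootstrap from the best next action unless the episode ended.
            target = reward + (0.0 if done else gamma * np.max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            steps += 1
    return q


def greedy_path(q, start=(0, 0), max_steps=SIZE * SIZE):
    """Follow argmax actions from start and record the visited cells."""
    state, path = start, [start]
    for _ in range(max_steps):
        state, _reward, done = step(state, int(np.argmax(q[state])))
        path.append(state)
        if done:
            break
    return path
```

Calling `greedy_path(train())` returns the cells the trained agent visits when it stops exploring. Early in training the same call mostly produces wall-bumping, which is the embarrassing phase.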

The interesting part was reward shaping: small tweaks to the signal changed the personality of the policy more than any hyperparameter. Convergence plots and a value heatmap document the journey from random walk to competence.
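One common way to shape a reward is potential-based shaping, sketched here as an illustration only; the project may well have used a different scheme. The goal cell, the discount factor, the Manhattan-distance potential, and the names `potential`, `shaped_reward`, and `value_grid` are assumptions. The last helper shows how a value heatmap can be derived from a Q-table by taking the best action value per cell.

```python
import numpy as np

GOAL = (3, 3)   # hypothetical goal cell
GAMMA = 0.95    # assumed discount factor


def potential(state):
    """Negative Manhattan distance to the goal: higher means closer."""
    return -(abs(GOAL[0] - state[0]) + abs(GOAL[1] - state[1]))


def shaped_reward(reward, state, next_state, gamma=GAMMA):
    """Add the potential-based bonus gamma * phi(s') - phi(s) to the raw reward."""
    return reward + gamma * potential(next_state) - potential(state)


def value_grid(q):
    """Collapse a (rows, cols, actions) Q-table to V(s) = max_a Q(s, a)."""
    return q.max(axis=-1)
```

With these assumed values, a step toward the goal earns a positive bonus and a step away earns a negative one, so the agent gets a hint on every move instead of only at the goal. The array returned by `value_grid` can be handed to a plotting call such as matplotlib's `imshow` to render the heatmap.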

INCLUDING

PYTHON + NUMPY
ε-GREEDY POLICY
REWARD SHAPING
VALUE HEATMAP

SOURCE — AVAILABLE UPON REQUEST.
COURSEWORK FALLS UNDER THE HONOR CODE.
