Select Model
Pick a pre-trained checkpoint to fine-tune.
Method
Full-parameter or LoRA.
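For intuition: LoRA freezes the pretrained weight W and learns a low-rank update ΔW = BA with rank r much smaller than the layer dimensions. A minimal PyTorch sketch of the idea (the class and parameter names here are illustrative, not this app's internals):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update (illustrative)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: update starts at zero
        self.scale = alpha / r

    def forward(self, x):
        # frozen base output plus the scaled low-rank correction (BA)x
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale
```

Because B starts at zero, the wrapped layer initially behaves exactly like the frozen base layer, and only A and B receive gradients during training.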
Training Data
Upload a CSV or use the default dataset.
Hyperparameters
Configure the fine-tuning run.
Batch Size
4
Learning Rate
0.00001
Max Length
512
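These three settings would typically map into a training config along these lines; this is a hypothetical sketch with illustrative field names, not this app's actual config:

```python
from dataclasses import dataclass

@dataclass
class FinetuneConfig:
    """Illustrative container for the settings above."""
    batch_size: int = 4
    learning_rate: float = 1e-5  # 0.00001, a conservative rate typical for fine-tuning
    max_length: int = 512        # sequences longer than this are truncated
```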
Understand
Masked loss and optional LoRA math.
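As a concrete illustration of masked loss: prompt tokens are excluded from the objective so only response tokens contribute to the cross-entropy. A common PyTorch recipe (not necessarily this app's exact code) marks prompt positions with -100, the default ignore_index of F.cross_entropy:

```python
import torch
import torch.nn.functional as F

def masked_lm_loss(logits, input_ids, prompt_len):
    """Cross-entropy over response tokens only (illustrative sketch).

    logits:     (batch, seq_len, vocab) model outputs
    input_ids:  (batch, seq_len) token ids
    prompt_len: (batch,) number of prompt tokens per sample
    """
    labels = input_ids.clone()
    # mask the prompt: positions before prompt_len get ignore_index -100
    positions = torch.arange(labels.size(1), device=labels.device)
    labels[positions.unsqueeze(0) < prompt_len.unsqueeze(1)] = -100
    # standard next-token shift: position t predicts token t+1
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=-100,
    )
```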
Train
Train your model.
Demo mode: fine-tuning disabled.
Live Metrics
Loss and gradients.
Progress
0.0%
Elapsed
-
ETA
-
Loss
-
Running Loss
-
Grad Norm
-
Dynamics
Visualize gradient norms and the optimization trajectory.
What am I looking at?
Gradient Norms per Layer: This shows the magnitude (L2 norm) of the gradients for each part of the model. It tells you "how hard" each layer is trying to change.
- During fine-tuning, these gradients are typically smaller than in pretraining.
- If you use LoRA, only the LoRA adapter layers will show significant gradients (see the sketch below).
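A minimal sketch of how such per-layer norms can be computed in PyTorch (the function name is ours, not the app's):

```python
import torch

def per_layer_grad_norms(model: torch.nn.Module) -> dict:
    """L2 norm of the gradient for each parameter tensor (illustrative).

    Call after loss.backward(); frozen parameters (e.g. non-LoRA weights)
    have grad=None and report 0.0, which is why LoRA runs show activity
    only in the adapter layers.
    """
    norms = {}
    for name, param in model.named_parameters():
        norms[name] = param.grad.norm(2).item() if param.grad is not None else 0.0
    return norms
```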
Loss Landscape Trajectory: We visualize the optimization path by projecting the high-dimensional weight updates onto a 3D space.
- X & Y: Random projection of current weights.
- Z: Trianing Loss.
- This helps you see if the model is finding a "valley" or getting stuck in a plateau.
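One common recipe for building such a plot (a sketch of the general technique, not necessarily this app's exact implementation): flatten all model weights into one vector, project it onto two fixed random direction vectors for X and Y, and use the recorded training loss as Z.

```python
import torch

def project_weights(model, d1, d2):
    """Project flattened model weights onto two random directions (illustrative).

    d1, d2: pre-sampled unit vectors with as many entries as the model has
    parameters; keeping them fixed across steps makes successive
    (x, y, loss) points comparable.
    """
    flat = torch.cat([p.detach().flatten() for p in model.parameters()])
    return (flat @ d1).item(), (flat @ d2).item()

# Usage sketch: record one (x, y, z) point per training step.
# n = sum(p.numel() for p in model.parameters())
# d1 = torch.randn(n); d1 /= d1.norm()
# d2 = torch.randn(n); d2 /= d2.norm()
# x, y = project_weights(model, d1, d2)
# trajectory.append((x, y, loss.item()))  # z = training loss
```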
Inspect Batch
Prompt vs response tokens and attention patterns.
Select a sample to inspect prompt/response tokens.
Eval History
Train vs validation loss.
Logs
Checkpoint events.