Select Model

Pick a pre-trained checkpoint to fine-tune.

Method

Full-parameter fine-tuning or LoRA (low-rank adapters).
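For reference, a minimal sketch contrasting the two methods with Hugging Face transformers and peft; the gpt2 checkpoint and the c_attn target module are illustrative stand-ins for whatever model is selected above.

```python
# Sketch: full-parameter fine-tuning vs. LoRA adapters (checkpoint is a placeholder).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Full-parameter fine-tuning: every weight receives gradients.
full_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"full fine-tuning trains {full_trainable:,} parameters")

# LoRA: freeze the base model and train small low-rank adapter matrices instead.
lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, target_modules=["c_attn"])
lora_model = get_peft_model(model, lora_cfg)
lora_model.print_trainable_parameters()  # typically well under 1% of the total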

Training Data

Upload a CSV or use the default data.
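A minimal sketch of loading an uploaded CSV; the prompt/response column names are an assumption about the expected schema, not a confirmed one.

```python
# Sketch: load a training CSV with one example per row.
# Column names "prompt" and "response" are assumed, not confirmed by the app.
import pandas as pd

df = pd.read_csv("train.csv")
assert {"prompt", "response"} <= set(df.columns), "expected prompt/response columns"
records = df.to_dict(orient="records")   # [{"prompt": ..., "response": ...}, ...]
print(f"{len(records)} training examples loaded")
```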

Hyperparameters

Adjust the settings that control fine-tuning behavior.

Core Settings
Optimization
Evaluation & Checkpointing

Batch Size

4

Learning Rate

0.00001

Max Length

512
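For orientation, a minimal sketch of how the core settings shown here might be wired into a standard PyTorch training setup; the dummy model and dataset are stand-ins, not the app's actual pipeline.

```python
# Sketch: core settings expressed as a config dict plus optimizer/DataLoader wiring.
import torch
from torch.utils.data import DataLoader, TensorDataset

config = {
    "batch_size": 4,        # sequences per optimizer step
    "learning_rate": 1e-5,  # shown above as 0.00001
    "max_length": 512,      # tokens per sequence; longer inputs would be truncated
}

# Dummy stand-ins so the wiring is visible end to end; the real run uses the
# selected checkpoint and the uploaded dataset instead.
model = torch.nn.Linear(config["max_length"], 1)
dataset = TensorDataset(torch.randn(32, config["max_length"]))

optimizer = torch.optim.AdamW(model.parameters(), lr=config["learning_rate"])
loader = DataLoader(dataset, batch_size=config["batch_size"], shuffle=True)

for (batch,) in loader:
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()
    loss.backward()
    optimizer.step()
```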

Understand

Masked loss and optional LoRA math.

Equations

Sequence construction

$$\text{sequence} = [\text{prompt}] + [\text{response}]$$

Masked loss

$$m_i = 1 \text{ for response tokens, } 0 \text{ otherwise}$$

$$\mathcal{L} = \frac{\sum_i m_i \cdot \mathcal{L}_i}{\sum_i m_i}$$

Code Snippets

No snippets available yet.
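Until built-in snippets are available, here is a minimal sketch of the masked-loss computation described in the equations above (plain PyTorch; the tensor names, shapes, and toy values are illustrative, not the app's actual code).

```python
# Sketch of the masked loss: average next-token cross-entropy over response tokens only.
import torch
import torch.nn.functional as F

def masked_lm_loss(logits, input_ids, response_mask):
    """logits: [batch, seq, vocab]; input_ids: [batch, seq];
    response_mask: m_i = 1 for response tokens, 0 otherwise."""
    # Standard causal shift: the logit at position t predicts token t+1.
    logits = logits[:, :-1, :]
    labels = input_ids[:, 1:]
    mask = response_mask[:, 1:].float()

    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), labels.reshape(-1), reduction="none"
    ).view(labels.shape)

    # L = sum_i m_i * L_i / sum_i m_i  (prompt tokens contribute nothing)
    return (per_token * mask).sum() / mask.sum().clamp(min=1)

# Toy usage: 4 prompt tokens followed by 4 response tokens.
vocab, seq = 100, 8
logits = torch.randn(1, seq, vocab)
input_ids = torch.randint(0, vocab, (1, seq))
response_mask = torch.tensor([[0, 0, 0, 0, 1, 1, 1, 1]])
print(masked_lm_loss(logits, input_ids, response_mask).item())
```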

Train

Train your model.

Demo mode: fine-tuning disabled.

Live Metrics

Loss and gradients.

Progress

0.0%

Elapsed

-

ETA

-

Loss

-

Running Loss

-

Grad Norm

-

Dynamics

Visualize gradient norms and optimization trajectory.

Gradient Norms per Layer
Start training to see gradients
Loss Landscape in 3D (Loss on Z-axis)
Waiting for training data...
What am I looking at?

Gradient Norms per Layer: This shows the magnitude (L2 norm) of the gradients for each part of the model. It tells you "how hard" each layer is trying to change.

  • During fine-tuning, these gradients are typically smaller than in pretraining.
  • If you use LoRA, only the LoRA adapter layers will show significant gradients.

Loss Landscape Trajectory: We visualize the optimization path by projecting the high-dimensional weight updates onto a 3D space.

  • X & Y: Random projection of current weights.
  • Z: Training loss.
  • This helps you see if the model is finding a "valley" or getting stuck on a plateau (a minimal sketch of both computations follows below).
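A minimal sketch of how both panels could be computed in PyTorch; the per-layer grouping, the fixed random projection directions, and the toy model are assumptions, not the app's actual implementation.

```python
# Sketch: per-layer gradient L2 norms (left panel) and a 2D random projection of
# the current weights (X/Y of the loss-landscape plot; Z is the training loss).
import torch

def per_layer_grad_norms(model):
    """L2 norm of the gradient for each named parameter."""
    return {
        name: p.grad.detach().norm(2).item()
        for name, p in model.named_parameters()
        if p.grad is not None
    }

def project_weights_2d(model, directions=None):
    """Project the flattened weight vector onto two fixed random unit directions."""
    flat = torch.cat([p.detach().reshape(-1) for p in model.parameters()])
    if directions is None:
        gen = torch.Generator().manual_seed(0)  # fixed seed so the path is comparable step to step
        directions = torch.randn(2, flat.numel(), generator=gen)
        directions = directions / directions.norm(dim=1, keepdim=True)
    x, y = (directions @ flat).tolist()
    return x, y, directions

# Usage after one backward pass on a toy model:
model = torch.nn.Linear(4, 2)
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
print(per_layer_grad_norms(model))    # e.g. {"weight": ..., "bias": ...}
x, y, dirs = project_weights_2d(model)
print((x, y, loss.item()))            # one (X, Y, Z) point on the trajectory
```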

Inspect Batch

Prompt vs. response tokens and attention patterns (a minimal sketch of how these are extracted follows below).

0 / 3

Select a sample to inspect prompt/response tokens.

Attention heatmap: Query ↓ (rows) · Key → (columns)
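A minimal sketch of how a sample could be split into prompt/response token spans and how attention maps are obtained; the checkpoint and example text are illustrative, and output_attentions=True is standard Hugging Face transformers usage.

```python
# Sketch: build sequence = prompt + response, track which positions are response
# tokens, and fetch per-layer attention maps for the heatmap (query rows, key columns).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # stand-in for the selected checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt, response = "Translate to French: cat ->", " chat"
prompt_ids = tok(prompt, return_tensors="pt").input_ids
response_ids = tok(response, return_tensors="pt").input_ids
input_ids = torch.cat([prompt_ids, response_ids], dim=1)

# Response mask: 0 over the prompt span, 1 over the response span.
mask = torch.cat([torch.zeros_like(prompt_ids), torch.ones_like(response_ids)], dim=1)

with torch.no_grad():
    out = model(input_ids, output_attentions=True)

# out.attentions: one tensor per layer, shaped [batch, heads, query, key].
last_layer = out.attentions[-1][0]
print("prompt tokens:", prompt_ids.size(1), "| response tokens:", response_ids.size(1))
print("attention map, layer -1, head 0:", tuple(last_layer[0].shape))
```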

Eval History

Train vs validation loss.

Logs

Checkpoint events.

No logs yet.