Select Model

Pick a pre-trained checkpoint to fine-tune.

Method

Full-parameter fine-tuning or LoRA (low-rank adapters).
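For reference, a minimal sketch contrasting the two methods with Hugging Face transformers and peft; the gpt2 checkpoint and the c_attn target module are illustrative stand-ins for whatever model is selected above.

```python
# Sketch: full-parameter fine-tuning vs. LoRA adapters (checkpoint is a placeholder).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")

# Full-parameter fine-tuning: every weight receives gradients.
full_trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"full fine-tuning trains {full_trainable:,} parameters")

# LoRA: freeze the base model and train small low-rank adapter matrices instead.
lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05, target_modules=["c_attn"])
lora_model = get_peft_model(model, lora_cfg)
lora_model.print_trainable_parameters()  # typically well under 1% of the total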

Training Data

Upload a CSV or use the default data.
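A minimal sketch of loading an uploaded CSV; the prompt/response column names are an assumption about the expected schema, not a confirmed one.

```python
# Sketch: load a training CSV with one example per row.
# Column names "prompt" and "response" are assumed, not confirmed by the app.
import pandas as pd

df = pd.read_csv("train.csv")
assert {"prompt", "response"} <= set(df.columns), "expected prompt/response columns"
records = df.to_dict(orient="records")   # [{"prompt": ..., "response": ...}, ...]
print(f"{len(records)} training examples loaded")
```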

Hyperparameters

Adjust the settings that control fine-tuning behavior.

Core Settings
Optimization
Evaluation & Checkpointing

Batch Size

4

Learning Rate

0.00001

Max Length

512
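For orientation, a minimal sketch of how the core settings shown here might be wired into a standard PyTorch training setup; the dummy model and dataset are stand-ins, not the app's actual pipeline.

```python
# Sketch: core settings expressed as a config dict plus optimizer/DataLoader wiring.
import torch
from torch.utils.data import DataLoader, TensorDataset

config = {
    "batch_size": 4,        # sequences per optimizer step
    "learning_rate": 1e-5,  # shown above as 0.00001
    "max_length": 512,      # tokens per sequence; longer inputs would be truncated
}

# Dummy stand-ins so the wiring is visible end to end; the real run uses the
# selected checkpoint and the uploaded dataset instead.
model = torch.nn.Linear(config["max_length"], 1)
dataset = TensorDataset(torch.randn(32, config["max_length"]))

optimizer = torch.optim.AdamW(model.parameters(), lr=config["learning_rate"])
loader = DataLoader(dataset, batch_size=config["batch_size"], shuffle=True)

for (batch,) in loader:
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()
    loss.backward()
    optimizer.step()
```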

Understand

Masked loss and optional LoRA math.

Equations

Sequence construction

$$\text{sequence} = [\text{prompt}] + [\text{response}]$$

Masked loss

$$m_i = 1 \text{ for response tokens, } 0 \text{ otherwise}$$

$$\mathcal{L} = \frac{\sum_i m_i \cdot \mathcal{L}_i}{\sum_i m_i}$$

Code Snippets

No snippets available yet.
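Until built-in snippets are available, here is a minimal sketch of the masked-loss computation described in the equations above (plain PyTorch; the tensor names, shapes, and toy values are illustrative, not the app's actual code).

```python
# Sketch of the masked loss: average next-token cross-entropy over response tokens only.
import torch
import torch.nn.functional as F

def masked_lm_loss(logits, input_ids, response_mask):
    """logits: [batch, seq, vocab]; input_ids: [batch, seq];
    response_mask: m_i = 1 for response tokens, 0 otherwise."""
    # Standard causal shift: the logit at position t predicts token t+1.
    logits = logits[:, :-1, :]
    labels = input_ids[:, 1:]
    mask = response_mask[:, 1:].float()

    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)), labels.reshape(-1), reduction="none"
    ).view(labels.shape)

    # L = sum_i m_i * L_i / sum_i m_i  (prompt tokens contribute nothing)
    return (per_token * mask).sum() / mask.sum().clamp(min=1)

# Toy usage: 4 prompt tokens followed by 4 response tokens.
vocab, seq = 100, 8
logits = torch.randn(1, seq, vocab)
input_ids = torch.randint(0, vocab, (1, seq))
response_mask = torch.tensor([[0, 0, 0, 0, 1, 1, 1, 1]])
print(masked_lm_loss(logits, input_ids, response_mask).item())
```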

Train

Train your model.

Demo mode: fine-tuning disabled.

Live Metrics

Loss and gradients.

Progress

0.0%

Elapsed

-

ETA

-

Loss

-

Running Loss

-

Grad Norm

-

Dynamics

Visualize gradient norms and optimization trajectory.

Gradient Norms per Layer
Start training to see gradients
Loss Landscape in 3D (Loss on Z-axis)
Waiting for training data...
What am I looking at?

Gradient Norms per Layer: This shows the magnitude (L2 norm) of the gradients for each part of the model. It tells you "how hard" each layer is trying to change.

  • During fine-tuning, these gradients are typically smaller than in pretraining.
  • If you use LoRA, only the LoRA adapter layers will show significant gradients.

Loss Landscape Trajectory: We visualize the optimization path by projecting the high-dimensional weight updates onto a 3D space.

  • X & Y: Random projection of current weights.
  • Z: Training loss.
  • This helps you see if the model is finding a "valley" or getting stuck on a plateau (a minimal sketch of both computations follows below).
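A minimal sketch of how both panels could be computed in PyTorch; the per-layer grouping, the fixed random projection directions, and the toy model are assumptions, not the app's actual implementation.

```python
# Sketch: per-layer gradient L2 norms (left panel) and a 2D random projection of
# the current weights (X/Y of the loss-landscape plot; Z is the training loss).
import torch

def per_layer_grad_norms(model):
    """L2 norm of the gradient for each named parameter."""
    return {
        name: p.grad.detach().norm(2).item()
        for name, p in model.named_parameters()
        if p.grad is not None
    }

def project_weights_2d(model, directions=None):
    """Project the flattened weight vector onto two fixed random unit directions."""
    flat = torch.cat([p.detach().reshape(-1) for p in model.parameters()])
    if directions is None:
        gen = torch.Generator().manual_seed(0)  # fixed seed so the path is comparable step to step
        directions = torch.randn(2, flat.numel(), generator=gen)
        directions = directions / directions.norm(dim=1, keepdim=True)
    x, y = (directions @ flat).tolist()
    return x, y, directions

# Usage after one backward pass on a toy model:
model = torch.nn.Linear(4, 2)
loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
print(per_layer_grad_norms(model))    # e.g. {"weight": ..., "bias": ...}
x, y, dirs = project_weights_2d(model)
print((x, y, loss.item()))            # one (X, Y, Z) point on the trajectory
```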

Inspect Batch

Prompt vs. response tokens and attention patterns (a minimal sketch of how these are extracted follows below).

0 / 3

Select a sample to inspect prompt/response tokens.

Attention heatmap: Query ↓ (rows) · Key → (columns)
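A minimal sketch of how a sample could be split into prompt/response token spans and how attention maps are obtained; the checkpoint and example text are illustrative, and output_attentions=True is standard Hugging Face transformers usage.

```python
# Sketch: build sequence = prompt + response, track which positions are response
# tokens, and fetch per-layer attention maps for the heatmap (query rows, key columns).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")          # stand-in for the selected checkpoint
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt, response = "Translate to French: cat ->", " chat"
prompt_ids = tok(prompt, return_tensors="pt").input_ids
response_ids = tok(response, return_tensors="pt").input_ids
input_ids = torch.cat([prompt_ids, response_ids], dim=1)

# Response mask: 0 over the prompt span, 1 over the response span.
mask = torch.cat([torch.zeros_like(prompt_ids), torch.ones_like(response_ids)], dim=1)

with torch.no_grad():
    out = model(input_ids, output_attentions=True)

# out.attentions: one tensor per layer, shaped [batch, heads, query, key].
last_layer = out.attentions[-1][0]
print("prompt tokens:", prompt_ids.size(1), "| response tokens:", response_ids.size(1))
print("attention map, layer -1, head 0:", tuple(last_layer[0].shape))
```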

Eval History

Train vs validation loss.

Logs

Checkpoint events.

No logs yet.