
Running jobs

Software

The user nodes are equipped with the usual developer packages. If you feel that something is missing, please submit a ticket via the Portal.

Virtual Environment Setup

In general, our system is compatible with the Cerebras documentation, which should be followed.
In this early phase, one small tweak is required:

Edit the file /<path_to_your_venv>/lib/python3.8/site-packages/cerebras/appliance/appliance_manager.py and comment out lines 986 to 1034. The block should start with errors = [] and end with a line containing only a single ).
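The edit can also be scripted. The sketch below uses a hypothetical helper, comment_out_block (it is not part of any Cerebras tool). It comments out a 1-based, inclusive line range, but first checks that the first line starts with errors = [] and the last line is a lone ). If the line numbers have shifted, for example in a different Cerebras release, it refuses to edit the file. It also keeps a .bak copy of the original.

```python
import shutil
from pathlib import Path


def comment_out_block(path, first, last, start_marker="errors = []", end_marker=")"):
    """Prefix lines first..last (1-based, inclusive) of `path` with '# '.

    Hypothetical helper: verifies the block boundaries before editing and
    writes a backup copy (<file>.bak) next to the original.
    """
    path = Path(path)
    lines = path.read_text().splitlines(keepends=True)
    if first < 1 or first > last or last > len(lines):
        raise ValueError(f"line range {first}-{last} is outside the file")
    # Refuse to edit if the block does not look like the one described.
    # This also stops a second run from double-commenting the block.
    if not lines[first - 1].strip().startswith(start_marker):
        raise ValueError(f"line {first} does not start with {start_marker!r}")
    if lines[last - 1].strip() != end_marker:
        raise ValueError(f"line {last} is not a lone {end_marker!r}")
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    for i in range(first - 1, last):
        line = lines[i]
        stripped = line.lstrip()
        if stripped:
            # Keep the existing indentation and insert '# ' after it.
            indent = line[: len(line) - len(stripped)]
            lines[i] = indent + "# " + stripped
    path.write_text("".join(lines))
```

For the file on this page, the call would be comment_out_block("/<path_to_your_venv>/lib/python3.8/site-packages/cerebras/appliance/appliance_manager.py", 986, 1034), with your own venv path substituted. Blank lines inside the block are left unchanged.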

Running codes

Run as per the normal Cerebras documentation. It is advisable to run codes inside a tmux session so you can return to them without having to leave SSH sessions active whilst jobs run.
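If you start runs from a script, the sketch below builds a tmux command that runs a given command in a new, detached, named session (tmux new-session -d -s NAME, with -c setting the starting directory). You can reattach later with tmux attach -t NAME. The helper names and the session name are illustrative assumptions. By default the session closes when the command exits, so you may want the command to write its output to a log file.

```python
import shlex
import subprocess


def tmux_launch_argv(session, command, workdir=None):
    """Build argv that starts `command` (a list of args) in a detached tmux session."""
    argv = ["tmux", "new-session", "-d", "-s", session]
    if workdir is not None:
        # -c sets the starting directory of the new session
        argv += ["-c", str(workdir)]
    # tmux takes a single shell-command string, so quote each argument.
    argv.append(shlex.join(command))
    return argv


def launch_in_tmux(session, command, workdir=None):
    """Start the session; raises CalledProcessError if tmux fails."""
    subprocess.run(tmux_launch_argv(session, command, workdir), check=True)
```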

Example: training Llama4b on a single CS3

With a suitably configured venv as above, and the modelzoo checked out:

  • Copy /home/y26/shared/params_tr.yaml to <your modelzoo checkout>/src/cerebras/modelzoo/models/nlp/llama/configs/
  • Navigate to <your modelzoo checkout>/src/cerebras/modelzoo/models/nlp/llama/
  • Run the following command:
      python run.py CSX --num_csx=1 --mode train --params configs/params_tr.yaml --mount_dirs /home/<your_project>/<your_project>/<your username> /home/y26/shared/ --python_paths <path to your modelzoo checkout>/src/ --max_steps 50 --model_dir llama4b_u3
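The three steps above can be sketched as a Python script. The function names are hypothetical, and the script assumes your Cerebras venv is activated, so that "python" on PATH is the venv interpreter. It copies the shared config, then runs run.py with the Llama directory as its working directory instead of calling cd. All flags are copied from the command in the steps. The modelzoo and user_dir arguments stand in for <your modelzoo checkout> and /home/<your_project>/<your_project>/<your username>.

```python
import shutil
import subprocess
from pathlib import Path

SHARED_PARAMS = Path("/home/y26/shared/params_tr.yaml")


def llama_train_argv(modelzoo, user_dir, max_steps=50, model_dir="llama4b_u3"):
    """Assemble the run.py command from this page as an argument list."""
    return [
        "python", "run.py", "CSX",
        "--num_csx=1",
        "--mode", "train",
        "--params", "configs/params_tr.yaml",
        # Mount both your own directory and the shared y26 directory.
        "--mount_dirs", str(user_dir), "/home/y26/shared/",
        "--python_paths", f"{modelzoo}/src/",
        "--max_steps", str(max_steps),
        "--model_dir", model_dir,
    ]


def train_llama4b(modelzoo, user_dir):
    """Hypothetical wrapper: copy the config, then launch training."""
    llama_dir = Path(modelzoo) / "src/cerebras/modelzoo/models/nlp/llama"
    # Step 1: copy the shared params file into the Llama configs directory.
    shutil.copy(SHARED_PARAMS, llama_dir / "configs")
    # Steps 2 and 3: run from the Llama directory.
    subprocess.run(llama_train_argv(modelzoo, user_dir), cwd=llama_dir, check=True)
```

Because the run can take a while, this script is a natural candidate to start inside a tmux session, as recommended above.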