===== Usage ===== After :doc:`installing `, you will be able to run Reinforcement Learning experiments using the package. ---------- Experiment ---------- An experiment consists of two parts: - Training your model using a Reinforcement Learning algorithm - (Optional) Evaluating the resulting model - Harness Evaluation - Local Evaluation .. important:: All the configurations and hyperparameters for an experiment must be specified in a **configuration file** in YAML. When you run an experiment, training jobs and evaluation jobs are created and submitted to MareNostrum5 using SLURM. For evaluation, some tasks are implemented in the Evaluation Harness. For those which are not, we use custom scripts (This is what we call "Local Evaluation"). All evaluation jobs are submited to MareNostrum5 with dependencies to their respective training jobs. .. note:: Note that evaluation is optional: if you do not specify an ``"evaluation"`` field in your `config.yaml` file, then only training will happen. ------------------------------------ Configuration file for an Experiment ------------------------------------ First, you will need a configuration file in YAML for your experiment (The values marked with ``None`` are computed internally by the package): .. code-block:: yaml :caption: Example YAML configuration file :name: example_yaml_config execution: algorithm: "dpo" # Reinforcement Learning Algorithm venv: "" output_dir: "" distributed_config: "DSZero3Offload" # For distributed training in MN5 slurm: job-name: "" # output: None # error: None nodes: 2 cpus-per-task: 80 gres: "gpu:4" time: "2:00:00" account: "bsc88" qos: "acc_debug" rl_script_args: dataset_name: "" model_config_args: model_name_or_path: "" # output_dir: None attn_implementation: "flash_attention_2" torch_dtype: "bfloat16" rl_config_args: # RL configs are subclasses of transformers.TrainingArguments # Different RL algorithms have different uses of beta. # However, in most of them, it is the weight of the KL-divergence (Loss=reward+Beta*KL) beta: 0.2 max_length: 8192 max_prompt_length: 128 # Default. When specified, you use the default data collator remove_unused_columns: false dataset_num_proc: 1 # ==== # From `TrainingArguments`: # === learning_rate: 5.0e-6 num_train_epochs: 2 bf16: true eval_strategy: "steps" eval_steps: 0.05 # logging_dir: None # local_rank: None report_to: "wandb" # These arguments help to manage GPU memory per_device_train_batch_size: 2 per_device_eval_batch_size: 2 gradient_accumulation_steps: 8 gradient_checkpointing: true environment: # Bash environment variables WANDB_PROJECT: "salamandra_alignment" WANDB_NAME: "test_alignment" # WANDB_DIR: None # Evaluation is optional evaluation: harness_tasks: - "flores_en-es" - "flores_es-ca" - "wnli_es" - "xlsum_es" harness_slurm: # job name, logs, and gpus are automatically computed qos: "acc_bscls" account: "bsc88" nodes: 2 time: "12:00:00" # job-name: None # output: None # error: None # cpus-per-task: None # # gres : None # "gpu:4" ---------------------- Running an experiment ---------------------- You can use your ``config.yaml`` file to run an experiment, using the :ref:`CLI `: .. code-block:: console $ rl_salamandra_mn5 config.yaml This will generate **and** submit SLURM jobs to MareNostrum 5, you can find the trained models, slurm scripts, slurm logs, and evaluation results in your ``output_dir``. Debugging ========== For debugging, use the ``--debug`` flag: .. code-block:: console $ rl_salamandra_mn5 config.yaml --debug In debugging mode, SLURM scripts will be generated but not submitted. Skipping evaluation ==================== If you only want to train but not evaluate nmodels, you can use the ``--no_evaluation`` flag .. code-block:: console $ rl_salamandra_mn5 config.yaml --no_evaluation This will create the training and evaluation jobs for SLURM, but it will **only** submit the training jobs. This may be useful when the evaluation queue is long, or when you want to make a quick experiment. --------------- Subexperiments --------------- To experiment with different configurations of values, you can use **lists** in your ``config.yaml`` file. For example, the following ``config.yaml`` for one experiment executes 12 subexperiments: - 6 runs of DPO: on 2 models with 3 learning rates, and - 6 runs of KTO: on the same 2 models with the same 3 learning rates Note that both hyphens (``-``) and square brackes (``[]``) work for writing lists in YAML. .. code-block:: yaml :caption: Setting up subexperiments :name: example_subexperiments_config ... execution: algorithm: - "dpo" - "kto" ... model_config_args: model_name_or_path: - "model_1" - "model_2" ... rl_config_args: learning_rate: [5.0e-6, 1.0e-5, 1.0e-6] ... .. warning:: Note that any of the values in the configuration can be a list, **except** ``output_dir`` under ``execution``. The ``output_dir`` must always be an absolute path. Furthermore, for a given configuration file, all subexperiments generated from it share the same ``evaluation`` field, which will **not** be unfolded. This means that you can specify lists inside the ``evaluation`` field (for example, lists of evaluation tasks), and doing so will not create more subexperiments.