Data Preparation

High-quality data is the cornerstone of successfully fine-tuning a Large Language Model (LLM). This module simplifies the data preparation process, especially for data generated from simulation environments.

Data Collection Process

The system is designed to automatically collect decision-making data from the simulation environment. When an agent interacts within the environment, information from its decision-making process is captured and saved.

  • Storage Path: All collected data is uniformly stored as JSON files in the following path:

src/envs/<env_name>/datasets/decisions_<timestamp>.json

Here, <env_name> is the name of the environment you are currently using, and <timestamp> is the timestamp when the file was created.
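Given this naming convention, it is straightforward to pick out the most recent dataset for an environment. The helper below is an illustrative sketch (the function name and `root` default are assumptions, not part of the module); it globs the datasets directory and orders files by modification time:

```python
import glob
import os
from typing import Optional

def latest_dataset(env_name: str, root: str = "src/envs") -> Optional[str]:
    """Return the most recently written decisions_<timestamp>.json
    for the given environment, or None if no dataset exists yet."""
    pattern = os.path.join(root, env_name, "datasets", "decisions_*.json")
    files = glob.glob(pattern)
    # Ordering by mtime is more robust than relying on the
    # timestamp embedded in the filename.
    return max(files, key=os.path.getmtime) if files else None
```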

Data Format

The collected data is in JSON format, with each file containing a list of decision records. Each record is a dictionary containing the key fields required for fine-tuning.

Core Field Descriptions:

  • prompt: The input or instruction provided to the model. This is the context that the model needs to understand and respond to.
  • output: The actual output generated by the model given the prompt. In SFT mode, this is typically the ideal, high-quality "expert" answer.
  • reward: (Optional) A numerical value used to evaluate how good the output is. This field is crucial in PPO mode, as it serves as the reward signal to guide the model's optimization direction.

Data Example:

[
  {
    "prompt": "Based on the current market conditions, should I buy stock A or stock B at this time?",
    "output": "Considering Stock A's recent financial performance and market analysis, it might be a more stable choice.",
    "reward": 0.85
  },
  {
    "prompt": "Considering the previous written test scores, who among candidate A, candidate B, and candidate C should be selected to proceed to the next interview stage?",
    "output": "Considering the resume relevance and written test scores of the three candidates, candidate A should be selected for the next interview stage.",
    "reward": 0.95
  }
]
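Records in this shape can be sanity-checked before a fine-tuning run. The loader below is a minimal sketch (the function name is illustrative, not an API provided by the module): it enforces the required prompt/output fields and checks that reward, when present, is numeric:

```python
import json

# "reward" is optional for SFT but required as the signal in PPO mode.
REQUIRED_KEYS = {"prompt", "output"}

def load_decisions(path: str) -> list:
    """Load a decisions JSON file and validate each record's core fields."""
    with open(path, encoding="utf-8") as f:
        records = json.load(f)
    for i, rec in enumerate(records):
        missing = REQUIRED_KEYS - rec.keys()
        if missing:
            raise ValueError(f"record {i} is missing fields: {sorted(missing)}")
        if "reward" in rec and not isinstance(rec["reward"], (int, float)):
            raise ValueError(f"record {i}: reward must be numeric")
    return records
```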

When starting a fine-tuning task, specify which JSON dataset file to use (or a single file merged from several collection runs) via the --dataset_path parameter.
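Since --dataset_path takes one file, several collection runs can first be concatenated into a single JSON list. The helper below is a sketch of that merge step (the function name is an assumption for illustration):

```python
import json

def merge_datasets(paths: list, out_path: str) -> int:
    """Concatenate several decisions_*.json files into one record list,
    writing the result so it can be passed as a single --dataset_path.
    Returns the total number of merged records."""
    merged = []
    for p in paths:
        with open(p, encoding="utf-8") as f:
            merged.extend(json.load(f))
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(merged, f, ensure_ascii=False, indent=2)
    return len(merged)
```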