# Data Preparation
High-quality data is the cornerstone of successful Large Language Model (LLM) fine-tuning. This module streamlines the data preparation process, especially for data generated in simulation environments.
## Data Collection Process
The system is designed to automatically collect decision-making data from the simulation environment. When an agent interacts within the environment, information from its decision-making process is captured and saved.
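The recorder below is not the project's implementation; it is a minimal sketch of what such a capture step could look like, assuming records with the fields described under Data Format and the storage convention described next. The class name `DecisionRecorder` and its methods are hypothetical.

```python
import json
import time

class DecisionRecorder:
    """Hypothetical capture hook: buffers decision records and flushes them to disk."""

    def __init__(self) -> None:
        self.records: list[dict] = []

    def record(self, prompt: str, output: str, reward: float | None = None) -> None:
        # One record per agent decision; 'reward' is optional (see Data Format).
        rec: dict = {"prompt": prompt, "output": output}
        if reward is not None:
            rec["reward"] = reward
        self.records.append(rec)

    def flush(self, env_name: str) -> str:
        # Assumes the datasets directory for this environment already exists.
        path = f"src/envs/{env_name}/datasets/decisions_{int(time.time())}.json"
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.records, f, ensure_ascii=False, indent=2)
        return path
```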
- Storage Path: All collected data is stored as JSON files at the following path:
  `src/envs/<env_name>/datasets/decisions_<timestamp>.json`
  Here, `<env_name>` is the name of the environment currently in use, and `<timestamp>` is the time at which the file was created. A helper for locating the newest file is sketched below.
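The following is not part of the project's API; it is a minimal sketch of how the newest decisions file for an environment could be located, given the storage convention above. The helper name `latest_dataset` is hypothetical.

```python
import glob
import os

def latest_dataset(env_name: str) -> str:
    """Return the most recently modified decisions file for an environment.

    Hypothetical helper; the pattern follows the storage convention
    src/envs/<env_name>/datasets/decisions_<timestamp>.json described above.
    """
    pattern = os.path.join("src", "envs", env_name, "datasets", "decisions_*.json")
    files = glob.glob(pattern)
    if not files:
        raise FileNotFoundError(f"No decision datasets found for {env_name!r}")
    return max(files, key=os.path.getmtime)
```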
## Data Format
The collected data is in JSON format, with each file containing a list of decision records. Each record is a dictionary containing the key fields required for fine-tuning.
Core Field Descriptions:
- `prompt`: The input or instruction provided to the model. This is the context that the model needs to understand and respond to.
- `output`: The actual output generated by the model for the given `prompt`. In SFT mode, this is typically the ideal, high-quality "expert" answer.
- `reward`: (Optional) A numerical value that rates how good the `output` is. This field is crucial in PPO mode, where it serves as the reward signal guiding the model's optimization.
Data Example:
```json
[
  {
    "prompt": "Based on the current market conditions, should I buy stock A or stock B at this time?",
    "output": "Considering Stock A's recent financial performance and market analysis, it might be a more stable choice.",
    "reward": 0.85
  },
  {
    "prompt": "Considering the previous written test scores, who among candidate A, candidate B, and candidate C should be selected to proceed to the next interview stage?",
    "output": "Considering the resume relevance and written test scores of the three candidates, candidate A should be selected for the next interview stage.",
    "reward": 0.95
  }
]
```
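Before training, it can be worth verifying that every record carries the required fields. The loader below is an illustrative sketch, not the project's data loader; `load_decisions` is a hypothetical name.

```python
import json

def load_decisions(path: str) -> list[dict]:
    """Load a decisions file and check the fields described above (illustrative)."""
    with open(path, "r", encoding="utf-8") as f:
        records = json.load(f)
    for i, rec in enumerate(records):
        for field in ("prompt", "output"):
            if field not in rec:
                raise ValueError(f"record {i} is missing required field {field!r}")
        # 'reward' is optional for SFT but serves as the reward signal in PPO mode.
    return records
```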
When starting a fine-tuning task, specify which JSON dataset file (or merged set of files) to use via the `--dataset_path` parameter.
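If data was collected across several runs, the files can be combined into a single dataset before training. The following is a minimal sketch under that assumption; `merge_datasets` is a hypothetical helper, not part of the project.

```python
import json

def merge_datasets(paths: list[str], out_path: str) -> None:
    """Concatenate several decisions files into one JSON list (hypothetical helper)."""
    merged: list[dict] = []
    for p in paths:
        with open(p, "r", encoding="utf-8") as f:
            merged.extend(json.load(f))
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(merged, f, ensure_ascii=False, indent=2)
```

The merged file can then be passed via `--dataset_path`.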