Fine-Tuning an LLM Locally on an AI PC: Training Llama 3.2 into a Tool-Calling Model, No Discrete GPU Required
This article walks through a real, end-to-end case: on an AI PC, we use Unsloth to LoRA fine-tune meta-llama/Llama-3.2-3B-Instruct so the model gains Function Calling (tool-calling) capability, then run inference and evaluation.
Authors: 武卓, 胡誉文
What Is an AI PC, and Why Is It a Good Fit for Local Fine-Tuning?
AI PCs are bringing both inference and fine-tuning back from the cloud to local machines: they typically combine strong on-device compute (CPU / iGPU / NPU, etc.), so developers can complete SFT (Supervised Fine-Tuning) tasks with parameter-efficient methods such as LoRA, without depending on an expensive discrete GPU or even going to the cloud.
For developers, the biggest value of an AI PC is high-frequency iteration:
- Quickly validate whether your data, prompts, and tool-call format actually work
- No need to repeatedly spin up cloud instances for quick-turnaround experiments, so costs stay under control
- Data never leaves the machine, which suits enterprise/internal PoCs and sensitive-data scenarios
This article shares a real, fully reproduced case: on an AI PC, we use Unsloth for LoRA fine-tuning (via the SFTTrainer from Transformer Reinforcement Learning (TRL)) of meta-llama/Llama-3.2-3B-Instruct, so that the model gains Function Calling (tool-calling) capability, and then complete inference and evaluation. The datasets are listed below (a quick way to inspect the training data format is sketched right after the list):
- Training dataset: hiyouga/glaive-function-calling-v2-sharegpt
- Evaluation dataset: Salesforce/xlam-function-calling-60k (requires requesting access)
From 0 to 1: Set Up the Environment and Run the Fine-Tuning Step by Step
Step 0: Open a terminal window
- For example, open Anaconda Prompt
- conda activate ...
Step 1: Install the Visual Studio C++ toolchain
Download and install Visual Studio Build Tools / Community from the download page, and check the following components during installation:
- Desktop development with C++
- MSVC v143 toolset
- Windows 10/11 SDK
(Optional) To enable MSVC in the current terminal, run the following commands to verify (the path may differ depending on your installed version and location):

call "C:\Program Files\Microsoft Visual Studio\18\Community\VC\Auxiliary\Build\vcvars64.bat"
where cl

If you can see the path to cl.exe, the compiler is working.
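If you prefer checking from Python (in the same shell where vcvars64.bat was called), a minimal sketch:

import shutil

# Prints the full path to cl.exe if the MSVC environment is active, else None
print(shutil.which("cl"))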
Step 2: Install Intel oneAPI (for the Intel GPU/XPU path)
Install oneAPI Base Toolkit version 2025.2.1: download the installer from https://www.intel.cn/content/www/cn/zh/developer/tools/oneapi/base-toolkit-download.html and install it.

Initialize the oneAPI environment variables:
call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
Step 3: Enable Windows long paths (one-time setup)
Run the following in an Administrator CMD window:

powershell -Command "Set-ItemProperty -Path ""HKLM:\SYSTEM\CurrentControlSet\Control\FileSystem"" -Name ""LongPathsEnabled"" -Value 1"
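To confirm the change took effect, a minimal Python sketch using the standard winreg module:

import winreg

# Read back the LongPathsEnabled value to verify the registry change
key = winreg.OpenKey(
    winreg.HKEY_LOCAL_MACHINE,
    r"SYSTEM\CurrentControlSet\Control\FileSystem",
)
value, _ = winreg.QueryValueEx(key, "LongPathsEnabled")
winreg.CloseKey(key)
print("LongPathsEnabled =", value)  # expect 1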
Step 4: Create a Conda environment and install Unsloth (Intel XPU build)
conda create -n aipc-finetune-example python=3.11 -y
conda activate aipc-finetune-example
git clone https://github.com/unslothai/unsloth.git
cd unsloth
pip install -e .[intel-gpu-torch290]
After installation, you can quickly confirm that the XPU is available:
import torch
print(torch.__version__)  # 2.9.0+xpu
print(hasattr(torch, "xpu") and torch.xpu.is_available())  # True
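If the check passes, you can also print which XPU device PyTorch detected; a minimal sketch, assuming the torch.xpu helpers available in current XPU builds:

import torch

# Show how many XPU devices PyTorch sees, and the name of the first one
if torch.xpu.is_available():
    print(torch.xpu.device_count(), "XPU device(s) found")
    print(torch.xpu.get_device_name(0))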
Step 5: Configure the Level Zero SDK
The Triton Intel backend's JIT compilation depends on the Level Zero headers. Download level-zero-win-sdk-1.20.2.zip from https://github.com/oneapi-src/level-zero/releases/tag/v1.20.2 and unzip it to a location such as:
C:\Users\MindPro\On-Device Fine-tuning AIPC Example\level-zero-win-sdk-1.20.2
Set ZE_PATH:
set ZE_PATH=C:\Users\MindPro\On-Device Fine-tuning AIPC Example\level-zero-win-sdk-1.20.2
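A quick check that ZE_PATH is set and points at the headers Triton needs; a minimal sketch, assuming the SDK archive unzips with an include/ze_api.h layout (adjust the path if your archive differs):

import os

# Verify that ZE_PATH is set and the Level Zero header is where Triton expects it
ze_path = os.environ.get("ZE_PATH", "")
header = os.path.join(ze_path, "include", "ze_api.h")
print("ZE_PATH =", ze_path)
print("ze_api.h found:", os.path.isfile(header))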
Step 6: Hugging Face gated dataset access (required for evaluation)
The evaluation dataset Salesforce/xlam-function-calling-60k is gated: setting HF_TOKEN alone is not enough; you also need to Request access / Agree on the dataset's web page.
Recommended flow:
set HF_TOKEN=hf_xxx
huggingface-cli login
Then go to the dataset page and request/accept access once.
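Once access is granted, a minimal sketch to verify that the token works and the gated dataset is readable (streaming one record avoids downloading the full dataset):

from huggingface_hub import whoami
from datasets import load_dataset

# Confirm the token is recognized by the Hub
print(whoami()["name"])

# Streaming one record fails with an access error if the request was not approved
ds = load_dataset("Salesforce/xlam-function-calling-60k", split="train", streaming=True)
print(next(iter(ds)).keys())  # expected fields include 'query', 'tools', 'answers'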
Step 7: Install ROUGE evaluation dependencies
pip install -U evaluate nltk rouge_score bert_score absl-py
python -c "import nltk; nltk.download('punkt')"
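To fail fast on any missing evaluation dependency, you can load each metric once before the real run; a minimal sketch using the same metric names as the evaluation script below:

from evaluate import load as load_metric

# Loading each metric once pulls any remaining dependencies and fails fast
for name in ["exact_match", "rouge", "bleu", "meteor", "bertscore"]:
    load_metric(name)
    print(name, "OK")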
3) Run: Training / Inference / Evaluation
- Training

call "C:\Program Files (x86)\Intel\oneAPI\setvars.bat" --force
set ZE_PATH=path\to\the\unzipped\level-zero-win-sdk-1.20.2
set UNSLOTH_DISABLE_STATISTICS=1
python train.py

(For ZE_PATH, substitute your own path to the unzipped SDK.)
The full code of train.py is as follows:
import datetime

from datasets import load_dataset
import torch
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig
from unsloth.chat_templates import get_chat_template


def prepare_dataset(tokenizer, dataset_name, dataset_size, dataset_seed):
    system_prompt = "You are a helpful assistant with access to the following functions. Use them if required -\n__TOOL_DESCRIPTION__\nTo use these functions respond with:\n{\"name\": \"function_name\", \"arguments\": {\"arg_1\": \"value_1\", \"arg_1\": \"value_1\", ...}}\n\nEdge cases you must handle:\n - If there are no functions that match the user request, you will respond politely that you cannot help."

    def formatting_prompts_func(example):
        texts = []
        for messages, tools in zip(example["conversations"], example["tools"]):
            # Prepend a system turn carrying the tool descriptions for this sample
            messages.insert(0, {"from": "system", "value": system_prompt.replace("__TOOL_DESCRIPTION__", tools)})
            text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=False)
            texts.append(text)
        return {"text": texts}

    dataset = load_dataset(dataset_name, split="train").select(range(dataset_size)).shuffle(seed=dataset_seed)
    print(dataset)
    dataset = dataset.map(formatting_prompts_func, batched=True)
    for i in range(3):
        print(f"Sample {i}:\n{dataset[i]['text']}\n{'='*40}")
    return dataset


def get_tokenizer(tokenizer):
    tokenizer = get_chat_template(
        tokenizer,
        mapping={
            "role": "from",
            "content": "value",
            "user": "human",
            "assistant": "gpt",
        },
        chat_template="llama-3.1",
    )
    return tokenizer


def main(
    repo_id_or_model_path="meta-llama/Llama-3.2-3B-Instruct",
    epochs=1,
    max_seq_length=2048,
    dataset_size=2000,
    lora_rank=16,
    batch_size=2,
    learning_rate="2e-4",
    max_steps=None,
    logging_steps=50,
    gradient_accumulation_steps=1,
    random_state=3407,
    dataset_seed=None,
    output_dir=None,
):
    dataset_name = "hiyouga/glaive-function-calling-v2-sharegpt"
    optim = "adamw_torch"
    use_gradient_checkpointing = "unsloth"
    dtype = torch.bfloat16

    def start_train(trainer):
        return trainer.train()

    lora_kwargs = {
        "r": lora_rank,  # Choose any number > 0! Suggested 8, 16, 32, 64, 128
        "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
        "lora_alpha": 16,
        "lora_dropout": 0,  # Supports any, but = 0 is optimized
        "bias": "none",  # Supports any, but = "none" is optimized
        "use_rslora": False,
        "loftq_config": None,
    }
    training_args_kwargs = {}
    checkpoint_output_dir = f"{output_dir or 'outputs'}/checkpoints"  # default to ./outputs when output_dir is not given
    if max_steps is not None:
        training_args_kwargs["max_steps"] = max_steps

    model, tokenizer_orig = FastLanguageModel.from_pretrained(
        model_name=repo_id_or_model_path,
        max_seq_length=max_seq_length,
        dtype=dtype,
        full_finetuning=False,
        load_in_4bit=False,
        device_map="xpu:0",
        use_gradient_checkpointing=use_gradient_checkpointing,
    )
    model = FastLanguageModel.get_peft_model(model, random_state=random_state, **lora_kwargs)
    print(">>> Successfully loaded model!")

    tokenizer = get_tokenizer(tokenizer_orig)
    dataset = prepare_dataset(tokenizer, dataset_name, dataset_size, dataset_seed)
    print(">>> Successfully created dataset!")

    training_arguments = SFTConfig(
        per_device_train_batch_size=batch_size,
        gradient_accumulation_steps=gradient_accumulation_steps,
        warmup_steps=5,
        num_train_epochs=epochs,
        learning_rate=float(learning_rate),
        fp16=(dtype == torch.float16),
        bf16=(dtype == torch.bfloat16),
        logging_steps=logging_steps,
        optim=optim,
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=random_state,
        output_dir=checkpoint_output_dir,
        report_to="none",
        dataset_text_field="text",
        dataset_num_proc=1,
        packing=False,
        **training_args_kwargs,
    )
    trainer = SFTTrainer(
        model=model,
        tokenizer=tokenizer_orig,
        train_dataset=dataset,
        args=training_arguments,
    )
    print(">>> Successfully defined trainer!")
    print(
        f"Starting training with:\nrandom_state: {random_state}\nepochs: {epochs}\n"
        f"batch_size: {batch_size}\ndataset_size: {dataset_size}\nlora_rank: {lora_rank}\n"
        f"learning_rate: {learning_rate}\nmax_steps: {max_steps}\nlogging_steps: {logging_steps}\n"
        f"gradient_accumulation_steps: {gradient_accumulation_steps}"
    )
    trainer_stats = start_train(trainer)
    print(trainer_stats)
    print("Total Time: ", str(datetime.timedelta(seconds=int(trainer_stats.metrics["train_runtime"]))))


if __name__ == "__main__":
    # Set params accordingly
    main()
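The script only writes trainer checkpoints under outputs/checkpoints, which is what inference.py loads below. If you also want a standalone copy of the adapter, a minimal sketch using the standard PEFT save API (the directory name outputs/lora_adapter is an arbitrary example, not used by the other scripts):

# Optional: after trainer.train() finishes, save the LoRA adapter and tokenizer
# to a standalone directory ("outputs/lora_adapter" is an arbitrary example path)
model.save_pretrained("outputs/lora_adapter")
tokenizer_orig.save_pretrained("outputs/lora_adapter")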
The training run looks as follows:

(Screenshots omitted. The screenshots do not represent performance data; actual performance will vary with hardware type, system environment, and other factors.)
- Inference

The full code of inference.py is as follows:
import os
import json
import string
import random

from unsloth import FastLanguageModel

from sample_tools import (
    weather_info,
    send_email,
    search_web,
    create_calendar_event,
    create_calendar_event_with_attendees,
    translate_text,
    set_reminder,
    generate_password,
    update_calendar,
    get_vector_sum,
)


def generate_alphanumeric():
    # Random 9-character id used for the tool_call entry
    characters = string.ascii_letters + string.digits
    result = ''.join(random.choice(characters) for _ in range(9))
    return result


def parse_tool_call(decoded_output):
    try:
        content = json.loads(decoded_output[0])
    except json.JSONDecodeError as e:
        print("Failed to parse JSON:", e)
        print("----end----\n\n")
        return None, None
    try:
        function_name = content['name']
        arguments = content['parameters']
    except KeyError as e:
        print("Missing key in JSON:", e)
        print("----end----\n\n")
        return None, None
    return function_name, arguments


def call_tool(function_name, arguments):
    try:
        result = globals()[function_name](**arguments)
        print("Tool call result: ", result)
        return result
    except Exception as e:
        print("Error calling function:", e)
        return None


def inference(model, tokenizer, query, tools, generation_args):
    """Inference function to process user queries.

    The model generates a JSON tool call based on the user query,
    which is then parsed and executed.
    """
    print("----start----")
    messages = [
        # {"role": "system", "content": system_prompt},
        {"role": "user", "content": query},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        add_special_tokens=False,
        padding=True,
        tools=tools,
        return_tensors="pt",
    ).to("xpu")
    output = model.generate(
        input_ids=input_ids,
        do_sample=False,
        # repetition_penalty=1.1,
        num_return_sequences=1,
        **generation_args,
    )
    generated_tokens = output[:, input_ids.shape[1]:]
    decoded_output = tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)
    print("Assistant response:", decoded_output[0])

    function_name, arguments = parse_tool_call(decoded_output)
    if not function_name or not arguments:
        return
    result = call_tool(function_name, arguments)

    # Feed the tool result back so the model can produce a final natural-language answer
    tool_calls = [{
        "id": generate_alphanumeric(),
        "type": "function",
        "function": {"name": function_name, "arguments": arguments},
    }]
    messages.append({"role": "assistant", "tool_calls": tool_calls})
    messages.append({"role": "tool", "name": function_name, "content": result})
    messages.append({"role": "assistant", "content": "Given the tool response, give the final answer to the user. Answer:\n"})

    tool_prompt = tokenizer.apply_chat_template(
        messages,
        system_prompt=None,
        continue_final_message=True,
        add_special_tokens=True,
        return_tensors="pt",
        return_dict=True,
        tools=None,
    ).to("xpu")
    tool_prompt = tool_prompt.to(model.device)
    out = model.generate(**tool_prompt, **generation_args)
    generated_text = out[0, tool_prompt['input_ids'].shape[1]:]
    print("Assistant response:", tokenizer.decode(generated_text, skip_special_tokens=True))
    print("----end----\n\n")


def main(
    repo_id_or_model_path="meta-llama/Llama-3.2-3B-Instruct",
    disable_adapter=False,
    lora_adapter_path="outputs/checkpoints/checkpoint-1000",
    max_seq_length=2048,
    max_new_tokens=150,
    temperature=None,
    top_p=None,
    top_k=None,
    min_p=None,
):
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=repo_id_or_model_path,
        max_seq_length=max_seq_length,
        dtype=None,
        load_in_4bit=False,
        fix_tokenizer=False,
    )
    if not disable_adapter:
        if os.path.exists(lora_adapter_path):
            print("Loading adapter from:", lora_adapter_path)
            model.load_adapter(lora_adapter_path)
        else:
            print(f"Adapter path {lora_adapter_path} does not exist.")
            raise FileNotFoundError(f"Adapter path {lora_adapter_path} does not exist.")

    generation_args = {k: v for k, v in {
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "min_p": min_p,
    }.items() if v is not None}

    queries = [
        ("Find the sum of a = [1, -1, 2] and b = [3, 0, -4].", [update_calendar, get_vector_sum]),
        ("Set a reminder to call John at 3 PM tomorrow.", [update_calendar, set_reminder, send_email]),
        ("Generate a password for me. I want it to be 12 characters long, with numbers and symbols.", [generate_password, weather_info]),
        ("Hey, I need to add an event to my calendar. It's a meeting with my team on 15th March from 10 AM to 12 PM. The title should be \"Team Meeting\".", [update_calendar, create_calendar_event, translate_text]),
        ("Create a calendar event with the title 'Meeting' starting at 10 AM and ending at 12 PM on the 15th of March 2024, with attendees John and Jane.", [create_calendar_event, create_calendar_event_with_attendees]),
    ]
    for query, tools in queries:
        inference(model, tokenizer, query, tools, generation_args)


if __name__ == "__main__":
    # Set params accordingly
    main()
Running it depends on the file sample_tools.py, whose code is as follows:
def send_email(recipient: str, subject: str, body: str) -> dict:
    """Send an email to a recipient.

    Args:
        recipient: Recipient email address.
        subject: Email subject.
        body: Email body.

    Returns:
        dict: Status and message about the sent email.
    """
    return {"status": "success", "message": f"Email sent to {recipient} with subject '{subject}'."}


def weather_info(location: str) -> dict:
    """Get weather information for a location.

    Args:
        location: Location to get weather info for.

    Returns:
        dict: Status and weather information for the location.
    """
    return {"status": "success", "weather": f"Weather info for {location}."}


def search_web(query: str) -> dict:
    """Search the web for information.

    Args:
        query: Search query.

    Returns:
        dict: Status and search results for the query.
    """
    return {"status": "success", "result": f"Search results for '{query}'."}


def create_calendar_event(title: str, start_time: str, end_time: str) -> dict:
    """Create a calendar event.

    Args:
        title: Event title.
        start_time: Start time.
        end_time: End time.

    Returns:
        dict: Status and message about the created event.
    """
    return {"status": "success", "message": f"Event '{title}' created from {start_time} to {end_time}."}


def create_calendar_event_with_attendees(title: str, start_time: str, end_time: str, attendees: list[str] = None) -> dict:
    """Create a calendar event with attendees.

    Args:
        title: Event title.
        start_time: Start time (ISO format).
        end_time: End time (ISO format).
        attendees: List of attendee email addresses.

    Returns:
        dict: Status and message about the created event with attendees.
    """
    attendees_list = attendees if attendees else []
    return {
        "status": "success",
        "message": f"Event '{title}' created from {start_time} to {end_time} with attendees: {', '.join(attendees_list) if attendees_list else 'None'}."
    }


def translate_text(text: str, target_language: str) -> dict:
    """Translate text to another language.

    Args:
        text: Text to translate.
        target_language: Target language.

    Returns:
        dict: Status and translation result.
    """
    return {"status": "success", "translation": f"Translated '{text}' to {target_language}."}


def set_reminder(reminder_text: str, time: str) -> dict:
    """Set a reminder at a specific time.

    Args:
        reminder_text: Reminder text.
        time: Time for the reminder.

    Returns:
        dict: Status and message about the set reminder.
    """
    return {"status": "success", "message": f"Reminder set: '{reminder_text}' at {time}."}


def generate_password(length: int, include_numbers: bool = False, include_symbols: bool = False) -> dict:
    """Generate a random password.

    Args:
        length: The length of the password.
        include_numbers: Whether to include numbers in the password.
        include_symbols: Whether to include symbols in the password.

    Returns:
        dict: Status and the generated password.
    """
    import random
    import string
    chars = string.ascii_letters
    if include_numbers:
        chars += string.digits
    if include_symbols:
        chars += string.punctuation
    password = ''.join(random.choice(chars) for _ in range(length))
    return {"status": "success", "password": password}


def update_calendar(event_title: str, start_time: str, end_time: str) -> dict:
    """Update a calendar with an event.

    Args:
        event_title: Event Title.
        start_time: Start Time.
        end_time: End Time.

    Returns:
        dict: Status and message about the updated calendar event.
    """
    return {"status": "success", "message": f"Event '{event_title}' has been successfully added to your calendar from {start_time} to {end_time}."}


def get_vector_sum(a: list[float], b: list[float]) -> list[float]:
    """Performs element-wise addition of two numerical vectors.

    Both vectors must be of the same length and contain numerical values.

    Args:
        a: First vector containing numerical values
        b: Second vector containing numerical values

    Returns:
        Resulting vector where each element is the sum of corresponding elements in a and b

    Raises:
        ValueError: If vectors have different lengths

    Example:
        >>> get_vector_sum([1, 2], [3, 4])
        [4, 6]
    """
    if len(a) != len(b):
        raise ValueError("Vectors must be of the same length")
    return [x + y for x, y in zip(a, b)]
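Because these tools are plain Python stubs, you can smoke-test them without loading the model; a quick check matching the vector-sum case shown later:

# Quick smoke test of the stub tools, independent of the model
from sample_tools import get_vector_sum, generate_password

print(get_vector_sum([1, -1, 2], [3, 0, -4]))  # [4, -1, -2]
print(generate_password(12, include_numbers=True, include_symbols=True))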
Run it with:

python inference.py

(The actual outputs are shown in section 4 below.)
- Evaluation

Run it with:

python evaluate_model.py

The full code of evaluate_model.py is as follows:
import os
import json

from unsloth import FastLanguageModel
from datasets import load_dataset
from evaluate import load as load_metric
from tqdm import tqdm


def get_dataset(dataset_name: str, num_samples: int, dataset_seed: int):
    dataset = load_dataset(dataset_name, split="train")
    dataset = dataset.shuffle(seed=dataset_seed)
    valid_examples = []
    for example in dataset:
        answer = json.loads(example["answers"])
        if len(answer) == 1:
            # Align the reference format with the model's output schema
            example["answers"] = example["answers"].replace('arguments', 'parameters')
            valid_examples.append(example)
        if len(valid_examples) == num_samples:
            break
    return valid_examples


def generate_response(model, tokenizer, query: str, tools: list[dict], generation_args: dict):
    messages = [
        {"role": "user", "content": query},
    ]
    input_ids = tokenizer.apply_chat_template(
        messages,
        tokenize=True,
        add_generation_prompt=True,
        add_special_tokens=False,
        padding=True,
        tools=tools,
        return_tensors="pt",
    ).to("xpu")
    output = model.generate(
        input_ids=input_ids,
        do_sample=False,
        # repetition_penalty=1.1,
        num_return_sequences=1,
        **generation_args,
    )
    generated_tokens = output[:, input_ids.shape[1]:]
    return tokenizer.decode(generated_tokens[0], skip_special_tokens=True)


def parse_tool_call(prediction):
    """Parses the model's prediction to extract the function name and parameters."""
    try:
        content = json.loads(prediction)
    except json.JSONDecodeError as e:
        print("Failed to parse JSON:", e)
        print(prediction)
        return None, None
    try:
        function_name = content['name']
        arguments = content['parameters']
    except KeyError as e:
        print("Missing key in JSON:", e)
        return None, None
    return function_name, arguments


def tool_call_metric(predictions: list[str], references: list[str]):
    """Computes the accuracy of function calls based on function name and parameters.

    Instead of requiring exact string matches, this metric checks whether the
    function name and parameters match.
    """
    correct = 0
    total = len(predictions)
    tool_call_results = []
    for pred, ref in zip(predictions, references):
        pred_funcname, pred_args = parse_tool_call(pred)
        ref_funcname, ref_args = parse_tool_call(ref)
        if pred_funcname == ref_funcname and pred_args == ref_args:
            correct += 1
            match = True
        else:
            match = False
        tool_call_results.append(match)
    accuracy = correct / total if total > 0 else 0.0
    return tool_call_results, {"tool_call_accuracy": accuracy}


def evaluate_model(model, tokenizer, dataset: list[dict], generation_args: dict):
    """Evaluates the model and returns predictions and references."""
    predictions = []
    references = []
    for example in tqdm(dataset, desc="Evaluating", unit="sample"):
        query = example["query"]
        reference = json.dumps(json.loads(example["answers"])[0])  # get first item as string
        tools = json.loads(example["tools"])
        prediction = generate_response(model, tokenizer, query, tools, generation_args)
        predictions.append(prediction)
        references.append(reference)
    return predictions, references


def format_metric(metric):
    if isinstance(metric, dict):
        return {k: float(v) if hasattr(v, 'item') else v for k, v in metric.items()}
    elif hasattr(metric, 'item'):
        return float(metric)
    return metric


def log_evaluation_results(
    output_dir: str,
    evaluation_config: dict,
    predictions: list[str],
    references: list[str],
    dataset: list[dict],
    metrics: dict,
    tool_call_results: list[bool],
    output_filename="evaluation_results.json",
):
    """Logs evaluation metrics and predictions to a file."""
    print(tool_call_results)
    output_filepath = os.path.join(output_dir, output_filename)
    os.makedirs(os.path.dirname(output_filepath), exist_ok=True)
    results = {
        "evaluation_config": evaluation_config,
        "metrics": metrics,
        "predictions": [
            {
                "query": ex["query"],
                "tools": ex["tools"],
                "reference": reference,
                "prediction": pred,
                "match": tool_call_result,
            }
            for ex, pred, reference, tool_call_result in zip(dataset, predictions, references, tool_call_results)
        ],
    }
    with open(output_filepath, "w") as f:
        json.dump(results, f, indent=2)
    print(f"Outputs saved to {output_filepath}")


def compute_metrics(output_dir, predictions: list, references: list, disable_save_results: bool, dataset: list):
    # Load metrics
    exact_match_metric = load_metric("exact_match")
    rouge_metric = load_metric("rouge")
    bleu_metric = load_metric("bleu")
    meteor_metric = load_metric("meteor")
    bertscore_metric = load_metric("bertscore")

    # Exact Match - ignore whitespace after colons. The model sometimes produces
    # no space after a colon, which is still valid JSON.
    tool_call_results, tool_call_accuracy = tool_call_metric(predictions, references)
    exact_match_result = format_metric(exact_match_metric.compute(predictions=predictions, references=references, regexes_to_ignore=[r"(?<=:)\s+"]))
    rouge_result = format_metric(rouge_metric.compute(predictions=predictions, references=references))
    bleu_result = format_metric(bleu_metric.compute(predictions=predictions, references=references))
    meteor_result = format_metric(meteor_metric.compute(predictions=predictions, references=references))
    bertscore_result = bertscore_metric.compute(predictions=predictions, references=references, lang="en")

    print("Tool Call Accuracy:\n", tool_call_accuracy, "\n")
    print("Exact Match:\n", exact_match_result, "\n")
    print("ROUGE:\n", rouge_result, "\n")
    print("BLEU:\n", bleu_result, "\n")
    print("METEOR:\n", meteor_result, "\n")
    print("BERTScore:\n", bertscore_result, "\n")

    metrics = {
        "Tool Call Accuracy": tool_call_accuracy,
        "Exact Match": exact_match_result,
        "ROUGE": rouge_result,
        "BLEU": bleu_result,
        "METEOR": meteor_result,
        "BERTScore": bertscore_result,
    }
    return metrics, tool_call_results


def main(
    repo_id_or_model_path="meta-llama/Llama-3.2-3B-Instruct",
    disable_adapter=False,
    lora_adapter_path="outputs/checkpoints/checkpoint-1000",
    max_seq_length=2048,
    max_new_tokens=150,
    temperature=None,
    top_p=None,
    top_k=None,
    min_p=None,
    num_samples=50,
    disable_save_results=False,
    dataset_seed=None,
    output_dir=None,
):
    dataset_name = "Salesforce/xlam-function-calling-60k"
    output_dir = output_dir or "outputs"  # default output directory when none is given

    # Load model and tokenizer
    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name=repo_id_or_model_path,
        max_seq_length=max_seq_length,
        device_map="xpu:0",
        load_in_4bit=False,
    )
    if not disable_adapter:
        if os.path.exists(lora_adapter_path):
            print("Loading adapter from:", lora_adapter_path)
            model.load_adapter(lora_adapter_path)
        else:
            print(f"Adapter path {lora_adapter_path} does not exist.")
            raise FileNotFoundError(f"Adapter path {lora_adapter_path} does not exist.")
    FastLanguageModel.for_inference(model)

    generation_args = {k: v for k, v in {
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "top_k": top_k,
        "min_p": min_p,
    }.items() if v is not None}

    evaluation_config = {
        "repo_id_or_model_path": repo_id_or_model_path,
        "dataset_name": dataset_name,
        "disable_adapter": disable_adapter,
        "lora_adapter_path": lora_adapter_path,
        "max_seq_length": max_seq_length,
        "num_samples": num_samples,
        "dataset_seed": dataset_seed,
        "max_new_tokens": max_new_tokens,
        "generation_args": generation_args,
    }

    # Load dataset
    dataset = get_dataset(dataset_name, num_samples, dataset_seed)

    def run_evaluation(**kwargs):
        return evaluate_model(**kwargs)

    predictions, references = run_evaluation(
        model=model,
        tokenizer=tokenizer,
        dataset=dataset,
        generation_args=generation_args,
    )
    metrics, tool_call_results = compute_metrics(output_dir, predictions, references, disable_save_results, dataset)
    if not disable_save_results:
        log_evaluation_results(output_dir, evaluation_config, predictions, references, dataset, metrics, tool_call_results)


if __name__ == "__main__":
    # Set params accordingly
    main()
(Evaluation run screenshot omitted; the results are summarized in section 5 below.)
4) Results: The Fine-Tuned Model Has Learned to "Call the Tool First, Then Answer from the Result"
Below is the real output from running python inference.py locally, using the tools provided in sample_tools.py above (with the LoRA adapter from checkpoint-1000 loaded).
You will see a stable pattern:

The model first emits a structured tool-call JSON → the tool executes and returns a result → the model then summarizes the result in natural language.
Case 1: Vector sum (structured arguments + correct result summarization)
[Assistant → Tool Call JSON]
{"name": "get_vector_sum", "parameters": {"a": [1, -1, 2], "b": [3, 0, -4]}}
[Tool → Result]
[4, -1, -2]
[Assistant → Final Answer]
The sum of vectors a and b is [4, -1, -2].
Case 2: Setting a reminder (intent extracted into executable arguments)
[Assistant → Tool Call JSON]
{"name": "set_reminder", "parameters": {"reminder_text": "Call John", "time": "3 PM tomorrow"}}
[Tool → Result]
{'status': 'success', 'message': "Reminder set: 'Call John' at 3 PM tomorrow."}
[Assistant → Final Answer]
Okay, I have set a reminder to call John at 3 PM tomorrow.
Case 3: Generating a strong password (controllable arguments + tool result restated)
[Assistant → Tool Call JSON]
{"name": "generate_password", "parameters": {"length": 12, "include_numbers": true, "include_symbols": true}}
[Tool → Result]
{'status': 'success', 'password': ':JlgWf0G8PXa'}
[Assistant → Final Answer]
The generated password for you is :JlgWf0G8PXa.
Case 4: Creating a calendar event (time understanding + stable schema)
[Assistant → Tool Call JSON]
{"name": "create_calendar_event", "parameters": {"title": "Team Meeting", "start_time": "15th March 10 AM", "end_time": "15th March 12 PM"}}
[Tool → Result]
{'status': 'success', 'message': "Event 'Team Meeting' created from 15th March 10 AM to 15th March 12 PM."}
[Assistant → Final Answer]
The event 'Team Meeting' has been successfully added to your calendar from 15th March 10 AM to 15th March 12 PM.
Case 5: Event with attendees (more complex schema + human-readable rewrite)
[Assistant → Tool Call JSON]
{"name": "create_calendar_event_with_attendees", "parameters": {"title": "Meeting", "start_time": "2024-03-15T10:00:00", "end_time": "2024-03-15T12:00:00", "attendees": ["john@example.com", "jane@example.com"]}}
[Tool → Result]
{'status': 'success', 'message': "Event 'Meeting' created from 2024-03-15T10:00:00 to 2024-03-15T12:00:00 with attendees: john@example.com, jane@example.com."}
[Assistant → Final Answer]
Event 'Meeting' has been successfully created for 15th of March 2024, starting from 10 AM and ending at 12 PM. The attendees for this meeting are John and Jane.
5) Evaluation Results: Tool Call Accuracy = 0.78 (50 samples)
On the 50-sample evaluation run with python evaluate_model.py, I obtained:
- Tool Call Accuracy: 0.78
- Exact Match: 0.76
- ROUGE-L: ≈0.953
- BLEU: ≈0.96
- METEOR: ≈0.958
This shows that after only 1 epoch / 1000 steps of training, the model already has reasonably usable tool-calling ability. The evaluation logs also show a small number of long-argument samples where the JSON output gets truncated and parsing fails; this is a common "long structured output" challenge in function calling, and it can be further improved with more training steps, a higher proportion of long-argument samples, or constrained/structured decoding. A small parsing-side mitigation is sketched below.
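For example, one low-cost mitigation is to detect likely truncation before giving up on a prediction; a minimal sketch with a hypothetical helper, not part of the scripts above:

import json

def parse_or_flag_truncation(text: str):
    """Return (parsed_dict, None) on success, or (None, reason) on failure."""
    try:
        return json.loads(text), None
    except json.JSONDecodeError as e:
        # Unbalanced braces usually mean generation hit max_new_tokens mid-JSON;
        # the caller can then retry that sample with a larger token budget
        if text.count("{") > text.count("}"):
            return None, "truncated"
        return None, f"invalid: {e}"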
6) Summary: Simple Steps, and a Complete Fine-Tuning Loop on a Local AI PC
Once you have it running, you will find that the whole flow is essentially "install the toolchain → configure the runtime → train with one command".
More importantly, you can complete the full LoRA fine-tuning, inference validation, and evaluation loop on your own AI PC, with no expensive discrete GPU and no cloud required. This is very practical for personal project validation, internal enterprise PoCs, and customizing on-device agent capabilities.
OpenVINO assistant on WeChat: OpenVINO-China
For questions or discussion, feel free to add the OpenVINO assistant on WeChat and join the community group to interact with technical experts in real time.