CodeLlama vs. CodeGemma: Using Open Models for AI Coding Assistance

Posted by 布客飞龙 · 2025-11-21 00:35:00

Original: towardsdatascience.com/codellama-vs-codegemma-using-open-models-for-ai-coding-assistance-da446c9157b8

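The market for AI coding tools is a multi-billion-dollar industry. It is expected to reach $17.2 billion by 2030, and even today, AI plugins for VS Code or JetBrains IDEs have millions of downloads. But can we run a local model as a free coding assistant, and how well will it perform? In this article, I will test two open models, Code Gemma and Code Llama. I will install them on my PC, and we will see how they work.

Without further ado, let's get into it!

1. Models

At the time of writing this article, there are two major open models, free to download, that can be used for coding purposes: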

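+   CodeLlama. The model was released by Meta in 2023 and is available in four sizes: 7B, 13B, 34B, and 70B. "Base", "Instruct", and "Python" variants are available. Despite the four sizes, only the 7B and 13B models are practical to run locally; the others are too "heavy".

+   CodeGemma. The model was released by Google in 2024 and is available in 2B and 7B sizes. The 2B model is intended only for code completion, while the 7B model is intended for code infilling and natural-language prompts.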

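In this article, I will test the 7B and 13B models available on HuggingFace, which can be downloaded in GGUF format. I will run a local OpenAI-compatible server, which will allow us to use these models with different apps. But before doing that, let's run the models in Python to see what they can do. Readers who want to go straight to the practical applications can skip this part.

To test both models, I will use a free Google Colab instance. First, let's load the model and the tokenizer: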

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
import transformers
import torch

model_id = "..."
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=False,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="cuda",
    torch_dtype=torch.bfloat16,
)

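HuggingFace's Transformers library is smart enough to download the model files automatically. A 7B model needs about 16.2 GB of GPU RAM, so I used the bitsandbytes library to run the model in 4-bit precision; after that, the required memory footprint is only about 5 GB.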
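As a rough sanity check on these numbers, the memory for the weights can be estimated as the parameter count times the bytes per parameter. The sketch below is a back-of-the-envelope estimate only: the helper name `estimate_weight_memory_gb` and the nominal 7-billion parameter count are assumptions made for illustration, and the real footprint is higher because of activations, the KV cache, and quantization overhead.

```python
def estimate_weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the model weights only, in GB (10^9 bytes)."""
    bytes_total = n_params * bits_per_param / 8
    return bytes_total / 1e9


# Assumption: a nominal 7-billion-parameter model
n_params = 7e9
print(f"16-bit weights: {estimate_weight_memory_gb(n_params, 16):.1f} GB")
print(f" 4-bit weights: {estimate_weight_memory_gb(n_params, 4):.1f} GB")
```

The weights alone come to about 14 GB at 16 bits and 3.5 GB at 4 bits, which is in the same range as the figures above once runtime overhead is added.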

Now, let's create a code snippet to test the models. As an example, I created a Python method that writes a list of strings to a file:

# The inner triple quotes and the newline are escaped so that the outer string stays valid
python_code = """
class Writer:
    def write_file(self, filename: str, data: List[str]):
        \"\"\" Write list of strings to a text file \"\"\"
        with open(filename, 'w') as f_out:
            for line in data:
                f_out.write(f"{line}\\n")
"""

To test the models' coding abilities, let's ask both models to generate a "pytest" for it:

chat = [{
    "role": "user",
    "content": f"Write a pytest for this Python method:\n{python_code}. "
               f"Delete the created file at the end of the test."
    }]

prompt = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tokenizer.encode(prompt, add_special_tokens=False, return_tensors="pt")
outputs = model.generate(input_ids=inputs.to(model.device), max_new_tokens=1024)
result = tokenizer.decode(outputs[0])

As for the results, CodeLlama 7B generated this code, and the generation took 19 seconds:

import pytest

class TestWriter:
    def test_write_file(self):
        writer = Writer()
        filename = 'test.txt'
        data = ['line1', 'line2', 'line3']
        writer.write_file(filename, data)
        with open(filename, 'r') as f:
            lines = f.readlines()
            assert lines == data
        os.remove(filename)

CodeGemma generated this code, and the process took 16 seconds:

import pytest

def test_write_file():
    """ Test the write_file method """
    filename = "test.txt"
    data = ["This is a test", "line 2", "line 3"]
    Writer().write_file(filename, data)

    with open(filename, "r") as f:
        assert f.read() == "This is a test\nline 2\nline 3\n"

    import os
    os.remove(filename)

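Personally, I like the second version more. First, CodeGemma provided a docstring description of the method, which is a requirement of modern "linter" tools. Second, the Writer().write_file(...) code looks more compact and readable than declaring a writer variable and using it afterward. Third, CodeGemma imported the "os" Python module, while CodeLlama "forgot" to do it.

At first glance, both code snippets look correct. Let's run the code by executing the pytest -v file.py command: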

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/94cbb2901d3c04a2ad512465e106030f.png

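Pytest results, Image by author

Actually, I was wrong about both tests being correct: the first test has a bug. Interestingly, the second test not only looks better, it also works, while the first one does not. The error is clear from the screenshot; readers can try to figure out how to fix it on their own.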
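For readers who want to compare notes, here is one possible fix, sketched under the assumption that the failure comes from `readlines()` keeping the trailing newline of every line. The version below strips the newlines before comparing and also adds the missing `os` import; the temporary file name is an arbitrary choice for this example.

```python
import os
import tempfile
from typing import List


class Writer:
    def write_file(self, filename: str, data: List[str]):
        """ Write list of strings to a text file """
        with open(filename, 'w') as f_out:
            for line in data:
                f_out.write(f"{line}\n")


def test_write_file():
    """ Adjusted variant of the first test: strip newlines before comparing """
    writer = Writer()
    # Hypothetical file name, placed in the system temp folder
    filename = os.path.join(tempfile.gettempdir(), 'test_writer_example.txt')
    data = ['line1', 'line2', 'line3']
    writer.write_file(filename, data)
    try:
        with open(filename, 'r') as f:
            # readlines() keeps the trailing "\n", so remove it first
            lines = [line.rstrip('\n') for line in f.readlines()]
        assert lines == data
    finally:
        # Delete the file even if the assertion fails
        os.remove(filename)
```

Wrapping the check in `try/finally` is a design choice rather than a requirement: it keeps the test folder clean when the assertion fails.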

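Originally, I was not going to test the CodeGemma 2B "code completion" model, but as a bonus for readers, let's try it! The model is loaded the same way; we only need to change the model ID: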

model_id = "google/codegemma-2b"
model = AutoModelForCausalLM.from_pretrained(model_id, ...)

This model is intended for code completion. It does not need any English description; we only provide the source code:

# Prompt
python_code = """
class Writer:
   def write_file(self, filename: str, data: List[str]):
      ...

import pytest

def test_write_file():
    \"\"\" Test the write_file method \"\"\"
"""

prompt = f"""
<|fim_prefix|>{python_code}
<|fim_suffix|>
<|fim_middle|>
"""

# Run inference
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
prompt_len = inputs["input_ids"].shape[-1]
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][prompt_len:]))

Considering the small size of the model, the result is surprisingly good. It generated this output:

def test_write_file():
    """ Test the write_file method """
    writer = Writer()
    writer.write_file("test_file.txt", ["Hello", "World"])
    with open("test_file.txt", "r") as f:
        lines = f.readlines()
    assert lines == ["Hello
", "World
"]

As we can see, this code will not work "out of the box", but the logic looks correct. The only fix needed is to format the assert line properly:

assert lines == ["Hello\n", "World\n"]

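After that, the "pytest" test passed. The model also did not delete the file after the test, but I did not ask for that in the prompt. Last but not least, the execution time for the small model was only 3.3 seconds, about 5 times faster than for the larger models.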
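The fill-in-the-middle prompt used above can be wrapped in a small helper. This is a minimal sketch: the function name `build_fim_prompt` is made up for this example, and it concatenates the parts without the extra newlines used in the prompt above. Whether those newlines affect the output quality was not tested here.

```python
# Special tokens used in the fill-in-the-middle prompt above
FIM_PREFIX = "<|fim_prefix|>"
FIM_SUFFIX = "<|fim_suffix|>"
FIM_MIDDLE = "<|fim_middle|>"


def build_fim_prompt(prefix: str, suffix: str = "") -> str:
    """Build a prompt asking the model to generate the code between prefix and suffix."""
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


prompt = build_fim_prompt("def add(a, b):\n    return ", "\n\nprint(add(1, 2))")
print(prompt)
```

With an empty suffix, the model simply continues the prefix, which is the case used in the test above.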

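2. Running a Llama Server

We have tested our models in Python; now let's run a local OpenAI-compatible server. I will use llama-cpp-python for that. It is a nice, lightweight project; we can run any model we want with a single command line: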

# Code Gemma
python3 -m llama_cpp.server --model codegemma-7b-it-Q4_K_M.gguf --n_ctx 8192 --n_gpu_layers -1 --host 0.0.0.0 --port 8000

# Code Llama 7B
python3 -m llama_cpp.server --model codellama-7b-instruct.Q4_K_M.gguf --n_ctx 8192 --n_gpu_layers -1 --host 0.0.0.0 --port 8000

# Code Llama 13B
python3 -m llama_cpp.server --model codellama-13b-instruct.Q4_K_M.gguf --n_ctx 8192 --n_gpu_layers -1 --host 0.0.0.0 --port 8000

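If there is not enough GPU RAM to load the model, the n_gpu_layers parameter can be changed to load only some of the layers onto the GPU. We can also run models on Apple Silicon or even on a CPU, but obviously, it will be slower.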
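Before connecting any apps, the server can be checked with a plain HTTP request. The sketch below uses only the Python standard library and assumes the server was started with one of the commands above on port 8000; the `/chat/completions` path follows the OpenAI API convention, and the helper names `build_chat_request` and `ask` are made up for this example.

```python
import json
import urllib.request

# Assumption: the server started above listens on this address
BASE_URL = "http://127.0.0.1:8000/v1"


def build_chat_request(prompt: str, max_tokens: int = 256) -> urllib.request.Request:
    """Prepare an OpenAI-style chat completion request for the local server."""
    payload = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request and return the text of the first choice (needs a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        answer = json.load(response)
    return answer["choices"][0]["message"]["content"]
```

With the server running, a call like `print(ask("Write a Python function that reverses a string"))` should print the model's answer.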

3. Applications

At this point, we have a local OpenAI-compatible server, and we are ready to test some apps!

3.1 AI Shell

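AI Shell is an open-source app that converts natural-language prompts into console commands. The app is fairly popular; at the time of writing, the project has 3.6K stars on GitHub. AI Shell is written in TypeScript, and we can install it with the npm package manager (here, I also installed Node.js 20.13.0):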

curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash
nvm install v20.13.0
npm install -g @builder.io/ai-shell

Before running the app, we need to configure the API endpoint:

ai config set OPENAI_KEY=12345678
ai config set OPENAI_API_ENDPOINT=http://127.0.0.1:8000/v1

Now, we can start a conversation with the model at any time by entering the "ai chat" command in the console:

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/17568d3430ca93201ab82a399bd23a48.png

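AI Shell terminal output, Image by author

One way to use the program is to type in the command we want to execute. For example, we can enter something like "show files in the current folder":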

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/4af8967c32fe33af140822aa3e824ab6.png

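AI Shell terminal output, Image by author

Alas, it did not work with the free 7B models: the model was not able to generate a correct shell command. What's more, the word "script" in the prompt seemed to confuse the model, and it generated text about a movie script.

This issue could probably be solved by adjusting the prompt, but at the time of writing, the prompts are hard-coded in the TypeScript source and cannot be easily configured. My feature request on GitHub has not received a response yet, but hopefully this will improve in the future.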

3.2 ShellGPT

ShellGPT is another interesting open-source project, with 8.3K stars on GitHub at the time of writing. We can easily install the app using pip:

pip3 install shell-gpt

To use ShellGPT with a local model, we need to change the API endpoint in the ~/.config/shell_gpt/.sgptrc file:

API_BASE_URL=http://127.0.0.1:8000/v1
OPENAI_API_KEY=12345678

Then we can enter our request directly in the terminal shell, almost the same way as in the previous app:

sgpt "Write a command to show local files"

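Alas, the CodeGemma model turned out to be incompatible with ShellGPT, and the LlamaCpp server returned a 500 error: "System role not supported". At first, I thought it was a LlamaCpp issue, but after checking the logs, I found that the model metadata contains these lines: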

{% if messages[0]['role'] == 'system' %}
  {{ raise_exception('System role not supported')

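It is a pity that CodeGemma does not support the "system" role, because it is widely used in the OpenAI API. As a result, OpenAI-compatible apps that rely on it cannot work with CodeGemma, which is a shame because, as we saw earlier, the code CodeGemma generates is pretty good.

With CodeLlama, ShellGPT works well: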
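One commonly used workaround for models without a "system" role is to fold the system text into the first user message before the request reaches the model. The sketch below only illustrates the idea; it was not part of the tests in this article, the function name `merge_system_into_user` is made up, and in practice it would have to run in a proxy placed between the app and the server.

```python
from typing import Dict, List


def merge_system_into_user(messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Fold a leading "system" message into the first "user" message."""
    if not messages or messages[0]["role"] != "system":
        return [dict(m) for m in messages]
    system_text = messages[0]["content"]
    rest = [dict(m) for m in messages[1:]]  # copy, so the input is not modified
    if rest and rest[0]["role"] == "user":
        rest[0]["content"] = f"{system_text}\n\n{rest[0]['content']}"
    else:
        # No user message follows: send the system text as a user message
        rest.insert(0, {"role": "user", "content": system_text})
    return rest


chat = [
    {"role": "system", "content": "You are a shell command generator."},
    {"role": "user", "content": "Show local files"},
]
print(merge_system_into_user(chat))
```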

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/751bff8b3b1294bee3871c5eaea41e83.png

Another useful feature is executing commands directly in the terminal shell by specifying the --shell option:

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/6f541eba5bf2cb221b340b805bac53e3.png

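There is room for improvement; for example, the "show the size of the Documents folder" prompt returned a du -sh ~/Documents response. It is a correct bash command, but the model wrapped it in a Markdown code fence, ShellGPT was not able to extract it from that string, and I got only a "command not found" error.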
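When a model wraps a command in a Markdown code fence, a small post-processing step could strip the fence before the command is executed. The helper below is a hypothetical sketch and not ShellGPT's actual code; it assumes a standard fence with an optional language tag on the opening line.

```python
import re

# Opening fence with an optional language tag and a newline, then the body, then the closing fence
FENCE_RE = re.compile(r"```(?:[a-zA-Z]+)?[ \t]*\n(.*?)```", re.DOTALL)


def extract_command(response: str) -> str:
    """Return the fenced command if there is one, otherwise the response itself."""
    match = FENCE_RE.search(response)
    command = match.group(1) if match else response
    return command.strip()


print(extract_command("```bash\ndu -sh ~/Documents\n```"))
```

A response without a fence is returned unchanged apart from whitespace trimming, so the helper is safe to apply to every answer.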

3.3 CodeGPT

Using bash commands can be useful, but how about actual coding assistance? We can do it with the help of the open-source CodeGPT plugin. First, I installed the plugin in my PyCharm IDE and configured it for use with LlamaCpp:

https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/ba56a8c718ed4dfea4f2d6b1309e7bc9.png

CodeGPT Settings, Image by author

As an example, let’s consider this Python class:


class ServerConnection:
    """ Server connection handling """
    def __init__(self):
        self.is_connected = False
        self.connection_time = -1
        self.uploads_total = 0
        self.reconnects_total = 0
        self.reconnect_threshold_sec = 64

I will ask the model to refactor the variables into a separate [Python data class](https://docs.python.org/3/library/dataclasses.html).

As for the results, **CodeGemma** was not able to do it; it returned an error "System role not supported." **CodeLlama 7B** was not able to complete the task; it created a standard class instead of a data class. **CodeLlama 13B** performed the task well:

<https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/233cb2b575342e57e8b91feeb10c7885.png>

CodeGPT chat, Image by author
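The result is only visible in the screenshot above, so as a rough illustration, a refactoring of this kind might look like the sketch below. This is not the model's actual output; the `ConnectionState` name and the type annotations are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ConnectionState:
    """ Connection variables moved into a data class (hypothetical name) """
    is_connected: bool = False
    connection_time: int = -1
    uploads_total: int = 0
    reconnects_total: int = 0
    reconnect_threshold_sec: int = 64


class ServerConnection:
    """ Server connection handling """
    def __init__(self):
        # All state now lives in one object with default values
        self.state = ConnectionState()


connection = ServerConnection()
print(connection.state)
```

A data class gives the generated `__init__`, `__repr__`, and `__eq__` methods for free, which is the main practical benefit over the standard class that the 7B model produced.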

As a next step, I decided to ask something more complex and entered a `create a UI Python application with a textfield and button` prompt. A Llama 13B model generated this code:

import tkinter as tk

# Create the main window
root = tk.Tk()
root.title("My Application")

# Create a text field
text_field = tk.Entry(root)
text_field.pack()

# Create a button
button = tk.Button(root, text="Click me!", command=lambda: print("You clicked the button!"))
button.pack()

# Start the main loop
root.mainloop()


The code is correct, but the application window was not visible – its size is not specified. I continued the chat and asked the model to change the title to "Hello World" and to set the window size to 320×200:

<https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/81183b2c933b99b567aebf22cb8886bc.png>

CodeGPT chat, Image by author

Now, the result was okay, and the requested app was working as expected:

<https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/d4a4b014c48e5e39c617eb309551568d.png>

Tkinter app, Image by author

I must admit that a 13B model is not perfect. In theory, it has a large context window and should use the previous chat results, but when I asked the model to move the generated code into a class, it generated new code without setting the window size or title:

import tkinter as tk

class HelloWorld(tk.Frame):
    def __init__(self, master=None):
        super().__init__(master)
        self.pack()

        # Create a text field
        self.text_field = tk.Entry(self)
        self.text_field.pack()

        # Create a button
        self.button = tk.Button(self, text="Click me!", command=lambda: print("Button clicked!"))
        self.button.pack()

if __name__ == "__main__":
    root = tk.Tk()
    app = HelloWorld(root)
    root.mainloop()


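But in general, the model created a correct class, and with a bit of copy-pasting, it is easy to get the job done.

### 4. Disadvantages

From all the latest examples, we can see that the model works; it can generate both code and bash commands. But there are also some disadvantages and issues:

+   Using a local LLM instance requires a good graphics card. I have a 2.5-year-old GeForce RTX 3060 with 8GB of GPU RAM. In my Colab tests, I saw that 8GB is enough to run a 7B model, but on a real desktop, there was not enough CUDA memory to run it, because the OS itself also needs some GPU memory to work. Practically, at least 16GB of GPU RAM is needed to run a 13B model, and 24GB would be better to leave room for future improvements. Does it make practical sense? Considering current GPU prices, I am not fully sure: for $1000-1500, we can subscribe to AI services for many years.

+   Open-source apps are not perfect. In my tests, the LlamaCpp server sometimes crashed with a "segmentation fault", the CodeGPT app sometimes did not send any requests to the model and I had to restart PyCharm, and so on. It is open source with no guarantees, so I am not complaining, but I must admit that for these AI tools, we are at the "early adopter" stage.

+   It is also worth mentioning another interesting fact: running a large local language model is an energy-demanding task. As a final test, I connected a power meter to my desktop PC. In normal working mode, it consumes about **80 watts**. But when an LLM request is running, the power consumption almost **triples**: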

<https://github.com/OpenDocCN/towardsdatascience-blog-zh-2024/raw/master/docs/img/8467e6a9bf0a8a4f540c131100f84d3e.png>

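Power consumption during an AI model request, Image by author

### Conclusion

In this article, I tested the ability of open language models to work as coding assistants, and the results are interesting: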
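To put the measurement into perspective, the extra energy can be estimated as power times time. The sketch below assumes that "almost triples" means roughly 3 x 80 W under load and that the model runs continuously for one hour; both values are simplifications, not measurements.

```python
def energy_kwh(power_watts: float, hours: float) -> float:
    """Energy in kilowatt-hours for a constant power draw."""
    return power_watts * hours / 1000


IDLE_WATTS = 80               # measured value mentioned above
LOAD_WATTS = 3 * IDLE_WATTS   # assumption: "almost triples" rounded to 3x

extra = energy_kwh(LOAD_WATTS - IDLE_WATTS, hours=1)
print(f"Extra energy per hour of continuous inference: {extra:.2f} kWh")
```

In real use, requests run only for a few seconds at a time, so the actual extra consumption of a coding assistant would be well below this upper estimate.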

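+   Even small 7B and 13B models can perform some coding tasks, such as refactoring, writing unit tests, or writing small code templates. Obviously, these models are less capable than large models like ChatGPT 3.5 (reportedly 175B), but using a local model does not require any subscription fees; it can also be faster and better from a privacy perspective.

+   On the other hand, running local models requires high-end hardware, which can be not only expensive but also power-hungry. At the time of writing, a high-end GPU can cost up to $1500, which is impractical just for running a local LLM: for this price, we can subscribe to a cloud service for a long time.

+   The challenge of using AI tools lies not only in hardware but also in software. At least at the time of writing this post, the open-source ecosystem of AI software is immature. I was surprised to find 39,769 open 7B models on HuggingFace, but the number of open-source AI apps on GitHub is tiny. The 3 described in this article are almost all I could find (if I missed something, please let me know in the comments below; maybe I will write a second part with a review).

Generally speaking, using a local LLM (Large Language Model) for everyday coding tasks is feasible, but we can also see that there are still many challenges on both the software and hardware sides. We also know that almost every company is now working on more efficient AI chips and more efficient models. New models like [Microsoft's Phi-3](https://azure.microsoft.com/en-us/blog/introducing-phi-3-redefining-whats-possible-with-slms/) can even run on mobile hardware. How will this change the AI industry? Will the next generation of integrated graphics be cheap, silent, and CUDA-compatible? We don't know yet. Apparently, a lot of new AI-related hardware will be announced (the M4 was already the first), and I at least hope that the new hardware will not be proprietary and shipped without any drivers for open-source use.

Thanks for reading. If you enjoyed this story, feel free to [subscribe](https://medium.com/@dmitryelj/membership) to Medium, and you will get notifications when my new articles are published, as well as full access to thousands of stories from other authors. You are also welcome to connect via [LinkedIn](https://www.linkedin.com/in/dmitrii-eliuseev/), where I periodically publish smaller posts that are not big enough for a full article. If you want to get the full source code for this and other posts, feel free to visit my [Patreon page](https://www.patreon.com/deliuseev).

Those who are interested in using language models and natural language processing are also welcome to read other articles: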
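The hardware-versus-subscription argument above can be made concrete with a simple break-even calculation. The sketch below is only an illustration: the monthly fee is a hypothetical value that does not come from this article, and electricity costs are ignored.

```python
def break_even_months(hardware_cost_usd: float, monthly_fee_usd: float) -> float:
    """Number of subscription months that cost as much as the hardware."""
    return hardware_cost_usd / monthly_fee_usd


GPU_PRICE_USD = 1500    # upper price mentioned above
MONTHLY_FEE_USD = 20    # hypothetical subscription price, not from the article

months = break_even_months(GPU_PRICE_USD, MONTHLY_FEE_USD)
print(f"Break-even after {months:.0f} months")
```

Under these assumptions, the GPU pays for itself only after several years of subscription fees, which matches the point made in the list above.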

+   [GPT Model: How Does It Work?](https://towardsdatascience.com/gpt-model-how-does-it-work-74bbcc2e97d1)

+   [16, 8, and 4-bit Floating Point Formats: How Does It Work?](https://towardsdatascience.com/16-8-and-4-bit-floating-point-formats-how-does-it-work-d157a31ef2ef)

+   [Process Pandas DataFrames with a Large Language Model](https://towardsdatascience.com/process-pandas-dataframes-with-a-large-language-model-8362468aca47)

+   [A Weekend AI Project (Part 1): Running Speech Recognition and a LLaMA-2 GPT on a Raspberry Pi](https://towardsdatascience.com/a-weekend-ai-project-running-speech-recognition-and-a-llama-2-gpt-on-a-raspberry-pi-5298d6edf812)

+   [A Weekend AI Project (Part 2): Using Speech Recognition, PTT, and a Large Action Model on a Raspberry Pi](https://towardsdatascience.com/a-weekend-ai-project-using-speech-recognition-ptt-and-a-large-action-model-on-a-raspberry-pi-ac8d839d078a)

+   [A Weekend AI Project (Part 3): Making a Visual Assistant for People with Vision Impairments](https://towardsdatascience.com/a-weekend-ai-project-making-a-visual-assistant-for-people-with-vision-impairments-df0b9f0b8c23)