In this guide, I'll walk through how to fine-tune Llama 2 to turn it into a dialog summarizer!

Last weekend, I wanted to fine-tune Llama 2 (which currently reigns supreme on the Open LLM Leaderboard) on a dataset of Google Keep notes I'd collected myself; each of my notes has a title and a body, so I wanted to train Llama to generate a body from a given title.

The first part of this tutorial covers fine-tuning Llama 2 on the samsum dialog summarization dataset using Hugging Face libraries. I tend to find that while Hugging Face has built an excellent library in transformers, their guides often overcomplicate things for the average person. The second part, fine-tuning on custom data, is coming this weekend!

To get started, grab yourself an A10, A10G, or A100 (or any GPU with >24GB of GPU memory). If you're not sure where to start, Brev makes it easy to get access to any of these GPUs!
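If you want to double-check that the machine you landed on actually has enough memory, here's a quick sanity check with PyTorch (assuming torch is already installed):

import torch

# Print the total memory of each visible GPU; you want >24GB on at least one.
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB")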

1. Download the model

Clone Meta's Llama inference repository (it contains the download script):

git clone https://github.com/facebookresearch/llama.git

Then run the download script (from inside the cloned repo):

bash download.sh

It will prompt you for the URL that Meta emailed you. If you haven't registered yet, sign up here. They send you the email surprisingly fast!

For this guide, you only need to download the 7B model.

2. Convert the model to Hugging Face format

wget https://raw.githubusercontent.com/huggingface/transformers/main/src/transformers/models/llama/convert_llama_weights_to_hf.py
pip install git+https://github.com/huggingface/transformers
pip install -e .
python convert_llama_weights_to_hf.py \
    --input_dir llama-2-7b --model_size 7B --output_dir models_hf/7B

If you only downloaded the 7B model initially, make sure the model files end up inside a directory named "7B". Here's my directory structure:

llama-2-7b/
├── 7B
│   ├── checklist.chk
│   ├── consolidated.00.pth
│   └── params.json
├── tokenizer.model
└── tokenizer_checklist.chk
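If your download landed flat in llama-2-7b/ instead, here's a minimal sketch to rearrange it (the filenames match the tree above; adjust the paths if yours differ):

from pathlib import Path

# Move the per-model weight files into a 7B/ subdirectory so the
# conversion script can find them; the tokenizer files stay at the top level.
root = Path("llama-2-7b")
target = root / "7B"
target.mkdir(exist_ok=True)
for name in ["checklist.chk", "consolidated.00.pth", "params.json"]:
    src = root / name
    if src.exists():
        src.rename(target / name)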

This gives us a Hugging Face model that we can fine-tune using the Hugging Face libraries!
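If you want to confirm the conversion worked before moving on, here's a quick load test (this just reuses the output path from the conversion step above):

from transformers import LlamaForCausalLM, LlamaTokenizer

# If both of these load without errors, the HF checkpoint is intact.
tokenizer = LlamaTokenizer.from_pretrained("models_hf/7B")
model = LlamaForCausalLM.from_pretrained("models_hf/7B")
print(model.config)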

3. Run the fine-tuning notebook

Clone the llama-recipes repository:

git clone https://github.com/facebookresearch/llama-recipes.git

Then open the quickstart.ipynb file in your preferred notebook interface:

(I use JupyterLab, like this):

pip install jupyterlab
jupyter lab # in the repo you want to work in

Then just run the whole notebook.

Make sure to change the line:

model_id="./models_hf/7B"

to the actual path of your converted model. That's it! You'll end up with a LoRA fine-tune of the model.

4. Run inference on the fine-tuned model

The catch here is that Hugging Face only saves the adapter weights, not the full model, so we need to load those adapter weights into the full model. I struggled to find good documentation on how to do this... but figured it out in the end!

Import the libraries:

import torch
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import PeftModel, PeftConfig

Load the tokenizer and model (the 8-bit load goes through bitsandbytes under the hood, so you may need to pip install bitsandbytes and accelerate first):

model_id="./models_hf/7B"
tokenizer = LlamaTokenizer.from_pretrained(model_id)
model = LlamaForCausalLM.from_pretrained(model_id, load_in_8bit=True, device_map='auto', torch_dtype=torch.float16)

Load the adapter from wherever it was saved after training:

model = PeftModel.from_pretrained(model, "/root/llama-recipes/samsungsumarizercheckpoint")
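As a side note, peft can also fold the adapter into the base weights with merge_and_unload(), which removes the adapter indirection entirely at inference time. A minimal sketch, and note it assumes the base model was loaded in fp16 rather than 8-bit (so you'd reload it without load_in_8bit=True first; merging a model loaded in 8-bit may not be supported):

# Merge the LoRA weights into the base model; the result is a plain
# LlamaForCausalLM you can save and serve without peft.
merged = model.merge_and_unload()
merged.save_pretrained("./llama-2-7b-samsum-merged")  # hypothetical output path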

Run inference:

eval_prompt = """
Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-)
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy  - he's  a great Motorhead fan :-)))
---
Summary:
"""

model_input = tokenizer(eval_prompt, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    print(tokenizer.decode(model.generate(**model_input, max_new_tokens=100)[0], skip_special_tokens=True))
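If you're going to summarize more than one dialog, it's convenient to wrap the snippet above in a small helper (a hypothetical refactor of the exact code above, nothing new):

def summarize(dialog: str, max_new_tokens: int = 100) -> str:
    # Same prompt format as the eval_prompt above.
    prompt = f"Summarize this dialog:\n{dialog}\n---\nSummary:\n"
    model_input = tokenizer(prompt, return_tensors="pt").to("cuda")
    with torch.no_grad():
        output = model.generate(**model_input, max_new_tokens=max_new_tokens)[0]
    return tokenizer.decode(output, skip_special_tokens=True)

print(summarize("A: Lunch at noon tomorrow?\nB: Sure, see you there!"))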

In the next part of this series, I'll show you how to format your own dataset to train Llama 2 on a custom task! If you want me to hurry up, leave me a comment!