Evaluating Image Captioning with pycocoevalcap
1. Scenario
Image captioning is a fundamental task in both CV and MLLM research. In this post, we walk through how to use the pycocoevalcap package to evaluate a model's captioning ability.
2. Evaluation Metrics
For image captioning, we usually report BLEU, METEOR, ROUGE-L, CIDEr, and SPICE. CIDEr and SPICE were designed specifically for caption evaluation, while BLEU, METEOR, and ROUGE-L are also widely used in other NLG tasks such as machine translation and summarization. We won't go into the internals of each metric here; interested readers can consult the original papers.
3. Installing and Configuring pycocoevalcap
pycocoevalcap is a Python 3 package of the official caption evaluation code for the MS COCO dataset, and it bundles nearly all of the common caption evaluation metrics. However, because the underlying code is fairly old, computing SPICE still depends on a Java 1.8 runtime, so installation is not entirely smooth. But let's take it step by step~
3.1 Installing pycocoevalcap
Inside your conda environment, a plain pip install is enough. Users in mainland China can install from a domestic mirror; here we use the Tsinghua source.
pip install pycocoevalcap -i https://pypi.tuna.tsinghua.edu.cn/simple
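If the install succeeded, the package should be importable right away; a quick sanity check (pycocotools is pulled in as a dependency):

from pycocoevalcap.eval import COCOEvalCap  # main evaluator
from pycocotools.coco import COCO           # COCO annotation loader
print('pycocoevalcap is ready')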
3.2 Installing Java 8
First, visit the official Java download site. I'm using an M1 MacBook, so I downloaded the macOS ARM64 installer from there.
If you are on a different device, pick the installer that matches your platform from the same page; Ubuntu users can follow the Linux instructions there. The download is a Java 8 Update 431 installation wizard; just double-click it and click Install.
Once the progress bar finishes, Java 8 is installed, and we can check it in the terminal.
# check the java version
(MLBD) xxx@xxx-MacBook-Pro ~ % java -version
java version "1.8.0_431"
Java(TM) SE Runtime Environment (build 1.8.0_431-b10)
Java HotSpot(TM) 64-Bit Server VM (build 25.431-b10, mixed mode)
# check the java home path
(MLBD) xxx@xxx-MacBook-Pro ~ % /usr/libexec/java_home -V
Matching Java Virtual Machines (1):
1.8.431.10 (arm64) "Oracle Corporation" - "Java" /Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home # this is the Java home path
/Library/Internet Plug-Ins/JavaAppletPlugin.plugin/Contents/Home
With that, Java 8 is installed.
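Since pycocoevalcap ends up invoking java as a subprocess from Python, it is also worth confirming that the environment your evaluation runs in can see it. A minimal sketch:

import subprocess

# note: `java -version` writes to stderr, not stdout
result = subprocess.run(['java', '-version'], capture_output=True, text=True)
print(result.stderr)  # expect something like: java version "1.8.0_431"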
3.3 Commenting Out the Cache Code in spice.py
If you run a SPICE evaluation directly at this point, you may hit the following error:
---------------------------------------------------------------------------
CalledProcessError Traceback (most recent call last)
Cell In[1], line 25
22 coco_eval.evaluate()
24 return coco_eval.eval
---> 25 compute_cider(result_path, annotation_path)
Cell In[1], line 22
20 coco_eval = COCOEvalCap(coco, coco_result)
21 coco_eval.params["image_id"] = coco_result.getImgIds()
---> 22 coco_eval.evaluate()
24 return coco_eval.eval
File /opt/miniconda3/envs/MLBD/lib/python3.10/site-packages/pycocoevalcap/eval.py:53, in COCOEvalCap.evaluate(self)
51 for scorer, method in scorers:
52 print('computing %s score...'%(scorer.method()))
---> 53 score, scores = scorer.compute_score(gts, res)
54 if type(method) == list:
55 for sc, scs, m in zip(score, scores, method):
File /opt/miniconda3/envs/MLBD/lib/python3.10/site-packages/pycocoevalcap/spice/spice.py:76, in Spice.compute_score(self, gts, res)
69 os.makedirs(cache_dir)
70 spice_cmd = ['java', '-jar', '-Xmx8G', SPICE_JAR, in_file.name,
71 '-cache', cache_dir,
72 '-out', out_file.name,
...
368 cmd = popenargs[0]
--> 369 raise CalledProcessError(retcode, cmd)
370 return 0
CalledProcessError: Command '['java', '-jar', '-Xmx8G', 'spice-1.0.jar', '/opt/miniconda3/envs/MLBD/lib/python3.10/site-packages/pycocoevalcap/spice/tmp/tmpq2cks2fv', '-cache', '/opt/miniconda3/envs/MLBD/lib/python3.10/site-packages/pycocoevalcap/spice/cache', '-out', '/opt/miniconda3/envs/MLBD/lib/python3.10/site-packages/pycocoevalcap/spice/tmp/tmp8ywvtify', '-subset', '-silent']' returned non-zero exit status 1.
This seems to be caused by the cache-directory handling in pycocoevalcap's SPICE wrapper (see the post "pycocoevalcap安装踩坑和填坑" for details). The fix is to locate spice.py inside pycocoevalcap and comment out the cache-related code. First, find where pycocoevalcap is installed:
(MLBD) xxx@xxx-MacBook-Pro mmtrain % pip show pycocoevalcap
Name: pycocoevalcap
Version: 1.2
Summary: MS-COCO Caption Evaluation for Python 3
Home-page: https://github.com/salaniz/pycocoevalcap
Author:
Author-email:
License: UNKNOWN
# this is the installation location
Location: /opt/miniconda3/envs/MLBD/lib/python3.10/site-packages
Requires: pycocotools
Required-by:
Next, open /opt/miniconda3/envs/MLBD/lib/python3.10/site-packages/pycocoevalcap/spice/spice.py, comment out the cache-related code as sketched below, and save.
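As a sketch (based on the spice.py that ships with pycocoevalcap 1.2; exact lines may differ across versions), the block inside Spice.compute_score should end up looking roughly like this, with the cache setup and the '-cache' argument commented out:

# cache_dir = os.path.join(cwd, CACHE_DIR)
# if not os.path.exists(cache_dir):
#     os.makedirs(cache_dir)
spice_cmd = ['java', '-jar', '-Xmx8G', SPICE_JAR, in_file.name,
             # '-cache', cache_dir,
             '-out', out_file.name,
             '-subset',
             '-silent'
             ]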
With that, the bug is resolved. If you still run into errors, the "pycocoevalcap安装踩坑和填坑" post referenced above is a good place to look.
4. Running the Evaluation
With installation and configuration in place, we can start evaluating. There are two main approaches:
4.1 Full evaluation via pycocotools
We first consolidate the annotations and the predictions into the required JSON formats, then compute all caption metrics in one pass.
In this example we evaluate the captions of two images; image 0 has five reference annotations and image 1 has four.
annotations.json (a dict) looks like this:
- annotations['images'][i]['id'] is the image id, used to distinguish images; each annotations['annotations'][j]['image_id'] refers back to one of these image ids.
- annotations['annotations'][j]['id'] is the id of the annotation itself, and annotations['annotations'][j]['caption'] is the reference caption.
{
"images": [
{
"id": 0
},
{
"id": 1
}
],
"annotations":[
{
"image_id": 0,
"id": 0,
"caption": "Two young guys with shaggy hair look at their hands while hanging out in the yard."
},
{
"image_id": 0,
"id": 1,
"caption": "Two young, White males are outside near many bushes."
},
{
"image_id": 0,
"id": 2,
"caption": "Two men in green shirts are standing in a yard."
},
{
"image_id": 0,
"id": 3,
"caption": "A man in a blue shirt standing in a garden."
},
{
"image_id": 0,
"id": 4,
"caption": "Two friends enjoy time spent together."
},
{
"image_id": 1,
"id": 5,
"caption": "Several men in hard hats are operating a giant pulley system."
},
{
"image_id": 1,
"id": 6,
"caption": "Workers look down from up above on a piece of equipment."
},
{
"image_id": 1,
"id": 7,
"caption": "Two men working on a machine wearing hard hats."
},
{
"image_id": 1,
"id": 8,
"caption": "Four men on top of a tall structure."
}
]
}
The model predictions are consolidated into preds.json (a list), as follows:
- "image_id" is the image id, and "caption" is the caption predicted by the model.
[
{
"image_id": 0,
"caption": "Two men standing in front of a door in a garden.Two men standing in front of a door in a garden."
},
{
"image_id": 1,
"caption": "a man standing on the top of a metal tower a man standing on the top of a metal tower(36"
}
]
from pycocoevalcap.eval import COCOEvalCap
from pycocotools.coco import COCO

result_path = 'preds.json'
annotation_path = 'annotations.json'

def compute_cider(result_path, annotation_path):
    # create the coco (ground truth) and coco_result (prediction) objects
    coco = COCO(annotation_path)
    coco_result = coco.loadRes(result_path)

    # build the evaluator from the two objects and restrict it
    # to the image ids that actually have predictions
    coco_eval = COCOEvalCap(coco, coco_result)
    coco_eval.params["image_id"] = coco_result.getImgIds()
    coco_eval.evaluate()

    # coco_eval.eval maps metric names to their scores
    return coco_eval.eval
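compute_cider returns coco_eval.eval, a plain dict mapping metric names to scores, so you can capture the result and post-process it however you like:

scores = compute_cider(result_path, annotation_path)
for metric, value in scores.items():
    print('%s: %.3f' % (metric, value))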
4.2 Evaluating only the specific metric(s) we care about
For this, we can follow the approach from "pycocoevalcap安装踩坑和填坑":
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.spice.spice import Spice

class Scorer():
    def __init__(self, ref, gt):
        # ref: {image_id: [predicted caption]}; gt: {image_id: [reference captions]}
        self.ref = ref
        self.gt = gt
        print('setting up scorers...')
        # keep only the (scorer, metric name) pairs you care about
        self.scorers = [
            (Bleu(4), ["Bleu_1", "Bleu_2", "Bleu_3", "Bleu_4"]),
            (Meteor(), "METEOR"),
            (Rouge(), "ROUGE_L"),
            (Cider(), "CIDEr"),
            (Spice(), "SPICE"),
        ]

    def compute_scores(self):
        total_scores = {}
        for scorer, method in self.scorers:
            print('computing %s score...' % (scorer.method()))
            score, scores = scorer.compute_score(self.gt, self.ref)
            if type(method) == list:
                # Bleu returns a list of four scores (Bleu_1..Bleu_4)
                for sc, scs, m in zip(score, scores, method):
                    print("%s: %0.3f" % (m, sc))
                total_scores["Bleu"] = score
            else:
                print("%s: %0.3f" % (method, score))
                total_scores[method] = score

        print('*****DONE*****')
        for key, value in total_scores.items():
            print('{}:{}'.format(key, value))
        return total_scores

if __name__ == '__main__':
    ref = {
        '1': ['go down the stairs and stop at the bottom .'],
        '2': ['this is a cat.']
    }
    gt = {
        '1': ['Walk down the steps and stop at the bottom. ',
              'Go down the stairs and wait at the bottom.',
              'Once at the top of the stairway, walk down the spiral staircase all the way to the bottom floor. Once you have left the stairs you are in a foyer and that indicates you are at your destination.'],
        '2': ['It is a cat.', 'There is a cat over there.', 'cat over there.']
    }
    scorer = Scorer(ref, gt)
    scorer.compute_scores()
The output looks like this:
setting up scorers...
computing Bleu score...
{'testlen': 14, 'reflen': 13, 'guess': [14, 12, 10, 8], 'correct': [11, 9, 5, 2]}
ratio: 1.076923076840237
Bleu_1: 0.786
Bleu_2: 0.768
Bleu_3: 0.665
Bleu_4: 0.521
computing METEOR score...
METEOR: 0.904
computing Rouge score...
ROUGE_L: 0.694
computing CIDEr score...
CIDEr: 2.274
computing SPICE score...
Parsing reference captions
Initiating Stanford parsing pipeline
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator tokenize
[main] INFO edu.stanford.nlp.pipeline.TokenizerAnnotator - TokenizerAnnotator: No tokenizer type provided. Defaulting to PTBTokenizer.
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ssplit
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator parse
[main] INFO edu.stanford.nlp.parser.common.ParserGrammar - Loading parser from serialized file edu/stanford/nlp/models/lexparser/englishPCFG.ser.gz ...
done [0.2 sec].
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator lemma
[main] INFO edu.stanford.nlp.pipeline.StanfordCoreNLP - Adding annotator ner
Loading classifier from edu/stanford/nlp/models/ner/english.all.3class.distsim.crf.ser.gz ... done [1.0 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.muc.7class.distsim.crf.ser.gz ... done [0.7 sec].
Loading classifier from edu/stanford/nlp/models/ner/english.conll.4class.distsim.crf.ser.gz ... done [0.4 sec].
Threads( StanfordCoreNLP ) [0.440 seconds]
Parsing test captions
Threads( StanfordCoreNLP )
SPICE evaluation took: 3.809 s
SPICE: 0.383
*****DONE*****
Bleu:[0.7857142856581634, 0.7676494735193371, 0.6654242733719051, 0.5209655210466142]
METEOR:0.9037037037037037
ROUGE_L:0.6938153310104529
CIDEr:2.273565655101005
SPICE:0.38333333333333336
Tips:
Because the CIDEr computation relies on TF-IDF statistics over the evaluation corpus, we need to feed in predictions and annotations for at least two images; otherwise CIDEr comes out as 0.
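To see this concretely, here is a minimal sketch: with a single image, every n-gram's document frequency equals the corpus size, so all IDF weights vanish and CIDEr collapses to 0 no matter how good the caption is.

from pycocoevalcap.cider.cider import Cider

# one image only: the IDF weights all become zero, so the score is 0
gts = {'1': ['a cat sits on a mat.']}
res = {'1': ['a cat on a mat.']}
score, _ = Cider().compute_score(gts, res)
print(score)  # 0.0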