环境准备

安装必要的库:scikit-learnnumpy。通过以下命令安装:

pip install scikit-learn numpy

数据准备

使用简单的示例数据,包含文本和对应的标签(如正面/负面):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# 示例数据:文本和标签(0=负面,1=正面)
texts = ["I love this movie", "This is terrible", "Great film", "Worst experience"]
labels = [1, 0, 1, 0]

特征提取

将文本转换为数值特征(词频向量):

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

模型训练

使用朴素贝叶斯分类器进行训练:

model = MultinomialNB()
model.fit(X, labels)

预测新文本

对新输入的文本进行分类预测:

new_texts = ["Awesome movie", "Not good"]
X_new = vectorizer.transform(new_texts)
predictions = model.predict(X_new)

print("预测结果:", predictions)  # 输出示例:[1 0]

完整代码

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# 数据
texts = ["I love this movie", "This is terrible", "Great film", "Worst experience"]
labels = [1, 0, 1, 0]

# 特征提取
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(texts)

# 训练模型
model = MultinomialNB()
model.fit(X, labels)

# 预测
new_texts = ["Awesome movie", "Not good"]
X_new = vectorizer.transform(new_texts)
predictions = model.predict(X_new)

print("预测结果:", predictions)


 

扩展说明

  • 如需处理更复杂的数据,可改用 TfidfVectorizer 替代 CountVectorizer
  • 模型可替换为 SVMRandomForest,需调整 sklearn 的对应模块。
Logo

有“AI”的1024 = 2048,欢迎大家加入2048 AI社区

更多推荐