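[Wolfram U] AI Beginner's Guide 3

This chapter takes a closer look at classification methods and regression prediction, using two classic machine learning examples: Fisher's iris classification and Boston house price regression.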

weixin_47173810

508 views · Published 2026-01-22 06:30:00

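AI Beginner's Guide 2

Classification Methods

Classify normally chooses its classification method automatically, but you can override that choice with the Method option.

Classify some 2D coordinates by color: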

In[]:= Classify[colouredPoints, Method -> Automatic]

Add a time goal option (the target training time, in seconds):

In[]:= Classify[colouredPoints, Method -> Automatic, TimeGoal -> 30]
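The definition of colouredPoints is not shown above, so the following is only a minimal sketch with made-up stand-in data. The names examplePoints and exampleClassifier and the two Gaussian clusters are assumptions for illustration. It shows how a specific method can be requested instead of Automatic, and how to ask a trained classifier which method it used:

```wolfram
(* Hypothetical stand-in for colouredPoints: two clusters of 2D points labelled by color *)
SeedRandom[1];
redPoints = RandomVariate[NormalDistribution[0, 1], {50, 2}];
bluePoints = RandomVariate[NormalDistribution[3, 1], {50, 2}];
examplePoints = Join[Thread[redPoints -> "red"], Thread[bluePoints -> "blue"]];

(* Request a specific method instead of the automatic choice *)
exampleClassifier = Classify[examplePoints, Method -> "LogisticRegression"];

(* Ask the classifier which method it used, then classify a new point *)
Information[exampleClassifier, "Method"]
exampleClassifier[{2.8, 3.1}]
```

Replacing "LogisticRegression" with Automatic and calling Information again is a quick way to see which method Classify would have picked on its own.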

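Choosing the Right Method

Approach

A common mindset in machine learning is to ask "Does it work?" rather than "How?" or "Why?".

The basic procedure for choosing a good method is:

1. Test each method on a subset of the data.
2. Choose the method that gives the best predictions.
3. Apply that method to the full dataset.

1. Test

Get the example data for Fisher's Iris dataset and extract a subset of the training data: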

In[]:= trainingData = ExampleData[{"MachineLearning", "FisherIris"}, "TrainingData"];
In[]:= testingData = ExampleData[{"MachineLearning", "FisherIris"}, "TestData"];
In[]:= SeedRandom[1];
In[]:= trainingSample = RandomSample[trainingData, 30]
Out[]= {{5.9, 3., 5.1, 1.8} -> "virginica", {5.8, 2.8, 5.1, 2.4} -> "virginica", {5.1, 3.3, 1.7, 0.5} -> "setosa", {4.7, 3.2, 1.3, 0.2} -> "setosa", {5.7, 2.9, 4.2, 1.3} -> "versicolor", {5.4, 3.9, 1.7, 0.4} -> "setosa", {5.8, 2.6, 4., 1.2} -> "versicolor", {4.9, 3.1, 1.5, 0.2} -> "setosa", {5.1, 2.5, 3., 1.1} -> "versicolor", {6.7, 2.5, 5.8, 1.8} -> "virginica", {4.8, 3.4, 1.9, 0.2} -> "setosa", {5., 3.2, 1.2, 0.2} -> "setosa", {5., 3.4, 1.5, 0.2} -> "setosa", {6.3, 2.8, 5.1, 1.5} -> "virginica", {6., 2.2, 5., 1.5} -> "virginica", {5.8, 2.7, 5.1, 1.9} -> "virginica", {4.6, 3.1, 1.5, 0.2} -> "setosa", {5., 3.5, 1.6, 0.6} -> "setosa", {5.6, 2.5, 3.9, 1.1} -> "versicolor", {5.5, 3.5, 1.3, 0.2} -> "setosa", {5.9, 3., 4.2, 1.5} -> "versicolor", {7.3, 2.9, 6.3, 1.8} -> "virginica", {5.7, 2.8, 4.1, 1.3} -> "versicolor", {5.5, 2.4, 3.7, 1.} -> "versicolor", {5.6, 3., 4.5, 1.5} -> "versicolor", {5., 3.5, 1.3, 0.3} -> "setosa", {6.7, 3.1, 5.6, 2.4} -> "virginica", {4.9, 3.6, 1.4, 0.1} -> "setosa", {6.8, 3., 5.5, 2.1} -> "virginica", {5.4, 3.9, 1.3, 0.4} -> "setosa"}

Create two classifiers, one using the Markov method and the other using the random forest method:

In[]:= irisClassifier1 = Classify[trainingSample, Method -> "Markov"];
In[]:= irisClassifier2 = Classify[trainingSample, Method -> "RandomForest"];

2. Choose

Judged on accuracy alone, the random forest method tends to perform better:

In[]:= Row[{ClassifierMeasurements[irisClassifier1, testingData], ClassifierMeasurements[irisClassifier2, testingData]}]

3. Apply

Apply the random forest method to the full training dataset:

In[]:= irisClassifier3 = Classify[trainingData, Method -> "RandomForest"]

Check the accuracy:

In[]:= ClassifierMeasurements[irisClassifier3, testingData]
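The test, choose and apply steps can also be condensed into a single loop over candidate methods. The sketch below is one possible way to do it, not the course's own code; the list candidateMethods and the names accuracies, bestMethod and finalClassifier are assumptions chosen for illustration:

```wolfram
trainingData = ExampleData[{"MachineLearning", "FisherIris"}, "TrainingData"];
testingData = ExampleData[{"MachineLearning", "FisherIris"}, "TestData"];
SeedRandom[1];
trainingSample = RandomSample[trainingData, 30];

(* Assumed list of candidates; any method names accepted by Classify could be used *)
candidateMethods = {"LogisticRegression", "NearestNeighbors", "RandomForest"};

(* Step 1: train each candidate on the subset and record its accuracy on the test data *)
accuracies = AssociationMap[
   ClassifierMeasurements[
     Classify[trainingSample, Method -> #], testingData, "Accuracy"] &,
   candidateMethods];

(* Step 2: choose the method with the highest accuracy *)
bestMethod = First[Keys[TakeLargest[accuracies, 1]]];

(* Step 3: retrain the chosen method on the full training set *)
finalClassifier = Classify[trainingData, Method -> bestMethod];
ClassifierMeasurements[finalClassifier, testingData, "Accuracy"]
```

Like the walkthrough above, this sketch compares methods on the test data. In a stricter setup the comparison would use a separate validation split, so that the final accuracy is measured on data that played no part in choosing the method.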

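Prediction

Once a model has been trained on some data, it can be used to predict new data values.

This is particularly useful for filling in missing data points, a process known as imputation.

Predict

Get housing data for Boston, Massachusetts: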

In[]:= homeData = ResourceData["Sample Data: Boston Homes"];

View the column headings and their descriptions:

In[]:= Thread[Values[ResourceData["Sample Data: Boston Homes", {"ColumnHeadings", "ColumnDescriptions"}]]] // Dataset

Feature	Description
CRIM	Per capita crime rate by town
ZN	Proportion of residential land zoned for lots over 25000 square feet
INDUS	Proportion of non-retail business acres per town
CHAS	Charles River dummy variable (1 if tract bounds river, 0 otherwise)
NOX	Nitrogen oxide concentration (parts per 10 million)
RM	Average number of rooms per dwelling
AGE	Proportion of owner-occupied units built prior to 1940
DIS	Weighted mean of distances to five Boston employment centers
RAD	Index of accessibility to radial highways
TAX	Full-value property-tax rate per $10000
PTRATIO	Pupil-teacher ratio by town
BLACK	1000(Bk-0.63)^2 where Bk is the proportion of Black or African-American residents by town
LSTAT	Lower status of the population (percent)
MEDV	Median value of owner-occupied homes in $1000s

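Suppose you want to predict a home's value (the MEDV column) from its other features.

Split the data into a training set and a test set, then create a predictor: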

In[]:= shuffledData = RandomSample[homeData]; (* shuffle once so the two sets cannot overlap *)
In[]:= trainingData = shuffledData[[;; 400]];
In[]:= testingData = shuffledData[[401 ;;]];
In[]:= predictor = Predict[trainingData -> "MEDV", PerformanceGoal -> "Quality"]

Use the predictor to estimate a missing home price (in thousands of dollars):

[Image: input cell applying the predictor to a home record with no price]

Out[]= 10.4

Get measurements from the predictor:

In[]:= predictorMeasurements = PredictorMeasurements[predictor, testingData -> "MEDV"]

[Figure: PredictorMeasurements output]
As the figure above shows, Predict automatically selected the nearest neighbors (KNN) method.
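The prediction and measurement steps above are shown only as images, so here is a hedged sketch of how they might look in code. The names shuffledData and exampleHome and the choice of the first test row are assumptions for illustration; the sketch will not necessarily reproduce the 10.4 shown earlier, and the method reported depends on the random split:

```wolfram
homeData = ResourceData["Sample Data: Boston Homes"];
SeedRandom[1];
shuffledData = RandomSample[homeData];
trainingData = shuffledData[[;; 400]];
testingData = shuffledData[[401 ;;]];
predictor = Predict[trainingData -> "MEDV", PerformanceGoal -> "Quality"];

(* Take one test row as an association and drop its price to mimic a missing value *)
exampleHome = KeyDrop[Normal[testingData[[1]]], "MEDV"];
predictor[exampleHome]

(* Query individual measurements rather than reading them off the summary panel *)
measurements = PredictorMeasurements[predictor, testingData -> "MEDV"];
measurements["StandardDeviation"]
measurements["RSquared"]
measurements["ComparisonPlot"]

(* Check which method Predict selected automatically *)
Information[predictor, "Method"]
```

"StandardDeviation" reports the root mean square of the residuals in the same units as MEDV (thousands of dollars), which makes it easier to interpret than the plot alone.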
