
A Complete Guide to Plotting for Machine Learning

Education Q&A | Question via the AI Spark (星火) model | 2026-06-03 00:38

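In machine learning, "plotting" usually means data visualization and the visualization of model results. Both are key steps in understanding data, debugging models, and presenting results.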

The most commonly used plotting methods in machine learning are covered below in two parts: the Python libraries and the common plot types.

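Main Python Libraries

Matplotlib: the foundational plotting library; flexible, but it takes somewhat more code.
Seaborn: built on Matplotlib; statistical charts look better and need less code.
Plotly: interactive charts, well suited to presentations and reports.
Scikit-learn built-in helpers: such as ConfusionMatrixDisplay (the older plot_confusion_matrix function was removed in scikit-learn 1.2).
TensorFlow/Keras built-in callbacks: such as the History object, used to plot training curves.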

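The 5 Most Common Plot Types in Machine Learning

Exploratory data analysis (EDA): scatter plots, histograms, box plots

Used to inspect feature distributions and correlations.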

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# Sample data
df = pd.DataFrame({
    'x': np.random.randn(100),
    'y': np.random.randn(100),
    'category': np.random.choice(['A', 'B'], 100)
})
# 1. Scatter plot (relationship between two features)
plt.figure(figsize=(8, 6))
sns.scatterplot(x='x', y='y', hue='category', data=df)
plt.title('Scatter Plot: Feature X vs Y')
plt.show()
# 2. Histogram (distribution of a single feature)
plt.figure(figsize=(8, 4))
sns.histplot(df['x'], kde=True)
plt.title('Histogram of Feature X')
plt.show()
# 3. Box plot (spotting outliers)
plt.figure(figsize=(8, 4))
sns.boxplot(x='category', y='x', data=df)
plt.title('Box Plot by Category')
plt.show()
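The three plots above cover distributions and pairwise relationships; correlations across several numeric features are often summarized in a single heatmap. The following is a minimal sketch, assuming a hypothetical helper `plot_correlation_heatmap` and made-up demo columns `a`, `b`, `c` (none of these names come from the original article).

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def plot_correlation_heatmap(df):
    """Draw a heatmap of pairwise Pearson correlations between numeric columns."""
    corr = df.select_dtypes(include="number").corr()
    fig, ax = plt.subplots(figsize=(6, 5))
    # Fix the color scale to [-1, 1] so colors are comparable across datasets.
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
    ax.set_title("Feature Correlation Matrix")
    return fig, corr


# Hypothetical demo data: 'b' is built to correlate with 'a'; 'c' is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
demo = pd.DataFrame({
    "a": a,
    "b": 0.8 * a + 0.2 * rng.normal(size=200),
    "c": rng.normal(size=200),
})
fig, corr = plot_correlation_heatmap(demo)
```

Call `plt.show()` afterwards to display the figure, or `fig.savefig(...)` to write it to disk.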

Model evaluation: confusion matrix, ROC curve, PR curve

Used to evaluate the performance of classification models.

import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay, roc_curve, auc
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
# Generate sample data
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train the model
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class
# 1. Confusion matrix
cm = confusion_matrix(y_test, y_pred)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=['Class 0', 'Class 1'])
disp.plot(cmap='Blues')
plt.title('Confusion Matrix')
plt.show()
# 2. ROC curve
fpr, tpr, _ = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')  # chance-level baseline
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC)')
plt.legend(loc="lower right")
plt.show()
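This section's heading also lists the PR curve. A minimal sketch of one way to draw it follows, using scikit-learn's `precision_recall_curve` and `average_precision_score`. The data and model setup repeats the one used for the ROC curve so the block runs on its own; the name `clf` and the `max_iter=1000` setting are choices made for this sketch.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

# Same synthetic binary classification setup as the ROC example.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_prob = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

# Precision and recall at every score threshold, plus average precision (AP).
precision, recall, _ = precision_recall_curve(y_test, y_prob)
ap = average_precision_score(y_test, y_prob)

fig, ax = plt.subplots()
ax.plot(recall, precision, lw=2, label=f"PR curve (AP = {ap:.2f})")
ax.set_xlabel("Recall")
ax.set_ylabel("Precision")
ax.set_ylim(0.0, 1.05)
ax.set_title("Precision-Recall Curve")
ax.legend(loc="lower left")
```

PR curves tend to be more informative than ROC curves when the positive class is rare, because precision reacts directly to false positives. Call `plt.show()` to display the figure.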

Training process: loss curves, accuracy curves

Used to judge whether a model is overfitting or underfitting.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import SGDClassifier
import matplotlib.pyplot as plt
# Use the Iris dataset
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.3, random_state=42)
# Note: RandomForest has no built-in epoch-by-epoch training process,
# so logistic regression trained with SGD is used here instead
model = SGDClassifier(loss='log_loss')  # 'log_loss' requires scikit-learn >= 1.1
model.fit(X_train, y_train)
# For deep learning (e.g. Keras), fit() returns a History object
# Example (assuming model is a compiled Keras model):
# history = model.fit(X_train, y_train, epochs=10, validation_split=0.2)
# plt.plot(history.history['loss'])
# plt.plot(history.history['val_loss'])
# plt.title('Model Loss')
# plt.ylabel('Loss')
# plt.xlabel('Epoch')
# plt.legend(['Train', 'Validation'])
# plt.show()

Feature importance: bar chart

Used to explain a model by showing which features matter most.

from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt
import numpy as np
# Use a random forest (reuses X_train and y_train from the previous section)
model_rf = RandomForestClassifier(n_estimators=100, random_state=42)
model_rf.fit(X_train, y_train)
# Get the feature importances
importances = model_rf.feature_importances_
feature_names = [f"Feature_{i}" for i in range(X_train.shape[1])]
# Sort in descending order of importance
indices = np.argsort(importances)[::-1]
# Plot
plt.figure(figsize=(10, 6))
plt.title("Feature Importances")
plt.bar(range(X_train.shape[1]), importances[indices], align="center")
plt.xticks(range(X_train.shape[1]), [feature_names[i] for i in indices], rotation=90)
plt.xlim([-1, X_train.shape[1]])
plt.tight_layout()
plt.show()

Decision boundary: 2D visualization

Used to show intuitively how a classifier partitions the feature space.

from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
import numpy as np
import matplotlib.pyplot as plt
# Generate moon-shaped data
X, y = make_moons(n_samples=300, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train an SVM
svm = SVC(kernel='rbf', gamma='auto')
svm.fit(X_train, y_train)
# Build the grid
h = 0.02  # grid step size
x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
# Predict the class of every grid point
Z = svm.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
# Plot
plt.figure(figsize=(8, 6))
plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral, alpha=0.8)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Spectral, edgecolors='k')
plt.title('SVM Decision Boundary')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.show()

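Recommended Advanced Visualizations

Scenario	Recommended tool	Notes
Interactive exploration	`Plotly` / `Dash`	Hover to show values; zoom and pan
Dimensionality reduction for high-dimensional data	`t-SNE` / `UMAP` + `Seaborn`	Projects high-dimensional data to 2D/3D for visualization
Deep learning model structure	`TensorBoard` / `Netron`	Visualizes the layer structure of a neural network
Feature correlation matrix	`Seaborn heatmap`	Heatmap showing correlations between features
SHAP value explanations	`shap` library	Visualizes each feature's effect on a prediction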
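
As an illustration of the dimensionality-reduction row in the table, here is a minimal sketch that projects the four Iris features to 2D with scikit-learn's `TSNE` and plots the result with Seaborn. The dataset choice, the `perplexity=30` setting, and the column names `tsne_1` / `tsne_2` are assumptions made for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

iris = load_iris()
# Standardize first so no single feature dominates the pairwise distances.
X_scaled = StandardScaler().fit_transform(iris.data)

# perplexity must be smaller than the number of samples (150 here).
embedding = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(X_scaled)

proj = pd.DataFrame(embedding, columns=["tsne_1", "tsne_2"])
proj["species"] = iris.target_names[iris.target]

fig, ax = plt.subplots(figsize=(7, 5))
sns.scatterplot(data=proj, x="tsne_1", y="tsne_2", hue="species", ax=ax)
ax.set_title("t-SNE Projection of the Iris Features")
```

t-SNE preserves local neighborhoods rather than global distances, so the gaps between clusters in the plot should not be read as meaningful distances. Call `plt.show()` to display the figure.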

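Best Practices

Enable Chinese text support (if your labels are in Chinese):

plt.rcParams['font.sans-serif'] = ['SimHei']  # render Chinese labels correctly (the SimHei font must be installed)
plt.rcParams['axes.unicode_minus'] = False    # render the minus sign correctly

Save high-resolution images: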

plt.savefig('my_plot.png', dpi=300, bbox_inches='tight')

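Subplot layout: use plt.subplots() to show several plots in one figure.
Color choice: avoid overly harsh colors; seaborn.color_palette() is recommended for getting a harmonious palette.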
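
The last two tips can be combined in one short sketch: two panels created with `plt.subplots()`, colored from a palette returned by `seaborn.color_palette()`. The "muted" palette name, the random demo data, and the group labels A/B/C are choices made for this illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

rng = np.random.default_rng(0)
values = rng.normal(size=300)
groups = [rng.normal(loc=m, size=100) for m in (0.0, 0.5, 1.0)]

# color_palette returns a list of RGB tuples that Matplotlib accepts directly.
palette = sns.color_palette("muted", n_colors=3)

# One row, two columns: histogram on the left, box plot on the right.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].hist(values, bins=20, color=palette[0])
axes[0].set_title("Histogram")

box = axes[1].boxplot(groups, patch_artist=True)
for patch, color in zip(box["boxes"], palette):
    patch.set_facecolor(color)
axes[1].set_xticks([1, 2, 3])
axes[1].set_xticklabels(["A", "B", "C"])
axes[1].set_title("Box Plot by Group")

fig.tight_layout()
```

Call `plt.show()` to display the figure, or `fig.savefig('my_plot.png', dpi=300, bbox_inches='tight')` to save it as described above.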
