
Simple visualization of the UCI Iris dataset

The dataset is downloaded from the official UCI website.

Implemented in a Jupyter notebook.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt



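# Path to the downloaded data file (raw string, so the backslashes are not read as escape sequences)
fname = r'E:\pythonwork\project\Deeplearning\Task\data\iris.data'
with open(fname, 'r', encoding='utf-8') as f:
    # Read the text file, splitting each line on commas
    s = [line.rstrip('\n').split(',') for line in f]

# Build a pandas DataFrame; each of the three classes has 50 samples
names = ['slength', 'swidth', 'plength', 'pwidth', 'name']
iris = pd.DataFrame(data=s, columns=names)
# Drop the incomplete row that comes from a blank line in the file
iris.dropna(axis=0, how='any', inplace=True)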
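The manual read above can likely be replaced by a single `pandas.read_csv` call, which skips blank lines by default and parses the numeric columns as floats. A minimal sketch, assuming the file has no header row; the `load_iris` helper name and the three inline sample rows are illustrative, not part of the original notebook:

```python
import io
import pandas as pd

names = ['slength', 'swidth', 'plength', 'pwidth', 'name']

def load_iris(source):
    """Read an iris.data-style CSV (no header row) into a DataFrame.

    `source` may be a file path or any file-like object.
    """
    # header=None: the file has no header row, so column names come from `names`.
    # skip_blank_lines=True is the default; it is spelled out here for clarity.
    return pd.read_csv(source, header=None, names=names, skip_blank_lines=True)

# Illustrative sample in the same layout as iris.data, with a trailing blank line
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
    "\n"
)
demo = load_iris(sample)
print(demo.dtypes)
```

In the notebook, `load_iris(fname)` would take the place of the `open` / `DataFrame` / `dropna` steps, and no later string-to-float conversion should be needed.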
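# Convert the four measurement columns from strings to float first
# (otherwise they cannot be plotted as numbers, and the subsets below would still hold strings)
iris[names[:4]] = iris[names[:4]].astype(float)
# There are three classes, stored as consecutive blocks of 50 rows:
seto = iris.iloc[0:50, :]
vers = iris.iloc[50:100, :]
virg = iris.iloc[100:150, :]
seto.shape
vers.shape
# Count how many samples each species has
iris['name'].value_counts()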
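The positional slices `0:50`, `50:100`, `100:150` only work because the file happens to be sorted by class. A small sketch of selecting rows by label instead, using `groupby`; the `iris_demo` frame and its four rows are made up for illustration, and the label strings are assumed to match those in the `name` column:

```python
import pandas as pd

# Small illustrative frame with the same column names as the notebook's `iris`
iris_demo = pd.DataFrame({
    'slength': [5.1, 7.0, 6.3, 4.9],
    'swidth': [3.5, 3.2, 3.3, 3.0],
    'name': ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'Iris-setosa'],
})

# Map each class label to the sub-DataFrame holding its rows
by_class = {label: group for label, group in iris_demo.groupby('name')}

seto_demo = by_class['Iris-setosa']
print(len(seto_demo))
```

This stays correct even if the rows are shuffled or a class has a different number of samples.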
# Scatter plot of slength against swidth
plt.scatter(x=iris['slength'],y=iris['swidth'])
plt.show()

#-------------------
# Plot each class in a different color
plt.scatter(x=seto['slength'], y=seto['swidth'], color='red')
plt.scatter(x=vers['slength'], y=vers['swidth'], color='blue', marker='+')
plt.scatter(x=virg['slength'], y=virg['swidth'], color='green', marker='*')
plt.xlabel('s length')
plt.ylabel('s width')
plt.show()
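The three near-identical `scatter` calls can also be written as a loop over the classes, which makes it easy to add a legend. A sketch under the same assumptions as the plot above; the `styles` table and the `iris_demo` sample rows are illustrative only:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative rows; in the notebook this would be the full `iris` DataFrame
iris_demo = pd.DataFrame({
    'slength': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'swidth': [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'name': ['Iris-setosa', 'Iris-setosa',
             'Iris-versicolor', 'Iris-versicolor',
             'Iris-virginica', 'Iris-virginica'],
})

# Hypothetical style table: one (color, marker) pair per class label
styles = {
    'Iris-setosa': ('red', 'o'),
    'Iris-versicolor': ('blue', '+'),
    'Iris-virginica': ('green', '*'),
}

fig, ax = plt.subplots()
# One scatter call per class, labelled so the legend can identify it
for label, group in iris_demo.groupby('name'):
    color, marker = styles[label]
    ax.scatter(group['slength'], group['swidth'],
               color=color, marker=marker, label=label)
ax.set_xlabel('s length')
ax.set_ylabel('s width')
ax.legend()
```

In a Jupyter notebook the figure is normally rendered at the end of the cell; elsewhere, follow it with `plt.show()`.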


Original article: https://www.cnblogs.com/flowerIron/p/12037449.html