Chapter 5 - Outlier Analysis
Segment 9 - Multivariate analysis for outlier detection
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import rcParams
import seaborn as sb
%matplotlib inline
rcParams['figure.figsize'] = 5, 4
sb.set_style('whitegrid')
Visually inspecting boxplots
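# Load the iris data; the file has no header row, so name the columns explicitly
df = pd.read_csv(filepath_or_buffer='~/Data/iris.data.csv', header=None, sep=',')
df.columns = ['Sepal Length', 'Sepal Width', 'Petal Length', 'Petal Width', 'Species']
# Split into the four numeric measurements and the species labels
data = df.iloc[:, 0:4].values
target = df.iloc[:, 4].values
# Preview the first five rows
df.head()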
sb.boxplot(x='Species', y='Sepal Length', data=df, palette='hls')
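The points a boxplot draws beyond its whiskers can also be listed numerically. The sketch below is one way to do that, assuming the usual 1.5 x IQR (Tukey) fence rule that seaborn applies by default. The helper name `iqr_outliers` and the small `toy` frame are hypothetical and used only for illustration; they are not part of the original notebook, and the toy values are not iris measurements.

```python
import pandas as pd

def iqr_outliers(frame, value_col, group_col, k=1.5):
    """Return rows whose value falls outside the Tukey fences of its own group.

    Hypothetical helper: fences are Q1 - k*IQR and Q3 + k*IQR, computed per group.
    """
    grouped = frame.groupby(group_col)[value_col]
    # Broadcast each group's quartiles back onto its rows
    q1 = grouped.transform(lambda s: s.quantile(0.25))
    q3 = grouped.transform(lambda s: s.quantile(0.75))
    iqr = q3 - q1
    below = frame[value_col] < q1 - k * iqr
    above = frame[value_col] > q3 + k * iqr
    return frame[below | above]

# Made-up toy data with one obviously extreme value in group 'a'
toy = pd.DataFrame({
    'Species': ['a'] * 6 + ['b'] * 6,
    'Sepal Length': [5.0, 5.1, 4.9, 5.0, 5.2, 7.5,
                     6.0, 6.1, 5.9, 6.0, 6.2, 6.1],
})

flagged = iqr_outliers(toy, 'Sepal Length', 'Species')
print(flagged)
```

With the notebook's own data, the same call would be `iqr_outliers(df, 'Sepal Length', 'Species')`, assuming `df` has been loaded and its columns named as in the cell above the boxplot.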

Looking at the scatterplot matrix
sb.pairplot(df, hue='Species', palette='hls')
