  • Introduction to Data Mining - 1

    data mining tasks:
    Classification [Predictive]
    Clustering  [Descriptive]
    Association Rule Discovery [Descriptive]
    Sequential Pattern Discovery [Descriptive]
    Regression [Predictive]
    Deviation Detection [Predictive]

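    attribute types and the statistics each supports (each type also supports the statistics of the types listed before it):

    categorical (qualitative) attributes:
    1) Nominal: mode / entropy / contingency correlation / χ² (chi-square) test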

    2) Ordinal: median / percentiles / rank correlation / run tests / sign tests

    numeric (quantitative) attributes:

    3) Interval: mean / standard deviation / Pearson's correlation / t and F tests
    4) Ratio: geometric mean / harmonic mean / percent variation
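    A minimal sketch of several of the statistics listed above, written with the Python standard library only (assumes Python 3.8+ for `statistics.fmean` and `statistics.geometric_mean`). All function names are hypothetical. Rank correlation is computed as Spearman's rho (Pearson's r on the ranks), and "percent variation" is assumed here to mean the coefficient of variation (standard deviation / mean × 100); run tests, sign tests, and the t and F tests are not shown.

```python
import math
import statistics
from collections import Counter


# --- Nominal ---
def mode(values):
    """Most frequent value; ties go to the value seen first."""
    return Counter(values).most_common(1)[0][0]


def entropy(values):
    """Shannon entropy (in bits) of the empirical value distribution."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat


# --- Ordinal ---
def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Rank correlation (Spearman): Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# --- Interval ---
def pearson(x, y):
    """Pearson's correlation coefficient."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# --- Ratio ---
def percent_variation(values):
    """Assumed meaning: coefficient of variation, stdev / mean * 100."""
    return statistics.stdev(values) / statistics.fmean(values) * 100


print(mode(["red", "blue", "red"]))                           # red
print(entropy(["a", "b"]))                                    # 1.0
print(round(chi_square_statistic([[10, 20], [20, 10]]), 3))   # 6.667
print(statistics.median([3, 1, 2]))                           # 2
print(round(spearman([1, 2, 3, 4], [10, 30, 20, 40]), 3))     # 0.8
print(round(statistics.geometric_mean([1, 4]), 6))            # 2.0
print(statistics.harmonic_mean([1, 4]))                       # 1.6
print(percent_variation([2, 4, 6]))                           # 50.0
```

    The point of the grouping is that a statistic is only meaningful when it respects the attribute's type: a mean of nominal codes such as colour IDs is meaningless, while the median of an ordinal grade is fine.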


    data quality problems:

    1) noise and outliers
    2) missing values
    why: 1. the information was not collected; 2. some attributes are not applicable to all objects
    how: 1. eliminate data objects; 2. estimate missing values; 3. ignore missing values during analysis; 4. replace with all possible values (weighted by their probabilities)
    3) duplicate data
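    A minimal sketch of the four missing-value strategies and of duplicate removal, using hypothetical records in which `None` marks a missing value. Strategy 2 is shown with mean imputation, which is only one possible estimator; for strategy 4 the probability distribution is assumed to be already known. All names are illustrative.

```python
import statistics

# Hypothetical example records; None marks a missing value.
records = [
    {"age": 25, "income": 40000},
    {"age": None, "income": 52000},
    {"age": 31, "income": None},
    {"age": 40, "income": 61000},
    {"age": 25, "income": 40000},  # duplicate of the first record
]


def eliminate(rows):
    """1. Eliminate data objects that have any missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def impute_mean(rows, attr):
    """2. Estimate a missing value, here with the mean of the observed values."""
    mean = mean_ignoring_missing(rows, attr)
    return [{**r, attr: mean if r[attr] is None else r[attr]} for r in rows]


def mean_ignoring_missing(rows, attr):
    """3. Ignore missing values during analysis (skip them in the computation)."""
    return statistics.fmean(r[attr] for r in rows if r[attr] is not None)


def all_possible_values(value, distribution):
    """4. Replace a missing value with every possible value, weighted by its
    probability; `distribution` is an assumed, already-estimated mapping."""
    if value is not None:
        return [(value, 1.0)]
    return list(distribution.items())


def drop_duplicates(rows):
    """Duplicate data: keep only the first copy of identical records."""
    seen, unique = set(), []
    for r in rows:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


clean = drop_duplicates(records)
print(len(clean))                                        # 4
print(len(eliminate(clean)))                             # 2
print(impute_mean(clean, "age")[1]["age"])               # 32.0
print(mean_ignoring_missing(clean, "income"))            # 51000.0
print(all_possible_values(None, {"M": 0.6, "F": 0.4}))   # [('M', 0.6), ('F', 0.4)]
```

    Eliminating objects is the simplest option but throws away the observed values in the same rows, which is why estimating or ignoring missing values is often preferred when many objects are affected.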


    data preprocessing:
    1) aggregation
    2) sampling
    3) dimensionality reduction
    curse of dimensionality: as dimensionality increases, the data become increasingly sparse, and density and distances between points become less meaningful
    how: Principal Component Analysis (PCA); Singular Value Decomposition (SVD)
    4) feature subset selection

    5) feature creation

    feature extraction: domain-specific
    mapping data to a new space: Fourier transform / wavelet transform
    feature construction: combining features

    6) discretization and binarization
    7) attribute transformation
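    A minimal sketch of several preprocessing steps from the list above, using only the standard library (assumes Python 3.8+ for `math.dist`). It shows simple random sampling, a small experiment illustrating the curse of dimensionality (the relative gap between the nearest and farthest random point shrinks as the dimension grows), equal-width discretization, binarization of a categorical attribute, and z-score standardization as one example of attribute transformation. PCA/SVD, aggregation, and feature selection are not shown; all names are hypothetical.

```python
import math
import random
import statistics


def simple_random_sample(rows, k, seed=0):
    """2) Sampling: k objects drawn uniformly without replacement."""
    return random.Random(seed).sample(rows, k)


def distance_contrast(dim, n_points=200, seed=0):
    """3) Curse of dimensionality: (max - min) / min of the distances from one
    random query point to n_points random points in the unit hypercube."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = [math.dist(query, [rng.random() for _ in range(dim)])
             for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


def equal_width_bins(values, k):
    """6) Discretization: map each value to one of k equal-width intervals."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def one_hot(values):
    """6) Binarization: one binary attribute per category."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


def standardize(values):
    """7) Attribute transformation: z-score, (x - mean) / standard deviation."""
    mu, sd = statistics.fmean(values), statistics.stdev(values)
    return [(v - mu) / sd for v in values]


print(len(simple_random_sample(list(range(100)), 10)))  # 10
for d in (2, 10, 100, 1000):
    # The contrast typically shrinks as d grows.
    print(d, round(distance_contrast(d), 3))
print(equal_width_bins([0, 1, 5, 9, 10], 2))  # [0, 0, 1, 1, 1]
print(one_hot(["low", "high", "low"]))        # (['high', 'low'], [[0, 1], [1, 0], [0, 1]])
print(standardize([2, 4, 6]))                 # [-1.0, 0.0, 1.0]
```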





    Euclidean density = number of points per unit volume
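    A minimal sketch of "points per unit volume" computed in two ways (hypothetical function names, standard library only, Python 3.8+). The grid-based version divides each cell's point count by the cell volume. The center-based version counts points within a radius of a center and, as an assumption made here to match the definition above, divides by the volume of that d-dimensional ball.

```python
import math
from collections import Counter


def grid_density(points, cell_size):
    """Euclidean density on a regular grid: the number of points in each
    non-empty cell divided by the cell's volume (cell_size ** dimension)."""
    volume = cell_size ** len(points[0])
    counts = Counter(tuple(math.floor(x / cell_size) for x in p) for p in points)
    return {cell: n / volume for cell, n in counts.items()}


def center_density(points, center, radius):
    """Center-based variant: points within `radius` of `center`, divided by
    the volume of that d-dimensional ball."""
    d = len(center)
    ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1) * radius ** d
    inside = sum(1 for p in points if math.dist(p, center) <= radius)
    return inside / ball


pts = [(0.1, 0.2), (0.3, 0.4), (0.6, 0.1), (1.5, 1.5)]
print(grid_density(pts, 0.5))  # {(0, 0): 8.0, (1, 0): 4.0, (3, 3): 4.0}
print(center_density(pts, (0.2, 0.3), 0.5))
```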

  • Original post: https://www.cnblogs.com/pxy7896/p/6493064.html