  • Data Mining Notes, Chapter 1: Introduction

    Textbook: Data Mining: Concepts and Techniques (2nd edition), by Jiawei Han and Micheline Kamber, China Machine Press (2007)

    Lecture 1: Introduction

    1)  Why data mining?

    Necessity Is the Mother of Invention

    2) What is data mining?

    Data mining (knowledge discovery from data): extracting or mining knowledge from large amounts of data

    Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data

    Alternative names: Knowledge discovery (mining) in databases (KDD)

    Steps of a KDD Process

    Learning the application domain: relevant prior knowledge and goals of application

    Creating a target data set: data selection

    Data cleaning and preprocessing (may take 60% of the effort!)

    Data reduction and transformation: find useful features, dimensionality/variable reduction, invariant representation

    Choosing functions of data mining: summarization, classification, regression, association, clustering

    Choosing the mining algorithm(s)

    Data mining: search for patterns of interest

    Pattern evaluation and knowledge presentation: visualization, transformation, removing redundant patterns, etc.

    Use of discovered knowledge
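    The steps above can be sketched as a chain of small functions. This is only an illustrative sketch: the records, attribute names, age bins, and the `min_count` threshold are all hypothetical and do not come from the textbook, and the "mining" step is a simple summarization (counting) rather than any particular algorithm.

```python
from collections import Counter
from statistics import mean

# Hypothetical raw records: (customer_id, age, purchased_item).
# None marks a missing value. All values are invented for illustration.
RAW = [
    (1, 34, "milk"),
    (2, None, "bread"),
    (3, 29, "milk"),
    (4, 41, None),
    (5, 38, "bread"),
    (6, 25, "milk"),
]


def select(records):
    """Creating a target data set: keep only the attributes under study."""
    return [(age, item) for _, age, item in records]


def clean(rows):
    """Data cleaning: drop rows with no item, fill missing ages with the mean."""
    rows = [(age, item) for age, item in rows if item is not None]
    known_ages = [age for age, _ in rows if age is not None]
    fill = mean(known_ages)
    return [(age if age is not None else fill, item) for age, item in rows]


def transform(rows):
    """Data reduction and transformation: discretize age into two coarse bins."""
    return [("under_35" if age < 35 else "35_plus", item) for age, item in rows]


def mine(rows):
    """Data mining (summarization): count each (age_group, item) pair."""
    return Counter(rows)


def evaluate(patterns, min_count=2):
    """Pattern evaluation: keep only patterns seen at least min_count times."""
    return {pattern: n for pattern, n in patterns.items() if n >= min_count}


def kdd(records):
    """Run the steps in order; each stage consumes the previous stage's output."""
    return evaluate(mine(transform(clean(select(records)))))


if __name__ == "__main__":
    print(kdd(RAW))
```

    In practice the process is iterative: a disappointing evaluation step usually sends the analyst back to selection, cleaning, or transformation rather than straight on to using the result.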

    Architecture: Typical Data Mining System

    3) On what kind of data?

    Traditional databases and applications

        Relational databases, data warehouses, transactional databases

    Advanced databases and advanced applications

        Object-relational databases

        Temporal databases, sequence data (incl. biosequences), and time-series data

        Spatial databases and spatiotemporal databases

        Text databases and multimedia databases

        Heterogeneous databases and legacy databases

        Data streams and sensor data

        Structured data, graphs, social networks and link databases

        The World-Wide Web

    4) Data Mining Functionalities

       Class/concept description: characterization and discrimination

       Frequent patterns, association, correlation and causality

       Classification and prediction

       Cluster analysis

       Outlier analysis

       Trend and evolution analysis
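    As a concrete instance of one functionality in this list, frequent patterns and association, the sketch below computes support and confidence over a handful of market-basket transactions. The transactions, the function names, and the 0.5 support threshold are assumptions made up for illustration. The pair search is brute-force enumeration, not Apriori or any other algorithm from the textbook.

```python
from itertools import combinations

# Hypothetical market-basket transactions, invented for illustration.
TRANSACTIONS = [
    {"milk", "bread"},
    {"milk", "diapers", "beer"},
    {"bread", "diapers", "beer"},
    {"milk", "bread", "diapers"},
    {"milk", "bread", "beer"},
]


def count(itemset, transactions):
    """Number of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)


def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent => consequent."""
    both = set(antecedent) | set(consequent)
    return count(both, transactions) / count(antecedent, transactions)


def frequent_pairs(transactions, min_support):
    """Brute-force search: test every item pair against the support threshold."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(pair, transactions)
        if s >= min_support:
            result[pair] = s
    return result


if __name__ == "__main__":
    for pair, s in frequent_pairs(TRANSACTIONS, min_support=0.5).items():
        a, b = pair
        print(pair, "support:", s, "confidence:", confidence({a}, {b}, TRANSACTIONS))
```

    Enumerating every pair is fine for a toy example but grows quickly with the number of items, which is why dedicated frequent-pattern algorithms prune the candidate space.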

    5) Are all the patterns interesting?

    6) Classification of data mining systems

  • Original source: https://www.cnblogs.com/lanzhi/p/6468172.html