Three common types of data mining applications:
Classification
Regression
Clustering
Classification
In a classification type problem, we have a variable of interest which is categorical in nature. For example, this could be:
Classifying credit risk as either good or bad
Classifying patients as high risk for heart disease
Classifying individuals as high risk for fraudulent behavior
The goals of the classification problem can include:
Finding variables that are strongly related to the variable of interest
Developing a predictive model in which a set of variables is used to classify the variable of interest
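As a minimal sketch of the predictive-model goal, the code below fits a logistic regression (one of many possible classifiers, chosen here only for simplicity) by batch gradient descent in plain Python. The credit-risk data, the feature names `debt_to_income_ratio` and `missed_payments`, and the helper names `train_logistic` and `classify` are all hypothetical and invented for illustration.

```python
import math

# Hypothetical, invented data for illustration only:
# each row is (debt_to_income_ratio, missed_payments); label 1 = bad risk, 0 = good risk.
X = [(0.10, 0), (0.20, 0), (0.15, 1), (0.30, 0),
     (0.60, 3), (0.55, 2), (0.70, 4), (0.45, 3)]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    """Logistic function mapping a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this case: predicted probability minus true label.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the average gradient.
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def classify(w, b, x, threshold=0.5):
    """Return (predicted_label, probability_of_bad_risk) for one case."""
    p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
    return int(p >= threshold), p


w, b = train_logistic(X, y)
print(classify(w, b, (0.65, 3)))  # label 1 (bad risk)
print(classify(w, b, (0.12, 0)))  # label 0 (good risk)
```

The fitted weights also speak to the first goal: in this toy data, a larger positive weight suggests a variable pushes cases toward the "bad risk" class, although with unscaled features the raw weight sizes are not directly comparable.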
Regression
In a regression type problem, we have a variable of interest which is continuous in nature. For example, this could be:
A measurement for a manufacturing process
Revenue in dollars
Decrease in cholesterol after taking medication
The goals of the regression problem are similar to those of classification and can include:
Finding variables that are strongly related to the variable of interest
Developing a predictive model in which a set of variables is used to predict the variable of interest
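Both regression goals can be illustrated with a small plain-Python sketch, assuming one common approach among many: rank candidate variables by the absolute Pearson correlation with the target, then fit a simple least-squares line on the strongest one. The patient data, the variable names `dose_mg` and `age_years`, and the mg/dL unit are hypothetical and invented for illustration.

```python
import math

# Hypothetical, invented data for illustration only: five patients.
predictors = {
    "dose_mg": [10, 20, 30, 40, 50],
    "age_years": [50, 62, 45, 58, 53],
}
cholesterol_decrease = [25, 44, 66, 84, 106]  # hypothetical units: mg/dL


def mean(v):
    return sum(v) / len(v)


def pearson_r(xs, ys):
    """Pearson correlation: strength of the linear relationship between xs and ys."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def fit_simple_ols(xs, ys):
    """Least-squares line y = intercept + slope * x; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Goal 1: rank candidate variables by how strongly they relate to the target.
ranked = sorted(
    predictors,
    key=lambda name: abs(pearson_r(predictors[name], cholesterol_decrease)),
    reverse=True,
)
best = ranked[0]

# Goal 2: build a predictive model from the most strongly related variable.
slope, intercept = fit_simple_ols(predictors[best], cholesterol_decrease)
print(best, round(slope, 2), round(intercept, 2))  # dose_mg 2.02 4.4
print(round(intercept + slope * 35, 1))            # predicted decrease at 35 mg: 75.1
```

Correlation only captures linear association, so a low value (as for `age_years` here) does not rule out a nonlinear relationship.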
Clustering
In a clustering type problem, there is not a traditional variable of interest. Instead, the data need to be sorted into clusters. For example:
Clustering individuals for a marketing campaign
Clustering symptoms in medical research to find relationships
Finding clusters of bands based on customer responses
The goals of a cluster analysis problem can include:
Finding the variables that most strongly influence cluster assignment
Comparing the clusters across variables of interest
Assigning new cases to clusters and measuring the strength of cluster membership
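A minimal plain-Python sketch of these goals, assuming k-means as the clustering method (one of many) and seeding it with the first k points for simplicity. The customer data and variable names are invented for illustration, and the membership-strength score `1 - d_nearest / d_second_nearest` is a hypothetical heuristic chosen here, not a standard definition.

```python
import math

# Hypothetical, invented data for illustration only:
# each customer is (annual_spend_thousands, store_visits_per_month).
customers = [(1.0, 2.0), (8.0, 9.0), (1.5, 1.8), (2.0, 2.5),
             (8.5, 8.0), (9.0, 9.5), (1.2, 2.2), (7.8, 8.8)]


def kmeans(points, k, max_iter=100):
    """Plain k-means; seeds centroids with the first k points (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append([sum(coord) / len(members) for coord in zip(*members)])
            else:
                new_centroids.append(centroids[i])
        if new_centroids == centroids:  # centroids stopped moving
            break
        centroids = new_centroids
    return centroids


def assign(point, centroids):
    """Return (cluster_index, strength); strength = 1 - d_nearest / d_second_nearest."""
    dists = sorted((math.dist(point, c), i) for i, c in enumerate(centroids))
    (d1, best), (d2, _) = dists[0], dists[1]
    strength = 1.0 - d1 / d2 if d2 > 0 else 0.0
    return best, strength


centroids = kmeans(customers, k=2)
print(centroids)                      # per-cluster mean of each variable
print(assign((1.3, 2.0), centroids))  # clearly inside cluster 0, strength near 1
print(assign((5.0, 5.5), centroids))  # near the boundary, strength near 0
```

Because each centroid is the per-variable mean of its cluster, printing the centroids side by side is one simple way to compare clusters across variables. The strength score is near 1 for a case much closer to its own centroid than to any other, and near 0 for a case sitting between two clusters.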