출처: http://machinelearningmastery.com/how-to-get-started-with-machine-learning-algorithms-in-r/

## Linear Regression

### Ordinary Least Squares Regression

Ordinary Least Squares (OLS) regression is a linear model that seeks to find a set of coefficients for a line/hyper-plane that minimise the sum of the squared errors.

1 2 3 4 5 6 7 8 9 10 11 |
# load data data(longley) # fit model fit <- lm(Employed~., longley) # summarize the fit summary(fit) # make predictions predictions <- predict(fit, longley) # summarize accuracy rmse <- mean((longley$Employed - predictions)^2) print(rmse) |

### Stepwize Linear Regression

Stepwise Linear Regression is a method that makes use of linear regression to discover which subset of attributes in the dataset result in the best performing model. It is step-wise because each iteration of the method makes a change to the set of attributes and creates a model to evaluate the performance of the set.

### Principal Component Regression

Principal Component Regression (PCR) creates a linear regression model using the outputs of a Principal Component Analysis (PCA) to estimate the coefficients of the model. PCR is useful when the data has highly-correlated predictors.

### Partial Least Squares Regression

Partial Least Squares (PLS) Regression creates a linear model of the data in a transformed projection of problem space. Like PCR, PLS is appropriate for data with highly-correlated predictors.

## Penalized Regression

### Ridge Regression

Ridge Regression creates a linear regression model that is penalized with the L2-norm which is the sum of the squared coefficients. This has the effect of shrinking the coefficient values (and the complexity of the model) allowing some coefficients with minor contribution to the response to get close to zero.

### Least Absolute Shrinkage and Selection Operator

Least Absolute Shrinkage and Selection Operator (LASSO) creates a regression model that is penalized with the L1-norm which is the sum of the absolute coefficients. This has the effect of shrinking coefficient values (and the complexity of the model), allowing some with a minor affect to the response to become zero.

### Elastic Net

Elastic Net creates a regression model that is penalized with both the L1-norm and L2-norm. This has the effect of effectively shrinking coefficients (as in ridge regression) and setting some coefficients to zero (as in LASSO).

## Non-Linear Regression

### Multivariate Adaptive Regression Splines

Multivariate Adaptive Regression Splines (MARS) is a non-parametric regression method that models multiple nonlinearities in data using hinge functions (functions with a kink in them).

### Support Vector Machine

Support Vector Machines (SVM) are a class of methods, developed originally for classification, that find support points that best separate classes. SVM for regression is called Support Vector Regression (SVM).

### k-Nearest Neighbor

The k-Nearest Neighbor (kNN) does not create a model, instead it creates predictions from close data on-demand when a prediction is required. A similarity measure (such as Euclidean distance) is used to locate close data in order to make predictions.

### Neural Network

A Neural Network (NN) is a graph of computational units that recieve inputs and transfer the result into an output that is passed on. The units are ordered into layers to connect the features of an input vector to the features of an output vector. With training, such as the Back-Propagation algorithm, neural networks can be designed and trained to model the underlying relationship in data.

## Decision Trees

### Classification and Regression Trees

Classification and Regression Trees (CART) split attributes based on values that minimize a loss function, such as sum of squared errors.

### Conditional Decision Trees

Condition Decision Trees are created using statistical tests to select split points on attributes rather than a loss function.

### Model Trees

Model Trees create a decision tree and use a linear model at each node to make a prediction rather than using an average value

### Rule System

Rule Systems can be crated by extracting and simplifying the rules from a decision tree.

### Bagging CART

Bootstrapped Aggregation (Bagging) is an ensemble method that creates multiple models of the same type from different sub-samples of the same dataset. The predictions from each separate model are combined together to provide a superior result. This approach has shown participially effective for high-variance methods such as decision trees.

### Random Forest

Random Forest is variation on Bagging of decision trees by reducing the attributes available to making a tree at each decision point to a random sub-sample. This further increases the variance of the trees and more trees are required.

### Gradient Boosted Machine

Boosting is an ensemble method developed for classification for reducing bias where models are added to learn the misclassification errors in existing models. It has been generalized and adapted in the form of Gradient Boosted Machines (GBM) for use with CART decision trees for classification and regression.

### Cubist

Cubist decision trees are another ensemble method. They are constructed like model trees but involve a boosting-like procedure called committees that re rule-like models.

## Linear Classification

### Logistic Regression

Logistic Regression is a classification method that models the probability of an observation belonging to one of two classes. As such, normally logistic regression is demonstrated with binary classification problem (2 classes). Logistic Regression can also be used on problems with more than two classes (multinomial), as in this case.

### Linear Discriminant Analysis

LDA is a classification method that finds a linear combination of data attributes that best separate the data into classes

### Partial Least Squares Discriminant Analysis

Partial Least Squares Discriminate Analysis is the application of LDA on a dimension-reducing projection of the input data (partial least squares).

## Non-Linear Classification

### Mixture Discriminant Analysis

### Quadratic Discriminant Analysis

### Regularized Discriminant Analysis

### Neural Network

A Neural Network (NN) is a graph of computational units that recieve inputs and transfer the result into an output that is passed on. The units are ordered into layers to connect the features of an input vector to the features of an output vector. With training, such as the Back-Propagation algorithm, neural networks can be designed and trained to model the underlying relationship in data.

### Flexible Discriminant Analysis

### Support Vector Machine

Support Vector Machines (SVM) are a method that uses points in a transformed problem space that best separate classes into two groups. Classification for multiple classes is supported by a one-vs-all method. SVM also supports regression by modeling the function with a minimum amount of allowable error.

### k-Nearest Neighbors

The k-Nearest Neighbor (kNN) method makes predictions by locating similar cases to a given data instance (using a similarity function) and returning the average or majority of the most similar data instances.

### Naive Bayes

Naive Bayes uses Bayes Theorem to model the conditional relationship of each attribute to the class variable.

## Non-Linear Classification with Decision Trees

### Classification and Regression Trees

### C4.5

The C4.5 algorithm is an extension of the ID3 algorithm and constructs a decision tree to maximize information gain (difference in entropy).

### PART

PART is a rule system that creates pruned C4.5 decision trees for the data set and extracts rules and those instances that are covered by the rules are removed from the training data. The process is repeated until all instances are covered by extracted rules.

### Bagging CART

Bootstrapped Aggregation (Bagging) is an ensemble method that creates multiple models of the same type from different sub-samples of the same dataset. The predictions from each separate model are combined together to provide a superior result. This approach has shown participially effective for high-variance methods such as decision trees.

### Random Forest

Random Forest is variation on Bagging of decision trees by reducing the attributes available to making a tree at each decision point to a random sub-sample. This further increases the variance of the trees and more trees are required.

### Gradient Boosted Machine

Boosting is an ensemble method developed for classification for reducing bias where models are added to learn the misclassification errors in existing models. It has been generalized and adapted in the form of Gradient Boosted Machines (GBM) for use with CART decision trees for classification and regression.

### Boosted C5.0

The C5.0 method is a further extension of C4.5 and pinnacle of that line of methods. It was proprietary for a long time, although the code was released recently and is available in the C50 package.