# Overview feature engineering

2018, Sep 15

## Feature Engineering

• Generation
• Binarization, Quantization
• Scaling (normalization)
• Interaction features
• Log transformation
• Dimension reduction
• Clustering
• Selection
• Univariate feature selection
• Model-based selection
• Iterative feature selection
• Feature removal

### Interaction features

• create new features with combination of existing features.
• need to pre-knowledge and understanding of existing features.
• ex) height, width → square (square = height * width)
• ex) sensor1 + sensor2 → new sensor feature

### Log transformations

• Data distribution is merged extremely in a point.(ex. Poisson)
• Normal distribution is suitable for linear model.
• If data have Poisson distribution, make them to fit the linear model. Then, data will fit to normal distribution. (Poission → Normal dist)

### Dimension reduction

• is used when existing feature space is much large.
• algorithm reduces the space.
• ex) PCA, t-SNE, LDA, …

### Clustering

• without Y, create the component of dataset
• created components are used as features for classification
• is useful when you know the relationship between data to some degree.
• ex) K-means

### Feature selection

• All features are not necessary to learn a model.
• Some features are malignant to learn.
• Too many features causes overfitting.
• According to learning model, select necessary features and remove unnecessary features.

### Univariate feature selection

• selects optimal feature based on statistical model.
• analyzes the relation between y and one feature.
• is useful for linear model.
• is fast-applicable and simple technique.

### Model based feature selection

• A model find proper features in the learning process
• Possible to select the features based on Feature importance
• Useful to use Model-based feature selection as preprocessor of feature selection for other model
• Tree-based ensemble also has characteristics similar to Model-based feature selection
• Mainly used for explanation of which features are important.