Feature Extraction

Posted on 2018-02-13 | Edited on 2018-03-11 | In Feature Engineering

The sklearn.feature_extraction module extracts features from datasets in formats such as text and images, and returns them in a format supported by machine learning algorithms.

Note: Feature extraction is very different from feature selection. The former consists of transforming arbitrary data, such as text or images, into numerical features usable for machine learning. The latter is a machine learning technique applied to these features.

DictVectorizer

DictVectorizer implements what is called one-of-K or “one-hot” coding for categorical (aka nominal, discrete) features.
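
As a minimal sketch of the one-hot coding described above, the snippet below feeds a few dicts to DictVectorizer. The sample records (city and temperature) are made up for illustration and do not come from any particular dataset.

```python
from sklearn.feature_extraction import DictVectorizer

# Illustrative records: "city" is categorical, "temperature" is numeric.
measurements = [
    {"city": "Dubai", "temperature": 33.0},
    {"city": "London", "temperature": 12.0},
    {"city": "San Francisco", "temperature": 18.0},
]

# sparse=False returns a dense NumPy array, which is easier to inspect.
vec = DictVectorizer(sparse=False)
X = vec.fit_transform(measurements)

# Each string value becomes its own 0/1 column; numeric values pass through.
print(vec.feature_names_)
print(X)
```

Each distinct city gets a column named `city=<value>`, while the numeric temperature keeps a single column holding its original value.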

FeatureHasher
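
A minimal sketch of FeatureHasher, assuming dict-shaped samples: instead of building a vocabulary, it hashes feature names straight into a fixed number of columns. The sample dicts and the choice of n_features=8 are illustrative only; real use normally needs far more columns to keep hash collisions rare.

```python
from sklearn.feature_extraction import FeatureHasher

# Illustrative samples mapping feature names to numeric values.
samples = [
    {"dog": 1, "cat": 2, "elephant": 4},
    {"dog": 2, "run": 5},
]

# n_features fixes the output width up front; no fitting is required
# because the column index comes from hashing the feature name.
hasher = FeatureHasher(n_features=8, input_type="dict")
X = hasher.transform(samples)

print(X.shape)      # one row per sample, n_features columns
print(X.toarray())  # values may collide or change sign due to hashing
```

Because no vocabulary is stored, the mapping cannot be inverted to recover feature names, which is the trade-off for a low and predictable memory footprint.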

Text feature extraction

CountVectorizer, HashingVectorizer
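
The two vectorizers can be compared side by side in a short sketch. The two-sentence corpus is invented for illustration, and n_features=16 is an arbitrarily small value chosen so the output stays readable.

```python
from sklearn.feature_extraction.text import CountVectorizer, HashingVectorizer

# Tiny illustrative corpus.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# CountVectorizer learns a vocabulary, then counts token occurrences.
count_vec = CountVectorizer()
X_counts = count_vec.fit_transform(corpus)
vocab = count_vec.vocabulary_  # maps token -> column index

# HashingVectorizer is stateless: tokens are hashed into a fixed number
# of columns, so transform() can be called without fitting.
hash_vec = HashingVectorizer(n_features=16)
X_hashed = hash_vec.transform(corpus)

print(sorted(vocab, key=vocab.get))
print(X_counts.toarray())
print(X_hashed.shape)
```

CountVectorizer keeps the token-to-column mapping, so columns can be inspected by name. HashingVectorizer gives that up in exchange for not having to hold a vocabulary in memory.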

Bag of Words (tokenization, counting and normalization)

Sparsity
Tf–idf term weighting
Decode
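
The bag-of-words, sparsity and tf–idf items above can be sketched with TfidfVectorizer, which tokenizes, counts and normalizes in one step. The corpus is invented for illustration, and all parameters are left at their defaults.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny illustrative corpus.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

tfidf = TfidfVectorizer()
X = tfidf.fit_transform(corpus)

# The result is a SciPy sparse matrix: only non-zero entries are stored.
print(sp.issparse(X), X.shape)

# Terms found in every document ("the") get a lower idf weight than
# terms found in only one document ("cat").
vocab = tfidf.vocabulary_
idf_the = tfidf.idf_[vocab["the"]]
idf_cat = tfidf.idf_[vocab["cat"]]
print(idf_the, idf_cat)

# With the default norm="l2", every row has unit Euclidean length.
row_norms = np.linalg.norm(X.toarray(), axis=1)
print(row_norms)
```

For the decoding step, the text vectorizers accept `encoding` and `decode_error` parameters that control how byte input is turned into text before tokenization.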

Image feature extraction
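
A minimal sketch of patch extraction, one of the image utilities in sklearn.feature_extraction.image. The 4x4 array below is a stand-in for a real grayscale image, and the 2x2 patch size is arbitrary.

```python
import numpy as np
from sklearn.feature_extraction.image import (
    extract_patches_2d,
    reconstruct_from_patches_2d,
)

# Stand-in for a grayscale image: a 4x4 grid holding the values 0..15.
image = np.arange(16, dtype=float).reshape(4, 4)

# Extract every overlapping 2x2 patch (a 3x3 grid of positions).
patches = extract_patches_2d(image, (2, 2))
print(patches.shape)
print(patches[0])

# Overlapping patches are averaged back into an image of the given size.
restored = reconstruct_from_patches_2d(patches, (4, 4))
print(np.allclose(restored, image))
```

Each patch can then be flattened and used as a feature vector, for example as input to a dictionary-learning or clustering step.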

Links

Feature Extraction

Author：HyperJ
Source：HyperJ’s Blog
Link：Feature Extraction