machine learning - Huge number of classes with Multinomial Naive Bayes (scikit-learn) -


Whenever I start having a bigger number of classes (1000 or more), MultinomialNB gets super slow and takes gigabytes of RAM. The same is true for all the scikit-learn classification algorithms that support .partial_fit() (SGDClassifier, Perceptron). When working with convolutional neural networks, 10000 classes are no problem. But when I want to train MultinomialNB on the same data, 12 GB of RAM are not enough and it is very slow. From my understanding of Naive Bayes, even with a lot of classes, it should be a lot faster. Might this be a problem of the scikit-learn implementation (maybe of the .partial_fit() function)? How can I train MultinomialNB/SGDClassifier/Perceptron on 10000+ classes (batchwise)?
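As background for the batchwise part of the question: .partial_fit() does let you feed the data in chunks, as long as the full label set is declared on the first call (later batches may not contain every class). A minimal sketch with SGDClassifier on synthetic data (the sizes here are placeholders, much smaller than the 10000-class case):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Synthetic stand-in data; sizes reduced for illustration.
rng = np.random.RandomState(0)
n_classes, n_features = 50, 20
X = rng.rand(1000, n_features)
y = rng.randint(0, n_classes, size=1000)

clf = SGDClassifier(random_state=0)

# The complete set of labels must be passed on the first partial_fit call.
all_classes = np.arange(n_classes)
batch_size = 200
for start in range(0, len(X), batch_size):
    clf.partial_fit(X[start:start + batch_size],
                    y[start:start + batch_size],
                    classes=all_classes)

# One-versus-all training leaves one weight row per class.
print(clf.coef_.shape)  # (50, 20), i.e. (n_classes, n_features)
```

This addresses the memory cost of the *data* (only one batch is in RAM at a time), but as the answer below explains, the *model* itself still grows with the number of classes.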

Short answer, without more information:

  • MultinomialNB fits an independent model for each of the classes; thus, if you have c=10000+ classes it fits c=10000+ models. Therefore, the model parameters are [n_classes x n_features], which is quite a lot of memory if n_features is large.

  • The SGDClassifier of scikit-learn uses an OvA (one-versus-all) strategy to train a multiclass model (as the SGDClassifier is not inherently multiclass), and therefore another c=10000+ models need to be trained.

  • And for Perceptron, from the documentation of scikit-learn:

Perceptron and SGDClassifier share the same underlying implementation. In fact, Perceptron() is equivalent to SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None).
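The [n_classes x n_features] parameter matrix alone explains the memory blow-up. A quick back-of-the-envelope check, assuming a 100000-feature bag-of-words input (the feature count is an assumption, not from the question):

```python
n_classes = 10_000
n_features = 100_000   # assumed vocabulary size for a text problem
bytes_per_float = 8    # float64, the default dtype in scikit-learn

# MultinomialNB stores a (n_classes, n_features) log-probability matrix;
# the OvA SGDClassifier stores a coefficient matrix of the same shape.
param_bytes = n_classes * n_features * bytes_per_float
print(param_bytes / 1e9, "GB")  # 8.0 GB for a single parameter matrix
```

At these sizes one dense parameter matrix already approaches the 12 GB the question mentions, before counting the data, intermediate buffers, or a second copy made during fitting.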

So, all three classifiers you mention don't work well with a high number of classes, since an independent model needs to be trained for each of the classes. I would recommend trying something that inherently supports multiclass classification, such as RandomForestClassifier.

