machine learning - Huge number of classes with Multinomial Naive Bayes (scikit-learn)


Whenever I start having a bigger number of classes (1000 or more), MultinomialNB gets super slow and takes gigabytes of RAM. The same is true for all the scikit-learn classification algorithms that support .partial_fit() (SGDClassifier, Perceptron). When working with convolutional neural networks, 10000 classes are no problem. But when I want to train MultinomialNB on the same data, 12 GB of RAM are not enough and it is very slow. From my understanding of Naive Bayes, even with a lot of classes, it should be a lot faster. Might this be a problem of the scikit-learn implementation (maybe of the .partial_fit() function)? How can I train MultinomialNB/SGDClassifier/Perceptron on 10000+ classes (batchwise)?
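For reference, a minimal sketch of the batchwise .partial_fit() pattern being described, on synthetic data (the dimensions and batch count are assumptions, not numbers from the question):

```python
import numpy as np
import scipy.sparse as sp
from sklearn.naive_bayes import MultinomialNB

# Hypothetical dimensions, chosen only to illustrate the scale described above.
n_classes, n_features, batch_size = 10_000, 100_000, 1_000
all_classes = np.arange(n_classes)

clf = MultinomialNB()
for i in range(10):
    # Stand-in sparse count data; a real pipeline would stream real batches.
    X = sp.random(batch_size, n_features, density=0.001, format="csr")
    y = np.random.randint(0, n_classes, size=batch_size)
    # `classes` must be supplied on the first call to partial_fit.
    # Internally this allocates per-class statistics of shape
    # (n_classes, n_features), which is where the memory goes.
    clf.partial_fit(X, y, classes=all_classes if i == 0 else None)
```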

A short answer, without more information to go on:

  • MultinomialNB fits an independent model for each of the classes; thus, if you have c = 10000+ classes, it fits c = 10000+ models. Therefore, the model parameters are of shape [n_classes x n_features], which is quite a lot of memory if n_features is large (a back-of-envelope estimate follows this list).

  • The SGDClassifier of scikit-learn uses an OvA (one-versus-all) strategy to train a multiclass model (as SGDClassifier is not inherently multiclass), and therefore c = 10000+ models need to be trained.

  • And for the Perceptron, from the scikit-learn documentation:

Perceptron and SGDClassifier share the same underlying implementation. In fact, Perceptron() is equivalent to SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None).
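To make the first point concrete, here is a back-of-envelope estimate of the parameter memory (the class and feature counts are illustrative assumptions, not numbers from the question):

```python
# Illustrative figures: a 10_000-class problem with a 100_000-word vocabulary.
n_classes = 10_000
n_features = 100_000
bytes_per_float64 = 8

gb = n_classes * n_features * bytes_per_float64 / 1024**3
print(f"{gb:.1f} GB per dense [n_classes x n_features] matrix")  # ~7.5 GB
# MultinomialNB stores both feature_count_ and feature_log_prob_ at this
# shape, and SGDClassifier's OvA coef_ has the same shape, so running out
# of 12 GB of RAM is plausible.
```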

So, all three classifiers you mention don't scale well to a high number of classes, since an independent model needs to be trained for each class. I would recommend trying something that inherently supports multiclass classification, such as RandomForestClassifier; a minimal sketch follows.
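A minimal sketch of that alternative, on synthetic stand-in data (all shapes are assumptions for illustration):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data; real features and labels would go here.
rng = np.random.default_rng(0)
X = rng.random((5_000, 100))
y = rng.integers(0, 10_000, size=5_000)

# A single forest handles all classes at once; no per-class model is built.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
clf.fit(X, y)
```

Note the trade-off: RandomForestClassifier has no .partial_fit(), so this gives up the batchwise training the question asks about in exchange for avoiding the per-class parameter blow-up.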

