machine learning - Huge number of classes with Multinomial Naive Bayes (scikit-learn) -


Whenever I start having a bigger number of classes (1000 and more), MultinomialNB gets super slow and takes gigabytes of RAM. The same is true for all the scikit-learn classification algorithms that support .partial_fit() (SGDClassifier, Perceptron). When working with convolutional neural networks, 10000 classes are no problem. But when I want to train a MultinomialNB on the same data, 12 GB of RAM are not enough and it is very slow. From my understanding of Naive Bayes, even with a lot of classes, it should be a lot faster. Might this be a problem of the scikit-learn implementation (maybe of the .partial_fit() function)? How can I train MultinomialNB/SGDClassifier/Perceptron on 10000+ classes (batchwise)?
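Here is a minimal sketch of the batchwise pattern I mean, with synthetic, scaled-down data standing in for the real setup (the real one has 10000+ classes and far more features, which is where the memory explodes):

    import numpy as np
    from scipy.sparse import random as sparse_random
    from sklearn.naive_bayes import MultinomialNB

    # Scaled-down synthetic stand-in for the real data.
    n_classes, n_features = 1_000, 10_000
    all_classes = np.arange(n_classes)
    rng = np.random.default_rng(0)

    # A uniform class prior avoids log(0) warnings for classes
    # that have not been seen in the batches processed so far.
    clf = MultinomialNB(class_prior=np.full(n_classes, 1 / n_classes))

    for i in range(10):  # 10 synthetic batches standing in for real data
        X_batch = sparse_random(500, n_features, density=0.01,
                                format="csr", random_state=i)
        y_batch = rng.integers(0, n_classes, size=500)
        # partial_fit needs the full class list on the first call,
        # because no single batch contains every class.
        clf.partial_fit(X_batch, y_batch,
                        classes=all_classes if i == 0 else None)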

Short answer without more information:

  • MultinomialNB fits an independent model for each of the classes; thus, if you have c=10000+ classes, it fits c=10000+ models and, therefore, the model parameters are [n_classes x n_features], which is quite a lot of memory if n_features is large (see the back-of-the-envelope sketch after this list).

  • The SGDClassifier of scikit-learn uses an OvA (one-versus-all) strategy to train a multiclass model (as SGDC is not inherently multiclass), and therefore c=10000+ models need to be trained.

  • And for the Perceptron, from the documentation of scikit-learn:

Perceptron and SGDClassifier share the same underlying implementation. In fact, Perceptron() is equivalent to SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None).
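A back-of-the-envelope sketch of the first point above (the vocabulary size of 1 million features is an assumption; MultinomialNB does store dense [n_classes, n_features] float64 matrices such as feature_log_prob_):

    # Assumed text-classification setup: 10000 classes and a
    # bag-of-words vocabulary of 1 million features.
    n_classes = 10_000
    n_features = 1_000_000

    # MultinomialNB keeps dense float64 matrices of shape
    # [n_classes, n_features] (e.g. feature_count_, feature_log_prob_),
    # at 8 bytes per entry:
    per_matrix_gb = n_classes * n_features * 8 / 1e9
    print(per_matrix_gb)  # 80.0 GB per matrix -- far beyond 12 GB of RAM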

So, the 3 classifiers you mention don't work well with a high number of classes, since an independent model needs to be trained for each of the classes. I recommend that you try something that inherently supports multiclass classification, such as RandomForestClassifier.
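A minimal sketch of that suggestion, with synthetic data standing in for your real feature matrix and labels:

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic stand-in data; substitute your real features and labels.
    rng = np.random.default_rng(0)
    X = rng.random((2_000, 50))
    y = rng.integers(0, 200, size=2_000)  # many classes, one shared model

    # The forest is inherently multiclass: a single ensemble covers all
    # classes instead of training one model per class as OvA schemes do.
    clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
    clf.fit(X, y)
    print(clf.predict(X[:5]))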

