machine learning - Huge number of classes with Multinomial Naive Bayes (scikit-learn)
Whenever I start having a bigger number of classes (1000 and more), MultinomialNB gets super slow and takes gigabytes of RAM. The same is true for all the scikit-learn classification algorithms that support .partial_fit() (SGDClassifier, Perceptron). When working with convolutional neural networks, 10000 classes are no problem. But when I want to train MultinomialNB on the same data, 12 GB of RAM is not enough and it is very slow. From my understanding of Naive Bayes, even with a lot of classes, it should be a lot faster. Might this be a problem of the scikit-learn implementation (maybe of the .partial_fit() function)? How can I train MultinomialNB/SGDClassifier/Perceptron on 10000+ classes (batchwise)?
Short answer, without more information:
The MultinomialNB fits an independent model for each of the classes; thus, if you have C = 10000+ classes it will fit C = 10000+ models, and therefore the model parameters are [n_classes x n_features], which is quite a lot of memory if n_features is large.
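To make that concrete, here is a rough back-of-the-envelope estimate (the n_features value is a made-up example; MultinomialNB keeps its feature_log_prob_ parameters as a dense float64 array):

    # Rough memory estimate for MultinomialNB's parameter matrix.
    n_classes = 10_000
    n_features = 100_000   # hypothetical bag-of-words vocabulary size

    # feature_log_prob_ is a dense float64 array of shape
    # (n_classes, n_features), i.e. 8 bytes per entry; feature_count_
    # adds another array of the same shape during partial_fit.
    gb = n_classes * n_features * 8 / 1e9
    print(f"~{gb:.0f} GB")   # ~8 GB for a single such matrix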
The SGDClassifier of scikit-learn uses the OvA (one-versus-all) strategy to train a multiclass model (as SGDC is not inherently multiclass), and therefore another C = 10000+ models need to be trained. And for the Perceptron, from the documentation of scikit-learn:
Perceptron and SGDClassifier share the same underlying implementation. In fact, Perceptron() is equivalent to SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None).
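That said, the batchwise training you ask about does work with these estimators via .partial_fit(); it just does not shrink the per-class parameter matrix, which is allocated in full on the first call. A minimal, self-contained sketch (the sizes and the random batches are made-up stand-ins for real data):

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    n_classes, n_features = 10_000, 1_000  # made-up sizes for illustration
    all_classes = np.arange(n_classes)     # every label must be declared up front

    clf = SGDClassifier()                  # default hinge loss: one linear model per class
    rng = np.random.default_rng(0)

    for _ in range(5):                     # stand-in for a loop over real data chunks
        X_batch = rng.random((100, n_features))
        y_batch = rng.integers(0, n_classes, size=100)
        # classes= is required on the first call so the full OvA coefficient
        # matrix of shape (n_classes, n_features) can be allocated.
        clf.partial_fit(X_batch, y_batch, classes=all_classes)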
So, all 3 classifiers you mention don't work well with a high number of classes, since an independent model needs to be trained for each of them. I would recommend you try something that inherently supports multiclass classification, such as RandomForestClassifier.
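For example, a minimal sketch of that suggestion (the synthetic dataset is just a stand-in; a real 10000-class dataset would be plugged in the same way):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Tiny synthetic stand-in; a real 10000-class dataset is used the same way.
    X, y = make_classification(n_samples=500, n_features=50,
                               n_informative=30, n_classes=10)

    clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)
    clf.fit(X, y)              # a single forest covers all classes at once
    print(clf.predict(X[:5]))

Note that RandomForestClassifier has no partial_fit, so the trade-off is giving up batchwise training in exchange for a single model that covers all classes.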