class cv::ml::DTrees

Overview

The class represents a single decision tree or a collection of decision trees.

#include <ml.hpp>

class DTrees: public cv::ml::StatModel
{
public:
    // enums

    enum Flags;

    // classes

    class Node;
    class Split;

    // methods

    virtual
    int
    getCVFolds() const = 0;

    virtual
    int
    getMaxCategories() const = 0;

    virtual
    int
    getMaxDepth() const = 0;

    virtual
    int
    getMinSampleCount() const = 0;

    virtual
    const std::vector<Node>&
    getNodes() const = 0;

    virtual
    cv::Mat
    getPriors() const = 0;

    virtual
    float
    getRegressionAccuracy() const = 0;

    virtual
    const std::vector<int>&
    getRoots() const = 0;

    virtual
    const std::vector<Split>&
    getSplits() const = 0;

    virtual
    const std::vector<int>&
    getSubsets() const = 0;

    virtual
    bool
    getTruncatePrunedTree() const = 0;

    virtual
    bool
    getUse1SERule() const = 0;

    virtual
    bool
    getUseSurrogates() const = 0;

    virtual
    void
    setCVFolds(int val) = 0;

    virtual
    void
    setMaxCategories(int val) = 0;

    virtual
    void
    setMaxDepth(int val) = 0;

    virtual
    void
    setMinSampleCount(int val) = 0;

    virtual
    void
    setPriors(const cv::Mat& val) = 0;

    virtual
    void
    setRegressionAccuracy(float val) = 0;

    virtual
    void
    setTruncatePrunedTree(bool val) = 0;

    virtual
    void
    setUse1SERule(bool val) = 0;

    virtual
    void
    setUseSurrogates(bool val) = 0;

    static
    Ptr<DTrees>
    create();

    static
    Ptr<DTrees>
    load(
        const String& filepath,
        const String& nodeName = String()
        );
};

// direct descendants

class Boost;
class RTrees;

Inherited Members

public:
    // enums

    enum Flags;

    // methods

    virtual
    void
    clear();

    virtual
    bool
    empty() const;

    virtual
    String
    getDefaultName() const;

    virtual
    void
    read(const FileNode& fn);

    virtual
    void
    save(const String& filename) const;

    virtual
    void
    write(FileStorage& fs) const;

    template <typename _Tp>
    static
    Ptr<_Tp>
    load(
        const String& filename,
        const String& objname = String()
        );

    template <typename _Tp>
    static
    Ptr<_Tp>
    loadFromString(
        const String& strModel,
        const String& objname = String()
        );

    template <typename _Tp>
    static
    Ptr<_Tp>
    read(const FileNode& fn);

    virtual
    float
    calcError(
        const Ptr<TrainData>& data,
        bool test,
        OutputArray resp
        ) const;

    virtual
    bool
    empty() const;

    virtual
    int
    getVarCount() const = 0;

    virtual
    bool
    isClassifier() const = 0;

    virtual
    bool
    isTrained() const = 0;

    virtual
    float
    predict(
        InputArray samples,
        OutputArray results = noArray(),
        int flags = 0
        ) const = 0;

    virtual
    bool
    train(
        const Ptr<TrainData>& trainData,
        int flags = 0
        );

    virtual
    bool
    train(
        InputArray samples,
        int layout,
        InputArray responses
        );

    template <typename _Tp>
    static
    Ptr<_Tp>
    train(
        const Ptr<TrainData>& data,
        int flags = 0
        );

protected:
    // methods

    void
    writeFormat(FileStorage& fs) const;

Detailed Documentation

The class represents a single decision tree or a collection of decision trees.

The current public interface of the class allows the user to train only a single decision tree; however, the class is capable of storing multiple decision trees and using them for prediction (by summing responses or by voting), and classes derived from DTrees (such as RTrees and Boost) use this capability to implement decision tree ensembles.

See also:

Decision Trees

Methods

virtual
int
getCVFolds() const = 0

If CVFolds > 1 then the algorithm prunes the built decision tree using a K-fold cross-validation procedure, where K is equal to CVFolds. Default value is 10.

See also:

setCVFolds

virtual
int
getMaxCategories() const = 0

Cluster possible values of a categorical variable into K <= maxCategories clusters to find a suboptimal split. If a discrete variable on which the training procedure tries to make a split takes more than maxCategories values, finding the precise best subset may take a very long time, because the complexity is exponential in the number of values. Instead, many decision tree engines (including this implementation) try to find a sub-optimal split in such cases by clustering all the samples into maxCategories clusters, that is, by merging some categories together. The clustering is applied only in classification problems with more than two classes, for categorical variables with more than maxCategories possible values. For regression and 2-class classification the optimal split can be found efficiently without clustering, so the parameter is not used in those cases. Default value is 10.

See also:

setMaxCategories

virtual
int
getMaxDepth() const = 0

The maximum possible depth of the tree. That is, the training algorithm attempts to split a node while its depth is less than maxDepth. The root node has zero depth. The actual depth may be smaller if the other termination criteria are met (see the outline of the training procedure), and/or if the tree is pruned. Default value is INT_MAX.

See also:

setMaxDepth

virtual
int
getMinSampleCount() const = 0

If the number of samples in a node is less than this parameter then the node will not be split. Default value is 10.

See also:

setMinSampleCount

virtual
const std::vector<Node>&
getNodes() const = 0

Returns all the nodes.

All the node indices are indices into the returned vector.

virtual
cv::Mat
getPriors() const = 0

The array of a priori class probabilities, sorted by the class label value.

The parameter can be used to tune the decision tree preferences toward a certain class. For example, if you want to detect some rare anomaly occurrence, the training set will likely contain many more normal cases than anomalies, so very good classification performance can be achieved just by classifying every case as normal. To avoid this, the priors can be specified, where the anomaly probability is artificially increased (up to 0.5 or even greater), so the weight of the misclassified anomalies becomes much bigger, and the tree is adjusted accordingly.

You can also think of this parameter as weights of prediction categories that determine the relative cost of misclassification. That is, if the weight of the first category is 1 and the weight of the second category is 10, then each mistake in predicting the second category is equivalent to making 10 mistakes in predicting the first category. Default value is an empty Mat.

See also:

setPriors

virtual
float
getRegressionAccuracy() const = 0

Termination criteria for regression trees. If all absolute differences between an estimated value in a node and the values of the train samples in this node are less than this parameter, then the node will not be split further. Default value is 0.01f.

See also:

setRegressionAccuracy

virtual
const std::vector<int>&
getRoots() const = 0

Returns indices of root nodes.

virtual
const std::vector<Split>&
getSplits() const = 0

Returns all the splits.

All the split indices are indices into the returned vector.

virtual
const std::vector<int>&
getSubsets() const = 0

Returns all the bitsets for categorical splits.

Split::subsetOfs is an offset into the returned vector.

virtual
bool
getTruncatePrunedTree() const = 0

If true then pruned branches are physically removed from the tree. Otherwise they are retained and it is possible to get results from the original unpruned (or pruned less aggressively) tree. Default value is true.

See also:

setTruncatePrunedTree

virtual
bool
getUse1SERule() const = 0

If true then pruning will be harsher. This makes the tree more compact and more resistant to training data noise, but a bit less accurate. Default value is true.

See also:

setUse1SERule

virtual
bool
getUseSurrogates() const = 0

If true then surrogate splits will be built. These splits allow working with missing data and computing variable importance correctly. Default value is false. Note: surrogate splits are currently not implemented.

See also:

setUseSurrogates

virtual
void
setCVFolds(int val) = 0

See also:

getCVFolds

virtual
void
setMaxCategories(int val) = 0

See also:

getMaxCategories

virtual
void
setMaxDepth(int val) = 0

See also:

getMaxDepth

virtual
void
setMinSampleCount(int val) = 0

See also:

getMinSampleCount

virtual
void
setPriors(const cv::Mat& val) = 0

The array of a priori class probabilities, sorted by the class label value.

See also:

getPriors

virtual
void
setRegressionAccuracy(float val) = 0

See also:

getRegressionAccuracy

virtual
void
setTruncatePrunedTree(bool val) = 0

See also:

getTruncatePrunedTree

virtual
void
setUse1SERule(bool val) = 0

See also:

getUse1SERule

virtual
void
setUseSurrogates(bool val) = 0

See also:

getUseSurrogates

static
Ptr<DTrees>
create()

Creates an empty model.

The static method creates an empty decision tree. It should then be trained using the train method (see StatModel::train). Alternatively, you can load the model from a file using Algorithm::load<DTrees>(filename).

static
Ptr<DTrees>
load(
    const String& filepath,
    const String& nodeName = String()
    )

Loads and creates a serialized DTrees from a file.

Use DTrees::save to serialize and store a DTrees model to disk. Load the model from this file again by calling this function with the path to the file. Optionally, specify the name of the file node containing the classifier.

Parameters:

filepath: path to the serialized DTrees model
nodeName: name of the file node containing the classifier