Algorithms

Note

Scikit-learn patching functionality in daal4py has been deprecated and moved to a separate package, Intel(R) Extension for Scikit-learn*. All future patches will be available only in Intel(R) Extension for Scikit-learn*. Use the scikit-learn-intelex package instead of daal4py for scikit-learn acceleration.

Classification

See also Intel(R) oneAPI Data Analytics Library Classification.

Decision Forest Classification

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Classification Decision Forest.

Examples:
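
A minimal batch-mode train/predict sketch with synthetic NumPy data (shapes and parameter values are illustrative; d4p.engines_mt19937 is a daal4py engine constructor assumed available):

    import numpy as np
    import daal4py as d4p

    # Illustrative synthetic data: 100 observations, 5 features, 2 classes
    X = np.random.rand(100, 5)
    y = np.random.randint(0, 2, size=(100, 1)).astype(np.float64)

    # Configure and run training
    train_algo = d4p.decision_forest_classification_training(
        nClasses=2,
        nTrees=10,
        varImportance="MDI",
        bootstrap=True,
        engine=d4p.engines_mt19937(seed=777),
    )
    train_result = train_algo.compute(X, y)

    # Predict with the trained model
    predict_algo = d4p.decision_forest_classification_prediction(nClasses=2)
    predict_result = predict_algo.compute(X, train_result.model)
    print(predict_result.prediction[:5])    # predicted class labels
    print(train_result.variableImportance)  # MDI variable importance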

class daal4py.decision_forest_classification_training
Parameters:
  • nClasses (size_t) – Number of classes

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for Decision forest, double or float

  • method (str) – [optional, default: “defaultDense”] Decision forest computation method

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

  • nTrees (size_t) – [optional, default: -1] Number of trees in the forest. Default is 10

  • observationsPerTreeFraction (double) – [optional, default: get_nan64()] Fraction of observations used for training each tree, 0 to 1. Default is 1 (sampling with replacement)

  • featuresPerNode (size_t) – [optional, default: -1] Number of features tried as possible splits per node. If 0 then sqrt(p) for classification, p/3 for regression, where p is the total number of features.

  • maxTreeDepth (size_t) – [optional, default: -1] Maximal tree depth. Default is 0 (unlimited)

  • minObservationsInLeafNode (size_t) – [optional, default: -1] Minimal number of observations in a leaf node. Default is 1 for classification, 5 for regression.

  • engine (engines_batchbase__iface__) – [optional, default: None] Engine for the random number generator used by the algorithms

  • impurityThreshold (double) – [optional, default: get_nan64()] Threshold value used as the stopping criterion: if the impurity value in the node is smaller than the threshold, the node is not split further.

  • varImportance (str) – [optional, default: “”] Variable importance computation mode

  • resultsToCompute (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

  • memorySavingMode (bool) – [optional, default: False] If true then use memory saving (but slower) mode

  • bootstrap (bool) – [optional, default: False] If true then training set for a tree is a bootstrap of the whole training set

  • minObservationsInSplitNode (size_t) – [optional, default: -1] Minimal number of observations in a split node. Default 2

  • minWeightFractionInLeafNode (double) – [optional, default: get_nan64()] The minimum weighted fraction of the sum total of weights (of all the input observations) required to be at a leaf node, 0.0 to 0.5. Default is 0.0

  • minImpurityDecreaseInSplitNode (double) – [optional, default: get_nan64()] A node will be split if this split induces a decrease of the impurity greater than or equal to the value, non-negative. Default is 0.0

  • maxLeafNodes (size_t) – [optional, default: -1] Maximum number of leaf nodes. Default is 0 (unlimited)

  • maxBins (size_t) – [optional, default: -1] Used with ‘hist’ split finding method only. Maximal number of discrete bins to bucket continuous features. Default is 256. Increasing the number results in higher computation costs

  • minBinSize (size_t) – [optional, default: -1] Used with ‘hist’ split finding method only. Minimal number of observations in a bin. Default is 5

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Training data set

  • labels (data_or_file) – Labels of the training data set

  • weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

decision_forest_classification_training_result

class daal4py.decision_forest_classification_training_result

Properties:

model
Type:

decision_forest_classification_model

outOfBagError
Type:

Numpy array

outOfBagErrorAccuracy
Type:

Numpy array

outOfBagErrorDecisionFunction
Type:

Numpy array

outOfBagErrorPerObservation
Type:

Numpy array

variableImportance
Type:

Numpy array

class daal4py.decision_forest_classification_prediction
Parameters:
  • nClasses (size_t) – Number of classes

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the decision_forest algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] decision_forest computation method

  • votingMethod (str) – [optional, default: “”] Voting method used to combine the per-tree predictions (weighted or unweighted)

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, model)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data set

  • model (decision_forest_classification_modelptr) – Input model trained by the classification algorithm

Return type:

classifier_prediction_result

class daal4py.classifier_prediction_result

Properties:

logProbabilities
Type:

Numpy array

prediction
Type:

Numpy array

probabilities
Type:

Numpy array

class daal4py.decision_forest_classification_model

Properties:

NFeatures
Type:

size_t

NumberOfClasses
Type:

size_t

NumberOfFeatures
Type:

size_t

NumberOfTrees
Type:

size_t

Decision Tree Classification

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Classification Decision Tree.

Examples:
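
A minimal sketch; note that compute() additionally takes a pruning data set (here simply a slice of the training data, purely for illustration):

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 5)
    y = np.random.randint(0, 2, size=(100, 1)).astype(np.float64)
    X_prune, y_prune = X[:20], y[:20]   # held-out set used for pruning

    train_algo = d4p.decision_tree_classification_training(nClasses=2)
    train_result = train_algo.compute(X, y, X_prune, y_prune)

    predict_algo = d4p.decision_tree_classification_prediction(nClasses=2)
    predict_result = predict_algo.compute(X, train_result.model)
    print(predict_result.prediction[:5])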

class daal4py.decision_tree_classification_training
Parameters:
  • nClasses (size_t) – Number of classes

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for Decision tree model-based training, double or float

  • method (str) – [optional, default: “defaultDense”] Decision tree training method

  • pruning (str) – [optional, default: “”] Pruning method for Decision tree

  • maxTreeDepth (size_t) – [optional, default: -1] Maximum tree depth. 0 means unlimited depth.

  • minObservationsInLeafNodes (size_t) – [optional, default: -1] Minimum number of observations in the leaf node. Can be any positive number.

  • nBins (size_t) – [optional, default: -1] The number of bins used to compute probabilities of the observations belonging to the class. The only supported value for the current version of the library is 1.

  • splitCriterion (str) – [optional, default: “”] Split criterion for Decision tree classification

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, labels, dataForPruning, labelsForPruning, weights)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Training data set

  • labels (data_or_file) – Labels of the training data set

  • dataForPruning (data_or_file) – Pruning data set

  • labelsForPruning (data_or_file) – Labels of the pruning data set

  • weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

decision_tree_classification_training_result

class daal4py.decision_tree_classification_training_result

Properties:

model
Type:

decision_tree_classification_model

class daal4py.decision_tree_classification_prediction
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for Decision tree model-based prediction

  • method (str) – [optional, default: “defaultDense”] Computation method in the batch processing mode

  • pruning (str) – [optional, default: “”] Pruning method for Decision tree

  • maxTreeDepth (size_t) – [optional, default: -1] Maximum tree depth. 0 means unlimited depth.

  • minObservationsInLeafNodes (size_t) – [optional, default: -1] Minimum number of observations in the leaf node. Can be any positive number.

  • nBins (size_t) – [optional, default: -1] The number of bins used to compute probabilities of the observations belonging to the class. The only supported value for the current version of the library is 1.

  • splitCriterion (str) – [optional, default: “”] Split criterion for Decision tree classification

  • nClasses (size_t) – [optional, default: -1] Number of classes

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, model)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data set

  • model (decision_tree_classification_modelptr) – Input model trained by the classification algorithm

Return type:

classifier_prediction_result

class daal4py.classifier_prediction_result

Properties:

logProbabilities
Type:

Numpy array

prediction
Type:

Numpy array

probabilities
Type:

Numpy array

class daal4py.decision_tree_classification_model

Properties:

NFeatures
Type:

size_t

NumberOfFeatures
Type:

size_t

Gradient Boosted Classification

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Classification Gradient Boosted Tree.

Examples:
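
A minimal sketch with a few illustrative hyperparameters (synthetic NumPy data; values are not prescriptive):

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 5)
    y = np.random.randint(0, 2, size=(100, 1)).astype(np.float64)

    train_algo = d4p.gbt_classification_training(
        nClasses=2,
        maxIterations=40,           # number of boosting rounds
        maxTreeDepth=6,
        minObservationsInLeafNode=5,
    )
    train_result = train_algo.compute(X, y)

    predict_algo = d4p.gbt_classification_prediction(nClasses=2)
    predict_result = predict_algo.compute(X, train_result.model)
    print(predict_result.prediction[:5])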

class daal4py.gbt_classification_training
Parameters:
  • nClasses (size_t) – Number of classes

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for Gradient Boosted Trees, double or float

  • method (str) – [optional, default: “defaultDense”] Gradient Boosted Trees computation method

  • loss (str) – [optional, default: “”] Loss function type

  • varImportance (str) – [optional, default: “”] 64 bit integer flag VariableImportanceModes that indicates the variable importance computation modes

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

  • splitMethod (str) – [optional, default: “”] Split finding method. Default is exact

  • maxIterations (size_t) – [optional, default: -1] Maximal number of iterations of the gradient boosted trees training algorithm. Default is 50

  • maxTreeDepth (size_t) – [optional, default: -1] Maximal tree depth, 0 for unlimited. Default is 6

  • shrinkage (double) – [optional, default: get_nan64()] Learning rate of the boosting procedure. Scales the contribution of each tree by a factor (0, 1]. Default is 0.3

  • minSplitLoss (double) – [optional, default: get_nan64()] Loss regularization parameter. Min loss reduction required to make a further partition on a leaf node of the tree. Range: [0, inf). Default is 0

  • lambda (double) – [optional, default: get_nan64()] L2 regularization parameter on weights. Range: [0, inf). Default is 1

  • observationsPerTreeFraction (double) – [optional, default: get_nan64()] Fraction of observations used for training each tree, sampling without replacement. Range: (0, 1]. Default is 1 (no sampling, entire dataset is used)

  • featuresPerNode (size_t) – [optional, default: -1] Number of features tried as possible splits per node. Range : [0, p] where p is the total number of features. Default is 0 (use all features)

  • minObservationsInLeafNode (size_t) – [optional, default: -1] Minimal number of observations in a leaf node. Default is 5.

  • memorySavingMode (bool) – [optional, default: False] If true then use memory saving (but slower) mode. Default is false

  • engine (engines_batchbase__iface__) – [optional, default: None] Engine for the random number generator used by the algorithms

  • maxBins (size_t) – [optional, default: -1] Used with ‘inexact’ split finding method only. Maximal number of discrete bins to bucket continuous features. Default is 256. Increasing the number results in higher computation costs

  • minBinSize (size_t) – [optional, default: -1] Used with ‘inexact’ split finding method only. Minimal number of observations in a bin. Default is 5

  • internalOptions (int) – [optional, default: -1] Internal options

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Training data set

  • labels (data_or_file) – Labels of the training data set

  • weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

gbt_classification_training_result

class daal4py.gbt_classification_training_result

Properties:

model
Type:

gbt_classification_model

variableImportanceByCover
Type:

Numpy array

variableImportanceByGain
Type:

Numpy array

variableImportanceByTotalCover
Type:

Numpy array

variableImportanceByTotalGain
Type:

Numpy array

variableImportanceByWeight
Type:

Numpy array

class daal4py.gbt_classification_prediction
Parameters:
  • nClasses (size_t) – Number of classes

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the gbt algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] gradient boosted trees computation method

  • nIterations (size_t) – [optional, default: -1] Number of iterations of the trained model to be used for prediction

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, model)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data set

  • model (gbt_classification_modelptr) – Input model trained by the classification algorithm

Return type:

classifier_prediction_result

class daal4py.classifier_prediction_result

Properties:

logProbabilities
Type:

Numpy array

prediction
Type:

Numpy array

probabilities
Type:

Numpy array

class daal4py.gbt_classification_model

Properties:

NFeatures
Type:

size_t

NumberOfFeatures
Type:

size_t

NumberOfTrees
Type:

size_t

k-Nearest Neighbors (kNN)

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library k-Nearest Neighbors (kNN).

Examples:
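
A minimal sketch of training a KD-tree based kNN model and querying it with k = 5 (synthetic NumPy data; values are illustrative):

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 5)
    y = np.random.randint(0, 2, size=(100, 1)).astype(np.float64)

    train_algo = d4p.kdtree_knn_classification_training(nClasses=2)
    train_result = train_algo.compute(X, y)

    predict_algo = d4p.kdtree_knn_classification_prediction(nClasses=2, k=5)
    predict_result = predict_algo.compute(X, train_result.model)
    print(predict_result.prediction[:5])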

class daal4py.kdtree_knn_classification_training
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for KD-tree based kNN model-based training, double or float

  • method (str) – [optional, default: “defaultDense”] KD-tree based kNN training method

  • k (size_t) – [optional, default: -1] Number of neighbors

  • dataUseInModel (str) – [optional, default: “”] Option to enable or disable use of the input dataset in the kNN model

  • engine (engines_batchbase__iface__) – [optional, default: None] Engine for randomly choosing elements from the training dataset

  • resultsToCompute (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

  • voteWeights (str) – [optional, default: “”] Weight function used in prediction

  • nClasses (size_t) – [optional, default: -1] Number of classes

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Training data set

  • labels (data_or_file) – [optional, default: None] Labels of the training data set

  • weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

kdtree_knn_classification_training_result

class daal4py.kdtree_knn_classification_training_result

Properties:

model
Type:

kdtree_knn_classification_model

class daal4py.kdtree_knn_classification_prediction
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for KD-tree based kNN model-based prediction

  • method (str) – [optional, default: “defaultDense”] Computation method in the batch processing mode

  • k (size_t) – [optional, default: -1] Number of neighbors

  • dataUseInModel (str) – [optional, default: “”] Option to enable or disable use of the input dataset in the kNN model

  • engine (engines_batchbase__iface__) – [optional, default: None] Engine for randomly choosing elements from the training dataset

  • resultsToCompute (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

  • voteWeights (str) – [optional, default: “”] Weight function used in prediction

  • nClasses (size_t) – [optional, default: -1] Number of classes

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, model)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data set

  • model (kdtree_knn_classification_modelptr) – Input model trained by the classification algorithm

Return type:

kdtree_knn_classification_prediction_result

class daal4py.classifier_prediction_result

Properties:

logProbabilities
Type:

Numpy array

prediction
Type:

Numpy array

probabilities
Type:

Numpy array

class daal4py.kdtree_knn_classification_model

Properties:

NFeatures
Type:

size_t

NumberOfFeatures
Type:

size_t

Brute-force k-Nearest Neighbors (kNN)

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library k-Nearest Neighbors (kNN).
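
Examples:

The flow mirrors the KD-tree variant above; a minimal sketch with synthetic NumPy data:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 5)
    y = np.random.randint(0, 2, size=(100, 1)).astype(np.float64)

    train_algo = d4p.bf_knn_classification_training(nClasses=2)
    train_result = train_algo.compute(X, y)

    predict_algo = d4p.bf_knn_classification_prediction(nClasses=2, k=5)
    predict_result = predict_algo.compute(X, train_result.model)
    print(predict_result.prediction[:5])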

class daal4py.bf_knn_classification_training
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for BF kNN model-based training, double or float

  • method (str) – [optional, default: “defaultDense”] BF kNN training method

  • k (size_t) – [optional, default: -1] Number of neighbors

  • dataUseInModel (str) – [optional, default: “”] Option to enable or disable use of the input dataset in the kNN model

  • resultsToCompute (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

  • voteWeights (str) – [optional, default: “”] Weight function used in prediction

  • engine (engines_batchbase__iface__) – [optional, default: None] Engine for randomly choosing elements from the training dataset

  • nClasses (size_t) – [optional, default: -1] Number of classes

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Training data set

  • labels (data_or_file) – [optional, default: None] Labels of the training data set

  • weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

bf_knn_classification_training_result

class daal4py.bf_knn_classification_training_result

Properties:

model
Type:

bf_knn_classification_model

class daal4py.bf_knn_classification_prediction
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for BF kNN model-based prediction

  • method (str) – [optional, default: “defaultDense”] Computation method in the batch processing mode

  • k (size_t) – [optional, default: -1] Number of neighbors

  • dataUseInModel (str) – [optional, default: “”] Option to enable or disable use of the input dataset in the kNN model

  • resultsToCompute (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

  • voteWeights (str) – [optional, default: “”] Weight function used in prediction

  • engine (engines_batchbase__iface__) – [optional, default: None] Engine for randomly choosing elements from the training dataset

  • nClasses (size_t) – [optional, default: -1] Number of classes

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, model)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data set

  • model (bf_knn_classification_modelptr) – Input model trained by the classification algorithm

Return type:

bf_knn_classification_prediction_result

class daal4py.classifier_prediction_result

Properties:

logProbabilities
Type:

Numpy array

prediction
Type:

Numpy array

probabilities
Type:

Numpy array

class daal4py.bf_knn_classification_model

Properties:

NFeatures
Type:

size_t

NumberOfFeatures
Type:

size_t

AdaBoost Classification

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Classification AdaBoost.

Examples:
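
A minimal two-class sketch (synthetic NumPy data; when weakLearnerTraining is left at its default, the library trains decision stump weak learners):

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 5)
    y = np.random.randint(0, 2, size=(100, 1)).astype(np.float64)

    train_algo = d4p.adaboost_training(nClasses=2, maxIterations=100)
    train_result = train_algo.compute(X, y)

    predict_algo = d4p.adaboost_prediction(nClasses=2)
    predict_result = predict_algo.compute(X, train_result.model)
    print(predict_result.prediction[:5])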

class daal4py.adaboost_training
Parameters:
  • nClasses (size_t) – Number of classes

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the AdaBoost, double or float

  • method (str) – [optional, default: “defaultDense”] AdaBoost computation method

  • weakLearnerTraining (classifier_training_batch__iface__) – [optional, default: None] The algorithm for weak learner model training

  • weakLearnerPrediction (classifier_prediction_batch__iface__) – [optional, default: None] The algorithm for prediction based on a weak learner model

  • accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the AdaBoost training algorithm

  • maxIterations (size_t) – [optional, default: -1] Maximal number of iterations of the AdaBoost training algorithm

  • learningRate (double) – [optional, default: get_nan64()] Multiplier for each classifier to shrink its contribution

  • resultsToCompute (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Training data set

  • labels (data_or_file) – Labels of the training data set

  • weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

adaboost_training_result

class daal4py.adaboost_training_result

Properties:

model
Type:

adaboost_model

weakLearnersErrors
Type:

Numpy array

class daal4py.adaboost_prediction
Parameters:
  • nClasses (size_t) – Number of classes

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the AdaBoost, double or float

  • method (str) – [optional, default: “defaultDense”] AdaBoost computation method

  • weakLearnerTraining (classifier_training_batch__iface__) – [optional, default: None] The algorithm for weak learner model training

  • weakLearnerPrediction (classifier_prediction_batch__iface__) – [optional, default: None] The algorithm for prediction based on a weak learner model

  • accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the AdaBoost training algorithm

  • maxIterations (size_t) – [optional, default: -1] Maximal number of iterations of the AdaBoost training algorithm

  • learningRate (double) – [optional, default: get_nan64()] Multiplier for each classifier to shrink its contribution

  • resultsToCompute (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, model)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data set

  • model (adaboost_modelptr) – Input model trained by the classification algorithm

Return type:

classifier_prediction_result

class daal4py.classifier_prediction_result

Properties:

logProbabilities
Type:

Numpy array

prediction
Type:

Numpy array

probabilities
Type:

Numpy array

class daal4py.adaboost_model

Properties:

Alpha
Type:

Numpy array

NFeatures
Type:

size_t

NumberOfFeatures
Type:

size_t

NumberOfWeakLearners
Type:

size_t

WeakLearnerModel()
Type:

classifier_model (or derived)

BrownBoost Classification

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Classification BrownBoost.

Examples:
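
A minimal sketch; BrownBoost is a two-class classifier, and the 0/1 label encoding below is an assumption to verify against your library version:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 5)
    y = np.random.randint(0, 2, size=(100, 1)).astype(np.float64)  # assumed 0/1 encoding

    train_algo = d4p.brownboost_training(nClasses=2)
    train_result = train_algo.compute(X, y)

    predict_algo = d4p.brownboost_prediction(nClasses=2)
    predict_result = predict_algo.compute(X, train_result.model)
    print(predict_result.prediction[:5])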

class daal4py.brownboost_training
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for BrownBoost, double or float

  • method (str) – [optional, default: “defaultDense”] BrownBoost computation method

  • weakLearnerTraining (classifier_training_batch__iface__) – [optional, default: None] The algorithm for weak learner model training

  • weakLearnerPrediction (classifier_prediction_batch__iface__) – [optional, default: None] The algorithm for prediction based on a weak learner model

  • accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the BrownBoost training algorithm

  • maxIterations (size_t) – [optional, default: -1] Maximal number of iterations of the BrownBoost training algorithm

  • newtonRaphsonAccuracyThreshold (double) – [optional, default: get_nan64()] Accuracy threshold for Newton-Raphson iterations in the BrownBoost training algorithm

  • newtonRaphsonMaxIterations (size_t) – [optional, default: -1] Maximal number of Newton-Raphson iterations in the BrownBoost training algorithm

  • degenerateCasesThreshold (double) – [optional, default: get_nan64()] Threshold needed to avoid degenerate cases in the BrownBoost training algorithm

  • nClasses (size_t) – [optional, default: -1] Number of classes

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Training data set

  • labels (data_or_file) – Labels of the training data set

  • weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

brownboost_training_result

class daal4py.brownboost_training_result

Properties:

model
Type:

brownboost_model

class daal4py.brownboost_prediction
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the BrownBoost algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] BrownBoost computation method

  • weakLearnerTraining (classifier_training_batch__iface__) – [optional, default: None] The algorithm for weak learner model training

  • weakLearnerPrediction (classifier_prediction_batch__iface__) – [optional, default: None] The algorithm for prediction based on a weak learner model

  • accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the BrownBoost training algorithm

  • maxIterations (size_t) – [optional, default: -1] Maximal number of iterations of the BrownBoost training algorithm

  • newtonRaphsonAccuracyThreshold (double) – [optional, default: get_nan64()] Accuracy threshold for Newton-Raphson iterations in the BrownBoost training algorithm

  • newtonRaphsonMaxIterations (size_t) – [optional, default: -1] Maximal number of Newton-Raphson iterations in the BrownBoost training algorithm

  • degenerateCasesThreshold (double) – [optional, default: get_nan64()] Threshold needed to avoid degenerate cases in the BrownBoost training algorithm

  • nClasses (size_t) – [optional, default: -1] Number of classes

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, model)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data set

  • model (brownboost_modelptr) – Input model trained by the classification algorithm

Return type:

classifier_prediction_result

class daal4py.classifier_prediction_result

Properties:

logProbabilities
Type:

Numpy array

prediction
Type:

Numpy array

probabilities
Type:

Numpy array

class daal4py.brownboost_model

Properties:

Alpha
Type:

Numpy array

NFeatures
Type:

size_t

NumberOfFeatures
Type:

size_t

NumberOfWeakLearners
Type:

size_t

WeakLearnerModel()
Type:

classifier_model (or derived)

LogitBoost Classification

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Classification LogitBoost.

Examples:
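
A minimal multi-class sketch (synthetic NumPy data; hyperparameter values are illustrative):

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 5)
    y = np.random.randint(0, 5, size=(100, 1)).astype(np.float64)  # 5 classes

    train_algo = d4p.logitboost_training(
        nClasses=5, maxIterations=100, accuracyThreshold=0.01
    )
    train_result = train_algo.compute(X, y)

    predict_algo = d4p.logitboost_prediction(nClasses=5)
    predict_result = predict_algo.compute(X, train_result.model)
    print(predict_result.prediction[:5])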

class daal4py.logitboost_training
Parameters:
  • nClasses (size_t) – Number of classes

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for LogitBoost, double or float

  • method (str) – [optional, default: “friedman”] LogitBoost computation method

  • weakLearnerTraining (regression_training_batch__iface__) – [optional, default: None] The algorithm for weak learner model training

  • weakLearnerPrediction (regression_prediction_batch__iface__) – [optional, default: None] The algorithm for prediction based on a weak learner model

  • accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the LogitBoost training algorithm

  • maxIterations (size_t) – [optional, default: -1] Maximal number of terms in additive regression

  • weightsDegenerateCasesThreshold (double) – [optional, default: get_nan64()] Threshold to avoid degenerate cases when calculating weights W

  • responsesDegenerateCasesThreshold (double) – [optional, default: get_nan64()] Threshold to avoid degenerate cases when calculating responses Z

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Training data set

  • labels (data_or_file) – Labels of the training data set

  • weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

logitboost_training_result

class daal4py.logitboost_training_result

Properties:

model
Type:

logitboost_model

class daal4py.logitboost_prediction
Parameters:
  • nClasses (size_t) – Number of classes

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the LogitBoost algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] LogitBoost computation method

  • weakLearnerTraining (regression_training_batch__iface__) – [optional, default: None] The algorithm for weak learner model training

  • weakLearnerPrediction (regression_prediction_batch__iface__) – [optional, default: None] The algorithm for prediction based on a weak learner model

  • accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the LogitBoost training algorithm

  • maxIterations (size_t) – [optional, default: -1] Maximal number of terms in additive regression

  • weightsDegenerateCasesThreshold (double) – [optional, default: get_nan64()] Threshold to avoid degenerate cases when calculating weights W

  • responsesDegenerateCasesThreshold (double) – [optional, default: get_nan64()] Threshold to avoid degenerate cases when calculating responses Z

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, model)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data set

  • model (logitboost_modelptr) – Input model trained by the classification algorithm

Return type:

classifier_prediction_result

class daal4py.classifier_prediction_result

Properties:

logProbabilities
Type:

Numpy array

prediction
Type:

Numpy array

probabilities
Type:

Numpy array

class daal4py.logitboost_model

Properties:

Iterations
Type:

size_t

NFeatures
Type:

size_t

NumberOfFeatures
Type:

size_t

NumberOfWeakLearners
Type:

size_t

WeakLearnerModel()
Type:

regression_model (or derived)

Stump Weak Learner Classification

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Classification Weak Learner Stump.

Examples:
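
A minimal two-class sketch; the stump is a single-split weak learner, typically consumed by the boosting algorithms above (0/1 labels assumed):

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 5)
    y = np.random.randint(0, 2, size=(100, 1)).astype(np.float64)

    train_algo = d4p.stump_classification_training(nClasses=2)
    train_result = train_algo.compute(X, y)

    predict_algo = d4p.stump_classification_prediction(nClasses=2)
    predict_result = predict_algo.compute(X, train_result.model)
    print(predict_result.prediction[:5])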

class daal4py.stump_classification_training
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the decision stump training method, double or float

  • method (str) – [optional, default: “defaultDense”] Decision stump training method

  • splitCriterion (str) – [optional, default: “”] Split criterion for stump classification

  • varImportance (str) – [optional, default: “”] Variable importance computation mode

  • nClasses (size_t) – [optional, default: -1] Number of classes

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Training data set

  • labels (data_or_file) – Labels of the training data set

  • weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

stump_classification_training_result

class daal4py.stump_classification_training_result

Properties:

model
Type:

stump_classification_model

variableImportance
Type:

Numpy array

class daal4py.stump_classification_prediction
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the decision stump prediction algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] Decision stump model-based prediction method

  • nClasses (size_t) – [optional, default: -1] Number of classes

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, model)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data set

  • model (stump_classification_modelptr) – Input model trained by the classification algorithm

Return type:

classifier_prediction_result

class daal4py.classifier_prediction_result

Properties:

logProbabilities
Type:

Numpy array

prediction
Type:

Numpy array

probabilities
Type:

Numpy array

class daal4py.stump_classification_model

Properties:

LeftValue
Type:

double

NFeatures
Type:

size_t

NumberOfFeatures
Type:

size_t

RightValue
Type:

double

SplitFeature
Type:

size_t

SplitValue
Type:

double

Multinomial Naive Bayes

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Naive Bayes.

Examples:
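
A minimal sketch; multinomial naive Bayes expects non-negative feature values (e.g. word counts), so the synthetic data below is integer-valued:

    import numpy as np
    import daal4py as d4p

    # Synthetic count features (e.g. term frequencies), 2 classes
    X = np.random.randint(0, 10, size=(100, 20)).astype(np.float64)
    y = np.random.randint(0, 2, size=(100, 1)).astype(np.float64)

    train_algo = d4p.multinomial_naive_bayes_training(nClasses=2)
    train_result = train_algo.compute(X, y)

    predict_algo = d4p.multinomial_naive_bayes_prediction(nClasses=2)
    predict_result = predict_algo.compute(X, train_result.model)
    print(predict_result.prediction[:5])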

class daal4py.multinomial_naive_bayes_training
Parameters:
  • nClasses (size_t) – Number of classes

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for multinomial naive Bayes training, double or float

  • method (str) – [optional, default: “defaultDense”] Computation method

  • priorClassEstimates (array) – [optional, default: None] Prior class estimates

  • alpha (array) – [optional, default: None] Imagined occurrences of each word

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

  • distributed (bool) – [optional, default: False] enable distributed computation (SPMD)

  • streaming (bool) – [optional, default: False] enable streaming

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Training data set

  • labels (data_or_file) – Labels of the training data set

  • weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

multinomial_naive_bayes_training_result

class daal4py.multinomial_naive_bayes_training_result

Properties:

model
Type:

multinomial_naive_bayes_model

class daal4py.multinomial_naive_bayes_prediction
Parameters:
  • nClasses (size_t) – Number of classes

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for prediction based on the multinomial naive Bayes model, double or float

  • method (str) – [optional, default: “defaultDense”] Multinomial naive Bayes prediction method

  • priorClassEstimates (array) – [optional, default: None] Prior class estimates

  • alpha (array) – [optional, default: None] Imagined occurrences of each word

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, model)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data set

  • model (multinomial_naive_bayes_modelptr) – Input model trained by the classification algorithm

Return type:

classifier_prediction_result

class daal4py.classifier_prediction_result

Properties:

logProbabilities
Type:

Numpy array

prediction
Type:

Numpy array

probabilities
Type:

Numpy array

class daal4py.multinomial_naive_bayes_model

Properties:

AuxTable
Type:

Numpy array

LogP
Type:

Numpy array

LogTheta
Type:

Numpy array

NFeatures
Type:

size_t

NumberOfFeatures
Type:

size_t

Support Vector Machine (SVM)

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library SVM.

Note: For the labels parameter, class labels must be encoded as -1 and 1.

Examples:
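
A minimal two-class sketch; per the note above, labels are encoded as -1 and 1, and the raw prediction is a decision-function value thresholded at zero (d4p.kernel_function_linear is a daal4py kernel constructor assumed available):

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 5)
    y = np.where(np.random.rand(100, 1) > 0.5, 1.0, -1.0)  # labels in {-1, 1}

    kernel = d4p.kernel_function_linear()
    train_algo = d4p.svm_training(kernel=kernel, C=1.0)
    train_result = train_algo.compute(X, y)

    predict_algo = d4p.svm_prediction(kernel=kernel)
    predict_result = predict_algo.compute(X, train_result.model)
    # Threshold the decision function to recover class labels
    predicted_labels = np.where(predict_result.prediction >= 0, 1, -1)
    print(predicted_labels[:5])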

class daal4py.svm_training
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the SVM training algorithm, double or float

  • method (str) – [optional, default: “boser”] SVM training method

  • C (double) – [optional, default: get_nan64()] Upper bound in constraints of the quadratic optimization problem

  • accuracyThreshold (double) – [optional, default: get_nan64()] Training accuracy

  • tau (double) – [optional, default: get_nan64()] Tau parameter of the working set selection scheme

  • maxIterations (size_t) – [optional, default: -1] Maximal number of iterations for the algorithm

  • cacheSize (size_t) – [optional, default: -1] Size of cache in bytes to store values of the kernel matrix. A non-zero value enables use of a cache optimization technique

  • doShrinking (bool) – [optional, default: False] Flag that enables use of the shrinking optimization technique

  • shrinkingStep (size_t) – [optional, default: -1] Number of iterations between the steps of shrinking optimization technique

  • kernel (kernel_function_kerneliface__iface__) – [optional, default: None] Kernel function

  • nClasses (size_t) – [optional, default: -1] Number of classes

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Training data set

  • labels (data_or_file) – Labels of the training data set

  • weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

svm_training_result

class daal4py.svm_training_result

Properties:

model
Type:

svm_model

class daal4py.svm_prediction
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the SVM prediction algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] SVM prediction method

  • C (double) – [optional, default: get_nan64()] Upper bound in constraints of the quadratic optimization problem

  • accuracyThreshold (double) – [optional, default: get_nan64()] Training accuracy

  • tau (double) – [optional, default: get_nan64()] Tau parameter of the working set selection scheme

  • maxIterations (size_t) – [optional, default: -1] Maximal number of iterations for the algorithm

  • cacheSize (size_t) – [optional, default: -1] Size of cache in bytes to store values of the kernel matrix. A non-zero value enables use of a cache optimization technique

  • doShrinking (bool) – [optional, default: False] Flag that enables use of the shrinking optimization technique

  • shrinkingStep (size_t) – [optional, default: -1] Number of iterations between the steps of shrinking optimization technique

  • kernel (kernel_function_kerneliface__iface__) – [optional, default: None] Kernel function

  • nClasses (size_t) – [optional, default: -1] Number of classes

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, model)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data set

  • model (svm_modelptr) – Input model trained by the classification algorithm

Return type:

classifier_prediction_result

class daal4py.classifier_prediction_result

Properties:

logProbabilities
Type:

Numpy array

prediction
Type:

Numpy array

probabilities
Type:

Numpy array

class daal4py.svm_model

Properties:

Bias
Type:

double

ClassificationCoefficients
Type:

Numpy array

NFeatures
Type:

size_t

NumberOfFeatures
Type:

size_t

SupportIndices
Type:

Numpy array

SupportVectors
Type:

Numpy array

Logistic Regression

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Logistic Regression.

Examples:
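
A minimal two-class sketch; the prediction step also shows the string form of the resultsToEvaluate flag used to request probabilities alongside labels:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 5)
    y = np.random.randint(0, 2, size=(100, 1)).astype(np.float64)

    train_algo = d4p.logistic_regression_training(nClasses=2, interceptFlag=True)
    train_result = train_algo.compute(X, y)

    predict_algo = d4p.logistic_regression_prediction(
        nClasses=2,
        resultsToEvaluate="computeClassLabels|computeClassProbabilities",
    )
    predict_result = predict_algo.compute(X, train_result.model)
    print(predict_result.prediction[:5])
    print(predict_result.probabilities[:5])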

class daal4py.logistic_regression_training
Parameters:
  • nClasses (size_t) – Number of classes

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for logistic regression, double or float

  • method (str) – [optional, default: “defaultDense”] logistic regression computation method

  • interceptFlag (bool) – [optional, default: False] Whether the intercept needs to be computed

  • penaltyL1 (float) – [optional, default: get_nan32()] L1 regularization coefficient. Default is 0 (not applied)

  • penaltyL2 (float) – [optional, default: get_nan32()] L2 regularization coefficient. Default is 0 (not applied)

  • optimizationSolver (optimization_solver_iterative_solver_batch__iface__) – [optional, default: None] Optimization solver to use. Default is the SGD momentum solver

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, labels, weights)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Training data set

  • labels (data_or_file) – Labels of the training data set

  • weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

logistic_regression_training_result

class daal4py.logistic_regression_training_result

Properties:

model
Type:

logistic_regression_model

class daal4py.logistic_regression_prediction
Parameters:
  • nClasses (size_t) – Number of classes

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the logistic regression algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] logistic regression computation method

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, model)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data set

  • model (logistic_regression_modelptr) – Input model trained by the classification algorithm

Return type:

classifier_prediction_result

class daal4py.classifier_prediction_result

Properties:

logProbabilities
Type:

Numpy array

prediction
Type:

Numpy array

probabilities
Type:

Numpy array

class daal4py.logistic_regression_model

Properties:

Beta
Type:

Numpy array

InterceptFlag
Type:

bool

NFeatures
Type:

size_t

NumberOfBetas
Type:

size_t

NumberOfFeatures
Type:

size_t

Regression

See also Intel(R) oneAPI Data Analytics Library Regression.

Decision Forest Regression

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Regression Decision Forest.

Examples:
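
A minimal sketch requesting out-of-bag error and raw MDA variable importance (synthetic NumPy data; d4p.engines_mt2203 is a daal4py engine constructor assumed available):

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 5)
    y = np.random.rand(100, 1)   # continuous dependent variable

    train_algo = d4p.decision_forest_regression_training(
        nTrees=100,
        varImportance="MDA_Raw",
        bootstrap=True,
        engine=d4p.engines_mt2203(seed=777),
        resultsToCompute="computeOutOfBagError",
    )
    train_result = train_algo.compute(X, y)

    predict_algo = d4p.decision_forest_regression_prediction()
    predict_result = predict_algo.compute(X, train_result.model)
    print(predict_result.prediction[:5])
    print(train_result.outOfBagError)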

class daal4py.decision_forest_regression_training
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for decision forest model-based training, double or float

  • method (str) – [optional, default: “defaultDense”] decision forest training method

  • nTrees (size_t) – [optional, default: -1] Number of trees in the forest. Default is 10

  • observationsPerTreeFraction (double) – [optional, default: get_nan64()] Fraction of observations used for training each tree, 0 to 1. Default is 1 (sampling with replacement)

  • featuresPerNode (size_t) – [optional, default: -1] Number of features tried as possible splits per node. If 0 then sqrt(p) for classification, p/3 for regression, where p is the total number of features.

  • maxTreeDepth (size_t) – [optional, default: -1] Maximal tree depth. Default is 0 (unlimited)

  • minObservationsInLeafNode (size_t) – [optional, default: -1] Minimal number of observations in a leaf node. Default is 1 for classification, 5 for regression.

  • engine (engines_batchbase__iface__) – [optional, default: None] Engine for the random number generator used by the algorithms

  • impurityThreshold (double) – [optional, default: get_nan64()] Threshold value used as the stopping criterion: if the impurity value in the node is smaller than the threshold, the node is not split further.

  • varImportance (str) – [optional, default: “”] Variable importance computation mode

  • resultsToCompute (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

  • memorySavingMode (bool) – [optional, default: False] If true then use memory saving (but slower) mode

  • bootstrap (bool) – [optional, default: False] If true then training set for a tree is a bootstrap of the whole training set

  • minObservationsInSplitNode (size_t) – [optional, default: -1] Minimal number of observations in a split node. Default 2

  • minWeightFractionInLeafNode (double) – [optional, default: get_nan64()] The minimum weighted fraction of the sum total of weights (of all the input observations) required to be at a leaf node, 0.0 to 0.5. Default is 0.0

  • minImpurityDecreaseInSplitNode (double) – [optional, default: get_nan64()] A node will be split if this split induces a decrease of the impurity greater than or equal to the value, non-negative. Default is 0.0

  • maxLeafNodes (size_t) – [optional, default: -1] Maximum number of leaf nodes. Default is 0 (unlimited)

  • maxBins (size_t) – [optional, default: -1] Used with ‘hist’ split finding method only. Maximal number of discrete bins to bucket continuous features. Default is 256. Increasing the number results in higher computation costs

  • minBinSize (size_t) – [optional, default: -1] Used with ‘hist’ split finding method only. Minimal number of observations in a bin. Default is 5

compute(data, dependentVariable, weights)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • dependentVariable (data_or_file) – Values of the dependent variable for the input data

  • weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

decision_forest_regression_training_result

class daal4py.decision_forest_regression_training_result

Properties:

model
Type:

decision_forest_regression_model

outOfBagError
Type:

Numpy array

outOfBagErrorPerObservation
Type:

Numpy array

outOfBagErrorPrediction
Type:

Numpy array

outOfBagErrorR2
Type:

Numpy array

variableImportance
Type:

Numpy array

class daal4py.decision_forest_regression_prediction
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for decision forest model-based prediction

  • method (str) – [optional, default: “defaultDense”] Computation method in the batch processing mode

compute(data, model)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • model (decision_forest_regression_modelptr) – Trained decision tree model

Return type:

decision_forest_regression_prediction_result

class daal4py.decision_forest_regression_prediction_result

Properties:

prediction
Type:

Numpy array

class daal4py.decision_forest_regression_model

Properties:

NumberOfFeatures
Type:

size_t

NumberOfTrees
Type:

size_t

Decision Tree Regression

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Regression Decision Tree.

Examples:
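
A minimal sketch; as in the classification variant, compute() takes a pruning data set (here simply a slice of the training data, purely for illustration):

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 5)
    y = np.random.rand(100, 1)
    X_prune, y_prune = X[:20], y[:20]   # held-out set used for pruning

    train_algo = d4p.decision_tree_regression_training()
    train_result = train_algo.compute(X, y, X_prune, y_prune)

    predict_algo = d4p.decision_tree_regression_prediction()
    predict_result = predict_algo.compute(X, train_result.model)
    print(predict_result.prediction[:5])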

class daal4py.decision_tree_regression_training
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for Decision tree model-based training, double or float

  • method (str) – [optional, default: “defaultDense”] Decision tree training method

  • pruning (str) – [optional, default: “”] Pruning method for Decision tree

  • maxTreeDepth (size_t) – [optional, default: -1] Maximum tree depth. 0 means unlimited depth.

  • minObservationsInLeafNodes (size_t) – [optional, default: -1] Minimum number of observations in the leaf node. Can be any positive number.

compute(data, dependentVariables, dataForPruning, dependentVariablesForPruning, weights)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • dependentVariables (data_or_file) – Values of the dependent variable for the input data

  • dataForPruning (data_or_file) – Pruning data set

  • dependentVariablesForPruning (data_or_file) – Values of the dependent variable for the pruning data set

  • weights (data_or_file) – [optional, default: None] Optional. Weights of the observations in the training data set

Return type:

decision_tree_regression_training_result

class daal4py.decision_tree_regression_training_result

Properties:

model
Type:

decision_tree_regression_model

class daal4py.decision_tree_regression_prediction
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for Decision tree model-based prediction

  • method (str) – [optional, default: “defaultDense”] Computation method in the batch processing mode

  • pruning (str) – [optional, default: “”] Pruning method for Decision tree

  • maxTreeDepth (size_t) – [optional, default: -1] Maximum tree depth. 0 means unlimited depth.

  • minObservationsInLeafNodes (size_t) – [optional, default: -1] Minimum number of observations in the leaf node. Can be any positive number.

compute(data, model)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • model (decision_tree_regression_modelptr) – Trained decision tree model

Return type:

decision_tree_regression_prediction_result

class daal4py.decision_tree_regression_prediction_result

Properties:

prediction
Type:

Numpy array

class daal4py.decision_tree_regression_model

Properties:

NumberOfFeatures
Type:

size_t

Gradient Boosted Regression

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Regression Gradient Boosted Tree.

Examples:
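
A minimal sketch (synthetic NumPy data; 40 boosting iterations is an illustrative choice):

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 5)
    y = np.random.rand(100, 1)

    train_algo = d4p.gbt_regression_training(maxIterations=40)
    train_result = train_algo.compute(X, y)

    predict_algo = d4p.gbt_regression_prediction()
    predict_result = predict_algo.compute(X, train_result.model)
    print(predict_result.prediction[:5])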

class daal4py.gbt_regression_training
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for model-based training, double or float

  • method (str) – [optional, default: “defaultDense”] gradient boosted trees training method

  • loss (str) – [optional, default: “”] Loss function type

  • varImportance (str) – [optional, default: “”] 64 bit integer flag VariableImportanceModes that indicates the variable importance computation modes

  • splitMethod (str) – [optional, default: “”] Split finding method. Default is exact

  • maxIterations (size_t) – [optional, default: -1] Maximal number of iterations of the gradient boosted trees training algorithm. Default is 50

  • maxTreeDepth (size_t) – [optional, default: -1] Maximal tree depth, 0 for unlimited. Default is 6

  • shrinkage (double) – [optional, default: get_nan64()] Learning rate of the boosting procedure. Scales the contribution of each tree by a factor (0, 1]. Default is 0.3

  • minSplitLoss (double) – [optional, default: get_nan64()] Loss regularization parameter. Min loss reduction required to make a further partition on a leaf node of the tree. Range: [0, inf). Default is 0

  • lambda (double) – [optional, default: get_nan64()] L2 regularization parameter on weights. Range: [0, inf). Default is 1

  • observationsPerTreeFraction (double) – [optional, default: get_nan64()] Fraction of observations used for training each tree, sampling without replacement. Range: (0, 1]. Default is 1 (no sampling, entire dataset is used)

  • featuresPerNode (size_t) – [optional, default: -1] Number of features tried as possible splits per node. Range : [0, p] where p is the total number of features. Default is 0 (use all features)

  • minObservationsInLeafNode (size_t) – [optional, default: -1] Minimal number of observations in a leaf node. Default is 5.

  • memorySavingMode (bool) – [optional, default: False] If true then use memory saving (but slower) mode. Default is false

  • engine (engines_batchbase__iface__) – [optional, default: None] Engine for the random number generator used by the algorithms

  • maxBins (size_t) – [optional, default: -1] Used with ‘inexact’ split finding method only. Maximal number of discrete bins to bucket continuous features. Default is 256. Increasing the number results in higher computation costs

  • minBinSize (size_t) – [optional, default: -1] Used with ‘inexact’ split finding method only. Minimal number of observations in a bin. Default is 5

  • internalOptions (int) – [optional, default: -1] Internal options

compute(data, dependentVariable)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • dependentVariable (data_or_file) – Values of the dependent variable for the input data

Return type:

gbt_regression_training_result

class daal4py.gbt_regression_training_result

Properties:

model
Type:

gbt_regression_model

variableImportanceByCover
Type:

Numpy array

variableImportanceByGain
Type:

Numpy array

variableImportanceByTotalCover
Type:

Numpy array

variableImportanceByTotalGain
Type:

Numpy array

variableImportanceByWeight
Type:

Numpy array

class daal4py.gbt_regression_prediction
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for model-based prediction

  • method (str) – [optional, default: “defaultDense”] Computation method in the batch processing mode

  • nIterations (size_t) – [optional, default: -1] Number of iterations of the trained model to be used for prediction

compute(data, model)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • model (gbt_regression_modelptr) – Trained gradient boosted trees model

Return type:

gbt_regression_prediction_result

class daal4py.gbt_regression_prediction_result

Properties:

prediction
Type:

Numpy array

class daal4py.gbt_regression_model

Properties:

NumberOfFeatures
Type:

size_t

NumberOfTrees
Type:

size_t

Linear Regression

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Linear Regression.

Examples:
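
A minimal sketch; the model's Beta property (documented below) holds the fitted coefficients, and multiple dependent variables may be passed at once:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 5)
    y = np.random.rand(100, 2)   # two dependent variables are allowed

    train_algo = d4p.linear_regression_training(interceptFlag=True)
    train_result = train_algo.compute(X, y)
    print(train_result.model.Beta)   # coefficients, including the intercept

    predict_algo = d4p.linear_regression_prediction()
    predict_result = predict_algo.compute(X, train_result.model)
    print(predict_result.prediction[:5])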

class daal4py.linear_regression_training
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for linear regression model-based training, double or float

  • method (str) – [optional, default: “normEqDense”] Linear regression training method

  • interceptFlag (bool) – [optional, default: False] Flag that indicates whether the intercept needs to be computed

  • distributed (bool) – [optional, default: False] enable distributed computation (SPMD)

  • streaming (bool) – [optional, default: False] enable streaming

compute(data, dependentVariables)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • dependentVariables (data_or_file) – Values of the dependent variable for the input data

Return type:

linear_regression_training_result

class daal4py.linear_regression_training_result

Properties:

model
Type:

linear_regression_model

class daal4py.linear_regression_prediction
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for linear regression model-based prediction

  • method (str) – [optional, default: “defaultDense”] Computation method in the batch processing mode

compute(data, model)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • model (linear_regression_modelptr) – Trained linear regression model

Return type:

linear_regression_prediction_result

class daal4py.linear_regression_prediction_result

Properties:

prediction
Type:

Numpy array

class daal4py.linear_regression_model

Properties:

Beta
Type:

Numpy array

InterceptFlag
Type:

bool

NumberOfBetas
Type:

size_t

NumberOfFeatures
Type:

size_t

NumberOfResponses
Type:

size_t

Least Absolute Shrinkage and Selection Operator

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Least Absolute Shrinkage and Selection Operator.

Examples:
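
A minimal sketch with default solver and lasso parameters (synthetic NumPy data):

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 5)
    y = np.random.rand(100, 1)

    train_algo = d4p.lasso_regression_training(interceptFlag=True)
    train_result = train_algo.compute(X, y)

    predict_algo = d4p.lasso_regression_prediction()
    predict_result = predict_algo.compute(X, train_result.model)
    print(predict_result.prediction[:5])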

class daal4py.lasso_regression_training
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for lasso regression model-based training, double or float

  • method (str) – [optional, default: “defaultDense”] LASSO regression training method

  • lassoParameters (array) – [optional, default: None] Numeric table that contains values of lasso parameters

  • optimizationSolver (optimization_solver_iterative_solver_batch__iface__) – [optional, default: None] Default is coordinate descent solver

  • dataUseInComputation (str) – [optional, default: “”] Flag that indicates whether the algorithm may modify (corrupt) the input data during computation

  • optResultToCompute (str) – [optional, default: “”] 64 bit integer flag that indicates the optional results to compute

  • interceptFlag (bool) – [optional, default: False] Flag that indicates whether the intercept needs to be computed

compute(data, dependentVariables, weights, gramMatrix)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • dependentVariables (data_or_file) – Values of the dependent variable for the input data

  • weights (data_or_file) – [optional, default: None] NumericTable of size 1 x n with sample weights. Applied for all methods

  • gramMatrix (data_or_file) – [optional, default: None] NumericTable of size p x p with the precomputed Gram matrix. Applied for all methods

Return type:

lasso_regression_training_result

class daal4py.lasso_regression_training_result

Properties:

gramMatrixId
Type:

Numpy array

model
Type:

lasso_regression_model

class daal4py.lasso_regression_prediction
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for lasso regression model-based prediction

  • method (str) – [optional, default: “defaultDense”] Computation method in the batch processing mode

compute(data, model)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • model (lasso_regression_modelptr) – Trained lasso regression model

Return type:

lasso_regression_prediction_result

class daal4py.lasso_regression_prediction_result

Properties:

prediction
Type:

Numpy array

class daal4py.lasso_regression_model

Properties:

Beta
Type:

Numpy array

InterceptFlag
Type:

bool

NumberOfBetas
Type:

size_t

NumberOfFeatures
Type:

size_t

NumberOfResponses
Type:

size_t
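
The LASSO classes follow the same round-trip pattern; a minimal sketch with lassoParameters and the optimization solver left at their defaults:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 5)
    y = np.random.rand(100, 1)

    res = d4p.lasso_regression_training(interceptFlag=True).compute(X, y)
    pred = d4p.lasso_regression_prediction().compute(X, res.model)
    print(pred.prediction[:5])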

Ridge Regression

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Ridge Regression.

Examples:

class daal4py.ridge_regression_training
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for ridge regression model-based training, double or float

  • method (str) – [optional, default: “normEqDense”] Ridge regression training method

  • ridgeParameters (array) – [optional, default: None] Numeric table that contains values of ridge parameters

  • interceptFlag (bool) – [optional, default: False] Flag that indicates whether the intercept needs to be computed

  • distributed (bool) – [optional, default: False] enable distributed computation (SPMD)

  • streaming (bool) – [optional, default: False] enable streaming

compute(data, dependentVariables)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • dependentVariables (data_or_file) – Values of the dependent variable for the input data

Return type:

ridge_regression_training_result

class daal4py.ridge_regression_training_result

Properties:

model
Type:

ridge_regression_model

class daal4py.ridge_regression_prediction
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for ridge regression model-based prediction

  • method (str) – [optional, default: “defaultDense”] Computation method in the batch processing mode

compute(data, model)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • model (ridge_regression_modelptr) – Trained ridge regression model

Return type:

ridge_regression_prediction_result

class daal4py.ridge_regression_prediction_result

Properties:

prediction
Type:

Numpy array

class daal4py.ridge_regression_model

Properties:

Beta
Type:

Numpy array

InterceptFlag
Type:

bool

NumberOfBetas
Type:

size_t

NumberOfFeatures
Type:

size_t

NumberOfResponses
Type:

size_t
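
Ridge regression again follows the same pattern; a minimal sketch with ridgeParameters left at its library default:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 5)
    y = np.random.rand(100, 1)

    res = d4p.ridge_regression_training(interceptFlag=True).compute(X, y)
    pred = d4p.ridge_regression_prediction().compute(X, res.model)
    print(pred.prediction[:5])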

Stump Regression

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Regression Stump.

Examples:

class daal4py.stump_regression_training
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the decision stump training method, double or float

  • method (str) – [optional, default: “defaultDense”] Decision stump training method

  • varImportance (str) – [optional, default: “”] Variable importance mode. Variable importance computation is not supported in the current version of the library

compute(data, dependentVariables, weights)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • dependentVariables (data_or_file) – Values of the dependent variable for the input data

  • weights (data_or_file) – [optional, default: None] Weights of the observations in the training data set. Some values are skipped for backward compatibility.

Return type:

stump_regression_training_result

class daal4py.stump_regression_training_result

Properties:

model
Type:

stump_regression_model

variableImportance
Type:

Numpy array

class daal4py.stump_regression_prediction
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the decision stump prediction algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] Decision stump model-based prediction method

  • varImportance (str) – [optional, default: “”] Variable importance mode. Variable importance computation is not supported in the current version of the library

compute(data, model)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • model (stump_regression_modelptr) – Trained regression model

Return type:

stump_regression_prediction_result

class daal4py.stump_regression_prediction_result

Properties:

prediction
Type:

Numpy array

class daal4py.stump_regression_model

Properties:

LeftValue
Type:

double

NumberOfFeatures
Type:

size_t

RightValue
Type:

double

SplitFeature
Type:

size_t

SplitValue
Type:

double
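
A minimal sketch that trains a one-split stump and reads the split back from the model properties above:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(50, 2)
    y = np.random.rand(50, 1)

    res = d4p.stump_regression_training().compute(X, y)
    m = res.model
    print(m.SplitFeature, m.SplitValue, m.LeftValue, m.RightValue)

    pred = d4p.stump_regression_prediction().compute(X, res.model)
    print(pred.prediction[:5])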

Principal Component Analysis (PCA)

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library PCA.

Examples:

class daal4py.pca
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for PCA, double or float

  • method (str) – [optional, default: “correlationDense”] PCA computation method

  • resultsToCompute (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

  • nComponents (size_t) – [optional, default: -1] number of components for reduced implementation

  • isDeterministic (bool) – [optional, default: False] If true, eigenvector signs are flipped when required so that results are deterministic

  • normalization (normalization_zscore_batchimpl__iface__) – [optional, default: None] Pointer to the z-score normalization algorithm used on the input data

  • distributed (bool) – [optional, default: False] enable distributed computation (SPMD)

compute(data, correlation)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • correlation (data_or_file) – [optional, default: None] Input correlation table

Return type:

pca_result

class daal4py.pca_result

Properties:

dataForTransform
Type:

Numpy array

eigenvalues
Type:

Numpy array

eigenvectors
Type:

Numpy array

means
Type:

Numpy array

variances
Type:

Numpy array
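
A minimal sketch; the resultsToCompute flag string ("mean|variance|eigenvalue") follows the convention used in the daal4py examples:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 5)

    res = d4p.pca(isDeterministic=True,
                  resultsToCompute="mean|variance|eigenvalue").compute(X)
    print(res.eigenvalues)
    print(res.eigenvectors)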

Principal Component Analysis (PCA) Transform

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library PCA Transform.

Examples:

class daal4py.pca_transform
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the PCA transformation, double or float

  • method (str) – [optional, default: “defaultDense”] PCA transformation computation method

  • nComponents (size_t) – [optional, default: -1] Number of components to keep in the transformed data

compute(data, eigenvectors, dataForTransform)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • eigenvectors (data_or_file) – Transformation matrix of eigenvectors

  • dataForTransform (dict_numerictableptr) – Data for transform

Return type:

pca_transform_result

class daal4py.pca_transform_result

Properties:

transformedData
Type:

Numpy array
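
A sketch chaining pca into pca_transform; the dataForTransform result of the PCA step carries the means, variances, and eigenvalues the transform needs:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 5)

    pca_res = d4p.pca(isDeterministic=True,
                      resultsToCompute="mean|variance|eigenvalue").compute(X)
    tr = d4p.pca_transform(nComponents=2).compute(X, pca_res.eigenvectors,
                                                  pca_res.dataForTransform)
    print(tr.transformedData.shape)    # (100, 2)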

K-Means Clustering

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library K-Means Clustering.

Examples:

K-Means Initialization

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library K-Means Initialization.

class daal4py.kmeans_init
Parameters:
  • nClusters (size_t) – Number of clusters

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of initial clusters for K-Means algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] Method of computing initial clusters for the algorithm

  • nTrials (size_t) – [optional, default: -1] Kmeans++ only. The number of trials to generate all clusters but the first initial cluster.

  • oversamplingFactor (double) – [optional, default: get_nan64()] Kmeans|| only. A fraction of nClusters chosen in each of the nRounds of k-means||: L = nClusters * oversamplingFactor points are sampled in each round.

  • nRounds (size_t) – [optional, default: -1] Kmeans|| only. Number of rounds for k-means||. (oversamplingFactor*nRounds) > 1 is a requirement.

  • engine (engines_batchbase__iface__) – [optional, default: None] Engine to be used for generating random numbers for the initialization

  • distributed (bool) – [optional, default: False] enable distributed computation (SPMD)

compute(data)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table

Return type:

kmeans_init_result

class daal4py.kmeans_init_result

Properties:

centroids
Type:

Numpy array

K-Means

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library K-Means Computation.

class daal4py.kmeans
Parameters:
  • nClusters (size_t) – Number of clusters

  • maxIterations (size_t) – Number of iterations

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of K-Means, double or float

  • method (str) – [optional, default: “lloydDense”] Computation method of the algorithm

  • accuracyThreshold (double) – [optional, default: get_nan64()] Threshold for the termination of the algorithm

  • gamma (double) – [optional, default: get_nan64()] Weight used in distance computation for categorical features

  • distanceType (str) – [optional, default: “”] Distance used in the algorithm

  • resultsToEvaluate (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

  • assignFlag (bool) – [optional, default: False] If true, compute assignments of the data points to clusters

  • distributed (bool) – [optional, default: False] enable distributed computation (SPMD)

compute(data, inputCentroids)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • inputCentroids (data_or_file) – Initial centroids for the algorithm

Return type:

kmeans_result

class daal4py.kmeans_result

Properties:

assignments
Type:

Numpy array

centroids
Type:

Numpy array

nIterations
Type:

Numpy array

objectiveFunction
Type:

Numpy array
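
A sketch combining the two steps above; method="plusPlusDense" (k-means++ initialization) is an illustrative choice:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(300, 2)

    init = d4p.kmeans_init(nClusters=3, method="plusPlusDense").compute(X)
    res = d4p.kmeans(3, 10, assignFlag=True).compute(X, init.centroids)

    print(res.centroids)
    print(res.assignments[:10])
    print(res.nIterations)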

Density-Based Spatial Clustering of Applications with Noise

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Density-Based Spatial Clustering of Applications with Noise.

Examples:

class daal4py.dbscan
Parameters:
  • epsilon (double) – Radius of neighborhood

  • minObservations (size_t) – Minimal total weight of observations in neighborhood of core observation

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of DBSCAN, double or float

  • method (str) – [optional, default: “defaultDense”] Computation method of the algorithm

  • memorySavingMode (bool) – [optional, default: False] If true then use memory saving (but slower) mode

  • resultsToCompute (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

  • blockIndex (size_t) – [optional, default: -1] Unique identifier of block initially passed for computation on the local node

  • nBlocks (size_t) – [optional, default: -1] Number of blocks initially passed for computation on all nodes

  • leftBlocks (size_t) – [optional, default: -1] Number of blocks that will process observations with the value of the selected split feature less than the selected split value

  • rightBlocks (size_t) – [optional, default: -1] Number of blocks that will process observations with the value of the selected split feature greater than the selected split value

  • distributed (bool) – [optional, default: False] enable distributed computation (SPMD)

compute(data, weights)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • weights (data_or_file) – [optional, default: None] Input weights of observations

Return type:

dbscan_result

class daal4py.dbscan_result

Properties:

assignments
Type:

Numpy array

coreIndices
Type:

Numpy array

coreObservations
Type:

Numpy array

nClusters
Type:

Numpy array
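
A minimal sketch with illustrative neighborhood parameters:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(200, 2)

    # epsilon = 0.1, minObservations = 5
    res = d4p.dbscan(0.1, 5).compute(X)
    print(res.nClusters)
    print(res.assignments[:10])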

Outlier Detection

Multivariate Outlier Detection

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Multivariate Outlier Detection.

Examples:

class daal4py.multivariate_outlier_detection
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the multivariate outlier detection, double or float

  • method (str) – [optional, default: “defaultDense”] Multivariate outlier detection computation method

compute(data, location, scatter, threshold)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • location (data_or_file) – [optional, default: None] Vector of mean estimates of size 1 x p

  • scatter (data_or_file) – [optional, default: None] Measure of spread, the variance-covariance matrix of size p x p

  • threshold (data_or_file) – [optional, default: None] Limit that defines the outlier region, the array of size 1 x 1 containing a non-negative number

Return type:

multivariate_outlier_detection_result

class daal4py.multivariate_outlier_detection_result

Properties:

weights
Type:

Numpy array
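
A minimal sketch with location, scatter, and threshold left to their default estimates; in the result, a weight of 0 flags an outlier and 1 a regular observation:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 3)

    res = d4p.multivariate_outlier_detection().compute(X)
    print(res.weights[:10])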

Univariate Outlier Detection

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Univariate Outlier Detection.

Examples:

class daal4py.univariate_outlier_detection
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the univariate outlier detection algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] univariate outlier detection computation method

compute(data, location, scatter, threshold)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • location (data_or_file) – [optional, default: None] Vector of mean estimates of size 1 x p

  • scatter (data_or_file) – [optional, default: None] Measure of spread, the array of standard deviations of size 1 x p

  • threshold (data_or_file) – [optional, default: None] Limit that defines the outlier region, the array of non-negative numbers of size 1 x p

Return type:

univariate_outlier_detection_result

class daal4py.univariate_outlier_detection_result

Properties:

weights
Type:

Numpy array

Multivariate Bacon Outlier Detection

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Multivariate Bacon Outlier Detection.

Examples:

class daal4py.bacon_outlier_detection
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the BACON outlier detection, double or float

  • method (str) – [optional, default: “defaultDense”] BACON outlier detection computation method

  • initMethod (str) – [optional, default: “”] Initialization method

  • alpha (double) – [optional, default: get_nan64()] One-tailed probability that defines the (1 - alpha) quantile of the chi^2 distribution with p degrees of freedom. Recommended value: alpha / n, where n is the number of observations.

  • toleranceToConverge (double) – [optional, default: get_nan64()] Stopping criterion: the algorithm is terminated if the size of the basic subset is changed by less than the threshold

compute(data)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table

Return type:

bacon_outlier_detection_result

class daal4py.bacon_outlier_detection_result

Properties:

weights
Type:

Numpy array

Optimization Solvers

Objective Functions

Mean Squared Error Algorithm (MSE)

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library MSE.

Examples:

class daal4py.optimization_solver_mse
Parameters:
  • numberOfTerms (size_t) – The number of terms in the function

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the Mean squared error objective function, double or float

  • method (str) – [optional, default: “defaultDense”] The Mean squared error objective function computation method

  • interceptFlag (bool) – [optional, default: False] Whether the intercept needs to be computed. Default is true

  • penaltyL1 (array) – [optional, default: None] L1 regularization coefficients. Default is 0 (not applied)

  • penaltyL2 (array) – [optional, default: None] L2 regularization coefficients. Default is 0 (not applied)

  • batchIndices (array) – [optional, default: None] Numeric table of size 1 x m, where m is the batch size, that represents a batch of indices used to compute the function results, e.g., the value of the sum of the functions. If no indices are provided, all terms are used in the computations.

  • featureId (size_t) – [optional, default: -1] The feature index to compute part of gradient/hessian/proximal projection

  • resultsToCompute (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, dependentVariables, argument, weights, gramMatrix)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Numeric table of size n x p with data

  • dependentVariables (data_or_file) – Numeric table of size n x 1 with dependent variables

  • argument (data_or_file) – Numeric table of size 1 x p with input argument of the objective function

  • weights (data_or_file) – NumericTable of size 1 x n with sample weights. Applied for all methods

  • gramMatrix (data_or_file) – NumericTable of size p x p with the precomputed Gram matrix. Applied for all methods

Return type:

optimization_solver_objective_function_result

setup(data, dependentVariables, argument, weights, gramMatrix)

Setup (partial) input data for using algorithm object in other algorithms.

Parameters:
  • data (data_or_file) – Numeric table of size n x p with data

  • dependentVariables (data_or_file) – Numeric table of size n x 1 with dependent variables

  • argument (data_or_file) – Numeric table of size 1 x p with input argument of the objective function

  • weights (data_or_file) – NumericTable of size 1 x n with sample weights. Applied for all methods

  • gramMatrix (data_or_file) – NumericTable of size p x p with the precomputed Gram matrix. Applied for all methods

Return type:

None

daal4py.optimization_solver_mse_result

alias of optimization_solver_objective_function_result

Logistic Loss

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Logistic Loss.

Examples:

class daal4py.optimization_solver_logistic_loss
Parameters:
  • numberOfTerms (size_t) – The number of terms in the function

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the Logistic loss objective function, double or float

  • method (str) – [optional, default: “defaultDense”] The Logistic loss objective function computation method

  • interceptFlag (bool) – [optional, default: False] Whether the intercept needs to be computed. Default is true

  • penaltyL1 (float) – [optional, default: get_nan32()] L1 regularization coefficient. Default is 0 (not applied)

  • penaltyL2 (float) – [optional, default: get_nan32()] L2 regularization coefficient. Default is 0 (not applied)

  • batchIndices (array) – [optional, default: None] Numeric table of size 1 x m, where m is the batch size, that represents a batch of indices used to compute the function results, e.g., the value of the sum of the functions. If no indices are provided, all terms are used in the computations.

  • featureId (size_t) – [optional, default: -1] The feature index to compute part of gradient/hessian/proximal projection

  • resultsToCompute (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, dependentVariables, argument)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Numeric table of size n x p with data

  • dependentVariables (data_or_file) – Numeric table of size n x 1 with dependent variables

  • argument (data_or_file) – Numeric table of size 1 x p with input argument of the objective function

Return type:

optimization_solver_objective_function_result

setup(data, dependentVariables, argument)

Setup (partial) input data for using algorithm object in other algorithms.

Parameters:
  • data (data_or_file) – Numeric table of size n x p with data

  • dependentVariables (data_or_file) – Numeric table of size n x 1 with dependent variables

  • argument (data_or_file) – Numeric table of size 1 x p with input argument of the objective function

Return type:

None

daal4py.optimization_solver_logistic_loss_result

alias of optimization_solver_objective_function_result

Cross-entropy Loss

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Cross Entropy Loss.

Examples:

class daal4py.optimization_solver_cross_entropy_loss
Parameters:
  • nClasses (size_t) – Number of classes (different values of dependent variable)

  • numberOfTerms (size_t) – The number of terms in the function

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the Cross-entropy loss objective function, double or float

  • method (str) – [optional, default: “defaultDense”] The Cross-entropy loss objective function computation method

  • interceptFlag (bool) – [optional, default: False] Whether the intercept needs to be computed. Default is true

  • penaltyL1 (float) – [optional, default: get_nan32()] L1 regularization coefficient. Default is 0 (not applied)

  • penaltyL2 (float) – [optional, default: get_nan32()] L2 regularization coefficient. Default is 0 (not applied)

  • batchIndices (array) – [optional, default: None] Numeric table of size 1 x m, where m is the batch size, that represents a batch of indices used to compute the function results, e.g., the value of the sum of the functions. If no indices are provided, all terms are used in the computations.

  • featureId (size_t) – [optional, default: -1] The feature index to compute part of gradient/hessian/proximal projection

  • resultsToCompute (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

compute(data, dependentVariables, argument)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Numeric table of size n x p with data

  • dependentVariables (data_or_file) – Numeric table of size n x 1 with dependent variables

  • argument (data_or_file) – Numeric table of size 1 x p with input argument of the objective function

Return type:

optimization_solver_objective_function_result

setup(data, dependentVariables, argument)

Setup (partial) input data for using algorithm object in other algorithms.

Parameters:
  • data (data_or_file) – Numeric table of size n x p with data

  • dependentVariables (data_or_file) – Numeric table of size n x 1 with dependent variables

  • argument (data_or_file) – Numeric table of size 1 x p with input argument of the objective function

Return type:

None

daal4py.optimization_solver_cross_entropy_loss_result

alias of optimization_solver_objective_function_result

Iterative Solvers

Stochastic Gradient Descent Algorithm

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library SGD.

Examples:

class daal4py.optimization_solver_sgd
Parameters:
  • function (optimization_solver_sum_of_functions_batch__iface__) – Objective function represented as sum of functions

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the Stochastic gradient descent algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] Stochastic gradient descent computation method

  • batchIndices (array) – [optional, default: None] Numeric table that represents 32 bit integer indices of terms in the objective function. If no indices are provided, the implementation will generate random indices.

  • learningRateSequence (array) – [optional, default: None] Numeric table that contains values of the learning rate sequence

  • engine (engines_batchbase__iface__) – [optional, default: None] Engine for random generation of 32 bit integer indices of terms in the objective function.

  • nIterations (size_t) – [optional, default: -1] Maximal number of iterations of the algorithm

  • accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the algorithm. The algorithm terminates when this accuracy is achieved

  • optionalResultRequired (bool) – [optional, default: False] Indicates whether optional result is required

  • batchSize (size_t) – [optional, default: -1] Number of batch indices to compute the stochastic gradient. If batchSize is equal to the number of terms in objective function then no random sampling is performed, and all terms are used to calculate the gradient. This parameter is ignored if batchIndices is provided.

  • conservativeSequence (array) – [optional, default: None] Numeric table of values of the conservative coefficient sequence

  • innerNIterations (size_t) – [optional, default: -1]

  • momentum (double) – [optional, default: get_nan64()] Momentum value

compute(inputArgument)

Do the actual computation on provided input data.

Parameters:

inputArgument (data_or_file) – Initial value to start optimization

Return type:

optimization_solver_sgd_result

class daal4py.optimization_solver_sgd_result

Properties:

minimum
Type:

Numpy array

nIterations
Type:

Numpy array
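
A sketch wiring the MSE objective above into the SGD solver, following the pattern of the daal4py optimization examples; the start-point shape of (p + 1) x 1 (coefficients plus intercept) is an assumption about the objective's argument layout:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 3)
    y = np.random.rand(100, 1)

    # MSE objective over all 100 terms; inputs are attached via setup()
    mse = d4p.optimization_solver_mse(X.shape[0])
    mse.setup(X, y)

    sgd = d4p.optimization_solver_sgd(mse,
                                      learningRateSequence=np.array([[0.01]]),
                                      nIterations=1000,
                                      accuracyThreshold=1e-6)
    res = sgd.compute(np.zeros((X.shape[1] + 1, 1)))
    print(res.minimum)
    print(res.nIterations)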

Limited-Memory Broyden-Fletcher-Goldfarb-Shanno Algorithm

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library LBFGS.

Examples:

class daal4py.optimization_solver_lbfgs
Parameters:
  • function (optimization_solver_sum_of_functions_batch__iface__) – Objective function represented as sum of functions

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the LBFGS algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] LBFGS computation method

  • m (size_t) – [optional, default: -1] Memory parameter of LBFGS. The maximum number of correction pairs that define the approximation of inverse Hessian matrix.

  • L (size_t) – [optional, default: -1] The number of iterations between the curvature estimates calculations

  • engine (engines_batchbase__iface__) – [optional, default: None] Engine for random choosing terms from objective function.

  • batchIndices (array) – [optional, default: None]

  • correctionPairBatchSize (size_t) – [optional, default: -1] Number of observations to compute the sub-sampled Hessian for correction pairs computation

  • correctionPairBatchIndices (array) – [optional, default: None]

  • stepLengthSequence (array) – [optional, default: None]

  • nIterations (size_t) – [optional, default: -1] Maximal number of iterations of the algorithm

  • accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the algorithm. The algorithm terminates when this accuracy is achieved

  • optionalResultRequired (bool) – [optional, default: False] Indicates whether optional result is required

  • batchSize (size_t) – [optional, default: -1] Number of batch indices to compute the stochastic gradient. If batchSize is equal to the number of terms in objective function then no random sampling is performed, and all terms are used to calculate the gradient. This parameter is ignored if batchIndices is provided.

compute(inputArgument)

Do the actual computation on provided input data.

Parameters:

inputArgument (data_or_file) – Initial value to start optimization

Return type:

optimization_solver_lbfgs_result

class daal4py.optimization_solver_lbfgs_result

Properties:

minimum
Type:

Numpy array

nIterations
Type:

Numpy array
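
A sketch pairing LBFGS with the cross-entropy loss objective from the next subsection; the argument layout of nClasses * (p + 1) rows is an assumption:

    import numpy as np
    import daal4py as d4p

    n, p, nClasses = 200, 4, 3
    X = np.random.rand(n, p)
    y = np.random.randint(0, nClasses, (n, 1)).astype(np.float64)

    loss = d4p.optimization_solver_cross_entropy_loss(nClasses, n,
                                                      interceptFlag=True)
    loss.setup(X, y)

    solver = d4p.optimization_solver_lbfgs(loss,
                                           stepLengthSequence=np.array([[1.0e-4]]),
                                           nIterations=500)
    res = solver.compute(np.zeros((nClasses * (p + 1), 1)))
    print(res.minimum.shape)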

Adaptive Subgradient Method

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library AdaGrad.

Examples:

class daal4py.optimization_solver_adagrad
Parameters:
  • function (optimization_solver_sum_of_functions_batch__iface__) – Objective function represented as sum of functions

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the Adaptive gradient descent algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] Adaptive gradient descent computation method

  • batchIndices (array) – [optional, default: None] Numeric table that represents 32 bit integer indices of terms in the objective function. If no indices are provided, the implementation will generate random indices.

  • learningRate (array) – [optional, default: None] Numeric table that contains value of the learning rate

  • degenerateCasesThreshold (double) – [optional, default: get_nan64()] Value needed to avoid degenerate cases in square root computing.

  • engine (engines_batchbase__iface__) – [optional, default: None] Engine for random generation of 32 bit integer indices of terms in the objective function.

  • nIterations (size_t) – [optional, default: -1] Maximal number of iterations of the algorithm

  • accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the algorithm. The algorithm terminates when this accuracy is achieved

  • optionalResultRequired (bool) – [optional, default: False] Indicates whether optional result is required

  • batchSize (size_t) – [optional, default: -1] Number of batch indices to compute the stochastic gradient. If batchSize is equal to the number of terms in objective function then no random sampling is performed, and all terms are used to calculate the gradient. This parameter is ignored if batchIndices is provided.

compute(inputArgument)

Do the actual computation on provided input data.

Parameters:

inputArgument (data_or_file) – Initial value to start optimization

Return type:

optimization_solver_adagrad_result

class daal4py.optimization_solver_adagrad_result

Properties:

minimum
Type:

Numpy array

nIterations
Type:

Numpy array

Stochastic Average Gradient Descent

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Stochastic Average Gradient Descent SAGA.

Examples:

class daal4py.optimization_solver_saga
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the Stochastic average gradient descent algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] Stochastic average gradient descent computation method

  • batchIndices (array) – [optional, default: None] Numeric table that represents 32 bit integer indices of terms in the objective function. If no indices are provided, the implementation will generate random indices.

  • learningRateSequence (array) – [optional, default: None] Numeric table that contains values of the learning rate sequence

  • engine (engines_batchbase__iface__) – [optional, default: None] Engine for random generation of 32 bit integer indices of terms in the objective function.

  • function (optimization_solver_sum_of_functions_batch__iface__) – [optional, default: None] Objective function represented as sum of functions

  • nIterations (size_t) – [optional, default: -1] Maximal number of iterations of the algorithm

  • accuracyThreshold (double) – [optional, default: get_nan64()] Accuracy of the algorithm. The algorithm terminates when this accuracy is achieved

  • optionalResultRequired (bool) – [optional, default: False] Indicates whether optional result is required

  • batchSize (size_t) – [optional, default: -1] Number of batch indices to compute the stochastic gradient. If batchSize is equal to the number of terms in objective function then no random sampling is performed, and all terms are used to calculate the gradient. This parameter is ignored if batchIndices is provided.

compute(inputArgument, gradientsTable)

Do the actual computation on provided input data.

Parameters:
  • inputArgument (data_or_file) – Initial value to start optimization

  • gradientsTable (data_or_file) – Numeric table of size p x 1 with the values of G, where each value is an accumulated sum of squares of corresponding gradient’s coordinate values.

Return type:

optimization_solver_saga_result

class daal4py.optimization_solver_saga_result

Properties:

gradientsTable
Type:

Numpy array

minimum
Type:

Numpy array

nIterations
Type:

Numpy array

Distances

Cosine Distance Matrix

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Cosine Distance.

Examples:

class daal4py.cosine_distance
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the cosine distance, double or float

  • method (str) – [optional, default: “defaultDense”] Cosine distance computation method

compute(data)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table

Return type:

cosine_distance_result

class daal4py.cosine_distance_result

Properties:

cosineDistance
Type:

Numpy array
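
A minimal sketch computing pairwise distances between the rows of the input:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(10, 4)

    res = d4p.cosine_distance().compute(X)
    print(res.cosineDistance.shape)    # 10 x 10 distance matrix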

Correlation Distance Matrix

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Correlation Distance.

Examples:

class daal4py.correlation_distance
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the correlation distance algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] Correlation distance computation method

compute(data)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table

Return type:

correlation_distance_result

class daal4py.correlation_distance_result

Properties:

correlationDistance
Type:

Numpy array

Expectation-Maximization (EM)

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Expectation-Maximization.

Initialization for the Gaussian Mixture Model

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Expectation-Maximization Initialization.

Examples:

class daal4py.em_gmm_init
Parameters:
  • nComponents (size_t) – Number of components in the Gaussian mixture model

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of initial values for the EM for GMM algorithm, double or float

  • method (str) – [optional, default: “defaultDense”]

  • nTrials (size_t) – [optional, default: -1] Number of trials of short EM runs

  • nIterations (size_t) – [optional, default: -1] Number of iterations in every short EM run

  • accuracyThreshold (double) – [optional, default: get_nan64()] Threshold for the termination of the algorithm

  • covarianceStorage (str) – [optional, default: “”] Type of covariance in the Gaussian mixture model.

  • engine (engines_batchbase__iface__) – [optional, default: None] Engine to be used for randomly generating data points to start the initialization of short EM

compute(data)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table

Return type:

em_gmm_init_result

class daal4py.em_gmm_init_result

Properties:

covariances
Type:

Numpy array

means
Type:

Numpy array

weights
Type:

Numpy array

EM algorithm for the Gaussian Mixture Model

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Expectation-Maximization for the Gaussian Mixture Model.

Examples:

class daal4py.em_gmm
Parameters:
  • nComponents (size_t) – Number of components in the Gaussian mixture model

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the EM for GMM algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] EM for GMM computation method

  • maxIterations (size_t) – [optional, default: -1] Maximal number of iterations of the algorithm.

  • accuracyThreshold (double) – [optional, default: get_nan64()] Threshold for the termination of the algorithm.

  • regularizationFactor (double) – [optional, default: get_nan64()] Factor for covariance regularization in case of ill-conditioned data

  • covarianceStorage (str) – [optional, default: “”] Type of covariance in the Gaussian mixture model.

compute(data, inputWeights, inputMeans, inputCovariances)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table

  • inputWeights (data_or_file) – Input weights

  • inputMeans (data_or_file) – Input means

  • inputCovariances (list_numerictableptr) – Collection of input covariances

Return type:

em_gmm_result

class daal4py.em_gmm_result

Properties:

covariances
Type:

Numpy array

goalFunction
Type:

Numpy array

means
Type:

Numpy array

nIterations
Type:

Numpy array

weights
Type:

Numpy array
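
A sketch chaining the initialization into the EM step, passing the initial weights, means, and covariances through:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(500, 2)

    init = d4p.em_gmm_init(nComponents=3).compute(X)
    res = d4p.em_gmm(nComponents=3).compute(X, init.weights, init.means,
                                            init.covariances)
    print(res.means)
    print(res.goalFunction, res.nIterations)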

QR Decomposition

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library QR Decomposition.

QR Decomposition (without pivoting)

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library QR Decomposition without pivoting.

Examples:

class daal4py.qr
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the QR decomposition algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] Computation method of the algorithm

  • distributed (bool) – [optional, default: False] enable distributed computation (SPMD)

  • streaming (bool) – [optional, default: False] enable streaming

compute(data)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table

Return type:

qr_result

class daal4py.qr_result

Properties:

matrixQ
Type:

Numpy array

matrixR
Type:

Numpy array
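
A minimal sketch verifying the factorization:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(20, 5)

    res = d4p.qr().compute(X)
    # Q (20 x 5) times R (5 x 5) reconstructs the input
    print(np.allclose(X, res.matrixQ @ res.matrixR))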

Pivoted QR Decomposition

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Pivoted QR Decomposition.

Examples:

class daal4py.pivoted_qr
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of the pivoted QR algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] Computation method

  • permutedColumns (array) – [optional, default: None] On entry, if the i-th element of permutedColumns != 0, the i-th column of the input matrix is moved to the beginning of Data * P before the computation and is fixed in place during the computation. If the i-th element of permutedColumns = 0, the i-th column of the input data is a free column (that is, it may be interchanged during the computation with any other free column).

compute(data)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table

Return type:

pivoted_qr_result

class daal4py.pivoted_qr_result

Properties:

matrixQ
Type:

Numpy array

matrixR
Type:

Numpy array

permutationMatrix
Type:

Numpy array

Normalization

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Normalization.

Z-Score

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Z-Score.

Examples:

class daal4py.normalization_zscore
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the z-score normalization, double or float

  • method (str) – [optional, default: “defaultDense”] Z-score normalization computation method

  • resultsToCompute (str) – [optional, default: “”] 64 bit integer flag that indicates the results to compute

  • doScale (bool) – [optional, default: False] Boolean flag that indicates the mode of computation: if true, both centering and scaling are performed; otherwise, only centering.

compute(data)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table

Return type:

normalization_zscore_result

class daal4py.normalization_zscore_result

Properties:

means
Type:

Numpy array

normalizedData
Type:

Numpy array

variances
Type:

Numpy array
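
A minimal sketch with both centering and scaling enabled:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 4)

    res = d4p.normalization_zscore(doScale=True).compute(X)
    print(res.normalizedData.mean(axis=0))    # ~0 per column
    print(res.normalizedData.std(axis=0))     # ~1 per column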

Min-Max

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Min-Max.

Examples:

class daal4py.normalization_minmax
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the min-max normalization, double or float

  • method (str) – [optional, default: “defaultDense”] Min-max normalization computation method

  • lowerBound (double) – [optional, default: get_nan64()] The lower bound of the range that feature values are mapped to during normalization.

  • upperBound (double) – [optional, default: get_nan64()] The upper bound of the range that feature values are mapped to during normalization.

compute(data)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table

Return type:

normalization_minmax_result

class daal4py.normalization_minmax_result

Properties:

normalizedData
Type:

Numpy array
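
A minimal sketch mapping feature values onto [0, 1]:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 4) * 10.0

    res = d4p.normalization_minmax(lowerBound=0.0, upperBound=1.0).compute(X)
    print(res.normalizedData.min(), res.normalizedData.max())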

Random Number Engines

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Engines.

class daal4py.engines_result

Properties:

randomNumbers
Type:

Numpy array

mt19937

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library mt19937.

class daal4py.engines_mt19937
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of mt19937 engine, double or float

  • method (str) – [optional, default: “defaultDense”] Computation method of the engine

  • seed (size_t) – [optional, default: -1] seed

compute(tableToFill)

Do the actual computation on provided input data.

Parameters:

tableToFill (data_or_file) – Input table to fill with random numbers

Return type:

engines_result

daal4py.engines_mt19937_result

alias of engines_result

mt2203

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library mt2203.

class daal4py.engines_mt2203
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of mt2203 engine, double or float

  • method (str) – [optional, default: “defaultDense”] Computation method of the engine

  • seed (size_t) – [optional, default: -1] seed

compute(tableToFill)

Do the actual computation on provided input data.

Parameters:

tableToFill (data_or_file) – Input table to fill with random numbers

Return type:

engines_result

daal4py.engines_mt2203_result

alias of engines_result

mcg59

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library mcg59.

class daal4py.engines_mcg59
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of mcg59 engine, double or float

  • method (str) – [optional, default: “defaultDense”] Computation method of the engine

  • seed (size_t) – [optional, default: -1] seed

compute(tableToFill)

Do the actual computation on provided input data.

Parameters:

tableToFill (data_or_file) – Input table to fill with random numbers

Return type:

engines_result

daal4py.engines_mcg59_result

alias of engines_result

Distributions

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Distributions.

Bernoulli

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Bernoulli Distribution.

Examples:

class daal4py.distributions_bernoulli
Parameters:
  • p (double) – Success probability of a trial, value from [0.0; 1.0]

  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of bernoulli distribution, double or float

  • method (str) – [optional, default: “defaultDense”] Computation method of the distribution

  • engine (engines_batchbase__iface__) – [optional, default: None] Pointer to the engine

compute(tableToFill)

Do the actual computation on provided input data.

Parameters:

tableToFill (data_or_file) – Input table to fill with random numbers

Return type:

distributions_result

daal4py.distributions_bernoulli_result

alias of distributions_result

Normal

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Normal Distribution.

Examples:

class daal4py.distributions_normal
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of normal distribution, double or float

  • method (str) – [optional, default: “defaultDense”] Computation method of the distribution

  • a (double) – [optional, default: get_nan64()] Mean

  • sigma (double) – [optional, default: get_nan64()] Standard deviation

  • engine (engines_batchbase__iface__) – [optional, default: None] Pointer to the engine

compute(tableToFill)

Do the actual computation on provided input data.

Parameters:

tableToFill (data_or_file) – Input table to fill with random numbers

Return type:

distributions_result

daal4py.distributions_normal_result

alias of distributions_result

Uniform

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Uniform Distribution.

Examples:

class daal4py.distributions_uniform
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of uniform distribution, double or float

  • method (str) – [optional, default: “defaultDense”] Computation method of the distribution

  • a (double) – [optional, default: get_nan64()] Left bound a

  • b (double) – [optional, default: get_nan64()] Right bound b

  • engine (engines_batchbase__iface__) – [optional, default: None] Pointer to the engine

compute(tableToFill)

Do the actual computation on provided input data.

Parameters:

tableToFill (data_or_file) – Input table to fill with random numbers

Return type:

distributions_result

daal4py.distributions_uniform_result

alias of distributions_result
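
A sketch pairing an engine with the uniform distribution above; that the distributions result exposes randomNumbers the same way engines_result does is an assumption:

    import numpy as np
    import daal4py as d4p

    # The engine supplies the random stream; the distribution fills the table
    table = np.zeros((2, 10))
    engine = d4p.engines_mt19937(seed=777)
    res = d4p.distributions_uniform(a=0.0, b=1.0, engine=engine).compute(table)
    print(res.randomNumbers)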

Association Rules

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Association Rules.

Examples:

class daal4py.association_rules
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the association rules algorithm, double or float

  • method (str) – [optional, default: “apriori”] Association rules algorithm computation method

  • minSupport (double) – [optional, default: get_nan64()] Minimum support 0.0 <= minSupport < 1.0

  • minConfidence (double) – [optional, default: get_nan64()] Minimum confidence 0.0 <= minConfidence < 1.0

  • nUniqueItems (size_t) – [optional, default: -1] Number of unique items

  • nTransactions (size_t) – [optional, default: -1] Number of transactions

  • discoverRules (bool) – [optional, default: False] Flag. If true, association rules are built from large itemsets

  • itemsetsOrder (str) – [optional, default: “”] Format of the resulting itemsets

  • rulesOrder (str) – [optional, default: “”] Format of the resulting association rules

  • minItemsetSize (size_t) – [optional, default: -1] Minimum number of items in a large itemset

  • maxItemsetSize (size_t) – [optional, default: -1] Maximum number of items in a large itemset. Set to zero to not limit the upper boundary for the size of large itemsets

compute(data)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table

Return type:

association_rules_result

class daal4py.association_rules_result

Properties:

antecedentItemsets
Type:

Numpy array

confidence
Type:

Numpy array

consequentItemsets
Type:

Numpy array

largeItemsets
Type:

Numpy array

largeItemsetsSupport
Type:

Numpy array
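
A sketch on hypothetical toy data; the two-column (transaction id, item id) layout matches the transactions format used by the daal4py examples, and discoverRules is set explicitly to request rules:

    import numpy as np
    import daal4py as d4p

    transactions = np.array([[0, 1], [0, 2],
                             [1, 1], [1, 2], [1, 3],
                             [2, 1], [2, 2]], dtype=np.float64)

    res = d4p.association_rules(minSupport=0.5, minConfidence=0.5,
                                discoverRules=True).compute(transactions)
    print(res.largeItemsets)
    print(res.antecedentItemsets, res.consequentItemsets, res.confidence)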

Cholesky Decomposition

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Cholesky Decomposition.

Examples:

class daal4py.cholesky
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the Cholesky decomposition algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] Cholesky decomposition computation method

compute(data)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table

Return type:

cholesky_result

class daal4py.cholesky_result

Properties:

choleskyFactor
Type:

Numpy array
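
A minimal sketch on a synthetic symmetric positive-definite matrix:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(10, 5)
    A = X.T @ X + 5.0 * np.eye(5)    # symmetric positive definite

    res = d4p.cholesky().compute(A)
    L = res.choleskyFactor           # lower-triangular factor
    print(np.allclose(A, L @ L.T))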

Correlation and Variance-Covariance Matrices

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Correlation and Variance-Covariance Matrices.

Examples:

class daal4py.covariance
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of the correlation or variance-covariance matrix, double or float

  • method (str) – [optional, default: “defaultDense”] Computation method

  • outputMatrixType (str) – [optional, default: “”] Type of the computed matrix

  • distributed (bool) – [optional, default: False] enable distributed computation (SPMD)

  • streaming (bool) – [optional, default: False] enable streaming

compute(data)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table

Return type:

covariance_result

class daal4py.covariance_result

Properties:

correlation
Type:

Numpy array

covariance
Type:

Numpy array

mean
Type:

Numpy array
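
A minimal sketch reading the variance-covariance matrix and the column means:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 3)

    res = d4p.covariance().compute(X)
    print(res.covariance)    # 3 x 3 matrix
    print(res.mean)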

Implicit Alternating Least Squares (implicit ALS)

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Implicit Alternating Least Squares.

Examples:

class daal4py.implicit_als_training
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for implicit ALS model training, double or float

  • method (str) – [optional, default: “defaultDense”] Implicit ALS training method

  • nFactors (size_t) – [optional, default: -1] Number of factors

  • maxIterations (size_t) – [optional, default: -1] Maximum number of iterations of the implicit ALS training algorithm

  • alpha (double) – [optional, default: get_nan64()] Confidence parameter of the implicit ALS training algorithm

  • lambda (double) – [optional, default: get_nan64()] Regularization parameter

  • preferenceThreshold (double) – [optional, default: get_nan64()] Threshold used to define preference values

compute(data, inputModel)

Do the actual computation on provided input data.

Parameters:
  • data (data_or_file) – Input data table that contains ratings

  • inputModel (implicit_als_modelptr) – Initial model that contains initialized factors

Return type:

implicit_als_training_result

class daal4py.implicit_als_training_result

Properties:

model
Type:

implicit_als_model

class daal4py.implicit_als_model

Properties:

ItemsFactors
Type:

Numpy array

UsersFactors
Type:

Numpy array

class daal4py.implicit_als_prediction_ratings
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for implicit ALS model-based prediction, double or float

  • method (str) – [optional, default: “defaultDense”] Implicit ALS prediction method

  • nFactors (size_t) – [optional, default: -1] Number of factors

  • maxIterations (size_t) – [optional, default: -1] Maximum number of iterations of the implicit ALS training algorithm

  • alpha (double) – [optional, default: get_nan64()] Confidence parameter of the implicit ALS training algorithm

  • lambda (double) – [optional, default: get_nan64()] Regularization parameter

  • preferenceThreshold (double) – [optional, default: get_nan64()] Threshold used to define preference values

compute(model)

Do the actual computation on provided input data.

Parameters:

model (implicit_als_modelptr) – Input model trained by the ALS algorithm

Return type:

implicit_als_prediction_ratings_result

class daal4py.implicit_als_prediction_ratings_result

Properties:

prediction
Type:

Numpy array
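
Training needs an initial model with initialized factors. This sketch assumes daal4py's implicit_als_training_init class (not listed in this section) to produce it:

    import numpy as np
    import daal4py as d4p

    ratings = np.random.rand(20, 10)    # users x items interactions

    init = d4p.implicit_als_training_init(nFactors=2).compute(ratings)
    res = d4p.implicit_als_training(nFactors=2, maxIterations=5).compute(
        ratings, init.model)

    pred = d4p.implicit_als_prediction_ratings(nFactors=2).compute(res.model)
    print(pred.prediction.shape)        # predicted ratings, users x items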

Moments of Low Order

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Moments of Low Order.

Examples:

class daal4py.low_order_moments
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations of the low order moments, double or float

  • method (str) – [optional, default: “defaultDense”] Computation method of the algorithm

  • estimatesToCompute (str) – [optional, default: “”] Estimates to be computed by the algorithm

  • distributed (bool) – [optional, default: False] enable distributed computation (SPMD)

  • streaming (bool) – [optional, default: False] enable streaming

compute(data)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table

Return type:

low_order_moments_result

class daal4py.low_order_moments_result

Properties:

maximum
Type:

Numpy array

mean
Type:

Numpy array

minimum
Type:

Numpy array

secondOrderRawMoment
Type:

Numpy array

standardDeviation
Type:

Numpy array

sum
Type:

Numpy array

sumSquares
Type:

Numpy array

sumSquaresCentered
Type:

Numpy array

variance
Type:

Numpy array

variation
Type:

Numpy array
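
A minimal sketch reading a few of the computed estimates:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 3)

    res = d4p.low_order_moments().compute(X)
    print(res.mean, res.variance)
    print(res.minimum, res.maximum)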

Quantiles

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Quantiles.

Examples:

class daal4py.quantiles
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the quantile algorithms, double or float

  • method (str) – [optional, default: “defaultDense”] Quantiles computation method

  • quantileOrders (array) – [optional, default: None] Numeric table with quantile orders. Default value is 0.5 (median)

compute(data)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table

Return type:

quantiles_result

class daal4py.quantiles_result

Properties:

quantiles
Type:

Numpy array
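
A minimal sketch; the 1 x m shape of the quantileOrders table is an assumption about its layout:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(100, 2)

    orders = np.array([[0.25, 0.5, 0.75]])
    res = d4p.quantiles(quantileOrders=orders).compute(X)
    print(res.quantiles)    # one row of quantiles per feature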

Singular Value Decomposition (SVD)

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library SVD.

Examples:

class daal4py.svd
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the SVD algorithm, double or float

  • method (str) – [optional, default: “defaultDense”] SVD computation method

  • leftSingularMatrix (str) – [optional, default: “”] Format of the matrix of left singular vectors

  • rightSingularMatrix (str) – [optional, default: “”] Format of the matrix of right singular vectors

  • distributed (bool) – [optional, default: False] enable distributed computation (SPMD)

  • streaming (bool) – [optional, default: False] enable streaming

compute(data)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table

Return type:

svd_result

class daal4py.svd_result

Properties:

leftSingularMatrix
Type:

Numpy array

rightSingularMatrix
Type:

Numpy array

singularValues
Type:

Numpy array
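
A minimal sketch verifying the decomposition; singularValues comes back as a 1 x p table, hence the [0] indexing:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(20, 5)

    res = d4p.svd().compute(X)
    U, s, V = res.leftSingularMatrix, res.singularValues, res.rightSingularMatrix
    print(np.allclose(X, U @ np.diag(s[0]) @ V))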

Sorting

Parameters and semantics are described in Intel(R) oneAPI Data Analytics Library Sorting.

Examples:

class daal4py.sorting
Parameters:
  • fptype (str) – [optional, default: “double”] Data type to use in intermediate computations for the sorting, double or float

  • method (str) – [optional, default: “defaultDense”] Sorting computation method

compute(data)

Do the actual computation on provided input data.

Parameters:

data (data_or_file) – Input data table

Return type:

sorting_result

class daal4py.sorting_result

Properties:

sortedData
Type:

Numpy array
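
A minimal sketch; each column of the input is sorted independently in ascending order:

    import numpy as np
    import daal4py as d4p

    X = np.random.rand(10, 3)

    res = d4p.sorting().compute(X)
    print(res.sortedData)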

Trees

daal4py.getTreeState()

Examples: