Model Builders for the Gradient Boosting Frameworks

Note

Scikit-learn patching functionality in daal4py was deprecated and moved to a separate package, Intel(R) Extension for Scikit-learn*. All future patches will be available only in Intel(R) Extension for Scikit-learn*. Use the scikit-learn-intelex package instead of daal4py for scikit-learn acceleration.

Introduction

Gradient boosting on decision trees is one of the most accurate and efficient machine learning algorithms for classification and regression. Its most popular implementations are:

XGBoost*
LightGBM*
CatBoost*

daal4py Model Builders accelerate inference for models trained with these frameworks. Inference is performed by the oneDAL GBT implementation, which is tuned for the best performance on Intel(R) Architecture.

Note

Currently, the experimental categorical data support in XGBoost* and LightGBM* is not supported by Model Builders. For the model conversion to work with daal4py, convert non-numeric data to numeric data before training and converting the model.
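One way to convert non-numeric data to numeric data is ordinal encoding. The following is a minimal sketch using NumPy only; the helper names encode_categorical_column and apply_encoding are hypothetical and are not part of daal4py. The same category mapping has to be applied to the training data and to the data passed to inference.

```python
import numpy as np


def encode_categorical_column(values):
    """Map each distinct label to an integer code (ordinal encoding)."""
    # np.unique returns the sorted distinct labels; return_inverse=True also
    # returns, for every input value, its index in that sorted array.
    categories, codes = np.unique(np.asarray(values), return_inverse=True)
    return codes.astype(np.float32), categories


def apply_encoding(values, categories):
    """Encode new data with the categories learned from the training data."""
    values = np.asarray(values)
    codes = np.searchsorted(categories, values)
    # searchsorted returns an insertion index even for unseen labels,
    # so clip to a valid index and verify that every label really matches.
    codes = np.clip(codes, 0, len(categories) - 1)
    if not np.array_equal(categories[codes], values):
        raise ValueError("input contains a category not seen during training")
    return codes.astype(np.float32)


# Illustrative data, not taken from the documentation.
colors = ["red", "green", "blue", "green"]
codes, categories = encode_categorical_column(colors)
```

Ordinal codes impose an arbitrary order on the categories. Whether that is acceptable depends on the data, so treat this as one option rather than a recommendation.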

Conversion

The first step is to convert an already trained model. The API usage is the same for all frameworks:

XGBoost:

import daal4py as d4p
d4p_model = d4p.mb.convert_model(xgb_model)

LightGBM:

import daal4py as d4p
d4p_model = d4p.mb.convert_model(lgb_model)

CatBoost:

import daal4py as d4p
d4p_model = d4p.mb.convert_model(cb_model)

Note

Convert the model only once, then reuse the converted model for inference.
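The convert-once pattern can be sketched as a small wrapper that performs the conversion in its constructor and keeps the result. The class name ConvertedModel and the convert parameter are hypothetical. With daal4py installed, convert would be d4p.mb.convert_model, but any one-argument callable that returns an object with a predict() method works.

```python
class ConvertedModel:
    """Convert a trained model once and reuse the result for every prediction."""

    def __init__(self, trained_model, convert):
        # `convert` is any one-argument callable (assumed here to be
        # d4p.mb.convert_model in real use). Conversion happens only here.
        self._model = convert(trained_model)

    def predict(self, data):
        # No conversion on the prediction path: the stored model is reused.
        return self._model.predict(data)


# Intended use (requires daal4py and a trained model, so it is left commented out):
# fast_model = ConvertedModel(xgb_model, d4p.mb.convert_model)
# first = fast_model.predict(batch_1)
# second = fast_model.predict(batch_2)   # no second conversion
```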

Classification and Regression Inference

The API is the same for classification and regression inference. Depending on the original model passed to convert_model(), d4p_prediction is either the classification or the regression output.

d4p_prediction = d4p_model.predict(test_data)

Here, predict() is called on d4p_model with the test_data dataset, and the resulting predictions are stored in d4p_prediction.
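After conversion, it can be useful to check that the converted model reproduces the predictions of the original framework. The helper below is a minimal sketch; the name predictions_match and the default tolerances are assumptions, not part of daal4py. Both inputs are flattened first, on the assumption that the two libraries may return arrays of different shape, for example (n_rows,) and (n_rows, 1).

```python
import numpy as np


def predictions_match(original, converted, rtol=1e-5, atol=1e-6):
    """Return True if two prediction arrays agree within the given tolerances."""
    # Flatten both inputs so that (n,) and (n, 1) outputs are comparable.
    a = np.asarray(original, dtype=np.float64).ravel()
    b = np.asarray(converted, dtype=np.float64).ravel()
    if a.shape != b.shape:
        return False
    return bool(np.allclose(a, b, rtol=rtol, atol=atol))


# Illustrative values standing in for the two sets of predictions.
same = predictions_match([1.0, 2.0], [[1.0], [2.0]])
different = predictions_match([1.0, 2.0], [1.0, 3.0])
```

In real use, the first argument would hold the predictions of the original model and the second would be d4p_prediction.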

SHAP Value Calculation for Regression Models

Calculation of SHAP contribution and interaction values is natively supported by models created with daal4py Model Builders. For these models, the predict() method takes additional keyword arguments:

d4p_model.predict(test_data, pred_contribs=True)      # for SHAP contributions
d4p_model.predict(test_data, pred_interactions=True)  # for SHAP interactions

The returned prediction has the shape:

(n_rows, n_features + 1) for SHAP contributions

(n_rows, n_features + 1, n_features + 1) for SHAP interactions

Here, n_rows is the number of rows (i.e., observations) in test_data, and n_features is the number of features in the dataset.

The prediction result for SHAP contributions includes a feature attribution value for each feature and a bias term for each observation.

The prediction result for SHAP interactions comprises (n_features + 1) x (n_features + 1) values for all possible feature combinations, along with their corresponding bias terms.

Note

The shapes of SHAP contributions and interactions are consistent with the XGBoost results. In contrast, the SHAP Python package drops bias terms, resulting in SHAP contributions (SHAP interactions) with one fewer column (one fewer column and row) per observation.
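To compare the output with results from the SHAP Python package, the bias terms can be dropped. The sketch below assumes that the bias term occupies the last column (and, for interactions, also the last row), as in the XGBoost layout; the helper name drop_bias_terms is hypothetical.

```python
import numpy as np


def drop_bias_terms(shap_values):
    """Remove the bias column (and row) from SHAP contributions or interactions."""
    values = np.asarray(shap_values)
    if values.ndim == 2:
        # Contributions: (n_rows, n_features + 1) -> (n_rows, n_features)
        return values[:, :-1]
    if values.ndim == 3:
        # Interactions: (n_rows, n_features + 1, n_features + 1)
        #            -> (n_rows, n_features, n_features)
        return values[:, :-1, :-1]
    raise ValueError("expected a 2-D or 3-D array of SHAP values")


# Placeholder arrays with 5 rows and 3 features, standing in for predict() output.
contribs = np.zeros((5, 4))
interactions = np.zeros((5, 4, 4))
contribs_no_bias = drop_bias_terms(contribs)
interactions_no_bias = drop_bias_terms(interactions)
```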

Scikit-learn-style Estimators

You can also use the scikit-learn-style classes GBTDAALClassifier and GBTDAALRegressor to convert and infer your models. For example:

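import xgboost as xgb
from daal4py.sklearn.ensemble import GBTDAALRegressor

# X and y are the training features and targets, defined elsewhere
reg = xgb.XGBRegressor()
reg.fit(X, y)
# Convert the trained model and run inference with daal4py
d4p_predt = GBTDAALRegressor.convert_model(reg).predict(X)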

Limitations

Model Builders support only basic inference: prediction and probability prediction. The functionality is to be extended. The current limitations are:

- Categorical features are not supported for conversion and prediction.
- Multioutput models are not supported for conversion and prediction.
- SHAP values can be calculated for regression models only.

Examples

Model Builders models conversion

Articles and Blog Posts

Improving the Performance of XGBoost and LightGBM Inference