What is Intel® SDC?

Intel® Scalable Dataframe Compiler (Intel® SDC) is an extension of Numba* that enables just-in-time and ahead-of-time compilation of Python code that mixes Pandas*, NumPy*, and other numerical functions.

As a Numba* extension, Intel SDC relies on the @njit decorator and its compilation options to generate machine code through the LLVM* compiler:

Example 1: Compiling a Basic Pandas* Workflow
import pandas as pd
from numba import njit

# Dataset for analysis
FNAME = "employees.csv"


# This function gets compiled by Numba*
@njit
def get_analyzed_data():
    df = pd.read_csv(FNAME)
    s_bonus = pd.Series(df['Bonus %'])
    s_first_name = pd.Series(df['First Name'])
    m = s_bonus.mean()
    names = s_first_name.sort_values()
    return m, names


# Printing names and their average bonus percent
mean_bonus, sorted_first_names = get_analyzed_data()
print(sorted_first_names)
print('Average Bonus %:', mean_bonus)
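
Example 1 reads a file named employees.csv and uses its 'First Name' and 'Bonus %' columns. To try the example without the original dataset, a small stand-in file can be generated first. The sketch below is only an illustration: the helper name write_sample_csv and the three rows are invented here and are not part of Intel SDC or its sample data.

```python
import csv

# Hypothetical sample rows, invented purely for illustration.
SAMPLE_ROWS = [
    {"First Name": "Maria", "Bonus %": 6.5},
    {"First Name": "Alex", "Bonus %": 4.0},
    {"First Name": "Chen", "Bonus %": 9.5},
]


def write_sample_csv(path="employees.csv"):
    """Write a tiny CSV containing the two columns that Example 1 reads."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["First Name", "Bonus %"])
        writer.writeheader()
        writer.writerows(SAMPLE_ROWS)
    return path
```

Calling write_sample_csv() once before running Example 1 places employees.csv in the current working directory, which is where the relative FNAME path is resolved.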

On a single machine, Intel SDC uses multi-threading (based on Intel® TBB or OpenMP*) to parallelize Pandas* and NumPy* operations. Most of these operations are parallelized at the function level, so users typically do not need to take any extra action.
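
To make the idea of function-level parallelism concrete, here is a conceptual sketch in plain Python. It is not how Intel SDC is implemented: it swaps in the standard library's ThreadPoolExecutor for the TBB or OpenMP* runtime, and the name parallel_mean and the n_workers parameter are hypothetical. The point is only that the work is split and recombined inside the function, so the caller writes an ordinary call.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_mean(values, n_workers=4):
    """Compute a mean by summing chunks of the data in worker threads."""
    if not values:
        raise ValueError("mean of an empty sequence is undefined")
    # Ceiling division so that every element lands in exactly one chunk.
    chunk = max(1, -(-len(values) // n_workers))
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    # Each worker reduces one chunk; the partial sums are combined afterwards.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums) / len(values)


# The caller sees a normal function call; the threading is an internal detail.
average = parallel_mean([1, 2, 3, 4, 5, 6, 7, 8])
```

In standard CPython the global interpreter lock keeps these threads from speeding up a pure-Python sum, so the sketch shows the structure of the approach rather than its performance. Compiled code does not have that restriction, which is why compiling to machine code matters for this kind of parallelism.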