What is Intel® SDC?
Intel® Scalable Dataframe Compiler (Intel® SDC) is an extension of Numba* that enables just-in-time and ahead-of-time compilation of Python code that mixes Pandas*, NumPy*, and other numerical functions.
As a Numba extension, Intel SDC uses the @njit decorator and its compilation options to generate machine code with the LLVM* compiler:
import pandas as pd
from numba import njit
# Dataset for analysis
FNAME = "employees.csv"
# This function gets compiled by Numba*
@njit
def get_analyzed_data():
    df = pd.read_csv(FNAME)
    s_bonus = pd.Series(df['Bonus %'])
    s_first_name = pd.Series(df['First Name'])
    m = s_bonus.mean()
    names = s_first_name.sort_values()
    return m, names
# Printing names and their average bonus percent
mean_bonus, sorted_first_names = get_analyzed_data()
print(sorted_first_names)
print('Average Bonus %:', mean_bonus)
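The example above reads employees.csv, which is not included here. To try it locally, a small stand-in file can be generated as in the minimal sketch below. Only the column names 'First Name' and 'Bonus %' come from the example; the rows and the helper name write_sample_csv are hypothetical and exist purely for illustration.

```python
import csv

# Hypothetical sample rows. Only the column names are taken from the
# example above; the values are made up for illustration.
ROWS = [
    {"First Name": "Maria", "Bonus %": 4.5},
    {"First Name": "Douglas", "Bonus %": 6.0},
    {"First Name": "Angela", "Bonus %": 7.5},
]


def write_sample_csv(path="employees.csv"):
    """Write the made-up rows to `path`, preceded by a header line."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["First Name", "Bonus %"])
        writer.writeheader()
        writer.writerows(ROWS)
    return path


if __name__ == "__main__":
    write_sample_csv()
```

For these rows the arithmetic mean of the 'Bonus %' column is 6.0, which gives a known value to compare against the result of the compiled function.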
On a single machine, Intel SDC uses multi-threading (based on Intel® TBB or OpenMP*) to parallelize Pandas* and NumPy* operations. Most of these operations are parallelized at the function level, so in most cases no extra action is required from users.