AutoML with AutoGluon#

Purpose#

Machine learning has become a powerful tool for solving real-world problems. However, building effective ML models traditionally requires deep expertise in data preprocessing, feature engineering, model selection, and hyperparameter tuning. This complexity can be a barrier for many developers, data scientists, and domain experts. Automated Machine Learning (AutoML) aims to simplify and accelerate the ML workflow by automating the most time-consuming and technically demanding tasks. With AutoML, you can:

  • Reduce the need for manual trial-and-error in model selection and tuning.

  • Achieve competitive performance with minimal effort.

  • Focus more on solving business problems rather than technical implementation.

  • Make ML accessible to non-experts.

AutoGluon is an open-source AutoML toolkit developed by Amazon Web Services (AWS). It automates many steps in the machine learning pipeline, including preprocessing, model selection, hyperparameter tuning, and ensembling. It’s designed to be easy to use and powerful, even with minimal code. For example, if you’re working with tabular data, you can train a model and make predictions on new data with just three lines of code.

from autogluon.tabular import TabularPredictor
predictor = TabularPredictor(label='target_column').fit(train_data)
predictions = predictor.predict(test_data)

Whether you’re a beginner looking to get started with ML or an experienced practitioner seeking to streamline your workflow, AutoGluon provides a robust and user-friendly solution.

To demonstrate how AutoML can be applied to semiconductor testing, this tutorial uses the prediction of Minimum Operating Voltage (vMin) as an example. The term vMin refers to the lowest voltage at which an integrated circuit operates reliably. A lower vMin reduces power consumption and extends device lifespan. Traditionally, a vMin search starts at an initial voltage and lowers it step by step until the minimum stable point is found, which makes the process time-consuming. Predicting vMin with AutoML reduces test time and ATE usage, enabling faster and more efficient semiconductor testing.
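The test-time saving can be illustrated with a minimal sketch. It is not the search that an ATE actually runs: the function names, the 5 mV step, the 800 mV starting point, and the 30 mV guard band are all assumptions chosen for illustration, and voltages are kept as integer millivolts to avoid floating-point drift. The device vMin (595 mV) and the prediction (578 mV) are rounded from the example row and its predicted value shown later in this tutorial.

```python
def linear_vmin_search(passes_at, start_mv, step_mv=5, floor_mv=300):
    """Step the voltage down from start_mv until the device fails.

    Returns (vmin_mv, n_tests): the lowest passing voltage found and the
    number of pass/fail tests executed. vmin_mv is None if start_mv fails.
    """
    n_tests = 0
    vmin_mv = None
    v = start_mv
    while v >= floor_mv:
        n_tests += 1
        if not passes_at(v):
            break
        vmin_mv = v
        v -= step_mv
    return vmin_mv, n_tests


def snap_up(value_mv, step_mv=5):
    """Round a voltage up to the next multiple of step_mv."""
    return -(-value_mv // step_mv) * step_mv


# Hypothetical device that operates reliably at or above 595 mV.
def device_passes(v_mv):
    return v_mv >= 595


# Conventional search: start from an assumed initial voltage of 800 mV.
full = linear_vmin_search(device_passes, start_mv=800)

# Seeded search: start from the predicted vMin plus an assumed guard band.
predicted_mv, guard_band_mv = 578, 30
seeded = linear_vmin_search(device_passes, start_mv=snap_up(predicted_mv + guard_band_mv))

print("full search:  ", full)
print("seeded search:", seeded)
```

Both searches arrive at the same vMin, but the seeded one needs far fewer pass/fail tests. If the seeded starting point already fails (the search returns None), a real flow would need a fallback, for example restarting from the initial voltage.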

About this tutorial#

In this tutorial, you will learn how to apply AutoML with AutoGluon in three steps:

  • Installation

  • Model training, prediction and evaluation

  • Using model in ACS RTDI

Compatibility#

  • Ubuntu 20.04 / SmarTest 8 / RHEL79 / Nexus 3.1.0 / Edge 3.4.0-prod / Unified Server 2.3.0

Procedure#

Installation#

Ensure you have the following prepared:

  • An ACS RTDI virtual environment containing an Ubuntu Server VM.

Then run the following commands to install the required environment.

# 1.Download artifacts
cd ~
rm -rf jupyter_lab_autogluon
curl http://10.44.5.139/jupyter_lab_autogluon_1.0.0.tar.gz -O
mkdir -p ~/jupyter_lab_autogluon && tar -zxf ./jupyter_lab_autogluon_1.0.0.tar.gz -C ~/jupyter_lab_autogluon
mv -f jupyter_lab_autogluon_1.0.0.tar.gz jupyter_lab_autogluon

# 2.Install required system packages
sudo apt update -y
sudo apt install -y make build-essential libssl-dev zlib1g-dev \
libbz2-dev libreadline-dev libsqlite3-dev wget curl llvm \
libncurses5-dev libncursesw5-dev xz-utils tk-dev libffi-dev \
liblzma-dev python3-openssl git

# 3.Install Python 3.10.12 and create virtual environment
PYTHON_VERSION="3.10.12"
PYTHON_DIR="$HOME/.local/python310"
VENV_DIR="$HOME/jupyter310_env"

if [ ! -d "$PYTHON_DIR" ]; then
    wget "https://www.python.org/ftp/python/$PYTHON_VERSION/Python-$PYTHON_VERSION.tgz"
    tar -xzf "Python-$PYTHON_VERSION.tgz"
    cd "Python-$PYTHON_VERSION" || exit
    ./configure --prefix="$PYTHON_DIR" --enable-optimizations
    make -j $(nproc)
    make install
    cd .. || exit
    rm -rf "Python-$PYTHON_VERSION" "Python-$PYTHON_VERSION.tgz"
else
    echo "Python 3.10.12 already installed, skipping..."
fi

if [ ! -d "$VENV_DIR" ]; then
    "$PYTHON_DIR/bin/python3.10" -m venv "$VENV_DIR"
else
    echo "Virtual environment already exists, skipping..."
fi

# 4.Install JupyterLab
source "$VENV_DIR/bin/activate"
pip install --upgrade pip -q
pip install jupyterlab
jupyter lab --generate-config -y
echo "c.ServerApp.port = 8890" >> "$HOME/.jupyter/jupyter_lab_config.py"
deactivate

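Install AutoGluon and its dependencies. The leading ! runs each command from a notebook cell; omit it when running the commands in a shell with the virtual environment activated: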

!python -m pip install --upgrade pip
!python -m pip install autogluon
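To confirm that the installation landed in the interpreter you expect, a small check such as the following can be run in a notebook cell. This is a sketch, not part of the delivered example; the helper name installed_version is hypothetical, and the list of distributions to check is an assumption based on the packages installed above.

```python
import sys
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# The tutorial builds Python 3.10.12, so the first line should report that version.
print("Python:", sys.version.split()[0])
for name in ("autogluon", "autogluon.tabular", "jupyterlab"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```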

Model training, prediction and evaluation#

Start JupyterLab:

source "$HOME/jupyter310_env/bin/activate"
jupyter lab --ip=0.0.0.0 --port=8890 ~/jupyter_lab_autogluon/jupyter_lab_autogluon_1.0.0/autogluon-example.ipynb

Loading Data#

Load the dataset required for model training and prediction. The target variable for prediction is the “vMin” column.

import pandas as pd
# 1. Obtain the dataset
train_data = pd.read_csv('data/sample.train.csv', keep_default_na=True, na_values=["", " ", "NaN"])
test_data = pd.read_csv('data/sample.test.csv', keep_default_na=True, na_values=["", " ", "NaN"])

# 2. The target for prediction is "vMin" column
label = 'vMin'

Training#

AutoGluon generates high-performance predictive models by automatically preprocessing the data (handling missing values, encoding categorical features, and so on), selecting appropriate base models (such as LightGBM and XGBoost), optimizing hyperparameters, and ensembling multiple models.

from autogluon.tabular import TabularPredictor

# Train the model
predictor = TabularPredictor(label=label).fit(train_data)

Prediction#

The trained predictor can now predict the target column for unseen rows. In the comparison below, the “vMin” column holds the measured values and the “predicted” column holds the model's predictions.

y_pred = predictor.predict(test_data.drop(columns=[label]))
# Combine true and predicted values into a DataFrame
comparison_df = test_data[[label]].copy()
comparison_df['predicted'] = y_pred

# Display the first few rows
comparison_df.head(20)
     vMin     predicted
0    0.487    0.491009
1    0.450    0.463575
2    0.918    0.900433
3    0.640    0.622023
4    0.638    0.612958

Evaluation#

Evaluate the results of the trained model on the test set.

predictor.evaluate(test_data, silent=True)
{'root_mean_squared_error': np.float64(-0.04254107047903863),
'mean_squared_error': -0.001809742677502532,
'mean_absolute_error': -0.02221871429233551,
'r2': 0.9006343785237739,
'pearsonr': 0.9500226237145906,
'median_absolute_error': -0.009687968969345068}

AutoGluon reports every evaluation metric in a “higher-is-better” form. Error-based metrics such as Root Mean Squared Error (RMSE), Mean Squared Error (MSE), Mean Absolute Error (MAE), and Median Absolute Error (MedAE) are naturally lower-is-better, so AutoGluon flips their sign and the reported values are negative. For example, the reported RMSE of about -0.0425 corresponds to an actual RMSE of about 0.0425, in the same units as vMin. R2 (coefficient of determination) and the Pearson correlation are already higher-is-better and keep their sign: R2 measures goodness of fit, and the Pearson correlation measures how closely the predicted trend follows the actual trend.
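The sign convention can be reproduced by hand. The following sketch computes the metrics for the five rows shown in the prediction output above and then negates the error metrics. Because it uses only those five rows, the numbers will not match the full test-set results; the variable names are illustrative.

```python
import math

# The five (vMin, predicted) pairs displayed in the Prediction section.
y_true = [0.487, 0.450, 0.918, 0.640, 0.638]
y_pred = [0.491009, 0.463575, 0.900433, 0.622023, 0.612958]

n = len(y_true)
errors = [p - t for p, t in zip(y_pred, y_true)]

# Conventional (lower-is-better) error metrics.
mse = sum(e * e for e in errors) / n
rmse = math.sqrt(mse)
mae = sum(abs(e) for e in errors) / n

# R2 = 1 - SS_res / SS_tot is already higher-is-better.
mean_true = sum(y_true) / n
ss_res = sum(e * e for e in errors)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

# Higher-is-better form: error metrics are negated, R2 is left unchanged.
reported = {
    "root_mean_squared_error": -rmse,
    "mean_squared_error": -mse,
    "mean_absolute_error": -mae,
    "r2": r2,
}
for name, value in reported.items():
    print(f"{name}: {value:.6f}")
```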

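To evaluate how much each feature contributes to the trained model, call the feature_importance() method with your test data. Each score is the drop in model performance when that feature's values are randomly shuffled across rows.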

predictor.feature_importance(test_data)
                   importance  stddev    p_value       n  p99_high  p99_low
test_suite_name    0.121988    0.001625  3.782142e-09  5  0.125335  0.118641
x_spec             0.004538    0.000932  2.019610e-04  5  0.006457  0.002619
ecid_parametric_2  0.001077    0.000262  3.912536e-04  5  0.001618  0.000537
VDD0               0.001019    0.000183  1.187764e-04  5  0.001395  0.000643
ecid_end           0.000952    0.000236  4.179740e-04  5  0.001438  0.000466
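The idea behind these scores is permutation importance: shuffle one feature's values across rows and measure how much the model's score drops. The sketch below illustrates the idea with a toy model and made-up data. It is not AutoGluon's implementation (which, among other things, repeats the shuffle several times and reports statistics over the repeats), and the names toy_predict, used, and ignored are hypothetical.

```python
import math
import random


def neg_rmse(y_true, y_pred):
    """Score in higher-is-better form: the negated root mean squared error."""
    mse = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)
    return -math.sqrt(mse)


def permutation_importance(predict, rows, y_true, feature, rng):
    """Return the score drop caused by shuffling one feature across rows."""
    baseline = neg_rmse(y_true, [predict(r) for r in rows])
    shuffled_values = [r[feature] for r in rows]
    rng.shuffle(shuffled_values)
    shuffled_rows = [{**r, feature: v} for r, v in zip(rows, shuffled_values)]
    shuffled = neg_rmse(y_true, [predict(r) for r in shuffled_rows])
    return baseline - shuffled


def toy_predict(row):
    """Toy model that depends on 'used' and never reads 'ignored'."""
    return 2.0 * row["used"]


rng = random.Random(0)
rows = [{"used": i / 50, "ignored": rng.random()} for i in range(50)]
y_true = [2.0 * r["used"] for r in rows]

importance_used = permutation_importance(toy_predict, rows, y_true, "used", rng)
importance_ignored = permutation_importance(toy_predict, rows, y_true, "ignored", rng)
print("used:   ", importance_used)
print("ignored:", importance_ignored)
```

A feature the model relies on produces a positive score, while a feature the model never reads produces a score of zero, which mirrors the gap between test_suite_name and the remaining features in the table above.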

To compare the accuracy and timing of the individual models trained by AutoGluon, call the leaderboard() method with your test data.

predictor.leaderboard(test_data)
   model                score_test  score_val  eval_metric              pred_time_test  pred_time_val  fit_time   pred_time_test_marginal  pred_time_val_marginal  fit_time_marginal  stack_level  can_infer  fit_order
0  WeightedEnsemble_L2  -0.042541   -0.024852  root_mean_squared_error  0.55979         0.342045       32.233006  0.004487                 0.000755                0.030322           2            True       10
1  NeuralNetFastAI      -0.043398   -0.027227  root_mean_squared_error  0.088596        0.045584       14.866164  0.088596                 0.045584                14.866164          1            True       6
2  RandomForestMSE      -0.047547   -0.026954  root_mean_squared_error  0.204762        0.133644       11.562767  0.204762                 0.133644                11.562767          1            True       3
3  LightGBMLarge        -0.048979   -0.041049  root_mean_squared_error  0.100231        0.036399       3.642617   0.100231                 0.036399                3.642617           1            True       9
4  LightGBM             -0.049166   -0.037701  root_mean_squared_error  0.055474        0.028727       1.3965     0.055474                 0.028727                1.3965             1            True       2
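When predictions are requested during test execution, inference time can matter as much as accuracy. The sketch below shows one possible way to read the leaderboard with that in mind: pick the fastest model whose test score is within a tolerance of the best one. The rows are copied from the leaderboard above; the function name fastest_within and the tolerance of 0.002 are assumptions for illustration.

```python
# Selected columns copied from the leaderboard output above.
leaderboard = [
    {"model": "WeightedEnsemble_L2", "score_test": -0.042541, "pred_time_test": 0.559790},
    {"model": "NeuralNetFastAI", "score_test": -0.043398, "pred_time_test": 0.088596},
    {"model": "RandomForestMSE", "score_test": -0.047547, "pred_time_test": 0.204762},
    {"model": "LightGBMLarge", "score_test": -0.048979, "pred_time_test": 0.100231},
    {"model": "LightGBM", "score_test": -0.049166, "pred_time_test": 0.055474},
]


def fastest_within(rows, tolerance):
    """Return the fastest model whose score is within `tolerance` of the best.

    Scores are in higher-is-better form, so the best score is the maximum.
    """
    best = max(r["score_test"] for r in rows)
    candidates = [r for r in rows if best - r["score_test"] <= tolerance]
    return min(candidates, key=lambda r: r["pred_time_test"])


choice = fastest_within(leaderboard, tolerance=0.002)
print(choice["model"], choice["score_test"], choice["pred_time_test"])
```

With these numbers the selection lands on NeuralNetFastAI, which gives up less than 0.001 RMSE against the ensemble while predicting several times faster on the test set. A specific model can then be requested through the model argument of predictor.predict(); whether that trade-off is acceptable depends on the test-time budget.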

Using model in ACS RTDI#

Download the example Test Program and application to the Host Controller:

cd ~/apps/
curl http://10.44.5.139/apps/application-ag-v3.1.0-RHEL79.tar.gz -O
tar -zxf application-ag-v3.1.0-RHEL79.tar.gz

Make a container application from the developed model#

After training, the model is located in the AutogluonModels directory. You need to copy the model files to the container application’s directory: ~/apps/application-ag-v3.1.0/rd-autogluon-app/AutogluonModels.

The container application must then load the model, receive request messages from the Test Program, run the prediction, and return the result to the Test Program:

import copy
import json
from io import StringIO

import pandas as pd
from autogluon.tabular import TabularPredictor

# Load the model
g_ag_model = "AutogluonModels/ag-20250915_051243"   # The trained AutoGluon model
self.predictor = TabularPredictor.load(g_ag_model)
......
......
# Receive request messages from the Test Program
def _handle_request(self, request, logger):
    try:
        # Deep copy and parse the JSON request
        request_cp = copy.deepcopy(request)
        request_cp = json.loads(request_cp)

        # Extract the CSV text (header plus one row) and convert it to a DataFrame
        row_data = request_cp.get("data")
        print(f"---row data:{row_data}")
        df = pd.read_csv(StringIO(row_data))

        # Convert all values to numeric, coercing errors to NaN
        df = df.apply(pd.to_numeric, errors='coerce')

        # Make prediction
        predicted_value = self.predictor.predict(df).iloc[0]
        ......
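The excerpt above ends before the response is built. As a self-contained sketch of the whole round trip, the following handler parses the request, runs a prediction, and formats a reply in the "Predicted:<value>" form that the Test Program prints later in this tutorial. StubPredictor is a hypothetical stand-in for the loaded TabularPredictor so that the sketch runs without a trained model, the "Error:" reply is an assumed convention, and the numeric-coercion step of the excerpt is left out for brevity. The actual application may differ on all of these points.

```python
import json
from io import StringIO

import pandas as pd


class StubPredictor:
    """Hypothetical stand-in for a loaded TabularPredictor."""

    def predict(self, df: pd.DataFrame) -> pd.Series:
        # Always predicts 0.5; a real predictor would use the feature values.
        return pd.Series([0.5] * len(df), index=df.index)


def handle_request(request: str, predictor) -> str:
    """Parse a JSON request, predict on its CSV payload, and build a reply."""
    try:
        payload = json.loads(request)
        # "data" holds CSV text: one header line followed by one data row.
        df = pd.read_csv(StringIO(payload["data"]))
        predicted_value = float(predictor.predict(df).iloc[0])
        return f"Predicted:{predicted_value}"
    except (KeyError, TypeError, ValueError) as exc:
        # Assumed error convention; malformed JSON and CSV both end up here.
        return f"Error:{exc}"


request = json.dumps({"request": "Predict Target", "data": "VDD0,VDD1\n0.8,0.8"})
print(handle_request(request, StubPredictor()))
print(handle_request("not json", StubPredictor()))
```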

Request a prediction to a running application#

We want to predict vMin (column W) with feature data from column A to column V:

Column  Name                 Example value
A       acs_lot              acs_lot_004
B       acs_wafer            1
C       acs_ecid_x           1
D       acs_ecid_y           1
E       ecid_start           12/17/2024 22:24
F       ecid_end             12/17/2024 22:33
G       ecid_time            555577
H       bin                  1
I       final_bin_txt        final_bin_txt_001
J       ecid_lot_id          ecid_lot_id_002
K       acs_tester_names     acs_tester_names_001
L       acs_last_test_suite  acs_last_test_suite_002
M       ecid_parametric_1    1585.520839
N       ecid_parametric_2    2122.811813
O       test_suite_name      test_suite_name_200
P       x_spec               x_spec_005
Q       FBVDDQ               1.35
R       VDD0                 0.8
S       VDD1                 0.8
T       VDD2                 0.8
U       VDD3                 0.8
V       VDDMS                0.8
W       vMin                 0.595

The following is the feature data used in the prediction request:

{ "request": "Predict Target", "data": "acs_lot,acs_wafer,acs_ecid_x,acs_ecid_y,ecid_start,ecid_end,ecid_time,bin,final_bin_txt,ecid_lot_id,acs_tester_names,acs_last_test_suite,ecid_parametric_1,ecid_parametric_2,test_suite_name,x_spec,FBVDDQ,VDD0,VDD1,VDD2,VDD3,VDDMS\nacs_lot_001,16,3,4,_RARE_,10/26/2024 21:15,66506.7,982,final_bin_txt_002,ecid_lot_id_001,acs_tester_names_001,acs_last_test_suite_018,1888.513983,2143.343698,test_suite_name_089,x_spec_002,1.35,0.8,0.8,0.8,0.8,0.8" }

The Test Program assembles this payload from the CSV header line and one data row:

// Prepare feature data for prediction request
String[] values = line.split(",");
String data = headerLine + "\n" + line;
StringBuilder csvData = new StringBuilder();
csvData.append(headerLine);  // header
csvData.append("\\n");  // escaped newline
csvData.append(line);  // data row
String jsonPayload = "{ \"request\": \"Predict Target\", \"data\": \"" + csvData.toString() + "\" }";
...
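The escaped newline matters because a raw line break is not allowed inside a JSON string: the header and the data row must be joined by the two characters backslash and n, which the receiver's JSON parser turns back into a real newline. The sketch below demonstrates this round trip in Python, using a shortened, made-up header and row; a JSON library handles the escaping that the Java code above does by hand.

```python
import json

# Shortened header and row for illustration; the real request carries 22 columns.
header_line = "acs_lot,acs_wafer,VDD0"
line = "acs_lot_001,16,0.8"

# json.dumps escapes the newline, so the payload stays on a single line.
payload = json.dumps({"request": "Predict Target", "data": header_line + "\n" + line})
print(payload)

# The serialized text contains backslash + n rather than a raw line break.
has_escaped_newline = "\\n" in payload
has_raw_newline = "\n" in payload

# After parsing, the receiver sees two separate CSV lines again.
decoded = json.loads(payload)
csv_lines = decoded["data"].splitlines()
print(csv_lines)
```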

Send feature data to the container application for prediction via Nexus TPI:

// Connect to the container application
NexusTPI.target("ag-app").timeout(20);
...
// Use Nexus TPI to send a request to the container application
int res = NexusTPI.request(jsonPayload);
// Receive the response
String response = NexusTPI.getResponse();
System.out.println("Response: " + response);

The returned response is in the following format, and the predicted value can be extracted from it:

Response: Predicted:0.5410701036453247
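Extracting the value amounts to stripping the "Predicted:" prefix and parsing the remainder as a number. The sketch below shows the logic in Python for consistency with the rest of the tutorial's model code; the Test Program would do the equivalent in Java. The function name parse_prediction and the choice to raise an error on unexpected replies are assumptions.

```python
def parse_prediction(response: str) -> float:
    """Extract the numeric value from a 'Predicted:<value>' response."""
    prefix = "Predicted:"
    if not response.startswith(prefix):
        raise ValueError(f"unexpected response: {response!r}")
    return float(response[len(prefix):])


predicted_vmin = parse_prediction("Predicted:0.5410701036453247")
print(predicted_vmin)
```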

To run this example, follow the steps below.

This example uses Unified Server as the image repository.

Refer to “Use Unified Server as a Container Registry” to configure Unified Server as a Docker image registry. Complete the following steps from that documentation:

  • Configure the hosts for the Host Controller

  • Get the docker certificates from the Unified Server

  • Configure the Project and Account in Harbor

  • Configure Edge Server

Build the Docker image:

cd ~
curl http://10.44.5.139/docker/ubuntu_20.04.tar -O
sudo docker load -i ubuntu_20.04.tar
cd ~/apps/application-ag-v3.1.0/rd-autogluon-app/
sudo docker build ./ --tag=unifiedserver.local/example-project/example-repo:ag

Push the Docker image to the Unified Server:

sudo docker push unifiedserver.local/example-project/example-repo:ag

Configure Nexus to deploy the container application to the Edge Server:

gedit /opt/acs/nexus/conf/images.json
{
    "selector": {
        "device_name": "demo RTDI"
    },
    "edge": {
        "address": "<Edge IP>",
        "registry": {
            "address": "unifiedserver.local",
            "user": "robot$example_account",
            "password": "<Password>"
        },
        "containers": [
            {
                "name": "ag-app",
                "image": "example-project/example-repo:ag",
                "environment" : {
                    "ONEAPI_DEBUG": "3",
                    "ONEAPI_CONTROL_ZMQ_IP": "<Host Controller IP>"
                }
            }
        ]
    }
}
gedit /opt/acs/nexus/conf/acs_nexus.ini
[Auto_Deploy]
Enabled=false
...
[GUI]
Auto_Popup=true

To apply the modified configuration, restart Nexus:

sudo systemctl restart acs_nexus

Run the Test Program:

cd ~/apps/application-ag-v3.1.0/
sh start_smt8.sh

You can view the inference results in the console log:

{ "request": "Predict Target", "data": "acs_lot,acs_wafer,acs_ecid_x,acs_ecid_y,ecid_start,ecid_end,ecid_time,bin,final_bin_txt,ecid_lot_id,acs_tester_names,acs_last_test_suite,ecid_parametric_1,ecid_parametric_2,test_suite_name,x_spec,FBVDDQ,VDD0,VDD1,VDD2,VDD3,VDDMS\nacs_lot_001,16,3,4,_RARE_,10/26/2024 21:15,66506.7,982,final_bin_txt_002,ecid_lot_id_001,acs_tester_names_001,acs_last_test_suite_018,1888.513983,2143.343698,test_suite_name_089,x_spec_002,1.35,0.8,0.8,0.8,0.8,0.8" }
TPI res: 0
Response: Predicted:0.5410701036453247
{ "request": "Predict Target", "data": "acs_lot,acs_wafer,acs_ecid_x,acs_ecid_y,ecid_start,ecid_end,ecid_time,bin,final_bin_txt,ecid_lot_id,acs_tester_names,acs_last_test_suite,ecid_parametric_1,ecid_parametric_2,test_suite_name,x_spec,FBVDDQ,VDD0,VDD1,VDD2,VDD3,VDDMS\nacs_lot_004,1,1,1,12/17/2024 22:24,12/17/2024 22:33,555577,1,final_bin_txt_001,ecid_lot_id_002,acs_tester_names_001,acs_last_test_suite_002,1585.520839,2122.811813,test_suite_name_200,x_spec_005,1.35,0.8,0.8,0.8,0.8,0.8" }
TPI res: 0
Response: Predicted:0.5783701539039612
{ "request": "Predict Target", "data": "acs_lot,acs_wafer,acs_ecid_x,acs_ecid_y,ecid_start,ecid_end,ecid_time,bin,final_bin_txt,ecid_lot_id,acs_tester_names,acs_last_test_suite,ecid_parametric_1,ecid_parametric_2,test_suite_name,x_spec,FBVDDQ,VDD0,VDD1,VDD2,VDD3,VDDMS\nacs_lot_004,1,2,6,10/21/2024 12:25,10/21/2024 12:33,514916.2,177,final_bin_txt_005,ecid_lot_id_002,acs_tester_names_003,acs_last_test_suite_008,1960.714288,2155.847071,test_suite_name_042,x_spec_008,1.35,0.8,0.8,0.8,0.8,0.8" }
TPI res: 0
Response: Predicted:0.5643042922019958

For instructions on how to run the Test Program, please refer to “DPAT demo application on RTDI with SmarTest 8” -> “Run the SmarTest test program”.