Published on: March 22, 2023
At Alchemy, our machine learning (ML) models are trained on input and output variables to recognize certain types of patterns and make predictions. Alchemy AI (Figure 1.1) uses two types of ML algorithms, depending on the type of output variable:
Alchemy’s AutoML module consists of 13 different algorithms for regression and 10 different algorithms for classification, which are trained in parallel to significantly shorten the training duration.
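The exact algorithm list and orchestration are internal to Alchemy; purely as an illustration of the idea, the sketch below trains a few candidate scikit-learn regressors in parallel with joblib and compares their cross-validation scores (the models and data here are placeholders, not Alchemy's pipeline).

```python
# Illustrative sketch only: evaluate several candidate algorithms in parallel
# and collect their cross-validation scores (not Alchemy's actual AutoML code).
from joblib import Parallel, delayed
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=0)

candidates = {
    "ridge": Ridge(),
    "random_forest": RandomForestRegressor(random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}

def evaluate(name, model):
    # Mean R^2 over 5 cross-validation folds for one candidate algorithm.
    score = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    return name, score

# n_jobs=-1 runs the candidate evaluations on all available CPU cores.
results = Parallel(n_jobs=-1)(delayed(evaluate)(n, m) for n, m in candidates.items())
print(dict(results))
```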
A dataset consists of input and output variables.
Input variables are independent variables whose values are measured and entered by the user. In Alchemy, the input variables are:
Output variables are variables that depend on the input variables. In Alchemy, the output variables are:
A dataset for training AI in Alchemy is surfaced through Alchemy’s Scan and Score or Alchemy AI functionality. Both features give information about the:
The Show More Details button displays the number of available trials for each property separately.
Train AI in Alchemy consists of:
Hyperparameter tuning is the process of searching for the hyperparameters that produce the highest-performing model for each ML algorithm.
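As a hedged illustration of what such a search can look like (not Alchemy's actual tuning code), the sketch below uses scikit-learn's RandomizedSearchCV to explore a small random-forest hyperparameter space on synthetic data.

```python
# Minimal hyperparameter tuning sketch: randomly sample hyperparameter
# combinations, score each with cross validation, and keep the best one.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=200, n_features=10, noise=0.1, random_state=0)

param_distributions = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}

search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions,
    n_iter=10,                                # number of sampled combinations
    cv=5,                                     # 5-fold cross validation per combination
    scoring="neg_root_mean_squared_error",
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```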
In Alchemy, the performance of ML models is evaluated through repeated k-fold cross validation.
In k-fold cross validation, the dataset is split into k sets; in each round, one set is held back and the models are trained on the remaining sets. The held-back set is used for performance estimation, so a total of k models are fit and evaluated, and the performance estimate is the mean value over the held-back sets. This process is repeated l times with different splits, where l depends on the size of the dataset. In total, repeated k-fold cross validation therefore fits l ✕ k models to estimate the performance of each ML model.
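For example, scikit-learn's RepeatedKFold with k = 5 folds and l = 3 repeats fits l ✕ k = 15 models and averages the scores on the held-back folds (an illustrative sketch on synthetic data, not Alchemy's pipeline):

```python
# Repeated k-fold cross validation sketch: 5 folds x 3 repeats = 15 fitted models.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = make_regression(n_samples=150, n_features=8, noise=0.2, random_state=0)

cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(GradientBoostingRegressor(random_state=0), X, y,
                         scoring="neg_mean_absolute_error", cv=cv)

# The mean over all held-back folds is the performance estimate.
print(f"mean MAE over {len(scores)} held-back folds: {-scores.mean():.3f}")
```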
The process of model training is shown in Figure 3.1.
In Alchemy, the best model is selected automatically for each target property. Automatic selection of the best model consists of the following steps:
For automatically choosing the best model (Figures 3.2 and 3.3), different performance metrics are used.
Tracking performance metrics is important for validating the models we generate in terms of the accuracy of their predicted values.
First, a couple of definitions:
Performance metrics for regression models available in Alchemy are:
1. Model accuracy [%]:
$$M A=100 \%-\left(\frac{R M S E}{\bar{y}} \times 100 \%\right)$$
${MA}$ - model accuracy
${RMSE}$ - root mean squared error
$\bar{y}$ - average of all actual values
2. R2 (coefficient of determination):
$$R^2=1-\frac{\sum_{i=1}^N\left(y_i-\hat{y_i}\right)^2}{\sum_{i=1}^N\left(y_i-\bar{y}\right)^2}$$
${N}$ - number of trials
$y_i$ - actual value
$\widehat{y_i}$ - predicted value
$\bar{y}$ - average of all actual values
3. MAE (mean absolute error):
$$M A E=\frac{1}{N} \sum_{i=1}^N\left|y_i-\widehat{y_i}\right|$$
${N}$ - number of trials
$y_i$ - actual value
$\widehat{y_i}$ - predicted value
4. RMSE (root mean squared error):
$$R M S E=\sqrt{\frac{1}{N} \sum_{i=1}^N\left(y_i-\widehat{y}_i\right)^2}$$
${N}$ - number of trials
$y_i$ - actual value
$\widehat{y_i}$ - predicted value
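For reference, the four regression metrics above can be computed as in the sketch below (the values are made-up example data; the MA formula is implemented directly, since it is not a standard scikit-learn metric).

```python
# Illustrative computation of the regression metrics defined above.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([10.0, 12.5, 9.8, 11.2, 13.0])   # actual values (example data)
y_pred = np.array([10.4, 12.0, 10.1, 11.0, 12.6])  # predicted values (example data)

rmse = np.sqrt(mean_squared_error(y_true, y_pred))           # root mean squared error
mae = mean_absolute_error(y_true, y_pred)                     # mean absolute error
r2 = r2_score(y_true, y_pred)                                 # coefficient of determination
model_accuracy = 100.0 - (rmse / y_true.mean()) * 100.0       # MA as defined above

print(f"RMSE={rmse:.3f}  MAE={mae:.3f}  R2={r2:.3f}  MA={model_accuracy:.1f}%")
```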
Performance metrics for classification models available in Alchemy are:
1. Accuracy:
$$Accuracy =\frac{\text { Number of correct predictions }}{\text { Total number of predictions }}$$
2. Average Precision:
$$\text{Average Precision}=\sum_{k=1}^N \operatorname{Precision}(k)\, \Delta \operatorname{Recall}(k)$$
${N}$ - number of trials
$Precision(k)$ - the precision at a cutoff of k
$\Delta Recall(k)$ - the change in recall between cutoff k-1 and cutoff k
3. F1 Score:
- Precision: the accuracy of positive predictions, i.e. the ratio of true positive predictions to the total number of positive predictions made by the model
$$Precision_{classI}=\frac{TP_{classI}}{TP_{classI}+FP_{classI}}$$
${Precision_{classI}}$ - precision for one class; there are as many classes as there are predefined values
${TP_{classI}}$ - true positives for class I, the number of trials correctly predicted to belong to class I (the predicted class I matched the actual class I)
${FP_{classI}}$ - false positives for class I, the number of trials incorrectly predicted to belong to class I (the predicted class I did not match the actual class)
- Recall: ratio of true positive predictions to the total number of actual positive instances in the dataset
$$Recall_{classI}=\frac{TP_{classI}}{TP_{classI}+FN_{classI}}$$
${FN_{classI}}$ - false negatives for class I, the number of trials incorrectly predicted to belong to another class (the predicted class did not match the actual class I)
$$F1score_{class~I}=\frac{2\times Precision_{class~I}\times Recall_{class~I}}{Precision_{class~I}+Recall_{class~I}}$$
4. ROC AUC Score (area under the receiver operating characteristic curve):
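The sketch below computes the classification metrics above on small example data; since no formula for the ROC AUC score is given here, scikit-learn's roc_auc_score is used as a stand-in, and the labels and probabilities are placeholders rather than Alchemy output.

```python
# Illustrative computation of the classification metrics with scikit-learn.
import numpy as np
from sklearn.metrics import (accuracy_score, average_precision_score,
                             precision_recall_fscore_support, roc_auc_score)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])                    # actual classes
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])                    # predicted classes
y_score = np.array([0.9, 0.2, 0.4, 0.8, 0.1, 0.7, 0.6, 0.3])   # predicted probability of class 1

print("accuracy:", accuracy_score(y_true, y_pred))              # correct / total predictions
print("average precision:", average_precision_score(y_true, y_score))
print("ROC AUC:", roc_auc_score(y_true, y_score))

# Per-class precision, recall and F1 score (binary example with classes 0 and 1).
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, labels=[0, 1], zero_division=0)
for cls, p, r, f in zip([0, 1], precision, recall, f1):
    print(f"class {cls}: precision={p:.2f} recall={r:.2f} F1={f:.2f}")
```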
At Alchemy, we strive to achieve three goals when models are trained:
All predicted property values have associated confidence intervals, which show how much deviation can be expected from the predicted property value for a given trial.
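How these intervals are computed is not described here; purely as an assumption-laden sketch, one common way to obtain prediction intervals is to fit separate quantile-loss models for the lower and upper bounds, for example with gradient boosting.

```python
# Hypothetical illustration only: a 90% prediction interval from two
# quantile-loss models (this is NOT Alchemy's stated method).
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=6, noise=5.0, random_state=0)

# Lower (5th percentile) and upper (95th percentile) bound models.
lower = GradientBoostingRegressor(loss="quantile", alpha=0.05, random_state=0).fit(X, y)
upper = GradientBoostingRegressor(loss="quantile", alpha=0.95, random_state=0).fit(X, y)

x_new = X[:1]  # one example trial
print("90% prediction interval:",
      lower.predict(x_new)[0], "to", upper.predict(x_new)[0])
```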