Note: Reading the Manual on phones and smaller-screen tablets is not recommended, as the user experience may be degraded.
Alchemy’s AI handles data selection, data preprocessing, model training, and model tuning. It trains, in parallel, over 10,000 different model and hyperparameter combinations on AWS using more than 20 machine learning algorithms to find the best fit for a specific dataset, ensuring strong performance on both existing and unseen data. Model selection is refined using advanced statistical methods to reduce bias and improve performance, especially with smaller datasets; this helps mitigate overfitting while enhancing generalization. The selected models are then used by a genetic algorithm to generate formulation recommendations. This algorithm guides the formulation toward the desired properties and also predicts property values, providing prediction intervals that reflect the uncertainty of the predictions.
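As a rough illustration of the last step, the sketch below shows how a genetic algorithm can steer formulations toward a target property. The three-ingredient "model", its coefficients, and the 0.4 target are invented stand-ins for the trained models Alchemy would actually use; this is not Alchemy's implementation.

```python
import random

random.seed(0)
TARGET = 0.4          # desired (normalized) property value, invented for the demo

def predict(formulation):
    # stand-in for a trained model: property depends on ingredient fractions
    return 0.5 * formulation[0] + 0.3 * formulation[1] + 0.2 * formulation[2]

def fitness(f):
    return -abs(predict(f) - TARGET)          # closer to target = fitter

def random_formulation():
    w = [random.random() for _ in range(3)]
    s = sum(w)
    return [x / s for x in w]                 # ingredient fractions sum to 1

population = [random_formulation() for _ in range(30)]
for _ in range(40):                           # evolve for 40 generations
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                 # selection: keep the fittest
    children = []
    for _ in range(20):
        a, b = random.sample(parents, 2)
        child = [(x + y) / 2 for x, y in zip(a, b)]            # crossover
        i = random.randrange(3)
        child[i] = max(0.0, child[i] + random.gauss(0, 0.05))  # mutation
        s = sum(child)
        children.append([x / s for x in child])
    population = parents + children

best = max(population, key=fitness)
print(round(predict(best), 3))                # converges close to the 0.4 target
```

Only the select-crossover-mutate loop carries over; Alchemy's actual population sizes and operators are not documented here.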
Our AI-ready platform comprises three main components:
Alchemy’s Scan & Score functionality is able to:
Scan & Score has two purposes:
To enable Scan & Score functionality, requirements must be defined on the Lab Book Overview record. The available categories of requirements on the Lab Book Overview record are:
After selecting the category of requirements, you then select the priority and fill in the target.
Constraints are theoretical limits for each material used in a trial. Examples include generic materials like Water or TiO2, as well as proprietary materials that can be used as ingredients in trials. Once a material is selected, its target values must be defined as:
After the requirements and constraints on the Lab Book Overview record have been entered, the Scan & Score button can be clicked from any of the following records:
Scanning (Figure 2.3) searches for a 1:1 match for properties, along with their conditions, from the Lab Book Overview record with the properties present on the trial level. This means that:
Trials from the previous step are further filtered based on material constraints. Trials will be surfaced and filtered for scoring based on material constraints on the Lab Book Overview record:
From Figure 2.3 it can be seen that scanning will surface the following trials for scoring:
Scanning will not surface the following trials for scoring:
Trials that have partial matches are surfaced and scored because they have at least one exact match for a property, along with its conditions.
Once the trials are scanned, every trial is assigned a score based on priority and on how far the values of the trial's properties and materials are from the targets defined on the Lab Book Overview record. When a property is missing at the trial level, either because the property is not present in the trial or because its condition differs, the system treats the value as missing and assigns a higher (worse) score.
Trials are ranked from the lowest scores, or best performing trials, to the highest scores, or worst performing trials.
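To make the scoring and ranking idea concrete, here is a minimal sketch. It assumes a priority-weighted normalized distance to each target and a fixed penalty for missing properties; Alchemy's actual formula is not documented here, and the property names are hypothetical.

```python
# Hedged sketch of a possible scan-and-score ranking; lower score = better.
MISSING_PENALTY = 1.0
PRIORITY_WEIGHT = {"Must Have": 2.0, "Nice to Have": 1.0}

def score_trial(trial_props, requirements):
    """requirements: {property: (target, priority)}."""
    score = 0.0
    for prop, (target, priority) in requirements.items():
        w = PRIORITY_WEIGHT[priority]
        value = trial_props.get(prop)
        if value is None:                      # property missing on trial level
            score += w * MISSING_PENALTY       # penalized with a worse score
        else:
            score += w * abs(value - target) / max(abs(target), 1.0)
    return score

reqs = {"viscosity": (100.0, "Must Have"), "gloss": (80.0, "Nice to Have")}
trials = {
    "T1": {"viscosity": 105.0, "gloss": 78.0},
    "T2": {"viscosity": 140.0},               # gloss missing -> penalized
}
ranked = sorted(trials, key=lambda t: score_trial(trials[t], reqs))
print(ranked)  # ['T1', 'T2'] — best-matching trial first
```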
At the bottom of the Lab Book Overview record is the Scan & Score button. This opens the Matching Trials component, where the system will find the best matching actual trials (Samples) from your historical database based on the given requirements, targets, and material constraints.
For this button to be enabled, at least one requirement (measured or calculated) with a priority of Must Have or Nice to Have must be added to the Requirements table in the Lab Book Overview record. Requirements with the No target, rate only priority are not included in the Scan & Score functionality.
Once the Scan & Score button is clicked, the system navigates to a results page where up to ten of the best matching actual trials are displayed for your requirements.
The top of the page displays three boxes:
Clicking Show more details displays a table that provides the total number of historical trials for each corresponding Must Have or Nice to Have property that, at a minimum, partially fulfill the trial rules and can be used as a starting dataset. This number will help users determine whether model training should be attempted or if they should proceed directly to DOE.
This table will contain all must-have and nice-to-have properties defined as targets inside the Lab Book Overview record.
For each trial in the results table, you can see:
When the desired trial is selected, the button Proceed to Formulating is enabled. Clicking on it will:
With the Starting Trial selected, the Workspace record will have:
If the Material Constraints table on the Lab Book Overview record is filled in, the Scan & Score will also be extended to search trials based on material constraints. Only the trials that have matching ingredients will be taken into account.
While performing Scan & Score, one score is calculated for the performance targets and a separate score is calculated for how well the trial matches the material constraints. The two contribute equally to the final score, which determines the final rank.
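A minimal sketch of this equal-weight combination, assuming both scores are already normalized to the same scale and lower is better (the normalization itself is an assumption):

```python
# Hedged sketch: the performance-target score and the material-constraint
# score contribute equally to the final score; lower is better.
def final_score(property_score, material_score):
    # assumes both inputs are normalized to the same scale
    return 0.5 * property_score + 0.5 * material_score

# a strong material match can outrank a slightly better property score
trials = {"A": (0.30, 0.10), "B": (0.20, 0.50)}
ranking = sorted(trials, key=lambda t: final_score(*trials[t]))
print(ranking)  # ['A', 'B']
```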
At the end of every Workspace record, there is an option to select Scan All Trials. This will open a fullscreen modal for Best Performing Trial(s), displaying the top-performing trials (10 maximum) within the current Lab Book.
However, certain requirements must be met for this feature to be enabled:
The top of the modal displays three boxes:
Clicking Show more details displays a table that provides the total number of historical trials for each corresponding Must Have or Nice to Have property that, at a minimum, partially fulfill the trial rules and can be used as a starting dataset. This number will help users determine whether model training should be attempted or if they should proceed directly to DOE.
The next section of the modal displays a table of trials, in matching order, and contains:
Applicable properties for these matching trials are displayed in a second table that contains:
Once the Best Performing Trial is selected, two additional options are enabled:
🔐 Please discuss how to add this to your system with your CSM or Salesperson.
Alchemy DOE (Design of Experiments) is a powerful tool that addresses the situation when there is little to no historical data, preventing the system from running AI. It will help you extend your dataset in the most efficient manner possible (i.e., with the smallest, well-distributed, statistically optimal dataset) so that Alchemy can train models and run AI.
No prior experience with machine learning, data science, or statistics is required to use DOE. Any chemist or scientist will be able to input their formulating objectives and constraints to be guided in the most efficient manner to achieve their goals.
A traditional method for experimenting is One Variable at a Time (OVAT), in which one variable is varied while all others are kept constant, and its influence on the target properties is assessed (Figure 1.1).
Design of Experiments is a method in which all variables are changed at the same time and their combined influence on the target properties is assessed. It proposes testing the most informative locations in the design space, chosen both to make the process and space interpretable and to fill the design space (Figure 1.1).
In Table 1.1, OVAT and DOE experiments are compared based on different characteristics.
Compared with One Variable at a Time, Design of Experiments yields a better-filled design space, better accuracy throughout different regions of the design space, and faster progress with the help of predictive models; with OVAT, the number of experiments increases fast with an increasing number of input variables.
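The coverage contrast between OVAT and DOE can be illustrated numerically. The sketch below (an illustration, not Alchemy code) counts how many points of a full 3-level grid in 4 variables an OVAT campaign actually visits when each variable is moved one at a time from a fixed baseline:

```python
from itertools import product

def ovat_points(n_vars, levels):
    # vary one variable at a time from a fixed all-zeros baseline
    baseline = (0,) * n_vars
    pts = {baseline}
    for i in range(n_vars):
        for lvl in range(levels):
            p = list(baseline)
            p[i] = lvl
            pts.add(tuple(p))
    return pts

n, k = 4, 3
grid = set(product(range(k), repeat=n))   # every combination of levels
covered = ovat_points(n, k)
print(len(covered), len(grid))            # 9 81
```

OVAT visits only a cross-shaped slice (9 of 81 grid points here), while a designed experiment can place its runs anywhere in the space.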
The main types of designs available in Alchemy are:
At the bottom of the Lab Book Overview record, you can find the Design Experiments button. It is possible to create designed experiments when:
Once the button is clicked, the Design Experiments modal will be opened with the following information:
Screening Design is intended to identify significant main effects from a list of many, potentially varied, materials.
*Note: The Pareto principle, also known as the 80/20 rule, maintains that 80 percent of the output of a process or system is determined by 20 percent of the input.
The goals of screening design are to reduce the size of the design space through:
The subtypes of screening designs available in Alchemy are:
From the Design Experiments modal, when the Screening Design is selected as a type and you click on the Generate Formulations button, a new Workspace record will be created. This action can take some time since Alchemy is potentially creating a lot of new theoretical and actual formulations. While they are being created, this message will be displayed in your Lab Book:
Once the Workspace record is created, it will have the following:
After performing the screening design, the effects of each input variable are assessed by statistically determining if an ingredient has significant or no effect on a specific performance characteristic. Based on this information, users should be able to reduce the number of input variables they want to vary, reducing the problem space and the required number of experiments for a more in-depth exploration of the design space for optimization purposes.
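A simplified stand-in for this kind of main-effects assessment: fit a linear model and rank each varied ingredient by the magnitude of its standardized effect on one performance characteristic. The ingredient names and synthetic response are invented, and Alchemy's actual statistical test is not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)
names = ["binder", "pigment", "solvent", "additive"]
X = rng.uniform(0, 1, size=(20, 4))          # varied levels of 4 ingredients
# synthetic response: binder and pigment matter, the others do not
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.05, 20)

# standardize inputs so coefficients are comparable effect sizes
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
A = np.column_stack([np.ones(len(y)), Xs])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

effects = sorted(zip(names, np.abs(coef[1:])), key=lambda t: -t[1])
print([name for name, _ in effects])         # strongest effects first
```

In this toy case the two active ingredients surface at the top, which is the basis for dropping the inert ones from further designs.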
You will be able to run the analysis once all actual trials (Samples) have their values entered and all test results are recorded. The Analyze Screening Design button below the Tests table will then become available.
Clicking this button will create a detailed analysis table to help you better understand the problem space and try to decrease it, if possible, for the optimal or adaptive design. In this table, you can see the following:
Below the Analysis table, a copy of the Material Constraints table from the Lab Book Overview record is displayed, allowing you to reduce the number of varying materials and/or conditions from processing step constraints (those with targets defined as “between”). Reducing this number is beneficial: it lets you use different types of designed experiments and, eventually, obtain trials recommended by Alchemy AI.
After the Analysis is created, at the bottom of the Workspace record, you can perform the following actions:
Optimal design is intended to fill the design space with experimental points. Because it requires more experiments to be performed, it is normally done if the problem space is initially small enough or if the problem space has been reduced by executing a screening design.
The goals of optimal design are to:
The subtypes of optimal design available in Alchemy are:
When you click the Design Experiments button in the Lab Book Overview record, or Continue Design of Experiment in the Workspace record, and you have fewer than five varying materials and/or conditions from processing step constraints (or you reduced them to fewer than five), Optimal design will be the preselected type. The rest of the modal looks the same as when Screening design is selected.
As with the Screening design, once you decide to continue with the Optimal design, a new Workspace record will be created for you. This action can take some time, since Alchemy potentially creates many new theoretical and actual formulations. While they are being created, this message will be displayed in your Lab Book:
Once the Workspace record is created, it will have the following:
At Alchemy, our machine learning (ML) models use input and output variables for training to recognize certain types of patterns and make predictions. Alchemy AI (Figure 4.1) uses two types of ML algorithms depending on the type of output variables:
Alchemy’s AutoML module consists of 13 different algorithms for regression and 10 different algorithms for classification, which are trained in parallel in order to significantly shorten the training duration.
A dataset consists of input and output variables.
Input variables are independent variables whose values are measured and input by the user. In Alchemy, input variables are:
Output variables are variables which depend on input variables. In Alchemy, output variables are:
A dataset for training AI in Alchemy is surfaced through Alchemy’s Scan & Score or Alchemy AI functionality. These tools give information about the:
The Show more details button displays the number of available trials for each property separately.
Training AI in Alchemy consists of:
Hyperparameter tuning is the process of searching for the hyperparameters that produce the best-performing models for each ML algorithm.
In Alchemy, the performance of ML models is evaluated through repeated k-fold cross validation.
In k-fold cross validation, the dataset is split into k sets; each time, one set is held back and models are trained on the remaining sets. The held-back sets are used for performance estimation, so a total of k models are fit and evaluated, with performance estimated as the mean over the held-back sets. This process is repeated l times with different splits, where l depends on how large the dataset is. In total, l × k models are fitted with repeated k-fold cross validation to estimate the performance of the ML models.
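The procedure can be sketched with scikit-learn on synthetic data (a generic illustration, not Alchemy's internal code):

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = make_regression(n_samples=60, n_features=4, noise=1.0, random_state=0)

k, l = 5, 3                                   # k folds, l repeats
cv = RepeatedKFold(n_splits=k, n_repeats=l, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=cv)

print(len(scores))                            # l * k = 15 models fitted
print(round(scores.mean(), 3))                # mean held-back performance
```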
The process of model training is shown in Figure 4.2.
In Alchemy, selection of the best model is automatically made for each target property. Automatic selection of the best model consists of the following steps:
For automatically choosing the best model (Figure 4.3 and Figure 4.4), different performance metrics are used.
It is important that we track performance metrics to validate the models we generate in terms of the accuracy of predicted values.
First, a couple of definitions:
Performance metrics for regression models available in Alchemy are:
1. Model accuracy [%]:
$$M A=100 \%-\left(\frac{R M S E}{\bar{y}} \times 100 \%\right)$$
${MA}$ - model accuracy
${RMSE}$ - root mean squared error
$\bar{y}$ - average of all actual values
2. R2 (coefficient of determination):
$$R^2=1-\frac{\sum_{i=1}^N\left(y_i-\hat{y_i}\right)^2}{\sum_{i=1}^N\left(y_i-\bar{y}\right)^2}$$
${N}$ - number of trials
$y_i$ - actual value
$\widehat{y_i}$ - predicted value
$\bar{y}$ - average of all actual values
3. MAE (mean absolute error):
$$M A E=\frac{1}{N} \sum_{i=1}^N\left|y_i-\widehat{y_i}\right|$$
${N}$ - number of trials
$y_i$ - actual value
$\widehat{y_i}$ - predicted value
4. RMSE (root mean squared error):
$$R M S E=\sqrt{\frac{1}{N} \sum_{i=1}^N\left(y_i-\widehat{y}_i\right)^2}$$
${N}$ - number of trials
$y_i$ - actual value
$\widehat{y_i}$ - predicted value
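The four regression metrics above can be checked on a tiny worked example, computed directly from their formulas (the values are invented for illustration):

```python
import numpy as np

y = np.array([10.0, 12.0, 9.0, 11.0])        # actual values
y_hat = np.array([10.5, 11.5, 9.5, 10.5])    # predicted values

mae = np.abs(y - y_hat).mean()               # mean absolute error
rmse = np.sqrt(((y - y_hat) ** 2).mean())    # root mean squared error
r2 = 1 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
ma = 100.0 - (rmse / y.mean()) * 100.0       # model accuracy [%]

print(mae, rmse, round(r2, 3), round(ma, 2))  # 0.5 0.5 0.8 95.24
```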
Performance metrics for classification models available in Alchemy are:
1. Accuracy:
$$Accuracy =\frac{\text { Number of correct predictions }}{\text { Total number of predictions }}$$
2. Average Precision:
$$\text{Average Precision}=\sum_{k=1}^N \operatorname{Precision}(k)\, \Delta \operatorname{Recall}(k)$$
${N}$ - number of trials
$Precision(k)$ - precision at a cutoff of k
$\Delta Recall(k)$ - change in recall between cutoff k-1 and cutoff k
3. F1 Score:
- Precision: the accuracy of positive predictions, i.e., the ratio of true positive predictions to the total number of positive predictions made by the model
$$Precision_{classI}=\frac{TP_{classI}}{TP_{classI}+FP_{classI}}$$
${Precision_{classI}}$ - precision for one class; there are as many classes as there are predefined values
${TP_{classI}}$ - true positives for class I, the number of trials correctly predicted as class I (predicted class I matched the actual class I)
${FP_{classI}}$ - false positives for class I, the number of trials incorrectly predicted to belong to class I (predicted class I did not match the actual class)
- Recall: ratio of true positive predictions to the total number of actual positive instances in the dataset
$$Recall_{classI}=\frac{TP_{classI}}{TP_{classI}+FN_{classI}}$$
${FN_{classI}}$ - false negatives for class I, the number of trials of class I incorrectly predicted to belong to another class (the predicted class did not match the actual class I)
$$F1score_{class~I}=\frac{2\times Precision_{class~I}\times Recall_{class~I}}{Precision_{class~I}+Recall_{class~I}}$$
4. ROC AUC Score (area under the receiver operating characteristic curve):
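The classification metrics above can be checked on a toy 3-class example, with accuracy computed over all trials and precision, recall, and F1 computed per class from their definitions (the class labels are invented for illustration):

```python
actual    = ["low", "mid", "high", "mid", "low", "high"]
predicted = ["low", "mid", "mid",  "mid", "high", "high"]

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def prf(cls):
    # per-class precision, recall, and F1 from TP/FP/FN counts
    tp = sum(a == cls and p == cls for a, p in zip(actual, predicted))
    fp = sum(a != cls and p == cls for a, p in zip(actual, predicted))
    fn = sum(a == cls and p != cls for a, p in zip(actual, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

for cls in ("low", "mid", "high"):
    print(cls, prf(cls))
print("accuracy:", accuracy)                  # 4 of 6 predictions correct
```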
At Alchemy, we strive to achieve three goals when models are trained:
All predicted property values have associated predicted confidence intervals which will show how much deviation can be expected from the predicted property value for a certain trial.
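One simple way such an interval could be formed (a sketch under stated assumptions, not Alchemy's documented method) is to take empirical quantiles of residuals collected from held-back folds and add them to the point prediction; the residual values below are invented:

```python
import numpy as np

# residuals collected from held-back folds during cross validation (toy data)
residuals = np.array([-1.2, 0.8, -0.3, 1.5, -0.9, 0.4, -0.6, 1.1])

prediction = 42.0                              # hypothetical predicted property
lo, hi = np.quantile(residuals, [0.05, 0.95])  # empirical 90% band
interval = (prediction + lo, prediction + hi)
print(interval)                                # interval brackets the prediction
```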
The following supplemental material is used to support the above documentation.
Mixture Design, a subtype of Screening and Optimal Designs, is intended for experiments where the user is interested in the influence of varied dependent variables (weight or volume percentage of each material) on target properties.
Mixture design in Alchemy is automatically proposed when the user inputs wt% or vol% for each material and a total batch size, which will be equal for all experiments in the design.
Required constraints for each material are:
Rules which need to be respected for properly setting the constraints:
Mixture Process Design, a subtype of Screening and Optimal Designs, is intended for experiments where the user is interested in the influence of varied dependent variables, materials (weight or volume percentage of each material) and processing steps, on target properties.
Mixture process design in Alchemy is automatically proposed when the user inputs wt% or vol% for each material, a total batch size (equal for all experiments in the design), and processing steps.
Required constraints for each material are:
Required constraints for each condition from processing step are:
Rules which need to be respected for properly setting the constraints:
Factorial Design, a subtype of Screening Design, is intended for experiments where the user is interested in the influence of varied independent variables, materials (values for weight and/or volume) and/or processing steps on target properties.
Factorial design is automatically proposed when the user inputs weight and/or volume for materials and/or processing steps.
Required constraints for each material are:
Required constraints for each condition from processing step are:
No additional rules apply to these constraints.
Response Surface Design, a subtype of Optimal Design, is intended for experiments where the user is interested in the influence of varied independent variables, materials (values for weight and/or volume) and/or processing steps, on target properties.
Response surface design is automatically proposed when the user inputs weight and/or volume for materials.
Required constraints for each material are:
Required constraints for each condition from processing step are:
No additional rules apply to these constraints.