AI & DOE User Manual

  • Introduction
  • Scan & Score
    • 2.1 Scan & Score Setup
    • 2.2 How it Works
      • 2.2.1 Scanning
      • 2.2.2 Scoring
    • 2.3 Using Scan & Score
    • 2.4 Scan All Trials
    • 2.5 Scan & Score AI-Powered Statistical Analysis
      • 2.5.1 Summary Report
      • 2.5.2 Detailed Analysis
      • 2.5.3 How to Generate Report
  • DOE
    • 3.1 Background
    • 3.2 DOE Setup
    • 3.3 Screening Design
      • 3.3.1 Analyze Screening Design
    • 3.4 Optimal Design
  • Machine Learning
    • 4.1 Dataset
    • 4.2 Train AI
      • 4.2.1 Hyperparameter Tuning
      • 4.2.2 Automatic Selection
    • 4.3 Performance Metrics
      • 4.3.1 Regression
      • 4.3.2 Classification
    • 4.4 Trained Model Goals
  • Interchangeable Materials
    • 5.1 Creating an Interchangeable Material Group
    • 5.2 Using Interchangeable Materials with Other Features
      • 5.2.1 DOE and AI Training with Interchangeable Materials
      • 5.2.2 Scan & Score with Interchangeable Materials
  • Appendices
    • 6.1 Appendix A - Mixture Design
    • 6.2 Appendix B - Mixture Process Design
    • 6.3 Appendix C - Factorial Design
    • 6.4 Appendix D - Response Surface Design

    Note: Reading the manual on phones and small-screen tablets is not recommended, as you may not get the best user experience.

    🔐 AI & DOE are Licensed Products

    Please discuss how to add them to your system with your CSM or Salesperson.

    1. Introduction

    Alchemy’s AI handles data selection, data preprocessing, model training, and model tuning. It trains more than 10,000 model and hyperparameter combinations in parallel on AWS, using more than 20 machine learning algorithms, to find the best fit for a specific dataset and ensure strong performance on both existing and unseen data. Model selection is refined using advanced statistical methods to reduce bias and improve performance, especially on smaller datasets; this mitigates overfitting while enhancing generalization. The selected models are then used by a genetic algorithm to generate formulation recommendations for data exploitation. This algorithm guides the formulation toward the desired properties and also predicts property values, providing prediction intervals that reflect the uncertainty of the predictions.

    Our AI-ready platform comprises three main components:

    • Scan & Score
    • Design of Experiments (DOE)
    • Machine Learning (ML)

    2. Scan & Score

    Alchemy’s Scan & Score functionality is able to:

    • Scan trials according to the defined requirements in the Lab Book Overview record 
    • Score the trials according to the priority and targets of the properties

    Scan & Score has two purposes:

    1. Rank trials from highest to lowest performance related to material constraints, as well as priorities and targets for each property 
    2. Surface the data set to be used in the auto-ML (machine learning) module

    2.1 Scan & Score Setup

    To enable Scan & Score functionality, Requirements on the Lab Book Overview record need to be defined. Available categories of requirements on the Lab Book Overview record include:

    • Calculations, or calculated properties, that are based on materials and their quantities and material properties.
      • Example: Cost calculations
    • Tests or measured properties
      • Measured properties - Results that have been measured in the laboratory.
        • Example: Viscosity at different temperatures; Hardness on different days
      • Calculated properties - Based on other measured properties. 
        • Example: Water pickup % at different time intervals

    After selecting the category of requirements, you then select the priority and fill in the target.

    • Priority:
      • Must Have
      • Nice to Have
      • No target, rate only
    • Targets:
      • Exact
      • Lower than
      • Higher than
      • Between
    Figure 2.1 Lab Book Overview - Requirements

    Constraints are theoretical constraints for each material that is used in a trial. Examples include any generic materials like Water or TiO2, as well as proprietary materials that can be used as ingredients in trials. Once selected, target values must be defined as:

    • Between 
    • Constant
    Figure 2.2 Lab Book Overview - Constraints
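    To make the setup concrete, the requirements and constraints above can be pictured as simple records. The sketch below is purely illustrative; the field names are hypothetical and do not reflect Alchemy's internal data model:

```python
# Hypothetical illustration only: these field names do not reflect
# Alchemy's actual data model.
requirements = [
    {"property": "Viscosity", "priority": "Must Have",
     "target": {"type": "Between", "low": 800, "high": 1200}},   # e.g. mPa·s
    {"property": "Cost", "priority": "Nice to Have",
     "target": {"type": "Lower than", "value": 2.5}},            # e.g. $/kg
    {"property": "Gloss", "priority": "No target, rate only"},   # rated, not targeted
]

constraints = [
    {"material": "Water", "target": {"type": "Between", "low": 10, "high": 40}},  # wt%
    {"material": "TiO2",  "target": {"type": "Constant", "value": 20}},           # wt%
]
```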

    2.2 How it Works

    After the requirements and constraints on the Lab Book Overview record have been entered, the Scan & Score button can be clicked from any of the following records:

    • Lab Book Overview - Scans all trials from different Lab Books according to the requirements
    • Workspace - Scans all trials from the Lab Book within a certain Workspace record
    • Workspace generated by Design of Experiments - Scans all trials from the Lab Book within a certain Workspace record with generated trials

    2.2.1 Scanning

    Scanning (Figure 2.3) searches for a 1:1 match for properties, along with their conditions, from the Lab Book Overview record with the properties present on the trial level. This means that:

    • If on the trial level these properties and conditions meet the requirements set on the Lab Book Overview record, the corresponding trials will be surfaced in Scan & Score. 
    • If at least one condition on the trial level does not match with the conditions on the Lab Book Overview record, the trials will not be surfaced for Scan & Score.

    Trials from the previous step are further filtered based on material constraints. Trials will be surfaced and filtered for scoring based on material constraints on the Lab Book Overview record:

    • If all materials have constraints from lower value > 0 to higher value > 0:
      • Trials that contain all materials from the Lab Book Overview record
      • Trials that have all materials from the Lab Book Overview record, plus some other materials, as long as the other materials have an entered value of 0 or an empty field
    • If one or more materials have a between target value constraint from a lower value = 0 to a higher value > 0:
      • Trials which contain all materials from the Lab Book Overview record
      • Trials that have all materials from the Lab Book Overview record, plus some other materials, as long as the other materials have an entered value of 0 or an empty field 
      • Trials that have all materials from the Lab Book Overview record with constraints between value > 0 to value > 0 and without materials with constraints between 0 to value > 0
    Figure 2.3 Scanning Relevant Trials

    From Figure 2.3 it can be seen that scanning will surface the following trials for scoring:

    • Trial 1 - Full Match
      • All properties and material constraints are an exact match compared to the requirements on the Lab Book Overview record
    • Trial 2 - Partial Match
      • Exact match:
        • Calculated property I
        • Calculated property II
        • Measured property I
        • Material I 
        • Material II
      •  Partial match:
        • Measured property II - Condition II has a different value compared to the requirements on the Lab Book Overview record
    • Trial 3 - Partial Match
      • Exact match:
        • Calculated property I
        • Calculated property II
        • Measured property I
        • Material I 
        • Material II 
        • Material III
      • Missing:
        • Measured property II
    • Trial 5 - Full Match
      • It has Material IV, but its value is 0
      • It does not have Material III but based on the requirements on the Lab Book Overview page its value can be 0

    Scanning will not surface the following trials for scoring:

    • Trial 4 - It matches neither the calculations, the tests, nor the materials
    • Trial 6 - It has Material IV with a value > 0, and that material is not in the requirements on the Lab Book Overview record

    Trials that have partial matches are surfaced and scored because they have at least one exact match for a property, along with its conditions. 
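    The material-constraint filtering described above can be summarized in a short sketch. This is a simplified illustration of the surfacing rules, not Alchemy's implementation; the data structures are hypothetical:

```python
def passes_material_filter(trial, constraints):
    """Simplified illustration of the material-constraint scanning rules.

    trial: dict mapping material name -> entered amount (None for an empty field)
    constraints: dict mapping material name -> (low, high) constraint bounds
    """
    # A required material may be absent only if its lower bound is 0.
    for material, (low, high) in constraints.items():
        if material not in trial and low > 0:
            return False
    # Extra materials outside the constraints must be 0 or left empty.
    for material, amount in trial.items():
        if material not in constraints and amount not in (None, 0):
            return False
    return True

# Trial 6 from Figure 2.3: Material IV > 0 but not in the constraints -> excluded
print(passes_material_filter({"Material IV": 5.0}, {"Material I": (10, 40)}))  # False
```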

    2.2.2 Scoring

    Once the trials are scanned, every trial is assigned a score based on priority and on how far the values of the trial's properties and materials are from the targets defined on the Lab Book Overview record. When a property is missing at the trial level, either because the property is not present in the trial or because a condition differs, the system treats its value as missing and assigns it a higher (worse) score.

    Trials are ranked from the lowest scores, or best performing trials, to the highest scores, or worst performing trials.
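    The exact scoring formula is internal to Alchemy, but a priority-weighted distance-to-target score captures the idea. In the sketch below, the priority weights and the missing-value penalty are assumptions:

```python
# Illustrative only: the real scoring function is internal to Alchemy.
PRIORITY_WEIGHT = {"Must Have": 2.0, "Nice to Have": 1.0}  # assumed weights
MISSING_PENALTY = 1.0  # assumed fixed penalty for a missing property

def trial_score(trial_values, requirements):
    """Lower is better: weighted distance of each property from its target."""
    score = 0.0
    for req in requirements:
        weight = PRIORITY_WEIGHT.get(req["priority"], 0.0)
        if weight == 0.0:
            continue  # "No target, rate only" requirements are not scored
        value = trial_values.get(req["property"])
        if value is None:          # property missing or condition mismatch
            score += weight * MISSING_PENALTY
            continue
        target = req["target"]
        if target["type"] == "Exact":
            score += weight * abs(value - target["value"])
        elif target["type"] == "Between":
            if value < target["low"]:
                score += weight * (target["low"] - value)
            elif value > target["high"]:
                score += weight * (value - target["high"])
        # "Lower than" / "Higher than" targets are handled analogously
    return score
```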

    2.3 Using Scan & Score

    At the bottom of the Lab Book Overview record is the Scan & Score button. This opens the Matching Trials component, where the system will find the best matching actual trials (Samples) from your historical database based on the given requirements, targets, and material constraints.

    For this button to be enabled, at least one requirement (measured or calculated) must be added to the Requirements table in the Lab Book Overview record that has a priority of Must Have or Nice to Have. Requirements that have a No target, rate only priority will not be included in the Scan & Score functionality.

    Figure 2.4 Lab Book Overview Record

    Once the Scan & Score button is clicked, the system navigates to a results page where up to ten of the best matching actual trials are displayed for your requirements.

    The top of the page displays three boxes:

    • Matching Trials
      • Displays the number of 100% matching trials for targets and constraints out of the listed results. 
    • Relevant Trials
      • Displays the number of partial match trials that contain some of the same targets and constraints out of the listed results.
    • Data for predictive models
      • Displays whether the available historical data is sufficient based on the number of given constraints and matching trials:
        • The data is deemed sufficient if the number of matching trials is higher than the number of varying constraints for each property with priority Must Have or Nice to Have
        • The data is deemed partially sufficient if the number of matching trials is higher than the number of varying constraints for only some of the properties with priority Must Have or Nice to Have
        • The data is deemed insufficient if the number of matching trials is lower than the number of varying constraints for each property with priority Must Have or Nice to Have
    Figure 2.5 Scan & Score Results
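    The three-way sufficiency classification above can be expressed compactly. A minimal sketch, assuming "varying constraints" means constraints with a Between target:

```python
def data_sufficiency(matching_trials_per_property, n_varying_constraints):
    """Classify the dataset as sufficient / partially sufficient / insufficient.

    matching_trials_per_property: dict of Must Have / Nice to Have property ->
        number of matching trials for that property
    n_varying_constraints: number of constraints that vary (Between targets)
    """
    enough = [n > n_varying_constraints
              for n in matching_trials_per_property.values()]
    if all(enough):
        return "sufficient"
    if any(enough):
        return "partially sufficient"
    return "insufficient"

print(data_sufficiency({"Viscosity": 12, "Cost": 3}, 5))  # partially sufficient
```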

    Clicking Show more details displays a table that provides the total number of historical trials for each corresponding Must Have or Nice to Have property that, at a minimum, partially fulfill the trial rules and can be used as a starting dataset. This number will help users determine whether model training should be attempted or if they should proceed directly to DOE.

    Figure 2.6 Scan & Score - More Details

    This table will contain all must-have and nice-to-have properties defined as targets inside the Lab Book Overview record.

    • Each measured property, as well as each calculated property based on other measured values, will display the number of relevant trials with associated test results for the given property. If the number of trials is above the calculated threshold, the value is displayed in green; if not, it is displayed in red.
    • For each calculated property, the value is always N/A and colored green.

    For each trial in the results table, you can see:

    • Name - The formulation code which is a link to the corresponding trial
    • Values for all Must Have and Nice to Have targeted properties (calculated or measured)
      • If the target is met, the value is green
      • If the target is not met, the value is red
    • A radio button to select the trial you wish to use as the starting trial in a newly created Workspace record.

    When the desired trial is selected, the button Proceed to Formulating is enabled. Clicking on it will:

    • Close the results component
    • Create a new Workspace record with the preselected Starting Trial matching the one selected in the previous screen

    With the Starting Trial selected, the Workspace record will have:

    • The first initial Trial created as a copy of the Starting Trial
    • The Trial table pre-populated with the first trial information
    • Calculation and Testing tables will be predefined (pulling calculated and measured properties from the current Lab Book Overview record, but with no values)

    If the Material Constraints table on the Lab Book Overview record is filled in, the Scan & Score will also be extended to search trials based on material constraints. Only the trials that have matching ingredients will be taken into account.

    While performing Scan & Score, one score is calculated for the performance targets, and a separate score is calculated for how well the trial matches the material constraints. They contribute equally to the final score, on which the final ranking is based.

    2.4 Scan All Trials

    At the end of every Workspace record, there is an option to select Scan All Trials. This will open a fullscreen modal for Best Performing Trial(s), displaying the top-performing trials (10 maximum) within the current Lab Book.

    Figure 2.7 Scan all Trials

    However, certain requirements must be met for this feature to be enabled: 

    • At least one requirement, measured or calculated, must exist within the Calculations or Tests sections with a priority of Must Have or Nice to Have.
    • Requirements must be pulled from the Lab Book Overview record in order for them to have the appropriate priorities. Adding requirements within the Workspace record will generate them as No target, rate only, excluding them from the Scan All Trials feature.

    Figure 2.8 Scan all Trials - Results

    The top of the modal displays three boxes:

    • Matching Trials
      • Displays the number of 100% matching trials for targets and material constraints out of the listed results. 
    • Relevant Trials
      • Displays the number of partial match trials that contain some of the same targets and material constraints out of the listed results.
    • Data for predictive models
      • Displays whether the available historical data is sufficient based on the number of given material constraints and matching trials:
        • The data is deemed sufficient if the number of matching trials is higher than the number of varying material constraints.
        • If the number of matching trials is lower than the number of varying material constraints, the data is deemed insufficient.

    Clicking Show more details displays a table that provides the total number of historical trials for each corresponding Must Have or Nice to Have property that, at a minimum, partially fulfill the trial rules and can be used as a starting dataset. This number will help users determine whether model training should be attempted or if they should proceed directly to DOE.

    Figure 2.9 More Details

    The next section of the modal displays a table of trials, in matching order, and contains:

    • Formulation Code: Clickable link to the corresponding Sample record
    • Workspace record name 

    Applicable properties for these matching trials are displayed in a second table that contains:

    • Associated test results for the given property based on the listed trials, grouped by the Formulation Code.
      • Measured property cells will display green if the number of trials is above the calculated threshold. 
        • If the requirement is unmet, the cell will display red.
      • Calculated property cells will always have an N/A value and display as green.
    • The ability to mark each trial for testing. 
      • The selected trial will display in the Best Performing Trial dropdown to be used as a starting point in a newly created Workspace record.

    Once the Best Performing Trial is selected, two additional options are enabled:

    • Final Conclusion
      • Closes the results modal
      • Creates a Final Conclusion record with the Best Performing Trial auto-populated  based on the selection from the previous modal.
        • Note: This is only enabled if there is no Final Conclusion record in the current Lab Book.
    • Next Trial Group
      • Closes the results modal
      • Creates a new Workspace record with the Starting Requirements auto-populated based on the Best Performing Trial selection from the previous modal.

    2.5 Scan & Score AI-Powered Statistical Analysis

    Once your dataset has been processed on the Scan & Score Results page, you gain access to an automated, human-readable report that provides deep statistical insights into your data's quality and structure. This report is essential for confirming data readiness and understanding the potential of your dataset before initiating model training.

    2.5.1 Summary Report

    This report provides a concise, human-readable summary of your dataset's statistical profile, its readiness for Machine Learning (ML), and specific recommended actions to maximize model performance.

    1. Executive Summary

    This section gives you the immediate, high-level status of your dataset.

    • Dataset Size: Reports the total number of experimental observations (rows) and the count of all predictor (input) variables.
    • Target Variables: Identifies the number of continuous output variables available for prediction.
    • Overall AI Readiness Score: Provides a final score out of 100, categorized (e.g., Fair), to quickly assess the dataset's current suitability for optimal AI/ML use.
    • Key Data Concerns: Summarizes the most significant issues found, including the number of potential outliers detected and the percentage of observations affected.
    • Key Feature Concerns: Highlights the number of input variables flagged for constant or low variance, which are typically uninformative for prediction.
    • Model Performance Snapshot: Provides a quick measure of predictive capability for one of the primary targets (e.g., R² value).

    2. Recommended Actions

    Advises users to address data quality issues before proceeding to model training to ensure more reliable and accurate predictions.

    3. Detailed Analysis Summary

    This section provides the breakdown of the AI Readiness Score and advanced statistical findings. The overall score is derived from five components, each evaluated out of 100: 

    1. Data Sufficiency Assessment: Reflects whether the volume of data is adequate relative to the number of features. A higher score means less risk of overfitting. Key Metric: Samples per feature per target.
    2. Data Quality Assessment: Measures data issues like outlier presence and unused variables. A low score suggests significant data cleaning is needed.
    3. Feature Quality Assessment: Measures feature relevance by checking for issues like low-variance inputs. A high score indicates more informative, relevant features.
    4. Statistical Validity Assessment: Assesses how well the data and model residuals meet core statistical assumptions (like normality and homoscedasticity). Low scores warn that robust or non-linear modeling may be necessary.
    5. Model Performance Assessment: Reflects the predictive capability of an initial model on the dataset (e.g., R² score), indicating if simple models can capture the main relationships.
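    The manual does not specify how the five component scores are combined into the overall score; the sketch below assumes a simple unweighted average for illustration:

```python
# Assumption: equal weighting of the five components; the actual
# aggregation used by Alchemy is not documented here.
components = {
    "data_sufficiency": 72,
    "data_quality": 58,
    "feature_quality": 81,
    "statistical_validity": 64,
    "model_performance": 55,
}

overall = sum(components.values()) / len(components)
print(f"Overall AI Readiness Score: {overall:.0f}/100")  # 66/100, e.g. "Fair"
```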

    4. Additional Analysis Details

    The report also includes Statistical and Design Insights confirming Sample Size Adequacy, reporting Design Efficiency (G-Efficiency), listing Key Predictors based on significance and correlation, and providing Residual Diagnostics to ensure the reliability of the chosen model.

    2.5.2 Detailed Analysis

    The detailed analysis is organized into the following eight comprehensive sections:

    1. Dataset Overview & Data Availability

    This section provides a high-level summary of the dataset's composition and completeness.

    • Dataset Dimensions: Confirms the total number of observations (rows) and the number of variables (columns).
    • Variable Counts: Clearly distinguishes between the count of Input Variables (predictors) and Target Variables (outputs).
    • Data Availability: Measures the overall completeness of the dataset by reporting the percentage of non-missing values.

    2. Variable Classification & Structure

    This section confirms how the system interpreted and categorized each variable, which is fundamental for accurate modeling.

    • Variable Classification: Confirms the automatic categorization of variables as either Categorical (discrete groups) or Continuous (numerical range).
    • Variable Identification: Clearly separates and lists variables based on their role: Input Variables (the predictors) versus Target Variables (the measured outcomes).

    3. Dataset Quality Assessment

    This critical section evaluates the robustness and quality of the data for modeling, flagging common issues that can affect model performance.

    • Target Coverage: Assesses the dataset's density by reporting the ratio of observations to input variables.
    • Target Variation: For continuous targets, the Coefficient of Variation is reported to ensure there is enough variation in the outcome for the model to learn meaningful relationships.
    • Low Variance Detection: Identifies and flags variables that are constant or show minimal variation, as these are often uninformative for prediction.
    • Multicollinearity: Detects highly correlated pairs of input variables, indicating potential redundancy or instability in a linear model.
    • Normality Testing: Evaluates the statistical distribution of variables to assess whether they meet the assumptions required by various modeling techniques.
    • Outlier Detection: Identifies and flags extreme values within the dataset that may unduly influence model results.
    • Variable Representation: Reports the percentage of trials that actually utilize each input variable, highlighting underutilized materials or conditions.

    4. Experimental Design Efficiency

    This section is crucial when analyzing data from Design of Experiments (DOE), providing metrics to evaluate the quality of the design itself.

    • Design Quality Metrics: Provides core metrics for evaluating the overall effectiveness and balance of the experimental design.
    • G-efficiency Calculations: Reports the G-efficiency score, which measures the precision of parameter estimates.
    • Distance-Based Criteria: Metrics related to how well observations are spread across the experimental space.
    • Discrepancy Criteria: Metrics that quantify the non-uniformity or gaps in the design space.

    5. Cluster Analysis

    This section performs unsupervised learning to reveal inherent groupings within your data that may not be immediately obvious.

    • Pattern Detection: Identifies and visualizes natural groupings or segments (clusters) within the observations.
    • Observation Grouping: Provides information on how similar observations were grouped together, offering insight into underlying similarities across experiments or formulations.

    6. Feature Importance Analysis

    This section moves beyond simple correlations to explain why and how much each input variable influences the target variable.

    • Correlation Analysis: Reports the linear relationship between each input variable and the target variable.
    • SHAP Analysis (SHapley Additive exPlanations): Uses advanced, model-agnostic techniques to explain the contribution of each feature to the model's predictions.
    • Variable Importance Rankings: Provides a final, ranked list of input variables based on their overall significance to the prediction task.

    7. Regression Model Results

    This section summarizes the performance of the predictive models automatically generated by the system.

    • Best Model Selection: Identifies the optimal predictive model chosen based on internal scoring and cross-validation performance.
    • Model Performance Metrics: Reports standard metrics for the selected model, such as R² (explained variance) and RMSE (Root Mean Square Error).
    • Significant Variables and Coefficients: Lists the key predictor variables and their associated coefficients, detailing their direction and magnitude of impact on the target.
    • Model Comparison Statistics: Provides statistics used to compare the performance and complexity across various candidate models tested.

    8. Residual Analysis

    This final section validates the fundamental statistical assumptions of the selected model, confirming the reliability of its results.

    • Model Assumption Validation: Verifies the underlying assumptions necessary for the model's conclusions to be reliable.
    • Normality of Residuals: Assesses whether the errors (residuals) are normally distributed.
    • Homoscedasticity: Checks for constant variance of the residuals, ensuring that prediction errors are consistent across the full range of fitted values.
    • Independence: Checks for autocorrelation, ensuring that the prediction error for one observation doesn't influence the error for the next.
    • Influential Observations and Outliers: Identifies individual data points that have a disproportionate impact on the model's coefficients.
    • Diagnostic Plots: Provides essential visualizations, such as Q-Q plots and residuals vs. fitted values plots, for visual inspection of model diagnostics. 

    2.5.3 How to Generate Report

    To generate a meaningful statistical report, the system requires a complete definition of the variables you wish to analyze. You must define all necessary inputs (materials/processing steps) and outputs (Calculations/Tests) in the Lab Book Overview before proceeding (see Section 2.1 Scan & Score Setup).

    Required Data Input: Constraints Table

    • It is crucial that you add Material and Processing Step variables to the Constraints table.
    • Why it's necessary: The statistical analysis assesses the relationship between the Input Variables (your materials and processing steps, defined here) and the Output Variables (your test results and calculations, defined below). Without inputs, the report cannot evaluate predictive relationships.
    • System Check: If you have not included any materials or processing steps in the Constraints table, the Generate Report button will be disabled, and a tooltip will indicate the missing requirements.

    Required Data Output: Tests and Calculations Tables

    You must also define your desired output variables:

    • Add all relevant Tests to the Tests table.
    • Add all required Calculations to the Calculations table.

    Steps to Generate the Report

    1. Run Scan & Score: After ensuring all requirements (Inputs, Tests, and Calculations) are defined in the Lab Book Overview, click the SCAN & SCORE or SCAN & SCORE THIS LAB BOOK button (2.3 Using Scan & Score).
    2. Generate Report: If the system confirms that you have enough data for analysis, the GENERATE REPORT button will become active (Figure 2.10.). Click this button.

    Note: This process initiates the deeper statistical analysis needed for the report.

    Figure 2.10 GENERATE REPORT button on the Scan & Score results page
    3. Monitor Report Generation: A notification will confirm that the report generation process has started (Figures 2.11-2.13).
    Figure 2.11 Generation of the report successfully started
    Figure 2.12 Progress on the report generation
    Figure 2.13 Generation of the report successfully finished
    4. View the Report: Once the process is complete, the notification will update. Click View the Report to open the full analysis.

    Report Access and Download

    The generated report is presented in two tabs:

    • Tab 1 - Summary Report (Figure 2.14): Provides the high-level Executive Summary, key findings, and recommended actions.
    Figure 2.14 Page of the Summary Report
    • Tab 2 - Detailed Analysis (Figure 2.15): Contains the in-depth statistical breakdowns, feature quality assessments, and model diagnostics.
    Figure 2.15 Page of the Detailed Analysis
    • Both the Summary Report and the Detailed Analysis can be downloaded as a single PDF document for offline review and sharing.

    3. DOE

    🔐 Please discuss how to add this to your system with your CSM or Salesperson.

    Alchemy DOE (Design of Experiments) is a powerful tool that addresses the situation when there is little to no historical data, preventing the system from running AI. It will help you extend your dataset in the most efficient manner possible (i.e., with the smallest, well-distributed, statistically optimal dataset) so that Alchemy can train models and run AI.

    No prior experience with machine learning, data science, or statistics is required to use DOE. Any chemist or scientist can input their formulating objectives and constraints and be guided in the most efficient manner to achieve their goals.

    3.1 Background

    A traditional method for experimenting is One Variable at a Time (OVAT), in which one variable is varied while all others are kept constant, and its influence on the target properties is assessed (Figure 1.1).

    Design of Experiments is a method in which all variables are changed at the same time and their combined influence on the target properties is assessed. It proposes testing the most informative locations in the design space, focusing on interpretability of the process while filling the design space (Figure 1.1).

    In Table 1.1, OVAT and DOE experiments are compared based on different characteristics.

    Figure 1.1 Graphical Representation of One Variable at a Time and Design of Experiments
    | Characteristic | OVAT | DOE |
    | --- | --- | --- |
    | Main effects | Yes | Yes |
    | Interaction effects | No | Yes |
    | Design space exploration | Partly | Better than OVAT |
    | Design space filling | Many gaps in the design space | Better-filled design space |
    | Time to achieve optimal values | Longer (in most cases not achieved) | Faster (with the help of predictive models) |
    | Number of experiments | Many (several levels are tested for each material) | Increases fast with the number of input variables |
    | Predictions | Accurate in certain regions, very inaccurate in others | Better accuracy throughout different regions of the design space |

    Table 1.1 Comparison of OVAT and DOE

    The main types of designs available in Alchemy are:

    • Screening Design 
    • Optimal Design

    3.2 DOE Setup

    Figure 3.2 Design Experiments

    At the bottom of the Lab Book Overview record, you can find the Design Experiments button. It is possible to create designed experiments when:

    • The AI capability is turned on for a tenant
    • The Study Overview is valid
    • There is at least one measured targeted property
    • There are at least two, but a maximum of 20, materials and/or conditions of a processing step with constraints with a target value of between
      • There can also be an unlimited number of materials and/or conditions of a processing step with constraints with a target value of constant.
    • Units for materials are chosen in formulating input settings and can be:
      • Weight units - No formulating rules required
      • Percent units - The following formulating rules are required (see the sketch after this list):
        • The total of lower bounds and constant constraints is less than 100%
        • The total of upper bounds and constant constraints is more than 100%
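    A minimal sketch of the two percent-unit checks, reusing the hypothetical constraint records sketched in Section 2.1 (this is an illustration, not Alchemy's validation code):

```python
def percent_rules_ok(constraints):
    """Check the two formulating rules for percent units (illustrative).

    constraints: list like the Section 2.1 sketch, e.g.
      {"material": "Water", "target": {"type": "Between", "low": 10, "high": 40}}
      {"material": "TiO2",  "target": {"type": "Constant", "value": 20}}
    """
    low_total = high_total = 0.0
    for c in constraints:
        t = c["target"]
        if t["type"] == "Constant":
            low_total += t["value"]
            high_total += t["value"]
        else:  # "Between"
            low_total += t["low"]
            high_total += t["high"]
    # Lower bounds + constants < 100% and upper bounds + constants > 100%
    return low_total < 100 and high_total > 100
```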

    Once the button is clicked, the Design Experiments modal will be opened with the following information:

    Figure 3.3 Design Experiments Modal
    • Design Experiment Type:
      • Screening - Usually the first type of design to be employed when there is little to no historical data. The screening design aims to “screen” the input variables and determine which ones significantly impact the target properties versus those that have low or no impact. It will help you extend your dataset most efficiently (i.e., with the smallest, well-distributed, statistically optimal dataset).
      • Optimal - A type of design that is performed to gain a deeper understanding of a problem space. Because it requires more experiments to be performed, it is normally done if the problem space is small enough initially or if the problem space has been reduced by executing a screening design.
      • Adaptive - A type of design intended to be used for augmenting the available dataset (historical dataset and/or designed experiments) with the experiments from the regions of design space with the highest uncertainty.
    • The number of formulations that will be created for you. The system will inform you about the minimum and maximum number of trials, and it is up to you to choose how many trials to perform.

    3.3 Screening Design

    Screening Design is intended to identify significant main effects from a list of many, potentially varied, materials.

    • It is automatically proposed for more than five varied variables (materials and/or conditions from processing steps).
    • A minimum number of experiments related to the materials and/or conditions from processing steps must be tested to identify which input variables are relevant for the given properties.
    • Analysis of the main effects is available after all experiments are tested, showing the contribution of each material and/or processing-step condition to the tested property and its significance according to the Pareto principle (the 80/20 rule*).
    • After the analysis in the trial table, users are advised to change constraints for materials and/or conditions from processing steps in accordance with the contribution of each to the tested property.

    *Note: The Pareto principle, also known as the 80/20 rule, is a theory that maintains 80 percent of the output of a process or system is determined by 20 percent of the input.

    The goals of screening design are to reduce the size of the design space through:

    • Reducing the number of varied materials and/or conditions from processing steps. In further experiments only significant variables should be varied while insignificant ones should be kept constant.
    • Narrowing the constraints range.

    The subtypes of screening designs available in Alchemy are:

    • Mixture Design
    • Mixture Process Design
    • Factorial Design

    From the Design Experiments modal, when the Screening Design is selected as a type and you click on the Generate Formulations button, a new Workspace record will be created. This action can take some time since Alchemy is potentially creating a lot of new theoretical and actual formulations. While they are being created, this message will be displayed in your Lab Book:

    Figure 3.4 Waiting Message - Screening Design

    Once the Workspace record is created, it will have the following:

    • The corresponding number of theoretical trials are available in the record, as defined in the Design Experiments modal. Values inside these theoretical trials are non-editable since you must create them exactly as described in order to gain valuable insights.
    • For each theoretical trial, an actual trial will be created as well, for you to log the actual values measured for each material in the trial.
    • Trials cannot be deleted or added to the Workspace record.
    Figure 3.5 Workspace - Screening Design

    3.3.1 Analyze Screening Design

    After performing the screening design, the effects of each input variable are assessed by statistically determining if an ingredient has significant or no effect on a specific performance characteristic. Based on this information, users should be able to reduce the number of input variables they want to vary, reducing the problem space and the required number of experiments for a more in-depth exploration of the design space for optimization purposes. 

    You can run the analysis once all actual trials (Samples) have their values entered and all test results are recorded. The Analyze Screening Design button below the Tests table will then become available.

    Figure 3.6 Analyze Screening Design

    Clicking this button will create a detailed analysis table to help you better understand the problem space and try to decrease it, if possible, for the optimal or adaptive design. In this table, you can see the following:

    • The list of all materials and conditions from processing steps, with their defined constraints
    • For each targeted property, how increasing the value of each material and/or condition from processing steps will impact that property:
      • ↑ means that increasing the value of material or condition from processing steps will increase the property value
      • ↓ means that increasing the value of material or condition from processing steps will decrease the property value
      • * means that material or condition from processing steps has a significant impact on changing the property value
    Figure 3.7 Analysis Table

    Below the Analysis table, a copy of the Material Constraints table from the Lab Book Overview record will be displayed, giving you the possibility to reduce the number of varying material and/or processing-step condition constraints (the ones with targets defined as “between”). Reducing this number is beneficial since you will be able to use different types of designed experiments and, eventually, get trials recommended by Alchemy AI.

    Figure 3.8 Material Constraints Table

    After the Analysis is created, at the bottom of the Workspace record, you can perform the following actions:

    • Continue Design of Experiments - it will open the same Design Experiments modal from the Lab Book Overview record 
    • Use Alchemy AI
    • Final conclusion
    • Next trial group
    • Scan all trials

    3.4 Optimal Design

    Optimal design is intended to fill the design space with experimental points. Because it requires more experiments to be performed, it is normally done if the problem space is initially small enough or if the problem space has been reduced by executing a screening design.

    • It is automatically proposed for five or fewer varied materials and/or conditions from processing steps, or users can choose it after completing the screening design.
    • A higher number of experiments than for screening are needed.
    • After all experiments are tested, the performance of predictive models is shown.

    The goals of optimal design are to:

    • Achieve high performance of the predictive models.
    • Recommend experiments with optimal values for target properties.

    The subtypes of optimal design available in Alchemy are:

    • Mixture Design
    • Mixture Process Design
    • Response Surface Design

    When you click on the Design Experiment button in the Lab Book Overview record, or Continue Design of Experiment in the Workspace record, if you have fewer than five varying material and/or conditions from processing step constraints (or you reduced them to fewer than five), the Optimal design will be the preselected type. The rest of the modal will look the same as displayed when Screening design was selected.

    As with creating a Screening design Workspace record, once you decide to continue with the Optimal design, a new Workspace record will be created for you. This action can take some time since Alchemy potentially creates many new theoretical and actual formulations. While they are being created, this message will be displayed in your Lab Book:

    Figure 3.9 Waiting Message - Optimal Design

    Once the Workspace record is created, it will have the following:

    • The corresponding number of theoretical trials, as defined in the Design Experiments modal. These trials are non-editable since you must create them exactly as described in order to gain valuable knowledge.
    • For each theoretical trial, an actual trial will be created as well, for you to log the actual values added to the trial.
    • Trials cannot be deleted or added to the Workspace record.
    Figure 3.10 Workspace - Optimal Design

    4. Machine Learning

    At Alchemy, our machine learning (ML) models use input and output variables for training to recognize certain types of patterns and make predictions. Alchemy AI (Figure 4.1) uses two types of ML algorithms depending on the type of output variables:

    • Regression machine learning algorithms - For training models with continuous numerical output variables.
    • Classification machine learning algorithms - For training models with predefined (categorical) output variables. 

    Alchemy’s AutoML module consists of 13 different algorithms for regression and 10 different algorithms for classification, which are trained in parallel in order to significantly shorten the training duration.

    Figure 4.1 Alchemy AI Diagram

    4.1 Dataset

    A dataset consists of input and output variables.

    Input variables are independent variables whose values are measured and input by the user. In Alchemy, input variables are:

    • Materials (e.g. resin A, solvent B)
    • Processing steps (e.g. temperature, mixing speed)

    Output variables are variables which depend on input variables. In Alchemy, output variables are:

    • Numerical properties (continuous numerical values, e.g. Viscosity, Drying time)
    • Predefined numerical properties (numerical categorical values)
    • Predefined alphanumeric properties (text or numerical categorical values, e.g. pass/fail)

    A dataset for training AI in Alchemy is surfaced through Alchemy’s Scan & Score or Alchemy AI functionality. These tools provide information about:

    • The number of matching trials with respect to the requirements added in the Material Constraints table on the Lab Book Overview record
    • The number of relevant trials with respect to the requirements added in the Tests table on the Lab Book Overview record
    • Whether it is possible to train ML models based on the available dataset

    The Show More Details button displays the number of available trials for each property separately.

    4.2 Train AI

    Train AI in Alchemy consists of:

    • Hyperparameter Tuning
    • Automatic Selection

    4.2.1 Hyperparameter Tuning

    Hyperparameter tuning is the process of searching for the hyperparameters that produce the highest-performing models for each ML algorithm.

    In Alchemy, the performance of ML models is evaluated through repeated k-fold cross-validation.

    In k-fold cross-validation, the dataset is split into k sets (folds); each time, one set is held back and models are trained on the remaining sets. The held-back sets are used for performance estimation, so a total of k models are fit, with performance evaluated as the mean over the held-back sets. This process is repeated l times with different splits, where l depends on how large the dataset is. In total, l × k models are fitted in repeated k-fold cross-validation to estimate the performance of the ML models.
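    As a generic illustration of repeated k-fold cross-validation (a scikit-learn sketch, not Alchemy's internal code; the estimator and dataset are placeholders):

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import RepeatedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))                 # placeholder: 60 trials, 4 inputs
y = X @ np.array([1.5, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=60)

# k = 5 folds repeated l = 3 times -> l * k = 15 models fitted
cv = RepeatedKFold(n_splits=5, n_repeats=3, random_state=0)
scores = cross_val_score(Ridge(), X, y, cv=cv, scoring="neg_mean_absolute_error")
print(f"{len(scores)} models fitted; mean MAE = {-scores.mean():.3f}")
```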

    The process of model training is shown in Figure 4.2. 

    Figure 4.2 Process of Model Training

    4.2.2 Automatic Selection 

    In Alchemy, selection of the best model is automatically made for each target property. Automatic selection of the best model consists of the following steps:

    1. Get the best performing model from all models for the same algorithm, but with different hyperparameters
    2. Get the best model from all models from step 1, one model for each algorithm for regression (13) and/or one model for each algorithm for classification (10)

    For automatically choosing the best model (Figure 4.3 and Figure 4.4), different performance metrics are used.

    • Regression Models: combined performance metric which relies on mean absolute error (MAE) and root mean square error (RMSE)
    • Classification Models:
      • If the predefined output values are balanced throughout the dataset, the system will choose accuracy for finding the best models 
      • If the predefined values are imbalanced throughout the dataset, the system will choose average precision for finding the best models   
    Figure 4.3 Automatic Selection of the Best Model for Regression Algorithms
    Figure 4.4 Automatic Selection of the Best Model for Classification Algorithms
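    The two-step selection can be illustrated as follows; the combined MAE/RMSE metric is not specified, so plain cross-validated MAE (lower is better) serves as a stand-in, and the numbers are made up:

```python
# Illustrative two-step selection; not Alchemy's implementation.
# results: {algorithm: {hyperparameter set: cross-validated MAE}}
results = {
    "random_forest":     {"hp_set_1": 4.2, "hp_set_2": 3.8},
    "gradient_boosting": {"hp_set_1": 3.5, "hp_set_2": 3.9},
    "ridge":             {"hp_set_1": 5.1},
}

# Step 1: best hyperparameter set per algorithm
best_per_algorithm = {algo: min(scores.values()) for algo, scores in results.items()}

# Step 2: best algorithm overall
best = min(best_per_algorithm, key=best_per_algorithm.get)
print(best, best_per_algorithm[best])  # gradient_boosting 3.5
```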

    4.3 Performance Metrics

    It is important that we track performance metrics to validate the models we generate in terms of the accuracy of predicted values. 

    First, a couple definitions:

    • Performance Metric: Validation of the model in terms of actual and predicted values. 
    • Actual Value: The test result achieved for the property of a certain trial based on actual testing.
    • Predicted Value: The test result which is predicted from machine learning models for the property of a certain trial.

    4.3.1 Regression

    Performance metrics for regression models available in Alchemy are:


    1. Model accuracy [%]:

    • metric based on the scatter index

    $$M A=100 \%-\left(\frac{R M S E}{\bar{y}} \times 100 \%\right)$$

    ${MA}$ - model accuracy

    ${RMSE}$ - root mean squared error

    $\bar{y}$ - average of all actual values

    • the metric ranges from 0 to 100%, where a higher value indicates better accuracy of the model 


    2. R² (coefficient of determination):

    • measure of the proportion of variance in the dependent variable that is predictable from the independent variables

    $$R^2=1-\frac{\sum_{i=1}^N\left(y_i-\hat{y_i}\right)^2}{\sum_{i=1}^N\left(y_i-\bar{y}\right)^2}$$

    ${N}$ - number of trials

    $y_i$ - actual value

    $\widehat{y_i}$ - predicted value

    $\bar{y}$  - average of all actual values

    • the metric ranges from -∞ to 1, where a higher value indicates a better model (a value of 1 indicates a perfect model)


    3. MAE (mean absolute error):

    • measure of the average magnitude of the differences between the prediction and the actual values for target property

    $$M A E=\frac{1}{N} \sum_{i=1}^N\left|y_i-\widehat{y_i}\right|$$

    ${N}$ - number of trials

    $y_i$ - actual value

    $\widehat{y_i}$ - predicted value

    • the metric ranges from 0 to +∞, where a lower value indicates a better model (a value of 0 indicates a perfect model)


    4. RMSE (root mean squared error):

    • measure of the square root of average magnitude of differences between predicted and actual values for target property

    $$R M S E=\sqrt{\frac{1}{N} \sum_{i=1}^N\left(y_i-\widehat{y}_i\right)^2}$$

    ${N}$ - number of trials

    $y_i$ - actual value

    $\widehat{y_i}$ - predicted value

    • the metric ranges from 0 to +∞, where a lower value indicates a better model (a value of 0 indicates a perfect model)
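    All four regression metrics can be computed directly from actual and predicted values; a short self-contained example with made-up numbers:

```python
import numpy as np

y_true = np.array([10.0, 12.0, 9.5, 11.0, 13.0])   # actual values (made up)
y_pred = np.array([10.5, 11.5, 9.0, 11.5, 12.5])   # predicted values (made up)

mae  = np.mean(np.abs(y_true - y_pred))
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
r2   = 1 - np.sum((y_true - y_pred) ** 2) / np.sum((y_true - y_true.mean()) ** 2)
ma   = 100 - (rmse / y_true.mean()) * 100          # model accuracy [%]

print(f"MAE={mae:.3f}  RMSE={rmse:.3f}  R²={r2:.3f}  MA={ma:.1f}%")
```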

    4.3.2 Classification

    Performance metrics for classification models available in Alchemy are:

    1. Accuracy: 

    • ratio of correctly predicted instances to the total number of instances in the dataset

    $$Accuracy =\frac{\text { Number of correct predictions }}{\text { Total number of predictions }}$$

    • the metric ranges from 0 to 1, where a higher value indicates a better model (a value of 1 indicates a perfect model)


    2. Average Precision:

    • area under the precision-recall curve which quantifies the model's ability to make accurate positive predictions

    $$\text{Average Precision}=\sum_{k=1}^N \operatorname{Precision}(k)\, \Delta \operatorname{Recall}(k)$$

    ${N}$ - number of trials

    $Precision(k)$ - the precision at a cutoff of k

    $\Delta Recall(k)$ - the change in recall between cutoff k-1 and cutoff k

    • the metric ranges from 0 to 1, where a higher value indicates a better model (a value of 1 indicates a perfect model)


    3. F1 Score:

    • Harmonic mean of: 

    - Precision: accuracy of positive predictions, which is the ratio of true positive predictions to the total number of positive predictions made by the model and

    $$Precision_{classI}=\frac{TP_{classI}}{TP_{classI}+FP_{classI}}$$

    ${Precision_{classI}}$ - precision for one class; there are as many classes as there are predefined values

    ${TP_{classI}}$ - true positives for class I: the number of trials correctly predicted as class I (the predicted class I matched the actual class I)

    ${FP_{classI}}$ - false positives for class I: the number of trials incorrectly predicted to belong to class I (the predicted class I did not match the actual class)

    - Recall: ratio of true positive predictions to the total number of actual positive instances in the dataset 

    $$Recall_{classI}=\frac{TP_{classI}}{TP_{classI}+FN_{classI}}$$

    ${FN_{classI}}$ - false negatives for class I: the number of class I trials incorrectly predicted to belong to another class (the predicted class did not match the actual class I)

    $$F1score_{class~I}=\frac{2\times Precision_{class~I}\times Recall_{class~I}}{Precision_{class~I}+Recall_{class~I}}$$

    • the metric ranges from 0 to 1, where a higher value indicates well-balanced performance, demonstrating that the model can concurrently attain high precision and high recall (a value of 1 indicates a perfect model that accurately predicts each class)

    4. ROC AUC Score (area under the receiver operating characteristic curve): 

    • evaluation of a model's ability to discriminate between positive and negative instances 
    • the metric ranges from 0 to 1, where a higher value indicates better discrimination between positive and negative instances
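    The classification metrics above are available in scikit-learn; a short example with made-up pass/fail data:

```python
from sklearn.metrics import (accuracy_score, average_precision_score,
                             f1_score, roc_auc_score)

# Made-up pass/fail results: 1 = pass, 0 = fail
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]                    # hard class predictions
y_prob = [0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6, 0.1]    # predicted P(pass)

print("Accuracy:         ", accuracy_score(y_true, y_pred))
print("F1 score:         ", f1_score(y_true, y_pred))
print("Average precision:", average_precision_score(y_true, y_prob))
print("ROC AUC:          ", roc_auc_score(y_true, y_prob))
```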

    4.4 Trained Model Goals

    At Alchemy, we strive to achieve three goals when models are trained:

    1. Recommend trials with the optimal test results for all target properties
    2. Predict target property test result for trials with input variables defined by the user 
    3. Get insights into the importance of the input variables

    All predicted property values have associated prediction intervals, which show how much deviation can be expected from the predicted property value for a certain trial.

    5. Interchangeable Materials

    In formulation development, you often work with a set of similar ingredients that can be used in place of one another. For example, you might have several solvents that are functionally equivalent but have different costs or properties. Managing these substitutions can be complex, especially when designing large sets of experiments.

    The Interchangeable Materials feature is designed to solve this problem by giving you more flexibility and control over your formulations. It allows you to group these similar materials and define clear rules for how they can be substituted for one another in your experiments.

    With this feature, you can:

    • Group similar materials: Create named groups of ingredients (e.g., "Resin", "Additive") based on their material type.
    • Define substitution rules: Specify exactly how many materials from a group can be included in a single trial. For instance, you could require that exactly one solvent from a group is used, or a range of one to three different solvents.
    • Combine with other constraints: These interchangeable groups can be used alongside other materials and processing steps in your experimental designs.

    Using this feature impacts how the system generates a Design of Experiments (DOE), scores existing trials in Scan & Score, and makes AI recommendations, as all functions will adhere to the rules you define.

    5.1 Creating an Interchangeable Material Group

    Follow the steps to add an Interchangeable Material Group to your formulation table:

    1. To begin, navigate to the Lab Book Overview page. At the bottom of the Constraints table, you will find a button labeled +INTERCHANGEABLE MATERIALS (Figure 5.1). Once you click the button, the "Interchangeable Materials" modal will appear. Here, you will define the materials and rules for your group.
    Figure 5.1. Button for adding Interchangeable Materials Group
    2. Define the Group:
      • Give your group a unique Group name (Figure 5.2.).
      • Select a material Type and, optionally, one or more Subtypes. This will filter the list of materials you can add to the group in a later step (Figure 5.2.).
    Figure 5.2. The Interchangeable Materials modal: set the Group name and select a Type and, optionally, Subtypes
    3. Set the Constraints: This is the core of the feature, where you define the rules for the group.
    • First, choose how you want to Define Constraints On by selecting either Group or Material from the dropdown (Figure 5.3.).
    Figure 5.3. The Interchangeable Materials Modal Defining Constraints
    • If you chose "Group":
      • Set the Group target weight (Figure 5.4). You can select "Constant" for an exact value or "Between" for a range. The system will automatically calculate the allowed range for each individual material.
    Figure 5.4. The Interchangeable Materials Modal Constraints Defined on a Group level
    • If you chose "Material":
      • The Group target weight fields will become read-only. You will define the constraints for each material in the table below, and the system will automatically calculate the resulting total for the group.
    Figure 5.5. The Interchangeable Materials Modal Constraints Defined on a Material level
    4. Add Materials to the Group:
      • Click the "+ Material" button.
      • A dialog will appear showing all available materials based on the Type and Subtype you selected earlier.
      • Select the materials you want to include in the group and click "Add".
      • The selected materials will now appear in a table within the main dialog (Figure 5.6-5.8).
    Figure 5.6. The Interchangeable Materials modal, showing constraints defined at the group level, including a constant target weight and the added materials.
    Figure 5.7. The Interchangeable Materials modal, showing constraints defined at the group level, including a between target weight and the added materials.
    5. Configure Individual Material Constraints (if applicable):
      • If you chose to define constraints at the Material level in step 3, you can now set the Target Weight for each material in the table. You can set each one to be "Constant" or "Between"  and enter the desired values (Figure 5.8).
    Figure 5.8. The Interchangeable Materials modal, showing constraints defined at the material level, added materials with between or constant target weight.
    6. Set the Number of Materials:
      • At the bottom of the dialog, use the "# of Materials" dropdowns to define the minimum and maximum number of materials from this group that must be used in any given trial. For example, if you have 3 materials in the group, you could specify that each trial must use between 1 and 2 of them.
    Figure 5.9. The Interchangeable Materials Modal # of Materials to be used with defined constraints on group level.
    Figure 5.10. The Interchangeable Materials Modal # of Materials to be used with defined constraints on material level
    7. Add the Group to the Constraints table in Lab Book Overview:
      • Once you're satisfied with your configuration, click the "Add" button. The Interchangeable Material Group will be added as a new entry in your Constraints table on Lab Book Overview.
    Figure 5.11. The Interchangeable Materials Group added into Constraints table in Lab Book Overview

    5.2 Using Interchangeable Materials with Other Features

    The rules you define for interchangeable materials directly guide how the platform generates a Design of Experiments (DOE), scores trials in Scan & Score, and creates AI recommendations.

    5.2.1 DOE and AI Training with Interchangeable Materials

    In the current version, the ability to use the DESIGN EXPERIMENTS and USE ALCHEMY AI buttons depends on the units you select for your materials.

    • Disabled: If you have an interchangeable group and select Target Weight [%] or Volume [%], these buttons will be disabled. A tooltip will explain that you must use absolute units.
    • Enabled: If you use absolute units like Weight or Volume, these buttons will be enabled, allowing you to proceed.

    When an interchangeable material group is added, DESIGN EXPERIMENTS and ALCHEMY AI are adjusted to incorporate the formulating strategy.

    5.2.2 Scan & Score with Interchangeable Materials

    When you use Scan & Score, the system will only show trials that follow the rules you defined for your interchangeable materials. This includes:

    • Groups whose constraints do not include 0 must be present in the trials
    • Trials must contain the correct number of materials from each interchangeable group (e.g., if you set a range of 1-2, only trials with 1 or 2 of those materials will be shown).

    The values in the Scan & Score table are color-coded to indicate if they meet the constraints you set. Green means the value is within the defined range while red means it violates the rules.
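    As an illustration, a trial's compliance with an interchangeable group's rules could be checked as follows; the structures are hypothetical and this is not Alchemy's implementation:

```python
def trial_respects_group(trial, group):
    """Illustrative check of interchangeable-group rules (not Alchemy's code).

    trial: dict of material -> amount used (0 or absent means unused)
    group: {"materials": [...], "min_count": int, "max_count": int,
            "low": float, "high": float}  # group-level target weight range
    """
    used = [m for m in group["materials"] if trial.get(m, 0) > 0]
    if not (group["min_count"] <= len(used) <= group["max_count"]):
        return False  # wrong number of materials from the group
    total = sum(trial.get(m, 0) for m in group["materials"])
    return group["low"] <= total <= group["high"]  # group weight within range
```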

    6. Appendices

    The following supplemental material is used to support the above documentation.

    6.1 Appendix A - Mixture Design

    Mixture Design, a subtype of Screening and Optimal Designs, is intended for experiments where the user is interested in the influence of varied dependent variables (weight or volume percentage of each material) on target properties. 

    Mixture design in Alchemy is automatically proposed when the user inputs wt% or vol% for each material, together with a total batch size that is the same for all experiments in the design.

    Required constraints for each material are:

    • Lower and upper bound for varied materials, or
    • Constant for non-varied materials (the influence of these materials cannot be assessed)

    Rules which need to be respected for properly setting the constraints:

    • Sum of all lower bounds and constant materials should be lower than 100%
    • Sum of all higher bounds and constant materials should be higher than 100%

    6.2 Appendix B - Mixture Process Design

    Mixture Process Design, a subtype of Screening and Optimal Designs, is intended for experiments where the user is interested in the influence of varied dependent variables, materials (weight or volume percentage of each material) and processing steps, on target properties. 

    Mixture process design in Alchemy is automatically proposed when the user inputs wt% or vol% for each material, a total batch size that is the same for all experiments in the design, and processing steps.

    Required constraints for each material are:

    • Lower and upper bound for varied materials, or
    • Constant for non-varied materials (the influence of these materials cannot be assessed)

    Required constraints for each condition from processing step are:

    • Lower and upper bound - for conditions of number type
    • Any of - for conditions of predefined number or text (alphanumeric) type
    • Set to - for conditions of predefined number or text (alphanumeric) type that are not varied, or
    • Constant - for non-varied conditions (the influence of these processing-step conditions cannot be assessed)

    Rules which need to be respected for properly setting the constraints:

    • Sum of all lower bounds and constant materials should be lower than 100%
    • Sum of all higher bounds and constant materials should be higher than 100%

    6.3 Appendix C - Factorial Design

    Factorial Design, a subtype of Screening Design, is intended for experiments where the user is interested in the influence of varied independent variables, materials (values for weight and/or volume) and/or processing steps on target properties. 

    Factorial design is automatically proposed when the user inputs weight and/or volume for materials and/or processing steps. 

    Required constraints for each material are:

    • Lower and upper bound or
    • Constant 

    Required constraints for each condition from processing step are:

    • Lower and upper bound - for conditions of number type
    • Any of - for conditions of predefined number or text (alphanumeric) type
    • Set to - for conditions of predefined number or text (alphanumeric) type that are not varied, or
    • Constant - for non-varied conditions (the influence of these processing-step conditions cannot be assessed)

    These constraints are valid without any additional rules.

    6.4 Appendix D - Response Surface Design

    Response Surface Design, a subtype of Optimal Design, is intended for experiments where the user is interested in the influence of varied independent variables, materials (values for weight and/or volume) and/or processing steps, on target properties. 

    Response surface design is automatically proposed when the user inputs weight and/or volume for materials. 

    Required constraints for each material are:

    • Lower and upper bound or
    • Constant 

    Required constraints for each condition from processing step are:

    • Lower and upper bound - for conditions of number type
    • Any of - for conditions of predefined number or text (alphanumeric) type
    • Set to - for conditions of predefined number or text (alphanumeric) type that are not varied, or
    • Constant - for non-varied conditions (the influence of these processing-step conditions cannot be assessed)

    These constraints are valid without any additional rules.

    © 2017-2025, Alchemy Cloud, Inc. All Rights Reserved. Confidential & Proprietary. Not for redistribution.