AI DOE User Manual

  • Introduction
  • Scan & Score
    • 2.1 Scan & Score Setup
      • 2.2 How it Works
        • 2.2.1 Scanning
        • 2.2.2 Scoring
      • 2.3 Using Scan & Score
      • 2.4 Scan & Score this Lab Book
      • 2.5 Scan & Score AI-Powered Statistical Analysis
        • 2.5.1 Summary Report
        • 2.5.2 Detailed Analysis
        • 2.5.3 How to Generate Report
    • DOE
      • 3.1 Background
      • 3.2 DOE Setup
      • 3.3 Screening Design
        • 3.3.1 Analyze Screening Design
      • 3.4 Optimal Design
    • Machine Learning
      • 4.1 Dataset
      • 4.2 Train AI
        • 4.2.1 Hyperparameter Tuning
        • 4.2.2 Automatic Selection
      • 4.3 Performance Metrics
        • 4.3.1 Regression
        • 4.3.2 Classification
      • 4.4 Trained Model Goals
      • 4.5 How to use Machine Learning
        • 4.5.1 Prerequisites
        • 4.5.2 Launching Alchemy AI
        • 4.5.3 Reviewing model performance
        • 4.5.4 Model Explainer
        • 4.5.5 Find Best Results
        • 4.5.6 Improve Model Performance
    • Interchangeable Materials
      • 5.1 Creating an Interchangeable Material Group
      • 5.2 Using Interchangeable Materials with Other Features
        • 5.2.1 DOE and AI Training with Interchangeable Materials
        • 5.2.2 Scan & Score with Interchangeable Materials
    • Appendices
      • 6.1 Appendix A - Detailed Statistical Analysis: Reference
      • 6.2 Appendix B - Mixture Design
      • 6.3 Appendix C - Mixture Process Design
      • 6.4 Appendix D - Factorial Design
      • 6.5 Appendix E - Response Surface Design
      • 6.6 Appendix F - Model Explainer Dashboard
      • 6.7 Appendix G - Model Explainer Dashboard

    ** Reading the Manual on phones and smaller screen tablets is not recommended, since you may not enjoy the best user experience.

    🔐 AI & DOE are Licensed Products

    Please discuss how to add them to your system with your CSM or Salesperson.

    1. Introduction

    Alchemy’s AI handles data selection, data preprocessing, model training, and model tuning. It trains, in parallel, over 10,000 different models and hyperparameter combinations on AWS using more than 20 machine learning algorithms to find the best fit for specific datasets, ensuring strong performance on both existing and unseen data. Model selection is refined using advanced statistical methods to reduce bias and improve performance, especially with smaller datasets. This technique helps mitigate overfitting while enhancing generalization. The selected models are then applied in another genetic algorithm to generate formulation recommendations for data exploitation. This algorithm guides the formulation toward the desired properties and also predicts property values, providing prediction intervals that reflect the uncertainty of the predictions.

    Our AI-ready platform comprises three main components:

    • Scan & Score
    • Design of Experiments (DOE)
    • Machine Learning (ML)

    2. Scan & Score

    Alchemy’s Scan & Score functionality is able to:

    • Scan trials according to the defined requirements in the Lab Book Overview record 
    • Score the trials according to the priority and targets of the properties

    Scan & Score has two purposes:

    1. Rank trials from highest to lowest performance related to material constraints, as well as priorities and targets for each property 
    2. Surface the data set to be used in the auto-ML (machine learning) module

    2.1 Scan & Score Setup

    To enable Scan & Score functionality, Requirements on the Lab Book Overview record need to be defined (Figure 2.1). Available categories of requirements on the Lab Book Overview record include:

    • Calculations, or calculated properties, that are based on materials and their quantities and material properties.
      • Example: Cost calculations
    • Tests or measured properties
      • Measured properties - Results that have been measured in the laboratory.
        • Example: Viscosity at different temperatures; Hardness on different days
      • Calculated properties - Based on other measured properties. 
        • Example: Water pickup % at different time intervals

    After selecting the category of requirements, you then select the priority and fill in the target.

    • Priority:
      • Must Have
      • Nice to Have
      • No target, rate only
    • Targets:
      • Exact
      • Lower than
      • Higher than
      • Between
    ‍Figure 2.1 Lab Book Overview - Requirements

    Constraints (Figure 2.2) are theoretical constraints for each material/processing step that is used in a trial. Examples include any generic materials like Water or TiO2, as well as proprietary materials that can be used as ingredients in trials. Once selected, target values must be defined as:

    • Between 
    • Constant
    ‍Figure 2.2. Lab Book Overview - Constraints

    2.2 How it Works

    After the requirements and constraints on the Lab Book Overview record have been entered, the Scan & Score button can be clicked from any of the following records:

    • Lab Book Overview - scans all trials across the entire database and returns those that satisfy the defined requirements.
    • Workspace - scans all trials within the parent Lab Book (across all its Workspaces) and returns those that satisfy the defined requirements.
    • Workspace generated by Design of Experiments - scans all trials within the parent Lab Book (across all its Workspaces, including DoE-generated ones) and returns those that satisfy the defined requirements.

    2.2.1 Scanning

    Scanning (Figure 2.3) searches for a 1:1 match for properties, along with their conditions, from the Lab Book Overview record with the properties present on the trial level. This means that:

    • If every property–condition pair defined on the Lab Book Overview record is also present at the trial level, the corresponding trial is surfaced in Scan & Score.
    • If even one property–condition pair defined on the Lab Book Overview record is missing or differs at the trial level, the trial is excluded from Scan & Score.
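The matching rule above can be read as a subset check. The following is a minimal sketch, not Alchemy's implementation: the function name `is_surfaced`, the tuple representation of a property-condition pair, and the sample property names are all assumptions made for illustration.

```python
# Hypothetical sketch: a trial is surfaced only if every (property, condition)
# pair required on the Lab Book Overview record is also present on the trial.

def is_surfaced(required_pairs, trial_pairs):
    """Return True when all required property-condition pairs exist on the trial."""
    return set(required_pairs) <= set(trial_pairs)


# Made-up requirements and trials, for illustration only.
required = {("Viscosity", "25 C"), ("Hardness", "Day 7")}

full_match = {("Viscosity", "25 C"), ("Hardness", "Day 7"), ("Gloss", "60 deg")}
missing_property = {("Viscosity", "25 C")}
different_condition = {("Viscosity", "25 C"), ("Hardness", "Day 1")}

print(is_surfaced(required, full_match))           # True
print(is_surfaced(required, missing_property))     # False
print(is_surfaced(required, different_condition))  # False
```

A property measured under a different condition counts as a different pair, which is why the third trial is excluded.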

    Trials from the previous step are then filtered by the material constraints defined on the Lab Book Overview record. Which trials are surfaced for scoring depends on those constraints:

    • If all materials have constraints from lower value > 0 to higher value > 0:
      • Trials that contain all materials from the Lab Book Overview record
      • Trials that have all materials from the Lab Book Overview record, plus some other materials, as long as the other materials have an entered value of 0 or an empty field
    • If one or more materials have a between target value constraint from a lower value = 0 to a higher value > 0:
      • Trials which contain all materials from the Lab Book Overview record
      • Trials that have all materials from the Lab Book Overview record, plus some other materials, as long as the other materials have an entered value of 0 or an empty field 
      • Trials that contain every material whose constraint runs from a value > 0 to a value > 0, but omit the materials whose constraint runs from 0 to a value > 0
    ‍Figure 2.3 Scanning Relevant Trials

    From Figure 2.3 it can be seen that scanning will surface the following trials for scoring:

    • Trial 1 - Full Match
      • All properties and material constraints are an exact match compared to the requirements on the Lab Book Overview record
    • Trial 5 - Full Match
      • It has Material IV, but its value is 0
      • It does not have Material III but based on the requirements on the Lab Book Overview page its value can be 0

    Scanning will not surface the following trials for scoring:

    • Trial 4 - It matches neither the calculations, the tests, nor the materials
    • Trial 3 - It matches all calculated properties, Measured property I, and all materials, but is missing Measured property II from the requirements on the Lab Book Overview record.
    • Trial 2 - Measured property II - Condition II has a different value compared to the requirements on the Lab Book Overview record
    • Trial 6 - It has material IV with a value > 0 and that material is not in the requirements on the Lab Book Overview record
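The material part of the scan, as illustrated by Trials 5 and 6, can be sketched as follows. This is an illustrative reading under stated assumptions: the function name, the dictionary layout, and all quantities are hypothetical, and the sketch checks presence only (whether a quantity falls inside its bounds is assumed to affect ordering rather than surfacing).

```python
# Hypothetical sketch of the material filter applied during scanning.
# constraints: material -> (lower, upper) bounds from the Lab Book Overview record.
# trial: material -> entered quantity (None stands for an empty field).

def passes_material_filter(constraints, trial):
    for material, (lower, _upper) in constraints.items():
        amount = trial.get(material) or 0
        # Materials whose lower bound is above 0 must be present in the trial.
        if lower > 0 and amount <= 0:
            return False
    for material, amount in trial.items():
        # Materials outside the constraints are tolerated only at 0 or empty.
        if material not in constraints and (amount or 0) > 0:
            return False
    return True


# Made-up bounds and quantities, for illustration only.
constraints = {
    "Material I": (10, 20),
    "Material II": (5, 15),
    "Material III": (0, 5),  # lower bound of 0: the material may be absent
}

trial_5 = {"Material I": 12, "Material II": 8, "Material IV": 0}
trial_6 = {"Material I": 12, "Material II": 8, "Material III": 2, "Material IV": 3}

print(passes_material_filter(constraints, trial_5))  # True
print(passes_material_filter(constraints, trial_6))  # False
```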

    Trials that have partial matches are surfaced and scored because they have at least one exact match for a property, along with its conditions. 

    2.2.2 Scoring

    Once the trials are scanned, every trial is assigned a score based on priority and on how far the values of the trial's properties and materials are from the targets defined on the Lab Book Overview record.

    Trials are ranked from the lowest scores, or best performing trials, to the highest scores, or worst performing trials.

    2.3 Using Scan & Score

    At the bottom of the Lab Book Overview record is the SCAN & SCORE ALL (Figure 2.4) button. Clicking it triggers a search of your historical database and returns a force-ranked list of Samples matching the requirements, targets, and material constraints defined on the Lab Book Overview. For this button to be enabled, at least one requirement (measured or calculated) with a priority of Must Have or Nice to Have must be added to the Requirements table in the Lab Book Overview record. Requirements with a No target, rate only priority are not included in Scan & Score.

    ‍Figure 2.4. Lab Book Overview Record
    ‍

    Once the SCAN & SCORE ALL button is clicked, the system navigates to a results page that displays the matching actual trials for your requirements (Figure 2.5).

    The top of the page displays two boxes:

    • Matching Trials
      • Displays the number of trials that fully satisfy both the material and processing step constraints and the property targets defined on the Lab Book Overview record.
    • Data for predictive models
      • Displays whether the available historical data is sufficient based on the number of given constraints and matching trials:
        • The data is deemed sufficient if the number of matching trials is at least 2 × (varying constraints) + 1 for every property with a priority of Must Have or Nice to Have
        • The data is deemed partially sufficient when the threshold 2 × (varying constraints) + 1 is met for some but not all properties with a priority of Must Have or Nice to Have
        • The data is deemed insufficient when the number of matching trials is below 2 × (varying constraints) + 1 for every property with a priority of Must Have or Nice to Have
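The threshold rule can be made concrete with a short sketch. The function name and the sample counts are hypothetical, and how "varying constraints" are counted is not spelled out here, so the count is simply passed in.

```python
# Hypothetical sketch of the "Data for predictive models" status.

def sufficiency_status(varying_constraints, trials_per_property):
    """Classify data as sufficient, partially sufficient, or insufficient.

    varying_constraints: number of varying constraints (supplied by the caller).
    trials_per_property: matching-trial count for each Must Have or
        Nice to Have property.
    """
    threshold = 2 * varying_constraints + 1
    met = [count >= threshold for count in trials_per_property.values()]
    if all(met):
        return "sufficient"
    if any(met):
        return "partially sufficient"
    return "insufficient"


# With 4 varying constraints the threshold is 2 * 4 + 1 = 9 trials per property.
counts = {"Viscosity": 12, "Hardness": 7}
print(sufficiency_status(4, counts))  # partially sufficient
```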

    Figure 2.5. Scan & Score Results

    The Relevant Trials (Figure 2.6) table is shown by default and displays the total number of historical trials for each Must Have or Nice to Have property. Only trials that fully match the trial rules are counted, and this dataset can be used as a starting point for model training. The number helps users determine whether model training should be attempted or whether they should proceed directly to DOE. The table can be collapsed or expanded as needed.

    • Each measured property, as well as each calculated property based on other measured values, displays the number of relevant trials with associated test results for that property. The number is shown in green if it is above the calculated threshold and in red otherwise.
    • For each calculated property, the value is always N/A and colored green
    ‍Figure 2.6. Scan & Score - Relevant Trials

    For each trial in the results table, you can see:

    • Name - The formulation code which is a link to the corresponding Sample
    • Values for all Must Have and Nice to Have targeted properties (calculated or measured)
      • If the target is met, the value is green
      • If the target is not met, the value is red
    • A radio button to select the trial you wish to use as the starting trial in a newly created Workspace record.
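The green/red marking of a value against its target can be sketched with the four target types from Section 2.1. This is an assumption-laden illustration: the function names are hypothetical, and whether the bounds of a Between target are inclusive, or how much tolerance an Exact target allows, is not stated in this manual.

```python
import math

# Hypothetical sketch: decide whether a value meets its target and pick a color.

def meets_target(value, target_type, target, tolerance=1e-9):
    if target_type == "Exact":
        # Assumption: "Exact" is compared with a small numeric tolerance.
        return math.isclose(value, target, abs_tol=tolerance)
    if target_type == "Lower than":
        return value < target
    if target_type == "Higher than":
        return value > target
    if target_type == "Between":
        low, high = target
        # Assumption: both bounds are inclusive.
        return low <= value <= high
    raise ValueError(f"Unknown target type: {target_type}")


def cell_color(value, target_type, target):
    return "green" if meets_target(value, target_type, target) else "red"


print(cell_color(4.9, "Lower than", 5))   # green
print(cell_color(12, "Between", (5, 10)))  # red
```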

    When the desired trial is selected, the button Proceed to Formulating is enabled. Clicking on it will:

    • Close the Scan & Score results page 
    • Create a new Workspace record with the preselected Starting Trial matching the one selected in the previous screen

    With the Starting Trial selected, the Workspace record will have:

    • The first initial Trial created as a copy of the Starting Trial
    • The Trial table pre-populated with the first trial information
    • Calculation and Testing tables will be predefined (pulling calculated and measured properties from the current Lab Book Overview record, but with no values)

    If the Material and Processing Step Constraints table on the Lab Book Overview record is filled in, Scan & Score also filters trials by material constraints. Only trials with matching ingredients are taken into account.

    During Scan & Score, trials are ordered in two tiers: trials whose material and processing step constraints all fall within the defined targets are listed first, followed by trials with one or more constraint values outside those targets. Within each tier, trials are ranked by their performance score.
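The two-tier ordering can be expressed as a sort on a composite key. In this sketch the `ScoredTrial` class, its field names, and the scores are hypothetical; the only facts taken from this manual are that in-range trials come first and that a lower score means a better trial (Section 2.2.2).

```python
from dataclasses import dataclass


@dataclass
class ScoredTrial:
    name: str
    within_constraints: bool  # all material/processing constraints within targets
    score: float              # lower is better, per Section 2.2.2


def rank_trials(trials):
    # False sorts before True, so negating the flag lists in-range trials first;
    # within each tier, trials are ordered by ascending score.
    return sorted(trials, key=lambda t: (not t.within_constraints, t.score))


trials = [
    ScoredTrial("T-3", False, 0.10),
    ScoredTrial("T-1", True, 0.40),
    ScoredTrial("T-2", True, 0.25),
]
print([t.name for t in rank_trials(trials)])  # ['T-2', 'T-1', 'T-3']
```

Note that T-3 has the best score but still ranks last because one of its constraint values is out of range.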

    2.4 Scan & Score this Lab Book

    At the bottom of every Workspace record is the SCAN & SCORE THIS LAB BOOK button. Selecting it opens the same Scan & Score results page (Figures 2.5 and 2.6) that appears when the SCAN & SCORE ALL button is clicked on the Lab Book Overview record, displaying all matching trials within the current Lab Book.

    ‍Figure 2.7 Scan all Trials

    However, certain conditions must be met for this feature to be enabled:

    • At least one requirement, measured or calculated, must exist within the Calculations or Tests sections with a priority of Must Have or Nice to Have.
    • Requirements must be pulled from the Lab Book Overview record so they carry the appropriate priorities and targets. Adding tests within the Workspace record will generate them as No target, rate only, excluding them from the Scan & Score this Lab Book feature.
    ‍Figure 2.8 Scan all Trials - Results

    Once the Best Performing Trial is selected, two additional options are enabled:

    • Final Conclusion
      • Closes the results modal
      • Creates a Final Conclusion record with the Best Performing Trial auto-populated  based on the selection from the previous modal.
        • Note: This is only enabled if there is no Final Conclusion record in the current Lab Book.
    • Next Trial Group
      • Closes the results modal
      • Creates a new Workspace record with the Starting Requirements auto-populated based on the Best Performing Trial selection from the previous modal.

    2.5 Scan & Score AI-Powered Statistical Analysis

    Once your dataset has been processed on the Scan & Score Results page, you gain access to an automated, human-readable report that provides deep statistical insights into your data's quality and structure. This report is essential for confirming data readiness and understanding the potential of your dataset before initiating model training.

    2.5.1 Summary Report

    This report provides a concise, human-readable summary of your dataset's statistical profile, its readiness for Machine Learning (ML), and specific recommended actions to maximize model performance.

    1. Executive Summary

    This section gives you the immediate, high-level status of your dataset.

    • Dataset Size: Reports the total number of experimental observations (rows) and the count of all predictor (materials/processing steps) variables.
    • Target Variables: Identifies the number of continuous output variables available for prediction.
    • Overall AI Readiness Score: Provides a final score out of 100, categorized (e.g., Bad, Fair, Good), to quickly assess the dataset's current suitability for optimal AI/ML use.
    • Key Data Concerns: Summarizes the most significant issues found, including the number of potential outliers detected and the percentage of observations affected.
    • Key Feature Concerns: Highlights the number of input variables flagged for constant or low variance, which are typically uninformative for prediction.
    • Model Performance Snapshot: Provides a quick measure of predictive capability for target properties (e.g., R² value).

    2. Recommended Actions

    Advises users to address data quality issues before proceeding to model training to ensure more reliable and accurate predictions.

    3. Detailed Analysis Summary

    This section provides the breakdown of the AI Readiness Score and advanced statistical findings. The overall score is derived from five components, each evaluated out of 100:

      1. Data Sufficiency Assessment: Reflects whether the volume of data is adequate relative to the number of features. A higher score means less risk of overfitting. Key Metric: Samples per feature per target.
      2. Data Quality Assessment: Measures data issues like outlier presence and unused variables. A low score suggests significant data cleaning is needed.
      3. Feature Quality Assessment: Measures feature relevance by checking for issues like low-variance inputs. A high score indicates more informative, relevant features.
      4. Statistical Validity Assessment: Assesses how well the data and model residuals meet core statistical assumptions (like normality and homoscedasticity). Low scores warn that robust or non-linear modeling may be necessary.
      5. Model Performance Assessment: Reflects the predictive capability of an initial model on the dataset (e.g., R² score), indicating if simple models can capture the main relationships.

    4. Additional Analysis Details

    The report also includes Statistical and Design Insights confirming Sample Size Adequacy, reporting Design Efficiency (G-Efficiency), listing Key Predictors based on significance and correlation, and providing Residual Diagnostics to ensure the reliability of the chosen model.

    2.5.2 Detailed Analysis

    The detailed analysis is organized into the following eight comprehensive sections:

    1. Dataset Overview & Data Availability

    This section provides a high-level summary of the dataset's composition and completeness.

    • Dataset Dimensions: Confirms the total number of observations (rows) and the number of variables (columns).
    • Variable Counts: Clearly distinguishes between the count of Input Variables (predictors) and Target Variables (outputs).
    • Data Availability: Measures the overall completeness of the dataset by reporting the percentage of non-missing values.
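As a rough illustration of the Data Availability figure, the percentage of non-missing values can be computed as below. The function name, the use of `None` for a missing value, and the sample rows are assumptions; the report's exact counting rules are not described in this manual.

```python
# Hypothetical sketch: share of non-missing cells in a small dataset.

def data_availability(rows):
    """Percentage of non-missing cells; None marks a missing value."""
    cells = [value for row in rows for value in row.values()]
    if not cells:
        return 0.0
    present = sum(value is not None for value in cells)
    return 100.0 * present / len(cells)


# Made-up observations: 5 of the 6 cells are filled in.
rows = [
    {"Water": 40.0, "TiO2": 10.0, "Viscosity": 1200.0},
    {"Water": 42.0, "TiO2": None, "Viscosity": 1150.0},
]
print(round(data_availability(rows), 1))  # 83.3
```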

    2. Variable Classification & Structure

    This section confirms how the system interpreted and categorized each variable, which is fundamental for accurate modeling.

    • Variable Classification: Confirms the automatic categorization of variables as either Categorical (discrete groups) or Continuous (numerical range).
    • Variable Identification: Clearly separates and lists variables based on their role: Input Variables (the predictors) versus Target Variables (the measured outcomes).

    3. Dataset Quality Assessment

    This critical section evaluates the robustness and quality of the data for modeling, flagging common issues that can affect model performance.

    • Target Coverage: Assesses the dataset's density by reporting the ratio of observations to input variables.
    • Target Variation: For continuous targets, the Coefficient of Variation is reported to ensure there is enough variation in the outcome for the model to learn meaningful relationships.
    • Low Variance Detection: Identifies and flags variables that are constant or show minimal variation, as these are often uninformative for prediction.
    • Multicollinearity: Detects highly correlated pairs of input variables, indicating potential redundancy or instability in a linear model.
    • Normality Testing: Evaluates the statistical distribution of variables to assess whether they meet the assumptions required by various modeling techniques.
    • Outlier Detection: Identifies and flags extreme values within the dataset that may unduly influence model results.
    • Variable Representation: Reports the percentage of trials that actually utilize each input variable, highlighting underutilized materials or conditions.
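Three of the checks above reduce to simple statistics. The sketch below uses Python's standard `statistics` module; the function names, the variance cutoff, the rule that a non-zero quantity counts as "used", and the data are assumptions for illustration, not the thresholds the report applies.

```python
import statistics

# Hypothetical sketches of three dataset-quality checks.

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, as a percentage."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return 100.0 * statistics.stdev(values) / abs(mean)


def low_variance_columns(columns, min_variance=1e-8):
    """Names of inputs whose sample variance is at or below min_variance."""
    return [name for name, values in columns.items()
            if statistics.variance(values) <= min_variance]


def representation(columns):
    """Percentage of trials in which each input has a non-zero value."""
    return {name: 100.0 * sum(1 for v in values if v) / len(values)
            for name, values in columns.items()}


# Made-up inputs across four trials.
inputs = {
    "Water": [40.0, 42.0, 38.0, 41.0],
    "Defoamer": [0.5, 0.5, 0.5, 0.5],  # constant, so flagged as low variance
    "Wax": [0.0, 0.0, 2.0, 0.0],       # used in only one of four trials
}

print(low_variance_columns(inputs))   # ['Defoamer']
print(representation(inputs)["Wax"])  # 25.0
```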

    4. Experimental Design Efficiency

    This section is crucial when analyzing data from Design of Experiments (DOE), providing metrics to evaluate the quality of the design itself.

    • Design Quality Metrics: Provides core metrics for evaluating the overall effectiveness and balance of the experimental design.
    • G-efficiency Calculations: Reports the G-efficiency score, which measures the precision of parameter estimates.
    • Distance-Based Criteria: Metrics related to how well observations are spread across the experimental space.
    • Discrepancy Criteria: Metrics that quantify the non-uniformity or gaps in the design space.

    5. Cluster Analysis

    This section performs unsupervised learning to reveal inherent groupings within your data that may not be immediately obvious.

    • Pattern Detection: Identifies and visualizes natural groupings or segments (clusters) within the observations.
    • Observation Grouping: Provides information on how similar observations were grouped together, offering insight into underlying similarities across experiments or formulations.

    6. Feature Importance Analysis

    This section moves beyond simple correlations to explain why and how much each input variable influences the target variable.

    • Correlation Analysis: Reports the linear relationship between each input variable and the target variable.
    • SHAP Analysis (SHapley Additive exPlanations): Uses advanced, model-agnostic techniques to explain the contribution of each feature to the model's predictions.
    • Variable Importance Rankings: Provides a final, ranked list of input variables based on their overall significance to the prediction task.

    7. Regression Model Results

    This section summarizes the performance of the predictive models automatically generated by the system.

    • Best Model Selection: Identifies the optimal predictive model chosen based on internal scoring and cross-validation performance.
    • Model Performance Metrics: Reports standard metrics for the selected model, such as R² (explained variance) and RMSE (Root Mean Square Error).
    • Significant Variables and Coefficients: Lists the key predictor variables and their associated coefficients, detailing their direction and magnitude of impact on the target.
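For readers unfamiliar with the two metrics named above, these are their standard textbook definitions in code. The function names and the sample values are illustrative; how the report splits data for cross-validation is not covered here.

```python
import math

# Standard definitions of R² and RMSE, applied to made-up values.

def r_squared(actual, predicted):
    """1 minus the ratio of residual to total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def rmse(actual, predicted):
    """Square root of the mean squared prediction error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 12.0, 13.0, 16.0]

print(round(r_squared(actual, predicted), 3))  # 0.9
print(round(rmse(actual, predicted), 3))       # 0.707
```

R² is unitless and closer to 1 is better, while RMSE is expressed in the units of the target property and closer to 0 is better.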

    8. Residual Analysis

    This final section validates the fundamental statistical assumptions of the selected model, confirming the reliability of its results.

    • Model Assumption Validation: Verifies the underlying assumptions necessary for the model's conclusions to be reliable.
    • Normality of Residuals: Assesses whether the errors (residuals) are normally distributed.
    • Homoscedasticity: Checks for constant variance of the residuals, ensuring that prediction errors are consistent across the full range of fitted values.
    • Independence: Checks for autocorrelation, ensuring that the prediction error for one observation doesn't influence the error for the next.
    • Influential Observations and Outliers: Identifies individual data points that have a disproportionate impact on the model's coefficients.
    • Diagnostic Plots: Provides essential visualizations, such as Q-Q plots and residuals vs. fitted values plots, for visual inspection of model diagnostics. 
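One widely used statistic for the independence check is the Durbin-Watson statistic. This manual does not say which test the report uses, so the sketch below is only an example of the idea: values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.

```python
# Durbin-Watson statistic on a list of residuals (illustrative choice of test;
# not confirmed to be the one used in the report).

def durbin_watson(residuals):
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2
                    for i in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals that alternate in sign, i.e. negatively autocorrelated.
alternating = [0.5, -0.5, 0.5, -0.5]
print(durbin_watson(alternating))  # 3.0
```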

    2.5.3 How to Generate Report

    To successfully generate a meaningful statistical report, the system requires a complete definition of the variables you wish to analyze. You must define all necessary inputs (materials/processing steps) and outputs (Calculations/Tests) in the Lab Book Overview before proceeding (see Section 2.1, Scan & Score Setup).

    Required Data Input: Constraints Table

    • It is crucial that you add Material and Processing Step variables to the Constraints table.
    • Why it's necessary: The statistical analysis assesses the relationship between the Input Variables (your materials and processing steps, defined here) and the Output Variables (your test results and calculations, defined below). Without inputs, the report cannot evaluate predictive relationships.
    • System Check: If you have not included any materials or processing steps in the Constraints table, the Generate Report button will be disabled, and a tooltip will indicate the missing requirements.
      ‍

    Required Data Output: Tests and Calculations Tables

    • You must also define your desired output variables:
      • Add all relevant Tests to the Tests table.
      • Add all required Calculations to the Calculations table.
        ‍

    Steps to Generate the Report

    1. Run Scan & Score: After ensuring all requirements (Inputs, Tests, and Calculations) are defined in the Lab Book Overview, click the SCAN & SCORE ALL or SCAN & SCORE THIS LAB BOOK button (2.3 Using Scan & Score).
    2. Generate Report: If the system confirms that you have enough data for analysis, the GENERATE REPORT button will become active (Figure 2.8.). Click this button.

    Note: This process initiates the deeper statistical analysis needed for the report.

    Figure 2.8. GENERATE REPORT button on Scan & Score results page

    3. Monitor Report Generation: A notification will confirm that the report generation process has started (Figures 2.9-2.11).

    Figure 2.9. Generation of the report successfully started
    Figure 2.10. Progress on the report generation
    Figure 2.11. Generation of the report successfully finished

    4. View the Report: Once the process is complete, the notification will update. Click View the Report to open the full analysis.

    Report Access and Download

    The generated report is presented in two tabs:

    • Tab 1 (Figure 2.12.): Summary Report: Provides the high-level Executive Summary, key findings, and recommended actions.
    Figure 2.12. Page of the Summary report
    • Tab 2 (Figure 2.13.): Detailed Analysis: Contains the in-depth statistical breakdowns, feature quality assessments, and model diagnostics.
    Figure 2.13. Page of the Detailed Analysis 
    • Both the Summary Report and the Detailed Analysis can be downloaded as a single PDF document for offline review and sharing.

    3. DOE

    🔐 Please discuss how to add this to your system with your CSM or Salesperson.

    Alchemy DOE (Design of Experiments) is a powerful tool that addresses the situation when there is little to no historical data, preventing the system from running AI. It will help you extend your dataset in the most efficient manner possible (i.e., with the smallest, well-distributed, statistically optimal dataset) so that Alchemy can train models and run AI.

    No prior experience with machine learning, data science, or statistics is required to use DOE. Any chemist or scientist can input their formulating objectives and constraints and be guided toward their goals in the most efficient manner.

    3.1 Background

    A traditional method for experimenting is One Variable at a Time (OVAT) in which one variable is varied, while all others are kept constant, and its influence is assessed on the target properties (Figure 3.1). 

    Design of Experiments is a method in which all variables are changed at the same time and their influence on the target properties is assessed. It proposes testing the most informative locations in the design space, chosen to fill that space and to make the process easier to interpret (Figure 3.1).
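The difference between the two approaches can be seen by listing the points each one tests. In this sketch a two-level full factorial stands in for a designed experiment; it is only one simple kind of design and is not presented as the algorithm Alchemy uses. The factor names and levels are made up.

```python
from itertools import product

# Hypothetical comparison of OVAT points and a two-level full factorial.

def ovat_points(baseline, levels):
    """Vary one factor at a time around a baseline; all others stay fixed."""
    points = [dict(baseline)]
    for factor, values in levels.items():
        for value in values:
            if value != baseline[factor]:
                points.append({**baseline, factor: value})
    return points


def full_factorial_points(levels):
    """Every combination of factor levels."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


levels = {"Resin": [30, 40], "Pigment": [10, 20], "Solvent": [5, 15]}
baseline = {"Resin": 30, "Pigment": 10, "Solvent": 5}

print(len(ovat_points(baseline, levels)))  # 4
print(len(full_factorial_points(levels)))  # 8
```

The factorial set contains points where two or more factors change together (for example Resin 40 with Pigment 20), which OVAT never tests. Those points are what make interaction effects estimable.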

    In Table 3.1, OVAT and DOE experiments are compared based on different characteristics.

    ‍

    Figure 3.1 Graphical Representation of One Variable at a Time and Design of Experiments

    Table 3.1. Comparison of OVAT and DOE

    | Characteristic | OVAT | DOE |
    |---|---|---|
    | Main effects | Yes | Yes |
    | Interaction effects | No | Yes |
    | Design space exploration | Partly | Better than OVAT |
    | Design space filling | A lot of gaps in design space | Better filled design space |
    | Time to achieve optimal values | Longer (in most cases not achieved) | Faster (with the help of predictive models) |
    | Number of experiments | A lot (several levels are tested for each material) | Increases fast as the number of input variables increases |
    | Predictions | Accurate in certain regions, very inaccurate in others | Better accuracy throughout different regions of design space |

    The main types of designs available in Alchemy are:

    • Screening Design 
    • Optimal Design

    3.2 DOE Setup

    Figure 3.2. Design Experiments

    At the bottom of the Lab Book Overview record, you can find the Design Experiments button. It is possible to create designed experiments when:

    • The AI capability is turned on for a tenant
    • The Study Overview is valid
    • There is at least one measured targeted property
    • There are at least two, but a maximum of 20, materials and/or conditions of a processing step with constraints with a target value of between or any of
      • There can also be an unlimited number of materials and/or conditions of a processing step with constraints with a target value of constant or set to
    • Units for materials are chosen in formulating input settings and can be:
      • Weight units - No formulating rules required
      • Target Weight % - The following formulating rules are required:
        • The total for lower bound and constant constraints is less than 100% 
        • The total for higher bound and constant constraints is more than 100%
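The numeric preconditions above can be checked before opening the Design Experiments modal. The sketch below is an illustration only: the function name, argument layout, and messages are hypothetical, and for the percentage rules `varying` is assumed to contain materials only.

```python
# Hypothetical pre-check of the DOE setup rules listed above.

def doe_setup_issues(varying, constants, unit):
    """Return a list of problems; an empty list means the checks pass.

    varying: name -> (lower, upper) for constraints with a Between target.
    constants: name -> fixed value for constraints with a Constant target.
    unit: "Weight" or "Target Weight %".
    """
    issues = []
    if not 2 <= len(varying) <= 20:
        issues.append("Between 2 and 20 varying constraints are required.")
    if unit == "Target Weight %":
        fixed = sum(constants.values())
        lowest = fixed + sum(low for low, _ in varying.values())
        highest = fixed + sum(high for _, high in varying.values())
        if not lowest < 100:
            issues.append("Lower bounds plus constants must total less than 100%.")
        if not highest > 100:
            issues.append("Upper bounds plus constants must total more than 100%.")
    return issues


# Made-up constraints: lower bounds total 61%, upper bounds total 116%.
varying = {"Resin": (30, 50), "Pigment": (10, 25), "Solvent": (20, 40)}
constants = {"Defoamer": 1}
print(doe_setup_issues(varying, constants, "Target Weight %"))  # []
```

The two percentage rules together guarantee that a formulation totaling exactly 100% is reachable within the bounds.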

    Once the button is clicked, the Design Experiments modal will be opened with the following information:

    Figure 3.3. Design Experiments Modal
    • Design Experiment Type:
      • Screening - Usually the first type of design to be employed when there is little to no historical data. The screening design aims to “screen” the input variables and determine which ones significantly impact the target properties versus those that have low or no impact. It will help you extend your dataset most efficiently (i.e., with the smallest, well-distributed, statistically optimal dataset).
      • Optimal - A type of design that is performed to gain a deeper understanding of a problem space. Because it requires more experiments to be performed, it is normally done if the problem space is small enough initially or if the problem space has been reduced by executing a screening design.
    • The number of formulations that will be created for you. The system will inform you about the minimum and maximum number of trials, and it is up to you to choose how many trials to perform.

    3.3 Screening Design

    Screening Design is intended to identify significant main effects from a list of many, potentially varied, materials.

    • It is automatically proposed for more than five varied variables (materials and/or conditions from processing steps).
    • A minimum number of experiments related to the materials and/or conditions from processing steps must be tested to identify which input variables are relevant for the given properties.
    • Analysis of the main effects is available after all experiments are tested, with the contribution of each material and/or conditions from processing steps to the tested property and significance according to the Pareto principle (the 80/20 rule*).
    • After the analysis in the trial table, users are advised to change constraints for materials and/or conditions from processing steps in accordance with the contribution of each material and/or conditions from processing steps to the tested property.

    The goals of screening design are to reduce the size of the design space through:

    • Reducing the number of varied materials and/or conditions from processing steps. In further experiments only significant variables should be varied while insignificant ones should be kept constant.
    • Narrowing the constraints range.

    The subtypes of screening designs available in Alchemy are:

    • Mixture Design
    • Mixture Process Design
    • Factorial Design

    *The Pareto principle, also known as the 80/20 rule, is a theory that maintains 80 percent of the output of a process or system is determined by 20 percent of the input.
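
    As an illustration of the 80/20 idea only, the sketch below ranks input variables by the size of their contribution and keeps the smallest set that accounts for 80% of the total. The manual does not state how Alchemy computes contributions or significance, so the function name, the cutoff handling, and the sample numbers are all assumptions made for this example.

```python
def pareto_significant(contributions, cutoff=0.80):
    """Return the variables that together reach `cutoff` of the total contribution.

    `contributions` maps a variable name to an (illustrative) effect size.
    """
    total = sum(abs(v) for v in contributions.values())
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    selected, cumulative = [], 0.0
    for name, value in ranked:
        if cumulative >= cutoff * total:
            break  # the variables kept so far already explain enough
        selected.append(name)
        cumulative += abs(value)
    return selected


# Made-up effect sizes for five input variables.
effects = {"resin A": 50, "solvent B": 35, "additive C": 8, "pigment D": 4, "filler E": 3}
print(pareto_significant(effects))
```

    With these made-up numbers, the two largest contributors together account for 85% of the total, so only they would be kept as varied inputs in a follow-up design.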

    ‍
    From the Design Experiments modal, when the Screening Design is selected as a type and you click on the Generate Formulations button, a new Workspace record will be created. This action can take some time since Alchemy is potentially creating a lot of new theoretical and actual formulations. While they are being created, this message will be displayed in your Lab Book:

    Figure 3.4. Waiting Message - Screening Design

    Once the Workspace record is created, it will have the following:

    • The corresponding number of theoretical trials are available in the record, as defined in the Design Experiments modal. Values inside these theoretical trials are non-editable since you must create them exactly as described in order to gain valuable insights.
    • For each theoretical trial, an actual trial will be created as well, for you to log the actual values measured for each material in the trial.
    • Trials cannot be deleted or added to the Workspace record.
    Figure 3.5. Workspace - Screening Design

    3.3.1 Analyze Screening Design

    After performing the screening design, the effects of each input variable are assessed by statistically determining if an ingredient has significant or no effect on a specific performance characteristic. Based on this information, users should be able to reduce the number of input variables they want to vary, reducing the problem space and the required number of experiments for a more in-depth exploration of the design space for optimization purposes. 

    You will be able to run the analysis once values have been entered for all actual trials (samples) and all test results have been entered. The Analyze Screening Design button below the Tests table will then become available.

    Figure 3.6. Analyze Screening Design

    ‍

    Clicking this button will create a detailed analysis table to help you better understand the problem space and try to decrease it, if possible, for the optimal or adaptive design. In this table, you can see the following:

    • The list of all materials and conditions from processing steps, with their defined constraints
    • For each targeted property, how increasing the value of each material and/or condition from processing steps will impact that property:
      • ↑ means that increasing the value of material or condition from processing steps will increase the property value
      • ↓ means that increasing the value of material or condition from processing steps will decrease the property value
      • * means that material or condition from processing steps has a significant impact on changing the property value
    Figure 3.7. Analysis Table

    Below the Analysis table, a copy of the Material Constraints table from the Lab Book Overview record will be displayed, giving you the possibility to reduce the number of varying materials and/or conditions from processing step constraints (the ones with targets defined as “between”). Reducing this number is beneficial since you will be able to use different types of designed experiments and, eventually, get trials recommended by Alchemy AI.

    Figure 3.8. Material Constraints Table

    After the Analysis is created, at the bottom of the Workspace record, you can perform the following actions:

    • Continue Design of Experiments - it will open the same Design Experiments modal from the Lab Book Overview record 
    • Use Alchemy AI
    • Final conclusion
    • Next trial group
    • Scan & Score this Lab Book

    3.4 Optimal Design

    Optimal design is intended to fill the design space with experimental points. Because it requires more experiments to be performed, it is normally done if the problem space is initially small enough or if the problem space has been reduced by executing a screening design.

    • It is automatically proposed for five or fewer varied materials and/or conditions from processing steps, or users can choose it after completing the screening design.
    • A higher number of experiments than for screening are needed.
    • After all experiments are tested, the performance of predictive models is shown.
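
    The rule for which design type is proposed can be summarised in a few lines. This is a sketch of the thresholds stated in this manual (more than five varied variables for screening, five or fewer for optimal, between 2 and 20 varied variables overall). The function name is hypothetical.

```python
def propose_design_type(n_varied):
    """Return the design type proposed for a given number of varied variables."""
    if not 2 <= n_varied <= 20:
        raise ValueError("designed experiments need between 2 and 20 varied variables")
    return "Screening" if n_varied > 5 else "Optimal"


for n in (3, 5, 6, 12):
    print(n, propose_design_type(n))
```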

    The goals of optimal design are to:

    • Achieve high performance of the predictive models.
    • Recommend experiments with optimal values for target properties.

    The subtypes of optimal design available in Alchemy are:

    • Mixture Design
    • Mixture Process Design
    • Response Surface Design

    When you click on the Design Experiments button in the Lab Book Overview record, or Continue Design of Experiments in the Workspace record, if you have five or fewer varying material and/or conditions from processing step constraints (or you reduced them to five or fewer), the Optimal design will be the preselected type. The rest of the modal will look the same as displayed when Screening design was selected.

    As with creating a Screening design Workspace record, once you decide to continue with the Optimal design, a new Workspace record will be created for you. This action can take some time since Alchemy potentially creates many new theoretical and actual formulations. While they are being created, this message will be displayed in your Lab Book:

    Figure 3.9. Waiting Message - Optimal Design

    Once the Workspace record is created, it will have the following:

    • The corresponding number of theoretical trials, as defined in the Design Experiments modal. These trials are non-editable since you must create them exactly as described in order to gain valuable knowledge.
    • For each theoretical trial, an actual trial will be created as well, for you to log the actual values added to the trial.
    • Trials cannot be added to the Workspace record.
    Figure 3.10. Workspace - Optimal Design

    4. Machine Learning

    At Alchemy, our machine learning (ML) models use input and output variables for training to recognize certain types of patterns and make predictions. Alchemy AI (Figure 4.1) uses two types of ML algorithms depending on the type of output variables:

    • Regression machine learning algorithms - For training models with continuous numerical output variables.
    • Classification machine learning algorithms - For training models with predefined (categorical) output variables. 

    Alchemy’s AutoML module consists of 13 different algorithms for regression and 10 different algorithms for classification, which are trained in parallel in order to significantly shorten the training duration.

    Figure 4.1 Alchemy AI Diagram

    4.1 Dataset

    A dataset consists of input and output variables.

    Input variables are independent variables whose values are measured and input by the user. In Alchemy, input variables are:

    • Materials (e.g. resin A, solvent B)
    • Processing steps (e.g. temperature, mixing speed)

    Output variables are variables which depend on input variables. In Alchemy, output variables are:

    • Numerical properties (continuous numerical values, e.g. Viscosity, Drying time)
    • Predefined numerical properties (numerical categorical values)
    • Predefined alphanumeric properties (text or numerical categorical values, e.g. pass/fail)

    A dataset for training AI in Alchemy is surfaced through Alchemy’s Scan & Score or Alchemy AI functionality. These tools report:

    • Number of matching trials with respect to the requirements added in the material constraints table on the Lab Book Overview record
    • Number of relevant trials with respect to the requirements added in the test table on the Lab Book Overview record
    • Whether it is possible to train ML models based on the available dataset

    The Show More Details button displays the number of available trials for each property separately.

    4.2 Train AI

    Train AI in Alchemy consists of:

    • Hyperparameter Tuning
    • Automatic Selection

    4.2.1 Hyperparameter Tuning

    Hyperparameter tuning is a process which includes searching for the hyperparameters that will produce the highest performance of the models for each ML algorithm.

    In Alchemy, the performance of ML models is evaluated through repeated k-fold cross validation.

    In k-fold cross validation, the dataset is split into k sets. Each time, one set is held back and a model is trained on the remaining sets; the held-back set is then used for performance estimation. This means that a total of k models are fit, and performance is evaluated as the mean value over the k held-back sets. This process is repeated l times with different splits, with l depending on how large the dataset is. In total, l ✕ k models are fitted with repeated k-fold cross validation for estimating the performance of ML models.
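
    The splitting scheme described above can be sketched in plain Python. This is a minimal illustration, not Alchemy's implementation: the function names are hypothetical, and the "model" is a simple baseline that predicts the mean of the training values, chosen only so that the example runs without extra libraries.

```python
import random


def repeated_k_fold(n_samples, k, repeats, seed=0):
    """Yield (train_indices, held_back_indices) for each of the repeats * k splits."""
    rng = random.Random(seed)
    for _ in range(repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a different split on every repeat
        folds = [indices[i::k] for i in range(k)]
        for held_back in folds:
            held = set(held_back)
            train = [i for i in indices if i not in held]
            yield train, held_back


def mean_cv_mae(y, k, repeats):
    """Mean held-back MAE of a baseline that predicts the training mean."""
    fold_maes = []
    for train, held_back in repeated_k_fold(len(y), k, repeats):
        prediction = sum(y[i] for i in train) / len(train)  # "fit" the baseline
        errors = [abs(y[i] - prediction) for i in held_back]
        fold_maes.append(sum(errors) / len(errors))
    return sum(fold_maes) / len(fold_maes), len(fold_maes)


y = [float(v) for v in range(1, 11)]
score, n_models = mean_cv_mae(y, k=5, repeats=3)
print(n_models)  # 15 models: 3 repeats x 5 folds
```

    With k = 5 and l = 3, fifteen models are fitted, and every trial is held back exactly once per repeat.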

    The process of model training is shown in Figure 4.2. 

    Figure 4.2. Process of Model Training

    4.2.2 Automatic Selection 

    In Alchemy, selection of the best model is automatically made for each target property. Automatic selection of the best model consists of the following steps:

    1. Get the best performing model from all models for the same algorithm, but with different hyperparameters
    2. Get the best model from all models from step 1, one model for each algorithm for regression (13) and/or one model for each algorithm for classification (10)

    For automatically choosing the best model (Figure 4.3 and Figure 4.4), different performance metrics are used.

    • Regression Models: combined performance metric which relies on mean absolute error (MAE) and root mean square error (RMSE)
    • Classification Models:
      • If the predefined output values are balanced throughout the dataset, the system will choose accuracy for finding the best models 
      • If the predefined values are imbalanced throughout the dataset, the system will choose average precision for finding the best models   
    Figure 4.3. Automatic Selection of the Best Model for Regression Algorithms
    ‍
    Figure 4.4. Automatic Selection of the Best Model for Classification Algorithms
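
    The two selection steps and the choice of classification metric can be sketched as follows. The candidate tuples, the `alpha` hyperparameter values, the scores, and in particular the `imbalance_ratio` threshold used to decide whether a dataset is balanced are illustrative assumptions; the manual does not state how Alchemy defines "balanced". Scores are treated as "higher is better", so an error-based regression metric would need to be negated first.

```python
from collections import Counter


def select_best_model(candidates):
    """candidates: list of (algorithm, hyperparameters, score); higher score is better."""
    # Step 1: keep the best hyperparameter setting for each algorithm.
    best_per_algorithm = {}
    for algorithm, params, score in candidates:
        current = best_per_algorithm.get(algorithm)
        if current is None or score > current[2]:
            best_per_algorithm[algorithm] = (algorithm, params, score)
    # Step 2: pick the best model across algorithms.
    return max(best_per_algorithm.values(), key=lambda c: c[2])


def classification_metric(labels, imbalance_ratio=1.5):
    """Choose the selection metric; `imbalance_ratio` is an assumed threshold."""
    counts = Counter(labels)
    balanced = max(counts.values()) <= imbalance_ratio * min(counts.values())
    return "accuracy" if balanced else "average_precision"


candidates = [
    ("Lasso", {"alpha": 0.1}, 0.71),
    ("Lasso", {"alpha": 1.0}, 0.78),
    ("BayesianRidge", {}, 0.74),
]
print(select_best_model(candidates))
print(classification_metric(["pass"] * 9 + ["fail"]))
```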

    4.3 Performance Metrics

    It is important that we track performance metrics to validate the models we generate in terms of the accuracy of predicted values. 

    First, a few definitions:

    • Performance Metric: Validation of the model in terms of actual and predicted values. 
    • Actual Value: The test result achieved for the property of a certain trial based on actual testing.
    • Predicted Value: The test result which is predicted from machine learning models for the property of a certain trial.

    4.3.1 Regression

    Performance metrics for regression models available in Alchemy are:

    1. R² (coefficient of determination):

    • measure of the proportion of variance in the dependent variable that is predictable from the independent variables

    $$R^2=1-\frac{\sum_{i=1}^N\left(y_i-\widehat{y_i}\right)^2}{\sum_{i=1}^N\left(y_i-\bar{y}\right)^2}$$

    ${N}$ - number of trials

    $y_i$ - actual value

    $\widehat{y_i}$ - predicted value

    $\bar{y}$  - average of all actual values

    • the metric ranges from -∞ to 1, where a higher value indicates better model (value of 1 indicates perfect model)


    2. MAE (mean absolute error):

    • measure of the average magnitude of the differences between the prediction and the actual values for target property

    $$M A E=\frac{1}{N} \sum_{i=1}^N\left|y_i-\widehat{y_i}\right|$$

    ${N}$ - number of trials

    $y_i$ - actual value

    $\widehat{y_i}$ - predicted value

    • the metric ranges from 0 to +∞, where a lower value indicates better model (value of 0 indicates perfect model)


    3. RMSE (root mean squared error):

    • measure of the square root of the average squared difference between predicted and actual values for target property

    $$R M S E=\sqrt{\frac{1}{N} \sum_{i=1}^N\left(y_i-\widehat{y}_i\right)^2}$$

    ${N}$ - number of trials

    $y_i$ - actual value

    $\widehat{y_i}$ - predicted value

    • the metric ranges from 0 to +∞, where a lower value indicates better model (value of 0 indicates perfect model)
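
    The three regression formulas above translate directly into code. The sketch below uses plain Python and made-up actual and predicted values. The function names are chosen for this example.

```python
import math


def r2_score(actual, predicted):
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    ss_tot = sum((y - mean_actual) ** 2 for y in actual)
    return 1 - ss_res / ss_tot


def mae(actual, predicted):
    return sum(abs(y - y_hat) for y, y_hat in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual))


actual = [3.0, 5.0, 7.0, 9.0]      # made-up test results
predicted = [2.0, 5.0, 8.0, 9.0]   # made-up model predictions

print(r2_score(actual, predicted))  # 1 - 2/20
print(mae(actual, predicted))       # (1 + 0 + 1 + 0) / 4
print(rmse(actual, predicted))      # sqrt(2 / 4)
```

    RMSE is never smaller than MAE for the same predictions, because squaring gives large errors more weight than small ones.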

    4.3.2 Classification

    Performance metrics for classification models available in Alchemy are:
    ‍

    1. Accuracy: 

    • ratio of correctly predicted instances to the total number of instances in the dataset

    $$Accuracy =\frac{\text { Number of correct predictions }}{\text { Total number of predictions }}$$

    • the metric ranges from 0 to 1, where a higher value indicates better model (value of 1 indicates perfect model)


    2. Average Precision:

    • area under the precision-recall curve which quantifies the model's ability to make accurate positive predictions

    $$Average Precision =\sum_{k=1}^N \operatorname{Precision}(k) \Delta \operatorname{Recall}(k)$$

    ${N}$ - number of trials

    $Precision(k)$ - is the precision at a cutoff of k

    $\Delta Recall(k)$ - is the change in recall that happened between cutoff k-1 and cutoff k

    • the metric ranges from 0 to 1 where a higher value indicates better model (value of 1 indicates perfect model)
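
    The Average Precision sum can be computed by ranking trials from the highest to the lowest predicted probability and adding Precision(k) each time recall increases. The sketch below is a minimal binary version (1 = positive class, 0 = negative class). It ignores tied scores, and how Alchemy extends the metric to more than two predefined values is not covered here.

```python
def average_precision(actual, scores):
    """actual: 1 for positive, 0 for negative; scores: predicted probability of positive."""
    ranked = sorted(zip(scores, actual), key=lambda pair: pair[0], reverse=True)
    n_positive = sum(actual)
    true_positives = 0
    ap = 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_positives += 1
            precision_at_k = true_positives / k
            delta_recall = 1 / n_positive  # recall only changes when a positive is found
            ap += precision_at_k * delta_recall
    return ap


actual = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
print(average_precision(actual, scores))  # 1/3 * (1/1 + 2/3 + 3/4)
```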


    3. F1 Score:

    • Harmonic mean of: 

    - Precision: accuracy of positive predictions, which is the ratio of true positive predictions to the total number of positive predictions made by the model and

    $$Precision_{classI}=\frac{TP_{classI}}{TP_{classI}+FP_{classI}}$$

    ${Precision_{classI}}$ - precision for one class, there are as many classes as there are predefined values

    ${TP_{classI}}$ - true positives for class I, number of trials which were correctly predicted as class I (predicted class I matched the actual class I)

    ${FP_{classI}}$ - false positives for class I, number of trials which were incorrectly predicted to belong to class I (predicted class I did not match the actual class)

    - Recall: ratio of true positive predictions to the total number of actual positive instances in the dataset 

    $$Recall_{classI}=\frac{TP_{classI}}{TP_{classI}+FN_{classI}}$$

    ${FN_{classI}}$ - false negatives for class I, number of trials of class I which were incorrectly predicted to belong to another class (predicted class did not match the actual class I)

    $$F1score_{class~I}=\frac{2\times Precision_{class~I}\times Recall_{class~I}}{Precision_{class~I}+Recall_{class~I}}$$

    • the metric ranges from 0 to 1, where a higher value indicates a well-balanced performance, demonstrating that the model can concurrently attain high precision and high recall (value of 1 indicates perfect model which accurately predicts each class)
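
    The per-class precision, recall, and F1 formulas above can be checked on a small example. The sketch below counts TP, FP, and FN for one class from made-up pass/fail labels. Returning 0 when a denominator is zero is a convention chosen for this example.

```python
def per_class_scores(actual, predicted, cls):
    """Return (precision, recall, f1) for a single class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if p == cls and a == cls)
    fp = sum(1 for a, p in pairs if p == cls and a != cls)
    fn = sum(1 for a, p in pairs if p != cls and a == cls)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


actual = ["pass", "pass", "pass", "fail", "fail", "pass"]
predicted = ["pass", "fail", "pass", "pass", "fail", "pass"]

print(per_class_scores(actual, predicted, "pass"))  # TP=3, FP=1, FN=1
print(per_class_scores(actual, predicted, "fail"))  # TP=1, FP=1, FN=1
```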
      ‍

    4. ROC AUC Score (area under the receiver operating characteristic curve): 

    • evaluation of a model's ability to discriminate between positive and negative instances 
    • the metric ranges from 0 to 1, where a higher value indicates better discrimination between positive and negative instances
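
    One way to read the ROC AUC score is as the probability that a randomly chosen positive trial receives a higher predicted score than a randomly chosen negative trial. The sketch below computes it that way for a binary case by comparing every positive/negative pair, with ties counted as half. It illustrates the metric and is not Alchemy's implementation.

```python
def roc_auc(actual, scores):
    """actual: 1 for positive, 0 for negative; scores: predicted probability of positive."""
    positives = [s for a, s in zip(actual, scores) if a == 1]
    negatives = [s for a, s in zip(actual, scores) if a == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


actual = [1, 0, 1, 1, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
print(roc_auc(actual, scores))  # 4 of the 6 positive/negative pairs are ordered correctly
```

    A score of 0.5 corresponds to random ordering, and 1 means every positive trial is ranked above every negative one.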

    4.4 Trained Model Goals

    At Alchemy, we strive to achieve three goals when models are trained:

    1. Recommend trials with the optimal test results for all target properties
    2. Predict target property test result for trials with input variables defined by the user 
    3. Gain insight into the importance of the input variables

    All predicted property values have associated predicted confidence intervals which will show how much deviation can be expected from the predicted property value for a certain trial.

    4.5 How to use Machine Learning

    Alchemy's Machine Learning functionality — exposed through the Use Alchemy AI button — trains predictive models on your historical trial data and uses them either to recommend new formulations that are most likely to meet your requirements (when model confidence is high) or to recommend exploratory experiments that increase model accuracy in regions where it is currently weak (when model confidence is low). The workflow covers four stages:

    • Launching model training from the Lab Book Overview record.
    • Reviewing model performance and reading the guidance message that appears below the performance table.
    • Exploring how the models behave with the Model Explainer.
    • Generating either Find Best Results trials (optimized formulations) or Improve Model Performance trials (exploratory formulations), selecting the ones to test in the lab, and proceeding to testing.

    4.5.1 Prerequisites

    Before launching Alchemy AI, the Lab Book Overview record must be set up so the system has both the inputs and the outputs needed to train predictive models. The same setup described in Section 2.1 Scan & Score Setup applies here.

    Required inputs — Constraints table

    • The Constraints table on the Lab Book Overview record must contain the materials and processing steps you want the models to learn from. These are the input variables (predictors) for training.
    • Without any entries in the Constraints table, the Use Alchemy AI button is disabled.

    Required outputs — Requirements table (Calculations and Tests)

    • At least one measured or calculated property must be added to the Requirements table with a priority of Must Have or Nice to Have. Properties with a No target, rate only priority are not used in model training.
    • These are the output variables (targets) the models will learn to predict.
    • Both regression targets (continuous numerical properties such as Viscosity or Hardness) and classification targets (predefined categorical properties such as Pass / Fail or Crystalline / Powder) are supported. Alchemy automatically applies regression algorithms to numerical properties and classification algorithms to categorical properties (see Section 4.2.2).

    Sufficient historical data

    • Before model training is attempted, the available historical data must be sufficient for the properties to be predicted. The recommended path is to first run SCAN & SCORE ALL (Section 2.3) or ALCHEMY AI and confirm that the Data for predictive models box reports the data as sufficient or partially sufficient. If the data is insufficient, Alchemy recommends extending the dataset through Design of Experiments (Section 3) before returning to Machine Learning.

    4.5.2 Launching Alchemy AI

    At the bottom of the Lab Book Overview record is the Use Alchemy AI button (Figure 4.5). What happens when you click it depends on whether models have already been trained for this Lab Book:

    • If no models have been trained yet, clicking Use Alchemy AI first opens the Scan & Score results page for the historical data that matches the Requirements and Constraints defined on the record. From there, click Train AI to start training. A "Model training in progress" message is displayed (Figure 4.6) while the system trains models in parallel across multiple algorithms as described in Sections 4.2.1 and 4.2.2, and automatically selects the best-performing model for each target property.
    • If models have already been trained for this Lab Book, clicking Use Alchemy AI opens the Alchemy AI results page directly, showing the trained model performances and the Model Explainer (Section 4.5.4) without repeating the training step.
    Figure 4.5. Use Alchemy AI button on the Lab Book Overview record.
    Figure 4.6. Model training in progress — waiting message

    4.5.3 Reviewing model performance

    Once training is complete, the Alchemy AI results page opens (Figure 4.7). The top of the page displays the model performance table, which lists every Must Have and Nice to Have property together with a headline score for its best-performing model. The columns shown depend on the type of target:

    • Model accuracy [%] — displayed for regression (numerical) properties. Higher is better; the metric is described in Section 4.3.1.
    • Avg. precision — displayed for classification (categorical) properties. Ranges from 0 to 1 with higher values indicating better performance, particularly on imbalanced datasets.

    Cells are color-coded to make it easy to assess model quality at a glance:

    • Green — strong model performance, suitable for reliable predictions.
    • Yellow / amber — moderate performance; predictions should be interpreted with caution.
    • Red — poor performance; the dataset may need more trials or additional DOE before reliable predictions can be made for this property.
    Figure 4.7 Alchemy AI results — model performance table with guidance message and FIND BEST RESULTS / IMPROVE MODEL PERFORMANCE buttons

    Guidance message

    Below the model performance table, Alchemy displays a guidance message that recommends the next action based on overall model confidence:

    • If the performance for every property is at or above 80%, the message reads:

    "High model confidence across all properties. Use FIND BEST RESULTS to generate optimized formulations and accelerate your optimization."

    • If at least one property is below 80%, the message reads:

    "Low or Moderate model confidence for {property_name}. Use IMPROVE MODEL PERFORMANCE to generate trials that explore the design space and increase prediction accuracy."

    Where {property_name} is replaced with the name (or comma-separated names) of the underperforming property or properties. The two action buttons — Find Best Results and Improve Model Performance — are always shown together at the bottom of the page; the button matching the recommended action is rendered in primary (filled) style and the other in secondary (outlined) style, so the suggested next step is visually emphasized.
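
    The guidance rule can be sketched as a simple threshold check. The function name and the input dictionary are hypothetical, and the sketch assumes every headline score has already been expressed as a percentage; how Alchemy compares the 0 to 1 average precision scale against the 80% threshold is not stated in this manual.

```python
def guidance(performance, threshold=80.0):
    """Return (recommended_button, message) for a {property: score_in_percent} mapping."""
    low = [name for name, score in performance.items() if score < threshold]
    if not low:
        return (
            "FIND BEST RESULTS",
            "High model confidence across all properties. Use FIND BEST RESULTS to "
            "generate optimized formulations and accelerate your optimization.",
        )
    return (
        "IMPROVE MODEL PERFORMANCE",
        "Low or Moderate model confidence for " + ", ".join(low) + ". Use IMPROVE "
        "MODEL PERFORMANCE to generate trials that explore the design space and "
        "increase prediction accuracy.",
    )


button, message = guidance({"Viscosity": 91.0, "Hardness": 62.5})
print(button)
print(message)
```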

    Detailed analysis — Regression and Classification properties

    Clicking Show detailed analysis expands the headline table into two tables (Figure 4.8) that report the chosen model and its full set of metrics for each target property:

    • Regression properties — columns: Property, Best performing model (e.g., LinearRegression, BayesianRidge, Lasso, ARDRegression), R², Mean absolute error (MAE), Root mean squared error (RMSE).
    • Classification properties — columns: Property, Accuracy, F1 score, ROC AUC score, Average precision.

    Definitions for these metrics are given in Section 4.3 Performance Metrics. Use these tables to confirm that the chosen model and its error/score values are acceptable for your use case before generating formulations. Clicking Hide detailed analysis collapses the tables again.

    Figure 4.8 Detailed analysis — Regression properties and Classification properties tables

    4.5.4 Model Explainer

    The Model Explainer section below the performance tables lists every target property as a collapsible entry. Expanding a property opens a tabbed view that lets you inspect the model from several angles — from the most influential inputs through to per-trial predictions and "what if" scenarios. The exact set of tabs depends on whether the property is regression-based or classification-based:

    • Regression properties (numerical targets) show: Feature Importances, Regression Stats, Individual Predictions, What If…, Feature Dependence.
    • Classification properties (categorical targets) show: Feature Importances, Classification Stats, Individual Predictions, What If…, Feature Dependence.

    The Feature Importances, Individual Predictions, What If…, and Feature Dependence tabs work the same way for both target types; only the Stats tab differs. A Download button in the top-right of the Model Explainer panel exports the current view for offline review or sharing.

    4.5.4.1 Feature Importances

    The Feature Importances tab answers the question “Which materials and processing steps matter the most?” and ranks every input variable by its overall impact on the model's predictions. Two scoring methods are available:

    • SHAP values (default) — attribute each prediction to its input features using SHapley Additive exPlanations. The bar chart shows the average absolute SHAP value per feature — the higher the bar, the stronger the average influence of that feature on the predicted property.
    • Permutation Importances — measure how much the model's performance metric drops when the values of a single feature are randomly shuffled. A larger drop means the feature is more important.

    Switch between the two methods using the Importances type dropdown to cross-validate the ranking of influential features. The Depth dropdown can be used to limit how many features are shown.

    Figure 4.9 Feature Importances tab — SHAP values / Permutation Importances controls and the Feature Descriptions panel
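
    Permutation importance is simple enough to sketch without any library. The example below shuffles one feature column at a time and records how much a "higher is better" score drops. The toy `predict` function, the negative-MAE score, and the data are assumptions made for illustration. The second feature is deliberately ignored by the toy model, so shuffling it cannot change the score.

```python
import random


def negative_mae(actual, predicted):
    """Higher is better, so a drop in this score means worse predictions."""
    return -sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def permutation_importance(predict, X, y, metric, n_repeats=10, seed=0):
    rng = random.Random(seed)
    baseline = metric(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - metric(y, [predict(row) for row in shuffled]))
        importances.append(sum(drops) / n_repeats)
    return importances


X = [[float(i), float((i * 7) % 5)] for i in range(8)]  # two made-up features
y = [2.0 * row[0] for row in X]                         # target depends on feature 0 only
predict = lambda row: 2.0 * row[0]                      # toy model that ignores feature 1

importances = permutation_importance(predict, X, y, negative_mae)
print(importances)
```

    Because shuffling is random, the exact importance of the first feature depends on the seed. What matters is the ranking, which is why comparing it against the SHAP ranking is a useful cross-check.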

    4.5.4.2 Regression Stats (properties with numerical non-predefined values)

    The Regression Stats tab (Figure 4.10) provides formal validation of model quality for the chosen target. It includes:

    • Predicted vs. Observed scatter plot (Figure 4.10) — points should cluster along the diagonal (predicted ≈ actual). The closer the points to the diagonal, the more reliable the model.
    • Residual plots (Figure 4.10) — show the residual (actual minus predicted) plotted against either the predicted value, a chosen feature, or the trial index. Residuals should look randomly scattered around zero; a clear trend, funnel, or curve indicates the model is systematically wrong in some part of the design space. Use the Horizontal axis dropdown to choose what is plotted on the X axis (Predicted, or any input feature) and the Residual type dropdown to switch between Difference, Ratio, and Log ratio residuals.
    Figure 4.10. Regression Stats — residual plots with Horizontal axis and Residual type controls

    4.5.4.3 Classification Stats (properties with predefined values)

    For classification targets, the Stats tab is replaced with Classification Stats, which validates the model's ability to distinguish between classes. It contains:

    • Metrics table (Figure 4.11) — Accuracy, F1, ROC AUC, and Average precision. Hovering on a metric reveals a short plain-English explanation (for example, "46% of predicted labels was predicted correctly" for Accuracy).
    • Confusion Matrix (Figure 4.11) — a grid of observed vs. predicted classes. High percentages on the top-left to bottom-right diagonal indicate accurate predictions; high values off the diagonal indicate where the model confuses classes. Three controls let you adjust how the matrix is rendered: a Cutoff prediction probability slider sets the probability threshold above which a sample is classified into the positive class, a Highlight percentage checkbox toggles the percentage labels inside each cell, and Normalisation radio buttons (Overall / Observed / Predicted) change whether the percentages are normalized across all samples, within each observed class, or within each predicted class.
    Figure 4.11. Classification Stats — Confusion Matrix with cutoff probability slider and normalisation controls
    • Precision and Classification plots (Figure 4.12) — visualize how predicted-probability values map to actual outcomes; clear separation between the class distributions and a steady upward Precision line indicate that the model's confidence scores are well calibrated.
    • ROC AUC plot (Figure 4.12) — the ROC curve traces the trade-off between True Positive Rate and False Positive Rate across decision thresholds. A curve that bows toward the top-left and an AUC value close to 1 indicate a strong model.
    • PR AUC plot (Figure 4.12) — the Precision–Recall curve traces the trade-off between precision and recall. A curve that stays high toward the top-right indicates the model finds most positive cases while remaining accurate, which is especially informative on imbalanced datasets.
    Figure 4.12. Classification Stats — Precision, Classification plots, ROC AUC and PR AUC curves
    • Lift and Cumulative Precision curves (Figure 4.13) — quantify how much better the model performs than random sampling, and the precision you can expect when testing only the top X% of recommended trials.
    Figure 4.13. Classification Stats — Lift and Cumulative Precision curves
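
    The three Normalisation options of the Confusion Matrix differ only in the denominator used for each cell. The sketch below shows this for a small made-up pass/fail example. It assumes that rows are observed classes and columns are predicted classes. The function names are hypothetical.

```python
def confusion_matrix(observed, predicted, classes):
    """Count matrix with one row per observed class and one column per predicted class."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for o, p in zip(observed, predicted):
        matrix[index[o]][index[p]] += 1
    return matrix


def normalise(matrix, mode):
    """Convert counts to percentages; mode is 'overall', 'observed', or 'predicted'."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    def denominator(i, j):
        return {"overall": total, "observed": row_sums[i], "predicted": col_sums[j]}[mode]

    return [
        [100.0 * matrix[i][j] / denominator(i, j) if denominator(i, j) else 0.0 for j in range(n)]
        for i in range(n)
    ]


observed = ["pass", "pass", "pass", "fail", "fail", "pass"]
predicted = ["pass", "fail", "pass", "pass", "fail", "pass"]
counts = confusion_matrix(observed, predicted, ["pass", "fail"])

print(counts)                          # [[3, 1], [1, 1]]
print(normalise(counts, "observed"))   # each row sums to 100
print(normalise(counts, "predicted"))  # each column sums to 100
```

    With Observed normalisation the diagonal cells correspond to per-class recall, and with Predicted normalisation they correspond to per-class precision.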

    4.5.4.4 Individual Predictions

    The Individual Predictions tab lets you inspect the model's prediction for any specific trial in the training set, and explains why the model produced that prediction. The view differs slightly between regression and classification:

    • For regression targets (Figure 4.14) — select a trial via the Select Index dropdown or the Random Index button. The Prediction box on the right reports the Observed value, the Predicted value, and the Residual (smaller residuals indicate a more accurate prediction). Two filters — Predicted range and Absolute residuals — narrow the index list to trials within a chosen predicted-value or residual range.
    • For classification targets (Figure 4.15) — the Prediction box reports the predicted probability for each class as a small table, alongside a donut chart. An asterisk next to a class label indicates the observed class for that trial. The Observed State multi-select and Predicted probability range slider filter the index list to trials in the classes and probability ranges of interest.
    Figure 4.14. Individual Predictions — regression target with prediction, observed and residual value 
    Figure 4.15. Individual Predictions — classification target with class-probability table and donut chart

    Two further plots help explain how the model arrived at the displayed prediction:

    • Contributions plot (waterfall) (Figure 4.16) — starts from the model's average baseline and shows, step by step, how each feature pushed the prediction up or down. The size of each bar indicates the magnitude of that feature's effect on this specific trial.
    • Partial Dependence Plot (PDP) (Figure 4.16) — selects a single feature (e.g., a material) and traces the predicted target as that feature's value sweeps across its range, while the other features are held at the values from the selected trial. This makes it easy to see whether the trend is monotonic, has a sweet spot, or plateaus.
    Figure 4.16. Individual Predictions — Contributions waterfall plot (left) and Partial Dependence Plot (right) for a regression target
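
    The idea behind the Partial Dependence Plot described above (sweep one feature while the other features stay at the selected trial's values) can be sketched as follows. The `predict` function is a made-up stand-in for a trained model, with a deliberate sweet spot at 40 for “resin A”; the feature names and the grid are also illustrative.

```python
def dependence_curve(predict, trial, feature, grid):
    """Sweep `feature` over `grid`, keeping every other value from `trial` fixed."""
    curve = []
    for value in grid:
        row = dict(trial)      # copy, so the selected trial itself is not modified
        row[feature] = value
        curve.append((value, predict(row)))
    return curve


# Made-up stand-in for a trained regression model.
def predict(row):
    return 100.0 - (row["resin A"] - 40.0) ** 2 / 10.0 + 0.5 * row["temperature"]


trial = {"resin A": 30.0, "temperature": 60.0}
curve = dependence_curve(predict, trial, "resin A", [20.0, 30.0, 40.0, 50.0, 60.0])

for value, prediction in curve:
    print(value, prediction)
```

    Reading the resulting curve shows whether the trend is monotonic, plateaus, or (as in this made-up model) peaks at an intermediate value.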

    4.5.4.5 What If…

    The What If… tab (Figure 4.17) lets you run virtual experiments without leaving the application. Pick a baseline trial via the Select Index dropdown, then edit any of the values in the Feature Input section — numeric inputs (e.g., material amounts, temperature, time) accept any numerical value, and categorical inputs (e.g., Atmosphere, Lab Equipment) use a dropdown.

    Figure 4.17. What If… — baseline selector and editable Feature Input panel.

    When a feature value is edited, the Prediction box updates instantly:

    • For regression targets, the predicted value changes in real time.
    • For classification targets, the predicted class probabilities update in real time, making it easy to find the “tipping point” at which the most likely class flips from one category to another (Figure 4.18).
    Figure 4.18. What If… for a classification target — class-probability table and donut chart update in real time as feature values are edited.

    This view supports both target-seeking (“what do I change to hit this spec?”) and sensitivity analysis (“how much does this property move when I push that input by 10%?”) without spending lab resources on a physical trial.
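
    The two uses named above can be illustrated in a few lines. The sketch below is an assumption-laden example: `predict_viscosity` and `prob_pass` are hypothetical stand-ins for trained models, and the feature names and coefficients are invented for illustration.

```python
import math


def predict_viscosity(inputs):
    # Hypothetical regression model, used only for illustration.
    return 200.0 + 3.0 * inputs["thickener_g"] - 1.5 * inputs["temperature_c"]


def prob_pass(inputs):
    # Hypothetical two-class model: probability of the "Pass" class.
    z = 0.4 * inputs["cure_time_min"] - 11.0
    return 1.0 / (1.0 + math.exp(-z))


def sensitivity(model, baseline, feature, relative_change=0.10):
    """Change in the prediction when one input is pushed by a relative amount."""
    edited = {**baseline, feature: baseline[feature] * (1.0 + relative_change)}
    return model(edited) - model(baseline)


def tipping_point(prob_model, baseline, feature, candidates):
    """First candidate value at which the most likely class differs
    from the most likely class of the baseline trial."""
    base_class = prob_model(baseline) >= 0.5
    for value in candidates:
        if (prob_model({**baseline, feature: value}) >= 0.5) != base_class:
            return value
    return None


baseline = {"thickener_g": 20.0, "temperature_c": 40.0, "cure_time_min": 20.0}

delta = sensitivity(predict_viscosity, baseline, "thickener_g")
flip_at = tipping_point(prob_pass, baseline, "cure_time_min", range(20, 41))

print(f"+10% thickener changes the prediction by {delta:.2f}")
print(f"most likely class flips at cure_time_min = {flip_at}")
```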

    4.5.4.6 Feature Dependence

    The Feature Dependence tab (Figure 4.19) shows how a chosen input feature relates to its impact on predictions across the whole dataset, and how that impact is modulated by a second feature. It contains two paired views:

    • SHAP Summary (left) — ranks features by SHAP value and displays a beeswarm plot for each, where each point is a trial. Color encodes the feature's value (red = high, blue = low), so you can read the direction of effect at a glance — e.g., “high values of feature X push the prediction up.” Use the Summary Type dropdown to switch between detailed and aggregated views.
    • SHAP Dependence (right) — plots the SHAP value of one feature against its raw value, optionally colored by a second feature. Use the Feature dropdown to choose the X axis feature and Color feature to choose the interacting feature. Distinct color bands or fan-shaped patterns reveal interactions — cases where the effect of one ingredient depends on the level of another.
    Figure 4.19. Feature Dependence — SHAP Summary (left) and SHAP Dependence plot (right).

    4.5.5 Find Best Results — generating optimized formulations

    Use the Find Best Results path when overall model confidence is high (the guidance message recommends it) and you are ready to ask Alchemy AI for the formulations most likely to satisfy your defined targets.

    Click the Find Best Results button at the bottom-left of the results page. A "Generating formulations — Please wait, this process can take a few minutes…" loading message is displayed (Figure 4.20) while Alchemy runs its genetic algorithm on top of the trained models to produce formulation recommendations that optimize toward the defined targets.

    Figure 4.20. Find Best Results button and guidance message.

    When generation is finished, the AI recommended trials page appears (Figure 4.21). The page is headed "AI recommended trials ranked by your requirements:" and is organized as follows.

    Header row

    • Matching Order — the rank of each formulation (#1, #2, #3, …), ordered from best to worst match against the defined targets.
    • Trial Code — the auto-generated name of the recommended formulation (e.g., Formulation 1, Formulation 19).
    • Select for Testing — a checkbox used to mark the formulations you want to carry forward to testing.

    Properties table

    The first table lists every Must Have and Nice to Have property, showing the target you defined alongside each formulation's predicted value and prediction interval:

    • Property, Priority, Target, Unit — repeated from the Lab Book Overview record.
    • Value — the model's predicted value for this property in the recommended formulation. Cells are green when the prediction meets the target and red when it does not.
    • Prediction Interval [Low – High] — the range within which the true value is expected to fall, reflecting the model's uncertainty. Narrower intervals indicate higher-confidence predictions.

    Materials and processing steps table

    The second table lists every material and processing step from the Constraints table alongside each formulation's recommended weight or setting:

    • Material — the name of the ingredient or processing condition.
    • Material properties — the Type and Subtype defined for the material.
    • Target — the constraint defined on the Lab Book Overview record.
    • Unit — the unit of measurement for the value.
    • Weight [g] (or formulating input unit) — the recommended quantity for the formulation. Values inside the defined constraint range are shown with a green background.
    • Processing-step groups are expanded by default and can be collapsed.
    • A Total row at the bottom of the table reports the sum of the formulation's material weights.
    Figure 4.21. Find Best Results — properties and materials per recommended formulation.

    Selecting trials and proceeding to testing

    Tick the Select for Testing checkbox for each formulation you want to test in the laboratory — more than one formulation can be selected. Once at least one formulation has been selected, the Proceed to Testing button becomes active (Figure 4.21). Clicking it closes the AI recommended trials page and creates a new Workspace record containing the selected formulations as actual trials, pre-populated with the recommended material weights and processing-step settings, so laboratory test results can be recorded against them.

    4.5.6 Improve Model Performance — generating exploratory experiments

    Use the Improve Model Performance path when at least one trained model has low or moderate confidence (the guidance message recommends it; see Figure 4.22). Instead of optimizing toward the targets, this path generates an adaptive design — a small set of new experiments placed in regions of the design space where the model is uncertain or where feature interactions are not yet well understood. Testing those experiments and re-training the models tightens predictions in the weakest areas, so that subsequent Find Best Results runs can be trusted.

    How adaptive design works. Adaptive design replaces the traditional one-shot experimental matrix with a sequential strategy in which additional data points are placed where the predictive model shows the largest variance, so each new experiment gives the model the most information possible. 
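
    As a rough illustration of the idea, the sketch below ranks candidate experiments by how much a small ensemble of models disagrees about them. Using ensemble disagreement as the variance estimate is an assumption made for this example; the manual does not state how Alchemy estimates variance, and the models and candidates here are hypothetical.

```python
from statistics import pvariance


def ensemble_variance(models, candidate):
    """Variance of the ensemble members' predictions for one candidate:
    a simple proxy for how uncertain the model is at that point."""
    return pvariance([model(candidate) for model in models])


def pick_next_experiments(models, candidates, n):
    """Return the n candidates on which the ensemble disagrees most."""
    ranked = sorted(candidates, key=lambda c: ensemble_variance(models, c), reverse=True)
    return ranked[:n]


# Three hypothetical models that agree near x = 0 and diverge as x grows.
models = [
    lambda c: 1.0 * c["x"],
    lambda c: 1.2 * c["x"],
    lambda c: 0.8 * c["x"],
]
candidates = [{"x": 1.0}, {"x": 5.0}, {"x": 3.0}]

next_runs = pick_next_experiments(models, candidates, n=2)
print(next_runs)
```

    Each selected point is then tested in the laboratory and added to the training data before the next round, which is what makes the strategy sequential rather than one-shot.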

    Running Improve Model Performance

    Click the Improve Model Performance button at the bottom of the results page (Figure 4.22). A "Planning next experiments — Please wait, this process can take a few minutes…" loading message is displayed while Alchemy plans the next set of experiments.

    Figure 4.22. Improve Model Performance button and guidance message.

    When the planning step is finished, the AI suggested experiments page appears (Figure 4.23). The page is organized differently from the Find Best Results page (Figure 4.21, Section 4.5.5): the Materials and processing steps table is shown first, followed by the Properties table with per-property Uncertainty values and a Global Uncertainty score for each experiment. This ordering reflects the purpose of the page — the suggested experiments are not the ones predicted to best meet the targets, but deliberately exploratory experiments designed to reduce uncertainty in the underperforming models.

    Header row

    The header identifies each suggested experiment:

    • Experiment Order — the position of each experiment (#1, #2, #3, …), ordered from most to least informative.
    • Experiment Code — the auto-generated name of the suggested experiment (e.g., Experiment 1, Experiment 2).
    • Select for Testing — a checkbox used to mark the experiments you want to carry forward to testing.

    Materials and processing steps table

    The first table lists every material and processing step from the Constraints table alongside each experiment's recommended quantity or setting:

    • Material — the name of the ingredient or processing condition.
    • Material properties — the Type and Subtype defined for the material.
    • Target — the constraint defined on the Lab Book Overview record.
    • Unit — the unit of measurement.
    • Weight [g] (or formulating input unit) — the recommended quantity for the experiment. Cells with a green background indicate that the value falls within the defined constraint range.
    • Processing-step groups are expanded but can also be collapsed.
    • A Total row at the bottom of the table reports the sum of the experiment's material weights.

    Properties table 

    The second table reports the model's uncertainty for each Must Have and Nice to Have property in each experiment, plus a single Global Uncertainty score per experiment:

    • Target — the property name (Purity, Molar Mass, State, Yield Percentage, …).
    • Uncertainty — one column per experiment, expressing how uncertain the model is about its prediction for that property–experiment pair. Higher values indicate that the experiment falls in a region where additional data would most improve the model. Note that uncertainty values are reported on the property's own scale, so they are not directly comparable between properties.
    • Global Uncertainty — a single score per experiment, summarizing the model's overall uncertainty across all target properties. Experiments are ranked by Global Uncertainty, with the most informative ones (highest expected information gain) listed first.

    Unlike the Find Best Results page, no prediction interval is shown on this page — the goal is to surface where the models are weakest, not to predict an outcome with confidence.

    Selecting experiments and proceeding to testing

    As with Find Best Results, tick the Select for Testing checkbox for each experiment you want to run. Any number of experiments can be selected. Once at least one experiment has been selected, the Proceed to Testing button at the bottom-left of the page becomes active. Clicking it closes the page and creates an AI Generated Workspace containing the selected experiments as actual trials, ready to be tested in the laboratory.

    After the laboratory results are recorded, return to the Alchemy AI page and re-train the models. The new data should improve the performance of the previously underperforming properties; once their performance crosses the 80% threshold, the guidance message will switch to recommending Find Best Results, and you can move on to optimization.

    Figure 4.23. Improve Model Performance — materials and per-property uncertainty for each suggested experiment.

    5. Interchangeable Materials

    In formulation development, you often work with a set of similar ingredients that can be used in place of one another. For example, you might have several solvents that are functionally equivalent but have different costs or properties. Managing these substitutions can be complex, especially when designing large sets of experiments.

    The Interchangeable Materials feature is designed to solve this problem by giving you more flexibility and control over your formulations. It allows you to group these similar materials and define clear rules for how they can be substituted for one another in your experiments.

    With this feature, you can:

    • Group similar materials: Create named groups of ingredients (e.g., "Resin", "Additive") based on their material type.
    • Define substitution rules: Specify exactly how many materials from a group can be included in a single trial. For instance, you could require that exactly one solvent from a group is used, or a range of one to three different solvents.
    • Combine with other constraints: These interchangeable groups can be used alongside other materials and processing steps in your experimental designs.

    Using this feature impacts how the system generates a Design of Experiments (DOE), scores existing trials in Scan & Score, and makes AI recommendations, as all functions will adhere to the rules you define.

    5.1. Creating an Interchangeable Material Group

    Follow these steps to add an Interchangeable Material Group to your formulation table:

    1. To begin, navigate to the Lab Book Overview page. At the bottom of the Constraints table, click the +INTERCHANGEABLE MATERIALS button (Figure 5.1). The "Interchangeable Materials" modal appears; this is where you define the materials and rules for your group.
    Figure 5.1. Button for adding an Interchangeable Materials Group
    2. Define the Group:
      • Give your group a unique Group name (Figure 5.2).
      • Select a material Type and, optionally, one or more Subtypes. This filters the list of materials you can add to the group in a later step (Figure 5.2).
    Figure 5.2. The Interchangeable Materials modal: setting the Group name and selecting the Type and, optionally, the Subtype
    3. Set the Constraints: This is the core of the feature, where you define the rules for the group.
      • First, choose how you want to Define Constraints On by selecting either Group or Material from the dropdown (Figure 5.3).
    Figure 5.3. The Interchangeable Materials modal: defining constraints
      • If you chose "Group":
        • Set the Group target weight (Figure 5.4). You can select "Constant" for an exact value or "Between" for a range. The system will automatically calculate the allowed range for each individual material.
    Figure 5.4. The Interchangeable Materials modal: constraints defined at the group level
      • If you chose "Material":
        • The Group target weight fields become read-only (Figure 5.5). You will define the constraints for each material in the table below, and the system will automatically calculate the resulting total for the group.
    Figure 5.5. The Interchangeable Materials modal: constraints defined at the material level
    4. Add Materials to the Group:
      • Click the "+ Material" button.
      • A dialog will appear showing all available materials based on the Type and Subtype you selected earlier.
      • Select the materials you want to include in the group and click "Add".
      • The selected materials will now appear in a table within the main dialog (Figures 5.6 to 5.8).
    Figure 5.6. The Interchangeable Materials modal, showing constraints defined at the group level, including a constant target weight and the added materials.
    Figure 5.7. The Interchangeable Materials modal, showing constraints defined at the group level, including a between target weight and the added materials.
    5. Configure Individual Material Constraints (if applicable):
      • If you chose to define constraints at the Material level in step 3, you can now set the Target Weight for each material in the table. You can set each one to be "Constant" or "Between" and enter the desired values (Figure 5.8).
    Figure 5.8. The Interchangeable Materials modal, showing constraints defined at the material level, added materials with between or constant target weight.
    6. Set the Number of Materials:
      • At the bottom of the dialog, use the "# of Materials" dropdowns to define the minimum and maximum number of materials from this group that must be used in any given trial (Figures 5.9 and 5.10). For example, if you have 3 materials in the group, you could specify that each trial must use between 1 and 2 of them.
    Figure 5.9. The Interchangeable Materials modal: # of Materials to be used, with constraints defined at the group level.
    Figure 5.10. The Interchangeable Materials modal: # of Materials to be used, with constraints defined at the material level.
    7. Add the Group to the Constraints table in Lab Book Overview:
      • Once you're satisfied with your configuration, click the "Add" button. The Interchangeable Material Group will be added as a new entry in your Constraints table on Lab Book Overview (Figure 5.11).
    Figure 5.11. The Interchangeable Materials Group added to the Constraints table in Lab Book Overview

    5.2. Using Interchangeable Materials with Other Features

    The rules you define for interchangeable materials directly guide how the platform generates a Design of Experiments (DOE), scores trials in Scan & Score, and creates AI recommendations.

    5.2.1. DOE and AI Training with Interchangeable Materials

    When an interchangeable material group is added, DESIGN EXPERIMENTS and ALCHEMY AI are adjusted to incorporate the formulating strategy. The formulating strategy determines how the interchangeable materials are varied and combined during experiment design and AI training, ensuring that the generated trials and model outputs respect the defined group relationships.

    5.2.2. Scan & Score with Interchangeable Materials

    When you use Scan & Score, the system will only show trials that follow the rules you defined for your interchangeable materials. This includes:

    • A group whose constraints do not include 0 must be present in the trial.
    • Trials must contain the correct number of materials from each interchangeable group (e.g., if you set a range of 1-2, only trials with 1 or 2 of those materials will be shown).

    The values in the Scan & Score table are color-coded to indicate if they meet the constraints you set. Green means the value is within the defined range while red means it violates the rules.
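
    The two filtering rules can be expressed as a small check. This is a sketch of one possible reading of the rules, not Alchemy's implementation: the `group` dictionary keys, the material names, and the treatment of a group that is entirely absent (allowed only when its lower bound is 0) are assumptions made for illustration.

```python
def trial_is_shown(trial, group):
    """Apply the two interchangeable-material rules to one trial.

    trial: dict of material name -> weight
    group: dict with hypothetical keys 'materials', 'weight_low',
           'min_materials' and 'max_materials'
    """
    used = [m for m in group["materials"] if trial.get(m, 0) > 0]
    if not used:
        # Rule 1: a group may be absent only if its constraint range includes 0.
        return group["weight_low"] <= 0
    # Rule 2: the number of group materials used must be within the allowed range.
    return group["min_materials"] <= len(used) <= group["max_materials"]


solvents = {
    "materials": ["Solvent A", "Solvent B", "Solvent C"],
    "weight_low": 10,
    "min_materials": 1,
    "max_materials": 2,
}

trials = {
    "T1": {"Resin": 60, "Solvent A": 15, "Solvent B": 5},
    "T2": {"Resin": 60, "Solvent A": 5, "Solvent B": 5, "Solvent C": 5},
    "T3": {"Resin": 60},
}

shown = [name for name, trial in trials.items() if trial_is_shown(trial, solvents)]
print(shown)
```

    Here T2 is filtered out because it uses three solvents, and T3 because the group is required (its lower bound is above 0) but no solvent is present.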

    6. Appendices

    The following supplemental material is used to support the above documentation.

    6.1. Appendix A — Detailed Statistical Analysis: Reference

    This appendix provides a deeper reference for the Detailed Analysis tab of the AI-Powered Statistical Analysis report (Section 2.5.2). It explains, for each section of the report, what the metric or test means, why it matters for Machine Learning, and what action to take if the result is unfavorable.

    6.1.1. Dataset Overview

    Reports the size of the dataset: number of input variables (independent variables) and number of target variables (dependent variables). Use this as a quick sanity check that the dataset captured by Scan & Score matches what you defined on the Lab Book Overview record.

    6.1.2. Variable Type Detection

    Confirms which variables Alchemy treats as inputs (predictors) versus targets (outputs), and whether each variable is categorical or continuous. This classification drives the choice of regression vs. classification algorithms in training.

    6.1.3. Dataset Quality Assessment

    6.1.3.1. Target Coverage Analysis

    Reports how many data points per feature are available in the dataset for each target variable. Low coverage relative to the number of features increases the risk of overfitting; the model has too few examples to learn from.

    6.1.3.2. Variable Representation per Target

    Reports how many times each feature appears in the dataset with a non-zero value. Very low feature representation (the variable is rarely observed or is mostly missing) makes it difficult for ML models to learn a reliable relationship: the model either ignores the feature (treating it as low-variance and uninformative) or overfits to the few rare samples. This significantly reduces robustness and usually requires manual intervention: deleting the underrepresented materials from the Constraints table on the Lab Book Overview record, which also excludes the few trials that contain them from the dataset.
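
    A minimal sketch of this count is shown below. The trial data and the `MIN_COUNT` cut-off are hypothetical; the manual does not document the threshold Alchemy uses to decide that a feature is underrepresented.

```python
def representation(trials, features):
    """Number of trials in which each feature has a non-zero value.
    A feature missing from a trial is treated as 0."""
    return {f: sum(1 for t in trials if t.get(f, 0) != 0) for f in features}


trials = [
    {"resin_a": 40, "additive_x": 0, "additive_y": 2},
    {"resin_a": 45, "additive_x": 0, "additive_y": 0},
    {"resin_a": 50, "additive_x": 1, "additive_y": 3},
    {"resin_a": 55, "additive_y": 1},
]

counts = representation(trials, ["resin_a", "additive_x", "additive_y"])

MIN_COUNT = 2  # hypothetical cut-off, not a documented Alchemy threshold
underrepresented = [f for f, n in counts.items() if n < MIN_COUNT]

print(counts)
print("underrepresented:", underrepresented)
```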

    6.1.3.3. Target Variation Analysis (Coefficient of Variation)

    Reports the coefficient of variation (CV) for each target variable. A high CV indicates a large amount of intrinsic variability relative to the mean: the data is either noisy or has a vast, poorly modeled range. The signal (the actual trend) is obscured by the unexplained scatter, which makes it significantly harder for an ML model to achieve high predictive accuracy.

    To resolve this, let ALCHEMY AI / TRAIN AI evaluate robust non-linear algorithms such as Gradient Boosting and Random Forests, which cope well with heterogeneous variance, and select the one that performs best on your dataset.

    Note that with small sample sizes, the standard deviation is highly unstable and likely to change significantly as more data are added.
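
    For reference, the coefficient of variation is the standard deviation divided by the mean, usually expressed as a percentage. The sketch below uses the sample standard deviation; whether the report uses the sample or the population form is an assumption, and the example values are invented.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation divided by the absolute mean, as a percentage."""
    m = mean(values)
    if m == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100.0 * stdev(values) / abs(m)


tight = coefficient_of_variation([98.0, 100.0, 102.0])
noisy = coefficient_of_variation([50.0, 100.0, 150.0])

print(f"tight target: CV = {tight:.1f}%")
print(f"noisy target: CV = {noisy:.1f}%")
```

    Both example targets have the same mean, so the difference in CV comes entirely from the scatter around it.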

    6.1.3.4. Constant or Low-Variance Variable Detection

    Identifies variables with constant or very small variance. Such variables provide little to no discriminative information to the model: they cannot help distinguish between different outcomes or samples. Including them adds noise and computational overhead without contributing to predictive power. Constant variables are automatically excluded from model training, since they cannot improve predictions. For low-variance variables that are not strictly constant, consider whether the lack of variation is intentional or whether more diverse samples should be collected before continuing.

    6.1.3.5. Multicollinearity Detection

    Multicollinearity is the statistical phenomenon where two or more independent predictor variables are highly correlated with one another. Multicollinear pairs introduce redundancy and instability, making it difficult to accurately interpret individual feature importance or coefficient values. To resolve this, use ALCHEMY AI / TRAIN AI to evaluate a wider set of algorithms — including regularization techniques such as Ridge and Lasso regression that are designed to handle multicollinearity — and let the system select the algorithm that performs best on your dataset.
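
    One common way to screen for multicollinear pairs is to compute the pairwise Pearson correlation between inputs and flag pairs above a cut-off. The sketch below illustrates that approach; the 0.9 threshold and the column names are assumptions, and the manual does not state which test the report applies.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length, non-constant columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def collinear_pairs(columns, threshold=0.9):
    """Return (feature_a, feature_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            flagged.append((a, b, round(r, 3)))
    return flagged


columns = {
    "resin_g": [10, 20, 30, 40, 50],
    "hardener_g": [5, 10, 15, 20, 25],  # always half of resin_g
    "temperature_c": [60, 80, 60, 80, 60],
}

pairs = collinear_pairs(columns)
print(pairs)
```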

    6.1.3.6. Normality Testing

    Tests whether input variables and target variables are normally distributed. Failing the normality test for target variables primarily concerns Linear Regression (OLS): non-normal target variables usually lead to non-normal residuals, which violates a core OLS assumption and renders p-values and significance tests unreliable. Non-normal input features do not violate core OLS assumptions, but severe skewness still hinders finding clean linear relationships and may require feature transformation.

    When target variables are clearly non-normal, use ALCHEMY AI / TRAIN AI to evaluate a diverse set of algorithms — including non-linear models that do not require the normality assumption — and let the system select the one that performs best on your dataset.

    6.1.3.7. Outlier Detection

    Tests for outliers using the modified z-score and informs the user which input or target values are flagged. Outliers in either inputs or targets typically reduce model performance and reliability: extreme values can disproportionately influence the fitted regression line. Input outliers can shift the slope of the line; target outliers increase the model's overall residual error (e.g., MSE).

    Recommended action. For input outliers, design additional trials to fill the gap in the design space between the cluster of typical input values and the flagged extreme value. For target outliers, retest flagged trials as replicates to confirm whether the extreme value is reproducible; repeated tests help distinguish a real outlier from a one-time measurement error. If the outlier proves genuine, use ALCHEMY AI / TRAIN AI to evaluate robust algorithms such as Random Forests and Gradient Boosting, which are less sensitive to extreme values, and let the system select the one that performs best on your dataset.
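
    The modified z-score replaces the mean and standard deviation of the ordinary z-score with the median and the median absolute deviation (MAD), which makes it less sensitive to the outliers it is trying to detect. The sketch below uses the commonly cited constant 0.6745 and cut-off of 3.5; whether the report uses the same cut-off is an assumption, and the data are invented.

```python
from statistics import median


def modified_z_scores(values):
    """Modified z-score: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; the modified z-score is undefined")
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, cutoff=3.5):
    """Values whose absolute modified z-score exceeds the cut-off."""
    scores = modified_z_scores(values)
    return [v for v, z in zip(values, scores) if abs(z) > cutoff]


viscosity = [101.0, 99.0, 100.0, 102.0, 98.0, 140.0]
flagged = flag_outliers(viscosity)
print(flagged)
```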

    6.1.4. Design Efficiency Criteria

    6.1.4.1. G-efficiency

    G-efficiency is a score that tells you how well spread out your data points are across the design space. G-optimality seeks to protect against the worst-case prediction variance — i.e., the model’s ability to predict well across the whole design space.

    A higher G-efficiency value is better: it indicates a more optimal design where the variance of the predicted response is minimized across the design space, leading to more reliable and stable coefficient estimates. The direct impact on modeling is that the resulting regression model has greater statistical power to detect true factor effects and produce more precise predictions across all factor settings.

    6.1.4.2. Distance-based criteria

    Assess how uniformly spread out the experiment points are within the multidimensional design space. For ML, maximizing this distance ensures better coverage of the input domain, which increases the model’s ability to generalize and make accurate predictions across the entire range of possible recipes or conditions.
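
    One widely used distance-based measure is the smallest distance between any two design points (the maximin criterion): the larger it is, the less the points crowd together. The sketch below computes it for two hypothetical two-factor designs; the manual does not specify which distance-based criteria the report uses, so this is an illustration only.

```python
from itertools import combinations
from math import dist


def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two design points.
    Larger values indicate a more evenly spread design."""
    return min(dist(p, q) for p, q in combinations(points, 2))


# Inputs are assumed to be scaled to the 0-1 range so that no factor
# dominates the distance.
clustered = [(0.10, 0.10), (0.15, 0.10), (0.10, 0.20), (0.90, 0.90)]
spread = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

print("clustered:", round(min_pairwise_distance(clustered), 3))
print("spread:   ", round(min_pairwise_distance(spread), 3))
```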

    6.1.4.3. Discrepancy criteria

    Quantify the non-uniformity or irregularity in the distribution of data points within the multidimensional design space. For ML, minimizing discrepancy means maximizing the space-filling quality of the data — crucial for building accurate predictive models that perform reliably across the entire range of possible input combinations. Low discrepancy (high uniformity) ensures that no large empty gaps exist in the experimental region, making predictions more robust and reducing uncertainty.

    6.1.5. Cluster Analysis

    Cluster Analysis groups your historical trials by similarity, identifying distinct "families" of formulations within the dataset. For each cluster, the report lists the input variables that are statistically higher or lower than the overall dataset average — these values describe what makes the cluster distinct, not whether it is "better" or "worse" than the others.

    Neither many nor few clusters is automatically preferable. What matters is whether the clusters align with formulation strategies you recognize: a few well-defined clusters that map to families you've explored confirms the dataset has structure the model can learn from, while many small clusters or one large indistinct cluster usually indicates a dataset that lacks meaningful sub-structure.

    The analysis also serves as a coverage check: if the design region you care about is sparsely represented in the surfaced clusters, the model will potentially struggle to predict there — consider extending the dataset with ALCHEMY AI / TRAIN AI (IMPROVE MODEL PERFORMANCE) to fill the gap.

    6.1.6. Feature Importance Analysis

    6.1.6.1. Correlation Analysis

    Reports the linear relationship between each input variable and the target. Correlation analysis is important for ML because it quickly measures the strength and direction of simple associations between features and the target. From this you learn whether each input feature has high, moderate, or low predictive potential (strong, moderate, or low correlation with the target). The SHAP analysis (Section 6.1.6.2) then refines this picture by capturing non-linear effects and feature interactions that correlation alone can't see.

    6.1.6.2. SHAP Analysis

    SHAP measures how much each feature affects the prediction of a model. The SHAP value is a quantitative score derived from comparing predictions made when a feature’s information is present (its actual value) versus when its information is absent (replaced by a baseline or average). This comparison lets SHAP accurately attribute the final prediction score to each input.
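
    The present-versus-absent comparison can be made concrete with an exact Shapley calculation on a tiny model. This sketch averages each feature's marginal contribution over every ordering of the features, which is feasible only for a handful of features; production SHAP tooling relies on faster approximations, and the model, instance, and baseline below are hypothetical.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values: average each feature's marginal contribution
    over all feature orderings. 'Absent' features take their baseline value."""
    features = list(instance)
    totals = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for f in order:
            current[f] = instance[f]  # switch this feature to its actual value
            value = model(current)
            totals[f] += value - previous
            previous = value
    return {f: totals[f] / len(orderings) for f in features}


def model(x):
    # Hypothetical model with an interaction between b and c.
    return 2.0 * x["a"] + x["b"] * x["c"]


instance = {"a": 3.0, "b": 2.0, "c": 4.0}
baseline = {"a": 1.0, "b": 0.0, "c": 0.0}

phi = shapley_values(model, instance, baseline)
print(phi)
print("sum of contributions:", sum(phi.values()))
print("prediction - baseline:", model(instance) - model(baseline))
```

    The contributions sum to the difference between the prediction and the baseline prediction, and the interaction effect of `b` and `c` is shared equally between them.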

    6.1.7. Regression Model Results

    6.1.7.1. Best Model Performance

    Identifies the best-performing model and reports the coefficient, p-value, and significance of each predictor.

    6.1.7.2. Residual Analysis

    Validates the model’s assumptions through diagnostic plots and tests. If these tests fail, the resulting p-values and confidence intervals are unreliable, which prevents the user from trusting the model’s assessment of factor significance and the precision of its predictions. Consequently, the user cannot be confident in identifying the truly influential factors or in optimizing the process based on the model’s conclusions.

    6.1.7.3. Unusual Observations

    Flags individual data points with High Leverage, High Cook’s Distance, and High DFITS — i.e., observations that exert disproportionate influence on the regression model. High Leverage points have extreme input feature values that pull the fitted line toward themselves. High Cook’s Distance and High DFITS identify data points that, if removed, would significantly change the model’s coefficient estimates and/or predicted values. These points require careful review — they are often outliers or mismeasurements that distort the model.
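
    For a simple one-input linear regression, leverage and Cook's distance have closed-form expressions, sketched below with the textbook formulas. The data are invented, and the cut-offs the report uses to label a point as "High" are not documented here, so none are applied.

```python
def influence_diagnostics(xs, ys):
    """Leverage and Cook's distance for a straight-line least-squares fit."""
    n = len(xs)
    p = 2  # fitted parameters: intercept and slope
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    mse = sum(e * e for e in residuals) / (n - p)
    # Leverage: how far each input value lies from the centre of the inputs.
    leverage = [1.0 / n + (x - mean_x) ** 2 / sxx for x in xs]
    # Cook's distance: combines residual size and leverage.
    cooks = [
        (e * e / (p * mse)) * (h / (1.0 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]
    return leverage, cooks


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 18.0]

leverage, cooks = influence_diagnostics(xs, ys)
for x, h, d in zip(xs, leverage, cooks):
    print(f"x = {x:5.1f}  leverage = {h:.3f}  Cook's D = {d:.3f}")
```

    The trial at x = 15 sits far from the other inputs, so it carries most of the leverage in this example.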

    6.2. Appendix B - Mixture Design

    Mixture Design, a subtype of Screening and Optimal Designs, is intended for experiments where the user is interested in the influence of the varied mixture components (the weight or volume percentage of each material) on target properties. Because the percentages must sum to 100%, the components cannot be varied independently of one another.

    Mixture design is automatically proposed when the Formulating input is set to Target Weight [%] — i.e., when each material's quantity is expressed as a percentage of the total batch rather than as an absolute weight or volume.

    Required constraints for each material are:

    • Lower and upper bound for varied materials, or
    • Constant for non-varied materials (the influence of these materials cannot be assessed)

    Rules which need to be respected for properly setting the constraints:

    • The sum of all lower bounds and constant materials should be lower than 100%
    • The sum of all upper bounds and constant materials should be higher than 100%
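
    The two rules above can be checked before running the design. The sketch below is a minimal illustration with hypothetical material names and percentages; the function name and data layout are assumptions, not part of the product.

```python
def mixture_constraints_feasible(varied, constants):
    """Check the two mixture rules.

    varied:    dict of material -> (lower bound %, upper bound %)
    constants: dict of material -> fixed %
    """
    fixed = sum(constants.values())
    lowest = fixed + sum(low for low, _ in varied.values())
    highest = fixed + sum(high for _, high in varied.values())
    # The mixture can reach exactly 100% only if 100 lies between the two sums.
    return lowest < 100.0 < highest


ok = mixture_constraints_feasible(
    varied={"resin": (40.0, 60.0), "solvent": (20.0, 40.0), "filler": (5.0, 15.0)},
    constants={"catalyst": 2.0},
)
too_tight = mixture_constraints_feasible(
    varied={"resin": (40.0, 50.0), "solvent": (20.0, 30.0)},
    constants={"catalyst": 2.0},
)

print("first set feasible: ", ok)
print("second set feasible:", too_tight)
```

    In the second set the upper bounds and the constant add up to only 82%, so no combination within the bounds can reach 100%.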

    6.3. Appendix C - Mixture Process Design

    Mixture Process Design, a subtype of Screening and Optimal Designs, is intended for experiments where the user is interested in the influence of both the varied mixture components (the weight or volume percentage of each material) and the processing steps on target properties.

    Mixture Process design is automatically proposed when the Formulating input is set to Target Weight [%] and the design includes one or more processing steps in addition to the materials.

    Required constraints for each material are:

    • Lower and upper bound for varied materials, or
    • Constant for non-varied materials (the influence of these materials cannot be assessed)

    Required constraints for each condition from a processing step are:

    • Lower and upper bound - for conditions of number type
    • Any of - for conditions of predefined number or text (alphanumeric) type
    • Set to - for conditions of predefined number or text (alphanumeric) type which are not varied, or
    • Constant - for non-varied conditions (the influence of these processing-step conditions cannot be assessed)

    Rules which need to be respected for properly setting the constraints:

    • The sum of all lower bounds and constant materials should be lower than 100%
    • The sum of all upper bounds and constant materials should be higher than 100%

    6.4. Appendix D - Factorial Design

    Factorial Design, a subtype of Screening Design, is intended for experiments where the user is interested in the influence of varied independent variables, materials (values for weight and/or volume) and/or processing steps on target properties. 

    Factorial design is automatically proposed when the user inputs weight and/or volume for materials and/or processing steps. 

    Required constraints for each material are:

    • Lower and upper bound or
    • Constant 

    Required constraints for each condition from processing step are:

    • Lower and upper bound - for conditions of number type
    • Any of - for conditions of predefined number or text (alphanumeric) type
    • Set to - for conditions of predefined number or text (alphanumeric) type which are not varied, or
    • Constant - for non-varied conditions (the influence of these processing-step conditions cannot be assessed)

    No additional rules apply: any combination of the constraints above is valid.

    6.5. Appendix E - Response Surface Design

    Response Surface Design, a subtype of Optimal Design, is intended for experiments where the user is interested in the influence of varied independent variables, materials (values for weight and/or volume) and/or processing steps, on target properties. 

    Response surface design is automatically proposed when the user inputs weight and/or volume for materials. 

    Required constraints for each material are:

    • Lower and upper bound or
    • Constant 

    Required constraints for each condition from processing step are:

    • Lower and upper bound - for conditions of number type
    • Any of - for conditions of predefined number or text (alphanumeric) type
    • Set to - for conditions of predefined number or text (alphanumeric) type which are not varied and
    • Constant - for non-varied conditions (the influence of these processing-step conditions cannot be assessed)

    Constraints are valid without any rules.

    6.6. Appendix F — Model Explainer Dashboard (regression): Plain-English Guide

    This appendix is a non-technical reference for the Model Explainer tabs that are available for regression target properties (continuous numerical outputs such as Viscosity, Hardness, or Drying time). It describes, for each tab, the question the tab answers, what the user is looking at, how to read “good” vs. “bad,” and why the view matters.

    Acronyms used in this appendix

    RMSE (Root Mean Squared Error). Measured in the same units as the property being predicted; represents the typical distance between the prediction and the true value. Smaller numbers mean predictions are closer to the target.

    MAE (Mean Absolute Error). Also in the same units as the property; shows the average distance between predictions and actual values. Lower values mean more consistently accurate predictions.

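    R² (Coefficient of Determination). Typically ranges from 0 to 1 and measures how well the predicted values follow the actual data; it can fall below 0 when the model predicts worse than simply using the average of the actual values. Values closer to 1 mean the points fall closer to the ideal diagonal line (predicted = actual).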

    SHAP (SHapley Additive exPlanations). Shown on a positive-to-negative scale where distance from zero indicates how strongly a feature pushes the prediction up or down, and larger magnitudes mean greater impact on reaching or missing the target.

    6.6.1. Feature Importance

    Question it answers: Which ingredients or process parameters matter the most?

    What you’re looking at: A ranked list of features showing which ones have the biggest overall impact on the predicted property. Longer bars indicate bigger influence; shorter bars, smaller influence.

    What “good” looks like: A clear set of top drivers that align with your domain intuition; the most important features are also measurable and controllable in production.

    Why this matters: Tells you which variables deserve the tightest control in manufacturing, helps R&D focus effort on the inputs that actually move performance, and lets you deprioritize or simplify low-importance features. Rule of thumb: if changing a feature would meaningfully change the prediction, it should appear near the top.

    6.6.2. Regression Statistics (Model Performance)

    Question it answers: How good is this model overall?

    What you’re looking at: Error metrics (RMSE, MAE), the R² score, and a Predicted vs. Actual plot. RMSE and MAE describe how far off predictions are on average — lower is better. R² describes how much of the real-world variation the model explains — higher is better. The Predicted vs. Actual plot shows visually how close predictions are to reality.

    What “good” looks like: Low error values, R² close to 1, and points clustered along the diagonal of the Predicted vs. Actual plot.

    Why this matters: These tell you whether the model is reliable enough to guide real decisions, not just to generate interesting charts.
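
    For readers who want to see what is behind these numbers, the sketch below computes RMSE, MAE, and R² from their standard definitions using only the Python standard library. The sample values are made up for illustration, and the dashboard's own implementation is not shown in this manual.

```python
import math


def regression_stats(actual, predicted):
    """RMSE, MAE and R^2 from their standard definitions."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)                   # unexplained variation
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total variation
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1 - ss_res / ss_tot,
    }


# Made-up viscosity values, for illustration only.
actual = [10.0, 12.0, 14.0, 16.0]
predicted = [11.0, 11.0, 15.0, 15.0]
stats = regression_stats(actual, predicted)
print(stats)
```

    Every prediction in this example is off by exactly 1 unit, so RMSE and MAE are both 1; the model still captures most of the variation, giving an R² of 0.8.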

    6.6.3. Residuals Analysis

    Question it answers: Is the model making consistent, unbiased errors?

    What you’re looking at: Residuals vs. Predicted and Residuals vs. Feature plots. A residual is the actual value minus the predicted value; you want errors to look random rather than patterned.

    What “good” looks like: Points scattered randomly around zero, with no curves, funnels, or trends.

    Why this matters: Patterns here mean the model is systematically wrong in certain regions of the design space — risky for optimization or scale-up.
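
    A rough numerical companion to the plots is sketched below: it computes the residuals, their average (which should be near zero for an unbiased model), and the correlation between residuals and predictions (which should also be near zero). The correlation only catches straight-line trends, so curves and funnels still need a visual check. The data and the function name are illustrative assumptions.

```python
import math


def residual_check(actual, predicted):
    """Return residuals, their mean, and their correlation with predictions."""
    residuals = [a - p for a, p in zip(actual, predicted)]
    n = len(residuals)
    mean_res = sum(residuals) / n
    mean_pred = sum(predicted) / n
    cov = sum((r - mean_res) * (p - mean_pred)
              for r, p in zip(residuals, predicted))
    var_res = sum((r - mean_res) ** 2 for r in residuals)
    var_pred = sum((p - mean_pred) ** 2 for p in predicted)
    denom = math.sqrt(var_res * var_pred)
    trend = cov / denom if denom else 0.0  # Pearson correlation
    return residuals, mean_res, trend


# Made-up example of a model that increasingly under-predicts high values.
actual = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 20.0, 28.0, 36.0]
residuals, mean_res, trend = residual_check(actual, predicted)
print(residuals, mean_res, trend)
```

    In this example the residuals climb steadily from -2 to 4 as the prediction grows, so the correlation is 1.0: a clear pattern rather than random scatter.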

    6.6.4. Individual Predictions & Contributions

    Question it answers: Why did this specific formulation get this prediction?

    What you’re looking at: The predicted versus actual value for one chosen sample, paired with a contribution (SHAP) breakdown that shows how each feature pushed the prediction up or down. The model starts from an average baseline; each feature either increases or decreases the prediction; the baseline plus the sum of those pushes equals the final predicted value.

    What “good” looks like: The largest contributions come from known, meaningful features; the explanation matches scientific intuition.

    Why this matters: This is where trust is built. Users can see exactly which ingredients or conditions caused a high or low prediction.
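
    The "baseline plus pushes" arithmetic can be made concrete with a toy linear model. For a linear model with independent features, the SHAP contribution of a feature is its weight times the distance of the sample's value from the feature's average. The weights, averages, and sample below are invented for illustration; real models in the dashboard are generally not linear, but the additive bookkeeping is the same.

```python
def linear_contributions(weights, intercept, feature_means, sample):
    """Baseline, per-feature contributions and prediction for a linear model."""
    # Baseline: the prediction for an "average" sample.
    baseline = intercept + sum(weights[f] * feature_means[f] for f in weights)
    # Contribution: weight times the sample's deviation from the average.
    contributions = {
        f: weights[f] * (sample[f] - feature_means[f]) for f in weights
    }
    prediction = baseline + sum(contributions.values())
    return baseline, contributions, prediction


# Hypothetical viscosity model and sample.
weights = {"Thickener (%)": 40.0, "Temperature (C)": -2.0}
intercept = 100.0
feature_means = {"Thickener (%)": 2.0, "Temperature (C)": 25.0}
sample = {"Thickener (%)": 3.0, "Temperature (C)": 30.0}

baseline, contributions, prediction = linear_contributions(
    weights, intercept, feature_means, sample
)
print(baseline, contributions, prediction)
```

    The baseline is 130; the extra thickener pushes the prediction up by 40 and the higher temperature pushes it down by 10, giving 130 + 40 - 10 = 160.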

    6.6.5. What-If Analysis (Scenario Planning)

    Question it answers: What happens if I change this ingredient or process setting?

    What you’re looking at: Interactive controls to change inputs, with predictions and explanations updating in real time. This is a digital experiment: adjust a concentration, temperature, or parameter, and instantly see how the predicted property changes — and why.

    What “good” looks like: Smooth, predictable responses to small input changes, and a clear sense of which inputs the property is most sensitive to.

    Why this matters: Enables fast exploration without physical experiments; supports target-seeking (“what do I change to hit this spec?”); and helps avoid risky or extreme combinations before committing to lab work.
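
    Conceptually, a what-if scan changes one input, keeps the others fixed, and re-runs the prediction. The sketch below does this against a stand-in formula and then picks the scanned value whose prediction lands closest to a target. The `predict_viscosity` function, the input names, and the target are hypothetical and only replace a trained model for the purpose of the example.

```python
def predict_viscosity(thickener_pct, temperature_c):
    # Hypothetical stand-in for a trained model; not a real product model.
    return 100.0 + 40.0 * thickener_pct - 2.0 * temperature_c


def what_if(predict, base_inputs, name, candidates):
    """Re-predict while varying one input and holding the rest fixed."""
    rows = []
    for value in candidates:
        inputs = dict(base_inputs, **{name: value})
        rows.append((value, predict(**inputs)))
    return rows


base = {"thickener_pct": 2.0, "temperature_c": 25.0}
scan = what_if(predict_viscosity, base, "thickener_pct", [1.0, 2.0, 3.0, 4.0])

# Target-seeking: which scanned value gets closest to the spec?
target = 200.0
best_value, best_prediction = min(scan, key=lambda row: abs(row[1] - target))
print(scan)
print(best_value, best_prediction)
```

    With these made-up numbers, each additional percent of thickener raises the prediction by 40 units, and 4% thickener comes closest to the target of 200.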

    6.6.6. Feature Dependence

    Question it answers: How does changing one input generally affect the output?

    What you’re looking at: Plots showing how a feature’s value relates to its impact on predictions. The X-axis is the feature value, the Y-axis is how much that feature pushes the prediction up or down, and color encodes another feature that may interact with it.

    What “good” looks like: Clear trends — increase, decrease, or plateau — and interactions that make physical or chemical sense.

    Why this matters: Helps identify non-linear effects, sweet spots, and interactions where two ingredients together behave differently than alone.

    Feature Interactions (Synergy vs. Antagonism):

    Synergy: two features together have a bigger positive effect than expected. Antagonism: one feature reduces the effect of another. No interaction: the two features behave independently.

    What you're looking at: When points stratify into distinct color bands (different colors taking different SHAP-value paths along the X-axis), you have an interaction — the effect of the X-axis feature depends on the value of the color feature. When the colors are randomly mixed throughout the cloud, the two features behave independently and there is no interaction.

    Why this matters: Critical for formulation science, where combinations matter more than single ingredients.
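
    One simple way to put a number on "bigger than expected" is to compare the combined effect of two features with the sum of their individual effects. The sketch below does this for a 2 x 2 set of responses (each feature either absent or present). It assumes both individual effects are positive, and the labels, tolerance, and values are illustrative; this is not necessarily how the dashboard detects interactions.

```python
def interaction_effect(response, tolerance=1e-9):
    """Classify the interaction from a 2 x 2 table of responses.

    response[(a, b)] is the property value with feature A at level a and
    feature B at level b, where 0 = absent/low and 1 = present/high.
    """
    effect_a = response[(1, 0)] - response[(0, 0)]
    effect_b = response[(0, 1)] - response[(0, 0)]
    combined = response[(1, 1)] - response[(0, 0)]
    interaction = combined - (effect_a + effect_b)
    if interaction > tolerance:
        label = "synergy"         # together they do more than expected
    elif interaction < -tolerance:
        label = "antagonism"      # together they do less than expected
    else:
        label = "no interaction"  # effects simply add up
    return interaction, label


# Made-up hardness values.
hardness = {(0, 0): 100.0, (1, 0): 110.0, (0, 1): 105.0, (1, 1): 130.0}
interaction, label = interaction_effect(hardness)
print(interaction, label)
```

    Here A alone adds 10 and B alone adds 5, but together they add 30, which is 15 more than the expected 15, so the pair is synergistic.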

    6.7. Appendix G — Model Explainer Dashboard (classification): Plain-English Guide

    This appendix is a non-technical reference for the Model Explainer tabs that are available for classification target properties (predefined categorical outputs such as Pass/Fail, or any other set of named categories). It mirrors the structure of Appendix F but adapts the explanations to classification metrics.

    Acronyms and key terms used in this appendix

    ROC AUC (Receiver Operating Characteristic Area Under the Curve). A performance measurement for classification problems at various threshold settings. It tells how well the model is able to distinguish between classes.

    PR AUC (Precision–Recall Area Under the Curve). Summarizes the trade-off between precision and recall using different probability thresholds.

    Log Loss (Logarithmic Loss). Penalizes false classifications by taking into account the probability of the prediction. Lower values indicate better performance.

    SHAP (SHapley Additive exPlanations). Distance from zero indicates how strongly a feature pushes the prediction toward one class or another; larger magnitudes mean greater impact.

    Class (Category). The specific group or label the model is trying to predict (for example, “Pass” vs. “Fail”).

    Probability. A number between 0 and 1 (or 0% to 100%) that represents the model’s confidence. A 0.85 probability means the model is 85% certain a sample belongs to a specific class.

    6.7.1. Feature Importance

    As in the regression case (Appendix F), Feature Importance ranks the inputs by their overall impact on the predicted class. The interpretation is the same: longer bars mean a feature has a stronger influence on the predicted probability of a class. The same rule of thumb applies: if changing a feature would meaningfully change the prediction, it should appear near the top.

    6.7.2. Classification Statistics (Model Performance)

    Question it answers: How reliable is this model overall?

    What you’re looking at: Accuracy, Precision, Recall, F1 Score, ROC AUC Score, PR AUC Score, and Log Loss.

    Accuracy. Percentage of total predictions that were correct — higher is better.

    Precision. When the model predicts a category, how often is it actually correct — higher is better.

    Recall. Out of all the actual instances of a category, how many did the model correctly identify — higher is better.

    F1 Score. A balanced metric that combines precision and recall — closer to 1 is better.

    ROC AUC Score. 0–1 score that measures the model’s ability to separate classes — closer to 1 is better.

    PR AUC Score. Area under the Precision–Recall curve; higher values indicate better performance, particularly on imbalanced datasets.

    Log Loss. Penalizes predictions that are confident but wrong; lower values mean the predicted probabilities are closer to the actual outcomes.

    What “good” looks like: High Accuracy, Precision, Recall, F1, ROC AUC, and PR AUC; low Log Loss; and high diagonal percentages on the Confusion Matrix (Observed class = Predicted class).
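
    The sketch below computes Accuracy, Precision, Recall, F1 Score, and Log Loss for a two-class problem from their standard definitions, using only the Python standard library. The labels, probabilities, and the 0.5 cutoff are illustrative assumptions; the dashboard's own implementation is not shown in this manual.

```python
import math


def classification_stats(y_true, y_prob, threshold=0.5):
    """Binary metrics; 1 is the positive class, y_prob is P(class = 1)."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    eps = 1e-15  # keeps log() finite for probabilities of exactly 0 or 1
    log_loss = -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
        for t, p in zip(y_true, y_prob)
    ) / len(y_true)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "log_loss": log_loss,
    }


# Made-up data: 1 = Pass, 0 = Fail.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
stats = classification_stats(y_true, y_prob)
print(stats)
```

    In this example the model gets 6 of 8 samples right, with one missed Pass and one false alarm, so Accuracy, Precision, Recall, and F1 all equal 0.75.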

    6.7.3. Interpretation of Plots

    The Classification Stats tab includes several plots that go beyond aggregate metrics. Each is described below in plain English.

    Confusion Matrix

    What it is. A grid of observed vs. predicted classes — a scorecard showing where the model was right and where it got confused.

    What “good” looks like. High numbers in the diagonal boxes (top-left to bottom-right) and near-zero numbers everywhere else.

    Why this matters. It tells you exactly what kind of mistakes the model makes — does it miss real cases (false negatives) or sound false alarms (false positives)?
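
    A confusion matrix is just a count of every (observed, predicted) pair. The minimal sketch below builds one with rows as observed classes and columns as predicted classes; that orientation and the sample labels are assumptions for illustration.

```python
from collections import Counter


def confusion_matrix(observed, predicted, labels):
    """Rows are observed classes, columns are predicted classes."""
    counts = Counter(zip(observed, predicted))
    return [[counts[(o, p)] for p in labels] for o in labels]


# Made-up results for eight samples.
observed = ["Pass", "Pass", "Pass", "Fail", "Fail", "Fail", "Fail", "Pass"]
predicted = ["Pass", "Pass", "Fail", "Fail", "Fail", "Pass", "Fail", "Pass"]
labels = ["Pass", "Fail"]

matrix = confusion_matrix(observed, predicted, labels)
correct = sum(matrix[i][i] for i in range(len(labels)))
diagonal_share = correct / len(observed)
print(matrix, diagonal_share)
```

    The result is [[3, 1], [1, 3]]: three Pass and three Fail samples on the diagonal, one Pass wrongly predicted as Fail, and one Fail wrongly predicted as Pass.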

    Precision and Classification plots

    What they are. Two paired views of how “separation” works. The Classification plot shows the distribution of labels relative to a probability cutoff; the Precision plot tracks how the actual positive rate moves with the model’s predicted probability.

    What “good” looks like. In the Classification plot, two distinct “humps” (one per category) with very little overlap. In the Precision plot, a steady upward line.

    Why this matters. If the humps overlap, the model is “unsure” in that region. If the precision line rises steadily, the predicted probabilities track real outcomes, so you can trust a “90% confidence” score to be right roughly 90% of the time.
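
    The idea behind the Precision plot can be sketched by grouping samples into bins of predicted probability and comparing each bin's average prediction with the share of samples that actually turned out positive. The equal-width bins, the number of bins, and the data below are assumptions; the dashboard may bin differently.

```python
def calibration_table(y_true, y_prob, n_bins=5):
    """Per probability bin: mean predicted probability vs. actual positive rate."""
    bins = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, y_prob):
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[index].append((t, p))
    table = []
    for members in bins:
        if not members:
            continue  # skip empty bins
        table.append({
            "mean_predicted": sum(p for _, p in members) / len(members),
            "actual_positive_rate": sum(t for t, _ in members) / len(members),
            "count": len(members),
        })
    return table


# Made-up data: 1 = positive class.
y_prob = [0.1, 0.1, 0.3, 0.3, 0.5, 0.5, 0.7, 0.7, 0.9, 0.9]
y_true = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
table = calibration_table(y_true, y_prob)
rates = [row["actual_positive_rate"] for row in table]
print(rates)
```

    The actual positive rate never decreases from the lowest-probability bin to the highest, which is the "steady upward line" described above.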

    ROC AUC and PR AUC plots

    What they are. “Stress test” curves that show, across all probability cutoffs, the trade-off between being thorough (catching every real positive) and being accurate (avoiding false alarms).

    What “good” looks like. The ROC curve should bow deeply toward the top-left; the PR curve should stay high and trend toward the top-right.

    Why this matters. These show the overall “strength” of the model. A high AUC means the model is excellent at ranking the best formulations at the top of the list.
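
    The ranking interpretation can be checked directly: ROC AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as half. The sketch below computes it by comparing every positive-negative pair, which is fine for small data sets; the data is made up.

```python
def roc_auc(y_true, y_score):
    """ROC AUC as the share of positive/negative pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0   # positive ranked above negative
            elif pos == neg:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Made-up data: 1 = Pass, 0 = Fail.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.1, 0.7]
auc = roc_auc(y_true, y_score)
print(auc)
```

    Of the 16 positive-negative pairs, 15 are ranked correctly, so the AUC is 0.9375.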

    Lift and Cumulative Precision curves

    What they are. “Value add” curves that show how much better the model performs than random guessing.

    What “good” looks like. Curves that start very high on the left and stay well above the diagonal baseline.

    Why this matters. They tell you, for example, “If I only have the budget to test the top 10% of samples recommended by the model, how many successes will I find?” — a Lift of 3.0 means three times more successes than picking at random.
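
    Lift at a given cutoff is the success rate among the top-ranked samples divided by the overall success rate. A minimal sketch follows; the function name, the rounding of the cutoff to a whole number of samples, and the data are illustrative assumptions.

```python
def lift_at(y_true, y_score, fraction):
    """Success rate in the top `fraction` of ranked samples vs. overall rate."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, round(len(ranked) * fraction))
    top_rate = sum(t for _, t in ranked[:top_n]) / top_n
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


# Made-up data: ten candidate formulations, three of which succeed (1).
y_true = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
y_score = [0.95, 0.40, 0.90, 0.35, 0.30, 0.20, 0.55, 0.50, 0.10, 0.05]

lift_top_20 = lift_at(y_true, y_score, 0.2)
print(lift_top_20)
```

    Both of the two top-ranked samples are successes (a 100% success rate) against an overall rate of 30%, so the lift at the top 20% is about 3.3.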

    6.7.4. Individual Predictions & Contributions

    Question it answers: Why did this specific formulation get this prediction?

    What you’re looking at: Two views. The Contributions plot is a SHAP breakdown showing how each feature pushed the specific prediction up (toward one class) or down (toward another). The Partial Dependence Plot shows how the predicted probability of a class changes if you vary one specific feature while holding the others constant. The model starts from an average baseline; each feature increases or decreases the probability of the designated positive class; the baseline plus the sum of those pushes equals the final predicted probability.

    What “good” looks like: The largest contributions come from known, meaningful features, and the explanation matches scientific intuition.

    Why this matters: This is where trust is built. Users can see precisely which factors led to a high or low probability of achieving the positive class.
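
    A partial dependence curve can be sketched as follows: for each value on a grid, overwrite one feature in every sample with that value, leave the other features as they are, and average the predicted probabilities. This averaged form is the common textbook definition and is used here as an assumption. The logistic formula, the feature names, and the samples are hypothetical stand-ins for a trained model and real data.

```python
import math


def predict_pass_probability(row):
    # Hypothetical logistic model; not a real product model.
    z = -4.0 + 0.05 * row["cure_temp_c"] + 0.5 * row["catalyst_pct"]
    return 1.0 / (1.0 + math.exp(-z))


def partial_dependence(predict, rows, feature, grid):
    """Average predicted probability as one feature is swept over a grid."""
    curve = []
    for value in grid:
        probabilities = [predict(dict(row, **{feature: value})) for row in rows]
        curve.append((value, sum(probabilities) / len(probabilities)))
    return curve


# Made-up samples.
rows = [
    {"cure_temp_c": 60.0, "catalyst_pct": 1.0},
    {"cure_temp_c": 80.0, "catalyst_pct": 2.0},
    {"cure_temp_c": 100.0, "catalyst_pct": 3.0},
]
curve = partial_dependence(
    predict_pass_probability, rows, "cure_temp_c", [40.0, 60.0, 80.0, 100.0, 120.0]
)
for value, probability in curve:
    print(value, round(probability, 3))
```

    Because the stand-in model gives cure temperature a positive weight, the average Pass probability rises steadily along the grid.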

    6.7.5. What-If Analysis (Scenario Planning)

    Question it answers: What happens to the final category if I change a temperature, time, or concentration?

    What you’re looking at: Interactive controls that update probabilities in real time. Adjust a concentration, temperature, or parameter; instantly see how the probability of a specific outcome shifts; and read the explanation of why it shifted.

    What “good” looks like: Smooth, predictable probability shifts where small changes in inputs lead to logical changes in the probability of a category, plus a clear sense of which “tipping points” cause the model to switch its prediction from one category to another.

    Why this matters: Enables fast exploration without physical experiments; supports target-seeking (“what do I change to hit this category?”); and helps avoid risky or extreme combinations before committing to lab work.

    6.7.6. Feature Dependence

    Same idea as the regression view (Appendix F), but the Y-axis here represents how much the feature pushes the predicted probability for the positive class up or down, rather than a numerical target value.

    6.7.7. Feature Interactions (Synergy vs. Antagonism)

    Same definitions as in Appendix F: synergy means two features together have a bigger effect on the predicted class than expected; antagonism means one feature reduces the effect of another; no interaction means they behave independently. Reading the Feature Dependence colored scatter: distinct color bands indicate an interaction (the effect of the X-axis feature depends on the color feature); randomly mixed colors indicate no interaction.

    © 2017-2026, Alchemy Cloud, Inc. All Rights Reserved. Confidential & Proprietary. Not for redistribution. Legal Terms
