Understand your results
Last updated
©2024 Total Materia AG. All Rights Reserved
In addition to the prediction results, the Result page displays the model's performance metrics, confidence interval, and applicability domain. It is important to note that, unlike the confidence interval, the testing metrics reflect the model's overall performance, not the accuracy of the specific prediction.
Testing metrics
Testing metrics quantify the model's performance by comparing predicted values with measured (real) values of the predicted properties.
The following performance metrics are used:
Mean absolute percentage error (MAPE)
Definition: The average of absolute percentage errors between predicted and actual values.
Interpretation: It provides a measure of how far off the predicted values on average are from the actual values, expressed as a percentage. A lower value indicates a higher-quality model, where 0 means the model made no errors.
For example: density models are expected to have a MAPE of around 5% (since density depends only on the chemical composition of materials), but more complex models like the one predicting Fatigue Strength are expected to have a MAPE of up to 20%.
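As an illustration, MAPE can be computed as follows. This is a minimal Python sketch; the function name and the sample values are ours for demonstration, not Predictor's actual code or data.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    return 100.0 * sum(abs(a - p) / abs(a)
                       for a, p in zip(actual, predicted)) / len(actual)

# Illustrative values only (e.g. densities in g/cm^3):
actual = [7.85, 2.70, 8.96]
predicted = [7.90, 2.65, 9.10]
print(round(mape(actual, predicted), 2))  # 1.35
```

A MAPE of 0 would mean the model made no errors on the test set.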
Root mean squared error (RMSE)
Definition: The square root of the mean of squared differences between predicted and actual values.
Interpretation: It measures the average magnitude of the errors between predicted and actual values. A lower value indicates a higher-quality model, where 0 means the model made no errors. It provides a measure in the same units as the data, making it straightforward to interpret.
Mean absolute error (MAE)
Definition: The average of absolute differences between actual and predicted values.
Interpretation: It measures the average magnitude of the errors between predicted values and actual values, without considering the direction of the errors. It indicates how much on average the predictions deviate from the actual values. A low value indicates a higher-quality model, where 0 means the model made no errors. Like RMSE, it also provides a measure in the same units as the data.
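Both RMSE and MAE can be sketched in a few lines. The code below is an illustrative implementation of the standard formulas, not Predictor's internal code; the sample values are invented and share the same (arbitrary) unit as the predicted property.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: square root of the mean squared difference."""
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: mean of absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

actual = [200.0, 250.0, 300.0]      # illustrative property values
predicted = [210.0, 245.0, 280.0]
print(round(rmse(actual, predicted), 2))  # 13.23 (pulled up by the large 20-unit error)
print(round(mae(actual, predicted), 2))   # 11.67
```

Note how RMSE exceeds MAE on the same data: squaring the residuals before averaging penalizes the single large error more heavily.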
Coefficient of determination (R²)
Definition: The coefficient of determination indicates the proportion of variance in the dependent variable (predicted property in our case) that is explained by the independent variables (inputs in our case). It is used to evaluate the fit of a regression model. It quantifies how well the independent variables explain the variability of the dependent variable in a regression model.
Interpretation: It ranges from 0 to 1, where a higher value indicates a higher-quality model. Value 1 indicates that the model perfectly explains all the variability of the dependent variable. All data points lie exactly on the regression line.
Correlation coefficient (r)
Definition: The Pearson correlation coefficient between the actual and predicted values. It quantifies the strength of the relationship between actual and predicted values.
Interpretation: In general, the Pearson coefficient ranges from -1 to 1; for a useful predictive model it lies between 0 and 1. A value of 0 implies no linear relationship between the actual and predicted values, while a value of 1 implies a perfect match (predicted and actual values are the same).
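The two fit metrics can be written out from their textbook definitions. This is a minimal sketch of the standard formulas (R² as 1 − SS_res/SS_tot, and Pearson's r as normalized covariance); the function names and sample values are ours.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

def pearson_r(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    n = len(actual)
    ma, mp = sum(actual) / n, sum(predicted) / n
    cov = sum((a - ma) * (p - mp) for a, p in zip(actual, predicted))
    sd_a = sum((a - ma) ** 2 for a in actual) ** 0.5
    sd_p = sum((p - mp) ** 2 for p in predicted) ** 0.5
    return cov / (sd_a * sd_p)

actual = [1.0, 2.0, 3.0, 4.0]        # illustrative values only
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(actual, predicted), 2))   # 0.98
print(round(pearson_r(actual, predicted), 3))   # 0.991
```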
Predictions within relative error (with different error margin 5%, 10% or 20%)
Definition: The share of predictions within <5%, <10% or <20% relative error.
Interpretation: For example, a value of 92% for predictions within a relative error of ±5% means that 92% of all predictions made by the model on testing had a relative error of no more than 5%. It ranges from 0% to 100%, where a higher value indicates a higher quality model.
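This metric is simply a hit rate at a chosen tolerance. The sketch below (our own illustration, not Predictor's code) counts the share of predictions whose relative error stays within a given margin.

```python
def share_within_relative_error(actual, predicted, margin):
    """Percentage of predictions with relative error <= margin (e.g. 0.05 for 5%)."""
    hits = sum(1 for a, p in zip(actual, predicted)
               if abs(a - p) / abs(a) <= margin)
    return 100.0 * hits / len(actual)

# Illustrative values: relative errors are 3%, 4%, 8%, 1%, and 19%.
actual = [100.0] * 5
predicted = [103.0, 96.0, 108.0, 101.0, 119.0]
print(share_within_relative_error(actual, predicted, 0.05))  # 60.0
print(share_within_relative_error(actual, predicted, 0.10))  # 80.0
print(share_within_relative_error(actual, predicted, 0.20))  # 100.0
```

The same predictions score differently at each margin, which is exactly what makes the metric useful for different precision requirements.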
To estimate the performance and applicability of the model, the prediction accuracy must be quantified using several numerical measures:
MAPE, MAE, and RMSE are similar, but RMSE is more sensitive to large residuals, while MAPE is dimensionless, i.e. expressed in %.
Because it is dimensionless, MAPE (like r and R²) is convenient for comparing the performance of different models, even if they predict different properties.
RMSE and MAE offer a more tangible interpretation because they are measured in the units of the predicted property.
MAPE, RMSE, and MAE can also be used to calculate the confidence interval.
Finally, the share of predictions within a given relative error allows flexible tolerance evaluation for different precision requirements, making the model's effectiveness easier to understand and communicate.
Applicability
Applicability describes the reliability of a performed prediction. It indicates whether the chosen type of material (defined by chemical composition for metals) and conditions (heat treatment, product form, temperature, etc.) are likely to yield a prediction with higher or lower accuracy.
In general, the applicability domain of the model is the alloy space in which the model makes predictions with a given reliability, i.e. the specific range of material and property characteristics within which the model can predict with certain reliability. These characteristics are the information (inputs) used for model training.
In Predictor, applicability is determined using a range-based approach and has three levels. This approach involves defining the position of the chosen material and conditions for prediction in comparison to the data used for training and testing the model.
Levels of applicability range in Predictor are:
1) Fully applicable – values of all chosen inputs are in the testing domain of the model. When prediction is fully applicable, it is expected to have high accuracy and reliability.
2) Applicable – values of all chosen inputs are in the training domain, but one or more inputs are outside of the model’s testing domain. In other words, the value of some input or inputs chosen for prediction is outside of the range in which the model was tested.
3) Extrapolation – values of one or more inputs are outside of the model scope (training domain). This means that the value of one or more inputs is not aligned with the range of values used for training the model. By definition, this domain gives the least reliability, but it does not necessarily mean an inaccurate prediction, because some cases can be very close to the model scope.
When a prediction is applicable or in extrapolation, it makes a difference whether a low-, medium-, or high-significance input is out of scope. Therefore, besides the level of applicability, you can see which type of input is causing it.
Applicability can be Applicability (LSI), Applicability (MSI), or Applicability (HSI), where:
LSI - low-significance input(s)
MSI - medium-significance input(s)
HSI - high-significance input(s)
Likewise, extrapolation can be Extrapolation (LSI), Extrapolation (MSI), or Extrapolation (HSI).
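The range-based approach described above can be sketched as a simple per-input check against the training and testing ranges. This is our illustrative simplification (Predictor's actual range definitions, input names, and significance handling are not specified here), but it shows how the three levels follow from where each input falls.

```python
def applicability_level(inputs, train_range, test_range):
    """Range-based applicability check (illustrative sketch only).

    inputs:      {name: value} chosen for the prediction
    train_range: {name: (min, max)} covered during training
    test_range:  {name: (min, max)} covered during testing (subset of training)
    Returns the level and the list of inputs causing it.
    """
    outside_train = [k for k, v in inputs.items()
                     if not (train_range[k][0] <= v <= train_range[k][1])]
    if outside_train:
        return "Extrapolation", outside_train
    outside_test = [k for k, v in inputs.items()
                    if not (test_range[k][0] <= v <= test_range[k][1])]
    if outside_test:
        return "Applicable", outside_test
    return "Fully applicable", []

# Hypothetical ranges for a carbon content (wt%) and a temperature input:
train = {"C": (0.0, 1.2), "temperature": (20, 600)}
test = {"C": (0.1, 1.0), "temperature": (20, 500)}
print(applicability_level({"C": 0.5, "temperature": 300}, train, test))  # ('Fully applicable', [])
print(applicability_level({"C": 1.1, "temperature": 300}, train, test))  # ('Applicable', ['C'])
print(applicability_level({"C": 1.5, "temperature": 300}, train, test))  # ('Extrapolation', ['C'])
```

In the real system, the returned list of out-of-range inputs would additionally be mapped to LSI/MSI/HSI according to each input's significance.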
Dataset size rating
This parameter indicates the amount of data available for training models for a certain material group. The dataset size is rated from 1 to 5, with 5 representing larger datasets that are more likely to yield more accurate predictions.
How is confidence interval set?
Definition: A confidence interval (CI) is the range in which the true value of a variable is expected to lie with a certain probability (confidence level). In Predictor, the CI represents the range of values within which the true value of the property you are predicting is expected to lie.
Implementation:
CI is calculated on prediction level (unlike other metrics describing the general model’s performance). The process is:
The test set is partitioned into smaller regions determined by the distance of each point from the dataset distribution. The number of regions varies with the dataset size to ensure that each region:
contains a sufficient number of data points for statistically reliable estimation
contains an approximately equal number of data points.
Confidence intervals for errors are computed for each region.
A new datapoint (input) is initially classified into one of the regions based on its distance from the dataset. Subsequently, the confidence interval corresponding to that region is assigned to the data point.
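The steps above can be sketched as follows. This is a strongly simplified illustration under our own assumptions (a nearest-rank empirical percentile and a one-dimensional distance with fixed region edges); Predictor's actual distance measure, region construction, and interval estimation are not specified here.

```python
def percentile(sorted_vals, q):
    """Nearest-rank percentile of an already-sorted list (q in 0..100)."""
    idx = min(len(sorted_vals) - 1, int(q / 100.0 * len(sorted_vals)))
    return sorted_vals[idx]

def region_error_interval(errors_in_region, confidence=0.80):
    """Empirical interval covering `confidence` of a region's test errors."""
    s = sorted(errors_in_region)
    tail = (1.0 - confidence) / 2.0 * 100.0   # e.g. 10% in each tail at 80%
    return percentile(s, tail), percentile(s, 100.0 - tail)

def interval_for_point(distance, region_edges, region_intervals):
    """Assign a new point to a region by distance; return that region's interval."""
    for edge, interval in zip(region_edges, region_intervals):
        if distance <= edge:
            return interval
    return region_intervals[-1]

# Symmetric toy errors from -10 to 10 for one region:
errors = list(range(-10, 11))
print(region_error_interval(errors, 0.80))  # (-8, 8)
```

A point far from the training distribution falls into an outer region, which typically carries wider error intervals, so the reported CI widens accordingly.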
As an example, in this situation the confidence interval tells us that the true value of the predicted property lies within the range 16.196×10⁻⁶/°C to 16.522×10⁻⁶/°C with 80% probability (confidence level). The higher the confidence level, the wider the confidence interval.