We know how to learn and predict with AiSara. A natural follow-up question is: “How do I verify my model with AiSara? How do I know whether the prediction is reliable and accurate?” This article sheds some light on the matter, and we hope it helps you validate your model with AiSara.
Before we begin, I will briefly explain the workflow. These are the steps we will discuss:
1. Prepare your training and testing datasets.
2. Use AiSara to learn from the training data, then predict with the test data.
3. Calculate R-squared (R2), the square of Pearson’s correlation coefficient, between the true values and the values predicted by AiSara.
4. Repeat steps 1 to 3 with a larger training dataset.
5. Compare the R2 results of the two AiSara models.
We have automated this for you with the Blind Test function in our Excel, which we will talk more about at the end.
We will be using a sample oil and gas history matching dataset with 6 input variables. The goal is to find the minimum of the target value, called hm_error in the dataset, which represents the history matching error. The minimum error corresponds to the best possible solution. In case you are wondering where the dataset comes from, it is the product of hours of simulation that produced the results as shown. Now let us proceed to the first step:
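The workflow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: AiSara's own interface is not shown in this article, so an ordinary least-squares fit from NumPy stands in for the model, the data is synthetic, and the helper names (fit_linear, predict_linear, r2_for_split) are hypothetical.

```python
import numpy as np

def rsq(y_true, y_pred):
    """Square of the Pearson correlation between true and predicted values."""
    return np.corrcoef(y_true, y_pred)[0, 1] ** 2

def fit_linear(X, y):
    """Stand-in model (NOT AiSara): least-squares fit with an intercept."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Predict with the coefficients returned by fit_linear."""
    return np.column_stack([X, np.ones(len(X))]) @ coef

def r2_for_split(X, y, train_fraction, seed=0):
    """Steps 1 to 3: split the rows, learn on the training part, score on the rest."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_train = int(round(train_fraction * len(X)))
    train, test = order[:n_train], order[n_train:]
    coef = fit_linear(X[train], y[train])
    return rsq(y[test], predict_linear(coef, X[test]))

# Synthetic placeholder data: 6 inputs and 1 output (not the real dataset).
rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 6))
y = X @ np.array([3.0, -2.0, 1.0, 0.5, 0.0, 4.0]) + rng.normal(scale=0.1, size=200)

# Steps 4 and 5: repeat with a larger training set and compare the R2 values.
results = {f: r2_for_split(X, y, f) for f in (0.30, 0.50)}
for fraction, score in results.items():
    print(f"train fraction {fraction:.0%}: R2 = {score:.4f}")
```

Swapping the stand-in model for another learner only requires replacing fit_linear and predict_linear; the split and the scoring stay the same.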
Let’s set aside some dataset for training and testing
What is a training dataset? A set of data that is used for learning.
What is a testing dataset? A set of data, independent of the training dataset, that is used to assess the performance of the trained model.
First, we set aside 30% of the data for training and 70% of the data for testing. There are 6 input variables, as shown above, and 1 output variable, which is hm_error.
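For readers who prepare their data outside Excel, a 30/70 split might look like the sketch below. It assumes the dataset is already loaded into NumPy arrays; the random values here are placeholders, not the history matching data, and split_train_test is a hypothetical helper name.

```python
import numpy as np

def split_train_test(X, y, train_fraction, seed=0):
    """Shuffle the rows, then return (X_train, y_train, X_test, y_test)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_train = int(round(train_fraction * len(X)))
    train_idx, test_idx = order[:n_train], order[n_train:]
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]

# Placeholder table: 100 rows, 6 input columns, 1 output column.
rng = np.random.default_rng(42)
X = rng.uniform(size=(100, 6))
y = rng.uniform(size=100)

X_train, y_train, X_test, y_test = split_train_test(X, y, train_fraction=0.30)
print(X_train.shape, X_test.shape)
```

Shuffling before splitting keeps the two sets independent of any ordering in the original table, and fixing the seed makes the split repeatable.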
Then AiSara learns the relationship between the 6 inputs and hm_error from the training data, and we predict hm_error for the testing data.
Once we have predicted the testing data with our trained model, we verify the model using the Pearson correlation, which you can learn more about here. We can simply use the RSQ function in Excel, which returns the square of the Pearson correlation coefficient, to find our R2. For this walkthrough, the formula we used is “=RSQ(True Output, Predicted Output)”, and the result is an R2 of 0.0780. Such a low R2 shows that there is almost no linear correlation between the true output values and the predicted output values, which means the model is weak.
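Outside Excel, the same quantity can be computed as the squared Pearson correlation coefficient. The sketch below uses NumPy's corrcoef; the function name rsq and the sample values are illustrative only and are not taken from the dataset.

```python
import numpy as np

def rsq(y_true, y_pred):
    """Square of the Pearson correlation coefficient between two series."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    r = np.corrcoef(y_true, y_pred)[0, 1]  # off-diagonal entry is Pearson's r
    return r ** 2

# Illustrative values: the predictions are offset and scaled, yet perfectly linear.
true_output = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted_output = [12.0, 14.0, 16.0, 18.0, 20.0]
print(rsq(true_output, predicted_output))
```

Because this R2 measures linear association only, the example scores essentially 1 even though every prediction is far from the true value. It can be worth checking an error measure such as the mean absolute error alongside R2.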
In summary, the closer R2 is to 1, the stronger the correlation between AiSara’s predictions and the true history match error.
Great, we have our first test. Next, we repeat the same process using 50% of the data for training and 50% for testing. Let’s cut to the chase; here are the results: