Step one in building a machine-learning model is to review the features (fields) available in the data and to develop an optimal feature subset selection (OSS), which will be used to build the model.
The dataset contained a month’s worth of data (472 records) and included thirty-five features, or variables. 472 records are not enough to build a comprehensive machine-learning model, but they are enough for us to build a working prototype.
Features are also sometimes referred to as “variables” or “attributes” and represent columns in the tabular data or CSV files. Each feature, or column, represents a measurable piece of data that can be used for analysis.
There is no right or wrong answer when selecting features. Selecting features is a trial-and-error process whose goal is to obtain the most accurate prediction.
We eliminated features that were measured before or after the procedure, such as post-anesthesia in and out times, because those features would be unknown at the time of scheduling and do not impact operating room time. I used fifteen features to predict the actual duration of the procedure.
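The filtering step can be sketched in a few lines of Swift. This is only an illustration: apart from `actualDuration`, which the playground code uses as the target, the column names below are hypothetical, since the dataset's actual field names are not listed here.

```swift
import Foundation

// Hypothetical column names for illustration; the real dataset's fields may differ.
let allColumns = [
    "surgeon", "procedure", "anesthesiaType", "room",
    "scheduledStart", "postAnesthesiaIn", "postAnesthesiaOut", "actualDuration"
]

// Columns that would be unknown at scheduling time (assumed names).
let unknownAtScheduling: Set<String> = ["postAnesthesiaIn", "postAnesthesiaOut"]

// The column the model is trained to predict.
let targetColumn = "actualDuration"

// Keep every column that is known at scheduling time and is not the target.
let featureColumns = allColumns.filter { column in
    !unknownAtScheduling.contains(column) && column != targetColumn
}

print(featureColumns)
```

An array like `featureColumns` could be passed to the classifier's `featureColumns` argument in place of `nil` to restrict training to the chosen subset.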
The model was built in a Swift playground on a Mac using the MLBoostedTreeClassifier from Apple’s Create ML framework. This classifier is Apple’s counterpart to XGBoost in Python or LSBoost in MATLAB. In addition to gradient-boosted decision trees, I also tried logistic regression, linear regression, and random forest. I researched typical parameters for MLBoostedTreeClassifier and tweaked them (mainly max iterations) to improve the model’s validation accuracy and to minimize training log loss and validation log loss.
//Define the model parameters
let boostedTreeModelParameters = MLBoostedTreeClassifier.ModelParameters(
    validation: .split(strategy: .automatic),
    maxDepth: 6,
    maxIterations: 100,
    minLossReduction: 0.0,
    minChildWeight: 0.1,
    randomSeed: 42,
    stepSize: 0.3,
    earlyStoppingRounds: 200,
    rowSubsample: 1.0,
    columnSubsample: 1.0
)
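For reference, log loss is the average negative log of the probability the model assigned to the correct class, so lower is better. The sketch below is a minimal illustration with a hypothetical helper named `logLoss` and made-up probabilities; it is not how Create ML computes the metric internally.

```swift
import Foundation

// Hypothetical helper: mean negative log-probability assigned to the true class.
// `probabilities` holds, for each example, the predicted probability of the correct class.
func logLoss(_ probabilities: [Double]) -> Double {
    guard !probabilities.isEmpty else { return 0.0 }
    // Clamp to avoid log(0) when a model assigns zero probability.
    let epsilon = 1e-15
    let total = probabilities.reduce(0.0) { sum, p in
        sum - log(min(max(p, epsilon), 1.0))
    }
    return total / Double(probabilities.count)
}

// Confident, correct predictions give a loss near zero.
print(logLoss([0.99, 0.98, 0.97]))

// A uniform guess over four classes gives ln(4).
print(logLoss([0.25, 0.25]))
```

A useful reference point is that a uniform guess over N classes yields a log loss of ln(N), so values near that level indicate the model is doing little better than chance.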
Create a Swift Playground in Xcode on your Mac, then copy the code below and paste it into the playground.
//
// ORSchedule.playground
//
import Cocoa
import CoreML
import CreateML
//Define Paths
let dataPath = "/Users/jburke/Developer/Machine Learning/OR Validate Booking/Data/ORValidate 3.0/"
let modelPath = "/Users/jburke/Developer/Machine Learning/OR Validate Booking/Model Files/"
//Define filenames
let filename = "ORSchedule"
let csvFilename = filename + ".csv"
let boostedTreeModelFilename = "boostedTree" + filename + ".mlmodel"
let trainingCSV = URL(fileURLWithPath: dataPath + csvFilename)
let ORTrainingData = try MLDataTable(contentsOf: trainingCSV)
let (trainingData, testData) = ORTrainingData.randomSplit(by: 0.8, seed: 0)
//Define the model parameters
let boostedTreeModelParameters = MLBoostedTreeClassifier.ModelParameters(
    validation: .split(strategy: .automatic),
    maxDepth: 6,
    maxIterations: 100,
    minLossReduction: 0.0,
    minChildWeight: 0.1,
    randomSeed: 42,
    stepSize: 0.3,
    earlyStoppingRounds: 200,
    rowSubsample: 1.0,
    columnSubsample: 1.0
)
//Create the Model - The targetColumn "actualDuration"
//Boosted Tree Classifier
let modelFilename = boostedTreeModelFilename
let ORScheduleModel = try MLBoostedTreeClassifier(
    trainingData: trainingData,
    targetColumn: "actualDuration",
    featureColumns: nil,
    parameters: boostedTreeModelParameters
)
//Evaluate Model
let evaluationMetrics = ORScheduleModel.evaluation(on: testData)
let trainingMetrics = ORScheduleModel.trainingMetrics
let validationMetrics = ORScheduleModel.validationMetrics
//Save the model in the ML Model directory
let outputURL = URL(fileURLWithPath: modelPath + modelFilename)
let modelMetadata = MLModelMetadata(
    author: "John Burke",
    shortDescription: modelFilename + " From " + dataPath + csvFilename,
    license: nil,
    version: "3.0",
    additional: nil
)
try ORScheduleModel.write(to: outputURL, metadata: modelMetadata)
Here is the output from building a boosted tree machine-learning model.
column_type_hints = {}
Finished parsing file /Users/jburke/Developer/Machine Learning/OR Validate Booking/Data/ORValidate 3.0/ORSchedule.csv
Parsing completed. Parsed 100 lines in 0.02756 secs.
Finished parsing file /Users/jburke/Developer/Machine Learning/OR Validate Booking/Data/ORValidate 3.0/ORSchedule.csv
Parsing completed. Parsed 472 lines in 0.006072 secs.
Using 16 features to train a model to predict actualDuration.
Boosted trees classifier:
--------------------------------------------------------
Number of examples : 343
Number of classes : 153
Number of feature columns : 16
Number of unpacked features : 16
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+
| Iteration | Elapsed Time | Training Accuracy | Validation Accuracy | Training Log Loss | Validation Log Loss |
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+
| 1 | 0.263568 | 0.172012 | 0.000000 | 4.314401 | 4.975359 |
| 2 | 0.535498 | 0.387755 | 0.038462 | 3.730001 | 4.909915 |
| 3 | 0.822292 | 0.539359 | 0.038462 | 3.215479 | 4.842813 |
| 4 | 1.116074 | 0.647230 | 0.038462 | 2.775416 | 4.830961 |
| 5 | 1.412139 | 0.763848 | 0.038462 | 2.391763 | 4.858502 |
| 10 | 2.815473 | 0.985423 | 0.076923 | 1.142246 | 4.982971 |
| 25 | 6.559168 | 1.000000 | 0.076923 | 0.200959 | 5.452729 |
| 50 | 12.441487 | 1.000000 | 0.038462 | 0.060751 | 5.819847 |
| 75 | 17.456728 | 1.000000 | 0.038462 | 0.040338 | 5.968743 |
| 100 | 21.466581 | 1.000000 | 0.038462 | 0.035923 | 6.017910 |
+-----------+--------------+-------------------+---------------------+-------------------+---------------------+
Trained model successfully saved at /Users/jburke/Developer/Machine Learning/OR Validate Booking/Model Files/boostedTreeORSchedule.mlmodel.
The model was built using a randomized 80/20 split of the data: 80% was used for training, and 20% was held out as test data. Within the training portion, Create ML also set aside a small validation set automatically, which is where the validation columns in the log come from. The model reached 100% training accuracy after about 25 iterations using the boosted trees algorithm. Training accuracy only measures how well the model fits data it has already seen, so reaching 100% does not mean there will be no fluctuation in the confidence level of the model’s predictions. In the log above, validation accuracy never rose above about 7.7%, and validation log loss climbed after iteration 4 while training log loss kept falling. The real test is using data the model has not seen before.
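One way to read the training log is to look for the iteration where validation log loss bottoms out and to compare training and validation accuracy at the end of the run. The sketch below does this with a hypothetical `LogRow` type and values copied from the output table above; it is an illustration, not part of the original playground.

```swift
import Foundation

// Hypothetical type holding one row of the training log shown above.
struct LogRow {
    let iteration: Int
    let trainingAccuracy: Double
    let validationAccuracy: Double
    let validationLogLoss: Double
}

// Values copied from the boosted trees output table.
let trainingLog = [
    LogRow(iteration: 1, trainingAccuracy: 0.172012, validationAccuracy: 0.000000, validationLogLoss: 4.975359),
    LogRow(iteration: 2, trainingAccuracy: 0.387755, validationAccuracy: 0.038462, validationLogLoss: 4.909915),
    LogRow(iteration: 3, trainingAccuracy: 0.539359, validationAccuracy: 0.038462, validationLogLoss: 4.842813),
    LogRow(iteration: 4, trainingAccuracy: 0.647230, validationAccuracy: 0.038462, validationLogLoss: 4.830961),
    LogRow(iteration: 5, trainingAccuracy: 0.763848, validationAccuracy: 0.038462, validationLogLoss: 4.858502),
    LogRow(iteration: 10, trainingAccuracy: 0.985423, validationAccuracy: 0.076923, validationLogLoss: 4.982971),
    LogRow(iteration: 25, trainingAccuracy: 1.000000, validationAccuracy: 0.076923, validationLogLoss: 5.452729),
    LogRow(iteration: 50, trainingAccuracy: 1.000000, validationAccuracy: 0.038462, validationLogLoss: 5.819847),
    LogRow(iteration: 75, trainingAccuracy: 1.000000, validationAccuracy: 0.038462, validationLogLoss: 5.968743),
    LogRow(iteration: 100, trainingAccuracy: 1.000000, validationAccuracy: 0.038462, validationLogLoss: 6.017910)
]

// Iteration with the lowest validation log loss among the logged rows.
let bestRow = trainingLog.min { $0.validationLogLoss < $1.validationLogLoss }!

// Gap between training and validation accuracy at the final iteration.
let finalRow = trainingLog.last!
let accuracyGap = finalRow.trainingAccuracy - finalRow.validationAccuracy

print("Lowest validation log loss at iteration \(bestRow.iteration)")
print("Final training/validation accuracy gap: \(accuracyGap)")
```

A large gap of this kind is a common sign that a model fits its training rows far better than unseen rows, which is expected with only a few hundred records spread over 153 classes.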
This machine-learning model is ready to use in an app.
I built a single view iOS application to test the model using data it had not seen. The following video demonstrates the application.
We selected Dr. Mayfair, a thoracic surgeon, to demonstrate the surgical booking machine-learning model. It is a planned outpatient procedure scheduled in room 4. He is using general anesthesia to perform a bronchoscopy scheduled for 8:15 a.m. on July 14th, and we are booking it for an hour.
The model confirmed the booking for an hour and predicted the procedure should take 47 minutes (00:47). However, the model confidence is only 51.1%.
The model confidence is 51.1%, which is under 80%, so the confidence field is highlighted in yellow as a caution.
Dr. Syphax, an orthopedic surgeon, is planning to perform an arthroscopic shoulder labral repair under general anesthesia in OR 6.
The procedure was scheduled for an hour.
The model predicted more time was needed to accommodate this procedure and highlighted the time field in red.
The model confidence is 61.7%, which is under 80%, so the confidence field is highlighted in yellow as a caution.
We revised the booking and increased the block time from 01:00 to 03:15. The model now considers this a valid surgical booking and highlights the time field in green.
The model confidence increased to 69.1%, which is still under 80%, so the confidence field remains highlighted in yellow as a caution.
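The highlighting behavior described in these examples can be sketched as two small rules. The code below is an assumption-based illustration, not the app's actual source: the names `Highlight`, `minutes(from:)`, `timeHighlight`, and `confidenceHighlight` are hypothetical, the rule that a booking is valid when the booked time is at least the predicted time is inferred from the examples, and the green result for confidence at or above 80% is also assumed.

```swift
import Foundation

// Hypothetical highlight states for the time and confidence fields.
enum Highlight { case green, yellow, red }

// Convert an "HH:mm" string into total minutes; returns nil for malformed input.
func minutes(from text: String) -> Int? {
    let parts = text.split(separator: ":")
    guard parts.count == 2,
          let hours = Int(parts[0]),
          let mins = Int(parts[1]),
          hours >= 0,
          (0..<60).contains(mins) else { return nil }
    return hours * 60 + mins
}

// Assumed rule: the booking is valid (green) when the booked block covers
// the predicted duration, and needs more time (red) otherwise.
func timeHighlight(bookedMinutes: Int, predictedMinutes: Int) -> Highlight {
    return bookedMinutes >= predictedMinutes ? .green : .red
}

// Confidence under the threshold is flagged in yellow as a caution.
// Treating confidence at or above the threshold as green is an assumption.
func confidenceHighlight(_ confidence: Double, threshold: Double = 0.80) -> Highlight {
    return confidence < threshold ? .yellow : .green
}

// Dr. Mayfair's booking: one hour booked, 00:47 predicted, 51.1% confidence.
if let booked = minutes(from: "01:00"), let predicted = minutes(from: "00:47") {
    print(timeHighlight(bookedMinutes: booked, predictedMinutes: predicted))
    print(confidenceHighlight(0.511))
}
```

Keeping the threshold as a parameter makes it easy to experiment with a cutoff other than 80% once the model is trained on more data.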