4  Results

We summarize the cross-validated per-class performance of each workflow on the training set. The metrics of interest are the F1-score, balanced accuracy, and kappa. Workflows are ordered, for each metric, by their mean estimates across the outer folds of the nested CV.
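
To make the summarization concrete, below is a minimal sketch of averaging per-class metrics over the outer folds; the DataFrame `preds` and its columns (workflow, fold, truth, pred) are hypothetical names, not the actual analysis objects.

```python
# A minimal sketch, assuming held-out predictions from every outer fold are
# collected in a tidy DataFrame `preds` with columns workflow, fold, truth,
# and pred (hypothetical names).
import pandas as pd
from sklearn.metrics import balanced_accuracy_score, cohen_kappa_score, f1_score

def per_class_metrics(df, positive):
    """One-vs-rest metrics for one histotype within a single outer fold."""
    y_true = (df["truth"] == positive).astype(int)
    y_pred = (df["pred"] == positive).astype(int)
    return {"f1": f1_score(y_true, y_pred, zero_division=0),
            "bal_acc": balanced_accuracy_score(y_true, y_pred),
            "kappa": cohen_kappa_score(y_true, y_pred)}

def summarize(preds, classes=("HGSOC", "CCOC", "ENOC", "MUOC", "LGSOC")):
    rows = [{"workflow": wf, "fold": fold, "class": cls,
             **per_class_metrics(df, cls)}
            for (wf, fold), df in preds.groupby(["workflow", "fold"])
            for cls in classes]
    # Mean over the outer folds; workflows are then ordered by these estimates.
    return (pd.DataFrame(rows)
              .groupby(["workflow", "class"])[["f1", "bal_acc", "kappa"]]
              .mean()
              .reset_index())
```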

4.1 Training Set

4.2 Rank Aggregation

Multi-step methods:

  • sequential: a cascade of binary classifiers; the subsampling method and algorithm used at each step are (see the sketch after this list):
    • HGSOC vs. non-HGSOC using upsampling and XGBoost
    • CCOC vs. non-CCOC using no subsampling and random forest
    • ENOC vs. non-ENOC using no subsampling and support vector machine
    • MUOC vs. LGSOC using SMOTE subsampling and random forest
  • two_step: a two-step algorithm; the subsampling method and algorithm used at each step are:
    • HGSOC vs. non-HGSOC using upsampling and XGBoost
    • CCOC vs. ENOC vs. MUOC vs. LGSOC using SMOTE subsampling and support vector machine
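
A minimal sketch of how such a cascade makes predictions is given below, assuming four fitted binary classifiers (the argument names are hypothetical); the training side, including each step's subsampling, is omitted.

```python
# A minimal sketch of prediction with the sequential cascade. Each clf_* is a
# fitted binary classifier (hypothetical names); X is a NumPy feature matrix.
import numpy as np

def predict_sequential(X, clf_hgsoc, clf_ccoc, clf_enoc, clf_muoc_lgsoc):
    labels = np.empty(len(X), dtype=object)
    remaining = np.arange(len(X))              # row indices still unlabelled
    one_vs_rest = [("HGSOC", clf_hgsoc), ("CCOC", clf_ccoc), ("ENOC", clf_enoc)]
    for name, clf in one_vs_rest:
        if remaining.size == 0:
            break
        hit = clf.predict(X[remaining]) == 1   # 1 = predicted as `name`
        labels[remaining[hit]] = name          # cases labelled here drop out
        remaining = remaining[~hit]
    if remaining.size > 0:                     # final pairwise step
        pair = clf_muoc_lgsoc.predict(X[remaining])
        labels[remaining] = np.where(pair == 1, "MUOC", "LGSOC")
    return labels
```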

We conduct rank aggregation using a two-stage nested approach (a code sketch follows the list):

  1. First, we rank-aggregate the per-class lists separately for each of the F1-score, balanced accuracy, and kappa.
  2. We then take the aggregated lists from the three metrics and perform a final rank aggregation.
  3. The top workflows from the final rank aggregation are used for gene optimization in the confirmation set.
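
As a concrete stand-in for both aggregation stages, the sketch below uses a simple Borda (mean-rank) scheme; the actual rank-aggregation algorithm used in the analysis may differ, and all names are illustrative.

```python
# A minimal sketch of the two-stage aggregation using Borda (mean-rank)
# counting as a stand-in. Each input list is ordered best-first.
def borda_aggregate(ranked_lists):
    """Order items by mean position across lists; items absent from a
    list are penalised with rank len(list) + 1."""
    items = {w for lst in ranked_lists for w in lst}
    def mean_rank(w):
        return sum(lst.index(w) + 1 if w in lst else len(lst) + 1
                   for lst in ranked_lists) / len(ranked_lists)
    return sorted(items, key=mean_rank)

def two_stage(per_class_lists_by_metric):
    # Stage 1: one aggregated list per metric from its five per-class lists.
    stage1 = [borda_aggregate(lists)
              for lists in per_class_lists_by_metric.values()]
    # Stage 2: final aggregation across the three per-metric lists.
    return borda_aggregate(stage1)
```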

4.2.1 Across Classes

4.2.2 Across Metrics

Table 4.10: Rank Aggregation Comparison of Metrics Used
Rank F1 Balanced Accuracy Kappa
1 sequential sequential sequential
2 two_step two_step two_step
3 up_xgb up_mr up_xgb
4 up_rf smote_mr up_rf
5 smote_svm up_xgb smote_svm
6 hybrid_svm smote_xgb hybrid_rf
7 smote_xgb hybrid_mr smote_xgb
8 hybrid_rf hybrid_xgb hybrid_svm
9 up_svm down_xgb up_svm
10 hybrid_xgb smote_svm hybrid_xgb
11 none_rf hybrid_rf smote_rf
12 smote_mr down_mr smote_mr
13 smote_rf hybrid_svm none_svm
14 hybrid_mr up_rf none_rf
15 up_mr smote_rf hybrid_mr
16 down_mr down_rf up_mr
17 down_rf none_svm down_mr
18 down_svm down_svm down_rf
19 down_xgb none_rf down_svm
20 NA up_svm down_xgb
21 NA none_mr none_mr
22 NA none_xgb none_xgb
Table 4.11: Top 5 Workflows from Final Rank Aggregation
Rank Workflow
1 sequential
2 two_step
3 up_xgb
4 up_rf
5 smote_svm

4.2.3 Top Workflows

We examine the per-class evaluation metrics of the top 5 workflows.

Table 4.12: Top Workflow Per-Class Evaluation Metrics
Histotypes
Metric Workflow HGSOC CCOC ENOC MUOC LGSOC
Accuracy sequential 0.963 (0.956, 0.968) 0.917 (0.896, 0.958) 0.856 (0.774, 0.909) 0.951 (0.882, 1) 0.951 (0.882, 1)
2-STEP 0.963 (0.956, 0.968) 0.934 (0.896, 0.958) 0.839 (0.729, 0.896) 0.892 (0.812, 0.96) 0.971 (0.938, 1)
Up-XGB 0.96 (0.944, 0.976) 0.985 (0.968, 0.996) 0.963 (0.956, 0.976) 0.977 (0.968, 0.988) 0.981 (0.976, 0.988)
Up-RF 0.955 (0.92, 0.976) 0.983 (0.972, 0.992) 0.959 (0.948, 0.972) 0.979 (0.964, 0.988) 0.982 (0.968, 0.988)
SMOTE-SVM 0.95 (0.936, 0.968) 0.979 (0.968, 0.988) 0.954 (0.94, 0.964) 0.978 (0.96, 0.984) 0.981 (0.972, 0.984)
Sensitivity sequential 0.978 (0.966, 0.99) 0.814 (0.75, 0.882) 0.865 (0.812, 0.941) 0.965 (0.909, 1) 0.92 (0.6, 1)
2-STEP 0.978 (0.966, 0.99) 0.866 (0.824, 0.933) 0.735 (0.556, 0.824) 0.842 (0.75, 0.923) 0.767 (0, 1)
Up-XGB 0.981 (0.976, 0.986) 0.817 (0.571, 0.941) 0.679 (0.444, 0.818) 0.8 (0.538, 1) 0.35 (0.25, 0.667)
Up-RF 0.988 (0.979, 0.995) 0.793 (0.571, 0.882) 0.635 (0.444, 0.812) 0.766 (0.462, 0.909) 0.183 (0, 0.333)
SMOTE-SVM 0.966 (0.955, 0.977) 0.757 (0.571, 0.85) 0.681 (0.5, 0.864) 0.748 (0.538, 0.909) 0.642 (0.5, 0.75)
Specificity sequential 0.897 (0.875, 0.918) 0.969 (0.938, 1) 0.847 (0.733, 0.875) 0.92 (0.6, 1) 0.965 (0.909, 1)
2-STEP 0.897 (0.875, 0.918) 0.969 (0.935, 1) 0.893 (0.833, 0.935) 0.908 (0.833, 0.973) 0.977 (0.953, 1)
Up-XGB 0.875 (0.826, 0.919) 0.996 (0.992, 1) 0.981 (0.97, 0.991) 0.986 (0.979, 0.992) 0.993 (0.984, 1)
Up-RF 0.822 (0.738, 0.892) 0.996 (0.992, 1) 0.98 (0.966, 0.987) 0.989 (0.983, 0.996) 0.997 (0.988, 1)
SMOTE-SVM 0.88 (0.804, 0.919) 0.993 (0.987, 1) 0.971 (0.962, 0.975) 0.989 (0.983, 0.996) 0.987 (0.976, 0.996)
F1-Score sequential 0.977 (0.973, 0.98) 0.868 (0.828, 0.933) 0.86 (0.788, 0.914) 0.966 (0.923, 1) 0.91 (0.75, 1)
2-STEP 0.977 (0.973, 0.98) 0.899 (0.848, 0.933) 0.755 (0.606, 0.848) 0.788 (0.667, 0.923) 0.733 (0, 1)
Up-XGB 0.975 (0.964, 0.986) 0.862 (0.667, 0.97) 0.685 (0.471, 0.857) 0.753 (0.636, 0.87) 0.368 (0.25, 0.571)
Up-RF 0.972 (0.949, 0.986) 0.849 (0.696, 0.938) 0.652 (0.471, 0.788) 0.757 (0.571, 0.87) 0.362 (0.286, 0.4)
SMOTE-SVM 0.969 (0.96, 0.981) 0.811 (0.667, 0.919) 0.638 (0.5, 0.809) 0.752 (0.583, 0.833) 0.524 (0.364, 0.714)
Balanced Accuracy sequential 0.938 (0.928, 0.947) 0.892 (0.859, 0.938) 0.856 (0.773, 0.908) 0.943 (0.8, 1) 0.943 (0.8, 1)
2-STEP 0.938 (0.928, 0.947) 0.917 (0.88, 0.952) 0.814 (0.694, 0.88) 0.875 (0.792, 0.948) 0.872 (0.479, 1)
Up-XGB 0.928 (0.901, 0.952) 0.906 (0.781, 0.971) 0.83 (0.714, 0.905) 0.893 (0.765, 0.99) 0.671 (0.621, 0.829)
Up-RF 0.905 (0.858, 0.941) 0.894 (0.784, 0.941) 0.808 (0.714, 0.898) 0.878 (0.727, 0.95) 0.59 (0.5, 0.665)
SMOTE-SVM 0.923 (0.885, 0.948) 0.875 (0.781, 0.925) 0.826 (0.735, 0.919) 0.869 (0.761, 0.948) 0.814 (0.746, 0.869)
Kappa sequential 0.879 (0.862, 0.895) 0.808 (0.754, 0.903) 0.712 (0.547, 0.818) 0.877 (0.679, 1) 0.877 (0.679, 1)
2-STEP 0.879 (0.862, 0.895) 0.85 (0.769, 0.903) 0.635 (0.402, 0.769) 0.716 (0.538, 0.896) 0.718 (-0.029, 1)
Up-XGB 0.871 (0.822, 0.905) 0.854 (0.65, 0.968) 0.666 (0.452, 0.844) 0.741 (0.62, 0.863) 0.36 (0.239, 0.566)
Up-RF 0.85 (0.768, 0.903) 0.84 (0.682, 0.933) 0.63 (0.452, 0.773) 0.746 (0.554, 0.863) 0.213 (0, 0.396)
SMOTE-SVM 0.839 (0.783, 0.876) 0.8 (0.65, 0.913) 0.614 (0.48, 0.789) 0.741 (0.563, 0.825) 0.516 (0.352, 0.706)
Figure 4.7: Top 5 Workflow Per-Class Evaluation Metrics by Metric
Table 4.13: Top Workflow Per-Class Evaluation Metrics and Ranks
Workflow Rank HGSOC CCOC ENOC MUOC LGSOC
F1-Score
sequential 1 0.977 0.868 0.860 0.966 0.910
2-STEP 2 0.977 0.899 0.755 0.788 0.733
Up-XGB 3 0.975 0.862 0.685 0.753 0.368
Up-RF 4 0.972 0.849 0.652 0.757 0.362
SMOTE-SVM 5 0.969 0.811 0.638 0.752 0.524
Balanced Accuracy
sequential 1 0.938 0.892 0.856 0.943 0.943
2-STEP 2 0.938 0.917 0.814 0.875 0.872
Up-XGB 5 0.928 0.906 0.830 0.893 0.671
SMOTE-SVM 10 0.923 0.875 0.826 0.869 0.814
Up-RF 14 0.905 0.894 0.808 0.878 0.590
Kappa
sequential 1 0.879 0.808 0.712 0.877 0.877
2-STEP 2 0.879 0.850 0.635 0.716 0.718
Up-XGB 3 0.871 0.854 0.666 0.741 0.360
Up-RF 4 0.850 0.840 0.630 0.746 0.213
SMOTE-SVM 5 0.839 0.800 0.614 0.741 0.516
Figure 4.8: Top 5 Workflow Per-Class Evaluation Metrics by Metric

Cases misclassified at an earlier step of the classifier sequence are not carried into the training-set CV folds of subsequent steps. Consequently, the held-out predictions from the sequential and two-step algorithms cannot be pieced together to obtain overall metrics.

4.3 Confirmation Set

We next examine how our best five workflows perform on the confirmation set, using the class-specific F1-scores. The top-performing methods will be carried forward to gene optimization.
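
A minimal sketch of this scoring step follows; `model`, `X_confirm`, and `y_confirm` are hypothetical names for one fitted workflow and the confirmation-set data.

```python
# A minimal sketch: class-specific F1-scores for one fitted workflow on the
# confirmation set (all argument names are hypothetical).
from sklearn.metrics import f1_score

CLASSES = ["HGSOC", "CCOC", "ENOC", "MUOC", "LGSOC"]

def confirmation_f1(model, X_confirm, y_confirm, classes=CLASSES):
    scores = f1_score(y_confirm, model.predict(X_confirm),
                      labels=classes, average=None)
    return dict(zip(classes, scores))
```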

Table 4.14: Evaluation Metrics on Confirmation Set Models
Histotypes
Method Metric Overall HGSOC CCOC ENOC MUOC LGSOC
Sequential Accuracy 0.830 0.860 0.977 0.882 0.969 0.974
Sensitivity 0.592 0.950 0.847 0.486 0.593 0.083
Specificity 0.923 0.683 0.993 0.961 0.985 0.990
F1-Score 0.618 0.900 0.891 0.578 0.615 0.105
Balanced Accuracy 0.757 0.817 0.920 0.723 0.789 0.537
Kappa 0.648 0.670 0.877 0.512 0.599 0.093
2-STEP Accuracy 0.833 0.860 0.970 0.889 0.978 0.969
Sensitivity 0.608 0.950 0.861 0.477 0.667 0.083
Specificity 0.923 0.683 0.984 0.972 0.992 0.986
F1-Score 0.633 0.900 0.867 0.590 0.720 0.091
Balanced Accuracy 0.766 0.817 0.923 0.724 0.829 0.535
Kappa 0.655 0.670 0.850 0.530 0.709 0.075
Up-XGB Accuracy 0.844 0.868 0.970 0.896 0.980 0.975
Sensitivity 0.658 0.958 0.875 0.467 0.741 0.250
Specificity 0.927 0.693 0.982 0.981 0.990 0.989
F1-Score 0.680 0.905 0.869 0.599 0.755 0.273
Balanced Accuracy 0.793 0.825 0.929 0.724 0.865 0.619
Kappa 0.678 0.688 0.852 0.544 0.744 0.260
Up-RF Accuracy 0.835 0.857 0.975 0.883 0.974 0.981
Sensitivity 0.613 0.972 0.875 0.383 0.667 0.167
Specificity 0.918 0.633 0.988 0.983 0.987 0.997
F1-Score 0.648 0.900 0.887 0.522 0.679 0.250
Balanced Accuracy 0.765 0.802 0.931 0.683 0.827 0.582
Kappa 0.646 0.654 0.873 0.466 0.665 0.243
SMOTE-SVM Accuracy 0.827 0.866 0.958 0.888 0.972 0.970
Sensitivity 0.650 0.939 0.861 0.477 0.556 0.417
Specificity 0.927 0.725 0.970 0.970 0.990 0.981
F1-Score 0.656 0.902 0.821 0.586 0.625 0.345
Balanced Accuracy 0.788 0.832 0.916 0.723 0.773 0.699
Kappa 0.651 0.690 0.797 0.525 0.611 0.330
Figure 4.9: Evaluation Metrics on Confirmation Set Models
Figure 4.10: Entropy vs. Predicted Probability in Confirmation Set
Figure 4.11: Gene Optimized Workflows Per-Class Metrics in Confirmation Set
Figure 4.12: Confusion Matrices for Confirmation Set Models

4.4 Gene Optimization

4.4.1 Up-XGB

Figure 4.18: Gene Optimization for Up-XGB Classifier using Balanced Accuracy

In the Up-XGB classifier, the mean balanced accuracy is highest once 12 candidate genes have been added, hence the optimal panel size is n = 28 + 12 = 40. The added genes are HNF1B, TPX2, TFF1, CYP2C18, TFF3, WT1, GPR64, KLK7, SLC3A1, IGFBP1, GAD1, and LGALS4.
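
A minimal sketch of this gene-addition curve is given below, assuming ranked candidates are appended one at a time to the 28-gene base panel and each panel is scored by cross-validated mean balanced accuracy; `make_workflow` (e.g. the upsampling + XGBoost pipeline) and the data objects are hypothetical.

```python
# A minimal sketch of the gene-addition curve. X is assumed to be a pandas
# DataFrame with one column per gene; `make_workflow` returns a fresh,
# unfitted classifier pipeline (hypothetical name).
import numpy as np
from sklearn.model_selection import cross_val_score

def optimal_added_genes(X, y, base_genes, candidates, make_workflow, cv=5):
    scores = []
    for k in range(len(candidates) + 1):       # k = number of genes added
        panel = list(base_genes) + list(candidates[:k])
        scores.append(cross_val_score(make_workflow(), X[panel], y,
                                      scoring="balanced_accuracy",
                                      cv=cv).mean())
    best_k = int(np.argmax(scores))            # e.g. 12 for Up-XGB: n = 28 + 12
    return best_k, scores
```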

Table 4.15: Gene Profile of Optimal Set in Up-XGB Workflow
Base (all 28 retained in the optimal set): COL11A1, CD74, CD2, TIMP3, LUM, CYTIP, COL3A1, THBS2, TCF7L1, HMGA2, FN1, POSTN, COL1A2, COL5A2, PDZK1IP1, FBN1, HIF1A, CXCL10, DUSP4, SOX17, MITF, CDKN3, BRCA2, CEACAM5, ANXA4, SERPINE1, CRABP2, DNAJC9
Candidates in the optimal set (ranks 1-12): HNF1B, TPX2, TFF1, CYP2C18, TFF3, WT1, GPR64, KLK7, SLC3A1, IGFBP1, GAD1, LGALS4
Remaining candidates (ranks 13-44): MET, GCNT3, FUT3, C1orf173, EGFL6, MUC5B, C10orf116, DKK4, IL6, CAPN2, KGFLP2, BRCA1, CYP4B1, IGKC, PBX1, TSPAN8, SEMA6A, SENP8, PAX8, TP53, SERPINA5, ATP5G3, CPNE8, LIN28B, STC1, EPAS1, BCL2, MAP1LC3A, SCGB1D2, ADCYAP1R1, IGJ, ZBED1

4.4.2 SMOTE-SVM

Figure 4.19: Gene Optimization for SMOTE-SVM Classifier using Balanced Accuracy

In the SMOTE-SVM classifier, the mean balanced accuracy is highest with no candidate genes added, hence the optimal panel size is n = 28 + 0 = 28 and no genes are added to the base set.

Table 4.16: Gene Profile of Optimal Set in SMOTE-SVM Workflow
Base (the full optimal set; no candidates added): COL11A1, CD74, CD2, TIMP3, LUM, CYTIP, COL3A1, THBS2, TCF7L1, HMGA2, FN1, POSTN, COL1A2, COL5A2, PDZK1IP1, FBN1, HIF1A, CXCL10, DUSP4, SOX17, MITF, CDKN3, BRCA2, CEACAM5, ANXA4, SERPINE1, CRABP2, DNAJC9

4.5 Validation Set

Table 4.17: Evaluation Metrics on Validation Set Model, Up-XGB, Full Set
Histotypes
Metric Overall HGSOC CCOC ENOC MUOC LGSOC
Accuracy 0.902 0.915 0.975 0.951 0.983 0.979
Sensitivity 0.797 0.937 1.000 0.602 0.913 0.533
Specificity 0.954 0.836 0.973 0.989 0.985 0.986
F1-Score 0.742 0.945 0.862 0.707 0.737 0.457
Balanced Accuracy 0.876 0.886 0.987 0.796 0.949 0.760
Kappa 0.743 0.756 0.849 0.681 0.729 0.447
Figure 4.20: Up-XGB, Full Set, Per-Class Metrics in Validation Set
Figure 4.21: Confusion Matrix for Validation Set Model
Figure 4.22: ROC Curves for Up-XGB, Full Set Model in Validation Set
Figure 4.23: Calibration Plots for Up-XGB, Full Set Model in Validation Set
Figure 4.24: Validation Summary
Table 4.18: Clinicopathological characteristics of correctly and incorrectly predicted ENOC cases
Characteristic Predicted ENOC Correctly (N = 53)¹ Missed ENOC (N = 35)¹ p-value²
Age at diagnosis 52 (46, 63) 57 (51, 63) 0.2
Tumour grade 0.002
    low grade 41 (93%) 18 (64%)
    high grade 3 (6.8%) 10 (36%)
    Unknown 9 7
FIGO tumour stage 0.2
    I 40 (77%) 22 (63%)
    II-IV 12 (23%) 13 (37%)
    Unknown 1 0
Race >0.9
    white 48 (92%) 28 (90%)
    non-white 4 (7.7%) 3 (9.7%)
    Unknown 1 4
ARID1A 0.4
    absent/subclonal 11 (21%) 5 (14%)
    present 42 (79%) 30 (86%)
¹ Median (Q1, Q3); n (%)
² Wilcoxon rank sum test; Pearson’s Chi-squared test; Fisher’s exact test
Figure 4.25: Volcano Plots of Validation Set Predictions
Figure 4.26: Subtype Prediction Summary among Predicted HGSC Samples