4  Results

We summarize the cross-validated per-class performance in the training set. The metrics of interest are accuracy, F1-score, and kappa. For each metric, workflows are ordered by their mean estimates across the outer folds of the nested CV.
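The ordering rule can be sketched as follows. This is a minimal illustration: the workflow names match those used later in this chapter, but the fold-level scores are invented placeholders, not the study's actual estimates.

```python
from statistics import mean

# Hypothetical outer-fold metric estimates per workflow (illustrative values only).
outer_fold_scores = {
    "up_xgb":    [0.87, 0.85, 0.88],
    "smote_svm": [0.84, 0.83, 0.85],
    "up_rf":     [0.85, 0.84, 0.86],
}

# Order workflows by their mean estimate across the outer folds, best first.
ranked = sorted(outer_fold_scores,
                key=lambda wf: mean(outer_fold_scores[wf]),
                reverse=True)
print(ranked)  # ['up_xgb', 'up_rf', 'smote_svm']
```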

4.1 Training Set

4.2 Rank Aggregation

Multi-step methods:

  • sequential: the sequential algorithm uses the following sequence of subsampling methods and algorithms:
    • HGSOC vs. non-HGSOC using upsampling and XGBoost
    • CCOC vs. non-CCOC using no subsampling and random forest
    • ENOC vs. non-ENOC using no subsampling and support vector machine
    • MUOC vs. LGSOC using SMOTE subsampling and random forest
  • two_step: the two-step algorithm uses the following sequence of subsampling methods and algorithms:
    • HGSOC vs. non-HGSOC using upsampling and XGBoost
    • CCOC vs. ENOC vs. MUOC vs. LGSOC using SMOTE subsampling and support vector machine
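The cascade logic shared by these multi-step methods can be sketched as below. The binary classifiers are toy stand-ins (simple callables keyed on one feature), not the fitted XGBoost, random forest, and SVM models described above; all thresholds are illustrative.

```python
# Each step pairs a positive label with a binary classifier; a classifier here
# is any callable returning True when the sample belongs to the positive class.
def sequential_predict(sample, steps, final_step):
    for label, is_positive in steps:
        if is_positive(sample):
            return label
        # Negative cases fall through to the next one-vs-rest classifier.
    # The last step in the actual sequential workflow is a direct
    # MUOC-vs-LGSOC decision; here final_step plays that role.
    return final_step(sample)

# Toy stand-in classifiers (illustrative only).
steps = [
    ("HGSOC", lambda s: s["x"] > 0.9),
    ("CCOC",  lambda s: s["x"] > 0.7),
    ("ENOC",  lambda s: s["x"] > 0.5),
]
final = lambda s: "MUOC" if s["x"] > 0.3 else "LGSOC"

print(sequential_predict({"x": 0.95}, steps, final))  # HGSOC
print(sequential_predict({"x": 0.2},  steps, final))  # LGSOC
```

The two-step variant follows the same pattern with a single HGSOC-vs-rest step followed by one four-class decision.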

We conduct rank aggregation using a two-stage nested approach:

  1. First, we rank aggregate the per-class metrics for F1-score, balanced accuracy, and kappa.
  2. Then, we take the aggregated lists from the three metrics and perform a final rank aggregation.
  3. The top workflows from the final rank aggregation are used for gene optimization in the confirmation set.
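One simple way to carry out each aggregation stage is mean-rank (Borda-style) aggregation, sketched below. This is an assumption for illustration, not necessarily the aggregation method used in the study, and the rank lists are invented placeholders.

```python
from statistics import mean

def aggregate(rank_lists):
    """Aggregate several rank lists by mean rank; lower mean rank is better."""
    workflows = rank_lists[0].keys()
    mean_rank = {wf: mean(rl[wf] for rl in rank_lists) for wf in workflows}
    return sorted(workflows, key=lambda wf: mean_rank[wf])

# Stage 1: hypothetical per-class ranks for one metric (illustrative values).
per_class_f1 = [
    {"sequential": 1, "two_step": 2, "up_xgb": 3},  # HGSOC ranks
    {"sequential": 2, "two_step": 3, "up_xgb": 1},  # CCOC ranks
]
print(aggregate(per_class_f1))  # ['sequential', 'up_xgb', 'two_step']
```

Stage 2 applies the same function to the three per-metric aggregated lists to produce the final ordering.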

4.2.1 Across Classes

4.2.2 Across Metrics

Table 4.10: Rank Aggregation Comparison of Metrics Used in Training Set
Rank F1 Balanced Accuracy Kappa
1 sequential sequential sequential
2 two_step two_step two_step
3 up_xgb up_mr up_xgb
4 up_rf smote_mr up_rf
5 smote_svm up_xgb smote_svm
6 hybrid_svm smote_xgb hybrid_rf
7 smote_xgb hybrid_mr smote_xgb
8 hybrid_rf hybrid_xgb hybrid_svm
9 up_svm down_xgb up_svm
10 hybrid_xgb smote_svm hybrid_xgb
11 none_rf hybrid_rf smote_rf
12 smote_mr down_mr smote_mr
13 smote_rf hybrid_svm none_svm
14 hybrid_mr up_rf none_rf
15 up_mr smote_rf hybrid_mr
16 down_mr down_rf up_mr
17 down_rf none_svm down_mr
18 down_svm down_svm down_rf
19 down_xgb none_rf down_svm
20 NA up_svm down_xgb
21 NA none_mr none_mr
22 NA none_xgb none_xgb
Table 4.11: Top 5 Workflows from Final Rank Aggregation
Rank Workflow
1 sequential
2 two_step
3 up_xgb
4 up_rf
5 smote_svm

4.2.3 Top Workflows

We look at the per-class evaluation metrics of the top 5 workflows.

Table 4.12: Top Workflow Per-Class Evaluation Metrics
Histotypes
Metric Workflow HGSOC CCOC ENOC MUOC LGSOC
Accuracy sequential 0.963 (0.956, 0.968) 0.917 (0.896, 0.958) 0.856 (0.774, 0.909) 0.951 (0.882, 1) 0.951 (0.882, 1)
2-STEP 0.963 (0.956, 0.968) 0.934 (0.896, 0.958) 0.839 (0.729, 0.896) 0.892 (0.812, 0.96) 0.971 (0.938, 1)
Up-XGB 0.96 (0.944, 0.976) 0.985 (0.968, 0.996) 0.963 (0.956, 0.976) 0.977 (0.968, 0.988) 0.981 (0.976, 0.988)
Up-RF 0.955 (0.92, 0.976) 0.983 (0.972, 0.992) 0.959 (0.948, 0.972) 0.979 (0.964, 0.988) 0.982 (0.968, 0.988)
SMOTE-SVM 0.95 (0.936, 0.968) 0.979 (0.968, 0.988) 0.954 (0.94, 0.964) 0.978 (0.96, 0.984) 0.981 (0.972, 0.984)
Sensitivity sequential 0.978 (0.966, 0.99) 0.814 (0.75, 0.882) 0.865 (0.812, 0.941) 0.965 (0.909, 1) 0.92 (0.6, 1)
2-STEP 0.978 (0.966, 0.99) 0.866 (0.824, 0.933) 0.735 (0.556, 0.824) 0.842 (0.75, 0.923) 0.767 (0, 1)
Up-XGB 0.981 (0.976, 0.986) 0.817 (0.571, 0.941) 0.679 (0.444, 0.818) 0.8 (0.538, 1) 0.35 (0.25, 0.667)
Up-RF 0.988 (0.979, 0.995) 0.793 (0.571, 0.882) 0.635 (0.444, 0.812) 0.766 (0.462, 0.909) 0.183 (0, 0.333)
SMOTE-SVM 0.966 (0.955, 0.977) 0.757 (0.571, 0.85) 0.681 (0.5, 0.864) 0.748 (0.538, 0.909) 0.642 (0.5, 0.75)
Specificity sequential 0.897 (0.875, 0.918) 0.969 (0.938, 1) 0.847 (0.733, 0.875) 0.92 (0.6, 1) 0.965 (0.909, 1)
2-STEP 0.897 (0.875, 0.918) 0.969 (0.935, 1) 0.893 (0.833, 0.935) 0.908 (0.833, 0.973) 0.977 (0.953, 1)
Up-XGB 0.875 (0.826, 0.919) 0.996 (0.992, 1) 0.981 (0.97, 0.991) 0.986 (0.979, 0.992) 0.993 (0.984, 1)
Up-RF 0.822 (0.738, 0.892) 0.996 (0.992, 1) 0.98 (0.966, 0.987) 0.989 (0.983, 0.996) 0.997 (0.988, 1)
SMOTE-SVM 0.88 (0.804, 0.919) 0.993 (0.987, 1) 0.971 (0.962, 0.975) 0.989 (0.983, 0.996) 0.987 (0.976, 0.996)
F1-Score sequential 0.977 (0.973, 0.98) 0.868 (0.828, 0.933) 0.86 (0.788, 0.914) 0.966 (0.923, 1) 0.91 (0.75, 1)
2-STEP 0.977 (0.973, 0.98) 0.899 (0.848, 0.933) 0.755 (0.606, 0.848) 0.788 (0.667, 0.923) 0.733 (0, 1)
Up-XGB 0.975 (0.964, 0.986) 0.862 (0.667, 0.97) 0.685 (0.471, 0.857) 0.753 (0.636, 0.87) 0.368 (0.25, 0.571)
Up-RF 0.972 (0.949, 0.986) 0.849 (0.696, 0.938) 0.652 (0.471, 0.788) 0.757 (0.571, 0.87) 0.362 (0.286, 0.4)
SMOTE-SVM 0.969 (0.96, 0.981) 0.811 (0.667, 0.919) 0.638 (0.5, 0.809) 0.752 (0.583, 0.833) 0.524 (0.364, 0.714)
Balanced Accuracy sequential 0.938 (0.928, 0.947) 0.892 (0.859, 0.938) 0.856 (0.773, 0.908) 0.943 (0.8, 1) 0.943 (0.8, 1)
2-STEP 0.938 (0.928, 0.947) 0.917 (0.88, 0.952) 0.814 (0.694, 0.88) 0.875 (0.792, 0.948) 0.872 (0.479, 1)
Up-XGB 0.928 (0.901, 0.952) 0.906 (0.781, 0.971) 0.83 (0.714, 0.905) 0.893 (0.765, 0.99) 0.671 (0.621, 0.829)
Up-RF 0.905 (0.858, 0.941) 0.894 (0.784, 0.941) 0.808 (0.714, 0.898) 0.878 (0.727, 0.95) 0.59 (0.5, 0.665)
SMOTE-SVM 0.923 (0.885, 0.948) 0.875 (0.781, 0.925) 0.826 (0.735, 0.919) 0.869 (0.761, 0.948) 0.814 (0.746, 0.869)
Kappa sequential 0.879 (0.862, 0.895) 0.808 (0.754, 0.903) 0.712 (0.547, 0.818) 0.877 (0.679, 1) 0.877 (0.679, 1)
2-STEP 0.879 (0.862, 0.895) 0.85 (0.769, 0.903) 0.635 (0.402, 0.769) 0.716 (0.538, 0.896) 0.718 (-0.029, 1)
Up-XGB 0.871 (0.822, 0.905) 0.854 (0.65, 0.968) 0.666 (0.452, 0.844) 0.741 (0.62, 0.863) 0.36 (0.239, 0.566)
Up-RF 0.85 (0.768, 0.903) 0.84 (0.682, 0.933) 0.63 (0.452, 0.773) 0.746 (0.554, 0.863) 0.213 (0, 0.396)
SMOTE-SVM 0.839 (0.783, 0.876) 0.8 (0.65, 0.913) 0.614 (0.48, 0.789) 0.741 (0.563, 0.825) 0.516 (0.352, 0.706)
Figure 4.7: Top 5 Workflow Per-Class Evaluation Metrics by Metric
Table 4.13: Top Workflow Per-Class Evaluation Metrics and Ranks
Workflow Rank HGSOC CCOC ENOC MUOC LGSOC
F1-Score
sequential 1 0.977 0.868 0.860 0.966 0.910
2-STEP 2 0.977 0.899 0.755 0.788 0.733
Up-XGB 3 0.975 0.862 0.685 0.753 0.368
Up-RF 4 0.972 0.849 0.652 0.757 0.362
SMOTE-SVM 5 0.969 0.811 0.638 0.752 0.524
Balanced Accuracy
sequential 1 0.938 0.892 0.856 0.943 0.943
2-STEP 2 0.938 0.917 0.814 0.875 0.872
Up-XGB 5 0.928 0.906 0.830 0.893 0.671
SMOTE-SVM 10 0.923 0.875 0.826 0.869 0.814
Up-RF 14 0.905 0.894 0.808 0.878 0.590
Kappa
sequential 1 0.879 0.808 0.712 0.877 0.877
2-STEP 2 0.879 0.850 0.635 0.716 0.718
Up-XGB 3 0.871 0.854 0.666 0.741 0.360
Up-RF 4 0.850 0.840 0.630 0.746 0.213
SMOTE-SVM 5 0.839 0.800 0.614 0.741 0.516
Figure 4.8: Top 5 Workflow Per-Class Evaluation Metrics by Metric

Cases misclassified at an earlier step of the classifier sequence are excluded from subsequent steps within the training set CV folds. Consequently, we cannot piece together the test-fold predictions from the sequential and two-step algorithms to obtain overall metrics.

4.3 Confirmation Set

We now assess how the top five workflows perform in the confirmation set, using the class-specific F1-scores. The top-performing method will be selected for gene optimization.

Table 4.14: Evaluation Metrics on Confirmation Set Models
Histotypes
Method Metric Overall HGSOC CCOC ENOC MUOC LGSOC
Sequential Accuracy 0.830 0.860 0.977 0.882 0.969 0.974
Sensitivity 0.592 0.950 0.847 0.486 0.593 0.083
Specificity 0.923 0.683 0.993 0.961 0.985 0.990
F1-Score 0.618 0.900 0.891 0.578 0.615 0.105
Balanced Accuracy 0.757 0.817 0.920 0.723 0.789 0.537
Kappa 0.648 0.670 0.877 0.512 0.599 0.093
2-STEP Accuracy 0.833 0.860 0.970 0.889 0.978 0.969
Sensitivity 0.608 0.950 0.861 0.477 0.667 0.083
Specificity 0.923 0.683 0.984 0.972 0.992 0.986
F1-Score 0.633 0.900 0.867 0.590 0.720 0.091
Balanced Accuracy 0.766 0.817 0.923 0.724 0.829 0.535
Kappa 0.655 0.670 0.850 0.530 0.709 0.075
Up-XGB Accuracy 0.844 0.868 0.970 0.896 0.980 0.975
Sensitivity 0.658 0.958 0.875 0.467 0.741 0.250
Specificity 0.927 0.693 0.982 0.981 0.990 0.989
F1-Score 0.680 0.905 0.869 0.599 0.755 0.273
Balanced Accuracy 0.793 0.825 0.929 0.724 0.865 0.619
Kappa 0.678 0.688 0.852 0.544 0.744 0.260
Up-RF Accuracy 0.835 0.857 0.975 0.883 0.974 0.981
Sensitivity 0.613 0.972 0.875 0.383 0.667 0.167
Specificity 0.918 0.633 0.988 0.983 0.987 0.997
F1-Score 0.648 0.900 0.887 0.522 0.679 0.250
Balanced Accuracy 0.765 0.802 0.931 0.683 0.827 0.582
Kappa 0.646 0.654 0.873 0.466 0.665 0.243
SMOTE-SVM Accuracy 0.827 0.866 0.958 0.888 0.972 0.970
Sensitivity 0.650 0.939 0.861 0.477 0.556 0.417
Specificity 0.927 0.725 0.970 0.970 0.990 0.981
F1-Score 0.656 0.902 0.821 0.586 0.625 0.345
Balanced Accuracy 0.788 0.832 0.916 0.723 0.773 0.699
Kappa 0.651 0.690 0.797 0.525 0.611 0.330
Figure 4.9: Evaluation Metrics on Confirmation Set Models
Figure 4.10: Entropy vs. Predicted Probability in Confirmation Set
Figure 4.11: Gene Optimized Workflows Per-Class Metrics in Confirmation Set
Figure 4.12: Confusion Matrices for Confirmation Set Models

4.4 Gene Optimization

From Figure 4.9, we see that although Up-XGB has the highest overall evaluation metrics, SMOTE-SVM better predicts the rarest histotype, LGSOC. Thus we select both workflows for gene optimization in the confirmation set.

4.4.1 SMOTE-SVM

Figure 4.18: Gene Optimization for SMOTE-SVM Classifier using F1-Score

In the SMOTE-SVM classifier, the optimal number of genes is the fewest needed to achieve a mean F1-score above 0.7, reached at 22 added genes and highlighted in red in Figure 4.18. Hence the total number of genes used is n = 28 + 22 = 50.
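The selection rule, fewest added genes whose mean F1-score exceeds 0.7, can be sketched as below; the F1 values along the curve are illustrative placeholders, not the values in Figure 4.18.

```python
BASE_GENES = 28
F1_THRESHOLD = 0.7

# Hypothetical mean F1-score at each count of added genes (illustrative values).
mean_f1_by_added = {10: 0.62, 15: 0.66, 22: 0.71, 30: 0.72, 43: 0.73}

# Fewest added genes whose mean F1-score exceeds the threshold.
n_added = min(n for n, f1 in mean_f1_by_added.items() if f1 > F1_THRESHOLD)
n_total = BASE_GENES + n_added
print(n_added, n_total)  # 22 50
```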

The gene profile of the optimal set of genes used is displayed in Table 4.15. Base genes in the PrOTYPE and SPOT sets are annotated with green circles, and the added genes are annotated with yellow circles. The added genes are: EGFL6, IGJ, IGKC, TP53, DKK4, MUC5B, SLC3A1, MAP1LC3A, IGFBP1, CPNE8, SERPINA5, SCGB1D2, STC1, EPAS1, BRCA1, KGFLP2, SENP8, BCL2, PBX1, KLK7, C10orf116 and LIN28B. Unused genes are annotated with red crosses.

Table 4.15: Gene Profile of Optimal Set in SMOTE-SVM Workflow
Set Genes PrOTYPE SPOT Optimal Set Candidate Rank
Base COL11A1
CD74
CD2
TIMP3
LUM
CYTIP
COL3A1
THBS2
TCF7L1
HMGA2
FN1
POSTN
COL1A2
COL5A2
PDZK1IP1
FBN1
HIF1A
CXCL10
DUSP4
SOX17
MITF
CDKN3
BRCA2
CEACAM5
ANXA4
SERPINE1
CRABP2
DNAJC9
Candidates EGFL6 1
IGJ 2
IGKC 3
TP53 4
DKK4 5
MUC5B 6
SLC3A1 7
MAP1LC3A 8
IGFBP1 9
CPNE8 10
SERPINA5 11
SCGB1D2 12
STC1 13
EPAS1 14
BRCA1 15
KGFLP2 16
SENP8 17
BCL2 18
PBX1 19
KLK7 20
C10orf116 21
LIN28B 22
LGALS4 23
ADCYAP1R1 24
IL6 25
ZBED1 26
WT1 27
TFF1 28
GCNT3 29
HNF1B 30
TFF3 31
CYP4B1 32
CYP2C18 33
TSPAN8 34
FUT3 35
MET 36
ATP5G3 37
SEMA6A 38
GPR64 39
PAX8 40
C1orf173 41
GAD1 42
CAPN2 43

4.4.2 Up-XGB

Figure 4.19: Gene Optimization for Up-XGB Classifier using Balanced Accuracy

In the Up-XGB classifier, the optimal number of genes is the one that maximizes the mean balanced accuracy, reached at 12 added genes and highlighted in red in Figure 4.19. Hence the total number of genes used is n = 28 + 12 = 40.
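Unlike the threshold rule used for SMOTE-SVM, this criterion is a straight argmax, sketched below; the balanced-accuracy values are illustrative placeholders, not the values in Figure 4.19.

```python
BASE_GENES = 28

# Hypothetical mean balanced accuracy per count of added genes (illustrative).
mean_bacc_by_added = {5: 0.74, 12: 0.79, 20: 0.77, 43: 0.76}

# Number of added genes maximizing the mean balanced accuracy.
n_added = max(mean_bacc_by_added, key=mean_bacc_by_added.get)
n_total = BASE_GENES + n_added
print(n_added, n_total)  # 12 40
```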

The gene profile of the optimal set of genes used is displayed in Table 4.16. Base genes in the PrOTYPE and SPOT sets are annotated with green circles, and the added genes are annotated with yellow circles. The added genes are: HNF1B, TPX2, TFF1, CYP2C18, TFF3, WT1, GPR64, KLK7, SLC3A1, IGFBP1, GAD1 and LGALS4. Unused genes are annotated with red crosses.

Table 4.16: Gene Profile of Optimal Set in Up-XGB Workflow
Set Genes PrOTYPE SPOT Optimal Set Candidate Rank
Base COL11A1
CD74
CD2
TIMP3
LUM
CYTIP
COL3A1
THBS2
TCF7L1
HMGA2
FN1
POSTN
COL1A2
COL5A2
PDZK1IP1
FBN1
HIF1A
CXCL10
DUSP4
SOX17
MITF
CDKN3
BRCA2
CEACAM5
ANXA4
SERPINE1
CRABP2
DNAJC9
Candidates HNF1B 1
TPX2 2
TFF1 3
CYP2C18 4
TFF3 5
WT1 6
GPR64 7
KLK7 8
SLC3A1 9
IGFBP1 10
GAD1 11
LGALS4 12
MET 13
GCNT3 14
FUT3 15
C1orf173 16
EGFL6 17
MUC5B 18
C10orf116 19
DKK4 20
IL6 21
CAPN2 22
KGFLP2 23
BRCA1 24
CYP4B1 25
IGKC 26
PBX1 27
TSPAN8 28
SEMA6A 29
SENP8 30
PAX8 31
TP53 32
SERPINA5 33
ATP5G3 34
CPNE8 35
LIN28B 36
STC1 37
EPAS1 38
BCL2 39
MAP1LC3A 40
SCGB1D2 41
ADCYAP1R1 42
IGJ 43

4.4.3 Gene List Comparisons in Confirmation Set

We train the SMOTE-SVM and Up-XGB workflows using the base and optimal gene lists in the training set. The models are evaluated on the confirmation set. Overall and per-class results are shown in Table 4.17. The gene lists are:

  1. Base (n=28): among the overlapping genes, the base set from the PrOTYPE and SPOT lists

  2. Optimal (n=40,50): among the overlapping genes, the base set plus the additional number of genes that result in the optimal value for a selected evaluation metric, as assessed in Figure 4.18 and Figure 4.19

Table 4.17: Model Comparisons using Different Gene Lists in Confirmation Set
Histotypes
Method Metric Overall HGSOC CCOC ENOC MUOC LGSOC
SMOTE-SVM, Optimal Accuracy 0.833 0.872 0.963 0.885 0.974 0.974
Sensitivity 0.674 0.939 0.806 0.533 0.593 0.500
Specificity 0.931 0.743 0.982 0.955 0.990 0.983
F1-Score 0.682 0.907 0.829 0.606 0.653 0.414
Balanced Accuracy 0.802 0.841 0.894 0.744 0.791 0.741
Kappa 0.665 0.705 0.808 0.540 0.639 0.401
SMOTE-SVM, Base Accuracy 0.815 0.860 0.952 0.874 0.970 0.974
Sensitivity 0.672 0.903 0.889 0.514 0.556 0.500
Specificity 0.930 0.775 0.960 0.946 0.989 0.983
F1-Score 0.660 0.895 0.805 0.576 0.612 0.414
Balanced Accuracy 0.801 0.839 0.924 0.730 0.772 0.741
Kappa 0.641 0.685 0.778 0.503 0.597 0.401
Up-XGB, Optimal Accuracy 0.833 0.866 0.969 0.886 0.974 0.972
Sensitivity 0.626 0.955 0.875 0.430 0.704 0.167
Specificity 0.925 0.693 0.981 0.978 0.985 0.987
F1-Score 0.639 0.904 0.863 0.558 0.691 0.182
Balanced Accuracy 0.775 0.824 0.928 0.704 0.845 0.577
Kappa 0.656 0.684 0.845 0.499 0.677 0.168
Up-XGB, Base Accuracy 0.808 0.854 0.953 0.861 0.974 0.975
Sensitivity 0.593 0.950 0.875 0.308 0.667 0.167
Specificity 0.916 0.665 0.963 0.972 0.987 0.990
F1-Score 0.602 0.896 0.808 0.426 0.679 0.200
Balanced Accuracy 0.754 0.808 0.919 0.640 0.827 0.579
Kappa 0.602 0.653 0.781 0.360 0.665 0.188
Figure 4.20: Gene List Comparisons of Evaluation Metrics in Confirmation Set

4.5 Validation Set

Based on the results in Figure 4.20, we assess the SMOTE-SVM models trained on the training set with the optimal and all-overlap gene lists, evaluating them on the validation set.

4.5.1 Evaluation Metrics

Table 4.18: Evaluation Metrics on Training Set Models in Validation Set
Histotypes
Method Metric Overall HGSOC CCOC ENOC MUOC LGSOC
SMOTE-SVM, All Overlap Accuracy 0.884 0.908 0.959 0.950 0.970 0.981
Sensitivity 0.792 0.908 0.986 0.682 0.600 0.783
Specificity 0.961 0.908 0.956 0.979 0.976 0.986
F1-Score 0.706 0.939 0.786 0.727 0.400 0.679
Balanced Accuracy 0.876 0.908 0.971 0.830 0.788 0.884
Kappa 0.716 0.752 0.764 0.700 0.386 0.670
SMOTE-SVM, Optimal Accuracy 0.874 0.900 0.961 0.945 0.966 0.974
Sensitivity 0.747 0.907 0.942 0.659 0.533 0.696
Specificity 0.954 0.877 0.962 0.976 0.974 0.982
F1-Score 0.671 0.934 0.788 0.703 0.348 0.582
Balanced Accuracy 0.851 0.892 0.952 0.818 0.754 0.839
Kappa 0.689 0.729 0.767 0.673 0.333 0.569
SMOTE-SVM, Base Accuracy 0.834 0.865 0.953 0.921 0.960 0.971
Sensitivity 0.717 0.877 0.957 0.477 0.533 0.739
Specificity 0.937 0.821 0.953 0.969 0.967 0.977
F1-Score 0.617 0.910 0.759 0.542 0.308 0.567
Balanced Accuracy 0.827 0.849 0.955 0.723 0.750 0.858
Kappa 0.601 0.637 0.734 0.499 0.291 0.552
Figure 4.21: Evaluation Metrics on Validation Set Models

4.5.2 Confusion Matrices

Figure 4.22: Confusion Matrix for Training Set Models evaluated on Validation Data

4.5.3 ROC Curves

4.5.4 Calibration Plots

4.5.5 Summary

The SMOTE-SVM, All Overlap model has the highest overall F1-score. A summary of pertinent results is shown in Figure 4.29.

Figure 4.29: Validation Summary

4.5.6 Additional Explorations

Table 4.19: Clinicopathologic characteristics between correct and incorrect predictions of ENOC cases
Characteristic Predicted ENOC Correctly (N = 60)1 Missed ENOC (N = 28)1 p-value2
Age at diagnosis 53 (46, 64) 56 (51, 62) 0.6
Tumour grade   0.008
    low grade 43 (91%) 16 (64%)
    high grade 4 (8.5%) 9 (36%)
    Unknown 13 3
FIGO tumour stage   <0.001
    I 49 (83%) 13 (46%)
    II-IV 10 (17%) 15 (54%)
    Unknown 1 0
Race   >0.9
    white 53 (91%) 23 (92%)
    non-white 5 (8.6%) 2 (8.0%)
    Unknown 2 3
ARID1A   >0.9
    absent/subclonal 11 (18%) 5 (18%)
    present 49 (82%) 23 (82%)
WT1   0.4
    diffuse (>50%) 2 (3.3%) 3 (11%)
    focal (1-50%) 2 (3.3%) 1 (3.6%)
    negative 56 (93%) 24 (86%)
TP53   0.031
    mutated 1 (1.7%) 4 (15%)
    wild type 59 (98%) 23 (85%)
    Unknown 0 1
PR   <0.001
    diffuse (>50%) 39 (65%) 7 (25%)
    focal (1-50%) 10 (17%) 5 (18%)
    negative 11 (18%) 16 (57%)
P16   0.012
    abnormal block 1 (1.7%) 5 (18%)
    abnormal complete absence 16 (27%) 9 (32%)
    normal 43 (72%) 14 (50%)
NAPSIN A   >0.9
    negative 58 (97%) 27 (100%)
    positive 2 (3.3%) 0 (0%)
    Unknown 0 1
1 Median (Q1, Q3); n (%)
2 Wilcoxon rank sum test; Fisher’s exact test; Pearson’s Chi-squared test
Figure 4.30: Volcano Plots of Validation Set Predictions
Figure 4.31: Boxplot of Most Differentially Expressed Genes
Figure 4.32: Subtype Prediction Summary among Predicted HGSC Samples