The Institutional Review Board (IRB) of Chung-Ang University Hospital approved this study. This was a retrospective study, and the informed consent requirement was waived by the IRB of Chung-Ang University Hospital (IRB no. 1912-004-358). This study was conducted in accordance with the ethical standards set out in the Declaration of Helsinki.
Study participants
The orbital scans used in this study were obtained from 200 GO patients and 100 normal controls between December 2010 and December 2018. GO patients were diagnosed based on Bartley’s criteria23, and the severity of GO (mild or moderate to severe) was assessed according to the consensus severity scale of the European Group on Graves’ Orbitopathy (EUGOGO)24. Patients with mild GO have one or more of the following: minor eyelid retraction (
CT images were assessed jointly by an ophthalmologist and a radiologist, and images that were incomplete or inconsistent with clinical findings were excluded. A total of 288 sets of CT images were obtained, including 99 cases of mild GO, 94 of moderate to severe GO, and 95 of normal controls.
Data preparation
CT images obtained in the axial, coronal and sagittal planes were uploaded to the RadiAnt DICOM viewer (Medixant Co., Poznan, Poland). To overcome variations caused by differences in CT equipment, spline interpolation was used to fix the number of images in each plane to 32. We then manually cropped the region of interest (ROI) and removed the remaining black margin. To meet the fixed-size input requirement of NNs, the cropped images were resized by interpolation to 128 × 128 for the axial (128, 128, 32) and sagittal (128, 128, 32) planes and to 64 × 128 for the coronal plane (64, 128, 32). CT images were scaled to Hounsfield Unit (HU) values, and fat and EOM were selected in the ranges of -110 to -10 HU and 0 to 40 HU, respectively, to suppress unnecessary pixels17. Finally, all images were normalized by scaling to between 0 and 1. Figure 3 shows a schematic overview of our preprocessing steps.
Data preparation process. Soft tissue thresholds were set at -100 to +40 Hounsfield Unit (HU) attenuation values to remove unnecessary pixels. Manual cropping was performed and the extracted region of interest (ROI) was unified in size by interpolation.
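The HU windowing and normalization steps described above can be illustrated with a minimal NumPy sketch. The function name `window_and_normalize`, the choice to set suppressed pixels to the lower bound of the fat range, and the use of the overall window (-110 to 40 HU) for scaling are assumptions made for illustration; the text does not state how suppressed pixels were encoded or which bounds were used for scaling.

```python
import numpy as np

# HU ranges taken from the text; everything else here is an assumption.
FAT_RANGE = (-110.0, -10.0)   # orbital fat
EOM_RANGE = (0.0, 40.0)       # extraocular muscle (EOM)


def window_and_normalize(hu):
    """Keep only fat/EOM pixels and scale the result to [0, 1].

    Pixels outside both ranges are set to the lower bound (assumption),
    so they map to 0 after scaling.
    """
    hu = np.asarray(hu, dtype=np.float32)
    fat = (hu >= FAT_RANGE[0]) & (hu <= FAT_RANGE[1])
    eom = (hu >= EOM_RANGE[0]) & (hu <= EOM_RANGE[1])
    lo, hi = FAT_RANGE[0], EOM_RANGE[1]
    kept = np.where(fat | eom, hu, lo)
    return (kept - lo) / (hi - lo)


# Example: a hypothetical axial volume of shape (128, 128, 32) in HU.
volume_hu = np.random.default_rng(0).uniform(-1000, 1000, size=(128, 128, 32))
volume = window_and_normalize(volume_hu)
```

With this encoding, air, bone, and the gap between -10 and 0 HU all collapse to 0, so only fat and muscle contribute non-zero intensities to the network input.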
All 288 cases were separated and combined into four experimental groups: (1) moderate to severe GO vs normal controls, (2) mild GO vs normal controls, (3) moderate to severe GO vs mild GO, and (4) moderate to severe GO vs mild GO vs normal controls. Each experimental group was then treated as an isolated dataset. To mitigate the effects of selection bias due to gender and age, we used cross-validation with random assignment regardless of participants’ clinical or demographic characteristics; for each dataset, 80% was used as a training set to train the NNs and the remaining 20% was used as a test set to evaluate the trained NNs. The final performance of the proposed NN and the existing NNs was measured by averaging the results of 30 repeated experiments.
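The evaluation protocol above can be sketched as a repeated random 80/20 split whose test scores are averaged over 30 runs. This is only one reading of the protocol: the helper names `random_split` and `repeated_evaluation`, the seeding scheme, and the `evaluate` callback (which stands in for training an NN and scoring it on the test set) are hypothetical.

```python
import random


def random_split(case_ids, train_fraction=0.8, seed=None):
    """Shuffle cases without regard to demographics and split them 80/20."""
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def repeated_evaluation(case_ids, evaluate, repetitions=30, base_seed=0):
    """Average a test-set score over independent random splits."""
    scores = []
    for r in range(repetitions):
        train, test = random_split(case_ids, seed=base_seed + r)
        scores.append(evaluate(train, test))
    return sum(scores) / len(scores)


# Group (1): 94 moderate-to-severe GO cases + 95 normal controls = 189 cases.
group1 = list(range(189))
train_ids, test_ids = random_split(group1, seed=0)
```

For group (1) this yields 151 training and 38 test cases per repetition.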
Convolutional NN
In practice, it is preferable to consider CT images of the axial, coronal and sagittal planes together because they deliver different information for the diagnosis of GO. However, conventional NNs are designed to accept three-channel inputs such as RGB color images. Although a single image plane can be handled by increasing the number of input channels of an existing NN, conventional NNs are unable to use all three image planes simultaneously with their original input layer. To solve this problem, we designed a new NN capable of accepting three image planes simultaneously. Figure 4 shows an overview of the proposed NN. Each cell describes the behavior of the operator and the shapes of the input and output nodes. The proposed NN has three input layers consisting of 32-bit single-precision floating-point elements that take axial (128 × 128 × 32), sagittal (128 × 128 × 32), and coronal (64 × 128 × 32) images, which are processed independently before the last fully connected layer. In the proposed NN, first, features are extracted by the convolutional and depth-wise convolutional layers from the input CT images of up to three different planes. A max pooling layer follows each convolution operation. After this first step, the sizes of the axial and coronal images are reduced to 32 × 32 × 16 and 16 × 32 × 16, respectively. Since only one orbit is included in each sagittal image, its reduced size (32 × 32 × 32) is larger than those of the axial and coronal CT images. Second, feature maps are extracted separately from the left and right orbits of each image so that the orbits can be compared to detect asymmetry or one-sided GO. For this purpose, we use depth-wise convolution with the filter size set to half the image size. Each convolutional filter extracts one real value from the half of the image containing the left or right orbit. Specifically, 16 convolutional filters are used for each orbit to produce the input values for the next 16 × 2 nodes.
Third, each group of 16×2 nodes is flattened into 32×1 nodes, which are fully connected to the following 4×1 nodes. Finally, the output values of the 4×1 nodes are transferred to the output layer. The output layer includes a sigmoid node to calculate the significance value if the dataset consists of two classes, such as mild GO patients versus normal controls. For multi-class classification, such as moderate to severe GO patients versus mild GO patients versus normal controls, the output layer includes three softmax nodes.
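The orbit-comparison step for one plane can be sketched in NumPy as follows, under stated assumptions: the second array axis is taken to be the left-right direction, the same 16 per-channel filters are applied to both halves (as a depth-wise convolution with a half-width kernel and half-width stride would do), and biases and activation functions are omitted. The function name `half_width_depthwise` and the random weights are purely illustrative and are not the trained parameters.

```python
import numpy as np


def half_width_depthwise(fmap, kernel):
    """Depth-wise filter covering half of the image width.

    fmap:   (H, W, C) feature map, e.g. (32, 32, 16) for the axial plane.
    kernel: (H, W // 2, C), one spatial filter per channel.
    Returns a (C, 2) array: one value per channel for each image half.
    """
    h, w, c = fmap.shape
    half = w // 2
    if kernel.shape != (h, half, c):
        raise ValueError("kernel must have shape (H, W // 2, C)")
    first = (fmap[:, :half, :] * kernel).sum(axis=(0, 1))   # one orbit
    second = (fmap[:, half:, :] * kernel).sum(axis=(0, 1))  # other orbit
    return np.stack([first, second], axis=1)


rng = np.random.default_rng(0)
axial_features = rng.standard_normal((32, 32, 16)).astype(np.float32)
kernel = rng.standard_normal((32, 16, 16)).astype(np.float32)

orbit_nodes = half_width_depthwise(axial_features, kernel)  # 16 x 2 nodes
flat = orbit_nodes.reshape(-1)                              # 32 x 1 nodes
dense_w = rng.standard_normal((32, 4)).astype(np.float32)
hidden = flat @ dense_w                                     # 4 x 1 nodes
```

Because each channel contributes exactly one value per image half, a feature map whose two halves are identical produces identical columns, which is what allows the following fully connected layer to respond to asymmetry between the orbits.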

Overview of the proposed neural network. The network is built from convolutional operators with an intermediate depth-wise convolution layer for comparing the two orbits, which reduces the number of parameters compared with a conventional convolutional neural network.
Figure 5 shows a block diagram of the proposed NN. As shown in Figure 5, the proposed NN is divided into three parts: the convolution layers, the fully connected layer and the classifier. The parts up to the fully connected layer are computed independently for each plane; the extracted features are then concatenated and used for prediction. Additionally, Supplementary Fig. S1 visualizes the learned convolutional filters of the proposed NN. For simplicity, only the first 10 filters of each CT plane are shown, and the half-size depth-wise convolutional filters are visualized by averaging over each of the eight-by-eight grid areas.

The block diagram of the proposed NN. The feature map sizes and operation types for each layer are described sequentially based on the data flow.
NN evaluation
The performance of the proposed NN was evaluated by comparing it with three conventional NNs: GoogLeNet Inception v1 (GoogLeNet), the 50-layer Deep Residual Network (ResNet-50) and the 16-layer Very Deep Convolutional Network from the Visual Geometry Group (VGG-16)25,26,27. The NNs were implemented using the TensorFlow (2.1.0) and Keras (2.3.1) APIs, and all experiments were run on a GTX 1080Ti 11GB GPU. For a fair comparison, we used the well-known Xavier initialization method, which is supported by default in the APIs, for all networks28. Because conventional NNs can accept only one of the image planes, we reported the performance of conventional NNs when single-plane CT images were used as input data. Each sample of the ImageNet dataset has three channels (the RGB color channels), whereas each sample of the medical data used in the experiment has 32 input channels. These convolutional networks were therefore trained from scratch.
For comparison, three oculoplastic specialists were asked to perform four independent experiments in which they compared three experimental groups using complete CT sets without any clinical information, as was done with the proposed and conventional NNs. The final rating was decided by majority vote, and the diagnostic performance in terms of the area under the receiver operating characteristic (ROC) curve was compared with that of the proposed NN.
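The reader-study scoring can be sketched for a two-class experiment as follows. This is a minimal illustration and not the authors' analysis code: the function names are hypothetical, the ratings are made-up toy values, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation, in which ties count as one half.

```python
from collections import Counter


def majority_vote(ratings):
    """Return the label chosen by most readers (three readers, two classes)."""
    return Counter(ratings).most_common(1)[0][0]


def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: one tuple of three reader ratings per case (1 = GO, 0 = control).
reader_ratings = [(1, 1, 0), (0, 0, 1), (1, 0, 1), (0, 0, 0)]
truth = [1, 0, 0, 0]
consensus = [majority_vote(r) for r in reader_ratings]
reader_auc = roc_auc(truth, consensus)
```

With three readers and two classes a majority always exists; for the three-class experiment all three readers could disagree, and how such ties were resolved is not stated in the text.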
Statistical analyses
All statistical analyses were performed using the open-source software R 3.4.0 (R Foundation for Statistical Computing, Vienna, Austria). Data were expressed as means with standard deviations for continuous variables and as sample counts for categorical variables. Differences in age, sex, and clinical characteristics between clinical groups were analyzed by one-way analysis of variance with Games-Howell post-hoc analyses and Pearson’s chi-square test, respectively. A value of p