Authors
- Ivan Lorencin – University of Rijeka, Faculty of Engineering
- Nikola Anđelić – University of Rijeka, Faculty of Engineering
- Sandi Baressi Šegota – University of Rijeka, Faculty of Engineering
- Daniel Štifanić – University of Rijeka, Faculty of Engineering
- Jelena Musulin – University of Rijeka, Faculty of Engineering
- Vedran Mrzljak – University of Rijeka, Faculty of Engineering
- Elitza Markova-Car – University of Rijeka, Department of Biotechnology
- Zlatan Car – University of Rijeka, Faculty of Engineering
Article type:
Original Scientific Paper
Abstract:
One of the challenges in medical data classification is determining a sufficient training dataset size. This research examines the effect of training dataset size on the performance of various artificial neural network (ANN) configurations. All ANNs were trained and tested using the Wisconsin Breast Cancer (Diagnostic) Dataset, which contains 569 samples. The dataset was divided into a training set of 410 samples and a test set of 159 samples. Each ANN was trained with datasets ranging from 10 to 410 samples. The performance of each ANN was evaluated using receiver operating characteristic (ROC) analysis.
The results show that when fewer than 282 samples are used for training, higher area under the ROC curve (AUC) values are achieved using a deep ANN designed with a ReLU activation function. Conversely, with larger datasets, the best performance is obtained using ANNs designed with one hidden layer and a logistic sigmoid activation function. Our results indicate that smaller datasets can be used for classifier training if an appropriate ANN architecture is chosen. Furthermore, it can be concluded that large datasets are not needed to design an ANN for breast cancer diagnosis.
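As an illustration of the evaluation described above, the following is a minimal sketch of how an AUC value can be computed from classifier scores, together with a helper that lists training-set sizes between 10 and 410 samples. It is not the authors' implementation. The function names `roc_auc` and `training_sizes` are hypothetical, and the step of 50 samples is an assumption, since the abstract states only the range.

```python
from typing import List, Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a randomly chosen positive sample
    receives a higher score than a randomly chosen negative one
    (ties count as one half).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def training_sizes(start: int = 10, stop: int = 410, step: int = 50) -> List[int]:
    """Training-set sizes from start to stop, inclusive.

    The step of 50 is a hypothetical choice; the abstract gives
    only the range (10 to 410 samples).
    """
    return list(range(start, stop + 1, step))


# Toy example with made-up scores (not data from the study):
# three of the four positive/negative pairs are ranked correctly.
example_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.75
```

In such a setup, each ANN configuration would be trained once per size returned by `training_sizes()` and scored with `roc_auc` on the fixed 159-sample test set, so that AUC can be compared across architectures as a function of training-set size. The pairwise loop is quadratic in the number of samples, which is acceptable at this scale.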
Keywords:
Activation function; Artificial neural network; Breast cancer diagnosis; Dataset size

