Innovative Framework for Thyroid Disease Detection by Leveraging Hybrid AGTEO Feature Selection and GRU Classification Model

Abstract: Thyroid disease remains a significant health concern, necessitating advanced diagnostic tools for swift and accurate identification. The initial step involves preprocessing the datasets, employing an outlier detection method based on Isolated Forest in conjunction with data normalization to eliminate noise and standardize the data, laying a robust groundwork for subsequent analysis. Subsequently, feature extraction is conducted using an enhanced AlexNet architecture augmented by an improved Chameleon Swarm Algorithm (CSA) model to discern finer patterns within the data, enhancing the discriminative nature of the extracted features. Following this, a hybrid optimization strategy for feature selection is deployed, combining the strengths of the Equilibrium Optimizer (EO) and the Artificial Gorilla Troops Optimizer (AGTO) into a hybrid model named HAGTEO, which aims to identify the most informative features, thus reducing dimensionality and enhancing classifier efficiency. Finally, a Gated Recurrent Unit (GRU) classifier is employed for thyroid disease classification based on the extracted and selected features. Known for its capability to capture temporal dependencies, the GRU model further enhances classification accuracy. The proposed framework is tested on two distinct datasets, demonstrating its efficacy in thyroid disease detection. Experimental outcomes reveal superior performance compared to conventional methods, with accuracies of 98.07% and 98.00% for dataset 1 and dataset 2, respectively. As an advanced diagnostic solution for thyroid disease, it holds promising potential.


Introduction
The incidence of thyroid disease has been rising in recent years. The thyroid gland is an important organ that controls metabolism. A dysfunctional thyroid gland may present any of a number of abnormalities; the two most common are hyperthyroidism and hypothyroidism [1]. Many people are diagnosed with these disorders each year. The thyroid gland secretes triiodothyronine (T3) and thyroxine (T4); too little of these hormones causes hypothyroidism, and too much causes hyperthyroidism [2]. Many methods for diagnosing thyroid disease have been put forward in the literature. Good treatment requires that thyroid disease be predicted proactively, so as to save lives and prevent unnecessary medical bills. Early prediction of thyroid disease is achieved by combining deep learning and machine learning methods with advances in data processing and computation technologies [3]. These methods can also classify the type of illness, for example hypothyroidism or hyperthyroidism.
In recent years, progress in data mining, big data, image-processing technologies, and parallel computing has led to their widespread adoption across healthcare domains, contributing significantly to human health and well-being. Data mining applications in healthcare include drug discovery, early detection and diagnosis, virus outbreak prediction, adapting drug therapy to patients' own conditions, and statistics-based management analysis of healthcare information. Healthcare professionals try to identify illnesses as early as possible so that treatment can be given promptly, without great expense, and the disease cured quickly. Thyroid disease is one of the conditions that gravely affect a significant portion of the human population [4]. According to the American Thyroid Association (the world's leading professional group), thyroid disease affects 20 million people in the US [5]. At least 12% of Americans will experience a thyroid disorder at some point in their lives [6,7]. According to these figures, thyroid-related illnesses should not be taken lightly. Advanced technologies should be used as much as possible to improve healthcare techniques for the prevention and early detection of thyroid disease [8].
Int. Res. J. Multidiscip. Technovation, 6(3) (2024) 112-127
In agriculture, as in other fields, the last few years have seen a boom in deep learning (DL) methods. Progress in computer vision and artificial intelligence has produced many new ideas [9,10]. These approaches are more accurate than conventional methods and support better decision making. Thanks to progress in hardware technology, DL methods are now used to solve problems that would otherwise be too complex to solve in a reasonable amount of time [11,12]. The research findings in this field are significant. DL is already a state-of-the-art technique for land cover classification, and it could also be used for many other kinds of tasks. Several kinds of deep neural networks (DNNs) have produced excellent results in hyperspectral analysis [13]. For classification, convolutional neural networks (CNNs) have proven effective.

Motivations
Because thyroid disease diagnosis must be rapid and accurate, this study sets out to refine existing diagnostic methods. Since early detection is important for improving patient outcomes, the research aims to provide the medical community with a reliable and advanced tool. The goal is to develop appropriate tools for diagnosing and treating thyroid disease that clinicians can use without adverse side effects.

Problem Statement
Existing methods for diagnosing thyroid disease suffer from limited accuracy and speed. Misdiagnosis and delayed diagnosis can lead to poor health outcomes. An accurate, modern diagnostic approach is needed to increase the precision of thyroid disease diagnosis. Many current techniques struggle with high-dimensional, complex data, so a new approach must be able to extract salient features effectively. To address these obstacles, this paper proposes a comprehensive framework that jointly optimizes feature selection and classification. This allows for a more reliable and accurate detection system that supports timely, life-saving medical interventions.

Main Contributions
Data Preprocessing: By combining outlier detection with data normalization, a solid foundation for analysis is laid down, so that the model runs on clean, standardized data.
Advanced Feature Extraction: Using the AlexNet architecture with an optimized Chameleon Swarm Algorithm (CSA) model captures complex patterns in the data, yielding a robust set of features that allows the system to better distinguish between thyroid conditions.
Efficient Feature Selection: A hybrid optimization method combining the Artificial Gorilla Troops Optimizer (AGTO) with the Equilibrium Optimizer (EO) removes redundant features that add no useful information. The process not only reduces dimensionality but also improves overall classifier efficiency.
Classification: Leveraging the capacity of the Gated Recurrent Unit (GRU), a specialized recurrent neural network, to represent temporal dependencies in the data, the model classifies thyroid disease accurately and precisely.

Organization of the paper
The remaining sections of the study are organised as follows: Section 2 summarizes the relevant works; Section 3 explains the proposed model; Section 4 presents the analysis of results, together with validation and an explanation of failure cases; and Section 5 gives the summary and conclusion.

Related works
Sharma et al. combined deep learning techniques with feature extraction methods for thyroid disease [14]. The feature extraction methods are built upon the MLP and Image Transformer models. The presence of numerous redundant features can lead classifiers to overfit, diminishing their ability to generalize. To avoid overfitting, six feature transformation techniques are investigated for reducing the dimensionality of the data: PCA, TSVD, FastICA, ISOMAP, LLE, and UMAP. On the transformed dataset, five classifiers (LR, NB, SVC, KNN, and RF) are assessed using 5-fold stratified cross-validation, which is chosen because both datasets have significant class imbalances. Models at the various stages of analysis are ranked using the MEREC-TOPSIS MCDM technique. The optimal strategies for feature extraction and classification are chosen in the first stage, while the optimal strategy for dimensionality reduction is assessed in the second stage, in wrapper feature selection mode. Using the recently proposed FOX metaheuristic optimisation algorithm, the two best-ranked models are then chosen for weighted-average ensemble learning and feature selection.
Brindha & Muthukumaravel assessed the performance of two classifier models in diagnosing thyroid disease [16]. After the models were trained on UCI repository data, their accuracy and precision in detecting hyperthyroidism and hypothyroidism were examined. The results revealed that the CNN classifier was superior to the SVM classifier, which achieved 89% accuracy and 87% precision, and that it produced more consistent and reliable results.
Gupta et al. proposed an approach that uses a differential evolution (DE)-based optimization algorithm to fine-tune the parameters of machine learning models [17] [18]. The main contributions were improved feature selection accuracy on the dataset and a multiclass classification that differentiates between three types of thyroid disorder. In terms of the evaluated criteria, the extreme gradient boosting (XGBoost) algorithm showed the best classification. With optimised hyperparameters, the model outperformed state-of-the-art models with an accuracy of 99%.
Xie [19] covers in detail the use of deep learning algorithms for thyroid ultrasound image segmentation, feature extraction, and classification. The article also provides an overview of the deep learning algorithms used to process multimodal ultrasound images. Lastly, it highlights the current issues in thyroid ultrasound image diagnosis and anticipates new avenues for research and development. This work can help advance the use of deep learning in clinical ultrasound image diagnosis of thyroid disease and serve as a resource for medical professionals diagnosing thyroid disease.
Dhamodaran et al. examined the feasibility of using support vector machines (SVM), k-nearest neighbours (KNN), and Naive Bayes to categorise thyroid datasets into several classes [20]. By comparing several machine learning methods, the study identified the one with the highest illness prediction accuracy. Predictions of future thyroid disease (TD) cases and estimates of the affected rate levels were shown to be more accurate using the Expert System for Thyroid Disease Diagnosis (ESTDD) model. The model achieved the specified levels of performance in terms of accuracy (98.53%) and throughput (98.34%).

Research Gaps
Sharma et al. [14] study thyroid disease by combining deep learning techniques with feature extraction methods, but how different feature transformation methods affect model performance remains underexplored. Although the VGG-19-LSTM method of Mohan et al. appears to have potential for thyroid disease diagnosis, its performance, like that of other convolutional techniques, was not tested across various datasets [15]. Brindha et al. concentrate on SVM and CNN for thyroid diagnosis and comment on the advantages of CNN over SVM; however, these models have not been evaluated on datasets other than the UCI repository data [16]. While Gupta et al. report the best performance among the methods they compare, it remains unclear whether the approach generalizes to different datasets and whether it is robust to changes in image quality [17]. Alnaggar et al. present an XGBoost-based classifier with good accuracy, but the model's performance on different datasets and its scalability have not been evaluated [18]. The work of Xie on deep learning for ultrasound image diagnosis offers interesting findings [19]; however, challenges unique to thyroid ultrasound image diagnosis remain to be addressed and avenues for future research put forward. Although the study by Dhamodaran et al. of SVM, KNN, and Naive Bayes for thyroid disease classification is thorough, a significant gap remains in systematically comparing these models across different datasets while accounting for differences in disease prevalence [20].

Proposed Methodology
Figure 1 shows the proposed workflow of the thyroid disease detection model.

Dataset Description
The Thyroid Disease Data Set and the Mikeizbikiv Database are the two datasets used to identify thyroid disorders.
The content shown in table 1 is gathered from the two databases that are open to the public in order to predict thyroid diseases is indicated by , where the term is considered as and the overall quantity of data displayed is indicated by .

Outlier Detection Method Based on Isolated Forest
The isolated forest (iForest) algorithm was employed in this study to detect outliers. iForest is an unsupervised anomaly detection technique based on ensemble learning that does not require label information for the training set [21]. It divides the data space, which contains all of the samples, into two subspaces along a given dimension using a random hyperplane. Each subspace contains only a portion of the original data, and each is divided again in the same manner; the process repeats until every subspace holds just one data point.

Table 1. Online sources and dataset description

Dataset 1
Online source: "https://www.kaggle.com/datasets/yasserhessein/thyroiddisease-data-set?select=hypothyroid.cs" (access date 2022-12-30)
Description: The "Thyroid Disease Data Set" (database 1) contains the information needed to predict thyroid diseases. The data was gathered by the Garavan Institute, and Ross Quinlan provided the documentation. There are 2800 total data instances, 972 test instances, and a large number of missing data (29). There is another database that has 9172 instances and 20 classes. The dataset is in the file hypothyroid.csv and has a size of 276.17 KB.

Dataset 2
Online source: "https://github.com/mikeizbicki/datasets/blob/master/csv/uci/ann-train.data" (access date 2022-12-30)
Description: The "mikeizbikiv database," also known as database …

Only a few partitions are needed to isolate the abnormal data points, because the density of the subspace containing the abnormal data is significantly lower than that of the normal data clusters.
The outlier ratio parameter was set to a low value due to the small sample size. Once the parameters were set, the algorithm searched for outliers in the data sets. The detected outliers were removed and filled in using linear interpolation.
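The partitioning idea described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the implementation used in the study: the function names, the sample readings, the number of trees, and the 10% outlier ratio are all assumptions made for the example, and at least two rows are required.

```python
import math
import random

def c_factor(n):
    """Average path length of an unsuccessful search in a binary tree of n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Split the rows recursively with random axis-aligned cuts."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    dim = rng.randrange(len(rows[0]))
    lo = min(r[dim] for r in rows)
    hi = max(r[dim] for r in rows)
    if lo == hi:  # nothing left to separate along this dimension
        return {"size": len(rows)}
    cut = rng.uniform(lo, hi)
    left = [r for r in rows if r[dim] < cut]
    right = [r for r in rows if r[dim] >= cut]
    return {"dim": dim, "cut": cut,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(row, node, depth=0):
    """Number of cuts needed to isolate a row, corrected for unsplit leaves."""
    if "size" in node:
        return depth + c_factor(node["size"])
    child = node["left"] if row[node["dim"]] < node["cut"] else node["right"]
    return path_length(row, child, depth + 1)

def anomaly_scores(rows, n_trees=100, sample_size=64, seed=0):
    """Scores close to 1 mark points that are isolated after few cuts."""
    rng = random.Random(seed)
    sample_size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(sample_size))
    trees = [build_tree(rng.sample(rows, sample_size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(sample_size)
    return [2.0 ** (-sum(path_length(r, t) for t in trees) / (n_trees * norm))
            for r in rows]

def interpolate_flagged(values, flagged):
    """Replace flagged entries by linear interpolation between kept neighbours."""
    kept = [i for i, bad in enumerate(flagged) if not bad]
    cleaned = list(values)
    for i, bad in enumerate(flagged):
        if not bad:
            continue
        left = max((k for k in kept if k < i), default=None)
        right = min((k for k in kept if k > i), default=None)
        if left is not None and right is not None:
            weight = (i - left) / (right - left)
            cleaned[i] = values[left] + weight * (values[right] - values[left])
        elif left is not None or right is not None:
            cleaned[i] = values[left if left is not None else right]
    return cleaned

# Hypothetical hormone-like readings with one implausible value at index 6.
readings = [4.9, 5.1, 5.0, 5.2, 4.8, 5.1, 50.0, 5.0, 4.9, 5.2]
scores = anomaly_scores([[v] for v in readings])
threshold = sorted(scores)[-1]  # assumed outlier ratio of 10%: flag the top score
flagged = [s >= threshold for s in scores]
cleaned = interpolate_flagged(readings, flagged)
```

Because the isolated value needs far fewer cuts than the clustered ones, it receives the highest score and is replaced by a value interpolated from its neighbours.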

Data Normalization
Because the different thyroid data sets had different scales, this study employed maximum and minimum (min-max) standardisation to standardise the data:

$$x' = \frac{x - \min}{\max - \min}$$

where $x$ is the original value, $x'$ is the standardised value, max denotes the data's maximum value, and min its minimum value.
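As a small worked example of maximum and minimum standardisation, the sketch below rescales one feature column to the range [0, 1]. The function name and the sample values are hypothetical, and mapping a constant column to zeros is a choice made here to avoid division by zero.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1] using its own minimum and maximum."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: no spread to rescale
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

# Hypothetical measurements for a single feature.
feature = [0.5, 2.0, 4.5, 10.0]
normalized = min_max_normalize(feature)
print(normalized)
```

The smallest value maps to 0 and the largest to 1, so features measured on different scales become directly comparable.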

AlexNet Model Structure
In this paper, the thyroid data is used as the model's input [22], and a max pooling layer follows convolution layers 1, 2, and 5. The first convolutional layer uses 48 kernels of size 11x11, and the second uses 128 kernels of size 5x5. The third, fourth, and fifth convolutional layers each use 192 kernels of size 3x3. Lastly, the final classification is based on three fully connected layers.

Learning Phase
During the learning phase, the data are sampled to create the training dataset:

$$(X, Y) = \{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(N)}, y^{(N)})\}$$

where $X$ symbolises the collection of samples $x^{(n)}$, $Y$ symbolises the group of labels $y^{(n)}$, and $(x^{(n)}, y^{(n)})$ represents the training dataset's $n$-th sample ($n = 1, 2, 3, \dots, N$). Furthermore, one-hot encoding is employed to encode the labels of the data sets used for training and validation:

$$y^{(n)} = \begin{cases} [1, 0]^{T}, & H_1 \\ [0, 1]^{T}, & H_0 \end{cases}$$

where $H_1$ represents a PU. Similarly, a 2x1 class score vector represents the output of the last fully connected layer of the AlexNet model:

$$h_{\theta}(x^{(n)}) = \begin{bmatrix} h_{\theta|H_1}(x^{(n)}) \\ h_{\theta|H_0}(x^{(n)}) \end{bmatrix}$$

where $h_{\theta}(\cdot)$ is the expression of the AlexNet model with parameter $\theta$, and $h_{\theta|H_i}(\cdot)$ is the class score that goes with $H_i$. In this case, $h_{\theta|H_i}(x^{(n)})$ shows the classification score of $x^{(n)}$ for $H_i$.
As a result, the two class scores can be read as pseudo-probabilities of $H_1$ and $H_0$. Consequently, the goal of the AlexNet model's training is to maximise the conditional probability $P(Y \mid X; \theta)$. To facilitate computation, the logarithmic function is introduced:

$$L(\theta) = \log P(Y \mid X; \theta)$$

$L(\theta)$ should be as large as possible, that is, $-L(\theta)$ as small as possible, which gives the loss function. The aim of training an AlexNet is to determine which $\theta$ maximises the MAP $P(Y \mid X; \theta)$, namely:

$$\theta^{*} = \arg\max_{\theta} P(Y \mid X; \theta)$$

where $\theta^{*}$ symbolises the optimal $\theta$ under the MAP.
Using Equation (8) as the basis for the loss function, the CSA optimisation algorithm was employed to gradually adjust the model's parameter $\theta$. This allowed the training process to converge, so that the model obtained the optimal parameter $\theta^{*}$ and, ultimately, the trained model $h_{\theta^{*}}(\cdot)$. A decision is then made by comparing the likelihood ratio with a detection threshold $\lambda$, which can be determined from the false alarm probability constraint. To facilitate analysis, the training datasets for $H_1$ and $H_0$ are constructed with an equal quantity of samples, meaning that $P(H_1) = P(H_0) = 0.5$.
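The relationship between one-hot labels, class scores, and the negative log-likelihood loss can be illustrated with a short sketch. It assumes that the two class scores are turned into pseudo-probabilities with a softmax, which the text does not state explicitly; the function names and the example scores are hypothetical, and label 0 is taken to mean $H_1$.

```python
import math

def one_hot(label, n_classes=2):
    """Encode a class index as a one-hot vector, e.g. 0 -> [1.0, 0.0]."""
    return [1.0 if k == label else 0.0 for k in range(n_classes)]

def log_softmax(scores):
    """Numerically stable log of the softmax pseudo-probabilities."""
    m = max(scores)
    log_total = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_total for s in scores]

def negative_log_likelihood(score_batch, labels):
    """Sum of -log P(label | sample) over the batch; smaller is better."""
    loss = 0.0
    for scores, label in zip(score_batch, labels):
        target = one_hot(label, len(scores))
        log_probs = log_softmax(scores)
        loss -= sum(t * lp for t, lp in zip(target, log_probs))
    return loss

# Hypothetical class scores from the last fully connected layer for three samples.
batch_scores = [[2.0, -1.0], [0.3, 0.1], [-1.5, 2.5]]
batch_labels = [0, 0, 1]
print(negative_log_likelihood(batch_scores, batch_labels))
```

Minimising this quantity over the model parameters is equivalent to maximising the log-likelihood of the correct labels, which is the objective that the parameter search is asked to optimise.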

CSA based Hyper parameter Tuning
CSA is one of the newest metaheuristics, introduced by Braik in 2021. The algorithm models the hunting and food-finding behaviour of chameleons, a highly specialised class of animals able to change their colour to better fit their environment [23]. Chameleons feed mainly on insects and can live in semi-desert regions, lowlands, mountains, and deserts [23]. As Figure 2 illustrates, food hunting is a multi-step process: they track the prey, pursue it with their eyes, and then attack it. The following subsections explain the algorithm's mathematical models and steps.

Initialization and Function Evaluation
CSA is a population-based metaheuristic that starts the optimisation process by creating an initial population at random. A population of $n$ chameleons is generated in a $d$-dimensional search area, where each chameleon represents a candidate solution to the optimisation problem. Equation (18) describes the location of the $i$-th chameleon in the search area at iteration $t$:

$$y_t^i = [y_{t,1}^i, y_{t,2}^i, \dots, y_{t,d}^i] \quad (18)$$

where $i = 1, 2, \dots, n$, $t$ symbolises the iteration number, and $y_{t,d}^i$ symbolises the chameleon's position in the $d$-th dimension. Equation (19) creates the initial population according to the dimension of the problem and the number of chameleons in the search region:

$$y^i = l_j + r\,(u_j - l_j) \quad (19)$$

where $y^i$ is the $i$-th chameleon's initial vector, $u_j$ and $l_j$ refer to the search space's upper and lower bounds in dimension $j$, respectively, and $r$ is a uniformly random number between zero and one. For each new position, the quality of the solution is evaluated using the objective function.
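The random initialisation and the first fitness evaluation can be sketched as follows. The sphere function is only a stand-in objective chosen for illustration (the study's actual objective is the network loss), and the bounds, population size, and function names are hypothetical.

```python
import random

def init_population(n, lower, upper, seed=None):
    """Draw n positions uniformly between the per-dimension bounds."""
    rng = random.Random(seed)
    return [[l + rng.random() * (u - l) for l, u in zip(lower, upper)]
            for _ in range(n)]

def sphere(position):
    """Stand-in objective for the example: sum of squares (lower is better)."""
    return sum(x * x for x in position)

# Five candidate solutions in a two-dimensional search space.
population = init_population(n=5, lower=[-1.0, 0.0], upper=[1.0, 10.0], seed=42)
fitness = [sphere(p) for p in population]
best = population[fitness.index(min(fitness))]
print(best, min(fitness))
```

Each later iteration moves the positions and re-evaluates the objective in the same way, keeping track of the best position found so far.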

Search of Prey
Equation (20) describes the chameleons' movement during the search in terms of how they update their position. In it, $p_1$ and $p_2$ are two positive values that govern the capacity for exploration; $r_1$, $r_2$, and $r_3$ are uniform random numbers between 0 and 1; $r_i$ is a uniform random number in the interval 0-1 generated at index $i$; $P_p$ is the likelihood that the chameleon will perceive prey; $\mathrm{sgn}(\mathrm{rand} - 0.5)$, which is either -1 or 1, influences how exploration and exploitation are carried out; and $\mu$ is a function of the iteration count that decreases as the iterations increase.

Chameleon's Eyes Rotation
Using their eyes, chameleons can determine the location of their prey. Because their eyes rotate, they can look for prey through a full 360 degrees [24]. The subsequent actions take place as follows:
• The chameleon's initial position serves as its centre of gravity (i.e., the origin);
• The rotation matrix that reveals the location of the prey is found;
• The chameleon's position is updated using the rotation matrix at the centre of gravity;
• Finally, the chameleons are returned to their original position.

Hunting Prey
When their prey gets close enough, chameleons attack. The best chameleon is the one closest to the prey and is taken to be the optimal solution. This chameleon attacks its prey with its tongue. Its position is updated accordingly, since a chameleon can extend its tongue to twice its original length. As a result, the chameleon can successfully snatch the prey and exploit the search space [24]. Equation (21) gives the velocity of the chameleon's tongue as it extends towards the prey.

Artificial Gorilla Troops Optimizer
The AGTO emulates the way of life of a gorilla troop in the wild. Like other metaheuristic optimisation techniques, the GTO consists of two processes, exploration and exploitation [25,26]. The best solution is represented by the silverback gorilla, and the positions of the gorillas and of the candidate solutions are denoted by X and GX, respectively. The phases of the GTO are described below [27][28][29].

Exploration Phase
Three mechanisms underpin the GTO's exploration phase: gorillas moving into new areas, gorillas moving to familiar locations, and gorillas moving towards one another. The transitions between these motions are controlled by an adjustable operator (P). In the corresponding update equations, $UB$ is the upper limit of a control variable and $LB$ is its lower boundary; $r_1$, $r_2$, $r_3$, and $r_4$ are random numbers in [0, 1]; and the value of $l$ varies between -1 and 1.

Exploitation Phase
The silverback gorilla, the troop's leader, is followed by the male and female gorillas. However, when the silverback ages or dies, blackback (young male) gorillas may replace it, or they begin to struggle for control of the females and the leadership position.

The Equilibrium Optimizer
The EO is an effective optimizer that mimics the dynamic mass balance of a control volume. The search agents are represented by concentrations that evolve towards the dynamic equilibrium state. The mass balance is represented by the following equation:

$$V \frac{dC}{dt} = Q\,C_{eq} - Q\,C + G$$

where $V$, $Q$, and $C$ are, in that order, the volume, the flow rate, and the concentration; $C_{eq}$ is the concentration at equilibrium and $G$ is the mass generation rate inside the control volume.
Solving this equation yields the concentration update rule, in which $\lambda = Q/V$, and $t_0$ and $C_0$ are the starting time and the initial concentration. The EO technique builds a vector pool ($C_{eq,pool}$) comprising the top four solutions and the average solution, as shown below:

$$C_{eq,pool} = \{C_{eq(1)}, C_{eq(2)}, C_{eq(3)}, C_{eq(4)}, C_{eq(ave)}\}$$

In the EO's main update equation, $\lambda$ and $r$ are two randomly generated vectors; $a_1$ and $a_2$ are constants chosen to be 2 and 1, respectively; $r_1$ and $r_2$ are random numbers between 0 and 1; and the constant $GP$ was chosen to be 0.5. In the memory-saving step of the EO, the generated solution is compared with the previous solution in memory, which is replaced if the new solution proves to be superior.
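To make the roles of the pool, the constants, and the memory-saving step concrete, the following sketch performs one EO-style update. It assumes the commonly published form of the EO update rule (exponential term, generation rate, and time schedule), which should be checked against the original EO paper; the function names, the unit volume, and the sample population are assumptions made for the example, and the population is assumed to hold at least four candidates.

```python
import math
import random

def equilibrium_pool(population, fitness):
    """Top four candidates (lowest fitness first) plus their average."""
    ranked = [p for _, p in sorted(zip(fitness, population), key=lambda pair: pair[0])]
    top4 = ranked[:4]
    average = [sum(col) / len(top4) for col in zip(*top4)]
    return top4 + [average]

def eo_update(c, c_eq, iteration, max_iter, rng, a1=2.0, a2=1.0, gp=0.5, volume=1.0):
    """Move concentration c towards the equilibrium candidate c_eq."""
    t = (1.0 - iteration / max_iter) ** (a2 * iteration / max_iter)
    r1, r2 = rng.random(), rng.random()
    gcp = 0.5 * r1 if r2 >= gp else 0.0       # generation rate control parameter
    new_c = []
    for c_j, ceq_j in zip(c, c_eq):
        lam = 1.0 - rng.random()              # in (0, 1], avoids division by zero
        sign = 1.0 if rng.random() >= 0.5 else -1.0
        f = a1 * sign * (math.exp(-lam * t) - 1.0)   # exponential term
        g = gcp * (ceq_j - lam * c_j) * f             # generation rate
        new_c.append(ceq_j + (c_j - ceq_j) * f + g / (lam * volume) * (1.0 - f))
    return new_c

def keep_better(old, new, objective):
    """Memory-saving step: keep the new solution only if it improves the objective."""
    return new if objective(new) < objective(old) else old

# Hypothetical population and fitness values (lower is better).
population = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [9.0, 9.0]]
fitness = [sum(x * x for x in p) for p in population]
pool = equilibrium_pool(population, fitness)
rng = random.Random(1)
candidate = eo_update(population[4], rng.choice(pool), iteration=3, max_iter=10, rng=rng)
updated = keep_better(population[4], candidate, lambda p: sum(x * x for x in p))
```

As the iteration counter approaches its maximum, the time term shrinks to zero, so each concentration is pulled onto the chosen equilibrium candidate and the search shifts from exploration to exploitation.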
The main drawbacks of the GTO are its tendency towards local optima and stagnation. The phases of the conventional GTO are therefore integrated with the EO technique's exploration and exploitation stages to enhance its search capabilities. The AGTEO presented here aims to integrate the EO and the GTO. By combining the three exploration operators of the GTO (motion to a known location, motion to an unknown location, and motion to other gorillas) with the exploration technique of the EO, which is a particle's memory-saving approach, the proposed hybrid can search effectively. Moreover, the hybrid algorithm combines the respective exploitation techniques of the GTO and the EO: the EO's concentration updating and the GTO's motion towards the best solution, or silverback. The current iteration of the optimisation process is represented by the symbol t. The process is repeated until the stopping criterion is met, that is, until the iteration count reaches the maximum number of iterations.

GRU Classification
The development of recurrent neural networks (RNNs), particularly LSTM networks, effectively resolved a significant problem with fully connected neural networks. The problem is that a lot of information is lost over time or space in such networks, which leads to vanishing and exploding gradients [30]. Introducing gates into the LSTM network effectively addressed both problems. Information flow in an LSTM network is regulated by input, output, and forget gates. Thanks to this breakthrough, LSTM can identify and learn from sequences with long-range dependencies. The GRU, which is essentially an improved LSTM model, is another significant development in this regard. The GRU achieves a simpler network topology and fewer training parameters while maintaining the training efficacy of LSTM. The GRU is therefore well suited to handling sequential data, such as LIB charging and discharging profiles. Figure 3 shows the structure of the GRU, including the update gate. The update gate $z_t$ and the reset gate $r_t$ are the two key gates at the centre of this architecture. These gates are crucial for regulating data flow and capturing temporal dependencies in the data.
Essentially, the update gate combines the roles of the input and forget gates of a traditional recurrent (LSTM) network. It chooses which new information to include and which information from the previous state to keep. Equation (47) gives the formula for $z_t$:

$$z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z) \quad (47)$$

where $z_t$ is the GRU model's update gate at time step $t$, which determines how much new information is added and how much of the previous hidden state is preserved. The sigmoid activation function $\sigma$ squeezes values into the range between 0 and 1. The update gate's weight matrix, $W_z$, determines the relative importance of the previous hidden state $h_{t-1}$ and the input $x_t$. $[h_{t-1}, x_t]$ is the concatenation of the previous hidden state $h_{t-1}$ and the current input $x_t$, which forms the input of the update gate. $b_z$ is a bias term that shifts the decision boundary of the sigmoid to adjust the gate's behaviour in Equation (47).

The reset gate, symbolized by $r_t$, is another essential part of the design. It sets how much of the previous state is reset or erased when the current state is calculated. By adjusting this memory reset, $r_t$ enables the network to interpret data and identify pertinent patterns and dependencies while keeping or discarding specific details from earlier time steps. The formula for $r_t$ is given in Equation (48):

$$r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r) \quad (48)$$

where $r_t$ (the reset gate) defines how much of the previous hidden state $h_{t-1}$ is reset or neglected when the current input $x_t$ is used to compute the new hidden state at each time step $t$. By squeezing the weighted sum of the input and the previous hidden state, the sigmoid activation function $\sigma$ ensures that $r_t$ takes values in the range of 0 to 1, regulating the amount of reset in Equation (48).
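A single GRU time step can be written out in plain Python to show how the two gates interact. The update and reset gates follow the gate equations described in this section; the candidate state and the final blending are not given in the text, so the sketch assumes the standard GRU form $\tilde{h}_t = \tanh(W_h \cdot [r_t \odot h_{t-1}, x_t] + b_h)$ and $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$. The parameter names and the toy values are hypothetical.

```python
import math

def sigmoid(v):
    """Logistic function, written to avoid overflow for large negative inputs."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)

def affine(weights, bias, vector):
    """Compute W . vector + b for a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]

def gru_step(x_t, h_prev, params):
    """One GRU update; returns the new hidden state and both gate activations."""
    concat = h_prev + x_t                                   # [h_{t-1}, x_t]
    z = [sigmoid(v) for v in affine(params["W_z"], params["b_z"], concat)]
    r = [sigmoid(v) for v in affine(params["W_r"], params["b_r"], concat)]
    gated = [r_i * h_i for r_i, h_i in zip(r, h_prev)] + x_t  # [r_t * h_{t-1}, x_t]
    h_cand = [math.tanh(v) for v in affine(params["W_h"], params["b_h"], gated)]
    h_new = [(1.0 - z_i) * h_i + z_i * c_i
             for z_i, h_i, c_i in zip(z, h_prev, h_cand)]
    return h_new, z, r

# Toy example: hidden size 2, input size 1, all weights and biases set to zero.
hidden, inputs = 2, 1
zeros = [[0.0] * (hidden + inputs) for _ in range(hidden)]
params = {"W_z": zeros, "b_z": [0.0] * hidden,
          "W_r": zeros, "b_r": [0.0] * hidden,
          "W_h": zeros, "b_h": [0.0] * hidden}
h_new, z, r = gru_step([0.3], [1.0, -2.0], params)
print(h_new, z, r)
```

With zero weights both gates evaluate to 0.5 and the candidate state is zero, so the new hidden state is exactly half of the previous one, which makes the blending role of the update gate easy to see.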

Experimental Setup
The efficacy of the DL models was evaluated in the WEKA 3.8.6 environment. WEKA is a data mining program released under the GNU General Public Licence. Alongside its extensive model library, it offers many features, including data preparation and visualisation.

Performance Metrics
The performance metrics used to compare the outcomes of the recommended strategy include the F-score, Precision, Sensitivity, Accuracy, Specificity, and Negative Predictive Value (NPV). These criteria measure the recommended model's classification performance. In the formulas below, "FN", "FP", "TN", and "TP" stand for "false negative", "false positive", "true negative", and "true positive", respectively.

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

$$\text{Specificity (SPEC)} = \frac{TN}{TN + FP}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{NPV} = \frac{TN}{TN + FN}$$

$$\text{F-score} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}$$

On Dataset 1 and Dataset 2, the HAGTEO-GRU model proves its ability to correctly classify cases of thyroid disease. In addition, the model performs well in terms of sensitivity, specificity, precision, and F1-score, which measure how well it identifies positive cases and avoids false positives. It is notable that HAGTEO-GRU consistently outperforms alternative models such as SMO-GRU, SLO-GRU, IAOA-GRU, and HAGTSO-GRU on all performance metrics. Additionally, the low FPR and FNR values confirm the model's proficiency in minimizing misclassifications. Taken as a whole, these results underline the effectiveness of HAGTEO as a feature selector, making it a promising and reliable choice for thyroid disease detection in clinical practice.
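The metrics listed in this section can all be computed from the four confusion-matrix counts. The sketch below uses the standard definitions; the counts are hypothetical and are not taken from the paper's experiments, and the function assumes that none of the denominators is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall, true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "npv": tn / (tn + fn),
        "f_score": 2 * precision * sensitivity / (precision + sensitivity),
        "fpr": fp / (fp + tn),            # false positive rate = 1 - specificity
        "fnr": fn / (fn + tp),            # false negative rate = 1 - sensitivity
    }

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Reporting FPR and FNR alongside accuracy is useful on imbalanced medical data, where a high accuracy alone can hide a large share of missed positive cases.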
When the classification results from the two datasets are compiled, as shown in Table 3 and Figure 5, the proposed model, HAGTEO-GRU, demonstrates a clear performance advantage in thyroid disease detection. On Dataset 1, HAGTEO-GRU achieves an accuracy of 98.0681%, beating alternatives such as LSTM, 1D-CNN, IAOA-EL, and HAGTSO-HDL. In particular, HAGTEO-GRU proves to be highly sensitive, specific, and precise, accurately identifying positive samples while reducing both false positives and false negatives. Its F1-score and MCC values reflect the model's ability to select features well and classify them, further highlighting its balanced performance. On Dataset 2, HAGTEO-GRU maintains its high standard with an accuracy of 98.00424%, beating the other models on all evaluation metrics. The consistently high precision, recall, and F1-score demonstrate HAGTEO-GRU's stability across datasets. Together, these results attest to the utility of HAGTEO-GRU as a promising and reliable instrument for thyroid disease detection, and its contribution to higher accuracy in clinical diagnosis is of considerable significance.
To compare the proposed GRU model with a standard RNN model, this paper examines ROC curves, confusion matrices, and model loss-accuracy plots. Figure 6 shows the ROC curve of the RNN, which depicts its trade-off between true positive and false positive rates, and Figure 9 shows the ROC curve of the proposed GRU model for direct comparison. The proposed GRU model has the better ROC curve, being both more sensitive and more specific than the RNN. Confusion matrices for the RNN and the proposed GRU model are given in Figures 7 and 10, respectively. These matrices give a global perspective of the models' performance across all classes. The proposed GRU model classifies examples consistently better than the RNN, as is apparent from the higher values along the diagonal, which represent correct predictions. Training results over epochs for the RNN and the proposed GRU model are presented in Figures 8 and 11, respectively. The proposed model converges faster to lower loss values and higher accuracy, demonstrating its efficiency in learning and generalizing patterns from the dataset. In sum, these visualizations emphasize the superior performance of the proposed GRU model relative to the standard RNN model. The ROC analysis, confusion matrices, and training plots together point to the greater discriminative power, accuracy, and efficiency of the proposed GRU model in thyroid disease detection. These findings highlight the promise of the GRU as a powerful classifier that can help improve diagnostic accuracy in clinical settings.

Conclusion
This paper presents a comprehensive model for thyroid disease detection, encompassing state-of-the-art techniques from data preprocessing to feature extraction, selection, and classification. The methodology addresses diagnostic challenges in thyroid disease with a focus on enhancing efficiency and accuracy. Initial preprocessing involves outlier detection using Isolated Forest and normalization to ensure a clean, standardized dataset, which underpins the reliability of the subsequent stages. Feature extraction employs the AlexNet architecture augmented with an improved Chameleon Swarm Algorithm (CSA) to identify subtle data structures, enhancing feature discrimination. Deep learning is used for its efficacy in handling complex, high-dimensional data. Feature selection employs the HAGTEO optimization approach, which combines the Artificial Gorilla Troops Optimizer (AGTO) and the Equilibrium Optimizer (EO) to reduce dimensionality and enhance classification effectiveness. The Gated Recurrent Unit (GRU) classifier leverages temporal relationships for precise disease classification. Testing on two datasets demonstrates high accuracy (98.0681% and 98.00424% for dataset 1 and dataset 2, respectively), outperforming traditional methods. However, limitations include potential biases in the datasets and limited evidence of generalization ability. Future research could explore additional advanced deep learning architectures, incorporate more diverse datasets, and conduct real-world clinical trials to further refine and expand the proposed framework.
The assessment was conducted on a single system with the following configuration: Processor: Intel(R) Core (TM) i7-97250H CPU @ 2.60 GHz; Memory: 16 GB; OS: Windows 10 Home, 64-bit, on an x64-based CPU.
According to the feature selection results in Table 2 and Figure 4, the HAGTEO-GRU model consistently outperforms the other methods across a variety of evaluation metrics on both Dataset 1 and Dataset 2. More importantly, HAGTEO-GRU reaches the highest accuracy, with 98.0681% and 98.00424% of correct classifications on Dataset 1 and Dataset 2, respectively.

Figure 3. The basic structure of a GRU

Figure 8. Model loss and model accuracy of RNN

AUC-ROC of 95.48%, F2-score of 92.01%, and accuracy score of 90.65% were also achieved by the model on the histopathological dataset. This work takes advantage of the novel combination of various algorithms to enhance the diagnosis of thyroid cancer.

Table 2. Effectiveness evaluation of the developed GRU-based thyroid disease prediction model among various metaheuristic algorithms

Table 3. Performance estimation of the developed GRU-based thyroid disease detection technique among distinct conventional thyroid disease detection algorithms