Voice Disorder Recognition and Analysis Using Knowledge Engineering Techniques

Nowadays, the use of mobile applications in the healthcare sector is increasing rapidly. Mobile technologies are used not only to communicate multimedia content (e.g. clinical audio-visual notes and medical records) but also offer promising solutions for people who want to identify, monitor and treat their health conditions anywhere and at any time. Mobile e-healthcare systems can help make patient care faster, better and cheaper. Several pathological conditions can benefit from the use of mobile technologies. In this paper we focus on dysphonia, an alteration of voice quality that affects about one person in three at least once in his/her lifetime. Voice disorders are rapidly spreading, although they are often underestimated. Mobile health systems can provide easy and fast support for voice pathology detection. To realise a valid and precise mobile health system, an algorithm that discriminates more accurately between pathological and healthy voices is necessary. In the experimental evaluation, deep neural networks combined with a machine learning approach achieve an accuracy of 99.89% in detecting voice disorders. Such a system can detect abnormal voice structures and perform the analysis without human intervention, enhancing the utility of well-being systems in the healthcare sector.


INTRODUCTION
The introduction of mobile devices for data transmission and for disease control and monitoring has attracted considerable attention from both the research and business communities. Such devices offer numerous opportunities to realise efficient mobile health (m-health) systems. These solutions allow patients and doctors to access medical records, clinical audio-visual notes and drug information anywhere and at any time from their mobile devices, such as a tablet or smartphone, to monitor several conditions [1]. M-health solutions can also be used in other important applications, such as the detection and prevention of specific diseases, decision making, and the management of chronic conditions and emergencies, improving the quality of patient care and reducing healthcare costs. Several pathological conditions can be detected and monitored, such as the well-known and widespread cardiovascular diseases. In recent years, probably also owing to the diffusion of the Internet of Things (IoT) and cloud technologies, unobtrusive, portable and easy-to-use monitoring systems based on wearable sensors and wireless communications have been developed, such as the solutions described in [2]-[7]. These systems are able to monitor and analyse health data, which is helpful for patients suffering from cardiovascular diseases or undergoing physical therapy. While health monitoring systems for cardiovascular diseases are well established, other little-known and often underestimated disorders, such as dysphonia, could also benefit from m-health solutions. Dysphonia is a disorder that occurs when voice quality, pitch and loudness are altered. About 10% of the population suffers from this disorder [8], caused mainly by unhealthy social habits and voice abuse. Unfortunately, a large number of individuals with voice disorders do not seek treatment.
Therefore, m-health systems could be an efficient support for the diagnosis and screening of voice disorders.
Clinical voice pathology detection relies on several procedures, such as acoustic analysis. Acoustic analysis estimates appropriate parameters from the voice signal to evaluate possible alterations of the vocal tract, according to the guidelines of the SIFEL protocol (Società Italiana di Foniatria e Logopedia), developed by the Italian Society of Logopedics and Phoniatrics following the instructions of the Committee for Phoniatrics of the European Society of Laryngology. It is a non-invasive examination in clinical practice, complementary to other medical tests such as the laryngoscopic examination, which is based on direct observation of the vocal folds. Several acoustic parameters are estimated to evaluate the health of the voice. Unfortunately, the accuracy of these parameters in detecting voice disorders often depends on the algorithms used to estimate them. For this reason, research has focused mainly on the study of acoustic parameters and on classification techniques able to achieve high discrimination accuracy. Recently, speech pathology research has turned its attention to machine learning techniques. In this work, we discuss the application of machine learning algorithms and feature selection methods capable of discriminating more accurately between pathological and healthy voices. In detail, we evaluate pathology recognition using patient information, such as age and gender, together with different features extracted from the voice signals. The adopted parameters are those estimated in clinical acoustic analysis, such as the Fundamental Frequency (F0), jitter, shimmer and Harmonic to Noise Ratio (HNR).
In addition, the Mel-Frequency Cepstral Coefficients (MFCC) and their first and second derivatives are used, owing to their wide application both in machine learning and in voice disorder classification, as reported in several studies. Performance is evaluated in terms of accuracy, sensitivity, specificity and area under the receiver operating characteristic (ROC) curve for each machine learning method considered.

RELATED WORK
Speech or, more generally, the voice signal is used in many kinds of application, ranging from emotion recognition to the recognition of a patient's health state. Several m-health solutions adopt these signals to estimate the state of voice health, as do systems that use voice signals to evaluate emotional condition. Voice disorder detection has often been achieved through expert system techniques, and in recent years several approaches have been developed to improve the accuracy of the discrimination between healthy and pathological voices. These studies focus on identifying parameters that measure the voice condition and on new techniques able to detect voice disorders. Among the expert system techniques in the literature, the Support Vector Machine (SVM) has been widely used in voice signal processing. Godino-Llorente et al., for example, classified pathological and healthy voices using MFCC to train and test an SVM classifier. They obtained good accuracy (95%). However, it should be observed that the dataset was small, comprising only 173 pathological and 53 healthy voices selected from the Massachusetts Eye and Ear Infirmary voice and speech lab (MEEI) database. Additionally, important information, such as the pathologies of the selected voices, is not reported in that work. SVM has also been used to estimate the presence of dysphonia, investigating four types of pathology: chronic laryngitis, cysts, Reinke's edema and spasmodic dysphonia. The authors proposed an algorithm based on MFCC and on Linear Discriminant Analysis (LDA) as a dimensionality reduction method. This algorithm identifies the presence of a pathology with moderate accuracy (86%). However, it was tested on a very limited dataset: only 70 pathological and 40 healthy voices were selected from the Saarbruecken Voice Database (SVD).

SYSTEM METHODS
In this study we analysed how accurately the main machine learning techniques discriminate pathological from healthy voices, in order to identify the most reliable one. The idea has been to integrate the best one into a valid m-health system, where the voice signal can be acquired by a mobile device, such as a smartphone or tablet, processed in real time to extract the voice features, and analysed by the machine learning classifier to detect whether a voice disorder is present, as shown in Figure 1. In detail, we have evaluated the performance of SVM, the technique most widely adopted in the literature, with respect to the choice of kernel function, as well as some other machine learning algorithms used to identify the presence of voice disorders. The analysis has been performed using the WEKA tool, one of the most commonly used tools for data mining tasks, selected for its efficiency, versatility and affordability. In the following subsections we introduce the dataset used in this study, the features extracted from the voice signal and used for the classification, and the machine learning techniques compared.
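The comparison of classifiers described above requires a fixed evaluation protocol. The text does not state which one was used, so the following is a minimal sketch assuming k-fold cross-validation (10 folds is a common choice and is the default in the WEKA Explorer). It is written in plain Python to show the idea and is not WEKA's API; `cross_validate`, `fit_threshold` and `predict_threshold` are hypothetical names, and the one-feature threshold classifier is only a stand-in for SVM, decision trees or the other methods compared.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]  # hypothetical: one feature vector per voice recording


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle the indices 0..n-1 and deal them into k nearly equal folds (requires k <= n)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(
    X: Sequence[Vector],
    y: Sequence[int],
    fit: Callable[[List[Vector], List[int]], object],
    predict: Callable[[object, Vector], int],
    k: int = 10,
) -> float:
    """Mean accuracy over k folds: train on k-1 folds, test on the held-out fold."""
    accuracies = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / len(accuracies)


def fit_threshold(X: List[Vector], y: List[int]) -> float:
    """Stand-in classifier: threshold halfway between the two class means of feature 0."""
    pos = [x[0] for x, label in zip(X, y) if label == 1]
    neg = [x[0] for x, label in zip(X, y) if label == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict_threshold(threshold: float, x: Vector) -> int:
    return 1 if x[0] > threshold else 0  # 1 = pathological, 0 = healthy


# Made-up, perfectly separable toy data (one jitter-like value per voice).
healthy = [[0.2 + 0.04 * i] for i in range(10)]
pathological = [[1.5 + 0.05 * i] for i in range(10)]
X = healthy + pathological
y = [0] * 10 + [1] * 10
mean_acc = cross_validate(X, y, fit_threshold, predict_threshold, k=5)
```

Swapping in a different classifier only means passing different `fit` and `predict` callables, which is what makes the comparison between techniques fair: every method sees exactly the same folds.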

FEATURES EXTRACTION
Feature extraction is an important step that improves the subsequent analysis and classification. The features of the speech signal used in this study were chosen on the basis of two considerations. On the one hand, we used the main parameters adopted by specialists during clinical evaluation; on the other, we chose the main features used in several related studies in the literature that apply machine learning techniques to voice classification.
In detail, the parameters used in clinical practice are:
• Fundamental Frequency (F0): this represents the rate of vibration of the vocal folds and constitutes an important index of laryngeal function. It is the basis of the other parameters calculated in the acoustic analysis and of most noise estimation methods.
• Jitter: this describes the instabilities of the oscillating pattern of the vocal folds, quantifying the cycle-to-cycle changes in fundamental frequency.
• Shimmer: this indicates the instabilities of the oscillating pattern of the vocal folds, quantifying the cycle-to-cycle changes in amplitude.
• Harmonic to Noise Ratio (HNR): this quantifies the ratio of signal information over noise due to turbulent airflow, resulting from an incomplete vocal fold closure in speech pathologies.
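The four clinical parameters above can be sketched as follows. The text does not specify which variants were used, so this assumes the common "local" definitions of jitter and shimmer (as, for example, in Praat): the mean absolute difference between consecutive cycles divided by the mean, as a percentage. HNR is computed from the peak `r_max` of the normalised autocorrelation, following Boersma's autocorrelation method. The glottal periods and per-cycle peak amplitudes are assumed to come from an upstream pitch-marking step that is not shown; all function names are hypothetical.

```python
import math
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def _relative_mean_abs_diff(values: Sequence[float]) -> float:
    """Mean |v[i+1] - v[i]| divided by the mean of v, as a percentage."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return 100.0 * _mean(diffs) / _mean(values)


def fundamental_frequency(periods_s: Sequence[float]) -> float:
    """F0 in Hz as the reciprocal of the mean glottal period (periods in seconds)."""
    return 1.0 / _mean(periods_s)


def local_jitter(periods_s: Sequence[float]) -> float:
    """Cycle-to-cycle variation of the period (%)."""
    return _relative_mean_abs_diff(periods_s)


def local_shimmer(peak_amplitudes: Sequence[float]) -> float:
    """Cycle-to-cycle variation of the peak amplitude (%)."""
    return _relative_mean_abs_diff(peak_amplitudes)


def hnr_db(r_max: float) -> float:
    """HNR in dB from the normalised autocorrelation peak, 0 < r_max < 1."""
    return 10.0 * math.log10(r_max / (1.0 - r_max))


# Made-up values for four consecutive glottal cycles.
periods = [0.0100, 0.0102, 0.0099, 0.0101]
amplitudes = [1.0, 0.9, 1.0, 0.9]
f0 = fundamental_frequency(periods)   # about 99.5 Hz
jitter = local_jitter(periods)        # about 2.32 %
shimmer = local_shimmer(amplitudes)   # about 10.5 %
hnr = hnr_db(0.99)                    # about 19.96 dB
```

Because jitter and shimmer share the same relative-difference formula, a single helper serves both; only the input sequence (periods or amplitudes) changes.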
The parameters used in other related studies are:
• Mel-Frequency Cepstral Coefficients (MFCC): these coefficients try to analyse the vocal tract independently of the vocal folds, which can be damaged by voice pathologies. In this work, the experiments were conducted using 13 MFCC coefficients.
• First and second derivatives of the cepstral coefficients: these are useful to investigate the dynamic behaviour of the speech signal.
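The first and second derivatives of the cepstral coefficients are usually obtained with a regression over neighbouring frames, d_t = Σ_{n=1..N} n (c_{t+n} − c_{t−n}) / (2 Σ_{n=1..N} n²). The following is a minimal sketch of that standard formula, not necessarily the exact variant used in the study: the window half-width N = 2 and the repetition of the first and last frames at the edges are assumptions, and `deltas` is a hypothetical name. Applying it to its own output gives the second derivative.

```python
from typing import List, Sequence

Frame = List[float]  # one row of cepstral coefficients (e.g. 13 MFCCs) per analysis frame


def deltas(frames: Sequence[Frame], N: int = 2) -> List[Frame]:
    """Regression-based time derivative of every coefficient over +/-N frames.

    Edge frames are repeated so that every frame has N neighbours on each side.
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out: List[Frame] = []
    for t in range(T):
        row = []
        for j in range(len(frames[t])):
            acc = 0.0
            for n in range(1, N + 1):
                nxt = frames[min(t + n, T - 1)][j]  # c_{t+n}, clamped at the last frame
                prv = frames[max(t - n, 0)][j]      # c_{t-n}, clamped at the first frame
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


# Toy input: two coefficients that grow linearly over six frames.
frames = [[float(t), 2.0 * t] for t in range(6)]
d1 = deltas(frames)   # first derivative
d2 = deltas(d1)       # second derivative (delta of the deltas)
```

For frames far enough from the edges, a coefficient that grows linearly yields a delta equal to its slope; near the edges the repeated frames pull the estimate down, which is the usual price of this padding choice.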

CLASSIFICATION
Classification includes a broad range of decision-theoretic approaches to the identification of voice data. All classification algorithms are based on the assumption that the voice in question exhibits one or more features and that each of these features belongs to one of several distinct and mutually exclusive classes. These classes may be specified a priori by an analyst (supervised classification) or clustered automatically (unsupervised classification), which is another form of component labelling.
Classification algorithms typically employ two phases of processing: a training phase and a testing phase.
In the training phase, the characteristic properties of the signal features are isolated for each classification category and, on this basis, a description of each category (the training class) is formulated. In the testing phase, the resulting feature-space partitions are used to classify the voice signal features.
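The two phases can be illustrated with the simplest classifier of this kind, a minimum (mean) distance classifier: training computes the mean feature vector of each class, and testing assigns a voice to the class whose mean is nearest. This is a minimal sketch in plain Python; the two-feature layout (jitter %, shimmer %), the labels and the toy values are made up for illustration, and in practice features on different scales would be standardised first.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def train_centroids(X: Sequence[Vector], y: Sequence[str]) -> Dict[str, Vector]:
    """Training phase: the description of each class is the mean of its feature vectors."""
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], x)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(centroids: Dict[str, Vector], x: Vector) -> str:
    """Testing phase: pick the class whose mean is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Made-up training voices: [jitter %, shimmer %].
X_train = [[0.3, 2.0], [0.4, 2.5], [1.8, 7.0], [2.2, 8.0]]
y_train = ["healthy", "healthy", "pathological", "pathological"]
centroids = train_centroids(X_train, y_train)
label_a = classify(centroids, [0.5, 3.0])
label_b = classify(centroids, [2.0, 6.5])
```

The class means partition the feature space into regions separated by the perpendicular bisectors between them; this is the "feature-space partition" used in the testing phase.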

Minimum (mean) distance classifier:
Deep neural networks (DNNs) are a subset of machine learning algorithms that are very good at recognising patterns but typically require a large amount of data. In this work, the recognition of each pattern in the voice signal is implemented using a DNN, in which each layer extracts further features of the voice from the output of the previous layer.
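The layered feature extraction described above can be sketched as a forward pass through a small fully connected network. The text does not give the architecture, so everything here is hypothetical: 43 inputs (4 clinical parameters plus 13 MFCCs and their first and second derivatives), two hidden ReLU layers of 32 and 16 units, and a sigmoid output read as the probability that the voice is pathological. The weights are random, i.e. the network is untrained; training by backpropagation is omitted.

```python
import math
import random
from typing import Callable, List

Matrix = List[List[float]]  # one row per output unit; the last entry of each row is the bias


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Matrix:
    """Random Gaussian weights scaled by 1/sqrt(n_in), plus a bias per unit."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in + 1)] for _ in range(n_out)]


def dense(x: List[float], W: Matrix, activation: Callable[[float], float]) -> List[float]:
    """One fully connected layer: activation(W x + b) for every unit."""
    return [activation(sum(w * v for w, v in zip(row[:-1], x)) + row[-1]) for row in W]


def relu(z: float) -> float:
    return max(0.0, z)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(x: List[float], layers: List[Matrix]) -> float:
    h = x
    for W in layers[:-1]:
        h = dense(h, W, relu)  # each hidden layer re-describes the previous layer's output
    return dense(h, layers[-1], sigmoid)[0]  # single output unit


rng = random.Random(42)
sizes = [43, 32, 16, 1]  # hypothetical layer sizes
layers = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]
x = [0.1] * 43           # placeholder feature vector
p = forward(x, layers)
```

A production system would use a deep learning framework for training on the full dataset; this sketch only shows how each layer transforms the features produced by the one before it.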

RESULTS AND DISCUSSION
The performance of the selected machine learning classification techniques was evaluated in terms of accuracy, sensitivity, specificity and ROC area by using the following measurements:
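The standard definitions of these measures, taking pathological voices as the positive class, can be sketched as follows; this is a sketch of the usual formulas, not necessarily the exact formulation used in the study. Accuracy is (TP + TN) / (TP + TN + FP + FN), sensitivity is TP / (TP + FN), and specificity is TN / (TN + FP). The ROC area is computed here through its probabilistic interpretation: the chance that a randomly chosen pathological voice receives a higher classifier score than a randomly chosen healthy one, with ties counting one half. Function names and toy data are hypothetical.

```python
from typing import Sequence, Tuple


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Counts (TP, TN, FP, FN) with 1 = pathological, 0 = healthy."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp: int, fn: int) -> float:
    return tp / (tp + fn)  # share of pathological voices that are detected


def specificity(tn: int, fp: int) -> float:
    return tn / (tn + fp)  # share of healthy voices correctly recognised as healthy


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: scores thresholded at 0.5 give the predictions.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
tp, tn, fp, fn = confusion(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Sensitivity and specificity depend on the chosen threshold, whereas the ROC area summarises the ranking quality of the scores over all thresholds, which is why both kinds of measure are reported.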

CONCLUSION
In recent years, the use of mobile multimedia services and applications in the healthcare sector has been increasing significantly. Mobile health applications allow people to access medical information and data of interest at any time and anywhere, which is useful for the monitoring and detection of specific diseases, such as dysphonia, an often underestimated voice disorder that affects a large percentage of people. Research on mobile automatic systems for estimating voice disorders has received considerable attention in the last few years owing to its objectivity and non-invasive nature. Machine learning techniques can effectively support the investigation of new signal processing approaches that are easy and fast enough to be implemented in an m-health solution. This study compares the performance of different voice pathology identification methods, taking into account the main machine learning techniques. Several techniques are applied, such as Support Vector Machine, Decision Tree, Bayesian classification, Logistic Model Tree and instance-based learning algorithms. Moreover, in this work we focus on identifying appropriate voice signal features through a comparative study of different classifiers. All analyses are performed on a large dataset of 1370 voices selected from the Saarbruecken Voice Database.