Pest and Disease Identification with Incomplete Data

Edio da Costa, Dili Institute of Technology

Keywords: Missing data, ambiguity, similarity, pests and diseases.

Abstract

Similar symptoms cause a high degree of ambiguity in identifying pests and diseases. A second problem is that the symptoms reported by farmers are often incomplete (missing data), and because the reported symptoms resemble those of other pests and diseases, identification becomes difficult. The objective of this study is to identify pests and diseases from incomplete data. Three similarity measures, Jaccard Similarity (JS), Cosine Similarity (CS), and Dice Similarity (DS), are used to address the incomplete-data problem. The three measures are compared to find which gives the best accuracy in identifying pests and diseases of rice plants from incomplete symptom data. The experimental results show that DS achieved the highest accuracy, ahead of JS and CS.
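As a rough illustration of the three measures named in the abstract, the sketch below scores an incomplete symptom report against candidate profiles. It assumes symptoms are represented as sets of labels and uses the set-based (binary) form of each measure; the study itself may use weighted term vectors instead. The symptom labels, profile names, and the `rank` helper are hypothetical and are not taken from the paper.

```python
import math


def jaccard(a, b):
    """Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Dice similarity: 2|A ∩ B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """Cosine similarity for binary sets: |A ∩ B| / sqrt(|A| * |B|)."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def rank(reported, profiles, measure):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, measure(reported, symptoms))
              for name, symptoms in profiles.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: each profile lists the full symptom set of a pest
# or disease; the farmer's report covers only part of it.
profiles = {
    "disease_a": {"s1", "s2", "s3"},
    "disease_b": {"s1", "s4", "s5"},
}
reported = {"s1", "s2"}  # incomplete report: s3 was not mentioned

for measure in (jaccard, cosine, dice):
    print(measure.__name__, rank(reported, profiles, measure))
```

With this toy input all three measures rank `disease_a` first, but they assign different scores to the same overlap. Differences in accuracy between the measures can only appear on data where the rankings themselves diverge.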

