Improvement in product categorization from machine learning algorithms and similarity coefficients
DOI:
https://doi.org/10.14488/1676-1901.v21i4.4483
Keywords:
Product categorization, Machine Learning, Decision-Tree, Neural Network, Naive Bayes
Abstract
Product categorization is an ordinary task in every business, but it involves pitfalls when it relies on human judgment alone. Inconsistencies in product attributes can lead to flawed analyses and, ultimately, to wrong business decisions. Machine learning techniques can therefore help improve this process. The present study evaluated different machine learning algorithms and problem-solving strategies for categorizing products based on their descriptions, in a company that creates new products at a high rate and is therefore more susceptible to errors when the task is performed manually. Based on the best-performing algorithm, a new technology-supported process was proposed, converting the activity from manual to semiautomatic. Besides the specific benefits to the company, this study also contributes to practice by unveiling the processes of building, validating, and choosing machine learning models.
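As a rough illustration of categorizing products from their descriptions with a similarity coefficient, the following minimal sketch suggests a category only when the closest labeled description is similar enough, and otherwise leaves the decision to a person. It is not the study's method: the choice of the Jaccard coefficient, the `threshold` value, the function names, and the tiny catalog are all assumptions made for the example.

```python
def tokens(description):
    """Lowercase a description and split it into a set of word tokens."""
    return set(description.lower().split())


def jaccard(a, b):
    """Jaccard similarity coefficient: shared tokens divided by all tokens."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def suggest_category(description, labeled, threshold=0.3):
    """Return (category, score) of the most similar labeled product.

    When the best score falls below the (assumed) threshold, return
    (None, score) so that a person categorizes the product manually.
    """
    query = tokens(description)
    best_category, best_score = None, 0.0
    for known_description, category in labeled:
        score = jaccard(query, tokens(known_description))
        if score > best_score:
            best_category, best_score = category, score
    if best_score < threshold:
        return None, best_score
    return best_category, best_score


# Hypothetical catalog of already categorized products.
LABELED = [
    ("stainless steel hex bolt m8", "fasteners"),
    ("zinc plated hex nut m8", "fasteners"),
    ("cotton work glove size l", "safety equipment"),
    ("nitrile safety glove size m", "safety equipment"),
]

if __name__ == "__main__":
    print(suggest_category("steel hex bolt m10", LABELED))
    print(suggest_category("laser printer toner", LABELED))
```

In this sketch the threshold is what makes the process semiautomatic: confident matches are suggested automatically, while unfamiliar descriptions are routed to manual review.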
License
Copyright (c) 2022 Revista Produção Online
This work is licensed under a Creative Commons Attribution 4.0 International License.
The journal reserves the right to make spelling and grammatical changes in order to maintain standard language usage, while respecting the authors' style.
The published work is the responsibility of the author(s), while Revista Produção Online is responsible only for evaluating the paper. Revista Produção Online is not responsible for any violations of Law No. 9.610/1998, the Copyright Act.
The journal allows authors to retain the copyright of accepted articles, without restrictions.