Improvement in product categorization from machine learning algorithms and similarity coefficients

Authors

  • Maicom Sergio Brandão Universidade Federal de São Carlos (UFSCar), São Carlos, SP
  • Moacir Godinho-Filho Universidade Federal de São Carlos (UFSCar), São Carlos, SP
  • Walther Azzolini Junior Universidade de São Paulo, São Carlos, SP
  • Bruna Christina Battissacco Universidade de São Paulo, São Carlos, SP
  • Josadak Astorino Marçola Universidade Paulista, Araraquara, SP

DOI:

https://doi.org/10.14488/1676-1901.v21i4.4483

Keywords:

Product categorization, Machine Learning, Decision-Tree, Neural Network, Naive Bayes

Abstract

Product categorization is a routine task in every business, but it has pitfalls when it relies on human judgment alone. Inconsistencies in product attributes can lead to flawed analyses and, ultimately, to wrong business decisions. Machine learning techniques can therefore help improve this process. The present study evaluated different machine learning algorithms and problem-solving strategies for categorizing products based on their descriptions, taking as its case a company that creates new products at a high rate and is therefore more susceptible to errors when this task is performed manually. Based on the best-performing algorithm, a new process that integrates technology as a support was proposed, converting the activity from manual to semiautomatic. Besides the specific benefits to the company, this study also contributes to practice by unveiling the processes of building, validating, and choosing machine learning models.
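
As a rough illustration of how a similarity coefficient can support semiautomatic categorization from descriptions, the sketch below suggests a category for a new product by finding the most similar already-labeled description. It is not the authors' method: this page does not state which coefficient, features, or thresholds the study used, so the Jaccard coefficient, the function names (`tokens`, `jaccard`, `suggest_category`), the `threshold` value, and the sample data in `LABELED` are all assumptions made for the example.

```python
import re


def tokens(description: str) -> set[str]:
    """Lowercase a description and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", description.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity coefficient |A & B| / |A | B| (0.0 if both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_category(description, labeled, threshold=0.3):
    """Return (category, score) for the most similar labeled description.

    If the best score is below `threshold` (an assumed cut-off), the category
    is None, meaning the item should go to a person for manual review.
    """
    query = tokens(description)
    best_category, best_score = None, 0.0
    for text, category in labeled:
        score = jaccard(query, tokens(text))
        if score > best_score:
            best_category, best_score = category, score
    if best_score < threshold:
        return None, best_score
    return best_category, best_score


# Hypothetical labeled catalogue, invented for this example only.
LABELED = [
    ("hex bolt steel m8 x 40", "fasteners"),
    ("hex nut steel m8", "fasteners"),
    ("ball bearing 6204 sealed", "bearings"),
    ("roller bearing tapered 30205", "bearings"),
]

if __name__ == "__main__":
    print(suggest_category("steel hex bolt m10 x 40", LABELED))
    print(suggest_category("garden hose 20 m", LABELED))
```

Returning `None` below the cut-off is what keeps the process semiautomatic in this sketch: confident matches are suggested automatically, while unfamiliar descriptions are left for a person to classify.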

Published

2022-03-25

How to Cite

Brandão, M. S., Godinho-Filho, M., Azzolini Junior, W., Battissacco, B. C., & Astorino Marçola, J. (2022). Improvement in product categorization from machine learning algorithms and similarity coefficients. Revista Produção Online, 21(4), 2093–2124. https://doi.org/10.14488/1676-1901.v21i4.4483

Section

Papers