CIE 2018 - Quebec City

The paper titled “Supplier Clustering Based on Unstructured Manufacturing Capability Data” was presented at the ASME IDETC/CIE 2018 conference in Quebec City on August 28th. Ramin Sabbagh was the primary author of this paper. In this research, we successfully demonstrated how a hybrid approach based on document clustering (the K-means method) and topic modeling (supported by LDA) can be used for automated extension of a SKOS thesaurus. Our long-term objective is to improve the performance of supervised machine learning techniques, such as document classification, with the aid of semantic models such as formal thesauri and ontologies. Once the new research findings have been included, the journal version of the paper will be submitted to JCISE in the near future.

This paper was one of the papers presented at the Smart Manufacturing Informatics symposium. Dr. Ameri was the symposium chair and organizer.

Abstract: 

The descriptions of capabilities of manufacturing companies can be found in multiple locations including company websites, legacy system databases, and ad hoc documents and spreadsheets. The capability descriptions are often represented using natural language. To unlock the value of unstructured capability information and learn from it, there is a need for developing advanced quantitative methods supported by machine learning and natural language processing techniques. This research proposes a multi-step unsupervised learning methodology using K-means clustering and topic modeling techniques in order to build clusters of suppliers based on their capabilities, extract and organize the manufacturing capability terminology, and discover nontrivial patterns in manufacturing capability corpora. The capability data is extracted either directly from the websites of manufacturing firms or from their profiles in e-sourcing portals and directories. The feature extraction and dimensionality reduction process in this work is supported by N-gram extraction and Latent Semantic Analysis (LSA) methods. The proposed clustering method is validated experimentally based on a dataset composed of 150 capability descriptions collected from web-based sourcing directories such as the Thomas Net directory for manufacturing companies. The results of the experiment show that the proposed method creates supplier clusters with high accuracy.
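
For readers who want a feel for the kind of pipeline the abstract describes, the following is a minimal sketch, not the implementation from the paper. It assumes scikit-learn is available, and the six capability descriptions, the function name cluster_suppliers, and all parameter values (two clusters, two LSA components, unigrams plus bigrams) are hypothetical choices made only for illustration. It chains N-gram TF-IDF features, LSA via truncated SVD, and K-means clustering.

```python
from sklearn.cluster import KMeans
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer


def cluster_suppliers(descriptions, n_clusters=2, n_components=2, seed=0):
    """Return one cluster label per capability description (illustrative only)."""
    lsa = make_pipeline(
        # N-gram extraction: unigram and bigram TF-IDF features.
        TfidfVectorizer(ngram_range=(1, 2), stop_words="english"),
        # LSA: truncated SVD of the TF-IDF matrix.
        TruncatedSVD(n_components=n_components, random_state=seed),
        # Unit-length rows, so Euclidean K-means behaves like cosine clustering.
        Normalizer(copy=False),
    )
    reduced = lsa.fit_transform(descriptions)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return kmeans.fit_predict(reduced)


# Hypothetical toy corpus; the paper's dataset has 150 real descriptions.
descriptions = [
    "CNC milling and turning of precision aluminum components",
    "Five-axis CNC machining, precision turning and milling services",
    "Precision CNC turning and milling for aerospace components",
    "Sand casting and investment casting of iron and steel",
    "Steel and iron foundry offering sand casting and die casting",
    "Investment casting foundry for steel and bronze alloys",
]

labels = cluster_suppliers(descriptions)
for label, text in zip(labels, descriptions):
    print(label, text)
```

On a real corpus, the number of clusters and the number of LSA components would need to be tuned, and the cluster numbering itself is arbitrary: only the grouping of suppliers is meaningful.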