Smart E-Commerce Using Customized Algorithms

Author:
Besaleva, Liliya, Computer Science - School of Engineering and Applied Science, University of Virginia
Advisor:
Weaver, Alfred, Department of Computer Science, University of Virginia
Abstract:

Applications of machine learning algorithms can be observed in numerous places in modern life. From medical diagnosis predictions to smarter ways of shopping online, big, fast data is streaming in and being utilized constantly. Unfortunately, unusual instances of data, called imbalanced data, are still largely ignored because of the inadequacies of analytical methods that are designed to handle homogenized data sets and to “smooth out” outliers. Consequently, rare use cases of significant importance remain neglected, leading to high-cost losses or even tragedies. In the past decade, a myriad of approaches to this problem, ranging from data modifications to alterations of existing algorithms, have appeared with varying success. Yet the majority of them have major drawbacks when applied across different application domains because of the non-uniform nature of the applicable data. Within the vast domain of e-commerce, we propose an innovative approach for handling imbalanced data: a hybrid meta-classification method that combines multimodal data formats with algorithmic adaptations to achieve an optimal balance between prediction accuracy, sensitivity and specificity for multiclass imbalanced datasets. Our solution will be divided into two main phases serving different purposes. In phase one, we will classify the outliers quickly but with lower accuracy, for urgent situations that require immediate predictions and can tolerate possible classification errors. In phase two, we will perform a deeper analysis of the phase-one results, aiming to precisely identify high-cost, high-impact multiclass imbalanced data. The goal of this work is to provide a solution that improves data usability and classification accuracy and reduces the resulting costs of analyzing massive data sets in e-commerce.
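
To illustrate why the abstract asks for a balance between accuracy, sensitivity and specificity rather than accuracy alone, the following is a minimal sketch in Python. It is not the method proposed in the dissertation; the class labels ("normal", "return", "fraud"), the class counts, and the helper names `per_class_rates` and `accuracy` are hypothetical and chosen only for illustration. It computes one-vs-rest sensitivity and specificity for each class of a multiclass problem and shows that a classifier which always predicts the majority class can score high accuracy while detecting none of the rare cases.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_rates(y_true, y_pred):
    """Return {label: (sensitivity, specificity)} using one-vs-rest counts."""
    labels = sorted(set(y_true) | set(y_pred))
    rates = {}
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)  # true positives
        fn = sum(1 for t, p in pairs if t == c and p != c)  # false negatives
        fp = sum(1 for t, p in pairs if t != c and p == c)  # false positives
        tn = sum(1 for t, p in pairs if t != c and p != c)  # true negatives
        # Guard against empty denominators (a class absent from y_true, etc.).
        sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
        specificity = tn / (tn + fp) if (tn + fp) else float("nan")
        rates[c] = (sensitivity, specificity)
    return rates


# Hypothetical, made-up labels: 95 majority-class records and 5 rare ones.
y_true = ["normal"] * 95 + ["return"] * 4 + ["fraud"] * 1
# A degenerate classifier that always predicts the majority class.
y_pred = ["normal"] * 100

print("accuracy:", accuracy(y_true, y_pred))
for label, (sens, spec) in per_class_rates(y_true, y_pred).items():
    print(f"{label}: sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this made-up example the overall accuracy is 0.95, yet the sensitivity for both rare classes is 0, which is the kind of failure on imbalanced data that the abstract describes.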

Degree:
PhD (Doctor of Philosophy)
Keywords:
ecommerce, machine learning, online learning, distributed, feature selection, imbalanced data
Language:
English
Rights:
All rights reserved (no additional license for public reuse)
Issued Date:
2017/10/17