- 20 Mar, 2020
From our real experience as customers and of e-commerce companies, personalized size and fit recommendation is a crucial problem for any e-commerce platform. When customers purchase products online, it is difficult to find products with the right fit. A poor fit hurts the user experience, and returning ill-fitting products increases the platform's costs. To provide a better online shopping experience, platforms need ways to recommend the right product sizes and the best-fitting products to their customers. Predicting the correct fit drives customer satisfaction and benefits the business by reducing costs incurred from size-related returns.
To solve this problem, predicting the true size of a product for a specific user is an important predictive task. Tackling it requires a suitably large dataset with relevant features; exploring the data's statistical properties; selecting appropriate models; applying optimization approaches; weighing each model's strengths and weaknesses; avoiding overfitting; and measuring the models' effectiveness and predictive power.
In this case, let's consider a latent factor model for recommending product size, with outcomes Small, Fit, and Large, to customers. The latent factors for customers and products correspond to their true physical sizes, and these are learned from past purchase and return data. The outcome for a customer-product pair is predicted from the difference between the customer's and the product's true sizes, and efficient algorithms are proposed for computing the true-size values that minimize two loss-function variants. Here, we implement a hinge loss instead of the squared-error (MSE) loss, since there are three ordinal labels (Small, Fit, Large) rather than a continuous target. Once we have learned the true sizes of customers and products, we can use this system to recommend the right product size for each customer.
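A minimal sketch of this idea follows. The transactions, the fit threshold `b`, the learning rate, and the plain SGD loop are all illustrative assumptions, not the paper's actual algorithm; the point is how a hinge-style loss on the size difference handles the three labels.

```python
import numpy as np

# Hypothetical toy transactions: (customer, product, outcome), where the
# outcome records whether the product ran 'small', 'fit', or 'large'.
transactions = [(0, 0, 'fit'), (0, 1, 'large'), (1, 0, 'small'),
                (1, 2, 'fit'), (2, 1, 'fit'), (2, 2, 'large')]
n_customers, n_products = 3, 3

t_c = np.zeros(n_customers)  # latent true sizes of customers
t_p = np.zeros(n_products)   # latent true sizes of products
b, lr = 0.5, 0.05            # assumed fit threshold and learning rate

def loss_and_grad(s, y):
    """Hinge-style loss on the signed size difference s = t_c - t_p.
    'small' -> product smaller than customer: want s >  b
    'large' -> product larger than customer:  want s < -b
    'fit'   -> want |s| <= b
    Returns (loss, d loss / d s)."""
    if y == 'small':
        return max(0.0, b - s), (-1.0 if s < b else 0.0)
    if y == 'large':
        return max(0.0, b + s), (1.0 if s > -b else 0.0)
    return max(0.0, abs(s) - b), (np.sign(s) if abs(s) > b else 0.0)

def total_loss():
    return sum(loss_and_grad(t_c[c] - t_p[p], y)[0] for c, p, y in transactions)

before = total_loss()
for _ in range(200):                 # plain SGD over all transactions
    for c, p, y in transactions:
        _, g = loss_and_grad(t_c[c] - t_p[p], y)
        t_c[c] -= lr * g             # ds/dt_c = +1
        t_p[p] += lr * g             # ds/dt_p = -1
after = total_loss()
```

After training, a customer-product pair is predicted Small, Fit, or Large by comparing the learned size difference against the same threshold `b`.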
Because the dataset has special properties such as sparsity and class imbalance, we can try large margin nearest neighbor (LMNN) classification, which learns a modified distance metric, to improve k-nearest neighbor (kNN) classification by weighting or scaling specific dimensions, for instance keeping the x-axis unchanged while rescaling the y-axis. We can then also utilize logistic regression and optimize it.
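As an illustration of the dimension-weighting idea (not the full LMNN optimization, which learns the metric from the data), here is a kNN classifier with a hand-chosen diagonal weighting of the feature axes; the toy data is assumed, with the x-axis informative and the y-axis pure noise:

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, x, w, k=3):
    """kNN with a diagonal (per-dimension) weighted Euclidean distance.
    w scales each feature axis; w = ones recovers plain kNN."""
    d = np.sqrt((((X_train - x) ** 2) * w).sum(axis=1))
    nearest = y_train[np.argsort(d)[:k]]
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]

# Assumed toy data: the x-axis separates the classes, the y-axis is noise.
X_train = np.array([[0.0, 0.0], [0.2, 8.0], [1.0, 0.5], [1.2, 8.5]])
y_train = np.array([0, 0, 1, 1])
query = np.array([0.1, 4.0])

plain = weighted_knn_predict(X_train, y_train, query, w=np.ones(2), k=1)
scaled = weighted_knn_predict(X_train, y_train, query, w=np.array([1.0, 0.01]), k=1)
# Down-weighting the noisy y-axis moves the prediction from class 1 to the
# correct class 0.
```

In real LMNN the weight matrix is learned so that same-class neighbors are pulled together and differently-labeled points are pushed outside a margin; the hand-set `w` above only demonstrates the effect of such a rescaling.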
To compare performance against baselines, we can try Jaccard similarity, simple logistic regression, a Support Vector Machine (SVM), and gradient boosting classification. For evaluating the model, we use the Area Under the ROC Curve (AUC) instead of simple accuracy. Accuracy is biased by the class distribution of the test data, whereas AUC is a better measure of classifier performance because it is insensitive to the class balance of the test or evaluation set. Moreover, I chose Jaccard similarity as the baseline against which to compare the models.
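A self-contained sketch of why AUC is preferred on imbalanced data, using the rank-statistic (Mann-Whitney) form of AUC, together with the Jaccard similarity baseline; the toy labels and scores are assumed:

```python
import numpy as np

def auc(y_true, scores):
    """AUC as the probability that a random positive is scored above a
    random negative (Mann-Whitney statistic); ties count half."""
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

def jaccard(a, b):
    """Jaccard similarity of two sets, e.g. items two users both purchased."""
    return len(a & b) / len(a | b)

# Imbalanced test set: 9 negatives, 1 positive.
y = np.array([0] * 9 + [1])
trivial = np.zeros(10)  # classifier that gives every example the same score
accuracy = ((trivial >= 0.5).astype(int) == y).mean()  # 0.9: looks great
trivial_auc = auc(y, trivial)                          # 0.5: exposes it
```

The trivial classifier reaches 90% accuracy simply by matching the majority class, but its AUC of 0.5 correctly reveals that it ranks positives no better than chance, which is why AUC is the fairer metric here.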