Recommendation systems suggest similar products to users, and what they recommend depends on the context and the industry in which they are used. They come in multiple flavors. Some widely used recommendation systems are:
- Netflix for Movies
- Spotify for Music
- LinkedIn for Human Resources
As part of this project, I built a content-based recommendation system that recommends similar apparel products.
Content-Based recommendation systems
The idea behind a content-based recommendation system is simple: the user likes a product, and similar products are recommended based on the features of that product. This approach is useful when we have limited information about the user.
My recommendation model uses two approaches to perform content-based recommendation:
- Approach 1- Recommendation based on Product Features: Implemented Count Vectorizer and TF-IDF to generate word vectors, then used Cosine Similarity to recommend the closest products based on product description and product brand
- Approach 2- Recommendation based on Image Features: Implemented a VGG-16 Convolutional Neural Network to extract image features, then used Cosine Similarity to recommend the closest products based on those image features
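Approach 1 can be illustrated with a small, dependency-free sketch. This is not the project's actual code: the product strings, the function names, the whitespace tokenizer, and the smoothed IDF formula are all assumptions made for illustration, and a real implementation would more likely rely on a library vectorizer.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (term count * smoothed IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors

def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(query_index, docs, top_n=2):
    """Return indices of the top_n documents most similar to docs[query_index]."""
    vectors = tfidf_vectors(docs)
    scores = [
        (cosine_similarity(vectors[query_index], vec), i)
        for i, vec in enumerate(vectors)
        if i != query_index
    ]
    scores.sort(reverse=True)
    return [i for _, i in scores[:top_n]]

# Hypothetical catalogue: brand and description joined into one string.
products = [
    "acme blue floral summer dress",
    "acme red floral summer dress",
    "zenith black leather jacket",
    "zenith brown leather boots",
]
print(recommend(0, products, top_n=1))
```

In this toy catalogue the two dresses share most of their terms, so each is the other's closest match, while the jacket and boots pair up through their shared brand and material.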
By definition, cosine similarity is the cosine of the angle between two vectors, computed as their dot product divided by the product of their Euclidean norms. From a practical standpoint, cosine similarity is useful for finding similar documents (or word vectors) when the Euclidean distance might not give the right picture as the number of terms increases. This is based on the understanding that a longer document also contains more occurrences of common terms, which gives a false view of similarity when raw distances are compared.
- Fetch Data- I used the Kaggle dataset (data source below), which has details of approximately 180K women's apparel products.
- Filter out duplicates- Pass the dataset to a filtering algorithm built in Python to filter out duplicate values.
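The contrast between the two measures can be shown with a few toy term-count vectors. The vectors and function names below are made up for illustration; the sketch only assumes Python 3.8 or later for `math.dist` and multi-argument `math.hypot`.

```python
import math

def cosine(u, v):
    """Dot product divided by the product of the Euclidean norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def euclidean(u, v):
    """Straight-line distance between two vectors."""
    return math.dist(u, v)

# Hypothetical term counts over a three-word vocabulary.
short_doc = [1, 2, 0]
long_doc = [10, 20, 0]   # same term proportions, ten times longer
other_doc = [2, 0, 1]    # similar length to short_doc, different terms

# Cosine similarity ignores length: short_doc and long_doc point the same way.
print(cosine(short_doc, long_doc))    # close to 1.0
print(cosine(short_doc, other_doc))   # noticeably lower

# Euclidean distance is dominated by length: long_doc looks farther away
# from short_doc than the unrelated other_doc does.
print(euclidean(short_doc, long_doc))
print(euclidean(short_doc, other_doc))
```

Here Euclidean distance ranks the unrelated document as the closer one simply because it has a similar length, whereas cosine similarity ranks the longer document on the same topic first.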
- Train the Recommendation System-
- Capture User Requirements- Built a Web Interface in Streamlit to capture user preferences for a product
- Recommendation- Based on the user preferences captured on the UI, the model recommends similar products
To improve the quality of the recommendations and avoid recommending duplicates, I implemented an algorithm in Python that marks an item as a duplicate in the following scenarios:
- Scenario A- Same products in different sizes
- Scenario B- Products with duplicate descriptions
- Scenario C- Same products with different colors
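One possible way to flag these three scenarios is sketched below. This is an assumption about how such a filter could work, not the algorithm used in the project: the `title` and `description` field names, the size and color vocabularies, and the rule "same title once size and color words are removed" are all hypothetical.

```python
import re

# Hypothetical vocabularies; a real catalogue would need much longer lists.
SIZE_TOKENS = {"xs", "s", "m", "l", "xl", "xxl", "small", "medium", "large"}
COLOR_TOKENS = {"black", "white", "red", "blue", "green", "pink", "grey"}
IGNORED_TOKENS = SIZE_TOKENS | COLOR_TOKENS

def normalize_title(title):
    """Lowercase the title, drop punctuation, and remove size and color words."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    return " ".join(t for t in tokens if t not in IGNORED_TOKENS)

def mark_duplicates(products):
    """Return one boolean per product: True if it repeats an earlier product."""
    seen_titles, seen_descriptions = set(), set()
    flags = []
    for product in products:
        title_key = normalize_title(product["title"])
        # Collapse whitespace and case so trivially different copies match.
        desc_key = " ".join(product["description"].lower().split())
        is_duplicate = title_key in seen_titles or (
            desc_key != "" and desc_key in seen_descriptions
        )
        flags.append(is_duplicate)
        seen_titles.add(title_key)
        seen_descriptions.add(desc_key)
    return flags

# Hypothetical records covering Scenarios A, B and C plus one distinct item.
catalogue = [
    {"title": "Floral Wrap Dress - Blue, M", "description": "Light wrap dress with floral print."},
    {"title": "Floral Wrap Dress - Blue, XL", "description": "Light wrap dress, floral print, plus size."},
    {"title": "Summer Wrap Dress", "description": "Light wrap dress with floral print."},
    {"title": "Floral Wrap Dress - Red, M", "description": "Wrap dress in red."},
    {"title": "Leather Jacket", "description": "Classic biker jacket."},
]
print(mark_duplicates(catalogue))
```

The second record differs from the first only in size (Scenario A), the third reuses the first description (Scenario B), and the fourth differs only in color (Scenario C), so all three are flagged while the jacket is kept.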
I have attached a short video below that demonstrates the web app built with Streamlit.