Recommender Systems: Key Machine Learning Papers
Aug 24, 2025
A curated collection of influential and recent machine learning papers focused on recommender systems. This page aims to provide researchers and practitioners with easy access to foundational and cutting-edge work in the field.
Foundational Papers
| Title | Authors | Year | Focus | Key Contribution |
|---|---|---|---|---|
| Collaborative Filtering for Implicit Feedback Datasets | Yifan Hu, Yehuda Koren, Chris Volinsky | 2008 | Implicit Data Modeling | Proposes a matrix factorization model tailored to implicit feedback datasets, which are far more common in practice than explicit ratings. |
| Blockbusters and Wallflowers: Speeding up Diverse and Accurate Recommendations with Random Walks | Fabian Christoffel, Bibek Paudel, Chris Newell, Abraham Bernstein | 2015 | Accuracy vs Diversity | Introduces random walks on item similarity graphs to generate recommendations that are both accurate and diverse, particularly for long-tail items. |
| Metadata Embeddings for User and Item Cold-start Recommendations | Maciej Kula | 2015 | Cold-start Problem | Bridges content-based information with collaborative filtering by learning metadata embeddings that predict latent user/item vectors, handling the cold-start problem. |
| Embarrassingly Shallow Autoencoders for Sparse Data | Harald Steck | 2019 | Efficient collaborative filtering | Introduces a shallow (single-layer) autoencoder for collaborative filtering, designed specifically for sparse implicit feedback data. |
| Recency Aware Collaborative Filtering for Next Basket Recommendation | Guglielmo Faggioli, Mirko Polato, Fabio Aiolli | 2020 | Recency & Frequency | Improves collaborative filtering for next-basket recommendation by accounting for the recency and frequency of past interactions. |
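Steck's Embarrassingly Shallow Autoencoder (EASE) has a closed-form solution compact enough to sketch directly. Below is a minimal NumPy sketch; the function name is my own and `lam` is an illustrative regularization strength, not a tuned value from the paper:

```python
import numpy as np

def ease_item_weights(X, lam=500.0):
    """Closed-form EASE: learn an item-item weight matrix B with zero diagonal.

    X: (num_users, num_items) binary implicit-feedback matrix (float).
    """
    G = X.T @ X                           # item-item Gram matrix
    diag = np.diag_indices(G.shape[0])
    G[diag] += lam                        # L2 regularization on the diagonal
    P = np.linalg.inv(G)
    B = P / (-np.diag(P))                 # B_ij = -P_ij / P_jj
    B[diag] = 0.0                         # constraint: no item predicts itself
    return B

# Recommendation scores for all users are then a single matrix product:
# scores = X @ B
```

Because training is one regularized least-squares solve rather than gradient descent, EASE is a strong, cheap baseline whenever the item catalog is small enough to invert an item-item matrix.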
Deep Learning for Recommendations
| Title | Authors | Year | Focus | Key Contribution |
|---|---|---|---|---|
| Learning Deep Structured Semantic Models for Web Search using Clickthrough Data | Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, Larry Heck | 2013 | Semantic matching | Introduces the Deep Structured Semantic Model (DSSM), which uses a two-tower architecture to map queries and documents into a common low-dimensional semantic space. |
| Wide & Deep Learning for Recommender Systems | Heng-Tze Cheng et al. | 2016 | Hybrid architecture | Combines a wide linear model (for memorization) with a deep neural network (for generalization) in a single jointly trained recommender. |
| Deep & Cross Network for Ad Click Predictions | Ruoxi Wang, Bin Fu, Gang Fu, Mingliang Wang | 2017 | Feature interactions | Proposes the Deep & Cross Network (DCN), which adds cross layers to explicitly model feature interactions up to high orders without manual feature engineering. |
| DeepFM: An End-to-End Wide & Deep Learning Framework for CTR Prediction | Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He, Zhenhua Dong | 2018 | Feature interactions | Introduces DeepFM, which combines a Factorization Machine (FM) component for low-order feature interactions with a deep neural network for high-order interactions, sharing the same embedding input. |
| AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks | Weiping Song, Zhijian Duan, Yewen Xu, Chence Shi, Ming Zhang, Zhiping Xiao, Jian Tang | 2019 | Interaction learning | Introduces AutoInt, which uses multi-head self-attention (as in Transformers) to learn feature interactions automatically, without manual design. |
| LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation | Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, Meng Wang | 2020 | Graph-based CF | Proposes LightGCN, a simplified graph convolutional network (GCN) that keeps only neighborhood aggregation, tailored specifically to collaborative filtering. |
| DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems | Ruoxi Wang, Rakesh Shivanna, Derek Z. Cheng, Sagar Jain, Dong Lin, Lichan Hong, Ed H. Chi | 2020 | Feature interactions | Improves DCN with a more expressive cross layer and a cost-efficient low-rank variant, and shares practical lessons from web-scale learning-to-rank deployments. |
| Self-Attentive Sequential Recommendation | Wang-Cheng Kang, Julian McAuley | 2018 | Sequential Recommendations | Introduces SASRec, a self-attention-based model for sequential recommendation, inspired by the Transformer architecture, that captures both short- and long-term dependencies in user behavior sequences. |
| BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer | Fei Sun, Jun Liu, Jian Wu, et al. | 2019 | Bidirectional context | Applies a BERT-style bidirectional Transformer to model user behavior sequences in a non-autoregressive manner. |
| Context-Aware Sequential Model for Multi-Behaviour Recommendation | Shereen Elsayed, Ahmed Rashed, Lars Schmidt-Thieme | 2023 | Contextual sequential models | Proposes the Context-Aware Sequential Model (CASM), which uses context-aware multi-head self-attention to model sequences spanning multiple behavior types. |
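The explicit feature-crossing at the heart of DCN (and retained, in low-rank form, in DCN V2) comes down to one line per layer. A minimal NumPy sketch follows, using randomly initialized (untrained) weights purely to show the shapes involved; function and variable names are illustrative:

```python
import numpy as np

def cross_network(x0, num_layers=2, seed=0):
    """DCN-style cross layers: x_{l+1} = x0 * (x_l . w_l) + b_l + x_l.

    Each layer multiplies the raw input features x0 by a learned scalar
    projection of the current representation, so stacking l layers models
    feature interactions up to degree l + 1.
    """
    rng = np.random.default_rng(seed)
    d = x0.shape[-1]
    x = x0
    for _ in range(num_layers):
        w = rng.normal(scale=0.1, size=d)    # per-layer weight vector
        b = np.zeros(d)                      # per-layer bias
        x = x0 * (x @ w)[:, None] + b + x    # cross term plus residual
    return x

features = np.random.default_rng(1).normal(size=(4, 8))  # batch of 4, 8 features
out = cross_network(features, num_layers=3)
```

The residual `+ x` term is why the network degrades gracefully: with zero weights each layer is the identity, and depth only adds interaction capacity.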
Practical Tips and Tricks
| Title | Authors | Year | Focus | Key Contribution |
|---|---|---|---|---|
| Deep Residual Learning for Image Recognition | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun | 2015 | Fixing vanishing gradients | Introduces residual blocks, which use skip connections to make very deep networks trainable. |
| Densely Connected Convolutional Networks | Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger | 2016 | Parameter efficiency | Proposes DenseNet, which connects each layer to every subsequent layer, improving accuracy and parameter efficiency while mitigating the vanishing gradient problem. |
| Gaussian Error Linear Units (GELUs) | Dan Hendrycks, Kevin Gimpel | 2018 | Efficient activation function | Proposes the GELU activation function, a high-performing nonlinearity with better gradient flow in deep networks. |
| Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems | Benjamin Coleman, Wang-Cheng Kang, Matthew Fahrbach, et al. | 2023 | Efficient Embedding | Proposes Unified Embedding with three major benefits: simplified feature configuration, strong adaptation to dynamic data distributions, and compatibility with modern hardware. |
| On Embeddings for Numerical Features in Tabular Deep Learning | Yury Gorishniy, Ivan Rubachev, Artem Babenko | 2022 | Embedding Numerical Features | Explores piecewise linear encoding and trainable periodic encoding for embedding numerical features in tabular deep learning models. |
| SMMR: Sampling-Based MMR Reranking for Faster, More Diverse, and Balanced Recommendations and Retrieval | Kiryl Liakhnovich, Oleg Lashinin, Andrei Babkin | 2025 | Performant re-ranking | Proposes Sampled Maximal Marginal Relevance (SMMR), a sampling-based extension of MMR that introduces randomness into item selection to improve the relevance-diversity trade-off. |
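Classic Maximal Marginal Relevance, which SMMR extends with sampling, can be sketched in a few lines. This is the standard greedy deterministic version, not the paper's sampled variant; names and the `lam` trade-off value are illustrative:

```python
import numpy as np

def mmr_rerank(relevance, sim, k, lam=0.7):
    """Greedy Maximal Marginal Relevance re-ranking.

    relevance: (n,) relevance scores for candidate items.
    sim: (n, n) pairwise item-similarity matrix.
    Selects k items, trading relevance against similarity to items already
    chosen; lam=1.0 reduces to plain relevance ranking.
    """
    candidates = list(range(len(relevance)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            if not selected:
                return lam * relevance[i]
            # Penalize similarity to the closest already-selected item.
            return lam * relevance[i] - (1 - lam) * max(sim[i][j] for j in selected)
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

The inner `max` over selected items makes each greedy step O(n·k), which is the cost SMMR's sampling targets: drawing candidates stochastically avoids scoring every remaining item at every step.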