Recommender Systems: Key Machine Learning Papers
Aug 24, 2025
A curated collection of influential and recent machine learning papers focused on recommender systems. This page aims to provide researchers and practitioners with easy access to foundational and cutting-edge work in the field.
Foundational Papers
| Title | Authors | Year | Focus | Key Contribution |
|---|---|---|---|---|
| Collaborative Filtering for Implicit Feedback Datasets | Yifan Hu, Yehuda Koren, Chris Volinsky | 2008 | Implicit Data Modeling | Proposes a matrix factorization model tailored to implicit feedback datasets, which are far more common in practice than explicit ratings. |
| Blockbusters and Wallflowers: Speeding up Diverse and Accurate Recommendations with Random Walks | Fabian Christoffel, Bibek Paudel, Chris Newell, Abraham Bernstein | 2015 | Accuracy vs Diversity | Introduces random walks on item similarity graphs to generate recommendations that are both accurate and diverse, particularly for long-tail items. |
| Metadata Embeddings for User and Item Cold-start Recommendations | Maciej Kula | 2015 | Cold-start Problem | Bridges content-based information with collaborative filtering by learning metadata embeddings that predict latent user/item vectors, handling the cold-start problem. |
| Embarrassingly Shallow Autoencoders for Sparse Data | Harald Steck | 2019 | Efficient collaborative filtering | Introduces a shallow (single-layer) autoencoder for collaborative filtering, designed specifically for sparse implicit feedback data. |
| Recency Aware Collaborative Filtering for Next Basket Recommendation | Guglielmo Faggioli, Mirko Polato, Fabio Aiolli | 2020 | Recency & Frequency | Improves collaborative filtering for next-basket recommendation by accounting for the recency and frequency of past interactions. |
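Steck's Embarrassingly Shallow Autoencoder (EASE) has a closed-form solution compact enough to sketch directly. Below is a minimal NumPy sketch; the function name is my own and `lam` is an illustrative regularization strength, not a tuned value from the paper:

```python
import numpy as np

def ease_item_weights(X, lam=500.0):
    """Closed-form EASE: learn an item-item weight matrix B with zero diagonal.

    X: (num_users, num_items) binary implicit-feedback matrix (float).
    """
    G = X.T @ X                           # item-item Gram matrix
    diag = np.diag_indices(G.shape[0])
    G[diag] += lam                        # L2 regularization on the diagonal
    P = np.linalg.inv(G)
    B = P / (-np.diag(P))                 # B_ij = -P_ij / P_jj
    B[diag] = 0.0                         # constraint: no item predicts itself
    return B

# Recommendation scores for all users are then a single matrix product:
# scores = X @ B
```

Because training is one regularized least-squares solve rather than gradient descent, EASE is a strong, cheap baseline whenever the item catalog is small enough to invert an item-item matrix.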
Deep Learning for Recommendations
| Title | Authors | Year | Focus | Key Contribution |
|---|---|---|---|---|
| Learning Deep Structured Semantic Models for Web Search using Clickthrough Data | Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, Larry Heck | 2013 | Semantic matching | Introduces the Deep Structured Semantic Model (DSSM), which uses a two-tower architecture to map queries and documents into a common low-dimensional semantic space. |
| Wide & Deep Learning for Recommender Systems | Heng-Tze Cheng et al. | 2016 | Hybrid architecture | Combines a wide linear model (for memorization) with a deep neural network (for generalization) in a single jointly trained recommender. |
| Deep & Cross Network for Ad Click Predictions | Ruoxi Wang, Bin Fu, Gang Fu, Mingliang Wang | 2017 | Feature interactions | Proposes the Deep & Cross Network (DCN), which adds cross layers to explicitly model feature interactions up to high orders without manual feature engineering. |
| DeepFM: An End-to-End Wide & Deep Learning Framework for CTR Prediction | Huifeng Guo, Ruiming Tang, Yunming Ye, Zhenguo Li, Xiuqiang He, Zhenhua Dong | 2018 | Feature interactions | Introduces DeepFM, which combines a Factorization Machine (FM) component for low-order feature interactions with a deep neural network for high-order interactions, sharing the same embedding input. |
| AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks | Weiping Song, Zhijian Duan, Yewen Xu, Chence Shi, Ming Zhang, Zhiping Xiao, Jian Tang | 2019 | Interaction learning | Introduces AutoInt, which uses multi-head self-attention (as in Transformers) to learn feature interactions automatically, without manual design. |
| LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation | Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, Meng Wang | 2020 | Graph-based CF | Proposes LightGCN, a simplified graph convolutional network (GCN) that keeps only neighborhood aggregation, tailored specifically to collaborative filtering. |
| DCN V2: Improved Deep & Cross Network and Practical Lessons for Web-scale Learning to Rank Systems | Ruoxi Wang, Rakesh Shivanna, Derek Z. Cheng, Sagar Jain, Dong Lin, Lichan Hong, Ed H. Chi | 2020 | Feature interactions | Improves DCN with a more expressive cross layer and a cost-efficient low-rank variant, and shares practical lessons from web-scale learning-to-rank deployments. |
| Self-Attentive Sequential Recommendation | Wang-Cheng Kang, Julian McAuley | 2018 | Sequential Recommendations | Introduces SASRec, a self-attention-based model for sequential recommendation, inspired by the Transformer architecture, that captures both short- and long-term dependencies in user behavior sequences. |
| BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer | Fei Sun, Jun Liu, Jian Wu, et al. | 2019 | Bidirectional context | Applies a BERT-style bidirectional Transformer to model user behavior sequences in a non-autoregressive manner. |
| Context-Aware Sequential Model for Multi-Behaviour Recommendation | Shereen Elsayed, Ahmed Rashed, Lars Schmidt-Thieme | 2023 | Contextual sequential models | Proposes the Context-Aware Sequential Model (CASM), which uses context-aware multi-head self-attention to model sequences spanning multiple behavior types. |
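The explicit feature-crossing at the heart of DCN (and retained, in low-rank form, in DCN V2) comes down to one line per layer. A minimal NumPy sketch follows, using randomly initialized (untrained) weights purely to show the shapes involved; function and variable names are illustrative:

```python
import numpy as np

def cross_network(x0, num_layers=2, seed=0):
    """DCN-style cross layers: x_{l+1} = x0 * (x_l . w_l) + b_l + x_l.

    Each layer multiplies the raw input features x0 by a learned scalar
    projection of the current representation, so stacking l layers models
    feature interactions up to degree l + 1.
    """
    rng = np.random.default_rng(seed)
    d = x0.shape[-1]
    x = x0
    for _ in range(num_layers):
        w = rng.normal(scale=0.1, size=d)    # per-layer weight vector
        b = np.zeros(d)                      # per-layer bias
        x = x0 * (x @ w)[:, None] + b + x    # cross term plus residual
    return x

features = np.random.default_rng(1).normal(size=(4, 8))  # batch of 4, 8 features
out = cross_network(features, num_layers=3)
```

The residual `+ x` term is why the network degrades gracefully: with zero weights each layer is the identity, and depth only adds interaction capacity.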
Practical Tips and Tricks
| Title | Authors | Year | Focus | Key Contribution |
|---|---|---|---|---|
| Deep Residual Learning for Image Recognition | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun | 2015 | Fixing vanishing gradients | Introduces residual blocks, which use skip connections to make very deep networks trainable. |
| Densely Connected Convolutional Networks | Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger | 2016 | Parameter efficiency | Proposes DenseNet, which connects each layer to every subsequent layer, improving accuracy and parameter efficiency while mitigating the vanishing gradient problem. |
| Gaussian Error Linear Units (GELUs) | Dan Hendrycks, Kevin Gimpel | 2018 | Efficient activation function | Proposes the GELU activation function, a high-performing nonlinearity with better gradient flow in deep networks. |
| Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems | Benjamin Coleman, Wang-Cheng Kang, Matthew Fahrbach, et al. | 2023 | Efficient Embedding | Proposes Unified Embedding with three major benefits: simplified feature configuration, strong adaptation to dynamic data distributions, and compatibility with modern hardware. |
| On Embeddings for Numerical Features in Tabular Deep Learning | Yury Gorishniy, Ivan Rubachev, Artem Babenko | 2022 | Embedding Numerical Features | Explores piecewise linear encoding and trainable periodic encoding for embedding numerical features in tabular deep learning models. |
| SMMR: Sampling-Based MMR Reranking for Faster, More Diverse, and Balanced Recommendations and Retrieval | Kiryl Liakhnovich, Oleg Lashinin, Andrei Babkin | 2025 | Performant re-ranking | Proposes Sampled Maximal Marginal Relevance (SMMR), a sampling-based extension of MMR that introduces randomness into item selection to improve the relevance-diversity trade-off. |
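Classic Maximal Marginal Relevance, which SMMR extends with sampling, can be sketched in a few lines. This is the standard greedy deterministic version, not the paper's sampled variant; names and the `lam` trade-off value are illustrative:

```python
import numpy as np

def mmr_rerank(relevance, sim, k, lam=0.7):
    """Greedy Maximal Marginal Relevance re-ranking.

    relevance: (n,) relevance scores for candidate items.
    sim: (n, n) pairwise item-similarity matrix.
    Selects k items, trading relevance against similarity to items already
    chosen; lam=1.0 reduces to plain relevance ranking.
    """
    candidates = list(range(len(relevance)))
    selected = []
    while candidates and len(selected) < k:
        def score(i):
            if not selected:
                return lam * relevance[i]
            # Penalize similarity to the closest already-selected item.
            return lam * relevance[i] - (1 - lam) * max(sim[i][j] for j in selected)
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

The inner `max` over selected items makes each greedy step O(n·k), which is the cost SMMR's sampling targets: drawing candidates stochastically avoids scoring every remaining item at every step.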