Recommendations by Concise User Profiles from Review Text: Abstract and Introduction

:::info
This paper is available on arXiv under a CC 4.0 license.

Authors:

(1) Ghazaleh H. Torbati, Max Planck Institute for Informatics, Saarbrücken, Germany (ghazaleh@mpi-inf.mpg.de);

(2) Andrew Yates, University of Amsterdam, Amsterdam, Netherlands (a.c.yates@uva.nl);

(3) Anna Tigunova, Max Planck Institute for Informatics, Saarbrücken, Germany (tigunova@mpi-inf.mpg.de);

(4) Gerhard Weikum, Max Planck Institute for Informatics, Saarbrücken, Germany (weikum@mpi-inf.mpg.de).

:::

Table of Links

Abstract and Introduction
Related Work
Methodology
Experimental Design
Experimental Results
Conclusion
Ethics Statement and References

Abstract

Recommender systems are most successful for popular items and for users with ample interactions (likes, ratings, etc.). This work addresses the difficult and underexplored case of supporting users who have very sparse interactions but post informative review texts. Our experimental studies address two book communities with these characteristics. We design a framework with Transformer-based representation learning, covering user-item interactions, item content, and user-provided reviews. To overcome interaction sparseness, we devise techniques for selecting the most informative cues to construct concise user profiles. Comprehensive experiments, with datasets from Amazon and Goodreads, show that judicious selection of text snippets achieves the best performance, even in comparison to LLM-generated rankings and to using LLMs to generate user profiles.


Index Terms—recommender, sparse data, user text, language model, transformer

I. INTRODUCTION

A. Motivation

Recommender systems have become a mature technology and are ubiquitous in applications [1]. Methodologies fall into two major families or hybrid combinations: i) interaction-based recommenders that leverage binary signals (e.g., membership in personal playlists or libraries) or numeric ratings for user-item pairs, and ii) content-based recommenders that exploit item features and user-provided content, ranging from metadata attributes (e.g., item categories) all the way to textually rich reviews. In many settings, where user-item signals are easily collectable and abundant, interaction-based recommenders achieve excellent performance and are hard to beat. Some applications combine interaction data with informative item features, again relying on massive data (e.g., [2]).


In contrast, the data-poor regime of long-tail users and long-tail items has received less attention. Some works advocate countering data sparseness by clustering long-tail items (e.g., [3]). However, this is not sufficient to gear up for users with few interactions but diverse tastes. Approaches for cold-start and cross-domain recommendations and zero-shot learning (e.g., [4]–[7]), are often focused on the item side only: handling new items by transferring (latent) knowledge from existing ones with dense data.


Note that sparseness is not just a lack of user-item interactions, but also refers to the users and items themselves: users with merely a handful of items do not exhibit a clear picture of their tastes, and items that are completely unseen at training time are a challenging case. Moreover, even users with a good number of items can pose a major challenge when the preferred items are highly diverse, such as less known books from all kinds of genres. To address these difficulties, the most promising approach is to leverage review texts by users. In settings where users spend substantial time per item, such as books or travel destinations (as opposed to items with short attention spans, like music or restaurants), even data-poor users with few interactions may leave rich texts that express their interests and tastes. The approach has been studied before (e.g., [8]–[11]), but the performance is much lower than for settings with ample interaction data. This paper aims to advance the state of the art on this theme, by tackling both issues of data sparseness and diversity of tastes via text-based concise user profiles.

B. Research Questions

Our approach constructs and encodes concise user profiles from reviews of data-poor text-rich users, focusing on the book domain for concreteness (using data from Goodreads and Amazon). Unlike movies or restaurants, books have a much longer tail of popularity and exhibit huge diversity of user tastes, making books a particularly challenging domain for data-poor recommendations. Also, negative reviews are very rare (most have ratings like 4 or 5 stars), so that we deal with binary data and have no negative samples. This approach raises several research questions (RQs):


RQ1: Learning with Language Models. How can we best incorporate language models, like BERT, T5 or ChatGPT, for item ranking or for encoding the gist of a user’s reviews, enhanced with the LM’s world knowledge?


RQ2: Long Text. Since neural architectures can consume only a limited input size (e.g., 512 tokens), we have to select a subset of snippets. Even with larger input capacity, the computational and energy cost is typically quadratic in the number of input tokens, so being selective has benefits. What are the most informative cues that should be fed into the model?


RQ3: Review Aspects. User reviews express a mix of aspects, including personal background (e.g., “I’m a retired teacher”), sentiment expressions (e.g., “what a great story”), general emotions (e.g., “brings back lovely memories”), and comments about the book contents. As sentiments do not add value over the already known fact that the book was liked, only the content comments yield cues towards new recommendations. How should this issue be considered in constructing a concise user profile with informative cues?

C. Approach

We devise a general framework, called CUP, for constructing Concise User Profiles, and encoding them in a recommender system, which allows us to configure a variety of methods. We propose a Transformer-based architecture that supports end-to-end learning of item and user encodings, making use of a language model. Our choice for the language model is BERT; alternatives such as T5 can be easily plugged in but incur higher computational cost. The end-to-end learning operates on sufficiently short, judiciously constructed profiles, and leverages the available user-item interactions as well.


On top of the Transformer, we place feed-forward layers for a supervised classifier. The classifier and the representation learning are jointly trained end-to-end. The prediction scores for user-item pairs yield the per-user ranking of items. This architecture can take as input a spectrum of user and item information, including user reviews and item descriptions and metadata (e.g., categories).
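
The following is a minimal sketch of such an architecture, not the authors' implementation: a shared BERT encoder with small feed-forward projection layers for the user and item sides, scored by a scalar product and trained end-to-end with a binary objective. Class and parameter names (`CupTower`, `proj_dim`) are illustrative assumptions.

```python
# Sketch of a two-tower Transformer recommender (illustrative, not the paper's code).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class CupTower(nn.Module):
    def __init__(self, lm_name="bert-base-uncased", proj_dim=128):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(lm_name)
        dim = self.encoder.config.hidden_size
        # feed-forward layers placed on top of the Transformer encoding
        self.proj = nn.Sequential(nn.Linear(dim, proj_dim), nn.ReLU(),
                                  nn.Linear(proj_dim, proj_dim))

    def forward(self, batch):
        cls = self.encoder(**batch).last_hidden_state[:, 0]  # [CLS] vector
        return self.proj(cls)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
user_tower, item_tower = CupTower(), CupTower()

# a concise user profile and an item description, both hypothetical examples
users = tok(["character-driven historical fiction; unreliable narrators; family sagas"],
            truncation=True, max_length=128, return_tensors="pt")
items = tok(["A novel set in 1920s Vienna, told by a narrator who hides the truth."],
            truncation=True, max_length=128, return_tensors="pt")

u, v = user_tower(users), item_tower(items)
score = (u * v).sum(-1)                                        # scalar product per pair
loss = nn.BCEWithLogitsLoss()(score, torch.ones_like(score))   # positive interaction
```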


The rationale for this design is twofold: i) ensuring that inference-time computations are efficient, leveraging training-time computed vectors for users and items and solely computing a scalar product at run-time, and ii) being able to express a wide spectrum of user profiling techniques within the same architecture. Alternative architectures, such as CNN-based, or LLM-based with smart prompts, serve as baselines in our experiments.
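
To illustrate the first point, here is a small sketch (our illustration, with made-up dimensions) of the run-time ranking step: item and user vectors are computed offline, so serving a user reduces to one matrix-vector product and a sort over candidate items.

```python
# Inference-time ranking with precomputed encodings (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
item_vecs = rng.normal(size=(10_000, 128)).astype(np.float32)  # precomputed item vectors
user_vec = rng.normal(size=128).astype(np.float32)             # precomputed user vector

scores = item_vecs @ user_vec        # one scalar product per candidate item
top_k = np.argsort(-scores)[:10]     # indices of the 10 highest-scoring items
print(top_k, scores[top_k])
```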


We specifically devise new techniques for coping with the case where users have few reviews with long and noisy text (see RQ2 and RQ3), such that their total length exceeds the number of tokens that the encoders can consume. Our techniques are based on information-retrieval-style statistics (like idf weights) for n-grams or sentences, or on neural similarity measures (using SentenceBERT), or on running review texts through a generative language model, like T5 or ChatGPT, for generating user profiles. Table VII illustrates the role of concise user profiles, by giving examples for two users of Amazon books and the Goodreads community, with profiles computed by different methods.
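
As one concrete illustration of the idf-based selection idea (a sketch under our own assumptions, not the authors' exact method), review sentences can be scored by the average idf weight of their words and greedily added to the profile until a token budget is exhausted:

```python
# Illustrative idf-weighted sentence selection for a concise user profile.
import math
import re
from collections import Counter

def idf_weights(corpus_sentences):
    """Smoothed idf over a background corpus of sentences."""
    n = len(corpus_sentences)
    df = Counter()
    for sent in corpus_sentences:
        df.update(set(re.findall(r"[a-z']+", sent.lower())))
    return {w: math.log((n + 1) / (c + 1)) + 1 for w, c in df.items()}

def concise_profile(review_sentences, corpus_sentences, budget=128):
    """Keep the most informative review sentences within a token budget."""
    idf = idf_weights(corpus_sentences)
    scored = []
    for sent in review_sentences:
        words = re.findall(r"[a-z']+", sent.lower())
        if words:
            scored.append((sum(idf.get(w, 0.0) for w in words) / len(words), sent))
    profile, used = [], 0
    for _, sent in sorted(scored, reverse=True):
        n_tokens = len(sent.split())
        if used + n_tokens > budget:
            break
        profile.append(sent)
        used += n_tokens
    return " ".join(profile)
```

The same skeleton accommodates the other variants mentioned above: replacing the idf score with a SentenceBERT similarity to the user's liked items, or replacing the selection step entirely with a profile generated by T5 or ChatGPT.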


In experiments, we limit the number of tokens per user profile, as a stress-test. The main argument for this choice is to keep the system as lightweight as possible. The computational cost of the Transformer, and thus also the environmental footprint, is quadratic in the number of input tokens. Section V-D experimentally shows that a small token budget is competitive with larger ones, while having advantages in efficiency.

D. Contributions

The salient contributions of this work are the following:


• a new framework, called CUP, for Transformer-based recommenders that leverage language models and concise user profiles from reviews;


• judicious techniques for selecting and encoding informative cues from the long and noisy text of user reviews, outperforming prior baselines as well as methods based on large language models (LLMs);


• comprehensive experiments with focus on data-poor but text-rich users with highly diverse preferences, and difficult predictions (e.g., recommending new authors, not just new books by previously seen authors).


We make code, data, and experimental details available on our project web page[1].


[1] https://personalization.mpi-inf.mpg.de/CUP
