
Social Information Filtering: Algorithms for Automating "Word of Mouth"

Upendra Shardanand and Pattie Maes

MIT Media-Lab
20 Ames Street Rm. 305
Cambridge, MA 02139
shard@media.mit.edu
(617) 253-7441
pattie@media.mit.edu
(617) 253-7442

© ACM

Abstract

This paper describes a technique for making personalized recommendations from any type of database to a user based on similarities between the interest profile of that user and those of other users. In particular, we discuss the implementation of a networked system called Ringo, which makes personalized recommendations for music albums and artists. Ringo's database of users and artists grows dynamically as more people use the system and enter more information. Four different algorithms for making recommendations by using social information filtering were tested and compared. We present quantitative and qualitative results obtained from the use of Ringo by more than 2000 people.

Keywords:

social information filtering, personalized recommendation systems, user modeling, information retrieval, intelligent systems, CSCW.

Introduction

Recent years have seen explosive growth in the sheer volume of information. The number of books, movies, news articles, advertisements, and, in particular, items of on-line information is staggering. This volume is considerably more than any person can possibly filter through in order to find the items that he or she will like.

People handle this information overload through their own effort, the effort of others, and some blind luck. First of all, most items and information are removed from the stream simply because they are either inaccessible or invisible to the user. Second, a large amount of filtering is done for us: newspaper editors select the articles their readers want to read, and bookstores decide which books to carry. However, with the dawn of the electronic information age, this barrier will become less and less of a factor. Finally, we rely on friends and other people whose judgment we trust to make recommendations to us.

We need technology to help us wade through all the information to find the items we really want and need, and to rid us of the things we do not want to be bothered with. The common and obvious approach to this problem is content-based filtering [1]. Keyword-based filtering and latent semantic indexing [2] are two examples of content-based filtering techniques. Content-based filtering techniques recommend items for the user's consumption based on correlations between the content of the items and the user's preferences; for example, the system may try to correlate the presence of keywords in an article with the user's taste. However, content-based filtering has limitations:

  1. Either the items must be in some machine-parsable form (e.g. text), or attributes must have been assigned to the items by hand. With current technology, media such as sound, photographs, video, and physical items cannot be analyzed automatically for relevant attribute information, and it is often impractical to assign attributes by hand.
  2. Content-based techniques have no inherent method for generating serendipitous finds: the system can only recommend more of what the user has already indicated liking.
  3. Content-based filtering cannot judge subjective properties such as quality or style: it cannot distinguish a well-written article from a badly written one if both happen to use the same terms.

A complementary filtering technique is needed to address these issues. This paper presents social information filtering, a general approach to personalized information filtering. Social information filtering essentially automates the process of "word of mouth" recommendations: items are recommended to a user based upon values assigned by other people with similar taste. The system determines which users have similar taste via standard formulas for computing statistical correlations.

Social information filtering overcomes some of the limitations of content-based filtering. Items being filtered need not be amenable to parsing by a computer. Furthermore, the system may recommend items to the user which are very different (content-wise) from what the user has indicated liking before. Finally, recommendations are based on the quality of items, rather than on the more objective properties of the items themselves.

This paper details the implementation of a social information filtering system called Ringo, which makes personalized music recommendations to people on the Internet. Results based on the use of this system by thousands of actual users are presented. Various social information filtering algorithms are described, analyzed and compared. These results demonstrate the strength of social information filtering and its potential for immediate application.

RINGO: A PERSONALIZED MUSIC RECOMMENDATION SYSTEM

Social information filtering exploits similarities between the tastes of different users to recommend (or advise against) items. It relies on the fact that people's tastes are not randomly distributed: there are general trends and patterns within the tastes of a person as well as between groups of people. Social information filtering automates a process of "word of mouth" recommendations. A significant difference is that instead of having to ask a couple of friends about a few items, a social information filtering system can consider thousands of other people and thousands of different items, all autonomously and automatically. The basic idea is:

  1. The system maintains a user profile, a record of the user's interests (positive as well as negative) in specific items.
  2. It compares this profile to the profiles of other users, and weighs each profile for its degree of similarity with the user's profile. The metric used to determine similarity can vary.
  3. Finally, it considers a set of the most similar profiles, and uses information contained in them to recommend (or advise against) items to the user.

Ringo [7] is a social information filtering system which makes personalized music recommendations. People describe their listening pleasures to the system by rating some music; these ratings constitute the person's profile, which changes over time as the user rates more artists. Ringo uses these profiles to generate advice to individual users. It compares user profiles to determine which users have similar taste (they like the same albums and dislike the same albums). Once similar users have been identified, the system can predict how much a user may like an album or artist that he or she has not yet rated, by computing a weighted average of the ratings given to that item by the other users with similar taste.

Ringo is an on-line service accessed through electronic mail or the World Wide Web. Users may sign up with Ringo by sending e-mail to ringo@media.mit.edu with the word "join" in the body. The user then interacts with Ringo by sending commands and data to a central server via e-mail. Once an hour, the server processes all incoming messages and sends replies as necessary. Alternatively, users can try out Ringo via the World Wide Web (http://ringo.media.mit.edu).

When a user first accesses Ringo, he or she is presented with a list of 125 artists to rate according to how much the user likes to listen to them. If the user is not familiar with an artist or does not have a strong opinion, the user is asked not to rate that artist. Users are specifically advised to rate artists for how much they like to *listen* to them, not for any other criteria such as musical skill, originality, or other possible categories of judgment.


7 BOOM! One of my FAVORITE few! Can't live without it.
6 Solid. They are up there.
5 Good Stuff.
4 Doesn't turn me on, doesn't bother me.
3 Eh. Not really my thing.
2 Barely tolerable.
1 Pass the earplugs.

Figure 1. Ringo's scale for rating music.

The scale for ratings varies from 1 ("pass the earplugs") to 7 ("one of my favorite few, can't live without them"); see Figure 1. A seven-point scale was selected because studies have shown that the reliability of data collected in surveys does not increase substantially if the number of choices is increased beyond seven [6]. Ratings are not normalized because, as we expected, users rate albums in very different ways. For example, some users only give ratings to music they like (e.g. they only use 6's and 7's), while other users give bad as well as good ratings (1's as well as 7's). An absolute scale was employed, and descriptions for each rating point were provided to make clear what each number means.

The list of artists sent to a user is selected in two parts. Part of the list is generated from a list of the most often rated artists. This ensures that a new user has the opportunity to rate artists which others have also rated, so that there is some commonality in people's profiles. The other part of the list is generated through a random selection from the (open) database of artists. Thus, artists are never left out of the loop. A user may also request a list of some artist's albums, and rate that artist's albums on an individual basis. The procedure for picking an initial list of artists for the user to rate leaves room for future improvement and research, but has been sufficient for our early tests. Figure 2 shows part of one user's ratings of the initial 125 artists selected by Ringo.


6 "10,000 Maniacs''
3 "AC/DC''
3 "Abdul, Paula''
2 "Ace of Base''
1 "Adams, Bryan''
"Aerosmith''
"Alpha Blondy''
6 "Anderson, Laurie''
5 "Arrested Development''
"Autechre''
3 "B-52s''
"Babes in Toyland''
"Be Bop Deluxe''
5 "Beach Boys, The''
"Beastie Boys''
4 "Beat Happening''
7 "Beatles, The''
1 "Bee Gees''


Figure 2. Part of one person's survey.

Once a person's initial profile has been submitted, the user can ask Ringo for predictions. Specifically, a person can ask Ringo to (1) suggest new artists/albums that the user will enjoy, (2) list artists/albums that the user will hate, and (3) make a prediction about a specific artist/album. Ringo processes such a request using its social information filtering algorithm, detailed in the next section, and then presents the person with the result. Figure 3 provides an example of Ringo's suggestions. Every recommendation includes a measure of confidence, which depends on factors such as the number of similar users used to make the prediction and the consistency among those users' ratings (cf. [7] for details). Ringo's reply does not include any information about the identity of the other users whose profiles were used to make the recommendations.

Figure 3: Some of Ringo's suggestions.

Ringo provides a range of functions apart from making recommendations. For example, when rating an artist or album, a person can also write a short review, which Ringo stores. Two actual reviews entered by users are shown in Figure 4; the authors of reviews are free to decide whether to sign them or keep them anonymous. When a user is told to try or to avoid an artist, any reviews for that artist written by similar users are provided as well. Thus, rather than one "thumbs-up, thumbs-down" review being given to the entire audience, each user receives personalized reviews from people with similar taste.


Tori Amos has my vote for the best artist ever. Her lyrics and music are very inspiring and thought provoking. Her music is perfect for almost any mood. Her beautiful mastery of the piano comes from her playing since she was two years old. But, her wonderful piano arrangements are accompanied by her angelic yet seductive voice. If you don't have either of her two albums, I would very strongly suggest that you go, no better yet, run down and pick them up. They have been a big part of my life and they can do the same for others. --- user@place.edu

I'd rather dive into a pool of dull razor blades than listen to Yoko Ono sing. OK, I'm exaggerating. But her voice is *awful*. She ought to put a band together with Linda McCartney. Two Beatles wives with little musical talent.


Figure 4: Two sample reviews written by users.

In addition, Ringo offers other miscellaneous features which increase the appeal of the system. Users may add new artists and albums to the database; this feature was responsible for the growth of the database from 575 artists at inception to over 2500 artists in the first six weeks of use. Upon request, Ringo provides a dossier on any artist. The dossier includes a list of that artist's albums, straight averages of the scores given to that artist and to the artist's albums, and any history of the artist, which can be submitted by any user. Users can also view "Top 30" and "Bottom 30" lists of the most highly and most poorly rated artists, on average. Finally, users can subscribe to a periodic newsletter keeping them up to date on changes and developments in Ringo.

ALGORITHMS AND QUANTITATIVE RESULTS

Ringo became available to the Internet public on July 1, 1994. The service was originally advertised on only four specialized USENET newsgroups. After a slow start, the number of people using Ringo grew quickly, as word of the service spread through people telling their friends and sending messages to mailing lists. Ringo reached the 1000-user mark in less than a month and had 1900 users after seven weeks. At the time of this writing (September 1994), Ringo has 2100 users and processes almost 500 messages a day.

Like the membership, the size of the database grew quickly. Originally, Ringo had only 575 artists in its database. As we soon discovered, users were eager to add artists and albums to the system. At the time of this writing, there are over 3000 artists and 9000 albums in Ringo's database.

Thanks to this overwhelming user interest, we have an enormous amount of data on which to test various social information filtering algorithms. This section discusses the four algorithms that were evaluated and gives more details about the "winning" algorithm. For our tests, the profiles of 1000 people were considered. A profile is a sparse vector of the user's ratings for artists; 1,876 different artists were represented in these profiles.

To test the different algorithms, 20% of the ratings in each person's profile were randomly removed. These removed ratings comprised the target set; the remaining 80% formed the source set. To evaluate each algorithm, we predicted a value for each rating in the target set, using only the data in the source set. Three such target and source sets were randomly created and tested, to check for consistency in our results. For brevity, the results from the first set are presented throughout this paper, as the results from all three sets differed only slightly.
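To make this protocol concrete, the following is a minimal sketch in Python of the 80/20 split. It assumes that a profile is stored as a dictionary mapping artist names to ratings; this representation and all names in the sketch are illustrative, not taken from Ringo's actual implementation.

    import random

    def split_profiles(profiles, holdout=0.2, seed=0):
        """Randomly remove a fraction of each user's ratings (the target set);
        the ratings that remain form the source set used to make predictions."""
        rng = random.Random(seed)
        source, target = {}, {}
        for user, ratings in profiles.items():
            artists = list(ratings)
            rng.shuffle(artists)
            k = int(len(artists) * holdout)
            target[user] = {a: ratings[a] for a in artists[:k]}
            source[user] = {a: ratings[a] for a in artists[k:]}
        return source, target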

In the source set, each person rated on average 106 of the 1,876 possible artists. The median number of ratings was 75, and the most ratings by a single person was 772! The mean score of a profile, i.e. the average score a user gave across all artists rated, was 3.7.

Evaluation Criteria

The following criteria were used to evaluate each prediction scheme:

  1. The mean absolute error $\overline{|E|}$: the average magnitude of the difference between the predicted and actual ratings in the target set.
  2. The standard deviation $\sigma$ of the errors: a measure of how widely the errors are spread.
  3. The percentage of target values for which the scheme is able to compute a prediction at all.
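As an illustration, these criteria can be computed as follows. The sketch keeps the assumptions of the previous one and lets predict stand for any of the schemes below, returning None when no prediction can be made; that convention is ours, not Ringo's.

    def evaluate(predict, source, target):
        """Return (mean absolute error, standard deviation of errors, coverage)
        for a prediction function scored against the held-out target ratings.
        Assumes at least one prediction succeeds."""
        errors, attempted = [], 0
        for user, held_out in target.items():
            for artist, actual in held_out.items():
                attempted += 1
                p = predict(source, user, artist)
                if p is not None:
                    errors.append(p - actual)
        mean = sum(errors) / len(errors)
        mean_abs = sum(abs(e) for e in errors) / len(errors)
        sigma = (sum((e - mean) ** 2 for e in errors) / len(errors)) ** 0.5
        return mean_abs, sigma, len(errors) / attempted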

Base Case Algorithm

A point of comparison is needed in order to measure the quality of social information filtering schemes in general. As a base case, the predicted score for each artist in the target set is simply the mean score that artist received in the source set. A social information filtering algorithm is neither personalized nor accurate unless it significantly improves on this base case approach.
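Under the same assumptions as the earlier sketches, the base case predictor is a few lines:

    def base_case_predict(source, user, artist):
        """Predict the mean score the artist received across all source
        profiles, ignoring who is asking."""
        scores = [r[artist] for r in source.values() if artist in r]
        if not scores:
            return None  # no one in the source set rated this artist
        return sum(scores) / len(scores)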

Figure 5 depicts the distribution of the errors $E$. The mean absolute error is $\overline{|E|} = 1.3$, and the standard deviation is $\sigma = 1.6$. The distribution has the desired bell curve shape about 0. At first glance, it may seem that this mindless scheme does not behave too poorly. However, let us now restrict our examination to the extreme target values, where the score is 6 or greater or 2 or less. These values, after all, are the critical points: users are most interested in suggestions of items they would love or hate, not of items about which they would be ambivalent.

Figure 5: The distribution of errors in predictions of the Base Algorithm.

The distribution of errors for extreme values is shown by the dark gray bars in Figure 5. The mean error and standard deviation worsen considerably, with $\overline{|E|} = 1.8$ and $\sigma = 2.0$. Note the lack of the desired bell curve shape: the distribution is in fact the sum of two bell curves. The right hill is mainly the errors for target values of 2 or less; the left hill is mainly the errors for target values of 6 or greater.

For target values of 6 or greater, the mean absolute error is much worse, with $\overline{|E|} = 2.1$. Why the great discrepancy in error characteristics between all values and extreme values only? Analysis of the database indicates that the mean score for each artist converges to approximately 4. Therefore, this scheme performs well in cases where the target value is near 4. However, for the areas of primary interest to users, the base algorithm is useless.

Social Information Filtering Algorithms

Four different social information filtering algorithms were evaluated. Due to space limitations, the algorithms are described only briefly here. Exact mathematical descriptions as well as more detailed analysis of the algorithms can be found in [7].

The Mean Squared Differences Algorithm. The first algorithm measures the degree of dissimilarity between two user profiles $U_x$ and $U_y$ as the mean squared difference between the two profiles, taken over the set $A_{xy}$ of artists rated by both users:

$$ \delta_{xy} = \frac{1}{|A_{xy}|} \sum_{a \in A_{xy}} \left( U_x(a) - U_y(a) \right)^2 $$

Predictions can then be made by considering all users whose dissimilarity to the user is less than a certain threshold L and computing a weighted average of the ratings provided by these most similar users, where the weights are inversely proportional to the dissimilarity.
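A sketch of this scheme follows, under the same assumed representation. The exact form of the inverse weighting is given in [7]; the 1 / (1 + d) weight and the default threshold below are plausible placeholders, not Ringo's actual values.

    def msd(ux, uy):
        """Mean squared difference over commonly rated artists
        (lower means more similar)."""
        common = set(ux) & set(uy)
        if not common:
            return None
        return sum((ux[a] - uy[a]) ** 2 for a in common) / len(common)

    def msd_predict(source, user, artist, L=4.0):
        """Weighted average of ratings from users whose dissimilarity to
        `user` is below the threshold L, weighted inversely to it."""
        num = den = 0.0
        for other, ratings in source.items():
            if other == user or artist not in ratings:
                continue
            d = msd(source[user], ratings)
            if d is None or d >= L:
                continue
            w = 1.0 / (1.0 + d)  # placeholder inverse weighting; see [7]
            num += w * ratings[artist]
            den += w
        return num / den if den else None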

The Pearson r Algorithm. An alternative approach is to use the standard Pearson r correlation coefficient to measure similarity between user profiles, computed over the commonly rated artists $A_{xy}$, with $\bar{U}_x$ and $\bar{U}_y$ denoting the users' mean ratings:

$$ r_{xy} = \frac{\sum_{a \in A_{xy}} (U_x(a) - \bar{U}_x)(U_y(a) - \bar{U}_y)}{\sqrt{\sum_{a \in A_{xy}} (U_x(a) - \bar{U}_x)^2 \sum_{a \in A_{xy}} (U_y(a) - \bar{U}_y)^2}} $$

This coefficient ranges from -1, indicating a negative correlation, through 0, indicating no correlation, to +1, indicating a positive correlation between two users. Again, predictions can be made by computing a weighted average of other users' ratings, where the Pearson r coefficients are used as the weights. In contrast with the previous algorithm, this algorithm makes use of negative as well as positive correlations to make predictions.

The Constrained Pearson r Algorithm. Close inspection of the Pearson r algorithm and the coefficients it produced prompted us to test a variant which takes the positivity and negativity of ratings into account. Since the scale of ratings is absolute, we "know" that values below 4 are negative, while values above 4 are positive. We modified the Pearson r scheme so that the correlation coefficient increases only when both people have rated an artist positively (above 4) or both negatively (below 4). More specifically, the means in the standard Pearson r equation are replaced by the scale midpoint, 4:

$$ r_{xy} = \frac{\sum_{a \in A_{xy}} (U_x(a) - 4)(U_y(a) - 4)}{\sqrt{\sum_{a \in A_{xy}} (U_x(a) - 4)^2 \sum_{a \in A_{xy}} (U_y(a) - 4)^2}} $$

To produce recommendations for a user, the constrained Pearson r algorithm first computes the correlation coefficient between the user and all other users. Then all users whose coefficient is greater than a certain threshold L are identified. Finally, a weighted average of the ratings of those similar users is computed, where the weight is proportional to the coefficient. This algorithm does not make use of negative "correlations" as the Pearson r algorithm does: analysis of the constrained Pearson r coefficients showed that there are few very negative coefficients, so including them makes little difference.
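The whole constrained Pearson r pipeline can be sketched as follows, again under the assumed dictionary representation. The default threshold of 0.6 echoes the value used in the Results section; the remaining details are illustrative.

    def constrained_pearson(ux, uy, mid=4.0):
        """Pearson r with deviations measured from the scale midpoint 4
        rather than from each user's mean rating."""
        common = set(ux) & set(uy)
        if not common:
            return None
        num = sum((ux[a] - mid) * (uy[a] - mid) for a in common)
        dx = sum((ux[a] - mid) ** 2 for a in common) ** 0.5
        dy = sum((uy[a] - mid) ** 2 for a in common) ** 0.5
        if dx == 0.0 or dy == 0.0:
            return None
        return num / (dx * dy)

    def cpr_predict(source, user, artist, L=0.6):
        """Weighted average of ratings from users whose coefficient
        exceeds L, with the coefficient itself as the weight."""
        num = den = 0.0
        for other, ratings in source.items():
            if other == user or artist not in ratings:
                continue
            r = constrained_pearson(source[user], ratings)
            if r is None or r <= L:
                continue
            num += r * ratings[artist]
            den += r
        return num / den if den else None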

The Artist-Artist Algorithm. The preceding algorithms measure and employ similarities between users. Alternatively, one can use correlations between artists or albums to generate predictions. The idea is simply an inversion of the previous three methodologies. Say Ringo needs to predict how a user, Murray, will like "Harry Connick, Jr." Ringo examines the artists that Murray has already rated and weighs each one by its degree of correlation with "Harry Connick, Jr." The predicted rating is then simply a weighted average of the scores Murray has already given. An implementation of such a scheme using the constrained Pearson r correlation coefficient was evaluated.
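A sketch of this artist-artist variant, reusing constrained_pearson from the previous sketch: since an artist's "profile" is simply the vector of ratings that artist received, keyed by user, the same coefficient applies unchanged. The threshold is again an assumed value.

    def artist_profile(source, artist):
        """Invert the data: the ratings an artist received, keyed by user."""
        return {u: r[artist] for u, r in source.items() if artist in r}

    def artist_artist_predict(source, user, artist, L=0.6):
        """Weighted average of the user's own ratings, each weighted by how
        strongly the rated artist correlates with the target artist."""
        target_vec = artist_profile(source, artist)
        num = den = 0.0
        for rated_artist, rating in source[user].items():
            r = constrained_pearson(artist_profile(source, rated_artist), target_vec)
            if r is None or r <= L:
                continue
            num += r * rating
            den += r
        return num / den if den else None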

Results

A summary of some of our results (for different values of the threshold L) is presented in Table 1. More details can be found in [7]. Overall, the constrained Pearson r algorithm performed best on our dataset when both the error and the percentage of target values that can be predicted are taken into account. The mean squared differences and artist-artist algorithms may perform slightly better in terms of the quality of the predictions made, but they are not able to produce as many predictions.

Table 1: Summary of Results.

As expected, there is a tradeoff between the average error of the predictions and the percentage of target values that can be predicted. This tradeoff is controlled by the parameter L, the minimum degree of similarity between users that is required for one user to influence the recommendations made to another.

Figure 6 illustrates the distribution of errors for the best algorithm with the threshold L equal to 0.6. The distribution for extreme values approaches a bell curve, as desired. The statistics for all values and extreme values are $\overline{|E|} = 1.1$, $\sigma = 1.4$ and $\overline{|E|} = 1.2$, $\sigma = 1.6$, respectively. These results are excellent, especially as the mean absolute error for extreme values approaches that for all values. At this threshold level, 91% of the target set is predictable.

QUALITATIVE RESULTS

Ultimately, what matters more than the numbers in the previous section is the human response to this new technology. As of this writing, over 2000 people have used Ringo. Our source for a qualitative judgment of Ringo is the users themselves. The Ringo system operators have received a staggering amount of mail from users: questions, comments, and bug reports. The results described in this section are based on this user feedback and on observed patterns of use.

One observation is that a social information filtering system becomes more competent as the number of users increases. Figure 7 illustrates how the error in a recommendation relates to the number of user profiles consulted to make it. As the number of user scores used to generate a prediction increases, the deviation in error decreases significantly. This is because the more people use the system, the greater the chances of finding close matches for any particular user. The system may need to reach a certain critical mass of collected data before it becomes useful, and Ringo's competence therefore develops over time as more people use it. Understandably, in the first couple of weeks of Ringo's life, Ringo was relatively incompetent, and during those days we received many messages letting us know how poorly it performed. Slowly, the feedback changed: more and more often we received mail about how "unnervingly accurate" Ringo was, and less about how it was incorrect. Ringo's growing group of regular "customers" indicates that it is now at a point where the majority of people find the service useful.

Figure 7: Prediction error as a function of the number of user profiles consulted.

However, many people are disappointed by Ringo's initial performance. We are often told that a person must go through one or two iterations of rating artists before Ringo becomes accurate. A user rates the initial set and then receives predictions. If the user finds that some of the predicted artists are not representative of his or her personal taste, the user rates those artists, which can radically alter the membership of the user's "similar user" neighborhood. After these iterations, Ringo works satisfactorily. This indicates that a better algorithm is needed for determining the "critical" artists a user should rate, so as to distinguish the user's tastes and narrow down the group of similar users.

Beyond the recommendations, other factors are responsible for Ringo's great appeal and phenomenal growth. The additional features, such as the user-grown database and the provisions for reviews and dossiers, add to its functionality. Foremost, however, is the fact that Ringo is not a static system. The database and user base are continually growing, and as they do, Ringo's recommendations to the user change. For this reason, people enjoy Ringo and use it on a regular basis.

RELATED WORK

Several other attempts have been made at building filtering services that rely on patterns among multiple users. The Tapestry system [3] makes it possible to request Netnews documents that have been approved by other users. However, users must themselves know who these similar people are and specifically request documents annotated by them; the social information filtering is thus still left to the user.

During the development of Ringo, we learned of similar projects in a similar state of development. One such example is GroupLens [4], a system applying social information filtering to the personalized selection of Netnews. GroupLens employs Pearson r correlation coefficients to determine similarity between users. On our dataset, the algorithms described in this paper performed better than the algorithm used by GroupLens.

Two other recently developed systems are a video recommendation service implemented at Bellcore, Morristown, NJ, and a movie recommendation system developed at ICSI, Berkeley, CA. Unfortunately, as of this writing, no information is available about the algorithms used in these systems or about the results obtained.

The user modeling community has spawned a range of recommendation systems which use information about a user to assign that user to one of a finite set of hand-built, predefined user classes or stereotypes. Based on the stereotype the user belongs to, the system then makes recommendations. For example, [5] recommends novels to users based on a stereotype classification. This method is far less personalized than the social information filtering method described in this paper: in social information filtering, every user in effect defines a stereotype to which another user can belong to some degree, so the number of stereotypes available to characterize a user's taste is much larger.

Finally, some commercial software packages exist that make recommendations to users. An example is Movie Select, a movie recommendation software package by Paramount Interactive Inc. One important difference is that these systems use a data set that does not change over time. Furthermore, they do not record any history of a person's past use. As far as can be deduced from the software manuals and brochures, these systems store correlations between different items and use those correlations to make recommendations. As such, the recommendations made are less personalized than in social information filtering systems.

CONCLUSIONS AND FUTURE WORK

Experimental results obtained with the Ringo system have demonstrated that social information filtering methods can be used to make personalized recommendations to users. Ringo has been tested and used in a real-world application and received a positive response. The techniques employed by the system could potentially be used to recommend books, movies, news articles, products, and more.

More work needs to be done to make social information filtering applicable to very large user groups and to broader domains. Work is currently under way to speed up the algorithm by means of clustering techniques, so as to reduce the number of similarity measures that need to be computed. We are also applying clustering techniques to the artist data so as to identify emergent musical genres and make use of these distinct genres in the prediction algorithms.

Finally, we have not even begun to explore the very interesting and controversial social and economic implications of social information filtering systems like Ringo.

ACKNOWLEDGEMENTS

Carl Feynman proposed the mean squared differences algorithm and initially developed some of the ideas that led to this work. Karl Sims proposed the artist-artist algorithm. Lee Zamir and Max Metral implemented the WWW version of Ringo and are currently responsible for maintaining the Ringo system. Alan Wexelblat and Max Metral provided useful comments on an earlier version of the paper. This work was sponsored by the National Science Foundation under grant number IRI-92056688 and by the "News in the Future" Consortium of the MIT Media Lab.

References

  1. Special Issue on Information Filtering, Communications of the ACM, Vol. 35, No. 12, December 1992.
  2. Scott Deerwester, Susan Dumais, George Furnas, Thomas Landauer, and Richard Harshman, "Indexing by Latent Semantic Analysis", Journal of the American Society for Information Science, Vol. 41, No. 6, 1990, pp. 391-407.
  3. David Goldberg, David Nichols, Brian M. Oki, and Douglas Terry, "Using Collaborative Filtering to Weave an Information Tapestry", Communications of the ACM, Vol. 35, No. 12, December 1992, pp. 61-70.
  4. Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, and John Riedl, "GroupLens: An Open Architecture for Collaborative Filtering of Netnews", to be published in the Proceedings of the CSCW 1994 Conference, October 1994.
  5. Elaine Rich, "User Modeling via Stereotypes", Cognitive Science, Vol. 3, pp. 335-366, 1979.
  6. Robert Rosenthal and Ralph Rosnow, "Essentials of Behavioral Research: Methods and Data Analysis", McGraw-Hill, second edition, 1991.
  7. Upendra Shardanand, "Social Information Filtering for Music Recommendation", MIT EECS M.Eng. thesis, also TR-94-04, Learning and Common Sense Group, MIT Media Laboratory, 1994.