Sugestio has extensive procedures for selecting the most appropriate algorithm for each customer. During an offline evaluation phase, we examine historical consumption behavior, such as purchase history or click behavior. If items have metadata such as tags or categories associated with them, we determine whether this information can be used in the recommendation process.
Evaluation procedure
Consumption data is first sorted by timestamp. This ordered dataset is then split into a training set and a test set by selecting a time boundary: records with a timestamp before the boundary form the training set, while records with a timestamp after the boundary form the test set. Each recommendation algorithm must then predict the user behavior in the test set, using only the training set as input. The generated suggestions are evaluated with commonly used metrics such as precision, recall, F1 score and root mean square error (RMSE). Finally, we compare the performance of the different algorithms and select the optimal solution.
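The temporal split and the set-based metrics described above can be sketched as follows. This is a minimal illustration, not Sugestio's actual implementation; the record layout, field names and the example data are assumptions made for the sketch.

```python
from datetime import datetime

# Hypothetical consumption records: (user_id, item_id, timestamp).
records = [
    ("alice", "item1", datetime(2011, 1, 5)),
    ("alice", "item2", datetime(2011, 2, 10)),
    ("bob",   "item1", datetime(2011, 1, 20)),
    ("bob",   "item3", datetime(2011, 3, 1)),
    ("alice", "item3", datetime(2011, 3, 15)),
]

# Time boundary chosen for the split (arbitrary example value).
boundary = datetime(2011, 3, 1)

# Temporal split: records before the boundary train the algorithm,
# records from the boundary onward are held out as the test set.
train = [r for r in records if r[2] < boundary]
test = [r for r in records if r[2] >= boundary]


def precision_recall_f1(recommended, consumed):
    """Compare a set of recommended items against the items the user
    actually consumed during the test period."""
    hits = len(recommended & consumed)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(consumed) if consumed else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, if an algorithm recommends {"item3", "item4"} to a user who consumed only "item3" in the test period, precision is 0.5 and recall is 1.0.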
Cold start problems
New websites or applications don’t have much historical data to work with, which makes this standard evaluation procedure unreliable. The Sugestio team has a background in recommendation research and has worked with publicly available datasets such as Netflix, MovieLens, Jester and BookCrossing. Based on our experience with these datasets and on projected user behavior, we can make an educated guess as to which algorithms will perform well. Once sufficient data has been collected, the standard evaluation procedure can be applied.