Microsoft published a paper at ICML 2010 titled ‘Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft’s Bing Search Engine’. The model is claimed to have won an internal Microsoft competition for the most accurate and scalable CTR predictor. This article walks through inference for this model (let’s call it Ad predictor) step by step.
Pros and Cons
I like it because it is fully Bayesian, and Bayesian methods are beautiful. Online learning is supported naturally, and the prediction accuracy is comparable with FTRL and OWLQN. Both training and prediction are lightweight and fast. One shortcoming is that the model is not sparse, which may be a big issue on large datasets with a huge number of features.
Inference using Expectation Propagation step by step
First, here is the factor graph of Ad predictor.
For each sample, we can use the formula from step 13 to update the posterior parameters of W, which is very easy to implement.
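The per-sample update can be sketched in a few lines of Python. This is only a minimal sketch: it assumes binary (0/1) features, labels y in {-1, +1}, a factorized Gaussian posterior per weight, and the update equations as I recall them from the original paper. The names `update`, `v_w`, `beta`, and the `(mean, variance)` dictionary layout are my own illustrative choices, and the guard against underflow is a practical safeguard rather than part of the model.

```python
import math

def v_w(t):
    """Correction terms v(t) = N(t) / Phi(t) and w(t) = v(t) * (v(t) + t)."""
    pdf = math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-t / math.sqrt(2.0))
    if cdf < 1e-300:
        # Assumed safeguard: asymptotic values for very negative t.
        return -t, 1.0
    v = pdf / cdf
    return v, v * (v + t)

def update(weights, active, y, beta=1.0, prior=(0.0, 1.0)):
    """Update the posterior in place for one sample.

    weights: dict mapping feature id -> (mean, variance)
    active:  unique ids of the features present in this sample
    y:       +1 for a click, -1 for no click
    beta:    assumed noise standard deviation of the probit link
    prior:   assumed (mean, variance) for features not seen before
    """
    feats = list(active)
    params = [weights.get(f, prior) for f in feats]
    total_mean = sum(m for m, _ in params)
    total_var = beta ** 2 + sum(s2 for _, s2 in params)
    total_std = math.sqrt(total_var)
    v, w = v_w(y * total_mean / total_std)
    for f, (m, s2) in zip(feats, params):
        new_mean = m + y * (s2 / total_std) * v
        new_var = s2 * (1.0 - (s2 / total_var) * w)
        weights[f] = (new_mean, new_var)
    return weights

# Usage: one clicked impression with two hypothetical features.
weights = {}
update(weights, ["ad=42", "query=shoes"], y=+1)
```

Each sample only touches the weights of its active features, which is why training stays cheap even in an online setting.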
Prediction
After training, we can predict with the following formula:
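As a rough sketch of the prediction step, assuming the standard form from the original paper, the click probability is the Gaussian CDF of the summed means divided by the total standard deviation. The function name `predict`, the `beta` parameter, and the `(mean, variance)` dictionary layout are hypothetical choices for illustration.

```python
import math

def predict(weights, active, beta=1.0, prior=(0.0, 1.0)):
    """Return p(click) for a sample with the given active binary features.

    weights: dict mapping feature id -> (mean, variance)
    beta:    assumed noise standard deviation of the probit link
    prior:   assumed (mean, variance) for features not seen in training
    """
    params = [weights.get(f, prior) for f in active]
    total_mean = sum(m for m, _ in params)
    total_var = beta ** 2 + sum(s2 for _, s2 in params)
    z = total_mean / math.sqrt(total_var)
    # Standard normal CDF written with the complementary error function.
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

# Usage: an unseen feature carries no evidence, so the prediction is 0.5.
p_unknown = predict({}, ["ad=42"])
p_positive = predict({"ad=42": (1.0, 0.5)}, ["ad=42"])
```

Note that the variances enter the denominator, so an uncertain model pulls its predictions towards 0.5.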
Prediction Accuracy
I compared it with FTRL and OWLQN on one dataset for age and gender prediction. The AUC of this model is comparable with those of OWLQN and FTRL, so I recommend giving it a try on your own problem.
Insights
1). The variance of each feature's weight decreases after every exposure, which makes sense.
2). The model gives samples with more features a larger variance, which does not make much sense. I think the reason is that we assume all features are independent. Any insights?
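To make the second point concrete: under the independence assumption, the variance of a sample's score is the noise term plus the sum of the variances of its active features, so it grows linearly with the number of features. A tiny illustration, where `predictive_variance` and `beta` are hypothetical names and the variance values are made up:

```python
def predictive_variance(variances, beta=1.0):
    """Score variance for one sample under independent Gaussian weights.

    variances: posterior variances of the sample's active features
    beta:      assumed noise standard deviation of the probit link
    """
    return beta ** 2 + sum(variances)

# Same per-feature uncertainty, different numbers of active features.
few = predictive_variance([1.0] * 3)
many = predictive_variance([1.0] * 30)
```

With correlated weights there would be covariance terms that could offset this growth, but the factorized posterior drops them.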