Hypothetical situation, lets say you've got a list of movies that you want to rank in a website or a report or something, and you have user-submitted ratings for them, but some are more popular than others, so your data looks like this:
data ratings; input name $ rating; datalines; Lincoln 9 Lincoln 8 Lincoln 9 Amour 9 Argo 5 Argo 10 ; run;
The easiest thing to do would be to calculate an average rating for each movie like this:
proc sql; select distinct name, avg(rating) as average from ratings group by name order by average desc; run;
But hey! That's not cool. It looks like Amour wins, because its average rating is 9. Maybe we want to consider Lincoln as better because 3 people think it's very high. A good way to deal with this is by instead taking a Bayesian Average.
This means we're going to add in some "dummy" votes for each movie, who give each movie the average rating a movie gets. How many (C) is a judgement call, the more we add, the harder we make it for an obscure movie to be near the top. Likewise, if a movie's first rating is low, it keeps it from suddenly dropping to the bottom of the list. If we expect thousands of ratings for each movie, a C=1000 might be appropriate. In this example, I use a small C of 10.
proc sql; select avg(rating) into :average from ratings; select distinct name, (sum(rating) + &average * 10) / (count(*) + 10) as b_average from ratings group by name order by b_average; quit;
And look! Lincoln is back on top, since its bayesian average more closely reflects a product of the number of ratings it has and what those ratings are.