
Joe's blog on analog models got me thinking about what makes a good analog model. I suspect the models that have given analogs a bad reputation were simply not done well.

They may be as little as an Excel dump of everything in the demographic database, with instructions for the analyst to choose some variables and "find an analog." Analogs have been around for a long time. First developed by William Applebaum, they were used to predict sales in subareas like zip codes or census tracts when it was very difficult to calculate statistical measures. These days they are more often applied to primary trade areas, and with modern computing power a variety of statistics can be used to build analog models.


In my experience, good analog models add discipline and understanding to site selection, especially when they complement a more detailed model. Although they may be easy to develop, there is a lot of knowledge behind them, both technical and managerial. Although they focus on primary trade areas, good analog models should build on the insights of more detailed spatial interaction models and on management experience. They need good definitions of trade areas, and stores have to be grouped into meaningful segments. They have to include the most important factors and give them the proper weights (we used four or five factors and gave more weight to some than others). Multiple regression helped to sort out the relative importance of each factor and to understand how much they overlapped.

The keys to a great analog model in my estimation are:

  1. Choosing the right variables and measuring them correctly.
  2. Defining the right trade area.
  3. Weighting the model factors properly.
  4. Reporting the results with sufficient detail to inform, but not overwhelm.

Getting the right variables in the model is arguably the most important step. The best models I have seen included solid measures of sales potential, competition, density, distance and site characteristics. I put this ahead of getting the right trade area, because selecting the right variables is important to defining the trade area properly. If the trade areas are large, then greater weight should be given to closer areas than to more distant ones when calculating averages. Competition needs to be clearly identified and either included in the model factors or used to adjust the trade area or the potential within it.
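
To make the point about weighting closer areas more concrete, here is a minimal sketch of a distance-weighted average. The weight formula, 1 / (1 + distance) ** decay, is an assumption chosen for illustration and not the formula used in my models. The function name and the sample figures are hypothetical.

```python
def distance_weighted_mean(values, distances, decay=1.0):
    """Average subarea values, giving closer subareas more weight.

    The weight 1 / (1 + distance) ** decay is an assumed form for
    illustration only; decay=0 reduces to a simple unweighted mean.
    """
    if len(values) != len(distances):
        raise ValueError("values and distances must have the same length")
    weights = [1.0 / (1.0 + d) ** decay for d in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical median incomes for three subareas at 1, 3 and 7 minutes
# of drive time from the site.
incomes = [60000.0, 50000.0, 40000.0]
minutes = [1.0, 3.0, 7.0]

weighted = distance_weighted_mean(incomes, minutes)
simple = sum(incomes) / len(incomes)
```

In this made-up example the nearby, higher-income subarea pulls the weighted average above the simple one, which is the effect you want when the trade area is large.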

It is common practice to use density to group stores for analog models. In fact, density shows up in three different places in my own models. It is used to group stores for modeling, as an adjustment to travel time and trade area size, and again as a model factor within each density group. In my experience, site characteristics were the most difficult to measure and build into analog models. Part of the problem was the sheer number of site factors and the difficulty of getting unbiased ratings. Part was the lack of variation in many factors. For example, we could never get parking spaces or visibility to work in our early models, because all our model stores had adequate parking and visibility.

We tried to keep the analogs within the same market as the target site, although our more detailed model worked well across markets.  We also found it necessary to adjust the sales of our older stores to give better estimates of mature sales at the target sites.

If you have enough stores to always find a couple of close analogs, then weighting is less important. If you have trouble finding good matches, then you need to think about which factors are the most important and how much weight should be given to each (I use regression to suggest weights and to balance the similarity measure).
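
A weighted similarity measure of the kind described here could be sketched as follows. This is one possible form and not my actual model: the factor names, the weights and the store data are all hypothetical, and the weights are simply typed in where a real model might take them from a regression. Differences are scaled by each factor's standard deviation across stores so that factors measured in different units can be combined.

```python
from statistics import pstdev


def similarity_scores(target, stores, weights):
    """Score each store's similarity to the target site (1.0 = identical).

    Each factor difference is divided by that factor's standard deviation
    across the stores, then combined using the supplied weights.
    """
    factors = list(weights)
    # Fall back to 1.0 if a factor has no variation across stores.
    spread = {f: pstdev([s[f] for s in stores.values()]) or 1.0 for f in factors}
    total_weight = sum(weights.values())
    scores = {}
    for name, store in stores.items():
        distance = sum(
            weights[f] * abs(store[f] - target[f]) / spread[f] for f in factors
        ) / total_weight
        scores[name] = 1.0 / (1.0 + distance)
    return scores


# Hypothetical stores, target site and weights, for illustration only.
stores = {
    "A": {"potential": 100, "competitors": 2, "density": 3000},
    "B": {"potential": 120, "competitors": 3, "density": 3500},
    "C": {"potential": 60, "competitors": 6, "density": 900},
}
target = {"potential": 100, "competitors": 2, "density": 3000}
weights = {"potential": 0.5, "competitors": 0.3, "density": 0.2}

scores = similarity_scores(target, stores, weights)
ranked = sorted(scores, key=scores.get, reverse=True)
```

Reporting the per-factor scaled differences alongside the overall score would show the analyst why a store ranks where it does.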

The model output should include the values of the key drivers, a similarity measure for each driver, and an overall score. If there is much variation in the percentage of sales in the primary trade area, I like to see that percentage reported. It is also important that an analog model select more than one analog. With several analogs, you reduce the chance of selecting an outlier and you get a sort of confidence interval for the expected sales. If sales at your top analogs are close, you have more confidence in the target site. If there is wide variation in sales, or if you have difficulty finding a second analog, it is reason to be very careful.
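
As a rough illustration of reporting several analogs, the sketch below picks the top few stores by similarity and summarizes their sales as a mean and a range. The range is only an informal stand-in for the "sort of confidence interval" mentioned above and has no statistical basis. The function name, the scores and the sales figures are hypothetical.

```python
def summarize_analogs(scores, sales, k=3):
    """Pick the k most similar stores and summarize their sales."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    picked = [sales[name] for name in ranked]
    mean = sum(picked) / len(picked)
    low, high = min(picked), max(picked)
    return {
        "analogs": ranked,
        "mean": mean,
        "low": low,
        "high": high,
        # A wide spread relative to the mean is a signal to be careful.
        "spread_pct": (high - low) / mean * 100.0,
    }


# Hypothetical similarity scores and annual sales (in thousands).
example_scores = {"A": 0.9, "B": 0.8, "C": 0.4, "D": 0.7}
example_sales = {"A": 1000.0, "B": 1200.0, "C": 3000.0, "D": 1100.0}

summary = summarize_analogs(example_scores, example_sales, k=3)
```

Here the poorly matched store C, whose sales are far above the rest, is left out of the estimate, which is the protection against outliers that several close analogs provide.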

Alan Gordon
Specialties: Combining insights from academic research and from retail and real estate practitioners into statistical models for site evaluation, sales forecasting and direct marketing. I have gained unique insights in real estate modeling and sales forecasting from having simultaneous responsibility for both top line sales forecasting and real estate modeling.