Friday, March 23, 2012

Retrieving Top x within Top y within Top z ...

How can I efficiently retrieve the top x managers within the top y regions within the top z states within the top q countries?

The only way I have been able to do this is to first find the top q countries, and store the list in an IN clause. Then for each of those, find the top z states and create another IN clause that contains the country+state concatenated. Then for each of those, find the top y regions and concatenate country+state+region. Then finally find the top x managers within this IN clause.

This works fine for a few hundred records, but once it reaches the thousands, it takes much too long. The final IN clause contains thousands of entries.

Isn't there a simpler way to approach this, especially with SQL Server 2005?

Instead of an IN clause containing a comma-delimited list, you can write a query inside the IN clause parentheses.

That might be a bit simpler. Just nest your four queries.

Now, I want to make sure I understand your example.

Let's say there are 100 countries we do business in, and there are 1000 salesmen in each country.

I sell five times more than any other salesman in the world.

However, the other 999 salesmen in my country are slack-jawed buffoons who have sold almost nothing.

So, my country has the lowest overall sales of any other country.

If your list was for top salesmen instead of top managers, would I be on it?

If you wanted the top 5 salesmen, would 4 of my slack-jawed colleagues also be on the list, since together we are the top 5 salesmen in our country?

How I might suggest to proceed would be influenced by your answers!

|||

You've nailed the complexity of this problem.

If you were the top salesman in the lowest-sales country, and we were looking at the top 50 countries, then no, you would not be selected.

If we were looking at all 100 countries and 5 salesmen in each, then you and 4 others would be selected.

We're looking for the top x in EACH of the top y ...

Because of this, I cannot get the sub-query to work, since it just ends up with the top x overall. Is there any way in SQL Server to ask for the Top x FOR EACH y?

|||

give me some create table statements and insert statements for sample data, and I'll see what I can do!


No comments:

Post a Comment