How to find IF single rows meet a criteria ELSE aggregate multiple rows within a group - sql

I have some accounting data where I need to select a single row within a group if it meets a dollar amount criterion. If no single row does, I need to sum/combine multiple rows in that group to see whether the group as a whole meets the criterion. Example data:
Continent      Region  Sales Amount
South America  North   $300
South America  South   $100
South America  West    $500
South America  East    $200
North America  North   $100
North America  South   $50
North America  West    $50
North America  East    $400
Europe         North   $100
Europe         South   $200
Europe         West    $100
Europe         East    $100
Asia           North   $75
Asia           South   $100
Asia           West    $100
Asia           East    $100
Africa         North   $500
Africa         South   $700
Africa         West    $100
Africa         East    $100
In the above example, I want to find all continents that have a single region/row with at least $500 in sales, OR continents where 2 or more regions can be combined to meet the $500 amount. My expected result would be:
Continent      Region_1                Region_2        Sales Amount_1          Sales Amount_2
South America  West                    not applicable  $500
North America  North,East              not applicable  $500
Europe         North,South,West,East   not applicable  $500
Asia           does not meet criteria  not applicable  does not meet criteria
Africa         South                   North           $700                    $500
Region_2 is only applicable if more than one region within a continent meets the sales amount criteria of $500 on its own.
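One possible reading of the requirement, as a rough sketch rather than a full answer: first check whether any single region reaches $500, and only if none does, compare the continent's total against $500. This runs the SQL through Python's built-in sqlite3 module so it can be tried directly. The table name sales, its column names, and the status labels are my own assumptions. It does not try to pick the smallest combination of regions (such as North,East for North America), only whether the combined regions reach the threshold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and column names; the amounts are the question's sample data.
conn.execute("CREATE TABLE sales (continent TEXT, region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("South America", "North", 300), ("South America", "South", 100),
        ("South America", "West", 500), ("South America", "East", 200),
        ("North America", "North", 100), ("North America", "South", 50),
        ("North America", "West", 50), ("North America", "East", 400),
        ("Europe", "North", 100), ("Europe", "South", 200),
        ("Europe", "West", 100), ("Europe", "East", 100),
        ("Asia", "North", 75), ("Asia", "South", 100),
        ("Asia", "West", 100), ("Asia", "East", 100),
        ("Africa", "North", 500), ("Africa", "South", 700),
        ("Africa", "West", 100), ("Africa", "East", 100),
    ],
)

query = """
SELECT continent,
       CASE
           WHEN MAX(amount) >= 500 THEN 'single region'     -- IF a single row qualifies
           WHEN SUM(amount) >= 500 THEN 'combined regions'  -- ELSE try the group total
           ELSE 'does not meet criteria'
       END AS status,
       SUM(CASE WHEN amount >= 500 THEN 1 ELSE 0 END) AS regions_meeting_alone,
       SUM(amount) AS continent_total
FROM sales
GROUP BY continent
ORDER BY continent
"""
summary = conn.execute(query).fetchall()
for row in summary:
    print(row)
```

The regions_meeting_alone column tells you when a Region_2 would be needed (Africa has two such regions); listing the region names themselves would take a second step, for example a join back to the rows where amount >= 500.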

Related

How can I pull out the second highest product usage from a SQL Server table?

We have a product usage table for software. It has 4 fields: [product name], [usage month], [users] and [Country]. We must report the data by Country and Product Name for licensing purposes. Our rule is to report the second highest number of users per country for each product. The same products can be used in all countries. It is based on monthly usage numbers, so we need the second peak usage for FY 2020. Since all of the data is in one table, I am having trouble figuring out the SQL to get the information I need.
I am thinking I need to do multiple selects (an inner select?) and group the data in a way that pulls out the product name, peak usage and country. But that is where I am getting confused as to the best approach.
Example Data looks like this:
[product name], [usage month], [users], [Country]
Product1 January 831 United States of America
Product1 December 802 United States of America
Product1 September 687 United States of America
Product1 August 407 United States of America
Product1 July 799 United States of America
Product1 June 824 United States of America
Product1 April 802 United States of America
Product1 May 796 United States of America
Product1 February 847 United States of America
Product1 March 840 United States of America
Product1 November 818 United States of America
Product1 October 841 United States of America
Product2 March 1006 United States of America
Product2 February 1076 United States of America
Product2 April 890 United States of America
Product2 May 831 United States of America
Product2 September 538 United States of America
Product2 October 1053 United States of America
Product2 July 673 United States of America
Product2 August 87 United States of America
Product2 November 994 United States of America
Product2 January 1042 United States of America
Product2 December 952 United States of America
Product2 June 873 United States of America
I had originally thought about breaking this out into multiple tables and then running SQL against each product table, but since this is something I will need to do monthly, I didn't want to redesign the ETL that loads the data because 1) I don't control that ETL and 2) I felt like that would be a move backwards for a repetitive task. We were also looking into Power BI to do this for us, but haven't found the right approach, and I would honestly rather have this in SQL.
If I follow you correctly:
select *
from (
select t.*,
row_number() over(partition by product_name, country order by users desc) rn
from mytable t
) t
where rn = 2
This generates one row per product and country, that corresponds to the second highest number of users.
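To check the query above against the sample data from the question, here is a small sketch that runs it through Python's built-in sqlite3 module (assuming the bundled SQLite is 3.25 or newer, which is when window functions were added). The table name mytable comes from the answer; the column names product_name and usage_month are assumed spellings of the bracketed names in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (product_name TEXT, usage_month TEXT, users INTEGER, country TEXT)"
)
usa = "United States of America"
product1 = [("January", 831), ("December", 802), ("September", 687), ("August", 407),
            ("July", 799), ("June", 824), ("April", 802), ("May", 796),
            ("February", 847), ("March", 840), ("November", 818), ("October", 841)]
product2 = [("March", 1006), ("February", 1076), ("April", 890), ("May", 831),
            ("September", 538), ("October", 1053), ("July", 673), ("August", 87),
            ("November", 994), ("January", 1042), ("December", 952), ("June", 873)]
rows = [("Product1", m, u, usa) for m, u in product1]
rows += [("Product2", m, u, usa) for m, u in product2]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)

# Number the rows of each (product, country) group from highest to lowest
# user count, then keep the row numbered 2.
second_highest = conn.execute("""
    SELECT product_name, country, usage_month, users
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY product_name, country
                                  ORDER BY users DESC) AS rn
        FROM mytable t
    ) ranked
    WHERE rn = 2
    ORDER BY product_name
""").fetchall()
print(second_highest)
```

For this data the second peak is October for both products (841 users for Product1, 1053 for Product2).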
For one country it should be fairly simple. This is off the top of my head, so it may need a bit of tweaking. The table and column names are my guesses and are probably not what you actually have.
SELECT users
FROM ProductCounts
WHERE Country = @Country
ORDER BY users DESC
-- skip the highest row and return the next one (SQL Server 2012 or later)
OFFSET 1 ROWS FETCH NEXT 1 ROWS ONLY;
I can't tell from the question how your data gets entered, so I can't suggest a better way to store it for this report.
You can use this. It returns the second highest user count for each country and product. Note that a country/product combination with only one user count will not show up; there have to be at least two different user counts per country and product.
SELECT DISTINCT
country, product, users
FROM
ProductCounts
WHERE
-- count the distinct user counts at or above this row's value;
-- exactly 2 means this row holds the second highest value
(SELECT COUNT(DISTINCT p.users) FROM ProductCounts AS p
WHERE
p.country = ProductCounts.country
AND
p.product = ProductCounts.product
AND
p.users >= ProductCounts.users ) = 2
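One thing worth checking with a count-based approach like this is what happens when two months have the same user count. The sketch below (Python's sqlite3, with made-up rows for a single country and product) runs the correlated count twice: once with COUNT(*) and once with COUNT(DISTINCT p.users). The outer alias o and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProductCounts (country TEXT, product TEXT, users INTEGER)")
# Hypothetical data: the second highest value (841) occurs in two months.
conn.executemany(
    "INSERT INTO ProductCounts VALUES (?, ?, ?)",
    [("US", "Product1", 847), ("US", "Product1", 841),
     ("US", "Product1", 841), ("US", "Product1", 800)],
)

template = """
SELECT DISTINCT o.country, o.product, o.users
FROM ProductCounts AS o
WHERE (SELECT {counter}
       FROM ProductCounts AS p
       WHERE p.country = o.country
         AND p.product = o.product
         AND p.users >= o.users) = 2
"""
# COUNT(*) counts rows: each 841 row sees three rows at or above it, so nothing equals 2.
with_count_star = conn.execute(template.format(counter="COUNT(*)")).fetchall()
# COUNT(DISTINCT ...) counts values: 847 and 841 are the two values at or above 841.
with_count_distinct = conn.execute(
    template.format(counter="COUNT(DISTINCT p.users)")
).fetchall()
print(with_count_star, with_count_distinct)
```

So counting rows silently drops a group when there is a tie at the second place, while counting distinct values still returns it.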

How to show SUM() value in T-SQL with condition?

I am trying to solve the third problem from this site: https://sqlzoo.net/wiki/SUM_and_COUNT.
3.) Give the total GDP of Africa:
Given relation to solve this:
name continent area population gdp
Afghanistan Asia 652230 25500100 20343000000
Albania Europe 28748 2831741 12960000000
Algeria Africa 2381741 37100000 188681000000
Andorra Europe 468 78115 3712000000
Angola Africa 1246700 20609294 100990000000
...
I wrote this:
SELECT SUM(gdp)
FROM world
GROUP BY continent = 'Africa'
It basically gives me 2 sums (Africa and the rest of the world).
SUM(gdp)
69762111000000
1811788000000
How to show only sum of gdp of Africa?
Add the where clause:
SELECT SUM(gdp) FROM world WHERE continent = 'Africa'
This way only the rows for Africa are included in the sum.
SELECT SUM(gdp)
FROM world
WHERE continent = 'Africa'
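To see why the GROUP BY version produced two sums, here is a small sketch using Python's sqlite3 and only the five sample rows shown in the question (so the totals are much smaller than the real SQLZOO numbers). GROUP BY continent = 'Africa' groups on a true/false expression, giving one group for each outcome, whereas WHERE filters the rows before anything is summed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE world (name TEXT, continent TEXT, area INTEGER, population INTEGER, gdp INTEGER)"
)
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?, ?, ?)",
    [
        ("Afghanistan", "Asia", 652230, 25500100, 20343000000),
        ("Albania", "Europe", 28748, 2831741, 12960000000),
        ("Algeria", "Africa", 2381741, 37100000, 188681000000),
        ("Andorra", "Europe", 468, 78115, 3712000000),
        ("Angola", "Africa", 1246700, 20609294, 100990000000),
    ],
)

# WHERE removes the non-African rows first, so a single sum comes back.
africa_total = conn.execute(
    "SELECT SUM(gdp) FROM world WHERE continent = 'Africa'"
).fetchone()[0]

# Grouping on the comparison yields one group per result of the expression:
# 0 (not Africa) and 1 (Africa).
two_groups = conn.execute("""
    SELECT continent = 'Africa' AS is_africa, SUM(gdp)
    FROM world
    GROUP BY continent = 'Africa'
    ORDER BY is_africa
""").fetchall()
print(africa_total, two_groups)
```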

Why does this correlated subquery work? (SQLZOO Select within Select 7)

So you don't have to go searching out for it, the data they're presenting for the question set looks like this and the table is called world
name continent area population gdp
Afghanistan Asia 652230 25500100 20343000000
Albania Europe 28748 2831741 12960000000
Algeria Africa 2381741 37100000 188681000000
Andorra Europe 468 78115 3712000000
Angola Africa 1246700 20609294 100990000000
They present an exercise where you use a query to select the largest country by area in each continent. They do most of it for you so getting to the answer isn't hard. This is the correct query:
SELECT continent, name, area FROM world x
WHERE area >= ALL
(SELECT area FROM world y
WHERE y.continent=x.continent
AND area>0)
I can understand what must be happening for it to work, but not why. y.continent = x.continent must be some sort of fancy GROUP BY, but... the lesson doesn't explain it and I'd really like to understand what's happening behind the scenes.
It's not a loop, or grouping. Let's picture the rowset that is aliased as x in the query:
name continent area population gdp
Afghanistan Asia 652230 25500100 20343000000
Albania Europe 28748 2831741 12960000000
Algeria Africa 2381741 37100000 188681000000
Andorra Europe 468 78115 3712000000
Angola Africa 1246700 20609294 100990000000
Now let's add an extra column that "contains" the subquery [1], with the outer x value substituted:
name continent area population gdp subquery
Afghanistan Asia 652230 25500100 20343000000 (select area FROM world y WHERE y.continent='Asia' AND area>0)
Albania Europe 28748 2831741 12960000000 (select area FROM world y WHERE y.continent='Europe' AND area>0)
Algeria Africa 2381741 37100000 188681000000 (select area FROM world y WHERE y.continent='Africa' AND area>0)
Andorra Europe 468 78115 3712000000 (select area FROM world y WHERE y.continent='Europe' AND area>0)
Angola Africa 1246700 20609294 100990000000 (select area FROM world y WHERE y.continent='Africa' AND area>0)
Let's represent those results that are returned by the subquery:
name continent area population gdp subquery
Afghanistan Asia 652230 25500100 20343000000 (652230)
Albania Europe 28748 2831741 12960000000 (28748,468)
Algeria Africa 2381741 37100000 188681000000 (2381741,1246700)
Andorra Europe 468 78115 3712000000 (28748,468)
Angola Africa 1246700 20609294 100990000000 (2381741,1246700)
Now, for each row, we compare our area column against each value returned by the subquery. That's what the ALL forces: the WHERE clause is only satisfied if all of those comparisons are true. And the nature of the comparison (>=) means that it's only true across all comparisons for the country with the largest area within each continent.
[1] Since it's a correlated subquery, it's effectively evaluated once per row, so I think it's reasonable to show what is evaluated on a per-row basis. Note that a naive implementation may in fact evaluate the subquery a row at a time, and so it will e.g. gather all of the areas within Europe (and Africa) twice whilst processing the entire outer query.
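The per-row picture above can be mimicked literally. The sketch below (Python's sqlite3, five sample rows from the question) plays the part of the naive implementation from the footnote: for each outer row it runs the inner query with that row's continent substituted, then checks that the row's area is >= every value returned. SQLite has no ALL operator, so the comparison is done in Python; this illustrates the semantics and is not how a real engine has to execute it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE world (name TEXT, continent TEXT, area INTEGER, population INTEGER, gdp INTEGER)"
)
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?, ?, ?)",
    [
        ("Afghanistan", "Asia", 652230, 25500100, 20343000000),
        ("Albania", "Europe", 28748, 2831741, 12960000000),
        ("Algeria", "Africa", 2381741, 37100000, 188681000000),
        ("Andorra", "Europe", 468, 78115, 3712000000),
        ("Angola", "Africa", 1246700, 20609294, 100990000000),
    ],
)

# The outer query: every row of world, aliased x in the original.
outer_rows = conn.execute("SELECT continent, name, area FROM world ORDER BY name").fetchall()

largest = []
for continent, name, area in outer_rows:
    # The correlated subquery, with x.continent replaced by this row's value.
    inner_areas = [a for (a,) in conn.execute(
        "SELECT area FROM world y WHERE y.continent = ? AND area > 0", (continent,)
    )]
    # area >= ALL (...): the row survives only if every comparison is true.
    if all(area >= other for other in inner_areas):
        largest.append((continent, name, area))

print(largest)
```

Albania passes because 28748 >= 28748 and 28748 >= 468; Andorra fails on the first comparison.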
You simply want the subquery to return area values for a specific continent. In other words, you want to compare the area of a country with area of all countries being on the same continent.
For example, for the second row you compare 28748 with all values in sequence (28748, 468) when you evaluate the condition. That sequence is returned by the subquery, and it considers the fact that you want to compare only with countries in Europe.
EDIT: you ask how the nested query does the GROUP BY. The answer is: it does not. Because the data has just one country per continent with the largest area, it may seem that a GROUP BY is performed. However, if we had different data:
name continent area population gdp
--------------------------------------------------------
Afghanistan Asia 652230 25500100 20343000000
Pakistan Asia 652230 2500100 2034300000
then both rows are returned for the same continent, since both satisfy the condition of having the largest area in that continent.
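The tie case is easy to try out. SQLite (used here through Python's sqlite3) does not support >= ALL, so this sketch swaps in a comparison against MAX(area) from the same correlated subquery, which gives the same rows as long as the subquery returns at least one non-NULL area. The two Asian rows are the ones from the example above; the European rows are from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (name TEXT, continent TEXT, area INTEGER)")
conn.executemany(
    "INSERT INTO world VALUES (?, ?, ?)",
    [
        ("Afghanistan", "Asia", 652230),
        ("Pakistan", "Asia", 652230),   # same area as in the tie example above
        ("Albania", "Europe", 28748),
        ("Andorra", "Europe", 468),
    ],
)

# Stand-in for "area >= ALL (...)": compare against the largest area
# found by the correlated subquery for the outer row's continent.
largest = conn.execute("""
    SELECT continent, name, area
    FROM world x
    WHERE area >= (SELECT MAX(y.area)
                   FROM world y
                   WHERE y.continent = x.continent
                     AND y.area > 0)
    ORDER BY continent, name
""").fetchall()
print(largest)
```

Asia appears twice in the result, which a real GROUP BY continent would never do.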

SQL issue that I should be able to answer but I cannot

Here's the tiny bit of data I am to query:
name continent area population gdp
Afghanistan Asia 652230 25500100 20343000000
Albania Europe 28748 2831741 12960000000
Algeria Africa 2381741 37100000 188681000000
Andorra Europe 468 78115 3712000000
Angola Africa 1246700 20609294 100990000000
Given the above data, the request was to select two columns with France, Germany, Italy and their populations.
Here was my thought:
Select name, population
where name = 'France','Germany','Italy'
Where did I go wrong, if you would be so kind?
The = operator doesn't take multiple arguments. You're looking for the in operator. Additionally, you're missing a from clause:
SELECT name, population
FROM populations
WHERE name IN ('France', 'Germany', 'Italy')
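A quick way to convince yourself of this, sketched with Python's sqlite3. The sample rows in the question don't include France, Germany or Italy, so the sketch filters on three names that are there instead. The table name populations is taken from the answer above, and only the name and population columns are created. IN behaves like a chain of OR comparisons, and the comma-separated list after = is rejected as a syntax error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE populations (name TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO populations VALUES (?, ?)",
    [("Afghanistan", 25500100), ("Albania", 2831741), ("Algeria", 37100000),
     ("Andorra", 78115), ("Angola", 20609294)],
)

with_in = conn.execute("""
    SELECT name, population
    FROM populations
    WHERE name IN ('Albania', 'Andorra', 'Angola')
    ORDER BY name
""").fetchall()

# IN is shorthand for comparing against each value in turn.
with_or = conn.execute("""
    SELECT name, population
    FROM populations
    WHERE name = 'Albania' OR name = 'Andorra' OR name = 'Angola'
    ORDER BY name
""").fetchall()

# The form from the question: = followed by a comma-separated list.
try:
    conn.execute(
        "SELECT name, population FROM populations WHERE name = 'Albania','Andorra','Angola'"
    )
    comma_list_accepted = True
except sqlite3.OperationalError:
    comma_list_accepted = False

print(with_in, comma_list_accepted)
```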

Sams Teach Yourself SQL in 10 minutes - Question about GROUP BY

I read the book "Sams Teach Yourself SQL in 10 Minutes, Third Edition", and in lesson 10 "Grouping Data", section "Creating Groups", I can't understand the following:
"Aside from the aggregate calculations statements, every column in your SELECT statement must be present in the GROUP BY clause."
Why? I tried this and I think that it is not true.
For example, consider a table 'World' with the columns 'continent', 'country', 'population'.
SELECT continent, country
FROM World
GROUP BY continent;
According to the book, this should lead to an error, right? But it doesn't. I can group my data by continent (so the result has 7 continents) and, next to each continent, I get a random country name.
Like this
continent country
North America Canada
South America Brazil
Europe France
Africa Cameroon
Asia Japan
Australia New Zealand
Antarctica TuxLand
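You are most probably using MySQL, which allows ungrouped and unaggregated expressions in the SELECT clause (in newer versions, only when the ONLY_FULL_GROUP_BY SQL mode is disabled).
For an arbitrary column such as country, this is a violation of the standard, of course.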
This is intended to simplify GROUP BY with joins on a PRIMARY KEY:
SELECT a.*, SUM(b.value)
FROM a
JOIN b
ON b.a_id = a.id
GROUP BY
a.id
Normally, you would have either to add all columns from a into the GROUP BY clause or use a subquery.
MySQL allows you not to do it since all values from a are guaranteed to be the same for a given value of the PRIMARY KEY (which is grouped on).
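Here is a tiny runnable version of that join, as a sketch using Python's sqlite3 (SQLite is similarly permissive about bare columns in a grouped query). The tables a and b follow the query above; the name column and the rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: a has a primary key, b references it through a_id.
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE b (a_id INTEGER, value INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?)", [(1, "first"), (2, "second")])
conn.executemany("INSERT INTO b VALUES (?, ?)", [(1, 10), (1, 20), (2, 5)])

# Grouping by a.id alone is safe here: every row in a group shares the
# same a.id, hence the same a.name, so there is nothing ambiguous to pick.
totals = conn.execute("""
    SELECT a.*, SUM(b.value)
    FROM a
    JOIN b ON b.a_id = a.id
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()
print(totals)
```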
This is correct and produces no error in some SQL dialects, such as MySQL. You may optionally GROUP BY more than one column, but it's not required.
For the column that is not grouped, the engine picks a value from one row of each group (often, but not necessarily, the first one it reads), so in your case it returns one country per continent.
MySQL allows this with a single field in the GROUP BY. PostgreSQL does not: it raises an error unless the other selected columns are functionally dependent on the grouped ones, for example when grouping by the table's primary key.
The tutorial probably assumes you should GROUP BY all the fields you select so that you don't lose any data: every country/continent pair in the above example would be shown, but only once.
Here's an example table:
Continent | Country | Random_Field
---------------------------------------------
North America Canada Cake
North America Canada Dog
South America Brazil Cat
Europe France Frog
Africa Cameroon House
Asia Japan Gadget
Asia India Dance
Australia New Zealand Frodo
Antarctica TuxLand Linux
In your first statement:
SELECT continent, country
FROM World
GROUP BY continent;
The output would be:
Continent | Country
--------------------------
North America Canada
South America Brazil
Europe France
Africa Cameroon
Asia Japan
Australia New Zealand
Antarctica TuxLand
Notice one of the Asia rows was lost, despite being different.
Using a GROUP BY on both:
SELECT continent, country
FROM World
GROUP BY continent, country;
Would yield:
Continent | Country
-----------------------------
North America Canada
South America Brazil
Europe France
Africa Cameroon
Asia Japan
Asia India
Australia New Zealand
Antarctica TuxLand
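The two outputs above can be reproduced with a short sketch in Python's sqlite3. SQLite, like MySQL in its permissive mode, accepts the bare country column; which Asian country it shows is not something to rely on, so the sketch only checks that it is one of the two. The column name random_field is an assumed spelling of the header in the example table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE World (continent TEXT, country TEXT, random_field TEXT)")
conn.executemany(
    "INSERT INTO World VALUES (?, ?, ?)",
    [
        ("North America", "Canada", "Cake"), ("North America", "Canada", "Dog"),
        ("South America", "Brazil", "Cat"), ("Europe", "France", "Frog"),
        ("Africa", "Cameroon", "House"), ("Asia", "Japan", "Gadget"),
        ("Asia", "India", "Dance"), ("Australia", "New Zealand", "Frodo"),
        ("Antarctica", "TuxLand", "Linux"),
    ],
)

# One row per continent; country is taken from some row of each group.
by_continent = conn.execute(
    "SELECT continent, country FROM World GROUP BY continent"
).fetchall()

# One row per distinct (continent, country) pair; nothing is lost.
by_both = conn.execute(
    "SELECT continent, country FROM World GROUP BY continent, country"
).fetchall()

asia_single = [country for continent, country in by_continent if continent == "Asia"]
asia_both = sorted(country for continent, country in by_both if continent == "Asia")
print(len(by_continent), len(by_both), asia_single, asia_both)
```

Grouping by continent alone gives 7 rows with a single Asian country; grouping by both columns gives 8 rows with both India and Japan, and the duplicate Canada rows collapse into one.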