How can I pull out the second highest product usage from a SQL Server table?

We have a product usage table for software. It has four fields: [product name], [usage month], [users] and [Country]. We must report the data by Country and Product Name for licensing purposes. Our rule is to report the second highest number of users per country for each product. The same products can be used in all countries. The figure is based on monthly usage numbers, so we need the second-highest monthly peak for FY 2020. Since all of the data is in one table, I am having trouble figuring out the SQL to get the information I need.
I am thinking I need to do multiple selects (an inner select?) and group the data in a way that pulls out the product name, peak usage and country. But that is where I am getting confused as to the best approach.
Example Data looks like this:
[product name], [usage month], [users], [Country]
Product1 January 831 United States of America
Product1 December 802 United States of America
Product1 September 687 United States of America
Product1 August 407 United States of America
Product1 July 799 United States of America
Product1 June 824 United States of America
Product1 April 802 United States of America
Product1 May 796 United States of America
Product1 February 847 United States of America
Product1 March 840 United States of America
Product1 November 818 United States of America
Product1 October 841 United States of America
Product2 March 1006 United States of America
Product2 February 1076 United States of America
Product2 April 890 United States of America
Product2 May 831 United States of America
Product2 September 538 United States of America
Product2 October 1053 United States of America
Product2 July 673 United States of America
Product2 August 87 United States of America
Product2 November 994 United States of America
Product2 January 1042 United States of America
Product2 December 952 United States of America
Product2 June 873 United States of America
I had originally thought about breaking this out into multiple tables and then running SQL against each product table, but since this is something I will need to do monthly, I didn't want to redesign the ETL that loads the data because 1) I don't control that ETL and 2) I felt that would be a step backwards for a repetitive task. We were also looking into Power BI to do this for us, but haven't found the right approach, and I would honestly rather have this in SQL.

If I follow you correctly:
select *
from (
select t.*,
row_number() over(partition by [product name], [Country] order by [users] desc) rn
from mytable t
) t
where rn = 2
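This generates one row per product and country, corresponding to the second highest number of users. Note that row_number() numbers tied values separately; if tied months should count as a single peak, use dense_rank() instead, which can then return more than one row per group.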
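To check the window-function approach against the sample data from the question, here is a minimal sketch that runs the same query shape in SQLite through Python's sqlite3 module. SQLite stands in for SQL Server here, and the table name product_usage and the underscore column names are assumptions made for the sketch, not names from the question.

```python
import sqlite3

# Sample rows copied from the question; table and column names are hypothetical.
COUNTRY = "United States of America"
USAGE = {
    "Product1": [("January", 831), ("December", 802), ("September", 687),
                 ("August", 407), ("July", 799), ("June", 824),
                 ("April", 802), ("May", 796), ("February", 847),
                 ("March", 840), ("November", 818), ("October", 841)],
    "Product2": [("March", 1006), ("February", 1076), ("April", 890),
                 ("May", 831), ("September", 538), ("October", 1053),
                 ("July", 673), ("August", 87), ("November", 994),
                 ("January", 1042), ("December", 952), ("June", 873)],
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_usage "
    "(product_name TEXT, usage_month TEXT, users INTEGER, country TEXT)"
)
conn.executemany(
    "INSERT INTO product_usage VALUES (?, ?, ?, ?)",
    [(p, m, u, COUNTRY) for p, months in USAGE.items() for m, u in months],
)

# Number the rows of each (product, country) group from highest to lowest
# usage, then keep row 2. Window functions need SQLite 3.25 or newer.
SECOND_HIGHEST = """
SELECT product_name, country, usage_month, users
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
               PARTITION BY product_name, country
               ORDER BY users DESC
           ) AS rn
    FROM product_usage AS t
) AS ranked
WHERE rn = 2
ORDER BY product_name
"""
second_highest = conn.execute(SECOND_HIGHEST).fetchall()
for row in second_highest:
    print(row)
```

For the sample data this picks October for both products (841 users for Product1 and 1053 for Product2), one row per product and country.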

For a single country it should be fairly simple. This is off the top of my head, so it may need a bit of tweaking, and the table name is a guess that is probably way off. If the table holds more than one product, add a filter on the product as well.
SELECT users
FROM ProductCounts
WHERE Country = @Country
ORDER BY users DESC
-- skip the highest value and return the next one (SQL Server 2012 or later)
OFFSET 1 ROWS FETCH NEXT 1 ROWS ONLY;
I can't tell from the question how the data gets entered, so I can't suggest a better way to store it for this report.

You can use this; it returns the second highest user count for each country and product. Note that a country/product pair with only one distinct user count will not show up: there have to be at least two distinct user counts per country and product. Counting distinct values means tied months are treated as a single peak.
SELECT
country, product, users
FROM
ProductCounts
WHERE
(SELECT COUNT(DISTINCT p.users) FROM ProductCounts AS p
WHERE
p.country = ProductCounts.country
AND
p.product = ProductCounts.product
AND
p.users >= ProductCounts.users ) = 2
GROUP BY
country, product, users
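How the correlated count behaves when two months tie for the peak depends on whether it counts rows or distinct values. The sketch below, again using SQLite through Python's sqlite3 as a stand-in for SQL Server, runs both variants on made-up rows with a tie at the top. The data and the helper second_highest are hypothetical; ProductCounts follows the answer's guessed table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProductCounts (country TEXT, product TEXT, users INTEGER)"
)
# Made-up data: two months tie for the peak of 900 users.
conn.executemany(
    "INSERT INTO ProductCounts VALUES (?, ?, ?)",
    [("US", "Product1", 900), ("US", "Product1", 900),
     ("US", "Product1", 850), ("US", "Product1", 700)],
)


def second_highest(count_expr):
    """Return rows for which the correlated count of 'at least as high' equals 2."""
    sql = f"""
        SELECT DISTINCT country, product, users
        FROM ProductCounts AS outer_row
        WHERE (SELECT {count_expr}
               FROM ProductCounts AS p
               WHERE p.country = outer_row.country
                 AND p.product = outer_row.product
                 AND p.users >= outer_row.users) = 2
    """
    return conn.execute(sql).fetchall()


# Counting rows: the tied peak itself is the "second" row.
by_rows = second_highest("COUNT(*)")
# Counting distinct values: the tie collapses and 850 is second.
by_values = second_highest("COUNT(DISTINCT p.users)")
print(by_rows, by_values)
```

Which of the two results is the right one depends on how the licensing rule defines "second highest" when peaks tie, so it is worth settling that before choosing a variant.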

Related

How to find IF single rows meet a criteria ELSE aggregate multiple rows within a group

I have some accounting data where I need to select a single row within a group if it meets a dollar-amount criterion; if no single row does, I need to sum multiple rows in that group to see whether the group as a whole meets the criterion. Example data:
Continent      Region  Sales Amount
-----------------------------------
South America  North   $300
South America  South   $100
South America  West    $500
South America  East    $200
North America  North   $100
North America  South   $50
North America  West    $50
North America  East    $400
Europe         North   $100
Europe         South   $200
Europe         West    $100
Europe         East    $100
Asia           North   $75
Asia           South   $100
Asia           West    $100
Asia           East    $100
Africa         North   $500
Africa         South   $700
Africa         West    $100
Africa         East    $100
In the above example, I want to find all continents that have single regions/rows with $500 in sales OR I want to find countries where 2 or more regions can be combined to meet the $500 amount. My expected result would be:
Continent  Region_1                Region_2        Sales Amount_1          Sales Amount_2
-----------------------------------------------------------------------------------------
Canada     West                    not applicable  $500
USA        North,East              not applicable  $500
Europe     North,South,West,East   not applicable  $500
Asia       does not meet criteria  not applicable  does not meet criteria
Africa     South                   North           $700                    $500
Region_2 is only applicable if more than one region within a continent meets the sales amount criteria of $500 on its own.

Is there a function in SQL that automatically generates more rows by month?

I've got a large database that's got all our transactions and shipping costs from them, here's a simplified version:
Source Table
Date      ROUTE      Cost
-------------------------
01/20/21  USA to UK  $40
01/01/21  USA to UK  $40
01/10/21  USA to UK  $40
12/20/20  USA to UK  $30
11/20/20  USA to UK  $20
11/20/20  USA to UK  $20
And I want to see the average cost by month, so it would look like:
Route      Nov 2020  Dec 2020  Jan 2021
---------------------------------------
USA to UK  $20       $30       $40
How do I write code that I can rerun when, say, April comes around and I have to refresh this table, without having to create new columns for Feb, March, etc.?
Here is a possible way of doing it using PIVOT in Snowflake: https://docs.snowflake.com/en/sql-reference/constructs/pivot.html
Assuming "monthname" is a column you extracted from your "Date" column, this may help. It uses avg because you asked for the average cost:
select * from yourTable
pivot(avg(cost) for monthname in ('January', 'February', 'March', 'April'))
order by route;
The values in your monthname column must match the ones listed in the parentheses.
As this is a static solution, you still have to adjust the code every month. Writing a stored procedure that builds the list of months would probably help here: https://docs.snowflake.com/en/sql-reference/stored-procedures.html
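One way to avoid editing the column list every month is to read the months from the data and generate the pivot query from them. The sketch below illustrates that idea with SQLite through Python's sqlite3, not Snowflake, and uses conditional aggregation instead of PIVOT. The table name shipments, the column ship_date, the ISO date format and the helper monthly_average_pivot are all assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE shipments (ship_date TEXT, route TEXT, cost REAL)")
# Rows from the question, with dates rewritten as ISO strings (an assumption).
conn.executemany(
    "INSERT INTO shipments VALUES (?, ?, ?)",
    [("2021-01-20", "USA to UK", 40), ("2021-01-01", "USA to UK", 40),
     ("2021-01-10", "USA to UK", 40), ("2020-12-20", "USA to UK", 30),
     ("2020-11-20", "USA to UK", 20), ("2020-11-20", "USA to UK", 20)],
)

MONTH_EXPR = "strftime('%Y-%m', ship_date)"


def monthly_average_pivot(conn):
    """Build one AVG(CASE ...) column per month present in the data."""
    months = [m for (m,) in conn.execute(
        f"SELECT DISTINCT {MONTH_EXPR} FROM shipments ORDER BY 1"
    )]
    # Month labels come from strftime, so they are safe to use as aliases;
    # the values being compared are still passed as bound parameters.
    select_list = ["route"] + [
        f'AVG(CASE WHEN {MONTH_EXPR} = ? THEN cost END) AS "{m}"'
        for m in months
    ]
    sql = (
        f"SELECT {', '.join(select_list)} "
        "FROM shipments GROUP BY route ORDER BY route"
    )
    cursor = conn.execute(sql, months)
    header = [col[0] for col in cursor.description]
    return header, cursor.fetchall()


header, rows = monthly_average_pivot(conn)
print(header)
print(rows)
```

When rows for a new month are loaded, rerunning the function adds a column for that month without any change to the code. A Snowflake stored procedure could follow the same pattern of building the statement text first and executing it afterwards.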

Joining a Table with Itself with multiple WHERE statements

Long time reader, first time poster.
I'm trying to consolidate a table I have so that it shows the rate of sold goods getting lost in transit. In this table, we have four kinds of products, three countries of origin, three transit countries (where the goods are first shipped before being passed on to customers) and three destination countries. The table is as follows.
Status Product Count Origin Transit Destination
--------------------------------------------------------------------
Delivered Shoes 100 Germany France USA
Delivered Books 50 Germany France USA
Delivered Jackets 75 Germany France USA
Delivered DVDS 30 Germany France USA
Not Delivered Shoes 7 Germany France USA
Not Delivered Books 3 Germany France USA
Not Delivered Jackets 5 Germany France USA
Not Delivered DVDS 1 Germany France USA
Delivered Shoes 300 Poland Netherlands Canada
Delivered Books 80 Poland Netherlands Canada
Delivered Jackets 25 Poland Netherlands Canada
Delivered DVDS 90 Poland Netherlands Canada
Not Delivered Shoes 17 Poland Netherlands Canada
Not Delivered Books 13 Poland Netherlands Canada
Not Delivered Jackets 1 Poland Netherlands Canada
Delivered Shoes 250 Spain Ireland UK
Delivered Books 20 Spain Ireland UK
Delivered Jackets 150 Spain Ireland UK
Delivered DVDS 60 Spain Ireland UK
Not Delivered Shoes 19 Spain Ireland UK
Not Delivered Books 8 Spain Ireland UK
Not Delivered Jackets 8 Spain Ireland UK
Not Delivered DVDS 10 Spain Ireland UK
I would like to create a new table that shows the count of goods delivered and not delivered in one row, like this.
Product Delivered Not_Delivered Origin Transit Destination
Shoes 100 7 Germany France USA
Books 50 3 Germany France USA
Jackets 75 5 Germany France USA
DVDS 30 1 Germany France USA
Shoes 300 17 Poland Netherlands Canada
Books 80 13 Poland Netherlands Canada
Jackets 25 1 Poland Netherlands Canada
DVDS 90 0 Poland Netherlands Canada
Shoes 250 19 Spain Ireland UK
Books 20 8 Spain Ireland UK
Jackets 150 8 Spain Ireland UK
DVDS 60 10 Spain Ireland UK
I've had a look at some other posts and so far I haven't found exactly what I'm looking for. Perhaps the issue here is that there will be multiple WHERE statements in the code to ensure that I don't group all shoes together, or all country groups.
Is this possible with SQL?
Something like this?
select
product
,sum(case when status = 'Delivered' then count else 0 end) as delivered
,sum(case when status = 'Not Delivered' then count else 0 end) as not_delivered
,origin
,transit
,destination
from table
group by
product
,origin
,transit
,destination
This is rather easy; instead of one line per Product, Origin, Transit, Destination and Status, you want one result line per Product, Origin, Transit and Destination only. So group by these four columns and aggregate conditionally:
select
product, origin, transit, destination,
sum(case when status = 'Delivered' then "count" else 0 end) as delivered,
sum(case when status = 'Not Delivered' then "count" else 0 end) as not_delivered
from mytable
group by product, origin, transit, destination;
BTW: It is not a good idea to use a keyword as a column name. I used double quotes to refer to your column count, which is standard SQL, but I don't know if it works in Google BigQuery. Maybe it must be "Count" rather than "count", or something else entirely.
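The conditional aggregation above can be tried on a few rows of the question's data. This is a minimal sketch using SQLite through Python's sqlite3 rather than BigQuery; it keeps the table name mytable from the answer and loads only a subset of the rows, including the Poland DVDS group that has no "Not Delivered" row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "count" is double-quoted because it is also the name of a SQL function.
conn.execute(
    'CREATE TABLE mytable (status TEXT, product TEXT, "count" INTEGER, '
    "origin TEXT, transit TEXT, destination TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?)",
    [("Delivered", "Shoes", 100, "Germany", "France", "USA"),
     ("Not Delivered", "Shoes", 7, "Germany", "France", "USA"),
     ("Delivered", "Shoes", 300, "Poland", "Netherlands", "Canada"),
     ("Not Delivered", "Shoes", 17, "Poland", "Netherlands", "Canada"),
     ("Delivered", "DVDS", 90, "Poland", "Netherlands", "Canada")],
)

# One output row per product and route; each SUM only adds the rows
# whose status matches, and contributes 0 for all other rows.
CONSOLIDATE = """
SELECT product, origin, transit, destination,
       SUM(CASE WHEN status = 'Delivered' THEN "count" ELSE 0 END) AS delivered,
       SUM(CASE WHEN status = 'Not Delivered' THEN "count" ELSE 0 END) AS not_delivered
FROM mytable
GROUP BY product, origin, transit, destination
ORDER BY origin, product
"""
consolidated = conn.execute(CONSOLIDATE).fetchall()
for row in consolidated:
    print(row)
```

Because of the ELSE 0 branch, a product with no "Not Delivered" row still appears with a count of 0, which matches the DVDS row for Poland in the desired output.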
SELECT
product, origin, transit, destination,
SUM([count] * (status = 'Delivered')) AS delivered,
SUM([count] * (status = 'Not Delivered')) AS not_delivered
FROM mytable
GROUP BY 1, 2, 3, 4

Geo maps setup in GoodData

Can anyone help me prepare data for the new Geo Maps feature? I want to show the data below on geo maps.
Country Name Sales
Russia 1244
Canada 3553
Germany 5345
Australia 2456
France 2566
United Kingdom 6743
India 3677
United States 5633
Thanks in advance,
The setup is quite easy, and you can find more information here:
https://developer.gooddata.com/article/setting-up-data-for-geo-charts
Basically, it is about setting the correct data type for the columns that represent geo-information.
JT

Sams Teach Yourself SQL in 10 minutes - Question about GROUP BY

I read the book "Sams Teach Yourself SQL in 10 minutes, Third Edition" and in lesson 10 "Grouping Data", section "Creating Groups", I can't understand the following:
"Aside from the aggregate calculations statements, every column in your SELECT statement must be present in the GROUP BY clause."
Why? I tried this and I think that it is not true.
For example, consider a table 'World' with the columns 'continent', 'country', 'population'.
SELECT continent, country
FROM World
GROUP BY continent;
According to the book, this should lead to an error, right? But it doesn't. I can group my data by continent (so the results contain 7 continents) and next to each continent I get a random country name.
Like this
continent country
North America Canada
South America Brazil
Europe France
Africa Cameroon
Asia Japan
Australia New Zealand
Antarctica TuxLand
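You are most probably using MySQL, which allows ungrouped and unaggregated expressions in the SELECT clause (unless the ONLY_FULL_GROUP_BY SQL mode is enabled, which is the default since MySQL 5.7.5).
For a query like yours this is a violation of the standard, of course.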
This is intended to simplify GROUP BY with joins on a PRIMARY KEY:
SELECT a.*, SUM(b.value)
FROM a
JOIN b
ON b.a_id = a.id
GROUP BY
a.id
Normally, you would have either to add all columns from a into the GROUP BY clause or use a subquery.
MySQL allows you not to do it since all values from a are guaranteed to be the same for a given value of the PRIMARY KEY (which is grouped on).
Your query is accepted and produces no error in some SQL dialects, such as MySQL. You may use GROUP BY on more than one column, but those dialects do not require it.
With GROUP BY on continent alone, each group returns a single country/continent pair. Which country you get is not guaranteed, though in practice it is often the first one encountered.
MySQL allows grouping by a single field like this. PostgreSQL rejects the query unless every ungrouped column is functionally dependent on the grouped columns, for example when you group by the table's primary key.
The tutorial probably assumes you should use GROUP BY on all selected fields so that you don't lose any data: it would show every country/continent pair in the above example, but only once.
Here's an example table:
Continent | Country | Random_Field
---------------------------------------------
North America Canada Cake
North America Canada Dog
South America Brazil Cat
Europe France Frog
Africa Cameroon House
Asia Japan Gadget
Asia India Dance
Australia New Zealand Frodo
Antarctica TuxLand Linux
In your first statement:
SELECT continent, country
FROM World
GROUP BY continent;
The output would be:
Continent | Country
--------------------------
North America Canada
South America Brazil
Europe France
Africa Cameroon
Asia Japan
Australia New Zealand
Antarctica TuxLand
Notice one of the Asia rows was lost, despite being different.
Using a GROUP BY on both:
SELECT continent, country
FROM World
GROUP BY continent, country;
Would yield:
Continent | Country
-----------------------------
North America Canada
South America Brazil
Europe France
Africa Cameroon
Asia Japan
Asia India
Australia New Zealand
Antarctica TuxLand
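The two portable options, grouping by every selected column or wrapping the extra column in an aggregate, can be compared on the example table. The sketch below uses SQLite through Python's sqlite3 as a stand-in; the column name random_field and the choice of COUNT and MIN as aggregates are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE World (continent TEXT, country TEXT, random_field TEXT)"
)
conn.executemany(
    "INSERT INTO World VALUES (?, ?, ?)",
    [("North America", "Canada", "Cake"), ("North America", "Canada", "Dog"),
     ("South America", "Brazil", "Cat"), ("Europe", "France", "Frog"),
     ("Africa", "Cameroon", "House"), ("Asia", "Japan", "Gadget"),
     ("Asia", "India", "Dance"), ("Australia", "New Zealand", "Frodo"),
     ("Antarctica", "TuxLand", "Linux")],
)

# Option 1: every selected column appears in GROUP BY, so each distinct
# continent/country pair is returned exactly once.
pairs = conn.execute(
    "SELECT continent, country FROM World "
    "GROUP BY continent, country ORDER BY continent, country"
).fetchall()

# Option 2: group by continent only and aggregate the country column,
# so the value returned for each group is well defined.
summary = conn.execute(
    "SELECT continent, COUNT(DISTINCT country), MIN(country) FROM World "
    "GROUP BY continent ORDER BY continent"
).fetchall()

print(len(pairs), "pairs;", len(summary), "continents")
for row in summary:
    print(row)
```

Both queries follow the rule quoted from the book, so they should behave the same way in any dialect, whereas the result of selecting a bare country column next to GROUP BY continent depends on the database.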