Using SQL DENSE_RANK to determine duplicates - sql

Here is an example of the data. I am trying to find all the orders with the same quantity, ignoring the OrderID column:
Product Location Customer OrderID Quantity
Eggs Chicago XYZ 2011 10
Eggs Chicago XYZ 2012 10
Eggs Chicago XYZ 2013 15
So I used the DENSE_RANK function in my SQL:
Select Product,Location,Customer,OrderID,Quantity,
Ranking = DENSE_RANK() OVER (PARTITION BY Product,Location,Customer,Quantity
ORDER BY OrderID ASC)
FROM MyTable
to get the data below
Product Location Customer OrderID Quantity Ranking
Eggs Chicago XYZ 2011 10 1
Eggs Chicago XYZ 2012 10 2
Eggs Chicago XYZ 2013 15 1
So, based on the ranking, I was able to filter out the records that have the same quantity across different OrderIDs and treat them as one.
So far so good. However, another odd requirement is that this form of aggregation should be done only up to the first change in quantity. For example, suppose the data above looks like this instead:
Product Location Customer OrderID Quantity
Eggs Chicago XYZ 2011 10
Eggs Chicago XYZ 2012 10
Eggs Chicago XYZ 2013 15
Eggs Chicago XYZ 2014 15
Eggs Chicago XYZ 2015 15
The same SQL would produce this result:
Product Location Customer OrderID Quantity Ranking
Eggs Chicago XYZ 2011 10 1
Eggs Chicago XYZ 2012 10 2
Eggs Chicago XYZ 2013 15 1
Eggs Chicago XYZ 2014 15 2
Eggs Chicago XYZ 2015 15 3
But, I would need the result to be
Product Location Customer OrderID Quantity Ranking
Eggs Chicago XYZ 2011 10 1
Eggs Chicago XYZ 2012 10 2
Eggs Chicago XYZ 2013 15 1
Eggs Chicago XYZ 2014 15 1
Eggs Chicago XYZ 2015 15 1
Please note that the ranking remains 1 for all the records after the first change in quantity.
Is it possible to tweak my SQL to get the above behavior?
Thanks for any suggestions.

If I understand you correctly, you want to use DENSE_RANK() to eliminate duplicate rows in your data.
It seems you’ve already solved your problem. If you want to eliminate the duplicates, use the same SQL code you have above and discard any rows with Ranking > 1. This will leave you with one row for each distinct combination of the PARTITION BY columns (Product, Location, Customer, Quantity).

This feels a bit dirty but I think it's correct:
SELECT
    Product,
    Location,
    Customer,
    OrderID,
    Quantity,
    DENSE_RANK()
        OVER (PARTITION BY
                  Product,
                  Location,
                  Customer,
                  Quantity
              ORDER BY
                  CASE WHEN
                      Quantity = (SELECT MIN(Quantity) FROM Orders) THEN OrderID
                  ELSE 0 END ASC
        ) AS Ranking
FROM
    Orders
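The query above relies on the first quantity also being the smallest one in the whole table. Below is a minimal sketch of a variation that instead compares each row against the quantity of the earliest order for the same Product, Location and Customer. It runs the SQL through Python's sqlite3 module purely so that it can be executed as-is (it needs SQLite 3.25 or later for window functions). The CTE name tagged and the column FirstQuantity are made up for this illustration, and reading "first change" as "differs from the earliest order's quantity" is an assumption.

```python
import sqlite3

# In-memory table holding the five sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders (Product TEXT, Location TEXT, Customer TEXT, "
    "OrderID INTEGER, Quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?, ?, ?)",
    [
        ("Eggs", "Chicago", "XYZ", 2011, 10),
        ("Eggs", "Chicago", "XYZ", 2012, 10),
        ("Eggs", "Chicago", "XYZ", 2013, 15),
        ("Eggs", "Chicago", "XYZ", 2014, 15),
        ("Eggs", "Chicago", "XYZ", 2015, 15),
    ],
)

query = """
WITH tagged AS (
    -- FirstQuantity (hypothetical name): quantity of the earliest order
    -- for each Product/Location/Customer combination.
    SELECT *,
           FIRST_VALUE(Quantity) OVER (
               PARTITION BY Product, Location, Customer
               ORDER BY OrderID
           ) AS FirstQuantity
    FROM Orders
)
SELECT Product, Location, Customer, OrderID, Quantity,
       DENSE_RANK() OVER (
           PARTITION BY Product, Location, Customer, Quantity
           -- Rows that still have the first quantity are ranked by OrderID;
           -- all other rows share the sort key 0 and therefore rank 1.
           ORDER BY CASE WHEN Quantity = FirstQuantity THEN OrderID ELSE 0 END
       ) AS Ranking
FROM tagged
ORDER BY OrderID
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With the sample data, the two orders of quantity 10 get rankings 1 and 2, and the three orders of quantity 15 all get ranking 1.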

Related

Postgres rank() without duplicates

I'm ranking race data for a series of cycling events. Racers win various amounts of points for their position in races. I want to retain the discrete event scoring, but also rank the racer in the series. For example, considering a sub-query that returns this:
License # | Rider Name | Total Points | Race Points | Race ID
123 | Joe | 25 | 5 | 567
123 | Joe | 25 | 12 | 234
123 | Joe | 25 | 8 | 987
456 | Ahmed | 20 | 12 | 567
456 | Ahmed | 20 | 8 | 234
You can see Joe has 25 points, as he won 5, 12, and 8 points in three races. Ahmed has 20 points, as he won 12 and 8 points in two races.
Now for the ranking, what I'd like is:
Place | License # | Rider Name | Total Points | Race Points | Race ID
1 | 123 | Joe | 25 | 5 | 567
1 | 123 | Joe | 25 | 12 | 234
1 | 123 | Joe | 25 | 8 | 987
2 | 456 | Ahmed | 20 | 12 | 567
2 | 456 | Ahmed | 20 | 8 | 234
But if I use rank() and order by "Total Points", I get:
Place | License # | Rider Name | Total Points | Race Points | Race ID
1 | 123 | Joe | 25 | 5 | 567
1 | 123 | Joe | 25 | 12 | 234
1 | 123 | Joe | 25 | 8 | 987
4 | 456 | Ahmed | 20 | 12 | 567
4 | 456 | Ahmed | 20 | 8 | 234
Which makes sense, since there are three "ties" at 25 points.
dense_rank() solves this problem, but if there are legitimate ties across different racers, I want there to be gaps in the rank (e.g., if Joe and Ahmed both had 25 points, the next racer would be in third place, not second).
The easiest way to solve this I think would be to issue two queries, one with the "duplicate" racers eliminated, and then a second one where I can retain the individual race data, which I need for the points break down display.
I can also probably, given enough effort, think of a way to do this in a single query, but I'm wondering if I'm not just missing something really obvious that could accomplish this in a single, relatively simple query.
Any suggestions?
You have to break this into steps to get what you want, but that can be done in a single query with common table expressions:
with riders as ( -- get individual riders
select distinct license, rider, total_points
from racists
), places as ( -- calculate non-dense rankings
select license, rider, rank() over (order by total_points desc) as place
from riders
)
select p.place, r.* -- join rankings into main table
from places p
join racists r on (r.license, r.rider) = (p.license, p.rider);
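To check the approach against the sample data from the question, here is a small sketch that runs the same three steps through Python's sqlite3 module (SQLite 3.25 or later is needed for window functions). The table name results and its column names are stand-ins chosen for this illustration, since the question only describes a sub-query, and the join uses the license number alone on the assumption that it identifies a rider.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "results" is a hypothetical stand-in for the sub-query in the question.
conn.execute(
    "CREATE TABLE results (license INTEGER, rider TEXT, total_points INTEGER, "
    "race_points INTEGER, race_id INTEGER)"
)
conn.executemany(
    "INSERT INTO results VALUES (?, ?, ?, ?, ?)",
    [
        (123, "Joe", 25, 5, 567),
        (123, "Joe", 25, 12, 234),
        (123, "Joe", 25, 8, 987),
        (456, "Ahmed", 20, 12, 567),
        (456, "Ahmed", 20, 8, 234),
    ],
)

query = """
WITH riders AS (            -- one row per rider
    SELECT DISTINCT license, rider, total_points
    FROM results
), places AS (              -- RANK() leaves gaps only for real ties between riders
    SELECT license, RANK() OVER (ORDER BY total_points DESC) AS place
    FROM riders
)
SELECT p.place, r.license, r.rider, r.total_points, r.race_points, r.race_id
FROM places p
JOIN results r ON r.license = p.license   -- attach the place to every race row
ORDER BY p.place, r.race_id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because the ranking is computed over one row per rider, Joe's three race rows no longer count as three ties, so Ahmed is placed second rather than fourth.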

Adding rows in a table from data that is not in a column

I'm trying to create a table to add all Medals won by the participant countries in the Olympics.
I scraped the data from Wikipedia and have something similar to this:
Year | Country_Name | Host_city | Host_Country | Gold | Silver | Bronze
1986 | 146 | Los Angeles | United States | 41 | 32 | 30
1986 | 67 | Los Angeles | United States | 12 | 12 | 12
And so on
I double-checked the data for some years, and it seems very accurate. The Country_Name has an ID because I have a Country_ID table that I created and updated the names with the ID:
Country_ID | Country_Name
1986 | 1
1986 | 2
So far so good. Now I want to create a new table where I'll have all countries in a specific year and the total medals for that country. I managed to easily do that for countries that participated in an edition, here's an example for the 1896 edition:
INSERT INTO Cumultative_Medals_by_Year(Country_ID, Year, Culmutative_Gold, Culmutative_Silver, Culmutative_Bronze, Total_Medals)
SELECT a.Country_Name, a.Year, SUM(a.Gold) As Cumultative_Gold, SUM(a.Silver) As Cumultative_Silver, SUM(a.Bronze) As Cumultative_Bronze, SUM(a.Gold) + SUM(a.Silver) + SUM(a.Bronze) AS Total_Medals
FROM Country_Medals a
Where a.Year >= 1896 AND Year < 1900
Group By a.Country_Name, a.Year
And I'll have this table:
Country_ID | Year | Cumultative_Gold | Cumultative_Silver | Cumultative_Bronze | Total_Medals
6 | 1896 | 2 | 0 | 0 | 5
7 | 1896 | 2 | 1 | 2 | 5
35 | 1896 | 1 | 2 | 3 | 6
46 | 1896 | 5 | 4 | 2 | 11
49 | 1896 | 6 | 5 | 2 | 13
51 | 1896 | 2 | 3 | 2 | 7
52 | 1896 | 10 | 18 | 19 | 47
58 | 1896 | 2 | 1 | 3 | 6
85 | 1896 | 1 | 0 | 1 | 2
131 | 1896 | 1 | 2 | 0 | 3
146 | 1896 | 11 | 7 | 2 | 20
To add the other editions I just have to edit the dates, "Where a.Year >= 1900 AND Year < 1904", for example.
INSERT INTO Cumultative_Medals_by_Year(Country_ID, Year, Culmutative_Gold, Culmutative_Silver, Culmutative_Bronze, Total_Medals)
SELECT a.Country_Name, a.Year, SUM(a.Gold) As Cumultative_Gold, SUM(a.Silver) As Cumultative_Silver, SUM(a.Bronze) As Cumultative_Bronze, SUM(a.Gold) + SUM(a.Silver) + SUM(a.Bronze) AS Total_Medals
FROM Country_Medals a
Where a.Year >= 1900 AND Year < 1904
Group By a.Country_Name, a.Year
And the table will grow.
But I'd like to also add all the other countries for the year 1896. This way I'll have a full record of all countries. So for example, you see that Country 1 has no medals in the 1896 Olympic edition, but I'd like to also add it there, even if the sum becomes NULL (where I'll update with a 0).
Why do I want that? I'd like to do an Animated Bar Chart Race, and with the data I have, some countries "go away" from the race. For example, the US didn't participate in the 1980 Olympics, so for a brief moment the bar for the US disappears from the chart, only to return in 1984 (when it participated again). Another example is the Soviet Union: even though it no longer participates, it is the participant with the second most medals won (only behind the US), but because the country has no participation after 1988, its bar just goes away after that year. Keeping a record of medals for all countries in all editions would prevent that from happening.
I'm pretty sure there are lots of countries that have won medals that were not around in 1896. But if you want a row for every country and every year, then generate the rows you want using cross join. Then join in the available information:
select c.Country_Name, y.Year,
SUM(cm.Gold) As Cumulative_Gold,
SUM(cm.Silver) As Cumulative_Silver,
SUM(cm.Bronze) As Cumulative_Bronze,
COALESCE(SUM(cm.Gold), 0) + COALESCE(SUM(cm.Silver), 0) + COALESCE(SUM(cm.Bronze), 0) AS Total_Medals
from (select distinct year from Country_Medals) y cross join
(select distinct country_name from country_medals) c left join
country_medals cm
on cm.year = y.year and
cm.country_name = c.country_name
group By c.Country_Name, y.Year
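As a rough illustration of how the cross join fills in the missing country/year combinations, the sketch below runs a similar query through Python's sqlite3 module (SQLite 3.25 or later for the window function). The two medal rows are illustrative only, and the per_year CTE and the Running_Total column are additions made for this sketch: the running total is one possible way to keep a country's bar in the chart for editions in which it won nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Country_Medals (Year INTEGER, Country_Name INTEGER, "
    "Gold INTEGER, Silver INTEGER, Bronze INTEGER)"
)
# Illustrative rows only: each country has medals in one edition, not the other.
conn.executemany(
    "INSERT INTO Country_Medals VALUES (?, ?, ?, ?, ?)",
    [
        (1896, 146, 11, 7, 2),
        (1900, 1, 3, 2, 1),
    ],
)

query = """
WITH per_year AS (
    -- Every (country, year) pair, with 0 where no medal rows exist.
    SELECT c.Country_Name, y.Year,
           COALESCE(SUM(cm.Gold), 0)
             + COALESCE(SUM(cm.Silver), 0)
             + COALESCE(SUM(cm.Bronze), 0) AS Total_Medals
    FROM (SELECT DISTINCT Year FROM Country_Medals) y
    CROSS JOIN (SELECT DISTINCT Country_Name FROM Country_Medals) c
    LEFT JOIN Country_Medals cm
           ON cm.Year = y.Year AND cm.Country_Name = c.Country_Name
    GROUP BY c.Country_Name, y.Year
)
SELECT Country_Name, Year, Total_Medals,
       -- Running total per country across editions (hypothetical extra column).
       SUM(Total_Medals) OVER (PARTITION BY Country_Name ORDER BY Year) AS Running_Total
FROM per_year
ORDER BY Country_Name, Year
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each country now has a row for both editions, and country 146 keeps its running total of 20 in 1900 even though it has no medal rows for that year in this sample.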

How to assign equal revenue weight to every location of a company in a table? Google Big Query

I am working on a problem where I have the following table:
company_id | country | total revenue
1 | Russia | 1200
2 | Croatia | 1200
2 | Italy | 1200
3 | USA | 1200
3 | UK | 1200
3 | Italy | 1200
There are 3 companies in this table, but company '2' and company '3' have offices in 2 and 3 countries respectively. All companies pay 1200 per month, and because company 2 has 2 offices it shows as if they paid 1200 per month 2 times, and because company 3 has 3 offices it shows as if it paid 1200 per month 3 times. Instead, I would like revenue to be equally distributed based on how many times company_id appears in the table. company_id will only appear more than once for every additional country in which a company is based.
Assuming each company always pays 1,200 per month, my desired output is:
company_id | country | total revenue
1 | Russia | 1200
2 | Croatia | 600
2 | Italy | 600
3 | USA | 400
3 | UK | 400
3 | Italy | 400
Being new to SQL, I thought this could perhaps be done with a CASE WHEN expression, but I have only used CASE WHEN to output a string depending on a condition. Here, I am trying to assign an equal share of revenue to each of a company's countries, depending on how many countries the company is based in.
Thank you in advance for your help!
Below is for BigQuery Standard SQL
#standardSQL
SELECT company_id, country,
total_revenue / (COUNT(1) OVER(PARTITION BY company_id)) AS total_revenue
FROM `project.dataset.table`
Applied to the sample data from your question, the output is:
Row company_id country total_revenue
1 1 Russia 1200.0
2 2 Croatia 600.0
3 2 Italy 600.0
4 3 USA 400.0
5 3 UK 400.0
6 3 Italy 400.0
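The same windowed count can be tried locally without a BigQuery project. The sketch below is an approximation that runs an equivalent query in SQLite (3.25 or later) through Python's sqlite3 module. The table name revenue is made up for this example, and the multiplication by 1.0 is only there because SQLite would otherwise perform integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "revenue" is a hypothetical local stand-in for `project.dataset.table`.
conn.execute(
    "CREATE TABLE revenue (company_id INTEGER, country TEXT, total_revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO revenue VALUES (?, ?, ?)",
    [
        (1, "Russia", 1200),
        (2, "Croatia", 1200),
        (2, "Italy", 1200),
        (3, "USA", 1200),
        (3, "UK", 1200),
        (3, "Italy", 1200),
    ],
)

query = """
SELECT company_id, country,
       -- Divide by the number of rows (countries) that share the company_id.
       total_revenue * 1.0 / COUNT(*) OVER (PARTITION BY company_id) AS total_revenue
FROM revenue
ORDER BY company_id, country
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

COUNT(*) OVER (PARTITION BY company_id) attaches the per-company row count to every row without collapsing the rows, which is why no GROUP BY or CASE WHEN is needed.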

Include "0" results in COUNT(*) aggregate

Good morning. I have searched the forum for an answer to my question, but the results I found didn't give me a solution.
I have two tables.
CARS:
Id Model
1 Seat
2 Audi
3 Mercedes
4 Ford
BREAKDOWNS:
IdBd Description Date Price IdCar
1 Engine 01/01/2020 500 € 3
2 Battery 05/01/2020 0 € 1
3 Wheel's change 10/02/2020 110,25 € 4
4 Electronic system 15/03/2020 100 € 2
5 Brake failure 20/05/2020 0 € 4
6 Engine 25/05/2020 400 € 1
I want to write a query that shows, for each month, the number of breakdowns that cost 0 €.
I have this query:
SELECT Year(breakdowns.[Date]) AS YEAR, StrConv(MonthName(Month(breakdowns.[Date])),3) AS MONTH, Count(*) AS [BREAKDOWNS]
FROM cars LEFT JOIN breakdowns ON (cars.Id = breakdowns.IdCar AND breakdowns.[Price]=0)
GROUP BY breakdowns.[Price], Year(breakdowns.[Date]), Month(breakdowns.[Date]), MonthName(Month(breakdowns.[Date]))
HAVING ((Year([breakdowns].[Date]))=[Insert a year:])
ORDER BY Year(breakdowns.[Date]), Month(breakdowns.[Date]);
And the result is (if I put year '2020'):
YEAR MONTH BREAKDOWNS
2020 January 1
2020 May 1
And I want:
YEAR MONTH BREAKDOWNS
2020 January 1
2020 February 0
2020 March 0
2020 May 1
Thanks!
The HAVING condition should be in WHERE (otherwise it changes the Outer to an Inner join). But as long as you don't use columns from cars there's no need to join it.
To get rows for months that have no zero-price breakdowns, you should switch to conditional aggregation. Access doesn't support the Standard SQL CASE expression, so use IIf instead, and don't group by Price, otherwise each month is split into one row per price:
SELECT Year(breakdowns.[Date]) AS YEAR,
StrConv(MonthName(Month(breakdowns.[Date])),3) AS MONTH,
Sum(IIf(breakdowns.[Price]=0, 1, 0)) AS [BREAKDOWNS]
FROM breakdowns
INNER JOIN cars
ON (cars.Id = breakdowns.IdCar)
WHERE ((Year([breakdowns].[Date]))=[Insert a year:])
GROUP BY Year(breakdowns.[Date]), Month(breakdowns.[Date]), MonthName(Month(breakdowns.[Date]))
ORDER BY Year(breakdowns.[Date]), Month(breakdowns.[Date]);
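Conditional aggregation is easy to check outside Access. The sketch below is not Access SQL: it runs the same idea in SQLite through Python's sqlite3 module, using the standard CASE expression and strftime in place of Year/Month. Storing the dates as ISO strings and the prices as plain numbers is an assumption made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE breakdowns (IdBd INTEGER, Description TEXT, "Date" TEXT, '
    "Price REAL, IdCar INTEGER)"
)
# Sample rows from the question, with dates rewritten as ISO strings (assumption).
conn.executemany(
    "INSERT INTO breakdowns VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Engine", "2020-01-01", 500, 3),
        (2, "Battery", "2020-01-05", 0, 1),
        (3, "Wheel's change", "2020-02-10", 110.25, 4),
        (4, "Electronic system", "2020-03-15", 100, 2),
        (5, "Brake failure", "2020-05-20", 0, 4),
        (6, "Engine", "2020-05-25", 400, 1),
    ],
)

query = """
SELECT strftime('%Y', "Date") AS Year,
       strftime('%m', "Date") AS Month,
       -- Count only the zero-price rows, but keep every month that has any row.
       SUM(CASE WHEN Price = 0 THEN 1 ELSE 0 END) AS Breakdowns
FROM breakdowns
WHERE strftime('%Y', "Date") = ?
GROUP BY strftime('%Y', "Date"), strftime('%m', "Date")
ORDER BY strftime('%Y', "Date"), strftime('%m', "Date")
"""

rows = conn.execute(query, ("2020",)).fetchall()
for row in rows:
    print(row)
```

Because the price test sits inside the aggregate rather than in the join or WHERE clause, February and March stay in the result with a count of 0. A month with no breakdown rows at all, such as April, still does not appear.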

Joining tables with different column name but same value in sqlite

I'm using SQLite to work with my database
I have two different tables, with key columns that have different names but the same value.
As such:
shoes
Identification | Name | Shoe size
1 Bob 10
2 John 12
payment
PaymentID | Price | Year
1 20 2013
2 38 2015
I need
Identification(or PaymentID, no matter) | Name | Shoe size | Price | Year
1 Bob 10 20 2013
2 John 12 38 2015
I've been searching and trying to understand the tutorials, to no avail. I guess I'm just too stupid.
select s.identification, s.name, s.`shoe size`, p.price, p.year
from shoes s
join payment p on p.paymentid = s.identification
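For completeness, here is a small sketch that builds the two sample tables from the question and runs an equivalent join through Python's sqlite3 module. The column types are assumptions, and double quotes are used for the column name that contains a space.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column types are assumed; the question only shows the values.
conn.execute('CREATE TABLE shoes (Identification INTEGER, Name TEXT, "Shoe size" INTEGER)')
conn.execute("CREATE TABLE payment (PaymentID INTEGER, Price INTEGER, Year INTEGER)")
conn.executemany("INSERT INTO shoes VALUES (?, ?, ?)", [(1, "Bob", 10), (2, "John", 12)])
conn.executemany("INSERT INTO payment VALUES (?, ?, ?)", [(1, 20, 2013), (2, 38, 2015)])

query = """
SELECT s.Identification, s.Name, s."Shoe size", p.Price, p.Year
FROM shoes s
JOIN payment p ON p.PaymentID = s.Identification  -- key names differ, values match
ORDER BY s.Identification
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The join condition only compares values, so it does not matter that the key columns have different names in the two tables.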