Match every record only once in a joined table - SQL

I have two tables: the first, inv, contains invoice records; the second, pay, contains payments. I want to match the payments to the invoices in the inv table by inv_amount and inv_date. There might be more than one invoice with the same amount on the same day, and also more than one payment of the same amount on the same day.
Each payment should be matched with the first matching invoice, and every payment must be matched only once.
This is my data:
Table inv
 inv_id | inv_amount |  inv_date  | inv_number
--------+------------+------------+------------
      1 |         10 | 2018-01-01 |          1
      2 |         16 | 2018-01-01 |          1
      3 |         12 | 2018-02-02 |          2
      4 |         14 | 2018-02-03 |          3
      5 |         19 | 2018-02-04 |          3
      6 |         19 | 2018-02-04 |          5
      7 |          5 | 2018-02-04 |          6
      8 |         40 | 2018-02-04 |          7
      9 |         19 | 2018-02-04 |          8
     10 |         19 | 2018-02-05 |          9
     11 |         20 | 2018-02-05 |         10
     12 |         20 | 2018-02-07 |         11
Table pay
 pay_id | pay_amount |  pay_date
--------+------------+------------
      1 |         10 | 2018-01-01
      2 |         12 | 2018-02-02
      4 |         19 | 2018-02-04
      3 |         14 | 2018-02-03
      5 |          5 | 2018-02-04
      6 |         19 | 2018-02-04
      7 |         19 | 2018-02-05
      8 |         20 | 2018-02-07
My Query:
SELECT DISTINCT ON (inv.inv_id)
       inv.inv_id,
       inv.inv_amount,
       inv.inv_date,
       inv.inv_number,
       pay.pay_id
FROM "2016".pay
RIGHT JOIN "2016".inv
    ON pay.pay_amount = inv.inv_amount
   AND pay.pay_date = inv.inv_date
ORDER BY inv.inv_id
resulting in:
 inv_id | inv_amount |  inv_date  | inv_number | pay_id
--------+------------+------------+------------+--------
      1 |         10 | 2018-01-01 |          1 |      1
      2 |         16 | 2018-01-01 |          1 |
      3 |         12 | 2018-02-02 |          2 |      2
      4 |         14 | 2018-02-03 |          3 |      3
      5 |         19 | 2018-02-04 |          3 |      4
      6 |         19 | 2018-02-04 |          5 |      4
      7 |          5 | 2018-02-04 |          6 |      5
      8 |         40 | 2018-02-04 |          7 |
      9 |         19 | 2018-02-04 |          8 |      6
     10 |         19 | 2018-02-05 |          9 |      7
     11 |         20 | 2018-02-05 |         10 |
     12 |         20 | 2018-02-07 |         11 |      8
The record inv_id = 6 should not match pay_id = 4, for that would mean that payment 4 was entered twice.
Desired result:
 inv_id | inv_amount |  inv_date  | inv_number | pay_id
--------+------------+------------+------------+--------
      1 |         10 | 2018-01-01 |          1 |      1
      2 |         16 | 2018-01-01 |          1 |
      3 |         12 | 2018-02-02 |          2 |      2
      4 |         14 | 2018-02-03 |          3 |      3
      5 |         19 | 2018-02-04 |          3 |      4
      6 |         19 | 2018-02-04 |          5 |        <- should be empty
      7 |          5 | 2018-02-04 |          6 |      5
      8 |         40 | 2018-02-04 |          7 |
      9 |         19 | 2018-02-04 |          8 |      6
     10 |         19 | 2018-02-05 |          9 |      7
     11 |         20 | 2018-02-05 |         10 |
     12 |         20 | 2018-02-07 |         11 |      8
Disclaimer: Yes, I asked this question yesterday with the original data, but someone pointed out that my SQL was very hard to read. I have therefore tried to create a cleaner representation of my problem.
For convenience, here's an SQL Fiddle to test: http://sqlfiddle.com/#!17/018d7/1
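If you'd rather test locally than on the fiddle, here is a minimal setup sketch mirroring the data above (the "2016" schema qualifier is omitted for brevity):

-- test tables matching the sample data
CREATE TABLE inv (inv_id int, inv_amount int, inv_date date, inv_number int);
CREATE TABLE pay (pay_id int, pay_amount int, pay_date date);

INSERT INTO inv VALUES
  (1, 10, '2018-01-01', 1),  (2, 16, '2018-01-01', 1),
  (3, 12, '2018-02-02', 2),  (4, 14, '2018-02-03', 3),
  (5, 19, '2018-02-04', 3),  (6, 19, '2018-02-04', 5),
  (7,  5, '2018-02-04', 6),  (8, 40, '2018-02-04', 7),
  (9, 19, '2018-02-04', 8),  (10, 19, '2018-02-05', 9),
  (11, 20, '2018-02-05', 10), (12, 20, '2018-02-07', 11);

INSERT INTO pay VALUES
  (1, 10, '2018-01-01'), (2, 12, '2018-02-02'),
  (4, 19, '2018-02-04'), (3, 14, '2018-02-03'),
  (5,  5, '2018-02-04'), (6, 19, '2018-02-04'),
  (7, 19, '2018-02-05'), (8, 20, '2018-02-07');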

After seeing the example I think I've got the query for you (written against the inv and pay tables from the fiddle):
WITH pay_cte AS (
    SELECT
        pay_id,
        pay_amount,
        pay_date,
        ROW_NUMBER() OVER (PARTITION BY pay_amount, pay_date ORDER BY pay_id) AS pay_row
    FROM pay
), inv_cte AS (
    SELECT
        inv_id,
        inv_amount,
        inv_date,
        inv_number,
        ROW_NUMBER() OVER (PARTITION BY inv_amount, inv_date ORDER BY inv_id) AS inv_row
    FROM inv
)
SELECT inv_id, inv_amount, inv_date, inv_number, pay_id
FROM inv_cte
LEFT JOIN pay_cte
    ON pay_amount = inv_amount
   AND pay_date = inv_date
   AND pay_row = inv_row
ORDER BY inv_id, pay_id;
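Why this works: ROW_NUMBER() numbers the invoices and the payments 1..n independently within each (amount, date) group, so joining on the row number pairs each payment with at most one invoice and leaves the surplus invoices unmatched. A quick sanity check (a sketch wrapping the same pairing logic) can confirm that no payment is ever used twice:

WITH matched AS (
    -- the matching query from above, reduced to the two ids
    SELECT i.inv_id, p.pay_id
    FROM (SELECT inv_id, inv_amount, inv_date,
                 ROW_NUMBER() OVER (PARTITION BY inv_amount, inv_date ORDER BY inv_id) AS rn
          FROM inv) i
    LEFT JOIN (SELECT pay_id, pay_amount, pay_date,
                      ROW_NUMBER() OVER (PARTITION BY pay_amount, pay_date ORDER BY pay_id) AS rn
               FROM pay) p
      ON  p.pay_amount = i.inv_amount
      AND p.pay_date   = i.inv_date
      AND p.rn         = i.rn
)
SELECT pay_id, COUNT(*) AS times_used
FROM matched
WHERE pay_id IS NOT NULL
GROUP BY pay_id
HAVING COUNT(*) > 1;  -- an empty result means every payment was matched at most once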

Related

How to lag return only previous month grouping from a set of date values?

Hi, I have a table like this:
+------------+-------+-------------+
| date_key   | month | customer_id |
+------------+-------+-------------+
| 2022-01-01 | 1     | 1           |
| 2022-01-23 | 1     | 1           |
| 2022-02-02 | 2     | 1           |
| 2022-02-15 | 2     | 1           |
| 2022-02-16 | 2     | 1           |
| 2022-02-18 | 2     | 1           |
| 2022-02-16 | 2     | 1           |
| 2022-05-18 | 5     | 1           |
| 2022-06-11 | 6     | 1           |
| 2022-06-12 | 6     | 1           |
| 2022-06-13 | 6     | 1           |
| 2022-06-15 | 6     | 1           |
+------------+-------+-------------+
and I want to add a column containing the last previous month, like this:
+------------+-------+-------------+-----------+
| date_key   | month | customer_id | lastMonth |
+------------+-------+-------------+-----------+
| 2022-01-01 | 1     | 1           |           |
| 2022-01-23 | 1     | 1           |           |
| 2022-02-02 | 2     | 1           | 1         |
| 2022-02-15 | 2     | 1           | 1         |
| 2022-02-16 | 2     | 1           | 1         |
| 2022-02-18 | 2     | 1           | 1         |
| 2022-02-16 | 2     | 1           | 1         |
| 2022-05-18 | 5     | 1           | 2         |
| 2022-06-11 | 6     | 1           | 5         |
| 2022-06-12 | 6     | 1           | 5         |
| 2022-06-13 | 6     | 1           | 5         |
| 2022-06-15 | 6     | 1           | 5         |
+------------+-------+-------------+-----------+
I tried using
select *,
lag(month, 1) over (partition by customer_id order by month) lastMonth
from sample_table
However, this does not give the result I need.
Please do help.
Try this one. The ordering key EXTRACT(YEAR FROM date_key) * 12 + month maps each month to a single number, and the RANGE frame ending at 1 PRECEDING excludes every row of the current month, so LAST_VALUE picks up the latest earlier month:
SELECT *,
LAST_VALUE(month) OVER (
PARTITION BY customer_id
ORDER BY EXTRACT(YEAR FROM date_key) * 12 + month
RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
) AS lastMonth
FROM sample_table
ORDER BY date_key;
Query results:
+-----+------------+-------+-------------+-----------+
| Row | date_key | month | customer_id | lastMonth |
+-----+------------+-------+-------------+-----------+
| 1 | 2022-01-01 | 1 | 1 | null |
| 2 | 2022-01-23 | 1 | 1 | null |
| 3 | 2022-02-02 | 2 | 1 | 1 |
| 4 | 2022-02-15 | 2 | 1 | 1 |
| 5 | 2022-02-16 | 2 | 1 | 1 |
| 6 | 2022-02-16 | 2 | 1 | 1 |
| 7 | 2022-02-18 | 2 | 1 | 1 |
| 8 | 2022-05-18 | 5 | 1 | 2 |
| 9 | 2022-06-11 | 6 | 1 | 5 |
| 10 | 2022-06-12 | 6 | 1 | 5 |
| 11 | 2022-06-13 | 6 | 1 | 5 |
| 12 | 2022-06-15 | 6 | 1 | 5 |
+-----+------------+-------+-------------+-----------+
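For comparison, the LAG() route attempted in the question can also work, but LAG has to run over the distinct months rather than over the detail rows; otherwise it keeps returning the current month for repeated rows. A sketch (assuming the same sample_table, and that month values don't repeat across years, as in the sample):

WITH months AS (
    -- one row per (customer, month)
    SELECT DISTINCT customer_id, month
    FROM sample_table
), lagged AS (
    -- previous distinct month per customer
    SELECT customer_id, month,
           LAG(month) OVER (PARTITION BY customer_id ORDER BY month) AS lastMonth
    FROM months
)
SELECT t.date_key, t.month, t.customer_id, l.lastMonth
FROM sample_table t
JOIN lagged l
  ON l.customer_id = t.customer_id
 AND l.month = t.month
ORDER BY t.date_key;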

SQL value of one column based on max values in other selected rows

I am using the Products table from the Northwind sample database and I would like to find the top CategoryID for each SupplierID...
+-----------+----------------------------------+------------+------------+
| ProductID | ProductName | SupplierID | CategoryID |
+-----------+----------------------------------+------------+------------+
| 1 | Chai | 1 | 1 |
+-----------+----------------------------------+------------+------------+
| 2 | Chang | 1 | 1 |
+-----------+----------------------------------+------------+------------+
| 3 | Aniseed Syrup | 1 | 2 |
+-----------+----------------------------------+------------+------------+
| 4 | Chef Anton's Cajun Seasoning | 2 | 2 |
+-----------+----------------------------------+------------+------------+
| 5 | Chef Anton's Gumbo Mix | 2 | 2 |
+-----------+----------------------------------+------------+------------+
| 6 | Grandma's Boysenberry Spread | 3 | 2 |
+-----------+----------------------------------+------------+------------+
| 7 | Uncle Bob's Organic Dried Pears | 3 | 7 |
+-----------+----------------------------------+------------+------------+
| 8 | Northwoods Cranberry Sauce | 3 | 2 |
+-----------+----------------------------------+------------+------------+
| 9 | Mishi Kobe Niku | 4 | 6 |
+-----------+----------------------------------+------------+------------+
| 10 | Ikura | 4 | 8 |
+-----------+----------------------------------+------------+------------+
| 11 | Queso Cabrales | 5 | 4 |
+-----------+----------------------------------+------------+------------+
| 12 | Queso Manchego La Pastora | 5 | 4 |
+-----------+----------------------------------+------------+------------+
| 13 | Konbu | 6 | 8 |
+-----------+----------------------------------+------------+------------+
| 14 | Tofu | 6 | 7 |
+-----------+----------------------------------+------------+------------+
| 15 | Genen Shouyu | 6 | 2 |
+-----------+----------------------------------+------------+------------+
| 16 | Pavlova | 7 | 3 |
+-----------+----------------------------------+------------+------------+
| 17 | Alice Mutton | 7 | 6 |
+-----------+----------------------------------+------------+------------+
| 18 | Carnarvon Tigers | 7 | 8 |
+-----------+----------------------------------+------------+------------+
| 19 | Teatime Chocolate Biscuits | 8 | 3 |
+-----------+----------------------------------+------------+------------+
| 20 | Sir Rodney's Marmalade | 8 | 3 |
+-----------+----------------------------------+------------+------------+
| 21 | Sir Rodney's Scones | 8 | 3 |
+-----------+----------------------------------+------------+------------+
| 22 | Gustaf's Knäckebröd | 9 | 5 |
+-----------+----------------------------------+------------+------------+
| 23 | Tunnbröd | 9 | 5 |
+-----------+----------------------------------+------------+------------+
| 24 | Guaraná Fantástica | 10 | 1 |
+-----------+----------------------------------+------------+------------+
| 25 | NuNuCa Nuß-Nougat-Creme | 11 | 3 |
+-----------+----------------------------------+------------+------------+
| 26 | Gumbär Gummibärchen | 11 | 3 |
+-----------+----------------------------------+------------+------------+
| 27 | Schoggi Schokolade | 11 | 3 |
+-----------+----------------------------------+------------+------------+
| 28 | Rössle Sauerkraut | 12 | 7 |
+-----------+----------------------------------+------------+------------+
| 29 | Thüringer Rostbratwurst | 12 | 6 |
+-----------+----------------------------------+------------+------------+
| 30 | Nord-Ost Matjeshering | 13 | 8 |
+-----------+----------------------------------+------------+------------+
| 31 | Gorgonzola Telino | 14 | 4 |
+-----------+----------------------------------+------------+------------+
| 32 | Mascarpone Fabioli | 14 | 4 |
+-----------+----------------------------------+------------+------------+
| 33 | Geitost | 15 | 4 |
+-----------+----------------------------------+------------+------------+
| 34 | Sasquatch Ale | 16 | 1 |
+-----------+----------------------------------+------------+------------+
| 35 | Steeleye Stout | 16 | 1 |
+-----------+----------------------------------+------------+------------+
| 36 | Inlagd Sill | 17 | 8 |
+-----------+----------------------------------+------------+------------+
| 37 | Gravad lax | 17 | 8 |
+-----------+----------------------------------+------------+------------+
| 38 | Côte de Blaye | 18 | 1 |
+-----------+----------------------------------+------------+------------+
| 39 | Chartreuse verte | 18 | 1 |
+-----------+----------------------------------+------------+------------+
| 40 | Boston Crab Meat | 19 | 8 |
+-----------+----------------------------------+------------+------------+
| 41 | Jack's New England Clam Chowder | 19 | 8 |
+-----------+----------------------------------+------------+------------+
| 42 | Singaporean Hokkien Fried Mee | 20 | 5 |
+-----------+----------------------------------+------------+------------+
| 43 | Ipoh Coffee | 20 | 1 |
+-----------+----------------------------------+------------+------------+
| 44 | Gula Malacca | 20 | 2 |
+-----------+----------------------------------+------------+------------+
| 45 | Rogede sild | 21 | 8 |
+-----------+----------------------------------+------------+------------+
| 46 | Spegesild | 21 | 8 |
+-----------+----------------------------------+------------+------------+
| 47 | Zaanse koeken | 22 | 3 |
+-----------+----------------------------------+------------+------------+
| 48 | Chocolade | 22 | 3 |
+-----------+----------------------------------+------------+------------+
| 49 | Maxilaku | 23 | 3 |
+-----------+----------------------------------+------------+------------+
| 50 | Valkoinen suklaa | 23 | 3 |
+-----------+----------------------------------+------------+------------+
| 51 | Manjimup Dried Apples | 24 | 7 |
+-----------+----------------------------------+------------+------------+
| 52 | Filo Mix | 24 | 5 |
+-----------+----------------------------------+------------+------------+
| 53 | Perth Pasties | 24 | 6 |
+-----------+----------------------------------+------------+------------+
| 54 | Tourtière | 25 | 6 |
+-----------+----------------------------------+------------+------------+
| 55 | Pâté chinois | 25 | 6 |
+-----------+----------------------------------+------------+------------+
| 56 | Gnocchi di nonna Alice | 26 | 5 |
+-----------+----------------------------------+------------+------------+
| 57 | Ravioli Angelo | 26 | 5 |
+-----------+----------------------------------+------------+------------+
| 58 | Escargots de Bourgogne | 27 | 8 |
+-----------+----------------------------------+------------+------------+
| 59 | Raclette Courdavault | 28 | 4 |
+-----------+----------------------------------+------------+------------+
| 60 | Camembert Pierrot | 28 | 4 |
+-----------+----------------------------------+------------+------------+
| 61 | Sirop d'érable | 29 | 2 |
+-----------+----------------------------------+------------+------------+
| 62 | Tarte au sucre | 29 | 3 |
+-----------+----------------------------------+------------+------------+
| 63 | Vegie-spread | 7 | 2 |
+-----------+----------------------------------+------------+------------+
| 64 | Wimmers gute Semmelknödel | 12 | 5 |
+-----------+----------------------------------+------------+------------+
| 65 | Louisiana Fiery Hot Pepper Sauce | 2 | 2 |
+-----------+----------------------------------+------------+------------+
| 66 | Louisiana Hot Spiced Okra | 2 | 2 |
+-----------+----------------------------------+------------+------------+
| 67 | Laughing Lumberjack Lager | 16 | 1 |
+-----------+----------------------------------+------------+------------+
| 68 | Scottish Longbreads | 8 | 3 |
+-----------+----------------------------------+------------+------------+
| 69 | Gudbrandsdalsost | 15 | 4 |
+-----------+----------------------------------+------------+------------+
| 70 | Outback Lager | 7 | 1 |
+-----------+----------------------------------+------------+------------+
| 71 | Flotemysost | 15 | 4 |
+-----------+----------------------------------+------------+------------+
| 72 | Mozzarella di Giovanni | 14 | 4 |
+-----------+----------------------------------+------------+------------+
| 73 | Röd Kaviar | 17 | 8 |
+-----------+----------------------------------+------------+------------+
| 74 | Longlife Tofu | 4 | 7 |
+-----------+----------------------------------+------------+------------+
| 75 | Rhönbräu Klosterbier | 12 | 1 |
+-----------+----------------------------------+------------+------------+
| 76 | Lakkalikööri | 23 | 1 |
+-----------+----------------------------------+------------+------------+
| 77 | Original Frankfurter grüne Soße | 12 | 2 |
+-----------+----------------------------------+------------+------------+
Using the query
SELECT SupplierID, CategoryID, COUNT(CategoryID) AS Total FROM [dbo].[Products] GROUP BY CategoryID, SupplierID
I get the table
+------------+------------+-------+
| SupplierID | CategoryID | Total |
+------------+------------+-------+
| 1 | 1 | 2 |
+------------+------------+-------+
| 1 | 2 | 1 |
+------------+------------+-------+
| 2 | 2 | 4 |
+------------+------------+-------+
| 3 | 2 | 2 |
+------------+------------+-------+
| 3 | 7 | 1 |
+------------+------------+-------+
| 4 | 6 | 1 |
+------------+------------+-------+
| 4 | 7 | 1 |
+------------+------------+-------+
| 4 | 8 | 1 |
+------------+------------+-------+
| 5 | 4 | 2 |
+------------+------------+-------+
| 6 | 2 | 1 |
+------------+------------+-------+
| 6 | 7 | 1 |
+------------+------------+-------+
| 6 | 8 | 1 |
+------------+------------+-------+
| 7 | 1 | 1 |
+------------+------------+-------+
| 7 | 2 | 1 |
+------------+------------+-------+
| 7 | 3 | 1 |
+------------+------------+-------+
| 7 | 6 | 1 |
+------------+------------+-------+
| 7 | 8 | 1 |
+------------+------------+-------+
| 8 | 3 | 4 |
+------------+------------+-------+
| 9 | 5 | 2 |
+------------+------------+-------+
| 10 | 1 | 1 |
+------------+------------+-------+
| 11 | 3 | 3 |
+------------+------------+-------+
| 12 | 1 | 1 |
+------------+------------+-------+
| 12 | 2 | 1 |
+------------+------------+-------+
| 12 | 5 | 1 |
+------------+------------+-------+
| 12 | 6 | 1 |
+------------+------------+-------+
| 12 | 7 | 1 |
+------------+------------+-------+
| 13 | 8 | 1 |
+------------+------------+-------+
| 14 | 4 | 3 |
+------------+------------+-------+
| 15 | 4 | 3 |
+------------+------------+-------+
| 16 | 1 | 3 |
+------------+------------+-------+
| 17 | 8 | 3 |
+------------+------------+-------+
| 18 | 1 | 2 |
+------------+------------+-------+
| 19 | 8 | 2 |
+------------+------------+-------+
| 20 | 1 | 1 |
+------------+------------+-------+
| 20 | 2 | 1 |
+------------+------------+-------+
| 20 | 5 | 1 |
+------------+------------+-------+
| 21 | 8 | 2 |
+------------+------------+-------+
| 22 | 3 | 2 |
+------------+------------+-------+
| 23 | 1 | 1 |
+------------+------------+-------+
| 23 | 3 | 2 |
+------------+------------+-------+
| 24 | 5 | 1 |
+------------+------------+-------+
| 24 | 6 | 1 |
+------------+------------+-------+
| 24 | 7 | 1 |
+------------+------------+-------+
| 25 | 6 | 2 |
+------------+------------+-------+
| 26 | 5 | 2 |
+------------+------------+-------+
| 27 | 8 | 1 |
+------------+------------+-------+
| 28 | 4 | 2 |
+------------+------------+-------+
| 29 | 2 | 1 |
+------------+------------+-------+
| 29 | 3 | 1 |
+------------+------------+-------+
As you can see, supplier 1 makes 2 category 1 products and 1 category 2 product. Therefore the first line in the result should read
+------------+------------+-------+
| SupplierID | CategoryID | Total |
+------------+------------+-------+
| 1 | 1 | 2 |
+------------+------------+-------+
Next should be SupplierID 2, which makes a total of 4 category 2 products. The final table should look like this...
+------------+------------+-------+
| SupplierID | CategoryID | Total |
+------------+------------+-------+
| 1 | 1 | 2 |
+------------+------------+-------+
| 2 | 2 | 4 |
+------------+------------+-------+
| 3 | 2 | 2 |
+------------+------------+-------+
| 4 | 6 | 1 |
+------------+------------+-------+
| 5 | 4 | 2 |
+------------+------------+-------+
| 6 | 2 | 1 |
+------------+------------+-------+
| 7 | 1 | 1 |
+------------+------------+-------+
| 8 | 3 | 4 |
+------------+------------+-------+
| 9 | 5 | 2 |
+------------+------------+-------+
| 11 | 3 | 3 |
+------------+------------+-------+
| 12 | 1 | 1 |
+------------+------------+-------+
| 13 | 8 | 1 |
+------------+------------+-------+
| 14 | 4 | 3 |
+------------+------------+-------+
| 15 | 4 | 3 |
+------------+------------+-------+
| 16 | 1 | 3 |
+------------+------------+-------+
| 17 | 8 | 3 |
+------------+------------+-------+
| 18 | 1 | 2 |
+------------+------------+-------+
| 19 | 8 | 2 |
+------------+------------+-------+
| 20 | 1 | 1 |
+------------+------------+-------+
| 21 | 8 | 2 |
+------------+------------+-------+
| 22 | 3 | 2 |
+------------+------------+-------+
| 23 | 3 | 2 |
+------------+------------+-------+
| 24 | 5 | 1 |
+------------+------------+-------+
| 25 | 6 | 2 |
+------------+------------+-------+
| 26 | 5 | 2 |
+------------+------------+-------+
| 27 | 8 | 1 |
+------------+------------+-------+
| 28 | 4 | 2 |
+------------+------------+-------+
| 29 | 2 | 1 |
+------------+------------+-------+
| 29 | 3 | 1 |
+------------+------------+-------+
I know a lot of suppliers only make one item for a given category, and this isn't a great example, but I'm just trying to learn here.
Thanks
You can use ROW_NUMBER() partitioned by supplier, ordered by the aggregate count, and then select only the top-ranked row for each supplier. I took part of your sample data and did it this way:
with cte as (
select 1 as ProductID, 'Chai' as ProductName, 1 as SupplierID, 1 as CategoryID union all
select 2 as ProductID, 'Chang' as ProductName, 1 as SupplierID, 1 as CategoryID union all
select 3 as ProductID, 'Aniseed Syrup' as ProductName, 1 as SupplierID, 2 as CategoryID union all
select 4 as ProductID, 'Chef Anton''s Cajun Seasoning' as ProductName, 2 as SupplierID, 2 as CategoryID union all
select 5 as ProductID, 'Chef Anton''s Gumbo Mix' as ProductName, 2 as SupplierID, 2 as CategoryID union all
select 6 as ProductID, 'Grandma''s Boysenberry Spread' as ProductName, 3 as SupplierID, 2 as CategoryID union all
select 7 as ProductID, 'Uncle Bob''s Organic Dried Pears' as ProductName, 3 as SupplierID, 7 as CategoryID union all
select 8 as ProductID, 'Northwoods Cranberry Sauce' as ProductName, 3 as SupplierID, 2 as CategoryID )
select t.SupplierID, t.CategoryID, t.total from (
select supplierID, CategoryID , ROW_NUMBER() over (partition by supplierID order by count(1) desc) rownum, count(1) total from cte
group by supplierID, CategoryID ) t
where t.rownum = 1
Output:
SupplierID CategoryID total
1 1 2
2 2 2
3 2 2
In SQL Server you can write the query as:
select SupplierID ,
CategoryID ,
Total
from (
select
SupplierID ,
CategoryID ,
Total ,
ROW_NUMBER() over (partition by SupplierID order by Total desc) as rownum
from (
SELECT SupplierID
, CategoryID
, COUNT(CategoryID) AS Total
FROM [dbo].[Products]
GROUP BY CategoryID, SupplierID
) as Innertable
) as Outertable
where rownum = 1
order by SupplierID
First you have to generate the category counts by supplier, then you have to rank them from highest to lowest, and finally select only the highest. In the following query, I've done that by using nested queries:
-- Select only the top category counts by supplier
SELECT
[SupplierID],
[CategoryID],
[Total]
FROM (
-- Rank category counts by supplier
SELECT
*,
RANK() OVER (PARTITION BY [SupplierID] ORDER BY [Total] DESC) AS [Rank]
FROM (
-- Generate category counts by supplier
SELECT
[SupplierID],
[CategoryID],
COUNT(*) AS [Total]
FROM [Products]
GROUP BY
[SupplierID],
[CategoryID]
) AS SupplierCategoryCounts
) AS RankedSupplierCategoryCounts
WHERE [Rank] = 1
ORDER BY [SupplierID]
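One caveat on the ranking answers above: RANK() returns every tied category when two or more categories share a supplier's top count (for example SupplierID 4, whose three categories each have one product), and ROW_NUMBER() without a tiebreaker picks one of them arbitrarily. If exactly one deterministic row per supplier is wanted, here is a sketch that breaks ties by lowest CategoryID:

SELECT [SupplierID], [CategoryID], [Total]
FROM (
    SELECT
        [SupplierID],
        [CategoryID],
        COUNT(*) AS [Total],
        -- ties on the count fall back to the lowest CategoryID
        ROW_NUMBER() OVER (
            PARTITION BY [SupplierID]
            ORDER BY COUNT(*) DESC, [CategoryID]
        ) AS [RowNum]
    FROM [Products]
    GROUP BY [SupplierID], [CategoryID]
) AS RankedCounts
WHERE [RowNum] = 1
ORDER BY [SupplierID];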

Count Since Last Max Within Window

I have been working on this query for most of the night, and just cannot get it to work. This is an addendum to this question. The query should find the "Seqnum" of the last Maximum over the last 10 records. I am unable to limit the last Maximum to just the window.
Below is my best effort at getting there although I have tried many other queries to no avail:
SELECT [id], high, running_max, seqnum,
MAX(CASE WHEN ([high]) = running_max THEN seqnum END) OVER (ORDER BY [id]) AS [lastmax]
FROM (
SELECT [id], [high],
MAX([high]) OVER (ORDER BY [id] ROWS BETWEEN 9 PRECEDING AND CURRENT ROW) AS running_max,
ROW_NUMBER() OVER (ORDER BY [id]) as seqnum
FROM PY t
) x
Running the above query produces the following:
id | high | running_max | seqnum | lastmax |
+----+--------+-------------+--------+---------+
| 1 | 28.12 | 28.12 | 1 | 1 |
| 2 | 27.45 | 28.12 | 2 | 1 |
| 3 | 27.68 | 28.12 | 3 | 1 |
| 4 | 27.4 | 28.12 | 4 | 1 |
| 5 | 28.09 | 28.12 | 5 | 1 |
| 6 | 28.07 | 28.12 | 6 | 1 |
| 7 | 28.2 | 28.2 | 7 | 7 |
| 8 | 28.7 | 28.7 | 8 | 8 |
| 9 | 28.05 | 28.7 | 9 | 8 |
| 10 | 28.195 | 28.7 | 10 | 8 |
| 11 | 27.77 | 28.7 | 11 | 8 |
| 12 | 28.27 | 28.7 | 12 | 8 |
| 13 | 28.185 | 28.7 | 13 | 8 |
| 14 | 28.51 | 28.7 | 14 | 8 |
| 15 | 28.5 | 28.7 | 15 | 8 |
| 16 | 28.23 | 28.7 | 16 | 8 |
| 17 | 27.59 | 28.7 | 17 | 8 |
| 18 | 27.6 | 28.51 | 18 | 8 |
| 19 | 27.31 | 28.51 | 19 | 8 |
| 20 | 27.11 | 28.51 | 20 | 8 |
| 21 | 26.87 | 28.51 | 21 | 8 |
| 22 | 27.12 | 28.51 | 22 | 8 |
| 23 | 27.22 | 28.51 | 23 | 8 |
| 24 | 27.3 | 28.5 | 24 | 8 |
| 25 | 27.66 | 28.23 | 25 | 8 |
| 26 | 27.405 | 27.66 | 26 | 8 |
| 27 | 27.54 | 27.66 | 27 | 8 |
| 28 | 27.65 | 27.66 | 28 | 8 |
+----+--------+-------------+--------+---------+
Unfortunately, the lastmax column takes the last max over all previous records, not the max of the last 10 records only.
It is important to note that there can be duplicates in the high column, so this will need to be taken into account.
Any help would be greatly appreciated.
This isn't a bug. The issue is that high and lastmax have to come from the same row. This is a confusing aspect when using window functions.
Your logic in the outer query is looking for a row where the lastmax on that row matches the high on that row. That last occurred on row 8. The subsequent maxima are "local", in the sense that there was a higher value on that particular row.
For instance, on row 25 the value is 27.66. That is the maximum value that you want from row 26 onward. But on row 25 itself, the maximum is 28.23. That is clearly not equal to high on that row, so it doesn't match in the outer query.
I don't think you can easily do what you want using window functions. There may be some tricky way.
A version using cross apply works. I've used id for the lastmax. I'm not sure if you really need seqnum:
select py.[id], py.high, t.high as running_max, t.id as lastmax
from py cross apply
(select top (1) t.*
from (SELECT top (10) t.*
from PY t
where t.id <= py.id
order by t.id desc
) t
order by t.high desc
) t;
Here is a db<>fiddle.
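For completeness, one "tricky way" with window functions alone does exist: pack (high, id) into a single sortable number so that MAX() over the 10-row frame carries the id of the maximum along with it. A sketch (SQL Server syntax; assumes id < 100000, high is non-negative with at most five decimal places, and that ties on high should resolve to the most recent id):

SELECT id, high,
       -- high is scaled into the upper digits, id occupies the lower five,
       -- so the frame-wide MAX is decided by high first, then by id
       MAX(CAST(ROUND(high * 100000, 0) AS BIGINT) * 100000 + id) OVER (
           ORDER BY id
           ROWS BETWEEN 9 PRECEDING AND CURRENT ROW
       ) % 100000 AS lastmax
FROM PY;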

Add unique id to groups of ordered transactions [closed]

I currently have a table with transactions that are sequentially ordered for each group like so:
| transaction_no | value |
|----------------|-------|
| 1 | 8 |
| 2 | 343 |
| 3 | 28 |
| 4 | 102 |
| 1 | 30 |
| 2 | 5 |
| 3 | 100 |
| 1 | 12 |
| 2 | 16 |
| 3 | 28 |
| 4 | 157 |
| 5 | 125 |
However, I'm interested in adding another column that assigns a unique ID to each
grouping (a set of transactions where transaction_no starts at 1 and ends at x,
where the transaction_no immediately after x is 1). So the goal is a table like this:
| transaction_no | value | stmt_id |
|----------------|-------|---------|
| 1 | 8 | 1001 |
| 2 | 343 | 1001 |
| 3 | 28 | 1001 |
| 4 | 102 | 1001 |
| 1 | 30 | 1002 |
| 2 | 5 | 1002 |
| 3 | 100 | 1002 |
| 1 | 12 | 1003 |
| 2 | 16 | 1003 |
| 3 | 28 | 1003 |
| 4 | 157 | 1003 |
| 5 | 125 | 1003 |
How would I do this?
This is a variation of the gaps-and-island problem. For it to be solvable, as commented by Gordon Linoff, you need a column that can be used to order the rows. I assume that such a column exists and is called id.
The typical solution involves ranking the records and performing a window sum. When the difference between the overall rank and the window sum changes, a new group starts.
Consider the following query:
select
id,
transaction_no,
value,
1000
+ rn
- sum(case when transaction_no = lag_transaction_no + 1 then 1 else 0 end)
over(order by id) grp
from (
select
t.*,
row_number() over(order by id) rn,
lag(transaction_no) over(order by id) lag_transaction_no
from mytable t
) t
With this sample data:
id | transaction_no | value
-: | -------------: | ----:
1 | 1 | 8
2 | 2 | 343
3 | 3 | 28
4 | 4 | 102
5 | 1 | 30
6 | 2 | 5
7 | 3 | 100
8 | 1 | 12
9 | 2 | 16
10 | 3 | 28
11 | 4 | 157
12 | 5 | 125
The query returns:
id | transaction_no | value | grp
-: | -------------: | ----: | ---:
1 | 1 | 8 | 1001
2 | 2 | 343 | 1001
3 | 3 | 28 | 1001
4 | 4 | 102 | 1001
5 | 1 | 30 | 1002
6 | 2 | 5 | 1002
7 | 3 | 100 | 1002
8 | 1 | 12 | 1003
9 | 2 | 16 | 1003
10 | 3 | 28 | 1003
11 | 4 | 157 | 1003
12 | 5 | 125 | 1003
Demo on SQL Server 2012 DB Fiddle
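Since the question states that every group starts at transaction_no = 1, a simpler sketch is also possible (same assumption of an id column for ordering): count the 1s seen so far, since each new 1 opens a new statement.

SELECT id, transaction_no, value,
       -- each transaction_no = 1 starts a new group, so a running count of 1s
       -- is exactly the group number
       1000 + SUM(CASE WHEN transaction_no = 1 THEN 1 ELSE 0 END)
                  OVER (ORDER BY id) AS stmt_id
FROM mytable;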

Rolling total with no sub-select and no vendor specific extensions

What I'm trying to achieve: rolling total for quantity and amount for a given day, grouped by hour.
It's easy in most cases, but if you have some additional columns (dir and product in my case) and you don't want to group/filter on them, that's a problem.
I know there are extensions in Oracle and MSSQL specifically for that, and there are OVER (PARTITION BY ...) window functions in Postgres.
At the moment I'm working on an app prototype, and it's backed by MySQL, and I have no idea what it will be using in production, so I'm trying to avoid vendor lock-in.
The entire table:
> SELECT id, dir, product, date, hour, quantity, amount FROM sales
ORDER BY date, hour;
+------+-----+---------+------------+------+----------+--------+
| id | dir | product | date | hour | quantity | amount |
+------+-----+---------+------------+------+----------+--------+
| 2230 | 65 | ABCDEDF | 2014-09-11 | 1 | 1 | 10 |
| 2231 | 64 | ABCDEDF | 2014-09-11 | 3 | 4 | 40 |
| 2232 | 64 | ABCDEDF | 2014-09-11 | 5 | 5 | 50 |
| 2235 | 64 | ZZ | 2014-09-11 | 7 | 6 | 60 |
| 2233 | 64 | ABCDEDF | 2014-09-11 | 7 | 6 | 60 |
| 2237 | 66 | ABCDEDF | 2014-09-11 | 7 | 6 | 60 |
| 2234 | 64 | ZZ | 2014-09-18 | 3 | 1 | 11 |
| 2236 | 66 | ABCDEDF | 2014-09-18 | 3 | 1 | 100 |
| 2227 | 64 | ABCDEDF | 2014-09-18 | 3 | 1 | 100 |
| 2228 | 64 | ABCDEDF | 2014-09-18 | 5 | 2 | 200 |
| 2229 | 64 | ABCDEDF | 2014-09-18 | 7 | 3 | 300 |
+------+-----+---------+------------+------+----------+--------+
For a given date:
> SELECT id, dir, product, date, hour, quantity, amount FROM sales
WHERE date = '2014-09-18'
ORDER BY hour;
+------+-----+---------+------------+------+----------+--------+
| id | dir | product | date | hour | quantity | amount |
+------+-----+---------+------------+------+----------+--------+
| 2227 | 64 | ABCDEDF | 2014-09-18 | 3 | 1 | 100 |
| 2236 | 66 | ABCDEDF | 2014-09-18 | 3 | 1 | 100 |
| 2234 | 64 | ZZ | 2014-09-18 | 3 | 1 | 11 |
| 2228 | 64 | ABCDEDF | 2014-09-18 | 5 | 2 | 200 |
| 2229 | 64 | ABCDEDF | 2014-09-18 | 7 | 3 | 300 |
+------+-----+---------+------------+------+----------+--------+
The results I need, using a sub-select:
> SELECT date, hour, SUM(quantity),
( SELECT SUM(quantity) FROM sales s2
WHERE s2.hour <= s1.hour AND s2.date = s1.date
) AS total
FROM sales s1
WHERE s1.date = '2014-09-18'
GROUP by date, hour;
+------------+------+---------------+-------+
| date | hour | sum(quantity) | total |
+------------+------+---------------+-------+
| 2014-09-18 | 3 | 3 | 3 |
| 2014-09-18 | 5 | 2 | 5 |
| 2014-09-18 | 7 | 3 | 8 |
+------------+------+---------------+-------+
My concerns about using a sub-select:
once there are around a million records in the table, the query may become too slow, and I'm not sure whether it can be optimized, even though it has no HAVING clauses.
if I had to filter on a product or dir, I would have to put those conditions in both the main SELECT and the sub-SELECT (WHERE product = / WHERE dir =).
a sub-select can only return a single sum, while I need two of them, sum(quantity) and sum(amount) (otherwise: ERROR 1241 (21000): Operand should contain 1 column(s)); see the sketch after this list.
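The last concern has a portable workaround: use one correlated scalar subquery per measure. A sketch (correlating on s1.date and s1.hour is legal because both columns appear in GROUP BY):

SELECT date, hour,
       SUM(quantity) AS q,
       SUM(amount)   AS a,
       (SELECT SUM(s2.quantity) FROM sales s2
         WHERE s2.date = s1.date AND s2.hour <= s1.hour) AS total_quantity,
       (SELECT SUM(s2.amount) FROM sales s2
         WHERE s2.date = s1.date AND s2.hour <= s1.hour) AS total_amount
FROM sales s1
WHERE s1.date = '2014-09-18'
GROUP BY date, hour;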
The closest result I was able to get uses a JOIN:
> SELECT DISTINCT(s1.hour) AS ih, s2.date, s2.hour, s2.quantity, s2.amount, s2.id
FROM sales s1
JOIN sales s2 ON s2.date = s1.date AND s2.hour <= s1.hour
WHERE s1.date = '2014-09-18'
ORDER by ih;
+----+------------+------+----------+--------+------+
| ih | date | hour | quantity | amount | id |
+----+------------+------+----------+--------+------+
| 3 | 2014-09-18 | 3 | 1 | 100 | 2236 |
| 3 | 2014-09-18 | 3 | 1 | 100 | 2227 |
| 3 | 2014-09-18 | 3 | 1 | 11 | 2234 |
| 5 | 2014-09-18 | 3 | 1 | 100 | 2236 |
| 5 | 2014-09-18 | 3 | 1 | 100 | 2227 |
| 5 | 2014-09-18 | 5 | 2 | 200 | 2228 |
| 5 | 2014-09-18 | 3 | 1 | 11 | 2234 |
| 7 | 2014-09-18 | 3 | 1 | 100 | 2236 |
| 7 | 2014-09-18 | 3 | 1 | 100 | 2227 |
| 7 | 2014-09-18 | 5 | 2 | 200 | 2228 |
| 7 | 2014-09-18 | 7 | 3 | 300 | 2229 |
| 7 | 2014-09-18 | 3 | 1 | 11 | 2234 |
+----+------------+------+----------+--------+------+
I could stop here and just use those results to group by ih (hour), calculate the sums for quantity and amount, and be happy. But something tells me this is wrong.
If I remove DISTINCT, most rows become duplicated. Replacing the JOIN with its variants doesn't help.
Once I remove s2.id from the statement, I get a complete mess with meaningful rows disappearing/collapsing (e.g. ids 2236/2227 get collapsed):
> SELECT DISTINCT(s1.hour) AS ih, s2.date, s2.hour, s2.quantity, s2.amount
FROM sales s1
JOIN sales s2 ON s2.date = s1.date AND s2.hour <= s1.hour
WHERE s1.date = '2014-09-18'
ORDER by ih;
+----+------------+------+----------+--------+
| ih | date | hour | quantity | amount |
+----+------------+------+----------+--------+
| 3 | 2014-09-18 | 3 | 1 | 100 |
| 3 | 2014-09-18 | 3 | 1 | 11 |
| 5 | 2014-09-18 | 3 | 1 | 100 |
| 5 | 2014-09-18 | 5 | 2 | 200 |
| 5 | 2014-09-18 | 3 | 1 | 11 |
| 7 | 2014-09-18 | 3 | 1 | 100 |
| 7 | 2014-09-18 | 5 | 2 | 200 |
| 7 | 2014-09-18 | 7 | 3 | 300 |
| 7 | 2014-09-18 | 3 | 1 | 11 |
+----+------------+------+----------+--------+
Summing doesn't help; it only adds to the mess.
The first row (hour = 3) should have SUM(s2.quantity) equal to 3, but it has 9. What SUM(s1.quantity) shows is a complete mystery to me.
> SELECT DISTINCT(s1.hour) AS hour, sum(s1.quantity), s2.date, SUM(s2.quantity)
FROM sales s1 JOIN sales s2 ON s2.date = s1.date AND s2.hour <= s1.hour
WHERE s1.date = '2014-09-18'
GROUP BY hour;
+------+------------------+------------+------------------+
| hour | sum(s1.quantity) | date | sum(s2.quantity) |
+------+------------------+------------+------------------+
| 3 | 9 | 2014-09-18 | 9 |
| 5 | 8 | 2014-09-18 | 5 |
| 7 | 15 | 2014-09-18 | 8 |
+------+------------------+------------+------------------+
Bonus points/boss level:
I also need a column that will show total_reference, the same rolling total for the same periods for a different date (e.g. 2014-09-11).
If you want a cumulative sum in MySQL, the most efficient way is to use variables:
SELECT date, hour,
(#q := q + #q) as cumeq, (#a := a + #a) as cumea
FROM (SELECT date, hour, SUM(quantity) as q, SUM(amount) as a
FROM sales s
WHERE s.date = '2014-09-18'
GROUP by date, hour
) dh cross join
(select #q := 0, #a := 0) vars
ORDER BY date, hour;
If you are planning on working with databases such as Oracle, SQL Server, or Postgres, then you should develop against a database that is similar in functionality and supports ANSI-standard window functions. The right way to do this is with window functions, but MySQL doesn't support them. Postgres, SQL Server, and Oracle all have free versions that you can use for development purposes.
Also, with proper indexing, you shouldn't have a problem with the subquery approach, even on large tables.
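Worth noting: this answer predates MySQL 8.0, which added ANSI window functions. On MySQL 8+ (or Postgres, SQL Server, Oracle) the same rolling totals need no variables and no sub-select; a sketch:

SELECT date, hour,
       SUM(quantity) AS q,
       SUM(amount)   AS a,
       -- window functions run after GROUP BY, so SUM(SUM(...)) is legal:
       -- the inner SUM aggregates each group, the outer one accumulates it
       SUM(SUM(quantity)) OVER (PARTITION BY date ORDER BY hour) AS cume_q,
       SUM(SUM(amount))   OVER (PARTITION BY date ORDER BY hour) AS cume_a
FROM sales
WHERE date = '2014-09-18'
GROUP BY date, hour
ORDER BY date, hour;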