How do I group aggregated data a certain way - sql

I have the following sample transactional item receipt data, consisting of Item, Vendor and Receipt Date:
Item  Vendor  Receipt_Date
A     1       2021-01-01 00:00:00.000
A     2       2021-01-31 00:00:00.000
B     1       2021-02-01 00:00:00.000
B     2       2021-02-10 00:00:00.000
B     3       2021-02-20 00:00:00.000
C     7       2021-03-01 00:00:00.000
I want to select the Vendor for each Item, based on the last (max) Receipt Date, so the expected result for the above sample would be:
Item  Last_Vendor_For_Receipt
A     2
B     3
C     7
I can group the data per Item and Vendor, but I cannot figure out how to achieve the above expected result with an outer query. I'm using SQL Server 2012. Here's the initial query:
select
    ir.Item
    ,ir.Vendor
    ,max(ir.Receipt_Date) Last_Receipt_Date
from
    ItemReceipt ir
group by
    ir.Item
    ,ir.Vendor
I checked online and in the forum, but it was hard to search for my specific question.
Thanks

Here is one approach using TOP with ROW_NUMBER:
SELECT TOP 1 WITH TIES *
FROM yourTable
ORDER BY ROW_NUMBER() OVER (PARTITION BY Item ORDER BY Receipt_Date DESC);

First you select the desired max date per item:
select max(Receipt_Date) as max_rcpt_date
, Item
from your_unknown_table
group by Item
And then you can use this as a subquery to get the vendor:
select Item
, Vendor
from your_unknown_table
where ( Receipt_Date, Item ) in
( select max(Receipt_Date) as max_rcpt_date
, Item
from your_unknown_table
group by Item
)
This works in Oracle. SQL Server does not accept a multi-column IN such as ( Receipt_Date, Item ) in ( ... ), so there you would join to the subquery on Item and Receipt_Date instead.
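For engines without a multi-column IN, the same "max date per item, then look up the vendor" idea can be written as a join. The sketch below is only an illustration: it runs the join in an in-memory SQLite database through Python's sqlite3 module, which is an assumption made for testing and not the asker's SQL Server setup. The table name ItemReceipt and the sample rows come from the question; the alias m is made up here.

```python
import sqlite3

# In-memory SQLite stands in for the real database (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ItemReceipt (Item TEXT, Vendor INTEGER, Receipt_Date TEXT)")
conn.executemany(
    "INSERT INTO ItemReceipt VALUES (?, ?, ?)",
    [
        ("A", 1, "2021-01-01"),
        ("A", 2, "2021-01-31"),
        ("B", 1, "2021-02-01"),
        ("B", 2, "2021-02-10"),
        ("B", 3, "2021-02-20"),
        ("C", 7, "2021-03-01"),
    ],
)

# Join each receipt to the per-item maximum date; only the latest receipt matches.
latest_vendor = conn.execute(
    """
    SELECT ir.Item, ir.Vendor AS Last_Vendor_For_Receipt
    FROM ItemReceipt ir
    JOIN (
        SELECT Item, MAX(Receipt_Date) AS max_rcpt_date
        FROM ItemReceipt
        GROUP BY Item
    ) m ON m.Item = ir.Item AND m.max_rcpt_date = ir.Receipt_Date
    ORDER BY ir.Item
    """
).fetchall()

print(latest_vendor)
```

If two receipts for the same item share the maximum date, this form returns both rows, so ties need a separate rule.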

Related

Handling duplicates when rolling totals using OVER Partition by

I'm trying to get the rolling amount column totals for each date, from the 1st day of the month to whatever the date column value is, shown in the input table.
Output Requirements
Partition by the 'team' column
Restart rolling totals on the 1st of each month
Question 1
Is my query below correct for the output requirements shown in the Desired Output Table? It seems to work, but I need to confirm.
SELECT
*,
SUM(amount) OVER (
PARTITION BY
team,
month_id
ORDER BY
date ASC
) rolling_amount_total
FROM input_table;
Question 2
How can I handle duplicate dates, shown in the first 2 rows of Input Table? Whenever there is a duplicate date the amount is a duplicate as well. I see a solution here: https://stackoverflow.com/a/60115061/6388651 but no luck getting it to remove the duplicates. My non-working code example is below.
SELECT
*,
SUM(amount) OVER (
PARTITION BY
team,
month_id
ORDER BY
date ASC
) rolling_amount_total
FROM (
SELECT DISTINCT
date,
amount,
team,
month_id
FROM input_table
) t
Input Table
date        amount  team  month_id
2022-04-01  1       A     2022-04
2022-04-01  1       A     2022-04
2022-04-02  2       A     2022-04
2022-05-01  4       B     2022-05
2022-05-02  4       B     2022-05
Desired Output Table
date        amount  team  month_id  Rolling_Amount_Total
2022-04-01  1       A     2022-04   1
2022-04-02  2       A     2022-04   3
2022-05-01  4       B     2022-05   4
2022-05-02  4       B     2022-05   8
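Q1. Your sum() over () is correct.
Q2. In your first query, replace from input_table with the derived table below. It collapses duplicate dates to one row per date, team and month_id. max(amount) is used because a duplicated row repeats the same amount, so sum(amount) would count it twice and no longer match the desired output:
from (select date, max(amount) as amount, team, month_id
      from input_table
      group by date, team, month_id
     ) as t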
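To check the de-duplication against the sample, here is a small sketch that runs the windowed sum over a grouped derived table in an in-memory SQLite database (an assumption for testing; window functions need SQLite 3.25 or newer). It takes MAX(amount) per date so a duplicated date counts once, which is what the desired output shows. Table and column names are taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE input_table (date TEXT, amount INTEGER, team TEXT, month_id TEXT)"
)
conn.executemany(
    "INSERT INTO input_table VALUES (?, ?, ?, ?)",
    [
        ("2022-04-01", 1, "A", "2022-04"),
        ("2022-04-01", 1, "A", "2022-04"),  # duplicate row from the question
        ("2022-04-02", 2, "A", "2022-04"),
        ("2022-05-01", 4, "B", "2022-05"),
        ("2022-05-02", 4, "B", "2022-05"),
    ],
)

# Collapse duplicates first, then compute the running total per team and month.
rolling = conn.execute(
    """
    SELECT date, amount, team, month_id,
           SUM(amount) OVER (
               PARTITION BY team, month_id
               ORDER BY date ASC
           ) AS rolling_amount_total
    FROM (
        SELECT date, MAX(amount) AS amount, team, month_id
        FROM input_table
        GROUP BY date, team, month_id
    ) AS t
    ORDER BY team, date
    """
).fetchall()

for row in rolling:
    print(row)
```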

Select earliest date and count rows in table with duplicate IDs

I have a table called table1:
id created_date
1001 2020-06-01
1001 2020-01-01
1001 2020-07-01
1002 2020-02-01
1002 2020-04-01
1003 2020-09-01
I'm trying to write a query that provides me a list of distinct IDs with the earliest created_date they have, along with the count of rows each id has:
id created_date count
1001 2020-01-01 3
1002 2020-02-01 2
1003 2020-09-01 1
I managed to write a window function to grab the earliest date, but I'm having trouble figuring out where to fit the count statement in one:
SELECT
id,
created_date
FROM ( SELECT
           id,
           created_date,
           row_number() OVER(PARTITION BY id ORDER BY created_date) as row_num
       FROM table1
     ) AS a
WHERE row_num = 1
You would use aggregation:
select id, min(created_date) as created_date, count(*) as count
from table1
group by id;
I find it amusing that you want to use window functions -- which are considered more advanced -- when lowly aggregation suffices.
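As a quick check of the plain aggregation on the sample rows, the following sketch runs it in an in-memory SQLite database via Python's sqlite3 (an assumption for testing only; the question does not name its database). The output column aliases earliest and row_count are made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, created_date TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?)",
    [
        (1001, "2020-06-01"),
        (1001, "2020-01-01"),
        (1001, "2020-07-01"),
        (1002, "2020-02-01"),
        (1002, "2020-04-01"),
        (1003, "2020-09-01"),
    ],
)

# One row per id: earliest created_date plus the number of rows for that id.
summary = conn.execute(
    """
    SELECT id, MIN(created_date) AS earliest, COUNT(*) AS row_count
    FROM table1
    GROUP BY id
    ORDER BY id
    """
).fetchall()

print(summary)
```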

Find Column With Max on Other Column Grouping By Another Column

I have a table like this:
Id  ItemId  Price       SalesId   Date
1   12      99.99924    21899234  2025-01-01 00:00:00.000000
2   123     12.34567    348923    2021-01-01 00:00:00.000000
3   1234    1234.5      3321234   2022-01-01 00:00:00.000000
4   12345   3.3246      2154234   2023-01-01 00:00:00.000000
5   1234    451.234     3423      2020-02-01 00:00:00.000000
6   12345   0.989       71112357  2020-09-15 20:20:10.000000
7   123     3435.3      71112357  2020-09-14 20:10:12.000000
I am trying to find the Price of an item at its latest Date. For example, for ItemId = 1234 the row with the latest date is Id = 3 (2022-01-01 00:00:00.000000), so the price I want is 1234.5.
I am a beginner to SQL and tried the following query, but it gives me this error:
select "ItemId",
max("Date"),
"Price"
from "Products"
group by "ItemId"
[42803] ERROR: column "Products.Price" must appear in the GROUP BY clause or be used in an aggregate function
I appreciate any help here. Thank you!
In Postgres, you can use distinct on:
select distinct on ("ItemId") p.*
from "Products" p
order by "ItemId", "Date" desc;
Note: If you are learning SQL, don't use double quotes for table and column names.
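You can try using row_number(). Because the table was created with quoted mixed-case names, the identifiers have to be quoted here too:
select *
from (
    select "ItemId", "Date", "Price",
           row_number() over (partition by "ItemId" order by "Date" desc) as rn
    from "Products"
) a
where rn = 1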
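The row_number() approach can be checked against the sample rows with a short sketch. It uses an in-memory SQLite database through Python's sqlite3 rather than Postgres (an assumption for testing; window functions need SQLite 3.25 or newer), and writes row 6's time as 20:20:10. The alias ranked is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE "Products" '
    '("Id" INTEGER, "ItemId" INTEGER, "Price" REAL, "SalesId" INTEGER, "Date" TEXT)'
)
conn.executemany(
    'INSERT INTO "Products" VALUES (?, ?, ?, ?, ?)',
    [
        (1, 12, 99.99924, 21899234, "2025-01-01 00:00:00"),
        (2, 123, 12.34567, 348923, "2021-01-01 00:00:00"),
        (3, 1234, 1234.5, 3321234, "2022-01-01 00:00:00"),
        (4, 12345, 3.3246, 2154234, "2023-01-01 00:00:00"),
        (5, 1234, 451.234, 3423, "2020-02-01 00:00:00"),
        (6, 12345, 0.989, 71112357, "2020-09-15 20:20:10"),  # time assumed to be 20:20:10
        (7, 123, 3435.3, 71112357, "2020-09-14 20:10:12"),
    ],
)

# Rank each item's rows from newest to oldest and keep the newest one.
latest_price = conn.execute(
    """
    SELECT "ItemId", "Price"
    FROM (
        SELECT "ItemId", "Price",
               ROW_NUMBER() OVER (PARTITION BY "ItemId" ORDER BY "Date" DESC) AS rn
        FROM "Products"
    ) AS ranked
    WHERE rn = 1
    ORDER BY "ItemId"
    """
).fetchall()

print(latest_price)
```

Storing the dates as ISO-formatted text keeps string ordering identical to chronological ordering, which is why the ORDER BY works without a date type here.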

SQL Query - Design struggle

I am fairly new to SQL Server (2012) but I was assigned the project where I have to use it.
The database consists of one table (counted in millions of rows) which looks mainly like this:
Number (float) Date (datetime) Status (nvarchar(255))
999 2016-01-01 14:00:00.000 Error
999 2016-01-02 14:00:00.000 Error
999 2016-01-03 14:00:00.000 Ok
999 2016-01-04 14:00:00.000 Error
888 2016-01-01 14:00:00.000 Error
888 2016-01-02 14:00:00.000 Ok
888 2016-01-03 14:00:00.000 Error
888 2016-01-04 14:00:00.000 Error
777 2016-01-01 14:00:00.000 Error
777 2016-01-02 14:00:00.000 Error
I have to create a query which will show me only the phone numbers (one number per row so probably Group by number?) that meet the conditions:
The number appears at least 3 times
The last two occurrences, ordered by date (the records are not stored in date order), are both Error
For example, in the table above only 888 meets the criteria, because for 999 the 2nd newest status is Ok and 777 occurs only 2 times.
I will appreciate any kind of help!
Thanks in advance!
You can use row_number() and conditional aggregation:
select number
from (select t.*,
row_number() over (partition by number order by date desc) as seqnum
from t
) t
group by number
having count(*) >= 3 and
max(case when seqnum = 1 then status end) = 'Error' and
max(case when seqnum = 2 then status end) = 'Error';
Note: float is a really, really bad type to use for the "number" column. In particular, two numbers can look the same but differ in low-order bits. They will produce different rows in the group by.
You should probably use varchar() for telephone numbers. That gives you the most flexibility. If you need to store the number as a number, then decimal/numeric is a much, much better choice than float.
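To see the row_number() plus conditional aggregation query on the sample rows, here is a sketch run in an in-memory SQLite database through Python's sqlite3 (an assumption for testing, not SQL Server; window functions need SQLite 3.25 or newer). The number column is stored as text here, and the alias ranked is made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (number TEXT, date TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("999", "2016-01-01 14:00:00", "Error"),
        ("999", "2016-01-02 14:00:00", "Error"),
        ("999", "2016-01-03 14:00:00", "Ok"),
        ("999", "2016-01-04 14:00:00", "Error"),
        ("888", "2016-01-01 14:00:00", "Error"),
        ("888", "2016-01-02 14:00:00", "Ok"),
        ("888", "2016-01-03 14:00:00", "Error"),
        ("888", "2016-01-04 14:00:00", "Error"),
        ("777", "2016-01-01 14:00:00", "Error"),
        ("777", "2016-01-02 14:00:00", "Error"),
    ],
)

# seqnum 1 is the newest row per number, seqnum 2 the one before it.
matching = conn.execute(
    """
    SELECT number
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY number ORDER BY date DESC) AS seqnum
        FROM t
    ) AS ranked
    GROUP BY number
    HAVING COUNT(*) >= 3
       AND MAX(CASE WHEN seqnum = 1 THEN status END) = 'Error'
       AND MAX(CASE WHEN seqnum = 2 THEN status END) = 'Error'
    """
).fetchall()

print(matching)
```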
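Another option: restrict to numbers with at least 3 rows, number each one's rows from newest to oldest, and keep the numbers whose two newest rows are both Error:
-- MyTable stands in for your table name
select ABC.Number
from
(
    select Number, Date, Status,
           ROW_NUMBER() OVER(partition by Number order by Date desc) as times
    from MyTable
    where Number in
    (
        select Number
        from MyTable
        group by Number
        having count(*) >= 3
    )
) as ABC
where ABC.times in (1,2) and ABC.Status = 'Error'
group by ABC.Number
having count(*) = 2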
with CTE as
(
select t1.*, row_number() over(partition by t1.Number order by t1.date desc) as r_ord
from MyTable t1
)
select C1.Number
from CTE C1
inner join
(
select Number
from CTE
group by Number
having max(r_ord) >=3
) C2
on C1.Number = C2.Number
where C1.r_ord in (1,2)
and C1.Status = 'Error'
group by C1.Number
having count(*) = 2

Firebird Query- Return first row each group

In a firebird database with a table "Sales", I need to select the first sale of all customers. See below a sample that show the table and desired result of query.
---------------------------------------
SALES
---------------------------------------
ID CUSTOMERID DTHRSALE
1 25 01/04/16 09:32
2 30 02/04/16 11:22
3 25 05/04/16 08:10
4 31 07/03/16 10:22
5 22 01/02/16 12:30
6 22 10/01/16 08:45
Result: only first sale, based on sale date.
ID CUSTOMERID DTHRSALE
1 25 01/04/16 09:32
2 30 02/04/16 11:22
4 31 07/03/16 10:22
6 22 10/01/16 08:45
I've already tried the code from "Select first row in each GROUP BY group?", but it did not work.
In Firebird 2.5 you can do this with the following query. It is a minor modification of the second part of the accepted answer to the question you linked, tailored to your schema and requirements:
select x.id,
x.customerid,
x.dthrsale
from sales x
join (select customerid,
min(dthrsale) as first_sale
from sales
group by customerid) p on p.customerid = x.customerid
and p.first_sale = x.dthrsale
order by x.id
The order by is not necessary, I just added it to make it give the order as shown in your question.
With Firebird 3 you can use the window function ROW_NUMBER which is also described in the linked answer. The linked answer incorrectly said the first solution would work on Firebird 2.1 and higher. I have now edited it.
Search for the sales with no earlier sales:
SELECT S1.*
FROM SALES S1
LEFT JOIN SALES S2 ON S2.CUSTOMERID = S1.CUSTOMERID AND S2.DTHRSALE < S1.DTHRSALE
WHERE S2.ID IS NULL
Define an index over (customerid, dthrsale) to make it fast.
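The "no earlier sale exists" self-join can be tried on the sample rows with the sketch below. It runs in an in-memory SQLite database through Python's sqlite3 instead of Firebird (an assumption for testing), and the dd/mm/yy values from the question are rewritten as ISO text so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALES (ID INTEGER, CUSTOMERID INTEGER, DTHRSALE TEXT)")
conn.executemany(
    "INSERT INTO SALES VALUES (?, ?, ?)",
    [
        (1, 25, "2016-04-01 09:32"),
        (2, 30, "2016-04-02 11:22"),
        (3, 25, "2016-04-05 08:10"),
        (4, 31, "2016-03-07 10:22"),
        (5, 22, "2016-02-01 12:30"),
        (6, 22, "2016-01-10 08:45"),
    ],
)

# A sale is the customer's first one when no earlier sale (S2) can be joined to it.
first_sales = conn.execute(
    """
    SELECT S1.ID, S1.CUSTOMERID, S1.DTHRSALE
    FROM SALES S1
    LEFT JOIN SALES S2
           ON S2.CUSTOMERID = S1.CUSTOMERID
          AND S2.DTHRSALE < S1.DTHRSALE
    WHERE S2.ID IS NULL
    ORDER BY S1.ID
    """
).fetchall()

first_sale_ids = [row[0] for row in first_sales]
print(first_sale_ids)
```

As with the join on min(dthrsale), two sales by one customer at exactly the same timestamp would both be returned.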
In Firebird 3, get the first row for each customer by the earliest sales_date:
SELECT id, customer_id, total, sales_date
FROM (
SELECT id, customer_id, total, sales_date
, row_number() OVER(PARTITION BY customer_id ORDER BY sales_date ASC ) AS rn
FROM SALES
) sub
WHERE rn = 1;
If you also want other related columns, this is where your self-answer fails:
select customer_id , min(sales_date)
, id, total --what about other columns
from SALES
group by customer_id
As simple as:
select CUSTOMERID, min(DTHRSALE) from SALES group by CUSTOMERID