Query Last data group by column - sql

I have this data:
date owner p.code product
---- ----- ----- ------
21.08.2020 Micheal 5 apple
22.08.2020 Micheal 5 apple
15.08.2020 George 4 biscuit
14.08.2020 George 4 biscuit
10.08.2020 Micheal 4 biscuit
23.08.2020 Alice 2 pear
15.08.2020 Alice 2 pear
14.08.2020 Micheal 2 pear
11.08.2020 Micheal 2 pear
I want to group the rows by product and show the last date and the last owner for each product, like this:
date owner p.code product
---- ----- ------ ------
22.08.2020 Micheal 5 apple
15.08.2020 George 4 biscuit
23.08.2020 Alice 2 pear

In Oracle, you can phrase this using group by:
select product, code,
max(date) as max_date,
max(owner) keep (dense_rank first order by date desc) as owner_at_max_date
from t
group by product, code;
The keep syntax is Oracle's rather verbose way of implementing a first() aggregation function.

You can use window functions:
select *
from (
select t.*, row_number() over(partition by product order by date desc) rn
from mytable t
) t
where rn = 1
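To check the window-function approach against the sample data, here is a minimal sketch using Python's built-in sqlite3 module (SQLite 3.25 or later is needed for window functions). The column names dt and code, and the ISO date strings, are assumptions made for the demo: dates stored as DD.MM.YYYY text would not sort chronologically in general, so a real date type or an ISO format is assumed.

```python
import sqlite3

# Sample rows from the question. Dates are rewritten as ISO strings
# (an assumption) so that ORDER BY on text sorts chronologically.
ROWS = [
    ("2020-08-21", "Micheal", 5, "apple"),
    ("2020-08-22", "Micheal", 5, "apple"),
    ("2020-08-15", "George", 4, "biscuit"),
    ("2020-08-14", "George", 4, "biscuit"),
    ("2020-08-10", "Micheal", 4, "biscuit"),
    ("2020-08-23", "Alice", 2, "pear"),
    ("2020-08-15", "Alice", 2, "pear"),
    ("2020-08-14", "Micheal", 2, "pear"),
    ("2020-08-11", "Micheal", 2, "pear"),
]


def latest_per_product(rows):
    """Return the most recent row for each product."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (dt TEXT, owner TEXT, code INTEGER, product TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT dt, owner, code, product
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY product ORDER BY dt DESC) AS rn
            FROM mytable t
        ) AS ranked
        WHERE rn = 1
        ORDER BY product
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


latest = latest_per_product(ROWS)
for row in latest:
    print(row)
```

Each product keeps only the row numbered 1, that is, the one with the latest date in its partition.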

Related

Creating the SQL script that tracks progress of an object

I have a table with this schema:
Fruit Truck ID Bucket ID Date
------ ----- --------- ----------
Apple 1 101 2018/04/01
Apple 1 101 2018/04/10
Apple 1 112 2018/04/16
Apple 2 782 2018/08/18
Apple 2 782 2018/09/12
Apple 1 113 2019/09/12
Apple 1 113 2019/09/21
My goal is to write an SQL script that returns the start and end dates of each truck & bucket pair for each fruit. The intended result is below:
Fruit Truck ID Bucket ID Start Date End Date
------ ----- --------- ---------- ----------
Apple 1 101 2018/04/01 2018/04/16
Apple 1 112 2018/04/16 2018/08/18
Apple 2 782 2018/08/18 2018/09/12
Apple 1 113 2019/09/12 2019/09/21
I have tried solving this with lag/lead window functions, but the dates are not correct. Is there another way to solve this using window functions, or do I have to create subqueries?
I think you want aggregation and window functions:
select fruit, truck_id, bucket_id,
min(date) start_date,
lead(min(date), 1, max(date)) over(partition by fruit order by min(date)) end_date
from mytable
group by fruit, truck_id, bucket_id
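A runnable sketch of the same idea, using Python's sqlite3 module (SQLite 3.25+). As an assumption for readability, the aggregation is done in a CTE first and LEAD is applied in a second step; the column name dt and the CTE name grouped are made up for the demo.

```python
import sqlite3

# Sample rows from the question: (fruit, truck_id, bucket_id, date).
ROWS = [
    ("Apple", 1, 101, "2018/04/01"),
    ("Apple", 1, 101, "2018/04/10"),
    ("Apple", 1, 112, "2018/04/16"),
    ("Apple", 2, 782, "2018/08/18"),
    ("Apple", 2, 782, "2018/09/12"),
    ("Apple", 1, 113, "2019/09/12"),
    ("Apple", 1, 113, "2019/09/21"),
]


def bucket_periods(rows):
    """Start date per truck/bucket pair; end date is the next pair's start."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable "
        "(fruit TEXT, truck_id INTEGER, bucket_id INTEGER, dt TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    query = """
        WITH grouped AS (
            SELECT fruit, truck_id, bucket_id,
                   MIN(dt) AS start_date,
                   MAX(dt) AS last_date
            FROM mytable
            GROUP BY fruit, truck_id, bucket_id
        )
        SELECT fruit, truck_id, bucket_id, start_date,
               -- next group's start, or this group's own last date if none
               LEAD(start_date, 1, last_date)
                   OVER (PARTITION BY fruit ORDER BY start_date) AS end_date
        FROM grouped
        ORDER BY start_date
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


periods = bucket_periods(ROWS)
for row in periods:
    print(row)
```

Note that on the sample data exactly as posted, the row for truck 2 / bucket 782 ends at 2019/09/12 (the start of the next pair), not at the 2018/09/12 shown in the intended result; the other three rows match.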

SQL nested query and use of MAX to extract most recent transaction and/or comment

We have a SQL database table recording customer comments (ARCMM). I want to extract the most recent comment for each customer. Some customers do not have any comments (i.e. no entries in ARCMM).
The most recent comment for a customer will have the most recent date (field DATEENTR) and, for that date, the highest value of field CNTUNIQ. The query below does not work as expected. Best fix?
Query:
SELECT
----- Customer masterfile
[ARCUS].[IDCUST],
[ARCUS].[NAMECUST],
----- Customer comments
[ARCMM].[CNTUNIQ],
[ARCMM].[DATEENTR],
[ARCMM].[TEXT]
FROM
[ARCUS]
----- Table ARCMM roto ID AR0021 Customer Comments -----
LEFT JOIN [ARCMM]
ON
[ARCMM].[IDCUST] = [ARCUS].[IDCUST]
AND
[ARCMM].[CNTUNIQ] =
(
SELECT MAX([CNTUNIQ])
FROM [ARCMM] ARCMMcopy2
WHERE
[ARCMMcopy2].[IDCUST] = [ARCMM].[IDCUST]
AND
[ARCMM].[DATEENTR] =
(
SELECT MAX([DATEENTR])
FROM [ARCMM] ARCMMcopy1
WHERE
[ARCMMcopy1].[IDCUST] = [ARCMM].[IDCUST]
)
)
Sample table ARCMM data:
IDCUST DATEENTR CNTUNIQ TEXT
Bob 20200311 1 Bob has woken up
Bob 20200311 2 Bob is having breakfast
Bob 20200629 1 Bob is sleeping <most recent for IDCUST Bob
Jill 20200128 1 Order started
Jill 20200218 1 Order sent
Jill 20200218 2 Goods received
Jill 20200218 3 Goods counted
Jill 20200325 1 Invoice received
Jill 20200325 2 Invoice processed <most recent for IDCUST Jill
Alison 20200225 1 Swimming
Alison 20200425 1 Walking
Alison 20200425 2 Running
Alison 20200425 3 Running
Alison 20200425 4 Sprinting
Alison 20200425 5 Jogging
Alison 20200425 6 Stopped <most recent for IDCUST Alison
Results from my SQL query attempt:
IDCUST NAMECUST CNTUNIQ DATEENTR TEXT
Bob Bob Brown Null Null Null
Jill Jill Jenkins Null Null Null
Alison Alison Allpress 6 20200425 Stopped
Desired results:
IDCUST NAMECUST CNTUNIQ DATEENTR TEXT
Bob Bob Brown 1 20200629 Bob is sleeping
Jill Jill Jenkins 2 20200325 Invoice processed
Alison Alison Allpress 6 20200425 Stopped
You could use row_number() within the left join, if your database supports window functions:
SELECT
c.[IDCUST],
c.[NAMECUST],
m.[CNTUNIQ],
m.[DATEENTR],
m.[TEXT]
FROM [ARCUS] c
LEFT JOIN (
SELECT
m.*,
ROW_NUMBER() OVER(
PARTITION BY [IDCUST]
ORDER BY [DATEENTR] DESC, [CNTUNIQ] DESC
) rn
FROM [ARCMM] m
) m ON m.[IDCUST] = c.[IDCUST] and m.rn = 1
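Here is a small sketch that runs this query on the sample data with Python's sqlite3 module (SQLite 3.25+; SQLite also accepts the bracketed identifiers). The customer Carl is hypothetical, added only to show that a customer with no comments is still returned by the left join.

```python
import sqlite3

CUSTOMERS = [
    ("Bob", "Bob Brown"),
    ("Jill", "Jill Jenkins"),
    ("Alison", "Alison Allpress"),
    ("Carl", "Carl Carter"),  # hypothetical customer with no comments
]

# (IDCUST, DATEENTR, CNTUNIQ, TEXT) from the question's sample table.
COMMENTS = [
    ("Bob", 20200311, 1, "Bob has woken up"),
    ("Bob", 20200311, 2, "Bob is having breakfast"),
    ("Bob", 20200629, 1, "Bob is sleeping"),
    ("Jill", 20200128, 1, "Order started"),
    ("Jill", 20200218, 1, "Order sent"),
    ("Jill", 20200218, 2, "Goods received"),
    ("Jill", 20200218, 3, "Goods counted"),
    ("Jill", 20200325, 1, "Invoice received"),
    ("Jill", 20200325, 2, "Invoice processed"),
    ("Alison", 20200225, 1, "Swimming"),
    ("Alison", 20200425, 1, "Walking"),
    ("Alison", 20200425, 2, "Running"),
    ("Alison", 20200425, 3, "Running"),
    ("Alison", 20200425, 4, "Sprinting"),
    ("Alison", 20200425, 5, "Jogging"),
    ("Alison", 20200425, 6, "Stopped"),
]


def latest_comments(customers, comments):
    """Return every customer with their most recent comment, if any."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE [ARCUS] ([IDCUST] TEXT, [NAMECUST] TEXT)")
    conn.execute(
        "CREATE TABLE [ARCMM] "
        "([IDCUST] TEXT, [DATEENTR] INTEGER, [CNTUNIQ] INTEGER, [TEXT] TEXT)"
    )
    conn.executemany("INSERT INTO [ARCUS] VALUES (?, ?)", customers)
    conn.executemany("INSERT INTO [ARCMM] VALUES (?, ?, ?, ?)", comments)
    query = """
        SELECT c.[IDCUST], c.[NAMECUST], m.[CNTUNIQ], m.[DATEENTR], m.[TEXT]
        FROM [ARCUS] c
        LEFT JOIN (
            SELECT m.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY [IDCUST]
                       ORDER BY [DATEENTR] DESC, [CNTUNIQ] DESC
                   ) AS rn
            FROM [ARCMM] m
        ) m ON m.[IDCUST] = c.[IDCUST] AND m.rn = 1
        ORDER BY c.[IDCUST]
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


results = latest_comments(CUSTOMERS, COMMENTS)
for row in results:
    print(row)
```

Because rn = 1 is part of the join condition rather than a WHERE clause, customers without comments are kept and get NULLs in the comment columns.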

How to query: "for which do these values apply"?

I'm trying to match and align data or, put another way, count occurrences and then list the values for which those occurrences occur.
Or, in a question: "How many times does each ID value occur, and for what names?"
For example, with this input
Name ID
-------------
jim 123
jim 234
jim 345
john 123
john 345
jane 234
jane 345
jan 45678
I want the output to be:
count ID name name name
------------------------------------
3 345 jim john jane
2 123 jim john
2 234 jim jane
1 45678 jan
Or similarly, the input could be (noticing that the ID values are not aligned),
jim john jane jan
----------------------------
123 345 234 45678
234 123 345
345
but that seems to complicate things.
The closest I have come to the desired result is this SQL:
select ID, count(ID) as count
from table
group by ID
order by count desc
which outputs
ID count
------------
345 3
123 2
234 2
45678 1
I'll appreciate help.
You seem to want a pivot. In SQL, you have to specify the number of columns in advance (unless you construct the query as a string).
But the idea is:
select ID, count(*) as cnt,
max(case when seqnum = 1 then name end) as name_1,
max(case when seqnum = 2 then name end) as name_2,
max(case when seqnum = 3 then name end) as name_3
from (select t.*,
row_number() over (partition by id order by id) as seqnum -- arbitrary ordering
from table t
) t
group by ID
order by cnt desc;
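As a quick check of the conditional-aggregation pivot, here is a sketch with Python's sqlite3 module (SQLite 3.25+). The table name name_ids is an assumption (table itself is a reserved word), and the row numbers are ordered by name rather than arbitrarily so that the output is deterministic.

```python
import sqlite3

# (name, id) pairs from the question.
PAIRS = [
    ("jim", 123), ("jim", 234), ("jim", 345),
    ("john", 123), ("john", 345),
    ("jane", 234), ("jane", 345),
    ("jan", 45678),
]


def pivot_names(pairs):
    """Count names per id and spread up to three names across columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE name_ids (name TEXT, id INTEGER)")
    conn.executemany("INSERT INTO name_ids VALUES (?, ?)", pairs)
    query = """
        SELECT id, COUNT(*) AS cnt,
               MAX(CASE WHEN seqnum = 1 THEN name END) AS name_1,
               MAX(CASE WHEN seqnum = 2 THEN name END) AS name_2,
               MAX(CASE WHEN seqnum = 3 THEN name END) AS name_3
        FROM (
            SELECT t.*,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY name) AS seqnum
            FROM name_ids t
        ) AS numbered
        GROUP BY id
        ORDER BY cnt DESC, id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


pivoted = pivot_names(PAIRS)
for row in pivoted:
    print(row)
```

Ids with fewer than three names get NULL (None in Python) in the unused name columns; a fourth name for the same id would be counted but not shown.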
If you have an unknown number of columns, you can aggregate the values into an array:
select ID, count(*) as cnt,
array_agg(name order by name) as names
from table t
group by ID
order by cnt desc
The query would look similar to this, if that's what you're looking for:
SELECT
name,
id,
COUNT(id) as count
FROM
dataSet
WHERE
dataSet.name = 'input'
AND dataSet.id = 'input'
GROUP BY
name,
id

Limit column value repeats to top 2

So I have this query:
SELECT
Search.USER_ID,
Search.SEARCH_TERM,
COUNT(*) AS count
FROM Search
GROUP BY 1,2
ORDER BY 3 DESC
Which returns a response that looks like this:
USER_ID SEARCH_TERM count
bob dog 50
bob cat 45
sally cat 38
john mouse 30
sally turtle 10
sally lion 5
john zebra 3
john leopard 1
And my question is: how would I change the query so that it only returns the top 2 most-searched-for terms for any given user? In the example above, the last row for Sally and the last row for John would be dropped, leaving a total of 6 rows, 2 for each user, like so:
USER_ID SEARCH_TERM count
bob dog 50
bob cat 45
sally cat 38
john mouse 30
sally turtle 10
john zebra 3
In SQL Server, you can put the original query into a CTE and add the ROW_NUMBER() function. Then, in the new main query, just add a WHERE clause to limit by the row number. Your query would look something like this:
;WITH OriginalQuery AS
(
SELECT
s.[User_id]
,s.Search_Term
,COUNT(*) AS 'count'
,ROW_NUMBER() OVER (PARTITION BY s.[USER_ID] ORDER BY COUNT(*) DESC) AS rn
FROM Search s
GROUP BY s.[User_id], s.Search_Term
)
SELECT oq.User_id
,oq.Search_Term
,oq.count
FROM OriginalQuery oq
WHERE rn <= 2
ORDER BY oq.count DESC
EDIT: I specified SQL Server as the dbms I used here, but the above should be ANSI-compliant and work in Snowflake.
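For anyone who wants to try the CTE plus ROW_NUMBER() pattern without a SQL Server or Snowflake instance, here is a rough sketch with Python's sqlite3 module (SQLite 3.25+). The raw search rows are synthesized from the counts in the question, and, as an assumption for clarity, the counting and the ranking are split into two CTEs (counted and ranked are made-up names).

```python
import sqlite3

# (user_id, search_term, number of searches) taken from the question's output.
COUNTS = [
    ("bob", "dog", 50), ("bob", "cat", 45),
    ("sally", "cat", 38), ("john", "mouse", 30),
    ("sally", "turtle", 10), ("sally", "lion", 5),
    ("john", "zebra", 3), ("john", "leopard", 1),
]


def top_two_terms(counts):
    """Return each user's two most-searched terms, highest count first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE search (user_id TEXT, search_term TEXT)")
    # Expand the counts back into one row per individual search.
    raw_rows = [(user, term) for user, term, n in counts for _ in range(n)]
    conn.executemany("INSERT INTO search VALUES (?, ?)", raw_rows)
    query = """
        WITH counted AS (
            SELECT user_id, search_term, COUNT(*) AS cnt
            FROM search
            GROUP BY user_id, search_term
        ),
        ranked AS (
            SELECT counted.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY user_id ORDER BY cnt DESC
                   ) AS rn
            FROM counted
        )
        SELECT user_id, search_term, cnt
        FROM ranked
        WHERE rn <= 2
        ORDER BY cnt DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


top_terms = top_two_terms(COUNTS)
for row in top_terms:
    print(row)
```

If two terms tie for second place, ROW_NUMBER() picks one of them arbitrarily; RANK() or DENSE_RANK() would keep both.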

Display the latest modified record for each employee

The emp table looks like this:
id Name Date Modified
1 Ram 2017-01-05
2 Kishore 2017-02-04
3 John 2017-04-22
1 Ram K 2017-04-25
1 Ram Kumar 2017-05-01
2 Kishore Babu 2017-05-05
3 John B 2017-06-01
Assuming you're using a reasonable rdbms that supports window functions, row_number should do the trick:
SELECT id, name, date_modified
FROM (SELECT id, name, date_modified,
ROW_NUMBER() OVER (PARTITION BY id ORDER BY date_modified DESC) rn
FROM emp) t
WHERE rn = 1
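A minimal sketch of the query above on the sample rows, using Python's sqlite3 module (SQLite 3.25+). The column name date_modified follows the answer; an ORDER BY id is added only to make the output order predictable.

```python
import sqlite3

# (id, name, date_modified) from the question; id repeats across revisions.
EMP_ROWS = [
    (1, "Ram", "2017-01-05"),
    (2, "Kishore", "2017-02-04"),
    (3, "John", "2017-04-22"),
    (1, "Ram K", "2017-04-25"),
    (1, "Ram Kumar", "2017-05-01"),
    (2, "Kishore Babu", "2017-05-05"),
    (3, "John B", "2017-06-01"),
]


def latest_employee_records(rows):
    """Return the most recently modified record for each employee id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (id INTEGER, name TEXT, date_modified TEXT)")
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)
    query = """
        SELECT id, name, date_modified
        FROM (
            SELECT id, name, date_modified,
                   ROW_NUMBER() OVER (
                       PARTITION BY id ORDER BY date_modified DESC
                   ) AS rn
            FROM emp
        ) AS t
        WHERE rn = 1
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


latest_emps = latest_employee_records(EMP_ROWS)
for row in latest_emps:
    print(row)
```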