SQL Query -- how to find lowest 2 numbers

I need to create a query that finds the lowest 2 values for each unique item in a table -- I am trying to find the first 2 shipments of each item.
So if the shipping table has:
ID ---- Date --- PartID
1 ---- 1/1 ---- 1
2 ---- 1/2 ---- 2
3 ---- 1/2 ---- 1
4 ---- 1/3 ---- 1
I would want rows 1, 2, and 3 returned as they are the first and second shipment of each item.
I can create a query that gets the lowest 2 values:
Select Min(ShipmentID) as SID
from dbo.Shipment
UNION
Select Min(ShipmentID) as SID
from dbo.Shipment
where (ShipmentID >
(Select Min(ShipmentID)
from dbo.Shipment))
but when I add in other information I only get the lowest for each item, not both:
Select Min(ShipmentID) as SID, AddressIDBilling
from dbo.Shipment
Group by AddressIDBilling
UNION
Select Min(ShipmentID) as SID, AddressIDBilling
from dbo.Shipment
where (ShipmentID >
(Select Min(ShipmentID)
from dbo.Shipment))
Group By AddressIDBilling
Order By AddressIDBilling
-- returns only 1 row for each AddressID, not the 2 records that I would want.

If you are on SQL Server, use a CTE with row_number():
with CTE as
(
select PartID, Date, row_number() over(partition by PartID order by Date) as PartOrd
from MyTable
)
select PartID, Date, PartOrd
from CTE
where PartOrd <=2

The normal way of doing this uses window functions, in this case rank() or row_number() (depending on how you want to handle ties):
select s.*
from (select s.*,
row_number() over (partition by partid order by date asc) as seqnum
from dbo.shipment s
) s
where seqnum <= 2;
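As a quick check of the row_number() approach, here is a minimal sketch that runs the same query shape against the sample rows from the question. It uses Python's built-in sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or later). The table name shipment, the column names, and the year in the dates are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the question's sample data (year 2020 is assumed).
conn.execute("CREATE TABLE shipment (id INTEGER, ship_date TEXT, part_id INTEGER)")
conn.executemany(
    "INSERT INTO shipment VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", 1),
        (2, "2020-01-02", 2),
        (3, "2020-01-02", 1),
        (4, "2020-01-03", 1),
    ],
)

# Number the shipments of each part by date, then keep the first two per part.
first_two = conn.execute(
    """
    SELECT id, ship_date, part_id
    FROM (
        SELECT s.*,
               row_number() OVER (PARTITION BY part_id ORDER BY ship_date, id) AS seqnum
        FROM shipment s
    ) AS ranked
    WHERE seqnum <= 2
    ORDER BY id
    """
).fetchall()

print(first_two)  # rows 1, 2 and 3; row 4 is the third shipment of part 1
```

Adding id as a second ORDER BY key is a design choice for the sketch: it makes the result deterministic when two shipments of the same part share a date.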

Related

Unable to get dedupe records with rank

I am trying to dedupe my dataset using rank, but it is not assigning a different number to the second record. What am I doing wrong here?
with get_rank as (
select id, code, rank() over (partition by id order by z.rowid) as ranking
from mytable z
)
select *
from get_rank
where ranking = 1
and id = 72755
ID CODE RANKING
---------- ---- ----------
72755 M 1
72755 M 1
Use row_number():
with get_rank as (
select id, code,
row_number() over (partition by id order by z.rowid) as ranking
from mytable z
)
select *
from get_rank
where ranking = 1 and id = 72755;
It is guaranteed to return a different value for each row.
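To see the difference between the two functions, here is a small sketch using Python's sqlite3 module (SQLite 3.25 or later) rather than the asker's database. It assumes the two rows tie on the ordering expression; the loaded column is hypothetical and exists only to create that tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Two rows for the same id that tie on the (hypothetical) ordering column.
conn.execute("CREATE TABLE mytable (id INTEGER, code TEXT, loaded TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(72755, "M", "2021-01-01"), (72755, "M", "2021-01-01")],
)

rows = conn.execute(
    """
    SELECT rank()       OVER (PARTITION BY id ORDER BY loaded) AS rnk,
           row_number() OVER (PARTITION BY id ORDER BY loaded) AS rn
    FROM mytable
    ORDER BY rn
    """
).fetchall()

# rank() repeats the value for tied rows; row_number() never repeats within a partition.
print(rows)
```

Which of the tied rows receives row number 1 is arbitrary unless the ORDER BY is extended with a tie-breaking column.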

Finding the highest COUNT of a group per individual GROUP BY query in Hive

I have a table of customer transactions where an individual_id appears once for every different transaction.
There is a category column called Name_desc. I would like to group by individual and find the most common Name_desc category per individual.
Suppose data is like below
Id Name_desc
---- ------
1 a
2 c
1 b
2 c
1 b
I want below output
Id Name_desc (most frequently occurring category)
------ ------
1 b
2 c
I tried the query below and got this error:
Error while compiling statement: FAILED: ParseException line 4:19 cannot recognize input near 'select' 'max' '(' in expression specification
select name_desc, count(*) as count_e
from db.cust_scan
group by id, name_desc
having count(*)= ( select max(count_e),id
from
(
select id, name_desc, count(*) as count_e
from
db.cust_scan
where
base_div_nbr =1
and
country_code ='US'
and
retail_channel_code=1
and visit_date between '2019-01-01' and '2019-12-31'
GROUP by
individual_id, tt_id_desc
order by individual_id, count_e desc
) as t
group by individual_id )
I would appreciate any suggestions or help with this query. If there is a more efficient way of getting this done, let me know.
The following script was written and tested on MSSQL, but since Hive also supports row_number() and subqueries, it should give you the required output:
SELECT A.Id, A.Name_desc
FROM
(
SELECT Id,Name_desc,
row_number() over (partition by id order by COUNT(*) desc) AS RN
FROM your_table
GROUP BY Id,Name_desc
) A
WHERE RN = 1
In Hive, compute the counts in a subquery first:
SELECT s.Id, s.Name_desc
FROM
(
select s.*, row_number() over (partition by s.id order by s.cnt desc) rn
from
(
SELECT Id, Name_desc, COUNT(*) cnt
FROM your_table
GROUP BY Id, Name_desc
) s
) s
WHERE rn= 1;
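The nested-subquery version can be tried on the sample data from the question. This is only a sketch: it runs on SQLite through Python's sqlite3 module (3.25 or later), not on Hive, and the table name cust_scan with just two columns is a simplification assumed for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical version of the transactions table.
conn.execute("CREATE TABLE cust_scan (id INTEGER, name_desc TEXT)")
conn.executemany(
    "INSERT INTO cust_scan VALUES (?, ?)",
    [(1, "a"), (2, "c"), (1, "b"), (2, "c"), (1, "b")],
)

most_common = conn.execute(
    """
    SELECT s.id, s.name_desc
    FROM (
        -- Rank each (id, name_desc) pair by how often it occurs for that id.
        SELECT c.*,
               row_number() OVER (PARTITION BY c.id ORDER BY c.cnt DESC) AS rn
        FROM (
            SELECT id, name_desc, COUNT(*) AS cnt
            FROM cust_scan
            GROUP BY id, name_desc
        ) c
    ) s
    WHERE s.rn = 1
    ORDER BY s.id
    """
).fetchall()

print(most_common)
```

If two categories are equally common for an individual, row_number() keeps one of them arbitrarily; rank() would return both.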

SQL MIN(value) matching row in PostgreSQL

I have the following table:
TABLE A:
ID ID NAME PRICE CODE
00001 B 1000 1
00002 A 2000 1
00003 C 3000 1
Here is the SQL I use:
Select Min (ID),
Min (ID NAME),
Sum(PRICE)
From A
GROUP BY CODE
Here is what I get:
ID ID NAME PRICE
00001 A 6000
As you can see, ID NAME doesn't come from the same row as the minimum ID. I need them to match up.
I would like the query to return the following
ID ID NAME PRICE
00001 B 6000
What SQL can I use to get that result?
If you want one row, use limit or fetch first 1 row only:
select a.*
from a
order by a.price asc
fetch first 1 row only;
If, for some reason, you want the sum() of all prices, then you can use window functions:
select a.*, sum(a.price) over () as sum_prices
from a
order by a.price asc
fetch first 1 row only;
You can use row_number() function :
select min(id), max(case when seq = 1 then id_name end) as id_name, sum(price) as price, code
from (select t.*, row_number() over (partition by code order by id) seq
from a t
) t
group by code;
You can also use a subquery:
select t1.*,t2.* from
(select ID,Name from t where ID= (select min(ID) from t)
) as t1
cross join (select sum(Price) as total from t) as t2
https://dbfiddle.uk/?rdbms=postgres_10&fiddle=a496232b552390a641c0e5c0fae791d1
id name total
1 B 6000
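The row_number() variant can be checked against the sample table as well. The sketch below runs on SQLite through Python's sqlite3 module (3.25 or later) instead of PostgreSQL, and it assumes the columns are named id, id_name, price and code, with id stored as text to keep the leading zeros.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed column names for the question's TABLE A.
conn.execute("CREATE TABLE a (id TEXT, id_name TEXT, price INTEGER, code INTEGER)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?, ?, ?)",
    [("00001", "B", 1000, 1), ("00002", "A", 2000, 1), ("00003", "C", 3000, 1)],
)

result = conn.execute(
    """
    SELECT min(id) AS id,
           -- Only the row numbered 1 (lowest id per code) contributes its name.
           max(CASE WHEN seq = 1 THEN id_name END) AS id_name,
           sum(price) AS price,
           code
    FROM (
        SELECT t.*, row_number() OVER (PARTITION BY code ORDER BY id) AS seq
        FROM a t
    ) numbered
    GROUP BY code
    """
).fetchall()

print(result)
```

The CASE expression yields NULL for every row except the first in each group, and max() ignores NULLs, so the name always comes from the row with the lowest id.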

Find minimum value in groups of rows

In the SQL space (specifically T-SQL, SQL Server 2008), given this list of values:
Status Date
------ -----------------------
ACT 2012-01-07 11:51:06.060
ACT 2012-01-07 11:51:07.920
ACT 2012-01-08 04:13:29.140
NOS 2012-01-09 04:29:16.873
ACT 2012-01-21 12:39:37.607 <-- THIS
ACT 2012-01-21 12:40:03.840
ACT 2012-05-02 16:27:17.370
GRAD 2012-05-19 13:30:02.503
GRAD 2013-09-03 22:58:48.750
Generated from this query:
SELECT Status, Date
FROM Account_History
WHERE AccountNumber = '1234'
ORDER BY Date
The status for this particular object started at ACT, then changed to NOS, then back to ACT, then to GRAD.
What is the best way to get the minimum date from the latest "group" of records where Status = 'ACT'?
Here is a query that does this, by identifying the groups where the student statuses are the same and then using simple aggregation:
select top 1 StudentStatus, min(WhenLastChanged) as WhenLastChanged
from (SELECT StudentStatus, WhenLastChanged,
(row_number() over (order by WhenLastChanged) -
row_number() over (partition by StudentStatus order by WhenLastChanged)
) as grp
FROM Account_History
WHERE AccountNumber = '1234'
) t
where StudentStatus = 'ACT'
group by StudentStatus, grp
order by WhenLastChanged desc;
The row_number() function assigns sequential numbers within groups of rows based on the date. For your data, the two row_number() values and their difference are:
Status Date rn_all rn_status grp
------ ----------------------- ------ --------- ---
ACT 2012-01-07 11:51:06.060 1 1 0
ACT 2012-01-07 11:51:07.920 2 2 0
ACT 2012-01-08 04:13:29.140 3 3 0
NOS 2012-01-09 04:29:16.873 4 1 3
ACT 2012-01-21 12:39:37.607 5 4 1
ACT 2012-01-21 12:40:03.840 6 5 1
ACT 2012-05-02 16:27:17.370 7 6 1
GRAD 2012-05-19 13:30:02.503 8 1 7
GRAD 2013-09-03 22:58:48.750 9 2 7
Notice that the last column is constant for consecutive rows that have the same status.
The aggregation brings these together and chooses the latest (top 1 . . . order by date desc) of the first dates (min(date)).
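The difference-of-row-numbers idea can be reproduced on the sample data. This is a minimal sketch on SQLite via Python's sqlite3 module (3.25 or later), so LIMIT 1 stands in for T-SQL's top 1. The table is reduced to two columns, and the names status and changed_at are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account_history (status TEXT, changed_at TEXT)")
conn.executemany(
    "INSERT INTO account_history VALUES (?, ?)",
    [
        ("ACT", "2012-01-07 11:51:06.060"),
        ("ACT", "2012-01-07 11:51:07.920"),
        ("ACT", "2012-01-08 04:13:29.140"),
        ("NOS", "2012-01-09 04:29:16.873"),
        ("ACT", "2012-01-21 12:39:37.607"),
        ("ACT", "2012-01-21 12:40:03.840"),
        ("ACT", "2012-05-02 16:27:17.370"),
        ("GRAD", "2012-05-19 13:30:02.503"),
        ("GRAD", "2013-09-03 22:58:48.750"),
    ],
)

# Overall row number minus per-status row number: constant within each run.
grp_sql = """
    SELECT status, changed_at,
           row_number() OVER (ORDER BY changed_at)
           - row_number() OVER (PARTITION BY status ORDER BY changed_at) AS grp
    FROM account_history
"""

grps = [r[2] for r in conn.execute(grp_sql + " ORDER BY changed_at")]

# Earliest date of each ACT run, then the latest of those.
latest = conn.execute(
    f"""
    SELECT status, min(changed_at) AS first_changed
    FROM ({grp_sql}) t
    WHERE status = 'ACT'
    GROUP BY status, grp
    ORDER BY first_changed DESC
    LIMIT 1
    """
).fetchone()

print(grps)
print(latest)
```

The printed grp values match the last column of the table above, and the final row is the one marked THIS in the question.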
EDIT:
The query is easy to tweak for multiple account numbers. I probably should have written it that way to begin with, except the final selection is trickier. The result of this query has the date for each status and account:
select AccountNumber, StudentStatus, min(WhenLastChanged) as WhenLastChanged
from (SELECT StudentStatus, WhenLastChanged, AccountNumber,
(row_number() over (partition by AccountNumber order by WhenLastChanged) -
row_number() over (partition by AccountNumber, studentstatus order by WhenLastChanged)
) as grp
FROM Account_History
) t
where StudentStatus = 'ACT'
group by AccountNumber, StudentStatus, grp
order by WhenLastChanged desc;
But you can't get the last one per account quite so easily. Another level of subqueries:
select AccountNumber, StudentStatus, WhenLastChanged
from (select AccountNumber, StudentStatus, min(WhenLastChanged) as WhenLastChanged,
row_number() over (partition by AccountNumber, StudentStatus order by min(WhenLastChanged) desc
) as seqnum
from (SELECT AccountNumber, StudentStatus, WhenLastChanged,
(row_number() over (partition by AccountNumber order by WhenLastChanged) -
row_number() over (partition by AccountNumber, studentstatus order by WhenLastChanged)
) as grp
FROM Account_History
) t
where StudentStatus = 'ACT'
group by AccountNumber, StudentStatus, grp
) t
where seqnum = 1;
This uses aggregation along with the window function row_number(). This is assigning sequential numbers to the groups (after aggregation), with the last date for each account getting a value of 1 (order by min(WhenLastChanged) desc). The outermost select then just chooses that row for each account.
SELECT [Status], MIN([Date])
FROM Table_Name
WHERE [Status] = (SELECT [Status]
FROM Table_Name
WHERE [Date] = (SELECT MAX([Date])
FROM Table_Name)
)
GROUP BY [Status]
Hogan: basically, yes. I just want to know the date/time when the
account was last changed to ACT. The records after the point above
marked THIS are just extra.
Instead of just looking for ACT, we can find every row where the status changes and then pick the latest change to ACT.
So, every time the status changes:
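with rownumb as
(
select *, row_number() OVER (order by date asc) as rn
from Account_History
where AccountNumber = '1234'
)
-- B is the row right after A; keep B when its status differs from A's
select B.status, B.date
from rownumb A
join rownumb B on A.rn = B.rn-1
where A.status != B.status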
Now find the max of the ACT items:
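with rownumb as
(
select *, row_number() OVER (order by date asc) as rn
from Account_History
where AccountNumber = '1234'
), statuschange as
(
-- B is the row right after A; keep B when its status differs from A's
select B.status, B.date
from rownumb A
join rownumb B on A.rn = B.rn-1
where A.status != B.status
)
select max(date)
from statuschange
where status = 'ACT'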
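The status-change approach can be sketched the same way. The example below uses Python's sqlite3 module (SQLite 3.25 or later) in place of SQL Server, with a two-column table whose names status and changed_at are assumed for the illustration. Note that the self-join never reports the very first row, since it has no predecessor to compare against.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account_history (status TEXT, changed_at TEXT)")
conn.executemany(
    "INSERT INTO account_history VALUES (?, ?)",
    [
        ("ACT", "2012-01-07 11:51:06.060"),
        ("ACT", "2012-01-07 11:51:07.920"),
        ("ACT", "2012-01-08 04:13:29.140"),
        ("NOS", "2012-01-09 04:29:16.873"),
        ("ACT", "2012-01-21 12:39:37.607"),
        ("ACT", "2012-01-21 12:40:03.840"),
        ("ACT", "2012-05-02 16:27:17.370"),
        ("GRAD", "2012-05-19 13:30:02.503"),
        ("GRAD", "2013-09-03 22:58:48.750"),
    ],
)

last_act_change = conn.execute(
    """
    WITH rownumb AS (
        SELECT status, changed_at,
               row_number() OVER (ORDER BY changed_at) AS rn
        FROM account_history
    ),
    statuschange AS (
        -- b directly follows a; keep b when the status differs from a's.
        SELECT b.status, b.changed_at
        FROM rownumb a
        JOIN rownumb b ON a.rn = b.rn - 1
        WHERE a.status <> b.status
    )
    SELECT max(changed_at)
    FROM statuschange
    WHERE status = 'ACT'
    """
).fetchone()[0]

print(last_act_change)
```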

SQL display only MAX of my COUNT() column

Have a query that shows this...
salesPersonId Total
------------- -----------
AB4 3
GT10 2
JB9 1
JS1 2
KT8 4
TC3 4
VG7 2
WC2 7
(8 row(s) affected)
My query is...
SELECT so.salesPersonId, COUNT(so.orderId) AS 'Total'
FROM salesOrder AS so
GROUP BY so.salesPersonId
GO
I wanted to do this...
SELECT so.salesPersonId, COUNT(so.orderId) AS 'Total'
FROM salesOrder AS so
WHERE MAX(COUNT(so.orderId))
GROUP BY so.salesPersonId
GO
This gives me an error, any ideas on how to show only the salesPersonId with the highest total? Here being WC2.
You can use a common table expression (or a subquery) to get the breakdown and then select all entries in your CTE where their total is equal to max total (as there may be more than one):
;WITH TotalOrders
AS
(
SELECT so.salesPersonId, COUNT(so.orderId) AS 'Total'
FROM salesOrder AS so
GROUP BY so.salesPersonId
)
SELECT *
FROM TotalOrders [TO]
WHERE [TO].Total = (SELECT MAX([TO].Total) FROM TotalOrders [TO])
Alternatively, rank the totals with DENSE_RANK() and keep the top rank, which also returns ties:
WITH totalCount
AS
(
SELECT so.salesPersonId, COUNT(so.orderId) AS 'Total'
FROM salesOrder AS so
GROUP BY so.salesPersonId
),
maxCount AS
(
SELECT salesPersonId, Total,
DENSE_RANK() OVER (ORDER BY Total DESC) rn
FROM totalCount
)
SELECT salesPersonId, Total
FROM maxCount
WHERE rn = 1
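Here is a small sketch of the DENSE_RANK() version, run on SQLite through Python's sqlite3 module (3.25 or later) rather than SQL Server. The order rows are generated to reproduce the totals shown in the question; the orderId values themselves are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salesOrder (orderId INTEGER, salesPersonId TEXT)")

# Totals from the question; order ids are invented to produce these counts.
totals = {"AB4": 3, "GT10": 2, "JB9": 1, "JS1": 2, "KT8": 4, "TC3": 4, "VG7": 2, "WC2": 7}
orders = []
for person, n in totals.items():
    for _ in range(n):
        orders.append((len(orders) + 1, person))
conn.executemany("INSERT INTO salesOrder VALUES (?, ?)", orders)

top = conn.execute(
    """
    WITH totalCount AS (
        SELECT salesPersonId, COUNT(orderId) AS Total
        FROM salesOrder
        GROUP BY salesPersonId
    ),
    maxCount AS (
        -- Rank 1 goes to the highest total; tied totals share the same rank.
        SELECT salesPersonId, Total,
               DENSE_RANK() OVER (ORDER BY Total DESC) AS rn
        FROM totalCount
    )
    SELECT salesPersonId, Total
    FROM maxCount
    WHERE rn = 1
    """
).fetchall()

print(top)
```

If WC2 were removed, the same query would return both KT8 and TC3, because they tie at 4.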