T-SQL group by 2 tables - sql

My tables looks like this:
For each TimeOfDay I would like to get the most frequent Category. For example if there are 3 auctions with unique ClosedTime but each of this Time has TimeOfDay=1 and 2 of these auctions have CategoryId=1 and one auction CategoryId=2 I would like to get:
TimeOfDay | CategoryId
1 | 1
I have tried group by TimeOfDay and CategoryId but still I don't know how to get top category for each TimeOfDay group. I have this:
select t.TimeOfDay, a.CategoryId, count(a.CategoryId)
numberOfSalesInCategory
from Auction a
join Time t on t.Id = a.ClosedTime
where IsSuccess = 1
group by t.TimeOfDay, a.CategoryId
and result for some sample data:
TimeOfDay | CategoryId | numberOfSalesInCategory
0 1 1
1 1 1
1 2 3
2 2 1
0 3 1
3 3 1
3 4 2
So for these data I would like to get:
TimeOfDay | CategoryId
0 | 1 or 3 numberOfSalesInCategory for both is 1
1 | 2 numberOfSalesInCategory is 3
2 | 2 only one category
3 | 4 numberOfSalesInCategory is 2

Technically, you are looking for the mode. There can be multiple modes, if multiple values all have the same frequency. If you are happy to arbitrarily choose one, then a conditional aggregation with row_number() is the solution:
select TimeOfDay,
max(case when seqnum = 1 then CategoryId end) as ModeCategory
from (select t.TimeOfDay, a.CategoryId, count(*) as numberOfSalesInCategory,
row_number() over (partition by t.TimeOfDay order by count(*) ) as seqnum
from Auction a join
Time t
on t.id = a.ClosedTime
where a.isSuccess = 1
group by t.TimeOfDay, a.CategoryId
) ta
group by TimeOfDay;

You could put the current statement in a CTE, rank them with RANK() and then do a stuff statement.
e.g.
; WITH T AS (SELECT t.TimeOfDay, a.CategoryId, COUNT(a.CategoryId)
numberOfSalesInCategory
FROM Auction a
JOIN Time t ON t.Id = a.ClosedTime
WHERE IsSuccess = 1
GROUP BY t.TimeOfDay, a.CategoryId)
, S AS (SELECT T.*
, RANK() OVER (PARTITION BY TimeOfDay ORDER BY numberOfSalesInCategory DESC) RankOrder
FROM T)
SELECT DISTINCT TimeOfDay
, STUFF(((SELECT ' or ' + CONVERT(NVARCHAR, CategoryId)
FROM S
WHERE RankOrder = 1
AND TimeOfDay = BloobleBlah.TimeOfDay
FOR XML PATH('')), 1, 4, '') CategoryId
FROM S BloobleBlah

Related

Selecting top most row in Bigquery based on conditions

I have a huge table, where sometimes 1 product ID has multiple specifications. I want to select the newest but unfortunately, I don't have the date information. please consider this example dataset
Row ID Type Sn Sn_Ind
1 3 SLN SL20 20
2 1 SL SL 0
3 2 SL SL 0
4 1 M SL21 10
5 3 M SL21 10
6 1 SLN SL20 20
I used the below query to somehow group the products in give them row numbers like
with cleanedMasterData as(
SELECT *
FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Sn DESC, Sn_Ind DESC) AS rn
FROM `project.dataset.table`
)
-- where rn = 1
)
select * from cleanedMasterData
Please find below the example table after cleaning
Row ID Type Sn Sn_Ind rn
1 1 SL SL 0 1
2 1 M SL21 10 2
3 1 SLN SL20 20 3
4 2 SL SL 0 1
5 3 M SL21 10 1
6 3 SLN SL20 20 2
but if you see for ID 2 and 3, I can easily select the top row with where rn = 1
but for ID 1, my preferred row would be 2 because that is the newest.
My question here is how do I prioritise a value in column so that I can get the desired solution like :
Row ID Type Sn Sn_Ind rn
1 1 M SL21 10 1
2 2 SL SL 0 1
3 3 M SL21 10 1
As the values are fixed in Sn column - for ex SL, SL20, SL19, SL21 etc - If somehow I can give weightage to these values and create a new temp column with weightage and sort based on it, then?
Thank you for your support in advance!!
Consider below
SELECT *
FROM `project.dataset.table`
WHERE TRUE
QUALIFY ROW_NUMBER() OVER(PARTITION BY ID ORDER BY IF(Sn = 'SL', 0, 1) DESC, Sn DESC) = 1
If applied to sample data in your question - output is
It wasn't difficult, I tried a few things and it worked out. If anyone can optimize the below solution even more that would be awesome.
first the dataset
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 ID, 'SLN' Type, 'SL20' Sn, 20 Sn_Ind UNION ALL
SELECT 1 , 'SL' , 'SL' , 0 UNION ALL
SELECT 2 , 'SL' , 'SL' , 0 UNION ALL
SELECT 1 , 'M' , 'SL21' , 10 UNION ALL
SELECT 3 , 'M' , 'SL21' , 10 UNION ALL
SELECT 1 , 'SLN' , 'SL20' , 20
)
with weightage as(
SELECT
*,
MAX(CASE Sn WHEN 'SL' THEN 0 ELSE 1 END) OVER (PARTITION BY Sn) AS weightt,
FROM
`project.dataset.table`
ORDER BY
weightt DESC, Sn DESC
), main as (
select * EXCEPT(rn, weightt)
from (
select * ,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY weightt DESC, Sn DESC) AS rn
from weightage )
where rn = 1
)
select * from main
after this, I can get the desired result
Row ID Type Sn Sn_Ind
1 1 M SL21 10
2 2 SL SL 0
3 3 M SL21 10

Select non existing Numbers from Table each ID

I‘m new in learning TSQL and I‘m struggling getting the numbers that doesn‘t exist in my table each ID.
Example:
CustomerID Group
1 1
3 1
6 1
4 2
7 2
I wanna get the ID which does not exist and select them like this
CustomerID Group
2 1
4 1
5 1
5 2
6 2
....
..
The solution by usin a cte doesn‘t work well or inserting first the data and do a not exist where clause.
Any Ideas?
If you can live with ranges rather than a list with each one, then an efficient method uses lead():
select group_id, (customer_id + 1) as first_missing_customer_id,
(next_ci - 1) as last_missing_customer_id
from (select t.*,
lead(customer_id) over (partition by group_id order by customer_id) as next_ci
from t
) t
where next_ci <> customer_id + 1
Cross join 2 recursive CTEs to get all the possible combinations of [CustomerID] and [Group] and then LEFT join to the table:
declare #c int = (select max([CustomerID]) from tablename);
declare #g int = (select max([Group]) from tablename);
with
customers as (
select 1 as cust
union all
select cust + 1
from customers where cust < #c
),
groups as (
select 1 as gr
union all
select gr + 1
from groups where gr < #g
),
cte as (
select *
from customers cross join groups
)
select c.cust as [CustomerID], c.gr as [Group]
from cte c left join tablename t
on t.[CustomerID] = c.cust and t.[Group] = c.gr
where t.[CustomerID] is null
and c.cust > (select min([CustomerID]) from tablename where [Group] = c.gr)
and c.cust < (select max([CustomerID]) from tablename where [Group] = c.gr)
See the demo.
Results:
> CustomerID | Group
> ---------: | ----:
> 2 | 1
> 4 | 1
> 5 | 1
> 5 | 2
> 6 | 2

How to Generate Row number Partition by two column match in sql

Tbl1
---------------------------------------------------------
Id Date Qty ReOrder
---------------------------------------------------------
1 1-1-18 1 3
2 2-1-18 0 3
3 3-1-18 2 3
4 4-1-18 3< >3
5 5-1-18 2 3
6 6-1-18 0 3
7 7-1-18 1 3
8 8-1-18 0 3
---------------------------------------------------------
I want the result like below
---------------------------------------------------------
Id Date Qty ReOrder
---------------------------------------------------------
1 1-1-18 1 3
5 5-1-18 2 3
---------------------------------------------------------
if ReOrder not same with Qty then date will be same upto after reorder=Qty
You can use cumulative approach with row_number() function :
select top (1) with ties *
from (select *, max(case when qty = reorder then 'v' end) over (order by id desc) grp
from table
) t
order by row_number() over(partition by grp order by id);
Unfortunately this will require SQL Server, But you can also do:
select *
from (select *, row_number() over(partition by grp order by id) seq
from (select *, max(case when qty = reorder then 'v' end) over (order by id desc) grp
from table
) t
) t
where seq = 1;

SQL update all records except the last one with a value

I need to make a query where only the last line of each user that has a car gets a license plate number.
ID UserId LicensePlate HasCar
1 1 ABC123 1
2 1 ABC123 1
3 2 NULL 0
4 3 UVW789 1
5 3 UVW789 1
Should become:
ID UserId LicensePlate HasCar
1 1 NULL 1
2 1 ABC123 1
3 2 NULL 0
4 3 NULL 1
5 3 UVW789 1
So I basically need to find all users with a licenseplate and change all but the last one and make the LicensePlate NULL
Assuming the ID column is an identity column so it can provide the ordering, something like this should do the trick:
;WITH CTE AS
(
SELECT Id,
UserId,
LicensePlate,
ROW_NUMBER() OVER(PARTITION BY UserId ORDER BY Id DESC) rn
FROM Table
WHERE HasCar = 1
)
UPDATE CTE
SET LicensePlate = NULL
WHERE rn > 1
You can try this
UPDATE l
SET l.LicensePlate = null
FROM Car l
INNER JOIN (SELECT UserId, Max(Id) AS max_id
FROM Car
GROUP BY UserId) m ON m.UserId = l.UserId
AND m.max_id <> l.id
You can do it with a join on the table itself like that :
UPDATE car c
INNER JOIN car c2 ON c.userId = c2.userId AND c.id < c2.id AND c.HasCar = 1 AND c2.HasCar = 1
SET c.LicensePlate = NULL
The condition c.id < c2.id will avoid to select the last line
By using LAG Function also you can achieve it.
;WITH License(ID,UserId,LicensePlate,HasCar)
as
(
SELECT 1,1,'ABC123',1 UNION ALL
SELECT 2,1,'ABC123',1 UNION ALL
SELECT 3,2,NULL ,0 UNION ALL
SELECT 4,3,'UVW789',1 UNION ALL
SELECT 5,3,'UVW789',1
)
SELECT ID,UserId,LAG(LicensePlate,1,NULL) OVER(PARTITION BY UserId ORDER BY LicensePlate),HasCar FROM License

Any other alternative to write this SQL query

I need to select data base upon three conditions
Find the latest date (StorageDate Column) from the table for each record
See if there is more then one entry for date (StorageDate Column) found in first step for same ID (ID Column)
and then see if DuplicateID is = 2
So if table has following data:
ID |StorageDate | DuplicateTypeID
1 |2014-10-22 | 1
1 |2014-10-22 | 2
1 |2014-10-18 | 1
2 |2014-10-12 | 1
3 |2014-10-11 | 1
4 |2014-09-02 | 1
4 |2014-09-02 | 2
Then I should get following results
ID
1
4
I have written following query but it is really slow, I was wondering if anyone has better way to write it.
SELECT DISTINCT(TD.RecordID)
FROM dbo.MyTable TD
JOIN (
SELECT T1.RecordID, T2.MaxDate,COUNT(*) AS RecordCount
FROM MyTable T1 WITH (nolock)
JOIN (
SELECT RecordID, MAX(StorageDate) AS MaxDate
FROM MyTable WITH (nolock)
GROUP BY RecordID)T2
ON T1.RecordID = T2.RecordID AND T1.StorageDate = T2.MaxDate
GROUP BY T1.RecordID, T2.MaxDate
HAVING COUNT(*) > 1
)PT ON TD.RecordID = PT.RecordID AND TD.StorageDate = PT.MaxDate
WHERE TD.DuplicateTypeID = 2
Try this and see how the performance goes:
;WITH
tmp AS
(
SELECT *,
RANK() OVER (PARTITION BY ID ORDER BY StorageDate DESC) AS StorageDateRank,
COUNT(ID) OVER (PARTITION BY ID, StorageDate) AS StorageDateCount
FROM MyTable
)
SELECT DISTINCT ID
FROM tmp
WHERE StorageDateRank = 1 -- latest date for each ID
AND StorageDateCount > 1 -- more than 1 entry for date
AND DuplicateTypeID = 2 -- DuplicateTypeID = 2
You can use analytic function rank , can you try this query ?
Select recordId from
(
select *, rank() over ( partition by recordId order by [StorageDate] desc) as rn
from mytable
) T
where rn =1
group by recordId
having count(*) >1
and sum( case when duplicatetypeid =2 then 1 else 0 end) >=1