Distinct columns having the max date value - sql

Let's say I have a table:
LpOpenTradeId LPSource SymbolId Volume CreatedUser CreatedDate
1 2 1 10.00 2 2015-12-11 00:00:00.000
2 2 4 12.00 2 2015-12-11 00:00:00.000
3 2 1 10.00 2 2015-12-11 10:53:00.000
4 2 3 1.00 2 2015-12-11 18:03:14.676
5 2 5 1.00 2 2015-12-14 09:38:33.691
6 2 3 2.00 2 2015-12-14 09:39:30.305
7 2 4 13.00 2 2015-12-14 09:43:13.916
8 3 1 15.00 2 2015-12-11 10:53:00.000
For each distinct (LPSource, SymbolId) pair, I want the Volume from the row with the latest CreatedDate. The target result set is:
LPSource SymbolId Volume CreatedDate
2 1 10.00 2015-12-11 10:53:00.000
2 4 13.00 2015-12-14 09:43:13.916
2 3 2.00 2015-12-14 09:39:30.305
2 5 1.00 2015-12-14 09:38:33.691
3 1 15.00 2015-12-11 10:53:00.000
How can I write a T-SQL query that returns this result set?
Thanks,

You can use ROW_NUMBER:
SELECT LPSource, SymbolId, Volume, CreatedDate
FROM (
SELECT LPSource, SymbolId, Volume, CreatedDate,
ROW_NUMBER() OVER (PARTITION BY LPSource, SymbolId
ORDER BY CreatedDate DESC) AS rn
FROM mytable) AS t
WHERE t.rn = 1
In case of CreatedDate ties, i.e. more than one record sharing the same maximum CreatedDate value within the same LPSource, SymbolId partition, the above query will pick one of the tied records arbitrarily (the choice is not deterministic). You can use RANK to select all records in such a case.
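As a minimal sketch of the RANK variant mentioned above (assuming the same placeholder table name mytable; the alias rnk is just illustrative), every row that ties for the latest CreatedDate in its partition gets rank 1 and is returned:

```sql
-- RANK() assigns the same rank to tied rows, so all rows sharing the
-- maximum CreatedDate within an (LPSource, SymbolId) partition are kept.
SELECT LPSource, SymbolId, Volume, CreatedDate
FROM (
    SELECT LPSource, SymbolId, Volume, CreatedDate,
           RANK() OVER (PARTITION BY LPSource, SymbolId
                        ORDER BY CreatedDate DESC) AS rnk
    FROM mytable
) AS t
WHERE t.rnk = 1;
```

With the sample data in the question there are no ties on the maximum date, so this should return the same five rows as the ROW_NUMBER query.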

IF OBJECT_ID('Tabel1','U') IS NOT NULL
BEGIN
DROP TABLE Tabel1
END
CREATE TABLE Tabel1 (LpOpenTradeId INT
,LPSource INT
,SymbolId INT
,Volume DECIMAL(10,2)
,CreatedUser INT
,CreatedDate DATETIME2
)
INSERT INTO Tabel1 VALUES (1,2,1,10,2,'2015-12-11 00:00:00.000');
INSERT INTO Tabel1 VALUES (2,2,4,12,2,'2015-12-11 00:00:00.000');
INSERT INTO Tabel1 VALUES (3,2,1,10,2,'2015-12-11 10:53:00.000');
INSERT INTO Tabel1 VALUES (4,2,3,1,2,'2015-12-11 18:03:14.676');
INSERT INTO Tabel1 VALUES (5,2,5,1,2,'2015-12-14 09:38:33.691');
INSERT INTO Tabel1 VALUES (6,2,3,2,2,'2015-12-14 09:39:30.305');
INSERT INTO Tabel1 VALUES (7,2,4,13,2,'2015-12-14 09:43:13.916');
INSERT INTO Tabel1 VALUES (8,3,1,15,2,'2015-12-11 10:53:00.000');
SELECT DISTINCT t1.LPSource
,t1.SymbolId
,t1.Volume
,t1.CreatedDate
FROM Tabel1 t1
JOIN (
SELECT LPSource
,SymbolId
,MAX(CreatedDate) AS CreatedDate
FROM Tabel1
GROUP BY LPSource
,SymbolId
) t2 on t2.LPSource = t1.LPSource AND t2.SymbolId = t1.SymbolId AND t2.CreatedDate = t1.CreatedDate
The JOIN part :
SELECT LPSource
,SymbolId
,MAX(CreatedDate) AS CreatedDate
FROM Tabel1
GROUP BY LPSource
,SymbolId
gets the latest CreatedDate for each LPSource and SymbolId combination. You then JOIN your initial table to this derived table (t2) on all three columns, which gives you the result needed: t1.LPSource, t1.SymbolId, t1.Volume, t1.CreatedDate.

Use NOT EXISTS to return a row if no other row with same LPSource/SymbolId has (1) a later CreatedDate, or (2) same CreatedDate but a higher Volume.
select distinct LPSource, SymbolId, Volume, CreatedDate
from tablename t1
where not exists (select 1 from tablename t2
where t2.LPSource = t1.LPSource
and t2.SymbolId = t1.SymbolId
and (t2.CreatedDate > t1.CreatedDate
or (t2.CreatedDate = t1.CreatedDate and
t2.Volume > t1.Volume)))

Related

Table containing duplicates needs a mass update

I've got a table containing data like this
transactionCode column2 column3 column4 wCode aCode column7 column8 column9 column10 liNumber
7938636 2 INVOICE NULL 1 MZ690577 2021-01-28 NULL 2021-01-28 1 6
7938636 2 INVOICE NULL 1 MD191807 2021-01-28 NULL 2021-01-28 1 4
7938631 2 INVOICE NULL 1 MZ320771 2021-01-28 NULL 2021-01-28 1 1
7938631 2 INVOICE NULL 1 7803A112 2021-01-28 NULL 2021-01-28 4 2
7938576 2 INVOICE NULL 1 8201A216 2021-01-29 NULL 2021-01-29 1 1
7938598 2 INVOICE NULL 1 SP046271 2021-01-29 NULL 2021-01-29 1 14
I've also got a script like this which finds the duplicates for me
WITH cte
AS (SELECT transactionid,
aCode,
liNumber,
wCode,
RN = Row_number()
OVER(
partition BY
transactionid,
aCode,
liNumber,
wCode
ORDER BY
transactionid)
FROM duplicates)
SELECT * FROM cte
WHERE RN > 1;
When running that script the data shown is in a format like this..
transactionID aCode liNumber wCode RN
1012751 DISCOUNT 9 1 2
I can then search for that aCode or transactionID in the duplicates table to see how many there are. So far in my duplicates table, that script returns a total of 34,791 rows. Note that rows that share the same liNumber need to be changed.
My question is: how do I go about doing this with such a large amount of data?
For example,
Transaction 7938636 might have 5 rows, all with the same wCode and the same aCode, but the liNumber goes up in increments like 1, 2, 3, 4, etc. When two rows have the same liNumber, say 1, the second one is classed as a duplicate. I then need to update that duplicate row to continue the increments, from 6, 7, 8, etc.
Does this make sense?
Since the liNumber values do not form an unbroken sequence (1, 2, 3, 4, and so on), you can work around this by updating each duplicate to the maximum liNumber plus the corresponding row number, as below.
declare @tbl table(id int, wCode int, aCode varchar(50), liNumber int)
insert into @tbl
values(7938636,1,'MZ690577',1)
,(7938636,1,'MZ690577',1)
,(7938636,1,'MZ690577',2)
,(7938636,1,'MZ690577',3)
,(7938636,1,'MZ690577',8)
,(7938636,1,'MZ690577',9)
,(7938636,1,'MZ690577',9)
-- highest liNumber currently in use
declare @maxvalue int = (select max(liNumber) from @tbl)
;with cte as
(
-- partitionedrn > 1 marks a repeated liNumber;
-- maxx is a value above the current maximum, unique per row
select *,ROW_NUMBER()over(partition by liNumber order by id,liNumber) partitionedrn
,@maxvalue + ROW_NUMBER()over(order by id,liNumber) maxx
from @tbl
)
update cte set liNumber = maxx
where partitionedrn > 1
select * from @tbl
Note: This uses sample data only; I did not take your whole table into account.

Agg Functions while Partitioning Data in SQL

I have a table that looks like this:
store_id industry_id cust_id amount gender
1 100 1000 1.00 M
2 100 1000 2.05 M
3 100 1000 3.15 M
4 100 1000 4.00 M
5 100 2000 5.00 F
6 200 2000 5.20 F
7 200 5000 6.05 F
8 200 6000 7.10 F
Here's the code to create this table:
CREATE TABLE t1(
store_id int,
industry_id int,
cust_id int,
amount float,
gender char
);
INSERT INTO t1 VALUES(1,100,1000,1.00, 'M');
INSERT INTO t1 VALUES(2,100,1000,2.05, 'M');
INSERT INTO t1 VALUES(3,100,1000,3.15, 'M');
INSERT INTO t1 VALUES(4,100,1000,4.00, 'M');
INSERT INTO t1 VALUES(5,100,2000,5.00, 'F');
INSERT INTO t1 VALUES(6,200,2000,5.20, 'F');
INSERT INTO t1 VALUES(7,200,5000,6.05, 'F');
INSERT INTO t1 VALUES(8,200,6000,7.10, 'F');
The question I'm trying to answer is: What is the avg. transaction amount for the top 20% of customers by industry?
This should yield these results:
store_id industry_id avg_amt_top_20
1 100 4.80
2 100 4.80
3 100 4.80
4 100 4.80
5 100 4.80
6 200 7.10
7 200 7.10
8 200 7.10
Here's what I have so far:
SELECT
store_id, industry_id,
avg(CASE WHEN percentile>=0.80 THEN amount ELSE NULL END) OVER(PARTITION BY industry_id) as cust_avg
FROM(
SELECT store_id, industry_id, amount, cume_dist() OVER(
PARTITION BY industry_id
ORDER BY amount desc) AS percentile
FROM t1
) tmp
GROUP BY store_id, industry_id;
This fails on the GROUP BY (contains nonaggregated column 'amount'). What's the best way to do this?
What is the avg. transaction amount for the top 20% of customers by industry?
Based on this question, I don't see why store_id is in the results.
If I understand correctly, you need to aggregate to get the total by customer. Then you can use NTILE() to determine the top 20%. The final step is aggregating by industry:
SELECT industry_id, AVG(total)
FROM (SELECT cust_id, industry_id, SUM(amount) as total,
NTILE(5) OVER (PARTITION BY industry_id ORDER BY SUM(amount) DESC) as tile
FROM t1
GROUP BY cust_id, industry_id
) t
WHERE tile = 1
GROUP BY industry_id

Tsql Update row with same ID and latest date

How can I update column U on the row with the latest Date for each ID2? I am able to write the SELECT:
SELECT ID2
, MAX(Date)
FROM TABLE
GROUP BY ID2
but I am having trouble with the UPDATE statement.
I have table:
ID1 |ID2 |Date |U
1 1 2015-02-18 NULL
2 1 2015-02-11 NULL
3 2 2015-02-17 NULL
4 2 2015-02-14 NULL
5 2 2015-02-11 NULL
6 3 2015-02-14 NULL
7 3 2015-02-10 NULL
What I want to achieve:
ID1 |ID2 |Date |U
1 1 2015-02-18 Update
2 1 2015-02-11 NULL
3 2 2015-02-17 Update
4 2 2015-02-14 NULL
5 2 2015-02-11 NULL
6 3 2015-02-14 Update
7 3 2015-02-10 NULL
I would do this using a CTE with the ROW_NUMBER window function:
;with cte as
(
select ID1 ,ID2 ,Date ,U, Row_Number() over(partition by ID2 order by Date desc) rn
From Yourtable
)
update Cte set U = 'Update'
where RN=1
When several rows tie for the max date within an ID2, use DENSE_RANK instead to update all of the tied records:
;with cte as
(
select ID1 ,ID2 ,Date ,U, Dense_Rank() over(partition by ID2 order by Date desc) rn
From Yourtable
)
update Cte set U = 'Update'
where RN=1
One approach would be to use your SELECT...MAX query to filter the TABLE. Something like this:
UPDATE T SET U = 'Update'
FROM TABLE AS T JOIN (
SELECT ID2
, MAX(Date) AS MaxDate
FROM TABLE
GROUP BY ID2) AS X ON X.ID2 = T.ID2 AND X.MaxDate = T.Date
One point to note about this approach is that if there is more than 1 record with the max Date per ID2 then all records with the max Date will be updated.
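Before running a join-based update like the one above, it may help to check whether any ID2 actually has more than one row on its latest Date. This is only a sketch: dbo.MyTable is a hypothetical name standing in for the question's placeholder TABLE, and rows_on_max_date is an illustrative alias.

```sql
-- Lists each ID2 whose latest Date appears on more than one row.
-- dbo.MyTable is an assumed name; substitute the real table.
SELECT t.ID2, t.[Date], COUNT(*) AS rows_on_max_date
FROM dbo.MyTable AS t
WHERE t.[Date] = (SELECT MAX(m.[Date])
                  FROM dbo.MyTable AS m
                  WHERE m.ID2 = t.ID2)
GROUP BY t.ID2, t.[Date]
HAVING COUNT(*) > 1;
```

If this returns no rows, the ROW_NUMBER, DENSE_RANK and join-based updates all touch the same set of rows.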
Another interesting way to achieve this:
UPDATE T1
SET T1.U='Update'
FROM TestTable T1
WHERE 0=(SELECT COUNT(1) FROM TestTable T2 WHERE T1.ID2=T2.ID2 AND T1.Date<T2.Date)

How to maintain cumulative sum for each User in SQL server

I have a table like this:
ID UserID rupees time
1 1 200 2014-01-05
---------------------------------
2 1 500 2014-04-06
----------------------------------
3 2 10 2014-05-05
----------------------------------
4 2 20 2014-05-06
----------------------------------
I want the output like this:
ID UserID Rupees time CumulativeSum
1 1 200 2014-01-05 200
-------------------------------------------------
2 1 500 2014-04-06 700
-------------------------------------------------
3 2 10 2014-05-05 10
-------------------------------------------------
4 2 20 2014-05-06 30
---------------------------------------------------
How can I get this table as output?
Please try using CTE:
;With T as(
select
*,
ROW_NUMBER() over(partition by UserId order by [time]) RN
from tbl
)
select
UserID,
rupees,
[time],
(select SUM(rupees)
from T b
where b.UserID=a.UserID and b.RN<=a.RN) CumulativeSum
from T a
If the time values are always increasing (no two rows for the same user share a time), try the query below:
select
UserID,
rupees,
[time],
(select SUM(rupees)
from tbl b
where b.UserID=a.UserID and b.[time]<=a.[time]) CumulativeSum
from tbl a
For SQL Server 2012 or later, you can use SUM() with an OVER clause that specifies a ROWS clause:
declare @t table (ID int,UserID int,rupees int,[time] date)
insert into @t(ID,UserID,rupees,[time]) values
(1,1,200,'20140105'),
(2,1,500,'20140406'),
(3,2, 10,'20140505'),
(4,2, 20,'20140506')
select
*,
SUM(rupees) OVER (
PARTITION BY UserID
ORDER BY id /* or time? */
ROWS BETWEEN
UNBOUNDED PRECEDING AND
CURRENT ROW)
as total
from @t
Result:
ID UserID rupees time total
----------- ----------- ----------- ---------- -----------
1 1 200 2014-01-05 200
2 1 500 2014-04-06 700
3 2 10 2014-05-05 10
4 2 20 2014-05-06 30
DECLARE @t table (UserID INT,rupees INT,DateKey Date )
INSERT INTO @t VALUES
(1,200,'2014-01-05'),
(2,300,'2014-01-06'),
(2,800,'2014-03-06')
select UserID,
rupees,
DateKey,
-- running total per user, ordered by DateKey
(SELECT SUM(rupees) from @t t
where t.UserID = tt.UserID and t.DateKey <= tt.DateKey) AS CumulativeSum from @t tt
GROUP BY UserID,rupees,DateKey
Hope this too helps you.
DECLARE @tab TABLE (id INT,userId INT,rupees INT,[time] Date)
INSERT INTO @tab VALUES
(1,1,200 ,'2014-01-05'),
(2,1,500 ,'2014-04-06'),
(3,2,10 ,'2014-05-05'),
(4,2,20 ,'2014-05-06')
SELECT LU.id,LU.userId,LU.rupees,LU.time,SUM(b.rupees) CumulativeSum
FROM (SELECT *,ROW_NUMBER() OVER (PARTITION BY userId ORDER BY [time]) R FROM @tab) B
JOIN (SELECT *,ROW_NUMBER() OVER (PARTITION BY userId ORDER BY [time]) R FROM @tab) LU
ON B.userId = LU.userId AND B.R <= LU.R
GROUP BY LU.id,LU.userId,LU.rupees,LU.time
I am assuming that you are not using SQL Server 2012, which supports cumulative sums directly through SUM() with an ORDER BY in the OVER clause. The other answers use some form of the row_number() function, but this seems totally unnecessary. I usually approach cumulative sums using correlated subqueries:
select ID, UserID, rupees, [time],
(select sum(rupees)
from table t2
where t2.UserId = t.UserId and
t2.ID <= t.ID
) as CumulativeSum
from table t;
This requires having a column that uniquely identifies each row, and that seems to be the purpose of id. For performance, I would want to have an index on table(UserId, ID, rupees).
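The index suggested above could be created along these lines. This is a sketch only: dbo.UserPayments is a hypothetical name standing in for the placeholder "table", and the index name is likewise made up.

```sql
-- Covers the correlated subquery: seek on UserId, range scan on ID,
-- and read rupees from the index without touching the base table.
-- dbo.UserPayments and the index name are assumptions for illustration.
CREATE NONCLUSTERED INDEX IX_UserPayments_UserId_ID_rupees
    ON dbo.UserPayments (UserId, ID, rupees);
```

Since rupees is only read and never used for seeking or ordering, moving it into an INCLUDE (rupees) clause instead of the key would be a reasonable alternative.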
select *, SUM(rupees) OVER (
PARTITION BY UserID
ORDER BY id) as CumSum from #tbl

update of a column based on max date and group by

I have three tables: orders, orders_delivered, and orders_delivered_sta.
The data in the three tables looks like this:
table orders
orders_id
10
11
12
13
table orders_delivered
orders_id orders_delivered_id
10 1000
10 1001
11 1002
12 1003
12 1004
13 1005
13 1006
13 1007
table orders_delivered_sta
orders_delivered_sta_id orders_delivered_id date now_ind
1 1000 02/11/2011 0
2 1000 01/10/2006 0
3 1000 09/13/2011 0
4 1001 01/19/2010 0
5 1001 02/21/2011 0
6 1002 02/11/2009 0
7 1002 08/27/2010 0
8 1003 07/15/2012 0
9 1004 03/09/2007 0
10 1010 10/01/2010 0
11 1011 03/27/2011 0
12 1012 07/25/2010 0
13 1013 09/18/2004 0
I need to update the orders_delivered_sta table so that now_ind is 1 on the row with the max date for each orders_delivered_id.
For example, for orders_delivered_id 1000 the max date is 09/13/2011, so now_ind should be 1 for the row (1000, 09/13/2011). If an orders_delivered_id has only one row, that row should be set to 1.
Some rows in orders_delivered_sta have no match in the orders and orders_delivered tables; those should not be changed. Only rows whose orders_delivered_id exists in the orders_delivered table should be updated.
The desired output should look like this:
table orders_delivered_sta
orders_delivered_sta_id orders_delivered_id date now_ind
1 1000 02/11/2011 0
2 1000 01/10/2006 0
3 1000 09/13/2011 1
4 1001 01/19/2010 0
5 1001 02/21/2011 1
6 1002 02/11/2009 0
7 1002 08/27/2010 1
8 1003 07/15/2012 1
9 1004 03/09/2007 1
10 1010 10/01/2010 0
11 1011 03/27/2011 0
12 1012 07/25/2010 0
13 1013 09/18/2004 0
table structure:
create table orders
(
orders_id int primary key
)
insert into orders select 10
insert into orders select 11
insert into orders select 12
insert into orders select 13
create table orders_delivered
(
orders_delivered_id int primary key,
orders_id int FOREIGN KEY(orders_id)REFERENCES orders (orders_id)
)
insert into orders_delivered select 1000,10
insert into orders_delivered select 1001,10
insert into orders_delivered select 1002,11
insert into orders_delivered select 1003,12
insert into orders_delivered select 1004,12
insert into orders_delivered select 1005,13
insert into orders_delivered select 1006,13
insert into orders_delivered select 1007,13
create table orders_delivered_sta
(
orders_delivered_sta_id int primary key,
orders_delivered_id int FOREIGN KEY(orders_delivered_id)REFERENCES orders_delivered (orders_delivered_id),
date char(10),
now_ind int
)
insert into orders_delivered_sta select 1,1000,'02/11/2011', 0
insert into orders_delivered_sta select 2,1000,'01/10/2006', 0
insert into orders_delivered_sta select 3,1000,'09/13/2011', 0
insert into orders_delivered_sta select 4,1001,'01/19/2010', 0
insert into orders_delivered_sta select 5,1001,'02/21/2011', 0
insert into orders_delivered_sta select 6,1002,'02/11/2009', 0
insert into orders_delivered_sta select 7,1002,'08/27/2010', 0
insert into orders_delivered_sta select 8,1003,'07/15/2012', 0
insert into orders_delivered_sta select 9,1004,'03/09/2007', 0
insert into orders_delivered_sta select 10,1010,'10/01/2010', 0
insert into orders_delivered_sta select 11,1011,'03/27/2011', 0
insert into orders_delivered_sta select 12,1012,'07/25/2010', 0
insert into orders_delivered_sta select 13,1013,'09/18/2004', 0
You could use a CTE and a window MAX():
;
WITH max_dates AS (
SELECT
*,
max_date = MAX(date) OVER (PARTITION BY orders_delivered_id)
FROM orders_delivered_sta
WHERE orders_delivered_id IN (SELECT orders_delivered_id FROM orders_delivered)
)
UPDATE max_dates
SET now_ind = 1
WHERE date = max_date
References:
WITH common_table_expression (Transact-SQL)
OVER Clause (Transact-SQL)
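One caveat on the window MAX() approach above: the question's DDL stores date as char(10) in MM/DD/YYYY form, so MAX(date) compares strings rather than dates (for example, '12/31/2009' sorts after '01/01/2012'). The sample data happens to give the right answer either way, but a hedged variant that converts first might look like this. The column alias real_date is illustrative, and style 101 assumes every value really is MM/DD/YYYY.

```sql
-- Convert the char(10) MM/DD/YYYY text to a real date before taking the max.
-- Style 101 is the U.S. mm/dd/yyyy format; a malformed value would raise an error.
WITH max_dates AS (
    SELECT
        now_ind,
        real_date = CONVERT(date, [date], 101),
        max_date  = MAX(CONVERT(date, [date], 101))
                    OVER (PARTITION BY orders_delivered_id)
    FROM orders_delivered_sta
    WHERE orders_delivered_id IN (SELECT orders_delivered_id FROM orders_delivered)
)
UPDATE max_dates
SET now_ind = 1
WHERE real_date = max_date;
```

Changing the column itself to a date type would avoid the conversion entirely, if altering the table is an option.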
This is the query in MySQL, but translating it to SQL Server should be straightforward as I am using plain SQL. Note that I have changed the date to a different format (YYYY-MM-DD) to avoid casting from string to date.
update t3
set t3.now_ind = 1
where t3.orders_delivered_sta_id in (
select distinct t1.orders_delivered_sta_id from t1
left join (
select t2.orders_delivered_id, max(t2.adate) as MaxDate from t2
group by t2.orders_delivered_id
) t2 on (t1.orders_delivered_id = t2.orders_delivered_id) and (t1.adate = t2.MaxDate)
where t2.orders_delivered_id is not null
) and exists (
select * from o1
join od1 on (o1.order_id = od1.orders_delivered_id)
where (t3.orders_delivered_id = od1.orders_id)
)
Here is an example
Hope this helps
PS: You did need those 3 tables... I'll read questions better next time :)
Try this:
UPDATE orders_delivered_sta
SET now_ind = 1
WHERE orders_delivered_sta_id IN(
SELECT orders_delivered_sta_id
FROM (
SELECT orders_delivered_sta_id,
ROW_NUMBER() OVER(PARTITION BY orders_delivered_id ORDER BY date DESC) AS num
FROM orders_delivered_sta) AS T
WHERE T.num = 1)