T-SQL - Remove All Duplicates Except Most Recent (SQL Server 2005)


I have a T-SQL function that will pull all records inserted into a main table within the last 60 minutes and insert them into a table variable. I've then got some more code that will filter that set into another table variable to be returned.
In this set I expect some records to occur multiple times, each occurrence with a unique datetime. For any record that occurs three or more times, I would like to delete every occurrence except the one with the most recent datetime value.
EDIT: Sorry, I thought I was more clear than it appears I actually was.
This data is error log data from a legacy system, so duplicates can be expected. The idea is that if they cross a certain threshold they need to be reported.
For example, the below is what should end up in #table_variable_2:
| ColA | ColB | DateTimeColumn | ColC |
---------------------------------------------------
1 | A | B | 2015-08-24 11:06:14.000 | C |
2 | A | B | 2015-08-24 11:18:58.000 | C |
3 | A | B | 2015-08-24 12:07:45.000 | C |
4 | A2 | B2 | 2015-08-24 12:17:24.000 | C2 |
5 | A2 | B2 | 2015-08-24 13:25:32.000 | C2 |
6 | A3 | B3 | 2015-08-24 14:52:10.000 | C3 |
7 | A3 | B3 | 2015-08-24 14:52:34.000 | C3 |
8 | A3 | B3 | 2015-08-24 14:52:45.000 | C3 |
9 | A3 | B3 | 2015-08-24 14:53:15.000 | C3 |
10 | A3 | B3 | 2015-08-24 14:53:32.000 | C3 |
This is what I expect to be returned:
| ColA | ColB | DateTimeColumn | ColC |
---------------------------------------------------
1 | A | B | 2015-08-24 12:07:45.000 | C |
2 | A2 | B2 | 2015-08-24 12:17:24.000 | C2 |
3 | A2 | B2 | 2015-08-24 13:25:32.000 | C2 |
4 | A3 | B3 | 2015-08-24 14:53:32.000 | C3 |
It's okay to have some duplicates, there's just the chance of having a lot of them.
EDIT 2: Solved without the CTE function
DELETE a
FROM #rtrn_tbl AS a
INNER JOIN
(
    SELECT ColA, ColB, MAX(DateTimeColumn) AS MaxDate, ColC
    FROM #rtrn_tbl
    GROUP BY ColA, ColB, ColC
    HAVING COUNT(*) > 2 -- only groups with three or more occurrences
) AS b
    ON a.ColA = b.ColA AND a.ColB = b.ColB AND a.ColC = b.ColC
WHERE a.DateTimeColumn <> b.MaxDate; -- keep the most recent row of each such group
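
A minimal sketch of the same rule against the sample data, for anyone who wants to try it without a SQL Server instance. It assumes Python's built-in sqlite3 module as a stand-in; SQLite has no temp-table `#` prefix and no DELETE ... FROM ... JOIN, so the table is called rtrn_tbl here and the joined rows are selected by rowid instead. That rewording is an assumption of this sketch, not part of the original solution.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rtrn_tbl (ColA TEXT, ColB TEXT, DateTimeColumn TEXT, ColC TEXT)"
)
conn.executemany(
    "INSERT INTO rtrn_tbl VALUES (?, ?, ?, ?)",
    [
        ("A", "B", "2015-08-24 11:06:14.000", "C"),
        ("A", "B", "2015-08-24 11:18:58.000", "C"),
        ("A", "B", "2015-08-24 12:07:45.000", "C"),
        ("A2", "B2", "2015-08-24 12:17:24.000", "C2"),
        ("A2", "B2", "2015-08-24 13:25:32.000", "C2"),
        ("A3", "B3", "2015-08-24 14:52:10.000", "C3"),
        ("A3", "B3", "2015-08-24 14:52:34.000", "C3"),
        ("A3", "B3", "2015-08-24 14:52:45.000", "C3"),
        ("A3", "B3", "2015-08-24 14:53:15.000", "C3"),
        ("A3", "B3", "2015-08-24 14:53:32.000", "C3"),
    ],
)

# SQLite cannot join inside DELETE, so the rows to remove are picked by rowid.
conn.execute(
    """
    DELETE FROM rtrn_tbl
    WHERE rowid IN (
        SELECT a.rowid
        FROM rtrn_tbl AS a
        INNER JOIN (
            SELECT ColA, ColB, ColC, MAX(DateTimeColumn) AS MaxDate
            FROM rtrn_tbl
            GROUP BY ColA, ColB, ColC
            HAVING COUNT(*) > 2              -- groups with three or more rows
        ) AS b
            ON a.ColA = b.ColA AND a.ColB = b.ColB AND a.ColC = b.ColC
        WHERE a.DateTimeColumn <> b.MaxDate  -- keep the newest row per group
    )
    """
)

remaining = conn.execute(
    "SELECT ColA, ColB, DateTimeColumn, ColC FROM rtrn_tbl ORDER BY DateTimeColumn"
).fetchall()
for row in remaining:
    print(row)
```

The A and A3 groups collapse to their newest row, while the two A2 rows stay because that group is below the threshold.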

I think you have to use PARTITION BY ColA, ColB, ColC ORDER BY DateTimeColumn DESC instead, then you can delete all but one (the most recent):
WITH cte AS
(
SELECT ColA, ColB, DateTimeColumn, ColC,
ROW_NUMBER() OVER (PARTITION BY ColA, ColB, ColC ORDER BY DateTimeColumn DESC) AS r_count
FROM #table_variable_2
)
DELETE
FROM cte
WHERE r_count > 1

WITH cte AS (
    SELECT ColA, ColB, DateTimeColumn, ColC,
        -- DateTimeColumn must stay out of PARTITION BY, or every row gets r_count = 1
        ROW_NUMBER() OVER (PARTITION BY ColA, ColB, ColC
                           ORDER BY DateTimeColumn DESC) AS r_count,
        COUNT(*) OVER (PARTITION BY ColA, ColB, ColC) AS grp_count
    FROM #table_variable_2)
, cte1 AS (SELECT * FROM cte WHERE grp_count >= 3)
DELETE FROM cte1
WHERE r_count <> 1
You can add one more CTE to select only the groups with three or more records, and then delete from it so that only the latest record of each group is preserved.

Related

SQL Server - Return different values based on row count

I have two tables, table 1 is the target table, I’ve provided the required values in idCode1- idCode3.
Table 2 is the source, each idBill will have one or more idCode. If there were two rows to represent 2 unique idCode, then I want to insert to idCode 1 and 2 respectively.
I was thinking a case statement where I could test for the number of idCode and then insert first value to 1, second value to 2 etc. When I tried a bunch of case, when, exists, count etc it would always return 2 rows if there were 2 idCode values, and the idCode would only insert to idCode1. The end result must be a single row in table1 for each idBill and however many idCode for that idBill inserted to 1, 2, 3.
Sorry I couldn’t post the picture as I don’t have enough points. Here is a rough pipe delimited example of it:
Table 1 (target):
| idTable1 | idBill | idCode1 | idCode2 | idCode3 |
| 1 | 1234 | A1 | A2 | |
| 2 | 1235 | E3 | E2 | A1 |
Table 2 (source):
| idTable2 | idBill | codeId |
| 10 | 1234 | A1 |
| 20 | 1234 | A2 |
| 30 | 1235 | E3 |
| 40 | 1235 | E2 |
| 50 | 1235 | A1 |
Hopefully this makes sense. Thanks so much!

You can use conditional aggregation:
select s.idbill,
max(case when seqnum = 1 then s.codeid end) as codeid1,
max(case when seqnum = 2 then s.codeid end) as codeid2,
max(case when seqnum = 3 then s.codeid end) as codeid3
into target
from (select s.*, row_number() over (partition by idbill order by idtable2) as seqnum
from source s
) s
group by s.idbill;
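
The conditional aggregation above can be checked against the question's sample rows with a small sketch. It assumes Python's sqlite3 module (with SQLite 3.25 or later for ROW_NUMBER) standing in for SQL Server, a table named source as in the answer, and a plain SELECT instead of SELECT ... INTO, which SQLite does not support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE source (idTable2 INTEGER, idBill INTEGER, codeId TEXT)")
conn.executemany(
    "INSERT INTO source VALUES (?, ?, ?)",
    [
        (10, 1234, "A1"),
        (20, 1234, "A2"),
        (30, 1235, "E3"),
        (40, 1235, "E2"),
        (50, 1235, "A1"),
    ],
)

# Number the codes within each bill, then spread them over three columns.
pivoted = conn.execute(
    """
    SELECT s.idBill,
           MAX(CASE WHEN s.seqnum = 1 THEN s.codeId END) AS idCode1,
           MAX(CASE WHEN s.seqnum = 2 THEN s.codeId END) AS idCode2,
           MAX(CASE WHEN s.seqnum = 3 THEN s.codeId END) AS idCode3
    FROM (
        SELECT src.idBill, src.codeId,
               ROW_NUMBER() OVER (PARTITION BY src.idBill
                                  ORDER BY src.idTable2) AS seqnum
        FROM source AS src
    ) AS s
    GROUP BY s.idBill
    ORDER BY s.idBill
    """
).fetchall()
for row in pivoted:
    print(row)
```

Each bill yields a single row; a bill with fewer than three codes gets NULL (None in Python) in the unused columns, because MAX ignores the NULLs produced by the unmatched CASE branches.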

SQL: Select Most Recent Sequentially Distinct Value w/ Grouping

I am having trouble writing a query that would select the last "new" sequentially distinct value (let's call this column Col A) grouped based on another column (Col B). Since this is a bit ambiguous/confusing, here is an example to explain (assume row number is indicative of sequence inside groups; in my issue the rows are ordered by date):
|--------|-------|-------|
| RowNum | Col A | Col B |
|--------|-------|-------|
| 1 | A | A |
| 2 | B | A |
| 3 | C | A |
| 4 | B | B |
| 5 | A | B |
| 6 | B | B |
Would select:
| 3 | C | A |
| 6 | B | B |
Note that although B also appears in row 4, the fact that row 5 contains A means that the B in row 6 is sequentially distinct. But if table looked like this:
|--------|-------|-------|
| RowNum | Col A | Col B |
|--------|-------|-------|
| 1 | A | A |
| 2 | B | A |
| 3 | C | A |
| 4 | B | B |
| 5 | A | B |
| 6 | A | B | <--
Then we would want to select:
| 3 | C | A |
| 5 | A | B |
I think that this would be an easier problem if I wasn't concerned with values being distinct but not sequential. I'm not really sure how to even consider sequence when making a query.
I have attempted to solve this by calculating the min/max row numbers where each value of Col A appears. That calculation (using the second sample table) would produce a result like this:
|--------|--------|--------|--------|
| ColA | ColB | MinRow | MaxRow |
|--------|--------|--------|--------|
| A | A | 1 | 1 |
| B | A | 2 | 2 |
| C | A | 3 | 3 |
| A | B | 5 | 6 |
| B | B | 4 | 4 |
A solution raised in a related post (SQL: Select Row with Last New Sequentially Distinct Value) went on a similar path, essentially taking the most recent RowNum which differs from the last ColA and then picks the next row. However, in that question I failed to address the need for the query to work for multiple groups, hence the new post.
Any help with this problem, if it is at all possible to do in SQL, would be greatly appreciated. I am running SQL 2008 SP4.

Hmmm . . . One method is to get the last value. Then choose all the last rows with that value and aggregate:
select min(rownum), colA, colB
from (select t.*,
first_value(colA) over (partition by colB order by rownum desc) as last_colA
from t
) t
where rownum > all (select t2.rownum
from t t2
where t2.colB = t.colB and t2.colA <> t.last_colA
)
group by colA, colB;
Or, without the aggregation:
select t.*
from (select t.*,
first_value(colA) over (partition by colB order by rownum desc) as last_colA,
lag(colA) over (partition by colB order by rownum) as prev_colA
from t
) t
where rownum > all (select t2.rownum
from t t2
where t2.colB = t.colB and t2.colA <> t.last_colA
) and
(prev_colA is null or prev_colA <> colA);
But FIRST_VALUE() and LAG() are not available in SQL Server 2008 (they were added in SQL Server 2012), so let's treat this as a gaps-and-islands problem:
select t.*
from (select t.*,
             min(rownum) over (partition by colB, colA, (seqnum_b - seqnum_ab) ) as min_rownum_group,
             max(rownum) over (partition by colB, colA, (seqnum_b - seqnum_ab) ) as max_rownum_group
      from (select t.*,
                   row_number() over (partition by colB order by rownum) as seqnum_b,
                   row_number() over (partition by colB, colA order by rownum) as seqnum_ab,
                   max(rownum) over (partition by colB) as max_rownum -- no ORDER BY: overall maximum per colB
            from t
           ) t
     ) t
where rownum = min_rownum_group and   -- first row in the group defined by adjacent colA, colB
      max_rownum_group = max_rownum;  -- last group for each colB
This identifies each group of adjacent rows using a difference of row numbers. It calculates the maximum rownum for each group and for each colB overall. These are the same for the last group.
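
As a rough check of the difference-of-row-numbers idea, here is a sketch run against the question's second sample table. It assumes Python's sqlite3 module (SQLite 3.25 or later for window functions) in place of SQL Server 2008, a table named t with columns rownum, colA and colB, and explicit column lists instead of the nested `t.*`. The per-colB maximum is taken over the whole partition, without an ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (rownum INTEGER, colA TEXT, colB TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "A", "A"),
        (2, "B", "A"),
        (3, "C", "A"),
        (4, "B", "B"),
        (5, "A", "B"),
        (6, "A", "B"),
    ],
)

last_islands = conn.execute(
    """
    SELECT rownum, colA, colB
    FROM (
        SELECT rownum, colA, colB, max_rownum,
               -- adjacent rows with the same colA share seqnum_b - seqnum_ab
               MIN(rownum) OVER (PARTITION BY colB, colA, seqnum_b - seqnum_ab)
                   AS min_rownum_group,
               MAX(rownum) OVER (PARTITION BY colB, colA, seqnum_b - seqnum_ab)
                   AS max_rownum_group
        FROM (
            SELECT rownum, colA, colB,
                   ROW_NUMBER() OVER (PARTITION BY colB ORDER BY rownum) AS seqnum_b,
                   ROW_NUMBER() OVER (PARTITION BY colB, colA ORDER BY rownum) AS seqnum_ab,
                   MAX(rownum) OVER (PARTITION BY colB) AS max_rownum
            FROM t
        ) AS numbered
    ) AS grouped
    WHERE rownum = min_rownum_group        -- first row of its island
      AND max_rownum_group = max_rownum    -- island reaches the end of its colB
    ORDER BY rownum
    """
).fetchall()
for row in last_islands:
    print(row)
```

For colB = B, rows 5 and 6 form one island of A values, so the first row of that island (row 5) is returned rather than row 6.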

SQL SUM Group by - based on 'group' from another table

I hope I am explaining this correctly.
I have 2 tables, with
first table (table1)
+------------+------+-------+-------+
| Date | Item | Block | Total |
+------------+------+-------+-------+
| 2017-01 | a1 | B1 | 10.0 |
| 2017-01 | a2 | B1 | 20.0 |
| 2017-01 | a3 | B2 | 30.0 |
| 2017-02 | a1 | B1 | 40.0 |
| 2017-02 | a2 | B1 | 50.0 |
| 2017-02 | a3 | B2 | 60.0 |
+------------+------+-------+-------+
second table (table2)
+------------+------+
| Item Group | Item |
+------------+------+
| IG1 | a1 |
| IG1 | a2 |
| IG2 | a2 |
| IG2 | a3 |
+------------+------+
*Note that one item group has multiple items.
An item may appear several times, in different item groups.
Now, I need to sum the total (table1), based on Item Group (table2), Date and Block, in the end, final table:
+---------+------------+-------+-------+
| Date | Item Group | Block | Total |
+---------+------------+-------+-------+
| 2017-01 | IG1 | B1 | 30.0 |
| 2017-01 | IG2 | B1 | 20.0 |
| 2017-01 | IG1 | B2 | 0.0 |
| 2017-01 | IG2 | B2 | 30.0 |
+---------+------------+-------+-------+
How to achieve this with SQL query?
EDIT:
OK. It seems that this is an easy one. Shame on me. I didn't know the join and Group By can be applied that way. SQL is really awesome. That saves tons of coding.

A join and a simple group by should work for you in this case:
select t1.Date, t2.ItemGroup, t1.Block, sum(t1.Total) Total
from table1 t1 join table2 t2 on t1.Item = t2.Item
group by t1.Date, t2.ItemGroup, t1.Block
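
To see what the join and GROUP BY produce on the question's data, here is a small sketch. It assumes Python's sqlite3 module as a stand-in database and the column name ItemGroup (without the space) as used in the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (Date TEXT, Item TEXT, Block TEXT, Total REAL)")
conn.execute("CREATE TABLE table2 (ItemGroup TEXT, Item TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        ("2017-01", "a1", "B1", 10.0),
        ("2017-01", "a2", "B1", 20.0),
        ("2017-01", "a3", "B2", 30.0),
        ("2017-02", "a1", "B1", 40.0),
        ("2017-02", "a2", "B1", 50.0),
        ("2017-02", "a3", "B2", 60.0),
    ],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?)",
    [("IG1", "a1"), ("IG1", "a2"), ("IG2", "a2"), ("IG2", "a3")],
)

# An item that belongs to two groups (a2) contributes to both group totals.
totals = conn.execute(
    """
    SELECT t1.Date, t2.ItemGroup, t1.Block, SUM(t1.Total) AS Total
    FROM table1 AS t1
    JOIN table2 AS t2 ON t1.Item = t2.Item
    GROUP BY t1.Date, t2.ItemGroup, t1.Block
    ORDER BY t1.Date, t2.ItemGroup, t1.Block
    """
).fetchall()
for row in totals:
    print(row)
```

Note that an inner join only returns combinations that actually occur, so a zero row such as 2017-01 / IG1 / B2 from the expected table does not appear in this output.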

Simple group by after a join
SELECT f.date, s.[Item Group], f.[Block], Total = sum(f.Total) from firsttable f
INNER JOIN secondtable s
ON f.item = s.item
GROUP BY f.date, s.[Item Group], f.[Block]

This can also be done with a window aggregate, SUM() OVER (PARTITION BY ...). Since your query is straightforward and every selected column also appears in the GROUP BY clause, a simple join with GROUP BY is enough.
Anyway, here is the query using the OVER clause:
select DISTINCT t1.dateyear, t2.ItemGroup, t1.Block,
sum(Total) over(PARTITION BY t2.ItemGroup, t1.dateyear, t1.Block) as GroupTotal
FROM tab_1 t1
JOIN tab_2 t2
ON t1.Item = t2.Item

I have a similar question. I want to select data from a table based on another table.
This is what I have:
Table1
Articlename, group, status
Table2
Articlename, date
This is what I want:
If an article in Table1 has a group and status = 1, count its posts in Table2 between date1 and date2, and group the result by the group in Table1.
The result should look something like this
Group Quantity
-----------------
GR1 250
GR2 50
GR3 110

select most recent rows from one table 1 and join to get rows from table 2

I need N number of recent data selecting some columns from table 1 and some from table 2.
For example, I need 2 most recent rows from table 1 and table 2.
Table 1
id | Fname | LName
------------------------
1 | F1 | L1
2 | F2 | L2
3 | F3 | L3
4 | F4 | L4
Table 2
id | City | Date
---+----------------------
1 | C1 | 02/23/2014
2 | C2 | 02/01/2014
3 | C3 | 02/20/2014
4 | C4 | 02/19/2014
Desired Result
Fname| City | Date
----------------------------
F1 | C1 | 02/23/2014
F3 | C3 | 02/20/2014

I suppose that you have the same IDs in both T1 and T2 (why two different tables?).
If so:
SELECT FNAME, CITY, MY_DATE
FROM (SELECT T1.FNAME, T2.CITY, T2.MY_DATE
FROM T2, T1
WHERE T2.ID = T1.ID
ORDER BY T2.MY_DATE DESC)
WHERE ROWNUM <= 2;
Otherwise, please explain what's the difference between T1 and T2...

Sql Server get first matching value

I have two tables History and Historyvalues:
History
HID(uniqueidentifier) | Version(int)
a1 | 1
a2 | 2
a3 | 3
a4 | 4
Historyvalues
HVID(uniqueidentifier) | HID(uniqueidentifier) | ControlID(uniqueidentifier) | Value(string)
b1 | a1 | c1 | value1
b2 | a2 | c1 | value2
b3 | a2 | c2 | value3
Now I need a query that returns the last history value of each control as of a specific version, for example:
Get the last values from Version 3 -> receiving ->
HVID | ControlID | Value
b2 | c1 | value2
b3 | c2 | value3
I tried something like this:
Select HVID, ControlId, max(Version), Value from
(
Select HVID, ControlId, Version, Value
from History inner JOIN
Historyvalues ON History.HID = Historyvalues.HID
where Version <= 3
) as a
group by ControlId
order by Version desc
but this does not work.
Are there any ideas?
Thank you very much for your help.
Best regards

Latest version from each control with your specific Version (WHERE t1.Version <= 3)
Query:
SELECT HVID, ControlId, Version, Value
FROM
(
SELECT t2.HVID, t2.ControlId, t1.Version, t2.Value,
ROW_NUMBER() OVER(PARTITION BY t2.ControlId ORDER BY t1.Version DESC) as rnk
FROM History t1
JOIN Historyvalues t2
ON t1.HID = t2.HID
WHERE t1.Version <= 3
) AS a
WHERE a.rnk = 1
ORDER BY a.Version desc
Result:
| HVID | CONTROLID | VERSION | VALUE |
|------|-----------|---------|--------|
| b2 | c1 | 2 | value2 |
| b3 | c2 | 2 | value3 |
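
The ROW_NUMBER approach can be reproduced on the question's sample rows with a short sketch. It assumes Python's sqlite3 module (SQLite 3.25 or later) instead of SQL Server, text values in place of uniqueidentifier keys, and an extra ControlID sort key so that the output order is deterministic when versions tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE History (HID TEXT, Version INTEGER)")
conn.execute(
    "CREATE TABLE Historyvalues (HVID TEXT, HID TEXT, ControlID TEXT, Value TEXT)"
)
conn.executemany(
    "INSERT INTO History VALUES (?, ?)",
    [("a1", 1), ("a2", 2), ("a3", 3), ("a4", 4)],
)
conn.executemany(
    "INSERT INTO Historyvalues VALUES (?, ?, ?, ?)",
    [
        ("b1", "a1", "c1", "value1"),
        ("b2", "a2", "c1", "value2"),
        ("b3", "a2", "c2", "value3"),
    ],
)

max_version = 3  # the "specific version" from the question

latest = conn.execute(
    """
    SELECT HVID, ControlID, Version, Value
    FROM (
        SELECT t2.HVID, t2.ControlID, t1.Version, t2.Value,
               -- rank 1 is the newest value of each control up to max_version
               ROW_NUMBER() OVER (PARTITION BY t2.ControlID
                                  ORDER BY t1.Version DESC) AS rnk
        FROM History AS t1
        JOIN Historyvalues AS t2 ON t1.HID = t2.HID
        WHERE t1.Version <= ?
    ) AS a
    WHERE a.rnk = 1
    ORDER BY a.Version DESC, a.ControlID
    """,
    (max_version,),
).fetchall()
for row in latest:
    print(row)
```

Passing the version as a bound parameter keeps the query reusable for other cut-off versions.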

here is your solution
Select Historyvalues.HVID,Historyvalues.ControlID,Historyvalues.Value
from Historyvalues
inner join History on Historyvalues.hid=History.hid
where Historyvalues.hvid in (
select MAX(Historyvalues.hvid) from Historyvalues
inner join History on Historyvalues.hid=History.hid
group by ControlID)
