Group By, Count and Delete on Consecutive Records

I have a tricky SQL question. This is based on SQL Server 2008 R2.
From a Log table, I have to combine consecutive records that have the same message (MSG), count how many records were combined (COUNT), and then delete the duplicates. This has to be done within a date range, so that any records outside that range are left alone.
To make this more understandable, here is a small example of the data:
ID DATE MSG COUNT
1 2013-08-17 mail NULL
2 2013-08-17 mail NULL
3 2013-08-17 www NULL
4 2013-08-18 www NULL
5 2013-08-18 www NULL
6 2013-08-18 www NULL
7 2013-08-18 mail NULL
8 2013-08-18 www NULL
9 2013-08-19 mail NULL
10 2013-08-19 mail NULL
11 2013-08-20 mail NULL
12 2013-08-20 mail NULL
13 2013-08-21 www NULL
14 2013-08-22 mail NULL
15 2013-08-22 mail NULL
16 2013-08-23 mail NULL
17 2013-08-23 mail NULL
18 2013-08-23 mail NULL
The result should look like the following:
ID DATE MSG COUNT
1 2013-08-17 mail NULL
2 2013-08-17 mail NULL
3 2013-08-17 www NULL
6 2013-08-18 www 3
7 2013-08-18 mail 1
8 2013-08-18 www 1
12 2013-08-20 mail 4
13 2013-08-21 www 1
15 2013-08-22 mail 2
16 2013-08-23 mail NULL
17 2013-08-23 mail NULL
18 2013-08-23 mail NULL
So, basically, the query should:
- handle data only within a given date range (in this example from 2013-08-18 to 2013-08-22)
- combine consecutive rows based on the text of the MSG field
- count the combined rows and set the value in the COUNT field
- delete the duplicate records (in this example, ID 6 stays, but ID 5 and ID 4 should be deleted)
As I am not an expert in SQL, I would really appreciate any help, suggestions or SQL queries.

Try this:
IF OBJECT_ID('tempdb..#temp') IS NOT NULL
    DROP TABLE #temp
GO
-- Sample data; id is an int so that ORDER BY id sorts numerically
select
    *
into #temp
from (
    select 1 as id, cast('2013-08-17' as date) as [date], 'mail' as msg, cast(NULL as int) as [count] union all
    select 2,'2013-08-17','mail',NULL union all
    select 3,'2013-08-17','www',NULL union all
    select 4,'2013-08-18','www',NULL union all
    select 5,'2013-08-18','www',NULL union all
    select 6,'2013-08-18','www',NULL union all
    select 7,'2013-08-18','mail',NULL union all
    select 8,'2013-08-18','www',NULL union all
    select 9,'2013-08-19','mail',NULL union all
    select 10,'2013-08-19','mail',NULL union all
    select 11,'2013-08-20','mail',NULL union all
    select 12,'2013-08-20','mail',NULL union all
    select 13,'2013-08-21','www',NULL union all
    select 14,'2013-08-22','mail',NULL union all
    select 15,'2013-08-22','mail',NULL union all
    select 16,'2013-08-23','mail',NULL union all
    select 17,'2013-08-23','mail',NULL union all
    select 18,'2013-08-23','mail',NULL
) x
GO
-- Number the rows within each (date, msg) group inside the date range
select
    t.*,
    rwn
from #temp t
join (
    select
        id, [date], [msg], [rwn] = row_number() over(partition by [date], [msg] order by id)
    from #temp
    where 1=1
        and [date] between '2013-08-18' and '2013-08-22'
) x
    on t.id = x.id
order by
    t.date, t.msg
Just modify it into an UPDATE and then delete all rows where rwn > 1.
EDIT:
Your data type is probably text, which is why you get the sort/comparison errors. Do you really need text? It is a large object (LOB) data type that can store up to 2 GB of text. Try changing the column to varchar(8000), for example, or, if the messages really are that big, varchar(max) will do, too.
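If "consecutive" is taken literally (runs of the same MSG in ID order, regardless of date), one common approach is the row-number-difference technique. The following is only a sketch under assumptions: the table name dbo.LogTable and the temp tables #islands and #keep are hypothetical, MSG is assumed to be varchar rather than text, and ID order is assumed to be the order of the log entries.

```sql
-- Hypothetical table: dbo.LogTable (ID int, [DATE] date, MSG varchar(50), [COUNT] int)
DECLARE @from date = '2013-08-18', @to date = '2013-08-22';

-- Rows of one run share the same (MSG, grp): the overall row number and the
-- per-MSG row number grow in step as long as the message does not change.
SELECT ID,
       MSG,
       ROW_NUMBER() OVER (ORDER BY ID)
     - ROW_NUMBER() OVER (PARTITION BY MSG ORDER BY ID) AS grp
INTO #islands
FROM dbo.LogTable
WHERE [DATE] BETWEEN @from AND @to;

-- One row per run: the ID that survives and the number of rows it replaces.
SELECT MSG, grp, MAX(ID) AS keep_id, COUNT(*) AS cnt
INTO #keep
FROM #islands
GROUP BY MSG, grp;

-- Write the count onto the surviving row of each run.
UPDATE l
SET [COUNT] = k.cnt
FROM dbo.LogTable AS l
JOIN #keep AS k ON k.keep_id = l.ID;

-- Delete every other in-range row; rows outside the range are never touched.
DELETE l
FROM dbo.LogTable AS l
JOIN #islands AS i ON i.ID = l.ID
WHERE NOT EXISTS (SELECT 1 FROM #keep AS k WHERE k.keep_id = l.ID);

DROP TABLE #islands;
DROP TABLE #keep;
```

Traced by hand against the sample data in the question, this keeps IDs 6, 7, 8, 12, 13 and 15 inside the range with counts 3, 1, 1, 4, 1 and 2. It would be prudent to run it inside a transaction and inspect the result before committing.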

Hi, please try this; I hope it helps. As I understand it, you need to group the rows, remove the duplicates and keep only one row per group. Sorry about my English.
DECLARE @Table_2 TABLE (ID INT, [DATE] date, MSG Varchar(50), [COUNT] int)
DECLARE @fromDate AS date = '2013-08-18'
DECLARE @toDate AS date = '2013-08-22'
-- Keep the highest ID of each (DATE, MSG) group, together with the group size
INSERT INTO @Table_2 (ID, [DATE], MSG, [COUNT])
SELECT MAX(ID) AS ID, [DATE], MSG, COUNT(*) AS [COUNT]
FROM dbo.Table_1
WHERE [DATE] BETWEEN @fromDate AND @toDate
GROUP BY [DATE], MSG
-- Store the count on the row that is kept
UPDATE T1
SET [COUNT] = T2.[COUNT]
FROM Table_1 AS T1
INNER JOIN @Table_2 AS T2
ON T1.ID = T2.ID
-- Delete the other rows of each group
DELETE T1
FROM Table_1 AS T1
INNER JOIN @Table_2 AS T2
ON T1.[DATE] = T2.[DATE] AND T1.MSG = T2.MSG
WHERE T1.ID <> T2.ID

My idea is to do it with 2 queries:
(i) The first one only counts and updates the records.
(ii) The second one deletes all the records within the date range that have a NULL value in the COUNT column.
EDIT: I did step (i), but I couldn't make it keep the COUNT value NULL for the rows that are to be deleted; it updates every row in the range with the count. Now you just have to DELETE the right rows.
Step (i):
(For MySQL)
UPDATE tab ta JOIN
(SELECT date, msg, COUNT(*) AS cnt FROM tab GROUP BY date, msg) tb
ON ta.date = tb.date AND ta.msg = tb.msg
SET ta.count = tb.cnt
WHERE ta.date BETWEEN
DATE('2013-08-18') AND DATE('2013-08-22');
PS: The DATE syntax I used is for MySQL; you may need to adapt it for MS SQL Server.
(For MS SQL Server)
UPDATE ta
SET ta.[count] = tb.cnt
FROM tab ta
JOIN (SELECT [date], msg, COUNT(*) AS cnt FROM tab GROUP BY [date], msg) tb
ON ta.[date] = tb.[date] AND ta.msg = tb.msg
WHERE ta.[date]
BETWEEN CAST('2013-08-18' AS DATE) AND CAST('2013-08-22' AS DATE);

Related

SQL Query Optimization to retrieve non-null entries

I need help optimizing an SQL query. I have found a way to solve the problem using UNION ALL, but I am worried about performance because the record set in the production environment is huge.
I have a table of records in the format below. For each UniqueID I need to retrieve the non-null entries if any are available, and otherwise pick the null entries.
In the case below, the query should exclude RowIDs 1 and 7 and retrieve everything else, because non-null entries exist for those UniqueIDs.
RowID UniqueID TrackId
1 325 NULL
2 325 8zUAC
3 325 99XER
4 427 NULL
5 632 2kYCV
6 533 NULL
7 774 NULL
8 774 94UAC
-- UNION ALL query
SELECT A.* FROM
( SELECT * FROM [MY_PKG].[TEMP] WHERE TRACKID is not null) A
WHERE A.UNIQUEID in
( SELECT UNIQUEID FROM [MY_PKG].[TEMP] WHERE TRACKID is null
)
UNION ALL
SELECT B.* FROM
( SELECT * FROM [MY_PKG].[TEMP] WHERE TRACKID is null) B
WHERE B.UNIQUEID not in
( SELECT UNIQUEID FROM [MY_PKG].[TEMP] WHERE TRACKID is not null
)
Temp Table Creation Script
CREATE TABLE MY_PKG.TEMP
( UNIQUEID varchar(3),
TRACKID varchar(5)
);
INSERT INTO MY_PKG.TEMP
( UNIQUEID, TRACKID)
VALUES
('325',null),
('325','8zUAC'),
('325','99XER'),
('427',null),
('632','2kYCV'),
('533','2kYCV'),
('774',null),
('774','94UAC')

You can use the NOT EXISTS operator with a correlated subquery:
SELECT * FROM TEMP T
WHERE TRACKID IS NOT NULL
OR (TRACKID IS NULL
AND NOT EXISTS(
SELECT 1 FROM TEMP D
WHERE D.UNIQUEID = T.UNIQUEID AND
D.TRACKID IS NOT NULL)
)

Access SQL Subquery Criteria based on Next Record

I have a column of QualityCheckTimes. I also have a different table with the StartTimes and EndTimes of ProductionSkids.
I need a query that returns for each QualityCheckTime, the minimum SkidID and maximum SkidID based on their StartTimes and EndTimes.
Sample Data
QCCheckTimes
12:00 AM
1:00 AM
2:00 AM
SkidID SkidStartTime SkidEndTime
1 12:05 AM 12:20 AM
2 12:21 AM 12:40 AM
3 12:41 AM 12:50 AM
4 12:51 AM 1:06 AM
Expected Output:
QCCheckTimes MinSkidID MaxSkidID
12:00 AM Skid1 Skid3
1:00 AM Skid4 ...
2:00 AM ...
I've tried a few things, but the crux of it is that I need to find a way to get all the matching Skid Times between two QualityCheck times, with those QualityTimes being on separate rows.
SELECT...
WHERE [SkidStartDateTime] >= [QualitySamples_tbl].[SampleDateTime]
AND [SkidEndDateTime] < NEXT?? [QualitySamples_tbl].[SampleDateTime]);

I don't use Access, but SQL Server (2012 and later) has two functions for this kind of query, LAG and LEAD, which return the previous and the next row. You can see a tutorial here: http://www.c-sharpcorner.com/UploadFile/f82e9a/lag-and-lead-functions-in-sql-server/
So in SQL you can do something like this (this example uses integers, not times):
----------------your single table-----------------
declare @a table(id int)
insert into @a
select 1 union
select 5 union
select 10 union
select 14 union
select 17 union
select 20
----------------table with ranges-----------------
declare @b table (start int, finish int)
insert into @b
select 1,4 union
select 2,5 union
select 5,8 union
select 10,15
----------------Aux Table--------------------------
declare @a_PlusNext table (id int, idNext int)
insert into @a_PlusNext
SELECT id, LEAD(id) OVER(ORDER BY id) nextId from @a
----------------------Final Query------------------
SELECT
*
FROM @a_PlusNext
INNER JOIN @b on start >= id and finish <= idNext

You can use subqueries and Hour() function to achieve this.
Try this:
table1 contains: QCCheckTimes
table2 Contains: SkidID,SkidStartTime,SkidEndTime
Code:
SELECT table1.qcchecktimes,
Iif([minofid] IS NULL, "...", "skid" & [minofid]) AS MinSkidID,
Iif([c].[maxofid] = [b].[minofid], "...",
Iif([maxofid] IS NULL, "...", "skid" & [maxofid])) AS MaxSkidID
FROM (table1
LEFT JOIN (SELECT table1.qcchecktimes,
Min(table2.skidid) AS MinOfID
FROM table1,
table2
WHERE (( ( Hour([skidendtime]) ) = Hour([qcchecktimes]) ))
GROUP BY table1.qcchecktimes) AS b
ON table1.qcchecktimes = b.qcchecktimes)
LEFT JOIN (SELECT table1.qcchecktimes,
Max(table2.skidid) AS MaxOfid
FROM table1,
table2
WHERE (( ( Hour([skidendtime]) ) = Hour([qcchecktimes]) ))
GROUP BY table1.qcchecktimes) AS c
ON table1.qcchecktimes = c.qcchecktimes
GROUP BY table1.qcchecktimes,
Iif([minofid] IS NULL, "...", "skid" & [minofid]),
Iif([c].[maxofid] = [b].[minofid], "...",
Iif([maxofid] IS NULL, "...", "skid" & [maxofid]));

Copy Data and increment PK in destination table

I have a temp table with data that needs to be split into 3 other tables. Each of those tables has a primary key that is not shared with each other or with the temp table. Here is a small sampling:
Table 1
RSN AGENT STATUS STATUS DECRIPTION
0 280151 51 Terminated
1 86 57 C/O Comp Agent Dele
2 94 57 C/O Comp Agent Dele
3 108 51 Terminated
Table 2
RSN AGENT CITY
1 10 Englewood
2 123 Jackson
3 35 Eatontown
4 86 Trenton
Table 3
RSN AGT SIGN NO_EMP START_DATE
0 241008 Y 1 2002-10-31 00:00:00.000
1 86 Y 0 2002-10-24 09:51:10.247
2 94 Y 0 2002-10-24 09:51:10.247
3 108 Y 0 2002-10-24 09:51:10.247
I need to check each table to see if the data in the temp table exists and if it does not I want to insert those rows with a RSN# starting with the max number in that table. So if I have 5000 records in the first table and I am adding 5000 new rows they will be numbered 5001 through 10000.
I then need to check to see if any columns have changed for matching rows and update them.
Thanks in advance for your assistance.
Scott

You have to repeat the code below for Table 1, 2 and 3, adjusting the matching and non-matching columns.
Insert new values:
Insert Into Table1(col1, col2, ...)
Select t.col1, t.col2
From temp as t
Left Join table1 as t1 On t.matchcol1 = t1.matchcol1 and t.matchcol2 = t1.matchcol2
Where t1.matchcol1 is null
Replace matchcol1 and matchcol2 with the list of columns that identify the same row in temp and Table1.
Update:
Update t1 set col1 = t.col1, col2 = t.col2, ...
From table1 as t1
Inner Join temp as t On t.matchcol1 = t1.matchcol1 and t.matchcol2 = t1.matchcol2 and ...
Where t1.col1 <> t.col1 or t1.col2 <> t.col2 or ...
This may work as well. I am not sure whether you really need to update anything or just insert, or how you link temp and Table1 to know whether a row has changed.
Insert Into Table1(RSN, AGENT, [STATUS], [STATUS DECRIPTION])
Select (Select max(RSN) From table1) + Row_number() over(order by agent)
, AGENT, [STATUS], [STATUS DECRIPTION]
From (
Select AGENT, [STATUS], [STATUS DECRIPTION] From TempTable
Except
Select AGENT, [STATUS], [STATUS DECRIPTION] From Table1
) as t1
Or you can upgrade to SQL Server 2008 and use MERGE, which would be a lot easier.

I ended up adding 4 new columns to my staging table; a temp rsn#, which is an identity column starting with 1, and an rsn# for each of my 3 destination tables. I created a variable getting the max value from each table and then added that to my temp rsn#.
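The staging-table approach described above might look roughly like the sketch below for one of the three destination tables. All names here are assumptions made for illustration: dbo.Staging stands for the temp/staging table, dbo.Table2 for the destination with columns RSN, AGENT and CITY, and AGENT is assumed to be what identifies an existing row.

```sql
-- Hypothetical staging table dbo.Staging (AGENT, CITY, ...).
-- Adding an identity column numbers the existing staging rows 1, 2, 3, ...
ALTER TABLE dbo.Staging ADD TempRSN int IDENTITY(1,1);
GO

-- Highest RSN currently used in the destination table (0 if it is empty).
DECLARE @maxRSN int;
SELECT @maxRSN = ISNULL(MAX(RSN), 0) FROM dbo.Table2;

-- Insert only the staging rows that are not in the destination yet,
-- offsetting the temporary number by the current maximum.
INSERT INTO dbo.Table2 (RSN, AGENT, CITY)
SELECT @maxRSN + s.TempRSN, s.AGENT, s.CITY
FROM dbo.Staging AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.Table2 AS t WHERE t.AGENT = s.AGENT);
```

Because staging rows that already exist in the destination are skipped, the new RSN values can contain gaps; replacing s.TempRSN with ROW_NUMBER() OVER (ORDER BY s.TempRSN) in the SELECT would number the inserted rows contiguously.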

edit and Update records using reference id

I have a table with a field named Comments. With my ASPX code, each comment is inserted as three rows: every row gets a different requirementcommentid, but the comment text is the same.
To retrieve the distinct comments I used this query:
SELECT distinct (
select top 1 requirementcommentid
from Requirementcomment
where requirementcomment=rc.requirementcomment
and fcr.SectionID in(
SELECT sectionid
FROM [dbo].udfGetSectionID_allComYear(2151)
)
AND fcr.FirmID = 20057
),
rc.IsRejected,
fcr.SectionID,
rc.UserID,
rc.RequirementComment,
convert(varchar(25), dateadd(hour, -5, rc.InsertDate),101) as InsertDate,
Department.DeptName,
FirmUser.DepartmentID,
rc.FirmComplianceYearID
FROM RequirementComment rc
INNER JOIN FirmComplianceRequirement fcr ON fcr.FirmComplianceRequirementID = rc.FirmComplianceRequirementID
INNER JOIN FirmUser ON FirmUser.FirmUserID =rc.UserID
INNER JOIN Department ON Department.DeptID = FirmUser.DepartmentID WHERE rc.IsRejected = 1
AND fcr.SectionID in(SELECT sectionid FROM [dbo].udfGetSectionID_allComYear (2151))
AND fcr.FirmID = 20057 AND rc.RequirementComment!=''
How can I edit this distinct comment and update it? At the moment only one comment row is edited, and the comment field in the other two rows stays the same.
I want the remaining rows to be updated automatically when I click edit and update only a single record.

If you cannot solve this with a procedure when storing the data, or in .NET, consider using a trigger. I have made a generic example, since your example code is a bit complex :)
CREATE TABLE TMP_TriggerTable
(
ID INT IDENTITY(1,1) PRIMARY KEY
, ID2 INT NOT NULL
, Comment VARCHAR(255) NOT NULL
)
GO
INSERT INTO TMP_TriggerTable
SELECT 1, 'asd'
UNION ALL
SELECT 1, 'asd'
UNION ALL
SELECT 1, 'asd'
UNION ALL
SELECT 2, 'asd'
UNION ALL
SELECT 2, 'asd'
UNION ALL
SELECT 2, 'asd'
GO
CREATE TRIGGER TRG_TMP_TriggerTable ON TMP_TriggerTable
AFTER UPDATE
AS
BEGIN
WITH InsertedIDPriority AS
(
--Handle if more than one related comment was updated
SELECT Prio = ROW_NUMBER() OVER (PARTITION BY ID2 ORDER BY ID)
, ID
, ID2
, Comment
FROM INSERTED
)
UPDATE t SET Comment = i.Comment FROM TMP_TriggerTable t
JOIN InsertedIDPriority i ON
t.ID2 = i.ID2 --Select all related comments
AND t.ID != i.ID --No need to update the main row a second time
AND i.Prio = 1 --Handle if more than one related comment was updated
END
GO
UPDATE TMP_TriggerTable SET Comment = 'asd2' WHERE ID = 1
/*
SELECT * FROM TMP_TriggerTable
--Returns--
ID ID2 Comment
1 1 asd2
2 1 asd2
3 1 asd2
4 2 asd
5 2 asd
6 2 asd
*/

How do you find a missing number in a table field starting from a parameter and incrementing sequentially?

Let's say I have an sql server table:
NumberTaken CompanyName
2 Fred
3 Fred
4 Fred
6 Fred
7 Fred
8 Fred
11 Fred
I need an efficient way to pass in a parameter [StartingNumber] and to count from [StartingNumber] sequentially until I find a number that is missing.
For example notice that 1, 5, 9 and 10 are missing from the table.
If I supplied the parameter [StartingNumber] = 1, it would check to see if 1 exists; if it does, it would check to see if 2 exists, and so on. Here 1 is missing, so 1 would be returned.
If [StartingNumber] = 6 the function would return 9.
In c# pseudo code it would basically be:
int ctr = [StartingNumber]
while([SELECT NumberTaken FROM tblNumbers Where NumberTaken = ctr] != null)
ctr++;
return ctr;
The problem with that code is that it seems really inefficient if there are thousands of numbers in the table. Also, I can write it in C# code or in a stored procedure, whichever is more efficient.
Thanks for the help

Fine, if this question isn't going to be closed, I may as well copy and paste my answer from the other one:
I called my table Blank, and used the following:
declare @StartOffset int = 2
; With Missing as (
select @StartOffset as N where not exists(select * from Blank where ID = @StartOffset)
), Sequence as (
select @StartOffset as N from Blank where ID = @StartOffset
union all
select b.ID from Blank b inner join Sequence s on b.ID = s.N + 1
)
select COALESCE((select N from Missing),(select MAX(N)+1 from Sequence))
You basically have two cases - either your starting value is missing (so the Missing CTE will contain one row), or it's present, so you count forwards using a recursive CTE (Sequence), and take the max from that and add 1
Tables:
create table Blank (
ID int not null,
Name varchar(20) not null
)
insert into Blank(ID,Name)
select 2 ,'Fred' union all
select 3 ,'Fred' union all
select 4 ,'Fred' union all
select 6 ,'Fred' union all
select 7 ,'Fred' union all
select 8 ,'Fred' union all
select 11 ,'Fred'
go

I would create a temp table containing all numbers from StartingNumber to EndNumber and LEFT JOIN to it to receive the list of rows not contained in the temp table.
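A minimal sketch of that idea follows. It assumes the table and column names from the question's pseudo code (tblNumbers, NumberTaken); the temp table #Candidates, the variable names and the use of sys.all_objects as a row source are choices made here for illustration only.

```sql
DECLARE @StartingNumber int = 6;
DECLARE @EndNumber int;

-- One past the largest taken number, so at least one candidate is always free.
SELECT @EndNumber = ISNULL(MAX(NumberTaken), 0) + 1 FROM tblNumbers;
IF @EndNumber < @StartingNumber SET @EndNumber = @StartingNumber;

CREATE TABLE #Candidates (N int PRIMARY KEY);

-- Fill #Candidates with every number from @StartingNumber to @EndNumber.
;WITH Numbered AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
    FROM sys.all_objects AS a CROSS JOIN sys.all_objects AS b
)
INSERT INTO #Candidates (N)
SELECT @StartingNumber - 1 + rn
FROM Numbered
WHERE rn <= @EndNumber - @StartingNumber + 1;

-- Candidates without a matching row in tblNumbers are the missing numbers;
-- the smallest one is the first gap at or after @StartingNumber.
SELECT MIN(c.N) AS FirstMissing
FROM #Candidates AS c
LEFT JOIN tblNumbers AS t ON t.NumberTaken = c.N
WHERE t.NumberTaken IS NULL;

DROP TABLE #Candidates;
```

With the sample rows from the question (2, 3, 4, 6, 7, 8, 11), a starting number of 6 yields 9 and a starting number of 1 yields 1. Dropping the MIN returns the full list of missing numbers in the range instead.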

If NumberTaken is indexed you could do it with a join on the same table:
select T.NumberTaken -1 as MISSING_NUMBER
from myTable T
left outer join myTable T1
on T.NumberTaken= T1.NumberTaken+1
where T1.NumberTaken is null and t.NumberTaken >= STARTING_NUMBER
order by T.NumberTaken
EDIT
Edited to get 1 too

1> select 1+ID as ID from #b as b
where not exists (select 1 from #b where ID = 1+b.ID)
2> go
ID
-----------
5
9
12
Take max(1+ID) and/or add your starting value to the where clause, depending on what you actually want.
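Adding the starting value to the WHERE clause could look like the sketch below. It is written against the question's names (tblNumbers, NumberTaken) rather than #b, which is an assumption, and it handles the case where the starting number itself is free separately.

```sql
DECLARE @StartingNumber int = 6;

SELECT CASE
         -- The starting number itself is not taken: it is the answer.
         WHEN NOT EXISTS (SELECT 1 FROM tblNumbers
                          WHERE NumberTaken = @StartingNumber)
           THEN @StartingNumber
         -- Otherwise take the first taken number at or after the start
         -- whose successor is not taken, and return that successor.
         ELSE (SELECT MIN(t.NumberTaken + 1)
               FROM tblNumbers AS t
               WHERE t.NumberTaken >= @StartingNumber
                 AND NOT EXISTS (SELECT 1 FROM tblNumbers AS n
                                 WHERE n.NumberTaken = t.NumberTaken + 1))
       END AS FirstMissing;
```

For the sample rows in the question this gives 9 when the starting number is 6, and 1 when it is 1.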
