Temp table - group by - delete - keep top 10


I have a temp table with 50 000 records. If I do a GROUP BY, with COUNT, it will look like this:
+--------+--------+
|GrpById | Count |
+--------+--------+
| 1 | 10000 |
| 2 | 8000 |
| 3 | 12000 |
| 4 | 9000 |
| 5 | 11000 |
+--------+--------+
I would like to delete records so that each Id (1, 2, 3, 4, 5) has only 10 records left after the deletion.
So if I then ran a new GROUP BY with COUNT, I would get something like this:
+--------+--------+
|GrpById | Count |
+--------+--------+
| 1 | 10 |
| 2 | 10 |
| 3 | 10 |
| 4 | 10 |
| 5 | 10 |
+--------+--------+
Can I do it without FETCH NEXT ?

To just preserve an arbitrary 10 per group you can use
WITH CTE AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY GrpById ORDER BY GrpById) AS RN
FROM YourTable
)
DELETE FROM
CTE WHERE RN > 10;
Change the ORDER BY if you need something less arbitrary.
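A minimal runnable sketch of the same idea, using Python's built-in sqlite3 module purely for demonstration. The sample data and the Id primary key column are assumptions, not part of the question. SQLite cannot delete through a CTE the way SQL Server can, so this variant deletes by primary key instead, and it assumes an SQLite build with window function support (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: Id is an assumed surrogate key used to pick rows to delete.
conn.execute("CREATE TABLE YourTable (Id INTEGER PRIMARY KEY, GrpById INTEGER)")

# Hypothetical sample data: 25 rows in group 1, 40 in group 2, 7 in group 3.
rows = [(g,) for g, n in [(1, 25), (2, 40), (3, 7)] for _ in range(n)]
conn.executemany("INSERT INTO YourTable (GrpById) VALUES (?)", rows)

# Number the rows inside each group, then delete everything past the 10th.
conn.execute("""
    DELETE FROM YourTable
    WHERE Id IN (
        SELECT Id FROM (
            SELECT Id,
                   ROW_NUMBER() OVER (PARTITION BY GrpById ORDER BY Id) AS RN
            FROM YourTable
        )
        WHERE RN > 10
    )
""")

counts = conn.execute(
    "SELECT GrpById, COUNT(*) FROM YourTable GROUP BY GrpById ORDER BY GrpById"
).fetchall()
print(counts)
```

Groups that start with fewer than 10 rows are left untouched, because no row in them gets a row number above 10.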

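declare @id int;
declare @count int;
set @id = 1;
select @count = count(1) from YourTable where GrpById = @id;
-- only delete when the group has more than 10 rows; TOP rejects a negative value
if @count > 10
delete top (@count - 10) from YourTable where GrpById = @id;
Run the above for each GrpById value in turn, setting @id accordingly.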

Related

Linking Related IDs together through two other ID columns

I have a table of about 100k rows with the following layout:
+----+-----------+------------+-------------------+
| ID | PIN | RAID | Desired Output ID |
+----+-----------+------------+-------------------+
| 1 | 80602627 | 1737852-1 | 1 |
| 2 | 80602627 | 34046655-1 | 1 |
| 3 | 351418172 | 33661 | 2 |
| 4 | 351418172 | 33661 | 2 |
| 5 | 351418172 | 33661 | 2 |
| 6 | 351418172 | 34443321-1 | 2 |
| 7 | 491863017 | 26136 | 3 |
| 8 | 491863017 | 34575 | 3 |
| 9 | 491863017 | 34575 | 3 |
| 10 | 661254727 | 26136 | 3 |
| 11 | 661254727 | 26136 | 3 |
| 12 | NULL | 7517 | 4 |
| 13 | NULL | 7517 | 4 |
| 14 | NULL | 7517 | 4 |
| 15 | NULL | 7517 | 4 |
| 16 | NULL | 7517 | 4 |
| 17 | 554843813 | 33661 | 2 |
| 18 | 554843813 | 33661 | 2 |
+----+-----------+------------+-------------------+
The ID column has unique values, with the PIN and RAID columns being two separate identifying numbers used to group linked IDs together. The Desired Output ID column is what I would like SQL to do, essentially looking at both the PIN and RAID columns to spot where there are any relationships between them.
So for example Where Desired Output ID = 2, IDs 3-6 match on PIN = 351418172, and then IDs 17-18 also match as the RAID of 33661 was in the rows for IDs 3-5.
To add as well, NULLs will be in the PIN Column but not in any others.
I did spot a similar question; however, as it was for BigQuery I wasn't sure it would help.
Have been trying to crack this one for a while with no luck, any help massively appreciated.

I suppose DENSE_RANK can solve your problem. Not sure what the combination of PIN and RAID should be, but I think you'll be able to figure it out how to do it like this:
SELECT *,DENSE_RANK( ) over (ORDER BY isnull(pin,id) ),DENSE_RANK( ) over (ORDER BY raid)
FROM accounts

I believe I have found a bit of a bodged solution to this. It runs very slowly as it goes row by row and will only go two links deep on PIN/RAID, but this should be sufficient for 99%+ cases.
Would appreciate any suggestions to speeding it up if anything is immediately obvious.
ID in post above is DebtorNo in Code:
DECLARE @Counter INT = 1
DECLARE @EndCounter INT = 0
-- Working copy of the source data; NULL PINs are replaced by the DebtorNo
IF OBJECT_ID('Tempdb..#OrigACs') IS NOT NULL
BEGIN
    DROP TABLE #OrigACs
END
SELECT DebtorNo,
       Name,
       PostCode,
       DOB,
       RAJoin,
       COALESCE(PIN,DebtorNo COLLATE DATABASE_DEFAULT) AS PIN,
       RelatedAssets,
       RAID,
       PINRelatedAssets
INTO #OrigACs
FROM MIReporting..HC_RA_Test_Data RA
-- Number the accounts so they can be visited one at a time
IF OBJECT_ID('Tempdb..#Accounts') IS NOT NULL
BEGIN
    DROP TABLE #Accounts
END
SELECT *,
       ROW_NUMBER() OVER (ORDER BY CAST(RA.DebtorNo AS INT)) AS Row
INTO #Accounts
FROM #OrigACs RA
ORDER BY CAST(RA.DebtorNo AS INT)
CREATE INDEX Temp_HC_Index ON #OrigACs (RAID,PIN)
SET @EndCounter = (SELECT MAX(Row) FROM #Accounts)
WHILE @Counter <= @EndCounter
BEGIN
    -- First link: rows sharing the current row's RAID
    IF OBJECT_ID('Tempdb..#RAID1') IS NOT NULL
    BEGIN
        DROP TABLE #RAID1
    END
    SELECT *
    INTO #RAID1
    FROM #OrigACs A
    WHERE A.RAID IN (SELECT RAID FROM #Accounts WHERE [Row] = @Counter)
    -- Rows sharing a PIN with any of those
    IF OBJECT_ID('Tempdb..#PIN1') IS NOT NULL
    BEGIN
        DROP TABLE #PIN1
    END
    SELECT *
    INTO #PIN1
    FROM #OrigACs A
    WHERE A.PIN IN (SELECT PIN FROM #RAID1)
    -- Second link: repeat on RAID, then on PIN
    IF OBJECT_ID('Tempdb..#RAID2') IS NOT NULL
    BEGIN
        DROP TABLE #RAID2
    END
    SELECT *
    INTO #RAID2
    FROM #OrigACs A
    WHERE A.RAID IN (SELECT RAID FROM #PIN1)
    IF OBJECT_ID('Tempdb..#PIN2') IS NOT NULL
    BEGIN
        DROP TABLE #PIN2
    END
    SELECT *
    INTO #PIN2
    FROM #OrigACs A
    WHERE A.PIN IN (SELECT PIN FROM #RAID2)
    -- Store the linked rows under the next free group number (FRAID)
    INSERT INTO MIReporting..HC_RA_Final_ACs
    SELECT DebtorNo,
           Name,
           PostCode,
           DOB,
           RAJoin,
           CASE
               WHEN PIN = DebtorNo COLLATE DATABASE_DEFAULT THEN NULL
               ELSE PIN
           END AS PIN,
           RelatedAssets,
           RAID,
           PINRelatedAssets,
           COALESCE((SELECT MAX(FRAID) FROM MIReporting..HC_RA_Final_ACs),0) + 1 AS FRAID
    FROM #PIN2
    -- Move on to the first account that has not been assigned yet
    SET @Counter = (SELECT MIN([ROW]) FROM #Accounts O WHERE O.DebtorNo NOT IN (SELECT DebtorNo FROM MIReporting..HC_RA_Final_ACs));
END;
SELECT *
FROM MIReporting..HC_RA_Final_ACs
DROP TABLE #OrigACs
DROP TABLE #Accounts
DROP TABLE #RAID1
DROP TABLE #PIN1
DROP TABLE #RAID2
DROP TABLE #PIN2
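Linking IDs through shared PIN or RAID values is a connected-components problem, so one possible alternative to a fixed number of passes is a union-find (disjoint set) structure, which follows chains of any depth. The sketch below is in Python rather than SQL and is not the poster's method. It uses the sample rows from the question; the function and variable names are illustrative, and groups are numbered by their smallest ID on the assumption that this is how the Desired Output ID was assigned.

```python
# Sample rows (ID, PIN, RAID) copied from the question's table; None stands for NULL.
rows = [
    (1, "80602627", "1737852-1"), (2, "80602627", "34046655-1"),
    (3, "351418172", "33661"), (4, "351418172", "33661"),
    (5, "351418172", "33661"), (6, "351418172", "34443321-1"),
    (7, "491863017", "26136"), (8, "491863017", "34575"),
    (9, "491863017", "34575"), (10, "661254727", "26136"),
    (11, "661254727", "26136"), (12, None, "7517"), (13, None, "7517"),
    (14, None, "7517"), (15, None, "7517"), (16, None, "7517"),
    (17, "554843813", "33661"), (18, "554843813", "33661"),
]

# Every row starts as its own group.
parent = {row_id: row_id for row_id, _, _ in rows}

def find(x):
    """Return the representative of x's group, shortening the path on the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def union(a, b):
    """Merge two groups; the smaller ID becomes the representative."""
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)

first_by_pin = {}
first_by_raid = {}
for row_id, pin, raid in rows:
    if pin is not None:  # NULL PINs never link rows together
        union(row_id, first_by_pin.setdefault(pin, row_id))
    union(row_id, first_by_raid.setdefault(raid, row_id))

# Number the groups 1, 2, 3, ... in order of their smallest ID.
roots = sorted({find(row_id) for row_id in parent})
group_no = {root: i + 1 for i, root in enumerate(roots)}
result = {row_id: group_no[find(row_id)] for row_id, _, _ in rows}
print(result)
```

Each row is visited once, so this scales roughly linearly with the row count, and the resulting ID-to-group mapping could then be loaded back into a table.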

How to delete the rows with three same data columns and one different data column

I have a table "MARK_TABLE" as below.
How can I delete the rows with same "STUDENT", "COURSE" and "SCORE" values?
| ID | STUDENT | COURSE | SCORE |
|----|---------|--------|-------|
| 1 | 1 | 1 | 60 |
| 3 | 1 | 2 | 81 |
| 4 | 1 | 3 | 81 |
| 9 | 2 | 1 | 80 |
| 10 | 1 | 1 | 60 |
| 11 | 2 | 1 | 80 |
Now I already filtered the data I want to KEEP, but without the "ID"...
SELECT student, course, score FROM mark_table
INTERSECT
SELECT student, course, score FROM mark_table
The output:
| STUDENT | COURSE | SCORE |
|---------|--------|-------|
| 1 | 1 | 60 |
| 1 | 2 | 81 |
| 1 | 3 | 81 |
| 2 | 1 | 80 |

Use the following query to delete the duplicate rows. For each (STUDENT, COURSE, SCORE) combination it keeps the row with the highest ID:
DELETE FROM MARK_TABLE M
WHERE
EXISTS (
SELECT
1
FROM
MARK_TABLE M_IN
WHERE
M.STUDENT = M_IN.STUDENT
AND M.COURSE = M_IN.COURSE
AND M.SCORE = M_IN.SCORE
AND M.ID < M_IN.ID
)
Cheers!!

Use DISTINCT:
SELECT distinct student, course, score FROM mark_table

Assuming you don't just want to select the unique data you want to keep (you mention you've already done this), you can proceed as follows:
1. Create a temporary table to hold the data you want to keep.
2. Insert the data you want to keep into the temporary table.
3. Empty the source table.
4. Re-insert the data you want to keep into the source table.
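The steps above can be sketched as follows, using Python's built-in sqlite3 module so the example is runnable. The temporary table name keep is hypothetical, and keeping the lowest ID of each duplicate group is an assumption, since the question does not say which ID should survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mark_table (id INTEGER, student INTEGER, course INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO mark_table VALUES (?, ?, ?, ?)",
    [(1, 1, 1, 60), (3, 1, 2, 81), (4, 1, 3, 81),
     (9, 2, 1, 80), (10, 1, 1, 60), (11, 2, 1, 80)],
)

# The connection used as a context manager commits on success and rolls back on error.
with conn:
    # Steps 1 and 2: create the temporary table and fill it with the rows to keep
    # (assumption: the lowest id of each duplicate group survives).
    conn.execute("""
        CREATE TEMP TABLE keep AS
        SELECT MIN(id) AS id, student, course, score
        FROM mark_table
        GROUP BY student, course, score
    """)
    # Step 3: empty the source table.
    conn.execute("DELETE FROM mark_table")
    # Step 4: re-insert the kept rows.
    conn.execute("INSERT INTO mark_table SELECT id, student, course, score FROM keep")
    conn.execute("DROP TABLE keep")

remaining = conn.execute("SELECT * FROM mark_table ORDER BY id").fetchall()
print(remaining)
```

On a large table this rewrites every row, so it is probably only worth it when most of the rows are duplicates.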

select * from
(
select row_number() over (partition by student,course,score order by score)
rn,student,course,score from mark_table
) t
where rn=1

Use CTE with RowNumber
create table #MARK_TABLE (ID int, STUDENT int, COURSE int, SCORE int)
insert into #MARK_TABLE
values
(1,1,1,60),
(3,1,2,81),
(4,1,3,81),
(9,2,1,80),
(10,1,1,60),
(11,2,1,80)
;with cteDeleteID as(
Select id, row_number() over (partition by student,course,score order by id) [row_number] from #MARK_TABLE
)
delete from #MARK_TABLE where id in
(
select id from cteDeleteID where [row_number] != 1
)
select * from #MARK_TABLE
drop table #MARK_TABLE

sql query to find unique records

I am new to SQL and need your help to achieve the below. I have tried using GROUP BY and COUNT, but the rows that are duplicated still end up in the unique group.
Below is my source data.
CDR_ID,TelephoneNo,Call_ID,call_Duration,Call_Plan
543,xxx-23,12,12,500
543,xxx-23,12,12,501
543,xxx-23,12,12,510
643,xxx-33,11,17,700
343,xxx-33,11,17,700
766,xxx-74,32,1,300
766,xxx-74,32,1,300
877,xxx-32,12,2,300
877,xxx-32,12,2,300
877,xxx-32,12,2,301
Please note: the source has many copies of each record, so when I do the count, the unique set does not appear with count = 1.
Example: the data below has 60 records in the source for each combination:
877,xxx-32,12,2,300 -- 60 records
877,xxx-32,12,2,301 -- 60 records
I am trying to get only the unique records, but the duplicate records are also coming through.
Below are the rows that should come up in the unique group. There can be multiple Call_Plans for the same combination of CDR_ID, TelephoneNo, Call_ID, call_Duration; I want to read only the records for which there is exactly one Call_Plan for each unique combination of CDR_ID, TelephoneNo, Call_ID, call_Duration.
CDR_ID,TelephoneNo,Call_ID,call_Duration,Call_Plan
643,xxx-33,11,17,700
343,xxx-33,11,17,700
766,xxx-74,32,1,300
Please advise on this.
Thanks and Regards

To do more complex groupings you could also use a Common Table Expression/Derived Table along with windowed functions:
declare @t table(CDR_ID int,TelephoneNo nvarchar(20),Call_ID int,call_Duration int,Call_Plan int);
insert into @t values (543,'xxx-23',12,12,500),(543,'xxx-23',12,12,501),(543,'xxx-23',12,12,510),(643,'xxx-33',11,17,700),(343,'xxx-33',11,17,700),(766,'xxx-74',32,1,300),(766,'xxx-74',32,1,300),(877,'xxx-32',12,2,300),(877,'xxx-32',12,2,300),(877,'xxx-32',12,2,301);
with cte as
(
    select CDR_ID
          ,TelephoneNo
          ,Call_ID
          ,call_Duration
          ,Call_Plan
          ,count(*) over (partition by CDR_ID,TelephoneNo,Call_ID,call_Duration) as c
    from (select distinct * from @t) a
)
select *
from cte
where c = 1;
Output:
+--------+-------------+---------+---------------+-----------+---+
| CDR_ID | TelephoneNo | Call_ID | call_Duration | Call_Plan | c |
+--------+-------------+---------+---------------+-----------+---+
| 343 | xxx-33 | 11 | 17 | 700 | 1 |
| 643 | xxx-33 | 11 | 17 | 700 | 1 |
| 766 | xxx-74 | 32 | 1 | 300 | 1 |
+--------+-------------+---------+---------------+-----------+---+

Using NOT EXISTS():
select distinct *
from t
where not exists (
select 1
from t as i
where i.cdr_id = t.cdr_id
and i.telephoneno = t.telephoneno
and i.call_id = t.call_id
and i.call_duration = t.call_duration
and i.call_plan <> t.call_plan
)
rextester demo: http://rextester.com/RRNNE20636
returns:
+--------+-------------+---------+---------------+-----------+-----+
| cdr_id | TelephoneNo | Call_id | call_Duration | Call_Plan | cnt |
+--------+-------------+---------+---------------+-----------+-----+
| 343 | xxx-33 | 11 | 17 | 700 | 1 |
| 643 | xxx-33 | 11 | 17 | 700 | 1 |
| 766 | xxx-74 | 32 | 1 | 300 | 1 |
+--------+-------------+---------+---------------+-----------+-----+

Basically you should try this:
SELECT DISTINCT A.CDR_ID, A.TelephoneNo, A.Call_ID, A.call_Duration, A.Call_Plan
FROM YOUR_TABLE A
INNER JOIN (SELECT CDR_ID,TelephoneNo,Call_ID,call_Duration
            FROM YOUR_TABLE
            GROUP BY CDR_ID,TelephoneNo,Call_ID,call_Duration
            -- count distinct plans, not rows, because the source repeats each record
            HAVING COUNT(DISTINCT Call_Plan)=1
) B ON A.CDR_ID= B.CDR_ID AND A.TelephoneNo=B.TelephoneNo AND A.Call_ID=B.Call_ID AND A.call_Duration=B.call_Duration
You can do a shorter query using a window function such as COUNT(*) OVER ...
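One possible window-function version is sketched below, run through Python's built-in sqlite3 module so that it can be executed as-is. It compares MIN and MAX of Call_Plan over each combination instead of counting, which is a swapped-in technique chosen because it tolerates repeated rows. The column aliases lo and hi are illustrative, and an SQLite build with window function support (3.25 or later) is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE YOUR_TABLE (
        CDR_ID INTEGER, TelephoneNo TEXT, Call_ID INTEGER,
        call_Duration INTEGER, Call_Plan INTEGER
    )
""")
# Sample rows from the question.
conn.executemany(
    "INSERT INTO YOUR_TABLE VALUES (?, ?, ?, ?, ?)",
    [(543, "xxx-23", 12, 12, 500), (543, "xxx-23", 12, 12, 501),
     (543, "xxx-23", 12, 12, 510), (643, "xxx-33", 11, 17, 700),
     (343, "xxx-33", 11, 17, 700), (766, "xxx-74", 32, 1, 300),
     (766, "xxx-74", 32, 1, 300), (877, "xxx-32", 12, 2, 300),
     (877, "xxx-32", 12, 2, 300), (877, "xxx-32", 12, 2, 301)],
)

# A combination has exactly one call plan when its lowest and highest plan are equal.
unique_rows = conn.execute("""
    SELECT DISTINCT CDR_ID, TelephoneNo, Call_ID, call_Duration, Call_Plan
    FROM (
        SELECT *,
               MIN(Call_Plan) OVER (PARTITION BY CDR_ID, TelephoneNo, Call_ID, call_Duration) AS lo,
               MAX(Call_Plan) OVER (PARTITION BY CDR_ID, TelephoneNo, Call_ID, call_Duration) AS hi
        FROM YOUR_TABLE
    )
    WHERE lo = hi
    ORDER BY CDR_ID
""").fetchall()
print(unique_rows)
```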

The query below will give you the result:
SELECT CDR_ID,TelephoneNo,Call_ID,call_Duration, MIN(Call_Plan) AS Call_Plan, COUNT(DISTINCT Call_Plan) AS PlanCount
FROM TABLE_NAME GROUP BY CDR_ID,TelephoneNo,Call_ID,call_Duration
HAVING COUNT(DISTINCT Call_Plan) < 2;
It returns the count as well. If that is not required, you can remove it.

select CDR_ID, TelephoneNo, Call_ID, call_Duration, min(Call_Plan) as Call_Plan
from table_name
group by CDR_ID, TelephoneNo, Call_ID, call_Duration
having min(Call_Plan) = max(Call_Plan)

How to insert the next highest number

I have a table with an id, tableid, and seqnum. Here is the table structure:
create table ztables
(
id serial primary key,
tableid integer,
seqnum integer
)
and sample data
+----+---------+-------+
| id | tableid | seqnum|
+----+---------+-------+
| 1 | 5 | 1 |
+----+---------+-------+
| 2 | 5 | 2 |
+----+---------+-------+
| 3 | 5 | 3 |
+----+---------+-------+
| 4 | 5 | 9 |
+----+---------+-------+
| 5 | 6 | 1 |
+----+---------+-------+
| 6 | 7 | 1 |
+----+---------+-------+
| 7 | 7 | 2 |
+----+---------+-------+
| 8 | 7 | 3 |
+----+---------+-------+
Let's take tableid 5 as an example. You can see the sequence number increases much like a database sequence, but it should not increase across the whole table. I only want it to increase per tableid. So, if another record with tableid 5 is inserted, the seqnum will be 10. If a record with tableid 7 is inserted, the seqnum will be 4. What is the right way to do something like this? I need to account for concurrency issues as well.

Maybe that helps you or gives you at least an idea for a solution:
insert into ztables (
id,
tableid,
seqnum
)
select ? as id,
tableid,
max(seqnum) + 1 as seqnum
from ztables
where tableid = ?
group
by tableid
union
select ? as id,
? as tableid,
1 as seqnum
from ztables
order
by 3 desc
limit 1;
The question marks have to be replaced by constant values.
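A minimal sketch of the "MAX(seqnum) + 1 per tableid" idea with some protection against concurrent writers. It uses Python's built-in sqlite3 module purely so the example runs; the question's table is PostgreSQL, where the locking mechanism would differ. The helper insert_next and the UNIQUE (tableid, seqnum) constraint are assumptions added here, not part of the original schema. In SQLite, BEGIN IMMEDIATE takes the write lock up front, so two writers cannot both read the same maximum.

```python
import sqlite3

# isolation_level=None: transactions are managed explicitly with BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("""
    CREATE TABLE ztables (
        id INTEGER PRIMARY KEY,
        tableid INTEGER,
        seqnum INTEGER,
        UNIQUE (tableid, seqnum)  -- assumed safety net, not in the original schema
    )
""")
conn.executemany(
    "INSERT INTO ztables (tableid, seqnum) VALUES (?, ?)",
    [(5, 1), (5, 2), (5, 3), (5, 9), (6, 1), (7, 1), (7, 2), (7, 3)],
)

def insert_next(conn, tableid):
    """Insert a row for tableid with the next seqnum and return that seqnum."""
    conn.execute("BEGIN IMMEDIATE")  # acquire the write lock before reading MAX
    try:
        # An aggregate without GROUP BY always yields one row, so a brand-new
        # tableid gets COALESCE(NULL, 0) + 1 = 1.
        conn.execute(
            """
            INSERT INTO ztables (tableid, seqnum)
            SELECT ?, COALESCE(MAX(seqnum), 0) + 1
            FROM ztables
            WHERE tableid = ?
            """,
            (tableid, tableid),
        )
        seq = conn.execute(
            "SELECT MAX(seqnum) FROM ztables WHERE tableid = ?", (tableid,)
        ).fetchone()[0]
        conn.execute("COMMIT")
        return seq
    except Exception:
        conn.execute("ROLLBACK")
        raise

print(insert_next(conn, 5), insert_next(conn, 7), insert_next(conn, 8))
```

If two writers ever did compute the same seqnum, the unique constraint would reject the second insert rather than silently store a duplicate, and the caller could retry.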

Count rows grouped by condition in SQL

We have a table like this:
+----+--------+
| Id | ItemId |
+----+--------+
| 1 | 1100 |
| 1 | 1101 |
| 1 | 1102 |
| 2 | 2001 |
| 2 | 2002 |
| 3 | 1101 |
+----+--------+
We want to count how many items each guy has, and show the guys with 2 items or more. Like this:
+----+-----------+
| Id | ItemCount |
+----+-----------+
| 1 | 3 |
| 2 | 2 |
+----+-----------+
We didn't count the guy with Id = 3 because he's got only 1 item.
How can we do this in SQL?

SELECT id, COUNT(itemId) AS ItemCount
FROM YourTable
GROUP BY id
HAVING COUNT(itemId) > 1
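For a quick check of the GROUP BY ... HAVING query against the sample data from the question, here is a small sketch using Python's built-in sqlite3 module. The table name YourTable follows the answer above; the ORDER BY is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (Id INTEGER, ItemId INTEGER)")
conn.executemany(
    "INSERT INTO YourTable VALUES (?, ?)",
    [(1, 1100), (1, 1101), (1, 1102), (2, 2001), (2, 2002), (3, 1101)],
)

# WHERE filters rows before grouping; HAVING filters the groups afterwards.
item_counts = conn.execute("""
    SELECT Id, COUNT(ItemId) AS ItemCount
    FROM YourTable
    GROUP BY Id
    HAVING COUNT(ItemId) > 1
    ORDER BY Id
""").fetchall()
print(item_counts)
```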

Use this query:
SELECT *
FROM (
    SELECT Id, COUNT(ItemId) AS ItemCount
    FROM ITEM
    GROUP BY Id
) my_select
WHERE ItemCount > 1

SELECT id,
count(1)
FROM YOUR_TABLE
GROUP BY id
HAVING count(1) > 1;

select Id, count(ItemId) as ItemCount
from table_name
group by Id
having count(ItemId) > 1
