Column Operations and eliminating the duplicate - sql

I have this particular table below . I want to eliminate duplicate Course from the group 2, cause it is in group 1. Basically if the course is mapped on to Group 1 which is mandatory, we have to only consider that and not in any other group. I will have to check repeating courses first and then remove the duplicate course which is not mandatory.
Program Group Course Mandatory
Program1 1 a YES
Program1 1 b YES
Program1 1 c YES
Program1 2 d NO
Program1 2 a NO
Program1 2 e NO
Program1 3 f YES
I am not able to figure out same column operations , or my mind is not working today(:-) )
I have tried using Count Operation and creating a flag for duplicate rows ,but cannot do it with 'Group' in the group by clause.
Output:
Program Group Course Mandatory
Program1 1 a YES
Program1 1 b YES
Program1 1 c YES
Program1 2 d NO
Program1 2 e NO
Program1 3 f YES
EDIT
How can we
Check for duplicate records and delete it from only one particular group.

You can use a ROW_NUMBER() function to achieve this:
SELECT *
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY COURSE ORDER BY [Group] ) as RowRank
FROM table
)sub
WHERE RowRank = 1
Demo: SQL Fiddle
Edit: ROW_NUMBER assigns a number to each row. Numbering will start at 1 for each grouping you assign via the PARTITION BY portion, in this case each COURSE would have a number 1 and go up. The order of the numbers is determined by the ORDER BY portion, in this case the lowest [Group] gets the 1.

Editing my answer to reflect clarified requirements.
select *
from #TableName t
where
(Mandatory = 'YES' or
not exists (
select *
from #TableName
where
Program = t.Program and
Course = t.Course and
[Group] != t.[Group] and
Mandatory = 'YES'
))
based on your comments below here's another sample to try
;with group1 as (
select * from #Table where [Group] = 1
),
groups12 as (
select * from group1
union all
select * from #Table t where [Group] = 2 and not exists (select * from group1 where Program = t.Program and Course = t.Course)
),
groups123 as (
select * from groups12
union all
select * from #Table t where [Group] = 3 and not exists (select * from groups12 where Program = t.Program and Course = t.Course)
),
groups1234 as (
select * from groups123
union all
select * from #Table t where [Group] = 4 and not exists (select * from groups123 where Program = t.Program and Course = t.Course)
)
select * from groups1234
This query pulls rows for groups 1-4 in order and only when the Program/Course hasn't already appeared in a lower number group.

Related

Finding updates in a table using Self-Join

I have a table as shown below
tablename - property
|runId|listingId|listingName
1 123 abc
1 234 def
2 123 abcd
2 567 ghi
2 234 defg
As you can see in above code there is a runId and there is a listing Id. I am trying to fetch for a particular runId which are the new listings added (In this case for runId 2 its 4th row with listing id 567 ) and which are the listing Ids that are update (In this case its row 3 and row 5 with listingId 123 and 234 respectively)
I am trying self join and it is working fairly for new updates but new additions are giving me trouble
SELECT p1.* FROM property p1
INNER JOIN property p2
ON p1.listingid = p2.listingid
WHERE p1.runid=456 AND p2.runid!=456
The above query provides me correct updated records in the table. But I am not able to find new listing. I used p1.listingid != p2.listingId , left outer join, still wont work.
I would use the ROW_NUMBER() analytical function for it.
SELECT
T.*
FROM
(
SELECT
T.*,
CASE
WHEN ROW_NUMBER() OVER(
PARTITION BY LISTINGID
ORDER BY
RUNID
) = 1 THEN 'INSERTED'
ELSE 'UPDATED'
END AS OPERATION_
FROM
PROPERTY
)
WHERE
RUNID = 2
-- AND OPERATION_ = 'INSERTED'
-- AND OPERATION_ = 'UPDATED'
This will provide the result as updated if listingid is added in any of the previous runid
Cheers!!
You may try this.
with cte as (
select row_number() over (partition by listingId order by runId) as Slno, * from property
)
select * from property where listingId not in (
select listingId from cte as c where slno>1
) --- for new listing added
with cte as (
select row_number() over (partition by listingId order by runId) as Slno, * from property
)
select * from property where listingId in (
select listingId from cte as c where slno>1
) --- for modified listing
For this, I would recommend exists and not exists. For updates:
select p.*
from property p
where exists (select 1
from property p2
where p2.listingid = p.listingid and
p2.runid < p.runid
);
If you want the result for a particular runid, add and runid = ? to the outer query.
And for new listings:
select p.*
from property p
where not exists (select 1
from property p2
where p2.listingid = p.listingid and
p2.runid < p.runid
);
With an index on property(listingid, runid), I would expect this to have somewhat better performance than a solution using window functions.
Here is a db<>fiddle.

Aggregate data from multiple rows into single row

In my table each row has some data columns Priority column (for example, timestamp or just an integer). I want to group my data by ID and then in each group take latest not-null column. For example I have following table:
id A B C Priority
1 NULL 3 4 1
1 5 6 NULL 2
1 8 NULL NULL 3
2 634 346 359 1
2 34 NULL 734 2
Desired result is :
id A B C
1 8 6 4
2 34 346 734
In this example table is small and has only 5 columns, but in real table it will be much larger. I really want this script to work fast. I tried do it myself, but my script works for SQLSERVER2012+ so I deleted it as not applicable.
Numbers: table could have 150k of rows, 20 columns, 20-80k of unique ids and average SELECT COUNT(id) FROM T GROUP BY ID is 2..5
Now I have a working code (thanks to #ypercubeᵀᴹ), but it runs very slowly on big tables, in my case script can take one minute or even more (with indices and so on).
How can it be speeded up?
SELECT
d.id,
d1.A,
d2.B,
d3.C
FROM
( SELECT id
FROM T
GROUP BY id
) AS d
OUTER APPLY
( SELECT TOP (1) A
FROM T
WHERE id = d.id
AND A IS NOT NULL
ORDER BY priority DESC
) AS d1
OUTER APPLY
( SELECT TOP (1) B
FROM T
WHERE id = d.id
AND B IS NOT NULL
ORDER BY priority DESC
) AS d2
OUTER APPLY
( SELECT TOP (1) C
FROM T
WHERE id = d.id
AND C IS NOT NULL
ORDER BY priority DESC
) AS d3 ;
In my test database with real amount of data I get following execution plan:
This should do the trick, everything raised to the power 0 will return 1 except null:
DECLARE #t table(id int,A int,B int,C int,Priority int)
INSERT #t
VALUES (1,NULL,3 ,4 ,1),
(1,5 ,6 ,NULL,2),(1,8 ,NULL,NULL,3),
(2,634 ,346 ,359 ,1),(2,34 ,NULL,734 ,2)
;WITH CTE as
(
SELECT id,
CASE WHEN row_number() over
(partition by id order by Priority*power(A,0) desc) = 1 THEN A END A,
CASE WHEN row_number() over
(partition by id order by Priority*power(B,0) desc) = 1 THEN B END B,
CASE WHEN row_number() over
(partition by id order by Priority*power(C,0) desc) = 1 THEN C END C
FROM #t
)
SELECT id, max(a) a, max(b) b, max(c) c
FROM CTE
GROUP BY id
Result:
id a b c
1 8 6 4
2 34 346 734
One alternative that might be faster is a multiple join approach. Get the priority for each column and then join back to the original table. For the first part:
select id,
max(case when a is not null then priority end) as pa,
max(case when b is not null then priority end) as pb,
max(case when c is not null then priority end) as pc
from t
group by id;
Then join back to this table:
with pabc as (
select id,
max(case when a is not null then priority end) as pa,
max(case when b is not null then priority end) as pb,
max(case when c is not null then priority end) as pc
from t
group by id
)
select pabc.id, ta.a, tb.b, tc.c
from pabc left join
t ta
on pabc.id = ta.id and pabc.pa = ta.priority left join
t tb
on pabc.id = tb.id and pabc.pb = tb.priority left join
t tc
on pabc.id = tc.id and pabc.pc = tc.priority ;
This can also take advantage of an index on t(id, priority).
previous code will work with following syntax:
with pabc as (
select id,
max(case when a is not null then priority end) as pa,
max(case when b is not null then priority end) as pb,
max(case when c is not null then priority end) as pc
from t
group by id
)
select pabc.Id,ta.a, tb.b, tc.c
from pabc
left join t ta on pabc.id = ta.id and pabc.pa = ta.priority
left join t tb on pabc.id = tb.id and pabc.pb = tb.priority
left join t tc on pabc.id = tc.id and pabc.pc = tc.priority ;
This looks rather strange. You have a log table for all column changes, but no associated table with current data. Now you are looking for a query to collect your current values from the log table, which is a laborious task naturally.
The solution is simple: have an additional table with the current data. You can even link the tables with a trigger (so either every time a record gets inserted in your log table you update the current table or everytime a change is written to the current table you write a log entry).
Then just query your current table:
select id, a, b, c from currenttable order by id;

How can I select unique rows in a database over two columns?

I have found similar solutions online but none that I've been able to apply to my specific problem.
I'm trying to "unique-ify" data from one table to another. In my original table, data looks like the following:
USERIDP1 USERIDP2 QUALIFIER DATA
1 2 TRUE AB
1 2 CD
1 3 EF
1 3 GH
The user IDs are composed of two parts, USERIDP1 and USERIDP2 concatenated. I want to transfer all the rows that correspond to a user who has QUALIFIER=TRUE in ANY row they own, but ignore users who do not have a TRUE QUALIFIER in any of their rows.
To clarify, all of User 12's rows would be transferred, but not User 13's. The output would then look like:
USERIDP1 USERIDP2 QUALIFIER DATA
1 2 TRUE AB
1 2 CD
So basically, I need to find rows with distinct user ID components (involving two unique fields) that also possess a row with QUALIFIER=TRUE and copy all and only all of those users' rows.
Although this nested query will be very slow for large tables, this could do it.
SELECT DISTINCT X.USERIDP1, X.USERIDP2, X.QUALIFIER, X.DATA
FROM YOUR_TABLE_NAME AS X
WHERE EXISTS (SELECT 1 FROM YOUR_TABLE_NAME AS Y WHERE Y.USERIDP1 = X.USERIDP1
AND Y.USERIDP2 = X.USERIDP2 AND Y.QUALIFIER = TRUE)
It could be written as an inner join with itself too:
SELECT DISTINCT X.USERIDP1, X.USERIDP2, X.QUALIFIER, X.DATA
FROM YOUR_TABLE_NAME AS X
INNER JOIN YOUR_TABLE_NAME AS Y ON Y.USERIDP1 = X.USERIDP1
AND Y.USERIDP2 = X.USERIDP2 AND Y.QUALIFIER = TRUE
For a large table, create a new auxiliary table containing only USERIDP1 and USERIDP2 columns for rows that have QUALIFIER = TRUE and then join this table with your original table using inner join similar to the second option above. Remember to create appropriate indexes.
This should do the trick - if the id fields are stored as integers then you will need to convert / cast into Varchars
SELECT 1 as id1,2 as id2,'TRUE' as qualifier,'AB' as data into #sampled
UNION ALL SELECT 1,2,NULL,'CD'
UNION ALL SELECT 1,3,NULL,'EF'
UNION ALL SELECT 1,3,NULL,'GH'
;WITH data as
(
SELECT
id1
,id2
,qualifier
,data
,SUM(CASE WHEN qualifier = 'TRUE' THEN 1 ELSE 0 END)
OVER (PARTITION BY id1 + '' + id2) as num_qualifier
from #sampled
)
SELECT
id1
,id2
,qualifier
,data
from data
where num_qualifier > 0
Select *
from yourTable
INNER JOIN (Select UserIDP1, UserIDP2 FROM yourTable WHERE Qualifier=TRUE) B
ON yourTable.UserIDP1 = B.UserIDP1 and YourTable.UserIDP2 = B.UserIDP2
How about a subquery as a where clause?
SELECT *
FROM theTable t1
WHERE CAST(t1.useridp1 AS VARCHAR) + CAST(t1.useridp2 AS VARCHAR) IN
(SELECT CAST(t2.useridp1 AS VARCHAR) + CAST(t.useridp2 AS VARCHAR)
FROM theTable t2
WHERE t2.qualified
);
This is a solution in mysql, but I believe it should transfer to sql server pretty easily. Use a subquery to pick out groups of (id1, id2) combinations with at least one True 'qualifier' row; then join that to the original table on (id1, id2).
mysql> SELECT u1.*
FROM users u1
JOIN (SELECT id1,id2
FROM users
WHERE qualifier
GROUP BY id1, id2) u2
USING(id1, id2);
+------+------+-----------+------+
| id1 | id2 | qualifier | data |
+------+------+-----------+------+
| 1 | 2 | 1 | aa |
| 1 | 2 | 0 | bb |
+------+------+-----------+------+
2 rows in set (0.00 sec)

Consolidate records

I want to consolidate a set of records
(id) / (referencedid)
1 10
1 11
2 11
2 10
3 10
3 11
3 12
The result of query should be
1 10
1 11
3 10
3 11
3 12
So, since id=1 and id=2 has same set of corresponding referenceids {10,11} they would be consolidated. But id=3 s corresponding referenceids are not the same, hence wouldnt be consolidated.
What would be good way to get this done?
Select id, referenceid
From MyTable
Where Id In (
Select Min( Z.Id ) As Id
From (
Select Z1.id, Group_Concat( Z1.referenceid ) As signature
From (
Select id, referenceid
From MyTable
Order By id, referenceid
) As Z1
Group By Z1.id
) As Z
Group By Z.Signature
)
-- generate count of elements for each distinct id
with Counts as (
select
id,
count(1) as ReferenceCount
from
tblReferences R
group by
R.id
)
-- generate every pairing of two different id's, along with
-- their counts, and how many are equivalent between the two
,Pairings as (
select
R1.id as id1
,R2.id as id2
,C1.ReferenceCount as count1
,C2.ReferenceCount as count2
,sum(case when R1.referenceid = R2.referenceid then 1 else 0 end) as samecount
from
tblReferences R1 join Counts C1 on R1.id = C1.id
cross join
tblReferences R2 join Counts C2 on R2.id = C2.id
where
R1.id < R2.id
group by
R1.id, C1.ReferenceCount, R2.id, C2.ReferenceCount
)
-- generate the list of ids that are safe to remove by picking
-- out any id's that have the same number of matches, and same
-- size of list, which means their reference lists are identical.
-- since id2 > id, we can safely remove id2 as a copy of id, and
-- the smallest id of which all id2 > id are copies will be left
,RemovableIds as (
select
distinct id2 as id
from
Pairings P
where
P.count1 = P.count2 and P.count1 = P.samecount
)
-- validate the results by just selecting to see which id's
-- will be removed. can also include id in the query above
-- to see which id was identified as the copy
select id from RemovableIds R
-- comment out `select` above and uncomment `delete` below to
-- remove the records after verifying they are correct!
--delete from tblReferences where id in (select id from RemovableIds) R

duplicate deletion problem

I am working on a database where i have two tables.
BILL_MASTER (master_table)
Bill_Master_ID
Consumer_No
BILL_GENERATION (Detail table)
Bill_Master_ID
Somehow user has inserted two identical consumer #'s in bill master but bill_master_id is different..here is the example of bill_master table
Bill_Master_ID Consumer_No
1 1234567890
2 1234567890
now user has made one transaction of bill_master_id "1" and record exists in Bill_Generation table.
What i want to do is when i pass consumer # in SQL statement as parameter it check if selected consumer# Bill_master_id exist in bill_generation or not. If yes then count return should be 1 else 0.
If I understand you correctly, this should do what you want.
IF (SELECT COUNT(consumer_no) FROM bill_master m
inner join bill_generation g on m.bill_master_id=g.bill_master_id
WHERE consumer_no=1234567890
) > 1
SELECT 1
ELSE
SELECT 0
The title is misleading, this has nothing to do with deletion unless that's additional to the question.
So what you mean is, based on consumer no, you need to find any record in bill_generation for any of the consumer's ids in bill_master?
select case when exists (
select *
from bill_master m
inner join bill_generation g on g.bill_master_id = m.bill_master_id
where m.consumer_no = 1234 -- or some number
) then 1 else 0 end
To actually delete records from bill_master except the lowest id, you can use this
;with tmp as (
select *, rn = row_number() over (partition by consumer_no order by bill_master_id)
from bill_master)
delete tmp where rn > 1