SQL loop on duplicate rows to combine into one - sql

I have something to fix in my database. Here it is:
I have a table with duplicate rows, like this:
The duplicated columns are IDPatient and IDObjet. The same pair should never appear twice, which is why I put a key on both columns, but it's a bit too late... so I have to fix this by combining these duplicate rows into one without losing data, and putting the values in order.
For example, as you can see in the picture, the Texte_1 column of each row contains a date: 2010-11-25 and 2011-11-04. The date 2010-11-25 comes before 2011-11-04, so I have to put 2011-11-04 into the Texte_2 column of the first row, and loop like that for each value I have in the row, verifying whether the date is older or not. If it is, I have to replace the value in row one with the one from the second row, hold the replaced value in a temp variable, then find a new column ("Texte_X") in the same row to insert the replaced value into, validating at the same time that it is not older.
I can have multiple duplicate rows in my table, and I know looping in SQL Server is slow, but I would really appreciate a good solution for this.
Here's an example of multiple duplicate rows:

How about a MERGE:
merge mytable as t
using (
    select IDPatient, IDObjet, max(Texte_1) dt
    from mytable
    group by IDPatient, IDObjet
) s on t.IDPatient = s.IDPatient
   and t.IDObjet = s.IDObjet
   and t.Texte_1 != s.dt
when matched then delete;

You could use the ROW_NUMBER() function and your ID field to order the duplicates, then PIVOT to de-normalize the records, or use self-joins, like:
;WITH cte AS (
    SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY IDPatient, IDObjet ORDER BY ID)
    FROM YourTable
)
SELECT a.IDPatient, a.IDObjet, a.Texte_1, b.Texte_1 AS Texte_2, c.Texte_1 AS Texte_3
FROM cte a
LEFT JOIN cte b
    ON a.IDPatient = b.IDPatient
    AND a.IDObjet = b.IDObjet
    AND b.RN = 2
LEFT JOIN cte c
    ON a.IDPatient = c.IDPatient
    AND a.IDObjet = c.IDObjet
    AND c.RN = 3
WHERE a.RN = 1
This assumes the ID order is sufficient; you could change it to your date field if needed. Since you ultimately want to remove the duplicate lines, you could either run this query into a new table, or, after you use it as the basis of your update, DELETE the records from the cte above where RN > 1, as sketched below.
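A minimal sketch of that cleanup step, assuming the same YourTable, IDPatient, IDObjet and ID names as above (SQL Server allows a DELETE through a CTE):
;WITH cte AS (
    SELECT *, RN = ROW_NUMBER() OVER (PARTITION BY IDPatient, IDObjet ORDER BY ID)
    FROM YourTable
)
-- removes every row except the first per (IDPatient, IDObjet) group
DELETE FROM cte WHERE RN > 1;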
Personally, I would avoid the de-normalized Texte_1-10 structure, and add a new field that's the equivalent of the RN field as part of the key.
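A sketch of what that normalized structure might look like; the table and column names here are hypothetical, not from the question:
CREATE TABLE PatientObjetTexte (
    IDPatient int NOT NULL,
    IDObjet   int NOT NULL,
    Seq       int NOT NULL,          -- the equivalent of the RN field
    Texte     varchar(100) NOT NULL, -- one value per row instead of Texte_1..Texte_10
    CONSTRAINT PK_PatientObjetTexte PRIMARY KEY (IDPatient, IDObjet, Seq)
);
One row per value means an eleventh entry needs no new column, and ordering is simply ORDER BY Seq.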

Related

Redshift query for comparing current row to previous row - sql

I have a table test with fields A (ID) and B (Flag). I need to add a new column, C (Result), to this table, and its value will be derived from the B (Flag) field. If the flag is false, keep checking previous rows until we get a flag of true, then take the value of the A (ID) field and populate it in the C (Result) column. So C will hold the last value of A for which the B field was true.
I have the query in SQL, but when I try to use it in Redshift I get the following errors.
1st Query Option:
WITH
cte1 AS (
    SELECT A, SUM(B='T') OVER (ORDER BY A) group_no
    FROM test
),
cte2 AS (
    SELECT A, MIN(A) OVER (PARTITION BY group_no) previous_T
    FROM cte1
)
UPDATE test
JOIN cte2 USING (A)
SET test.C = cte2.previous_T;
I am getting errors in the SUM and MIN functions.
2nd Query Option:
UPDATE test
JOIN (
    SELECT A,
           @tmp := CASE WHEN B='T' THEN A ELSE @tmp END C
    FROM test
    JOIN (SELECT @tmp := 0) init
    ORDER BY A
) data USING (A)
SET test.C = data.C;
I am getting an error on the temporary variable.
I am new to SQL with no experience in Redshift; I would appreciate any help I can get. Thanks!
I only have a few minutes, but I think I can get you started. Let's stick to query #1.
SUM(B='T') isn't going to work in Redshift. Look at the function DECODE(), as it will allow you to switch how a column is generated based on another column.
It looks like you want to do a rolling SUM(), so you will need a frame clause (likely ROWS UNBOUNDED PRECEDING).
It's not clear why you want the SUM() of IDs. I'd think you would want MAX(), as this will give you the highest preceding ID. Advise if I'm missing something.
I'd think you would want something like:
DECODE(B='T', true, A, MAX(A) OVER (ORDER BY A ROWS UNBOUNDED PRECEDING)) AS prev_id
But I don't know your data or the process you are trying to implement.
As for the UPDATE, I suggest you look at the Redshift docs. It will look more like:
…
update test set c = cte.prev_id
from cte
where test.id = cte.id;
This assumes the id is unique and that I have half a clue what you are trying to do.
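Putting those pieces together, a fuller sketch might look like the following. It uses CASE instead of DECODE() for the conditional (the two are interchangeable here), and a plain subquery rather than a CTE, since Redshift restricts WITH clauses in UPDATE statements; treat it as a starting point built on my assumptions about your data, not a tested answer:
UPDATE test
SET C = s.prev_id
FROM (
    SELECT A,
           -- last A seen (in A order) whose flag was 'T', including the current row
           MAX(CASE WHEN B = 'T' THEN A END)
               OVER (ORDER BY A ROWS UNBOUNDED PRECEDING) AS prev_id
    FROM test
) s
WHERE test.A = s.A;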

To Remove Duplicates from Netezza Table

I have a scenario for a Type 2 table where I have to remove duplicates at the whole-row level.
Let's consider the below example as the data in the table.
A|B|C|D|E
100|12-01-2016|2|3|4
100|13-01-2016|3|4|5
100|14-01-2016|2|3|4
100|15-01-2016|5|6|7
100|16-01-2016|5|6|7
If you consider A as the key column, you can see that the last 2 rows are duplicates.
Generally, to find duplicates, we use GROUP BY:
select A, C, D, E, count(*)
from table
group by A, C, D, E
having count(*) > 1
For this, the output would flag both 100|2|3|4 and 100|5|6|7 as duplicates.
However, only 100|5|6|7 is a duplicate in Type 2 terms, and not 100|2|3|4, because that value came back in the 3rd run rather than right after the 1st load.
If I add the date field to the GROUP BY, 100|5|6|7 will no longer be considered a duplicate, but in reality it is one.
I am trying to figure out duplicates as explained above:
the duplicates should be only 100|5|6|7 and not 100|2|3|4.
Can someone please help out with SQL for this?
Regards
Raghav
Use the ROW_NUMBER() analytic function to get rid of duplicates.
delete from
(
    select a, b, c, d, e, row_number() over (partition by a, b, c, d, e) as rownumb
    from table
) as a
where rownumb > 1
If you want to see all duplicated rows, you need to join the table with your GROUP BY query, or filter the table using the group query as a subquery.
WITH cte AS (
    select a, b, c, d, e, count(*) as cnt
    from TABLE
    group by 1, 2, 3, 4, 5
    having count(*) > 1
)
SELECT * FROM cte
WHERE B <> B + 1
Try this query and see if it works; in case you get any errors, let me know.
I am assuming that your column B is in date format; if not, cast it to a date.
Once you can see the duplicates, just replace the SELECT * with a DELETE.
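For the Type 2 rule itself (a row is a duplicate only when the immediately preceding version of the same key, ordered by date, carries identical values), a LAG()-based sketch may be closer to what is being asked. Table and column names are taken from the example above; treat this as a starting point rather than tested Netezza code:
select a, b, c, d, e
from (
    select a, b, c, d, e,
           lag(c) over (partition by a order by b) as prev_c,
           lag(d) over (partition by a order by b) as prev_d,
           lag(e) over (partition by a order by b) as prev_e
    from table
) x
where c = prev_c and d = prev_d and e = prev_e
On the sample data this returns only 100|16-01-2016|5|6|7, since 100|14-01-2016|2|3|4 is preceded by different values. To delete, join the resulting (a, b) pairs back against the table.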

UPDATE random row from another table SQL Server 2014

I tried to do an UPDATE statement with a random row from another table. I know this question has been asked before (here), but the answer doesn't seem to work for me.
I need to update each row with a different value from the other table. In my case it only gets one random row from the table and puts that in every row.
UPDATE dbo.TABLE_CHARGE
SET COLRW_STREET =
(SELECT TOP 1 COLRW_STREET FROM CHIEF_PreProduction.dbo.TABLE_FAKESTREET
ORDER BY ABS(CHECKSUM(NewId())%250))
Thanks in advance!
I took the liberty of assuming that you have an ID field in your TABLE_CHARGE table. This is probably not the most efficient way, but it seems to work:
WITH random_values as
(
    SELECT t.id, t.COLRW_STREET, t.random_street FROM (
        SELECT c.id, c.COLRW_STREET,
               f.COLRW_STREET as random_street,
               ROW_NUMBER() OVER (PARTITION BY c.id ORDER BY ABS(CHECKSUM(NewId()) % 250)) rn
        FROM table_charge c, TABLE_FAKESTREET f) t
    WHERE t.rn = 1
)
UPDATE random_values SET COLRW_STREET = random_street;
Your original code did not work because when you do ... SET x = (SELECT TOP 1 ...), the database evaluates the TOP 1 subquery once and joins that single row to every row of your target table, which means one value is applied to all rows. Hence you have the same value in all rows.
The following query demonstrates what is happening in the UPDATE:
SELECT * FROM
TABLE_CHARGE tc,
(SELECT TOP 1 COLRW_STREET as random_street FROM TABLE_FAKESTREET
ORDER BY ABS(CHECKSUM(NewId())%250)) t
My solution orders all the fake records randomly for each record in the target table and selects only the first one per ID.

My tricky SQL Update query not working so well

I am trying to update a table in my database with a row from another table. I have two parameters, one being the ID and the other being the row number (as you can select which row you want from the GUI).
This part of the code works fine; it returns one column of a single row:
(SELECT txtPageContent
 FROM (select *, Row_Number() OVER (ORDER BY ArchiveDate asc) as rowid
       from ARC_Content Where ContentID = @ContentID) as test
 Where rowid = @rowID)
It's just when I try to add the UPDATE/SET that it won't work. I am probably missing something:
UPDATE TBL_Content
Set TBL_Content.txtPageContent = (select txtPageContent
    FROM (select *, Row_Number() OVER (ORDER BY ArchiveDate asc) as rowid
          from ARC_Content Where ContentID = @ContentID) as test
    Where rowid = @rowID)
Thanks for the help! (I have tried TOP 1 to no avail.)
I see a few issues with your update. First, I don't see any joining or selection criteria for the table that you're updating. That means that every row in the table will be updated with this new value. Is that really what you want?
Second, the row number between what is on the GUI and what you get back in the database may not match. Even if you reproduce the query used to create your list in the GUI (which is dangerous anyway, since it involves keeping the update and the select code always in sync), it's possible that someone could insert or delete or update a row between the time that you fill your list box and send that row number to the server for the update. It's MUCH better to use PKs (probably IDs in your case) to determine which row to use for updating.
That said, I think that the following will work for you (untested):
;WITH cte AS (
SELECT
txtPageContent,
ROW_NUMBER() OVER (ORDER BY ArchiveDate ASC) AS rowid
FROM
ARC_Content
WHERE
ContentID = @ContentID)
UPDATE
TC
SET
txtPageContent = cte.txtPageContent
FROM
TBL_Content TC
INNER JOIN cte ON
rowid = @rowID
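Following the advice above about keying on IDs rather than row numbers, a hedged variant might look like this. It assumes ARC_Content has a primary key (called ArchiveID here, a made-up name) that the GUI passes instead of @rowID, and that TBL_Content carries a matching ContentID:
UPDATE TC
SET txtPageContent = AC.txtPageContent
FROM TBL_Content TC
INNER JOIN ARC_Content AC
    ON AC.ContentID = TC.ContentID  -- assumed relationship between the two tables
WHERE AC.ArchiveID = @ArchiveID;    -- @ArchiveID is hypothetical
This sidesteps the risk of the GUI's row numbers drifting out of sync with the ROW_NUMBER() ordering.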

SQL Delete low counts

I have a table with this data:
Id Qty
-- ---
A 1
A 2
A 3
B 112
B 125
B 109
But I'm supposed to only have the max values for each id. Max value for A is 3 and for B is 125. How can I isolate (and delete) the other values?
The final table should look like this :
Id Qty
-- ---
A 3
B 125
Running MySQL 4.1
Oh wait. I've got a simpler solution:
I'll select all the max values (grouped by id), export the data, flush the table, and reimport only the max values.
CREATE TABLE tabletemp LIKE table;
INSERT INTO tabletemp SELECT id,MAX(qty) FROM table GROUP BY id;
DROP TABLE table;
RENAME TABLE tabletemp TO table;
Thanks to all !
Try this in SQL Server:
delete o
from tbl o
left outer join
    (select max(qty) anz, id
     from tbl i
     group by i.id) k on o.id = k.id and k.anz = o.qty
where k.id is null
Revision 2 for MySQL... Can anyone check this one?
delete from tbl
where concat(id, qty) not in
    (select concat(id, anz) from (select max(qty) anz, id
                                  from tbl i
                                  group by i.id) x)
Explanation:
Since I was supposed to not use joins (see comments about MySQL support for joins in delete/update/insert), I moved the subquery into an IN (a, b, c) clause.
Inside an IN clause I can use a subquery, but that query is only allowed to return one field. So in order to filter out all elements that are not the maximum, I need to concat both fields into a single one, so I can return it inside the IN clause. Basically, my query inside the IN returns only the biggest ID+QTY per group. To compare it with the main table, I also need to concat on the outside, so the data from both fields matches.
Basically the IN clause contains:
("A3", "B125")
Disclaimer: The above query is "evil!" since it applies a function (concat) to the fields it compares against. This will make any index on those fields almost useless. You should never formulate a query this way if it runs on a regular basis. I only wanted to try to bend it so it works on MySQL.
Example of this "bad construct"
(get all orders from the last 2 weeks):
select ... from orders where orderday + 14 > now()
You should always do:
select ... from orders where orderday > now() - 14
The difference is subtle: version 2 only has to do the math once and can use the index, while version 1 has to do the math for every single row in the orders table, and you can forget about index usage.
I'd try this:
delete from T
where exists (
select * from T as T2
where T2.Id = T.Id
and T2.Qty > T.Qty
);
For those who might have a similar question in the future, this might be supported in MySQL some day (it works now in SQL Server 2005 and later).
It won't require a join, and it has advantages over the use of a temporary table if the table has dependencies:
with Tranked(Id, Qty, rk) as (
    select Id, Qty,
           rank() over (
               partition by Id
               order by Qty desc
           )
    from T
)
delete from Tranked
where rk > 1;
You'll have to go via another table. Among other things, what makes a single DELETE statement quite impossible here in MySQL is that you can't delete from a table and use the same table in a subquery.
BEGIN;
create temporary table tmp_del select id, max(qty) as qty from the_tbl group by id;
delete the_tbl from the_tbl, tmp_del where
    the_tbl.id = tmp_del.id and the_tbl.qty < tmp_del.qty;
drop table tmp_del;
END;
MySQL 4.0 and later supports a simple multi-table syntax for DELETE:
DELETE t1 FROM MyTable t1 JOIN MyTable t2 ON t1.id = t2.id AND t1.qty < t2.qty;
This produces a join of each row with a given id to all other rows with the same id, and deletes only the row with the lesser qty in each pairing. After this is all done, the row with the greatest qty per id group is the only one left undeleted.
If you only have one row with a given id, it still works, because a single row is naturally the one with the greatest value.
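To see which pairings the join produces before any deletion, here is an illustrative SELECT over the question's sample data:
-- shows the (lesser, greater) qty pairs the self-join generates
SELECT t1.id, t1.qty AS lesser_qty, t2.qty AS greater_qty
FROM MyTable t1
JOIN MyTable t2 ON t1.id = t2.id AND t1.qty < t2.qty;
For id B this returns the pairs (109, 112), (109, 125) and (112, 125); rows 109 and 112 each appear on the lesser side at least once, so both are deleted, and 125 survives.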
FWIW, I just tried my solution using MySQL 5.0.75 on a Macbook Pro 2.40GHz. I inserted 1 million rows of synthetic data, with different numbers of rows per "group":
2 rows per id completes in 26.78 sec.
5 rows per id completes in 43.18 sec.
10 rows per id completes in 1 min 3.77 sec.
100 rows per id completes in 6 min 46.60 sec.
1000 rows per id didn't complete before I terminated it.