INSERT INTO from tableA to tableB with count() - sql

When inserting rows from one table into another, I try to use count(*) to ensure that the line_no column in table OBJ_LINES is set to 1, 2, 3, 4 and so on for every line added.
INSERT INTO OBJ_LINES(
id,
line_no)
SELECT 1,
(select count(*)+1 FROM OBJ_LINES WHERE id = 1)
FROM TEMPLATE_LINES tmp
WHERE tmp.id = 37;
(Sybase database syntax)
If the TEMPLATE_LINES table holds more than one row, I get a duplicate-key error, as the count() seems to be evaluated only once rather than for every row found in the TEMPLATE_LINES table.
How can I write the SQL to set a 'dynamic' line_no depending on the current number of rows for a given id?

INSERT INTO OBJ_LINES(
id,
line_no)
SELECT 1,
number(*)
FROM TEMPLATE_LINES tmp
WHERE tmp.id = 37;
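A portable way to get the same effect is to read the current row count once up front and add a ROW_NUMBER() offset, so every inserted row gets a distinct line_no. A minimal SQLite sketch via Python's sqlite3 (table and column names are illustrative, not from the original schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
    CREATE TABLE OBJ_LINES (id INTEGER, line_no INTEGER);
    CREATE TABLE TEMPLATE_LINES (id INTEGER, txt TEXT);
    INSERT INTO OBJ_LINES VALUES (1, 1), (1, 2);  -- two lines already exist
    INSERT INTO TEMPLATE_LINES VALUES (37, 'a'), (37, 'b'), (37, 'c');
""")

# Read the current count once, up front, so it cannot shift mid-insert.
offset = cur.execute(
    "SELECT COUNT(*) FROM OBJ_LINES WHERE id = 1").fetchone()[0]

# ROW_NUMBER() numbers the template rows 1..n; adding the offset
# continues the existing numbering instead of restarting at 1.
cur.execute("""
    INSERT INTO OBJ_LINES (id, line_no)
    SELECT 1, ? + ROW_NUMBER() OVER (ORDER BY txt)
    FROM TEMPLATE_LINES
    WHERE id = 37
""", (offset,))

rows = cur.execute(
    "SELECT line_no FROM OBJ_LINES WHERE id = 1 ORDER BY line_no").fetchall()
print(rows)
```

The key design point is that the count is taken exactly once, deliberately, instead of relying on when the database engine re-evaluates a subquery per row.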

Related

Change value of duplicated rows

There is a table with two columns (ID, Data), and there are 3 rows with the same values.
ID Data
4 192.168.0.22
4 192.168.0.22
4 192.168.0.22
Now I want to change the DATA column of the third row. When I run an UPDATE, SQL Server generates an error saying I cannot change the value.
I can delete all 3 rows, but I cannot delete the third row separately.
This table is used by software that I bought, and I have changed the third server's IP.
You can try the following query
create table #tblSimilarValues(id int, ipaddress varchar(20))
insert into #tblSimilarValues values (4, '192.168.0.22'),
(4, '192.168.0.22'),(4, '192.168.0.22')
Use the query below if you want to change all rows:
with oldData as (
select *,
count(*) over (partition by id, ipaddress) as cnt
from #tblSimilarValues
)
update oldData
set ipaddress = '192.168.0.22_1'
where cnt > 1;
select * from #tblSimilarValues
Use the query below if you want to skip the first row:
;with oldData as (
select *,
ROW_NUMBER () over (partition by id, ipaddress order by id, ipaddress) as cnt
from #tblSimilarValues
)
update oldData
set ipaddress = '192.168.0.22_2'
where cnt > 1;
select * from #tblSimilarValues
drop table #tblSimilarValues
You can find a live demo here.
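The skip-the-first-row variant is easy to verify outside SQL Server as well. A small SQLite sketch via Python's sqlite3, using SQLite's implicit rowid in place of ROW_NUMBER() (the table name is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE tblSimilarValues (id INTEGER, ipaddress TEXT)")
cur.executemany("INSERT INTO tblSimilarValues VALUES (?, ?)",
                [(4, '192.168.0.22')] * 3)

# Keep the lowest rowid in each (id, ipaddress) group; these are the
# "first" rows that must stay untouched.
keep = [r[0] for r in cur.execute(
    "SELECT MIN(rowid) FROM tblSimilarValues GROUP BY id, ipaddress")]

# Rewrite every other row so the values become distinguishable.
cur.execute(
    "UPDATE tblSimilarValues SET ipaddress = ipaddress || '_2' "
    f"WHERE rowid NOT IN ({','.join('?' * len(keep))})", keep)

rows = cur.execute(
    "SELECT ipaddress FROM tblSimilarValues ORDER BY rowid").fetchall()
print(rows)
```

Precomputing the rowids to keep before running the UPDATE makes the result deterministic regardless of how the engine schedules subquery evaluation.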
Since there is no column that allows us to distinguish these rows from each other, there's no "third row" (nor a first or second one for that matter).
We can use a ROW_NUMBER function to apply arbitrary row numbers to these rows, however, and if we place that in a CTE, we can apply DELETE/UPDATE actions via the CTE and use the arbitrary row numbers:
declare #t table (ID int not null, Data varchar(15))
insert into #t(ID,Data) values
(4,'192.168.0.22'),
(4,'192.168.0.22'),
(4,'192.168.0.22')
;With ArbitraryAssignments as (
select *,ROW_NUMBER() OVER (PARTITION BY ID, Data ORDER BY Data) as rn
from #t
)
delete from ArbitraryAssignments where rn > 2
select * from #t
This produces two rows of output - one row was deleted.
Note that I say the ROW_NUMBER is arbitrary: one of the expressions appears in both the PARTITION BY and ORDER BY clauses, so by definition no real order is imposed (all rows within a partition share the same value for that expression).
In this case the ID column allows duplicate values, which is wrong; ID should be unique.
What you can do is add a new column and make it unique or a primary key, or change the duplicate values in the ID column and make it unique/a primary key.
Then, using your unique/primary key, you can update the DATA column with a query like this:
UPDATE <Table Name>
SET DATA = 'new data'
WHERE ID = 3;

Merge data from one table to another while removing duplicates - Oracle SQL

I want to copy from one table to another while "filtering the data": I want to remove duplicates based on a column (date of statistic), but the following script copies all the rows. What am I missing, or how should this be handled properly? I am also open to a solution that does not copy to a new table but does it in the original one.
MERGE INTO TEMP temp
USING ORIG orig
ON (temp.DATE_OF_STATISTIC = orig.DATE_OF_STATISTIC)
WHEN MATCHED THEN
UPDATE SET temp.COUNT = temp.COUNT + orig.COUNT
WHEN NOT MATCHED THEN
INSERT (ID, DATE_OF_STATISTIC, COUNT)
VALUES (orig.ID, orig.DATE_OF_STATISTIC, orig.COUNT);
DATE_OF_STATISTIC is a VARCHAR2 column in format dd-mm-yyyy, for example 20-12-2014.
In case of duplicates I have to keep one of the records (it doesn't matter which one) and merge the count values into it.
What am I missing [...]?
The one thing you missed is that MERGE checks for matching rows before actually performing the merge operation, so it does not take into account any rows added during processing. That's why you can end up with duplicate records when ORIG has two rows with the same date.
Your only solution here is to merge on an aggregate sub-query, as already suggested in other answers:
MERGE INTO TEMP2 temp
USING (SELECT MIN(id) "ID", SUM("COUNT") "COUNT", DATE_OF_STATISTIC
FROM ORIG GROUP BY DATE_OF_STATISTIC) orig
-- ^^^^^^^^^^^^^^^^^^^^^^^^^^
-- aggregates rows by DATE_OF_STATISTIC
ON (temp.DATE_OF_STATISTIC = orig.DATE_OF_STATISTIC)
WHEN MATCHED THEN
UPDATE SET temp."COUNT" = temp."COUNT" + orig."COUNT"
WHEN NOT MATCHED THEN
INSERT (ID, DATE_OF_STATISTIC, COUNT)
VALUES (orig.ID, orig.DATE_OF_STATISTIC, orig."COUNT");
If the temp table is empty before processing, this can be reduced further to a simple CREATE ... AS SELECT statement:
CREATE TABLE temp3 AS (SELECT MIN(id) "ID", SUM("COUNT") "COUNT", DATE_OF_STATISTIC
FROM ORIG GROUP BY DATE_OF_STATISTIC);
Or if you really need two different statements, as a CREATE TABLE followed by an INSERT ... SELECT:
CREATE TABLE temp4 .... ;
-- ^^^^^
-- whatever you need here
INSERT INTO temp4 SELECT MIN(id) "ID", SUM("COUNT") "COUNT", DATE_OF_STATISTIC
FROM ORIG GROUP BY DATE_OF_STATISTIC;
Compare all those solutions on http://sqlfiddle.com/#!4/1a42f/1
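For comparison, the same aggregate-first idea can be sketched in SQLite via Python's sqlite3. SQLite has no MERGE, so INSERT ... ON CONFLICT (available in SQLite 3.24+) plays the same role here; all names and sample values are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
    CREATE TABLE temp (id INTEGER, date_of_statistic TEXT UNIQUE, cnt INTEGER);
    CREATE TABLE orig (id INTEGER, date_of_statistic TEXT, cnt INTEGER);
    INSERT INTO temp VALUES (9, '21-12-2014', 1);
    INSERT INTO orig VALUES
        (1, '20-12-2014', 5),
        (2, '20-12-2014', 3),   -- duplicate date in the source
        (3, '21-12-2014', 7);
""")

# Aggregate the source first so each date appears at most once,
# then upsert: add the counts on conflict, insert otherwise.
# (WHERE true disambiguates the ON CONFLICT clause for SQLite's parser.)
cur.execute("""
    INSERT INTO temp (id, date_of_statistic, cnt)
    SELECT MIN(id), date_of_statistic, SUM(cnt)
    FROM orig
    WHERE true
    GROUP BY date_of_statistic
    ON CONFLICT(date_of_statistic) DO UPDATE SET cnt = cnt + excluded.cnt
""")

rows = cur.execute(
    "SELECT id, date_of_statistic, cnt FROM temp ORDER BY date_of_statistic"
).fetchall()
print(rows)
```

Because the GROUP BY collapses the source duplicates before the upsert runs, the problem of MERGE not seeing rows added during processing never arises.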
Maybe this will help. In case of rows with the same date_of_statistic value, it will take the row with the highest id:
MERGE INTO TEMP temp
USING ( SELECT id,
date_of_statistic,
count
FROM(SELECT id,
date_of_statistic,
SUM(count) OVER (PARTITION BY date_of_statistic) count,
ROW_NUMBER() OVER (PARTITION BY date_of_statistic ORDER BY id DESC) rank
FROM ORIG
)
WHERE rank = 1
) orig
ON (temp.DATE_OF_STATISTIC = orig.DATE_OF_STATISTIC)
WHEN MATCHED THEN
UPDATE SET temp.COUNT = temp.COUNT + orig.COUNT
WHEN NOT MATCHED THEN
INSERT (ID, DATE_OF_STATISTIC, COUNT)
VALUES (orig.ID, orig.DATE_OF_STATISTIC, orig.COUNT);
In your USING clause, instead of just the table name, select only unique rows from the table.
Another thing: do you just want to get rid of duplicate rows in the source table? If so, there is no need for a temp table and a merge; just delete the duplicates in the source table.
Do you need a query to remove the duplicates? If so, look at analytic functions. If you mention the rule for picking which duplicates to keep, I can suggest further; some sample data would help.
insert into TEMP
select ID, DATE_OF_STATISTIC, s from (
select orig.ID, orig.DATE_OF_STATISTIC,
sum(orig.COUNT) over(partition by orig.DATE_OF_STATISTIC) s,
row_number() over(partition by orig.DATE_OF_STATISTIC order by orig.id desc) rw
from ORIG orig)
where rw = 1;

How can I delete one of two perfectly identical rows?

I am cleaning out a database table without a primary key (I know, I know, what were they thinking?). I cannot add a primary key, because there is a duplicate in the column that would become the key. The duplicate value comes from one of two rows that are in all respects identical. I can't delete the row via a GUI (in this case MySQL Workbench, but I'm looking for a database-agnostic approach) because it refuses to perform tasks on tables without primary keys (or at least a UQ NN column).
How can I delete one of the twins?
SET ROWCOUNT 1
DELETE FROM [table] WHERE ....
SET ROWCOUNT 0
This will delete only one of the two identical rows.
One option to solve your problem is to create a new table with the same schema, and then do:
INSERT INTO new_table (SELECT DISTINCT * FROM old_table)
and then just rename the tables.
You will of course need roughly as much spare disk space as the table itself occupies to do this!
It's not efficient, but it's incredibly simple.
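A quick way to convince yourself this works, sketched in SQLite via Python's sqlite3 (table names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE old_table (id INTEGER, data TEXT)")
cur.executemany("INSERT INTO old_table VALUES (?, ?)",
                [(4, 'x'), (4, 'x'), (4, 'x'), (5, 'y')])

# Copy only distinct rows into a fresh table, then swap the names.
cur.execute("CREATE TABLE new_table AS SELECT DISTINCT * FROM old_table")
cur.execute("DROP TABLE old_table")
cur.execute("ALTER TABLE new_table RENAME TO old_table")

rows = cur.execute("SELECT * FROM old_table ORDER BY id").fetchall()
print(rows)
```

Note that CREATE TABLE ... AS SELECT copies only the column data, not indexes or constraints; on a real table those would need to be recreated after the rename.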
Note that MySQL has its own extension of DELETE, which is DELETE ... LIMIT, which works in the usual way you'd expect from LIMIT: http://dev.mysql.com/doc/refman/5.0/en/delete.html
The MySQL-specific LIMIT row_count option to DELETE tells the server
the maximum number of rows to be deleted before control is returned to
the client. This can be used to ensure that a given DELETE statement
does not take too much time. You can simply repeat the DELETE
statement until the number of affected rows is less than the LIMIT
value.
Therefore, you could use DELETE FROM some_table WHERE x="y" AND foo="bar" LIMIT 1; note that there isn't a simple way to say "delete everything except one" - just keep checking whether you still have row duplicates.
delete top(1) works on Microsoft SQL Server (T-SQL).
This can be accomplished using a CTE and the ROW_NUMBER() function, as below:
/* Sample Data */
CREATE TABLE #dupes (ID INT, DWCreated DATETIME2(3))
INSERT INTO #dupes (ID, DWCreated) SELECT 1, '2015-08-03 01:02:03.456'
INSERT INTO #dupes (ID, DWCreated) SELECT 2, '2014-08-03 01:02:03.456'
INSERT INTO #dupes (ID, DWCreated) SELECT 1, '2013-08-03 01:02:03.456'
/* Check sample data - returns three rows, with two rows for ID#1 */
SELECT * FROM #dupes
/* CTE to give each row that shares an ID a unique number */
;WITH toDelete AS
(
SELECT ID, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY DWCreated) AS RN
FROM #dupes
)
/* Delete any row that is not the first instance of an ID */
DELETE FROM toDelete WHERE RN > 1
/* Check the results: ID is now unique */
SELECT * FROM #dupes
/* Clean up */
DROP TABLE #dupes
Having a column to ORDER BY is handy, but not necessary unless you have a preference for which of the rows to delete. This will also handle all instances of duplicate records, rather than forcing you to delete one row at a time.
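The same keep-the-first-row pattern can be sketched in SQLite via Python's sqlite3. SQLite cannot DELETE through a CTE, so the row numbers are matched back through the implicit rowid instead (data mirrors the sample above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE dupes (id INTEGER, dwcreated TEXT)")
cur.executemany("INSERT INTO dupes VALUES (?, ?)", [
    (1, '2015-08-03 01:02:03.456'),
    (2, '2014-08-03 01:02:03.456'),
    (1, '2013-08-03 01:02:03.456'),   # second row for ID 1
])

# Number the rows per ID, oldest first, and delete everything
# after the first instance of each ID.
cur.execute("""
    DELETE FROM dupes WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY id ORDER BY dwcreated) AS rn
            FROM dupes)
        WHERE rn > 1)
""")

rows = cur.execute("SELECT id, dwcreated FROM dupes ORDER BY id").fetchall()
print(rows)
```

As in the T-SQL version, the ORDER BY inside the window decides which duplicate survives; here the oldest DWCreated value is kept.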
For PostgreSQL you can do this:
DELETE FROM tablename
WHERE id IN (SELECT id
FROM (SELECT id, ROW_NUMBER()
OVER (partition BY column1, column2, column3 ORDER BY id) AS rnum
FROM tablename) t
WHERE t.rnum > 1);
column1, column2, column3 would be the set of columns that have duplicate values.
This works for PostgreSQL
DELETE FROM tablename WHERE id = 123 AND ctid IN (SELECT ctid FROM tablename WHERE id = 123 LIMIT 1)
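SQLite's analogue of PostgreSQL's ctid is the implicit rowid, so the same one-row trick can be sketched there too, via Python's sqlite3 (names and values are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE tablename (id INTEGER, data TEXT)")
cur.executemany("INSERT INTO tablename VALUES (?, ?)",
                [(123, 'dup'), (123, 'dup'), (123, 'dup')])

# rowid distinguishes otherwise identical rows; LIMIT 1 picks
# exactly one of them to delete.
cur.execute("""
    DELETE FROM tablename
    WHERE rowid IN (SELECT rowid FROM tablename WHERE id = 123 LIMIT 1)
""")

remaining = cur.execute(
    "SELECT COUNT(*) FROM tablename WHERE id = 123").fetchone()[0]
print(remaining)
```

Each run removes exactly one of the identical rows, so repeating the statement whittles the group down to a single survivor.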
Have you tried LIMIT 1? This will delete only one of the rows that match your DELETE query:
DELETE FROM `table_name` WHERE `column_name`='value' LIMIT 1;
In my case I could get the GUI to give me a string of values of the row in question (alternatively, I could have done this by hand). On the suggestion of a colleague, in whose debt I remain, I used this to create an INSERT statement:
INSERT INTO some_table
VALUES ('ID1219243408800307444663', '2004-01-20 10:20:55', 'INFORMATION', 'admin', ...);
I tested the insert statement, so that I now had triplets. Finally, I ran a simple DELETE to remove all of them...
DELETE FROM some_table WHERE logid = 'ID1219243408800307444663';
followed by the INSERT one more time, leaving me with a single row, and the bright possibilities of a primary key.
In case you can add a column, like this:
ALTER TABLE yourtable ADD IDCOLUMN bigint NOT NULL IDENTITY (1, 1)
do so.
Then count rows, grouping by your problem column, where the count > 1; this will identify your twins (or triplets, or whatever).
Then select the rows where your problem column equals the identified content from above, and check the IDs in IDCOLUMN.
Finally, delete from your table where IDCOLUMN equals one of those IDs.
You could use MAX, which was relevant in my case.
DELETE FROM [table] where id in
(select max(id) from [table] group by id, col2, col3 having count(id) > 1)
Be sure to test your results first and to have a limiting condition in your HAVING clause. With such a big delete query you might want to back up your database first.
DELETE TOP (1) FROM tableName
WHERE -- your conditions for filtering identical rows
I added a Guid column to the table and set it to generate a new id for each row. Then I could delete the rows using a GUI.
In PostgreSQL there is an implicit column called ctid. See the wiki. So you are free to use the following:
WITH cte1 as(
SELECT unique_column, max( ctid ) as max_ctid
FROM table_1
GROUP BY unique_column
HAVING count(*) > 1
), cte2 as(
SELECT t.ctid as target_ctid
FROM table_1 t
JOIN cte1 USING( unique_column )
WHERE t.ctid != max_ctid
)
DELETE FROM table_1
WHERE ctid IN( SELECT target_ctid FROM cte2 )
I'm not sure how safe it is to use this when there is a possibility of concurrent updates. So one may find it sensible to issue a LOCK TABLE table_1 IN ACCESS EXCLUSIVE MODE; before actually doing the cleanup.
In case there are multiple duplicate rows to delete, all fields are identical (no differing id), and the table has no primary key, one option is to save the duplicate rows (made distinct) in a new table, delete all the duplicate rows, and insert the rows back. This is helpful if the table is really big and the number of duplicate rows is small.
--- col1 , col2 ... coln are the table columns that are relevant.
--- if not sure add all columns of the table in the select bellow and the where clause later.
--- make a copy of the table T to be sure you can rollback anytime , if possible
--- check @@ROWCOUNT to be sure it's what you want
--- use transactions and rollback in case there is an error
--- first find all with duplicate rows that are identical , this statement could be joined
--- with the first one if you choose all columns
select col1, col2, --- other columns as needed
count(*) c into temp_duplicate from T group by col1, col2 having count(*) > 1
--- save all the rows that are identical only once ( DISTINCT )
select distinct T.* into temp_insert from T, temp_duplicate D where
T.col1 = D.col1 and
T.col2 = D.col2 --- and other columns if needed
--- delete all the rows that are duplicate
delete T from T , temp_duplicate D where
T.col1 = D.col1 and
T.col2 = D.col2 ---- and other columns if needed
--- add the duplicate rows , now only once
insert into T select * from temp_insert
--- drop the temp tables after you check all is ok
If, like me, you don't want to have to list out all the columns of the database, you can convert each row to JSONB and compare by that.
(NOTE: This is incredibly inefficient - be careful!)
select to_jsonb(a.*), to_jsonb(b.*)
FROM
table a
left join table b
on
a.entry_date < b.entry_date
where (SELECT NOT exists(
SELECT
FROM jsonb_each_text(to_jsonb(a.*) - 'unwanted_column') t1
FULL OUTER JOIN jsonb_each_text(to_jsonb(b.*) - 'unwanted_column') t2 USING (key)
WHERE t1.value<>t2.value OR t1.key IS NULL OR t2.key IS NULL
))
Suppose we want to delete duplicate records, keeping only one unique record each, from an Employee table - Employee(id, name, age). Note that the GROUP BY must not include id: if it did, every row would be the MAX of its own single-row group and nothing would be deleted.
delete from Employee
where id not in (select MAX(id)
from Employee
group by name, age
);
You can use limit 1
This works perfectly for me with MySQL
delete from `your_table` [where condition] limit 1;
DELETE FROM Table_Name
WHERE ID NOT IN
(
SELECT MAX(ID) AS MaxRecordID
FROM Table_Name
GROUP BY [FirstName],
[LastName],
[Country]
);
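The NOT IN / MAX(ID) pattern above is easy to verify; here is a SQLite sketch via Python's sqlite3 (table and column names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, firstname TEXT, country TEXT)")
cur.executemany("INSERT INTO people VALUES (?, ?, ?)", [
    (1, 'Ann', 'US'),   # older duplicate of row 2
    (2, 'Ann', 'US'),
    (3, 'Bob', 'UK'),
])

# Keep only the highest id per (firstname, country) group;
# every other row in the group is deleted.
cur.execute("""
    DELETE FROM people
    WHERE id NOT IN (SELECT MAX(id) FROM people GROUP BY firstname, country)
""")

rows = cur.execute("SELECT id FROM people ORDER BY id").fetchall()
print(rows)
```

This relies on id being unique; with fully identical rows (duplicate ids included) the MAX(id) approach cannot distinguish them, which is where the rowid/ctid techniques earlier in this page come in.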

Single Query to delete and display duplicate records

One of the questions asked in an interview was:
One table has 100 records. 50 of them
are duplicates. Is it possible with a single
query to delete the duplicate records
from the table as well as select and
display the remaining 50 records.
Is this possible in a single SQL query?
Thanks
SNA
With SQL Server you would use something like this:
DECLARE #Table TABLE (ID INTEGER, PossibleDuplicate INTEGER)
INSERT INTO #Table VALUES (1, 100)
INSERT INTO #Table VALUES (2, 100)
INSERT INTO #Table VALUES (3, 200)
INSERT INTO #Table VALUES (4, 200)
DELETE FROM #Table
OUTPUT Deleted.*
FROM #Table t
INNER JOIN (
SELECT ID = MAX(ID)
FROM #Table
GROUP BY PossibleDuplicate
HAVING COUNT(*) > 1
) d ON d.ID = t.ID
The OUTPUT statement shows the records that get deleted.
Update:
The query above will delete the duplicates and give you the rows that were deleted, not the rows that remain. If that is important to you (although the remaining 50 rows should be identical to the 50 deleted rows), you could use SQL Server 2008's MERGE syntax to achieve this.
Lieven's Answer is a good explanation of how to output the deleted rows. I'd like to add two things:
If you want to do something with the output other than displaying it, you can specify OUTPUT INTO #Tbl (where #Tbl is a table variable you declare before the delete);
Using MAX, MIN, or any of the other aggregates can only handle one duplicate row per group. If it's possible for you to have many duplicates, the following SQL Server 2005+ code will help do that:
;WITH Duplicates AS
(
SELECT
ID,
ROW_NUMBER() OVER (PARTITION BY DupeColumn ORDER BY ID) AS RowNum
FROM MyTable
)
DELETE FROM MyTable
OUTPUT deleted.*
WHERE ID IN
(
SELECT ID
FROM Duplicates
WHERE RowNum > 1
)
Sounds unlikely, at least in ANSI SQL, since a delete only returns the count of the number of deleted rows.

UPDATE statement in Oracle using SQL or PL/SQL to update first duplicate row ONLY

I'm looking for an UPDATE statement that will update a single duplicate row only and leave the rest of the duplicate rows intact as-is, using ROWID or some other element in Oracle SQL or PL/SQL.
Here is an example duptest table to work with:
CREATE TABLE duptest (ID VARCHAR2(5), NONID VARCHAR2(5));
Run once: INSERT INTO duptest VALUES ('1', 'a');
Run four (4) times: INSERT INTO duptest VALUES ('2', 'b');
Also, the first duplicate row always has to be updated (not deleted), whereas the other three (3) have to remain as-is!
Thanks a lot,
Val.
Will this work for you? The HAVING clause restricts the update to groups that actually contain duplicates, so the lone ('1', 'a') row is left alone:
update duptest
set nonid = 'c'
WHERE ROWID IN (SELECT MIN (ROWID)
FROM duptest
GROUP BY id, nonid
HAVING COUNT(*) > 1)
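A way to check this behaviour outside Oracle: SQLite's implicit rowid stands in for ROWID, and HAVING COUNT(*) > 1 restricts the update to groups that actually have duplicates. A sketch via Python's sqlite3, using the duptest data from the question:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE duptest (id TEXT, nonid TEXT)")
cur.execute("INSERT INTO duptest VALUES ('1', 'a')")
cur.executemany("INSERT INTO duptest VALUES (?, ?)", [('2', 'b')] * 4)

# Update only the first row of each group that is actually duplicated;
# the lone ('1', 'a') row is left alone.
cur.execute("""
    UPDATE duptest SET nonid = 'c'
    WHERE rowid IN (SELECT MIN(rowid)
                    FROM duptest
                    GROUP BY id, nonid
                    HAVING COUNT(*) > 1)
""")

rows = cur.execute("SELECT id, nonid FROM duptest ORDER BY rowid").fetchall()
print(rows)
```

Exactly one of the four ('2', 'b') rows is changed, matching the requirement that the first duplicate is updated and the other three remain as-is.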
This worked for me, even for repeated runs.
--third, update the one row
UPDATE DUPTEST DT
SET DT.NONID = 'c'
WHERE (DT.ID,DT.ROWID) IN(
--second, find the row id of the first dup
SELECT
DT.ID
,MIN(DT.ROWID) AS FIRST_ROW_ID
FROM DUPTEST DT
WHERE ID IN(
--first, find the dups
SELECT ID
FROM DUPTEST
GROUP BY ID
HAVING COUNT(*) > 1
)
GROUP BY
DT.ID
)
I think this should work.
UPDATE DUPTEST SET NONID = 'c'
WHERE ROWID in (
Select rid from (
SELECT ROWID rid, Row_Number() over (Partition By ID, NONID order by ID) rn
FROM DUPTEST
) WHERE rn = 1
)
UPDATE duptest
SET nonid = 'c'
WHERE nonid = 'b'
AND rowid = (SELECT min(rowid)
FROM duptest
WHERE nonid = 'b');
I know that this does not answer your initial question, but there is no key on your table, and the problem you have addressing a specific row results from that.
So my suggestion - if the specific application allows for it - would be to add a key column to your table (e.g. REAL_ID as INTEGER).
Then you could find out the lowest id for the duplicates
select min (real_id)
from duptest
group by (id, nonid)
and update just these rows:
update duptest
set nonid = 'C'
where real_id in (<select from above>)
I'm sure the update statement can be tuned somewhat, but I hope it illustrates the idea.
The advantage is a "cleaner" design (your id column is not really an id), and a more portable solution than relying on the DB-specific versions of rowid.