Consider loading a table from a flat file. The table has no constraints or indexes defined. The load was interrupted partway through, and some time later the table was loaded again from the same file, so the records inserted during the first load are now duplicated. How can I find the duplicate rows? Assume the table has 150 columns, so a GROUP BY on each and every column is tedious.
A record is truly a duplicate only if all of its column values match; a difference in even one column makes it unique. If your table has no primary key or unique constraint, you must compare all columns.
An alternative is to run the second load into a new temporary table and then populate the old table with the records from that temporary table that do not already exist in the old table. Either way, you have to compare all columns between the two tables to identify the truly unique records.
You could also consider adding a primary key to your table and then running your delete query. Check the accepted answer on this link
You can use ROWID to find (and then delete) duplicate rows:
Select * FROM table_name A
WHERE
a.rowid > ANY (
SELECT
B.rowid
FROM
table_name B
WHERE
A.col1 = B.col1
AND
A.col2 = B.col2
);
Here is a useful link:
http://www.dba-oracle.com/t_delete_duplicate_table_rows.htm
Tested... Appears to work...
First, get the table's columns as a comma-separated list:
SELECT wm_concat(column_name)
FROM all_tab_cols
WHERE table_name = 'TABLENAME' AND column_id IS NOT NULL;
Copy the result into the query below wherever ResultList appears, and change TableName to your table.
WITH CTE AS (SELECT TN.*, ROWNUM RN FROM TableName TN ORDER BY ResultList)
SELECT * FROM CTE A
INNER JOIN CTE B USING (ResultList)
WHERE A.RN <> B.RN
The query above joins the table to itself on every column (a USING join over the full column list, which behaves like a natural join). Because duplicate rows have different row numbers, the result set lists both offending records. Rows that contain a NULL in any column will not match in this join, because NULL never compares equal to NULL.
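The idea of generating the column list instead of typing 150 names can be sketched end to end. This is a minimal illustration, assuming SQLite through Python's sqlite3 module as a stand-in for Oracle; the table `demo`, its columns, and the helper `find_duplicate_rows` are hypothetical. It uses GROUP BY over the generated list rather than a self-join, so rows containing NULLs are still matched.

```python
import sqlite3

def find_duplicate_rows(conn, table):
    """Return (column values..., copies) for every row that occurs more than once."""
    # Read the column names from the catalog instead of typing them by hand.
    cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    col_list = ", ".join(f'"{c}"' for c in cols)
    # GROUP BY treats NULLs as equal, so rows with NULLs are grouped together too.
    sql = (
        f'SELECT {col_list}, COUNT(*) AS copies FROM "{table}" '
        f"GROUP BY {col_list} HAVING COUNT(*) > 1"
    )
    return conn.execute(sql).fetchall()

# Hypothetical sample data: two rows were loaded twice, one only once.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (c1 INTEGER, c2 TEXT, c3 TEXT)")
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?)",
    [(1, "a", "x"), (2, "b", None), (1, "a", "x"), (2, "b", None), (3, "c", "z")],
)
duplicates = find_duplicate_rows(conn, "demo")
```

On Oracle the same pattern would read the column names from all_tab_cols, as in the answer above, and build the statement as dynamic SQL.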
I got this snippet somewhere along the line for deleting dups:
DELETE FROM TABLE_NAME
WHERE ROWID IN
(SELECT ROWID FROM TABLE_NAME
MINUS
SELECT MIN(ROWID) FROM TABLE_NAME
GROUP BY <column list> );
Note that <column list> is the list of columns used to determine uniqueness.
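The delete-all-but-MIN(ROWID) pattern can be tried on a small table. This is a hedged sketch, assuming SQLite through Python's sqlite3 module (SQLite also exposes a rowid and spells MINUS as EXCEPT); the table `demo` and its two columns are hypothetical and stand in for the full column list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (col1 INTEGER, col2 TEXT)")

# Simulate the scenario: an interrupted load, then a full reload of the same file.
file_rows = [(1, "a"), (2, "b"), (3, "c")]
conn.executemany("INSERT INTO demo VALUES (?, ?)", file_rows[:2])  # interrupted load
conn.executemany("INSERT INTO demo VALUES (?, ?)", file_rows)      # full reload
before = conn.execute("SELECT COUNT(*) FROM demo").fetchone()[0]

# Keep the lowest rowid in each group of identical rows and delete the rest.
conn.execute(
    """
    DELETE FROM demo
    WHERE rowid IN (
        SELECT rowid FROM demo
        EXCEPT
        SELECT MIN(rowid) FROM demo GROUP BY col1, col2
    )
    """
)
remaining = conn.execute("SELECT col1, col2 FROM demo ORDER BY col1").fetchall()
deleted = before - len(remaining)
```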
Select * FROM table_name A
WHERE
a.rowid > (
SELECT
min (B.rowid)
FROM
table_name B
WHERE
A.col1 = B.col1 AND A.col2 = B.col2 -- list every column that defines a duplicate
);
Suppose you have a test table, dummd (the table you loaded from the flat file), with many columns (say 150), no known unique or primary key column, and duplicate rows. To get all the unique records you can use UNION, which removes duplicates, and then create a view or a new table from the result, as I did with test1:
create table test1
as
select * from dummd
union
select * from dummd
I have two tables, Table A and Table B. Both have the columns Name, Location, and Level. Table A is the initial table and Table B is an updated version of it, so Table B contains new data. I want to write a query that deletes rows from Table B if they are already present in Table A. I don't want to delete rows from Table B that are new.
My approach was like this
Delete From TableB Where Exists(
SELECT * FROM dbo.TableB AS TB
EXCEPT
SELECT * FROM dbo.TableA as TA)
This deletes the data, but it also deletes the newly inserted rows from Table B. Any suggestions are appreciated.
DELETE FROM TableB
WHERE EXISTS (
SELECT 1
FROM TableA
WHERE TableA.Name = TableB.Name AND TableA.Location = TableB.Location AND TableA.Level = TableB.Level
)
If TableA has a primary key, it is enough to compare only that key in the WHERE condition. For example, if the primary key is Name, you can write:
DELETE FROM TableB
WHERE Name IN (SELECT Name FROM TableA)
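A correlated EXISTS delete of this kind can be checked on sample data. This is a minimal sketch, assuming SQLite through Python's sqlite3 module in place of SQL Server; the sample rows are invented. Each comparison is qualified with its table name so that the subquery is evaluated against the TableB row being considered for deletion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE TableA (Name TEXT, Location TEXT, Level INTEGER);
    CREATE TABLE TableB (Name TEXT, Location TEXT, Level INTEGER);
    INSERT INTO TableA VALUES ('Ann', 'Oslo', 1), ('Bob', 'Rome', 2);
    -- TableB holds the old rows plus one new row that must survive.
    INSERT INTO TableB VALUES ('Ann', 'Oslo', 1), ('Bob', 'Rome', 2), ('Cy', 'Lima', 3);
    """
)

# Delete only the TableB rows that have an identical row in TableA.
conn.execute(
    """
    DELETE FROM TableB
    WHERE EXISTS (
        SELECT 1 FROM TableA
        WHERE TableA.Name = TableB.Name
          AND TableA.Location = TableB.Location
          AND TableA.Level = TableB.Level
    )
    """
)
remaining_b = conn.execute("SELECT Name, Location, Level FROM TableB").fetchall()
```

If the column names inside the subquery are left unqualified, they resolve to the inner table, the condition is always true, and every row of TableB is deleted.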
My code looks like:
CREATE TABLE tableC AS
(SELECT A.*,
ST_Intersection (B.geom, A.geom) AS geom2 -- generate geom
FROM tableA AS A
JOIN tableB AS B
ON ST_Intersects (A.geom, B.geom)
WHERE A.id = 2); -- assumes id is a column of tableA
Now it works, but I have two columns, geom and geom2!
I want the geom column to hold the new geometry based on the intersection. So how can I select everything from tableA except the geom column?
Create the table with all the columns and after that drop the geom column and rename the new one:
CREATE TABLE tableC AS
SELECT
A.*,
ST_Intersection (B.geom, A.geom) AS geom2 -- generate geom
FROM
tableA AS A INNER JOIN tableB AS B ON ST_Intersects (A.geom, B.geom)
WHERE A.id = 2 -- assumes id is a column of tableA
;
alter table tableC drop column geom;
alter table tableC rename column geom2 to geom;
The only way you would be able to do this would be to generate a dynamic SQL statement based on the columns within the table that excludes those you don't want. Obviously this will be a lot more effort than simply adding in all the column names.
There are also a lot of very good reasons to never include a select * in a production environment, given how picky SQL often is on the number and format of columns that are returned. By using select * you open yourself up to a changing query result in the future that could potentially break things.
If you have a LOT of columns and you simply don't want to manually type them all out, run the query below for your table and then format the result so you can copy/paste into your script:
SELECT column_name
FROM information_schema.columns
WHERE table_schema = 'your_schema'
AND table_name = 'your_table'
ORDER BY ordinal_position
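The dynamic-SQL approach described above can be sketched as a small helper that reads the column names from the catalog and leaves out the unwanted ones. This is only an illustration, assuming SQLite through Python's sqlite3 module, where PRAGMA table_info plays the role of information_schema.columns; the helper `select_list_except` and the sample table layout are hypothetical.

```python
import sqlite3

def select_list_except(conn, table, excluded):
    """Build a comma-separated column list for `table`, skipping `excluded`."""
    skip = {name.lower() for name in excluded}
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    cols = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    return ", ".join(c for c in cols if c.lower() not in skip)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (id INTEGER, name TEXT, geom BLOB)")

# Every column of tableA except geom, ready to paste into a SELECT.
column_list = select_list_except(conn, "tableA", ["geom"])
query = f"SELECT {column_list} FROM tableA"
```

The generated list can be pasted into the script once, which keeps the final query explicit rather than relying on select *.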
I have 30 tables. The "ID" column value is unique, so I can set it as the primary key. Each table has 100,000 rows.
ID value
397 3209166.899725
How can I verify that all rows were integrated into the big table, with none left out?
For each small table, left join the big table and check for NULL rows?
Or
SELECT s.* FROM small_table s
WHERE NOT EXISTS
(
SELECT 1
FROM Big_table b
WHERE b.ID = s.ID
)
Are there better ways?
Thanks
You can use the EXCEPT set operator:
select id from small_table
except
select id from big_table
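The EXCEPT check can be tried on a toy version of the tables. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; apart from the row with ID 397 quoted in the question, the sample rows are invented. An empty result means every ID from the small table reached the big table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE small_table (id INTEGER PRIMARY KEY, value REAL);
    CREATE TABLE big_table (id INTEGER PRIMARY KEY, value REAL);
    INSERT INTO small_table VALUES (397, 3209166.899725), (398, 1.5), (399, 2.5);
    -- Row 398 is deliberately missing from the big table.
    INSERT INTO big_table VALUES (397, 3209166.899725), (399, 2.5);
    """
)

# IDs present in the small table but absent from the big table.
missing_ids = conn.execute(
    "SELECT id FROM small_table EXCEPT SELECT id FROM big_table"
).fetchall()
```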
I have two tables with the same number of columns: Table A and Table B.
Every day I insert data from Table B into Table A. The insert query currently works:
insert into table_a (select * from table_b);
But this also re-inserts the data that was inserted earlier. I only want the rows that are new or have changed since the last load. How can this be done?
You can use minus:
insert into table_a
select *
from table_b
minus
select *
from table_a;
This assumes that by "duplicate" you mean that all the columns are duplicated.
If you have a timestamp field, you could use it to limit the records to those created after the last copy.
Another option, assuming you have a primary key (the id column in my example) that tells you whether a record has already been copied, is to create a table c (with the same structure as a and b) and do the following, where a is the source table and b is the target:
insert into c
select a.* from a
left join b on (a.id = b.id)
where b.id is null;
insert into b select * from c;
truncate table c;
You need to adjust this query to use the actual primary key.
Hope this helps!
If the tables have a primary or unique key, then you could leverage that in an anti-join:
insert into table_a
select *
from table_b b
where not exists (
select null
from table_a a
where
a.pk_field_1 = b.pk_field_1 and
a.pk_field_2 = b.pk_field_2
)
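The anti-join insert can be checked for repeatability by running it twice. This is a hedged sketch, assuming SQLite through Python's sqlite3 module and a single hypothetical key column `id` in place of pk_field_1 and pk_field_2; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_a (id INTEGER PRIMARY KEY, val TEXT);
    CREATE TABLE table_b (id INTEGER PRIMARY KEY, val TEXT);
    INSERT INTO table_a VALUES (1, 'one');
    INSERT INTO table_b VALUES (1, 'one'), (2, 'two');
    """
)

# Insert only the table_b rows whose key is not yet present in table_a.
INSERT_NEW = """
    INSERT INTO table_a
    SELECT * FROM table_b b
    WHERE NOT EXISTS (SELECT NULL FROM table_a a WHERE a.id = b.id)
"""

conn.execute(INSERT_NEW)
count_after_first = conn.execute("SELECT COUNT(*) FROM table_a").fetchone()[0]
conn.execute(INSERT_NEW)  # the second run finds nothing new to insert
count_after_second = conn.execute("SELECT COUNT(*) FROM table_a").fetchone()[0]
rows_in_a = conn.execute("SELECT id, val FROM table_a ORDER BY id").fetchall()
```

Because the check is on the key only, a row whose key already exists but whose other columns changed is not picked up; that case needs an update or a MERGE.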
You don't say what your key is. Assuming you have a key column ID and only want the IDs that are not already in Table A, you can also use a MERGE statement:
MERGE INTO A USING B ON (A.ID = B.ID)
WHEN NOT MATCHED THEN INSERT (... columns of A) VALUES (... columns of B)
I want to accomplish something of the following:
Select DISTINCT(tableA.column) INTO tableB.column FROM tableA
The goal would be to select a distinct data set and then insert that data into a specific column of a new table.
SELECT column INTO tableB FROM tableA
SELECT INTO will create a table as it inserts new records into it. If that is not what you want (if tableB already exists), then you will need to do something like this:
INSERT INTO tableB (
column
)
SELECT DISTINCT
column
FROM tableA
Remember that if tableB has more columns than just the one, you will need to list the columns you are inserting into (as I have done in my example).
You're pretty much there.
SELECT DISTINCT column INTO tableB FROM tableA
It's going to insert into whatever column(s) are specified in the select list, so you would need to alias your select values if you need to insert into columns of tableB that aren't in tableA.
SELECT INTO
Try the following...
INSERT INTO tableB (column)
Select DISTINCT(tableA.column)
FROM tableA
The goal would be to select a distinct data set and then insert that data into a specific column of a new table.
I don't know what the schema of tableB is... if table B already exists and there is no unique constraint on the column you can do as any of the others suggest here....
INSERT INTO tableB (column) Select DISTINCT(tableA.column) FROM tableA
but if you have a unique constraint on table B and it already exists you'll have to exclude those values already in table B...
INSERT INTO tableB (column)
Select DISTINCT(tableA.column)
FROM tableA
WHERE tableA.column NOT IN (SELECT /* NOTE */ tableB.column FROM tableB)
-- NOTE: Remember if there is a unique constraint you don't need the more
-- costly form of a "SELECT DISTINCT" in this subquery against tableB
-- This could be done in a number of different ways - this is just
-- one version. Best version will depend on size of data in each table,
-- indexes available, etc. Always prototype different ways and measure perf.
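The NOT IN variant can be tried against a target table that has a unique constraint. This is a minimal sketch, assuming SQLite through Python's sqlite3 module; the column is named `col` here because `column` is a reserved word, and the sample values are invented. It also assumes the column contains no NULLs, since a NULL in the subquery result makes NOT IN match nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tableA (col TEXT);
    CREATE TABLE tableB (col TEXT UNIQUE);
    INSERT INTO tableA VALUES ('x'), ('y'), ('x'), ('z');
    INSERT INTO tableB VALUES ('x');
    """
)

# Insert each distinct value once, skipping values tableB already holds,
# so the unique constraint on tableB.col is never violated.
conn.execute(
    """
    INSERT INTO tableB (col)
    SELECT DISTINCT a.col
    FROM tableA a
    WHERE a.col NOT IN (SELECT b.col FROM tableB b)
    """
)
values_in_b = conn.execute("SELECT col FROM tableB ORDER BY col").fetchall()
```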