Insert distinct values from one table into another table - sql

So for each distinct value in a column of one table I want to insert that unique value into a row of another table.
list = select distinct(id) from table0
for distinct_id in list
insert into table1 (id) values (distinct_id)
end
Any ideas as to how to go about this?

Whenever you think about doing something in a loop, step back, and think again. SQL is optimized to work with sets. You can do this using a set-based query without the need to loop:
INSERT dbo.table1(id) SELECT DISTINCT id FROM dbo.table0;
There are some edge cases where looping can make more sense, but as SQL Server matures and more functionality is added, those edge cases get narrower and narrower...

insert into table1 (id)
select distinct id from table0

The following statement works with me.
insert into table1(col1, col2) select distinct on (col1) col1 col2 from table0

The below query will also check the existing data in the Table2.
INSERT INTO Table2(Id) SELECT DISTINCT Id FROM Table1 WHERE Id NOT IN(SELECT Id FROM Table2);

Other Simple way to copy distinct data with multiple columns from one table to other
Insert into TBL2
Select * from (Select COL1, ROW_NUMBER() over(PARTITION BY COL1 Order By COL1) AS COL2 From TBL1)T
where T.COL2 = 1

Related

adjusting the insert into SQL

I got this already working;
INSERT INTO TermsFinal
(old_classification, count, new_classification, old_term,new_term)
SELECT old_classification , Count(seed) AS count , new_classification, old_term, new_term FROM TermsTemp
GROUP BY old_classification
ORDER BY count DESC
There is one more field in the TermsFinal called SOURCE_TABLE which TermsTemp does not have.
I would like to populate that field too. I already got the $source_table value. I tried this but dis not work.
INSERT INTO TermsFinal
(SOURCE_TABLE,old_classification, count, new_classification, old_term,new_term)
'{$SOURCE_TABLE}', SELECT old_classification , Count(seed) AS count , new_classification, old_term, new_term FROM TermsTemp_TEMP
GROUP BY old_classification
ORDER BY count DESC
How do you add that value into the SOURCE_TABLE field of the TermsFinal while executing the insert into statement in one go?
and the other puzzling thing to me here, how come my first SQL insertinto works without the SQL keyword VALUES. This page http://www.w3schools.com/sql/sql_insert.asp teaches that VALUES part is needed!
You can put string (or any other type ) constant into select, for example
select 'string' as const_str , field1 from table1 will return 2 columns , the first will have "string" text for all rows. In your case you can do
INSERT INTO TermsFinal
(SOURCE_TABLE,old_classification, count, new_classification, old_term,new_term)
SELECT '{$SOURCE_TABLE}', old_classification , Count(seed) AS count , new_classification, old_term, new_term FROM TermsTemp_TEMP
GROUP BY old_classification
ORDER BY count DESC
There are several ways to insert data into a table
One row at a time. This is where you need the values keyword.
Insert into TableA (Col1, Col2,...) values (#Val1, #Val2,...)
You can get the id using select ##identity (at least in ms sql), if you have auto identity
on. This is useful when you need the ID for you next insert.
From select (what you're doing)
Insert into tableA (Col1, Col2)
Select Val1, Val2 from TableB
or insert a hard coded value and values from two separate tables
Insert into tableA (Col1, Col2, Col3, Col4)
Select 'hard coded value', b.Val1, b.Val2, c.Val1
from TableB b join Table c on b.TableCID=c.ID
Now you can insert more than one row at once.
Select into
This method ends up creating a new table from a query, and is great for quick backups of a table, or part of a table.
select * into TermsFinal_backup from TermsFinal where ...

How to select the record from table and insert into another table?

I wanted to select the last record from the table1 and insert into another table .Here is my query.
Insert into table2 values(select top 1 col1,col2 from table1 order by id desc).
I know for adding the value into table,need to be in cotation.But where to add?
You can select literals to fill in the other columns that table1 can't provide, something like this:
insert into table2 (col_a, col_b, col_c, col_d)
select top 1 col1, col2, 'foo', 'bar'
from table1
order by id desc
Any columns you do not name in the column list will get the default value, or null if no default is defined.
The number and type of columns selected must match the number and type of columns in the insert column list.
In SQL, there are essentially basically two ways to INSERT data into a table: One is to insert it one row at a time, the other is to insert multiple rows at a time. Let's take a look at each of them individually:
INSERT INTO table_name (column1, column2, ...)
VALUES ('value1', 'value2', ...)
The second type of INSERT INTO allows us to insert multiple rows into a table. Unlike the previous example, where we insert a single row by specifying its values for all columns, we now use a SELECT statement to specify the data that we want to insert into the table. If you are thinking whether this means that you are using information from another table, you are correct. The syntax is as follows:
INSERT INTO table1 (column1, column2, ...)
SELECT t2.column3, t2.column4, ...
FROM table2 t2
So, in you case, you can do it like this:
Insert into table2
select top 1 t1.col1,t1.col2 from table1 t1 order by id desc
Or you can use your syntax like this:
declare #col1 type_of_col1, #col2 type_of_col2
select top 1 #col1 = t1.col1, #col2 = t1.col2 from table1 t1 order by id desc
Insert into table2 values(#col1, #col2)
Offcourse, this all works assuming that the column datatypes are matched.

Efficiently duplicate some rows in PostgreSQL table

I have PostgreSQL 9 database that uses auto-incrementing integers as primary keys. I want to duplicate some of the rows in a table (based on some filter criteria), while changing one or two values, i.e. copy all column values, except for the ID (which is auto-generated) and possibly another column.
However, I also want to get the mapping from old to new IDs. Is there a better way to do it then just querying for the rows to copy first and then inserting new rows one at a time?
Essentially I want to do something like this:
INSERT INTO my_table (col1, col2, col3)
SELECT col1, 'new col2 value', col3
FROM my_table old
WHERE old.some_criteria = 'something'
RETURNING old.id, id;
However, this fails with ERROR: missing FROM-clause entry for table "old" and I can see why: Postgres must be doing the SELECT first and then inserting it and the RETURNING clauses only has access to the newly inserted row.
RETURNING can only refer to the columns in the final, inserted row. You cannot refer to the "OLD" id this way unless there is a column in the table to hold both it and the new id.
Try running this which should work and will show all the possible values that you can get via RETURNING:
INSERT INTO my_table (col1, col2, col3)
SELECT col1, 'new col2 value', col3
FROM my_table AS old
WHERE old.some_criteria = 'something'
RETURNING *;
It won't get you the behavior you want, but should illustrate better how RETURNING is designed to work.
This can be done with the help of data-modifiying CTEs (Postgres 9.1+):
WITH sel AS (
SELECT id, col1, col3
, row_number() OVER (ORDER BY id) AS rn -- order any way you like
FROM my_table
WHERE some_criteria = 'something'
ORDER BY id -- match order or row_number()
)
, ins AS (
INSERT INTO my_table (col1, col2, col3)
SELECT col1, 'new col2 value', col3
FROM sel
ORDER BY id -- redundant to be sure
RETURNING id
)
SELECT s.id AS old_id, i.id AS new_id
FROM (SELECT id, row_number() OVER (ORDER BY id) AS rn FROM ins) i
JOIN sel s USING (rn);
SQL Fiddle demonstration.
This relies on the undocumented implementation detail that rows from a SELECT are inserted in the order provided (and returned in the order provided). It works in all current versions of Postgres and is not going to break. Related:
Does Postgres preserve insertion order of records?
Window functions are not allowed in the RETURNING clause, so I apply row_number() in another subquery.
More explanation in this related later answer:
INSERT INTO ... FROM SELECT ... RETURNING id mappings
Good! I test this code, but I change
this (FROM my_table AS old) in (FROM my_table) and
this (WHERE old.some_criteria = 'something') in (WHERE some_criteria = 'something')
This is the final code that I use
INSERT INTO my_table (col1, col2, col3)
SELECT col1, 'new col2 value', col3
FROM my_table AS old
WHERE some_criteria = 'something'
RETURNING *;
Thanks!
DROP TABLE IF EXISTS tmptable;
CREATE TEMPORARY TABLE tmptable as SELECT * FROM products WHERE id = 100;
UPDATE tmptable SET id = sbq.id from (select max(id)+1 as id from products) as sbq;
INSERT INTO products (SELECT * FROM tmptable);
DROP TABLE IF EXISTS tmptable;
add another update before the insert to modify another field
UPDATE tmptable SET another = 'data';
'old' is a reserved word, used by the rule rewrite system.
[ I presume this query fragment is not part of a rule; in that case you would have phrased the question differently ]

sql insert into table from select without duplicates (need more then a DISTINCT)

I am selecting multiple rows and inserting them into another table. I want to make sure that it doesn't already exists in the table I am inserting multiple rows into.
DISTINCT works when there are duplicate rows in the select, but not when comparing it to the data already in the table your inserting into.
If I Selected one row at a time I could do a IF EXIST but since its multiple rows (sometimes 10+) it doesn't seem like I can do that.
INSERT INTO target_table (col1, col2, col3)
SELECT DISTINCT st.col1, st.col2, st.col3
FROM source_table st
WHERE NOT EXISTS (SELECT 1
FROM target_table t2
WHERE t2.col1 = st.col1
AND t2.col2 = st.col2
AND t2.col3 = st.col3)
If the distinct should only be on certain columns (e.g. col1, col2) but you need to insert all column, you will probably need some derived table (ANSI SQL):
INSERT INTO target_table (col1, col2, col3)
SELECT st.col1, st.col2, st.col3
FROM (
SELECT col1,
col2,
col3,
row_number() over (partition by col1, col2 order by col1, col2) as rn
FROM source_table
) st
WHERE st.rn = 1
AND NOT EXISTS (SELECT 1
FROM target_table t2
WHERE t2.col1 = st.col1
AND t2.col2 = st.col2)
If you already have a unique index on whatever fields need to be unique in the destination table, you can just use INSERT IGNORE (here's the official documentation - the relevant bit is toward the end), and have MySQL throw away the duplicates for you.
Hope this helps!
So you're looking to retrieve all unique rows from source table which do not already exist in target table?
SELECT DISTINCT(*) FROM source
WHERE primaryKey NOT IN (SELECT primaryKey FROM target)
That's assuming you have a primary key which you can base the uniqueness on... otherwise, you'll have to check each column for uniqueness.
pseudo code for what might work
insert into <target_table> select col1 etc
from <source_table>
where <target_table>.keycol not in
(select source_table.keycol from source_table)
There are a few MSDN articles out there about this, but by far this one is the best:
http://msdn.microsoft.com/en-us/library/ms162773.aspx
They made it real easy to implement and my problem is now fixed. Also the GUI is ugly, but you actually can set minute intervals without using the command line in windows 2003.

How can I delete duplicate rows in a table

I have a table with say 3 columns. There's no primary key so there can be duplicate rows. I need to just keep one and delete the others. Any idea how to do this is Sql Server?
I'd SELECT DISTINCT the rows and throw them into a temporary table, then drop the source table and copy back the data from the temp.
EDIT: now with code snippet!
INSERT INTO TABLE_2
SELECT DISTINCT * FROM TABLE_1
GO
DELETE FROM TABLE_1
GO
INSERT INTO TABLE_1
SELECT * FROM TABLE_2
GO
Add an identity column to act as a surrogate primary key, and use this to identify two of the three rows to be deleted.
I would consider leaving the identity column in place afterwards, or if this is some kind of link table, create a compound primary key on the other columns.
The following example works as well when your PK is just a subset of all table columns.
(Note: I like the approach with inserting another surrogate id column more. But maybe this solution comes handy as well.)
First find the duplicate rows:
SELECT col1, col2, count(*)
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1
If there are only few, you can delete them manually:
set rowcount 1
delete from t1
where col1=1 and col2=1
The value of "rowcount" should be n-1 times the number of duplicates. In this example there are 2 dulpicates, therefore rowcount is 1. If you get several duplicate rows, you have to do this for every unique primary key.
If you have many duplicates, then copy every key once into anoher table:
SELECT col1, col2, col3=count(*)
INTO holdkey
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1
Then copy the keys, but eliminate the duplicates.
SELECT DISTINCT t1.*
INTO holddups
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
In your keys you have now unique keys. Check if you don't get any result:
SELECT col1, col2, count(*)
FROM holddups
GROUP BY col1, col2
Delete the duplicates from the original table:
DELETE t1
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
Insert the original rows:
INSERT t1 SELECT * FROM holddups
btw and for completeness: In Oracle there is a hidden field you could use (rowid):
DELETE FROM our_table
WHERE rowid not in
(SELECT MIN(rowid)
FROM our_table
GROUP BY column1, column2, column3... ;
see: Microsoft Knowledge Site
Here's the method I used when I asked this question -
DELETE MyTable
FROM MyTable
LEFT OUTER JOIN (
SELECT MIN(RowId) as RowId, Col1, Col2, Col3
FROM MyTable
GROUP BY Col1, Col2, Col3
) as KeepRows ON
MyTable.RowId = KeepRows.RowId
WHERE
KeepRows.RowId IS NULL
This is a way to do it with Common Table Expressions, CTE. It involves no loops, no new columns or anything and won't cause any unwanted triggers to fire (due to deletes+inserts).
Inspired by this article.
CREATE TABLE #temp (i INT)
INSERT INTO #temp VALUES (1)
INSERT INTO #temp VALUES (1)
INSERT INTO #temp VALUES (2)
INSERT INTO #temp VALUES (3)
INSERT INTO #temp VALUES (3)
INSERT INTO #temp VALUES (4)
SELECT * FROM #temp
;
WITH [#temp+rowid] AS
(SELECT ROW_NUMBER() OVER (ORDER BY i ASC) AS ROWID, * FROM #temp)
DELETE FROM [#temp+rowid] WHERE rowid IN
(SELECT MIN(rowid) FROM [#temp+rowid] GROUP BY i HAVING COUNT(*) > 1)
SELECT * FROM #temp
DROP TABLE #temp
This is a tough situation to be in. Without knowing your particular situation (table size etc) I think that your best shot is to add an identity column, populate it and then delete according to it. You may remove the column later but I would suggest that you should keep it as it is really a good thing to have in the table
After you clean up the current mess you could add a primary key that includes all the fields in the table. that will keep you from getting into the mess again.
Of course this solution could very well break existing code. That will have to be handled as well.
Can you add a primary key identity field to the table?
Manrico Corazzi - I specialize in Oracle, not MS SQL, so you'll have to tell me if this is possible as a performance boost:-
Leave the same as your first step - insert distinct values into TABLE2 from TABLE1.
Drop TABLE1. (Drop should be faster than delete I assume, much as truncate is faster than delete).
Rename TABLE2 as TABLE1 (saves you time, as you're renaming an object rather than copying data from one table to another).
Here's another way, with test data
create table #table1 (colWithDupes1 int, colWithDupes2 int)
insert into #table1
(colWithDupes1, colWithDupes2)
Select 1, 2 union all
Select 1, 2 union all
Select 2, 2 union all
Select 3, 4 union all
Select 3, 4 union all
Select 3, 4 union all
Select 4, 2 union all
Select 4, 2
select * from #table1
set rowcount 1
select 1
while ##rowcount > 0
delete #table1 where 1 < (select count(*) from #table1 a2
where #table1.colWithDupes1 = a2.colWithDupes1
and #table1.colWithDupes2 = a2.colWithDupes2
)
set rowcount 0
select * from #table1
What about this solution :
First you execute the following query :
select 'set rowcount ' + convert(varchar,COUNT(*)-1) + ' delete from MyTable where field=''' + field +'''' + ' set rowcount 0' from mytable group by field having COUNT(*)>1
And then you just have to execute the returned result set
set rowcount 3 delete from Mytable where field='foo' set rowcount 0
....
....
set rowcount 5 delete from Mytable where field='bar' set rowcount 0
I've handled the case when you've got only one column, but it's pretty easy to adapt the same approach tomore than one column. Let me know if you want me to post the code.
How about:
select distinct * into #t from duplicates_tbl
truncate duplicates_tbl
insert duplicates_tbl select * from #t
drop table #t
I'm not sure if this works with DELETE statements, but this is a way to find duplicate rows:
SELECT *
FROM myTable t1, myTable t2
WHERE t1.field = t2.field AND t1.id > t2.id
I'm not sure if you can just change the "SELECT" to a "DELETE" (someone wanna let me know?), but even if you can't, you could just make it into a subquery.