insert into fails to insert selected data - sql

I have been working on breaking up a 68GB table into a more normalized structure for the last few weeks, and everything has been going smoothly until today.
I am attempting to move a select few columns from the big table into the new table with this query:
insert into [destination] (col1, col2, col3...)
select col1, col2, col3
From [source]
where companyID = [source].companyID
I receive the message, (60113678 row(s) affected), but the data was not inserted into the destination, and the data in the source table hasn't been altered, so what has been affected, and why wasn't any of the data inserted into the destination?

The code you seem to want to execute is:
update d
set col1 = s.col1,
col2 = s.col2,
col3 = s.col3
from destination d join
source s
on s.companyID = d.companyID;
The code you have written is equivalent to:
insert into [destination] (col1, col2, col3...)
select s.col1, s.col2, s.col3
From [source] s
where s.companyID = s.companyID;
The where is equivalent to s.companyID is not null. Hence, you have inserted all 60,113,678 rows from source into new rows in destination.
Obviously, one moral of the story is to understand the difference between insert and update. More importantly, qualify all column names in a query. If you had done so, your query would have failed at source.CompanyID = destination.CompanyId -- and you wouldn't have to figure out how to delete 60,113,678 new rows.
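To see why the original WHERE clause filters almost nothing, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server, purely so the example is self-contained; the three sample rows are made up. Because only source is in scope, the unqualified companyID resolves to source.companyID and the predicate compares the column with itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source (companyID INTEGER, col1 TEXT);
    CREATE TABLE destination (col1 TEXT);
    INSERT INTO source VALUES (1, 'a'), (2, 'b'), (NULL, 'c');
""")

# The unqualified companyID can only mean source.companyID, so this is
# source.companyID = source.companyID: true for every non-NULL value.
conn.execute("""
    INSERT INTO destination (col1)
    SELECT col1 FROM source
    WHERE companyID = source.companyID
""")

copied = [r[0] for r in conn.execute("SELECT col1 FROM destination ORDER BY col1")]
print(copied)  # every row with a non-NULL companyID was copied as a new row
```

Only the row whose companyID is NULL is skipped, which matches the "is not null" reading given above.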

Related

Insert column from one table to another after copying

I am trying to run this query in SQL Server Management Studio, but I am getting an error. I started by copying one table into another: the source is the painting table, and I am making a replica of it named painting1. The task was to copy all of the columns except one.
painting
col1, col2, col3, col4
The query used for this task is as below
SELECT col1, col2, col3
INTO PAINTING1
FROM PAINTING
After the creation of painting1, it has col1, col2 and col3 from painting table
Now the task was to insert the one remaining column into the replica table, so I tried to insert that column into painting1 from the painting table:
INSERT INTO PAINTING1 (col4)
SELECT col4 FROM PAINTING
And I run into this error:
invalid column name 'col4'
Any suggestions on what I might be doing wrong?
First add the new column in the table PAINTING1:
ALTER TABLE PAINTING1 ADD col4 INTEGER;
Change INTEGER to the data type of col4 that you want.
What you actually need is to update PAINTING1 with the values of the column col4 from PAINTING.
This can only be done if one of the columns col1, col2 or col3 is unique, like a primary key, so it can be used as the link between the 2 tables:
UPDATE p1
SET p1.col4 = p.col4
FROM PAINTING1 p1 INNER JOIN PAINTING p
ON p.col1 = p1.col1
I used col1 as the link between the tables.
If col1 is not unique, use all 3 columns in the ON clause (if their combination is unique):
ON p.col1 = p1.col1 AND p.col2 = p1.col2 AND p.col3 = p1.col3
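The two steps above (add the column, then fill it through a unique key) can be sketched end to end. This is only an illustration run against SQLite through Python's sqlite3 module, not SQL Server: SQLite has no SELECT ... INTO, so CREATE TABLE ... AS SELECT stands in for it, and a correlated subquery replaces the UPDATE ... FROM join. The sample rows and the assumption that col1 is unique are mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE painting (col1 INTEGER PRIMARY KEY, col2 TEXT, col3 TEXT, col4 INTEGER);
    INSERT INTO painting VALUES (1, 'x', 'p', 10), (2, 'y', 'q', 20);

    -- Stand-in for SELECT col1, col2, col3 INTO painting1 FROM painting
    CREATE TABLE painting1 AS SELECT col1, col2, col3 FROM painting;

    -- Step 1: add the missing column (existing rows get NULL in it)
    ALTER TABLE painting1 ADD COLUMN col4 INTEGER;

    -- Step 2: fill it by matching rows on the unique column col1
    UPDATE painting1
    SET col4 = (SELECT p.col4 FROM painting p WHERE p.col1 = painting1.col1);
""")

rows = conn.execute("SELECT col1, col4 FROM painting1 ORDER BY col1").fetchall()
print(rows)
```

An INSERT would instead have appended new rows containing only col4, which is why an UPDATE is the right tool here.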
You're creating the table with col1, col2 and col3. How can it be anything but an error to reference col4, which doesn't exist?
You can of course add additional columns to the table at any time, but what you could do is add col4 as part of your select into.
SELECT col1, col2, col3, '' as Col4 /* use / cast whatever data type it should be */
INTO PAINTING1
FROM PAINTING

Avoid duplicates on import of updated excel-sheets. Unique-Index can only hold 10 fields max

I am facing the following situation:
I import an Excel-Sheet, then some columns are modified (e.g. "comments")
After a while, I would receive an updated Excel-Sheet containing the records from the old Excel-sheet as well as new ones.
I do not want to import the records that already exist in the database.
Step-by-Step:
Initial Excel-sheet
col1 col2 comments
A A
A B
After import, some fields will get manipulated
col1 col2 comments
A A looks good
A B fine with me
Then I receive an excel sheet with updates
col1 col2 comments
A A
A B
A C
After this update-step, the database should look like
col1 col2 comments
A A looks good
A B fine with me
A C
I was planning to simply create a unique index on all fields that won't get manipulated, so only the new records will get imported, something like:
ALTER TABLE tbl ADD CONSTRAINT unique_key UNIQUE (col1,col2)
My problem now is that Access only allows composite indices of at most 10 fields. My tables all have around 11-20 columns...
I could maybe import the updated xls into a temporary table and do something like
INSERT INTO tbl_old SELECT col1,col2, "" FROM tbl_new WHERE (col1,col2) NOT IN (SELECT col1,col2 FROM tbl_old UNION SELECT col1,col2 FROM tbl_new)
But I'm wondering if there isn't a more straightforward way...
Any ideas how I can solve that?
Try the EXISTS condition:
INSERT INTO tbl_old (col1, col2, comments)
SELECT col1, col2, Null
FROM tbl_new
WHERE NOT EXISTS (SELECT col1, col2 FROM tbl_old WHERE tbl_old.col1 = tbl_new.col1 AND tbl_old.col2 = tbl_new.col2);
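A quick way to check this against the sample data from the question is sketched below. It runs the same NOT EXISTS insert in SQLite via Python's sqlite3 module (an assumption made for the sake of a self-contained test; the question is about Access), with NULL standing in for the empty comments of the new sheet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_old (col1 TEXT, col2 TEXT, comments TEXT);
    CREATE TABLE tbl_new (col1 TEXT, col2 TEXT, comments TEXT);
    INSERT INTO tbl_old VALUES ('A', 'A', 'looks good'), ('A', 'B', 'fine with me');
    INSERT INTO tbl_new VALUES ('A', 'A', NULL), ('A', 'B', NULL), ('A', 'C', NULL);

    -- Copy only the rows whose (col1, col2) key is not already present
    INSERT INTO tbl_old (col1, col2, comments)
    SELECT col1, col2, NULL
    FROM tbl_new
    WHERE NOT EXISTS (
        SELECT 1 FROM tbl_old
        WHERE tbl_old.col1 = tbl_new.col1 AND tbl_old.col2 = tbl_new.col2
    );
""")

result = conn.execute(
    "SELECT col1, col2, comments FROM tbl_old ORDER BY col1, col2"
).fetchall()
print(result)
```

The existing comments survive and only the A/C row is added. With more than ten key columns, the comparison list in the subquery simply grows; no index is required for correctness.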
Considering you will use SQL approach:
INSERT INTO table_old (col1, col2)
SELECT col1, col2 FROM table_new
EXCEPT
SELECT col1, col2 FROM table_old
:)
It will insert null in comments column though. Use this:
INSERT INTO table_old
SELECT * FROM table_new
EXCEPT
SELECT * FROM table_old
to avoid null values. Also, both tables must have the same number of columns. For Oracle, use MINUS instead of EXCEPT. An equivalent query can be written with LEFT OUTER JOIN:
INSERT INTO table_old (col1 , col2)
SELECT N.col1, N.col2
FROM table_new N
LEFT OUTER JOIN table_old O ON O.col1 = N.col1 AND O.col2 = N.col2
WHERE O.col2 IS NULL
Which will also provide null values to comments column, as we are inserting only col1 and col2. All inserts tested on provided table examples.
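One caveat about comparing whole rows with EXCEPT is worth checking before relying on it: once the comments in the old table have been edited, the old and new versions of a record are no longer identical rows. The sketch below uses SQLite through Python's sqlite3 module as a stand-in (an assumption; Access itself has no EXCEPT) and the sample data from the question to compare the key-only and whole-row variants.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_old (col1 TEXT, col2 TEXT, comments TEXT);
    CREATE TABLE table_new (col1 TEXT, col2 TEXT, comments TEXT);
    INSERT INTO table_old VALUES ('A', 'A', 'looks good'), ('A', 'B', 'fine with me');
    INSERT INTO table_new VALUES ('A', 'A', NULL), ('A', 'B', NULL), ('A', 'C', NULL);
""")

# Comparing only the key columns finds just the genuinely new record.
key_only = conn.execute("""
    SELECT col1, col2 FROM table_new
    EXCEPT
    SELECT col1, col2 FROM table_old
""").fetchall()

# Comparing whole rows also returns records whose comments were edited.
whole_row = conn.execute("""
    SELECT * FROM table_new
    EXCEPT
    SELECT * FROM table_old
    ORDER BY 1, 2
""").fetchall()

print(key_only)
print(whole_row)
```

In this setup the whole-row variant would re-import A/A and A/B as duplicates, so restricting the comparison to the columns that never change looks like the safer choice.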
I would just put PK ID column in those tables.

SQL Server delete a specific row

I have a SQL question which is to delete some rows from a table. The structure of the table is like a paired rows. It can be expressed in the following SQL:
create table #test
(
col1 int, col2 int, col3 int, id char(1), dtime datetime
)
insert into #test
values
(1,1,1,'a','2015-02-01 1:00:00')
,(1,1,1,'b','2015-02-01 1:00:01')
,(2,1,1,'a','2015-02-01 1:00:00')
,(2,1,1,'b','2015-02-01 1:00:01')
,(3,1,3,'b','2015-02-01 1:00:00') -- Remove this row
,(3,1,3,'a','2015-02-01 1:00:03')
,(3,1,3,'b','2015-02-01 1:00:04')
,(4,2,1,'a','2015-02-01 3:00:00')
,(4,2,1,'b','2015-02-01 3:00:01')
,(5,3,1,'a','2015-02-01 4:00:00')
,(5,3,1,'b','2015-02-01 4:00:01')
,(5,6,3,'b','2015-02-01 4:00:00') -- Remove this row
,(5,6,3,'a','2015-02-01 4:00:03')
,(5,6,3,'b','2015-02-01 4:00:04')
select *
from #test
order by col1,col2,col3
drop table #test
Sorry, I have to make this clearer. This question comes from a real dataflow. The data describes workflow steps, each with a start time and a complete time. Each step might have multiple rows (because the step is called multiple times). When I choose a begin time and an end time to extract the dataflow, some steps get cut off so that they begin with the complete time instead of the start time that I want.
The query should remove the unpaired rows that begin with a complete time.
As you can see, every two rows should form a pair of 'a' and 'b' and start with 'a', the start time. But the two rows to be deleted (in practice we do not know how many there are) start with 'b', the complete time.
Having a primary key makes deletions much easier. Adding one would be the ideal solution.
Without a primary key or some other unique constraint, there could be duplicate rows. The datetime column does not guarantee that the data is unique.
If there are duplicates, do you want all duplicate rows deleted? If so, you can delete them specifying all of the columns:
delete from #Test
where col1 = 3
and col2 = 1
and col3 = 3
and id = 'b'
and dtime = '2015-02-01 1:00:00'
delete from #Test
where col1 = 5
and col2 = 6
and col3 = 3
and id = 'b'
and dtime = '2015-02-01 4:00:00'
If you want all but one of the potential duplicates removed, you would have to number them and delete all matching rows after the first row.
You cannot delete a specific row when its values are not unique.
So you have to add an id column (primary key!)
As said, if no primary key is set, you have to specify every value that distinguishes the row from the others. In this case:
DELETE FROM #test WHERE dtime ='2015-02-01 1:00:00' AND id = 'b' AND col1 = 3 AND col2 = 1 AND col3 = 3
But I warn you this isn't good practice. You should set a primary key, as already suggested.
WITH Ordered AS
(
SELECT Col1, col2, col3, id, dtime,
ROW_NUMBER() OVER(PARTITION BY col1, col2, col3, id ORDER BY dtime DESC) AS Pos
FROM #test
)
--SELECT a.*, b.Pos
DELETE a
FROM #test AS a
INNER JOIN Ordered AS b ON a.col1 = b.col1 AND a.col2 = b.col2 AND a.col3 = b.col3
AND a.ID = b.ID AND a.dtime = b.dtime
AND b.Pos <> 1
This will remove all but the most recent of each duplicate.
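The same idea can be tried on a cut-down copy of the sample data. This is a sketch only, run in SQLite via Python's sqlite3 module rather than SQL Server: it needs SQLite 3.25 or later for window functions, uses SQLite's implicit rowid instead of joining back on every column, and zero-pads the hours so the text timestamps sort correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (col1 INT, col2 INT, col3 INT, id TEXT, dtime TEXT);
    INSERT INTO test VALUES
        (1, 1, 1, 'a', '2015-02-01 01:00:00'),
        (1, 1, 1, 'b', '2015-02-01 01:00:01'),
        (3, 1, 3, 'b', '2015-02-01 01:00:00'),  -- unpaired row to remove
        (3, 1, 3, 'a', '2015-02-01 01:00:03'),
        (3, 1, 3, 'b', '2015-02-01 01:00:04');
""")

# Number the rows in each (col1, col2, col3, id) group, newest first,
# then delete everything except the newest row of each group.
conn.execute("""
    DELETE FROM test
    WHERE rowid IN (
        SELECT rid FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (
                       PARTITION BY col1, col2, col3, id
                       ORDER BY dtime DESC
                   ) AS pos
            FROM test
        )
        WHERE pos <> 1
    )
""")

remaining = conn.execute(
    "SELECT col1, id, dtime FROM test ORDER BY col1, dtime"
).fetchall()
print(remaining)
```

Note that this keeps exactly one row per group, so it only fits data where each step has at most one legitimate 'a' and one legitimate 'b' row for the same col1, col2, col3 values.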

SQL Server Merge - Output returning null deleted rows as inserted

I'm using MERGE to sync data in a table, and I'm seeing some wrong (to me) behavior from SQL Server. When I OUTPUT the INSERTED.* values, and a row was deleted, the MERGE command returns a row with all NULL columns for each row that was deleted.
For example, take this schema:
CREATE TABLE tbl
(
col1 INT NOT NULL,
col2 INT NOT NULL
);
I do an initial load of data, and all 4 rows are outputted as expected.
WITH data1 AS (
SELECT 1 [col1],1 [col2]
UNION ALL SELECT 2 [col1],2 [col2]
UNION ALL SELECT 3 [col1],3 [col2]
UNION ALL SELECT 4 [col1],4 [col2]
)
MERGE tbl t
USING data1 s
ON t.col1 = s.col1 AND t.col2 = s.col2
WHEN NOT MATCHED BY TARGET
THEN INSERT (col1,col2) VALUES (s.col1,s.col2)
WHEN NOT MATCHED BY SOURCE
THEN DELETE
OUTPUT INSERTED.*;
Now, say I remove 2 rows from the data I'm syncing with the table (in my CTE) and do the same MERGE, I see 2 rows of all NULL columns returned.
WITH data1 as (
SELECT 1 [col1],1 [col2]
UNION ALL SELECT 2 [col1],2 [col2]
)
MERGE tbl t
USING data1 s
ON t.col1 = s.col1 AND t.col2 = s.col2
WHEN NOT MATCHED BY TARGET
THEN INSERT (col1,col2) VALUES (s.col1,s.col2)
WHEN NOT MATCHED BY SOURCE
THEN DELETE
OUTPUT INSERTED.*;
To me, this seems like wrong behavior because A) I didn't ask for any deleted rows and B) this makes it seem like I inserted these 2 NULL rows into my table, which I clearly did not. Can anyone shed some light on what's happening?
From the documentation:
output_clause - Returns a row for every row in target_table that is
updated, inserted, or deleted, in no particular order. $action can be
specified in the output clause. $action is a column of type
nvarchar(10) that returns one of three values for each row: 'INSERT',
'UPDATE', or 'DELETE', according to the action that was performed on
that row.
It seems that SQL Server is outputting one row for every row that changed (by an insert or delete). When I specify OUTPUT INSERTED.*, I'm really only specifying the inserted data, which is null for the 2 rows that were changed. If I specify OUTPUT INSERTED.col1 [InsCol1],INSERTED.col2 [InsCol2],DELETED.col1 [DelCol1],DELETED.col2 [DelCol2],$action then I can see a better picture of what's happening.
Thanks to Laurence for your comment.
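To make the explanation concrete, here is a toy model in plain Python. It is not SQL Server and does not call any database; merge_with_output is a hypothetical helper that mimics the documented rule of one output row per affected target row, with the inserted image empty for a delete and the deleted image empty for an insert.

```python
def merge_with_output(target, source):
    """Toy model of MERGE ... OUTPUT over sets of (col1, col2) tuples."""
    output = []
    for row in sorted(source - target):   # WHEN NOT MATCHED BY TARGET THEN INSERT
        output.append({"action": "INSERT", "inserted": row, "deleted": None})
    for row in sorted(target - source):   # WHEN NOT MATCHED BY SOURCE THEN DELETE
        output.append({"action": "DELETE", "inserted": None, "deleted": row})
    return set(source), output            # matched rows stay as they are


tbl = set()
tbl, first = merge_with_output(tbl, {(1, 1), (2, 2), (3, 3), (4, 4)})
tbl, second = merge_with_output(tbl, {(1, 1), (2, 2)})

# Equivalent of OUTPUT INSERTED.*: only the inserted image is shown.
inserted_only = [r["inserted"] for r in second]
# Equivalent of also asking for DELETED.* and $action.
full_picture = [(r["action"], r["deleted"]) for r in second]
print(inserted_only)
print(full_picture)
```

The second merge reports two rows whose inserted image is empty, which corresponds to the all-NULL rows seen in the question; adding the deleted image and the action shows they are the deletes.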

SQL: How to insert data into a table with column names

When inserting data into a SQL Server table, is it possible to specify which column you want to insert data to?
For a table with
I know you can have syntax like this:
INSERT INTO MyTable (Name, col4_on, col8_on, col9_on)
VALUES ('myName', 0, 1, 0)
But the above syntax becomes unwieldy when you have lots of columns, especially if they have binary data. It becomes hard to match up which 1 and 0 go with which column. I'm hoping there's a named-parameter like syntax (similar to what C# has) which looks like the following:
INSERT INTO MyTable
VALUES (Name: 'myName', col4_on: 0, col8_on: 1, col9_on: 0)
Thanks
You must specify the column names. However, there is one exception. If you are INSERTing exactly the same number of columns as the target table has, in the same order as they are in the table, use this syntax:
INSERT INTO MyTable
VALUES ('val1A', 'val4A', 'val8A')
Note that this is a fragile way of performing an INSERT, because if that table changes, or if the columns are ordered differently on a different system, the INSERT may fail, or worse-- it may put the wrong data in each column.
I've found that when I INSERT a lot of columns, I find the queries easier to read if I can group them somehow. If column names are long, I may put them on separate lines like so:
INSERT INTO MyTable
(
MyTable_VeryLongName_Col1,
MyTable_VeryLongName_Col4,
MyTable_VeryLongName_Col8,
-- etc.
)
SELECT
Very_Long_Value_1,
Very_Long_Value_4,
Very_Long_Value_8,
-- etc.
Or you can group 2 columns on a line, or put spaces on every 5, or comment every 10th line, etc. Whatever makes it easier to read.
If you find including column names onerous when INSERTing a lot of rows, then try chaining the data together:
INSERT INTO MyTable (col1, col4, col8)
VALUES ('val1A', 'val4A', 'val8A'),
('val1B', 'val4B', 'val8B'),
-- etc.
Or UNION them together:
INSERT INTO MyTable (col1, col4, col8)
SELECT 'val1A', 'val4A', 'val8A'
UNION ALL SELECT 'val1B', 'val4B', 'val8B'
UNION ALL ... -- etc.
Or, SELECT them from another table:
INSERT INTO MyTable (col1, col4, col8)
SELECT val1, val4, val8
FROM MyOtherTable
WHERE -- some condition is met
INSERT INTO MyTable (col1, col4, col8)
VALUES ('val1', 'val4', 'val8')
This statement will add values to the columns mentioned in your INSERT INTO statement. You can write the above query in either of the following forms; it will not make any difference.
INSERT INTO MyTable (col8, col1, col4)
VALUES ('val8', 'val1', 'val4')
OR
INSERT INTO MyTable (col4, col8, col1)
VALUES ('val4', 'val8', 'val1')
To add multiple rows at a time, you can pass multiple rows in your VALUES clause, something like this:
INSERT INTO MyTable (col4, col8, col1)
VALUES ('val4', 'val8', 'val1'),
('val4', 'val8', 'val1'),
('val4', 'val8', 'val1'),
('val4', 'val8', 'val1')
The order of the values should match the order of the columns
mentioned in your INSERT INTO statement.
All of the above statements will have the same result.
Keep one thing in mind: once you have mentioned a column, you must provide a value for it,
like this
INSERT INTO MyTable (col1, col4, col8)
VALUES ('val1', null, 'val8')
but you cannot do something like this
INSERT INTO MyTable (col1, col4, col8)
VALUES ('val1', 'val8')
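Both points (the column list can be in any order as long as the values follow it, and a value count that does not match the column list is rejected) can be checked with a small script. This is a sketch against SQLite via Python's sqlite3 module, used here only as a convenient stand-in for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (col1 TEXT, col4 TEXT, col8 TEXT)")

# Same data, two different column orders: values follow the column list.
conn.execute("INSERT INTO MyTable (col1, col4, col8) VALUES ('val1', 'val4', 'val8')")
conn.execute("INSERT INTO MyTable (col8, col1, col4) VALUES ('val8', 'val1', 'val4')")
rows = conn.execute("SELECT col1, col4, col8 FROM MyTable").fetchall()

# Three columns named but only two values supplied: the statement is rejected.
try:
    conn.execute("INSERT INTO MyTable (col1, col4, col8) VALUES ('val1', 'val8')")
    mismatch_rejected = False
except sqlite3.Error:
    mismatch_rejected = True

print(rows, mismatch_rejected)
```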
I figured out a way around this, but it's rather hacky and only works for tables that have a column with unique values:
INSERT INTO MyTable (Name)
VALUES ('myName')
UPDATE MyTable
SET col4_on=0, col8_on=1, col9_on=0
WHERE Name = 'myName'
This could be expanded into a multiple row insert as follows:
INSERT INTO MyTable (Name)
VALUES ('row1'), ('row2'), ('row3')
UPDATE MyTable SET col4_on=0, col8_on=1, col9_on=0 WHERE Name = 'row1'
UPDATE MyTable SET col4_on=1, col8_on=0, col9_on=0 WHERE Name = 'row2'
UPDATE MyTable SET col4_on=1, col8_on=1, col9_on=1 WHERE Name = 'row3'
No, there is no way to do specifically what you want. The closest thing you can do is rely on the column creation order to avoid using the column names in the insert command, like this:
If you have a table like
tableA ( id, name, phone )
You can insert values into it using
insert into tableA values ( 1, 'Name', '555-9999' );
But be careful: you have to follow the exact order of the fields in your table, otherwise you can get an error or, worse, put the wrong data in the wrong fields.
Nope you cannot do it, the only other alternative for you would be insert from select
insert into MyTable
select 'val1' as col1, 'val4' as col4, 'val8' as col8 --if any extra columns then just do "null as col10"
assuming the order is same in table
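If the INSERT is issued from application code rather than typed by hand, the pairing of names and values can be kept together on the client side instead. The sketch below is one way to do that, using Python's sqlite3 module and its :name placeholders as a stand-in for whatever driver talks to SQL Server (an assumption on my part); the row dictionary is hypothetical and the table layout is taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (Name TEXT, col4_on INTEGER, col8_on INTEGER, col9_on INTEGER)"
)

# Each value sits next to its column name, so nothing has to be matched by position.
row = {"Name": "myName", "col4_on": 0, "col8_on": 1, "col9_on": 0}

# Build the column list and the :name placeholders from the same keys.
# The keys are interpolated into the SQL text, so they must come from
# trusted code, never from user input.
columns = ", ".join(row)
placeholders = ", ".join(f":{name}" for name in row)
sql = f"INSERT INTO MyTable ({columns}) VALUES ({placeholders})"

conn.execute(sql, row)
stored = conn.execute("SELECT Name, col4_on, col8_on, col9_on FROM MyTable").fetchone()
print(sql)
print(stored)
```

The statement sent to the database is still an ordinary INSERT with an explicit column list; only the bookkeeping of which value belongs to which column moves into the dictionary.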