OUTPUT INTO fails due to invalid column name - SQL

I am trying to make two inserts one after another like:
INSERT INTO tbl_tours (TimeFrom)
OUTPUT inserted.tourId, DispatchingId, TimeFrom, TimeTo INTO tbl_tourData (tour_fk, dispatchId, timeFrom, timeTo)
SELECT TimeFrom
FROM #tmpTable
SELECT * FROM tbl_tours
SELECT * FROM tbl_tourData
But I get an error:
Msg 207 Level 16 State 1 Line 13
Invalid column name 'DispatchingId'.
Msg 207 Level 16 State 1 Line 13
Invalid column name 'TimeFrom'.
Msg 207 Level 16 State 1 Line 13
Invalid column name 'TimeTo'.
You can check full code at this fiddle:
https://dbfiddle.uk/?rdbms=sqlserver_2016&fiddle=c10f9886bcfb709503007f18b24eabfd
How to combine these inserts?

The output clause can only refer to columns that are inserted. So this works:
INSERT INTO tbl_tours (TimeFrom)
output inserted.tourId, inserted.TimeFrom into tbl_tourData(tour_fk, timeFrom)
SELECT TimeFrom FROM #tmpTable;
Here is the revised db<>fiddle.
If you want additional information, you need to join back to another source.
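For example, a sketch of that join-back (this assumes #tmpTable also carries DispatchingId and TimeTo, and that TimeFrom uniquely identifies a row in #tmpTable; if it doesn't, you need a better key):
UPDATE td
SET td.dispatchId = tmp.DispatchingId,
    td.timeTo = tmp.TimeTo
FROM tbl_tourData td
JOIN tbl_tours t ON t.tourId = td.tour_fk
JOIN #tmpTable tmp ON tmp.TimeFrom = t.TimeFrom;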

When you do an insert ... output, the "output" part can only output whatever was inserted by the "insert" part. You can't reference data from the insert's source.
You do insert into tbl_tours(TimeFrom), so you're only inserting a single column - the TimeFrom column - and the tourId column is generated automatically, so that's available too. But then you try to use 4 columns in the output list. Where would the extra two columns come from?
One way to do this in a single step is to use the merge statement, which can get data from the "inserting" source, not just the "inserted" table. Since you know you always want to do an insert, you can join on 1 = 0:
merge tbl_tours
using #tmpTable tmp on 1 = 0
when not matched then
insert (TimeFrom)
values (tmp.TimeFrom)
output inserted.tourId,
tmp.dispatchingId,
inserted.timeFrom, -- or tmp.timeFrom, doesn't matter which
tmp.TimeTo
into tbl_tourData (tour_fk, dispatchId, timeFrom, timeTo);
I should add: this is only possible because you don't actually have a foreign key defined from tbl_tourData to tbl_tours. You probably do intend to have one, given your column name. An OUTPUT clause can't output into a table that has a foreign key (or a primary key referenced by a foreign key), so this approach won't work at all if you ever decide to actually create that foreign key. You'll have to do it in two steps: either per Gordon's answer (insert, then join back to fill in the rest), or by creating a whole new temp table matching the schema of tbl_tourData, outputting everything into that using merge, and then dumping the second temp table into the real tbl_tourData.
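A rough sketch of that two-step variant (the column types here are assumptions; match them to your real schema):
CREATE TABLE #stagedTourData
(
    tour_fk int,
    dispatchId int,
    timeFrom datetime,
    timeTo datetime
);

MERGE tbl_tours
USING #tmpTable tmp ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT (TimeFrom)
    VALUES (tmp.TimeFrom)
OUTPUT inserted.tourId, tmp.dispatchingId, inserted.timeFrom, tmp.TimeTo
INTO #stagedTourData (tour_fk, dispatchId, timeFrom, timeTo);

-- The temp table has no keys, so OUTPUT ... INTO is allowed; now copy it into the real table.
INSERT INTO tbl_tourData (tour_fk, dispatchId, timeFrom, timeTo)
SELECT tour_fk, dispatchId, timeFrom, timeTo
FROM #stagedTourData;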

BigQuery insert values AS, assume nulls for missing columns

Imagine there is a table with 1000 columns.
I want to add a row with values for 20 columns and assume NULLs for the rest.
INSERT VALUES syntax can be used for that:
INSERT INTO `tbl` (
date,
p,
... # 18 more names
)
VALUES(
DATE('2020-02-01'),
'p3',
... # 18 more values
)
The problem with it is that it is hard to tell which value corresponds to which column. And if you need to change/comment out some value then you have to make edits in two places.
INSERT SELECT syntax also can be used:
INSERT INTO `tbl`
SELECT
DATE('2020-02-01') AS date,
'p3' AS p,
... # 18 more value AS column
... # 980 more NULL AS column
Then if I need to comment out some column just one line has to be commented out.
But obviously having to set 980 NULLs is an inconvenience.
What is the way to combine both approaches? To achieve something like:
INSERT INTO `tbl`
SELECT
DATE('2020-02-01') AS date,
'p3' AS p,
... # 18 more value AS column
The query above doesn't work, the error is Inserted row has wrong column count; Has 20, expected 1000.
Your first version is really the only one you should ever be using for SQL inserts. It ensures that every target column is explicitly mentioned, and is unambiguous with regard to where the literals in the VALUES clause should go. You can use the version which does not explicitly mention column names; at first, it might seem that you are saving yourself some code. But realize that a column list will still be used - the list of all the table's columns, in their definition order. Your code might work, but appreciate that any addition or removal of a column, or change of column order, can totally break your insert script. For this reason, most will strongly advocate for the first version.
You can try the following solution; it combines the two approaches you highlighted:
INSERT INTO `tbl` (date, p, ... # 18 more column names)
SELECT
DATE('2020-02-01') AS date,
'p3' AS p,
... # 18 more value AS column
A couple of things to consider here:
The other 980 columns must be nullable, i.e. able to hold NULL values.
The columns in the INSERT list and in the SELECT must be in the same order, so that the data lands in the correct columns.
To avoid confusion, use aliases in the SELECT that match the target column names in the INSERT; that removes any ambiguity.
Hopefully it will work for you.
In BigQuery, the best way to do what you're describing is to first load to a staging table. I'll assume you can get the values you want to insert into JSON format with keys that correspond to the target column names.
values.json
{"date": "2020-01-01", "p": "p3", "column": "value", ... }
Then generate a schema file for the target table and save it locally
bq show --schema project:dataset.tbl > schema.json
Load the new data to the staging table using the target schema. This gives you "named" null values for each column present in the target schema but missing from your json, bypassing the need to write them out.
bq load --replace --source_format=NEWLINE_DELIMITED_JSON \
project:dataset.stg_tbl values.json schema.json
Now the insert select statement works every time
insert into `project:dataset.tbl`
select * from `project:dataset.stg_tbl`
Not a pure SQL solution but I managed this by loading my staging table with data then running something like:
from google.cloud import bigquery
client = bigquery.Client()
table1 = client.get_table(f"{project_id}.{dataset_name}.table1")
table1_col_map = {field.name: field for field in table1.schema}
table2 = client.get_table(f"{project_id}.{dataset_name}.table2")
table2_col_map = {field.name: field for field in table2.schema}
combined_schema = {**table2_col_map, **table1_col_map}
table1.schema = list(combined_schema.values())
client.update_table(table1, ["schema"])
Explanation:
This retrieves both tables' schemas and converts each into a dictionary keyed by column name, with the actual field info from the SDK as the value. The two are then combined with dictionary unpacking (the order of unpacking determines which table's columns take precedence when a column exists in both). Finally, the combined schema is assigned back to table1 and used to update the table, adding the missing columns with NULLs.

When using OUTPUT in an INSERT statement, if you specify an ORDER BY in the SELECT, does it honor the order specified in the SELECT?

I am inserting data into two tables. Within each insert there is an OUTPUT to a #temp table, each with an identity column. The select that generates the data for the insert has the same ORDER BY for each insert. Later on I join the two #temp tables by the identity column. What I would expect is that the identity column numbers would line up, as the ORDER BY is specified on both sides when inserting. Every once in a long while it appears those numbers don't match up, and the only thing I can think of is that perhaps the OUTPUT isn't always honoring the ORDER BY in the select statements when writing the OUTPUT data to the temp tables.
CREATE TABLE #TempTable
(
RowNumber Integer IDENTITY (1,1) NOT NULL,
TableID Integer,
CONSTRAINT PK_TableID PRIMARY KEY NONCLUSTERED (RowNumber)
)
INSERT INTO Table
(column1,column2,column3,etc)
OUTPUT
INSERTED.ID
INTO #TempTable
(ID)
SELECT
column1,column2,column3,etc
FROM
Other table
ORDER BY
SourceFlag,
StoreID,
storenumber,
EstablishDate,
TableID
What I would expect is that the statements would insert for example 25 rows in both statements in the same order 1 through 25. Then I should be able to join based on the row number 1 = 1, 25= 25, etc. in order to get the matching data. What I think is happening is somehow that order is getting messed up, so that row #1 from the first insert really matches say row #14 from the second, so when I later join 1 on 1 I'm getting mismatched data.
Apparently, it doesn't:
However, SQL Server does not guarantee the order in which rows are
processed and returned by DML statements using the OUTPUT clause.
You need to identify a natural key in your data and then reference it to match the newly inserted rows with the OUTPUT resultset.
Alternatively, you can replace the INSERT with MERGE; in this case, you will be able to catch the newly created identity values for your records in the OUTPUT clause.
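A sketch of that MERGE variant, using the generic table and column names from the question (the #IdMap table is a hypothetical mapping table added here for illustration). The ON 1 = 0 condition forces every source row down the insert branch, and unlike a plain INSERT, the OUTPUT clause of a MERGE can reference source columns:
CREATE TABLE #IdMap
(
    SourceTableID Integer,
    NewID Integer
);

MERGE INTO [Table] AS tgt
USING OtherTable AS src
    ON 1 = 0  -- never matches, so every source row is inserted
WHEN NOT MATCHED THEN
    INSERT (column1, column2, column3)
    VALUES (src.column1, src.column2, src.column3)
OUTPUT src.TableID, inserted.ID
INTO #IdMap (SourceTableID, NewID);

-- Join #IdMap back to the source (or to the other insert's mapping) on the natural key
-- instead of relying on identity ordering.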

Insert from select

Based on this topic I've encountered a problem with insertions.
My Tests table contains:
TestID Name
1 test_insert_film
2 test_insert_writer
3 test_insert_location
4 test_delete_film
5 test_delete_writer
6 test_delete_location
I want to insert into my TestTables the id's of the tests with the following sequence:
INSERT INTO TestTables(TestID)
SELECT TestID
FROM Tests
But I get:
Msg 515, Level 16, State 2, Line 1
Cannot insert the value NULL into column 'TableID', table 'FilmS.dbo.TestTables'; column does not allow nulls. INSERT fails. The statement has been terminated.
TestTables contains 4 columns, one of them being TestID. Why isn't this working?
The column TableID (!) in your table TestTables does not allow NULL values. This column is not in the list of columns to be filled by the INSERT, so the value assumed for it is NULL, the default. This is why you get the error.
You may need something like:
INSERT INTO TestTables(TestID, TableID)
SELECT TestID, '' FROM Tests
To fill the TableID column with a default value. Other columns in the TestTables table may also be affected and need to be treated similarly.
PS: You could also modify the TestTables definition to provide a default value for the respective columns. If you do so, you can leave your original statement as it is.
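For example (a sketch; this assumes TableID is a character column, so adjust the default to its actual data type, and the constraint name is made up here):
ALTER TABLE TestTables
ADD CONSTRAINT DF_TestTables_TableID DEFAULT ('') FOR TableID;

-- With the default in place, the original statement works unchanged:
INSERT INTO TestTables (TestID)
SELECT TestID
FROM Tests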

Cannot insert duplicate key SQL

insert into A (id,Name)
select ti.id, ti.Name
from A ti
where ti.id >= 1 AND ti.id<=3
id is the primary key but not autogenerated. When I run the query I get an error
Violation of PRIMARY KEY constraint 'XPKA'. Cannot insert duplicate key in object 'dbo.A'
Table A
id Name
1 A
2 B
3 C
and I want to insert
id Name
4 A
5 B
6 C
Every row must have a different value for the Primary Key column. You are inserting the records from A back into itself, thus you are attempting to create a new row using a Primary Key value that is already being used. This leads to the error message that you see.
If you must insert records in this fashion, then you need to include a strategy for including unique values in the PK Column. If you cannot use an autoincrement rule (the normal method), then your logic needs to enforce this requirement, otherwise you will continue to see errors like this.
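For instance, one way to enforce that (a sketch, assuming id is an integer key and the table is not empty) is to offset the new ids by the current maximum:
insert into A (id, Name)
select ti.id + (select max(id) from A) as NewKey, ti.Name
from A ti
where ti.id >= 1 AND ti.id <= 3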
You are selecting from table A and inserting straight back in to it. This means that the ID values you insert will certainly already be there.
The message says that ID col has a PrimaryKey on it and requires the values in the column to be unique. It won't let you perform the action for this reason.
To fix your query based on your stated requirement, change the script to:
insert into A (id,Name)
select ti.id + 3, ti.Name
from A ti
where ti.id >= 1 AND ti.id<=3
You need to adjust the ID of the rows you are inserting. In your example to produce keys 4, 5, 6:
insert into A (id,Name)
select ti.id + 3 as NewKey,ti.Name
from A ti
where ti.id >= 1 AND ti.id<=3
But in reality you need to pick a value that will keep your new keys separate from any possible old key, maybe:
insert into A (id,Name)
select ti.id + 100000 as NewKey,ti.Name
from A ti
where ti.id >= 1 AND ti.id<=3
As Yaakov Ellis has said...
Every row must have a different value for the Primary Key column.
And as you have a WHERE clause which restricts your rows to 3 in total, ever (those with the unique IDs 1, 2 and 3), if you want to replace those, rather than trying to INSERT them where they already exist and generating your error, maybe you could UPDATE them instead?
That will resolve your issue.
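Something along these lines (a sketch; the SET value is a placeholder for whatever the rows should actually hold):
update A
set Name = 'NewName'  -- the replacement value for each row
where id >= 1 AND id <= 3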
UPDATE
After your addition of extra code...
You should set your unique key identifier to the ID number and not to the field holding the A/B/C names (whatever you have called it).

Combine two columns together from separate tables

Let's say for instance:
I have two tables: old_data and new_data.
Both old_data and new_data have one column called this_is_col.
Both old_data and new_data have many (hundreds of) rows of dates (2010-02-06, 2010-01-09, 2007-06-02, etc.). The two tables don't necessarily have the same dates, but they use the same format.
The fields of both tables are various integers.
My task:
Copy the fields from old_data to new_data.
If a date exists in both tables, the field in new_data will be replaced.
If the date doesn't exist in new_data, then the correct row will be added and the field will be copied over.
Here is how far I've gotten:
Create a temporary column:
ALTER TABLE `new_data` ADD `tempColumn` TEXT NULL;
Copy over data from old_data:
INSERT INTO `new_data` (`tempColumn`) SELECT `this_is_col` FROM `old_data`;
Combine the temporary column and new_data.this_is_col. (I haven't really figured this step out since I haven't gotten this far.)
MERGE? `tempColumn` `this_is_col`;
Delete temporary table
ALTER TABLE `new_data` DROP `tempColumn`;
Upon performing the second action (transferring the data over to the temporary column) I get this error:
#1062 - Duplicate entry '0000-00-00' for key 1
And now I'm stuck. Any help would be appreciated. I'm using MySQL and phpMyAdmin to test the SQL commands.
Assuming your dates are indexed as unique keys:
INSERT INTO newtable
SELECT *
FROM oldtable
ON DUPLICATE KEY UPDATE column1=VALUES(column1), ...
You want INSERT ... ON DUPLICATE KEY UPDATE. Your solution already satisfies steps 1 and 3 of your task; ON DUPLICATE KEY UPDATE will take care of step 2.
If you'd rather delete the row first, instead of updating: REPLACE
It'd be just one line too, along the lines of REPLACE INTO new_data SELECT ..., so you wouldn't have to do the weirdness with adding a text column.
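A sketch of that (the second column name here is a placeholder for whatever value column your tables actually have):
-- REPLACE deletes any existing row whose unique key collides, then inserts the new row.
REPLACE INTO new_data (this_is_col, value_col)
SELECT this_is_col, value_col
FROM old_data;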
How about just doing an UPDATE and INSERT?
UPDATE new_data a
JOIN old_data b ON a.this_is_col = b.this_is_col
SET a.col = b.col
Then
INSERT INTO new_data (cols) SELECT cols
FROM old_data WHERE this_is_col NOT IN (SELECT this_is_col FROM new_data)
Unless I misunderstood...