BigQuery insert values AS, assume nulls for missing columns - sql

Imagine there is a table with 1000 columns.
I want to add a row with values for 20 columns and assume NULLs for the rest.
INSERT VALUES syntax can be used for that:
INSERT INTO `tbl` (
date,
p,
... # 18 more names
)
VALUES(
DATE('2020-02-01'),
'p3',
... # 18 more values
)
The problem with it is that it is hard to tell which value corresponds to which column, and if you need to change or comment out a value, you have to make edits in two places.
INSERT SELECT syntax can also be used:
INSERT INTO `tbl`
SELECT
DATE('2020-02-01') AS date,
'p3' AS p,
... # 18 more value AS column
... # 980 more NULL AS column
Then if I need to comment out a column, only one line has to be commented out.
But obviously having to write out 980 NULLs is an inconvenience.
Is there a way to combine both approaches? To achieve something like:
INSERT INTO `tbl`
SELECT
DATE('2020-02-01') AS date,
'p3' AS p,
... # 18 more value AS column
The query above doesn't work; the error is: Inserted row has wrong column count; Has 20, expected 1000.

Your first version is really the only one you should ever be using for SQL inserts. It ensures that every target column is explicitly mentioned and is unambiguous about where the literals in the VALUES clause go. You can use the version which does not explicitly mention column names; at first, it might seem that you are saving yourself some code, but realize that a column list will still be used: the list of all the table's columns, in whatever order they were defined. Your code might work, but appreciate that any addition or removal of a column, or change of column order, can totally break your insert script. For this reason, most will strongly advocate for the first version.
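A minimal sketch of why the implicit column list is fragile, using a hypothetical two-column table:
# tbl_small is assumed to have exactly two columns: date, p
# Implicit column list: depends entirely on the column positions in the table definition
INSERT INTO `tbl_small`
VALUES (DATE('2020-02-01'), 'p3');
# Explicit column list: still correct if columns are later added or reordered
INSERT INTO `tbl_small` (date, p)
VALUES (DATE('2020-02-01'), 'p3');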

You can try the following solution; it is a combination of the two approaches you highlighted in your question:
INSERT INTO `tbl` (date, p, ...)  # plus 18 more column names
SELECT
DATE('2020-02-01') AS date,
'p3' AS p,
... # 18 more value AS column
A couple of things to consider here:
The other 980 columns should be nullable, i.e. able to hold NULL values.
The columns in the INSERT list and in the SELECT should be in the same order, so that the data is inserted into the correct columns.
To avoid confusion, use aliases in the SELECT that match the target column names in the INSERT list; this removes any ambiguity.
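A concrete sketch of the pattern, with hypothetical names col_a and col_b standing in for the remaining columns:
INSERT INTO `tbl` (date, p, col_a, col_b)
SELECT
DATE('2020-02-01') AS date,
'p3' AS p,
42 AS col_a,
'x' AS col_b
# The 980 columns not listed are populated with NULL automatically.
# To skip a column, remove it from both the column list and the SELECT.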
Hopefully it will work for you.

In BigQuery, the best way to do what you're describing is to first load to a staging table. I'll assume you can get the values you want to insert into JSON format with keys that correspond to the target column names.
values.json
{"date": "2020-01-01", "p": "p3", "column": "value", ... }
Then generate a schema file for the target table and save it locally
bq show --schema project:dataset.tbl > schema.json
Load the new data to the staging table using the target schema. This gives you "named" null values for each column present in the target schema but missing from your json, bypassing the need to write them out.
bq load --replace --source_format=NEWLINE_DELIMITED_JSON \
project:dataset.stg_tbl values.json schema.json
Now the insert select statement works every time
insert into `project.dataset.tbl`
select * from `project.dataset.stg_tbl`

Not a pure SQL solution, but I managed this by loading my staging table with data and then running something like:
from google.cloud import bigquery
client = bigquery.Client()
# Fetch both tables and build column name -> schema field maps
table1 = client.get_table(f"{project_id}.{dataset_name}.table1")
table1_col_map = {field.name: field for field in table1.schema}
table2 = client.get_table(f"{project_id}.{dataset_name}.table2")
table2_col_map = {field.name: field for field in table2.schema}
# Merge the schemas; for columns present in both, table1's field definition wins
combined_schema = {**table2_col_map, **table1_col_map}
# Assign the merged schema back to table1 and push only the schema change
table1.schema = list(combined_schema.values())
client.update_table(table1, ["schema"])
Explanation:
This retrieves the schemas of both tables and converts each one into a dictionary with the column name as the key and the actual field info from the SDK as the value. The two dictionaries are then combined with dictionary unpacking (the order of unpacking determines which table's field definition takes precedence when a column is common to both). Finally, the combined schema is assigned back to table1 and used to update the table, adding the missing columns, which are filled with NULLs.
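Once the staging table's schema has been extended like this, the positional insert works without a column list. A sketch, assuming table1 is the narrow staging table, table2 is the wide target, and every column of table1 also exists in table2:
# table1's schema now matches table2's column order, with NULLs in the added columns
insert into `project.dataset.table2`
select * from `project.dataset.table1`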

Related

Does Oracle allow an SQL INSERT INTO using a SELECT statement for VALUES if the destination table has a GENERATED ALWAYS AS IDENTITY column

I am trying to insert rows into an Oracle 19c table to which we recently added a GENERATED ALWAYS AS IDENTITY column (the column name is "ID"). The column should auto-increment and not need to be specified explicitly in an INSERT statement. Typical INSERT statements work, i.e. INSERT INTO table_name (field1, field2) VALUES ('f1', 'f2') (merely an example), and the ID field increments when a typical INSERT is executed. But the query below, which was working before the addition of the IDENTITY column, is now failing and returning the error: ORA-00947: not enough values.
The field counts are identical with the exception of not including the new ID IDENTITY field, which I am expecting to auto-increment. Is this statement not allowed with an IDENTITY column?
Is the INSERT INTO statement, using a SELECT from another table, not allowing this and producing the error?
INSERT INTO T.AUDIT
(SELECT r.IDENTIFIER, r.SERIAL, r.NODE, r.NODEALIAS, r.MANAGER, r.AGENT, r.ALERTGROUP,
r.ALERTKEY, r.SEVERITY, r.SUMMARY, r.LASTMODIFIED, r.FIRSTOCCURRENCE, r.LASTOCCURRENCE,
r.POLL, r.TYPE, r.TALLY, r.CLASS, r.LOCATION, r.OWNERUID, r.OWNERGID, r.ACKNOWLEDGED,
r.EVENTID, r.DELETEDAT, r.ORIGINALSEVERITY, r.CATEGORY, r.SITEID, r.SITENAME, r.DURATION,
r.ACTIVECLEARCHANGE, r.NETWORK, r.EXTENDEDATTR, r.SERVERNAME, r.SERVERSERIAL, r.PROBESUBSECONDID
FROM R.STATUS r
JOIN
(SELECT SERVERSERIAL, MAX(LASTOCCURRENCE) as maxlast
FROM T.AUDIT
GROUP BY SERVERSERIAL) gla
ON r.SERVERSERIAL = gla.SERVERSERIAL
WHERE (r.LASTOCCURRENCE > SYSDATE - (1/1440)*5 AND gla.maxlast < r.LASTOCCURRENCE)
)
Thanks for any help.
Yes, it does; your example insert
INSERT INTO table_name (field1,field2) VALUES ('f1', 'f2')
would also work as
INSERT INTO table_name (field1,field2) SELECT 'f1', 'f2' FROM DUAL
db<>fiddle demo
Your problematic real insert statement is not specifying the target column list, so when it used to work it was relying on the columns in the table (and their data types) matching the results of the query. (This is similar to relying on select *, and potentially problematic for some of the same reasons.)
Your query selects 34 values, so your table had 34 columns. You have now added a 35th column to the table, your new ID column. You know that you don't want to insert directly into that column, but Oracle doesn't know that, at least at the point where it's comparing the query with the table columns. The table has 35 columns, so as you haven't said otherwise as part of the statement, it is expecting 35 values in the select list.
There's no way for Oracle to know which of the 35 columns you're skipping. Arguably it could guess based on the identity column, but that would be more work and inconsistent, and it's not unreasonable for it to insist you do the work to make sure it's right. It's expecting 35 values, it sees 34, so it throws an error saying there are not enough values - which is true.
Your question sort of implies you think Oracle might be doing something special to prevent the insert ... select ... syntax if there is an identity column, but in fact it's the opposite: it isn't doing anything special, and it's reporting the column/value count mismatch as it usually would.
So, you have to list the columns you are populating; you can't automatically skip one. So your statement needs to be:
INSERT INTO T.AUDIT (IDENTIFIER, SERIAL, NODE, ..., PROBESUBSECONDID)
SELECT r.IDENTIFIER, r.SERIAL, r.NODE, ..., r.PROBESUBSECONDID
FROM ...
using the actual column names of course if they differ from the query column names.
If you can't change that insert statement then you could make the ID column invisible; but then you would have to specify it explicitly in queries, as select * won't see it - but then you shouldn't rely on * anyway.
db<>fiddle
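If you take the invisible-column route, a minimal sketch (hypothetical table name, Oracle 12c or later):
ALTER TABLE t_audit MODIFY (id INVISIBLE);
-- An invisible column is left out of the implicit column list, so an
-- INSERT ... SELECT without a target column list works again; but it is
-- also excluded from SELECT *, so it must be named explicitly to query it.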

How to fix Error 213 in SQL Columns that ALL allow null

I am having an issue with an SQL query used in job automation.
The procedure inserts data from a source table (48 columns) into a destination table (49 columns, where the 49th/last column is NOT in the source table). All columns in both the source and destination tables accept NULL, so copying from 48 columns to 49 shouldn't be an issue.
It throws this error :
Column name or number of supplied values does not match table definition. [SQLSTATE 21S01] (Error 213). The step failed.
It should just insert NULL into the 49th column, and I have checked that the column names correspond.
Let's treat this as though I can't delete the 49th column.
What can I do here?
Accepting NULL doesn't mean you can specify 49 columns and 48 values in the INSERT statement. The number of columns and the number of values must match exactly. Either drop the extra column from the INSERT list or add a 49th value (NULL, presumably) to the values list. In both cases, if the column is nullable, it will be set to NULL.
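A minimal sketch of the two options, with hypothetical table and column names:
-- Option 1: list only the 48 source columns; col49 defaults to NULL
INSERT INTO destination_table (col1, col2, ..., col48)
SELECT col1, col2, ..., col48 FROM source_table;
-- Option 2: list all 49 columns and supply NULL explicitly for the last one
INSERT INTO destination_table (col1, col2, ..., col48, col49)
SELECT col1, col2, ..., col48, NULL FROM source_table;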
First, if you have code that's not working, you should post it so we can tell for sure what's happening. But I'd be pretty willing to bet you're trying to shortcut the process and use something like this:
INSERT tableB
SELECT *
FROM tableA
But the tables don't have the same number of columns, so the SQL Engine doesn't know which source column goes into which destination column. You need to provide an explicit list so it knows which one you intend to ignore:
INSERT tableB
(
col1,
col2,
...
col48
)
SELECT
col1,
col2,
...
col48
FROM tableA;

Dynamic Way to Insert Data into SQLite Table When Column Counts Change

I am working on a script using SQLite where the number of columns available to be inserted into a table (which I am creating to later join on) fluctuates.
The table I created to insert the data into has 97 columns; the data coming in from my feed can range from around 80 columns all the way up to the full 97.
The error I get is SQLITE_ERROR: table allPositionsTable has 97 columns but 80 values were supplied, and it is the one I am trying to avoid by figuring out a way to prevent it from happening.
Are there any workarounds or tricks I can use to have SQLite function so that it will always include the columns where there is no data for them or dynamically not include them so the error goes away?
The error I get is SQLITE_ERROR: table allPositionsTable has 97 columns but 80 values were supplied and is the one I am trying to avoid by figuring out a way where this doesn't happen.
This happens because you are using the default column list (i.e. not specifying the columns into which the values are to be placed).
That is, you would be coding the equivalent of INSERT INTO your_table VALUES(.......
So in the absence of a list of columns, you are saying that you will provide a value for every column in the table, hence the message when one or more values are not present.
What you want to do is use INSERT INTO your_table_name (your_comma_separated_list_of_columns_to_be_inserted) VALUES(.......
where your_table_name and your_comma_separated_list_of_columns_to_be_inserted would be replaced with the appropriate values.
See the highlighted section of the INSERT syntax that can be found at SQL As Understood By SQLite - INSERT
and the relevant section from the above link is:
The first form (with the "VALUES" keyword) creates one or more new rows in an existing table. If the column-name list after table-name is omitted then the number of values inserted into each row must be the same as the number of columns in the table. In this case the result of evaluating the left-most expression from each term of the VALUES list is inserted into the left-most column of each new row, and so forth for each subsequent expression. If a column-name list is specified, then the number of values in each term of the VALUE list must match the number of specified columns. Each of the named columns of the new row is populated with the results of evaluating the corresponding VALUES expression. Table columns that do not appear in the column list are populated with the default column value (specified as part of the CREATE TABLE statement), or with NULL if no default value is specified.
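For example, a minimal sketch with a hypothetical three-column version of the table, inserting only the columns the feed supplied data for:
-- allPositionsTable is assumed here to have columns col1, col2, col3, none declared NOT NULL
INSERT INTO allPositionsTable (col1, col2)
VALUES ('a', 'b');
-- col3 is populated with its default value, or NULL if no default is defined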

BQ - INSERT without listing columns

I have 2 BQ tables, very wide ones in terms of the number of columns. Note that all the table columns are made nullable for flexibility.
Table A - 1000 cols - superset of B's cols
Table B - 500 cols - subset of A's cols - exactly named/typed as the cols above
So rows from B should be insertable into A, where any column not inserted just gets a NULL, i.e. 500 cols get a value and the remaining 500 default to NULL as they are not present in the insert.
As these tables are very wide, enumerating all the columns in an insert statement would take forever and be a maintenance nightmare.
Is there a way in standard SQL to insert without listing the column names in the insert statement, whereby they are automatically name-matched?
So really I want to be able to do this and have the columns from B matched to A for each row inserted. If not, is there any other way I am not seeing that could help with this?
thanks!
INSERT INTO
`p.d.A` (
SELECT
*
FROM
`p.d.B` )
I actually tried enumerating the columns to see if nesting worked, and it seems it doesn't?
INSERT INTO
`p.d.A` (x, y.z) (
SELECT
x, y.z
FROM
`p.d.B` )
I can't just say (x, y), as the y structs from the different tables aren't exactly the same; BQ complains the structs don't match exactly... hence why I was trying y.z?
Sure, easy!
Prepare a dummy table p.d.b_ using the select below:
SELECT * FROM `p.d.a` WHERE FALSE
(Note: even though the result will be an empty table, the above will scan the whole of table a. This is required just once, so it should be okay; if not, you can script it once and just create this table from the script.)
Ok, so now instead of using
SELECT * FROM `p.d.b`
you will use
SELECT * FROM `p.d.b*`
and this will do the trick for you (it did for me :o)
P.S. Of course I assume you will make sure there are no other tables with names starting with b (or whatever the real name is) in that dataset.
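A consolidated sketch of the steps above. It assumes the wildcard matches only b and the dummy table b_, and relies on wildcard queries taking their schema from the most recently created matching table (here b_, which has a's full schema):
# One-time: create an empty dummy table with a's full schema
CREATE TABLE `p.d.b_` AS
SELECT * FROM `p.d.a` WHERE FALSE;
# The wildcard select now returns all of a's columns, with NULLs for those missing from b
INSERT INTO `p.d.a`
SELECT * FROM `p.d.b*`;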

insert data and avoid duplication by checking a specific column

I have a local db into which I'm trying to insert multiple rows of data, but I do not want duplicates. I do not have a second db that I'm inserting from; I have an SQL file. This is the structure of the db I'm inserting into:
(db)artists
(table)names-> ID | ArtistName | ArtistURL | Modified
I am trying to do this insertion:
INSERT names (ArtistName, Modified)
VALUES (name1, date),
(name2, date2),
...
(name40, date40)
The question is, how can I insert this list of data using SQL while avoiding duplication by checking a specific column?
Duplicate what? Duplicate name? Duplicate row? I'll assume no dup ArtistName.
Have UNIQUE(ArtistName) (or PRIMARY KEY) on the table.
Use INSERT IGNORE instead of INSERT.
(No LEFT JOIN, etc)
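A minimal sketch of that approach (the constraint name is hypothetical):
ALTER TABLE names ADD CONSTRAINT uq_artist_name UNIQUE (ArtistName);
INSERT IGNORE INTO names (ArtistName, Modified)
VALUES ('name1', '2020-01-01'),
('name2', '2020-01-02');
-- rows whose ArtistName already exists in names are silently skipped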
I ended up following the advice of @Hart CO a little bit by inserting all my values into a completely new table. Then I used this SQL statement:
SELECT ArtistName
FROM testing_table
WHERE NOT EXISTS
(SELECT ArtistName FROM names WHERE
names.ArtistName = testing_table.ArtistName)
This gave me all the artist names that were in my data and not in the names table.
I then exported to an sql file and adjusted the INSERT a little bit to insert into the names table with the corresponding data.
INSERT IGNORE INTO `names` (ArtistName) VALUES
*all my values from the exported data*
Where (ArtistName) could be any of the columns returned, for example (ArtistName, ArtistUrl, Modified), as long as each row of values from the export has 3 values.
This is probably not the most efficient, but it worked for what I was trying to do.