Is there a convenient way (Python, web UI, or CLI) to insert a new column into an existing BigQuery table (that already has 100 columns or so) and update the schema accordingly?
Say I want to insert it after column 49. If I do this via a query, I will have to type every single column name, will I not?
Update: the suggested answer does not make it clear how this applies to BigQuery. Furthermore, the documentation does not seem to cover the following syntax:
ALTER TABLE `tablename` ADD `column_name1` TEXT NOT NULL AFTER `column_name2`;
A test confirmed that the AFTER clause does not work in BigQuery.
I don't think there is a simple way to do this. Some workarounds I can think of:
Create a view after adding your column.
Create a table from a query result after adding your column.
On the other hand, I don't see how this is useful. The only scenario I can think of for this requirement is if you are using SELECT *, which is not recommended according to the BigQuery best practices. If that is not the case, please share your use case so it can be better understood.
Since this is not currently a BigQuery feature, you can file a feature request for it.
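For the "table from a query result" workaround, the column list does not have to be typed by hand; it can be generated. The following is only a minimal sketch: the function name build_reordered_select, the table name, and the column names are hypothetical, it assumes the existing column names have already been fetched (for example from the table schema), and it assumes the new column starts out as a typed NULL.

```python
from typing import List


def build_reordered_select(
    table: str,
    columns: List[str],
    new_column: str,
    new_type: str,
    after: str,
) -> str:
    """Build a SELECT that lists `columns` in order, with `new_column`
    (a NULL cast to `new_type`) placed directly after the column `after`."""
    if after not in columns:
        raise ValueError(f"column {after!r} not found")
    if new_column in columns:
        raise ValueError(f"column {new_column!r} already exists")

    select_items = []
    for name in columns:
        select_items.append(f"`{name}`")
        if name == after:
            # Assumption: the new column is initially empty.
            select_items.append(f"CAST(NULL AS {new_type}) AS `{new_column}`")
    return "SELECT\n  " + ",\n  ".join(select_items) + f"\nFROM `{table}`"


# Hypothetical 100-column table; insert `new_col` after column 49.
existing = [f"col_{i}" for i in range(1, 101)]
query = build_reordered_select(
    "my_dataset.my_table", existing, "new_col", "STRING", after="col_49"
)
print(query)
```

The generated statement could then be used as a view definition or run with a destination table, which corresponds to the two workarounds listed above.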
I have a partitioned and clustered table in BigQuery. I would like to add another column to the set of clustered columns. I found that the way to do this is to create another table (see "Make existing bigquery table clustered"), but I can't do that because my table is the source of a Data Studio dashboard where I have many calculated fields, and I don't want to lose these fields.
Any suggestion? Thanks a lot!
Gustavo.
You don't need a new table. Although changing the clustering columns was not supported initially, it has been supported since early 2020.
Please check this documentation: https://cloud.google.com/bigquery/docs/creating-clustered-tables#modifying-cluster-spec
Unfortunately, the feature is currently only available through the API.
(If you're not familiar with the BigQuery API) It doesn't require you to write code; you can call the API through its web interface here. For one-time maintenance, this may save you some time.
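As a rough illustration of what such an API call might carry, the sketch below builds a request body for a tables.patch call that changes the clustering columns. It assumes the clustering.fields property of the BigQuery Table resource; the helper name clustering_patch_body and the column names are hypothetical.

```python
import json
from typing import Dict, List

# BigQuery accepts at most four clustering columns.
MAX_CLUSTERING_FIELDS = 4


def clustering_patch_body(fields: List[str]) -> Dict[str, Dict[str, List[str]]]:
    """Return a request body that sets the table's clustering columns."""
    if not fields:
        raise ValueError("at least one clustering column is required")
    if len(fields) > MAX_CLUSTERING_FIELDS:
        raise ValueError(f"at most {MAX_CLUSTERING_FIELDS} clustering columns allowed")
    if len(set(fields)) != len(fields):
        raise ValueError("clustering columns must be unique")
    # Column order matters for clustering, so the list order is preserved.
    return {"clustering": {"fields": list(fields)}}


# Hypothetical columns: two existing clustering columns plus the new one.
body = clustering_patch_body(["customer_id", "country", "event_type"])
print(json.dumps(body, indent=2))
```

The printed JSON is what would go into the request body field of the API web interface, alongside the project, dataset, and table IDs.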
I don't think BigQuery allows renaming a table yet.
Can you use a view? Copy the data to another table with the modified clustering, then create a view on the new table with the same name as the old table, so that nothing breaks in Data Studio.
Kind of a general question here but is there an easy way to determine, in Oracle database, if a field has a sequence number attached to it? This seems like it should be obvious, but I'm missing it.
Thanks.
In general, no. Sequences are separate first-class objects. Normally, you'd create one sequence per table and use that sequence consistently to populate the key (via a trigger or via whatever procedural API you have to do the insert). But nothing stops you from using the same sequence to populate multiple tables or writing code that doesn't use the sequence when one exists.
If you are on a recent version of Oracle and you are looking only at columns that were explicitly created as identity columns, rather than the old-school approach of creating a separate sequence and using a trigger or column default to populate the key, you can use the identity_column column in all_tab_columns (or user_tab_columns/dba_tab_columns) to see whether the column was declared as an identity.
There is no way to attach a sequence to a field in Oracle; what you can do is use the sequence in your application as you see fit.
In general, you'll need to look for triggers on the table and for procedures that may be used to insert data into this table. Some people use those to regulate sequence use and to effectively attach it to a field, but it is not a real attachment: they are just using the sequence, and it could be used in many other ways.
I am attempting to fix the schema of a BigQuery table in which the type of a field is wrong (but the field contains no data). I would like to copy the data from the old schema to the new one using the UI ( select * except(bad_column) from ... ).
The problem is that:
If I select into a table, BigQuery removes the REQUIRED mode from the columns and therefore rejects the insert.
Exporting via JSON loses information on dates.
Is there a better solution than creating a new table with all columns being nullable/repeated or manually transforming all of the data?
Update (2018-06-20): BigQuery now supports required fields on query output in standard SQL, and has done so since mid-2017.
Specifically, if you append your query results to a table with a schema that has required fields, that schema will be preserved, and BigQuery will check as results are written that it contains no null values. If you want to write your results to a brand-new table, you can create an empty table with the desired schema and append to that table.
Outdated:
You have several options:
Change your field types to nullable. Standard SQL returns only nullable fields, and this is intended behavior, so going forward it may be less useful to mark fields as required.
You can use legacy SQL, which will preserve required fields. You can't use except, but you can explicitly select all other fields.
You can export and re-import with the desired schema.
You mention that export via JSON loses date information. Can you clarify? If you're referring to the partition date, then unfortunately I think any of the above solutions will collapse all data into today's partition, unless you explicitly insert into a named partition using the table$yyyymmdd syntax. (Which will work, but may require lots of operations if you have data spread across many dates.)
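If data is spread across many dates, the per-partition target names could be generated rather than typed. This is a minimal sketch of the table$yyyymmdd naming only; the helper name partition_decorator and the table name are hypothetical, and each target would still need its own load or query job.

```python
from datetime import date


def partition_decorator(table: str, day: date) -> str:
    """Return the table name with a $yyyymmdd partition decorator appended."""
    return f"{table}${day:%Y%m%d}"


# Hypothetical table and dates.
days = [date(2018, 6, 19), date(2018, 6, 20)]
targets = [partition_decorator("my_dataset.events", d) for d in days]
print(targets)
```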
BigQuery now supports table clones. A table clone is a lightweight, writable copy of another table.
Copy tables from query in Bigquery
How can I update a table in one schema to match a table in a second schema, assuming the only difference is additional fields and indexes in the second? I do not want to change any of the data in the table. I am hoping to do it without laboriously identifying the missing fields.
An elegant solution to this can be a DDL trigger that fires on ALTER and CREATE DDL events and applies the same changes to the first table (in one schema) as to the second table (in the other schema), in the same transaction.
Link --> https://docs.oracle.com/cd/E11882_01/appdev.112/e25519/triggers.htm#LNPLS2008
A little known but interesting recent addition to the Oracle DBMS artillery is DBMS_COMPARISON.
https://docs.oracle.com/cd/B28359_01/appdev.111/b28419/d_comparison.htm
I haven't tried it myself, but according to the documentation it should at least be able to get you the information without any heavy scripting.
I've been doing this sort of thing since Oracle7 and always had to resort to complex scripting.
If I have a table with columns a, b, c, and later I run an ALTER TABLE command to add a new column "d", is it possible to add it between a and b, for example, and not at the end?
I heard that the position of the columns affects performance.
It's not possible to add a column between two existing columns with an ALTER TABLE statement in SQLite. This works as designed.
The new column is always appended to the end of the list of existing columns.
As far as I know, MySQL is the only SQL(-ish) DBMS that lets you determine the placement of new columns.
To add a column at a specific position within a table row, use FIRST or AFTER col_name. The default is to add the column last. You can also use FIRST and AFTER in CHANGE or MODIFY operations to reorder columns within a table.
But this isn't a feature I'd use regularly, so "as far as I know" isn't really very far.
With every SQL platform I've seen, the only way to do this is to drop the table and re-create it.
However, I question whether the position of the column affects performance. In what way would it? What operations are you doing where you think it will make a difference?
I will also note that dropping the table and recreating it is often not a heavy lift. Making a backup of a table and restoring that table is easy on all major platforms so scripting a backup - drop - create - restore is an easy task for a competent DBA.
In fact, I've often done so when users ask, but I always find it a little silly. The reason most often given is that the tool of choice behaves better when the columns are created in a certain order. (This was also @Jarad's reason below.) So this is a good lesson for tool makers: make your tool able to reorder columns (and remember the order between runs), and then everyone is happy.
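For SQLite, the backup, drop, create, restore sequence can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the table t, its column types, and the sample row are made up, and a real table would also need its indexes, triggers, and views recreated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original table with columns a, b, c (hypothetical types and data).
conn.execute("CREATE TABLE t (a INTEGER, b TEXT, c TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'x', 'y')")

# Rebuild the table with the new column "d" placed between a and b.
with conn:  # commits on success, rolls back on error
    conn.execute("CREATE TABLE t_new (a INTEGER, d TEXT, b TEXT, c TEXT)")
    # Copy the existing data; the new column d is left as NULL.
    conn.execute("INSERT INTO t_new (a, b, c) SELECT a, b, c FROM t")
    conn.execute("DROP TABLE t")
    conn.execute("ALTER TABLE t_new RENAME TO t")

# PRAGMA table_info returns one row per column; index 1 is the column name.
column_names = [row[1] for row in conn.execute("PRAGMA table_info(t)")]
rows = conn.execute("SELECT * FROM t").fetchall()
print(column_names)
print(rows)
```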
I use the DB.compileStatement:
sql = DB.compileStatement("INSERT INTO tableX VALUES (?,?,?)");
sql.bindString(1,"value for column 1");
sql.bindString(2,"value for column 2");
sql.bindString(3,"value for column 3");
sql.executeUpdateDelete();
So there will be a big difference if the order of the columns is not correct.
Unfortunately, adding a column at a specific position is not possible with ALTER TABLE, at least not in SQLite (in MySQL it is possible). The workaround is to recreate the table (backing up and restoring the data).
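A possible alternative to recreating the table, not part of the answer above, is to name the columns in the INSERT so that the bound values no longer depend on the physical column order. The sketch below uses Python's sqlite3 module rather than the Android API, and the column names col1, col2, col3 are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The physical column order deliberately differs from the order of the values.
conn.execute("CREATE TABLE tableX (col3 TEXT, col1 TEXT, col2 TEXT)")

# Naming the columns ties each placeholder to a column, not to a position.
conn.execute(
    "INSERT INTO tableX (col1, col2, col3) VALUES (?, ?, ?)",
    ("value for column 1", "value for column 2", "value for column 3"),
)

row = conn.execute("SELECT col1, col2, col3 FROM tableX").fetchone()
print(row)
```

The same INSERT statement with an explicit column list should also work when passed to a compiled statement, since the column list is plain SQL.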