I have a huge BQ table with a complex schema (lots of repeated and record fields). Is there a way for me to add more columns to this table, and/or to write a SELECT that copies the entire table into a new one with one (or more) columns added? It appears as if copying a table requires flattening of repeated columns (not good). I need an exact copy of the original table with some new columns.
I found a way to Update Table Schema but it looks rather limited as I can only seem to add nullable or repeated columns. I can't add record columns or remove anything.
If I were to modify my import JSON data (and schema) I could import anything. But my import data is huge and conveniently already in a denormalized gzipped JSON so changing that seems like a huge effort.
I think you can add fields of type RECORD.
Nullable and repeated refer to a field's mode, not its type. So you can add a Nullable record or a Repeated record, but you cannot add a Required record.
https://cloud.google.com/bigquery/docs/reference/v2/tables#resource
You are correct that you cannot delete anything.
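For example, a schema fragment adding a Nullable RECORD field might look like the following (the field names here are illustrative, not from the original table):

```json
{
  "name": "address",
  "type": "RECORD",
  "mode": "NULLABLE",
  "fields": [
    {"name": "city", "type": "STRING", "mode": "NULLABLE"},
    {"name": "zip",  "type": "STRING", "mode": "NULLABLE"}
  ]
}
```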
If you want to use a query to copy the table, but don't want nested and repeated fields to be flattened, you can set the flattenResults parameter to false to preserve the structure of your output schema.
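As a sketch, the query job configuration would look something like this (project, dataset, and table names are placeholders; note that allowLargeResults must also be set when flattenResults is false and a destination table is specified):

```json
{
  "configuration": {
    "query": {
      "query": "SELECT * FROM [mydataset.source_table]",
      "flattenResults": false,
      "allowLargeResults": true,
      "destinationTable": {
        "projectId": "my-project",
        "datasetId": "mydataset",
        "tableId": "copy_with_new_columns"
      }
    }
  }
}
```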
Related
I am looking at creating temporal tables (https://msdn.microsoft.com/en-us/library/mt604462.aspx) in our database, but I can't on a couple of tables that have computed columns.
The error message returned is rather self-explanatory:
"Computed column is defined with a user-defined function which is not allowed with system-versioned table"
but I was hoping there was a way to exclude or ignore columns from being tracked?
I have tried dropping the computed column, creating the history table, and then adding the computed column back into the table, but this didn't work.
Any help is appreciated.
Thanks
Edit -
I wasn't able to find a way to ignore columns from being tracked but we were able to refactor out the columns that used UDFs thus enabling us to use temporal tables.
I was struggling with adding a computed column to an existing system-versioned table. In case anyone else with a similar problem lands here, I finally realized that the history table doesn't treat the column the same way. It ends up being similar to having an IDENTITY column on the base table, but that would result in a regular INT field on the history table.
If you are attempting to add a computed column to a system-versioned (temporal) table:
1. First turn off system versioning.
2. Then add your computed column to the base table.
3. Verify the "type" of the resulting computed column.
4. Add a column with the appropriate static type to the history table.
5. Turn system versioning back on (DO NOT FORGET TO SPECIFY THE HISTORY TABLE).
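The steps above can be sketched in T-SQL. Table and column names are illustrative, and the nvarchar(101) on the history table assumes FirstName and LastName are nvarchar(50) — match whatever type step 3 actually reports:

```sql
-- 1. Turn off system versioning
ALTER TABLE dbo.MyTable SET (SYSTEM_VERSIONING = OFF);

-- 2. Add the computed column to the base table
ALTER TABLE dbo.MyTable ADD FullName AS (FirstName + ' ' + LastName);

-- 3./4. Add a column of the matching static type to the history table
ALTER TABLE dbo.MyTableHistory ADD FullName nvarchar(101) NULL;

-- 5. Turn system versioning back on, explicitly naming the history table
ALTER TABLE dbo.MyTable
    SET (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.MyTableHistory));
```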
I find it rather odd that you can accidentally omit the history table when turning system versioning back on. I'd expect it either to resume versioning to the same history table or to throw some kind of error, considering this is rather unexpected behavior.
@pasquale-ceglie - I don't have enough reputation to comment, but I wanted to expand on what you said. You should be able to use most computed columns with temporal tables, just more manually. Basically, you can't copy the schema definition with the computed columns, but you can replicate the resulting columns and generate the appropriate history table before turning everything back on. The definitions are just a bit different between the two tables (which was quite confusing to me at first). I subscribed here; ping me if the above isn't clear or you're curious.
System-versioned table schema modification fails because adding a computed column while system versioning is ON is not supported; for the same reason, you can't transform a regular table into a temporal one if there are computed columns on it.
Hope this helps.
Is it good practice to add some placeholder columns when creating a database table with millions of rows, in case the schema gets changed later? Is it more efficient to rename a column than to insert a new one?
There are many problems with adding "placeholder" columns to a table.
These columns may take up useless space, and appear "sloppy".
You may create too many columns now, and have columns that will never be used.
You may not create enough columns now, and will have to end up creating more anyways.
You don't know what the column data types will be at this time.
Always remember that if a column needs to be added at a later date and will not be used for any of the current rows in the table, you can still keep the table normalized by creating a smaller table that holds this information, then linking them using the primary key.
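For example, rather than adding a sparsely-used column to the main table, the new data can live in its own table keyed by the same primary key (all names here are illustrative):

```sql
CREATE TABLE Person (
    PersonId INT PRIMARY KEY,
    Name     VARCHAR(100) NOT NULL
);

-- Only rows that actually have the new attribute appear here
CREATE TABLE PersonExtra (
    PersonId INT PRIMARY KEY REFERENCES Person (PersonId),
    Nickname VARCHAR(100)
);

SELECT p.PersonId, p.Name, x.Nickname
FROM Person p
LEFT JOIN PersonExtra x ON x.PersonId = p.PersonId;
```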
Let me know if you have any questions about this. I hope this helps!
I want to try and create a boolean table with the same structure as another table. I know how to create the table but my issue is the updating.
Let's say I have table A1 with 10 columns holding different attributes for a person, such as height, running speed, name, hair colour, etc.
I then want to be able to modify this table by adding or removing columns in table A1, and have these updates apply to my other table B1 so that it has the same columns but boolean values (the boolean values are not based on A1).
My first question is if it's doable.
My second is: will the updates be very inefficient for, let's say, 200-300 records?
(I could probably create an external program that reads the table and manually removes and adds columns via ADD/DROP SQL statements, but I was hoping there was a more dynamic/efficient solution.)
What you want, as another answer posted, is an EAV ("entity-attribute-value") schema. This allows you to dynamically add new attributes without changing any physical table schema. It is also horrible for performance (though with only a few hundred entities it shouldn't be too bad).
Another equally ugly solution is to add as many columns as you think you'll ever need, named Attribute_1, Attribute_2, etc. Then you have a lookup table which allows you to map attributes to their definitions.
This is less flexible than the EAV schema, but allows you to index on specific attributes so that your queries are a little more performant.
Another solution would be to use XML data types to hold the attributes and values. SQL Server has built-in functionality for XML data; although it's not as easy to use as normal SQL, it does work quite well.
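A sketch of the XML approach in SQL Server (the table, column, and XML path names are illustrative):

```sql
CREATE TABLE PersonAttributes (
    PersonId INT PRIMARY KEY,
    Attrs    XML
);

-- Pull a single attribute out of the XML blob
SELECT PersonId,
       Attrs.value('(/attrs/height)[1]', 'INT') AS Height
FROM PersonAttributes;
```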
Rather than adding and removing columns on the table, I would suggest that you have a table with the fixed attributes. Then have another table which stores additional attributes (the names of the columns), then a third table which holds the id of the person, the id of the attribute, and the value of the attribute.
For example the user table :
UserId
Firstname
Surname
The attribute table
AttrId
AttrName
The UserAttribute table:
UserId
AttrId
AttrValue
For this to answer your question, you could have two sets of these tables, but the AttrValue would be boolean in the second set.
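The three tables above might be declared like this (the types are illustrative):

```sql
CREATE TABLE Users (
    UserId    INT PRIMARY KEY,
    Firstname VARCHAR(100),
    Surname   VARCHAR(100)
);

CREATE TABLE Attributes (
    AttrId   INT PRIMARY KEY,
    AttrName VARCHAR(100)
);

CREATE TABLE UserAttributes (
    UserId    INT REFERENCES Users (UserId),
    AttrId    INT REFERENCES Attributes (AttrId),
    AttrValue VARCHAR(255),  -- BIT instead for the boolean variant
    PRIMARY KEY (UserId, AttrId)
);
```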
An intermediate option is to go for multiple spare columns in the table and use the attribute table to store a column name and a boolean indicating whether the column is in use.
I want to update an empty column from type BOOLEAN to STRING in BigQuery.
How can I do it without overwriting the table and reloading all the data?
thanks!
You can only add new fields at the end of the table. On the old columns, you have the option to change required to nullable. So what you want is not possible; you can only add a new field or, as you say, completely overwrite the table.
There are two table operations Update and Patch.
You need to use the Update command, to add new columns to your schema.
Important side notes:
order is important. If you change the ordering, it will look like an incompatible schema.
you can only add new fields at the end of the table. On the old columns, you have the option to change required to nullable.
you cannot add a required field to an existing schema.
you cannot remove old fields; once a table's schema has been specified, you cannot change it without first deleting all of the data associated with it. If you want to change a table's schema, you must specify a writeDisposition of WRITE_TRUNCATE. For more information, see the Jobs resource.
Here is an example of a curl session that adds fields to a schema. It should be relatively easy to adapt to Java. It uses auth.py from here
When using Table.Update(), you must include the full table schema again. If you don't provide an exactly matching schema, you can get: Provided Schema does not match Table. For example, I didn't pay attention to the details, left an old field like created out of one of my update calls, and it failed.
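To illustrate, an update request body has to repeat every existing field, in the original order, with the new field appended at the end (the field names here are illustrative, not from any real table):

```json
{
  "schema": {
    "fields": [
      {"name": "id",      "type": "INTEGER",   "mode": "REQUIRED"},
      {"name": "created", "type": "TIMESTAMP", "mode": "NULLABLE"},
      {"name": "new_col", "type": "STRING",    "mode": "NULLABLE"}
    ]
  }
}
```

Everything before new_col must match the table's current schema exactly, or the call fails with the mismatch error above.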
I want to merge or import a column of data from one data table created in VB.NET to another data table's 'same-named' column.
The destination data table has all the required columns created before hand but the source data table contains one column at a time, whose name is the same as that of one of the columns in destination data table.
Whatever code I have written for this so far results in blank rows at the start of the destination data table; I want the copied column values to line up against the existing rows instead. (Screenshots of the current and desired output omitted.)
Regards
I worked out a way around this.
I store all the details in a 2-D array and then use that array as the data source for the grid. So instead of merging tables, I keep storing the different values in different columns of the array.
There might be better solutions to this problem, but this one worked for me.