I'm about to release an update to an iPhone app that I've written in Titanium Alloy. I've added a new column to the database, so I've written a migration for it. The upward migration is simple enough: it just alters the table to add the new column. The downward migration, however, has me a little worried, as it involves creating a temporary table, storing the data I need in it, then dropping the existing table and creating a new one from the stored data, in order to remove the column.
How do I test that this code is correct and will work?
Here are my migrations:
migration.up = function(migrator) {
  migrator.db.execute('ALTER TABLE ' + migrator.table + ' ADD COLUMN is_sample BOOLEAN;');
};
migration.down = function(migrator) {
  var db = migrator.db;
  var table = migrator.table;
  db.execute('CREATE TEMPORARY TABLE beers_backup(alloy_id,name,brewery,rating,percent,establishment,location,notes,date,date_string,beer_image,latitude,longitude,favourite);');
  db.execute('INSERT INTO beers_backup SELECT alloy_id,name,brewery,rating,percent,establishment,location,notes,date,date_string,beer_image,latitude,longitude,favourite FROM ' + table + ';');
  migrator.dropTable();
  migrator.createTable({
    columns: {
      "name": "text",
      "brewery": "text",
      "rating": "integer",
      "percent": "integer",
      "establishment": "text",
      "location": "text",
      "notes": "text",
      "date": "text",
      "date_string": "text",
      "beer_image": "text",
      "latitude": "integer",
      "longitude": "integer",
      "favourite": "boolean"
    }
  });
  db.execute('INSERT INTO ' + table + ' SELECT alloy_id,name,brewery,rating,percent,establishment,location,notes,date,date_string,beer_image,latitude,longitude,favourite FROM beers_backup;');
  db.execute('DROP TABLE beers_backup;');
};
You should be fine as long as your first migration file (the one you created when you created the model) matches up with the downward migration here.
Like this:
migration.up = function(migrator) {
  migrator.createTable({
    columns: {
      "name": "text",
      "brewery": "text",
      "rating": "integer",
      "percent": "integer",
      "establishment": "text",
      "location": "text",
      "notes": "text",
      "date": "text",
      "date_string": "text",
      "beer_image": "text",
      "latitude": "integer",
      "longitude": "integer",
      "favourite": "boolean"
    }
  });
};
migration.down = function(migrator) {
  migrator.dropTable();
};
This migration file must have a timestamp earlier than that of the migration in the question, so that it runs first.
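One way to gain some confidence before release is to rehearse the same SQL against a throwaway SQLite database. The following is only a minimal sketch using Python's built-in sqlite3 module, not Titanium code. The table name beers, the shortened column list, and the layout produced by create_table (a stand-in for migrator.createTable) are all assumptions made for illustration. Note that the copy-back INSERT names its target columns, so it does not depend on the column order of the rebuilt table.

```python
import sqlite3

# Shortened, hypothetical column list; the real table has more columns.
COLUMNS = ["alloy_id", "name", "brewery", "rating", "favourite"]
COLS = ",".join(COLUMNS)


def create_table(db, table):
    # Stand-in for migrator.createTable(); this layout is an assumption.
    db.execute(
        "CREATE TABLE " + table + " (name TEXT, brewery TEXT, rating INTEGER, "
        "favourite BOOLEAN, alloy_id TEXT UNIQUE);"
    )


def migrate_up(db, table):
    # Same statement as the upward migration in the question.
    db.execute("ALTER TABLE " + table + " ADD COLUMN is_sample BOOLEAN;")


def migrate_down(db, table):
    # Copy the columns to keep, rebuild the table without is_sample, copy back.
    db.execute("CREATE TEMPORARY TABLE beers_backup(" + COLS + ");")
    db.execute("INSERT INTO beers_backup SELECT " + COLS + " FROM " + table + ";")
    db.execute("DROP TABLE " + table + ";")
    create_table(db, table)
    # Naming the target columns avoids relying on their order in the new table.
    db.execute(
        "INSERT INTO " + table + " (" + COLS + ") SELECT " + COLS + " FROM beers_backup;"
    )
    db.execute("DROP TABLE beers_backup;")


def table_columns(db, table):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in db.execute("PRAGMA table_info(" + table + ");")]


def rehearse():
    db = sqlite3.connect(":memory:")
    create_table(db, "beers")
    db.execute(
        "INSERT INTO beers (" + COLS + ") VALUES (?,?,?,?,?);",
        ("id-1", "Stout", "Acme", 4, 1),
    )
    before = db.execute("SELECT " + COLS + " FROM beers;").fetchall()
    migrate_up(db, "beers")
    up_columns = table_columns(db, "beers")
    migrate_down(db, "beers")
    after = db.execute("SELECT " + COLS + " FROM beers;").fetchall()
    return before, up_columns, after, table_columns(db, "beers")
```

Calling rehearse() returns the rows before and after the round trip together with the column lists, so you can check that the data survives and that is_sample appears after the upward step and is gone after the downward one. On the device itself, the equivalent check is to install the old build, add data, upgrade, and then roll back.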
I was trying to load an Avro file with a nested record. One of the records has a field whose type is a union of schemas. When loaded into BigQuery, each union element got a very long name like com_mycompany_data_nestedClassname_value. Is there a way to specify the name without the full package name as a prefix?
For example, take the following Avro schema:
{
"type": "record",
"name": "EventRecording",
"namespace": "com.something.event",
"fields": [
{
"name": "eventName",
"type": "string"
},
{
"name": "eventTime",
"type": "long"
},
{
"name": "userId",
"type": "string"
},
{
"name": "eventDetail",
"type": [
{
"type": "record",
"name": "Network",
"namespace": "com.something.event",
"fields": [
{
"name": "hostName",
"type": "string"
},
{
"name": "ipAddress",
"type": "string"
}
]
},
{
"type": "record",
"name": "DiskIO",
"namespace": "com.something.event",
"fields": [
{
"name": "path",
"type": "string"
},
{
"name": "bytesRead",
"type": "long"
}
]
}
]
}
]
}
The load came up with field names such as eventDetail.com_something_event_Network_value. Is it possible to shorten these to something like eventDetail.Network?
Avro loading in BigQuery is not as flexible as it should be; for example, it does not support loading only a subset of the fields through a reader schema. Renaming columns is also not supported in BigQuery today. The only option is to recreate the table with the proper names, for instance by creating a new table from a query over your existing one.
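To illustrate the "new table from a query" route, here is a rough sketch in Python that builds such a query as a string. It assumes BigQuery standard SQL, where STRUCT(expr AS name, ...) can re-wrap the union column under shorter names, and it assumes the long names follow the pattern seen in the question (namespace with underscores, record name, then _value). The helper names and the table name mydataset.events are hypothetical.

```python
def flattened_name(namespace, record_name):
    # Pattern taken from the question: dots in the namespace become underscores,
    # followed by the record name and a "_value" suffix.
    return namespace.replace(".", "_") + "_" + record_name + "_value"


def rename_union_sql(table, plain_fields, union_field, namespace, members):
    # Re-wrap the union column in a STRUCT whose members carry the short names.
    parts = [
        union_field + "." + flattened_name(namespace, member) + " AS " + member
        for member in members
    ]
    select_list = plain_fields + ["STRUCT(" + ", ".join(parts) + ") AS " + union_field]
    return "SELECT " + ", ".join(select_list) + " FROM `" + table + "`"


if __name__ == "__main__":
    print(rename_union_sql(
        "mydataset.events",  # hypothetical table name
        ["eventName", "eventTime", "userId"],
        "eventDetail",
        "com.something.event",
        ["Network", "DiskIO"],
    ))
```

Running the generated query with a destination table would give a copy in which the nested fields are addressed as eventDetail.Network and eventDetail.DiskIO.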
Suppose I have the following JSON, which is the result of parsing URL parameters from a log file.
{
"title": "History of Alphabet",
"author": [
{
"name": "Larry"
}
]
}
{
"title": "History of ABC"
}
{
"number_pages": "321",
"year": "1999"
}
{
"title": "History of XYZ",
"author": [
{
"name": "Steve",
"age": "63"
},
{
"nickname": "Bill",
"dob": "1955-03-29"
}
]
}
All the top-level fields ("title", "author", "number_pages", "year") are optional, and so are the second-level fields, such as those inside "author".
How should I make a schema for this JSON when loading it to BQ?
A related question:
Suppose there is another, similar table whose data is from a different date, so it may have a different schema. Is it possible to query across these two tables?
How should I make a schema for this JSON when loading it to BQ?
The following schema should work. You may want to change some of the types (e.g. maybe you want the dob field to be a TIMESTAMP instead of a STRING), but the general structure should be similar. Since fields are NULLABLE by default, any of them can be absent from a given row.
[
{
"name": "title",
"type": "STRING"
},
{
"name": "author",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "name",
"type": "STRING"
},
{
"name": "age",
"type": "STRING"
},
{
"name": "nickname",
"type": "STRING"
},
{
"name": "dob",
"type": "STRING"
}
]
},
{
"name": "number_pages",
"type": "INTEGER"
},
{
"name": "year",
"type": "INTEGER"
}
]
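If the set of optional keys is not known up front, a schema of this shape can be derived from sample rows. The sketch below is one hedged way to do that in Python. It is not a BigQuery API; infer_schema is a hypothetical helper, it treats every scalar as a NULLABLE STRING, and it assumes that any JSON array contains only objects.

```python
def infer_schema(records):
    """Return a BigQuery-style field list covering every key seen in `records`."""
    samples = {}  # field name -> list of values seen for that field
    for record in records:
        for key, value in record.items():
            samples.setdefault(key, []).append(value)

    schema = []
    for name, values in samples.items():
        if any(isinstance(v, list) for v in values):
            # An array of objects becomes a repeated RECORD; recurse on its items.
            items = [item for v in values if isinstance(v, list) for item in v]
            schema.append({
                "name": name,
                "type": "RECORD",
                "mode": "REPEATED",
                "fields": infer_schema(items),
            })
        else:
            # Every scalar is treated as a NULLABLE STRING in this sketch.
            schema.append({"name": name, "type": "STRING", "mode": "NULLABLE"})
    return schema
```

The result can be written out with json.dumps and adjusted by hand, for example to switch number_pages and year to INTEGER.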
A related question: Suppose there is another, similar table whose data is from a different date, so it may have a different schema. Is it possible to query across these two tables?
It should be possible to union two tables with differing schemas without too much difficulty.
Here's a quick example of how it works over public data (a somewhat silly example, since the tables have no fields in common, but it shows the concept):
SELECT * FROM
(SELECT * FROM publicdata:samples.natality),
(SELECT * FROM publicdata:samples.shakespeare)
LIMIT 100;
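Note that you need the SELECT * around each table, or the query will complain about the differing schemas. In BigQuery legacy SQL, a comma between tables or subqueries means UNION ALL, which is what this query relies on.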
I've got a nested table A in BigQuery with a schema as follows:
{
"name": "page_event",
"mode": "repeated",
"type": "RECORD",
"fields": [
{
"name": "id",
"type": "STRING"
}
]
}
I would like to enrich table A with data from another table and save the result as a new nested table. Let's say I would like to add a "description" field to table A (creating table B), so that the schema becomes:
{
"name": "page_event",
"mode": "repeated",
"type": "RECORD",
"fields": [
{
"name": "id",
"type": "STRING"
},
{
"name": "description",
"type": "STRING"
}
]
}
How do I do this in BigQuery? There seem to be no functions for creating nested structures in BigQuery SQL, except NEST, which produces a list, and that function doesn't seem to work here: it fails with "Unexpected error".
The only way of doing this I can think of is to:
1. Use string concatenation functions to produce table B with a single field called "json", whose content is the enriched data from A converted to a JSON string.
2. Export B to GCS as a set of files F.
3. Load F as table C.
Is there an easier way to do it?
To extend the schema of an existing table, you can use the tables.patch API:
https://cloud.google.com/bigquery/docs/reference/v2/tables/patch
The request looks like this:
PATCH https://www.googleapis.com/bigquery/v2/projects/{project_id}/datasets/{dataset_id}/tables/{table_id}?key={YOUR_API_KEY}
{
"schema": {
"fields": [
{
"name": "page_event",
"mode": "repeated",
"type": "RECORD",
"fields": [
{
"name": "id",
"type": "STRING"
},
{
"name": "description",
"type": "STRING"
}
]
}
]
}
}
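The request body above can also be built from the current schema rather than typed by hand. The following is a small sketch in plain Python, with no client library involved; add_nested_field is a hypothetical helper name, and the input is assumed to be the "fields" list of the existing table's schema.

```python
import copy


def add_nested_field(fields, record_name, new_field):
    """Return a copy of `fields` with `new_field` appended to the named RECORD."""
    updated = copy.deepcopy(fields)  # leave the caller's schema untouched
    for field in updated:
        if field["name"] == record_name and field["type"] == "RECORD":
            field.setdefault("fields", []).append(new_field)
            return updated
    raise KeyError(record_name)


current_fields = [
    {
        "name": "page_event",
        "mode": "repeated",
        "type": "RECORD",
        "fields": [{"name": "id", "type": "STRING"}],
    }
]

# Body for the PATCH request: the full schema including the new nested field.
patch_body = {
    "schema": {
        "fields": add_nested_field(
            current_fields, "page_event", {"name": "description", "type": "STRING"}
        )
    }
}
```

Keep in mind that the patch changes only the schema; filling description with values for existing rows is a separate step.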
It could be that I've created my index wrong, but I have a lead index with variable field names that I need to search through. I created a sub object called fields that contains name and value. Sample:
[
{
"name": "first_name",
"value": "XXX"
},
{
"name": "last_name",
"value": "XXX"
},
{
"name": "email",
"value": "X0#yahoo.com"
},
{
"name": "address",
"value": "X Thomas RD Apt 1023"
},
{
"name": "city",
"value": "phoenix"
},
{
"name": "state",
"value": "AZ"
},
{
"name": "zip",
"value": "12345"
},
{
"name": "phone",
"value": "5554448888"
},
{
"name": "message",
"value": "recently had XXXX"
}
]
The name field is not_analyzed, and the value field is indexed both analyzed and not_analyzed, as the .search and .exact sub-fields.
I thought I could get the results I want from a query string query, doing something like:
+fields.name: first_name +fields.value.exact: XXX
But it doesn't quite work the way I thought. I figure it's because I'm trying to use this like MySQL instead of like NoSQL, and there is a fundamental shift in thinking I need to make.
While the approach you are taking could probably be made to work with enough effort, you are much better off having explicit field names for everything, e.g.:
{
"name.first_name" : "XXX",
"name.last_name" : "XXX",
etc...
}
Then your query_string looks like this:
name.first_name:XXX
If you are new to Elasticsearch, play around with things before you add your mappings. The dynamic defaults should kick in and things will work. You can then add mappings to get fine-grained control over field behavior.
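Converting the existing name/value list into a document with explicit field names can be done before indexing. A minimal sketch in Python follows; to_explicit_fields and its optional prefix argument are hypothetical names chosen for illustration.

```python
def to_explicit_fields(fields, prefix=None):
    """Turn [{"name": ..., "value": ...}, ...] into one flat document."""
    doc = {}
    for field in fields:
        # Optionally namespace the key, e.g. "name.first_name".
        key = field["name"] if prefix is None else prefix + "." + field["name"]
        doc[key] = field["value"]
    return doc


lead_fields = [
    {"name": "first_name", "value": "XXX"},
    {"name": "last_name", "value": "XXX"},
    {"name": "state", "value": "AZ"},
]
document = to_explicit_fields(lead_fields, prefix="name")
```

Each former name then becomes a field of its own that can be queried directly.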
I'm trying to append data fetched from a SELECT to another existing table but I keep getting the following error:
Provided Schema does not match Table projectId:datasetId.existingTable
Here is my request body:
{'projectId': projectId,
'configuration': {
'query': {
'query': query,
'destinationTable': {
'projectId': projectId,
'datasetId': datasetId,
'tableId': tableId
},
'writeDisposition': "WRITE_APPEND"
}
}
}
Seems like the writeDisposition option does not get evaluated.
In order for the append to work, the schema of the existing table must exactly match the schema of the query results you're appending. Can you verify that this is the case? One way to check would be to save the query results as a table and compare its schema with that of the table you are appending to.
OK, I think I've got something here. It's a weird one...
The append does not work unless the schemas match exactly, including each field's mode.
Here is the source table schema:
"schema": {
"fields": [
{
"name": "ID_CLIENT",
"type": "INTEGER",
"mode": "NULLABLE"
},
{
"name": "IDENTITE",
"type": "STRING",
"mode": "NULLABLE"
}
]
}
If I use the copy functionality from the browser interface (bigquery.cloud.google.com), I get the exact same schema, which is expected:
"schema": {
"fields": [
{
"name": "ID_CLIENT",
"type": "INTEGER",
"mode": "NULLABLE"
},
{
"name": "IDENTITE",
"type": "STRING",
"mode": "NULLABLE"
}
]
}
But then I cannot append from the following fetch to the copied table:
SELECT ID_CLIENT + 1 AS ID_CLIENT, RIGHT(IDENTITE,12) AS IDENTITE FROM datasetid.client
Although it appears to return the same schema, at least in the browser interface view, internally it returns the following schema:
"schema": {
"fields": [
{
"name": "ID_CLIENT",
"type": "INTEGER",
"mode": "REQUIRED"
},
{
"name": "IDENTITE",
"type": "STRING",
"mode": "NULLABLE"
}
]
}
This isn't exactly the same schema (check the mode).
Even weirder, this select:
SELECT ID_CLIENT, IDENTITE FROM datasetid.client
returns this schema:
"schema": {
"fields": [
{
"name": "ID_CLIENT",
"type": "INTEGER",
"mode": "REQUIRED"
},
{
"name": "IDENTITE",
"type": "STRING",
"mode": "REQUIRED"
}
]
}
Conclusion:
Don't rely on the table schema information shown in the browser interface; always use the Tables.get API.
Copy doesn't really work as expected...
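Once both schemas have been fetched, the comparison can be automated instead of done by eye. Below is a small sketch in Python; schema_differences is a hypothetical helper, and its inputs are assumed to be the "fields" lists of two schemas in the JSON shape shown above.

```python
def schema_differences(expected, actual):
    """List (field name, attribute, expected value, actual value) mismatches."""
    actual_by_name = {field["name"]: field for field in actual}
    diffs = []
    for field in expected:
        other = actual_by_name.get(field["name"])
        if other is None:
            diffs.append((field["name"], "missing", None, None))
            continue
        for attr in ("type", "mode"):
            # A field without an explicit mode is treated as NULLABLE.
            default = "NULLABLE" if attr == "mode" else None
            a, b = field.get(attr, default), other.get(attr, default)
            if a != b:
                diffs.append((field["name"], attr, a, b))
    return diffs


table_fields = [
    {"name": "ID_CLIENT", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "IDENTITE", "type": "STRING", "mode": "NULLABLE"},
]
query_fields = [
    {"name": "ID_CLIENT", "type": "INTEGER", "mode": "REQUIRED"},
    {"name": "IDENTITE", "type": "STRING", "mode": "NULLABLE"},
]
differences = schema_differences(table_fields, query_fields)
```

For the two schemas from this example, the only difference reported is the mode of ID_CLIENT.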
I have successfully appended data to an existing table from a CSV file using the bq command-line tool. The only difference I see here is that the configuration uses write_disposition instead of writeDisposition as shown in the original question.
What I did was add an append flag to the load command of the bq command-line utility (Python scripts), and it worked like a charm.
I had to update bq.py as follows:
Added a new flag called --append for the load function.
In the _Load class, under RunWithArgs, checked whether append was set and, if so, set 'write_disposition' = 'WRITE_APPEND'.
The code changes in bq.py are as follows.
In the __init__ function of the _Load class, add the following:
flags.DEFINE_boolean(
    'append', False,
    'If true then data is appended to the existing table',
    flag_values=fv)
And in the RunWithArgs function of the _Load class, after the following statement:
if self.replace:
    opts['write_disposition'] = 'WRITE_TRUNCATE'
add the following:
if self.append:
    opts['write_disposition'] = 'WRITE_APPEND'
Now, on the command line,
> bq.py --append=true <mydataset>.<existingtable> <filename>.gz
will append the contents of the compressed (gzipped) CSV file to the existing table.