I am migrating MSSQL data into BigQuery using the Dataflow JDBC to BigQuery template.
I am creating a table with the same schema in BigQuery and then running the Dataflow pipeline.
But there are some tables in MSSQL where column names contain spaces (e.g. Employee Details). How can I create the same columns in BigQuery when they contain spaces?
I want to extract BigQuery external table metadata.
I've gone through the documentation, but I'm not able to find the field which gives me the information related to the external table's location on GCS.
So is there another metadata table that gives me the location information for external tables?
Using SPLIT, since there can be multiple comma-separated URIs:
SELECT ddl, SPLIT(REGEXP_EXTRACT(ddl, r"(?i)uris\s*=\s*\[(.*)\]")) as uris
FROM `catalog.schema.INFORMATION_SCHEMA.TABLES`
WHERE table_type = 'EXTERNAL';
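If you'd rather not parse the DDL with a regex, the same information should also be available from the INFORMATION_SCHEMA.TABLE_OPTIONS view, where the external table URIs appear as the uris option (a sketch; qualify with your own project and dataset as above):

-- The uris option holds the GCS locations of an external table
SELECT table_name, option_value AS uris
FROM `catalog.schema.INFORMATION_SCHEMA.TABLE_OPTIONS`
WHERE option_name = 'uris';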
I have a table in Athena with the following columns.
Describe my_table
row_id
icd9_code
linksto
The column icd9_code is empty, with an int data type. I want to insert some integer values into the icd9_code column of my table named my_table.
Those integer values are stored in an Excel sheet on my local PC. Does AWS Athena provide some way to do this?
Amazon Athena is primarily designed to run SQL queries across data stored in Amazon S3. It cannot access data stored in Microsoft Excel files, nor files stored on your local computer.
To update a particular column for existing rows, you would need to modify the files in Amazon S3 that contain those rows.
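Athena has no row-level UPDATE for plain S3-backed tables, but one workaround is to export the Excel sheet to CSV, upload it to S3, and rebuild the table with a CTAS query. The sketch below assumes a hypothetical bucket path and staging table name, and assumes row_id is a usable join key:

-- Hypothetical staging table over the uploaded CSV
CREATE EXTERNAL TABLE staging_codes (row_id int, icd9_code int)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION 's3://my-bucket/staging_codes/';

-- CTAS: write a new table, filling icd9_code from the staging data
CREATE TABLE my_table_updated AS
SELECT t.row_id, s.icd9_code, t.linksto
FROM my_table t
LEFT JOIN staging_codes s ON t.row_id = s.row_id;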
I ran a query abc and got the result table m in Datalab.
Is there any way I can create a new table in BigQuery and write the content of table m to it in Datalab?
%%bq query --name abc
select *
from `test`
m = abc.execute().result()
Yes, you can. Datalab supports a number of BigQuery magic commands, among them create, which allows for the creation of datasets and tables.
See the documentation here:
http://googledatalab.github.io/pydatalab/datalab.magics.html
And specifically this section, which details all the available BigQuery commands:
positional arguments:
  {sample,create,delete,dryrun,udf,execute,pipeline,table,schema,datasets,tables,extract,load}
                        commands
    sample              Display a sample of the results of a BigQuery SQL
                        query. The cell can optionally contain arguments for
                        expanding variables in the query, if -q/--query was
                        used, or it can contain SQL for a query.
    create              Create a dataset or table.
    delete              Delete a dataset or table.
    dryrun              Execute a dry run of a BigQuery query and display
                        approximate usage statistics
    udf                 Create a named Javascript BigQuery UDF
    execute             Execute a BigQuery SQL query and optionally send the
                        results to a named table. The cell can optionally
                        contain arguments for expanding variables in the
                        query.
    pipeline            Define a deployable pipeline based on a BigQuery
                        query. The cell can optionally contain arguments for
                        expanding variables in the query.
    table               View a BigQuery table.
    schema              View a BigQuery table or view schema.
    datasets            List the datasets in a BigQuery project.
    tables              List the tables in a BigQuery project or dataset.
    extract             Extract BigQuery query results or table to GCS.
    load                Load data from GCS into a BigQuery table.
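For the use case in the question, the execute command is the relevant one, since it can send query results to a named table. A minimal sketch, assuming the dataset test already exists; the exact flag names here are an assumption, so verify them against the linked documentation:

%%bq execute --query abc --table test.m_copy --mode create

Here abc is the query defined earlier with %%bq query --name abc, and test.m_copy is a hypothetical destination table name.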
I have some INSERT queries written in Hive that need to be migrated to BigQuery.
For example:
insert into test.abc partition(yrmth) select * from test.xyz
In BigQuery, partitioning is only supported in YYYYMMDD format. I'm able to dump the data into the partitioned table through the bq command-line tool by loading test.abc$20171125.
How can I achieve the same using DML statements in BigQuery?
I have learnt that Legacy SQL doesn't support writing DML statements, and Standard SQL doesn't support table specifications like test.abc$20171125, which are required for loading the data into the corresponding partition.
You are correct - DML statements are not yet supported over partitioned tables.
Just run a simple select * from test.xyz with the destination table set to test.abc$20171125. This is supported by the web UI, the bq command-line tool, the API, and any client of your choice.
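For example, a sketch with the bq command-line tool (table names taken from the question; the destination is quoted so the shell does not treat $20171125 as a variable):

# Write the query results into the 2017-11-25 partition of test.abc
bq query \
  --use_legacy_sql=false \
  --destination_table='test.abc$20171125' \
  --append_table \
  'SELECT * FROM test.xyz'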
Check https://issuetracker.google.com/issues/36383555 if you want to try the alpha release for column-based partitioned tables - DML over partitioned tables is part of it.