I am trying to create a table from JSON files in BigQuery, and I want just one column, representing only the first key, 'id'.
Creating a schema with only one column causes errors because all of the JSON keys in the input files are considered.
Is there a way to create a table that corresponds to only specific JSON keys?
Unfortunately, you can't create a table from a JSON file in BigQuery with just one column from the JSON file. You can create a feature request at this link.
You have these options:
Option 1
Don't import as JSON; import as CSV instead (define the null character as the separator).
Each line then has only one column: the full JSON string.
Parse inside BigQuery with maximum flexibility (JSON parsing functions and even JS); see the sketch below.
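A minimal sketch of Option 1, assuming the file has already been loaded into a hypothetical table mydataset.raw_json with a single STRING column named line that holds the full JSON text of each row:

SELECT
  JSON_EXTRACT_SCALAR(line, '$.id') AS id  -- keep only the 'id' key
FROM
  `mydataset.raw_json`;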
Option 2
Do a 2-step import:
Import into a new table with all the columns.
Append "SELECT column1 FROM [newtable]" into the existing table, as sketched below.
Related
I have multiple CSV files and multiple tables.
The table name is the file name, and the column names come from the first row of the CSV file.
Now I want the sink table to fall back to a default value when a field in the CSV is an empty string.
Consider my scenario,
employee:
id int, name varchar, is_active bit NULL
employee.csv:
id|name|is_active
1|raja|
Now, when I try to copy the CSV data to the PostgreSQL table, it throws an error.
The expected result is the default value whenever the field is empty.
You can use NULLIF in PostgreSQL:
NULLIF(argument_1,argument_2);
The NULLIF function returns a null value if argument_1 equals argument_2; otherwise it returns argument_1.
This way you can turn the empty string into NULL and then replace that NULL with some other value (for example with COALESCE or a column default).
If your error is related to a type mismatch, consider casting the column first.
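A minimal sketch, assuming the CSV values are first landed as text in a hypothetical staging table employee_staging(id text, name text, is_active text):

SELECT
  id,
  name,
  COALESCE(NULLIF(is_active, ''), '0') AS is_active  -- '' becomes NULL, then falls back to the default '0'
FROM employee_staging;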
Thanks!
As per the issue, I tried to repro the scenario, and the copy completed successfully. The setup I used:
Source Dataset: employee.csv from Azure Blob Storage
Sink Dataset: Here I have used Azure SQL DB as the sink because of some limitations on my side, but since you are using PostgreSQL the setup is almost identical.
Copy Activity Settings:
Under the mapping settings there is a type-conversion option; you have to import the schema there, or you can add the mappings dynamically.
Output:
An alternative is to use Data Flow: if you have multiple data fields, you can use the derived column transformation to generate new columns in your data flow or to modify existing fields.
For more details, refer to Derived column transformation in mapping data flow.
You can also refer to this Microsoft Q&A post for more insights: Copy Task failure because of conversion failure.
I have a NEWLINE_DELIMITED_JSON file on my computer and I would like to load it into a BigQuery table.
I have 3 keys in each line. One of them is a timestamp: I would like to remove it and not get a "timestamp" column in my BigQuery table.
Another one has the wrong name: the key in the JSON file is "special_id", but I would like to load it into a column named "main_id".
I can't find a way to do that while specifying the schema of the table created during the load. Is there a way to do this?
Thank you
For that level of flexibility:
Don't import as JSON.
Import as CSV (define the null character as the separator).
Each line then has only one column: the full JSON string.
Parse inside BigQuery with maximum flexibility (JSON parsing functions and even JS); see the sketch below.
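A minimal sketch of that approach, assuming the file was loaded into a hypothetical single-column table mydataset.raw_json (a STRING column named line holding the full JSON text). The timestamp key is simply never selected, "special_id" is renamed, and the remaining key would be extracted the same way under its own name:

SELECT
  JSON_EXTRACT_SCALAR(line, '$.special_id') AS main_id  -- rename special_id to main_id
FROM
  `mydataset.raw_json`;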
I am trying to create a table from a CSV using the schema autodetect option. It fails because some rows / columns have values that do not conform to the auto detected type. I would like to change the type for those columns to STRING.
Is there a way to export the autodetected schema so I can update it and use it for the load? The CSV has 30+ columns, and I would like to avoid having to manually generate a schema file for all of them.
Update
This question is not a duplicate of this one. The latter is a solution for the case where the table already exists; in this question there is no existing table whose schema can be exported.
For my previous homework, we were asked to import a CSV file with no column names into Impala, explicitly giving the name and type of each column while creating the table. Now we have a CSV file with the column names given. In this case, do we still need to write down the name and type of each column even though they are provided in the data?
Yes, you still have to create an external table and define the column names and types. But you have to pass the following option right at the end of the CREATE TABLE statement (a full example is sketched after the snippet):
tblproperties ("skip.header.line.count"="1");
-- Once the table property is set, queries skip the specified number of lines
-- at the beginning of each text data file. Therefore, all the files in the table
-- should follow the same convention for header lines.
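A minimal sketch of how that property fits into the full statement, with hypothetical column names and HDFS path:

-- Hypothetical example: the CSV has a header row, so the column names and
-- types are still declared explicitly, and the header line is skipped on read.
CREATE EXTERNAL TABLE employee (
  id INT,
  name STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/warehouse/employee_csv'
TBLPROPERTIES ("skip.header.line.count"="1");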
I exported data from a BigQuery dataset to a JSON file using the API, but the JSON that I download has the properties saved as array objects with the key name "V" instead of the original property names.
I don't want to export the table of the dataset to Google Cloud Storage, nor to execute a specific query.
I need to get the table data of the dataset, with the original schema, into a JSON file using the API.
I am using the API:
Function:
Tabledata: list: Retrieves table data from a specified set of rows.
https://cloud.google.com/bigquery/docs/reference/v2/tabledata/list#request
Function
Tables: get: This method does not return the data in the table; it only returns the table resource, which describes the structure of the table.
https://cloud.google.com/bigquery/docs/reference/v2/tables/get#request
Thank you,
Best regards,