BigQuery: export table data to JSON using the API

I exported data from a BigQuery dataset to a JSON file using the API, but in the JSON I download each property is saved as an array of objects keyed "v" instead of the original property name.
I don't want to export the table to Google Cloud Storage, nor to execute a specific query.
I need to get the table data, with the original schema, into a JSON file using the API.
I am using these API methods:
Tabledata: list - retrieves table data from a specified set of rows.
https://cloud.google.com/bigquery/docs/reference/v2/tabledata/list#request
Tables: get - this method does not return the data in the table; it only returns the table resource, which describes the structure of the table.
https://cloud.google.com/bigquery/docs/reference/v2/tables/get#request
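For context, a minimal sketch of how these two calls fit together to rebuild rows with the original field names (assuming google-api-python-client with default credentials; the project, dataset, and table names are made up, and a flat schema is assumed):

import json
from googleapiclient.discovery import build  # assumes google-api-python-client + default credentials

PROJECT, DATASET, TABLE = "my-project", "my_dataset", "my_table"  # illustrative names

bq = build("bigquery", "v2")

# Tables: get returns the schema (the original field names), not the data.
table = bq.tables().get(projectId=PROJECT, datasetId=DATASET, tableId=TABLE).execute()
names = [field["name"] for field in table["schema"]["fields"]]

# Tabledata: list returns each row as {"f": [{"v": ...}, ...]} -- hence the "v" keys.
rows = []
request = bq.tabledata().list(projectId=PROJECT, datasetId=DATASET, tableId=TABLE)
while request is not None:
    response = request.execute()
    for row in response.get("rows", []):
        # Zip the positional "v" cells back onto the original column names.
        rows.append(dict(zip(names, (cell["v"] for cell in row["f"]))))
    request = bq.tabledata().list_next(request, response)

with open("table.json", "w") as fh:
    json.dump(rows, fh, indent=2)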
Thank you,
Best regards,

Related

Big Query: create table from first key in JSON only

I am trying to create a table from JSON files in BigQuery and want just one column which will represent the first key 'id' only.
Creating a schema with only one column causes errors because all of the JSON keys in the input files are considered.
Is there a way to create a table that corresponds to only specific JSON keys?
Unfortunately, you can't create a table in BigQuery from just one key of a JSON file. You can create a feature request at this link.
You have these options:
Option 1
Don't import as JSON, but as CSV instead (define the null character as the separator).
Each line then has only one column: the full JSON string.
Parse inside BigQuery with maximum flexibility (JSON parsing functions and even JavaScript UDFs); see the sketch after this list.
Option 2
Do a 2-step import:
Import as a new table with all the columns.
Append "SELECT column1 FROM [newtable]" into the existing table.

How to Set default value of empty data of column in copy activity from csv file using azure data factory v2

I have multiple CSV files and multiple tables.
The table name is the file name, and the column names come from the first row of the CSV file.
Now I want to add a default value for empty strings in the sink table.
Consider my scenario:
employee:
id int, name varchar, is_active bit NULL
employee.csv:
id|name|is_active
1|raja|
Now, when I try to copy the CSV data to the PostgreSQL table, it throws an error.
The expected result is the default value when the field is empty.
You can use NULLIF in PostgreSQL:
NULLIF(argument_1, argument_2);
The NULLIF function returns a null value if argument_1 equals argument_2; otherwise it returns argument_1.
This way you can replace the empty value with NULL (or substitute some other value).
If your error is related to a type mismatch, consider typecasting the column first.
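A minimal sketch of that idea outside ADF, using psycopg2 directly against the question's employee table (connection details are made up):

import psycopg2  # assumes psycopg2 is installed and PostgreSQL is reachable

conn = psycopg2.connect("dbname=mydb user=me password=secret host=localhost")  # illustrative
cur = conn.cursor()

# The CSV row "1|raja|" arrives with an empty is_active. NULLIF turns the empty
# string into NULL before the cast, so NULL is stored instead of a conversion
# error being raised (wrap it in COALESCE to substitute a default instead).
cur.execute(
    """
    INSERT INTO employee (id, name, is_active)
    VALUES (%s, %s, NULLIF(%s, '')::bit)
    """,
    (1, "raja", ""),
)
conn.commit()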
Thanks!
As per the issue, I tried to reproduce the scenario; here is the outcome, which copied successfully. You have to use:
Source dataset: employee.csv from Azure Blob Storage.
Sink dataset: here I used Azure SQL DB as the sink because of some limitations, but since you used PostgreSQL it is almost the same.
Copy activity settings:
Under the mapping settings there is type conversion; you have to import the schema, or you can add the mapping dynamically (a sketch follows).
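A rough sketch of what such a dynamic mapping can look like in the copy activity's translator, written here as a Python dict; the column types and conversion settings are illustrative:

# In the pipeline JSON this object sits under the Copy activity's
# typeProperties.translator; it maps source text columns to typed sink columns.
translator = {
    "type": "TabularTranslator",
    "mappings": [
        {"source": {"name": "id", "type": "String"},
         "sink": {"name": "id", "type": "Int32"}},
        {"source": {"name": "name", "type": "String"},
         "sink": {"name": "name", "type": "String"}},
        {"source": {"name": "is_active", "type": "String"},
         "sink": {"name": "is_active", "type": "Boolean"}},
    ],
    "typeConversion": True,
    "typeConversionSettings": {
        "allowDataTruncation": True,
        "treatBooleanAsNumber": False,
    },
}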
Output:
An alternative is to use a Data Flow: if you have multiple data fields, you can use the derived column transformation to generate new columns in your data flow or to modify existing fields.
For more details, refer to Derived column transformation in mapping data flow.
You can also refer to this Microsoft Q&A post for more insights: Copy Task failure because of conversion failure

How to get an array from JSON in the Azure Data Factory?

My current (not properly working) setup has two pipelines:
Get API data to lake: for each row in a metadata table in SQL, call the REST API and copy the reply (JSON files) to the Blob data lake.
Copy data from the lake to SQL: for each file, auto-create a table in SQL.
The result is the correct number of tables in SQL; only the content of the tables is not what I hoped for. They all contain one column named odata.metadata and one entry, the link to the metadata.
If I manually remove the metadata from the JSON in the data lake and then run the second pipeline, the SQL table is what I want to have.
Have:
{
  "odata.metadata": "https://test.com",
  "value": [
    {
      "Key": "12345",
      "Title": "Name",
      "Status": "Test"
    }
  ]
}
Want:
[
  {
    "Key": "12345",
    "Title": "Name",
    "Status": "Test"
  }
]
I tried to add $.['value'] in the API call. The result then was no odata.metadata line, but the array started with {"value":, which resulted in an error when copying to SQL.
I also tried to use mapping (in the sink) to SQL. That gives the wanted result for the dataset I manually specified the mapping for, but it only goes well for datasets with the same number of columns in the array. I don't want to do the mapping manually for 170 calls...
Does anyone know how to handle this in ADF? For now I feel like the only solution is to add a Python step in the pipeline (roughly the sketch below), but I hope for a somewhat standard ADF way to do this!
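The kind of Python step meant above, as a minimal sketch that strips the odata.metadata wrapper from every JSON file; the folder paths are made up:

import json
from pathlib import Path

src_dir = Path("lake/raw")    # illustrative input folder in the data lake
dst_dir = Path("lake/clean")  # illustrative output folder
dst_dir.mkdir(parents=True, exist_ok=True)

for src in src_dir.glob("*.json"):
    reply = json.loads(src.read_text())
    # Keep only the array under "value", dropping the odata.metadata wrapper,
    # so each file ends up in the "Want" shape shown above.
    (dst_dir / src.name).write_text(json.dumps(reply["value"], indent=2))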
You can add another pipeline with a data flow that removes that content from the JSON file before copying the data to SQL, using the flatten formatter.
Before flattening the JSON file:
This is what I see when the JSON data is copied to the SQL database without flattening:
After flattening the JSON file:
I added a pipeline with a data flow that flattens the JSON file and removes the odata.metadata content from the array.
Source preview:
Flatten formatter:
Select the required object from the input array.
After selecting the value object from the input array, you can see only the values under value in the flatten formatter preview.
Sink preview:
File generated after flattening.
Copy the generated file as input to SQL.
Note: if your input file schema is not constant, you can enable Allow schema drift to allow schema changes.
Reference: Schema drift in mapping data flow

Using ADF Data Flow Derived Column transform against nested Delta structures

I'm trying to use a derived column transform within an ADF (Gen 2) Data Flow where I've ingested a Delta table with nested structures. I'm struggling with the syntax needed to flatten out these structures, and no column info is displayed despite me being able to preview the data.
Such a structure would be:
{
  "ContactId": "1002657",
  "Name": {
    "FirstName": "Donna",
    "FullName": "Donna Brittain",
    "LastName": "Brittain"
  }
}
Data Preview working OK:
Data Preview
The structure of my Delta table:
Delta Table Struct
The error I'm getting trying to reference a nested column:
Derived Column Task
How can I reference a nested column such as Name.FirstName to flatten it out to FirstName and why is it not showing up in any of the mappings?
There is an easy way to flatten the nested structures: we can use a Copy activity in ADF first, and it will automatically flatten the nested columns.
Copy the data into Azure storage such as a data lake (here I used Azure Data Lake Storage Gen2); then we can use it as the data source in the Data Flow.
We can create a txt or csv file with headers in the data lake.
Then we can define a Copy activity in ADF and set the mapping (see the sketch below).
After running debug, we can see the result. We can use it as the data source in the data flow.
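A rough sketch of the kind of mapping meant here, hierarchical source paths flattened to sink columns, written as a Python dict; the exact shape of your pipeline JSON may differ:

# In the pipeline JSON this object sits under the Copy activity's
# typeProperties.translator and maps the nested Name struct to flat columns.
translator = {
    "type": "TabularTranslator",
    "mappings": [
        {"source": {"path": "$['ContactId']"}, "sink": {"name": "ContactId"}},
        {"source": {"path": "$['Name']['FirstName']"}, "sink": {"name": "FirstName"}},
        {"source": {"path": "$['Name']['FullName']"}, "sink": {"name": "FullName"}},
        {"source": {"path": "$['Name']['LastName']"}, "sink": {"name": "LastName"}},
    ],
}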
Update:
In the sink, we can set the value of the Max rows per file option as follows:
ADF will then divide the file into several files.

Using elemMatch in Hive with json field

I am using Hive for JSON storage. I have created a table with only one string column containing the whole JSON document. I have tested the get_json_object function that Hive offers, but I am not able to write a query that iterates over all the subdocuments in a list and finds a value in a specific field.
In MongoDB, this problem can be solved by using $elemMatch as the documentation says.
Is there any way to do something like this in Hive?
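One rough approximation, as a sketch rather than a full $elemMatch equivalent: get_json_object accepts a [*] wildcard, so you can pull every value of a field out of the list and filter on it. Here it is run through PyHive; the table, column, and field names are made up:

from pyhive import hive  # assumes PyHive is installed and HiveServer2 is reachable

conn = hive.Connection(host="hive-server", port=10000)  # illustrative connection
cur = conn.cursor()

# For a document like {"items":[{"status":"new"},{"status":"shipped"}]},
# get_json_object(json_doc, '$.items[*].status') returns the matched values
# (a JSON array when there are several), so a LIKE filter gives a crude
# containment check over the list.
cur.execute("""
    SELECT json_doc
    FROM raw_json
    WHERE get_json_object(json_doc, '$.items[*].status') LIKE '%shipped%'
""")
for (doc,) in cur.fetchall():
    print(doc)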