Data Transfer Job Fails with Error: "Cannot load CSV data with a nested schema" - google-bigquery

I created a Data Transfer job with the following configuration:
Data source - Cloud Storage (bucket containing Datastore export data)
Destination - BigQuery table
Run - On-demand
It fails with the error: Cannot load CSV data with a nested schema
I cannot figure out how to resolve this.
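If the bucket really contains a Cloud Datastore export rather than plain CSV files, one possible direction (a sketch only, with a hypothetical export path) is to load the export metadata file with the Datastore Backup source format instead of CSV, since CSV loads cannot carry nested or repeated fields, which is exactly what this error is complaining about:
bq load --source_format=DATASTORE_BACKUP my_dataset.my_table gs://my-bucket/datastore-export/default_namespace/kind_MyKind/default_namespace_kind_MyKind.export_metadata
The dataset, table, and path here are placeholders; the point is the --source_format=DATASTORE_BACKUP flag.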

Related

Azure Data Factory Pass "Object api name" of Salesforce Tables as Parameter from ForEach

I am trying to load some tables from Salesforce; the number of tables can change from time to time. We created a CSV file on blob storage which contains the names of the tables that we want to load from Salesforce. This CSV contains one column, as shown below:
CSV File
I have created a Lookup activity that refers to the CSV file (and I disabled the 'First row only' option), then I connected it to a ForEach activity which iterates over each row of the Lookup activity's output, as shown below:
@activity('TablesLookup').output.value
Inside the ForEach I have created a Copy Data activity which has Salesforce as the data source.
The problem is that I'm trying to pass the table name of the Salesforce data source (the Object API name) as a parameter from the ForEach, but I couldn't find the option where I can pass the table name. Details are in the figures below:
Salesforce Dataset - Parameters
Salesforce Dataset - Connection
ForEach - Copy Data, Salesforce Data Source
This gives me the following error:
ErrorCode=UserErrorOdbcOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=ERROR [HY000] [Microsoft][Salesforce] (120) SOQL_FIRST mode prepare failure: SOQL error: [Microsoft][SQLEngine] (31480) syntax error near 'SELECT *<<< ??? >>> FROM "Student"'. SQL error: [Microsoft][SQLEngine] (31740) Table or view not found: Deloitte..Student,Source=Microsoft.DataTransfer.ClientLibrary.Odbc.OdbcConnector,''Type=System.Data.Odbc.OdbcException,Message=ERROR [HY000] [Microsoft][Salesforce] (120) SOQL_FIRST mode prepare failure: SOQL error: [Microsoft][SQLEngine] (31480) syntax error near 'SELECT *<<< ??? >>> FROM "Student"'. SQL error: [Microsoft][SQLEngine] (31740) Table or view not found: Deloitte..Student,Source=Microsoft Salesforce ODBC Driver,'
If your Lookup activity's output is a list of rows that each carry a Names field, you need to change @item() to @item().Names when you pass the Table Name value from the Lookup activity's output to the dataset parameter named 'tableName' (in the 'ForEach - Copy Data, Salesforce Data Source' step).
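As a concrete illustration (assuming Names is the column header of the one-column CSV, which is what the expression above relies on), the two settings would be wired up as:
ForEach - Items: @activity('TablesLookup').output.value
Copy Data source, dataset parameter tableName: @item().Names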

Loading Avro Data into BigQuery via command-line?

I have created an Avro Hive table and loaded data into it from another table using the Hive INSERT OVERWRITE command. I can see the data in the Avro Hive table, but when I try to load it into a BigQuery table, I get an error.
Table schema:
CREATE TABLE `adityadb1.gold_hcth_prfl_datatype_acceptence`(
`prfl_id` bigint,
`crd_dtl` array< struct < cust_crd_id:bigint,crd_nbr:string,crd_typ_cde:string,crd_typ_cde_desc:string,crdhldr_nm:string,crd_exprn_dte:string,acct_nbr:string,cre_sys_cde:string,cre_sys_cde_desc:string,last_upd_sys_cde:string,last_upd_sys_cde_desc:string,cre_tmst:string,last_upd_tmst:string,str_nbr:int,lng_crd_nbr:string>>)
STORED AS AVRO;
The error that I am getting:
Error encountered during job execution:
Error while reading data, error message: The Apache Avro library failed to read data with the following error: Cannot resolve:
I am using the following command to load the data into BigQuery:
bq load --source_format=AVRO dataset.tableName avro-filePath
Make sure that there is data in the GCS location you are pointing at and that the data contains the schema (it should if you created it from Hive). Here is an example of how to load the data:
bq --location=US load --source_format=AVRO --noreplace my_dataset.my_avro_table gs://myfolder/mytablefolder/part-m-00001.avro
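If the load still fails with the truncated "Cannot resolve" message, it can help to look at the schema actually embedded in the Avro file and compare it with the Hive definition above; a rough diagnostic sketch (the avro-tools jar version and local file name are assumptions) is:
gsutil cp gs://myfolder/mytablefolder/part-m-00001.avro .
java -jar avro-tools-1.11.1.jar getschema part-m-00001.avro
getschema prints the writer schema stored in the file, so you can see which field name or type BigQuery cannot resolve.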

Unable to load avro file to BigQuery because of schema mismatch

I am new to BigQuery and I was trying to load an Avro file into a BigQuery table. The first two times I was able to load the Avro file into the BigQuery table. From the third time onwards it starts failing, and the error message is:
Waiting on bqjob_r77fb1a791c9ab204_0000015c88ab3ad8_1 ... (0s) Current status: DONE
BigQuery error in load operation: Error processing job 'xxx-yz-df:bqjob_r77fb1a791c9ab204_0000015c88ab3ad8_1': Provided Schema does not match Table xxx-yz-df:adityadb.avro_poc3_part_stage$20120611.
I tried many times. How can the schema be a mismatch for the same file if you try more than twice? The load command I was using is:
bq load --source_format=AVRO adityadb.avro_poc3_part_stage$20120611 gs://reair_ddh/apps/hive/warehouse/adityadb1.db/avro_poc3_part_txt/ingestion_time=20120611/000000_0
I don't know why this is happening. Any help would be appreciated. Thank you.
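One way to narrow this down (a diagnostic sketch, not a confirmed fix) is to compare the schema BigQuery already holds for the table with the schema in the Avro file you are loading, and to quote the table spec so the shell does not expand the $20120611 partition decorator:
bq show --schema --format=prettyjson adityadb.avro_poc3_part_stage
bq load --source_format=AVRO 'adityadb.avro_poc3_part_stage$20120611' gs://reair_ddh/apps/hive/warehouse/adityadb1.db/avro_poc3_part_txt/ingestion_time=20120611/000000_0
If the stored table schema and the Avro writer schema have drifted between loads (for example, a field added or a type changed on the Hive side), a 'Provided Schema does not match Table' error is what you would expect to see.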

Google Dataflow job and BigQuery failing on different regions

I have a Google Dataflow job that is failing on:
BigQuery job ... finished with error(s): errorResult:
Cannot read and write in different locations: source: EU, destination: US, error: Cannot read and write in different locations: source: EU, destination: US
I'm starting the job with
--zone=europe-west1-b
And this is the only part of the pipeline that does anything with BigQuery:
Pipeline p = Pipeline.create(options);
p.apply(BigQueryIO.Read.fromQuery(query));
The BigQuery table I'm reading from has this in the details: Data Location EU
When I run the job locally, I get:
SEVERE: Error opening BigQuery table dataflow_temporary_table_339775 of dataset _dataflow_temporary_dataset_744662 : 404 Not Found
I don't understand why it is trying to write to a different location if I'm only reading data. And even if it needs to create a temporary table, why is it being created in a different region?
Any ideas?
I would suggest verifying:
that the staging location for the Google Dataflow job is in the same region as the BigQuery dataset (EU here);
that the Google Cloud Storage location used by Dataflow is also in that same region.
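A concrete way to line the locations up (a sketch, assuming a hypothetical bucket name) is to create the staging bucket in the EU and point the job at it, so the temporary data sits in the same location as the EU source table:
gsutil mb -l EU gs://my-eu-dataflow-bucket
and then launch the pipeline with
--zone=europe-west1-b --stagingLocation=gs://my-eu-dataflow-bucket/staging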

Issues creating table from bucket file

I have a big table (about 10 million rows) that I'm trying to pull into BigQuery. I had to upload the CSV into a bucket due to the size constraints when creating the table. When I try to create the table using the Datastore Backup option, the job fails with the error:
Error Reason:invalid. Get more information about this error at Troubleshooting Errors: invalid.
Errors:
gs://es_main/provider.csv does not contain valid backup metadata.
Job ID: liquid-cumulus:job_KXxmLZI0Ulch5WmkIthqZ4boGgM
Start Time: Dec 16, 2015, 3:00:51 PM
End Time: Dec 16, 2015, 3:00:51 PM
Destination Table: liquid-cumulus:ES_Main.providercloudtest
Source URI: gs://es_main/provider.csv
Source Format: Datastore Backup
I've troubleshot by taking a small sample of rows from the same table and uploading it with the CSV option during table creation; that works without any errors and I can view the data just fine.
I'm just wondering what the metadata should be set to with the "Edit metadata" option within the bucket, or if there is some other workaround I'm missing. Thanks
The error message for the job that you posted is telling you that the file you're providing is not a Datastore Backup file. Note that "Datastore" here means Google Cloud Datastore, which is another storage solution that it sounds like you aren't using. A Cloud Datastore Backup is a specific file type from that storage product which is different from CSV or JSON.
Setting the file metadata within the Google Cloud Storage browser, which is where the "Edit metadata" option you're talking about lives, should have no impact on how BigQuery imports your file. It might be important if you were doing something more involved with your file from Cloud Storage, but it isn't important to BigQuery as far as I know.
To upload a CSV file from Google Cloud Storage to BigQuery, make sure to select CSV as the source format and Google Cloud Storage as the load source.
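The command-line equivalent (a sketch; the inline schema is made up for illustration since the real column list isn't shown, and you could supply a schema file instead) would look something like:
bq load --source_format=CSV --skip_leading_rows=1 ES_Main.providercloudtest gs://es_main/provider.csv provider_id:STRING,provider_name:STRING
The dataset, table, and gs:// URI are the ones from the job details above.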