Data Factory copy pipeline from API - azure-sql-database

We use an Azure Data Factory copy pipeline to transfer data from REST APIs to an Azure SQL Database, and it is doing some strange things. Because we loop over a set of APIs that need to be transferred, the mapping on the copy activity is left empty.
For one API, however, the automatic mapping goes wrong. The destination table is created with all the needed columns and correct data types based on the received metadata, but when we run the pipeline for that specific API, the following message is shown:
{ "errorCode": "2200", "message": "ErrorCode=SchemaMappingFailedInHierarchicalToTabularStage,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failed to process hierarchical to tabular stage, error message: Ticks must be between DateTime.MinValue.Ticks and DateTime.MaxValue.Ticks.\r\nParameter name: ticks,Source=Microsoft.DataTransfer.ClientLibrary,'", "failureType": "UserError", "target": "Copy data1", "details": [] }
As a test we did the mapping for that API manually, using the "Import Schema" option on the Mapping page; there we see that all the fields are mapped correctly. We executed the pipeline again using that mapping and everything worked fine.
But of course we don't want to use a manual mapping, because the pipeline is used in a loop for different APIs as well.
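For what it's worth, the error text points at a date value outside the range a .NET DateTime can represent (years 1 through 9999), so one of the source fields likely carries such a value. Below is a small diagnostic sketch, not ADF configuration: it scans a saved copy of the API response for date-like strings outside that range so you can see which field trips the copy activity. The file name and payload shape are assumptions for illustration.

# Diagnostic sketch (not ADF code). The "Ticks must be between
# DateTime.MinValue.Ticks and DateTime.MaxValue.Ticks" error means a date is
# outside the .NET DateTime range (years 1-9999); this scans a saved copy of
# the REST response for date-like strings outside that range.
# "api_response.json" and the payload shape are assumptions for illustration.
import json
import re

DATE_RE = re.compile(r"^(\d{1,6})-\d{2}-\d{2}")  # crude ISO-like date prefix

def find_out_of_range_dates(obj, path=""):
    """Recursively yield (path, value) for date strings with year < 1 or > 9999."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_out_of_range_dates(value, f"{path}.{key}" if path else key)
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from find_out_of_range_dates(item, f"{path}[{i}]")
    elif isinstance(obj, str):
        match = DATE_RE.match(obj)
        if match and not 1 <= int(match.group(1)) <= 9999:
            yield path, obj

with open("api_response.json") as f:
    payload = json.load(f)

for field_path, value in find_out_of_range_dates(payload):
    print(f"Out-of-range date at {field_path}: {value}")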

Related

How do I create a BigQuery Pub/Sub direct subscription in GCP? I get an error: failed to create

I am trying to publish data directly from Pub/Sub to BigQuery.
I have created a topic with a schema.
I have created a table.
But when I create the subscription, I get an error: Request contains an invalid argument.
gcloud pubsub subscriptions create check-me.httpobs --topic=check-me.httpobs --bigquery-table=agilicus:checkme.httpobs --write-metadata --use-topic-schema
ERROR: Failed to create subscription [projects/agilicus/subscriptions/check-me.httpobs]: Request contains an invalid argument.
ERROR: (gcloud.pubsub.subscriptions.create) Failed to create the following: [check-me.httpobs].
There's not really a lot of diagnostics I can do here.
Is there any worked-out example that shows this? What am I doing wrong to get this error?
Side note: it's really a pain to have to create the BQ schema in its native JSON format and then create the message schema in Avro format. Similar, but different, and no conversion tools that I can find.
If I run with --log-http, it doesn't really enlighten:
{
"error": {
"code": 400,
"message": "Request contains an invalid argument.",
"status": "INVALID_ARGUMENT"
}
}
-- update:
switched to protobuf, same problem.
https://gist.github.com/donbowman/5ea8f8d8017493cbfa3a9e4f6e736bcc has the details.
gcloud version
Google Cloud SDK 404.0.0
alpha 2022.09.23
beta 2022.09.23
bq 2.0.78
bundled-python3-unix 3.9.12
core 2022.09.23
gsutil 5.14
I have confirmed all the fields are present and in the correct format, as per https://github.com/googleapis/googleapis/blob/master/google/pubsub/v1/pubsub.proto#L639
Specifically:
{"ackDeadlineSeconds": 900, "bigqueryConfig": {"dropUnknownFields": true, "table": "agilicus:checkme.httpobs", "useTopicSchema": true, "writeMetadata": true}, "name": "projects/agilicus/subscriptions/check-me.httpobs", "topic": "projects/agilicus/topics/check-me.httpobs"}
I have also tried using the API Explorer to post this, same effect.
I have also tried using the Python example:
https://cloud.google.com/pubsub/docs/samples/pubsub-create-bigquery-subscription#pubsub_create_bigquery_subscription-python
to create it. Same error, with a slight bit more info (IP, grpc_status 3):
debug_error_string = "UNKNOWN:Error received from peer ipv6:%5B2607:f8b0:400b:807::200a%5D:443 {created_time:"2022-10-04T20:54:44.600831924-04:00", grpc_status:3, grpc_message:"Request contains an invalid argument."}"
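For reference, here is roughly what the linked Python sample boils down to, as a minimal sketch with google-cloud-pubsub; the project, topic, subscription and table IDs below mirror the question and are placeholders. Note that the sample passes the table in the dotted "project.dataset.table" form rather than the project:dataset.table form used by gcloud/bq, which is worth double-checking; I'm not claiming that is the cause of the INVALID_ARGUMENT.

# Minimal sketch based on the linked Python sample; IDs are placeholders
# mirroring the question. Table is passed as "project.dataset.table".
from google.cloud import pubsub_v1

project_id = "agilicus"
topic_id = "check-me.httpobs"
subscription_id = "check-me.httpobs"
bigquery_table_id = "agilicus.checkme.httpobs"

publisher = pubsub_v1.PublisherClient()
subscriber = pubsub_v1.SubscriberClient()
topic_path = publisher.topic_path(project_id, topic_id)
subscription_path = subscriber.subscription_path(project_id, subscription_id)

bigquery_config = pubsub_v1.types.BigQueryConfig(
    table=bigquery_table_id,
    use_topic_schema=True,
    write_metadata=True,
    drop_unknown_fields=True,
)

with subscriber:
    subscription = subscriber.create_subscription(
        request={
            "name": subscription_path,
            "topic": topic_path,
            "bigquery_config": bigquery_config,
        }
    )
    print(f"Created BigQuery subscription: {subscription.name}")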

How do we filter the entities whose names do not start with "msdn" using the MS Dynamics Web API

I want to get all entities from MS Dynamics whose names do not start with the prefix 'msdn'.
I tried the API call below and got this error:
GET /api/data/v9.1/EntityDefinitions?$select=LogicalName&$filter=not startswith(LogicalName,%27msdn%27)
Response:
{
"error":
{"code":"0x0",
"message":"The \"startswith\" function isn't supported for Metadata Entities."
}
}
I referred to https://learn.microsoft.com/en-us/powerapps/developer/common-data-service/webapi/query-data-web-api#standard-query-functions
I have checked that in one of my environments as well. What you require is not possible.
You will have to do it in two steps:
Retrieve all entities, and then filter them out in your local program, be it JavaScript/C#, JSON filtering, Power Automate, or something similar.
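A rough sketch of those two steps in Python, assuming you already have an OAuth bearer token for the environment; the org URL and token below are placeholders.

# Two-step approach: fetch all entity definitions, then filter locally,
# since startswith() is not supported for metadata entities.
# org_url and access_token are placeholders you must supply.
import requests

org_url = "https://yourorg.crm.dynamics.com"
access_token = "<bearer token acquired via Azure AD / MSAL>"

response = requests.get(
    f"{org_url}/api/data/v9.1/EntityDefinitions",
    params={"$select": "LogicalName"},
    headers={
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/json",
        "OData-MaxVersion": "4.0",
        "OData-Version": "4.0",
    },
)
response.raise_for_status()

entities = response.json()["value"]
non_msdn = [e["LogicalName"] for e in entities if not e["LogicalName"].startswith("msdn")]
print(non_msdn)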

Can BigQuery report the mismatched schema field?

When I upsert a row that mismatches the schema, I get a PartialFailureError along with a message, e.g.:
[ { errors:
[ { message: 'Repeated record added outside of an array.',
reason: 'invalid' } ],
...
]
However, for large rows this isn't sufficient, because I have no idea which field is the one causing the error. The bq command does report the malformed field.
Is there either a way to configure or access the name of the offending field, or can this be added to the API endpoint?
Please see this GitHub issue: https://github.com/googleapis/nodejs-bigquery/issues/70. Apparently the Node.js client library is not reading the location field from the API response, so it's not able to return it to the caller.
Workaround that worked for me: I copied the JSON payload into my Postman client and manually sent the request to the REST API (let me know if you need more details on how to do it).
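Roughly, that workaround amounts to calling the tabledata.insertAll REST endpoint yourself and reading the location field of each returned error, which names the offending field. A sketch in Python; project, dataset, table, the token and the row payload are placeholders.

# Sketch of the Postman-style workaround: call tabledata.insertAll directly
# and print insertErrors[].errors[].location, which identifies the bad field.
# Project/dataset/table, token and the example row are placeholders.
import requests

project, dataset, table = "my-project", "my_dataset", "my_table"
access_token = "<OAuth access token>"

url = (
    "https://bigquery.googleapis.com/bigquery/v2/"
    f"projects/{project}/datasets/{dataset}/tables/{table}/insertAll"
)
body = {"rows": [{"json": {"name": "example", "tags": {"nested": "oops"}}}]}  # the failing row

resp = requests.post(url, json=body, headers={"Authorization": f"Bearer {access_token}"})
for insert_error in resp.json().get("insertErrors", []):
    for err in insert_error["errors"]:
        # "location" names the field that failed validation
        print(insert_error.get("index"), err.get("location"), err["reason"], err["message"])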

Checking that a SQL Azure Database is available from C# code

I do an up-scale on an Azure SQL Database with code like this:
ALTER DATABASE [MYDATABASE] Modify (SERVICE_OBJECTIVE = 'S1');
How is it possible to know from C# code when Azure has completed the job and the database is available again?
Checking the SERVICE_OBJECTIVE value is not enough; the process still continues after it changes.
Instead of performing this task in T-SQL, I would perform it from C# using a call to the REST API; you can find all of the details on MSDN.
Specifically, you should look at the Get Create or Update Database Status API method, which allows you to call the following URL:
GET https://management.azure.com/subscriptions/{subscription-id}/resourceGroups/{resource-group-name}/providers/Microsoft.Sql/servers/{server-name}/databases/{database-name}/operationResults/{operation-id}?api-version={api-version}
The JSON body allows you to pass the following parameters:
{
"id": "{uri-of-database}",
"name": "{database-name}",
"type": "{database-type}",
"location": "{server-location}",
"tags": {
"{tag-key}": "{tag-value}",
...
},
"properties": {
"databaseId": "{database-id}",
"edition": "{database-edition}",
"status": "{database-status}",
"serviceLevelObjective": "{performance-level}",
"collation": "{collation-name}",
"maxSizeBytes": {max-database-size},
"creationDate": "{date-create}",
"currentServiceLevelObjectiveId":"{current-service-id}",
"requestedServiceObjectiveId":"{requested-service-id}",
"defaultSecondaryLocation": "{secondary-server-location}"
}
}
In the properties section, the serviceLevelObjective property is the one you can use to resize the database. To finish off, you can then perform a GET on the Get Database API method, where you can compare the currentServiceLevelObjectiveId and requestedServiceObjectiveId properties to ensure your command has been successful.
Note: Don't forget to pass all of the common parameters required to make API calls in Azure.
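The question asks for C#, but the polling loop itself is simple; here is a sketch in Python with requests that follows the Get Database comparison described above (the same flow maps directly onto HttpClient in C#). The identifiers, api-version and token are placeholders/assumptions.

# Sketch only: poll the Get Database endpoint and compare the current and
# requested service level objective IDs, as described above. Subscription,
# resource group, server, database, token and api-version are placeholders.
import time
import requests

subscription_id = "<subscription-id>"
resource_group = "<resource-group-name>"
server_name = "<server-name>"
database_name = "<database-name>"
api_version = "2014-04-01"  # assumption; use the version your endpoint documents
arm_token = "<Azure AD bearer token for https://management.azure.com/>"

database_url = (
    "https://management.azure.com"
    f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
    f"/providers/Microsoft.Sql/servers/{server_name}/databases/{database_name}"
    f"?api-version={api_version}"
)
headers = {"Authorization": f"Bearer {arm_token}"}

while True:
    properties = requests.get(database_url, headers=headers).json()["properties"]
    if properties["currentServiceLevelObjectiveId"] == properties["requestedServiceObjectiveId"]:
        print("Requested service objective reached; database scaling is complete.")
        break
    print(f"Still scaling, status: {properties.get('status')}")
    time.sleep(10)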

BigQuery Command Line Tool: get error details

One of my jobs keeps failing, and when I looked into why (by requesting the job details) I got the following output:
status": {
"errorResult": {
"location": "gs://sf_auto/Datastore Mapper modules.models.userData/15716706166748C8426AD/output-46",
"message": "JSON table encountered too many errors, giving up. Rows: 1; errors: 1.",
"reason": "invalid"
},
"errors": [
{
"location": "gs://sf_auto/Datastore Mapper modules.models.userData/15716706166748C8426AD/output-46",
"message": "JSON table encountered too many errors, giving up. Rows: 1; errors: 1.",
"reason": "invalid"
}
],
"state": "DONE"
Problem is, it doesn't help at all, and I need more details. Is there any way to understand which column or attribute caused the failure? Is there any way to get more information?
Edit: additional details
We're running a MapReduce job on App Engine to transfer our datastore from App Engine to BigQuery.
The files are stored on Google Cloud Storage.
It's creating a brand new table instead of adding to an existing one.
Update #2
I played around with the query, trying lots of things as well as adjusting the schema, and I've narrowed the problem down to the uuid. For some reason this type of data messes everything up:
"uuid": "XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
The schema defines it as a String
OK, after loads of debugging I found the error... in the newline-delimited JSON file we had two attributes that were similar:
uuid: "XXX..."
uuId: "XXX..."
This has been there for a while, so I think some change within BigQuery started to require that keys be unique regardless of capitalization. Will test some more and confirm!
A recent change made loads of JSON data case-insensitive in field names, to be consistent with how SQL queries treat field names. I have opened a work item to track improving the error message for this case.
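Given that root cause, a cheap pre-flight check before loading is to scan the newline-delimited JSON for keys that collide case-insensitively. A sketch follows; it only inspects top-level keys, and the file name is a placeholder.

# Sketch: flag keys in an NDJSON file that collide when compared
# case-insensitively (e.g. "uuid" vs "uuId"), which is what tripped up the
# load above. Top-level keys only; "export.ndjson" is a placeholder.
import json
from collections import defaultdict

def case_collisions(path):
    with open(path) as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue
            seen = defaultdict(list)
            for key in json.loads(line):
                seen[key.lower()].append(key)
            for variants in seen.values():
                if len(variants) > 1:
                    yield line_no, variants

for line_no, variants in case_collisions("export.ndjson"):
    print(f"Line {line_no}: case-insensitive duplicate keys {variants}")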