bigquery nested object: No such field

I have a table with this schema:
I'm trying to upload some data from Google Cloud Storage using the Python client. The file is newline-delimited JSON. Most of my lines don't have the field "passenger_origin.accuracy", but when the field is present I get the following errors:
Error while reading data, error message: JSON parsing error in row starting at position 2122510: No such field: driver_origin.accuracy. (error code: invalid)
Error while reading data, error message: JSON parsing error in row starting at position 2126317: No such field: passenger_origin.accuracy. (error code: invalid)
Example of an invalid row:
{
"id": 1479443,
"is_obsolete": 0,
"seat_count": 1,
"is_ticket_checked": 0,
"score": 0.3709318902,
"is_multimodal": 0,
"fake_paths": 0,
"passenger_origin": {
"id": 2204,
"poi_uuid": "15b4e52c-7c58-442c-98df-1eb06079f6bb",
"user_id": 1987,
"accuracy": 250.0,
"disabled": 0,
"last_update": "2017-03-10T15:15:39",
"created": "2016-02-05T17:06:26",
"modified_by_user": 1,
"is_recurrent": 0,
"source": 1,
"hidden_by_user": 0,
"kind": 2,
},
"driver_origin": {
"id": 412491,
"poi_uuid": "47e90b6d-e178-4e02-9f02-f4ea5f8beaa1",
"user_id": 71471,
"disabled": 0,
"last_update": "2017-11-02T10:09:09",
"created": "2017-11-02T10:09:09",
"modified_by_user": 0,
"is_recurrent": 0,
"source": 1,
"hidden_by_user": 0,
"kind": 2,
},
"passenger_destination": {
"id": 2203,
"poi_uuid": "c531c3ca-47f0-4003-8098-1272fee8d018",
"user_id": 1987,
"accuracy": 250.0,
"disabled": 0,
"last_update": "2017-03-10T15:12:42",
"created": "2016-02-05T17:06:19",
"modified_by_user": 1,
"is_recurrent": 0,
"source": 1,
"hidden_by_user": 0,
"kind": 1,
}
}
The table is created before the data is uploaded and is not modified afterwards. I don't understand why the upload is failing on these fields. Do the RECORD fields have to be REPEATED?

To ignore the fields that aren't present in the schema, use a combination of:
configuration.load.ignoreUnknownValues
configuration.load.maxBadRecords
Setting the first to true and the second to some arbitrarily-high number, e.g. 100000, will enable the load to succeed even if there are extra fields.
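For example, a minimal sketch with the google-cloud-bigquery Python client; the bucket URI and table names are placeholders, and the two settings map to the load-job configuration fields above:
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
job_config.ignore_unknown_values = True  # configuration.load.ignoreUnknownValues
job_config.max_bad_records = 100000      # configuration.load.maxBadRecords

load_job = client.load_table_from_uri(
    "gs://my-bucket/rides.json",                  # placeholder source file
    client.dataset("my_dataset").table("rides"),  # placeholder table
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish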

The problem was that configuration.load.autodetect was set to True. I set it to False and the problem was fixed.
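Continuing the sketch above, that corresponds to one extra line in the job configuration:
job_config.autodetect = False  # keep the table's existing schema instead of inferring one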

Related

Azure Data Factory JSON syntax

In Azure Data Factory, I have a copy activity. The data source is the response body from a REST API POST request.
The sink is a SQL table. The problem is that, even though my JSON data contains multiple rows, only the first row is getting copied.
The source data looks like the following:
{
"offset": 0,
"limit": 1000,
"total": 65,
"loaded": 34,
"unloaded": 31,
"cubeCaches": [
{
"id": "MxMUVDN0Q1MzAk5MDg6RDkxREQxMUU5RDBDNzR2NMTk6YWNsZGxwMTJtc3QuY2952aXppZW50aW5==",
"projectId": "15D91DD11E9D0C74B3319",
"source": {
"name": "12302021",
"id": "07EF95111EC7F954158",
"type": "cube"
},
"state": {
"active": true,
"dirty": false,
"infoDirty": false,
"persisted": true,
"processing": false,
"loadedState": "loaded"
},
"lastUpdateTime": "2022-01-24T14:22:30Z",
"lastHitTime": "2022-02-14T20:02:02Z",
"hitCount": 1,
"size": 798720,
"creatorId": "D4E8BFD56085",
"lastUpdateJob": 18937,
"openViewCount": 0,
"creationTime": "2022-01-24T15:07:24Z",
"historicHitCount": 22,
"dataLanguages": [],
"rowCount": 2726,
"columnCount": 9
},
{
"id": "UYwMTIxMUFNjkxMUU5RDBDMTRCNkMwMDgwRUYzNUQ0MUI6YWNsZjLmNvbQ==",
"projectId": "120D0C1480EF35D41B",
"source": {
"name": "All Clients (YTD)",
"id": "49E5B13466251CD0B54E8F",
"type": "cube"
},
"state": {
"active": true,
"dirty": false,
"infoDirty": false,
"persisted": true,
"processing": false,
"loadedState": "loaded"
},
"lastUpdateTime": "2022-01-03T01:00:01Z",
"hitCount": 0,
"size": 82488152,
"creatorId": "1E2AFB011E80EF35FF14",
"lastUpdateJob": 364091,
"openViewCount": 0,
"creationTime": "2022-02-14T01:04:55Z",
"historicHitCount": 0,
"dataLanguages": [],
"rowCount": 8146903,
"columnCount": 13
}
]
}
I want to add a row in the Sink table (SQL) for every "id" in the JSON. However, when I run the activity, only the first record gets copied. It's mapped correctly, but I want it to copy all rows in the JSON, not just one.
My Mapping tab in Azure Data Factory looks like this:
What am I doing wrong here? I'm thinking there is something wrong with my "Source" syntax for each of the columns...
In $cubeCaches[0][...] you're explicitly mapping the first element from this array into columns, and that's why only one row lands in the Sink.
I don't know a way to achieve what you intend with the Copy activity only. I would use a Mapping Data Flow here, and inside it I would flatten your data (Flatten transformation) to get the array of objects.
Then, from this flattened dataset, you could use a Derived Column to map the JSON fields into the columns of your target, a Select to remove the unwanted original fields, and a Sink to write it to your target location.
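This isn't Data Flow syntax, but a short Python sketch of the reshaping the Flatten transformation performs on this payload (field names taken from the sample above, response file is a placeholder):
import json

# The REST response from the question, loaded from a file for illustration.
with open("response.json") as f:
    payload = json.load(f)

# One output row per element of the cubeCaches array -- this is what
# flattening buys you, instead of only ever reading element [0].
rows = []
for cache in payload["cubeCaches"]:
    rows.append({
        "id": cache["id"],
        "projectId": cache["projectId"],
        "sourceName": cache["source"]["name"],
        "loadedState": cache["state"]["loadedState"],
        "rowCount": cache["rowCount"],
    })

print(rows)  # each dict here would become one row in the SQL sink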

SQL Server trim before and after specific values

I have a database that has a column with a long string and I'm looking for a way to extract just a certain portion of it.
Here is a sample:
{
"vendorId": 53,
"externalRef": "38828059 $567.82",
"lines": [{
"amount": 0,
"lineType": "PURCHASE",
"lineItemType": "INVENTORY",
"inventory": {
"cost": 0,
"quantity": 1,
"row": "6",
"seatType": "CONSECUTIVE",
"section": "102",
"notes": "http://testurl/0F005B52CE7F5892 38828059 $567.82 ,special",
"splitType": "ANY",
"stockType": "ELECTRONIC",
"listPrice": 0,
"publicNotes": " https://brokers.123.com/wholesale/event/146489908 https://www.123.com/buy-event/4897564 ",
"eventId": 3757669,
"eventMapping": {
"eventDate": "",
"eventName": "Brandi Carlile: Beyond These Silent Days Tour",
"venueName": "Gorge Amphitheatre"
},
"tickets": [{
"seatNumber": 1527
}]
}
}]
}
What I'm looking to extract is just http://testurl/0F005B52CE7F5892
Would someone be able to assist me with the syntax for a query that adds a new computed column giving just this extracted value for each row?
I use SQL Server 2008, so some newer functions won't work for me.
Upgrade your SQL Server to a supported version.
But till then, we pity those who dare to face the horror of handling JSON with only the old string functions.
select
    [notes_url] =
        CASE
            WHEN [json_column] LIKE '%"notes": "http%'
            THEN substring(
                     [json_column],
                     -- +10 skips the ten characters of '"notes": "' to land on 'http'
                     patindex('%"notes": "http%', [json_column]) + 10,
                     -- length runs from the start of the URL to the first space after it
                     charindex(' ', [json_column],
                               patindex('%"notes": "http%', [json_column]) + 15)
                     - patindex('%"notes": "http%', [json_column]) - 10)
        END
from [YourTable];
db<>fiddle here
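For comparison, outside SQL Server the same extraction is a short regular expression; a Python sketch, assuming the JSON text is available as a string:
import re

json_text = '..."notes": "http://testurl/0F005B52CE7F5892 38828059 $567.82 ,special"...'

# Capture the URL that directly follows "notes": ", stopping at the first space.
match = re.search(r'"notes": "(http[^ "]+)', json_text)
if match:
    print(match.group(1))  # http://testurl/0F005B52CE7F5892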

Add two json values dynamically in azure data factory

I need to merge two JSON values in Data Factory: one comes dynamically from an activity's output, and the other from a pipeline variable.
I am doing it like this:
@union(activity('Get Order Events Data').output, json('{"orig_orderID" : "variables('orderid')"}'))
But it is showing an error:
Missing comma between arguments
What am I doing wrong here?
But it is showing an error. Missing comma between arguments
This is because the expression variables('orderid') contains ' characters, which split your string literal.
You should use the concat() function: @union(activity('Get Order Events Data').output, json(concat('{"orig_orderID" :', variables('orderid'), '}'))). But this
expression still can't get your expected result, because it wouldn't add the key inside your data. It would produce this:
{
"data": [
{
"id": 145,
"order_id": 256,
"created_at": "2021-06-20T11:48:20Z",
"type": 10,
"sender": -1,
"message": null,
"previous_status": 4,
"fas_user_id": null,
"event_data": "5",
"shopkeeper_timestamp": null,
"store_id": 123
}
],
"orig_orderID": "860"
}
You can try the following expression: @union(activity('Get Order Events Data').output.data[0], json(concat('{"orig_orderID" :', variables('orderid'), '}')))
It gives this result:
{
"id": 145,
"order_id": 256,
"created_at": "2021-06-20T11:48:20Z",
"type": 10,
"sender": -1,
"message": null,
"previous_status": 4,
"fas_user_id": null,
"event_data": "5",
"shopkeeper_timestamp": null,
"store_id": 123,
"orig_orderID": "860"
}
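The merge that union() performs here is essentially a dictionary union; a small Python sketch of the difference between the two expressions, using an abridged version of the sample data:
output = {"data": [{"id": 145, "order_id": 256, "event_data": "5"}]}

# union(output, {...}) adds the key next to the "data" array:
merged_outer = {**output, "orig_orderID": "860"}

# union(output.data[0], {...}) adds the key inside the first element:
merged_inner = {**output["data"][0], "orig_orderID": "860"}

print(merged_outer)  # {'data': [{...}], 'orig_orderID': '860'}
print(merged_inner)  # {'id': 145, 'order_id': 256, 'event_data': '5', 'orig_orderID': '860'}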

Update jsonb object in postgres

One of my columns is jsonb and holds values in the following format. The value of a single row of the column is below.
{
"835": {
"cost": 0,
"name": "FACEBOOK_FB1_6JAN2020",
"email": "test.user#silverpush.co",
"views": 0,
"clicks": 0,
"impressions": 0,
"campaign_state": "paused",
"processed":"in_progress",
"modes":["obj1","obj2"]
},
"876": {
"cost": 0,
"name": "MARVEL_BLACK_WIDOW_4DEC2019",
"email": "test.user#silverpush.co",
"views": 0,
"clicks": 0,
"impressions": 0,
"campaign_state": "paused",
"processed":"in_progress",
"modes":["obj1","obj2"]
}
}
I want to update the inner keys "processed" and "modes" of campaign "876" in the campaign_info column.
I have tried this query:
update safe_vid_info
set campaign_info -> '835' --> 'processed'='completed'
where cid = 'kiywgh';
But it didn't work.
Any help is appreciated. Thanks.
Is this what you want?
jsonb_set(campaign_info, '{876,processed}', '"completed"')
This updates the value at path "876" > "processed" with value 'completed'.
In your update query:
update safe_vid_info
set campaign_info = jsonb_set(campaign_info, '{876,processed}', '"completed"')
where cid = 'kiywgh';
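Since jsonb_set changes one path at a time, you can nest two calls to update both "processed" and "modes" in one statement. A sketch of running that from Python with psycopg2; the connection string and the new "modes" value are placeholders:
import psycopg2

conn = psycopg2.connect("dbname=mydb user=myuser")  # placeholder connection

# Nested jsonb_set: the inner call updates "processed", the outer call
# updates "modes", both under the top-level key "876".
sql = """
    update safe_vid_info
    set campaign_info = jsonb_set(
            jsonb_set(campaign_info, '{876,processed}', '"completed"'),
            '{876,modes}', %s::jsonb)
    where cid = %s;
"""

with conn, conn.cursor() as cur:
    cur.execute(sql, ('["obj1", "obj3"]', 'kiywgh'))  # placeholder modes value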

How to get API usage data from Azure API Management?

We are trying to collect some reports about how our API is used by customers. We use Azure API Management, and I can see that such data exists in the API Management portal: going to the Admin > Activity section shows what I need to know, like how many calls an individual user made for a particular API, and I can filter it by date.
Question: how do I get this data out of the system, preferably using some API for a continuous export? But even manually?
The API to get request level analytics is
GET https://management.azure.com/subscriptions/subid/resourceGroups/rg1/providers/Microsoft.ApiManagement/service/apimService1/reports/byRequest?$filter=timestamp ge datetime'2017-06-01T00:00:00' and timestamp le datetime'2017-06-04T00:00:00'&api-version=2017-03-01
The response includes the ApiId, OperationId, UserId, the user's subscriptionId to the Product, etc., which might be beneficial to you.
{
"value": [
{
"apiId": "/apis/5931a75ae4bbd512a88c680b",
"operationId": "/apis/5931a75ae4bbd512a88c680b/operations/-",
"productId": "/products/-",
"userId": "/users/1",
"method": "GET",
"url": "https://apimService1.azure-api.net/echo/resource?param1=sample",
"ipAddress": "207.xx.155.xx",
"responseCode": 404,
"responseSize": 405,
"timestamp": "2017-06-03T00:17:00.1649134Z",
"cache": "none",
"apiTime": 221.1544,
"serviceTime": 0,
"apiRegion": "East Asia",
"subscriptionId": "/subscriptions/5600b59475ff190048070002",
"requestId": "63e7119c-26aa-433c-96d7-f6f3267ff52f",
"requestSize": 0
}]
}
Check out Reports_ByRequest.
Also check the Azure Monitor integration.
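A minimal Python sketch of calling the byRequest endpoint with the requests library; the subscription, resource group, and service name are the placeholders from the request above, and acquiring the Azure AD bearer token (e.g. via az account get-access-token) is assumed to happen elsewhere:
import requests

token = "<bearer-token>"  # assumed to be obtained elsewhere

url = ("https://management.azure.com/subscriptions/subid/resourceGroups/rg1"
       "/providers/Microsoft.ApiManagement/service/apimService1/reports/byRequest")
params = {
    "$filter": "timestamp ge datetime'2017-06-01T00:00:00' "
               "and timestamp le datetime'2017-06-04T00:00:00'",
    "api-version": "2017-03-01",
}

resp = requests.get(url, params=params,
                    headers={"Authorization": "Bearer " + token})
resp.raise_for_status()
for row in resp.json()["value"]:
    print(row["userId"], row["method"], row["url"], row["responseCode"])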
For those who are looking for aggregate usage by user (perhaps for monetization) - there is a "byUser" endpoint as well. The request is structured like below:
https://{api-service-name}.management.azure-api.net/subscriptions/{subscription}/resourceGroups/{resource-group}/providers/Microsoft.ApiManagement/service/{api-service-name}/reports/byUser?$filter=timestamp ge datetime'2019-12-01T00:00:00' and timestamp le datetime'2019-12-04T00:00:00'&api-version=2017-03-01
The documentation says to make a request to "https://management.azure-api.net/[...]" but I had to prepend the resource name like in the request above.
And the response:
{
"value": [
{
"name": "Administrator",
"userId": "/users/1",
"callCountSuccess": 13,
"callCountBlocked": 1,
"callCountFailed": 0,
"callCountOther": 0,
"callCountTotal": 14,
"bandwidth": 11019,
"cacheHitCount": 0,
"cacheMissCount": 0,
"apiTimeAvg": 1015.7607923076923,
"apiTimeMin": 330.3206,
"apiTimeMax": 1819.2173,
"serviceTimeAvg": 957.094776923077,
"serviceTimeMin": 215.24,
"serviceTimeMax": 1697.3612
},
{
"name": "Samir Solanki",
"userId": "/users/56eaec62baf08b06e46d27fd",
"callCountSuccess": 0,
"callCountBlocked": 0,
"callCountFailed": 0,
"callCountOther": 0,
"callCountTotal": 0,
"bandwidth": 0,
"cacheHitCount": 0,
"cacheMissCount": 0,
"apiTimeAvg": 0,
"apiTimeMin": 0,
"apiTimeMax": 0,
"serviceTimeAvg": 0,
"serviceTimeMin": 0,
"serviceTimeMax": 0
},
{
"name": "Anonymous",
"userId": "/users/54c800b332965a0035030000",
"callCountSuccess": 0,
"callCountBlocked": 0,
"callCountFailed": 0,
"callCountOther": 0,
"callCountTotal": 0,
"bandwidth": 0,
"cacheHitCount": 0,
"cacheMissCount": 0,
"apiTimeAvg": 0,
"apiTimeMin": 0,
"apiTimeMax": 0,
"serviceTimeAvg": 0,
"serviceTimeMin": 0,
"serviceTimeMax": 0
}
],
"count": 3,
"nextLink": ""
}
If you need to filter by type of request or API, you can do that as well - List by User