I have a table with this schema:
I'm trying to upload some data from Google Cloud Storage using the Python client. The file is newline-delimited JSON. Most of my lines don't have the field "passenger_origin.accuracy", but when the field is present I get the following error:
Error while reading data, error message: JSON parsing error in row starting at position 2122510: No such field: driver_origin.accuracy. (error code: invalid)
Error while reading data, error message: JSON parsing error in row starting at position 2126317: No such field: passenger_origin.accuracy. (error code: invalid)
Example of an invalid row:
{
"id": 1479443,
"is_obsolete": 0,
"seat_count": 1,
"is_ticket_checked": 0,
"score": 0.3709318902,
"is_multimodal": 0,
"fake_paths": 0,
"passenger_origin": {
"id": 2204,
"poi_uuid": "15b4e52c-7c58-442c-98df-1eb06079f6bb",
"user_id": 1987,
"accuracy": 250.0,
"disabled": 0,
"last_update": "2017-03-10T15:15:39",
"created": "2016-02-05T17:06:26",
"modified_by_user": 1,
"is_recurrent": 0,
"source": 1,
"hidden_by_user": 0,
"kind": 2,
},
"driver_origin": {
"id": 412491,
"poi_uuid": "47e90b6d-e178-4e02-9f02-f4ea5f8beaa1",
"user_id": 71471,
"disabled": 0,
"last_update": "2017-11-02T10:09:09",
"created": "2017-11-02T10:09:09",
"modified_by_user": 0,
"is_recurrent": 0,
"source": 1,
"hidden_by_user": 0,
"kind": 2,
},
"passenger_destination": {
"id": 2203,
"poi_uuid": "c531c3ca-47f0-4003-8098-1272fee8d018",
"user_id": 1987,
"accuracy": 250.0,
"disabled": 0,
"last_update": "2017-03-10T15:12:42",
"created": "2016-02-05T17:06:19",
"modified_by_user": 1,
"is_recurrent": 0,
"source": 1,
"hidden_by_user": 0,
"kind": 1,
}
}
The table is created before the data is uploaded and has not been modified since. I don't understand why the upload is failing on these fields. Do the RECORD fields have to be REPEATED?
To ignore the fields that aren't present in the schema, use a combination of:
configuration.load.ignoreUnknownValues
configuration.load.maxBadRecords
Setting the first to true and the second to some arbitrarily high number, e.g. 100000, will enable the load to succeed even if there are extra fields.
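With the Python client, these options map to LoadJobConfig properties. A minimal sketch, assuming a GCS URI and an existing destination table (the bucket, file, and table names are illustrative):
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    ignore_unknown_values=True,  # configuration.load.ignoreUnknownValues
    max_bad_records=100000,      # configuration.load.maxBadRecords
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/rides.json",    # illustrative GCS URI
    "my-project.my_dataset.rides",  # illustrative destination table
    job_config=job_config,
)
load_job.result()  # wait for the load to finish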
The problem was that configuration.load.autodetect was set to True. I set it to False and the problem was fixed.
In Azure Data Factory, I have a copy activity. The data source is the response body from a REST API POST request.
The sink is a SQL table. The problem is that, even though my JSON data contains multiple rows, only the first row is getting copied.
The source data looks like the following:
{
"offset": 0,
"limit": 1000,
"total": 65,
"loaded": 34,
"unloaded": 31,
"cubeCaches": [
{
"id": "MxMUVDN0Q1MzAk5MDg6RDkxREQxMUU5RDBDNzR2NMTk6YWNsZGxwMTJtc3QuY2952aXppZW50aW5==",
"projectId": "15D91DD11E9D0C74B3319",
"source": {
"name": "12302021",
"id": "07EF95111EC7F954158",
"type": "cube"
},
"state": {
"active": true,
"dirty": false,
"infoDirty": false,
"persisted": true,
"processing": false,
"loadedState": "loaded"
},
"lastUpdateTime": "2022-01-24T14:22:30Z",
"lastHitTime": "2022-02-14T20:02:02Z",
"hitCount": 1,
"size": 798720,
"creatorId": "D4E8BFD56085",
"lastUpdateJob": 18937,
"openViewCount": 0,
"creationTime": "2022-01-24T15:07:24Z",
"historicHitCount": 22,
"dataLanguages": [],
"rowCount": 2726,
"columnCount": 9
},
{
"id": "UYwMTIxMUFNjkxMUU5RDBDMTRCNkMwMDgwRUYzNUQ0MUI6YWNsZjLmNvbQ==",
"projectId": "120D0C1480EF35D41B",
"source": {
"name": "All Clients (YTD)",
"id": "49E5B13466251CD0B54E8F",
"type": "cube"
},
"state": {
"active": true,
"dirty": false,
"infoDirty": false,
"persisted": true,
"processing": false,
"loadedState": "loaded"
},
"lastUpdateTime": "2022-01-03T01:00:01Z",
"hitCount": 0,
"size": 82488152,
"creatorId": "1E2AFB011E80EF35FF14",
"lastUpdateJob": 364091,
"openViewCount": 0,
"creationTime": "2022-02-14T01:04:55Z",
"historicHitCount": 0,
"dataLanguages": [],
"rowCount": 8146903,
"columnCount": 13
}
]
}
I want to add a row in the Sink table (SQL) for every "id" in the JSON. However, when I run the activity, only the first record gets copied. It's mapped correctly, but I want it to copy all rows in the JSON, not just 1.
My Mapping tab in Azure Data Factory looks like this:
What am I doing wrong here? I'm thinking there is something wrong with my "Source" syntax for each of the columns...
In $cubeCaches[0][...] you're explicitly mapping the first element of this array into columns, which is why only one row lands in the Sink.
I don't know a way to achieve what you intend with the Copy activity alone. I would use a Mapping Data Flow here, and inside it flatten your data (with a Flatten transformation) to turn the array of objects into individual rows.
Then, from this flattened stream, you could use a Derived Column to map the JSON fields into the columns of your target, a Select to remove the unwanted original fields, and a Sink to write into your target location.
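To see what the Flatten transformation produces, here is a rough Python sketch of the same reshaping, one output row per element of cubeCaches (the field selection is illustrative):
import json

# a trimmed version of the REST response shown above
response_body = """
{"cubeCaches": [
  {"id": "MxMU...", "projectId": "15D91DD11E9D0C74B3319",
   "source": {"name": "12302021"}, "rowCount": 2726},
  {"id": "UYwM...", "projectId": "120D0C1480EF35D41B",
   "source": {"name": "All Clients (YTD)"}, "rowCount": 8146903}
]}
"""

payload = json.loads(response_body)

# one output row per array element -- this is what Flatten does
rows = [
    {
        "id": c["id"],
        "projectId": c["projectId"],
        "sourceName": c["source"]["name"],
        "rowCount": c["rowCount"],
    }
    for c in payload["cubeCaches"]
]
print(rows)  # two rows, one per cache entry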
I have a database that has a column with a long string and I'm looking for a way to extract just a certain portion of it.
Here is a sample:
{
"vendorId": 53,
"externalRef": "38828059 $567.82",
"lines": [{
"amount": 0,
"lineType": "PURCHASE",
"lineItemType": "INVENTORY",
"inventory": {
"cost": 0,
"quantity": 1,
"row": "6",
"seatType": "CONSECUTIVE",
"section": "102",
"notes": "http://testurl/0F005B52CE7F5892 38828059 $567.82 ,special",
"splitType": "ANY",
"stockType": "ELECTRONIC",
"listPrice": 0,
"publicNotes": " https://brokers.123.com/wholesale/event/146489908 https://www.123.com/buy-event/4897564 ",
"eventId": 3757669,
"eventMapping": {
"eventDate": "",
"eventName": "Brandi Carlile: Beyond These Silent Days Tour",
"venueName": "Gorge Amphitheatre"
},
"tickets": [{
"seatNumber": 1527
}]
}
}]
}
What I'm looking to extract is just http://testurl/0F005B52CE7F5892
Would someone be able to help me with the syntax for a query that adds a computed column containing just this extracted value for each row?
I use SQL Server 2008, so some newer functions won't work for me.
Upgrade your SQL Server to a supported version.
But till then, we pity those who dare to face the horror of handling JSON with only the old string functions.
select
    [notes_url] =
        CASE
            WHEN [json_column] LIKE '%"notes": "http%'
            THEN substring([json_column],
                     -- '"notes": "' is 10 characters long, so +10 lands on the 'h' of http
                     patindex('%"notes": "http%', [json_column]) + 10,
                     -- the URL ends at the first space after it; starting the search
                     -- at +15 skips safely past the opening of the URL
                     charindex(' ', [json_column],
                         patindex('%"notes": "http%', [json_column]) + 15)
                     - patindex('%"notes": "http%', [json_column]) - 10)
        END
from [YourTable];
db<>fiddle here
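If you can post-process the rows outside the database instead, the same extraction is simpler with a regular expression. A rough Python sketch (the sample string is taken from the question; the pattern mirrors the SQL above by treating a space as the URL terminator):
import re

# capture an http(s) URL that directly follows a "notes" key
notes_pattern = re.compile(r'"notes":\s*"(https?://[^\s"]+)')

sample = '"notes": "http://testurl/0F005B52CE7F5892 38828059 $567.82 ,special"'
match = notes_pattern.search(sample)
if match:
    print(match.group(1))  # -> http://testurl/0F005B52CE7F5892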
I need to merge two JSON values in Data Factory: one comes dynamically from an activity output and one from a pipeline variable.
I am doing it like this:
#union(activity('Get Order Events Data').output, json('{"orig_orderID" : "variables('orderid')"}'))
But it is showing an error:
Missing comma between arguments
What am I doing wrong here?
But it is showing error. Missing comma between arguments
The expression variables('orderid') contains single quotes, which split your outer string literal.
You should use the concat() function: #union(activity('Get Order Events Data').output, json(concat('{"orig_orderID" :',variables('orderid'),'}'))). But this expression can't produce your expected result, because the new key wouldn't be added inside your data array. It would come out like this:
{
"data": [
{
"id": 145,
"order_id": 256,
"created_at": "2021-06-20T11:48:20Z",
"type": 10,
"sender": -1,
"message": null,
"previous_status": 4,
"fas_user_id": null,
"event_data": "5",
"shopkeeper_timestamp": null,
"store_id": 123
}
],
"orig_orderID": "860"
}
You can try the following expression: #union(activity('Get Order Events Data').output.data[0], json(concat('{"orig_orderID" :',variables('orderid'),'}')))
It gives this result:
{
"id": 145,
"order_id": 256,
"created_at": "2021-06-20T11:48:20Z",
"type": 10,
"sender": -1,
"message": null,
"previous_status": 4,
"fas_user_id": null,
"event_data": "5",
"shopkeeper_timestamp": null,
"store_id": 123,
"orig_orderID": "860"
}
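For intuition, union() over two JSON objects behaves like a shallow dictionary merge. A rough Python analogue of the two expressions above, with values taken from the sample output:
activity_output = {"data": [{"id": 145, "order_id": 256}]}
extra = {"orig_orderID": "860"}

# union at the top level: the key lands next to "data"
print({**activity_output, **extra})
# {'data': [{'id': 145, 'order_id': 256}], 'orig_orderID': '860'}

# union with the first element of "data": the key lands inside the record
print({**activity_output["data"][0], **extra})
# {'id': 145, 'order_id': 256, 'orig_orderID': '860'}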
One of my columns is jsonb and holds values in the following format. The value of a single row of the column is below.
{
"835": {
"cost": 0,
"name": "FACEBOOK_FB1_6JAN2020",
"email": "test.user#silverpush.co",
"views": 0,
"clicks": 0,
"impressions": 0,
"campaign_state": "paused",
"processed":"in_progress",
"modes":["obj1","obj2"]
},
"876": {
"cost": 0,
"name": "MARVEL_BLACK_WIDOW_4DEC2019",
"email": "test.user#silverpush.co",
"views": 0,
"clicks": 0,
"impressions": 0,
"campaign_state": "paused",
"processed":"in_progress",
"modes":["obj1","obj2"]
}
}
I want to update the inner keys "processed" and "modes" in the campaign_info column, under the campaign id "876".
I have tried this query:
update safe_vid_info
set campaign_info -> '835' --> 'processed'='completed'
where cid = 'kiywgh';
But it didn't work.
Any help is appreciated. Thanks.
Is this what you want?
jsonb_set(campaign_info, '{876,processed}', '"completed"')
This updates the value at path "876" > "processed" with value 'completed'.
In your update query:
update safe_vid_info
set campaign_info = jsonb_set(campaign_info, '{876,processed}', '"completed"')
where cid = 'kiywgh';
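The question also asks about the "modes" key. One option is to nest two jsonb_set calls so both keys are updated in a single statement; a sketch using psycopg2 (the connection string and replacement value are illustrative):
import json
import psycopg2

conn = psycopg2.connect("dbname=mydb")  # illustrative connection string

with conn, conn.cursor() as cur:
    cur.execute(
        """
        update safe_vid_info
        set campaign_info = jsonb_set(
                jsonb_set(campaign_info, '{876,processed}', '"completed"'),
                '{876,modes}',
                %s::jsonb)
        where cid = %s
        """,
        (json.dumps(["obj1", "obj3"]), "kiywgh"),
    )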
We are trying to collect some reports about how our API is used by customers. We use Azure API Management, and I can see in the API Management portal that the data I need exists: under Admin > Activity I can see how many calls an individual user made to a particular API, and I can filter by date.
Question: how do I get this data out of the system? Preferably via some API, for continuous export, but even manually would help.
The API to get request level analytics is
GET https://management.azure.com/subscriptions/subid/resourceGroups/rg1/providers/Microsoft.ApiManagement/service/apimService1/reports/byRequest?$filter=timestamp ge datetime'2017-06-01T00:00:00' and timestamp le datetime'2017-06-04T00:00:00'&api-version=2017-03-01
The response includes the ApiId, OperationId, UserId, the user's subscriptionId to the Product, etc., which might be useful to you.
{
"value": [
{
"apiId": "/apis/5931a75ae4bbd512a88c680b",
"operationId": "/apis/5931a75ae4bbd512a88c680b/operations/-",
"productId": "/products/-",
"userId": "/users/1",
"method": "GET",
"url": "https://apimService1.azure-api.net/echo/resource?param1=sample",
"ipAddress": "207.xx.155.xx",
"responseCode": 404,
"responseSize": 405,
"timestamp": "2017-06-03T00:17:00.1649134Z",
"cache": "none",
"apiTime": 221.1544,
"serviceTime": 0,
"apiRegion": "East Asia",
"subscriptionId": "/subscriptions/5600b59475ff190048070002",
"requestId": "63e7119c-26aa-433c-96d7-f6f3267ff52f",
"requestSize": 0
}]
}
Check out Reports_ByRequest.
Also, check the Azure Monitor integration.
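For continuous export you can script the same call. A minimal sketch using Python's requests library (the subscription, resource group, service name, and bearer token are placeholders you would supply):
import requests

url = (
    "https://management.azure.com/subscriptions/subid/resourceGroups/rg1"
    "/providers/Microsoft.ApiManagement/service/apimService1/reports/byRequest"
)
params = {
    "$filter": "timestamp ge datetime'2017-06-01T00:00:00' "
               "and timestamp le datetime'2017-06-04T00:00:00'",
    "api-version": "2017-03-01",
}
headers = {"Authorization": "Bearer <access-token>"}  # AAD token placeholder

resp = requests.get(url, params=params, headers=headers)
resp.raise_for_status()
for row in resp.json()["value"]:
    print(row["userId"], row["apiId"], row["responseCode"])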
For those who are looking for aggregate usage by user (perhaps for monetization), there is a "byUser" endpoint as well. The request is structured as below:
https://{api-service-name}.management.azure-api.net/subscriptions/{subscription}/resourceGroups/{resource-group}/providers/Microsoft.ApiManagement/service/{api-service-name}/reports/byUser?$filter=timestamp ge datetime'2019-12-01T00:00:00' and timestamp le datetime'2019-12-04T00:00:00'&api-version=2017-03-01
The documentation says to make the request to "https://management.azure-api.net/[...]", but I had to prepend the resource name as in the request above.
And the response:
{
"value": [
{
"name": "Administrator",
"userId": "/users/1",
"callCountSuccess": 13,
"callCountBlocked": 1,
"callCountFailed": 0,
"callCountOther": 0,
"callCountTotal": 14,
"bandwidth": 11019,
"cacheHitCount": 0,
"cacheMissCount": 0,
"apiTimeAvg": 1015.7607923076923,
"apiTimeMin": 330.3206,
"apiTimeMax": 1819.2173,
"serviceTimeAvg": 957.094776923077,
"serviceTimeMin": 215.24,
"serviceTimeMax": 1697.3612
},
{
"name": "Samir Solanki",
"userId": "/users/56eaec62baf08b06e46d27fd",
"callCountSuccess": 0,
"callCountBlocked": 0,
"callCountFailed": 0,
"callCountOther": 0,
"callCountTotal": 0,
"bandwidth": 0,
"cacheHitCount": 0,
"cacheMissCount": 0,
"apiTimeAvg": 0,
"apiTimeMin": 0,
"apiTimeMax": 0,
"serviceTimeAvg": 0,
"serviceTimeMin": 0,
"serviceTimeMax": 0
},
{
"name": "Anonymous",
"userId": "/users/54c800b332965a0035030000",
"callCountSuccess": 0,
"callCountBlocked": 0,
"callCountFailed": 0,
"callCountOther": 0,
"callCountTotal": 0,
"bandwidth": 0,
"cacheHitCount": 0,
"cacheMissCount": 0,
"apiTimeAvg": 0,
"apiTimeMin": 0,
"apiTimeMax": 0,
"serviceTimeAvg": 0,
"serviceTimeMin": 0,
"serviceTimeMax": 0
}
],
"count": 3,
"nextLink": ""
}
If you need to filter by type of request or API, you can do that as well - List by User