How to save only specific JSON elements in a new OpenRefine column

{
"business_id": "SQ0j7bgSTazkVQlF5AnqyQ",
"full_address": "214 E Main St\nCarnegie\nCarnegie, PA 15106",
"hours": {},
"open": true,
** "categories": ["Chinese", "Restaurants"] ** ,
"city": "Carnegie",
"review_count": 9,
"name": "Don Don Chinese Restaurant",
"neighborhoods": ["Carnegie"],
"longitude": -80.0849615,
"state": "PA",
"stars": 2.5,
"latitude": 40.4083473,
"attributes": {
"Take-out": true,
"Alcohol": "none",
"Noise Level": "quiet",
"Parking": {
"garage": false,
"street": false,
"validated": false,
"lot": false,
"valet": false
},
"Delivery": true,
"Has TV": true,
"Outdoor Seating": false,
"Attire": "casual",
"Waiter Service": false,
"Accepts Credit Cards": true,
"Good for Kids": true,
"Good For Groups": false,
"Price Range": 1
},
"type": "business"
}
value.parseJson()['categories'] will create a new column called 'categories' in OpenRefine, but is it possible to filter and keep 'chinese' as the only value and remove any other values?

In the example above, the GREL expression:
value.parseJson()['categories']
results in an array containing two values:
["Chinese", "Restaurants"]
You can manipulate this with GREL expressions that act on arrays. For example, to select the first value in the array you could use:
value.parseJson()['categories'][0]
This selects the first entry in the array (increase the number in the square brackets at the end of the expression to select other entries).
If you want to filter on a specific value in the array you could use the 'filter' expression:
filter(value.parseJson()['categories'],v,v=="Chinese")
This would result in a new array with only the word "Chinese" in it (in the above example). To store this in the new column, you need to convert the array into a string:
filter(value.parseJson()['categories'],v,v=="Chinese").join("")
To avoid issues with case-sensitivity, and the possibility of 'Chinese' appearing more than once in the 'categories' array, I'd convert the values to lowercase first and de-duplicate the array before converting to a string - so you end up with:
filter(forEach(value.parseJson()["categories"],v,v.toLowercase()),w,w=="chinese").uniques().join("")

Related

Azure Data Factory JSON syntax

In Azure Data Factory, I have a copy activity. The data source is the response body from a REST API POST request.
The sink is a SQL table. The problem is that, even though my JSON data contains multiple rows, only the first row is getting copied.
The source data looks like the following:
{
"offset": 0,
"limit": 1000,
"total": 65,
"loaded": 34,
"unloaded": 31,
"cubeCaches": [
{
"id": "MxMUVDN0Q1MzAk5MDg6RDkxREQxMUU5RDBDNzR2NMTk6YWNsZGxwMTJtc3QuY2952aXppZW50aW5==",
"projectId": "15D91DD11E9D0C74B3319",
"source": {
"name": "12302021",
"id": "07EF95111EC7F954158",
"type": "cube"
},
"state": {
"active": true,
"dirty": false,
"infoDirty": false,
"persisted": true,
"processing": false,
"loadedState": "loaded"
},
"lastUpdateTime": "2022-01-24T14:22:30Z",
"lastHitTime": "2022-02-14T20:02:02Z",
"hitCount": 1,
"size": 798720,
"creatorId": "D4E8BFD56085",
"lastUpdateJob": 18937,
"openViewCount": 0,
"creationTime": "2022-01-24T15:07:24Z",
"historicHitCount": 22,
"dataLanguages": [],
"rowCount": 2726,
"columnCount": 9
},
{
"id": "UYwMTIxMUFNjkxMUU5RDBDMTRCNkMwMDgwRUYzNUQ0MUI6YWNsZjLmNvbQ==",
"projectId": "120D0C1480EF35D41B",
"source": {
"name": "All Clients (YTD)",
"id": "49E5B13466251CD0B54E8F",
"type": "cube"
},
"state": {
"active": true,
"dirty": false,
"infoDirty": false,
"persisted": true,
"processing": false,
"loadedState": "loaded"
},
"lastUpdateTime": "2022-01-03T01:00:01Z",
"hitCount": 0,
"size": 82488152,
"creatorId": "1E2AFB011E80EF35FF14",
"lastUpdateJob": 364091,
"openViewCount": 0,
"creationTime": "2022-02-14T01:04:55Z",
"historicHitCount": 0,
"dataLanguages": [],
"rowCount": 8146903,
"columnCount": 13
}
]
}
I want to add a row in the Sink table (SQL) for every "id" in the JSON. However, when I run the activity, only the first record gets copied. It's mapped correctly, but I want it to copy all rows in the JSON, not just 1.
My Mapping tab in Azure Data Factory looks like this:
What am I doing wrong here? I'm thinking there is something wrong with my "Source" syntax for each of the columns...
With $cubeCaches[0][...] you're explicitly mapping the first element of that array into columns, and that's why only one row lands in the Sink.
I don't know a way to achieve what you intend with the Copy activity alone. I would use a Mapping Data Flow here, and inside it I would flatten your data (Flatten transformation) to get at the array of objects.
Then, from this flattened stream, you could use a Derived Column to map the JSON fields to the columns of your target, a Select to remove the unwanted original fields, and a Sink to write to your target location.

Looping JSON Array in JSONB field

I want to loop over a JSONB column and pick out certain values (price, discount_price, and currency) from the JSON objects that match my filter. But I get this error:
syntax error at or near "FOR"
Value of the parts column which is JSONB:
[
{
"item_tags": ["black", "optional"],
"name": "Keyboard",
"price": 50,
"currency": "USD",
"discount_price": 40
},
{
"item_tags": ["white", "optional"],
"name": "Mouse",
"price": 40,
"currency": "USD",
"discount_price": 30
}
]
My query ($1 is the user input. Can be 'optional' or 'required'):
SELECT
id,
title,
FOR element IN SELECT * FROM jsonb_array_elements(parts)
LOOP
CASE
WHEN element->'item_tags' #> $1
THEN SELECT element->>'discount_price' AS price, element->>'currency' AS currency
ELSE SELECT element->>'price' AS price, element->>'currency' AS currency
END
END LOOP
FROM items;
This is the output I want to get if $1 is equal to 'optional':
{
"id": 1,
"title": "example title",
"parts": [
{
"name": "Keyboard",
"discount_price": 40,
"currency": "USD"
},
{
"name": "Mouse",
"discount_price": 30,
"currency": "USD"
}
]
}
Any help is highly appreciated. I followed the official docs, but they are not beginner-friendly. I am using PostgreSQL 13.
You need to unnest the array, filter out the unwanted parts, remove the unwanted key, then aggregate the changed parts back into a JSON array.
This can be done using a scalar sub-query:
select id, title,
(select jsonb_agg(x.part - 'item_tags')
from jsonb_array_elements(i.parts) as x(part)
where (x.part -> 'item_tags') ? 'optional')
from items i;
The expression x.part - 'item_tags' removes the item_tags key from the JSON object. The ? operator tests if the item_tags array contains the string on the right hand side. And jsonb_agg() then aggregates those JSON values back into an array.
You can pass your parameter in the place of the 'optional' string.
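For example, with the parameter in place and the aggregated column aliased to match the desired output key (the alias parts is an addition here), the full statement might look like:
select id,
       title,
       (select jsonb_agg(x.part - 'item_tags')
        from jsonb_array_elements(i.parts) as x(part)
        where (x.part -> 'item_tags') ? $1) as parts
from items i;
If you also want to drop the plain price key from the discounted items (as in the desired output), the - operator can be chained: x.part - 'item_tags' - 'price'.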

JSON Parsing in Snowflake - Square Brackets At Start

I'm trying to parse out some JSON files in Snowflake. In this case, I'd like to extract the "gift card" from the line that has "fulfillment_service": "gift_card". I've had success querying one-dimensional JSON data, but this - with the square brackets - is confounding me.
Here's my simple query - I've created a small table called "TEST_WEEK"
select line_items:fulfillment_service
from TEST_WEEK
, lateral flatten(FULFILLMENTS:line_items) line_items;
Hopefully this isn't too basic a question. I'm very new with parsing JSON.
Thanks in advance!
Here's the start of the FULFILLMENTS field with the info I want to get at.
[
{
"admin_graphql_api_id": "gid://shopify/Fulfillment/2191015870515",
"created_at": "2020-08-10T14:54:38Z",
"id": 2191015870515,
"line_items": [
{
"admin_graphql_api_id": "gid://shopify/LineItem/5050604355635",
"discount_allocations": [],
"fulfillable_quantity": 0,
"fulfillment_service": "gift_card",
"fulfillment_status": "fulfilled",
"gift_card": true,
"grams": 0,
"id": 5050604355635,
"name": "Gift Card - $100.00",
"origin_location": {
"address1": "100 Indian Road",
"address2": "",
"city": "Toronto",
"country_code": "CA",
Maybe you can use two lateral flattens to process the values in the line_items array:
Sample table:
create table TEST_WEEK( FULFILLMENTS variant ) as
select parse_json(
'[
{
"admin_graphql_api_id": "gid://shopify/Fulfillment/2191015870515",
"created_at": "2020-08-10T14:54:38Z",
"id": 2191015870515,
"line_items": [
{
"admin_graphql_api_id": "gid://shopify/LineItem/5050604355635",
"discount_allocations": [],
"fulfillable_quantity": 0,
"fulfillment_service": "gift_card",
"fulfillment_status": "fulfilled",
"gift_card": true,
"grams": 0,
"id": 5050604355635,
"name": "Gift Card - $100.00",
"origin_location": {
"address1": "100 Indian Road",
"address2": "",
"city": "Toronto",
"country_code": "CA"
}
}
]
}
]');
Sample query:
select s.VALUE:fulfillment_service
from TEST_WEEK,
lateral flatten( FULFILLMENTS ) f,
lateral flatten( f.VALUE:line_items ) s;
The output:
+-----------------------------+
| S.VALUE:FULFILLMENT_SERVICE |
+-----------------------------+
| "gift_card" |
+-----------------------------+
Those square brackets indicate that you have an array of JSON objects in your FULFILLMENTS field. Unless there is a real need to have an array of objects in one field, you should have a look at the STRIP_OUTER_ARRAY option of the COPY command. An example can be found in the Snowflake documentation:
copy into <table>
from @~/<file>.json
file_format = (type = 'JSON' strip_outer_array = true);
In case others are stuck with the same data issue (all JSON data in one array), I have this solution:
select f.VALUE:fulfillment_service::string
from TEST_WEEK,
lateral flatten( FULFILLMENTS[0].line_items ) f;
With this, you just grab the first element of the array (which is the only element).
If you have nested array elements, just add this to the lateral flatten:
, RECURSIVE => TRUE, mode => 'array'
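Combined with the query above, that might look like the following sketch (using named arguments for flatten; adjust the input expression to whichever level you need to expand):
select f.VALUE:fulfillment_service::string
from TEST_WEEK,
lateral flatten( input => FULFILLMENTS[0].line_items, recursive => TRUE, mode => 'array' ) f;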

How to query a json string in Postgres

I want to get a specific string out of a column. How can I do that?
Here is the JSON. The column is named json and the table is named my_table.
I want to fetch "extensionAttribute.simSerial": "310240000029929".
{
"name": "urn:imei:930000001801583",
"type": "DEVICE",
"sourceId": "P-n1000USCqT4",
"consumers": "CDM",
"crudStatus": {
"status": "SUCCESS",
"operation": "UPDATE"
},
"targetName": "urn:imei:930000001801583",
"deviceTypeId": "dgs11b74714f7020ctoogu3zfugc6",
"createdUserId": "a8aacc5d978d494eb54ae4243e714646",
"onboardStatus": "DONE",
"consrStatus": "",
"deviceTypeName": "performance_device_type_2",
"lwm2mPskSecret": "49443",
"createdUserName": "manager",
"bootstrapRequest": true,
"lwm2mPskIdentity": "urn:imei:930000001801583",
"boostrapPskSecret": "49443",
"deviceTypeVersion": 1,
"lastUpdatedUserId": "",
"lwm2mSecurityMode": "psk",
"consumersForUpdate": "",
"bootstrapPskIdentity": "urn:imei:930000001801583",
"pureCoapSecurityMode": "NONE",
"bootstrapSecurityMode": "psk",
"extensionAttribute.imsi": "310240000029929",
"extensionAttribute.msisdn": "310240000029929",
"extensionAttribute.simSerial": "310240000029929",
"extensionAttribute.msisdnStatus": "active"
}
Solution
You can do it as follows:
select column_name->>'extensionAttribute.simSerial' as simSerial from my_table;
where column_name is the name of your JSON column.
If your JSON is nested more deeply, you can chain the operators:
column_name->'key_in_json'->'another_key_in_json'->>'last_key_in_json'
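Applied to the question's table (assuming the column is literally named json and is of type json or jsonb; if it is stored as text it needs a cast first), that would be something like:
select "json" ->> 'extensionAttribute.simSerial' as sim_serial
from my_table;

-- if the column is text rather than json/jsonb, cast it before extracting:
select ("json"::jsonb) ->> 'extensionAttribute.simSerial' as sim_serial
from my_table;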
Manual
JSON Functions and Operators
PostgreSQL JSON Tutorial

Append to Specific Array Index from another table dynamically SQL Server

I have a json string that I need to modify
{"RecordCount":3,"Top":10,"Skip":0,"SelectedSort":"Seed asc","value":[{"AccountProductListId":22091612871138,"Name":"April 4th 2018","AccountId":256813438078643,"IsPublic":false,"Comment":"Test order sheet","Quantity":3},{"AccountProductListId":166305848801939,"Name":"test","AccountId":256813438078643,"IsPublic":false,"Comment":"","Quantity":1},{"AccountProductListId":21177711287586,"Name":"Test Order sheet","AccountId":256813438078643,"IsPublic":true,"Comment":"the very first sheet","Quantity":2}]}
Inside value the array looks like this:
"value": [{
"AccountProductListId": 22091612871138,
"Name": "April 4th 2018",
"IsPublic": false,
"Comment": "Test order sheet",
"Quantity": 3
}, {
"AccountProductListId": 166305848801939,
"Name": "test",
"IsPublic": false,
"Comment": "",
"Quantity": 1
}, {
"AccountProductListId": 21177711287586,
"Name": "Test Order sheet",
"IsPublic": true,
"Comment": "the very first sheet",
"Quantity": 2
}],
What I need to do is append some data from another table:
AccountProductListId ProductID
21177711287586 97096131867163|32721319938943
22091612871138 97096131867163|145461009584740|130005306921282
166305848801939 8744071222157
As you can see, the AccountProductListId is already in the JSON result, so I know which array element each ProductID should go to. The only problem is I don't know the syntax to merge the ProductID data into its specific array element. The JSON array could have more than 3 items.
Essentially ending up with something like this:
"value": [{
"AccountProductListId": 22091612871138,
"Name": "April 4th 2018",
"IsPublic": false,
"Comment": "Test order sheet",
"Quantity": 3,
"ProductID": "97096131867163|145461009584740|130005306921282"
}, {
"AccountProductListId": 166305848801939,
"Name": "test",
"IsPublic": false,
"Comment": "",
"Quantity": 1,
"ProductID": "8744071222157"
}, {
"AccountProductListId": 21177711287586,
"Name": "Test Order sheet",
"IsPublic": true,
"Comment": "the very first sheet",
"Quantity": 2,
"ProductID": "97096131867163|32721319938943"
}],
Any information would be greatly appreciated. Thanks.
Process with SQL Server
Prior to SQL Server 2016, there is no built-in support to read or write JSON.
Starting with SQL Server 2016, you can use the OPENJSON rowset function to read JSON and the FOR JSON clause to write JSON.
See https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server
The approach would be to use OPENJSON to read the JSON string as a rowset, join it with the table to pick up ProductID, and use FOR JSON to convert back to JSON.
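A minimal sketch of that approach, assuming the original JSON is in a variable @json and the mapping rows live in a table called dbo.ProductMapping (that table name and the column sizes in the WITH clause are assumptions):
declare @json nvarchar(max) = N'{"RecordCount":3,"Top":10,"Skip":0,"SelectedSort":"Seed asc","value":[{"AccountProductListId":22091612871138,"Name":"April 4th 2018","AccountId":256813438078643,"IsPublic":false,"Comment":"Test order sheet","Quantity":3},{"AccountProductListId":166305848801939,"Name":"test","AccountId":256813438078643,"IsPublic":false,"Comment":"","Quantity":1},{"AccountProductListId":21177711287586,"Name":"Test Order sheet","AccountId":256813438078643,"IsPublic":true,"Comment":"the very first sheet","Quantity":2}]}';

-- Read the "value" array as rows, join to the mapping table, and rebuild the array
-- with the extra ProductID property on each element.
select v.AccountProductListId,
       v.Name,
       v.IsPublic,
       v.Comment,
       v.Quantity,
       m.ProductID
from openjson(@json, '$.value')
     with (
         AccountProductListId bigint        '$.AccountProductListId',
         Name                 nvarchar(100) '$.Name',
         IsPublic             bit           '$.IsPublic',
         Comment              nvarchar(400) '$.Comment',
         Quantity             int           '$.Quantity'
     ) as v
left join dbo.ProductMapping as m
       on m.AccountProductListId = v.AccountProductListId
for json path, root('value');
Note that this rebuilds only the value array; the other top-level properties (RecordCount, Top, Skip, SelectedSort) would have to be added back, for example with JSON_MODIFY against the original string.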
Process outside SQL Server
Depending on your situation, it might be simpler to parse the JSON outside SQL Server. If going that route, you could:
1. Collect all the AccountProductListIds from the parsed JSON.
2. Send the collected ids to SQL Server via a stored procedure that takes a TVP input and returns the AccountProductListId -> ProductID mapping (the database side of this is sketched below).
3. Inject the ProductIDs into the JSON objects and serialize back to a string.
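A sketch of the database side of that round trip (the type name, procedure name, and dbo.ProductMapping table are hypothetical):
create type dbo.AccountProductListIdList as table
(
    AccountProductListId bigint primary key
);
go

-- Returns the AccountProductListId -> ProductID mapping for the supplied ids.
create procedure dbo.GetProductIdsForAccountProductLists
    @Ids dbo.AccountProductListIdList readonly
as
begin
    set nocount on;

    select m.AccountProductListId, m.ProductID
    from dbo.ProductMapping as m
    join @Ids as i
      on i.AccountProductListId = m.AccountProductListId;
end;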