SQL Server trim before and after specific values - sql

I have a database that has a column with a long string and I'm looking for a way to extract just a certain portion of it.
Here is a sample:
{
"vendorId": 53,
"externalRef": "38828059 $567.82",
"lines": [{
"amount": 0,
"lineType": "PURCHASE",
"lineItemType": "INVENTORY",
"inventory": {
"cost": 0,
"quantity": 1,
"row": "6",
"seatType": "CONSECUTIVE",
"section": "102",
"notes": "http://testurl/0F005B52CE7F5892 38828059 $567.82 ,special",
"splitType": "ANY",
"stockType": "ELECTRONIC",
"listPrice": 0,
"publicNotes": " https://brokers.123.com/wholesale/event/146489908 https://www.123.com/buy-event/4897564 ",
"eventId": 3757669,
"eventMapping": {
"eventDate": "",
"eventName": "Brandi Carlile: Beyond These Silent Days Tour",
"venueName": "Gorge Amphitheatre"
},
"tickets": [{
"seatNumber": 1527
}]
}
}]
}
What I'm looking to extract is just http://testurl/0F005B52CE7F5892
Would someone be able to assist me with the syntax how to call my query that it will make a new temp column and give me just this extracted value for each row in this column?
I user SQL Server 2008 so some newer functions won't work for me.

Upgrade your SQL Server to a supported version.
But till then, we pity those who dare to face the horror of handling JSON with only the old string functions.
select
[notes_url] =
CASE
WHEN [json_column] LIKE '%"notes": "http%'
THEN substring([json_column],
patindex('%"notes": "http%', [json_column])+10,
charindex(' ', [json_column] ,
patindex('%"notes": "http%', [json_column])+15)
- patindex('%"notes": "http%', [json_column])-10)
END
from [YourTable];
db<>fiddle here

Related

Azure Data Factory JSON syntax

In Azure Data Factory, I have a copy activity. The data source is the response body from a REST API POST request.
The sink is a SQL table. The problem is that, even though my JSON data contains multiple rows, only the first row is getting copied.
The source data looks like the following:
{
"offset": 0,
"limit": 1000,
"total": 65,
"loaded": 34,
"unloaded": 31,
"cubeCaches": [
{
"id": "MxMUVDN0Q1MzAk5MDg6RDkxREQxMUU5RDBDNzR2NMTk6YWNsZGxwMTJtc3QuY2952aXppZW50aW5==",
"projectId": "15D91DD11E9D0C74B3319",
"source": {
"name": "12302021",
"id": "07EF95111EC7F954158",
"type": "cube"
},
"state": {
"active": true,
"dirty": false,
"infoDirty": false,
"persisted": true,
"processing": false,
"loadedState": "loaded"
},
"lastUpdateTime": "2022-01-24T14:22:30Z",
"lastHitTime": "2022-02-14T20:02:02Z",
"hitCount": 1,
"size": 798720,
"creatorId": "D4E8BFD56085",
"lastUpdateJob": 18937,
"openViewCount": 0,
"creationTime": "2022-01-24T15:07:24Z",
"historicHitCount": 22,
"dataLanguages": [],
"rowCount": 2726,
"columnCount": 9
},
{
"id": "UYwMTIxMUFNjkxMUU5RDBDMTRCNkMwMDgwRUYzNUQ0MUI6YWNsZjLmNvbQ==",
"projectId": "120D0C1480EF35D41B",
"source": {
"name": "All Clients (YTD)",
"id": "49E5B13466251CD0B54E8F",
"type": "cube"
},
"state": {
"active": true,
"dirty": false,
"infoDirty": false,
"persisted": true,
"processing": false,
"loadedState": "loaded"
},
"lastUpdateTime": "2022-01-03T01:00:01Z",
"hitCount": 0,
"size": 82488152,
"creatorId": "1E2AFB011E80EF35FF14",
"lastUpdateJob": 364091,
"openViewCount": 0,
"creationTime": "2022-02-14T01:04:55Z",
"historicHitCount": 0,
"dataLanguages": [],
"rowCount": 8146903,
"columnCount": 13
}
}
I want to add a row in the Sink table (SQL) for every "id" in the JSON. However, when I run the activity, only the first record gets copied. It's mapped correctly, but I want it to copy all rows in the JSON, not just 1.
My Mapping tab in Azure Data Factory looks like this:
What am I doing wrong here? I'm thinking there is something wrong with my "Source" syntax for each of the columns...
In $cubeCashes[0][...] you're explicitly mapping the first element from this array into columns, and that's why only one row lands in the Sink.
I don’t know a way to achieve what you intend with copy activity only. I would use the Mapping Data Flow here, and inlide I would flatten (Flatten activity) your data to get the array of objects.
Then from this flattened dataset you could use a Derived Column to map the fields in JSON into columns of your target, Select, to remove unwanted original fields, and Sink it into your target location.

JSON Parsing in Snowflake - Square Brackets At Start

I'm trying to parse out some JSON files in snowflake. In this case, I'd like to extract the "gift card" from the line that has "fulfillment_service": "gift_card". I've had success querying one dimensional JSON data, but this - with the square brackets - is confounding me.
Here's my simple query - I've created a small table called "TEST_WEEK"
select line_items:fulfillment_service
from TEST_WEEK
, lateral flatten(FULFILLMENTS:line_items) line_items;
Hopefully this isn't too basic a question. I'm very new with parsing JSON.
Thanks in advance!
Here's the start of the FULLFILLMENTS field with the info I want to get at.
[
{
"admin_graphql_api_id": "gid://shopify/Fulfillment/2191015870515",
"created_at": "2020-08-10T14:54:38Z",
"id": 2191015870515,
"line_items": [
{
"admin_graphql_api_id": "gid://shopify/LineItem/5050604355635",
"discount_allocations": [],
"fulfillable_quantity": 0,
"fulfillment_service": "gift_card",
"fulfillment_status": "fulfilled",
"gift_card": true,
"grams": 0,
"id": 5050604355635,
"name": "Gift Card - $100.00",
"origin_location": {
"address1": "100 Indian Road",
"address2": "",
"city": "Toronto",
"country_code": "CA",
Maybe you can use two lateral flatten to process values in line_items array:
Sample table:
create table TEST_WEEK( FULFILLMENTS variant ) as
select parse_json(
'[
{
"admin_graphql_api_id": "gid://shopify/Fulfillment/2191015870515",
"created_at": "2020-08-10T14:54:38Z",
"id": 2191015870515,
"line_items": [
{
"admin_graphql_api_id": "gid://shopify/LineItem/5050604355635",
"discount_allocations": [],
"fulfillable_quantity": 0,
"fulfillment_service": "gift_card",
"fulfillment_status": "fulfilled",
"gift_card": true,
"grams": 0,
"id": 5050604355635,
"name": "Gift Card - $100.00",
"origin_location": {
"address1": "100 Indian Road",
"address2": "",
"city": "Toronto",
"country_code": "CA"
}
}
]
}
]');
Sample query:
select s.VALUE:fulfillment_service
from TEST_WEEK,
lateral flatten( FULFILLMENTS ) f,
lateral flatten( f.VALUE:line_items ) s;
The output:
+-----------------------------+
| S.VALUE:FULFILLMENT_SERVICE |
+-----------------------------+
| "gift_card" |
+-----------------------------+
Those square brackets indicate that you have an array of JSON objects in your FULLFILLMENTS field. Unless there is a real need to have an array of objects in one field you should have a look at the STRIP_OUTER_ARRAY property of the COPY command. An example can be found here in the Snowflake documentation:
copy into <table>
from #~/<file>.json
file_format = (type = 'JSON' strip_outer_array = true);
In case others are stuck with same data issue (all json data in one array), I have this solution:
select f.VALUE:fulfillment_service::string
from TEST_WEEK,
lateral flatten( FULFILLMENTS[0].line_items ) f;
With this, you just grab the first element of the array (which is the only element).
If you have nested array elements, just add this to the lateral flatten:
, RECURSIVE => TRUE, mode => 'array'

Convert CSV containing nested JSON rows to SQL table

I have a CSV file with several million rows, and want to load it as a PostgreSQL table. One of the rows in the column 'json_doc' as an example contains:
{"id": <>,
"base":
{"ateco":
[
{
"code": "<>",
"rootCode": "<>",
"description": "<>"
}
],
"founded": "<>",
"legalName": "<>",
"legalForms":
[
{
"name": "<>",
"level": <>
},
{
"name": "<>",
"level": <>
}
]
},
"name": "<>",
"people":
{
"items":
[
{
"name": "<>",
"givenName": "<>",
"familyName": "<>"
}
]
},
"country": "<>",
"locations": {}
}
Which as you can see has many nested dictionaries. And there are several million of these.
I'd like to get this file into an SQL table with even the sub-dictionary values in their own columns. How can I do this? It would seem I have to use some sort of name spacing technique for the nested data as there are some duplicate keys i.e. 'name'.
The data will be analysed using Pandas, but I'd like to get this straight into Postgres if possible. Any assistance greatly appreciated.
The result will look like:
id | base_ateco_code | etc | base_ateco_legalForms_name | etc |
Unless there are any ideas about this - it's a pretty open project from my employer - I just need to be able to use this information as part of a JOIN with another table.
Many thanks.

Append to Specific Array Index from another table dynamically SQL Server

I have a json string that I need to modify
{"RecordCount":3,"Top":10,"Skip":0,"SelectedSort":"Seed asc","value":[{"AccountProductListId":22091612871138,"Name":"April 4th 2018","AccountId":256813438078643,"IsPublic":false,"Comment":"Test order sheet","Quantity":3},{"AccountProductListId":166305848801939,"Name":"test","AccountId":256813438078643,"IsPublic":false,"Comment":"","Quantity":1},{"AccountProductListId":21177711287586,"Name":"Test Order sheet","AccountId":256813438078643,"IsPublic":true,"Comment":"the very first sheet","Quantity":2}]}
Inside value the array looks like this:
"value": [{
"AccountProductListId": 22091612871138,
"Name": "April 4th 2018",
"IsPublic": false,
"Comment": "Test order sheet",
"Quantity": 3
}, {
"AccountProductListId": 166305848801939,
"Name": "test",
"IsPublic": false,
"Comment": "",
"Quantity": 1
}, {
"AccountProductListId": 21177711287586,
"Name": "Test Order sheet",
"IsPublic": true,
"Comment": "the very first sheet",
"Quantity": 2
}],
What I need to do is append some data from another table:
AccountProductListId ProductID
21177711287586 97096131867163|32721319938943
22091612871138 97096131867163|145461009584740|130005306921282
166305848801939 8744071222157
As you can see the AccountProductListId is already in the JSON result so I should know which array it should go to. The only problem is I don't know the syntax to merge the ProductID data into its specific array index. The JSON array could have more than 3 items.
Essentially ending up with something like this:
"value": [{
"AccountProductListId": 22091612871138,
"Name": "April 4th 2018",
"IsPublic": false,
"Comment": "Test order sheet",
"Quantity": 3,
"ProductID": "97096131867163|145461009584740|130005306921282"
}, {
"AccountProductListId": 166305848801939,
"Name": "test",
"IsPublic": false,
"Comment": "",
"Quantity": 1,
"ProductID": "8744071222157"
}, {
"AccountProductListId": 21177711287586,
"Name": "Test Order sheet",
"IsPublic": true,
"Comment": "the very first sheet",
"Quantity": 2,
"ProductID": "97096131867163|32721319938943"
}],
Any information would be greatly appreciated. Thanks.
Process with SQL Server
Prior to SQL Server 2016, there is not in-built support to read or write JSON.
Starting SQL Server 2016, you can use OPENJSON rowset function to read JSON and FOR JSON clause to write JSON.
See https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server
The approach would be to use OPENJSON to read the JSON string as a rowset, join it with the table to pickup ProductID and use FOR JSON to convert back to JSON.
Process outside SQL Server
Depending on your situation, it might be simpler to parse JSON outside SQL Server. If going that route, then you could
Collect all the AccountProductListIDs from the parsed JSON
Send the collected id to SQL Server via a stored procedure that takes a TVP input and outputs the AccountProductListID -> ProductID mapping
Inject the ProductIDs into JSON object and serialize back to string

Freebase search_api and excluding results by specified type

is anyone know, how to exclude some topics with specified type(s) using search api and mql?
For example i'm try to find all topics "Voodoo People", and exclude only those, that have composition and release types, and sort result by score desc: http://tinyurl.com/3tjkb7y.
Sorting work perfect, but i can't find functionality for excluding :(
I'm try to use mql_filter: http://tinyurl.com/644xkow, but releases still there.
And one more question: i see in type_strict param possible values: "all", "any", "should". But there is no value "not" or "not in". Is needed result can be obtained in any other way?
The syntax that you're looking for is "optional" : "forbidden". In your query that would look like this:
[{
"search": {
"query": "Voodoo People",
"score": null,
"mql_filter": [{
"type": {
"id": "/music/release",
"optional": "forbidden"
}
}]
},
"name": null,
"id": null,
"type": [],
"/common/topic/notable_for": {
},
"limit": 15,
"sort": "-search.score"
}]​