Snowflake Searching string in semi structured data - sql

I have a table. There are many columns and rows. One column that I am trying to query in Snowflake has semi structured data. For example, when I query
select response
from table
limit 5
This is what is returned
[body={\n "id": "xxxxx",\n "object": "charge",\n "amount": 500,\n "amount_refunded": 0,\n "application": null,\n "application_fee": null,\n "application_fee_amount": null,\n "balance_transaction": null,\n "billing_details": {\n "address": {\n "city": null,\n "zip": "xxxxx",]
I want to select only the zip in this data. When I run code:
select response:zip
from table
limit 5
I get an error.
SQL compilation error: error line 1 at position 21 Invalid argument types for function 'GET': (VARCHAR(16777216), VARCHAR(11))
Is there a reason why this is happening? I am new to snowflake so trying to parse out this data but stuck. Thanks!

Snowflake has very good documentation on the subject
For your specific case, have you attempted to use dot notation? It's the appropiate method for accessing JSON. So
Select result:body.zip
from table
Remember that you have your 'body' element. You need to access that one first with semicolon because it's a level 1 element. Zip is located within body so it's a level 2. Level 1 elements are accessed with semicolon, level 2 elements are accessed with dot notation.

I think you have multiple issues with this.
First I think your response column is not a variant column. Please run the below query and confirm
SHOW COLUMNS ON table;
Even if the column is variant, the way the data is stored is not in a valid JSON format. You will need to strip the JSON part and then store that in the variant column.
Please do the first part and share the information, I will then suggest next steps. I wanted to put that in the comment but comment does not allow to write so many sentences.

Related

Data Factory Copy Activity: Error found when processing 'Csv/Tsv Format Text' source 'xxx.csv' with row number 6696: found more columns than expected

I am trying to perform a simply copy activity in Azure Data Factory from CSV to SQL Table, but I'm getting the following error:
{
"errorCode": "2200",
"message": "ErrorCode=DelimitedTextMoreColumnsThanDefined,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error found when processing 'Csv/Tsv Format Text' source 'organizations.csv' with row number 6696: found more columns than expected column count 41.,Source=Microsoft.DataTransfer.Common,'",
"failureType": "UserError",
"target": "Copy data1",
"details": []
}
The copy activity is as follows
Source
My Sink is as follows:
As preview of the data in source is as follows:
This seems like a very straight forward copy activity. Any thoughts on what might be causing the error?
My row 6696 looks like the following:
3b1a2e5f-d08b-166b-4b91-eb53009b2377 Compassites Software Solutions organization compassites-software https://www.crunchbase.com/organization/compassites-software 318375 17/07/2008 10:46 05/12/2022 12:17 company compassitesinc.com http://www.compassitesinc.com IND Karnataka Bangalore "Pradeep Court", #163/B, 6th Main 3rd Cross, JP Nagar 3rd phase 560078 operating Custom software solution experts Big Data,Cloud Computing,Information Technology,Mobile,Software Data and Analytics,Information Technology,Internet Services,Mobile,Software 01/11/2005 51-100 info#compassitesinc.com 080-42032572 http://www.facebook.com/compassites http://www.linkedin.com/company/compassites-software-solutions http://twitter.com/compassites https://res.cloudinary.com/crunchbase-production/image/upload/v1397190270/c3e5acbde40f36eaf4f8c6f6eda3f803.png company
No commas
As the error message indicates, there is a record at row number 6696 where there is a value containing , as a character in it.
Look at the following demonstration where I have taken a similar case. I have 3 columns in my source. The data looks as shown below:
When I run use similar dataset settings and read these values, the same error would be thrown.
So, the value T1,OG is being considered as if they belong to 2 different columns since they have dataset delimiter within the value.
Such values would throw an error as it is ambiguous to read. One way to avoid this is to enclose such values with quote character (double quote in this case).
Now when I run the copy activity, it would give the desired output.
The table data would look like this:

JSON EXTRACT IN BIG QUERY

I Currently have a column in json format with multiple items.
The struct is like the following:
phones": [{"phone": "11111111", "type": "CELLPHONE"}, {"phone":
"222222222", "type":"CELLPHONE"}, {"phone": "99999999", "type":
"CELLPHONE"}]
I tried:
json_extract(Contacts,'$.phones.phone') as phone_number but it only extracts the first one.
JSON_EXTRACT_ARRAY(Contacts,'$.phones') as phone_contacts gives me an array, But Im getting error trying to unnest it.
Does anybody know any approach to solve this problems?
Thanks in Advance.
Consider below
select json_value(phone_contact, '$.phone') phone,
json_value(phone_contact, '$.type') type
from your_table,
unnest(json_extract_array(Contacts,'$.phones')) phone_contact
if applied to sample data in your question - output is

Error code: DelimitedTextMoreColumnsThanDefined Azure Data Factory

I am trying to copy data from a csv file to a sql table in Azure Data Factory
This is my type property for the CSV file
"typeProperties": {
"location": {
"type": "AzureBlobStorageLocation",
"fileName": "2020-09-16-stations.csv",
"container": "container"
},
"columnDelimiter": ",",
"escapeChar": "\\",
"firstRowAsHeader": true,
"quoteChar": "\""
I recieve following error:
ErrorCode=DelimitedTextMoreColumnsThanDefined,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error found when processing 'Csv/Tsv Format Text' source '2020-09-16-stations.csv' with row number 2: found more columns than expected column count 11.,Source=Microsoft.DataTransfer.Common,'
This is row #2
0e18d0d3-ed38-4e7f,Station2,Mainstreet33,,12207,Berlin,48.1807,11.4609,1970-01-01 01:00:00+01,"{""openingTimes"":[{""applicable_days"":96,""periods"":[{""startp"":""08:00"",""endp"":""20:00""}]},{""applicable_days"":31,""periods"":[{""startp"":""06:00"",""endp"":""20:00""}]}]}"
I think the last column, the JSON query is making trouble in this case. When I view the data it looks fine:
I thought exactly the "quoteChar": "\""would prevent that the last column makes problems. I have no idea why I am getting this error while i run debug
Try setting the escape character = " (a double quote). This should treat each pair of double quotes as an actual single quote and wont consider them as a "Quote Char" within the string, so you will end up with a string that looks like this (and which the system knows is a single string and not something it has to split):
{"openingTimes":[{"applicable_days":96,"periods":[{"startp":"08:00","endp":"20:00"}]},
{"applicable_days":31,"periods":[{"startp":"06:00","endp":"20:00"}]}]}
This is because this value "{""openingTimes"":[{""applicable_days"":96,""periods"":[{""startp"":""08:00"",""endp"":""20:00""}]},{""applicable_days"":31,""periods"":[{""startp"":""06:00"",""endp"":""20:00""}]}]}" contains several comma and your columnDelimiter is "," which leads to that value is split to several column. So you need to change your columnDelimiter.

Having problems with ordering by numeric value

I have JSON data in my collection similar to following example. There is a icCount property with numeric value. Now when I issue a query with order specified by icCount, its sorted as text and not numeric value (see screenshot below). Index is automatic here. Any idea what is wrong here? (running RavenDB 4.1.1)
{
"enabled": true,
"description": "",
"icCount": 3865,
"companyname": "ABC Data"
}
Ok, so I just found it myself. Help here https://ravendb.net/docs/article-page/4.1/csharp/indexes/querying/sorting states that I should specify ordering mode(type). For my case I can simply rewrite it to: order by icCount as long desc ... see the long in clause. This way my data list is ordered correctly.

How to extract this json into a table?

I've a sql column filled with json document, one for row:
[{
"ID":"TOT",
"type":"ABS",
"value":"32.0"
},
{
"ID":"T1",
"type":"ABS",
"value":"9.0"
},
{
"ID":"T2",
"type":"ABS",
"value":"8.0"
},
{
"ID":"T3",
"type":"ABS",
"value":"15.0"
}]
How is it possible to trasform it into tabular form? I tried with redshift json_extract_path_text and JSON_EXTRACT_ARRAY_ELEMENT_TEXT function, also I tried with json_each and json_each_text (on postgres) but didn't get what expected... any suggestions?
desired results should appear like this:
T1 T2 T3 TOT
9.0 8.0 15.0 32.0
I assume you printed 4 rows. In postgresql
SELECT this_column->'ID'
FROM that_table;
will return column with JSON strings. Use ->> if you want text column. More info here: https://www.postgresql.org/docs/current/static/functions-json.html
In case you were using some old Postgresql (before 9.3), this gets harder : )
Your best option is to use COPY from JSON Format. This will load the JSON directly into a normal table format. You then query it as normal data.
However, I suspect that you will need to slightly modify the format of the file by removing the outer [...] square brackets and also the commas between records, eg:
{
"ID": "TOT",
"type": "ABS",
"value": "32.0"
}
{
"ID": "T1",
"type": "ABS",
"value": "9.0"
}
If, however, your data is already loaded and you cannot re-load the data, you could either extract the data into a new table, or add additional columns to the existing table and use an UPDATE command to extract each field into a new column.
Or, very worst case, you can use one of the JSON Functions to access the information in a JSON field, but this is very inefficient for large requests (eg in a WHERE clause).