Snowflake SQL: Extract from bracketed list of strings

I have a particularly hairy string I need to parse using SQL on snowflake. The source data looks like this:
(sample data: a brand_id column and a sub_brand_data column holding an escaped JSON array of sub-brands; see the VALUES clause below)
I need to extract the default sub brand for each brand ID. For the 3 records shown, the values should be Amazon, Alphabet, and null.
I figure the first step would be to flatten the string into multiple rows per brand (i.e. each set of curly brackets on a different row), but I've tried many combinations of regex/JSON functions as well as lateral/flatten and can't seem to figure out the best way to accomplish this.
Here is the code to create a sample table to test out:
create or replace table sub_brand_extract as
select
    column1 as brand_id,
    column2 as sub_brand_data
from values
    (1, '"[{\\"Name\\":\\"Amazon\\",\\"CD\\":\\"AMZ\\",\\"IsDefault\\":true}]"'),
    (2, '"[{\\"Name\\":\\"Google\\",\\"CD\\":\\"GOG\\",\\"IsDefault\\":false},{\\"Name\\":\\"Alphabet\\",\\"CD\\":\\"ALP\\",\\"IsDefault\\":true}]"'),
    (3, '"[]")')
;
select * from sub_brand_extract;
*note that the statement doubles each backslash (\\) to escape the literal \ in the source string.

You can do a bit of cleanup with the trim function, and then get rid of the escaping of the double quotes with replace. What remains is valid JSON (an array) that can be parsed with the parse_json function.
From there, the flatten table function can convert the array elements into rows (if desired).
with VALS as
(
    select
        column1 as brand_id,
        column2 as sub_brand_data,
        parse_json(replace(trim(column2, '"()'), '\\"', '"')) as column3
    from values
        (1, '"[{\\"Name\\":\\"Amazon\\",\\"CD\\":\\"AMZ\\",\\"IsDefault\\":true}]"'),
        (2, '"[{\\"Name\\":\\"Google\\",\\"CD\\":\\"GOG\\",\\"IsDefault\\":false},{\\"Name\\":\\"Alphabet\\",\\"CD\\":\\"ALP\\",\\"IsDefault\\":true}]"'),
        (3, '"[]")')
)
select BRAND_ID
      ,VALUE:CD::string        as CD
      ,VALUE:IsDefault::string as IS_DEFAULT
      ,VALUE:Name::string      as NAME
from VALS, table(flatten(COLUMN3))
;
BRAND_ID | CD  | IS_DEFAULT | NAME
---------+-----+------------+----------
       1 | AMZ | true       | Amazon
       2 | GOG | false      | Google
       2 | ALP | true       | Alphabet
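The flattened output still has one row per sub-brand. To get the single default sub-brand per brand that the question asks for (Amazon, Alphabet, and null), one option is a sketch like the following, building on the VALS CTE above: flatten's outer => true option keeps brand 3 (empty array) in the result, and iff picks the name only from the default row.
select v.BRAND_ID
      ,max(iff(f.VALUE:IsDefault::boolean, f.VALUE:Name::string, null)) as DEFAULT_SUB_BRAND
from VALS v
    ,lateral flatten(input => v.COLUMN3, outer => true) f
group by v.BRAND_ID;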

Related

Hive SQL: extract one to multiple values from key-value pairs

I have a column that looks like:
[{"key_1":true,"key_2":true,"key_3":false},{"key_1":false,"key_2":false,"key_3":false},...]
There can be one to many {...} items in the column.
I would like to extract only the values of key_1. Is there a function for that? So far I have tried JSON-related functions (json_tuple, get_json_object), but each time I received null.
Consider the JSON path below.
WITH sample_data AS (
  SELECT '[{"key_1":true,"key_2":true,"key_3":false},{"key_1":false,"key_2":false,"key_3":false}]' json
)
SELECT get_json_object(json, '$[*].key_1') AS key1_values FROM sample_data;
For the sample row, key1_values comes back as [true,false].
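If you need a single element rather than all of them, an index can replace the wildcard; a small variation on the same query (assuming the same sample_data CTE):
SELECT get_json_object(json, '$[0].key_1') AS first_key1,
       get_json_object(json, '$[1].key_1') AS second_key1
FROM sample_data;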

Selecting substrings from different points in strings depending on another column entry SQL

I have 2 columns that look a little like this:
Column A | Column B                             | Column C
---------+--------------------------------------+---------
ABC      | {"ABC":1.0,"DEF":24.0,"XYZ":10.50,}  | 1.0
DEF      | {"ABC":1.0,"DEF":24.0,"XYZ":10.50,}  | 24.0
I need a select statement to create Column C: the numeric value in Column B that corresponds to the letters in Column A. I have got as far as finding the starting point of the numbers I want to take out, but as they have different character lengths I can't use a fixed length; I want to extract the characters from the calculated starting point (below) up to the next comma.
STRPOS(Column B, Column A) + 5 gives me the correct character for the starting point of a SUBSTRING query; from here I am lost. Any help much appreciated.
NB: I am using Google BigQuery, which doesn't recognise CHARINDEX.
You can use a regular expression as well.
WITH sample_table AS (
  SELECT 'ABC' ColumnA, '{"ABC":1.0,"DEF":24.0,"XYZ":10.50,}' ColumnB UNION ALL
  SELECT 'DEF', '{"ABC":1.0,"DEF":24.0,"XYZ":10.50,}' UNION ALL
  SELECT 'XYZ', '{"ABC":1.0,"DEF":24.0,"XYZ":10.50,}'
)
SELECT *,
       REGEXP_EXTRACT(ColumnB, FORMAT('"%s":([0-9.]+)', ColumnA)) ColumnC
FROM sample_table;
ColumnC comes out as 1.0, 24.0, and 10.50 for the three rows.
[Updated]
Regarding @Bihag Kashikar's suggestion: since ColumnB is invalid JSON (note the trailing comma), it will not be parsed properly inside a JS UDF like the one below. If it were valid JSON, a JS UDF with the JSON key could be an alternative to a regular expression, I think.
CREATE TEMP FUNCTION custom_json_extract(json STRING, key STRING)
RETURNS STRING
LANGUAGE js AS """
  try {
    obj = JSON.parse(json);
  } catch {
    return null;
  }
  return obj[key];
""";
SELECT custom_json_extract('{"ABC":1.0,"DEF":24.0,"XYZ":10.50,}', 'ABC') invalid_json,
custom_json_extract('{"ABC":1.0,"DEF":24.0,"XYZ":10.50}', 'ABC') valid_json;
invalid_json comes back NULL, while valid_json returns 1.
Take a look at this post too; it shows a JS UDF together with SPLIT options: Error when trying to have a variable pathsname: JSONPath must be a string literal or query parameter
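That error is also why a dynamic key is awkward with BigQuery's built-in JSON functions: JSON_EXTRACT_SCALAR needs a literal (or query-parameter) path, so it works for a fixed key on valid JSON but not for a path built from another column. A minimal illustration, assuming valid JSON (no trailing comma):
-- Literal path on valid JSON; returns '1'.
SELECT JSON_EXTRACT_SCALAR('{"ABC":1.0,"DEF":24.0,"XYZ":10.50}', '$.ABC') AS abc_value;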

Ignoring rows where a column has non-numeric characters in its value

I have a table with a column that has some variable data. I would like to select only the rows whose values consist entirely of numeric characters [0-9].
The column would look something like this:
time
--------
1545123
none
1565543
1903-294
I would want the rows with the first and third values only (1545123 and 1565543). None of my approaches have worked.
I've tried:
WHERE time NOT LIKE '%[^0-9]+%'
WHERE NOT regexp_like(time, '%[^0-9]+%')
WHERE regexp_like(time, '[0-9]+')
I've also tried these expressions in a CASE statement, but that was also a no go. Am I missing something here?
This is on Amazon Athena, which uses an older version of Presto.
Thanks in advance
You can use a regexp matching only numbers, like '^[0-9]+$' or '^\d+$':
-- sample data
WITH dataset (time) AS (
    VALUES
        ('1545123'),
        ('none'),
        ('1565543'),
        ('1903-294')
)
-- query
select *
from dataset
WHERE regexp_like(time, '^[0-9]+$')
Output:
time
--------
1545123
1565543
Another option, which I would say should not be used in this case but can be helpful in some others, is using try with cast:
--query
select *
from (
    select try(cast(time as INTEGER)) time
    from dataset
)
where time is not null
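Presto also has TRY_CAST, which folds the try and the cast into one call; assuming your Athena engine version supports it, the subquery can be dropped:
--query
select *
from dataset
where try_cast(time as INTEGER) is not null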

Parsing within a field using SQL

We are receiving data in one column where further parsing is needed. In this example the separator is ~.
Goal is to grab the pass or fail value from its respective pair.
SL | Data
---+---------------------------------------------------
 1 | "PARAM-0040,PASS~PARAM-0045,PASS~PARAM-0070,PASS"
 2 | "PARAM-0040,FAIL~PARAM-0045,FAIL~PARAM-0070,PASS"
Required outcome:
SL | PARAM-0040 | PARAM-0045 | PARAM-0070
---+------------+------------+-----------
 1 | PASS       | PASS       | PASS
 2 | FAIL       | FAIL       | PASS
This will be part of a bigger SQL query in which we select many other columns; these three columns are to be parsed out of the source and returned as selected columns alongside the rest.
E.g.
Select Column1, Column2, [ Parse code ] as PARAM-0040, [ Parse code ] as PARAM-0045, [ Parse code ] as PARAM-0070, Column6 .....
Thanks
You can do that with a regular expression, but regexps are non-standard across databases.
This is how it is done in PostgreSQL: REGEXP_MATCHES()
https://www.postgresqltutorial.com/postgresql-regexp_matches/
In PostgreSQL, regexp_matches returns zero or more rows, each an array of the captured groups, so the result has to be unpacked (hence the {} in its output).
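For example, a sketch against the first sample row (the captured group comes back wrapped in array braces):
select regexp_matches('PARAM-0040,PASS~PARAM-0045,PASS~PARAM-0070,PASS', 'PARAM-0040,([^~]+)');
 regexp_matches
----------------
 {PASS}
(1 row)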
A simpler way, also in PostgreSQL, is to use substring.
substring('foobar' from 'o(.)b')
Like:
select substring('PARAM-0040,PASS~PARAM-0045,PASS~PARAM-0070,PASS' from 'PARAM-0040,([^~]+)~');
substring
-----------
PASS
(1 row)
You may use the str_to_map function to split your data and subsequently extract each param's value. This example will first split each param/value pair by ~ before splitting the parameter and value by ,.
Reproducible example with your sample data:
WITH my_table AS (
    SELECT 1 as SL, "PARAM-0040,PASS~PARAM-0045,PASS~PARAM-0070,PASS" as DATA
    UNION ALL
    SELECT 2 as SL, "PARAM-0040,FAIL~PARAM-0045,FAIL~PARAM-0070,PASS" as DATA
),
param_mapped_data AS (
    SELECT SL, str_to_map(DATA, "~", ",") param_map FROM my_table
)
SELECT
    SL,
    param_map['PARAM-0040'] AS PARAM0040,
    param_map['PARAM-0045'] AS PARAM0045,
    param_map['PARAM-0070'] AS PARAM0070
FROM
    param_mapped_data
Actual code, assuming your table is named my_table:
WITH param_mapped_data AS (
    SELECT SL, str_to_map(DATA, "~", ",") param_map FROM my_table
)
SELECT
    SL,
    param_map['PARAM-0040'] AS PARAM0040,
    param_map['PARAM-0045'] AS PARAM0045,
    param_map['PARAM-0070'] AS PARAM0070
FROM
    param_mapped_data
Outputs:
sl | param0040 | param0045 | param0070
---+-----------+-----------+----------
 1 | PASS      | PASS      | PASS
 2 | FAIL      | FAIL      | PASS
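Note that str_to_map is a Hive/Spark function. If your engine doesn't have it, one regexp per column is a workable fallback; a sketch using Hive-style regexp_extract against the same my_table:
SELECT SL,
       regexp_extract(DATA, 'PARAM-0040,([^~]+)', 1) AS PARAM0040,
       regexp_extract(DATA, 'PARAM-0045,([^~]+)', 1) AS PARAM0045,
       regexp_extract(DATA, 'PARAM-0070,([^~]+)', 1) AS PARAM0070
FROM my_table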

Postgres Array[VarChar] uppercase?

I'm trying to find a way to force an array to upper or lowercase, so that no matter what the user inputs they get a result. This is the query:
select * from table where databasecolumn = any(:id)
:id is an array of chars that the user inputs (can be lowercase or uppercase), and I need to make sure that whatever the user inputs they get a result.
This works as long as the user inputs uppercase (because the database values are also uppercase), but when they input lowercase letters they get no result.
I tried this:
select * from table where upper(databasecolumn) = any(upper(:id))
but this does not work because the function "upper" is not defined for arrays. It works fine when I do it with a single input, but not with arrays.
Do you have any pointers? I couldn't find an equivalent function for an array of varchars.
You could use ILIKE:
select *
from table
where databasecolumn ILIKE any(:id);
This:
with data (col) as (
    values ('one'), ('Two'), ('THREE')
)
select *
from data
where col ilike any(array['one', 'two', 'three']);
returns:
col
-----
one
Two
THREE
You can use double casting, like here:
t=# with a as (select '{caSe1,cAse2}'::text[] r) select r,upper(r::text)::text[] from a where true;
r | upper
---------------+---------------
{caSe1,cAse2} | {CASE1,CASE2}
(1 row)
It neglects the benefits of using ANY, though.
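If you want to keep ANY, another option is to uppercase each element by unnesting the array in a subquery; a sketch, with an inline array standing in for the :id parameter:
with data (col) as (
    values ('one'), ('Two'), ('THREE')
)
select *
from data
where upper(col) = any (
    select upper(elem)
    from unnest(array['one', 'two', 'three']) as elem
);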