Parsing within a field using SQL - sql

We receive data in a single column that needs further parsing. In this example the pair separator is ~.
The goal is to grab the PASS or FAIL value from its respective pair.
SL  Data
1   "PARAM-0040,PASS~PARAM-0045,PASS~PARAM-0070,PASS"
2   "PARAM-0040,FAIL~PARAM-0045,FAIL~PARAM-0070,PASS"
Required outcome:
SL  PARAM-0040  PARAM-0045  PARAM-0070
1   PASS        PASS        PASS
2   FAIL        FAIL        PASS
This will be part of a larger SQL query that selects many other columns; these three columns should be parsed from the source and returned alongside them as selected columns.
E.g.
Select Column1, Column2, [ Parse code ] as PARAM-0040, [ Parse code ] as PARAM-0045, [ Parse code ] as PARAM-0070, Column6 .....
Thanks

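You can do that with a regular expression, but regular expression functions are not standardized across SQL dialects.
In PostgreSQL, one option is REGEXP_MATCHES():
https://www.postgresqltutorial.com/postgresql-regexp_matches/
In PostgreSQL, regexp_matches returns a set of zero or more rows, each holding a text array of the captured groups, so the value still has to be taken out of the array (thus the {} in its output).
A simpler way, also in PostgreSQL, is to use substring with a pattern, which returns the text matched by the first parenthesized group:
substring('foobar' from 'o(.)b')
Like this (the pattern does not require a trailing ~, so it also works for the last pair in the string):
select substring('PARAM-0040,PASS~PARAM-0045,PASS~PARAM-0070,PASS' from 'PARAM-0040,([^~]+)');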
substring
-----------
PASS
(1 row)
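To return all three values as columns in one query, the same substring call can be repeated once per parameter. The following is a minimal sketch for PostgreSQL, assuming the sample rows from the question are inlined as a CTE named my_table with columns sl and data (both names are placeholders for the real table). The aliases are double-quoted because the requested column names contain a hyphen.

```sql
-- Sample rows from the question, inlined so the query is self-contained.
-- my_table, sl and data are placeholder names, not the asker's real schema.
WITH my_table (sl, data) AS (
    VALUES
        (1, 'PARAM-0040,PASS~PARAM-0045,PASS~PARAM-0070,PASS'),
        (2, 'PARAM-0040,FAIL~PARAM-0045,FAIL~PARAM-0070,PASS')
)
SELECT sl,
       -- substring(... from pattern) returns the first parenthesized group:
       -- everything after "NAME," up to the next ~ or the end of the string.
       substring(data from 'PARAM-0040,([^~]+)') AS "PARAM-0040",
       substring(data from 'PARAM-0045,([^~]+)') AS "PARAM-0045",
       substring(data from 'PARAM-0070,([^~]+)') AS "PARAM-0070"
FROM my_table;
```

With the sample data this should give PASS, PASS, PASS for row 1 and FAIL, FAIL, PASS for row 2. If a parameter is absent from a row, substring returns NULL for that column.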

You may use the str_to_map function to split your data and subsequently extract each param's value. This example will first split each param/value pair by ~ before splitting the parameter and value by ,.
Reproducible example with your sample data:
WITH my_table AS (
    SELECT 1 as SL, "PARAM-0040,PASS~PARAM-0045,PASS~PARAM-0070,PASS" as DATA
    UNION ALL
    SELECT 2 as SL, "PARAM-0040,FAIL~PARAM-0045,FAIL~PARAM-0070,PASS" as DATA
),
param_mapped_data AS (
    SELECT SL, str_to_map(DATA,"~",",") param_map FROM my_table
)
SELECT
    SL,
    param_map['PARAM-0040'] AS PARAM0040,
    param_map['PARAM-0045'] AS PARAM0045,
    param_map['PARAM-0070'] AS PARAM0070
FROM
    param_mapped_data
Actual code, assuming your table is named my_table:
WITH param_mapped_data AS (
    SELECT SL, str_to_map(DATA,"~",",") param_map FROM my_table
)
SELECT
    SL,
    param_map['PARAM-0040'] AS PARAM0040,
    param_map['PARAM-0045'] AS PARAM0045,
    param_map['PARAM-0070'] AS PARAM0070
FROM
    param_mapped_data
Outputs:
sl  param0040  param0045  param0070
1   PASS       PASS       PASS
2   FAIL       FAIL       PASS

Related

Hive sql extract one to multiple values from key value pairs

I have a column that looks like:
[{"key_1":true,"key_2":true,"key_3":false},{"key_1":false,"key_2":false,"key_3":false},...]
There can be 1 to many items described by parameters in {} in the column.
I would like to extract only the values of the parameter key_1. Is there a function for that? So far I have tried the JSON-related functions (json_tuple, get_json_object), but each time I received null.
Consider the JSON path below.
WITH sample_data AS (
SELECT '[{"key_1":true,"key_2":true,"key_3":false},{"key_1":false,"key_2":false,"key_3":false}]' json
)
SELECT get_json_object(json, '$[*].key_1') AS key1_values FROM sample_data;

Selecting substrings from different points in strings depending on another column entry SQL

I have 2 columns that look a little like this:
Column A  Column B                              Column C
ABC       {"ABC":1.0,"DEF":24.0,"XYZ":10.50,}   1.0
DEF       {"ABC":1.0,"DEF":24.0,"XYZ":10.50,}   24.0
I need a select statement to create Column C: the number in Column B that corresponds to the letters in Column A. I have got as far as finding the starting point of the number I want to take out, but because the numbers have different lengths I can't use a fixed length. I want to extract the characters from the calculated starting point (below) up to the next comma.
STRPOS(Column B, Column A) + 5 gives me the correct starting position for a SUBSTRING call; from here I am lost. Any help much appreciated.
NB: I am using Google BigQuery, which doesn't recognise CHARINDEX.
You can use a regular expression as well.
WITH sample_table AS (
SELECT 'ABC' ColumnA, '{"ABC":1.0,"DEF":24.0,"XYZ":10.50,}' ColumnB UNION ALL
SELECT 'DEF', '{"ABC":1.0,"DEF":24.0,"XYZ":10.50,}' UNION ALL
SELECT 'XYZ', '{"ABC":1.0,"DEF":24.0,"XYZ":10.50,}'
)
SELECT *,
REGEXP_EXTRACT(ColumnB, FORMAT('"%s":([0-9.]+)', ColumnA)) ColumnC
FROM sample_table;
[Updated]
Regarding @Bihag Kashikar's suggestion: since ColumnB is not valid JSON (note the trailing comma), it will not be parsed properly inside a JS UDF like the one below. If it were valid JSON, a JS UDF that looks up the key could be an alternative to a regular expression, I think.
CREATE TEMP FUNCTION custom_json_extract(json STRING, key STRING)
RETURNS STRING
LANGUAGE js AS """
try {
obj = JSON.parse(json);
}
catch {
return null;
}
return obj[key];
""";
SELECT custom_json_extract('{"ABC":1.0,"DEF":24.0,"XYZ":10.50,}', 'ABC') invalid_json,
custom_json_extract('{"ABC":1.0,"DEF":24.0,"XYZ":10.50}', 'ABC') valid_json;
Take a look at this post too; it shows approaches using a JS UDF and SPLIT:
Error when trying to have a variable pathsname: JSONPath must be a string literal or query parameter

delimit the output of xml extract function in oracle [duplicate]

I have a CLOB column that contains XML type data. For example XML data is:
<A><B>123</b><C>456</C><B>789</b></A>
I have tried the concat function:
concat(xmltype (a.xml).EXTRACT ('//B/text()').getStringVal (),';'))
or
xmltype (a.xml).EXTRACT (concat('//B/text()',';').getStringVal ()))
But they are giving ";" at end only not after each <B> tag.
I am currently using
xmltype (a.xml).EXTRACT ('//B/text()').getStringVal ()
I want to concatenate all <B> with ; and expected result should be 123;789
Please suggest me how can I concatenate my data.
The concat() SQL function concatenates two values, so it's just appending the semicolon to each extracted value independently. But you're really trying to do string aggregation of the results (which could, presumably, really be more than two extracted values).
You can use XMLQuery instead of extract, and use the XPath string-join() function to do the concatenation:
XMLQuery('string-join(/A/B, ";")' passing xmltype(a.xml) returning content)
Demo with fixed XML end-node tags:
-- CTE for sample data
with a (xml) as (
select '<A><B>123</B><C>456</C><B>789</B></A>' from dual
)
-- actual query
select XMLQuery('string-join(/A/B, ";")' passing xmltype(a.xml) returning content) as result
from a;
RESULT
------------------------------
123;789
You could also extract all of the individual <B> values using XMLTable, and then use SQL-level aggregation:
-- CTE for sample data
with a (xml) as (
select '<A><B>123</B><C>456</C><B>789</B></A>' from dual
)
-- actual query
select listagg(x.b, ';') within group (order by null) as result
from a
cross join XMLTable('/A/B' passing xmltype(a.xml) columns b number path '.') x;
RESULT
------------------------------
123;789
which gives you more flexibility and would allow grouping by other node values more easily, but that doesn't seem to be needed here based on your example value.

Hive Delimiter using :

I have a column A that has values such as W:X:Y:Z.
I am interested in extracting Z from column A.
I tried multiple commands such as SPLIT(Table.A, "[:]"[3] ) but get an error.
What is the best way to do this?
The split function returns an array. The array index [3] must be applied to the result of split, not to the delimiter argument:
with yourtable as ( -- use your table instead of this
select 'W:X:Y:Z' as A
)
select split(A,'\\:')[3] from yourtable;
Result:
Z

Postgres query to calculate matching strings

I have following table:
id description additional_info
123 XYZ XYD
And an array as:
[{It is known to be XYZ},{It is know to be none},{It is know to be XYD}]
I need to match the two so that, for every record of the table, I can determine the number of successful matches.
The result of the above example will be:
id RID Matches
1 123 2
Only the content at position 0 and 2 match the record's description/additional_info so Matches is 2 in the result.
I am struggling to transform this to a query in Postgres - dynamic SQL to create a VIEW in a PL/pgSQL function to be precise.
It's undefined how to deal with array elements that match both description and additional_info at the same time. I'll assume you want to count that as 1 match.
It's also undefined where id = 1 comes from in the result.
One way is to unnest() the array and LEFT JOIN the main table to each element on a match on either of the two columns:
SELECT 1 AS id, t.id AS "RID", count(a.txt) AS "Matches"
FROM tbl t
LEFT JOIN unnest(my_arr) AS a(txt) ON a.txt ~ t.description
OR a.txt ~ t.additional_info
GROUP BY t.id;
I use a regular expression for the match. Characters such as ( . \ ? ) in the strings on the right-hand side have special meaning in a regular expression, so you may have to escape them if they can occur in your data.
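If plain substring containment is all that is needed, one way to sidestep the escaping problem is to replace the regular expression operator with strpos(), which treats both arguments as literal text. This is a swapped-in technique, not part of the query above; the sketch below is for PostgreSQL and inlines the sample row and array from the question, so the aliases t and a and their column names are just illustrative.

```sql
-- Sample row and array from the question, inlined for a self-contained run.
SELECT t.id AS "RID", count(a.txt) AS "Matches"
FROM  (VALUES (123, 'XYZ', 'XYD')) AS t(id, description, additional_info)
LEFT  JOIN unnest(ARRAY['It is known to be XYZ',
                        'It is know to be none',
                        'It is know to be XYD']) AS a(txt)
       -- strpos() returns 0 when the second string does not occur in the first
       ON strpos(a.txt, t.description) > 0
       OR strpos(a.txt, t.additional_info) > 0
GROUP  BY t.id;
```

For the sample data this should return one row with RID 123 and Matches 2. Like the ~ operator, strpos() is case-sensitive.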
Addressing your comment
You should have mentioned that you are using a plpgsql function with EXECUTE. Probably 2 errors:
The variable array_content is not visible inside EXECUTE, you need to pass the value with a USING clause - or concatenate it as string literal in a CREATE VIEW statement which does not allow parameters.
Missing single quotes around the string 'brand_relevance_calculation_view'. It's still a string literal before you concatenate it as an identifier. You did well to use format() with %I there.
Demo:
DO
$do$
DECLARE
   array_content varchar[] := '{FREE,DAY}';
BEGIN
   -- count() needs a GROUP BY; grouping by i.id alone assumes that id is the
   -- primary key of initial_events, so the other columns depend on it.
   EXECUTE format('
      CREATE VIEW %I AS
      SELECT id, description, additional_info, name, count(a.text) AS business_objectives
           , multi_city, category IS NOT NULL AS category
      FROM   initial_events i
      LEFT   JOIN unnest(%L::varchar[]) AS a(text) ON a.text ~ i.description
                                                   OR a.text ~ i.additional_info
      GROUP  BY i.id'
    , 'brand_relevance_calculation_view', array_content);
END
$do$;
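For statements that do accept parameters, such as a plain SELECT, the USING clause mentioned above can be sketched as follows. This is a minimal illustration rather than the asker's view: the variable n and the counting query are made up for the example.

```sql
DO
$do$
DECLARE
   array_content varchar[] := '{FREE,DAY}';
   n bigint;  -- hypothetical variable that receives the query result
BEGIN
   -- $1 inside the dynamic statement is bound to the first USING expression,
   -- so array_content never has to be concatenated into the SQL string.
   EXECUTE 'SELECT count(*) FROM unnest($1) AS a(txt)'
   INTO  n
   USING array_content;

   RAISE NOTICE 'elements: %', n;
END
$do$;
```

Running the block should raise a notice reporting 2 elements. The same pattern does not work for CREATE VIEW, which is why the demo above embeds the array with format() and %L instead.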