How to use string_split() with a delimiter in Databricks?


I am trying to use the string_split() function in Databricks to convert the DataFrame below.
The source DataFrame is stored as a temp view named TempView in Databricks:
ID | value
1  | value-1,value-2,value-3
2  | value-1,value-4
Output needed:
ID | value
1  | value-1
1  | value-2
1  | value-3
2  | value-1
2  | value-4
I tried the code below:
%sql
SELECT ID, value
FROM TempView
CROSS APPLY STRING_SPLIT(value, ',')
GROUP BY cs.PERMID, value
but I get a parse exception.

There is no string_split function in Databricks SQL, but there is a split function for this (doc).
In your case it is also easier to combine split with explode (doc): split turns the string into an array, and explode returns one row per array element. Something like this:
SELECT ID, explode(split(value, ',')) AS value FROM TempView
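To make the split/explode behavior concrete, here is a small Python sketch that emulates the query row by row. It is only an illustration, not Databricks code. The in-memory `rows` list stands in for TempView, and `split_explode` is a hypothetical helper. Spark's split treats its delimiter as a regular expression, while Python's str.split uses a literal string; for ',' the two behave the same.

```python
# Hypothetical in-memory stand-in for the TempView rows (ID, value).
rows = [
    (1, "value-1,value-2,value-3"),
    (2, "value-1,value-4"),
]

def split_explode(rows, delimiter=","):
    """Emulate SELECT ID, explode(split(value, ',')): one output row per element."""
    result = []
    for row_id, value in rows:
        # split(...) yields an array; explode emits one row per array element,
        # each paired with the ID of the row it came from.
        for part in value.split(delimiter):
            result.append((row_id, part))
    return result

for row in split_explode(rows):
    print(row)
```

Each input row produces as many output rows as it has comma-separated items, which matches the shape of the desired output.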

Related

Presto SQL query

Let's assume that I have an array of strings with the following values:
string = {'123','12ab','38','abc','01a8','1123b'}
How should I write a query in Presto SQL that extracts only the values consisting entirely of digits, so that my output is {'123','38'}?
A query like the one below does not return any output:
SELECT string
FROM table1
WHERE string LIKE '[0-9]*'
GROUP BY string

There are at least two options. The first is to use Presto's try_cast, which returns NULL instead of raising an error when a value cannot be cast:
-- sample data
WITH dataset(string) AS (
values ('123'),
('12ab'),
('38'),
('abc'),
('01a8'),
('1123b')
)
-- query
select *
from dataset
where try_cast(string as integer) is not null;
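The second is to use a regular expression via regexp_like. This is the stricter test, because try_cast(... as integer) also accepts signed values such as '-5':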
-- query
select *
from dataset
where regexp_like(string, '^\d+$');
Output:
string
123
38
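For comparison, here is a rough Python analogue of the two filters. It is a sketch, not Presto. `digits_only` mirrors regexp_like(string, '^\d+$') using re.fullmatch. `castable_to_int` stands in for try_cast(... as integer) IS NOT NULL. Python's int() has no 32-bit limit and tolerates surrounding whitespace, so it only approximates the cast. The last line shows why the regular expression is the stricter test.

```python
import re

values = ["123", "12ab", "38", "abc", "01a8", "1123b"]

def digits_only(items):
    # Mirrors regexp_like(s, '^\d+$'): the whole string must be digits.
    # [0-9] is used so that non-ASCII Unicode digits are not accepted.
    return [s for s in items if re.fullmatch(r"[0-9]+", s)]

def castable_to_int(items):
    # Rough stand-in for try_cast(s AS integer) IS NOT NULL.
    kept = []
    for s in items:
        try:
            int(s)
        except ValueError:
            continue
        kept.append(s)
    return kept

print(digits_only(values))       # ['123', '38']
print(castable_to_int(values))   # ['123', '38']
print(digits_only(["-5"]), castable_to_int(["-5"]))  # [] ['-5']
```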

BigQuery: unnest an array with JSON values

Let's consider the following table in Google BigQuery:
WITH example AS (
SELECT 1 AS id, ["{\"id\":1, \"name\":\"AAA\"}", "{\"id\":2, \"name\":\"BBB\"}","{\"id\":3, \"name\":\"CCC\"}"] AS r
UNION ALL
SELECT 2 AS id, ["{\"id\":5, \"name\":\"XXX\"}", "{\"id\":6, \"name\":\"ZZZ\"}"] AS r
)
SELECT *
FROM example;
I would like to compose a query that returns each name together with its parent row's id.
I tried using UNNEST with JSON functions, but I just can't get it right.
Can anyone help me?
Thanks
Ido

Your array already contains JSON strings, so after UNNEST you can apply a JSON function such as JSON_VALUE to extract the name attribute of each element. This assumes the array column in the example CTE is aliased r:
select
id,
json_value(elt, '$.name') as name
from example, unnest(r) as elt;
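The same UNNEST-then-extract logic can be sketched in Python, for illustration only. `example` mirrors the CTE above and assumes its array column is named r, as in the answer's query. json.loads plays the role of JSON_VALUE, and .get returns None where JSON_VALUE would return NULL for a missing key. `unnest_names` is a hypothetical helper.

```python
import json

# Stand-in for the `example` CTE: each row has an id and an array r of JSON strings.
example = [
    (1, ['{"id":1, "name":"AAA"}', '{"id":2, "name":"BBB"}', '{"id":3, "name":"CCC"}']),
    (2, ['{"id":5, "name":"XXX"}', '{"id":6, "name":"ZZZ"}']),
]

def unnest_names(rows):
    """Emulate: SELECT id, JSON_VALUE(elt, '$.name') FROM example, UNNEST(r) AS elt."""
    out = []
    for row_id, r in rows:
        for elt in r:  # UNNEST: one row per array element, joined to its parent id
            out.append((row_id, json.loads(elt).get("name")))  # ~ JSON_VALUE(elt, '$.name')
    return out

print(unnest_names(example))
```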

How to change from regexp_extract to regexp_substr in Snowflake

I have some Hive expressions that I need to convert to Snowflake:
(regexp_extract(subtransactionxml,'(.*?)()',1) in('REFUND'))
I tried this one, but it returns 0 results:
(regexp_substr(subtransactionxml,'\W+(\W+)',1,1,'e',1) in('REFUND'))
Where is my mistake?

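Based on your sample, the following query should work. The 'e' parameter tells regexp_substr to extract a sub-match, and the final argument 1 selects capture group 1, so only the text between <type> and </type> is returned: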
with xmldata as (
select '<transactionDate>2019-07-26T14:06:05.575-04:00</transactionDate> <type>CANCEL</type>' subtransactionxml )
select regexp_substr(subtransactionxml,'<type>(.*)<\/type>',1,1,'e',1) in ('CANCEL')
from xmldata;
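What the 'e' parameter and group number 1 do can be seen in this Python analogue. It is a sketch using Python's re module, not Snowflake. The match covers the whole <type>...</type> element, but only capture group 1 (the text between the tags) is returned. `extract_type` is a hypothetical helper name.

```python
import re

subtransactionxml = (
    "<transactionDate>2019-07-26T14:06:05.575-04:00</transactionDate> <type>CANCEL</type>"
)

def extract_type(xml_text):
    # Rough analogue of regexp_substr(xml, '<type>(.*)</type>', 1, 1, 'e', 1).
    match = re.search(r"<type>(.*)</type>", xml_text)
    # Return capture group 1 only; None plays the role of SQL NULL.
    return match.group(1) if match else None

print(extract_type(subtransactionxml))                 # CANCEL
print(extract_type(subtransactionxml) in ("CANCEL",))  # True
```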

Parsing within a field using SQL

We receive data in one column that needs further parsing. In this example, the pairs are separated by ~, and each parameter is separated from its value by a comma.
The goal is to extract the PASS or FAIL value from each parameter's pair.
SL | Data
1  | "PARAM-0040,PASS~PARAM-0045,PASS~PARAM-0070,PASS"
2  | "PARAM-0040,FAIL~PARAM-0045,FAIL~PARAM-0070,PASS"
Required outcome:
SL | PARAM-0040 | PARAM-0045 | PARAM-0070
1  | PASS       | PASS       | PASS
2  | FAIL       | FAIL       | PASS
This will be part of a larger SQL query that selects many other columns. These three columns also have to be derived from the source column and returned as selected columns.
E.g.
Select Column1, Column2, [ Parse code ] as PARAM-0040, [ Parse code ] as PARAM-0045, [ Parse code ] as PARAM-0070, Column6 .....
Thanks

You can do this with a regular expression, but regular-expression functions are not standardized across SQL dialects.
This is how it is done in PostgreSQL, with REGEXP_MATCHES():
https://www.postgresqltutorial.com/postgresql-regexp_matches/
In PostgreSQL, regexp_matches returns zero or more rows, each a text array of the captured groups. The result therefore still has to be unpacked, which is why it is displayed in braces, e.g. {PASS}.
A simpler way, also in PostgreSQL, is to use substring with a pattern:
substring('foobar' from 'o(.)b')
For example:
select substring('PARAM-0040,PASS~PARAM-0045,PASS~PARAM-0070,PASS' from 'PARAM-0040,([^~]+)~');
substring
-----------
PASS
(1 row)
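The same per-parameter extraction can be sketched in Python to show how three columns could be derived from one string. `param_value` is a hypothetical helper, and its pattern differs from the SQL above in two ways. First, it is anchored to the start of a pair with (?:^|~), so one parameter name cannot match inside another. Second, it does not require a trailing ~: the last pair (PARAM-0070 here) is not followed by one, so 'PARAM-0070,([^~]+)~' would not match it.

```python
import re

rows = [
    (1, "PARAM-0040,PASS~PARAM-0045,PASS~PARAM-0070,PASS"),
    (2, "PARAM-0040,FAIL~PARAM-0045,FAIL~PARAM-0070,PASS"),
]
PARAMS = ("PARAM-0040", "PARAM-0045", "PARAM-0070")

def param_value(data, param):
    # A pair starts at the beginning of the string or right after '~'.
    # Capture everything after "<param>," up to the next '~' (or the end).
    match = re.search(r"(?:^|~)" + re.escape(param) + r",([^~]+)", data)
    return match.group(1) if match else None  # None plays the role of SQL NULL

for sl, data in rows:
    print(sl, [param_value(data, p) for p in PARAMS])
# 1 ['PASS', 'PASS', 'PASS']
# 2 ['FAIL', 'FAIL', 'PASS']
```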

If you are using Hive or Spark SQL, you can use the str_to_map function to split your data and then look up each parameter's value. It first splits the string into parameter/value pairs on ~, and then splits each pair into a parameter and a value on ,.
Reproducible example with your sample data:
WITH my_table AS (
SELECT 1 as SL, "PARAM-0040,PASS~PARAM-0045,PASS~PARAM-0070,PASS" as DATA
UNION ALL
SELECT 2 as SL, "PARAM-0040,FAIL~PARAM-0045,FAIL~PARAM-0070,PASS" as DATA
),
param_mapped_data AS (
SELECT SL, str_to_map(DATA,"~",",") param_map FROM my_table
)
SELECT
SL,
param_map['PARAM-0040'] AS PARAM0040,
param_map['PARAM-0045'] AS PARAM0045,
param_map['PARAM-0070'] AS PARAM0070
FROM
param_mapped_data
The actual query, assuming your table is named my_table:
WITH param_mapped_data AS (
SELECT SL, str_to_map(DATA,"~",",") param_map FROM my_table
)
SELECT
SL,
param_map['PARAM-0040'] AS PARAM0040,
param_map['PARAM-0045'] AS PARAM0045,
param_map['PARAM-0070'] AS PARAM0070
FROM
param_mapped_data
Output:
sl | param0040 | param0045 | param0070
1  | PASS      | PASS      | PASS
2  | FAIL      | FAIL      | PASS
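For intuition, here is a rough Python analogue of what str_to_map does with this data. It is a sketch, not Hive or Spark code. `str_to_map` below is a hypothetical re-implementation that uses literal delimiters, whereas the SQL function treats its delimiters as regular expressions; that difference is harmless for ~ and ,.

```python
def str_to_map(text, pair_delim="~", kv_delim=","):
    # Hypothetical analogue of str_to_map(text, '~', ','): split into pairs on
    # pair_delim, then split each pair into key and value at the first kv_delim.
    result = {}
    for pair in text.split(pair_delim):
        key, sep, value = pair.partition(kv_delim)
        result[key] = value if sep else None  # a pair without a value maps to None
    return result

rows = [
    (1, "PARAM-0040,PASS~PARAM-0045,PASS~PARAM-0070,PASS"),
    (2, "PARAM-0040,FAIL~PARAM-0045,FAIL~PARAM-0070,PASS"),
]

for sl, data in rows:
    param_map = str_to_map(data)
    # .get returns None for a missing key, similar to a NULL map lookup in SQL.
    print(sl, param_map.get("PARAM-0040"), param_map.get("PARAM-0045"), param_map.get("PARAM-0070"))
```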

Hive: splitting a column on the ':' delimiter

I have a column A with values such as W:X:Y:Z, and I want to extract Z from it.
I tried several commands, such as SPLIT(Table.A, "[:]"[3] ), but I get an error.
What is the best way to do this?

split returns an array, so the array index [3] has to be applied to the result of split, not to the pattern argument. Hive array indexes are zero-based, so [3] is the fourth element:
with yourtable as ( -- use your table instead of this
select 'W:X:Y:Z' as A
)
select split(A,'\\:')[3] from yourtable;
Result:
Z
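The point about applying the index to split's result, rather than to the pattern, can be illustrated in Python. This is a sketch, not Hive. `split_at` is a hypothetical helper that uses re.split (Hive's split also takes a regular expression) and a zero-based index, and it returns None for an index past the end.

```python
import re

def split_at(text, pattern, index):
    # Analogue of split(text, pattern)[index]: split first, then index the array.
    parts = re.split(pattern, text)
    # Return None (treated here like SQL NULL) for an out-of-range index.
    return parts[index] if 0 <= index < len(parts) else None

print(split_at("W:X:Y:Z", r"\:", 3))  # Z
```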
