Hive: splitting a column on the : delimiter

I have a column A with values such as W:X:Y:Z, and I want to extract Z from it.
I tried several commands such as SPLIT(Table.A, "[:]"[3] ) but I get an error.
What is the best way to do this?

The split function returns an array, and Hive arrays are zero-indexed. The index [3] must be applied to the result of split(), not to the pattern argument:
with yourtable as ( -- use your table instead of this
select 'W:X:Y:Z' as A
)
select split(A,'\\:')[3] from yourtable;
Result:
Z
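If the number of parts can vary, a fixed index such as [3] no longer points at the last element. The following is a minimal sketch, assuming Hive and the same one-row sample table as above; the column aliases are made up for illustration. It shows two ways to take the last part regardless of how many there are.

```sql
-- Hive sketch: last element of a ':'-delimited string, whatever the number of parts
WITH yourtable AS ( -- use your table instead of this
  SELECT 'W:X:Y:Z' AS A
)
SELECT
  -- index the array with (number of elements - 1), since arrays are zero-indexed
  split(A, ':')[size(split(A, ':')) - 1] AS last_by_index,
  -- or capture the trailing run of non-':' characters
  regexp_extract(A, '([^:]+)$', 1)       AS last_by_regex
FROM yourtable;
```

Both expressions should return Z for the sample value.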

How to use string_split() with delimiter in databricks?

I am trying to use the string_split() function in Databricks to convert the dataframe below.
Source dataframe, stored as TempView in Databricks:
| ID | value |
| 1 | value-1,value-2,value-3 |
| 2 | value-1,value-4 |
Output needed:
| ID | value |
| 1 | value-1 |
| 1 | value-2 |
| 1 | value-3 |
| 2 | value-1 |
| 2 | value-4 |
I tried the code below:
%sql
SELECT ID, value
FROM TempView
CROSS APPLY STRING_SPLIT(value, ',')
GROUP BY cs.PERMID, value
but I am getting a parse exception.
There is no string_split function in Databricks SQL, but the split function does the same job. Note that its second argument is a regular expression, not a literal delimiter.
In your case it is easiest to combine split with explode, which turns each array element into its own row. Something like this:
SELECT ID, explode(split(value, ',')) AS value FROM TempView
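The same split-and-explode idea can also be written with LATERAL VIEW, which is handy when other columns or several exploded arrays are selected together. The following is a minimal sketch, assuming Spark SQL on Databricks; sample_view, exploded, and item are hypothetical names, and the inline rows mirror the sample data from the question.

```sql
-- Spark SQL sketch: one output row per comma-separated item
WITH sample_view AS (
  -- stand-in for the TempView from the question
  SELECT * FROM VALUES
    (1, 'value-1,value-2,value-3'),
    (2, 'value-1,value-4')
  AS t(ID, value)
)
SELECT ID, item AS value
FROM sample_view
LATERAL VIEW explode(split(value, ',')) exploded AS item;
```

Each ID should be repeated once per item in its list, so the sample data yields five rows.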

Parsing within a field using SQL

We receive data in a single column that needs further parsing. In this example the pairs are separated by ~, and the name and value within each pair are separated by a comma.
The goal is to grab the pass or fail value from its respective pair.
| SL | Data |
| 1 | "PARAM-0040,PASS~PARAM-0045,PASS~PARAM-0070,PASS" |
| 2 | "PARAM-0040,FAIL~PARAM-0045,FAIL~PARAM-0070,PASS" |
Required outcome:
| SL | PARAM-0040 | PARAM-0045 | PARAM-0070 |
| 1 | PASS | PASS | PASS |
| 2 | FAIL | FAIL | PASS |
This will be a part of a bigger SQL query where we are selecting many other columns, and these three columns are to be picked up from the source as well and passed in the query as selected columns.
E.g.
Select Column1, Column2, [ Parse code ] as PARAM-0040, [ Parse code ] as PARAM-0045, [ Parse code ] as PARAM-0070, Column6 .....
Thanks
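You can do that with a regular expression, but regular expression support is not standardized across SQL dialects.
In PostgreSQL you can use REGEXP_MATCHES():
https://www.postgresqltutorial.com/postgresql-regexp_matches/
In PostgreSQL, regexp_matches returns a set of zero or more rows, each a text array of the captured groups, so the result then has to be unpacked (hence the {} in its output).
A simpler way, also in PostgreSQL, is to use substring with a pattern, which returns the text captured by the first parenthesized group: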
substring('foobar' from 'o(.)b')
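Applied to the sample data (the pattern has no trailing ~, so the same form also works for the last pair in the string):
select substring('PARAM-0040,PASS~PARAM-0045,PASS~PARAM-0070,PASS' from 'PARAM-0040,([^~]+)');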
substring
-----------
PASS
(1 row)
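To get all three values as columns, the substring call can be repeated once per parameter. The following is a minimal sketch, assuming PostgreSQL; my_table, sl, and data are illustrative names, and the inline rows are the sample data from the question.

```sql
-- PostgreSQL sketch: one substring() call per parameter
WITH my_table(sl, data) AS (
  VALUES
    (1, 'PARAM-0040,PASS~PARAM-0045,PASS~PARAM-0070,PASS'::text),
    (2, 'PARAM-0040,FAIL~PARAM-0045,FAIL~PARAM-0070,PASS'::text)
)
SELECT
  sl,
  -- capture everything after "<name>," up to the next ~ or the end of the string
  substring(data from 'PARAM-0040,([^~]+)') AS "PARAM-0040",
  substring(data from 'PARAM-0045,([^~]+)') AS "PARAM-0045",
  substring(data from 'PARAM-0070,([^~]+)') AS "PARAM-0070"
FROM my_table
ORDER BY sl;
```

The column aliases are double-quoted because they contain a hyphen. If a parameter is missing from a row, substring returns NULL for that column.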
You may use the str_to_map function (available in Hive and Spark SQL) to split your data into a map and then extract each param's value by key. This example first splits the string into param/value pairs on ~ and then splits each pair into parameter and value on the comma.
Reproducible example with your sample data:
WITH my_table AS (
    SELECT 1 as SL, "PARAM-0040,PASS~PARAM-0045,PASS~PARAM-0070,PASS" as DATA
    UNION ALL
    SELECT 2 as SL, "PARAM-0040,FAIL~PARAM-0045,FAIL~PARAM-0070,PASS" as DATA
),
param_mapped_data AS (
    SELECT SL, str_to_map(DATA,"~",",") param_map FROM my_table
)
SELECT
    SL,
    param_map['PARAM-0040'] AS PARAM0040,
    param_map['PARAM-0045'] AS PARAM0045,
    param_map['PARAM-0070'] AS PARAM0070
FROM
    param_mapped_data
Actual code, assuming your table is named my_table:
WITH param_mapped_data AS (
    SELECT SL, str_to_map(DATA,"~",",") param_map FROM my_table
)
SELECT
    SL,
    param_map['PARAM-0040'] AS PARAM0040,
    param_map['PARAM-0045'] AS PARAM0045,
    param_map['PARAM-0070'] AS PARAM0070
FROM
    param_mapped_data
Outputs:
| sl | param0040 | param0045 | param0070 |
| 1 | PASS | PASS | PASS |
| 2 | FAIL | FAIL | PASS |

How to replace the first letter of a string using SQL and Postgres?

I have a database column of varchar(191) with strings in the database. We need to replace the first letter of every string with an "E". So for instance, we have:
Cuohvi-AQNqalPq8zdr1cOA
Needs to be changed to
Euohvi-AQNqalPq8zdr1cOA
Do you know how we can achieve this in Postgres with a SQL query? It needs to be updated for the whole table.
Per the docs, use overlay():
UPDATE the_table SET the_field = overlay(the_field placing 'E' from 1 for 1);
Alternatively, combine the CONCAT function with the RIGHT function. With a negative length, RIGHT returns everything except the first |n| characters, so RIGHT(s, -1) drops the first character.
SELECT CONCAT('E', RIGHT('Cuohvi-AQNqalPq8zdr1cOA', -1));
SELECT CONCAT('E', RIGHT(yourfield, -1))
FROM yourtable;
dbfiddle: https://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=e198b05f02283137afc39c24bb6c788d
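The two approaches above can be compared side by side before running an UPDATE. The following is a minimal sketch, assuming PostgreSQL; sample and s are illustrative names, and the third expression (concatenation with substr) is an additional variant that does not appear in the answers.

```sql
-- PostgreSQL sketch: three ways to replace the first character with 'E'
WITH sample(s) AS (
  VALUES ('Cuohvi-AQNqalPq8zdr1cOA'::text)
)
SELECT
  overlay(s placing 'E' from 1 for 1) AS via_overlay, -- overwrite 1 character at position 1
  concat('E', right(s, -1))           AS via_right,   -- 'E' plus all but the first character
  'E' || substr(s, 2)                 AS via_substr   -- 'E' plus everything from position 2
FROM sample;
```

All three columns should contain Euohvi-AQNqalPq8zdr1cOA for the sample value. Any of the expressions can be used on the right-hand side of UPDATE ... SET.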

SQL select specific (4th) part of column BLOB data, separated by specific pattern

There is a BLOB column that contains data like:
{{Property1 {property1_string}} {Property2 {property2_string}} {Property3 {property3_string}} {Property4 {property4_string}} {Property5 {property5_string}}}
I select the above column to display the BLOB data, as follows:
utl_raw.cast_to_varchar2(dbms_lob.substr(blobColumn))
I need to display only the data of 4th Property of BLOB column, so the following:
{Property4 {property4_string}}
So, I need help to create the necessary select for this purpose.
Thank you.
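The following should work. It converts the BLOB to a string with the same utl_raw.cast_to_varchar2(dbms_lob.substr(...)) expression used in the question, then takes the text from the 8th '{' through the 8th '}' inclusive, which delimit the 4th property:
select substr(s, instr(s, '{', 1, 8),
       instr(s, '}', 1, 8) - instr(s, '{', 1, 8) + 1)
from (select utl_raw.cast_to_varchar2(dbms_lob.substr(blobColumn)) as s
      from tablename);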
You may use REGEXP_SUBSTR.
select REGEXP_SUBSTR(s,'[^{} ]+', 1, 2 * :n) FROM t;
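Here :n is the position of the property string you want to extract. The pattern splits the data into tokens that contain no braces or spaces, and every second token is a property string, hence 2 * :n. This assumes the property strings themselves contain no spaces or braces.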
n = 1 gives property1_string
n = 2 gives property2_string
..
and so on
Note that s should be the output of utl_raw.cast_to_varchar2
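The REGEXP_SUBSTR call above returns only the inner property string. If the whole {PropertyN {...}} group is wanted, as in the question, a pattern that matches one complete group can be used instead. The following is a minimal sketch, assuming Oracle and that property names and strings contain no braces; the literal stands in for the output of utl_raw.cast_to_varchar2, and the alias property4 is made up for illustration.

```sql
-- Oracle sketch: 4th occurrence of a complete "{Name {value}}" group
SELECT REGEXP_SUBSTR(
         '{{Property1 {property1_string}} {Property2 {property2_string}} {Property3 {property3_string}} {Property4 {property4_string}} {Property5 {property5_string}}}',
         -- braces are written as bracket expressions so they are taken literally
         '[{][^{} ]+ [{][^{}]*[}][}]',
         1,  -- start searching at the first character
         4   -- return the 4th match
       ) AS property4
FROM dual;
```

For the sample data this should return {Property4 {property4_string}}.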

Get max on comma separated values in column

How can I get the maximum of the comma-separated values in the Original_Ids column, with the max value in one column and the remaining IDs in another?
| Original_Ids | Max_Id | Remaining_Ids |
| 123,534,243,345 | 534 | 123,243,345 |
Update:
What if I already have Max_Id and just need the equation below?
Remaining_Ids = Original_Ids - Max_Id
Thanks
Thanks to the excellent array handling in Postgres, this can be done relatively easily by converting the string to an array and from there to a set.
Regular queries on that set are then possible. With max() the maximum can be selected, and with EXCEPT ALL the maximum can be removed from the set.
The set can then be converted back to an array, and with array_to_string() the array can be converted to a delimited string again.
SELECT ids AS original_ids,
       (SELECT max(un.id::integer)
          FROM unnest(string_to_array(ids, ',')) un(id)) AS max_id,
       array_to_string(
         ARRAY((SELECT un.id::integer
                  FROM unnest(string_to_array(ids, ',')) un(id)
                EXCEPT ALL
                SELECT max(un.id::integer)
                  FROM unnest(string_to_array(ids, ',')) un(id))),
         ',') AS remaining_ids
  FROM elbat;
Another option would have been regexp_split_to_table(), which directly produces a set (or regexp_split_to_array(), but then we would have the possible regular expression overhead and would still have to convert the array to a set).
Nevertheless, you should (almost) never use delimited lists (nor arrays). Use a table; that's (almost) always the best option.
You can use a window function (https://www.postgresql.org/docs/current/static/tutorial-window.html) to get the max element per unnested array. After that you can reaggregate the elements and remove the calculated max value from the array.
Result:
| a | max_elem | remaining |
| 123,534,243,345 | 534 | 123,243,345 |
| 3,23,1 | 23 | 3,17 |
| 42 | 42 | |
| 56,123,234,345,345 | 345 | 56,123,234 |
This query needs only one split/unnest as well as only one max calculation.
SELECT
    a,
    max_elem,
    array_remove(array_agg(elements), max_elem) as remaining -- C
FROM (
    SELECT
        *,
        MAX(elements) OVER (PARTITION BY a) as max_elem -- B
    FROM (
        SELECT
            a,
            unnest((string_to_array(a, ','))::int[]) as elements -- A
        FROM arrays
    )s
)s
GROUP BY a, max_elem
A: string_to_array converts the string list into an array. Because the result is a text array, you need to cast it to an integer array by adding ::int[]. unnest() then expands the array elements into one row each.
B: The window function MAX returns the maximum value of each array as max_elem.
C: array_agg reaggregates the elements within each group defined by GROUP BY a, max_elem. After that, array_remove removes the max_elem value from the array (every occurrence of it, as the last result row shows).
If you would rather store the result as a string list again instead of an array, you could add array_to_string. I wouldn't recommend this, though, because your data are integers, not strings, and every further calculation would need the cast again. An even better way (as already stated by @stickybit) is not to store the elements as arrays at all but as unnested rows. As you can see, nearly every operation has to do the unnest first.
Note:
It would be better to use an ID to address the rows instead of the original string.
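For the case raised in the question's update, where only "original minus max" is needed as a delimited string, the pieces above can be combined without a GROUP BY. The following is a minimal sketch, assuming PostgreSQL 9.3 or later; the table arrays with an id column and the aliases t, m, and x are illustrative, and a lateral subquery replaces the window function.

```sql
-- PostgreSQL sketch: max and remaining ids per row, addressed by an id column
WITH arrays(id, a) AS (
  VALUES (1, '123,534,243,345'::text),
         (2, '42'::text)
)
SELECT
  t.id,
  t.a AS original_ids,
  m.max_id,
  -- array_remove drops every occurrence of max_id and keeps the original order
  array_to_string(
    array_remove(string_to_array(t.a, ',')::int[], m.max_id),
    ','
  ) AS remaining_ids
FROM arrays AS t
CROSS JOIN LATERAL (
  -- maximum of the unnested integers for this row only
  SELECT max(x) AS max_id
  FROM unnest(string_to_array(t.a, ',')::int[]) AS x
) AS m
ORDER BY t.id;
```

For the two sample rows this should give max_id 534 with remaining_ids 123,243,345, and max_id 42 with an empty remaining_ids string.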
If you install the extension intarray this is quite easy.
First you need to create the extension (you have to be superuser to do that):
create extension intarray;
Then you can do the following:
select original_ids,
       original_ids[1] as max_id,
       sort(original_ids - original_ids[1]) as remaining_ids
from (
    select sort_desc(string_to_array(original_ids,',')::int[]) as original_ids
    from bad_design
) t
But you shouldn't be storing comma-separated values to begin with.