I am new to Hive and attempting to run a query where one of the columns (col1) is described as type string and contains key-value pairs such as {color=blue, name=john, size=M}. I am trying to extract some of the values so I can do something like return all rows where col1 contains color=blue.
I've been trying to use get_json_object, but I don't think that's the right approach, as I'm not sure the field is technically JSON.
I'm using Spark SQL, which is Hive compatible.
In case col1 is a string, this can be a solution:
val initDF = spark.sql("select '{color=blue, name=john, size=M}' as col1 union select '{color=red, name=jim, size=L}' as col1")
initDF.show(false)
It displays:
+-------------------------------+
|col1 |
+-------------------------------+
|{color=blue, name=john, size=M}|
|{color=red, name=jim, size=L} |
+-------------------------------+
And if you want to get only the rows where color=blue
initDF.where("col1 like '%color=blue%'").show(false)
Which shows the expected result:
+-------------------------------+
|col1 |
+-------------------------------+
|{color=blue, name=john, size=M}|
+-------------------------------+
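If you need an exact match rather than a LIKE pattern (which would also match strings such as color=blueish), one option is to parse the string into a map first. Here is a sketch using the built-in str_to_map function (assuming col1 always follows the {k=v, k=v} layout above; your_table is a placeholder for the real table):
select col1
from (
  select col1,
         str_to_map(regexp_replace(col1, '[{}]', ''), ', ', '=') as m
  from your_table
) t
where m['color'] = 'blue';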
In case col1 is a struct:
val initDFStruct = spark.sql("select 'blue' as color, 'john' as name, 'M' as size union select 'red' as color, 'jim' as name, 'L' as size")
  .selectExpr("struct(color, name, size) as col1")
initDFStruct.show(false)
It displays:
+---------------+
|col1 |
+---------------+
|[red, jim, L] |
|[blue, john, M]|
+---------------+
initDFStruct.where("col1.color = 'blue'").show(false)
Which shows the desired result:
+---------------+
|col1 |
+---------------+
|[blue, john, M]|
+---------------+
In summary, if you have it as a string column, you can use in your where clause:
where col1 like '%color=blue%'
while if you have it as a struct, your where clause should be:
col1.color = 'blue'
You can convert your string to a map: remove the curly braces and the spaces after the commas, then use the str_to_map function. Example for Hive:
with your_data as
(
select '{color=blue, name=john, size=M}' str
)
select str as original_string,
m['color'] as color,
m['name'] as name,
m['size'] as size
from
(
select str, str_to_map(regexp_replace(regexp_replace(str,'\\{|\\}',''),', *',','),',','=') m
from your_data --use your table
)s;
Result:
original_string                  color  name  size
{color=blue, name=john, size=M}  blue   john  M
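Once the string is parsed into a map, the original requirement (return all rows where col1 contains color=blue) becomes a simple filter on the map; a sketch along the same lines:
select str
from
(
  select str, str_to_map(regexp_replace(regexp_replace(str,'\\{|\\}',''),', *',','),',','=') m
  from your_data --use your table
) s
where m['color'] = 'blue';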
I want to use the LIKE clause to search all columns in a row,
something like
SELECT * FROM test WHERE '%something%' IN *
Also, I don't know the exact columns that I have, which is why I need a wildcard (*).
Is there a way to do that with Snowflake / SQL?
You may consider using array_construct and array_contains:
CREATE or REPLACE TABLE test ( id number, v varchar, z varchar )
as SELECT * FROM VALUES
(1, 'Gokhan', 'Aylin'),
(2, 'Joe', 'Black');
SELECT *, ARRAY_CONSTRUCT( * ) combined
FROM test where ARRAY_CONTAINS( 'Gokhan'::variant, combined );
You can also convert this array to varchar to search partly matching strings:
SELECT *, ARRAY_CONSTRUCT( * ) combined
FROM test
WHERE combined::VARCHAR LIKE '%Go%';
+----+--------+-------+--------------------------+
| ID | V      | Z     | COMBINED                 |
+----+--------+-------+--------------------------+
| 1  | Gokhan | Aylin | [ 1, "Gokhan", "Aylin" ] |
+----+--------+-------+--------------------------+
If you want to search for 'something' in all columns, you can try concatenating all columns in the WHERE clause:
SELECT * FROM TABLE WHERE CONCAT(column1, column2, column3) LIKE '%something%'
Remember to cast any non-string column to a string type.
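For example, with the test table created above (a sketch; id is the only non-string column there):
SELECT *
FROM test
WHERE CONCAT(id::varchar, v, z) LIKE '%something%';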
You have to tweak the SQL a little; CONCAT takes all data types:
select a.* from (
select *, concat(*) as all_col_data from snowflake.schema.table_name
) as a
where a.all_col_data like '%something%'
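One caveat with this approach: in Snowflake, CONCAT returns NULL if any argument is NULL, so rows with a NULL column would never match. If that matters, wrap nullable columns in COALESCE (a sketch against the test table above):
SELECT *
FROM test
WHERE CONCAT(id::varchar, COALESCE(v, ''), COALESCE(z, '')) LIKE '%something%';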
I have table that looks like:
|ID | String
|546 | 1,2,1,5,7,8
|486 | 2,4,8,1,5,1
|465 | 18,11,20,1,4,18,11
|484 | 11,10,11,12,50,11
I want to split the string into this:
|ID | String
|546 | 1,2
|546 | 1,5
|486 | 1,5,1
|486 | 1
|465 | 1,4
My goal is to show the ID and, for each occurrence of 1 in the string, just the next number after it.
I filtered out all rows without '1,' in them, but I don't know how to continue.
If you use SQL Server 2016+, you may try a JSON-based approach: transform the data into a valid JSON array and parse it with OPENJSON(). Note that STRING_SPLIT() is not an option here because, as mentioned in the documentation, the output rows might be in any order and the order is not guaranteed to match the order of the substrings in the input string.
Table:
CREATE TABLE Data (
ID int,
[String] varchar(100)
)
INSERT INTO Data
(ID, [String])
VALUES
(546, '1,2,1,5,7,8'),
(486, '2,4,8,1,5,1'),
(465, '18,11,20,1,4,18,11'),
(484, '11,10,11,12,50,11')
Statement:
SELECT
ID,
CONCAT(FirstValue, ',', SecondValue) AS [String]
FROM (
SELECT
d.ID,
j.[value] As FirstValue,
LEAD(j.[value]) OVER (PARTITION BY d.ID ORDER BY CONVERT(int, j.[key])) AS SecondValue
FROM Data d
CROSS APPLY OPENJSON(CONCAT('[', d.[String], ']')) j
) t
WHERE t.FirstValue = '1'
Result:
----------
ID String
----------
465 1,4
486 1,5
486 1,
546 1,2
546 1,5
Something like:
SELECT ID, S.value
FROM Data
CROSS APPLY STRING_SPLIT(REPLACE(',' + String, ',1,', '#1,'), '#') AS S
WHERE value LIKE '1,%'
?
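To see what the REPLACE trick does, trace it on a single row (illustrative values only):
SELECT REPLACE(',' + '1,2,1,5,7,8', ',1,', '#1,');
-- => '#1,2#1,5,7,8'
-- STRING_SPLIT on '#' then yields '', '1,2' and '1,5,7,8',
-- and the LIKE '1,%' filter keeps '1,2' and '1,5,7,8'.
Note that each kept value runs until the next ',1,' occurrence (or the end of the string), so it may contain more than just the single next number.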
I have a table with a column like
[{"key":"e","value":["253","203","204"]},{"key":"st","value":["mi"]},{"key":"k2","value":["1","2"]}]
Which is of the format array<struct<key:string,value:array<string>>>
I want to convert the column into below format :
{"e":["253","203","204"],"st":["mi"],"k2":["1","2"]}
which is of the type map<string,array<string>>
I have tried exploding the array, but that does not work. Any ideas how I can do this in Hive?
Without the use of external libraries it's impossible. Please refer to Brickhouse or create your own UDAF.
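For example, with Brickhouse's collect UDAF the aggregation becomes a single GROUP BY (a sketch only; the jar path is illustrative, and test_table is the reproduction table created in the next answer):
ADD JAR /path/to/brickhouse.jar;
CREATE TEMPORARY FUNCTION collect AS 'brickhouse.udf.collect.CollectUDAF';
SELECT id, collect(e.key, e.value) AS mapped -- map<string, array<string>>
FROM test_table LATERAL VIEW explode(input) t AS e
GROUP BY id;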
Note: the code below provides snippets to reproduce the problem and to solve the part that Hive's built-in functions can solve, i.e. producing map<string,string> rather than map<string,array<string>>.
-- reproducing the problem
CREATE TABLE test_table(id INT, input ARRAY<STRUCT<key:STRING,value:ARRAY<STRING>>>);
INSERT INTO TABLE test_table
SELECT
1 AS id,
ARRAY(
named_struct("key","e", "value", ARRAY("253","203","204")),
named_struct("key","st", "value", ARRAY("mi")),
named_struct("key","k2", "value", ARRAY("1", "2"))
) AS input;
SELECT id, input FROM test_table;
+-----+-------------------------------------------------------------------------------------------------------+--+
| id | input |
+-----+-------------------------------------------------------------------------------------------------------+--+
| 1 | [{"key":"e","value":["253","203","204"]},{"key":"st","value":["mi"]},{"key":"k2","value":["1","2"]}] |
+-----+-------------------------------------------------------------------------------------------------------+--+
With exploding and using STRUCT features, we can split the keys and values.
SELECT id, exploded_input.key, exploded_input.value
FROM (
SELECT id, exploded_input
FROM test_table LATERAL VIEW explode(input) d AS exploded_input
) x;
+-----+------+----------------------+--+
| id | key | value |
+-----+------+----------------------+--+
| 1 | e | ["253","203","204"] |
| 1 | st | ["mi"] |
| 1 | k2 | ["1","2"] |
+-----+------+----------------------+--+
The idea is to use your UDAF to "collect" a map while aggregating on id.
What Hive can solve with built in function is generating map<string,string> by converting rows to strings with a special delimiter, aggregate rows via another special delimiter and use str_to_map built-in function on the delimiters to generate map<string, string>.
SELECT
id,
str_to_map(
-- outputs: e:253,203,204#st:mi#k2:1,2 with delimiters between aggregated rows
concat_ws('#', collect_list(list_to_string)),
'#', -- first delimiter
':' -- second delimiter
) mapped_output
FROM (
SELECT
id,
-- outputs 3 rows: (e:253,203,204), (st:mi), (k2:1,2)
CONCAT(exploded_input.key,':' , CONCAT_WS(',', exploded_input.value)) as list_to_string
FROM (
SELECT id, exploded_input
FROM test_table LATERAL VIEW explode(input) d AS exploded_input
) x
) y
GROUP BY id;
Which outputs a string to string map like:
+-----+-------------------------------------------+--+
| id | mapped_output |
+-----+-------------------------------------------+--+
| 1 | {"e":"253,203,204","st":"mi","k2":"1,2"} |
+-----+-------------------------------------------+--+
with input_set as (
  select array(
           named_struct('key','e','value',array('253','203','204')),
           named_struct('key','st','value',array('mi')),
           named_struct('key','k2','value',array('1','2'))
         ) as input_array
), break_input_set as (
  select y.col_num as y_col_num, y.col_value as y_col_value
  from input_set
  lateral view posexplode(input_set.input_array) y as col_num, col_value
), create_map as (
  select map(y_col_value.key, y_col_value.value) as final_map
  from break_input_set
)
select * from create_map;
// Plain JavaScript (outside Hive): build the object from the parsed array.
var arr = [{"key":"e","value":["253","203","204"]},{"key":"st","value":["mi"]},{"key":"k2","value":["1","2"]}];
var obj = {};
for (var i = 0; i < arr.length; i++) {
  obj[arr[i].key] = arr[i].value;
}
obj will be in the required format
I'm looking for a way to transpose or rotate a table in Oracle SQL. In this case there is only one row in the SELECT, but multiple columns.
Example:
SELECT
id AS "Id",
name AS "Name",
some_value AS "Favorite color"
FROM
table
WHERE
id = 5;
Result:
id | name | some_value
-- | ---- | ----------
5  | John | Orange
What I would like to see is:
Id | 5
Name | John
Favorite color | Orange
I'm aware of PIVOT, but I'm struggling to see how to write simple code with it for this case.
You can unpivot the columns to get this result as follows:
select fld, val
from (
select to_char(id) as "Id", -- convert all columns to same type
name as "Name",
some_value as "Favorite color"
from your_table
where id = 5
) unpivot(val for fld in("Id", "Name", "Favorite color"));
Use a simple UNION ALL clause:
SELECT 'Id' As field_name, cast( id as varchar2(100)) as Value FROM "TABLE" where id = 5
UNION ALL
SELECT 'Name' , name FROM "TABLE" where id = 5
UNION ALL
SELECT 'Favorite color' , some_value FROM "TABLE" where id = 5;
Frank Ockenfuss gave the answer I was looking for. Thanks, Frank!
However, a minor change makes changing the column names a bit easier:
SELECT * FROM (
SELECT
TO_CHAR(id) AS id,
TO_CHAR(name) AS name,
TO_CHAR(some_value) AS fav_color
FROM my_table
WHERE id = 5
) UNPIVOT(value FOR key IN(
id AS 'Id',
name AS 'Name',
fav_color AS 'Favorite color'
));
Result:
key            | value
---------------|-------
Id             | 5
Name           | John
Favorite color | Orange
I have a table where one column has an array - but stored in a text format:
mytable
id ids
-- -------
1 '[3,4]'
2 '[3,5]'
3 '[3]'
etc ...
I want to find all records that have the value 5 as an array element in the ids column.
I was trying to achieve this by using the "string to array" function and removing the [ symbols with the translate function, but couldn't find a way.
You can do this: http://www.sqlfiddle.com/#!1/5c148/12
select *
from tbl
where translate(ids, '[]','{}')::int[] && array[5];
Output:
| ID | IDS |
--------------
| 2 | [3,5] |
You can also use bool_or: http://www.sqlfiddle.com/#!1/5c148/11
with a as
(
select id, unnest(translate(ids, '[]','{}')::int[]) as elem
from tbl
)
select id
from a
group by id
having bool_or(elem = 5);
To see the original elements:
with a as
(
select id, unnest(translate(ids, '[]','{}')::int[]) as elem
from tbl
)
select id, '[' || array_to_string(array_agg(elem), ',') || ']' as ids
from a
group by id
having bool_or(elem = 5);
Output:
| ID | IDS |
--------------
| 2 | [3,5] |
PostgreSQL DDL is transactional; if it's not too late in your project, just convert your stringly-typed array column to a real array: http://www.sqlfiddle.com/#!1/6e18c/2
alter table tbl
add column id_array int[];
update tbl set id_array = translate(ids,'[]','{}')::int[];
alter table tbl drop column ids;
Query:
select *
from tbl
where id_array && array[5]
Output:
| ID | ID_ARRAY |
-----------------
| 2 | 3,5 |
You can also use contains operator: http://www.sqlfiddle.com/#!1/6e18c/6
select *
from tbl
where id_array #> array[5];
I prefer the && syntax though; it directly connotes intersection, reflecting that you are detecting whether there's an intersection between two sets (treating each array as a set).
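A quick illustration of the operator:
select array[1,2,3] && array[3,4]; -- true, the arrays share an element
select array[1,2,3] && array[5,6]; -- false, no common element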
http://www.postgresql.org/docs/8.2/static/functions-array.html
If you store the string representation of your arrays slightly differently, you can cast to array of integer directly:
INSERT INTO mytable
VALUES
(1, '{3,4}')
,(2, '{3,5}')
,(3, '{3}');
SELECT id, ids::int[]
FROM mytable;
Otherwise, you have to add one more step:
SELECT (translate(ids, '[]','{}'))::int[]
FROM mytable
I would consider making the column an array type to begin with.
Either way, you can find your row like this (shown here with the '{...}' form; substitute the translate() expression for the '[...]' form):
SELECT id, ids
FROM (
SELECT id, ids, unnest(ids::int[]) AS elem
FROM mytable
) x
WHERE elem = 5
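If the values are stored in the '{3,5}' form shown above, a simpler alternative that avoids the subquery is = ANY (a sketch):
SELECT id, ids
FROM mytable
WHERE 5 = ANY (ids::int[]);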