how to convert jsonarray to multi column from hive - sql

example:
there is a json array column(type:string) from a hive table like:
"[{"filed":"name", "value":"alice"}, {"filed":"age", "value":"14"}......]"
how to convert it into :
name age
alice 14
by hive sql?
I've tried lateral view explode but it's not working.
thanks a lot!

This is working example of how it can be parsed in Hive. Customize it yourself and debug on real data, see comments in the code:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field_map['field'] = 'name' then field_map['value'] end) as name,
max(case when field_map['field'] = 'age' then field_map['value'] end) as age --do the same for all fields
from
(
select t.id,
t.str as original_string,
str_to_map(regexp_replace(regexp_replace(trim(a.field),', +',','),'\\{|\\}|"','')) field_map --remove extra characters and convert to map
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [], replace "}," with '}|" and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14
One more approach using get_json_object:
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)
select id,
max(case when field = 'name' then value end) as name,
max(case when field = 'age' then value end) as age --do the same for all fields
from
(
select t.id,
get_json_object(trim(a.field),'$.field') field,
get_json_object(trim(a.field),'$.value') value
from your_table t
lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [], replace "}," with '}|" and explode
) s
group by id --aggregate in single row
;
Result:
OK
id name age
1 alice 14

Related

How to concatenate a conditional field and remove the same value

I am trying to create a column with a case statement, then concatenate the column. Here is an example code.
WITH base AS (
SELECT ID, Date, Action, case when (Date is null then Action || '**' else Action End) Action_with_no_date
FROM <Table_Name>
)
SELECT ID, "array_join"("array_agg"(DISTINCT Action_with_no_date), ', ') Action_with_no_date
FROM base
GROUP BY ID;
Basically, the Action_with_no_date will display the concatenation of values in Action with '**' string added to the values where Date is null for each ID
After I did this, I found an edge case.
If there is the same Action (i.e. play) taken for one ID, and if one action has date and the other one doesn't, then the output will have one play and one play** for the ID
However, I want this to display just one play with **.
Below is the example data for ID = 1
ID Date Action
1 1/2/22 read
1 1/3/22 play
1 NULL play
and expected result for the ID
ID Action_with_no_date
1 read, play**
How should I handle this?
You can calculate ** suffix if there is any row with null per id and action using analytic max() with case expression. Then concatenate suffix with action.
Demo:
with mytable as (
SELECT * FROM (
VALUES
(1, '1/2/22', 'read'),
(1, '1/3/22', 'play'),
(1, NULL, 'play')
) AS t (id, date, action)
)
select id, array_join(array_agg(DISTINCT action||suffix), ', ')
from
(
select id, date, action,
max(case when date is null then '**' else '' end) over(partition by id, action) as suffix
from mytable
)s
group by id
Result:
1 play**, read

Find and replace pattern inside BigQuery string

Here is my BigQuery table. I am trying to find out the URLs that were displayed but not viewed.
create table dataset.url_visits(ID INT64 ,displayed_url string , viewed_url string);
select * from dataset.url_visits;
ID Displayed_URL Viewed_URL
1 url11,url12 url12
2 url9,url12,url13 url9
3 url1,url2,url3 NULL
In this example, I want to display
ID Displayed_URL Viewed_URL unviewed_URL
1 url11,url12 url12 url11
2 url9,url12,url13 url9 url12,url13
3 url1,url2,url3 NULL url1,url2,url3
Split the each string into an array and unnest them. Do a case to check if the items are in each other and combine to an array or a string.
Select ID, string_agg(viewing ) as viewed,
string_agg(not_viewing ) as not_viewed,
array_agg(viewing ignore nulls) as viewed_array
from (
Select ID ,
case when display in unnest(split(Viewed_URL)) then display else null end as viewing,
case when display in unnest(split(Viewed_URL)) then null else display end as not_viewing,
from (
Select 1 as ID, "url11,url12" as Displayed_URL, "url12" as Viewed_URL UNION ALL
Select 2, "url9,url12,url13", "url9" UNION ALL
Select 3, "url1,url2,url3", NULL UNION ALL
Select 4, "url9,url12,url13", "url9,url12"
),unnest(split(Displayed_URL)) as display
)
group by 1
Consider below approach
select *, (
select string_agg(url)
from unnest(split(Displayed_URL)) url
where url != ifnull(Viewed_URL, '')
) unviewed_URL
from `project.dataset.table`
if applied to sample data in your question - output is

Find way for gathering data and replace with values from another table

I am looking for an Oracle SQL query to find a specific pattern and replace them with values from another table.
Scenario:
Table 1:
No column1
-----------------------------------------
12345 user:12345;group:56789;group:6785;...
Note: field 1 may be has one or more pattern
Table2 :
Id name type
----------------------
12345 admin user
56789 testgroup group
Result must be the same
No column1
-----------------------------------
12345 user: admin;group:testgroup
Logic:
First split the concatenated string to individual rows using connect
by clause and regex.
Join the newly created table(split_tab) with Table2(tab2).
Use listagg function to concatenate data in the columns.
Query:
WITH tab1 AS
( SELECT '12345' NO
,'user:12345;group:56789;group:6785;' column1
FROM DUAL )
,tab2 AS
( SELECT 12345 id
,'admin' name
,'user' TYPE
FROM DUAL
UNION
SELECT 56789 id
,'testgroup' name
,'group' TYPE
FROM DUAL )
SELECT no
,listagg(category||':'||name,';') WITHIN GROUP (ORDER BY tab2.id) column1
FROM ( SELECT NO
,REGEXP_SUBSTR( column1, '(\d+)', 1, LEVEL ) id
,REGEXP_SUBSTR( column1, '([a-z]+)', 1, LEVEL ) CATEGORY
FROM tab1
CONNECT BY LEVEL <= regexp_count( column1, '\d+' ) ) split_tab
,tab2
WHERE split_tab.id = tab2.id
GROUP BY no
Output:
No Column1
12345 user:admin;group:testgroup
with t1 (no, col) as
(
-- start of test data
select 1, 'user:12345;group:56789;group:6785;' from dual union all
select 2, 'user:12345;group:56789;group:6785;' from dual
-- end of test data
)
-- the lookup table which has the substitute strings
-- nid : concatenation of name and id as in table t1 which requires the lookup
-- tname : required substitute for each nid
, t2 (id, name, type, nid, tname) as
(
select t.*, type || ':' || id, type || ':' || name from
(
select 12345 id, 'admin' name, 'user' type from dual union all
select 56789, 'testgroup', 'group' from dual
) t
)
--select * from t2;
-- cte table calculates the indexes for the substrings (eg, user:12345)
-- no : sequence no in t1
-- col : the input string in t1
-- si : starting index of each substring in the 'col' input string that needs attention later
-- ei : ending index of each substring in the 'col' input string
-- idx : the order of substring to put them together later
,cte (no, col, si, ei, idx) as
(
select no, col, 1, case when instr(col,';') = 0 then length(col)+1 else instr(col,';') end, 1 from t1 union all
select no, col, ei+1, case when instr(col,';', ei+1) = 0 then length(col)+1 else instr(col,';', ei+1) end, idx+1 from cte where ei + 1 <= length(col)
)
,coll(no, col, sstr, idx, newstr) as
(
select
a.no, a.col, a.sstr, a.idx,
-- when a substitute is not found in t2, use the same input substring (eg. group:6785)
case when t2.tname is null then a.sstr else t2.tname end
from
(select cte.*, substr(col, si, ei-si) as sstr from cte) a
-- we don't want to miss if there is no substitute available in t2 for a substring
left outer join
t2
on (a.sstr = t2.nid)
)
select no, col, listagg(newstr, ';') within group (order by no, col, idx) from coll
group by no, col;

extract the second found matching substring using Postgresql

I use the bellow query to extract a value from a column that stores JSON objects.
The issue with it, it does only pull the first value matching to the regex inside SUBSTRING which is -$4,000.00, is there's a parameter to pass to the SUBSTRING to pull the value -$1,990.00 as well in another column.
SELECT attribute_actions_text
, SUBSTRING(attribute_actions_text FROM '"Member [Dd]iscount:":"(.+?)"') AS column_1
, '' AS column_2
FROM (
VALUES
('[{"Member Discount:":"-$4,000.00"},{"Member discount:":"-$1,990.00"}]')
, (NULL)
) ls(attribute_actions_text)
Desired result :
column_1 column_2
-$4,000.00 -$1,990.00
Try this
WITH data(id,attribute_actions_text) as (
VALUES
(1,'[{"Member Discount:":"-$4,000.00"},{"Member Discount:":"-$1,990.00"}]')
, (2,'[{"Member Discount:":"-$4,200.00"},{"Member Discount:":"-$1,890.00"}]')
, (3,NULL)
), match as (
SELECT
id,
m,
ROW_NUMBER()
OVER (PARTITION BY id) AS r
FROM data, regexp_matches(data.attribute_actions_text, '"Member [Dd]iscount:":"(.+?)"', 'g') AS m
)
SELECT
id
,(select m from match where id = d.id AND r=1) as col1
,(select m from match where id = d.id AND r=2) as col2
FROM data d
Result
1,"{-$4,000.00}","{-$1,990.00}"
2,"{-$4,200.00}","{-$1,890.00}"
3,NULL,NULL

Splitting rows to columns in oracle

I have data in a table which looks like:
I want to split its data and make it look like the following through a sql query in Oracle (without using pivot):
How can it be done?? is there any other way of doing so without using pivot?
You need to use a pivot query here to get the output you want:
SELECT Name,
MIN(CASE WHEN ID_Type = 'PAN' THEN ID_No ELSE NULL END) AS PAN,
MIN(CASE WHEN ID_Type = 'DL' THEN ID_No ELSE NULL END) AS DL,
MIN(CASE WHEN ID_Type = 'Passport' THEN ID_No ELSE NULL END) AS Passport
FROM yourTable
GROUP BY Name
You could also try using Oracle's built in PIVOT() function if you are running version 11g or later.
Since you mention without using PIVOT function, you can try to
use SQL within group for moving rows onto one line and listagg to display multiple column values in a single column.
In Oracle 11g, we can use the listagg built-in function :
select
deptno,
listagg (ename, ',')
WITHIN GROUP
(ORDER BY ename) enames
FROM
emp
GROUP BY
deptno
Which should give you the below result:
DEPTNO ENAMES
------ --------------------------------------------------
10 CLARK,KING,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
You can find all the solution(s) to this problem here:
http://www.dba-oracle.com/t_converting_rows_columns.htm
For Oracle 11g and above, you could use PIVOT.
For pre-11g release, you could use MAX and CASE.
A common misconception, about PIVOT better in terms of performance than the old way of MAX and DECODE. But, under the hood PIVOT is same MAX + CASE. You can check it in 12c where Oracle added EXPAND_SQL_TEXT procedure to DBMS_UTILITY package.
For example,
SQL> variable c clob
SQL> begin
2 dbms_utility.expand_sql_text(Q'[with choice_tbl as (
3 select 'Jones' person,1 choice_nbr,'Yellow' color from dual union all
4 select 'Jones',2,'Green' from dual union all
5 select 'Jones',3,'Blue' from dual union all
6 select 'Smith',1,'Orange' from dual
7 )
8 select *
9 from choice_tbl
10 pivot(
11 max(color)
12 for choice_nbr in (1 choice_nbr1,2 choice_nbr2,3 choice_nbr3)
13 )]',:c);
14 end;
15 /
PL/SQL procedure successfully completed.
Now let's see what Oracle actually does internally:
SQL> set long 100000
SQL> print c
C
--------------------------------------------------------------------------------
SELECT "A1"."PERSON" "PERSON",
"A1"."CHOICE_NBR1" "CHOICE_NBR1",
"A1"."CHOICE_NBR2" "CHOICE_NBR2",
"A1"."CHOICE_NBR3" "CHOICE_NBR3"
FROM (
SELECT "A2"."PERSON" "PERSON",
MAX(CASE WHEN ("A2"."CHOICE_NBR"=1) THEN "A2"."COLOR" END ) "CHOICE_NBR1",
MAX(CASE WHEN ("A2"."CHOICE_NBR"=2) THEN "A2"."COLOR" END ) "CHOICE_NBR2",
MAX(CASE WHEN ("A2"."CHOICE_NBR"=3) THEN "A2"."COLOR" END ) "CHOICE_NBR3"
FROM (
(SELECT 'Jones' "PERSON",1 "CHOICE_NBR",'Yellow' "COLOR" FROM "SYS"."DUAL" "A7") UNION ALL
(SELECT 'Jones' "'JONES'",2 "2",'Green' "'GREEN'" FROM "SYS"."DUAL" "A6") UNION ALL
(SELECT 'Jones' "'JONES'",3 "3",'Blue' "'BLUE'" FROM "SYS"."DUAL" "A5") UNION ALL
(SELECT 'Smith' "'SMITH'",1 "1",'Orange' "'ORANGE'" FROM "SYS"."DUAL" "A4")
) "A2"
GROUP BY "A2"."PERSON"
) "A1"
SQL>
Oracle internally interprets the PIVOT as MAX + CASE.
You're able to create a non-pivot query by understanding what the pivot query will do:
select *
from yourTable
pivot
(
max (id_no)
for (id_type) in ('PAN' as pan, 'DL' as dl, 'Passport' as passport)
)
What the pivot does is GROUP BY all columns not specified inside the PIVOT clause (actually, just the name column), selecting new columns in a subquery fashion based on the aggregations before the FOR clause for each value specified inside the IN clause and discarding those columns specified inside the PIVOT clause.
When I say "subquery fashion" I'm refering to one way to achieve the result got with PIVOT. Actually, I don't know how this works behind the scenes. This subquery fashion would be like this:
select <aggregation>
from <yourTable>
where 1=1
and <FORclauseColumns> = <INclauseValue>
and <subqueryTableColumns> = <PIVOTgroupedColumns>
Now you identified how you can create a query without the PIVOT clause:
select
name,
(select max(id_no) from yourTable where name = t.name and id_type = 'PAN') as pan,
(select max(id_no) from yourTable where name = t.name and id_type = 'DL') as dl,
(select max(id_no) from yourTable where name = t.name and id_type = 'Passport') as passport
from yourTable t
group by name
You can use CTE's to break the data down and then join them back together to get what you want:
WITH NAMES AS (SELECT DISTINCT NAME
FROM YOURTABLE),
PAN AS (SELECT NAME, ID_NO AS PAN
FROM YOURTABLE
WHERE ID_TYPE = 'PAN'),
DL AS (SELECT NAME, ID_NO AS DL
FROM YOURTABLE
WHERE ID_TYPE = 'DL'),
PASSPORT AS (SELECT NAME, ID_NO AS "Passport"
FROM YOURTABLE
WHERE ID_TYPE = 'Passport')
SELECT n.NAME, p.PAN, d.DL, t."Passport"
FROM NAMES n
LEFT OUTER JOIN PAN p
ON p.NAME = n.NAME
LEFT OUTER JOIN DL d
ON d.NAME = p.NAME
LEFT OUTER JOIN PASSPORT t
ON t.NAME = p.NAME'
Replace YOURTABLE with the actual name of your table of interest.
Best of luck.