Conditional explode in Presto/Spark SQL

I have a table with this structure:
input
id, email_MD5, email_SHA1, idType
d1, md1, sh1, type1
d2, null, sh2, type2
I need to transform the table into source and destination relations according to the following logic:
If exactly one of the email_MD5 and email_SHA1 fields is null, the row is converted to a single id -> email relation (using the non-null email) with the original type.
If both emails are non-null, the row is converted to 3 relations: id -> email_MD5, id -> email_SHA1, and the relation between the emails, email_MD5 -> email_SHA1, with the hardcoded type email.
output
src, dst, idType
d1, md1, type1
d1, sh1, type1
md1, sh1, email
d2, sh2, type2
How can I do this in Presto and Spark SQL?

I guess this is possible purely by UNIONing all the possible combinations:
SELECT id AS src
     , email_SHA1 AS dst
     , idType
FROM input
WHERE email_SHA1 IS NOT NULL
UNION
SELECT id AS src
     , email_MD5 AS dst
     , idType
FROM input
WHERE email_MD5 IS NOT NULL
UNION
SELECT email_MD5 AS src
     , email_SHA1 AS dst
     , 'email' AS idType
FROM input
WHERE email_SHA1 IS NOT NULL AND email_MD5 IS NOT NULL
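For comparison, here is a minimal sketch of an explode-style alternative, assuming Spark SQL 2.4+ (for the filter higher-order function) and the table and column names from the question; Presto has an analogous form using CROSS JOIN UNNEST over a filtered ARRAY of named ROWs:
-- Build all three candidate relations per row as an array of structs,
-- drop the ones containing nulls, then explode into columns with inline().
SELECT inline(
  filter(
    array(
      named_struct('src', id,        'dst', email_MD5,  'idType', idType),
      named_struct('src', id,        'dst', email_SHA1, 'idType', idType),
      named_struct('src', email_MD5, 'dst', email_SHA1, 'idType', 'email')
    ),
    x -> x.src IS NOT NULL AND x.dst IS NOT NULL
  )
)
FROM input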

Related

How to convert multiple records into single row using T-SQL PIVOT?

I have a table that stores data for each type of record. I want to convert multiple rows into a single row using the PIVOT function.
Input table:
Expected output:
I was able to PIVOT the table using the CHAR_TYPE field, but I am not getting the respective dates for the E_DATE and CO_DATE fields.
My code:
SELECT * FROM
(
  SELECT ACCT, CHAR_TYPE, EFF_DATE, ADHOC_CHAR_VAL
  FROM ACCT
)
PIVOT
(
  MIN(ADHOC_CHAR_VAL) FOR CHAR_TYPE IN ('C' AS C, 'M' AS M, 'E' AS E, 'CO' AS CO)
)
WHERE (C IS NOT NULL OR M IS NOT NULL OR E IS NOT NULL OR CO IS NOT NULL)
This is giving me output as:
Please assist with the query to get the data in the expected output format.
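If this is Oracle's PIVOT syntax (the 'C' AS C aliases suggest it), a likely cause is that EFF_DATE stays a plain column in the subquery, so it acts as an implicit grouping column and keeps the rows apart. An untested sketch of a fix is to pivot EFF_DATE alongside the value:
SELECT * FROM
(
  SELECT ACCT, CHAR_TYPE, EFF_DATE, ADHOC_CHAR_VAL
  FROM ACCT
)
PIVOT
(
  MIN(ADHOC_CHAR_VAL) AS val,
  MIN(EFF_DATE) AS dt
  FOR CHAR_TYPE IN ('C' AS C, 'M' AS M, 'E' AS E, 'CO' AS CO)
)
-- yields paired columns C_VAL, C_DT, M_VAL, M_DT, E_VAL, E_DT, CO_VAL, CO_DT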

Hierarchy with parent and nested child id in BigQuery

I have a table in BigQuery with the following schema:
Name STRING NULLABLE
Parent_id STRING NULLABLE
Child_ids STRING REPEATED
The table is filled with the following rows:
Name  Parent_id  Child_ids
A     1          [2,3]
B     2          [4]
C     3          null
D     4          null
I would like to make a query which returns not only the child_ids but also their names, i.e.:
Name  Parent_id  Child_info
A     1          [(2,B),(3,C)]
B     2          [(4,D)]
C     3          null
D     4          null
Do you have any idea?
Consider the approach below:
select * except(Child_ids),
  array(
    select as struct id, Name
    from t.Child_ids id
    join your_table
    on id = Parent_id
  ) Child_info
from your_table t;
If applied to the sample data in your question, the output matches the expected result shown above.

SQL UNION of 2 tables where one table has a column as null

For the given example, I am planning to do a union of 2 tables:
table A
country_code | arrg_id | arrg_desc
sg           | test1   | est_desc
..           | ..      | ..
table B
country_code | arrg_desc
sg           | test_2
Given the example above, I would like to union both of these 2 tables. I came up with this query:
select 'Retail' as Business, country_code as country, arrg_id as arrg_id from table_A
union
select 'Retail' as Business, country_code as country, 'NA' as arrg_id from table_B
When I ran the query above, I received this error: (HY000, None) AnalysisException: Incompatible return types 'DOUBLE' and 'STRING'
However, if I cast it as a string, it works:
select 'Retail' as Business, country_code as country, cast(arrg_id as String) as arrg_id from table_A
union
select 'Retail' as Business, country_code as country, 'NA' as arrg_id from table_B
I was wondering if this is the best approach or whether it would disturb the integrity of the data; I would appreciate some advice on this.
The rule of thumb for taking a union between two (or more) tables is that the types of columns in both halves of the union should always be the same. Presumably here, the arrg_id in the A table is numeric. This means that if you want to union this value with the string literal 'NA', then you would either need the cast in your example above, or, you would need to union arrg_id with another numeric value (e.g. perhaps -1). So, the query you suggested with the cast to string is correct.
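For completeness, here is a sketch of the numeric-sentinel alternative mentioned above (this assumes arrg_id in table_A is numeric; -1 is an arbitrary placeholder for "not applicable"):
-- keeps arrg_id numeric, so no cast is needed in the first branch
select 'Retail' as Business, country_code as country, arrg_id from table_A
union
select 'Retail' as Business, country_code as country, -1 as arrg_id from table_B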

SQL grouping by distinct values in a multi-value string column

I want to perform a group-by based on the distinct values in a string column that has multiple values.
The said column has a list of strings in a standard format separated by commas. The potential values are only a,b,c,d.
For example the column collection (type: String) contains:
Row 1: ["a","b"]
Row 2: ["b","c"]
Row 3: ["b","c","a"]
Row 4: ["d"]
The expected output is a count of unique values:
collection | count
a | 2
b | 3
c | 2
d | 1
For all of the below, I used this table:
create table tmp (
  id INT auto_increment,
  test VARCHAR(255),
  PRIMARY KEY (id)
);
insert into tmp (test) values
  ("a,b"),
  ("b,c"),
  ("b,c,a"),
  ("d");
If the possible values are only a,b,c,d, you can try one of these.
Take note that the first one only works if you do not have overly similar values like test and test_new, because then test would also be joined with all the test_new rows and the counts would not match:
select collection, COUNT(*) as count from tmp JOIN (
  select CONCAT("%", tb.collection, "%") as like_collection, collection from (
    select "a" COLLATE utf8_general_ci as collection
    union select "b" COLLATE utf8_general_ci as collection
    union select "c" COLLATE utf8_general_ci as collection
    union select "d" COLLATE utf8_general_ci as collection
  ) tb
) tb1
ON tmp.test LIKE tb1.like_collection
GROUP BY tb1.collection;
This will give you the result you want:
collection | count
a | 2
b | 3
c | 2
d | 1
Or you can try this one:
SELECT
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%a%') as a_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%b%') as b_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%c%') as c_count,
(SELECT COUNT(*) FROM tmp WHERE test LIKE '%d%') as d_count
;
The result would be like this
a_count | b_count | c_count | d_count
2 | 3 | 2 | 1
What you need to do is first explode the collection column into separate rows (like a flatMap operation). In Redshift the only way to generate new rows is to JOIN, so let's CROSS JOIN your input table with a static table of consecutive numbers and take only the rows whose number is less than or equal to the number of elements in the collection. Then we use the SPLIT_PART function to read the item at the correct index. Once we have the exploded table, we do a simple GROUP BY.
If your items are stored as JSON array strings ('["a", "b", "c"]'), then you can use JSON_ARRAY_LENGTH and JSON_EXTRACT_ARRAY_ELEMENT_TEXT instead of REGEXP_COUNT and SPLIT_PART, respectively.
with
index as (
  select 1 as i
  union all select 2
  union all select 3
  union all select 4 -- could be substituted with 'select row_number() over () as i from arbitrary_table limit 4'
),
agg as (
  select 'a,b' as collection
  union all select 'b,c'
  union all select 'b,c,a'
  union all select 'd'
)
select
  split_part(collection, ',', i) as item,
  count(*)
from index, agg
where regexp_count(agg.collection, ',') + 1 >= index.i -- only take rows where the index is within the number of items
group by 1
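For illustration, here is a sketch of the JSON variant mentioned above on the same sample data (note that JSON_EXTRACT_ARRAY_ELEMENT_TEXT uses zero-based indexing, hence the i - 1):
with
index as (
  select 1 as i
  union all select 2
  union all select 3
  union all select 4
),
agg as (
  select '["a","b"]' as collection
  union all select '["b","c"]'
  union all select '["b","c","a"]'
  union all select '["d"]'
)
select
  json_extract_array_element_text(collection, i - 1) as item,
  count(*)
from index, agg
where json_array_length(agg.collection) >= index.i -- only indexes that exist in this array
group by 1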

How to update database column values in a single command (no CASE/SWITCH)?

Let's say we have a table named Swap-Table.
*Input Table*
ID NAME Type
------------------
1 name1 a
2 name2 b
3 name3 b
I want to write a single command to update the table. The output table would be:
**Output Table**
ID NAME Type
------------------
1 name1 b
2 name2 a
3 name3 a
Condition: No CASE/SWITCH
You may use a CTE or some sort of subquery to generate an update dictionary:
WITH upd_dict (type_from, type_to) AS (
  SELECT 'a', 'b'
  UNION
  SELECT 'b', 'a'
)
UPDATE table_name
SET type = ud.type_to
FROM upd_dict ud
WHERE ud.type_from = type
But CASE looks much more readable and understandable here, if you ask me.
I only offer this as a "cute" way to do this transformation, rather than anything I'd allow (or even recommend) in production code:
declare @t table (ID int not null, Name varchar(17) not null, Type varchar(3) not null)
insert into @t (ID, NAME, Type) values
  (1, 'name1', 'a'),
  (2, 'name2', 'b'),
  (3, 'name3', 'b')
-- 195 = ASCII('a') + ASCII('b'), so this maps 'a' to 'b' and 'b' to 'a'
update @t set Type = CHAR(195 - ASCII(Type))
select * from @t
Produces:
ID Name Type
----------- ----------------- ----
1 name1 b
2 name2 a
3 name3 a
(Different database products may have different ways to transform to/from ASCII codes, and different syntax for table variables.)
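For instance, a PostgreSQL rendering of the same trick might look like this (swap_table is a hypothetical name; it assumes Type holds only 'a' or 'b'):
-- chr()/ascii() are Postgres's equivalents of T-SQL's CHAR()/ASCII()
update swap_table set Type = chr(195 - ascii(Type));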