How to Create a Touchpoint Table from a list - sql

I am working on a SQL query in the Azure Databricks environment that has the following dataset:
CREATE OR REPLACE TABLE touchpoints_table
(
List STRING,
Path_Lenght INT
);
INSERT INTO touchpoints_table VALUES
('BBB, AAA, CCC', 3),
('BBB', 1),
('DDD, AAA', 2),
('DDD, BBB, AAA, EEE, CCC', 5),
('EEE, AAA, EEE, CCC', 4);
SELECT * FROM touchpoints_table
| | List | Path_length |
| 0 | BBB, AAA, CCC | 3 |
| 1 | CCC | 1 |
| 2 | DDD, AAA | 2 |
| 3 | DDD, BBB, AAA, EEE, CCC | 5 |
| 4 | EEE, AAA, EEE, CCC | 4 |
and the task consists of generating the following table:
| | Content | Unique | Started | Middleway | Finished |
| 0 | AAA | 0 | 0 | 3 | 1 |
| 1 | BBB | 0 | 1 | 1 | 0 |
| 2 | CCC | 1 | 0 | 0 | 3 |
| 3 | DDD | 0 | 2 | 0 | 0 |
| 4 | EEE | 0 | 1 | 2 | 0 |
where the columns contain the following:
Content: the elements found in the List
Unique: the number of times that the element appears alone in the list
Started: the number of times that the element appears at the beginning
Finished: the number of times that the element appears at the end
Middleway: the number of times the element appears between the beginning and the end.
Using the following query I almost get the result, but somehow the GROUP BY does not work correctly:
WITH tb1 AS(
SELECT
CAST(touch_array AS STRING) AS touch_list,
EXPLODE(touch_array) AS explode_list,
ROW_NUMBER()OVER(PARTITION BY CAST(touch_array AS STRING) ORDER BY (SELECT 1)) touch_count,
COUNT(*)OVER(PARTITION BY touch_array) touch_lenght
FROM (SELECT SPLIT(List, ',') AS touch_array FROM touchpoints_table)
)
SELECT
explode_list AS Content,
SUM(CASE WHEN touch_lenght=1 THEN 1 ELSE 0 END) AS Unique,
SUM(CASE WHEN touch_count=1 AND touch_lenght > 1 THEN 1 ELSE 0 END) AS Started,
SUM(CASE WHEN touch_count>1 AND touch_count < touch_lenght THEN 1 ELSE 0 END) AS Middleway,
SUM(CASE WHEN touch_count>1 AND touch_count = touch_lenght THEN 1 ELSE 0 END) AS Finished
FROM tb1
GROUP BY explode_list
ORDER BY explode_list
| | Content | Unique | Started | Middleway | Finished |
| 0 | AAA | 0 | 0 | 3 | 1 |
| 1 | BBB | 0 | 0 | 1 | 0 |
| 2 | CCC | 0 | 0 | 0 | 3 |
| 3 | EEE | 0 | 0 | 2 | 0 |
| 4 | BBB | 1 | 1 | 0 | 0 |
| 5 | DDD | 0 | 2 | 0 | 0 |
| 6 | EEE | 0 | 1 | 0 | 0 |
Could you help me by suggesting code that solves this task?

Query example for SQL Server:
with allElements as (
    select tp.list, t.el, t.elN, t.elQty
    from touchpoints_table tp
    cross apply (
        -- trim() removes the space that follows each comma
        -- note: string_split without the ordinal argument does not guarantee output order
        select trim(value) as el,
               row_number() over (order by (select 1)) as elN,
               count(*) over () as elQty
        from string_split(tp.list, ',')
    ) t
)
select el
    ,sum(case when elQty = 1 then 1 else 0 end) as 'unique'
    ,sum(case when elN = 1 and elQty > 1 then 1 else 0 end) as 'started'
    ,sum(case when elN > 1 and elN < elQty then 1 else 0 end) as 'middleway'
    ,sum(case when elN > 1 and elN = elQty then 1 else 0 end) as 'finished'
from allElements
group by el
order by el
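The classification rules from the question can also be checked outside the database. The sketch below is only an illustration in Python, not a Databricks query: it assumes the rows as displayed in the question's SELECT output (where the single-element row is 'CCC'), and the function name `touchpoint_counts` is made up for this example. It also shows why trimming matters: splitting on ',' alone leaves a leading space on every element after the first, so 'BBB' and ' BBB' end up in different groups, which appears consistent with the duplicated rows in the asker's output.

```python
from collections import defaultdict

# Rows as displayed in the question's SELECT output (assumption: 'CCC' is the single-element row).
paths = [
    "BBB, AAA, CCC",
    "CCC",
    "DDD, AAA",
    "DDD, BBB, AAA, EEE, CCC",
    "EEE, AAA, EEE, CCC",
]

# Splitting on ',' alone keeps the space after each comma.
untrimmed = "BBB, AAA".split(",")  # ['BBB', ' AAA']


def touchpoint_counts(paths):
    """Count how often each element is alone, first, in between, or last in a path."""
    counts = defaultdict(
        lambda: {"Unique": 0, "Started": 0, "Middleway": 0, "Finished": 0}
    )
    for path in paths:
        # strip() plays the role of trim()/ltrim() in the SQL answers
        elements = [e.strip() for e in path.split(",")]
        last = len(elements) - 1
        for position, element in enumerate(elements):
            if last == 0:
                role = "Unique"
            elif position == 0:
                role = "Started"
            elif position == last:
                role = "Finished"
            else:
                role = "Middleway"
            counts[element][role] += 1
    return dict(sorted(counts.items()))


result = touchpoint_counts(paths)
for content, row in result.items():
    print(content, row)
```

With these rows the counts agree with the target table in the question; a SQL version needs the same two ingredients, a trimmed element and its position within the list.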

Related

Group by Problem in Touchpoint Table Creation (SQL)

Here is a way to do this using SQL Server (the third argument of STRING_SPLIT, which returns the ordinal column, requires SQL Server 2022 or later, or Azure SQL Database).
with main_data
as (
    select list
        ,ltrim(x.value) as split_val
        ,x.ordinal
        ,case when x.ordinal=1 and tt.path_length=1 then
                'unique'
            when x.ordinal=1 then
                'start'
            -- every position strictly between the first and the last counts as middle
            when x.ordinal<tt.path_length then
                'middle'
            when x.ordinal=tt.path_length then
                'end'
        end as pos
    from touchpoints_table tt
    CROSS APPLY STRING_SPLIT(list,',',1) x
)
select split_val
    ,count(case when pos='unique' then 1 end) as unique_cnt
    ,count(case when pos='start' then 1 end) as start_cnt
    ,count(case when pos='middle' then 1 end) as middle_cnt
    ,count(case when pos='end' then 1 end) as end_cnt
from main_data
group by split_val
+-----------+------------+-----------+------------+---------+
| split_val | unique_cnt | start_cnt | middle_cnt | end_cnt |
+-----------+------------+-----------+------------+---------+
| AAA | 0 | 0 | 3 | 1 |
| BBB | 0 | 1 | 1 | 0 |
| CCC | 1 | 0 | 0 | 3 |
| DDD | 0 | 2 | 0 | 0 |
| EEE | 0 | 1 | 2 | 0 |
+-----------+------------+-----------+------------+---------+

Oracle Condition to Only Pull '0' Values

I need to pull IDs with only 0 values for both the A and B columns. An example:
+----+------+------+
| ID | A | B |
+----+------+------+
| 1 | null | 123 |
| 2 | 23 | 768 |
| 3 | 0 | 0 |
| 4 | 96 | 0 |
| 5 | 0 | null |
| 6 | 0 | 0 |
+----+------+------+
I have tried several queries, but I am still pulling through values above 0. As there are null values in the table, I have used the NVL(expr1,0) syntax to replace null with 0:
+----+------+------+
| ID | A | B |
+----+------+------+
| 1 | 0 | 123 |
| 2 | 23 | 768 |
| 3 | 0 | 0 |
| 4 | 96 | 0 |
| 5 | 0 | 0 |
| 6 | 0 | 0 |
+----+------+------+
I am using the following in my WHERE clause, and get the below results:
Where status = 'OPEN'
AND a.value IS NULL OR a.value = '0'
AND b.value IS NULL OR b.value = '0'
Output:
+----+----+-----+
| ID | A | B |
+----+----+-----+
| 1 | 0 | 123 |
| 3 | 0 | 0 |
| 5 | 0 | 0 |
| 6 | 0 | 0 |
+----+----+-----+
It seems as though I am pulling only 0 values for A, but I am still getting values above 0 for B. I need to pull only IDs with a value of 0 for both A and B.
I think you just need parentheses, because AND binds more tightly than OR:
Where status = 'OPEN' AND
(a.value IS NULL OR a.value = 0) AND
(b.value IS NULL OR b.value = 0)
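To see the effect of the parentheses, here is a small sketch that runs both predicates against the sample rows. It uses Python's built-in sqlite3 module as a stand-in for Oracle (an assumption made only so the example is runnable), a single hypothetical table `t` instead of the joined `a` and `b` tables, and it leaves out the status filter because the sample data has no status column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, None, 123), (2, 23, 768), (3, 0, 0), (4, 96, 0), (5, 0, None), (6, 0, 0)],
)

# Without parentheses this is read as:
#   a IS NULL OR (a = 0 AND b IS NULL) OR b = 0
without_parens = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM t WHERE a IS NULL OR a = 0 AND b IS NULL OR b = 0 ORDER BY id"
    )
]

# With parentheses each column is tested as a unit.
with_parens = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM t WHERE (a IS NULL OR a = 0) AND (b IS NULL OR b = 0) ORDER BY id"
    )
]

print(without_parens)
print(with_parens)
```

Only the parenthesized form restricts the result to rows where both columns are 0 or NULL; the other form lets through any row that satisfies just one of the OR branches.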
I like Gordon's answer, but I would use COALESCE here for brevity:
SELECT *
...
WHERE
status = 'OPEN' AND
COALESCE(a.value, 0) = 0 AND
COALESCE(b.value, 0) = 0;
We could also express this using a sum, provided neither value can be negative:
WHERE
status = 'OPEN' AND
COALESCE(a.value, 0) + COALESCE(b.value, 0) = 0;
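The sum form and the two-condition form agree only while the values cannot be negative. A minimal sketch of the difference, again using Python's sqlite3 as a stand-in for Oracle; the table `t` and the row with -5 and 5 are hypothetical and not part of the question's data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    # id 2 is a made-up row whose values cancel out
    [(1, 0, None), (2, -5, 5), (3, 0, 0), (4, 96, 0)],
)

both_zero = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM t WHERE COALESCE(a, 0) = 0 AND COALESCE(b, 0) = 0 ORDER BY id"
    )
]
sum_zero = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM t WHERE COALESCE(a, 0) + COALESCE(b, 0) = 0 ORDER BY id"
    )
]

print(both_zero)
print(sum_zero)
```

If negative values are possible, the version with two separate COALESCE comparisons is the safer choice.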

Presto SQL - Trying to pull data from multiple columns into one entry to find a unique, missing, or dupe entry

New to SQL/Presto here.
Feel free to point out the obvious if needed.
I have a subquery that pulls data into a table like the one below.
For each ItemID, 1 means that the tag is on and 0 means that it is off.
I am trying to write a query that returns each ItemID with its associated tag if it's unique, and otherwise points out whether there is more than one tag or the tag is missing.
Data_Table
| ItemID | TagA | TagB | TagC | TagD | TagE |
| 111 | 1 | 1 | 0 | 0 | 0 |
| 222 | 1 | 1 | 1 | 0 | 0 |
| 333 | 1 | 1 | 0 | 0 | 0 |
| 444 | 0 | 1 | 0 | 0 | 0 |
| 555 | 0 | 0 | 0 | 0 | 0 |
| 666 | 0 | 0 | 0 | 1 | 1 |
I tried a CASE WHEN expression that picks up each 1, and another CASE that tries to collapse the columns into a single entry per row.
SELECT Item_ID,
CASE WHEN (Tag_A+Tag_B+Tag_C+Tag_D+Tag_E > 1) THEN 'Dupe'
ELSE (CASE WHEN Tag_A = 1 THEN 'TagA_Present'
WHEN Tag_B = 1 THEN 'TagB_Present'
WHEN Tag_C = 1 THEN 'TagC_Present'
WHEN Tag_D = 1 THEN 'TagD_Present'
WHEN Tag_E = 1 THEN 'TagE_Present'
ELSE 'Missing_Tag' END)
END as ItemTag
FROM Data_Table
EDITED - I went too far with the sample data and initial query has been changed.
Actual Results
| ItemID | ItemTag |
| 111 | Dupe |
| 222 | TagA_Present |
| 333 | TagB_Present |
| 444 | TagB_Present |
| 555 | Missing |
| 666 | TagD_Present |
ItemIDs 111, 222, 333, and 666 should all be 'Dupe', but the results seem to deem random ones unique.
Hmmm. I am thinking:
select t.itemId,
       (case when (TagA + TagB + TagC + TagD + TagE) > 1 then 'Dupe'
             when TagA = 1 then 'TagA'
             when TagB = 1 then 'TagB'
             when TagC = 1 then 'TagC'
             when TagD = 1 then 'TagD'
             when TagE = 1 then 'TagE'
             else 'Missing'
        end) as ItemTag
from Data_Table t;
There is no reason to use aggregation for this.
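As a quick check, the CASE expression can be run against the sample rows. This is a sketch using Python's sqlite3 module in place of Presto (an assumption made so the example is self-contained); the table and column names follow the Data_Table shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Data_Table "
    "(ItemID INTEGER, TagA INTEGER, TagB INTEGER, TagC INTEGER, TagD INTEGER, TagE INTEGER)"
)
conn.executemany(
    "INSERT INTO Data_Table VALUES (?, ?, ?, ?, ?, ?)",
    [
        (111, 1, 1, 0, 0, 0),
        (222, 1, 1, 1, 0, 0),
        (333, 1, 1, 0, 0, 0),
        (444, 0, 1, 0, 0, 0),
        (555, 0, 0, 0, 0, 0),
        (666, 0, 0, 0, 1, 1),
    ],
)

query = """
SELECT t.ItemID,
       CASE WHEN TagA + TagB + TagC + TagD + TagE > 1 THEN 'Dupe'
            WHEN TagA = 1 THEN 'TagA'
            WHEN TagB = 1 THEN 'TagB'
            WHEN TagC = 1 THEN 'TagC'
            WHEN TagD = 1 THEN 'TagD'
            WHEN TagE = 1 THEN 'TagE'
            ELSE 'Missing'
       END AS ItemTag
FROM Data_Table t
ORDER BY t.ItemID
"""

# Map each ItemID to its computed tag.
item_tags = dict(conn.execute(query).fetchall())
print(item_tags)
```

Because the sum is evaluated per row, every item with more than one tag set is labelled 'Dupe' before the single-tag branches are reached.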

SQL Server Pivot table for survey responses

I have a SQL Server table called surveys with the following data:
+------------+--------------+----+----+----+----+
| ModuleCode | SurveyNumber | Q1 | Q2 | Q3 | Q4 |
+------------+--------------+----+----+----+----+
| NME3519 | 1 | 5 | 4 | 5 | 3 |
| NME3519 | 2 | 3 | 3 | 2 | 1 |
| NME3519 | 3 | 4 | 3 | 2 | 1 |
| NME3520 | 1 | 4 | 3 | 2 | 1 |
| NME3519 | 4 | 4 | 2 | 2 | 1 |
+------------+--------------+----+----+----+----+
I'd like to be able to report on one module at a time and for the result to be something like this:
Count of scores
+----------+---+---+---+---+---+
| Question | 1 | 2 | 3 | 4 | 5 |
+----------+---+---+---+---+---+
| Q1 | 0 | 0 | 1 | 2 | 1 |
| Q2 | 0 | 1 | 2 | 1 | 0 |
| Q3 | 0 | 3 | 0 | 0 | 1 |
| Q4 | 3 | 0 | 1 | 0 | 0 |
+----------+---+---+---+---+---+
I'm pretty sure from other examples I need to unpivot and then pivot but I can't get anywhere with my own data.
Many thanks
Richard
Unpivot and aggregate:
select v.question,
       sum(case when v.score = 1 then 1 else 0 end) as score_1,
       sum(case when v.score = 2 then 1 else 0 end) as score_2,
       sum(case when v.score = 3 then 1 else 0 end) as score_3,
       sum(case when v.score = 4 then 1 else 0 end) as score_4,
       sum(case when v.score = 5 then 1 else 0 end) as score_5
from surveys r cross apply
     ( values ('Q1', r.q1), ('Q2', r.q2), ('Q3', r.q3), ('Q4', r.q4)
     ) v(question, score)
where r.ModuleCode = 'NME3519'  -- report on one module at a time
group by v.question;
This version uses a lateral join (CROSS APPLY in SQL Server) for unpivoting. I find the syntax simpler and lateral joins more powerful. Why bother learning unpivot when something else does the same thing more concisely, more powerfully, and has the same performance?
As for the pivoting, it uses conditional aggregation. In my experience with SQL Server, this has pretty much the same performance as pivot.
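The unpivot-then-aggregate idea can be tried on the sample data with the sketch below. It runs on Python's sqlite3 module, which has no CROSS APPLY, so the unpivot step is swapped for UNION ALL; that substitution, the CTE name `unpivoted`, and the helper `score_counts` are assumptions of this example rather than part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE surveys "
    "(ModuleCode TEXT, SurveyNumber INTEGER, Q1 INTEGER, Q2 INTEGER, Q3 INTEGER, Q4 INTEGER)"
)
conn.executemany(
    "INSERT INTO surveys VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("NME3519", 1, 5, 4, 5, 3),
        ("NME3519", 2, 3, 3, 2, 1),
        ("NME3519", 3, 4, 3, 2, 1),
        ("NME3520", 1, 4, 3, 2, 1),
        ("NME3519", 4, 4, 2, 2, 1),
    ],
)

query = """
WITH unpivoted AS (
    -- one row per (survey, question): UNION ALL stands in for CROSS APPLY (VALUES ...)
    SELECT ModuleCode, 'Q1' AS question, Q1 AS score FROM surveys
    UNION ALL SELECT ModuleCode, 'Q2', Q2 FROM surveys
    UNION ALL SELECT ModuleCode, 'Q3', Q3 FROM surveys
    UNION ALL SELECT ModuleCode, 'Q4', Q4 FROM surveys
)
SELECT question,
       SUM(CASE WHEN score = 1 THEN 1 ELSE 0 END),
       SUM(CASE WHEN score = 2 THEN 1 ELSE 0 END),
       SUM(CASE WHEN score = 3 THEN 1 ELSE 0 END),
       SUM(CASE WHEN score = 4 THEN 1 ELSE 0 END),
       SUM(CASE WHEN score = 5 THEN 1 ELSE 0 END)
FROM unpivoted
WHERE ModuleCode = ?
GROUP BY question
ORDER BY question
"""


def score_counts(module_code):
    """Return {question: [count of score 1, ..., count of score 5]} for one module."""
    return {row[0]: list(row[1:]) for row in conn.execute(query, (module_code,))}


counts = score_counts("NME3519")
for question, per_score in counts.items():
    print(question, per_score)
```

For module NME3519 the counts line up with the "Count of scores" table in the question.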

MySQL: Pivot + Counting

I need help with a SQL that will convert this table:
===================
| Id | FK | Status|
===================
| 1 | A | 100 |
| 2 | A | 101 |
| 3 | B | 100 |
| 4 | B | 101 |
| 5 | C | 100 |
| 6 | C | 101 |
| 7 | A | 102 |
| 8 | A | 102 |
| 9 | B | 102 |
| 10 | B | 102 |
===================
to this:
==========================================
| FK | Count 100 | Count 101 | Count 102 |
==========================================
| A | 1 | 1 | 2 |
| B | 1 | 1 | 2 |
| C | 1 | 1 | 0 |
==========================================
I can do simple counts, etc., but am struggling to pivot the table with the derived information. Any help is appreciated.
Use:
SELECT t.fk,
SUM(CASE WHEN t.status = 100 THEN 1 ELSE 0 END) AS count_100,
SUM(CASE WHEN t.status = 101 THEN 1 ELSE 0 END) AS count_101,
SUM(CASE WHEN t.status = 102 THEN 1 ELSE 0 END) AS count_102
FROM TABLE t
GROUP BY t.fk
Use (this is SQL Server PIVOT syntax, which MySQL does not support):
select * from
(select fk,fk as fk1,statusFK from #t
) as t
pivot
(COUNT(fk1) for statusFK IN ([100],[101],[102])
) AS pt
Just adding a shortcut to #OMG's answer.
You can eliminate the CASE expression, because MySQL evaluates a comparison to 1 or 0:
SELECT t.fk,
SUM(t.status = 100) AS count_100,
SUM(t.status = 101) AS count_101,
SUM(t.status = 102) AS count_102
FROM TABLE t
GROUP BY t.fk
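Both forms can be compared on the sample rows with the sketch below. It uses Python's sqlite3 module as a stand-in for MySQL (an assumption for runnability); SQLite, like MySQL, evaluates a comparison to 1 or 0, so SUM over a comparison counts the matching rows. The table name `t` replaces the placeholder TABLE from the answers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, fk TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (1, "A", 100), (2, "A", 101), (3, "B", 100), (4, "B", 101), (5, "C", 100),
        (6, "C", 101), (7, "A", 102), (8, "A", 102), (9, "B", 102), (10, "B", 102),
    ],
)

case_form = """
SELECT fk,
       SUM(CASE WHEN status = 100 THEN 1 ELSE 0 END),
       SUM(CASE WHEN status = 101 THEN 1 ELSE 0 END),
       SUM(CASE WHEN status = 102 THEN 1 ELSE 0 END)
FROM t GROUP BY fk ORDER BY fk
"""

bool_form = """
SELECT fk,
       SUM(status = 100),
       SUM(status = 101),
       SUM(status = 102)
FROM t GROUP BY fk ORDER BY fk
"""

case_rows = conn.execute(case_form).fetchall()
bool_rows = conn.execute(bool_form).fetchall()
print(case_rows)
print(bool_rows)
```

The shortcut is dialect-specific: engines that do not treat a comparison as a number (SQL Server, for example) need the CASE form.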