I am working on a SQL query in the Azure Databricks environment that has the following dataset:
CREATE OR REPLACE TABLE touchpoints_table
(
List STRING,
Path_Length INT
);
INSERT INTO touchpoints_table VALUES
('BBB, AAA, CCC', 3),
('BBB', 1),
('DDD, AAA', 2),
('DDD, BBB, AAA, EEE, CCC', 5),
('EEE, AAA, EEE, CCC', 4);
SELECT * FROM touchpoints_table
| | List | Path_length |
| 0 | BBB, AAA, CCC | 3 |
| 1 | CCC | 1 |
| 2 | DDD, AAA | 2 |
| 3 | DDD, BBB, AAA, EEE, CCC | 5 |
| 4 | EEE, AAA, EEE, CCC | 4 |
and the task consists of generating the following table:
| | Content | Unique | Started | Middleway | Finished |
| 0 | AAA | 0 | 0 | 3 | 1 |
| 1 | BBB | 0 | 1 | 1 | 0 |
| 2 | CCC | 1 | 0 | 0 | 3 |
| 3 | DDD | 0 | 2 | 0 | 0 |
| 4 | EEE | 0 | 1 | 2 | 0 |
where the columns contain the following:
Content: the elements found in the List
Unique: the number of times that the element appears alone in the list
Started: the number of times that the element appears at the beginning
Finished: the number of times that the element appears at the end
Middleway: the number of times the element appears between the beginning and the end.
Using the following query I almost get the result, but somehow the GROUP BY does not work correctly:
WITH tb1 AS(
SELECT
CAST(touch_array AS STRING) AS touch_list,
EXPLODE(touch_array) AS explode_list,
ROW_NUMBER()OVER(PARTITION BY CAST(touch_array AS STRING) ORDER BY (SELECT 1)) touch_count,
COUNT(*)OVER(PARTITION BY touch_array) touch_lenght
FROM (SELECT SPLIT(List, ',') AS touch_array FROM touchpoints_table)
)
SELECT
explode_list AS Content,
SUM(CASE WHEN touch_lenght=1 THEN 1 ELSE 0 END) AS Unique,
SUM(CASE WHEN touch_count=1 AND touch_lenght > 1 THEN 1 ELSE 0 END) AS Started,
SUM(CASE WHEN touch_count>1 AND touch_count < touch_lenght THEN 1 ELSE 0 END) AS Middleway,
SUM(CASE WHEN touch_count>1 AND touch_count = touch_lenght THEN 1 ELSE 0 END) AS Finished
FROM tb1
GROUP BY explode_list
ORDER BY explode_list
| | Content | Unique | Started | Middleway | Finished |
| 0 | AAA | 0 | 0 | 3 | 1 |
| 1 | BBB | 0 | 0 | 1 | 0 |
| 2 | CCC | 0 | 0 | 0 | 3 |
| 3 | EEE | 0 | 0 | 2 | 0 |
| 4 | BBB | 1 | 1 | 0 | 0 |
| 5 | DDD | 0 | 2 | 0 | 0 |
| 6 | EEE | 0 | 1 | 0 | 0 |
Could you help me by suggesting code that solves this task?
Here is a way to do this using SQL Server. The third argument of STRING_SPLIT, which adds the ordinal column, requires SQL Server 2022 or later (or Azure SQL Database).
with main_data
as (
select list
,ltrim(x.value) as split_val
,x.ordinal
,case when x.ordinal=1 and tt.path_length=1 then
'unique'
when x.ordinal=1 then
'start'
when x.ordinal=tt.path_length then
'end'
else
'middle' -- any position strictly between the first and the last
end as pos
from touchpoints_table tt
CROSS APPLY STRING_SPLIT(list,',',1) x
)
select split_val
,count(case when pos='unique' then 1 end) as unique_cnt
,count(case when pos='start' then 1 end) as start_cnt
,count(case when pos='middle' then 1 end) as middle_cnt
,count(case when pos='end' then 1 end) as end_cnt
from main_data
group by split_val
+-----------+------------+-----------+------------+---------+
| split_val | unique_cnt | start_cnt | middle_cnt | end_cnt |
+-----------+------------+-----------+------------+---------+
| AAA | 0 | 0 | 3 | 1 |
| BBB | 0 | 1 | 1 | 0 |
| CCC | 1 | 0 | 0 | 3 |
| DDD | 0 | 2 | 0 | 0 |
| EEE | 0 | 1 | 2 | 0 |
+-----------+------------+-----------+------------+---------+
Query example for SQL Server
with allElements as(
select list ,el,elN,elQty
from touchpoints_table tp
cross apply (select trim(value) as el,row_number()over(order by (select 1)) elN
,count(*)over() elQty
from string_split(tp.list,',')
) t
)
select el
,sum(case when elQty=1 then 1 else 0 end) as 'unique'
,sum(case when elN=1 and elQty>1 then 1 else 0 end) as 'started'
,sum(case when elN>1 and elN<elQty then 1 else 0 end) as 'middleway'
,sum(case when elN>1 and elN=elQty then 1 else 0 end) as 'finished'
from allElements
group by el
order by el
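Neither query above is written for Databricks, so here is a minimal Spark SQL sketch of the same idea. It assumes the built-in `posexplode` function (which returns each array element together with its 0-based position) and `trim`; the CTE name `exploded` and the column names `item_pos` and `item_cnt` are made up for illustration. Splitting on ',' alone leaves a leading space on every element after the first, which would explain why the original GROUP BY returned separate rows for 'BBB' and ' BBB'.

```sql
-- Sketch for Spark SQL / Databricks; not tested against the asker's workspace.
WITH exploded AS (
  SELECT
    trim(t.item)              AS content,   -- strip the space left after each comma
    t.pos + 1                 AS item_pos,  -- posexplode positions start at 0
    size(split(tp.List, ',')) AS item_cnt   -- number of elements in this row's list
  FROM touchpoints_table tp
  LATERAL VIEW posexplode(split(tp.List, ',')) t AS pos, item
)
SELECT
  content AS Content,
  SUM(CASE WHEN item_cnt = 1 THEN 1 ELSE 0 END)                       AS `Unique`,
  SUM(CASE WHEN item_pos = 1 AND item_cnt > 1 THEN 1 ELSE 0 END)      AS Started,
  SUM(CASE WHEN item_pos > 1 AND item_pos < item_cnt THEN 1 ELSE 0 END) AS Middleway,
  SUM(CASE WHEN item_pos > 1 AND item_pos = item_cnt THEN 1 ELSE 0 END) AS Finished
FROM exploded
GROUP BY content
ORDER BY content;
```

Taking the position from `posexplode` avoids relying on ROW_NUMBER() with an arbitrary ORDER BY, and computing the length per row avoids partitioning by the list text, which would merge two rows that happen to hold the same list.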
I have the following T-SQL code:
select
id,
(case
when n in ('Bla1', 'Bla2') then 1
when n = 'Bla3' then 99
else 0
end) as c
from
hello
Running this code outputs this result:
| id | c |
+--------+----+
| 577140 | 0 |
| 577140 | 1 |
| 577140 | 0 |
| 577140 | 0 |
| 577140 | 99 |
| 577141 | 0 |
| 577141 | 0 |
| 577141 | 0 |
| 577142 | 0 |
| 577142 | 0 |
| 577142 | 1 |
How can I modify the code to get the following output?
| id | c |
+--------+----+
| 577140 | 99 |
| 577141 | 0 |
| 577142 | 1 |
Rule
For each id: If 99 exists, then c becomes 99. If not, either 1 or 0, depending if any 1 exists.
You can use aggregation. Because 99 > 1 > 0, taking the MAX of the CASE expression per id implements the rule:
select id,
max(case when n in ('Bla1', 'Bla2') then 1
when n = 'Bla3' then 99
else 0
end) as c
from hello
group by id;
New to SQL/Presto here.
Feel free to point out the obvious if needed.
I have a subquery that pulls data into a table like the one below.
For each ItemID, 1 means the tag is on and 0 means it is off.
I am trying to write a query that returns each ItemID with its associated tag if the tag is unique, and otherwise flags whether there is more than one tag or the tag is missing.
Data_Table
| ItemID | TagA | TagB | TagC | TagD | TagE |
| 111 | 1 | 1 | 0 | 0 | 0 |
| 222 | 1 | 1 | 1 | 0 | 0 |
| 333 | 1 | 1 | 0 | 0 | 0 |
| 444 | 0 | 1 | 0 | 0 | 0 |
| 555 | 0 | 0 | 0 | 0 | 0 |
| 666 | 0 | 0 | 0 | 1 | 1 |
I tried a CASE WHEN statement that picks up each 1, and another CASE expression that tries to convert each column into just one row entry.
SELECT Item_ID,
CASE WHEN (Tag_A+Tag_B+Tag_C+Tag_D+Tag_E > 1) THEN 'Dupe'
ELSE (CASE WHEN Tag_A = 1 THEN 'TagA_Present'
WHEN Tag_B = 1 THEN 'TagB_Present'
WHEN Tag_C = 1 THEN 'TagC_Present'
WHEN Tag_D = 1 THEN 'TagD_Present'
WHEN Tag_E = 1 THEN 'TagE_Present'
ELSE 'Missing_Tag' END)
END as ItemTag
FROM Data_Table
EDITED - I went too far with the sample data and initial query has been changed.
Actual Results
| ItemID | ItemTag |
| 111 | Dupe |
| 222 | TagA_Present |
| 333 | TagB_Present |
| 444 | TagB_Present |
| 555 | Missing |
| 666 | TagD_Present |
ItemIDs 111, 222, 333, and 666 should all be 'Dupe', but the results seem to be deeming random ones unique.
Hmmm. I am thinking:
select t.itemId,
(case when (TagA + TagB + TagC + TagD + TagE) > 1 then 'Dupe'
when TagA = 1 then 'TagA'
when TagB = 1 then 'TagB'
when TagC = 1 then 'TagC'
when TagD = 1 then 'TagD'
when TagE = 1 then 'TagE'
else 'Missing'
end) as ItemTag
from Data_Table t;
There is no reason to use aggregation for this.
My table returns results as following (skips row if HourOfDay does not have data for particular ID)
ID HourOfDay Counts
--------------------------
1 5 5
1 13 10
1 23 3
..........................HourOfDay up till 23
2 9 1
and so on.
What I am trying to achieve is to force rows with a count of 0 to be shown for every HourOfDay that has no data, like the following:
ID HourOfDay Counts
--------------------------
1 0 0
1 1 0
1 2 0
1......................
1 5 5
1 6 0
1......................
1 23 3
2 0 0
2 1 0
etc.
I have researched this. It looks like I can achieve this result if I create an extra table and outer join it, so I have created a table variable in the SP (as a temporary workaround):
DECLARE @Hours TABLE
(
[Hour] INT NULL
);
INSERT INTO @Hours VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)
,(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23);
However, no matter how I join it, it does not achieve desired result.
How do I proceed? Do I add extra columns to join on? Completely different approach? Any hint in the right direction is appreciated!
Using a derived table for the distinct Ids cross joined to @Hours, left joined to your table:
select
i.Id
, h.Hour
, coalesce(t.Counts,0) as Counts
from (select distinct Id from t) as i
cross join @Hours as h
left join t
on i.Id = t.Id
and h.Hour = t.HourOfDay
rextester demo: http://rextester.com/XFZYX88502
returns:
+----+------+--------+
| Id | Hour | Counts |
+----+------+--------+
| 1 | 0 | 0 |
| 1 | 1 | 0 |
| 1 | 2 | 0 |
| 1 | 3 | 0 |
| 1 | 4 | 0 |
| 1 | 5 | 5 |
| 1 | 6 | 0 |
| 1 | 7 | 0 |
| 1 | 8 | 0 |
| 1 | 9 | 0 |
| 1 | 10 | 0 |
| 1 | 11 | 0 |
| 1 | 12 | 0 |
| 1 | 13 | 10 |
| 1 | 14 | 0 |
| 1 | 15 | 0 |
| 1 | 16 | 0 |
| 1 | 17 | 0 |
| 1 | 18 | 0 |
| 1 | 19 | 0 |
| 1 | 20 | 0 |
| 1 | 21 | 0 |
| 1 | 22 | 0 |
| 1 | 23 | 3 |
| 2 | 0 | 0 |
| 2 | 1 | 0 |
| 2 | 2 | 0 |
| 2 | 3 | 0 |
| 2 | 4 | 0 |
| 2 | 5 | 0 |
| 2 | 6 | 0 |
| 2 | 7 | 0 |
| 2 | 8 | 0 |
| 2 | 9 | 1 |
| 2 | 10 | 0 |
| 2 | 11 | 0 |
| 2 | 12 | 0 |
| 2 | 13 | 0 |
| 2 | 14 | 0 |
| 2 | 15 | 0 |
| 2 | 16 | 0 |
| 2 | 17 | 0 |
| 2 | 18 | 0 |
| 2 | 19 | 0 |
| 2 | 20 | 0 |
| 2 | 21 | 0 |
| 2 | 22 | 0 |
| 2 | 23 | 0 |
+----+------+--------+
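If you would rather not maintain the hours table at all, the same cross join works against a generated list of hours. The following is only a sketch for SQL Server: it assumes the same table name `t` and columns `Id`, `HourOfDay`, and `Counts` used in the answer above, and the CTE name `Hours` is made up for illustration.

```sql
-- Sketch: generate hours 0..23 with a recursive CTE instead of a table variable.
-- 23 recursion steps stay well below SQL Server's default MAXRECURSION of 100.
WITH Hours AS (
    SELECT 0 AS [Hour]
    UNION ALL
    SELECT [Hour] + 1 FROM Hours WHERE [Hour] < 23
)
SELECT i.Id,
       h.[Hour],
       COALESCE(t.Counts, 0) AS Counts   -- 0 where the hour has no row for this Id
FROM (SELECT DISTINCT Id FROM t) AS i
CROSS JOIN Hours AS h                    -- every Id paired with every hour
LEFT JOIN t
    ON t.Id = i.Id
   AND t.HourOfDay = h.[Hour]
ORDER BY i.Id, h.[Hour];
```

The key point is unchanged: the cross join builds the full Id-by-hour grid first, and the left join only fills in the counts that exist.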
I need help with a SQL that will convert this table:
===================
| Id | FK | Status|
===================
| 1 | A | 100 |
| 2 | A | 101 |
| 3 | B | 100 |
| 4 | B | 101 |
| 5 | C | 100 |
| 6 | C | 101 |
| 7 | A | 102 |
| 8 | A | 102 |
| 9 | B | 102 |
| 10 | B | 102 |
===================
to this:
==========================================
| FK | Count 100 | Count 101 | Count 102 |
==========================================
| A | 1 | 1 | 2 |
| B | 1 | 1 | 2 |
| C | 1 | 1 | 0 |
==========================================
I can do simple counts, etc., but am struggling to pivot the table with the derived information. Any help is appreciated.
Use:
SELECT t.fk,
SUM(CASE WHEN t.status = 100 THEN 1 ELSE 0 END) AS count_100,
SUM(CASE WHEN t.status = 101 THEN 1 ELSE 0 END) AS count_101,
SUM(CASE WHEN t.status = 102 THEN 1 ELSE 0 END) AS count_102
FROM TABLE t
GROUP BY t.fk
use:
select * from
(select fk,fk as fk1,statusFK from #t
) as t
pivot
(COUNT(fk1) for statusFK IN ([100],[101],[102])
) AS pt
Just adding a shortcut to @OMG's answer.
In MySQL, where a comparison evaluates to 1 or 0, you can eliminate the CASE expression:
SELECT t.fk,
SUM(t.status = 100) AS count_100,
SUM(t.status = 101) AS count_101,
SUM(t.status = 102) AS count_102
FROM TABLE t
GROUP BY t.fk
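Note that SUM over a bare comparison relies on the comparison evaluating to 1 or 0, which holds in MySQL but is rejected by SQL Server. A sketch that should be portable across most engines uses COUNT over a CASE with no ELSE branch, since COUNT ignores NULLs. The table name `status_rows` is hypothetical (the answers above use a placeholder), and the rows are the sample data from the question.

```sql
-- Hypothetical table holding the sample rows from the question.
CREATE TABLE status_rows (id INT, fk VARCHAR(10), status INT);

INSERT INTO status_rows (id, fk, status) VALUES
  (1, 'A', 100), (2, 'A', 101), (3, 'B', 100), (4, 'B', 101),
  (5, 'C', 100), (6, 'C', 101), (7, 'A', 102), (8, 'A', 102),
  (9, 'B', 102), (10, 'B', 102);

-- CASE without ELSE yields NULL for non-matching rows, and COUNT skips NULLs,
-- so each column counts only the rows with that status.
SELECT fk,
       COUNT(CASE WHEN status = 100 THEN 1 END) AS count_100,
       COUNT(CASE WHEN status = 101 THEN 1 END) AS count_101,
       COUNT(CASE WHEN status = 102 THEN 1 END) AS count_102
FROM status_rows
GROUP BY fk
ORDER BY fk;
-- A | 1 | 1 | 2
-- B | 1 | 1 | 2
-- C | 1 | 1 | 0
```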