Roll up multiple rows into one - sql

I have a table that looks like this:
USER_ID,ADDED_DATE,STATUS,COMPLETION_ID_TYPE,QA_OPTION,QA_OPTION_COUNT
12543,2020-06-01 00:00:00,qaComplete_L2,chart,Correct,3
12543,2020-06-01 00:00:00,qaComplete_L2,chart,Incorrect,3
12543,2020-06-12 00:00:00,qaComplete_L2,chart,Incorrect,1
12543,2020-06-12 00:00:00,qaComplete_L2,chart,Correct,1
I want to display the results as:
USER_ID ADDED_DATE STATUS COMPLETION_ID_TYPE L2 Correct L2 InCorrect
8388 6/01/20 0:00 qaComplete_L2 chart 3 3
8388 6/12/20 0:00 qaComplete_L2 chart 1 1
I tried this, but I'm not getting the results I expect:
select distinct user_id,
added_date,
status,
completion_id_type,
max(case
when qa_option = 'Correct'
then qa_option_count
else 0
end) as L2_Correct,
max(case
when qa_option = 'Incorrect'
then qa_option_count
else 0
end) as L2_Incorrect
from qa_report2
where user_id = 12543
and status = 'qaComplete_L2'
group by user_id, status, added_date, completion_id_type,qa_option, qa_option_count
order by user_id, added_date;
USER_ID,ADDED_DATE,STATUS,COMPLETION_ID_TYPE,L2_CORRECT,L2_INCORRECT
12543,2020-06-01 00:00:00,qaComplete_L2,chart,0,3
12543,2020-06-01 00:00:00,qaComplete_L2,chart,3,0
12543,2020-06-12 00:00:00,qaComplete_L2,chart,1,0
12543,2020-06-12 00:00:00,qaComplete_L2,chart,0,1

You were almost there :)
I only removed the distinct and the last two group by columns. Columns used inside the aggregate calculation shouldn't appear in the group by clause, only inside the aggregate function in the select clause; otherwise each distinct value of those columns gets its own output row.
So in the end, what I think you're looking for is:
select user_id,
added_date,
status,
completion_id_type,
max(case
when qa_option = 'Correct'
then qa_option_count
else 0
end) as L2_Correct,
max(case
when qa_option = 'Incorrect'
then qa_option_count
else 0
end) as L2_Incorrect
from qa_report2
where user_id = 12543
and status = 'qaComplete_L2'
group by user_id,
status,
added_date,
completion_id_type
--,qa_option
--,qa_option_count
order by user_id,
added_date;
Note: Be aware that you're using max(). If multiple records can exist per group and option, you may actually want sum(), but that really depends on your use case.
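To check the grouped query against the sample rows from the question, here is a small sketch that loads them into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for whatever database the question uses, and the column types are assumptions.

```python
import sqlite3

# In-memory SQLite is an assumption for illustration; the question's DBMS is not stated.
conn = sqlite3.connect(":memory:")
conn.execute("""
    create table qa_report2 (
        user_id integer, added_date text, status text,
        completion_id_type text, qa_option text, qa_option_count integer
    )
""")
conn.executemany(
    "insert into qa_report2 values (?, ?, ?, ?, ?, ?)",
    [
        (12543, "2020-06-01 00:00:00", "qaComplete_L2", "chart", "Correct", 3),
        (12543, "2020-06-01 00:00:00", "qaComplete_L2", "chart", "Incorrect", 3),
        (12543, "2020-06-12 00:00:00", "qaComplete_L2", "chart", "Incorrect", 1),
        (12543, "2020-06-12 00:00:00", "qaComplete_L2", "chart", "Correct", 1),
    ],
)

# Group only by the columns that identify an output row; qa_option and
# qa_option_count appear only inside the aggregates.
rows = conn.execute("""
    select user_id, added_date, status, completion_id_type,
           max(case when qa_option = 'Correct' then qa_option_count else 0 end) as L2_Correct,
           max(case when qa_option = 'Incorrect' then qa_option_count else 0 end) as L2_Incorrect
    from qa_report2
    where user_id = 12543 and status = 'qaComplete_L2'
    group by user_id, status, added_date, completion_id_type
    order by user_id, added_date
""").fetchall()

for row in rows:
    print(row)
```

With this data each date collapses to a single row. Swapping max() for sum() would give the same result here, because there is only one row per date and option.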

You can use PIVOT to achieve this.
SELECT *
FROM (
SELECT USER_ID,
ADDED_DATE,
STATUS,
COMPLETION_ID_TYPE,
QA_OPTION_COUNT,
QA_OPTION
FROM QA_REPORT2
WHERE USER_ID = 12543
AND STATUS = 'qaComplete_L2'
) PIVOT (
MAX ( QA_OPTION_COUNT )
FOR QA_OPTION
IN ( 'Correct' AS L2_CORRECT, 'Incorrect' AS L2_INCORRECT )
);

Related

SQL to return 1 or 0 depending on values in a column's audit trail

If I were to have a table such as the one below:
id_ last_updated_by
1 robot
1 human
1 robot
2 robot
3 robot
3 human
Using SQL, how could I group by the ID and create a new column to indicate whether a human has ever updated the record like this:
id_ last_updated_by updated_by_human
1 robot 1
2 robot 0
3 robot 1
UPDATE
I'm currently doing the following, though I'm not sure how efficient it is: I select the latest record and then join it with my calculated column via a sub-select.
SELECT MAIN.TRANSACTION_ID,
MAIN.CREATED_DATE,
MAIN.CREATED_BY_USER_ID,
MAIN.OWNER_USER_ID,
STP.TOUCHED_BY_HUMAN
FROM (
SELECT TRANSACTION_ID,
CREATED_DATE,
CREATED_BY_USER_ID,
OWNER_USER_ID
FROM TABLE_NAME
WHERE CREATED_DATE >= CAST('{start_date} 00:00:00' AS TIMESTAMP)
AND CREATED_DATE <= CAST('{end_date} 23:59:59' AS TIMESTAMP)
QUALIFY row_number() OVER (partition by TRANSACTION_ID order by End_Dt desc) = 1
) MAIN
LEFT JOIN (
SELECT TRANSACTION_ID,
CASE
WHEN CREATED_BY_USER_ID IN ('ROBOT', 'MACHINE') OR
CREATED_BY_USER_ID LIKE 'N%' OR
CREATED_BY_USER_ID IS NULL
THEN 0
ELSE 1 END AS CREATED_BY_HUMAN,
CASE
WHEN OWNER_USER_ID IN ('ROBOT', 'MACHINE') OR
OWNER_USER_ID LIKE 'N%' OR
OWNER_USER_ID IS NULL
THEN 0
ELSE 1 END AS OWNED_BY_HUMAN,
CASE
WHEN CREATED_BY_HUMAN = 0 AND
OWNED_BY_HUMAN = 0
THEN 0
ELSE 1 END AS TOUCHED_BY_HUMAN
FROM TABLE_NAME
WHERE CREATED_DATE >= CAST('{start_date} 00:00:00' AS TIMESTAMP)
AND CREATED_DATE <= CAST('{end_date} 23:59:59' AS TIMESTAMP)
QUALIFY row_number() OVER (partition by TRANSACTION_ID order by TOUCHED_BY_HUMAN desc) = 1
) STP
ON MAIN.TRANSACTION_ID = STP.TRANSACTION_ID
If I'm following your problem, then something like this should work.
SELECT
t.*
,CASE WHEN a.id_ IS NOT NULL THEN 1 ELSE 0 END AS updated_by_human
FROM table_name t
LEFT JOIN (SELECT DISTINCT id_ FROM table_name WHERE last_updated_by = 'human') a ON t.id_ = a.id_
That takes care of the updated_by_human field, but if you also need to reduce the records in the table (keeping only a subset), then more information is needed to do that.
EXISTS subqueries are not always the most performant option, but if your data isn't big this should work.
select id_,
IF (EXISTS (SELECT 1 FROM table_name t2 WHERE t2.last_updated_by = 'human' and t2.id_ = t1.id_), 1, 0) AS updated_by_human
from table_name t1;
Here is another way, which returns only the ids that a human has updated:
SELECT t1.id_
FROM table_name t1
GROUP BY t1.id_
HAVING COUNT(*) > 0
AND MAX(CASE t1.last_updated_by WHEN 'human' THEN 1 ELSE 0 END) = 1;
Since you didn't specify which column determines the newest record for a given id, I assume there is a column that tracks the insert/modify timestamp (which is pretty standard table design). Let's call it last_update_timestamp. If you don't have one, I strongly recommend adding it, since an audit trail without a timestamp does not make sense.
Assuming your table is named updating_trail:
SELECT updating_trail.*, last_update_trail.modified_by_human
FROM updating_trail
INNER JOIN (
-- per id_: the latest last_update_timestamp, plus a flag that is 1 if any record for that id_ has last_updated_by = 'human'
SELECT updating_trail.id_, MAX(last_update_timestamp) AS most_recent_update_ts, MAX(CASE WHEN updating_trail.last_updated_by = 'human' THEN 1 ELSE 0 END) AS modified_by_human
FROM updating_trail
GROUP BY updating_trail.id_
) last_update_trail
ON updating_trail.id_ = last_update_trail.id_ AND updating_trail.last_update_timestamp = last_update_trail.most_recent_update_ts;
This gives:
id_ last_updated_by last_update_timestamp modified_by_human
1 robot 2021-10-19T20:00:00.000Z 1
2 robot 2021-10-19T17:00:00.000Z 0
3 robot 2021-10-19T16:00:00.000Z 1
Check out this sample db fiddle I created for you
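The join above can be tried locally with a sketch like the following, using Python's sqlite3 module. The latest timestamp per id matches the result table above; the earlier timestamps and the choice of SQLite are assumptions made only so the example runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table updating_trail (
        id_ integer, last_updated_by text, last_update_timestamp text
    )
""")
# Only the newest timestamp per id_ comes from the thread; the older ones are invented.
conn.executemany("insert into updating_trail values (?, ?, ?)", [
    (1, "robot", "2021-10-19T18:00:00.000Z"),
    (1, "human", "2021-10-19T19:00:00.000Z"),
    (1, "robot", "2021-10-19T20:00:00.000Z"),
    (2, "robot", "2021-10-19T17:00:00.000Z"),
    (3, "human", "2021-10-19T15:00:00.000Z"),
    (3, "robot", "2021-10-19T16:00:00.000Z"),
])

# Keep the newest row per id_ and attach a flag that is 1 when any row
# for that id_ was written by 'human'.
rows = conn.execute("""
    select t.*, latest.modified_by_human
    from updating_trail t
    inner join (
        select id_,
               max(last_update_timestamp) as most_recent_update_ts,
               max(case when last_updated_by = 'human' then 1 else 0 end) as modified_by_human
        from updating_trail
        group by id_
    ) latest
      on t.id_ = latest.id_
     and t.last_update_timestamp = latest.most_recent_update_ts
    order by t.id_
""").fetchall()

for row in rows:
    print(row)
```

Comparing timestamps with max() works here because the ISO-style strings sort in chronological order; with a real timestamp column that caveat goes away.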
This is a 1:1 translation of your query to conditional aggregation:
SELECT TRANSACTION_ID,
CREATED_DATE,
CREATED_BY_USER_ID,
OWNER_USER_ID,
Max(CASE
WHEN CREATED_BY_USER_ID IN ('ROBOT', 'MACHINE') OR
CREATED_BY_USER_ID LIKE 'N%' OR
CREATED_BY_USER_ID IS NULL
THEN 0
ELSE 1
END) Over (PARTITION BY TRANSACTION_ID) AS CREATED_BY_HUMAN
FROM Table_Name
WHERE CREATED_DATE >= Cast('{start_date} 00:00:00' AS TIMESTAMP)
AND CREATED_DATE <= Cast('{end_date} 23:59:59' AS TIMESTAMP)
QUALIFY Row_Number() Over (PARTITION BY TRANSACTION_ID ORDER BY End_Dt DESC) = 1

Single SQL query for getting count based of 2 condition in same table

I have data like this
Now I need a single query to get the count of ids where info is 'yes' and the count of ids that have both 'yes' and 'no'.
Single query for:
SELECT COUNT(id) FROM table WHERE info = 'yes'
and
SELECT COUNT(id) FROM table WHERE info = 'yes' AND info = 'No'
Since
the ids having 'yes' are 7 (1,2,3,4,5,6,7)
and the ids having both 'yes' and 'no' values are only 3 (1,4,6),
it should give me id_as_yes = 7 and id_as_yes_no = 3
You can do it with aggregation and window functions:
SELECT DISTINCT
SUM(MAX(CASE WHEN info = 'yes' THEN 1 ELSE 0 END)) OVER () id_as_yes,
COUNT(CASE WHEN COUNT(DISTINCT info) = 2 THEN 1 END) OVER () id_as_yes_no
FROM tablename
GROUP BY id
See the demo.
Results:
> id_as_yes | id_as_yes_no
> --------: | -----------:
> 7 | 3
You need conditional aggregation.
Select Count(distinct case when info = 'yes' then id end) as y_count,
Count(distinct case when info = 'yes' and has_n = 1 then id end) as yn_count
From (SELECT id, info,
Max(case when info = 'no' then 1 end) over (partition by id) as has_n
From your_table) t
You can do this without a subquery. This relies on the observation that the number of ids that are "no" only is:
count(distinct id) - count(distinct case when info = 'yes' then id end)
And similarly for the number of yeses. So, the number that have both is the number of ids minus the number of no only minus the number of yes only:
select count(distinct case when info = 'yes' then id end) as num_yeses,
(count(distinct id) -
(count(distinct id) - count(distinct case when info = 'yes' then id end)) -
(count(distinct id) - count(distinct case when info = 'no' then id end))
)
from t;
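Here is a quick check of that arithmetic, run through Python's sqlite3 module. The question's data was posted as an image that is not available here, so the rows below are an assumption built from the description (ids 1 to 7 have 'yes', ids 1, 4 and 6 also have 'no'), plus a hypothetical id 8 that has only 'no'. The alias num_both is also added just for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (id integer, info text)")

# Assumed sample data: the original rows are not available.
yes_ids = [1, 2, 3, 4, 5, 6, 7]
no_ids = [1, 4, 6, 8]  # id 8 is hypothetical: 'no' only
conn.executemany("insert into t values (?, 'yes')", [(i,) for i in yes_ids])
conn.executemany("insert into t values (?, 'no')", [(i,) for i in no_ids])

# both = all ids - ("no" only) - ("yes" only)
num_yeses, num_both = conn.execute("""
    select count(distinct case when info = 'yes' then id end) as num_yeses,
           (count(distinct id) -
            (count(distinct id) - count(distinct case when info = 'yes' then id end)) -
            (count(distinct id) - count(distinct case when info = 'no' then id end))
           ) as num_both
    from t
""").fetchone()

print(num_yeses, num_both)
```

With 8 distinct ids, 7 with 'yes' and 4 with 'no', the second column works out to 8 - 1 - 4 = 3.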
This should do the trick. It's definitely not efficient or elegant, but it avoids null-value aggregate warnings.
Select
(select count(distinct id) from mytest where info = 'yes') as yeses
,(select count(distinct id) from mytest where info = 'no' and id in (select distinct id from mytest where info = 'yes' )) as [yes and no]

Optimizing code with multiple conditions on multiple tables?

I want to check whether these customers have a LEAD action or a SELL action, both of which are stored in other tables. However, the query takes forever to finish.
create table ct_nguyendang.visitor
as
select user_id, updated_at::date,
case
when user_id in (select distinct d_visitor_id from xiti.lead_detail) then 'lead'
else 'None'
end as lead_action,
case
when user_id in (select distinct account_id from ct_nguyendang.daily_listor) then 'sell'
else 'None'
end as sell_action
I think you can use union all and aggregation:
select user_id, max(is_lead) as has_lead, max(is_sale) as has_sale
from ((select d_visitor_id as user_id, 1 as is_lead, 0 as is_sale
from xiti.lead_detail
) union all
(select account_id, 0, 1
from ct_nguyendang.daily_listor
)
) ls
group by user_id;
If you have a table of users, then you can use correlated subqueries:
select u.*,
(case when exists (select 1
from xiti.lead_detail l
where u.user_id = l.d_visitor_id
)
then 1 else 0
end) as has_lead,
(case when exists (select 1
from ct_nguyendang.daily_listor s
where u.user_id = s.account_id
)
then 1 else 0
end) as has_sale
from users u;
Note that I prefer using 1 for "true" and 0 for "false". Of course, you can use string values if you prefer.
To optimize this query, you want indexes on xiti.lead_detail(d_visitor_id) and ct_nguyendang.daily_listor(account_id).
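A minimal sketch of the union all approach, run against in-memory SQLite from Python. The table contents are invented, the schema prefixes (xiti., ct_nguyendang.) are dropped because SQLite would need attached databases for them, and the index names are hypothetical. SQLite also does not accept parentheses around the individual selects of a union, so they are omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table lead_detail (d_visitor_id integer)")
conn.execute("create table daily_listor (account_id integer)")

# Indexes on the lookup columns, as suggested above (index names are made up).
conn.execute("create index idx_lead_visitor on lead_detail (d_visitor_id)")
conn.execute("create index idx_listor_account on daily_listor (account_id)")

# Invented data: user 1 has leads only, user 2 has both, user 3 has sales only.
conn.executemany("insert into lead_detail values (?)", [(1,), (1,), (2,)])
conn.executemany("insert into daily_listor values (?)", [(2,), (3,)])

# Stack both sources into one stream of (user_id, is_lead, is_sale) rows,
# then collapse to one row per user with max().
rows = conn.execute("""
    select user_id, max(is_lead) as has_lead, max(is_sale) as has_sale
    from (select d_visitor_id as user_id, 1 as is_lead, 0 as is_sale
          from lead_detail
          union all
          select account_id, 0, 1
          from daily_listor
         ) ls
    group by user_id
    order by user_id
""").fetchall()

for row in rows:
    print(row)
```

Note that this variant only lists users that appear in at least one of the two tables; users with neither action need the users-table version.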

Subqueries in MSSQL producing NULL values

I am trying to determine the revenue from my store-only accounts. To do this, I need to look at all account numbers with revenue against a 'store' description that do NOT appear in the list of accounts with an 'online' description, which is what I have tried to do in the subquery below. The query runs, but it just returns NULL values in my store_only_revenue column. Any guidance on what to do from here would be appreciated. Am I approaching the problem in a good way, or is there a better solution?
SELECT
town,
financial_pd as month,
SUM(CASE WHEN [Descr] = 'online' THEN Net_Revenue ELSE 0 END) as online_revenue,
SUM(CASE WHEN [Descr] = 'store' THEN Net_Revenue ELSE 0 END) as store_revenue,
COUNT(DISTINCT CASE WHEN [Descr] = 'online' THEN Account_Number ELSE NULL END) as online_accounts,
COUNT(DISTINCT CASE WHEN [Descr] = 'store' THEN Account_Number ELSE NULL END) as store_accounts,
(SELECT
SUM(Net_Revenue)
FROM [mydb].[dbo].[mytable]
WHERE
Descr = 'store'
AND Account_Number
NOT IN(
SELECT DISTINCT Account_Number
FROM [mydb].[dbo].[mytable]
WHERE
Descr = 'online')
) as store_only_revenue
FROM [mydb].[dbo].[mytable] as orders
WHERE
Group_name = 'T'
AND NOT
Type_name_1 = 'Electronic'
AND
Account_type <> 1
AND
Total_Value > 0
AND
(Insert_Date BETWEEN '2016-05-30' AND '2016-07-03'
OR
Insert_Date BETWEEN '2015-05-25' AND '2015-06-28')
OR
(Insert_Date BETWEEN '2016-05-30' AND '2016-07-03'
AND
Insert_Date BETWEEN '2015-05-25' AND '2015-06-28')
GROUP BY
town,
financial_pd as period
This expression is suspect:
Account_Number NOT IN (SELECT DISTINCT t.Account_Number
FROM [mydb].[dbo].mytable t
WHERE t.Descr = 'online'
)
Assuming that the syntax problems are typos (missing table name, desc is a reserved word), then this will never return true if even one Account_Number returned by the subquery is NULL. One way to fix this is:
Account_Number NOT IN (SELECT t.Account_Number
FROM [mydb].[dbo].mytable t
WHERE t.Descr = 'online' AND t.Account_Number IS NOT NULL
)
I would use NOT EXISTS:
not exists (select 1
from [mydb].[dbo].??? x
where x.Descr = 'online' AND ??.Account_Number = x.Account_Number
)
You need to use proper table aliases for this to work. Either of these solutions may fix your problem.
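The NULL behaviour of NOT IN is easy to reproduce. The sketch below uses Python's sqlite3 module with a tiny invented table that borrows the column names from the question; SQLite stands in for SQL Server here, but the three-valued logic for NOT IN is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (Account_Number integer, Descr text, Net_Revenue integer)")
# Invented rows: account 100 is store-only, account 200 is both,
# and one 'online' row has a NULL account number.
conn.executemany("insert into mytable values (?, ?, ?)", [
    (100, "store", 50),
    (200, "store", 30),
    (200, "online", 20),
    (None, "online", 10),
])

# Plain NOT IN: the NULL in the subquery result makes the predicate
# unknown for every non-matching row, so nothing qualifies.
not_in = conn.execute("""
    select sum(Net_Revenue) from mytable
    where Descr = 'store'
      and Account_Number not in (select t.Account_Number
                                 from mytable t
                                 where t.Descr = 'online')
""").fetchone()[0]

# Fix 1: filter NULLs out of the subquery.
not_in_filtered = conn.execute("""
    select sum(Net_Revenue) from mytable
    where Descr = 'store'
      and Account_Number not in (select t.Account_Number
                                 from mytable t
                                 where t.Descr = 'online'
                                   and t.Account_Number is not null)
""").fetchone()[0]

# Fix 2: NOT EXISTS with explicit aliases.
not_exists = conn.execute("""
    select sum(s.Net_Revenue) from mytable s
    where s.Descr = 'store'
      and not exists (select 1 from mytable x
                      where x.Descr = 'online'
                        and x.Account_Number = s.Account_Number)
""").fetchone()[0]

print(not_in, not_in_filtered, not_exists)
```

The first query returns NULL (None in Python) because SUM over zero rows is NULL, which matches the symptom in the question; both fixes return the revenue of the store-only account.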

Using Rank or OVER() to create 1 or zero column SQL SERVER [duplicate]

I think I need some guidance as to what is wrong in my query. I am trying to do
Watched_Gladiator=CASE WHEN FilmName IN (CASE WHEN FilmName LIKE '%Gladiator%' THEN 1 END) then OVER(PARTITION BY Cust_Nr) THEN 1 ELSE 0 END
Tried this one too:
Watched_Gladiator=CASE WHEN FilmName IN (CASE WHEN FilmName LIKE '%Gladiator%' THEN Filmnamne END) then OVER(PARTITION BY Cust_Nr) THEN 1 ELSE 0 END
The Error I am currently getting is this:
Incorrect syntax near the keyword 'OVER'.
This is basically what my data looks like:
Cust_Nr Date FilmName Watched Gladiator
157649306 20150430 Gladiator 1
158470722 20150504 Nick Cave: 20,000 Days On Earth 0
158467945 20150504 Out Of The Furnace 0
158470531 20150504 FilmA 0
157649306 20150510 Gladiator 1
158470722 20150515 Gladiator 1
I want to create a column (1 or zero) that shows if the customer has watched Gladiator then 1 ELSE 0. How can I do that?
I created a test column trying with a simple LIKE '%Gladiator%' THEN 1 ELSE 0. The problem with this solution is that it will show 1(one) more than once if the customer has watched multiple times. I only need 1 or zero.
I feel I am really close to finding a solution. I am very new to using OVER() and CASE WHEN but enjoying the thrill:=)
So you're saying that:
SELECT Cust_Nr, Date, FilmName,
CASE WHEN FilmName LIKE '%Gladiator%' THEN 1 ELSE 0 END as WatchedGladiator
FROM YourTable
WHERE YourColumn = #somevalue
Doesn't work? Because according to the data you've given, it should.
EDIT:
Well based on Tim's comment, I would simply add this bit to the query.
SELECT Cust_Nr, Date, FilmName, WatchedGladiator
FROM
(
SELECT Cust_Nr, Date, FilmName,
CASE WHEN FilmName LIKE '%Gladiator%' THEN 1 ELSE 0 END as WatchedGladiator
FROM YourTable
WHERE YourColumn = #somevalue
) as wg
WHERE WatchedGladiator = 1
The following does what you want for all films:
select r.*,
(case when row_number() over (partition by filmname order by date) = 1
then 1 else 0
end) as IsWatchedFirstAndGladiator
from results r;
For just Gladiator:
select r.*,
(case when filmname = 'Gladiator' and row_number() over (partition by filmname order by date) = 1
then 1 else 0
end) as IsWatchedFirst
from results r;
So you want to group by customer and add a column if this customer watched a specific film?
You could do:
SELECT Cust_Nr, MAX(Watched_Gladiator)
FROM( SELECT Cust_Nr,
Watched_Gladiator = CASE WHEN EXISTS
(
SELECT 1 FROM CustomerFilm c2
WHERE c2.Cust_Nr = c1.Cust_Nr
AND c2.FilmName LIKE '%Gladiator%'
) THEN 1 ELSE 0 END
FROM CustomerFilm c1 ) X
GROUP BY Cust_Nr
But it would be easier if you used the customer-table instead of this table, then you don't need the group-by.
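If the flag is meant to be per customer (one reading of the question; the sample output in the question looks per row), the OVER() attempt can be written as a windowed MAX around the CASE expression. This is a sketch using Python's sqlite3 module with the sample rows from the question. The table name CustomerFilm is borrowed from the answer above, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table CustomerFilm (Cust_Nr integer, Date text, FilmName text)")
conn.executemany("insert into CustomerFilm values (?, ?, ?)", [
    (157649306, "20150430", "Gladiator"),
    (158470722, "20150504", "Nick Cave: 20,000 Days On Earth"),
    (158467945, "20150504", "Out Of The Furnace"),
    (158470531, "20150504", "FilmA"),
    (157649306, "20150510", "Gladiator"),
    (158470722, "20150515", "Gladiator"),
])

# The CASE produces 1/0 per row; MAX(...) OVER (PARTITION BY Cust_Nr)
# spreads the customer's highest value to every row of that customer.
rows = conn.execute("""
    select Cust_Nr, Date, FilmName,
           max(case when FilmName like '%Gladiator%' then 1 else 0 end)
               over (partition by Cust_Nr) as Watched_Gladiator
    from CustomerFilm
    order by Date, Cust_Nr
""").fetchall()

# One flag per customer, regardless of how many times the film was watched.
flags = {cust: flag for cust, _, _, flag in rows}
print(flags)
```

Every row is kept, so a customer who watched the film twice still shows 1 on each of their rows, never 2. Collapsing to one row per customer would need a GROUP BY instead.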
Try grouping up to the cust/film level:
select
cust_nbr,
case when film_name like '%Gladiator%' then 1 else 0 end
from
(
select
cust_nbr,
film_name
from
<your table>
group by
cust_nbr,
film_name
) t
Or, as an alternative:
select distinct cust_nbr
from
<your table>
where
filmname = 'Gladiator'