SQL query to find count with a different condition and total count in the same query - sql

Here is a sample table I have
Logs
user_id, session_id, search_query, action
1, 100, dog, A
1, 100, dog, B
2, 101, cat, A
3, 102, ball, A
3, 102, ball, B
3, 102, kite, A
4, 103, ball, A
5, 104, cat, A
where
miss = for the same user_id and the same session_id, if action A is not followed by action B, it is termed a miss.
Note: action B can happen only after action A has happened.
I am able to find the count of misses for each unique search_query across all users and sessions.
SELECT l1.search_query, count(l1.*) as misses
FROM logs l1
WHERE NOT EXISTS
(SELECT NULL FROM logs l2
WHERE l1.user_id = l2.user_id
AND l1.session_id = l2.session_id
AND l1.session_id != ''
AND l2.action = 'B'
AND l1.action = 'A')
AND l1.action='A'
AND l1.search_query != ''
GROUP BY l1.search_query
order by misses desc;
I am trying to find the value of miss_percentage=(number of misses/total number of rows)*100 for each unique search_query. I couldn't figure out how to find the count with a condition and count without that condition in the same query. Any help would be great.
expected output:
cat 100
kite 100
ball 50

One way to do it is to move the EXISTS into the count
SELECT l1.search_query, count(case when NOT EXISTS
(SELECT 1 FROM logs l2
WHERE l1.user_id = l2.user_id
AND l1.session_id = l2.session_id
AND l1.search_query = l2.search_query
AND l2.action = 'B'
AND l1.action = 'A') then 1 else null end
)*100.0/count(*) as misses
FROM logs l1
WHERE l1.action='A'
AND l1.search_query != ''
GROUP BY l1.search_query
order by misses desc;
This produces the desired results, but also zeros if no misses were found. This can be removed with a HAVING clause, or postprocessing.
Note I also added the clause l1.search_query = l2.search_query that was missing, since otherwise it was counting kite as succeeded, since there is a row with B in the same session.
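As a rough check of the approach above, here is a minimal sketch that runs it against the sample rows from the question. It assumes SQLite through Python's sqlite3 module (the question does not name an engine), computes the miss flag in a derived table called flagged (a hypothetical name) instead of inside the count, and uses a HAVING clause to drop the zero rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (user_id INTEGER, session_id INTEGER, "
    "search_query TEXT, action TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?)",
    [
        (1, 100, "dog", "A"), (1, 100, "dog", "B"),
        (2, 101, "cat", "A"),
        (3, 102, "ball", "A"), (3, 102, "ball", "B"), (3, 102, "kite", "A"),
        (4, 103, "ball", "A"),
        (5, 104, "cat", "A"),
    ],
)

miss_query = """
SELECT search_query,
       SUM(is_miss) * 100.0 / COUNT(*) AS miss_percentage
FROM (
    -- One row per action A, flagged 1 when no matching B exists.
    SELECT l1.search_query,
           NOT EXISTS (
               SELECT 1 FROM logs l2
               WHERE l2.user_id = l1.user_id
                 AND l2.session_id = l1.session_id
                 AND l2.search_query = l1.search_query
                 AND l2.action = 'B'
           ) AS is_miss
    FROM logs l1
    WHERE l1.action = 'A'
) AS flagged
GROUP BY search_query
HAVING SUM(is_miss) > 0          -- drop queries with no misses
ORDER BY miss_percentage DESC, search_query
"""
miss_rows = conn.execute(miss_query).fetchall()
for search_query, pct in miss_rows:
    print(search_query, pct)
```

On the sample data this should list cat and kite at 100 and ball at 50, matching the expected output in the question.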

I think you just need to use CASE expressions here. If I have understood your problem correctly, the solution would be something like this:
WITH summary
AS (
SELECT user_id
,session_id
,search_query
,count(1) AS total_views
,sum(CASE
WHEN action = 'A'
THEN 1
ELSE 0
END) AS action_a
,sum(CASE
WHEN action = 'B'
THEN 1
ELSE 0
END) AS action_b
FROM logs l
GROUP BY user_id
,session_id
,search_query
)
SELECT search_query
,sum(action_a - action_b) * 100.0 / sum(action_a) AS miss_percentage
FROM summary
GROUP BY search_query;
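One thing to watch with a percentage built from counts: when both operands are integers, several engines (PostgreSQL, SQL Server and SQLite among them) divide as integers, so the ratio can truncate to 0 before it is multiplied by 100. A small sketch, assuming SQLite via Python's sqlite3, with 1 miss out of 2 searches:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 1 miss out of 2 searches, written three ways.
truncated, scaled_first, as_float = conn.execute(
    "SELECT (1 / 2) * 100,   -- integer division happens first: 0 * 100\n"
    "       1 * 100 / 2,     -- scaling first keeps whole percentages\n"
    "       1 * 100.0 / 2    -- a float operand avoids truncation entirely"
).fetchone()
print(truncated, scaled_first, as_float)
```

Multiplying by 100.0 before dividing is the usual way to sidestep the problem without an explicit cast.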

You can always create two queries and combine them into one with a join. Then you can do the calculations in the bridging (or joining) SQL statement.
In MS-SQL compatible SQL this would be:
SELECT actionTypeA,countedA,isNull(countedB,0) as countedB,
(countedA-isNull(countedB,0))*100/countedA as missed
FROM (SELECT search_query as actionTypeA, count(*) as countedA
FROM logs WHERE action='A' GROUP BY search_query
) as TpA
LEFT JOIN
(SELECT search_query as actionTypeB, count(*) as countedB
FROM logs WHERE action='B' GROUP BY search_query
) as TpB
ON TpA.actionTypeA = TpB.actionTypeB
The LEFT JOIN is required to select all activities (search_query) from the 'A' results, and join them to only those from the 'B' results where a B is available.
Since this is very basic SQL (and well optimized by SQL engines), I'd suggest avoiding WHERE EXISTS as much as possible. IsNull() is an MS-SQL function that replaces a NULL value with a substitute, here the integer 0, so that the result can be used in a calculation.
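Finally, since a column alias cannot be referenced in the WHERE clause of the same SELECT, you could wrap the query in an outer SELECT and filter on
WHERE missed>0
to get the final result.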
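The join-based approach can be sketched as follows. This is an illustration under assumptions, not the answer's exact code: it runs on SQLite via Python's sqlite3, so COALESCE stands in for the MS-SQL IsNull(), and the aliases a, b and joined are hypothetical. Note that it compares counts per search_query only, without pairing A and B within a session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (user_id INTEGER, session_id INTEGER, "
    "search_query TEXT, action TEXT)"
)
conn.executemany(
    "INSERT INTO logs VALUES (?, ?, ?, ?)",
    [
        (1, 100, "dog", "A"), (1, 100, "dog", "B"), (2, 101, "cat", "A"),
        (3, 102, "ball", "A"), (3, 102, "ball", "B"), (3, 102, "kite", "A"),
        (4, 103, "ball", "A"), (5, 104, "cat", "A"),
    ],
)

join_query = """
SELECT search_query, counted_a, counted_b, missed
FROM (
    SELECT a.search_query,
           a.counted_a,
           COALESCE(b.counted_b, 0) AS counted_b,
           (a.counted_a - COALESCE(b.counted_b, 0)) * 100.0 / a.counted_a AS missed
    FROM (SELECT search_query, COUNT(*) AS counted_a
          FROM logs WHERE action = 'A' GROUP BY search_query) AS a
    LEFT JOIN
         (SELECT search_query, COUNT(*) AS counted_b
          FROM logs WHERE action = 'B' GROUP BY search_query) AS b
      ON a.search_query = b.search_query
) AS joined
WHERE missed > 0                 -- the alias is visible in the outer query
ORDER BY missed DESC, search_query
"""
join_rows = conn.execute(join_query).fetchall()
for row in join_rows:
    print(row)
```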

Related

Partition table based on joined table

We have 2 Tables Lead and Task.
One lead can have multiple Tasks.
We want to determine if a Lead has a Task whose description contains the string 'x'.
If the Lead has the string, then it should belong to group1; if it doesn't, to group2.
Then we want to count the leads per group and week.
The problem we have is that if a Lead has several tasks and one of them has string 'x' in its description and the others don't it is counted in both groups.
We would need something that resembles a break; statement in the IFF clause of the subquery, so that if the first condition = Contain string x is satisfied the other tasks are not counted anymore.
How would we achieve that?
So far we have the following statement:
--SQL:
SELECT LeadDate, GROUP, COUNT(LEAD_ID_T1)
FROM LEAD Lead INNER JOIN
(SELECT DISTINCT LEAD.ID AS LEAD_ID_T1,
IFF(CONTAINS(Task.DESCRIPTION,
'x'),
'GROUP1',
'GROUP2') AS GROUP
FROM TASK Task
RIGHT JOIN LEAD ON TASK.WHO_ID = LEAD.ID
) T1 ON T1.LEAD_ID_T1 = LEAD.ID
GROUP BY LeadDate,GROUP;
Code breaks because it can not aggregate the measures.
Really thankful for any input. This has been bothering me for a few days now.
I am thinking EXISTS with a CASE expression:
select l.*,
(case when exists (select 1
from task t
where t.who_id = l.id and
t.description like '%x%'
)
then 'GROUP1' else 'GROUP2'
end) as the_group
from lead l;
You can also try something like this, CASE with 1 and 0 then take the SUM
SELECT LeadDate,
sum(CASE When t.description like '%x%' then 1 else 0 end) as Group1,
sum(CASE When t.description like '%x%' then 0 else 1 end) as Group2
FROM TASK t
RIGHT JOIN LEAD l ON t.WHO_ID = l.ID
GROUP BY LeadDate;
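To count each lead exactly once per group, the EXISTS classification can be wrapped in a derived table and then aggregated. A minimal sketch, assuming SQLite via Python's sqlite3; the table names leads and tasks, the lead_date column and the sample rows are hypothetical stand-ins for the question's schema, and lead_date stands in for the week.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE leads (id INTEGER PRIMARY KEY, lead_date TEXT);
CREATE TABLE tasks (who_id INTEGER, description TEXT);
INSERT INTO leads VALUES (1, '2024-01-01'), (2, '2024-01-01'), (3, '2024-01-08');
-- Lead 1 has one matching and one non-matching task; lead 3 has no tasks.
INSERT INTO tasks VALUES (1, 'contains x'), (1, 'send quote'), (2, 'send quote');
""")

group_query = """
SELECT lead_date, the_group, COUNT(*) AS lead_count
FROM (
    -- One row per lead, so a lead can never land in both groups.
    SELECT l.id,
           l.lead_date,
           CASE WHEN EXISTS (SELECT 1 FROM tasks t
                             WHERE t.who_id = l.id
                               AND t.description LIKE '%x%')
                THEN 'GROUP1' ELSE 'GROUP2' END AS the_group
    FROM leads l
) AS classified
GROUP BY lead_date, the_group
ORDER BY lead_date, the_group
"""
group_rows = conn.execute(group_query).fetchall()
for row in group_rows:
    print(row)
```

EXISTS stops at the first matching task, which gives the "break" behaviour the question asks for.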

How can I write this select query in SQL Server?

I need to extract some data to analyse exceptions/logs, and I'm stuck at a point.
I have a table with a column called CallType, and a status which can be Success or Failure. This table also has a column called SessionId.
I need to do this:
Select all the SessionId's where all the CallType = 'A' are marked as Success, but there is at least one CallType = 'B' having a Failure for that session.
There will be a where clause to filter out some stuff.
I'm thinking something like:
select top 10 *
from Log nolock
where ProviderId=48 -- add more conditions here
group by SessionId
having --? what should go over here?
I would do this with conditional aggregation in the having clause:
select top 10 SessionId
from Log with (nolock)
where ProviderId=48 -- add more conditions here
group by SessionId
having sum(case when CallType = 'A' and Status = 'Failure' then 1 else 0 end) = 0 and
sum(case when CallType = 'B' and Status = 'Failure' then 1 else 0 end) > 0 and
sum(case when CallType = 'A' and Status = 'Success' then 1 else 0 end) > 0;
The having clause checks for three conditions by counting the number of rows that meet each one. If = 0, then no records are allowed. If > 0 then records are required.
That CallType A has no failures.
That CallType B has at least one failure.
That at least one CallType A success exists.
The third condition is ambiguous -- it is not clear from the question whether CallType A rows actually need to be present in the data.
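The three HAVING conditions can be tried on a few made-up sessions. This is a sketch under assumptions: SQLite via Python's sqlite3 instead of SQL Server (so no TOP or NOLOCK), a table named call_log standing in for Log, and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE call_log (SessionId INTEGER, CallType TEXT, Status TEXT);
INSERT INTO call_log VALUES
  (1, 'A', 'Success'), (1, 'B', 'Failure'),                      -- qualifies
  (2, 'A', 'Success'), (2, 'A', 'Failure'), (2, 'B', 'Failure'), -- an A failed
  (3, 'A', 'Success'), (3, 'B', 'Success'),                      -- no B failure
  (4, 'B', 'Failure');                                           -- no A at all
""")

session_query = """
SELECT SessionId
FROM call_log
GROUP BY SessionId
HAVING SUM(CASE WHEN CallType = 'A' AND Status = 'Failure' THEN 1 ELSE 0 END) = 0
   AND SUM(CASE WHEN CallType = 'B' AND Status = 'Failure' THEN 1 ELSE 0 END) > 0
   AND SUM(CASE WHEN CallType = 'A' AND Status = 'Success' THEN 1 ELSE 0 END) > 0
ORDER BY SessionId
"""
matching_sessions = [row[0] for row in conn.execute(session_query)]
print(matching_sessions)
```

Dropping the third condition would also admit session 4, which is the ambiguity noted above.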
SELECT *
FROM Log L WITH(NOLOCK)
WHERE L.CallType='A'
AND L.[Status] = 'Success'
AND L.ProviderId = 48
AND EXISTS (SELECT 1
FROM Log
WHERE L.SessionID = SessionID
AND CallType='B'
AND [Status] = 'Failure')
Having clause can only operate on aggregates within the group so this isn't the correct way to go about it since you are filtering out other rows you want to check against. I'd use EXISTS for this e.g.
edit: corrected the query
SELECT *
FROM Log L WITH(NOLOCK)
WHERE ProviderId = 48
AND CallType = 'A'
AND Status = 'Success'
AND EXISTS(SELECT * FROM Log WHERE L.SessionId = SessionId AND CallType = 'B' AND Status = 'Failure')
You can essentially filter out rows in the EXISTS part of the query using the aliased Log table (aliased L), matching all rows with the same session ID and seeing if any match the filters you required (failed with call type B)

What SQL query can answer "Do these rows exist?"

Here is the code to create the database:
CREATE TABLE foo (
id TEXT PRIMARY KEY,
value TEXT
);
INSERT INTO foo VALUES(1, 10), (2, 20), (3, 30), (5, 50);
Now I have a set of rows and I want back 0 if the row doesn't exist, 1 if the row exists but is not the same, and 2 if the row exists exactly.
So the result of the query on (1, 11), (2, 20), (4, 40) should be 1, 2, 0.
The reason I want this is to know what query to use to insert the data into the database. If it is a 0, I do a normal insert, if it is a 1 I do an update, and if it is a 2 I skip the row. I know that INSERT OR REPLACE will result in nearly the same rows, but the problem is that it doesn't fire the correct triggers (it will always fire an on insert trigger instead of an update trigger or no trigger if the row exists exactly).
Also, I want to do one query with all of the rows, not one query per row.
The idea is to use an aggregation query. Count the number of times that the id matches. If there are none, then return 0. Then check the value to distinguish between 1 and 2:
select (case when max(id = 1) = 0 then 0
when max(id = 1 and value = 11) = 0 then 1
else 2
end) as flag
from foo t;
You need to plug the values into the query.
EDIT:
If you want to match a bunch of rows, do something like this:
select testvalues.id,
(case when max(t.id = testvalues.id) = 0 then 0
when max(t.id = testvalues.id and t.value = testvalues.value) = 0 then 1
else 2
end) as flag
from foo t cross join
(select 1 as id, 11 as value union all
select 2, 20 union all
select 4, 40
) as testvalues
group by testvalues.id;
You can use the EXISTS operator in Transact-SQL (see the MSDN documentation).
This returns true if a row exists. You can then use an IF statement to check whether the row is the same or different and, if so, use the RETURN statement with your specified values (see the MSDN documentation).
This is based off of Gordon Linoff's answer so upvote him. I just wanted to share what I actually went with:
select testvalues.id,
(case when t.id is null then 0
when t.value != testvalues.value then 1
else 2
end) as flag
from (select 1 as id, 11 as value union all
select 2, 20 union all
select 4, 40
) as testvalues
LEFT OUTER JOIN foo t on testvalues.id=t.id
This avoids the memory cost of the cross join and the GROUP BY clause.
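The LEFT JOIN classification can be sketched as below, assuming SQLite via Python's sqlite3. For simplicity the columns are declared INTEGER here (the question declares them TEXT), and an unmatched row is detected with IS NULL, because a comparison against the NULLs produced by the outer join never evaluates to true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumption: INTEGER columns instead of the TEXT columns in the question.
CREATE TABLE foo (id INTEGER PRIMARY KEY, value INTEGER);
INSERT INTO foo VALUES (1, 10), (2, 20), (3, 30), (5, 50);
""")

flag_query = """
SELECT v.id,
       CASE WHEN t.id IS NULL      THEN 0   -- no row with this id
            WHEN t.value <> v.value THEN 1  -- id exists, value differs
            ELSE 2                          -- exact match
       END AS flag
FROM (SELECT 1 AS id, 11 AS value UNION ALL
      SELECT 2, 20 UNION ALL
      SELECT 4, 40) AS v
LEFT JOIN foo t ON t.id = v.id
ORDER BY v.id
"""
flags = conn.execute(flag_query).fetchall()
print(flags)
```

For the candidate rows (1, 11), (2, 20), (4, 40) this yields the flags 1, 2, 0 that the question asks for.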

How do I determine if a group of data exists in a table, given the data that should appear in the group's rows?

I am writing data to a table and allocating a "group-id" for each batch of data that is written. To illustrate, consider the following table.
GroupId Value
------- -----
1 a
1 b
1 c
2 a
2 b
3 a
3 b
3 c
3 d
In this example, there are three groups of data, each with similar but varying values.
How do I query this table to find a group that contains a given set of values? For instance, if I query for (a,b,c) the result should be group 1. Similarly, a query for (b,a) should result in group 2, and a query for (a, b, c, e) should result in the empty set.
I can write a stored procedure that performs the following steps:
select distinct GroupId from Groups -- and store locally
for each distinct GroupId: perform a set-difference (except) between the input and table values (for the group), and vice versa
return the GroupId if both set-difference operations produced empty sets
This seems a bit excessive, and I am hoping to leverage some other commands in SQL to simplify it. Is there a simpler way to perform a set comparison in this context, or to select the group ID that contains the exact input values for the query?
This is a set-within-sets query. I like to solve it using group by and having:
select groupid
from GroupValues gv
group by groupid
having sum(case when value = 'a' then 1 else 0 end) > 0 and
sum(case when value = 'b' then 1 else 0 end) > 0 and
sum(case when value = 'c' then 1 else 0 end) > 0 and
sum(case when value not in ('a', 'b', 'c') then 1 else 0 end) = 0;
The first three conditions in the having clause check that each element exists. The last condition checks that there are no other values. This method is quite flexible for various exclusion and inclusion conditions on the values you are looking for.
EDIT:
If you want to pass in a list, you can use:
with thelist as (
select 'a' as value union all
select 'b' union all
select 'c'
)
select groupid
from GroupValues gv left outer join
thelist
on gv.value = thelist.value
group by groupid
having count(distinct gv.value) = (select count(*) from thelist) and
count(distinct (case when gv.value = thelist.value then gv.value end)) = count(distinct gv.value);
Here the having clause counts the number of matching values and makes sure that this is the same size as the list.
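When the list of values comes from application code, the same idea can be expressed with bound parameters. A minimal sketch, assuming SQLite via Python's sqlite3; find_exact_groups is a hypothetical helper, the column is named Val as in the temp-table answer, and the HAVING clause is a simplified variant (count of distinct values plus "nothing outside the list") rather than the exact query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GroupValues (GroupId INTEGER, Val TEXT)")
conn.executemany(
    "INSERT INTO GroupValues VALUES (?, ?)",
    [(1, "a"), (1, "b"), (1, "c"), (2, "a"), (2, "b"),
     (3, "a"), (3, "b"), (3, "c"), (3, "d")],
)

def find_exact_groups(conn, values):
    """Return the ids of groups whose value set equals `values` exactly."""
    wanted = sorted(set(values))
    placeholders = ", ".join("?" for _ in wanted)
    query = f"""
        SELECT GroupId
        FROM GroupValues
        GROUP BY GroupId
        HAVING COUNT(DISTINCT Val) = ?                -- same number of values
           AND SUM(CASE WHEN Val IN ({placeholders})  -- and none outside the list
                        THEN 0 ELSE 1 END) = 0
        ORDER BY GroupId
    """
    return [row[0] for row in conn.execute(query, [len(wanted), *wanted])]

print(find_exact_groups(conn, ["a", "b", "c"]))
print(find_exact_groups(conn, ["b", "a"]))
print(find_exact_groups(conn, ["a", "b", "c", "e"]))
```

Only the placeholders are interpolated into the SQL text; the values themselves are always bound as parameters.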
EDIT:
The query failed to compile because the table alias was missing; it has been updated with the right alias.
This is kind of ugly, but it works. On larger datasets I'm not sure what performance would look like, but the nested instances of #GroupValues key off GroupID in the main table so I think as long as you have a good index on GroupID it probably wouldn't be too horrible.
If Object_ID('tempdb..#GroupValues') Is Not Null Drop Table #GroupValues
Create Table #GroupValues (GroupID Int, Val Varchar(10));
Insert #GroupValues (GroupID, Val)
Values (1,'a'),(1,'b'),(1,'c'),(2,'a'),(2,'b'),(3,'a'),(3,'b'),(3,'c'),(3,'d');
If Object_ID('tempdb..#FindValues') Is Not Null Drop Table #FindValues
Create Table #FindValues (Val Varchar(10));
Insert #FindValues (Val)
Values ('a'),('b'),('c');
Select Distinct gv.GroupID
From (Select Distinct GroupID
From #GroupValues) gv
Where Not Exists (Select 1
From #FindValues fv2
Where Not Exists (Select 1
From #GroupValues gv2
Where gv.GroupID = gv2.GroupID
And fv2.Val = gv2.Val))
And Not Exists (Select 1
From #GroupValues gv3
Where gv3.GroupID = gv.GroupID
And Not Exists (Select 1
From #FindValues fv3
Where gv3.Val = fv3.Val))

SQL Nested Select statements with COUNT()

I'll try to describe as best I can, but it's hard for me to wrap my whole head around this problem let alone describe it....
I am trying to select multiple results in one query to display the current status of a database. I have the first column as one type of record, and the second column as a sub-category of the first column. The subcategory is then linked to more records underneath that, distinguished by status, forming several more columns. I need to display every main-category/subcategory combination, and then the count of how many of each sub-status there are beneath that subcategory in the subsequent columns. I've got it so that I can display the unique combinations, but I'm not sure how to nest the select statements so that I can select the count of a completely different table from the main query. My problem lies in that to display the main category and sub category, I can pull from one table, but I need to count from a different table. Any ideas on the matter would be greatly appreciated
Here's what I have. The count statements would be replaced with the count of each status:
SELECT wave_num "WAVE NUMBER",
int_tasktype "INT / TaskType",
COUNT (1) total,
COUNT (1) "LOCKED/DISABLED",
COUNT (1) released,
COUNT (1) "PARTIALLY ASSEMBLED",
COUNT (1) assembled
FROM (SELECT DISTINCT
(t.invn_need_type || ' / ' || s.code_desc) int_tasktype,
t.task_genrtn_ref_nbr wave_num
FROM sys_code s, task_hdr t
WHERE t.task_genrtn_ref_nbr IN
(SELECT ship_wave_nbr
FROM ship_wave_parm
WHERE TRUNC (create_date_time) LIKE SYSDATE - 7)
AND s.code_type = '590'
AND s.rec_type = 'S'
AND s.code_id = t.task_type),
ship_wave_parm swp
GROUP BY wave_num, int_tasktype
ORDER BY wave_num
Image here: http://i.imgur.com/JX334.png
I'm guessing a bit, both regarding your problem and Oracle (which I've, unfortunately, never used), but hopefully this will give you some ideas. Sorry for completely changing the way you write SQL; SELECT ... FROM (SELECT ... WHERE ... IN (SELECT ...)) simply confuses me, so I had to restructure:
with tmp(int_tasktype, wave_num) as
(select distinct (t.invn_need_type || ' / ' || s.code_desc), t.task_genrtn_ref_nbr
from sys_code s
join task_hdr t
on s.code_id = t.task_type
where s.code_type = '590'
and s.rec_type = 'S'
and exists(select 1 from ship_wave_parm p
where t.task_genrtn_ref_nbr = p.ship_wave_nbr
and trunc(p.create_date_time) = trunc(sysdate) - 7))
select t.wave_num "WAVE NUMBER", t.int_tasktype "INT / TaskType",
count(*) TOTAL,
sum(case when sst.sub_status = 'LOCKED' then 1 end) "LOCKED/DISABLED",
sum(case when sst.sub_status = 'RELEASED' then 1 end) RELEASED,
sum(case when sst.sub_status = 'PARTIAL' then 1 end) "PARTIALLY ASSEMBLED",
sum(case when sst.sub_status = 'ASSEMBLED' then 1 end) ASSEMBLED
from tmp t
join sub_status_table sst
on t.wave_num = sst.wave_num
group by t.wave_num, t.int_tasktype
order by t.wave_num
As you notice, I don't know anything about the table with the substatuses.
You can use inner joins, grouping and count to get your result.
Suppose the tables are related as follows:
cat (1)--->(n) subcat (1)----->(n) subcat_detail.
Then the query would be:
select cat.title cat_title, subcat.title subcat_title, count(*) as cnt from
cat inner join subcat on cat.id=subcat.cat_id
inner join subcat_detail on subcat.id=subcat_detail.subcat_id
group by cat.title,subcat.title
Generally, when you need different counts, you need to use a CASE expression inside an aggregate:
select count(*) as total
, sum(case when field1 = 'test' then 1 else 0 end) as testcount
, sum(case when field2 = 'yes' then 1 else 0 end) as field2count
FROM table1
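A quick way to see several conditional counts come out of one pass over the table, assuming SQLite via Python's sqlite3 and three invented rows for table1:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (field1 TEXT, field2 TEXT);
INSERT INTO table1 VALUES ('test', 'yes'), ('test', 'no'), ('other', 'yes');
""")

# Each SUM(CASE ...) counts only the rows that satisfy its own condition.
total, testcount, field2count = conn.execute("""
SELECT COUNT(*)                                            AS total,
       SUM(CASE WHEN field1 = 'test' THEN 1 ELSE 0 END)    AS testcount,
       SUM(CASE WHEN field2 = 'yes'  THEN 1 ELSE 0 END)    AS field2count
FROM table1
""").fetchone()
print(total, testcount, field2count)
```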