SQL Aggregate Functions - sql

I have a table that lists a number of cases and their assigned primary and secondary technicians. What I am trying to accomplish is to aggregate the number of cases each technician has worked as a primary and as a secondary tech. It should look something like this:
Technician Primary Secondary
John 4 3
Stacy 3 1
Michael 5 3
The table that I am pulling that data from looks like this:
CaseID, PrimaryTech, SecondaryTech, DOS
In the past I have used something like this, but now my superiors are asking for the number of secondary cases as well...
SELECT PrimaryTech, COUNT(CaseID) as Total
FROM your_table
GROUP BY PrimaryTech
I've done a bit of searching, but can't seem to find the answer to my problem.

Select Tech,
sum(case when IsPrimary = 1 then 1 else 0 end) as PrimaryCount,
sum(case when IsPrimary = 0 then 1 else 0 end) as SecondaryCount
from
(
SELECT SecondaryTech as Tech, 0 as IsPrimary
FROM your_table
union all
SELECT PrimaryTech as Tech, 1 as IsPrimary
FROM your_table
) x
GROUP BY Tech
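The UNION ALL approach above can be tried end to end with a small sketch. It assumes Python's built-in sqlite3 module, a hypothetical table named cases, and made-up rows. The WHERE Tech IS NOT NULL filter is an addition so that cases without a secondary technician do not produce a NULL group.

```python
import sqlite3

# Hypothetical sample data: the table name "cases" and these rows are
# invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cases (CaseID INTEGER, PrimaryTech TEXT, SecondaryTech TEXT)"
)
conn.executemany(
    "INSERT INTO cases VALUES (?, ?, ?)",
    [
        (1, "John", "Stacy"),
        (2, "John", "Michael"),
        (3, "Stacy", "John"),
        (4, "Michael", None),  # a case with no secondary technician
    ],
)

query = """
SELECT Tech,
       SUM(CASE WHEN IsPrimary = 1 THEN 1 ELSE 0 END) AS PrimaryCount,
       SUM(CASE WHEN IsPrimary = 0 THEN 1 ELSE 0 END) AS SecondaryCount
FROM (
    SELECT SecondaryTech AS Tech, 0 AS IsPrimary FROM cases
    UNION ALL
    SELECT PrimaryTech AS Tech, 1 AS IsPrimary FROM cases
) x
WHERE Tech IS NOT NULL   -- added: skip roles that were left empty
GROUP BY Tech
ORDER BY Tech
"""
tech_counts = conn.execute(query).fetchall()
for row in tech_counts:
    print(row)
```

Each case contributes one row per filled role to the derived table, so a single GROUP BY yields both counts per technician.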

You can combine two grouped subqueries with a FULL JOIN:
SELECT Technician = COALESCE(pri.Technician, sec.Technician)
, PrimaryTech
, SecondaryTech
FROM
(SELECT Technician = PrimaryTech
, PrimaryTech = COUNT(*)
FROM Cases
WHERE PrimaryTech IS NOT NULL
GROUP BY PrimaryTech) pri
FULL JOIN
(SELECT Technician = SecondaryTech
, SecondaryTech = COUNT(*)
FROM Cases
WHERE SecondaryTech IS NOT NULL
GROUP BY SecondaryTech) sec
ON pri.Technician = sec.Technician
ORDER By Technician;

SELECT COALESCE(A.NAME, B.NAME) AS NAME, CASE WHEN A.CASES IS NOT NULL THEN A.CASES ELSE 0 END AS PRIMARY_CASES,
CASE WHEN B.CASES IS NOT NULL THEN B.CASES ELSE 0 END AS SECONDARY_CASES
FROM
(
SELECT COUNT(*) AS CASES, PRIMARYTECH AS NAME FROM YOUR_TABLE
GROUP BY PRIMARYTECH
) AS A
FULL OUTER JOIN
(
SELECT COUNT(*) AS CASES, SECONDARYTECH AS NAME FROM YOUR_TABLE
GROUP BY SECONDARYTECH
) AS B
ON A.NAME = B.NAME

Related

Optimizing code with multiple conditions on multiple tables?

I want to check whether these customers have a LEAD action or a SELL action, both of which are stored in other tables. However, it takes forever to finish.
create table ct_nguyendang.visitor
as
select user_id, updated_at::date,
case
when user_id in (select distinct d_visitor_id from xiti.lead_detail) then 'lead'
else 'None'
end as lead_action,
case
when user_id in (select distinct account_id from ct_nguyendang.daily_listor) then 'sell'
else 'None'
end as sell_action
I think you can use union all and aggregation:
select user_id, max(is_lead) as has_lead, max(is_sale) as has_sale
from ((select d_visitor_id as user_id, 1 as is_lead, 0 as is_sale
from xiti.lead_detail
) union all
(select account_id, 0, 1
from ct_nguyendang.daily_listor
)
) ls
group by user_id;
If you have a table of users, then you can use correlated subqueries:
select u.*,
(case when exists (select 1
from xiti.lead_detail l
where u.user_id = l.d_visitor_id
)
then 1 else 0
end) as has_lead,
(case when exists (select 1
from ct_nguyendang.daily_listor s
where u.user_id = s.account_id
)
then 1 else 0
end) as has_sale
from users u;
Note that I prefer using 1 for "true" and 0 for "false". Of course, you can use string values if you prefer.
To optimize this query, you want indexes on xiti.lead_detail(d_visitor_id) and ct_nguyendang.daily_listor(account_id).
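As a rough illustration of the correlated-subquery version together with the suggested indexes, here is a sketch using Python's sqlite3 module. The schema prefixes (xiti, ct_nguyendang) are dropped because SQLite has no such schemas, and the users table, index names, and sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY);      -- hypothetical users table
CREATE TABLE lead_detail (d_visitor_id INTEGER);
CREATE TABLE daily_listor (account_id INTEGER);
-- Indexes on the lookup columns, so each EXISTS probe can use an index.
CREATE INDEX idx_lead_visitor ON lead_detail (d_visitor_id);
CREATE INDEX idx_listor_account ON daily_listor (account_id);
INSERT INTO users VALUES (1), (2), (3);
INSERT INTO lead_detail VALUES (1), (1), (3);
INSERT INTO daily_listor VALUES (2), (3);
""")

query = """
SELECT u.user_id,
       CASE WHEN EXISTS (SELECT 1 FROM lead_detail l
                         WHERE l.d_visitor_id = u.user_id)
            THEN 1 ELSE 0 END AS has_lead,
       CASE WHEN EXISTS (SELECT 1 FROM daily_listor s
                         WHERE s.account_id = u.user_id)
            THEN 1 ELSE 0 END AS has_sale
FROM users u
ORDER BY u.user_id
"""
flags = conn.execute(query).fetchall()
for row in flags:
    print(row)
```

EXISTS stops at the first matching row, so duplicate leads for the same user (user 1 here) do not change the flag.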

How to get count of items present in each category but not present in other categories?

I have a table with different visit_types to hospital. They are Inpatient, Outpatient, Emergency
I would like to know the count of subjects solely present under each visit_type but not in other visit_types. In the above example the
Inpatient count - 4
Outpatient count - 2
Emergency count - 3
I tried the queries below, but I am not sure whether they are accurate:
SELECT count(DISTINCT PERSON_ID) FROM Visit WHERE PERSON_ID NOT IN
(select distinct person_id from Visit where visit_type = 'Inpatient')
AND VISIT_type = 'Outpatient';
SELECT count(DISTINCT PERSON_ID) FROM Visit WHERE PERSON_ID NOT IN
(select distinct person_id from Visit where visit_type = 'Inpatient')
AND VISIT_type = 'Emergency';
When I do this, the counts include subjects that are common to Emergency and Outpatient.
How can I get the count correctly?
With a CTE which returns for each person_id all the types:
with cte as (
select person_id,
sum(case visit_type when 'Inpatient' then 1 else 0 end) Inpatient,
sum(case visit_type when 'Outpatient' then 1 else 0 end) Outpatient,
sum(case visit_type when 'Emergency' then 1 else 0 end) Emergency
from Visit
group by person_id
)
select
case
when Inpatient > 0 then 'Inpatient'
when Outpatient > 0 then 'Outpatient'
when Emergency > 0 then 'Emergency'
end visit_type,
count(*) counter
from cte
group by visit_type
Results:
visit_type | counter
:--------- | ------:
Outpatient | 2
Emergency | 3
Inpatient | 4
I would like to know the count of subjects solely present under each category but not in other categories.
You can aggregate by patient, keeping track of the categories. Then aggregate again:
select visit_type, count(*)
from (select patientId, min(visit_type) as visit_type
from t
group by patientId
having min(visit_type) = max(visit_type)
) p
group by visit_type;
An alternative method uses group by but filters before aggregation:
select visit_type, count(*)
from t
where not exists (select 1
from t t2
where t2.patientid = t.patientid and
t2.visit_type <> t.visit_type
)
group by visit_type;
Note: In this case, the count(*) is counting rows. If your data has duplicates (several rows for the same patient and visit type), use count(distinct patientid) instead.
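Both approaches can be compared on a tiny data set. The sketch below assumes Python's sqlite3 module, the question's visit table with person_id and visit_type columns, and invented rows. Person 1 has a duplicate Inpatient row, which is why the NOT EXISTS variant here counts distinct persons rather than rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit (person_id INTEGER, visit_type TEXT)")
conn.executemany(
    "INSERT INTO visit VALUES (?, ?)",
    [
        (1, "Inpatient"), (1, "Inpatient"),   # same type twice
        (2, "Outpatient"),
        (3, "Emergency"), (3, "Outpatient"),  # two types, so excluded
        (4, "Emergency"),
    ],
)

# Aggregate per person first; min = max means only one visit type exists.
min_max_query = """
SELECT visit_type, COUNT(*)
FROM (SELECT person_id, MIN(visit_type) AS visit_type
      FROM visit
      GROUP BY person_id
      HAVING MIN(visit_type) = MAX(visit_type)) p
GROUP BY visit_type
ORDER BY visit_type
"""

# Filter rows before aggregating; count persons, not rows.
not_exists_query = """
SELECT visit_type, COUNT(DISTINCT person_id)
FROM visit v
WHERE NOT EXISTS (SELECT 1
                  FROM visit v2
                  WHERE v2.person_id = v.person_id
                    AND v2.visit_type <> v.visit_type)
GROUP BY visit_type
ORDER BY visit_type
"""

min_max_counts = conn.execute(min_max_query).fetchall()
not_exists_counts = conn.execute(not_exists_query).fetchall()
print(min_max_counts)
print(not_exists_counts)
```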
I have no idea what "I consider Inpatient category as base category" is supposed to mean, but the question itself is quite clear.
EDIT:
I am unclear on the relationships between the different categories that you want. You may find it most flexible to use:
select visit_type, count(*)
from (select patientId,
min(visit_type) as visit_type,
bool_or(visit_type = 'Inpatient') as has_inpatient,
bool_or(visit_type = 'Outpatient') as has_outpatient,
bool_or(visit_type = 'Emergency') as has_emergency,
count(distinct visit_type) as num_visit_types
from t
group by patientId
) p
where num_visit_types = 1
group by visit_type;
This version is the same as the earlier two queries. But you can use the has_ flags for additional filtering -- for instance where num_visit_types = 1 or (num_visit_types = 2 and has_inpatient) if you want people with one type or one type plus "inpatient".
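To make the extra filtering concrete, here is a sketch of the "one type, or one type plus inpatient" filter. It assumes Python's sqlite3 module and invented rows. SQLite has no bool_or, so MAX over a boolean comparison (which evaluates to 0 or 1) stands in for it, and the query returns the matching person ids instead of counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visit (person_id INTEGER, visit_type TEXT)")
conn.executemany(
    "INSERT INTO visit VALUES (?, ?)",
    [
        (1, "Inpatient"),
        (2, "Outpatient"),
        (3, "Emergency"), (3, "Outpatient"),   # two types, no inpatient
        (4, "Emergency"),
        (5, "Inpatient"), (5, "Outpatient"),   # one type plus inpatient
    ],
)

query = """
SELECT person_id
FROM (SELECT person_id,
             MAX(visit_type = 'Inpatient') AS has_inpatient,  -- stand-in for bool_or
             COUNT(DISTINCT visit_type) AS num_visit_types
      FROM visit
      GROUP BY person_id) p
WHERE num_visit_types = 1
   OR (num_visit_types = 2 AND has_inpatient = 1)
ORDER BY person_id
"""
selected_persons = [row[0] for row in conn.execute(query)]
print(selected_persons)
```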
You can use this query!
SELECT
C.visit_type,
COUNT(*) AS count_per_visit_type
FROM (
SELECT
person_id
FROM (
SELECT
person_id,
ARRAY_AGG(DISTINCT visit_type) AS visit_type_array
FROM visit
GROUP BY person_id
) A
WHERE CARDINALITY(visit_type_array) = 1
) B
JOIN visit C
ON B.person_id = C.person_id
GROUP BY C.visit_type

Check whether an employee is present on three consecutive days

I have a table called tbl_A with the following schema:
After insert, I have the following data in tbl_A:
Now the question is how to write a query for the following scenario:
Put (1) in front of any employee who was present three days consecutively
Put (0) in front of employee who was not present three days consecutively
The output screenshot:
I think we should use a CASE statement, but I am not able to check for three consecutive days from the date. Any help would be appreciated.
Thank you
select name, case when max(cons_days) >= 3 then 1 else 0 end as presence
from (
select name, count(*) as cons_days
from tbl_A, (values (0),(1),(2)) as a(dd)
group by name, adate + dd
)x
group by name
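The offset trick in the query above can be checked with a small sketch. It assumes Python's sqlite3 module, ISO-formatted text dates, and columns name, adate and available (the original schema is not shown, so these are assumptions). SQLite does not accept the a(dd) column alias on a VALUES list, so a CTE supplies the offsets, and date() does the date arithmetic. The available = 'Y' filter is also an addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_A (name TEXT, adate TEXT, available TEXT)")
conn.executemany(
    "INSERT INTO tbl_A VALUES (?, ?, ?)",
    [
        ("Ann", "2020-01-01", "Y"), ("Ann", "2020-01-02", "Y"),
        ("Ann", "2020-01-03", "Y"),
        ("Bob", "2020-01-01", "Y"), ("Bob", "2020-01-02", "N"),
        ("Bob", "2020-01-03", "Y"), ("Bob", "2020-01-04", "Y"),
    ],
)

# Every present day is shifted by 0, 1 and 2 days. A shifted date that is
# hit three times means that day and the two days before it were all present.
query = """
WITH offsets(dd) AS (VALUES (0), (1), (2))
SELECT name,
       CASE WHEN MAX(cons_days) >= 3 THEN 1 ELSE 0 END AS presence
FROM (SELECT name, COUNT(*) AS cons_days
      FROM tbl_A CROSS JOIN offsets
      WHERE available = 'Y'            -- assumption: only 'Y' rows count
      GROUP BY name, date(adate, '+' || dd || ' days')) x
GROUP BY name
ORDER BY name
"""
presence = conn.execute(query).fetchall()
print(presence)
```

The sketch relies on at most one row per employee per day. Duplicate rows for the same day would inflate the count.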
With a self-join on name restricted to available = 'Y', each date adate of an employee is paired with that employee's own rows dated adate, adate + 1 and adate + 2, and the matches are counted per starting date. If all 3 entries are present, the count is 3, and the outer query sets the flag to 1 for any name that has at least one such starting date. Try the query below:
SELECT Z.NAME,
CASE WHEN MAX(Z.CONSEQ_AVAIL) >= 3 THEN 1 ELSE 0 END AS YOUR_FLAG
FROM
(
SELECT A.NAME, A.ADATE,
SUM(CASE WHEN B.ADATE >= A.ADATE AND B.ADATE <= A.ADATE + 2 THEN 1 ELSE 0 END) AS CONSEQ_AVAIL
FROM
tbl_A A INNER JOIN tbl_A B
ON A.NAME = B.NAME AND A.AVAILABLE = 'Y' AND B.AVAILABLE = 'Y'
GROUP BY A.NAME, A.ADATE
) Z
GROUP BY Z.NAME;
Due to the complexity of the problem, I have not been able to test it out. If something is really wrong, please let me know and I will be happy to take down my answer.
-- Below is my approach
select Name,
Case WHen Max_Count>=3 Then 1 else 0 end as Presence
from
(
Select Name,MAx(Coun) as Max_Count
from
(
select Name, (count(*) over (partition by Name,Ref_Date)) as Coun from
(
select Name,adate + row_number() over (partition by Name order by Adate desc) as Ref_Date
from temp
where available='Y'
)
) group by Name
);
select name as employee, case when sum(diff) >= 2 then 1 else 0 end as presence
from
(select id, name, Available, Adate, lead(Adate,1) over(partition by name order by Adate) as next_date,
case when datediff(day, Adate, lead(Adate,1) over(partition by name order by Adate)) = 1 then 1 else 0 end as diff
from table_A
where Available = 'Y') A
group by name;

How to get first and last record in HiveSQL if key is different

I need to get the first and last record for a user if one of the key fields is different over time using a Hive table:
This is some sample data:
UserID EntryDate Activity
a3324 1/1/16 walk
a3324 1/2/16 walk
a3324 1/3/16 walk
a3324 1/4/16 run
a5613 1/1/16 walk
a5613 1/2/16 walk
a5613 1/3/16 walk
a5613 1/4/16 walk
And I'm looking for output preferably like this:
a3324 1/1/16 walk 1/4/16 run
Or at least like this:
a3324 walk run
I started writing code like this:
SELECT UserID, MINIMUM(EntryDate), MAXIMUM(EntryDate), Activity
FROM
SELECT UserID, DISTINCT Activity
GROUP BY UserID
HAVING Count(Activity) > 1
But I know that's not it.
I'd also like to be able to specify the cases where the original activity was Walk and the second activity was Run perhaps in the Where clause.
Can you help with an approach?
Thanks
You can use lag/lead to get a solution:
SELECT * FROM (
select UserID, EntryDate, Activity,
lead(Activity, 1) over (partition by UserID order by EntryDate) as nextActivity
from table) as A
where Activity <> nextActivity
SELECT
t.UserId
,MIN(CASE WHEN t.RowNumAsc = 1 THEN t.EntryDate END) as MinEntryDate
,MIN(CASE WHEN t.RowNumAsc = 1 THEN t.Activity END) as MinActivity
,MAX(CASE WHEN t.RowNumDesc = 1 THEN t.EntryDate END) as MaxEntryDate
,MAX(CASE WHEN t.RowNumDesc = 1 THEN t.Activity END) as MaxActivity
FROM
(
SELECT
UserId
,EntryDate
,Activity
,ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY EntryDate) as RowNumAsc
,ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY EntryDate DESC) as RowNumDesc
FROM
Table
) t
WHERE
t.RowNumAsc = 1
OR t.RowNumDesc = 1
GROUP BY
t.UserId
Window functions appear to be supported in Hive (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+WindowingAndAnalytics), so two row numbers, one ordered by EntryDate ascending and the other descending, combined with conditional aggregation should get you to the answer.
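The two-row-number idea can be exercised outside Hive with a sketch that uses Python's sqlite3 module (window functions need SQLite 3.25 or newer). The table name activity_log is hypothetical, the dates are rewritten in ISO format so they sort correctly as text, and the outer filter for "walk first, run last" is an addition that addresses the follow-up wish in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity_log (UserID TEXT, EntryDate TEXT, Activity TEXT)"
)
conn.executemany(
    "INSERT INTO activity_log VALUES (?, ?, ?)",
    [
        ("a3324", "2016-01-01", "walk"), ("a3324", "2016-01-02", "walk"),
        ("a3324", "2016-01-03", "walk"), ("a3324", "2016-01-04", "run"),
        ("a5613", "2016-01-01", "walk"), ("a5613", "2016-01-02", "walk"),
        ("a5613", "2016-01-03", "walk"), ("a5613", "2016-01-04", "walk"),
    ],
)

query = """
SELECT *
FROM (SELECT t.UserID,
             MIN(CASE WHEN t.RowNumAsc = 1 THEN t.EntryDate END) AS MinEntryDate,
             MIN(CASE WHEN t.RowNumAsc = 1 THEN t.Activity END) AS MinActivity,
             MAX(CASE WHEN t.RowNumDesc = 1 THEN t.EntryDate END) AS MaxEntryDate,
             MAX(CASE WHEN t.RowNumDesc = 1 THEN t.Activity END) AS MaxActivity
      FROM (SELECT UserID, EntryDate, Activity,
                   ROW_NUMBER() OVER (PARTITION BY UserID
                                      ORDER BY EntryDate) AS RowNumAsc,
                   ROW_NUMBER() OVER (PARTITION BY UserID
                                      ORDER BY EntryDate DESC) AS RowNumDesc
            FROM activity_log) t
      WHERE t.RowNumAsc = 1 OR t.RowNumDesc = 1
      GROUP BY t.UserID) s
WHERE MinActivity = 'walk' AND MaxActivity = 'run'  -- added filter
"""
changed_users = conn.execute(query).fetchall()
print(changed_users)
```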
And if you don't want to use Analytic Functions (window functions) you can use self left joins and conditional aggregation:
SELECT
t.UserId
,MIN(CASE WHEN mn.UserId IS NULL THEN t.EntryDate END) as MinEntryDate
,MIN(CASE WHEN mn.UserId IS NULL THEN t.Activity END) as MinActivity
,MAX(CASE WHEN mx.UserId IS NULL THEN t.EntryDate END) as MaxEntryDate
,MAX(CASE WHEN mx.UserId IS NULL THEN t.Activity END) as MaxActivity
FROM
Table t
LEFT JOIN Table mn
ON t.UserId = mn.UserId
AND t.EntryDate > mn.EntryDate
LEFT JOIN Table mx
ON t.UserId = mx.UserId
AND t.EntryDate < mx.EntryDate
WHERE
mn.UserId IS NULL
OR mx.UserId IS NULL
GROUP BY
t.UserId
Or a correlated Sub Query way:
SELECT
UserId
,MIN(EntryDate) as MinEntryDate
,(SELECT
Activity
FROM
Activity a
WHERE
u.UserId = a.UserId
AND a.EntryDate = MIN(u.EntryDate)
LIMIT 1
) as MinActivity
,MAX(EntryDate) as MaxEntryDate
,(SELECT
Activity
FROM
Activity a
WHERE
u.UserId = a.UserId
AND a.EntryDate = MAX(u.EntryDate)
LIMIT 1
) as MaxActivity
FROM
Activity u
GROUP BY
UserId

Subselect Query Improvement

How can I improve the SQL query below (SQL Server 2008)? I want to try to avoid sub-selects, and I'm using a couple of them to produce results like this
StateId TotalCount SFRCount OtherCount
---------------------------------------------------------
AZ 102 50 52
CA 2931 2750 181
etc...
SELECT
StateId,
COUNT(*) AS TotalCount,
(SELECT COUNT(*) AS Expr1 FROM Property AS P2
WHERE (PropertyTypeId = 1) AND (StateId = P.StateId)) AS SFRCount,
(SELECT COUNT(*) AS Expr1 FROM Property AS P3
WHERE (PropertyTypeId <> 1) AND (StateId = P.StateId)) AS OtherCount
FROM Property AS P
GROUP BY StateId
HAVING (COUNT(*) > 99)
ORDER BY StateId
This may work the same, hard to test without data
SELECT
StateId,
COUNT(*) AS TotalCount,
SUM(CASE WHEN PropertyTypeId = 1 THEN 1 ELSE 0 END) as SFRCount,
SUM(CASE WHEN PropertyTypeId <> 1 THEN 1 ELSE 0 END) as OtherCount
FROM Property AS P
GROUP BY StateId
HAVING (COUNT(*) > 99)
ORDER BY StateId
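Since the answer notes it is hard to test without data, here is a small sketch that runs both forms on invented rows and compares them. It assumes Python's sqlite3 module, a minimal Property table with only the two relevant columns, and a HAVING threshold lowered from 99 to 2 so that the tiny sample passes the filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Property (StateId TEXT, PropertyTypeId INTEGER)")
conn.executemany(
    "INSERT INTO Property VALUES (?, ?)",
    [
        ("AZ", 1), ("AZ", 1), ("AZ", 2),
        ("CA", 1), ("CA", 3), ("CA", 3), ("CA", 1),
        ("NV", 1),  # filtered out by the HAVING clause
    ],
)

subselect_query = """
SELECT StateId,
       COUNT(*) AS TotalCount,
       (SELECT COUNT(*) FROM Property P2
        WHERE P2.PropertyTypeId = 1 AND P2.StateId = P.StateId) AS SFRCount,
       (SELECT COUNT(*) FROM Property P3
        WHERE P3.PropertyTypeId <> 1 AND P3.StateId = P.StateId) AS OtherCount
FROM Property P
GROUP BY StateId
HAVING COUNT(*) > 2      -- threshold lowered for the sample data
ORDER BY StateId
"""

conditional_query = """
SELECT StateId,
       COUNT(*) AS TotalCount,
       SUM(CASE WHEN PropertyTypeId = 1 THEN 1 ELSE 0 END) AS SFRCount,
       SUM(CASE WHEN PropertyTypeId <> 1 THEN 1 ELSE 0 END) AS OtherCount
FROM Property P
GROUP BY StateId
HAVING COUNT(*) > 2      -- threshold lowered for the sample data
ORDER BY StateId
"""

subselect_rows = conn.execute(subselect_query).fetchall()
conditional_rows = conn.execute(conditional_query).fetchall()
print(conditional_rows)
```

The two forms agree on this sample. If PropertyTypeId can be NULL, such rows are counted in TotalCount but in neither SFRCount nor OtherCount in both versions.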
Your alternative is a single self-join of Property using your WHERE conditions as a join parameter. The OtherCount can then be derived in an outer query by subtracting SFRCount from TotalCount.
Another alternative would be to use the PIVOT function like this:
SELECT StateID, [1] + [2] AS TotalCount, [1] AS SFRCount, [2] AS OtherCount
FROM Property
PIVOT ( COUNT(PropertyTypeID)
FOR PropertyTypeID IN ([1],[2])
) AS pvt
WHERE [1] + [2] > 99
You would need to add an entry for each property type which could be daunting but it is another alternative. Scott has a great answer.
If PropertyTypeId is not null, then you could do this with a single join. Count is faster than Sum, but is Count plus Join faster than Sum? The test case below mimics your data: docSVsys has 800,000 rows and about 300 unique values for caseID. In this test case, Count plus Join is slightly faster than Sum, but if I remove the with (nolock) hint, Sum is about 1/4 faster. You would need to test with your data.
select GETDATE()
go
select caseID, COUNT(*) as Ttl,
SUM(CASE WHEN mimeType = 'message/rfc822' THEN 1 ELSE 0 END) as SFRCount,
SUM(CASE WHEN mimeType <> 'message/rfc822' THEN 1 ELSE 0 END) as OtherCount,
COUNT(*) - SUM(CASE WHEN mimeType = 'message/rfc822' THEN 1 ELSE 0 END) as OtherCount2
from docSVsys with (nolock)
group by caseID
having COUNT(*) > 1000
select GETDATE()
go
select docSVsys.caseID, COUNT(*) as Ttl
, COUNT(primaryCount.sID) as priCount
, COUNT(*) - COUNT(primaryCount.sID) as otherCount
from docSVsys with (nolock)
left outer join docSVsys as primaryCount with (nolock)
on primaryCount.sID = docSVsys.sID
and primaryCount.mimeType = 'message/rfc822'
group by docSVsys.caseID
having COUNT(*) > 1000
select GETDATE()
go