Aggregating based on GROUPING of multiple columns - sql

I am trying to subquery and aggregate in SQL after doing an initial query with multiple joins. My ultimate goal is to get a count (or a sum) of specimens tested based on a grouping of multiple columns. This is slightly different from SQL Server query - Selecting COUNT(*) with DISTINCT and SQL Server: aggregate error on grouping.
The three tables that I use (PERSON, SPECIMEN, TEST) have one-to-many relationships: a PERSON has many SPECIMENS, and each SPECIMEN has many TESTS. I used three inner joins to combine these tables with an additional table (ANALYSIS).
WITH TALLY as (
SELECT PERSON.NAME, PERSON.PHASE, TEST.DATE_STARTED, TEST.ANALYSIS, SPECIMEN.GROUP, TEST.STATUS,
ANALYSIS.ANALYSIS_TYPE, SPECIMEN.SPECIMEN_NUMBER
FROM DB.TEST
INNER JOIN
DB.SPECIMEN ON
TEST.SPECIMEN_NUMBER = SPECIMEN.SPECIMEN_NUMBER
INNER JOIN
DB.PERSON ON
SPECIMEN.PERSON = PERSON.NAME
INNER JOIN
DB.ANALYSIS ON
TEST.ANALYSIS = ANALYSIS.NAME
WHERE PERSON.NAME = 'Joe'
AND TEST.DATE_STARTED >= '20-DEC-16' AND TEST.DATE_STARTED <='01-APR-18'
AND PERSON.PHASE = 'PHASE1'
ORDER BY TEST.DATE_STARTED)
SELECT COUNT(DISTINCT ANALYSIS) as SPECIMEN_COUNT, DATE_STARTED, ANALYSIS, STATUS, GROUP, ANALYSIS_TYPE
FROM TALLY
GROUP BY DATE_STARTED, ANALYSIS, STATUS, GROUP, ANALYSIS_TYPE
ORDER BY DATE_STARTED;
This gives me repeated rows: the first grouping is repeated 4 times.
What I am trying to see is the first grouping aggregated into a single row with its total count.
Any thoughts as to what is missing? SUM instead of COUNT or in addition to COUNT creates an error. Thanks in advance!
9/17/2020 Update: I have tried adding a subquery because I also need to use a new column of metadata (ANALYSIS_TYPE_ALIAS) which is created in the first query through a CASE STATEMENT(...). I have also tried using another subquery with inner join to count based on those conditions to a temp table, but still cannot seem to aggregate to flatten the table. Here is my current attempt:
WITH TALLY as (
SELECT PERSON.NAME, PERSON.PHASE, TEST.DATE_STARTED, TEST.ANALYSIS, SPECIMEN.GROUP, TEST.STATUS,
ANALYSIS.ANALYSIS_TYPE...
FROM DB.TEST
INNER JOIN
DB.SPECIMEN ON
TEST.SPECIMEN_NUMBER = SPECIMEN.SPECIMEN_NUMBER
INNER JOIN
DB.PERSON ON
SPECIMEN.PERSON = PERSON.NAME
INNER JOIN
DB.ANALYSIS ON
TEST.ANALYSIS = ANALYSIS.NAME
WHERE PERSON.NAME = 'Joe'
AND TEST.DATE_STARTED >= '20-DEC-16' AND TEST.DATE_STARTED <='01-APR-18'
AND PERSON.PHASE = 'PHASE1'
ORDER BY TEST.DATE_STARTED),
SUMMARY_COMBO AS (SELECT DISTINCT(CONCAT(CONCAT(CONCAT(CONCAT(ANALYSIS, DATE_STARTED),STATUS), GROUP), ANALYSIS_TYPE_ALIAS))AS UUID,
TALLY.NAME, TALLY.PHASE, TALLY.DATE_STARTED, TALLY.ANALYSIS, TALLY.GROUP, TALLY.STATUS, TALLY.ANALYSIS_TYPE_ALIAS
FROM TALLY)
SELECT SUMMARY_COMBO.NAME, SUMMARY_COMBO.PHASE, SUMMARY_COMBO.DATE_STARTED, SUMMARY_COMBO.ANALYSIS,SUMMARY_COMBO.GROUP, SUMMARY_COMBO.STATUS, SUMMARY_COMBO.ANALYSIS_TYPE_ALIAS,
COUNT(SUMMARY_COMBO.ANALYSIS) OVER (PARTITION BY SUMMARY_COMBO.UUID) AS SPECIMEN_COUNT
FROM SUMMARY_COMBO
ORDER BY SUMMARY_COMBO.DATE_STARTED;
This gave me a table that shows aggregated counts but does not collapse the rows for each unique UUID. Is there a way to take the sum of the count? I've tried storing the count in a subquery and then referencing it, but I am missing something in how to group the 8 columns of data that I want to show plus the count of that combination of columns.
Thanks!

Remove ANALYSIS from both the GROUP BY clause and the select list, since that is the column whose distinct values you want to count. Otherwise, the query generates more groups than you need, and the count of distinct ANALYSIS values in each group is always 1.
WITH TALLY as ( ...)
SELECT COUNT(DISTINCT ANALYSIS) as SPECIMEN_COUNT, DATE_STARTED, STATUS, GROUP, ANALYSIS_TYPE
FROM TALLY
GROUP BY DATE_STARTED, STATUS, GROUP, ANALYSIS_TYPE
ORDER BY DATE_STARTED;
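To see why the extra grouping column matters, here is a minimal sketch using Python's built-in sqlite3 module. The tally table, its four rows, and the column name spec_group (standing in for GROUP, which is a reserved word) are made up for illustration and are not the asker's data.

```python
import sqlite3

# Hypothetical stand-in for the TALLY CTE: four tests on the same date/status/group.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tally (date_started TEXT, analysis TEXT, status TEXT, spec_group TEXT)"
)
conn.executemany(
    "INSERT INTO tally VALUES (?, ?, ?, ?)",
    [
        ("2017-01-05", "PH", "DONE", "A"),
        ("2017-01-05", "PH", "DONE", "A"),
        ("2017-01-05", "LEAD", "DONE", "A"),
        ("2017-01-05", "IRON", "DONE", "A"),
    ],
)

# Grouping by the counted column: one group per analysis, so every count is 1.
per_analysis = conn.execute(
    """
    SELECT date_started, analysis, status, spec_group,
           COUNT(DISTINCT analysis) AS specimen_count
    FROM tally
    GROUP BY date_started, analysis, status, spec_group
    ORDER BY analysis
    """
).fetchall()

# Dropping it from GROUP BY and the select list collapses the rows into one group.
collapsed = conn.execute(
    """
    SELECT date_started, status, spec_group,
           COUNT(DISTINCT analysis) AS specimen_count
    FROM tally
    GROUP BY date_started, status, spec_group
    """
).fetchall()

print(per_analysis)
print(collapsed)
```

In this sample the collapsed query reports 3 distinct analyses. If the goal were the number of test rows rather than distinct analyses, COUNT(*) would give 4 for the same group.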

Related

How to pull distinct counts from two different tables at once and align rows

I have two separate tables: emr and treatment. Each table has a userID column and a provider column. Currently, I'm doing a simple pull to count the number of distinct userIDs that appear in the emr table like this:
SELECT distinct vender, count (distinct userID) AS EMR_Patients
from emr
group by 1
This gets me the following output:
vender | EMR_Patients
+++++++++++++++++++++
a      | 10,000
b      | 5,000
c      | 37,500
However, I want to include the number of userIDs that also appear in the treatment table so I can see how many userID's that have an emr record and also have a treatment of interest. The output I'm trying to get is:
vender | EMR_Patients | Treatment_Patients
++++++++++++++++++++++++++++++++++++++++++
a      | 10,000       | 4,000
b      | 5,000        | 3,000
c      | 37,500       | 9,000
I tried using a union:
SELECT distinct vender, count (distinct userID) AS EMR_Patients
FROM emr
GROUP BY 1
UNION ALL
(SELECT distinct vender, count (distinct userID) AS Treatment_Patients
FROM treatment
GROUP BY 1)
But this doesn't work correctly. Is there a way to do this as a union, or should I left join the two tables together beforehand? Or maybe there's a cleaner way than either of these options?
Use JOIN:
SELECT e.vendor, e.EMR_Patients, t.Treatment_Patients
FROM (SELECT vendor, count(distinct userID) AS EMR_Patients
      FROM emr
      GROUP BY 1
     ) e JOIN
     (SELECT vendor, count(distinct userID) AS Treatment_Patients
      FROM treatment
      GROUP BY 1
     ) t
     ON e.vendor = t.vendor;
(I adjusted the spelling of "vendor".)
This will only include vendors in both tables. If you want vendors that are missing from one of the tables, you need an outer join of some sort. Your question is not clear on that.
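As a rough illustration of the outer-join variant mentioned above, the sketch below runs the two aggregated subqueries through Python's sqlite3 module and uses a LEFT JOIN so that vendors with no treatment rows still appear. The sample rows are invented, and COALESCE is added here only to show 0 instead of NULL.

```python
import sqlite3

# Made-up sample data; vendor 'c' has EMR rows but no treatment rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE emr (vendor TEXT, userID INTEGER);
    CREATE TABLE treatment (vendor TEXT, userID INTEGER);
    INSERT INTO emr VALUES ('a', 1), ('a', 2), ('a', 2), ('b', 3), ('c', 4);
    INSERT INTO treatment VALUES ('a', 1), ('a', 1), ('b', 3), ('b', 5);
    """
)

# Aggregate each table first, then join the one-row-per-vendor results.
vendor_counts = conn.execute(
    """
    SELECT e.vendor, e.EMR_Patients, COALESCE(t.Treatment_Patients, 0)
    FROM (SELECT vendor, COUNT(DISTINCT userID) AS EMR_Patients
          FROM emr
          GROUP BY vendor) e
    LEFT JOIN
         (SELECT vendor, COUNT(DISTINCT userID) AS Treatment_Patients
          FROM treatment
          GROUP BY vendor) t
      ON e.vendor = t.vendor
    ORDER BY e.vendor
    """
).fetchall()

print(vendor_counts)
```

Like the query above, this counts treatment users per vendor independently of the emr table. Restricting the second count to users who also have an emr record would need an extra join or EXISTS condition.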

Is there a way to use DISTINCT and COUNT(*) together to bulletproof your code against DUPLICATE entries?

I got help yesterday with a query that correctly counts multiple items in a column based on multiple criteria/columns. However, I would like to know whether there is a way to get the DISTINCT count of all the entries in the table based on an aggregated GROUP BY statement.
SELECT TIME = ap.day,
acms.tenantId,
acms.CallingService,
policyList = ltrim(sp.value),
policyInstanceList = ltrim(dp.value),
COUNT(*) AS DISTINCTCount
FROM dbo.acms_data acms
CROSS APPLY string_split(acms.policyList, ',') sp
CROSS APPLY string_split(acms.policyInstanceList, ',') dp
CROSS APPLY (select day = convert(date, acms.[Time])) ap
GROUP BY ap.day, acms.tenantId, sp.value, dp.value, acms.CallingService
I would like to know whether there is a workaround for using DISTINCT and COUNT(*) together, and whether it would affect my results, so that this query is robust against duplicate entries.
The reason I have to use COUNT(*) is that I am aggregating on every column in the table, not just one or a few specific columns.
DISTINCT can be combined with COUNT on a specific column, as in this example:
USE AdventureWorks2012
GO
-- This query shows 290 JobTitle
SELECT COUNT(JobTitle) Total_JobTitle
FROM [HumanResources].[Employee]
GO
-- This query shows only 67 JobTitle
SELECT COUNT( DISTINCT JobTitle) Total_Distinct_JobTitle
FROM [HumanResources].[Employee]
GO
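For the COUNT(*) case in the question, one possible workaround is to remove duplicate rows in a derived table with SELECT DISTINCT and then count what is left. The sketch below shows the idea with Python's sqlite3 module. The events table and its rows are hypothetical and much simpler than the asker's acms_data.

```python
import sqlite3

# Hypothetical table with one exact duplicate row.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (day TEXT, tenantId TEXT, policy TEXT);
    INSERT INTO events VALUES
        ('2020-01-01', 't1', 'p1'),
        ('2020-01-01', 't1', 'p1'),  -- exact duplicate row
        ('2020-01-01', 't1', 'p2'),
        ('2020-01-02', 't1', 'p1');
    """
)

# Plain COUNT(*) counts the duplicate twice.
raw = conn.execute(
    "SELECT day, tenantId, COUNT(*) FROM events GROUP BY day, tenantId ORDER BY day"
).fetchall()

# De-duplicate whole rows first, then count per group.
deduped = conn.execute(
    """
    SELECT day, tenantId, COUNT(*)
    FROM (SELECT DISTINCT day, tenantId, policy FROM events) d
    GROUP BY day, tenantId
    ORDER BY day
    """
).fetchall()

print(raw)
print(deduped)
```

This only changes the result when the GROUP BY covers a subset of the columns. If every column is in the GROUP BY, the de-duplicated count is 1 for every group by construction.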

Get Sum of quantities from multiple tables?

I have at least 8 tables in which I need to match the customer name, fetch the quantities, and sum all the quantities fetched from these 8 tables. I am trying to write a query that ignores any customer whose sum of quantities is zero.
As an example, let's take two tables, purchase_sugar and sales_sugar. I have tried a lot of queries, but only this one returns a result, and that result is wrong.
SELECT sum(purchase_sugar.qty + sales_sugar.qty) AS Total_Amount from purchase_sugar inner join sales_sugar on purchase_sugar.supplier = sales_sugar.customer WHERE purchase_sugar.supplier = "+str(x.id)+"
The Table structures are like:
purchase_sugar has two columns: supplier and qty.
sales_sugar has two columns: customer and qty.
How can I get the sum of quantities from these tables if I provide one name, search for it across the tables, and fetch the quantities? Also, the customer does not have to be present in all the tables. If the name is found in only one table, we should just get the quantity from that table, which is why I don't think a JOIN is useful, though I may be wrong.
To take care of the situation where a supplier/customer is not in all the tables, you can use union all and group by:
select name, sum(p_qty) as sum_p, sum(s_qty) as sum_s,
sum(p_qty) + sum(s_qty)
from ((select ps.supplier as name, ps.qty as p_qty, 0 as s_qty
from purchase_sugar ps
) union all
(select ss.customer as name, 0, ss.qty
from sales_sugar ss
)
) s
group by name;
Notes:
This query gets results for all names. You can use a where clause to restrict the results to one name.
You don't have to split the quantities into two (or eight) different columns, if you just want the overall sum.
You can aggregate before the union all, but that is not necessary.
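The notes above can be sketched in Python with the built-in sqlite3 module. This version sums a single qty column across both tables, filters by an optional name through a bound parameter instead of string concatenation, and uses HAVING to skip names whose total is zero, as the question asked. The sample rows and the totals helper are assumptions for illustration.

```python
import sqlite3

# Made-up data: 'bob' only purchases, 'cat' only buys, 'dan' sums to zero.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE purchase_sugar (supplier TEXT, qty INTEGER);
    CREATE TABLE sales_sugar (customer TEXT, qty INTEGER);
    INSERT INTO purchase_sugar VALUES ('ali', 10), ('ali', 5), ('bob', 7), ('dan', 0);
    INSERT INTO sales_sugar VALUES ('ali', 3), ('cat', 8);
    """
)

QUERY = """
    SELECT name, SUM(qty) AS total_qty
    FROM (SELECT supplier AS name, qty FROM purchase_sugar
          UNION ALL
          SELECT customer AS name, qty FROM sales_sugar) s
    WHERE (:name IS NULL OR name = :name)
    GROUP BY name
    HAVING SUM(qty) <> 0
    ORDER BY name
"""


def totals(name=None):
    """Return (name, total_qty) rows; pass a name to restrict the result."""
    return conn.execute(QUERY, {"name": name}).fetchall()


print(totals())
print(totals("cat"))
```

Extending this to eight tables would mean adding six more UNION ALL branches inside the derived table. The outer GROUP BY and HAVING stay the same.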
You should join the sums rather than sum the join:
select t1.purchase_sum + sales_sum as Total_Amount
from (
select purchase_sugar.supplier, sum(purchase_sugar.qty) as purchase_sum
from purchase_sugar
group by purchase_sugar.supplier
) t1
inner join (
select sales_sugar.customer, sum(sales_sugar.qty) as sales_sum
from sales_sugar
group by sales_sugar.customer
) t2 on t1.supplier = t2.customer and t1.supplier = "+str(x.id)+"

How can I COUNT rows from another table using a SELECT statement when joining?

This is the first time I've tried including a row count within a select statement. I've tried the following, but including COUNT(other row) is apparently not allowed in the way I'm trying to do it.
How can I include a row count from another table in a select statement, mainly consisting of objects from the first table?
-Thanks
...
SELECT
Reports.ReportID,
EmployeeADcontext,
ReportName,
CreatedDate,
COUNT(Expenses.ExpID) AS ExpCount,
ReportTotal,
Status
FROM
[dbo].[Reports]
INNER JOIN
[dbo].[Expenses]
ON
[dbo].[Expenses].ReportID = [dbo].[Reports].ReportID
WHERE EmployeeADcontext = @rptEmployeeADcontext
You are missing your GROUP BY. Whenever you select an aggregate (SUM, COUNT, MAX, etc.) alongside non-aggregated columns, you need a GROUP BY clause that lists every non-aggregated column in the select list. So your code should read:
SELECT
Reports.ReportID,
EmployeeADcontext,
ReportName,
CreatedDate,
COUNT(Expenses.ExpID) AS ExpCount,
ReportTotal,
Status
FROM
[dbo].[Reports]
INNER JOIN
[dbo].[Expenses]
ON
[dbo].[Expenses].ReportID = [dbo].[Reports].ReportID
WHERE EmployeeADcontext = @rptEmployeeADcontext
GROUP BY Reports.ReportID, EmployeeADcontext, ReportName, CreatedDate,
ReportTotal, Status
Here is some additional documentation on T-SQL GROUP BY.
You need a group by clause.
Add:
GROUP BY
Reports.ReportID,
EmployeeADcontext,
ReportName,
CreatedDate,
ReportTotal,
Status
You could use a sub-query to return the count. That way you don't need any joins. For example:
SELECT
r.ReportID,
r.EmployeeADcontext,
r.ReportName,
r.CreatedDate,
(select COUNT(e1.ExpID) FROM Expenses e1 where e1.ReportID = r.ReportId) AS ExpCount,
r.ReportTotal,
r.Status
FROM Reports r
WHERE r.EmployeeADcontext = @rptEmployeeADcontext
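One practical difference between the join-based answers and the sub-query version is how they treat reports with no expenses. The sketch below, run through Python's sqlite3 module on a cut-down, made-up version of the two tables, suggests that the INNER JOIN drops such reports while the correlated sub-query returns them with a count of 0.

```python
import sqlite3

# Simplified hypothetical schema: report 2 has no expenses.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Reports (ReportID INTEGER, ReportName TEXT);
    CREATE TABLE Expenses (ExpID INTEGER, ReportID INTEGER);
    INSERT INTO Reports VALUES (1, 'Travel'), (2, 'Office');
    INSERT INTO Expenses VALUES (10, 1), (11, 1);
    """
)

# INNER JOIN + GROUP BY: only reports with at least one expense survive.
joined = conn.execute(
    """
    SELECT r.ReportID, r.ReportName, COUNT(e.ExpID) AS ExpCount
    FROM Reports r
    INNER JOIN Expenses e ON e.ReportID = r.ReportID
    GROUP BY r.ReportID, r.ReportName
    ORDER BY r.ReportID
    """
).fetchall()

# Correlated sub-query: every report is returned, with 0 when nothing matches.
subquery = conn.execute(
    """
    SELECT r.ReportID, r.ReportName,
           (SELECT COUNT(e1.ExpID)
            FROM Expenses e1
            WHERE e1.ReportID = r.ReportID) AS ExpCount
    FROM Reports r
    ORDER BY r.ReportID
    """
).fetchall()

print(joined)
print(subquery)
```

A LEFT JOIN with the same GROUP BY would also keep the empty report, so the choice between the two forms is largely one of readability.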

Selecting DISTINCT records on subset of columns with MAX in another column

I have been looking at other T-SQL questions including DISTINCT and MAX here on the site for a couple of hours now, but cannot find anything that quite matches my need. Here is a description of my dataset and query objectives. Any guidance is much appreciated.
Dataset
Dataset is a list of customers, customer sites, dates and values from the last billing cycle, with the following columns. It is possible for a single customer to have multiple sites:
Customer, Site, Date, Counter, CounterValue, CollectorNode
Query Requirements
For the given billing cycle, I would like to select the following
DISTINCT (Customer and Site)
MAX(CounterValue) for this billing cycle for each DISTINCT Customer and Site
While still returning all the fields for that record from the table (CollectorNode, Date, Counter)
My challenge here is my inability to return all the columns while selecting the DISTINCT columns and MAX for each. My many varied attempts return multiple records for each customer/site combination.
Using a self JOIN:
SELECT ds.customer,
ds.site,
ds.counter,
ds.countervalue,
ds.collectornode
FROM DATASET ds
JOIN (SELECT t.customer,
t.site,
MAX(t.countervalue) AS max_countervalue
FROM DATASET t
GROUP BY t.customer, t.site) x ON x.customer = ds.customer
AND x.site = ds.site
AND x.max_countervalue = ds.countervalue
Using a CTE & ROW_NUMBER (SQL Server 2005+):
WITH example AS (
SELECT ds.customer,
ds.site,
ds.counter,
ds.countervalue,
ds.collectornode,
ROW_NUMBER() OVER(PARTITION BY ds.customer, ds.site
ORDER BY ds.countervalue DESC) AS rank
FROM DATASET ds)
SELECT e.customer,
e.site,
e.counter,
e.countervalue,
e.collectornode
FROM example e
WHERE e.rank = 1
Use a subquery to do the grouping and join the result back to the original table, like this:
SELECT g.Customer, g.Site, c.Date, c.Counter, g.MaxCounterValue, c.CollectorNode
FROM Customers c
INNER JOIN
(
SELECT Customer, Site, MAX(CounterValue) MaxCounterValue
FROM Customers
GROUP BY Customer, Site
) g
ON g.Customer = c.Customer
AND g.Site = c.Site
AND g.MaxCounterValue = c.CounterValue
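The join-back and ROW_NUMBER approaches differ when two rows tie on the maximum CounterValue. The sketch below runs both through Python's sqlite3 module (window functions need SQLite 3.25 or later) on a small invented dataset. The extra ORDER BY on counter inside ROW_NUMBER is an assumed tie-breaker, not something the question specified.

```python
import sqlite3

# Invented rows; site s2 has two rows tied on the maximum countervalue.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dataset (customer TEXT, site TEXT, counter TEXT,
                          countervalue INTEGER, collectornode TEXT);
    INSERT INTO dataset VALUES
        ('acme', 's1', 'c1', 10, 'n1'),
        ('acme', 's1', 'c2', 25, 'n2'),
        ('acme', 's2', 'c1', 7,  'n1'),
        ('acme', 's2', 'c3', 7,  'n3');
    """
)

# Join back to the per-group maximum: ties produce more than one row per site.
join_rows = conn.execute(
    """
    SELECT ds.customer, ds.site, ds.counter, ds.countervalue, ds.collectornode
    FROM dataset ds
    JOIN (SELECT customer, site, MAX(countervalue) AS max_cv
          FROM dataset
          GROUP BY customer, site) x
      ON x.customer = ds.customer
     AND x.site = ds.site
     AND x.max_cv = ds.countervalue
    ORDER BY ds.site, ds.counter
    """
).fetchall()

# ROW_NUMBER keeps exactly one row per site; counter breaks ties here.
ranked_rows = conn.execute(
    """
    WITH ranked AS (
        SELECT customer, site, counter, countervalue, collectornode,
               ROW_NUMBER() OVER (PARTITION BY customer, site
                                  ORDER BY countervalue DESC, counter) AS rn
        FROM dataset)
    SELECT customer, site, counter, countervalue, collectornode
    FROM ranked
    WHERE rn = 1
    ORDER BY site
    """
).fetchall()

print(join_rows)
print(ranked_rows)
```

If exactly one row per Customer and Site is required, the ROW_NUMBER form with an explicit tie-breaker is probably the safer choice. The join form is fine when ties cannot occur or when all tied rows are wanted.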