How to pull distinct counts from two different tables at once and align rows - sql

I have two separate tables: emr and treatment. Each table has a userID column and a provider column. Currently, I'm doing a simple pull to count the number of distinct userIDs that appear in the emr table like this:
SELECT distinct vender, count (distinct userID) AS EMR_Patients
from emr
group by 1
This gets me the following output:
vender | EMR_Patients
+++++++++++++++++++++
a 10,000
b 5,000
c 37,500
However, I want to include the number of userIDs that also appear in the treatment table, so I can see how many userIDs have an emr record and also have a treatment of interest. The output I'm trying to get is:
vender | EMR_Patients| Treatment_Patients
+++++++++++++++++++++++++++++++++++++++++
a 10,000 4,000
b 5,000 3,000
c 37,500 9,000
I tried using a union:
SELECT distinct vender, count (distinct userID) AS EMR_Patients
FROM emr
GROUP BY 1
UNION ALL
(SELECT distinct vender, count (distinct userID) AS Treatment_Patients
FROM treatment
GROUP BY 1)
But this doesn't work correctly. Is there a way to do this as a union, or should I left join the two tables together beforehand? Or maybe there's a cleaner way than either of these options?

Use JOIN:
SELECT e.vendor, e.EMR_Patients, t.Treatment_Patients
FROM (SELECT vendor, count(distinct userID) AS EMR_Patients
FROM emr
GROUP BY 1
) e JOIN
(SELECT vendor, count(distinct userID) AS Treatment_Patients
FROM treatment
GROUP BY 1
) t
ON e.vendor = t.vendor;
(I adjusted the spelling of "vendor".)
This will only include vendors in both tables. If you want vendors that are missing from one of the tables, you need an outer join of some sort. Your question is not clear on that.
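If every vendor in emr should appear even when it has no treatment rows, one option is a LEFT JOIN with COALESCE. The following is only a sketch: it runs the query against an in-memory SQLite database from Python, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE emr (vendor TEXT, userID INTEGER);
    CREATE TABLE treatment (vendor TEXT, userID INTEGER);
    INSERT INTO emr VALUES ('a', 1), ('a', 2), ('a', 2), ('b', 3), ('c', 4);
    INSERT INTO treatment VALUES ('a', 1), ('a', 1), ('b', 3), ('b', 5);
""")

# Aggregate each table on its own, then LEFT JOIN so vendors without
# treatment rows are kept and reported as 0 instead of being dropped.
rows = conn.execute("""
    SELECT e.vendor,
           e.EMR_Patients,
           COALESCE(t.Treatment_Patients, 0) AS Treatment_Patients
    FROM (SELECT vendor, COUNT(DISTINCT userID) AS EMR_Patients
          FROM emr
          GROUP BY vendor) e
    LEFT JOIN (SELECT vendor, COUNT(DISTINCT userID) AS Treatment_Patients
               FROM treatment
               GROUP BY vendor) t
           ON e.vendor = t.vendor
    ORDER BY e.vendor
""").fetchall()

for row in rows:
    print(row)
```

Vendor 'c' has no treatment rows in this sample, so an inner join would drop it, while the LEFT JOIN keeps it with a count of 0.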

Related

Get rows from primary table, and also a count of how many times that record appears in a secondary table (including 0's)

I have a Dogs table, a Kennels table and Visits table that contains DogId and KennelId columns.
I am trying to get a full list of all the dogs, with a column showing the number of visits to a particular kennel, so many of the results will contain a 0 as the visit count.
This is what I've tried:
select dog.*, visits.visitCount FROM
(select * from Dogs) as dog,
(select COUNT (Visits.Id) as visitCount from Visits INNER JOIN Dogs ON Dogs.Id =
Visits.DogId where KennelId = 'E15A8C60-E0FE-472D-9CC4-08DA251A992F') as visits
With this statement, I end up with all of the dogs, but with the same visit count for all, which is incorrect. I assume my count function is simply executed once with the result repeated for the remaining rows. I do not know how to correct this. Any help will be much appreciated!
With no table schemas or sample data, a guess would be something like the following:
select d.*, Coalesce(v.VisitCount,0) VisitCount
from dogs d
left join (
select DogId, Count(*) VisitCount
from visits v
where v.KennelId = 'E15A8C60-E0FE-472D-9CC4-08DA251A992F'
group by DogId
) v on v.DogId = d.Id;
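An alternative to the derived table is to put the kennel filter in the ON clause and count the joined rows directly. This is a sketch under assumptions: it uses SQLite from Python, the Name column and the short kennel ids 'K1' and 'K2' are invented for the example, and the join uses Dogs.Id as in the question.

```python
import sqlite3

# Hypothetical schema and data; the real tables were not shown in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dogs (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Visits (Id INTEGER PRIMARY KEY, DogId INTEGER, KennelId TEXT);
    INSERT INTO Dogs VALUES (1, 'Rex'), (2, 'Fido'), (3, 'Bella');
    INSERT INTO Visits (DogId, KennelId) VALUES
        (1, 'K1'), (1, 'K1'), (1, 'K2'), (2, 'K2');
""")

# The kennel filter lives in the ON clause, so dogs with no matching visit
# still produce one row with NULLs, and COUNT(v.Id) counts those as 0.
visit_counts = conn.execute("""
    SELECT d.Id, d.Name, COUNT(v.Id) AS VisitCount
    FROM Dogs d
    LEFT JOIN Visits v
           ON v.DogId = d.Id
          AND v.KennelId = ?
    GROUP BY d.Id, d.Name
    ORDER BY d.Id
""", ("K1",)).fetchall()

for row in visit_counts:
    print(row)
```

Moving the KennelId condition into a WHERE clause would discard the NULL rows and turn the query back into an effective inner join, which loses the dogs with zero visits.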

Postgres: Count multiple events for distinct dates

People of Stack Overflow!
Thanks for taking the time to read this question. What I am trying to accomplish is to pivot some data all from just one table.
The original table has multiple datetime entries of specific events (e.g. when the customer was added add_time and when the customer was lost lost_time).
This is one part of two rows of the deals table:
id | add_time | last_mail_time | lost_time
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
5 | 2020-03-24 09:29:24 | 2020-04-03 13:20:29 | NULL
310 | 2020-03-24 09:29:24 | NULL | 2020-04-03 13:20:29
I want to create a view of this table. A view that has one row for each distinct date and counts the number of events at this specific time.
This is the goal (times do not match with the example!):
I have working code, like this:
SELECT DISTINCT
change_datetime,
(SELECT COUNT(add_time) as add_time_count FROM deals WHERE add_time::date = change_datetime),
(SELECT COUNT(lost_time) as lost_time_count FROM deals WHERE lost_time::date = change_datetime)
FROM (
SELECT
add_time::date AS change_datetime
FROM
deals
UNION ALL
SELECT
lost_time::date AS change_datetime
FROM
deals
) AS foo
WHERE change_datetime IS NOT NULL
ORDER BY
change_datetime;
but this has some ugly O(n^2) queries and takes a lot of time.
Is there a better, more performant way to achieve this?
Thanks!!
You can use a lateral join to unpivot and then aggregate:
select t::date as change_date,
count(*) filter (where which = 'add') as add_time_count,
count(*) filter (where which = 'mail') as last_mail_time_count,
count(*) filter (where which = 'lost') as lost_time_count
from deals d cross join lateral
(values (add_time, 'add'),
(last_mail_time, 'mail'),
(lost_time, 'lost')
) v(t, which)
where t is not null -- skip events that never happened
group by t::date;
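To try the unpivot-then-aggregate idea without a Postgres server, here is a rough sketch in SQLite driven from Python. SQLite has no LATERAL, so this swaps in a UNION ALL unpivot and conditional SUMs in place of the lateral VALUES list and FILTER. It is not the same query as above, and it scans deals once per event column. The two rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE deals (id INTEGER, add_time TEXT, last_mail_time TEXT, lost_time TEXT);
    INSERT INTO deals VALUES
        (5,   '2020-03-24 09:29:24', '2020-04-03 13:20:29', NULL),
        (310, '2020-03-24 09:29:24', NULL, '2020-04-03 13:20:29');
""")

# Unpivot the three timestamp columns into (t, which) pairs, drop the NULLs,
# then count each event type per calendar date in a single GROUP BY.
daily_counts = conn.execute("""
    SELECT date(t) AS change_date,
           SUM(CASE WHEN which = 'add'  THEN 1 ELSE 0 END) AS add_count,
           SUM(CASE WHEN which = 'mail' THEN 1 ELSE 0 END) AS mail_count,
           SUM(CASE WHEN which = 'lost' THEN 1 ELSE 0 END) AS lost_count
    FROM (SELECT add_time AS t, 'add' AS which FROM deals
          UNION ALL
          SELECT last_mail_time, 'mail' FROM deals
          UNION ALL
          SELECT lost_time, 'lost' FROM deals) v
    WHERE t IS NOT NULL
    GROUP BY date(t)
    ORDER BY change_date
""").fetchall()

for row in daily_counts:
    print(row)
```

Either way, the point is that one grouped pass replaces the correlated subqueries that ran once per distinct date.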

Aggregating based on GROUPING of multiple columns

I am trying to subquery and aggregate in SQL after doing an initial query with multiple joins. My ultimate goal is to get a count (or a sum) of specimens tested based on a grouping of multiple columns. This is slightly different from SQL Server query - Selecting COUNT(*) with DISTINCT and SQL Server: aggregate error on grouping.
The three tables that I use (PERSON, SPECIMEN, TEST), have 1-many relationships. So PERSON has many SPECIMENS and those SPECIMENS have many TESTS. I did three inner joins to combine these tables plus an additional table (ANALYSIS).
WITH TALLY as (
SELECT PERSON.NAME, PERSON.PHASE, TEST.DATE_STARTED, TEST.ANALYSIS, SPECIMEN.GROUP, TEST.STATUS,
ANALYSIS.ANALYSIS_TYPE, SPECIMEN.SPECIMEN_NUMBER
FROM DB.TEST
INNER JOIN
DB.SAMPLE ON
TEST.SPECIMEN_NUMBER = SPECIMEN.SPECIMEN_NUMBER
INNER JOIN
DB.PRODUCT ON
SPECIMEN.PERSON = PERSON.NAME
INNER JOIN
DB.ANALYSIS ON
TEST.ANALYSIS = ANALYSIS.NAME
WHERE PERSON.NAME = 'Joe'
AND TEST.DATE_STARTED >= '20-DEC-16' AND TEST.DATE_STARTED <='01-APR-18'
AND PERSON.PHASE = 'PHASE1'
ORDER BY TEST.DATE_STARTED)
SELECT COUNT(DISTINCT ANALYSIS) as SPECIMEN_COUNT, DATE_STARTED, ANALYSIS, STATUS, GROUP, ANALYSIS_TYPE
FROM TALLY
GROUP BY DATE_STARTED, ANALYSIS, STATUS, GROUP, ANALYSIS_TYPE
ORDER BY DATE_STARTED;
This gives me the repeated columns: first grouping repeated 4 times
What I am trying to see is: aggregated first grouping with total count
Any thoughts as to what is missing? SUM instead of COUNT or in addition to COUNT creates an error. Thanks in advance!
9/17/2020 Update: I have tried adding a subquery because I also need to use a new column of metadata (ANALYSIS_TYPE_ALIAS) which is created in the first query through a CASE STATEMENT(...). I have also tried using another subquery with inner join to count based on those conditions to a temp table, but still cannot seem to aggregate to flatten the table. Here is my current attempt:
WITH TALLY as (
SELECT PERSON.NAME, PERSON.PHASE, TEST.DATE_STARTED, TEST.ANALYSIS, SPECIMEN.GROUP, TEST.STATUS,
ANALYSIS.ANALYSIS_TYPE...
FROM DB.TEST
INNER JOIN
DB.SAMPLE ON
TEST.SPECIMEN_NUMBER = SPECIMEN.SPECIMEN_NUMBER
INNER JOIN
DB.PRODUCT ON
SPECIMEN.PERSON = PERSON.NAME
INNER JOIN
DB.ANALYSIS ON
TEST.ANALYSIS = ANALYSIS.NAME
WHERE PERSON.NAME = 'Joe'
AND TEST.DATE_STARTED >= '20-DEC-16' AND TEST.DATE_STARTED <='01-APR-18'
AND PERSON.PHASE = 'PHASE1'
ORDER BY TEST.DATE_STARTED),
SUMMARY_COMBO AS (SELECT DISTINCT(CONCAT(CONCAT(CONCAT(CONCAT(ANALYSIS, DATE_STARTED),STATUS), GROUP), ANALYSIS_TYPE_ALIAS))AS UUID,
TALLY.NAME, TALLY.PHASE, TALLY.DATE_STARTED, TALLY.ANALYSIS, TALLY.GROUP, TALLY.STATUS, TALLY.ANALYSIS_TYPE_ALIAS
FROM TALLY)
SELECT SUMMARY_COMBO.NAME, SUMMARY_COMBO.PHASE, SUMMARY_COMBO.DATE_STARTED, SUMMARY_COMBO.ANALYSIS,SUMMARY_COMBO.GROUP, SUMMARY_COMBO.STATUS, SUMMARY_COMBO.ANALYSIS_TYPE_ALIAS,
COUNT(SUMMARY_COMBO.ANALYSIS) OVER (PARTITION BY SUMMARY_COMBO.UUID) AS SPECIMEN_COUNT
FROM SUMMARY_COMBO
ORDER BY SUMMARY_COMBO.DATE_STARTED;
This gave me a table that shows aggregated counts, but it doesn't aggregate based on the unique UUID. Is there a way to take the sum of the count? I've tried to do this by storing the count in a subquery and then referencing that count, but I am missing something in how to group the 8 columns of data that I want to show plus the count of that combination of columns.
Thanks!
Just remove analysis from the group by clause, since that's the column whose distinct values you want to count. Otherwise, the query generates more groups than what you need (and the count of distinct analysis values in each group is always 1).
WITH TALLY as ( ...)
SELECT COUNT(DISTINCT ANALYSIS) as SPECIMEN_COUNT, DATE_STARTED, STATUS, GROUP, ANALYSIS_TYPE
FROM TALLY
GROUP BY DATE_STARTED, STATUS, GROUP, ANALYSIS_TYPE
ORDER BY DATE_STARTED;
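The effect is easy to see on a toy table. The sketch below uses SQLite from Python with a made-up three-column tally table (the real CTE has more columns) and compares grouping with and without the counted column.

```python
import sqlite3

# Hypothetical, simplified stand-in for the TALLY CTE.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tally (date_started TEXT, analysis TEXT, status TEXT);
    INSERT INTO tally VALUES
        ('2017-01-05', 'PH',    'DONE'),
        ('2017-01-05', 'ASSAY', 'DONE'),
        ('2017-01-05', 'ASSAY', 'DONE'),
        ('2017-01-06', 'PH',    'DONE');
""")

# Grouping by analysis as well: every group holds a single analysis value,
# so COUNT(DISTINCT analysis) can only ever be 1.
with_analysis = conn.execute("""
    SELECT date_started, analysis, status, COUNT(DISTINCT analysis) AS specimen_count
    FROM tally
    GROUP BY date_started, analysis, status
    ORDER BY date_started, analysis
""").fetchall()

# Dropping analysis from both SELECT and GROUP BY collapses those groups,
# and the distinct count becomes meaningful.
without_analysis = conn.execute("""
    SELECT date_started, status, COUNT(DISTINCT analysis) AS specimen_count
    FROM tally
    GROUP BY date_started, status
    ORDER BY date_started
""").fetchall()

print(with_analysis)
print(without_analysis)
```

If the goal is the number of tests rather than the number of distinct analyses, COUNT(*) in the second query would give that instead.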

Get Sum of quantities from multiple tables?

I have at least 8 tables in which I need to match the customer name, fetch the quantities, and get the sum of all the quantities fetched from these 8 tables. I am trying to write code that ignores any customer whose sum of quantities is zero.
For example, take two tables, purchase_sugar and sales_sugar. I have tried a lot of queries, but only this one returns any result, and that result is wrong.
SELECT sum(purchase_sugar.qty + sales_sugar.qty) AS Total_Amount from purchase_sugar inner join sales_sugar on purchase_sugar.supplier = sales_sugar.customer WHERE purchase_sugar.supplier = "+str(x.id)+"
The table structures are:
purchase_sugar has two columns, supplier and qty.
sales_sugar has two columns, customer and qty.
How can I get the sum of the quantities from these tables if I provide one name and search for it in each of them? Also, the customer will not necessarily be found in every table. If the name is found in only one table, we should just get the quantity from that table, and for that reason I don't think a JOIN is useful, but maybe I am wrong.
To take care of the situation where a supplier/customer is not in all the tables, you can use union all and group by:
select name, sum(p_qty) as sum_p, sum(s_qty) as sum_s,
sum(p_qty) + sum(s_qty)
from ((select ps.supplier as name, ps.qty as p_qty, 0 as s_qty
from purchase_sugar ps
) union all
(select ss.customer as name, 0, ss.qty
from sales_sugar ss
)
) s
group by name;
Notes:
This query gets results for all names. You can use a where clause to restrict the results to one name.
You don't have to split the quantities into two (or eight) different columns, if you just want the overall sum.
You can aggregate before the union all, but that is not necessary.
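Since the query in the question looks like it is being built in Python, here is a sketch of the single-total variant with a HAVING clause to skip zero totals. It binds the name as a parameter instead of concatenating it into the SQL string. The sample rows are invented, SQLite is used only so the example runs on its own, and the :name placeholder syntax will differ with other database drivers.

```python
import sqlite3

# Hypothetical data; only the table and column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchase_sugar (supplier TEXT, qty INTEGER);
    CREATE TABLE sales_sugar (customer TEXT, qty INTEGER);
    INSERT INTO purchase_sugar VALUES ('alice', 10), ('alice', 5), ('bob', 0);
    INSERT INTO sales_sugar VALUES ('alice', 3), ('carol', 7);
""")

# Stack the tables with UNION ALL, sum per name, and drop zero totals.
# Passing NULL for :name returns every name; otherwise only that name.
TOTALS_SQL = """
    SELECT name, SUM(qty) AS total_qty
    FROM (SELECT supplier AS name, qty FROM purchase_sugar
          UNION ALL
          SELECT customer AS name, qty FROM sales_sugar) s
    WHERE (:name IS NULL OR name = :name)
    GROUP BY name
    HAVING SUM(qty) <> 0
    ORDER BY name
"""

all_totals = conn.execute(TOTALS_SQL, {"name": None}).fetchall()
carol_total = conn.execute(TOTALS_SQL, {"name": "carol"}).fetchall()

print(all_totals)
print(carol_total)
```

Here 'carol' appears only in sales_sugar and still gets a total, while 'bob' is left out because his quantities sum to zero. Extending this to eight tables means adding more UNION ALL branches.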
You should JOIN the sums, not sum the join:
select t1.purchase_sum + sales_sum as Total_Amount
from (
select purchase_sugar.supplier, sum(purchase_sugar.qty) as purchase_sum
from purchase_sugar
group by purchase_sugar.supplier
) t1
inner join (
select sales_sugar.customer, sum(sales_sugar.qty) as sales_sum
from sales_sugar
group by sales_sugar.customer
) t2 on t1.supplier = t2.customer and t1.supplier = "+str(x.id)+"

Selecting DISTINCT records on subset of columns with MAX in another column

I have been looking at other T-SQL questions including DISTINCT and MAX here on the site for a couple hours now, but cannot find anything that quite matches my need. Here is a description of my dataset and query objectives. Any guidance is much appreciated.
Dataset
Dataset is a list of customers, customer sites, dates and values from the last billing cycle, with the following columns. It is possible for a single customer to have multiple sites:
Customer, Site, Date, Counter, CounterValue, CollectorNode
Query Requirements
For the given billing cycle, I would like to select the following
DISTINCT (Customer and Site)
MAX(CounterValue) for this billing cycle for each DISTINCT Customer and Site
While still returning all the fields for that record from the table (CollectorNode, Date, Counter)
My challenge here is my inability to return all the columns while selecting the DISTINCT columns and MAX for each. My many varied attempts return multiple records for each customer/site combination.
Using a self JOIN:
SELECT ds.customer,
ds.site,
ds.counter,
ds.countervalue,
ds.collectornode
FROM DATASET ds
JOIN (SELECT t.customer,
t.site,
MAX(t.countervalue) AS max_countervalue
FROM DATASET t
GROUP BY t.customer, t.site) x ON x.customer = ds.customer
AND x.site = ds.site
AND x.max_countervalue = ds.countervalue
Using a CTE & ROW_NUMBER (SQL Server 2005+):
WITH example AS (
SELECT ds.customer,
ds.site,
ds.counter,
ds.countervalue,
ds.collectornode,
ROW_NUMBER() OVER(PARTITION BY ds.customer, ds.site
ORDER BY ds.countervalue DESC) AS rank
FROM DATASET ds)
SELECT e.customer,
e.site,
e.counter,
e.countervalue,
e.collectornode
FROM example e
WHERE e.rank = 1
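The two approaches differ when several rows tie for the maximum. The sketch below reproduces both on made-up data, using SQLite from Python instead of SQL Server (window functions need SQLite 3.25 or later). The extra tie-breaker on counter in the ORDER BY is an assumption added so the ROW_NUMBER result is deterministic.

```python
import sqlite3

# Hypothetical rows; acme/north has two rows tied at the maximum value 90.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dataset (customer TEXT, site TEXT, counter TEXT,
                          countervalue INTEGER, collectornode TEXT);
    INSERT INTO dataset VALUES
        ('acme', 'north', 'c1', 40, 'n1'),
        ('acme', 'north', 'c2', 90, 'n2'),
        ('acme', 'north', 'c3', 90, 'n3'),
        ('acme', 'south', 'c1', 15, 'n1');
""")

# Self join on the per-group maximum: returns every row that ties for the max.
join_rows = conn.execute("""
    SELECT ds.customer, ds.site, ds.counter, ds.countervalue
    FROM dataset ds
    JOIN (SELECT customer, site, MAX(countervalue) AS max_countervalue
          FROM dataset
          GROUP BY customer, site) x
      ON x.customer = ds.customer
     AND x.site = ds.site
     AND x.max_countervalue = ds.countervalue
    ORDER BY ds.site, ds.counter
""").fetchall()

# ROW_NUMBER: exactly one row per customer/site, ties broken by counter.
row_number_rows = conn.execute("""
    WITH ranked AS (
        SELECT customer, site, counter, countervalue,
               ROW_NUMBER() OVER (PARTITION BY customer, site
                                  ORDER BY countervalue DESC, counter) AS rn
        FROM dataset)
    SELECT customer, site, counter, countervalue
    FROM ranked
    WHERE rn = 1
    ORDER BY site
""").fetchall()

print(join_rows)
print(row_number_rows)
```

If exactly one record per customer and site is required, the ROW_NUMBER form is the safer choice. The join form is useful when all tied records should be shown.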
Use a subquery to do the grouping and join the result back to the original table, like this:
SELECT g.Customer, g.Site, c.Date, c.Counter, g.MaxCounterValue, c.CollectorNode
FROM Customers c
INNER JOIN
(
SELECT Customer, Site, MAX(CounterValue) MaxCounterValue
FROM Customers
GROUP BY Customer, Site
) g
ON g.Customer = c.Customer
AND g.Site = c.Site
AND g.MaxCounterValue = c.CounterValue