Counting DISTINCT in FULL OUTER JOIN - sql

I'm sure there's a simple solution to this, which my pea brain is unable to comprehend right now.
I'm using the following query with a FULL OUTER JOIN and I would like to COUNT the DISTINCT memberid:
SELECT a.year,
COUNT(DISTINCT a.memberid) AS members
FROM (SELECT DISTINCT YEAR,
memberid
FROM (SELECT EXTRACT(YEAR FROM created_at) AS YEAR,
EXTRACT(MONTH FROM created_at) AS MONTH,
member_id AS memberid,
COUNT(DISTINCT field1) AS field1
FROM table1
GROUP BY YEAR,
MONTH,
member_id
ORDER BY YEAR,
MONTH,
field1 DESC)) a
FULL OUTER JOIN (SELECT DISTINCT YEAR,
memberid
FROM (SELECT EXTRACT(YEAR FROM created) AS YEAR,
EXTRACT(MONTH FROM created) AS MONTH,
memberid,
COUNT(field2) AS field2
FROM table2
GROUP BY YEAR,
MONTH,
memberid
ORDER BY YEAR,
MONTH,
field2 DESC)) b
ON a.year = b.year
AND a.memberid = b.memberid
GROUP BY a.year
ORDER BY a.year
This query executes properly, but I'm quite sure that the results are not what I expect.
I get the following results:
2014 26834
2015 58573
2016 178378
2017 233291
2018 297404
2019 281088
Let's call the queries on either side of the FULL OUTER JOIN the Left query and the Right query for now. When I aggregate the Right query by year and count the distinct memberid, I get the following results:
2013 3915
2014 59025
2015 115514
2016 176528
2017 216675
2018 301007
2019 311141
As we can see, the DISTINCT COUNT for the Right query by itself is higher than for the complete query with the FULL OUTER JOIN. This obviously doesn't make sense.
In my final result, I would like to run a COUNT DISTINCT on ALL the memberids (i.e. the memberids that appear in the Left query plus those that appear in the Right query, without counting any memberid twice) and aggregate it by year.
I know the solution to this has to be simple. Any help would be much appreciated.

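You are only counting a.memberid, which means anything that exists only on the right side is ignored: for those rows both a.memberid and a.year are NULL, so they fall into a NULL year group where COUNT(DISTINCT a.memberid) skips them.
To make this work, do a UNION of the left and right sides (each returning year and memberid), then GROUP BY year and count(distinct memberid).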
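A minimal sketch of that suggestion, run through Python's built-in sqlite3 module so it can be executed as-is. The tables left_q and right_q and their rows are hypothetical stand-ins for the outputs of the Left and Right queries; in the real query you would put those two subqueries in their place.

```python
import sqlite3

# In-memory database with two hypothetical tables standing in for the
# "Left query" (left_q) and "Right query" (right_q) results.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE left_q  (year INTEGER, memberid INTEGER);
    CREATE TABLE right_q (year INTEGER, memberid INTEGER);
    INSERT INTO left_q  VALUES (2014, 1), (2014, 2), (2015, 1);
    INSERT INTO right_q VALUES (2013, 5), (2014, 2), (2014, 3), (2015, 4);
""")

# UNION (not UNION ALL) removes duplicate (year, memberid) pairs, so a member
# present on both sides in the same year is counted only once.
rows = conn.execute("""
    SELECT year, COUNT(DISTINCT memberid) AS members
    FROM (SELECT year, memberid FROM left_q
          UNION
          SELECT year, memberid FROM right_q) AS u
    GROUP BY year
    ORDER BY year
""").fetchall()

print(rows)
```

Member 2 appears on both sides in 2014 but is counted once, and 2013 shows up even though it only exists on the right side.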

Related

How can I return a row for each group even if there were no results?

I'm working with a database containing customer orders. These orders contain the customer id, order month, order year, order half month (either the first half 'FH' or the last half 'LH' of the month), and quantity ordered.
I want to query the totals for each customer for a given month. Here's what I have so far:
SELECT id, half_month, month, year, SUM(nbr_ord)
FROM Orders
WHERE month = 7
AND year = 2015
GROUP BY id, half_month, year, month
The problem with this is that if a customer did not order anything during one half_month there will not be a row returned for that period.
I want there to be a row for each customer for every half month. If they didn't order anything during a half month then a row should be returned with their id, the month, year, half month, and 0 for number ordered.
First, generate all the rows, which you can do with a cross join of the customers and the time periods. Then, bring in the information for the aggregation:
select i.id, t.half_month, t.month, t.year, coalesce(sum(nbr_ord), 0)
from (select distinct id from orders) i cross join
(select distinct half_month, month, year
from orders
where month = 7 and year = 2015
) t left join
orders o
on o.id = i.id and o.half_month = t.half_month and
o.month = t.month and o.year = t.year
group by i.id, t.half_month, t.month, t.year;
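A runnable sketch of the cross join plus left join above, using Python's sqlite3 module. The sample rows are hypothetical: customer 2 has no order in the last half of July, so without the cross join there would be no row for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: customer 2 has no order in the last half ('LH').
conn.executescript("""
    CREATE TABLE orders (id INTEGER, half_month TEXT, month INTEGER,
                         year INTEGER, nbr_ord INTEGER);
    INSERT INTO orders VALUES
        (1, 'FH', 7, 2015, 3),
        (1, 'LH', 7, 2015, 4),
        (1, 'LH', 7, 2015, 1),
        (2, 'FH', 7, 2015, 5);
""")

rows = conn.execute("""
    SELECT i.id, t.half_month, t.month, t.year,
           COALESCE(SUM(o.nbr_ord), 0) AS nbr_ord  -- 0 when nothing matched
    FROM (SELECT DISTINCT id FROM orders) AS i
    CROSS JOIN (SELECT DISTINCT half_month, month, year
                FROM orders
                WHERE month = 7 AND year = 2015) AS t   -- every customer x period
    LEFT JOIN orders AS o
           ON o.id = i.id
          AND o.half_month = t.half_month
          AND o.month = t.month
          AND o.year = t.year
    GROUP BY i.id, t.half_month, t.month, t.year
    ORDER BY i.id, t.half_month
""").fetchall()

print(rows)
```

The last row, customer 2 in 'LH' with a total of 0, is the one the original GROUP BY query could not produce.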
Note: you might have other sources for the id and date parts. This pulls them from orders.
If you know the entire dataset has an occurrence of each half_month, month, year combination, you could use the listing of those three columns as the left side of a left join. That would look like this:
Select t1.half_month, t1.month, t1.year, t2.ID, t2.nbr_ord from
(Select distinct half_month, month, year from Orders where month = 7 and year = 2015)t1
Left Join
(SELECT id, half_month, month, year, SUM(nbr_ord)nbr_ord
FROM Orders
WHERE month = 7
AND year = 2015
GROUP BY id, half_month, year, month)t2
on t1.half_month = t2.half_month
and t1.month = t2.month
and t1.year = t2.year
SELECT m.id, m.half_month, m.year, t.nbr_order
FROM (
SELECT Id, sum(nbr_ord) AS nbr_order
FROM Orders
GROUP BY id
) t
INNER JOIN Orders m
ON t.Id = m.id
WHERE m.month = 7
AND m.year = 2015;

Referring to a field outside a subselect's scope

I'm working on a piece of SQL at the moment and I need to retrieve every row of a dataset with a median and an average aggregated in it.
Example
I have the following set:
ID;month;value
and I would like to retrieve something like:
ID;month;value;average for this month;median for this month
without having to group my result.
So it would be something like:
SELECT ID,month,value,
(SELECT AVG(value) FROM myTable) as "myAVG"
FROM myTable
but I would need that average to be the average for that month specifically. So rows where month = "January" will have the average and median for "January", etc.
The issue here is that I did not find a way to refer to the current row's month in my subquery:
(SELECT AVG(value) FROM myTable)
Does someone have a clue?
P.S.: It's a Redshift database I'm working on.
You would need to select all rows from the table and left join them to a subquery that groups by month. This way you get every row, together with the aggregated results for its month.
Something like this:
SELECT * FROM myTable a
LEFT JOIN
(
SELECT Month, AVG(value) AS myAvg
FROM myTable
GROUP BY Month
) b
ON a.Month = b.Month
Helpful?
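A runnable sketch of the join approach, using Python's sqlite3 module with a hypothetical myTable and made-up values rather than Redshift. It also shows the correlated form the question asked about: giving the outer table an alias lets the subquery refer to the current row's month. SQLite has no MEDIAN function, so only the average is shown; on Redshift, MEDIAN(value) could be added to the grouped subquery in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (ID INTEGER, month TEXT, value REAL);
    INSERT INTO myTable VALUES
        (1, 'January', 10), (2, 'January', 20),
        (3, 'February', 30), (4, 'February', 50), (5, 'February', 70);
""")

# Join every row to a per-month aggregate computed in a grouped subquery.
joined = conn.execute("""
    SELECT a.ID, a.month, a.value, b.myAvg
    FROM myTable AS a
    LEFT JOIN (SELECT month, AVG(value) AS myAvg
               FROM myTable
               GROUP BY month) AS b
           ON a.month = b.month
    ORDER BY a.ID
""").fetchall()

# Correlated form: the inner query refers to the outer row's month
# through the outer alias "outer_t".
correlated = conn.execute("""
    SELECT outer_t.ID, outer_t.month, outer_t.value,
           (SELECT AVG(inner_t.value)
            FROM myTable AS inner_t
            WHERE inner_t.month = outer_t.month) AS myAvg
    FROM myTable AS outer_t
    ORDER BY outer_t.ID
""").fetchall()

print(joined)
```

Both queries return every row with its month's average (15.0 for January, 50.0 for February); the join version computes each month's aggregate once, while the correlated version may be re-evaluated per row.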
with myavg as
(SELECT month, AVG(value) as avgval FROM myTable group by month)
, mymed as
(select month, median(value) as medval from myTable group by month)
select m.ID, m.month, m.value, ma.avgval, mm.medval
from mytable m left join myavg ma
on m.month = ma.month
left join mymed mm
on m.month = mm.month
You can use a CTE to do this. However, each CTE needs a GROUP BY on month, because it calculates an aggregate value.
In Redshift you can use window functions, which attach the per-month aggregate to every row without grouping:
select ID, month, value,
avg(value) over (partition by month) as avg_value,
median(value) over (partition by month) as median_value
from myTable
order by month;
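A small sqlite3 sketch (assuming the bundled SQLite is 3.25 or newer, which added window functions) showing why the frame matters: PARTITION BY month alone averages the whole month for every row, while adding ORDER BY with a ROWS UNBOUNDED PRECEDING frame turns it into a running average. The table and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (ID INTEGER, month TEXT, value REAL);
    INSERT INTO myTable VALUES
        (1, 'January', 10), (2, 'January', 20),
        (3, 'February', 30), (4, 'February', 50), (5, 'February', 70);
""")

# Whole-partition average: every row gets its month's overall average.
rows = conn.execute("""
    SELECT ID, month, value,
           AVG(value) OVER (PARTITION BY month) AS month_avg
    FROM myTable
    ORDER BY ID
""").fetchall()

# Running average: only rows up to the current one (ordered by ID) count.
running = conn.execute("""
    SELECT ID,
           AVG(value) OVER (PARTITION BY month ORDER BY ID
                            ROWS UNBOUNDED PRECEDING) AS running_avg
    FROM myTable
    ORDER BY ID
""").fetchall()

print(rows)
print(running)
```

For "the average for this month" on every row, the first form is the one you want.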

Summarize Table Based on Two Date Fields

I have a table that, in its simplified form, has two date fields and an amount field. One of the date fields holds the order date, and the other holds the shipped date. I've been asked to report both the amounts ordered and the amounts shipped, grouped by date.
I used a self join that seemed to be working fine, except I found that it doesn't work on dates where no new orders were taken, but orders were shipped. I'd appreciate any help figuring out how best to solve the problem. (See below)
Order_Date Shipped_Date Amount
6/1/2015 6/2/2015 10
6/1/2015 6/3/2015 15
6/2/2015 6/3/2015 17
The T-SQL statement I'm using is as follows:
select a.ddate, a.soldamt, b.shippedamt
from
(select order_date as ddate, sum(amount) as soldamt from TABLE group by order_date) a
left join
(select shipped_date as ddate, sum(amount) as shippedamt from TABLE group by shipped_date) b
on a.ddate = b.ddate
This results in:
ddate soldamt shippedamt
6/1/2015 25 0
6/2/2015 17 10
The amount shipped on 6/3/2015 doesn't appear, obviously because there are no new orders on that date.
It's important to note this is being done in a Visual FoxPro table using T-SQL syntax, so some of the features found in more popular databases do not exist (for example, PIVOT)
The simplest change would be to use a FULL OUTER JOIN instead of a LEFT JOIN. A full join combines a left and a right join, keeping unmatched rows from both sides. Because a.ddate is NULL for dates that appear only on the shipped side (such as 6/3/2015), take the date from whichever side has it with COALESCE (or your engine's equivalent):
SELECT COALESCE(a.ddate, b.ddate) AS ddate, a.soldamt, b.shippedamt
FROM
(select order_date as ddate, sum(amount) as soldamt from TABLE group by order_date) a
FULL OUTER JOIN
(select shipped_date as ddate, sum(amount) as shippedamt from TABLE group by shipped_date) b
ON a.ddate = b.ddate
Another method (besides FULL OUTER JOIN) is to use UNION ALL and an additional aggregation:
select ddate, sum(soldamt) as soldamt, sum(shippedamt) as shippedamt
from ((select order_date as ddate, sum(amount) as soldamt, 0 as shippedamt
from TABLE
group by order_date
) union all
(select shipped_date as ddate, 0, sum(amount) as shippedamt
from TABLE
group by shipped_date
)
) os
group by ddate;
This also avoids NULL values: a date with no orders or no shipments gets 0 instead of NULL.
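A runnable sketch of the UNION ALL approach using Python's sqlite3 module and the three sample rows from the question. The table name orders_tbl is a hypothetical replacement for the TABLE placeholder, dates are stored as ISO strings so they sort correctly, and the parentheses around the UNION ALL members are dropped because SQLite does not accept them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows from the question; ISO date strings sort chronologically.
conn.executescript("""
    CREATE TABLE orders_tbl (order_date TEXT, shipped_date TEXT, amount INTEGER);
    INSERT INTO orders_tbl VALUES
        ('2015-06-01', '2015-06-02', 10),
        ('2015-06-01', '2015-06-03', 15),
        ('2015-06-02', '2015-06-03', 17);
""")

rows = conn.execute("""
    SELECT ddate, SUM(soldamt) AS soldamt, SUM(shippedamt) AS shippedamt
    FROM (SELECT order_date AS ddate, SUM(amount) AS soldamt, 0 AS shippedamt
          FROM orders_tbl
          GROUP BY order_date
          UNION ALL
          SELECT shipped_date, 0, SUM(amount)   -- shipped totals in 3rd column
          FROM orders_tbl
          GROUP BY shipped_date) AS os
    GROUP BY ddate
    ORDER BY ddate
""").fetchall()

print(rows)
```

The 6/3/2015 shipments (15 + 17 = 32) now appear on their own row, with 0 ordered.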

SQL Aggregates OVER and PARTITION

All,
This is my first post on Stackoverflow, so go easy...
I am using SQL Server 2008.
I am fairly new to writing SQL queries, and I have a problem that I thought was pretty simple, but I've been fighting for 2 days. I have a set of data that looks like this:
UserId Duration(Seconds) Month
1 45 January
1 90 January
1 50 February
1 42 February
2 80 January
2 110 February
3 45 January
3 62 January
3 56 January
3 60 February
Now, what I want is to write a single query that gives me the average for a particular user and compares it against all users' average for that month. So the resulting dataset after a query for user #1 would look like this:
UserId Duration(seconds) OrganizationDuration(Seconds) Month
1 67.5 63 January
1 46 65.5 February
I've been batting around different subqueries and group by scenarios and nothing ever seems to work. Lately, I've been trying OVER and PARTITION BY, but with no success there either. My latest query looks like this:
select Userid,
AVG(duration) OVER () as OrgAverage,
AVG(duration) as UserAverage,
DATENAME(mm,MONTH(StartDate)) as Month
from table.name
where YEAR(StartDate)=2014
AND userid=119
GROUP BY MONTH(StartDate), UserId
This query bombs out with a "'Duration' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause" error.
Please keep in mind I'm dealing with a very large amount of data. I think I can make it work with CASE statements, but I'm looking for a cleaner, more efficient way to write the query if possible.
Thank you!
You are joining two queries together here:
Per-User average per month
All Organisation average per month
If you are only going to return data for one user at a time then an inline select may give you joy:
SELECT AVG(a.duration) AS UserAverage,
(SELECT AVG(b.Duration) FROM tbl b WHERE MONTH(b.StartDate) = MONTH(a.StartDate)) AS OrgAverage
...
FROM tbl a
WHERE userid = 119
GROUP BY MONTH(StartDate), UserId
Note - using comparison on MONTH may be slow - you may be better off having a CTE (Common Table Expression)
The PARTITION BY clause is missing from the AVG window function:
OVER (PARTITION BY MONTH(StartDate))
Please try this. It works fine for me.
WITH C1
AS
(
SELECT
AVG(Duration) AS TotalAvg,
[Month]
FROM [dbo].[Test]
GROUP BY [Month]
),
C2
AS
(
SELECT Distinct UserID,
AVG(Duration) OVER(PARTITION BY UserID, [Month]) AS DetailedAvg,
[Month]
FROM [dbo].[Test]
)
SELECT C2.*, C1.TotalAvg
FROM C2 c2
INNER JOIN C1 c1 ON c1.[Month] = c2.[Month]
ORDER BY c2.UserID, c2.[Month] desc;
I was able to get it done using a self join. There's probably a better way.
Select UserId, AVG(t1.Duration) as Duration, t2.duration as OrgDur, t1.Month
from #temp t1
inner join (Select Distinct MONTH, AVG(Duration) over (partition by Month) as duration
from #temp) t2 on t2.Month = t1.Month
group by t1.Month, t1.UserId, t2.Duration
order by t1.UserId, Month desc
Here's a version using a CTE, which is probably a better solution and definitely easier to read:
With MonthlyAverage
as
(
Select MONTH, AVG(Duration) as OrgDur
from #temp
group by Month
)
Select UserId, AVG(t1.Duration) as Duration, m.OrgDur as OrgDur, t1.Month
from #temp t1
inner join MonthlyAverage m on m.Month = t1.Month
group by UserId, t1.Month, m.OrgDur
You can try the query below, which needs less code:
SELECT Distinct UserID,
AVG(Duration) OVER(PARTITION BY [Month]) AS TotalAvg,
AVG(Duration) OVER(PARTITION BY UserID, [Month]) AS DetailedAvg,
[Month]
FROM [dbo].[Test]
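A runnable sketch of the window-function query using Python's sqlite3 module (SQLite 3.25 or newer) and the sample rows from the question, rather than SQL Server 2008. The user filter sits in an outer query: a WHERE at the same level runs before the window functions, so TotalAvg would then average only that user's rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample data from the question.
conn.executescript("""
    CREATE TABLE Test (UserID INTEGER, Duration INTEGER, Month TEXT);
    INSERT INTO Test VALUES
        (1, 45, 'January'), (1, 90, 'January'), (1, 50, 'February'),
        (1, 42, 'February'), (2, 80, 'January'), (2, 110, 'February'),
        (3, 45, 'January'), (3, 62, 'January'), (3, 56, 'January'),
        (3, 60, 'February');
""")

rows = conn.execute("""
    SELECT UserID, DetailedAvg, TotalAvg, Month
    FROM (SELECT DISTINCT UserID, Month,
                 AVG(Duration) OVER (PARTITION BY Month) AS TotalAvg,
                 AVG(Duration) OVER (PARTITION BY UserID, Month) AS DetailedAvg
          FROM Test) AS w
    WHERE UserID = 1          -- filter after the window functions have run
    ORDER BY Month DESC
""").fetchall()

print(rows)
```

This reproduces the expected 67.5 / 63 for January and 46 / 65.5 for February. Two SQL Server caveats: AVG over an INT column returns an INT there, so something like AVG(Duration * 1.0) may be needed to get 67.5; and ORDER BY inside OVER for aggregate functions is only accepted from SQL Server 2012 onward, which is why it is left out here.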

TSQL Running Totals aggregate from sum of previous rows

Not sure how to word this. Say I have a select returning this:
Name, month, amount
John, June, 5
John, July, 6
John, July, 3
John, August, 10
and I want to aggregate and report the beginning balance for each month:
name, month, beginning balance
john, may, 0
john, june, 0
john, july, 5
john, august, 14
john, September, 24
I can do this in Excel with cell formulas, but how can I do it in SQL without storing values somewhere? I have another table with fiscal months I can do a left outer join with, so all months are reported; I'm just not sure how to aggregate from prior months in SQL.
select
name
, month
, coalesce((select sum(amount) from mytable
where mytable.month < m.month and mytable.name = m.name), 0) as starting_balance
from mytable m
group by name, month
This is not as nice as windowing functions, but since they vary from database to database, you'd need to tell us which system you are using.
And it's a correlated subquery, which runs once per output row and is not very performant. But at least it's easy to understand what's going on!
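For comparison, a windowing-function version, sketched with Python's sqlite3 module (SQLite 3.25 or newer); SQL Server supports the same ROWS frame from 2012 onward. It assumes, hypothetically, that months are stored as first-of-month ISO dates so they sort chronologically. Amounts are summed per month first, then a frame ending one row before the current month gives the beginning balance. Months with no activity, such as May and September, would still come from your fiscal-month table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Months stored as first-of-month ISO dates (an assumption) so they sort correctly.
conn.executescript("""
    CREATE TABLE mytable (name TEXT, month TEXT, amount INTEGER);
    INSERT INTO mytable VALUES
        ('John', '2015-06-01', 5),
        ('John', '2015-07-01', 6),
        ('John', '2015-07-01', 3),
        ('John', '2015-08-01', 10);
""")

rows = conn.execute("""
    SELECT name, month,
           COALESCE(SUM(month_total) OVER (PARTITION BY name ORDER BY month
                                           ROWS BETWEEN UNBOUNDED PRECEDING
                                                    AND 1 PRECEDING), 0)
               AS beginning_balance   -- sum of all earlier months, 0 if none
    FROM (SELECT name, month, SUM(amount) AS month_total
          FROM mytable
          GROUP BY name, month) AS m
    ORDER BY name, month
""").fetchall()

print(rows)
```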
Use Grouping like this
SELECT NAME, MONTH, SUM(amount) FROM table GROUP BY NAME, MONTH
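Assuming your months are represented as dates, this will give you the beginning balance (the total of all earlier months). Driving from one row per name and month keeps months with several rows, such as July, from being counted more than once:
select t1.name, t1.month, coalesce(sum(t2.amount), 0) as beginning_balance
from (select distinct name, month from yourtable) t1
left join yourtable t2
on t1.name = t2.name
and t1.month>t2.month
group by t1.name, t1.month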
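A runnable sketch of the self-join approach using Python's sqlite3 module. It assumes months are first-of-month ISO dates, and fiscal_months is a hypothetical stand-in for the asker's fiscal-month table, cross joined with the distinct names so that May and September also get a row. Because the left side has exactly one row per name and month, July's two rows are not double counted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (name TEXT, month TEXT, amount INTEGER);
    INSERT INTO mytable VALUES
        ('John', '2015-06-01', 5),
        ('John', '2015-07-01', 6),
        ('John', '2015-07-01', 3),
        ('John', '2015-08-01', 10);
    -- Hypothetical stand-in for the fiscal-month table.
    CREATE TABLE fiscal_months (month TEXT);
    INSERT INTO fiscal_months VALUES
        ('2015-05-01'), ('2015-06-01'), ('2015-07-01'),
        ('2015-08-01'), ('2015-09-01');
""")

rows = conn.execute("""
    SELECT t1.name, t1.month,
           COALESCE(SUM(t2.amount), 0) AS beginning_balance
    FROM (SELECT n.name, f.month
          FROM (SELECT DISTINCT name FROM mytable) AS n
          CROSS JOIN fiscal_months AS f) AS t1   -- one row per name and month
    LEFT JOIN mytable AS t2
           ON t1.name = t2.name
          AND t2.month < t1.month                -- strictly earlier months only
    GROUP BY t1.name, t1.month
    ORDER BY t1.name, t1.month
""").fetchall()

print(rows)
```

This reproduces the requested output: 0, 0, 5, 14, 24 for May through September.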