SQL: HAVING clause - sql

See the following SQL statement:
SELECT datediff("d", MAX(invoice.date), Now) As Date_Diff
, MAX(invoice.date) AS max_invoice_date
, customer.number AS customer_number
FROM invoice
INNER JOIN customer
ON invoice.customer_number = customer.number
GROUP BY customer.number
If the the following was added:
HAVING datediff("d", MAX(invoice.date), Now) > 365
would this simply exclude rows with Date_Diff <= 365?
What should be the effect of the HAVING clause here?
EDIT: I am not experiencing what the answers here are saying. A copy of the mdb is at http://hotfile.com/dl/40641614/2353dfc/test.mdb.html (no macros or viruses). VISDATA.EXE is being used to execute the queries.
EDIT2: I think the problem might be VISDATA, because I am experiencing different results via DAO.

As already pointed out, yes, that is the effect. For completeness, 'HAVING' is like 'WHERE', but for the already aggregated (grouped) values (such as, MAX in this case, or SUM, or COUNT, or any of the other aggregate functions).

Yes, it would exclude those rows.

Yes, that is what it would do.

WHERE applies to all of the individual rows, so WHERE MAX(...) would match all rows.
HAVING is like WHERE, but within the current group. That means you can do things like HAVING count(*) > 1, which will only show groups with more than one result.
So to answer your question, it would only include rows where the record in the group that has the highest (MAX) date is greater than 365. In this case you are also selecting MAX(date), so yes, it excludes rows with date_diff <= 365.
However, you could select MIN(date) and see the minimum date in all the groups that have a maximum date of greater than 365. In this case it would not exclude "rows" with date_diff <= 365, but rather groups with max(date_diff) <= 365.
Hopefully it's not too confusing...

You may be trying the wrong thing with your MAX. By MAXing the invoice.date column you are effectively looking for the most recent invoice associated with the customer. So effectively the HAVING condition is selecting all those customers who have not had any invoices within the last 365 days.
Is this what you are trying to do? Or are you actually trying to get all customers who have at least one invoice from more than a year ago? If that is the case, then you should put the MAX outside the datediff function.

That depends on whether you mean rows in the table or rows in the result. The having clause filters the result after grouping, so it would elliminate customers, not invoices.
If you want to filter out the new invoices rather than the customers with new invoices, you should use where instead so that you filter before grouping:
select
datediff("d",
max(invoice.date), Now) As Date_Diff,
max(invoice.date) as max_invoice_date,
customer.number
from
invoice
inner join customer on invoice.customer_number = customer.number
where
datediff("d", invoice.date, Now) > 365
group by
customer.number

I wouldn't use a GROUP BY query at all. Using standard Jet SQL:
SELECT Customer.Number
FROM [SELECT DISTINCT Invoice.Customer_Number
FROM Invoice
WHERE (((Invoice.[Date])>Date()-365));]. AS Invoices
RIGHT JOIN Customer ON Invoices.Customer_Number = Customer.Number
WHERE (((Invoices.Customer_Number) Is Null));
Using SQL92 compatibility mode:
SELECT Customer.Number
FROM (SELECT DISTINCT Invoice.Customer_Number
FROM Invoice
WHERE (((Invoice.[Date])>Date()-365));) AS Invoices
RIGHT JOIN Customer ON Invoices.Customer_Number = Customer.Number
WHERE (((Invoices.Customer_Number) Is Null));
The key here is to get a set of the customer numbers who've had an invoice in the last year, and then doing an OUTER JOIN on that result set to return only those not in the set of customers with invoices in the last year.

Related

Sum Column for Running Total where Overlapping Date

I have a table with about 3 million rows of Customer Sales by Date.
For each CustomerID row I need to get the sum of the Spend_Value
WHERE Order_Date BETWEEN Order_Date_m365 AND Order_Date
Order_Date_m365 = OrderDate minus 365 days.
I just tried a self join but of course, this gave the wrong results due to rows overlapping dates.
If there is a way with Window Functions this would be ideal but I tried and can't do the between dates in the function, unless I missed a way.
Tonly way I can think now is to loop so process all rank 1 rows into a table, then rank 2 in a table, etc., but this will be really inefficient on 3 million rows.
Any ideas on how this is usually handled in SQL?
SELECT CustomerID,Order_Date_m365,Order_Date,Spend_Value
FROM dbo.CustomerSales
Window functions likely won't help you here, so you are going to need to reference the table again. I would suggest that you use an APPLY with a subquery to do this. Provided you have relevant indexes then this will likely be the more efficient approach:
SELECT CS.CustomerID,
CS.Order_Date_m365,
CS.Order_Date,
CS.Spend_Value,
o.output
FROM dbo.CustomerSales CS
CROSS APPLY (SELECT SUM(Spend_Value) AS output
FROM dbo.CustomerSales ca
WHERE ca.CustomerID = CS.CustomerID
AND ca.Order_Date >= CS.Order_Date_m365 --Or should this is >?
AND ca.Order_Date <= CS.Order_Date) o;

Access: Having trouble with getting average movies per day

I have a database project at my school and I am almost finished. The only thing that I need is average movies per day. I have a watchhistory where you can find the users who have watch a movie. The instrucition is that you filter the people out of the watchhistory who have an average of 2 movies per day.
I wrote the following SQL statement. But every time I get errors. Can someone help me?
SQL:
SELECT
customer_mail_address,
COUNT(movie_id) AS AantalBekeken,
COUNT(movie_id) / SUM(GETDATE() -
(SELECT subscription_start FROM Customer)) AS AveragePerDay
FROM
Watchhistory
GROUP BY
customer_mail_address
The error:
Msg 130, Level 15, State 1, Line 1
Cannot perform an aggregate function on an expression containing an aggregate or a subquery.
I tried something different and this query sums the total movie's per day. Now I need the average of everything and that SQL only shows the cusotmers who are have more than 2 movies per day average.
SELECT
Count(movie_id) as AantalPerDag,
Customer_mail_address,
Cast(watchhistory.watch_date as Date) as Date
FROM
Watchhistory
GROUP BY
customer_mail_address, Cast(watch_date as Date)
The big problem that I see is that you're trying to use a subquery as if it's a single value. A subquery could potentially return many values, and unless you have only one customer in your system it will do exactly that. You should be JOINing to the Customer table instead. Hopefully the JOIN only returns one customer per row in WatchHistory. If that's not the case then you'll have more work to do there.
SELECT
customer_mail_address,
COUNT(movie_id) AS AantalBekeken,
CAST(COUNT(movie_id) AS DECIMAL(10, 4)) / DATEDIFF(dy, C.subscription_start, GETDATE()) AS AveragePerDay
FROM
WatchHistory WH
INNER JOIN Customer C ON C.customer_id = WH.customer_id -- I'm guessing at the join criteria here since no table structures were provided
GROUP BY
C.customer_mail_address,
C.subscription_start
HAVING
COUNT(movie_id) / DATEDIFF(dy, C.subscription_start, GETDATE()) <> 2
I'm guessing that the criteria isn't exactly 2 movies per day, but either less than 2 or more than 2. You'll need to adjust based on that. Also, you'll need to adjust the precision for the average based on what you want.
What the error message is telling you is that you can't use SUM together with COUNT.
try putting SUM(GETDATE()-(SELECT subscription_start FROM Customer)) as your second aggregate variable, and
try using HAVING & FILTER at the end of your query to select only the users that have count/sum = 2
maybe this is what you need?
lets join the two tables Watchhistory and Customers
select customer_mail_address,
COUNT(movie_id) AS AantalBekeken,
COUNT(movie_id) / datediff(Day, GETDATE(),Customer.subscription_start) AS AveragePerDay
from Watchhistory inner join Customer
on Watchhistory.customer_mail_address = Customer.customer_mail_address
GROUP BY
customer_mail_address
having AveragePerDay = 2
change the last line of code according to what you need (I did not understand if you want it in or out)
I got it guys. Finally :)
SELECT customer_mail_address, SUM(AveragePerDay) / COUNT(customer_mail_address) AS gemiddelde
FROM (SELECT DISTINCT customer_mail_address, COUNT(CAST(watch_date AS date)) AS AveragePerDay
FROM dbo.Watchhistory
GROUP BY customer_mail_address, CAST(watch_date AS date)) AS d
GROUP BY customer_mail_address
HAVING (SUM(AveragePerDay) / COUNT(customer_mail_address) >= 2

SQL Difference Between Current Year and Last Year. If Last Year Data Does Not Exist Include Current Year

In a previous post I got help finding incremental sales. The query works great. I added the breakout by product. The issue I’m having is that I need to show new products being sold. If the product did not exist last year, but we are selling it this year; then it should show up in the data table.
I tried use a CASE statement in the WHERE, but it was causing a lot of duplication of the data. I was thinking something like what is below. How do I go about including items that are only in the current year? Thank you for your help, its greatly appreciated.
Not Working Where Clause
WHERE
Ym.Project =
CASE
WHEN ymprev.Project IS NULL THEN ym.Project
ELSE ymprev.Project
END
Below is the working query.
WITH ym as(
SELECT
Product
,SUM(Sales) AS Sales
,MONTH(Date) AS Month
,YEAR(Date) AS Year
FROM SalesTable
GROUP BY
YEAR(Date)
,MONTH(Date)
,Product
)
SELECT
ymprev.Project AS PrevProject
,ym.Product
,ym.Sales
,ymprev.Sales AS PreviousSales
,(ym.Sales - ymprev.Sales) AS IncrementalSales
,ymprev.Month AS PreviousMonth
,ymprev.Year AS PreviousYear
,ym.Month
,ym.Year
FROM ym
JOIN ym ymprev on
ymprev.Year = ym.Year
AND ymprev.Month = ym.Month
AND ymprev.Product = ym.Product
ORDER BY
ym.Year
,ym.Month
Your query is implicitly using an INNER JOIN - this means that you will only see values that have a match in both datasets, just as you describe.
Try changing your FROM clause to
FROM ym
LEFT JOIN
ym ymprev on
ymprev.Year = ym.Year
AND ymprev.Month = ym.Month
AND ymprev.Product = ym.Product
You will also need to incorporate similar logic in any values that include data elements from the previous year's query. For example, ,(ym.Sales - ymprev.Sales) AS IncrementalSales will need to be turned into ,(ym.Sales - ISNULL(ymprev.Sales,0)) AS IncrementalSales or it will return NULL for any records that only exist in the current year.
Your posted query doesn't include the Project field in your CTE, so I can't tell exactly how that works, but the posted data should get you started.

Subtracting 2 values from a query and sub-query using CROSS JOIN in SQL

I have a question that I'm having trouble answering.
Find out what is the difference in number of invoices and total of invoiced products between May and June.
One way of doing it is to use sub-queries: one for June and the other one for May, and to subtract the results of the two queries. Since each of the two subqueries will return one row you can (should) use CROSS JOIN, which does not require the "on" clause since you join "all" the rows from one table (i.e. subquery) to all the rows from the other one.
To find the month of a certain date, you can use MONTH function.
Here is the Erwin document
This is what I got so far. I have no idea how to use CROSS JOIN in this situation
select COUNT(*) TotalInv, SUM(ILP.ProductCount) TotalInvoicedProducts
from Invoice I, (select Count(distinct ProductId) ProductCount from InvoiceLine) AS ILP
where MONTH(inv_date) = 5
select COUNT(*) TotalInv, SUM(ILP.ProductCount) TotalInvoicedProducts
from Invoice I, (select Count(distinct ProductId) ProductCount from InvoiceLine) AS ILP
where MONTH(inv_date) = 6
If you guys can help that would be great.
Thanks
The problem statement suggests you use the following steps:
Construct a query, with a single result row giving the values for June.
Construct a query, with a single result row giving the values for May.
Compare the results of the two queries.
The issue is that, in SQL, it's not super easy to do that third step. One way to do it is by doing a cross join, which yields a row containing all the values from both subqueries; it's then easy to use SELECT (b - a) ... to get the differences you're looking for. This isn't the only way to do the third step, but what you have definitely doesn't work.
can't you do something with subqueries? I haven't tested this, but something like the below should give you 4 columns, invoices and products for may and june.
select (
select 'stuff' a, count(*) as june_invoices, sum(products) as products from invoices
where month = 'june'
) june , (
select 'stuff' a, count(*) as may_invoices, sum(products) as products from invoices
where month = 'may'
) may
where june.a = may.a

SQL to calculate value of Shares at a particular time

I'm looking for a way that I can calculate what the value of shares are at a given time.
In the example I need to calculate and report on the redemptions of shares in a given month.
There are 3 tables that I need to look at:
Redemptions table that has the Date of the redemption, the number of shares that were redeemed and the type of share.
The share type table which has the share type and links the 1st and 3rd tables.
The Share price table which has the share type, valuation date, value.
So what I need to do is report on and have calculated based on the number of share redemptions the value of those shares broken down by month.
Does that make sense?
Thanks in advance for your help!
Apologies, I think I should elaborate a little further as there might have been some misunderstandings. This isn't to calculate daily changing stocks and shares, it's more for fund management. What this means is that the share price only changes on a monthly basis and it's also normally a month behind.
The effect of this is that the what the query needs to do, is look at the date of the redemption, work out the date ie month and year. Then look at the share price table and if there's a share price for the given date (this will need to be calculated as it will be a single day ie the price was x on day y) then multiple they number of units by this value. However, if there isn't a share price for the given date then use the last price for that particular share type.
Hopefully this might be a little more clear but if there's any other information I can provide to make this easier then please let me know and I'll supply you with the information.
Regards,
Phil
This should do the trick (note: updated to group by ShareType):
SELECT
ST.ShareType,
RedemptionMonth = DateAdd(month, DateDiff(month, 0, R.RedemptionDate), 0),
TotalShareValueRedeemed = Sum(P.SharePrice * R.SharesRedeemed)
FROM
dbo.Redemption R
INNER JOIN dbo.ShareType ST
ON R.ShareTypeID = ST.ShareTypeID
CROSS APPLY (
SELECT TOP 1 P.*
FROM dbo.SharePrice P
WHERE
R.ShareTypeID = P.ShareTypeID
AND R.RedemptionDate >= P.SharePriceDate
ORDER BY P.SharePriceDate DESC
) P
GROUP BY
ShareType,
DateAdd(month, DateDiff(month, 0, R.RedemptionDate), 0)
ORDER BY
ShareType,
RedemptionMonth
;
See it working in a Sql Fiddle.
This can easily be parameterized by simply adding a WHERE clause with conditions on the Redemption table. If you need to show a 0 for share types in months where they had no Redemptions, please let me know and I'll improve my answer--it would help if you would fill out your use case scenario a little bit, and describe exactly what you want to input and what you want to see as output.
Also please note: I'm assuming here that there will always be a price for a share redemption--if a redemption exists that is before any share price for it, that redemption will be excluded.
If you have the valuations for every day, then the calculation is a simple join followed by an aggregation. The resulting query is something like:
select year(redemptiondate), month(redemptiondate),
sum(r.NumShares*sp.Price) as TotalPrice
from Redemptions r left outer join
ShareType st
on r.sharetype = st.sharetype left outer join
SharePrice sp
on st.sharename = sp.sharename and r.redemptiondate = sp.pricedate
group by year(redemptiondate), month(redemptiondate)
order by 1, 2;
If I understand your question, you need a query like
select shares.id, shares.name, sum (redemption.quant * shareprices.price)
from shares
inner join redemption on shares.id = redemption.share
inner join shareprices on shares.id = shareprices.share
where redemption.curdate between :p1 and :p2
order by shares.id
group by shares.id, shares.name
:p1 and :p2 are date parameters
If you just need it for one date range:
SELECT s.ShareType, SUM(ISNULL(sp.SharePrice, 0) * ISNULL(r.NumRedemptions, 0)) [RedemptionPrice]
FROM dbo.Shares s
LEFT JOIN dbo.Redemptions r
ON r.ShareType = s.ShareType
OUTER APPLY (
SELECT TOP 1 SharePrice
FROM dbo.SharePrice p
WHERE p.ShareType = s.ShareType
AND p.ValuationDate <= r.RedemptionDate
ORDER BY p.ValuationDate DESC) sp
WHERE r.RedemptionDate BETWEEN #Date1 AND #Date2
GROUP BY s.ShareType
Where #Date1 and #Date2 are your dates
The ISNULL checks are just there so it actually gives you a value if something is null (it'll be 0). It's completely optional in this case, just a personal preference.
The OUTER APPLY acts like a LEFT JOIN that will filter down the results from SharePrice to make sure you get the most recent ValuationDate from table based on the RedemptionDate, even if it wasn't from the same date range as that date. It could probably be achieved another way, but I feel like this is easily readable.
If you don't feel comfortable with the OUTER APPLY, you could use a subquery in the SELECT part (i.e., ISNULL(r.NumRedemptions, 0) * (/* subquery from dbo.SharePrice here */)