I have a table with about 3 million rows of Customer Sales by Date.
For each CustomerID row I need to get the sum of the Spend_Value
WHERE Order_Date BETWEEN Order_Date_m365 AND Order_Date
Order_Date_m365 = OrderDate minus 365 days.
I just tried a self join but of course, this gave the wrong results due to rows overlapping dates.
If there is a way with Window Functions this would be ideal but I tried and can't do the between dates in the function, unless I missed a way.
Tonly way I can think now is to loop so process all rank 1 rows into a table, then rank 2 in a table, etc., but this will be really inefficient on 3 million rows.
Any ideas on how this is usually handled in SQL?
SELECT CustomerID,Order_Date_m365,Order_Date,Spend_Value
FROM dbo.CustomerSales
Window functions likely won't help you here, so you are going to need to reference the table again. I would suggest that you use an APPLY with a subquery to do this. Provided you have relevant indexes then this will likely be the more efficient approach:
SELECT CS.CustomerID,
CS.Order_Date_m365,
CS.Order_Date,
CS.Spend_Value,
o.output
FROM dbo.CustomerSales CS
CROSS APPLY (SELECT SUM(Spend_Value) AS output
FROM dbo.CustomerSales ca
WHERE ca.CustomerID = CS.CustomerID
AND ca.Order_Date >= CS.Order_Date_m365 --Or should this is >?
AND ca.Order_Date <= CS.Order_Date) o;
Related
Similar to another question I've posted, given the following table...
Promo EffectiveDate
------ -------------
PromoA 1/1/2016
PromoB 4/1/2016
PromoC 7/1/2016
PromoD 10/1/2016
PromoE 1/1/2017
What is the easiest way to transform it into start and end dates, like so...
Promo StartDate EndDate
------ --------- ---------
PromoA 1/1/2016 4/1/2016
PromoB 4/1/2016 7/1/2016
PromoC 7/1/2016 10/1/2016
PromoD 10/1/2016 1/1/2017
PromoE 1/1/2017 null (ongoing until a new Effective Date is added)
Update
Correlated queries seem to be the simplest solution, but as I understand it, they are extremely inefficient since the subquery has to run once per row of the outer select.
What I was thinking as a potential solution was something along the lines of selecting the values from the table a second time, but eliminating the first result, then pairing them up with the first select by ordinal index with a simple outer left join.
As an example, substituting letters for dates above, the first select would be like A,B,C,D,E and second would be B,C,D,E (which is the first select minus the first record 'A') then pairing them up by ordinal index with a simple outer left join, resulting in A-B, B-C, C-D, D-E, E-null. However I couldn't figure out the syntax to make that work.
A correlated sub-query can lookup the additional field you need.
SELECT
yourTable.*,
(
SELECT MIN(lookup.EffectiveDate)
FROM yourTable AS lookup
WHERE lookup.EffectiveDate > yourTable.EffectiveDate
)
FROM
yourTable
EDIT
The notion of "has to run once per row" is a mis-understanding of how SQL generates the execution plan that actually runs. The same can be said for joining one table to another, the join has to be run at-least once per row... There is indeed a larger cost to a correlated sub-query, but with appropriate indexes it won't be "extemely high", and the functionality described does warrant it.
If you had another field that was guaranteed to be sequential, then it would be trivial, but do not try to re-use the existing Promo field for that additional purpose.
SELECT
this.*,
next.EffectiveEpoch
FROM
yourTable this
LEFT JOIN
yourTable next
ON next.sequential_id = this.sequential_id + 1
Yes, you can use a correlated query with LIMIT :
SELECT t.promo,t.effectiveDate as start_date,
(SELECT s.effectiveDate FROM YourTable s
WHERE s.date > t.date
ORDER BY s.effectiveDate
LIMIT 1) as end_date
FROM YourTable t
EDIT: Here is a solution with a join :
SELECT t.promo,t.effectiveDate as start_date,
MIN(s.effectiveDate) as end_date
FROM YourTable t
LEFT JOIN YourTable s
ON(t.date < s.date)
GROUP BY t.promo,t.effectiveDate
show this, use subquery
select
p.promo,
p.EffectiveDate as "Start",
(select n.EffectiveDate from table_promo n where n.EffectiveDate >
p.EffectiveDate order by n.EffectiveDate limit 1) as "End"
from table_promo p
I have a question that I'm having trouble answering.
Find out what is the difference in number of invoices and total of invoiced products between May and June.
One way of doing it is to use sub-queries: one for June and the other one for May, and to subtract the results of the two queries. Since each of the two subqueries will return one row you can (should) use CROSS JOIN, which does not require the "on" clause since you join "all" the rows from one table (i.e. subquery) to all the rows from the other one.
To find the month of a certain date, you can use MONTH function.
Here is the Erwin document
This is what I got so far. I have no idea how to use CROSS JOIN in this situation
select COUNT(*) TotalInv, SUM(ILP.ProductCount) TotalInvoicedProducts
from Invoice I, (select Count(distinct ProductId) ProductCount from InvoiceLine) AS ILP
where MONTH(inv_date) = 5
select COUNT(*) TotalInv, SUM(ILP.ProductCount) TotalInvoicedProducts
from Invoice I, (select Count(distinct ProductId) ProductCount from InvoiceLine) AS ILP
where MONTH(inv_date) = 6
If you guys can help that would be great.
Thanks
The problem statement suggests you use the following steps:
Construct a query, with a single result row giving the values for June.
Construct a query, with a single result row giving the values for May.
Compare the results of the two queries.
The issue is that, in SQL, it's not super easy to do that third step. One way to do it is by doing a cross join, which yields a row containing all the values from both subqueries; it's then easy to use SELECT (b - a) ... to get the differences you're looking for. This isn't the only way to do the third step, but what you have definitely doesn't work.
can't you do something with subqueries? I haven't tested this, but something like the below should give you 4 columns, invoices and products for may and june.
select (
select 'stuff' a, count(*) as june_invoices, sum(products) as products from invoices
where month = 'june'
) june , (
select 'stuff' a, count(*) as may_invoices, sum(products) as products from invoices
where month = 'may'
) may
where june.a = may.a
I am having trouble in calculating the maximum of a row_number in my sql case.
I will explain it directly on the SQL Fiddle example, as I think it will be faster to understand: SQL Fiddle
Columns 'OrderNumber', 'HourMinute' and 'Code' are just to represent my table and hence, should not be relevant for coding purposes
Column 'DateOnly' contains the dates
Column 'Phone' contains the phones of my customers
Column 'Purchases' contains the number of times customers have bought in the last 12 months. Note that this value is provided for each date, so the 12 months time period is relative to the date we're evaluating.
Finally, the column I am trying to produce is the 'PREVIOUSPURCHASES' which counts the number of times the figure provided in the column 'Purchases' has appeared in the previous 12 months (for each phone).
You can see on the SQL Fiddle example what I have achieved so far. The column 'PREVIOUSPURCHASES' is producing what I want, however, it is also producing lower values (e.g. only the maximum one is the one I need).
For instance, you can see that rows 4 and 5 are duplicated, one with a 'PREVIOUSPURCHASES' of 1 and the other with 2. I don't want to have the 4th row, in this case.
I have though about replacing the row_number by something like max(row_number) but I haven't been able to produce it (already looked at similar posts at stackoverflow...).
This should be implemented in SQL Server 2012.
Thanks in advance.
I'm not sure what kind of result set you want to see but is there anything wrong with what's returned with this?
SELECT c.OrderNumber, c.DateOnly, c.HourMinute, c.Code, c.Phone, c.Purchases, MAX(o.PreviousPurchases)
FROM cte c CROSS APPLY (
SELECT t2.DateOnly, t2.Phone,t2.ordernumber, t2.Purchases, ROW_NUMBER() OVER(PARTITION BY c.DateOnly ORDER BY t2.DateOnly) AS PreviousPurchases
FROM CurrentCustomers_v2 t2
WHERE c.Phone = t2.Phone AND t2.purchases<=c.purchases AND DATEDIFF(DAY, t2.DateOnly, c.DateOnly) BETWEEN 0 AND 365
) o
WHERE c.OrderNumber = o.OrderNumber
GROUP BY c.OrderNumber, c.DateOnly, c.HourMinute, c.Code, c.Phone, c.Purchases
ORDER BY c.DateOnly
I have 2 tables with no relations, both tables have different number of columns, but there are a few columns that are the same but hold different data. I was able to create a function or view of only the data I wanted, but when I try to count the data by filtering the date, I always get the wrong count in return. Let me explain by showing the 2 functions and what I try to do:
Function 1
ID - number from 1 to 8
data sent - YES or NO
Date - date value
Function 2
ID - number from 1 to 8
data sent - yes or no
date - date value
Upon running both separately, I get all the rows from the tables and everything looks good.
Then I try to add the following to each function:
select
count([data sent]), ID
from function1
Where (date between #date1 and #date2)
group by ID
The above statement works great and gives me the right result for each function.
Now I thought what if I want to add those 2 functions into one and get the count from both functions on 1 page.
So I created the following function:
Function 3
select
count(Function1.[data sent]) as Expr1,
Function1.id,
count(Function2.[data sent]) as Expr2,
Function1.date
from
Function1
LEFT OUTER JOIN
Function2 on Function1.id = Function2.id
Where
(Function1.date between #date1 and #date2)
group by
Function1.id
Upon running the above, I get the following table:
ID Expr1 Expr2
On both Expr1 and Expr2, I get results which I am not sure where they come from. I guess something is being multiplied by 100000 since one table holds almost 15000 rows and the other around 5000 rows.
What I would like to know first is if it possible at all to be able to filter by date and count records from both table at the same time. If anyone need more information please let me know and I will be glad to share and explain more.
Thank you
The LEFT OUTER JOIN is taking each row of the left table, finding ALL of the rows in the right table with the same id field, and creating that many rows in the result table. Since id isn't what we usually think of as an identity field (it looks more like a "deviceId" or something), you'll get lots of matches for each one. Repeat 15000 times and you get your combinatorial explosion.
Tip: To debug things like this, you can create sample tables with a tiny subset of the real data, say 10 rows from each, and run your query on them. You'll see the issue immediately.
It's possible to filter by date. It's hard to recommend an actual solution without better understanding your phrase "I want to add those 2 functions into one and get the count from both functions on 1 page".
Why can't you create a temporary table for each function then join them together?
Maybe subqueries can help you to achieve what you want:
SELECT
ID = COALESCE(f1.ID, f2.ID),
Date = COALESCE(f1.Date, f2.Date),
f1.Expr1,
f2.Expr2
FROM (
SELECT
ID,
Date,
Expr1 = COUNT([data sent])
FROM Function1
WHERE Date BETWEEN #date1 AND #date2
GROUP BY
ID,
Date
) f1
FULL JOIN (
SELECT
ID,
Date,
Expr2 = COUNT([data sent])
FROM Function2
WHERE Date BETWEEN #date1 AND #date2
GROUP BY
ID,
Date
) f2
ON f1.ID = f2.ID AND f1.Date = f2.Date
This query also uses full (outer) join instead of left join, in case the right side of the join contains rows that have no match in the left side (and you want those rows).
See the following SQL statement:
SELECT datediff("d", MAX(invoice.date), Now) As Date_Diff
, MAX(invoice.date) AS max_invoice_date
, customer.number AS customer_number
FROM invoice
INNER JOIN customer
ON invoice.customer_number = customer.number
GROUP BY customer.number
If the the following was added:
HAVING datediff("d", MAX(invoice.date), Now) > 365
would this simply exclude rows with Date_Diff <= 365?
What should be the effect of the HAVING clause here?
EDIT: I am not experiencing what the answers here are saying. A copy of the mdb is at http://hotfile.com/dl/40641614/2353dfc/test.mdb.html (no macros or viruses). VISDATA.EXE is being used to execute the queries.
EDIT2: I think the problem might be VISDATA, because I am experiencing different results via DAO.
As already pointed out, yes, that is the effect. For completeness, 'HAVING' is like 'WHERE', but for the already aggregated (grouped) values (such as, MAX in this case, or SUM, or COUNT, or any of the other aggregate functions).
Yes, it would exclude those rows.
Yes, that is what it would do.
WHERE applies to all of the individual rows, so WHERE MAX(...) would match all rows.
HAVING is like WHERE, but within the current group. That means you can do things like HAVING count(*) > 1, which will only show groups with more than one result.
So to answer your question, it would only include rows where the record in the group that has the highest (MAX) date is greater than 365. In this case you are also selecting MAX(date), so yes, it excludes rows with date_diff <= 365.
However, you could select MIN(date) and see the minimum date in all the groups that have a maximum date of greater than 365. In this case it would not exclude "rows" with date_diff <= 365, but rather groups with max(date_diff) <= 365.
Hopefully it's not too confusing...
You may be trying the wrong thing with your MAX. By MAXing the invoice.date column you are effectively looking for the most recent invoice associated with the customer. So effectively the HAVING condition is selecting all those customers who have not had any invoices within the last 365 days.
Is this what you are trying to do? Or are you actually trying to get all customers who have at least one invoice from more than a year ago? If that is the case, then you should put the MAX outside the datediff function.
That depends on whether you mean rows in the table or rows in the result. The having clause filters the result after grouping, so it would elliminate customers, not invoices.
If you want to filter out the new invoices rather than the customers with new invoices, you should use where instead so that you filter before grouping:
select
datediff("d",
max(invoice.date), Now) As Date_Diff,
max(invoice.date) as max_invoice_date,
customer.number
from
invoice
inner join customer on invoice.customer_number = customer.number
where
datediff("d", invoice.date, Now) > 365
group by
customer.number
I wouldn't use a GROUP BY query at all. Using standard Jet SQL:
SELECT Customer.Number
FROM [SELECT DISTINCT Invoice.Customer_Number
FROM Invoice
WHERE (((Invoice.[Date])>Date()-365));]. AS Invoices
RIGHT JOIN Customer ON Invoices.Customer_Number = Customer.Number
WHERE (((Invoices.Customer_Number) Is Null));
Using SQL92 compatibility mode:
SELECT Customer.Number
FROM (SELECT DISTINCT Invoice.Customer_Number
FROM Invoice
WHERE (((Invoice.[Date])>Date()-365));) AS Invoices
RIGHT JOIN Customer ON Invoices.Customer_Number = Customer.Number
WHERE (((Invoices.Customer_Number) Is Null));
The key here is to get a set of the customer numbers who've had an invoice in the last year, and then doing an OUTER JOIN on that result set to return only those not in the set of customers with invoices in the last year.