Joining sums of different periods from the same data set

I often face the situation where I need to compare aggregated data of different periods from the same source.
I usually deal with it this way:
SELECT
COALESCE(SalesThisYear.StoreId, SalesLastYear.StoreId) StoreId
, SalesThisYear.Sum_Revenue RevenueThisYear
, SalesLastYear.Sum_Revenue RevenueLastYear
FROM
(
SELECT StoreId, SUM(Revenue) Sum_Revenue
FROM Sales
WHERE Date BETWEEN '2017-09-01' AND '2017-09-30'
GROUP BY StoreId
) SalesThisYear
FULL JOIN (
SELECT StoreId, SUM(Revenue) Sum_Revenue
FROM Sales
WHERE Date BETWEEN '2016-09-01' AND '2016-09-30'
GROUP BY StoreId
) SalesLastYear
ON (SalesLastYear.StoreId = SalesThisYear.StoreId)
-- execution time 337 ms
It is not very elegant in my opinion, because it visits the table twice, but it works.
Another similar way to achieve the same is:
SELECT
Sales.StoreId
, SUM(CASE YEAR(Date) WHEN 2017 THEN Revenue ELSE 0 END) RevenueThisYear
, SUM(CASE YEAR(Date) WHEN 2016 THEN Revenue ELSE 0 END) RevenueLastYear
FROM
Sales
WHERE
Date BETWEEN '2017-09-01' AND '2017-09-30'
or Date BETWEEN '2016-09-01' AND '2016-09-30'
GROUP BY
StoreId
-- execution time 548 ms
Both solutions perform almost the same on my data set (1,929,419 rows in the selected period, all indexes in place), with the first one slightly faster. It doesn't matter if I include more periods; the first one is always faster on my data set.
This is only a simple example but, sometimes, it involves more than two intervals and even some logic (e.g. compare isoweek/weekday instead of month/day, compare different stores, etc).
Although I have already figured out several ways to do this, I was wondering whether there is a cleverer one: maybe a cleaner solution, or one more suitable for big data sets (over a TB).
For example, I suppose the second one is less resource intensive for a big data set, since it does a single Index Scan over the table. The first one, on the other hand, requires two Index Scans and a Merge. If the table is too big to fit in memory, what will happen? Or is the first one always better?

There is very rarely a "this way of doing things is always better", especially when the alternatives are doing very similar things.
What I would suggest, however, is that you follow best practice wherever you can, such as minimising the use of scalar functions on columns in your queries, as this inhibits index usage.
For example, by changing your second query to the following, I would expect you to see at least some performance improvement:
SELECT
Sales.StoreId
, SUM(CASE WHEN Date BETWEEN '2017-09-01' AND '2017-09-30' THEN Revenue ELSE 0 END) RevenueThisYear
, SUM(CASE WHEN Date BETWEEN '2016-09-01' AND '2016-09-30' THEN Revenue ELSE 0 END) RevenueLastYear
FROM
Sales
WHERE
Date BETWEEN '2017-09-01' AND '2017-09-30'
or Date BETWEEN '2016-09-01' AND '2016-09-30'
GROUP BY
StoreId
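
To see what the rewritten query returns, here is a minimal sketch using Python's built-in sqlite3 module on a toy Sales table. The rows are invented for illustration, and SQLite stands in for SQL Server, so execution plans and timings will not carry over; only the shape of the result does. Dates are stored as ISO text, which compares correctly with BETWEEN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (StoreId INTEGER, Date TEXT, Revenue REAL)")
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [
        (1, "2017-09-05", 100.0),
        (1, "2017-09-20", 50.0),
        (1, "2016-09-10", 80.0),
        (2, "2016-09-15", 30.0),   # store 2 has no sales in September 2017
        (3, "2017-08-31", 999.0),  # outside both periods, must be ignored
    ],
)

# Single pass: each CASE tests the date range directly, with no YEAR() call.
conditional = conn.execute("""
    SELECT StoreId
         , SUM(CASE WHEN Date BETWEEN '2017-09-01' AND '2017-09-30' THEN Revenue ELSE 0 END)
         , SUM(CASE WHEN Date BETWEEN '2016-09-01' AND '2016-09-30' THEN Revenue ELSE 0 END)
    FROM Sales
    WHERE Date BETWEEN '2017-09-01' AND '2017-09-30'
       OR Date BETWEEN '2016-09-01' AND '2016-09-30'
    GROUP BY StoreId
    ORDER BY StoreId
""").fetchall()

print(conditional)
```

One difference from the FULL JOIN form in the question: a store with no sales in one of the periods gets 0 here because of the ELSE 0, whereas the FULL JOIN returns NULL for the missing side.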

The second looks better, but I guess the YEAR() call is slowing the query. Let's take it out and compare against a single date instead: every date in this year's range ('2017-09-01' to '2017-09-30') is later than 2017-01-01, and every date in last year's range ('2016-09-01' to '2016-09-30') is earlier.
SELECT
Sales.StoreId
, SUM(CASE WHEN Date > '2017-01-01' THEN Revenue ELSE 0 END) RevenueThisYear
, SUM(CASE WHEN Date < '2017-01-01' THEN Revenue ELSE 0 END) RevenueLastYear
FROM
Sales
WHERE
Date BETWEEN '2017-09-01' AND '2017-09-30'
or Date BETWEEN '2016-09-01' AND '2016-09-30'
GROUP BY
StoreId
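If the FULL JOIN is working well, let's try joining the table to itself directly. Be aware that this pairs every row of one period with every row of the other for the same store, so the sums are only correct when each store has at most one row per period, and the WHERE conditions on both sides drop stores that are missing from either period.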
SELECT
COALESCE(SalesThisYear.StoreId, SalesLastYear.StoreId) StoreId
, sum(SalesThisYear.Revenue) RevenueThisYear
, sum(SalesLastYear.Revenue) RevenueLastYear
FROM Sales SalesThisYear full join
Sales SalesLastYear
ON SalesLastYear.StoreId = SalesThisYear.StoreId
WHERE SalesThisYear.Date BETWEEN '2017-09-01' AND '2017-09-30'
AND SalesLastYear.Date BETWEEN '2016-09-01' AND '2016-09-30'
GROUP BY COALESCE(SalesThisYear.StoreId, SalesLastYear.StoreId)
Edit:
SELECT q.StoreId
, SUM(CASE WHEN q.Date > '2017-01-01' THEN q.Revenue ELSE 0 END) RevenueThisYear
, SUM(CASE WHEN q.Date < '2017-01-01' THEN q.Revenue ELSE 0 END) RevenueLastYear
FROM
(SELECT StoreId, Date, Revenue
FROM Sales
WHERE Date BETWEEN '2017-09-01' AND '2017-09-30'
OR Date BETWEEN '2016-09-01' AND '2016-09-30') q
GROUP BY q.StoreId
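
For the case the question mentions, where more than two intervals are involved, one option not taken from the answers here is to keep the periods in a small lookup table and join on a date range. The sketch below is an assumption-laden illustration: the Periods table, its column names and the sample rows are hypothetical, and SQLite (via Python's sqlite3) stands in for SQL Server. Each period uses a half-open range (start inclusive, end exclusive), and the result comes back as one row per store and period rather than one column per period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (StoreId INTEGER, Date TEXT, Revenue REAL);
    INSERT INTO Sales VALUES
        (1, '2017-09-05', 100.0),
        (1, '2016-09-10', 80.0),
        (1, '2015-09-12', 60.0),
        (2, '2016-09-15', 30.0);

    -- Hypothetical lookup table: one row per period to compare.
    CREATE TABLE Periods (Label TEXT, StartDate TEXT, EndDate TEXT);
    INSERT INTO Periods VALUES
        ('2017-09', '2017-09-01', '2017-10-01'),
        ('2016-09', '2016-09-01', '2016-10-01'),
        ('2015-09', '2015-09-01', '2015-10-01');
""")

# Each sale is matched to the period whose half-open range contains it.
by_period = conn.execute("""
    SELECT s.StoreId, p.Label, SUM(s.Revenue)
    FROM Sales s
    JOIN Periods p
      ON s.Date >= p.StartDate AND s.Date < p.EndDate
    GROUP BY s.StoreId, p.Label
    ORDER BY s.StoreId, p.Label
""").fetchall()

print(by_period)
```

Adding another interval then means inserting a row into Periods instead of adding another CASE branch; whether that performs better on a large table would have to be measured.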

Related

How to calculate average and latest cost of a product over date range in same sql query

I have a table holding products and their costs over date ranges. I need to calculate the average cost over the whole period, with the latest cost counted up to today's date, and I also need to fetch the current cost. How can I achieve this in a single query?
Input Table
I am looking for output like
product | average_cost | current_cost
(the average cost is the sum of (cost * days at that cost) / total days until today's date)

You can use date arithmetic and conditional aggregation:
select product,
( sum( cost * datediff(day, beg_date, (case when end_date > getdate() then getdate() else end_date end)) ) /
sum( datediff(day, beg_date, (case when end_date > getdate() then getdate() else end_date end)) )
) as average_cost,
max(case when end_date > getdate() then cost end) as current_cost
from t
group by product;
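
As a worked check of the time-weighted average, here is a small sketch in Python's sqlite3. Several things are assumptions made for the example: the sample rows, the use of '9999-12-31' as an open-ended end_date, a fixed TODAY value in place of getdate() so the result is repeatable, and julianday() differences standing in for datediff(day, ...), which SQLite does not have.

```python
import sqlite3

TODAY = "2024-01-21"  # fixed stand-in for getdate(), so the result is repeatable

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (product TEXT, cost REAL, beg_date TEXT, end_date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("A", 10.0, "2024-01-01", "2024-01-11"),  # 10 days at cost 10
        ("A", 20.0, "2024-01-11", "9999-12-31"),  # open-ended, 10 days so far
    ],
)

# Cap each end_date at TODAY, weight each cost by its number of days,
# and pick the cost of the row that is still open as the current cost.
result = conn.execute("""
    SELECT product
         , SUM(cost * (julianday(CASE WHEN end_date > :today THEN :today ELSE end_date END)
                       - julianday(beg_date)))
           / SUM(julianday(CASE WHEN end_date > :today THEN :today ELSE end_date END)
                 - julianday(beg_date)) AS average_cost
         , MAX(CASE WHEN end_date > :today THEN cost END) AS current_cost
    FROM t
    GROUP BY product
""", {"today": TODAY}).fetchall()

print(result)
```

With these rows the average is (10 * 10 + 20 * 10) / 20 = 15 and the current cost is 20.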

Find Customers Who Shop at Multiple Stores

I need a query that will give me a count of customers who have shopped at multiple store locations within the last 3 years.
I have formulated the following query, but it's not what I need to know:
SELECT STORE_ID, CUSTOMER_ID, COUNT(DISTINCT CUSTOMER_ID) as SERVICE_COUNT
From SALES INNER JOIN
STORE_DETAILS
ON trim(STORE_ID) = trim(STORE_ID)
WHERE (CURRENT_DATE - cast(SALE_DATE AS DATE format 'mm/dd/yyyy')) < 1095
ORDER BY 1,2
Group by 1,2
HAVING COUNT(DISTINCT SALE_DATE) > 1

If you want customers at multiple stores, then something like:
SELECT CUSTOMER_ID
FROM SALES INNER JOIN
STORE_DETAILS
ON trim(STORE_ID) = trim(STORE_ID)
WHERE (CURRENT_DATE - cast(SALE_DATE AS DATE format 'mm/dd/yyyy')) < 1095
GROUP BY 1
HAVING COUNT(DISTINCT STORE_ID) > 1;
I don't understand your date expression, but presumably you know what it is supposed to be doing.

Optimized version of Gordon's query based on your comments:
"Often, the store_id has trailing spaces not allowing a true match"
Comparing strings ignores trailing spaces. As long as there are no leading spaces (which is the worst case and should be fixed during load) you don't have to TRIM (it's quite bad for performance).
"The datatype for SALE_DATE is DATE"
If it's a DATE there's no need for a CAST. Additionally, the within-three-years logic can be simplified to avoid a date calculation on every row.
SELECT CUSTOMER_ID, COUNT(DISTINCT SALES.STORE_ID) as SERVICE_COUNT
FROM SALES
JOIN STORE_DETAILS
ON SALES.STORE_ID = STORE_DETAILS.STORE_ID
WHERE SALE_DATE >= ADD_MONTHS(CURRENT_DATE, -12*3)
GROUP BY 1
HAVING SERVICE_COUNT > 1
;
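
The key step, counting distinct stores per customer and keeping only customers with more than one, can be tried on toy data. This is a sketch in Python's sqlite3 under stated assumptions: the rows are invented, the join to STORE_DETAILS is left out because it does not affect the count, and a fixed CUTOFF date replaces ADD_MONTHS(CURRENT_DATE, -12*3), which SQLite does not provide.

```python
import sqlite3

CUTOFF = "2021-01-01"  # stand-in for ADD_MONTHS(CURRENT_DATE, -12*3)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SALES (CUSTOMER_ID TEXT, STORE_ID TEXT, SALE_DATE TEXT)")
conn.executemany(
    "INSERT INTO SALES VALUES (?, ?, ?)",
    [
        ("c1", "S1", "2022-03-01"),
        ("c1", "S2", "2023-05-10"),  # c1: two stores inside the window
        ("c2", "S1", "2022-04-01"),
        ("c2", "S1", "2023-06-01"),  # c2: same store twice
        ("c3", "S1", "2022-07-01"),
        ("c3", "S2", "2019-01-15"),  # c3: second store is before the cutoff
    ],
)

per_customer = """
    SELECT CUSTOMER_ID, COUNT(DISTINCT STORE_ID) AS STORE_COUNT
    FROM SALES
    WHERE SALE_DATE >= ?
    GROUP BY CUSTOMER_ID
    HAVING COUNT(DISTINCT STORE_ID) > 1
"""
multi_store = conn.execute(per_customer, (CUTOFF,)).fetchall()

# The question asks for a count of such customers, so wrap the query.
how_many = conn.execute(
    f"SELECT COUNT(*) FROM ({per_customer})", (CUTOFF,)
).fetchone()[0]

print(multi_store, how_many)
```

Only c1 qualifies here: c2 shopped twice at the same store, and c3's second store falls outside the three-year window.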

Filter results without affecting all columns in SQL Server 2017

I am using SQL Server 2017, and through asking numerous questions on here I have discovered CASE expressions, which act as if-else in SQL. This is good, but it does not satisfy what I need from my result set. Say I have a sales table with an amount, a date of sale and an item description. I am trying to write something like this:
Select
sum(amount) -- total amount,
count(date_of_sale) -- number of days selling
sum(amount where date_of_sale between certain date and certain date)
I don't want to put a WHERE clause outside this because I don't want it to affect the result of the other columns. I haven't been able to get around this with a CASE expression in what I have tried so far.

We can use conditional aggregation here, and sum a CASE expression which includes in the sum only amounts from your date range of interest.
SELECT
SUM(amount) AS total_sales,
COUNT(date_of_sale) AS total_items,
SUM(CASE WHEN date_of_sale BETWEEN start_date AND end_date
THEN amount ELSE 0 END) AS partial_sales,
COUNT(CASE WHEN date_of_sale BETWEEN start_date AND end_date
THEN 1 END) AS partial_items
FROM yourTable;
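
A quick way to confirm that the filtered columns do not disturb the unfiltered ones is to run the same shape of query on a few rows. The sketch below uses Python's sqlite3; the sample rows and the February 2020 range are made up for the example, and the start and end dates are passed as parameters rather than read from columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourTable (amount INTEGER, date_of_sale TEXT)")
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?)",
    [(10, "2020-01-05"), (20, "2020-02-10"), (30, "2020-02-20"), (40, "2020-03-01")],
)

# No WHERE clause: the first two columns see every row,
# while the CASE expressions restrict only the last two.
totals = conn.execute("""
    SELECT SUM(amount)
         , COUNT(date_of_sale)
         , SUM(CASE WHEN date_of_sale BETWEEN :start_date AND :end_date
                    THEN amount ELSE 0 END)
         , COUNT(CASE WHEN date_of_sale BETWEEN :start_date AND :end_date
                      THEN 1 END)
    FROM yourTable
""", {"start_date": "2020-02-01", "end_date": "2020-02-29"}).fetchone()

print(totals)
```

COUNT skips NULLs, which is why the last CASE has no ELSE branch: rows outside the range yield NULL and are not counted.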

SQL date (specific date)

I want to get the list of office names that were working for us before 2015-01-01 and after 2016-01-01, but not between 2015-01-01 and 2016-01-01.
If I use NOT BETWEEN, it basically gives me the results excluding only the rows between those two dates. Can somebody solve this?
select distinct office_name
from history
where date not between '01-jan-2015' and '01-jan-2016'

I think you want aggregation:
select office_name
from history
group by office_name
having sum(case when date between date '2015-01-01' and '2016-01-01' then 1 else 0 end) = 0;
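
The difference between filtering rows and filtering groups is easy to see on a small example. This sketch uses Python's sqlite3 with invented rows; the dates are ISO text, so plain string literals are used instead of the date '...' syntax. Note that the HAVING condition only checks that an office has no rows inside the range; it does not by itself require rows on both sides of it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (office_name TEXT, date TEXT)")
conn.executemany(
    "INSERT INTO history VALUES (?, ?)",
    [
        ("A", "2014-06-01"), ("A", "2016-06-01"),  # never inside the range
        ("B", "2014-06-01"), ("B", "2015-06-01"),  # one row inside the range
        ("C", "2015-03-01"),                       # only inside the range
    ],
)

# Row-level filter: B still appears because of its 2014 row.
row_filter = [r[0] for r in conn.execute("""
    SELECT DISTINCT office_name FROM history
    WHERE date NOT BETWEEN '2015-01-01' AND '2016-01-01'
    ORDER BY office_name
""")]

# Group-level filter: an office is kept only if none of its rows is in the range.
group_filter = [r[0] for r in conn.execute("""
    SELECT office_name FROM history
    GROUP BY office_name
    HAVING SUM(CASE WHEN date BETWEEN '2015-01-01' AND '2016-01-01' THEN 1 ELSE 0 END) = 0
    ORDER BY office_name
""")]

print(row_filter, group_filter)
```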

mix two sql queries

I want to combine two SQL queries: one to get the total amount in a particular year, and another to get the amount in a particular month of that year.
SELECT SUM(money) As Anual FROM Deposito WHERE Year(FechaDeposito)=2011
SELECT SUM(money) As monthly FROM Deposito WHERE Year(FechaDeposito)= 2011 AND Month(FechaDeposito)=10
How can I do that in the most efficient way?

Simply combining your two queries would give
SELECT SUM(money) As Anual,
SUM(CASE WHEN Month(FechaDeposito)=10 THEN money else 0 end) as monthly
FROM Deposito
WHERE Year(FechaDeposito)=2011
It will perform terribly however. Ideally you want an index on FechaDeposito, and to construct date ranges to test against instead of running functions over the column, e.g.
SELECT SUM(money) As Anual,
SUM(CASE WHEN Month(FechaDeposito)=10 THEN money else 0 end) as monthly
FROM Deposito
WHERE FechaDeposito >= '2011-01-01' and FechaDeposito < '2012-01-01'
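
The same idea can be tried end to end with Python's sqlite3. This is only a sketch: the rows are invented, and because SQLite has no Month() function the monthly column is also written as a half-open date range, which is a substitution rather than part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Deposito (money INTEGER, FechaDeposito TEXT)")
conn.executemany(
    "INSERT INTO Deposito VALUES (?, ?)",
    [
        (5, "2010-12-31"),   # previous year, excluded by WHERE
        (100, "2011-03-15"),
        (50, "2011-10-01"),
        (25, "2011-10-31"),
        (70, "2012-01-01"),  # next year, excluded by the exclusive upper bound
    ],
)

# WHERE limits the scan to 2011; the CASE picks October out of those rows.
annual, monthly = conn.execute("""
    SELECT SUM(money)
         , SUM(CASE WHEN FechaDeposito >= '2011-10-01'
                     AND FechaDeposito <  '2011-11-01' THEN money ELSE 0 END)
    FROM Deposito
    WHERE FechaDeposito >= '2011-01-01' AND FechaDeposito < '2012-01-01'
""").fetchone()

print(annual, monthly)
```

The exclusive upper bounds ('< 2012-01-01', '< 2011-11-01') avoid having to know the last day of the year or month.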
