SQL Sum with non unique date - sql

I'm trying to write a SQL query that will sum total production from the following two example tables:
Table: CaseLots
DateProduced kgProduced
October 1, 2013 10000
October 1, 2013 10000
October 2, 2013 10000
Table: Budget
OperatingDate BudgetHours
October 1, 2013 24
October 2, 2013 24
I would like to output a table as follows:
TotalProduction TotalBudgetHours
30000 48
Here is what I have for code so far:
SELECT
Sum(kgProduced) AS TotalProduction, Sum(BudgetHours) AS TotalBudgetHours
FROM
dbo.CaseLots INNER JOIN dbo.Budget ON dbo.CaseLots.DateProduced = dbo.Budget.OperatingDate
WHERE
dbo.Budget.OperatingDate BETWEEN '2013-10-01' AND '2013-10-02'
It seems that the query is double summing the budget hour in instances where more than one case lot is produced in a day. The table I'm getting is as follows:
Total Production BudgetHours
30000 72
How do I fix this?

Think about what the INNER JOIN is doing.
For every row in CaseLots, it's finding every row in Budget that has a matching date.
If you were to remove your aggregation statements in SQL, and just show the inner join, you would see the following result set:
DateProduced kgProduced OperatingDate BudgetHours
October 1, 2013 10000 October 1, 2013 24
October 1, 2013 10000 October 1, 2013 24
October 2, 2013 10000 October 2, 2013 24
(dammit StackOverflow, why don't you have Markdown for tables :( )
Running your aggregation on top of that, it is easy to see how you get the 72 hours in your result.
The correct query needs to aggregate the CaseLots table first, then join onto the Budget table.
SELECT DateProduced, TotalKgProduced, SUM(BudgetHours) AS TotalBudgetHours
FROM
(
SELECT DateProduced, SUM(kgProduced) AS TotalKgProduced
FROM CaseLots
GROUP BY DateProduced
) AS TotalKgProducedByDay
INNER JOIN
Budget
ON TotalKgProducedByDay.DateProduced = Budget.OperatingDate
WHERE DateProduced BETWEEN '2013-10-01' AND '2013-10-02'
GROUP BY DateProduced, TotalKgProduced
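To see the fan-out described above for yourself, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database (an assumption made purely for illustration; the question targets SQL Server, and the helper name `join_fanout_demo` is made up). It returns the raw joined rows and the naive totals computed on top of them.

```python
import sqlite3

def join_fanout_demo():
    """Return the raw joined rows and the naive (inflated) totals."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE CaseLots (DateProduced TEXT, kgProduced INTEGER);
        CREATE TABLE Budget (OperatingDate TEXT, BudgetHours INTEGER);
        INSERT INTO CaseLots VALUES
            ('2013-10-01', 10000), ('2013-10-01', 10000), ('2013-10-02', 10000);
        INSERT INTO Budget VALUES ('2013-10-01', 24), ('2013-10-02', 24);
    """)
    join_sql = """
        FROM CaseLots c
        INNER JOIN Budget b ON c.DateProduced = b.OperatingDate
    """
    # One output row per CaseLots row, each carrying that day's BudgetHours.
    joined_rows = conn.execute(
        "SELECT c.DateProduced, c.kgProduced, b.BudgetHours" + join_sql
        + " ORDER BY c.DateProduced"
    ).fetchall()
    # Summing over those rows counts October 1's 24 hours twice.
    naive_totals = conn.execute(
        "SELECT SUM(c.kgProduced), SUM(b.BudgetHours)" + join_sql
    ).fetchone()
    conn.close()
    return joined_rows, naive_totals

joined_rows, naive_totals = join_fanout_demo()
print(len(joined_rows), naive_totals)
```

The production total is still right because each case lot appears once; only the Budget side is duplicated.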

The problem is that the INNER JOIN produces a 3-row table, since the keys match on every row. So there are three 24s, with a sum of 72.
To fix this, it would probably be easier to split this into two queries.
SELECT Sum(kgProduced) AS TotalProduction
FROM dbo.CaseLots
WHERE dbo.CaseLots.DateProduced BETWEEN '2013-10-01' AND '2013-10-02';
-- second query, run separately: budget hours for the same period
SELECT Sum(BudgetHours) AS TotalBudgetHours
FROM dbo.Budget
WHERE dbo.Budget.OperatingDate BETWEEN '2013-10-01' AND '2013-10-02';

This could be easily achieved by this:
SELECT
(SELECT SUM(kgProduced) FROM dbo.CaseLots WHERE DateProduced BETWEEN '2013-10-01' AND '2013-10-02') AS TotalProduction,
(SELECT SUM(BudgetHours) FROM dbo.Budget WHERE OperatingDate BETWEEN '2013-10-01' AND '2013-10-02') AS TotalBudgetHours
There's no need for joining the two tables.

The other answers are simpler for this particular case. However, if you needed to SUM 10 different values on the CaseLots table, you'd need 10 different subqueries. The following is a more general, more scalable solution:
SELECT
SUM(DayKgProduced) AS TotalProduction,
SUM(BudgetHours) AS TotalBudgetHours
FROM (
SELECT
DateProduced,
SUM(kgProduced) AS DayKgProduced
FROM dbo.CaseLots
WHERE DateProduced BETWEEN '2013-10-01' AND '2013-10-02'
GROUP BY DateProduced
) DailyTotals
INNER JOIN dbo.Budget b ON DailyTotals.DateProduced = b.OperatingDate
First you SUM the production of each CaseLot without having to SUM the BudgetHours. If you used a SELECT * FROM in the query above you'd see:
Date DayKgProduced BudgetHours
2013-10-01 20000 24
2013-10-02 10000 24
But you want the overall total, so we SUM those daily values, correctly producing:
TotalProduction TotalBudgetHours
30000 48
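As a quick check of the aggregate-then-join idea, here is a small sketch that runs it against the question's sample data in an in-memory SQLite database (an assumption for illustration only; the original is SQL Server, and the function name `period_totals` is hypothetical).

```python
import sqlite3

def period_totals(start, end):
    """Aggregate CaseLots per day first, then join to Budget and total."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE CaseLots (DateProduced TEXT, kgProduced INTEGER);
        CREATE TABLE Budget (OperatingDate TEXT, BudgetHours INTEGER);
        INSERT INTO CaseLots VALUES
            ('2013-10-01', 10000), ('2013-10-01', 10000), ('2013-10-02', 10000);
        INSERT INTO Budget VALUES ('2013-10-01', 24), ('2013-10-02', 24);
    """)
    totals = conn.execute("""
        SELECT SUM(d.DayKgProduced), SUM(b.BudgetHours)
        FROM (
            -- one row per production day, so the join cannot duplicate Budget rows
            SELECT DateProduced, SUM(kgProduced) AS DayKgProduced
            FROM CaseLots
            WHERE DateProduced BETWEEN ? AND ?
            GROUP BY DateProduced
        ) AS d
        INNER JOIN Budget b ON d.DateProduced = b.OperatingDate
    """, (start, end)).fetchone()
    conn.close()
    return totals

print(period_totals("2013-10-01", "2013-10-02"))
```

One thing to keep in mind: because this is an INNER JOIN, a budgeted day with no case lots at all would drop out of TotalBudgetHours. If that can happen in your data, the two independent subqueries from the other answer may be the safer choice.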

Try this:
select DateProduced,TotalProduction,TotalBudgetHours from
(select DateProduced,sum(kgProduced) as TotalProduction
from CaseLots group by DateProduced) p
join
(select OperatingDate,sum(BudgetHours) as TotalBudgetHours
from Budget group by OperatingDate) b
on (p.DateProduced=b.OperatingDate)
where p.DateProduced between '2013-10-01' AND '2013-10-02'

Related

Retrieve Customers with a Monthly Order Frequency greater than 4

I am trying to optimize the query below to fetch all customers who had a monthly order frequency of 4 or more in each of the past three months.
Customer ID Feb Mar Apr
0001 4 5 6
0002 3 2 4
0003 4 2 3
In the above table, only the customer with Customer ID 0001 should be picked, as he consistently has 4 or more orders in a month.
Below is the query I have written. It pulls all customers with an average purchase frequency of 4 over the last 90 days, but it does not check that there were consistently 4 or more purchases in each of the last three months.
Query:
SELECT distinct lines.customer_id Customer_ID, (COUNT(lines.order_id)/90) PurchaseFrequency
from fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE UPPER(product.country_code)= "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY Customer_ID
HAVING PurchaseFrequency >=4;
I tried to use window functions, but I am not sure whether they are needed in this case.
I would count the orders per month instead of computing the average, and then retrieve the customers whose count is 4 or more in each of the last three months.
Also, I think you should select your interval using "month(CURRENT_DATE()) - 3" instead of a 90-day window. Of course, you would need to handle the case where the current date falls in Jan, Feb or Mar, and go back to Oct, Nov or Dec of the previous year.
I'm not familiar with Google BigQuery so I can't write your query but I hope this helps.
So I've found the solution to this using a WITH clause, as below:
WITH filtered_orders AS (
select
distinct customer_id ID,
extract(MONTH from date) Order_Month,
count(order_id) CountofOrders
from customer_order_lines lines
where EXTRACT(YEAR FROM date) = 2022 AND EXTRACT(MONTH FROM date) IN (2,3,4)
group by ID, Order_Month
having CountofOrders>=4)
select distinct ID
from filtered_orders
group by ID
having count(Order_Month) =3;
Hope this helps!
One option is to first count the orders by month and then keep only the customers whose count meets your threshold in every month in which they ordered:
WITH ORDERS_BY_MONTH AS (
SELECT
DATE_TRUNC(lines.date, MONTH) PurchaseMonth,
lines.customer_id Customer_ID,
COUNT(lines.order_id) PurchaseFrequency
FROM fct_customer_order_lines lines
LEFT JOIN product_table product
ON lines.entity_id= product.entity_id
AND lines.vendor_id= product.vendor_id
WHERE UPPER(product.country_code)= "IN"
AND lines.date >= DATE_SUB(CURRENT_DATE() , INTERVAL 90 DAY )
AND lines.date < CURRENT_DATE()
GROUP BY PurchaseMonth, Customer_ID
)
SELECT
Customer_ID,
AVG(PurchaseFrequency) AvgPurchaseFrequency
FROM ORDERS_BY_MONTH
GROUP BY Customer_ID
HAVING COUNT(1) = COUNTIF(PurchaseFrequency >= 4)
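Here is a rough sketch of the count-per-month-then-filter approach, run against the sample table from the question. It uses an in-memory SQLite database instead of BigQuery, and the table `orders`, its columns, the year 2022 (borrowed from the asker's own solution) and the function name are all assumptions for illustration. It requires both that the customer ordered in every month of the period and that each monthly count reaches the threshold.

```python
import sqlite3

def consistent_customers(start, end, months_required, min_orders):
    """Customers with at least min_orders orders in every month of [start, end)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (customer_id TEXT, order_id INTEGER, order_date TEXT)"
    )
    # Orders per month (Feb, Mar, Apr), copied from the question's table.
    monthly_counts = {
        "0001": {2: 4, 3: 5, 4: 6},
        "0002": {2: 3, 3: 2, 4: 4},
        "0003": {2: 4, 3: 2, 4: 3},
    }
    rows, order_id = [], 0
    for customer_id, counts in monthly_counts.items():
        for month, n in counts.items():
            for day in range(1, n + 1):
                order_id += 1
                rows.append((customer_id, order_id, f"2022-{month:02d}-{day:02d}"))
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    result = conn.execute("""
        WITH orders_by_month AS (
            SELECT customer_id,
                   strftime('%Y-%m', order_date) AS order_month,
                   COUNT(order_id) AS order_count
            FROM orders
            WHERE order_date >= ? AND order_date < ?
            GROUP BY customer_id, order_month
        )
        SELECT customer_id
        FROM orders_by_month
        GROUP BY customer_id
        -- must appear in every month AND never dip below the threshold
        HAVING COUNT(*) = ? AND MIN(order_count) >= ?
        ORDER BY customer_id
    """, (start, end, months_required, min_orders)).fetchall()
    conn.close()
    return [r[0] for r in result]

print(consistent_customers("2022-02-01", "2022-05-01", 3, 4))
```

The `COUNT(*) = ?` check matters: without it, a customer who ordered 4 times in a single month and never again would also pass.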

How to find the average of last 52 weeks sales at each time by SQL

I have a CSV file with four columns: date, wholesaler, product, and sales.
I am looking for the average of the last 52 weeks of sales for each product and wholesaler combination at each date. That is, what is the average of the previous sales of product A at wholesaler B at time C over the last 52 weeks?
For instance, suppose the sales of product 'A' at wholesaler 'B' in Jan, Apr, May and Aug are 100, 200, 300 and 400 respectively. Let us assume we do not have any record before Jan. Then the average of the previous sales of product 'A' at wholesaler 'B' in Apr is 100/1, in May it is (200+100)/2, and in Aug it is (300+200+100)/3.
The following table shows my data:
date wholesaler product sales
12/31/2012 53929 UPE54 4
12/31/2012 13131 UPE55 1
2/23/2013 13131 UPE55 1156
4/24/2013 13131 UPE55 1
12/1/2013 83389 UPE54 9
12/17/2013 83389 UPE54 1
12/18/2013 52237 UPE54 9
12/19/2013 53929 UME24 1
12/31/2013 82204 UPE55 9
12/31/2013 11209 UME24 4
12/31/2013 52237 UPE54 1
Right now I am using Python code that only works properly for small datasets. Since my dataset has more than 25 million rows, I am looking for a better way to find the solution. Thanks a million for your help!
I think this is what you are looking for.
WITH cte_prep
AS (
SELECT
YEAR(date) * 100 + DATEPART(WEEK, [DATE]) AS week
, date
, RANK() OVER ( PARTITION BY product, wholesaler ORDER BY YEAR(date) * 100 + DATEPART(WEEK, [DATE]) ) AS product_wholesaler_week_rank
, [wholesaler]
, [product]
, [sales]
FROM
[meta].[dbo].[sales]
)
SELECT
CW.wholesaler
, CW.product
, CW.week
, CW.product_wholesaler_week_rank
, CW.sales
, AVG(BW.sales) AS avg_sales
FROM
cte_prep AS CW
INNER JOIN cte_prep BW
ON BW.product = CW.product AND
BW.wholesaler = CW.wholesaler AND
CW.product_wholesaler_week_rank >= BW.product_wholesaler_week_rank
AND BW.product_wholesaler_week_rank >= CW.product_wholesaler_week_rank - 52
GROUP BY
CW.wholesaler
, CW.product
, CW.week
, CW.sales
, CW.product_wholesaler_week_rank
ORDER BY
CW.wholesaler
, CW.product
, CW.week desc
select sum(sales)/count(sales)
from table
group by year(date)
What you're asking for is slightly more involved than the answer I gave. I gave an answer that works if you only want to group the year long periods between Jan 1 - Dec 31. It may be the case that you want year long periods, but maybe you want them from July 1 - June 30 instead.
The way to do this is to look for ways to group by date ranges. Here are a handful of links you may find helpful.
https://dba.stackexchange.com/questions/59356/grouping-by-date-range-in-a-column
SQL Group by Date Range
In SQL, how can you "group by" in ranges?
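For what it's worth, the worked example from the question (100, 200, 300, 400) can be reproduced with a date-range self-join rather than a rank-based one. The sketch below uses an in-memory SQLite database; the table layout, the column name `sale_date`, the specific dates and the extra fifth row (added only to show an old sale dropping out of the window) are all assumptions for illustration. 52 weeks is taken as 364 days.

```python
import sqlite3

def trailing_52_week_averages():
    """For each sale, average the earlier sales of the same product and
    wholesaler that fall within the preceding 364 days."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (sale_date TEXT, wholesaler TEXT, product TEXT, sales REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
        ("2013-01-15", "B", "A", 100),
        ("2013-04-15", "B", "A", 200),
        ("2013-05-15", "B", "A", 300),
        ("2013-08-15", "B", "A", 400),
        ("2014-03-01", "B", "A", 500),  # hypothetical row: Jan 2013 is now too old
    ])
    rows = conn.execute("""
        SELECT cur.sale_date, AVG(prev.sales) AS avg_prev_sales
        FROM sales cur
        LEFT JOIN sales prev
               ON prev.product = cur.product
              AND prev.wholesaler = cur.wholesaler
              AND prev.sale_date < cur.sale_date
              AND prev.sale_date >= date(cur.sale_date, '-364 days')
        GROUP BY cur.wholesaler, cur.product, cur.sale_date, cur.sales
        ORDER BY cur.sale_date
    """).fetchall()
    conn.close()
    return rows

for row in trailing_52_week_averages():
    print(row)
```

The LEFT JOIN keeps the first sale in the output with a NULL average, since it has no history. On 25 million rows, a self-join like this would likely need an index on (product, wholesaler, sale_date) to be usable, and I have not measured it at that scale.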

Customers that stopped ordering monthly-SQL

I am trying to write an SQL query that shows STORES that stopped ordering in a month. That would be STORES that have orders the month before but no orders that month. For example, STORES that have orders in January but do not have orders in February (these would be the STORES that stopped ordering for February). I want to do this for every month (grouped) for a given date range - #datefrom-#dateto
I have one table with an INVOICE#,STORE# and a DATE column
I guess distinct STORE would be in there somewhere.
You can try something like this: break it into two select statements and left outer join them.
select distinct table1.store from (select * from table where date = 'January') as table1
left outer join (select * from table where date = 'February') as table2
on table1.store = table2.store
where table2.store is null
This will return the stores that have orders in January but no matching orders in February.
P.S. That is not an exact SQL statement, just an idea.
I have an example that might be close to what you desire. You may have to tweak it to your convenience and desired performance - http://sqlfiddle.com/#!3/231c4/15
create table test (
invoice int identity,
store int,
dt date
);
-- let's add some data to show that
-- store 1 ordered in Jan, Feb and Mar
-- store 2 ordered in Jan (missed Feb and Mar)
-- store 3 ordered in Jan and Mar (missed Feb)
insert into test (store, dt) values
(1, '2015-01-01'),(1, '2015-02-01'),(1, '2015-03-01'),
(2, '2015-01-01'),
(3, '2015-01-01'), (3, '2015-03-01');
Query:
with
months as (select distinct year(dt) as yr, month(dt) as mth from test),
stores as (select distinct store from test),
months_stores as (select * from months cross join stores)
select *
from months_stores ms
left join test t
on t.store = ms.store
and year(t.dt) = ms.yr
and month(t.dt) = ms.mth
where
(ms.yr = 2015 and ms.mth between 1 and 3)
and t.invoice is null
Result:
yr mth store ...other columns
2015 2 2
2015 2 3
2015 3 2
The results show us that store 2 missed orders in months Feb and Mar
and store 3 missed an order in Feb
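If "stopped ordering" is read strictly as "ordered in the previous month but not in this one", a slightly different query is needed, since the result above also lists store 2 for March even though it had no February order. Here is a sketch of that stricter reading on the same test data, using an in-memory SQLite database (an assumption; the fiddle above is SQL Server, and the function name and the 'YYYY-MM' month parameters are made up for illustration).

```python
import sqlite3

def stopped_stores(month_from, month_to):
    """(month, store) pairs where the store ordered in the previous month
    but placed no order in the given month. Months are 'YYYY-MM' strings."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE test (invoice INTEGER PRIMARY KEY, store INTEGER, dt TEXT);
        INSERT INTO test (store, dt) VALUES
            (1, '2015-01-01'), (1, '2015-02-01'), (1, '2015-03-01'),
            (2, '2015-01-01'),
            (3, '2015-01-01'), (3, '2015-03-01');
    """)
    rows = conn.execute("""
        SELECT DISTINCT
               strftime('%Y-%m', t.dt, 'start of month', '+1 month') AS stopped_month,
               t.store
        FROM test t
        WHERE strftime('%Y-%m', t.dt, 'start of month', '+1 month')
              BETWEEN :month_from AND :month_to
          AND NOT EXISTS (
              -- any order by the same store in the month after t.dt
              SELECT 1
              FROM test n
              WHERE n.store = t.store
                AND strftime('%Y-%m', n.dt)
                    = strftime('%Y-%m', t.dt, 'start of month', '+1 month')
          )
        ORDER BY stopped_month, t.store
    """, {"month_from": month_from, "month_to": month_to}).fetchall()
    conn.close()
    return rows

print(stopped_stores("2015-02", "2015-03"))
```

Each order row looks one month ahead: if the store has nothing in that next month, the store is reported as having stopped in it. The range filter keeps the month after the last data month from being reported.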

SQL : finding people that have made more than one order within a 30 day period

I have a (simplified) table called Orders that has the following columns:
OrderId,
PersonId
OrderDate
What I am trying to find out is how many people in the table have made more than one order within a 30 day period. For example, if Bob orders something on January 3, 2015 and then orders something else on January 21, 2015, he would be included in the list because he ordered two things within a 30 day period.
I have been trying to put together the SQL statements to do this, but I am not very good at this and can't seem to figure it out.
I am using SQL Server.
Thanks for any help.
You could try a join of the table to itself, matching the same person and an order date within 30 days of the order being compared.
I would make sure to have two indexes on the table:
First, by date for the range you may be expecting (OrderDate).
Second, on ( PersonID, OrderID, OrderDate ), to get the orders sequentially for a given person as the basis for the join.
select
O.PersonID,
O.OrderDate,
O.OrderID,
MAX( O2.OrderDate ) as LastOrderDate,
COUNT(*) as TotalOrders
from
Orders O
JOIN Orders O2
on O.PersonID = O2.PersonID
AND O.OrderID < O2.OrderID
AND O2.OrderDate < DATEADD( day, 30, O.OrderDate)
where
O.OrderDate > '2014-01-01'
group by
O.PersonID,
O.OrderDate,
O.OrderID
Now, because this is a JOIN (not a left-join), it is guaranteeing another order exists for the same person but a higher order than the one based on the main query "O" alias, but ONLY within 30 days of the original O.OrderDate.
Now, this may create multiple instances for a single person, such as an example where person buys on Jan 1, Jan 20, Jan 29, Feb 1, Feb 10, May 4 because
Starting from the original order on Jan 1, the 30-day window covers 3 orders (Jan 1, Jan 20, Jan 29), so the join finds 2 later orders, the last one on Jan 29.
But now, the outer cycle gets to the Jan 20 record and it finds 3 entries after it up to and including Feb 10... Same person, but the rolling 30 days.
Similar for Jan 29, then Feb 1, but will not have an entry for Feb 10 as nothing after it within 30 days until the May 4th date, and nothing for May 4th as nothing after it period.
Now, if you only care about the WHO, once you've confirmed the above query WOULD work, you MAY only want those with the total orders MORE than 2... say 3 or more within 45 days? Who knows.. Just add a HAVING COUNT(*) > 2.
That should still work, but now, getting just the person... Ignore the rest of the columns and just do
Select distinct O.PersonID from ... rest of query
Select PersonId, count(PersonId)
from Orders
where OrderDate <= '2015-01-31' and OrderDate >= '2015-01-01'
group by PersonId
having count(PersonId) > 1
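To compare the two ideas, here is a small sketch of the self-join approach reduced to just the list of people, run on an in-memory SQLite database. The sample rows, the text PersonId values and the function name are assumptions for illustration; Bob's dates come from the question. Carol's two orders fall in different calendar months, so a fixed January filter would miss her while the rolling 30-day comparison does not.

```python
import sqlite3

def people_with_orders_within(days):
    """People who have at least two orders no more than `days` days apart."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Orders (OrderId INTEGER PRIMARY KEY, PersonId TEXT, OrderDate TEXT)"
    )
    conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)", [
        (1, "Bob", "2015-01-03"),
        (2, "Bob", "2015-01-21"),    # 18 days after Bob's first order
        (3, "Alice", "2015-01-01"),
        (4, "Alice", "2015-03-15"),  # 73 days later: outside the window
        (5, "Carol", "2015-01-25"),
        (6, "Carol", "2015-02-05"),  # 11 days later, across a month boundary
    ])
    rows = conn.execute("""
        SELECT DISTINCT o.PersonId
        FROM Orders o
        JOIN Orders later
          ON later.PersonId = o.PersonId
         AND later.OrderId <> o.OrderId
         AND julianday(later.OrderDate) - julianday(o.OrderDate) BETWEEN 0 AND ?
        ORDER BY o.PersonId
    """, (days,)).fetchall()
    conn.close()
    return [r[0] for r in rows]

people = people_with_orders_within(30)
print(len(people), people)
```

In SQL Server the date arithmetic would use DATEDIFF or DATEADD instead of julianday, as in the join answer above.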

TSQL Running Totals aggregate from sum of previous rows

Not sure how to word this. Say I have a select returning this:
Name, month, amount
John, June, 5
John, July, 6
John, July, 3
John, August, 10
and I want to aggregate and report the beginning balance for each month.
name, month, beginning balance
john, may, 0
john, june, 0
john, july, 5
john, august, 14
john, September, 24
I can do this in Excel with cell formulas, but how can I do it in SQL without storing values somewhere? I have another table with fiscal months I can do a left outer join with so all months are reported; I'm just not sure how to aggregate from prior months in SQL.
select
name
, month
, (select coalesce(sum(amount), 0) from mytable
where mytable.month < m.month and mytable.name = m.name) as starting_balance
from mytable m
group by name, month
This is not as nice as windowing functions, but since they vary from database to database, you'd need to tell us which system you are using.
And it's an inline subquery, which is not very performant. But at least it's easy to understand what's going on!
Use Grouping like this
SELECT NAME, MONTH , SUM(Balance) FROM table GROUP BY NAME, MONTH
Assuming your months are represented as dates, this will give you the running total.
select t1.name, t1.month, sum(t2.amount)
from yourtable t1
left join yourtable t2
on t1.name = t2.name
and t1.month>t2.month
group by t1.name, t1.month
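Since window functions were mentioned, here is a sketch of what that version might look like, including the fiscal months table so that months without rows still show up. It runs on an in-memory SQLite database (window functions need SQLite 3.25 or later); the table names `payments` and `fiscal_months`, the `month_start` date column, the year 2015 and the function name are all assumptions for illustration. SQL Server 2012 and later accepts the same `SUM(...) OVER (... ROWS BETWEEN ...)` form.

```python
import sqlite3

def beginning_balances():
    """Beginning balance per name and fiscal month: the sum of all amounts
    in strictly earlier months, or 0 when there are none."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE payments (name TEXT, month_start TEXT, amount INTEGER);
        CREATE TABLE fiscal_months (month_start TEXT);
        INSERT INTO payments VALUES
            ('John', '2015-06-01', 5), ('John', '2015-07-01', 6),
            ('John', '2015-07-01', 3), ('John', '2015-08-01', 10);
        INSERT INTO fiscal_months VALUES
            ('2015-05-01'), ('2015-06-01'), ('2015-07-01'),
            ('2015-08-01'), ('2015-09-01');
    """)
    rows = conn.execute("""
        WITH monthly AS (
            -- one row per (name, fiscal month), 0 when nothing was recorded
            SELECT n.name, m.month_start,
                   COALESCE(SUM(p.amount), 0) AS month_total
            FROM (SELECT DISTINCT name FROM payments) n
            CROSS JOIN fiscal_months m
            LEFT JOIN payments p
                   ON p.name = n.name AND p.month_start = m.month_start
            GROUP BY n.name, m.month_start
        )
        SELECT name, month_start,
               COALESCE(SUM(month_total) OVER (
                   PARTITION BY name
                   ORDER BY month_start
                   ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
               ), 0) AS beginning_balance
        FROM monthly
        ORDER BY name, month_start
    """).fetchall()
    conn.close()
    return rows

for row in beginning_balances():
    print(row)
```

The frame ends at `1 PRECEDING`, so the current month is excluded, which is what makes this a beginning balance rather than an ordinary running total.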