Summarize Table Based on Two Date Fields - sql

I have a table that, in its simplified form, has two date fields and an amount field. One of the date fields holds the order date, and the other contains the shipped date. I've been asked to report on both the amounts ordered and shipped grouped by date.
I used a self join that seemed to be working fine, except I found that it doesn't work on dates where no new orders were taken, but orders were shipped. I'd appreciate any help figuring out how best to solve the problem. (See below)
Order_Date Shipped_Date Amount
6/1/2015 6/2/2015 10
6/1/2015 6/3/2015 15
6/2/2015 6/3/2015 17
The T-SQL statement I'm using is as follows:
select a.ddate, a.soldamt, b.shippedamt
from
(select order_date as ddate, sum(amount) as soldamt from TABLE group by order_date) a
left join
(select shipped_date as ddate, sum(amount) as shippedamt from TABLE group by shipped_date) b
on a.ddate = b.ddate
This results in:
ddate soldamt shippedamt
6/1/2015 25 0
6/2/2015 17 10
The amount shipped on 6/3/2015 doesn't appear, obviously because there are no new orders on that date.
It's important to note this is being done in a Visual FoxPro table using T-SQL syntax, so some of the features found in more popular databases do not exist (for example, PIVOT)

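The simplest change would be to use a FULL OUTER JOIN instead of LEFT. A full join combines left and right joins, keeping unmatched records from both sides. Because a.ddate is NULL on dates that have shipments but no orders, take the date from whichever side has one (COALESCE in T-SQL; NVL is the Visual FoxPro equivalent) and join on the ddate alias that both subqueries expose:
SELECT COALESCE(a.ddate, b.ddate) AS ddate, a.soldamt, b.shippedamt
FROM
(select order_date as ddate, sum(amount) as soldamt from TABLE group by order_date) a
FULL OUTER JOIN
(select shipped_date as ddate, sum(amount) as shippedamt from TABLE group by shipped_date) b
ON a.ddate = b.ddate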

Another method (besides full outer join) is to use union all and an additional aggregation:
select ddate, sum(soldamt) as soldamt, sum(shippedamt) as shippedamt
from ((select order_date as ddate, sum(amount) as soldamt, 0 as shippedamt
from TABLE
group by order_date
) union all
(select shipped_date as ddate, 0, sum(amount) as shippedamt
from TABLE
group by shipped_date
)
) os
group by ddate;
This also results in fewer NULL values.
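To check the union all approach against the sample rows from the question, here is a minimal sketch that runs the same query shape in SQLite through Python's sqlite3 module. The table name orders, the ISO date strings, and the function name summarize_by_date are assumptions made for the example; Visual FoxPro itself is not involved.

```python
import sqlite3


def summarize_by_date(rows):
    """Return (ddate, soldamt, shippedamt) per date via UNION ALL plus a second aggregation."""
    con = sqlite3.connect(":memory:")
    # Hypothetical stand-in for the question's table.
    con.execute(
        "CREATE TABLE orders (order_date TEXT, shipped_date TEXT, amount INTEGER)"
    )
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    query = """
        SELECT ddate, SUM(soldamt) AS soldamt, SUM(shippedamt) AS shippedamt
        FROM (
            SELECT order_date AS ddate, SUM(amount) AS soldamt, 0 AS shippedamt
            FROM orders
            GROUP BY order_date
            UNION ALL
            SELECT shipped_date AS ddate, 0 AS soldamt, SUM(amount) AS shippedamt
            FROM orders
            GROUP BY shipped_date
        ) os
        GROUP BY ddate
        ORDER BY ddate
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# Sample data from the question, with dates rewritten as ISO strings so they sort correctly.
sample = [
    ("2015-06-01", "2015-06-02", 10),
    ("2015-06-01", "2015-06-03", 15),
    ("2015-06-02", "2015-06-03", 17),
]

if __name__ == "__main__":
    for row in summarize_by_date(sample):
        print(row)
```

The third date appears with a sold amount of 0 rather than NULL, because each half of the union supplies a literal 0 for the column it does not compute.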

Related

SQL: Difference between consecutive rows

Table with 3 columns: order id, member id, order date
I need to pull the distribution of orders broken down by the number of days between two consecutive orders, per member id.
What I have is this:
SELECT
a1.member_id,
count(distinct a1.order_id) as num_orders,
a1.order_date,
DATEDIFF(DAY, a1.order_date, a2.order_date) as days_since_last_order
from orders as a1
inner join orders as a2
on a2.member_id = a1.member_id+1;
It's not helping me completely as the output I need is:
You can use lag() to get the date of the previous order by the same customer:
select o.*,
datediff(
order_date,
lag(order_date) over(partition by member_id order by order_date, order_id)
) days_diff
from orders o
When there are two rows for the same date, the smallest order_id is considered first. Also note that I fixed your datediff() syntax: in Hive, the function just takes two dates, and no unit.
I just don't get the logic you want to compute num_orders.
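The lag() idea can be tried out without a Hive cluster. The sketch below runs the same window expression in SQLite (3.25 or later, for window functions) through Python's sqlite3; since SQLite has no datediff(), the difference is computed with julianday(), which is a substitution and not Hive syntax. The sample rows and the function name days_between_orders are made up for illustration.

```python
import sqlite3


def days_between_orders(rows):
    """Return (member_id, order_id, order_date, days_diff) using LAG per member."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE orders (order_id INTEGER, member_id INTEGER, order_date TEXT)"
    )
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    query = """
        SELECT member_id,
               order_id,
               order_date,
               -- julianday() difference stands in for Hive's datediff(end, start)
               CAST(julianday(order_date)
                    - julianday(LAG(order_date) OVER (
                          PARTITION BY member_id
                          ORDER BY order_date, order_id))
                    AS INTEGER) AS days_diff
        FROM orders
        ORDER BY member_id, order_date, order_id
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# Hypothetical data: member 10 places two orders on the same day.
sample_orders = [
    (1, 10, "2020-01-01"),
    (2, 10, "2020-01-05"),
    (3, 10, "2020-01-05"),
    (4, 20, "2020-02-01"),
    (5, 20, "2020-03-01"),
]

if __name__ == "__main__":
    for row in days_between_orders(sample_orders):
        print(row)
```

The first order of each member has no previous row, so its days_diff comes back as NULL (None in Python).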
Maybe something like this:
SELECT
a1.member_id,
count(distinct a1.order_id) as num_orders,
a1.order_date,
DATEDIFF(DAY, a2.order_date, a1.order_date) as days_since_last_order
from orders as a1
inner join orders as a2
on a2.member_id = a1.member_id
and a2.order_date < a1.order_date
where not exists (
select 1
from orders as intermediate_order
where intermediate_order.member_id = a1.member_id
and intermediate_order.order_date < a1.order_date
and intermediate_order.order_date > a2.order_date)
group by a1.member_id, a1.order_date, a2.order_date;

Calculating business days in Teradata

I need help in business days calculation.
I've two tables
1) One table, ACTUAL_TABLE, containing order date and contact date with timestamp datatypes.
2) The second table, BUSINESS_DATES, lists each calendar date and has a flag to indicate weekend days.
Using these two tables, I need to ensure that business days, not calendar days (which is the current logic), are calculated between these two fields.
My thought process was to first get a range of dates by comparing ORDER_DATE with the TABLE_DATE field and then do a similar comparison of CONTACT_DATE to the TABLE_DATE field. This would get me a range from the BUSINESS_DATES table, which I can then use to calculate the count of days and sum(Holiday_WKND_Flag), making the result look like:
Order# | Count(*) As DAYS | SUM(WEEKEND DATES)
100 | 25 | 8
However, this only works when I use a specific order number, and I can't bring all order numbers into a subquery.
My Query:
SELECT SUM(Holiday_WKND_Flag), COUNT(*) FROM
(
SELECT
* FROM
BUSINESS_DATES
WHERE BUSINESS.Business BETWEEN (SELECT ORDER_DATE FROM ACTUAL_TABLE
WHERE ORDER# = '100'
)
AND
(SELECT CONTACT_DATE FROM ACTUAL_TABLE
WHERE ORDER# = '100'
)
TEMP
Uploading the table structure for your reference.
SELECT ORDER#, SUM(Holiday_WKND_Flag), COUNT(*)
FROM business_dates bd
INNER JOIN actual_table at ON bd.table_date BETWEEN at.order_date AND at.contact_date
GROUP BY ORDER#
Instead of joining on a BETWEEN (which always results in a bad Product Join) followed by a COUNT, you'd better assign a business day number to each date (in the best case this is calculated only once and added as a column to your calendar table). Then it's two Equi-Joins and no aggregation is needed:
WITH cte AS
(
SELECT
Cast(table_date AS DATE) AS table_date,
-- assign a consecutive number to each business day, i.e. not increased during weekends, etc.
Sum(CASE WHEN Holiday_WKND_Flag = 1 THEN 0 ELSE 1 end)
Over (ORDER BY table_date
ROWS Unbounded Preceding) AS business_day_nbr
FROM business_dates
)
SELECT ORDER#,
Cast(t.contact_date AS DATE) - Cast(t.order_date AS DATE) AS #_of_days,
b2.business_day_nbr - b1.business_day_nbr AS #_of_business_days
FROM actual_table AS t
JOIN cte AS b1
ON Cast(t.order_date AS DATE) = b1.table_date
JOIN cte AS b2
ON Cast(t.contact_date AS DATE) = b2.table_date
Btw, why are table_date and order_date timestamp instead of a date?
Porting from Oracle?
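To see how the running business-day number behaves, here is a rough sketch of the same idea in SQLite (3.25 or later) driven from Python's sqlite3. This is not Teradata syntax: ORDER# is renamed order_no, dates are plain ISO strings instead of timestamps, and the calendar is generated for two weeks of June 2015 with only weekends flagged. All of those are assumptions for the example.

```python
import sqlite3
from datetime import date, timedelta


def make_calendar(start, days):
    """Build (table_date, holiday_wknd_flag) rows; only Saturdays and Sundays are flagged."""
    rows = []
    for i in range(days):
        d = start + timedelta(days=i)
        rows.append((d.isoformat(), 1 if d.weekday() >= 5 else 0))
    return rows


def business_day_diffs(calendar_rows, order_rows):
    """Return (order_no, calendar_days, business_days) using two equi-joins on a numbered calendar."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE business_dates (table_date TEXT, holiday_wknd_flag INTEGER)")
    con.execute("CREATE TABLE actual_table (order_no INTEGER, order_date TEXT, contact_date TEXT)")
    con.executemany("INSERT INTO business_dates VALUES (?, ?)", calendar_rows)
    con.executemany("INSERT INTO actual_table VALUES (?, ?, ?)", order_rows)
    query = """
        WITH cte AS (
            SELECT table_date,
                   -- running count of business days; it does not increase on flagged dates
                   SUM(CASE WHEN holiday_wknd_flag = 1 THEN 0 ELSE 1 END)
                       OVER (ORDER BY table_date ROWS UNBOUNDED PRECEDING) AS business_day_nbr
            FROM business_dates
        )
        SELECT t.order_no,
               CAST(julianday(t.contact_date) - julianday(t.order_date) AS INTEGER) AS calendar_days,
               b2.business_day_nbr - b1.business_day_nbr AS business_days
        FROM actual_table AS t
        JOIN cte AS b1 ON t.order_date = b1.table_date
        JOIN cte AS b2 ON t.contact_date = b2.table_date
        ORDER BY t.order_no
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


calendar = make_calendar(date(2015, 6, 1), 14)  # 2015-06-01 is a Monday
orders = [
    (100, "2015-06-04", "2015-06-09"),  # Thursday to the following Tuesday
    (101, "2015-06-05", "2015-06-08"),  # Friday to the following Monday
]

if __name__ == "__main__":
    for row in business_day_diffs(calendar, orders):
        print(row)
```

Order 100 spans five calendar days but only three business days, because the weekend in between does not advance business_day_nbr.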
You can use this query. Hope it helps
select order#,
order_date,
contact_date,
(select count(1)
from business_dates
where table_date between a.order_date and a.contact_date
and holiday_wknd_flag = 0
) business_days
from actual_table a

SQL: aggregation (group by like) in a column

I have a select that groups customer spending over the past two months by customer id and date. What I need to do is associate with each row the total amount spent by that customer in the whole first week of the two-month period (of course it would be repeated on every row for a given customer, but that's OK). Do you know how to do that without using a subquery as a column?
I was thinking using some combination of OVER PARTITION, but could not figure out how...
Thanks a lot in advance.
Raffaele
Query:
select customer_id, date, sum(sales)
from transaction_table
group by customer_id, date
If it's a specific first week (e.g. you always want the first week of the year, and your data set normally includes January and February spending), you could use sum(case...):
select distinct customer_id, date, sum(sales) over (partition by customer_ID, date)
, sum(case when date between '1/1/15' and '1/7/15' then Sales end)
over (partition by customer_id) as FirstWeekSales
from transaction_table
In response to the comments below: I'm not sure if this is what you're looking for, since it involves a subquery, but here's my best shot:
select distinct a.customer_id, date
, sum(sales) over (partition by a.customer_ID, date)
, sum(case when date between mindate and dateadd(DD, 6, mindate)
then Sales end)
over (partition by a.customer_id) as FirstWeekSales
from transaction_table a
left join
(select customer_ID, min(date) as mindate
from transaction_table group by customer_ID) b
on a.customer_ID = b.customer_ID
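Here is a small sketch of the per-customer "first week" variant, run in SQLite (3.25 or later) from Python's sqlite3. It treats the first week as the customer's first purchase date plus six days, which is an assumption about what the asker meant, and it uses SQLite's date() modifier in place of dateadd(). The column name sale_date and the sample rows are hypothetical.

```python
import sqlite3


def sales_with_first_week(rows):
    """Return (customer_id, sale_date, day_sales, first_week_sales) for each customer and date."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE transaction_table (customer_id INTEGER, sale_date TEXT, sales INTEGER)"
    )
    con.executemany("INSERT INTO transaction_table VALUES (?, ?, ?)", rows)
    query = """
        SELECT DISTINCT a.customer_id,
               a.sale_date,
               SUM(a.sales) OVER (PARTITION BY a.customer_id, a.sale_date) AS day_sales,
               -- only rows within the customer's first seven days contribute to this sum
               SUM(CASE WHEN a.sale_date BETWEEN b.mindate AND date(b.mindate, '+6 days')
                        THEN a.sales END)
                   OVER (PARTITION BY a.customer_id) AS first_week_sales
        FROM transaction_table a
        JOIN (SELECT customer_id, MIN(sale_date) AS mindate
              FROM transaction_table
              GROUP BY customer_id) b
          ON a.customer_id = b.customer_id
        ORDER BY a.customer_id, a.sale_date
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# Hypothetical transactions; customer 1 has two sales on the same day.
transactions = [
    (1, "2015-01-01", 10),
    (1, "2015-01-01", 5),
    (1, "2015-01-03", 20),
    (1, "2015-01-10", 7),
    (2, "2015-01-05", 3),
    (2, "2015-01-20", 4),
]

if __name__ == "__main__":
    for row in sales_with_first_week(transactions):
        print(row)
```

The first-week total repeats on every row of a customer, which matches what the question asked for.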

How can I join a record with the closest prior date in another table?

I want to produce a sales report showing the price of each product at the time it was sold. The pricing is stored in a separate price history table, so I want to join with the most recent price that is prior to the sale date.
I have an idea of how to implement the query, but the database I'm working with (Vertica) doesn't seem to support what I want to do.
Here's a simplified version of the table structure:
Sales
-----
Date Product ID
1/1/2001 1
2/2/2002 1
3/3/2003 1
PriceHistory
-------------
Date ProductID Price
12/31/2000 1 1.00
12/31/2001 1 1.01
12/31/2002 1 1.11
Here's an example of the report I want to generate from the above data:
Sales Report
------------
Date ProductID Price
1/1/2001 1 1.00
2/2/2002 1 1.01
3/3/2003 1 1.11
Here's the SQL I've written so far:
SELECT s.date,
s.productid,
ph.price
FROM Sales s,
PriceHistory ph
WHERE s.productid=ph.productid
AND ph.date=
(SELECT MAX(date)
FROM PriceHistory
WHERE productid=s.productid
AND date < s.date)
That might work on another DB platform, but Vertica gives me this error: "Non-equality correlated subquery expression is not supported". It doesn't like the date < s.date component of my subquery.
Is there another way to do this?
Later, I found a workaround that did work on Vertica, using two different references to the PriceHistory table in the query:
SELECT s.date,
s.productid,
ph.price
FROM Sales s
JOIN PriceHistory ph ON ph.date < s.date
WHERE s.productid=ph.productid
AND ph.date=
(SELECT MAX(date)
FROM PriceHistory ph2
WHERE ph2.productid=s.productid
AND ph2.date = ph.date)
There may be a better way to do it, if so I'd love to hear it.
Try the JOIN INTERPOLATE clause. For your example, the following query produces exactly what you want:
SELECT s.date,
s.productid,
ph.price
FROM Sales s
left outer join PriceHistory ph ON s.date INTERPOLATE
PREVIOUS VALUE ph.date
where s.productid=ph.productid
Issue is with this:
SELECT MAX(date)
FROM PriceHistory
WHERE productid=s.productid
AND date < s.date
s is only defined in the outer query, and Vertica is rejecting the non-equality correlated reference to it from inside the subquery. Just do the join in your subquery.
SELECT MAX(ph.date)
FROM PriceHistory ph, Sales s
WHERE ph.productid=s.productid
AND ph.date < s.date
I feel jealous knowing other people get to play on Vertica while I'm in MySQL. ugh..
Should mention you are using old syntax and the majority of this can be done as joins. Call the max_date a subquery and move it to the from statement...inner joins function as filters at that level.
Edit:
If that failed, it's vertica having some subquery problems...it's been a while but I think I've solved this one in the past (with vertica support):
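SELECT s.date,
s.productid,
ph.price
FROM Sales s,
PriceHistory ph,
(SELECT s2.productid, s2.date AS sale_date, MAX(ph2.date) AS max_date
FROM PriceHistory ph2, Sales s2
WHERE ph2.productid=s2.productid
AND ph2.date < s2.date
GROUP BY s2.productid, s2.date) a
WHERE s.productid=ph.productid
AND a.productid=s.productid
AND a.sale_date=s.date
AND ph.date=a.max_date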
Give that version a try...if that fails, I'd go back to vertica support and ask them for how to handle it (functionally, what you are asking for should be standard, I assume vertica just has a slightly altered way of getting there...I recall being asked to avoid using subqueries in where clauses).
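For reference, the "closest prior price" lookup can be written without any correlated subquery by first computing, per sale, the latest price date before it, and then joining back to the price table. The sketch below checks that shape against the question's sample data in SQLite via Python's sqlite3. It has not been run on Vertica, and the table and column names (sales, price_history, sale_date, price_date) are renamed stand-ins chosen for the example.

```python
import sqlite3


def prices_at_sale(sales_rows, price_rows):
    """Return (sale_date, product_id, price) using the most recent price strictly before each sale."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (sale_date TEXT, product_id INTEGER)")
    con.execute("CREATE TABLE price_history (price_date TEXT, product_id INTEGER, price REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", sales_rows)
    con.executemany("INSERT INTO price_history VALUES (?, ?, ?)", price_rows)
    query = """
        SELECT s.sale_date, s.product_id, ph.price
        FROM sales s
        JOIN (
            -- latest price date before each sale, computed with a plain join and GROUP BY
            SELECT s2.product_id, s2.sale_date, MAX(ph2.price_date) AS max_date
            FROM sales s2
            JOIN price_history ph2
              ON ph2.product_id = s2.product_id
             AND ph2.price_date < s2.sale_date
            GROUP BY s2.product_id, s2.sale_date
        ) latest
          ON latest.product_id = s.product_id
         AND latest.sale_date = s.sale_date
        JOIN price_history ph
          ON ph.product_id = s.product_id
         AND ph.price_date = latest.max_date
        ORDER BY s.sale_date
    """
    result = con.execute(query).fetchall()
    con.close()
    return result


# The question's sample data, with dates rewritten as ISO strings.
sales = [("2001-01-01", 1), ("2002-02-02", 1), ("2003-03-03", 1)]
price_history = [
    ("2000-12-31", 1, 1.00),
    ("2001-12-31", 1, 1.01),
    ("2002-12-31", 1, 1.11),
]

if __name__ == "__main__":
    for row in prices_at_sale(sales, price_history):
        print(row)
```

A sale with no earlier price row drops out of the result, because every join here is an inner join.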

Can I limit the amount of rows to be used for a group in a GROUP BY statement

I'm having an odd problem
I have a table with the columns product_id, sales and day
Not all products have sales every day. I'd like to get the average number of sales that each product had in the last 10 days where it had sales
Usually I'd get the average like this
SELECT product_id, AVG(sales)
FROM table
GROUP BY product_id
Is there a way to limit the amount of rows to be taken into consideration for each product?
I'm afraid it's not possible but I wanted to check if someone has an idea
Update to clarify:
Product may be sold on days 1,3,5,10,15,17,20.
Since I don't want the average over all days but only over the days where the product actually got sold, doing something like
SELECT product_id, AVG(sales)
FROM table
WHERE day > '01/01/2009'
GROUP BY product_id
won't work
If you want the last 10 calendar days counting back from each product's most recent sale:
SELECT t.product_id, AVG(t.sales)
FROM table t
JOIN (
SELECT product_id, MAX(sales_date) as max_sales_date
FROM table
GROUP BY product_id
) t_max ON t.product_id = t_max.product_id
AND DATEDIFF(day, t.sales_date, t_max.max_sales_date) < 10
GROUP BY t.product_id;
The date difference is SQL server specific, you'd have to replace it with your server syntax for date difference functions.
To get the last 10 days when the product had any sale:
SELECT product_id, AVG(sales)
FROM (
SELECT product_id, sales, DENSE_RANK() OVER
(PARTITION BY product_id ORDER BY sales_date DESC) AS rn
FROM Table
) As t_rn
WHERE rn <= 10
GROUP BY product_id;
This assumes sales_date is a date, not a datetime. You'd have to extract the date part if the field is datetime.
And finally a windowing-function-free version:
SELECT product_id, AVG(sales)
FROM Table t
WHERE sales_date IN (
SELECT TOP(10) sales_date
FROM Table s
WHERE t.product_id = s.product_id
ORDER BY sales_date DESC)
GROUP BY product_id;
Again, sales_date is assumed to be a date, not a datetime. Use other limiting syntax if TOP is not supported by your server.
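As one example of "other limiting syntax", the sketch below runs the window-function-free version in SQLite through Python's sqlite3, with LIMIT taking the place of TOP. It assumes one row per product per sales date; the table name sales, the sample data, and the function name avg_last_n_sale_days are invented for the example.

```python
import sqlite3


def avg_last_n_sale_days(rows, n=10):
    """Return (product_id, avg_sales) over each product's n most recent dates that had sales."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (product_id INTEGER, sales INTEGER, sales_date TEXT)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    query = """
        SELECT t.product_id, AVG(t.sales)
        FROM sales t
        WHERE t.sales_date IN (
            -- correlated subquery: the n latest sale dates of the current product
            SELECT s.sales_date
            FROM sales s
            WHERE s.product_id = t.product_id
            ORDER BY s.sales_date DESC
            LIMIT ?
        )
        GROUP BY t.product_id
        ORDER BY t.product_id
    """
    result = con.execute(query, (n,)).fetchall()
    con.close()
    return result


# Hypothetical data: product 1 sold on 12 days (sales 1..12), product 2 on only 3 days.
sales_rows = [(1, day, "2009-01-%02d" % day) for day in range(1, 13)]
sales_rows += [
    (2, 10, "2009-01-02"),
    (2, 20, "2009-01-09"),
    (2, 30, "2009-01-20"),
]

if __name__ == "__main__":
    for row in avg_last_n_sale_days(sales_rows):
        print(row)
```

Products with fewer than n sale days are averaged over whatever days they have, as product 2 is here.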
Give this a whirl. The sub-query selects the last ten days of a product where there was a sale, the outer query does the aggregation.
SELECT t1.product_id, SUM(t1.sales) / COUNT(*)
FROM table t1
WHERE t1.day IN (
SELECT TOP 10 t2.day
FROM table t2
WHERE t2.product_id = t1.product_id
ORDER BY t2.day DESC
)
GROUP BY t1.product_id
BTW: This approach uses a correlated subquery, which may not be very performant, but it should work in theory.
I'm not sure if I understood correctly, but if you'd like the average of sales over the last 10 days for your products, you can do as follows:
SELECT Product_Id,Sum(Sales)/Count(*) FROM (SELECT Product_Id,Sales FROM Table WHERE SaleDate>=#Date) t GROUP BY Product_Id HAVING Count(*)>0
Or you can use the AVG aggregate function, which is easier:
SELECT Product_Id,AVG(Sales) FROM (SELECT Product_Id,Sales FROM Table WHERE SaleDate>=#Date) t GROUP BY Product_Id
Updated
Now I get what you meant. As far as I know, it is not possible to do this in one query. It could be possible if we could do something like this (Northwind database):
select a.CustomerId,count(a.OrderId)
from Orders a INNER JOIN(SELECT CustomerId,OrderDate FROM Orders Order By OrderDate) AS b ON a.CustomerId=b.CustomerId GROUP BY a.CustomerId Having count(a.OrderId)<10
but you can't use ORDER BY in subqueries unless you use TOP, which is not suitable for this case. But maybe you can do it as follows:
SELECT ProductId,Sales INTO #temp FROM table Order By Day
select a.ProductId,Sum(a.Sales) /Count(a.Sales)
from table a INNER JOIN #temp AS b ON a.ProductId=b.ProductId GROUP BY a.ProductId Having count(a.Sales)<=10
If this is a table of sales transactions, then there should not be any rows in there for days on which there were no Sales. I.e., If ProductId 21 had no sales on 1 June, then this table should not have any rows with productId = 21 and day = '1 June'... Therefore you should not have to filter anything out - there should not be anything to filter out
Select ProductId, Avg(Sales) AvgSales
From Table
Group By ProductId
should work fine. So if it's not, then you have not explained the problem completely or accurately.
Also, in your question, you show Avg(Sales) in the example SQL query but then in the text you mention "average number of sales that each product ... " Do you want the average sales amount, or the average count of sales transactions? And do you want this average by Product alone (i.e., one output value reported for each product) or do you want the average per product per day?
If you want the average per product alone, is it for just those sales in the ten days prior to now, or in the ten days prior to the date of the last sale for each product?
If the latter then
Select ProductId, Avg(Sales) AvgSales
From Table T
Where day > (Select Max(Day) - 10
From Table
Where ProductId = T.ProductID)
Group By ProductId
If you want the average per product alone, for just those sales in the ten days with sales prior to the date of the last sale for each product, then
Select ProductId, Avg(Sales) AvgSales
From Table T
Where (Select Count(Distinct day) From Table
Where ProductId = T.ProductID
And Day > T.Day) <= 10
Group By ProductId