SQL join with dates - sql

I have table A with columns:
customer_id, month, amount
Month is like 2015/12/01 meaning it's amount paid in December 2015.
Then there is table B with columns:
customer_id, plan_id, start_date, end_date
This is information on when a particular customer started and ended using a particular plan. The current plan will have end_date NULL. One customer could have used many different plans in the past.
I need to add plan_id column to table A by joining these 2 tables but I have no idea how to deal with the dates.
Note that for each customer one month should correspond to one plan only. So even if the start_date for a plan is 2015/11/02, it should only be applied for the next month (2015/12/01).

This is basically a join, but with inequalities:
select a.*, b.*
from a left join
b
on a.customer_id = b.customer_id and
a.month >= b.start_date and
(a.month <= b.end_date or b.end_date is null);
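To see how the inequality join behaves, here is a minimal sketch that runs the same join in an in-memory SQLite database from Python. The sample rows are invented for illustration, and the dates are stored as ISO 'YYYY-MM-DD' text (an assumption; the question writes them as 2015/12/01) so that plain comparison orders them correctly.

```python
import sqlite3

# Hypothetical sample data; ISO date strings compare correctly as text in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (customer_id INTEGER, month TEXT, amount REAL);
CREATE TABLE b (customer_id INTEGER, plan_id INTEGER, start_date TEXT, end_date TEXT);
INSERT INTO a VALUES
  (1, '2015-01-01', 5), (1, '2015-10-01', 10),
  (1, '2015-11-01', 10), (1, '2015-12-01', 12);
INSERT INTO b VALUES
  (1, 10, '2015-01-15', '2015-11-01'),  -- an older plan
  (1, 20, '2015-11-02', NULL);          -- the current plan, end_date is NULL
""")

# Same join as above: match each month to the plan whose range contains it.
plan_by_month = conn.execute("""
SELECT a.month, b.plan_id
FROM a
LEFT JOIN b
  ON a.customer_id = b.customer_id
 AND a.month >= b.start_date
 AND (a.month <= b.end_date OR b.end_date IS NULL)
ORDER BY a.month
""").fetchall()

for month, plan_id in plan_by_month:
    print(month, plan_id)
```

In this sample the plan starting on 2015-11-02 is first applied to 2015-12-01, as the question requires, and the month before any plan started comes back with a NULL plan_id because of the LEFT JOIN.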

Related

How to select dates when I sold nothing?

Assume I have a table item_sold covering 01-01-2020 to 01-01-2023, with columns product_id, product_name, quantity and date.
I want to get all the dates when I sold nothing.
I am using PostgreSQL; please help me with this problem.
I tried a WITH clause and many other things, but they didn't work.
You need some kind of calendar table containing all the dates which you potentially want to report. Assuming the date range is the entire years 2020 through 2023 inclusive, we can try the following left anti-join approach:
WITH dates AS (
SELECT ('2020-01-01'::date + s.a) AS dt
FROM generate_series(0, 365*4) AS s(a)
)
SELECT d.dt
FROM dates d
LEFT JOIN item_sold t
ON t.date = d.dt
WHERE t.date IS NULL
ORDER BY d.dt;
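The same anti-join idea can be tried without a PostgreSQL server. The sketch below is an assumption-laden stand-in: it uses SQLite from Python, builds the calendar with a recursive CTE instead of generate_series, covers only five days, and uses made-up sales rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_sold (product_id INTEGER, product_name TEXT, quantity INTEGER, date TEXT);
-- Hypothetical rows: sales on 1 January and 3 January only.
INSERT INTO item_sold VALUES
  (1, 'widget', 3, '2020-01-01'),
  (2, 'gadget', 1, '2020-01-03'),
  (1, 'widget', 2, '2020-01-03');
""")

# Calendar of 2020-01-01 .. 2020-01-05, then keep the days with no matching sale.
days_without_sales = [row[0] for row in conn.execute("""
WITH RECURSIVE dates(dt) AS (
    SELECT '2020-01-01'
    UNION ALL
    SELECT date(dt, '+1 day') FROM dates WHERE dt < '2020-01-05'
)
SELECT d.dt
FROM dates d
LEFT JOIN item_sold t ON t.date = d.dt
WHERE t.date IS NULL
ORDER BY d.dt
""")]

print(days_without_sales)
```

A day with several sales still matches the calendar row, so it is excluded once the WHERE clause keeps only the unmatched days.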

SQL Sum Sales for a week based on previous week range

Need help with SQL query in resolving the below problem:
The Input table has Product ID, Week and Sales columns, where each Product ID and Week combination is unique. The Start Week and End Week columns give the range over which I want to sum up sales for that particular week.
From the Input table we want to extract the Product ID along with Week and get the sum of sales based on the week being between start week and end week range.
The Sales value against each Product ID and Week is the sum of sales based on the corresponding start and end week for that product and week combination in the input table.
I was trying to do a self join on the input table but realized it would not work as I need to join on both Product ID and Week which will nullify the objective.
Select a.Product ID, a.Week, Sum(a.Sales)
from Input as a, Input as b
where a.Product ID = b.Product ID
and a.Week between b.Start Week and b.End Week
group by 1,2
You just need to switch to an Outer Join:
Select a.ProductID, a.Week, Sum(b.Sales)
from Input as a LEFT JOIN Input as b
ON a.ProductID = b.ProductID
and b.Week between a.StartWeek and a.EndWeek
group by 1,2
This should result in a better plan than Alec's subquery.
I'm not crazy about the sub query, but it should at least get you started:
SELECT
    a.ProductID,
    a.Week,
    (
        SELECT
            SUM(b.Sales)
        FROM
            Input b
        WHERE
            b.ProductID = a.ProductID AND
            b.Week BETWEEN a.StartWeek AND a.EndWeek
    ) as CumulativeSales
FROM
    Input a
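As a quick sanity check, the following sketch compares a self-join form with the correlated subquery on a tiny invented data set, using SQLite from Python. It assumes the rows being summed come from the second alias (b) and that b.Week must fall inside a's StartWeek to EndWeek range; the column names without spaces are also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Input (ProductID INTEGER, Week INTEGER, Sales INTEGER,
                    StartWeek INTEGER, EndWeek INTEGER);
-- Hypothetical rows; week 4 points at a range that contains no rows.
INSERT INTO Input VALUES
  (1, 1, 10, 1, 1),
  (1, 2, 20, 1, 2),
  (1, 3, 30, 2, 3),
  (1, 4, 40, 9, 9);
""")

join_sql = """
SELECT a.ProductID, a.Week, SUM(b.Sales)
FROM Input AS a
LEFT JOIN Input AS b
  ON a.ProductID = b.ProductID
 AND b.Week BETWEEN a.StartWeek AND a.EndWeek
GROUP BY a.ProductID, a.Week
ORDER BY a.ProductID, a.Week
"""

subquery_sql = """
SELECT a.ProductID, a.Week,
       (SELECT SUM(b.Sales)
        FROM Input b
        WHERE b.ProductID = a.ProductID
          AND b.Week BETWEEN a.StartWeek AND a.EndWeek)
FROM Input a
ORDER BY a.ProductID, a.Week
"""

join_rows = conn.execute(join_sql).fetchall()
subquery_rows = conn.execute(subquery_sql).fetchall()
print(join_rows)
```

Both forms return one row per product and week, and a week whose range matches nothing gets a NULL sum rather than disappearing, which is the reason for the outer join.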

Calculating business days in Teradata

I need help with a business-day calculation.
I have two tables:
1) One table, ACTUAL_TABLE, containing order date and contact date with timestamp datatypes.
2) The second table, BUSINESS_DATES, has each of the calendar dates listed and has a flag to indicate weekend days.
Using these two tables, I need to ensure that business days, and not calendar days (which is the current logic), are calculated between these two fields.
My thought process was to first get a range of dates by comparing ORDER_DATE with TABLE_DATE field and then do a similar comparison of CONTACT_DATE to TABLE_DATE field. This would get me a range from the BUSINESS_DATES table which I can then use to calculate count of days, sum(Holiday_WKND_Flag) fields making the result look like:
Order# | Count(*) As DAYS | SUM(WEEKEND DATES)
100 | 25 | 8
However, this only works when I use a specific order number, and I can't bring all order numbers into a subquery.
My Query:
SELECT SUM(Holiday_WKND_Flag), COUNT(*) FROM
(
SELECT
* FROM
BUSINESS_DATES
WHERE TABLE_DATE BETWEEN (SELECT ORDER_DATE FROM ACTUAL_TABLE
WHERE ORDER# = '100'
)
AND
(SELECT CONTACT_DATE FROM ACTUAL_TABLE
WHERE ORDER# = '100'
)
) TEMP
Uploading the table structure for your reference.
SELECT ORDER#, SUM(Holiday_WKND_Flag), COUNT(*)
FROM business_dates bd
INNER JOIN actual_table at ON bd.table_date BETWEEN at.order_date AND at.contact_date
GROUP BY ORDER#
Instead of joining on a BETWEEN (which always results in a bad Product Join) followed by a COUNT, you are better off assigning a business day number to each date (ideally this is calculated only once and added as a column to your calendar table). Then it's two Equi-Joins and no aggregation is needed:
WITH cte AS
(
SELECT
Cast(table_date AS DATE) AS table_date,
-- assign a consecutive number to each business day, i.e. not increased during weekends, etc.
Sum(CASE WHEN Holiday_WKND_Flag = 1 THEN 0 ELSE 1 end)
Over (ORDER BY table_date
ROWS Unbounded Preceding) AS business_day_nbr
FROM business_dates
)
SELECT ORDER#,
Cast(t.contact_date AS DATE) - Cast(t.order_date AS DATE) AS #_of_days,
b2.business_day_nbr - b1.business_day_nbr AS #_of_business_days
FROM actual_table AS t
JOIN cte AS b1
ON Cast(t.order_date AS DATE) = b1.table_date
JOIN cte AS b2
ON Cast(t.contact_date AS DATE) = b2.table_date
Btw, why are table_date and order_date timestamp instead of a date?
Porting from Oracle?
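The running business-day number is easy to try outside the database. This is only a sketch of the idea in plain Python, not Teradata code: the two-week calendar is invented, and weekends stand in for Holiday_WKND_Flag.

```python
from datetime import date, timedelta
from itertools import accumulate

# Hypothetical calendar: 2024-01-01 (a Monday) through 2024-01-14.
calendar = [date(2024, 1, 1) + timedelta(days=i) for i in range(14)]
is_off = [d.weekday() >= 5 for d in calendar]  # stands in for Holiday_WKND_Flag

# Running count of business days; the number does not increase on flagged days.
business_day_nbr = dict(
    zip(calendar, accumulate(0 if off else 1 for off in is_off))
)

def business_days_between(order_date, contact_date):
    """Two lookups and a subtraction, mirroring the two equi-joins."""
    return business_day_nbr[contact_date] - business_day_nbr[order_date]

# Friday to the following Tuesday: 4 calendar days apart.
print(business_days_between(date(2024, 1, 5), date(2024, 1, 9)))
```

With this numbering the difference counts the business days after the order date up to and including the contact date, so a Friday-to-Tuesday span yields 2 rather than 4.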
You can use this query. Hope it helps
select order#,
order_date,
contact_date,
(select count(1)
from business_dates_table
where table_date between a.order_date and a.contact_date
and holiday_wknd_flag = 0
) business_days
from actual_table a

Summarize Table Based on Two Date Fields

I have a table that, in its simplified form, has two date fields and an amount field. One of the date fields holds the order date, and the other contains the shipped date. I've been asked to report on both the amounts ordered and shipped, grouped by date.
I used a self join that seemed to be working fine, except I found that it doesn't work on dates where no new orders were taken, but orders were shipped. I'd appreciate any help figuring out how best to solve the problem. (See below)
Order_Date Shipped_Date Amount
6/1/2015 6/2/2015 10
6/1/2015 6/3/2015 15
6/2/2015 6/3/2015 17
The T-SQL statement I'm using is as follows:
select a.ddate, a.soldamt, b.shippedamt
from
(select order_date as ddate, sum(amount) as soldamt from TABLE group by order_date) a
left join
(select shipped_date as ddate, sum(amount) as shippedamt from TABLE group by shipped_date) b
on a.ddate = b.ddate
This results in:
ddate soldamt shippedamt
6/1/2015 15 0
6/2/2015 17 10
The amount shipped on 6/3/2015 doesn't appear, obviously because there are no new orders on that date.
It's important to note this is being done in a Visual FoxPro table using T-SQL syntax, so some of the features found in more popular databases do not exist (for example, PIVOT)
The simplest change would be to use a FULL OUTER JOIN instead of LEFT. A full join combines both right and left joins including unmatched records in both directions.
SELECT a.ddate, a.soldamt, b.shippedamt
FROM
(select order_date as ddate, sum(amount) as soldamt from TABLE group by order_date) a
FULL OUTER JOIN
(select shipped_date as ddate, sum(amount) as shippedamt from TABLE group by shipped_date) b
ON a.ddate = b.ddate
Another method (besides full outer join) is to use union all and an additional aggregation:
select ddate, sum(soldamt) as soldamt, sum(shippedamt) as shippedamt
from ((select order_date as ddate, sum(amount) as soldamt, 0 as shippedamt
from TABLE
group by order_date
) union all
(select shipped_date as ddate, 0, sum(amount) as shippedamt
from TABLE
group by shipped_date
)
) os
group by ddate;
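Here is a minimal runnable sketch of the union all approach, using SQLite from Python rather than Visual FoxPro. The table name orders is hypothetical (the question only says TABLE), and the three sample rows from the question are re-entered with ISO dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_date TEXT, shipped_date TEXT, amount INTEGER);
INSERT INTO orders VALUES
  ('2015-06-01', '2015-06-02', 10),
  ('2015-06-01', '2015-06-03', 15),
  ('2015-06-02', '2015-06-03', 17);
""")

# Stack the per-day sold and shipped totals, then aggregate once more by date.
summary = conn.execute("""
SELECT ddate, SUM(soldamt) AS soldamt, SUM(shippedamt) AS shippedamt
FROM (
    SELECT order_date AS ddate, SUM(amount) AS soldamt, 0 AS shippedamt
    FROM orders
    GROUP BY order_date
    UNION ALL
    SELECT shipped_date AS ddate, 0, SUM(amount)
    FROM orders
    GROUP BY shipped_date
) os
GROUP BY ddate
ORDER BY ddate
""").fetchall()

for row in summary:
    print(row)
```

On these three rows the date 2015-06-03 appears with a sold amount of 0 and a shipped amount of 32, which is the row the LEFT JOIN version loses.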
This also results in fewer NULL values.

SELECT Query between dates, only selecting items between start and end fields

I have two tables that I will be using for tracking purposes, a Date Table and an Item Table. The Date Table is used to track the start and end dates of a tracked id. The Item Table is the amount of items that are pulled on a specific date for an id. The id is the foreign key between these two tables.
What I want to do, is a sum of the items with a GROUP BY of the id of the items, but only by summing the items based on if the date of the pulled item falls between the start_date and end_date of the tracked id.
The Date Table
id start_date end_date
1 2014-01-01 NULL
2 2014-01-01 2014-01-02
3 2014-01-25 NULL
The Item Table
id items date
1 3 2014-01-01
1 5 2014-01-02
1 5 2014-01-26
2 2 2014-01-01
2 3 2014-01-05
2 2 2014-01-26
3 2 2014-01-01
3 3 2014-01-05
3 2 2014-01-26
SQL I have so far, but I'm lost as to what to add to it from here.
SELECT
a.id,
SUM(items)
FROM
ww_test.dbo.items a
INNER JOIN ww_test.dbo.dates b ON
a.id = b.id
WHERE
a.date >= '2014-01-01' AND a.date <= '2014-01-30'
GROUP BY
a.id
ORDER BY
a.id
The output should be:
id items
1 13
2 2
3 2
Instead of:
id items
1 13
2 7
3 7
First of all, I strongly recommend that you stop using NULL in your date ranges to represent "no end date" and instead use a sentinel value such as 9999-12-31. The reason for this is primarily performance and secondarily query simplicity--a benefit to yourself now in writing the queries and to you or others later who have to maintain them. In front-end or middle-tier code, there is little difference to comparing a date range to Null or to 9999-12-31, and in fact you get some of the same benefits of simplified code there as you do in your SQL. I base this recommendation on over 10 years of full-time professional SQL query writing experience.
To fix your query as is, I think this would work:
SELECT
a.id,
ItemsSum = SUM(items)
FROM
ww_test.dbo.items a
INNER JOIN ww_test.dbo.dates b
ON a.id = b.id
AND a.date >= Coalesce(b.start_date, 0)
AND a.date <= Coalesce(b.end_date, '99991231')
WHERE
a.date >= '20140101'
AND a.date <= '20140130'
GROUP BY
a.id
ORDER BY
a.id
;
Note that if you followed my recommendation, your query JOIN conditions could look like this:
INNER JOIN ww_test.dbo.dates b
ON a.id = b.id
AND a.date >= b.start_date
AND a.date <= b.end_date
You will find that if your data sets become large, having to put a Coalesce or IsNull in there will hurt performance in a significant way. It doesn't help to use OR clauses, either:
INNER JOIN ww_test.dbo.dates b
ON a.id = b.id
AND (a.date >= b.start_date OR b.start_date IS NULL)
AND (a.date <= b.end_date OR b.end_date IS NULL)
That's going to have the same problems (for example converting what could have been a seek when there's a suitable index, into a scan, which would be very sad).
Last, I also recommend that you change your end dates to be exclusive instead of inclusive. This means that for the end date, instead of entering the date of the beginning of the final day the information is true, you put the date of the first day it is no longer true. There are several reasons for this recommendation:
If your date resolution ever changes to hours, or minutes, or seconds, every piece of code you have ever written dealing with this data will have to change (and it won't if you use exclusive end dates).
If you ever have to compare date ranges to each other (to collapse date ranges together or locate contiguous ranges or even locate non-contiguous ranges), you now have to do all the comparisons on a.end_date + 1 = b.start_date instead of a simple equijoin of a.end_date = b.start_date. This is painful, and easy to make mistakes.
Always thinking of dates as implying a time of day will be extremely salutary to your coding ability in any language. Many mistakes are made, over and over, by people forgetting that dates, even ones in formats that can't denote a time portion (such as the date data type in SQL Server 2008 and up), still have an implicit time portion, can be converted directly to data types that do have a time portion, and that time portion will always be 0, that is, 12 a.m.
The only drawback is that in some cases, you have to do some twiddling about what date you show users (to convert to the inclusive date) and then convert dates they enter into the exclusive date for storing into the database. But this is confined to UI-handling code and is not throughout your database, so it's not that big a drawback.
The only change to your query would be:
INNER JOIN ww_test.dbo.dates b
ON a.id = b.id
AND a.date >= b.start_date
AND a.date < b.end_date -- no equal sign now
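To make the two recommendations concrete, here is a sketch that loads the question's sample data into SQLite from Python, with two assumed changes to the Date Table: NULL end dates become the sentinel 9999-12-31, and the inclusive end 2014-01-02 becomes the exclusive end 2014-01-03. The unqualified table names dates and items are shorthand for the ww_test.dbo tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dates (id INTEGER, start_date TEXT, end_date TEXT);
CREATE TABLE items (id INTEGER, items INTEGER, date TEXT);
-- Sentinel instead of NULL, and exclusive end dates (assumed conversion).
INSERT INTO dates VALUES
  (1, '2014-01-01', '9999-12-31'),
  (2, '2014-01-01', '2014-01-03'),
  (3, '2014-01-25', '9999-12-31');
INSERT INTO items VALUES
  (1, 3, '2014-01-01'), (1, 5, '2014-01-02'), (1, 5, '2014-01-26'),
  (2, 2, '2014-01-01'), (2, 3, '2014-01-05'), (2, 2, '2014-01-26'),
  (3, 2, '2014-01-01'), (3, 3, '2014-01-05'), (3, 2, '2014-01-26');
""")

# No COALESCE and no OR: a closed lower bound and an open upper bound.
totals = conn.execute("""
SELECT a.id, SUM(a.items)
FROM items a
INNER JOIN dates b
   ON a.id = b.id
  AND a.date >= b.start_date
  AND a.date < b.end_date
WHERE a.date >= '2014-01-01'
  AND a.date <= '2014-01-30'
GROUP BY a.id
ORDER BY a.id
""").fetchall()

print(totals)
```

The totals match the output the question asks for (13, 2 and 2), with join conditions that are plain comparisons against stored columns.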
One last thing: be aware that the date format 'yyyy-mm-dd' is not culture-safe.
SET LANGUAGE FRENCH;
SELECT Convert(datetime, '2014-01-30'); -- fails with an error
The only invariantly culture-safe formats for datetime in SQL Server are:
yyyymmdd
yyyy-mm-ddThh:mm:ss
I think what you want to do is check that each item's date falls between the start_date and end_date in your Dates table.
Change your query to the following and try
SELECT
a.id,
SUM(items)
FROM
ww_test.dbo.items a
INNER JOIN ww_test.dbo.dates b ON a.id = b.id
WHERE
a.date >= ISNULL(b.start_date, GETDATE())
AND a.date <= ISNULL(b.end_date, GETDATE())
GROUP BY a.id
ORDER BY a.id
The problem with the query is the condition part.
Also, since you need to retrieve data based on the condition defined in Dates table, you do not have to explicitly hard code the condition.
Assuming that your End Date can either be null or have values, you can use the following query:
SELECT
a.id,
SUM(items)
FROM
ww_test.dbo.items a
INNER JOIN ww_test.dbo.dates b ON
a.id = b.id
where (b.end_date is not null and a.date between b.start_date and b.end_date)
or (b.end_date is null and a.date >= b.start_date)
GROUP BY
a.id
ORDER BY
a.id