Matching based on whether year is included (at all) in date range - sql

I am attempting to join two tables based on date ranges.
Table A format is:
ID CAT DATE_START DATE_END
1 10 2018-01-01 2020-12-31
2 15 2018-06-01 2018-07-01
Table B format is:
ID YEAR VALUE
1 2017 100
1 2018 110
1 2019 90
1 2020 30
2 2018 200
The resulting table should be merged if for a given ID, any of the days in B.YEAR are included in the date range from A.DATE_START to A.DATE_END, and should look like this:
ID YEAR CAT VALUE
1 2018 10 110
1 2019 10 90
1 2020 10 30
2 2018 15 200
I tried merging using extract(year from DATE_START) and extract(year from DATE_END), but I cannot manage to include the middle year 2019 in the interval, which means ID = 1 is missing its 2019 value.
I also tried merging using to_date(YEAR, 'YYYY'), but the generated date for YEAR = '2018' is '1.9.2018', which does not fall in the interval for ID = 2. Thanks a lot for help.

Join the tables like this:
select a.ID, b.YEAR, a.CAT, b.VALUE
from TableA a inner join TableB b
on b.ID = a.ID
and b.year between extract(year from a.DATE_START) and extract(year from a.DATE_END)
See the demo.
Results:
> ID | YEAR | CAT | VALUE
> -: | ---: | --: | ----:
> 1 | 2018 | 10 | 110
> 1 | 2019 | 10 | 90
> 1 | 2020 | 10 | 30
> 2 | 2018 | 15 | 200
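
For readers who want to check the join locally, here is a minimal sketch using Python's built-in sqlite3 module and the sample data from the question. SQLite has no extract(), so the sketch assumes strftime('%Y', ...) cast to an integer as a stand-in; the dates are stored as ISO text, which is also an assumption of this sketch rather than something stated in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (ID INTEGER, CAT INTEGER, DATE_START TEXT, DATE_END TEXT);
CREATE TABLE TableB (ID INTEGER, YEAR INTEGER, VALUE INTEGER);
INSERT INTO TableA VALUES
    (1, 10, '2018-01-01', '2020-12-31'),
    (2, 15, '2018-06-01', '2018-07-01');
INSERT INTO TableB VALUES
    (1, 2017, 100), (1, 2018, 110), (1, 2019, 90), (1, 2020, 30), (2, 2018, 200);
""")

# A year matches when it lies between the start year and the end year,
# so partially covered years (ID = 2) and middle years (2019) both match.
rows = conn.execute("""
SELECT a.ID, b.YEAR, a.CAT, b.VALUE
FROM TableA a
INNER JOIN TableB b
   ON b.ID = a.ID
  AND b.YEAR BETWEEN CAST(strftime('%Y', a.DATE_START) AS INTEGER)
                 AND CAST(strftime('%Y', a.DATE_END) AS INTEGER)
ORDER BY a.ID, b.YEAR
""").fetchall()

for row in rows:
    print(row)
```

Comparing years rather than dates is what keeps 2019 and the partially covered 2018 for ID = 2 in the result.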

First: I use Microsoft SQL Server, so apologies if this doesn't work in Oracle.
SELECT * FROM TableA
INNER JOIN TableB ON TableA.Id = TableB.Id AND TableB.Year BETWEEN YEAR(TableA.Date_Start) AND YEAR(TableA.Date_End)

Calculate cumulative percentages by date in SQL

How might I calculate cumulative percentages in SQL (Postgres/Vertica)?
For instance, the question is "As of each date, of all patients who had been diagnosed by that date, what percent had been treated by that date?"
For instance, this table shows dates of diagnosis and treatment, with binary values that might be summed
ID | diagnosed | date_diag | treated | date_treat
---|-----------|-----------|---------|-----------
1  | 1         | Jan 1     | 0       | null
2  | 1         | Jan 15    | 1       | Feb 20
3  | 1         | Jan 29    | 1       | Feb 1
4  | 1         | Feb 08    | 1       | Mar 4
5  | 1         | Feb 12    | 0       | null
6  | 1         | Feb 18    | 1       | Feb 24
7  | 1         | Mar 15    | 1       | May 5
8  | 1         | Apr 14    | 1       | Apr 20
I'd like to get a table of cumulative treated-vs-diagnosed ratio that might look like this.
date   | ytd_diag | ytd_treat | ytd_percent
-------|----------|-----------|------------
Jan 01 | 1        | 0         | 0.00
Jan 15 | 2        | 0         | 0.00
Jan 29 | 3        | 0         | 0.00
Feb 08 | 4        | 1         | 0.25
Feb 12 | 5        | 1         | 0.20
Feb 18 | 6        | 1         | 0.17
Mar 15 | 7        | 4         | 0.57
Apr 14 | 8        | 4         | 0.50
I can calculate cumulative counts of diagnosed or treated (e.g. below), using window functions but I can't figure out a SQL query to get the number of people who'd already been treated as of each diagnosis date.
SELECT
date_diag ,
SUM(COUNT(*)) OVER ( ORDER BY date_diag ) as freq
FROM patients
WHERE diagnosed = 1
GROUP BY date_diag
ORDER BY date_diag;
You can use conditional aggregation with SUM() window function:
WITH cte AS (
SELECT kind,
date,
SUM((kind = 1)::int) OVER (ORDER BY date) ytd_diag,
SUM((kind = 2)::int) OVER (ORDER BY date) ytd_treat
FROM (
SELECT 1 kind, date_diag date, diagnosed status FROM patients
UNION ALL
SELECT 2, date_treat, treated FROM patients WHERE date_treat IS NOT NULL
) t
)
SELECT date, ytd_diag, ytd_treat,
ROUND(1.0 * ytd_treat / ytd_diag, 2) ytd_percent
FROM cte
WHERE kind = 1;
See the demo.
You can solve this with window functions. The first thing you want to do is to derive a table from your patients table that has a running tally of both the diagnosed and treated columns. The rows should be tallied in ascending order of the diagnosis date.
Here's how you do that. First I'll create a sample patients table and data (I'll only include the columns necessary):
create temporary table patients (
date_diag date,
diagnosed int default 0,
treated int default 0
);
insert into patients (date_diag, diagnosed, treated) values
('2021-01-01', 1, 0),
('2021-01-11', 1, 1),
('2021-01-16', 1, 0),
('2021-01-30', 1, 1),
('2021-02-04', 1, 1),
('2021-01-14', 1, 1);
Then here's how to create the derived table of all the tallied results.
select
date_diag,
diagnosed,
treated,
sum(treated) over(order by date_diag ASC ) as treated_cmtv,
count(diagnosed) over(order by date_diag ASC) as diagnosed_cmtv
from patients
/*
date_diag | diagnosed | treated | treated_cmtv | diagnosed_cmtv
------------+-----------+---------+--------------+----------------
2021-01-01 | 1 | 0 | 0 | 1
2021-01-11 | 1 | 1 | 1 | 2
2021-01-14 | 1 | 1 | 2 | 3
2021-01-16 | 1 | 0 | 2 | 4
2021-01-30 | 1 | 1 | 3 | 5
2021-02-04 | 1 | 1 | 4 | 6
*/
Now that you have this table, you can easily calculate the percentage by defining this derived table in a subquery and then selecting the necessary columns for the calculation. Like so:
select
p.date_diag,
p.diagnosed,
p.diagnosed_cmtv,
p.treated_cmtv,
p.treated,
TRUNC(p.treated_cmtv::numeric / p.diagnosed_cmtv * 1.0, 2) as percent
from (
-- same table as above
select
date_diag,
diagnosed,
treated,
sum(treated) over(order by date_diag ASC ) as treated_cmtv,
count(diagnosed) over(order by date_diag ASC) as diagnosed_cmtv
from patients
) as p;
/*
date_diag | diagnosed | diagnosed_cmtv | treated_cmtv | treated | percent
------------+-----------+----------------+--------------+---------+---------
2021-01-01 | 1 | 1 | 0 | 0 | 0.00
2021-01-11 | 1 | 2 | 1 | 1 | 0.50
2021-01-14 | 1 | 3 | 2 | 1 | 0.66
2021-01-16 | 1 | 4 | 2 | 0 | 0.50
2021-01-30 | 1 | 5 | 3 | 1 | 0.60
2021-02-04 | 1 | 6 | 4 | 1 | 0.66
*/
I think that gives you what you are asking for.
An alternative approach to the other answers is to use a correlated subquery in the select list:
SELECT
p.date_diag,
(SELECT COUNT(*)
FROM patients p2
WHERE p2.date_treat <= p.date_diag) ytd_treated
FROM
patients p
WHERE diagnosed = 1
GROUP BY p.date_diag
ORDER BY p.date_diag
This will give you that column of 0,0,0,1,1,1,4,4. You can divide it by the cumulative diagnosed count to get your percentage:
SELECT
(select ...) / SUM(COUNT(*)) OVER(...)
Note that you might need more conditions in the inner WHERE clause, such as requiring the treatment date to be on or after Jan 1st of the diagnosis date's year, if you run it against a dataset that spans more than one year.
Also bear in mind that the treated count, as an integer, will (should) nearly always be less than the diagnosed count, so an integer division will give you zero. Cast one of the operands to float or, if you want the percentage out of a hundred, multiply by 100.0 first.
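
As a rough check of the correlated-subquery idea, here is a minimal sketch that runs it against the question's data using Python's built-in sqlite3 module. The year 2021 and the ISO text dates are assumptions made for this sketch (the question only gives month and day), and the ratio is computed in Python with float division to sidestep the integer-division pitfall.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE patients (
    id INTEGER, diagnosed INTEGER, date_diag TEXT, treated INTEGER, date_treat TEXT
);
-- The year 2021 is assumed; the question only lists month and day.
INSERT INTO patients VALUES
    (1, 1, '2021-01-01', 0, NULL),
    (2, 1, '2021-01-15', 1, '2021-02-20'),
    (3, 1, '2021-01-29', 1, '2021-02-01'),
    (4, 1, '2021-02-08', 1, '2021-03-04'),
    (5, 1, '2021-02-12', 0, NULL),
    (6, 1, '2021-02-18', 1, '2021-02-24'),
    (7, 1, '2021-03-15', 1, '2021-05-05'),
    (8, 1, '2021-04-14', 1, '2021-04-20');
""")

# For each diagnosis date, count diagnoses and treatments dated on or before it.
# Diagnosis dates are unique in this sample, so no GROUP BY is needed here.
counts = conn.execute("""
SELECT d.date_diag,
       (SELECT COUNT(*) FROM patients p
         WHERE p.diagnosed = 1 AND p.date_diag <= d.date_diag) AS ytd_diag,
       (SELECT COUNT(*) FROM patients p
         WHERE p.treated = 1 AND p.date_treat <= d.date_diag) AS ytd_treat
FROM patients d
WHERE d.diagnosed = 1
ORDER BY d.date_diag
""").fetchall()

# Python's / is float division, so 1 / 6 does not collapse to 0.
result = [(day, diag, treat, round(treat / diag, 2)) for day, diag, treat in counts]
for row in result:
    print(row)
```

Rows with a NULL date_treat drop out of the treated count on their own, because a comparison with NULL is never true.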

Sum with SQL depending on the value of a column

I have 3 columns : year, price, and day_type.
year day_type price
2016 0 10
2016 1 20
2016 2 5
2017 0 14
2017 1 6
2017 2 3
I want to keep only the lines where day_type = 1 or 2, but add to these lines the value when day_type = 0.
Expected Result :
year day_type price
2016 1 30
2016 2 15
2017 1 20
2017 2 17
How can I do that?
You can use a join:
select t.year, t.day_type, (t.price + coalesce(t0.price, 0)) as price
from t left join
t t0
on t.year = t0.year and t0.day_type = 0
where t.day_type <> 0;
This uses left join in case one of the years does not have a 0 price.
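
To see the left join at work, here is a minimal sketch using Python's built-in sqlite3 module. The table name t follows the query above; the extra 2018 row with no day_type = 0 entry is hypothetical, added only to show why coalesce() is there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (year INTEGER, day_type INTEGER, price INTEGER);
INSERT INTO t VALUES
    (2016, 0, 10), (2016, 1, 20), (2016, 2, 5),
    (2017, 0, 14), (2017, 1, 6),  (2017, 2, 3),
    (2018, 1, 7);  -- hypothetical year without a day_type = 0 row
""")

# Each non-zero row is paired with its year's day_type = 0 row, if one exists.
rows = conn.execute("""
SELECT t.year, t.day_type, t.price + COALESCE(t0.price, 0) AS price
FROM t
LEFT JOIN t AS t0
  ON t.year = t0.year AND t0.day_type = 0
WHERE t.day_type <> 0
ORDER BY t.year, t.day_type
""").fetchall()

for row in rows:
    print(row)
```

With an inner join, the 2018 row would disappear; with the left join it keeps its own price unchanged.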
With sum() window function:
select * from (
select year, (2 * day_type) % 3 as day_type,
sum(price) over (partition by year) - price as price
from tablename
) t
where day_type <> 0
order by year, day_type
See the demo.
Results:
year | day_type | price
---: | -------: | ----:
2016 | 1 | 30
2016 | 2 | 15
2017 | 1 | 20
2017 | 2 | 17

alternate row value minus SQL

Hi Guys I have a data set from my query that looks like this:
Date | Count | Activity
10 Nov | 10 | A
11 Nov | 11 | A
10 Nov | 12 | B
11 Nov | 13 | B
I am trying to achieve this result. Basically, the logic is the 2nd row minus the 1st row, and the 4th row minus the 3rd row.
Date | Count | Activity | Diff
10 Nov | 10 | A | 0
11 Nov | 11 | A | 1
10 Nov | 12 | B | 0
11 Nov | 13 | B | 1
My current query looks like this:
select DATE, count(distinct(ID)) as Count,
count(distinct(ID)) - LAG(count(distinct(ID)),1) over (order by count(distinct(ID))) as Eng_change
from (Select DATA.*,PRODUCT.MAPPING from DATA left join PRODUCT on DATA.Part_Number=PRODUCT.PRODUCT_NUMBER ) OVERALLFUNNEL
WHERE ACTIVITY_RANK>5
group by OVERALLFUNNEL.ACTIVITY,OVERALLFUNNEL.DATE
ORDER BY ACTIVITY_RANK ASC
Using LAG always subtracts the previous row, regardless of activity, and that's not what I want.
Any help or function?
regards
If I understand your problem correctly, you want the difference within each activity. If so:
select DATE, activity,
count(distinct ID) as Count,
(count(distinct ID) -
LAG(count(distinct ID), 1) over (partition by activity
order by count(distinct ID)
)
) as Eng_change
This will give NULL for the first value. If you want 0, use coalesce() or something similar.
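
Here is a minimal sketch of the partitioned LAG with coalesce(), run through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The table daily_counts, its column names, and the year 2023 are hypothetical: the counts are stored already aggregated to keep the example short, and the window is ordered by day rather than by the count, which is an assumption about the intended ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical pre-aggregated table standing in for the grouped query.
CREATE TABLE daily_counts (day TEXT, cnt INTEGER, activity TEXT);
INSERT INTO daily_counts VALUES
    ('2023-11-10', 10, 'A'), ('2023-11-11', 11, 'A'),
    ('2023-11-10', 12, 'B'), ('2023-11-11', 13, 'B');
""")

# PARTITION BY restarts LAG for every activity, so the first row of each
# activity has no predecessor; COALESCE turns that NULL into 0.
rows = conn.execute("""
SELECT day, cnt, activity,
       COALESCE(cnt - LAG(cnt, 1) OVER (PARTITION BY activity ORDER BY day), 0) AS diff
FROM daily_counts
ORDER BY activity, day
""").fetchall()

for row in rows:
    print(row)
```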

How to select all records from one table that do not exist in particular year?

the table looks like this
num Year
1 | 2014
2 | 2014
3 | 2014
2 | 2015
4 | 2015
5 | 2015
6 | 2015
I would like my query to return
4 | 2014
5 | 2014
6 | 2014
1 | 2015
3 | 2015
That is, for each year, the numbers from 1 to 6 that are not used in that year.
Generate all the combinations and then take out the ones that exist:
select n.num, y.year
from (select distinct num from t) n cross join
(select distinct year from t) y left join
t
on t.num = n.num and t.year = y.year
where t.num is null;
Note that year is a bad name for a column in SQL Server because it is the name of a function and a keyword (think datepart()).
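
A minimal sketch of the same cross join plus anti-join, run through Python's built-in sqlite3 module on the question's data. The ORDER BY is an addition of this sketch so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (num INTEGER, year INTEGER);
INSERT INTO t VALUES
    (1, 2014), (2, 2014), (3, 2014),
    (2, 2015), (4, 2015), (5, 2015), (6, 2015);
""")

# Build every (num, year) pair, then keep only the pairs with no match in t.
rows = conn.execute("""
SELECT n.num, y.year
FROM (SELECT DISTINCT num FROM t) n
CROSS JOIN (SELECT DISTINCT year FROM t) y
LEFT JOIN t
  ON t.num = n.num AND t.year = y.year
WHERE t.num IS NULL
ORDER BY y.year, n.num
""").fetchall()

for row in rows:
    print(row)
```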

How do you select from a date range as the data source

Short of creating a table with all of the values of a date range, how would I select from a date range as a data source?
What I'm trying to accomplish is to create a running total of all items created within the same week from separate tables, while showing weeks with 0 new items.
example table:
items
-----------------------------
created_on | name | type
-----------------------------
2012-01-01 | Cards | 1
2012-01-09 | Red Pen | 2
2012-01-31 | Pencil | 2
2012-02-01 | Blue Pen | 2
types
--------------
name | id
--------------
Fun | 1
Writing | 2
sample output:
----------------------------
year | week | fun | writing
----------------------------
2012 | 1 | 1 | 0
2012 | 2 | 0 | 1
2012 | 3 | 0 | 0
2012 | 4 | 0 | 0
2012 | 5 | 0 | 2
You could generate a number series for the week numbers
SELECT
w.week
FROM
(SELECT generate_series(1,52) as week) as w
Example
SELECT
w.year,
w.week,
COUNT(i1) as fun,
COUNT(i2) as writing
FROM (SELECT 2012 as year, generate_series(1,6) as week) as w
LEFT JOIN items i1 ON i1.type = 1 AND w.week = EXTRACT(WEEK FROM i1.created_on)
LEFT JOIN items i2 ON i2.type = 2 AND w.week = EXTRACT(WEEK FROM i2.created_on)
GROUP BY
w.year,
w.week
ORDER BY
w.year,
w.week
Very close, erikxiv, but you got me in the right direction. I have multiple tables I need to grab information from, thus the additional select in the select fields.
select
date_year.num,
date_week.num,
( select count(*) from items x
where EXTRACT(YEAR FROM x.created_on) = date_year.num
and EXTRACT(WEEK FROM x.created_on) = date_week.num
) as item_count
from
(SELECT generate_series(2011, date_part('year', CURRENT_DATE)::INTEGER) as num) as date_year,
(SELECT generate_series(1,52) as num) as date_week
where
(
date_year.num < EXTRACT (YEAR FROM CURRENT_DATE)
OR
(
date_year.num = EXTRACT (YEAR FROM CURRENT_DATE) AND
date_week.num <= EXTRACT (WEEK FROM CURRENT_DATE)
)
)
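
The generate-then-left-join idea can be tried without PostgreSQL. The sketch below uses Python's built-in sqlite3 module and swaps in a recursive CTE for generate_series, since the stock sqlite3 module does not ship that function. The helpers iso_year and iso_week are hypothetical functions registered from Python; they use ISO 8601 week numbering, which is also what PostgreSQL's EXTRACT(WEEK ...) uses.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
# Hypothetical helpers: ISO year and ISO week of an ISO-formatted date string.
conn.create_function("iso_year", 1, lambda s: date.fromisoformat(s).isocalendar()[0])
conn.create_function("iso_week", 1, lambda s: date.fromisoformat(s).isocalendar()[1])

conn.executescript("""
CREATE TABLE items (created_on TEXT, name TEXT, type INTEGER);
INSERT INTO items VALUES
    ('2012-01-01', 'Cards', 1),
    ('2012-01-09', 'Red Pen', 2),
    ('2012-01-31', 'Pencil', 2),
    ('2012-02-01', 'Blue Pen', 2);
""")

# The recursive CTE yields weeks 1..5; the left join keeps weeks with no items.
rows = conn.execute("""
WITH RECURSIVE weeks(week) AS (
    SELECT 1
    UNION ALL
    SELECT week + 1 FROM weeks WHERE week < 5
)
SELECT 2012 AS year, w.week,
       COUNT(CASE WHEN i.type = 1 THEN 1 END) AS fun,
       COUNT(CASE WHEN i.type = 2 THEN 1 END) AS writing
FROM weeks w
LEFT JOIN items i
  ON iso_year(i.created_on) = 2012 AND iso_week(i.created_on) = w.week
GROUP BY w.week
ORDER BY w.week
""").fetchall()

for row in rows:
    print(row)
```

One thing this makes visible: 2012-01-01 is a Sunday, so under ISO numbering it belongs to week 52 of 2011, and the Cards item does not appear in week 1 of 2012. If weeks should instead be counted from January 1st, a different week calculation would be needed.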