How to combine Date periods in SQL and create a composite timeline


In an Oracle table, each family (unique id) can have several people (unique id) in different relationships, each for a date range. I would like to build a timeline that gives the FamilyType for each time period, based on the combination of relationships in effect during that period. An example for one particular family is given below.
P1|-----Head---------------------------------------|
P2|--Partner--------------|
P3|---Child----------------------|
P4|---Child------------|
|=Single=|=Couple=|=Family=======|=SingleParent==|
The table has the columns
FamilyId, PersonId, Relationship, StartDate, EndDate
Each | is a date (no time portion). The data guarantees that on any given date
* there is always one person who is Head.
* there can be 0 or 1 Partner.
* there can be 0 or more Children.
The rules are
* If there is only a Head the FamilyType is Single
* If there is a Head and a Partner the FamilyType is Couple
* If there is a Head, a Partner and 1 or more Children the FamilyType is Family
* If there is a Head and 1 or more Children the FamilyType is SingleParent
People can join or leave a family on any date, and people can change relationships. So the following scenarios are possible:
P1|----------Head--------------------|
P2|----partner------------|---Head--------|
P3|---Child----------------------|
P4|--Child-----------------|
|=Single=|=Couple=|=Family=======|=SingleParent==|
P1|----------Head--------------------|
P2|----partner------------|---Head--------|
P3|---Child----------------------|
P4|--Child-----------------|
P5|---Partner-----|
|=Single=|=Couple=|=Family=======================|
How can this be done using SQL in Oracle 11gR2 (SQL only, no procedural code)? I am trying to evaluate whether this is best done in SQL or C#. Out of curiosity, an answer specific to SQL Server 2012 would also be good to have.
The result should be rows with StartDate, EndDate and FamilyType.

You could do something like this:
with family_ranges(familyid, min_start, max_end, curr_date)
as (select familyid,
min(startdate),
max(enddate),
to_number(to_char(min(startdate), 'j'))
from family
group by familyid
union all
select familyid, min_start, max_end, curr_date+1
from family_ranges
where curr_date < to_number(to_char(max_end,'j')))
select familyid, min(curr_date) fromdate, max(curr_date) todate, state
from (select familyid, to_date(curr_date,'j') curr_date,
case when head = 'Y' and partner = 'Y' and child = 'Y' then 'Family'
when head = 'Y' and partner = 'Y' then 'Couple'
when head = 'Y' and child = 'Y' then 'SingleParent'
when head = 'Y' then 'Single'
end state
from (select f.familyid, d.curr_date, f.relationship
from family_ranges d
inner join family f
on f.familyid = d.familyid
and to_date(d.curr_date,'j') between f.startdate and f.enddate)
pivot (
max('Y')
for relationship in ('Head' as head, 'Partner' as partner, 'Child' as child)
))
group by familyid, state
order by familyid, fromdate;
Forgive the nonsense with the date->Julian conversion. It is there to work around a bug in 11.2.0.1-3 where date arithmetic fails with factored subqueries.
The factored subquery part gets us a list of the dates that each family spans. We then join that list back to family to work out who was in the family on each day.
select f.familyid, d.curr_date, f.relationship
from family_ranges d
inner join family f
on f.familyid = d.familyid
and to_date(d.curr_date,'j') between f.startdate and f.enddate;
Now we pivot this to get a simple Y/N list:
with family_ranges(familyid, min_start, max_end, curr_date)
as (select familyid,
           min(startdate),
           max(enddate),
           to_number(to_char(min(startdate), 'j'))
      from family
     group by familyid
    union all
    select familyid, min_start, max_end, curr_date+1
      from family_ranges
     where curr_date < to_number(to_char(max_end,'j')))
select familyid, to_date(curr_date,'j') curr_date, head, partner, child
  from (select f.familyid, d.curr_date, f.relationship
          from family_ranges d
               inner join family f
                       on f.familyid = d.familyid
                      and to_date(d.curr_date,'j') between f.startdate and f.enddate)
 pivot (
   max('Y')
   for relationship in ('Head' as head, 'Partner' as partner, 'Child' as child)
 );
FAMILYID CURR_DATE H P C
---------- --------- - - -
1 09-NOV-12 Y
1 11-NOV-12 Y
1 13-NOV-12 Y
1 23-NOV-12 Y
2 23-NOV-12 Y
2 28-NOV-12 Y Y
2 29-NOV-12 Y Y
1 30-NOV-12 Y Y
1 01-DEC-12 Y Y
1 03-DEC-12 Y Y
2 18-DEC-12 Y Y Y
2 20-DEC-12 Y Y Y
Then it's a simple CASE expression to get your required string from the rules, and a GROUP BY to get the date ranges.
with family_ranges(familyid, min_start, max_end, curr_date)
as (select familyid,
           min(startdate),
           max(enddate),
           to_number(to_char(min(startdate), 'j'))
      from family
     group by familyid
    union all
    select familyid, min_start, max_end, curr_date+1
      from family_ranges
     where curr_date < to_number(to_char(max_end,'j')))
select familyid, min(curr_date) fromdate, max(curr_date) todate, state
  from (select familyid, to_date(curr_date,'j') curr_date,
               case when head = 'Y' and partner = 'Y' and child = 'Y' then 'Family'
                    when head = 'Y' and partner = 'Y' then 'Couple'
                    when head = 'Y' and child = 'Y' then 'SingleParent'
                    when head = 'Y' then 'Single'
               end state
          from (select f.familyid, d.curr_date, f.relationship
                  from family_ranges d
                       inner join family f
                               on f.familyid = d.familyid
                              and to_date(d.curr_date,'j') between f.startdate and f.enddate)
         pivot (
           max('Y')
           for relationship in ('Head' as head, 'Partner' as partner, 'Child' as child)
         ))
 group by familyid, state
 order by familyid, fromdate;
FAMILYID FROMDATE TODATE STATE
---------- --------- --------- ------------
1 05-NOV-12 24-NOV-12 Single
1 25-NOV-12 14-DEC-12 Couple
1 15-DEC-12 24-JAN-13 Family
1 25-JAN-13 13-FEB-13 SingleParent
2 05-NOV-12 24-NOV-12 Single
2 25-NOV-12 14-DEC-12 Couple
2 15-DEC-12 13-FEB-13 Family
fiddle: http://sqlfiddle.com/#!4/484b5/1
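Since the question also asks whether this is easier outside SQL, here is a minimal procedural sketch of the same idea in Python (not the answer's code): expand the family's span into days, classify each day with the four rules, then collapse consecutive days that share a state. The function names and the sample rows are hypothetical and chosen only for illustration. Unlike a plain GROUP BY on the state, this sketch groups only consecutive days, so a state that recurs later (for example Single, Couple, Single) comes out as separate rows.

```python
from datetime import date, timedelta
from itertools import groupby


def family_type(relationships):
    """Apply the four rules from the question to the relationships present on one day."""
    if "Head" not in relationships:
        return None  # the data is said to guarantee a Head; this is only a safeguard
    has_partner = "Partner" in relationships
    has_child = "Child" in relationships
    if has_partner and has_child:
        return "Family"
    if has_partner:
        return "Couple"
    if has_child:
        return "SingleParent"
    return "Single"


def timeline(rows):
    """rows: (person_id, relationship, start, end) for ONE family, dates inclusive.
    Returns a list of (from_date, to_date, family_type) for consecutive runs of days."""
    first = min(r[2] for r in rows)
    last = max(r[3] for r in rows)
    daily = []
    for offset in range((last - first).days + 1):
        day = first + timedelta(days=offset)
        present = {rel for _, rel, start, end in rows if start <= day <= end}
        daily.append((day, family_type(present)))
    result = []
    # groupby only merges ADJACENT items, which is what keeps recurring states apart
    for state, run in groupby(daily, key=lambda pair: pair[1]):
        run = list(run)
        result.append((run[0][0], run[-1][0], state))
    return result


# Hypothetical sample rows for a single family
rows = [
    ("P1", "Head", date(2012, 11, 5), date(2013, 2, 13)),
    ("P2", "Partner", date(2012, 11, 25), date(2013, 1, 24)),
    ("P3", "Child", date(2012, 12, 15), date(2013, 2, 13)),
]
result = timeline(rows)
for from_date, to_date, state in result:
    print(from_date, to_date, state)
```

Walking every day is simple but costs one step per day of the span; for long spans, iterating only over the dates where someone joins or leaves would do the same job with fewer steps.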

Related

SQL sum between dates

I need to sum values over the intersections of date ranges.
Sample of source data:
person  item    start_date  end_date    value
a       apple   08.03.2018  29.03.2018  3
a       apple   01.01.2019  08.08.2021  2
a       apple   01.01.2019  09.10.2021  5
a       pen     10.10.2021  30.10.2021  2
a       cup     08.03.2018  20.03.2018  8
a       cup     15.03.2018  20.03.2019  2
b       pen     10.10.2021  30.10.2021  2
b       pen     10.10.2021  30.10.2021  6
b       orange  10.11.2021  10.11.2022  3
b       orange  20.11.2021  20.12.2021  2
Expected result:
person  item    start_date  end_date    value
a       apple   08.03.2018  29.03.2018  3
a       apple   01.01.2019  08.08.2021  7
a       apple   09.08.2021  09.10.2021  5
a       pen     10.10.2021  30.10.2021  2
a       cup     08.03.2018  14.03.2018  8
a       cup     15.03.2018  20.03.2018  10
a       cup     21.03.2018  20.03.2019  2
b       pen     10.10.2021  30.10.2021  8
b       orange  10.11.2021  19.11.2021  3
b       orange  20.11.2021  20.12.2021  5
b       orange  21.12.2021  10.11.2022  3
I used code like this, but it is too simple and the results are wrong:
select
person
,item
,Min([start_date]) as [start_date]
,Max([end_date]) as [end_date]
,Sum([value]) as [value]
FROM table
Group by person, item
I tried to use the LAG() function, but I'm lost.

I have no access to Synapse, but assuming it's compatible with SQL Server...
db<>fiddle
The inner query builds the date ranges, creating additional boundary dates for overlapping periods where needed. The main query just sums the values.
select person, item, range_from, range_to,
(select sum(value) from test
where person = r.person
and item = r.item
and range_from between start_date and end_date) value
from (
select
be,
person,
item,
date range_from,
lead(date,1) over(partition by person, item order by date,be) range_to
from (
select 1 be, person, item, start_date date from test
union
select 2, person, item, end_date from test
union
select 2, person, item, dateadd(day,-1,start_date) from test a
where exists (select * from test where a.person = person and a.item = item and a.start_date > start_date and a.start_date < end_date)
union
select 1, person, item, dateadd(day,1,end_date) from test b
where exists (select * from test where b.person = person and b.item = item and b.end_date > start_date and b.end_date < end_date)
) k
) r where r.be = 1 order by r.person, r.item, r.range_from
The column be contains:
1 - for period start
2 - for period end
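The boundary-date idea can also be sketched outside SQL. The Python below is a minimal illustration, not a translation of the query above: for one person and item, it cuts the timeline at every start date and at every day after an end date, then sums the values of the periods active in each piece. The function name `split_and_sum` is hypothetical; the sample rows are the "apple" and "cup" rows from the question.

```python
from datetime import date, timedelta


def split_and_sum(periods):
    """periods: list of (start, end, value), dates inclusive, for ONE person and item.
    Returns non-overlapping (start, end, total) rows for every stretch that has data."""
    one_day = timedelta(days=1)
    # The set of active periods can only change on a start date
    # or on the day after an end date, so those are the cut points.
    cuts = sorted({s for s, _, _ in periods} | {e + one_day for _, e, _ in periods})
    result = []
    for seg_start, next_cut in zip(cuts, cuts[1:]):
        active = [v for s, e, v in periods if s <= seg_start <= e]
        if active:  # skip gaps where no period applies
            result.append((seg_start, next_cut - one_day, sum(active)))
    return result


apple = [
    (date(2018, 3, 8), date(2018, 3, 29), 3),
    (date(2019, 1, 1), date(2021, 8, 8), 2),
    (date(2019, 1, 1), date(2021, 10, 9), 5),
]
cup = [
    (date(2018, 3, 8), date(2018, 3, 20), 8),
    (date(2018, 3, 15), date(2019, 3, 20), 2),
]
apple_rows = split_and_sum(apple)
cup_rows = split_and_sum(cup)
for row in apple_rows + cup_rows:
    print(*row)
```

To cover the whole table, the rows would first be grouped by (person, item) and each group passed to the function separately.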

Query to restrict results from left join

I have the following query
select S.id, X.id, 15,15,1 from schema_1.tbl_2638 S
JOIN schema_1.tbl_2634_customid X on S.field_1=x.fullname
That returns the following results, where you can see the first column is duplicated on matches to the 2nd table.
1 1 15 15 1
2 3 15 15 1
2 2 15 15 1
3 5 15 15 1
3 4 15 15 1
I'm trying to get a query that would just give me a single row per 1st ID, and the min value from 2nd ID. So I want a result that would be:
1 1 15 15 1
2 2 15 15 1
3 4 15 15 1
I'm a little rusty on my SQL skills; how would I write the query to produce the above result?

Given your result, you can do this to get what you want. For much more complicated structures, you can always take a look at window functions.
select S.id, MIN(X.id) x_id, 15,15,1 from schema_1.tbl_2638 S
JOIN schema_1.tbl_2634_customid X on S.field_1=x.fullname
GROUP BY 1,3,4,5
A window function can also be used; it always needs an outer SELECT:
SELECT
s_id, x_id, a, b, c
FROM
(select S.id as s_id, X.id as x_id, 15 a ,15 b,1 c
, ROW_NUMBER() OVER (PARTITION BY S.id ORDER BY X.id ASC) rn
from schema_1.tbl_2638 S
JOIN schema_1.tbl_2634_customid X on S.field_1=x.fullname) t
WHERE rn = 1
Or as a CTE:
WITH CTE as (select S.id as s_id, X.id as x_id, 15 a ,15 b,1 c
, ROW_NUMBER() OVER (PARTITION BY S.id ORDER BY X.id ASC) rn
from schema_1.tbl_2638 S
JOIN schema_1.tbl_2634_customid X on S.field_1=x.fullname)
SELECT s_id,x_id,a,b,c FROM CTE WHERE rn = 1
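To check the grouping logic without access to the real schema, the MIN/GROUP BY variant can be run against an in-memory SQLite database from Python. This is only a sketch under assumptions: the tables `s` and `x` stand in for schema_1.tbl_2638 and schema_1.tbl_2634_customid, the names in the sample rows are made up, and SQLite may differ from your actual database in other respects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE s (id INTEGER, field_1 TEXT);   -- stand-in for schema_1.tbl_2638
    CREATE TABLE x (id INTEGER, fullname TEXT);  -- stand-in for schema_1.tbl_2634_customid
    INSERT INTO s VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO x VALUES (1, 'ann'), (2, 'bob'), (3, 'bob'), (4, 'cy'), (5, 'cy');
""")

# One row per s.id, keeping the smallest matching x.id
rows = conn.execute("""
    SELECT s.id, MIN(x.id) AS x_id, 15, 15, 1
    FROM s
    JOIN x ON s.field_1 = x.fullname
    GROUP BY s.id
    ORDER BY s.id
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

Grouping by the column name instead of by position keeps the query readable if the select list changes later.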

SQL to find course popularity from a survey

I am trying to write an SQL query that displays course popularity in descending order.
Course popularity is measured in points, which are determined as follows. For every survey:
a. if the vote difference > 10% of total votes, the more popular course gets 1 point, and the less popular course gets 0 points
b. if the vote difference <= 10% of total votes, each course gets 0.5 point
course table:
course_id  course_name     faculty
1001       economics_101   business
1002       algebra_101     math
1003       geometry_101    math
1004       management_101  business
1005       marketing_101   business
1006       physics_101     science
survey table:
survey_id  option_a        option_b        votes_a  votes_b
2001       economics_101   geometry_101    61       34
2002       algebra_101     economics_101   31       68
2003       marketing_101   management_101  11       72
2005       management_101  algebra_101     43       54
2004       geometry_101    marketing_101   48       46
Result achieved so far:
course          popularity
economics_101   4
management_101  2
algebra_101     2
marketing_101   1
geometry_101    1
[NULL]          0
I have managed to get this far with joins; it would be great to have input on optimizing this query:
WITH x AS
(
WITH b AS
(
WITH a as
(
select * from course c
LEFT JOIN survey s
on c.course_name = s.option_a
UNION ALL
select * from course c
LEFT JOIN survey s
on c.course_name = s.option_b
)
SELECT a.*,
SUM(votes_a+votes_b) as total_votes,
CASE WHEN (a.votes_a - a.votes_b) > (0.1*SUM(votes_a+votes_b)) THEN 1
WHEN (a.votes_b - a.votes_a) <= (0.1*SUM(votes_a+votes_b)) THEN 0.5
ELSE 0
END AS 'Popularity_a',
CASE WHEN (a.votes_b - a.votes_a) > (0.1*SUM(votes_a+votes_b)) THEN 1
WHEN (a.votes_a - a.votes_b) <= (0.1*SUM(votes_a+votes_b)) THEN 0.5
ELSE 0
END AS 'Popularity_b'
FROM
a
GROUP BY
a.course_name ,
a.course_id,
a.faculty ,
a.survey_id ,
a.option_a ,
a.option_b ,
a.votes_a ,
a.votes_b
)
SELECT b.option_a as course,
b.Popularity_a as pop
FROM b
LEFT JOIN
course cx
ON b.option_a = cx.course_name
UNION ALL
SELECT b.option_b as course ,
b.Popularity_b as pop
FROM b
LEFT JOIN
course cx
ON b.option_b = cx.course_name
)
select
x.course ,
sum (x.pop) as popularity
from x
GROUP BY
x.course
order by popularity desc

Use UNION ALL to extract all courses and the respective points they get from the table survey and aggregate to get the popularity.
Then join to course:
WITH
cte AS (
SELECT option_a course_name,
CASE
WHEN votes_a - votes_b > 0.1 * (votes_a + votes_b) THEN 1.0
WHEN votes_b - votes_a > 0.1 * (votes_a + votes_b) THEN 0.0
ELSE 0.5
END points
FROM survey
UNION ALL
SELECT option_b,
CASE
WHEN votes_b - votes_a > 0.1 * (votes_a + votes_b) THEN 1.0
WHEN votes_a - votes_b > 0.1 * (votes_a + votes_b) THEN 0.0
ELSE 0.5
END
FROM survey
),
points AS (
SELECT course_name, SUM(points) total_points
FROM cte
GROUP BY course_name
)
SELECT c.*, COALESCE(p.total_points, 0) popularity
FROM course c LEFT JOIN points p
ON p.course_name = c.course_name
ORDER BY popularity DESC;
See the demo.
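As a cross-check of the scoring rule, here is a small Python sketch that applies the same three-way CASE to the survey rows from the question. The helper name `survey_points` is hypothetical, and the comparison `10 * diff > total` is just the 10% test written with integers to avoid floating-point rounding.

```python
def survey_points(votes_a, votes_b):
    """Return (points for option_a, points for option_b) for one survey."""
    total = votes_a + votes_b
    diff = votes_a - votes_b
    if 10 * diff > total:      # a leads by more than 10% of all votes
        return 1.0, 0.0
    if 10 * -diff > total:     # b leads by more than 10% of all votes
        return 0.0, 1.0
    return 0.5, 0.5            # difference within 10%: split the point


courses = ["economics_101", "algebra_101", "geometry_101",
           "management_101", "marketing_101", "physics_101"]
surveys = [
    ("economics_101", "geometry_101", 61, 34),
    ("algebra_101", "economics_101", 31, 68),
    ("marketing_101", "management_101", 11, 72),
    ("management_101", "algebra_101", 43, 54),
    ("geometry_101", "marketing_101", 48, 46),
]

# Start every course at 0 so courses without surveys still appear (the LEFT JOIN + COALESCE)
popularity = {name: 0.0 for name in courses}
for option_a, option_b, votes_a, votes_b in surveys:
    points_a, points_b = survey_points(votes_a, votes_b)
    popularity[option_a] += points_a
    popularity[option_b] += points_b

ranking = sorted(popularity.items(), key=lambda item: item[1], reverse=True)
for course, points in ranking:
    print(course, points)
```

With this data each survey hands out exactly one point in total, so the five surveys distribute five points across the courses.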

How to implement multiple joins on different fields based on different functions in SQL?

I have a few tables like the one below, and I need to fetch, for each ID and TYPE, the maximum level and the latest level (ordered by date). I'm using SQL Server to run the query. So far, I have tried the following SQL query:
select f.ID,x.MAX_LEVEL,f.TYPE, f.DATE
from (
select ID
,TYPE
, MAX(LEVEL) as MAX_LEVEL
from TABLEA
GROUP BY ID, TYPE
) as x
,
(
select ID
,TYPE
, MAX(DATE) as MAX_DATETIME
from TABLEA
GROUP BY ID, TYPE
) as y
inner join TABLEA as f
on f.ID = x.ID and f.LEVEL = x.MAX_LEVEL
inner join TABLEA as g
on f.ID = y.ID and g.DATE = y.MAX_DATETIME
and f.DATE > DATEADD(day, -1, GETDATE())
TABLEA
ID TYPE LEVEL DATE
1 ELECTRIC 2 01/06/2019
1 GAS 2 01/06/2019
2 ELECTRIC 2 01/06/2019
3 ELECTRIC 3 01/06/2019
3 ELECTRIC 3 01/06/2019
1 GAS 3 05/06/2019
1 GAS 5 13/06/2019
2 ELECTRIC 5 07/06/2019
3 GAS 5 08/06/2019
6 ELECTRIC 3 02/06/2019
2 ELECTRIC 3 04/06/2019
3 ELECTRIC 3 05/06/2019
2 GAS 10 06/06/2019
2 GAS 3 11/06/2019
3 ELECTRIC 3 11/06/2019
1 ELECTRIC 5 01/06/2019
1 GAS 3 02/06/2019
6 ELECTRIC 5 01/06/2019
1 ELECTRIC 5 10/06/2019
Expected Result:
ID TYPE MAX_LEVEL LATEST_LEVEL
1 ELECTRIC 5 5
1 GAS 5 3
2 ELECTRIC 5 5
2 GAS 10 3
3 ELECTRIC 3 3
3 GAS 5 5
6 ELECTRIC 5 3
Any thoughts on how I could achieve this?

If you are using SQL Server, you can try this:
SELECT ID, TYPE, MAX(T1.[LEVEL]) AS MAX_LEVEL, X.LEVEL AS LATEST_LEVEL
FROM TABLEA T1
OUTER APPLY (SELECT TOP 1 [LEVEL] FROM TABLEA T2 WHERE T2.ID = T1.ID AND T2.TYPE = T1.TYPE ORDER BY T2.[DATE] DESC) X
GROUP BY ID, TYPE, X.[LEVEL]
ORDER BY ID, TYPE

Unfortunately, SQL Server doesn't have a "first" or "last" aggregation function. But it does have first_value() and last_value() as window functions. So, one method is:
select distinct t.id, t.type,
max(t.level) over (partition by id, type) as max_level,
first_value(t.level) over (partition by id, type order by date desc) as latest_level
from t;
Another alternative is using window functions in a subquery:
select id, type, max(level) as max_level,
max(case when seqnum = 1 then level end) as latest_level
from (select t.*,
row_number() over (partition by id, type order by date desc) as seqnum
from t
) t
group by id, type;
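The two aggregates these queries compute can be sketched in a single pass in Python, which may help when checking expected results by hand. The function name `max_and_latest` is hypothetical, and the sample rows are a subset of TABLEA from the question. One assumption to note: when several rows share the latest date, this sketch keeps the first one it sees, whereas the SQL versions may pick any of them.

```python
from datetime import date


def max_and_latest(rows):
    """rows: (id, type, level, date). Returns {(id, type): (max_level, latest_level)}."""
    best = {}  # (id, type) -> (max_level, latest_level, latest_date)
    for id_, type_, level, when in rows:
        key = (id_, type_)
        if key not in best:
            best[key] = (level, level, when)
            continue
        max_level, latest_level, latest_date = best[key]
        if when > latest_date:  # strictly later, so ties keep the first row seen
            latest_level, latest_date = level, when
        best[key] = (max(max_level, level), latest_level, latest_date)
    return {key: (mx, latest) for key, (mx, latest, _) in best.items()}


# Subset of TABLEA from the question
table_a = [
    (2, "ELECTRIC", 2, date(2019, 6, 1)),
    (2, "ELECTRIC", 5, date(2019, 6, 7)),
    (6, "ELECTRIC", 3, date(2019, 6, 2)),
    (2, "ELECTRIC", 3, date(2019, 6, 4)),
    (2, "GAS", 10, date(2019, 6, 6)),
    (2, "GAS", 3, date(2019, 6, 11)),
    (6, "ELECTRIC", 5, date(2019, 6, 1)),
]
levels = max_and_latest(table_a)
for key in sorted(levels):
    print(key, levels[key])
```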

Ratings query - percentage of ratings for specified id over ratings for ALL ids

I have inherited the following query, which gets the average rating for a specified "try" (rugby jargon for touchdown). Hopefully we can still work with it.
SELECT i.id, i.title,
(
CASE
WHEN
COUNT( r.rating ) > 0
THEN
(
SUM( r.rating ) / COUNT( r.rating )
)
ELSE
0
END
) AS rating,
COALESCE( er.id, 0 ) AS has_existing_rating
FROM
(
SELECT 1 AS id, 'Try 1 – Israel Dagg v Chiefs.' as title UNION ALL
SELECT 2 AS id, 'Try 2 – Israel Dagg v Chiefs.' as title UNION ALL
SELECT 3 AS id, 'Try 3 – Leilia Masaga v Crusaders.' as title UNION ALL
SELECT 4 AS id, 'Try 4 – Israel Dagg v Chiefs.' as title UNION ALL
SELECT 5 AS id, 'Try 5 – Fred Flintstone v Hurricanes.' as title UNION ALL
SELECT 6 AS id, 'Try 6 – Israel Dagg v Chiefs.' as title UNION ALL
SELECT 7 AS id, 'Try 7 – Israel Dagg v Chiefs.' as title
) AS i
LEFT OUTER JOIN
tryPoll r
ON
i.id = r.try_id
<!--
Join this to the rating table AGAIN to see if the current
user has already rated the given try.
-->
LEFT OUTER JOIN
tryPoll er
ON
(
er.try_id = i.id
AND
er.ip_address = '#cgi.remote_addr#'
AND
er.user_agent = '#cgi.http_user_agent#'
)
GROUP BY
i.id,
r.try_id,
er.id,
i.title
ORDER BY
i.id ASC
So, given the following table (rating = 1 simply means a single vote in this case) ....
tryPoll
id try_id rating ip_address user_agent
------------------------------------------------------
1 2 1 58.28.220.51 Mozilla/5.0 blah
2 2 1 58.28.220.52 Mozilla/5.0 blah
3 6 1 58.28.220.53 Mozilla/5.0 blah
4 4 1 58.28.220.54 Mozilla/5.0 blah
... the query would return an average rating of (1 + 1) / 2 = 1 for try_id #2
HOWEVER, I need to adjust this query to return a percentage of ratings for a particular TRY over ratings for ALL tries. i.e., in the above example, determine what percentage of ALL ratings for ALL tries are attributed to try_id #2
How would I accomplish this?

You could try this solution:
DECLARE @try_id INT;
SET @try_id=2;
SELECT r.try_id,
SUM(r.rating) AS ratings_per_try,
SUM(SUM(r.rating)) OVER() AS ratings_overall,
SUM(r.rating)*1.0 / NULLIF(SUM(SUM(r.rating)) OVER(), 0) AS percent_try
FROM tryPoll r
GROUP BY r.try_id
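The arithmetic behind percent_try can be sketched in Python as follows. This is an illustration under assumptions, not part of the answer: the function name `rating_shares` is hypothetical, and the rows are the (try_id, rating) pairs from the sample tryPoll table. Returning None when the overall total is zero mirrors the NULLIF guard against division by zero.

```python
from collections import Counter


def rating_shares(rows):
    """rows: (try_id, rating). Returns {try_id: share of the overall rating total}."""
    per_try = Counter()
    for try_id, rating in rows:
        per_try[try_id] += rating
    overall = sum(per_try.values())
    if overall == 0:
        # Same idea as NULLIF(..., 0): no meaningful share when nothing was rated
        return {try_id: None for try_id in per_try}
    return {try_id: total / overall for try_id, total in per_try.items()}


try_poll = [(2, 1), (2, 1), (6, 1), (4, 1)]
shares = rating_shares(try_poll)
print(shares[2])  # share of all ratings that belongs to try_id 2
```

Multiply the share by 100 if a percentage rather than a fraction is wanted.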
