SQL in Azure Databricks - apache-spark-sql

We have a day-level table aggregated with group by call_date, tdlinx_id, work_request_id, category_name.
In another table we have week-level data aggregated with group by week_end_date, category_name, sdo_reporting_name.
How can I populate the week-level data from the day-level data?
week_end_date = date_add(call_date, 7-dayofweek(call_date))

What you need are the following functions:
SELECT current_date() + 7 - dayofweek(current_date());
So, in your case, it would be:
SELECT call_date + 7 - dayofweek(call_date)
FROM your_table;
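The answer only computes the week-end date; the roll-up itself is not shown. Below is a minimal sketch of the whole day-to-week step, written in plain Python so the date rule can be checked in isolation. `spark_dayofweek` mimics Spark SQL's dayofweek() (1 = Sunday ... 7 = Saturday), so every date maps forward to the following Saturday. The measure `call_count` and the function names are hypothetical. In Databricks, the same roll-up would be a GROUP BY on the computed week_end_date and category_name.

```python
from collections import defaultdict
from datetime import date, timedelta


def spark_dayofweek(d: date) -> int:
    """Mimic Spark SQL dayofweek(): 1 = Sunday ... 7 = Saturday."""
    # Python's date.weekday() is 0 = Monday ... 6 = Sunday.
    return (d.weekday() + 1) % 7 + 1


def week_end_date(d: date) -> date:
    """Same rule as date_add(call_date, 7 - dayofweek(call_date))."""
    # Saturdays map to themselves; every other day moves forward to the next Saturday.
    return d + timedelta(days=7 - spark_dayofweek(d))


def roll_up_to_weeks(day_rows):
    """Sum a hypothetical call_count per (week_end_date, category_name)."""
    totals = defaultdict(int)
    for call_date, category_name, call_count in day_rows:
        totals[(week_end_date(call_date), category_name)] += call_count
    return dict(totals)


day_rows = [
    (date(2023, 1, 2), "A", 3),  # Monday
    (date(2023, 1, 5), "A", 2),  # Thursday, same week
    (date(2023, 1, 8), "A", 1),  # Sunday, starts the next week
    (date(2023, 1, 3), "B", 4),
]
print(roll_up_to_weeks(day_rows))
```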

Related

Extract previous row calculated value for use in current row calculations - Postgres

I have a requirement where I need to use the calculated value of the previous row in the calculation for the current row.
The following is a sample of how the data currently looks:
ID|Date|Days
1|2022-01-15|30
2|2022-02-18|30
3|2022-03-15|90
4|2022-05-15|30
The following is the output I am expecting:
ID|Date|Days|CalVal
1|2022-01-15|30|2022-02-14
2|2022-02-18|30|2022-03-16
3|2022-03-15|90|2022-06-14
4|2022-05-15|30|2022-07-14
The value of CalVal for the first row is Date + Days.
From the second row onward, it should take the previous row's CalVal and add the current row's Days.
Essentially, what I am looking for is a way to access the previous row's calculated value for use in the current row.
Is there any way to achieve this in Postgres SQL? I have been tinkering with window functions and even recursive CTEs but have had no luck :(
Would appreciate any direction!
Thanks in advance!
select
id,
date,
coalesce(
days - (lag(days, 1) over (order by date, days))
, days) as days,
first_date + cast(days as integer) as newdate
from
(
select
-- get a running sum of days
id,
first_date,
date,
sum(days) over (order by date, days) as days
from
(
select
-- get the first date
id,
(select min(date) from table1) as first_date,
date,
days
from
table1
) A
) B
This query gets the exact output you described. I'm not at all ready to say it is the best solution, but the strategy employed is essentially to create a running total of the "days". This means we can just add the running total to the first date, and that will always be the next date in the desired sequence. One finesse: to put the original "days" back into the result, we subtract the previous running total from the current running total.
Assuming that the table name is table1:
select
id,
date,
days,
first_value(date) over (order by id) +
(sum(days) over (order by id rows between unbounded preceding and current row))
*interval '1 day' calval
from table1;
We just add the cumulative sum of days to the first date in the table. This is not literally what you asked for (it never reads the previous row's calculated value, only the cumulative sum of days), but the result is the same.
Solution with recursion:
with recursive prev_row as (
select id, date, days, date+ days*interval '1 day' calval
from table1
where id = 1 -- assumes ids are consecutive and start at 1
union all
select t.id, t.date, t.days, p.calval + t.days*interval '1 day' calval
from prev_row p
join table1 t on t.id = p.id + 1
)
select *
from prev_row
order by id;
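Both the running-sum and the recursive approaches can be checked against the expected output from the question. Below is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Postgres (SQLite 3.25+ is assumed for window functions). SQLite has no interval type, so date(x, '+N days') takes the place of `+ n * interval '1 day'`. The table and column names follow the answers above.

```python
import sqlite3

ROWS = [(1, "2022-01-15", 30), (2, "2022-02-18", 30), (3, "2022-03-15", 90), (4, "2022-05-15", 30)]

# Running sum: CalVal = first date + cumulative days up to and including the current row.
RUNNING_SUM_SQL = """
SELECT id, date, days, date(first_date, '+' || cum_days || ' days') AS calval
FROM (
    SELECT id, date, days,
           first_value(date) OVER (ORDER BY id) AS first_date,
           SUM(days) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_days
    FROM table1
)
ORDER BY id
"""

# Recursion: each row's CalVal = previous row's CalVal + current Days.
# Like the answer above, this assumes ids are consecutive and start at 1.
RECURSIVE_SQL = """
WITH RECURSIVE prev_row(id, date, days, calval) AS (
    SELECT id, date, days, date(date, '+' || days || ' days')
    FROM table1 WHERE id = 1
    UNION ALL
    SELECT t.id, t.date, t.days, date(p.calval, '+' || t.days || ' days')
    FROM prev_row p JOIN table1 t ON t.id = p.id + 1
)
SELECT id, date, days, calval FROM prev_row ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, date TEXT, days INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?)", ROWS)

running = conn.execute(RUNNING_SUM_SQL).fetchall()
recursive = conn.execute(RECURSIVE_SQL).fetchall()
for row in recursive:
    print(row)
```

The running-sum version does not depend on consecutive ids, which is one reason to prefer it when ids have gaps.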

Finding id's available in previous weeks but not in current week

How can I find IDs that were present in previous weeks but are not present in the current week, on a rolling basis? For example:
Week1 has id 1,2,3,4,5
Week2 has id 3,4,5,7,8
Week3 has id 1,3,5,10,11
So I found that IDs 1 and 2 are missing in week 2, and IDs 2, 4, 7 and 8 (from the previous two weeks) are missing in week 3. But how can I do this on a rolling window for a large amount of data spread over 20+ years?
Please find the sample dataset and expected output below. I expect the output to be partitioned by the WEEK_END date.
Dataset
ID|WEEK_START|WEEK_END|APPEARING_DATE
7152|2015-12-27|2016-01-02|2015-12-27
8350|2015-12-27|2016-01-02|2015-12-27
7152|2015-12-27|2016-01-02|2015-12-29
4697|2015-12-27|2016-01-02|2015-12-30
7187|2015-12-27|2016-01-02|2015-01-01
8005|2015-12-27|2016-01-02|2015-12-27
8005|2015-12-27|2016-01-02|2015-12-29
6254|2016-01-03|2016-01-09|2016-01-03
7962|2016-01-03|2016-01-09|2016-01-04
3339|2016-01-03|2016-01-09|2016-01-06
7834|2016-01-03|2016-01-09|2016-01-03
7962|2016-01-03|2016-01-09|2016-01-05
7152|2016-01-03|2016-01-09|2016-01-07
8350|2016-01-03|2016-01-09|2016-01-09
2403|2016-01-10|2016-01-16|2016-01-10
0157|2016-01-10|2016-01-16|2016-01-11
2228|2016-01-10|2016-01-16|2016-01-14
4697|2016-01-10|2016-01-16|2016-01-14
Expected Output
Partition1: WEEK_END=2016-01-02
ID|MAX(LAST_APPEARING_DATE)
7152|2015-12-29
8350|2015-12-27
4697|2015-12-30
7187|2015-01-01
8005|2015-12-29
Partition2: WEEK_END=2016-01-09
ID|MAX(LAST_APPEARING_DATE)
7152|2016-01-07
8350|2016-01-09
4697|2015-12-30
7187|2015-01-01
8005|2015-12-29
6254|2016-01-03
7962|2016-01-05
3339|2016-01-06
7834|2016-01-03
Partition3: WEEK_END=2016-01-16
ID|MAX(LAST_APPEARING_DATE)
7152|2016-01-07
8350|2016-01-09
4697|2016-01-14
7187|2015-01-01
8005|2015-12-29
6254|2016-01-03
7962|2016-01-05
3339|2016-01-06
7834|2016-01-03
2403|2016-01-10
0157|2016-01-11
2228|2016-01-14
Please use the query below:
select ID, MAX(APPEARING_DATE) from table_name
group by ID, WEEK_END;
Or, including WEEK_END:
select ID, WEEK_END, MAX(APPEARING_DATE) from table_name
group by ID, WEEK_END;
You can use aggregation:
select t.id, max(week_end)
from t
group by id
having max(week_end) < '2016-01-02';
Adjust the date in the having clause for the week end that you want.
Actually, your question is a bit unclear. I'm not sure if a later week end would keep the row or not. If you want "as of" data, then include a where clause:
select t.id, max(week_end)
from t
where week_end < '2016-01-02'
group by id
having max(week_end) < '2016-01-02';
If you want this for a range of dates, then you can use a derived table:
select we.the_week_end, t.id, max(week_end)
from (select '2016-01-02' as the_week_end union all
select '2016-01-09' as the_week_end
) we cross join
t
where t.week_end < we.the_week_end
group by id, we.the_week_end
having max(t.week_end) < we.the_week_end;
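The expected output is effectively an "as of" snapshot: for each week end, every ID seen up to that week together with its latest appearing date. Below is a minimal sketch of that idea using Python's sqlite3 as a stand-in database; the table name `appearances` and the `missing_this_week` flag are hypothetical. Each distinct week end is joined to all rows from that week or earlier, and an ID counts as missing in the current week when its latest WEEK_END is earlier than the snapshot's. On 20+ years of data this join grows with weeks × rows, so it is worth checking the query plan (or pre-aggregating per ID and week) before relying on it.

```python
import sqlite3

# Sample rows from the question: ID|WEEK_START|WEEK_END|APPEARING_DATE
DATA = """\
7152|2015-12-27|2016-01-02|2015-12-27
8350|2015-12-27|2016-01-02|2015-12-27
7152|2015-12-27|2016-01-02|2015-12-29
4697|2015-12-27|2016-01-02|2015-12-30
7187|2015-12-27|2016-01-02|2015-01-01
8005|2015-12-27|2016-01-02|2015-12-27
8005|2015-12-27|2016-01-02|2015-12-29
6254|2016-01-03|2016-01-09|2016-01-03
7962|2016-01-03|2016-01-09|2016-01-04
3339|2016-01-03|2016-01-09|2016-01-06
7834|2016-01-03|2016-01-09|2016-01-03
7962|2016-01-03|2016-01-09|2016-01-05
7152|2016-01-03|2016-01-09|2016-01-07
8350|2016-01-03|2016-01-09|2016-01-09
2403|2016-01-10|2016-01-16|2016-01-10
0157|2016-01-10|2016-01-16|2016-01-11
2228|2016-01-10|2016-01-16|2016-01-14
4697|2016-01-10|2016-01-16|2016-01-14
"""

SNAPSHOT_SQL = """
WITH weeks AS (SELECT DISTINCT week_end FROM appearances)
SELECT w.week_end,
       a.id,
       MAX(a.appearing_date) AS last_appearing_date,
       MAX(a.week_end) < w.week_end AS missing_this_week  -- 1 = seen before, absent this week
FROM weeks w
JOIN appearances a ON a.week_end <= w.week_end  -- everything up to this week
GROUP BY w.week_end, a.id
ORDER BY w.week_end, a.id
"""

conn = sqlite3.connect(":memory:")
# id is TEXT so that values such as 0157 keep their leading zero.
conn.execute("CREATE TABLE appearances (id TEXT, week_start TEXT, week_end TEXT, appearing_date TEXT)")
conn.executemany("INSERT INTO appearances VALUES (?, ?, ?, ?)",
                 [line.split("|") for line in DATA.splitlines()])

snapshot = {}
for week_end, id_, last_seen, missing in conn.execute(SNAPSHOT_SQL):
    snapshot.setdefault(week_end, {})[id_] = (last_seen, missing)

for week_end, ids in snapshot.items():
    print(week_end, sorted(i for i, (_, m) in ids.items() if m))
```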

SQL - Find the two closest date after a specific date

Dear Stack Overflow community,
I am looking for the patient IDs where each of the two dates after the very first one is within 7 days of the date before it.
So the difference between the 2nd and 1st dates is <= 7 days,
and the difference between the 3rd and 2nd dates is <= 7 days.
Example:
ID Date
1 9/8/2014
1 9/9/2014
1 9/10/2014
2 5/31/2014
2 7/20/2014
2 9/8/2014
For patient 1, the two dates following the first one are each less than 7 days apart.
For patient 2, however, the next date is more than 7 days later (50 days).
I am trying to write an SQL query that outputs only patient ID 1.
Thanks for your help :)
You want to use lead(), but this is complicated because you want this only for the first three rows. I think I would go for:
select t.*
from (select t.*,
lead(date, 1) over (partition by id order by date) as next_date,
lead(date, 2) over (partition by id order by date) as next_date_2,
row_number() over (partition by id order by date) as seqnum
from t
) t
where seqnum = 1 and
next_date <= date + interval '7' day and
next_date_2 <= next_date + interval '7' day;
You can try using the window function lag():
select * from
(
select id, date, lag(date) over (partition by id order by date) as prevdate
from tablename
) A where datediff(day, prevdate, date) <= 7
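A quick way to check the lead()-based query from the first answer against the sample data is to run it in SQLite through Python's sqlite3 module (an assumed stand-in; SQLite 3.25+ is needed for window functions). SQLite has no `interval` type, so the day differences are computed with julianday(), and the dates are rewritten in ISO format. The table name `visits` is hypothetical. Patients with fewer than three dates get NULL from lead(), so they drop out of the WHERE clause.

```python
import sqlite3

VISITS = [
    (1, "2014-09-08"), (1, "2014-09-09"), (1, "2014-09-10"),
    (2, "2014-05-31"), (2, "2014-07-20"), (2, "2014-09-08"),
]

FIRST_THREE_WITHIN_7_DAYS_SQL = """
SELECT id
FROM (
    SELECT id, date,
           lead(date, 1) OVER (PARTITION BY id ORDER BY date) AS next_date,
           lead(date, 2) OVER (PARTITION BY id ORDER BY date) AS next_date_2,
           row_number()  OVER (PARTITION BY id ORDER BY date) AS seqnum
    FROM visits
)
WHERE seqnum = 1                                          -- only the very first date per patient
  AND julianday(next_date) - julianday(date) <= 7         -- 2nd date within 7 days of the 1st
  AND julianday(next_date_2) - julianday(next_date) <= 7  -- 3rd date within 7 days of the 2nd
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (id INTEGER, date TEXT)")
conn.executemany("INSERT INTO visits VALUES (?, ?)", VISITS)
patient_ids = [row[0] for row in conn.execute(FIRST_THREE_WITHIN_7_DAYS_SQL)]
print(patient_ids)
```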

Find Distinct IDs when the due date is always on the last day of each month

I need to find the distinct IDs whose due dates, across each ID's whole history, always fall on the last day of the month.
Suppose I have the following dataset:
ID DUE_DT
1 1/31/2014
1 2/28/2014
1 3/31/2014
1 6/30/2014
2 1/30/2014
2 2/28/2014
3 1/29/2016
3 2/29/2016
I want to write SQL code that returns ID = 1, because for this ID the due date is always the last day of the given month.
What would be the easiest way to approach it?
You can do:
select id
from t
group by id
having sum(case when extract(day from due_dt + interval '1 day') = 1 then 1 else 0 end) = count(*);
This uses ANSI/ISO standard functions for date arithmetic. These tend to vary by database, but the idea is the same in all databases -- add one day and see if the day of the month is 1 for all the rows.
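The "add one day and check for day 1" idea can be tried out in SQLite via Python's sqlite3 module (a stand-in, not one of the databases named in the answers). Here date(due_dt, '+1 day') and strftime('%d', ...) play the role of the ANSI interval arithmetic and extract(). The dates are rewritten in ISO format, and the table name `due_dates` is hypothetical.

```python
import sqlite3

DUE_DATES = [
    (1, "2014-01-31"), (1, "2014-02-28"), (1, "2014-03-31"), (1, "2014-06-30"),
    (2, "2014-01-30"), (2, "2014-02-28"),
    (3, "2016-01-29"), (3, "2016-02-29"),  # 2016 is a leap year
]

ALWAYS_LAST_DAY_SQL = """
SELECT id
FROM due_dates
GROUP BY id
-- a row is a month end when the following day is the 1st of a month
HAVING SUM(CASE WHEN strftime('%d', date(due_dt, '+1 day')) = '01' THEN 1 ELSE 0 END) = COUNT(*)
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE due_dates (id INTEGER, due_dt TEXT)")
conn.executemany("INSERT INTO due_dates VALUES (?, ?)", DUE_DATES)
ids = [row[0] for row in conn.execute(ALWAYS_LAST_DAY_SQL)]
print(ids)
```

ID 3 is excluded because of 2016-01-29, even though its 2016-02-29 row is a genuine month end; the leap-year case is handled by the date arithmetic itself.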
If you're using SQL Server 2012+, you can use the EOMONTH() function to achieve this:
SELECT ID FROM [table]
GROUP BY ID
HAVING COUNT(*) = SUM(CASE WHEN DUE_DT = EOMONTH(DUE_DT) THEN 1 ELSE 0 END)
http://rextester.com/VSPQR78701
The idea is quite simple:
you are on the last day of the month if (the month of due date) is not the same as (the month of due date + 1 day). This covers all cases across year, leap year and so on.
from there on, if (the count of rows for one id) is the same as (the count of rows for this id which are the last day of the month) you have a winner.
I tried to write an example (not tested). You do not specify which DB, so I will assume that CTEs (common table expressions) are available. If not, just put the CTE in a subquery.
In the same way, I am not sure that date arithmetic with interval works the same in all dialects.
with addlastdayofmonth as (
select
id
-- adding a 'virtualcolumn', 1 if last day of month 0 otherwise
, if(month(date_add(due_dt, interval 1 day)) != month(due_dt), 1, 0) as onlastday
from
your_table
)
select
id
, count(*) - sum(onlastday) as alwayslastday
from
addlastdayofmonth
group by
id
having
-- if count(rows) == count(rows with last day) we have a winner
alwayslastday = 0
MySQL version (credit to @Gordon Linoff):
SELECT
ID
FROM
<table>
GROUP BY
ID
HAVING
SUM(IF(day(DUE_DT + interval 1 Day) = 1, 1, 0)) = COUNT(ID);
Original Answer:
SELECT MAX(DUE_DT) FROM <table> WHERE ID = <the desired ID>
or, if you want MAX(DUE_DT) for each unique ID:
SELECT ID, MAX(DUE_DT) FROM <table> GROUP BY ID

SQL query for all the days of a month

I have the following table: RENTAL(book_date, copy_id, member_id, title_id, act_ret_date, exp_ret_date), where book_date is the day the book was booked. I need to write a query that, for every day of the month (1-30, 1-29 or 1-31, depending on the month), shows the number of books booked.
I currently know how to show the number of books rented on the days that appear in the table:
select count(book_date), to_char(book_date,'DD')
from rental
group by to_char(book_date,'DD');
My questions are:
How do I show the rest of the days (if, for some reason, my database has no books rented on the 19th, the 20th or several other days) with the number 0?
How do I show only the days of the current month (28, 29, 30 or 31, depending on the month and year)? I am lost. This must be done with an SQL query only, no PL/SQL or anything else.
The following query gives you all days of the current month. In your case, you can replace SYSDATE with your date column and join with this query to get the count for a given month.
SELECT DT
FROM(
SELECT TRUNC(LAST_DAY(SYSDATE)) - ROWNUM + 1 dt
FROM DUAL CONNECT BY ROWNUM < 32
)
where DT >= trunc(sysdate,'mm')
The answer is to create a table like this:
create table yearsmonthsdays (year varchar(4), month varchar(2), day varchar(2));
Use any language you wish, e.g. iterate in Java with Calendar.getInstance().getActualMaximum(Calendar.DAY_OF_MONTH) to get the last day of the month for as many years and months as you like, and fill that table with the year, the month and the days from 1 to the last day of that month.
you'd get something like:
insert into yearsmonthsdays values ('1995','02','01');
insert into yearsmonthsdays values ('1995','02','02');
...
insert into yearsmonthsdays values ('1995','02','28'); /* non-leap year */
...
insert into yearsmonthsdays values ('1996','02','01');
insert into yearsmonthsdays values ('1996','02','02');
...
insert into yearsmonthsdays values ('1996','02','28');
insert into yearsmonthsdays values ('1996','02','29'); /* leap year */
...
and so on.
Once this table is done, your work is almost finished. Make a left outer join from this table to your table, joining on year, month and day together; where no rows match, the count will be zero, as you want. Without using programming, this is your best bet.
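The calendar table does not have to be filled by hand. Below is a minimal sketch in Python, with the standard library's calendar.monthrange standing in for Java's getActualMaximum(Calendar.DAY_OF_MONTH) and SQLite standing in for the target database. It fills yearsmonthsdays for one month, then left joins a cut-down RENTAL table so that days without bookings come back with a count of 0. The rental rows are hypothetical sample data.

```python
import calendar
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yearsmonthsdays (year VARCHAR(4), month VARCHAR(2), day VARCHAR(2))")
conn.execute("CREATE TABLE rental (book_date TEXT, copy_id INTEGER)")


def fill_calendar(year: int, month: int) -> None:
    """Insert one row per day of the month; monthrange() returns (first weekday, number of days)."""
    _, last_day = calendar.monthrange(year, month)
    conn.executemany(
        "INSERT INTO yearsmonthsdays VALUES (?, ?, ?)",
        [(f"{year:04d}", f"{month:02d}", f"{day:02d}") for day in range(1, last_day + 1)],
    )


fill_calendar(1996, 2)  # leap year: 29 rows
# Hypothetical bookings: two on the 1st, one on the 29th.
conn.executemany("INSERT INTO rental VALUES (?, ?)",
                 [("1996-02-01", 10), ("1996-02-01", 11), ("1996-02-29", 12)])

counts = conn.execute("""
    SELECT c.day, COUNT(r.book_date) AS booked   -- COUNT(column) ignores the NULLs of unmatched days
    FROM yearsmonthsdays c
    LEFT OUTER JOIN rental r
      ON r.book_date = c.year || '-' || c.month || '-' || c.day
    WHERE c.year = '1996' AND c.month = '02'
    GROUP BY c.day
    ORDER BY c.day
""").fetchall()
print(counts[:3], len(counts))
```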
In Oracle, you can query from dual and use the CONNECT BY LEVEL syntax to generate a series of rows (in your case, dates). From there on, it's just a matter of deciding which dates you want to display (in my example I used all the dates of 2014) and joining them to your table:
SELECT all_date, COALESCE(cnt, 0) AS cnt
FROM (SELECT to_date('01/01/2014', 'dd/mm/yyyy') + rownum - 1 AS all_date
FROM dual
CONNECT BY LEVEL <= 365) d
LEFT JOIN (SELECT TRUNC(book_date) AS book_day, COUNT(*) AS cnt
FROM rental
GROUP BY TRUNC(book_date)) r ON d.all_date = r.book_day
ORDER BY all_date;
There's no need to get ROWNUM involved ... you can just use LEVEL in the CONNECT BY:
WITH d1 AS (
SELECT TRUNC(SYSDATE, 'MONTH') - 1 + LEVEL AS book_date
FROM dual
CONNECT BY TRUNC(SYSDATE, 'MONTH') - 1 + LEVEL <= LAST_DAY(SYSDATE)
)
SELECT TRUNC(d1.book_date), COUNT(r.book_date)
FROM d1 LEFT JOIN rental r
ON TRUNC(d1.book_date) = TRUNC(r.book_date)
GROUP BY TRUNC(d1.book_date);
Simply replace SYSDATE with a date in the month you're targeting for results.
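For databases without CONNECT BY, the same "generate the days of the month, then left join" pattern can be written as a recursive CTE. Below is a minimal sketch in SQLite through Python's sqlite3 (an assumed stand-in for Oracle): date(:target_day, 'start of month') plays the role of TRUNC(SYSDATE, 'MONTH'), and the recursion stops at the last day of the month. The target day and the rental rows are hypothetical.

```python
import sqlite3

DAYS_OF_MONTH_SQL = """
WITH RECURSIVE d1(book_date) AS (
    SELECT date(:target_day, 'start of month')  -- first day of the month
    UNION ALL
    SELECT date(book_date, '+1 day')
    FROM d1
    WHERE book_date < date(:target_day, 'start of month', '+1 month', '-1 day')  -- stop at the last day
)
SELECT d1.book_date, COUNT(r.book_date) AS booked
FROM d1
LEFT JOIN rental r ON date(r.book_date) = d1.book_date  -- date() drops the time part, like TRUNC
GROUP BY d1.book_date
ORDER BY d1.book_date
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rental (book_date TEXT, member_id INTEGER)")
conn.executemany("INSERT INTO rental VALUES (?, ?)",
                 [("2014-02-03 10:15:00", 1), ("2014-02-03 16:40:00", 2), ("2014-02-28 09:00:00", 3)])

rows = conn.execute(DAYS_OF_MONTH_SQL, {"target_day": "2014-02-17"}).fetchall()
print(len(rows), rows[2], rows[-1])
```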
All days of the month based on the current date:
select trunc(sysdate) - (to_number(to_char(sysdate,'DD')) - 1)+level-1 x from dual connect by level <= TO_CHAR(LAST_DAY(sysdate),'DD')
This worked for me:
SELECT DT
FROM (SELECT TRUNC(LAST_DAY(SYSDATE) - (CASE WHEN ROWNUM=1 THEN 0 ELSE ROWNUM-1 END)) DT
FROM DUAL
CONNECT BY ROWNUM <= 32)
WHERE DT >= TRUNC(SYSDATE, 'MM')
In Oracle SQL, the query must look like this so as not to miss the last day of the month:
SELECT DT
FROM(
SELECT trunc(add_months(sysdate, 1),'MM')- ROWNUM dt
FROM DUAL CONNECT BY ROWNUM < 32
)
where DT >= trunc(sysdate,'mm')