SQL Select only missing months

Notice that the months 2017-04-01, 2018-02-01, 2018-07-01 and 2019-01-01 are missing from the output. I want to show only those missing months. Does anyone know how to go about this?
Query:
SELECT TO_DATE("Month", 'mon''yy') as dates FROM sample_sheet
group by dates
order by dates asc;
Output:
2017-01-01
2017-02-01
2017-03-01
2017-05-01
2017-06-01
2017-07-01
2017-08-01
2017-09-01
2017-10-01
2017-11-01
2017-12-01
2018-01-01
2018-03-01
2018-04-01
2018-05-01
2018-06-01
2018-08-01
2018-09-01
2018-10-01
2018-11-01
2018-12-01
2019-02-01
2019-03-01
2019-04-01

I don't know Vertica, so I wrote a working proof of concept in Microsoft SQL Server and tried to convert it to Vertica syntax based on the online documentation.
It should look like this:
with recursive
months as (
select 2017 as date_year, 1 as date_month, to_date('2017-01-01', 'YYYY-MM-DD') as first_date, to_date('2017-01-31', 'YYYY-MM-DD') as last_date
union all
-- recursive step: keep adding one month while the previous month started before today
select
year(add_months(first_date, 1)) as date_year,
month(add_months(first_date, 1)) as date_month,
add_months(first_date, 1) as first_date,
last_day(add_months(first_date, 1)) as last_date
from months
where first_date < current_date
),
sample_dates (a_date) as (
select to_date('2017-01-15', 'YYYY-MM-DD') union all
select to_date('2017-01-22', 'YYYY-MM-DD') union all
select to_date('2017-02-01', 'YYYY-MM-DD') union all
select to_date('2017-04-15', 'YYYY-MM-DD') union all
select to_date('2017-06-15', 'YYYY-MM-DD')
)
-- months that contain none of the sample dates come back with a NULL a_date
select *
from sample_dates right join months on sample_dates.a_date between first_date and last_date
where sample_dates.a_date is null
The months CTE is a recursive common table expression that holds every month since 2017-01, with the first and last day of each month. sample_dates is just a list of dates to test the logic; replace it with your own table.
Once you have that monthly calendar, outer-join your dates to it. Any month whose first_date to last_date range contains none of your dates comes back with a NULL a_date, and those are the missing months.
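You can try the same calendar-plus-outer-join idea without a Vertica or SQL Server instance. The sketch below is a Python program that uses the standard-library sqlite3 module. The SQL is written in SQLite's dialect (WITH RECURSIVE, date() modifiers), not Vertica's. It also puts the calendar on the left side of a LEFT JOIN, which is equivalent to the RIGHT JOIN above. The function name missing_months and its start/end defaults are illustrative assumptions.

```python
import sqlite3


def missing_months(dates, start="2017-01-01", end="2019-04-01"):
    """Return the first day of each month in [start, end] that contains none of `dates`.

    Assumes ISO 'YYYY-MM-DD' strings and that start/end are first-of-month dates.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sample_dates (a_date TEXT)")
    con.executemany("INSERT INTO sample_dates VALUES (?)", [(d,) for d in dates])
    rows = con.execute(
        """
        WITH RECURSIVE months(first_date, last_date) AS (
            -- anchor: the first calendar month
            SELECT date(:start), date(:start, '+1 month', '-1 day')
            UNION ALL
            -- recursive step: add one month until the end month is reached
            SELECT date(first_date, '+1 month'),
                   date(first_date, '+2 months', '-1 day')
            FROM months
            WHERE first_date < date(:end)
        )
        SELECT m.first_date
        FROM months AS m
        LEFT JOIN sample_dates AS s
               ON s.a_date BETWEEN m.first_date AND m.last_date
        WHERE s.a_date IS NULL  -- no date fell inside this month
        ORDER BY m.first_date
        """,
        {"start": start, "end": end},
    ).fetchall()
    con.close()
    return [first_date for (first_date,) in rows]


print(missing_months(
    ["2017-01-15", "2017-01-22", "2017-02-01", "2017-04-15", "2017-06-15"],
    end="2017-06-01",
))  # → ['2017-03-01', '2017-05-01']
```

January, February, April and June each contain at least one sample date, so only March and May come back.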

You can build a TIMESERIES of all dates between the first and the last input date and keep only the first day of each month. A TIMESERIES slice length is an INTERVAL DAY TO SECOND, so it cannot step by calendar months directly. Then left join that sequence of month firsts with your input and check for NULLs on the input side of the join, which marks the months where the join fails:
WITH
-- your input
input(mth1st) AS (
SELECT DATE '2017-01-01'
UNION ALL SELECT DATE '2017-02-01'
UNION ALL SELECT DATE '2017-03-01'
UNION ALL SELECT DATE '2017-05-01'
UNION ALL SELECT DATE '2017-06-01'
UNION ALL SELECT DATE '2017-07-01'
UNION ALL SELECT DATE '2017-08-01'
UNION ALL SELECT DATE '2017-09-01'
UNION ALL SELECT DATE '2017-10-01'
UNION ALL SELECT DATE '2017-11-01'
UNION ALL SELECT DATE '2017-12-01'
UNION ALL SELECT DATE '2018-01-01'
UNION ALL SELECT DATE '2018-03-01'
UNION ALL SELECT DATE '2018-04-01'
UNION ALL SELECT DATE '2018-05-01'
UNION ALL SELECT DATE '2018-06-01'
UNION ALL SELECT DATE '2018-08-01'
UNION ALL SELECT DATE '2018-09-01'
UNION ALL SELECT DATE '2018-10-01'
UNION ALL SELECT DATE '2018-11-01'
UNION ALL SELECT DATE '2018-12-01'
UNION ALL SELECT DATE '2019-02-01'
UNION ALL SELECT DATE '2019-03-01'
UNION ALL SELECT DATE '2019-04-01'
)
,
-- need a series of month firsts
-- TIMESERIES slices are INTERVAL DAY TO SECOND, not months,
-- so build a daily timeseries and keep only
-- the first day of each month
limits(mth1st) AS (
SELECT MIN(mth1st) FROM input
UNION ALL SELECT MAX(mth1st) FROM input
)
,
alldates AS (
SELECT dt::DATE FROM limits
TIMESERIES dt AS '1 day' OVER(ORDER BY mth1st::TIMESTAMP)
)
,
allfirsts(mth1st) AS (
SELECT dt FROM alldates WHERE DAY(dt)=1
)
SELECT
allfirsts.mth1st
FROM allfirsts
LEFT JOIN input USING(mth1st)
WHERE input.mth1st IS NULL;
-- out mth1st
-- out ------------
-- out 2017-04-01
-- out 2018-02-01
-- out 2018-07-01
-- out 2019-01-01
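The sketch below tries the daily-series-then-filter approach outside Vertica, as a Python program using sqlite3. SQLite has no TIMESERIES clause, so a recursive CTE stands in for it by generating one row per day between the smallest and largest input value. strftime('%d', ...) then keeps the month firsts, and the LEFT JOIN / IS NULL check works the same way as above. The table name input_months and the function name are assumptions for the demo.

```python
import sqlite3


def missing_month_firsts(month_firsts):
    """Return the month firsts between the min and max of the input that are absent from it."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE input_months (mth1st TEXT)")
    con.executemany("INSERT INTO input_months VALUES (?)", [(m,) for m in month_firsts])
    rows = con.execute(
        """
        WITH RECURSIVE
        limits(lo, hi) AS (
            SELECT MIN(mth1st), MAX(mth1st) FROM input_months
        ),
        alldates(dt) AS (            -- one row per day, like TIMESERIES '1 day'
            SELECT lo FROM limits
            UNION ALL
            SELECT date(dt, '+1 day') FROM alldates, limits WHERE dt < hi
        ),
        allfirsts(mth1st) AS (       -- keep only the first day of each month
            SELECT dt FROM alldates WHERE strftime('%d', dt) = '01'
        )
        SELECT f.mth1st
        FROM allfirsts AS f
        LEFT JOIN input_months AS i ON i.mth1st = f.mth1st
        WHERE i.mth1st IS NULL       -- the join found no input row
        ORDER BY f.mth1st
        """
    ).fetchall()
    con.close()
    return [m for (m,) in rows]
```

Called with the 24 month firsts from the question's output, it returns the same four months as the Vertica query.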

Related

Find overlapping date in SQL

I need a SELECT in Oracle SQL that finds rows with overlapping dates, limited to the period from exactly one year ago to today. ID_FORMULAR is not a unique value, and I need to include only overlapping rows where ID_FORMULAR is UNIQUE.
My code:
SELECT T1.*
FROM VISITORS T1, VISITORS T2
WHERE ( T1.ID_FORMULAR != T2.ID_FORMULAR
AND t1.FROM_DATE >= t2.FROM_DATE
AND t1.FROM_DATE <= t2.TO_DATE
AND T1.CREATED_DATE >= ADD_MONTHS (TRUNC (CURRENT_DATE), -12)
AND T1.CREATED_DATE < TRUNC (CURRENT_DATE) + 1)
OR ( T1.ID_FORMULAR != T2.ID_FORMULAR
AND t1.TO_DATE >= t2.FROM_DATE
AND t1.TO_DATE <= t2.TO_DATE
AND T1.CREATED_DATE >= ADD_MONTHS (TRUNC (CURRENT_DATE), -12)
AND T1.CREATED_DATE < TRUNC (CURRENT_DATE) + 1)
OR ( T1.ID_FORMULAR != T2.ID_FORMULAR
AND t1.TO_DATE >= t2.TO_DATE
AND t1.FROM_DATE <= t2.FROM_DATE
AND T1.CREATED_DATE >= ADD_MONTHS (TRUNC (CURRENT_DATE), -12)
AND T1.CREATED_DATE < TRUNC (CURRENT_DATE) + 1)
It is not working correctly. Any help?
From Oracle 12, you can use MATCH_RECOGNIZE to perform row-by-row processing:
SELECT *
FROM (
SELECT *
FROM visitors
WHERE created_date >= ADD_MONTHS(TRUNC(CURRENT_DATE), -12)
AND created_date < TRUNC(CURRENT_DATE) + 1
)
MATCH_RECOGNIZE(
ORDER BY from_date
ALL ROWS PER MATCH
PATTERN (any_row overlap+)
DEFINE
overlap AS PREV(id_formular) != id_formular
AND PREV(to_date) >= from_date
)
Which, for the sample data:
CREATE TABLE visitors (id_formular, created_date, from_date, to_date) AS
SELECT 1, DATE '2022-08-01', DATE '2022-08-01', DATE '2022-08-03' FROM DUAL UNION ALL
SELECT 2, DATE '2022-08-01', DATE '2022-08-02', DATE '2022-08-04' FROM DUAL UNION ALL
SELECT 3, DATE '2022-08-01', DATE '2022-08-03', DATE '2022-08-05' FROM DUAL UNION ALL
SELECT 1, DATE '2022-08-01', DATE '2022-08-06', DATE '2022-08-06' FROM DUAL UNION ALL
SELECT 2, DATE '2022-08-01', DATE '2022-08-07', DATE '2022-08-09' FROM DUAL UNION ALL
SELECT 2, DATE '2022-08-01', DATE '2022-08-08', DATE '2022-08-10' FROM DUAL UNION ALL
SELECT 1, DATE '2022-08-01', DATE '2022-08-09', DATE '2022-08-11' FROM DUAL;
Outputs:
FROM_DATE ID_FORMULAR CREATED_DATE TO_DATE
--------- ----------- ------------ ---------
01-AUG-22 1           01-AUG-22    03-AUG-22
02-AUG-22 2           01-AUG-22    04-AUG-22
03-AUG-22 3           01-AUG-22    05-AUG-22
08-AUG-22 2           01-AUG-22    10-AUG-22
09-AUG-22 1           01-AUG-22    11-AUG-22
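For comparison, a plain self-join can express the overlap test with a single predicate. Two ranges overlap exactly when each starts on or before the other ends (t1.from_date <= t2.to_date AND t2.from_date <= t1.to_date), and that one condition covers all three OR branches in the question. The sketch below runs it in SQLite through Python's sqlite3 module on the sample rows above. It leaves out the created_date filter and uses a hypothetical overlapping_rows helper. There is one behavioral difference from the MATCH_RECOGNIZE pattern, which only compares each row with the row immediately before it in from_date order: the self-join compares every pair. On this data it therefore also returns (2, 2022-08-07, 2022-08-09), which shares 2022-08-09 with the last row.

```python
import sqlite3

# (id_formular, from_date, to_date), copied from the sample data above
ROWS = [
    (1, "2022-08-01", "2022-08-03"),
    (2, "2022-08-02", "2022-08-04"),
    (3, "2022-08-03", "2022-08-05"),
    (1, "2022-08-06", "2022-08-06"),
    (2, "2022-08-07", "2022-08-09"),
    (2, "2022-08-08", "2022-08-10"),
    (1, "2022-08-09", "2022-08-11"),
]


def overlapping_rows(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE visitors (id_formular INTEGER, from_date TEXT, to_date TEXT)")
    con.executemany("INSERT INTO visitors VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT DISTINCT t1.id_formular, t1.from_date, t1.to_date
        FROM visitors AS t1
        JOIN visitors AS t2
          ON t1.id_formular <> t2.id_formular
         AND t1.from_date <= t2.to_date   -- t1 starts before t2 ends
         AND t2.from_date <= t1.to_date   -- t2 starts before t1 ends
        ORDER BY t1.from_date
        """
    ).fetchall()
    con.close()
    return result
```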
I don't quite understand the question. What confuses me is that you want only rows where the ID is unique: if an ID is unique, then there is no other row for it to overlap with. Anyway, let's suppose that the sample data looks like this:
WITH
tbl AS
(
SELECT 0 "ID", DATE '2021-07-01' "CREATED", DATE '2021-07-01' "DATE_FROM", DATE '2021-07-13' "DATE_TO" FROM DUAL UNION ALL
SELECT 1, DATE '2021-12-01', DATE '2021-12-01', DATE '2021-12-03' FROM DUAL UNION ALL
SELECT 1, DATE '2021-12-04', DATE '2021-12-04', DATE '2021-12-14' FROM DUAL UNION ALL
SELECT 1, DATE '2021-12-12', DATE '2021-12-12', DATE '2021-12-29' FROM DUAL UNION ALL
SELECT 2, DATE '2022-08-04', DATE '2022-08-04', DATE '2022-08-10' FROM DUAL UNION ALL
SELECT 2, DATE '2022-08-11', DATE '2022-08-11', DATE '2022-08-21' FROM DUAL UNION ALL
SELECT 2, DATE '2022-08-21', DATE '2022-08-21', DATE '2022-08-29' FROM DUAL UNION ALL
SELECT 3, DATE '2022-08-11', DATE '2022-08-11', DATE '2022-08-29' FROM DUAL UNION ALL
SELECT 4, DATE '2022-08-14', DATE '2022-08-14', DATE '2022-08-14' FROM DUAL UNION ALL
SELECT 4, DATE '2022-08-29', DATE '2022-08-14', DATE '2022-08-29' FROM DUAL
)
We can add some columns that tell us whether the ID is unique, the order of appearance of each row within the same ID, the end date of the previous row for the same ID, and whether the rows of a particular ID overlap. Here is the code (it uses analytic functions with a windowing clause):
SELECT
ID "ID",
CASE WHEN Count(*) OVER (PARTITION BY ID ORDER BY ID) = 1 THEN 'Y' ELSE 'N' END "IS_UNIQUE",
Count(ID) OVER (PARTITION BY ID ORDER BY ID, DATE_FROM, DATE_TO ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) "ID_ORDER_NO",
CREATED "CREATED",
DATE_FROM "DATE_FROM",
DATE_TO "DATE_TO",
CASE
WHEN Count(ID) OVER (PARTITION BY ID ORDER BY ID, DATE_FROM, DATE_TO ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) = 1
THEN Null
ELSE
First_Value(DATE_TO) OVER (PARTITION BY ID ORDER BY ID, DATE_FROM, DATE_TO ROWS BETWEEN 1 PRECEDING AND CURRENT ROW )
END "PREVIOUS_END_DATE",
CASE
WHEN Count(ID) OVER (PARTITION BY ID ORDER BY ID, DATE_FROM, DATE_TO ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) = 1
THEN 'N'
ELSE
CASE
WHEN DATE_FROM <= First_Value(DATE_TO) OVER (PARTITION BY ID ORDER BY ID, DATE_FROM, DATE_TO ROWS BETWEEN 1 PRECEDING AND CURRENT ROW )
THEN 'Y'
ELSE 'N'
END
END "OVERLAPS"
FROM
TBL
WHERE
CREATED BETWEEN ADD_MONTHS(TRUNC(SYSDATE, 'dd'), -12) And TRUNC(SYSDATE, 'dd')
Here is the resulting dataset...
/* R e s u l t
ID IS_UNIQUE ID_ORDER_NO CREATED DATE_FROM DATE_TO PREVIOUS_END_DATE OVERLAPS
---------- --------- ----------- --------- --------- --------- ----------------- --------
1 N 1 01-DEC-21 01-DEC-21 03-DEC-21 N
1 N 2 04-DEC-21 04-DEC-21 14-DEC-21 03-DEC-21 N
1 N 3 12-DEC-21 12-DEC-21 29-DEC-21 14-DEC-21 Y
2 N 1 04-AUG-22 04-AUG-22 10-AUG-22 N
2 N 2 11-AUG-22 11-AUG-22 21-AUG-22 10-AUG-22 N
2 N 3 21-AUG-22 21-AUG-22 29-AUG-22 21-AUG-22 Y
3 Y 1 11-AUG-22 11-AUG-22 29-AUG-22 N
4 N 1 14-AUG-22 14-AUG-22 14-AUG-22 N
4 N 2 29-AUG-22 14-AUG-22 29-AUG-22 14-AUG-22 Y
*/
This dataset can then be used to get the rows and columns you are after. You can filter it, do other calculations (like the number of overlapping days), get the number of rows per ID, and so on...
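The PREVIOUS_END_DATE column above uses FIRST_VALUE over a two-row window. For every row except the first in each ID, that returns the previous row's DATE_TO, and LAG expresses the same thing directly. Here is a minimal sketch in SQLite through Python's sqlite3 module. It assumes the sample rows above with the CREATED column and filter dropped, and uses a hypothetical flag_overlaps helper.

```python
import sqlite3

# (id, date_from, date_to) from the sample data above; CREATED is left out here
TBL = [
    (0, "2021-07-01", "2021-07-13"),
    (1, "2021-12-01", "2021-12-03"),
    (1, "2021-12-04", "2021-12-14"),
    (1, "2021-12-12", "2021-12-29"),
    (2, "2022-08-04", "2022-08-10"),
    (2, "2022-08-11", "2022-08-21"),
    (2, "2022-08-21", "2022-08-29"),
    (3, "2022-08-11", "2022-08-29"),
    (4, "2022-08-14", "2022-08-14"),
    (4, "2022-08-14", "2022-08-29"),
]


def flag_overlaps(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tbl (id INTEGER, date_from TEXT, date_to TEXT)")
    con.executemany("INSERT INTO tbl VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        SELECT id, date_from, date_to,
               LAG(date_to) OVER (PARTITION BY id ORDER BY date_from, date_to) AS previous_end_date,
               CASE
                   WHEN date_from <= LAG(date_to) OVER (PARTITION BY id ORDER BY date_from, date_to)
                   THEN 'Y'
                   ELSE 'N'   -- also 'N' for the first row per id, where LAG is NULL
               END AS overlaps
        FROM tbl
        ORDER BY id, date_from, date_to
        """
    ).fetchall()
    con.close()
    return result
```

The third row of ID 1, the third row of ID 2 and the second row of ID 4 are flagged, matching the OVERLAPS column above. ID 0 appears only because the CREATED filter is not applied here.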
Regards...

Getting the last 4 months of data from a date column when some months' data is missing

I have the data below:
Record_date ID
28-feb-2022 xyz
31-Jan-2022 ABC
30-nov-2022 jkl
31-oct-2022 dcs
I want to get the last 3 months of data from the date column. Missing months should not be counted.
Output should be:
Record_date ID
28-feb-2022 xyz
31-Jan-2022 ABC
30-nov-2022 jkl
Dec is missing from the last 3 months, but we have to ignore it because its data is not available. I have tried many things, but nothing works.
Any suggestions?
Assuming you are using Oracle, you can use Oracle's ADD_MONTHS function to filter the data.
--- untested
-- Assumption Record_date is a date column
SELECT * FROM table1
where Record_date > ADD_MONTHS(SYSDATE, -3)
To get the data for the three most recent months that actually appear in the table, you can use:
SELECT record_date,
id
FROM (
SELECT t.*,
DENSE_RANK() OVER (ORDER BY TRUNC(Record_date, 'MM') DESC) AS rnk
FROM table_name t
)
WHERE rnk <= 3;
Which, for the sample data:
CREATE TABLE table_name (Record_date, ID) AS
SELECT DATE '2022-02-28', 'xyz' FROM DUAL UNION ALL
SELECT DATE '2022-01-31', 'ABC' FROM DUAL UNION ALL
SELECT DATE '2022-11-30', 'jkl' FROM DUAL UNION ALL
SELECT DATE '2022-10-31', 'dcs' FROM DUAL;
Outputs:
RECORD_DATE         ID
------------------- ---
2022-11-30 00:00:00 jkl
2022-10-31 00:00:00 dcs
2022-02-28 00:00:00 xyz
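To try the DENSE_RANK idea without Oracle, here is a sketch in SQLite through Python's sqlite3 module. strftime('%Y-%m', record_date) stands in for TRUNC(Record_date, 'MM'), and the helper name latest_n_months is an assumption. DENSE_RANK numbers only the months that actually have rows, so a month with no data (December in the question) never takes up a rank.

```python
import sqlite3


def latest_n_months(rows, n=3):
    """Return (record_date, id) rows that fall in the n most recent months present in the data."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE table_name (record_date TEXT, id TEXT)")
    con.executemany("INSERT INTO table_name VALUES (?, ?)", rows)
    result = con.execute(
        """
        SELECT record_date, id
        FROM (
            SELECT t.*,
                   -- strftime('%Y-%m', ...) plays the role of Oracle's TRUNC(record_date, 'MM')
                   DENSE_RANK() OVER (ORDER BY strftime('%Y-%m', record_date) DESC) AS rnk
            FROM table_name AS t
        )
        WHERE rnk <= ?
        ORDER BY record_date DESC
        """,
        (n,),
    ).fetchall()
    con.close()
    return result


SAMPLE = [("2022-02-28", "xyz"), ("2022-01-31", "ABC"),
          ("2022-11-30", "jkl"), ("2022-10-31", "dcs")]
print(latest_n_months(SAMPLE))
# → [('2022-11-30', 'jkl'), ('2022-10-31', 'dcs'), ('2022-02-28', 'xyz')]
```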

SQL - Vertica: How to generate daily rows with the most recent previous date's data

I have a base table like the one below:
score_upd (Upd_dt,Url,Score) AS (
SELECT DATE '2019-07-26','A','x'
UNION ALL SELECT DATE '2019-07-26','B','alpha'
UNION ALL SELECT DATE '2019-08-01','A','y'
UNION ALL SELECT DATE '2019-08-01','B','beta'
UNION ALL SELECT DATE '2019-08-03','A','z'
UNION ALL SELECT DATE '2019-08-03','B','gamma'
)
Upd_dt URL Score
2019-07-26 A x
2019-07-26 B alpha
2019-08-01 A y
2019-08-01 B beta
2019-08-03 A z
2019-08-03 B gamma
I want to create a table at the daily URL level, filling each new row with the value from the most recent earlier date. The result should look like this:
score_upd (Upd_dt,Url,Score) AS (
SELECT DATE '2019-07-26','A','x'
UNION ALL SELECT DATE '2019-07-26','B','alpha'
UNION ALL SELECT DATE '2019-07-27','A','x'
UNION ALL SELECT DATE '2019-07-27','B','alpha'
UNION ALL SELECT DATE '2019-07-28','A','x'
UNION ALL SELECT DATE '2019-07-28','B','alpha'
UNION ALL SELECT DATE '2019-07-29','A','x'
UNION ALL SELECT DATE '2019-07-29','B','alpha'
UNION ALL SELECT DATE '2019-07-30','A','x'
UNION ALL SELECT DATE '2019-07-30','B','alpha'
UNION ALL SELECT DATE '2019-07-31','A','x'
UNION ALL SELECT DATE '2019-07-31','B','alpha'
UNION ALL SELECT DATE '2019-08-01','A','y'
UNION ALL SELECT DATE '2019-08-01','B','beta'
UNION ALL SELECT DATE '2019-08-02','A','y'
UNION ALL SELECT DATE '2019-08-02','B','beta'
UNION ALL SELECT DATE '2019-08-03','A','z'
UNION ALL SELECT DATE '2019-08-03','B','gamma'
UNION ALL SELECT DATE '2019-08-04','A','z'
UNION ALL SELECT DATE '2019-08-04','B','gamma'
UNION ALL SELECT DATE '2019-08-05','A','z'
UNION ALL SELECT DATE '2019-08-05','B','gamma'
)
Which looks like:
Upd_dt URL Score
2019-07-26 A x
2019-07-26 B alpha
2019-07-27 A x
2019-07-27 B alpha
2019-07-28 A x
2019-07-28 B alpha
2019-07-29 A x
2019-07-29 B alpha
2019-07-30 A x
2019-07-30 B alpha
2019-07-31 A x
2019-07-31 B alpha
2019-08-01 A y
2019-08-01 B beta
2019-08-02 A y
2019-08-02 B beta
2019-08-03 A z
2019-08-03 B gamma
2019-08-04 A z
2019-08-04 B gamma
2019-08-05 A z
2019-08-05 B gamma
.
.
.
Current process is:
I built a daily dimension table from 7/26/2019 until today with:
/*
SELECT CAST(slice_time AS DATE) dates
FROM testcalendar mtc
TIMESERIES slice_time as '1 day'
OVER (ORDER BY CAST(mtc.dates as TIMESTAMP));
*/
so I get:
Dates
2019-07-26
2019-07-27
2019-07-28
2019-07-29
.
.
.
2019-10-12 (today)
I was hoping to use a function such as "interpolate previous value" to join my first table on dates and fill the missing days with the values from the most recent earlier date, but it failed.
The result didn't contain rows for the missing days.
Please let me know if anyone has a better idea.
Thanks!
As a warning to start with: only store a "daily photograph" when it really, really is necessary. I once ended up with 364 rows too many per year, because the values only changed once a year. In Vertica, that costs license capacity, as well as CPU and clock time for joining and grouping...
Otherwise, this is a good start.
You can, however, apply TIMESERIES without having to build a calendar first.
The trick is to "extrapolate" manually what you can INTERPOLATE automatically.
Add an in-line 'padding' table that contains the newest value per URL, but with CURRENT_DATE instead of the newest actual date. Build it with Vertica's peculiar analytic limit clause, LIMIT 1 OVER(PARTITION BY url ORDER BY upd_dt DESC).
UNION ALL that padding table with your input, and apply the TIMESERIES clause to the result.
Like so:
WITH
-- your input ...
score_upd (Upd_dt,Url,Score) AS (
SELECT DATE '2019-07-26','A','x'
UNION ALL SELECT DATE '2019-07-26','B','alpha'
UNION ALL SELECT DATE '2019-08-01','A','y'
UNION ALL SELECT DATE '2019-08-01','B','beta'
UNION ALL SELECT DATE '2019-08-03','A','z'
UNION ALL SELECT DATE '2019-08-03','B','gamma'
)
-- real WITH clause would start here ...
,
-- newest row per Url, just with current date
pad_newest AS (
SELECT
CURRENT_DATE
, url
, score
FROM score_upd
LIMIT 1 OVER(PARTITION BY url ORDER BY upd_dt DESC)
)
,
with_newest AS (
SELECT
*
FROM score_upd
UNION ALL
SELECT *
FROM pad_newest
)
SELECT
ts_dt::DATE AS upd_dt
, url AS url
, TS_FIRST_VALUE(score) AS score
FROM with_newest
TIMESERIES ts_dt AS '1 day' OVER (
PARTITION BY url ORDER BY upd_dt::TIMESTAMP
)
ORDER BY 1,2
;
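SQLite has neither TIMESERIES nor TS_FIRST_VALUE, so the sketch below swaps in a different technique to produce the same shape of result. A recursive CTE generates the days, a CROSS JOIN pairs each day with each URL, and a correlated subquery picks the newest score on or before that day. It runs in Python's sqlite3 module. end_date is a fixed stand-in for CURRENT_DATE (an assumption, so the output is repeatable), and daily_scores is a hypothetical helper name.

```python
import sqlite3

SCORE_UPD = [
    ("2019-07-26", "A", "x"), ("2019-07-26", "B", "alpha"),
    ("2019-08-01", "A", "y"), ("2019-08-01", "B", "beta"),
    ("2019-08-03", "A", "z"), ("2019-08-03", "B", "gamma"),
]


def daily_scores(rows, end_date):
    """One row per (day, url) from the first upd_dt to end_date, carrying the last score forward."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE score_upd (upd_dt TEXT, url TEXT, score TEXT)")
    con.executemany("INSERT INTO score_upd VALUES (?, ?, ?)", rows)
    result = con.execute(
        """
        WITH RECURSIVE days(d) AS (
            SELECT MIN(upd_dt) FROM score_upd
            UNION ALL
            SELECT date(d, '+1 day') FROM days WHERE d < :end_date
        ),
        urls(url) AS (SELECT DISTINCT url FROM score_upd)
        SELECT days.d, urls.url,
               -- newest score on or before this day for this url
               (SELECT s.score
                  FROM score_upd AS s
                 WHERE s.url = urls.url AND s.upd_dt <= days.d
                 ORDER BY s.upd_dt DESC
                 LIMIT 1) AS score
        FROM days CROSS JOIN urls
        ORDER BY days.d, urls.url
        """,
        {"end_date": end_date},
    ).fetchall()
    con.close()
    return result
```

With end_date set to 2019-08-05, this yields the 22 rows listed in the question.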

Select dates older than time frame SQL

I am trying to find all records in a database with an admission date older than a certain time frame (in this case, all admission dates more than 4 days old).
I have:
select memberid, admitdate
from membertable
where admitdate < (sysdate-4)
As a result, I'm getting a lot of admission dates that match this, but I'm ALSO getting dates from only 2 days ago, which should not match my condition. What am I doing wrong?
If it helps, the admit dates have a format of mm/dd/yyyy.
Dates, including sysdate, have a time component. Even if all your admitdate values are at midnight, that is still a time, and sysdate will only be at midnight if you run your query at exactly midnight.
select sysdate, sysdate-4, trunc(sysdate), trunc(sysdate)-4 from dual;
SYSDATE SYSDATE-4 TRUNC(SYSDATE) TRUNC(SYSDATE)-4
------------------- ------------------- ------------------- -------------------
2018-06-21 16:44:53 2018-06-17 16:44:53 2018-06-21 00:00:00 2018-06-17 00:00:00
If you filter your records on sysdate-4, that will include any admitdate values up to, in this example, 2018-06-17 16:44:53. So if they really are all at midnight, that presumably means every record for the 17th.
with membertable (memberid, admitdate) as (
select 1, date '2018-06-15' from dual
union all select 2, date '2018-06-16' from dual
union all select 3, date '2018-06-17' from dual
union all select 4, date '2018-06-18' from dual
union all select 5, date '2018-06-19' from dual
union all select 6, date '2018-06-20' from dual
union all select 7, date '2018-06-21' from dual
)
select memberid, admitdate
from membertable
where admitdate < (sysdate-4);
MEMBERID ADMITDATE
---------- -------------------
1 2018-06-15 00:00:00
2 2018-06-16 00:00:00
3 2018-06-17 00:00:00
If you truncate the value you're comparing against, its time portion is also set to midnight, so you'll only match records up to, but not including, that point in time, 2018-06-17 00:00:00:
with membertable (memberid, admitdate) as (
select 1, date '2018-06-15' from dual
union all select 2, date '2018-06-16' from dual
union all select 3, date '2018-06-17' from dual
union all select 4, date '2018-06-18' from dual
union all select 5, date '2018-06-19' from dual
union all select 6, date '2018-06-20' from dual
union all select 7, date '2018-06-21' from dual
)
select memberid, admitdate
from membertable
where admitdate < trunc(sysdate)-4;
MEMBERID ADMITDATE
---------- -------------------
1 2018-06-15 00:00:00
2 2018-06-16 00:00:00
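The effect of truncation can be reproduced in SQLite through Python's sqlite3 module. In this sketch, admitdate values are stored as 'YYYY-MM-DD 00:00:00' text, and NOW is a fixed stand-in for SYSDATE (an assumption, so the result does not depend on when it runs). datetime(..., 'start of day', ...) plays the role of TRUNC(sysdate). The helper name older_than_4_days is hypothetical.

```python
import sqlite3

NOW = "2018-06-21 16:44:53"  # fixed stand-in for SYSDATE so the demo is repeatable


def older_than_4_days(truncate):
    """Return memberids whose admitdate is before NOW - 4 days (optionally truncated to midnight)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE membertable (memberid INTEGER, admitdate TEXT)")
    con.executemany(
        "INSERT INTO membertable VALUES (?, ?)",
        [(i, f"2018-06-{14 + i} 00:00:00") for i in range(1, 8)],  # 2018-06-15 .. 2018-06-21
    )
    # 'start of day' drops the time part, like TRUNC(sysdate)
    cutoff_sql = ("datetime(:now, 'start of day', '-4 days')" if truncate
                  else "datetime(:now, '-4 days')")
    rows = con.execute(
        f"SELECT memberid FROM membertable WHERE admitdate < {cutoff_sql} ORDER BY memberid",
        {"now": NOW},
    ).fetchall()
    con.close()
    return [memberid for (memberid,) in rows]


print(older_than_4_days(False), older_than_4_days(True))  # → [1, 2, 3] [1, 2]
```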
admitdate should be a date. You seem to be suggesting it is a string. You can try:
where to_date(admitdate, 'MM/DD/YYYY') < trunc(sysdate) - 4;
You can then fix the data in the table, so it is stored as a date.
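If admitdate really is MM/DD/YYYY text, comparing it as a string gives results out of calendar order. For example, '12/31/2017' sorts after '06/15/2018'. The sketch below converts before comparing, in SQLite through Python's sqlite3 module. SQLite has no TO_DATE, so substr() rebuilds an ISO string instead, which assumes zero-padded months and days. The helper name older_than is hypothetical.

```python
import sqlite3


def older_than(admit_strings, today, days=4):
    """Return the 'MM/DD/YYYY' strings that fall before `today` minus `days` days.

    Assumes zero-padded text such as '06/09/2018'; `today` is an ISO 'YYYY-MM-DD' string.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE membertable (admitdate TEXT)")
    con.executemany("INSERT INTO membertable VALUES (?)", [(s,) for s in admit_strings])
    rows = con.execute(
        """
        WITH iso AS (
            -- rebuild MM/DD/YYYY as YYYY-MM-DD so text comparison is chronological
            SELECT admitdate,
                   substr(admitdate, 7, 4) || '-' || substr(admitdate, 1, 2)
                       || '-' || substr(admitdate, 4, 2) AS admit_iso
            FROM membertable
        )
        SELECT admitdate
        FROM iso
        WHERE admit_iso < date(:today, :offset)
        ORDER BY admit_iso
        """,
        {"today": today, "offset": f"-{days} days"},
    ).fetchall()
    con.close()
    return [a for (a,) in rows]
```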

Next 5 Available Dates

I wonder if anyone could tell me how to get the next 5 available dates using a table that only stores weekend dates and bank holiday dates. It has to select the next 5 days that do not collide with any date in the table.
I would like to see the following results from this list of dates:
07/11/2015 (Saturday)
08/11/2015 (Sunday)
09/11/2015 (Holiday)
14/11/2015 (Saturday)
15/11/2015 (Sunday)
Results:
05/11/2015 (Thursday)
06/11/2015 (Friday)
10/11/2015 (Tuesday)
11/11/2015 (Wednesday)
12/11/2015 (Thursday)
Based on limited information, here's a quick hack:
with offsets(n) as (
select 1 union all
select 2 union all
select 3 union all
select 4 union all
select 5 union all
select 6 union all
select 7 union all
select 8 union all
select 9 union all
select 10 union all
select 11
)
select top 5 dateadd(dd, n, cast(getdate() as date)) as dt from offsets
where dateadd(dd, n, cast(getdate() as date)) not in (
select dt from <exclude_dates>
)
order by dt
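The offsets approach boils down to three steps: try the next few days in order, drop the ones in the exclusion table, and keep the first five. The plain-Python sketch below shows that logic. It assumes 04/11/2015 as the starting date (inferred from the expected results) and uses a hypothetical next_available helper. Like the 11-row offsets CTE, it returns fewer than five dates if too many of the next `horizon` days are excluded.

```python
from datetime import date, timedelta


def next_available(start, excluded, count=5, horizon=11):
    """Mirror the offsets CTE: try start+1 .. start+horizon days, drop excluded dates, keep `count`."""
    candidates = (start + timedelta(days=n) for n in range(1, horizon + 1))
    return [d for d in candidates if d not in excluded][:count]


EXCLUDED = {date(2015, 11, 7), date(2015, 11, 8), date(2015, 11, 9),
            date(2015, 11, 14), date(2015, 11, 15)}
print([d.isoformat() for d in next_available(date(2015, 11, 4), EXCLUDED)])
# → ['2015-11-05', '2015-11-06', '2015-11-10', '2015-11-11', '2015-11-12']
```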
One possible solution is to create a table of all possible dates in a year.
select top 5 date
from possible_dates
where date not in
(select date from unavailable_dates)
and date > [insert startdate here]
order by date
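Here is a runnable sketch of the calendar-table version, in SQLite through Python's sqlite3 module. possible_dates is filled with every day of the start date's year, and LIMIT plays the role of TOP. The column is called dt rather than date to avoid confusion with SQLite's date() function. The helper name and the one-year calendar are assumptions.

```python
import sqlite3
from datetime import date, timedelta


def next_available_sql(start, unavailable, count=5):
    """Return the next `count` ISO dates after `start` that are not in `unavailable`."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE possible_dates (dt TEXT PRIMARY KEY)")
    con.execute("CREATE TABLE unavailable_dates (dt TEXT PRIMARY KEY)")
    # Calendar table: every day of start's year (assumption: the answer stays within that year)
    first = date(start.year, 1, 1)
    days_in_year = (date(start.year + 1, 1, 1) - first).days
    con.executemany(
        "INSERT INTO possible_dates VALUES (?)",
        [((first + timedelta(days=i)).isoformat(),) for i in range(days_in_year)],
    )
    con.executemany("INSERT INTO unavailable_dates VALUES (?)",
                    [(d.isoformat(),) for d in unavailable])
    rows = con.execute(
        """
        SELECT dt
        FROM possible_dates
        WHERE dt NOT IN (SELECT dt FROM unavailable_dates)
          AND dt > ?
        ORDER BY dt
        LIMIT ?   -- SQLite's counterpart to SELECT TOP n
        """,
        (start.isoformat(), count),
    ).fetchall()
    con.close()
    return [dt for (dt,) in rows]
```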