SQL CTE to Sybase IQ to identify 30 consecutive days

I need to identify the member/provider combinations (I created a key called PRV_MBR_KEY) that have 30 consecutive dates of service with no breaks. I've tested and validated the query on sample data in SQL Server. I need to recode it to work on the Sybase IQ platform.
Sample Data:
PRV_MBR_KEY FDOS_LN
330800913-00369544518 10/10/2016
330800913-00369544518 10/11/2016
330800913-00369544518 10/12/2016
330800913-00369544518 10/13/2016
330800913-00369544518 10/14/2016
330800913-00369544518 10/15/2016
330800913-00369544518 10/16/2016
330800913-00369544518 10/17/2016
330800913-00369544518 10/18/2016
330800913-00369544518 10/19/2016
Here's the SQL code that works:
WITH CTE AS (
SELECT PRV_MBR_KEY,
[FDOS_LN],
RangeId = DATEDIFF(DAY, 0, [FDOS_LN])
- ROW_NUMBER() OVER (PARTITION BY PRV_MBR_KEY ORDER BY [FDOS_LN])
FROM TEST_table)
SELECT PRV_MBR_KEY,
StartDate=MIN([FDOS_LN]),
EndDate=MAX([FDOS_LN]),
DAYCOUNT=DATEDIFF(DAY,MIN([FDOS_LN]),MAX([FDOS_LN]))+1
FROM CTE
GROUP BY PRV_MBR_KEY,RangeID
HAVING DATEDIFF(DAY,MIN([FDOS_LN]),MAX([FDOS_LN]))+1>=30
ORDER BY PRV_MBR_KEY,MIN([FDOS_LN])
Expected Results:
PRV_MBR_KEY StartDate EndDate DAYCOUNT
330800913-00369544518 2016-10-10 2016-11-13 35
330800913-00565274557 2017-01-26 2017-02-24 30
The error I get when running code in Sybase IQ:
[Sybase][ODBC Driver][Sybase IQ]Data exception - argument must be DATE or
DATETIME --(dflib/dfe_datepart.cxx 1450)
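The grouping idea in the query above (a date's day number minus its row number stays constant within a run of consecutive dates) can be checked outside the database. The following is a minimal Python sketch of that logic, not a Sybase IQ port; the function name consecutive_runs and the de-duplication of repeated dates are assumptions added for illustration. Separately, the error message suggests that Sybase IQ rejects the integer 0 passed to DATEDIFF, which SQL Server implicitly treats as a date; that reading is an assumption and has not been verified on IQ.

```python
from datetime import date, timedelta
from itertools import groupby


def consecutive_runs(rows, min_days=30):
    """rows: iterable of (key, date). Returns (key, start, end, day_count) per run."""
    result = []
    # Assumption: duplicate (key, date) rows are collapsed before numbering.
    ordered = sorted(set(rows))
    for key, grp in groupby(ordered, key=lambda r: r[0]):
        dates = [d for _, d in grp]
        # Day ordinal minus position is constant inside a run of consecutive dates,
        # which mirrors DATEDIFF(...) - ROW_NUMBER() in the SQL version.
        islands = groupby(enumerate(dates), key=lambda p: p[1].toordinal() - p[0])
        for _, island in islands:
            run = [d for _, d in island]
            day_count = (run[-1] - run[0]).days + 1
            if day_count >= min_days:
                result.append((key, run[0], run[-1], day_count))
    return result


# The ten sample rows from the question form one run of 10 days.
sample = [("330800913-00369544518", date(2016, 10, 10) + timedelta(days=i))
          for i in range(10)]
print(consecutive_runs(sample, min_days=10))
```

With the default min_days=30 the ten sample rows return an empty list, because the run is too short.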

Related

How to bin timestamp data into buckets of custom width of n hours in vertica

I have a table which contains a column Start_Timestamp which has time stamp values like 2020-06-02 21:08:37. I would like to create new column which classifies these timestamps into bins of 6hours.
Eg.
Input :
Start_Timestamp
2020-06-02 21:08:37
2020-07-19 01:23:40
2021-11-13 12:08:37
Expected Output ( Here each bin is of 6hours width) :
Start_Timestamp Bin
2020-06-02 21:08:37 18H - 24H
2020-07-19 01:23:40 00H - 06H
2021-11-13 12:08:37 12H - 18H
I have tried using TIMESERIES, but can anyone help me generate output in this format?
It's Vertica. Use the TIME_SLICE() function. Then, combine it with the TO_CHAR() function that Vertica shares with Oracle.
You can always add a CASE WHEN expression to change 00:00 to 24:00, but as that is not the standard, I wouldn't even bother.
WITH
indata(start_ts) AS (
SELECT TIMESTAMP '2020-06-02 21:08:37'
UNION ALL SELECT TIMESTAMP '2020-07-19 01:23:40'
UNION ALL SELECT TIMESTAMP '2021-11-13 12:08:37'
)
SELECT
TIME_SLICE(start_ts,6,'HOUR')
AS tm_slice
, TO_CHAR(TIME_SLICE(start_ts,6,'HOUR'),'HH24:MIH - ')
||TO_CHAR(TIME_SLICE(start_ts,6,'HOUR','END'),'HH24:MIH')
AS caption
, start_ts
FROM indata;
-- out tm_slice | caption | start_ts
-- out ---------------------+-----------------+---------------------
-- out 2020-06-02 18:00:00 | 18:00H - 00:00H | 2020-06-02 21:08:37
-- out 2020-07-19 00:00:00 | 00:00H - 06:00H | 2020-07-19 01:23:40
-- out 2021-11-13 12:00:00 | 12:00H - 18:00H | 2021-11-13 12:08:37
You can simply extract the hour and do some arithmetic:
select t.*,
floor(extract(hour from start_timestamp) / 6) * 6 as bin
from t;
Note: This characterizes the bin by the earliest hour. That seems more useful than a string representation, but you can construct a string if you really want.
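For readers who do want the string form, the arithmetic above extends directly: the bin start is floor(hour / width) * width and the bin end is the start plus the width. A minimal Python sketch of that calculation follows; the function name bin_label and the cap at 24 for widths that do not divide 24 are assumptions made for illustration.

```python
from datetime import datetime


def bin_label(ts, width_hours=6):
    """Return a label such as '18H - 24H' for the bin containing ts."""
    start = (ts.hour // width_hours) * width_hours
    # Assumption: the last bin is capped at 24 when the width does not divide 24.
    end = min(start + width_hours, 24)
    return f"{start:02d}H - {end:02d}H"


for text in ("2020-06-02 21:08:37", "2020-07-19 01:23:40", "2021-11-13 12:08:37"):
    ts = datetime.strptime(text, "%Y-%m-%d %H:%M:%S")
    print(text, bin_label(ts))
```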

Left join with nested selects and aggregate functions

Problem
I have one table of generated dates (s) which I want to join with another table (d) which is a list of dates where a specific occurrence has happened.
table s
Wednesday 23rd August 2017
Thursday 24th August 2017
Friday 25th August 2017
Saturday 26th August 2017
table d
day_created -------------------------------- count
Thursday 24th August 2017 ---------------- 45
Saturday 26th August 2017 ---------------- 32
I want to show rows where the occurrence does not take place, which I cannot do if I just have table d.
I want something that looks like:
day_created -------------------------------- count
Wednesday 23rd August 2017 ---------------- 0
Thursday 24th August 2017 ---------------- 45
Friday 25th August 2017 ------------------ 0
Saturday 26th August 2017 ---------------- 32
I've tried joining with a left join as follows:
SELECT day_created, COUNT(d.day_created) as total_per_day
FROM
(SELECT date_trunc('day', task_1.created_at) as day_created
FROM task_1
)
d
LEFT JOIN (
SELECT (generate_series('2017-05-01', current_date, '1 day'::INTERVAL)) as standard_date
)
s
ON d.day_created=s.standard_date
GROUP BY d.day_created
ORDER BY day_created DESC;
I don't get an error; however, the join isn't working (i.e. it doesn't return dates where the count is null). It returns the dates from table d and the count, but not the dates in between where there are 0 occurrences.
I've been going round in circles and have understood that I need to make table s (I think!) the left table, but as a newbie I'm getting confused by the syntax.
This is all in PostgreSQL 9.5.8.
Basically, you had the LEFT JOIN backwards. This should work, with some other simplifications and performance optimizations:
SELECT s.standard_date, COUNT(d.created_at) AS total_per_day
FROM generate_series('2017-05-01', current_date, interval '1 day') s(standard_date)
LEFT JOIN task_1 d ON d.created_at >= s.standard_date
AND d.created_at < s.standard_date + interval '1 day'
GROUP BY 1
ORDER BY 1;
This counts rows in d, like you commented. Does not sum values.
Be aware that generate_series() still returns timestamp with time zone, even if you pass date values to it. You may want to cast to date or format with to_char() for display in the outer SELECT. (But rather group and order by the original timestamp value, not the formatted string.)
There may be corner cases depending on the current time zone setting and on the actual, undisclosed table definition.
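The shape of the result (every generated day on the left, matching rows counted on the right, zero where nothing matches) can be illustrated outside PostgreSQL. The following is a minimal Python sketch under the assumption that the timestamps are naive and already in the desired time zone; daily_counts is a hypothetical helper name.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_counts(created_at, start, end):
    """Count timestamps per calendar day from start to end, keeping empty days."""
    per_day = Counter(ts.date() for ts in created_at)
    n_days = (end - start).days + 1
    # The generated calendar drives the output, like the left side of the join.
    calendar_days = [start + timedelta(days=i) for i in range(n_days)]
    return [(day, per_day[day]) for day in calendar_days]


created = [
    datetime(2017, 8, 24, 9, 0),
    datetime(2017, 8, 24, 23, 59),
    datetime(2017, 8, 26, 0, 0),
]
for day, total in daily_counts(created, date(2017, 8, 23), date(2017, 8, 26)):
    print(day, total)
```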
Related:
How to avoid a subquery in FILTER clause?
"I have one table of generated dates (s)"
In real databases, we don't store a generated series. We just generate them when needed.
"which I want to join with another table (d) which is a list of dates where a specific occurrence has happened. [...] I want to show rows where the occurrence does not take place, which I cannot do if I just have table d."
Nah, you can do it.
CREATE TABLE d(day_created, count) AS VALUES
('24 August 2017'::date, 45),
('26 August 2017'::date, 32);
SELECT day_created, coalesce(count,0)
FROM (
SELECT d::date
FROM generate_series(
'2017-08-01'::timestamp without time zone,
'2017-09-01'::timestamp without time zone,
'1 day'
) AS gs(d)
) AS gs(day_created)
LEFT OUTER JOIN d USING(day_created)
ORDER BY day_created;
day_created | coalesce
-------------+----------
2017-08-01 | 0
2017-08-02 | 0
2017-08-03 | 0
2017-08-04 | 0
2017-08-05 | 0
2017-08-06 | 0
2017-08-07 | 0
2017-08-08 | 0
2017-08-09 | 0
2017-08-10 | 0
2017-08-11 | 0
2017-08-12 | 0
2017-08-13 | 0
2017-08-14 | 0
2017-08-15 | 0
2017-08-16 | 0
2017-08-17 | 0
2017-08-18 | 0
2017-08-19 | 0
2017-08-20 | 0
2017-08-21 | 0
2017-08-22 | 0
2017-08-23 | 0
2017-08-24 | 45
2017-08-25 | 0
2017-08-26 | 32
2017-08-27 | 0
2017-08-28 | 0
2017-08-29 | 0
2017-08-30 | 0
2017-08-31 | 0
2017-09-01 | 0
(32 rows)

SQL Server Query to Convert Quarterly and Semi-Annual Principal and Interest Payments to Monthly Values

We are using SQL Server 2014 as our main reporting database and we have a report that requires some very specific data manipulation. The data we have to work with is a schedule of principal and interest payments that can be presented in various types of series (i.e. quarterly, semi-annually, annually, etc.). To determine monthly income, these principal and interest payments need to be reorganized from their original format to a monthly schedule. Below is an example of the original data format:
Original Cashflow Schedule
Cashflow_Date Principal Interest
------------- --------- --------
2015-12-15 0 1000.00
2016-06-15 0 1000.00
2016-12-15 10000.00 1000.00
Below is the format that is needed:
Desired Cashflow Schedule
Cashflow_Date Principal Interest
------------- --------- --------
2015-12-15 0 166.667
2016-01-15 0 166.667
2016-02-15 0 166.667
2016-03-15 0 166.667
2016-04-15 0 166.667
2016-05-15 0 166.667
2016-06-15 0 166.667
2016-07-15 0 166.667
2016-08-15 0 166.667
2016-09-15 0 166.667
2016-10-15 0 166.667
2016-11-15 0 166.667
2016-12-15 10000.00 1000.00
Basically, months between payments from the original schedule need to be returned along with the original payment dates and the original payments need to be broken out into monthly amounts between the original payment dates (i.e. 1000/6=166.667 from 2015-12-15 to 2016-05-15). The last payment date (in this case 2016-12-15) will stay as is. The principal and interest payments are not guaranteed to be the same throughout the original schedule, so it is important to divide the payments appropriately.
The current process unfortunately uses a cursor with a loop inside it (very bad, I know) to produce the needed result set. Can anyone provide any insight into a set-based query that could produce the same results much faster? Any assistance would be greatly appreciated.
Updated scenario
Per a question posted below, if a one time principal or interest payment is made in the original schedule, those payments are divided up accordingly. For example, if a one time principal payment is made on 2016-06-15, then the monthly income schedule would reflect that in this way:
Edge Case Desired Cashflow Schedule
Cashflow_Date Principal Interest
------------- --------- --------
2015-12-15 0 166.667
2016-01-15 0 166.667
2016-02-15 0 166.667
2016-03-15 0 166.667
2016-04-15 0 166.667
2016-05-15 0 166.667
2016-06-15 208.33 166.667
2016-07-15 208.33 166.667
2016-08-15 208.33 166.667
2016-09-15 208.33 166.667
2016-10-15 208.33 166.667
2016-11-15 208.33 166.667
2016-12-15 10000.00 1000.00
Presumably you want something like this:
-- @Cashflows is a placeholder name for the table variable holding the sample schedule.
DECLARE @Cashflows TABLE (Cashflow_Date DATE, Principal DECIMAL (10,2), Interest DECIMAL (10,2));
INSERT @Cashflows VALUES ('2015-12-15', 0, 1000.0), ('2016-06-15', 1250.0, 1000.0), ('2016-12-15', 10000.0, 1000.0);
SELECT DATEADD(MONTH, n.number, Cashflow_Date) Dates
, MAX(Principal) / ISNULL(DATEDIFF(MONTH, Cashflow_Date, nextDate), 1) Principal
, MAX(Interest) / ISNULL(DATEDIFF(MONTH, Cashflow_Date, nextDate), 1) Interest
FROM (
SELECT t.Cashflow_Date
, t.Principal
, t.Interest
, X.nextDate
FROM @Cashflows t
-- nextDate is the following payment date, or NULL for the last payment.
OUTER APPLY (
SELECT MIN(Cashflow_Date)
FROM @Cashflows
WHERE Cashflow_Date > t.Cashflow_Date) X(nextDate)) t
-- master..spt_values with type 'P' is used here as a ready-made tally of numbers starting at 0.
CROSS JOIN (
SELECT number
FROM master..spt_values
WHERE type='P') n
-- Keep one row per month in the gap; the last payment (NULL nextDate) keeps a single row.
WHERE n.number < ISNULL(DATEDIFF(MONTH, Cashflow_Date, nextDate), 1)
GROUP BY DATEADD(MONTH, n.number, Cashflow_Date), nextDate, Cashflow_Date
ORDER BY DATEADD(MONTH, n.number, Cashflow_Date);
You need some sort of tally table to fill in the dates between the first date and the next date, then you just need to average out the principal and interest based on the number of months between the first date and the next date.
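The same spreading rule can be sketched outside SQL to check expected values: each payment is divided by the number of months until the next payment and repeated once per month, while the last payment stays whole. This is a minimal Python illustration, not a replacement for the set-based query; to_monthly, add_months and months_between are hypothetical helper names, and months_between counts month boundaries the way DATEDIFF(MONTH, ...) does.

```python
import calendar
from datetime import date


def add_months(d, n):
    """Shift d by n months, clamping the day to the end of the target month."""
    year, month0 = divmod(d.year * 12 + (d.month - 1) + n, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def months_between(a, b):
    """Number of month boundaries crossed from a to b."""
    return (b.year - a.year) * 12 + (b.month - a.month)


def to_monthly(schedule):
    """schedule: list of (cashflow_date, principal, interest) tuples."""
    rows = sorted(schedule)
    out = []
    for i, (d, principal, interest) in enumerate(rows):
        # The last payment has no following date and is kept as a single row.
        n = months_between(d, rows[i + 1][0]) if i + 1 < len(rows) else 1
        for k in range(n):
            out.append((add_months(d, k), principal / n, interest / n))
    return out


schedule = [
    (date(2015, 12, 15), 0.0, 1000.0),
    (date(2016, 6, 15), 1250.0, 1000.0),
    (date(2016, 12, 15), 10000.0, 1000.0),
]
for row in to_monthly(schedule):
    print(row[0], round(row[1], 2), round(row[2], 3))
```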
See if this will help you get started towards getting rid of cursors
(Not tested for all scenarios)
/*
one time setup
CREATE TABLE [dbo].[Tbl]
(
[cf] [date] NOT NULL,
[pmt] [int] NOT NULL,
[intrest] [int] NOT NULL
)
GO
insert into dbo.Tbl values ( '2015-12-15', 0 , 1000)
insert into dbo.Tbl values ( '2016-06-15', 0 , 1000)
insert into dbo.Tbl values ( '2016-12-15', 1000 , 1000)
*/
select Top 13 b.* , intrest/6.0 as int_mnthly, DATEADD(Month,Mnth, cf) as cf_mnthly
from dbo.Tbl b
cross join
(
select 1 as Mnth Union ALL
select 2 as Mnth Union ALL
select 3 as Mnth Union ALL
select 4 as Mnth Union ALL
select 5 as Mnth Union ALL
select 6 as Mnth
) a
order by cf_mnthly

Count days based on couple of date fields in SQL Server

Here is a sample of what I have in my table (SQL Server):
patientID DateCreated StartOn EndOn
---------------------------------------------------
1234 2015-09-16 2015-09-01 2015-09-30
2345 2015-09-16 2015-09-01 2015-09-30
2346 2015-09-16 2015-09-01 2015-09-30
Currently, it counts the "days" to be 30. So it is really looking at days elapsed between StartOn and EndOn. I want to be able to do this counting based on StartOn and DateCreated. So, in my example the "days" should be 16, that is days elapsed from StartOn to DateCreated.
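You can use DATEDIFF(DAY, StartOn, DateCreated). For the sample rows this returns 15, so add 1 if both the start and end day should be counted, which gives the 16 you expect.
So you can go with:
Select (DATEDIFF(DAY, StartOn, DateCreated) + 1) As "Days"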
from Tablename
where patientID = 1234;
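The off-by-one is easy to confirm outside the database. A minimal Python sketch using the sample row's dates follows; inclusive_days is a hypothetical helper name.

```python
from datetime import date


def inclusive_days(start_on, date_created):
    """Days from start_on to date_created, counting both endpoints."""
    return (date_created - start_on).days + 1


# Sample row: StartOn 2015-09-01, DateCreated 2015-09-16.
print(inclusive_days(date(2015, 9, 1), date(2015, 9, 16)))  # prints 16
```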

GROUP BY several hours

I have a table where our product records its activity log. The product starts working at 23:00 every day and usually works for one or two hours. This means that a batch started at 23:00 finishes at about 1:00am the next day.
Now, I need to take statistics on how many posts are registered per batch, but I cannot figure out a script that would allow me to achieve this. So far I have the following SQL code:
SELECT COUNT(*), DATEPART(DAY,registrationtime),DATEPART(HOUR,registrationtime)
FROM RegistrationMessageLogEntry
WHERE registrationtime > '2014-09-01 20:00'
GROUP BY DATEPART(DAY, registrationtime), DATEPART(HOUR,registrationtime)
ORDER BY DATEPART(DAY, registrationtime), DATEPART(HOUR,registrationtime)
which results in the following:
count day hour
....
1189 9 23
8611 10 0
2754 10 23
6462 11 0
1885 11 23
I.e. I want the number for 9th 23:00 grouped with the number for 10th 00:00, 10th 23:00 with 11th 00:00 and so on. How could I do it?
You can do it very easily. Use DATEADD to add an hour to the original registrationtime. All the registration times of one batch are then moved to the same day, and you can simply group by the day part.
You could also do it in a more complicated way using CASE WHEN, but that is overkill given this simple solution.
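To see why the one-hour shift works, here is a minimal Python sketch of the same idea: every timestamp is moved forward by an hour, so 23:xx lands on the next calendar day together with the 00:xx entries of that batch. The name count_per_batch is hypothetical, and the sketch assumes, as in the question, that a batch never runs past 23:00 of the following day.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def count_per_batch(timestamps, shift=timedelta(hours=1)):
    """Count entries per batch, keyed by the date of the shifted timestamp."""
    return Counter((ts + shift).date() for ts in timestamps)


log = [
    datetime(2014, 9, 9, 23, 10),
    datetime(2014, 9, 10, 0, 30),
    datetime(2014, 9, 10, 23, 5),
    datetime(2014, 9, 11, 0, 59),
]
for batch_day, total in sorted(count_per_batch(log).items()):
    print(batch_day, total)
```

The batch key here is the day on which the batch ends; subtract one day from it if the start day is preferred.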
I had to do something similar a few days ago. I had fixed timespans for work shifts to group by where one of them could start on one day at 10pm and end the next morning at 6am.
What I did was:
Define a "shift date" for every entry in the table: the day, with a zero time part, on which the shift started. I did this by checking whether the timestamp of the entry was between 0am and 6am. In that case I took only the date part of DATEADD(dd, -1, entryDate), which returns the previous day for all entries between 0am and 6am.
I also added an ID for the shift: 0 for the first one (6am to 2pm), 1 for the second one (2pm to 10pm) and 2 for the last one (10pm to 6am).
I was then able to group over the shift date and shift IDs.
Example:
Consider the following source entries:
Timestamp SomeData
=============================
2014-09-01 06:01:00 5
2014-09-01 14:01:00 6
2014-09-02 02:00:00 7
Step one extended the table as follows:
Timestamp SomeData ShiftDay
====================================================
2014-09-01 06:01:00 5 2014-09-01 00:00:00
2014-09-01 14:01:00 6 2014-09-01 00:00:00
2014-09-02 02:00:00 7 2014-09-01 00:00:00
Step two extended the table as follows:
Timestamp SomeData ShiftDay ShiftID
==============================================================
2014-09-01 06:01:00 5 2014-09-01 00:00:00 0
2014-09-01 14:01:00 6 2014-09-01 00:00:00 1
2014-09-02 02:00:00 7 2014-09-01 00:00:00 2
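The two steps can be combined into one small function. The following Python sketch reproduces the ShiftDay and ShiftID columns for the sample entries; shift_key is a hypothetical name, and the boundaries (6am, 2pm, 10pm) are taken from the description above.

```python
from datetime import date, datetime, timedelta


def shift_key(ts):
    """Return (shift_date, shift_id) for a timestamp."""
    if ts.hour < 6:
        # Night shift that started on the previous day at 10pm.
        return (ts.date() - timedelta(days=1), 2)
    if ts.hour < 14:
        return (ts.date(), 0)
    if ts.hour < 22:
        return (ts.date(), 1)
    # 10pm or later: night shift starting today.
    return (ts.date(), 2)


entries = [
    datetime(2014, 9, 1, 6, 1),
    datetime(2014, 9, 1, 14, 1),
    datetime(2014, 9, 2, 2, 0),
]
for ts in entries:
    print(ts, shift_key(ts))
```

Grouping by the returned pair then corresponds to grouping over the shift date and shift ID.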
If you add one hour to registrationtime, you will be able to group by the date part:
GROUP BY
CAST(DATEADD(HOUR, 1, registrationtime) AS date)
If the starting hour must be reflected accurately in the output (as 9, 23, 10, 23 rather than as 10, 0, 11, 0), you could obtain it as MIN(registrationtime) in the SELECT clause:
SELECT
count = COUNT(*),
day = DATEPART(DAY, MIN(registrationtime)),
hour = DATEPART(HOUR, MIN(registrationtime))
Finally, in case you are not aware, you can reference columns by their aliases in ORDER BY:
ORDER BY
day,
hour
just so that you do not have to repeat the expressions.
The query below will give you what you are expecting.
;WITH CTE AS
(
SELECT COUNT(*) Count, DATEPART(DAY,registrationtime) Day,DATEPART(HOUR,registrationtime) Hour,
RANK() over (partition by DATEPART(HOUR,registrationtime) order by DATEPART(DAY,registrationtime),DATEPART(HOUR,registrationtime)) Batch_ID
FROM RegistrationMessageLogEntry
WHERE registrationtime > '2014-09-01 20:00'
GROUP BY DATEPART(DAY, registrationtime), DATEPART(HOUR,registrationtime)
)
SELECT SUM(COUNT) Count,Batch_ID
FROM CTE
GROUP BY Batch_ID
ORDER BY Batch_ID
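You can write CASE expressions as below:
-- Note: DATEPART(DAY, ...) + 1 does not roll over at the end of a month.
CASE WHEN DATEPART(HOUR,registrationtime) = 23
THEN DATEPART(DAY,registrationtime)+1
ELSE DATEPART(DAY,registrationtime)
END,
CASE WHEN DATEPART(HOUR,registrationtime) = 23
THEN 0
ELSE DATEPART(HOUR,registrationtime)
END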