Split column and values into multiple rows in Postgres - sql

Suppose I have a table like this:
subject | flag | first_date | last_date
----------------+----------------------------------
this is a test | 2 | 1/1/2016 | 1/4/2016
and I want to turn it into something like this:
subject | flag | date
----------------+------------------
this is a test | .5 | 1/1/2016
this is a test | .5 | 1/2/2016
this is a test | .5 | 1/3/2016
this is a test | .5 | 1/4/2016
Is there an easy way to do this?

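You can use generate_series() to produce the list of consecutive days between first_date and last_date. Joining the series to each row directly (instead of cross joining the whole table a second time) keeps the result correct when the table holds more than one row. If flag is an integer column, cast it (flag::numeric) to avoid integer division:
select subject,
flag / (last_date - first_date + 1) as flag,
d::date as date
from test
cross join generate_series(first_date, last_date, '1d'::interval) d;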
subject | flag | date
----------------+------------------------+------------
this is a test | 0.50000000000000000000 | 2016-01-01
this is a test | 0.50000000000000000000 | 2016-01-02
this is a test | 0.50000000000000000000 | 2016-01-03
this is a test | 0.50000000000000000000 | 2016-01-04
(4 rows)
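The arithmetic behind that query can be checked outside the database. The following is a minimal Python sketch, not part of the original answer; the function name split_over_days is hypothetical, and it assumes both dates are inclusive, as in the SQL above.

```python
from datetime import date, timedelta


def split_over_days(subject, flag, first_date, last_date):
    """Return one (subject, share, day) row per day, both ends inclusive."""
    days = (last_date - first_date).days + 1
    share = flag / days  # true division, so 2 over 4 days gives 0.5
    return [(subject, share, first_date + timedelta(days=i)) for i in range(days)]


rows = split_over_days("this is a test", 2, date(2016, 1, 1), date(2016, 1, 4))
for row in rows:
    print(row)
```

The shares always add back up to the original flag value, which is a quick way to confirm that no day was dropped or counted twice.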

Related

How to "expand" a SQL join such that each unique value in column A "get" all the unique values for B?

I have a dataset with two columns: id and date. The dates are monthly and span from Mar-21 to Aug-21. I am sure this question could be applied to non-date values, but I think dates are more intuitive for this example.
id | date |
----+--------+--
a | Mar-21 |
a | Apr-21 |
a | Aug-21 | <---- 'a' is missing May-21, Jun-21 and Jul-21
b | Mar-21 |
b | May-21 | <---- 'b' is missing Apr-21
b | Jun-21 |
b | Jul-21 |
b | Aug-21 |
And I want this
id | date |
----+--------+--
a | Mar-21 |
a | Apr-21 |
a | May-21 |
a | Jun-21 | <---- 'a' gets Jun-21
a | Jul-21 | <---- ...and now Jul-21
a | Aug-21 |
b | Mar-21 |
b | Apr-21 | <---- 'b' gets Apr-21
b | May-21 |
b | Jun-21 |
b | Jul-21 |
b | Aug-21 |
Basically, I want every single id to get all unique values of date.
Consider the approach below:
select id, format_date('%b-%y', dt) date
from unnest(generate_date_array('2021-03-01', '2021-08-01', interval 1 month)) dt,
(select distinct id from your_table)
-- order by id, dt
Applied to the sample data in your question, this returns every month from Mar-21 to Aug-21 for each id.
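The same idea (every distinct id paired with every month in a fixed range) can be sketched in Python. This is only an illustration under the assumption that the range Mar-21 to Aug-21 is known up front, as in the query above; month_range is a hypothetical helper name.

```python
from datetime import date
from itertools import product


def month_range(start, end):
    """Return the first day of every month from start to end, inclusive."""
    months = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        months.append(date(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months


# Sample rows from the question: (id, month)
rows = [
    ("a", date(2021, 3, 1)), ("a", date(2021, 4, 1)), ("a", date(2021, 8, 1)),
    ("b", date(2021, 3, 1)), ("b", date(2021, 5, 1)), ("b", date(2021, 6, 1)),
    ("b", date(2021, 7, 1)), ("b", date(2021, 8, 1)),
]

ids = sorted({row_id for row_id, _ in rows})
months = month_range(date(2021, 3, 1), date(2021, 8, 1))

# Cross product: each id gets each month, whether or not it was present before
expanded = list(product(ids, months))
for row_id, month in expanded:
    print(row_id, month.strftime("%b-%y"))
```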

Insert a row for each month in the range [duplicate]

I want to turn this table in Oracle
+----+------------+------------+
| N | Start | End |
+----+------------+------------+
| 1 | 2018-01-01 | 2018-05-31 |
| 1 | 2018-01-01 | 2018-06-31 |
+----+------------+------------+
into the following. As silly as it looks, I need to insert one row for each month in the range, for each row in the first table:
+----+------------+
| N | Month |
+----+------------+
| 1 | 2018-01-01 |
| 1 | 2018-01-01 |
| 1 | 2018-02-01 |
| 1 | 2018-02-01 |
| 1 | 2018-03-01 |
| 1 | 2018-03-01 |
| 1 | 2018-04-01 |
| 1 | 2018-04-01 |
| 1 | 2018-05-01 |
| 1 | 2018-05-01 |
| 1 | 2018-06-01 |
+----+------------+
I have been trying to follow SQL: Generate Record Per Month In Date Range, but I haven't had any luck getting the result I want.
Thanks for helping
My best guess is that you want to show every beginning of month that falls within the interval from start to end in your table.
create table t1 as
select date'2018-01-01' start_d, date'2018-05-31' end_d from dual union all
select date'2018-01-01' start_d, date'2018-06-30' end_d from dual;
with cal as
(select add_months(date'2018-01-01', rownum-1) month_d
from dual connect by level <= 12)
select cal.month_d from cal
join t1 on cal.month_d between t1.start_d and t1.end_d
order by 1;
MONTH_D
-------------------
01.01.2018 00:00:00
01.01.2018 00:00:00
01.02.2018 00:00:00
01.02.2018 00:00:00
01.03.2018 00:00:00
01.03.2018 00:00:00
01.04.2018 00:00:00
01.04.2018 00:00:00
01.05.2018 00:00:00
01.05.2018 00:00:00
01.06.2018 00:00:00
So there is probably a cut-and-paste error in your expected output for January.
Some other points:
Do not use reserved words such as START as column names.
Use the DATE data type to store dates, to avoid invalid entries such as 2018-06-31.
You can use a recursive CTE. For example:
with
n (s, e, cur) as (
select s, e, s from t
union all
select s, e, add_months(cur, 1)
from n
where add_months(cur, 1) < e
)
select cur from n;
Result:
CUR
---------
01-JAN-18
01-JAN-18
01-FEB-18
01-FEB-18
01-MAR-18
01-MAR-18
01-APR-18
01-APR-18
01-MAY-18
01-MAY-18
01-JUN-18
See running example at db<>fiddle.
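Both answers boil down to listing the first-of-month dates that fall inside each row's range. A minimal Python sketch of that logic follows; it is an illustration only, month_starts is a hypothetical name, and the second range uses 2018-06-30 because 2018-06-31 is not a valid date.

```python
from datetime import date


def month_starts(start, end):
    """Return every first-of-month date d with start <= d <= end."""
    result = []
    year, month = start.year, start.month
    while True:
        first = date(year, month, 1)
        if first > end:
            break
        if first >= start:  # skip the starting month if start is mid-month
            result.append(first)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return result


ranges = [
    (date(2018, 1, 1), date(2018, 5, 31)),
    (date(2018, 1, 1), date(2018, 6, 30)),
]

# One output row per (input row, month), sorted like the ORDER BY 1 above
months = sorted(m for start, end in ranges for m in month_starts(start, end))
for m in months:
    print(m)
```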

Adding indicator column to table based on having two consecutive days within group

I need to add a logic that helps me to flag the first of two consecutive days as 1 and the second day as 0 grouped by a column (test). If a test (a) has three consecutive days then the third should start with 1 again etc.
Example table would be like following with new col being the column I need.
|---------------------|------------------|---------------------|
| test | test_date | new col |
|---------------------|------------------|---------------------|
| a | 1/1/2020 | 1 |
|---------------------|------------------|---------------------|
| a | 1/2/2020 | 0 |
|---------------------|------------------|---------------------|
| a | 1/3/2020 | 1 |
|---------------------|------------------|---------------------|
| b | 1/1/2020 | 1 |
|---------------------|------------------|---------------------|
| b | 1/2/2020 | 0 |
|---------------------|------------------|---------------------|
| b | 1/15/2020 | 1 |
|---------------------|------------------|---------------------|
This seems to be a gaps-and-islands problem, and I assume a window function approach should get me there.
I tried something like following to get the consecutive part but struggle with the indicator column.
Select
test,
test_date,
grp_var = dateadd(day,
-row_number() over (partition by test order by test_date), test_date)
from
my_table
This does read as a gaps-and-islands problem. I would recommend using the difference between row_number() and the date to generate the groups, and then arithmetic:
select
test,
test_date,
row_number() over(
partition by test, dateadd(day, -rn, test_date)
order by test_date
) % 2 new_col
from (
select
t.*,
row_number() over(partition by test order by test_date) rn
from mytable t
) t
Demo on DB Fiddle:
test | test_date | new_col
:--- | :--------- | ------:
a | 2020-01-01 | 1
a | 2020-01-02 | 0
a | 2020-01-03 | 1
b | 2020-01-01 | 1
b | 2020-01-02 | 0
b | 2020-01-15 | 1
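To see why subtracting the row number from the date identifies each run of consecutive days, here is a small Python sketch of the same technique. It is an illustration, not the answer's code; flag_consecutive is a hypothetical name, and it assumes each test has at most one row per date.

```python
from datetime import date, timedelta


def flag_consecutive(rows):
    """rows: iterable of (test, test_date). Returns (test, test_date, new_col)."""
    by_test = {}
    for test, day in sorted(rows):
        by_test.setdefault(test, []).append(day)

    out = []
    for test, days in by_test.items():
        position_in_island = {}
        for rn, day in enumerate(days, start=1):
            # date minus row number is constant within a run of consecutive days
            island = day - timedelta(days=rn)
            position_in_island[island] = position_in_island.get(island, 0) + 1
            # 1st, 3rd, 5th... day of a run gets 1; 2nd, 4th... gets 0
            out.append((test, day, position_in_island[island] % 2))
    return out


rows = [
    ("a", date(2020, 1, 1)), ("a", date(2020, 1, 2)), ("a", date(2020, 1, 3)),
    ("b", date(2020, 1, 1)), ("b", date(2020, 1, 2)), ("b", date(2020, 1, 15)),
]
flagged = flag_consecutive(rows)
for row in flagged:
    print(row)
```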

SQL: how to check for neither overlapping nor holes in payment records

I have a table PaymentSchedules with percentage info, and the from/to dates for which those
percentages are valid, resource by resource:
| auto_numbered | res_id | date_start | date_end | org | pct |
|---------------+--------+------------+------------+-------+-----|
| 1 | A | 2018-01-01 | 2019-06-30 | One | 100 |
| 2 | A | 2019-07-01 | (NULL) | One | 60 |
| 3 | A | 2019-07-02 | 2019-12-31 | Two | 40 |
| 4 | A | 2020-01-01 | (NULL) | Two | 40 |
| 5 | B | (NULL) | (NULL) | Three | 100 |
| 6 | C | 2018-01-01 | (NULL) | One | 100 |
| 7 | C | 2019-11-01 | (NULL) | Four | 100 |
(Records #3 and #4 could be summarized onto just one line, but duplicated on purpose, to show that there are many combinations of date_start and date_end.)
A quick reading of the data:
Org "One" is fully paying for resource A up to 2019-06-30; then, it continues
to pay 60% of the cost, but the rest (40%) is being paid by org "Two" since
2019-07-02.
This should begin on 2019-07-01... small encoding error… provoking a 1-day gap.
Org "Three" is fully paying for resource B, at all times.
Org "One" is fully paying for resource C from 2018-01-01... but, starting on
2019-11-01, org "Four" is paying for it...
... and, there, there is an encoding error: we do have 200% of resource C being
taken into account since 2019-11-01: the record #6 should have been closed
(date_end set to 2019-10-31), but hasn't...
So, when we generate a financial report for the year 2019 (from 2019-01-01 to
2019-12-31), we will have calculation errors...
So, question: how can we make sure we don't have overlapping payments for
resources, or -- on the contrary -- "holes" for some periods of time?
How is it possible to write an SQL query to check that there are neither
underpaid nor overpaid resources? That is, every resource in the table should be
paid, for every single day of the financial period being looked at, by
one or more organizations, in such a way that the summed-up percentage is always
equal to 100%.
I don't see how to proceed with such a query. Anybody able to give hints, to put
me on track?
EDIT -- Working with both SQL Server and Oracle.
EDIT -- I don't own the DB, so I can't add triggers or views. I need to be able to detect things after the fact: easily spot the conflicting records, or the "missing" ones (in case of "period holes"), fix them by hand, and then re-run the financial report.
EDIT -- If we make an analysis for 2019, the following report would be desired:
| res_id | pct_sum | date |
|--------+---------+------------|
| A | 60 | 2019-07-01 |
| C | 200 | 2019-11-01 |
| C | 200 | 2019-11-02 |
| C | 200 | ... |
| C | 200 | ... |
| C | 200 | ... |
| C | 200 | 2019-12-30 |
| C | 200 | 2019-12-31 |
or, of course, an even better version -- certainly unobtainable? -- where each
type of problem would be present only once, with the relevant date range for
which the problem is observed:
| res_id | pct_sum | date_start | date_end |
|--------+---------+------------+------------|
| A | 60 | 2019-07-01 | 2019-07-01 |
| C | 200 | 2019-11-01 | 2019-12-31 |
EDIT -- Fiddle code: db<>fiddle here
Here's an incomplete attempt for Sql Server.
Basically, the idea was to use a recursive CTE to unfold months for each res_id.
Then left join 'what could be' to the existing date ranges.
But I doubt it can be done in a sql that would work both for Oracle & MS Sql Server.
Sure, both have window functions and CTE's.
But the datetime functions are rarely the same across different RDBMSs.
So I give up.
Maybe someone else finds an easier solution.
create table PaymentSchedules
(
auto_numbered int identity(1,1) primary key,
res_id varchar(30),
date_start date,
date_end date,
org varchar(30),
pct decimal(3,0)
)
GO
insert into PaymentSchedules
(res_id, org, pct, date_start, date_end)
values
('A', 'One', 100, '2018-01-01', '2018-06-30')
, ('A', 'One', 100, '2019-01-01', '2019-06-30')
, ('A', 'One', 60, '2019-07-01', null)
, ('A', 'Two', 40, '2019-07-02', '2019-12-31')
, ('A', 'Two', 40, '2020-01-01', null)
, ('B', 'Three', 100, null, null)
, ('C', 'One', 100, '2018-01-01', null)
, ('C', 'Four', 100, '2019-11-01', null)
;
GO
8 rows affected
declare @MaxEndDate date;
set @MaxEndDate = (select max(iif(date_start > date_end, date_start, isnull(date_end, date_start))) from PaymentSchedules);
;with rcte as
(
select res_id
, datefromparts(year(min(date_start)), month(min(date_start)), 1) as month_start
, eomonth(coalesce(max(date_end), @MaxEndDate)) as month_end
, 0 as lvl
from PaymentSchedules
group by res_id
having min(date_start) is not null
union all
select res_id
, dateadd(month, 1, month_start)
, month_end
, lvl + 1
from rcte
where dateadd(month, 1, month_start) < month_end
)
, cte_gai as
(
select c.res_id, c.month_start, c.month_end
, t.org, t.pct, t.auto_numbered
, sum(isnull(t.pct,0)) over (partition by c.res_id, c.month_start) as res_month_pct
, count(t.auto_numbered) over (partition by c.res_id, c.month_start) as cnt
from rcte c
left join PaymentSchedules t
on t.res_id = c.res_id
and c.month_start >= datefromparts(year(t.date_start), month(t.date_start), 1)
and c.month_start <= coalesce(t.date_end, @MaxEndDate)
)
select *
from cte_gai
where res_month_pct <> 100
order by res_id, month_start
GO
res_id | month_start | month_end | org | pct | auto_numbered | res_month_pct | cnt
:----- | :---------- | :--------- | :--- | :--- | ------------: | :------------ | --:
A | 2018-07-01 | 2019-12-31 | null | null | null | 0 | 0
A | 2018-08-01 | 2019-12-31 | null | null | null | 0 | 0
A | 2018-09-01 | 2019-12-31 | null | null | null | 0 | 0
A | 2018-10-01 | 2019-12-31 | null | null | null | 0 | 0
A | 2018-11-01 | 2019-12-31 | null | null | null | 0 | 0
A | 2018-12-01 | 2019-12-31 | null | null | null | 0 | 0
C | 2019-11-01 | 2020-01-31 | One | 100 | 7 | 200 | 2
C | 2019-11-01 | 2020-01-31 | Four | 100 | 8 | 200 | 2
C | 2019-12-01 | 2020-01-31 | One | 100 | 7 | 200 | 2
C | 2019-12-01 | 2020-01-31 | Four | 100 | 8 | 200 | 2
C | 2020-01-01 | 2020-01-31 | One | 100 | 7 | 200 | 2
C | 2020-01-01 | 2020-01-31 | Four | 100 | 8 | 200 | 2
db<>fiddle here
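For comparison, the day-level check described in the question (sum the percentages per resource per day, then collapse consecutive problem days into ranges) can be sketched in plain Python. This is an illustration of the logic only, not a database solution; find_problems is a hypothetical name, the data is the seven-row table from the question, and NULL bounds are assumed to mean "open-ended".

```python
from datetime import date, timedelta

# (res_id, date_start, date_end, pct); None stands for NULL, i.e. an open bound
schedules = [
    ("A", date(2018, 1, 1), date(2019, 6, 30), 100),
    ("A", date(2019, 7, 1), None, 60),
    ("A", date(2019, 7, 2), date(2019, 12, 31), 40),
    ("A", date(2020, 1, 1), None, 40),
    ("B", None, None, 100),
    ("C", date(2018, 1, 1), None, 100),
    ("C", date(2019, 11, 1), None, 100),
]


def find_problems(schedules, period_start, period_end):
    """Return (res_id, pct_sum, range_start, range_end) wherever pct_sum != 100."""
    problems = []
    one_day = timedelta(days=1)
    for res_id in sorted({s[0] for s in schedules}):
        current = None  # the problem range currently being extended
        day = period_start
        while day <= period_end:
            total = sum(
                pct for rid, start, end, pct in schedules
                if rid == res_id
                and (start is None or start <= day)
                and (end is None or day <= end)
            )
            if total != 100:
                if current and current[1] == total and current[3] == day - one_day:
                    current[3] = day  # same problem as yesterday: extend the range
                else:
                    current = [res_id, total, day, day]
                    problems.append(current)
            day += one_day
    return [tuple(p) for p in problems]


report = find_problems(schedules, date(2019, 1, 1), date(2019, 12, 31))
for line in report:
    print(line)
```

On the question's data for 2019 this yields one range for resource A (60% on 2019-07-01) and one for resource C (200% from 2019-11-01 to 2019-12-31), which matches the condensed report the question asks for.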
I am not giving the full answer here, but I think that you are after cursors (https://learn.microsoft.com/en-us/sql/t-sql/language-elements/declare-cursor-transact-sql?view=sql-server-ver15).
A cursor lets you iterate over the rows of a result set and check each record in turn.
The idea works, but cursors are generally considered bad practice: they are heavy, they are slow, and they can block the tables involved.
Some people rewrite cursors as loops (probably WHILE loops), so you need to understand how a cursor works, work out how you would implement the check with one, and then translate it into a loop (https://www.sqlbook.com/advanced/sql-cursors-how-to-avoid-them/).
Views can also be helpful, but I am assuming that you already know how to use them.
The algorithm should be something like:
1. Have table1 and table2 (table2 is a copy of table1, https://www.tutorialrepublic.com/sql-tutorial/sql-cloning-tables.php).
2. Iterate through all of the records of table1 (I would use a cursor for this in the first instance), picking up one record at a time.
3. If its dates overlap another record (check it against table2), do something; otherwise do something else.
4. Pick another record from table1 and go to step 2.
5. Drop the unnecessary tables.
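The overlap test in the steps above is the part that is easy to get wrong. A minimal Python sketch of that predicate follows, assuming inclusive date ranges and treating a missing bound (NULL in the table) as open-ended; the function name overlaps is hypothetical.

```python
from datetime import date


def overlaps(start_a, end_a, start_b, end_b):
    """True if two inclusive date ranges share at least one day.

    None stands for a NULL bound and is treated as unbounded on that side.
    """
    lo_a = start_a or date.min
    hi_a = end_a or date.max
    lo_b = start_b or date.min
    hi_b = end_b or date.max
    return lo_a <= hi_b and lo_b <= hi_a


# Records #6 and #7 for resource C from the question: both open-ended
print(overlaps(date(2018, 1, 1), None, date(2019, 11, 1), None))
# Records #1 and #2 for resource A: back to back, no shared day
print(overlaps(date(2018, 1, 1), date(2019, 6, 30), date(2019, 7, 1), None))
```

Note that overlapping ranges are not an error by themselves here, since two organizations may legitimately share a resource; the percentages still have to be summed for the overlapping days.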

Repeating ID based on

I have a very simple requirement but I'm struggling to find a way around this.
I have a very simple query:
SELECT
ServiceCode,
StartDate,
Available,
Nights,
BookingID
FROM #tmpAvailability A
LEFT JOIN vwRSBooking B
ON B.Depart = A.StartDate
AND B.ServiceCode = A.SupplierCode
AND B.StatusID IN (2640, 2621)
ORDER BY StartDate;
Made up of 2 tables
#tmpAvailability which consists of the following fields:
SupplierCode
StartDate
Available
vwRSBooking which consists of the following fields
BookingID
DepartDate
Code
Nights
StatusID
Departure and startdate can be joined to link the first day, and the servicecode and suppliercode can be joined to make sure that the availability is linked to the same supplier.
Which produces an output like this:
Code | Dates | Available | Nights | BookingID
TEST | 2018-01-04 | 1 | NULL | NULL
TEST | 2018-01-05 | 1 | NULL | NULL
TEST | 2018-01-06 | 0 | 4 | 123456
TEST | 2018-01-07 | 0 | NULL | NULL
TEST | 2018-01-08 | 0 | NULL | NULL
TEST | 2018-01-09 | 0 | NULL | NULL
TEST | 2018-01-10 | 1 | NULL | NULL
TEST | 2018-01-11 | 1 | NULL | NULL
TEST | 2018-01-12 | 1 | NULL | NULL
TEST | 2018-01-13 | 0 | 3 | 234567
TEST | 2018-01-14 | 0 | NULL | NULL
TEST | 2018-01-15 | 0 | NULL | NULL
What I need is that, when a booking is for 4 nights, the BookingID and the Nights are spread across those days, for example:
Code | Dates | Available | Nights | BookingID
TEST | 2018-01-04 | 1 | NULL | NULL
TEST | 2018-01-05 | 1 | NULL | NULL
TEST | 2018-01-06 | 0 | 4 | 123456
TEST | 2018-01-07 | 0 | 4 | 123456
TEST | 2018-01-08 | 0 | 4 | 123456
TEST | 2018-01-09 | 0 | 4 | 123456
TEST | 2018-01-10 | 1 | NULL | NULL
TEST | 2018-01-11 | 1 | NULL | NULL
TEST | 2018-01-12 | 1 | NULL | NULL
TEST | 2018-01-13 | 0 | 3 | 234567
TEST | 2018-01-14 | 0 | 3 | 234567
TEST | 2018-01-15 | 0 | 3 | 234567
TEST | 2018-01-16 | 1 | NULL | NULL
If anyone has any ideas on how to solve it would be most appreciated.
Andrew
You could replace your vwRSBooking with another view which uses a CTE to obtain all the dates the booking covers. Then use the view's coverdate for joining to the #tmpAvailability table:
CREATE VIEW vwRSBookingFull
AS
WITH cte ( bookingid, nights, depart, code, coverdate)
AS (SELECT bookingid,
nights,
depart,
code,
depart
FROM vwRSBooking
UNION ALL
SELECT c.bookingid,
c.nights,
c.depart,
c.code,
DATEADD(d, 1, c.coverdate)
FROM cte c
WHERE DATEDIFF(d, c.depart, c.coverdate) < (c.nights - 1))
SELECT c.bookingid,
c.nights,
c.depart,
c.code,
c.coverdate
FROM cte c
GO
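What the recursive CTE computes is simply one row per night, starting at the departure date. A small Python sketch of that expansion follows, for illustration only; cover_dates is a hypothetical name and the two bookings are taken from the sample output in the question.

```python
from datetime import date, timedelta


def cover_dates(depart, nights):
    """Return the dates a booking covers: depart, depart + 1, ... for `nights` days."""
    return [depart + timedelta(days=i) for i in range(nights)]


# (bookingid, depart, nights)
bookings = [
    (123456, date(2018, 1, 6), 4),
    (234567, date(2018, 1, 13), 3),
]

# One (coverdate, bookingid, nights) row per covered day, like the view's output
expanded = [
    (cover, booking_id, nights)
    for booking_id, depart, nights in bookings
    for cover in cover_dates(depart, nights)
]
for row in expanded:
    print(row)
```

Joining the availability rows on coverdate instead of the departure date then fills BookingID and Nights on every day of the stay.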
You will need a calendar table with all the dates in the date range your dates may fall into. For this example, I build one for January 2018. We can then join onto this table to create the additional rows.
Here is the sample code I used. You can see it at SQL Fiddle.
CREATE TABLE code (
code varchar(max),
dates date,
available int,
nights int,
bookingid int
)
INSERT INTO code VALUES
('TEST','2018-01-04','1',NULL,NULL),
('TEST','2018-01-05','1',NULL,NULL),
('TEST','2018-01-06','0',4,123456),
('TEST','2018-01-07','0',NULL,NULL),
('TEST','2018-01-08','0',NULL,NULL),
('TEST','2018-01-09','0',NULL,NULL),
('TEST','2018-01-10','1',NULL,NULL),
('TEST','2018-01-11','1',NULL,NULL),
('TEST','2018-01-12','1',NULL,NULL),
('TEST','2018-01-13','0',3,234567),
('TEST','2018-01-14','0',NULL,NULL),
('TEST','2018-01-15','0',NULL,NULL)
CREATE TABLE dates (
dates date
)
INSERT INTO dates VALUES
('2018-01-01'),('2018-01-02'),('2018-01-03'),('2018-01-04'),('2018-01-05'),('2018-01-06'),('2018-01-07'),('2018-01-08'),('2018-01-09'),('2018-01-10'),('2018-01-11'),('2018-01-12'),('2018-01-13'),('2018-01-14'),('2018-01-15'),('2018-01-16'),('2018-01-17'),('2018-01-18'),('2018-01-19'),('2018-01-20'),('2018-01-21'),('2018-01-22'),('2018-01-23'),('2018-01-24'),('2018-01-25'),('2018-01-26'),('2018-01-27'),('2018-01-28'),('2018-01-29'),('2018-01-30'),('2018-01-31')
Here is the query based on this dataset:
SELECT
code.code,
dates.dates,
code.available,
code.nights,
code.bookingid
FROM code
LEFT JOIN dates ON
dates.dates >= code.dates
AND dates.dates < DATEADD(DAY,nights,code.dates)
Edit: Here is an example using your initial query as a subquery to join your result set onto the dates table if you want a copy & paste. Still requires creating the dates table.
SELECT
ServiceCode,
StartDate,
Available,
Nights,
BookingID
FROM (
SELECT
ServiceCode,
StartDate,
Available,
Nights,
BookingID
FROM #tmpAvailability A
LEFT JOIN vwRSBooking B
ON B.Depart = A.StartDate
AND B.ServiceCode = A.SupplierCode
AND B.StatusID IN (2640, 2621)
) code
LEFT JOIN dates ON
dates.dates >= code.StartDate
AND dates.dates < DATEADD(DAY, code.Nights, code.StartDate)
ORDER BY StartDate;