Generate Series from Rows in PostgreSQL

I have a table of reservations which has two columns (started_at, and ended_at). I want to build a query that expands reservation rows into their individual days. So for instance if a reservation lasted 5 days I want 5 rows back for it. Something along the lines of:
Current Output
id | started_at | ended_at
----------------------------
1 | 2016-01-01 | 2016-01-05
2 | 2016-01-06 | 2016-01-10
Desired Output
id | date
---------------
1 | 2016-01-01
1 | 2016-01-02
1 | 2016-01-03
1 | 2016-01-04
1 | 2016-01-05
2 | 2016-01-06
2 | 2016-01-07
2 | 2016-01-08
2 | 2016-01-09
2 | 2016-01-10
I figured that generate_series might be of use here but I'm not certain of the syntax. Any help is greatly appreciated.
SQL Fiddle
http://sqlfiddle.com/#!15/f0135/1

This runs OK on your fiddle:
SELECT id,
       to_char(generate_series(started_at, ended_at, '1 day'), 'YYYY-MM-DD') AS date
FROM reservations;
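If an actual date value is preferred over formatted text, a variant like this should also work (my sketch, assuming started_at and ended_at are date or timestamp columns and PostgreSQL 9.3+ for LATERAL):
-- Same expansion, but keeping the result typed as a date instead of text
SELECT r.id, gs::date AS date
FROM reservations r
CROSS JOIN LATERAL generate_series(r.started_at, r.ended_at, interval '1 day') AS gs;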

Related

Missing row using left join and generate_series

I don't understand why my two tables behave incorrectly; the generated series doesn't show up in the joined result.
TABLE 1: attendance
id | employee_id | flag | entry_date
-------------------------------
1 | 1 | 0 | 2017-11-17
2 | 2 | 0 | 2017-11-17
3 | 3 | 0 | 2017-11-17
Here's my query:
SELECT
  attendance.employee_id,
  TO_CHAR(series::date, 'YYYY-MM-DD')
FROM generate_series('2017-11-16', '2017-11-30', '1 day'::INTERVAL) series
LEFT JOIN (
  SELECT
    employee_id,
    to_char(entry_date, 'YYYY-MM-DD') entry_date
  FROM attendance
  WHERE entry_date >= '2017-11-16' AND entry_date <= '2017-11-30'
  GROUP BY to_char(entry_date, 'YYYY-MM-DD'), employee_id
) attendance ON attendance.entry_date = TO_CHAR(series::date, 'YYYY-MM-DD')
ORDER BY employee_id
The result is:
employee_id | to_char
------------------------
1 | 2017-11-17
2 | 2017-11-17
3 | 2017-11-17
I'm expecting something a bit different, in which 2017-11-16 will show up, since my generated series starts on that date.
Expected result:
employee_id | to_char
------------------------
null | 2017-11-16
null | 2017-11-16
null | 2017-11-16
1 | 2017-11-17
2 | 2017-11-17
3 | 2017-11-17
Here's a sample SQL Fiddle to test:
http://sqlfiddle.com/#!17/d7907/3
Update [Jan/4/2017]: Looks like I'm thinking about this the wrong way; I decided to perform a correlated subquery instead.
What I wanted to do is to count the number of dates present according to the generated series.
It's hard to understand what you really need. What you described as "The result is" - these are rows from your inner query. If you LEFT JOIN generate_series with this inner query, you will get one null row for every date except 2017-11-17, and three rows for 2017-11-17. I checked it - it worked as expected:
employee_id | to_char
-------------+------------
1 | 2017-11-17
2 | 2017-11-17
3 | 2017-11-17
| 2017-11-20
| 2017-11-21
| 2017-11-22
| 2017-11-23
| 2017-11-24
| 2017-11-25
| 2017-11-26
| 2017-11-27
| 2017-11-28
| 2017-11-29
| 2017-11-16
| 2017-11-30
| 2017-11-18
| 2017-11-19
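For reference, a query along these lines produces the output above (my sketch, not verbatim from the answer - it joins on dates directly instead of formatted strings, using the attendance columns from the question):
SELECT
  a.employee_id,
  TO_CHAR(series::date, 'YYYY-MM-DD')
FROM generate_series('2017-11-16'::date, '2017-11-30', '1 day'::interval) series
LEFT JOIN (
  -- one row per employee per day, like the inner query in the question
  SELECT DISTINCT employee_id, entry_date
  FROM attendance
  WHERE entry_date BETWEEN '2017-11-16' AND '2017-11-30'
) a ON a.entry_date = series::date;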

BigQuery - how many entries per partition?

I have big partitioned tables and am trying to figure out how many entries are in each day partition.
So far I have used a for loop in a script, but there must be a simpler way of doing it.
Google did not help me. Does anyone know the right query?
Thanks
You can run the following query to count how many entries you have in each partition:
#standardSQL
SELECT
  _PARTITIONTIME AS pt,
  COUNT(1)
FROM
  `dataset.table`
GROUP BY 1
ORDER BY 1 DESC
and
#legacySQL
SELECT
  _PARTITIONTIME AS pt,
  COUNT(1)
FROM
  [dataset:table]
GROUP BY 1
ORDER BY 1 DESC
It returns a table like this; note that the NULL entries are still in the streaming buffer. Hint: to obtain the records which are in the streaming buffer, use a query that filters on NULL.
+-------------------------+-----+
| 2017-02-14 00:00:00 UTC | 252 |
| 2017-02-13 00:00:00 UTC | 257 |
| 2017-02-12 00:00:00 UTC | 188 |
| 2017-02-11 00:00:00 UTC | 234 |
| 2017-02-10 00:00:00 UTC | 107 |
| null                    |  13 |
+-------------------------+-----+
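Following the hint above, a minimal sketch (the table name is a placeholder) for counting only the rows still in the streaming buffer, which have a NULL _PARTITIONTIME:
#standardSQL
SELECT COUNT(1) AS rows_in_buffer
FROM `dataset.table`
WHERE _PARTITIONTIME IS NULL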

How to get second max date in Postgres SQL

I have the following situation where I need to get several values between two invoice dates.
The query returns data based on invoices; now, for some of the values, I need to fetch data between this invoice's date and the previous invoice's date.
Ways I have already tried:
1) A subquery would easily solve this, but I have to do it for 4-5 columns on a 15 GB database, so that's not feasible.
2) If I go like this:
left join (select inv.date, inv.actno from invoice inv) as invo on invo.actno = act.id and invo.date < inv.date
then it gives all the rows with an earlier date, but I need only the one row immediately before the main invoice date.
3) I can't get the second max value in a subquery in the FROM clause, because the outer invoice is not grouped, so it might be the max, the middle, or the least.
4) I can't reference values from the outer table inside the subquery of a joined table.
Example:
create table inv (id serial, date timestamp without time zone);
insert into inv (date) values ('2017-01-31 00:00:00'), ('2017-01-30 00:00:00'), ('2017-01-29 00:00:00'), ('2017-01-28 00:00:00'), ('2017-01-27 00:00:00');
select * from inv;
id | date
----+---------------------
1 | 2017-01-31 00:00:00
2 | 2017-01-30 00:00:00
3 | 2017-01-29 00:00:00
4 | 2017-01-28 00:00:00
5 | 2017-01-27 00:00:00
(5 rows)
I need this:
 id | date                | date                | id
----+---------------------+---------------------+----
  1 | 2017-01-31 00:00:00 | 2017-01-30 00:00:00 |  2
  2 | 2017-01-30 00:00:00 | 2017-01-29 00:00:00 |  3
  3 | 2017-01-29 00:00:00 | 2017-01-28 00:00:00 |  4
  4 | 2017-01-28 00:00:00 | 2017-01-27 00:00:00 |  5
  5 | 2017-01-27 00:00:00 |                     |
I can't do a subquery in SELECT, as the database is big and I need to do this for 4-5 columns.
UPDATE 1
I need this from the same table, used twice in the FROM clause: my requirement is to join several pieces of data from the invoice table, and there are 4-5 columns where I need things like the sum of the amount paid between the last invoice and this one.
So I want both invoice dates available so that a subquery can get the data between them.
UPDATE 2
lag will not solve this
select i.id,i.date, lag(date) over (order by date) from inv i order by id ;
id | date | lag
----+---------------------+---------------------
1 | 2017-01-31 00:00:00 | 2017-01-30 00:00:00
2 | 2017-01-30 00:00:00 | 2017-01-29 00:00:00
3 | 2017-01-29 00:00:00 | 2017-01-28 00:00:00
4 | 2017-01-28 00:00:00 | 2017-01-27 00:00:00
5 | 2017-01-27 00:00:00 |
(5 rows)
Time: 0.480 ms
test=# select i.id,i.date, lag(date) over (order by date) from inv i where id=2 order by id ;
id | date | lag
----+---------------------+-----
2 | 2017-01-30 00:00:00 |
(1 row)
Time: 0.525 ms
test=# select i.id,i.date, lag(date) over (order by date) from inv i where id in (2,3) order by id ;
id | date | lag
----+---------------------+---------------------
2 | 2017-01-30 00:00:00 | 2017-01-29 00:00:00
3 | 2017-01-29 00:00:00 |
lag() operates only on the rows the query actually returns; it is bounded by that query. See here: id 3 has a lag value in the full table, but the filtered query cannot produce it because the preceding row was filtered out. Something like a left join is needed so the lag date can be taken from the same table by referencing it again in the FROM clause. Thanks again, buddy.
Like here?:
t=# select date as d1,
lag(date) over (order by date)
from inv
order by 1 desc;
d1 | lag
---------------------+---------------------
2017-01-31 00:00:00 | 2017-01-30 00:00:00
2017-01-30 00:00:00 | 2017-01-29 00:00:00
2017-01-29 00:00:00 | 2017-01-28 00:00:00
2017-01-28 00:00:00 | 2017-01-27 00:00:00
2017-01-27 00:00:00 |
(5 rows)
Time: 1.416 ms
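Not part of the answer above, but one way to meet the stated requirement - taking the previous invoice date from the same table even when the outer query is filtered - is a LATERAL join (PostgreSQL 9.3+); a sketch against the inv table from the question:
SELECT i.id,
       i.date AS this_invoice,
       prev.date AS prev_invoice
FROM inv i
LEFT JOIN LATERAL (
    -- the single most recent invoice before this one,
    -- looked up in the full table rather than the filtered row set
    SELECT p.date
    FROM inv p
    WHERE p.date < i.date
    ORDER BY p.date DESC
    LIMIT 1
) prev ON true
WHERE i.id IN (2, 3)  -- the outer filter no longer breaks the lookup
ORDER BY i.id;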

Can I put a condition on a window function in Redshift?

I have an events-based table in Redshift. I want to tie all events to the FIRST event in the series, provided that event was in the N-hours preceding this event.
If all I cared about was the very first row, I'd simply do:
SELECT
event_time
,first_value(event_time)
OVER (ORDER BY event_time rows unbounded preceding) as first_time
FROM
my_table
But because I only want to tie this to the first event in the past N-hours, I want something like:
SELECT
event_time
,first_value(event_time)
OVER (ORDER BY event_time rows between [N-hours ago] and current row) as first_time
FROM
my_table
A little background on my table. It's user actions, so effectively a user jumps on, performs 1-100 actions, and then leaves. Most users are 1-10x per day. Sessions rarely last over an hour, so I could set N=1.
If I just set a PARTITION BY date_trunc('hour', event_time), I'll double-count sessions that span an hour boundary.
Assume my_table looks like
id | user_id | event_time
----------------------------------
1 | 123 | 2015-01-01 01:00:00
2 | 123 | 2015-01-01 01:15:00
3 | 123 | 2015-01-01 02:05:00
4 | 123 | 2015-01-01 13:10:00
5 | 123 | 2015-01-01 13:20:00
6 | 123 | 2015-01-01 13:30:00
My goal is to get a result that looks like
id | parent_id | user_id | event_time
--------------------------------------
1 | 1 | 123 | 2015-01-01 01:00:00
2 | 1 | 123 | 2015-01-01 01:15:00
3 | 1 | 123 | 2015-01-01 02:05:00
4 | 4 | 123 | 2015-01-01 13:10:00
5 | 4 | 123 | 2015-01-01 13:20:00
6 | 4 | 123 | 2015-01-01 13:30:00
The answer appears to be "no" as of now.
SQL Server has functionality for using RANGE instead of ROWS in the frame, which allows the query to compare values to the current row's value:
https://www.simple-talk.com/sql/learn-sql-server/window-functions-in-sql-server-part-2-the-frame/
When I attempt this syntax in Redshift, I get the error "Range is not yet supported".
Someone update this when that "yet" changes!
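A common workaround (my sketch, not from this thread, assuming N = 1 hour as suggested in the question): flag rows whose gap from the previous event exceeds an hour with LAG(), turn a running SUM of those flags into a session number, and take the first id per session. This breaks sessions on gaps rather than using a strict rolling N-hour window, which is what reproduces the desired output above:
WITH flagged AS (
    SELECT id, user_id, event_time,
           -- 1 when this row starts a session: no previous event, or a gap over 1 hour
           CASE WHEN LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) IS NULL
                  OR event_time > LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time)
                                  + INTERVAL '1 hour'
                THEN 1 ELSE 0 END AS new_session
    FROM my_table
),
numbered AS (
    SELECT id, user_id, event_time,
           -- running count of session starts = session number
           SUM(new_session) OVER (PARTITION BY user_id ORDER BY event_time
                                  ROWS UNBOUNDED PRECEDING) AS session_no
    FROM flagged
)
SELECT id,
       MIN(id) OVER (PARTITION BY user_id, session_no) AS parent_id,
       user_id,
       event_time
FROM numbered;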

Query to use GROUP BY multiple columns

I have a table full of patient/responsible party/insurance carrier combinations (e.g. patient Jim Doe's responsible party is parent John Doe, who has insurance carrier Aetna Insurance). For each of these combinations, they have a contract that has multiple payments. For this particular table, I need to write a query to find any patient/RP/carrier combo that has multiple contract dates in the same month. Is there any way to do this?
Example table:
ContPat | ContResp | ContIns | ContDue
------------------------------------------------------
53 | 13 | 27 | 2012-01-01 00:00:00.000
53 | 13 | 27 | 2012-02-01 00:00:00.000
53 | 15 | 27 | 2012-03-01 00:00:00.000
12 | 15 | 3 | 2011-05-01 00:00:00.000
12 | 15 | 3 | 2011-05-01 00:00:00.000
12 | 15 | 3 | 2011-06-01 00:00:00.000
12 | 15 | 3 | 2011-07-01 00:00:00.000
12 | 15 | 3 | 2011-08-01 00:00:00.000
12 | 15 | 3 | 2011-09-01 00:00:00.000
In this example, I would like to generate a list of all the duplicate months for any Patient/RP/Carrier combinations. The 12/15/3 combination would be the only row returned here, but I'm working with thousands of combinations.
Not sure if this is possible using a GROUP BY or similar functions. Thanks in advance for any advice!
If all you care about is multiple entries in the same calendar month:
SELECT
  ContPat,
  ContResp,
  ContIns,
  MONTH(ContDue) AS Mo,
  YEAR(ContDue) AS Yr,
  COUNT(*) AS 'Records'
FROM
  MyTable
GROUP BY
  ContPat,
  ContResp,
  ContIns,
  MONTH(ContDue),
  YEAR(ContDue)
HAVING
  COUNT(*) > 1
This will show you any Patient/Responsible Party/Insurer/Calendar month combination with more than one record for that month.
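If you then need the full duplicate rows rather than just the counts, one option (my addition, not part of the original answer) is to join the grouped result back to the table:
SELECT t.*
FROM MyTable t
JOIN (
    SELECT ContPat, ContResp, ContIns,
           MONTH(ContDue) AS Mo, YEAR(ContDue) AS Yr
    FROM MyTable
    GROUP BY ContPat, ContResp, ContIns, MONTH(ContDue), YEAR(ContDue)
    HAVING COUNT(*) > 1
) dup ON dup.ContPat = t.ContPat
     AND dup.ContResp = t.ContResp
     AND dup.ContIns = t.ContIns
     AND dup.Mo = MONTH(t.ContDue)
     AND dup.Yr = YEAR(t.ContDue);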