I need to create a query that counts consecutive days based on the dates in the data.
Using this table as a sample:
id_used | ref_date
---------+---------------------
1 | 2021-02-01 00:00:00
1 | 2021-09-01 00:00:00
1 | 2021-09-02 00:00:00
1 | 2021-09-03 00:00:00
The result should be 3 (the last three rows).
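This question has no answer here. The gaps-and-islands idea, which subtracts a rank from the date so that every day in a consecutive run shares the same value, fits it. The following is a minimal sketch using Python's bundled sqlite3 module, and window functions need SQLite 3.25 or later. It assumes ISO-formatted dates and takes the wanted number to be the longest run of consecutive days. The table name t and the function longest_streak are hypothetical.

```python
import sqlite3

# Gaps-and-islands: inside a run of consecutive days, julianday(day) and
# DENSE_RANK() both grow by 1 per distinct day, so their difference is constant.
QUERY = """
WITH numbered AS (
    SELECT id_used,
           date(ref_date) AS d,
           julianday(date(ref_date))
             - DENSE_RANK() OVER (PARTITION BY id_used ORDER BY date(ref_date)) AS island
    FROM t
)
SELECT id_used, COUNT(DISTINCT d) AS streak, MIN(d) AS first_day, MAX(d) AS last_day
FROM numbered
GROUP BY id_used, island
ORDER BY streak DESC, last_day DESC
LIMIT 1
"""


def longest_streak(rows):
    """Return (id_used, streak, first_day, last_day) for the longest run."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id_used INTEGER, ref_date TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchone()
    conn.close()
    return result


sample = [
    (1, "2021-02-01 00:00:00"),
    (1, "2021-09-01 00:00:00"),
    (1, "2021-09-02 00:00:00"),
    (1, "2021-09-03 00:00:00"),
]
print(longest_streak(sample))  # (1, 3, '2021-09-01', '2021-09-03')
```

DENSE_RANK combined with COUNT(DISTINCT d) stops several rows on the same day from breaking or inflating a run. With ROW_NUMBER, duplicate days would have to be removed first. To get the most recent run instead of the longest, sort by last_day first.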
Related
I have a Hive table 'driver_time_stats' with columns slot_id, number_of_drivers, slot_start_time and slot_end_time.
-----------------------------------------------------------------------
slot_id | number_of_drivers | slot_start_time | slot_end_time
-----------------------------------------------------------------------
1 | 5 | 2018-01-01 09:30:00 | 2018-01-01 10:00:00
2 | 8 | 2018-01-01 10:30:00 | 2018-01-01 11:00:00
-----------------------------------------------------------------------
Desired output: each row should be split into multiple rows at 1-minute intervals between slot_start_time and slot_end_time.
-----------------------------------------------------------------------
slot_id | number_of_drivers | slot_start_time | slot_end_time
-----------------------------------------------------------------------
1 | 5 | 2018-01-01 09:30:00 | 2018-01-01 09:31:00
1 | 5 | 2018-01-01 09:31:00 | 2018-01-01 09:32:00
.
.
.
1 | 5 | 2018-01-01 09:59:00 | 2018-01-01 10:00:00
2 | 8 | 2018-01-01 10:30:00 | 2018-01-01 10:31:00
2 | 8 | 2018-01-01 10:31:00 | 2018-01-01 10:32:00
.
.
.
2 | 8 | 2018-01-01 10:59:00 | 2018-01-01 11:00:00
-----------------------------------------------------------------------
I tried lateral view, posexplode, etc., but couldn't get it to work. Can anyone help? In other words, I am trying to slice a record into multiple records at one-minute intervals in Hive. I was able to do it in Presto using UNNEST, but I need a Hive-only solution because our ETLs are built on Hive.
-Nash
Well, I was able to find the answer with the help of my friend Henry, and I'm posting it here so it can help others looking for a solution to a similar problem. The code snippet below should give you some guidance; you can tweak it to your needs.
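-- one timestamp per minute from st to en, both ends included
-- space(n) builds n spaces, split on ' ' gives n+1 elements, posexplode numbers them 0..n
select from_unixtime(unix_timestamp(t.st) + pe.i*60) FROM (
select '2018-01-12 09:00:00' as st, '2018-01-12 11:30:00' as en) t
lateral VIEW
posexplode(split(
space(cast(floor((unix_timestamp(t.en)-unix_timestamp(t.st))/60) as INT)), ' ')) pe as i, x;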
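The snippet works as a row generator. With n minutes between the two times, posexplode emits positions 0..n, so the query returns n + 1 timestamps, and the last one is the end time itself. To produce the start/end pairs from the desired output, drop the last position and add one minute to each start. The Python sketch below is not Hive code. It only mirrors that arithmetic to show the off-by-one, and split_slot is a hypothetical helper.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def split_slot(slot_id, drivers, start, end):
    """Mirror of the Hive row generator: one (start, end) pair per minute."""
    st = datetime.strptime(start, FMT)
    en = datetime.strptime(end, FMT)
    # Same as floor((unix_timestamp(en) - unix_timestamp(st)) / 60) in Hive.
    n = int((en - st).total_seconds() // 60)
    # n spaces split on ' ' yield n + 1 empty strings; posexplode would
    # number them 0..n, so position n is the slot end itself.
    positions = range(len((" " * n).split(" ")))
    rows = []
    for i in positions:
        if i >= n:  # the end point does not start a 1-minute interval
            continue
        slot_start = st + timedelta(minutes=i)
        slot_end = slot_start + timedelta(minutes=1)
        rows.append((slot_id, drivers, slot_start.strftime(FMT), slot_end.strftime(FMT)))
    return rows


for row in split_slot(1, 5, "2018-01-01 09:30:00", "2018-01-01 10:00:00")[:2]:
    print(row)
```

In Hive itself, the equivalent would presumably filter on pe.i being less than the same floor(...) expression, and compute the end as from_unixtime(unix_timestamp(t.st) + (pe.i + 1)*60). That Hive variant is untested here.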
I have the following situation where I need to get several values between two invoice dates.
The query returns data per invoice. For some values, I need to fetch data between this invoice's date and the previous invoice's date.
Approaches I have already tried:
1) A subquery would easily solve this, but I have to do it for 4-5 columns on a 15 GB database, so that's not feasible.
2) If I do this:
left join (select inv.date, inv.actno from invoice inv) as invo on invo.actno = act.id and invo.date < inv.date
it returns every invoice earlier than that date, but I need only one: the latest invoice before the main invoice date.
3) We cannot get the second-highest value in a subquery in the FROM clause, because the outer invoice is not grouped, so it might be the maximum, a middle value, or the minimum.
4) We cannot pass values from other tables into a subquery in a join.
Example:
create table inv (id serial ,date timestamp without time zone);
insert into inv (date) values('2017-01-31 00:00:00'),('2017-01-30 00:00:00'),('2017-01-29 00:00:00'),('2017-01-28 00:00:00'),('2017-01-27 00:00:00');
select id, date from inv;
id | date
----+---------------------
1 | 2017-01-31 00:00:00
2 | 2017-01-30 00:00:00
3 | 2017-01-29 00:00:00
4 | 2017-01-28 00:00:00
5 | 2017-01-27 00:00:00
(5 rows)
I need this:
id |date |date | id
1 | 2017-01-31 00:00:00 | 2017-01-30 00:00:00 | 2
2 | 2017-01-30 00:00:00 | 2017-01-29 00:00:00 | 3
3 | 2017-01-29 00:00:00 | 2017-01-28 00:00:00 | 4
4 | 2017-01-28 00:00:00 | 2017-01-27 00:00:00 | 5
5 | 2017-01-27 00:00:00 |
I can't use a subquery in the SELECT list, because the database is big and I need to do this for 4-5 columns.
UPDATE 1
I need this from the same table, but by using it twice in the FROM clause. I need several values joined from the invoice table, plus 4-5 columns that need things like the sum of amounts paid between the previous invoice and this one.
That way I can take both invoice dates in a subquery and get the data between them.
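Using the table twice in FROM is possible without any subquery, through an anti-join with a third copy. Copy p ranges over all earlier invoices. Copy q rejects any p that has another invoice between p and i, so only the latest earlier invoice survives. The sketch below runs on SQLite through Python's sqlite3 module, using the sample inv table. For per-account pairing, assume every join also needs p.actno = i.actno and q.actno = i.actno, with actno taken from the snippet above. The p join is roughly quadratic per account, so on a 15 GB table it would at least need an index on (actno, date). Its performance was not measured.

```python
import sqlite3

# Pair each invoice i with the latest invoice p dated before it. The third
# copy q looks for an invoice strictly between p and i; keeping only rows
# where q found nothing leaves exactly that latest earlier invoice.
QUERY = """
SELECT i.id, i.date, p.date AS prev_date, p.id AS prev_id
FROM inv i
LEFT JOIN inv p ON p.date < i.date
LEFT JOIN inv q ON q.date < i.date AND q.date > p.date
WHERE q.id IS NULL
ORDER BY i.id
"""


def previous_invoices():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inv (id INTEGER PRIMARY KEY, date TEXT)")
    conn.executemany(
        "INSERT INTO inv (date) VALUES (?)",
        [("2017-01-31 00:00:00",), ("2017-01-30 00:00:00",), ("2017-01-29 00:00:00",),
         ("2017-01-28 00:00:00",), ("2017-01-27 00:00:00",)],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in previous_invoices():
    print(row)
```

Once prev_date is available on each row, other tables (for example payments) can be joined with a condition between prev_date and date to sum the amounts paid in that window.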
UPDATE 2
LAG will not solve this:
select i.id,i.date, lag(date) over (order by date) from inv i order by id ;
id | date | lag
----+---------------------+---------------------
1 | 2017-01-31 00:00:00 | 2017-01-30 00:00:00
2 | 2017-01-30 00:00:00 | 2017-01-29 00:00:00
3 | 2017-01-29 00:00:00 | 2017-01-28 00:00:00
4 | 2017-01-28 00:00:00 | 2017-01-27 00:00:00
5 | 2017-01-27 00:00:00 |
(5 rows)
Time: 0.480 ms
test=# select i.id,i.date, lag(date) over (order by date) from inv i where id=2 order by id ;
id | date | lag
----+---------------------+-----
2 | 2017-01-30 00:00:00 |
(1 row)
Time: 0.525 ms
test=# select i.id,i.date, lag(date) over (order by date) from inv i where id in (2,3) order by id ;
id | date | lag
----+---------------------+---------------------
2 | 2017-01-30 00:00:00 | 2017-01-29 00:00:00
3 | 2017-01-29 00:00:00 |
LAG is computed only over the rows the query itself returns, so it is bounded by the WHERE clause. Here, id 3 has a previous invoice, but the query cannot see it because that row was filtered out. Something needs to be done in a LEFT JOIN so the lag date can be taken from the same table, referenced again in the FROM clause. Thanks again, buddy.
Like this?
t=# select date as d1,
lag(date) over (order by date)
from inv
order by 1 desc;
d1 | lag
---------------------+---------------------
2017-01-31 00:00:00 | 2017-01-30 00:00:00
2017-01-30 00:00:00 | 2017-01-29 00:00:00
2017-01-29 00:00:00 | 2017-01-28 00:00:00
2017-01-28 00:00:00 | 2017-01-27 00:00:00
2017-01-27 00:00:00 |
(5 rows)
Time: 1.416 ms
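The filtering problem from UPDATE 2 goes away if LAG is evaluated before the WHERE clause. You can compute it in a derived table over the whole inv table and filter outside, so the previous invoice stays visible even when it is not in the final result. The sketch below runs the pattern on SQLite through Python's sqlite3 module, which needs SQLite 3.25 or later for window functions. For per-account data, PARTITION BY actno would presumably be added to both window clauses.

```python
import sqlite3

# The window functions run inside the derived table w, over every invoice.
# The outer WHERE filters afterwards, so it cannot hide the previous row.
QUERY = """
SELECT id, date, prev_date, prev_id
FROM (
    SELECT id, date,
           LAG(date) OVER (ORDER BY date) AS prev_date,
           LAG(id)   OVER (ORDER BY date) AS prev_id
    FROM inv
) AS w
WHERE id IN (2, 3)
ORDER BY id
"""


def lag_then_filter():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE inv (id INTEGER PRIMARY KEY, date TEXT)")
    conn.executemany(
        "INSERT INTO inv (date) VALUES (?)",
        [("2017-01-31 00:00:00",), ("2017-01-30 00:00:00",), ("2017-01-29 00:00:00",),
         ("2017-01-28 00:00:00",), ("2017-01-27 00:00:00",)],
    )
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


for row in lag_then_filter():
    print(row)
```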
I have a table of reservations with two columns (started_at and ended_at). I want to build a query that expands reservation rows into their individual days. For instance, if a reservation lasted 5 days, I want 5 rows back for it. Something along the lines of:
Current Output
id | started_at | ended_at
----------------------------
1 | 2016-01-01 | 2016-01-05
2 | 2016-01-06 | 2016-01-10
Desired Output
id | date
---------------
1 | 2016-01-01
1 | 2016-01-02
1 | 2016-01-03
1 | 2016-01-04
1 | 2016-01-05
2 | 2016-01-06
2 | 2016-01-07
2 | 2016-01-08
2 | 2016-01-09
2 | 2016-01-10
I figured generate_series might be useful here, but I'm not certain of the syntax. Any help is greatly appreciated.
SQL Fiddle
http://sqlfiddle.com/#!15/f0135/1
This runs OK on your fiddle:
SELECT id, to_char(generate_series(started_at, ended_at, '1 day'),'YYYY-MM-DD') as date
FROM reservations;
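generate_series is PostgreSQL-specific. SQLite does not build it in by default, but a recursive CTE can play the same role: start from each reservation's first day and add one day until ended_at is reached. The sketch below runs on SQLite through Python's sqlite3 module, assuming dates stored as ISO 'YYYY-MM-DD' text so that string comparison matches date order. The function expand_reservations is hypothetical.

```python
import sqlite3

# Anchor row: the first day of each reservation. Recursive step: the next
# day, as long as the previous day is still before ended_at (inclusive end).
QUERY = """
WITH RECURSIVE days(id, d, ended_at) AS (
    SELECT id, started_at, ended_at FROM reservations
    UNION ALL
    SELECT id, date(d, '+1 day'), ended_at FROM days WHERE d < ended_at
)
SELECT id, d AS date FROM days ORDER BY id, d
"""


def expand_reservations(reservations):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE reservations (id INTEGER, started_at TEXT, ended_at TEXT)")
    conn.executemany("INSERT INTO reservations VALUES (?, ?, ?)", reservations)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


rows = expand_reservations([(1, "2016-01-01", "2016-01-05"), (2, "2016-01-06", "2016-01-10")])
for row in rows:
    print(row)
```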
I am looking for a SQL statement that returns all entries whose date is no more than 5 days apart from another entry in this table.
Example:
ID | Date
1 | 16.10.14 00:00:00
2 | 14.10.14 00:00:00
3 | 09.09.14 00:00:00
4 | 13.10.14 00:00:00
5 | 06.07.14 00:00:00
6 | 09.01.14 00:00:00
7 | 10.01.14 00:00:00
8 | 14.01.14 00:00:00
Expected Output:
ID | Date
1 | 16.10.14 00:00:00
2 | 14.10.14 00:00:00
4 | 13.10.14 00:00:00
6 | 09.01.14 00:00:00
7 | 10.01.14 00:00:00
8 | 14.01.14 00:00:00
EDIT:
In fact, all I need is a way to compute the difference between two values of type Date. That's why I can't even show my attempts: I'm missing the keyword.
Never mind, I will still try.
It should be something like this:
select * from example m where m.Date not more apart than 5 days from another entry in the Table
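The - operator, when applied to two dates, returns their difference in days. So you can use the EXISTS operator to build your query. The inner query has to skip the row itself. Otherwise every row matches itself with a difference of 0, and the whole table is returned:
SELECT *
FROM my_table o
WHERE EXISTS (SELECT *
FROM my_table i
WHERE i.id <> o.id
AND ABS(o.my_date - i.my_date) <= 5)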
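The sketch below checks the EXISTS approach on the sample data. It runs on SQLite through Python's sqlite3 module. SQLite has no date type, so julianday() differences stand in for the date subtraction. The DD.MM.YY values are converted to ISO dates, and row 8 is taken as 14.01.14, which is what the expected output implies. The table name entries and the function close_entries are hypothetical.

```python
import sqlite3

# Keep a row when some *other* row lies within 5 days of it.
QUERY = """
SELECT o.id
FROM entries o
WHERE EXISTS (
    SELECT 1 FROM entries i
    WHERE i.id <> o.id
      AND ABS(julianday(o.d) - julianday(i.d)) <= 5
)
ORDER BY o.id
"""

# DD.MM.YY values from the question as ISO dates; row 8 assumed to be 14.01.14.
ENTRIES = [(1, "2014-10-16"), (2, "2014-10-14"), (3, "2014-09-09"), (4, "2014-10-13"),
           (5, "2014-07-06"), (6, "2014-01-09"), (7, "2014-01-10"), (8, "2014-01-14")]


def close_entries(entries):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER, d TEXT)")
    conn.executemany("INSERT INTO entries VALUES (?, ?)", entries)
    ids = [row[0] for row in conn.execute(QUERY)]
    conn.close()
    return ids


print(close_entries(ENTRIES))  # [1, 2, 4, 6, 7, 8]
```

Dropping the i.id <> o.id condition makes every row qualify, because each row is 0 days away from itself.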
I have an activity table with a structure like this:
id prd_id act_dt grp
------------------------------------
1 1 2000-01-01 00:00:00
2 1 2000-01-01 00:00:01
3 1 2000-01-01 00:00:02
4 2 2000-01-01 00:00:00
5 2 2000-01-01 00:00:01
6 2 2000-01-01 01:00:00
7 2 2000-01-01 01:00:01
8 3 2000-01-01 00:00:00
9 3 2000-01-01 00:00:01
10 3 2000-01-01 02:00:00
I want to split the data in this activity table by product (prd_id) and activity date (act_dt), and update the group (grp) column with a value from a sequence for each of these groups.
The catch is that I need to group by similar timestamps, where "similar" means consecutive records differ by exactly 1 second. In other words, within a group, any two adjacent records (sorted by date) are exactly 1 second apart. The first and last records can be any amount of time apart, as long as all the intermediate records are 1 second apart.
For the example data, the groups would be:
id prd_id act_dt grp
------------------------------------
1 1 2000-01-01 00:00:00 1
2 1 2000-01-01 00:00:01 1
3 1 2000-01-01 00:00:02 1
4 2 2000-01-01 00:00:00 2
5 2 2000-01-01 00:00:01 2
6 2 2000-01-01 01:00:00 3
7 2 2000-01-01 01:00:01 3
8 3 2000-01-01 00:00:00 4
9 3 2000-01-01 00:00:01 4
10 3 2000-01-01 02:00:00 5
What method would I use to accomplish this?
The size of the table is ~20 million rows, if that affects the method used to solve the problem.
I'm not an Oracle whiz, so I'm guessing at the best option for this one line:
ROUND((act_dt - DATE '2010-01-01') * 24 * 60 * 60) AS time_id,
This just needs to be "the number of seconds from [aDateConstant] to act_dt". If act_dt is a DATE, subtracting two DATEs gives a number of days, hence the multiplication. The result can be negative. It just needs to be a number of seconds, to turn your act_dt into an integer. The rest should work fine.
WITH
sequenced_data
AS
(
SELECT
-- position of the row within its product, in time order
ROW_NUMBER() OVER (PARTITION BY prd_id ORDER BY act_dt) AS sequence_id,
ROUND((act_dt - DATE '2010-01-01') * 24 * 60 * 60) AS time_id,
t.*
FROM
yourTable t
)
SELECT
-- rows exactly 1 second apart share the same time_id - sequence_id
DENSE_RANK() OVER (ORDER BY prd_id, time_id - sequence_id) AS group_id,
s.*
FROM
sequenced_data s
Example data:
sequence_id | time_id | t-s | group_id
-------------+---------+-----+----------
1 | 1 | 0 | 1
2 | 2 | 0 | 1
3 | 3 | 0 | 1
4 | 8 | 4 | 2
5 | 9 | 4 | 2
6 | 12 | 6 | 3
7 | 14 | 7 | 4
8 | 15 | 7 | 4
NOTE: This assumes there are no duplicate act_dt values within a product. If there are, they need to be filtered out first, probably with a GROUP BY in a preceding CTE.
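The sketch below runs the same time_id - sequence_id idea on the sample activity data. It uses SQLite through Python's sqlite3 module and needs SQLite 3.25 or later for window functions. SQLite's strftime('%s', ...) takes the place of the Oracle date arithmetic as the "seconds since a constant". Groups are numbered across products by ranking on (prd_id, time_id - sequence_id), which matches the grp column in the question. Writing the values back with an UPDATE is left out, and group_activity is a hypothetical helper.

```python
import sqlite3

# time_id: seconds since the Unix epoch. sequence_id: position within the
# product. Both grow by 1 per row inside a run of 1-second steps, so their
# difference identifies the run; DENSE_RANK turns each (prd_id, run) into a number.
QUERY = """
WITH sequenced AS (
    SELECT id, prd_id, act_dt,
           CAST(strftime('%s', act_dt) AS INTEGER) AS time_id,
           ROW_NUMBER() OVER (PARTITION BY prd_id ORDER BY act_dt) AS sequence_id
    FROM activity
)
SELECT id, prd_id, act_dt,
       DENSE_RANK() OVER (ORDER BY prd_id, time_id - sequence_id) AS grp
FROM sequenced
ORDER BY id
"""

SAMPLE = [
    (1, 1, "2000-01-01 00:00:00"), (2, 1, "2000-01-01 00:00:01"),
    (3, 1, "2000-01-01 00:00:02"), (4, 2, "2000-01-01 00:00:00"),
    (5, 2, "2000-01-01 00:00:01"), (6, 2, "2000-01-01 01:00:00"),
    (7, 2, "2000-01-01 01:00:01"), (8, 3, "2000-01-01 00:00:00"),
    (9, 3, "2000-01-01 00:00:01"), (10, 3, "2000-01-01 02:00:00"),
]


def group_activity(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activity (id INTEGER, prd_id INTEGER, act_dt TEXT)")
    conn.executemany("INSERT INTO activity VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in group_activity(SAMPLE):
    print(row)
```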