Hive closest timestamp


I have these two tables, each with two columns:
Table1
Timestamp1 Data1
2021-08-10 10:32:48.869 data11
2021-08-11 11:38:27.511 data12
2021-08-12 12:41:11.945 data13
Table2
Timestamp2 Data2
2021-08-10 10:32:47.748 data21
2021-08-10 10:32:51.356 data21
2021-08-11 11:38:55.669 data23
2021-08-11 11:39:11.333 data24
2021-08-12 12:39:11.998 data25
2021-08-12 12:48:01.558 data26
How can I obtain this table, which pairs each Timestamp1 with the closest Timestamp2?
Table3
Timestamp1 Closest_Timestamp2
2021-08-10 10:32:48.869 2021-08-10 10:32:47.748
2021-08-11 11:38:27.511 2021-08-11 11:38:55.669
2021-08-12 12:41:11.945 2021-08-12 12:39:11.998
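No answer is recorded for this question here. As a rough sketch of the matching logic only (in Python rather than HiveQL, and not a definitive implementation), each Timestamp1 can be paired with the Timestamp2 that has the smallest absolute time difference. The names `parse`, `closest` and `table3` are illustrative assumptions, not part of the question.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

def parse(ts):
    """Parse a timestamp string with millisecond precision."""
    return datetime.strptime(ts, FMT)

# Timestamp columns copied from the question's sample tables.
table1 = [
    "2021-08-10 10:32:48.869",
    "2021-08-11 11:38:27.511",
    "2021-08-12 12:41:11.945",
]
table2 = [
    "2021-08-10 10:32:47.748",
    "2021-08-10 10:32:51.356",
    "2021-08-11 11:38:55.669",
    "2021-08-11 11:39:11.333",
    "2021-08-12 12:39:11.998",
    "2021-08-12 12:48:01.558",
]

def closest(ts1, candidates):
    """Return the candidate with the smallest absolute distance to ts1."""
    t1 = parse(ts1)
    return min(candidates, key=lambda ts2: abs(parse(ts2) - t1))

# One output row per Timestamp1, as in the desired Table3.
table3 = [(ts1, closest(ts1, table2)) for ts1 in table1]

for ts1, ts2 in table3:
    print(ts1, ts2)
```

This compares every pair of rows, which is fine for small tables. In SQL the same idea is usually expressed as a join ranked by the absolute time difference, keeping the top-ranked row per Timestamp1.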

Related

generate date range between min and max dates Athena presto SQL sequence error

I'm attempting to generate a series of dates in Presto SQL (Athena) using unnest and sequence, something similar to generate_series in Postgres.
My table looks like this:
job_name | run_date
A | '2021-08-21'
A | '2021-08-25'
B | '2021-08-07'
B | '2021-08-24'
SELECT d.job_name, d.run_date
FROM (
VALUES
('A', '2021-08-21'), ('A', '2021-08-25'),
('B', '2021-08-07'), ('B', '2021-08-24')
) d(job_name, run_date)
I'm aiming for an output as follows
job_name | run_date
A | 2021-08-21
A | 2021-08-22
A | 2021-08-23
A | 2021-08-24
A | 2021-08-25
B | 2021-08-07
B | 2021-08-08
B | 2021-08-09
B | 2021-08-10
B | 2021-08-11
B | 2021-08-12
B | 2021-08-13
B | 2021-08-14
B | 2021-08-15
B | 2021-08-16
B | 2021-08-17
B | 2021-08-18
B | 2021-08-19
B | 2021-08-20
B | 2021-08-21
B | 2021-08-22
B | 2021-08-23
B | 2021-08-24
I've attempted to use the following query to achieve this - however I get an error when trying to unnest my date sequence
SELECT t.job_name, d.dte
FROM (SELECT job_name
, min(run_date) as mind
, max(run_date) as maxd
, SEQUENCE(min(run_date), max(run_date)) as date_arr
FROM job_log_table t
GROUP BY job_name
) jd
CROSS JOIN
UNNEST(jd.date_arr) d(dte)
LEFT JOIN job_log_table t
ON t.job_name = jd.job_name
AND t.latest_date = d.dte;
which yields the following error :
[HY000][100071] [Simba][AthenaJDBC](100071) An error has been thrown from the AWS Athena client. [ErrorCategory:USER_ERROR, ErrorCode:SYNTAX_ERROR], Detail:SYNTAX_ERROR: line 5:14: Unexpected parameters (date, date) for function sequence. Expected: sequence(bigint, bigint, bigint) , sequence(bigint, bigint) , sequence(timestamp, timestamp, interval day to second) , sequence(timestamp, timestamp, interval year to month)
Is this a limitation of Athena's flavour of Presto SQL, or have I made a schoolboy error somewhere?

You need to provide an interval to generate a date sequence (in this case interval '1' day):
WITH dataset AS (
SELECT *
FROM
( VALUES
('A', DATE '2021-08-21'), ('A', DATE '2021-08-25'),
('B', DATE '2021-08-07'), ('B', DATE '2021-08-24')
) AS d (job_name, run_date)
)
select job_name, sequence(min(run_date), max(run_date), interval '1' day) seq
from dataset
group by job_name
Output:
job_name | seq
A | [2021-08-21 00:00:00.000, 2021-08-22 00:00:00.000, 2021-08-23 00:00:00.000, 2021-08-24 00:00:00.000, 2021-08-25 00:00:00.000]
B | [2021-08-07 00:00:00.000, 2021-08-08 00:00:00.000, 2021-08-09 00:00:00.000, 2021-08-10 00:00:00.000, 2021-08-11 00:00:00.000, 2021-08-12 00:00:00.000, 2021-08-13 00:00:00.000, 2021-08-14 00:00:00.000, 2021-08-15 00:00:00.000, 2021-08-16 00:00:00.000, 2021-08-17 00:00:00.000, 2021-08-18 00:00:00.000, 2021-08-19 00:00:00.000, 2021-08-20 00:00:00.000, 2021-08-21 00:00:00.000, 2021-08-22 00:00:00.000, 2021-08-23 00:00:00.000, 2021-08-24 00:00:00.000]
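To make the intended end result concrete, here is a small Python sketch (an illustration, not Athena code) of what the sequence plus unnest combination is meant to produce: one row per job per day between that job's minimum and maximum run_date. The function name `expand_ranges` is a hypothetical helper.

```python
from datetime import date, timedelta

# Sample rows from the question: (job_name, run_date).
rows = [
    ("A", date(2021, 8, 21)), ("A", date(2021, 8, 25)),
    ("B", date(2021, 8, 7)), ("B", date(2021, 8, 24)),
]

def expand_ranges(rows):
    """Return one (job_name, day) row for every day in each job's min..max range."""
    bounds = {}
    for job, day in rows:
        lo, hi = bounds.get(job, (day, day))
        bounds[job] = (min(lo, day), max(hi, day))

    expanded = []
    for job in sorted(bounds):
        lo, hi = bounds[job]
        # Step one day at a time, the equivalent of interval '1' day.
        for offset in range((hi - lo).days + 1):
            expanded.append((job, lo + timedelta(days=offset)))
    return expanded

expanded = expand_ranges(rows)
for job, day in expanded:
    print(job, "|", day.isoformat())
```

The inner loop plays the role of the one-day step, and appending a row per day corresponds to flattening the array with CROSS JOIN UNNEST as in the question's query.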

Combining temp tables in series

Say I have 6 temp tables stored as follows (these 3 are samples), and I would like to combine them into 1 single table, stacked in series (headers Date, Com, Price).
Com A
Date Price
2015-05-01 00:00:00.000 34.25
2015-05-02 00:00:00.000 35.20
2015-05-03 00:00:00.000 36.70
2015-05-04 00:00:00.000 32.37
2015-05-05 00:00:00.000 32.40
2015-05-06 00:00:00.000 32.20
Com B
Date Price
2015-05-07 00:00:00.000 54.29
2015-05-08 00:00:00.000 54.50
2015-05-09 00:00:00.000 56.21
2015-05-10 00:00:00.000 56.70
2015-05-11 00:00:00.000 58.20
Com C
Date Price
2015-05-12 00:00:00.000 34.29
2015-05-13 00:00:00.000 24.50
2015-05-14 00:00:00.000 76.21
2015-05-15 00:00:00.000 36.70
2015-05-16 00:00:00.000 48.20
The output should look like this, and I would like to store it as another temp table for merging later:
Date Com Price
2015-05-01 00:00:00.000 A 34.25
2015-05-02 00:00:00.000 A 35.20
2015-05-03 00:00:00.000 A 36.70
2015-05-04 00:00:00.000 A 32.37
2015-05-05 00:00:00.000 A 32.40
2015-05-06 00:00:00.000 A 32.20
2015-05-07 00:00:00.000 B 54.29
2015-05-08 00:00:00.000 B 54.50
2015-05-09 00:00:00.000 B 56.21
2015-05-10 00:00:00.000 B 56.70
2015-05-11 00:00:00.000 B 58.20
2015-05-12 00:00:00.000 C 34.29
2015-05-13 00:00:00.000 C 24.50
2015-05-14 00:00:00.000 C 76.21
2015-05-15 00:00:00.000 C 36.70
2015-05-16 00:00:00.000 C 48.20

Seems like a simple union all to me:
SELECT [Date], 'A' as Com, Price
FROM [Com A]
UNION ALL
SELECT [Date], 'B' as Com, Price
FROM [Com B]
UNION ALL
SELECT [Date], 'C' as Com, Price
FROM [Com C]

Based on your sample data
Select Date,'A' AS Com,Price from [COM A]
UNION ALL
Select Date,'B' AS Com,Price from [COM B]
UNION ALL
Select Date,'C' AS Com,Price from [COM C]
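For readers who want to see what UNION ALL does to the rows, the following Python sketch stacks the tables and tags each row with its source. It uses only the first two rows of each sample table for brevity, and the `tables` and `combined` names are assumptions made for illustration.

```python
# First two rows of each sample table from the question: (Date, Price).
tables = {
    "A": [("2015-05-01 00:00:00.000", 34.25), ("2015-05-02 00:00:00.000", 35.20)],
    "B": [("2015-05-07 00:00:00.000", 54.29), ("2015-05-08 00:00:00.000", 54.50)],
    "C": [("2015-05-12 00:00:00.000", 34.29), ("2015-05-13 00:00:00.000", 24.50)],
}

# Append the tables one after another, adding the table label as the Com column.
# Like UNION ALL, this keeps every row and does not remove duplicates.
combined = [
    (day, com, price)
    for com, rows in tables.items()
    for day, price in rows
]

for row in combined:
    print(*row)
```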

How to group field by id and find the sum?

I have the following data
id starting_point ending_point Date
A 2525 6565 25/05/2017 13:25:00
B 5656 8989 25/01/2017 10:55:00
A 1234 5656 20/05/2017 03:20:00
A 4562 6245 01/02/2017 19:45:00
B 6496 9999 06/12/2016 21:55:00
B 1122 2211 20/03/2017 18:30:00
How can I group the data by id, in ascending order of date, and find the sum of the first starting point and the last ending point? In this case, the expected output is:
id starting_point ending_point Date Value
A 4562 6245 01/02/2017 19:45:00
A 1234 5656 20/05/2017 03:20:00
A 2525 6565 25/05/2017 13:25:00 4562 + 6565 = 11127
B 6496 9999 06/12/2016 21:55:00
B 1122 2211 20/03/2017 18:30:00 6496 + 2211 = 8707

IIUC:
In [146]: x.groupby('id').apply(lambda df: df['starting_point'].head(1).values[0]
+ df['ending_point'].tail(1).values[0])
Out[146]:
id
A 8770
B 7867
dtype: int64
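Note that the snippet above takes the first and last rows in their stored order, which is why it returns 8770 and 7867 rather than the values in the question. A minimal sketch in plain Python (no pandas, with a hypothetical helper name) that first sorts each group by the parsed day-first date:

```python
from datetime import datetime

# Sample rows from the question: (id, starting_point, ending_point, Date).
rows = [
    ("A", 2525, 6565, "25/05/2017 13:25:00"),
    ("B", 5656, 8989, "25/01/2017 10:55:00"),
    ("A", 1234, 5656, "20/05/2017 03:20:00"),
    ("A", 4562, 6245, "01/02/2017 19:45:00"),
    ("B", 6496, 9999, "06/12/2016 21:55:00"),
    ("B", 1122, 2211, "20/03/2017 18:30:00"),
]

def first_start_plus_last_end(rows):
    """Per id: earliest row's starting_point plus latest row's ending_point."""
    groups = {}
    for rid, start, end, stamp in rows:
        # Dates are day-first (dd/mm/yyyy), so parse them before sorting.
        when = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
        groups.setdefault(rid, []).append((when, start, end))

    totals = {}
    for rid, items in groups.items():
        items.sort(key=lambda item: item[0])
        totals[rid] = items[0][1] + items[-1][2]
    return totals

totals = first_start_plus_last_end(rows)
print(totals)  # {'A': 11127, 'B': 8707}
```

In pandas the same effect should be achievable by converting Date with `pd.to_datetime(..., dayfirst=True)` and calling `sort_values` before the `groupby`.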

SQL How to order by and keep expanded row by another order?

My result table is this:
FDate FTime FNo FId FRID FRCont
2016-12-19 07:25:00 1254 A1 A1 1
2016-12-19 08:45:00 1322 A2 A1 2
2016-12-19 13:20:00 4521 B1 B1 1
2016-12-19 16:40:00 7841 B2 B1 2
2016-12-19 20:45:00 1258 B3 B1 3
2016-12-19 11:25:00 3254 C1 C1 1
2016-12-19 13:10:00 3145 C2 C1 2
2016-12-19 15:20:00 3333 C3 C1 3
2016-12-20 07:35:00 7777 C4 C1 4
2016-12-20 08:50:00 7851 D1 D1 1
2016-12-20 10:30:00 45123 D2 D1 2
I want to order by the date and time of the FRCont = 1 rows,
but I do not want to break up the groups of rows that belong together through the FRID and FRCont columns.
The result should look like this:
FDate FTime FNo FId FRID FRCont
2016-12-19 07:25:00 1254 A1 A1 1
2016-12-19 08:45:00 1322 A2 A1 2
2016-12-19 11:25:00 3254 C1 C1 1
2016-12-19 13:10:00 3145 C2 C1 2
2016-12-19 15:20:00 3333 C3 C1 3
2016-12-20 07:35:00 7777 C4 C1 4
2016-12-19 13:20:00 4521 B1 B1 1
2016-12-19 16:40:00 7841 B2 B1 2
2016-12-19 20:45:00 1258 B3 B1 3
2016-12-20 08:50:00 7851 D1 D1 1
2016-12-20 10:30:00 45123 D2 D1 2
Please solve this in any way with a SQL Server query.
Thanks a lot.

I think you are looking for something like this:
SELECT FDate, FTime, FNo, FId, FRID, FRCont
FROM (
SELECT FDate, FTime, FNo, FId, FRID, FRCont,
MIN(FDate) OVER (PARTITION BY FRID) AS Min_Date,
MIN(FTime) OVER (PARTITION BY FRID) AS Min_Time
FROM mytable ) AS t
ORDER BY Min_Date, Min_Time, FRID, FDate, FTime
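The pair (Min_Date, Min_Time) gives the starting datetime value per FRID slice. Using this pair we can order the slices, placing first the slice with the lowest datetime value, followed by the slice with the next lowest, and so on. Note that Min_Date and Min_Time are computed independently, so Min_Time can come from a row on a later date than Min_Date (for FRID C1 it is 07:35:00, from the 2016-12-20 row). The ordering still comes out right for this sample, but taking the minimum of the combined date and time is more robust.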
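As an illustration of the slice-ordering idea, here is a Python sketch (not SQL Server code) that combines FDate and FTime into one value, takes the minimum per FRID, and sorts by that minimum and then by FRCont. The helper `stamp` and the variable names are assumptions for the example.

```python
from datetime import datetime

# Rows from the question: (FDate, FTime, FNo, FId, FRID, FRCont).
rows = [
    ("2016-12-19", "07:25:00", 1254, "A1", "A1", 1),
    ("2016-12-19", "08:45:00", 1322, "A2", "A1", 2),
    ("2016-12-19", "13:20:00", 4521, "B1", "B1", 1),
    ("2016-12-19", "16:40:00", 7841, "B2", "B1", 2),
    ("2016-12-19", "20:45:00", 1258, "B3", "B1", 3),
    ("2016-12-19", "11:25:00", 3254, "C1", "C1", 1),
    ("2016-12-19", "13:10:00", 3145, "C2", "C1", 2),
    ("2016-12-19", "15:20:00", 3333, "C3", "C1", 3),
    ("2016-12-20", "07:35:00", 7777, "C4", "C1", 4),
    ("2016-12-20", "08:50:00", 7851, "D1", "D1", 1),
    ("2016-12-20", "10:30:00", 45123, "D2", "D1", 2),
]

def stamp(row):
    """Combine FDate and FTime into a single comparable datetime."""
    return datetime.strptime(row[0] + " " + row[1], "%Y-%m-%d %H:%M:%S")

# Earliest combined datetime per FRID, like MIN(...) OVER (PARTITION BY FRID).
group_start = {}
for row in rows:
    frid = row[4]
    group_start[frid] = min(group_start.get(frid, stamp(row)), stamp(row))

# Order slices by their start, then keep rows in FRCont order inside each slice.
ordered = sorted(rows, key=lambda r: (group_start[r[4]], r[4], r[5]))

for row in ordered:
    print(*row)
```

Including FRID in the sort key keeps two slices apart even if they happen to start at the same moment.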

You seem to want to sort by the minimum date/time for each group:
select t.*
from t
order by min(date + time) over (partition by frid),
frid,
fid;
Note: You might have to convert the date/time to datetime for the addition to work.

Please try this:
select FDate,FTime, FNo,FId,FRID,FRCont from (
select t.*
,min(fdate+ftime) over (partition by frid) mn
from t) t
order by mn, frcont;

Select Sql row to field

I have this data in a table:
RowID PerID Date Time RowNumber
------------------------------------------------
2393 1856 2015-07-29 00:52:55 1
2408 1856 2015-07-29 19:13:32 2
2394 1864 2015-07-29 00:57:17 1
2399 1864 2015-07-29 11:07:26 2
2403 1864 2015-07-29 15:25:42 3
2406 1864 2015-07-29 19:06:37 4
2395 1877 2015-07-29 01:10:23 1
2407 1877 2015-07-29 19:13:26 2
2409 1881 2015-07-29 19:13:52 1
2391 1882 2015-07-29 00:32:15 1
2396 1882 2015-07-29 11:05:51 2
2397 1882 2015-07-29 11:05:53 3
2398 1882 2015-07-29 11:06:01 4
2401 1882 2015-07-29 15:20:16 5
2404 1882 2015-07-29 19:04:07 6
2392 1883 2015-07-29 00:35:50 1
2400 1883 2015-07-29 11:17:30 2
2402 1883 2015-07-29 15:24:10 3
2405 1883 2015-07-29 19:06:20 4
I want to create this table from the data above:
RowID PerID io_num ioDate InTime OutTime
----------------------------------------------
1 1856 1 2015-07-29 00:52:55 19:13:32
2 1864 1 2015-07-29 00:57:17 11:07:26
3 1864 2 2015-07-29 15:25:42 19:06:37
4 1877 1 2015-07-29 01:10:23 19:13:26
5 1881 1 2015-07-29 19:13:52 null
6 1882 1 2015-07-29 00:32:15 11:05:51
7 1882 2 2015-07-29 11:05:53 11:06:01
8 1882 3 2015-07-29 15:20:16 19:04:07
9 1883 1 2015-07-29 00:35:50 11:17:30
10 1883 2 2015-07-29 15:24:10 19:06:20
Please help me.
Thanks.

WITH calc_time as (
SELECT
t1.PerID,
t1.Date ioDate,
t1.Time InTime,
t2.Time OutTime
FROM mytable t1 left join
mytable t2 on
t1.PerId = t2.PerID
and t1.RowNumber = t2.RowNumber - 1
WHERE
(t1.RowNumber % 2) = 1
)
SELECT
ROW_NUMBER() OVER(ORDER BY c.PerID, c.ioDate, c.InTime) AS RowID,
c.PerID,
ROW_NUMBER() OVER(Partition BY PerID ORDER BY ioDate, InTime) AS io_num,
c.ioDate,
c.InTime,
c.OutTime
FROM
calc_time c
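The self-join above pairs every odd-numbered row with the following even-numbered row of the same PerID. A small Python sketch of that pairing, assuming (as the sample suggests) that RowNumber restarts at 1 for each PerID; `pair_in_out` is a hypothetical helper name:

```python
# Rows from the question: (PerID, Date, Time, RowNumber).
rows = [
    (1856, "2015-07-29", "00:52:55", 1), (1856, "2015-07-29", "19:13:32", 2),
    (1864, "2015-07-29", "00:57:17", 1), (1864, "2015-07-29", "11:07:26", 2),
    (1864, "2015-07-29", "15:25:42", 3), (1864, "2015-07-29", "19:06:37", 4),
    (1877, "2015-07-29", "01:10:23", 1), (1877, "2015-07-29", "19:13:26", 2),
    (1881, "2015-07-29", "19:13:52", 1),
    (1882, "2015-07-29", "00:32:15", 1), (1882, "2015-07-29", "11:05:51", 2),
    (1882, "2015-07-29", "11:05:53", 3), (1882, "2015-07-29", "11:06:01", 4),
    (1882, "2015-07-29", "15:20:16", 5), (1882, "2015-07-29", "19:04:07", 6),
    (1883, "2015-07-29", "00:35:50", 1), (1883, "2015-07-29", "11:17:30", 2),
    (1883, "2015-07-29", "15:24:10", 3), (1883, "2015-07-29", "19:06:20", 4),
]

def pair_in_out(rows):
    """Pair each odd RowNumber (in) with the next even RowNumber (out) per PerID."""
    by_key = {(per, num): time for per, _day, time, num in rows}
    paired = []
    for per, day, time, num in sorted(rows, key=lambda r: (r[0], r[3])):
        if num % 2 == 1:
            # A missing partner row yields None, like the LEFT JOIN's NULL.
            out_time = by_key.get((per, num + 1))
            paired.append((per, (num + 1) // 2, day, time, out_time))
    # Prepend a running RowID starting at 1.
    return [(i,) + row for i, row in enumerate(paired, start=1)]

result = pair_in_out(rows)
for row in result:
    print(*row)
```

The unmatched entry for PerID 1881 keeps an empty OutTime, which corresponds to the null in the desired output.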
