Enable expanded table formatting in HIVE CLI? - hive

Similar to this question: Is there a way to toggle expanded table formatting mode in PrestoDB cli?.
Is there a way to enable expanded table formatting mode in HIVE? I want to inspect a few records in a wide table before starting a big query job.
Copying the example from the other question:
Before expanded table formatting:
select * from sometable;
id | time | humanize_time | value
----+-------+---------------------------------+-------
1 | 09:30 | Early Morning - (9.30 am) | 570
2 | 11:30 | Late Morning - (11.30 am) | 690
3 | 13:30 | Early Afternoon - (1.30pm) | 810
4 | 15:30 | Late Afternoon - (3.30 pm) | 930
(4 rows)
After:
select * from sometable;
-[ RECORD 1 ]-+---------------------------
id | 1
time | 09:30
humanize_time | Early Morning - (9.30 am)
value | 570
-[ RECORD 2 ]-+---------------------------
id | 2
time | 11:30
humanize_time | Late Morning - (11.30 am)
value | 690
-[ RECORD 3 ]-+---------------------------
id | 3
time | 13:30
humanize_time | Early Afternoon - (1.30pm)
value | 810
-[ RECORD 4 ]-+---------------------------
id | 4
time | 15:30
humanize_time | Late Afternoon - (3.30 pm)
value | 930

You can use a combination of CROSS JOIN , CASE and UNION ALL.
select
c.col,
case c.col
when 'id' then id
when 'time' then time
when 'humanize_time' then humanize_time
when 'value' then value
end as data
from sometable t
cross join
(
select 'id' as col
union all select 'time' as col
union all select 'humanize_time' as col
union all select 'value' as col
) c ORDER BY id;

Related

SQL: usage time of item between dates combining two tables

Trying to create query that will give me usage time of each car part between dates when that part is used. Etc. let say part id 1 is installed on 2018-03-01 and on 2018-04-01 runs for 50min and then on 2018-05-10 runs 30min total usage of this part shoud be 1:20min as result.
These are examples of my tables.
Table1
| id | part_id | car_id | part_date |
|----|-------- |--------|------------|
| 1 | 1 | 3 | 2018-03-01 |
| 2 | 1 | 1 | 2018-03-28 |
| 3 | 1 | 3 | 2018-05-10 |
Table2
| id | car_id | run_date | puton_time | putoff_time |
|----|--------|------------|---------------------|---------------------|
| 1 | 3 | 2018-04-01 | 2018-04-01 12:00:00 | 2018-04-01 12:50:00 |
| 2 | 2 | 2018-04-10 | 2018-04-10 15:10:00 | 2018-04-10 15:20:00 |
| 3 | 3 | 2018-05-10 | 2018-05-10 10:00:00 | 2018-05-10 10:30:00 |
| 4 | 1 | 2018-05-11 | 2018-05-11 12:00:00 | 2018-04-01 12:50:00 |
Table1 contains dates when each part is installed, table2 contains usage time of each part and they are joined on car_id, I have try to write query but it does not work well if somebody can figure out my mistake in this query that would be healpful.
My SQL query
SELECT SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(t1.puton_time, t1.putoff_time)))) AS total_time
FROM table2 t1
LEFT JOIN table1 t2 ON t1.car_id=t2.car_id
WHERE t2.id=1 AND t1.run_date BETWEEN t2.datum AND
(SELECT COALESCE(MIN(datum), '2100-01-01') AS NextDate FROM table1 WHERE
id=1 AND t2.part_date > part_date);
Expected result
| part_id | total_time |
|---------|------------|
| 1 | 1:20:00 |
Hope that this problem make sence because in my search I found nothing like this, so I need help.
Solution, thanks to Kota Mori
SELECT t1.id, SEC_TO_TIME(SUM(TIME_TO_SEC(TIMEDIFF(t2.puton_time, t2.putoff_time)))) AS total_time
FROM table1 t1
LEFT JOIN table2 t2 ON t1.car_id = t2.car_id
AND t1.part_date >= t2.run_date
GROUP BY t1.id
You first need to join the two tables by the car_id and also a condition that part_date should be no greater than run_date.
Then compute the total minutes for each part_id separately.
The following is a query example for SQLite (The only SQL engine that I have access to right now).
Since SQLite does not have datetime type, I convert strings into unix timestamp by strftime function. This part should be changed in accordance with the SQL engine you are using. Apart from that, this is fairly a standard sql and mostly valid for other SQL dialect.
SELECT
t1.id,
sum(
cast(strftime('%s', t2.putoff_time) as integer) -
cast(strftime('%s', t2.puton_time) as integer)
) / 60 AS total_minutes
FROM
table1 t1
LEFT JOIN
table2 t2
ON
t1.car_id = t2.car_id
AND t1.part_date <= t2.run_date
GROUP BY
t1.id
The result is something like the below. Note that ID 1 gets 80 minutes (1:20) as expected.
id total_minutes
0 1 80
1 2 80
2 3 30

Optimizing results for query with WHERE EXISTS clause

I have this table in postgres:
id | id_datetime | longitude | latitude
--------+---------------------+---------------------+--------------------
639438 | 2018-02-20 18:00:00 | -122.3880011217841 | 37.75538988423265
639439 | 2018-02-20 20:30:00 | -122.38756878451498 | 37.760550220844614
639440 | 2018-02-20 20:05:00 | -122.39640513677658 | 37.76130039041195
639441 | 2018-02-24 10:00:00 | -122.45819139221014 | 37.724317534370066
639442 | 2018-02-10 09:00:00 | -122.44693382058489 | 37.77000760474354
I want an output with all the differents ID's which has at least another ID between the last 15 minutes and between 1000 meters (geographic distance).
My table has more than 100K rows. So, I'm currently trying with the following query which works but takes too long. Are there any steps I can take to optimize this?
SELECT DISTINCT
x.id
FROM table x
WHERE EXISTS(
SELECT
1
FROM table t
WHERE t.id <> x.id
AND (t.id_datetime between x.id_datetime - interval '15 minutes' AND x.id_datetime)
AND (ST_Distance((geography(ST_MakePoint(x.longitude, x.latitude))),
geography(ST_MakePoint(t.longitude, t.latitude)) ) <= 1000)
)

Split rows on different days if summing hours value to given day exceeds midnight

I have a structure like this
+-----+-----+------------+----------+------+----------------------+---+
| Row | id | date | time | hour | description | |
+-----+-----+------------+----------+------+----------------------+---+
| 1 | foo | 2018-03-02 | 19:00:00 | 8 | across single day | |
| 2 | bar | 2018-03-02 | 23:00:00 | 1 | end at midnight | |
| 3 | qux | 2018-03-02 | 10:00:00 | 3 | inside single day | |
| 4 | quz | 2018-03-02 | 23:15:00 | 2 | with minutes | |
+-----+-----+------------+----------+------+----------------------+---+
(I added the description column only to understand the context, for analysis purpose is useless)
Here is the statement to generate table
WITH table AS (
SELECT "foo" as id, CURRENT_dATE() AS date, TIME(19,0,0) AS time,8 AS hour
UNION ALL
SELECT "bar", CURRENT_dATE(), TIME(23,0,0), 1
UNION ALL
SELECT "qux", CURRENT_dATE(), TIME(10,0,0), 3
UNION ALL
SELECT "quz", CURRENT_dATE(), TIME(23,15,0), 2
)
SELECT * FROM table
Adding the hour value to the given time, I need to split the row on multiple ones, if the sum goes on the next day.
Jumps on multiple days are NOT to be considered, like +27 hours (this should simplify the scenario)
My initial idea was starting from adding the hours value in a date field, in order to obtain start and end limits of the interval
SELECT
id,
DATETIME(date, time) AS date_start,
DATETIME_ADD(DATETIME(date, time), INTERVAL hour HOUR) AS date_end
FROM table
here is the result
+-----+-----+---------------------+---------------------+---+
| Row | id | date_start | date_end | |
+-----+-----+---------------------+---------------------+---+
| 1 | foo | 2018-03-02T19:00:00 | 2018-03-03T03:00:00 | |
| 2 | bar | 2018-03-02T23:00:00 | 2018-03-03T00:00:00 | |
| 3 | qux | 2018-03-02T10:00:00 | 2018-03-02T13:00:00 | |
| 4 | quz | 2018-03-02T23:15:00 | 2018-03-03T01:15:00 | |
+-----+-----+---------------------+---------------------+---+
but now I'm stuck on how to proceed considering the existing interval.
Starting from this table, the rows should be splitted if the day change, like
+-----+-----+------------+-------------+----------+-------+--+
| Row | id | date | hourt_start | hour_end | hours | |
+-----+-----+------------+-------------+----------+-------+--+
| 1 | foo | 2018-03-02 | 19:00:00 | 00:00:00 | 5 | |
| 2 | foo | 2018-03-03 | 00:00:00 | 03:00:00 | 3 | |
| 3 | bar | 2018-03-02 | 23:00:00 | 00:00:00 | 1 | |
| 4 | qux | 2018-03-02 | 10:00:00 | 13:00:00 | 3 | |
| 5 | quz | 2018-03-02 | 23:15:00 | 00:00:00 | 0.75 | |
| 6 | quz | 2018-03-03 | 00:00:00 | 01:15:00 | 1.25 | |
+-----+-----+------------+-------------+----------+-------+--+
I tried to study a similar scenario from an already analyzed scenario, but I was unable to adapt it for handling the day component as well.
My whole final scenario will include both this approach and the other one analyzed in the other question (split on single days and then split on given breaks of hours), but I can approach these 2 themes separately, first query split with day (this question) and then split on time breaks (other question)
Interesting problem ... I tried the following:
Create a second table creating all the new rows starting at midnight
UNION ALL it with source table while correcting hours of old rows accordingly
Commented Result:
WITH table AS (
SELECT "foo" as id, CURRENT_dATE() AS date, TIME(19,0,0) AS time,8 AS hour
UNION ALL
SELECT "bar", CURRENT_dATE(), TIME(23,0,0), 1
UNION ALL
SELECT "qux", CURRENT_dATE(), TIME(10,0,0), 3
)
,table2 AS (
SELECT
id,
-- create datetime, add hours, then cast as date again
CAST( datetime_add( datetime(date, time), INTERVAL hour HOUR) AS date) date,
time(0,0,0) AS time -- losing minutes and seconds
-- substract hours to midnight
,hour - (24-EXTRACT(HOUR FROM time)) hour
FROM
table
WHERE
date != CAST( datetime_add( datetime(date,time), INTERVAL hour HOUR) AS date) )
SELECT
id
,date
,time
-- correct hour if midnight split
,IF(EXTRACT(hour from time)+hour > 24,24-EXTRACT(hour from time),hour) hour
FROM
table
UNION ALL
SELECT
*
FROM
table2
Hope, it makes sense.
Of course, if you need to consider jumps over multiple days, the correction fails :)
Here a possibile solution I came up starting from #Martin Weitzmann approach.
I used 2 different ways:
ids where there is a "jump" on the day
ids which are in the same day
and a final UNION ALL of the two data
I forgot to mention the first time that the hours value of the input value can be float (portion of hours) so I added that too.
#standardSQL
WITH
input AS (
-- change of day
SELECT "bap" as id, CURRENT_dATE() AS date, TIME(19,0,0) AS time, 8.0 AS hour UNION ALL
-- end at midnight
SELECT "bar", CURRENT_dATE(), TIME(23,0,0), 1.0 UNION ALL
-- inside single day
SELECT "foo", CURRENT_dATE(), TIME(10,0,0), 3.0 UNION ALL
-- change of day with minutes and float hours
SELECT "qux", CURRENT_dATE(), TIME(23,15,0), 2.5 UNION ALL
-- start from midnight
SELECT "quz",CURRENT_dATE(), TIME(0,0,0), 4.5
),
-- Calculate end_date and end_time summing hours value
table AS (
SELECT
id,
date AS start_date,
time AS start_time,
EXTRACT(DATE FROM DATETIME_ADD(DATETIME(date,time), INTERVAL CAST(hour*3600 AS INT64) SECOND)) AS end_date,
EXTRACT(TIME FROM DATETIME_ADD(DATETIME(date,time), INTERVAL CAST(hour*3600 AS INT64) SECOND)) AS end_time
FROM input
),
-- portion that start from start_time and end at midnight
start_to_midnight AS (
SELECT
id,
start_time,
start_date,
TIME(23,59,59) as end_time,
start_date as end_date
FROM
table
WHERE end_date > start_date
),
-- portion that start from midnightand end at end_time
midnight_to_end AS (
SELECT
id,
TIME(0,0,0) as start_time,
end_date as start_date,
end_time,
end_date
FROM
table
WHERE
end_date > start_date
-- Avoid rows that starts from 0:0:0 and ends to 0:0:0 (original row ends at 0:0:0)
AND end_time != TIME(0,0,0)
)
-- Union of the 3 tables
SELECT
id,
start_date,
start_time,
end_time
FROM (
SELECT id, start_time, end_time, start_date FROM table WHERE start_date = end_date
UNION ALL
SELECT id, start_time, end_time, start_date FROM start_to_midnight
UNION ALL
SELECT id, start_time, end_time, start_date FROM midnight_to_end
)
ORDER BY id,start_date,start_time
Here is the provided output
+-----+-----+------------+------------+----------+---+
| Row | id | start_date | start_time | end_time | |
+-----+-----+------------+------------+----------+---+
| 1 | bap | 2018-03-03 | 19:00:00 | 23:59:59 | |
| 2 | bap | 2018-03-04 | 00:00:00 | 03:00:00 | |
| 3 | bar | 2018-03-03 | 23:00:00 | 23:59:59 | |
| 4 | foo | 2018-03-03 | 10:00:00 | 13:00:00 | |
| 5 | qux | 2018-03-03 | 23:15:00 | 23:59:59 | |
| 6 | qux | 2018-03-04 | 00:00:00 | 01:45:00 | |
| 7 | quz | 2018-03-03 | 00:00:00 | 04:30:00 | |
+-----+-----+------------+------------+----------+---+

sql select rows with time range

Have a table scheme that look as :
events{id int, date date, begin time, end time, title varchar)
and want to retrieve rows from a web user form, by example : 11:00 (i'm french so use 24h format)
So, if i have in the table the following rows :
1 | 2016-03-29 | 11:15 | 12:00
2 | 2016-03-29 | 09:45 | 11:15
3 | 2016-03-29 | 10:45 | 13:00
4 | 2016-03-29 | 12:45 | 14:45
I would like to be able to count, or retrieve rows have events that have 11:00 in the range :
2 | 2016-03-29 | 09:45 | 11:15
3 | 2016-03-29 | 10:45 | 13:00
Note : i use PostgreSQL but want to respect SQL normalization to achieve...
Thanks
Use Between operator
Try this
select *
from yourtable
where '11:00' between "begin" and "end"

Transform log table into another table with values spread over every month

I have a table with rows consisting of values and timestamps. A row is inserted whenever the value has changed, and the timestamp indicates when the change has happened. For example, something like this:
id | value | timestamp
-----+-------+----------------------------
1 | 736 | 2014-03-18 16:38:22.20548
2 | 531 | 2014-06-18 16:38:22.664324
3 | 24 | 2014-07-18 16:38:22.980137
4 | 530 | 2014-09-22 10:01:36.13856
5 | 529 | 2014-09-23 10:01:27.202026
I need a query in Postgresql which generates a table with one row for every month of the year. Every row has a timestamp (first day of the month) and a value. The value is the last value of the first table which was inserted before the beginning of the given month. The value should be 0 if there are no matching rows in the first table. Something like this:
id | value | timestamp
-----+-------+----------------------------
1 | 0 | 2014-01-01 00:00:00.000000
2 | 0 | 2014-02-01 00:00:00.000000
3 | 0 | 2014-03-01 00:00:00.000000
4 | 736 | 2014-04-01 00:00:00.000000
5 | 736 | 2014-05-01 00:00:00.000000
6 | 736 | 2014-06-01 00:00:00.000000
7 | 531 | 2014-07-01 00:00:00.000000
8 | 24 | 2014-08-01 00:00:00.000000
9 | 24 | 2014-09-01 00:00:00.000000
10 | 529 | 2014-10-01 00:00:00.000000
11 | 529 | 2014-11-01 00:00:00.000000
12 | 529 | 2014-12-01 00:00:00.000000
I tried for a while, but I didn't manage to get the full result. I guess I need to generate a list of months like this.
SELECT
*
FROM
generate_series('2014-01-01 00:00'::timestamp, now(), '1 month') AS months
And then do something like this to get the last occurrency before a month:
SELECT
*
FROM first_table
WHERE timestamp < --current_month_selection--
ORDER BY timestamp desc
LIMIT 1;
I guess one needs an OUTER JOIN and a CASE conditional...
Unfortunately I didn't manage to put it all together. Can somebody help me?
Actually, I think I solved this by myself, it was quite trivial:
SELECT
month,
COALESCE((SELECT value
FROM first_table
WHERE timestamp < month
ORDER BY timestamp DESC
LIMIT 1),0)
FROM
generate_series('2014-01-01 00:00'::timestamp, now(), '1 month') AS month