I have a Hive table with two rows like this:
0: jdbc:hive2://localhost:10000/default> select * from t2;
+-----+--------+
| id | value |
+-----+--------+
| 10 | 100 |
| 11 | 101 |
+-----+--------+
2 rows selected (1.116 seconds)
But when I issue this query:
select cast(1 as timestamp) from t2;
it gives inconsistent results. Can anyone tell me the reason?
0: jdbc:hive2://localhost:10000/default> select cast(1 as timestamp) from t2;
+--------------------------+
| _c0 |
+--------------------------+
| 1970-01-01 07:00:00.001 |
| 1970-01-01 07:00:00.001 |
+--------------------------+
2 rows selected (0.913 seconds)
0: jdbc:hive2://localhost:10000/default> select cast(1 as timestamp) from t2;
+--------------------------+
| _c0 |
+--------------------------+
| 1970-01-01 08:00:00.001 |
| 1970-01-01 07:00:00.001 |
+--------------------------+
2 rows selected (1.637 seconds)
I can't reproduce your problem. Which Hive version are you using? Hive had a bug with timestamp and bigint (see https://issues.apache.org/jira/browse/HIVE-3454), but it doesn't explain your problem. For example, Hive 0.14 gives different results for the two expressions in:
SELECT cast(1 as timestamp), cast(cast(1 as double) as timestamp) from my_table limit 5;
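The difference between the two casts can be illustrated outside Hive. This is a minimal Python sketch, assuming (as HIVE-3454 describes for older versions) that an integer is read as milliseconds since the epoch while a double is read as seconds. The helper names from_millis and from_seconds are hypothetical, and the sketch does not call Hive or explain the row-to-row inconsistency in the question.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch in UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def from_millis(n):
    """Read n as milliseconds since the epoch (assumed integer-cast behaviour)."""
    return EPOCH + timedelta(milliseconds=n)

def from_seconds(x):
    """Read x as seconds since the epoch (assumed double-cast behaviour)."""
    return EPOCH + timedelta(seconds=x)

as_int = from_millis(1)        # one millisecond after the epoch
as_double = from_seconds(1.0)  # one second after the epoch
print(as_int.isoformat(), as_double.isoformat())
```

Under that assumption the same literal 1 lands a thousand times further from the epoch when it goes through a double first.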
Related
Good afternoon. Here is the problem: a train carries a geotag that reports its position, and the location data is written to a table. I need to count how many times the train was in a certain zone. The difficulty is that while the train stays in a zone, the geotag writes several records to the table over time. What query can be used to count the number of arrivals?
I wrote a query that counts how many times the train was in zone 270 and in zone 289. To do this, I rounded the time to hours, but if the train arrives at the end of one hour and leaves at the beginning of the next, the query counts that as two arrivals. The query is below.
Create temp table tmpTable_1 ON COMMIT DROP as
select addr,zone_id,DATE_PART('hour',time)*100 as IntTime from trac_path_rmp where time between '2022.04.06' and '2022.04.07';
Create temp table tmpTable_2 ON COMMIT DROP as select addr,zone_id,IntTime from tmpTable_1 where addr in (12421,12422,12423,12425) group by addr,zone_id,IntTime;
select addr,sum(case when zone_id=289 then 1 else 0 end) as "Zone 289", sum(case when zone_id=270 then 1 else 0 end) as "Zone 270" from tmpTable_2 group by addr order by addr;
We can use LAG() OVER () to get the timestamp of the previous row and return only the rows where the difference is at least one minute. This is easy to change, for example to 5 minutes.
We also keep the first row where LAG returns null.
We need to use hours and minutes because if we only use minutes we will get 0 time difference when there is exactly an hour between rows.
;WITH CTE AS
(SELECT
*,
time_ - LAG(time_) OVER (ORDER BY id) AS dd
FROM table_name)
SELECT
id,time_,addr,x,y,z,zone_id,type
FROM cte
WHERE 60 * DATE_PART('hours',dd) + DATE_PART('minutes',dd) > 0
OR dd IS null;
id | time_ | addr | x | y | z | zone_id | type
--: | :------------------ | ----: | ------: | ------: | ------: | ------: | ---:
138 | 2022-04-06 19:19:11 | 12421 | 9793.50 | 4884.70 | -125.00 | 270 | 1
141 | 2022-04-06 20:37:23 | 12421 | 9736.00 | 4856.90 | -125.00 | 270 | 1
146 | 2022-04-06 22:58:15 | 12421 | 9736.00 | 4856.90 | -125.00 | 270 | 1
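The same idea can be sketched in Python to make the row-by-row logic explicit. This is only an illustration under assumptions: the function name first_row_of_each_visit is hypothetical, the readings are the three kept rows from the result above plus invented repeats, and the full time difference is compared rather than its hour and minute parts.

```python
from datetime import datetime, timedelta

def first_row_of_each_visit(times, min_gap=timedelta(minutes=1)):
    """Keep a timestamp when it is the first one or follows the previous row by >= min_gap."""
    kept = []
    previous = None  # plays the role of LAG(time_): the row before the current one
    for current in times:  # times must already be sorted, like ORDER BY in the window
        if previous is None or current - previous >= min_gap:
            kept.append(current)
        previous = current
    return kept

# Hypothetical readings: the three kept rows plus invented repeats a few seconds later.
readings = [
    datetime(2022, 4, 6, 19, 19, 11),
    datetime(2022, 4, 6, 19, 19, 41),
    datetime(2022, 4, 6, 20, 37, 23),
    datetime(2022, 4, 6, 20, 37, 50),
    datetime(2022, 4, 6, 20, 38, 5),
    datetime(2022, 4, 6, 22, 58, 15),
]
arrivals = first_row_of_each_visit(readings)
print(len(arrivals))
```

Each row is compared with the row immediately before it, not with the last kept row, so a long stay that keeps logging every few seconds still counts as one arrival.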
I have a table of GPS traces with a Unix timestamp, as shown below:
SELECT * FROM mytable LIMIT 10;
id | lat | lon | seconds | speed
-----------+------------+------------+------------+-------
536889001 | 41.1794675 | -8.6017187 | 1460465697 | 1.25
536889001 | 41.1794709 | -8.601675 | 1460465698 | 2
536889001 | 41.1794636 | -8.6016337 | 1460465700 | 1.25
536889001 | 41.1794468 | -8.6016014 | 1460465700 | 2.5
536889001 | 41.1794114 | -8.6015662 | 1460465701 | 3.5
536889001 | 41.1794376 | -8.6015672 | 1460465703 | 1.5
536889001 | 41.17944 | -8.6015516 | 1460465703 | 1.5
536889001 | 41.1794315 | -8.6015353 | 1460465704 | 1.5
536889001 | 41.1794367 | -8.6015156 | 1460465705 | 1.25
536889001 | 41.1794337 | -8.6014974 | 1460465706 | 1.75
(10 rows)
Column seconds is the Unix timestamp. I would like to update the table by selecting ONLY one row for each timestamp that was logged more than once. For example, above there are two rows each at timestamps 1460465700 and 1460465703.
Without a unique id on the row, this is tricky. But assuming that the combination of values is unique, you can use:
update gps
set . . .
from (select gps.*, count(*) over (partition by id, seconds) as cnt,
row_number() over (partition by id, seconds order by seconds) as seqnum
from gps
) gps2
where gps2.cnt > 1 and gps2.seqnum = 1 and
gps2.seconds = gps.seconds and
gps2.id = gps.id and
gps2.speed = gps.speed and
gps2.lat = gps.lat and
gps2.lon = gps.lon ;
I would advise you to add a unique id to the table, so this is much simpler (and guaranteed to work even if the table has duplicates).
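To make the cnt and seqnum conditions concrete, here is a small Python sketch of the selection step only, using the ten rows from the question. The function name pick_one_per_duplicate_group is hypothetical, and "first encountered" stands in for seqnum = 1, which in the SQL is an arbitrary choice because the window is ordered by the partition key itself.

```python
from collections import Counter

# (id, lat, lon, seconds, speed) rows copied from the question.
rows = [
    (536889001, 41.1794675, -8.6017187, 1460465697, 1.25),
    (536889001, 41.1794709, -8.601675, 1460465698, 2),
    (536889001, 41.1794636, -8.6016337, 1460465700, 1.25),
    (536889001, 41.1794468, -8.6016014, 1460465700, 2.5),
    (536889001, 41.1794114, -8.6015662, 1460465701, 3.5),
    (536889001, 41.1794376, -8.6015672, 1460465703, 1.5),
    (536889001, 41.17944, -8.6015516, 1460465703, 1.5),
    (536889001, 41.1794315, -8.6015353, 1460465704, 1.5),
    (536889001, 41.1794367, -8.6015156, 1460465705, 1.25),
    (536889001, 41.1794337, -8.6014974, 1460465706, 1.75),
]

def pick_one_per_duplicate_group(rows):
    """Return one row for every (id, seconds) pair that occurs more than once."""
    counts = Counter((r[0], r[3]) for r in rows)  # like count(*) over (partition by id, seconds)
    seen = set()
    picked = []
    for r in rows:
        key = (r[0], r[3])
        if counts[key] > 1 and key not in seen:  # like cnt > 1 and seqnum = 1
            seen.add(key)
            picked.append(r)
    return picked

picked = pick_one_per_duplicate_group(rows)
print(picked)
```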
I have a table in Postgres as follows:
| id | start_time | end_time | duration |
|----|--------------------------|--------------------------|----------|
| 1 | 2018-05-11T00:00:20.631Z | 2018-05-11T01:03:14.496Z | 1:02:54 |
| 2 | 2018-05-11T00:00:04.877Z | 2018-05-11T00:00:14.641Z | 0:00:10 |
| 3 | 2018-05-11T01:03:28.063Z | 2018-05-11T01:04:36.410Z | 0:01:08 |
| 4 | 2018-05-11T00:00:20.631Z | 2018-05-11T02:03:14.496Z | 2:02:54 |
start_time and end_time are stored as varchar. Format is 'yyyy-mm-dd hh24:mi:ss.ms' (ISO format).
duration has been calculated as end_time - start_time. Format is hh:mi:ss.
I need result table output as follows:
| id | start_time | end_time | duration | start | end | duration_minutes |
|----|--------------------------|--------------------------|----------|-----------|-----------|------------------|
| 1 | 2018-05-11T00:00:20.631Z | 2018-05-11T01:03:14.496Z | 1:02:54 | 5/11/2018 | 5/11/2018 | 62 | -- (60+2)
| 2 | 2018-05-11T00:00:04.877Z | 2018-05-11T00:00:14.641Z | 0:00:10 | 5/11/2018 | 5/11/2018 | 0 |
| 3 | 2018-05-11T01:03:28.063Z | 2018-05-11T01:04:36.410Z | 0:01:08 | 5/11/2018 | 5/11/2018 | 1 |
| 4 | 2018-05-11T00:00:20.631Z | 2018-05-11T02:03:14.496Z | 2:02:54 | 5/11/2018 | 5/11/2018 | 122 | -- (2X60 +2)
start and end need to contain only the mm/dd/yyyy portion of start_time and end_time respectively.
duration_minutes should calculate total duration in minutes (eg, if duration is 1:02:54, duration in minutes should be 62 which is 60+2)
How can I do this using SQL?
Based on varchar input, this query produces exactly your desired result:
SELECT *
, to_char(start_time::timestamp, 'FMMM/DD/YYYY') AS start
, to_char(end_time::timestamp, 'FMMM/DD/YYYY') AS end
, extract(epoch FROM duration::interval)::int / 60 AS duration_minutes
FROM tbl;
Major points:
Use timestamp and interval instead of varchar to begin with.
Or do not store the functionally dependent column duration at all. It can cheaply be computed on the fly.
For display / a particular text representation use to_char().
Be explicit and do not rely on locale settings that may change from session to session.
The FM pattern modifier is for (quoting the manual):
fill mode (suppress leading zeroes and padding blanks)
extract(epoch FROM interval_type) produces the number of contained seconds. To truncate fractional minutes, use integer division: cast to int as demonstrated. Related:
Get difference in minutes between times with timezone
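The two conversions in this answer can be checked with a short Python sketch: whole minutes through integer division of the total seconds, and a date without leading zeroes, which is what the FM modifier produces. The function names date_only and duration_minutes are hypothetical, and the sketch assumes the text formats shown in the question.

```python
from datetime import datetime

def date_only(iso_text):
    """Format the date part as m/d/yyyy without leading zeroes (like 'FMMM/DD/YYYY')."""
    parsed = datetime.strptime(iso_text, "%Y-%m-%dT%H:%M:%S.%fZ")
    return f"{parsed.month}/{parsed.day}/{parsed.year}"

def duration_minutes(text):
    """Convert 'h:mm:ss' to whole minutes, truncating leftover seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    total_seconds = hours * 3600 + minutes * 60 + seconds  # like extract(epoch FROM ...)
    return total_seconds // 60                             # integer division truncates

print(date_only("2018-05-11T00:00:20.631Z"), duration_minutes("1:02:54"))
```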
The following appears to do what you want:
select v.starttime::timestamp::date, v.endtime::date,
extract(epoch from v.endtime::timestamp - v.starttime::timestamp)/60
from (values ('2018-05-11T00:00:20.631Z', '2018-05-11T01:03:14.496Z')) v(starttime, endtime)
If you want the dates in a particular format, then use to_char().
I need to convert timestamps with 1/1000 second resolution to 1/100 second resolution. I could possibly use the to_char(timestamp, text) formatting function for this, but I need help with the format text to use. Postgres way of doing this is here.
input table (note - the timestamp here is stored as varchar)
+-------------------------+
| ms1000_val |
+-------------------------+
| 2017/02/20 08:27:17.899 |
| 2017/02/20 08:23:43.894 |
| 2017/02/20 08:24:41.894 |
| 2017/02/20 08:28:09.899 |
+-------------------------+
output table
+------------------------+
| ms100_val |
+------------------------+
| 2017/02/20 08:27:17.89 |
| 2017/02/20 08:23:43.89 |
| 2017/02/20 08:24:41.89 |
| 2017/02/20 08:28:09.89 |
+------------------------+
Try this:
select cast(to_char(sub.field,'YYYY-MM-DD HH24:MI:SS') as timestamp)
+ interval '10 millisecond' * (cast(to_char(sub.field,'MS') as integer)/10) as converted_value
from (
select to_timestamp('2017/02/20 08:27:17.899','YYYY/MM/DD HH24:MI:SS.MS') as field
union
select to_timestamp('2017/02/20 08:23:43.894','YYYY/MM/DD HH24:MI:SS.MS')
union
select to_timestamp('2017/02/20 08:24:41.894','YYYY/MM/DD HH24:MI:SS.MS')
union
select to_timestamp('2017/02/20 08:28:09.899','YYYY/MM/DD HH24:MI:SS.MS')
) sub
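The truncation step in that query (integer-divide the milliseconds by 10) can be sketched in Python against the sample values from the question. The function name to_centiseconds is hypothetical, and the sketch returns text in the question's output format rather than a timestamp.

```python
from datetime import datetime

def to_centiseconds(text):
    """Truncate a 'YYYY/MM/DD HH:MM:SS.mmm' string to 1/100 second resolution."""
    parsed = datetime.strptime(text, "%Y/%m/%d %H:%M:%S.%f")
    centis = parsed.microsecond // 10_000  # 899 ms -> 89 hundredths, no rounding
    return parsed.strftime("%Y/%m/%d %H:%M:%S") + f".{centis:02d}"

samples = [
    "2017/02/20 08:27:17.899",
    "2017/02/20 08:23:43.894",
    "2017/02/20 08:24:41.894",
    "2017/02/20 08:28:09.899",
]
converted = [to_centiseconds(s) for s in samples]
print(converted)
```

The value is truncated, not rounded, which matches the expected output table where .899 becomes .89.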
dateposted is a MySQL TIMESTAMP column:
SELECT *
FROM posts
WHERE dateposted > NOW() - 604800
...SHOULD, if I am not mistaken, return rows where dateposted was in the last week. But it returns only posts less than roughly one day old. I was under the impression that TIMESTAMP used seconds?
IE: 7 * 3600 * 24 = 604800
Use:
WHERE dateposted BETWEEN DATE_ADD(NOW(), INTERVAL -7 DAY) AND NOW()
That is because NOW() is implicitly converted from a timestamp into a number, and MySQL's conversion rules produce a number of the form YYYYMMDDHHMMSS.uuuuuu.
From the MySQL docs:
mysql> SELECT NOW();
-> '2007-12-15 23:50:26'
mysql> SELECT NOW() + 0;
-> 20071215235026.000000
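A quick Python sketch of that arithmetic shows why the original query matches only about a day of posts. It assumes the comparison happens on the YYYYMMDDHHMMSS number, reuses the timestamp from the docs example, and uses hypothetical variable names.

```python
from datetime import datetime, timedelta

now = datetime(2007, 12, 15, 23, 50, 26)

# What NOW() + 0 yields: the digits of the timestamp, not a count of seconds.
as_number = int(now.strftime("%Y%m%d%H%M%S"))

# Subtracting 604800 changes digits; it does not move the date back seven days.
wrong_cutoff = as_number - 604800

# What was intended: a point in time exactly seven days earlier.
right_cutoff = now - timedelta(days=7)

print(as_number, wrong_cutoff, right_cutoff)
```

The wrong cutoff, 20071214630226, is not a valid timestamp (its hour digits read 63). It sorts after every moment of 14 December and before midnight on 15 December, so only posts from the current day pass the filter.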
Internally perhaps. The way to do this is with the date math functions. So it would be:
SELECT * FROM posts WHERE dateposted > DATE_ADD(NOW(), INTERVAL -7 DAY)
I think there is a DATE_SUB as well; I'm just used to using ADD everywhere.
No, you can't implicitly use integer arithmetic with TIMESTAMP, DATETIME, and other date-related data types. You're thinking of the UNIX timestamp format, which is an integer number of seconds since 1/1/1970.
You can convert SQL data types to a UNIX timestamp in MySQL and then use arithmetic:
SELECT * FROM posts WHERE UNIX_TIMESTAMP(dateposted)+604800 > NOW()+0;
NB: adding zero to NOW() makes it return a numeric value instead of a string value.
update: Okay, I'm totally wrong with the above query. Converting NOW() to a numeric output doesn't produce a number that can be compared to UNIX timestamps. It produces a number, but the number doesn't count seconds or anything else. The digits are just YYYYMMDDHHMMSS strung together.
Example:
CREATE TABLE foo (
id SERIAL PRIMARY KEY,
dateposted TIMESTAMP
);
INSERT INTO foo (dateposted) VALUES ('2009-12-4'), ('2009-12-11'), ('2009-12-18');
SELECT * FROM foo;
+----+---------------------+
| id | dateposted |
+----+---------------------+
| 1 | 2009-12-04 00:00:00 |
| 2 | 2009-12-11 00:00:00 |
| 3 | 2009-12-18 00:00:00 |
+----+---------------------+
SELECT *, UNIX_TIMESTAMP(dateposted) AS ut, NOW()-604800 AS wk FROM foo
+----+---------------------+------------+-----------------------+
| id | dateposted | ut | wk |
+----+---------------------+------------+-----------------------+
| 1 | 2009-12-04 00:00:00 | 1259913600 | 20091223539359.000000 |
| 2 | 2009-12-11 00:00:00 | 1260518400 | 20091223539359.000000 |
| 3 | 2009-12-18 00:00:00 | 1261123200 | 20091223539359.000000 |
+----+---------------------+------------+-----------------------+
It's clear that the numeric values are not comparable. However, UNIX_TIMESTAMP() can convert numeric values in that format, just as it can convert a string representation of a timestamp:
SELECT *, UNIX_TIMESTAMP(dateposted) AS ut, UNIX_TIMESTAMP(NOW())-604800 AS wk FROM foo
+----+---------------------+------------+------------+
| id | dateposted | ut | wk |
+----+---------------------+------------+------------+
| 1 | 2009-12-04 00:00:00 | 1259913600 | 1261089774 |
| 2 | 2009-12-11 00:00:00 | 1260518400 | 1261089774 |
| 3 | 2009-12-18 00:00:00 | 1261123200 | 1261089774 |
+----+---------------------+------------+------------+
Now one can run a query with an expression comparing them:
SELECT * FROM foo WHERE UNIX_TIMESTAMP(dateposted) > UNIX_TIMESTAMP(NOW())-604800
+----+---------------------+
| id | dateposted |
+----+---------------------+
| 3 | 2009-12-18 00:00:00 |
+----+---------------------+
But the answer given by @OMGPonies is still better, because the expression in my query probably can't make use of an index. I'm just offering this as an explanation of how the TIMESTAMP and NOW() features work.
Try this query:
SELECT * FROM posts WHERE DATE_SUB(CURDATE(),INTERVAL 7 DAY) < dateposted;
I am assuming that you are using MySQL.