How to find updated date for hive tables? - hive

How can I find the timestamp of the last DML or DQL update for a Hive table? I can find TransientDDLid by using "formatted describe ", but it is not helping in getting the modified date. How can I figure out the latest UPDATED DATE for a Hive table (managed/external)?

Run show table extended like 'table_name';
It reports the last update time (the lastUpdateTime field) as the number of milliseconds elapsed since the epoch.
Copy that number, remove the last 3 digits to turn milliseconds into seconds, and run select from_unixtime(<seconds elapsed since epoch>);
e.g. for the value 1532442615733: select from_unixtime(1532442615);
This will give you the timestamp of that moment in the current system's time zone.
I guess this is what you're looking for...
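To double-check the conversion outside Hive, the same arithmetic can be sketched in Python. This is only an illustration: the helper name epoch_millis_to_utc is made up here, and the result is expressed in UTC rather than in the system time zone that from_unixtime uses.

```python
from datetime import datetime, timezone

def epoch_millis_to_utc(millis: int) -> datetime:
    """Convert milliseconds since the Unix epoch to a UTC datetime."""
    # Split into whole seconds (what from_unixtime expects) and leftover ms.
    seconds, remainder_ms = divmod(millis, 1000)
    whole = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return whole.replace(microsecond=remainder_ms * 1000)

# Value taken from the answer above.
print(epoch_millis_to_utc(1532442615733).isoformat())
```

Dropping the last three digits is the same as the integer division by 1000 done here; passing the full millisecond value as if it were seconds would land tens of thousands of years in the future.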

Related

Invalid Time String Error when trying to change type of data from string to time

I am very new to data analytics and I need some help troubleshooting a SQL error I got. I have a column in this table which transferred over from Excel to SQL as a string type rather than a time type. I want to convert it to a time type so I can analyze it further.
So, I ran the attached query to try to change the data type using the CAST function. However, the query could not complete because of an outlier in the data set. I have yet to clean the data, and this was one of my first steps to do so. How do I remove the particular row that contains the invalid time string so the query can actually work? Or is there a better way to convert this entire column from a text string to time?
BigQuery Time types adjust values outside the 24 hour boundary - 00:00:00 to 24:00:00; for example, if you subtract an hour from 00:30:00, the returned value is 23:30:00.
Based on your screenshot it looks like you are storing a duration? So 330 hours, 25 minutes and 55 seconds?
You would probably be best using timestamp, converting the hours to days and adding the remainder to your minutes and seconds.
You can then cast the resulting string to timestamp.
Edit
A much simpler solution is just cast('330:25:55' as interval) - thanks to @MatBailie
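To illustrate why '330:25:55' reads as a duration rather than a time of day, here is a small Python sketch of the hours-to-days arithmetic described above. It is not BigQuery code, and the helper name parse_duration is invented for this example.

```python
from datetime import timedelta

def parse_duration(text: str) -> timedelta:
    """Parse an 'H:MM:SS' string whose hour part may exceed 24."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    # timedelta normalizes the overflow: 330 hours becomes 13 days + 18 hours.
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

duration = parse_duration("330:25:55")
print(duration)                  # 13 days, 18:25:55
print(duration.total_seconds())  # 1189555.0
```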

Hive Timestamp Differences in Milliseconds

A previous solution for obtaining the difference between two timestamps in milliseconds does not work in Hive 1.0 on Amazon EMR. In my testing today, Hive returns a blank column when casting a timestamp as double. No errors are thrown when doing the CAST. Being able to calculate a time difference in fractions of a second between two columns of type "timestamp" is critical to our analysis. Any ideas?
You should try to convert into unix_timestamp using unix_timestamp(timestamp) but I think you will still be losing milliseconds.
select (unix_timestamp(DATE1)-unix_timestamp(DATE2)) TIMEDIFF from TABLE;
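For comparison, this is what a millisecond-preserving difference looks like when the subtraction happens before anything is truncated to whole seconds. It is a Python sketch, not a Hive query; the function name diff_millis and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S.%f"  # timestamp text with a fractional-seconds part

def diff_millis(later: str, earlier: str) -> int:
    """Return (later - earlier) in whole milliseconds."""
    delta = datetime.strptime(later, FMT) - datetime.strptime(earlier, FMT)
    # Dividing one timedelta by another with // yields an integer count.
    return delta // timedelta(milliseconds=1)

print(diff_millis("2015-03-01 12:00:01.125", "2015-03-01 12:00:00.250"))  # 875
```

Subtracting two values that were each rounded down to whole seconds first would give 1 second here, which is the loss the answer warns about.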

how to update the previous rows with last_modified date column having null values?

I have a loader table in which the feed updates and inserts records every three hours. A few sets of records show NULL values for the last_modified date even though I have a merge which checks for last_modified date column to sysdate. For the future, I set last_modified to sysdate and enabled a NOT NULL constraint. Is there any way to fix just these records so that they have the correct last_modified timestamp (that is, the date when the insert/update was done)?
Thanks
No, the last modification time is not stored in a row by default. You have to do that yourself like you are doing now, or enable some form of journaling. There is no way to correct any old records where you have not done so.
If your rows were modified "recently enough", you might still map their ora_rowscn to their approximate modification TIMESTAMP using SCN_TO_TIMESTAMP:
UPDATE MY_TABLE
SET Last_Modified = SCN_TO_TIMESTAMP(ora_rowscn)
WHERE Last_Modified IS NULL;
This is not a magic bullet though. To quote the documentation:
The usual precision of the result value is 3 seconds.
The association between an SCN and a timestamp when the SCN is generated is remembered by the database for a limited period of time. This period is the maximum of the auto-tuned undo retention period, if the database runs in the Automatic Undo Management mode, and the retention times of all flashback archives in the database, but no less than 120 hours. The time for the association to become obsolete elapses only when the database is open. An error is returned if the SCN specified for the argument to SCN_TO_TIMESTAMP is too old.
If you try to map ora_rowscn of rows outside the allowed window, you will get the error ORA-08181 "specified number is not a valid system change number".

SQL query date according to time zone

We are using a Vertica database with table columns of type timestamptz, all data is inserted according to the UTC timezone.
We are using spring-jdbc's NamedParameterJdbcTemplate
All queries are based on full calendar days, e.g. start date 2013/08/01 and end date 2013/08/31, which brings everything between '2013/08/01 00:00:00.0000' and '2013/08/31 23:59:59.9999'
We are trying to modify our queries to consider timezones, i.e. for my local timezone I can ask for '2013/08/01 00:00:00.0000 Asia/Jerusalem' till '2013/08/31 23:59:59.9999 Asia/Jerusalem', which is obviously different than '2013/08/01 00:00:00.0000 UTC' till '2013/08/31 23:59:59.9999 UTC'.
So far, I cannot find a way to do so. I tried setting the timezone in the session:
set timezone to 'Asia/Jerusalem';
This doesn't even work in my database client.
Calculating the difference in our Java code will not work for us as we also have queries returning date groupings (this will get completely messed up).
Any ideas or recommendations?
I am not familiar with Vertica, but some general advice:
It is usually best to use half-open intervals for date range queries. The start date should be inclusive, while the end date should be exclusive. In other words:
start <= date < end
or
start <= date && end > date
Your end date wouldn't be '2013/08/31 23:59:59.9999', it would instead be the start of the next day, or '2013/09/01 00:00:00.0000'. This avoids problems relating to precision of decimals.
That example is for finding a single date. Since you are querying a range of dates, then you have two inputs. So it would be:
startFieldInDatabase >= yourStartParameter
AND
endFieldInDatabase < yourEndParameter
Again, you would first increment the end parameter value to the start of the next day.
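As a rough sketch of that increment step, in Python rather than in the asker's Java, the inclusive calendar days can be turned into half-open bounds like this. The function name half_open_range is hypothetical.

```python
from datetime import date, datetime, time, timedelta

def half_open_range(first_day: date, last_day: date):
    """Return (start, end) such that start <= value < end covers both days fully."""
    start = datetime.combine(first_day, time.min)
    # The exclusive end is midnight at the start of the day AFTER last_day.
    end = datetime.combine(last_day + timedelta(days=1), time.min)
    return start, end

start, end = half_open_range(date(2013, 8, 1), date(2013, 8, 31))
print(start, end)  # 2013-08-01 00:00:00 2013-09-01 00:00:00
```

With these bounds the query no longer depends on how many fractional digits the column stores, since no '23:59:59.9999' literal is involved.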
It sounds like perhaps Vertica is TZ aware, given that you talked about timestamptz types in your question. Assuming they are similar to Oracle's TIMESTAMPTZ type, then it sounds like your solution will work just fine.
But usually, if you are storing times in UTC in your database, then you would simply convert the query input time(s) in advance. So rather than querying between '2013/08/01 00:00:00.0000' and '2013/09/01 00:00:00.0000', you would convert that ahead of time and query between '2013/07/31 21:00:00.0000' and '2013/08/31 21:00:00.0000'. There are numerous posts already on how to do that conversion in Java either natively or with Joda Time, so I won't repeat that here.
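The conversion mentioned above can be sketched in Python with the standard zoneinfo module (Python 3.9+), as a stand-in for the Java or Joda Time code. The helper name local_midnight_to_utc is invented here. The fixed +03:00 fallback is an assumption that only holds for these summer 2013 dates and is there solely so the sketch runs on machines without a time zone database.

```python
from datetime import datetime, timedelta, timezone

try:
    from zoneinfo import ZoneInfo
    JERUSALEM = ZoneInfo("Asia/Jerusalem")
except (ImportError, LookupError):
    # Assumption: no tz database available; +03:00 is only right for these dates.
    JERUSALEM = timezone(timedelta(hours=3))

def local_midnight_to_utc(year: int, month: int, day: int, tz=JERUSALEM) -> datetime:
    """Interpret midnight of the given day in tz and express it in UTC."""
    local = datetime(year, month, day, tzinfo=tz)
    return local.astimezone(timezone.utc)

print(local_midnight_to_utc(2013, 8, 1))  # 2013-07-31 21:00:00+00:00
print(local_midnight_to_utc(2013, 9, 1))  # 2013-08-31 21:00:00+00:00
```

The two results are the query bounds quoted in the answer; they would be passed as the start (inclusive) and end (exclusive) parameters.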
As a side note, you should make sure that whatever TZDB implementation you are using (Vertica's, Java's, or JodaTime's) has the latest 2013d update, since that includes the change for Israel's daylight saving time rule that goes into effect this year.
Okay, so apparently:
set time zone to 'Asia/Jerusalem';
worked and I just didn't realize it, but for the sake of helping others I'm going to add something else that works:
select field at time zone 'Asia/Jerusalem' from my_table;
will work for timestamptz fields

How to convert a unix timestamp (INT) to monetdb timestamp ('YYYY-MM-DD HH:MM:SS') local time format

Q1: I want to convert a unix timestamp (INT) to monetdb timestamp ('YYYY-MM-DD HH:MM:SS') format
but it is giving me the GMT time not my actual time.
When I do
select (epoch(cast(current_timestamp as timestamp))-epoch(timestamp '2013-04-25 11:49:00'))
where 2013-04-25 11:49:00 is my system's current time, it gives the same difference
I tried using
set time zone interval '05:30' HOUR TO MINUTE;
but it did not change the result
How can I solve this problem??
Example Problem:
I wanted to convert unix timestamp 1366869289 which should be around "2013-04-25 11:25:00" but monetdb gives "2013-04-25 05:55:00"
Knowing nothing about MonetDB, but a lot about timezones, I decided to look in their documentation to see what kind of datatypes are supported and how conversions are handled.
I found this page on Temporal data types. Based on that, I can conclude that a timestamp in MonetDB is always intended to reference UTC/GMT time - which is consistent with other systems.
In order to get a value that is for a particular time zone, they offer the following example:
SET TIME ZONE INTERVAL '1' HOUR TO MINUTE
I assume this means to set the database to offset all times by 1 hour, effectively placing the values all in UTC+01:00, such as is the offset for British Summer Time.
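To see what a fixed offset does to the value from the question, here is a small Python sketch (not MonetDB code). The name epoch_to_offset_time is made up, and the +05:30 offset is taken from the asker's own SET TIME ZONE attempt.

```python
from datetime import datetime, timedelta, timezone

OFFSET_0530 = timezone(timedelta(hours=5, minutes=30))

def epoch_to_offset_time(epoch_seconds: int, offset: timezone = OFFSET_0530) -> datetime:
    """Render Unix epoch seconds as wall-clock time at a fixed UTC offset."""
    return datetime.fromtimestamp(epoch_seconds, tz=offset)

print(epoch_to_offset_time(1366869289, timezone.utc))  # 2013-04-25 05:54:49+00:00
print(epoch_to_offset_time(1366869289))                # 2013-04-25 11:24:49+05:30
```

The first line matches the value the asker reports getting and the second matches the one they expected, so the stored instant is the same and only the offset applied when displaying it differs.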
The page also goes on to point out the problems that can arise with using just an offset to adjust time values (see TimeZone != Offset in the TimeZone tag wiki). It also offers a list of various named time zones, but it does not show how to set a time zone to one of the named values. Also, their list appears to be proprietary and incomplete. While at first glance the names appear similar to those in the IANA/Olson time zone database, the identifiers they specify are not valid TZDB names.
There are some other functions listed on this page, without much explanation. One that looks promising for your needs is LOCALTIMESTAMP. Perhaps this will take the local time zone into account, which appears to be what you were looking for.
I could not find any additional details specific to MonetDB date/time/timezone handling. The documentation appears to be fairly incomplete. You might want to reach out to their mailing list.