Find most recent date in a table using HIVE - hive

I need to run a simple query against a table on a MapR cluster: I want to know the date of the most recent record in the table. Dates are stored in a 'report_date' column in string format. I tried the following query without success:
select max(report_date) from partition.table_name
I know the second part of the statement works. Is there something wrong with the first part?
Thanks,
A

Your date column's datatype is string, so max compares the values as text, character by character, rather than as dates, and may not produce the output you want.
For example: for a string column with the values 1, 2, 3 and 10, max(column) returns 3 rather than 10, because '3' sorts after '1'. String dates only give the right answer when they are stored in a sortable format such as yyyy-MM-dd.
Try changing the datatype to DATE or TIMESTAMP, which should work.
OR
If changing the datatype is not possible and the table has an auto-incrementing ID column (or any similar column), try:
select report_date from table_name order by ID desc limit 1
This should give you the max date string, provided the rows were inserted in date order.
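To see how max behaves on string dates, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for Hive. That is an assumption: SQLite is not Hive, but both compare strings character by character in MAX. The table name `reports`, its two columns and the sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the same three dates in two string formats.
conn.execute("CREATE TABLE reports (iso_date TEXT, us_date TEXT)")
conn.executemany(
    "INSERT INTO reports VALUES (?, ?)",
    [
        ("2019-12-31", "12/31/2019"),
        ("2020-01-15", "01/15/2020"),
        ("2020-02-01", "02/01/2020"),
    ],
)

# yyyy-MM-dd strings sort in the same order as the dates they represent.
max_iso = conn.execute("SELECT MAX(iso_date) FROM reports").fetchone()[0]

# MM/dd/yyyy strings do not: '12/31/2019' wins because '1' sorts after '0'.
max_us = conn.execute("SELECT MAX(us_date) FROM reports").fetchone()[0]

print(max_iso)  # 2020-02-01
print(max_us)   # 12/31/2019
```

So if report_date is already stored as yyyy-MM-dd, max(report_date) should return the latest date even on a string column; with other layouts it needs converting first.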

Related

AWS Athena query to find Date column has String or not satisfying timestamp format

I was testing Spark 3 upgrades in AWS Athena and need to check whether the date columns are in the proper timestamp format. Can anyone please give me a query to check whether the date columns have any values that are not in timestamp format?
Assuming that you have a varchar column you can try using date_parse wrapped in try:
select *
from table
where try(date_parse(string_column, 'your_expected_format')) is null -- assuming no original nulls in column
Or via try_cast for "standard" format:
select *
from table
where try_cast(string_column as timestamp) is null -- assuming no original nulls in column
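The same "try to parse, flag whatever fails" idea can be sketched outside Athena. This is only an illustration in Python: the helper name `is_valid_timestamp`, the format string and the sample rows are assumptions, and Python's parsing rules are not identical to Athena's date_parse or try_cast.

```python
from datetime import datetime


def is_valid_timestamp(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Return True if value parses with fmt, False otherwise (including None)."""
    try:
        datetime.strptime(value, fmt)
        return True
    except (ValueError, TypeError):
        return False


# Made-up sample values standing in for the string column.
rows = [
    "2021-03-01 10:15:00",  # valid
    "2021-13-01 10:15:00",  # month 13 does not exist
    "not a date",           # free text
    "2021-03-01",           # date only, so it does not match the full format
]

# Keep only the values that fail to parse, like the "is null" filter above.
bad_rows = [r for r in rows if not is_valid_timestamp(r)]
print(bad_rows)
```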

Converting all data in a Varchar column to a date format

I'm working on a table with a column, 'Expiry Date', as a varchar with all data formatted as DD/MM/YYYY.
The creator of the table used the wrong type for this expiry date column, and now the client needs to filter and show all records before and after the current date. This means the type needs to be changed to a date or datetime type so that the column can be compared with CURDATE().
However, the current format of the values won't allow the type to change unless the format is changed to YYYY-MM-DD (or similar).
Is there any way to mass-format the values in this column, and this column alone? There are thousands of entries, and formatting them one by one would be extremely time-consuming.
Let me assume that you are using MySQL.
Perhaps the simplest method is to add a generated column that is a date:
alter table t add column expiry_date_date date as
(str_to_date(expiry_date, '%d/%m/%Y'));
You can also fix the data:
update t
set expiry_date = str_to_date(expiry_date, '%d/%m/%Y');
This will implicitly convert the result of str_to_date() to a date, which will be in the YYYY-MM-DD format.
More importantly, you can then do:
alter table t modify column expiry_date date;
You can do similar operations in other databases, but the exact code is a bit different.
What you need is an update on that column, but before running it I suggest you check that the result is what you want.
select replace(expiry_date, '/', '-') new_expiry_date
from table_name
If this returns the results you want you can run the following update:
update table_name
set expiry_date = replace(expiry_date, '/', '-')
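Of course you will need to replace expiry_date and table_name with the names of your column and table. Note that replace() only swaps the separator, so the values become DD-MM-YYYY rather than YYYY-MM-DD; reordering the parts still needs something like str_to_date().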
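The "preview first, then update" approach can be tried end to end with a small sketch. It uses Python's sqlite3 module as a stand-in, which is an assumption: SQLite has no str_to_date(), so the DD/MM/YYYY parts are rearranged with substr() instead. The table `t` and the two sample values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (expiry_date TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("31/12/2020",), ("05/01/2021",)])

# Rebuild each DD/MM/YYYY value as YYYY-MM-DD by cutting out year, month, day.
reorder = (
    "substr(expiry_date, 7, 4) || '-' || "
    "substr(expiry_date, 4, 2) || '-' || "
    "substr(expiry_date, 1, 2)"
)

# Step 1: preview the conversion without touching the data.
preview = conn.execute(f"SELECT expiry_date, {reorder} FROM t").fetchall()
print(preview)

# Step 2: apply it once the preview looks right.
conn.execute(f"UPDATE t SET expiry_date = {reorder}")
converted = [
    row[0] for row in conn.execute("SELECT expiry_date FROM t ORDER BY expiry_date")
]
print(converted)
```

Once every value is in YYYY-MM-DD form, changing the column type as described above should no longer be blocked by the format.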

Subtract Date field from previous row, reset when value in other field changes

I am trying to write a SQL script that accomplishes the following:
Creates a column which subtracts the value in the Date Field from the value in the Date field from the previous row. This should reset and start over when the ID field changes.
The OpID and Resolutiondate fields are fixed, and I am trying to create a column like the one seen below.
You can use lag(). Date/time functions are notoriously database-specific, but the idea is:
select t.*,
(datefield - lag(datefield) over (partition by id order by datefield)) as diff
from t;
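Since the date arithmetic differs per database, here is one concrete, hedged variant of the lag() idea. It runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer) and uses julianday() to get a difference in days; the sample rows are made up, and other databases will need their own date-difference function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, datefield TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (1, "2021-01-01"),
        (1, "2021-01-04"),
        (1, "2021-01-10"),
        (2, "2021-02-01"),
        (2, "2021-02-03"),
    ],
)

# lag() looks at the previous row within the same id, so the difference
# restarts (as NULL) on the first row of every id.
diffs = conn.execute(
    """
    SELECT id,
           datefield,
           julianday(datefield)
             - julianday(lag(datefield) OVER (PARTITION BY id ORDER BY datefield)) AS diff
    FROM t
    ORDER BY id, datefield
    """
).fetchall()

for row in diffs:
    print(row)
```

The first row of each id has no previous row, so its diff comes back as NULL (None in Python); wrap the expression in coalesce(..., 0) if a zero is preferred there.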

Why does this oracle query return null? (avg on number field)

I'm pulling my hair out this morning, as I'm trying to select a simple average from a single field from a table in an Oracle database. My table has 31 rows, the column in question is called AGE and I just want an average. The column is of type "number" and there are no nulls in it.
SELECT AVG(AGE)
FROM COLLECTIONS.CUSTOMERS
This query always returns null. I have also tried:
SELECT SUM(AGE)/COUNT(AGE)
FROM COLLECTIONS.CUSTOMERS
with the same result. Any help is greatly appreciated!
I tried creating a sample table with a single column (age). I tested both the int and number data types and got the expected result in each case:
INTEGER : Demo
NUMBER : Demo
NUMBER (with same datatype as OP): Demo
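A similar sanity check can be sketched locally. This uses SQLite through Python's sqlite3 module rather than Oracle, so it is an assumption-laden stand-in and not a diagnosis of the original problem; the `customers` table and ages are made up. It also shows one general case where AVG does return null: when no rows reach the aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (age NUMERIC)")
conn.executemany("INSERT INTO customers VALUES (?)", [(20,), (30,), (40,)])

# With non-null numeric values, AVG returns a number as expected.
avg_age = conn.execute("SELECT AVG(age) FROM customers").fetchone()[0]

# With no input rows (here the filter matches nothing), AVG returns NULL.
avg_empty = conn.execute(
    "SELECT AVG(age) FROM customers WHERE age > 100"
).fetchone()[0]

print(avg_age)    # 30.0
print(avg_empty)  # None
```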

sql query for retrieving default or sorted values

I'm trying to figure out how to write a SQL query for this:
I have a database table which contains a column X. X holds datetime values if set by a function; otherwise it has a default value which looks something like this: 0001-01-01.00.00.00.000000
What I am interested in doing is writing SQL that will retrieve all the rows sorted by X, latest datetime values first.
I thought this would be the answer:
Select * from Some_Table st where st.Dbname = "blah" order by st.x desc
but then I was thinking: what happens to the default values? How do they affect the sorting?
Any ideas if this is the way to go?
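Ordering by date desc will work fine: the default value is earlier than any real datetime, so those rows simply sort to the end. Why not make the column nullable so you don't need a default? Also, it sounds like you only want the latest single record, so you may want to consider using SELECT TOP 1 * FROM blah ORDER BY date DESC (TOP is SQL Server syntax; other databases use LIMIT or FETCH FIRST).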
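Here is a small sketch of how the default values behave under a descending sort. It uses SQLite through Python's sqlite3 module with the datetimes stored as text, which is an assumption (the original database is not named), and the sample rows are made up. SQLite has no TOP, so LIMIT 1 plays that role here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (dbname TEXT, x TEXT)")
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [
        ("blah", "0001-01-01 00:00:00"),  # the "never set" default value
        ("blah", "2012-05-02 08:30:00"),
        ("blah", "2012-06-11 17:45:00"),
        ("other", "2013-01-01 00:00:00"),
    ],
)

# Latest first: the default value is the earliest, so it lands at the end.
ordered = [
    row[0]
    for row in conn.execute(
        "SELECT x FROM some_table WHERE dbname = 'blah' ORDER BY x DESC"
    )
]

# Only the single latest record.
latest = conn.execute(
    "SELECT x FROM some_table WHERE dbname = 'blah' ORDER BY x DESC LIMIT 1"
).fetchone()[0]

print(ordered)
print(latest)
```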