Finding max date in Hive

I have a column named "Date" whose datatype is string.
01-24-2018
04-30-2017
How do I find the maximum of these dates when they are stored as strings?
I used this expression, which does not return the expected max:
max(to_date(from_unixtime(unix_timestamp(b.Date,'MM-dd-yyyy'))))

Try:
MAX( cast(to_date(from_unixtime(unix_timestamp(yourdate , 'MM-dd-yyyy'))) as date))
EDIT: In your comments you mentioned that you need to get the account details for the max date. You could use this:
SELECT a.id,
       b.contract,
       CAST(TO_DATE(from_unixtime(unix_timestamp(b.date, 'MM-dd-yyyy'))) AS DATE) AS MAX_DATE
FROM acct a JOIN customer b ON (b.partyid = a.offerid)
WHERE CAST(TO_DATE(from_unixtime(unix_timestamp(b.date, 'MM-dd-yyyy'))) AS DATE) IN
      (SELECT MAX(CAST(TO_DATE(from_unixtime(unix_timestamp(b.date, 'MM-dd-yyyy'))) AS DATE))
       FROM acct a JOIN customer b ON (b.partyid = a.offerid)
       WHERE b.contract = 200427747);
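To see why the strings have to be parsed before taking the maximum, here is a small Python illustration (not Hive) using the two sample values from the question. Comparing MM-dd-yyyy strings directly is a character-by-character comparison, so the month decides before the year is even looked at. The variable names are made up for this sketch.

```python
from datetime import datetime

# Sample values from the question, stored as MM-dd-yyyy strings.
raw_dates = ["01-24-2018", "04-30-2017"]

# Comparing the raw strings is lexicographic: "04..." beats "01...".
max_as_string = max(raw_dates)

# Parsing first gives a chronological comparison.
max_as_date = max(datetime.strptime(d, "%m-%d-%Y").date() for d in raw_dates)

print(max_as_string)  # 04-30-2017
print(max_as_date)    # 2018-01-24
```

The Hive expressions above do the same thing in SQL: unix_timestamp parses the string with the given pattern, and the maximum is taken on the converted value.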

Related

How to filter date with where on SQLite with '2021-07-31 13:53:26' format?

I want to take just the year and month from '2021-07-31 13:53:26' and count how many rows fall in each year-month.
I tried the date, datetime, and strftime functions.
date and datetime return NULL. strftime returns something, but I can't group by the resulting year and month and get the count I want; the result is NULL again.
Here is the preview of the data.
The expected result is something like '2021-07' together with the count of how many times that year and month occurs.
This is the syntax I tried with strftime:
select strftime('%Y%m', started_at) year_month, count(year_month) from bike_trip
group by year_month
Thank You
SQLite doesn't have a dedicated date data type, and the values here are stored as text, so one way to achieve this is with string functions:
with d as (
select '2021-07-31 13:53:26' as d, 'A' val union all
select '2021-08-30 13:53:26' as d, 'B' val
)
select substr(d,1,4) as yyyy, substr(d,6,2) as mm, count(*)
from d
group by substr(d,1,4), substr(d,6,2)
In your query:
select substr(started_at,1,4) as yyyy, substr(started_at,6,2) as mm, count(*)
from bike_trip
group by substr(started_at,1,4), substr(started_at,6,2)
Use a CTE to get your answer.
with
-- uncomment to test
/*bike_trip(started_at) as (
values
('2021-07-31 13:53:26'),
('2021-07-17 19:06:01'),
('2021-08-30 13:53:26')
),*/
bike_months(year_month) as (
select strftime('%Y-%m', started_at) year_month from bike_trip
)
select year_month, count(year_month) count_year_month from bike_months
group by year_month;
Output:
year_month|count_year_month
2021-07|2
2021-08|1
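For anyone who wants to try the strftime approach end to end, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The table and column names follow the question; the three sample rows are the ones from the commented-out test data above. Note that SQLite's date functions return NULL when the stored text is not in a format they recognize, so this assumes started_at really looks like 'YYYY-MM-DD HH:MM:SS'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bike_trip (started_at TEXT)")
conn.executemany(
    "INSERT INTO bike_trip (started_at) VALUES (?)",
    [
        ("2021-07-31 13:53:26",),
        ("2021-07-17 19:06:01",),
        ("2021-08-30 13:53:26",),
    ],
)

# Group on the formatted year-month and count the rows in each group.
rows = conn.execute(
    """
    SELECT strftime('%Y-%m', started_at) AS year_month, count(*) AS trips
    FROM bike_trip
    GROUP BY year_month
    ORDER BY year_month
    """
).fetchall()
conn.close()

print(rows)  # [('2021-07', 2), ('2021-08', 1)]
```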

Snowflake SQL Query Left Join Issue

In one of our queries, a LEFT JOIN is not behaving as expected in Snowflake. We would appreciate help finding a solution.
We have a sample data setup, shown below, with a basic table join.
CREATE TABLE patient_test(pid INT);
INSERT INTO patient_test (pid) VALUES (100);
CREATE TABLE pateint_entry_test (pid INT,DateAdded DATETIME);
INSERT INTO pateint_entry_test (pid, DateAdded) VALUES (100, '2020-07-13');
Now look at the code below, which is a sample subquery that we use as part of a larger query set. Our goal is to get a date entry for each patient for every day between the given start and end dates.
WITH patient_cte AS(
SELECT * FROM patient_test
)
,
dates AS(
SELECT DATEDIFF(day, CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-06') AS TIMESTAMP_NTZ)),
CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-12') AS TIMESTAMP_NTZ))) AS Total_Days,
CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-06') AS TIMESTAMP_NTZ)) AS Start_Date,
CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-12') AS TIMESTAMP_NTZ)) AS end_date
)
,
cte2 (date) as (
SELECT TO_DATE(START_DATE) FROM dates
UNION ALL
SELECT TO_DATE(DATEADD(day, 1, date)) FROM cte2 WHERE date < (SELECT TOP 1 END_DATE FROM dates)
),
cte3 AS (
select * from patient_cte
cross join cte2
)
SELECT cte3.pid as p_pid,
pateint_entry_test.pid as p_entry_pid,
pateint_entry_test.DateAdded,
cte3."DATE" ,
IFNULL( pateint_entry_test.DateAdded, cte3."DATE") AS CALCULATEDDATEMEASURED
FROM cte3
LEFT JOIN pateint_entry_test ON
cte3.pid = pateint_entry_test.pid AND
cte3."DATE" = TO_DATE(pateint_entry_test.DateAdded)
The query gives the output below.
You can see that CALCULATEDDATEMEASURED for rows 2 to 7 comes out as 2020-07-06 00:00:00. But since DATEADDED is NULL for those rows, it should take the value of the DATE column (based on the condition IFNULL( pateint_entry_test.DateAdded, cte3."DATE")).
We expect the output below from the query.
Not sure what is wrong, but it's not behaving as expected. We appreciate your help on this. Thank you.
I'm not sure if this is a bug, but it's due to type coercion based on the way you wrote your query. Here's your query with the TO_DATE logic applied in the IFNULL statement the same as it is in the join (along with a COALESCE to show that it produces the same result as IFNULL):
WITH patient_cte AS(
SELECT * FROM patient_test
)
,
dates AS(
SELECT DATEDIFF(day, CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-06') AS TIMESTAMP_NTZ)),
CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-12') AS TIMESTAMP_NTZ))) AS Total_Days,
CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-06') AS TIMESTAMP_NTZ)) AS Start_Date,
CONVERT_TIMEZONE('EST', 'UTC', CAST(TO_TIMESTAMP('2020-07-12') AS TIMESTAMP_NTZ)) AS end_date
)
,
cte2 (date) as (
SELECT TO_DATE(START_DATE) FROM dates
UNION ALL
SELECT TO_DATE(DATEADD(day, 1, date)) FROM cte2 WHERE date < (SELECT TOP 1 END_DATE FROM dates)
),
cte3 AS (
select * from patient_cte
cross join cte2
)
SELECT cte3.pid as p_pid,
pateint_entry_test.pid as p_entry_pid,
pateint_entry_test.DateAdded,
cte3."DATE",
IFNULL( pateint_entry_test.DateAdded, cte3."DATE") AS ORIGINAL_ERROR,
IFNULL( to_date(pateint_entry_test.DateAdded), cte3."DATE") AS CALCULATEDDATEMEASURED,
coalesce(to_date(pateint_entry_test.DateAdded), cte3."DATE") as from_coalesce
FROM cte3
LEFT JOIN pateint_entry_test
ON cte3.pid = pateint_entry_test.pid
AND cte3."DATE" = to_date(pateint_entry_test.DateAdded);
Running this in Snowflake produces this:

Ignoring Duplicate Records SQL

In need of some help :)
So I have a table of records with the following columns:
Key (PK, FK, int) DT (smalldatetime) Value (real)
The DT is a datetime for every half hour of the day with an associated value
E.g.
Key DT VALUE
1000 2010-01-01 08:00:00 80
1000 2010-01-01 08:30:00 75
1000 2010-01-01 09:00:00 100
I have a query that finds the max value in every 24-hour period and its associated time. However, on one day the max value occurs twice, which duplicates the date and causes processing issues. I have tried using ROW_NUMBER(), which works, but I can't use a calculated column in my WHERE clause?
Currently I have:
SELECT cast(T1.DT as date) as 'Date',Cast(T1.DT as time(0)) as 'HH', ROW_NUMBER() over (PARTITION BY cast(DT as date) ORDER BY DT) AS 'RowNumber'
FROM TABLE_1 AS T1
INNER JOIN (
SELECT CAST([DT] as date) as 'DATE'
, MAX([VALUE]) as 'MAX_HH'
FROM TABLE_1
WHERE DT > '6-nov-2016' and [KEY] = '1000'
GROUP BY CAST([DT] as date)
) AS MAX_DT
ON MAX_DT.[DATE] = CAST(T1.[DT] as date)
AND T1.VALUE = MAX_DT.MAX_HH
WHERE DT > '6-nov-2016' and [KEY] = '1000'
ORDER BY DT
This results in
Key DT VALUE HH
1000 2010-01-01 80 07:00:00
1000 2010-02-01 100 17:30:00
1000 2010-02-01 100 18:00:00
I need to remove the duplicate date (I have no preference which HH it takes).
I think I've explained that terribly; let me know if it makes no sense and I'll try to rewrite it.
Any ideas?
Can you try this? The changes are the MIN(...) around the time column and the GROUP BY added at the end:
SELECT cast(T1.DT as date) as 'Date', MIN(Cast(T1.DT as time(0))) as 'HH'
FROM TABLE_1 AS T1
INNER JOIN (
SELECT CAST([DT] as date) as 'DATE'
, MAX([VALUE]) as 'MAX_HH'
FROM TABLE_1
WHERE DT > '6-nov-2016' and [KEY] = '1000'
GROUP BY CAST([DT] as date)
) AS MAX_DT
ON MAX_DT.[DATE] = CAST(T1.[DT] as date)
AND T1.VALUE = MAX_DT.MAX_HH
WHERE DT > '6-nov-2016' and [KEY] = '1000'
GROUP BY cast(T1.DT as date)
ORDER BY cast(T1.DT as date)
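The ROW_NUMBER() idea from the question can also be made to work: compute the row number in a derived table, then filter on it in the outer query, where it is an ordinary column. The sketch below runs that pattern against an in-memory SQLite database from Python rather than SQL Server, so it assumes a SQLite build with window functions (3.25 or later). The table layout, the series_key column name, and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (series_key INTEGER, dt TEXT, value REAL)")
conn.executemany(
    "INSERT INTO table_1 (series_key, dt, value) VALUES (?, ?, ?)",
    [
        (1000, "2010-01-01 07:00:00", 80),
        (1000, "2010-01-01 07:30:00", 75),
        (1000, "2010-02-01 17:30:00", 100),  # daily max occurs twice
        (1000, "2010-02-01 18:00:00", 100),
        (1000, "2010-02-01 18:30:00", 90),
    ],
)

# Rank the rows within each day by value (highest first, earliest time on
# ties) in an inner query, then keep only rank 1 in the outer WHERE.
rows = conn.execute(
    """
    SELECT day, hh, value
    FROM (
        SELECT date(dt) AS day,
               time(dt) AS hh,
               value,
               ROW_NUMBER() OVER (
                   PARTITION BY date(dt) ORDER BY value DESC, dt
               ) AS rn
        FROM table_1
        WHERE series_key = 1000
    ) AS ranked
    WHERE rn = 1
    ORDER BY day
    """
).fetchall()
conn.close()

print(rows)
```

With this sample data the query returns one row per day, and the tie on 2010-02-01 resolves to the 17:30 reading.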
I would do something like this.
I haven't tested it, but I think it's correct. SQL Server does not accept a (column, column) IN (...) row comparison, so the per-day maximum is joined in instead:
SELECT cast(T1.DT as date) as 'Date', Cast(T1.DT as time(0)) as 'HH', T1.VALUE
FROM TABLE_1 T1
WHERE T1.[KEY] = '1000'
AND T1.[DT] IN (
--select the latest DT on which each day's max value occurs
SELECT MAX(T2.[DT])
FROM TABLE_1 T2
INNER JOIN (
SELECT CAST([DT] as date) as CAST_DATE
,MAX([VALUE]) as MAX_HH
FROM TABLE_1
WHERE DT > '6-nov-2016' and [KEY] = '1000'
GROUP BY CAST([DT] as date)
) M ON CAST(T2.[DT] as date) = M.CAST_DATE AND T2.[VALUE] = M.MAX_HH
WHERE T2.[KEY] = '1000'
GROUP BY CAST(T2.[DT] as date)
)
Change the JOIN to an APPLY.
The APPLY operation will allow you to limit the connected relation to just one result for each source relation.
SELECT v.[Key], cast(v.DT As Date) as "Date", v.[Value], cast(v.DT as Time(0)) as "HH"
FROM
( -- First a projection to get just the exact dates you want
SELECT DISTINCT [Key], CAST(DT as DATE) as DT
FROM Table_1
WHERE [Key] = '1000' AND DT > '20161106'
) dates
CROSS APPLY (
-- Then use APPLY rather than JOIN to find just the exact one record you need for each date
SELECT TOP 1 *
FROM Table_1
WHERE [Key] = dates.[Key] AND cast(DT as DATE) = dates.DT ORDER BY [Value] DESC
) v
A final note: both this query and the sample query in the question will include values from Nov 6, 2016. The query says > 2016-11-06 with an exclusive inequality, but the comparison is still made against full DateTime values, meaning the literal has an implied 00:00:00 time component. So 12:01 AM on Nov 6 is still greater than 12:00:00 AM on Nov 6. If you want to exclude all Nov 6 values from the query, you either need to change the literal to a time value at the end of that date, or cast to date before making the > comparison.
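The point about the implied time component is easy to check outside the database. This short Python sketch (an illustration, not T-SQL) compares a made-up half-hourly reading on Nov 6 against the cutoff, first as full datetimes and then as dates only.

```python
from datetime import datetime

cutoff = datetime(2016, 11, 6)          # '6-nov-2016' with the implied 00:00:00
reading = datetime(2016, 11, 6, 0, 30)  # a hypothetical 00:30 reading on Nov 6

# Full datetime comparison: the Nov 6 reading passes the "> cutoff" filter.
passes_as_datetime = reading > cutoff

# Date-only comparison: the same reading is excluded.
passes_as_date = reading.date() > cutoff.date()

print(passes_as_datetime)  # True
print(passes_as_date)      # False
```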
With SQL you can use SELECT DISTINCT.
The SELECT DISTINCT statement is used to return only distinct (different) values.
Inside a table, a column often contains many duplicate values, and sometimes you only want to list the distinct values.

SQL Server return date range and associated value between specified values

I'm fetching records from the DB using a query like:
SELECT date as "Date", count(date) as "NumIssues"
FROM Issues
WHERE Date BETWEEN '2015-03-25' AND '2015-03-28'
GROUP BY date
ORDER BY date;
The query works fine but I need it to fetch results even if there are no values for the specified date and return 0 for the NumIssues value.
Would the best way to go about this be to put in CASE statements? Thanks in advance!
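;WITH dates ( "Date" ) AS (
-- anchor row: first day of the range
SELECT CONVERT( DATE, '2015-03-25' )
UNION ALL
-- add one day at a time, up to and including the last day of the range
SELECT DATEADD( DAY, 1, Date )
FROM dates
WHERE DATEADD( DAY, 1, Date ) <= '2015-03-28'
)
SELECT d.Date
, COUNT(i.date) as "NumIssues" -- counts only matched rows, so empty days give 0
FROM dates AS d
LEFT JOIN Issues AS i
ON i.date = d.Date
GROUP BY d.date
ORDER BY d.date;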
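The same pattern (generate every date in the range with a recursive CTE, LEFT JOIN the data onto it, and count only the joined column) can be tried out with Python's built-in sqlite3 module. This is a sketch against SQLite rather than SQL Server; the issue_date column name and the three sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues (issue_date TEXT)")
conn.executemany(
    "INSERT INTO issues (issue_date) VALUES (?)",
    [("2015-03-25",), ("2015-03-25",), ("2015-03-27",)],
)

# Build the calendar 2015-03-25 .. 2015-03-28, then count matching issues.
# count(issues.issue_date) ignores the NULLs produced by unmatched days.
rows = conn.execute(
    """
    WITH RECURSIVE dates(d) AS (
        SELECT '2015-03-25'
        UNION ALL
        SELECT date(d, '+1 day') FROM dates WHERE d < '2015-03-28'
    )
    SELECT dates.d, count(issues.issue_date) AS num_issues
    FROM dates
    LEFT JOIN issues ON issues.issue_date = dates.d
    GROUP BY dates.d
    ORDER BY dates.d
    """
).fetchall()
conn.close()

print(rows)
# [('2015-03-25', 2), ('2015-03-26', 0), ('2015-03-27', 1), ('2015-03-28', 0)]
```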

Cast string as date and use it in comparison

I have a table as
NUM | TDATE
1 | 200712
2 | 200708
3 | 200704
4 | 20081210
where mytable is created as
mytable
(
num int,
tdate char(8) -- legacy
);
The format of tdate is YYYYMMDD, but the day part is sometimes omitted.
So a value such as "200712" should be interpreted as 2007-12-01.
I want to write a query that treats tdate as a date column so that I can apply date comparisons,
like
select num, tdate from mytable where tdate
between '2007-12-31 00:00:00' and '2007-05-01 00:00:00'
So far i tried this
select num, tdate,
CAST(LEFT(tdate,6)
+ COALESCE(NULLIF(SUBSTRING(CAST(tdate AS VARCHAR(8)),7,8),''),'01') AS Date)
from mytable
How can I use the converted date above (the third column) for comparison? Does it need a join?
Also, is there a better way to do this?
Edit: I have no control over the table schema for now. We have suggested the change to the DB team, but for now we have to stick with char(8).
I think this a better way to get your fixed date:
SELECT CAST(LEFT(RTRIM(tdate) + '01',8) AS DATE)
You can create a subquery/cte with the date cast properly:
;WITH cte AS (select num, tdate, CAST(LEFT(RTRIM(tdate) + '01',8) AS DATE) AS FixedDate
from mytable )
select num, FixedDate
from cte
where FixedDate
between '2007-05-01' and '2007-12-31'
Or you can just use your fixed date in the query directly:
select num, tdate
from mytable
where CAST(LEFT(RTRIM(tdate) + '01',8) AS DATE) between '2007-05-01' and '2007-12-31'
Ideally you would add the fixed date field to your table so that queries can benefit from indexing the date.
Note: Be wary of BETWEEN with DATETIME as the time portion can result in undesired results if you really only care about the DATE portion.
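To make the LEFT(RTRIM(tdate) + '01', 8) trick concrete, here is the same logic written out in Python (an illustration only, not T-SQL). It uses the four rows from the question, with the six-character values padded to eight characters as char(8) would store them. The function name fixed_date is made up for this sketch. The last two lines also show what happens when the range bounds are given in the wrong order.

```python
from datetime import date, datetime

def fixed_date(tdate: str) -> date:
    """Strip the char(8) padding, append '01', keep 8 characters, then parse."""
    padded = (tdate.rstrip() + "01")[:8]
    return datetime.strptime(padded, "%Y%m%d").date()

# Rows from the question; YYYYMM values carry two trailing spaces in char(8).
rows = [(1, "200712  "), (2, "200708  "), (3, "200704  "), (4, "20081210")]

low, high = date(2007, 5, 1), date(2007, 12, 31)

# Lower bound first: equivalent to BETWEEN low AND high.
matches = [num for num, tdate in rows if low <= fixed_date(tdate) <= high]

# Bounds swapped: no value can satisfy both conditions.
swapped = [num for num, tdate in rows if high <= fixed_date(tdate) <= low]

print(matches)  # [1, 2]
print(swapped)  # []
```

A full YYYYMMDD value is unaffected because the appended '01' falls outside the first eight characters.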
BETWEEN expects the lower bound first. Since '2007-12-31 00:00:00' > '2007-05-01 00:00:00', your BETWEEN clause will never return any records.
This will work, with a subquery, and with the dates flipped:
select num, tdate, formattedDate
from
(
select num, tdate
,
CAST(LEFT(tdate,6) + COALESCE(NULLIF(SUBSTRING(CAST(tdate AS VARCHAR(8)),7,8),''),'01') AS Date) as formattedDate
from mytable
) a
where formattedDate between '2007-05-01 00:00:00' and '2007-12-31 00:00:00'
I think you should avoid storing dates in string-type fields. If that is something you have to live with, try the following solution.
Since your values are in either yyyymmdd or yyyymm format, you can first bring them all into yyyymmdd format, which is the culture-independent ISO format, and then use style 112 to convert them to Date for comparison:
--Culture independent solution
;with cte as (
select num, tdate,
convert(date,left(rtrim(tdate) + '01',8),112) mydate --yyyymmdd format
from mytable
)
select num,tdate,mydate
from cte
where mydate between convert(date,'20070501',112) and --Values are in yyyymmdd format
convert(date,'20071231',112)
Yet another way to turn your string values into dates would be to use REPLACE:
-- the search string is two spaces: the padding char(8) adds to a YYYYMM value
SELECT num, tdate
FROM mytable
WHERE CAST(REPLACE(tdate, '  ', '01') AS date) BETWEEN @date1 AND @date2
;
If you really want to both return the converted date value and use it for filtering, you can employ CROSS APPLY to avoid repeating the logic:
SELECT t.num, t.tdate, x.date
FROM mytable AS t
CROSS APPLY (SELECT CAST(REPLACE(t.tdate, '  ', '01') AS date)) AS x (date)
WHERE x.date BETWEEN @date1 AND @date2
;
This method assumes that your char(8) strings are formatted as either YYYYMMDD or YYYYMM, although the method will work without any changes if you decide to start using values formatted as just YYYY in addition to the other two formats (to imply the beginning of a year, just like a YYYYMM implies the beginning of a month).
with date_cte(num,date)as
(select num,CAST(LEFT(tdate,6)
+ COALESCE(NULLIF(SUBSTRING(CAST(tdate AS VARCHAR(8)),7,8),''),'01') AS Date)
from mytable)
select t1.num, t1.tdate,t2.date
from mytable t1 join date_cte t2 on t1.num=t2.num
where t2.date
between '2007-05-01 00:00:00' and '2007-12-31 00:00:00'
I don't have the time to test right now, but something like this may work...
select num, tdate
from mytable
WHERE CAST(LEFT(tdate,6)
+ COALESCE(NULLIF(SUBSTRING(CAST(tdate AS VARCHAR(8)),7,8),''),'01') AS Date) BETWEEN CAST('2007-05-01 00:00:00' as smalldatetime) and CAST('2007-12-31 00:00:00' as smalldatetime)
My proposal would be to add a date field to your table. If your table is regularly updated, fill it from the legacy field through a stored proc that runs either from a trigger or as a scheduled job.
You'll then be able to use the date as ... a date, without all these tricks, workarounds, and other approximations, which are all potential sources of confusion, mistakes, and questionable results.