Need to convert chararray to datetime in specific format in PIG - apache-pig

In my pig code a variable event_date is calculated like this:
SUBSTRING(case when join_start_ts is NULL or TRIM(join_start_ts)=='' then 'null' else join_start_ts end,0,10) as event_date;
Where event_date looks like this, for example: 2018-04-30 00:00:00.0 (note: the hours, minutes and seconds should all be zero)
In DDL table (where event_date variable is stored after calculation is done), the event_date variable is defined as:
........
,event_date timestamp
)
PARTITIONED BY (data_input_date string)
stored as orc
location
'${hiveconf:s3bucket}/${hiveconf:fact_path}/${hiveconf:join_failure_fact}/'
TBLPROPERTIES ("orc.compress"="snappy");
While doing the calculation shown above, I only want to change event_date to a datetime whose hour, minute and second parts are all zero.
For this, I have tried two things:
Wrapping the SUBSTRING in ToDate(), but that is not supported in Pig. Without the SUBSTRING I could have used ToDate directly.
Using the calculation below, event_date comes out in datetime format, but it looks like this, for example: 2018-04-30 17:03:50.798 (I want the hours, minutes and seconds to all be zero)
(case when join_start_ts is NULL or TRIM(join_start_ts)=='' then NULL else ToDate(join_start_ts) end) as eventdate;
What should I do so that event_date is calculated as 2018-04-30 00:00:00.0 and is in datetime format?

If join_start_ts is already in the required datetime format but stored as a string, you can use SUBSTRING to get the date part and then CONCAT '00:00:00.0'. If it is in milliseconds or in a different format, use ToDate, ToString, SUBSTRING, and CONCAT. Finally, cast it back to datetime.
(case
when join_start_ts is NULL or TRIM(join_start_ts) == '' then NULL
else CONCAT(SUBSTRING(join_start_ts,0,11),'00:00:00.0')
end) as eventdate;
OR
(case
when join_start_ts is NULL or TRIM(join_start_ts) == '' then NULL
else CONCAT(SUBSTRING(ToString(ToDate(join_start_ts)),0,11),'00:00:00.0')
end) as eventdate;
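To check the expected result outside Pig, the same truncation can be sketched in Python. This is only an illustration of the logic, not Pig code: it assumes join_start_ts arrives as a string such as '2018-04-30 17:03:50.798' whose first 10 characters are the date, and the function name to_midnight is hypothetical.

```python
from typing import Optional


def to_midnight(join_start_ts: Optional[str]) -> Optional[str]:
    """Keep the date part and zero out the time, mirroring the CASE expression."""
    # NULL or blank input maps to NULL, as in the Pig snippet.
    if join_start_ts is None or join_start_ts.strip() == "":
        return None
    # Assumption: the first 10 characters are the yyyy-MM-dd date part.
    return join_start_ts[:10] + " 00:00:00.0"


print(to_midnight("2018-04-30 17:03:50.798"))  # 2018-04-30 00:00:00.0
```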

Related

Date column which contains null values as well

I have a column called startup_date, defined as a STRING datatype in BigQuery,
which contains values like "2001-09-09 02:19:38.0 UTC" as well as null values.
Please help me use a conversion function to fetch only the date value, without the hours and minutes.
I used the function below and got an invalid datetime string error message:
EXTRACT(date FROM
datetime(CASE when startup_date = '' THEN NULL ELSE startup_date END))
The DATE and TIMESTAMP functions do exactly what you are looking for. If you have a STRING column where its format is like TIMESTAMP, you can simply apply it. Then, DATE will extract just the date and it takes care of the NULL values.
WITH my_data AS
(
SELECT TIMESTAMP("2001-09-09 02:19:38.0 UTC") AS startup_date UNION ALL
SELECT NULL UNION ALL
SELECT "2021-10-10 07:29:30.0 UTC"
)
SELECT DATE(startup_date) as date FROM my_data
You can try substr[1] from 1 to 10 to get the date, and then you can use the safe.parse_date function[2].
SELECT safe.parse_date('%Y-%m-%d', substr(startup_date, 1, 10)) AS startup_date FROM your_dataset.your_table
[1] https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#substr
[2] https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#parse_date
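The "take the first 10 characters, then parse safely" idea can be sketched in Python to show how NULLs and malformed values are expected to behave. This is an illustration of the logic only, not BigQuery code; the function name safe_parse_date is hypothetical and merely echoes the SAFE prefix used above.

```python
from datetime import date, datetime
from typing import Optional


def safe_parse_date(startup_date: Optional[str]) -> Optional[date]:
    """Return the date part of a timestamp-like string, or None if unusable."""
    # NULL and empty strings both map to None.
    if not startup_date:
        return None
    try:
        # Only the leading yyyy-mm-dd part is parsed; time and zone are ignored.
        return datetime.strptime(startup_date[:10], "%Y-%m-%d").date()
    except ValueError:
        # Like a SAFE-prefixed function, return None instead of raising.
        return None


print(safe_parse_date("2001-09-09 02:19:38.0 UTC"))  # 2001-09-09
```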

Difference Between two dates or N/A

I have a calculation between two dates.
What I'm trying to do is if the start date is before 03/07/20 then show N/A otherwise the difference between the two dates
case
when StartDate < cast('03-07-20' as date) then
'N/A'
else
DATEDIFF(day, cast(StartDate as date), cast(SwabDate as date) )
end
as Days_From_First
I get
Conversion failed when converting the varchar value 'N/A' to data type int.
Warning: Null value is eliminated by an aggregate or other SET operation.
Thank you for your help
All branches of a case expression must return the same datatype, so you can't have one branch return a string ('N/A') and the other an integer (as returned by datediff()). When you do that, SQL Server gives the integer type higher precedence and therefore attempts to convert 'N/A' to an integer, which fails.
You could cast the return value of datediff() to a string - but I would not recommend that. Probably, using null is the best way to go here: in SQL, that's usually how we represent the absence of data:
case when StartDate >= '20200703'
then datediff(day, cast(startdate as date), cast(swabdate as date))
end as Days_From_First
Note that I changed the date comparison to use the literal '20200703', which SQL Server is able to unambiguously understand as a date in format YYYYMMDD.
You need to cast the results to be strings:
(case when StartDate < cast('2020-03-07' as date)
then 'N/A'
else convert(varchar(255), datediff(day, cast(StartDate as date), cast(SwabDate as date) ))
end) as Days_From_First
That said, I would really suggest that you forget about 'N/A' and just use NULL.
Also, I don't know if your date is 2020-03-07 or 2020-07-03. That is why you should use standard date formats.
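The "one type per column, NULL for missing" advice can be illustrated with a small Python sketch. It is not SQL Server code: the cutoff is assumed to mean 3 July 2020 (the question's '03-07-20' is ambiguous, as noted above), and the name days_from_first simply reuses the column alias.

```python
from datetime import date
from typing import Optional

# Assumption: the cutoff is 3 July 2020, written unambiguously.
CUTOFF = date(2020, 7, 3)


def days_from_first(start: date, swab: date) -> Optional[int]:
    """Return the day difference as an int, or None before the cutoff."""
    if start < CUTOFF:
        return None  # absence of data, instead of the string 'N/A'
    return (swab - start).days


# Turning None into 'N/A' is left to the presentation layer.
result = days_from_first(date(2020, 7, 1), date(2020, 7, 10))
print("N/A" if result is None else result)  # N/A
```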

How to check if column having date of birth format has yyyymmdd in sql server?

How to check if column having date of birth format has yyyymmdd in sql server?
You can verify with the ISDATE function, but I don't know which SQL Server edition you have.
Example
select ISDATE ( 1 )
------------------
0
select ISDATE ( 11111111 )
------------------
0
select ISDATE ( 20170501)
------------------
1
You can use:
select (case when try_convert(date, dob) is not null and
try_convert(int, dob) is not null
then 1 else 0
end)
I'm not 100% sure, but I think that yyyymmdd is the only format that will generally pass both conditions. Note: There is no way to know if 20170601 is really June 1st or Jan 6th, so this cannot actually validate the contents of the field.
But why do you care what the format is, so long as you can convert it to a date? You should then change the column to a date type and henceforth know that the "format" is correct.
If the column is a Char type (yuck) then Like '[12][0-9][0-9][0-9][0-1][0-9][0-3][0-9]' (off the top of my head)
If it's a DateTime then who cares, it's a date.
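If the check has to happen outside the database, a rough Python equivalent of "eight digits that also form a real calendar date" might look like the sketch below. The function name is_yyyymmdd is hypothetical, and the accepted year range is Python's, which is not necessarily the same as SQL Server's.

```python
from datetime import datetime


def is_yyyymmdd(value: str) -> bool:
    """True if value is exactly 8 ASCII digits that form a valid date."""
    # The length check matters because strptime also accepts 1-digit months/days.
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        return False
    try:
        datetime.strptime(value, "%Y%m%d")
    except ValueError:
        return False
    return True


print(is_yyyymmdd("20170501"))    # True
print(is_yyyymmdd("20171301"))    # False (month 13)
print(is_yyyymmdd("2017-05-01"))  # False (not 8 digits)
```

As the answer above notes, this only checks the shape: it cannot tell whether 20170601 was meant as 1 June or 6 January.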

If/then/else in SQL Query

I would like to check a date value in my SQL query. If a date is equal to a predefined date then do not print anything, ELSE print the existing date value.
How can I write it correctly in order to take the desired date value ?
I have the following query:
(SELECT (CASE
WHEN (PaymentsMade.PaymentDate = '09/09/1987') THEN ' '
ELSE PaymentsMade.PaymentDate
END)
) as dateOfPayment
When I run this query it works correctly when the date is equal to '09/09/1987' , whereas when the date is not equal to '09/09/1987' it prints '01/01/1900'.
How can I retrieve the dates values that are not equal to the predefined date '09/09/1987'?
Any advice would be appreciated.
Thanks
The CASE clause needs to return a consistently-typed value, so it is implicitly converting a space to a date (which is evaluated as 1 Jan 1900).
You have two choices:
select a null instead of a blank space.
explicitly cast the date in the else condition to a string.
Here's an (implicit) example of the former:
SELECT (CASE WHEN PaymentsMade.PaymentDate <> '09/09/1987'
THEN PaymentsMade.PaymentDate
END)
as dateOfPayment
Use NULL, not empty string
An empty string is cast to zero implicitly, which is '01/01/1900'
SELECT CAST('' AS datetime)
Using a CASE statement changes the value in that field, but doesn't change which rows are returned.
You appear to want to filter out rows, and if that is the case, use a WHERE clause...
SELECT
*
FROM
PaymentsMade
WHERE
PaymentDate <> '09/09/1987'
You could use NULLIF to replace a specific date with a NULL:
SELECT NULLIF(PaymentsMade.PaymentDate, '09/09/1987')
FROM ...
Don't just use an empty string, because it would be converted to the type of PaymentDate, which is probably a datetime, and an equivalent datetime for '' would be 1900-01-01 00:00:00.000.
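The NULLIF idea, replacing one sentinel date with "no value" while leaving every other date untouched, can be sketched in Python as follows. This is an illustration only; the helper name nullif and the SENTINEL constant are hypothetical.

```python
from datetime import date
from typing import Optional

# The predefined date from the question, 9 September 1987.
SENTINEL = date(1987, 9, 9)


def nullif(value: Optional[date], sentinel: date) -> Optional[date]:
    """Return None when value equals sentinel, otherwise value unchanged."""
    return None if value == sentinel else value


print(nullif(date(1987, 9, 9), SENTINEL))  # None
print(nullif(date(2011, 5, 2), SENTINEL))  # 2011-05-02
```

The result keeps a single type (a date or None), which is the point of avoiding the blank string.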

SQL Server: Using a case statement to see if a date is NULL and if it is returning ' '

I have a column in my select statement that looks like this:
SELECT CASE WHEN fu.SentOutDate IS NULL THEN '' ELSE fu.SentOutDate END
This returns 1900-01-01 00:00:00.000 for the ones that would otherwise be NULL
I know this because when I put in just fu.SentOutDate it comes up as NULL
Why does this happen and how can I just get it to return a blank value?
Try converting the date to a string so it doesn't try to convert '' to a date:
(CASE WHEN fu.SentOutDate IS NULL THEN '' ELSE CONVERT(varchar,fu.SentOutDate) END)
It's casting your '' to a DATETIME, since your other column you'd return is a datetime column.
SELECT CASE WHEN 1=1 THEN '' ELSE GETDATE() END
will give you the same value...
You can convert this to a varchar(32), but I'm not certain of the ramifications
SELECT CASE WHEN 1=1 THEN '' ELSE CAST(GETDATE() AS varchar(32)) END
A column can only return one data type - DATETIME != string/VARCHAR.
If you want a zero length string in the event of the value being NULL, you have to explicitly change the data type, using CAST/CONVERT to change the non-NULL value to a VARCHAR/etc data type.
If you're just checking for NULL values, you might try ISNULL() and cast the date as a varchar.
SELECT ISNULL(CAST(fu.SentOutDate AS VARCHAR(50)), '') AS SendOutDate
FROM tablename
It sounds like you're displaying this value in a GUI or client somewhere. In my opinion, the best practice is to convert it from the NULL value there, not in the query.
If you ever create a database that scales to millions of users, you want as little processing as possible in the database and as much as possible in the client. Converting a date to characters is an unneeded load on the system (character calculation is always much slower than math).
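Handling the NULL on the client side, as suggested above, could look like this minimal Python sketch. It assumes the driver hands back None for NULL and a datetime otherwise; the function name display_date and the output format are assumptions chosen for illustration.

```python
from datetime import datetime
from typing import Optional


def display_date(sent_out: Optional[datetime]) -> str:
    """Format a datetime for display, showing a blank for NULL."""
    if sent_out is None:
        return ""
    return sent_out.strftime("%Y-%m-%d %H:%M:%S")


print(repr(display_date(None)))                            # ''
print(display_date(datetime(2018, 4, 30, 17, 3, 50)))      # 2018-04-30 17:03:50
```

The query then returns the datetime column unchanged, so its type stays a datetime all the way to the client.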
This may work:
case when ISNULL(convert(varchar, a.rec_dt, 108), '00:00:00')='00:00:00' then ''
else CAST(a.rec_dt as varchar)
end