SQL: Split a value up using SQL in Hive/Hue

I need some advice on splitting a number into a date/timestamp. I am currently using Hue to query the Hive database.
In a table I have a column that captures a unique reference for a record. The value looks like this:
219872021081000741
Contained within this is a date and time. I'm looking to extract the date/time from it (using SQL) and have it as a column of its own. Here is the breakdown of the number:
Reading left to right, the date/time parts are DD, YYYY, MM and HHMM; the 3-digit groups in between (987) and at the end (741) are not part of the date:
21 987 2021 08 1000 741
regex
[0-3]?[0-9]{1}$ref[2][0-9][0-9][0-9][0-1][0-9][0-2][0-9][0-5][0-9][0-9]{3}_"
Using SQL, I want to parse the number and create a column that formats it as DD-MM-YY HHMM, as a timestamp. I have reviewed some posts and tried a few things, but without much luck. The other sticking point is that DD will not always be two digits; e.g. on the 1st it will be 1, not 01.
I'm trying to incorporate this into the query below. Thanks in advance for any advice.
select *,
cast((UTC +(60*60*12)*1000)/1000 as TIMESTAMP) as `LocalTime`
from Table.Name
where
name rlike 'FieldValue.*'
UPDATE: In a roundabout way, I updated the SQL to count the digits in the value.
If it has 17 digits, then I know the day is anywhere from the 1st to the 9th, so I tag it as 17.
If it has 18 digits, then I know the day is anywhere from the 10th to the end of the month, so I tag it as 18.
From here I use substring to return the date components, which I'll bring into a single field via concat or something along those lines.
Here is the updated SQL. I just need some guidance on how to use the new column FieldCount: if it is 17, then substring(FieldValue,1,1), since the day is anything from the 1st to the 9th; if it is 18, then substring(FieldValue,1,2), since the day is anything from the 10th up.
select *,
cast((utc+(60*60*12)*1000)/1000 as TIMESTAMP) as `LocalTime`,
case
when FieldValue REGEXP '^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$' then '17'
when FieldValue REGEXP '^[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]$' then '17'
end FieldCount,
substring(FieldValue ,6,4) as Years,
substring(FieldValue ,1,1) as Days,
substring(FieldValue ,10,2) as Months,
substring(FieldValue ,12,2) as Hours,
substring(FieldValue ,14,2) as Minutes
from table.name
New update: I changed this to split the value based on a CASE condition on its length, which separates it into individual fields. Any ideas on how to concat these based on the alias field names?
select
AField,
cast((UTC+(60*60*12)*1000)/1000 as TIMESTAMP) as `LocalTime`,
case when length(AField) = 18 then substring(AField,1,2) else substring(AField,1,1) end Days,
case when length(AField) = 18 then substring(AField,10,2) else substring(AField,9,2) end Months,
case when length(AField) = 18 then substring(AField,6,4) else substring(AField,5,4) end years,
case when length(AField) = 18 then substring(AField,12,2) else substring(AField,11,2) end Hours,
case when length(AField) = 18 then substring(AField,14,2) else substring(AField,13,2) end minutes
from table.name

The correct timestamp string representation in Hive is yyyy-MM-dd HH:mm:ss.S.
You do not need to extract all the parts separately and then concat them to get a timestamp. Using regexp_replace, you can build a correct timestamp string with backreferences ($1, $2, ...) to the capturing groups (in round brackets) in the regexp.
with mytable as(--test dataset, use your table instead
select stack(2,
'219872021081000741',
'19872021081000741'
) as AField
)
select
case when length(AField) = 18
then timestamp(regexp_replace(AField,'^(\\d{2})\\d{3}(\\d{4})(\\d{2})(\\d{2})(\\d{2})\\d{3}$','$2-$3-$1 $4:$5:00.0'))
else timestamp(regexp_replace(AField,'^(\\d)\\d{3}(\\d{4})(\\d{2})(\\d{2})(\\d{2})\\d{3}$','$2-$3-0$1 $4:$5:00.0'))
end as result
from mytable
Result:
result
2021-08-21 10:00:00.0
2021-08-01 10:00:00.0
Note: the timestamp() construct here demonstrates that the string produced is compatible with the timestamp data type and is cast correctly. You can keep it as a string if you prefer.
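To sanity-check the capturing-group logic outside Hive, here is a minimal Python sketch of the same idea. It is only an illustration: parse_ref and REF_PATTERN are hypothetical names, and the pattern assumes the layout described in the question (1 or 2 day digits, a 3-digit reference, YYYY, MM, HH, MM, and a 3-digit suffix). Because everything after the day has a fixed width, a single pattern with \d{1,2} can cover both the 17- and 18-digit cases.

```python
import re
from datetime import datetime

# Day (1-2 digits), 3 ignored digits, year, month, hour, minute, 3 ignored digits.
REF_PATTERN = re.compile(r"^(\d{1,2})\d{3}(\d{4})(\d{2})(\d{2})(\d{2})\d{3}$")


def parse_ref(ref: str) -> datetime:
    """Extract the embedded date/time from a 17- or 18-digit reference."""
    match = REF_PATTERN.match(ref)
    if match is None:
        raise ValueError(f"unexpected reference format: {ref!r}")
    day, year, month, hour, minute = (int(group) for group in match.groups())
    return datetime(year, month, day, hour, minute)


for ref in ("219872021081000741", "19872021081000741"):
    print(parse_ref(ref).strftime("%Y-%m-%d %H:%M:%S"))
# 2021-08-21 10:00:00
# 2021-08-01 10:00:00
```

The anchors (^ and $) matter here: without them the optional second day digit would make the match ambiguous.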

Related

Concat is not working inside case statement

I have a month column with values from 1 to 12. I am writing the query below to convert one-digit values to two digits, so that values like 1 and 2 become 01 and 02, but the concatenation is not working: the month still remains a single digit.
Main query:
select
case
when len(month) = 1
then concat(0, month)
else month
end as month_new,
month
from
Table
But when I tried the query separately as below the concatenation works and it converts single digit month to 2 digits
Query 1
select top 10 concat(0, month), month
from table
Query 1 alone is working
Query 2
select
case
when len(month) = 1
then 1
else 0
end,
month
from
Table
Query 2 alone works, which means the length check on the month column behaves as expected. But when concat is used inside CASE, it does not work.
I modified the query as below and it worked for me:
select
case
when len(month) = 1
then concat(0, month)
else cast(month as varchar)
end as month_new,
month
from
table
The problem is that month is an integer, whereas the result of concat() is a string. All branches of a CASE expression must resolve to one type, so CASE is trying to cast the string back into an integer. You could force the integer into a string by using cast, but there are better ways to do this.
Instead, just use the FORMAT function:
select
format(month, '00') as month_new
, month
from viivscaazure.F_SALES_DETAIL
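The same idea, padding with a single formatting call rather than a CASE with mixed types, can be tried quickly from Python with the built-in sqlite3 module. This is only a sketch under assumptions: the sales table is made up for the demo, and SQLite's printf() stands in for SQL Server's FORMAT(), which SQLite does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month INTEGER)")  # hypothetical demo table
conn.executemany("INSERT INTO sales VALUES (?)", [(1,), (2,), (12,)])

# One formatting call returns a string for every row, so no CASE is needed
# and there is no string/integer mismatch between branches.
rows = conn.execute(
    "SELECT printf('%02d', month) AS month_new, month FROM sales ORDER BY month"
).fetchall()
print(rows)  # [('01', 1), ('02', 2), ('12', 12)]
```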
I don't know which database you are using, and since you don't provide any sample data I can only assume that your CASE is not the problem. If the column's datatype is a string, then you are trying to CONCAT a string with an integer in your query.
You could try putting quotes around the zero so that it is a string ('0'), and CAST the result as a string.

Extracting date in SQL

I have a due date column in the format 20210701 (YYYYMMDD). Using SQL, I want to extract all the dates except the 5th of each month.
I used the below code:
SELECT Due_Date_Key
FROM Table
WHERE Due_Date_Key <>20210705
However, the problem with the above code is that it excludes the 5th only for July, not for the other months.
How can I extract all dates apart from the 5th across the entire column?
Help would be much appreciated.
Note that column DUE_DATE_KEY is numeric.
A more SQL-ish way would be to convert the value to a date and then check that the day is not 5:
SELECT * FROM Table
WHERE DATE_PART('day', to_date(cast(DUE_DATE_KEY as varchar), 'YYYYMMDD')) != 5
Since the column is numeric, you can use the modulo operator to test whether the last two digits of DUE_DATE_KEY are 05:
select * from T where DUE_DATE_KEY % 100 <> 5
Using your sample data, the above query returns the following:
due_date_key
20210701
20210708
20210903
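The modulo filter is easy to try from Python's built-in sqlite3 module. This is a sketch with made-up sample keys (the original sample data was only shown in an image), so treat the values as assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (DUE_DATE_KEY INTEGER)")
# Hypothetical sample keys in YYYYMMDD form, including two "5th of the month" rows.
keys = [20210701, 20210705, 20210708, 20210805, 20210903]
conn.executemany("INSERT INTO T VALUES (?)", [(k,) for k in keys])

# key % 100 isolates the DD part, so this drops the 5th of every month.
kept = [
    row[0]
    for row in conn.execute(
        "SELECT DUE_DATE_KEY FROM T WHERE DUE_DATE_KEY % 100 <> 5 ORDER BY DUE_DATE_KEY"
    )
]
print(kept)  # [20210701, 20210708, 20210903]
```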

How to calculate difference of dates in different formats in Snowflake?

I am merging 2 huge tables in Snowflake and I have 2 columns (one on each table):
"Year_birth" and "Exam_date" and the info inside looks like this respectively:
"1918" and "2007-03-13" (NUMBER(38,0) and VARCHAR(256))
I only want to merge the rows where the difference (i.e., age when the exam was made) is ">18" and "<60"
I was playing around with SELECT DATEDIFF(year,Exam_date, Year_birth) with no success.
Any ideas on how would I do it in Snowflake?
Cheers!
You only have a year, so there is not much you can do about the specific day of the year -- you need to deal with approximations.
So, extract the year from the date string (arggh! it should really be a date) and just compare them:
where (left(datestr, 4)::int - yearnum) between 19 and 59 -- i.e. > 18 and < 60
I would strongly advise you to fix the database and store these values using a proper date datatype.
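As a quick check of the year-only approximation, here is a minimal Python sketch. The function name age_at_exam and the sample rows are illustrative assumptions; the bounds follow the question's ">18" and "<60".

```python
def age_at_exam(year_birth: int, exam_date: str) -> int:
    """Approximate age: exam year (first 4 chars of 'YYYY-MM-DD') minus birth year."""
    return int(exam_date[:4]) - year_birth


# Hypothetical (Year_birth, Exam_date) pairs.
rows = [(1918, "2007-03-13"), (1970, "2007-03-13"), (1990, "2007-03-13")]

# Keep only rows where the approximate age is strictly between 18 and 60.
eligible = [(y, d) for y, d in rows if 18 < age_at_exam(y, d) < 60]
print(eligible)  # [(1970, '2007-03-13')]
```

Because only the birth year is known, the result can be off by up to one year for any individual.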
You will need to convert the integer year into a date before doing a datediff.
Example:
set YearOfBirth = 1918;
set ExamDate = '2007-03-03'::DATE;
-- select $YearofBirth as YearofBirth, $ExamDate as ExamDate;
select $YearofBirth as YearofBirth,($YearofBirth::TEXT||'-01-01')::DATE as YearofBirthDate, $ExamDate as ExamDate, datediff(year,($YearofBirth::TEXT||'-01-01')::DATE,$ExamDate) as YearsSinceExam;
Use YEARS_DIFF in the WHERE clause to filter for a difference between 18 and 60:
SELECT DATEDIFF(YEAR, TO_DATE(1918::CHAR(4), 'YYYY'), '2007-03-03'::DATE) YEARS_DIFF;

SQL DB2 - Possible to shorten long-listed 'case when' statement?

Sometimes my queries have long case when statements. For example,
CASE WHEN BASE_YM = TO_CHAR(DEFAULT_YM, 'YYYYMM') THEN '00'
WHEN BASE_YM = TO_CHAR(DEFAULT_YM - 1 MONTHS, 'YYYYMM') THEN '01'
WHEN BASE_YM = TO_CHAR(DEFAULT_YM - 2 MONTHS, 'YYYYMM') THEN '02'
.
.
.
WHEN BASE_YM = TO_CHAR(DEFAULT_YM - 35 MONTHS, 'YYYYMM') THEN '35'
WHEN BASE_YM = TO_CHAR(DEFAULT_YM - 36 MONTHS, 'YYYYMM') THEN '36'
This CASE WHEN statement alone takes up 37 lines, and I was wondering whether there is a way to shorten it, for example by looping over i = 00, ..., 36 or something like that.
It seems like all you're really doing is finding out how many months are between BASE_YM and DEFAULT_YM, and making that a character type. I think the same result can be retrieved with some date math (and I'm guessing it would be more efficient, because it wouldn't have to evaluate up to 37 different TO_CHAR calls). This whole statement would replace the CASE. I tested similar logic on DB2 for Linux/Unix/Windows v9.7 and z/OS v10:
SELECT
LPAD(
ABS(
MONTH(
TIMESTAMP_FORMAT(BASE_YM, 'YYYYMM') -
DEFAULT_YM
)
)
,2,'0')
FROM your_table
I added the extra spacing so that you could (hopefully) follow along a little more easily. Basically, it subtracts DEFAULT_YM from the converted form of BASE_YM, takes the absolute value of the difference in months, and then converts it to a zero-padded character field.
Alternatively, make a new table with two columns containing your 37 rows of distinct values, then join to it. This eliminates the need for a CASE statement altogether.
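The arithmetic that the long CASE encodes, a whole-month difference rendered as a two-character string, can be sketched in Python to check expected values. This is an illustration only, not DB2 code; months_back is a hypothetical name, and it assumes BASE_YM is a 'YYYYMM' string and DEFAULT_YM is a date.

```python
from datetime import date


def months_back(base_ym: str, default_date: date) -> str:
    """Whole months between a 'YYYYMM' string and a date, zero-padded to 2 chars."""
    base_year, base_month = int(base_ym[:4]), int(base_ym[4:6])
    # Count years as 12 months each so differences of 12 or more are preserved.
    diff = (default_date.year - base_year) * 12 + (default_date.month - base_month)
    return f"{abs(diff):02d}"


print(months_back("202108", date(2021, 8, 15)))  # 00
print(months_back("202105", date(2021, 8, 15)))  # 03
print(months_back("201808", date(2021, 8, 15)))  # 36
```

Whichever SQL form you use, it is worth checking it against values like these for differences of 12 months or more, where the year part of the difference has to be folded into the month count.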

Select range age with unit of measure in same field

I am working on a table where the age of a person is in a string field where it is in the following format: (amount UnitOfMeasurement)
1 year old = 1 y
11 months old = 11 m
5 Days old = 5 d
I am trying to search within a range of ages. Is it possible to do this via a SQL query that orders the days (d) first, then months (m), then years (y)?
The database is on SQL Server 2008, but the query will probably be done on Access as it is used for a report's record source.
The first thing I'd do in your situation is try to clean up the messy age field, and standardise it. A quick start might be to create a query where you separate the age value and the age unit, by using expressions such as:
age_unit: Right([age], 1)
and
age_value: Val([age])
If you then sort by age_unit and age_value, you will get all ages sorted correctly (under the assumption that an age in days is always less than an age in months, which in turn is always less than an age in years). Note that you must sort by unit first, then value.
If you want to return ages between a certain minimum and maximum, it's not a problem if you're sticking to a single unit, such as all ages between 5 years and 15 years. Just enter "y" as a criteria under the "age_unit" field (assuming you're using the visual query builder here) and enter "Between 5 and 15" under the "age_value" field.
If you're mixing units ("all ages between 6 months and 2 years") it gets a little more complicated. In this case you'd need to do the following:
On one criteria row you'd enter the following values for each field:
age_unit: "m"
age_value: >=6
And then on the next criteria row:
age_unit: "y"
age_value: <=2
This will return all ages having unit "m" and a value >= 6 OR having unit "y" with a value <=2.
Another somewhat simpler solution would be to convert all ages to a standard unit such as years with some simple calculations, e.g. divide "d" unit values by 365.25 and divide "m" unit values by 12. Then create a new field in your table for the new standardised age data.
Your best bet would be to create a new column with a real DATETIME value in it. You could then write code, such as a CASE statement, to help convert the string into a DATETIME. Once completed, your calculations will become much simpler.
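The conversion to a standard unit can be sketched in a few lines of Python to show how sorting and mixed-unit ranges then become simple comparisons. This is an illustration under assumptions: age_in_years and the sample ages are made up, and the divisors are the approximations mentioned above (365.25 days per year, 12 months per year).

```python
# Divisor that turns a value in each unit into (approximate) years.
DIVISOR = {"d": 365.25, "m": 12.0, "y": 1.0}


def age_in_years(age: str) -> float:
    """Convert strings such as '11 m' or '5 d' to an approximate age in years."""
    value, unit = age.split()
    return int(value) / DIVISOR[unit]


ages = ["1 y", "11 m", "5 d", "6 m", "2 y", "3 y"]  # hypothetical sample values

ordered = sorted(ages, key=age_in_years)
low, high = age_in_years("6 m"), age_in_years("2 y")
in_range = [a for a in ordered if low <= age_in_years(a) <= high]

print(ordered)   # ['5 d', '6 m', '11 m', '1 y', '2 y', '3 y']
print(in_range)  # ['6 m', '11 m', '1 y', '2 y']
```

Unlike sorting by unit and then value, this also orders values such as "18 m" and "1 y" correctly relative to each other.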
1. This field doesn't have atomic values, which means that your table is not in 1NF.
You should split the Age field into two columns with atomic values: IntervalType (CHAR(1) ... CHECK (IntervalType IN ('d','m','y'))) and IntervalValue (INT; 1, 2, etc.).
So, instead of Table(...,Age) you can use Table(...,IntervalType,IntervalValue) and rebuild the display string when needed:
SELECT *
,CONVERT(VARCHAR(10),IntervalValue)
+' '+CASE IntervalType WHEN 'd' THEN 'day' WHEN 'm' THEN 'month' WHEN 'y' THEN 'year' END
+CASE WHEN IntervalValue > 1 THEN 's' ELSE '' END
+' old = '
+CONVERT(VARCHAR(10),IntervalValue)
+' '+IntervalType
FROM table
2. How do you sort these two values: 30 d and 1 m? One month can have from 28 to 31 days.
3. SQL Server solution:
DECLARE @TestData TABLE
(
Age VARCHAR(25) NOT NULL
,IntervalValue AS CONVERT(INT,LEFT(Age,CHARINDEX(' ',Age))) PERSISTED
,IntervalType AS RIGHT(Age,1) PERSISTED
);
INSERT @TestData
VALUES
('1 year old = 1 y')
,('2 years old = 2 y')
,('11 months old = 11 m')
,('30 Days old = 30 d')
,('5 Days old = 5 d');
SELECT *
FROM @TestData a
ORDER BY a.IntervalType, a.IntervalValue;