Timestamp comparison is failing in Spark SQL in Databricks

I was executing the simple Spark SQL code below in Azure Databricks:
val df2 = spark.sql(
  s"""
  select
    mbrgm.mbrgm_id as case_id,
    case
      when mbr_hist.meck is not null
        and mbr_hist.efdt is not null
        and mbr_hist.efdt <= mbr_pgm.credttm
        and (
          mbr_hist.exp_dt is null
          or mbr_hist.exp_dt > mbrgm.creat_dttm
        ) then mbr_hist.meck
      else mbrgm.facmbid
    end as mb_fid,
    .....
  from
    tempview1 mbrgm
    left outer join tempview2 mbr_hist on (mbrgm.mrid = mbr_hist.mrid
      and mbr_hist.efdt <= mbrgm.credttm
      and mbr_hist.exdt > mbrgm.credttm)
  """)
Every time I execute it, I get the else value for the mb_fid field, i.e. mbrgm.facmbid. I have checked my data against the logic, and according to the logic it should take the then branch. I think the comparison mbr_hist.efdt <= mbr_pgm.credttm is never true.
mbr_hist.efdt is a string, e.g. 2017-07-22 21:58:46, and mbr_pgm.credttm is a timestamp, e.g. 2011-08-13T11:00:00.910+0000. Is my comparison failing because the values differ in length? What can I use to compare them correctly?

Databricks can't directly compare a string with a timestamp; you need to convert the string into a timestamp first. By default, cast works only with strings in the ISO 8601 format, so use the to_timestamp function with an explicit date/time pattern to do the conversion, for example:
select to_timestamp(mbr_hist.efdt, 'yyyy-MM-dd HH:mm:ss') as efdt ...
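The same idea can be sketched outside Spark. This minimal Python analogue parses both example values from the question with explicit patterns before comparing them; `%Y-%m-%d %H:%M:%S` plays the role of Spark's `yyyy-MM-dd HH:mm:ss`. It assumes, hypothetically, that the zone-less efdt strings are in UTC, because Python will not compare a naive datetime with a zone-aware one.

```python
from datetime import datetime, timezone

# Example values from the question.
efdt_raw = "2017-07-22 21:58:46"              # string column, no zone offset
credttm_raw = "2011-08-13T11:00:00.910+0000"  # how the timestamp column is displayed

# Parse each string with an explicit pattern, like to_timestamp(col, pattern).
# Assumption: efdt is UTC; attaching a zone makes it comparable with credttm.
efdt = datetime.strptime(efdt_raw, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
credttm = datetime.strptime(credttm_raw, "%Y-%m-%dT%H:%M:%S.%f%z")

# Now the comparison is between two instants, not two strings.
efdt_before_creation = efdt <= credttm
print(efdt_before_creation)  # False: 2017-07-22 is later than 2011-08-13
```

With these two sample values the result is False in any case, so the real rows are worth rechecking once both sides are genuine timestamps.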

Related

Store int, float and boolean in same database column

Is there a sane way of storing int, float and boolean values in the same column in Postgres?
Say I have something like this:
rid                                  | time                              | value
-------------------------------------+-----------------------------------+-------
2d9c5bdc-dfc5-4ce5-888f-59d06b5065d0 | 2021-01-01 00:00:10.000000 +00:00 | true
039264ad-af42-43a0-806b-294c878827fe | 2020-01-03 10:00:00.000000 +00:00 | 2
b3b1f808-d3c3-4b6a-8fe6-c9f5af61d517 | 2021-01-01 00:00:10.000000 +00:00 | 43.2
Currently I'm using jsonb to store it. The problem now, however, is that I can't filter the table with, for instance, the greater-than operator.
The query
SELECT *
FROM points
WHERE value > 0;
gives back the error:
ERROR: operator does not exist: jsonb > integer: No operator matches the given name and argument types. You might need to add explicit type casts.
It's fine for me to treat booleans as 1 or 0 for true or false. Is there any way to achieve that with jsonb, or is there perhaps another supertype that would let one column hold all three types?
Performance is not much of a concern here, as I'm going to have very few records in that table, at most 5k I guess.
If you were just storing integers and floats, normally you'd use a float or numeric column.
But there's that pesky true.
You could cast the JSON...
select *
from test
where value::float > 1;
...but there's that pesky true.
You have to convert the boolean to a number to make it work.
select *
from test
where
(case when value = 'true' then 1.0 when value = 'false' then 0.0 else value::float end) >= 1;
Or ignore it.
This having to work around the type system suggests that value is actually two or even three different fields crammed into one. Consider separating them into multiple columns.
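The CASE expression above can be mirrored in Python to show the conversion rule on the question's three values. This is a sketch, not PostgreSQL behavior: `json.loads` stands in for jsonb, and the keys are the rid values from the question.

```python
import json

# The three value cells from the question, as JSON text keyed by rid.
rows = {
    "2d9c5bdc-dfc5-4ce5-888f-59d06b5065d0": "true",
    "039264ad-af42-43a0-806b-294c878827fe": "2",
    "b3b1f808-d3c3-4b6a-8fe6-c9f5af61d517": "43.2",
}

def as_number(raw: str) -> float:
    """Mirror the CASE expression: true -> 1.0, false -> 0.0, anything else -> float."""
    value = json.loads(raw)
    # Explicit boolean branches, as in the SQL: PostgreSQL will not cast a jsonb
    # boolean to float (Python's float(True) would quietly give 1.0).
    if value is True:
        return 1.0
    if value is False:
        return 0.0
    return float(value)

# Equivalent of: WHERE (CASE ... END) >= 1
matches = [rid for rid, raw in rows.items() if as_number(raw) >= 1]
print(matches)  # all three rids: true -> 1.0, 2 -> 2.0, 43.2 -> 43.2
```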
You should skip the rows where value is not a number and cast the value to numeric, e.g.:
with points(id, value) as (
values
(1, 'true'::jsonb),
(2, '2'),
(3, '43.2')
)
select *
from points
where jsonb_typeof(value) = 'number'
and value::text::numeric > 0;
id | value
----+-------
2 | 2
3 | 43.2
(2 rows)
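The jsonb_typeof filter can likewise be modelled in Python. The `jsonb_typeof` function below is a hypothetical, simplified stand-in for PostgreSQL's function of the same name, operating on values decoded with `json.loads`.

```python
import json

def jsonb_typeof(value) -> str:
    """Simplified Python counterpart of PostgreSQL's jsonb_typeof."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"

points = [(1, "true"), (2, "2"), (3, "43.2")]

# Equivalent of: WHERE jsonb_typeof(value) = 'number' AND value::text::numeric > 0
result = [
    (pid, raw)
    for pid, raw in points
    if jsonb_typeof(json.loads(raw)) == "number" and float(raw) > 0
]
print(result)  # [(2, '2'), (3, '43.2')]
```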
I actually found out that, regardless of the jsonb field's value, you can compare it with other jsonb values in Postgres. That means I can, for instance, do the following:
SELECT *
FROM points
WHERE val > '5'
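This gives me back the third row. One caveat: PostgreSQL documents the jsonb ordering as Object > Array > Boolean > Number > String > Null, so the true row can also satisfy val > '5'; adding jsonb_typeof(val) = 'number' to the filter excludes it. To filter for a certain bool, I can use the following query: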
SELECT *
FROM points
WHERE val = 'true'
This is good enough for me. I could even store timestamps in the JSON column and compare them the same way.
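Comparing jsonb to jsonb relies on PostgreSQL's documented jsonb ordering, Object > Array > Boolean > Number > String > Null, under which a boolean sorts above every number. The rough Python model below (an approximation, not PostgreSQL's implementation; `jsonb_sort_key` is a hypothetical helper) applies that ordering to the three sample values. It suggests the true row would also pass val > '5', so the result is worth verifying against real data.

```python
import json

# PostgreSQL's documented btree ordering for jsonb:
# Object > Array > Boolean > Number > String > Null.
TYPE_RANK = {dict: 5, list: 4, bool: 3, int: 2, float: 2, str: 1, type(None): 0}

def jsonb_sort_key(raw: str):
    """Approximate ordering key: compare by type rank first, then by value."""
    value = json.loads(raw)
    rank = TYPE_RANK[type(value)]
    # Simplification: only scalars are compared by value; arrays, objects and
    # null all get a constant second element.
    return (rank, value if rank in (1, 2, 3) else 0)

points = [(1, "true"), (2, "2"), (3, "43.2")]
threshold = jsonb_sort_key("5")

# Equivalent of: WHERE val > '5'
greater = [pid for pid, raw in points if jsonb_sort_key(raw) > threshold]
print(greater)  # [1, 3]: the boolean ranks above the number 5
```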
Judging by all your comments, another way of solving the problem seems to be making the column numeric. This would work as well, but it requires more client-side conversion, because I would need a second column recording the actual type. The client would then use that type to convert the value back into its original form. For integers it's trivial; for booleans, as @schwern suggested, one can use 1 and 0; for dates, one could use the Unix timestamp representation.
When I then want to search for a certain value, the type has to be included in the WHERE clause as well.
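The numeric-column alternative needs a conversion layer on the client. A minimal sketch of that layer, assuming a hypothetical scheme that stores a float value plus a text type tag (the tag names are made up for illustration):

```python
from datetime import datetime, timezone

def encode(value):
    """Return (numeric_value, type_tag) for storage in two columns (hypothetical scheme)."""
    if isinstance(value, bool):  # before int: bool is a subclass of int
        return (1.0 if value else 0.0, "bool")
    if isinstance(value, int):
        return (float(value), "int")
    if isinstance(value, float):
        return (value, "float")
    if isinstance(value, datetime):
        return (value.timestamp(), "timestamp")  # Unix seconds
    raise TypeError(f"unsupported type: {type(value).__name__}")

def decode(number: float, tag: str):
    """Convert the stored number back to its original type using the tag."""
    if tag == "bool":
        return number != 0
    if tag == "int":
        return int(number)
    if tag == "float":
        return number
    if tag == "timestamp":
        return datetime.fromtimestamp(number, tz=timezone.utc)
    raise ValueError(f"unknown tag: {tag}")

# Round-trip the kinds of values from the question.
samples = [True, 2, 43.2, datetime(2021, 1, 1, 0, 0, 10, tzinfo=timezone.utc)]
for original in samples:
    assert decode(*encode(original)) == original
```

A search then has to filter on both columns, e.g. WHERE type = 'int' AND value > 0 with these hypothetical column names. Python floats represent integers exactly only up to 2**53; PostgreSQL's numeric type has no such limit.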

Conditional Data Validation In SSIS

I have a table with two main columns, DataFieldName and DataFieldValue, along with identifiers such as OrderNumber.
One of the DataFieldName values is "OrderDate", and the corresponding dates arrive in DataFieldValue.
However, some of the "OrderDate" values are not dates. In SSIS, I need to validate DataFieldValue as a date only for the rows where DataFieldName is "OrderDate".
You can split your data using a Conditional Split transformation, or, if you are using SQL, with a query that has a condition:
SELECT DataFieldName, DataFieldValue FROM yourTable
WHERE DataFieldName = 'OrderDate'
If you are using SQL Server (2012 or later, where TRY_CONVERT is available):
SELECT
    DataFieldValue,
    CASE WHEN TRY_CONVERT(date, DataFieldValue) IS NULL
         THEN 'Cast failed'
         ELSE 'Cast succeeded'
    END AS Result
FROM yourTable
WHERE DataFieldName = 'OrderDate'
If you are using Oracle (12c Release 2 or later):
SELECT CAST(DataFieldValue AS DATE DEFAULT NULL ON CONVERSION ERROR) FROM yourTable
WHERE DataFieldName = 'OrderDate'
Or you can use a Data Conversion transformation and configure its error output to redirect the rows that fail conversion, for example to a flat file.
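The validation rule itself is simple to state in code. This Python sketch applies it to hypothetical sample rows: only OrderDate rows are checked, and it assumes the dates are written as YYYY-MM-DD, so the format string would need adjusting to the real data.

```python
from datetime import datetime

# Hypothetical sample rows: (OrderNumber, DataFieldName, DataFieldValue).
rows = [
    (1001, "OrderDate", "2021-03-15"),
    (1002, "OrderDate", "not a date"),
    (1003, "CustomerName", "Contoso"),
    (1004, "OrderDate", "2021-02-30"),
]

def is_valid_date(text: str, fmt: str = "%Y-%m-%d") -> bool:
    """True if text parses as a real calendar date in the assumed format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except ValueError:
        return False

# Only OrderDate rows are validated; other field names pass through untouched,
# like a Conditional Split followed by a conversion step.
invalid = [
    order for order, name, value in rows
    if name == "OrderDate" and not is_valid_date(value)
]
print(invalid)  # [1002, 1004]
```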

Hive - how to check if numeric columns have number/decimal values?

I am trying to write a Hive query that takes multiple numeric column names and checks whether each column has numeric values. If a column has numeric values, the output should be (column name, true); if it has NULL or some string value, the output should be (column name, false).
SELECT distinct (test_nr1,test_nr2) FROM test.abc WHERE (test_nr1,test_nr2) not like '%[^0-9]%';
SELECT distinct test_nr1,test_nr2 from test.abc limit 2;
test_nr1 test_nr2
NULL 81432269
NULL 88868060
The desired output is:
test_nr1 false
test_nr2 true
Since test_nr1 is a decimal field and it has NULL values, it should output false.
Appreciate valuable suggestions.
You can use the cast function. It returns NULL when the value cannot be cast to a numeric type.
For example:
select case when cast('23ccc' as double) is null then false else true end as IsNumber;
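Building on the cast-returns-NULL behavior, a per-column check can be sketched in Python with the sample values from the question. `try_cast_double` is a hypothetical helper that mimics the cast, and a NULL counts as non-numeric, as the question requires.

```python
# Sample values from the question's output; None stands for NULL.
columns = {
    "test_nr1": [None, None],
    "test_nr2": ["81432269", "88868060"],
}

def try_cast_double(value):
    """Mimic CAST(x AS DOUBLE): None instead of an error on bad input."""
    if value is None:
        return None
    try:
        return float(value)
    except ValueError:
        return None

# A column counts as numeric only if every value casts successfully.
result = [
    (name, all(try_cast_double(v) is not None for v in values))
    for name, values in columns.items()
]
print(result)  # [('test_nr1', False), ('test_nr2', True)]
```

In HiveQL the same idea could plausibly be expressed per column by comparing count(*) with count(cast(test_nr1 as double)), since count(expr) skips NULLs; treat that as an untested suggestion.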
You're trying to use character-class pattern matching syntax here (LIKE '%[^0-9]%'), which IIRC doesn't work in every SQL implementation; regexp matching, however, works in most, if not all, of them.
Considering you're using hive, this should do it:
SELECT ('test_nr1', test_nr1 RLIKE '\d'), ('test_nr2', test_nr2 RLIKE '\d') FROM test.abc;
Keep in mind, though, that regexp matching can be slow on large tables.
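One caveat about the pattern: assuming Hive's RLIKE behaves like Java's Matcher.find and succeeds when the pattern matches anywhere in the value, '\d' also accepts strings such as 'abc1'. This Python sketch contrasts an unanchored search with an anchored full match on made-up sample strings; in Hive the anchored version would use ^ and $, and backslashes inside string literals may need to be doubled.

```python
import re

samples = ["81432269", "43.2", "abc1", "", "12x"]

# Unanchored search: succeeds if a digit appears anywhere (like '\d' above).
any_digit = [s for s in samples if re.search(r"\d", s)]

# Anchored full match: optional sign, digits, optional decimal part, nothing else.
numeric = [s for s in samples if re.fullmatch(r"[+-]?\d+(\.\d+)?", s)]

print(any_digit)  # ['81432269', '43.2', 'abc1', '12x']
print(numeric)    # ['81432269', '43.2']
```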

sqlldr - how to use if/then logic on a field?

I am loading a particular field that has date values. However, some of them are incomplete; for example, the values look like this:
START_DATE
'2015-06-12'
'2016-12-24'
'2015-02' <--- this is what causes an error
'2016-01-03'
I have tried solving this by combining NULLIF with a LENGTH() function like so, but this is not allowed:
Start_date NULLIF LENGTH(:start_date)<10 to_date .....
This returns the error:
Expecting positive integer or column name, found keyword length.
My main objective is to load dates that are of a proper format, and load NULL otherwise. What is the easiest way to do this within the ctl file? Can I avoid creating a custom function?
Say I have a table like this:
create table dateTable(START_DATE date)
and I need to load this file, inserting NULL wherever the string does not match my pattern:
'2016-12-28'
'2016-12-'
'2016-12-31'
I can add some logic in my ctl file to check the length of the string to load this way:
load data
infile dateTable.csv
into TABLE dateTable
fields enclosed by "'"
( START_DATE "to_date(case when length(:START_DATE) = 10 then :START_DATE end, 'yyyy-mm-dd')"
)
This simply checks the length of the string, but you can edit it any way you need to build your own logic. Notice that CASE returns NULL when no condition is matched, so this is equivalent to case when length(:START_DATE) = 10 then :START_DATE else NULL end.
This gives the following result:
SQL> select * from dateTable;
START_DATE
----------
28-DEC-16
31-DEC-16
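The control-file expression can be traced in Python to see which rows end up NULL. `load_start_date` is a hypothetical helper, and the enclosing quotes are assumed to have been stripped already, as fields enclosed by "'" does in SQL*Loader.

```python
from datetime import date, datetime
from typing import Optional

def load_start_date(raw: str) -> Optional[date]:
    """Mirror of: to_date(case when length(:START_DATE) = 10 then :START_DATE end, 'yyyy-mm-dd').

    CASE without ELSE yields NULL, and converting NULL gives NULL,
    so values of any other length load as None.
    """
    value = raw if len(raw) == 10 else None
    if value is None:
        return None
    return datetime.strptime(value, "%Y-%m-%d").date()

# Values from the example file, quotes already removed.
loaded = [load_start_date(s) for s in ["2016-12-28", "2016-12-", "2016-12-31"]]
print(loaded)  # [datetime.date(2016, 12, 28), None, datetime.date(2016, 12, 31)]
```

As with Oracle's to_date, a 10-character string that is not a real date (such as '2016-13-01') still raises an error, so the length test is only a first filter.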
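In Oracle, you can check whether a string is a valid date before converting it. Oracle has no built-in IsDate function, so this is usually done with a small user-defined IsDate function; from Oracle 12c Release 2 onward, VALIDATE_CONVERSION(expr AS DATE, 'yyyy-mm-dd') returns 1 or 0 without writing one.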

Why would YEAR fail with a conversion error from a Date?

I have a view named 'FechasFirmaHorometros' defined as
SELECT IdFormulario,
CONVERT(Date, RValues) AS FechaFirma
FROM dbo.Respuestas
WHERE ( IdPreguntas IN (SELECT IdPregunta
FROM dbo.Preguntas
WHERE
( FormIdentifier = dbo.IdFormularioHorometros() )
AND ( Label = 'SLFYHDLR' )) )
And I have a function named [RespuestaPreguntaHorometrosFecha] defined as
SELECT Respuestas.RValues
FROM Respuestas
JOIN Preguntas
ON Preguntas.Label = @LabelPregunta
JOIN FechasFirmaHorometros
ON FechasFirmaHorometros.IdFormulario = Respuestas.IdFormulario
WHERE Respuestas.IdPreguntas = Preguntas.IdPregunta
AND YEAR(FechasFirmaHorometros.FechaFirma) = @Anio
AND MONTH(FechasFirmaHorometros.FechaFirma) = @Mes
@LabelPregunta VARCHAR(MAX)
@Anio INT
@Mes INT
I keep getting this error when execution reaches that function while I'm debugging another stored procedure that uses it:
Conversion failed when converting date and/or time from character string.
Yet I can freely do things like
SELECT DAY(FechaFirma) FROM FechasFirmaHorometros
Why is this happening, and how can I solve or work around it?
I assume that RValues is a string column of some type, for some reason. You should fix that and store date data using a date data type (obviously in a separate column than this mixed bag).
If you can't fix that, then you can prevent the problem Damien described in the comments by:
CASE WHEN ISDATE(RValues) = 1 THEN CONVERT(Date, RValues) END AS FechaFirma
(Which will make the "date" NULL if SQL Server can't figure out how to convert it to a date.)
You can't prevent this simply by adding a WHERE clause, because SQL Server will often try to attempt the conversion in the SELECT list before performing the filter (all depends on the plan). You also can't force the order of operations by using a subquery, CTE, join order hints, etc. There is an open Connect item about this issue - they are "aware of it" and "hope to address it in a future version."
Short of a CASE expression, which forces SQL Server to evaluate the ISDATE() result before attempting to convert (as long as no aggregates are present in any of the branches), you could:
dump the filtered results into a #temp table, and then subsequently select from that #temp table, and only apply the convert then.
just return the string, and treat it as a date on the client, and pull YEAR/MONTH etc. parts out of it there
just use string manipulation to pull YEAR = LEFT(col,4) etc.
use TRY_CONVERT() since I just noticed you're on SQL Server 2012:
TRY_CONVERT(DATE, RValues) AS FechaFirma
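TRY_CONVERT turns a failed conversion into NULL instead of an error, so YEAR and MONTH only ever see real dates. A rough Python analogue, with hypothetical RValues data and a `try_convert_date` helper that only understands ISO YYYY-MM-DD strings (SQL Server accepts more formats):

```python
from datetime import date
from typing import Optional

def try_convert_date(text: str) -> Optional[date]:
    """Rough analogue of TRY_CONVERT(DATE, text): None instead of an error.

    Only ISO 'YYYY-MM-DD' strings are handled here (an assumption);
    SQL Server's TRY_CONVERT accepts many more formats.
    """
    try:
        return date.fromisoformat(text)
    except ValueError:
        return None

# Hypothetical (IdFormulario, RValues) pairs mixing dates with other answers.
rvalues = [(1, "2013-05-02"), (2, "SI"), (3, "2013-06-10"), (4, "2013-05-28")]
anio, mes = 2013, 5  # stand-ins for the function's Anio and Mes parameters

# Convert first; failed conversions become None and are simply filtered out.
converted = [(id_formulario, try_convert_date(raw)) for id_formulario, raw in rvalues]
matching = [
    id_formulario
    for id_formulario, fecha in converted
    if fecha is not None and fecha.year == anio and fecha.month == mes
]
print(matching)  # [1, 4]
```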