Need the equivalent of SQL IsNumeric function in spark sql - apache-spark-sql

Like the SQL ISNUMERIC function, which validates whether an expression is numeric or not, is there an equivalent function in Spark SQL? I have tried to find one but couldn't. Can someone please help or suggest an alternative?

Try using a Spark UDF; this approach lets you recreate any function you need:
scala> spark.udf.register("IsNumeric", (inpColumn: Int) => BigInt(inpColumn).isInstanceOf[BigInt])
res46: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,BooleanType,Some(List(IntegerType)))
scala> spark.sql(s""" select "ABC", IsNumeric(123) as IsNumeric_1 """).show(false)
+---+-----------+
|ABC|IsNumeric_1|
+---+-----------+
|ABC|true       |
+---+-----------+
scala> spark.sql(s""" select "ABC", IsNumeric("ABC") as IsNumeric_1 """).show(false)
+---+-----------+
|ABC|IsNumeric_1|
+---+-----------+
|ABC|null       |
+---+-----------+
Here, the above function returns null when the column value is not an integer (the string cannot be cast to Int, so the UDF receives null).
Hope this is helpful.

For anyone coming here by way of Google :), there is an alternative: check for numeric values with a regex in Spark SQL.
select
OldColumn,
CASE WHEN OldColumn not rlike '[^0-9]' THEN 1 ELSE 0 END AS OldColumnIsNumeric
from table
The regex simply checks whether the column contains only digits (any character outside 0-9 makes the flag 0).
You can modify it to check substrings of the column as well.
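If you also need to accept signs and decimal points, a cast-based check is an alternative; the sketch below assumes Spark's default (non-ANSI) behaviour, where a failed cast yields null rather than an error, and reuses the column/table names from the example above:
select
OldColumn,
CASE WHEN CAST(OldColumn AS DOUBLE) IS NOT NULL THEN 1 ELSE 0 END AS OldColumnIsNumeric
from table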

Related

SQL issue with Like operator not working as expected

I have an SQL issue using Teradata SQL Assistant with the LIKE operator, as shown in the example below:
table A
id|
23_0
111_10
201_540
so I should select only the ids that end with '_0'
I tried the query below but it gives me all three ids
select * from A
where id like '%_0'
but I expect only
id|
23_0
Do you have any idea, please?
The problem is that _ is a special character in LIKE patterns: it matches any single character. So, one method is to escape it:
where id like '%$_0' escape '$'
You can also use right():
where right(id, 2) = '_0'
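For the sample data above, both forms should return only the row 23_0; a quick sanity check (table and column names as in the question):
select * from A where id like '%$_0' escape '$';
select * from A where right(id, 2) = '_0';
-- expected result for both: 23_0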

Oracle SQL placeholder for everything in WHERE clause

I am trying to write a SQL query with a WHERE clause that does not actually filter anything.
import cx_Oracle
import pandas as pd
condition = []
#condition = ['a', 'b']
query = ("""
SELECT *
FROM table
WHERE var = {}""".format(tuple(condition)))
with cx_Oracle.connect(dsn=tsn, encoding="UTF-8") as con:
    df = pd.read_sql(con=con, sql=query)
Disclaimer: I'm still a bit new to SQL; I appreciate corrections of terminology.
Edit:
This is often handled using logic like this:
where col = :input_val or :input_val is null
If you don't want to have the WHERE clause filter anything then it needs to contain a condition which is always true; something like
SELECT *
FROM table1
WHERE var = var OR
var IS NULL
db<>fiddle here
If you need to use an index and the column value is not nullable then you'll want to use the NVL trick. I don't know why, but Oracle has better optimizations for NVL than for the ...OR A IS NULL expression.
SELECT *
FROM table
WHERE nvl(anyvalueofvar1, var) = var;
See my answer here for an example of the better execution plan generated by this different syntax.
(I said "and the column value is not nullable" because NULL = NULL is not true in Oracle, so the NVL expression won't work if the column has nulls. I usually hate these cryptic syntaxes with weird behavior, but they are sometimes necessary for performance.)

String Concatenation issue in Spark SQL when using rtrim()

I am facing a peculiar concatenation problem in a PySpark SQL query:
spark.sql("select *,rtrim(IncomeCat)+' '+IncomeCatDesc as trimcat from Dim_CMIncomeCat_handled").show()
In this query both the IncomeCat and IncomeCatDesc fields hold String values, so logically I thought they would concatenate, but the resulting field is null,
whereas the expected result would be '14100abcd', where 14100 is the IncomeCat part and abcd is the IncomeCatDesc part. I have tried explicit casting on the IncomeCat field as well:
spark.sql("select *,cast(rtrim(IncomeCat) as string)+' '+IncomeCatDesc as IncomeCatAndDesc from Dim_CMIncomeCat_handled").show()
but I am getting the same result. Am I missing something here? Kindly help me solve this.
Spark doesn't overload the + operator for strings, so the query you use doesn't express concatenation at all. If you take a look at a basic example you'll see what is going on:
spark.sql("SELECT 'a' + 'b'").explain()
== Physical Plan ==
*Project [null AS (CAST(a AS DOUBLE) + CAST(b AS DOUBLE))#48]
+- Scan OneRowRelation[]
Both arguments are assumed to be numeric, and in the general case the result will be undefined. Of course it will work for strings that can be cast to numerics:
spark.sql("SELECT '1' + '2'").show()
+---------------------------------------+
|(CAST(1 AS DOUBLE) + CAST(2 AS DOUBLE))|
+---------------------------------------+
| 3.0|
+---------------------------------------+
To concatenate strings you can use concat:
spark.sql("SELECT CONCAT('a', 'b')").show()
+------------+
|concat(a, b)|
+------------+
| ab|
+------------+
or concat_ws:
spark.sql("SELECT CONCAT_WS('*', 'a', 'b')").show()
+------------------+
|concat_ws(*, a, b)|
+------------------+
| a*b|
+------------------+
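As a side note, newer Spark versions (2.3 and later, if memory serves) also accept the ANSI || operator for string concatenation, which reads closer to the original query; applied to the question's table it would look roughly like this:
select *, rtrim(IncomeCat) || ' ' || IncomeCatDesc as IncomeCatAndDesc
from Dim_CMIncomeCat_handled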

How to cast float to string with no exponents in BigQuery

I have float data in a BigQuery table like 5302014.2 and 5102014.4.
I'd like to run a BigQuery SQL that returns the values in String format, but the following SQL yields this result:
select a, string(a) from my_table
5302014.2 "5.30201e+06"
5102014.4 "5.10201e+06"
How can I rewrite my SQL to return:
5302014.2 "5302014.2"
5102014.4 "5102014.4"
Standard SQL doesn't have this problem:
$ bq query '#standardSQL
SELECT a, CAST(a AS STRING) AS a_str FROM UNNEST(ARRAY[530201111114.2, 5302014.4]) a'
+-------------------+----------------+
| a | a_str |
+-------------------+----------------+
| 5302014.4 | 5302014.4 |
| 5.302011111142E11 | 530201111114.2 |
+-------------------+----------------+
SELECT STRING(INTEGER(f)) + '.' + SUBSTR(STRING(f-INTEGER(f)), 3)
FROM (SELECT 5302014.5642 f)
(not a nice hack, but a better method would be a great feature request to post at https://code.google.com/p/google-bigquery/issues/list?can=2&q=label%3DFeature-Request)
Converting your legacy SQL to standard SQL is really the best way forward as far as working with BigQuery is concerned. Standard SQL is faster and has a much better implementation of features.
For your use case, standard SQL with CAST(a AS STRING) would be best.
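If you need explicit control over the number of decimal places, standard SQL's FORMAT function with a printf-style specifier is another option; the one-decimal precision below is an assumption, and the sample values are taken from the question:
#standardSQL
SELECT a, FORMAT('%.1f', a) AS a_str
FROM UNNEST([5302014.2, 5102014.4]) AS a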

Oracle Documentation NOT NVL

I'm doing some conversion from Oracle to MSSQL and was reading Oracle's guide "B Supported SQL Syntax and Functions".
I noticed it was stated that there is a NOT NVL function (and its MSSQL equivalent was IS NOT NULL).
I'm compiling a list for my colleagues so we can have a one-stop resource for syntax and supported functions, am I correct in assuming that NOT NVL works like so:
There are 3 columns, name, location, loves_marmite
Andrew | UK | Yes
NOT NVL(loves_marmite, 'Nope')
So the data displayed would be:
Andrew | UK | Nope
I just don't get why it would be listed as an Oracle function when it's just a logic issue; what's more, Oracle already has IS NULL and IS NOT NULL.
I'm sorry, I'm just looking for some clarification before I pass this document on to my colleagues.
EDIT : If possible would someone have a comprehensive list of function and syntax differences between the two platforms?
Check the NVL2(param1, param2, param3) function.
If param1 is not NULL (and not an empty string, which Oracle treats as NULL), it returns param2; otherwise it returns param3.
You could write:
NVL2(loves_marmite, 'Nope', something_else)
Also, see this answer for a list of null-related functions in Oracle
First, please see the ISNULL function. But Oracle may be trying to tell you to replace the NVL functionality with a CASE expression:
SELECT CASE WHEN Foo IS NOT NULL THEN bar
ELSE BLA
END
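For the sample row in the question, the NVL2 form and the CASE form produce the same output; a small sketch (the table name people is hypothetical):
SELECT name, location,
       NVL2(loves_marmite, 'Nope', loves_marmite) AS v1,
       CASE WHEN loves_marmite IS NOT NULL THEN 'Nope' ELSE loves_marmite END AS v2
FROM people;
-- Andrew | UK | Nope | Nope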