How to cast float to string with no exponents in BigQuery - google-bigquery

I have float data in a BigQuery table like 5302014.2 and 5102014.4.
I'd like to run a BigQuery SQL that returns the values in String format, but the following SQL yields this result:
select a, string(a) from my_table
5302014.2 "5.30201e+06"
5102014.4 "5.10201e+06"
How can I rewrite my SQL to return:
5302014.2 "5302014.2"
5102014.4 "5102014.4"

Using standard SQL doesn't have this problem:
$ bq query '#standardSQL
SELECT a, CAST(a AS STRING) AS a_str FROM UNNEST(ARRAY[530201111114.2, 5302014.4]) a'
+-------------------+----------------+
| a | a_str |
+-------------------+----------------+
| 5302014.4 | 5302014.4 |
| 5.302011111142E11 | 530201111114.2 |
+-------------------+----------------+

SELECT STRING(INTEGER(f)) + '.' + SUBSTR(STRING(f-INTEGER(f)), 3)
FROM (SELECT 5302014.5642 f)
(not a nice hack; a better method would make a great feature request to post at https://code.google.com/p/google-bigquery/issues/list?can=2&q=label%3DFeature-Request)
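For what it's worth, the exponent in the legacy output comes from default %g-style formatting, which keeps only six significant digits. A small Python sketch of the same effect (an analogy, not BigQuery itself):

```python
x = 5302014.2

# %g with its default 6 significant digits reproduces the
# legacy STRING() rendering:
print(f"{x:g}")   # -> 5.30201e+06

# str()/repr() keeps enough digits to round-trip the value:
print(str(x))     # -> 5302014.2
```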

Converting your legacy SQL to standard SQL is really the best way forward as far as working with BigQuery is concerned. Standard SQL is much faster and has a much better implementation of features.
For your use case, standard SQL with CAST(a AS STRING) would be best.

Related

How to create a BQ SQL UDF that iterates over a string?

TL;DR:
Is there a way to do string manipulation in BQ only with SQL UDF?
Eg:
____________________________________________________
id | payload
----------------------------------------------------
1 | key1=val1&key2=val2&key3=val3=&key4=val4
----------------------------------------------------
2 | key5=val5&key6=val6=
select removeExtraEqualToFromPayload(payload) from table
should give
____________________________________________________
payload
----------------------------------------------------
key1=val1&key2=val2&key3=val3&key4=val4
----------------------------------------------------
key5=val5&key6=val6
Long version:
My goal is to iterate over a string that is part of one of the columns
This is our table structure
____________________________________________________
id | payload
----------------------------------------------------
1 | key1=val1&key2=val2&key3=val3=&key4=val4
----------------------------------------------------
2 | key5=val5&key6=val6=
As you can see, key3 in the first row has an = after val3, and key6 in the second row has an = after val6, which is not desired for us.
So the goal is to iterate over the string and remove these extra =.
I had gone through https://cloud.google.com/bigquery/docs/reference/standard-sql/user-defined-functions, which explains how to use custom functions in BQ. As of now, SQL UDFs only support SQL queries, whereas with JS UDFs we can write custom logic such as loops.
Since JS UDFs are very slow, using them has been ruled out, and we have to rely on SQL UDFs.
I thought of using BQ Scripting (https://cloud.google.com/bigquery/docs/reference/standard-sql/scripting) in combination with a SQL UDF, but that doesn't seem to work. It looks like a script has to be something altogether different.
I have also explored stored procedures in BQ for the same purpose; however, that is not working either. I'm not sure if I am doing it right.
I've created a procedure like this:
CREATE PROCEDURE test.AddDelta(INOUT x INT64, delta INT64)
BEGIN
SET x = x + delta;
END;
I'm not able to use the above procedure like this:
with ta as (select 1 id union all select 2 id)
select id from ta;
call test.AddDelta(id, 1);
select id;
I'm wondering if there is a way to parse strings like this without using Javascript UDF
Disclaimer: my regex-fu is not good; definitely have a look at the RE2 syntax.
You should be able to do it with REGEXP_REPLACE:
SELECT
payload,
REGEXP_REPLACE(payload,r'=(&)|=$','\\1') AS payload_clean
FROM
`myproject.mydataset.mytable`
example output:
| payload | payload_clean |
| key1=val1&key2=val2&key3=val3=&key4=val4= | key1=val1&key2=val2&key3=val3&key4=val4 |
Executable example:
WITH
payload_table AS (
SELECT "key1=val1&key2=val2&key3=val3=&key4=val4" AS payload UNION ALL
SELECT "key5=val5&key6=val6=" AS payload UNION ALL
SELECT "key1=val1&key2=val2&key3=val3=&key4=val4=" AS payload UNION ALL
SELECT "key3=val3=abc&key4=val4" AS payload
)
SELECT
payload,
REGEXP_REPLACE(payload,r'(=val\pN)=(\pL*&)|=(&)|=$','\\1\\2') AS payload_clean
FROM
payload_table
Of course, (=val\pN)=(\pL*&) in the pattern won't necessarily work for you, since you probably have different patterns. If there are no patterns to match, then I'm not sure how you can remove the extra '=' from your strings automatically.
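The simpler pattern behaves the same way outside BigQuery; RE2 and Python's re module agree on this construct, so it can be sanity-checked with a plain Python sketch (using the sample payloads from the question):

```python
import re

def remove_trailing_equals(payload):
    # Drop an '=' that sits immediately before '&' or at the end of
    # the string, re-emitting the captured '&' when it is present.
    return re.sub(r'=(&)|=$', r'\1', payload)

print(remove_trailing_equals("key1=val1&key2=val2&key3=val3=&key4=val4"))
# -> key1=val1&key2=val2&key3=val3&key4=val4
print(remove_trailing_equals("key5=val5&key6=val6="))
# -> key5=val5&key6=val6
```

Note that when the `=$` alternative matches, group 1 is unmatched; Python (3.5+) substitutes an empty string for it, which is exactly what we want here.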

ID of each data type in oracle

How can we find the ID of each data type in Oracle, like we can in SQL Server using user_type_id?
I don't really understand what you are looking for, but the following queries give a ton of information. You can filter them down in the WHERE clauses to get what you need.
SELECT * FROM ALL_TAB_COLUMNS;
SELECT * FROM ALL_TABLES;
SELECT * FROM ALL_COLL_TYPES;
SELECT * FROM ALL_IDENTIFIERS;
You can find the code associated with each built-in data type by looking it up in Oracle's built-in data type documentation or by using the DUMP function:
SELECT DUMP( 'abc' ) FROM DUAL;
Outputs:
| DUMP('ABC') |
| :--------------------- |
| Typ=96 Len=3: 97,98,99 |
I.e., CHAR and NCHAR are fixed-length character data types and have a type code of 96.
db<>fiddle here
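The Len=3: 97,98,99 portion of the DUMP output is just the character codes of 'a', 'b', 'c'; a quick cross-check in plain Python (not Oracle-specific):

```python
# The byte values DUMP reports for 'abc' are the ASCII codes:
print([ord(c) for c in "abc"])  # -> [97, 98, 99]
```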

SQL function to transform number with a certain pattern

I need a SQL query to transform an int with a value between 1 and 300000 into a number that follows this pattern: always 8 digits.
For example:
1 becomes 00000001,
123 becomes 00000123,
123456 becomes 00123456.
I have no idea how to do that... How can I do it?
In Standard SQL, you can use this trick:
select substring(cast( (num + 100000000) as varchar(255)) from 2)
Few databases actually support this syntax. Any given database can do what you want, but the method depends on the database you are using.
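The trick works because adding 100000000 always produces a nine-digit number starting with 1, so slicing off that leading digit leaves exactly eight zero-padded digits. A quick Python illustration of the same idea:

```python
def pad8(num):
    # For 1..300000, num + 100000000 is always a 9-digit number of
    # the form 1XXXXXXXX; dropping the leading '1' leaves the
    # 8-digit zero-padded representation.
    return str(num + 100000000)[1:]

print(pad8(1))       # -> 00000001
print(pad8(123))     # -> 00000123
print(pad8(123456))  # -> 00123456
```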
For MS SQL Server
You could use FORMAT function, like this:
SELECT FORMAT(123,'00000000')
https://database.guide/how-to-format-numbers-in-sql-server/#:~:text=Starting%20from%20SQL%20Server%202012,the%20output%20should%20be%20formatted.
See the Leading Zeroes section at the link.
For MySql/Oracle
You could use LPAD, like this:
SELECT LPAD('123',8,'0')
https://database.guide/how-to-add-leading-zeros-to-a-number-in-mysql/
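For comparison, the same left-padding to eight digits in a general-purpose language (a plain Python sketch, independent of any database):

```python
n = 123

# Equivalent of LPAD('123', 8, '0') / FORMAT(123, '00000000'):
print(str(n).zfill(8))  # -> 00000123
print(f"{n:08d}")       # -> 00000123
```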

Function names case sensitivity in BigQuery

I'm learning the syntax of Google BigQuery, and currently, I'm reading documentation regarding identifiers and case sensitivity. I'm focused on standard SQL syntax of BigQuery.
Documentation says:
BigQuery follows these rules for case sensitivity:
Category | Case Sensitive?
Function names | No
But when I run the following statements in the Console:
#standardSQL
create function cs_test.function_a (x int64, y int64) as (x*y);
create function cs_test.function_A (x int64, y int64) as (x-y);
select cs_test.function_a(5,6); -- 30
select cs_test.function_A(5,6); -- -1
two functions are created, and the SELECT statements return different results.
At the same time, if I run the following statements, I get an error saying that the function is not found:
create function cs_test.function_b (x int64, y int64) as (x+y);
select cs_test.function_B(5,6); -- NOK
Is the function name case insensitive in Google BigQuery? From the code snippets provided above it seems to be case sensitive.
Thank you.
What you've found is correct. The documentation has been updated to reflect it:
| Category | Case Sensitive? |
| Built-in Function names | No |
| User-Defined Function names | Yes |

String Concatenation issue in Spark SQL when using rtrim()

I am facing a peculiar concatenation problem in a PySpark SQL query:
spark.sql("select *,rtrim(IncomeCat)+' '+IncomeCatDesc as trimcat from Dim_CMIncomeCat_handled").show()
In this query both the IncomeCat and IncomeCatDesc fields hold string values, so logically I thought it would concatenate, but the resulting field is null,
whereas the achievable result would be '14100abcd', where 14100 is the IncomeCat part and abcd is the IncomeCatDesc part. I have tried explicit casting on the IncomeCat field as well:
spark.sql("select *,cast(rtrim(IncomeCat) as string)+' '+IncomeCatDesc as IncomeCatAndDesc from Dim_CMIncomeCat_handled").show()
but I am getting the same result. Am I missing something here? Kindly help me solve this.
Spark doesn't overload the + operator for strings, and as a result the query you use doesn't express concatenation. If you take a look at a basic example, you'll see what is going on:
spark.sql("SELECT 'a' + 'b'").explain()
== Physical Plan ==
*Project [null AS (CAST(a AS DOUBLE) + CAST(b AS DOUBLE))#48]
+- Scan OneRowRelation[]
Both arguments are assumed to be numeric, and in the general case the result will be undefined. Of course, it will work for strings that can be cast to numerics:
spark.sql("SELECT '1' + '2'").show()
+---------------------------------------+
|(CAST(1 AS DOUBLE) + CAST(2 AS DOUBLE))|
+---------------------------------------+
| 3.0|
+---------------------------------------+
To concatenate strings you can use concat:
spark.sql("SELECT CONCAT('a', 'b')").show()
+------------+
|concat(a, b)|
+------------+
| ab|
+------------+
or concat_ws:
spark.sql("SELECT CONCAT_WS('*', 'a', 'b')").show()
+------------------+
|concat_ws(*, a, b)|
+------------------+
| a*b|
+------------------+