How to truncate decimal places in Databricks SQL without rounding off?

select truncate(12.455555,2)
I was trying to truncate a decimal value in the database from Databricks, but it gave me the following error. I got the same error when executing the simple statement for trimming decimal places shown above.
Error-
Error in SQL statement: AnalysisException: Undefined function: 'truncate'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7
Can anyone tell me how to truncate the decimal places without rounding off the decimals?

You can use substring.
spark.sql("select substring(12.455555,0, instr(12.455555,'.')+2) as out").show()
+-----+
| out|
+-----+
|12.45|
+-----+
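If you'd rather stay in numeric types instead of going through strings, another option (just a sketch, intended for non-negative values) is to scale, floor, and scale back:

SELECT floor(12.455555 * 100) / 100 AS out;   -- 12.45, no rounding

Note that floor rounds toward negative infinity, so negative values would need separate handling (e.g. flipping the sign before and after).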

If you want to do that for printing the data, then you can use the format_number function, like this:
SELECT format_number(12332.123456, '#.###');
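Keep in mind that format_number rounds to the requested number of decimal places rather than truncating, so for the value in the original question the last kept digit changes (a quick check; the output is my expectation, not verified here):

SELECT format_number(12.455555, '#.##');   -- expected '12.46', not '12.45'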

Related

Semantic difference between standard SQL and Teradata cast syntax

I'm trying to understand the difference between the following two syntaxes, which appear to be doing the same thing when casting strings to integer:
SELECT CAST('1' AS INTEGER), '1' (INTEGER)
Resulting in:
|'1'|'1'|
|---|---|
|1 |1 |
But they don't do the same thing when chaining the conversion:
SELECT CAST(CAST('1' AS INTEGER) AS VARCHAR(3)), ('1' (INTEGER)) (VARCHAR(3))
I'm now getting:
|'1'|'1'|
|---|---|
|1 | |
The second column contains an empty string, not null. Is there a semantic difference between the two syntaxes, or is this just a bug?
I'm using version 16.20.05.01
As mentioned in the comments (no one wanted to answer?), there's a documented difference in behaviour, described in the section "How CAST Differs from Teradata Conversion Syntax".
Specifically:
Using Teradata conversion syntax (that is, not using CAST) for explicit conversion of numeric-to-character data requires caution.
The process is as follows:
1. Convert the numeric value to a character string using the default or specified FORMAT for the numeric value. Leading and trailing pad characters are not trimmed.
2. Extend to the right with pad characters if required, or truncate from the right if required, to conform to the target length specification. If non-pad characters are truncated, no string truncation error is reported.
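Applied to the example in the question, a plausible walk-through (an illustration only, assuming the default INTEGER format produces a right-justified, space-padded string) is:

-- CAST path: 1 becomes the string '1', which fits easily into VARCHAR(3)
SELECT CAST(CAST('1' AS INTEGER) AS VARCHAR(3));

-- Teradata conversion syntax: 1 is first formatted as something like
-- '          1' (leading pad characters are not trimmed), then truncated
-- from the right to 3 characters, leaving only spaces and reporting no
-- truncation error. Hence the blank, non-NULL result.
SELECT ('1' (INTEGER)) (VARCHAR(3));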

Using try_cast in Snowflake to deal with very long numbers

I'm using try_cast in Snowflake to convert any overly long values in SQL to NULL.
I'm flattening a JSON array and using try_cast to turn any large values into NULL, because I was getting the error Failed to cast variant value {numberLong: -8301085358432}.
Here is my code:
SELECT try_cast(item.value:price) as item_price,
try_cast(item.value:total_price_bill) as items_total_price
FROM table, LATERAL FLATTEN(input => products) item
When I try running the above code, I get the following error:
Error:
SQL compilation error error at line 1 at position ')'.
I don't understand what I'm doing wrong.
You are using the wrong syntax for try_cast. According to the Snowflake documentation, the syntax is:
TRY_CAST( <source_string_expr> AS <target_data_type> )
and also note:
Only works for string expressions.
target_data_type must be one of the following:
VARCHAR (or any of its synonyms)
NUMBER (or any of its synonyms)
DOUBLE
BOOLEAN
DATE
TIME
TIMESTAMP, TIMESTAMP_LTZ, TIMESTAMP_NTZ, or TIMESTAMP_TZ
So, for example, if item.value:price is a string, you need something like this:
select try_cast(item.value:price as NUMBER) as item_price,
....
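Putting it together, a complete version of the corrected query might look like this (a sketch; the ::string casts reflect the note above that try_cast only works on string expressions, and the field names are taken from the question):

SELECT try_cast(item.value:price::string AS NUMBER) AS item_price,
       try_cast(item.value:total_price_bill::string AS NUMBER) AS items_total_price
FROM table, LATERAL FLATTEN(input => products) item;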

Merging Databricks SQL results of different decimal datatypes

I have a Databricks SQL query whose result has a column of data type decimal(18,0).
I want to append the results of this query to an existing table (df.write.format("delta").mode("append").save("a_path")), but I cannot because the existing column has a data type of decimal(38,18).
When I try to append, the error I get is:
AnalysisException: Failed to merge fields 'id' and 'id'. Failed to merge decimal types with incompatible precision 38 and 18 & scale 18 and 0;
Is there a way around this?
I tried to cast the result of the query to decimal(38,18), e.g. select cast(id decimal(38,18))..., but this did not work.
Any suggestions?
As a workaround, I converted the SQL query's columns to the decimal type in PySpark, and then continued with the merge:
query="""select * from ..."""
df=spark.sql(query)
df=df.withColumn("id",df["id"].cast(DecimalType(38,18)))
df.write.format("delta").mode("append").save("a_path")
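It may also be worth noting that the pure-SQL attempt in the question was probably rejected only because of syntax; Spark SQL's CAST is written with AS. A sketch (the table name is hypothetical):

select cast(id as decimal(38,18)) as id   -- note the AS inside CAST
from some_source_table                    -- hypothetical table name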

SQL Code Error converting data type varchar to float

The following code encounters an error when executed in Microsoft SQL Server Management Studio:
USE [DST]
GO
Select
CAST([Balance] as float)
FROM [RAW_XXX]
WHERE ISNUMERIC(Balance) = 1
Msg 8114, Level 16, State 5, Line 2
Error converting data type varchar to float.
I thought that ISNUMERIC would exclude anything that cannot be cast or converted.
It is a massive database on SQL Server 2012, so I am unsure how to find the data that is causing the error.
Use TRY_CONVERT to flush out the offending records:
SELECT *
FROM [RAW_XXX]
WHERE TRY_CONVERT(FLOAT, Balance) IS NULL;
The issue with your current logic is that something like $123.45 would be true according to ISNUMERIC, but would fail when trying to cast as floating point.
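If the goal is to keep the original query running rather than only to locate the bad rows, TRY_CONVERT can also be used directly in the SELECT; it returns NULL for any value it cannot convert (a sketch):

SELECT TRY_CONVERT(FLOAT, [Balance]) AS BalanceFloat
FROM [RAW_XXX];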
By the way, if you wanted a more bare bones way of finding records not castable to float you could just rely on LIKE:
SELECT *
FROM [RAW_XXX]
WHERE Balance NOT LIKE '%[^0-9.]%' AND Balance NOT LIKE '%.%.%';
The first LIKE condition ensures that Balance consists only of digits and decimal points, and the second condition ensures that at most one decimal point appears.

Invalid digits on Redshift

I'm trying to load some data from a staging area into a relational environment, and something is happening that I can't figure out.
I'm trying to run the following query:
SELECT
CAST(SPLIT_PART(some_field,'_',2) AS BIGINT) cmt_par
FROM
public.some_table;
The some_field is a column that has data with two numbers joined by an underscore like this:
some_field -> 38972691802309_48937927428392
And I'm trying to get the second part.
That said, here is the error I'm getting:
[Amazon](500310) Invalid operation: Invalid digit, Value '1', Pos 0,
Type: Long
Details:
-----------------------------------------------
error: Invalid digit, Value '1', Pos 0, Type: Long
code: 1207
context:
query: 1097254
location: :0
process: query0_99 [pid=0]
-----------------------------------------------;
Execution time: 2.61s
Statement 1 of 1 finished
1 statement failed.
It's literally saying some numbers are not valid digits. I've already tried to pull the exact data that is throwing the error, and it appears to be a normal field, just as I expected. It happens even if I throw out NULL fields.
I thought it might be an encoding error, but I haven't found any references to support that.
Anyone has any idea?
Thanks everybody.
I just ran into this problem and did some digging. Seems like the error Value '1' is the misleading part, and the problem is actually that these fields are just not valid as numeric.
In my case they were empty strings. I found the solution to my problem in a blog post: essentially, find any fields that aren't numeric and set them to null before casting.
select cast(colname as integer)
from (
  select case when colname ~ '^[0-9]+$' then colname
              else null
         end as colname
  from tablename
);
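If empty strings turn out to be the only offenders, a shorter variant (a sketch, assuming exactly that) is to blank them out with NULLIF before the cast:

select cast(nullif(colname, '') as integer)   -- empty strings become NULL
from tablename;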
Bottom line: this Redshift error is completely confusing and really needs to be fixed.
When you are using a Glue job to upsert data from any data source to Redshift:
Glue will rearrange the data and then copy it, which can cause this issue. This happened to me even after using apply-mapping.
In my case, the datatype was not an issue at all. In the source they were typecast to exactly match the fields in Redshift.
Glue was rearranging the columns by the alphabetical order of their names and then copying the data into the Redshift table (which will obviously throw an error, because my first column is an ID key, not a string column like the others).
To fix the issue, I used a SQL query within Glue to run a select with the correct order of the columns in the table.
It's odd that Glue did this even after using apply-mapping, but the workaround helped.
For example: the source table has fields ID|EMAIL|NAME with values 1|abcd@gmail.com|abcd, and the target table also has fields ID|EMAIL|NAME. But when Glue upserts the data, it rearranges the values by column name before writing, so it tries to write abcd@gmail.com|1|abcd into ID|EMAIL|NAME. This throws an error because ID expects an int value and EMAIL expects a string. I added a SQL query transform with the query "SELECT ID, EMAIL, NAME FROM data" to rearrange the columns before writing the data.
Hmmm. I would start by investigating the problem. Are there any non-digit characters?
SELECT some_field
FROM public.some_table
WHERE SPLIT_PART(some_field, '_', 2) ~ '[^0-9]';
Is the value too long for a bigint?
SELECT some_field
FROM public.some_table
WHERE LEN(SPLIT_PART(some_field, '_', 2)) > 18;
A BIGINT holds at most 19 digits (up to 9223372036854775807), so anything longer than 18 digits is at risk of overflowing; if you need more digits of precision, consider a DECIMAL rather than a BIGINT.
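For example, a wider DECIMAL should accommodate longer values (a sketch; Redshift's DECIMAL supports up to 38 digits of precision):

SELECT CAST(SPLIT_PART(some_field, '_', 2) AS DECIMAL(38, 0)) AS cmt_par
FROM public.some_table;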
If you get an error message like “Invalid digit, Value ‘O’, Pos 0, Type: Integer”, try executing your COPY command without the header row. Use the IGNOREHEADER parameter in your COPY command to skip the first line of the data file.
So the COPY command will look like below:
COPY orders FROM 's3://sourcedatainorig/order.txt' credentials 'aws_access_key_id=<your access key id>;aws_secret_access_key=<your secret key>' delimiter '\t' IGNOREHEADER 1;
For my Redshift SQL, I had to wrap my columns with Cast(col As Datatype) to make this error go away.
For example, setting my columns' datatype to Char with a specific length worked:
Cast(COLUMN1 As Char(xx)) = Cast(COLUMN2 As Char(xxx))