Storing extremely small values in Amazon Redshift

I am creating a table in Amazon Redshift using the following command:
CREATE TABLE asmt.incorrect_question_pairs_unique
AS
SELECT question1,
       question2,
       occurrences,
       occurrences / (SUM(occurrences) OVER ())::FLOAT AS prob_q1_q2
FROM (SELECT question1,
             question2,
             SUM(occurrences) AS occurrences
      FROM asmt.incorrect_question_pairs
      GROUP BY question1,
               question2
      HAVING SUM(occurrences) >= 50) t
I also tried an alternative:
CREATE TABLE asmt.incorrect_question_pairs_unique
AS
SELECT question1,
       question2,
       occurrences,
       occurrences::FLOAT / SUM(occurrences) OVER () AS prob_q1_q2
FROM (SELECT question1,
             question2,
             SUM(occurrences) AS occurrences
      FROM asmt.incorrect_question_pairs
      GROUP BY question1,
               question2
      HAVING SUM(occurrences) >= 50) t
I'd like the column prob_q1_q2 to be a FLOAT column, which is why I cast the numerator (or denominator) to FLOAT. But in the resulting table, I get all zeros in that column.
I should point out that SUM(occurrences) comes to about 10 billion, so prob_q1_q2 will contain extremely small values. Is there a way to store such small values in Amazon Redshift?
How do I make sure that all the values in the column are non-zero floats?
Any help would be appreciated.

METHOD 1 - I have had the same problem! In my case it was millions of rows, so I multiplied the result by 10,000; whenever I wanted to select values from that column, I divided by 10,000 in the SELECT statement to even things out. I know it's not the perfect solution, but it works for me.
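A minimal sketch of that workaround, reusing the table and column names from the question (the ::FLOAT cast, the scaled column name, and the 10,000 factor are illustrative assumptions, not the poster's exact code):
CREATE TABLE asmt.incorrect_question_pairs_unique
AS
SELECT question1,
       question2,
       occurrences,
       -- scale up by 10000 so the tiny ratio is not rounded down to 0
       occurrences::FLOAT * 10000 / SUM(occurrences) OVER () AS prob_q1_q2_x10k
FROM (SELECT question1,
             question2,
             SUM(occurrences) AS occurrences
      FROM asmt.incorrect_question_pairs
      GROUP BY question1, question2
      HAVING SUM(occurrences) >= 50) t;

-- divide by the same factor when reading the value back
SELECT question1,
       question2,
       prob_q1_q2_x10k / 10000 AS prob_q1_q2
FROM asmt.incorrect_question_pairs_unique;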
METHOD 2 - I created a sample table with a NUMERIC(12,6) datatype, and when I imported a result set similar to yours, I could see the values up to 6 decimal places of precision.
I guess the conversion does not work when you use the CREATE TABLE AS command; you need to create the table yourself, specifying a datatype that forces the result set to be stored at a given precision. It's odd how the same SELECT returns 0.00, but when it is inserted into a table with an enforced column type, it returns 0.00333.
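A hedged sketch of that approach; the column types here are assumptions, and with a total near 10 billion you would need more scale than NUMERIC(12,6), so something like NUMERIC(38,20) is used instead:
-- declare the column type up front instead of letting CREATE TABLE AS infer it
CREATE TABLE asmt.incorrect_question_pairs_unique (
    question1   VARCHAR(256),
    question2   VARCHAR(256),
    occurrences BIGINT,
    prob_q1_q2  NUMERIC(38,20)
);

INSERT INTO asmt.incorrect_question_pairs_unique
SELECT question1,
       question2,
       occurrences,
       occurrences::FLOAT / SUM(occurrences) OVER ()
FROM (SELECT question1,
             question2,
             SUM(occurrences) AS occurrences
      FROM asmt.incorrect_question_pairs
      GROUP BY question1, question2
      HAVING SUM(occurrences) >= 50) t;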
If I’ve made a bad assumption, please comment and I’ll refocus my answer.

Patthebug,
You might be getting a number that is way too small to survive in the type Redshift picked for the column. Try using DECIMAL instead; it is stored as a 128-bit value, so there is ample room for your ratios.
The way it works is the following: if the value is too big, or in your case too small, for the precision and scale of the type, the last digits are trimmed, and the trimmed value is what gets stored in the column.
When a big value is trimmed, you lose almost nothing: trim 20 cents off 20 billion dollars and you won't be hurt much. But when the number is too small, trimming the last digits can lose everything.
(For example, if a type can keep 5 digits after the decimal point and you want to store 0.000009, the value cannot fit, the last digit is trimmed so it does, and what you get back is 0.00000, i.e. zero.)
So if you followed my reasoning, just changing the ::float to ::decimal should fix your issue; one way to write that is sketched below.
P.S. DECIMAL may require an explicit size. Note that Redshift caps it at 38 digits of precision, so something like DECIMAL(38,30) is as large as you can go (DECIMAL(127,100) would be rejected).
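One hedged way to apply that suggestion is to cast the computed ratio itself, so the CTAS column is created as a wide DECIMAL (DECIMAL(38,30) is an assumed size, not a tested one):
CREATE TABLE asmt.incorrect_question_pairs_unique
AS
SELECT question1,
       question2,
       occurrences,
       -- the outer cast pins the stored column type to DECIMAL(38,30)
       (occurrences::FLOAT / SUM(occurrences) OVER ())::DECIMAL(38,30) AS prob_q1_q2
FROM asmt.incorrect_question_pairs;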

Try:
select cast(num1 as float) / cast(num2 as float);
This will give you results up to 2 decimal places (by default), but it takes up some of your processing time. Doing anything else will round off the decimal part.

You can have up to 38 digits in a DECIMAL/NUMERIC column, with up to 37 digits of scale.
CREATE TEMP TABLE precision_test (test NUMERIC(38,37)) DISTSTYLE ALL
;
INSERT INTO precision_test
SELECT CAST( 0.0000000000000000000000000000000000001 AS NUMERIC(38,37)) test
;
SELECT * FROM precision_test
;
--Returns 0.0000000000000000000000000000000000001

Related

Why does Oracle seem to add extra decimal places when converting number using to_char?

I have a large Oracle table (millions of rows) with two columns of type NUMBER. I am trying to write a query to get the max number of decimal places in pl_pred (I expect this to be around 7 or 8). When I do a TO_CHAR on the column, extra decimal digits show up, and it says the max is 18 when I'm only seeing around 4-7 when selecting the column directly. Does anyone know why, as well as how to accurately assess the max number of decimal places? I need to transfer this data to SQL Server and was trying to come up with the right precision and scale for the numeric datatype.
select pl.pl_pred as pred_number,
       to_char(pl_pred) as pred_char,
       length(to_char(pl_pred)) as pred_len
from pollution pl
where length(to_char(pl_pred)) > 15
Results:
PRED_NUMBER  PRED_CHAR             PRED_LEN
4.6328       "4.6327999999999987"  18
5.8767       "5.8766999999999996"  18
11.19625     "11.196250000000001"  18
13.566375    "13.566374999999997"  18
Table:
CREATE TABLE RHT.POLLUTION
(
LOCATION_GROUP VARCHAR2(20 BYTE),
PL_DATE DATE,
PL_PRED NUMBER,
PL_SE NUMBER
)
Update (again): I ran the same query in SQL Developer and got this, where the two values show up exactly the same. So that's interesting. I went back and was able to look at the raw data, and it does not match up, though: 4.6328 and 5.8767 are what I see. There are some longer ones, like 10.4820321428571. It's like it's treating NUMBER like a float, but I thought NUMBER in Oracle was exact.
"PRED_NUMBER" "PRED_CHAR" "PRED_LEN"
4.6327999999999987 "4.6327999999999987" 18
5.8766999999999996 "5.8766999999999996" 18
10.4820321428571 "10.4820321428571" 15
Raw data:
4.6328
5.8767
10.4820321428571
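A quick way to check what is actually stored (rather than what the client displays) is Oracle's DUMP function, which shows NUMBER's internal representation; a sketch against the table above:
-- If the stored value really is 4.6327999999999987, DUMP will reflect
-- all of those digits; if it is 4.6328, the dump will be much shorter.
SELECT pl_pred,
       TO_CHAR(pl_pred) AS char_value,
       DUMP(pl_pred)    AS stored_bytes
FROM pollution pl
WHERE LENGTH(TO_CHAR(pl_pred)) > 15;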

How to extract all (including int and float) numerical values in a string column in Google BigQuery?

I have a table Table_1 on Google BigQuery which includes a string column str_column. I would like to write a SQL query (compatible with Google BigQuery) to extract all numerical values in str_column and append them as new numerical columns to Table_1. For example, if str_column includes first measurement is 22 and the other is 2.5; I need to extract 22 and 2.5 and save them under new columns numerical_val_1 and numerical_val_2. The number of new numerical columns should ideally be equal to the maximum number of numerical values in str_column, but if that'd be too complex, extracting the first 2 numerical values in str_column (and therefore 2 new columns) would be fine too. Any ideas?
Consider the approach below:
select * from (
select str_column, offset + 1 as offset, num
from your_table, unnest(regexp_extract_all(str_column, r'\b([\d.]+)\b')) num with offset
)
pivot (min(num) as numerical_val for offset in (1,2,3))
If applied to sample data like in your question, the output has numerical_val_1 = 22, numerical_val_2 = 2.5, and numerical_val_3 = null.
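Here is the same query made self-contained; the CTE stands in for your_table with the sample sentence from the question (the extracted values come out as strings, so add a CAST if you need numbers):
with your_table as (
  select 'first measurement is 22 and the other is 2.5' as str_column
)
select * from (
  select str_column, offset + 1 as offset, num
  from your_table, unnest(regexp_extract_all(str_column, r'\b([\d.]+)\b')) num with offset
)
pivot (min(num) as numerical_val for offset in (1,2,3))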

SQLite3 Order by highest/lowest numerical value

I am trying to do a query in SQLite3 to order a column by numerical value. Instead of getting the rows ordered by the numerical value of the column, the rows are ordered lexicographically, character by character.
For example, in the query below 110 appears before 2 because its first character (1) sorts before 2. However, the entire number 110 is greater than 2, and I need it to appear after 2.
sqlite> SELECT digit,text FROM test ORDER BY digit;
1|one
110|One Hundred Ten
2|TWO
3|Three
sqlite>
Is there a way to make 110 appear after 2?
It seems like digit is stored as a string, not as a number. You need to convert it to a number to get the proper ordering. A simple approach uses:
SELECT digit, text
FROM test
ORDER BY digit + 0
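An equivalent, more explicit spelling uses CAST; both forms coerce the text to a number before sorting:
SELECT digit, text
FROM test
ORDER BY CAST(digit AS INTEGER)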

How to calculate the ratio of this column with 2 rows

I am very new to SQL and am having difficulty figuring out how to divide row 1 (101) by row 2 (576).
COUNT
101
576
I want the output to be a single value expressed to 2 decimal places.
Any tips?
Thanks for the help
For two rows, it's easy.
If you have a big input table, and you want to divide the first row by the second, the third row by the fourth, and so on, then you need an ordering column to fall back on.
So, with a two-row table (remember, tables are never ordered), you can simply divide the smaller number by the bigger one.
Here goes:
WITH
-- your input ...
input(counter) AS ( -- count is reserved word, use another name ...
SELECT 101
UNION ALL SELECT 576
)
-- cheat and just divide the smaller by the bigger
-- as "#Gordon Linoff" suggests
-- force a float division by adding a non-integer operand
-- and hard-cast it to DECIMAL(5,2)
SELECT
CAST(
MIN(counter) * 1.00 / MAX(counter)
AS DECIMAL(5,2)
) AS result
FROM input;
-- out result
-- out ----------
-- out 0.18
If, however, you have many rows, and you always need to divide the first row by the second, the third row by the fourth, and so on (each odd row in the order by the next even row), then you need an ordering column.
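A hedged sketch of that many-rows case, assuming an ordering column named ord: rows 1 and 2 form the first pair, rows 3 and 4 the second, and each pair's odd row is divided by its even row.
WITH
input(ord, counter) AS (
          SELECT 1, 101
UNION ALL SELECT 2, 576
UNION ALL SELECT 3, 250
UNION ALL SELECT 4, 500
)
SELECT
  (ord - 1) / 2 AS pair_no,  -- integer division groups (1,2),(3,4),...
  CAST(
    MIN(CASE WHEN ord % 2 = 1 THEN counter END) * 1.00
  / MIN(CASE WHEN ord % 2 = 0 THEN counter END)
  AS DECIMAL(5,2)
  ) AS result
FROM input
GROUP BY (ord - 1) / 2
ORDER BY pair_no;
-- out  pair_no | result
-- out  --------+-------
-- out        0 |   0.18
-- out        1 |   0.50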
Is your problem just what you suggested, or is there more to it?
There is no such thing as row "1" or "2" in a table. Tables represent unordered sets, so without a column specifying the ordering, there is no first or second row.
You can use aggregation to divide by min by the max:
select min(count) * 1.0 / max(count)
from t;
Note the * 1.0. Postgres does integer division, so you want to convert to something with a decimal point.

Multiplying a varchar and a decimal field together

I have a table that contains this field:
doses_given decimal(9,2)
that I want to multiply against this field:
drug_units_per_dose varchar(255)
So I did something like this:
CAST(ppr.drug_units_per_dose as decimal(9,2)) * doses_given dosesGiven,
However, looking at the data, I notice some odd characters:
select distinct(drug_units_per_dose) from patient_prescr
NULL
1
1-2
1-4
1.5
1/2
1/4
10
12
15
1½
2
2-3
2.5
20
2½
3
3-4
30
4
5
6
7
8
½
As you can see, I am getting some characters that cannot be CAST to decimal. On the web page these values are rendered as a small ½ symbol.
Is there any way to replace the ½ with a .5 to accurately complete the multiplication?
The ½ symbol is character 189 in the Latin-1/Windows-1252 code page (extended ASCII), so to replace it:
CAST(REPLACE(ppr.drug_units_per_dose,char(189),'.5') as decimal(9,2)) * doses_given dosesGiven
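Note that values such as '1-2' or '1/4' will still make the CAST fail. On SQL Server 2012 and later, a hedged variant uses TRY_CAST, which returns NULL instead of raising an error (this assumes both columns live in patient_prescr):
SELECT TRY_CAST(REPLACE(ppr.drug_units_per_dose, CHAR(189), '.5') AS decimal(9,2))
         * ppr.doses_given AS dosesGiven
FROM patient_prescr ppr;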
You have a rather nasty problem. You have a field drug_units_per_dose that a normal human being would consider to be an integer or floating point number. Clearly, the designers of your database are super-normal people, and they understand a much wider range of options for this concept.
I say that partly tongue in cheek, but to make an important point. The column in the database does not represent a number, at least not in all cases. I would suggest that you have a translation table for drug_units_per_dose. It would have columns such as:
1 1
1/2 0.5
3-4 ??
I realize that you will have hundreds of rows, and a lot of them will look redundant because they will be "50,50" and "100,100". However, if you want to keep control of the business logic for turning these strings into numbers, then a lookup table seems like the sanest approach.
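A minimal sketch of that lookup-table idea; every name below is invented for illustration:
-- hypothetical translation table mapping each distinct string to a number
CREATE TABLE drug_units_lookup (
    drug_units_per_dose VARCHAR(255) NOT NULL PRIMARY KEY,
    units_numeric       DECIMAL(9,2) NULL  -- NULL where no single number makes sense ('3-4')
);

INSERT INTO drug_units_lookup VALUES
    ('1',   1),
    ('1/2', 0.5),
    ('1½',  1.5),
    ('3-4', NULL);

-- join instead of casting, so unmapped strings become NULL rather than errors
SELECT ppr.drug_units_per_dose,
       lk.units_numeric * ppr.doses_given AS dosesGiven
FROM patient_prescr ppr
LEFT JOIN drug_units_lookup lk
       ON lk.drug_units_per_dose = ppr.drug_units_per_dose;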
CAST(prod.em_amccom_comset AS int) * invline.qtyinvoiced AS setcredits
The general syntax: CAST(char_value AS int) * integer_value AS alias_name