How can I generate a cryptographically secure number in SQL Server?

I am currently using NEWID() (a GUID), but I know it is not cryptographically secure.
Is there any better way of generating a cryptographically secure number in SQL Server?

CRYPT_GEN_RANDOM is documented to return a "cryptographic random number".
It takes a length parameter between 1 and 8000, which is the length in bytes of the number to return.
For lengths <= 8 bytes, this can be cast to one of the SQL Server integer types straightforwardly.
+-----------+-----------------+---------+
| Data type | Range           | Storage |
+-----------+-----------------+---------+
| bigint    | -2^63 to 2^63-1 | 8 Bytes |
| int       | -2^31 to 2^31-1 | 4 Bytes |
| smallint  | -2^15 to 2^15-1 | 2 Bytes |
| tinyint   | 0 to 255        | 1 Byte  |
+-----------+-----------------+---------+
Three of them are signed integers and one unsigned. The following will each use the full range of their respective datatypes.
SELECT
    CAST(CRYPT_GEN_RANDOM(1) AS TINYINT),
    CAST(CRYPT_GEN_RANDOM(2) AS SMALLINT),
    CAST(CRYPT_GEN_RANDOM(4) AS INT),
    CAST(CRYPT_GEN_RANDOM(8) AS BIGINT)
It is also possible to supply a length shorter than the datatype's storage.
SELECT CAST(CRYPT_GEN_RANDOM(3) AS INT)
In this case only positive numbers can be returned. The sign bit will always be 0 because the missing high-order byte is treated as 0x00. The range of possible numbers that can be returned by the above is between 0 and POWER(2, 24) - 1 inclusive.
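As a quick sanity check (a sketch, not part of the original answer: it draws one 3-byte value per row of master..spt_values and reports the observed extremes, which should stay within that range):
SELECT MIN(R) AS MinVal, MAX(R) AS MaxVal
FROM (SELECT CAST(CRYPT_GEN_RANDOM(3) AS INT) AS R
      FROM master..spt_values) T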
Suppose the requirement is to generate some random number between 1 and 250.
One possible way of doing it would be
SELECT ( 1 + CAST(CRYPT_GEN_RANDOM(1) AS TINYINT) % 250) AS X
INTO #T
FROM master..spt_values V1, master..spt_values
However this method has a problem.
SELECT COUNT(*),X
FROM #T
GROUP BY X
ORDER BY X
The first ten rows of results are
+-------+----+
| Count | X |
+-------+----+
| 49437 | 1 |
| 49488 | 2 |
| 49659 | 3 |
| 49381 | 4 |
| 49430 | 5 |
| 49356 | 6 |
| 24914 | 7 |
| 24765 | 8 |
| 24513 | 9 |
| 24732 | 10 |
+-------+----+
Lower numbers (in this case 1 - 6) are generated twice as frequently as the others because there are two possible inputs to the modulus operation that can produce each of those results (256 % 250 = 6, so the six leftover byte values 250 - 255 wrap around onto 1 - 6).
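To see where the skew comes from, here is a small sketch (not from the original answer) that enumerates all 256 possible byte values and counts how many of them map to each output of 1 + value % 250; outputs 1 to 6 each have two preimages (for example both 0 and 250 map to 1), while every other output has only one.
WITH Bytes AS
(
    SELECT TOP (256) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1 AS B
    FROM master..spt_values
)
SELECT 1 + B % 250 AS X, COUNT(*) AS Preimages
FROM Bytes
GROUP BY 1 + B % 250
ORDER BY X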
One possible solution would be to discard all numbers >= 250
UPDATE #T
SET X = CASE
            WHEN Random >= 250 THEN NULL
            ELSE ( 1 + Random % 250 )
        END
FROM #T
CROSS APPLY (SELECT CAST(CRYPT_GEN_RANDOM(1) AS TINYINT)) CA (Random)
This appears to work on my machine but it is probably not guaranteed that SQL Server will only evaluate the function once across both references to Random in the CASE expression. Additionally it still leaves the problem of needing second and subsequent passes to fix up the NULL rows where the random value was discarded.
Declaring a scalar UDF can solve both those issues.
/*Workaround: CRYPT_GEN_RANDOM can't be called from a UDF directly, so wrap it in a view*/
CREATE VIEW dbo.CRYPT_GEN_RANDOM1
AS
SELECT CAST(CRYPT_GEN_RANDOM(1) AS TINYINT) AS Random
go
CREATE FUNCTION dbo.GET_CRYPT_GEN_RANDOM1()
RETURNS TINYINT
AS
BEGIN
    DECLARE @Result TINYINT
    WHILE (@Result IS NULL OR @Result >= 250)
        /*Not initialised or result to be discarded*/
        SELECT @Result = Random FROM dbo.CRYPT_GEN_RANDOM1
    RETURN @Result
END
And then
UPDATE #T
SET X = dbo.GET_CRYPT_GEN_RANDOM1()
Alternatively, and more straightforwardly, one could simply use
CAST(CRYPT_GEN_RANDOM(8) AS BIGINT) % 250
On the grounds that the range of bigint is so huge that any bias will likely be insignificant. There are 73,786,976,294,838,208 ways that 1 can be generated and 73,786,976,294,838,206 that 249 can be from the query above.
If even that small possible bias is not permitted you could discard any values NOT BETWEEN -9223372036854775750 AND 9223372036854775749 as shown earlier.
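A minimal sketch of that rejection approach for a single draw (the variable name and the 1 - 250 target are illustrative, and ABS folds the negative remainders back onto the positive ones):
DECLARE @R BIGINT = CAST(CRYPT_GEN_RANDOM(8) AS BIGINT)

/*Re-draw while the raw value falls in the small sliver that does not divide evenly by 250;
  the accepted range spans exactly 73,786,976,294,838,206 * 250 values*/
WHILE @R NOT BETWEEN -9223372036854775750 AND 9223372036854775749
    SET @R = CAST(CRYPT_GEN_RANDOM(8) AS BIGINT)

SELECT 1 + ABS(@R % 250) AS X /*uniform over 1 to 250*/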

Interesting question :)
I think this will work: CRYPT_GEN_RANDOM


Replace values in a column for all rows

I have a column with entries like:
column:
156781
234762
780417
and would like to have the following:
column:
0000156781
0000234762
0000780417
For this I use the following query:
Select isnull(replicate('0', 10 - len(column)),'') + rtrim(column) as a from table
However, I don't know how to replace the values in the whole column.
I already tried with:
UPDATE table
SET column= (
Select isnull(replicate('0', 10 - len(column)),'') + rtrim(column) as column from table)
But I get the following error.
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
The answer to your question is going to depend on the data type of your column. If it is a text column, for example VARCHAR, then you can modify the value in the table. If it is a number type such as INT, it is the value and not the characters that is stored.
We can also express this by saying that "0" + "1" = "01" whilst 0 + 1 = 1.
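A minimal sketch of that difference:
SELECT '0' + '1' AS chars,   /*string concatenation: '01'*/
       0 + 1     AS numbers  /*integer addition: 1*/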
In either case we can format the value in a query.
create table numberz(
val1 int,
val2 varchar(10));
insert into numberz values
(156781,'156781'),
(234762,'234762'),
(780417,'780417');
/* required format
0000156781
0000234762
0000780417
*/
select * from numberz;
GO
val1 | val2
-----: | :-----
156781 | 156781
234762 | 234762
780417 | 780417
UPDATE numberz
SET val1 = isnull(
replicate('0',
10 - len(val1)),'')
+ rtrim(val1),
val2 = isnull(
replicate('0',
10 - len(val2)),'')
+ rtrim(val2);
GO
3 rows affected
select * from numberz;
GO
val1 | val2
-----: | :---------
156781 | 0000156781
234762 | 0000234762
780417 | 0000780417
select isnull(
replicate('0',
10 - len(val1)),'')
+ rtrim(val1)
from numberz
GO
| (No column name) |
| :--------------- |
| 0000156781 |
| 0000234762 |
| 0000780417 |
db<>fiddle here
Usually, when we need to show values in a specific format, this is done with a CASE expression or other functions in the select list, i.e. without updating the stored data. That way the format can be changed at any time just by changing the function, keeping the formatting dynamic.
For example:
select id, lpad(id::text, 6, '0') as format_id from test.test_table1
order by id
Result:
id format_id
-------------
1 000001
2 000002
3 000003
4 000004
5 000005
If you really do need an UPDATE, here is a sample query for that too.
update test.test_table1
set
id = lpad(id::text, 6, '0');
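Since the question is tagged SQL Server, a rough T-SQL equivalent of the lpad idea would be something like this (a sketch; it assumes id is an integer column, reuses the table name from the example, and relies on FORMAT, available from SQL Server 2012):
select id, FORMAT(id, 'D6') as format_id
from test_table1
order by id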

listagg produces ORA-01489 if used as window function in conditional expression

My query returns many (thousands of) rows.
Column l has a certain value for only a very small number of rows (up to 10).
For each such row I want to output the aggregated, comma-separated values of a very short (up to 5 chars) varchar column v over all of those rows.
For rows not having the special value of l I want to simply output the v value for that row.
Synthesized example of the same problem: from the first 10000 integers, I want to output 1,2,3,4,5,6,7,8,9 for each single-digit number, and the number itself for each multiple-digit number. (Yes, a silly example, but the real case makes sense.)
with x (v,l) as (
select to_char(level), length(to_char(level)) from dual connect by level <= 10000
)
select case l
when 1 then listagg(v,',') within group (order by v) over (partition by l)
else v
end
from x
order by 1;
The problem is that the listagg function fails with ORA-01489: result of string concatenation is too long.
I am aware of the 4000 char limit of the listagg function as well as the xmlagg-based workaround. What I don't get is that the limit is sufficient for the data I actually want to concatenate, even though it is not sufficient for all the data. In the example above, the partition of 9 single-digit numbers fits into 4000 chars; the partition of 9000 four-digit numbers does not. I expected the case expression would prevent evaluation of the window function for unrelated rows but, for some reason, it seems the db engine evaluates the window for all rows. (Also note that the order by clause causes the query to fail fast - without it some rows are returned before the failure.)
Can you please explain the reasoning behind this behaviour? I suspect the window computation logically happens before the select clause, but I have no evidence for that. Reproduced on Oracle 11g, 18c and 19 (livesql).
Well, you are using SQL, which is not procedural, so you can't expect that some parts of the code path will not be executed just because their results are not used. (So filing a bug, as others suggested, will not succeed.)
Anyway, you can use the often-used trick based on the fact that listagg ignores null values.
So this formulation works fine:
with x (v,l) as (
select to_char(level), length(to_char(level)) from dual connect by level <= 10000
)
select nvl(listagg(case when l = 1 then v end,',') within group (order by v) over (partition by l),v) lst
from x
order by 1;
giving
LST
------------------
1,2,3,4,5,6,7,8,9
1,2,3,4,5,6,7,8,9
..
10
100
1000
10000
The explanation of the problem can be found in the execution plan (showing only the relevant part)
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 35 | 4 (50)| 00:00:01 |
| 1 | SORT ORDER BY | | 1 | 35 | 4 (50)| 00:00:01 |
| 2 | WINDOW SORT | | 1 | 35 | 4 (50)| 00:00:01 |
| 3 | VIEW | | 1 | 35 | 2 (0)| 00:00:01 |
|* 4 | CONNECT BY WITHOUT FILTERING| | | | | |
| 5 | FAST DUAL | | 1 | | 2 (0)| 00:00:01 |
----------------------------------------------------------------------------------------
...
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=1) CASE "L" WHEN 1 THEN LISTAGG("V",',') WITHIN GROUP ( ORDER BY
"V") OVER ( PARTITION BY "L") ELSE "V" END [4000]
2 - (#keys=2) "L"[NUMBER,22], "V"[VARCHAR2,40], LISTAGG("V",',') WITHIN
GROUP ( ORDER BY "V") OVER ( PARTITION BY "L")[4000]
3 - "V"[VARCHAR2,40], "L"[NUMBER,22]
4 - LEVEL[4]
So in plan line 2 the listagg is calculated (for all rows), only to be filtered in plan line 1.
It is odd that you do get an error about the 4000 character limit even though no result is longer than 4000 characters. Maybe you could file this as a bug to Oracle Support.
Another workaround is to make use of the ON OVERFLOW logic of the LISTAGG function if you are on Oracle 12.2 or higher. Using LISTAGG(v, ',' ON OVERFLOW TRUNCATE) in the query allows the query to run without error and does not actually truncate any values (at least in the example).
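For completeness, applying that clause to the original query would look roughly like this (a sketch; requires 12.2+):
with x (v,l) as (
  select to_char(level), length(to_char(level)) from dual connect by level <= 10000
)
select case l
         when 1 then listagg(v, ',' on overflow truncate) within group (order by v) over (partition by l)
         else v
       end lst
from x
order by 1;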

Reference the output of a calculated column in Hive SQL

I have a self-referencing/recursive calculation in Excel that needs to be moved to Hive SQL. Basically the column needs to SUM the current value and the result from the previous calculation, but only when that total is greater than 0; otherwise the result is 0.
The data is as follows, A is the value and B is the expected output:
| A | B |
|-----|-----|
| -1 | 0 |
| 2 | 2 |
| -2 | 0 |
| 2 | 2 |
| 2 | 4 |
| -1 | 3 |
| 2 | 5 |
In Excel it would be written in column B as:
=MAX(0,B1+A2)
The problem in SQL is that each row needs the output of the previous row's calculation. I think I've got it sorted in SQL as the following:
DECLARE @Numbers TABLE(A INT, Rn INT)
INSERT INTO @Numbers VALUES (-1,1),(2,2),(-2,3),(2,4),(2,5),(-1,6),(2,7);
WITH lagged AS
(
    SELECT A, 0 AS B, Rn
    FROM @Numbers
    WHERE Rn = 1
    UNION ALL
    SELECT i.A,
           CASE WHEN ((i.A + l.B) >= 0) THEN (i.A + l.B)
                ELSE l.B
           END,
           i.Rn
    FROM @Numbers i INNER JOIN lagged l
        ON i.Rn = l.Rn + 1
)
SELECT *
FROM lagged;
But this being Hive, it doesn't support CTEs so I need to dumb the SQL down a touch. Is that possible using LAG/LEAD? My brain is hurting having got this far!
I initially thought that it would help to first compute the sum of all elements up to each rank and then fix the values somehow using the negative elements.
However, one big negative value that zeroes the B column will carry forward in the running sum and make all following elements negative. (Even with the sample data, after row 3 the running sum of A is -1 while the expected B is 0, and that difference carries into every later total.)
It's as Gordon commented: where the 0 becomes the maximum in the calculation =MAX(0,B1+A2) depends on where it previously happened, so it seems impossible to compute the values in advance analytically.

How to convert string to number based on units

I am trying to change the following strings into their respective numerical values, by identifying the units (millions or billions) and then multiplying accordingly. I believe I am having issues with the variable types but can't seem to find a solution. Any tips?
1.44B to 1,440,000,000
1.564M to 1,564,000
UPDATE [_ParsedXML_Key_Stats]
SET [Value] = CASE
WHEN right(rtrim([_ParsedXML_Key_Stats].[Value]),1) = 'B' And [_ParsedXML_Key_Stats].[NodeName] = 'EBITDA'
THEN substring(rtrim([_ParsedXML_Key_Stats].[Value]),1,len([_ParsedXML_Key_Stats].[Value])-1) * 1000000000
WHEN right(rtrim([_ParsedXML_Key_Stats].[Value]),1) = 'M' And [_ParsedXML_Key_Stats].[NodeName] = 'EBITDA'
THEN substring(rtrim([_ParsedXML_Key_Stats].[Value]),1,len([_ParsedXML_Key_Stats].[Value])-1) * 1000000
ELSE 0
END
With your original query I got a conversion error, as the multiplication was treating the decimal value as an int; I guess you might have experienced the same problem.
One remedy that fixed it was to turn the factor into a decimal by adding .0 to it.
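As a minimal illustration of that implicit-conversion issue (a sketch, separate from the fiddle below):
/*This fails: 1000000000 is an INT literal, so SQL Server tries to convert '1.44' to INT
SELECT '1.44' * 1000000000
*/
/*This works: the .0 makes the literal numeric, so '1.44' is converted to a decimal instead*/
SELECT '1.44' * 1000000000.0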
If you want to get the number formatted with commas you can use the FORMAT function like so: FORMAT(CAST(value AS DECIMAL), 'N0') (be sure to specify an appropriate precision and scale for the decimal type).
Sample test data and output from SQL Fiddle below:
SQL Fiddle
MS SQL Server 2014 Schema Setup:
CREATE TABLE [_ParsedXML_Key_Stats] (value VARCHAR(50), NodeName VARCHAR(50));
INSERT [_ParsedXML_Key_Stats] VALUES
('111', 'SOMETHING ELSE'),
('999', 'EBITDA'),
('47.13B', 'EBITDA'),
('1.44B', 'EBITDA'),
('1.564M', 'EBITDA');
WITH cte AS
(
SELECT
Value,
CAST(LEFT([Value],LEN([Value])-1) AS DECIMAL(28,6)) AS newValue,
RIGHT(RTRIM([Value]),1) AS c
FROM [_ParsedXML_Key_Stats]
WHERE [NodeName] = 'EBITDA'
AND RIGHT(RTRIM([Value]),1) IN ('B','M')
)
UPDATE cte
SET [Value] =
CASE
WHEN c = 'B' THEN newValue * 1000000000.0
WHEN c = 'M' THEN newValue * 1000000.0
END;
Query 1:
SELECT *, FORMAT(CAST(Value AS DECIMAL(18,0)),'N0') AS formattedValue
FROM _ParsedXML_Key_Stats
Results:
| value | NodeName | formattedValue |
|--------------------|----------------|----------------|
| 111 | SOMETHING ELSE | 111 |
| 999 | EBITDA | 999 |
| 47130000000.000000 | EBITDA | 47,130,000,000 |
| 1440000000.000000 | EBITDA | 1,440,000,000 |
| 1564000.000000 | EBITDA | 1,564,000 |

Oracle - Integer datatype precision max? I'm able to enter more than 38 digits into an integer field

I have an integer column, and according to others an integer is supposed to have a precision of 38 and is basically an alias for the type declaration of NUMBER(38).
I'm sure I'm missing something, but how am I able to enter 128 digits into an INTEGER column?
CREATE TABLE TEST
(
ID_INT INTEGER NOT NULL
);
insert into test( id_int)
values ( '0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890129010');
Version: Oracle 11
You can insert the row. But the data itself is then truncated. For example, note that the last 2 digits are lost when I query the data
SQL> insert into test( id_int )
2 values( 123456789012345678901234567890123456789012 );
1 row created.
SQL> select id_int from test;
ID_INT
------------------------------------------------------
123456789012345678901234567890123456789000
INTEGER is actually an alias for NUMBER (which can also be written as NUMBER(*)) and not NUMBER(38). NUMBER on its own means no precision is specified and you can store any value the type supports. The 38 is a guarantee of 38 digits of precision to allow portability between different systems running Oracle, though it will happily allow numbers with more digits - just don't expect them to always port correctly if you ever have to move them. I created a test table:
create table TESTNUM
(
ID_INT integer
,ID_NUM38 number(38)
,ID_NUM number(*)
);
And here is a query to show the precisions stored:
select CNAME, COLTYPE, WIDTH, SCALE, PRECISION
from COL
where TNAME = 'TESTNUM';
I get back:
+----------+---------+-------+-------+-----------+
| CNAME    | COLTYPE | WIDTH | SCALE | PRECISION |
+----------+---------+-------+-------+-----------+
| ID_INT   | NUMBER  |    22 |     0 |           |
| ID_NUM38 | NUMBER  |    22 |     0 |        38 |
| ID_NUM   | NUMBER  |    22 |       |           |
+----------+---------+-------+-------+-----------+
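The same column metadata can also be read from the standard dictionary view USER_TAB_COLUMNS (a sketch using its documented columns):
select column_name, data_type, data_length, data_precision, data_scale
from   user_tab_columns
where  table_name = 'TESTNUM';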
I believe precision refers to the maximum number of significant digits, which is not the same thing as only allowing a number 38 digits long.
See here for an explanation of significant digits.