After successful execution I got an error: Arithmetic overflow error converting expression to data type int - SQL

I am reading an SQL article on how to use SQL queries cleverly. I came across the article below, which calculates a factorial. It works for the number 5, but when I change it to a number bigger than 12 it throws the error: Msg 8115, Level 16, State 2, Line 1
Arithmetic overflow error converting expression to data type int.
I thought 12 might be related to the sum of the numbers <= 5 (5, 4, 3, 2).
The query below works for all num < 12, but when I change num to something bigger than 12
it does not work, even if I change the "num <" condition.
with fact as (
select 1 as fac, 1 as num
union all
select fac*(num+1), num+1
from fact
where num<12)
select fac
from fact
where num=5

Your code is a little strange -- counting up and then using the outer where to stop.
I would just put in one number and count down. If you use decimal(38, 0), you can go much higher:
with fact as (
select cast(1 as decimal(38, 0)) as fac, 33 as num
union all
select fac * num, num - 1
from fact
where num > 0
)
select max(fac)
from fact;
This returns 8,683,317,618,811,886,495,518,194,401,280,000,000. That seems sufficiently large for most practical applications.

The result of the calculation for the number 13 is greater than the range of values an INT can hold.
Instead you will have to convert to BIGINT:
with fact as (
select CONVERT(BIGINT,1) as fac,
1 as num
union all
select CONVERT(BIGINT,fac*(num+1)),
num+1
from fact
where num<13
)
select fac
from fact
where num=13
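For reference (not part of the original answer): BIGINT only postpones the problem, since its maximum is 9,223,372,036,854,775,807, which is enough for 20! but not 21!. A hedged sketch of the same query pushed to its BIGINT limit; beyond that, switch to the decimal(38, 0) approach shown above:
with fact as (
select CONVERT(BIGINT, 1) as fac, 1 as num
union all
select CONVERT(BIGINT, fac * (num + 1)), num + 1
from fact
where num < 20
)
select fac
from fact
where num = 20
-- 2,432,902,008,176,640,000; asking for num = 21 would overflow BIGINT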

Related

WHILE Window Operation with Different Starting Point Values From Column - SQL Server [duplicate]

In SQL there are aggregation operators, like AVG, SUM, COUNT. Why doesn't it have an operator for multiplication? "MUL" or something.
I was wondering, does it exist for Oracle, MSSQL, MySQL? If not, is there a workaround that would give this behaviour?
By MUL do you mean progressive multiplication of values?
Even with 100 rows of some small size (say around 10), your MUL(column) is going to overflow any data type! With such a high probability of misuse/abuse, and very limited scope for use, it does not need to be in the SQL Standard. As others have shown, there are mathematical ways of working it out, just as there are many, many ways to do tricky calculations in SQL using only standard (and commonly supported) methods.
Sample data:
Column
1
2
4
8
COUNT : 4 items (1 for each non-null)
SUM : 1 + 2 + 4 + 8 = 15
AVG : 3.75 (SUM/COUNT)
MUL : 1 x 2 x 4 x 8 ? ( =64 )
For completeness, the Oracle, MSSQL, MySQL core implementations *
Oracle : EXP(SUM(LN(column))) or POWER(N,SUM(LOG(column, N)))
MSSQL : EXP(SUM(LOG(column))) or POWER(N,SUM(LOG(column)/LOG(N)))
MySQL : EXP(SUM(LOG(column))) or POW(N,SUM(LOG(N,column)))
Take care when using EXP/LOG in SQL Server; watch the return type: http://msdn.microsoft.com/en-us/library/ms187592.aspx
The POWER form allows for larger numbers (using bases larger than Euler's number), and in cases where the result grows too large to convert back using POWER, you can return just the logarithmic value and calculate the actual number outside of the SQL query
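A hedged sketch of that last idea in SQL Server syntax (yourColumn and yourTable are the placeholder names already used in this thread): return only the natural log of the product, then apply EXP outside the query if the product itself would be too large:
SELECT SUM(LOG(yourColumn)) AS log_of_product
FROM yourTable
WHERE yourColumn > 0 -- LOG is undefined for 0 and negatives, see the note below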
* LOG(0) and LOG of negative numbers are undefined. The below shows only how to handle this in SQL Server. Equivalents can be found for the other SQL flavours, using the same concept
create table MUL(data int)
insert MUL select 1 yourColumn union all
select 2 union all
select 4 union all
select 8 union all
select -2 union all
select 0
select CASE WHEN MIN(abs(data)) = 0 then 0 ELSE
EXP(SUM(Log(abs(nullif(data,0))))) -- the base mathematics
* round(0.5-count(nullif(sign(sign(data)+0.5),1))%2,0) -- pairs up negatives
END
from MUL
Ingredients:
Taking the abs() of data: if the minimum is 0, multiplying by anything else is futile; the result is 0.
When data is 0, NULLIF converts it to NULL. abs() and log() then both return NULL, so the row is excluded from SUM().
If data is not 0, abs() allows us to multiply a negative number using the LOG method; we keep track of the negativity elsewhere.
Working out the final sign
sign(data) returns 1 for >0, 0 for 0 and -1 for <0.
We add another 0.5 and take the sign() again, so we have now classified 0 and 1 both as 1, and only -1 as -1.
again use NULLIF to remove from COUNT() the 1's, since we only need to count up the negatives.
% 2 against the count() of negative numbers returns either
--> 1 if there is an odd number of negative numbers
--> 0 if there is an even number of negative numbers
more mathematical tricks: we take 1 or 0 off 0.5, so that the above becomes
--> (0.5-1=-0.5=>round to -1) if there is an odd number of negative numbers
--> (0.5-0= 0.5=>round to 1) if there is an even number of negative numbers
we multiply this final 1/-1 against the SUM-PRODUCT value for the real result
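For the sample MUL table created above (1, 2, 4, 8, -2, 0) this returns 0 because of the zero row; drop that row and it should return -128 (give or take floating-point rounding), since the single negative value makes the count odd and flips the sign of 1 x 2 x 4 x 8 x 2 = 128.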
No, but you can use Mathematics :)
If yourColumn is always bigger than zero:
select EXP(SUM(LOG(yourColumn))) As ColumnProduct from yourTable
I see an Oracle answer is still missing, so here it is:
SQL> with yourTable as
2 ( select 1 yourColumn from dual union all
3 select 2 from dual union all
4 select 4 from dual union all
5 select 8 from dual
6 )
7 select EXP(SUM(LN(yourColumn))) As ColumnProduct from yourTable
8 /
COLUMNPRODUCT
-------------
64
1 row selected.
Regards,
Rob.
With PostgreSQL, you can create your own aggregate functions, see http://www.postgresql.org/docs/8.2/interactive/sql-createaggregate.html
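A minimal sketch of such an aggregate (the mul_state function and the mul aggregate are illustrative names, not anything shipped with PostgreSQL; the SFUNC/STYPE/INITCOND clauses are described in the linked docs):
CREATE FUNCTION mul_state(numeric, numeric) RETURNS numeric
    AS 'SELECT $1 * $2' LANGUAGE SQL IMMUTABLE STRICT;
CREATE AGGREGATE mul(numeric) (
    SFUNC    = mul_state,
    STYPE    = numeric,
    INITCOND = '1'
);
-- usage: SELECT mul(yourColumn) FROM yourTable;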
To create an aggregate function on MySQL, you'll need to build an .so (linux) or .dll (windows) file. An example is shown here: http://www.codeproject.com/KB/database/mygroupconcat.aspx
I'm not sure about MSSQL and Oracle, but I bet they have options to create custom aggregates as well.
You'll break any datatype fairly quickly as numbers mount up.
Using LOG/EXP is tricky because of values <= 0, which fail when passed to LOG. I wrote a solution in this question that deals with this
Using CTE in MS SQL:
CREATE TABLE Foo(Id int, Val int)
INSERT INTO Foo VALUES(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)
;WITH cte AS
(
SELECT Id, Val AS Multiply, row_number() over (order by Id) as rn
FROM Foo
WHERE Id=1
UNION ALL
SELECT ff.Id, cte.multiply*ff.Val as multiply, ff.rn FROM
(SELECT f.Id, f.Val, (row_number() over (order by f.Id)) as rn
FROM Foo f) ff
INNER JOIN cte
ON ff.rn - 1 = cte.rn
)
SELECT * FROM cte
Not sure about Oracle or sql-server, but in MySQL you can just use * like you normally would.
mysql> select count(id), count(id)*10 from tablename;
+-----------+--------------+
| count(id) | count(id)*10 |
+-----------+--------------+
|       961 |         9610 |
+-----------+--------------+
1 row in set (0.00 sec)

Possible explanation on WITH RECURSIVE Query Postgres

I have been reading about the WITH query in Postgres, and this is what I'm surprised by:
WITH RECURSIVE t(n) AS (
VALUES (1)
UNION ALL
SELECT n+1 FROM t WHERE n < 100
)
SELECT sum(n) FROM t;
I'm not able to understand how the evaluation of the query works.
t(n) sounds like a function with a parameter; how is the value of n passed?
Any insight into how the recursive statement breaks down in SQL would be appreciated.
This is called a common table expression and is a way of expressing a recursive query in SQL:
t(n) defines the name of the CTE as t, with a single column named n. It's similar to an alias for a derived table:
select ...
from (
...
) as t(n);
The recursion starts with the value 1 (that's the values (1) part) and then recursively adds one to it until 100 is reached. So it generates the numbers from 1 to 100. The final query then sums up all those numbers (giving 5050).
n is a column name, not a "variable" and the "assignment" happens in the same way as any data retrieval.
WITH RECURSIVE t(n) AS (
VALUES (1) --<< this is the recursion "root"
UNION ALL
SELECT n+1 FROM t WHERE n < 100 --<< this is the "recursive part"
)
SELECT sum(n) FROM t;
If you "unroll" the recursion (which in fact is an iteration) then you'd wind up with something like this:
select x.n + 1
from (
select x.n + 1
from (
select x.n + 1
from (
select x.n + 1
from (
values (1)
) as x(n)
) as x(n)
) as x(n)
) as x(n)
More details in the manual:
https://www.postgresql.org/docs/current/static/queries-with.html
If you are looking for how it is evaluated, the recursion occurs in two phases.
The root is executed once.
The recursive part is executed until no rows are returned. The documentation is a little vague on that point.
Now, normally in databases, we think of "function" in a different way than we think of them when we do imperative programming. In database terms, the best way to think of a function is "a correspondence where for every domain value you have exactly one corresponding value." So one of the immediate challenges is to stop thinking in terms of programming functions. Even user-defined functions are best thought about in this other way since it avoids a lot of potential nastiness regarding the intersection of running the query and the query planner... So it may look like a function but that is not correct.
Instead the WITH clause uses a different, almost inverse notation. Here you have the set name t, followed (optionally in this case) by the tuple structure (n). So this is not a function with a parameter, but a relation with a structure.
So how this breaks down:
SELECT 1 as n where n < 100
UNION ALL
SELECT n + 1 FROM (SELECT 1 as n) where n < 100
UNION ALL
SELECT n + 1 FROM (SELECT n + 1 FROM (SELECT 1 as n)) where n < 100
Of course that is a simplification because internally we keep track of the cte state and keep joining against the last iteration, so in practice these get folded back to near linear complexity (while the above diagram would suggest much worse performance than that).
So in reality you get something more like:
SELECT 1 as n where 1 < 100
UNION ALL
SELECT 1 + 1 as n where 1 + 1 < 100
UNION ALL
SELECT 2 + 1 AS n WHERE 2 + 1 < 100
...
In essence the previous values carry over.

Quicker with SQL to SUM 0 values or exclude them?

I want to SUM a lot of rows.
Is it quicker (or better practice, etc) to do Option A or Option B?
Option A
SELECT
[Person],
SUM([Value]) AS Total
FROM
Database
WHERE
[Value] > 0
GROUP BY
[Person]
Option B
SELECT
[Person],
SUM([Value]) AS Total
FROM
Database
GROUP BY
[Person]
So if I have, for Person X:
0, 7, 0, 6, 0, 5, 0, 0, 0, 4, 0, 9, 0, 0
Option A does:
a) Remove zeros
b) 7 + 6 + 5 + 4 + 9
Option B does:
a) 0 + 7 + 0 + 6 + 0 + 5 + 0 + 0 + 0 + 4 + 0 + 9 + 0 + 0
Option A has less summing to do, because it has fewer records to sum, since I've excluded the load of rows that have a zero value. But Option B doesn't need a WHERE clause.
Anyone got an idea as to whether either of these are significantly quicker/better than the other? Or is it just something that doesn't matter either way?
Thanks :-)
Well, if you have a filtered index that exactly matches the where clause, and if that index removes a significant amount of data (as in: a good chunk of the data is zeros), then definitely the first... If you don't have such an index, then you'll need to test it on your specific data, but I would probably expect the unfiltered scenario to be faster, as it can use a range of tricks to do the sum if it doesn't need to do branching etc.
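For reference, a filtered index matching Option A's predicate might look something like this (SQL Server syntax; the index name is made up, and [Database] is just the table name from the question):
CREATE NONCLUSTERED INDEX IX_Database_Person_Value_Positive
    ON [Database] ([Person])
    INCLUDE ([Value])
    WHERE [Value] > 0;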
However, the two examples aren't functionally equivalent at the moment (the second includes negative values, the first doesn't).
Assuming that Value is always positive, the first query (Option A) might still return fewer rows if there's a Person with all zeroes.
Otherwise you should simply test actual runtime/CPU on a really large amount of rows.
As already pointed out, the two are not functionally equivalent. In addition to the differences already pointed out (negative values, different output row count), Option A also filters out rows where Value is NULL; Option B doesn't. For example, a Person whose only rows have a NULL Value appears in Option B (with a NULL Total) but not in Option A.
Based on the execution plan for both of these and using a small dataset similar to the one you provided, Option B is slightly faster with an Estimated Subtree Cost of .0146636 vs .0146655. However, you may get different results depending on the query or the size of the dataset. The only option is to test and see for yourself.
http://www.developer.com/db/how-to-interpret-query-execution-plan-operators.html
Drop Table #Test
Create Table #Test (Person nvarchar(200), Value int)
Insert Into #Test
Select 'Todd', 12 Union
Select 'Todd', 11 Union
Select 'Peter', 20 Union
Select 'Peter', 29 Union
Select 'Griff', 10 Union
Select 'Griff', 0 Union
Select 'Peter', 0
SELECT [Person], SUM([Value]) AS Total
FROM #Test
WHERE [Value] > 0
GROUP BY [Person]
SELECT [Person],SUM([Value]) AS Total
FROM #Test
GROUP BY [Person]

Ordered count of consecutive repeats / duplicates

I highly doubt I'm doing this in the most efficient manner, which is why I tagged plpgsql on here. I need to run this on 2 billion rows for a thousand measurement systems.
You have measurement systems that often report the previous value when they lose connectivity, and they lose connectivity in short spurts often, but sometimes for a long time. You need to aggregate, but when you do so, you need to look at how long the value was repeating and apply various filters based on that information. Say you are measuring mpg on a car but it's stuck at 20 mpg for an hour, then moves around to 20.1 and so on. You'll want to evaluate the accuracy when it's stuck. You could also place some alternative rules that look for when the car is on the highway, and with window functions you can generate the 'state' of the car and have something to group on. Without further ado:
--here's my data, you have different systems, the time of measurement, and the actual measurement
--as well, the raw data includes whether or not it's a repeat (hence the included window function)
select * into temporary table cumulative_repeat_calculator_data
FROM
(
select
system_measured, time_of_measurement, measurement,
case when
measurement = lag(measurement,1) over (partition by system_measured order by time_of_measurement asc)
then 1 else 0 end as repeat
FROM
(
SELECT 5 as measurement, 1 as time_of_measurement, 1 as system_measured
UNION
SELECT 150 as measurement, 2 as time_of_measurement, 1 as system_measured
UNION
SELECT 5 as measurement, 3 as time_of_measurement, 1 as system_measured
UNION
SELECT 5 as measurement, 4 as time_of_measurement, 1 as system_measured
UNION
SELECT 5 as measurement, 1 as time_of_measurement, 2 as system_measured
UNION
SELECT 5 as measurement, 2 as time_of_measurement, 2 as system_measured
UNION
SELECT 5 as measurement, 3 as time_of_measurement, 2 as system_measured
UNION
SELECT 5 as measurement, 4 as time_of_measurement, 2 as system_measured
UNION
SELECT 150 as measurement, 5 as time_of_measurement, 2 as system_measured
UNION
SELECT 5 as measurement, 6 as time_of_measurement, 2 as system_measured
UNION
SELECT 5 as measurement, 7 as time_of_measurement, 2 as system_measured
UNION
SELECT 5 as measurement, 8 as time_of_measurement, 2 as system_measured
) as data
) as data;
--unfortunately you can't have window functions within window functions, so I had to break it down into a subquery
--what we need is something to partition on, the 'state' of the system if you will, so I ran a running total of the nonrepeats
--this creates a row that stays the same when your data is repeating - aka something you can partition/group on
select * into temporary table cumulative_repeat_calculator_step_1
FROM
(
select
*,
sum(case when repeat = 0 then 1 else 0 end) over (partition by system_measured order by time_of_measurement asc) as cumlative_sum_of_nonrepeats_by_system
from cumulative_repeat_calculator_data
order by system_measured, time_of_measurement
) as data;
--finally, the query. I didn't bother showing my desired output, because this (finally) got it
--I wanted a sequential count of repeats that restarts when it stops repeating, and starts with the first repeat
--what you can do now is take the average measurement under some condition based on how long it was repeating, for example
select *,
case when repeat = 0 then 0
else
row_number() over (partition by cumlative_sum_of_nonrepeats_by_system, system_measured order by time_of_measurement) - 1
end as ordered_repeat
from cumulative_repeat_calculator_step_1
order by system_measured, time_of_measurement
So, what would you do differently in order to run this on a huge table, or what alternative tools would you use? I'm thinking plpgsql because I suspect this needs to be done in-database, or during the data insertion process, although I generally work with the data after it's loaded. Is there any way to get this in one sweep without resorting to sub-queries?
I have tested one alternative method, but it still relies on a sub-query and I think this is faster. For that method you create a "starts and stops" table with start_timestamp, end_timestamp, system. Then you join to the larger table and if the timestamp is between those, you classify it as being in that state, which is essentially an alternative to cumlative_sum_of_nonrepeats_by_system. But when you do this, you join on 1=1 for thousands of devices and thousands or millions of 'events'. Do you think that's a better way to go?
Test case
First, a more useful way to present your data - or even better, in an sqlfiddle, ready to play with:
CREATE TEMP TABLE data(
system_measured int
, time_of_measurement int
, measurement int
);
INSERT INTO data VALUES
(1, 1, 5)
,(1, 2, 150)
,(1, 3, 5)
,(1, 4, 5)
,(2, 1, 5)
,(2, 2, 5)
,(2, 3, 5)
,(2, 4, 5)
,(2, 5, 150)
,(2, 6, 5)
,(2, 7, 5)
,(2, 8, 5);
Simplified query
Since it remains unclear, I am assuming only the above as given.
Next, I simplified your query to arrive at:
WITH x AS (
SELECT *, CASE WHEN lag(measurement) OVER (PARTITION BY system_measured
ORDER BY time_of_measurement) = measurement
THEN 0 ELSE 1 END AS step
FROM data
)
, y AS (
SELECT *, sum(step) OVER(PARTITION BY system_measured
ORDER BY time_of_measurement) AS grp
FROM x
)
SELECT * ,row_number() OVER (PARTITION BY system_measured, grp
ORDER BY time_of_measurement) - 1 AS repeat_ct
FROM y
ORDER BY system_measured, time_of_measurement;
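If I read the logic right, the sample data should come out like this (repeat_ct derived by hand, so treat it as a sanity check rather than verified output):
system_measured | time_of_measurement | measurement | repeat_ct
1               | 1                   | 5           | 0
1               | 2                   | 150         | 0
1               | 3                   | 5           | 0
1               | 4                   | 5           | 1
2               | 1                   | 5           | 0
2               | 2                   | 5           | 1
2               | 3                   | 5           | 2
2               | 4                   | 5           | 3
2               | 5                   | 150         | 0
2               | 6                   | 5           | 0
2               | 7                   | 5           | 1
2               | 8                   | 5           | 2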
Now, while it is all nice and shiny to use pure SQL, this will be much faster with a plpgsql function, because it can do it in a single table scan where this query needs at least three scans.
Faster with plpgsql function:
CREATE OR REPLACE FUNCTION x.f_repeat_ct()
RETURNS TABLE (
system_measured int
, time_of_measurement int
, measurement int, repeat_ct int
) LANGUAGE plpgsql AS
$func$
DECLARE
r data; -- table name serves as record type
r0 data;
BEGIN
-- SET LOCAL work_mem = '1000 MB'; -- uncomment and adapt if needed, see below!
repeat_ct := 0; -- init
FOR r IN
SELECT * FROM data d ORDER BY d.system_measured, d.time_of_measurement
LOOP
IF r.system_measured = r0.system_measured
AND r.measurement = r0.measurement THEN
repeat_ct := repeat_ct + 1; -- same value as the previous row: increment the count
ELSE
repeat_ct := 0; -- start new count
END IF;
RETURN QUERY SELECT r.*, repeat_ct;
r0 := r; -- remember last row
END LOOP;
END
$func$;
Call:
SELECT * FROM x.f_repeat_ct();
Be sure to table-qualify your column names at all times in this kind of plpgsql function, because we use the same names as output parameters which would take precedence if not qualified.
Billions of rows
If you have billions of rows, you may want to split this operation up. I quote the manual here:
Note: The current implementation of RETURN NEXT and RETURN QUERY
stores the entire result set before returning from the function, as
discussed above. That means that if a PL/pgSQL function produces a
very large result set, performance might be poor: data will be written
to disk to avoid memory exhaustion, but the function itself will not
return until the entire result set has been generated. A future
version of PL/pgSQL might allow users to define set-returning
functions that do not have this limitation. Currently, the point at
which data begins being written to disk is controlled by the work_mem
configuration variable. Administrators who have sufficient memory to
store larger result sets in memory should consider increasing this
parameter.
Consider computing rows for one system at a time or set a high enough value for work_mem to cope with the load. Follow the link provided in the quote for more about work_mem.
One way would be to set a very high value for work_mem with SET LOCAL in your function, which is only effective for the current transaction. I added a commented line in the function. Do not set it very high globally, as this could nuke your server. Read the manual.

SQL that should never return anything, but does

I came across the following SQL statement:
SELECT A.NAME
FROM THE_TABLE A
WHERE A.NAME LIKE '%JOHN%DOE%'
AND ((A.NUM_FIELD/1) - (A.NUM_FIELD/2)*2 <> 0)
That last condition, "((A.NUM_FIELD/1) - (A.NUM_FIELD/2)*2 <> 0)", is what baffles me. Depending on the implementation of order of operations, it should always result in 0 or A.NUM_FIELD / 2.
How does SQL still return records from this view? If it always results in half the original value, why have it? (This is a delivered SQL package.)
Probably integer division, so for an odd NUM_FIELD, (NUM_FIELD/2)*2 comes out one less than NUM_FIELD.
MSDN says:
If an integer dividend is divided by an integer divisor, the result is
an integer that has any fractional part of the result truncated.
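You can see the truncation directly (T-SQL shown; other databases behave the same way for two integer operands):
SELECT 7 / 2;           -- 3, not 3.5
SELECT 7 - (7 / 2) * 2; -- 1 for odd values, 0 for even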
If NUM_FIELD is an integer, and an odd one, then
(A.NUM_FIELD/1) - (A.NUM_FIELD/2)*2
is equal to one.
What SQL implementation is this?
As noted,
(x/1) - (x/2)*2
is equivalent to
x - (2*(x/2))
which, if integer division is being performed, yields 0 or 1 depending on whether the value is even or odd:
 x   x/2   2*(x/2)   x-(2*(x/2))
 -   ---   -------   -----------
 0    0       0           0
 1    0       0           1
 2    1       2           0
 3    1       2           1
 4    2       4           0
 ...
If so, it seems like an odd way of checking for odd/even values, especially since most SQL implementations that I'm aware of support a modulo operator or function, either x % y or mod(x,y).
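For comparison, the same filter written with a modulo check, assuming NUM_FIELD really is an integer column (T-SQL's % operator shown; Oracle and others would use MOD(A.NUM_FIELD, 2)):
SELECT A.NAME
FROM THE_TABLE A
WHERE A.NAME LIKE '%JOHN%DOE%'
  AND A.NUM_FIELD % 2 <> 0 -- keep only odd values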
The seemingly extraneous division by 1 makes me think the column in question might be floating point. Perhaps the person who coded the query is trying to check for jitter or fuzziness in the low-order bits of the floating point column?
If you modified the query to return all the intermediate values of the computation:
SELECT A.NAME as Name,
       A.NUM_FIELD as X,
       A.NUM_FIELD / 1 as X_over_1,
       A.NUM_FIELD / 2 as X_over_2,
       ( A.NUM_FIELD / 2 ) * 2 as Two_X_over_2,
       ( A.NUM_FIELD / 1 ) - ( A.NUM_FIELD / 2 ) * 2 as Delta,
       case when ( ( A.NUM_FIELD / 1 ) - ( A.NUM_FIELD / 2 ) * 2 ) <> 0
            then 'return'
            else 'discard'
       end as Row_Status
FROM THE_TABLE A
WHERE A.NAME LIKE '%JOHN%DOE%'
and then executed it, what results do you get?