SQL Server - Multiplying row values for a given column value [duplicate] - sql

I'm looking for something like SELECT PRODUCT(table.price) FROM table GROUP BY table.sale, similar to how SUM works.
Have I missed something on the documentation, or is there really no PRODUCT function?
If so, why not?
Note: I looked for the function in PostgreSQL, MySQL, and MSSQL and found it in none of them, so I assumed no SQL dialect supports it.

For MSSQL you can use this. It can be adapted for other platforms: it's just maths and aggregates on logarithms.
SELECT
    GrpID,
    CASE
        WHEN MinVal = 0 THEN 0
        WHEN Neg % 2 = 1 THEN -1 * EXP(ABSMult)
        ELSE EXP(ABSMult)
    END
FROM
    (
    SELECT
        GrpID,
        --log of +ve row values
        SUM(LOG(ABS(NULLIF(Value, 0)))) AS ABSMult,
        --count of -ve values. Even = +ve result.
        SUM(CASE WHEN Value < 0 THEN 1 ELSE 0 END) AS Neg,
        --anything * zero = zero
        MIN(ABS(Value)) AS MinVal
    FROM
        Mytable
    GROUP BY
        GrpID
    ) foo
Taken from my answer here: SQL Server Query - groupwise multiplication

I don't know why there isn't one, but (taking more care over negative numbers and zeros) you can use logs and exponents to do:
select exp (sum (ln (table.price))) from table ...

There is no PRODUCT set function in the SQL Standard. It would appear to be a worthy candidate, though (unlike, say, a CONCATENATE set function, which is not a good fit for SQL: the resulting data type would involve multivalues and pose a problem as regards first normal form).
The SQL Standards aim to consolidate functionality across SQL products circa 1990 and to provide 'thought leadership' on future development. In short, they document what SQL does and what SQL should do. The absence of a PRODUCT set function suggests that in 1990 no vendor thought it worthy of inclusion and that there has been no academic interest in introducing it into the Standard since.
Of course, vendors have always sought to add their own functionality, these days usually as extensions to the Standard rather than tangentially to it. I don't recall seeing a PRODUCT set function (or even demand for one) in any of the SQL products I've used.
In any case, the workaround is fairly simple using the LOG and EXP scalar functions (plus logic to handle negatives) with the SUM set function; see @gbn's answer for some sample code. I've never needed to do this in a business application, though.
In conclusion, my best guess is that there is no demand from SQL end users for a PRODUCT set function; further, that anyone with an academic interest would probably find the workaround acceptable (i.e. would not value the syntactic sugar a PRODUCT set function would provide).
Out of interest, there is indeed demand in SQL Server Land for new set functions but for those of the window function variety (and Standard SQL, too). For more details, including how to get involved in further driving demand, see Itzik Ben-Gan's blog.

You can compute a product aggregate, but you have to do the maths yourself, like this (the first factor multiplies the absolute values via logs, skipping zeros; the second zeroes the result if any value is zero; the third fixes the sign from the count of negatives)...
SELECT
    Exp(Sum(IIf(Abs([Num])=0, 0, Log(Abs([Num])))))
      * IIf(Min(Abs([Num]))=0, 0, 1)
      * (1 - 2 * (Sum(IIf([Num]>=0, 0, 1)) Mod 2)) AS P
FROM
    Table1
Source: http://productfunctionsql.codeplex.com/

There is a neat trick in T-SQL (not sure if it's ANSI) that allows you to concatenate string values from a set of rows into one variable. It looks like it works for multiplication as well:
declare @Floats as table (value float)
insert into @Floats values (0.9)
insert into @Floats values (0.9)
insert into @Floats values (0.9)

declare @multiplier float = null

select
    @multiplier = isnull(@multiplier, 1) * value
from @Floats

select @multiplier
This can potentially be more numerically stable than the log/exp solution.
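Note that this computes one overall product, and that this variable-accumulation pattern is not documented as guaranteed row-by-row behaviour in SQL Server, so treat it as a trick rather than a contract. For the original GROUP BY sale question you would run it once per group, along these lines (MyTable with sale and price columns is a hypothetical stand-in for the asker's table):

declare @multiplier float = null

select
    @multiplier = isnull(@multiplier, 1) * price
from MyTable      -- hypothetical table (sale, price)
where sale = 42   -- one group per execution

select @multiplier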

I think that is because no common numeric type is able to accommodate the product of many values. As databases are designed for large numbers of records, the product of 1000 numbers would be enormous, and with floating-point numbers the propagated error would be huge.
Also note that using logarithms can be dangerous. Although mathematically log(a*b) = log(a) + log(b), this may not hold in computers, since we are not dealing with exact real numbers: floating-point arithmetic carries only about 15-17 significant digits. If you calculate exp(log(a) + log(b)) instead of a*b, you may get unexpected results. For example:
SELECT 9999999999*99999999974482, EXP(LOG(9999999999)+LOG(99999999974482))
in SQL Server returns
999999999644820000025518, 9.99999999644812E+23
So my point is: when you do the product this way, do it carefully and test it heavily.

One way to deal with this problem (if you are working in a scripting language) is to use the group_concat function.
For example, SELECT group_concat(table.price) FROM table GROUP BY table.sale
This will return a string with all prices for the same sale value, separated by a comma.
Then, with a parser, you can get each price and do the multiplication. (In PHP you can even use the array_reduce function; in fact, the php.net manual has a suitable example.)
Cheers

Another approach, based on the fact that the cardinality of a Cartesian product is the product of the cardinalities of the individual sets ;-)
⚠ WARNING: This example is just for fun and rather academic; don't use it in production! (Apart from anything else, it only works for positive and practically small integers.) ⚠
with recursive t(c) as (
    select unnest(array[2,5,7,8])
), p(a) as (
    select array_agg(c) from t
    union all
    select p.a[2:]
    from p
    cross join generate_series(1, p.a[1])
)
select count(*) from p where cardinality(a) = 0;

The problem can be solved using modern SQL features such as window functions and CTEs. Everything is standard SQL and - unlike logarithm-based solutions - does not require switching from the integer world to the floating-point world, nor special handling of nonpositive numbers. Just number the rows and evaluate the product in a recursive query until no rows remain:
with recursive t(c) as (
    select unnest(array[2,5,7,8])
), r(c,n) as (
    select t.c, row_number() over () from t
), p(c,n) as (
    select c, n from r where n = 1
    union all
    select r.c * p.c, r.n from p join r on p.n + 1 = r.n
)
select c from p where n = (select max(n) from p);
As your question involves grouping by the sale column, things get a little more complicated, but it's still solvable:
with recursive t(sale,price) as (
    select 'multiplication', 2 union
    select 'multiplication', 5 union
    select 'multiplication', 7 union
    select 'multiplication', 8 union
    select 'trivial', 1 union
    select 'trivial', 8 union
    select 'negatives work', -2 union
    select 'negatives work', -3 union
    select 'negatives work', -5 union
    select 'look ma, zero works too!', 1 union
    select 'look ma, zero works too!', 0 union
    select 'look ma, zero works too!', 2
), r(sale,price,n,maxn) as (
    select t.sale, t.price,
           row_number() over (partition by sale),
           count(1) over (partition by sale)
    from t
), p(sale,price,n,maxn) as (
    select sale, price, n, maxn
    from r where n = 1
    union all
    select p.sale, r.price * p.price, r.n, r.maxn
    from p
    join r on p.sale = r.sale and p.n + 1 = r.n
)
select sale, price
from p
where n = maxn
order by sale;
Result:
sale,price
"look ma, zero works too!",0
multiplication,560
negatives work,-30
trivial,8
Tested on Postgres.

Here is an Oracle solution for anyone who needs it:
with data(id, val) as (
    select 1,  1.0 from dual union all
    select 2, -2.0 from dual union all
    select 3,  1.0 from dual union all
    select 4,  2.0 from dual
),
neg(val, modifier) as (
    -- product of the absolute values of the negatives (1 if there are none),
    -- with a sign flip when their count is odd
    select nvl(exp(sum(ln(abs(val)))), 1),
           case when mod(count(*), 2) = 0 then 1 else -1 end
    from data
    where val < 0
),
pos(val) as (
    -- product of the positives (1 if there are none); note ln(0) errors,
    -- so zeros are not handled here
    select nvl(exp(sum(ln(val))), 1)
    from data
    where val > 0
)
select (select val * modifier from neg) * (select val from pos) as product
from dual
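With the sample data the result is 1 * (-2) * 1 * 2 = -4.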

Related

Doing Math with 2 Subqueries

I have two subqueries, both calculating sums. I would like to do an arithmetic minus (-) with the results of both queries, e.g. Query1: 400, Query2: 300; the result should be 100.
Obviously a basic - in the query does not work; the minus works as MINUS on sets. How can I solve this? Do you have any ideas?
SELECT CustumersNo FROM Custumers WHERE
(
SELECT SUM(value) FROM roe WHERE roe.credit = Custumers.CustumersNo
-
SELECT SUM(value) FROM roe WHERE roe.debit = Custumers.CustumersNo
)
> 500
Using Informix - sorry, I missed that point.
To get the original syntax to work, you would need to surround the sub-selects in parentheses:
SELECT CustumersNo
FROM Custumers
WHERE ((SELECT SUM(value) FROM roe WHERE roe.credit = Custumers.CustumersNo)
-
(SELECT SUM(value) FROM roe WHERE roe.debit = Custumers.CustumersNo)
) > 500
Note that aggregates are defined to ignore nulls in the values they aggregate in standard SQL. However, the SUM of an empty set of rows is NULL, not zero.
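You can see that with a predicate that matches no rows; COALESCE is the usual fix:

SELECT SUM(value) FROM roe WHERE 1 = 0;               -- NULL, not 0
SELECT COALESCE(SUM(value), 0) FROM roe WHERE 1 = 0;  -- 0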
You can get inventive and devise ways to always have a value for each customer listed in the roe table, such as:
SELECT CustomersNo
FROM (SELECT CustomersNo, SUM(value) AS net_credit
      FROM (SELECT credit AS CustomersNo, +value AS value FROM roe
            -- UNION ALL, not UNION: duplicate (customer, amount) rows must be kept
            UNION ALL
            SELECT debit AS CustomersNo, -value AS value FROM roe
           ) AS x
      GROUP BY CustomersNo
     ) AS y
WHERE net_credit > 500;
You can also do that with an appropriate HAVING clause if you wish. Note that this avoids issues with customers who have credit entries but no debit entries or vice versa; all the entries that are present are treated appropriately.
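For instance, the HAVING variant of the same query might look like this:

SELECT CustomersNo
FROM (SELECT credit AS CustomersNo, +value AS value FROM roe
      UNION ALL
      SELECT debit AS CustomersNo, -value AS value FROM roe
     ) AS x
GROUP BY CustomersNo
HAVING SUM(value) > 500;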
Your misspelling (or unorthodox spelling) of 'customers' is nearly as good as 'costumers'.
Something like what you tried should work. It may be a syntax problem, and it may depend on what type of SQL you are using. However, an approach like this would be more efficient:
Update: I see you were having a problem with nulls, so I updated it to handle nulls properly.
select CustumersNo from (
    select Custumers.CustumersNo,
           -- left joins + coalesce so customers missing credits or debits still count
           sum(coalesce(roecredit.value, 0)) - sum(coalesce(roedebit.value, 0)) as balance
    from Custumers
    left join roe roecredit on roecredit.credit = Custumers.CustumersNo
    left join roe roedebit on roedebit.debit = Custumers.CustumersNo
    group by Custumers.CustumersNo
) as t
where balance > 500
Caveat: I don't have experience with Informix specifically.

SQL Server : how to select a fixed amount of rows (select every x-th value)

A short description: I have a table with data that is updated over a certain time period. The problem is that, depending on the nature of the sensor which sends the data, in this time period there could be either 50 data sets or 50,000. As I want to visualize this data (using ASP.NET / C#), for a first preview I would like to SELECT just 1000 values from the table.
I already have an approach for this: I count the rows in the time period of interest, with a simple WHERE clause to specify the sensor id, save the count in a SQL variable, and then divide it by 1000. I've tried it in MS Access, where it works just fine:
set @divider = select count(*) from table where [...]

SELECT (Int([RowNumber]/@divider)), First(Value)
FROM myTable
GROUP BY (Int([RowNumber]/@divider));
The trick in Access was that I simply have a data field ("RowNumber"), which is my PK/ID and goes from 0 up. I tried to accomplish the same in SQL Server using the ROW_NUMBER() function, which works more or less. I've got the right syntax for the function, but I cannot use it in the GROUP BY clause:
Windowed functions can only appear in the SELECT or ORDER BY
clauses.
meaning ROW_NUMBER() can't appear in the GROUP BY clause.
Now I'm kinda stuck. I've tried to save the ROW_NUMBER value into a char or a separate column and GROUP BY it later on, but I couldn't get it done. And somehow I'm starting to think that my strategy might have its weaknesses ...? :/
To clarify once more: I don't need to SELECT TOP 1000 from my table, because this would just mean that I select the first 1000 values (depending on the sorting). I need to SELECT every x-th value, where I can compute the x (and I could even round it to an INT, if that would help). I hope I was able to describe the problem understandably ...
This is my first post here on StackOverflow, I hope I didn't forget anything essential or important, if you need any further information (table structure, my queries so far, ...) please don't hesitate to ask. Any help or hint is highly appreciated - thanks in advance! :)
Update: SOLUTION! Big thanks to https://stackoverflow.com/users/52598/lieven!!!
Here is how I did it in the end:
I declare 2 variables - I count my rows and SET the count into the first variable. Then I divide it by 1000 (because in the end I want ABOUT 1000 values!) and ROUND() the result. I split this operation across 2 variables because using the COUNT result directly as the basis for the ROUND operation gave some wrong results.
declare @myvar decimal(10,2)
declare @myvar2 decimal(10,2)

set @myvar = (select COUNT(*)
              from value_table
              where channel_id = 135
                and myDate >= '2011-01-14 22:00:00.000'
                and myDate <= '2011-02-14 22:00:00.000')

set @myvar2 = ROUND(@myvar/1000, 0)
Now I have the rounded value, which I want to be my step size (take every x-th value -> this is our "x" ;)), stored in @myvar2. Next I subselect the data of the desired timespan and channel and add ROW_NUMBER() as column "rn"; finally I add a WHERE clause to the outer SELECT, where I divide the ROW_NUMBER by @myvar2 - when the modulus is 0, the row is SELECTed.
select * from
(
    select (ROW_NUMBER() over (order by id desc)) as rn, myValue, myDate
    from value_table
    where channel_id = 135
      and myDate >= '2011-01-14 22:00:00.000'
      and myDate <= '2011-02-14 22:00:00.000'
) d
WHERE rn % @myvar2 = 0
Works like a charm - once again all my thanks to https://stackoverflow.com/users/52598/lieven, see the comment below for the original posting!
In essence, all you need to do to select every x-th value is retain the rows where the row number modulo x is 0.
WHERE rn % @x_thValues = 0
Now, to be able to use your ROW_NUMBER result, you'll need to wrap the entire statement in a subselect:
SELECT *
FROM (
    SELECT *
         , rn = ROW_NUMBER() OVER (ORDER BY Value)
    FROM DummyData
) d
WHERE rn % @x_thValues = 0
Combined with a variable for which x-th values you need, you might use something like this test script:
DECLARE @x_thValues INTEGER = 2

;WITH DummyData AS (SELECT * FROM (VALUES (1), (2), (3), (4)) v (Value))
SELECT *
FROM (
    SELECT *
         , rn = ROW_NUMBER() OVER (ORDER BY Value)
    FROM DummyData
) d
WHERE rn % @x_thValues = 0
One more option to consider:
Select Top 1000 *
From dbo.SomeTable
Where ....
Order By NewID()
but to be honest, I like the previous answer more than this one.
The question could be about performance.
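If an approximate sample is acceptable, SQL Server's TABLESAMPLE avoids the full sort that ORDER BY NewID() forces; note that it samples whole pages, so the row count is only approximate and the rows are not evenly spaced:

-- Returns roughly, not exactly, 1000 rows
SELECT *
FROM dbo.SomeTable TABLESAMPLE (1000 ROWS);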

Purposely having a query return blank entries at regular intervals

I want to write a query that returns 3 results, followed by a blank result, followed by the next 3 results, and so on. So if my database had this data:
CREATE TABLE tbl (a integer, b integer, c integer, d integer);

INSERT INTO tbl (a,b,c,d)
VALUES (1,2,3,4),
       (5,6,7,8),
       (9,10,11,12),
       (13,14,15,16),
       (17,18,19,20),
       (21,22,23,24),
       (25,26,27,28);
I would want my query to return this
1,2,3,4
5,6,7,8
9,10,11,12
, , ,
13,14,15,16
17,18,19,20
21,22,23,24
, , ,
25,26,27,28
I need this to work for arbitrarily many entries that I select for, with every three grouped together like this.
I'm running PostgreSQL 8.3.
This should work flawlessly in PostgreSQL 8.3
SELECT a, b, c, d
FROM (
    SELECT rn, 0 AS rk, (x[rn]).*
    FROM (
        SELECT x, generate_series(1, array_upper(x, 1)) AS rn
        FROM (SELECT ARRAY(SELECT tbl FROM tbl) AS x) x
    ) y
    UNION ALL
    SELECT generate_series(3, (SELECT count(*) FROM tbl), 3), 1, (NULL::tbl).*
    ORDER BY rn, rk
) z
Major points
Works for a query that selects all columns of tbl.
Works for any table.
For selecting arbitrary columns you have to substitute (NULL::tbl).* with a matching number of NULL columns in the second query.
Assuming that NULL values are ok for "blank" rows.
If not, you'll have to cast your columns to text in the first and substitute '' for NULL in the second SELECT.
Query will be slow with very big tables.
If I had to do it, I would write a plpgsql function that loops through the results and inserts the blank rows. But you mentioned you had no direct access to the db ...
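For completeness, a minimal sketch of that plpgsql idea, assuming the table is named tbl and a blank row is wanted after every third row:

CREATE OR REPLACE FUNCTION rows_with_blanks()
RETURNS SETOF tbl AS $$
DECLARE
    r     tbl;
    blank tbl;  -- never assigned, so every column stays NULL
    i     int := 0;
BEGIN
    FOR r IN SELECT * FROM tbl LOOP
        i := i + 1;
        RETURN NEXT r;
        IF i % 3 = 0 THEN
            RETURN NEXT blank;  -- the "blank" row
        END IF;
    END LOOP;
    RETURN;
END;
$$ LANGUAGE plpgsql;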
In short, no, there's not an easy way to do this, and generally, you shouldn't try. The database is concerned with what your data actually is, not how it's going to be displayed. It's not an appropriate scope of responsibility to expect your database to return "dummy" or "extra" data so that some down-stream process produces a desired output. The generating script needs to do that.
As you can't change your down-stream process, you could (read that with a significant degree of skepticism and disdain) add things like this:
Select Top 3
a, b, c, d
From
table
Union Select Top 1
'', '', '', ''
From
table
Union Select Top 3 Skip 3
a, b, c, d
From
table
Please, don't actually try to do that.
You can do it (at least on DB2 - there doesn't appear to be equivalent functionality for your version of PostgreSQL).
No looping needed, although there is a bit of trickery involved...
Please note that though this works, it's really best to change your display code.
Statement requires CTEs (although that can be re-written to use other table references), and OLAP functions (I guess you could re-write it to count() previous rows in a subquery, but...).
WITH dataList (rowNum, dataColumn) AS
    (SELECT CAST(CAST(:interval AS REAL) / (:interval - 1)
                 * ROW_NUMBER() OVER (ORDER BY dataColumn) AS INTEGER),
            dataColumn
     FROM dataTable),
blankIncluder (rowNum, dataColumn) AS
    (SELECT rowNum, dataColumn
     FROM dataList
     UNION ALL
     SELECT rowNum - 1, :blankDataColumn
     FROM dataList
     WHERE MOD(rowNum - 1, :interval) = 0
       AND rowNum > :interval)
SELECT *
FROM blankIncluder
ORDER BY rowNum
This will generate a list of those elements from the datatable, with a 'blank' line every interval lines, as ordered by the initial query. The result set only has 'blank' lines between existing lines - there are no 'blank' lines on the ends.

Is there an SQL function which generates a given range of sequential numbers?

I need to generate an array of sequential integers with a given range in order to use it in:
SELECT tbl.pk_id
FROM tbl
WHERE tbl.pk_id NOT IN (sequential array);
If you have a given range - i.e. a start point and an end point - of sequential integers, you should just be able to use the BETWEEN keyword:
SELECT tbl.pk_id
FROM tbl
WHERE tbl.pk_id NOT BETWEEN START_INT AND END_INT
or am I misunderstanding your question..?
Because you say you've already got a number table, I would suggest this:
SELECT element
FROM series
WHERE element NOT IN (SELECT pk_id
FROM tbl)
Might possibly be more efficient than the query you've tried.
Two thoughts . . .
First, there's no standard SQL function that does that. But some systems include a non-standard function that generates a series. In PostgreSQL, for example, you can use the generate_series() function.
select generate_series(1,100000);
1
2
3
...
100000
That function essentially returns a table; it can be used in joins.
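For example, it can replace the sequential array in the original query (PostgreSQL):

SELECT tbl.pk_id
FROM tbl
WHERE tbl.pk_id NOT IN (SELECT g FROM generate_series(1, 100000) AS g);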
If Informix doesn't have a function that does something similar, maybe you can write an Informix SPL function that does.
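A minimal sketch of such an SPL function, written from memory and untested (RETURN ... WITH RESUME is what makes an SPL function return one row per iteration):

CREATE FUNCTION int_series(first_val INTEGER, last_val INTEGER)
RETURNING INTEGER;
    DEFINE i INTEGER;
    FOR i = first_val TO last_val
        RETURN i WITH RESUME;  -- yields one row per value
    END FOR;
END FUNCTION;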
Second, you could just create a one-column table and populate it with a series of integers. This works on all platforms, and doesn't require programming. It requires only minimal maintenance. (You need to keep more integers in this table than you're using in your production table.)
create table integers (
i integer primary key
);
Use a spreadsheet or a utility program to generate a series of integers to populate it. The easiest way, if you have a Unix, Linux, or Cygwin environment lying around, is to use seq.
$ seq 1 5 > integers
$ cat integers
1
2
3
4
5
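To load that file into the table on Informix, DB-Access's LOAD statement should work (assuming the file is named integers and sits in the current directory):

LOAD FROM 'integers' INSERT INTO integers;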
Informix has a free developer version you can download. Maybe you can build a compelling demo with it, and management will let you upgrade.
I'll suggest a generic solution to create a result set containing the integers 0 .. 2^k - 1 for a given k, for subsequent use as a subquery, view, or materialized view.
The code below illustrates the technique for k = 3 (three bits give the values 0 .. 7).
SELECT bv0 + 2*bv1 + 4*bv2 AS val
FROM (
    SELECT *
    FROM (
        SELECT 0 bv0 FROM DUAL
        UNION
        SELECT 1 bv0 FROM DUAL
    ) bit0
    CROSS JOIN (
        SELECT 0 bv1 FROM DUAL
        UNION
        SELECT 1 bv1 FROM DUAL
    ) bit1
    CROSS JOIN (
        SELECT 0 bv2 FROM DUAL
        UNION
        SELECT 1 bv2 FROM DUAL
    ) bit2
) pow2
;
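Materialized as a view (hypothetical name pow2), that result set could then drive the original query:

SELECT tbl.pk_id
FROM tbl
WHERE tbl.pk_id NOT IN (SELECT val FROM pow2 WHERE val BETWEEN 1 AND 100);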
I hope that helps you with your task.
Best regards,
Carsten

Thin out many ST_Points

I have many (1,000,000) ST_Points in a Postgres DB with the PostGIS extension. When I show them on a map, the browser gets very busy.
So I would like to write an SQL statement which reduces a high-density area to only one point.
When a user zooms out over 100 ST_Points, Postgres should give back only one - but only if these points are close together.
I tried it with this statement:
select a.id, count(*)
from points as a, points as b
where st_dwithin(a.location, b.location, 0.001)
and a.id != b.id
group by a.id
I would call it "thinning out" but did not find anything - maybe because I'm not a native English speaker.
Does anybody have some suggestions?
I agree with tcarobruce that clustering is the term you are looking for. But it is available in PostGIS.
Basically, clustering can be achieved by reducing the number of decimals in the X and Y coordinates and grouping on them:
select
    count(*),
    round(cast(ST_X(geom) as numeric), 3),
    round(cast(ST_Y(geom) as numeric), 3)
from mytable
group by
    round(cast(ST_X(geom) as numeric), 3),
    round(cast(ST_Y(geom) as numeric), 3)
This results in a table with coordinates and the number of real points at each coordinate. This particular sample rounds to 3 decimals, i.e. a grid of 0.001 like in your initial statement.
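The same grid idea can also be expressed with PostGIS's built-in ST_SnapToGrid, which snaps each point to the nearest cell of a 0.001-degree grid:

select
    count(*),
    ST_SnapToGrid(geom, 0.001) as cell
from mytable
group by ST_SnapToGrid(geom, 0.001)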
You can cluster nearby points together using ST_ClusterDBSCAN.
Then keep all single points and, for example:
Select one random point per cluster,
or
Select the centroid of each point cluster.
I use eps 300 to cluster points together that are within 300 meters.
create table buildings_grouped as
SELECT geom, ST_ClusterDBSCAN(geom, eps := 300, minpoints := 2) over () AS cid
FROM buildings
1:
create table buildings_grouped_keep_random as
select geom, cid from buildings_grouped
where cid is null
union
select geom, cid from
    (select distinct on (cid) *
     from buildings_grouped
     order by cid, random()) sub
2:
create table buildings_grouped_keep_centroid as
select geom, cid from buildings_grouped
where cid is null
union
select st_centroid(st_union(geom)) geom, cid
from buildings_grouped
where cid is not null
group by cid
The term you are looking for is "clustering".
There are client-side libraries that do this, as well as commercial services that do it server-side.
But it's not something PostGIS does natively. (There's a ticket for it.)
You'll probably have to write your own solution, and precompute your clusters ahead of time.
ST_ClusterDBSCAN- and KMeans-based clustering works, but it is VERY SLOW for big data sets - practically unusable. PostGIS functions like ST_SnapToGrid and ST_RemoveRepeatedPoints are faster and can help in some cases. But the best approach, I think, is using PDAL thinning filters such as the sample filter. I recommend using it with PG Point Cloud.
Edit:
ST_SnapToGrid is pretty fast and useful. Here is an example query for triangulation with optimizations:
WITH step1 AS
(
SELECT geometry, ST_DIMENSION(geometry) AS dim FROM table
)
, step2 AS
(
SELECT ST_SIMPLIFYVW(geometry, :tolerance) AS geometry FROM step1 WHERE dim > 0
UNION ALL
(WITH q1 AS
(
SELECT (ST_DUMP(geometry)).geom AS geometry FROM step1 WHERE dim = 0
)
SELECT ST_COLLECT(DISTINCT(ST_SNAPTOGRID(geometry, :tolerance))) FROM q1)
)
, step3 AS
(
SELECT ST_COLLECT(geometry) AS geometry FROM step2
)
SELECT ST_DELAUNAYTRIANGLES(geometry, :tolerance, 0)::BYTEA AS geometry
FROM step3
OFFSET :offset LIMIT :limit;