Browse subcolumns, but discard some - sql

I have a table (or view) in my PostgreSQL database and want to do the following:
Query the table and feed a function in my application subsequent n-tuples of rows from the query, but only those that satisfy some condition. I can do the n-tuple listing using a cursor, but I don't know how to do the condition checking on database level.
For example, the query returns:
3
2
4
2
0
1
4
6
2
And I want triples of even numbers. Here, they would be:
(2,4,2) (4,2,0) (4,6,2)
Obviously, I cannot discard the odd numbers from the query result. Instead using cursor, a query returning arrays in similar manner would also be acceptable solution, but I don't have any good idea how to use them to do this.
Of course, I could check it at application level, but I think it'd be cleaner to do it on database level. Is it possible?

With the window function lead() (as mentioned by #wildplasser):
SELECT *
FROM (
SELECT tbl_id, i AS i1
, lead(i) OVER (ORDER BY tbl_id) AS i2
, lead(i, 2) OVER (ORDER BY tbl_id) AS i3
FROM tbl
) sub
WHERE i1%2 = 0
AND i2%2 = 0
AND i3%2 = 0;
There is no natural order of rows - assuming you want to order by tbl_id in the example.
% .. modulo operator
SQL Fiddle.

You can also use an array aggregate for this instead of using lag:
SELECT
a[1] a1, a[2] a2, a[3] a3
FROM (
SELECT
array_agg(i) OVER (ORDER BY tbl_id ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM
tbl
) x(a)
WHERE a[1] % 2 = 0 AND a[2] % 2 = 0 AND a[3] % 2 = 0;
No idea if this'll be better, worse, or the same as Erwin's answer, just putting it in for completeness.

Related

WHILE Window Operation with Different Starting Point Values From Column - SQL Server [duplicate]

In SQL there are aggregation operators, like AVG, SUM, COUNT. Why doesn't it have an operator for multiplication? "MUL" or something.
I was wondering, does it exist for Oracle, MSSQL, MySQL ? If not is there a workaround that would give this behaviour?
By MUL do you mean progressive multiplication of values?
Even with 100 rows of some small size (say 10s), your MUL(column) is going to overflow any data type! With such a high probability of mis/ab-use, and very limited scope for use, it does not need to be a SQL Standard. As others have shown there are mathematical ways of working it out, just as there are many many ways to do tricky calculations in SQL just using standard (and common-use) methods.
Sample data:
Column
1
2
4
8
COUNT : 4 items (1 for each non-null)
SUM : 1 + 2 + 4 + 8 = 15
AVG : 3.75 (SUM/COUNT)
MUL : 1 x 2 x 4 x 8 ? ( =64 )
For completeness, the Oracle, MSSQL, MySQL core implementations *
Oracle : EXP(SUM(LN(column))) or POWER(N,SUM(LOG(column, N)))
MSSQL : EXP(SUM(LOG(column))) or POWER(N,SUM(LOG(column)/LOG(N)))
MySQL : EXP(SUM(LOG(column))) or POW(N,SUM(LOG(N,column)))
Care when using EXP/LOG in SQL Server, watch the return type http://msdn.microsoft.com/en-us/library/ms187592.aspx
The POWER form allows for larger numbers (using bases larger than Euler's number), and in cases where the result grows too large to turn it back using POWER, you can return just the logarithmic value and calculate the actual number outside of the SQL query
* LOG(0) and LOG(-ve) are undefined. The below shows only how to handle this in SQL Server. Equivalents can be found for the other SQL flavours, using the same concept
create table MUL(data int)
insert MUL select 1 yourColumn union all
select 2 union all
select 4 union all
select 8 union all
select -2 union all
select 0
select CASE WHEN MIN(abs(data)) = 0 then 0 ELSE
EXP(SUM(Log(abs(nullif(data,0))))) -- the base mathematics
* round(0.5-count(nullif(sign(sign(data)+0.5),1))%2,0) -- pairs up negatives
END
from MUL
Ingredients:
taking the abs() of data, if the min is 0, multiplying by whatever else is futile, the result is 0
When data is 0, NULLIF converts it to null. The abs(), log() both return null, causing it to be precluded from sum()
If data is not 0, abs allows us to multiple a negative number using the LOG method - we will keep track of the negativity elsewhere
Working out the final sign
sign(data) returns 1 for >0, 0 for 0 and -1 for <0.
We add another 0.5 and take the sign() again, so we have now classified 0 and 1 both as 1, and only -1 as -1.
again use NULLIF to remove from COUNT() the 1's, since we only need to count up the negatives.
% 2 against the count() of negative numbers returns either
--> 1 if there is an odd number of negative numbers
--> 0 if there is an even number of negative numbers
more mathematical tricks: we take 1 or 0 off 0.5, so that the above becomes
--> (0.5-1=-0.5=>round to -1) if there is an odd number of negative numbers
--> (0.5-0= 0.5=>round to 1) if there is an even number of negative numbers
we multiple this final 1/-1 against the SUM-PRODUCT value for the real result
No, but you can use Mathematics :)
if yourColumn is always bigger than zero:
select EXP(SUM(LOG(yourColumn))) As ColumnProduct from yourTable
I see an Oracle answer is still missing, so here it is:
SQL> with yourTable as
2 ( select 1 yourColumn from dual union all
3 select 2 from dual union all
4 select 4 from dual union all
5 select 8 from dual
6 )
7 select EXP(SUM(LN(yourColumn))) As ColumnProduct from yourTable
8 /
COLUMNPRODUCT
-------------
64
1 row selected.
Regards,
Rob.
With PostgreSQL, you can create your own aggregate functions, see http://www.postgresql.org/docs/8.2/interactive/sql-createaggregate.html
To create an aggregate function on MySQL, you'll need to build an .so (linux) or .dll (windows) file. An example is shown here: http://www.codeproject.com/KB/database/mygroupconcat.aspx
I'm not sure about mssql and oracle, but i bet they have options to create custom aggregates as well.
You'll break any datatype fairly quickly as numbers mount up.
Using LOG/EXP is tricky because of numbers <= 0 that will fail when using LOG. I wrote a solution in this question that deals with this
Using CTE in MS SQL:
CREATE TABLE Foo(Id int, Val int)
INSERT INTO Foo VALUES(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)
;WITH cte AS
(
SELECT Id, Val AS Multiply, row_number() over (order by Id) as rn
FROM Foo
WHERE Id=1
UNION ALL
SELECT ff.Id, cte.multiply*ff.Val as multiply, ff.rn FROM
(SELECT f.Id, f.Val, (row_number() over (order by f.Id)) as rn
FROM Foo f) ff
INNER JOIN cte
ON ff.rn -1= cte.rn
)
SELECT * FROM cte
Not sure about Oracle or sql-server, but in MySQL you can just use * like you normally would.
mysql> select count(id), count(id)*10 from tablename;
+-----------+--------------+
| count(id) | count(id)*10 |
+-----------+--------------+
| 961 | 9610 |
+-----------+--------------+
1 row in set (0.00 sec)

Postgres union of queries in loop

I have a table with two columns. Let's call them
array_column and text_column
I'm trying to write a query to find out, for K ranging from 1 to 10, in how many rows does the value in text_column appear in the first K elements of array_column
I'm expecting results like:
k | count
________________
1 | 70
2 | 85
3 | 90
...
I did manage to get these results by simply repeating the query 10 times and uniting the results, which looks like this:
SELECT 1 AS k, count(*) FROM table WHERE array_column[1:1] #> ARRAY[text_column]
UNION ALL
SELECT 2 AS k, count(*) FROM table WHERE array_column[1:2] #> ARRAY[text_column]
UNION ALL
SELECT 3 AS k, count(*) FROM table WHERE array_column[1:3] #> ARRAY[text_column]
...
But that doesn't looks like the correct way to do it. What if I wanted a very large range for K?
So my question is, is it possible to perform queries in a loop, and unite the results from each query? Or, if this is not the correct approach to the problem, how would you do it?
Thanks in advance!
You could use array_positions() which returns an array of all positions where the argument was found in the array, e.g.
select t.*,
array_positions(array_column, text_column)
from the_table t;
This returns a different result but is a lot more efficient as you don't need to increase the overall size of the result. To only consider the first ten array elements, just pass a slice to the function:
select t.*,
array_positions(array_column[1:10], text_column)
from the_table t;
To limit the result to only rows that actually contain the value you can use:
select t.*,
array_positions(array_column[1:10], text_column)
from the_table t
where text_column = any(array_column[1:10]);
To get your desired result, you could use unnest() to turn that into rows:
select k, count(*)
from the_table t, unnest(array_positions(array_column[1:10], text_column)) as k
where text_column = any(array_column[1:10])
group by k
order by k;
You can use the generate_series function to generate a table with the expected number of rows with the expected values and then join to it within the query, like so:
SELECT t.k AS k, count(*)
FROM table
--right join ensures that you will get a value of 0 if there are no records meeting the criteria
right join (select generate_series(1,10) as k) t
on array_column[1:t.k] #> ARRAY[text_column]
group by t.k
This is probably the closest thing to using a loop to go through the results without using something like PL/SQL to do an actual loop in a user-defined function.

Parallel unnest() and sort order in PostgreSQL

I understand that using
SELECT unnest(ARRAY[5,3,9]) as id
without an ORDER BY clause, the order of the result set is not guaranteed. I could for example get:
id
--
3
5
9
But what about the following request:
SELECT
unnest(ARRAY[5,3,9]) as id,
unnest(ARRAY(select generate_series(1, array_length(ARRAY[5,3,9], 1)))) as idx
ORDER BY idx ASC
Is it guaranteed that the 2 unnest() calls (which have the same length) will unroll in parallel and that the index idx will indeed match the position of the item in the array?
I am using PostgreSQL 9.3.3.
Yes, that is a feature of Postgres and parallel unnesting is guaranteed to be in sync (as long as all arrays have the same number of elements).
Postgres 9.4 adds a clean solution for parallel unnest:
Unnest multiple arrays in parallel
The order of resulting rows is not guaranteed, though. Actually, with a statement as simple as:
SELECT unnest(ARRAY[5,3,9]) AS id;
the resulting order of rows is "guaranteed", but Postgres does not assert anything. The query optimizer is free to order rows as it sees fit as long as the order is not explicitly defined. This may have side effects in more complex queries.
If the second query in your question is what you actually want (add an index number to unnested array elements), there is a better way with generate_subscripts():
SELECT unnest(ARRAY[5,3,9]) AS id
, generate_subscripts(ARRAY[5,3,9], 1) AS idx
ORDER BY idx;
Details in this related answer:
How to access array internal index with postgreSQL?
You will be interested in WITH ORDINALITY in Postgres 9.4:
PostgreSQL unnest() with element number
Then you can use:
SELECT * FROM unnest(ARRAY[5,3,9]) WITH ORDINALITY tbl(id, idx);
Short answer: No, idx will not match the array positions, when accepting the premise that unnest() output may be randomly ordered.
Demo:
since the current implementation of unnest actually output the rows in the order of elements, I suggest to add a layer on top of it to simulate a random order:
CREATE FUNCTION unnest_random(anyarray) RETURNS setof anyelement
language sql as
$$ select unnest($1) order by random() $$;
Then check out a few executions of your query with unnest replaced by unnest_random:
SELECT
unnest_random(ARRAY[5,3,9]) as id,
unnest_random(ARRAY(select generate_series(1, array_length(ARRAY[5,3,9], 1)))) as idx
ORDER BY idx ASC
Example of output:
id | idx
----+-----
3 | 1
9 | 2
5 | 3
id=3 is associated with idx=1 but 3 was in 2nd position in the array. It's all wrong.
What's wrong in the query: it assumes that the first unnest will shuffle the elements using the same permutation as the second unnest (permutation in the mathematic sense: the relationship between order in the array and order of the rows). But this assumption contradicts the premise that the order output of unnest is unpredictable to start with.
About this question:
Is it guaranteed that the 2 unnest() calls (which have the same
length) will unroll in parallel
In select unnest(...) X1, unnest(...) X2, with X1 and X2 being of type SETOF something and having the same number of rows, X1 and X2 will be paired in the final output so that the X1 value at row N will face the X2 value at the same row N.
(it's a kind of UNION for columns, as opposed to a cartesian product).
But I wouldn't describe this pairing as unroll in parallel, so I'm not sure this is what you meant.
Anyway this pairing doesn't help with the problem since it happens after the unnest calls have lost the array positions.
An alternative: In this thread from the pgsql-sql mailing list, this function is suggested:
CREATE OR REPLACE FUNCTION unnest_with_ordinality(anyarray, OUT value
anyelement, OUT ordinality integer)
RETURNS SETOF record AS
$$
SELECT $1[i], i FROM
generate_series(array_lower($1,1),
array_upper($1,1)) i;
$$
LANGUAGE sql IMMUTABLE;
Based on this, we can order by the second output column:
select * from unnest_with_ordinality(array[5,3,9]) order by 2;
value | ordinality
-------+------------
5 | 1
3 | 2
9 | 3
With postgres 9.4 and above: The WITH ORDINALITY clause that can follow SET RETURNING function calls will provide this functionality in a generic way.

Implementing a total order ranking in PostgreSQL 8.3

The issue with 8.3 is that rank() is introduced in 8.4.
Consider the numbers [10,6,6,2].
I wish to achieve a rank of those numbers where the rank is equal to the row number:
rank | score
-----+------
1 | 10
2 | 6
3 | 6
4 | 2
A partial solution is to self-join and count items with a higher or equal, score. This produces:
1 | 10
3 | 6
3 | 6
4 | 2
But that's not what I want.
Is there a way to rank, or even just order by score somehow and then extract that row number?
If you want a row number equivalent to the window function row_number(), you can improvise in version 8.3 (or any version) with a (temporary) SEQUENCE:
CREATE TEMP SEQUENCE foo;
SELECT nextval('foo') AS rn, *
FROM (SELECT score FROM tbl ORDER BY score DESC) s;
db<>fiddle here
Old sqlfiddle
The subquery is necessary to order rows before calling nextval().
Note that the sequence (like any temporary object) ...
is only visible in the same session it was created.
hides any other table object of the same name.
is dropped automatically at the end of the session.
To use the sequence in the same session repeatedly run before each query:
SELECT setval('foo', 1, FALSE);
There's a method using an array that works with PG 8.3. It's probably not very efficient, performance-wise, but will do OK if there aren't a lot of values.
The idea is to sort the values in a temporary array, then extract the bounds of the array, then join that with generate_series to extract the values one by one, the index into the array being the row number.
Sample query assuming the table is scores(value int):
SELECT i AS row_number,arr[i] AS score
FROM (SELECT arr,generate_series(1,nb) AS i
FROM (SELECT arr,array_upper(arr,1) AS nb
FROM (SELECT array(SELECT value FROM scores ORDER BY value DESC) AS arr
) AS s2
) AS s1
) AS s0
Do you have a PK for this table?
Just self join and count items with: a higher or equal score and higher PK.
PK comparison will break ties and give you desired result.
And after you upgrade to 9.1 - use row_number().

Counting non-null columns in a rather strange way

I have a table which has 32 columns in an Oracle table.
Two of these columns are identity columns
the rest are values
I would like to get the average of all the value columns, which is complicated by the null (identity) columns. Below is the pseudocode for what I am trying to achieve:
SELECT
((nvl(val0, 0) + nvl(val1, 0) + ... nvl(valn, 0))
/ nonZero_Column_Count_In_This_Row)
Such that: nonZero_Column_Count_In_This_Row = (ifNullThenZeroElse1(val0) + ifNullThenZeroElse1(val1) ... ifNullThenZeroElse(valn))
The difficulty here is of course in getting 1 for any non-null column. It seems I need a function similar to NVL, but with an else clause. Something that will return 0 if the value is null, but 1 if not, rather than the value itself.
How should I go about about getting the value for the denominator?
PS: I feel I must explain some motivation behind this design. Ideally this table would have been organized as the identity columns and one value per row with some identifier for the row itself. This would have made it more normalized and the solution to this problem would have been pretty simple. The reasons for it not to be done like this are throughput, and saving space. This is a huge DB where we insert 10 million values per minute into. Making each of these values one row would mean 10M rows per minute, which is definitely not attainable. Packing 30 of them into a single row reduces the number of rows inserted to something we can do with a single DB, and the overhead data amount (the identity data) much less.
(Case When col is null then 0 else 1 end)
You could use NVL2(val0, 1, 0) + NVL2(val1, 1, 0) + ... since you are using Oracle.
Another option is to use the AVG function, which ignores NULLs:
SELECT AVG(v) FROM (
WITH q AS (SELECT val0, val1, val2, val3 FROM mytable)
SELECT val0 AS v FROM q
UNION ALL SELECT val1 FROM q
UNION ALL SELECT val2 FROM q
UNION ALL SELECT val3 FROM q
);
If you're using Oracle11g you can use the UNPIVOT syntax to make it even simpler.
I see this is a pretty old question, but I don't see a sufficient answer. I had a similar problem, and below is how I solved it. It's pretty clear a case statement is needed. This solution is a workaround for such cases where
SELECT COUNT(column) WHERE column {IS | IS NOT} NULL
does not work for whatever reason, or, you need to do several
SELECT COUNT ( * )
FROM A_TABLE
WHERE COL1 IS NOT NULL;
SELECT COUNT ( * )
FROM A_TABLE
WHERE COL2 IS NOT NULL;
queries but want it as a data set when you run the script. See below; I use this for analysis and it's been working great for me so far.
SUM(CASE NVL(valn, 'X')
WHEN 'X'
THEN 0
ELSE 1
END) as COLUMN_NAME
FROM YOUR_TABLE;
Cheers!
Doug
Generically, you can do something like this:
SELECT (
(COALESCE(val0, 0) + COALESCE(val1, 0) + ...... COALESCE(valn, 0))
/
(SIGN(ABS(COALESCE(val0, 0))) + SIGN(ABS(COALESCE(val1, 0))) + .... )
) AS MyAverage
The top line will return the sum of values (omitting NULL values) whereas the bottom line will return the number of non-null values.
FYI - it's SQL Server syntax, but COALESCE is just like ISNULL for the most part. SIGN just returns -1 for a negative number, 0 for zero, and 1 for a positive number. ABS is "absolute value".