Vertica SQL: Disregard rows based on two conditions

I need a WHERE condition that applies the following logic to an entire table:
If a 0 exists for an ID (in column d), then exclude every row of that ID where d > 0. If no 0 exists, but there is a row where d = a, then exclude every row before that one.
In Example 1 (Case 1) I want to disregard rows 1 and 2; in Example 2 (Case 2) I want to disregard rows 1, 2 and 3.
Currently I have: WHERE d <= 0 OR d = a, but in Case 1 this also returns row nr 2, which I do not want.
Case 1:
row nr | ID |    d | a
-------+----+------+----
     1 |  1 |  180 | 78
     2 |  1 |   78 | 78
     3 |  1 |    0 | 78
     4 |  1 |  -67 | 78
     5 |  1 | -121 | 78
Case 2:
row nr | ID |    d | a
-------+----+------+-----
     1 |  2 |  180 | 148
     2 |  2 |  171 | 148
     3 |  2 |  170 | 148
     4 |  2 |  148 | 148
     5 |  2 |  -67 | 148
     6 |  2 | -121 | 148

This is a bit more complex than you might expect. You need a nested query with an OLAP function to flag, for each row, whether its partition (defined by the value of id) contains at least one row with a value of 0 for d, and then, outside of that nested query, filter on that flag together with d being 0 or less. That's case 1.
For the other case, you use the same nested query to restrict yourself to rows whose partition contains no row with d equal to 0, and from there the easiest way is Vertica's MATCH() clause, keeping only the pattern of rows that consists of: a row with d equal to a, followed by zero or more rows of any kind. In the query I describe this with the pattern (d_equal_a anyrow*).
Here goes:
WITH
-- YOUR INPUT, don't use in query
indata(row_nr, ID, d, a) AS (
            SELECT 1, 1,  180,  78
  UNION ALL SELECT 2, 1,   78,  78
  UNION ALL SELECT 3, 1,    0,  78
  UNION ALL SELECT 4, 1,  -67,  78
  UNION ALL SELECT 5, 1, -121,  78
  UNION ALL SELECT 1, 2,  180, 148
  UNION ALL SELECT 2, 2,  171, 148
  UNION ALL SELECT 3, 2,  170, 148
  UNION ALL SELECT 4, 2,  148, 148
  UNION ALL SELECT 5, 2,  -67, 148
  UNION ALL SELECT 6, 2, -121, 148
)
-- end of your input, real query starts here, replace following comma with "WITH"
,
min_abs_d_eq_0 AS (
  -- nested query with OLAP expression returning Boolean
  SELECT
    *
  , (MIN(ABS(d)) OVER (PARTITION BY id) = 0) AS min_abs_d_eq_0
  FROM indata
)
,
case1 AS (
  SELECT
    row_nr
  , id
  , d
  , a
  , 'no match clause' AS event_name -- these are based on the
  , 0 AS pattern_id                 -- MATCH clause coming from
  , 0 AS match_id                   -- the next CTE, "case2"
  FROM min_abs_d_eq_0
  WHERE min_abs_d_eq_0 AND d <= 0
)
,
case2 AS (
  SELECT
    row_nr
  , id
  , d
  , a
  , event_name()
  , pattern_id()
  , match_id()
  FROM min_abs_d_eq_0
  WHERE NOT min_abs_d_eq_0
  MATCH (
    PARTITION BY id ORDER BY row_nr
    DEFINE
      d_equal_a AS d = a
    , anyrow    AS true
    PATTERN p AS (d_equal_a anyrow*)
  )
)
SELECT * FROM case1
UNION ALL
SELECT * FROM case2
ORDER BY id, row_nr;
-- out row_nr | id | d | a | event_name | pattern_id | match_id
-- out --------+----+------+-----+-----------------+------------+----------
-- out 3 | 1 | 0 | 78 | no match clause | 0 | 0
-- out 4 | 1 | -67 | 78 | no match clause | 0 | 0
-- out 5 | 1 | -121 | 78 | no match clause | 0 | 0
-- out 4 | 2 | 148 | 148 | d_equal_a | 1 | 1
-- out 5 | 2 | -67 | 148 | anyrow | 1 | 2
-- out 6 | 2 | -121 | 148 | anyrow | 1 | 3

Related

query for column that are within a variable + or 1 of another column

I have a table that has 2 columns, and I am trying to determine a way to select the records where the two columns are CLOSE to one another. Maybe based on standard deviation, if I can work out how to do that. But for now, this is what my table looks like:
ID| PCT | RETURN
1 | 20 | 1.20
2 | 15 | 0.90
3 | 0 | 3.00
The value in the pct field is a percent number (for example 20%). The value in the return field is a not-yet-fully-calculated percentage (so it's supposed to be 20% above what the initial value was). The query I am working with so far is this:
select * from TABLE1 where ((pct = ((return - 1)* 100)));
What I'd like to end up with are the rows where both are within a set value of each other. For example If they are within 5 points of each other, then the row would be returned and the output would be:
ID| PCT | RETURN
1 | 20 | 1.20
2 | 15 | 0.90
In the above, ID 1 works out to PCT = 20 and RETURN = 20, and ID 2 to PCT = 15 and RETURN = 10. Because they are within 5 points of each other, those rows were returned.
ID 3 was not returned because 0 and 200 are way above the 5 point threshold.
Is there any way to set a variable that would return a +- 5 when comparing the two values from the above attributes? Thanks.
RexTester Example:
Use LEAD() OVER (ORDER BY PCT) to look ahead to the next row and LAG() to look back at the previous one, do the math, and evaluate the results...
WITH CTE (ID, PCT, RETURN) AS (
  SELECT 1, 20, 1.20 FROM DUAL UNION ALL
  SELECT 2, 15, 0.90 FROM DUAL UNION ALL
  SELECT 3,  0, 3.00 FROM DUAL),
CTE2 AS (
  SELECT A.*,
         LEAD(PCT) OVER (ORDER BY PCT) LEADPCT,
         LAG(PCT)  OVER (ORDER BY PCT) LAGPCT
  FROM CTE A)
SELECT *
FROM CTE2
WHERE LEADPCT - PCT <= 5 OR PCT - LAGPCT <= 5
ORDER BY ID
Giving us:
+----+----+-----+--------+---------+--------+
| | ID | PCT | RETURN | LEADPCT | LAGPCT |
+----+----+-----+--------+---------+--------+
| 1 | 1 | 20 | 1,20 | NULL | 15 |
| 2 | 2 | 15 | 0,90 | 20 | 0 |
+----+----+-----+--------+---------+--------+
Or use the return value instead of PCT... it just depends on what you're after. But maybe I don't fully understand the question.
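If the goal is simply a row-by-row tolerance check between the two columns rather than a comparison against neighbouring rows, a minimal sketch reusing the question's own formula with a fixed threshold of 5 (an assumption, untested) would be:
select *
from TABLE1
-- keep rows where pct and the percentage derived from return differ by at most 5 points
where abs(pct - (return - 1) * 100) <= 5;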

How to count all the connected nodes (rows) in a graph on Postgres?

My table has account_id and device_id. One account_id could have multiple device_ids and vice versa. I am trying to count the depth of each connected many-to-many relationship.
Ex:
account_id | device_id
1 | 10
1 | 11
1 | 12
2 | 10
3 | 11
3 | 13
3 | 14
4 | 15
5 | 15
6 | 16
How do I construct a query that knows to combine accounts 1-3 together, 4-5 together, and leave 6 by itself? All 7 entries of accounts 1-3 should be grouped together because they all touched the same account_id or device_id at some point. I am trying to group them together and output the count.
Account 1 was used on devices 10, 11 and 12. Those devices used other accounts too, so we want to include them in the group. They used additional accounts 2 and 3. But account 3 was further used by 2 more devices, so we will include them as well. The expansion of the group brings in any other account or device that also "touched" an account or device already in the group.
You can use a recursive cte:
with recursive t(account_id, device_id) as (
      select 1, 10 union all
      select 1, 11 union all
      select 1, 12 union all
      select 2, 10 union all
      select 3, 11 union all
      select 3, 13 union all
      select 3, 14 union all
      select 4, 15 union all
      select 5, 15 union all
      select 6, 16
     ),
     a as (
      select distinct t.account_id as a, t2.account_id as a2
      from t join
           t t2
           on t2.device_id = t.device_id and t.account_id >= t2.account_id
     ),
     cte as (
      select a.a, a.a2 as mina
      from a
      union all
      select a.a, cte.a
      from cte join
           a
           on a.a2 = cte.a and a.a > cte.a
     )
select grp, array_agg(a)
from (select a, min(mina) as grp
      from cte
      group by a
     ) a
group by grp;
Here is a SQL Fiddle.
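With the sample data this returns three groups of accounts: {1,2,3}, {4,5} and {6}. If you want the row counts the question asks for (7, 2 and 1), a sketch of a variant final SELECT, reusing the same CTEs and replacing the last SELECT above:
select grp, count(*) as cnt
from (select a, min(mina) as grp
      from cte
      group by a
     ) g
join t
  on t.account_id = g.a  -- count every original row whose account belongs to the group
group by grp;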
You can GROUP BY the device_id and then aggregate the account_ids into a Postgres array. Here is an example query, although I'm not sure what your actual table name is.
SELECT
device_id,
array_agg(account_id) as account_ids
FROM account_device --or whatever your table name is
GROUP BY device_id;
Results - hope it's what you're looking for:
16 | {6}
15 | {4,5}
13 | {3}
11 | {1,3}
14 | {3}
12 | {1}
10 | {1,2}
-- \i tmp.sql
CREATE TABLE graph(
  account_id integer NOT NULL --references accounts(id)
, device_id integer NOT NULL  --references devices(id)
, PRIMARY KEY(account_id, device_id)
);
INSERT INTO graph (account_id, device_id) VALUES
 (1,10) ,(1,11) ,(1,12)
,(2,10)
,(3,11) ,(3,13) ,(3,14)
,(4,15)
,(5,15)
,(6,16)
;
-- SELECT * FROM graph;

-- Find the (3) group leaders
WITH seed AS (
  SELECT row_number() OVER () AS cluster_id -- give them a number
       , g.account_id
       , g.device_id
  FROM graph g
  WHERE NOT EXISTS (SELECT *
        FROM graph nx
        WHERE (nx.account_id = g.account_id OR nx.device_id = g.device_id)
          AND nx.ctid < g.ctid
        )
  )
SELECT * FROM seed;

WITH recursive omg AS (
  --use the above CTE in a sub-CTE
  WITH seed AS (
    SELECT row_number() OVER () AS cluster_id
         , g.account_id
         , g.device_id
         , g.ctid AS wtf --we need an (ordered!) canonical id for the tuples,
                         --just to identify and exclude them
    FROM graph g
    WHERE NOT EXISTS (SELECT *
          FROM graph nx
          WHERE (nx.account_id = g.account_id OR nx.device_id = g.device_id) AND nx.ctid < g.ctid
          )
    )
  SELECT s.cluster_id
       , s.account_id
       , s.device_id
       , s.wtf
  FROM seed s
  UNION ALL
  SELECT o.cluster_id
       , g.account_id
       , g.device_id
       , g.ctid AS wtf
  FROM omg o
  JOIN graph g ON (g.account_id = o.account_id OR g.device_id = o.device_id)
    -- AND (g.account_id > o.account_id OR g.device_id > o.device_id)
    AND g.ctid > o.wtf
  -- we still need to exclude duplicates here
  -- (which could occur if there are cycles in the graph)
  -- , this could be done using an array
  )
SELECT *
FROM omg
ORDER BY cluster_id, account_id, device_id
;
Results:
DROP SCHEMA
CREATE SCHEMA
SET
CREATE TABLE
INSERT 0 10
cluster_id | account_id | device_id
------------+------------+-----------
1 | 1 | 10
2 | 4 | 15
3 | 6 | 16
(3 rows)
cluster_id | account_id | device_id | wtf
------------+------------+-----------+--------
1 | 1 | 10 | (0,1)
1 | 1 | 11 | (0,2)
1 | 1 | 12 | (0,3)
1 | 1 | 12 | (0,3)
1 | 2 | 10 | (0,4)
1 | 3 | 11 | (0,5)
1 | 3 | 13 | (0,6)
1 | 3 | 14 | (0,7)
1 | 3 | 14 | (0,7)
2 | 4 | 15 | (0,8)
2 | 5 | 15 | (0,9)
3 | 6 | 16 | (0,10)
(12 rows)
Newer version (I added an Id column to the table)
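The post doesn't show the DDL for that id column; something like this would do (an assumption, not part of the original answer - the ids are assumed to come out in insertion order, matching the results below):
-- hypothetical: give each (account_id, device_id) row its own surrogate key
ALTER TABLE graph ADD COLUMN id serial;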
-- for convenience: set of all adjacent nodes.
CREATE TEMP VIEW pair AS
SELECT one.id AS one
     , two.id AS two
FROM graph one
JOIN graph two ON (one.account_id = two.account_id OR one.device_id = two.device_id)
              AND one.id <> two.id
;
WITH RECURSIVE flood AS (
  SELECT g.id, g.id AS parent_id
       , 0 AS lev
       , ARRAY[g.id] AS arr
  FROM graph g
  UNION ALL
  SELECT c.id, p.parent_id AS parent_id
       , 1 + p.lev AS lev
       , p.arr || ARRAY[c.id] AS arr
  FROM graph c
  JOIN flood p ON EXISTS (
        SELECT * FROM pair WHERE p.id = pair.one AND c.id = pair.two)
    AND p.parent_id < c.id
    AND NOT p.arr @> ARRAY[c.id] -- avoid cycles/loops
  )
SELECT g.*, a.parent_id
     , dense_rank() OVER (ORDER BY a.parent_id) AS group_id
FROM graph g
JOIN (SELECT id, MIN(parent_id) AS parent_id
      FROM flood
      GROUP BY id
     ) a
  ON g.id = a.id
ORDER BY a.parent_id, g.id
;
New results:
CREATE VIEW
id | account_id | device_id | parent_id | group_id
----+------------+-----------+-----------+----------
1 | 1 | 10 | 1 | 1
2 | 1 | 11 | 1 | 1
3 | 1 | 12 | 1 | 1
4 | 2 | 10 | 1 | 1
5 | 3 | 11 | 1 | 1
6 | 3 | 13 | 1 | 1
7 | 3 | 14 | 1 | 1
8 | 4 | 15 | 8 | 2
9 | 5 | 15 | 8 | 2
10 | 6 | 16 | 10 | 3
(10 rows)

Count groups of NULL values - partition or window?

n | g
---------
1 | 1
2 | NULL
3 | 1
4 | 1
5 | 1
6 | 1
7 | NULL
8 | NULL
9 | NULL
10 | 1
11 | 1
12 | 1
13 | 1
14 | 1
15 | 1
16 | 1
17 | NULL
18 | 1
19 | 1
20 | 1
21 | NULL
22 | 1
23 | 1
24 | 1
25 | 1
26 | NULL
27 | NULL
28 | 1
29 | 1
30 | NULL
31 | 1
From the above column g I should get this result:
x|y
---
1|4
2|1
3|1
where
x stands for the count of contiguous NULLs and
y stands for the times a single group of NULLs occurs.
I.e., there is ...
4 groups of only 1 NULL,
1 group of 2 NULLs and
1 group of 3 NULLs
Compute a running count of not-null values with a window function to form groups, then two nested counts ...
SELECT x, count(*) AS y
FROM  (
   SELECT grp, count(*) FILTER (WHERE g IS NULL) AS x
   FROM  (
      SELECT g, count(g) OVER (ORDER BY n) AS grp
      FROM   tbl
      ) sub1
   WHERE  g IS NULL
   GROUP  BY grp
   ) sub2
GROUP  BY 1
ORDER  BY 1;
count(g) only counts not-null values.
This puts the preceding row with a not-null g into the same group (grp) as the NULL values that follow it - so it has to be excluded from the count.
I replaced the HAVING clause I originally had for that with WHERE g IS NULL (like #klin uses in his answer), which is simpler.
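To see why count(g) works as a group key, here are the values of the inner subquery for the first ten sample rows, worked out by hand from the data above - the NULL at n = 2 lands in grp 1, the three NULLs at n = 7..9 all land in grp 5:
SELECT g, count(g) OVER (ORDER BY n) AS grp
FROM   tbl
ORDER  BY n
LIMIT  10;
-- g    | grp
-- 1    | 1
-- NULL | 1
-- 1    | 2
-- 1    | 3
-- 1    | 4
-- 1    | 5
-- NULL | 5
-- NULL | 5
-- NULL | 5
-- 1    | 6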
Related:
Find "n" consecutive free numbers from table
Select longest continuous sequence
If n is a gapless sequence of integer numbers, you can simplify further:
SELECT x, count(*) AS y
FROM  (
   SELECT grp, count(*) AS x
   FROM  (
      SELECT n - row_number() OVER (ORDER BY n) AS grp
      FROM   tbl
      WHERE  g IS NULL
      ) sub1
   GROUP  BY 1
   ) sub2
GROUP  BY 1
ORDER  BY 1;
Eliminate not-null values immediately and subtract the row number from n, arriving at (otherwise meaningless) group numbers directly ...
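For the sample data, the NULL rows sit at n = 2, 7, 8, 9, 17, 21, 26, 27 and 30; subtracting their row numbers (1..9) gives the group keys 1, 5, 5, 5, 12, 15, 19, 19, 21 (worked out by hand), so the group sizes are 1, 3, 1, 1, 2 and 1 - which roll up to the expected result of four groups of size 1, one of size 2 and one of size 3.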
As long as the only possible not-null value in g is 1, sum() is a smart trick (like #klin provided). But then it should really be a boolean column; it wouldn't make sense as a numeric type. So I assume that's just a simplification of the actual problem in the question.
select x, count(x) y
from (
  select s, count(s) x
  from (
    select *, sum(g) over (order by i) as s
    from example
    ) s
  where g isnull
  group by 1
  ) s
group by 1
order by 1;
Test it here.

Select statement that duplicates rows based on N value of column

I have a Power table that stores building circuit details. A circuit can be 1 phase or 3 phase but is always represented as 1 row in the circuit table.
I want to insert the details of the circuits into a join table which joins panels to circuits.
My current circuit table has the following details:
CircuitID | Voltage | Phase | PanelID | Cct |
1 | 120 | 1 | 1 | 1 |
2 | 208 | 3 | 1 | 3 |
3 | 208 | 2 | 1 | 8 |
Is it possible to create a select whereby, when it sees a 3-phase row, it selects 3 rows (or 2 rows for a 2-phase row) and increments the Cct column by 1 each time, or do I have to create a loop? The desired output would be:
CircuitID | PanelID | Cct |
1 | 1 | 1 |
2 | 1 | 3 |
2 | 1 | 4 |
2 | 1 | 5 |
3 | 1 | 8 |
3 | 1 | 9 |
Here is one way to do it.
First, generate numbers using a tally table (the best way to do this). Here is one excellent article about generating a sequence without loops: Generate a set or sequence without loops.
Then join the numbers table to your table where the Phase value of each record is greater than or equal to the sequence number in the numbers table, so a row with Phase 3 matches n = 1, 2, 3 and expands into three rows.
;WITH e1(n) AS
(
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL
    SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
), -- 10
e2(n) AS (SELECT 1 FROM e1 CROSS JOIN e1 AS b), -- 10*10
e3(n) AS (SELECT 1 FROM e1 CROSS JOIN e2), -- 10*100
numbers AS (SELECT n = ROW_NUMBER() OVER (ORDER BY n) FROM e3)
SELECT CircuitID,
       PanelID,
       Cct = Cct + ( n - 1 )
FROM   Yourtable a
       JOIN numbers b
         ON a.Phase >= b.n
You can do this with a single recursive CTE.
WITH cte AS
(
    SELECT [CircuitID], [Voltage], [Phase], [PanelID], [Cct], [Cct] AS [Ref]
    FROM [Power]
    UNION ALL
    SELECT [CircuitID], [Voltage], [Phase], [PanelID], [Cct] + 1, [Ref]
    FROM cte
    WHERE [Cct] + 1 < [Phase] + [Ref]
)
SELECT [CircuitID], [PanelID], [Cct]
FROM cte
ORDER BY [CircuitID]
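For the sample row with CircuitID 2 (Phase 3, Cct 3, so Ref 3), the anchor member keeps Cct 3 and the recursive member adds Cct 4 and 5, stopping once Cct + 1 reaches Phase + Ref = 6 - which matches the expected output above.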
Simplest way:
Select y.* from (
    Select 1 CircuitID, 120 Voltage, 1 Phase, 1 PanelID, 1 Cct
    union
    Select 2, 208, 3, 1, 3
    union
    Select 3, 208, 2, 1, 8) y,
    (Select 1 x
     union
     Select 2 x
     union
     Select 3 x) x
Where x.x <= y.Phase
-- note: to also increment Cct for each extra row, select y.Cct + x.x - 1 instead of y.*
Copy and paste this directly and try it; it will run as-is. After that, just replace my 'y' table with your real table.

Count rows in each 'partition' of a table

Disclaimer: I don't mean partition in the window function sense, nor table partitioning; I mean it in the more general sense, i.e. to divide up.
Here's a table:
id | y
----+------------
1 | 1
2 | 1
3 | 1
4 | 2
5 | 2
6 | null
7 | 2
8 | 2
9 | null
10 | null
I'd like to partition by checking equality on y, such that I end up with counts of the number of times each value of y appears contiguously, when sorted on id (i.e. in the order shown).
Here's the output I'm looking for:
y | count
-----+----------
1 | 3
2 | 2
null | 1
2 | 2
null | 2
So reading down the rows in that output we have:
The first partition of three 1's
The first partition of two 2's
The first partition of a null
The second partition of two 2's
The second partition of two nulls
Try:
SELECT y, count(*)
FROM (
    SELECT y,
           sum( xyz ) OVER (
               ORDER BY id
               rows between unbounded preceding
                        and current row
           ) qwe
    FROM (
        SELECT *,
               case
                   when y is null and
                        lag(y) OVER ( ORDER BY id ) is null
                       then 0
                   when y = lag(y) OVER ( ORDER BY id )
                       then 0
                   else 1
               end xyz
        FROM table1
    ) alias
) alias
GROUP BY qwe, y
ORDER BY qwe;
demo: http://sqlfiddle.com/#!15/b1794/12