Enumerating table partitions in a Postgres table

Suppose I have a table like this:
id | part | value
----+-------+-------
1 | 0 | 8
2 | 0 | 3
3 | 0 | 4
4 | 1 | 6
5 | 0 | 13
6 | 0 | 4
7 | 1 | 2
8 | 0 | 11
9 | 0 | 15
10 | 0 | 3
11 | 0 | 2
I would like to number the rows that have part attribute 0 consecutively, restarting after each row with part 1 (which itself gets 0).
Ultimately I want to get this:
id | part | value | number
----+------+-------+--------
1 | 0 | 8 | 1
2 | 0 | 3 | 2
3 | 0 | 4 | 3
4 | 1 | 6 | 0
5 | 0 | 13 | 1
6 | 0 | 4 | 2
7 | 1 | 2 | 0
8 | 0 | 11 | 1
9 | 0 | 15 | 2
10 | 0 | 3 | 3
11 | 0 | 2 | 4
Is it possible to solve this with Postgres window functions or is there another way?

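Yes, that is simple. A running sum of part assigns a group number (grp) that increases at every row with part = 1; row_number() then numbers the rows within each group:
SELECT id, part, value,
       row_number() OVER (PARTITION BY grp ORDER BY id) - 1 AS number
FROM (SELECT id, part, value,
             sum(part) OVER (ORDER BY id) AS grp
      FROM mytable
     ) AS q;
Note that the first group has no leading part = 1 row, so its numbering starts at 0 rather than at 1 as in the desired output: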
id | part | value | number
----+------+-------+--------
1 | 0 | 8 | 0
2 | 0 | 3 | 1
3 | 0 | 4 | 2
4 | 1 | 6 | 0
5 | 0 | 13 | 1
6 | 0 | 4 | 2
7 | 1 | 2 | 0
8 | 0 | 11 | 1
9 | 0 | 15 | 2
10 | 0 | 3 | 3
11 | 0 | 2 | 4
(11 rows)
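The output above numbers the first group from 0, while the question asks for it to start at 1. One possible variant, sketched here under assumptions, replaces row_number() with a running count of the part = 0 rows inside each group. The sketch runs the SQL through Python's sqlite3 module as a stand-in for Postgres (it assumes a bundled SQLite of version 3.25 or later, which is needed for window functions); the helper name enumerate_groups is made up for the illustration.

```python
import sqlite3

# Sample rows (id, part, value) copied from the question.
ROWS = [(1, 0, 8), (2, 0, 3), (3, 0, 4), (4, 1, 6), (5, 0, 13), (6, 0, 4),
        (7, 1, 2), (8, 0, 11), (9, 0, 15), (10, 0, 3), (11, 0, 2)]

QUERY = """
SELECT id, part, value,
       -- running count of part = 0 rows inside each group
       sum(CASE WHEN part = 0 THEN 1 ELSE 0 END)
           OVER (PARTITION BY grp ORDER BY id) AS number
FROM (SELECT id, part, value,
             sum(part) OVER (ORDER BY id) AS grp
      FROM mytable) AS q
ORDER BY id
"""


def enumerate_groups(rows):
    """Hypothetical helper: load the rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY, part INTEGER, value INTEGER)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


numbers = [row[3] for row in enumerate_groups(ROWS)]
print(numbers)
```

A row with part = 1 is always the first row of its group, so its running count is 0, and the part = 0 rows that follow count up from 1. The same SELECT should also be valid in Postgres, although it was only exercised against SQLite here.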

Related

Group items in a data frame using a conditions

| ID | CUSTOMER_ID | LAST_TRAN_DATE | is_active | NO_OF_ACC |
|----|-------------|----------------|-----------|-----------|
| 1 | 1 | 3-Apr-15 | 0 | 5 |
| 2 | 2 | 26-Mar-04 | 0 | 4 |
| 3 | 2 | 25-Jul-14 | 0 | 4 |
| 4 | 2 | 3-Jan-13 | 0 | 4 |
| 5 | 2 | 28-Jun-13 | 0 | 4 |
| 6 | 3 | 19-Nov-08 | 0 | 3 |
| 7 | 3 | 21-May-09 | 0 | 3 |
| 8 | 3 | 24-Feb-12 | 0 | 3 |
| 9 | 1 | 1-Jun-16 | 0 | 5 |
| 10 | 1 | 8-Apr-19 | 1 | 5 |
| 11 | 1 | 25-Nov-17 | 0 | 5 |
| 12 | 1 | 22-Feb-19 | 1 | 5 |
My data looks like the table above. I want to calculate the number of active accounts for each customer ID and show it in a new column on every row.
I used
df.groupby(['CUSTOMER_ID', 'is_active']).size()
which gave me the following result.
| CUSTOMER_ID | is_active | |
|-------------|-----------|---|
| 1 | 0 | 3 |
| | 1 | 2 |
| 2 | 0 | 4 |
| 3 | 0 | 3 |
dtype: int64
But I have no idea how to map these values onto each row as a new column.
Please help me
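IIUC, you need a groupby .sum with an initial filter on the active rows, then .map on CUSTOMER_ID to broadcast the per-customer result back to every row of the dataframe. Customers with no active rows are absent from the grouped result and therefore get NaN.
df["active_accounts"] = df["CUSTOMER_ID"].map(
    df[df["is_active"].eq(1)].groupby("CUSTOMER_ID")["NO_OF_ACC"].sum()
)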
print(df)
ID CUSTOMER_ID LAST_TRAN_DATE is_active Count_Column NO_OF_ACC \
2 1 1 3-Apr-15 0 5 5
3 2 2 26-Mar-04 0 4 4
4 3 2 25-Jul-14 0 4 4
5 4 2 3-Jan-13 0 4 4
6 5 2 28-Jun-13 0 4 4
7 6 3 19-Nov-08 0 3 3
8 7 3 21-May-09 0 3 3
9 8 3 24-Feb-12 0 3 3
10 9 1 1-Jun-16 0 5 5
11 10 1 8-Apr-19 1 5 5
12 11 1 25-Nov-17 0 5 5
13 12 1 22-Feb-19 1 5 5
active_accounts
2 10.0
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 10.0
11 10.0
12 10.0

Query with WITH clause and COUNT subquery

In the query below, I don't get the results I would expect. Any insights why? How could I reformulate the query to get the desired results?
Schema (SQLite v3.30)
WITH RECURSIVE
cnt(x,y) AS (VALUES(0,ABS(Random()%3)) UNION ALL SELECT x+1, ABS(Random()%3) FROM cnt WHERE x<10),
i_rnd as (SELECT r1.x, r1.y, (SELECT COUNT(*) FROM cnt as r2 WHERE r2.y<=r1.y) as idx FROM cnt as r1)
SELECT * FROM i_rnd ORDER BY y;
result:
| x | y | idx |
| --- | --- | --- |
| 1 | 0 | 3 |
| 5 | 0 | 6 |
| 8 | 0 | 5 |
| 9 | 0 | 4 |
| 10 | 0 | 2 |
| 3 | 1 | 4 |
| 0 | 2 | 11 |
| 2 | 2 | 11 |
| 4 | 2 | 11 |
| 6 | 2 | 11 |
| 7 | 2 | 11 |
expected result:
| x | y | idx |
| --- | --- | --- |
| 1 | 0 | 5 |
| 5 | 0 | 5 |
| 8 | 0 | 5 |
| 9 | 0 | 5 |
| 10 | 0 | 5 |
| 3 | 1 | 6 |
| 0 | 2 | 11 |
| 2 | 2 | 11 |
| 4 | 2 | 11 |
| 6 | 2 | 11 |
| 7 | 2 | 11 |
In other words, idx should indicate how many rows have a y less than or equal to the y of the row considered.
I would just use:
select cnt.*,
       count(*) over (order by y)
from cnt;
Here is a db<>fiddle.
The issue with your code is probably that the CTE is re-evaluated each time it is referenced, so the random values are not consistent between references: a known problem with volatile functions in CTEs.
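If the re-evaluation explanation is right, one way to sidestep it is to store the random values in a real table first, so that every later reference reads the same rows. The sketch below does that through Python's sqlite3 module and then applies the window count; the CTE name gen and the choice to materialize into a table named cnt are assumptions made for the illustration, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Materialize the random values once; later queries read the stored rows
# instead of re-running RANDOM().
conn.execute("""
    CREATE TABLE cnt AS
    WITH RECURSIVE gen(x, y) AS (
        VALUES (0, ABS(RANDOM() % 3))
        UNION ALL
        SELECT x + 1, ABS(RANDOM() % 3) FROM gen WHERE x < 10
    )
    SELECT x, y FROM gen
""")

# With ORDER BY y and the default frame, rows that tie on y are peers,
# so the count covers every row whose y is <= the current row's y.
rows = conn.execute("""
    SELECT x, y, COUNT(*) OVER (ORDER BY y) AS idx
    FROM cnt
    ORDER BY y, x
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

The y values are random, so the printed rows differ between runs, but idx for each row should always equal the number of rows with a smaller or equal y.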

Deleting recursively in a function (ERROR: query has no destination for result data)

I have this table of relationships (only id_padre and id_hijo are interesting):
id | id_padre | id_hijo | cantidad | posicion
----+----------+---------+----------+----------
0 | | 1 | 1 | 0
1 | 1 | 2 | 1 | 0
2 | 1 | 3 | 1 | 1
3 | 3 | 4 | 1 | 0
4 | 4 | 5 | 0.5 | 0
5 | 4 | 6 | 0.5 | 1
6 | 4 | 7 | 24 | 2
7 | 4 | 8 | 0.11 | 3
8 | 8 | 6 | 0.12 | 0
9 | 8 | 9 | 0.05 | 1
10 | 8 | 10 | 0.3 | 2
11 | 8 | 11 | 0.02 | 3
12 | 3 | 12 | 250 | 1
13 | 12 | 5 | 0.8 | 0
14 | 12 | 6 | 0.8 | 1
15 | 12 | 13 | 26 | 2
16 | 12 | 8 | 0.15 | 3
This table stores the links between nodes (id_padre = parent node and id_hijo = child node).
I'm trying to write a function that recursively deletes rows, starting from a particular row. After deleting it, I check whether any other rows have the same id_hijo value as the row I just deleted.
If no such rows exist, I must delete all the rows whose id_padre equals the id_hijo of the deleted row.
For example, if I start by deleting the row where id_padre=3 and id_hijo=4, then I delete this row:
id | id_padre | id_hijo | cantidad | posicion
----+----------+---------+----------+----------
3 | 3 | 4 | 1 | 0
and the table is left like this:
id | id_padre | id_hijo | cantidad | posicion
----+----------+---------+----------+----------
0 | | 1 | 1 | 0
1 | 1 | 2 | 1 | 0
2 | 1 | 3 | 1 | 1
4 | 4 | 5 | 0.5 | 0
5 | 4 | 6 | 0.5 | 1
6 | 4 | 7 | 24 | 2
7 | 4 | 8 | 0.11 | 3
8 | 8 | 6 | 0.12 | 0
9 | 8 | 9 | 0.05 | 1
10 | 8 | 10 | 0.3 | 2
11 | 8 | 11 | 0.02 | 3
12 | 3 | 12 | 250 | 1
13 | 12 | 5 | 0.8 | 0
14 | 12 | 6 | 0.8 | 1
15 | 12 | 13 | 26 | 2
16 | 12 | 8 | 0.15 | 3
Because there is no longer any row with id_hijo = 4, I then delete the rows where id_padre = 4, and so on recursively (in this example the process ends here).
I have tried to write this function (it calls itself):
CREATE OR REPLACE FUNCTION borrar(integer,integer) RETURNS VOID AS
$BODY$
DECLARE
    padre ALIAS FOR $1;
    hijo ALIAS FOR $2;
    r copia_rel%rowtype;
BEGIN
    DELETE FROM copia_rel WHERE id_padre = padre AND id_hijo = hijo;
    IF NOT EXISTS (SELECT id_hijo FROM copia_rel WHERE id_hijo = hijo) THEN
        FOR r IN SELECT * FROM copia_rel WHERE id_padre = hijo LOOP
            RAISE NOTICE 'Selecciono: %,%',r.id_padre,r.id_hijo;--for debugging
            SELECT borrar(r.id_padre,r.id_hijo);
        END LOOP;
    END IF;
END;
$BODY$
LANGUAGE plpgsql;
But I get this error:
ERROR: query has no destination for result data
I know that PostgreSQL has specific recursive constructs with CTEs. I have used them to traverse my graph, but I don't know how to use them in this case.
The error is due to the SELECT used to call the function recursively. PostgreSQL wants to put the results somewhere but is not told where.
If you want to run a function and discard its result, use PERFORM instead of SELECT in PL/pgSQL functions; here, that means writing PERFORM borrar(r.id_padre,r.id_hijo); inside the loop.
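To see what the recursion does to the sample data once the call no longer fails, here is a rough Python sketch of the same logic, run against an in-memory SQLite table through the sqlite3 module. It is not the PL/pgSQL function itself: the table keeps only the id, id_padre and id_hijo columns from the question, and the Python function borrar merely mirrors the steps of the original.

```python
import sqlite3

# (id, id_padre, id_hijo) copied from the question; cantidad and posicion
# are left out because the delete logic does not use them.
EDGES = [(0, None, 1), (1, 1, 2), (2, 1, 3), (3, 3, 4), (4, 4, 5), (5, 4, 6),
         (6, 4, 7), (7, 4, 8), (8, 8, 6), (9, 8, 9), (10, 8, 10), (11, 8, 11),
         (12, 3, 12), (13, 12, 5), (14, 12, 6), (15, 12, 13), (16, 12, 8)]


def borrar(conn, padre, hijo):
    """Delete the link padre -> hijo, then recurse if hijo became an orphan."""
    conn.execute(
        "DELETE FROM copia_rel WHERE id_padre = ? AND id_hijo = ?", (padre, hijo)
    )
    still_referenced = conn.execute(
        "SELECT 1 FROM copia_rel WHERE id_hijo = ? LIMIT 1", (hijo,)
    ).fetchone()
    if still_referenced is None:
        # Fetch the child links up front, because the recursion deletes rows.
        children = conn.execute(
            "SELECT id_padre, id_hijo FROM copia_rel WHERE id_padre = ?", (hijo,)
        ).fetchall()
        for child_padre, child_hijo in children:
            borrar(conn, child_padre, child_hijo)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE copia_rel (id INTEGER PRIMARY KEY, id_padre INTEGER, id_hijo INTEGER)"
)
conn.executemany("INSERT INTO copia_rel VALUES (?, ?, ?)", EDGES)

borrar(conn, 3, 4)
remaining_ids = [row[0] for row in conn.execute("SELECT id FROM copia_rel ORDER BY id")]
conn.close()
print(remaining_ids)
```

Starting from the link 3 -> 4, the sketch removes that row and then the four rows with id_padre = 4. Nodes 5, 6 and 8 are still referenced by other parents, and node 7 has no children, so the recursion stops there.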

SQL how to force to display row with 0 if no data available?

My table returns results as following (skips row if HourOfDay does not have data for particular ID)
ID HourOfDay Counts
--------------------------
1 5 5
1 13 10
1 23 3
..........................HourOfDay up till 23
2 9 1
and so on.
What I am trying to achieve is to force rows showing 0 for the HourOfDay values that don't have data, like the following:
ID HourOfDay Counts
--------------------------
1 0 0
1 1 0
1 2 0
1......................
1 5 5
1 6 0
1......................
1 23 3
2 0 0
2 1 0
etc.
I have researched this. It looks like I can achieve this result if I create an extra table and outer join it. So I have created a table variable in the SP (as a temporary workaround):
DECLARE @Hours TABLE
(
    [Hour] INT NULL
);
INSERT INTO @Hours VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)
    ,(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23);
However, no matter how I join it, it does not achieve the desired result.
How do I proceed? Do I add extra columns to join on? Completely different approach? Any hint in the right direction is appreciated!
Using a derived table for the distinct Ids cross joined to @Hours, left joined to your table:
select
    i.Id
  , h.Hour
  , coalesce(t.Counts,0) as Counts
from (select distinct Id from t) as i
  cross join @Hours as h
  left join t
    on i.Id = t.Id
   and h.Hour = t.HourOfDay
rextester demo: http://rextester.com/XFZYX88502
returns:
+----+------+--------+
| Id | Hour | Counts |
+----+------+--------+
| 1 | 0 | 0 |
| 1 | 1 | 0 |
| 1 | 2 | 0 |
| 1 | 3 | 0 |
| 1 | 4 | 0 |
| 1 | 5 | 5 |
| 1 | 6 | 0 |
| 1 | 7 | 0 |
| 1 | 8 | 0 |
| 1 | 9 | 0 |
| 1 | 10 | 0 |
| 1 | 11 | 0 |
| 1 | 12 | 0 |
| 1 | 13 | 10 |
| 1 | 14 | 0 |
| 1 | 15 | 0 |
| 1 | 16 | 0 |
| 1 | 17 | 0 |
| 1 | 18 | 0 |
| 1 | 19 | 0 |
| 1 | 20 | 0 |
| 1 | 21 | 0 |
| 1 | 22 | 0 |
| 1 | 23 | 3 |
| 2 | 0 | 0 |
| 2 | 1 | 0 |
| 2 | 2 | 0 |
| 2 | 3 | 0 |
| 2 | 4 | 0 |
| 2 | 5 | 0 |
| 2 | 6 | 0 |
| 2 | 7 | 0 |
| 2 | 8 | 0 |
| 2 | 9 | 1 |
| 2 | 10 | 0 |
| 2 | 11 | 0 |
| 2 | 12 | 0 |
| 2 | 13 | 0 |
| 2 | 14 | 0 |
| 2 | 15 | 0 |
| 2 | 16 | 0 |
| 2 | 17 | 0 |
| 2 | 18 | 0 |
| 2 | 19 | 0 |
| 2 | 20 | 0 |
| 2 | 21 | 0 |
| 2 | 22 | 0 |
| 2 | 23 | 0 |
+----+------+--------+
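The result above can be reproduced outside SQL Server as well. The following sketch runs the same cross join plus left join through Python's sqlite3 module; it assumes a plain table named Hours in place of the table variable, and a table t holding only the four rows visible in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (Id INTEGER, HourOfDay INTEGER, Counts INTEGER);
    INSERT INTO t VALUES (1, 5, 5), (1, 13, 10), (1, 23, 3), (2, 9, 1);
    CREATE TABLE Hours (Hour INTEGER);
""")
# One row per hour of the day, 0 through 23.
conn.executemany("INSERT INTO Hours VALUES (?)", [(h,) for h in range(24)])

rows = conn.execute("""
    SELECT i.Id, h.Hour, COALESCE(t.Counts, 0) AS Counts
    FROM (SELECT DISTINCT Id FROM t) AS i
    CROSS JOIN Hours AS h          -- every Id paired with every hour
    LEFT JOIN t                    -- keep the pair even when t has no match
           ON t.Id = i.Id
          AND t.HourOfDay = h.Hour
    ORDER BY i.Id, h.Hour
""").fetchall()
conn.close()

# Index the result by (Id, Hour) for easy lookups.
counts = {(id_, hour): n for id_, hour, n in rows}
print(len(rows), counts[(1, 13)], counts[(2, 0)])
```

The cross join is what guarantees 24 rows per Id; the left join only decides whether Counts comes from t or falls back to 0 through COALESCE.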

Postgres crosstab query

I have a table which has 7 different classes with an area value.
pid | class| area |
----+------+------+
2 | 1 | 10 |
2 | 2 | 10 |
2 | 6 | 20 |
4 | 1 | 30 |
4 | 2 | 40 |
4 | 3 | 50 |
4 | 4 | 60 |
4 | 5 | 70 |
9 | 6 | 80 |
11 | 1 | 90 |
11 | 4 | 10 |
11 | 7 | 20 |
However I want to present this data in a format that has each distinct pid as a column heading and then have each row correspond to a class area (i.e. first row is the area of class 1 for each pid).
2 | 4 | 9 | 11 |
---+-----+-----+----+
10 | 30 | 0 | 90 |
10 | 40 | 0 | 0 |
0 | 50 | 0 | 0 |
0 | 60 | 0 | 10 |
0 | 70 | 0 | 0 |
20 | 0 | 60 | 0 |
0 | 0 | 0 | 20 |
Is it possible to create an output like this in PostgreSQL?
Try conditional aggregation: one SUM(CASE ...) per pid, grouped by class. The pid values have to be listed explicitly as columns:
SELECT
    SUM(CASE WHEN pid = 2 THEN area ELSE 0 END) AS "2",
    SUM(CASE WHEN pid = 4 THEN area ELSE 0 END) AS "4",
    SUM(CASE WHEN pid = 9 THEN area ELSE 0 END) AS "9",
    SUM(CASE WHEN pid = 11 THEN area ELSE 0 END) AS "11"
FROM t
GROUP BY class
ORDER BY class
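Because the query needs one hand-written column per pid, it can help to generate the column list from the data. The sketch below does this in Python against an in-memory SQLite table through the sqlite3 module, as a stand-in for Postgres; the table name t matches the answer, and building the SQL text by string formatting is an assumption that is only reasonable here because the pid values are integers read back from the table itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pid INTEGER, class INTEGER, area INTEGER)")
# Input rows copied from the question.
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", [
    (2, 1, 10), (2, 2, 10), (2, 6, 20),
    (4, 1, 30), (4, 2, 40), (4, 3, 50), (4, 4, 60), (4, 5, 70),
    (9, 6, 80),
    (11, 1, 90), (11, 4, 10), (11, 7, 20),
])

# Collect the distinct pids, then emit one SUM(CASE ...) column for each.
pids = [pid for (pid,) in conn.execute("SELECT DISTINCT pid FROM t ORDER BY pid")]
columns = ",\n       ".join(
    f'SUM(CASE WHEN pid = {int(pid)} THEN area ELSE 0 END) AS "{int(pid)}"'
    for pid in pids
)
query = f"SELECT {columns}\nFROM t\nGROUP BY class\nORDER BY class"

rows = conn.execute(query).fetchall()
conn.close()

print(query)
for row in rows:
    print(row)
```

Each output row corresponds to one class, in class order, with a 0 wherever a pid has no area for that class. With the input rows as listed, pid 9 contributes 80 for class 6.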