Restart cumsum in Pandas with condition

I have the columns amount and assets. The column target should be the cumulative sum of amount, but the sum should restart at the current amount whenever assets in the previous row was zero.
Sample:
+--------+--------+--------+
| amount | assets | target |
+--------+--------+--------+
| 6 | 10 | 6 |
| 8 | 20 | 14 |
| -1 | 0 | 13 |
| 6 | 1 | 6 |
| -7 | 0 | -1 |
| 2 | 4 | 2 |
| -5 | 7 | -3 |
| 3 | 9 | 0 |
| 7 | 0 | 7 |
| 9 | 2 | 9 |
| 1 | 3 | 10 |
| -4 | 5 | 6 |
+--------+--------+--------+

Use GroupBy.cumsum with a grouping key built by comparing assets to 0, shifting the result down one row with Series.shift, filling the NaN that the shift leaves in the first row, and taking Series.cumsum:
g = df['assets'].eq(0).shift().bfill().cumsum()
# alternative that avoids the NaN in the first row
# g = df['assets'].eq(0).shift(fill_value=False).cumsum()
df['new'] = df.groupby(g)['amount'].cumsum()
print(df)
amount assets target new
0 6 10 6 6
1 8 20 14 14
2 -1 0 13 13
3 6 1 6 6
4 -7 0 -1 -1
5 2 4 2 2
6 -5 7 -3 -3
7 3 9 0 0
8 7 0 7 7
9 9 2 9 9
10 1 3 10 10
11 -4 5 6 6
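
To make the grouping key easier to follow, here is a minimal sketch that builds it one step at a time on the sample data from the question. The intermediate names is_zero and starts_new_group are only illustrative and do not appear in the answer; the sketch uses the fill_value variant of the shift.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "amount": [6, 8, -1, 6, -7, 2, -5, 3, 7, 9, 1, -4],
    "assets": [10, 20, 0, 1, 0, 4, 7, 9, 0, 2, 3, 5],
    "target": [6, 14, 13, 6, -1, 2, -3, 0, 7, 9, 10, 6],
})

# True on rows where assets is zero.
is_zero = df["assets"].eq(0)

# Shift down one row so the flag marks the row *after* a zero.
# fill_value=False keeps the Series boolean, so no NaN has to be filled.
starts_new_group = is_zero.shift(fill_value=False)

# Each True bumps the running count, which gives one label per group.
g = starts_new_group.cumsum()

# Cumulative sum of amount, restarted at the beginning of every group.
df["new"] = df.groupby(g)["amount"].cumsum()
print(df.assign(group=g))
```

The row that contains the zero still belongs to the old group, which is why the flag is shifted before the cumsum.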

Related

Group items in a data frame using a conditions

| ID | CUSTOMER_ID | LAST_TRAN_DATE | is_active | NO_OF_ACC |
|----|-------------|----------------|-----------|-----------|
| 1 | 1 | 3-Apr-15 | 0 | 5 |
| 2 | 2 | 26-Mar-04 | 0 | 4 |
| 3 | 2 | 25-Jul-14 | 0 | 4 |
| 4 | 2 | 3-Jan-13 | 0 | 4 |
| 5 | 2 | 28-Jun-13 | 0 | 4 |
| 6 | 3 | 19-Nov-08 | 0 | 3 |
| 7 | 3 | 21-May-09 | 0 | 3 |
| 8 | 3 | 24-Feb-12 | 0 | 3 |
| 9 | 1 | 1-Jun-16 | 0 | 5 |
| 10 | 1 | 8-Apr-19 | 1 | 5 |
| 11 | 1 | 25-Nov-17 | 0 | 5 |
| 12 | 1 | 22-Feb-19 | 1 | 5 |
My data looks like the table above. I want to calculate the number of active accounts for each customer ID and show it in a new column on every row.
I used
df.groupby(['CUSTOMER_ID', 'is_active']).size()
which gave me the following result.
| CUSTOMER_ID | is_active | |
|-------------|-----------|---|
| 1 | 0 | 3 |
| | 1 | 2 |
| 2 | 0 | 4 |
| 3 | 0 | 3 |
dtype: int64
But I have no idea how to map these counts onto each row as a new column.
Please help me.
IIUC, you need to filter the active rows first, aggregate them with groupby(...).sum(), and then use .map to broadcast the result back onto every row of the dataframe by CUSTOMER_ID. Customers without any active row get NaN.
df["active_accounts"] = df["CUSTOMER_ID"].map(
df[df["is_active"].eq(1)].groupby("CUSTOMER_ID")["NO_OF_ACC"].sum()
)
print(df)
ID CUSTOMER_ID LAST_TRAN_DATE is_active Count_Column NO_OF_ACC \
2 1 1 3-Apr-15 0 5 5
3 2 2 26-Mar-04 0 4 4
4 3 2 25-Jul-14 0 4 4
5 4 2 3-Jan-13 0 4 4
6 5 2 28-Jun-13 0 4 4
7 6 3 19-Nov-08 0 3 3
8 7 3 21-May-09 0 3 3
9 8 3 24-Feb-12 0 3 3
10 9 1 1-Jun-16 0 5 5
11 10 1 8-Apr-19 1 5 5
12 11 1 25-Nov-17 0 5 5
13 12 1 22-Feb-19 1 5 5
active_accounts
2 10.0
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 10.0
11 10.0
12 10.0
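
If the goal is instead the number of active rows per customer (2 for customer 1 in the sample), a possible variant is to sum the 0/1 is_active flag per group with transform. This is only a sketch of that alternative reading of the question, not the method from the answer above. The column name active_count is made up for the example, and LAST_TRAN_DATE is left out because it is not needed.

```python
import pandas as pd

# Sample data from the question, without the date column.
df = pd.DataFrame({
    "ID": list(range(1, 13)),
    "CUSTOMER_ID": [1, 2, 2, 2, 2, 3, 3, 3, 1, 1, 1, 1],
    "is_active": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
    "NO_OF_ACC": [5, 4, 4, 4, 4, 3, 3, 3, 5, 5, 5, 5],
})

# is_active is 0/1, so summing it within each customer counts active rows.
# transform("sum") returns a result aligned with the original rows,
# so no separate map or merge step is needed.
df["active_count"] = df.groupby("CUSTOMER_ID")["is_active"].transform("sum")
print(df)
```

With this approach customers that have no active rows get 0 rather than NaN.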

Enumerating table partitions in Postgres table

Suppose I have a table like this:
id | part | value
----+-------+-------
1 | 0 | 8
2 | 0 | 3
3 | 0 | 4
4 | 1 | 6
5 | 0 | 13
6 | 0 | 4
7 | 1 | 2
8 | 0 | 11
9 | 0 | 15
10 | 0 | 3
11 | 0 | 2
I would like to enumerate the rows within each group of consecutive rows whose part attribute is 0.
Ultimately I want to get this:
id | part | value | number
----+------+-------+--------
1 | 0 | 8 | 1
2 | 0 | 3 | 2
3 | 0 | 4 | 3
4 | 1 | 6 | 0
5 | 0 | 13 | 1
6 | 0 | 4 | 2
7 | 1 | 2 | 0
8 | 0 | 11 | 1
9 | 0 | 15 | 2
10 | 0 | 3 | 3
11 | 0 | 2 | 4
Is it possible to solve this with Postgres window functions or is there another way?
Yes, that is simple:
SELECT id, part, value,
       row_number() OVER (PARTITION BY grp ORDER BY id) - 1 AS number
FROM (SELECT id, part, value,
             sum(part) OVER (ORDER BY id) AS grp
      FROM mytable
     ) AS q;
id | part | value | number
----+------+-------+--------
1 | 0 | 8 | 0
2 | 0 | 3 | 1
3 | 0 | 4 | 2
4 | 1 | 6 | 0
5 | 0 | 13 | 1
6 | 0 | 4 | 2
7 | 1 | 2 | 0
8 | 0 | 11 | 1
9 | 0 | 15 | 2
10 | 0 | 3 | 3
11 | 0 | 2 | 4
(11 rows)
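
Note that the first three rows above come out as 0, 1, 2, while the question asked for 1, 2, 3. A possible variant that counts only the part = 0 rows inside each group is sketched below. It is an assumption on my side, not part of the answer: it relies on part being only 0 or 1, and it runs against an in-memory SQLite database (3.25 or newer, for window functions) from Python as a stand-in for Postgres.

```python
import sqlite3

# In-memory SQLite database standing in for the Postgres table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER PRIMARY KEY, part INTEGER, value INTEGER)"
)
rows = [(1, 0, 8), (2, 0, 3), (3, 0, 4), (4, 1, 6), (5, 0, 13), (6, 0, 4),
        (7, 1, 2), (8, 0, 11), (9, 0, 15), (10, 0, 3), (11, 0, 2)]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

# grp: running total of part, so every part = 1 row opens a new group.
# number: running count of part = 0 rows inside the group (1 - part is 1
# for those rows and 0 for the part = 1 row that opens the group).
query = """
SELECT id, part, value,
       sum(1 - part) OVER (PARTITION BY grp ORDER BY id) AS number
FROM (SELECT id, part, value,
             sum(part) OVER (ORDER BY id) AS grp
      FROM mytable) AS q
ORDER BY id
"""
numbers = [row[3] for row in conn.execute(query)]
print(numbers)
```

The inner query is the same as in the answer; only the outer expression changes.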

Deleting recursively in a function (ERROR: query has no destination for result data)

I have this table of relationships (only id_padre and id_hijo are interesting):
id | id_padre | id_hijo | cantidad | posicion
----+----------+---------+----------+----------
0 | | 1 | 1 | 0
1 | 1 | 2 | 1 | 0
2 | 1 | 3 | 1 | 1
3 | 3 | 4 | 1 | 0
4 | 4 | 5 | 0.5 | 0
5 | 4 | 6 | 0.5 | 1
6 | 4 | 7 | 24 | 2
7 | 4 | 8 | 0.11 | 3
8 | 8 | 6 | 0.12 | 0
9 | 8 | 9 | 0.05 | 1
10 | 8 | 10 | 0.3 | 2
11 | 8 | 11 | 0.02 | 3
12 | 3 | 12 | 250 | 1
13 | 12 | 5 | 0.8 | 0
14 | 12 | 6 | 0.8 | 1
15 | 12 | 13 | 26 | 2
16 | 12 | 8 | 0.15 | 3
This table stores the links between nodes (id_padre = parent node, id_hijo = child node).
I'm trying to write a function that deletes rows recursively, starting from a particular row. After deleting it, I check whether any other rows have the same id_hijo value as the row I just deleted.
If no such rows exist, I must delete all the rows whose id_padre equals the id_hijo of the deleted row.
For example, if I start by deleting the row where id_padre=3 and id_hijo=4, then I delete this row:
id | id_padre | id_hijo | cantidad | posicion
----+----------+---------+----------+----------
3 | 3 | 4 | 1 | 0
and the table then looks like this:
id | id_padre | id_hijo | cantidad | posicion
----+----------+---------+----------+----------
0 | | 1 | 1 | 0
1 | 1 | 2 | 1 | 0
2 | 1 | 3 | 1 | 1
4 | 4 | 5 | 0.5 | 0
5 | 4 | 6 | 0.5 | 1
6 | 4 | 7 | 24 | 2
7 | 4 | 8 | 0.11 | 3
8 | 8 | 6 | 0.12 | 0
9 | 8 | 9 | 0.05 | 1
10 | 8 | 10 | 0.3 | 2
11 | 8 | 11 | 0.02 | 3
12 | 3 | 12 | 250 | 1
13 | 12 | 5 | 0.8 | 0
14 | 12 | 6 | 0.8 | 1
15 | 12 | 13 | 26 | 2
16 | 12 | 8 | 0.15 | 3
Because there are no remaining rows with id_hijo = 4, I will delete the rows where id_padre = 4, and so on recursively. (In this example the process ends here.)
I have tried to write this function (it calls itself):
CREATE OR REPLACE FUNCTION borrar(integer,integer) RETURNS VOID AS
$BODY$
DECLARE
    padre ALIAS FOR $1;
    hijo ALIAS FOR $2;
    r copia_rel%rowtype;
BEGIN
    DELETE FROM copia_rel WHERE id_padre = padre AND id_hijo = hijo;
    IF NOT EXISTS (SELECT id_hijo FROM copia_rel WHERE id_hijo = hijo) THEN
        FOR r IN SELECT * FROM copia_rel WHERE id_padre = hijo LOOP
            RAISE NOTICE 'Selecciono: %,%',r.id_padre,r.id_hijo;--for debugging
            SELECT borrar(r.id_padre,r.id_hijo);
        END LOOP;
    END IF;
END;
$BODY$
LANGUAGE plpgsql;
But I get this error:
ERROR: query has no destination for result data
I know that PostgreSQL has specific recursive constructs with CTEs. I have used them to traverse my graph, but I don't know how to use them in this case.
The error is caused by the SELECT used to call the function recursively. PostgreSQL wants to put the result somewhere but has not been told where.
If you want to run a function and discard its result, use PERFORM instead of SELECT in PL/pgSQL functions. Here that means replacing SELECT borrar(r.id_padre,r.id_hijo); with PERFORM borrar(r.id_padre,r.id_hijo);.
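
To check what the recursion should delete once the call no longer fails, the same logic can be modelled in plain Python. This is only a sketch of the algorithm described in the question, not PL/pgSQL: the table is reduced to (id_padre, id_hijo) pairs, and the loop iterates over a snapshot of the matching rows, which is an assumption about how the function is meant to behave.

```python
# (id_padre, id_hijo) pairs from the question's table; the root has no parent.
copia_rel = [
    (None, 1), (1, 2), (1, 3), (3, 4), (4, 5), (4, 6), (4, 7), (4, 8),
    (8, 6), (8, 9), (8, 10), (8, 11), (3, 12), (12, 5), (12, 6),
    (12, 13), (12, 8),
]


def borrar(rel, padre, hijo):
    """Delete the link padre -> hijo, then any links orphaned below it."""
    rel[:] = [r for r in rel if r != (padre, hijo)]
    # Stop if some other parent still points at this child.
    if any(h == hijo for _, h in rel):
        return
    # Iterate over a snapshot, because the recursive calls mutate rel.
    for p, h in [r for r in rel if r[0] == hijo]:
        borrar(rel, p, h)  # the result is ignored, as with PERFORM


borrar(copia_rel, 3, 4)
print(copia_rel)
```

Under these assumptions, starting from (3, 4) also removes the four links below node 4, while nodes 5, 6 and 8 stay reachable through their other parents.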

How can I get the matrix for these tables?

I have two tables here and need to produce a matrix for all combinations
Table 1
Brand Company ID
1 1 1
2 2 2
3 3 3
Table 2
Prod1 Prod2 Prod3 Prod4 Prod5
4 5 6 18 19
5 6 7 20 5
The result I'm trying to achieve
Result table:
Brand Company ID Prod1 Prod2 Prod3 Prod4 Prod5
1 1 1 4 5 6 18 19
1 1 1 5 6 7 20 5
2 2 2 4 5 6 18 19
2 2 2 5 6 7 20 5
I could have worked this out if the tables shared some kind of ID, but without one I'm not sure how to approach getting the matrix.
Thank you
Not sure what happened to the third row from table1 and why it isn't in your expected result, but I think you are looking for a cross join.
select Brand, Company, ID, Prod1, Prod2, Prod3, Prod4, Prod5
from table1
cross join table2
rextester demo: http://rextester.com/UOZ33372
returns (with added order by):
+-------+---------+----+-------+-------+-------+-------+-------+
| Brand | Company | ID | Prod1 | Prod2 | Prod3 | Prod4 | Prod5 |
+-------+---------+----+-------+-------+-------+-------+-------+
| 1 | 1 | 1 | 4 | 5 | 6 | 18 | 19 |
| 1 | 1 | 1 | 5 | 6 | 7 | 20 | 5 |
| 2 | 2 | 2 | 4 | 5 | 6 | 18 | 19 |
| 2 | 2 | 2 | 5 | 6 | 7 | 20 | 5 |
| 3 | 3 | 3 | 4 | 5 | 6 | 18 | 19 |
| 3 | 3 | 3 | 5 | 6 | 7 | 20 | 5 |
+-------+---------+----+-------+-------+-------+-------+-------+
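
For intuition, a cross join is the Cartesian product of the two row sets. A small Python sketch with itertools.product on the sample rows shows the same six combinations; the tuple layout is just an illustration of the idea, not how the database executes the join.

```python
from itertools import product

# Rows from the question: (Brand, Company, ID) and (Prod1, ..., Prod5).
table1 = [(1, 1, 1), (2, 2, 2), (3, 3, 3)]
table2 = [(4, 5, 6, 18, 19), (5, 6, 7, 20, 5)]

# Pair every row of table1 with every row of table2 and
# concatenate the two tuples into one result row.
result = [left + right for left, right in product(table1, table2)]

for row in result:
    print(row)
```

Three rows times two rows gives six result rows, which is why no shared ID column is needed.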

Summary of row values according to summation of n number

I have a table like this
a | b
_____
1 | 1
2 | 2
3 | 3
4 | 4
5 | 5
and I want a result like this
a | b | c
_________
1 | 1 | 1
2 | 2 | 3
3 | 3 | 6
4 | 4 | 10
5 | 5 | 15
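
Judging from the sample, column c looks like the running total of column b (1, 1+2, 1+2+3, and so on). Assuming that reading is right, a minimal Python sketch of the calculation with itertools.accumulate would be:

```python
from itertools import accumulate

# Columns a and b from the sample table.
a = [1, 2, 3, 4, 5]
b = [1, 2, 3, 4, 5]

# accumulate() yields the running total of b, one value per row.
c = list(accumulate(b))

for row in zip(a, b, c):
    print(row)
```

In SQL the same idea is usually written as a windowed sum ordered by a, like the sum(part) OVER (ORDER BY id) expression in the Postgres question above.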