Group items in a data frame using a condition - pandas

| ID | CUSTOMER_ID | LAST_TRAN_DATE | is_active | NO_OF_ACC |
|----|-------------|----------------|-----------|-----------|
| 1 | 1 | 3-Apr-15 | 0 | 5 |
| 2 | 2 | 26-Mar-04 | 0 | 4 |
| 3 | 2 | 25-Jul-14 | 0 | 4 |
| 4 | 2 | 3-Jan-13 | 0 | 4 |
| 5 | 2 | 28-Jun-13 | 0 | 4 |
| 6 | 3 | 19-Nov-08 | 0 | 3 |
| 7 | 3 | 21-May-09 | 0 | 3 |
| 8 | 3 | 24-Feb-12 | 0 | 3 |
| 9 | 1 | 1-Jun-16 | 0 | 5 |
| 10 | 1 | 8-Apr-19 | 1 | 5 |
| 11 | 1 | 25-Nov-17 | 0 | 5 |
| 12 | 1 | 22-Feb-19 | 1 | 5 |
My data looks like the above. I want to calculate the number of active accounts for each customer ID, store it in a new column, and show it on each row.
I used
df.groupby(['CUSTOMER_ID', 'is_active']).size()
which gave me the following result.
CUSTOMER_ID  is_active
1            0            3
             1            2
2            0            4
3            0            3
dtype: int64
But I don't know how to map these counts back onto each row as a new column.
Please help me

IIUC, you need to filter the active rows, sum NO_OF_ACC per customer with .groupby, and then use .map on CUSTOMER_ID to apply the result to every row of the dataframe. Customers without any active row get NaN:
df["active_accounts"] = df["CUSTOMER_ID"].map(
df[df["is_active"].eq(1)].groupby("CUSTOMER_ID")["NO_OF_ACC"].sum()
)
print(df)
ID CUSTOMER_ID LAST_TRAN_DATE is_active Count_Column NO_OF_ACC \
2 1 1 3-Apr-15 0 5 5
3 2 2 26-Mar-04 0 4 4
4 3 2 25-Jul-14 0 4 4
5 4 2 3-Jan-13 0 4 4
6 5 2 28-Jun-13 0 4 4
7 6 3 19-Nov-08 0 3 3
8 7 3 21-May-09 0 3 3
9 8 3 24-Feb-12 0 3 3
10 9 1 1-Jun-16 0 5 5
11 10 1 8-Apr-19 1 5 5
12 11 1 25-Nov-17 0 5 5
13 12 1 22-Feb-19 1 5 5
active_accounts
2 10.0
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
10 10.0
11 10.0
12 10.0
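Note that the snippet above sums NO_OF_ACC over the active rows (hence 10 for customer 1). If "number of active accounts" is instead meant as the number of rows with is_active == 1, a minimal sketch with GroupBy.transform could look like this. It assumes the sample data from the question, with LAST_TRAN_DATE left out because it is not used:

```python
import pandas as pd

# Sample data from the question (LAST_TRAN_DATE omitted; it is not needed here)
df = pd.DataFrame({
    "ID": range(1, 13),
    "CUSTOMER_ID": [1, 2, 2, 2, 2, 3, 3, 3, 1, 1, 1, 1],
    "is_active": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1],
    "NO_OF_ACC": [5, 4, 4, 4, 4, 3, 3, 3, 5, 5, 5, 5],
})

# is_active holds 0/1, so summing it per customer counts the active rows;
# transform broadcasts each group's total back to every row of that group.
df["active_accounts"] = df.groupby("CUSTOMER_ID")["is_active"].transform("sum")
print(df)
```

Because transform returns a Series aligned with the original index, no separate .map step is needed, and customers without active rows get 0 rather than NaN.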

Related

Restart cumsum in Pandas with condition

I have columns amount and assets. Column target should be the cumulative sum of amount, but the sum should restart from the current amount whenever the previous row's assets value is zero.
Sample:
+--------+--------+--------+
| amount | assets | target |
+--------+--------+--------+
| 6 | 10 | 6 |
| 8 | 20 | 14 |
| -1 | 0 | 13 |
| 6 | 1 | 6 |
| -7 | 0 | -1 |
| 2 | 4 | 2 |
| -5 | 7 | -3 |
| 3 | 9 | 0 |
| 7 | 0 | 7 |
| 9 | 2 | 9 |
| 1 | 3 | 10 |
| -4 | 5 | 6 |
+--------+--------+--------+
Use GroupBy.cumsum with group keys built by comparing assets with 0, shifting the result down one row with Series.shift, filling the leading NaN, and taking Series.cumsum:
g = df['assets'].eq(0).shift().bfill().cumsum()
#alternative
#g = df['assets'].eq(0).shift(fill_value=0).cumsum()
df['new'] = df.groupby(g)['amount'].cumsum()
print (df)
amount assets target new
0 6 10 6 6
1 8 20 14 14
2 -1 0 13 13
3 6 1 6 6
4 -7 0 -1 -1
5 2 4 2 2
6 -5 7 -3 -3
7 3 9 0 0
8 7 0 7 7
9 9 2 9 9
10 1 3 10 10
11 -4 5 6 6
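To see the reset rule that those group keys encode without pandas, here is a plain-Python sketch (the function name reset_cumsum is only for illustration), checked against the target column from the question:

```python
def reset_cumsum(amounts, assets):
    """Running total of amounts that restarts whenever the previous assets value is 0."""
    result = []
    total = 0
    prev_asset = None  # the first row has no previous row, so it never resets
    for amount, asset in zip(amounts, assets):
        if prev_asset == 0:
            total = 0  # previous assets was zero: a new group starts here
        total += amount
        result.append(total)
        prev_asset = asset
    return result


amount = [6, 8, -1, 6, -7, 2, -5, 3, 7, 9, 1, -4]
assets = [10, 20, 0, 1, 0, 4, 7, 9, 0, 2, 3, 5]
print(reset_cumsum(amount, assets))
```

A new group starting at the row after each zero is exactly what `eq(0).shift()` followed by `cumsum()` produces in the pandas version.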

Enumerating table partitions in Postgres table

Suppose I have a table like this:
id | part | value
----+-------+-------
1 | 0 | 8
2 | 0 | 3
3 | 0 | 4
4 | 1 | 6
5 | 0 | 13
6 | 0 | 4
7 | 1 | 2
8 | 0 | 11
9 | 0 | 15
10 | 0 | 3
11 | 0 | 2
I would like to enumerate the rows in each group of part = 0 rows, where every part = 1 row starts a new group.
Ultimately, I want to get this:
id | part | value | number
----+-------+-----------------
1 | 0 | 8 | 1
2 | 0 | 3 | 2
3 | 0 | 4 | 3
4 | 1 | 6 | 0
5 | 0 | 13 | 1
6 | 0 | 4 | 2
7 | 1 | 2 | 0
8 | 0 | 11 | 1
9 | 0 | 15 | 2
10 | 0 | 3 | 3
11 | 0 | 2 | 4
Is it possible to solve this with Postgres window functions or is there another way?
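Yes, that is simple with window functions: a running sum(part) gives each part = 1 row, and the part = 0 rows that follow it, a new group number, and row_number() then numbers the rows within each group: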
SELECT id, part, value,
row_number() OVER (PARTITION BY grp ORDER BY id) - 1 AS number
FROM (SELECT id, part, value,
sum(part) OVER (ORDER BY id) AS grp
FROM mytable
) AS q;
id | part | value | number
----+------+-------+--------
1 | 0 | 8 | 0
2 | 0 | 3 | 1
3 | 0 | 4 | 2
4 | 1 | 6 | 0
5 | 0 | 13 | 1
6 | 0 | 4 | 2
7 | 1 | 2 | 0
8 | 0 | 11 | 1
9 | 0 | 15 | 2
10 | 0 | 3 | 3
11 | 0 | 2 | 4
(11 rows)
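The output above numbers the first run 0, 1, 2, while the question asks for 1, 2, 3, because that run is not preceded by a part = 1 row. One way to get exactly the requested numbering, assuming part only ever holds 0 or 1, is to count the part = 0 rows seen so far in each group with sum(1 - part). This sketch runs that query through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer); the SQL itself should also work in PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, part INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO mytable (id, part, value) VALUES (?, ?, ?)",
    [(1, 0, 8), (2, 0, 3), (3, 0, 4), (4, 1, 6), (5, 0, 13), (6, 0, 4),
     (7, 1, 2), (8, 0, 11), (9, 0, 15), (10, 0, 3), (11, 0, 2)],
)

query = """
SELECT id, part, value,
       -- number of part = 0 rows seen so far in this group (assumes part is 0 or 1)
       sum(1 - part) OVER (PARTITION BY grp ORDER BY id) AS number
FROM (SELECT id, part, value,
             -- running sum of part: increases by one at every part = 1 row
             sum(part) OVER (ORDER BY id) AS grp
      FROM mytable) AS q
ORDER BY id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```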

Deleting recursively in a function (ERROR: query has no destination for result data)

I have this table of relationships (only id_padre and id_hijo are interesting):
id | id_padre | id_hijo | cantidad | posicion
----+----------+---------+----------+----------
0 | | 1 | 1 | 0
1 | 1 | 2 | 1 | 0
2 | 1 | 3 | 1 | 1
3 | 3 | 4 | 1 | 0
4 | 4 | 5 | 0.5 | 0
5 | 4 | 6 | 0.5 | 1
6 | 4 | 7 | 24 | 2
7 | 4 | 8 | 0.11 | 3
8 | 8 | 6 | 0.12 | 0
9 | 8 | 9 | 0.05 | 1
10 | 8 | 10 | 0.3 | 2
11 | 8 | 11 | 0.02 | 3
12 | 3 | 12 | 250 | 1
13 | 12 | 5 | 0.8 | 0
14 | 12 | 6 | 0.8 | 1
15 | 12 | 13 | 26 | 2
16 | 12 | 8 | 0.15 | 3
This table stores the links between nodes (id_padre = parent node, id_hijo = child node).
I'm trying to write a function that recursively deletes rows, starting from a particular row. After deleting it, I check whether any other rows have the same id_hijo value as the row I just deleted.
If there are no such rows, I must delete all the rows whose id_padre equals the id_hijo of the deleted row.
For example, if I begin by deleting the row where id_padre=3 and id_hijo=4, I delete this row:
id | id_padre | id_hijo | cantidad | posicion
----+----------+---------+----------+----------
3 | 3 | 4 | 1 | 0
and the table is left like this:
id | id_padre | id_hijo | cantidad | posicion
----+----------+---------+----------+----------
0 | | 1 | 1 | 0
1 | 1 | 2 | 1 | 0
2 | 1 | 3 | 1 | 1
4 | 4 | 5 | 0.5 | 0
5 | 4 | 6 | 0.5 | 1
6 | 4 | 7 | 24 | 2
7 | 4 | 8 | 0.11 | 3
8 | 8 | 6 | 0.12 | 0
9 | 8 | 9 | 0.05 | 1
10 | 8 | 10 | 0.3 | 2
11 | 8 | 11 | 0.02 | 3
12 | 3 | 12 | 250 | 1
13 | 12 | 5 | 0.8 | 0
14 | 12 | 6 | 0.8 | 1
15 | 12 | 13 | 26 | 2
16 | 12 | 8 | 0.15 | 3
Because there are no rows left with id_hijo = 4, I then delete the rows where id_padre = 4, and so on, recursively (in this example the process ends there).
I have tried to write this function (it calls itself):
CREATE OR REPLACE FUNCTION borrar(integer,integer) RETURNS VOID AS
$BODY$
DECLARE
padre ALIAS FOR $1;
hijo ALIAS FOR $2;
r copia_rel%rowtype;
BEGIN
DELETE FROM copia_rel WHERE id_padre = padre AND id_hijo = hijo;
IF NOT EXISTS (SELECT id_hijo FROM copia_rel WHERE id_hijo = hijo) THEN
FOR r IN SELECT * FROM copia_rel WHERE id_padre = hijo LOOP
RAISE NOTICE 'Selecciono: %,%',r.id_padre,r.id_hijo;--for debugging
SELECT borrar(r.id_padre,r.id_hijo);
END LOOP;
END IF;
END;
$BODY$
LANGUAGE plpgsql;
But I get this error:
ERROR: query has no destination for result data
I know that PostgreSQL has a specific way to do recursion with CTEs. I have used it to traverse my graph, but I don't know how I could use it in this case.
The error is due to the SELECT used to call the function recursively. PostgreSQL wants to put the results somewhere but is not told where.
If you want to run a function and discard its result, use PERFORM instead of SELECT in PL/pgSQL functions.
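With that fix, the recursive call inside the loop becomes `PERFORM borrar(r.id_padre, r.id_hijo);`. To check which rows the corrected function should remove in the example, here is a plain-Python simulation of the same logic. It is a sketch of the algorithm, not of PL/pgSQL itself: an in-memory list stands in for copia_rel, keeping only the id, id_padre and id_hijo columns:

```python
# (id, id_padre, id_hijo) from the question; None stands for the empty id_padre
rows = [
    (0, None, 1), (1, 1, 2), (2, 1, 3), (3, 3, 4), (4, 4, 5), (5, 4, 6),
    (6, 4, 7), (7, 4, 8), (8, 8, 6), (9, 8, 9), (10, 8, 10), (11, 8, 11),
    (12, 3, 12), (13, 12, 5), (14, 12, 6), (15, 12, 13), (16, 12, 8),
]


def borrar(table, padre, hijo):
    """Delete the (padre, hijo) link, then recurse into hijo's children if hijo is orphaned."""
    table[:] = [r for r in table if not (r[1] == padre and r[2] == hijo)]
    if not any(r[2] == hijo for r in table):
        # Collect the children before recursing, since the recursion mutates the table
        children = [r for r in table if r[1] == hijo]
        for _, child_padre, child_hijo in children:
            borrar(table, child_padre, child_hijo)


borrar(rows, 3, 4)
print(sorted(r[0] for r in rows))
```

Starting from (3, 4), the simulation removes rows 3 to 7: rows 4 to 7 are the links out of node 4, the recursion stops at nodes 5, 6 and 8 because other rows still point to them, and node 7 has no children of its own.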

SQL how to force to display row with 0 if no data available?

My query returns results like the following (a row is skipped when a particular ID has no data for that HourOfDay):
ID HourOfDay Counts
--------------------------
1 5 5
1 13 10
1 23 3
... (HourOfDay continues up to 23)
2 9 1
and so on.
What I am trying to achieve is to force rows showing 0 to appear for every HourOfDay that has no data, like this:
ID HourOfDay Counts
--------------------------
1 0 0
1 1 0
1 2 0
1......................
1 5 5
1 6 0
1......................
1 23 3
2 0 0
2 1 0
etc.
I have researched this, and it looks like I can achieve this result by creating an extra table and outer joining it. So I created a table variable in the stored procedure (as a temporary workaround):
DECLARE @Hours TABLE
(
[Hour] INT NULL
);
INSERT INTO @Hours VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)
,(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23);
However, no matter how I join it, it does not achieve the desired result.
How do I proceed? Do I add extra columns to join on? Or do I need a completely different approach? Any hint in the right direction is appreciated!
Using a derived table for the distinct Ids cross joined to @Hours, left joined to your table:
select
i.Id
, h.Hour
, coalesce(t.Counts,0) as Counts
from (select distinct Id from t) as i
cross join @Hours as h
left join t
on i.Id = t.Id
and h.Hour = t.HourOfDay
rextester demo: http://rextester.com/XFZYX88502
returns:
+----+------+--------+
| Id | Hour | Counts |
+----+------+--------+
| 1 | 0 | 0 |
| 1 | 1 | 0 |
| 1 | 2 | 0 |
| 1 | 3 | 0 |
| 1 | 4 | 0 |
| 1 | 5 | 5 |
| 1 | 6 | 0 |
| 1 | 7 | 0 |
| 1 | 8 | 0 |
| 1 | 9 | 0 |
| 1 | 10 | 0 |
| 1 | 11 | 0 |
| 1 | 12 | 0 |
| 1 | 13 | 10 |
| 1 | 14 | 0 |
| 1 | 15 | 0 |
| 1 | 16 | 0 |
| 1 | 17 | 0 |
| 1 | 18 | 0 |
| 1 | 19 | 0 |
| 1 | 20 | 0 |
| 1 | 21 | 0 |
| 1 | 22 | 0 |
| 1 | 23 | 3 |
| 2 | 0 | 0 |
| 2 | 1 | 0 |
| 2 | 2 | 0 |
| 2 | 3 | 0 |
| 2 | 4 | 0 |
| 2 | 5 | 0 |
| 2 | 6 | 0 |
| 2 | 7 | 0 |
| 2 | 8 | 0 |
| 2 | 9 | 1 |
| 2 | 10 | 0 |
| 2 | 11 | 0 |
| 2 | 12 | 0 |
| 2 | 13 | 0 |
| 2 | 14 | 0 |
| 2 | 15 | 0 |
| 2 | 16 | 0 |
| 2 | 17 | 0 |
| 2 | 18 | 0 |
| 2 | 19 | 0 |
| 2 | 20 | 0 |
| 2 | 21 | 0 |
| 2 | 22 | 0 |
| 2 | 23 | 0 |
+----+------+--------+
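The same cross join / left join pattern can also be tried without a helper table by generating the hours with a recursive CTE. The sketch below uses Python's sqlite3 module and only the four sample rows visible in the question, so the SQL is SQLite syntax; in SQL Server the CTE is written with plain WITH (there is no RECURSIVE keyword):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Id INTEGER, HourOfDay INTEGER, Counts INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 5, 5), (1, 13, 10), (1, 23, 3), (2, 9, 1)],
)

query = """
WITH RECURSIVE hours(Hour) AS (
    SELECT 0                                          -- anchor: hour 0
    UNION ALL
    SELECT Hour + 1 FROM hours WHERE Hour < 23        -- hours 1..23
)
SELECT i.Id, h.Hour, COALESCE(t.Counts, 0) AS Counts  -- 0 where no row matched
FROM (SELECT DISTINCT Id FROM t) AS i
CROSS JOIN hours AS h                                 -- every Id paired with every hour
LEFT JOIN t
  ON i.Id = t.Id
 AND h.Hour = t.HourOfDay
ORDER BY i.Id, h.Hour
"""
rows = conn.execute(query).fetchall()
print(len(rows))
```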

Summary of row values according to summation of n number

I have a table like this
a | b
_____
1 | 1
2 | 2
3 | 3
4 | 4
5 | 5
and I want a result like this, where c is the running total of b:
a | b | c
_________
1 | 1 | 1
2 | 2 | 3
3 | 3 | 6
4 | 4 | 10
5 | 5 | 15
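Column c here is the running total of b in order of a. A minimal sketch with a windowed SUM, assuming a table named tbl (a hypothetical name) and a database that supports window functions, run through Python's sqlite3 module:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [(i, i) for i in range(1, 6)])

# With ORDER BY and no explicit frame, SUM covers all rows up to the current one
# (rows with equal a values would be summed together as peers).
rows = conn.execute(
    "SELECT a, b, SUM(b) OVER (ORDER BY a) AS c FROM tbl ORDER BY a"
).fetchall()
print(rows)
```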