pandas.Series.mode sometimes returns multiple rows. How to fix it? - pandas

I have this df:
nome_socio  cnpj_cpf_socio  municipio
Alexandre   AAA             Curitiba
Alexandre   AAA             Rio
Alexandre   AAA             Porto Alegre
Bruno       BBB             Porto Alegre
Bruno       BBB             Porto Alegre
I want to get the mode of municipio for rows with the same nome_socio and cnpj_cpf_socio. For that I'm using the following code:
moda_municipio = (df[['nome_socio', 'cnpj_cpf_socio', 'municipio']]
                  .groupby(['nome_socio', 'cnpj_cpf_socio'])['municipio']
                  .apply(pd.Series.mode)
                  .to_frame()
                  .reset_index()
                  .rename(columns={'municipio': 'cidade_pred'}))
It does find the mode. However, because the three municipios are tied for the Alexandre + AAA rows, it returns three separate rows. I'm getting this result:
   nome_socio  cnpj_cpf_socio  level_2  cidade_pred
0  Alexandre   AAA             0        Curitiba
1  Alexandre   AAA             1        Porto Alegre
2  Alexandre   AAA             2        Rio
3  Bruno       BBB             0        Porto Alegre
I need to make it look like this:
nome_socio  cnpj_cpf_socio  level_2  cidade_pred
Alexandre   AAA             0        Curitiba, Porto Alegre, Rio
Bruno       BBB             0        Porto Alegre
Is there a way to do it?

Compute the mode of each group first, then join the tied values into one string:
df.groupby(['nome_socio', 'cnpj_cpf_socio'])['municipio'].agg(lambda x: ', '.join(x.mode().tolist())).reset_index(name='cidade_pred')
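A self-contained sketch of this approach, rebuilding the example df from the question. Series.mode returns every tied value in sorted order, so joining them produces exactly one row per (nome_socio, cnpj_cpf_socio) group. The result has no level_2 column, because that column only came from calling reset_index on the multi-row apply result.

```python
import pandas as pd

# Rebuild the example DataFrame from the question
df = pd.DataFrame({
    "nome_socio": ["Alexandre", "Alexandre", "Alexandre", "Bruno", "Bruno"],
    "cnpj_cpf_socio": ["AAA", "AAA", "AAA", "BBB", "BBB"],
    "municipio": ["Curitiba", "Rio", "Porto Alegre", "Porto Alegre", "Porto Alegre"],
})

# For each group, take all modes (ties included) and join them into a single
# comma-separated string, so each group yields exactly one row
moda_municipio = (
    df.groupby(["nome_socio", "cnpj_cpf_socio"])["municipio"]
    .agg(lambda s: ", ".join(s.mode().tolist()))
    .reset_index(name="cidade_pred")
)
print(moda_municipio)
```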

Related

Count common friends/nodes

I have Table1 below and want to make a table that shows, for each person in Ent1, how many friends they have in common with every other person in Ent1.
E.g. A (short for AAA) has friends B, C, D, E, F, while B has friends A, C, E, F. They have C, E, F in common, so the outcome should be AAA BBB 3.
Can this be done in Access/SQL? I have no clue how to start.
Table1
Ent1 Ent2 link
AAA BBB friend
AAA CCC friend
AAA DDD friend
AAA EEE friend
AAA FFF friend
BBB AAA friend
BBB CCC friend
BBB EEE friend
BBB FFF friend
CCC AAA friend
CCC BBB friend
CCC EEE friend
CCC FFF friend
DDD AAA friend
DDD KKK friend
DDD LLL friend
EEE AAA friend
EEE BBB friend
EEE CCC friend
FFF AAA friend
FFF BBB friend
FFF CCC friend
KKK DDD friend
LLL DDD friend
The outcome should be:
AAA BBB 3
AAA CCC 3
AAA DDD 0
AAA EEE 2
AAA FFF 2
AAA KKK 1
AAA LLL 1
BBB CCC 3 etc...
Yeah, it can be done. I'm not a fan of your table, and I suspect it could be normalized into two tables that would work better, but I can't work out how right now. So here's how to do it with your current table in SQL view:
select s.Person_1, s.Person_2, count(*) as tot
from
(select
a.Ent1 as Person_1
, a.Ent2 as Common_friend
, b.Ent1 as Person_2
from [Table1] as a
inner join [Table1] as b
on a.ent2 = b.ent2
where a.ent1 <> b.ent1) as s
group by s.Person_1, s.Person_2
The way this works is that the subquery s gets all the links you want to count by self-joining your table on the shared friend (Ent2); the condition a.ent1 <> b.ent1 stops a person from being counted as linked to themself. The outer query then counts the common friends for each pair. This code gives pseudo-duplicates, because it treats AAA as the first person and BBB as the second as different from BBB as the first and AAA as the second. I'm not coming up with an easy way to fix that, but the data you want will be there. Note that pairs with no friends in common (such as AAA and DDD) won't appear at all, because the inner join produces no rows for them.
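One possible way to drop the mirrored pairs, which is not part of the answer above, is to replace <> with <, so that only the alphabetically ordered version of each pair survives. The sketch below runs that variant (with the subquery flattened into a direct GROUP BY) against the question's data, using Python's built-in sqlite3 module as a stand-in for Access; Access SQL may differ in details.

```python
import sqlite3

# Friend links from Table1 in the question (the link column is always 'friend')
links = [
    ("AAA", "BBB"), ("AAA", "CCC"), ("AAA", "DDD"), ("AAA", "EEE"), ("AAA", "FFF"),
    ("BBB", "AAA"), ("BBB", "CCC"), ("BBB", "EEE"), ("BBB", "FFF"),
    ("CCC", "AAA"), ("CCC", "BBB"), ("CCC", "EEE"), ("CCC", "FFF"),
    ("DDD", "AAA"), ("DDD", "KKK"), ("DDD", "LLL"),
    ("EEE", "AAA"), ("EEE", "BBB"), ("EEE", "CCC"),
    ("FFF", "AAA"), ("FFF", "BBB"), ("FFF", "CCC"),
    ("KKK", "DDD"), ("LLL", "DDD"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Ent1 TEXT, Ent2 TEXT, link TEXT)")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, 'friend')", links)

# Self-join on the shared friend; '<' instead of '<>' keeps AAA/BBB
# but not the mirrored BBB/AAA
query = """
SELECT a.Ent1 AS Person_1, b.Ent1 AS Person_2, COUNT(*) AS tot
FROM Table1 AS a
INNER JOIN Table1 AS b ON a.Ent2 = b.Ent2
WHERE a.Ent1 < b.Ent1
GROUP BY a.Ent1, b.Ent1
ORDER BY a.Ent1, b.Ent1
"""
common = {(p1, p2): tot for p1, p2, tot in conn.execute(query)}
print(common)
```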

SQL Server 2008 - Fill Same Value Based On The Name

I am working on a query and having a hard time figuring out how to fill in the same value for every row that shares a name. Let me explain what I am trying to accomplish.
Say I have a table like the one below with two columns, Name and Value. "SELECT Name, Value FROM Table1 ORDER BY Name" produces the following result:
Table1
Name Value
AAA 111
AAA
BBB 222
BBB
BBB
BBB
CCC 333
CCC
DDD 444
DDD
DDD
What I want is a "SELECT ... FROM Table1" query that produces the result below:
Table1
Name Value
AAA 111
AAA 111
BBB 222
BBB 222
BBB 222
BBB 222
CCC 333
CCC 333
DDD 444
DDD 444
DDD 444
Please help and provide the sample code if possible.
Thanks in advance
You can use MAX() as a window function. MAX ignores NULLs, so every row gets the single non-NULL Value in its Name partition:
SELECT Name, Value,
MAX(Value) OVER (PARTITION BY Name) AS imputed_value
FROM Table1
ORDER BY Name;
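A quick way to try the window-function answer outside SQL Server is Python's sqlite3 module, used here only as a stand-in engine. This sketch assumes the blank cells in the question are NULL (not empty strings) and that the bundled SQLite is version 3.25 or later, which added window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Name TEXT, Value INTEGER)")
# Blank cells from the question are assumed to be NULL
rows = [
    ("AAA", 111), ("AAA", None),
    ("BBB", 222), ("BBB", None), ("BBB", None), ("BBB", None),
    ("CCC", 333), ("CCC", None),
    ("DDD", 444), ("DDD", None), ("DDD", None),
]
conn.executemany("INSERT INTO Table1 VALUES (?, ?)", rows)

# The window MAX is computed per Name partition and repeated on every row
query = """
SELECT Name, Value,
       MAX(Value) OVER (PARTITION BY Name) AS imputed_value
FROM Table1
ORDER BY Name
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```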

MS ACCESS query that returns rows where one column is the same but the other isn't

I have a table with the following columns:
DATE - CUSTOMER - COLOR - JOBNAME - ORDERNUM
I can't figure out how to write a query that returns rows with the same JOBNAME but different dates.
Let's say I have:
1/9 AAA GREEN JOHN 1235
1/9 AAA GREEN JOHN 1236
1/9 AAA GREEN JOHN 1237
1/8 AAA GREEN JOHN 1238
1/9 BBB ORANGE MATT 1239
1/9 BBB ORANGE MATT 1240
1/12 CCC PINK BRETT 1241
1/5 DDD YELLOW JASON 1242
1/5 DDD YELLOW JASON 1243
I want the query to return only
1/9 AAA GREEN JOHN 1235
1/9 AAA GREEN JOHN 1236
1/9 AAA GREEN JOHN 1237
1/8 AAA GREEN JOHN 1238
because they have the same JOBNAME but different dates.
I would start by getting the list of jobs with different dates (in Access, put the reserved words table and date in square brackets):
select jobname
from [table]
group by jobname
having min([date]) <> max([date]);
If you want the complete rows, use a JOIN, IN, or EXISTS:
select t.*
from [table] as t
where t.jobname in (select jobname
from [table]
group by jobname
having min([date]) <> max([date])
);
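Both steps of this answer can be checked against the question's sample rows with a sketch like the one below. It uses Python's sqlite3 module as a stand-in for Access (SQLite also accepts Access-style [bracketed] identifiers), and the table name Jobs is hypothetical. The dates are stored as text, so MIN and MAX compare strings; that is enough to detect differing dates but does not give a chronological minimum or maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table name 'Jobs'; [DATE] is bracketed as in Access
conn.execute(
    "CREATE TABLE Jobs ([DATE] TEXT, CUSTOMER TEXT, COLOR TEXT, JOBNAME TEXT, ORDERNUM INTEGER)"
)
rows = [
    ("1/9", "AAA", "GREEN", "JOHN", 1235),
    ("1/9", "AAA", "GREEN", "JOHN", 1236),
    ("1/9", "AAA", "GREEN", "JOHN", 1237),
    ("1/8", "AAA", "GREEN", "JOHN", 1238),
    ("1/9", "BBB", "ORANGE", "MATT", 1239),
    ("1/9", "BBB", "ORANGE", "MATT", 1240),
    ("1/12", "CCC", "PINK", "BRETT", 1241),
    ("1/5", "DDD", "YELLOW", "JASON", 1242),
    ("1/5", "DDD", "YELLOW", "JASON", 1243),
]
conn.executemany("INSERT INTO Jobs VALUES (?, ?, ?, ?, ?)", rows)

# Step 1: jobs whose smallest and largest date values differ
jobs = conn.execute("""
SELECT JOBNAME FROM Jobs
GROUP BY JOBNAME
HAVING MIN([DATE]) <> MAX([DATE])
""").fetchall()

# Step 2: every row belonging to those jobs
matches = conn.execute("""
SELECT t.* FROM Jobs AS t
WHERE t.JOBNAME IN (SELECT JOBNAME FROM Jobs
                    GROUP BY JOBNAME
                    HAVING MIN([DATE]) <> MAX([DATE]))
ORDER BY t.ORDERNUM
""").fetchall()
print(jobs)
print(matches)
```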
Did I completely misunderstand something, or are you not just looking for:
SELECT *
FROM MyTable
WHERE JobName = "JOHN"
That's basically what your result set is, and it fits your request.

update query question

I have this table, MEN:
Fname
aaa
bbb
ccc
aaa
aaa
bbb
ggg
I need a query that replaces every aaa with ZZZ:
Fname
ZZZ
bbb
ccc
ZZZ
ZZZ
bbb
ggg
How can I do this with an Oracle query?
Thanks in advance.
This works not only in Oracle:
update MEN
set Fname='ZZZ'
where Fname='aaa';
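Because this is plain standard SQL, it can be tried in any engine. The sketch below uses Python's sqlite3 module as a stand-in for Oracle, loads the MEN rows from the question, and runs the same UPDATE; only rows whose Fname is exactly 'aaa' change.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MEN (Fname TEXT)")
conn.executemany(
    "INSERT INTO MEN VALUES (?)",
    [(n,) for n in ["aaa", "bbb", "ccc", "aaa", "aaa", "bbb", "ggg"]],
)

# The WHERE clause limits the change to exact matches of 'aaa'
cur = conn.execute("UPDATE MEN SET Fname = 'ZZZ' WHERE Fname = 'aaa'")
conn.commit()
print(cur.rowcount, "rows updated")

# rowid preserves insertion order, so the list mirrors the original table
names = [r[0] for r in conn.execute("SELECT Fname FROM MEN ORDER BY rowid")]
print(names)
```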
UPDATE MEN SET Fname='ZZZ' WHERE Fname='aaa';
See http://psoug.org/reference/update.html

MSAccess: Ranking rows based upon column criteria

I have a dataset that looks like this:
Account Cost Centre TransNo
aaa 111 43443
aaa 111 32112
aaa 111 43211
aaa 112 32232
aaa 113 56544
bbb 222 43222
bbb 222 98332
ccc 111 88778
I need to add a column that is a running counter of the rows within each Account/Cost Centre combination, restarting at 1 for each new combination:
Account Cost Centre TransNo rCounter
aaa 111 43443 1
aaa 111 32112 2
aaa 111 43211 3
aaa 112 32232 1
aaa 112 56544 2
bbb 222 43222 1
bbb 222 98332 2
ccc 111 88778 1
Is this possible in MS Access using SQL, and if so, how would I go about it (i.e. what SQL would I need to write)?
Thanks in advance.
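Something like the following, which numbers the rows in each Account/Cost Centre group in ascending TransNo order (so the counter follows TransNo rather than the original row order):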
SELECT a.Account, a.[Cost Centre], a.TransNo, (SELECT Count(*)
FROM Table4 AS b
WHERE b.Account=a.Account
AND b.[Cost Centre]=a.[Cost Centre]
AND b.TransNo<=a.TransNo) AS rCounter
FROM Table4 AS a
ORDER BY a.Account, a.[Cost Centre], a.TransNo;
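The correlated subquery counts, for each row, how many rows in the same Account/Cost Centre group have a TransNo less than or equal to its own; that count is the running counter. The sketch below runs the same query on the question's input rows using Python's sqlite3 module as a stand-in for Access (SQLite accepts the [Cost Centre] bracket syntax). If two rows in a group shared a TransNo, they would get the same counter value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table4 (Account TEXT, [Cost Centre] INTEGER, TransNo INTEGER)")
rows = [
    ("aaa", 111, 43443), ("aaa", 111, 32112), ("aaa", 111, 43211),
    ("aaa", 112, 32232), ("aaa", 113, 56544),
    ("bbb", 222, 43222), ("bbb", 222, 98332),
    ("ccc", 111, 88778),
]
conn.executemany("INSERT INTO Table4 VALUES (?, ?, ?)", rows)

# For each row a, count rows b in the same group with TransNo <= a.TransNo
query = """
SELECT a.Account, a.[Cost Centre], a.TransNo,
       (SELECT COUNT(*)
        FROM Table4 AS b
        WHERE b.Account = a.Account
          AND b.[Cost Centre] = a.[Cost Centre]
          AND b.TransNo <= a.TransNo) AS rCounter
FROM Table4 AS a
ORDER BY a.Account, a.[Cost Centre], a.TransNo
"""
ranked = conn.execute(query).fetchall()
for row in ranked:
    print(row)
```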