Create a new column for group based on condition - pandas

I want to create a new column (Group ID) based on the following condition:
If the DOB and the first three letters of Name are the same, the rows must fall in the same Group ID.
Name           DOB         Group ID
Anny           18-01-1922  0
Anny Scott     01-01-1950  1
Annie          01-01-1950  1
David          14-02-1950  2
David Kern     15-02-1951  3
William Perry  15-02-1953  4
Kenneth Field  15-02-1953  5
This is how I want the groups to be created.
I have used the following code to create the group ID for the name (rows match if their first three letters are the same):
df['Group ID Name']=df.groupby(df['name'].str[:3]).ngroup()
The following code creates the group ID for the DOB (rows match if they have the same DOB):
df['Group ID DOB']=df.groupby('Date of Birth').ngroup()
I want to use both conditions together to create the Group ID. Please help me with this.

Pass both keys to groupby as a list, and add sort=False so the group numbers follow the order in which each group first appears:
df['Group ID Name'] = df.groupby(['DOB',df['Name'].str[:3]], sort=False).ngroup()
print (df)
            Name         DOB  Group ID  Group ID Name
0           Anny  18-01-1922         0              0
1     Anny Scott  01-01-1950         1              1
2          Annie  01-01-1950         1              1
3          David  14-02-1950         2              2
4     David Kern  15-02-1951         3              3
5  William Perry  15-02-1953         4              4
6  Kenneth Field  15-02-1953         5              5
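A self-contained sketch of the same approach, rebuilding the sample data from the question so it can be run directly. The `prefix` name is only an illustrative helper and is not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Name": ["Anny", "Anny Scott", "Annie", "David",
             "David Kern", "William Perry", "Kenneth Field"],
    "DOB": ["18-01-1922", "01-01-1950", "01-01-1950", "14-02-1950",
            "15-02-1951", "15-02-1953", "15-02-1953"],
})

# Hypothetical helper key: the first three letters of each name.
prefix = df["Name"].str[:3].rename("prefix")

# Group on DOB and the name prefix together; sort=False numbers the
# groups in order of first appearance instead of sorted key order.
df["Group ID"] = df.groupby(["DOB", prefix], sort=False).ngroup()

print(df)
```

Rows 1 and 2 share both the DOB and the prefix "Ann", so they receive the same id, while William Perry and Kenneth Field share a DOB but not a prefix and stay in separate groups.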

Related

Sql: add a column with integers in a loop for duplicates

I have a sql table like:
ID Name Balance
1 Peter 324.5
2 Michael 122.7
3 Peter 788.3
4 Mark 45.7
5 Ralph 333.5
6 Thomas 563.2
7 Ralph 9685.1
8 Peter 2444.5
9 Susi 35.2
10 Andrew 442.5
11 Susi 2424.8
Is it possible to write a while loop in SQL that adds a new column with an integer (for example 1 to 3) for each duplicated name (Peter appears 3 times, Susi 2 times, Ralph 2 times)? For names that are not duplicated, the value should be 0.
So the final table should look like this:
ID Name Balance Value
1 Peter 324.5 1
2 Michael 122.7 0
3 Peter 788.3 1
4 Mark 45.7 0
5 Ralph 333.5 2
6 Thomas 563.2 0
7 Ralph 9685.1 2
8 Peter 2444.5 1
9 Susi 35.2 3
10 Andrew 442.5 0
11 Susi 2424.8 3
You wouldn't want to use a while loop for this. Just use window functions:
select t.*, count(*) over (partition by name) as cnt
from t;
This provides the total count for each name. If you want an incremental value, you can use row_number():
select t.*, row_number() over (partition by name order by id) as seqnum
from t;
This would enumerate the rows for each name, so every name would have a "1" value, some would have "2" and so on.
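To try both window functions on the sample table, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions); the table name `t` follows the answer, and the rest of the setup is illustrative.

```python
import sqlite3

# Sample rows copied from the question: (id, name, balance).
rows = [
    (1, "Peter", 324.5), (2, "Michael", 122.7), (3, "Peter", 788.3),
    (4, "Mark", 45.7), (5, "Ralph", 333.5), (6, "Thomas", 563.2),
    (7, "Ralph", 9685.1), (8, "Peter", 2444.5), (9, "Susi", 35.2),
    (10, "Andrew", 442.5), (11, "Susi", 2424.8),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# cnt: how many rows share the name; seqnum: position within the name,
# ordered by id. No loop is needed.
query = """
SELECT id, name,
       COUNT(*)     OVER (PARTITION BY name)             AS cnt,
       ROW_NUMBER() OVER (PARTITION BY name ORDER BY id) AS seqnum
FROM t
ORDER BY id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Names that occur once come back with cnt = 1, so a CASE expression on cnt is one way to map them to 0 if that is what the final column needs.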

Using a Case statement for an alias

I'm trying to change the header of a column based on a variable.
Currently I have:
SELECT
(CASE
WHEN GROUPING(CASE ##Role
WHEN 2 THEN Processor
WHEN 3 THEN Reviewer
END) = 1
THEN 'Total'
ELSE (CASE ##Role
WHEN 2 THEN Processor
WHEN 3 THEN Reviewer
END)
END) AS 'User',
COUNT(EntityId) AS 'Tickets Processed'
FROM
table
WHERE
conditions
GROUP BY
CASE ##Role
WHEN 2 THEN Processor
WHEN 3 THEN Reviewer
END WITH ROLLUP
Right now this returns the data I need for the correct role. However, is there a way to change the second column's header based on the variable, to something like:
COUNT(EntityId) AS CASE ##Role
WHEN 2 THEN 'Tickets Processed'
WHEN 3 THEN 'Tickets Reviewed'
END
EDIT:
Sample of current result:
##Role = 2 or ##Role = 3
Both return:
User Tickets Processed
-----------------------------
Steve 1
Gerald 3
John 1
Paul 2
Peter 5
Total 12
Desired result:
##Role = 2
User Tickets Processed
-----------------------------
Steve 1
Gerald 3
John 1
Paul 2
Peter 5
Total 12
##Role = 3
User Tickets Reviewed
-----------------------------
Steve 1
Gerald 3
John 1
Paul 2
Peter 5
Total 12
Sample data
EntityID Processor Reviewer
----------------------------------
1 Peter Bob
2 Peter Paul
3 Peter Bob
4 John Paul
5 Peter Bob
6 Peter Bob
...
You can either use dynamic SQL, or you can split the logic based on the ##Role variable:
IF ##Role = 2 THEN {do Query A}
ELSE {do Query B}
But you definitely cannot base the column alias on the value of a variable in the context of a non-dynamic query.
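One way to picture the dynamic SQL option is to build the query text on the client side, choosing the alias before the statement is sent. The sketch below does this in Python against an in-memory SQLite database; the `tickets` table name, the `ROLE_CONFIG` mapping and `build_query` are hypothetical, and the ROLLUP total is left out because SQLite does not support WITH ROLLUP.

```python
import sqlite3

# Hypothetical whitelist: role -> (grouping column, header for the count).
# Identifiers and aliases cannot be bound as parameters, so they are taken
# only from this fixed mapping, never from raw user input.
ROLE_CONFIG = {
    2: ("Processor", "Tickets Processed"),
    3: ("Reviewer", "Tickets Reviewed"),
}

def build_query(role: int) -> str:
    column, header = ROLE_CONFIG[role]  # raises KeyError for an unknown role
    return (
        f'SELECT {column} AS "User", COUNT(EntityId) AS "{header}" '
        f"FROM tickets GROUP BY {column} ORDER BY {column}"
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (EntityId INTEGER, Processor TEXT, Reviewer TEXT)")
# Sample data copied from the question.
conn.executemany("INSERT INTO tickets VALUES (?, ?, ?)", [
    (1, "Peter", "Bob"), (2, "Peter", "Paul"), (3, "Peter", "Bob"),
    (4, "John", "Paul"), (5, "Peter", "Bob"), (6, "Peter", "Bob"),
])

cur = conn.execute(build_query(3))
headers = [d[0] for d in cur.description]  # column headers of the result
rows = cur.fetchall()
print(headers)
print(rows)
```

The same idea applies inside the database: the statement text is assembled first, with the alias already fixed, and only then executed.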

How to add values to a pandas dataframe column depending on the value of a column in another dataframe

I have a pandas dataframe like the one below. It contains unique user_id values and the corresponding user_name.
df
user_id user_name
1 Jack
2 Neil
3 Peter
4 Smith
5 Neev
And I have second dataframe something like this
df1
user_id item_id user_name
1 23 Null
1 24 Null
2 34 Null
3 35 Null
5 45 Null
I want to fill the user_name column above from the first dataframe, so that wherever user_id matches, the corresponding user_name is entered in that position.
It should look like this:
df1
user_id item_id user_name
1 23 Jack
1 24 Jack
2 34 Neil
3 35 Peter
5 45 Neev
I am doing the following in Python:
b = df.user_name[df['user_id'].isin(df1['user_id'])]
df1['user_name'] = b
But it drops duplicates, which I don't want. Please help.
Use merge:
In [299]:
df1[['user_id','item_id']].merge(df,on='user_id')
Out[299]:
user_id item_id user_name
0 1 23 Jack
1 1 24 Jack
2 2 34 Neil
3 3 35 Peter
4 5 45 Neev
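A runnable sketch of the merge, plus an alternative using `Series.map` that fills the column in place. The attempt in the question most likely fails because assigning a Series to a column aligns on the index rather than on user_id. Passing `how="left"` is an addition here, chosen so that every row of df1 is kept even if a user_id has no match.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"user_id": [1, 2, 3, 4, 5],
                   "user_name": ["Jack", "Neil", "Peter", "Smith", "Neev"]})
df1 = pd.DataFrame({"user_id": [1, 1, 2, 3, 5],
                    "item_id": [23, 24, 34, 35, 45]})

# Option 1: left merge keeps every row of df1, including repeated user_ids.
merged = df1.merge(df, on="user_id", how="left")

# Option 2: look up each user_id in a Series indexed by user_id.
# This keeps df1's own index and row order untouched.
lookup = df.set_index("user_id")["user_name"]
df1["user_name"] = df1["user_id"].map(lookup)

print(merged)
print(df1)
```

Both options return Jack twice for user_id 1, since the lookup is done per row of df1 rather than per row of df.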

Row aggregation of count-distinct measure

I have a fairly simple project set up to demonstrate what I want here. Here's the data:
Group
ID Name
1 Group 1
2 Group 2
3 Group 3
Person
ID GroupID Age Name
1 1 18 John
2 1 21 Stephen
3 1 18 Kate
4 2 18 Mary
5 2 19 Joseph
6 2 19 Michael
7 3 21 David
8 3 22 Kevin
9 3 21 Julian
I have one measure in my cube called Person Count, which is a distinct count on Person ID.
I have set up each non-ID column in the dimensions as attributes (Age, Person Name, Group).
When I process and browse the cube in Business Intelligence Development Studio, I get the following result set:
But what I actually want is for the Age rows to aggregate the Person Count, so here it should show 2 and only one row for 18.
Is this possible (and how)?
Turns out this was a problem with the way I set up the Age attribute for the dimension.
I had:
KeyColumns = Person.ID
ValueColumn = Person.Age.
I don't know why I did this, but the solution is to delete the content of ValueColumn and set the KeyColumns to Person.Age again.
I now get the following result:
Everything else is the same for the project; this was the only change and is exactly what I wanted. If I get any issues with it I will keep this post updated for anyone else who may run into this in the future.

SQL comparing two tables with a common id, where the id in table 2 could be in two different columns

Given the following SQL tables:
Administrators:
id Name rating
1 Jeff 48
2 Albert 55
3 Ken 35
4 France 56
5 Samantha 52
6 Jeff 50
Meetings:
id originatorid Assistantid
1 3 5
2 6 3
3 1 2
4 6 4
I would like to generate a table from Ken's point of view (id=3), so his id may appear in either of two columns of the Meetings table. (The IN statement does not work, since two different columns are involved.)
Thus the output would be:
id originatorid Assistantid
1 3 5
2 6 3
If you really just need to see which column Ken's id is in, you only need an OR. The following will produce your example output exactly.
SELECT * FROM Meetings WHERE originatorid = 3 OR Assistantid = 3;
If you need to take the complex route and list names along with meetings, an OR in your join's ON clause should work here:
SELECT
Administrators.name,
Administrators.id,
Meetings.originatorid,
Meetings.Assistantid
FROM Administrators
JOIN Meetings
ON Administrators.id = Meetings.originatorid
OR Administrators.id = Meetings.Assistantid
WHERE Administrators.name = 'Ken';
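The simple OR filter can be checked against the sample tables with Python's built-in sqlite3 module. This is only a sketch on an in-memory database; the `admin_id` variable is illustrative. It also shows that IN can be used after all, by putting the value on the left and the two columns in the list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Administrators (id INTEGER PRIMARY KEY, name TEXT, rating INTEGER)")
conn.execute("CREATE TABLE Meetings (id INTEGER PRIMARY KEY, originatorid INTEGER, Assistantid INTEGER)")

# Sample data copied from the question.
conn.executemany("INSERT INTO Administrators VALUES (?, ?, ?)", [
    (1, "Jeff", 48), (2, "Albert", 55), (3, "Ken", 35),
    (4, "France", 56), (5, "Samantha", 52), (6, "Jeff", 50),
])
conn.executemany("INSERT INTO Meetings VALUES (?, ?, ?)", [
    (1, 3, 5), (2, 6, 3), (3, 1, 2), (4, 6, 4),
])

admin_id = 3  # Ken

# Match the id in either column with OR.
with_or = conn.execute(
    "SELECT * FROM Meetings WHERE originatorid = ? OR Assistantid = ? ORDER BY id",
    (admin_id, admin_id),
).fetchall()

# Equivalent form: the value on the left of IN, the columns in the list.
with_in = conn.execute(
    "SELECT * FROM Meetings WHERE ? IN (originatorid, Assistantid) ORDER BY id",
    (admin_id,),
).fetchall()

print(with_or)
print(with_in)
```

Both queries return meetings 1 and 2, matching the output requested in the question.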