Insert column names into a DataFrame read from a URL - pandas

I want to apply this list of names ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class'] as the column names of my DataFrame, which is read from this URL: https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv
I want to create a DataFrame with these column names.
Thanks.

import pandas as pd
df = pd.read_csv('https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv',
                 names=['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class'])
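A minimal offline sketch of the same idea, assuming two sample rows in the dataset's format stand in for the URL (the `sample` string and the variable names are only for illustration). Passing `names` tells `read_csv` to treat the file as having no header row; reading with `header=None` and assigning `df.columns` afterwards should give the same result.

```python
import io
import pandas as pd

names = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

# Illustrative stand-in for the remote CSV, which has no header row.
sample = "6,148,72,35,0,33.6,0.627,50,1\n1,85,66,29,0,26.6,0.351,31,0\n"

# Option 1: supply the names while reading.
df_names = pd.read_csv(io.StringIO(sample), names=names)

# Option 2: read without a header, then assign the names.
df_cols = pd.read_csv(io.StringIO(sample), header=None)
df_cols.columns = names
```

Either form can be pointed at the URL instead of `io.StringIO(sample)`.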

Related

How to combine two columns into one in a DataFrame

I have a DataFrame with two columns and want to combine their contents into a third column. In the third column, I would like every 'hello' entry to be replaced by the corresponding non-'hello' term. But my code just returns df['C'] as the string concatenation of df['A'] and df['B'].
df = pd.DataFrame({'A': ['here', 'there', 'hello', 'hello', 'hello'],
                   'B': ['hello', 'hello', 'go', 'click', 'clack']})
df['C'] = df['A'] + df['B']
I will be glad if someone can help me.
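One possible approach, sketched under the assumption that each row holds 'hello' in exactly one of the two columns: keep the value from A where it is not 'hello', and fall back to B otherwise, using `Series.where`.

```python
import pandas as pd

df = pd.DataFrame({'A': ['here', 'there', 'hello', 'hello', 'hello'],
                   'B': ['hello', 'hello', 'go', 'click', 'clack']})

# Keep A where it differs from 'hello'; otherwise take the value from B.
df['C'] = df['A'].where(df['A'] != 'hello', df['B'])
```

If both columns contain 'hello' in the same row, this sketch leaves 'hello' in C.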

Explode 1 column in dataframe that has a list of dictionaries, each dictionary should be a new column

I have a DataFrame with a column that looks like this:
attachments.data                                                                                Heading 2
[{'title': 'Test Title for Testing', 'unshimmed_url': 'https://www.etc.com'}]                   34
[{'title': 'This is another Test Title for Testing', 'unshimmed_url': 'https://www.etc2.com'}]  42
I would like to split title out into a new column called name, and unshimmed_url into a new column called link.
I've tried this, but I think I'm missing a step, because I lose the Heading 2 and Heading 3 columns and am left with just the name and link columns.
s = df['attachments.data'].explode()
calcu = pd.DataFrame(s.tolist(), index=s.index)
df2 = calcu.rename(columns={'title': 'name', 'unshimmed_url': 'link'})
Pop the attachments.data column, create the DataFrame in the same way, and then join it back to df:
s = df.pop('attachments.data').explode()
df = df.join(
    pd.DataFrame(s.tolist(), index=s.index)
    .rename(columns={'title': 'name', 'unshimmed_url': 'link'})
)
df:
Heading 2 name link
0 34 Test Title for Testing https://www.etc.com
1 42 This is another Test Title for Testing https://www.etc2.com
Or, without modifying df, drop the column and create a new DataFrame:
s = df['attachments.data'].explode()
df2 = df.drop(columns='attachments.data').join(
    pd.DataFrame(s.tolist(), index=s.index)
    .rename(columns={'title': 'name', 'unshimmed_url': 'link'})
)
Or use apply(pd.Series) to build the new columns:
df2 = df.drop(columns='attachments.data').join(
    df['attachments.data'].explode()
    .apply(pd.Series)
    .rename(columns={'title': 'name', 'unshimmed_url': 'link'})
)
df2:
Heading 2 name link
0 34 Test Title for Testing https://www.etc.com
1 42 This is another Test Title for Testing https://www.etc2.com
DataFrame and imports:
import pandas as pd
df = pd.DataFrame({
    'attachments.data': [
        [{'title': 'Test Title for Testing',
          'unshimmed_url': 'https://www.etc.com'}],
        [{'title': 'This is another Test Title for Testing',
          'unshimmed_url': 'https://www.etc2.com'}]],
    'Heading 2': [34, 42]
})
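As a further sketch, here is how the explode-and-join pattern behaves when a list holds more than one dictionary. The data below is hypothetical (short titles and example.com URLs chosen only for illustration). Because explode repeats the original index label for each list element, the join repeats the other columns for each attachment.

```python
import pandas as pd

# Hypothetical data: the first row has two attachments, the second has one.
df = pd.DataFrame({
    'attachments.data': [
        [{'title': 'a', 'unshimmed_url': 'https://example.com/a'},
         {'title': 'b', 'unshimmed_url': 'https://example.com/b'}],
        [{'title': 'c', 'unshimmed_url': 'https://example.com/c'}]],
    'Heading 2': [34, 42]
})

# One dictionary per row; index labels repeat for multi-item lists.
s = df['attachments.data'].explode()

out = df.drop(columns='attachments.data').join(
    pd.DataFrame(s.tolist(), index=s.index)
    .rename(columns={'title': 'name', 'unshimmed_url': 'link'})
)
```

Here `out` has three rows, with the Heading 2 value 34 appearing twice. Rows whose list is empty would need separate handling, which this sketch does not cover.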

GroupBy Function Not Applying

I am trying to group by Enterprise ID for the following specializations, but I am not getting the expected result (or any result, for that matter). The data stays ungrouped even after this step. Any idea what's wrong with my code?
cols_specials = ['Enterprise ID','Specialization','Specialization Branches','Specialization Type']
specials = pd.read_csv(agg_specials, engine='python')
specials = specials.merge(roster, left_on='Enterprise ID', right_on='Enterprise ID', how='left')
specials = specials[cols_specials]
specials = specials.groupby(['Enterprise ID'])['Specialization'].transform(lambda x: '; '.join(str(x)))
specials.to_csv(end_report_specials, index=False, encoding='utf-8-sig')
Please try using agg:
import pandas as pd
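df = pd.DataFrame(
    [
        ['john', 'eng', 'build'],
        ['john', 'math', 'build'],
        ['kevin', 'math', 'asp'],
        ['nick', 'sci', 'spi']
    ],
    columns=['id', 'spec', 'type']
)
df.groupby(['id'])[['spec']].agg(lambda x: ';'.join(x))
This returns one row per id, with that id's spec values joined by ';' (eng;math for john, math for kevin, sci for nick).
If you need to preserve the original number of rows, use transform instead. transform returns a result with the same index as the input, so it can be assigned as a new column: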
df['spec_grouped'] = df.groupby(['id'])[['spec']].transform(lambda x: ';'.join(x))
df
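This keeps the original four rows and adds a spec_grouped column (eng;math on both john rows, math for kevin, sci for nick).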
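For reference, a sketch of the same fix using the question's column names, with a small hypothetical frame in place of the merged CSV data. The likely culprit in the original code is `str(x)`: it converts the whole Series to its printed form, so `join` iterates over individual characters. Converting each element with `astype(str)` avoids that.

```python
import pandas as pd

# Hypothetical stand-in for the merged 'specials' data.
specials = pd.DataFrame({
    'Enterprise ID': ['a1', 'a1', 'b2'],
    'Specialization': ['Cloud', 'Security', 'Data'],
})

# agg collapses each group to one row; astype(str) converts element-wise,
# so join receives the individual values rather than one printed string.
joined = (
    specials.groupby('Enterprise ID')['Specialization']
    .agg(lambda x: '; '.join(x.astype(str)))
    .reset_index()
)
```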

Grouping and heading pandas dataframe

I have the following DataFrame of securities, with a computed 'liquidity score' in the last column, where 1 = liquid, 2 = less liquid, and 3 = illiquid. I want to group the securities (dynamically) by their liquidity. Is there a way to group them and include some kind of header for each group? How can this best be achieved? Below is the code, with an example of how the result should look.
import pandas as pd
df = pd.DataFrame({'ID':['XS123', 'US3312', 'DE405'], 'Currency':['EUR', 'EUR', 'USD'], 'Liquidity score':[2,3,1]})
df = df.sort_values(by=["Liquidity score"])
print(df)
# 1 = liquid, 2 = less liquid, 3 = illiquid
Add labels for liquidity score
The following maps the numbers in Liquidity score to labels in a new column:
df['grp'] = df['Liquidity score'].replace({1:'Liquid', 2:'Less liquid', 3:'Illiquid'})
Headers for each group
As per your comment, find below a solution to do this.
Let's illustrate this with a small data example.
df = pd.DataFrame({'ID':['XS223', 'US934', 'US905', 'XS224', 'XS223'], 'Currency':['EUR', 'USD', 'USD','EUR','EUR',]})
Insert a header row at specific positions using np.insert (this requires import numpy as np).
df = pd.DataFrame(np.insert(df.values, 0, values=["Liquid", ""], axis=0))
df = pd.DataFrame(np.insert(df.values, 2, values=["Less liquid", ""], axis=0))
df.columns = ['ID', 'Currency']
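Using the pandas Styler, we can add a background color, make the font bold, and align the text to the left. Note that hide_index() was removed in pandas 2.0; on pandas 1.4 and later, use hide(axis="index") instead.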
df.style.hide_index().set_properties(subset = pd.IndexSlice[[0,2], :], **{'font-weight' : 'bold', 'background-color' : 'lightblue', 'text-align': 'left'})
You can add a new column like this:
df['group'] = np.select(
    [
        df['Liquidity score'].eq(1),
        df['Liquidity score'].eq(2)
    ],
    [
        'Liquid', 'Less liquid'
    ],
    default='Illiquid'
)
Then try setting the new column as part of the index, so you can filter on it:
df.set_index(['group', 'ID'], inplace=True)
df.loc['Less liquid',:]
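Putting these steps together, a minimal end-to-end sketch on the question's data might look like the following. It uses `map` with a dictionary in place of `np.select` (a substitution made here for brevity), and the names `labels`, `indexed`, and `less_liquid` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'ID': ['XS123', 'US3312', 'DE405'],
                   'Currency': ['EUR', 'EUR', 'USD'],
                   'Liquidity score': [2, 3, 1]})

# Translate the numeric score into a readable group label.
labels = {1: 'Liquid', 2: 'Less liquid', 3: 'Illiquid'}
df['group'] = df['Liquidity score'].map(labels)

# Use the label as the outer index level, so each group acts as a header.
indexed = df.set_index(['group', 'ID']).sort_index()

# Select one group by its label.
less_liquid = indexed.loc['Less liquid']
```

Printing `indexed` shows each group label once, followed by its securities, which comes close to a per-group header without inserting extra rows.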

Can the index name be accessed/retrieved from a pandas.core.groupby.generic.SeriesGroupBy object?

I created a pandas.core.groupby.generic.SeriesGroupBy object from a DataFrame like so:
df = pd.DataFrame(np.arange(16).reshape((4,4)), columns=list('ABCD'))
gobj = df['B'].groupby(df['A'])
I know how to retrieve the column name from gobj (gobj._selection_name returns 'B'), but I don't know how to retrieve the name of the index (which is 'A'). Is it possible to access/retrieve that from gobj?
Yes, it is:
gobj.keys.name
Out[57]: 'A'
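Based on the source code, it looks like there are three options (these rely on the grouper attribute, which is an internal detail and may change between pandas versions):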
df = pd.DataFrame(np.arange(16).reshape((4,4)), columns=list('ABCD'))
gobj = df.groupby(['C', 'A'])['B']
print(gobj.grouper.result_index.names)
print(gobj.grouper.names)
print(gobj.grouper.groupings)
# out
# ['C', 'A']
# ['C', 'A']
# [Grouping(C), Grouping(A)]