Arrays to rows in pandas

I have the following dict which I want to convert into a pandas DataFrame. The dict has nested lists which can appear for one node but not another.
dis = {"companies": [
    {"object_id": 123,
     "name": "Abd ",
     "contact_name": ["xxxx", "yyyy"],
     "contact_id": [1234, 33455]},
    {"object_id": 654, "name": "DDSPP"},
    {"object_id": 987, "name": "CCD"}
]}
I want the result as:
object_id, name, contact_name, contact_id
123,Abd,xxxx,1234
123,Abd,yyyy,
654,DDSPP,,
987,CCD,,
How can I achieve this?
I was trying to do like
abc = pd.DataFrame(dis).set_index['object_id','contact_name']
but it says
'method' object is not subscriptable
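(A quick aside on the TypeError itself: set_index is a method, so it needs parentheses, not square brackets. A minimal sketch of the corrected call, ignoring the nested lists for now:)

```python
import pandas as pd

dis = {"companies": [{"object_id": 123, "name": "Abd"},
                     {"object_id": 654, "name": "DDSPP"}]}

# set_index(...) with parentheses, not set_index[...]
df = pd.DataFrame(dis["companies"]).set_index("object_id")
print(df.loc[123, "name"])  # Abd
```

This only fixes the syntax error; expanding the nested lists into rows is what the answer below addresses.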

This is inspired by @jezrael's answer in this link: Splitting multiple columns into rows in pandas dataframe
Use:
import pandas as pd

s = {"companies": [
    {"object_id": 123,
     "name": "Abd ",
     "contact_name": ["xxxx", "yyyy"],
     "contact_id": [1234, 33455]},
    {"object_id": 654, "name": "DDSPP"},
    {"object_id": 987, "name": "CCD"}
]}
df = pd.DataFrame(s)                   # convert into a DataFrame
df = df['companies'].apply(pd.Series)  # split the inner dicts into columns
split1 = df.apply(lambda x: pd.Series(x['contact_id']), axis=1).stack().reset_index(level=1, drop=True)
split2 = df.apply(lambda x: pd.Series(x['contact_name']), axis=1).stack().reset_index(level=1, drop=True)
df1 = pd.concat([split1, split2], axis=1, keys=['contact_id', 'contact_name'])
pd.options.display.float_format = '{:.0f}'.format
print(df.drop(['contact_id', 'contact_name'], axis=1).join(df1).reset_index(drop=True))
Output with regular index:
name object_id contact_id contact_name
0 Abd 123 1234 xxxx
1 Abd 123 33455 yyyy
2 DDSPP 654 nan NaN
3 CCD 987 nan NaN
Is this something you were looking for?
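As a side note (not part of the original answer): in pandas 1.3+, DataFrame.explode accepts a list of columns, which handles these ragged rows directly:

```python
import pandas as pd

dis = {"companies": [
    {"object_id": 123, "name": "Abd",
     "contact_name": ["xxxx", "yyyy"], "contact_id": [1234, 33455]},
    {"object_id": 654, "name": "DDSPP"},
    {"object_id": 987, "name": "CCD"},
]}

df = pd.DataFrame(dis["companies"])
# Exploding both list columns at once keeps them aligned;
# rows holding scalar NaN (no contacts) pass through unchanged
out = df.explode(["contact_name", "contact_id"]).reset_index(drop=True)
print(out)
```

This produces one row per contact plus one row each for the companies without contacts, the same shape as the output above.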

If you only have one column to convert, you can use something shorter, like this:
df = pd.DataFrame(dis['companies'])
row = df.loc[0].apply(pd.Series)          # expand the first row's lists into columns
row[1].fillna(row[0], inplace=True)       # copy scalar values into the second column
pd.concat([df.drop([0], axis=0), row.T])  # df.append was removed in pandas 2.0
Otherwise, if you need to do this with more than one row, the same idea works but has to be modified.

Related

Join 2 data frames with special columns matching

I want to join two dataframes and get the result below. I have tried many ways, but it fails.
I want only the texts in df2['A'] which contain the text in df1['A']. What do I need to change in my code?
I want:
0 A0_link0
1 A1_link1
2 A2_link2
3 A3_link3
import pandas as pd
df1 = pd.DataFrame(
{
"A": ["A0", "A1", "A2", "A3"],
})
df2 = pd.DataFrame(
{ "A": ["A0_link0", "A1_link1", "A2_link2", "A3_link3", "A4_link4", 'An_linkn'],
"B" : ["B0_link0", "B1_link1", "B2_link2", "B3_link3", "B4_link4", 'Bn_linkn']
})
result = pd.concat([df1, df2], ignore_index=True, join= "inner", sort=False)
print(result)
Create an intermediate dataframe and map:
d = (df2.assign(key=df2['A'].str.extract(r'([^_]+)'))
.set_index('key'))
df1['A'].map(d['A'])
Output:
0 A0_link0
1 A1_link1
2 A2_link2
3 A3_link3
Name: A, dtype: object
Or merge if you want several columns from df2: df1.merge(d, left_on='A', right_index=True).
You can also set the index to the An prefix and use pd.concat on columns:
result = (pd.concat([df1.set_index(df1['A']),
df2.set_index(df2['A'].str.split('_').str[0])],
axis=1, join="inner", sort=False)
.reset_index(drop=True))
print(result)
A A B
0 A0 A0_link0 B0_link0
1 A1 A1_link1 B1_link1
2 A2 A2_link2 B2_link2
3 A3 A3_link3 B3_link3
df2.A.loc[df2.A.str.split('_',expand=True).iloc[:,0].isin(df1.A)]
0 A0_link0
1 A1_link1
2 A2_link2
3 A3_link3
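The merge variant mentioned in passing above can be spelled out like this (the key column is just a throwaway name for the extracted prefix):

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"]})
df2 = pd.DataFrame({
    "A": ["A0_link0", "A1_link1", "A2_link2", "A3_link3", "A4_link4", "An_linkn"],
    "B": ["B0_link0", "B1_link1", "B2_link2", "B3_link3", "B4_link4", "Bn_linkn"],
})

# Extract the prefix before '_' and inner-join it against df1['A'];
# suffixes renames df1's clashing 'A' column out of the way
result = df1.merge(
    df2.assign(key=df2["A"].str.split("_").str[0]),
    left_on="A", right_on="key", suffixes=("_short", ""),
)[["A", "B"]]
print(result)
```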

Pandas dataframe - column with list of dictionaries, extract values and convert to comma separated values

I have the following dataframe, and I want to extract each numerical value from the list of dictionaries while keeping them in the same column.
For instance, for the first row I would want to see in the data column: 179386782, 18017252, 123452
      id                                                     data
0  12345  [{'id': '179386782'}, {'id': 18017252}, {'id': 123452}]
Below is my code to create the dataframe above (I've hardcoded stories_data as an example):
for business_account in data:
    business_account_id = business_account[0]
    stories_data = {'data': [{'id': '179386782'}, {'id': '18017252'}, {'id': '123452'}]}
    df = pd.DataFrame(stories_data.items())
    df.set_index(0, inplace=True)
    df_stories = df.transpose()
    df_stories['id'] = business_account_id
    col = df_stories.pop("id")
    df_stories.insert(0, col.name, col)
I've tried this: df_stories["data"].str[0]
but this only returns the first element (dictionary) in the list
Try:
df['data'] = df['data'].apply(lambda x: ', '.join([str(d['id']) for d in x]))
print(df)
# Output:
id data
0 12345 179386782, 18017252, 123452
Another way:
df['data'] = df['data'].explode().str['id'].astype(str) \
.groupby(level=0).agg(', '.join)
print(df)
# Output:
id data
0 12345 179386782, 18017252, 123452
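If some rows might hold None/NaN or an empty list instead of a list of dicts, a defensive variant can help (the empty-string fallback here is my assumption, not from the question):

```python
import pandas as pd

df = pd.DataFrame({
    "id": [12345, 67890],
    "data": [[{"id": "179386782"}, {"id": 18017252}, {"id": 123452}], None],
})

def join_ids(cell):
    # Guard against non-list cells (None/NaN) before pulling out the 'id' values
    if not isinstance(cell, list):
        return ""
    return ", ".join(str(d["id"]) for d in cell)

df["data"] = df["data"].apply(join_ids)
print(df)
```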

Conditional mapping among columns of two data frames with Pandas Data frame

I need your advice on how to map columns between dataframes.
I have put it in a simple form so that it's easier to understand:
df = dataframe
EXAMPLE:
df1 = pd.DataFrame({
    "X": [],
    "Y": [],
    "Z": []
})
df2 = pd.DataFrame({
    "A": ['', '', 'A1'],
    "C": ['', '', 'C1'],
    "D": ['D1', 'Other', 'D3'],
    "F": ['', '', ''],
    "G": ['G1', '', 'G3'],
    "H": ['H1', 'H2', 'H3']
})
Requirement:
1st step:
We need to derive a value for the X column of df1 from columns A, C and D, in that order; searching stops and the value is selected as soon as one is found.
2nd step:
If the selected value is "Other", then the X column of df1 instead searches columns F, G and H, in that order, until it finds any value.
Result:
X
0 D1
1 H2
2 A1
Thank you so much in advance
Try this:
def first_non_empty(df, cols):
    """Return the first non-empty, non-null value among the specified columns per row"""
    return df[cols].replace('', pd.NA).bfill(axis=1).iloc[:, 0]

col_x = first_non_empty(df2, ['A', 'C', 'D'])
col_x = col_x.mask(col_x == 'Other', first_non_empty(df2, ['F', 'G', 'H']))
df1['X'] = col_x
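Putting it together with the df2 from the question, so it can be run as-is:

```python
import pandas as pd

df2 = pd.DataFrame({
    "A": ["", "", "A1"],
    "C": ["", "", "C1"],
    "D": ["D1", "Other", "D3"],
    "F": ["", "", ""],
    "G": ["G1", "", "G3"],
    "H": ["H1", "H2", "H3"],
})

def first_non_empty(df, cols):
    # Turn '' into <NA>, backfill across the row, then take the first column
    return df[cols].replace("", pd.NA).bfill(axis=1).iloc[:, 0]

col_x = first_non_empty(df2, ["A", "C", "D"])
col_x = col_x.mask(col_x == "Other", first_non_empty(df2, ["F", "G", "H"]))
print(col_x.tolist())  # ['D1', 'H2', 'A1']
```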

Pandas merge two columns into Json

I have a pandas dataframe like below
Col1 Col2
0 a apple
1 a anar
2 b ball
3 b banana
I am looking to output json which outputs like
{ 'a' : ['apple', 'anar'], 'b' : ['ball', 'banana'] }
Use groupby with apply, and then convert the Series to JSON with Series.to_json:
j = df.groupby('Col1')['Col2'].apply(list).to_json()
print (j)
{"a":["apple","anar"],"b":["ball","banana"]}
If you want to write the JSON to a file:
s = df.groupby('Col1')['Col2'].apply(list)
s.to_json('file.json')
Check the difference:
j = df.groupby('Col1')['Col2'].apply(list).to_json()
d = df.groupby('Col1')['Col2'].apply(list).to_dict()
print (j)
{"a":["apple","anar"],"b":["ball","banana"]}
print (d)
{'a': ['apple', 'anar'], 'b': ['ball', 'banana']}
print (type(j))
<class 'str'>
print (type(d))
<class 'dict'>
You can groupby() 'Col1', apply() list to 'Col2', and convert with to_dict(). Use:
df.groupby('Col1')['Col2'].apply(list).to_dict()
Output:
{'a': ['apple', 'anar'], 'b': ['ball', 'banana']}
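A small side note: agg(list) is the more idiomatic (and usually faster) spelling than apply(list) for this, and gives the same result:

```python
import pandas as pd

df = pd.DataFrame({"Col1": ["a", "a", "b", "b"],
                   "Col2": ["apple", "anar", "ball", "banana"]})

# agg(list) is handled as a proper aggregation by groupby
d = df.groupby("Col1")["Col2"].agg(list).to_dict()
print(d)  # {'a': ['apple', 'anar'], 'b': ['ball', 'banana']}
```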

Defining a function to plot a graph from CSV data - Python pandas

I am trying to play around with data analysis, taking in data from a simple CSV file I have created with random values in it.
I have defined a function that should allow the user to type in a value, then plot a bar graph from the dataFrame. Below:
def analysis_currency_pair():
    x = raw_input("what currency pair would you like to analyse? :")
    print type(x)
    global dataFrame
    df1 = dataFrame
    df2 = df1[['currencyPair', 'amount']]
    df2 = df2.groupby(['currencyPair']).sum()
    df2 = df2.loc[x].plot(kind='bar')
When I call the function, it prints my question and echoes the currency pair. However, it doesn't seem to pass x (the value input by the user) into the latter half of the function, so no graph is produced.
Am I doing something wrong here?
This code works when I just put the value in directly, not within a function.
I am confused!
I think you need to rewrite your function with two parameters, x and df, which are passed to the function analysis_currency_pair:
import pandas as pd
df = pd.DataFrame({"currencyPair": pd.Series({1: 'EURUSD', 2: 'EURGBP', 3: 'CADUSD'}),
                   "amount": pd.Series({1: 2, 2: 2, 3: 3.5}),
                   "a": pd.Series({1: 7, 2: 8, 3: 9})})
print df
#    a  amount currencyPair
# 1  7     2.0       EURUSD
# 2  8     2.0       EURGBP
# 3  9     3.5       CADUSD

def analysis_currency_pair(x, df1):
    print type(x)
    df2 = df1[['currencyPair', 'amount']]
    df2 = df2.groupby(['currencyPair']).sum()
    df2 = df2.loc[x].plot(kind='bar')

# raw input is EURUSD or EURGBP or CADUSD
pair = raw_input("what currency pair would you like to analyse? :")
analysis_currency_pair(pair, df)
Or you can pass a string to the function analysis_currency_pair:
import pandas as pd
df = pd.DataFrame({"currencyPair": ['EURUSD', 'EURGBP', 'CADUSD', 'EURUSD', 'EURGBP'],
                   "amount": [1, 2, 3, 4, 5],
                   "amount1": [5, 4, 3, 2, 1]})
print df
#    amount  amount1 currencyPair
# 0       1        5       EURUSD
# 1       2        4       EURGBP
# 2       3        3       CADUSD
# 3       4        2       EURUSD
# 4       5        1       EURGBP

def analysis_currency_pair(x, df1):
    print type(x)
    # <type 'str'>
    df2 = df1[['currencyPair', 'amount']]
    df2 = df2.groupby(['currencyPair']).sum()
    print df2
    #               amount
    # currencyPair
    # CADUSD             3
    # EURGBP             7
    # EURUSD             5
    df2 = df2.loc[x].plot(kind='bar')

analysis_currency_pair('CADUSD', df)
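The answers above are Python 2 (raw_input, print statements). A Python 3 sketch of the same idea, written to return the selected row so it runs without a plotting backend (the .plot call is shown in a comment):

```python
import pandas as pd

df = pd.DataFrame({"currencyPair": ["EURUSD", "EURGBP", "CADUSD", "EURUSD", "EURGBP"],
                   "amount": [1, 2, 3, 4, 5]})

def analysis_currency_pair(x, df1):
    # Sum amounts per currency pair, then select the requested pair
    totals = df1[["currencyPair", "amount"]].groupby("currencyPair").sum()
    row = totals.loc[x]
    # row.plot(kind="bar") would draw the chart
    return row

print(analysis_currency_pair("CADUSD", df)["amount"])  # 3
```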