How can I get an expanded version of a dataframe which has lists as values in it? [duplicate] - pandas

This question already has answers here:
How to unnest (explode) a column in a pandas DataFrame, into multiple rows
(16 answers)
Closed 6 months ago.
How can I get an expanded version of a dataframe which has lists as values in it?
Here's a sample of the dataframe I have:
import pandas as pd
raw = pd.DataFrame().assign(Therapuetic_Area = ['Oncology'],
                            LocationState = [['Ohio','Illinois','Oregon','New York']])
Now, I need it to look like this edited DataFrame:
edited = pd.DataFrame().assign(Therapuetic_Area = ['Oncology','Oncology','Oncology','Oncology'],
                               LocationState = ['Ohio','Illinois','Oregon','New York'])
Is there a Pandas method I can use for this? How could I get the edited dataframe without having to manually input the values? I can't possibly manually input it because my data is enormously large. Any help would be appreciated!

You can use explode to turn each element of the list values into its own row:
raw.explode('LocationState')
  Therapuetic_Area LocationState
0         Oncology          Ohio
0         Oncology      Illinois
0         Oncology        Oregon
0         Oncology      New York
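Note that every exploded row keeps the original index label (0). As a minimal sketch, assuming pandas 1.1 or newer, passing ignore_index=True renumbers the rows instead. The variable name expanded is only illustrative.

```python
import pandas as pd

# Same data as in the question: one row whose LocationState cell holds a list.
raw = pd.DataFrame({
    "Therapuetic_Area": ["Oncology"],
    "LocationState": [["Ohio", "Illinois", "Oregon", "New York"]],
})

# explode() emits one row per list element and repeats the other columns.
# ignore_index=True (available from pandas 1.1) replaces the repeated
# index label 0 with a fresh 0..n-1 index.
expanded = raw.explode("LocationState", ignore_index=True)
print(expanded)
```

If the original index labels matter (for example, to trace rows back to their source), leave ignore_index at its default of False.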

Related

How to print the value of a row that returns false using .isin method in python [duplicate]

This question already has answers here:
How do I select rows from a DataFrame based on column values?
(16 answers)
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
Closed 4 months ago.
I am new to writing code and am currently working on a project to compare two columns of an Excel sheet using Python and return the rows that do not match.
I tried using the .isin function and was able to output the True/False values from comparing the columns. However, I am not sure how to print the actual row that returns the value "False".
For Example:
import pandas as pd
data = ["Darcy Hayward","Barbara Walters","Ruth Fraley","Minerva Ferguson","Tad Sharp","Lesley Fuller","Grayson Dolton","Fiona Ingram","Elise Dolton"]
df = pd.DataFrame(data, columns=['Names'])
df
data1 = ["Darcy Hayward","Barbara Walters","Ruth Fraley","Minerva Ferguson","Tad Sharp","Lesley Fuller","Grayson Dolton","Fiona Ingram"]
df1 = pd.DataFrame(data1, columns=['Names'])
df1
data_compare = df["Names"].isin(df1["Names"])
for data in data_compare:
    if data == False:
        print(data)
However, I want to know that index 8 returned False, in something like the format below.
Could you please advise how I can modify the code so that the output is printed with the index and name that returned False?
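One possible approach, in line with the linked duplicates, is to use the boolean result of isin as a mask rather than looping over it. The sketch below reuses the question's data; the names matched and missing are illustrative, not from the original post.

```python
import pandas as pd

data = ["Darcy Hayward", "Barbara Walters", "Ruth Fraley", "Minerva Ferguson",
        "Tad Sharp", "Lesley Fuller", "Grayson Dolton", "Fiona Ingram",
        "Elise Dolton"]
df = pd.DataFrame(data, columns=["Names"])

data1 = data[:-1]  # same list without the last name, as in the question
df1 = pd.DataFrame(data1, columns=["Names"])

# Boolean Series: True where the name in df also appears in df1.
matched = df["Names"].isin(df1["Names"])

# ~ inverts the mask, so this keeps only the rows that returned False.
missing = df[~matched]

# Print the index label and the name for each unmatched row.
for idx, name in missing["Names"].items():
    print(idx, name)
```

Because missing is itself a DataFrame, it can also be printed or written back to Excel directly instead of being looped over.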

Pandas Stack Column Number Mismatch [duplicate]

This question already has answers here:
Pandas: Adding new column to dataframe which is a copy of the index column
(3 answers)
Closed 1 year ago.
Trying to stack and get 3 columns, not 1
Hello, I am trying to use the stack function in pandas, but the result has only 1 column according to shape, even though 3 are displayed. I can see that they are on different levels, and I have tried working with the levels without success. What can I do? I need 3 columns!
-Thanks
Use new_cl_traff.reset_index()
As you can see in your screenshot you have a multi-index on your dataframe with year and month - see the line where you name the two index levels:
new_cl_traf.index.set_names(["Year","Month"], inplace=True)
See the pandas documentation for stack.
If you use new_cl_traff.reset_index(), the index, or a subset of its levels, is reset and those levels become regular columns (see the documentation for reset_index).
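The asker's data is only visible in a screenshot, so the following is a sketch on made-up data: the frame wide, its values, and the column name Traffic are all assumptions. It illustrates why shape reports a single dimension after stack and how reset_index turns the two index levels into columns.

```python
import pandas as pd

# Hypothetical stand-in for the asker's table: years as rows, months as columns.
wide = pd.DataFrame(
    {"Jan": [10, 30], "Feb": [20, 40]},
    index=pd.Index([2020, 2021], name="Year"),
)

# stack() moves the column labels into a second index level. The result is a
# Series with a two-level index, which displays as three "columns".
stacked = wide.stack()
stacked.index.set_names(["Year", "Month"], inplace=True)
print(stacked.shape)

# reset_index() turns both index levels into ordinary columns; name= labels
# the values column. The result is a DataFrame with three real columns.
flat = stacked.reset_index(name="Traffic")
print(flat.shape)
print(flat)
```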

filter multiple separate rows in a DataFrame that meet the condition in another DataFrame with pandas? [duplicate]

This question already has answers here:
How to filter Pandas dataframe using 'in' and 'not in' like in SQL
(11 answers)
Closed 2 years ago.
This is my DataFrame
df = pd.DataFrame({'uid': [109200005, 108200056, 109200060, 108200085, 108200022],
'grades': [69.233627, 70.130900, 83.357011, 88.206387, 74.342212]})
This is my condition list which comes from another DataFrame
condition_list = [109200005, 108200085]
I use this code to filter records that meet the condition
idx_list = []
for i in condition_list:
    idx_list.append(df[df['uid']==i].index.values[0])
and get what I need
>>> df.iloc[idx_list]
         uid     grades
0  109200005  69.233627
3  108200085  88.206387
The job is done, but I'd like to know: is there a simpler way to do it?
Yes, use isin:
df[df['uid'].isin(condition_list)]
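For completeness, here is the same one-liner as a self-contained sketch on the question's data. The names filtered and excluded are illustrative. One difference from the loop worth noting: isin returns rows in the DataFrame's own order, not in the order of condition_list.

```python
import pandas as pd

df = pd.DataFrame({
    "uid": [109200005, 108200056, 109200060, 108200085, 108200022],
    "grades": [69.233627, 70.130900, 83.357011, 88.206387, 74.342212],
})
condition_list = [109200005, 108200085]

# Keep the rows whose uid appears in condition_list (SQL's "IN").
filtered = df[df["uid"].isin(condition_list)]
print(filtered)

# Inverting the mask with ~ gives the complement (SQL's "NOT IN").
excluded = df[~df["uid"].isin(condition_list)]
print(excluded)
```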

How to convert ndarray to pandas DataFrame [duplicate]

This question already has answers here:
Convert two numpy array to dataframe
(3 answers)
Closed 3 years ago.
I have ndarray data with the shape (231, 31). Now I want to convert this ndarray to a pandas DataFrame with 31 columns. I am using this code:
for i in range(1, 32):
    dataset = pd.DataFrame({'Column{}'.format(i): data[:, i-1]})
But this code only creates the last column: the result has 231 rows and just 1 column, whereas I need 31 columns. Is there a way to fix this, and why does it happen?
Each iteration of the loop creates a new DataFrame and assigns it to dataset, overwriting the previous one, which is why only the last column remains.
Instead, create the DataFrame from the whole array at once with pd.DataFrame(data).
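A minimal sketch of that fix, keeping the asker's Column1 to Column31 naming. The array contents here are a made-up stand-in, since the original data is not shown.

```python
import numpy as np
import pandas as pd

# Stand-in for the asker's array: any ndarray of shape (231, 31) will do.
data = np.arange(231 * 31).reshape(231, 31)

# Build all 31 column names up front, then convert the array in one call.
columns = ["Column{}".format(i) for i in range(1, 32)]
dataset = pd.DataFrame(data, columns=columns)

print(dataset.shape)
```

Omitting columns= also works; pandas then labels the columns 0 to 30.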

unable to convert groupby dataset to json in pandas [duplicate]

This question already has answers here:
How to reset index in a pandas dataframe? [duplicate]
(3 answers)
Closed 4 years ago.
I have a grouped dataset, but I'm unable to convert it to JSON: to_json produces output in a bad format, while to_excel works fine.
Country  Sub  amount
         3    source4
UK       1    source3
         1    source1
US       2    source2
How can I export groupby dataset to_json?
The problem is that the DataFrame has a MultiIndex, so you need reset_index before calling to_json:
j = df.reset_index().to_json()
print(j)
{"Country":{"0":"UK","1":"UK","2":"US"},
"Sub":{"0":1,"1":1,"2":2},
"amount":{"0":"source3","1":"source1","2":"source2"}}
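The sketch below reproduces that output on a small frame rebuilt from the three rows shown in the answer. The construction of df is an assumption, since the question does not show how the grouped data was produced. It also shows orient="records", which some consumers find easier to read.

```python
import json
import pandas as pd

# Hypothetical reconstruction of the grouped result: a (Country, Sub) MultiIndex.
df = pd.DataFrame(
    {"amount": ["source3", "source1", "source2"]},
    index=pd.MultiIndex.from_tuples(
        [("UK", 1), ("UK", 1), ("US", 2)], names=["Country", "Sub"]
    ),
)

# Move the index levels into ordinary columns before serialising.
flat = df.reset_index()

# Default orientation: one JSON object per column, keyed by row position.
j = flat.to_json()
print(j)

# Alternative: a list with one JSON object per row.
records = flat.to_json(orient="records")
print(records)
```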