Convert a list into a one-row pandas dataframe

I need to take a Words list and break it into a one-row pandas dataframe with the same number of columns.
This is the Words list:
Words
they
ate
apples
and
then
and the desired pandas dataframe output would consist of one row and five columns:

first  second  third   fourth  fifth
they   ate     apples  and     then
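A minimal sketch of one way to do this: wrapping the list in another list makes pandas treat it as a single row rather than a single column (the column names here are taken from the desired output above).

```python
import pandas as pd

words = ['they', 'ate', 'apples', 'and', 'then']
cols = ['first', 'second', 'third', 'fourth', 'fifth']

# [words] is a list of rows containing one row, so the frame is 1x5
df = pd.DataFrame([words], columns=cols)
```

Passing `words` directly (without the extra brackets) would instead produce a 5x1 frame with one word per row.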

Related

Pandas Matching exact key phrases / words Between Main dataframe and Keywords dataframe

I have 2 dataframes. The first is my main data with hundreds of thousands of rows.
The other is where my list of keywords/phrases is kept.
In the main data there will be a column called 'Match' where the matching keywords/phrases are recorded. Example: a string from the main data is "I have a red apple and a pear.", and the keywords/phrases I have are "apple", "pear" and "red apple" (case-insensitive).
The output I'm looking for is multiple columns of matching results ("red apple" in the 1st column, "apple" in the 2nd column and "pear" in the 3rd column; the order of the matching keywords/phrases is not crucial).
If not, I can settle for just "red apple" and "pear".
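One possible sketch, using hypothetical column names (`text` for the main data): build a case-insensitive alternation from the keywords, longest phrase first so "red apple" wins over "apple", then spread each row's match list across numbered columns. This yields the fallback output the question allows ("red apple" and "pear").

```python
import re
import pandas as pd

# Hypothetical sample data standing in for the real frames
main = pd.DataFrame({'text': ["I have a red apple and a pear."]})
keywords = ["apple", "pear", "red apple"]

# Longest-first alternation so multi-word phrases match before their sub-words
pattern = "|".join(sorted((re.escape(k) for k in keywords), key=len, reverse=True))
matches = main['text'].str.findall(pattern, flags=re.IGNORECASE)

# Expand each row's list of matches into Match_0, Match_1, ... columns
result = main.join(pd.DataFrame(matches.tolist(), index=main.index).add_prefix('Match_'))
```

Because the regex consumes "red apple" as one phrase, the inner "apple" is not reported separately; drop the longest-first sort if overlapping matches are wanted.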

Joining all elements in an array in a dataframe column with another dataframe

Let's say pcPartsInfoDf has the columns
pcPartCode:integer
pcPartName:string
And df has the array column
pcPartCodeList:array
|-- element:integer
The pcPartCodeList in df has a list of codes for each row that match with pcPartCode values in pcPartsInfoDf, but only pcPartsInfoDf has the names of the parts.
I'm trying to join the two dataframes so that we get a new column that is an array of strings with all the pc part names for a row, corresponding to the array of ints in pcPartCodeList. I tried the code below, but this only adds at most 1 part per joined row, since pcPartName is typed as a string and only holds 1 value.
df
  .join(pcPartsInfoDf, expr("array_contains(pcPartCodeList, pcPartCode)"))
  .select(df("*"), pcPartsInfoDf("pcPartName"))
How could I collect all the pcPartName values corresponding to a pcPartCodeList for a row, and put them in an array of strings in that row?
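The usual pattern for this (in Spark: explode the array, join on the code, then group back with collect_list) can be sketched in pandas with made-up sample data; the frame and column names mirror the question, but the data is hypothetical:

```python
import pandas as pd

# Hypothetical lookup table and main frame mirroring the question's schema
pc_parts_info = pd.DataFrame({'pcPartCode': [1, 2, 3],
                              'pcPartName': ['CPU', 'GPU', 'RAM']})
df = pd.DataFrame({'build': ['A', 'B'],
                   'pcPartCodeList': [[1, 3], [2]]})

# 1) Explode the array so each code gets its own row
exploded = df.explode('pcPartCodeList').rename(columns={'pcPartCodeList': 'pcPartCode'})
exploded['pcPartCode'] = exploded['pcPartCode'].astype(int)

# 2) Join the names onto the exploded codes
joined = exploded.merge(pc_parts_info, on='pcPartCode', how='left')

# 3) Re-collect the names into a list per original row
names = (joined.groupby('build')['pcPartName'].agg(list)
         .rename('pcPartNameList').reset_index())
result = df.merge(names, on='build')
```

The Spark equivalent follows the same three steps with `explode`, a join on `pcPartCode`, and `groupBy(...).agg(collect_list(...))`.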

Pandas: Extracting data from sorted dataframe

Consider a dataframe with 2 columns: the first is 'Name' (a string) and the second is 'score' (an int). There are many duplicate names, sorted so that all 'Name1' rows are consecutive, followed by 'Name2', and so on. Each row may contain a different score, and the number of duplicates may differ for each unique name.
I wish to extract data from this dataframe into a new dataframe such that there are no duplicate names in the name column, and each name's corresponding score is the average of its scores in the original dataframe.
I've provided a picture for a better visualization:
First, make use of the groupby() method as mentioned by @QuangHong:
result=df.groupby('Name', as_index=False)['Score'].mean()
Finally, make use of the rename() method:
result=result.rename(columns={'Score':'Avg Score'})
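With some hypothetical data shaped like the question describes, the two lines of the answer run end to end:

```python
import pandas as pd

# Hypothetical sorted data with duplicate names
df = pd.DataFrame({'Name': ['Name1', 'Name1', 'Name2'],
                   'Score': [10, 20, 30]})

# Average the scores per name, keeping 'Name' as a regular column
result = df.groupby('Name', as_index=False)['Score'].mean()
result = result.rename(columns={'Score': 'Avg Score'})
```

`as_index=False` keeps 'Name' as a column instead of moving it into the index, so the result is already in the requested two-column shape.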

Iterate two dataframes, compare and change a value in pandas or pyspark

I am trying to do an exercise in pandas.
I have two dataframes. I need to compare a few columns between them and change the value of one column in the first dataframe if the comparison is successful.
Dataframe 1:
Article  Country  Colour  Buy
Pants    Germany  Red     0
Pull     Poland   Blue    0
Initially all my articles have the flag 'Buy' set to zero.
I have dataframe 2 that looks as:
Article  Origin  Colour
Pull     Poland  Blue
Dress    Italy   Red
I want to check whether the article, country/origin and colour columns match (i.e. whether I can find each article from dataframe 1 in dataframe 2) and, if so, set the 'Buy' flag to 1.
I tried iterating through both dataframes with pyspark, but pyspark dataframes are not iterable.
I thought about doing it in pandas, but apparently it is bad practice to change values during iteration.
What code in pyspark or pandas would do what I need?
Thanks!
Merge with an indicator, then map the values. Make sure to drop_duplicates on the merge keys in the right frame so the merge result is always the same length as the original, and rename 'Origin' to 'Country' so the same information isn't repeated after the merge. There is no need for a pre-defined column of zeros.
df1 = df1.drop(columns='Buy')
df1 = df1.merge(df2.drop_duplicates().rename(columns={'Origin': 'Country'}),
indicator='Buy', how='left')
df1['Buy'] = df1['Buy'].map({'left_only': 0, 'both': 1}).astype(int)
Article Country Colour Buy
0 Pants Germany Red 0
1 Pull Poland Blue 1

Convert Series to Dataframe where series index is Dataframe column names

I am selecting row by row as follows:
for i in range(num_rows):
    row = df.iloc[i]
As a result I get a Series object where row.index.values contains the names of the df columns.
But I wanted a dataframe with only one row, keeping the dataframe columns in place.
When I do row.to_frame(), instead of a 1x85 dataframe (1 row, 85 cols) I get an 85x1 dataframe where the index contains the column names, and row.columns outputs Int64Index([0], dtype='int64').
All I want is the original dataframe's columns with only one row. How do I do it?
Or how do I convert row.index values to column values and change the 85x1 shape to 1x85?
You just need to add .T to transpose:
row.to_frame().T
Alternatively, change your for loop to select with a list by adding []:
for i in range(num_rows):
    row = df.iloc[[i]]
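Both fixes can be seen side by side on a small hypothetical frame: list-based `.iloc[[i]]` keeps a one-row DataFrame directly, while `.to_frame().T` converts an already-extracted Series back.

```python
import pandas as pd

# Hypothetical two-column frame standing in for the 85-column one
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# Selecting with a list of positions returns a one-row DataFrame, not a Series
row = df.iloc[[0]]

# Or: take the Series, make it a column frame, then transpose to one row
row_t = df.iloc[0].to_frame().T
```

Note that the `.to_frame().T` route may upcast dtypes to object, since a Series holds one dtype for all the row's values; `.iloc[[i]]` preserves the original column dtypes.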