Find a column that contains a specific string from another column


I have two data frames, one called cuartos (rooms in English) and another called paredes (walls in English). They hold room temperatures and wall temperatures. I want to create a new data frame with the temperature difference between each wall and its room. For example:
Room name = 2_APTO_1
Walls of the room = 2_APTO_1.FACE2, 2_APTO_1.FACE3 and 2_APTO_1.FACE4
The new data frame should be something like
2_APTO_1.FACE2 = 2_APTO_1.FACE2 - 2_APTO_1
2_APTO_1.FACE3 = 2_APTO_1.FACE3 - 2_APTO_1
2_APTO_1.FACE4 = 2_APTO_1.FACE4 - 2_APTO_1 ....
I tried this:
    # get a list of paredes and cuartos columns
    col_names_paredes = paredes.columns.tolist()
    col_names_cuartos = cuartos.columns.tolist()
    # check if col_names_paredes has col_names_cuartos names in it
    for i in col_names_cuartos:
        for k in col_names_paredes:
            if col_names_paredes[k] in col_names_cuartos[i]:
                print(k)
I got this error
TypeError: list indices must be integers or slices, not str
any help would be appreciated.

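When you write for i in col_names_cuartos, i takes the column names themselves, not the index values you would get with for i in range(len(col_names_cuartos)). Indexing a list with one of those strings is what raises the TypeError. Also note that a wall name such as 2_APTO_1.FACE2 contains its room name 2_APTO_1, so the test should check whether the room name is in the wall name, not the other way round.
So you can use the following code instead:
    for col_cuartos in col_names_cuartos:
        for col_paredes in col_names_paredes:
            if col_cuartos in col_paredes:
                print(col_paredes)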
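To go from matching names to the difference frame the question asks for, here is a minimal sketch. It assumes both frames share the same row index and that every wall column is named as the room name, a dot, then the face. The sample values and the name diferencias are made up for illustration. Splitting on the last dot, rather than testing for a substring, avoids pairing 2_APTO_1 with a wall of a room such as 2_APTO_10.

```python
import pandas as pd

# Hypothetical sample data; the real frames come from the asker's own files.
cuartos = pd.DataFrame({"2_APTO_1": [20.0, 21.0], "2_APTO_2": [18.0, 19.0]})
paredes = pd.DataFrame({
    "2_APTO_1.FACE2": [22.0, 22.5],
    "2_APTO_1.FACE3": [19.0, 20.0],
    "2_APTO_2.FACE1": [18.5, 18.0],
})

# One output column per wall: wall temperature minus its room temperature.
diferencias = pd.DataFrame(index=paredes.index)
for wall in paredes.columns:
    room = wall.rsplit(".", 1)[0]  # assumed naming: "<room>.<face>"
    if room in cuartos.columns:    # skip walls without a matching room
        diferencias[wall] = paredes[wall] - cuartos[room]

print(diferencias)
```

Walls whose room is missing from cuartos are simply left out of the result.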

Related

HighRadius Invoice prediction Challenges

Generate a new column "avgdelay" from the existing columns
Note - You are expected to make a new column "avgdelay" by grouping on the "name_customer" column and taking the mean of the "Delay" column.
This new column "avg_delay" is meant to store "customer_name" wise delay
groupby('name_customer')['Delay'].mean(numeric_only=False)
Display the new "avg_delay" column
Can anyone guide me?

Let df be your main data frame:
    avgdelay = df.groupby('name_customer')['Delay'].mean(numeric_only=False)
You need to add the "avg_delay" column to the main data, mapped through the "name_customer" column.
Note - You need to use the map function to map avgdelay onto the "name_customer" column:
    df['avg_delay'] = df['name_customer'].map(avgdelay)
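For reference, here is a small self-contained sketch of the two steps above. The sample data is invented, and the last line shows groupby(...).transform("mean") as an alternative that is not part of the original exercise but should give the same per-row values.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "name_customer": ["A", "B", "A", "B", "C"],
    "Delay": [2.0, 4.0, 6.0, 8.0, 1.0],
})

# Step 1: mean delay per customer (a Series indexed by customer name).
avgdelay = df.groupby("name_customer")["Delay"].mean()

# Step 2: look up each row's customer in that Series.
df["avg_delay"] = df["name_customer"].map(avgdelay)

# Alternative (an assumption, not required by the exercise): one step with transform.
df["avg_delay_transform"] = df.groupby("name_customer")["Delay"].transform("mean")

print(df)
```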

Try to check if column consists out of 3 numbers, and change the value to the first number of the column

I'm trying to create a new column called 'team'. In the image below you see different type of codes. The first number of the code is the team someone's in, IF the number consists out of 3 characters. E.G: 315 = team 3, 240 = team 2, and 3300 = NULL.
In the image below you can see my data flow so far and the expression I have tried, but doesn't work.

You forgot the parentheses () in your regex.
Try:
^([0-9]{3})$
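The tool used in the question is not shown here, so the following is only a rough illustration in pandas of the same idea. It uses a variation of the pattern that captures just the first digit of codes that are exactly three digits long. The sample codes and the name team are assumptions.

```python
import pandas as pd

# Hypothetical codes, stored as strings.
codes = pd.Series(["315", "240", "3300", "12"])

# Capture the first digit only when the whole value is exactly three digits;
# anything else yields a missing value.
team = codes.str.extract(r"^([0-9])[0-9]{2}$", expand=False)

print(team)
```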

[pandas]Dividing all elements of columns in df with elements in another column (Same df)

I'm sorry, I know this is basic but I've tried to figure it out myself for 2 days by sifting through documentation to no avail.
My code:
import numpy as np
import pandas as pd
name = ["bob","bobby","bombastic"]
age = [10,20,30]
price = [111,222,333]
share = [3,6,9]
list = [name,age,price,share]
list2 = np.transpose(list)
dftest = pd.DataFrame(list2, columns = ["name","age","price","share"])
print(dftest)
        name age price share
0        bob  10   111     3
1      bobby  20   222     6
2  bombastic  30   333     9
Want to divide all elements in 'price' column with all elements in 'share' column. I've tried:
print(dftest[['price']/['share']]) - Failed
dftest['price']/dftest['share'] - Failed, unsupported operand type
dftest.loc[:,'price']/dftest.loc[:,'share'] - Failed
Wondering if I could just change everything to int or float, I tried:
dftest.astype(float) - can't convert from str to float
I've tried the iter and items methods but could not understand the printouts...
My only suspicion is to use something called iterate, which I am unable to wrap my head around despite reading other old posts...
Please help me T_T

Apologies in advance for the somewhat protracted answer, but the question is somewhat unclear with regards to what exactly you're attempting to accomplish.
If you simply want price[0]/share[0], price[1]/share[1], etc., you can just do:
    dftest['price_div_share'] = dftest['price'] / dftest['share']
The issue with the operand types can be solved by:
    dftest['price_div_share'] = dftest['price'].astype(float) / dftest['share'].astype(float)
You're getting the "can't convert from str to float" error because you're calling astype(float) on the ENTIRE dataframe, which contains string columns.
If you want to divide each item by each item, i.e. price[0] / share[0], price[1] / share[0], price[2] / share[0], price[0] / share[1], etc. You would need to iterate through each item and append the result to a new list. You can do that pretty easily with a for loop, although it may take some time if you're working with a large dataset. It would look something like this if you simply want the result:
    new_list = []
    for p in dftest['price'].astype(float):
        for s in dftest['share'].astype(float):
            new_list.append(p / s)
If you want this in a new dataframe, you can pass the list to pd.DataFrame():
    new_df = pd.DataFrame(new_list, columns=['price_divided_by_share'])
This new dataframe would only have one column (the result, as mentioned above). If you want the information from the original dataframe as well, then you would do something like the following:
    new_list = []
    for n, a, p in zip(dftest['name'], dftest['age'], dftest['price'].astype(float)):
        for s in dftest['share'].astype(float):
            new_list.append([n, a, p, s, p / s])
    new_df = pd.DataFrame(new_list, columns=['name', 'age', 'price', 'share', 'price_div_by_share'])
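If the nested loops get slow on a larger dataset, the same every-price-by-every-share table can probably be built with a cross join instead. This is a sketch rather than the answer's own method. It assumes pandas 1.2 or newer (for how="cross") and columns that are already numeric, and the names left, right and pairs are made up.

```python
import pandas as pd

# Same data as the question, but built with numeric dtypes from the start.
dftest = pd.DataFrame({
    "name": ["bob", "bobby", "bombastic"],
    "age": [10, 20, 30],
    "price": [111.0, 222.0, 333.0],
    "share": [3.0, 6.0, 9.0],
})

left = dftest[["name", "age", "price"]]
right = dftest[["share"]]

# Cross join: every row of left paired with every row of right (3 x 3 = 9 rows).
pairs = left.merge(right, how="cross")
pairs["price_div_by_share"] = pairs["price"] / pairs["share"]

print(pairs)
```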

If you check the data types of your dataframe, you will see that they are all strings (object dtype):
    dftest.dtypes
    name     object
    age      object
    price    object
    share    object
    dtype: object
The first step is to change the relevant columns to numbers. This is one way:
    dftest = dftest.set_index("name").astype(float)
    dftest.dtypes
    age      float64
    price    float64
    share    float64
    dtype: object
This way you make the names a useful index and separate them from the numeric data. This is just a suggestion; you may have other reasons to leave the names as a column. In that case, you have to change the data type of each numeric column individually.
Once that is done, you can safely execute your code:
    dftest.div(dftest.share, axis=0)
                    age  price  share
    name
    bob        3.333333   37.0    1.0
    bobby      3.333333   37.0    1.0
    bombastic  3.333333   37.0    1.0
I assume this is what you expect as your outcome. If not, you can tweak it. The main point is to get your data types converted to numbers before any computation or division can occur.
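If you would rather keep name as a regular column, one way to convert only the numeric columns is to pass a dict to astype. This is a minimal sketch. The list numeric_cols and the column price_div_share are names chosen here for illustration.

```python
import pandas as pd

# Rebuild the question's frame: every value starts out as a string.
dftest = pd.DataFrame(
    [["bob", "10", "111", "3"],
     ["bobby", "20", "222", "6"],
     ["bombastic", "30", "333", "9"]],
    columns=["name", "age", "price", "share"],
)

# Convert only the numeric columns; "name" is left untouched.
numeric_cols = ["age", "price", "share"]
dftest = dftest.astype({col: float for col in numeric_cols})

# Element-wise division now works.
dftest["price_div_share"] = dftest["price"] / dftest["share"]

print(dftest)
```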

Performing calculations on multiple columns in dataframe and create new columns

I'm trying to perform calculations based on the entries in a pandas dataframe. The dataframe looks something like this:
and it contains 1466 rows. I'll have to run similar calculations on other dfs with more rows later.
What I'm trying to do is calculate something like mag = (U-V)/(R-I) (but ignoring any values that are -999), put that in a new column, and then put z_pred = 10**((mag-c)/m) in another new column (mag, c and m are just hard-coded variables). I have other columns I need to add too, but I figure that'll just be an extension of the same method.
I started out by trying
    for i in range(1):
        current = qso[:]
        mag = (U-V)/(R-I)
        name = current['NED']
        z_pred = 10**((mag - c)/m)
        z_meas = current['z']
but I got either a Series for z, which I couldn't operate on, or various type errors when I tried to print the values or write them to a file.
I found this question which gave me a start, but I can't see how to apply it to multiple calculations, as in my situation.
How can I achieve this?

Conditionally adding calculated columns row-wise is usually done with numpy's np.where:
    df['mag'] = np.where(~df[['U', 'V', 'R', 'I']].eq(-999).any(axis=1), (df.U - df.V) / (df.R - df.I), -999)
Note: this assumes that when any of the columns contains -999, the value is not calculated and -999 is returned instead.
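To chain the second calculation (z_pred) onto the first, a variation is to turn -999 into NaN up front so that missing values propagate through every later formula. This swaps the -999 sentinel for NaN, which is a different choice from the np.where line above. The sample rows and the values of c and m are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; -999 marks a missing magnitude as in the question.
qso = pd.DataFrame({
    "NED": ["a", "b", "c"],
    "U": [20.0, -999.0, 18.0],
    "V": [19.0, 18.0, 17.0],
    "R": [18.0, 17.0, 16.5],
    "I": [17.5, 16.0, 16.0],
})
c, m = 1.0, 2.0  # hypothetical constants

# Replace the sentinel with NaN so any row touching it yields NaN.
bands = qso[["U", "V", "R", "I"]].replace(-999, np.nan)

# Whole-column arithmetic: no loop over rows is needed.
qso["mag"] = (bands["U"] - bands["V"]) / (bands["R"] - bands["I"])
qso["z_pred"] = 10 ** ((qso["mag"] - c) / m)

print(qso)
```

Further derived columns can be added the same way, each as one expression over whole columns.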

One ggplot from two data frames (1 bar each)

I was looking for an answer everywhere, but I just couldn't find one to this problem (maybe I was just too stupid to use other answers, because I'm new to R).
I have two data frames with different numbers of rows. I want to create a plot containing a single bar per data frame. Both bars should have the same length, and the counts of the different values should be stacked on top of each other. For example: I want to compare the proportions of gender in those two data sets.
t1<-data.frame(cbind(c(1:6), factor(c(1,2,2,1,2,2))))
t2<-data.frame(cbind(c(1:4), factor(c(1,2,2,1))))
1 represents male, 2 represents female
I want to create two bar plots next to each other showing that the proportion of genders in the first data frame is 2:4 and in the second one 2:2.
My attempt looked like this:
ggplot() + geom_bar(aes(1, t1$X2, position = "fill")) + geom_bar(aes(1, t2$X2, position = "fill"))
That leads to the error: "Error: stat_count() must not be used with a y aesthetic."

First you should merge the two data frames. To do that, add a column to both data frames that identifies the origin of the data (an ID such as t1 and t2). Because the column names are the same in both frames, you can then combine them with rbind.
t1$data <- "t1"
t2$data <- "t2"
t <- (rbind(t1,t2))
Now you can make the plot:
ggplot(t[order(t$X2),], aes(data, X2, fill=factor(X2))) +
geom_bar(stat="identity", position="stack")
