I have two data frames. The first includes columns b and c, each of which holds multiple strings separated by commas. The second has three columns: one that includes all the strings from column b, a second that includes all the strings from column c, and a third that holds the resulting string I want to use.
x <- data.frame("uuid" = 1:2, "first" = c("jeff,fred,amy","tina,cat,dog"), "job" = c("bank teller,short cook, sky diver, no job, unknown job","bank clerk,short pet, ocean diver, hot job, rad job"))
x1 <- data.frame("meta" = c("ace", "king", "queen", "jack", 10, 9, 8,7,6,5,4,3), "first" = c("jeff","jeff","fred","amy","tina","cat","dog","fred","amy","tina","cat","dog"), "job" = c("bank teller","short cook", "sky diver", "no job", "unknown job","bank clerk","short pet", "ocean diver", "hot job", "rad job","bank teller","short cook"))
The result would be
result <- data.frame("uuid" = 1:2, "combined" = c("ace,king,queen,jack","5,9,8"))
Thank you in advance!
I tried to beat my head against the wall and it didn't help
Edit - This is the first half of the puzzle, BUT it does not search for and then concatenate the strings together in a cell; it only returns the first match found rather than all matches.
Is there a way to exactly match a string in one column with couple of strings in another column in R?
I have a dataframe (all5) including one column with dates ('CREATIE_DATUM'). Sometimes the notation is 01/JAN/2015, and sometimes it is written as 01-JAN-15.
I only need the year, so I wrote the following code line:
all5[['Day','Month','Year']]=all5['CREATIE_DATUM'].str.split('-/',expand=True)
but I get the following error:
columns must be same length as key
so I assume somewhere in my dataframe (>100.000 lines) a value has more than two '/' signs.
How can I make my code skip this line?
You can try to use pd.to_datetime and then use .dt property to access day, month and year:
x = pd.to_datetime(all5["CREATIE_DATUM"])
all5["Day"] = x.dt.day
all5["Month"] = x.dt.month
all5["Year"] = x.dt.year
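If the two notations are mixed in one column, a regular expression is another option when only the year is needed. The sketch below is an illustration under stated assumptions, not part of the answer above: it uses `Series.str.extract` with a pattern that accepts either `-` or `/` as the separator, leaves rows that do not fit the pattern as NaN (so malformed lines are effectively skipped), and assumes that two-digit years belong to the 2000s. The sample values are made up.

```python
import pandas as pd

# Made-up sample data covering both notations plus two unusable rows.
all5 = pd.DataFrame(
    {"CREATIE_DATUM": ["01/JAN/2015", "01-JAN-15", "bad/value/with/extra", None]}
)

# Day, three-letter month and year, separated by either "-" or "/".
pattern = r"^(\d{1,2})[-/]([A-Za-z]{3})[-/](\d{4}|\d{2})$"
parts = all5["CREATIE_DATUM"].str.extract(pattern)
parts.columns = ["Day", "Month", "Year"]

# Rows that did not match the pattern are NaN in every extracted column.
year = pd.to_numeric(parts["Year"]).astype("float64")

# Assumption: a two-digit year such as 15 means 2015.
all5["Year"] = year.where(year >= 100, year + 2000)

print(all5)
```

Rows that end up with NaN in `Year` are the ones whose notation matched neither form, which also makes them easy to inspect afterwards with `all5[all5["Year"].isna()]`.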
I am working with several CSV's that first N columns are information and then the next Ms (M is big) columns are information regarding a date.
This is the dataframe picture
I need to convert only the names of columns N+1 through N+M-1 to date format.
I tried the following. In this case N+1 = 5, and regardless of M, I assume I can use -1 to leave the last column name unaffected.
ContDiarios.columns[5:-1] = pd.to_datetime(ContDiarios.columns[5:-1])
but I get the following error:
TypeError: Index does not support mutable operations
The way you are doing it is not feasible, because an Index cannot be modified in place. Please try it this way:
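import pandas as pd

def convert(name):
    # Try to parse a column name as a date; keep it unchanged if parsing fails.
    try:
        return pd.to_datetime(name)
    except (ValueError, TypeError):
        return name

# Assign a complete new list of names instead of mutating the Index.
x.columns = [convert(c) for c in x.columns]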
Alternatively, you can use the df.rename method to convert the names.
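To touch only the slice from the question, one option is to copy the names into a plain list, replace the slice there, and assign the list back. This is a minimal sketch: the column names and values are hypothetical, and `first_date_col = 5` simply mirrors the `[5:-1]` slice used in the question.

```python
import pandas as pd

# Hypothetical frame: five information columns, two date columns, one last column.
ContDiarios = pd.DataFrame(
    [[1, 2, 3, 4, 5, 10, 20, 30]],
    columns=["a", "b", "c", "d", "e", "2020-01-01", "2020-01-02", "total"],
)

first_date_col = 5  # same start position as the [5:-1] slice in the question

# A list is mutable, unlike the Index, so the slice can be replaced here.
cols = list(ContDiarios.columns)
cols[first_date_col:-1] = pd.to_datetime(cols[first_date_col:-1])

# Assigning a whole new set of names is allowed.
ContDiarios.columns = cols

print(ContDiarios.columns)
```

Because the names are now a mix of strings and timestamps, the resulting column Index has object dtype.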
I have two columns and I need to check whether the value in one column, all_news['Query'], is contained in another column, all['description'].
I found the following solution:
all_news['C'] = on.apply(lambda x: x.Query in x.description, axis=1)
but I get the following error:
TypeError: ("argument of type 'float' is not iterable", 'occurred at index 737')
Likely because there are some weird characters the iteration cannot decipher and it seems I cannot run any exception management in a one-line iteration.
How can I unfold this one line iteration into a for loop?
Result for index 737:
Query = 'medike international'
description = 'po ketvirtadienį praūžusios liūties nukentėjo ne tik kauno miestas, bet ir rajonas. pliaupiant lietui prie vilkijos vydūno alėjos esančioje apžvalgos aikštelėje ...'
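One possible reading, offered as an assumption since the full data is not shown: the message says the right-hand side of `in` was a float, and pandas represents a missing value (NaN) as a float, so an empty `description` cell somewhere would produce exactly this error. The sketch below moves the check into a named function, where a guard is easy to add, and shows both the `apply` form and the equivalent explicit loop. The sample rows and the helper name `query_in_description` are made up for illustration.

```python
import pandas as pd

# Made-up sample rows; the second description is missing (NaN is a float).
all_news = pd.DataFrame(
    {
        "Query": ["kauno", "medike international", "vilnius"],
        "description": ["nukentėjo ne tik kauno miestas", float("nan"), "kauno rajonas"],
    }
)

def query_in_description(row):
    """Return True only when both cells are strings and Query occurs in description."""
    query, description = row["Query"], row["description"]
    if not isinstance(query, str) or not isinstance(description, str):
        return False  # missing or non-text value: treat as "not found"
    return query in description

# Same shape as the original one-liner, but with the guarded function.
all_news["C"] = all_news.apply(query_in_description, axis=1)

# The same logic unfolded into an explicit for loop.
results = []
for _, row in all_news.iterrows():
    results.append(query_in_description(row))

print(all_news)
```

Whether a missing value should count as `False` or be flagged separately depends on the data; returning `False` here is only one choice.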
I'm trying to perform calculations based on the entries in a pandas dataframe. The dataframe looks something like this:
and it contains 1466 rows. I'll have to run similar calculations on other dfs with more rows later.
What I'm trying to do is calculate something like mag = (U-V)/(R-I) (but ignoring any values that are -999), put that in a new column, and then put z_pred = 10**((mag-c)/m) in another new column (c and m are just hard-coded variables). I have other columns I need to add too, but I figure that'll just be an extension of the same method.
I started out by trying
for i in range(1):
    current = qso[:]
    mag = (U-V)/(R-I)
    name = current['NED']
    z_pred = 10**((mag - c)/m)
    z_meas = current['z']
but I got either a Series for z, which I couldn't operate on, or various type errors when I tried to print the values or write them to a file.
I found this question which gave me a start, but I can't see how to apply it to multiple calculations, as in my situation.
How can I achieve this?
Conditionally adding calculated columns row-wise is usually done with numpy's np.where:
df['mag'] = np.where(~df[['U', 'V', 'R', 'I']].eq(-999).any(axis=1), (df.U - df.V) / (df.R - df.I), -999)
Note: this assumes that when any of the columns contains -999, the value is not calculated and -999 is returned instead.
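The same idea extends to the second column from the question. The sketch below is only an illustration: the sample values and the constants `c` and `m` are made up, and it swaps the -999 placeholder for NaN in the new columns so that the missing marker propagates through `z_pred` without a second condition.

```python
import numpy as np
import pandas as pd

# Made-up sample data; -999 marks a missing measurement, as in the question.
qso = pd.DataFrame(
    {
        "NED": ["obj1", "obj2", "obj3"],
        "U": [20.0, -999.0, 19.0],
        "V": [19.0, 18.0, 17.0],
        "R": [18.5, 17.5, 16.0],
        "I": [18.0, 17.0, 15.5],
    }
)
c, m = 1.0, 1.0  # hypothetical constants

# True for rows where none of the four bands is flagged as missing.
valid = ~qso[["U", "V", "R", "I"]].eq(-999).any(axis=1)

# Whole-column arithmetic; no explicit loop over rows is needed.
ratio = (qso["U"] - qso["V"]) / (qso["R"] - qso["I"])
qso["mag"] = np.where(valid, ratio, np.nan)

# NaN in mag carries over to z_pred automatically.
qso["z_pred"] = 10 ** ((qso["mag"] - c) / m)

print(qso[["NED", "mag", "z_pred"]])
```

Further derived columns can be added the same way, each as one vectorised expression over existing columns.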