Renaming columns in dataframe according to reference key

Let's assume I have a simple data frame:
df <- data.frame("one"= c(1:5), "two" = c(6:10), "three" =c(7:11))
I would like to rename my columns so that they match a reference.
Let my reference be the following:
df2 <- data.frame("Name" = c("A", "B", "C"), "Oldname" = c("one", "two", "three"))
How could I replace the column names of df with those from df2 wherever they match an entry in Oldname (so that the column names in df are: A, B, C)?
In my original data df2 is much bigger and I have multiple data sets such as df, so for a solution to work, the code should be as generic as possible. Thanks in advance!

We can use the match function here to map the new names onto the old ones:
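idx <- match(names(df), df2$Oldname)  # position of each column name in the reference, NA if absent
names(df)[!is.na(idx)] <- as.character(df2$Name[idx[!is.na(idx)]])  # unmatched columns keep their old names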
names(df)
[1] "A" "B" "C"

pandas - lookup a value in another DF without merging data from the other dataframe

I have 2 DataFrames. DF1 and DF2.
(Please note that DF2 has more entries than DF1)
What I want to do is add the nationality column to DF1 (The script should look up the name in DF1 and find the corresponding nationality in DF2).
I am currently using the below code
final_DF =df1.merge(df2[['PPS','Nationality']], on=['PPS'], how ='left')
Although the nationality column is being added to DF1, the code is duplicating entries and also adding additional data from DF2 that I do not want.
Is there a method to get the nationality from DF2 while only keeping the DF1 data?
Thanks
[DF1, DF2 and the expected OUTPUT were shown as tables that are not reproduced here]
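There are two things you need to do.
First, check whether there are any duplicated keys in DF2. Each duplicated key in DF2 produces an extra row in the merged result, so drop the duplicates before merging.
Second, define 'how' in the merge statement, so it will look like
final_DF = DF1.merge(DF2, on=['Name'], how = 'left')
Since you want to keep only the DF1 rows, 'left' should be the ideal option for you.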
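The two points above can be combined in a short sketch. The sample data and the Age column are made up for illustration, and the key is assumed to be PPS as in the question's own code; swap in Name if that is the real key. Restricting DF2 to the key plus Nationality keeps the unwanted columns out, and dropping duplicated keys first keeps the row count equal to DF1.

```python
import pandas as pd

# Hypothetical sample data; only the column names PPS and Nationality come from the question.
df1 = pd.DataFrame({"PPS": [1, 2, 3], "Name": ["Ann", "Bob", "Cid"]})
df2 = pd.DataFrame({
    "PPS": [1, 1, 2, 4],
    "Nationality": ["Irish", "Irish", "French", "Polish"],
    "Age": [30, 30, 41, 25],  # extra column we do not want in the result
})

# Keep only the key and the wanted column, with one row per key.
lookup = df2[["PPS", "Nationality"]].drop_duplicates(subset="PPS")

# A left merge keeps every DF1 row; validate="m:1" raises an error
# if the lookup table still contains duplicated keys.
final_df = df1.merge(lookup, on="PPS", how="left", validate="m:1")
print(final_df)
```

Rows of DF1 with no match in DF2 are kept and get a missing value in Nationality. Note that drop_duplicates keeps the first row per key, so if the same key carries conflicting nationalities in DF2 you would need to decide which one wins.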

Select rows with missing value in a Julia dataframe

I've just started exploring Julia and am struggling with subsetting dataframes. I would like to select rows where LABEL has the value "B" and VALUE is missing. Selecting rows with "B" works fine, but trying to add a filter for missing fails. Any suggestions on how to solve this? Tips for good documentation on subsetting/filtering dataframes in Julia are welcome. I haven't found a solution in the Julia documentation.
using DataFrames
df = DataFrame(ID = 1:5, LABEL = ["A", "A", "B", "B", "B"], VALUE = ["A1", "A2", "B1", "B2", missing])
df[df[:LABEL] .== "B", :] # works fine
df[df[:LABEL] .== "B" && df[:VALUE] .== missing, :] # fails
Use:
filter([:LABEL, :VALUE] => (l, v) -> l == "B" && ismissing(v), df)
(a very similar example is given in the documentation of the filter function).
If you want to use getindex then write:
df[(df.LABEL .== "B") .& ismissing.(df.VALUE), :]
The fact that you need to use .& instead of && when working with arrays is not DataFrames.jl specific - this is a common pattern in Julia in general when indexing arrays with booleans.

Multiple column selection on a Julia DataFrame

Imagine I have the following DataFrame :
10 rows x 26 columns named A to Z
What I would like to do is to make a multiple subset of the columns by their name (not the index). For instance, assume that I want columns A to D and P to Z in a new DataFrame named df2.
I tried something like this but it doesn't seem to work :
df2=df[:,[:A,:D ; :P,:Z]]
syntax: unexpected semicolon in array expression
top-level scope at Slicing.jl:1
Any idea of the way to do it ?
Thanks for any help
df2 = select(df, Between(:A,:D), Between(:P,:Z))
or
df2 = df[:, Cols(Between(:A,:D), Between(:P,:Z))]  # Cols replaces the older All(...) form in current DataFrames.jl
if you are sure your columns are only from :A to :Z you can also write:
df2 = select(df, Not(Between(:E, :O)))
or
df2 = df[:, Not(Between(:E, :O))]
Finally, you can easily find the index of a column using the columnindex function, e.g.:
columnindex(df, :A)
and later use column numbers, if that is what you would prefer.
In Julia you can also build Ranges with Chars and hence when your columns are named just by single letters yet another option is:
df[:, Symbol.(vcat('A':'D', 'P':'Z'))]

Merge two unequal length data frames by factor matching

I'm new to R and I have been searching all over for a way to merge two data frames and match them by factor. Some of the data does have white space. Here is a simple example of what I am trying to do:
df1 = data.frame(id=c(1,2,3,4,5), item=c("apple", " ", "coffee", "orange", "bread"))
df2 = data.frame(item=c("orange", "carrot", "peas", "coffee", "cheese", "apple", "bacon"),count=c(2,5,13,4,11,9,3))
When I use the merge() function to combine df2 into df1 by matching on the 'item' name, I end up with an "item" column of NAs.
ndf = merge(df1, df2, by="item")
How do I resolve this issue? Am I getting this because I have white space in my data? Any help would be great. Thanks,

Select column of dataframe with name matching a string in Julia?

I have a large DataFrame that I import from a spreadsheet. I have the names of several columns that I care about in an array of strings. How do I select a column of the DataFrame whose name matches the contents of a string? I would have thought that something like this would work
using DataFrames
df = DataFrame(A = 1:4, B = ["M", "F", "F", "M"], C = 2:5)
colsICareAbout = [":B" ":C"]
df[:A] #This works
df[colsICareAbout[1]] #This doesn't work
Is there a way to do this?
Strings are different than symbols, but they're easy to convert.
colsICareAbout = ["B","C"]
df[!, Symbol(colsICareAbout[1])]  # Symbol (capitalised) in Julia 1.x; current DataFrames.jl needs a row selector such as !
Mind you it might be better to make the entries in colsICareAbout symbols to begin with, but I don't know where your data is coming from.