Creating a function that changes row values in a specific column - pandas

So I have several dataframes, all with varying team names under the same column name, 'Team'. I've written a function to rename them all at once, but for some reason it has no effect.
def rename(df):
    df = df.replace({'Team':
        {'NewEngland' : 'Patriots',
         'GreenBay' : 'Packers',
         'Pittsburgh' : 'Steelers',
         'KansasCity' : 'Chiefs',
         'Denver' : 'Broncos',
         'Seattle' : 'Seahawks',
         'Indianapolis' : 'Colts',
         'New Orleans' : 'Saints',
         'NewOrleans' : 'Saints',
         'Dallas' : 'Cowboys',
         'Baltimore' : 'Ravens',
         'Philadelphia' : 'Eagles',
         'Cincinnati' : 'Bengals',
         'Carolina' : 'Panthers',
         'Tennessee' : 'Titans',
         'Arizona' : 'Cardinals',
         'Buffalo' : 'Bills',
         'SanFrancisco' : '49ers',
         'Minnesota' : 'Vikings',
         'Washington' : 'Redskins',
         'Chicago' : 'Bears',
         'Atlanta' : 'Falcons',
         'NYGiants' : 'Giants',
         'NYJets' : 'Jets',
         'Cleveland' : 'Browns',
         'Detroit' : 'Lions',
         'Miami' : 'Dolphins',
         'TampaBay' : 'Buccaneers',
         'Jacksonville' : 'Jaguars',
         'Houston' : 'Texans',
         'HoustonTexans' : 'Texans',
         'Oakland' : 'Raiders',
         'SanDiego' : 'Chargers',
         'St.Louis' : 'Rams',
         'LARams' : 'Rams',
         'LAChargers' : 'Chargers',
         'LasVegas' : 'Raiders',
         'LosAngeles' : 'Rams',
         'NewYork' : 'Giants',
         'KCChiefs' : 'Chiefs',
         'Kansas' : 'Chiefs',
         'Tampa' : 'Buccaneers'
        }})
When I run this code, it does not work.
I ran the function on a bunch of different dataframes, but when I sample one of them, there are no changes to the 'Team' column.
I know my code is correct because when I run the code outside of the function, such as below:
nfl_07 = nfl_07.replace({'Team':
{'NewEngland' : 'Patriots',
'GreenBay' : 'Packers',
'Pittsburgh' : 'Steelers'
etc.
This code works for some reason; my nfl_07 dataframe has the correct Team names... Is there something wrong with my function?

replace does not act in place by default.
Your function does not return the renamed DataFrame, and when you call the function you do not assign its output, so nothing happens.
The assignment to df within the function is local. It does not impact the outer scope.
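To make the scoping point concrete, here is a minimal sketch. The function name rename_broken and the two-row dataframe are made up for illustration; only the replace call mirrors the question.

```python
import pandas as pd

def rename_broken(df):
    # Rebinds the local name df to a new DataFrame.
    # The caller's object is never modified and nothing is returned.
    df = df.replace({"Team": {"NewEngland": "Patriots"}})

# Hypothetical sample data, not taken from the question
nfl = pd.DataFrame({"Team": ["NewEngland", "Denver"]})
result = rename_broken(nfl)

print(result)                # None, because the function has no return statement
print(nfl["Team"].tolist())  # ['NewEngland', 'Denver'], unchanged
```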
Either modify in place:
def rename(df):
    df.replace({'Team':
        {'NewEngland' : 'Patriots',
         'GreenBay' : 'Packers',
         # ...
        }}, inplace=True)
rename(nfl_07)
Or return and reassign the output:
def rename(df):
    return df.replace({'Team':
        {'NewEngland' : 'Patriots',
         'GreenBay' : 'Packers',
         # ...
        }})
nfl_07 = rename(nfl_07)
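Since the question involves many dataframes, the return-and-reassign variant could be applied to all of them in one pass. This is only a sketch: the TEAM_NAMES constant, the frames dictionary, the nfl_08 name and the sample rows are assumptions for illustration, and the mapping is shortened to two entries.

```python
import pandas as pd

# Shortened mapping; the full dictionary from the question would go here
TEAM_NAMES = {"NewEngland": "Patriots", "GreenBay": "Packers"}

def rename(df):
    # Returns a new DataFrame; the input is left untouched
    return df.replace({"Team": TEAM_NAMES})

# Hypothetical sample frames keyed by name
frames = {
    "nfl_07": pd.DataFrame({"Team": ["NewEngland", "GreenBay"]}),
    "nfl_08": pd.DataFrame({"Team": ["GreenBay", "Pittsburgh"]}),
}

# Reassign every frame to its renamed copy
frames = {name: rename(df) for name, df in frames.items()}

print(frames["nfl_07"]["Team"].tolist())  # ['Patriots', 'Packers']
print(frames["nfl_08"]["Team"].tolist())  # ['Packers', 'Pittsburgh']
```

Names missing from the mapping, such as 'Pittsburgh' in this shortened version, pass through unchanged.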

Scala MatchError while joining a dataframe and a dataset

I have one dataframe and one dataset :
Dataframe 1 :
+------------------------------+-----------+
|City_Name                     |Level      |
+------------------------------+-----------+
|{City -> Paris}               |86         |
+------------------------------+-----------+
Dataset 2 :
+-----------------------------------+-----------+
|Country_Details                    |Temperature|
+-----------------------------------+-----------+
|{City -> Paris, Country -> France} |31         |
+-----------------------------------+-----------+
I am trying to make a join of them by checking if the map in the column "City_Name" is included in the map of the Column "Country_Details".
I am using the following UDF to check the condition :
val mapEqual = udf((col1: Map[String, String], col2: Map[String, String]) => {
  if (col2.nonEmpty) {
    col2.toSet subsetOf col1.toSet
  } else {
    true
  }
})
And I am making the join this way :
dataset2.join(dataframe1 , mapEqual(dataset2("Country_Details"), dataframe1("City_Name"), "leftanti")
However, I get this error:
terminated with error scala.MatchError: UDF(Country_Details#528) AS City_Name#552 (of class org.apache.spark.sql.catalyst.expressions.Alias)
Has anyone run into the same error before?
I am using Spark version 3.0.2 and SQLContext, with Scala.
There are two issues here. The first is that when you call your function, you pass one extra argument, "leftanti": you meant to pass it to the join function, but you passed it to the UDF instead.
The second is that the UDF logic won't work as expected. I suggest you use this:
val mapContains = udf { (col1: Map[String, String], col2: Map[String, String]) =>
  col2.keys.forall { key =>
    col1.get(key).exists(_ eq col2(key))
  }
}
Result:
scala> ds.join(df1 , mapContains(ds("Country_Details"), df1("City_Name")), "leftanti").show(false)
+----------------------------------+-----------+
|Country_Details |Temperature|
+----------------------------------+-----------+
|{City -> Paris, Country -> France}|31 |
+----------------------------------+-----------+
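The containment check the UDF is aiming for can be illustrated outside Spark. The sketch below is plain Python, not Spark code: the function name map_contains and the sample dictionaries are assumptions for illustration, and values are compared with ==, meaning by value.

```python
def map_contains(col1, col2):
    """Return True if every key/value pair of col2 also appears in col1."""
    return all(key in col1 and col1[key] == value for key, value in col2.items())

country_details = {"City": "Paris", "Country": "France"}
city_name = {"City": "Paris"}

print(map_contains(country_details, city_name))         # True
print(map_contains(country_details, {"City": "Lyon"}))  # False
print(map_contains(country_details, {}))                # True: an empty map is contained in any map
```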

How can I sort elements of a TypedPipe in Scalding?

I have not been able to find a way to sort elements of a TypedPipe in Scalding (when not performing a group operation). Here are the relevant parts of my program (replacing irrelevant parts with ellipses):
case class ReduceOutput(val slug : String, score : Int, json1 : String, json2 : String)
val pipe1 : TypedPipe[(String, ReduceFeatures)] = ...
val pipe2 : TypedPipe[(String, ReduceFeatures)] = ...
pipe1.join(pipe2).map { entry =>
  val (slug : String, (features1 : ReduceFeatures, features2 : ReduceFeatures)) = entry
  new ReduceOutput(
    slug,
    computeScore(features1, features2),
    features1.json,
    features2.json)
}
.write(TypedTsv[ReduceOutput](args("output")))
Is there a way to sort the elements on their score after the map but before the write?

Get-SBNamespace truncates ManageUsers list when it's long

Get-SBNamespace
When I run this, and I have a list of ManageUsers, it truncates the list.
PS C:\Program Files\Service Bus\1.1> Get-SBNamespace
SubscriptionId : 00000000000000000000000000000000
State : Active
Name : NameSpaceOne
AddressingScheme : Path
CreatedTime : 9/14/2017 8:39:34 PM
IssuerName : NameSpaceOne
IssuerUri : NameSpaceOne
ManageUsers : {Me@mycompany.com,You@mycompany.com,Others@mycomp
DnsEntry :
PrimarySymmetricKey : No No No
SecondarySymmetricKey :
SubscriptionId : 00000000000000000000000000000000
State : Active
Name : NameSpaceTwo
AddressingScheme : Path
CreatedTime : 2/14/2017 7:32:39 PM
IssuerName : NameSpaceTwo
IssuerUri : NameSpaceTwo
ManageUsers : {Me@mycompany.com,You@mycompany.com,Others@mycomp
DnsEntry :
PrimarySymmetricKey : No No No
SecondarySymmetricKey :
Is there a way to get all the ManageUsers values when there are many of them?
#stopTruncatingMe
(Get-SBNamespace).ManageUsers
accesses the ManageUsers property and displays the full list of users.

SAP HANA CDS View Fuzzy Search not working

I have an HDBDD defined as shown below, but the fuzzy search I tried with the following query doesn't work. It only matches the exact text, such as "Singapore".
https://xxxxxxxx.xxx.xx.xxxxx.com/xxxxx.xsodata/LandValue?$format=json&search=singaporw
namespace xxx;
@Schema : 'XXX'
context fuzzysearch {
    @Catalog.tableType : #COLUMN
    entity ADDRESS {
        key id : Integer;
        street : String(80);
        zipCode : Integer;
        city : String(80);
        @SearchIndex.text.enabled : true
        @SearchIndex.fuzzy.enabled : true
        country : String(80);
    };
    @Search.searchable: true
    define view V_ADDRESS as select from ADDRESS as ADDRESS {
        @EnterpriseSearch.key : true
        ADDRESS.id,
        @Search.defaultSearchElement: true
        @Search.ranking: #HIGH
        @Search.fuzzinessThreshold : 0.7
        ADDRESS.country
    };
};
Looks like you are using this as your base example?
Try changing your fuzzy threshold to something like 0.8 or 0.87:
https://xxxxxxxx.xxx.xx.xxxxx.com/xxxxx.xsodata/LandValue?$format=json&search=singporw
Of course, if the only country in your dataset is Singapore, then you will get every row back every time.

boto and the 'In' comparator

I'm trying to use the 'In' comparator with boto for specifying multiple locales on Mechanical Turk jobs. This answer says it's possible, as do the AMT docs.
I tried:
min_qualifications.add(
    LocaleRequirement(
        comparator='In',
        required_to_preview=False,
        locale=['US', 'CA', 'GB', 'IE', 'AU']))
I also tried, variously:
locale='US, CA, GB, IE, AU'
locale='US|CA|GB|IE|AU'
locale='US CA GB IE AU'
How is it done?
Just because something is possible in the MTurk API does not mean that boto supports it. Boto has not been updated for this yet.
Here's how to do it with mturk-python:
import mturk
m = mturk.MechanicalTurk()
question = """
<QuestionForm xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionForm.xsd">
<Question>
<QuestionIdentifier>answer</QuestionIdentifier>
<QuestionContent>
<Text>Hello world :^)</Text>
</QuestionContent>
<AnswerSpecification>
<FreeTextAnswer/>
</AnswerSpecification>
</Question>
</QuestionForm>
"""
qual = [
    {'QualificationTypeId' : mturk.LOCALE,
     'Comparator' : 'In',
     'LocaleValue' : [{'Country':'GB'},{'Country':'US'},{'Country':'AU'}]},
]
reward = {'Amount' : 0, 'CurrencyCode' : 'USD'}
createhit = {"Title" : "Multiple locales",
             "Description" : "https://github.com/ctrlcctrlv/mturk-python",
             "Keywords" : "testing, one, two, three",
             "Reward" : reward,
             "Question" : question,
             "QualificationRequirement" : qual,
             "AssignmentDurationInSeconds" : 90,
             "LifetimeInSeconds" : (60*60*24)}
r = m.create_request('CreateHIT', createhit)
# print() with a single argument works on both Python 2 and Python 3
print(r)
print(m.flattened_parameters)
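For readers on the newer boto3 library, which the answer above does not cover, the same requirement can be expressed as a plain dictionary. This is a sketch under assumptions: the helper name locale_requirement is made up, and the dictionary keys follow the QualificationRequirement structure that boto3's MTurk create_hit accepts as far as I know. The snippet only builds the dictionary and does not contact AWS.

```python
# Built-in MTurk qualification type ID for worker locale
LOCALE_QUALIFICATION_ID = "00000000000000000071"

def locale_requirement(countries):
    """Build one QualificationRequirement dict that accepts any listed country."""
    return {
        "QualificationTypeId": LOCALE_QUALIFICATION_ID,
        "Comparator": "In",
        "LocaleValues": [{"Country": code} for code in countries],
    }

requirement = locale_requirement(["US", "CA", "GB", "IE", "AU"])
print(requirement)

# Assumed usage (needs boto3 and AWS credentials, so it is not executed here):
#   client = boto3.client("mturk")
#   client.create_hit(..., QualificationRequirements=[requirement])
```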