I'm trying to use the 'In' comparator with boto for specifying multiple locales on Mechanical Turk jobs. This answer says it's possible, as do the AMT docs.
I tried:
min_qualifications.add(
    LocaleRequirement(
        comparator='In',
        required_to_preview=False,
        locale=['US', 'CA', 'GB', 'IE', 'AU']))
I also tried, variously:
locale='US, CA, GB, IE, AU'
locale='US|CA|GB|IE|AU'
locale='US CA GB IE AU'
How is it done?
Just because something is possible in the mTurk API does not mean that Boto will support it. Boto has not been updated for this yet.
Here's how to do it with mturk-python:
import mturk
m = mturk.MechanicalTurk()
question = """
<QuestionForm xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2005-10-01/QuestionForm.xsd">
<Question>
<QuestionIdentifier>answer</QuestionIdentifier>
<QuestionContent>
<Text>Hello world :^)</Text>
</QuestionContent>
<AnswerSpecification>
<FreeTextAnswer/>
</AnswerSpecification>
</Question>
</QuestionForm>
"""
qual = [
    {'QualificationTypeId' : mturk.LOCALE,
     'Comparator' : 'In',
     'LocaleValue' : [{'Country':'GB'}, {'Country':'US'}, {'Country':'AU'}]},
]

reward = {'Amount' : 0, 'CurrencyCode' : 'USD'}

createhit = {"Title" : "Multiple locales",
             "Description" : "https://github.com/ctrlcctrlv/mturk-python",
             "Keywords" : "testing, one, two, three",
             "Reward" : reward,
             "Question" : question,
             "QualificationRequirement" : qual,
             "AssignmentDurationInSeconds" : 90,
             "LifetimeInSeconds" : (60*60*24)}
r = m.create_request('CreateHIT', createhit)
print r
print m.flattened_parameters
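If you can move to boto3 (boto's successor), its MTurk client accepts this shape directly. The sketch below only builds the requirement dictionary; the actual `create_hit` call is shown commented out because it needs AWS credentials, and the qualification type ID used is the standard system Locale qualification:

```python
# Qualification requirement in the shape boto3's create_hit expects.
# "00000000000000000071" is the built-in Worker_Locale qualification type.
locale_requirement = {
    "QualificationTypeId": "00000000000000000071",
    "Comparator": "In",
    "LocaleValues": [{"Country": c} for c in ["US", "CA", "GB", "IE", "AU"]],
}

# Sketch of the call (requires boto3 and AWS credentials):
# import boto3
# client = boto3.client("mturk")
# client.create_hit(..., QualificationRequirements=[locale_requirement])
```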
So I have multiple different dataframes, all with varying team names under the same column name, 'Team'. I've created a function to run through them all at once, but for some reason it has no effect.
def rename(df):
    df = df.replace({'Team':
                     {'NewEngland' : 'Patriots',
                      'GreenBay' : 'Packers',
                      'Pittsburgh' : 'Steelers',
                      'KansasCity' : 'Chiefs',
                      'Denver' : 'Broncos',
                      'Seattle' : 'Seahawks',
                      'Indianapolis' : 'Colts',
                      'New Orleans' : 'Saints',
                      'NewOrleans' : 'Saints',
                      'Dallas' : 'Cowboys',
                      'Baltimore' : 'Ravens',
                      'Philadelphia' : 'Eagles',
                      'Cincinnati' : 'Bengals',
                      'Carolina' : 'Panthers',
                      'Tennessee' : 'Titans',
                      'Arizona' : 'Cardinals',
                      'Buffalo' : 'Bills',
                      'SanFrancisco' : '49ers',
                      'Minnesota' : 'Vikings',
                      'Washington' : 'Redskins',
                      'Chicago' : 'Bears',
                      'Atlanta' : 'Falcons',
                      'NYGiants' : 'Giants',
                      'NYJets' : 'Jets',
                      'Cleveland' : 'Browns',
                      'Detroit' : 'Lions',
                      'Miami' : 'Dolphins',
                      'TampaBay' : 'Buccaneers',
                      'Jacksonville' : 'Jaguars',
                      'Houston' : 'Texans',
                      'HoustonTexans' : 'Texans',
                      'Oakland' : 'Raiders',
                      'SanDiego' : 'Chargers',
                      'St.Louis' : 'Rams',
                      'LARams' : 'Rams',
                      'LAChargers' : 'Chargers',
                      'LasVegas' : 'Raiders',
                      'LosAngeles' : 'Rams',
                      'NewYork' : 'Giants',
                      'KCChiefs' : 'Chiefs',
                      'Kansas' : 'Chiefs',
                      'Tampa' : 'Buccaneers'
                      }})
When I ran the function on a bunch of different dataframes and then sampled one of them, there were no changes to the 'Team' column.
I know my code is correct because when I run the code outside of the function, such as below:
nfl_07 = nfl_07.replace({'Team':
                         {'NewEngland' : 'Patriots',
                          'GreenBay' : 'Packers',
                          'Pittsburgh' : 'Steelers'
                          # etc.
This code works for some reason; my nfl_07 dataframe has the correct Team names... Is there something wrong with my function?
replace does not act in place by default.
In your function you never return the renamed DataFrame, and at the call site you never assign the result, so nothing happens.
The assignment to df within the function is local. It does not impact the outer scope.
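The same scoping rule can be seen with a plain dict, independent of pandas (a minimal sketch):

```python
# Rebinding a parameter name inside a function never changes the
# caller's variable; only the function's local name is rebound.
def renamer(data):
    data = {"Team": "Patriots"}  # rebinds the *local* name only

d = {"Team": "NewEngland"}
renamer(d)
print(d)  # {'Team': 'NewEngland'} -- unchanged
```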
Either modify in place:
def rename(df):
    df.replace({'Team':
                {'NewEngland' : 'Patriots',
                 'GreenBay' : 'Packers',
                 # ...
                 }}, inplace=True)

rename(nfl_07)
Or return and reassign the output:
def rename(df):
    return df.replace({'Team':
                       {'NewEngland' : 'Patriots',
                        'GreenBay' : 'Packers',
                        # ...
                        }})

nfl_07 = rename(nfl_07)
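As a runnable sanity check, here is a minimal sketch of the return-and-reassign version with a two-row frame and just two of the mappings:

```python
import pandas as pd

df = pd.DataFrame({"Team": ["NewEngland", "GreenBay"], "Wins": [12, 10]})

def rename(df):
    # Return the new frame; replace() does not modify df in place
    return df.replace({"Team": {"NewEngland": "Patriots", "GreenBay": "Packers"}})

df = rename(df)  # reassign the returned frame
print(df["Team"].tolist())  # ['Patriots', 'Packers']
```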
I was trying to write an oper that takes an undetermined number of parameters, so that if a user chooses to omit one of the parameters, the oper changes its behaviour.
oper
gen_NP = overload{
gen_NP : N -> NP =
\noun ->
mkNP(noun);
gen_NP : Str -> N -> NP =
\mdfir, noun ->
mkNP(mkN(mdfir) (noun));
....
}
But writing it this way would generate a huge number of overloads with each new optional parameter.
So I tried this method instead:
oper
gen_NP : {noun : N ; mdfir : Str ; ....} -> NP =
\obj
case eqStr (obj.mdfir) ("") of {
PFalse =>
mkNP(mkN(mdfir) (noun));
PTrue =>
mkNP(noun);
};
}
When I tried the second method, the program kept reporting:
Applying Predef.eqStr: Expected a value of type String, got VP (VGen 1 []) (LIdent(Id{rawId2utf8 = "mdfir"}))
Is there a way to fix this problem, or is there a better way to deal with an undetermined number of parameters?
Thank you
Best practices for overloading opers
A huge number of overloads is the intended way of doing things. Just look at any category in the RGL synopsis: you can easily see over 20 overloads for a single function name. It may be annoying to define them, but that's something you only need to do once. Then when you use your overloads, it's much nicer to write this:
myRegularNoun = mkN "dog" ;
myIrregNoun = mkN "fish" "fish" ;
rather than being forced to give two arguments to everything:
myRegularNoun = mkN "dog" "" ;
myIrregNoun = mkN "fish" "fish" ;
So having several mkN instances is a feature, not a bug.
How to fix your code
I don't recommend using the Predef functions like eqStr, unless you really know what you're doing. For most cases when you need to check strings, you can use the standard pattern matching syntax. This is how to fix your function:
oper
  gen_NP : {noun : N ; mdfir : Str} -> NP = \obj ->
    case obj.mdfir of {
      "" => mkNP obj.noun ;
      _  => mkNP (mkN obj.mdfir obj.noun)
    } ;
Testing in the GF shell, first with mdfir="":
> cc -unqual -table gen_NP {noun = mkN "dog" ; mdfir = ""}
s . NCase Nom => dog
s . NCase Gen => dog's
s . NPAcc => dog
s . NPNomPoss => dog
a . AgP3Sg Neutr
And now some non-empty string in mdfir:
> cc -unqual -table gen_NP {noun = mkN "dog" ; mdfir = "hello"}
s . NCase Nom => hello dog
s . NCase Gen => hello dog's
s . NPAcc => hello dog
s . NPNomPoss => hello dog
a . AgP3Sg Neutr
From Data Science Experience, I am able to make a connection to the Hive database in BigInsights and read the table schema. But Data Science Experience does not seem to be able to read the table contents, as I get a count of zero! Here are some of my settings:
conf = SparkConf().set("com.ibm.analytics.metadata.enabled", "false")
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

dash = {
    'jdbcurl': 'jdbc:hive2://nnnnnnnnnnn:10000/;ssl=true;',
    'user': 'xxxxxxxxxx',
    'password': 'xxxxxxxxx',
}

spark.conf

offers = spark.read.jdbc(dash['jdbcurl'],
                         table='offers',
                         properties={"user" : dash["user"],
                                     "password" : dash["password"]})
offers.count() returns: 0
offers.show()
returns:
+-----------+----------+
|offers.name|offers.age|
+-----------+----------+
+-----------+----------+
Thanks.
Yes, I was able to see the same behaviour with the Hive JDBC connector.
I tried the Python connector documented below and it returned the correct count:
https://datascience.ibm.com/docs/content/analyze-data/python_load.html
from ingest.Connectors import Connectors

HiveloadOptions = { Connectors.Hive.HOST : 'bi-hadoop-prod-4222.bi.services.us-south.bluemix.net',
                    Connectors.Hive.PORT : '10000',
                    Connectors.Hive.SSL : True,
                    Connectors.Hive.DATABASE : 'default',
                    Connectors.Hive.USERNAME : 'charles',
                    Connectors.Hive.PASSWORD : 'march14march',
                    Connectors.Hive.SOURCE_TABLE_NAME : 'student' }

HiveDF = sqlContext.read.format("com.ibm.spark.discover").options(**HiveloadOptions).load()

HiveDF.printSchema()
HiveDF.show()
HiveDF.count()
Thanks,
Charles.
Given a page fetched with Python/requests, I need to findAll every kind of tag (div, h3, p, etc.) with the class name "Specific".
This works only partially:
data = soup.findAll("div", { "class" : "Specific" })
because it finds only div tags.
I am looking for something like :
data = soup.findAll("*", { "class" : "Specific" })
Good answer from soon:
data = soup.find_all(class_='Specific')
You should specify the class_ parameter in the find_all method. The name parameter may be omitted as well:
In [12]: html = '''<div class='Specific'><span class='Specific c1'></span><p class='NonSpecific'></p></div>'''
In [13]: soup = bs4.BeautifulSoup(html, 'html.parser')
In [14]: soup.find_all(class_='Specific')
Out[14]:
[<div class="Specific"><span class="Specific c1"></span><p class="NonSpecific"></p></div>,
<span class="Specific c1"></span>]
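To restrict the match to particular tag names (the div, h3, p from the question) while still filtering by class, find_all also accepts a list of names as its first argument. A minimal sketch with inline HTML:

```python
from bs4 import BeautifulSoup

html = '<div class="Specific">a</div><h3 class="Specific">b</h3><p class="Other">c</p>'
soup = BeautifulSoup(html, "html.parser")

# Only div/h3/p tags are considered; class_ then filters by class
matches = soup.find_all(["div", "h3", "p"], class_="Specific")
print([t.name for t in matches])  # ['div', 'h3']
```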
Is there a way to have shorter _cls values in mongoengine, apart from making the names of the classes shorter (which would make code difficult to read)?
I was looking for something like this:
class User(Document):
    login = StringField(primary_key = True)
    full_name = StringField()
    meta = { "short_class_name": "u" }

class StackOverFlowUser(User):
    rep = IntField()
    meta = { "short_class_name": "s" }
If the short_class_name meta attribute existed (but I have not found it or anything similar), then we could have this:
{ "_cls" : "s", "_id" : "john",
"full_name" : "John Smith", "rep" : 600 }
instead of this:
{ "_cls" : "User.StackOverFlowUser", "_id" : "john",
"full_name" : "John Smith", "rep" : 600 }
In this example, this leads to about 20% space saving, and in some cases, it could be even greater.
Mongoengine is open source, so I could go ahead and code this myself, but if you know a simpler solution, I would love to hear it.
Thanks.
After looking into mongoengine's source code, I (with considerable help from MiniQuark) came up with the following hack:
def hack_document_cls_name(cls, new_name):
    cls._class_name = new_name
    from mongoengine.base import _document_registry
    _document_registry[new_name] = cls
or as class decorator:
def hack_document_cls_name(new_name):
    def wrapper(cls):
        cls._class_name = new_name
        from mongoengine.base import _document_registry
        _document_registry[new_name] = cls
        return cls
    return wrapper
We see no other way than hacking with the _class_name and the _document_registry.
When you want to rename a class, you must apply this hack immediately after the class definition (or at least before you define any sub-classes, or else they will have a _types attribute with the base class's long name). For example:
class User(Document):
    login = StringField(primary_key = True)
    full_name = StringField()

hack_document_cls_name(User, "u")

class StackOverflowUser(User):
    rep = IntField()

hack_document_cls_name(StackOverflowUser, "s")
or as class decorator:
@hack_document_cls_name("u")
class User(Document):
    login = StringField(primary_key = True)
    full_name = StringField()

@hack_document_cls_name("s")
class StackOverflowUser(User):
    rep = IntField()
Ok, so far, the best I could come up with is this. It works, but I'm sure there must be less hackish solutions...
class U(Document):  # User
    meta = { "collection": "user" }
    login = StringField(primary_key = True)
    full_name = StringField()

class S(U):  # StackOverflowUser
    rep = IntField()

User = U; del U
StackOverflowUser = S; del S
User.__name__ = "User"
StackOverflowUser.__name__ = "StackOverflowUser"
Now when I do this:
StackOverflowUser.objects.create(login="john", full_name="John Smith", rep=600)
I get this document in the user collection:
{ "_cls" : "U.S", "_id" : "john", "full_name" : "John Smith", "rep" : 600 }
Compared to the standard behavior, this is about 20% space saving. But I don't like how hackish it is.