Applying Predef.eqStr: Expected a value of type string - gf

I was trying to write an operation that takes an undetermined number of parameters, so that if a user chooses not to fill in one of the parameters, the operator changes its functionality.
oper
  gen_NP = overload {
    gen_NP : N -> NP =
      \noun ->
        mkNP (noun) ;
    gen_NP : Str -> N -> NP =
      \mdfir, noun ->
        mkNP (mkN (mdfir) (noun)) ;
    ....
  }
But written this way, the code would need a huge number of overloads, one for each combination of optional parameters.
So I tried this method instead:
oper
  gen_NP : {noun : N ; mdfir : Str ; ....} -> NP =
    \obj ->
      case eqStr (obj.mdfir) ("") of {
        PFalse =>
          mkNP (mkN (mdfir) (noun)) ;
        PTrue =>
          mkNP (noun)
      } ;
When I tried the second method, the program kept reporting:
Applying Predef.eqStr: Expected a value of type String, got VP (VGen 1 []) (LIdent(Id{rawId2utf8 = "mdfir"}))
Is there a way to fix this problem, or is there a better way to deal with an undetermined number of parameters?
Thank you

Best practices for overloading opers
A huge number of overloads is the intended way of doing things. Just look at any category in the RGL synopsis: you can easily see over 20 overloads for a single function name. It may be annoying to define them, but it's something you only need to do once. Then, when you use your overloads, it's much nicer to write:
myRegularNoun = mkN "dog" ;
myIrregNoun = mkN "fish" "fish" ;
rather than being forced to give two arguments to everything:
myRegularNoun = mkN "dog" "" ;
myIrregNoun = mkN "fish" "fish" ;
So having several mkN instances is a feature, not a bug.
How to fix your code
I don't recommend using the Predef functions like eqStr, unless you really know what you're doing. For most cases when you need to check strings, you can use the standard pattern matching syntax. This is how to fix your function:
oper
  gen_NP : {noun : N ; mdfir : Str} -> NP = \obj ->
    case obj.mdfir of {
      "" => mkNP obj.noun ;
      _  => mkNP (mkN obj.mdfir obj.noun)
    } ;
Testing in the GF shell, first with mdfir="":
> cc -unqual -table gen_NP {noun = mkN "dog" ; mdfir = ""}
s . NCase Nom => dog
s . NCase Gen => dog's
s . NPAcc => dog
s . NPNomPoss => dog
a . AgP3Sg Neutr
And now some non-empty string in mdfir:
> cc -unqual -table gen_NP {noun = mkN "dog" ; mdfir = "hello"}
s . NCase Nom => hello dog
s . NCase Gen => hello dog's
s . NPAcc => hello dog
s . NPNomPoss => hello dog
a . AgP3Sg Neutr

Related

Scala MatchError while joining a dataframe and a dataset

I have one dataframe and one dataset:
Dataframe 1:
+------------------------------+-----------+
|City_Name                     |Level      |
+------------------------------+-----------+
|{City -> Paris}               |86         |
+------------------------------+-----------+
Dataset 2:
+-----------------------------------+-----------+
|Country_Details                    |Temperature|
+-----------------------------------+-----------+
|{City -> Paris, Country -> France} |31         |
+-----------------------------------+-----------+
I am trying to make a join of them by checking if the map in the column "City_Name" is included in the map of the Column "Country_Details".
I am using the following UDF to check the condition :
val mapEqual = udf((col1: Map[String, String], col2: Map[String, String]) => {
  if (col2.nonEmpty) {
    col2.toSet subsetOf col1.toSet
  } else {
    true
  }
})
And I am making the join this way :
dataset2.join(dataframe1 , mapEqual(dataset2("Country_Details"), dataframe1("City_Name"), "leftanti")
However, I get such error :
terminated with error scala.MatchError: UDF(Country_Details#528) AS City_Name#552 (of class org.apache.spark.sql.catalyst.expressions.Alias)
Has anyone previously got the same error ?
I am using Spark version 3.0.2 and SQLContext, with scala language.
There are 2 issues here. The first one is that when you're calling your function, you're passing an extra parameter, leftanti (you meant to pass it to the join function, but you passed it to the udf instead).
The second one is that the udf logic won't work as expected; I suggest you use this:
val mapContains = udf { (col1: Map[String, String], col2: Map[String, String]) =>
  // every key/value pair in col2 must also appear in col1
  // (use ==, not eq: eq compares references, not string contents)
  col2.keys.forall { key =>
    col1.get(key).exists(_ == col2(key))
  }
}
Result:
scala> ds.join(df1 , mapContains(ds("Country_Details"), df1("City_Name")), "leftanti").show(false)
+----------------------------------+-----------+
|Country_Details |Temperature|
+----------------------------------+-----------+
|{City -> Paris, Country -> France}|31 |
+----------------------------------+-----------+
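The containment predicate inside the udf can also be exercised outside Spark as plain Scala, which makes the fix easier to see. This is only a sketch of the map-subset logic (the object and method names here are made up for illustration), not the Spark udf itself:

```scala
object MapContainsCheck {
  // true when every key/value pair of `sub` also appears in `sup`
  def contains(sup: Map[String, String], sub: Map[String, String]): Boolean =
    sub.keys.forall(k => sup.get(k).exists(_ == sub(k)))

  def main(args: Array[String]): Unit = {
    val countryDetails = Map("City" -> "Paris", "Country" -> "France")
    val cityName = Map("City" -> "Paris")
    println(contains(countryDetails, cityName)) // true: City -> Paris is included
    println(contains(cityName, countryDetails)) // false: Country is missing
  }
}
```

Note that an empty sub-map is trivially contained, which matches the `col2.nonEmpty` special case in the original udf.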

How to convert a class expression (from restriction unionOf) to a string?

A SPARQL query returns a result with restrictions using allValuesFrom and unionOf. I need to concatenate these values, but when I use the bind or str functions, the result is blank.
I tried the bind, str and group_concat functions, but all of them were unsuccessful. group_concat returns a blank node.
SELECT DISTINCT ?source ?is_succeeded_by
WHERE {
  ?source rdfs:subClassOf ?restriction .
  ?restriction owl:onProperty j.0:isSucceededBy .
  ?restriction owl:allValuesFrom ?is_succeeded_by .
  FILTER (REGEX(STR(?source), 'gatw-Invoice_match'))
}
Result of the SPARQL query in Protégé (screenshot omitted):
You can hardly obtain strings like 'xxx or yyy' programmatically in Jena, since that is Manchester Syntax, an OWL-API native format, which is not supported by Jena.
Any class expression is actually a b-node; there are no built-in symbols like 'or' in raw RDF.
To represent an anonymous class expression as a string, you can use ONT-API, which is a Jena-based implementation of OWL-API; therefore both SPARQL and Manchester Syntax are supported there.
Here is an example based on the pizza ontology:
// use pizza, since no example data was provided in the question:
IRI pizza = IRI.create("https://raw.githubusercontent.com/owlcs/ont-api/master/src/test/resources/ontapi/pizza.ttl");
// get an OWLOntologyManager instance from ONT-API:
OntologyManager manager = OntManagers.createONT();
// the ontology as an extended Jena model:
OntModel model = manager.loadOntology(pizza).asGraphModel();
// prepare a query that looks like the original, but for pizza:
String txt = "SELECT DISTINCT ?source ?is_succeeded_by\n" +
        "WHERE {\n" +
        "  ?source rdfs:subClassOf ?restriction .\n" +
        "  ?restriction owl:onProperty :hasTopping .\n" +
        "  ?restriction owl:allValuesFrom ?is_succeeded_by .\n" +
        "  FILTER (REGEX(STR(?source), 'Am'))\n" +
        "}";
Query q = new Query();
q.setPrefixMapping(model);
q = QueryFactory.parse(q, txt, null, Syntax.defaultQuerySyntax);
// from the owlapi-parsers package:
OWLObjectRenderer renderer = new ManchesterOWLSyntaxOWLObjectRendererImpl();
// from ont-api (although it is part of the internal API, it is public):
InternalObjectFactory iof = new SimpleObjectFactory(manager.getOWLDataFactory());
// execute the SPARQL query:
try (QueryExecution exec = QueryExecutionFactory.create(q, model)) {
    ResultSet res = exec.execSelect();
    while (res.hasNext()) {
        QuerySolution qs = res.next();
        List<Resource> vars = Iter.asStream(qs.varNames()).map(qs::getResource).collect(Collectors.toList());
        if (vars.size() != 2)
            throw new IllegalStateException("Must not happen for the specified query and valid OWL");
        // Resource (Jena) -> OntClass (ONT-API) -> ONTObject (ONT-API) -> OWLClassExpression (OWL-API)
        OWLClassExpression ex = iof.getClass(vars.get(1).inModel(model).as(OntClass.class)).getOWLObject();
        // format: 'class local name' ||| 'superclass string in Manchester Syntax'
        System.out.println(vars.get(0).getLocalName() + " ||| " + renderer.render(ex));
    }
}
The output:
American ||| MozzarellaTopping or PeperoniSausageTopping or TomatoTopping
AmericanHot ||| HotGreenPepperTopping or JalapenoPepperTopping or MozzarellaTopping or PeperoniSausageTopping or TomatoTopping
Used env: ont-api:2.0.0, owl-api:5.1.11, jena-arq:3.13.1

How can I sort elements of a TypedPipe in Scalding?

I have not been able to find a way to sort elements of a TypedPipe in Scalding (when not performing a group operation). Here are the relevant parts of my program (replacing irrelevant parts with ellipses):
case class ReduceOutput(val slug : String, score : Int, json1 : String, json2 : String)
val pipe1 : TypedPipe[(String, ReduceFeatures)] = ...
val pipe2 : TypedPipe[(String, ReduceFeatures)] = ...
pipe1.join(pipe2).map { entry =>
  val (slug : String, (features1 : ReduceFeatures, features2 : ReduceFeatures)) = entry
  new ReduceOutput(
    slug,
    computeScore(features1, features2),
    features1.json,
    features2.json)
}
.write(TypedTsv[ReduceOutput](args("output")))
Is there a way to sort the elements on their score after the map but before the write?
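A bare TypedPipe has no order to sort within; the usual Scalding trick (assuming the typed API's `groupAll`) is to funnel everything into a single group, as in `pipe.groupAll.sortBy(_.score).values`, between the map and the write. The ordering itself can be sketched with plain Scala collections, using the field names from the question:

```scala
case class ReduceOutput(slug: String, score: Int, json1: String, json2: String)

object SortByScore {
  // plain-Scala analogue of groupAll.sortBy(_.score): ascending by score
  def sort(rows: List[ReduceOutput]): List[ReduceOutput] =
    rows.sortBy(_.score)

  def main(args: Array[String]): Unit = {
    val rows = List(
      ReduceOutput("b", 3, "{}", "{}"),
      ReduceOutput("a", 1, "{}", "{}"),
      ReduceOutput("c", 2, "{}", "{}"))
    println(sort(rows).map(_.slug)) // List(a, c, b)
  }
}
```

Keep in mind that `groupAll` forces all data through one reducer, so a total sort like this only makes sense for output small enough for a single node.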

Querying for close matches in MongoDB and Rails 3

So, I need to write something in Rails 3 that queries a MongoDB collection (if you don't know Mongo, I don't need the code, just some ideas) for close matches. For instance, say you are searching a collection for {item : a, item2 : b, item3 : c}. An exact match would have all three, but I also want matches that omit one of the three keys. I have two theories on how to handle this. One would be to do multiple queries that omit certain parts of the data, and the other would be to write a complex or statement. I don't feel these are the best solutions, though. Could anyone else suggest something to me? Even an answer from an SQL perspective would work for me.
I do need something that can be done fast. This is for a search that needs to return results as fast as possible.
Yet another approach would be to use MapReduce.
With it you can calculate how many fields a document matches.
It's not a very performant approach at the moment, but it is one of the most flexible.
The code can be something like this:
var m = function() {
  var fieldsToMatch = {item: a, item2: b, item3: c};
  for (var k in fieldsToMatch) {
    if (this[k] == fieldsToMatch[k]) {
      emit(this._id, {count: 1}); // emit 1 for each field matched
    }
  }
};
var r = function(k, vals) {
  var result = {count: 0};
  vals.forEach(function(v) {
    result.count += v.count;
  });
  return result;
};
db.items.mapReduce(m, r, {out: 'out_collection'});
Why don't you just use MongoDB's $or? In Ruby (using Mongoid) you can do this by
Collection.any_of({:item => a, :item2 => b, :item3 => c}, {:item => a, :item2 => b}, {:item => a, :item3 => c}, {:item2 => b, :item3 => c})
which is equivalent to
db.Collection.find({$or: [{item: a, item2: b, item3: c}, {item: a, item2: b}, {item: a, item3: c}, {item2: b, item3: c}]})
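The list of $or branches (the exact match plus each "omit one key" variant) can also be generated rather than written by hand. A plain-Scala sketch of that enumeration (the names here are hypothetical, just to show the combinatorics):

```scala
object NearMatchBranches {
  // the exact criteria plus every variant that drops exactly one key
  def orBranches(criteria: Map[String, String]): List[Map[String, String]] =
    criteria :: criteria.keys.toList.map(k => criteria - k)

  def main(args: Array[String]): Unit = {
    val branches = orBranches(Map("item" -> "a", "item2" -> "b", "item3" -> "c"))
    branches.foreach(println) // 4 branches: one exact, three partial
  }
}
```

For n keys this yields n + 1 branches, which stays manageable as long as only single-key omissions are allowed.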

Add quoted strings support to Antlr3 grammar

I'm trying to implement a grammar for parsing queries. Single query consists of items where each item can be either name or name-ref.
name is either mystring (only letters, no spaces) or "my long string" (letters and spaces, always quoted). name-ref is very similar to name and the only difference is that it should start with ref: (ref:mystring, ref:"my long string"). Query should contain at least 1 item (name or name-ref).
Here's what I have:
NAME: ('a'..'z')+;
REF_TAG: 'ref:';
SP: ' '+;
name: NAME;
name_ref: REF_TAG name;
item: name | name_ref;
query: item (SP item)*;
This grammar demonstrates what I basically need to get, and the only missing feature is that it doesn't support long quoted strings (it works fine for names that don't have spaces).
SHORT_NAME: ('a'..'z')+;
LONG_NAME: SHORT_NAME (SP SHORT_NAME)*;
REF_TAG: 'ref:';
SP: ' '+;
Q: '"';
short_name: SHORT_NAME;
long_name: LONG_NAME;
name_ref: REF_TAG (short_name | (Q long_name Q));
item: (short_name | (Q long_name Q)) | name_ref;
query: item (SP item)*;
But that doesn't work. Any ideas what the problem is? This is probably important: my first query should be treated as 3 items (3 names), while "my first query" is 1 item (1 long_name).
ANTLR's lexer matches greedily: that is why input like my first query is being tokenized as LONG_NAME instead of 3 SHORT_NAMEs with spaces in between.
Simply remove the LONG_NAME rule and define it in the parser rule long_name.
The following grammar:
SHORT_NAME : ('a'..'z')+;
REF_TAG : 'ref:';
SP : ' '+;
Q : '"';
short_name : SHORT_NAME;
long_name : Q SHORT_NAME (SP SHORT_NAME)* Q;
name_ref : REF_TAG (short_name | long_name);
item : short_name | long_name | name_ref;
query : item (SP item)*;
will parse the input:
my first query "my first query" ref:mystring
as follows (parse tree image omitted):
However, you could also tokenize a quoted name in the lexer and strip the quotes from it with a bit of custom code. And removing spaces from the lexer could also be an option. Something like this:
SHORT_NAME : ('a'..'z')+;
LONG_NAME : '"' ~'"'* '"' {setText(getText().substring(1, getText().length()-1));};
REF_TAG : 'ref:';
SP : ' '+ {skip();};
name_ref : REF_TAG (SHORT_NAME | LONG_NAME);
item : SHORT_NAME | LONG_NAME | name_ref;
query : item+ EOF;
which would parse the same input as follows (parse tree image omitted):
Note that the actual token LONG_NAME will be stripped of its start- and end-quote.
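The embedded setText action simply drops the first and last character of the matched text. The same transformation in isolation looks like this (a sketch in Scala; the object name is made up):

```scala
object StripQuotes {
  // mirrors setText(getText().substring(1, getText().length() - 1))
  def strip(token: String): String =
    token.substring(1, token.length - 1)

  def main(args: Array[String]): Unit = {
    println(strip("\"my long string\"")) // my long string
  }
}
```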
Here's a grammar that should work for your requirements:
SP: ' '+;
SHORT_NAME: ('a'..'z')+;
LONG_NAME: '"' SHORT_NAME (SP SHORT_NAME)* '"';
REF: 'ref:' (SHORT_NAME | LONG_NAME);
item: SHORT_NAME | LONG_NAME | REF;
query: item (SP item)*;
If you put this at the top:
grammar Query;
@members {
  public static void main(String[] args) throws Exception {
    QueryLexer lex = new QueryLexer(new ANTLRFileStream(args[0]));
    CommonTokenStream tokens = new CommonTokenStream(lex);
    QueryParser parser = new QueryParser(tokens);
    try {
      TokenSource ts = parser.getTokenStream().getTokenSource();
      Token tok = ts.nextToken();
      while (EOF != tok.getType()) {
        System.out.println("Got a token: " + tok);
        tok = ts.nextToken();
      }
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}
You should see the lexer break everything apart nicely (I hope ;-) )
hi there "long name" ref:shortname ref:"long name"
Should give:
Got a token: [#-1,0:1='hi',<6>,1:0]
Got a token: [#-1,2:2=' ',<7>,1:2]
Got a token: [#-1,3:7='there',<6>,1:3]
Got a token: [#-1,8:8=' ',<7>,1:8]
Got a token: [#-1,9:19='"long name"',<4>,1:9]
Got a token: [#-1,20:20=' ',<7>,1:20]
Got a token: [#-1,21:33='ref:shortname',<5>,1:21]
Got a token: [#-1,34:34=' ',<7>,1:34]
Got a token: [#-1,35:49='ref:"long name"',<5>,1:35]
I'm not 100% sure what the problem is with your grammar, but I suspect the issue relates to your definition of LONG_NAME without the quotes. Perhaps you can see what the distinction is?