Adding (|) functionality to a determiner in GF

In GF, when writing a sentence's tree, I often encounter cases where multiple prepositions could be used in the same tree, such as:
Download it on my phone
Download it to my phone
Download it onto my phone
... and the list goes on and on.
This kind of problem can be solved as below:
(on_Prep|to_Prep|...)
But in some situations the same problem occurs with determiners, such as:
Eat the food
Eat food
I know the meanings of the above sentences are not exactly the same, but is there any way to accomplish such a goal?
I tried the following, but it seems illogical:
mkNP
(the_Det|)
(mkN ("food"))
I also tried to add an empty string as the determiner, such as mkDet (mkDigits ("")),
but unfortunately, neither of the above two ways seems smart enough. 😁😁

Your general approach with using | is correct.
There is no empty determiner, but rather another overload instance of mkNP. There's one with a determiner (so Det -> N -> NP) and another without, just N -> NP. So you can do this:
eat_food_VP : VP =
mkVP eat_V2 (mkNP the_Det food_N | mkNP food_N) ;

How to keep translations separated where the same word is used in English but a different one in other languages?

Imagine I have a report, a letter actually, which I need to translate into several languages. I have created a greeting field in the form which is filled programmatically by an onchange event method.
if self.partner_id.gender == 'female':
    self.letter_greeting = _('Dear %s %s,') % (  # the translation should be "Estimada"
        self.repr_recipient_id.title.shortcut, surname
    )
elif self.partner_id.gender == 'male':
    self.letter_greeting = _('Dear %s %s,') % (  # translation: "Estimado"
        self.repr_recipient_id.title.shortcut, surname
    )
else:
    self.letter_greeting = _('Dear %s %s,') % (  # translation: "Estimado/a"
        self.partner_id.title.shortcut, surname
    )
In this case the word "Dear" should get different Spanish translations depending on which branch is used, because the ending changes with the gender. When I export the po file, I find that all the occurrences are merged into a single entry. That makes sense, because in almost all cases the translations will be the same, but not in this one:
#. module: custom_module
#: code:addons/custom_module/models/sale_order.py:334
#: code:addons/custom_module/models/sale_order.py:338
#: code:addons/custom_module/models/sale_order.py:342
#, python-format
msgid "Dear %s %s,"
msgstr "Dear %s %s,"
Solutions I can apply directly
Put all the terms in different entries, to avoid having to fix the same translation manually every time I need to update the po file. This can be cumbersome if you have many different words with that problem. If I do it and open the file with poedit, this error appears: duplicate message definition
Put all the possible combinations with slashes; this is done in some other parts of Odoo. For both genders it would be:
#. module: stock
#: model:res.company,msg:stock.res_company
msgid "Dear"
msgstr "Estimado/a"
This is just an example. I can think of many words that look the same in English but use a different spelling or have different meanings in other languages depending on the context.
Possible best solutions
I don't know if Odoo knows anything about the context of a word, so it could tell whether it was already translated or not. Adding a context manually could solve the problem, at least for words with different meanings.
The nicest solution would be a parameter for the translation method to make sure that the word is exported as an isolated entry for that specific translation.
Do you think I am giving this too much importance, haha? Do you know of any better solution? Why doesn't poedit take this problem into account at all?
I propose an extension of the models res.partner.title and res.partner.
res.partner.title should get a translatable field for saving salutation prefixes like 'Dear' or 'Sehr geehrter' (German). Maybe it's worth adding something about genders too, but I won't go into detail about that here.
You probably want to show the configuring user an example like "Dear Mr. Name" or something like that. A computed field should work.
On res.partner you should implement either a computed field or a method that returns the full salutation for a partner record, for example as sketched below.
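A minimal sketch of what that extension could look like. It is not tested against a concrete Odoo version, and salutation_prefix / _compute_salutation are made-up names used only for illustration:
from odoo import api, fields, models

class ResPartnerTitle(models.Model):
    _inherit = "res.partner.title"

    # Translatable prefix, e.g. 'Dear' (en), 'Estimado' (es), 'Sehr geehrter' (de)
    salutation_prefix = fields.Char(translate=True)

class ResPartner(models.Model):
    _inherit = "res.partner"

    salutation = fields.Char(compute="_compute_salutation")

    @api.depends("title", "name")
    def _compute_salutation(self):
        for partner in self:
            parts = [partner.title.salutation_prefix, partner.title.shortcut, partner.name]
            # e.g. "Dear Mr. Name", which could also serve as the configuration preview
            partner.salutation = " ".join(p for p in parts if p)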
To some degree this is a linguistics problem. I believe the best solution would be to use a different "Source Language", one made up of keys, and then have English as another Translation. The word "Dear" in English does not have a gender context (and typically, much of English doesn't), while the word "Estimado" in Spanish does. The translation from that Spanish word to English is more appropriately "Masculine Dear." Therefore, using keys as your source language, you would have this:
SourceText (EnglishDescription) -> Translation (English) -> Translation (Spanish)
DearMasculine -> Dear -> Estimado
DearFeminine -> Dear -> Estimada
DearNeutral -> Dear -> Estimado/a
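As for the "adding a context manually" idea from the question: in plain gettext that is exactly what msgctxt is for, and poedit understands it (two entries with the same msgid but different msgctxt are not duplicates). Whether Odoo's _() wrapper exposes contexts depends on your version, so the following is only a plain-Python sketch of the mechanism, with a made-up domain and context strings:
# Sketch of msgctxt-based disambiguation with the standard library
# (gettext.pgettext needs Python 3.8+). Domain and context names are illustrative.
import gettext

t = gettext.translation("custom_module", localedir="i18n",
                        languages=["es"], fallback=True)

dear_f = t.pgettext("salutation, female recipient", "Dear %s %s,")  # -> "Estimada %s %s,"
dear_m = t.pgettext("salutation, male recipient", "Dear %s %s,")    # -> "Estimado %s %s,"

# In the .po file the two entries stay separate:
#   msgctxt "salutation, female recipient"
#   msgid "Dear %s %s,"
#   msgstr "Estimada %s %s,"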

The relative clause (which) in GF

"Play Toy Story which was published last year"
Sentence = mkUtt (mkImp (mkVP
  (mkV2 "play")
  (mkNP (mkCN
    (mkCN (mkN "Toy Story"))
    (mkS pastTense simultaneousAnt (mkCl (mkVP
      (mkV2 "publish")
      (mkNP (mkCN (mkN "last year")))
    )))
  ))
)) ;
When creating a relative-clause sentence in GF, the syntax category S (sentence) always adds "that" between the two clauses. Is there any way to replace that with "which"?
First of all, the structure you made isn't a relative clause, but this structure:
mkCN : CN -> S -> CN -- rule that she sleeps
A relative clause has the type RS in the RGL.
How to construct an actual RS
Here I build the RS gradually. Feel free to put these steps back into a single expression if you wish, but I find it clearer to do things like this.
oper
  last_year_Adv : Adv = ParadigmsEng.mkAdv "last year" ;
  published_last_year_VP : VP = mkVP (passiveVP (mkV2 "publish")) last_year_Adv ;
  which_is_published_last_year_RCl : RCl = mkRCl which_RP published_last_year_VP ;
  which_was_published_last_year_RS : RS = mkRS pastTense which_is_published_last_year_RCl ;
lin
  play_TS = mkUtt (mkImp (mkVP
    (mkV2 "play")
    (mkNP
      (mkNP (mkPN "Toy Story"))
      which_was_published_last_year_RS)
    )
  ) ;
Now when we test it in the GF shell, we see that despite the name, which_RP is actually "that".
> l play_TS
play Toy Story , that was published last year
How to change "that" into "which"
The first thing to check when you want to create a new lexical item is the Paradigms module. Unfortunately, there is no mkRP for English. Things like relative pronouns are usually thought of as a closed class: there is only a small, fixed number of them. Other examples of closed classes are basic numerals (you can't just make up a new integer between 4 and 5!) and determiners. Contrast this with open classes like nouns, verbs and adjectives, which pop up all the time. For open classes, the Paradigms modules have many options, but not for closed classes like relative pronouns.
So if Paradigms doesn't help, the next thing to check is the language-specific Extra module.
Check language-specific Extra modules
If you have the RGL source on your own computer, you can just go to the directory gf-rgl/src/english and grep for which. Or you can use the RGL browser to search for interesting functions.
And there is indeed a relative pronoun in ExtraEng, also called which_RP. (There is also one called who_which_RP.) So now you can do these modifications in your grammar:
concrete MyGrammarEng of MyGrammar =
  open SyntaxEng,
       ParadigmsEng,
       ExtraEng -- Need to open ExtraEng in the concrete!
  in ** {

  -- … as usual, except for

  oper
    which_is_published_last_year_RCl : RCl =
      mkRCl ExtraEng.which_RP -- Use the ExtraEng.which_RP!
            published_last_year_VP ;
And the rest of the code is like before. This produces the result you want.
> l play_TS
play Toy Story , which was published last year
Last-resort hack
So you have looked in all possible modules and found nothing. Consider making it into an issue in the gf-rgl repository, if it's something that's clearly wrong or missing.
But in any case, here's a general, unsafe hack to quickly construct what you want.
First, let's look at the lincat of RP in CatEng: {s : RCase => Str ; a : RAgr}. Then let's look at its implementation in RelativeEng. There you also see the explanation of why it's always "that": unlike "who" and "which", "that" works for both animate and inanimate NPs.
So I would do this to force the string "which":
oper
myWhich_RP : RP = which_RP ** {s = table {_ => "which"}} ;
The ** syntax means record extension. We use all the other fields from the original which_RP, but in the s field, we put a table all of whose branches contain the string "which". (You can read more about this technique in my blog.)
Then we use the newly defined myWhich_RP in forming the relative clause:
oper
which_is_published_last_year_RCl : RCl = mkRCl myWhich_RP published_last_year_VP ;
This works too, but it's unsafe, because whenever the RGL changes, any code that touches the raw implementation may break.

Creating a string from a path

I have a chain of nodes, e.g.
{Name: 'X'}->{Name: 'Y'}->{Name: 'Z'}
and I'd like to create a string representing the path in Cypher (I know I can do it client-side, but I want to do it in a query) that would look like this:
'X->Y->Z'
Is that feasible? I've investigated the use of collect() and UNWIND, and have googled till I'm blue in the face.
Thoughts?
* Edit I *
As an addendum (and to make the problem more difficult): my query is going to return a collection of paths (a tree, a DAG), so I'll need to create a string for each of the paths in the tree.
REDUCE is your friend here:
WITH reduce(s="",n in nodes(p) | s+n.name+"->") as str
RETURN substring(str,0,length(str)-2)
or if you want to save the extra operation
RETURN reduce(s=head(nodes(p)).name, n in tail(nodes(p)) | s+"->"+n.name)
or with APOC
RETURN apoc.text.join([n in nodes(p) | n.name],"->")
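If you are driving this from Python and want one string per root-to-leaf path (the Edit I case), you can wrap one of the reduce expressions above in a path-enumerating MATCH. A rough sketch with the official neo4j driver; the :Node label, :NEXT relationship type, URI and credentials are assumptions to adapt to your data:
# Sketch: prints one "X->Y->Z" string per root-to-leaf path in the tree.
from neo4j import GraphDatabase

QUERY = """
MATCH p = (root:Node {Name: 'X'})-[:NEXT*0..]->(leaf:Node)
WHERE NOT (leaf)-[:NEXT]->()
RETURN reduce(s = head(nodes(p)).Name,
              n IN tail(nodes(p)) | s + '->' + n.Name) AS pathString
"""

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session() as session:
    for record in session.run(QUERY):
        print(record["pathString"])  # e.g. X->Y->Z
driver.close()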
I found an old posting (2014) from Michael Hunger with the answer:
http://grokbase.com/t/gg/neo4j/147ch1nj9b/concatenate-nodes-properties

Am I training my wit.ai bot correctly?

I'm trying to train my Wit.ai bot to recognize someone's first name. I'm not sure I understand well how the NLP works, so I'll give you an example.
I defined a lot of expressions like "My name is XXXX" and "Everybody calls me XXXX".
In the "Understanding" table I added an entity named "contact_name" and added almost 50 keywords like "Michel, John, Mary...".
I set the entity's search strategy to "free-text" and "keywords".
I'm not sure if this process is correct, so I ask you:
Does context like "My name is..." matter for the NLP? I mean, will it help the bot predict that after this expression a first name will probably come?
Is it right to add around 50 values to an entity, or is that completely wrong?
What do you suggest as a training process in order to get someone's first name?
You have done it right by keeping the entity's search strategy as "free-text" and "keywords". But adding keyword examples to the entity doesn't make much sense, because a person's name is not a keyword.
So, I would recommend a training strategy which is as follows:
Create various templates of the message like, "My name is XYZ", "I am XYZ", "This is XYZ" etc. (all possible introduction messages you could think of)
Remove all keywords and expressions for the entity you created and add these two keywords:
"a b c d e f g h i j k l m n o p q r s t u v w x y z"
"XYZ" (can give any name but maintain this name same for validating the templates)
In the 'Understanding' tab enter the messages and extract the name into the entity ("contact_name" in your case) and validate them
Similarly, validate all message templates keeping the name as "XYZ"
After you have done this for all templates, your bot will be able to recognise any name in a given message template.
The logic behind this is that your entity is both free-text and keyword, which means it first tries to match a keyword; if nothing matches, it tries to find the word in the same position as in the templates. Keeping the name the same across validations helps train the bot on the templates and learn the position where the name will usually be found.
Hope this works. I have tried this and it worked for me. I am not sure how the bot trains in the background. I recommend you start a new app and do this exercise.
Comment if there is any problem.
wit.ai has a pre-trained entity extraction method called wit/contact, which
Captures free text that's either the name or a clear reference to a
person, like "Paul", "Paul Smith", "my husband", "the dentist".
It works well even without any training data.
To read about the method refer to duckling.
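If you want to check quickly what wit/contact (or your own contact_name entity) resolves for a given message, you can call the HTTP /message endpoint directly. A small sketch with requests; the token is a placeholder, and the exact keys inside "entities" depend on how the entity is named in your app, so print the raw JSON first:
# Sketch: send one utterance to Wit.ai and inspect which entities it extracted.
# WIT_TOKEN is a placeholder for your server access token.
import requests

WIT_TOKEN = "YOUR_SERVER_ACCESS_TOKEN"

resp = requests.get(
    "https://api.wit.ai/message",
    params={"q": "My name is Mary"},
    headers={"Authorization": f"Bearer {WIT_TOKEN}"},
)
resp.raise_for_status()
data = resp.json()
print(data.get("entities", {}))  # look for the entity that captured "Mary"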

Hadoop Pig - Replace strings in a relation with their corresponding values in a map

I have a relation called conversations_grouped made up of bags of tuples of varying sizes, like so:
DUMP conversations_grouped:
...
({(L194),(L195),(L196),(L197)})
({(L198),(L199)})
({(L200),(L201),(L202),(L203)})
({(L204),(L205),(L206)})
({(L207),(L208)})
({(L271),(L272),(L273),(L274),(L275)})
({(L276),(L277)})
({(L280),(L281)})
({(L363),(L364)})
({(L365),(L366)})
({(L666256),(L666257)})
({(L666369),(L666370),(L666371),(L666372)})
({(L666520),(L666521),(L666522)})
Each L[0-9]+ is a tag corresponding to a string. For example, L194 might be "Hello, how are you doing?" and L195 might be "fine, how are you?". This correspondence is maintained by a map called line_map. Here's a sample:
DUMP line_map;
...
([L666324#Do you think she might be interested in someone?])
([L666264#Well that's typical of Her Majesty's army. Appoint an engineer to do a soldier's work.])
([L666263#Um. There are rumours that my Lord Chelmsford intends to make Durnford Second in Command.])
([L666262#Lighting COGHILL' 5 cigar: Our good Colonel Dumford scored quite a coup with the Sikali Horse.])
([L666522#So far only their scouts. But we have had reports of a small Impi farther north, over there. ])
([L666521#And I assure you, you do not In fact I'd be obliged for your best advice. What have your scouts seen?])
([L666520#Well I assure you, Sir, I have no desire to create difficulties. 45])
([L666372#I think Chelmsford wants a good man on the border Why he fears a flanking attack and requires a steady Commander in reserve.])
([L666371#Lord Chelmsford seems to want me to stay back with my Basutos.])
([L666370#I'm to take the Sikali with the main column to the river])
([L666369#Your orders, Mr Vereker?])
([L666257#Good ones, yes, Mr Vereker. Gentlemen who can ride and shoot])
([L666256#Colonel Durnford... William Vereker. I hear you 've been seeking Officers?])
What I'm trying to do now is parse through each line and replace the L[0-9]+ tags with their corresponding text from line_map. Is it possible to make references to line_map from within a Pig FOREACH statement, or is there something else I have to do?
The first issue with this is that in a map the key must be a quoted string, so you can't use a schema value to access the map. E.g., this will not work:
C: {foo: chararray, M: [value:chararray]}
D = FOREACH C GENERATE M#foo ;
The solution that comes to mind is to FLATTEN conversations_grouped. Then do a join between conversations_grouped and line_map on the L[0-9]+ tag. You'll probably want to project out some of the extra fields (like the L[0-9]+ tag after the join) to make the next step faster. After that you'll have to regroup the data, and massage it into the correct format.
This won't work unless each bag has its own unique ID for the regrouping, but if each of the L[0-9]+ tags appears in only one bag (conversation), you can use a tag from the bag to create a unique ID.
-- A is dumped conversations_grouped
B = FOREACH A {
-- Pulls out an element from the bag to use as the id
id = LIMIT tags 1 ;
-- Flattens B into id, tag form. Each group of tags will have the same id.
GENERATE FLATTEN(id), FLATTEN(tags) ;
}
The schema and output for B is:
B: {id: chararray,tags::tag: chararray}
(L194,L194)
(L194,L195)
(L194,L196)
(L194,L197)
(L198,L198)
(L198,L199)
(L200,L200)
(L200,L201)
(L200,L202)
(L200,L203)
(L204,L204)
(L204,L205)
(L204,L206)
(L207,L207)
(L207,L208)
(L271,L271)
(L271,L272)
(L271,L273)
(L271,L274)
(L271,L275)
(L276,L276)
(L276,L277)
(L280,L280)
(L280,L281)
(L363,L363)
(L363,L364)
(L365,L365)
(L365,L366)
(L666256,L666256)
(L666256,L666257)
(L666369,L666369)
(L666369,L666370)
(L666369,L666371)
(L666369,L666372)
(L666520,L666520)
(L666520,L666521)
(L666520,L666522)
Assuming that the tags are unique, the rest is done like this:
-- A2 is line_map, loaded in tag/message pairs instead of a map
-- Joins conversations_grouped and line_map on tag
C = FOREACH (JOIN B by tags::tag, A2 by tag)
-- This generate removes the tag
GENERATE id, message ;
-- Regroups C on the id created in B
D = FOREACH (GROUP C BY id)
-- This step limits the output to just messages
GENERATE C.(message) AS messages ;
Schema and output from D:
D: {messages: {(A2::message: chararray)}}
({(Colonel Durnford... William Vereker. I hear you 've been seeking Officers?),(Good ones, yes, Mr Vereker. Gentlemen who can ride and shoot)})
({(Your orders, Mr Vereker?),(I'm to take the Sikali with the main column to the river),(Lord Chelmsford seems to want me to stay back with my Basutos.),(I think Chelmsford wants a good man on the border Why he fears a flanking attack and requires a steady Commander in reserve.)})
({(Well I assure you, Sir, I have no desire to create difficulties. 45),(And I assure you, you do not In fact I'd be obliged for your best advice. What have your scouts seen?),(So far only their scouts. But we have had reports of a small Impi farther north, over there. )})
NOTE: In the worst case (if the L[0-9]+ tags aren't unique), you can give each line of the input file(s) a sequential integer ID before you load it into Pig.
UPDATE: If you are using Pig 0.11, then you can also use the RANK operator.
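Not Pig, but in case it helps to see the same data flow (flatten each bag with its first tag as an ID, join every tag against line_map, regroup by ID) spelled out, here is a small plain-Python sketch over a couple of the rows above; the L194/L195 texts are the question's guesses, not real data:
# Plain-Python illustration of the flatten -> join-on-tag -> regroup data flow.
from collections import defaultdict

conversations_grouped = [
    ["L194", "L195", "L196", "L197"],
    ["L666256", "L666257"],
]
line_map = {
    "L194": "Hello, how are you doing?",
    "L195": "fine, how are you?",
    "L666256": "Colonel Durnford... William Vereker. I hear you 've been seeking Officers?",
    "L666257": "Good ones, yes, Mr Vereker. Gentlemen who can ride and shoot",
}

# B: flatten each bag, tagging every element with the bag's first tag as its ID
flattened = [(bag[0], tag) for bag in conversations_grouped for tag in bag]

# C: join each tag against line_map, keeping (id, message) and dropping the tag
joined = [(cid, line_map[tag]) for cid, tag in flattened if tag in line_map]

# D: regroup the messages by conversation ID
regrouped = defaultdict(list)
for cid, message in joined:
    regrouped[cid].append(message)

for cid, messages in regrouped.items():
    print(cid, messages)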