Adding Tenses to Variants in GF

The GF tutorial's section on the variants construct says that variation in GF should be expressed with one of the following methods:
lin Delicious = {s = "delicious" | "exquisit" | "tasty"} ;
lin Delicious = {s = variants {"delicious" ; "exquisit" ; "tasty"}} ;
I have been using the first method for a while, but GF would sometimes report weird error messages.
Currently, I use the second method all the time. My question is: is there any way to create such variants for a verb with tenses, such as
lin Action = {s = variants {"write", "wrote" ; "buy", "bought" ; "read", "read}} ;
And if so, how do I use it?
Keep up the good work ~.~

What do you want to achieve with having different tenses as variants? With tenses as variants, I'm thinking of a situation where tense doesn't make a difference in your application grammar, and you want to parse sentences like "I buy food" and "I bought food" into the same tree.
I'm going to show how to do it, but first, let's fix the syntax error.
Fixing the syntax error
I don't understand what your aim is with variants {"write", "wrote" ; "buy", "bought" ; "read", "read}. First of all, this is a syntax error in GF: the terms between ; or | must be valid GF expressions, but "write", "wrote" is two strings separated by a comma, which is not a valid GF expression.
If you really want to create a variant that includes any verb in any tense, replace the commas with semicolons. Like this:
all_verbs : Str = variants { "write" ; "wrote" ; "buy" ; "bought" ; "read"} ;
Testing in GF shell:
> cc -all all_verbs
write / wrote / buy / bought / read
If you want just the tenses, then you can write this:
write : Str = variants {"write" ; "wrote"} ;
buy : Str = variants {"buy" ; "bought"} ;
Looks like this in the GF shell:
> cc -all write
write / wrote
Tenses as variants in an application grammar
I've written a small application grammar that demonstrates the best way to do this using the RGL: https://gist.github.com/inariksit/b444a01b0a23468b7a558e39c6ce06fd The crucial thing, as you see in the code, is that the verbs are completely normal, and we add the variants only at the sentence level. That is much safer than trying to introduce variants in the verbs' inflection tables.
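The core of it looks roughly like this, a sketch against the RGL Syntax API (see the gist for the actual grammar):
lin Pred np v = variants {mkS (mkCl np v) ;            -- present: "the cat sleeps"
                          mkS pastTense (mkCl np v)} ; -- past: "the cat slept"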
This grammar behaves as follows:
1. linearize only outputs the first option.
TenseVariants> l Pred Cat Sleep
the cat sleeps
2. parse accepts both.
TenseVariants> p "the cat sleeps"
Pred Cat Sleep
TenseVariants> p "the cat slept"
Pred Cat Sleep
If you want to control when to output "sleeps" and when "slept", then you can't do it with variants: you need to actually have two different abstract syntax trees that correspond to the two outputs.
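A sketch of that two-tree approach (the function names are made up):
fun PredPres, PredPast : NP -> V -> S ; -- abstract: one function per tense
lin PredPres np v = mkS (mkCl np v) ;
lin PredPast np v = mkS pastTense (mkCl np v) ;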
However, if you only need to output one of the forms, but want to parse both, then variants is the right choice.
Nondeterminism in GF
I'm curious what was the reason you wanted to put all verbs under the same variant? It's already possible to generate all combinations with GF, using the gt ("generate trees") command:
TenseVariants> gt Pred Cat ? | l
the cat reads
the cat sleeps
the cat writes
The question mark is in the place of the verb, and what gt does is replace the question mark with all suitable expressions in the grammar.[1]
Combine this with the flag -all for linearize, and you get this:
TenseVariants> gt Pred Cat ? | l -all
the cat reads
the cat read
the cat sleeps
the cat slept
the cat writes
the cat wrote
If I misunderstood your goal with the variants, or something else is unclear, let me know and I'll be glad to clarify!
[1] Try running this command in the resource grammar and see what happens!
$ gf alltenses/LangEng.gfo
Lang> gt PredVP (UsePron ?) (UseComp (CompAP (PositA ?)))
You'll get a long list of stuff. If you only want to generate one, use the command gr ("generate random").
Lang> gr PredVP (UsePron ?) (UseComp (CompAP (PositA ?))) | l -treebank
Lang: PredVP (UsePron i_Pron) (UseComp (CompAP (PositA bad_A)))
LangEng: I am bad


Script that removes all occurrences of duplicated lines from a file + keeps the original order of lines (Perl + Python + Lua) [closed]

As the title says, I need to make a script in Perl, one in Python, and one in Lua that removes all occurrences of a duplicated line (it can even be a one-line command). For example, let's say the file has the following lines (I don't know exactly what the file has and need a generic command; this is just an example):
apple
orange
banana
banana
berry
cherry
orange
melon
The output should be like:
apple
berry
cherry
melon
Another thing to note is that I need the file to keep the same line order as it had at the beginning. I managed to pull off multiple commands using awk and sed, but I couldn't find anything related to removing all occurrences in Python / Lua / Perl.
In Perl, you'd keep a hash to record what you've already seen.
#!/usr/bin/perl
use strict;
use warnings;
my %seen;
while (<>) {
    print unless $seen{$_}++;
}
This reads from STDIN and writes to STDOUT, so you can use it as a Unix filter. If it's in a file called filter:
$ filter < input_data > filtered_data
Update: Ok, I misunderstood the requirement. You can't do this without iterating across the data twice. Here's a Perl solution.
#!/usr/bin/perl
use strict;
use warnings;
my @data;
my %count;
# Store the input data and also
# keep a count.
while (<>) {
    $count{$_}++;
    push @data, $_;
}
# Print the input data, but only
# records which only appear once.
print grep { $count{$_} == 1 } @data;
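The same two-pass counting idea ports directly to Lua, which the question also asks for. A quick sketch (reading from stdin, like the Perl version):
-- Count every line, then print only the lines seen exactly once.
local count, lines = {}, {}
for line in io.lines() do
    count[line] = (count[line] or 0) + 1
    lines[#lines + 1] = line
end
for _, line in ipairs(lines) do
    if count[line] == 1 then print(line) end
end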
In Python you can just use the following script.
file = open("myfile.txt", "r")
no_duplicates = list(dict.fromkeys(file.readlines()))
readlines() returns an ordered list of the file's content; each line gives one list item.
dict.fromkeys(a) generates a dict whose keys are the items of the list a. It reads the list in order and doesn't re-add already existing keys.
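Note that this keeps the first occurrence of each repeated line rather than dropping all of its occurrences. A quick illustrative run:
lines = ["apple\n", "orange\n", "banana\n", "banana\n", "orange\n"]
print(list(dict.fromkeys(lines)))  # ['apple\n', 'orange\n', 'banana\n']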
Algorithm of the code
walk through the data
count occurrences of each word with ++$count{$_}
store a word in the array @words only if it is seen for the first time (to preserve word order)
use the array @words to output a word only if its $count{$_} == 1
use strict;
use warnings;
my %count;
my @words;
for ( <DATA> ) {
    ++$count{$_} == 1 && push @words, $_;
}
$count{$_} == 1 && print for @words;
__DATA__
apple
orange
banana
banana
berry
cherry
orange
melon
Output
apple
berry
cherry
melon
Sorry for that one... here is the correct code in Python:
x = open("dupli", "r")
ar = x.readlines()
new = []
for i in ar:
    co = ar.count(i)
    if co == 1:
        new.append(i)
print(new)
The result will be:
['apple\n', 'berry\n', 'cherry\n', 'melon\n']
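For bigger files, the same filter can be done without the quadratic ar.count(i) calls by counting once up front; a sketch with collections.Counter:
from collections import Counter

with open("dupli") as f:
    lines = f.readlines()

counts = Counter(lines)
print([line for line in lines if counts[line] == 1])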

Can gather be used to unroll Junctions?

In this program:
use v6;
my $j = +any "33", "42", "2.1";
gather for $j -> $e {
    say $e;
} # prints 33␤42␤2.1␤
for $j -> $e {
    say $e; # prints any(33, 42, 2.1)
}
How does gather in front of for change the behavior of the Junction, allowing a loop over it? The documentation does not seem to reflect that behavior. Is that spec?
Fixed by jnthn in code and test commits.
Issue filed.
Golfed:
do put .^name for any 1 ; # Int
put .^name for any 1 ; # Mu
Any of ten of the thirteen statement prefixes listed in the doc can be used instead of do or gather with the same result. (supply unsurprisingly produces no output and hyper and race are red herrings because they try and fail to apply methods to the junction values.)
Any type of junction produces the same results.
Any number of elements of the junction produces the same result for the for loop without a statement prefix, namely a single Mu. With a statement prefix the for loop repeats the primary statement (the put ...) the appropriate number of times.
I've searched both RT and GH issues and failed to find a related bug report.

Create a wordlist using hashcat?

Hashcat doesn't support the target application I'm trying to crack, but I'm wondering whether the mask function can be 'fed' a list of passwords and run through the rockyou rules to generate an effective wordlist for me?
If so, how can this be done? The documentation leaves a lot to be desired!
Many thanks
I used HashCatRulesEngine:
https://github.com/llamasoft/HashcatRulesEngine
You can chain all the HashCat rules together; it then unions them, weeds out any duplicates, and takes your sample password file as input.
It then generates all possible permutations.
For instance:
echo "password">foo
./hcre /Users/chris/Downloads/hashcat-4.0.0/rules/Incisive-leetspeak.rule /Users/chris/Downloads/hashcat-4.0.0/rules/InsidePro-HashManager.rule /Users/chris/Downloads/hashcat-4.0.0/rules/InsidePro-PasswordsPro.rule /Users/chris/Downloads/hashcat-4.0.0/rules/T0XlC-insert_00-99_1950-2050_toprules_0_F.rule /Users/chris/Downloads/hashcat-4.0.0/rules/T0XlC-insert_space_and_special_0_F.rule /Users/chris/Downloads/hashcat-4.0.0/rules/T0XlC-insert_top_100_passwords_1_G.rule /Users/chris/Downloads/hashcat-4.0.0/rules/T0XlC.rule /Users/chris/Downloads/hashcat-4.0.0/rules/T0XlCv1.rule /Users/chris/Downloads/hashcat-4.0.0/rules/best64.rule /Users/chris/Downloads/hashcat-4.0.0/rules/combinator.rule /Users/chris/Downloads/hashcat-4.0.0/rules/d3ad0ne.rule /Users/chris/Downloads/hashcat-4.0.0/rules/dive.rule /Users/chris/Downloads/hashcat-4.0.0/rules/generated.rule /Users/chris/Downloads/hashcat-4.0.0/rules/generated2.rule /Users/chris/Downloads/hashcat-4.0.0/rules/hybrid /Users/chris/Downloads/hashcat-4.0.0/rules/leetspeak.rule /Users/chris/Downloads/hashcat-4.0.0/rules/oscommerce.rule /Users/chris/Downloads/hashcat-4.0.0/rules/rockyou-30000.rule /Users/chris/Downloads/hashcat-4.0.0/rules/specific.rule /Users/chris/Downloads/hashcat-4.0.0/rules/toggles1.rule /Users/chris/Downloads/hashcat-4.0.0/rules/toggles2.rule /Users/chris/Downloads/hashcat-4.0.0/rules/toggles3.rule /Users/chris/Downloads/hashcat-4.0.0/rules/toggles4.rule /Users/chris/Downloads/hashcat-4.0.0/rules/toggles5.rule /Users/chris/Downloads/hashcat-4.0.0/rules/unix-ninja-leetspeak.rule < foo >passwordsx
From that 1 password, the word "password" was permuted a total of:
bash-3.2# wc -l passwordsx
227235 passwordsx
bash-3.2#
227235 times, meaning that each word you feed into this generates 227235 candidates, roughly giving you full coverage.
You can use hashcat itself as a candidate generator by adding the --stdout switch (then pipe to your file or program of choice). I haven't tried all the possibilities, but it should work with any of the supported hashcat modes.
Here's an example using a ruleset: https://hashcat.net/wiki/doku.php?id=rule_based_attack#debugging_rules
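For instance, something like this (the rule and file names are just illustrative) prints the rule-mangled candidates instead of cracking:
hashcat --stdout -r rules/best64.rule wordlist.txt > candidates.txt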

Regex speed in Perl 6

I've previously been working only with bash regular expressions, grep, sed, awk, etc. After trying Perl 6 regexes, I've got the impression that they work slower than I would expect, but probably the reason is that I handle them incorrectly.
I've made a simple test to compare similar operations in Perl 6 and in bash. Here is the Perl 6 code:
my @array = "aaaaa" .. "fffff";
say +@array; # 7776 = 6 ** 5
my @search = <abcde cdeff fabcd>;
my token search {
    @search
}
my @new_array = @array.grep({/ <search> /});
say @new_array;
Then I printed @array into a file named array (with 7776 lines), made a file named search with 3 lines (abcde, cdeff, fabcd) and did a simple grep search.
$ grep -f search array
After both programs produced the same result, as expected, I measured the time they were working.
$ time perl6 search.p6
real 0m6,683s
user 0m6,724s
sys 0m0,044s
$ time grep -f search array
real 0m0,009s
user 0m0,008s
sys 0m0,000s
So, what am I doing wrong in my Perl 6 code?
UPD: If I pass the search tokens to grep, looping through the @search array, the program works much faster:
my @array = "aaaaa" .. "fffff";
say +@array;
my @search = <abcde cdeff fabcd>;
for @search -> $token {
    say ~@array.grep({/$token/});
}
$ time perl6 search.p6
real 0m1,378s
user 0m1,400s
sys 0m0,052s
And if I define each search pattern manually, it works even faster:
my @array = "aaaaa" .. "fffff";
say +@array; # 7776 = 6 ** 5
say ~@array.grep({/abcde/});
say ~@array.grep({/cdeff/});
say ~@array.grep({/fabcd/});
$ time perl6 search.p6
real 0m0,587s
user 0m0,632s
sys 0m0,036s
The grep command is much simpler than Perl 6's regular expressions, and it has had many more years to get optimized. Regexes are also one of the areas that hasn't seen as much optimization in Rakudo, partly because they are seen as a difficult thing to work on.
For a more performant example, you could pre-compile the regex:
my $search = "/@search.join('|')/".EVAL;
# $search = /abcde|cdeff|fabcd/;
say ~@array.grep($search);
That change causes it to run in about half a second.
If there is any chance of malicious data in @search and you have to do this, it may be safer to use:
"/@search».Str».perl.join('|')/".EVAL
The compiler can't quite generate that optimized code for /@search/, as @search could change after the regex gets compiled. What could happen is that the first time the regex is used, it gets re-compiled into the better form and then cached for as long as @search doesn't get modified.
(I think Perl 5 does something similar.)
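As a side note, if you'd rather avoid EVAL entirely, a junction-based sketch like this keeps the literal needles out of the regex engine (just an illustration, not benchmarked):
my @array  = "aaaaa" .. "fffff";
my @search = <abcde cdeff fabcd>;
say ~@array.grep(*.contains(any @search));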
One important fact you have to keep in mind is that a regex in Perl 6 is just a method that is written in a domain-specific sub-language.

How do I set one variable equal to another in Pig Latin

I would like to do
register s3n://uw-cse344-code/myudfs.jar
-- load the test file into Pig
--raw = LOAD 's3n://uw-cse344-test/cse344-test-file' USING TextLoader as (line:chararray);
-- later you will load to other files, example:
raw = LOAD 's3n://uw-cse344/btc-2010-chunk-000' USING TextLoader as (line:chararray);
-- parse each line into ntriples
ntriples = foreach raw generate FLATTEN(myudfs.RDFSplit3(line)) as (subject:chararray,predicate:chararray,object:chararray);
--filter 1
subjects1 = filter ntriples by subject matches '.*rdfabout\\.com.*' PARALLEL 50;
--filter 2
subjects2 = subjects1;
but I get the error:
2012-03-10 01:19:18,039 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1200: mismatched input ';' expecting LEFT_PAREN
Details at logfile: /home/hadoop/pig_1331342327467.log
So it seems Pig doesn't like that. How do I accomplish this?
I don't think that kind of 'typical' assignment works in Pig. It's not really a programming language in the strict sense; it's a high-level language on top of Hadoop with some specialized functions.
I think you'll need to simply re-project the data from subjects1 to subjects2, such as:
subjects2 = foreach subjects1 generate $0, $1, $2;
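Equivalently, using the named fields from your ntriples schema:
subjects2 = foreach subjects1 generate subject, predicate, object;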
Another approach might be to use the LIMIT operator with some absurdly high parameter:
subjects2 = LIMIT subjects1 100000000;
There could be a lot of reasons why that doesn't make sense, but it's a thought.
I sense you are considering doing things as you would in a programming language. I have found that rarely works out like you want it to, but you can always get the job done once you think like Pig.
As I understand, your example is from the Data Science Coursera course.
It's strange, but I found the same problem: this code works on one amount of data and doesn't on another.
Because we need to change the parameters, I used this code:
filtered2 = foreach filtered generate subject as subject2, predicate as predicate2, object as object2;