How to find the "lexical file" in Wordnet? - sparql

If you look at the original WordNet search and select "Display options: Show Lexical File Info", you'll see an extremely useful classification of words called the lexical file. E.g. for "filling" we have:
<noun.substance>S: (n) filling, fill (any material that fills a space or container)
<noun.process>S: (n) filling (flow into something (as a container))
<noun.food>S: (n) filling (a food mixture used to fill pastry or sandwiches etc.)
<noun.artifact>S: (n) woof, weft, filling, pick (the yarn woven across the warp yarn in weaving)
<noun.artifact>S: (n) filling ((dentistry) a dental appliance consisting of ...)
<noun.act>S: (n) filling (the act of filling something)
The first thing in brackets is the "lexical file". Unfortunately I have not been able to find a SPARQL endpoint that provides this info.
The latest RDF translation of Wordnet 3.0 points to two things:
Talis SPARQL endpoint. Use eg this query to check there's no such info:
DESCRIBE <http://purl.org/vocabularies/princeton/wn30/synset-chair-noun-1>
W3C's mapping description. Appendix D "Conversion details" describes something useful: wn:classifiedByTopic.
But it's not the same as the lexical file, and it is quite incomplete. E.g. "chair" has nothing, while one of the senses of "completion" is in the topic "American Football":
DESCRIBE <http://purl.org/vocabularies/princeton/wn30/synset-completion-noun-1> ->
<j.1:classifiedByTopic rdf:resource="http://purl.org/vocabularies/princeton/wn30/synset-American_football-noun-1"/>
The question: is there a public Wordnet query API, or a database, that provides the lexical file information?

Using the Python NLTK interface:
from nltk.corpus import wordnet as wn
for synset in wn.synsets('can'):
    # lexname() returns the lexical file name, e.g. noun.artifact
    print(synset.lexname())

I don't think you can find it in the RDF/OWL Representation of WordNet. It's in the WordNet distribution though: dict/lexnames. Here is the content of the file as of WordNet 3.0:
00 adj.all 3
01 adj.pert 3
02 adv.all 4
03 noun.Tops 1
04 noun.act 1
05 noun.animal 1
06 noun.artifact 1
07 noun.attribute 1
08 noun.body 1
09 noun.cognition 1
10 noun.communication 1
11 noun.event 1
12 noun.feeling 1
13 noun.food 1
14 noun.group 1
15 noun.location 1
16 noun.motive 1
17 noun.object 1
18 noun.person 1
19 noun.phenomenon 1
20 noun.plant 1
21 noun.possession 1
22 noun.process 1
23 noun.quantity 1
24 noun.relation 1
25 noun.shape 1
26 noun.state 1
27 noun.substance 1
28 noun.time 1
29 verb.body 2
30 verb.change 2
31 verb.cognition 2
32 verb.communication 2
33 verb.competition 2
34 verb.consumption 2
35 verb.contact 2
36 verb.creation 2
37 verb.emotion 2
38 verb.motion 2
39 verb.perception 2
40 verb.possession 2
41 verb.social 2
42 verb.stative 2
43 verb.weather 2
44 adj.ppl 3
For each entry of dict/data.*, the second number is the lexical file info. For example, this filling entry contains the number 13, which is noun.food.
07883031 13 n 01 filling 0 002 # 07882497 n 0000 ~ 07883156 n 0000 | a food mixture used to fill pastry or sandwiches etc.
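The lookup described above can be sketched in a few lines of Python. This is a minimal sketch, assuming both files are whitespace-separated as shown; only a few rows of dict/lexnames are inlined, and the helper names parse_lexnames and lexname_of are made up for illustration.

```python
# Sketch: map a dict/data.* entry to its lexical file name.
# LEXNAMES_SAMPLE holds only a few rows of dict/lexnames, copied from the listing above.
LEXNAMES_SAMPLE = """\
04 noun.act 1
06 noun.artifact 1
13 noun.food 1
22 noun.process 1
27 noun.substance 1
"""

def parse_lexnames(text):
    """Return {file_number: lexname} from the contents of dict/lexnames."""
    table = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            table[int(fields[0])] = fields[1]
    return table

def lexname_of(data_line, table):
    """Look up the lexname for one line of dict/data.* (hypothetical helper)."""
    # Field 0 is the synset offset, field 1 is the lexical file number.
    file_number = int(data_line.split()[1])
    return table[file_number]

entry = ("07883031 13 n 01 filling 0 002 # 07882497 n 0000 ~ 07883156 n 0000 "
         "| a food mixture used to fill pastry or sandwiches etc.")
table = parse_lexnames(LEXNAMES_SAMPLE)
print(lexname_of(entry, table))
```

With the full dict/lexnames file read from disk instead of the inlined sample, the same two functions should cover every synset in data.noun, data.verb, data.adj and data.adv.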

It can be done through MIT JWI (MIT Java Wordnet Interface), a Java API for querying WordNet. There's a topic in this link showing how to implement a Java class to access lexicographic

This is what worked for me:
Synset[] synsets = database.getSynsets(wordStr);
// i is the index of the sense you are interested in
ReferenceSynset referenceSynset = (ReferenceSynset) synsets[i];
int lexicalCode = referenceSynset.getLexicalFileNumber();
Then use the table above to map the number to its lexname, e.g. noun.time.

If you're on Windows, chances are the NLTK data is in your AppData directory. To get there, open your file browser, click the address bar at the top, and type in %appdata%
That takes you to the Roaming folder. In it, find the nltk_data directory, which contains the corpora directory. The full path is something like:
C:\Users\yourname\AppData\Roaming\nltk_data\corpora
and the lexnames file will be present under
C:\Users\yourname\AppData\Roaming\nltk_data\corpora\wordnet.

Related

How to convert Stack based instructions to Register based

This is what I have tested with the dis module in python -
>>> import dis
>>> def f():
...     a = 1
...     b = 2
...     c = 3
...     a = b + c * a
...     return a + c
...
>>> dis.dis(f)
  2           0 LOAD_CONST               1 (1)
              2 STORE_FAST               0 (a)
  3           4 LOAD_CONST               2 (2)
              6 STORE_FAST               1 (b)
  4           8 LOAD_CONST               3 (3)
             10 STORE_FAST               2 (c)
  5          12 LOAD_FAST                1 (b)
             14 LOAD_FAST                2 (c)
             16 LOAD_FAST                0 (a)
             18 BINARY_MULTIPLY
             20 BINARY_ADD
             22 STORE_FAST               0 (a)
  6          24 LOAD_FAST                0 (a)
             26 LOAD_FAST                2 (c)
             28 BINARY_ADD
             30 RETURN_VALUE
Those are instructions for a stack-based virtual machine. Is there any way to convert the above stack-based instructions into register-based instructions, provided I have access to an unlimited number of registers?
I only know of one tool that does this. The JVM is stack-based, but the Dalvik VM is register-based. When we write code in Java, the class files contain stack-based instructions, and the dx tool converts them to register-based instructions so that they can run on the Dalvik VM. So most probably there is an algorithm somewhere that I have missed.
Also, can there be an edge case where the stack grows and shrinks dynamically (decided at runtime)? In that case it would seem impossible to convert stack-based instructions to register-based ones. However, one tool does it.
Can someone point me in the right direction, or does anyone know an algorithm that can help with this?
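One common idea, sketched here under stated assumptions and not as a description of how dx actually works, is to simulate the operand stack at translation time: each value that would be pushed gets its own virtual register, and each pop simply reads the register name on top of the simulated stack. The instruction tuples below are hand-copied from the listing above, and the function name to_register_code is made up for illustration; the sketch handles straight-line code only.

```python
# Sketch: translate straight-line stack code into three-address register code
# by simulating the operand stack at translation time.
# The instruction tuples are hand-copied from the dis listing above; this does
# not read real bytecode and does not handle jumps or calls.
STACK_CODE = [
    ("LOAD_CONST", 1), ("STORE_FAST", "a"),
    ("LOAD_CONST", 2), ("STORE_FAST", "b"),
    ("LOAD_CONST", 3), ("STORE_FAST", "c"),
    ("LOAD_FAST", "b"), ("LOAD_FAST", "c"), ("LOAD_FAST", "a"),
    ("BINARY_MULTIPLY", None), ("BINARY_ADD", None), ("STORE_FAST", "a"),
    ("LOAD_FAST", "a"), ("LOAD_FAST", "c"), ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]

BINARY_OPS = {"BINARY_ADD": "+", "BINARY_MULTIPLY": "*"}

def to_register_code(stack_code):
    """Return a list of register-style instructions as strings."""
    stack = []      # names of the registers currently "on the stack"
    out = []
    counter = 0     # unlimited supply of temporaries t0, t1, ...

    def fresh():
        nonlocal counter
        name = f"t{counter}"
        counter += 1
        return name

    for op, arg in stack_code:
        if op in ("LOAD_CONST", "LOAD_FAST"):
            # Copy into a fresh temporary so a later store to the local
            # cannot change a value already on the simulated stack.
            reg = fresh()
            out.append(f"{reg} = {arg}")
            stack.append(reg)
        elif op == "STORE_FAST":
            out.append(f"{arg} = {stack.pop()}")
        elif op in BINARY_OPS:
            right = stack.pop()
            left = stack.pop()
            reg = fresh()
            out.append(f"{reg} = {left} {BINARY_OPS[op]} {right}")
            stack.append(reg)
        elif op == "RETURN_VALUE":
            out.append(f"return {stack.pop()}")
        else:
            raise ValueError(f"unsupported opcode: {op}")
    return out

for line in to_register_code(STACK_CODE):
    print(line)
```

The approach relies on the stack depth at every instruction being known when translating. With branches that still holds as long as every path reaching an instruction arrives with the same stack depth, which is something the JVM bytecode verifier requires; a stack whose depth truly depended on runtime data would not fit this scheme.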

Find complexity for rewritten DFS

For the following question:
Given N stations, a source station, a destination station, and a data structure that represents all the possible paths between every 2 stations (paths are undirected) along with the length of every path from one station to another, find the longest path from the source station to the destination station.
I wrote the following pseudo-code:
Vertex
    String key
    boolean visited
    LinkedList<Edge> nextEdges
Edge
    Vertex nextVertex
    int length
tempSum = 0
stack
maxSum = 0
maxPath
maxPathByDFS(src, dst)
    stack.push(src)
    src.visited = true
    if (!src.nextEdges.isEmpty)
        for (currEdge : src.nextEdges)
            if (currEdge.nextVertex == dst)
                tempSum += currEdge.length
                stack.push(dst)
                if (tempSum > maxSum)
                    maxSum = tempSum
                    stack.toArray(maxPath)
                stack.pop
                tempSum -= currEdge.length
            else if (!currEdge.nextVertex.visited)
                tempSum += currEdge.length
                maxPathByDFS(currEdge.nextVertex, dst)
                tempSum -= currEdge.length
    stack.pop
    src.visited = false
    return
Could you please help me to find the run-time complexity of my solution?
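For experimenting with the running time, here is a runnable Python sketch of the same backtracking search. The adjacency-dict graph format, the complete_graph helper, and the call counter are assumptions added for illustration and are not part of the pseudo-code above.

```python
# Sketch of the pseudo-code above; graph is {vertex: [(neighbour, length), ...]}.
def max_path_by_dfs(graph, src, dst):
    """Return (max_length, path, calls) for the longest simple path src -> dst."""
    best = {"length": 0, "path": None}
    visited = set()
    stack = []
    calls = 0   # number of recursive visits, used to observe the growth

    def visit(vertex, length_so_far):
        nonlocal calls
        calls += 1
        stack.append(vertex)
        visited.add(vertex)
        for neighbour, length in graph[vertex]:
            total = length_so_far + length
            if neighbour == dst:
                # Reached the destination: record the path if it is the longest so far.
                if total > best["length"]:
                    best["length"] = total
                    best["path"] = stack + [dst]
            elif neighbour not in visited:
                visit(neighbour, total)
        # Backtrack so the vertex can be used again on another path.
        stack.pop()
        visited.discard(vertex)

    visit(src, 0)
    return best["length"], best["path"], calls

def complete_graph(n):
    """Complete undirected graph on 0..n-1 where every edge has length 1."""
    return {u: [(v, 1) for v in range(n) if v != u] for u in range(n)}

for n in range(3, 8):
    length, path, calls = max_path_by_dfs(complete_graph(n), 0, n - 1)
    print(n, length, calls)
```

Because a vertex is unmarked again when the search backtracks, every simple path leaving the source is explored. On complete graphs the printed call count therefore grows roughly factorially with the number of stations, so the worst case is not the O(V + E) of a plain DFS.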

Regex: extracting a house number from an address

I have following patterns:
13 R 2
48 B / 5
42 B
42B
303 Box 15
303 Bte 15
303 B Bt 15
and only want to have the following results (because Box 15, Bte 15 are the box numbers, and I only want the house nbr + potentially the letter attached to the house number):
13 R 2
48 B / 5
42 B
42B
303
303
303 B
Is this possible using a regular expression? I tried the following: REGEXP_SUBSTR(my_string_variable, '^\d+(\s*\w$)?'). This however only works for the patterns 3-5, and not for the first 2 and last patterns. Dropping the $ from the regex would incorrectly 'strip' the first letter for patterns 5 and 6.
I am basically assuming that if the letter behind the numeric is more than 1 character, that it belongs to the box number. For example, BTE is the French abbreviation for Boite which means Box. I realise this might be invalid if a house number has 2 letters (e.g.: 11 AA), but I would not know a solution for this and I don't think it occurs much.
This will remove a trailing box designation: a whitespace character, followed by an uppercase letter, followed by at least one lowercase letter, followed by one or more whitespace characters, followed by one or more digits at the end of the string:
RegExp_Replace(house_number, '\s[A-Z][a-z]+\s+\d+$')
See regex101.com
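To check the pattern against the sample values from the question, here is a small sketch that applies the same expression with Python's re module instead of REGEXP_REPLACE. The function name house_number is made up for illustration.

```python
import re

# Same pattern as the REGEXP_REPLACE call above, applied with Python's re module.
BOX_SUFFIX = re.compile(r"\s[A-Z][a-z]+\s+\d+$")

def house_number(address_part):
    """Strip a trailing box designation such as ' Box 15' or ' Bte 15'."""
    return BOX_SUFFIX.sub("", address_part)

samples = ["13 R 2", "48 B / 5", "42 B", "42B",
           "303 Box 15", "303 Bte 15", "303 B Bt 15"]
for s in samples:
    print(repr(s), "->", repr(house_number(s)))
```

Note that the pattern requires lowercase letters after the first capital, so an all-uppercase designation such as "BTE 15" would be left in place.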

Microsoft Internet Controls

I am using:
IE.ExecWB 17, 0 '// SelectAll
IE.ExecWB 12, 2 '// Copy selection
in an Excel VBA program successfully, but I am having trouble finding a reference for all ExecWB methods. Can anyone point me in the right direction?
Here is something from my database. I doubt you will find this on the web anymore. I will be surprised if you do...
ExecWB syntax is as follows:
object.ExecWB nCmdID, nCmdExecOpt, [pvaIn], [pvaOut]
The ExecWB method requires an OLE Command ID to be passed in to identify the command to execute. This value nCmdID is of type Long. The nCmdExecOpt parameter represents the value for the command execution option. Together, these values instruct the control as to what supported command to execute and what degree of user prompting should occur.
The last two parameters, pvaIn and pvaOut, are optional and are usually set to either NULL or an empty string.
Here is a complete list for the 1st parameter
OLECMDID_OPEN 1 Open
OLECMDID_NEW 2 Create a new document
OLECMDID_SAVE 3 Save
OLECMDID_SAVEAS 4 Save as
OLECMDID_SAVECOPYAS 5  
OLECMDID_PRINT 6 Print
OLECMDID_PRINTPREVIEW 7 Print preview
OLECMDID_PAGESETUP 8 Page setup
OLECMDID_SPELL 9 Spell check
OLECMDID_PROPERTIES 10 Properties
OLECMDID_CUT 11 Cut
OLECMDID_COPY 12 Copy
OLECMDID_PASTE 13 Paste
OLECMDID_PASTESPECIAL 14 Paste special
OLECMDID_UNDO 15 Undo
OLECMDID_REDO 16 Redo
OLECMDID_SELECTALL 17 Select all
OLECMDID_CLEARSELECTION 18 Clear selection
OLECMDID_ZOOM 19
OLECMDID_GETZOOMRANGE 20
OLECMDID_UPDATECOMMANDS 21 Update commands
OLECMDID_REFRESH 22 Refresh
OLECMDID_STOP 23 Stop
OLECMDID_HIDETOOLBARS 24 Hide toolbar
OLECMDID_SETPROGRESSMAX 25 Progress bar maximum
OLECMDID_SETPROGRESSPOS 26 Progress bar position
OLECMDID_SETPROGRESSTEXT 27 Progress bar text
OLECMDID_SETTITLE 28 Set the title
OLECMDID_SETDOWNLOADSTATE 29 Set download status
OLECMDID_STOPDOWNLOAD 30 Stop downloading
OLECMDID_ONTOOLBARACTIVATED 31
OLECMDID_FIND 32 Search
OLECMDID_DELETE 33 Delete
OLECMDID_HTTPEQUIV 34
OLECMDID_HTTPEQUIV_DONE 35
OLECMDID_ENABLE_INTERACTION 36 Enable interaction
OLECMDID_ONUNLOAD 37 On unload
OLECMDID_PROPERTYBAG2 38
OLECMDID_PREREFRESH 39
OLECMDID_SHOWSCRIPTERROR 40
OLECMDID_SHOWMESSAGE 41 Display a message
OLECMDID_SHOWFIND 42 Display search
OLECMDID_SHOWPAGESETUP 43 Display page setup
OLECMDID_SHOWPRINT 44 Display print dialog
OLECMDID_CLOSE 45 Close
OLECMDID_ALLOWUILESSSAVEAS 46
OLECMDID_DONTDOWNLOADCSS 47
OLECMDID_UPDATEPAGESTATUS 48
OLECMDID_PRINT2 49 Print 2
OLECMDID_PRINTPREVIEW2 50 Print preview 2
OLECMDID_SETPRINTTEMPLATE 51 Set the print template
OLECMDID_GETPRINTTEMPLATE 52 Get a print template
OLECMDID_PAGEACTIONBLOCKED 55
OLECMDID_PAGEACTIONUIQUERY 56
OLECMDID_FOCUSVIEWCONTROLS 57
OLECMDID_FOCUSVIEWCONTROLSQUERY 58
OLECMDID_SHOWPAGEACTIONMENU 59
OLECMDID_ADDTRAVELENTRY 60
OLECMDID_UPDATETRAVELENTRY 61
OLECMDID_UPDATEBACKFORWARDSTATE 62
OLECMDID_OPTICAL_ZOOM 63
OLECMDID_OPTICAL_GETZOOMRANGE 64
OLECMDID_WINDOWSTATECHANGED 65 Window state changed
Here is a complete list for the 2nd parameter
OLECMDEXECOPT_DODEFAULT 0 Default parameters
OLECMDEXECOPT_PROMPTUSER 1 Prompt the user (show the pop-up dialog box)
OLECMDEXECOPT_DONTPROMPTUSER 2 Do not prompt the user
OLECMDEXECOPT_SHOWHELP 3 Display help
Examples
WebBrowser.ExecWB 6, 1 '<~~ Print
WebBrowser.ExecWB 7, 1 '<~~ Print preview
WebBrowser.ExecWB 8, 1 '<~~ Page setup

What is wrong with this LDAP filter packet?

I am trying to port a program which queries an LDAP server from Perl to Go, and with the Go version I am receiving a response that the filter is malformed:
00000057: LdapErr: DSID-0C0C0968, comment: The server was unable to decode a search request filter, data 0, v1db1\x00
I have used tcpdump to capture the data transmitted to the server with both the Perl and Go versions of my program, and have found that they are sending slightly different filter packets. This question is not about any possible bugs in the Go program, but simply about understanding the contents of the LDAP filter packets.
The encoded filter is:
(objectClass=*)
And the Perl-generated packet (which the server likes) looks like this:
ASCII  .  .  o  b  j  e  c  t  C  l  a  s  s
Hex    87 0b 6f 62 6a 65 63 74 43 6c 61 73 73
Byte#  0  1  2  3  4  5  6  7  8  9  10 11 12
The Go-generated packet (which the server doesn't like) looks like this:
ASCII  .  .  .  .  o  b  j  e  c  t  C  l  a  s  s
Hex    a7 0d 04 0b 6f 62 6a 65 63 74 43 6c 61 73 73
Byte#  0  1  2  3  4  5  6  7  8  9  10 11 12 13 14
This is my own breakdown of the packets:
##Byte 0: Tag
When I dissect Byte 0 from both packets, I see they are identical, except for the Primitive/Constructed bit, which is set to Primitive in the Perl version, and Constructed in the Go version. See DER encoding for details.
Bit# 87 6 54321
Perl 10 0 00111
Go 10 1 00111
Bits 87: In both packets, 10 = Context Specific
Bit 6: In the Perl version 0 = Primitive, in the Go version 1 = Constructed
Bits 54321: 00111 = 7 = Object descriptor
##Byte 1: Length
11 bytes for the Perl version, 13 for the Go version
##Bytes 2-3 for the Go version
Byte 2: Tag 04: Substring Filter (See section 4.5.1 of RFC 4511)
Byte 3: Length of 11 bytes
##Remainder: Payload
For both packets this is simply the ASCII text objectClass
My reading of RFC 4511 section 4.5.1 suggests that the Go version is "more" correct, yet the Perl version is the one that works with the server. What gives?
Wireshark is able to parse both packets, and interprets them identically.
The Perl version is correct, and the Go version is incorrect.
As you point out, RFC 4511 section 4.5.1 specifies encoding for the filter elements, like:
Filter ::= CHOICE {
     and             [0] SET SIZE (1..MAX) OF filter Filter,
     or              [1] SET SIZE (1..MAX) OF filter Filter,
     not             [2] Filter,
     equalityMatch   [3] AttributeValueAssertion,
     substrings      [4] SubstringFilter,
     greaterOrEqual  [5] AttributeValueAssertion,
     lessOrEqual     [6] AttributeValueAssertion,
     present         [7] AttributeDescription,
     approxMatch     [8] AttributeValueAssertion,
     extensibleMatch [9] MatchingRuleAssertion,
     ...  }
And in this case, the relevant portion is:
present [7] AttributeDescription,
The AttributeDescription element is defined in section 4.1.4 of the same specification:
AttributeDescription ::= LDAPString
                         -- Constrained to <attributedescription>
                         -- [RFC4512]
And from section 4.1.2:
LDAPString ::= OCTET STRING -- UTF-8 encoded,
                            -- [ISO10646] characters
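So the present filter component is an octet string, which is a primitive element. RFC 4511 also declares its ASN.1 module with IMPLICIT TAGS, so the [7] tag replaces the OCTET STRING tag instead of wrapping it: the filter should be 87 0b followed by the attribute name. Go is incorrectly emitting a constructed [7] element (a7 0d) that wraps a separate OCTET STRING (04 0b), and the directory server is correctly rejecting that malformed request.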
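The difference between the two packets can be reproduced with a short Python sketch. It assumes single-byte identifier octets and short-form lengths only, and the function names decode_tag and encode_present_filter are made up for illustration.

```python
# Sketch: decode a single-byte BER identifier octet and build the "present" filter.
CLASSES = ["universal", "application", "context-specific", "private"]

def decode_tag(octet):
    """Split a BER identifier octet into (class, constructed?, tag number)."""
    tag_class = CLASSES[octet >> 6]     # bits 8-7
    constructed = bool(octet & 0x20)    # bit 6
    number = octet & 0x1F               # bits 5-1 (low-tag-number form only)
    return tag_class, constructed, number

def encode_present_filter(attribute):
    """Encode (attribute=*) as a primitive [7] element, short-form length only."""
    value = attribute.encode("utf-8")
    if len(value) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    return bytes([0x87, len(value)]) + value

print(decode_tag(0x87))   # first byte of the Perl packet
print(decode_tag(0xA7))   # first byte of the Go packet
print(decode_tag(0x04))   # byte 2 of the Go packet
print(encode_present_filter("objectClass").hex(" "))
```

Decoding 0x04 shows a universal-class tag number 4, which is OCTET STRING. That is the extra wrapper the Go packet carries inside its constructed [7] element.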