I have a class Selfie, an object property personPicture (domain Picture, range Person) that states who is in the picture, and a dc:creator. I want to say that a selfie must have at least one personPicture, and that the dc:creator must be one of the persons listed by personPicture. I tried:
<owl:Class rdf:about="http://www.semanticweb.org/leo/ontologies/album#Selfie">
<rdfs:subClassOf rdf:resource="http://www.semanticweb.org/leo/ontologies/album#Picture"/>
<rdfs:subClassOf>
<owl:Restriction>
<owl:onProperty rdf:resource="http://www.semanticweb.org/leo/ontologies/album#personPicture"/>
<owl:minQualifiedCardinality rdf:datatype="http://www.w3.org/2001/XMLSchema#nonNegativeInteger">1</owl:minQualifiedCardinality>
<owl:onClass rdf:resource="http://www.semanticweb.org/leo/ontologies/album#Person"/>
</owl:Restriction>
</rdfs:subClassOf>
</owl:Class>
but it doesn't work: I can still have a selfie without personPicture, and I have no idea how to express the second condition!
Your axioms are easier to read in Manchester syntax, where they are:
Selfie SubClassOf Picture
Selfie SubClassOf personPicture min 1 Person
The second says that each Selfie is related to at least one Person by the property personPicture. That's consistent with a dataset like:
x a Selfie .
Even though it doesn't specify which Person x is related to by personPicture, a reasoner will correctly infer that there is one. That's part of the open-world assumption adopted in OWL: just because something isn't known does not mean that it is false.
As for your second condition, I'm not sure that you can express it in OWL. You might be able to express the axiom "if a picture's creator is in the picture, then the picture is a Selfie", but that's the opposite direction from what you're asking.
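If what you actually need is validation rather than inference, one option is a closed-world check done outside the reasoner. The following is only a minimal sketch in Python over a hand-made set of triples; the short names (Selfie, personPicture, creator) and the sample data are assumptions for illustration, not something taken from your ontology file.

```python
# Closed-world check of both conditions over a small set of triples.
# The triples and the short names are hypothetical sample data.
triples = {
    ("pic1", "type", "Selfie"),
    ("pic1", "personPicture", "leo"),
    ("pic1", "creator", "leo"),
    ("pic2", "type", "Selfie"),          # no personPicture at all
    ("pic3", "type", "Selfie"),
    ("pic3", "personPicture", "anna"),
    ("pic3", "creator", "leo"),          # creator is not in the picture
}


def objects(subject, predicate):
    """Return every object o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def selfie_violations():
    """Map each invalid selfie to the list of conditions it breaks."""
    problems = {}
    selfies = sorted(s for s, p, o in triples if p == "type" and o == "Selfie")
    for selfie in selfies:
        people = objects(selfie, "personPicture")
        creators = objects(selfie, "creator")
        issues = []
        if not people:
            issues.append("no personPicture")
        if not creators & people:
            issues.append("creator not among personPicture")
        if issues:
            problems[selfie] = issues
    return problems


violations = selfie_violations()
for selfie, issues in violations.items():
    print(selfie, issues)
```

Unlike the OWL axiom, this treats missing statements as errors, which is closer to what "a selfie must have at least one personPicture" usually means when checking data.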
I am trying to create an interface for WOLF (Wordnet Libre du Français, Free French Wordnet). The goal is to replicate the AWNDatabaseManger for the Arabic Wordnet (http://www.talp.upc.edu/index.php/technology/resources/multilingual-lexicons-and-machine-translation-resources/multilingual-lexicons/72-awn), but for WOLF.
The problem I am facing is that I cannot find proper data specifications for WOLF (http://alpage.inria.fr/~sagot/wolf-en.html) or WoNeF (another French translated WordNet, http://wonef.fr/).
For the Arabic Wordnet they have given detailed Data Specifications which can be found at http://globalwordnet.org/arabic-wordnet/awn-data-spec/
I am trying to find the same for either WOLF or WoNeF.
Otherwise, how do I map the two files?
For example, a word and its relations in AWN look like:
<item itemid="$ajarap_AlS~amog_n1AR" offset="111586059" lexfile="" name="شَجَرَة الصَّمْغ " type="synset" headword="" POS="n" source="" gloss="" authorshipid="80" />
<word wordid="$ajarap__1" value="شَجَرَة الصَّمْغ " synsetid="$ajarap_AlS~amog_n1AR" frequency="" corpus="" authorshipid="11461" />
<link type="has_hyponym" link1="$ajarap_AlS~amog_n1AR" link2=">ukAlibotws_n1AR" authorshipid="35038" />
<link type="has_hyponym" link1="$ajarap_n1AR" link2="$ajarap_AlS~amog_n1AR" authorshipid="35041" />
The word definition (item) and its relations (link) are separate elements with different attributes,
whereas in WOLF a word and its relations look like:
<SYNSET>
<ILR type="near_antonym">eng-30-00002098-a</ILR>
<ILR type="be_in_state">eng-30-05200169-n</ILR>
<ILR type="be_in_state">eng-30-05616246-n</ILR>
<ILR type="eng_derivative">eng-30-05200169-n</ILR>
<ILR type="eng_derivative">eng-30-05616246-n</ILR>
<ID>eng-30-00001740-a</ID>
<SYNONYM>
<LITERAL lnote="2/2:fr.csbgen,fr.csen">comptable</LITERAL>
</SYNONYM>
<DEF>(usually followed by `to') having the necessary means or skill or know-how or authority to do something
</DEF>
<USAGE>able to swim</USAGE>
<USAGE>she was able to program her computer</USAGE>
<USAGE>we were at last able to buy a car</USAGE>
<USAGE>able to get a grant for the project</USAGE>
<BCS>3</BCS>
<POS>a</POS>
</SYNSET>
I could assume that the AWN attribute gloss corresponds to the WOLF tag USAGE, and that the AWN attribute POS corresponds to the WOLF tag POS.
But the point is that I don't want to make assumptions; I am looking for proper documentation from which I can reliably derive the mappings between the two files.
Could anyone please point me to the right docs?
Depending on your needs, a workaround could be to use the NLTK Python library, which integrates some French synsets that probably come from WOLF:
>>> from nltk.corpus import wordnet as wn
>>> [synset.lemma_names('fra') for synset in wn.synsets(u'chien'.decode('utf-8'), lang='fra')]
[[u'canis_familiaris', u'chien'], [u'aboyeur', u'chien', u'chienchien', u'clébard', u'toutou'], [u'chien', u'chien_de_chasse'], [u'chien'], [u'chien', u'clic', u'cliquer', u'cliquet'], [u'chien', u'franc', u'hot-dog'], [u'achille', u'chien', u'quignon', u'talon'], [u'chien'], [u'chien']]
The WOLF database uses the VisDic format, which is defined here:
https://nlp.fi.muni.cz/trac/deb2/wiki/WordNetFormat
The XSD is available here: http://deb.fi.muni.cz/debvisdic.xsd
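To make the structure concrete, here is a minimal sketch that reads one SYNSET element with Python's standard xml.etree.ElementTree. The sample is abridged from the question, and the dictionary keys (id, pos, definition, and so on) are names chosen for illustration, not part of the VisDic specification. In this sample DEF looks like the gloss and USAGE like example sentences, but that reading should be confirmed against the XSD.

```python
import xml.etree.ElementTree as ET

# One SYNSET element, abridged from the WOLF sample in the question.
SAMPLE = """
<SYNSET>
  <ILR type="near_antonym">eng-30-00002098-a</ILR>
  <ILR type="be_in_state">eng-30-05200169-n</ILR>
  <ID>eng-30-00001740-a</ID>
  <SYNONYM>
    <LITERAL lnote="2/2:fr.csbgen,fr.csen">comptable</LITERAL>
  </SYNONYM>
  <DEF>having the necessary means or skill to do something</DEF>
  <USAGE>able to swim</USAGE>
  <POS>a</POS>
</SYNSET>
"""


def parse_synset(element):
    """Turn one <SYNSET> element into a plain dictionary."""
    return {
        "id": element.findtext("ID"),
        "pos": element.findtext("POS"),
        "definition": (element.findtext("DEF") or "").strip(),
        "literals": [lit.text for lit in element.findall("SYNONYM/LITERAL")],
        "usages": [usage.text for usage in element.findall("USAGE")],
        # Each relation is kept as a (type, target synset ID) pair.
        "relations": [(ilr.get("type"), ilr.text) for ilr in element.findall("ILR")],
    }


synset = parse_synset(ET.fromstring(SAMPLE.strip()))
print(synset["id"], synset["pos"], synset["literals"])
for relation_type, target in synset["relations"]:
    print(relation_type, "->", target)
```

The (type, target) pairs play roughly the role of the AWN link elements, with the enclosing synset's ID as the source of the relation.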
I am using Solr 3.6.2 to extract snippets from documents that I am certain contain a specific string. (First of all, is that usage correct?) Unfortunately, I get snippets that do not contain my query string (a simple, single, non-stop word).
For example, for the document 123456, that I know to contain "funmitflags", I have a query of the type:
id:123456 and content_en:funmitflags
and
fl=id&hl=true&hl.fl=content_en&hl.snippets=2&hl.alternateField=content_en&hl.maxAlternateFieldLength=400&hl.maxAnalyzedCharacters=2147483647&hl.fragsize=400&rows=100
(I set "content_en" as the alternate field in order to get some snippet from the document in any case.
I usually have large amounts of text in this field.)
But now I usually get the first 400 characters back instead of the fragments that contain my "funmitflags" word.
I can retrieve the document from the admin page anyway, just not a proper highlight.
It is awkward, because I have this problem with about 75% of all queries.
In my schema.xml, "content_en" is defined with the type "text_en":
<field name="content_en" type="text_en" indexed="true" stored="true" />
I changed "text_en" from the original definition, to the following:
<fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="lang/stopwords_en.txt"
enablePositionIncrements="true"
/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1"
generateNumberParts="1"
catenateWords="0"
catenateNumbers="0"
catenateAll="0"
splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1"
generateNumberParts="1"
catenateWords="0"
catenateNumbers="0"
catenateAll="0"
splitOnCaseChange="1"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
words="lang/stopwords_en.txt"
enablePositionIncrements="true"
/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPossessiveFilterFactory"/>
<filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
<filter class="solr.PorterStemFilterFactory"/>
</analyzer>
</fieldType>
After reindexing, I still get no correct snippets in either case.
Can someone give me a direction?
Should I always get a snippet containing my search word?
Just tried highlighting on field largetext_en of type text_en which uses your analyzer chain. Highlighting works fine and the 4 snippets contain the searched word Polyana, as you can see below. (It is tough to read the content of the field below, so copy paste to a text editor and see.)
http://localhost:8983/solr/collection1/select?q=id:mateva_highlight%20AND%20largetext_en:polyana&wt=json&fl=id,largetext_en&hl=true&hl.fl=largetext_en&hl.snippets=10&hl.alternateField=largetext_en&hl.maxAlternateFieldLength=400&hl.maxAnalyzedCharacters=2147483647&hl.fragsize=400&rows=100
Here is the output:
responseHeader: {
status: 0
QTime: 5
-params: {
hl.fragsize: "400"
fl: "id,largetext_en"
hl.snippets: "10"
hl.maxAlternateFieldLength: "400"
q: "id:mateva_highlight AND largetext_en:polyana"
hl.alternateField: "description"
hl.fl: "largetext_en"
wt: "json"
hl: "true"
-rows: [
"41"
"100"
]
hl.maxAnalyzedCharacters: "2147483647"
}
}
-response: {
numFound: 1
start: 0
-docs: [
-{
id: "mateva_highlight"
-largetext_en: [
"COUNT LEO NIKOLAYEVICH TOLSTOY was born August 28, 1828, at the family estate of Yasna- ya Polyana, in the province of Tula. His moth- er died when he was three and his father six years later. Placed in the care of his aunts, he passed many of his early years at Kazan, where, in 1844, after a preliminary training by French tutors, he entered the university. He cared lit- tle for the university and in 1847 withdrew be- cause of "ill-health and domestic circum- stances." He had, however, done a great deal of reading, of French, English, and Russian novels, the New Testament, Voltaire, and Hegel. The author exercising the greatest in- fluence upon him at this time was Rousseau; he read his complete works and for sometime wore about his neck a medallion of Rousseau. Immediately upon leaving the university, Tolstoy returned to his estate and, perhaps inr spired by his enthusiasm for Rousseau, pre- pared to devote himself to agriculture and to improving the condition of his serfs. His first attempt at social reform proved disappointing, and after six months he withdrew to Moscow and St. Petersburg, where he gave himself over to the irregular life characteristic of his class and time. In 1851, determined to "escape my debts and, more than anything else, my hab- its," he enlisted in the Army as a gentleman- volunteer, and went to the Caucasus. While at Tiflis, preparing for his examinations as a cadet, he wrote the first portion of the trilogy, Childhood, Boyhood, and Youth, in which he celebrated the happiness of "being with Na- ture, seeing her, communing with her." He al- so began The Cossacks with the intention of showing that culture is the enemy of happi- ness. Although continuing his army life, he gradually came to realize that "a military ca- reer is not for me, and the sooner I get out of it and devote myself entirely to literature the better." 
His Sevastopol Sketches (1855) were so successful that Czar Nicholas issued special orders that he should be removed from a post of danger. Returning to St. Petersburg, Tolstoy was re- ceived with great favor in both the official and literary circles of the capital. He soon became interested in the popular progressive move- ment of the time, and in 1857 he decided to go abroad and study the educational and munici- pal systems of other countries. That year, and again in 1860, he traveled in Europe. At Yas- naya Polyana in 1861 he liberated his serfs and opened a school, established on the principle that "everything which savours of compulsion is harmful." He started a magazine to promote his notions on education and at the same time served as an official arbitrator for grievances between the nobles and the recently emanci- pated serfs. By the end of 1863 he was so ex- hausted that he discontinued his activities and retired to the steppes to drink koumis for his health. Tolstoy had been contemplating marriage for some time, and in 1862 he married Sophie Behrs, sixteen years his junior, and the daugh- ter of a fashionable Moscow doctor. Their early married life at Yasnaya Polyana was tranquil. Family cares occupied the Countess, and in the course of her life she bore thirteen children, nine of whom survived infancy. Yet she also acted as a copyist for her husband, who after their marriage turned again to writ- ing. He was soon at work upon "a novel of the i8io's and *2o's" which absorbed all his time and effort. He went frequently to Mos- cow, "studying letters, diaries, and traditions" and "accumulated a whole library" of histori- cal material on the period. He interviewed survivors of the battles of that time and trav- eled to Borodino to draw up a map of the battleground. 
Finally, in 1869, after his work had undergone several changes in conception and he had "spent five years of uninterrupted andjgxceptionally strenuous labor Tnnierthe IbesfcondUtions of life/' he published War and Peace. Its appearance immediately established Tolstoy's reputation, and in the judgment of Turgenev, the acknowledged dean of Russian letters, gave him "first place among all our contemporary writers." The years immediately following the com- pletion of War and Peace were pa**efl in a great variety of occupations, none of which Tohtoy found satisfying. He tried busying VI BIOGRAPHICAL NOTE himself with the affairs of his estate, under- took the learning of Greek to read the ancient classics, turned again to education, wrote a series of elementary school books, and served as school inspector. With much urging from his wife and friends, he completed Anna Kare- nina, which appeared serially between 1875 and 1877. Disturbed by what he considered his unreflective and prosperous existence, Tolstoy became increasingly interested in religion. At first he turned to the orthodox faith of the people. Unable to find rest there, he began a detailed examination of religions, and out of his reading, particularly of the Gospels, gradu- ally evolved his own personal doctrine. Following his conversion, Tolstoy adopted a new mode of life. He dressed like a peasant, devoted much of his time to manual work, learned shoemaking, and followed a vegetari- an diet. With the exception of his youngest daughter, Alexandra, Tolstoy's family re- mained hostile to his teaching. The breach be- tween him and his wife grew steadily wider. In 1879 he wrote the Kreutzer Sonata in which he attacked the normal state of marriage and extolled a life of celibacy and chastity. In 1881 he divided his estate among his heirs and, a few years later, despite the opposition of his wife, announced that he would forego royal- ties on all the works published after his con- version. 
Tolstoy made no attempt at first to propa- gate his religious teaching, although it attracted many followers. After a visit to the Moscow slums iri 1881, he became concerned with social conditions, and he subsequently aided the suf- ferers of the famine by sponsoring two hun- dred and fifty relief kitchens. After his meet- ing and intimacy with Chertkov, "Tolstoyism" began to develop as an organized sect. Tol- stoy's writings became almost exclusively pre- occupied with religious problems. In addition to numerous pamphlets and plays, he wrote IV hat is Art? (1896), in which he explained his new aesthetic theories, and Hadji-Murad, (1904), which became the favorite work of his old age. Although his activities were looked upon with increasing suspicion by the official authorities, Tolstoy escaped official censure until 1901, when he was excommunicated by the Orthodox Church. His followers were f re- quently subjected to persecution, and many were either banished or imprisoned. Tolstoy's last years were embittered by mounting hostility within his own household. Although his personal life was ascetic, he felt the ambiguity of his position as a preacher of poverty living on his great estate. Finally, at the age of eighty-two, with the aid of his daugh- ter, Alexandra, he fled from home. His health broke down a few days later, and he was re- moved from the train to the station-master's hut at Astopovo, where he died, November 7, 1910. He was buried at Yasnaya Polyana, in the first public funeral to be held in Russia without religious rites. "
]
}
]
}
-highlighting: {
-mateva_highlight: {
-largetext_en: [
"COUNT LEO NIKOLAYEVICH TOLSTOY was born August 28, 1828, at the family estate of Yasna- ya <em>Polyana</em>, in the province of Tula. His moth- er died when he was three and his father six years later. Placed in the care of his aunts, he passed many of his early years at Kazan, where, in 1844, after a preliminary training by French tutors, he entered the university. He cared lit"
" and study the educational and munici- pal systems of other countries. That year, and again in 1860, he traveled in Europe. At Yas- naya <em>Polyana</em> in 1861 he liberated his serfs and opened a school, established on the principle that "everything which savours of compulsion is harmful." He started a magazine to promote his notions on education and at the same time served as an official"
" doctor. Their early married life at Yasnaya <em>Polyana</em> was tranquil. Family cares occupied the Countess, and in the course of her life she bore thirteen children, nine of whom survived infancy. Yet she also acted as a copyist for her husband, who after their marriage turned again to writ- ing. He was soon at work upon "a novel of the i8io's and *2o's" which absorbed all his time"
" position as a preacher of poverty living on his great estate. Finally, at the age of eighty-two, with the aid of his daugh- ter, Alexandra, he fled from home. His health broke down a few days later, and he was re- moved from the train to the station-master's hut at Astopovo, where he died, November 7, 1910. He was buried at Yasnaya <em>Polyana</em>, in the first public funeral to be held"
]
}
}
}
Thanks to @arun's experiment, which eliminated half the possibilities, I found a solution.
As my texts are very large, I set the following in solrconfig.xml:
<maxFieldLength>1000000</maxFieldLength>
To increase the speed, I started using the FastVectorHighlighter by adding
solrQuery.set("hl.useFastVectorHighlighter", true);
to my query. It seems that this disabled my highlightSimplePre and highlightSimplePost settings, but who cares.
Also, I had to add the term* options to my content field:
<field name="content_en" type="text_en" indexed="true" stored="true" termVectors="true" termPositions="true" termOffsets="true" />
Of course, I reindexed afterwards.
Note that the query has hl.maxAnalyzedCharacters=2147483647, but this is the wrong parameter name. What's wanted instead is hl.maxAnalyzedChars.
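As an illustration of the corrected request, here is a minimal sketch that assembles the query string with Python's standard library. The host and select path are hypothetical placeholders, and the helper name build_highlight_query is made up for this example; the parameter values are the ones from the question.

```python
from urllib.parse import urlencode

# Hypothetical host and select path; adjust to your Solr installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_highlight_query(doc_id, term, field="content_en"):
    """Build a select URL that asks for highlighted snippets of one document."""
    params = {
        # Boolean operators are written in upper case, as in the answer above.
        "q": "id:{} AND {}:{}".format(doc_id, field, term),
        "fl": "id",
        "rows": 100,
        "hl": "true",
        "hl.fl": field,
        "hl.snippets": 2,
        "hl.fragsize": 400,
        # Correct name: hl.maxAnalyzedChars, not hl.maxAnalyzedCharacters.
        "hl.maxAnalyzedChars": 2147483647,
        "hl.alternateField": field,
        "hl.maxAlternateFieldLength": 400,
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_highlight_query("123456", "funmitflags")
print(url)
```

Building the parameters from a dictionary makes a misspelled name such as hl.maxAnalyzedCharacters easier to spot than in a hand-written query string.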
I want to start a MediaWiki-based site, but rather than adding categories and subcategories manually, I want to add them in an automated fashion: I provide something like an XML file, and the bot/script/algorithm/... goes through the list and creates the categories and subcategories with their pages automatically.
There are no pages yet, but I want to start with a clean set of categories, helping users to sort the pages.
I found the pywikipediabot, but I can't figure out how to use it for my purposes - it seems to only work for categories of existing pages. Would you use pywikipediabot for creating hierarchies of new categories and if yes how? Can an xml file be used as a template?
I found a solution to my initial problem of creating categories in bulk. I am not marking the question as closed, though: if you know a better solution, please post it.
MediaWiki has an import functionality. With your admin account go to
http://yourMediaWiki/index.php/Special:Import
This allows you to choose to import an xml file, which has to follow a certain structure: see here
For a category with the name "Test Category" and the text "Category Testing", you have to create a 'page' element like this:
<page>
<title>Category:Test Category</title> <!-- Name of the category, don't forget to prefix with 'Category:' -->
<ns>14</ns> <!-- 14 is the namespace of categories -->
<id>n</id> <!-- identifier for category -->
<revision>
<id>16</id> <!-- number of revision -->
<timestamp>2013-02-10T22:07:46Z</timestamp> <!-- Creation date & time -->
<contributor>
<username>admin</username> <!-- Name of user who created the category -->
<id>1</id> <!-- ID of the user -->
</contributor>
<comment></comment> <!-- Comment about the category. Can be left blank -->
<sha1></sha1> <!-- sha1 hash can be left blank -->
<text xml:space="preserve" bytes="1">Category Testing</text> <!-- It seems it doesn't matter what you write into the bytes attribute. -->
</revision>
</page>
If you want to create hierarchies of categories just add the parent category tags into the text element. Say the category should be part of the 'Parent Category' category then the text element should look like this:
<text xml:space="preserve" bytes="1">Category Testing [[Category:Parent Category]]</text>
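For more than a handful of categories, the import file can be generated rather than written by hand. The following is a minimal sketch using Python's standard library. The category names, the description text and the function name build_import_xml are made up for illustration, and it assumes that a bare mediawiki root element with only title, ns and text per page is accepted; if your MediaWiki version rejects that, add the id, timestamp and contributor elements from the example above.

```python
import xml.etree.ElementTree as ET

# Hypothetical hierarchy: category name -> parent category (None for top level).
CATEGORIES = {
    "Parent Category": None,
    "Test Category": "Parent Category",
    "Another Category": "Parent Category",
}

# ElementTree serializes this namespace with the reserved "xml" prefix.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def build_import_xml(categories):
    """Return import XML with one <page> element per category."""
    # Assumption: a bare <mediawiki> root is enough; real exports also
    # carry schema attributes and a <siteinfo> block.
    root = ET.Element("mediawiki")
    for name, parent in categories.items():
        page = ET.SubElement(root, "page")
        ET.SubElement(page, "title").text = "Category:" + name
        ET.SubElement(page, "ns").text = "14"  # namespace of categories
        revision = ET.SubElement(page, "revision")
        text = ET.SubElement(revision, "text")
        text.set(XML_SPACE, "preserve")
        body = "Pages about " + name + "."
        if parent is not None:
            # A category link in the text makes this page a subcategory.
            body += " [[Category:" + parent + "]]"
        text.text = body
    return ET.tostring(root, encoding="unicode")


xml_text = build_import_xml(CATEGORIES)
print(xml_text)
```

The printed XML can be saved to a file and uploaded through Special:Import as described above.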
If you are able to get pywikibot up and running, then you can use its Category class. To find the main entry point, search the pywikibot source on GitHub for class Category(Page).
Categories in MediaWiki are basically standard pages, but in namespace 14. To include any page in a category (including a page which is itself a category), you include [[Category:<The-Category>]] in the wikitext of the page.
So you can do something like this:
>>> import pywikibot as pwb
#Your site will be different than this
>>> testwiki = pwb.Site('en','test')
>>> catA = pwb.Category(testwiki, 'testCatA')
>>> catA.namespace()
14
>>> catA.text = u'[[Category:testCatB]]'
>>> catA.save()
Page [[test:Category:TestCatA]] saved
Now you have a page Category:TestCatA which is a subcategory of Category:TestCatB.