Can DTD entities be used to define children of an element?

When defining an element in a DTD, is it possible to use an entity to avoid probable duplication of child elements?
For example, instead of defining the following elements:
<!ELEMENT bear (weight, height, power)>
<!ELEMENT human (weight, height, power)>
Could I just replace the definition of the children with a defined entity, like this:
<!ELEMENT bear &stats;>
<!ELEMENT human &stats;>
<!ENTITY stats "(weight, height, power)">
If not, what is the way to avoid duplication (in DTDs not Schemas)?

After seeking help from a web specialist, I found that the example in the question is ALMOST right. Entities can be used to define elements and avoid duplication, in a slightly different way (see the example below).
<!ENTITY % stats "(weight, height, power)">
<!ELEMENT bear %stats;>
<!ELEMENT human %stats;>
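For reference, a complete DTD built this way might look as follows (a sketch of my own; only the element names come from the question). One caveat worth knowing: a parameter-entity reference inside a markup declaration, as used here, is only allowed in the external DTD subset, not in the internal subset embedded in a document.
<!-- animals.dtd (external subset) -->
<!ENTITY % stats "(weight, height, power)">
<!ELEMENT bear %stats;>
<!ELEMENT human %stats;>
<!ELEMENT weight (#PCDATA)>
<!ELEMENT height (#PCDATA)>
<!ELEMENT power (#PCDATA)>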

Is this conversion from BNF to EBNF correct?

As context, my textbook uses this style for EBNF:
Sebesta, Robert W. Concepts of Programming Languages, 11th ed., Pearson, 2016, p. 150.
The problem:
Convert the following BNF rule with three RHSs to an EBNF rule with a single RHS.
Note: Conversion to EBNF should remove all explicit recursion and yield a single RHS EBNF rule.
A ⟶ B + A | B – A | B
My solution:
A ⟶ B [ (+ | –) A ]
My professor tells me:
"First, you should use { } instead of [ ],
Second, according to the BNF rule, <"term"> is B." (He is referring the the style guide posted above)
Is he correct? I assume so, but I have read other EBNF styles and wonder if I am entitled to credit.
You were clearly asked to remove explicit recursion and your proposed solution doesn't do that; A is still defined in terms of itself. So independent of naming issues, you failed to do the requested conversion and your prof is correct to mark you down for it. The correct solution for the problem as presented, ignoring the names of non-terminals, is A ⟶ B { (+ | –) B }, using indefinite repetition ({…}) instead of optionality ([…]). With this solution, the right-hand side of the production for A only references B, so there is no recursion (at least, in this particular production).
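As a quick sanity check (my own example, not part of the assignment), both rules generate the string B + B – B:
BNF:  A ⟹ B + A ⟹ B + B – A ⟹ B + B – B
EBNF: A ⟹ B { (+ | –) B }, iterating the braces twice: first "+ B", then "– B"
The repetition form never re-derives A on the right-hand side, which is exactly what removing the explicit recursion means.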
Now, for naming: clearly, your textbook's EBNF style is to use angle brackets around the non-terminal names. That's a common style, and many would say that it is more readable than using single capital letters which mean nothing to a human reader. I suppose your prof thinks you should have changed the name of B to <term> on the basis that that is the "textbook" name for the non-terminal representing the operand of an additive operator. The original BNF you were asked to convert does show the two additive operators. However, it makes them right-associative, which is definitely non-standard. So you might be able to construct an argument that there's no reason to assume that these operators are additive and that their operands should be called "terms" [Note 1]. But even on that basis, you should have used some name written in lower-case letters and surrounded by angle brackets. To me, that's minor compared with the first issue, but your prof may have their own criteria.
In summary, I'm afraid I have to say that I don't believe you are entitled to credit for that solution.
Notes
1. If you had actually come up with that explanation, your prof might have been justified in suggesting a change of major to Law.

Fuzziness in UIMA Ruta

Is there any option for fuzziness in word matching, or for ignoring some special characters?
For example:
STRINGLIST ANIMALLIST = {"LION","TIGER","MONKEY"};
DECLARE ANIMAL;
Document{-> MARKFAST(ANIMAL, ANIMALLIST, true)};
I need to match words from the list even when they are followed by some special character, like
Tiger- or MONKEY$
According to the documentation there are different evaluators; any idea how to use them?
Or can I use SCORE or MARKSCORE?
There are several aspects to consider here. In general, UIMA Ruta does not support fuzziness in the dictionary lookup. SCORE and MARKSCORE are language elements which can be utilized to introduce some heuristic scoring (not really fuzziness) in sequential rules. In the examples you gave in your question, you do not really need fuzzy matching.
The dictionary lookup in UIMA Ruta works on the RutaBasic annotations. These annotations are automatically created and maintained by UIMA Ruta itself (and should not be changed by other analysis engines or rules directly). The RutaBasic annotations represent the smallest fragments that annotations refer to. By default, the seeder of the RutaEngine creates annotations for words (W -> CW, SW, CAP) and many other tokens, like SPECIAL for - or $. This means that there is a RutaBasic annotation for each of these tokens, and that the dictionary lookup can distinguish between them. As a result, Tiger and Monkey should be annotated, and the example in your question should actually work (I tested it). You may need some postprocessing in order to include the SPECIAL in ANIMAL.
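For that postprocessing, a rule along the following lines (a sketch, untested) would create a new ANIMAL annotation that also covers a directly following SPECIAL:
// extend ANIMAL over a trailing special character such as "-" or "$"
ANIMAL SPECIAL{-> MARK(ANIMAL, 1, 2)};
Here MARK(ANIMAL, 1, 2) creates an annotation spanning both rule elements; the original shorter ANIMAL annotation remains unless you remove it, e.g., with UNMARK.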
I have to mention that there is also the functionality to use an edit distance in the dictionary lookup (Multi Tree Word List, TRIE). However, this functionality has not been maintained for several years. It should also support different weights for specific replacements. I do not know if this counts as fuzziness.
DISCLAIMER: I am a developer of UIMA Ruta

SPARQL queries with blank nodes can be complex

I read the blog article Problems of the RDF model: Blank Nodes, which mentions that using blank nodes can complicate the handling of data.
Can you give me an example of why blank nodes make it difficult to perform a SPARQL query?
I do not understand the complexity of blank nodes.
Can you explain the meaning and semantics of an existential variable?
I do not clearly understand the explanation given in the RDF Semantics Recommendation, 1.5. Blank Nodes as Existential Variables.
Existential Variables
In the (first-order) predicate calculus, there is existential quantification which lets us make assertions about things that exist, without saying (or, possibly, knowing) which specific individuals in the domain we're actually talking about. For instance, a sentence like
hasUserId(JoshuaTaylor,1281433)
entails the sentence
∃x.hasUserId(x,1281433)
Of course, there are lots of scenarios in which the second sentence could be true without the first one being true. In that sense, the second sentence gives us less information than the first. It's also important to note that the variable x in the second sentence doesn't provide any way to find out which element in the domain of discourse actually has the given user id. It also doesn't make any claim that there's only one such thing that has the given user id. To make that clearer, we might use another example:
∃y.hasAge(y,29)
This is presumably true, since someone or something out there is age 29. Note that we can't talk about y as the individual that is age 29, though, because there could be lots of them. All this sentence tells us is that there is at least one.
Even though we used different variables in the two sentences, there's nothing to say that the individuals with the specified properties might not be the same. This is particularly important in nested quantification, e.g.,
∃x.∃y.likes(x, y)
This sentence could be true because there is one individual in the domain that likes itself. Just because x and y have different names in the sentence doesn't mean that they might not refer to the same individual.
Blank Nodes as Existential Variables
There is an RDF entailment model defined in RDF Semantics. This has been described more in another Stack Overflow question, RDF Graph Entailment. The idea is that an RDF graph is treated as a big existential quantification over the blank nodes mentioned in the graph. E.g., if the triples in the graph are t1, …, tn, and the blank nodes that appear in those triples are b1, …, bm, then the graph is a formula:
∃b1, …, bm.(t1 ∧ … ∧ tn)
Based on the discussion of the existential variables above, note that this means that blank nodes in the data might refer to same element of the domain, or different elements, and that it's not required that exactly one element could take the place of a blank node. This means that a graph with blank nodes, when interpreted in this manner, provides much less information than you might expect.
Blank Nodes in Real Data
Now, the discussion above is useful if people are using blank nodes as existential variables. In many cases, authors think of them more as anonymous, but definite and distinct objects. E.g., if we casually write
@prefix : <https://stackoverflow.com/q/20629437/1281433/> .
:Carol :hasAddress [ :hasNumber 4222 ;
                     :hasStreet :Clinton_Way ] .
we may well be trying to say that there is a single address out there with the specified properties, but according to the RDF entailment model, that's not what we're doing.
In practice, this isn't so much of a problem, because we're usually not using RDF entailment. What is a problem, though, is that since the scope of blank nodes is local to a graph, we can't run a SPARQL query against an endpoint asking for Carol's address and get back an IRI that we can reuse. If we run a query like this:
prefix : <https://stackoverflow.com/q/20629437/1281433/>
construct {
  :Mike :hasAddress ?address
}
where {
  :Carol :hasAddress ?address
}
then we get back the following (unhelpful) graph as a result:
@prefix : <https://stackoverflow.com/q/20629437/1281433/> .
:Mike :hasAddress [] .
We won't have a way to get more information about the address because all we have now is a blank node. If we had used IRIs, e.g.,
@prefix : <https://stackoverflow.com/q/20629437/1281433/> .
:Carol :hasAddress :address1267389 .
:address1267389 :hasNumber 4222 ;
                :hasStreet :Clinton_Way .
then the query would have produced something more helpful:
@prefix : <https://stackoverflow.com/q/20629437/1281433/> .
:Mike :hasAddress :address1267389 .
Why is this more useful? The first case is like having the data
∃x.(hasAddress(Carol, x) ∧ hasNumber(x, 4222) ∧ hasStreet(x, ClintonWay))
and getting back a result
∃y.hasAddress(Mike, y)
Sure, it's possible that Mike and Carol have the same address, but from these sentences there's no way to know for sure. It's much more helpful to have data like
hasAddress(Carol,address1267389)
hasNumber(address1267389,4222)
hasStreet(address1267389, ClintonWay)
and getting back a result
hasAddress(Mike,address1267389)
From this, you know that they have the same address, and you can ask things about it.
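For instance, a follow-up query like the following (my own sketch, reusing the same made-up vocabulary) retrieves the parts of the shared address:
prefix : <https://stackoverflow.com/q/20629437/1281433/>
select ?number ?street where {
  :Mike :hasAddress ?address .
  ?address :hasNumber ?number ;
           :hasStreet ?street .
}
Against the blank-node version of the data, the same pattern would find nothing to join on, because the blank node returned for Mike's address carries no further properties.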
Conclusion
How much this will affect your data and its consumers depends on what the typical use cases are. For automatically constructed graphs, it may be hard to know in advance just what kind of data you'll need to be able to refer to later, so it's a good idea to generate IRIs for as many of your resources as you can. Since IRIs are free-form, it's usually not too hard to do this. For instance, if you've got some sensible “base” IRI, e.g.,
http://example.org/myData/
then you can easily append suffixes to identify your resources. E.g.,
http://example.org/myData/addresses/addr1
http://example.org/myData/addresses/addr2
http://example.org/myData/addresses/addr3
http://example.org/myData/individuals/ind34
http://example.org/myData/individuals/ind35

Solr/Lucene highlighted keywords - padding

I would like a specific padding (e.g. '...'), but only at the start/end of fragments that are truncated.
I would also like to concatenate two fragments even if they are not close, like:
... fragment1 ... fragment2 ...
Are there any fragmenters / highlighting settings that can be used for this?
I hope I understood your question correctly.
Check out the HighlightingParameters; you have a lot of options there.
You can also specify your own hl.simple.pre/hl.simple.post; then you can parse the output any way you like.
It is also quite common to emit a span tag with a custom CSS class that you can style.
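For illustration, a request along these lines (a sketch; the core and field names are made up) turns highlighting on, asks for two snippets per field, and wraps each hit in a tag with a custom class (the pre/post values are the URL-encoded forms of <em class="hit"> and </em>):
http://localhost:8983/solr/mycore/select?q=text:fragment
    &hl=true
    &hl.fl=text
    &hl.snippets=2
    &hl.simple.pre=%3Cem%20class%3D%22hit%22%3E
    &hl.simple.post=%3C%2Fem%3E
You can then post-process the returned snippets yourself, e.g., joining separate fragments with ' ... ' and adding the padding only where a fragment was actually truncated.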

Bison input analyzer - basic question on optional grammar and input interpretation

I am very new to Flex/Bison, so this may be a naive question; pardon me if so. It may look like a homework question, but I need to implement a project based on the concept below.
My question is related to two parts,
Question 1
In a Bison parser, how do I provide rules for optional input?
For example, I need to parse this statement:
-country='USA' -state='INDIANA' -population='100' -ratio='0.5' -comment='Census study for Indiana'
Here the ratio token can be optional. Similarly, if many of the tokens are optional, how do I write the grammar for them in the parser?
My code looks like:
%start program
program : TK_COUNTRY TK_IDENTIFIER TK_STATE TK_IDENTIFIER TK_POPULATION TK_IDENTIFIER ...
where all the tokens are defined in the lexer. Since many of the tokens are optional, if I use "|" then there will be many different possible combinations of input.
Question 2
There is a good chance that the comment might have quotes as part of the input, so I have added a token, -tag, which the user can provide so that such quotes are interpreted correctly.
Example:
-country='USA' -state='INDIANA' -population='100' -ratio='0.5' -comment='Census study for Indiana$'s population' -tag=$
Now, I need to reinterpret Indiana$'s as Indiana's since -tag=$.
Please provide any input or related material to help me understand these topics.
Q1: I am assuming we have 4 possible tokens: NAME, '-', '=' and VALUE.
Then the grammar could look like this:
attrs:
    attr attrs
  | attr
  ;

attr:
    '-' NAME '=' VALUE
  ;
Note that, unless you make specific attribute names distinguished tokens, there is no way to say "We must have country, state and population, but ratio is optional."
This would be the task of that part of the program that analyses the data produced by the parser.
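A minimal sketch of how that could look (my own illustration, untested; it assumes the lexer delivers NAME and VALUE as C strings through yylval):
%{
/* Sketch: collect attribute values while parsing, then verify the
   required ones once the whole input has been reduced. */
#include <stdio.h>
#include <string.h>
int yylex(void);
void yyerror(const char *msg) { fprintf(stderr, "error: %s\n", msg); }
static const char *country, *state, *population, *ratio; /* ratio may stay NULL */
%}
%union { const char *str; }
%token <str> NAME VALUE
%%
program:
    attrs
      {
        if (!country || !state || !population)
          yyerror("missing required attribute"); /* ratio is optional */
      }
  ;

attrs:
    attr attrs
  | attr
  ;

attr:
    '-' NAME '=' VALUE
      {
        if      (strcmp($2, "country") == 0)    country    = $4;
        else if (strcmp($2, "state") == 0)      state      = $4;
        else if (strcmp($2, "population") == 0) population = $4;
        else if (strcmp($2, "ratio") == 0)      ratio      = $4;
        else                                    yyerror("unknown attribute");
      }
  ;
%%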
Q2: As I understand it, you are thinking of changing the way lexical analysis works while the parser is running. This is not a good idea, at least not for a beginner. Have you even started to think about lexical analysis, as opposed to parsing?