Finding a regular grammar for a given regular expression?

I am trying to find a regular grammar that generates the language given by the regular expression ((a+b∗c)d)∗. Is there a general technique I can use to convert regular expressions into regular grammars?

It's usually a lot easier to convert a finite automaton for a regular language into a regular grammar than it is to convert a regular expression into a regular grammar. I'd recommend starting off by building an automaton for the regular expression - either manually or by applying Thompson's algorithm to mechanically convert the regex to an automaton - and then doing the conversion from there.
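As a rough illustration of the automaton-to-grammar step, here is a minimal Python sketch. It assumes the "+" in the expression means union (a or b∗c), and the three-state DFA and its state names S, A, B were built by hand for this example; they are not taken from the question. Each transition q --x--> r becomes a production q -> x r, and each accepting state gets an ε-production.

```python
# Hand-built DFA for ((a+b*c)d)*, reading "+" as union (an assumption).
# S: start state, also accepting; A: a "d" must come next; B: inside b*c.
DFA = {
    "S": {"a": "A", "c": "A", "b": "B"},
    "A": {"d": "S"},
    "B": {"b": "B", "c": "A"},
}
START = "S"
ACCEPTING = {"S"}


def dfa_to_right_linear_grammar(dfa, accepting):
    """Turn each transition q --x--> r into q -> x r, and add q -> ε for accepting q."""
    grammar = {}
    for state, transitions in dfa.items():
        rules = [symbol + target for symbol, target in sorted(transitions.items())]
        if state in accepting:
            rules.append("")  # the empty string stands for ε
        grammar[state] = rules
    return grammar


def derives(grammar, nonterminal, word):
    """Check whether `word` can be derived from `nonterminal` (single-letter names only)."""
    for rule in grammar[nonterminal]:
        if rule == "":
            if word == "":
                return True
        elif word and word[0] == rule[0] and derives(grammar, rule[1:], word[1:]):
            return True
    return False


GRAMMAR = dfa_to_right_linear_grammar(DFA, ACCEPTING)

if __name__ == "__main__":
    for head, rules in GRAMMAR.items():
        print(head, "->", " | ".join(rule or "ε" for rule in rules))
```

Under these assumptions the resulting grammar is S -> aA | bB | cA | ε, A -> dS, B -> bB | cA. If "+" was meant as "one or more", the DFA would need to change, but the conversion step stays the same.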

Formal Language Theory - Unique "leftmost" and "rightmost" derivation tree

I'm studying Formal Language Theory, and while doing the exercises assigned to me I found this question that I can't answer with certainty:
When, in general, in a context-free grammar are the leftmost and rightmost derivations of each generated string equal?
Only if the grammar generates a deterministic context-free language
Only if there is at most one non-terminal symbol in the right-hand side of each derivation
Never
(P.S.: The question is translated from Italian; I hope it is understandable.)

Determine if a string can be derived ambiguously in a CFG

I know that checking whether a specific context-free grammar is ambiguous requires checking whether there exists any string that can be derived in more than one way, and that this is undecidable.
However, I have a simpler problem. Given a specific context-free grammar and a specific string, is it possible to determine if the string can be derived from the grammar ambiguously? Is there a general algorithm to do this check?
Yes, you can use any generalized parsing algorithm, such as a GLR (Tomita) parser, an Earley parser, or even a CYK parser; all of those can produce a parse "forest" (i.e. a digraph of all possible parses) in O(N^3) time and space. Creating the parse forest is a bit trickier than the "parsing" (that is, recognition), but there are known algorithms which are referenced in the Wikipedia article.
Since the generalized parsing algorithms find all possible parses, you can rest assured that if exactly one parse is found for the string, then the string is not ambiguous.
I'd stay away from CYK parsing for this purpose because it requires converting the grammar to Chomsky Normal Form, which makes recovering the original parse tree(s) more complicated.
Bison will generate a GLR parser, if requested, so you could just use that tool. However, be aware that it does not optimize storage of the parse forest, since it is expecting to produce only a single parse, and therefore you can end up with exponentially-sized data structures (which then take exponential time to construct). That's usually only a problem with pathological grammars, though. Also, you will have to declare a custom %merge function on all possibly ambiguous productions; otherwise, the Bison-generated parser will fail with an "ambiguous parse" error if more than one parse is possible.
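If you only need a yes/no answer for one string, counting parse trees is enough and no forest has to be built. The following Python sketch does that with a CYK-style table. It assumes the grammar is already written in Chomsky Normal Form (so the tree-recovery concern above does not arise, but the counts refer to the CNF grammar, not to some original grammar it may have been converted from). The toy grammar S -> S S | 'a' and all names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky Normal Form: S -> S S | 'a'
BINARY_RULES = [("S", "S", "S")]   # (head, left child, right child)
TERMINAL_RULES = [("S", "a")]      # (head, terminal)


def count_parses(word, start="S"):
    """Return the number of distinct parse trees for `word` rooted at `start`."""
    n = len(word)
    if n == 0:
        return 0  # this sketch does not handle the empty string
    # counts[i][j][A] = number of parse trees deriving word[i:j] from A
    counts = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, ch in enumerate(word):
        for head, terminal in TERMINAL_RULES:
            if terminal == ch:
                counts[i][i + 1][head] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                for head, left, right in BINARY_RULES:
                    counts[i][j][head] += counts[i][k][left] * counts[k][j][right]
    return counts[0][n][start]


def is_ambiguous_for(word, start="S"):
    """A string is derived ambiguously when it has more than one parse tree."""
    return count_parses(word, start) > 1
```

With this toy grammar, "aa" has a single parse, while "aaa" has two (left-nested and right-nested), so only the latter is reported as ambiguous.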

SPARQL - Restricting Result Resource to Certain Namespace(s)

Is there a standard way of restricting the results of a SPARQL query to belong to a specific namespace?
Short answer: no, there is no standard direct way to do this.
Long answer: you can do a limited form of this with the string functions and a FILTER clause. Which function you use depends on which version of SPARQL your engine supports.
SPARQL 1.1 Solution
Almost all implementations support SPARQL 1.1 these days, so you can use the STRSTARTS() function like so:
FILTER(STRSTARTS(STR(?var), "http://example.org/ns#"))
This is my preferred approach and should be relatively efficient because it is simple string matching.
SPARQL 1.0 Solution
If you are stuck using an implementation that only supports SPARQL 1.0, you can still do this, but it relies on regular expressions via the REGEX() function, so it will likely be slower:
FILTER(REGEX(STR(?var), "^http://example\\.org/ns#"))
Regular Expressions and Meta-Characters
Note that for the regular expression we have to escape the meta-character . because otherwise it would match any character, e.g. http://exampleXorg/ns#foo would be considered a valid match.
As \ is the escape character for both regular expressions and SPARQL strings, it has to be double escaped here so that the regular expression ends up with just \. in it and treats . as a literal character.
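The effect of the unescaped . and of the double escaping can be seen outside SPARQL too. The sketch below uses Python's re module purely as a stand-in: it is not the regex flavour SPARQL uses, but . and \. behave the same way in both, and Python string literals also use \ as their escape character. The variable names are made up for the example.

```python
import re

namespace = "http://example.org/ns#"
good = "http://example.org/ns#foo"
bad = "http://exampleXorg/ns#foo"

# Unescaped "." matches any character, so the wrong host slips through.
unescaped = re.compile("^http://example.org/ns#")

# In a normal (non-raw) string literal the backslash must itself be escaped,
# so "\\." in the source becomes the two characters \. in the actual pattern.
escaped = re.compile("^http://example\\.org/ns#")

unescaped_accepts_bad = unescaped.search(bad) is not None
escaped_accepts_bad = escaped.search(bad) is not None
escaped_accepts_good = escaped.search(good) is not None

# Plain prefix matching (the STRSTARTS-style approach) needs no escaping at all.
prefix_accepts_bad = bad.startswith(namespace)
prefix_accepts_good = good.startswith(namespace)
```

Here the unescaped pattern accepts the bad IRI, while both the escaped pattern and the plain prefix test reject it.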
Recommendation
If you can use SPARQL 1.1, then do so: the simpler string functions will be more performant, and you avoid having to worry about escaping meta-characters as you do with REGEX.

SSIS 2008 - how to do regular expression search in SSIS Derived Column tool

How do I do a regular expression search in the SSIS Derived Column tool?
i.e.
I have strings in the format XXXNNNN and I want to filter out those strings not in this format using an SSIS Derived Column tool.
e.g.
ABC1234 is ok
ABCDEFG is not.
The Derived Column transformation doesn't support regular expressions, so you'll have to look at some other options:
Use a Script Task and write the regex using the standard .NET regex features
Use a third-party component
If you always have 7 characters, you could use the SUBSTRING and CODEPOINT functions to check that each one is in the range you expect (see the function reference). But that's probably awkward to read and maintain, and may not be practical at all depending on what your data looks like.
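To make the two approaches concrete, here is a small Python sketch of the check itself; it is not SSIS or .NET code. It assumes X means an uppercase ASCII letter and N means a digit, which the question does not state explicitly. The first function uses a regex (the pattern [A-Z]{3}[0-9]{4} is also valid .NET regex syntax), and the second mimics the per-character range test you would build from SUBSTRING and CODEPOINT. Function names are made up.

```python
import re

# Assumed format: three uppercase ASCII letters followed by four digits.
PATTERN = re.compile(r"[A-Z]{3}[0-9]{4}")


def matches_with_regex(value):
    """Regex version: the whole string must match the pattern."""
    return PATTERN.fullmatch(value) is not None


def matches_with_codepoints(value):
    """Per-character version, checking each position against a code point range."""
    if len(value) != 7:
        return False
    letters_ok = all(ord("A") <= ord(ch) <= ord("Z") for ch in value[:3])
    digits_ok = all(ord("0") <= ord(ch) <= ord("9") for ch in value[3:])
    return letters_ok and digits_ok
```

Both functions accept ABC1234 and reject ABCDEFG; the per-character version shows why the expression-only route gets verbose quickly.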

Split SQL statements blocks using Regular expression in C#

Can we write a regular expression that splits a stored procedure into multiple SQL statements?
It should split update, delete, select, etc. statements.
Edit: my attempt to solve the problem http://tsqlparsergdr.codeplex.com/
If you have the grammar for the stored procedure language, you could use ANTLR to parse the procedure, extract the relevant parts of the language, and then do any further processing necessary. It should be relatively easy to get a grammar going from scratch as well.
With regexes, you would need a set of expressions to deal with the whole procedure, e.g. a regex to match just insert statements, which possibly span many lines and possibly contain local variables from the proc, and so on.
If you are working with a known set of SQL procedures it should be pretty easy to examine them and come up with a set of regexes to split them as required.
If you are looking for something which will handle any possible set of SQL procedures, then regexes won't hack it! SQL has a complex recursive grammar, and there will always be some sub-select, group by, or literal that will break your regex-based parser.
As the previous poster recommended, you really need a full parser such as can be generated by ANTLR or JavaCC (is there a C# equivalent?).
There are a number of SQL-92 grammar definitions available for these parser generators on the net, so a large part of the work has been done for you. The remaining part, writing the parser's application logic, is still far from trivial.
To parse arbitrary stored procedures, you're far better off with a SQL parser. Trying to parse arbitrary SQL with regexes will amount to writing your own parser.
To parse a specific set of stored procedures, a regex may be able to do the job. You'll need to provide a few examples of the input you have and the desired output if you want a more detailed answer.
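A small Python sketch of how quickly a regex-only split goes wrong, and how the fix already starts to look like a hand-written scanner. The SQL text, table name, and function names are invented for the example, and the quote-aware version still ignores comments, BEGIN...END blocks, and everything else a real SQL parser handles.

```python
import re

# Made-up procedure body: the string literal contains a semicolon and keywords.
PROCEDURE_BODY = (
    "UPDATE notes SET body = 'delete this; select that' WHERE id = 1;\n"
    "DELETE FROM notes WHERE id = 2;"
)


def naive_split(sql):
    """Split on every semicolon, ignoring SQL quoting rules entirely."""
    return [part.strip() for part in re.split(r";", sql) if part.strip()]


def quote_aware_split(sql):
    """Split on semicolons that are outside single-quoted string literals."""
    statements, current, in_string = [], [], False
    for ch in sql:
        if ch == "'":
            # A doubled quote ('') inside a literal toggles twice, leaving the state unchanged.
            in_string = not in_string
        if ch == ";" and not in_string:
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return [statement for statement in statements if statement]
```

On this input the naive version returns three fragments, cutting the UPDATE in the middle of its literal, while the quote-aware version returns the two intended statements.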