Constructing the language generated by a grammar

We have to find L(G), where grammar G is given as-
S->AB|CD, A->aA|a ,B->bB|bC, C->cD|d, D->aD|AD
I have attempted the question, but the derivation recurses very deeply and I am unable to terminate the string. (I know that A generates a^n after n steps, but what about D, C and B?)
Till now I have attempted as follows-
A->aA->aaA->....->a^(n-1)A (after n-1 steps)->a^n
B->bB->bbB->....->b^(m-1)B (after m-1 steps)->b^(m-1)bC->b^(m-1)bbC->...b^(m-1)b^(n-1)C->b^kC
C->cD->ccD->...->c^(p-1)D or c^(p-1)d[Thus we will consider as C->c^pD or C->c^pd]
D->aD->aaD->...->a^(q-1)D->a^(q-1)a^nD[Thus we will consider D->a^rD]
Now B depends on C, C depends on D, and D depends on itself (i.e., D recurs on itself as D->a^rD).
So how can I find the language of a grammar whose derivations don't terminate?

D never derives a string of terminals (both of its rules, D->aD and D->AD, contain D again), so it is useless and can be omitted, along with every rule that contains D.
The simplified grammar would be:
S->AB, A->aA|a ,B->bB|bC, C->d
And the language will be: {a^m b^n d : m,n>=1}
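The removal of D can be checked mechanically. The following is a minimal Python sketch, not part of the original answer: it assumes single-character symbols, with uppercase letters as nonterminals and lowercase letters as terminals, and the names `productive_nonterminals` and `drop_useless` are made up for illustration. It repeatedly marks nonterminals that can derive a terminal string, then drops every rule that mentions an unmarked one.

```python
# Grammar from the question: uppercase = nonterminal, lowercase = terminal.
GRAMMAR = {
    "S": ["AB", "CD"],
    "A": ["aA", "a"],
    "B": ["bB", "bC"],
    "C": ["cD", "d"],
    "D": ["aD", "AD"],
}


def productive_nonterminals(grammar):
    """Return the nonterminals that can derive a string of terminals only."""
    productive = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in productive:
                continue
            # A head is productive if some body consists only of terminals
            # and nonterminals already known to be productive.
            if any(all(s.islower() or s in productive for s in body)
                   for body in bodies):
                productive.add(head)
                changed = True
    return productive


def drop_useless(grammar):
    """Remove every rule that mentions a non-productive nonterminal."""
    keep = productive_nonterminals(grammar)
    return {
        head: [b for b in bodies if all(s.islower() or s in keep for s in b)]
        for head, bodies in grammar.items()
        if head in keep
    }


simplified = drop_useless(GRAMMAR)
for head, bodies in simplified.items():
    print(head, "->", "|".join(bodies))
```

D is never marked because each of its bodies needs D itself, so the sketch ends up with the same simplified grammar as above.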

Query for all N elements in an M:N relation

Say I have the following tables that model tags attached to articles:
articles (article_id, title, created_at, content)
tags (tag_id, tagname)
articles_tags (article_fk, tag_fk)
What is the idiomatic way to retrieve the n newest articles with all their attached tag-names? This appears to be a standard problem, yet I am new to SQL and don't see how to elegantly solve this problem.
From an application perspective, I would like to write a function that returns a list of records of the form [title, content, [tags]], i.e., all the tags attached to an article would be contained in a variable-length list. SQL relations aren't that flexible; so far, I can only think of a query that joins the tables and returns a new row for each article/tag combination, which I then need to programmatically condense into the above form.
Alternatively, I can think of a solution where I issue two queries: First, for the articles; second, an inner join on the link table and the tag table. Then, in the application, I can filter the result set for each article_id to obtain all tags for a given article? The latter seems to be a rather verbose and inefficient solution.
Am I missing something? Is there a canonical way to formulate a single query? Or a single query plus minor postprocessing?
On top of the bare SQL question, what would a corresponding query look like in the Opaleye DSL? That is, if it can be translated at all?
You would typically use a row-limiting query that selects the articles and orders them by descending date, and a join or a correlated subquery with an aggregation function to generate the list of tags.
The following query gives you the 10 most recent articles, along with the name of their related tags in an array:
select
    a.*,
    (
        select array_agg(t.tagname)
        from articles_tags art
        inner join tags t on t.tag_id = art.tag_fk
        where art.article_fk = a.article_id
    ) tags
from articles a
order by a.created_at desc
limit 10
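To try the same shape of query from application code, here is a small Python sketch using the standard-library sqlite3 module. It is an illustration under stated assumptions, not the answer's exact query: SQLite has no array_agg, so group_concat stands in for it and the result is split into a list afterwards (which would break on tag names that contain a comma). The function name and the sample rows are hypothetical.

```python
import sqlite3


def newest_articles_with_tags(conn, n):
    """Return [(title, content, [tags])] for the n newest articles."""
    rows = conn.execute(
        """
        SELECT a.title, a.content,
               (SELECT group_concat(t.tagname, ',')
                  FROM articles_tags art
                  JOIN tags t ON t.tag_id = art.tag_fk
                 WHERE art.article_fk = a.article_id) AS tags
          FROM articles a
         ORDER BY a.created_at DESC
         LIMIT ?
        """,
        (n,),
    ).fetchall()
    # group_concat yields NULL (None) for an article without tags.
    return [(title, content, sorted(tags.split(",")) if tags else [])
            for title, content, tags in rows]


# Hypothetical sample data, only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (article_id INTEGER PRIMARY KEY, title TEXT,
                           created_at TEXT, content TEXT);
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tagname TEXT);
    CREATE TABLE articles_tags (article_fk INTEGER, tag_fk INTEGER);
    INSERT INTO articles VALUES (1, 'Old', '2020-01-01', 'x');
    INSERT INTO articles VALUES (2, 'New', '2021-01-01', 'y');
    INSERT INTO articles VALUES (3, 'Untagged', '2019-01-01', 'z');
    INSERT INTO tags VALUES (1, 'sql');
    INSERT INTO tags VALUES (2, 'haskell');
    INSERT INTO articles_tags VALUES (1, 1);
    INSERT INTO articles_tags VALUES (2, 1);
    INSERT INTO articles_tags VALUES (2, 2);
""")
result = newest_articles_with_tags(conn, 2)
print(result)
```

Because the tag list comes from a correlated subquery, the row limit applies to articles rather than to article/tag pairs.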
You have converted most of GMB's answer successfully to Opaleye in your answer to your subsequent question. Here's a fully-working version in Opaleye.
In the future you are welcome to ask such questions on Opaleye's issue tracker. You will probably get a quicker response there.
{-# LANGUAGE Arrows #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TemplateHaskell #-}
import Control.Arrow
import qualified Opaleye as OE
import qualified Data.Profunctor as P
import Data.Profunctor.Product.TH (makeAdaptorAndInstance')
type F field = OE.Field field
data TaggedArticle a b c =
  TaggedArticle { articleFk :: a, tagFk :: b, createdAt :: c }
type TaggedArticleR = TaggedArticle (F OE.SqlInt8) (F OE.SqlInt8) (F OE.SqlDate)
data Tag a b = Tag { tagKey :: a, tagName :: b }
type TagR = Tag (F OE.SqlInt8) (F OE.SqlText)
$(makeAdaptorAndInstance' ''TaggedArticle)
$(makeAdaptorAndInstance' ''Tag)
tagsTable :: OE.Table TagR TagR
tagsTable = error "Fill in the definition of tagsTable"
taggedArticlesTable :: OE.Table TaggedArticleR TaggedArticleR
taggedArticlesTable = error "Fill in the definition of taggedArticlesTable"
-- | Query all tags.
allTagsQ :: OE.Select TagR
allTagsQ = OE.selectTable tagsTable
-- | Query all article-tag relations.
allTaggedArticlesQ :: OE.Select TaggedArticleR
allTaggedArticlesQ = OE.selectTable taggedArticlesTable
-- | Join article-ids and tag names for all articles.
articleTagNamesQ :: OE.Select (F OE.SqlInt8, F OE.SqlText, F OE.SqlDate)
articleTagNamesQ = proc () -> do
  ta <- allTaggedArticlesQ -< ()
  t  <- allTagsQ -< ()
  OE.restrict -< tagFk ta OE..=== tagKey t -- INNER JOIN ON
  returnA -< (articleFk ta, tagName t, createdAt ta)
-- | Aggregate all tag names for all articles
articleTagsQ :: OE.Select (F OE.SqlInt8, F (OE.SqlArray OE.SqlText))
articleTagsQ =
  OE.aggregate ((,) <$> P.lmap (\(i, _, _) -> i) OE.groupBy
                    <*> P.lmap (\(_, t, _) -> t) OE.arrayAgg)
    (OE.limit 10 (OE.orderBy (OE.desc (\(_, _, ca) -> ca)) articleTagNamesQ))

The multi-part identifier could not be bound even though everything is unique

So I'm getting an IntelliSense error and I can't figure out why. I've renamed everything to use aliases and I've read everything I can on multi-part identifiers, and it seems to suggest that the name is not unique? But with an alias it seems to be unique, although "MachineID" is referenced in a number of tables.
Here's my query
SELECT DISTINCT TOP 1000
a.Name00,
a.UserName00,
a.Domain00,
a.TotalPhysicalMemory00,
a.Manufacturer00,
a.Model00,
a.MachineID,
a.SystemType00,
b.MACAddress00,
b.ServiceName00,
c.System_OU_Name0,
d.Name0,
e.Model00
FROM
[dbo].[Computer_System_DATA] AS a,
[dbo].[v_RA_System_SystemOUName] AS c,
[dbo].[v_GS_PROCESSOR] AS d,
[dbo].[Disk_DATA] AS e
INNER JOIN [dbo].[Network_DATA] AS b ON b.MachineID=a.MachineID
WHERE
b.MACAddress00 IS NOT NULL AND b.ServiceName00 LIKE '%express'
The error is showing on line 22 at a.MachineID
What am I missing? Also, the error goes away if I comment out the following:
--c.System_OU_Name0,
--d.Name0,
--e.Model00
--[dbo].[v_RA_System_SystemOUName] AS c,
--[dbo].[v_GS_PROCESSOR] AS d,
--[dbo].[Disk_DATA] AS e
Any help is massively appreciated!
Dmitrij Kultasev was spot on about the issue. Explicit joins happen first. So at the moment, the INNER JOIN is between e and b; a, c and d aren't in scope for that ON clause. Hence the error (there's no a), and that is also why it works when the commenting changes the join order (which now means you're joining a and b).
Fix your query to eliminate the old comma join syntax - it's from over a quarter of a century ago!
SELECT DISTINCT TOP 1000
a.Name00,
a.UserName00,
a.Domain00,
a.TotalPhysicalMemory00,
a.Manufacturer00,
a.Model00,
a.MachineID,
a.SystemType00,
b.MACAddress00,
b.ServiceName00,
c.System_OU_Name0,
d.Name0,
e.Model00
FROM
[dbo].[Computer_System_DATA] AS a
INNER JOIN --?--
[dbo].[v_RA_System_SystemOUName] AS c
ON
--?-- What links a and c together?
INNER JOIN --?--
[dbo].[v_GS_PROCESSOR] AS d
ON
--?-- What links d to the combination of a and c?
INNER JOIN --?--
[dbo].[Disk_DATA] AS e
ON
--?-- What links e to the combination of a, c and d?
INNER JOIN [dbo].[Network_DATA] AS b ON b.MachineID=a.MachineID
WHERE
b.MACAddress00 IS NOT NULL AND b.ServiceName00 LIKE '%express'
Of course, you may want to switch around the order in which you perform the joins if e.g. the link between a and c is actually via d.
The multi-part identifier could not be bound even though everything is unique
The question title doesn't really make sense. An error stating "The multi-part identifier could not be bound" usually means that part of a name you've used somewhere isn't available at all at that location - not anything to do with multiple definitions. That would usually generate an error along the lines of "The correlation name '<x>' is specified multiple times".

Removed left recursion from grammar, but now the grammar allows invalid derivations

I'm trying to remove the left recursion from a grammar, however after following an algorithm to do so I now believe the grammar I've produced allows for the derivation of statements that aren't valid.
A part of the grammar is as follows:
A -> a A b A c A d
A -> e A
A -> f
A -> A + A
A -> A * A
So from my understanding, the bottom two productions are direct left recursion and the way to remove the left recursion is by introducing a new nonterminal A', however I then get this:
A -> a A b A c A d A'
A -> e A A'
A -> f A'
A' -> + A A'
A' -> * A A'
A' -> epsilon
Yet this allows the derivation of a A b A c A d * f
Which would not be a valid derivation with the original grammar. Can someone please explain what I'm doing wrong?
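The transformation described in the question (A -> A α | β becomes A -> β A', A' -> α A' | ε) can be sketched in Python as follows. This is an illustrative sketch only: the function name and the list-of-symbols representation are assumptions, and it handles direct left recursion on a single nonterminal, nothing more.

```python
def remove_direct_left_recursion(head, bodies):
    """Rewrite  A -> A alpha | beta  as  A -> beta A',  A' -> alpha A' | epsilon.

    `bodies` is a list of productions, each a list of symbols.
    The empty list stands for epsilon.
    """
    new_head = head + "'"
    # The alpha parts: what follows the leading recursive head.
    recursive = [b[1:] for b in bodies if b and b[0] == head]
    # The beta parts: productions that do not start with the head.
    others = [b for b in bodies if not b or b[0] != head]
    if not recursive:
        return {head: bodies}
    return {
        head: [beta + [new_head] for beta in others],
        new_head: [alpha + [new_head] for alpha in recursive] + [[]],
    }


productions = [
    "a A b A c A d".split(),
    "e A".split(),
    ["f"],
    "A + A".split(),
    "A * A".split(),
]
result = remove_direct_left_recursion("A", productions)
for lhs, rhss in result.items():
    for rhs in rhss:
        print(lhs, "->", " ".join(rhs) or "epsilon")
```

The output matches the rewritten grammar listed in the question. Note that the original grammar also derives a A b A c A d * f: apply A -> A * A, then expand the left A with the first production and the right A with A -> f.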

Pig matching with an external file

I have a file (In Relation A) with all tweets
today i am not feeling well
i have viral fever!!!
i have a fever
i wish i had viral fever
...
I have another file (In Relation B) with words to be filtered
sick
viral fever
feeling
...
My Code
-- loads all the tweets
A = LOAD 'tweets' AS (tweets:chararray);
-- loads all the words to be filtered
B = LOAD 'filter_list' AS (filter_list:chararray);
Expected Output
(sick,1)
(viral fever,2)
(feeling,1)
...
How do I achieve this in Pig using a join?
EDITED SOLUTION
The basic concept that I supplied earlier will work, but it requires the addition of a UDF to generate NGrams pairs of the tweets. You then union the NGram pairs to the Tokenized tweets, and then perform the wordcount function on that dataset.
I've tested the code below, and it works fine against the data provided. If records in your filter_list have more than 2 words in a string (e.g. "I feel bad"), you'll need to recompile the ngram-udf with the appropriate count (or ideally, just turn it into a variable and set the ngram count on the fly).
You can get the source code for the NGramGenerator UDF here: Github
ngrams.pig
REGISTER ngram-udf.jar;
DEFINE NGGen org.apache.pig.tutorial.NGramGenerator;
--Load the initial data
A = LOAD 'tweets.txt' as (tweet:chararray);
--Create NGram tuple with a size limit of 2 from the tweets
B = FOREACH A GENERATE FLATTEN(NGGen(tweet)) as ngram;
--Tokenize the tweets into single word tuples
C = FOREACH A GENERATE FLATTEN(TOKENIZE((chararray)tweet)) as ngram;
--Union the Ngram and word tuples
D = UNION B,C;
--Group similar tuples together
E = GROUP D BY ngram;
--For each unique ngram, generate the ngram name and a count
F = FOREACH E GENERATE group, COUNT(D);
--Load the wordlist for joining
Z = LOAD 'wordlist.txt' as (word:chararray);
--Perform the innerjoin of the ngrams and the wordlist
Y = JOIN F BY group, Z BY word;
--For each intersecting record, store the ngram and count
X = FOREACH Y GENERATE $0,$1;
DUMP X;
RESULTS/OUTPUT
(feeling,1)
(viral fever,2)
tweets.txt
today i am not feeling well
i have viral fever!!!
i have a fever
i wish i had viral fever
wordlist.txt
sick
viral fever
feeling
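For readers without a Pig cluster at hand, the same idea (count single words and two-word n-grams, then keep only those that appear in the word list) can be sketched in plain Python. This is an approximation rather than a port of the UDF: the tokenizer below splits on non-word characters, which is an assumption about how the n-gram generator treats punctuation such as "fever!!!", and the function names are hypothetical.

```python
import re
from collections import Counter


def ngrams(words, n):
    """Return all runs of n consecutive words, joined by a space."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def count_phrases(tweets, wordlist, max_n=2):
    counts = Counter()
    for tweet in tweets:
        # Assumption: words are maximal runs of word characters.
        words = re.findall(r"\w+", tweet.lower())
        for n in range(1, max_n + 1):
            counts.update(ngrams(words, n))
    # Inner join: keep only the n-grams that also appear in the word list.
    return {w: counts[w] for w in wordlist if w in counts}


tweets = [
    "today i am not feeling well",
    "i have viral fever!!!",
    "i have a fever",
    "i wish i had viral fever",
]
wordlist = ["sick", "viral fever", "feeling"]
phrase_counts = count_phrases(tweets, wordlist)
print(phrase_counts)
```

As in the Pig output above, "sick" is dropped by the join because it never occurs in the tweets.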
Original Solution
I don't have access to my Hadoop system at the moment to test this answer, so the code may be off slightly. The logic should be sound, however. An easy solution should be:
1. Perform the classic wordcount program against the tweets dataset
2. Perform an inner join of the wordlist and tweets
3. Generate the data again to get rid of the duplicate word in the tuple
4. Dump/Store the join results
Example code:
A = LOAD 'tweets.txt';
B = FOREACH A GENERATE FLATTEN(TOKENIZE((chararray)$0)) as word;
C = GROUP B BY word;
D = FOREACH C GENERATE group, COUNT(B);
Z = LOAD 'wordlist.txt' as (word:chararray);
Y = JOIN D BY group, Z BY word;
X = FOREACH Y GENERATE $0, $1;
DUMP X;
As far as I know, this is not possible using a join.
You could do a CROSS followed by a FILTER with a regex match.
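The CROSS-then-FILTER idea can be illustrated with a short Python sketch. It shows only the logic, not Pig code: every tweet is paired with every phrase, and a pair is kept when the phrase matches as a whole word sequence. The function name is made up, and the count is the number of tweets containing the phrase, not the number of occurrences.

```python
import re
from itertools import product


def cross_filter_count(tweets, phrases):
    counts = {}
    # CROSS: every (tweet, phrase) pair.
    for tweet, phrase in product(tweets, phrases):
        # FILTER: keep the pair when the phrase occurs on word boundaries.
        if re.search(r"\b" + re.escape(phrase) + r"\b", tweet):
            counts[phrase] = counts.get(phrase, 0) + 1
    return counts


all_tweets = [
    "today i am not feeling well",
    "i have viral fever!!!",
    "i have a fever",
    "i wish i had viral fever",
]
filter_list = ["sick", "viral fever", "feeling"]
matches = cross_filter_count(all_tweets, filter_list)
print(matches)
```

The cross product grows with the number of tweets times the number of phrases, so this approach fits best when the filter list is small.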

Change the order of tuple fields

As an example, let's say I load two different files into a Pig script
A = LOAD 'file1' USING PigStorage('\t') AS (
day:chararray,
month:chararray,
year:chararray,
message:chararray);
B = LOAD 'file2' USING PigStorage('\t') AS (
month:chararray,
day:chararray,
year:chararray,
message:chararray);
Now, notice the order of the fields is different, so if I combine them into one file C = UNION A, B; I get...
(2,OCT,2013,INFO INVALID USERNAME)
(OCT,3,2013,WARN STACK OVERFLOW)
If for no other reason than to make the data easier to read, I'd like to reorder the fields, so that both of them follow a common format and have the same positional notation for each field.
(2,OCT,2013,INFO INVALID USERNAME)
(3,OCT,2013,WARN STACK OVERFLOW)
This also crops up in a few other places with messages, levels, hosts, etc. It's not just date fields, I'd like to make everything "prettier" all around.
In some weird pseudo-code, I'd be looking for something like:
D = FOREACH B
REORDER (month,day,year) TO (day,month,year);
I haven't been able to find an example of anyone trying to do this and don't see a function that would do it. So maybe it's not possible and I'm alone here, but if anyone has any ideas I'd appreciate some hints.
In general, this is not necessary in Pig because you can just refer to fields by name and not worry about their position in the record. If your goal is to do a UNION of the two relations, you can achieve this by using the ONSCHEMA keyword:
C = UNION ONSCHEMA A, B;
That said, if you do really need to reorder a relation, a simple FOREACH...GENERATE is all you need:
D = FOREACH B GENERATE day, month, year, message;
Note that in your example, you are not actually working with tuples, you are working with entire records. If you did have a tuple, though, you can use the TOTUPLE built-in UDF to get where you need to go:
DESCRIBE E;
E: {t: (month: chararray,day: chararray,year: chararray,message: chararray)}
F = FOREACH E GENERATE TOTUPLE(t.day, t.month, t.year, t.message) AS t;
DESCRIBE F;
F: {t: (day: chararray,month: chararray,year: chararray,message: chararray)}
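The name-based alignment that the answer relies on can be pictured with a small Python sketch. It is only an analogy for aligning fields by name instead of by position: the function name is hypothetical, and the sketch assumes every relation carries all of the listed fields.

```python
def union_by_name(relations, field_order):
    """Union rows from several relations, aligning fields by name.

    Each relation is a (field_names, rows) pair; rows are tuples.
    """
    out = []
    for names, rows in relations:
        # Map each field name to its position in this relation.
        index = {name: i for i, name in enumerate(names)}
        for row in rows:
            out.append(tuple(row[index[f]] for f in field_order))
    return out


rel_a = (("day", "month", "year", "message"),
         [("2", "OCT", "2013", "INFO INVALID USERNAME")])
rel_b = (("month", "day", "year", "message"),
         [("OCT", "3", "2013", "WARN STACK OVERFLOW")])

combined = union_by_name([rel_a, rel_b], ("day", "month", "year", "message"))
for row in combined:
    print(row)
```

Both rows come out in day, month, year, message order, which is the layout the question asks for.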