I want some examples of how to take the intersection of two finite automata (with diagrams).
I have learned how to take the union of two finite automata.
I have searched the internet but haven't found anything.
The idea is pretty straightforward, although I can see where the confusion comes in. I will give a text/symbolic description of the process for making the intersection (union, difference) machines via the Cartesian Product Machine construction (same thing as you are talking about).
A DFA is a 5-tuple (E, Q, q0, A, f) where
E is the input alphabet, a non-empty finite set of symbols
Q is the set of states, non-empty and finite
q0 is the start state, an element of Q
A is the set of accepting or final states, a subset of Q
f is the transition function, taking pairs from Q x E to Q
Say we have two machines M' = (E', Q', q0', A', f') and M'' = (E'', Q'', q0'', A'', f''). To make the discussion easier, we assume E' = E''. We will now construct M''' so that L(M''') = L(M') intersect (or union or difference) L(M'').
Take E''' = E'' = E'
Take Q''' = Q' x Q''
Take q0''' = (q0', q0'')
Take A''' = {(x, y) : x in A' and y in A''} (for union, x in A' or y in A''; for difference, x in A' and y not in A'').
Take f'''((x, y), e) = (f'(x, e), f''(y, e)).
There you go! Let's now consider two machines: one which accepts a^(2n), and one which accepts a^(3n) (the intersection should be a machine accepting a^(6n)... right?).
For M', we have...
E' = {a}
Q' = {s0, s1}
q0' = s0
A' = {s0}
f'(s0, a) = s1, f'(s1, a) = s0
For M'', we have...
E'' = {a}
Q'' = {t0, t1, t2}
q0'' = t0
A'' = {t0}
f''(t0, a) = t1, f''(t1, a) = t2, f''(t2, a) = t0
For M''', we get...
E''' = {a}
Q''' = {(s0, t0), (s0, t1), (s0, t2), (s1, t0), (s1, t1), (s1, t2)}
q0''' = (s0, t0)
A''' = {(s0, t0)} for intersection, {(s0, t0), (s0, t1), (s0, t2), (s1, t0)} for union, {(s0, t1), (s0, t2)} for difference.
f'''((s0, t0), a) = (s1, t1), f'''((s1, t1), a) = (s0, t2), f'''((s0, t2), a) = (s1, t0), f'''((s1, t0), a) = (s0, t1), f'''((s0, t1), a) = (s1, t2), f'''((s1, t2), a) = (s0, t0).
And there you go! Please let me know if this needs clarification.
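For anyone who wants to check the construction mechanically, here is a minimal Python sketch of the product machine. The dict layout and the names product_dfa, accepts, m_even and m_three are my own choices for illustration, not part of any standard library; the two machines are the a^(2n) and a^(3n) examples from above.

```python
from itertools import product

def product_dfa(m1, m2, mode="intersection"):
    """Build the Cartesian product machine of two DFAs over the same alphabet.

    Each DFA is a dict with keys alphabet, states, start, accept, delta,
    where delta maps (state, symbol) -> state.
    """
    assert m1["alphabet"] == m2["alphabet"], "the construction assumes E' = E''"
    states = set(product(m1["states"], m2["states"]))
    # Only the accepting-state rule differs between the three operations.
    accept_rule = {
        "intersection": lambda x, y: x in m1["accept"] and y in m2["accept"],
        "union": lambda x, y: x in m1["accept"] or y in m2["accept"],
        "difference": lambda x, y: x in m1["accept"] and y not in m2["accept"],
    }[mode]
    return {
        "alphabet": m1["alphabet"],
        "states": states,
        "start": (m1["start"], m2["start"]),
        "accept": {(x, y) for (x, y) in states if accept_rule(x, y)},
        # Both component machines step in lockstep on the same symbol.
        "delta": {
            ((x, y), e): (m1["delta"][(x, e)], m2["delta"][(y, e)])
            for (x, y) in states
            for e in m1["alphabet"]
        },
    }

def accepts(m, word):
    """Run the DFA on word and report whether it ends in an accepting state."""
    state = m["start"]
    for symbol in word:
        state = m["delta"][(state, symbol)]
    return state in m["accept"]

# M' accepts a^(2n), M'' accepts a^(3n), as in the worked example.
m_even = {
    "alphabet": {"a"},
    "states": {"s0", "s1"},
    "start": "s0",
    "accept": {"s0"},
    "delta": {("s0", "a"): "s1", ("s1", "a"): "s0"},
}
m_three = {
    "alphabet": {"a"},
    "states": {"t0", "t1", "t2"},
    "start": "t0",
    "accept": {"t0"},
    "delta": {("t0", "a"): "t1", ("t1", "a"): "t2", ("t2", "a"): "t0"},
}

both = product_dfa(m_even, m_three, "intersection")
accepted_lengths = [n for n in range(13) if accepts(both, "a" * n)]
print(accepted_lengths)  # [0, 6, 12]
```

Passing "union" or "difference" as the mode changes only the accepting set; the states and transitions stay the same.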
Source: How to use the intersection construction to form a DFA?
I'm trying to align historical source tables to a daily snapshot for a new Dimension table.
I finally got my query to do what I want but I'm struggling with performance tuning between two equivalent versions. I'm on a Dev environment with no access to Prod so my tests are not realistic (results are immediate). The prod environment will have many millions of lines and dozens of columns with multiple historical versions of each field, so performance is important. Performance will be reviewed by a DBA of course, but this will take some time and I'd like to understand what happens under the hood.
For the code below, notice A, B, C & X share a key, X has a subset of the key only (left join needed) and Y has the descriptive information for X (and doesn't have the main key).
My query runs something like this:
Select
A.field,
B.field,
C.field,
nvl(X.field,valueIfNull), --nvl is the oracle equivalent of coalesce
nvl(Y.field,valueIfNull)
from tableA A
inner join tables BCD B, C, D
on A.key = B.key
and A.date<= B.date < A.date
... do this 3 times for B, C & D...
Now here I have two options that I see (and that work)
left join tableX X
on A.key = X.key and A.date = X.date -- this will yield null values -> left join -> ok
left join tableY Y
on Y.key = nvl(X.key,'~') and Y.date = X.date
where Y.status in ('statusFlag1');
Alternatively, this works too:
left join tableX X
on A.key = X.key and A.date = X.date -- this will yield null values -> left join -> ok
left join tableY Y
on Y.key = X.key and Y.date = X.date
where (Y.status in ('statusFlag1') or Y.status is null);
A last alternative would be to use an inner query to compute the Y & X columns and join that one. I haven't added it yet because it would be less readable imo, but I could if it will mean performance improvement.
So basically is it faster to nvl and join on those "null" values, or to add the nulls in the where clause? Or use an inner query? We will have many millions of lines, mi
left join tableX X
on A.key = X.key and A.date = X.date -- this will yield null values -> left join -> ok
left join tableY Y
on Y.key = nvl(X.key,'~') and Y.date = X.date
where Y.status in ('statusFlag1');
This is not a LEFT JOIN: the WHERE clause requires Y.status to have a value, so Y.status cannot be NULL and the join is effectively an INNER JOIN. Assuming Y.date is NOT NULL, X.date cannot be NULL either, so X must be a valid row too.
If you want a LEFT OUTER JOIN then put the filter condition in the ON clause and not in the WHERE clause:
left join tableX X
on ( A.key = X.key and A.date = X.date )
left join tableY Y
on ( Y.key = X.key and Y.date = X.date AND Y.status = 'statusFlag1' );
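One quick way to see the difference between filtering in the WHERE clause and filtering in the ON clause is to run both variants on a toy dataset. The sketch below uses SQLite from Python purely as a stand-in for Oracle; the columns k and d stand in for key and date, and the two sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (k TEXT, d TEXT);
    CREATE TABLE tableX (k TEXT, d TEXT);
    CREATE TABLE tableY (k TEXT, d TEXT, status TEXT);
    INSERT INTO tableA VALUES ('k1', 'd1'), ('k2', 'd1');
    -- only k1 has a matching row in X and Y; k2 has no match at all
    INSERT INTO tableX VALUES ('k1', 'd1');
    INSERT INTO tableY VALUES ('k1', 'd1', 'statusFlag1');
""")

# Filter in WHERE: rows of A without a Y match are discarded after the join.
filter_in_where = conn.execute("""
    SELECT A.k, Y.status
    FROM tableA A
    LEFT JOIN tableX X ON A.k = X.k AND A.d = X.d
    LEFT JOIN tableY Y ON Y.k = X.k AND Y.d = X.d
    WHERE Y.status IN ('statusFlag1')
    ORDER BY A.k
""").fetchall()

# Filter in ON: every row of A survives, with NULL where nothing matched.
filter_in_on = conn.execute("""
    SELECT A.k, Y.status
    FROM tableA A
    LEFT JOIN tableX X ON A.k = X.k AND A.d = X.d
    LEFT JOIN tableY Y ON Y.k = X.k AND Y.d = X.d AND Y.status = 'statusFlag1'
    ORDER BY A.k
""").fetchall()

print(filter_in_where)  # [('k1', 'statusFlag1')]
print(filter_in_on)     # [('k1', 'statusFlag1'), ('k2', None)]
```

The k2 row disappears from the first result even though both joins are written as LEFT JOIN, which is the behaviour described above.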
Alternatively, if you only want the combination of X and Y tables where Y.status has the desired flag then use a sub-query:
LEFT OUTER JOIN (
SELECT x.*,
y.*
FROM tableX X
INNER JOIN tableY Y
ON (Y.key = X.key and Y.date = X.date)
WHERE Y.status = 'statusFlag1'
) XY
on ( A.key = XY.key and A.date = XY.date );
Note: This does not address performance, as that is something you will have to profile with your tables, partitions, indexes, data, statistics and hardware, and is not something that we can easily advise on.
Both versions need the "or Y.status is null" condition, because the left join can return no row for Y in either case; without it the query result changes.
With that added, it is better to do the join directly on the keys without using nvl() as that is a heavy operation for the database.
Real life performance should still be measured on production as hardware, indexes, table size (also relatively to each other) and database parameters have a lot of influence.
So I currently have a pretty big table that I need to query frequently. This table has a lot of columns and rows, and I need to filter the rows based on 2 lists that are placed in 2 virtual tables. I want the results that match either one of them.
I'm doing a procedure to get the results but our development environment doesn't have enough sample size to test the procedure performance. So now I have 2 options.
Use LEFT JOINS to do something like this
SELECT GT.* FROM GiantTable GT
LEFT JOIN List1 L1 ON GT.X = L1.X
LEFT JOIN List2 L2 ON GT.Y = L2.Y
WHERE L1.X IS NOT NULL OR L2.Y IS NOT NULL
Use 2 JOIN queries and UNION the results
SELECT GT.* FROM GiantTable GT JOIN List1 L1 ON GT.X = L1.X
UNION
SELECT GT.* FROM GiantTable GT JOIN List2 L2 ON GT.Y = L2.Y
I'm pretty sure that the 2nd should yield much better performance, but I would like to know if I'm mistaken.
You can probably do better with
select *
from gianttable
where x in (select x from L1) or y in (select y from L2)
Edit:
OP reports that this version is slower than his original attempts. This may be true, especially if x and y are indexed in gianttable (they are primary keys in their respective lists, so they are indexed in those smaller tables). What gets in the way is the OR operator in the where condition - that means neither condition can be used as an access predicate.
There's one more thing to try... where indexes should actually help. The query below is equivalent to the OP's first attempt, and also to my first attempt (above) and dasblinkenlight's solution. (That is so because x and y are PK in their respective lists, so we don't need to handle null there.)
select * from gianttable where x in (select x from L1)
union all
select * from gianttable where y in (select y from L2) and x not in (select x from L1)
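To check that the UNION ALL rewrite returns the same rows as the OR version, here is a small sketch using SQLite from Python. The id column and the sample rows are invented for illustration, and both x and y are declared NOT NULL in gianttable: a NULL x would make the NOT IN test evaluate to unknown and drop that row from the second branch, so the equivalence relies on that assumption. This says nothing about speed, only about results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gianttable (id INTEGER PRIMARY KEY, x INTEGER NOT NULL, y INTEGER NOT NULL);
    CREATE TABLE L1 (x INTEGER PRIMARY KEY);
    CREATE TABLE L2 (y INTEGER PRIMARY KEY);
    -- row 4 matches both lists, row 3 matches neither
    INSERT INTO gianttable VALUES (1, 1, 10), (2, 2, 20), (3, 3, 30), (4, 1, 20);
    INSERT INTO L1 VALUES (1);
    INSERT INTO L2 VALUES (20);
""")

def ids(sql):
    """Return the selected ids as a sorted list, duplicates included."""
    return sorted(row[0] for row in conn.execute(sql))

or_version = ids("""
    SELECT id FROM gianttable
    WHERE x IN (SELECT x FROM L1) OR y IN (SELECT y FROM L2)
""")

# The second branch excludes rows already returned by the first one.
union_all_version = ids("""
    SELECT id FROM gianttable WHERE x IN (SELECT x FROM L1)
    UNION ALL
    SELECT id FROM gianttable
    WHERE y IN (SELECT y FROM L2) AND x NOT IN (SELECT x FROM L1)
""")

# Without the NOT IN guard, a row matching both lists comes back twice.
naive_union_all = ids("""
    SELECT id FROM gianttable WHERE x IN (SELECT x FROM L1)
    UNION ALL
    SELECT id FROM gianttable WHERE y IN (SELECT y FROM L2)
""")

print(or_version)         # [1, 2, 4]
print(union_all_version)  # [1, 2, 4]
print(naive_union_all)    # [1, 2, 4, 4]
```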
UNION is problematic, because it requires filtering out duplicates from the results of two subqueries. You should be able to obtain a decent performance with EXISTS operator and a pair of correlated subqueries:
SELECT *
FROM GiantTable GT
WHERE (EXISTS (SELECT * FROM List1 L1 WHERE L1.X = GT.X))
OR (EXISTS (SELECT * FROM List2 L2 WHERE L2.Y = GT.Y))
Locally, this query runs fine (<1sec) but on the client's side it takes around 3mins (with a different execution plan).
I've rebuilt the indexes, stats, etc. This decreased the time to 2:15. After investigating I found out that the issue revolves around the or statement beginning on the line
AND (a.p = '123456789' or ...
If I restructure the query to use a union instead, the query takes <1sec. So what is it about this 'or' that causes the client's time to jump 2+ minutes?
select *
from foo_main A with(nolock)
where a.i = a.i
and (IsNull(A.v, '0') = '1')
and (IsNull(A.d, '') = 'CODE_B')
and A.c in ('CODE_B')
AND (a.p = '123456789' or
a.p IN (SELECT DISTINCT f.ui
FROM foo_faculty f
LEFT JOIN foo_unit ff ON (f.ui = ff.ui)
LEFT JOIN unit u ON (ff.ui = u.ui AND
f.c = u.c),
foo_Personnel p, foo_System s, unit u1
WHERE s.P = p.P
AND s.s = 'G'
AND s.R = 'R4'
AND p.ui = '1q2w3e4r5t6y'
AND p.ui = u1.ui
AND p.i = u1.i
AND u.i = u1.i
AND u.dI LIKE u1.DI + '%' COLLATE
SQL_Latin1_General_CP1_CS_AS))
order by lname, fname
Thank you for any and all help!
Sorry I should have mentioned before, the select * is not part of the original code. I changed it to minimize the amount of code so everyone could find the or statement more easily.
Use JOINS instead of IN () clause.
select
whatever
from
bank_accs b1,
bank_accs b2,
table3 t3
where
t3.bank_acc_id = t1.bank_acc_id and
b2.bank_acc_number = b1.bank_acc_number and
b2.currency_code(+) = t3.buy_currency and
trunc(sysdate) between nvl(b2.start_date, trunc(sysdate)) and nvl(b2.end_date, trunc(sysdate));
My problem is with the date (actuality) check on b2. Now, I need to return a row for each t3xb1 (t3 = ~10 tables joined, of course), even if there are ONLY INVALID records (date-wise) in b2. How do I outer-join this bit properly?
Can't use ANSI joins, must do in a single flat query.
Thanks.
If I understand you, just add the outer sign(+) to all columns of b2:
select
whatever
from
bank_accs b1,
bank_accs b2,
table3 t3
where
t3.bank_acc_id = t1.bank_acc_id and
b2.bank_acc_number = b1.bank_acc_number and
b2.currency_code(+) = t3.buy_currency and
trunc(sysdate) between nvl(b2.start_date(+), trunc(sysdate)) and nvl(b2.end_date(+), trunc(sysdate));
It's possible to write old-style outer join with inequalities but it is error-prone. I suggest you use an inline view and the outer join will be clear and explicit:
SELECT whatever
FROM bank_accs b1,
table3 t3,
(SELECT b2.*
FROM bank_accs b2
WHERE trunc(sysdate) BETWEEN nvl(b2.start_date, trunc(sysdate))
AND nvl(b2.end_date, trunc(sysdate))
) b2
WHERE t3.bank_acc_id = t1.bank_acc_id
AND b2.bank_acc_number = b1.bank_acc_number
AND b2.currency_code(+) = t3.buy_currency;
I have got 3 tables with those columns below:
Topics:
[TopicID] [TopicName]
Messages:
[MessageID] [MessageText]
MessageTopicRelations
[EntryID] [MessageID] [TopicID]
Messages can be about more than one topic. The question is: given a couple of topics, I need to get the messages that are about ALL of these topics, no fewer, though they can be about other topics too. A message that is about only SOME of the given topics should not be included. I hope I explained my request well; otherwise I can provide sample data. Thanks.
The following queries use x, y, and z to stand in for topic ids, since none were provided as examples.
Using JOINs:
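SELECT m.*
FROM MESSAGES m
-- one MESSAGETOPICRELATIONS alias per topic: a single row cannot match x, y and z at once
JOIN MESSAGETOPICRELATIONS mtrx ON mtrx.messageid = m.messageid
JOIN TOPICS tx ON tx.topicid = mtrx.topicid
AND tx.topicid = x
JOIN MESSAGETOPICRELATIONS mtry ON mtry.messageid = m.messageid
JOIN TOPICS ty ON ty.topicid = mtry.topicid
AND ty.topicid = y
JOIN MESSAGETOPICRELATIONS mtrz ON mtrz.messageid = m.messageid
JOIN TOPICS tz ON tz.topicid = mtrz.topicid
AND tz.topicid = z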
Using GROUP BY/HAVING COUNT(*):
SELECT m.*
FROM MESSAGES m
JOIN MESSAGETOPICRELATIONS mtr ON mtr.messageid = m.messageid
JOIN TOPICS t ON t.topicid = mtr.topicid
WHERE t.topicid IN (x, y, z)
GROUP BY m.messageid, m.messagetext
HAVING COUNT(*) = 3
Of the two, the JOIN approach is safer.
The GROUP BY/HAVING relies on the MESSAGETOPICRELATIONS.TOPICID being either part of the primary key, or having a unique key constraint to ensure there aren't duplicates. Otherwise, you could have 2+ instances of the same topic associated to a message - which would be a false positive. Using HAVING COUNT(DISTINCT ... would clear up any false positives, but support depends on the database - MySQL supports it at 5.1+, but not on 4.1. Oracle might, have to wait till Monday to test on SQL Server...
I looked into Bill's comment about not needing the join to the TOPICS table:
SELECT m.*
FROM MESSAGES m
JOIN MESSAGETOPICRELATIONS mtr ON mtr.messageid = m.messageid
AND mtr.topicid IN (x, y, z)
...will return false positives - rows that match at least one of the values defined in the IN clause. And:
SELECT m.*
FROM MESSAGES m
JOIN MESSAGETOPICRELATIONS mtr ON mtr.messageid = m.messageid
AND mtr.topicid = x
AND mtr.topicid = y
AND mtr.topicid = z
...won't return anything at all, because the topicid can never be all of the values at once.
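The three behaviours (IN matching too much, the chained ANDs matching nothing, and one relation-table alias per topic matching exactly the right messages) can be checked on a toy dataset. The sketch below uses SQLite from Python; the topic ids 10, 20 and 30 stand in for x, y and z, and the sample messages are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Messages (MessageID INTEGER PRIMARY KEY, MessageText TEXT);
    CREATE TABLE MessageTopicRelations (
        EntryID INTEGER PRIMARY KEY, MessageID INTEGER, TopicID INTEGER);
    INSERT INTO Messages VALUES
        (1, 'all three'), (2, 'only one'), (3, 'three plus one more');
    INSERT INTO MessageTopicRelations (MessageID, TopicID) VALUES
        (1, 10), (1, 20), (1, 30),
        (2, 10),
        (3, 10), (3, 20), (3, 30), (3, 40);
""")

def message_ids(sql):
    """Return the distinct message ids selected by the query, sorted."""
    return sorted({row[0] for row in conn.execute(sql)})

# IN keeps any message that has at least one of the topics.
in_list = message_ids("""
    SELECT m.MessageID FROM Messages m
    JOIN MessageTopicRelations mtr ON mtr.MessageID = m.MessageID
     AND mtr.TopicID IN (10, 20, 30)
""")

# A single relation row can never carry three different topic ids.
and_chain = message_ids("""
    SELECT m.MessageID FROM Messages m
    JOIN MessageTopicRelations mtr ON mtr.MessageID = m.MessageID
     AND mtr.TopicID = 10 AND mtr.TopicID = 20 AND mtr.TopicID = 30
""")

# One alias of the relation table per required topic.
self_join = message_ids("""
    SELECT m.MessageID FROM Messages m
    JOIN MessageTopicRelations rx ON rx.MessageID = m.MessageID AND rx.TopicID = 10
    JOIN MessageTopicRelations ry ON ry.MessageID = m.MessageID AND ry.TopicID = 20
    JOIN MessageTopicRelations rz ON rz.MessageID = m.MessageID AND rz.TopicID = 30
""")

print(in_list)    # [1, 2, 3]
print(and_chain)  # []
print(self_join)  # [1, 3]
```

Message 3 is still returned by the self-join because extra topics are allowed; message 2 is dropped because it has only one of the three.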
Here's a profoundly inelegant solution
SELECT
m.MessageID
,m.MessageText
FROM
Messages m
WHERE
m.MessageID IN (
SELECT
mt.MessageID
FROM
MessageTopicRelations mt
WHERE
TopicID IN (1,4,5) -- list of topic IDs
GROUP BY
mt.MessageID
HAVING
count(*) = 3 -- number of topics
)
Edit: thanks to #Paul Creasey and #OMG Ponies for finding the flaws in my approach.
The correct way to do this is with a self-join for each topic, as shown in the leading answer.
Another profoundly inelegant entry:
select m.MessageText
, t.TopicName
from Messages m
inner join MessageTopicRelations mtr
on mtr.MessageID = m.MessageID
inner join Topics t
on t.TopicID = mtr.TopicID
and
t.TopicName = 'topic1'
UNION
select m.MessageText
, t.TopicName
from Messages m
inner join MessageTopicRelations mtr
on mtr.MessageID = m.MessageID
inner join Topics t
on t.TopicID = mtr.TopicID
and
t.TopicName = 'topic2'
...
Re: the answer by OMG Ponies, you don't need to join to the TOPICS table. And the HAVING COUNT(DISTINCT) clause works fine in MySQL 5.1. I just tested it.
This is what I mean:
Using GROUP BY/HAVING COUNT(DISTINCT):
SELECT m.*
FROM MESSAGES m
JOIN MESSAGETOPICRELATIONS mtr ON mtr.messageid = m.messageid
WHERE mtr.topicid IN (x, y, z)
GROUP BY m.messageid
HAVING COUNT(DISTINCT mtr.topicid) = 3
The reason that I suggest COUNT(DISTINCT) is that if the columns (messageid, topicid) don't have a unique constraint, you could get duplicates, which would let COUNT(*) reach 3 in the group even with fewer than three distinct values.
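The false positive is easy to reproduce. In the sketch below (SQLite from Python, with invented sample rows and topic ids 10, 20 and 30 standing in for x, y and z), message 2 is linked to topic 10 twice and to topic 20 once, so it has three matching rows but only two distinct topics.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Messages (MessageID INTEGER PRIMARY KEY, MessageText TEXT);
    -- no unique constraint on (MessageID, TopicID), so duplicates are possible
    CREATE TABLE MessageTopicRelations (
        EntryID INTEGER PRIMARY KEY, MessageID INTEGER, TopicID INTEGER);
    INSERT INTO Messages VALUES (1, 'all three topics'), (2, 'duplicate link');
    INSERT INTO MessageTopicRelations (MessageID, TopicID) VALUES
        (1, 10), (1, 20), (1, 30),
        (2, 10), (2, 10), (2, 20);
""")

def message_ids(having_clause):
    """Run the GROUP BY query with the given HAVING clause and return sorted ids."""
    sql = f"""
        SELECT m.MessageID
        FROM Messages m
        JOIN MessageTopicRelations mtr ON mtr.MessageID = m.MessageID
        WHERE mtr.TopicID IN (10, 20, 30)
        GROUP BY m.MessageID
        HAVING {having_clause}
    """
    return sorted(row[0] for row in conn.execute(sql))

count_star = message_ids("COUNT(*) = 3")
count_distinct = message_ids("COUNT(DISTINCT mtr.TopicID) = 3")

print(count_star)      # [1, 2]  message 2 is a false positive
print(count_distinct)  # [1]
```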