For each of the following languages over Τ={a, b, c}, I want to construct the corresponding regular expression and regular grammar:
All strings containing exactly three a’s.
All strings containing at most three b’s.
How can I do this?
You may always use unions, concatenations and Kleene stars in addition to the given symbols (unless the task explicitly forbids it). So if you don't know how those work, read up on those first. Afterwards, here's a hint to the first task: take any string that contains three or more b's, say, acbaacbbaacbacb. Each character is either one of the first three b's or not: xxbxxxbbxxxxxxx. So the structure of such a string is a sequence of any characters (or maybe none if it starts with a b), and then a b, then more other characters (maybe), then another b, more characters (maybe), the third b, and finally more characters (maybe). How do you express "any character", and how do you express the alternating sequence of b's and "any character, zero or more times"?
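If it helps to experiment, here is a minimal Python sketch of that structure, checked by brute force against all short strings. It assumes Python's `re` syntax, where `[abc]` is shorthand for the union (a|b|c) and `b?` for (b|ε); the constant names and the `agrees` helper are made up for illustration, and the two patterns for the tasks in the question are only one possible way to write them.

```python
import re
from itertools import product

ANY = "[abc]"    # "any character": the union a|b|c
NOT_A = "[bc]"   # any character except a
NOT_B = "[ac]"   # any character except b

# The structure from the hint: three b's with arbitrary characters around them.
AT_LEAST_THREE_B = re.compile(f"{ANY}*b{ANY}*b{ANY}*b{ANY}*")
# Same alternating idea, but the filler may not contain a, so no fourth a can sneak in.
EXACTLY_THREE_A = re.compile(f"{NOT_A}*a{NOT_A}*a{NOT_A}*a{NOT_A}*")
# Three optional b's separated by b-free filler.
AT_MOST_THREE_B = re.compile(f"{NOT_B}*b?{NOT_B}*b?{NOT_B}*b?{NOT_B}*")


def all_strings(max_len):
    """Yield every string over {a, b, c} of length 0..max_len."""
    for n in range(max_len + 1):
        for chars in product("abc", repeat=n):
            yield "".join(chars)


def agrees(pattern, predicate, max_len=6):
    """True if the regex and the plain-Python predicate agree on all short strings."""
    return all(
        bool(pattern.fullmatch(s)) == predicate(s) for s in all_strings(max_len)
    )


print(agrees(AT_LEAST_THREE_B, lambda s: s.count("b") >= 3))
print(agrees(EXACTLY_THREE_A, lambda s: s.count("a") == 3))
print(agrees(AT_MOST_THREE_B, lambda s: s.count("b") <= 3))
```

A brute-force check like this is not a proof, but it catches most mistakes in a hand-written expression quickly.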
I have been searching for good use cases for, and differences between, LIKE and =, and ran into this problem regarding LIKE.
Since LIKE "_r%" means those with 'r' as their second character, doesn't it hold to assume that "r%_" means those with 'r' as their first character, essentially making it functionally the same as "r%"?
I am asking this because our lecture slides say otherwise, and I am not sure whether I am wrong or not. I have also run this SQL test program (https://www.w3schools.com/sql/trysql.asp?filename=trysql_select_like_underscore) to see it firsthand, and this also seems to prove my point.
Have a good day.
No, r%_ and r% do not mean the same thing. The first version, r%_, will match any string starting with r, followed by zero or more of any character, followed by any single character. This pattern will match ra and ran, but it will not match a single r. The pattern r%, on the other hand, will match r, since % allows for zero characters following the leading r.
The difference is that a% will match anything that starts with a, including a by itself (% matches zero, one, or multiple characters), while a%_ will match anything that starts with a, but must be followed by at least one other character (_ matches exactly one character).
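One quick way to see the difference for yourself is to let a database evaluate the patterns. This is a small sketch using Python's built-in `sqlite3` module with an in-memory database; the `like` helper is a made-up name, and it assumes SQLite's LIKE behaves like your database's for `%` and `_` (it does for these simple patterns, though case sensitivity can differ between systems).

```python
import sqlite3


def like(value, pattern):
    """Evaluate `value LIKE pattern` in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        (result,) = conn.execute("SELECT ? LIKE ?", (value, pattern)).fetchone()
        return bool(result)
    finally:
        conn.close()


for value in ["r", "ra", "ran"]:
    # Columns: value, matches 'r%', matches 'r%_'
    print(value, like(value, "r%"), like(value, "r%_"))
# r True False
# ra True True
# ran True True
```

The only row where the two patterns disagree is the single-character string, which is exactly the case the trailing `_` rules out.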
I'm quite new to SQL Databases, but I'm trying to add a Conditional Split in my Data Flow between my Flat File Source and OLE DB Database to exclude records containing some special characters such as ø and ¿ and ¡ on the [title] column. Those are causing errors when creating a table and therefore I want those records to be split from my table. How can I create a conditional split for this?
As a bonus: Is there a way to only filter in a conditional split the rows that contain numbers from 0-9 and letters from a-zA-Z so that all rows with "special" symbols are filtered out automatically?
A conditional split works by determining whether a condition is true or false. So, if you can write a rule that evaluates to true or false, and you can write multiple rules to address assorted business needs, then you can properly shunt rows into different pathways.
How do I do that?
I always advocate that people add new columns to their data flows to handle this stuff. It's the only way you're going to be able to debug when a condition comes up that you think should have been handled but wasn't.
Whether you create a column called IsTitleOnlyAlphaNumeric or IsTitleInternational is really up to you. The general programming rule is to go for the common/probable case. Since ASCII code points top out at 127 (255 for extended ASCII), I'd advocate the former. Otherwise, you're going to play whack-a-mole as the next file has umlauts or a thorn in it.
Typically, we would add a new column through a Derived Column Transformation which means you're working with the SSIS expression language. However, in this case the expression does not have the ability to gracefully* identify whether the string is good or not. Instead, you'll want to use the .NET library for this heavy lifting. That's the Script Component and you'll have it operate in the Transformation mode (default).
Add a new column of type boolean, IsTitleOnlyAlphaNumeric, and crib the regular expression from "check alphanumeric characters in string in c#".
The relevant bit of the OnRowProcessed (name approximate) would look like
Row.IsTitleOnlyAlphaNumeric = isAlphaNumeric(Row.Title);
As rows flow through, that will be evaluated for each one and you'll see whether it meets the criteria or not. Depending on your data, you might need a check for NULL before you call that method.
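To make the logic of that check concrete, here is a rough sketch of what such a helper does, written in Python rather than the C# you would actually put in the Script Component. The function name `is_alphanumeric` mirrors the call above, but the body is my assumption about the linked regular expression (only a-z, A-Z and 0-9 allowed), and treating NULL as "not alphanumeric" is just one possible choice.

```python
import re
from typing import Optional

# Assumed rule: zero or more ASCII letters or digits, and nothing else.
_ALPHANUMERIC = re.compile(r"[A-Za-z0-9]*")


def is_alphanumeric(title: Optional[str]) -> bool:
    """Return True only if every character of title is an ASCII letter or digit."""
    if title is None:
        # Assumption: a NULL title counts as a failed check.
        return False
    return _ALPHANUMERIC.fullmatch(title) is not None


print(is_alphanumeric("Title123"))
print(is_alphanumeric("Smørrebrød"))
print(is_alphanumeric("¿Que?"))
print(is_alphanumeric(None))
```

Note that this strict rule also rejects spaces and punctuation; if real titles contain those, the character class would need to be widened accordingly.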
How I shouldn't do that
*You could abuse the daylights out of the REPLACE function and test the allowable length of an expression by doing something like this: create a new column called StrippedTitle in which every allowable character is replaced with an empty string. If the length of the trimmed final string is not zero, then there's something bad in there.
REPLACE(REPLACE(REPLACE([Title], "A", ""), "B", ""), "C", "") ..., "a", ""), "b", "") ..., "9", "")
where ... implies you've continued the pattern. Yes, you'll have to replace upper and lower cased characters. ASCIITable.com or similar will be your friend.
That will be a new column. So add a second Derived Column component to calculate whether it's empty (again, easier to debug) and call it IsTitleOnlyAlphaNumeric:
LEN(RTRIM(StrippedTitle)) == 0
Terrible approach but the number of questions I answer where people later clarify "I cannot use script" is decidedly non-zero.
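For what it's worth, the strip-and-measure idea is easier to see outside the expression language. The following is a Python sketch of the same two steps, not SSIS code; the function names are made up, and it mimics the nested REPLACE calls with a loop.

```python
import string

# Every character we are willing to accept: A-Z, a-z, 0-9.
ALLOWED = string.ascii_letters + string.digits


def stripped_title(title):
    """Step 1 (StrippedTitle): remove every allowed character, one at a time."""
    for ch in ALLOWED:
        title = title.replace(ch, "")
    return title


def is_title_only_alphanumeric(title):
    """Step 2: the title is clean if nothing but trailing spaces is left over."""
    # rstrip(" ") plays the role of RTRIM in LEN(RTRIM(StrippedTitle)) == 0.
    return len(stripped_title(title).rstrip(" ")) == 0


print(stripped_title("Smørrebrød 42"))
print(is_title_only_alphanumeric("Title123"))
print(is_title_only_alphanumeric("¿Que?"))
```

One side effect of the trim step: a title made of letters, digits and spaces passes, because after stripping only spaces remain and those are trimmed away.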
I am learning the basics of SQL through W3Schools, and while studying wildcards I came across the following query:
--Finds any values that start with "a" and are at least 3 characters in length
WHERE CustomerName LIKE 'a_%_%'
As per the example, this query searches the table for rows where the CustomerName column starts with 'a' and is at least 3 characters in length.
However, I also tried the following query:
WHERE CustomerName LIKE 'a__%'
The above query also gives me the exact same result.
I want to know whether there is any difference between the two queries. Does the second query produce a different output in some specific scenario? If yes, what would that scenario be?
Both start with A, and end with %. In the middle part, the first says "one char, then between zero and many chars, then one char", while the second one says "one char, then one char".
Considering that the part that comes after them (the final part) is %, which means "between zero and many chars", I can only see both clauses as identical, as they both essentially just want a string starting with A then at least two following characters. Perhaps if there were at least some limitations on what characters were allowed by the _, then maybe they could have been different.
If I had to choose, I'd go with the second one for being more intuitive. After all, many other masks (e.g. a%%%%%%_%%_%%%%%) will yield the same effect, but why the weird complexity?
For the LIKE operator, a single underscore "_" means any single character, so if you use one underscore, as in
ColumnName LIKE 'a_%'
you are basically saying you need a string whose first letter is 'a', followed by another single character, and then followed by anything or nothing.
ColumnName LIKE 'a__%' OR ColumnName LIKE 'a_%_%'
Both expressions mean: first letter 'a', followed by two characters, and then followed by anything or nothing. Or, in simple English, any string with 3 or more characters starting with a.
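If you would rather convince yourself mechanically, one option is to translate each LIKE pattern into a regular expression (`_` becomes "exactly one character", `%` becomes "zero or more characters") and compare the two on every short string. This is a rough Python sketch; `like_to_regex` and `same_matches` are made-up helpers, and it assumes case-sensitive matching with no ESCAPE clause, which not every database does by default.

```python
import re
from itertools import product


def like_to_regex(pattern):
    """Translate a LIKE pattern (only % and _ wildcards) into a compiled regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # zero or more of any character
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def same_matches(pattern_1, pattern_2, alphabet="ab", max_len=6):
    """True if both patterns accept exactly the same strings up to max_len."""
    regex_1, regex_2 = like_to_regex(pattern_1), like_to_regex(pattern_2)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(regex_1.fullmatch(s)) != bool(regex_2.fullmatch(s)):
                return False
    return True


print(same_matches("a_%_%", "a__%"))
print(same_matches("a%", "a%_"))
```

The first comparison finds no string on which the two patterns disagree, while the second one trips over the single-character string "a".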
I need help creating a single-tape deterministic Turing machine for this language.
Here I am not sure how to determine which strings the TM will accept. How can I make the machine accept strings where a=c, given that the b part has elements from both a and c?
Maybe you can try to adapt a machine which accepts palindromes: you read a character on the left. If it belongs to {0,1}, you delete it and go to the right (the last character). If the character belongs to {2,3}, you delete it and go back to the left (the first character). Repeat this until you find a character which does not belong to the "a" or "c" side (and check the last character if you were on the left); the remaining characters should belong to the "b" block.
I need to explain the differences using French and Spanish first and last names. Any pointers are appreciated. I did a Google search but the results are not satisfactory.
Here are some explanations:
Lexicographical
In this case, you sort text without considering numbers. In fact, numbers are just "letters", they have no numeric combined meaning.
This means that the text "ABC123" is sorted as the letters A, B, C, 1, 2 and 3, not as A, B, C and then the number 123.
This has the unfortunate consequence that things which look like they should order like numbers don't.
For instance, when sorting these two:
ABC90
ABC100
You might expect the one with 90 to be sorted before the one with 100, because 90 comes before 100, but that's not how lexicographical ordering works: it compares the 9 with the 1, finds that 1 comes first, and so puts ABC100 before ABC90.
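You can reproduce this with almost any language's default string sort. For instance, in Python (whose plain string comparison goes character by character, by code point):

```python
names = ["ABC90", "ABC100"]

# Default string sorting is lexicographical: "ABC" ties, then "1" < "9" decides.
lexicographic = sorted(names)
print(lexicographic)  # ['ABC100', 'ABC90']
```

The comparison never looks at "90" or "100" as numbers; it stops at the first position where the two strings differ.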
Natural Ordering
This is the ordering that would make the above ordering work properly, by sorting 90 before 100. Natural ordering switches to numeric ordering for a portion of the text, if it encounters numbers in both texts.
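A common way to get this behaviour is to split each string into text and digit runs and compare the digit runs as numbers. Here is a minimal Python sketch of that idea; `natural_key` is a made-up helper name, and real implementations often also deal with case, leading zeros and locale, which this one ignores.

```python
import re


def natural_key(text):
    """Split text into alternating non-digit and digit runs; digit runs become ints."""
    # re.split with a capturing group keeps the digit runs, always at odd positions.
    parts = re.split(r"(\d+)", text)
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


names = ["ABC100", "ABC90", "ABC9"]
natural = sorted(names, key=natural_key)
print(natural)  # ['ABC9', 'ABC90', 'ABC100']
```

Because text runs always sit at even positions and numbers at odd positions, two keys are only ever compared text-to-text or number-to-number.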
Collation-based ordering
This one handles things like variations between languages.
Normally, lexicographical ordering compares one letter to another letter, and determines their order, usually according to the "value" of the letter. This can have some strange effects.
For instance, how do you think the following two strings would be ordered?
ABCTEN
ABCßEN
Well, since the letter ß might have an ordinal value (i.e. its "place" in the Unicode alphabet) that is higher than that of T, the above order is what would be the outcome. Basically, if you go look in the Unicode chart that contains all the letters, you might find that T has a symbol value of less than 100, and ß a value above 100.
However, in Germany, you should consider the above two texts as this:
ABCTEN
ABCSSEN
and thus their order should be reversed, since S comes before T.
This is collation-based ordering. You pick a collation for your text that describes the context in which those texts should be processed. This allows you to get the ordering people expect in different languages.
For instance, in Norway, the letters Æ, Ø and Å are ranked as coming directly after the Z, however in other languages (I forget which), Æ should be ranked just after E, Ø just after O and Å just after A. The collation dictates this.
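The ß example above can be imitated with a toy sort key. To be clear, this Python sketch is not a real collation: the `EXPANSIONS` table and `toy_german_key` are made up for illustration and only know about ß. In practice you would more likely rely on the collation support of your database, your platform's locale facilities, or a library such as ICU.

```python
words = ["ABCTEN", "ABCßEN"]

# Plain code point comparison: T is 84, ß is 223, so T sorts first.
print(ord("T"), ord("ß"))  # 84 223
code_point_order = sorted(words)
print(code_point_order)  # ['ABCTEN', 'ABCßEN']

# Toy stand-in for a German collation rule: compare ß as if it were written SS.
EXPANSIONS = {"ß": "SS"}


def toy_german_key(text):
    """Build a sort key by expanding ß to SS; everything else is left unchanged."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in text)


german_like_order = sorted(words, key=toy_german_key)
print(german_like_order)  # ['ABCßEN', 'ABCTEN']
```

The strings themselves are never modified; only the key used for comparison changes, which is essentially what choosing a collation does.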