Regex - isolating string from larger word - sql

The following regex in DB2 SQL works pretty well for pulling the extra elements out of an address (i.e. anything other than the street name or number). To keep the example simple I'm limiting myself to two cases, UNIT and GATE; HAD1 is the field containing the first line of a street address:
select HAD1,
regexp_substr(HAD1,'(UNITS?|GATES?)\s[0-9A-Z]{1,}')
from ECH
where regexp_like(HAD1,'(UNIT|GATE)')
and length(trim(HAD1)) > 12
I get this:
Ship To Address Line 1            REGEXP_SUBSTR
UNIT 4, 117 MONTGOMORIE RD        UNIT 4
END OF WAINUI RD, HIGHGATE        -
UNIT 3, 37 TE ROTO DRIVE          UNIT 3
GATE 6 52 MAHIA ROAD              GATE 6
UNIT B 11 LANGSTONE LANE          UNIT B
ASHBURTON FITTINGS GATE 2         GATE 2
GOODS: PLACEMAKERS - WESTGATE     -
UNIT 3, 37 TE ROTO DRIVE          UNIT 3
ASHBURTON FITTINGS GATE 2         GATE 2
SH 8A TARRAS-LUGGATE HIGHWAY      GATE HIGHWAY
Which is very encouraging. It correctly didn't pick up HIGHGATE or WESTGATE because they weren't followed by a space then something else.
But it did pick up LUGGATE (last line), which I don't want. So I'd like to require that UNIT or GATE starts a word, i.e. is not immediately preceded by another letter.
As you may guess I'm an absolute beginner with regex, so thank you for your patience.
Edit
Now I have my most excellent regex like so:
\b(GATE|LEVEL|DOOR|UNITS?)\s[\dA-Z]{1,}
Using it over a larger data set I notice the occasional unwanted match where, for instance, GATE is followed by an ordinary English word:
THE THIRD GATE ON THE LEFT = GATE ON
The gates, levels, doors and units that I'm looking for will always be followed by one of the following: (a) A number of up to 6 digits (b) One letter (c) A number and one letter, possibly with a dash
Examples:
UNIT 7A
GATE 6
GATE 31113
UNIT B
LEVEL B2
LEVEL 2B
UNIT D06
So, my follow-up question is: can I limit the number of letters in the second part of the expression to 0 or 1, but allow up to six digits?
I've played around with the numbers in curly brackets but they seem to affect only how many characters are returned rather than how many characters must be present.
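One way to express both requirements (keyword must start a word; suffix is up to six digits, a single letter, or a digit/letter mix with an optional dash) is sketched below. It is written in Python's `re` syntax as an illustration only; whether DB2's regex dialect accepts the same pattern unchanged is an assumption, and the name `extract_unit` is hypothetical. The trailing `\b` is what forces the suffix to end there, so GATE ON no longer matches.

```python
import re

# Keyword at a word boundary, then whitespace, then one of:
#   digits (1-6) with optional dash and optional single letter  -> 7A, 6, 31113, 2B
#   single letter with optional dash and digits (1-6)           -> B2, D06
#   single letter on its own                                     -> B
# The final \b rejects longer words such as "ON" or "HIGHWAY".
PATTERN = re.compile(
    r"\b(?:GATE|LEVEL|DOOR|UNITS?)\s+"
    r"(?:\d{1,6}-?[A-Z]?|[A-Z]-?\d{1,6}|[A-Z])\b"
)

def extract_unit(address):
    """Return the first GATE/LEVEL/DOOR/UNIT element, or None if absent."""
    match = PATTERN.search(address)
    return match.group(0) if match else None

for line in ["UNIT 7A", "LEVEL B2", "UNIT D06",
             "THE THIRD GATE ON THE LEFT",
             "SH 8A TARRAS-LUGGATE HIGHWAY"]:
    print(line, "=>", extract_unit(line))
```

The quantifiers in curly brackets only limit how many characters may be consumed; it is the boundary after them that makes the limit a requirement.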

Related

how to use positioning/range in regexp

I have a product code where the references always follow this pattern: XX00XX000XX. Characters 1 and 2 are always a combination of 2 letters, 3 to 4 a combination of 2 numbers, 5 to 6 letters, 7 to 9 numbers and 10 to 11 letters again (they're always varying so it'll never be the same).
I want to do a regexp_contains (or another variant) that matches by position, like: positions 1 - 2 must be [[:alpha:]], 3 - 4 [[:digit:]], and so on.
(I need this to find product codes that match the reference pattern inside sell links, but I can't find any clear explanation on how to use positioning on regex statements...)
You can use character classes for this.
[a-zA-Z][a-zA-Z]\d\d[a-zA-Z][a-zA-Z]\d\d\d[a-zA-Z][a-zA-Z]
This regex uses the character classes [a-zA-Z] and \d, which match a letter and a digit respectively. It explicitly checks that the first character is a letter, the second character is a letter, the third character is a digit, and so on.
A character class matches exactly one character from the specified set, so [a-zA-Z] matches any letter, [13579] matches any odd digit, etc.
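The same positional check can be written more compactly with counted repetitions such as {2} and {3}. The sketch below uses Python's `re` module purely as an illustration; the function name `find_codes` and the sample link are made up, and the lookarounds that stop a code from matching inside a longer alphanumeric run are an added assumption about what "inside sell links" requires.

```python
import re

# 2 letters, 2 digits, 2 letters, 3 digits, 2 letters (XX00XX000XX),
# not embedded in a longer run of letters or digits.
CODE = re.compile(
    r"(?<![A-Za-z0-9])"
    r"[A-Za-z]{2}[0-9]{2}[A-Za-z]{2}[0-9]{3}[A-Za-z]{2}"
    r"(?![A-Za-z0-9])"
)

def find_codes(text):
    """Return every product code found in the given text."""
    return CODE.findall(text)

# Hypothetical link, for illustration only
print(find_codes("https://shop.example/p/AB12CD345EF?ref=1"))
```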

Regex: extracting a house number from an address

I have following patterns:
13 R 2
48 B / 5
42 B
42B
303 Box 15
303 Bte 15
303 B Bt 15
and only want to have the following results (because Box 15, Bte 15 are the box numbers, and I only want the house nbr + potentially the letter attached to the house number):
13 R 2
48 B / 5
42 B
42B
303
303
303 B
Is this possible using a regular expression? I tried the following: REGEXP_SUBSTR(my_string_variable, '^\d+(\s*\w$)?'). This however only works for the patterns 3-5, and not for the first 2 and last patterns. Dropping the $ from the regex would incorrectly 'strip' the first letter for patterns 5 and 6.
I am basically assuming that if the letter behind the numeric is more than 1 character, that it belongs to the box number. For example, BTE is the French abbreviation for Boite which means Box. I realise this might be invalid if a house number has 2 letters (e.g.: 11 AA), but I would not know a solution for this and I don't think it occurs much.
This will remove, at the end of the string: a whitespace character followed by an uppercase letter followed by at least one lowercase letter followed by one or more whitespace characters followed by one or more digits:
RegExp_Replace(house_number, '\s[A-Z][a-z]+\s+\d+$')
See regex101.com
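To check the pattern against the sample values from the question, here is a small sketch using Python's `re.sub` in place of the SQL function (an assumption made for testing convenience; the function name `strip_box` is hypothetical).

```python
import re

# Same pattern as above: " Box 15", " Bte 15", " Bt 15" at the end of the string
BOX_SUFFIX = re.compile(r"\s[A-Z][a-z]+\s+\d+$")

def strip_box(house_number):
    """Remove a trailing box designation, leaving the house number part."""
    return BOX_SUFFIX.sub("", house_number)

for value in ["13 R 2", "48 B / 5", "42 B", "42B",
              "303 Box 15", "303 Bte 15", "303 B Bt 15"]:
    print(repr(value), "=>", repr(strip_box(value)))
```

Because the pattern insists on a lowercase letter after the capital, a single-letter suffix such as "R" or "B" is left alone.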

Google Spreadsheet with SQL query - finding best combination

I have a google spreadsheet for my gaming information. It contains 2 sheets - one for monster information, another for team.
Monster information sheet contains the attack value, defend value, and the mana cost of monsters. It's almost like a database of monsters that I can summon.
Team sheet does the following:
Asks for the amount of mana I currently have.
Computes a list of up to 5 monsters that I can summon (it can be less than 5).
Each monster has their own mana cost, therefore total mana cost mustn't exceed the amount of mana I have given in point 1.
The tabulated list should give me a team that have the highest combined attack value. It does not matter how many monsters are summoned. Each monster cannot be summoned twice though.
I have been thinking of using query() function so that I can make use of SQL statements. (so that I can hopefully retrieve the tabulated list directly)
Sample: Monster Info
A B C D
1 Monster Attack Defense Cost
2 MonA 1200 1200 35
3 MonB 1400 1300 50
... ...
Sample: Team
A B C D
1 Mana 120
2
3 Attack Team
4 Monster Attack Cost Total Attack
5 MonB 1400 50 1400
6 MonA 1200 35 2600
7 ... ...
I have these formula in "Team" sheet
A5: =query('Monster Info'!$A$:$D,"SELECT A,B,D ORDER BY B DESC LIMIT 5")
B5: =CONTINUE(A5, 1, 2)
C5: =CONTINUE(A5, 1, 3)
D5: =B5
A6: =CONTINUE(A5, 2, 1)
B6: =CONTINUE(A5, 2, 2)
C6: =CONTINUE(A5, 2, 3)
D6: =D5+B6
That only gets the 5 best attack monsters, regardless of the mana cost consideration. How do I do that such that it takes consideration of both attack value and mana cost value? There is another problem shown in the example below:
Example: (simplified version, without defense value etc)
Monster Attack Cost
MonA 1400 50
MonB 1200 35
MonC 1100 30
MonD 900 25
MonE 500 20
MonF 400 15
MonG 350 10
MonH 250 5
If I have 160 mana, then the obvious team is A+B+C+D+E (5100 Attack).
If I have 150 mana, it becomes A+B+C+D+G (4950 Attack).
If I have 140 mana, it becomes A+B+C+D (4600 Attack).
If I have 130 mana, it becomes A+B+C+G+H (4300 Attack using all 130 mana).
If I have 120 mana, it becomes B+C+D+E+G (4050 Attack).
If I have 110 mana, it becomes B+C+D+F+H (3850 Attack).
As you can see, there isn't really a pattern within the results.
Any expert willing to share their insights on this?
I played with the problem for an hour and only have a workaround. Your problem is a standard integer (0/1) linear programming task, essentially a small knapsack problem, which can easily be solved by "Solver" software. There used to be a "Solver" in Google Spreadsheets, but unfortunately it was removed from the newest version. If you don't insist on a Google solution, you should try one of the spreadsheet applications that support a Solver.
I tried MS Office (it has a Solver add-in, installation guide: http://office.microsoft.com/en-001/excel-help/load-the-solver-add-in-HP010342660.aspx).
Before you run the solver, you should prepare your original dataset a bit, with helper columns and cells.
Add a new column next to the "Cost" column (let's assume it is column "D"), and in each row put either 0 or 1. This column tells you whether a monster is selected for the attack team or not.
Add two more columns ("E" and "F"). These hold the monster's Attack and Cost multiplied by the selection flag. So write =b2*d2 in cell E2 and =c2*d2 in cell F2. This way, if a monster is selected (as indicated by the D column, remember), the corresponding E and F cells will be non-zero; otherwise they will be 0.
Create a SUM row under the last row, with a sum formula for each of the D, E and F columns. So in my spreadsheet the D10 cell gets its value like this: =sum(d2:d9), and so on.
I created a spreadsheet to show these steps: https://docs.google.com/spreadsheets/d/1_7XRlupEEwat3CthSSz8h_yJ44MysK9hMsj0ijPEn18/edit?usp=sharing
Remember to copy this worksheet to an MS Office worksheet, before you start the Solver.
Now, you are ready to start the Solver. (Data menu, Solver in MS Office). You can see a video here on using the Solver: https://www.youtube.com/watch?v=Oyc0k9kiD7o
It's not as hard as it looks, but for this case I'll describe what to write where:
Set Objective: you should select the "E10" cell, as that represents the sum of all the attack points.
Check "Max" radiobutton as we would like to maximize the value of the attacks.
By Changing variable cells: Select the "d2:d9" interval as those cells are representing whether a monster is selected or not. The solver will try to adjust these values (0, or 1) in order to maximise the sum attack.
Subject to the Constraints: Here we should add some constraints. Click on the Add button, and then:
First we should ensure that d2:d9 are all binary values. So "Cell reference" should be "d2:d9" and from the dropdown menu, select "bin" as binary.
Another constraint should be that the sum of the selected monsters should not exceed 5. So select the cell where the sum of the selected monsters is represented (D10) and add "<=" and the value "5"
Finally, we cannot use more mana than we have, so select the cell that holds the sum of used mana (F10), choose "<=", and enter the total amount of mana we can spend (in my case it's in the I2 cell).
Done. It should work, in my case it worked at least.
Hope it helps anyway.
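For a table as small as the simplified example in the question, the same 0/1 selection can also be checked outside a spreadsheet by simply trying every team of up to 5 monsters. The following Python sketch does that; it is a brute-force illustration rather than what the Solver does internally, and the name `best_team` is hypothetical.

```python
from itertools import combinations

# (name, attack, cost) rows copied from the simplified example in the question
MONSTERS = [
    ("MonA", 1400, 50), ("MonB", 1200, 35), ("MonC", 1100, 30),
    ("MonD", 900, 25), ("MonE", 500, 20), ("MonF", 400, 15),
    ("MonG", 350, 10), ("MonH", 250, 5),
]

def best_team(monsters, mana, max_size=5):
    """Try every team of 1..max_size distinct monsters and keep the
    affordable one with the highest combined attack."""
    best_attack, best_names = 0, ()
    for size in range(1, max_size + 1):
        for team in combinations(monsters, size):
            cost = sum(m[2] for m in team)
            attack = sum(m[1] for m in team)
            if cost <= mana and attack > best_attack:
                best_attack = attack
                best_names = tuple(m[0] for m in team)
    return best_attack, best_names

print(best_team(MONSTERS, 160))
```

With 8 monsters there are only 218 candidate teams, so exhaustive search is instant; for a much larger monster list a Solver or a dynamic-programming knapsack would scale better.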

Create all combinations of a word through spaces in the Console Application

I'm trying to experiment with this, http://gyazo.com/8190a3c98a520bbeb77335e05ea5a636 (a Visual Basic console application). I want it to let the user enter a word, and have the console reply with that word in every possible spaced combination.
Say I'm using the word TEST, for example. It would be spaced out like this:
T EST
T E ST
T E S T
TE ST
TES T
T ES T
And so on... (that is, every combination in which each gap between letters either has a space or not)
Is this possible through the Console Application?
When counting, you start at the lowest digit. You start with that digit at zero and you count up until you reach the highest value for that digit, like this: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Then, once you reach the highest value, you have to add a second digit (e.g. 10). Then you go from lowest to highest again on the lowest digit again (e.g. 10 - 19) before incrementing the second digit again (e.g. 20). In that way, once you reach 999, you will have listed every possible combination of values in a three digit number.
When counting in binary, it works the same way, but the highest value for each digit is one, so you count up on the lowest digit like this: 0, 1. Then you have to add the second digit and count up again: 10, 11. Then you need to add a third digit (e.g. 100) and do it all again on the first two. By the time you get to 111, you will have listed every possible combination of 1's and 0's in a three digit binary number.
So, if you think of the space between each letter as a digit in a binary number, where 0 means no space and 1 means there is a space, then all you have to do is count up from 0 to the highest value in a binary number that is the same number of digits as the length of your word, minus 1. So, for instance, with the word TEST, the counting would look like this:
000 = TEST
001 = TES T
010 = TE ST
011 = TE S T
100 = T EST
...
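The counting idea can be sketched in a few lines. The example below is in Python rather than Visual Basic (a substitution made only for illustration; the loop structure carries over directly), and the function name `spaced_variants` is hypothetical. Each number from 0 to 2^(n-1) - 1 is read as a row of binary digits, one per gap, with the leftmost digit controlling the first gap as in the table above.

```python
def spaced_variants(word):
    """List every way of inserting single spaces between the letters of
    word, in binary counting order (000, 001, 010, ...)."""
    if len(word) < 2:
        return [word]  # no gaps to fill
    gaps = len(word) - 1
    variants = []
    for mask in range(2 ** gaps):
        pieces = [word[0]]
        for i in range(gaps):
            # Leftmost binary digit corresponds to the first gap
            if (mask >> (gaps - 1 - i)) & 1:
                pieces.append(" ")
            pieces.append(word[i + 1])
        variants.append("".join(pieces))
    return variants

for variant in spaced_variants("TEST"):
    print(variant)
```

A four-letter word has three gaps, so this prints 2^3 = 8 lines, starting with TEST and ending with T E S T.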

Deterministic Finite Automata (JFLAP)

I have a DFA (deterministic finite automaton) question. We are using JFLAP to construct the automata. I cannot figure this question out to save my life! Here it is:
"DFA to recognize the language of all strings that have an even number of zeros and an odd number of ones."
So the alphabet is {0,1}, and I need to build an automaton that recognizes strings with an even number of zeros and an odd number of ones.

I don't know whether my understanding is right.
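I could give you a description in Grail format of a DFA that accepts strings with an even number of zeros and an odd number of ones. It needs four states, one per combination of parities: 1 = even zeros, even ones; 2 = odd zeros, even ones; 3 = even zeros, odd ones; 4 = odd zeros, odd ones.
START 1
1 0 2
2 0 1
1 1 3
3 1 1
2 1 4
4 1 2
3 0 4
4 0 3
FINAL 3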
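A quick way to sanity-check any candidate DFA for this language is to simulate the standard parity construction: the state records whether the number of zeros and the number of ones seen so far is even or odd, which gives four states in total. The Python sketch below is an illustration outside JFLAP; the state encoding as `(zeros_parity, ones_parity)` tuples and the name `accepts` are choices made here, not part of the question.

```python
# State = (parity of zeros, parity of ones); 0 means even, 1 means odd
START = (0, 0)
ACCEPTING = {(0, 1)}  # even number of zeros, odd number of ones

# Reading "0" flips the zeros parity, reading "1" flips the ones parity
TRANSITIONS = {
    ((z, o), sym): ((z ^ 1, o) if sym == "0" else (z, o ^ 1))
    for z in (0, 1) for o in (0, 1) for sym in "01"
}

def accepts(string):
    """Run the DFA over a string of 0s and 1s and report acceptance."""
    state = START
    for sym in string:
        if sym not in ("0", "1"):
            raise ValueError("alphabet is {0, 1}")
        state = TRANSITIONS[(state, sym)]
    return state in ACCEPTING

for s in ["1", "001", "100", "0", "0110", ""]:
    print(repr(s), accepts(s))
```

Drawing the four states and eight transitions of `TRANSITIONS` in JFLAP, with `(0, 0)` as the initial state and `(0, 1)` as the only final state, gives the automaton asked for.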