Talking a slice of a string in Pharo smalltalk - smalltalk

I have a string an I would like to be able to take a substring from the middle of it. For example for the string 'abcdefg' I want to get 'cde' (but the exact start and stop could be arbitrary). I have found allButFirst: and allButLast: messages so I can do it in two steps like
str := 'abcdefg'
(str allButFirst: 2) allButLast: 2
But I was expecting there to be a single message. I also tried using from:to: (which naively makes sense to me) but that doesn't work on the string class.
Is there a message in Pharo 9 Smalltalk that will take the start and stop or start and length for getting a substring?

You can use
'abcdefg' copyFrom: 3 to: 5
and get cde.
Here is one more important piece of advice:
You can go to Main Menu -> Tool -> Finder. Then in the Finder window switch from selectors search to examples. Then in the search bar enter
'abcdefg' . 3 . 5 . 'cde'
These are the "4 elements" of the search. The first one is the receiver, the last one is the expected result, and the middle elements will be used as parameters. So if there is any method that does what you expect, you will see in among the results.

Related

Regex match first number if it does not appear at the end

I am currently facing a Regex problem which apparently I cannot find an answer to.
My Regex is embedded in a teradata SQL of the form:
REGEXP_SUBSTR(column, 'regex_pattern')
I want to find the first appearance of any number except if it appears at the end of the string.
For Example:
"YEL2X30" -> "2"
"YEL19XYZ05" -> "19"
"YELLOW05" -> ""
I tried it with '[0-9]+(?!$)/' but this returns me a blank String always.
Thanks in Advance!
Shot in the dark here since I'm unfamiliar with teradata and the supported SQL-functionality. However, reading the docs on the REGEXP_SUBSTR() function it seems like you may want to use the 3rd and 4th possible argument along with a slightly different regular expression:
[0-9]+(?![0-9]|$)
Meaning: 1+ Digits that are not followed by either the end of the string or another digit.
I'd believe the following syntax may work now to retrieve the 1st appearance of any number from the matching results:
REGEXP_SUBSTR(column, '[0-9]+(?![0-9]|$)', 1, 1)
The 3rd parameter states from which position in the source-string we need to start searching whereas the 4th will return the 1st match from any possible multiple matches (is how I read the docs). For example: abc123def456ghi789 whould return 123.
Fiddling around in online IDE's gave me that:
CREATE TABLE TBL (TST varchar(100));
INSERT INTO TBL values ('YEL2X30'), ('YEL19XYZ05'), ('YELLOW05'), ('abc123def456ghi789');
SELECT REGEXP_SUBSTR(TST, '[0-9]+(?![0-9]|$)', 1, 1) as 'RESULTS' FROM TBL;
Resulted in:
RESULTS
2
19
NULL
123
NOTE: I also noticed that leaving out the 3rd and 4th parameter made no difference since they will default back to 1 without explicitly mentioning them. I tested this over here.
Possibly the simplest way is to look for digits followed by a non-digit. Then keep all the digits:
regexp_substr(regexp_substr(column, '[0-9]+[^0-9]'), '[0-9]+')

Freemarker: Removing Items from One Sequence from Another Sequence

This may be something really simple, but I couldn't figure it out and been trying to find an example online to no avail. I'm basically trying to remove items found in one sequence from another sequence.
Example #1
Items added to the cart is in one sequence; items removed from cart is in another sequence:
<#assign Added_Items_to_Cart = "AAAA,BBBB,CCCC,DDDD,EEEE,FFFF">
<#assign Deleted_Items_from_Cart = "BBBB,DDDD">
The result I'm looking for is: AAAA,CCCC,EEEE,FFFF
Example #2
What if the all items added to and deleted from cart are in the same sequence?
<#assign Cart_Activity = "AAAA,BBBB,BBBB,CCCC,DDDD,EEEE,DDDD,FFFF,Add,Add,Delete,Add,Add,Add,Delete,Add">
The result I'm looking for is the same: AAAA,CCCC,EEEE,FFFF
First things first: You ask about sequence but the data you are dealing with are strings.
I know you are using the string to work as a sequence (and it works), but sequences are sequences and strings are strings, and they have diferente ways of dealing with. I just felt this was important to clarify if someone who is starting to learn how to program get to this answer.
Some assumptions since you're providing strings with data separated by comma:
You want a string with data separated by comma as a result.
You know how to properly create strings with data separated by comma.
You dont have commas in your items names.
Observations:
I'll give you the logic but not the code donne, as this can be a great chance for you to learn/practice freemarker (stackoverflow spirit, you know...)
You question is not about something specific of freemaker (it just happens to be the language you want to work with). Think about adding the logic tag to you question. :-)
Now to the answer on how to do what you want on a "string that is working as a sequence":
Example #1
Change your string to a real sequence :-)
1 - Use a built-in to split your string on commas. Do it for both Added_Items_to_Cart and Deleted_Items_from_Cart. Now you have two real sequences to work with.
2 - Create a new string tha twill be your result .
3 - Iterate over the sequence of added itens.
4 - For each item of the added list, you will check if the deleted list also contains this item.
4.1 - If the deleted list contains the item you do nothing.
4.2 - If the deleted list do not contains the item, you add that item to your string result
At the end of this nested iteration (thats another hint) you should get the result you're looking for.
Example #2
There are many ways of doing it and i'll just share the one that pops out of my mind right now.
I think it's noteworthy that in this approach you will always have an even sized list, as you always insert 2 infos each time: item and action.
So always the first half will be the 'item list' and the second half will be the 'action list'.
1 - Change that string to a sequence (yes, like on the other example).
2 - Get half of its size (in your example size = 16 so half of it is 8)
3 - Iterate over a range from 0 to half-1 (in your example 0 to 7)
4 - At each iteration you'll have a number. Lets call it num (yes I'm very creative):
4.1 - If at the position num + half you have the word "Add" you add the item of position num in your result string
4.2 - If at the position num + half you have the word "Delete" you remove the item of position num from your result string
And for the grand finale, some really usefull links that will help you in your freemarker life forever!!!
All built-ins from freemarker:
https://freemarker.apache.org/docs/ref_builtins.html
All directives from freemarker:
https://freemarker.apache.org/docs/ref_directive_alphaidx.html
Freemarekr cheatsheet :
https://freemarker.apache.org/docs/dgui_template_exp.html#exp_cheatsheet

Regex matching sequence of characters

I have a test string such as: The Sun and the Moon together, forever
I want to be able to type a few characters or words and be able to match this string if the characters appear in the correct sequence together, even if there are missing words. For example, the following search word(s) should all match against this string:
The Moon
Sun tog
Tsmoon
The get ever
What regex pattern should I be using for this? I should add that the supplied test strings are going to be dynamic within an app, and so I'd like to be able to use a pattern based on the search string.
From your example Tsmoon you show partial words (T), ignoring case (s, m) and allow anything between each entered character. So as a first attempt you can:
Set the ignore case option
Between each chapter input insert the regular expression to match zero or more of anything. You can choose whether to match the shortest or longest run.
Try that, reading the documentation for NSRegularExpression if you're stuck, and see how it goes. If you get stuck ask a new question showing your code and the RE constructed and explain what happens/doesn't work as expected.
HTH

Find All in a Textbox

I am working on an application to search for and build a list of all the times a string (or variable of) is in a text file. Kind of like a Find All function in a text editor that I can build a list with the info that is found, such as
S350
S250
S270
S5000
What can I use to do this search? It will have one value that does not change (The S in this case) followed by up to 4 digits
RegEx seems like a good choice.
Something like.. S(\d{1,4})? might work for you.
Expresso is my preferred regular expression composer.

Add spaces between words in spaceless string

I'm on OS X, and in objective-c I'm trying to convert
for example,
"Bobateagreenapple"
into
"Bob ate a green apple"
Is there any way to do this efficiently? Would something involving a spell checker work?
EDIT: Just some extra information:
I'm attempting to build something that takes some misformatted text (for example, text copy pasted from old pdfs that end up without spaces, especially from internet archives like JSTOR). Since the misformatted text is probably going to be long... well, I'm just trying to figure out whether this is feasibly possible before I actually attempt to actually write system only to find out it takes 2 hours to fix a paragraph of text.
One possibility, which I will describe this in a non-OS specific manner, is to perform a search through all the possible words that make up the collection of letters.
Basically you chop off the first letter of your letter collection and add it to the current word you are forming. If it makes a word (eg dictionary lookup) then add it to the current sentence. If you manage to use up all the letters in your collection and form words out of all of them, then you have a full sentence. But, you don't have to stop here. Instead, you keep running, and eventually you will produce all possible sentences.
Pseudo-code would look something like this:
FindWords(vector<Sentence> sentences, Sentence s, Word w, Letters l)
{
if (l.empty() and w.empty())
add s to sentences;
return;
if (l.empty())
return;
add first letter from l to w;
if w in dictionary
{
add w to s;
FindWords(sentences, s, empty word, l)
remove w from s
}
FindWords(sentences, s, w, l)
put last letter from w back onto l
}
There are, of course, a number of optimizations you could perform to make it go fast. For instance checking if the word is the stem of any word in the dictionary. But, this is the basic approach that will give you all possible sentences.
Solving this problem is much harder than anything you'll find in a framework. Notice that even in your example, there are other "solutions": "Bob a tea green apple," for one.
A very naive (and not very functional) approach might be to use a spell-checker to try to isolate one "real word" at a time in the string; of course, in this example, that would only work because "Bob" happens to be an English word.
This is not to say that there is no way to accomplish what you want, but the way you phrase this question indicates to me that it might be a lot more complicated than what you're expecting. Maybe someone can give you an acceptable solution, but I bet they'll need to know a lot more about what exactly you're trying to do.
Edit: in response to your edit, it would probably take less effort to run some kind of OCR tool on a PDF and correct its output than it would just to correct what this system might give you, let alone program it
I implemented a solution, the code is avaible on code project:
http://www.codeproject.com/Tips/704003/How-to-add-spaces-between-spaceless-strings
My idea was to prioritize results that use up most of the characters (preferable all of them) then favor the ones with the longest words, because 2,3 or 4 character long words can often come up by chance from leftout characters. Most of the times this provides the correct solution.
To find all possible permutations I used recursion. The code is quite fast even with big dictionaries (tested with 50 000 words).