I'm currently trying to implement some kind of search system in a program of mine, and wanted to use an index, but I'm fairly new at Objective-C. The main idea is to have a 'search' command or text box and when I type a word, it'll show me all the items that include that word. All these 'items' will be listed in a .txt file (hopefully) in alphabetical order. Any help is appreciated.
You need to read the .txt file into an NSSet or some other collection class and you can then search it using something like:
NSSet *matches = [words filteredSetUsingPredicate:[NSPredicate predicateWithFormat:@"SELF contains[c] 'word'"]];
(See the Predicate Guide for details).
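For readers who want to see the idea outside Cocoa, here is a minimal Python sketch of the same approach: load one item per line into a set, then keep the items that contain the search word, ignoring case. The function names `load_items` and `search` are made up for this illustration, and it assumes the file is UTF-8 with one item per line.

```python
def load_items(path):
    """Read one item per line into a set, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def search(items, word):
    """Return the items that contain `word`, ignoring case, in sorted order."""
    needle = word.casefold()
    return sorted(item for item in items if needle in item.casefold())
```

This scans every item on each search, which is usually fine for a few thousand lines.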
The ideal thing, if the text file is large and you want to index by the leading characters of each entry, is to create a "dope vector" of sorts, where each entry in the dope vector contains the first few characters of the line, followed by the file offset where the line starts. Note that one dope vector entry can cover a number of file lines, since it's just serving like the index tabs in a dictionary.
But if you want to search for words within a line in your file, you're better off using a SQL database, or some KWIC scheme.
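The "dope vector" idea can be sketched roughly as follows, in Python for brevity. This is only one possible reading of it: every `every`-th line contributes its first few bytes plus its file offset, and a lookup bisects that small table and then scans the file from the chosen offset. The names `build_sparse_index` and `lookup`, the defaults, and the assumption that the file is sorted by lower-cased ASCII text are all mine.

```python
import bisect


def build_sparse_index(path, every=64, prefix_len=4):
    """Record (prefix, byte offset) for every `every`-th line of a sorted file."""
    index = []
    with open(path, "rb") as f:
        line_no = 0
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            if line_no % every == 0:
                prefix = line[:prefix_len].rstrip(b"\r\n").lower()
                index.append((prefix, offset))
            line_no += 1
    return index


def lookup(path, index, key, prefix_len=4):
    """Return all lines that start with `key` (case-insensitive, ASCII)."""
    if not index:
        return []
    needle = key.lower().encode("utf-8")
    prefixes = [p for p, _ in index]
    # Start at the last indexed line that sorts strictly before the key.
    pos = bisect.bisect_left(prefixes, needle[:prefix_len])
    start = index[max(pos - 1, 0)][1]
    matches = []
    with open(path, "rb") as f:
        f.seek(start)
        for raw in f:
            entry = raw.rstrip(b"\r\n")
            head = entry[:len(needle)].lower()
            if head == needle:
                matches.append(entry.decode("utf-8"))
            elif head > needle:
                break  # sorted file: nothing further can match
    return matches
```

`prefix_len` must be the same in both calls. Only the small index lives in memory; the file itself is read from the nearest "tab" onward.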
I'm working on a translator that will take English language text (as user input into a UITextView) and (with a button press) replace specific words with alternatives. I have both the English words in scope plus their alternatives in separate Arrays (englishArray and alternativeArray), indexed correspondingly.
My challenge is finding an algorithm that will allow me to identify a word in the input text (a UITextView) ignoring characters like <",.()>, lookup the word in englishArray (case insensitive), locate the corresponding word in alternativeArray and then use that word in place of the original - writing it back to the UITextView.
Any help greatly appreciated.
NB. I have created a Category extending NSArray with an indexOfCaseInsensitiveString method that ignores case when doing an indexOfObject-type lookup, if that helps.
Tony.
I think that using an NSScanner would be best to parse the string into separate words which you could then pass to your indexOfCaseInsensitiveString method. scanCharactersFromSet:intoString: using a set of all the characters you want to ignore, including whitespace and newline characters should get you to the start of a word, and then you could use scanUpToCharactersFromSet:intoString: using the same set to scan to the end of the word. Using scanLocation at the beginning and end of each scan should allow you to get the range of that word, so if you find a match in your array, you will know where in your string to make the replacement.
Thanks for your suggestion. It's working with one exception.
I want to capture all punctuation so I can recreate the original input but with the substituted words. Even though I have a 'space' in my Character Set, the scanner is not putting the spaces into the 'intoString'. Other characters I specify in the Character Set such as '(' and ';' are represented in the 'intoString'.
Net is that when I recreate the input, it's perfect except that I get individual words running into each other.
UPDATE: I fixed that issue by including:
[theScanner setCharactersToBeSkipped:nil];
Thanks again.
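For comparison, the same word-for-word substitution can be sketched in Python with a regular expression instead of NSScanner (a different technique from the one discussed above, shown only to illustrate the logic). Everything that is not a run of letters, including spaces and punctuation, is left untouched, so the original layout survives. The function name `translate` and the letters-only definition of a word are assumptions.

```python
import re


def translate(text, english, alternatives):
    """Replace each word found in `english` (ignoring case) with the entry
    at the same position in `alternatives`; all other text is kept as-is."""
    table = {word.casefold(): alt for word, alt in zip(english, alternatives)}

    def swap(match):
        word = match.group(0)
        return table.get(word.casefold(), word)

    # A "word" here is a run of ASCII letters; punctuation and spaces
    # are never matched, so they pass through unchanged.
    return re.sub(r"[A-Za-z]+", swap, text)
```

For example, `translate("Hello, (world)!", ["hello", "world"], ["bonjour", "monde"])` gives `"bonjour, (monde)!"`. The replacement does not try to copy the capitalisation of the original word.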
I have a URL, for example:
http://i.myhost.com/myimage.jpg
I want to change this url to
http://i.myhost.com/myimageD.jpg.
(Add D after the image name and before the dot.)
I.e. I want to add some characters after the image name and before the dot, using a regex.
What is the best way to do this with a regex?
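Try matching ^(.*)\.([a-zA-Z]{3,5})$ and replacing with \1D.\2. The dot has to be written back in the replacement because it sits outside both capture groups; with NSRegularExpression the template is written $1D.$2. I'm assuming the extension is 3-5 letters, but you can modify that to suit. E.g. if it's just jpg images then you can put jpg in place of [a-zA-Z]{3,5}.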
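As a quick way to experiment with this pattern, here is a small Python sketch. The dot sits outside both capture groups, so the replacement has to write it back between the inserted text and the extension. The helper name `add_suffix` is made up for the example.

```python
import re

# Group 1: everything up to the last dot; group 2: a 3-5 letter extension.
PATTERN = re.compile(r"^(.*)\.([a-zA-Z]{3,5})$")


def add_suffix(url, suffix="D"):
    """Insert `suffix` just before the final '.extension' of `url`."""
    return PATTERN.sub(lambda m: m.group(1) + suffix + "." + m.group(2), url)


print(add_suffix("http://i.myhost.com/myimage.jpg"))
# prints http://i.myhost.com/myimageD.jpg
```

A string with no final 3-5 letter extension is returned unchanged.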
This sounds like a homework question, given that the solution must use a regex. On that assumption, here is an outline to get you going.
If all you have is a URL then @mathematical.coffee's solution will suit. However, if you have a chunk of text containing one or more URLs and you have to locate and change just those, then you'll need something a little more involved.
Look at the structure of a URL: {protocol}{address}{item}; where
{protocol} is "http://", "ftp://" etc.;
{address} is a name, e.g. "www.google.com", or a number, e.g. "74.125.237.116" - there will always be at least one dot in the address; and
{item} is "/name" where name is quite flexible - there will be zero or more items, you can think of them as directories and a file but this isn't strictly true. Also the sequence of items can end in a "/" (including when there are zero of them).
To make a regex which matches a URL start by matching each part. In the case of the items you'll want to match the last in the sequence separately - you'll have zero or more "directories" and one "file", the latter must be of the form "name.extension".
Once you have regexes for each part you just concatenate them to produce a regex for the whole. To form the replacement pattern you can surround parts of your regex with parentheses and refer to those parts using \number in the replacement string - see @mathematical.coffee's solution for an example.
The best way to learn regexes is to use an editor which supports them and just experiment. The exact syntax may not be the same as NSRegularExpression but they are mostly pretty similar for the basic stuff and you can translate from one to another easily.
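To make the outline concrete, here is one possible sketch in Python: a regex for each part, concatenated into a whole, with two capture groups used to rebuild the URL. The character classes are deliberately simplified assumptions (they are far stricter than what real URLs allow), and the names are invented for this example.

```python
import re

PROTOCOL = r"(?:https?|ftp)://"
ADDRESS = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"   # at least one dot
DIRECTORY = r"/[A-Za-z0-9_-]+"                   # one "/name" item
NAME = r"[A-Za-z0-9_-]+"                         # file name without extension
EXTENSION = r"[A-Za-z]{3,5}"

# Group 1: everything up to the file name; group 2: the extension.
URL_WITH_FILE = re.compile(
    "(" + PROTOCOL + ADDRESS + "(?:" + DIRECTORY + ")*/" + NAME + ")"
    + r"\.(" + EXTENSION + r")\b"
)


def add_suffix_in_text(text, suffix="D"):
    """Insert `suffix` before the extension of every matching URL in `text`."""
    return URL_WITH_FILE.sub(
        lambda m: m.group(1) + suffix + "." + m.group(2), text
    )
```

URLs that do not end in a "name.extension" item, such as a bare host name, are left alone.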
I've started working on a new home project in which I need to index specific file names together with their paths.
The program will index files on my local hard disk with no need to deal with the contents of the files (so I'm assuming/hoping it will be a simple implementation).
At first the user will enter a list of file extensions to be indexed (during setup).
Then the program will run and create the data structure holding the path for each file with one of those extensions.
Retrieving data from my data structure would look like this:
path of the file on my HDD=function(filename entered by user)
I thought about it for quite a while and wrote a design for the data structure. Here is my suggestion
(Design Illustration):
I'll use an array with a hash function for mapping an extension to a cell (each cell represents the first letter of the file extension).
Inside each cell there would be a list of extensions starting with the same letter.
For each node in the list there would be a red-black tree for searching the filename; once the filename is found, the program retrieves the path of the file stored in the tree node.
Oh, by the way, I usually program in C (low level) or in C++.
I think you are making this scheme far too elaborate and complicated. If locating a MyFileTree based on extension is what you want, then just use SortedDictionary<string, MyFileTree>, where string is your extension, and you'll get an O(log n) retrieval mechanism out of the box.
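The "one dictionary keyed by extension" idea can be sketched like this in Python. Note the swap: Python's `dict` is a hash table (average constant-time lookup) rather than a sorted tree like SortedDictionary, and the inner tree is replaced by a second dictionary from file name to paths. `build_index` and `find` are hypothetical names.

```python
import os
from collections import defaultdict


def build_index(root, extensions):
    """Map extension -> file name -> list of full paths under `root`."""
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    index = defaultdict(lambda: defaultdict(list))
    for folder, _dirs, files in os.walk(root):
        for name in files:
            ext = os.path.splitext(name)[1].lower().lstrip(".")
            if ext in wanted:
                index[ext][name].append(os.path.join(folder, name))
    return index


def find(index, filename):
    """Return every indexed path for `filename` (empty list if none)."""
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    return index.get(ext, {}).get(filename, [])
```

A list of paths is stored per name because the same file name can appear in several directories.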
At work we have a txt file with items recorded in it.
The lines typically look like this:
Apple, *fruit
Cow, *animal
House, *thing
Tree, *plant
Is it possible to read through this txt file to check whether apple already exists? I want to build a safeguard against adding duplicate items...
I think you have to read the file in to a list of objects.
In this case, the object will get 2 properties: Type and Category.
After that you can easily perform comparisons etc on the list itself.
Edit:
Some stuff for you to read:
Reading a file: http://msdn.microsoft.com/en-us/library/system.io.streamreader.readline.aspx
Dictionary: http://msdn.microsoft.com/en-us/library/xfhwa508.aspx
I'm unable to write down an example now, but those are the ingredients.
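A minimal sketch of those ingredients, written in Python rather than .NET purely for illustration: read the file into a dictionary keyed by the lower-cased item name, and refuse to append a name that is already present. The names `load_catalog` and `add_item` are hypothetical, and the sketch assumes the "Name, *category" line layout shown in the question and a file that ends with a newline.

```python
def load_catalog(path):
    """Map lower-cased item name -> category from lines like 'Apple, *fruit'."""
    items = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if "," not in line:
                continue  # skip blank or malformed lines
            name, category = line.split(",", 1)
            items[name.strip().casefold()] = category.strip().lstrip("*")
    return items


def add_item(path, items, name, category):
    """Append the item unless its name already exists; return True if added."""
    key = name.strip().casefold()
    if key in items:
        return False
    with open(path, "a", encoding="utf-8") as f:
        f.write(f"{name.strip()}, *{category}\n")
    items[key] = category
    return True
```

Because the keys are case-folded, "apple" and "Apple" count as the same item.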
I'm using Doxygen to generate documentation for my code. I need to make a PDF version of this and using Doxygen's LaTeX output appears to be the way to do it.
However, I've run into a number of annoying problems, and having known nothing about LaTeX beforehand I don't have much of an idea how to approach them. The countless references for LaTeX-related things have not been much help either...
I worked out how to create a custom style in a .sty file and how to get Doxygen to use it. After a lot of searching I found out how to set the page margins etc. through this, and I'm guessing that this is also the file I want for the other changes, but I can't seem to find any commands for doing what I want :(
The table of contents at the start of the document contains a lot of items I'd rather it didn't, as it makes the contents very long. Is there some way to limit the contents to just, say, the first two levels, rather than having entries for every single individual function, variable, etc.? I'd quite like to keep all the bookmarks, however. I did try the "COMPACT_LATEX" option, but as well as removing items from the contents pages it removed the bookmarks and the member lists at the start of each section, which I really do want to keep.
Is there a way to change the order of things, like putting the full class description at the start of the section, rather than after all the members and attributes?
Wow, that's kind of evil of Doxygen.
Okay, to get around the tocdepth counter problem, add the following line to your .sty file:
\AtBeginDocument{\setcounter{tocdepth}{2}}% or whatever level you want
You can set the PDF bookmarks depth to a separate value:
% requires you \usepackage{hyperref} first
\hypersetup{
bookmarksdepth = section, % or whatever level you want
}
Also note that if you have a list of figures/tables, the tocdepth must be at least 2 for them to show up.
I don't see any way of rearranging those items within the LaTeX files---Doxygen just barfs them out there, so we can't do much. You'll have to poke around the Doxygen documentation to see if there's any way to specify the order I guess. (Here's hoping!)
You're so close.
Googling on "latex contents level" brought me to "LaTeX - customizing the depth of the table of contents for different parts of the thesis", which suggests
\setcounter{tocdepth}{n}
where n starts at zero for only the highest level division. This is presumably defined in all the default styles, but is worth a try in Doxygen.
You could write a Perl/Awk script to simply delete the unwanted lines from the table of contents. For the file burble.tex, LaTeX will generate the file burble.toc, which will contain lines such as:
\contentsline {subsection}{Class F rewrites}{38}
\contentsline {subsection}{Class M rewrites}{39}
\contentsline {section}{\numberline {7}Definition and properties of the translation}{44}
\contentsline {paragraph}{Well-formedness}{54}
Simple regexes will identify which level each line belongs to, and you can filter the file based on that. Once you have the table of contents the way you want it, insert \nofiles in the appropriate place (the style sheet?), which means that LaTeX will read the auxiliary files but not overwrite them.
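The filtering step could look something like the following Python sketch (a Perl or Awk script would do the same job). The level numbers follow the usual numbering of the standard book/report classes; the function name `filter_toc` and the choice to keep any line it does not recognise are assumptions.

```python
import re

# Sectioning levels as numbered by the standard book/report classes.
LEVELS = {
    "part": -1, "chapter": 0, "section": 1, "subsection": 2,
    "subsubsection": 3, "paragraph": 4, "subparagraph": 5,
}
CONTENTSLINE = re.compile(r"\\contentsline\s*\{(\w+)\}")


def filter_toc(lines, max_level=2):
    """Drop \\contentsline entries deeper than `max_level`; keep the rest."""
    kept = []
    for line in lines:
        m = CONTENTSLINE.match(line.lstrip())
        # Unknown entry types fall back to max_level, so they are kept.
        if m and LEVELS.get(m.group(1), max_level) > max_level:
            continue
        kept.append(line)
    return kept
```

To apply it, read burble.toc into a list of lines, pass it through `filter_toc`, and write the result back before the final run with \nofiles in place.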