Looking for an efficient way to index files

I started working on a new home project in which I need to index specific file names along with their paths.
The program will index files on my local hard disk without needing to deal with the contents of the files (so I am assuming/hoping the implementation will be simple).
At first (during setup), the user will enter a list of file extensions to be indexed.
Then the program will run and build a data structure holding the path of each file that has one of those extensions.
Retrieving data from my data structure would look like this:
path of the file on my HDD = function(filename entered by user)
I thought about it for quite a while and wrote up a design for the data structure. Here is my suggestion
(Design Illustration):
I'll use an array with a hash function that maps an extension to a cell (each cell represents the first letter of the file extension).
Inside each cell there would be a list of the extensions starting with that letter.
For each node in the list there would be a red-black tree for searching by filename; once the filename is found, the program retrieves the path of the file stored in the tree node.
Oh, by the way, I usually program in C (low level) or in C++.

I think you are making this scheme far too elaborate and complicated. If what you want is to locate a MyFileTree based on extension, then just use a SortedDictionary<string, MyFileTree>, where the string key is your extension, and you'll get an O(log n) retrieval mechanism out of the box.
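To make the "one dictionary keyed by extension" idea concrete, here is a minimal sketch in Python rather than C# or C++. It swaps the sorted dictionary for ordinary hash-based dicts, and the names build_index and lookup are made up for illustration. It assumes the file name is an exact, case-sensitive match and that the same name may exist in several directories.

```python
import os
from collections import defaultdict

def build_index(root, extensions):
    """Map extension -> file name -> list of full paths found under root."""
    # Normalise ".TXT", "txt" and ".txt" to the same key.
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    index = defaultdict(lambda: defaultdict(list))
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            ext = os.path.splitext(name)[1].lower().lstrip(".")
            if ext in wanted:
                index[ext][name].append(os.path.join(dirpath, name))
    return index

def lookup(index, filename):
    """Return every indexed path for filename (empty list if there is none)."""
    ext = os.path.splitext(filename)[1].lower().lstrip(".")
    return list(index.get(ext, {}).get(filename, []))
```

A call such as lookup(build_index("/home/me", [".txt", ".jpg"]), "notes.txt") would return all paths named notes.txt under that (hypothetical) root. The two-level dict plays the role of the array of cells plus the per-extension tree in the question.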

Renaming named destinations in PDF files

I've been using named destinations in PDF files to open the PDF file at a specific location in the file. The team responsible for generating the PDF document uses a tool to automatically generate named destinations from bookmarks, so the named destinations tend to have names like *9_Glossary* or *Additional_Information*. We've been asked to produce the same documents in multiple languages. I expect that we will be supplied with PDF documents in multiple foreign languages with bookmarks in the same locations, but the names of the bookmarks will of course be in these other languages, and so the automatically generated named destinations will be in the foreign language too. I would like the named destinations in all the documents to be the same.
I can't be the first person to run into this problem, so I'm interested to see if others have dealt with this.
One thought that comes to mind is to rename the destinations in the foreign-language documents. I have used iTextSharp to extract a list of named destinations. Is it possible to use iTextSharp to rename a destination? Ideally, I'd have a tool that my translator could use to match up the named destination names in each language, and then have the tool replace the named destination names.
Another thought is to maintain a database so that the translation is performed in real time; the translator still has to match up the named destination names, but they are stored in a database. The program that tells Adobe Reader to open the document would use the English name to look up the foreign-language name and then use that to open the document.
I'd also be interested in recommendations of PDF authoring tools that might make this problem easier to solve.
Please take a look at the example RenameDestinations. It's doing more than you need, because the original document itself contains links to the named destinations and for those to keep working, we need to change the action that refers to the names we've changed.
This is the part that is relevant to you:
PdfDictionary catalog = reader.getCatalog();
PdfDictionary names = catalog.getAsDict(PdfName.NAMES);
PdfDictionary dests = names.getAsDict(PdfName.DESTS);
// Flat array of pairs: [name0, dest0, name1, dest1, ...]
PdfArray name = dests.getAsArray(PdfName.NAMES);
// Step by 2 so that only the names (even indices) are replaced
for (int i = 0; i < name.size(); i += 2) {
    PdfString original = name.getAsString(i);
    PdfString newName = new PdfString("new" + original.toString());
    name.set(i, newName);
}
First you get the root dictionary (or catalog). You get the /Names dictionary, looking for named destinations (/Dests). In my case, I have a simple array with pairs of PdfString and references to PdfArray values. I replace all the string values with new names.
The structure I'm changing here is a name tree, and it usually isn't that linear. In your PDF, this tree can have branches, so you may want to write a recursive method to go through those branches. I don't have the time to write a more elaborate example, nor do I want to do your job for you. This example should help you on your way.
Note that I keep track of the original and new names in a Map named renamed. I use this map to change the destinations of the Link annotations on the first page.
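The recursive walk through a branched name tree could look roughly like the sketch below. It is written in Python against plain dicts and lists, not against the iText API, purely to show the shape of the recursion: a node is assumed to carry either a "Names" list of alternating key/value entries, a "Kids" list of child nodes, or both. The function name rename_destinations and the make_new_name callback are hypothetical.

```python
def rename_destinations(node, make_new_name, renamed=None):
    """Walk a name-tree node recursively and rename every key in place.

    Returns a dict mapping each original name to its new name, which can
    then be used to fix up links that point at the old names.
    """
    if renamed is None:
        renamed = {}
    names = node.get("Names")
    if names is not None:
        # Leaf entries are stored flat: [key, value, key, value, ...].
        for i in range(0, len(names), 2):
            old = names[i]
            names[i] = make_new_name(old)
            renamed[old] = names[i]
    # Intermediate nodes point at further nodes through "Kids".
    for kid in node.get("Kids", []):
        rename_destinations(kid, make_new_name, renamed)
    return renamed
```

Note what the sketch leaves out: in a real PDF the keys of a name tree are kept in sorted order and nodes also carry /Limits entries, so a rename that changes the ordering would need to rebuild those as well.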
I’m late to the party, lol, but this might be of benefit to others landing on this thread from a Google search.
I suggest getting AutoBookmark Professional; it provides a lot of tools for bookmarks, links and named destinations.
It is far easier to export the named destinations, change the names in a text editor, and then reimport them.
Just remember that if the named destinations are attached to links, you will have to reset those links, as they don’t update automatically.

SBJSON append new data into existing JSON file without parsing it first

I am making an app that lets the user draw on the screen in different colors and brush sizes. I am storing the info about each drawn path in a JSON file once it has been drawn, to keep it out of memory. Right now I have it parsing all existing paths, then adding the new one and writing it all back out again. I want it to simply append the new data to the JSON file without having to read it in and parse it first, so that only one path is ever in memory at a time.
I am using SBJSON, the JSONWriter has a few append functions but I think you need to have the JSON string to append it to first, not the file, meaning I would have to read in the file anyway. Is there a way to do this without reading in the file at all? I know exactly how the data is structured.
It's possible, but you have to cheat a little. You can just create a stand-alone JSON document per path, and append that to the file. So you'll have something like this in your file:
{"name":"path1", "from": [0,3], "to":[3, 9]}
{"name":"path2", "from": [0,3], "to":[3, 9]}
{"name":"path3", "from": [0,3], "to":[3, 9]}
Note that this is not ONE JSON document but THREE. Handily, however, SBJsonStreamParser supports reading multiple JSON documents in one go. Set the supportMultipleDocuments property and plug it into an SBJsonStreamParserAdapter, and off you go. This also has the benefit that if you have many, many paths in your file, you can start drawing before you've finished reading the whole file. (Because you get a callback for each path.)
You can see some information on the use case here.
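The one-document-per-line file layout described above is independent of SBJSON. As a rough illustration, here is the same append-then-stream pattern using Python's standard json module; append_path and iter_paths are hypothetical helper names, and the sketch assumes each document is written on a single line.

```python
import json

def append_path(filename, path):
    """Append one path as a stand-alone JSON document on its own line."""
    # Mode "a" writes at the end of the file; nothing is read or parsed.
    with open(filename, "a", encoding="utf-8") as f:
        f.write(json.dumps(path) + "\n")

def iter_paths(filename):
    """Yield the stored paths one at a time instead of loading them all."""
    with open(filename, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                yield json.loads(line)
```

Because iter_paths is a generator, a caller can draw each path as it arrives, which mirrors the per-document callback mentioned in the answer.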
I'm pretty sure it's not possible... What I ended up doing was reading in the JSON file as a string. Then, instead of wasting memory turning all of that into dictionaries and arrays, I just looked for the part of the string that marks where I wanted to insert data (for example, I wanted to insert something before the string "], "texts"" showed up), inserted it there, and wrote the result back out to the file.
As far as I can tell this is the best solution.

Objective-C Indexing

I'm currently trying to implement some kind of search system in a program of mine, and wanted to use an index, but I'm fairly new at Objective-C. The main idea is to have a 'search' command or text box and when I type a word, it'll show me all the items that include that word. All these 'items' will be listed in a .txt file (hopefully) in alphabetical order. Any help is appreciated.
You need to read the .txt file into an NSMutableSet or some other mutable collection class, and you can then filter it using something like:
[words filterUsingPredicate:[NSPredicate predicateWithFormat:@"SELF contains[c] 'word'"]];
(See the Predicate Guide for details).
The ideal thing, if the text file is large and you want to index by the leading characters of each entry, is to create a "dope vector" of sorts, where each entry in the dope vector contains the first few characters of the line, followed by the file offset where the line starts. Note that one dope vector entry can cover a number of file lines, since it's just serving like the index tabs in a dictionary.
But if you want to search for words within a line in your file, you're better off using a SQL database, or some KWIC scheme.
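The dope-vector idea can be sketched as a sparse index of (leading characters, file offset) pairs. The version below is in Python rather than Objective-C and rests on several assumptions: the file is sorted bytewise, has one entry per line, and is UTF-8 encoded. The names build_sparse_index and find_with_prefix, and the every and prefix_len parameters, are invented for this example.

```python
import bisect

def build_sparse_index(filename, every=100, prefix_len=3):
    """Record (prefix, byte offset) for every `every`-th line of a sorted file."""
    prefixes, offsets = [], []
    with open(filename, "rb") as f:
        line_no = 0
        while True:
            offset = f.tell()
            line = f.readline()
            if not line:
                break
            if line_no % every == 0:
                prefixes.append(line.rstrip(b"\r\n")[:prefix_len])
                offsets.append(offset)
            line_no += 1
    return prefixes, offsets, prefix_len

def find_with_prefix(filename, index, prefix):
    """Return all lines that start with prefix, seeking via the sparse index."""
    prefixes, offsets, prefix_len = index
    if not offsets:
        return []
    key = prefix.encode("utf-8")
    # Start one entry before the first index prefix >= key, because matches
    # may begin inside the block that the previous entry covers.
    i = bisect.bisect_left(prefixes, key[:prefix_len])
    matches = []
    with open(filename, "rb") as f:
        f.seek(offsets[max(i - 1, 0)])
        for raw in f:
            line = raw.rstrip(b"\r\n")
            if line.startswith(key):
                matches.append(line.decode("utf-8"))
            elif line > key:
                break  # sorted file: no further line can match
    return matches
```

Only the small index stays in memory; each lookup reads at most a block or two of the file plus the matching lines. As the answer says, this only helps for leading-character lookups, not for finding a word in the middle of a line.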

How to strip a text file into a single line, and then split that into a relevant list in python?

I'm a noob right now with pygame and I was wondering how to load a text file and then strip it into a single line. I believe that I would need to use the .rstrip('\n') method on the variable holding the opened text file. But how do I then turn this into a list? If I intentionally used two colons (::) to separate the relevant pieces of information in the text file, how do I make it into a list in which each element is the content between two sets of ::? The purpose is to create save files when a menu GUI is closed, so is there a simpler way to save the contents of variables from one run of the program and load them in the next?
>>> "foo::bar::baz".split("::")
['foo', 'bar', 'baz']
If you just want to save structured data, however, you might want to look at either the pickle or json libraries. Both of them give ways to dump Python objects to files and then load them back out again.
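Both routes can be sketched in a few lines. The helper names read_fields, save_state and load_state are made up for illustration, and the json variant assumes the saved variables are plain dicts, lists, strings and numbers.

```python
import json

def read_fields(filename, separator="::"):
    """Read the whole file, drop the trailing newline, and split on separator."""
    with open(filename, encoding="utf-8") as f:
        return f.read().rstrip("\n").split(separator)

def save_state(filename, state):
    """Write a dict of variables to disk as JSON."""
    with open(filename, "w", encoding="utf-8") as f:
        json.dump(state, f)

def load_state(filename, default=None):
    """Load the saved dict, or return default if no save file exists yet."""
    try:
        with open(filename, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```

With the json route there is no separator to choose and no risk of :: appearing inside a value; the program calls save_state on exit and load_state at startup.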

Ways of reading txt file contents into forms?

At work we have a txt file with items recorded in it.
The rows typically look like this:
Apple, *fruit
Cow, *animal
House, *thing
Tree, *plant
Is it possible to read through this txt file to check whether apple already exists? Namely, I want to build a safeguard against adding duplicate items...
I think you have to read the file into a list of objects.
In this case, the object will get 2 properties: Type and Category.
After that you can easily perform comparisons etc on the list itself.
Edit:
Some stuff for you to read:
Reading a file: http://msdn.microsoft.com/en-us/library/system.io.streamreader.readline.aspx
Dictionary: http://msdn.microsoft.com/en-us/library/xfhwa508.aspx
I'm unable to write down an example now, but those are the ingredients.
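Putting those ingredients together might look like the sketch below. It is in Python rather than .NET, load_items and add_item are hypothetical names, and it assumes one "Name, *category" pair per line, a case-insensitive duplicate check, and a file that already ends with a newline.

```python
def load_items(filename):
    """Read 'Name, *category' lines into a dict keyed by lower-cased name."""
    items = {}
    with open(filename, encoding="utf-8") as f:
        for line in f:
            name, sep, category = line.strip().partition(",")
            if sep:  # skip blank or malformed lines without a comma
                items[name.strip().lower()] = category.strip().lstrip("*")
    return items

def add_item(filename, items, name, category):
    """Append a new item unless its name is already present; report success."""
    key = name.strip().lower()
    if key in items:
        return False
    with open(filename, "a", encoding="utf-8") as f:
        f.write(f"{name}, *{category}\n")
    items[key] = category
    return True
```

The dictionary lookup is what makes the duplicate check cheap: the file is read once, and every later "does apple exist?" question is answered from memory.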