Large collection of photos for IR software testing

I'm looking for a large (preferably 50K+) collection of photos that I could use for testing image recognition software, so preferably photos of objects. I'm fine with album covers, movie posters, or anything like that.
Any suggestions?

The ImageNet database (http://www.image-net.org/ - it seems to be down at the time I'm writing this, but I think that's temporary) is something you could look into, if it comes back up. It has literally millions of labeled images, separated into a hierarchy of classes (you don't have to download the complete set).

What about a Google search like this?
Another Google search turned up this one. Lots of objects there.

Related

Fetching all pictures of a certain artist on MediaWiki

Currently I'm looking for a way to fetch the URLs of paintings on MediaWiki that are authored by Albrecht Dürer.
Can you point me to some explanation? Is there an API like "give me all images where the artist is Albrecht Dürer"?
I have found imageinfo (http://www.mediawiki.org/wiki/API:Properties#imageinfo_.2F_ii), but couldn't find a way to filter by artist.
There isn't a great way to do that. The structured media data project aims at providing exactly this kind of capability, but it is still a ways off.
Right now, your best bet is using the category system. Category:Paintings by Albrecht Dürer and its subcategories contain the images you are looking for, and you can use the categorymembers API as a generator for imageinfo to fetch the URLs. There is no way to get a recursive list though, so you will have to recurse into subcategories manually. To make matters worse, the category graph is not guaranteed to be a tree, so you will have to implement things like duplicate filtering and cycle detection.
If the wiki in question is Wikimedia Commons, there are various external tools which can help, such as CatScan or catgraph.
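For illustration, here is a minimal sketch of that categorymembers-as-generator approach against the Commons API. It only lists files directly in one category and ignores API continuation, subcategory recursion, and duplicate filtering, all of which a real crawler would need; the category title is just the example from above.

#import <Foundation/Foundation.h>

int main(int argc, const char *argv[])
{
    @autoreleasepool {
        // Sketch only: files directly in one category, no continuation handling,
        // no subcategory recursion, no duplicate filtering.
        NSString *query =
            @"https://commons.wikimedia.org/w/api.php?action=query"
             "&generator=categorymembers"
             "&gcmtitle=Category:Paintings%20by%20Albrecht%20D%C3%BCrer"
             "&gcmtype=file&gcmlimit=50"
             "&prop=imageinfo&iiprop=url&format=json";

        NSData *data = [NSData dataWithContentsOfURL:[NSURL URLWithString:query]];
        NSDictionary *json = [NSJSONSerialization JSONObjectWithData:data options:0 error:NULL];

        NSDictionary *pages = json[@"query"][@"pages"];
        for (NSDictionary *page in [pages allValues]) {
            NSArray *imageinfo = page[@"imageinfo"];
            NSLog(@"%@", imageinfo[0][@"url"]);   // direct URL of the image file
        }
    }
    return 0;
}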

Should I use SQLite to add this feature?

I need your advice on this. I'm currently developing a kind of family application (everything relates to the family).
I would like to add something similar to a family tree of the family members (using a table view), and each member/element in the list will have its own "view" containing a 50-word biography and a photo.
I'm still new to iOS development and haven't worked with SQLite yet. Do you guys think SQLite is the best tool for this job? And how about the photos: is there a way to show a thumbnail photo for each member?
SQLite does this well, though Core Data is generally considered the preferred iOS technology. There are a few situations where I might advise using SQLite over Core Data, but you haven't outlined any app requirements that would make me lean that direction.
If you do your own SQLite, though, I'd suggest you use something like FMDB, so you spare yourself the hassles of writing SQLite code.
And, as I mentioned in a comment on another answer to this question, storing images in Core Data or SQLite carries a significant performance hit. If you're dealing with small images (e.g. thumbnails), it's fine, but if you're dealing with a lot of large images, you really might want to consider storing them in some directory structure under the Documents folder (and then storing relative path names in your database). It is not architecturally elegant to take the images out of the database and use the Documents folder, but for performance reasons you might want to do precisely that.
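As a rough sketch of that files-on-disk approach (the method name, the thumbnails directory, and the memberID parameter are just illustrative, not anything from the question):

// Sketch: write the thumbnail under Documents and return the relative path,
// which is what you would store in Core Data / SQLite instead of the image bytes.
- (NSString *)saveThumbnail:(UIImage *)image forMemberID:(NSString *)memberID
{
    NSString *documents = [NSSearchPathForDirectoriesInDomains(NSDocumentDirectory,
                                                               NSUserDomainMask, YES) firstObject];
    NSString *relativePath = [NSString stringWithFormat:@"thumbnails/%@.jpg", memberID];
    NSString *fullPath = [documents stringByAppendingPathComponent:relativePath];

    [[NSFileManager defaultManager] createDirectoryAtPath:[fullPath stringByDeletingLastPathComponent]
                               withIntermediateDirectories:YES
                                                attributes:nil
                                                     error:NULL];
    [UIImageJPEGRepresentation(image, 0.8) writeToFile:fullPath atomically:YES];
    return relativePath;
}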
No. I would use CoreData for this. CoreData gives you the graphical modelling tools to build an object model and handles all the tedious housekeeping required to persist your object graph to disk.
The photos you would store as conventional files on disk, modelled by a CoreData object that maintains a reference (a URI or file path) to each photo.
I would use CoreData for this. It boils down to an SQLite database, but Apple has added its own wrapper around the SQLite database, making it really simple to use.
There are a number of sample apps on the developer site, as well as numerous tutorials available just by searching the phrase "CoreData example" in Google; the link here is to Ray Wenderlich, which is a good place to start. I think once you go through this blog you'll be using CoreData more and more when you need to store things like this.
With regards to the thumbnail storage I would store those on the device and save the path to the file in the Database.
Yes you can use SQLite for this; in fact it's ideal for holding a family tree given its relational nature.
The photo data can be serialised into a byte stream (NSData *) and stored in a column as a blob.
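If you do go the blob route, a sketch using FMDB (mentioned in an earlier answer) might look like the following; the table and column names, dbPath, and photo are made up for the example:

// Sketch: store and read back a photo as a blob with FMDB.
FMDatabase *db = [FMDatabase databaseWithPath:dbPath];
if ([db open]) {
    [db executeUpdate:@"CREATE TABLE IF NOT EXISTS members (name TEXT PRIMARY KEY, photo BLOB)"];

    NSData *photoData = UIImageJPEGRepresentation(photo, 0.8);
    [db executeUpdate:@"INSERT OR REPLACE INTO members (name, photo) VALUES (?, ?)", @"Tim", photoData];

    FMResultSet *rs = [db executeQuery:@"SELECT photo FROM members WHERE name = ?", @"Tim"];
    if ([rs next]) {
        UIImage *thumbnail = [UIImage imageWithData:[rs dataForColumn:@"photo"]];
        NSLog(@"loaded thumbnail: %@", thumbnail);
    }
    [rs close];
    [db close];
}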
A database has the huge advantage that you can keep everything stored in one place.
You could (not that I recommend it) also use a folder structure to organise the data, like /images/, /words/, /people/, and use the same file name for each person throughout the folders (tim.jpg, tim.txt, tim.dat).
Or use a small database to store everything in different tables, all related to your "family(_members)" table.
You can also store images in a database, usually as a blob (or base64-encoded, etc.... yuck).
I don't know how well iOS handles those SQLite data types, but you should be better off using a database for that.
You have a number of options here.
If you are storing all of the info within the application itself (i.e. the details aren't being fetched from the web somewhere), SQLite (as a CoreData backend) would probably be a good idea. Read up on using CoreData so that you don't end up reinventing the wheel, and so that your implementation provides the smooth scrolling experience that iPhone users expect.
The photos, however, need a different means of storage/retrieval.
A common technique is to implement a 2-level cache system. What this would entail is storing the pictures in individual files, but keeping some of them in-memory after they are retrieved for speed. You could then have a class that looks something like the following:
@interface ThumbnailManager : NSObject
{
    id<ImageCache> _imageCache; // You make this.
}
- (UIImage *)imageForFamilyMemberWithName:(NSString *)name;
@end
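One way to flesh that out, as a rough sketch: swap the id<ImageCache> placeholder for Foundation's NSCache as the in-memory level, and fall back to JPEG files on disk as the second level. The Documents/thumbnails/<name>.jpg path scheme is an assumption for the example.

@implementation ThumbnailManager
{
    NSCache *_memoryCache; // level 1: in-memory
}

- (instancetype)init
{
    if ((self = [super init])) {
        _memoryCache = [[NSCache alloc] init];
    }
    return self;
}

- (UIImage *)imageForFamilyMemberWithName:(NSString *)name
{
    UIImage *image = [_memoryCache objectForKey:name];
    if (image) {
        return image; // in-memory cache hit
    }

    // level 2: fall back to the file on disk
    NSString *documents = [NSSearchPathForDirectoriesInDomains(NSDocumentDirectory,
                                                               NSUserDomainMask, YES) firstObject];
    NSString *path = [[documents stringByAppendingPathComponent:@"thumbnails"]
                      stringByAppendingPathComponent:[name stringByAppendingPathExtension:@"jpg"]];
    image = [UIImage imageWithContentsOfFile:path];
    if (image) {
        [_memoryCache setObject:image forKey:name];
    }
    return image; // nil if no thumbnail exists yet
}
@end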
That's similar to something I would do in your position.
Good luck!

Objective-C best choice for saving data

I'm currently looking for the best way to save data in my iPhone application; data that will persist between opening and closing of the application. I've looked into archiving using an NSKeyedArchiver and I have been successful in making it work. However, I've noticed that if I try to save multiple objects, they keep getting overwritten every time I save. (Essentially, the user will be able to create a list of things he/she wants, save the list, create a few more lists, save them all, then be able to go back and select any of those lists to load at a future date.)
I've heard about SQLite, Core Data, or using .plists to store multiple arrays of data that will persist over time. Could someone point me in the best direction to save my data? Thanks!
Core Data is very powerful and easy to use once you get over the initial learning curve. Here's a good tutorial to get you started - clicky
As an easy and powerful alternative to CoreData, look into ActiveRecord for Objective-C. https://github.com/aptiva/activerecord
I'd go with NSKeyedArchiver. Sounds like the problem is you're not organizing your graph properly.
You technically have a list of lists, but you're only saving the inner-nested list.
You should be adding each list to a "super" list, and then archiving the super-list.
CoreData / SQL seems a bit much from what you described.
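A minimal sketch of that idea, assuming each user-created list is an NSArray; the contents and the lists.archive file name are made up for the example.

// Sketch: archive all of the user's lists inside one top-level "super" list.
NSString *documents = [NSSearchPathForDirectoriesInDomains(NSDocumentDirectory,
                                                           NSUserDomainMask, YES) firstObject];
NSString *path = [documents stringByAppendingPathComponent:@"lists.archive"];

NSArray *groceries = @[@"milk", @"eggs"];
NSArray *wishlist  = @[@"new phone"];
NSArray *superList = @[groceries, wishlist];          // the outer list is what gets archived

[NSKeyedArchiver archiveRootObject:superList toFile:path];             // saves everything at once
NSArray *restored = [NSKeyedUnarchiver unarchiveObjectWithFile:path];  // loads all lists back
NSLog(@"restored %lu lists", (unsigned long)restored.count);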
Also, you can try this framework. It's very simple and easy to use.
It's based on the ActiveRecord pattern and allows you to use migrations, relationships, validations, and more.
It uses sqlite3 only, without CoreData, but you don't need to write raw SQL or create tables manually.
Just describe your iActiveRecord and enjoy.
You want to check out this tutorial by Ray Wenderlich on getting started with CoreData. It's short and goes over the basics of CoreData.
Essentially you only want to look at plists if you have a small amount of data to store: a simple list of settings or preferences. Anything larger than that and it breaks down, specifically around performance. There is a great video on iTunes U where the developers at LinkedIn describe their performance metrics comparing plists and CoreData.
Archiving works, but it is going to be a lot of work to store and retrieve your data, and it puts the performance challenge on your back, so I wouldn't go there. I would use CoreData. It's extremely simple to get started with, and if you understand the objects in this Stack Overflow question then you know everything you need to get going.

global variables, arrays, and search results

I am trying to write an app that searches a website, and takes all of the results and puts them into a customized table. I am an Objective-C and iPhone SDK noob, and am hoping that this is the right logic for what I am trying to accomplish:
1) Searching multiple search engines and pulling all of the data off of each website, storing each into a different array (for example: Searching Google, Yahoo, and Bing for "Shoes", and taking all of the different search results, hyperlinks and all, and storing them into three different arrays)
2) Pulling the data out of each array, and putting into a table (Table view in Interface Builder)
I am assuming that I need to declare global variables, so that they can be accessed from different classes... right?
What's the syntax for doing this?
How do I set this up in IB?
Did I bite off more than I can chew for this first app?
Thanks for your help!
Aaron, I also think you're biting off more than you can chew WRT a single question on SO, but let me point you to a resource I wrote on a similar topic about how to structure your program.
As an Obj-C noob, you're going to need to take extra care to remember the Model-View-Controller pattern. Extracting data from a web site is a bit of work - and you want to keep that very separate from your display and control code.
Have a clean API model that extracts and sorts data, and have a clear view controller class that reads data from the API.
My advice is to write the whole app in pseudo-code first and try out your thinking on us.

optical character recognition of PDFs of parliamentary debates

For contract work, I need to digitize a lot of old, scanned, graphics-only plenary debate protocol PDFs from the Federal Parliament of Germany.
The problem is that most of these files have a two-column format:
Sample Protocol http://sert.homedns.org/img/btp12001.png
I would love to read your answers to the following questions:
How can I split the two columns before feeding them into OCR?
Which commercial or open-source OCR software or framework do you recommend, and why?
Please note that any tool, programming language, framework, etc. is fine. Don't hesitate to recommend esoteric products or libraries if you think they are cut out for the job ^__^!!
UPDATE: These documents have already been scanned by the parliament o_O: sample (same as the image above). There are lots of them, and I want to deliver on the contract ASAP, so I can't go fetch print copies of the same documents and cut and scan them myself. There are just too many of them.
Best Regards,
Cetin Sert
Cut the pages down the middle before you scan.
It depends on what OCR software you are using. A few years ago I did some work with an OCR API; I can't quite remember the name, but I think there are lots of alternatives. Anyway, this API allowed me to define regions on the page to OCR. If you always know roughly where the columns are, you could use an SDK to map out parts of the page.
I use OmniPage 17 for such things. It has a batch mode too, where you can put the documents into one folder, from which they are grabbed, and have the results put into another.
It auto-recognizes the layout, including columns, or you can set the default layout to columns.
You can set many options for how the output should look.
But try a demo first to see whether it works correctly. At the moment I have problems with ligatures in some of my documents, so words like "fliegen" come out as "fl iegen" and you have to fix the spelling yourself.
Take a look at http://www.wisetrend.com/wisetrend_ocr_cloud.shtml (an online REST API for OCR). It is based on the powerful ABBYY OCR engine. You can get a free account and try it with a few of your images to see if it handles the 2-column format (it should be able to). Also, there are a bunch of settings you can play with (see the API documentation); you may have to tweak some of them before it will work with 2 columns. Finally, as a solution of last resort, if the 2-column split is always in the same place, you can first write a program that splits the input image into two images (this shouldn't be very difficult using a standard image processing library), and then feed the resulting images to the OCR process.
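If you end up doing that split yourself, here is a minimal sketch using ImageIO/Core Graphics on a Mac, assuming each PDF page has already been rendered to a raster image; the 50/50 split position and the file names are assumptions for the example.

#import <Foundation/Foundation.h>
#import <CoreGraphics/CoreGraphics.h>
#import <ImageIO/ImageIO.h>
#import <CoreServices/CoreServices.h> // kUTTypePNG

// Sketch: crop one half of the page and write it out as a PNG for the OCR step.
static void WriteHalf(CGImageRef page, CGRect crop, NSString *outPath)
{
    CGImageRef half = CGImageCreateWithImageInRect(page, crop);
    NSURL *outURL = [NSURL fileURLWithPath:outPath];
    CGImageDestinationRef dest =
        CGImageDestinationCreateWithURL((__bridge CFURLRef)outURL, kUTTypePNG, 1, NULL);
    CGImageDestinationAddImage(dest, half, NULL);
    CGImageDestinationFinalize(dest);
    CFRelease(dest);
    CGImageRelease(half);
}

int main(int argc, const char *argv[])
{
    @autoreleasepool {
        NSURL *inURL = [NSURL fileURLWithPath:@"page.png"]; // one rendered page of the PDF
        CGImageSourceRef source = CGImageSourceCreateWithURL((__bridge CFURLRef)inURL, NULL);
        CGImageRef page = CGImageSourceCreateImageAtIndex(source, 0, NULL);

        size_t width  = CGImageGetWidth(page);
        size_t height = CGImageGetHeight(page);

        // Naive vertical split down the middle; adjust if the gutter is elsewhere.
        WriteHalf(page, CGRectMake(0, 0, width / 2.0, height), @"left.png");
        WriteHalf(page, CGRectMake(width / 2.0, 0, width / 2.0, height), @"right.png");

        CGImageRelease(page);
        CFRelease(source);
    }
    return 0;
}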