Search database using sqlite versus NSPredicate - sql

I'm building a database in SQLite with multiple tables. It will work like a tag-based search where CARS are compared based on how many TAGS match between them. There will also be one layer used to categorize items, called MANUFACTURER. So a typical use case would be: the user selects MANUFACTURER1 (let's say Ford) as an input and MANUFACTURER2 (let's say Toyota) as an output, enters a CAR [the database compares TAGS to CARS between the two MANUFACTURERS], and fetches a CAR recommendation from MANUFACTURER2. I am using Core Data with entities for each, but this does not involve newly created objects, just what's in the original SQLite database.
My question is - is it better to generate the search with SQLite code, or NSPredicate/NSCompoundPredicate? Are there performance differences?

If you use Core Data with a SQLite store, an NSFetchRequest with a specific predicate is resolved at the SQL level, so you don't need to add anything to it.
Core Data abstracts this for you. If you use Core Data, you cannot run your own queries against its store. Just stick with NSFetchRequests and NSPredicates.
Maybe what you need is to import the database you have into the actual Core Data store.
Maybe I don't understand your question, but what's your goal?
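For reference, here is a rough sketch of what the tag-overlap search from the question looks like as plain SQL. This is not what Core Data generates; it only illustrates the raw SQLite side of the comparison. The table and column names (manufacturer, car, car_tag) and the sample rows are made up for illustration, and Python's sqlite3 module is used only to make the query runnable.

```python
import sqlite3

# Hypothetical schema: names and sample data are assumptions, not from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE manufacturer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE car (id INTEGER PRIMARY KEY, name TEXT,
                  manufacturer_id INTEGER REFERENCES manufacturer(id));
CREATE TABLE car_tag (car_id INTEGER REFERENCES car(id), tag TEXT);
INSERT INTO manufacturer VALUES (1, 'Ford'), (2, 'Toyota');
INSERT INTO car VALUES (1, 'Focus', 1), (2, 'Corolla', 2), (3, 'Tundra', 2);
INSERT INTO car_tag VALUES
  (1, 'compact'), (1, 'hatchback'), (1, 'economy'),
  (2, 'compact'), (2, 'economy'), (2, 'sedan'),
  (3, 'truck'), (3, 'economy');
""")

def recommend(conn, car_name, target_manufacturer):
    """Rank the target manufacturer's cars by how many tags they share with car_name."""
    sql = """
    SELECT other.name, COUNT(*) AS shared_tags
    FROM car AS src
    JOIN car_tag AS src_tag ON src_tag.car_id = src.id
    JOIN car_tag AS other_tag ON other_tag.tag = src_tag.tag
    JOIN car AS other ON other.id = other_tag.car_id
    JOIN manufacturer AS m ON m.id = other.manufacturer_id
    WHERE src.name = ? AND m.name = ?
    GROUP BY other.id
    ORDER BY shared_tags DESC, other.name
    """
    return conn.execute(sql, (car_name, target_manufacturer)).fetchall()

results = recommend(conn, "Focus", "Toyota")
print(results)
```

With a query like this, the join and the count both happen inside SQLite rather than in application code.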

Related

Automatically connect SQL tables based on keys

Is there a method to automatically join tables that have a primary-to-foreign-key relationship, rather than designating the join on those values?
The out and out answer is "no" - no RDBMS I know of will allow you to get away with not specifying columns in an ON clause intended to join two tables in a non-cartesian fashion, but it might not matter...
...because typically multi-tier applications these days are built with data access libraries that DO take into account the relationships defined in a database. Picking on something like Entity Framework: if your database exists already, you can scaffold a context in EF from it, and it will make a set of objects that obey the relationships on the front-end code side of things.
Technically, you'll never write an ON clause yourself, because if you say something to EF like:
context.Customers.First(c => c.id == 1) //this finds a customer
.Orders //this gets all the customer's orders
.Where(o => o.date > DateTime.UtcNow.AddMonths(-1)); //this filters the orders
You've got all the orders raised by customer id 1 in the last month without writing a single ON clause yourself. EF has written it behind the scenes, but in the spirit of your question, where tables are related by a defined relationship, we've used a framework that uses that relationship to join the data for whatever purpose the front end puts it to. All you have to do is use a data access library that does this, if you have an aversion to writing ON clauses yourself :)
It's a virtual certainty that there will be some similar ORM/mapping/data access library for your front-end language of choice; I just picked on EF in C# because it's what I know. If you're scouting out what's available, google for {language of choice} ORM (if you're using an OO language). You mentioned Python; SQLAlchemy seems to be a popular one (but note, SO answers are not for recommending particular software).
If you mean can you write a JOIN at query time that doesn't need an ON clause, then no.
There is no way to do this in SQL Server.
I am not sure if you are aware of dbForge; it may help. It recognises joinable tables automatically in the following cases:
The database contains information that specifies that the tables are related.
If two columns, one in each table, have the same name and data type.
dbForge Studio detects that a search condition (e.g. the WHERE clause) is actually a join condition.
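To illustrate the first case above (the database itself records that the tables are related), here is a minimal sketch of how a tool could read the declared foreign key and write the ON clause for you. It uses SQLite as a stand-in, not SQL Server, and build_join is a hypothetical helper invented for this example. The only real API involved is SQLite's PRAGMA foreign_key_list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY,
                     customer_id INTEGER REFERENCES customers(id),
                     total REAL);
INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 5.0);
""")

def build_join(conn, child):
    """Hypothetical helper: derive a JOIN from the first foreign key declared on `child`.

    The table name is interpolated into the SQL text, so pass trusted names only.
    """
    # Each row is: id, seq, table, from, to, on_update, on_delete, match
    fks = conn.execute(f"PRAGMA foreign_key_list({child})").fetchall()
    if not fks:
        raise ValueError(f"{child} declares no foreign keys")
    _, _, parent, from_col, to_col = fks[0][:5]
    return (f"SELECT * FROM {child} JOIN {parent} "
            f"ON {child}.{from_col} = {parent}.{to_col}")

join_sql = build_join(conn, "orders")
rows = conn.execute(join_sql).fetchall()
print(join_sql)
```

The ON clause still exists in the generated statement; it has just been written by code from the schema metadata, which is essentially what an ORM or a query designer does.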

With Core Data, should I use one entity or split into smaller ones?

For example, I have attributes for an entity like this:
category_id
data
I don't have complicated operations. I only access the records separately by category_id == X. I am wondering if I should use one entity to store all records or split it into N entities where each one represents one category? Should I just consider and design it just like a SQL database?
This question is not really appropriate for SO (too broad and opinion-based), but here goes...
Data modeling should generally be independent of the underlying representation. If different categories are different entities, model them that way. It sounds to me like this is not the case, and if that's so, leave them as one entity.

SQL vs NoSQL for data that will be presented to a user after multiple filters have been added

I am about to embark on a project for work that is very outside my normal scope of duties. As a SQL DBA, my initial inclination was to approach the project using a SQL database but the more I learn about NoSQL, the more I believe that it might be the better option. I was hoping that I could use this question to describe the project at a high level to get some feedback on the pros and cons of using each option.
The project is relatively straightforward. I have a set of objects that have various attributes. Some of these attributes are common to all objects whereas some are common only to a subset of the objects. What I am tasked with building is a service where the user chooses a series of filters that are based on the attributes of an object and then is returned a list of objects that matches all^ of the filters. When the user selects a filter, he or she may be filtering on a common or subset attribute but that is abstracted on the front end.
^ There is a chance, depending on user feedback, that the list of objects may match only some of the filters and the quality of the match will be displayed to the user through a score that indicates how many of the criteria were matched.
After watching this talk by Martin Fowler (http://www.youtube.com/watch?v=qI_g07C_Q5I), it would seem that a document-style NoSQL database should suit my needs but given that I have no experience with this approach, it is also possible that I am missing something obvious.
Some additional information - The database will initially have about 5,000 objects with each object containing 10 to 50 attributes but the number of objects will definitely grow over time and the number of attributes could grow depending on user feedback. In addition, I am hoping to have the ability to make rapid changes to the product as I get user feedback so flexibility is very important.
Any feedback would be very much appreciated and I would be happy to provide more information if I have left anything critical out of my discussion. Thanks.
This problem can be solved by using two separate pieces of technology. The first is a relatively well-designed database schema in a modern RDBMS. By modeling the application using the usual principles of normalization, you'll get really good response times from storage for individual CRUD statements.
Searching this schema, as you've surmised, is going to be a nightmare at scale. Don't do it. Instead look into using Solr/Lucene as your full text search engine. Solr's support for dynamic fields means you can add new properties to your documents/objects on the fly and immediately have the ability to search inside your data if you have designed your Solr schema correctly.
I'm not an expert in NoSQL, so I will not be advocating it. However, I have few points that can help you address your questions regarding the relational database structure.
The first thing I see right away is that you are talking about inheritance (at least conceptually). Your objects inherit from each other, so derived objects have additional attributes. Say you are adding a new type of object: the first thing you need to do (conceptually) is find a base/super (parent) object type for it that has a subset of its attributes, and then add the rest on top (extending the base object type).
Once you get used to thinking like said above, next thing is about inheritance mapping patterns for relational databases. I'll steal terms from Martin Fowler to describe it here.
You can hold inheritance chain in the database by following one of the 3 ways:
1 - Single table inheritance: Whole inheritance chain is in one table. So, all new types of objects go into the same table.
Advantages: your search query has only one table to search, and it must be faster than a join for example.
Disadvantages: table grows faster than with option 2 for example; you have to add a type column that says what type of object is the row; some rows have empty columns because they belong to other types of objects.
2 - Concrete table inheritance: Separate table for each new type of object.
Advantages: if search affects only one type, you search only one table at a time; each table grows slower than in option 1 for example.
Disadvantages: you need to use union of queries if searching several types at the same time.
3 - Class table inheritance: One table for the base type object with its attributes only, additional tables with additional attributes for each child object type. So, child tables refer to the base table with PK/FK relations.
Advantages: all types are present in the base table, so it is easy to search them all together using common attributes.
Disadvantages: base table grows fast because it contains part of child tables too; you need to use join to search all types of objects with all attributes.
Which one to choose?
It's a trade-off obviously. If you expect to have many types of objects added, I would go with Concrete table inheritance that gives reasonable query and scaling options. Class table inheritance seems to be not very friendly with fast queries and scalability. Single table inheritance seems to work with small number of types better.
Your call, my friend!
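As a rough sketch of how two of these options look in practice, here are single table inheritance (option 1) and class table inheritance (option 3) side by side in SQLite. The item types (laptop, book), their attributes, and all table names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Option 1, single table inheritance: one table, a type column,
-- and NULLs in the columns that belong to other types.
CREATE TABLE item_single (
    id INTEGER PRIMARY KEY,
    type TEXT NOT NULL,
    name TEXT,
    screen_size REAL,    -- only meaningful when type = 'laptop'
    page_count INTEGER   -- only meaningful when type = 'book'
);
INSERT INTO item_single VALUES (1, 'laptop', 'UltraBook', 13.3, NULL),
                               (2, 'book', 'SQL Basics', NULL, 320);

-- Option 3, class table inheritance: a base table with the common
-- attributes plus one child table per type, linked by PK/FK.
CREATE TABLE item_base (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE item_laptop (id INTEGER PRIMARY KEY REFERENCES item_base(id),
                          screen_size REAL);
CREATE TABLE item_book (id INTEGER PRIMARY KEY REFERENCES item_base(id),
                        page_count INTEGER);
INSERT INTO item_base VALUES (1, 'UltraBook'), (2, 'SQL Basics');
INSERT INTO item_laptop VALUES (1, 13.3);
INSERT INTO item_book VALUES (2, 320);
""")

# Same filter under both layouts: laptops with a screen under 14 inches.
single = conn.execute(
    "SELECT name FROM item_single WHERE type = 'laptop' AND screen_size < 14"
).fetchall()
class_table = conn.execute(
    "SELECT b.name FROM item_base AS b "
    "JOIN item_laptop AS l ON l.id = b.id WHERE l.screen_size < 14"
).fetchall()
print(single, class_table)
```

Concrete table inheritance (option 2) would look like the child tables above with the name column copied into each of them and no base table, so a search across types becomes a UNION of one query per table.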
May as well make this an answer. I should comment that I'm not strong in NoSQL, so I tend to lean towards SQL.
I'd do this as a three-table set. You will see it referred to as entity-value-pair logic on the web... it's a way of handling multiple dynamic attributes for items. Let's say you have a bunch of products and each one has a few attributes.
Prd 1 - a,b,c
Prd 2 - a,d,e,f
Prd 3 - a,b,d,g
Prd 4 - a,c,d,e,f
So here are 4 products and 7 attributes... the same theory will work for hundreds of products and thousands of attributes. The standard way of holding this in one table requires the product info along with 7 columns to store the data (in this setup more than a third of them are null). A new attribute means altering the table to add another column and coming up with a script to populate it for existing rows, or just leaving it null for all of them. Not the most fun, and it can be a headache.
The alternative to this is a name-value pair setup. You want a 'header' table to hold the common values amongst your products (like name or price... things that all products always have). In our example above, you will notice that attribute 'a' is used on every record... this does mean attribute a can be a part of the header table as well. We'll call the key column here 'header_id'.
The second table is a reference table that simply stores the attributes that can be assigned to each product and assigns an ID to each. We'll call the table attribute, with attr_id for a key. Rather straightforward: each attribute above will be one row.
Quick example:
attr_id, attribute_name, notes
1,b, the length of time the product takes to install
2,c, spare part required
etc...
It's just a list of all of your attributes and what that attribute means. In the future, you will be adding a row to this table to open up a new attribute for each header.
Final table is a mapping table that actually holds the info. You will have your product id, the attribute id, and then the value. Normally called the detail table:
prd1, b, 5 mins
prd1, c, needs spare jack
prd2, d, 'misc text'
prd3, b, 15 mins
See how the data is stored as product key, value label, value? Any future product added can have any combination of any attributes stored in this table. Adding new attributes is adding a new line to the attribute table and then populating the details table as needed.
I believe there is a wiki for it too... http://en.wikipedia.org/wiki/Entity-attribute-value_model
After this, it's simply figuring out the best methodology to pivot out your data (I'd recommend Postgres as an opensource db option here)
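Here is a minimal sketch of the three-table layout described above, using the sample detail rows from this answer. It runs on SQLite via Python rather than Postgres, purely so it is self-contained. The table names header, attribute and detail follow the text; the product_name column and the match_scores helper are assumptions added for illustration. The pivot uses plain conditional aggregation, and the second query counts how many of the requested attributes each product has, which is one way to get the "quality of match" score mentioned in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE header (header_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE attribute (attr_id INTEGER PRIMARY KEY, attribute_name TEXT, notes TEXT);
CREATE TABLE detail (header_id INTEGER REFERENCES header(header_id),
                     attr_id INTEGER REFERENCES attribute(attr_id),
                     value TEXT);
INSERT INTO header VALUES (1, 'prd1'), (2, 'prd2'), (3, 'prd3');
INSERT INTO attribute VALUES
  (1, 'b', 'the length of time the product takes to install'),
  (2, 'c', 'spare part required'),
  (3, 'd', NULL);
INSERT INTO detail VALUES
  (1, 1, '5 mins'), (1, 2, 'needs spare jack'),
  (2, 3, 'misc text'), (3, 1, '15 mins');
""")

# Pivot: one row per product, one column per attribute (NULL where absent).
pivot_sql = """
SELECT h.product_name,
       MAX(CASE WHEN a.attribute_name = 'b' THEN det.value END) AS attr_b,
       MAX(CASE WHEN a.attribute_name = 'c' THEN det.value END) AS attr_c,
       MAX(CASE WHEN a.attribute_name = 'd' THEN det.value END) AS attr_d
FROM header AS h
LEFT JOIN detail AS det ON det.header_id = h.header_id
LEFT JOIN attribute AS a ON a.attr_id = det.attr_id
GROUP BY h.header_id
ORDER BY h.header_id
"""
pivot = conn.execute(pivot_sql).fetchall()

def match_scores(conn, wanted):
    """Hypothetical helper: count how many of the wanted attributes each product has."""
    placeholders = ",".join("?" for _ in wanted)
    sql = f"""
    SELECT h.product_name, COUNT(*) AS matched
    FROM header AS h
    JOIN detail AS det ON det.header_id = h.header_id
    JOIN attribute AS a ON a.attr_id = det.attr_id
    WHERE a.attribute_name IN ({placeholders})
    GROUP BY h.header_id
    ORDER BY matched DESC, h.product_name
    """
    return conn.execute(sql, list(wanted)).fetchall()

scores = match_scores(conn, ["b", "c"])
print(pivot, scores)
```

Note that the pivot has to name each attribute explicitly, so a new attribute means a new CASE expression; that is the usual price of this layout.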

Many to Many relationship for single entity

I'm currently writing my first project using core data, and am having trouble working out how to query the relationship between some of my data.
In sql language, i have a Country table, which joins to a CountryLink M-M table containing the following fields:
countryId1
countryId2
bearing
What would be the correct way to model this in Core Data?
So far I have set up a single Country entity and a CountryLink entity (containing only a bearing field) and have added two 1-to-many relationships from Country to CountryLink ('CountryLink1' and 'CountryLink2').
I've run the project and looked at the SQLite db structure produced by Core Data (found here, using this sqlite gui), and the M-M join table seems correct (it contains the bearing, CountryLink1 and CountryLink2 fields), but I'm not sure how I would go about carrying out a fetch request for a single Country NSManagedObject to return an array of related Countries and their bearings.
Any help or related links would be much appreciated.
Thanks, Ted
First a word of warning:
Core Data is not SQL. Entities are not tables. Objects are not rows. Columns are not attributes. Core Data is an object graph management system that may or may not persist the object graph and may or may not use SQL far behind the scenes to do so. Trying to think of Core Data in SQL terms will cause you to completely misunderstand Core Data and result in much grief and wasted time.
See the Tequilla advice
Now, forgetting SQL and thinking in object graphs, your entities would look something like this:
Country{
someAttribute:string // or whatever
countryLinks<-->>CountryLink.country
}
CountryLink{
countryID1:string // or whatever
countryID2:string // or whatever
country<<-->Country.countryLinks
}
As you add Country and CountryLink objects, you add them to the relationships as needed. Then, to find the CountryLink objects related to a specific Country object, you perform a fetch on the Country entity for Country objects matching some criteria. Once you have that object, you simply ask it for the CountryLink objects in its countryLinks relationship. And you're done.
The important thing to remember here is that entities in combination with managedObjects are intended to model real-world objects, conditions or events and the relationship between the same. e.g. a person and his cars. SQL doesn't really model or simulate, it just stores.
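To make the "walk the object graph" idea concrete, here is a small sketch using plain Python classes. This is not Core Data and does not use its API; it only shows the traversal pattern of finding one Country and then following its countryLinks. The bearing attribute comes from the question, while the neighbor reference and the link helper are assumptions added so that the other end of each link is reachable.

```python
from dataclasses import dataclass, field

# eq=False keeps identity comparison, which avoids recursing through the cycle
# Country -> CountryLink -> Country when two objects are compared.
@dataclass(eq=False)
class Country:
    name: str
    country_links: list = field(default_factory=list)  # to-many relationship

@dataclass(eq=False)
class CountryLink:
    bearing: float
    country: Country    # inverse of Country.country_links
    neighbor: Country   # assumed second relationship: the country being pointed at

def link(origin, neighbor, bearing):
    """Hypothetical helper: create a link and register it on the owning country."""
    new_link = CountryLink(bearing=bearing, country=origin, neighbor=neighbor)
    origin.country_links.append(new_link)
    return new_link

france, spain, italy = Country("France"), Country("Spain"), Country("Italy")
link(france, spain, 225.0)
link(france, italy, 135.0)

# "Fetch" one country, then just follow its relationship; no join is written.
neighbors = [(l.neighbor.name, l.bearing) for l in france.country_links]
print(neighbors)
```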

Simulating variable column names in sqlite

I want to store entries (a set of key=>value pairs) in a database, but the keys vary from entry to entry.
I thought of storing with two tables, (1) of the keys for each entry and (2) of the values of specific keys for each entry, where entries share a common id field in both tables, but I am not sure how to pull entries as a key=>value pairs in sql with this sort of configuration.
Is there a better method? If this is not possible in sqlite, is it possible in mysql? Thanks!
It sounds like you are looking for the Entity-Attribute-Value model.
Alternatives are to create different tables for different types of entities, or to have a table with a column for every possible key and set the value to NULL for entities that don't have that key.
You might want to take a look at Bill Karwin's presentation SQL Antipatterns where he covers some of the pros and cons of the EAV model and suggests possible alternatives. The relevant part starts from slide 16.
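If you do go the EAV route, pulling entries back out as key=>value pairs is mostly a matter of grouping rows in application code. Below is a minimal sketch with SQLite, using a single entry_value table instead of the two tables proposed in the question. The table name, column names, sample data and the load_entries helper are all made up for illustration.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry_value (
    entry_id INTEGER NOT NULL,
    key_name TEXT NOT NULL,
    value TEXT,
    PRIMARY KEY (entry_id, key_name)   -- one value per key per entry
);
INSERT INTO entry_value VALUES
  (1, 'color', 'red'), (1, 'size', 'L'),
  (2, 'color', 'blue'), (2, 'weight', '3kg');
""")

def load_entries(conn):
    """Hypothetical helper: rebuild each entry as a dict of its key/value rows."""
    entries = defaultdict(dict)
    rows = conn.execute(
        "SELECT entry_id, key_name, value FROM entry_value "
        "ORDER BY entry_id, key_name"
    )
    for entry_id, key_name, value in rows:
        entries[entry_id][key_name] = value
    return dict(entries)

entries = load_entries(conn)
print(entries)
```

Every value is stored as TEXT here, which is one of the trade-offs of this model: the database can no longer enforce a type per key.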
@Mark Byers is right, this is the EAV model. You should read Bad CaRMa before you go down that dark path. It's a story of how this database design practically destroyed a company.
In a relational database, every row in a relation must include the same columns. That's part of the definition for a relation. This is true in SQLite, MySQL, or any other relational database.
Also see my presentation Practical Object-Oriented Models in SQL or my book SQL Antipatterns, in which I show the problems caused by the EAV model.
If you need variable columns per entity, you need a non-relational database. There are document-oriented databases like CouchDB or MongoDB that are catching on in popularity.
Or try Berkeley DB if you want an embeddable single-user solution like SQLite.