SQLite: Single table with 20 columns vs 20 key-value tables

I'm developing a system that manages objects consisting of components. What would be the best way to store them in an SQLite database from a performance point of view? There are 20 component types,
and each component is a blob of 1-10 KB. Typically each object consists of 4-6 different components.
I can see two options:
Implement it as one table with a key and 20 blob columns
Use 20 tables, each with a key and a single blob column
The only queries I will make to the database are: get component data by id, write data, and remove data.
PS: the object class looks like this:
class Entity
{
    Component *components[20];
};
usually the components array has 4-6 non-null pointers

You will probably want an Entity-Attribute-Value (EAV) structure to store the BLOBs.
CREATE TABLE myObjectComponents (
    objectID INTEGER,         -- Entity
    componentTypeID INTEGER,  -- Attribute
    componentBLOB BLOB,       -- Value
    PRIMARY KEY (objectID, componentTypeID)
);
You can then also add a traditional "myObject" table with other non-blob values (such as its identity column, owner, name, creation and modified timestamps, etc.), and enforce integrity with foreign key constraints.
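For illustration, a minimal sketch of such a parent table and the matching constraint (the non-key columns here are just examples, not something from the question):

CREATE TABLE myObjects (
    objectID   INTEGER PRIMARY KEY,
    name       TEXT,
    createdAt  TEXT,
    modifiedAt TEXT
);

-- and, inside myObjectComponents, the corresponding constraint:
--   FOREIGN KEY (objectID) REFERENCES myObjects (objectID)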
EAV tables are very flexible and good for fast look-up of the Value column.
They're very poor in the other direction; "given a Value (or combination of values), which Entities have it?" But you don't seem likely to be searching a BLOB field.
You may want to read more about the merits and disadvantages of EAV; there are plenty of references online.
The benefit of this structure in your case is that each row only has one BLOB and (possibly more importantly) it isn't sparsely populated; you won't have rows with capacity for 20 BLOBs of which you only use, for example, four. This makes it easier to move the relevant rows around in memory.
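With that layout, the three access patterns from the question map onto simple single-row statements, for example:

-- get one component of an object
SELECT componentBLOB
  FROM myObjectComponents
 WHERE objectID = ? AND componentTypeID = ?;

-- write (insert or overwrite) a component
INSERT OR REPLACE INTO myObjectComponents (objectID, componentTypeID, componentBLOB)
VALUES (?, ?, ?);

-- remove a component
DELETE FROM myObjectComponents
 WHERE objectID = ? AND componentTypeID = ?;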

Related

Resolve related entities by unique string column value (SQL, Entity Framework Core)

While refactoring the code for a project I am currently working on, I have been wondering whether a certain kind of entity insert/update into an SQL database could be done more elegantly.
TLDR of the module I am working on: a large amount of data is regularly synchronized from a weakly structured data store (Google Sheets) into a relational database. The source store is database-agnostic and contains no database IDs or foreign keys. The database (EF Core, code first) uses standard incremental integer IDs and foreign keys.
My problem: The sheets in the Google Sheets document largely resemble the database tables. Related entities are referenced by a (by design) unique string value ("Name"). These natural names need to be resolved to the artificial foreign keys when inserting into and updating the database. So far, this process is done by loading the IDs and names of all related entities into memory, finding the matching entity, and then saving the dependent entity with the found foreign keys to the database.
While this process works well, it requires either many more database round trips (if the related entities are looked up one by one) or a decent amount of memory to keep all [Name, Id] pairs in memory, and it also renders the database index on the name column useless, since the searching is done in memory.
As an example, an excerpt from my data model:
Table Items, with columns
Id (Primary Key, unique, clustered index)
Name (String, unique, indexed)
Table Locations, with columns
Id (Primary Key, unique, clustered index)
Name (String, unique, indexed)
Table PlacedItems (Specific Items, which are placed in a specific location), with columns:
Id (Primary Key, unique, clustered index)
LocationId (Foreign Key referencing Locations, indexed)
ItemId (Foreign Key referencing Items, indexed)
The Google Sheet which I import has the following sheets:
Sheet items, with column "Name" (string)
Sheet locations, with column "Name" (string)
Sheet placed_items, with columns "LocationName" (string) and "ItemName" (string)
My question:
Assuming all Items and Locations have been imported and have been given auto-incremented IDs, how do I most elegantly / efficiently insert all the Placed Items?
The most intriguing kind of solution would be some way to resolve the names to IDs entirely on the database side, in a single round trip, without having to load any item / location data into memory.
To my naïve self it seems like this is the type of problem that I am not the first to try to solve, and that there is already a smart solution that I am just not seeing.
Additional points
I'd like to stick with the artificial IDs in the database. While the strings are unique, they are values designed to be edited by humans, so readability is the prime concern for these strings, not any kind of efficiency as with the DB-generated IDs. When doing reads and any further operations on the data, the IDs are used as the primary source for identity and uniqueness of records.
I don't expect Entity Framework Core to have a solution to this problem (but surprise me if it has!), so I am also very interested in solutions using raw SQL, stored procedures, and similar.
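To make the raw-SQL idea concrete, here is a rough sketch of what a single-round-trip resolution could look like, assuming the tables above and a hypothetical staging table #sheetRows holding the raw rows from the placed_items sheet:

-- #sheetRows(LocationName, ItemName) is a hypothetical staging table / table-valued parameter
INSERT INTO PlacedItems (LocationId, ItemId)
SELECT l.Id, i.Id
FROM #sheetRows s
JOIN Locations l ON l.Name = s.LocationName
JOIN Items i ON i.Name = s.ItemName;

This keeps the name-to-ID resolution on the database side and lets the indexes on the Name columns do the work.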

Is json suitable for holding values coming from dynamic columns

Is json suitable for holding values coming from dynamic columns?
I want to store data about documents where the structure is known in advance, e.g. Name, Year, Place, etc. For now, I have defined these fields as columns in a table in MS SQL.
However, I would also like to store data about "dynamic" documents, where the user himself selects a field from a certain pool and inserts a value. For now, I store this data as JSON in one column. For example, one time it will be something like this: '{"Name":"John Doe","Place":"Chicago"}'
and other times it can be only: '{"Name": "John Doe"}'
I wonder how to unify this data. I thought that data from the first type of documents (with a known number of columns) should also be stored as JSON. But I don't know if this is a good approach with large amounts of data, e.g. 100,000 records.
First, 100,000 rows is not a large number of rows. It is doubtful that you are even talking about a gigabyte of data.
Both XML and JSON incur overhead for storing field names in the data. If you have lots of repeated field names, then lots of redundant field names are being stored. And bigger rows slow down queries.
JSON and XML also have challenges in verifying the field names. This can be handled through the application or constraints. That said, they can be quite useful when the attributes are rarely used.
Your sample data simply suggests NULLable columns. You can have name and place; if there is no place, then the value is NULL. Given that you have a fixed pool, there is a good chance that this structure is the simplest and most efficient. The only downside is that adding a new field requires adding a column to the table, and that can be an expensive operation.
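As a rough sketch of that nullable-column option, using the fields from the sample JSON (the table name and types are my assumptions):

create table documents (
    documentId int identity(1, 1) primary key,
    name varchar(255) not null,
    place varchar(255) null  -- NULL when the document has no place value
);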
An alternative is an EAV model, which is mentioned in the comments. This solves the problem of repeating the names, because you can use ids instead. So, you could have:
create table optionalFields (
    optionalFieldId int identity(1, 1) primary key,
    name varchar(255)
);
create table userOptionalFields (
    userOptionalFieldId int identity(1, 1) primary key,
    userId int references users(userId),
    optionalFieldId int references optionalFields(optionalFieldId),
    value varchar(255)
);
The downside to an EAV model is that it is simplest when all the values are strings, and that can be a little tricky if some of the values are numbers or dates. On the positive side, the database ensures that the fields are valid.
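Reading one user's optional fields back out of that structure is then a single join, for example:

select f.name, uof.value
from userOptionalFields uof
join optionalFields f on f.optionalFieldId = uof.optionalFieldId
where uof.userId = @userId;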
The choice between the different data models depends on factors such as:
The data type of the values.
The total number of fields.
How often new fields are added.
Whether the fields (for a given user) are updated and if so, if they are updated one-by-one or all-at-once.
How common fields are for a given user.
How familiar you are with XML and JSON.
Whether field names have synonyms (for instance, "FullName" for "Name").
Whether field names ever change. For instance, might "Name" suddenly become "FullName"?
And no doubt other issues as well.

How do I structure a generic item that can have a relationship with different tables?

In my example, I have a watch, which is an indication that a user wants notifications about events on a different item, say a group or an organization.
I see two ways to do this:
Have a groupwatch resource, with a groupwatch table with id, user, group (group is an FK to the group resource and table); and an orgwatch resource, with an orgwatch table with id, user, organization (organization is an FK to the organization resource and table)
Have a generic watch resource, with a watch table, with id,user,type,typeid. type is one of group or organization, and typeid is the ID of the group or organization being watched.
Since both of them are watches, it seems a waste to have two different tables and resources to watch 2 different objects. It gets worse if I start watching 4, 5, 6, 20, 50 different types of resources.
On the other hand, a foreign key relationship appears impossible if I just have a generic typeid, which means that my database (if relational) and my framework (activerecord or anything else) cannot enforce it correctly.
How do I best implement this type of "association to different types of record/table for each record in my table"?
UPDATE:
Are my only choices for doing this:
separate tables/resources for each watch type, which enables the database to enforce relational integrity and do joins
single table for all watches, but I will have to enforce relational integrity and do joins at the app level?
If you add a new type of resource once every six months, you may want to define your tables in such a way that adding new resources involves changing data definitions. If you add a new resource type every week, you may want to make your data definitions stay the same when you add new types. There's a downside to either choice.
If you do choose to define tables in such a way that the types are visible in the table structure, there are two patterns often used in type/subtype (aka class/subclass) situations.
One pattern has been called "single table inheritance". Put data about all the types in a single table, and leave some columns NULL wherever they do not apply.
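A rough sketch of single table inheritance for the watch example (all table and column names here are illustrative): one watches table with a nullable foreign key per watched resource type, only one of which is filled in per row.

CREATE TABLE watches (
    id              INTEGER PRIMARY KEY,
    user_id         INTEGER NOT NULL REFERENCES users (id),
    group_id        INTEGER REFERENCES groups (id),          -- NULL unless watching a group
    organization_id INTEGER REFERENCES organizations (id)    -- NULL unless watching an organization
);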
Another pattern has been called "class table inheritance". Define one table for the superclass, with all the data that is common to all the types. Then define tables for each subtype (subclass) to contain class specific data. Make the primary key of the subtype tables a duplicate of the primary key in the supertype table, and also declare it as a foreign key that references the primary key of the supertype table. It's going to be up to the app, at insert time, to replicate the value of the primary key in the supertype table over in the subtype table.
I like Fowler's treatment of these two patterns:
http://martinfowler.com/eaaCatalog/classTableInheritance.html
http://www.martinfowler.com/eaaCatalog/singleTableInheritance.html
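And a rough sketch of class table inheritance for the same example, again with illustrative names: a supertype table for the common columns and one subtype table per watched resource, sharing the supertype's primary key.

CREATE TABLE watches (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users (id)
);

CREATE TABLE group_watches (
    watch_id INTEGER PRIMARY KEY REFERENCES watches (id),  -- shared primary key
    group_id INTEGER NOT NULL REFERENCES groups (id)
);

CREATE TABLE org_watches (
    watch_id        INTEGER PRIMARY KEY REFERENCES watches (id),  -- shared primary key
    organization_id INTEGER NOT NULL REFERENCES organizations (id)
);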
This matter of sharing primary keys has a few beneficial effects.
First, it enforces the one-to-one nature of the IS-A relationships.
Second, it makes it easy to find out whether a given entry belongs to a desired subtype, by just joining with the subtype table. You don't really need an extra type field.
Third, it speeds up the joins, because of the index that gets built when you declare a primary key.
If you want a structure that can adapt to new attributes without changing data definitions, you can look into E-A-V design. Be careful, though. Sometimes this results in data that is nearly impossible to use, because the logical structure is so obscure. I usually think of E-A-V as an anti-pattern for this reason, although there are some who really like the results they get from it.

What is the best way to store categorical references in SQL tables?

I want to store a wide array of categorical data in MySQL database tables. Let's say, for instance, that I want to store information on "widgets" and categorize their attributes in certain ways, e.g. by shape.
For instance, the widgets could be classified as: round, square, triangular, spherical, etc.
Should these categories be stored within a table so they can best be referenced from the application? Another possibility, I would imagine, would be to add a shape column to widgets that contains a tiny int. That way my application could search shapes by that value and use a corresponding enum type that maps the meanings of the shape ints.
Which would be best? Or is there another solution that I'm not thinking of yet?
Define a category table for each attribute grouping, i.e.:
WIDGET_SHAPE_TYPE_CODES
WIDGET_SHAPE_TYPE_CODE (primary key)
DESCRIPTION
Then use a foreign key reference in the WIDGETS table:
WIDGETS
WIDGET_ID (primary key)
...
WIDGET_SHAPE_TYPE_CODE (foreign key)
This has the benefit of being portable to other databases, and more obvious relationships which means simpler maintenance.
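A minimal MySQL version of that layout might look like this (column types are my assumptions):

CREATE TABLE WIDGET_SHAPE_TYPE_CODES (
    WIDGET_SHAPE_TYPE_CODE TINYINT UNSIGNED PRIMARY KEY,
    DESCRIPTION            VARCHAR(100) NOT NULL
);

CREATE TABLE WIDGETS (
    WIDGET_ID              INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    WIDGET_SHAPE_TYPE_CODE TINYINT UNSIGNED,
    FOREIGN KEY (WIDGET_SHAPE_TYPE_CODE)
        REFERENCES WIDGET_SHAPE_TYPE_CODES (WIDGET_SHAPE_TYPE_CODE)
);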
What I would do is start with a Widgets table that has a category field of a numeric type. If you also use the category table, the numeric category is a foreign key that relates to a row in the category table. A numeric type is nice and small, which is better for performance.
Optionally you can add a category table containing a numeric primary key value and a text description. This matches the numeric value up with a human-friendly text value. This table can be used to convert the numbers to text if you just want to run reports directly from the database. The nice thing about having this table is that you don't need to update an executable if you add a new category. I would add such a table to my design.
MySQL's ENUM is handy, but it is stored in the table as a string, so it uses up more space in the table than is really needed. However, it does have the advantage of preventing unrecognized values from being stored. Preventing the storage of invalid numeric values is possible, but not as elegantly as with ENUM. The other problem with ENUM is that, because it is regarded as a string, the database must do more work if you are selecting by the value: instead of comparing a single number, multiple characters have to be compared.
If you really want to, you can have an enumeration in your code that converts the numeric category back into something more application-code friendly, but you are making your code more difficult to maintain by doing this. However, it can have a performance advantage, because fewer bytes have to be returned when you run a query. I would try to avoid this because it requires updating the application code every time a category is added to the database. If you really need to squeeze performance out of the database, you could select the whole category table, select the widgets table, and merge them in application code, but that is a rare circumstance, since the DB client almost always has a fast connection to the DB server and a few more bytes over the network are insignificant.
I think the best way is to use ENUM; there is a predefined ENUM type in MySQL - http://dev.mysql.com/doc/refman/5.0/en/enum.html

How to add user customized data to database?

I am trying to design an SQLite database that will store notes. Each of these notes will have common fields like title, due date, details, priority, and completed.
In addition, though, I would like to add data for more specialized notes, like a price for shopping list items and author/publisher data for books.
I also want to have a few general purpose fields that users can fill with whatever text data they want.
How can I design my database table in this case?
I could just have a field for each piece of data for every note, but that would waste a lot of fields and I'd like to have other options and suggestions.
There are several standard approaches you could use for solving this situation.
You could create separate tables for each kind of note, copying over the common columns in each case. This would be easy, but it would make it difficult to query over all notes.
You could create one large table with many columns and some kind of type field which would let you know which type of note it is (and therefore which subset of columns to use)
CREATE TABLE NOTE (ID int PRIMARY KEY, NOTE_TYPE int, DUEDATE datetime, ...more common fields, PRICE number NULL, AUTHOR varchar(100) NULL, ...more specific fields)
You could break your tables up into an inheritance relationship, something like this:
CREATE TABLE NOTE (ID int PRIMARY KEY, NOTE_TYPE int, DUEDATE datetime, ...more common fields);
CREATE TABLE SHOPPINGLISTITEM (ID int PRIMARY KEY, NOTE_ID int REFERENCES NOTE(ID), PRICE number, ...more shopping list item fields);
Option 1 would be easy to implement but would involve lots of mostly redundant table definitions.
Option 2 would be easy to create and easy to write queries on, but would be space inefficient.
And option 3 would be more space efficient and less redundant, but would possibly have slower queries because of all the foreign keys.
This is the typical set of trade-offs for modeling these kinds of relationships in SQL; any of these solutions could be appropriate for your use case, depending on your performance requirements.
You could create something like a custom_field table. It gets pretty messy once you start to normalize.
So you have your note table with its common fields.
Now add:
dynamic_note_field
id   label
1    publisher
2    color
3    size

dynamic_note_field_data
id   dynamic_note_field_id   value
1    1                       Penguin
2    1                       Marvel
3    2                       Red

Finally, you can relate instances of your data with the fields they use through:

note_dynamic_note_field_data
note_id   dynamic_note_field_data_id
1         1
1         3
2         2
So now we've said: note_id 1 has two additional fields. The first one has a value "Penguin" and represents a publisher. The second one has a value of "Red" and represents a color.
So what's the point of normalizing it this far?
You're not wasting space adding fields to every item (you relate a note with its additional dynamic fields via the m2m table).
You're not storing redundant labels (you may continue to store redundant data, however, as the same publisher is likely to appear many times... this aspect is extremely subjective). If you want rich data about your publishers, you typically want to take the step of turning them into their own entity rather than an ad-hoc string. Be careful when making this leap, because it adds an extra level of hairiness to the db. Evaluate the use case accordingly.
The dynamic_note_field table acts as your data definition. If you're interested in answering a question such as "what additional fields have I created?", this lets you do it easily without searching all of your dynamic_note_field_data. Eventually, you might add extra info to this table, such as a type field. I like to create this separation off the bat, but that might be a violation of the YAGNI principle in your case.
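For reference, a sketch of those three tables as SQLite DDL (the types and constraints are my assumptions, and I'm assuming the common-fields table is simply called note):

CREATE TABLE dynamic_note_field (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL
);

CREATE TABLE dynamic_note_field_data (
    id                    INTEGER PRIMARY KEY,
    dynamic_note_field_id INTEGER NOT NULL REFERENCES dynamic_note_field (id),
    value                 TEXT
);

CREATE TABLE note_dynamic_note_field_data (
    note_id                    INTEGER NOT NULL REFERENCES note (id),
    dynamic_note_field_data_id INTEGER NOT NULL REFERENCES dynamic_note_field_data (id),
    PRIMARY KEY (note_id, dynamic_note_field_data_id)
);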
Disadvantages:
It's not too bad to search for all notes that have a publisher, where that publisher is "Penguin".
What's tricky is something like "Find any note with a value of 'Penguin' in any field". You don't know up front which fields you're searching. At this point you're better off with a separate index that's generated alongside your normalized db data, which acts as the point of truth. Again, the nice thing about normalization is that you maintain the data in a very lossless, non-destructive state.
For data you want to store but that does not have to be searchable, another option is to serialize it to/from JSON and store it in a TEXT column. This gives you arbitrary structure, but you cannot readily query against those values.
Yet another option is to dump SQLite and go with an object database. I seem to recall there are one or two working for Android. I have not tried any of these, however.
Just create a small table which contains the common fields of all your notes.
Then create a table for each class of special notes you have, that contains all the extra fields plus a reference to your first table.
For each note you enter, you create a row in your main table (that contains the common fields) and a row in your extra table that contains the extra fields and a reference to the row in your main table.
Then you will just have to make a join in your query.
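For example, reading a shopping-list note back would be something like this (using the table names from the earlier inheritance sketch, which are illustrative):

SELECT n.ID, n.DUEDATE, s.PRICE
  FROM NOTE n
  JOIN SHOPPINGLISTITEM s ON s.NOTE_ID = n.ID
 WHERE n.ID = ?;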
With this solution:
1) you have a safe design (you can't access fields that are not part of your note)
2) your db will be optimized