SQL: Modelling template inheritance

I wanted to ask whether it is possible to model a templated data structure that can be overridden where necessary.
Suppose we have a list with the following items:
Template List
Item 1 Position 0
Item 2 Position 1
Item 3 Position 2
Item 4 Position 3
Now I want to create a list which uses Template List as a base, but modifies some parts of it:
Concrete List, based on Template List
Item 1 Position 0 // Inherited from Template List
Item 5 Position 1 // New and only available in Concrete List
Item 4 Position 2 // Inherited from Template List, but with a different position
Item 3 Position 3 // Inherited from Template List, but with a different position
In this list, Item 2 from Template List is missing and should not be part of the resulting list.
Is it possible to model these relations in SQL? (We are using PostgreSQL)

It's possible to do something like what you want, but it's not necessarily a good solution or what you need. What you're asking for looks like a metamodel, but relational databases were designed for first-order logical models, and while SQL can go beyond that somewhat, it's usually better not to go too abstract.
That said, here's an example. I assumed that the identity of list items is position-based, i.e. slot-based.
CREATE TABLE template_list (
template_list_id SERIAL NOT NULL,
PRIMARY KEY (template_list_id)
);
CREATE TABLE template_list_items (
template_list_id INTEGER NOT NULL,
slot_number INTEGER NOT NULL,
item_number INTEGER NOT NULL,
PRIMARY KEY (template_list_id, slot_number),
FOREIGN KEY (template_list_id) REFERENCES template_list (template_list_id)
);
CREATE TABLE concrete_list (
concrete_list_id SERIAL NOT NULL,
template_list_id INTEGER NOT NULL,
FOREIGN KEY (template_list_id) REFERENCES template_list (template_list_id),
UNIQUE (concrete_list_id, template_list_id)
);
CREATE TABLE concrete_list_items (
concrete_list_id INTEGER NOT NULL,
template_list_id INTEGER NOT NULL,
slot_number INTEGER NOT NULL,
item_number INTEGER NULL,
PRIMARY KEY (concrete_list_id, slot_number),
FOREIGN KEY (concrete_list_id, template_list_id) REFERENCES concrete_list (concrete_list_id, template_list_id),
FOREIGN KEY (template_list_id, slot_number) REFERENCES template_list_items (template_list_id, slot_number)
);
Now, to get the items in a concrete list, you would use a query like:
SELECT c.concrete_list_id, x.slot_number, x.item_number
FROM concrete_list c
LEFT JOIN (
SELECT ci.concrete_list_id,
COALESCE(ci.template_list_id, ti.template_list_id) AS template_list_id,
COALESCE(ci.slot_number, ti.slot_number) AS slot_number,
COALESCE(ci.item_number, ti.item_number) AS item_number
FROM concrete_list_items AS ci
FULL JOIN template_list_items AS ti ON ci.template_list_id = ti.template_list_id
AND ci.slot_number = ti.slot_number
) x ON c.concrete_list_id = x.concrete_list_id OR c.template_list_id = x.template_list_id;
Here's a SQL fiddle for demonstration. Note that I replaced the serial types with integers and hardcoded values for simplicity of demonstration.
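As a rough, self-contained check of the idea, the sketch below loads the example lists from the question into an in-memory SQLite database (via Python's sqlite3 module, not PostgreSQL) and resolves the concrete list. It is a simplification rather than the query above: foreign keys are omitted, and a LEFT JOIN from the template slots replaces the FULL JOIN, on the assumption that every overridden slot also exists in the template. The sample rows and IDs are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE template_list_items (
    template_list_id INTEGER NOT NULL,
    slot_number      INTEGER NOT NULL,
    item_number      INTEGER NOT NULL,
    PRIMARY KEY (template_list_id, slot_number)
);
CREATE TABLE concrete_list (
    concrete_list_id INTEGER PRIMARY KEY,
    template_list_id INTEGER NOT NULL
);
CREATE TABLE concrete_list_items (
    concrete_list_id INTEGER NOT NULL,
    template_list_id INTEGER NOT NULL,
    slot_number      INTEGER NOT NULL,
    item_number      INTEGER,
    PRIMARY KEY (concrete_list_id, slot_number)
);
-- Template List: items 1..4 in slots 0..3
INSERT INTO template_list_items VALUES (1, 0, 1), (1, 1, 2), (1, 2, 3), (1, 3, 4);
-- Concrete List 1 is based on Template List 1 and overrides slots 1..3
INSERT INTO concrete_list VALUES (1, 1);
INSERT INTO concrete_list_items VALUES (1, 1, 1, 5), (1, 1, 2, 4), (1, 1, 3, 3);
""")

# For every template slot, prefer the concrete override if one exists.
resolved = conn.execute("""
SELECT c.concrete_list_id,
       ti.slot_number,
       COALESCE(ci.item_number, ti.item_number) AS item_number
FROM concrete_list AS c
JOIN template_list_items AS ti
  ON ti.template_list_id = c.template_list_id
LEFT JOIN concrete_list_items AS ci
  ON ci.concrete_list_id = c.concrete_list_id
 AND ci.slot_number = ti.slot_number
ORDER BY ti.slot_number
""").fetchall()

for row in resolved:
    print(row)
```

Slot 0 has no override, so it falls back to the template item; Item 2 disappears because its slot is overridden by Item 5.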

Related

SQL Table with mixed data type field Best Practice

Hi everyone,
I would like advice on best practice for creating a relational database structure where one field holds mixed data types.
I have 'datasets' (some business objects), and I would like to have a list of parameters associated with each dataset. Those parameters can have different types: strings, integers, floats and JSON values.
What would be the best structure for the parameters table? Should I have a single column of string type?
CREATE TABLE param_desc (
id serial PRIMARY KEY,
name varchar NOT NULL,
param_type int -- varchar, int, real, json
);
CREATE TABLE param_value (
id serial PRIMARY KEY,
dataset_id int NOT NULL,
param int NOT NULL REFERENCES param_desc (id),
value varchar NOT NULL,
CONSTRAINT _param_object_id_param_name_id_time_from_key UNIQUE (dataset_id, param)
);
The problem with this approach is that I can't easily cast the value for additional conditions. For example, I want to get all datasets that have a specific integer parameter whose value is greater than some threshold (5 in the query below). But if I write such a WHERE clause, the cast raises an error, because the values of other, non-integer parameters cannot be cast to int.
SELECT dataset_id FROM vw_param_current WHERE name = 'priority' AND value::int > 5
Or should I have 4 separate columns, with 3 of them being NULL for every row?
Or should I have 4 different tables?
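To make the "4 separate columns" option concrete, here is a minimal sketch using Python's sqlite3 module with an in-memory SQLite database instead of PostgreSQL. The column names (value_text, value_int, value_real, value_json) and the sample rows are hypothetical, and the parameter name is folded into the value table for brevity instead of referencing param_desc. The CHECK constraint relies on SQLite treating boolean expressions as 0/1 integers, so it would need adapting for PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE param_value (
    id         INTEGER PRIMARY KEY,
    dataset_id INTEGER NOT NULL,
    name       TEXT    NOT NULL,
    value_text TEXT,
    value_int  INTEGER,
    value_real REAL,
    value_json TEXT,
    UNIQUE (dataset_id, name),
    -- exactly one of the four typed columns must be filled in
    CHECK ((value_text IS NOT NULL) + (value_int  IS NOT NULL)
         + (value_real IS NOT NULL) + (value_json IS NOT NULL) = 1)
);
INSERT INTO param_value (dataset_id, name, value_int)
VALUES (1, 'priority', 7), (2, 'priority', 3);
INSERT INTO param_value (dataset_id, name, value_text)
VALUES (1, 'owner', 'alice'), (3, 'priority', 'high');
""")

# No cast is needed: rows that store a non-integer value have value_int NULL,
# so the comparison simply does not match them.
high_priority = [row[0] for row in conn.execute("""
SELECT dataset_id
FROM param_value
WHERE name = 'priority' AND value_int > 5
ORDER BY dataset_id
""")]
print(high_priority)
```

The trade-off is three NULL columns per row in exchange for typed comparisons that never need a cast.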

SQLITE3: find IDs across multiple tables

I would like to analyse which codes appear in multiple tables under certain conditions. However, I don't think the database schema suits the task very well, but maybe there's something I don't know about that can help me. Here's a simplified schema:
CREATE TABLE "batchDescription" (
id INTEGER NOT NULL,
name TEXT NOT NULL UNIQUE,
PRIMARY KEY (id)
);
CREATE TABLE "simulationDetails" (
id INTEGER NOT NULL,
ko_index_id INTEGER NOT NULL,
batch_description_id INTEGER NOT NULL,
data1 REAL NOT NULL,
data2 INTEGER NOT NULL,
PRIMARY KEY (id)
FOREIGN KEY(ko_index_id) REFERENCES "koIndex" (id)
FOREIGN KEY(batch_description_id) REFERENCES "batchDescription" (id)
);
CREATE TABLE "koIndex" (
id INTEGER NOT NULL,
number_of_kos INTEGER NOT NULL,
PRIMARY KEY (id)
);
CREATE TABLE "1kos" (
ko_index_id INTEGER NOT NULL,
ko1 INTEGER NOT NULL,
PRIMARY KEY (ko_index_id)
FOREIGN KEY(ko_index_id) REFERENCES "koIndex" (id)
);
CREATE TABLE "2kos" (
ko_index_id INTEGER NOT NULL,
ko1 INTEGER NOT NULL,
ko2 INTEGER NOT NULL,
PRIMARY KEY (ko_index_id)
FOREIGN KEY(ko_index_id) REFERENCES "koIndex" (id)
);
CREATE TABLE "3kos" (
ko_index_id INTEGER NOT NULL,
ko1 INTEGER NOT NULL,
ko2 INTEGER NOT NULL,
ko3 INTEGER NOT NULL,
PRIMARY KEY (ko_index_id)
FOREIGN KEY(ko_index_id) REFERENCES "koIndex" (id)
);
This goes up to table "525kos", which has ko1 to ko525 in it; ko1 to ko525 are IDs that are primary keys in a table not shown here. I want to analyse how often certain IDs are present under certain conditions. Here is a simple example to illustrate:
I would like to count the number of times a certain ID (let's say 127) occurs in any koX column of the "13kos" table when simulationDetails.data1 is not equal to 0. I would do this on a database called ko.db from the bash command line like:
for ko_idx in {1..13}; do sqlite3 ko.db "select count(ko${ko_idx}) from '13kos' where ko${ko_idx} = 127 and ko_index_id in (select ko_index_id from simulationDetails where data1 != 0);"; done
Already this is slow and inefficient but is simple compared to what I would like to do. What if I wanted to do an analysis of all the IDs in all possible columns in all "Xkos" tables and compare them to where data1 is equal and not equal to zero?
Can anybody direct me to a better way of doing this or is the schema design just not very good for this kind of analysis and I'll have to give up?
EDIT: Thought I'd add a bit of extra detail to avoid confusion. I suspect that a good way to achieve what I want would be to somehow combine all the "Xkos" tables into one temporary table and then search for certain IDs in that table. How would I combine all 525 ko tables without writing out each table name?
"How would I combine all 525 ko tables without writing out each table name?"
1. Create a table with the same number of columns as the largest table (this is the table into which you merge), allowing NULLs.
2. Query the sqlite_master table using something like:
SELECT * FROM sqlite_master WHERE name LIKE '%kos%' AND type = 'table'
3. Loop through the extracted table names, building an INSERT SELECT for each table that will insert the rows from that table into the table created in step 1. See "2. INSERT INTO table SELECT ...;", especially in regard to handling missing columns.
4. All done; the table created in step 1 will be populated accordingly.
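The merge steps can be sketched in Python with the sqlite3 module. This is only an illustration under stated assumptions: it uses three small stand-in tables ("1kos" to "3kos") with made-up rows instead of the real 525 tables, the merged table name all_kos is hypothetical, and it assumes each table name is a number followed by "kos".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-in data: three small tables instead of "1kos" .. "525kos".
conn.executescript("""
CREATE TABLE "1kos" (ko_index_id INTEGER PRIMARY KEY, ko1 INTEGER NOT NULL);
CREATE TABLE "2kos" (ko_index_id INTEGER PRIMARY KEY,
                     ko1 INTEGER NOT NULL, ko2 INTEGER NOT NULL);
CREATE TABLE "3kos" (ko_index_id INTEGER PRIMARY KEY,
                     ko1 INTEGER NOT NULL, ko2 INTEGER NOT NULL, ko3 INTEGER NOT NULL);
INSERT INTO "1kos" VALUES (1, 127);
INSERT INTO "2kos" VALUES (2, 5, 127);
INSERT INTO "3kos" VALUES (3, 7, 8, 9);
""")

# Find the source tables by name and derive each table's column count.
names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name LIKE '%kos'")]
widths = {name: int(name[:-3]) for name in names if name[:-3].isdigit()}
max_width = max(widths.values())

# Create the merged table, as wide as the largest source table; koX columns allow NULL.
cols = [f"ko{i}" for i in range(1, max_width + 1)]
conn.execute(
    "CREATE TABLE all_kos (ko_index_id INTEGER NOT NULL, "
    + ", ".join(f"{c} INTEGER" for c in cols) + ")")

# One INSERT ... SELECT per source table; columns not listed stay NULL.
for name, width in widths.items():
    present = ", ".join(cols[:width])
    conn.execute(
        f'INSERT INTO all_kos (ko_index_id, {present}) '
        f'SELECT ko_index_id, {present} FROM "{name}"')

# Example analysis: how many rows contain ID 127 in any koX column?
where_any = " OR ".join(f"{c} = 127" for c in cols)
count_127 = conn.execute(
    f"SELECT COUNT(*) FROM all_kos WHERE {where_any}").fetchone()[0]
print(count_127)
```

The same loop should carry over to the real database by connecting to ko.db instead of ":memory:" and dropping the stand-in CREATE and INSERT statements.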

How to select data in same table on different line that are empty?

I have these three tables (a preview is attached below). After the table definitions there is an example of the data in table "virustotalscans". That table has a column named "virustotal". Each unique sample has a number, for example 165; the next sample has number 166, and so on.
TABLE VIRUSTOTALS
CREATE TABLE virustotals (
virustotal INTEGER PRIMARY KEY,
virustotal_md5_hash TEXT NOT NULL,
virustotal_timestamp INTEGER NOT NULL,
virustotal_permalink TEXT NOT NULL
);
CREATE INDEX virustotals_md5_hash_idx
ON virustotals (virustotal_md5_hash);
TABLE VIRUSTOTALSCANS
CREATE TABLE virustotalscans (
virustotalscan INTEGER PRIMARY KEY,
virustotal INTEGER NOT NULL,
virustotalscan_scanner TEXT NOT NULL,
virustotalscan_result TEXT
);
CREATE INDEX virustotalscans_result_idx
ON virustotalscans (virustotalscan_result);
CREATE INDEX virustotalscans_scanner_idx
ON virustotalscans (virustotalscan_scanner);
CREATE INDEX virustotalscans_virustotal_idx
ON virustotalscans (virustotal);
TABLE DOWNLOADS
CREATE TABLE downloads (
download INTEGER PRIMARY KEY,
connection INTEGER,
download_url TEXT,
download_md5_hash TEXT
-- CONSTRAINT downloads_connection_fkey FOREIGN KEY (connection) REFERENCES connections (connection)
);
CREATE INDEX downloads_connection_idx ON downloads (connection);
CREATE INDEX downloads_md5_hash_idx
ON downloads (download_md5_hash);
CREATE INDEX downloads_url_idx
ON downloads (download_url);
Example of data in table “virustotalscans”: http://pastebin.com/7E7McZwT
Now I need to select all samples for which every line in column "virustotalscan_result" is empty. In other words, I need to select all samples that VirusTotal did not detect with any antivirus. I tried this select:
select distinct downloads.download_md5_hash from virustotalscans, virustotals,
downloads
where downloads.download_md5_hash = virustotals.virustotal_md5_hash and
virustotals.virustotal = virustotalscans.virustotal and
virustotalscans.virustotalscan_result IS NULL;
but I get the MD5 hashes of all samples. The reason is probably that every sample has at least one line that is empty. That is logical, because there is always some antivirus that doesn't detect a given sample.
A better example: http://pastebin.com/y81DPpmQ. Now I need to select the sample number (column virustotal) for which all lines in column virustotalscan_result are empty. In that example it would be only number 2.
Can you help me please?
Thank you very much for replies.
SELECT download_md5_hash
FROM downloads
JOIN virustotals ON download_md5_hash = virustotal_md5_hash
WHERE virustotal IN (SELECT virustotal
FROM virustotalscans
GROUP BY virustotal
HAVING COUNT(virustotalscan_result) = 0)
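The key to the query above is that COUNT(column) counts only non-NULL values, so a group whose count is 0 has no result from any scanner. A small sketch of just the subquery, using Python's sqlite3 module with made-up scanner names and results:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE virustotalscans (
    virustotalscan         INTEGER PRIMARY KEY,
    virustotal             INTEGER NOT NULL,
    virustotalscan_scanner TEXT NOT NULL,
    virustotalscan_result  TEXT
);
-- Sample 1 is detected by one scanner; sample 2 by none (hypothetical data).
INSERT INTO virustotalscans
    (virustotal, virustotalscan_scanner, virustotalscan_result)
VALUES
    (1, 'ScannerA', 'Trojan.Generic'),
    (1, 'ScannerB', NULL),
    (2, 'ScannerA', NULL),
    (2, 'ScannerB', NULL);
""")

# COUNT(virustotalscan_result) ignores NULLs, so 0 means "no detections at all".
undetected = [row[0] for row in conn.execute("""
SELECT virustotal
FROM virustotalscans
GROUP BY virustotal
HAVING COUNT(virustotalscan_result) = 0
""")]
print(undetected)
```

Sample 1 is excluded even though one of its rows is NULL, which is exactly the case the original IS NULL filter got wrong.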

First DB - How to structure required information

I watched a few YouTube videos about how to structure a database using tables and fields. I am a bit confused about how to structure my information.
I have put my attempt below:
// Identifier Table
// This is where we give each item a new unique identifier
UniqueID []
// Item Table
// This is where the main content goes which is displayed
UniqueID []
Title []
Description []
Date []
Location []
Coordinates []
Source []
Link []
// Misc Table
// This is additional useful information, but not displayed
geocoded []
country name []
By separating out the UniqueID, I can make sure that new records still get a unique incrementing ID when I delete a record. Can I get some feedback on how I divided up my data into three tables?
You gave us no hint about what you want to represent in your DB.
For example: if location and coordinates describe a building or maybe a room, then it could be useful to save that information in an extra table and have a relationship from item to it, as this would make it easy to fetch all items connected with one place.
Of course you should apply the same principle to country: a location lies within a country.
BEGIN;
CREATE TABLE "country" (
"id" integer NOT NULL PRIMARY KEY,
"name" varchar(255) NOT NULL
);
CREATE TABLE "location" (
"id" integer NOT NULL PRIMARY KEY,
"name" varchar(255) NOT NULL,
"coordinate" varchar(255) NOT NULL,
"country_id" integer NOT NULL REFERENCES "country" ("id")
);
CREATE TABLE "item" (
"id" integer NOT NULL PRIMARY KEY,
"title" varchar(25) NOT NULL,
"description" text NOT NULL,
"date" datetime NOT NULL,
"source" varchar(255) NOT NULL,
"link" varchar(255) NOT NULL,
"location_id" integer NOT NULL REFERENCES "location" ("id")
);
COMMIT;
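As an illustration of fetching all items connected with one place, here is a small sketch using Python's sqlite3 module. The item table is cut down to the columns needed for the join, and the country, location and item rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (
    id   INTEGER NOT NULL PRIMARY KEY,
    name VARCHAR(255) NOT NULL
);
CREATE TABLE location (
    id         INTEGER NOT NULL PRIMARY KEY,
    name       VARCHAR(255) NOT NULL,
    coordinate VARCHAR(255) NOT NULL,
    country_id INTEGER NOT NULL REFERENCES country (id)
);
-- Reduced item table: only the columns needed for this example.
CREATE TABLE item (
    id          INTEGER NOT NULL PRIMARY KEY,
    title       VARCHAR(25) NOT NULL,
    location_id INTEGER NOT NULL REFERENCES location (id)
);
INSERT INTO country  VALUES (1, 'Exampleland');
INSERT INTO location VALUES (1, 'Town Hall', '0.0,0.0', 1),
                            (2, 'Library',   '1.0,1.0', 1);
INSERT INTO item     VALUES (1, 'Concert', 1), (2, 'Reading', 2), (3, 'Market', 1);
""")

# All items that belong to one place, found through the location relationship.
titles = [row[0] for row in conn.execute("""
SELECT item.title
FROM item
JOIN location ON location.id = item.location_id
WHERE location.name = ?
ORDER BY item.id
""", ("Town Hall",))]
print(titles)
```

Because the place is stored once in location, renaming it or correcting its coordinates touches a single row instead of every item.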
In the case stated above I would pack everything into one table, since there is not enough complexity to benefit from splitting the data into different tables.
When you have more metadata you can split it up into:
Item (For display data)
ItemMeta (For meta data)

How to do multiple column unique-constraint in ormlite ( SQLite )

I'm using ormlite for Android and I'm trying to get a multiple-column unique constraint. As of now I'm only able to get a unique constraint on individual columns, like this:
CREATE TABLE `store_group_item` (`store_group_id` INTEGER NOT NULL UNIQUE ,
`store_item_id` INTEGER NOT NULL UNIQUE ,
`_id` INTEGER PRIMARY KEY AUTOINCREMENT );
and what I want is
CREATE TABLE `store_group_item` (`store_group_id` INTEGER NOT NULL ,
`store_item_id` INTEGER NOT NULL ,
`_id` INTEGER PRIMARY KEY AUTOINCREMENT,
UNIQUE( `store_group_id`, `store_item_id` ) );
In my model I've been using the following annotations for the unique columns:
@DatabaseField( unique = true )
Is there a way to get this to work?
How about using
@DatabaseField (uniqueCombo = true)
String myField;
annotation instead - is it a matter of the uniqueIndexName being faster when accessing items in the table?
Edit:
As @Ready4Android pointed out, we've since added support for the uniqueCombo annotation field in version 4.20. Here are the docs:
http://ormlite.com/docs/unique-combo
There should be no performance differences between using this mechanism versus the uniqueIndexName mentioned below.
Yes. You can't do this with the unique=true tag but you can with a unique index.
@DatabaseField(uniqueIndexName = "unique_store_group_and_item_ids")
int store_group_id;
@DatabaseField(uniqueIndexName = "unique_store_group_and_item_ids")
int store_item_id;
This will create an index to accomplish the unique-ness but I suspect that the unique=true has a hidden index anyway. See the docs:
http://ormlite.com/docs/unique-index
I will look into allowing multiple unique fields. May not be supported by all database types.
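To see what the multi-column constraint from the question does at the SQLite level, independent of ORMLite, the following sketch creates the desired table through Python's sqlite3 module and tries a few inserts. It shows only the database behaviour, not how ORMLite generates the schema; the inserted IDs are arbitrary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE store_group_item (
    store_group_id INTEGER NOT NULL,
    store_item_id  INTEGER NOT NULL,
    _id            INTEGER PRIMARY KEY AUTOINCREMENT,
    UNIQUE (store_group_id, store_item_id)
)
""")

insert = "INSERT INTO store_group_item (store_group_id, store_item_id) VALUES (?, ?)"
conn.execute(insert, (1, 1))
conn.execute(insert, (1, 2))  # same group, different item: allowed
conn.execute(insert, (2, 1))  # different group, same item: allowed

try:
    conn.execute(insert, (1, 1))  # same pair again: violates the constraint
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM store_group_item").fetchone()[0]
print(duplicate_rejected, row_count)
```

With UNIQUE on each column separately, as in the first CREATE TABLE of the question, the second and third inserts would already fail.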