unique constraint (w/o Trigger) on "one-to-many" relation - sql

To illustrate the problem, here is an example:
A tag_bundle consists of one or more tags.
Each unique tag combination maps to exactly one tag_bundle, and vice versa.
tag_bundle
+---------------+
| tag_bundle_id |
+---------------+
| 1             |
+---------------+
| 2             |
+---------------+

tag
+--------+
| tag_id |
+--------+
| 100    |
+--------+
| 101    |
+--------+
| 102    |
+--------+

tag_bundle_relation
+---------------+--------+
| tag_bundle_id | tag_id |
+---------------+--------+
| 1             | 100    |
+---------------+--------+
| 1             | 101    |
+---------------+--------+
| 2             | 101    |
+---------------+--------+
| 2             | 102    |
+---------------+--------+
There can't be another tag_bundle with exactly the same combination of tag 100 and tag 101.
There can't be another tag_bundle with exactly the same combination of tag 101 and tag 102.
How can I enforce such a unique constraint when SQL statements execute concurrently,
that is, prevent two bundles with exactly the same tag combination from being added at the same time?
Adding a simple unique constraint on any one of the tables does not work.
Is there any solution other than a trigger or an explicit lock?
The only simple way I have come up with is to turn the tag combination into a string and make it a unique column.
tag_bundle (unique on tags)
+---------------+-----------+
| tag_bundle_id | tags      |
+---------------+-----------+
| 1             | "100,101" |
+---------------+-----------+

tag
+--------+
| tag_id |
+--------+
| 101    |
+--------+
| 100    |
+--------+

tag_bundle_relation
+---------------+--------+
| tag_bundle_id | tag_id |
+---------------+--------+
| 1             | 101    |
+---------------+--------+
| 1             | 100    |
+---------------+--------+
but it does not seem like a good way :(

Why the constraint of 'without a trigger'? With a trigger, combined with a bit of data duplication, you can get what you need. Change the 'tags' field in your solution to an array of INTEGERs (or whatever type tag_id is).
While I recognise the unpleasantness of the solution, I don't see a way round it. I would, though, use an array instead of a string for 'tags', put it in a separate table from tag_bundle, still make it unique, and put a trigger on tag_bundle_relation that updates the tags field with array_agg(tag_id) (PostgreSQL 8.4 or later); if that update fails, the trigger fails as well.

For this to work correctly when multiple transactions update the tables, you will need to create a deferrable, initially deferred constraint trigger.
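The core of both suggestions, a duplicated canonical form of the tag set guarded by a unique constraint, can be sketched as follows. This is only an illustration under several assumptions: it uses Python's sqlite3 with an in-memory database rather than the asker's RDBMS, it stores the tag set as a sorted comma-separated string rather than an array, and it fills the key from application code inside one transaction instead of from a trigger. The helper names make_db and add_bundle are hypothetical.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a unique canonical tag key per bundle."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE tag_bundle (
            tag_bundle_id INTEGER PRIMARY KEY,
            tags          TEXT NOT NULL UNIQUE   -- duplicated, canonical tag set
        );
        CREATE TABLE tag_bundle_relation (
            tag_bundle_id INTEGER NOT NULL REFERENCES tag_bundle(tag_bundle_id),
            tag_id        INTEGER NOT NULL,
            PRIMARY KEY (tag_bundle_id, tag_id)
        );
    """)
    return conn


def add_bundle(conn, tag_ids):
    """Insert a bundle and its relations atomically.

    Returns the new tag_bundle_id, or None if the combination already exists.
    """
    tags = sorted(set(tag_ids))
    key = ",".join(str(t) for t in tags)  # sorted, so input order does not matter
    try:
        with conn:  # commits on success, rolls back on any exception
            cur = conn.execute("INSERT INTO tag_bundle (tags) VALUES (?)", (key,))
            bundle_id = cur.lastrowid
            conn.executemany(
                "INSERT INTO tag_bundle_relation (tag_bundle_id, tag_id) VALUES (?, ?)",
                [(bundle_id, t) for t in tags],
            )
        return bundle_id
    except sqlite3.IntegrityError:
        # The unique constraint on tags rejected a duplicate combination.
        return None
```

Because the database itself enforces the unique constraint, two sessions inserting the same combination cannot both succeed; the trade-off is that the duplicated key has to be kept in step with tag_bundle_relation, which is what the trigger in the answers above is for.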

Related

How to define a relationship between two tables from different sources with different identifiers

Background:
I'm working on a project that does not allow me to share the data, but I'll do my best to give you some visualisation below. Before going further: I know (some) SQL and have done basic work with relationships before, but that data was clean and simple, and for some reason I just can't figure out a solution here.
Problem (?)
I'm trying to define a relationship between two tables from two different sources that each work with different identifiers. I do, however, have a mapping table from one of those sources, but again the identifiers do not align. Let me try to explain visually:
| TABLE 1 (cies) | | TABLE 2 (forms) |
| ------------ | | ------------- |
| id(PK) | | id(PK) |
| 4_digit_code | | 16_digit_code |
| ...more fields | | ...more fields |
The second source provided me a mapping table they use internally:
| MAPPING TABLE |
| ------------- |
| id(PK) |
| 4_digit_code | (= to the one in TABLE 1)
| 16_digit_code | (= to the one in TABLE 2)
My first thought was to create a script and just merge the info in the mapping table in TABLE 1 like so:
| TABLE 1 | | TABLE 2 |
| ------------ | | ------------- |
| id(PK) | | id(PK) |
| 16_digit_code | ==== | 16_digit_code |
| 4_digit_code |
The issue here is that the 16_digit_code is not unique, so I believe this does not work. Now comes something I have no experience with, so I am just thinking out loud here:
Can I keep the mapping table and go through it each time to get my data from the other table? On the other hand, shouldn't all values in a mapping table be unique for that to work? The reason there are non-unique values is that (some) very old numbers end up getting recycled.
For example get me all forms from company with id 1:
| TABLE 1 | | MAPPING TABLE | | TABLE 2 |
| ------------ | | ------------- | | ------------- |
| id(PK) | | id(PK) | | id(PK) |
| 16_digit_code | | 16_digit_code | ==== | 16_digit_code |
| 4_digit_code | ==== | 4_digit_code | | ...more fields |
I would not know how to approach the above efficiently. I really don't know whether what I am saying makes sense, whether I am missing something, or whether I am making this way too complex.
Solution?
I'd love it if someone could point me in the right direction. And if you have the solution I'd love to know the reasoning, not just the solution as I'd love to learn from this for the future obviously.
Edit/Clarification:
Just for completeness' sake: the mapping combination (4-digit + 16-digit code) is unique, although, as I said earlier, one 16-digit code can be linked to multiple 4-digit codes.
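Since the combination of the two codes is unique, keeping the mapping table and joining through it is one way to get "all forms for company 1". The sketch below is only an illustration under assumptions: it uses Python's sqlite3 with made-up sample rows, the column names code4 and code16 are hypothetical stand-ins for 4_digit_code and 16_digit_code, and forms_for_company is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cies  (id INTEGER PRIMARY KEY, code4  TEXT NOT NULL);
    CREATE TABLE forms (id INTEGER PRIMARY KEY, code16 TEXT NOT NULL);
    CREATE TABLE mapping (
        id     INTEGER PRIMARY KEY,
        code4  TEXT NOT NULL,
        code16 TEXT NOT NULL,
        UNIQUE (code4, code16)      -- only the pair is unique
    );

    -- Made-up sample data: one 16-digit code is shared by two companies.
    INSERT INTO cies  VALUES (1, '0001'), (2, '0002');
    INSERT INTO forms VALUES (10, '1000000000000001'),
                             (11, '1000000000000001'),
                             (12, '1000000000000002');
    INSERT INTO mapping (code4, code16) VALUES
        ('0001', '1000000000000001'),
        ('0002', '1000000000000001'),
        ('0002', '1000000000000002');
""")


def forms_for_company(conn, company_id):
    """Return the ids of all forms reachable from a company via the mapping table."""
    rows = conn.execute(
        """
        SELECT DISTINCT f.id
        FROM cies c
        JOIN mapping m ON m.code4  = c.code4
        JOIN forms   f ON f.code16 = m.code16
        WHERE c.id = ?
        ORDER BY f.id
        """,
        (company_id,),
    ).fetchall()
    return [r[0] for r in rows]
```

With this sample data, forms filed under the shared 16-digit code show up for both companies, which is the ambiguity that recycled numbers introduce; the join itself cannot resolve it without some additional piece of information.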

How should I design a table where a row can have different columns depending on the type of row?

I'm planning to use the Reddit API and store my saved posts in a database. The saves can be of two types - Comments or Posts, both of them have few common columns - author, score, subreddit etc. and a few columns unique to each category:
comment - body_text, comment_id, parent_id, etc.
posts - selftext,link_url,is_video, etc.
I decided to separate the 2 categories into their own tables - Comments table and Posts table. But I don't know how to link these tables to the master table "saves".
My current solution is to have a column kind for the type of save. The comment_id and post_id link the save to its own table. However, this feels like a messy solution and a bit cumbersome. A save can have either a comment_id or a post_id (but not both or neither), and I also have to manage this constraint.
Saves Table :
+----+---------+-------+---------------------------------------------+---------+------------+---------+
| ID | Kind | title | post_url | author | comment_id | post_id |
+----+---------+-------+---------------------------------------------+---------+------------+---------+
| 1 | comment | abc | https://redd.i/redditpostid/redditcommentid | FusionX | 1 | NULL |
| 2 | post | xyz | https://redd.i/redditpostid | XnoisuF | NULL | 1 |
+----+---------+-------+---------------------------------------------+---------+------------+---------+
Post Table :
+----+---------+-------------------------------------------+-----------------------+--------------+--------------+
| ID | is_self | selftext | post_url | num_comments | thumbnail |
+----+---------+-------------------------------------------+-----------------------+--------------+--------------+
| 1 | no | NULL | i.imgur.com/xyz.jpg | 1020 | someimageurl |
| 2 | yes | "some random selftext of variable length" | redd.it/redditpostid/ | 10 | |
+----+---------+-------------------------------------------+-----------------------+--------------+--------------+
Comment table:
+----+---------------------------------+---------------------+--------------------+
| ID | body_html | reddit_comment_id | reddit_parent_id |
+----+---------------------------------+---------------------+--------------------+
| 1 | comment text of variable length | <reddit comment id> | <reddit parent id> |
+----+---------------------------------+---------------------+--------------------+
(Reddit IDs are different from my table's own IDs and are only relevant at Reddit's end)
Is there a better way to design this database?
I think you should move the owning side of the relation to the two other tables.
So instead of having comment_id and post_id columns in saves table, have a saves_id column in post table and comment table.
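A minimal sketch of that layout, assuming Python's sqlite3 and a reduced set of columns. The UNIQUE constraint on saves_id, the sample rows and the load_saves helper are additions for illustration and are not part of the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE saves (
        id     INTEGER PRIMARY KEY,
        kind   TEXT NOT NULL CHECK (kind IN ('comment', 'post')),
        title  TEXT,
        author TEXT
    );
    -- The detail tables own the relation: each row points back at its save.
    CREATE TABLE post (
        id           INTEGER PRIMARY KEY,
        saves_id     INTEGER NOT NULL UNIQUE REFERENCES saves(id),
        selftext     TEXT,
        num_comments INTEGER
    );
    CREATE TABLE comment (
        id        INTEGER PRIMARY KEY,
        saves_id  INTEGER NOT NULL UNIQUE REFERENCES saves(id),
        body_html TEXT
    );

    INSERT INTO saves VALUES (1, 'comment', 'abc', 'FusionX'),
                             (2, 'post',    'xyz', 'XnoisuF');
    INSERT INTO comment (saves_id, body_html) VALUES (1, 'comment text');
    INSERT INTO post (saves_id, selftext, num_comments)
        VALUES (2, 'some selftext', 10);
""")


def load_saves(conn):
    """Return every save with whichever detail row belongs to it."""
    return conn.execute(
        """
        SELECT s.id, s.kind, s.title, p.selftext, c.body_html
        FROM saves s
        LEFT JOIN post    p ON p.saves_id = s.id
        LEFT JOIN comment c ON c.saves_id = s.id
        ORDER BY s.id
        """
    ).fetchall()
```

The saves table no longer carries two nullable foreign keys. Note that this sketch does not by itself stop a save from having both a post row and a comment row; that rule would still need separate enforcement.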

Pivot Way or Straight Way in SQL

I have the following association stored in a pivoted way.
| DOCID | Note1 | Note2 | Note3 |
|-------|-------|-------|-------|
| 1 | N11 | N21 | N31 |
| 2 | N12 | NULL | N32 |
| 3 | N13 | N23 | N33 |
| 4 | N14 | N24 | NULL |
| 5 | NULL | N25 | N35 |
The other way of storing the above is as below.
| DOCID | Field | Value |
|-------|---------|-------|
| 1 | Note1 | N11 |
| 1 | Note2 | N21 |
| 1 | Note3 | N31 |
| 2 | Note1 | N12 |
| 2 | Note3 | N32 |
| 3 | Note1 | N13 |
| 3 | Note2 | N23 |
| 3 | Note3 | N33 |
| 4 | Note1 | N14 |
| 4 | Note2 | N24 |
| 5 | Note2 | N25 |
| 5 | Note3 | N35 |
Which of the above two options is better?
I might have more null values. In that case the 2nd option seems better, as it will have fewer records.
But when I have 10 million records, that number is multiplied by the number of notes (in our case it will be 30 million records, minus the nulls).
So, considering performance when fetching associated records, which option is better and why?
I will have more notes associated with DocIDs.
"Better" is often subjective. In this case, though, I think one method is generally better than the other.
The second approach is the better approach -- one row per document/note pair. In general, when you have columns that are only distinguished by a number -- but otherwise contain the same things -- then the data model is suspect. There may be good reasons for representing the data across columns, but the structure should be questioned. If you still need it, then fine.
Consider a simple query such as which ids have a particular note. In the first representation, you need to check all three columns. This makes it hard to use an index. And, it negates the value of columnar storage.
If the business changes and you suddenly want 4 notes per docid -- or want to limit them to 2 -- then the table needs to be restructured. That is an expensive process.
I'm not sure what the notes refer to. But if they represent a foreign key relationship to another table, then the pivoted version needs to maintain multiple foreign key relationships -- for essentially the same purpose.
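The "which ids have a particular note" query can be sketched for both layouts as follows, using the sample data from the question. This assumes Python's sqlite3; the table names doc_wide and doc_note, the index name and the two helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Pivoted layout: one column per note.
    CREATE TABLE doc_wide (
        docid INTEGER PRIMARY KEY, note1 TEXT, note2 TEXT, note3 TEXT
    );
    -- One row per document/note pair.
    CREATE TABLE doc_note (
        docid INTEGER NOT NULL,
        field TEXT    NOT NULL,
        value TEXT    NOT NULL,
        PRIMARY KEY (docid, field)
    );
    CREATE INDEX doc_note_value ON doc_note (value);

    INSERT INTO doc_wide VALUES
        (1, 'N11', 'N21', 'N31'),
        (2, 'N12', NULL,  'N32'),
        (3, 'N13', 'N23', 'N33'),
        (4, 'N14', 'N24', NULL),
        (5, NULL,  'N25', 'N35');
""")

# Derive the row-per-note table from the pivoted one, skipping NULLs.
for label, col in (("Note1", "note1"), ("Note2", "note2"), ("Note3", "note3")):
    conn.execute(
        f"INSERT INTO doc_note SELECT docid, ?, {col} FROM doc_wide "
        f"WHERE {col} IS NOT NULL",
        (label,),
    )


def docs_with_note_wide(conn, note):
    """Pivoted layout: every note column has to be checked."""
    rows = conn.execute(
        "SELECT docid FROM doc_wide "
        "WHERE note1 = ? OR note2 = ? OR note3 = ? ORDER BY docid",
        (note, note, note),
    ).fetchall()
    return [r[0] for r in rows]


def docs_with_note_long(conn, note):
    """Row-per-note layout: one predicate that a single index can serve."""
    rows = conn.execute(
        "SELECT DISTINCT docid FROM doc_note WHERE value = ? ORDER BY docid",
        (note,),
    ).fetchall()
    return [r[0] for r in rows]
```

Adding a fourth note changes nothing in the second query or its table, whereas the first needs a new column and another OR branch.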

Primary key auto-increment manipulation

Is there any way to have a primary key with a feature that increments it but fills in gaps? Assuming I have the following table:
____________________
| ID | Value |
| 1 | A |
| 2 | B |
| 3 | C |
^^^^^^^^^^^^^^^^^^^^^
Notice that the value is only an example, the order has nothing to do with the question.
Once I remove the row with the ID of 2 (the table will look like this):
____________________
| ID | Value |
| 1 | A |
| 3 | C |
^^^^^^^^^^^^^^^^^^^^^
And I add another row, with regular auto-increment feature it will look like this:
____________________
| ID | Value |
| 1 | A |
| 3 | C |
| 4 | D |
^^^^^^^^^^^^^^^^^^^^^
As expected.
The output I'd want would be:
____________________
| ID | Value |
| 1 | A |
| 2 | D |
| 3 | C |
^^^^^^^^^^^^^^^^^^^^^
Where the gap is filled with the new row. Also note that maybe, in memory, it would look different. But the point is that the primary key would fill the gaps.
When the primary keys are (for instance) 1, 2, 3, 6, 7, 10, 11, then 4 should be filled in first, then 5, 8 and so on... When the table is empty (even if it had a million rows before) it should start over from 1.
How do I accomplish that? Is there any built-in feature similar to that? Can I implement it?
EDIT: If it's not possible, why not?
No, you don't want to do that, as juergen-d said. It's unlikely to do what you think it is doing, and it will do it even less in a multi-user environment.
In a multiuser environment you are likely to get voids even when there are no deletes, just from aborted inserts.
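Purely to illustrate what a gap-filling lookup would involve, here is a sketch assuming Python's sqlite3 and a hypothetical table named item. It is not a built-in feature of any engine mentioned here, and it is not a recommendation: the lookup and the subsequent insert are two separate steps, so two concurrent sessions could pick the same id unless they are serialised.

```python
import sqlite3


def smallest_free_id(conn):
    """Return the lowest positive id that is not currently in use."""
    row = conn.execute(
        """
        SELECT CASE
            WHEN NOT EXISTS (SELECT 1 FROM item WHERE id = 1) THEN 1
            ELSE (SELECT MIN(id) + 1
                  FROM item
                  WHERE id + 1 NOT IN (SELECT id FROM item))
        END
        """
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO item (id, value) VALUES (?, 'x')",
    [(i,) for i in (1, 2, 3, 6, 7, 10, 11)],
)
```

Every insert now pays for a search over the existing keys, and an id that was handed out, deleted and reused can silently point old references at a different row.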

SQL LIKE question

I was wondering if there's a drawback (other than bad practice) to using something like this
SELECT * FROM my_table WHERE id LIKE '1';
where id is an integer. I know you're supposed to use id=1, but I am writing a Java program, and if everything can use LIKE it'll be a lot easier for me. Also, so far everything works fine; I get the correct query results, so if there is no drawback I will continue doing it like this.
edit: I am using MySQL.
MySQL will allow it, but will ignore the index:
mysql> describe METADATA_44;
+---------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| AtextId | int(11) | NO | PRI | NULL | |
| num | varchar(128) | YES | | NULL | |
| title | varchar(128) | YES | | NULL | |
| file | varchar(128) | YES | | NULL | |
| context | varchar(128) | YES | | NULL | |
| source | varchar(128) | YES | | NULL | |
+---------+--------------+------+-----+---------+-------+
6 rows in set (0.00 sec)
mysql> explain select * from METADATA_44 where Atextid like '7';
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | METADATA_44 | ALL | PRIMARY | NULL | NULL | NULL | 591 | Using where |
+----+-------------+-------------+------+---------------+------+---------+------+------+-------------+
mysql> explain select * from METADATA_44 where Atextid=7;
+----+-------------+-------------+-------+---------------+---------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------------+-------+---------------+---------+---------+-------+------+-------+
| 1 | SIMPLE | METADATA_44 | const | PRIMARY | PRIMARY | 4 | const | 1 | |
+----+-------------+-------------+-------+---------------+---------+---------+-------+------+-------+
1 row in set (0.00 sec)
You'd need to look at the query execution plan on your RDBMS to verify that LIKE with no wildcards is treated as efficiently as = would be. A quick test in SQL Server shows that it gives you an index scan rather than a seek, so I guess it doesn't look at that when generating the plan, and for SQL Server using = would be much more efficient. I don't have a MySQL install to test against.
Edit: Just to update this: SQL Server seems to handle it fine and does a seek when the data type is varchar. When it is run against an int column, though, you get the scan. This is because it does an implicit conversion of the int column to varchar and so can't use the index.
You are better off writing your query as
SELECT * FROM my_table WHERE id = 1;
otherwise MySQL has to convert between the string '1' and the int type of the column id,
so there is a performance penalty. When you know the type of the column, supply the value according to that type.
Speed. [15-char filler as there's not much more to say]
Without any wildcards, LIKE should be fine for your needs if speed/efficiency is not something you care about.
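The advice to check the plan on your own engine can be tried quickly. The sketch below assumes Python's sqlite3, so it shows SQLite's behaviour, not MySQL's or SQL Server's; the plan helper is hypothetical and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 21)],
)


def plan(conn, sql):
    """Return the detail text of SQLite's query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # 4th column holds the description


like_sql = "SELECT * FROM my_table WHERE id LIKE '1'"
eq_sql = "SELECT * FROM my_table WHERE id = 1"

like_plan = plan(conn, like_sql)   # typically a full table SCAN
eq_plan = plan(conn, eq_sql)       # typically a SEARCH on the primary key

like_rows = conn.execute(like_sql).fetchall()
eq_rows = conn.execute(eq_sql).fetchall()

print(like_plan)
print(eq_plan)
```

Both statements return the same row, which matches the asker's observation that the results are correct; the difference is in how the engine gets there.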