Primary key auto-increment manipulation - sql

Is there any way to have a primary key with a feature that increments it but fills in gaps? Assuming I have the following table:
____________________
| ID | Value |
| 1 | A |
| 2 | B |
| 3 | C |
^^^^^^^^^^^^^^^^^^^^^
Note that the values are only examples; their order has nothing to do with the question.
Once I remove the row with the ID of 2 (the table will look like this):
____________________
| ID | Value |
| 1 | A |
| 3 | C |
^^^^^^^^^^^^^^^^^^^^^
If I then add another row, with the regular auto-increment feature it will look like this:
____________________
| ID | Value |
| 1 | A |
| 3 | C |
| 4 | D |
^^^^^^^^^^^^^^^^^^^^^
As expected.
The output I'd want would be:
____________________
| ID | Value |
| 1 | A |
| 2 | D |
| 3 | C |
^^^^^^^^^^^^^^^^^^^^^
Where the gap is filled with the new row. The physical storage order may well differ; the point is that the primary key would fill the gaps.
For instance, with the primary keys 1, 2, 3, 6, 7, 10, 11, the next insert should get 4, then 5, then 8, and so on. When the table is empty (even if it previously held a million rows), numbering should start over from 1.
How do I accomplish that? Is there any built-in feature similar to that? Can I implement it?
EDIT: If it's not possible, why not?

No, you don't want to do that, as juergen-d said. It is unlikely to achieve what you expect, and even less so in a multi-user environment.
In a multi-user environment you are likely to get gaps even when there are no deletes, just from aborted inserts.
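For illustration only, the lookup the question asks about can be sketched as below. This is not a built-in feature of any DBMS I know of; it is a hand-rolled query, shown here with Python's sqlite3 as a stand-in database. The table name items and the helper names are made up for the example. As noted above, it is not safe with concurrent writers, because two sessions can compute the same free id.

```python
import sqlite3

# Hypothetical table "items"; SQLite is only a stand-in for the real DBMS.
LOWEST_FREE_ID = """
SELECT CASE
    WHEN NOT EXISTS (SELECT 1 FROM items WHERE id = 1) THEN 1
    ELSE (SELECT MIN(t.id) + 1
          FROM items AS t
          WHERE NOT EXISTS (SELECT 1 FROM items AS u WHERE u.id = t.id + 1))
END
"""


def make_table(ids):
    """Create an in-memory table pre-filled with the given ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, value TEXT)")
    conn.executemany(
        "INSERT INTO items (id, value) VALUES (?, ?)", [(i, "x") for i in ids]
    )
    return conn


def lowest_free_id(conn):
    """Return the smallest positive id that is not currently in use."""
    return conn.execute(LOWEST_FREE_ID).fetchone()[0]


def insert_filling_gap(conn, value):
    """Insert a row using the lowest free id (single-writer use only)."""
    new_id = lowest_free_id(conn)
    conn.execute("INSERT INTO items (id, value) VALUES (?, ?)", (new_id, value))
    return new_id
```

Making this safe would need something like a table-level lock around the lookup and the insert, which serializes all inserts. That is a large part of why it is discouraged.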

Related

Newbie in dilemma due to OCD tries to reorder SQL database automatically

Sorry, I'm very new to SQL. I just learned it a few hours ago. I'm using MariaDB with the InnoDB engine, HeidiSQL, and CodeIgniter 3. Let's say I have a table named disciples with the following data:
-------------------
| sort_id | name |
-------------------
| 1 | Peter |
| 4 | John |
| 3 | David |
| 5 | Petrus |
| 2 | Matthew |
-------------------
I'm fully aware that it's better to have a column called sort_id to be able to fetch the data using ORDER BY if I prefer a custom sorting. But if I delete row 3, the new table will look like this:
-------------------
| sort_id | name |
-------------------
| 1 | Peter |
| 4 | John |
| 5 | Petrus |
| 2 | Matthew |
-------------------
The thing is, I have OCD (imagine there are 1000 rows), and it hurts my eyes to see this mess with missing numbers under sort_id (in this case number 3; see the table above). I think it has something to do with "relational database". Is there a way to quickly and automatically reassign new sort_id numbers to the remaining rows, in ascending order by name, using SQL code instead of doing it manually? The result I want:
-------------------
| sort_id | name |
-------------------
| 1 | John |
| 2 | Matthew |
| 3 | Peter |
| 4 | Petrus |
-------------------
I figured this out after reading the answer from Lynn Crumbling.
She made me realize I need a primary key to manage my rows better, which is exactly what I was looking for. It turns out that InnoDB automatically creates a hidden primary key, which the HeidiSQL interface does not show unless I define an explicit key column, for example id. Now I can reorganize my table rows by editing the primary key id, and the rows automatically sort themselves the way I want. Before this, I edited sort_id, but the displayed order did not change because it was not the primary key.
------------------------
| id | sort_id | name |
------------------------
| 1 | 1 | Peter |
| 2 | 4 | John |
| 3 | 5 | Petrus |
| 4 | 2 | Matthew |
------------------------
Thank you.
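For reference, the renumbering asked about in the question could be sketched like this. It assumes the table has the explicit id primary key shown above, and it uses Python's sqlite3 as a stand-in for MariaDB; the helper names are made up for the example.

```python
import sqlite3


def build_disciples():
    """Recreate the four remaining rows, with an explicit id primary key."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE disciples (id INTEGER PRIMARY KEY, sort_id INTEGER, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO disciples (sort_id, name) VALUES (?, ?)",
        [(1, "Peter"), (4, "John"), (5, "Petrus"), (2, "Matthew")],
    )
    return conn


def renumber_by_name(conn):
    """Assign sort_id = 1, 2, 3, ... following ascending name order."""
    ids = [row[0] for row in conn.execute("SELECT id FROM disciples ORDER BY name")]
    conn.executemany(
        "UPDATE disciples SET sort_id = ? WHERE id = ?",
        [(position, row_id) for position, row_id in enumerate(ids, start=1)],
    )


def listing(conn):
    """Return (sort_id, name) pairs in sort_id order."""
    return conn.execute(
        "SELECT sort_id, name FROM disciples ORDER BY sort_id"
    ).fetchall()
```

If the only goal is to display rows alphabetically, a plain ORDER BY name in the SELECT does the same job without rewriting any stored numbers.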

Pivot Way or Straight Way in SQL

I have the following association, stored in a pivoted layout:
| DOCID | Note1 | Note2 | Note3 |
|-------|-------|-------|-------|
| 1 | N11 | N21 | N31 |
| 2 | N12 | NULL | N32 |
| 3 | N13 | N23 | N33 |
| 4 | N14 | N24 | NULL |
| 5 | NULL | N25 | N35 |
The other way of storing the same data is:
| DOCID | Field | Value |
|-------|---------|-------|
| 1 | Note1 | N11 |
| 1 | Note2 | N21 |
| 1 | Note3 | N31 |
| 2 | Note1 | N12 |
| 2 | Note3 | N32 |
| 3 | Note1 | N13 |
| 3 | Note2 | N23 |
| 3 | Note3 | N33 |
| 4 | Note1 | N14 |
| 4 | Note2 | N24 |
| 5 | Note2 | N25 |
| 5 | Note3 | N35 |
Which of the two options is better?
I might have more NULL values, in which case the second option seems better because it will have fewer records.
But with 10 million documents, the row count is multiplied by the number of notes (in our case up to 30 million rows, minus the NULLs).
So, considering performance when fetching associated records, which option is better and why?
I will have more notes associated with DocIDs.
"Better" is often subjective. In this case, though, I think one method is generally better than the other.
The second approach is the better approach -- one row per document/note pair. In general, when you have columns that are only distinguished by a number -- but otherwise contain the same things -- then the data model is suspect. There may be good reasons for representing the data across columns, but the structure should be questioned. If you still need it, then fine.
Consider a simple query such as which ids have a particular note. In the first representation, you need to check all three columns. This makes it hard to use an index. And, it negates the value of columnar storage.
If the business changes and you suddenly want 4 notes per docid -- or want to limit them to 2 -- then the table needs to be restructured. That is an expensive process.
I'm not sure what the notes refer to. But if they represent a foreign key relationship to another table, then the pivoted version needs to maintain multiple foreign key relationships -- for essentially the same purpose.
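To make the "which ids have a particular note" point concrete, here is a small sketch using Python's sqlite3 as a stand-in database. The table names doc_wide and doc_notes, the index name, and the helper names are made up for the example.

```python
import sqlite3

WIDE_ROWS = [
    (1, "N11", "N21", "N31"),
    (2, "N12", None, "N32"),
    (3, "N13", "N23", "N33"),
    (4, "N14", "N24", None),
    (5, None, "N25", "N35"),
]


def build():
    """Load the same data into a pivoted table and a one-row-per-note table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE doc_wide (docid INTEGER PRIMARY KEY,"
        " note1 TEXT, note2 TEXT, note3 TEXT)"
    )
    conn.execute(
        "CREATE TABLE doc_notes (docid INTEGER, field TEXT, value TEXT,"
        " PRIMARY KEY (docid, field))"
    )
    # A single index serves every note lookup in the long layout.
    conn.execute("CREATE INDEX doc_notes_value ON doc_notes (value)")
    conn.executemany("INSERT INTO doc_wide VALUES (?, ?, ?, ?)", WIDE_ROWS)
    for docid, *notes in WIDE_ROWS:
        for n, value in enumerate(notes, start=1):
            if value is not None:  # NULL notes simply have no row
                conn.execute(
                    "INSERT INTO doc_notes VALUES (?, ?, ?)",
                    (docid, f"Note{n}", value),
                )
    return conn


def docs_with_note_wide(conn, value):
    """Pivoted layout: every note column has to be checked."""
    sql = "SELECT docid FROM doc_wide WHERE note1 = ? OR note2 = ? OR note3 = ?"
    return [row[0] for row in conn.execute(sql, (value, value, value))]


def docs_with_note_long(conn, value):
    """Long layout: one predicate, regardless of how many notes exist."""
    sql = "SELECT docid FROM doc_notes WHERE value = ?"
    return [row[0] for row in conn.execute(sql, (value,))]
```

Adding a fourth note changes nothing in the long layout, while the pivoted query and table both need to be edited.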

How to add data or change schema to production database

I am new to working with databases and I want to make sure I understand the best way to add or remove data from a database without making a mess of any related data.
Here is a scenario I am working with:
I have a Tags table with an Identity ID column. The tags can be selected via the web application to categorize stories that are submitted by a user. When the database was first seeded, like tags were seeded together in order. As you can see, the Campuses (cities) are 1-4, the Colleges (subjects) are 5-7, and the Populations are 8-11.
If this database is live in production and the client wants to add a new Campus (City) tag, what is the best way to do this?
All the other city tags are sort of organized at the top. It seems like the only option is to insert any new tags at the bottom of the table, where they will take whatever ID is available next. I suppose this is fine, because the DisplayCategory column will tell us which category these new tags actually belong to.
Is this typical? Are there better ways to set up the database or handle this situation so that everything remains more organized?
Thank you
+----+------------------+---------------+-----------------+--------------+--------+----------+
| ID | DisplayName | DisplayDetail | DisplayCategory | DisplayOrder | Active | ParentID |
+----+------------------+---------------+-----------------+--------------+--------+----------+
| 1 | Albany | NULL | 1 | 0 | 1 | NULL |
| 2 | Buffalo | NULL | 1 | 1 | 1 | NULL |
| 3 | New York City | NULL | 1 | 2 | 1 | NULL |
| 4 | Syracuse | NULL | 1 | 3 | 1 | NULL |
| 5 | Business | NULL | 2 | 0 | 1 | NULL |
| 6 | Dentistry | NULL | 2 | 1 | 1 | NULL |
| 7 | Law | NULL | 2 | 2 | 1 | NULL |
| 8 | Student-Athletes | NULL | 3 | 0 | 1 | NULL |
| 9 | Alumni | NULL | 3 | 1 | 1 | NULL |
| 10 | Faculty | NULL | 3 | 2 | 1 | NULL |
| 11 | Staff | NULL | 3 | 3 | 1 | NULL |
+----+------------------+---------------+-----------------+--------------+--------+----------+
The terms "top" and "bottom" aren't really applicable here. "Albany" isn't at the top of the table; it's merely at the top of the particular result you see when you query the table without specifying a sort order. Without an ORDER BY, the order is not guaranteed; in practice it often follows the Id or an internal row identifier, which isn't the logical way to show this data.
Data in the table isn't inherently ordered. If you want to view your tags organized by their category, simply order your query by DisplayCategory (and probably by DisplayOrder afterwards), and you'll see your data properly organized. You can even create a persistent View that sorts it that way for your convenience.
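A small sketch of that idea, using Python's sqlite3 as a stand-in for the production database. Only the columns that matter for ordering are included, and the new tag "Rochester" with DisplayOrder 4 is a made-up example value.

```python
import sqlite3

# (ID, DisplayName, DisplayCategory, DisplayOrder) as seeded in the question
TAGS = [
    (1, "Albany", 1, 0), (2, "Buffalo", 1, 1),
    (3, "New York City", 1, 2), (4, "Syracuse", 1, 3),
    (5, "Business", 2, 0), (6, "Dentistry", 2, 1), (7, "Law", 2, 2),
    (8, "Student-Athletes", 3, 0), (9, "Alumni", 3, 1),
    (10, "Faculty", 3, 2), (11, "Staff", 3, 3),
]


def build_tags():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Tags (ID INTEGER PRIMARY KEY AUTOINCREMENT,"
        " DisplayName TEXT, DisplayCategory INTEGER, DisplayOrder INTEGER)"
    )
    conn.executemany(
        "INSERT INTO Tags (ID, DisplayName, DisplayCategory, DisplayOrder)"
        " VALUES (?, ?, ?, ?)",
        TAGS,
    )
    return conn


def add_tag(conn, name, category, order):
    """Insert a tag and let the database pick the next ID."""
    cur = conn.execute(
        "INSERT INTO Tags (DisplayName, DisplayCategory, DisplayOrder)"
        " VALUES (?, ?, ?)",
        (name, category, order),
    )
    return cur.lastrowid


def tags_in_display_order(conn):
    """The ID plays no part in how the tags are presented."""
    sql = "SELECT DisplayName FROM Tags ORDER BY DisplayCategory, DisplayOrder"
    return [row[0] for row in conn.execute(sql)]
```

The new campus gets ID 12, yet it is listed with the other campuses because the query orders by category and display order, not by ID.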

Nonclustered indexes covering queries

I'm having an issue or two with the following:
A nonclustered index can cover a query. Covering a query means that SQL Server can find
all data needed for the query in a nonclustered index and does not need to do any lookups in the base table.
Does this mean that the data is stored inside the clustered index leaf pages? I thought these contained pointers to the RIDs (heaps) and clustered index keys (clustered index) and the data was stored there?
Also the quote above mentions a 'base table' - is that the heap/clustered index? I'm learning to think of the word 'table' as being the form in which data is returned rather than the form in which it's stored, so to hear it referred to as a storage medium seems misleading.
Any advice/help appreciated.
Perhaps the best way to understand how this works would be an example.
Say you have a table foo with columns a, b, and c, and you run CREATE INDEX foo_a_b ON foo (a, b).
The table might look like this:
| a | b | c |
+-----+-----+-----+
| 1 | 1 | 1 |
| 4 | 10 | 42 |
| 2 | 4 | 42 |
| 5 | 16 | 1 |
| 3 | 8 | 1 |
If you now run the query SELECT a, b, c FROM foo WHERE a < 5 AND b < 10, the DBMS can use the index to find rows that meet the WHERE clause.
In order for that to be true, the index must have the values of a and b as accessible data. It might look something like this:
| a | b | row_address |
+-----+-----+---------------+
| 1 | 1 | 0xABDEFC |
| 2 | 4 | 0xAFBDEC |
| 3 | 8 | 0xFABDEC |
| 4 | 10 | 0xCAFEBA |
| 5 | 16 | 0xADDAFF |
If we instead write SELECT a, b FROM foo WHERE a < 5 AND b < 10, something special happens: to select the values of a and b, we don't actually need to follow the pointer to the full row, we can just scan down the index outputting the pairs that match the condition:
| a | b |
+-----+-----+
| 1 | 1 |
| 2 | 4 |
| 3 | 8 |
This is true regardless of whether the pointer to a full row (which I've called row_address) points to an arbitrary heap address (in the case of a non-clustered table), or a location ordered based on some index key (in the case of a clustered table).
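The same effect can be observed in a small experiment. The sketch below uses Python's sqlite3 rather than SQL Server, so the plan text is SQLite's own wording, but the idea carries over: the planner reports a covering index only when every selected column lives in the index. The helper names are made up for the example.

```python
import sqlite3

COVERED = "SELECT a, b FROM foo WHERE a < 5 AND b < 10"
NOT_COVERED = "SELECT a, b, c FROM foo WHERE a < 5 AND b < 10"


def build_foo():
    """Create the example table and the two-column index from above."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (a INTEGER, b INTEGER, c INTEGER)")
    conn.executemany(
        "INSERT INTO foo VALUES (?, ?, ?)",
        [(1, 1, 1), (4, 10, 42), (2, 4, 42), (5, 16, 1), (3, 8, 1)],
    )
    conn.execute("CREATE INDEX foo_a_b ON foo (a, b)")
    return conn


def plan(conn, sql):
    """Return SQLite's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row is the human-readable detail text.
    return " | ".join(row[-1] for row in rows)


if __name__ == "__main__":
    conn = build_foo()
    print(plan(conn, COVERED))      # mentions a COVERING INDEX
    print(plan(conn, NOT_COVERED))  # needs the base table for column c
```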

Relative incremental ID by reference field

I have a table to store reservations for certain events; relevant part of it is:
from django.db import models
class Reservation(models.Model):
    # Django creates an auto-increment field "id" by default
    event = models.ForeignKey(Event, on_delete=models.CASCADE)
    # Some other reservation-specific fields..
    first_name = models.CharField(max_length=255)
Now, I wish to retrieve the sequential ID of a given reservation relative to reservations for the same event.
Disclaimer: Of course, we assume reservations are never deleted, or their relative position might change.
Example:
+----+-------+------------+--------+
| ID | Event | First name | Rel.ID |
+----+-------+------------+--------+
| 1 | 1 | AAA | 1 |
| 2 | 1 | BBB | 2 |
| 3 | 2 | CCC | 1 |
| 4 | 2 | DDD | 2 |
| 5 | 1 | EEE | 3 |
| 6 | 3 | FFF | 1 |
| 7 | 1 | GGG | 4 |
| 8 | 1 | HHH | 5 |
+----+-------+------------+--------+
The last column is the "Relative ID", that is, a sequential number, with no gaps, for all reservations of the same event.
Now, what's the best way to accomplish this, without having to manually calculate relative id for each import (I don't like that)? I'm using postgresql as underlying database, but I'd prefer to stick with django abstraction layer in order to keep this portable (i.e. no database-specific solutions, such as triggers etc.).
Filtering using Reservation.objects.filter(event_id = some_event_id) should suffice. This will give you a QuerySet that should have the same ordering each time. Or am I missing something in your question?
I hate always being the one who answers his own questions, but I solved it using this:
class Reservation(models.Model):
    # ...
    def relative_id(self):
        # Own id minus the number of earlier reservations that belong to other events
        return self.id - Reservation.objects.filter(id__lt=self.id).exclude(event=self.event).count()
Assuming records from reservations are never deleted, we can safely assume the "relative id" is the incremental id - (count of reservations before this one not belonging to same event).
I tried to think of drawbacks, but I didn't find any.
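One thing worth checking is that the subtraction relies on the ids being gapless and starting at 1. The plain-Python sketch below models the table from the question as (id, event) pairs and compares the subtraction with an alternative that simply counts same-event reservations up to and including the current one. Both function names are made up for the example.

```python
# (id, event) pairs taken from the example table in the question
RESERVATIONS = [(1, 1), (2, 1), (3, 2), (4, 2), (5, 1), (6, 3), (7, 1), (8, 1)]


def relative_id_by_subtraction(rows, res_id):
    """Own id minus the count of earlier reservations for other events."""
    event = dict(rows)[res_id]
    others_before = sum(1 for i, e in rows if i < res_id and e != event)
    return res_id - others_before


def relative_id_by_counting(rows, res_id):
    """Count reservations of the same event with an id up to this one."""
    event = dict(rows)[res_id]
    return sum(1 for i, e in rows if i <= res_id and e == event)


if __name__ == "__main__":
    for res_id, _ in RESERVATIONS:
        print(
            res_id,
            relative_id_by_subtraction(RESERVATIONS, res_id),
            relative_id_by_counting(RESERVATIONS, res_id),
        )
```

On the gapless data both functions agree with the Rel.ID column. If an id is ever missing, the subtraction drifts while the count stays correct. In Django the counting variant would presumably be Reservation.objects.filter(event=self.event, id__lte=self.id).count(), which is a sketch and has not been tested against the real models.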