I have a table place2022 which has a very long CHAR column
timestamp | user_id | pixel_color | coordinate
-----------------+------------------------------------------------------------------------------------------+-------------+------------
17:38:20.021+00 | p0sXpmkcmg1KLiCdK5e4xKdudb1f8cjscGs35082sKpGBfQIw92nZ7yGvWbQ/ggB1+kkRBaYu1zy6n16yL/yjA== | #FF4500 | 371,488
17:38:20.024+00 | Ctar52ln5JEpXT+tVVc8BtQwm1tPjRwPZmPvuamzsZDlFDkeo3+ItUW89J1rXDDeho6A4zCob1MKmJrzYAjipg== | #51E9F4 | 457,493
17:38:20.025+00 | rNMF5wpFYT2RAItySLf9IcFZwOhczQhkRhmTD4gv0K78DpieXrVUw8T/MBAZjj2BIS8h5exPISQ4vlyzLzad5w== | #000000 | 65,986
17:38:20.025+00 | u0a7l8hHVvncqYmav27EARAE6ciLtpUTPXMI33lDrUmtj5Ei3ixlfRuG28KUvs7r5LpeiE/iOKPALVjkILhrYg== | #3690EA | 73,961
The user_ids are already hashes, so all I really care about here is having some sort of id column which is 1-1 with the user_id.
I've counted the number of unique user_ids: 10,381,163, which fits into 24 bits. Therefore, I can compress the id field down to a 32-bit integer using the obvious scheme of "assign 1 to the first new user_id you see, 2 to the second", etc. I don't even care that the user_ids are mapped in the order they're seen; I just need them mapped to 32-bit ints in some invertible manner. I'd also like to persist this mapping somewhere so that, if I want to, I can go backwards.
What would be the best way to achieve this? I imagine that we could create a new table (create table place2022_user_ids as select distinct(user_id) from place2022;?) and then reverse-lookup the user_id column in that table, but I don't know quite how to formulate the queries and also make sure that I'm not doing something ridiculously slow.
I am using PostgreSQL, if it matters.
If you have a recent (>8) version of Postgres, you can add an auto-increment id column to an existing table.
ALTER TABLE place2022
ADD COLUMN id SERIAL PRIMARY KEY;
NB If the existing column is a PRIMARY KEY you will need to drop it first.
See drop primary key constraint in postgresql by knowing schema and table name only
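If you want one compact id per distinct user_id (rather than one per row), here is a minimal sketch of the mapping-table approach described in the question. The CHAR length is assumed from the sample data, and the column name uid is made up:
-- Build the user_id -> small integer mapping.
CREATE TABLE place2022_user_ids (
    id      SERIAL PRIMARY KEY,
    user_id CHAR(88) UNIQUE NOT NULL
);

INSERT INTO place2022_user_ids (user_id)
SELECT DISTINCT user_id FROM place2022;

-- Optionally stamp the compact id back onto the main table:
ALTER TABLE place2022 ADD COLUMN uid INTEGER;

UPDATE place2022 p
SET    uid = m.id
FROM   place2022_user_ids m
WHERE  m.user_id = p.user_id;
Going backwards is then a simple lookup in place2022_user_ids, and the UNIQUE constraint gives you an index for that reverse lookup.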
Let's say I have this simple table called "characters":
realm_id | character_name | xp
---------|----------------|----------
1 | "mike" | 10
1 | "lara" | 25
2 | "mike" | 40
What I want to do is to have unique names depending on the realm_id. So, for example, while having two "mikes" with different realm_ids is allowed, it's not allowed to have two "mikes" within the same realm_id. Is that possible?
If you're looking to perform a SELECT statement on this data, then you'll be looking for something like this (assuming the highest XP wins):
SELECT
    realm_id,
    character_name,
    MAX(xp) AS xp
FROM characters
GROUP BY realm_id, character_name;
However, if you want the table to disallow duplicates in the first place, then you're best off making realm_id and character_name a composite primary key. That will stop the duplication from happening at all, although you'll have to consider what happens when somebody tries to insert a duplicate: it'll throw an error.
Create a primary key on the table that consists of realm_id and character_name. The primary key will enforce uniqueness in the table across realm and character. Thus, you could have realm_id=1, character_name='Mike' and realm_id=2, character_name='Mike', but if you tried to insert realm_id=1 and character_name='Mike' again, the insert would fail. Your uniqueness is guaranteed.
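A minimal sketch of that composite key, assuming the table already exists and contains no duplicate (realm_id, character_name) pairs:
-- Enforce uniqueness of the pair across the whole table.
ALTER TABLE characters
    ADD CONSTRAINT characters_pk PRIMARY KEY (realm_id, character_name);
Any later INSERT that repeats a (realm_id, character_name) pair will fail with a constraint violation.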
(related to Finding the lowest unused unique id in a list and Getting unused unique values on a SQL table)
Suppose I have a table containing an id column and some others (they don't make any difference here):
+-----+-----+
| id |other|
+-----+-----+
The id has a numerically increasing value. My goal is to get the lowest unused id and create that row. So of course the first time I run it, it will return 0 and that row will be created. After a few executions it will look like this:
+-----+-----+
| id |other|
+-----+-----+
| 0 | ... |
| 1 | ... |
| 2 | ... |
| 3 | ... |
| 4 | ... |
+-----+-----+
Fairly often some of these rows might get deleted. Let's assume the rows with ids 1 and 3 are removed. Now the table will look like this:
+-----+-----+
| id |other|
+-----+-----+
| 0 | ... |
| 2 | ... |
| 4 | ... |
+-----+-----+
If I now run the query again, it should return the id 1 and this row should be created:
+-----+-----+
| id |other|
+-----+-----+
| 0 | ... |
| 1 | ... |
| 2 | ... |
| 4 | ... |
+-----+-----+
The next times the query runs it should return the ids 3, 5, 6, etc.
What's the most effective way to run these kinds of queries, given that I need to execute them several times per second (it is fair to assume that the ids are the only purpose of the table)? Is it possible to get the next unused row with one query? Or is it easier and faster to introduce another table which keeps track of the unused ids?
If it is significantly faster it is also possible to get a way to reuse any hole in the table provided that all numbers get reused at some time.
Bonus question: I plan to use SQLite for this kind of storing information as I don't need a database except for storing these id's. Is there any other free (as in speech) server which can do this job significantly faster?
I think I'd create a trigger on delete, and insert the old.id into a separate table.
Then you can SELECT MIN(id) FROM that table to get the lowest id.
Disclaimer: I don't know what database engine you use, so I don't know if triggers are available to you.
Like Dennis Haarbrink said: a trigger on delete and another on insert.
The trigger on delete would take the deleted id and insert it into an id pool table (only one column: id).
The trigger on before insert would check whether an id value is provided; otherwise it would query the id pool table (e.g. SELECT MIN(id) FROM id_pool_table), assign that value, and delete it from the id_pool_table.
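Since the question mentions SQLite, here is an untested sketch of that id-pool idea (all names are hypothetical; note that a SQLite trigger cannot assign to new.id, so the pool lookup happens in the application's INSERT rather than a before-insert trigger):
CREATE TABLE items (id INTEGER PRIMARY KEY, other TEXT);
CREATE TABLE id_pool (id INTEGER PRIMARY KEY);

-- Recycle the id of every deleted row into the pool.
CREATE TRIGGER recycle_id AFTER DELETE ON items
BEGIN
    INSERT INTO id_pool (id) VALUES (old.id);
END;

-- Next id to use: the smallest pooled id, else one past the current maximum.
SELECT COALESCE((SELECT MIN(id) FROM id_pool),
                (SELECT COALESCE(MAX(id), -1) + 1 FROM items));

-- After inserting a row with that id, remove it from the pool:
-- DELETE FROM id_pool WHERE id = ?;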
Normally you'd let the database handle assigning the ids. Is there a particular reason you need the ids to be sequential rather than merely unique? Can you, instead, timestamp them, and just number them when you display them? Or make a separate column for the sequential id and renumber them?
Alternatively, you could not delete the rows themselves, but rather mark them as deleted with a flag in a column, and then re-use the ids of the marked rows by finding the lowest-numbered 'deleted' row and reusing that id, as sketched below.
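A sketch of that soft-delete idea, with assumed table and column names:
-- Mark instead of delete:
UPDATE items SET deleted = 1 WHERE id = 3;

-- Reuse the lowest flagged id for the next logical insert:
UPDATE items
SET    deleted = 0, other = 'new payload'
WHERE  id = (SELECT MIN(id) FROM items WHERE deleted = 1);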
The database doesn't care whether the values are sequential, only that they are unique. The desire to have your id values sequential is purely cosmetic, and if you are exposing this value to users it should not be your primary key, nor should there be any referential integrity based on the value, because a client could change the format if desired.
The fastest and safest way to deal with id value generation is to rely on native functionality that gives you a unique integer value (i.e. SQLite's AUTOINCREMENT). Using triggers only adds overhead, and using MAX(id) + 1 is extremely risky: two concurrent sessions can read the same maximum and collide.
Summary
Ideally, use the native unique integer generator (SQLite/MySQL auto_increment, Oracle/PostgreSQL sequences, SQL Server IDENTITY) for the primary key. If you want a value that is always sequential, add an additional column to store that sequential value and maintain it as necessary. MySQL/SQLite/SQL Server unique integer generation only allows one auto-increment column per table; sequences are more flexible.
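For instance, a sequence is a standalone object that any number of tables or columns can draw from; a minimal PostgreSQL sketch (the items table is assumed):
CREATE SEQUENCE item_seq;

SELECT nextval('item_seq');  -- can be drawn directly, not tied to one column
INSERT INTO items (id, other) VALUES (nextval('item_seq'), 'x');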
I'm not exactly sure how to phrase this, but here goes...
We have a table structure like the following:
Id | Timestamp | Type | Clientid | ..others..
001 | 1234567890 | TYPE1 | CL1234567 |.....
002 | 1234561890 | TYPE1 | CL1234567 |.....
Now for the data given above... I would like to have a constraint so that those 2 rows could not exist together. Essentially, I want the table to be
Unique for (Type, ClientId, CEIL(Timestamp/10000)*10000)
I don't want rows with the same data created within X time of each other to be added to the db; i.e., I would like a constraint violation in this case. The problem is that the above constraint is not something I can actually create.
Before you ask, I know, I know.... why right? Well I know a certain scenario should not be happening, but alas it is. I need a sort of stop gap measure for now, so I can buy some time to investigate the actual matter. Let me know if you need additional info...
Yes, Oracle supports calculated columns (virtual columns, in 11g and later):
SQL> alter table test add calc_column as (trunc(timestamp/10000));
Table altered.
SQL> alter table test
add constraint test_uniq
unique (type, clientid, calc_column);
Table altered.
should do what you want.
AFAIK, Oracle before 11g does not support computed columns the way SQL Server does. You can mimic the functionality of a computed column using triggers.
Here are the steps for this
Add a column called CEILCalculation to your table.
On your table, put a trigger that will update CEILCalculation with the value from CEIL(Timestamp/10000)*10000.
Create a unique index on the three columns: (Type, ClientId, CEILCalculation). A sketch of these steps follows the link below.
If you do not want to modify the table structure, you can put a BEFORE INSERT TRIGGER on the table and check for validity over there.
http://www.techonthenet.com/oracle/triggers/before_insert.php
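A hypothetical sketch of those three steps in Oracle (the table name is made up; column names are taken from the question):
ALTER TABLE mytable ADD (ceilcalculation NUMBER);

CREATE OR REPLACE TRIGGER mytable_biu
BEFORE INSERT OR UPDATE ON mytable
FOR EACH ROW
BEGIN
    -- Maintain the bucketed value the unique index needs.
    :NEW.ceilcalculation := CEIL(:NEW.timestamp / 10000) * 10000;
END;
/

CREATE UNIQUE INDEX mytable_uniq
    ON mytable (type, clientid, ceilcalculation);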
In a MySQL database I have a table with the following primary key
PRIMARY KEY id (invoice, item)
In my application I will also frequently be selecting on item by itself and less frequently on only invoice. I'm assuming I would benefit from indexes on these columns.
MySQL does not complain when I define the following:
INDEX (invoice),
INDEX (item),
PRIMARY KEY id (invoice, item)
But I don't see any evidence (using DESCRIBE -- the only way I know how to look) that separate indexes have been established for these two columns.
Are the columns that make up a primary key automatically indexed individually? Is there a better way than DESCRIBE to explore the structure of my table?
I'm not intimately familiar with the internals of indexes in MySQL, but on the two database products I am familiar with (MS SQL Server and Oracle), indexes are balanced-tree (B-tree) structures, whose nodes are organized as a sequenced tuple of the columns the index is defined on, in the order defined.
So, unless MySQL does it very differently (probably not), any composite index (on more than one column) is usable by any query that needs to filter or sort on a leftmost prefix of the index's columns, i.e., a subset that starts at the first indexed column and has no gaps except at the end.
In other words, if you have an index on (a,b,c,d), a query that filters on (a), (a,b), or (a,b,c) can also use the index, but a query that needs to filter on (b), or (c), or (b,c) will not be able to use it.
So in your case, if you often need to filter or sort on the column item alone, you need to add another index on that column by itself.
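A quick illustration of that leftmost-prefix rule (the table and index names are made up):
CREATE INDEX idx_abcd ON t (a, b, c, d);

-- Can use idx_abcd (leftmost prefixes):
SELECT * FROM t WHERE a = 1;
SELECT * FROM t WHERE a = 1 AND b = 2;

-- Cannot seek via idx_abcd (no leftmost prefix):
SELECT * FROM t WHERE b = 2;
SELECT * FROM t WHERE c = 3 AND d = 4;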
I personally use phpMyAdmin to view and edit the structure of MySQL databases. It is a web application but it runs well enough on a local web server (I run an instance of apache on my machine for this and phpPgAdmin).
As for the composite key of (invoice, item), it acts like an index for (invoice, item) and for invoice. If you want to index by just item you have to add that index yourself. Your PK will be sorted by invoice and then by item where invoice is the same in multiple records. While the order in a composite PK does not matter for uniqueness enforcement, it does matter for access.
On your table I would use:
PRIMARY KEY id (invoice, item), INDEX (item)
I'm not that familiar with MySQL, but generally a multiple-column index is equally useful as an index on its first column alone. The multiple-column index becomes less useful for querying against a single column the further into the index that column appears.
This makes some sense if you think of the multi-column index as a hierarchy. The first column in the index is the root of the hierarchy, so searching it is just a matter of scanning that first level. However, in order to scan the second column, the database has to look under each unique value found in the first column. This can be costly enough that most optimizers won't bother to look deeply into a multi-column index, instead opting for a full table scan.
For example, if you have a table as follows:
Col1 |Col2 |Col3
----------------
A | 1 | Z
A | 2 | Y
A | 2 | X
B | 1 | Z
B | 2 | X
Assuming you have an index on all three columns, in order, the tree will look something like this:
A
+- 1
|  +- Z
+- 2
   +- X
   +- Y
B
+- 1
|  +- Z
+- 2
   +- X
Looking for Col1='A' is easy: you only have to look at 2 ordered values. However, to resolve col3='X', you have to look at all of the values in the 4 bigger buckets, each of which is ordered individually.
To return table index information, you can use:
SHOW INDEX FROM <table>;
See: http://dev.mysql.com/doc/refman/5.0/en/show-index.html
To view table information:
SHOW CREATE TABLE <table>;
See: http://dev.mysql.com/doc/refman/5.0/en/show-create-table.html
Primary keys are indexes, so there's no need to create additional indexes on the same columns. You can find out more information about them under the CREATE TABLE syntax (there's too much to reproduce here):
http://dev.mysql.com/doc/refman/5.0/en/create-table.html
There is a difference between composite index and composite primary key.
If you have defined a composite index like below
INDEX idx(invoice,item)
the index won't be used if you query based on item alone, and you need to add a separate index
INDEX itemidx(item)
But, if you have defined a composite primary key like below
PRIMARY KEY(invoice, item)
the index can still be used if you query based on item alone (as a scan of the whole PK index rather than a seek, as the EXPLAIN below shows), so a separate index may not be required.
Working example:
mysql>create table test ( col1 int(20), col2 int(20), primary key (col1, col2) );
mysql>explain select * from test where col2 = 1;
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | test | index | NULL | PRIMARY | 8 | NULL | 10 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
MySQL automatically creates an index for composite keys. Depending on your queries, you may have to create separate indexes for individual columns of the composite key.
If you are using MySQL Workbench, you can right-click the table and choose to alter it to see everything about the table.
If your query uses both columns in the WHERE clause, then you don't need to create a separate index; the composite primary key covers it.
EXPLAIN SELECT * FROM `table` WHERE invoice = 1 and item = 1
You are also fine if you want to query with the first column only:
EXPLAIN SELECT * FROM `table` WHERE invoice = 1
But if you want to query on subsequent columns (col2, col3) of a composite PK, then you need to create separate indexes on those columns. The following EXPLAIN shows that MySQL detects no possible key for the second column:
EXPLAIN SELECT * FROM `table` WHERE item = 1
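A standalone index on that column fixes it; for example (the index name is assumed):
ALTER TABLE `table` ADD INDEX idx_item (item);

EXPLAIN SELECT * FROM `table` WHERE item = 1;
-- should now show idx_item under possible_keys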
I have an Access table with an automatic primary key, a date, and other data. The first record starts at 36, due to deleted records. I want to change all the primary keys so they begin at 1 and increment, ordered by the date. What's the best way to do this?
I want to change the table from this:
| TestID | Date | Data |
| 36 | 12/02/09 | .54 |
| 37 | 12/04/09 | .52 |
To this:
| TestID | Date | Data |
| 1 | 12/02/09 | .54 |
| 2 | 12/04/09 | .52 |
EDIT: Thanks for the input and those who answered. I think some were reading a little too much into my question, which is okay because it still adds to my learning and thinking process. The purpose of my question was twofold: 1) it would simply be nicer for me to have the PK match the order of my data's dates, and 2) to learn whether something like this is possible for later use, such as if I want to add a new column to the table which numbers the tests, labels the type of test, etc. I am trying to learn a lot at once right now, so I sometimes get confused about where to start. I am building .NET apps and trying to learn SQL and database management, and it is sometimes confusing finding the right info with the different RDBMSs and ways to interact with them.
Following on from MikeW, you can use the following SQL to copy the data from the old table to the new one. Leave TestID out of the column lists so the AutoNumber field generates fresh values:
INSERT INTO NewTable ([Date], Data)
SELECT [Date], Data
FROM OldTable
ORDER BY [Date];
The new TestID will start from 1 if you use an AutoNumber (auto-increment) field.
I would create a new table, with autoincrement.
Then select all the existing data into it, ordering by date. That will result in the IDs being recreated from "1".
Then you could drop the original table, and rename the new one.
Assuming no foreign keys - if so you'd have to drop and recreate those too.
An AutoNumber used as a surrogate primary key is not data, but metadata used to do nothing but connect records in related tables. If you need to control the values in that field, then it's data, and you can't use an AutoNumber, but have to roll your own auto-increment routine. You might want to look at this thread for a starting point, but code for this for use in Access is available everywhere Access programmers congregate on the Net.
I agree that auto-generated IDENTITY values should have no meaning, even for the coder, but for education purposes, here's how to reseed the IDENTITY using ADO:
ACC2000: Cannot Change Default Seed and Increment Value in UI
Note that the article is out of date where it says, "there are no options available in the user interface (UI) for you to make this change." In later versions of Access, the SQL DDL can be executed when in ANSI-92 query mode, e.g. something like this:
ALTER TABLE MyTable ALTER COLUMN TestID COUNTER(1, 1);