Indexes and multi column primary keys - sql

In a MySQL database I have a table with the following primary key
PRIMARY KEY id (invoice, item)
In my application I will also frequently be selecting on item by itself and less frequently on only invoice. I'm assuming I would benefit from indexes on these columns.
MySQL does not complain when I define the following:
INDEX (invoice),
INDEX (item),
PRIMARY KEY id (invoice, item)
But I don't see any evidence (using DESCRIBE -- the only way I know how to look) that separate indexes have been established for these two columns.
Are the columns that make up a primary key automatically indexed individually? Is there a better way than DESCRIBE to explore the structure of my table?

I'm not intimately familiar with the internals of indices in MySQL, but in the two database products that I am familiar with (MS SQL Server, Oracle), indices are balanced-tree structures whose nodes are organized as a sequenced tuple of the columns the index is defined on (in the sequence defined).
So, unless MySQL does it very differently (probably not), any composite index (on more than one column) can be used by any query that needs to filter or sort by a subset of the columns in the index, as long as that subset is a leading prefix of the index: taken in index order, the columns must start at the first index column and continue with no gaps, so only trailing columns may be left out...
In other words, this means that if you have an index on (a,b,c,d) a query that filters on (a), (a,b), or (a,b,c) can also use the index, but a query that needs to filter on (b), or (c) or (b,c) will not be able to use the index...
So in your case, if you often need to filter or sort on column item alone, you need to add another index on that column by itself...
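To make the leading-prefix rule concrete, here is a minimal MySQL sketch. The table t, its columns, and the index name idx_abcd are hypothetical and exist only for illustration. The comments describe what the rule predicts; the optimizer makes the final choice, so the actual plan should be confirmed with EXPLAIN on real data.

```sql
-- Hypothetical table, used only to illustrate the leading-prefix rule.
CREATE TABLE t (
    a INT NOT NULL,
    b INT NOT NULL,
    c INT NOT NULL,
    d INT NOT NULL,
    payload VARCHAR(100),
    INDEX idx_abcd (a, b, c, d)
);

-- Filters on a leading prefix of (a, b, c, d): idx_abcd is a candidate for a lookup.
EXPLAIN SELECT * FROM t WHERE a = 1;
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2;
EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2 AND c = 3;

-- Filters that skip column a: idx_abcd cannot be used to seek to the matching rows.
EXPLAIN SELECT * FROM t WHERE b = 2;
EXPLAIN SELECT * FROM t WHERE b = 2 AND c = 3;
```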

I personally use phpMyAdmin to view and edit the structure of MySQL databases. It is a web application, but it runs well enough on a local web server (I run an instance of Apache on my machine for this and for phpPgAdmin).
As for the composite key of (invoice, item), it acts like an index for (invoice, item) and for invoice. If you want to index by just item you have to add that index yourself. Your PK will be sorted by invoice and then by item where invoice is the same in multiple records. While the order in a composite PK does not matter for uniqueness enforcement, it does matter for access.
On your table I would use:
PRIMARY KEY id (invoice, item), INDEX (item)

I'm not that familiar with MySQL, but generally a multiple-column index is as useful for the first column in the index as an index on that column alone. The multiple-column index becomes less useful for querying against a single column the further into the index that column appears.
This makes some sense if you think of the multi-column index as a hierarchy. The first column in the index is the root of the hierarchy, so searching it is just a matter of scanning that first level. However, in order to scan the second column, the database has to look up the tree for each unique value found in the first column. This can be costly enough that most optimizers won't bother to look deeply into a multi-column index, instead opting to full-table-scan.
For example, if you have a table as follows:
Col1 |Col2 |Col3
----------------
A | 1 | Z
A | 2 | Y
A | 2 | X
B | 1 | Z
B | 2 | X
Assuming you have an index on all three columns, in order, the tree will look something like this:
A
  +-1
    +-Z
  +-2
    +-X
    +-Y
B
  +-1
    +-Z
  +-2
    +-X
Looking for Col1='A' is easy: you only have to look at 2 ordered values. However, to resolve col3='X', you have to look at all of the values in the 4 bigger buckets, each of which is ordered individually.

To return table index information, you can use:
SHOW INDEX FROM <table>;
See: http://dev.mysql.com/doc/refman/5.0/en/show-index.html
To view table information:
SHOW CREATE TABLE <table>;
See: http://dev.mysql.com/doc/refman/5.0/en/show-create-table.html
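Primary keys are indexes, so there's no need to create an additional index on the same column list. A composite primary key does not index its later columns individually, though. You can find out more information about them under the CREATE TABLE syntax (there's too much to insert here):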
http://dev.mysql.com/doc/refman/5.0/en/create-table.html
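If you would rather have something you can filter and sort, the same index metadata is also exposed through information_schema. A small sketch, where invoice_items is a hypothetical stand-in for your table name:

```sql
-- One row per (index, column) pair for the given table in the current database.
SELECT INDEX_NAME, SEQ_IN_INDEX, COLUMN_NAME, NON_UNIQUE
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'invoice_items'   -- hypothetical table name
ORDER BY INDEX_NAME, SEQ_IN_INDEX;
```

A composite primary key shows up as several rows that share the INDEX_NAME PRIMARY, with SEQ_IN_INDEX giving each column's position in the key.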

There is a difference between composite index and composite primary key.
If you have defined a composite index like below
INDEX idx(invoice,item)
the index wont work if you query based on item and you need to add a separate index
INDEX itemidx(item)
But, if you have defined a composite primary key like below
PRIMARY KEY(invoice, item)
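the primary key index can still be used if you query based on item, but only as a full index scan (type = index in the EXPLAIN output below), not as a lookup, so a separate index on item is still worth adding for large tables.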
Working example:
mysql>create table test ( col1 int(20), col2 int(20), primary key(col1,col2) );
mysql>explain select * from test where col2 = 1;
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | test | index | NULL | PRIMARY | 8 | NULL | 10 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+

MySQL automatically creates an index for a composite key. Depending on your queries, you may have to create a separate index for an individual column in the composite key.
If you are using MySQL Workbench, you can right-click the schema and click on edit to see everything about the table.

If your query is using both columns in where clause then you don't need to create a separate index in a composite primary key.
EXPLAIN SELECT * FROM `table` WHERE invoice = 1 and item = 1
You are also fine if you want to query with first column only
EXPLAIN SELECT * FROM `table` WHERE invoice = 1
But if you want to query with subsequent columns col2, col3 in composite PK then you would need to create separate indexes on those columns. The following explain query shows the second column does not have a possible key detected by MySQL
EXPLAIN SELECT * FROM `table` WHERE item = 1
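One way to close that gap, sketched on the assumption that the table really is named `table` as in the queries above and that idx_item (a made-up name) is not already taken:

```sql
-- Add a secondary index on the second primary key column.
ALTER TABLE `table` ADD INDEX idx_item (item);

-- Re-run the check; possible_keys should now list idx_item.
EXPLAIN SELECT * FROM `table` WHERE item = 1;
```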

Related

Configuring indexes in postgres

This is my first time dealing with indexes and I would like to understand a few things.
I have the tables of the following schemas:
Table1: Customer details
id | name | createdOn | username | phone    | address
-----------------------------------------------------
1  | xyz  | some date | xyz12    | 12345678 | abc
The id in the above table is unique. The id is not defined as PK in the table though. Would id + createdOn be a good complex index?
Table2: Tracked customer actions
customer id | name | timestamp | action type | cart value | address
-------------------------------------------------------------------
1           | xyz  | some date | click       | .          | abc
The above table does not have any column with unique values and there can be a lot of sparse data. The above actions table is a sample and can have almost 18 columns, with new data being added frequently. Is having all columns as a index a good one?
The queries on these tables could be both simple and complex as below:
select * from customerDetails
OR
with target_customers as (
select
id as customer_id
from customerDetails
where customer_date > {some date}
)
select avg(a.cart_value)
from actions a
inner join target_customers b on a.customer_id = b.customer_id
where a.action_type = 'cart updated'
These are sample queries and I believe I will be having even more complex queries using different aggregations and joins with other tables as well to gain insights while performing analytics in the future.
I want to understand the best columns for indexes on the above tables.
"The id is not defined as PK in the table though."
That's unusual. Why is that?
"Would id + createdOn be a good complex index?"
No, you'd reverse it: createdOn, id. An index can use the first column alone. This allows you to use the index to order by createdOn and also createdOn between X and Y.
But you probably wouldn't include id in there at all. Make id a primary key and it is indexed.
In general, if you want to cover all possibilities for two keys, make two indexes...
columnA, columnB
columnB
columnA, columnB can cover queries which only reference columnA and can also order by columnA. It can also cover queries which reference both columnA and columnB. But it can't cover a query which only references columnB, so we need a single-column index for columnB.
"Is having all columns as a index a good one?"
Maybe, it depends on your queries, but probably not.
You want to index foreign keys, because that will speed up joins. Note that PostgreSQL does not index the referencing column automatically; only the referenced primary key or unique column gets an index.
You probably want to index timestamps that you're going to search or order by.
Any flags you often query by, such as where action_type = 'cart updated' you may want to index. Or you may want to partition the table by the action type.
"The above actions table is a sample and can have almost 18 columns, with new data being added frequently."
This may be a good use of a single jsonb column to store all the miscellaneous attributes. This allows you to use a single index for the jsonb column. However, jsonb is not a panacea and you will have to choose what to put in jsonb and what to make columns.
For example, a timestamp such as createdOn should probably be a column. Also any foreign keys. And status flags such as action_type.
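Put together, the suggestions above could look roughly like this in PostgreSQL. The table and column names are taken from the sample schema and queries in the question, while the index names and the jsonb column attrs are assumptions made up for this sketch.

```sql
-- Make id the primary key; PostgreSQL creates a unique index for it.
ALTER TABLE customerDetails ADD PRIMARY KEY (id);

-- Supports range filters and ordering on the creation date.
CREATE INDEX customerdetails_createdon_idx ON customerDetails (createdOn);

-- Join column; it is not indexed automatically, even when declared as a foreign key.
CREATE INDEX actions_customer_id_idx ON actions (customer_id);

-- Frequently filtered flag.
CREATE INDEX actions_action_type_idx ON actions (action_type);

-- Hypothetical jsonb column "attrs" holding the miscellaneous attributes.
CREATE INDEX actions_attrs_idx ON actions USING gin (attrs);
```

Whether each of these earns its keep depends on the real queries, so it is worth checking them with EXPLAIN before and after.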

SQL: String column to unique integer?

I have a table place2022 which has a very long CHAR column
timestamp | user_id | pixel_color | coordinate
-----------------+------------------------------------------------------------------------------------------+-------------+------------
17:38:20.021+00 | p0sXpmkcmg1KLiCdK5e4xKdudb1f8cjscGs35082sKpGBfQIw92nZ7yGvWbQ/ggB1+kkRBaYu1zy6n16yL/yjA== | #FF4500 | 371,488
17:38:20.024+00 | Ctar52ln5JEpXT+tVVc8BtQwm1tPjRwPZmPvuamzsZDlFDkeo3+ItUW89J1rXDDeho6A4zCob1MKmJrzYAjipg== | #51E9F4 | 457,493
17:38:20.025+00 | rNMF5wpFYT2RAItySLf9IcFZwOhczQhkRhmTD4gv0K78DpieXrVUw8T/MBAZjj2BIS8h5exPISQ4vlyzLzad5w== | #000000 | 65,986
17:38:20.025+00 | u0a7l8hHVvncqYmav27EARAE6ciLtpUTPXMI33lDrUmtj5Ei3ixlfRuG28KUvs7r5LpeiE/iOKPALVjkILhrYg== | #3690EA | 73,961
The user_ids are already hashes, so all I really care about here is having some sort of id column which is 1-1 with the user_id.
I've counted the number of unique user_ids, which is 10381163, which fits into 24 bits. Therefore, I can compress the id field down to a 32-bit integer using the obvious scheme of "Assign 1 to the first new user_id you see, 2 to the second new user_id you see", etc. I don't even care that the user_id's are mapped in the order that they're seen: I just need them to be mapped in an invertible manner to 32-bit ints somehow. I'd also like to persist this mapping somewhere so that, if I want to, I can go backwards.
What would be the best way to achieve this? I imagine that we could create a new table (create table place2022_user_ids as select distinct(user_id) from place2022;?) and then reverse-lookup the user_id column in that table, but I don't know quite how to formulate the queries and also make sure that I'm not doing something ridiculously slow.
I am using postgresql, if it matters.
If you have a recent (>8) version of Postgres you can add an auto increment id column to an existing table.
ALTER TABLE place2022
ADD COLUMN id SERIAL PRIMARY KEY;
NB If the existing column is a PRIMARY KEY you will need to drop it first.
See drop primary key constraint in postgresql by knowing schema and table name only
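The lookup-table idea from the question can be combined with the serial column shown above. This is only a sketch: the table name place2022_user_ids comes from the question, while the index name and the example id 42 are made up.

```sql
-- One row per distinct user_id.
CREATE TABLE place2022_user_ids AS
SELECT DISTINCT user_id FROM place2022;

-- Assign a compact integer to every distinct hash.
ALTER TABLE place2022_user_ids ADD COLUMN id serial PRIMARY KEY;

-- Enforce the 1-1 mapping and make hash -> integer lookups fast.
CREATE UNIQUE INDEX place2022_user_ids_user_id_key
    ON place2022_user_ids (user_id);

-- Forward lookup: read the data with the integer in place of the hash.
SELECT p."timestamp", m.id AS user_int, p.pixel_color, p.coordinate
FROM place2022 p
JOIN place2022_user_ids m USING (user_id);

-- Reverse lookup: recover the original hash for a given integer.
SELECT user_id FROM place2022_user_ids WHERE id = 42;
```

Because the mapping is stored in its own table, it persists independently of place2022 and can be inverted at any time.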

Uniqueness in many-to-many

I couldn't figure out what terms to google, so help tagging this question or just pointing me in the way of a related question would be helpful.
I believe that I have a typical many-to-many relationship:
CREATE TABLE groups (
id integer PRIMARY KEY);
CREATE TABLE elements (
id integer PRIMARY KEY);
CREATE TABLE groups_elements (
groups_id integer REFERENCES groups,
elements_id integer REFERENCES elements,
PRIMARY KEY (groups_id, elements_id));
I want to have a constraint that there can only be one groups_id for a given set of elements_ids.
For example, the following is valid:
groups_id | elements_id
1 | 1
1 | 2
2 | 2
2 | 3
The following is not valid, because then groups 1 and 2 would be equivalent.
groups_id | elements_id
1 | 1
1 | 2
2 | 2
2 | 1
Not every subset of elements must have a group (this is not the power set), but new subsets may be formed. I suspect that my design is incorrect since I'm really talking about adding a group as a single entity.
How can I create identifiers for subsets of elements without risk of duplicating subsets?
That is an interesting problem.
One solution, albeit a klunky one, would be to store a concatenation of groups_id and elements_id in the groups table: 1-1-2 and make it a unique index.
Trying to do a search for duplicate groups before inserting a new row would be an enormous performance hit.
The following query would spit out offending group ids:
with group_elements_arr as (
select groups_id, array_agg(elements_id order by elements_id) elements
from groups_elements
group by groups_id )
select elements, count(*), array_agg(groups_id) offending_groups
from group_elements_arr
group by elements
having count(*) > 1;
Depending on the size of groups_elements and its change rate, you might get away with stuffing something along these lines into a trigger watching groups_elements. If that's not fast enough, you can materialize group_elements_arr into a real table managed by triggers.
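And I think the trigger should be a constraint trigger declared INITIALLY DEFERRED, so that a new group can be built up before the check runs. (In PostgreSQL, such deferrable constraint triggers have to be FOR EACH ROW rather than FOR EACH STATEMENT.)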
This link from user ypercube was most helpful: unique constraint on a set. In short, a bit of what everyone is saying is correct.
It's a question of tradeoffs, but here are the best options:
a) Add a hash or some other combination of element values to the groups table and make it unique, then populate the groups_elements table off of it using triggers. Pros of this method are that it preserves querying ability and enforces the constraint so long as you deny naked updates to groups_elements. Cons are that it adds complexity and you've now introduced logic like "how do you uniquely represent a set of elements" into your database.
b) Leave the tables as-is and control the access to groups_elements with your access layer, be it a stored procedure or otherwise. This has the advantage of preserving querying ability and keeps the database itself simple. However, it means that you are moving an analytic constraint into your access layer, which necessarily means that your access layer will need to be more complex. Another point is that it separates what the data should be from the data itself, which has both pros and cons. If you need faster access to whether or not a set already exists, you can attack that problem separately.
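As a rough illustration of option a), here is one possible shape for the extra column, assuming PostgreSQL. The column name element_set is hypothetical, and the sketch relies on whoever writes the row (a trigger or the access layer) storing the member ids sorted ascending and without duplicates, because '{1,2}' and '{2,1}' are different arrays.

```sql
-- Revised groups table: element_set holds the normalized set of member ids.
CREATE TABLE groups (
    id          integer PRIMARY KEY,
    element_set integer[] NOT NULL UNIQUE
);

INSERT INTO groups VALUES (1, '{1,2}');
INSERT INTO groups VALUES (2, '{2,3}');

-- Rejected with a unique violation: the set {1,2} already belongs to group 1.
INSERT INTO groups VALUES (3, '{1,2}');
```

Very large sets may exceed the size limit for an index entry, in which case storing a hash of the normalized array, the other variant mentioned in a), is the safer choice.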

Adding a unique constraint on calculated value of a column

I'm not exactly sure how to phrase this, but here goes...
We have a table structure like the following:
Id | Timestamp | Type | Clientid | ..others..
001 | 1234567890 | TYPE1 | CL1234567 |.....
002 | 1234561890 | TYPE1 | CL1234567 |.....
Now for the data given above... I would like to have a constraint so that those 2 rows could not exist together. Essentially, I want the table to be
Unique for (Type, ClientId, CEIL(Timestamp/10000)*10000)
I don't want rows with the same data created within X time of each other to be added to the db, i.e. I would like a constraint violation in this case. The problem is that the above constraint is not something I can actually create.
Before you ask, I know, I know.... why right? Well I know a certain scenario should not be happening, but alas it is. I need a sort of stop gap measure for now, so I can buy some time to investigate the actual matter. Let me know if you need additional info...
Yes, Oracle supports calculated columns:
SQL> alter table test add calc_column as (trunc(timestamp/10000));
Table altered.
SQL> alter table test
add constraint test_uniq
unique (type, clientid, calc_column);
Table altered.
should do what you want.
AFAIK, Oracle versions before 11g do not support computed (virtual) columns like SQL Server does. On those versions you can mimic the functionality of a computed column using triggers.
Here are the steps for this
1. Add a column called CEILCalculation to your table.
2. On your table, put a trigger that will update CEILCalculation with the value of CEIL(Timestamp/10000)*10000.
3. Create a unique index on the three columns (Type, ClientId, CEILCalculation).
If you do not want to modify the table structure, you can put a BEFORE INSERT TRIGGER on the table and check for validity over there.
http://www.techonthenet.com/oracle/triggers/before_insert.php
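Another option that leaves the table structure untouched is a function-based unique index, which Oracle supports. A minimal sketch, reusing the table name test from the answer above; the index name is made up:

```sql
-- Unique per (type, clientid, 10000-unit timestamp bucket), as described in the question.
CREATE UNIQUE INDEX test_type_client_bucket_uq
    ON test (type, clientid, CEIL(timestamp / 10000) * 10000);
```

With this in place, the two sample rows from the question fall into the same bucket (both round up to 1234570000), so inserting the second one would fail with a unique constraint violation.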

What's a MySQL index table?

I need to speed up a query. Is an index table what I'm looking for? If so, how do I make one? Do I have to update it each insert?
Here are the table schemas:
--table1-- | --tableA-- | --table2--
id | id | id
attrib1 | t1id | attrib1
attrib2 | t2id | attrib2
| attrib1 |
And the query:
SELECT
table1.attrib1,
table1.attrib2,
tableA.attrib1
FROM
table1,
tableA
WHERE
table1.id = tableA.t1id
AND (tableA.t2id = x or ... or tableA.t2id = z)
GROUP BY
table1.id
You need to create a composite index on tableA:
CREATE INDEX ix_tablea_t1id_t2id ON tableA (t1id, t2id)
Indexes in MySQL are considered a part of a table: they are updated automatically, and used automatically whenever the optimizer decides it's a good move to use them.
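To see whether the optimizer actually picks the index, you can run the query through EXPLAIN. The sketch below is a simplified form of the query from the question (explicit JOIN, IN list, no GROUP BY), and the values 1, 2, 3 are placeholders for x ... z.

```sql
EXPLAIN
SELECT table1.attrib1, table1.attrib2, tableA.attrib1
FROM table1
JOIN tableA ON table1.id = tableA.t1id
WHERE tableA.t2id IN (1, 2, 3);   -- hypothetical values

-- If the plan still shows tableA being read in full, an index that leads with
-- the filtered column may be worth trying (index name is made up):
-- CREATE INDEX ix_tablea_t2id_t1id ON tableA (t2id, t1id);
```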
MySQL does not use the term index table.
The closest term is Oracle's index-organized table, which is what other databases call a CLUSTERED INDEX: a kind of table where the records themselves are arranged according to the value of a column (or a set of columns).
In MySQL:
When you use MyISAM storage, an index is created as a separate file that has .MYI extension.
The contents of this file represent a B-Tree, each leaf containing the index key and a pointer to the offset in .MYD file which contains the data.
The size of the pointer is determined by the server setting called myisam_data_pointer_size, which can vary from 2 to 7 bytes, and defaults to 6 since MySQL 5.0.6.
This allows creating MyISAM tables up to 2 ^ (8 * 6) bytes = 256 TB
In InnoDB, all tables are inherently ordered by the PRIMARY KEY, it does not support heap-organized tables.
Each index, therefore, is in fact just a plain InnoDB table with a single PRIMARY KEY of N+M columns: N columns being the indexed value, and M columns being the PRIMARY KEY of the main table record which holds the indexed data.