How to use Oracle Indexes

I am a PHP developer with little Oracle experience who is tasked to work with an Oracle database.
The first thing I have noticed is that the tables don't seem to have an auto number index as I am used to seeing in MySQL. Instead they seem to create an index out of two fields.
For example I noticed that one of the indexes is a combination of a Date Field and foreign key ID field. The Date field seems to store the entire date and timestamp so the combination is fairly unique.
If the index name was PLAYER_TABLE_IDX how would I go about using this index in my PHP code?
I want to reference a unique record by this index (rather than using two AND clauses in the WHERE portion of my SQL query)
Any advice Oracle/PHP gurus?

I want to reference a unique record by this index (rather than using two AND clauses in the WHERE portion of my SQL query)
There's no way around it: you have to reference all the columns in a composite primary key to get a unique row.
You can't use an index directly in a SQL query.
In Oracle, you can use hint syntax to suggest an index, but otherwise the only way to encourage the use of an index is to reference the column(s) associated with it in the SELECT, JOIN, WHERE and ORDER BY clauses.
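For illustration, here is a minimal sketch of both approaches. Only the index name PLAYER_TABLE_IDX appears in the question, so the table and column names (player_table, game_date, team_id) are assumptions:

-- Referencing the indexed columns in the WHERE clause (the normal way):
SELECT *
  FROM player_table
 WHERE game_date = TO_DATE('2009-06-01 14:30:00', 'YYYY-MM-DD HH24:MI:SS')
   AND team_id = 42;

-- Explicitly hinting the index (only a suggestion to the optimizer):
SELECT /*+ INDEX(p PLAYER_TABLE_IDX) */ *
  FROM player_table p
 WHERE p.game_date = TO_DATE('2009-06-01 14:30:00', 'YYYY-MM-DD HH24:MI:SS')
   AND p.team_id = 42;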
The first thing I have noticed is that the tables don't seem to have an auto number index as I am used to seeing in MySQL.
Oracle (and PostgreSQL) have what are called "sequences". They're separate objects from the table, but are used for functionality similar to MySQL's auto_increment. Unlike MySQL's auto_increment, a sequence is never tied to a specific table, so you can use more than one sequence per table and control each one individually.
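A minimal sketch of sequence usage; the sequence, table, and column names here are hypothetical:

-- Create a sequence, then draw values from it on insert:
CREATE SEQUENCE player_seq START WITH 1 INCREMENT BY 1;

INSERT INTO player_table (player_id, name)
VALUES (player_seq.NEXTVAL, 'Alice');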
Instead they seem to create an index out of two fields.
That's a choice of the table's design; there's nothing specifically Oracle about it.
But I think it's time to address that "index" has a different meaning in a database than the way you are using the term. An index is an additional structure that makes SELECTing data out of a table faster (but makes INSERT/UPDATE/DELETE slower, because the index has to be maintained).
What you're talking about is actually called a primary key, and in this example it'd be called a composite key because it involves more than one column. One of the columns, either the DATE (consider it DATETIME) or the foreign key, can have duplicates in this case. But because of the key being based on both columns, it's the combination of the two values that makes them the key to a unique record in the table.
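To make the distinction concrete, here is a hedged sketch of such a composite key; the table and column names are invented for illustration:

-- A composite primary key over a date column and a foreign key column:
CREATE TABLE player_stats (
    stat_date  DATE    NOT NULL,
    player_id  NUMBER  NOT NULL,
    score      NUMBER,
    CONSTRAINT player_stats_pk PRIMARY KEY (stat_date, player_id)
);

-- Retrieving one unique row requires both key columns:
SELECT *
  FROM player_stats
 WHERE stat_date = TO_DATE('2009-06-01', 'YYYY-MM-DD')
   AND player_id = 42;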

http://use-the-index-luke.com/ is my web book that explains how to use indexes in Oracle.
It's overkill for your question; however, it is probably worth reading if you want to understand how things work.

Related

MariaDB Indexing

Let's say I have a table of 200,000,000 users. For each user I have saved a certain attribute. Let it be their lastname.
I am unsure of which index type to use with MariaDB. The only queries made to the database will be in the form of SELECT lastname FROM table WHERE username='MYUSERNAME'.
Is it therefore best to just define the column username as a primary key? Or do I need to do anything else? Also, how long is it going to take until the index is built?
Sorry for this question, but this is my first database with more than 200,000 rows.
I would go with:
CREATE INDEX userindex on `table`(username);
This indexes the usernames, since that is what your query is searching on, and will speed up the results coming back.
Try it, and if it reduces performance, just delete the index; nothing lost (although make sure you do have backups! :))
This article will help you out https://mariadb.com/kb/en/getting-started-with-indexes/
It says primary keys are best set at table creation, and as I guess yours already exists, that would mean either copying it and creating a primary key, or just using an index.
I recently indexed a table with non-unique strings as an ID, and although it took a few minutes to index, the speed improvement was great; this table was 57m rows.
-EDIT- Just re-read and thought it was 200,000 as mentioned at the end, but I see it is 200,000,000 in the title; that's a hella lotta rows.
username sounds like something that is "unique" and not null. So, make it NOT NULL and have PRIMARY KEY(username), without an AUTO_INCREMENT surrogate PK.
If it is not unique, or cannot be NOT NULL, then INDEX(username) is very likely to be useful.
To design indexes, you must first know what queries you will be performing. (If you had called it simply "col1", I would not have been able to guess at the above advice.)
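A minimal sketch of the primary-key approach described above; the column types are assumptions:

-- username as the natural primary key, no surrogate id:
CREATE TABLE users (
    username VARCHAR(64) NOT NULL,
    lastname VARCHAR(64),
    PRIMARY KEY (username)
);

-- The only query mentioned in the question, served directly by the PK:
SELECT lastname FROM users WHERE username = 'MYUSERNAME';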
There are 3 index types (each sketched after this list):
BTree (actually B+Tree; see Wikipedia). This is the default and the most commonly used index type. It is efficient at finding a row given a specific value (WHERE user_name = 'joe'). It is also useful for a range of values (WHERE user_name LIKE 'Smith%').
FULLTEXT is useful for a TEXT column where you want to search for "words" inside it.
SPATIAL is useful for 2-dimensional data, such as geographical points on a map or other type of grid.
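Hedged sketches of all three, with invented table and column names; note that a SPATIAL index requires a NOT NULL geometry column, and support depends on the storage engine:

CREATE INDEX ix_users_username ON users (username);     -- BTree (the default)
CREATE FULLTEXT INDEX ft_posts_body ON posts (body);    -- FULLTEXT, word search in TEXT
CREATE SPATIAL INDEX sp_places_pt ON places (pt);       -- SPATIAL, 2-dimensional data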

Database: Should ids be sequential?

I want to use an id as a primary key for my table. In each record, I am also storing an id from another source, but these ids are in no way sequential.
Should I add an (auto-incremented) column with a "new" id? It is very important that queries by the id are as fast as possible.
Some info:
The content of my table is only stored temporarily; the table often gets cleared (TRUNCATE) and then filled with new content.
It's SQL Server 2008.
After writing content to the table, I create an index for the id column
Thanks!
As long as you are sure the supplied ids are unique, there's no need to create another (surrogate) id to use as the primary key.
Under most circumstances, an index on the existing id should be sufficient. You can make it slightly faster by declaring it as a primary key.
From what you describe a new id is not necessary for performance. If you do add one, the table will be slightly larger, which has a (very small) negative effect on performance.
If the existing id is not numeric (or not an integer), then there might be a small gain from using a more efficient type for the index. But, your best bet is to make the existing id a primary key (although this might affect load performance).
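A sketch of that recommendation; the table and column names are invented, and an integer id type is assumed:

-- The externally supplied id doubles as the primary key:
CREATE TABLE staging (
    external_id INT NOT NULL,
    payload     NVARCHAR(255),
    CONSTRAINT pk_staging PRIMARY KEY (external_id)
);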
Note: I usually prefer synthetic primary keys, so this answer is very specific to your question.
If you are after speed, I would concatenate the two IDs together (either in the application or in a stored procedure) and then put them in one column.

SQL Server: How to allow duplicate records on small table

I have a small table "ImgViews" that only contains two columns, an ID column called "imgID" + a count column called "viewed", both set up as int.
The idea is to use this table only as a counter so that I can track how often an image with a certain ID is viewed / clicked.
The table has no primary or foreign keys and no relationships.
However, when I enter some data for testing and try entering the same imgID multiple times, it always appears greyed out and with a red error icon.
Usually this makes sense, as you don't want duplicate records, but as the purpose is different here, duplicates do make sense for me.
Can someone tell me how I can achieve this or work around it ? What would be a common way to do this ?
Many thanks in advance, Tim.
To address your requirement to store non-unique values, simply remove primary keys, unique constraints, and unique indexes. I expect you may still want a non-unique clustered index on imgID to improve performance of aggregate queries that would otherwise require a scan of the entire table and a sort. I suggest you store an insert timestamp, not to provide uniqueness, but to facilitate purging data by date, should the need arise in the future.
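A hedged sketch of that layout; the timestamp column name viewedAt is an assumption:

-- No PK, no unique constraints; a non-unique clustered index plus a timestamp:
CREATE TABLE ImgViews (
    imgID    INT      NOT NULL,
    viewed   INT      NOT NULL,
    viewedAt DATETIME NOT NULL DEFAULT GETDATE()
);
CREATE CLUSTERED INDEX cix_ImgViews_imgID ON ImgViews (imgID);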
You must already have some unique index on that table; make sure there is no unique index and no unique or primary key constraint.
Or, SSMS simply doesn't know how to identify the row that was just inserted, because it has no key.
It is generally not best practice to have a table without a (logical) primary key. In your case, I'd make the image id the primary key and increment the counter. The MERGE statement is well-suited for performing an insert or update at the same time. Alternatives exist.
If you don't like that, create a surrogate primary key (an identity column set as the primary key).
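A minimal sketch of the MERGE upsert suggested above (T-SQL); @imgID is a placeholder parameter:

DECLARE @imgID INT = 42;

-- Insert a new counter row, or increment the existing one:
MERGE ImgViews AS t
USING (SELECT @imgID AS imgID) AS s
    ON t.imgID = s.imgID
WHEN MATCHED THEN
    UPDATE SET t.viewed = t.viewed + 1
WHEN NOT MATCHED THEN
    INSERT (imgID, viewed) VALUES (s.imgID, 1);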
At the moment you have no way of addressing a specific row. That makes the table a little unwieldy.
If you allow multiple rows to be absolutely identical, how would you update or delete one of those rows?
How would you expect the database to "know" which row you referred to?
At the very least, add a separate identity column (preferably making it the clustered index, too).
As a side note: it's weird that you "like to avoid unneeded data" but at the same time insert duplicates over and over again instead of simply adding up the click count per single image...
Use SQL statements, not the GUI, if the table has no primary key or unique constraint.
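That is, insert rows directly; the values here are placeholders:

-- Works even when the SSMS grid editor complains about duplicates:
INSERT INTO ImgViews (imgID, viewed) VALUES (42, 1);
INSERT INTO ImgViews (imgID, viewed) VALUES (42, 1);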

Why do most SQL databases allow defining the same index twice?

Why do most SQL databases allow defining the same index (or constraint) twice?
For example in MySQL I can do:
CREATE TABLE testkey(id VARCHAR(10) NOT NULL, PRIMARY KEY(id));
ALTER TABLE testkey ADD KEY (id);
ALTER TABLE testkey ADD KEY (id);
SHOW CREATE TABLE testkey;
CREATE TABLE `testkey` (
`id` varchar(10) NOT NULL,
PRIMARY KEY (`id`),
KEY `id` (`id`),
KEY `id_2` (`id`)
)
I do not see any use case for having the same index or constraint twice, and I would like SQL databases not to allow me to do so.
I also do not see the point of naming indexes or constraints, as I could reference them for deletion just as I created them.
Several reasons come to mind. In the case of a database product which supports multiple index types, it is possible that you might want to have the same field or combination of fields indexed multiple times, with each index having a different type depending on intended usage. For example, some (perhaps most) database products have a tree-structured index which is good for both direct lookup (e.g. KEY_FIELD = 1) and range scans (e.g. KEY_FIELD > 0 AND KEY_FIELD < 5). In addition, some (but definitely not all) database products also support a hashed index type, which is only useful for direct lookups but which is very fast (e.g. it would work for a comparison such as KEY_FIELD = 1, but could not be used for a range comparison). If you need very fast direct lookup times but still need to provide for range comparisons, it might be useful to create both a tree-structured index and a hashed index.
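A hedged sketch of that situation in MySQL, with invented table and index names; the MEMORY engine honors USING HASH, while InnoDB silently falls back to BTREE:

CREATE TABLE lookups (key_field INT NOT NULL) ENGINE=MEMORY;
CREATE INDEX ix_lookups_btree ON lookups (key_field) USING BTREE; -- ranges + lookups
CREATE INDEX ix_lookups_hash  ON lookups (key_field) USING HASH;  -- fast direct lookups only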
Some database products do prevent you from having multiple primary key constraints on a table. However, preventing all possible duplicates might require more effort on the part of the database vendor than they feel can be justified. In the case of an open source database the principal developers might take the view that if a given feature is a big enough deal to a given user it should be up to that user to send in a code patch to enable whatever feature it is that is wanted. Open source is not a euphemism for "I use your open-source product; therefore, you are now my slave and must implement every feature I might ever want!".
In the end I think it's fair to say that a product which is intended for use by software developers can take it as a given that the user should be expected to exercise reasonable care when using the product.
All programming languages allow you to write redundancies:
<?php
$foo = 'bar';
$foo = 'bar';
That's just an example, you could obviously have duplicate code, duplicate functions, or duplicate data structures that are much more wasteful.
It's up to you to write good code, and this depends on the situation. Maybe there's a good reason in some rare case to write something that seems redundant. In that case, you'd be just as put out if the technology didn't allow you to do it.
You might be interested in a tool called Maatkit, which is a collection of indispensable tools for MySQL users. One of its tools checks for duplicate keys:
http://www.maatkit.org/doc/mk-duplicate-key-checker.html
If you're a MySQL developer, novice or expert, you should download Maatkit right away and set aside a full day to read the docs, try out each tool in the set, and learn how to integrate them into your daily development tasks. You'll kick yourself for not doing it sooner.
As for naming indexes, it allows you to do this:
ALTER TABLE testkey DROP KEY `id`, DROP KEY `id_2`;
If they weren't named, you'd have no way to drop individual indexes. You'd have to drop the whole table and recreate it without the indexes.
There are only two good reasons, that I can think of, for allowing the same index to be defined twice:
for compatibility with existing scripts that do define the same index twice.
changing the implementation would require work that I am neither willing to do nor pay for
Some databases do prevent duplicate indexes. Oracle Database raises an error for an exact duplicate (https://www.techonthenet.com/oracle/errors/ora01408.php), while other databases like MySQL and PostgreSQL have no duplicate index prevention.
You shouldn't be in a scenario where you have so many indexes on a table that you can't just quickly look and see if the index is in there.
As for naming constraints and indexes, I only really ever name constraints. I will name a constraint FK_CurrentTable_ForeignKeyedColumn, just so things are more visible when quickly looking through lists of them.
Because of how composite (multi-column) indexes work. Oracle, MySQL, SQL Server and PostgreSQL all support indexes over two or more columns, and the column list is processed left to right when deciding whether the index can be used.
So if I define a composite index on columns 1, 2 and 3, my queries need to use, at a minimum, column 1 to use the index. The next possible combination is columns 1 & 2, and finally 1, 2 and 3.
So what about my queries that only use column 3? Without the other two columns, the composite index can't be used. It's the same issue for use of only column 2... In either case, that's a situation where I would consider separate indexes on columns 2 and 3.
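A sketch of that left-prefix rule; the table and column names are invented:

CREATE INDEX ix_t_123 ON t (col1, col2, col3);

-- Can use ix_t_123 (leading column present):
SELECT * FROM t WHERE col1 = 1;
SELECT * FROM t WHERE col1 = 1 AND col2 = 2;

-- Cannot use ix_t_123 efficiently (no leading col1):
SELECT * FROM t WHERE col3 = 3;

-- A separate index serves that query:
CREATE INDEX ix_t_3 ON t (col3);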

SQL Server select primary key from table where the key contains multiple columns

I am working on a legacy database. I am not able to change the schema :( In a couple of tables the primary key uses multiple columns.
In the app I read each row's data into a table; the user then updates the data, and I write it back into the table.
Currently I concatenate the various PK columns and store them as a unique id for when I put the data back into the table.
Now I was wondering if there is a more efficient way to do that. Coming from a MySQL background I am not aware of any, but thought SQL Server 2005 may have a function like:
SELECT PRIMARYKEY() as pk, ... FROM table WHERE ...
the above would select the key that the database engine uses as the primary key for the given record
I searched and couldn't find anything. It's probably just me being fussy, but I don't like the concatenation trick.
DC
In SQL Server, there is no equivalent of PRIMARYKEY() that I'm aware of. You can consult the system catalog views to find out which columns make up the primary key, but you can't simply select the primary key value(s) with a function call.
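For example, a sketch of the catalog-view lookup; 'dbo.MyTable' is a placeholder name:

-- List the columns of a table's primary key, in key order:
SELECT c.name
  FROM sys.key_constraints kc
  JOIN sys.index_columns ic
    ON ic.object_id = kc.parent_object_id
   AND ic.index_id  = kc.unique_index_id
  JOIN sys.columns c
    ON c.object_id = ic.object_id
   AND c.column_id = ic.column_id
 WHERE kc.type = 'PK'
   AND kc.parent_object_id = OBJECT_ID('dbo.MyTable')
 ORDER BY ic.key_ordinal;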
I would agree with StarShip3000 - what do you concatenate your PK values for? While I don't think a compound primary key made up of several columns is necessarily a very good idea, if it's a legacy system and you can't change it, I wouldn't bother concatenating the PK values on read, and then having to split them apart again when you write your data back. Just leave the structure as it is - compound keys aren't generally recommended, but they are indeed supported, no problem.
"Currently I concatenate the various PK columns and store them as a unique id for when I put the data back into the table."
Can't you just store the pk as two columns in the target table and use that to join back to the two columns on the source table?
What benefit is concatenating giving you here?
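In other words, a sketch of simply carrying both key columns along; all names here are hypothetical:

-- Keep the PK columns separate and join on both when writing back:
SELECT s.*
  FROM SourceTable s
  JOIN WorkingCopy w
    ON w.keyPart1 = s.keyPart1
   AND w.keyPart2 = s.keyPart2;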