Performance issue related to using WHERE clause - sql

I am facing here a problem with a SQL statement, which fetches the data from table called News according to category :
News : Id(PK) int, Title string, Subject string, Uid int, Cid int.
SELECT Id, Subject, Uid, Title FROM News WHERE Uid = #Uid
This statement operates slowly comparing to statement without WHERE since it should go and check every single row to ensure if it is accomplish the condition.
So imagine with me the table News with 10000000 article. What should I do about such a thing?

If the Id column is a primary key that might already be clustered (they are by default), you just need to create a non-clustered index on the Uid column - especially of the Uid column is a uniqueidentifier (GUID) data type.
This can be created by running the following SQL:
CREATE NONCLUSTERED INDEX IX_News_Uid ON News
(
Uid
)

You should create an index for the column Uid, with Id, Subject and Title as included columns.
That way the database can run the query using only the index, and doesn't have to touch the table at all.

I would create an unique index on Uid.

You should could create a clustered index on the Uid column. This should improve performance. Note: You can only have one clustered index column per table.

Related

What is the most efficient strategy for lookups on a large, static table which is already in sorted order (sqlite)?

I have a basic reverse lookup table in which the ids are already sorted in ascending numerical order:
id INT NOT NULL,
value INT NOT NULL
The ids are not unique; each id has from 5 to 25,000 associated values. Each id is independent, i.e., no relationships between the ids.
The table is static. Read only, no inserts or updates ever. The table has 100-200 million records. The database itself will be around 7-12gb. Sqlite.
I will do frequent lookups in this table and want the fastest response time for each query. Lookups are one-direction only, unordered, and always of the form:
SELECT value WHERE id IN (x,y,z)
What advantages does the pre-sorted order give me in terms of database efficiency? What should I do differently than I would with typical unordered tables? How do I tell sql that it's an ordered list?
What about indices: is it necessary or even helpful to create an index on id?
[Updated for clustered comment thanks to Gordon Linoff]. As far as I can tell, sqlite doesn't support clustered indices directly. The wiki says: "Are [clustered indices] supported? No, but if you use INTEGER PRIMARY KEY it acts as a clustered index." In my situation, the column id is not unique...
Assuming that space is not an issue, you should create an index on (id, value). This should be sufficient for your purposes.
However, if the table is static, then I would recommend that you create a clustered index when you create the table. The index would have the same keys, (id, value).
If the table happens to be sorted, the database does not know about this, so you'd still need an index.
It is a better idea to use a WITHOUT ROWID table (what other DBs call a clustered index):
CREATE TABLE MyLittleLookupTable (
id INTEGER,
value INTEGER,
PRIMARY KEY (id, value)
) WITHOUT ROWID;

RDBMS primary key design for row versioning

I want to design primary key for my table with row versioning. My table contains 2 main fields : ID and Timestamp, and bunch of other fields. For a unique "ID" , I want to store previous versions of a record. Hence I am creating primary key for the table to be combination of ID and timestamp fields.
Hence to see all the versions of a particular ID, I can give,
Select * from table_name where ID=<ID_value>
To return the most recent version of a ID, I can use
Select * from table_name where ID=<ID_value> ORDER BY timestamp desc
and get the first element.
My question here is, will this query be efficient and run in O(1) instead of scanning the entire table to get all entries matching same ID considering ID field was a part of primary key fields? Ideally to get a result in O(1), I should have provided the entire primary key. If it does need to do entire table scan, then how else can I design my primary key so that I get this request done in O(1)?
Thanks,
Sriram
The canonical reference on this subject is Effective Timestamping in Databases:
https://www.cs.arizona.edu/~rts/pubs/VLDBJ99.pdf
I usually design with a subset of this paper's recommendations, using a table containing a primary key only, with another referencing table that has that key as well change_user, valid_from and valid_until colums with appropriate defaults. This makes referential integrity easy, as well as future value insertion and history retention. Index as appropriate, and consider check constraints or triggers to prevent overlaps and gaps if you expose these fields to the application for direct modification. These have an obvious performance overhead.
We then make a "current values view" which is exposed to developers, and is also insertable via an "instead of" trigger.
It's far easier and better to use the History Table pattern for this.
create table foo (
foo_id int primary key,
name text
);
create table foo_history (
foo_id int,
version int,
name text,
operation char(1) check ( operation in ('u','d') ),
modified_at timestamp,
modified_by text
primary key (foo_id, version)
);
Create a trigger to copy a foo row to foo_history on update or delete.
https://wiki.postgresql.org/wiki/Audit_trigger_91plus for a full example with postgres

Unique index in already existing database table

I am trying to add a new unique index on one of my database tables in SQL Server 2008. This is an existing table and the column where I want the unique index already has some duplicate values.
Can I set up a unique index for that column? If so, how?
You can't set this column up with a UNIQUE index if the table already has duplicate values, unless you remove the records containing the duplicate values for that column. This goes to the definition of UNIQUE.
First you are gonna need to delete the duplicate values on your column and then you can create a unique index on it. So lets assume your table has 2 columns, id and column1. To delete duplicate values you need to choose one, it can be random or with some order. So it would be like this:
WITH CTE AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY column1 ORDER BY Id) Corr
FROM YourTable
)
DELETE FROM CTE
WHERE Corr > 1
CREATE UNIQUE INDEX I_Unique ON YourTable(Column1)
No as the name suggest, Unique Index which says key has to be unique. So you cant
See this
If the column already has duplicate values then I would recommend you create a unique composite key instead.
e.g.
So, to handle that issue with this table design, you need to create a unique constraint on the table CustomerID/ProductID columns:
create unique index cust_products_unique on CustomerProducts (CustomerID, ProductID)
So that in essence a combination of fields ensures that the index is unique.
Regards
May not have been true in SQL Server 2008, however you can use Management Studio to do this in later versions such as 2014.
Right click your table
Choose Design
Expand "Identity Specification" and set (is Identity) to Yes
Save

Create an index only on certain rows in mysql

So, I have this funny requirement of creating an index on a table only on a certain set of rows.
This is what my table looks like:
USER: userid, friendid, created, blah0, blah1, ..., blahN
Now, I'd like to create an index on:
(userid, friendid, created)
but only on those rows where userid = friendid. The reason being that this index is only going to be used to satisfy queries where the WHERE clause contains "userid = friendid". There will be many rows where this is NOT the case, and I really don't want to waste all that extra space on the index.
Another option would be to create a table (query table) which is populated on insert/update of this table and create a trigger to do so, but again I am guessing an index on that table would mean that the data would be stored twice.
How does mysql store Primary Keys? I mean is the table ordered on the Primary Key or is it ordered by insert order and the PK is like a normal unique index?
I checked up on clustered indexes (http://dev.mysql.com/doc/refman/5.0/en/innodb-index-types.html), but it seems only InnoDB supports them. I am using MyISAM (I mention this because then I could have created a clustered index on these 3 fields in the query table).
I am basically looking for something like this:
ALTER TABLE USERS ADD INDEX (userid, friendid, created) WHERE userid=friendid
Regarding the conditional index:
You can't do this. MySQL has no such thing.
Regarding the primary key:
It depends on the storage engine. MySQL does not define how data is stored or retrieved, that's left up to the storage engine.
MyISAM does not enforce any order on how rows are stored; they're appended to the end of the table but gaps from deleting can be reused and UPDATE queries can leave things out of order even without any DELETEs.
InnoDB stores rows in order of their primary keys.
hard to tell what you're actually trying to do here (why would a user need to be his own friend?) but it seems to me a simple rethinking of your database schema would resolve this problem.
table 1: USER: userid, created, blah0, blah1, ...,
table 2: userIsFriend (user1,user2,...)
and just do your indexing on table 2 (whose elements presumably have foreign key constraint on table 1)
btw you should probably be using InnoDB if you want to do anything semi-serious with mySQL anyway, IMHO.

Oracle Table Index creation

I created a table named STUDENT. It has the following columns:
Id
Name
Surname
DateOfBirth
DateOfAdmission
DateOfPassout
This table have following primary key:
Id
Name
Surname
DateOfAdmission
Do I need to create an index another index of column Id, Name if want to query this table providing input just Id and Name?
An index isn't necessary for queries.
An index has the potential to speed up queries if the index can be used, but will slow down INSERT/UPDATE/DELETE statements.
I'm not clear when Oracle started, but Oracle 10g+ will automatically create an index when a primary key is defined for a table. That index will match the column(s) that make up the primary key. Being that the id and name columns are part of the primary key, the pair is guaranteed to be unique and I don't see the need to create an additional covering index.