SQL Server: two columns in index but query on only one

In legacy code I found an index as follows:
CREATE CLUSTERED INDEX ix_MyTable_foo ON MyTable
(
id ASC,
name ASC
)
If I understand correctly, this index would be useful for querying on column id alone, or id and name. Do I have that correct?
So it would possibly improve retrieval of records by doing:
select someColumn from MyTable where id = 4
But it would do nothing for this query:
select someColumn from MyTable where name = 'test'
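That is correct: a composite index can seek on any leftmost prefix of its key columns, but not on a trailing column alone. A minimal sketch of the rule, using SQLite's EXPLAIN QUERY PLAN (the table and data here are hypothetical, and SQLite's plan output differs from SQL Server's, but composite indexes in both engines follow the same leftmost-prefix principle):

```python
import sqlite3

# Hypothetical in-memory copy of MyTable with the same composite index.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MyTable (id INTEGER, name TEXT, someColumn TEXT)")
con.execute("CREATE INDEX ix_MyTable_foo ON MyTable (id, name)")

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the plan detail text.
    return " ".join(row[-1] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

p1 = plan("SELECT someColumn FROM MyTable WHERE id = 4")      # seeks via ix_MyTable_foo
p2 = plan("SELECT someColumn FROM MyTable WHERE name = 'test'")  # full table scan
print(p1)
print(p2)
```

The first plan mentions the index; the second falls back to a scan, because name is not a leftmost prefix of (id, name).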

Yes, you are right. But consider the case where you have a table with many columns:
A
B
C
D
..
F
where your primary key index is, for example, on (A). If you also have a second index like (B, C), the engine may decide to use it for a query like this:
CREATE TABLE dbo.StackOverflow
(
A INT
,B INT
,C INT
,D INT
,E INT
,PRIMARY KEY (A)
,CONSTRAINT IX UNIQUE(B,C)
)
SELECT A
,C
FROM dbo.StackOverflow
WHERE C = 0;
So, if an index covers the query, the engine may use it even when you are not filtering on some of its columns, because it decides that less work (fewer reads) will be performed.
In your case, it seems that a PK on the id column, in combination with a second index on the name column, is the better choice.
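The covering behaviour can be reproduced in miniature. This is a sketch in SQLite rather than SQL Server: A plays the role of the clustered key, which SQLite (like SQL Server) stores in every secondary index, so IX already contains everything the query touches:

```python
import sqlite3

con = sqlite3.connect(":memory:")
# A INTEGER PRIMARY KEY is the rowid, implicitly stored in every index.
con.execute("CREATE TABLE StackOverflow (A INTEGER PRIMARY KEY, B INT, C INT, D INT, E INT)")
con.execute("CREATE UNIQUE INDEX IX ON StackOverflow (B, C)")

# C is not a leftmost prefix of IX, so no seek is possible -- but IX still
# holds every column the query needs, so scanning the smaller index beats
# scanning the whole table.
detail = " ".join(r[-1] for r in con.execute(
    "EXPLAIN QUERY PLAN SELECT A, C FROM StackOverflow WHERE C = 0"))
print(detail)
```

The plan reports a covering-index scan rather than a table scan, even though the leading column B is never mentioned in the query.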

SQL Server Indexing and Composite Keys

Given the following:
-- This table will have roughly 14 million records
CREATE TABLE IdMappings
(
Id int IDENTITY(1,1) NOT NULL,
OldId int NOT NULL,
NewId int NOT NULL,
RecordType varchar(80) NOT NULL, -- 15 distinct values, will never increase
Processed bit NOT NULL DEFAULT 0,
CONSTRAINT pk_IdMappings
PRIMARY KEY CLUSTERED (Id ASC)
)
CREATE UNIQUE INDEX ux_IdMappings_OldId ON IdMappings (OldId);
CREATE UNIQUE INDEX ux_IdMappings_NewId ON IdMappings (NewId);
and this is the most common query run against the table:
WHILE @firstBatchId <= @maxBatchId
BEGIN
-- the result of this is used to insert into another table:
SELECT
map.NewId -- and lots of non-indexed columns from SOME_TABLE
FROM
IdMappings map
INNER JOIN
SOME_TABLE foo ON foo.Id = map.OldId
WHERE
map.Id BETWEEN @firstBatchId AND @lastBatchId
AND map.RecordType = @someRecordType
AND map.Processed = 0
-- We only really need this in case the user kills the binary or SQL Server service:
UPDATE IdMappings
SET Processed = 1
WHERE Id BETWEEN @firstBatchId AND @lastBatchId
AND RecordType = @someRecordType
SET @firstBatchId += 4999
SET @lastBatchId += 4999
END
What are the best indices to add? I figure Processed isn't worth indexing since it only has 2 values. Is it worth indexing RecordType since there are only about 15 distinct values? How many distinct values will a column likely have before we consider indexing it?
Is there any advantage in a composite key if some of the fields are in the WHERE and some are in a JOIN's ON condition? For example:
CREATE INDEX ix_IdMappings_RecordType_OldId
ON IdMappings (RecordType, OldId)
... if I wanted both these fields indexed (I'm not saying I do), does this composite key gain any advantage since both columns don't appear together in the same WHERE or same ON?
Insert time into IdMappings isn't really an issue. After we insert all records into the table, we don't need to do so again for months if ever.
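As a small illustration of the composite-index question (in SQLite rather than SQL Server, with invented literals): an index on (RecordType, OldId) can serve a query even when one column appears in the WHERE and the other in a JOIN's ON condition, because both ultimately become seek predicates against the same table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE IdMappings (Id INTEGER PRIMARY KEY, OldId INT NOT NULL,
                             NewId INT NOT NULL, RecordType TEXT NOT NULL);
    CREATE TABLE SomeTable  (Id INTEGER PRIMARY KEY, Payload TEXT);
    CREATE INDEX ix_IdMappings_RecordType_OldId ON IdMappings (RecordType, OldId);
""")

# One predicate comes from WHERE (RecordType), one from the join (OldId);
# the optimizer folds both into seeks on the composite index.
plan = " ".join(r[-1] for r in con.execute("""
    EXPLAIN QUERY PLAN
    SELECT map.NewId
    FROM IdMappings map
    JOIN SomeTable foo ON foo.Id = map.OldId
    WHERE map.RecordType = 'User'"""))
print(plan)
```

Whichever join order the planner picks, the composite index is used with at least the RecordType equality, so splitting the predicates between WHERE and ON does not cost it anything.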

How to make sure only one column is not null in postgresql table

I'm trying to set up a table and add some constraints to it. I was planning on using partial indexes to create some composite keys, but ran into the problem of handling NULL values. We have a situation where we want to make sure that, in a table, only one of two columns is populated for a given row, and that the populated value is unique. I'm trying to figure out how to do this, but I'm having a tough time. Perhaps something like this:
CREATE INDEX foo_idx_a ON foo (colA) WHERE colB is NULL
CREATE INDEX foo_idx_b ON foo (colB) WHERE colA is NULL
Would this work? Additionally, is there a good way to expand this to a larger number of columns?
Another way to write this constraint is to use the num_nonnulls() function:
create table table_name
(
a integer,
b integer,
check ( num_nonnulls(a,b) = 1)
);
This is especially useful if you have more columns:
create table table_name
(
a integer,
b integer,
c integer,
d integer,
check ( num_nonnulls(a,b,c,d) = 1)
);
You can use the following check:
create table table_name
(
a integer,
b integer,
check ((a is null) != (b is null))
);
If there are more columns, you can use the trick with casting boolean to integer:
create table table_name
(
a integer,
b integer,
...
n integer,
check ((a is not null)::integer + (b is not null)::integer + ... + (n is not null)::integer = 1)
);
In this example exactly one column can be non-null (the expression simply counts the non-null columns), but you can require any number.
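The counting trick ports almost unchanged to other engines. A small SQLite sketch (SQLite already treats a boolean expression as 0/1, so no cast is needed):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# CHECK counts the non-null columns; exactly one must be populated.
con.execute("""
    CREATE TABLE table_name (
        a INTEGER,
        b INTEGER,
        c INTEGER,
        CHECK ((a IS NOT NULL) + (b IS NOT NULL) + (c IS NOT NULL) = 1)
    )
""")
con.execute("INSERT INTO table_name VALUES (1, NULL, NULL)")   # one non-null: accepted
try:
    con.execute("INSERT INTO table_name VALUES (1, 2, NULL)")  # two non-nulls
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

The second insert violates the CHECK constraint and raises an IntegrityError, so invalid rows never reach the table.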
One can do this with an insert/update trigger or CHECK constraints, but needing them suggests the problem could be modeled better. Constraints exist to give you certainty about your data so you are not constantly re-checking its validity; with an either-or pair of columns, every query still has to work out which one is populated.
This is better solved with table inheritance and views.
Let's say you have (American) clients. Some are businesses and some are individuals. Everyone needs a Taxpayer Identification Number, which can be one of several things, such as a Social Security Number or an Employer Identification Number.
create table generic_clients (
id bigserial primary key,
name text not null
);
create table individual_clients (
ssn numeric(9) not null
) inherits(generic_clients);
create table business_clients (
ein numeric(9) not null
) inherits(generic_clients);
SSN and EIN are both Taxpayer Identification Numbers and you can make a view which will treat both the same.
create view clients as
select id, name, ssn as tin from individual_clients
union
select id, name, ein as tin from business_clients;
Now you can query clients.tin or if you specifically want businesses you query business_clients.ein and for individuals individual_clients.ssn. And you can see how the inherited tables can be expanded to accommodate more divergent information between types of clients.
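SQLite has no table inheritance, but the view half of this pattern can be sketched on its own (names and sample data are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE individual_clients (id INTEGER PRIMARY KEY, name TEXT NOT NULL, ssn TEXT NOT NULL);
    CREATE TABLE business_clients  (id INTEGER PRIMARY KEY, name TEXT NOT NULL, ein TEXT NOT NULL);
    -- Each source table keeps its NOT NULL identifier; the view unifies
    -- ssn and ein under the common name tin.
    CREATE VIEW clients AS
        SELECT id, name, ssn AS tin FROM individual_clients
        UNION
        SELECT id, name, ein AS tin FROM business_clients;
    INSERT INTO individual_clients VALUES (1, 'Alice', '123456789');
    INSERT INTO business_clients  VALUES (2, 'Acme Inc', '987654321');
""")
rows = sorted(con.execute("SELECT name, tin FROM clients"))
```

Because each concrete table declares its identifier NOT NULL, no either-or CHECK is needed anywhere, yet clients.tin is always populated.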

When I retrieve a non-indexed column from a large table, SQL Server looks up the PK

I have a huge table which has millions of rows:
create table BigTable
(
id bigint IDENTITY(1,1) NOT NULL,
value float not null,
vid char(1) NOT NULL,
dnf char(4) NOT NULL,
rbm char(6) NOT NULL,
cnt char(2) not null,
prs char(1) not null,
...
PRIMARY KEY CLUSTERED (id ASC)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
)
There is a non-clustered unique index on this table which is on the following columns:
vid, dnf, rbm, cnt, prs
Now I would like to get some data out of this huge table using only part of that index:
insert into #t
select b.id, b.rbm, b.value
from
(select distinct rbm as x from #t1) a
join
BigTable b on b.vid = '1'
and b.dnf = '1234'
and b.rbm = a.x
where:
create table #t
(
id integer primary key,
rbm varchar(8),
val float null
)
create index tmp_idx_t on #t(rbm)
create table #t1 (rbm char(6) not null)
If I don't include the value column in the query, the execution plan shows no PK lookup on BigTable. But if value is in the result set, there is a PK lookup on BigTable, and the output list of that lookup is exactly the value column.
But I am not filtering on the PK here, so the lookup takes a lot of time to finish. Is there a way to avoid that PK lookup? Or am I writing the SQL wrong?
I know the BigTable is not a great design but I can't change it.
In order to avoid the PK index lookup you'll need to include all the columns (filtering predicate AND result set) in the index. For example:
create index ix1 on BigTable (dnf, vid, rbm) include (value);
This index uses the first three columns (dnf, vid, rbm) for searching and adds value as included data. As pointed out by @DanGusman, the clustered id column is always present in a SQL Server secondary index. This way the index has all the data it needs to resolve the query.
Alternatively, you could use the old way:
create index ix1 on BigTable (dnf, vid, rbm, value);
This will also work, but will generate a heavier index. That being said, this heavier index may also be of use for other queries.
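A small SQLite sketch of the same idea (SQLite has no INCLUDE clause, so it uses the "old way" of appending value to the key, and the rowid plays the part of id):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE BigTable (
    id INTEGER PRIMARY KEY,
    value REAL NOT NULL,
    vid TEXT NOT NULL, dnf TEXT NOT NULL, rbm TEXT NOT NULL)""")
# value is appended to the key so the index alone can answer the query.
con.execute("CREATE INDEX ix1 ON BigTable (dnf, vid, rbm, value)")

detail = " ".join(r[-1] for r in con.execute("""
    EXPLAIN QUERY PLAN
    SELECT id, rbm, value FROM BigTable
    WHERE vid = '1' AND dnf = '1234' AND rbm = 'abcdef'"""))
print(detail)
```

The plan reports a covering-index search: every requested column lives in ix1 (id is the rowid, stored in every SQLite index), so no lookup back into the table is needed, which is exactly what the INCLUDE version achieves in SQL Server.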
Your index is lacking the value column, so in order to resolve the query the engine needs to go to the data pages. I would advise you to include that column in the index, so that the index covers the query.
I am thinking that you might try writing the query as:
select b.id, b.rbm, b.value
from BigTable b
where b.vid = '1' and
b.dnf = '1234' and
exists (select 1 from #t1 t1 where t1.rbm = b.rbm);
This might encourage SQL Server to use a more optimal execution plan, even when id is not in the index.

SQLite performance tuning for paginated fetches

I am trying to optimize the query I use for fetching paginated data from database with large data sets.
My schema looks like this:
CREATE TABLE users (
user_id TEXT PRIMARY KEY,
name TEXT,
custom_fields TEXT
);
CREATE TABLE events (
event_id TEXT PRIMARY KEY,
organizer_id TEXT NOT NULL REFERENCES users(user_id) ON DELETE SET NULL ON UPDATE CASCADE,
name TEXT NOT NULL,
type TEXT NOT NULL,
start_time INTEGER,
duration INTEGER
-- more columns here, omitted for the sake of simplicity
);
CREATE INDEX events_organizer_id_start_time_idx ON events(organizer_id, start_time);
CREATE INDEX events_organizer_id_type_idx ON events(organizer_id, type);
CREATE INDEX events_organizer_id_type_start_time_idx ON events(organizer_id, type, start_time);
CREATE INDEX events_type_start_time_idx ON events(type, start_time);
CREATE INDEX events_start_time_desc_idx ON events(start_time DESC);
CREATE INDEX events_start_time_asc_idx ON events(IFNULL(start_time, 253402300800) ASC);
CREATE TABLE event_participants (
participant_id TEXT NOT NULL REFERENCES users(user_id) ON DELETE CASCADE ON UPDATE CASCADE,
event_id TEXT NOT NULL REFERENCES events(event_id) ON DELETE CASCADE ON UPDATE CASCADE,
role INTEGER NOT NULL DEFAULT 0,
UNIQUE (participant_id, event_id) ON CONFLICT REPLACE
);
CREATE INDEX event_participants_participant_id_event_id_idx ON event_participants(participant_id, event_id);
CREATE INDEX event_participants_event_id_idx ON event_participants(event_id);
CREATE TABLE event_tag_maps (
event_id TEXT NOT NULL REFERENCES events(event_id) ON DELETE CASCADE ON UPDATE CASCADE,
tag_id TEXT NOT NULL,
PRIMARY KEY (event_id, tag_id) ON CONFLICT IGNORE
);
CREATE INDEX event_tag_maps_event_id_tag_id_idx ON event_tag_maps(event_id, tag_id);
The events table has around 1,500,000 entries, and event_participants around 2,000,000.
Now, a typical query would look something like:
SELECT
EVTS.event_id,
EVTS.type,
EVTS.name,
EVTS.start_time,
EVTS.duration
FROM events AS EVTS
WHERE
EVTS.organizer_id IN(
'f39c3bb1-3ee3-11e6-a0dc-005056c00008',
'4555e70f-3f1d-11e6-a0dc-005056c00008',
'6e7e33ae-3f1c-11e6-a0dc-005056c00008',
'4850a6a0-3ee4-11e6-a0dc-005056c00008',
'e06f784c-3eea-11e6-a0dc-005056c00008',
'bc6a0f73-3f1d-11e6-a0dc-005056c00008',
'68959fb5-3ef3-11e6-a0dc-005056c00008',
'c4c96cf2-3f1a-11e6-a0dc-005056c00008',
'727e49d1-3f1b-11e6-a0dc-005056c00008',
'930bcfb6-3f09-11e6-a0dc-005056c00008')
AND EVTS.type IN('Meeting', 'Conversation')
AND(
EXISTS (
SELECT 1 FROM event_tag_maps AS ETM WHERE ETM.event_id = EVTS.event_id AND
ETM.tag_id IN ('00000000-0000-0000-0000-000000000000', '6ae6870f-1aac-11e6-aeb9-005056c00008', '6ae6870c-1aac-11e6-aeb9-005056c00008', '1f6d3ccb-eaed-4068-a46b-ec2547fec1ff'))
OR NOT EXISTS (
SELECT 1 FROM event_tag_maps AS ETM WHERE ETM.event_id = EVTS.event_id)
)
AND EXISTS (
SELECT 1 FROM event_participants AS EPRTS
WHERE
EVTS.event_id = EPRTS.event_id
AND participant_id NOT IN('79869516-3ef2-11e6-a0dc-005056c00008', '79869515-3ef2-11e6-a0dc-005056c00008', '79869516-4e18-11e6-a0dc-005056c00008')
)
ORDER BY IFNULL(EVTS.start_time, 253402300800) ASC
LIMIT 100 OFFSET @Offset;
Also, for fetching the overall count of the query-matching items, I would use the above query with count(1) instead of the columns and without the ORDER BY and LIMIT/OFFSET clauses.
I experience two main problems here:
1) The performance drastically decreases as I increase the @Offset value. The difference is very significant - from almost immediate to a number of seconds.
2) The count query takes a long time (number of seconds) and produces the following execution plan:
0|0|0|SCAN TABLE events AS EVTS
0|0|0|EXECUTE LIST SUBQUERY 1
0|0|0|EXECUTE LIST SUBQUERY 1
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE event_tag_maps AS ETM USING COVERING INDEX event_tag_maps_event_id_tag_id_idx (event_id=? AND tag_id=?)
1|0|0|EXECUTE LIST SUBQUERY 2
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 2
2|0|0|SEARCH TABLE event_tag_maps AS ETM USING COVERING INDEX event_tag_maps_event_id_tag_id_idx (event_id=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 3
3|0|0|SEARCH TABLE event_participants AS EPRTS USING INDEX event_participants_event_id_idx (event_id=?)
Here I don't understand why the full scan is performed instead of an index scan.
Additional info and SQLite settings used:
I use System.Data.SQLite provider (have to, because of custom functions support)
Page size = cluster size (4096 in my case)
Cache size = 100000
Journal mode = WAL
Temp store = 2 (memory)
No transaction is open for the query
Is there anything I could do to change the query/schema or settings in order to get as much performance improvement as possible?
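One option for problem 1, sketched here under the assumption that a SQLite build >= 3.15 (row values) is available: replace OFFSET with keyset pagination. Instead of skipping @Offset rows, remember the (sort key, id) of the last row of the previous page and seek past it, so the cost of fetching a page no longer grows with the page number. The schema below is a stripped-down, invented stand-in for the real events table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (event_id TEXT PRIMARY KEY, start_time INTEGER)")
# Same expression as events_start_time_asc_idx, extended with event_id as a
# tie-breaker so that (key, event_id) is a total order the index can serve.
con.execute("CREATE INDEX events_keyset_idx ON events "
            "(IFNULL(start_time, 253402300800), event_id)")
con.executemany("INSERT INTO events VALUES (?, ?)",
                [("e%04d" % i, i if i % 10 else None) for i in range(1000)])

PAGE = 100
ORDER_KEY = "IFNULL(start_time, 253402300800)"

def page_after(last_key, last_id):
    # Row-value comparison seeks directly past the previous page's last row
    # instead of reading and discarding OFFSET rows.
    return con.execute(
        f"SELECT event_id, {ORDER_KEY} FROM events "
        f"WHERE ({ORDER_KEY}, event_id) > (?, ?) "
        f"ORDER BY {ORDER_KEY}, event_id LIMIT ?",
        (last_key, last_id, PAGE)).fetchall()

first = page_after(-1, "")                       # sentinel below every real key
last_id, last_key = first[-1][0], first[-1][1]   # remember where page 1 ended
second = page_after(last_key, last_id)
```

The trade-off is that clients must page sequentially (no random jump to page N), but each page is a bounded index seek regardless of depth, which directly removes the slowdown observed with large @Offset values.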

SQL Constraint on column value depending on value of other column

First, I have a simple [SomeType] table, with columns [ID] and [Name].
Also I have [SomeTable] table, with fields like:
[ID],
[SomeTypeID] (FK),
[UserID] (FK),
[IsExpression]
Finally, I have to create a constraint on the database layer such that:
for concrete [SomeType] IDs (actually, for all but one),
for same UserID,
only one entry should have [IsExpression] equal to 1
(IsExpression is of BIT type)
I don't know whether it's a complex condition or not, but I have no idea how to write it. How would you implement such a constraint?
You can do this with a filtered index:
CREATE UNIQUE NONCLUSTERED INDEX [IDX_SomeTable] ON [dbo].[SomeTable]
(
[UserID] ASC
)
WHERE ([SomeTypeID] <> 1 AND [IsExpression] = 1)
or:
CREATE UNIQUE NONCLUSTERED INDEX [IDX_SomeTable] ON [dbo].[SomeTable]
(
[UserID] ASC,
[SomeTypeID] ASC
)
WHERE ([SomeTypeID] <> 1 AND [IsExpression] = 1)
It depends on what you are trying to achieve: only one row with [IsExpression] = 1 per user regardless of [SomeTypeID] (the first index), or only one per user and per [SomeTypeID] (the second).
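The same filtered-index idea exists in other engines as partial indexes. A small SQLite sketch of the first variant (table and values invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE SomeTable (
    ID INTEGER PRIMARY KEY, SomeTypeID INT NOT NULL,
    UserID INT NOT NULL, IsExpression INT NOT NULL)""")
# Partial unique index: uniqueness of UserID is enforced only among rows
# matching the WHERE filter, i.e. IsExpression = 1 and SomeTypeID <> 1.
con.execute("""CREATE UNIQUE INDEX IDX_SomeTable ON SomeTable (UserID)
               WHERE SomeTypeID <> 1 AND IsExpression = 1""")

con.execute("INSERT INTO SomeTable VALUES (1, 2, 100, 1)")  # first IsExpression=1 for user 100
con.execute("INSERT INTO SomeTable VALUES (2, 2, 100, 0)")  # IsExpression=0: outside the filter
con.execute("INSERT INTO SomeTable VALUES (3, 1, 100, 1)")  # SomeTypeID=1: exempt
try:
    con.execute("INSERT INTO SomeTable VALUES (4, 3, 100, 1)")  # duplicate under the filter
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Rows outside the filter (IsExpression = 0, or the exempt SomeTypeID) can repeat freely; only a second filtered row for the same user is rejected.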