Nulls in one of the columns in a composite unique index

I have a unique index on (id, name) columns. I have a date column that I want to add to the index since I want the uniqueness to be based on (id, name, date) columns. The date column contains a lot of null values. How would it affect the index?

If you are using SQL Server: NULL values are stored in the index, and a unique index treats NULLs as equal, so a unique index on (id, name, date) allows only one row per (id, name) with a NULL date. If that is stricter than you want, SQL Server supports filtered indexes: you can add a WHERE date IS NOT NULL filter so that uniqueness is enforced only for rows that actually have a date. See the SQL Server documentation on filtered indexes for more information.
Bottom line: you can add the column to the index without problems; in most databases the NULLs themselves will not hurt performance.
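
A minimal sketch of the filtered-index approach, with placeholder table and column names:

create unique index UQ_MyTable_Id_Name_Date
    on dbo.MyTable (id, name, [date])
    where [date] is not null;  -- uniqueness enforced only when date is present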

Configuring indexes in postgres

This is my first time dealing with indexes and I would like to understand a few things.
I have tables with the following schemas:
Table1: Customer details

id  name  createdOn  username  phone     address
1   xyz   some date  xyz12     12345678  abc
The id in the above table is unique. The id is not defined as PK in the table though. Would id + createdOn be a good complex index?
Table2: Tracked customer actions

customer id  name  timestamp  action type  cart value  address
1            xyz   some date  click        .           abc
The above table does not have any column with unique values and there can be a lot of sparse data. The above actions table is a sample and can have almost 18 columns, with new data being added frequently. Is having an index on all the columns a good idea?
The queries on these tables could be both simple and complex as below:
select * from customerDetails
OR
with target_customers as (
    select id as customer_id
    from customerDetails
    where createdOn > {some date}
)
select avg(a.cart_value)
from actions a
inner join target_customers b on a.customer_id = b.customer_id
where a.action_type = 'cart updated'
These are sample queries and I believe I will be having even more complex queries using different aggregations and joins with other tables as well to gain insights while performing analytics in the future.
I want to understand the best columns for indexes on the above tables.
"The id is not defined as PK in the table though."
That's unusual. Why is that?
"Would id + createdOn be a good complex index?"
No, you'd reverse it: createdOn, id. An index can be used by its leading column alone, so this order lets the index serve both order by createdOn and range predicates like createdOn between X and Y.
But you probably wouldn't include id in there at all. Make id a primary key and it is indexed.
In general, if you want to cover all possibilities for two keys, make two indexes...
columnA, columnB
columnB
columnA, columnB can cover queries which only reference columnA and can also order by columnA. It can also cover queries which reference both columnA and columnB. But it can't cover a query which only references columnB, so we need a single-column index for columnB.
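
For illustration, a sketch of both suggestions in Postgres (table and column names are placeholders):

-- make id the primary key; this also creates a unique index on it
alter table customerDetails add primary key (id);

-- the two-index pattern for covering both columns
create index idx_t_a_b on t (columnA, columnB);
create index idx_t_b   on t (columnB);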
"Is having an index on all the columns a good idea?"
Maybe, it depends on your queries, but probably not.
You want to index foreign keys; Postgres does not index them automatically, and indexing them will speed up all joins.
You probably want to index timestamps that you're going to search or order by.
You may want to index any flags you often query by, such as action_type in where action_type = 'cart updated'. Or you may want to partition the table by action type, as in the sketch below.
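
A sketch of the partitioning option, using Postgres 10+ declarative partitioning (table and column names are illustrative):

create table actions (
    customer_id bigint not null,
    created_on  timestamptz not null,
    action_type text not null
) partition by list (action_type);

-- one partition per action type you care about
create table actions_cart_updated partition of actions
    for values in ('cart updated');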
"The above actions table is a sample and can have almost 18 columns, with new data being added frequently."
This may be a good use case for a single jsonb column that stores all the miscellaneous attributes, which lets you cover them with a single index on the jsonb column. However, jsonb is not a panacea, and you will have to choose what to put in jsonb and what to make real columns.
For example, a timestamp such as createdOn should probably be a column. Also any foreign keys. And status flags such as action_type.
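
A sketch of that split, keeping the hot fields as real columns and the rest in jsonb with a GIN index (all names are illustrative):

create table actions (
    customer_id bigint not null,
    created_on  timestamptz not null,
    action_type text not null,
    attrs       jsonb            -- the miscellaneous attributes live here
);

create index idx_actions_attrs on actions using gin (attrs);

-- a containment query the GIN index can serve:
select customer_id from actions where attrs @> '{"device": "mobile"}';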

Compare 2 tables based on range values

We have a big transaction table that contains all the values (including duplicates), and we need to eliminate the duplicates based on another table's values.
Table A (the transaction table) has Store, Date, Index, etc.
Table B maintains the Index ranges; it has Store, Date, Index Begin, Index End, etc.
Based on Store and Date, I need to compare the Index from Table A with the ranges in Table B and eliminate from Table A the rows whose Index falls inside those ranges, so I can avoid duplicate values.
If a given Index is not within any Index Begin/Index End range, I can keep it. Index values start from 1, but I need to keep 1 regardless because it is a header record.
The check should run from Index 2 onwards. If you could please help with a SQL statement that would be great.
I tried a few statements, but they did not work.
I need to eliminate duplicate records based on the Index ranges in Table B.
To eliminate the duplicates, use the keyword DISTINCT after SELECT, i.e. SELECT DISTINCT. You'll need to write a JOIN that compares the two tables on their common columns.
I assume you already have a query so I won't write one unless you comment needing help :)
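
For what it's worth, a sketch of one way to express the range elimination, with assumed column names (Idx stands in for the reserved word Index):

select distinct a.Store, a.Date, a.Idx
from TableA a
where a.Idx = 1                       -- always keep the header record
   or not exists (
        select 1
        from TableB b
        where b.Store = a.Store
          and b.Date  = a.Date
          and a.Idx between b.IndexBegin and b.IndexEnd
   );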

Best practice for indexing in SQL Server

I have a transaction table and an inventory table that I would like to JOIN together. The tables need to JOIN on three key columns.
My question is: should I create a unique key (a concatenation of the three fields) and create an INDEX on it, or should I just create a non-clustered INDEX on all three fields?
I'm currently using SQL Server 2014
I'm guessing the Transaction table is the bigger one and the Inventory table the smaller. A lot depends on what proportion of the data you expect your join to return: if it's most of it, a table scan will probably occur, so an index won't help much. If you're going to fetch a small subset of the data, then create an index on the 3 columns on both tables and create a foreign key from Transaction to Inventory on the 3 columns. (SQL Server needs the index as well as the FK; adding the FK does not create one.)
Pick the most selective (granular) column as the first in your index, as this will encourage SQL Server's optimizer to use the index.
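
A sketch of the non-clustered-index option, with hypothetical table and column names:

create nonclustered index IX_Trans_Keys
    on dbo.Trans (ItemId, StoreId, BatchId);

-- the FK requires a primary key or unique constraint on these three
-- columns in dbo.Inventory; it does not create the index above for you
alter table dbo.Trans
    add constraint FK_Trans_Inventory
    foreign key (ItemId, StoreId, BatchId)
    references dbo.Inventory (ItemId, StoreId, BatchId);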

Why do we get a RID lookup in SQL?

I created a non-clustered index on the "last_name" column in the table "Persons":
Select * From Persons
Where last_name = 'Hogg'
So why is the index incapable of returning all the columns simultaneously and instead does a RID lookup?
How does indexing work here?
The index only covers the column last_name, and only contains data about that column. You can conceptually think about the index that you've described as a series of pairs: (last_name,row), where row is a reference to a particular row in the actual table. The index stores the pairs sorted by last_name, but stores no additional information about the table.
Your query requests all of the columns of Persons. The index is used to locate the row or rows where last_name is "Hogg", but the database has to reference the table to retrieve the additional columns.
What you appear to want is a covering index for the columns of interest. The term "RID lookup" implies SQL Server. Perhaps the question "What are Covering Indexes and Covered Queries in SQL Server?" and the page it points to, "Using Covering Indexes to Improve Query Performance", will help.
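
For example, if the query listed specific columns instead of using *, a covering index could avoid the RID lookup entirely (first_name is an assumed column):

-- covers: select first_name, last_name from Persons where last_name = 'Hogg'
create nonclustered index IX_Persons_LastName
    on Persons (last_name)
    include (first_name);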

MySql compound keys and null values

I have noticed that if I have a unique compound key on two columns, column_a and column_b, then MySQL ignores this constraint if one column is null.
E.g.
if column_a = 1 and column_b = null, I can insert column_a = 1, column_b = null as much as I like
if column_a = 1 and column_b = 2, I can only insert this combination once.
Is there a way to enforce this constraint, other than changing the columns to NOT NULL and setting default values?
http://dev.mysql.com/doc/refman/5.0/en/create-index.html
"A UNIQUE index creates a constraint such that all values in the index must be distinct. An error occurs if you try to add a new row with a key value that matches an existing row. This constraint does not apply to NULL values except for the BDB storage engine. For other engines, a UNIQUE index allows multiple NULL values for columns that can contain NULL."
So, no, you can't get MySQL to treat NULL as a unique value. I guess you have a couple of choices: you could do what you suggested in your question and store a "special value" instead of null, or you could use the BDB engine for the table. I don't think this minor difference in behaviour warrants making an unusual choice of storage engine, though.
I worked around this issue by creating a generated (stored) column on the same table defined as COALESCE(column_b, 0). I then based my unique composite index on that column (and the other column) instead. Works very well.
Of course this was probably not possible back in 2010 :)
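
A sketch of that workaround using MySQL 5.7+ generated columns (table and column names are illustrative):

alter table t
    add column column_b_key int as (coalesce(column_b, 0)) stored,
    add unique key uq_a_bkey (column_a, column_b_key);

-- caveat: a genuine 0 in column_b now collides with NULL under this scheme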