Creating optimized Azure SQL table for querying - sql

Suppose I have a table in Azure SQL DB with a million rows. What are the ways I can optimize the table for queries that use a WHERE clause? Column 1 is the id, which is the primary key. Columns 2 to 5 hold the address (street, city, state, zip) and columns 6 to 8 are digits.

Look into indexes. If you are going to search by address, add an index on the address fields. If most searches are by zip code, add an index on that field. For more information on indexes, have a look at the document "Index Table".
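As a minimal sketch of that advice in T-SQL: the table name dbo.Addresses and the column names below are assumptions made for illustration, since the question does not name them. The first index supports lookups by zip code, the second supports searches that filter on state and city together.

```sql
-- Hypothetical table and column names; adjust to your actual schema.
-- Supports queries such as: WHERE zip = '98101'
CREATE NONCLUSTERED INDEX IX_Addresses_zip
    ON dbo.Addresses (zip);

-- Supports queries such as: WHERE [state] = 'WA' AND city = 'Seattle'
CREATE NONCLUSTERED INDEX IX_Addresses_state_city
    ON dbo.Addresses ([state], city);

-- A query that can seek on the zip index instead of scanning the table.
SELECT id, street, city, [state], zip
FROM dbo.Addresses
WHERE zip = '98101';
```

Which columns to index depends on the WHERE clauses you actually run, so it is worth checking the query plans before and after adding an index.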

Related

Nulls in one of the columns in a composite unique index

I have a unique index on (id, name) columns. I have a date column that I want to add to the index since I want the uniqueness to be based on (id, name, date) columns. The date column contains a lot of null values. How would it affect the index?
If you are using SQL Server, NULL values are stored in the index like any other value, and a unique index treats two NULLs as equal, so a given (id, name) pair could have at most one row with a NULL date. SQL Server also has a feature called filtered indexes. If a column has many NULL values, it is recommended to create an additional filtered index with a WHERE condition on whether that column is null.
For more information about filtered indexes, visit this link.
In short: you can add the column to the index without problems. In many databases NULL values don't affect index performance.
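A filtered unique index could look roughly like the sketch below. The table name dbo.Items is hypothetical, and the choice to enforce uniqueness only for rows that have a date is an assumption about what the asker wants. Note that the existing unique index on (id, name) would have to be dropped first, otherwise it remains the stricter rule and the new index adds nothing.

```sql
-- Hypothetical table name; [date] is bracketed because DATE is also a type name.
-- Enforces uniqueness of (id, name, date) only for rows where date is present.
CREATE UNIQUE NONCLUSTERED INDEX UX_Items_id_name_date
    ON dbo.Items (id, name, [date])
    WHERE [date] IS NOT NULL;
```

Rows with a NULL date are left out of this index entirely, so any number of them can share the same (id, name). If those rows also need a uniqueness rule, that would take a second index with the opposite filter.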

MS SQL Server Optimize repeated column values

I was requested to create a table that will contain many repeated values and I'm not sure if this is the best way to do it.
I must use SQL Server. I would love to use Azure Table Storage and partition keys, but I'm not allowed to.
Imagine that the table Shoes has the columns
id int, customer_name varchar(50), shoe_type varchar(50)
The problem is that the column shoe_type will have millions of repeated values, and I want to have them in their own partition, but SQL Server only allows ranged partitions afaik.
I don't want the repeated values to take more space than needed, meaning that if the column value is repeated 50 times, I don't want it to take 50 times more space, only 1 time.
I thought about using a relationship between the column shoe_type (as an int) and another table which will have its string value, but is that the most I can optimize?
EDIT
Shoes table data
id  customer_name  shoe_type
----------------------------
1   a              nike
2   b              adidas
3   c              adidas
4   d              nike
5   e              adidas
6   f              nike
7   g              puma
8   h              nike
As you can see, the rows contain repeated shoe_type values (nike, adidas, puma).
What I thought about is using the shoe_type column as an int foreign key to another table, but I'm not sure if this is the most efficient way to do it, because in Azure Table Storage you have partitions and partition keys, and in MS SQL Server you have partitions, but they are ranged only.
The sample data you provide suggests that there is a "shoe type" entity in the business domain, and that all shoes have a mandatory relationship to a single shoe type. It would be different if the values were descriptive text - e.g. "Attractive running shoe, suitable for track and leisure wear". Repeated values are often (but of course not always) an indicator that there is another entity you can extract.
You suggest that the table will have millions of records. In very general terms, I recommend designing your schema to reflect the business domain, and only go for exotic optimization options once you know, and can measure, that you have a performance problem.
In your case, I'd suggest factoring out a separate table called "shoe_types", and including a foreign key relationship from "shoes" to "shoe_types". The primary key for "shoe_types" should be a clustered index, and the "shoe_type_id" foreign key column in "shoes" should have a regular (non-clustered) index. All things being equal, even with (tens of) millions of rows, queries that hit the foreign key index should be very fast.
In addition, supporting queries like "find all shoes where the shoe type name starts with 'nik'" should be much faster, because the shoe_types table should have far fewer rows than "shoes".
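The schema described above can be sketched in T-SQL as follows. The table names follow the answer; the constraint and index names, and the "name" column in shoe_types, are assumptions for illustration.

```sql
-- Lookup table: one row per distinct shoe type.
CREATE TABLE dbo.shoe_types (
    shoe_type_id INT IDENTITY(1,1) NOT NULL,
    name         VARCHAR(50)       NOT NULL,
    CONSTRAINT PK_shoe_types PRIMARY KEY CLUSTERED (shoe_type_id),
    CONSTRAINT UQ_shoe_types_name UNIQUE (name)
);

-- Main table stores a 4-byte int instead of repeating the string.
CREATE TABLE dbo.shoes (
    id            INT         NOT NULL,
    customer_name VARCHAR(50) NOT NULL,
    shoe_type_id  INT         NOT NULL,
    CONSTRAINT PK_shoes PRIMARY KEY CLUSTERED (id),
    CONSTRAINT FK_shoes_shoe_types FOREIGN KEY (shoe_type_id)
        REFERENCES dbo.shoe_types (shoe_type_id)
);

-- SQL Server does not index foreign key columns automatically.
CREATE NONCLUSTERED INDEX IX_shoes_shoe_type_id
    ON dbo.shoes (shoe_type_id);

-- "Find all shoes whose type name starts with 'nik'".
SELECT s.id, s.customer_name, t.name AS shoe_type
FROM dbo.shoes AS s
JOIN dbo.shoe_types AS t
    ON t.shoe_type_id = s.shoe_type_id
WHERE t.name LIKE 'nik%';
```

With this layout each type name is stored once in shoe_types, and the LIKE filter runs against the small lookup table before the join reaches the large one.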

How to use indexes in sql database?

I have one table that contains 3 columns: primary id, uuid, and date (last login). When users are logging in, I run a query against the database to check whether a user with this uuid exists. This table should work super fast for ~5 million users. How can I make queries on a table like that faster? Will it help if I add another column, for example country, and use it as an index?
If you want to check a particular uuid, then you want an index on that column:
create index idx_users_uuid on users(uuid); -- "users" is a placeholder for your table name
This should be fast enough for your purposes.
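As a hedged variation, assuming SQL Server syntax, a hypothetical table dbo.users, and a uuid column of type UNIQUEIDENTIFIER: if each uuid belongs to exactly one user, the index can be declared unique, and the login check can be written as an existence test that stops at the first match.

```sql
-- Hypothetical table name and column type; adjust to your schema.
CREATE UNIQUE NONCLUSTERED INDEX UX_users_uuid
    ON dbo.users (uuid);

-- Example value only; in practice this comes from the login request.
DECLARE @uuid UNIQUEIDENTIFIER = '6F9619FF-8B86-D011-B42D-00C04FC964FF';

-- Returns 1 if a user with this uuid exists, otherwise 0.
SELECT CASE
           WHEN EXISTS (SELECT 1 FROM dbo.users WHERE uuid = @uuid)
           THEN 1 ELSE 0
       END AS user_exists;
```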

Best practice for indexing in SQL Server

I have a transaction table and an inventory table that I would like to 'JOIN' together. The tables need to 'JOIN' on three primary keys.
My question is: should I create a unique key (a concatenation of the three fields) and create an 'INDEX' on the unique key, or should I just create a non-clustered 'INDEX' on all three fields?
I'm currently using SQL Server 2014
I'm guessing the Transaction table is the larger one and Inventory the smaller. A lot depends on what proportion of the data you expect the join to return. If it is most of the data, a table scan will probably occur, so an index won't help much. If you are trying to get a small subset of the data, create an index on the three columns on both tables and create a foreign key from Transaction to Inventory on the three columns. (SQL Server does not create an index on foreign key columns automatically, so you need the index as well as the FK.)
Pick the most granular (most selective) column as the first in your index, as this will encourage SQL Server's optimizer to use the index.
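A rough T-SQL sketch of that setup follows. The table names dbo.Inventory and dbo.Transactions and the three column names are hypothetical, as the question does not give them, and the column order shown is only an assumption about which column is most selective. The referenced side needs a unique index (or a primary key) on the three columns before the foreign key can be created.

```sql
-- Hypothetical names; put your most selective column first.
CREATE UNIQUE NONCLUSTERED INDEX UX_Inventory_keys
    ON dbo.Inventory (item_id, warehouse_id, lot_id);

-- Matching index on the larger table to support the join.
CREATE NONCLUSTERED INDEX IX_Transactions_keys
    ON dbo.Transactions (item_id, warehouse_id, lot_id);

-- Composite foreign key from Transactions to Inventory.
ALTER TABLE dbo.Transactions
    ADD CONSTRAINT FK_Transactions_Inventory
    FOREIGN KEY (item_id, warehouse_id, lot_id)
    REFERENCES dbo.Inventory (item_id, warehouse_id, lot_id);
```

This uses one multi-column index rather than a concatenated key column, so the three values stay as separate, typed columns and the join can compare them directly.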

Why do we get a RID lookup in SQL?

I created a non-clustered index on "last_name" column in the table "Persons"
Select * From Persons
Where last_name = 'Hogg'
So why is the index incapable of returning all the columns simultaneously and instead does a RID lookup?
How does indexing work here?
The index only covers the column last_name, and only contains data about that column. You can conceptually think about the index that you've described as a series of pairs: (last_name,row), where row is a reference to a particular row in the actual table. The index stores the pairs sorted by last_name, but stores no additional information about the table.
Your query requests all of the columns of Persons. The index is used to locate the row or rows where last_name is "Hogg", but the database has to reference the table to retrieve the additional columns.
What you appear to want is a covering index for the columns of interest. The term "RID lookup" implies SQL Server. Perhaps the question "What are Covering Indexes and Covered Queries in SQL Server?" and the page it points to, "Using Covering Indexes to Improve Query Performance", will help.
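A covering index for this case might look like the sketch below. The columns first_name and birth_date are hypothetical, since the question only names last_name. The INCLUDE clause stores the extra columns in the index leaf pages, so a query that asks only for those columns can be answered from the index without going back to the table.

```sql
-- Hypothetical included columns; list the ones your query actually needs.
CREATE NONCLUSTERED INDEX IX_Persons_last_name_covering
    ON Persons (last_name)
    INCLUDE (first_name, birth_date);

-- Covered: every requested column is present in the index.
SELECT last_name, first_name, birth_date
FROM Persons
WHERE last_name = 'Hogg';
```

A SELECT * would still need a lookup unless every column of the table were included, which is one reason to list only the columns you need.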