Should I index my PartitionKey when using Cosmos DB?

I'm new to Cosmos DB and I am wondering whether I should create an index on my PartitionKey. For example, suppose I choose a non-unique ID as my partition key. Here is some sample data:
uniqueId | someProperty   | partitionKey
-----------------------------------------
1        | some data      | 1
2        | some more data | 1
3        | some more data | 1
4        | some more data | 2
5        | some more data | 2
And let's assume I want to query all the items that have partitionKey = 2. Should I add an index on partitionKey, or is this column already optimized just by virtue of being the partition key?

If you are asking whether you need to index it manually: no. With the default indexing policy, Cosmos DB automatically indexes every property of every item. If you are asking about explicitly adding it to the indexing policy: yes. If you have disabled the default indexing policy and need to filter by partition key, you should add an index for it.
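As a sketch only: if the default policy has been replaced with one that excludes everything, an indexing policy that still indexes the partition key might look like the dict below. It assumes the property is literally named partitionKey (as in the sample data), and the helper name partition_key_only_policy is hypothetical; the keys follow the JSON shape Cosmos DB uses for indexing policies (indexingMode, automatic, includedPaths, excludedPaths).

```python
import json


def partition_key_only_policy(property_name: str) -> dict:
    """Build a Cosmos DB-style indexing policy that indexes a single property.

    Hypothetical helper: the root path "/*" is excluded, and only the
    scalar value at "/<property_name>/?" is included, so equality filters
    on that property can still be served from the index.
    """
    return {
        "indexingMode": "consistent",
        "automatic": True,
        # "/<name>/?" targets the scalar value stored at that property.
        "includedPaths": [{"path": f"/{property_name}/?"}],
        # Exclude every other path from indexing.
        "excludedPaths": [{"path": "/*"}],
    }


if __name__ == "__main__":
    print(json.dumps(partition_key_only_policy("partitionKey"), indent=2))
```

With the azure-cosmos Python SDK, a dict of this shape is what would be passed as the indexing_policy argument when creating a container; check the SDK documentation for the exact call in your version.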

Related

SQL Server: is it possible to selectively overwrite a computed column or an IDENTITY int column?

I'm in the process of migrating data to an SQL Server database. Going forward, I would ideally use the tables in this database to generate unique identities for the data recorded in them; however, the existing data already comes with unique identities.
I'd like to set up a table that looks like this:
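entry_num (PK) | component (FK) | long_id (Unique, computed by combining component and entry_num)
------------------------------------------------------------------------------------------------
1              | THING1         | THING1_1
2              | THING2         | THING2_2
3              | THING1         | THING1_3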
I would like to be able to insert my existing data into the table by putting its existing ID in the long_id column, and have the column calculated automatically for future entries.
So my initially inserted data might look like this:
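entry_num (PK) | component (FK) | long_id (Unique, computed by combining component and entry_num)
------------------------------------------------------------------------------------------------
1              | THING2         | THING2_50
2              | THING3         | THING3_90
3              | THING4         | THING4_11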
Alternatively could I manually specify the entry_num to match the identities?
I was planning on using
CREATE TABLE table_name (entry_num int IDENTITY...);
to auto-increment this column, but is there another way that lets me manually set the identity values of specific rows without breaking the auto-increment behavior?

Should I add another table or add a column to an existing one?

I have a table with thousands of entries and want to show whether an entity is deleted or not.
I can add a new column "isDeleted" to the existing table and, once an entity is deleted, update every entry (thousands) of that entity in the table
OR
have a new table for the deleted entries and join the tables in queries.
I want to know which is faster.
I will be querying from the table and want information about deleted entities as well as non-deleted ones.
Let's say my table looks like this:
id | type | prop1 | info1
--------------------------
1  | A    | any   | any
2  | B    | any   | any
3  | C    | any   | any
4  | A    | any   | any
5  | B    | any   | any
If I then delete type A, I can either add an isDeleted column to this table only, like this:
id | type | prop1 | info1 | isDeleted
--------------------------------------
1  | A    | any   | any   | true
2  | B    | any   | any   | false
3  | C    | any   | any   | false
4  | A    | any   | any   | true
5  | B    | any   | any   | false
or have a new table for deleted types.
With the first method I would have to update the isDeleted column for every row of type A, and there are thousands of such rows, whereas with the second method I can simply add one row to the new table.
I want all the unique types that have not been deleted from my table, but I don't want to remove the information about deleted types.
I hope this is clearer.
The easiest way would be to add a nullable isDeleted column and set it to a non-null value for the rows you delete. This also preserves backwards compatibility.
To build on this further, I would instead recommend making this column a deleted_at column stored as a nullable timestamp; this way you get some extra metadata as a bonus.
One such benefit of this extra metadata is an audit trail.
To avoid storing the same data repeatedly, add a separate table, types, with columns type and is_deleted. This way you avoid inconsistencies, such as rows 1 and 4 in your proposed example ending up disagreeing with each other (one true, the other false).
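A minimal sketch of this suggestion, using Python's built-in sqlite3 rather than whatever database the question targets: a separate types table holds the deletion state once per type, here as a nullable deleted_at timestamp in place of is_deleted. Table, column and function names, and the sample timestamp, are illustrative assumptions based on the example above.

```python
import sqlite3

# Hypothetical schema: deletion state lives once per type in `types`,
# as a nullable `deleted_at` timestamp (NULL means "not deleted").
SCHEMA = """
CREATE TABLE types (
    type       TEXT PRIMARY KEY,
    deleted_at TEXT            -- nullable: NULL = still active
);
CREATE TABLE items (
    id    INTEGER PRIMARY KEY,
    type  TEXT NOT NULL REFERENCES types (type),
    prop1 TEXT,
    info1 TEXT
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database holding the example rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO types (type) VALUES (?)", [("A",), ("B",), ("C",)])
    conn.executemany(
        "INSERT INTO items (id, type, prop1, info1) VALUES (?, ?, 'any', 'any')",
        [(1, "A"), (2, "B"), (3, "C"), (4, "A"), (5, "B")],
    )
    return conn


def delete_type(conn, type_name, when):
    """Soft-delete a type: one UPDATE on one row, however many items use it."""
    conn.execute("UPDATE types SET deleted_at = ? WHERE type = ?", (when, type_name))


def active_types(conn):
    """All types that have not been deleted."""
    rows = conn.execute(
        "SELECT type FROM types WHERE deleted_at IS NULL ORDER BY type"
    ).fetchall()
    return [r[0] for r in rows]


def items_with_status(conn):
    """Every item, deleted or not, with a 1/0 deletion flag from the join."""
    return conn.execute(
        """SELECT i.id, i.type, t.deleted_at IS NOT NULL AS is_deleted
           FROM items AS i JOIN types AS t ON t.type = i.type
           ORDER BY i.id"""
    ).fetchall()


if __name__ == "__main__":
    conn = build_db()
    delete_type(conn, "A", "2024-01-01 12:00:00")
    print(active_types(conn))  # ['B', 'C']
    print(items_with_status(conn))
```

Marking type A as deleted is then a single-row UPDATE regardless of how many items have that type, and the join still returns every item together with its deletion status.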
REFERENCES:
What is the reason to "normalize your databases"?
What is Normalisation (or Normalization)?

Transferring data when identity column values are different

I am in the process of restructuring a database and creating an MVC 5 application. Many tables are being normalized, but a few remain the same. In the original database table, a few of the rows were deleted, so the table data looks like this:
Id Column1
--------------------
1 Some value
2 Some value
4 Some value
8 Some value
9 Some value
Now I am using Code First to create a new database with some new and some existing tables. In my entity model I am using the following code to mark a field as the primary key and an identity column:
[Key]
[DatabaseGenerated(DatabaseGeneratedOption.Identity)]
public int ID { get; set; }
Now the tables created by Code First have auto-incremented values in their ID columns:
Id Column1
--------------------
1 Some value
2 Some value
3 Some value
4 Some value
5 Some value
The first table has more than 100 records. Its ID column is also used as a foreign key in another table that has nearly 1000 records. The problem I am facing is how to account for the difference between the original table IDs and the newly created IDs. If I try to specify the value of the ID column explicitly, I get an error saying that I cannot explicitly specify a value for an identity column. I have to write a seed method for both tables. What is the proper way to handle this scenario?

Indexes and multi column primary keys

In a MySQL database I have a table with the following primary key
PRIMARY KEY id (invoice, item)
In my application I will also frequently be selecting on item by itself and less frequently on only invoice. I'm assuming I would benefit from indexes on these columns.
MySQL does not complain when I define the following:
INDEX (invoice),
INDEX (item),
PRIMARY KEY id (invoice, item)
But I don't see any evidence (using DESCRIBE -- the only way I know how to look) that separate indexes have been established for these two columns.
Are the columns that make up a primary key automatically indexed individually? Is there a better way than DESCRIBE to explore the structure of my table?
I'm not intimately familiar with the internals of indices in MySQL, but in the two database products I am familiar with (MS SQL Server, Oracle), indices are balanced-tree structures whose nodes are organized as an ordered tuple of the columns the index is defined on (in the order they are defined).
So, unless MySQL does it very differently (probably not), a composite index (on more than one column) can be used by any query that needs to filter or sort by a subset of the index's columns, as long as that subset, taken in index order, is a prefix of the index's column list: it must start at the first column of the index and have no gaps.
In other words, if you have an index on (a, b, c, d), a query that filters on (a), (a, b), or (a, b, c) can also use the index, but a query that needs to filter on (b), (c), or (b, c) will not be able to use it.
So in your case, if you often need to filter or sort on column item alone, you need to add another index on that column by itself.
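To make the leftmost-prefix rule concrete, here is a toy model (not how MySQL actually stores index pages) of an index on (invoice, item) as a sorted list of key tuples. A prefix of the key can be located by binary search because the list is ordered by invoice first; a lookup on item alone has no such ordering to exploit and must check every key. The names INDEX, seek and scan are hypothetical.

```python
import math
from bisect import bisect_left, bisect_right

# Toy composite index on (invoice, item): keys sorted by invoice, then item,
# which is the order a B-tree on (invoice, item) keeps them in.
INDEX = sorted([(1, 10), (2, 30), (1, 20), (3, 20), (2, 10)])


def seek(index, prefix):
    """Find all keys that start with `prefix` using binary search.

    Only valid when `prefix` covers the leading column(s) of the key.
    math.inf sorts after any integer, so prefix + (inf,) is an upper bound
    for every key that begins with `prefix`.
    """
    lo = bisect_left(index, prefix)
    hi = bisect_right(index, prefix + (math.inf,))
    return index[lo:hi]


def scan(index, column, value):
    """No usable prefix: every key must be examined (a full index scan).

    Returns (matching keys, number of keys examined).
    """
    matches = [key for key in index if key[column] == value]
    return matches, len(index)


print(seek(INDEX, (2,)))     # invoice = 2 -> [(2, 10), (2, 30)]
print(seek(INDEX, (1, 20)))  # invoice = 1 AND item = 20 -> [(1, 20)]
print(scan(INDEX, 1, 20))    # item = 20 alone -> ([(1, 20), (3, 20)], 5)
```

The binary search only touches a logarithmic number of keys, while the item-only lookup has to walk all of them, which is why a separate index on item helps.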
I personally use phpMyAdmin to view and edit the structure of MySQL databases. It is a web application but it runs well enough on a local web server (I run an instance of apache on my machine for this and phpPgAdmin).
As for the composite key of (invoice, item), it acts like an index for (invoice, item) and for invoice. If you want to index by just item you have to add that index yourself. Your PK will be sorted by invoice and then by item where invoice is the same in multiple records. While the order in a composite PK does not matter for uniqueness enforcement, it does matter for access.
On your table I would use:
PRIMARY KEY id (invoice, item), INDEX (item)
I'm not that familiar with MySQL, but generally a multiple-column index is as useful for its first column as an index on that column alone would be. The further into the index a column appears, the less useful the multiple-column index becomes for querying against that single column.
This makes some sense if you think of the multi-column index as a hierarchy. The first column in the index is the root of the hierarchy, so searching it is just a matter of scanning that first level. However, to search the second column, the database has to descend into the tree under each distinct value of the first column. This can be costly enough that most optimizers won't bother to look deeply into a multi-column index and will opt for a full table scan instead.
For example, if you have a table as follows:
Col1 |Col2 |Col3
----------------
A | 1 | Z
A | 2 | Y
A | 2 | X
B | 1 | Z
B | 2 | X
Assuming you have an index on all three columns, in order, the tree will look something like this:
A
+-1
| +-Z
+-2
  +-X
  +-Y
B
+-1
| +-Z
+-2
  +-X
Looking for Col1='A' is easy: you only have to look at 2 ordered values. However, to resolve Col3='X', you have to look at all of the values in the 4 second-level buckets, each of which is ordered individually.
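The same example can be replayed in a few lines of Python. This is a toy nested-dictionary model of the diagram, not a real B-tree, and the function names are hypothetical; it counts how many second-level buckets have to be opened to answer Col3 = 'X', versus the single top-level lookup for Col1 = 'A'.

```python
# Rows from the example table as (Col1, Col2, Col3).
ROWS = [("A", 1, "Z"), ("A", 2, "Y"), ("A", 2, "X"), ("B", 1, "Z"), ("B", 2, "X")]


def build_tree(rows):
    """Nest rows as Col1 -> Col2 -> sorted list of Col3, like the diagram."""
    tree = {}
    for c1, c2, c3 in rows:
        tree.setdefault(c1, {}).setdefault(c2, []).append(c3)
    # Sort every level so the structure mirrors an ordered index.
    return {
        c1: {c2: sorted(leaves) for c2, leaves in sorted(level.items())}
        for c1, level in sorted(tree.items())
    }


def find_by_col1(tree, value):
    """Col1 is the top level: one lookup reaches the whole subtree."""
    return tree.get(value, {})


def find_by_col3(tree, value):
    """Col3 is the bottom level: every (Col1, Col2) bucket must be opened.

    Returns (matches, number of buckets opened).
    """
    matches, buckets_opened = [], 0
    for c1, level in tree.items():
        for c2, leaves in level.items():
            buckets_opened += 1
            if value in leaves:
                matches.append((c1, c2, value))
    return matches, buckets_opened


tree = build_tree(ROWS)
print(find_by_col1(tree, "A"))  # {1: ['Z'], 2: ['X', 'Y']}
print(find_by_col3(tree, "X"))  # ([('A', 2, 'X'), ('B', 2, 'X')], 4)
```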
To return table index information, you can use:
SHOW INDEX FROM <table>;
See: http://dev.mysql.com/doc/refman/5.0/en/show-index.html
To view table information:
SHOW CREATE TABLE <table>;
See: http://dev.mysql.com/doc/refman/5.0/en/show-create-table.html
Primary keys are indexes, so there's no need to create additional indexes. You can find out more information about them under the CREATE TABLE syntax (there's too much to insert here):
http://dev.mysql.com/doc/refman/5.0/en/create-table.html
There is a difference between composite index and composite primary key.
If you have defined a composite index like below
INDEX idx(invoice,item)
the index won't work if you query based on item alone, and you need to add a separate index
INDEX itemidx(item)
But, if you have defined a composite primary key like below
PRIMARY KEY(invoice, item)
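the primary key can still be used when you query based on item, so no separate index is strictly required. Note, however, that the EXPLAIN output below shows type = index, which means a full scan of the index rather than a direct lookup.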
Working example:
mysql>create table test (col1 int(20), col2 int(20), primary key (col1, col2));
mysql>explain select * from test where col2 = 1;
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
| 1 | SIMPLE | test | index | NULL | PRIMARY | 8 | NULL | 10 | Using where; Using index |
+----+-------------+-------+-------+---------------+---------+---------+------+------+--------------------------+
MySQL automatically creates an index for a composite primary key. Depending on your queries, you may have to create separate indexes on individual columns of the composite key.
If you are using MySQL Workbench, you can right-click the schema and click Edit to see everything about the table.
If your query uses both columns in the WHERE clause, you don't need a separate index beyond the composite primary key.
EXPLAIN SELECT * FROM `table` WHERE invoice = 1 and item = 1
You are also fine if you query on the first column only:
EXPLAIN SELECT * FROM `table` WHERE invoice = 1
But if you want to query on subsequent columns of a composite PK (col2, col3, ...), you need to create separate indexes on those columns. The following EXPLAIN shows that MySQL detects no possible key for the second column:
EXPLAIN SELECT * FROM `table` WHERE item = 1
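The same leftmost-prefix behavior can be checked locally with Python's built-in sqlite3 module. Note the assumptions: this is SQLite, not MySQL, so the plan text reads SEARCH (index lookup) or SCAN (full scan) instead of MySQL's EXPLAIN columns, the exact wording varies between SQLite versions, and the invoice_items table with its qty column is hypothetical.

```python
import sqlite3


def plan(conn, sql):
    """Return the 'detail' text of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoice_items ("
    " invoice INTEGER, item INTEGER, qty INTEGER,"
    " PRIMARY KEY (invoice, item))"
)

# Leading PK column: SQLite can seek into the primary-key index.
by_invoice = plan(conn, "SELECT * FROM invoice_items WHERE invoice = 1")
# Second PK column alone: no usable prefix, so the table is scanned.
by_item = plan(conn, "SELECT * FROM invoice_items WHERE item = 1")

# Add the separate single-column index the answers recommend.
conn.execute("CREATE INDEX ix_item ON invoice_items (item)")
by_item_indexed = plan(conn, "SELECT * FROM invoice_items WHERE item = 1")

for label, detail in [
    ("invoice = 1", by_invoice),
    ("item = 1", by_item),
    ("item = 1 with ix_item", by_item_indexed),
]:
    print(label, detail)
```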

What's a MySQL index table?

I need to speed up a query. Is an index table what I'm looking for? If so, how do I make one? Do I have to update it each insert?
Here are the table schemas:
--table1-- | --tableA-- | --table2--
id | id | id
attrib1 | t1id | attrib1
attrib2 | t2id | attrib2
| attrib1 |
And the query:
SELECT
table1.attrib1,
table1.attrib2,
tableA.attrib1
FROM
table1,
tableA
WHERE
table1.id = tableA.t1id
AND (tableA.t2id = x or ... or tableA.t2id = z)
GROUP BY
table1.id
You need to create a composite index on tableA:
CREATE INDEX ix_tablea_t1id_t2id ON tableA (t1id, t2id)
Indexes in MySQL are considered a part of a table: they are updated automatically, and used automatically whenever the optimizer decides it's a good move to use them.
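A small illustration of that point, using Python's sqlite3 as a stand-in for MySQL (plan wording differs between the two): the index is created before any rows exist, the rows are inserted afterwards with no extra maintenance step, and the query plan still picks the index. It uses a single-table query with an IN list rather than the full join and OR chain from the question, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tableA (id INTEGER PRIMARY KEY, t1id INTEGER, t2id INTEGER, attrib1 TEXT)"
)
# Create the index first, while the table is still empty...
conn.execute("CREATE INDEX ix_tablea_t1id_t2id ON tableA (t1id, t2id)")
# ...then insert: the database maintains the index on every INSERT itself.
conn.executemany(
    "INSERT INTO tableA (t1id, t2id, attrib1) VALUES (?, ?, ?)",
    [(1, 5, "a"), (1, 7, "b"), (2, 5, "c"), (2, 9, "d")],
)

query = "SELECT attrib1 FROM tableA WHERE t1id = ? AND t2id IN (5, 7) ORDER BY attrib1"
rows = [r[0] for r in conn.execute(query, (1,))]
# The optimizer decides on its own to use the index; no hint is needed.
detail = [r[3] for r in conn.execute("EXPLAIN QUERY PLAN " + query, (1,))]

print(rows)    # ['a', 'b']
print(detail)
```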
MySQL does not use the term index table.
The closest term is Oracle's index-organized table, which corresponds to what other databases call a CLUSTERED INDEX: a kind of table where the records themselves are arranged according to the value of a column (or a set of columns).
In MySQL:
When you use MyISAM storage, a table's indexes are stored in a separate file with the .MYI extension.
The contents of this file represent B-Trees, each leaf containing the index key and a pointer to the offset in the .MYD file that contains the data.
The size of the pointer is determined by the server setting called myisam_data_pointer_size, which can vary from 2 to 7 bytes, and defaults to 6 since MySQL 5.0.6.
This allows creating MyISAM tables up to 2 ^ (8 * 6) bytes = 256 TB
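The arithmetic above generalizes to every allowed pointer size. A small sketch (the function name is hypothetical); note that the TB figure is a binary terabyte, 2**40 bytes, so 2**48 bytes is exactly 256 of them.

```python
def max_myisam_data_bytes(pointer_size: int) -> int:
    """Bytes addressable by a data pointer of `pointer_size` bytes: 2 ** (8 * n).

    pointer_size corresponds to myisam_data_pointer_size, which accepts 2..7.
    """
    if not 2 <= pointer_size <= 7:
        raise ValueError("myisam_data_pointer_size must be between 2 and 7")
    return 2 ** (8 * pointer_size)


TIB = 2 ** 40  # one binary terabyte

for n in range(2, 8):
    print(f"{n} bytes -> {max_myisam_data_bytes(n):,} bytes")

print(max_myisam_data_bytes(6) // TIB, "TiB with the default of 6")  # 256 TiB ...
```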
In InnoDB, all tables are inherently ordered by the PRIMARY KEY; InnoDB does not support heap-organized tables.
Each secondary index, therefore, is in effect just a plain InnoDB table whose PRIMARY KEY consists of N + M columns: N columns holding the indexed value, and M columns holding the PRIMARY KEY of the main table record that holds the indexed data.
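A toy model of that layout, for illustration only: the city column, the sample rows and the function name are hypothetical, and real InnoDB pages look nothing like Python lists. The secondary index stores (indexed value, primary key) pairs in sorted order (N = M = 1 here), and each match is then looked up in the clustered table by its primary key.

```python
from bisect import bisect_left

# Stand-in for the clustered table: rows reachable by PRIMARY KEY.
CLUSTERED = {1: ("alice", "NYC"), 2: ("bob", "LA"), 3: ("carol", "NYC")}

# Secondary index on city: sorted (indexed value, primary key) entries.
CITY_INDEX = sorted((row[1], pk) for pk, row in CLUSTERED.items())


def rows_for_city(city):
    """Seek the secondary index, then fetch each row by its PRIMARY KEY."""
    i = bisect_left(CITY_INDEX, (city,))  # first entry with value >= city
    result = []
    while i < len(CITY_INDEX) and CITY_INDEX[i][0] == city:
        pk = CITY_INDEX[i][1]
        result.append((pk, CLUSTERED[pk]))  # second lookup, by PRIMARY KEY
        i += 1
    return result


print(CITY_INDEX)            # [('LA', 2), ('NYC', 1), ('NYC', 3)]
print(rows_for_city("NYC"))  # [(1, ('alice', 'NYC')), (3, ('carol', 'NYC'))]
```

The two-step lookup is why a secondary index entry only needs the primary key, not a physical row address.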