Non-clustered indexes on data with no logical insert order - SQL

I have a SQL Server database that imports data from an old VMS database.
The data is spread across many tables that need to be joined for reporting.
The common ID used for all table joins looks like this:
D-100344-1
D-100344-2
D-100345-3
D-100346-1
N-100346000-1
N-100344001-1
N-100344001-2
N-100345001-3
N-100346000-1
About 1.2 million of these lines come in each day, across 827 tables.
Often a line will come in with updated data, so I insert the new line and remove the earlier one, as the table shouldn't hold duplicates.
To better facilitate joins between the tables, I added a non-clustered index on this ID.
It became 20% fragmented after one day's inserts (because of course it did).
What are my options here?
FYI: I use an incrementing TableID column as the clustered index on each table, so my inserts aren't so terrible, but that ID has no relation to the other tables for joining.
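For context, a minimal sketch of the arrangement described above - the table and column names (dbo.ImportLines, VmsId, Payload) are placeholders, not the real schema:

-- Hypothetical table: clustered on an ever-increasing surrogate key,
-- so inserts always append to the end of the clustered index.
CREATE TABLE dbo.ImportLines
(
    TableID  bigint IDENTITY(1,1) NOT NULL,
    VmsId    varchar(20)          NOT NULL,  -- e.g. 'D-100344-1', 'N-100346000-1'
    Payload  nvarchar(200)        NULL,      -- placeholder for the real columns
    CONSTRAINT PK_ImportLines PRIMARY KEY CLUSTERED (TableID)
);

-- The join key from the VMS system arrives in no particular order,
-- so new rows land randomly across this non-clustered index and fragment it.
CREATE NONCLUSTERED INDEX IX_ImportLines_VmsId
    ON dbo.ImportLines (VmsId);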

Related

Best practice for indexing in SQL Server

I have a transaction table and an inventory table that I would like to 'JOIN' together. The tables need to 'JOIN' on three primary key columns.
My question is: should I create a unique key (a concatenation of the three fields) and create an 'INDEX' on that unique key, or should I just create a non-clustered 'INDEX' on all three fields?
I'm currently using SQL Server 2014
I'm guessing the Transaction table is the bigger one and the Inventory table the smaller. A lot depends on what proportion of the data you expect your join to return - if it's most of it, then a table scan will probably occur and an index won't help much. If you're going to fetch a small subset of the data, then create an index on the three columns on both tables and create a foreign key from Transaction to Inventory on those three columns. (SQL Server needs the index as well as the FK.)
Pick the most selective (granular) column as the first one in your index, as this will encourage SQL Server's optimiser to use the index.
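As a sketch of that advice - the column names item_id, location_id and batch_id are placeholders for whatever your three key columns actually are:

-- The composite key on Inventory must be unique (here a primary key) so the FK can reference it.
ALTER TABLE dbo.Inventory
    ADD CONSTRAINT PK_Inventory PRIMARY KEY (item_id, location_id, batch_id);

-- Index the same three columns on the Transaction table for the join.
CREATE NONCLUSTERED INDEX IX_Transaction_InventoryKey
    ON dbo.[Transaction] (item_id, location_id, batch_id);

-- Foreign key from Transaction to Inventory on the three columns.
ALTER TABLE dbo.[Transaction]
    ADD CONSTRAINT FK_Transaction_Inventory
    FOREIGN KEY (item_id, location_id, batch_id)
    REFERENCES dbo.Inventory (item_id, location_id, batch_id);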

Oracle SQL merge tables without specifying columns

I have a table people with less than 100,000 records and I have taken a backup of this table using the following:
create table people_backup as select * from people
I add some new records to my people table over time, but eventually I want to merge the records from my backup table into people. Unfortunately I cannot simply DROP my table as my new records will be lost!
So I want to update the records in my people table using the records from people_backup, based on their primary key id and I have found 2 ways to do this:
MERGE the tables together
use some sort of fancy correlated update
Great! However, both of these methods use SET and make me specify which columns I want to update. Unfortunately I am lazy, and the structure of people may change over time; while my CTAS statement doesn't need to be updated, my update/merge script will, which feels like unnecessary work.
Is there a way to merge entire rows without having to specify columns? I have seen that not specifying columns during an INSERT will direct SQL to insert values by order - can the same methodology be applied here, and is it safe?
NB: The structure of the table will not change between backups
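For reference, the MERGE approach mentioned above looks roughly like this - first_name and last_name are hypothetical columns standing in for whatever people actually contains, and the SET clause is exactly the part that has to list every column:

MERGE INTO people t
USING people_backup b
   ON (t.id = b.id)
 WHEN MATCHED THEN
   UPDATE SET t.first_name = b.first_name,
              t.last_name  = b.last_name
 WHEN NOT MATCHED THEN
   INSERT (id, first_name, last_name)
   VALUES (b.id, b.first_name, b.last_name);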
Given that your table is small, you could simply
DELETE FROM people t
 WHERE EXISTS( SELECT 1
                 FROM people_backup b
                WHERE t.id = b.id );

INSERT INTO people
  SELECT *
    FROM people_backup;
That is slow and not particularly elegant (particularly if most of the data from the backup hasn't changed) but assuming the columns in the two tables match, it does allow you to not list out the columns. Personally, I'd much prefer writing out the column names (presumably those don't change all that often) so that I could do an update.

H2 incrementally update counts from another table?

With the H2 database, suppose there is a SUMS table that has a key and several count fields and there is an UPDATES table which has the same key and count fields. The keys in the UPDATES table may or may not exist in the SUMS table.
What is the most efficient way to add all the counts for each key from the UPDATES table to the SUMS table, or insert a row with those counts if the SUMS table does not yet have that key?
Of course I could always process the result set of a SELECT on the UPDATES table and then update or insert into the SUMS table one row at a time, but it feels like there should be a more efficient way to do it.
If it is not possible in H2 but possible in some other Java-embeddable solution I would be interested in this too, because this processing is just an intermediate step for processing a larger number of these counts (a couple of dozen million keys and a couple of billion rows for updating them).
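One possibility, assuming a reasonably recent H2 version that supports the standard MERGE ... USING syntax (the key and count column names below are placeholders for the real ones):

-- Add the incoming counts to existing rows, insert rows for keys not yet in SUMS.
MERGE INTO SUMS s
USING UPDATES u
   ON (s.key_col = u.key_col)
 WHEN MATCHED THEN
   UPDATE SET count_a = s.count_a + u.count_a,
              count_b = s.count_b + u.count_b
 WHEN NOT MATCHED THEN
   INSERT (key_col, count_a, count_b)
   VALUES (u.key_col, u.count_a, u.count_b);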

Performing mass DELETE operation in Oracle 11g

I have a table MyTable with multiple int columns and one column containing a date. The date column has an index, created as follows:
CREATE INDEX some_index_name ON MyTable(my_date_column)
because the table will often be queried for its contents between a user-specified date range. The table has no foreign keys pointing to it, nor any indexes other than the primary key, which is an auto-incrementing column filled by a sequence/trigger.
Now, the issue I have is that the data in this table is often replaced for a given time period because it is out of date. The way it is updated is by deleting all the entries within a given time period and inserting the new ones. The delete is performed using
DELETE FROM MyTable
WHERE my_date_column >= initialDate
AND my_date_column < endDate
However, because the number of rows deleted is massive (from 5 million to 12 million rows) the program pretty much blocks during the delete.
Is there something I can disable to make the operation faster? Or maybe specify an option in the index to make it faster? I read something about redo space having to do with this but I don't know how to disable it during an operation.
EDIT: The process runs every day and it deletes the last 5 days of data, then it brings the data for those 5 days (which may have changed in the external source) and reinserts the data.
The amount of data deleted is a tiny fraction compared to the whole amount of data in the table ( < 1%). So copying the data I want to keep into another table and dropping-recreating the table may not be the best solution.
I can only think of two ways to speed this up.
If you do this on a regular basis, you should consider partitioning your table by month. Then you just drop the partition of the month you want to delete, which is basically as fast as dropping a table. Partitioning requires an Enterprise Edition license, if I'm not mistaken.
Create a new table with the data you want to keep (using CREATE TABLE new_table AS SELECT ...), drop the old table and rename the interim table. This will be much faster, but has the drawback that you need to re-create all indexes and (primary, foreign key) constraints on the new table.
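A rough sketch of the second option, keeping the names from the question (the date literals below are placeholders for initialDate and endDate):

-- Option 1, if the table were partitioned by month, would simply be something like:
-- ALTER TABLE MyTable DROP PARTITION p_2018_01;   -- hypothetical partition name

-- Option 2: copy only the rows you want to keep, i.e. everything outside the reloaded range.
CREATE TABLE my_table_keep AS
  SELECT *
    FROM MyTable
   WHERE my_date_column <  DATE '2018-01-01'   -- initialDate
      OR my_date_column >= DATE '2018-01-06';  -- endDate

DROP TABLE MyTable;

ALTER TABLE my_table_keep RENAME TO MyTable;

-- Re-create the primary key, the sequence/trigger and the date index afterwards, e.g.
CREATE INDEX some_index_name ON MyTable(my_date_column);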

SQL query: have results into a table named the results name

I have a very large table I would like to split up into smaller tables. I would like to make it so that when I run a DISTINCT, it will make a table for every distinct name; the name of each table will be the data in one of the fields.
EX:
A --------- Data 1
A --------- Data 2
B --------- Data 3
B --------- Data 4
would result in 2 tables, one named A and another named B. The entire row of data would then be copied into the corresponding table.
select distinct [name] from [maintable]
- make a table for each name
- select the rows for that [name] from [maintable]
- copy them into the new table
- drop those rows from [maintable]
Any help would be great!
I would advise you against this.
One solution is to create indexes, so you can access the data quickly. If you have only a handful of names, though, this might not be particularly effective, because each index value would select almost all of the records.
Another solution is something called partitioning. The exact mechanism differs from database to database, but the underlying idea is the same. Different portions of the table (as defined by name in your case) would be stored in different places. When a query is looking only for values for a particular name, only that data gets read.
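As a concrete sketch of the first suggestion, using the table and column names from the pseudo-code in the question:

-- One index on the name column lets queries for a single name read only that slice.
CREATE INDEX idx_maintable_name ON maintable(name);

-- Typical query against the single table:
SELECT *
  FROM maintable
 WHERE name = 'A';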
Generally, it is bad design to have multiple tables with exactly the same data columns. Here are some reasons:
Adding a column, changing a type, or adding an index has to be done once per table instead of just once.
It is very hard to enforce a primary key constraint on a column across the tables -- you lose the primary key.
Queries that touch more than one name become much more complicated.
Insertions and updates are more complex, because you have to first identify the right table. This often results in overuse of dynamic SQL for otherwise basic operations.
Although there may be some justifications for splitting the data this way (security comes to mind), most databases have other mechanisms that are superior to splitting the data into separate tables.
What you want is
CREATE TABLE new_table AS
  (SELECT ...   -- the data that you want in this table
  );
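For example, a filled-in version of that pattern for the name 'A' from the sample data above (the exact CREATE TABLE ... AS syntax varies by database; SQL Server uses SELECT ... INTO instead):

-- Copies every row whose name is 'A' into its own table.
CREATE TABLE A AS
  SELECT *
    FROM maintable
   WHERE name = 'A';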