SQL Server 2012 Query Performance - sql-server-2012

I will be starting a project soon using SQL Server 2012 in which I will be required to provide real-time querying of database tables, one of which alone holds in excess of 4 billion records. I am fairly familiar with SQL Server (I have indexes on the relevant columns), but have never had to deal with databases this large before.
I have been looking into partitioning and am fairly confident using it; however, it is only available in the Enterprise edition(?), for which the licenses are WAY too expensive. Columnstore indexes also look promising, but as well as being Enterprise-only, they render your table read-only(??). Another option is to archive data as soon as it is no longer used live, so that I keep as little data in the live tables as possible.
The main queries on the largest table will be on an NVARCHAR(50) column which contains an ID. In initial testing with 4 billion records, a query pulling a single record by that ID takes in excess of 5 minutes, even with indexing. So my question is (and sorry if it sounds naive!): can somebody please suggest a way to speed up the queries on this table that I haven't mentioned (and therefore don't know about)? Many thanks in advance.
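For illustration, here is a minimal sketch of the kind of covering index I would expect to turn a single-ID lookup into one index seek (the table and column names are placeholders, not the real schema):

CREATE NONCLUSTERED INDEX IX_BigTable_ExternalID
ON dbo.BigTable (ExternalID)       -- the NVARCHAR(50) ID column
INCLUDE (Col1, Col2);              -- the columns the query returns, so no key lookup is needed

SELECT ExternalID, Col1, Col2
FROM dbo.BigTable
WHERE ExternalID = N'ABC-12345';   -- an N'' literal keeps the comparison type-matched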

Related

Improving query performance on unindexed column in MS Access table

I have a big old MS Access table with ~84 columns and ~280k rows. Three of these columns are LabNumber (indexed), HospitalNumber (non-indexed), and NHSNumber (non-indexed). I want to search HospitalNumber and NHSNumber for a term to retrieve the value of LabNumber. It's a regularly used production database, so the table must stay as is. Oh, and the database is being accessed over a network. The query was painfully slow.
Using the wonderful power of regular expressions, I can work out which one of NHSNumber and HospitalNumber I need to look in. Reducing it to only looking in one or the other has made it faster, but it's still taking 30 seconds on a good day, sometimes longer.
My question is this: are there any other tips or tricks that I can use to try and bring the execution time down to a more manageable level? Pragmatic solutions are welcome, but bear in mind that the table must not be altered, and the existing database will be updated relatively regularly (let's say that the data being a day out isn't a big deal, but a week out definitely is).
Edit
The query was requested, so here it is. Unfortunately it's not that exciting:
SELECT [ConsID], [LabNumber], [HospitalNumber], [NHSNumber]
FROM Samples
WHERE [NHSNumber]="1234567890";
If you cannot modify the existing table, copy it to a local table and apply indexes to the columns you search.
This can all be done in code which you can run whenever an update is needed; a sketch follows.
If you use VBA to open the table on startup and keep it open until the database is closed, it should improve the performance significantly.
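A minimal sketch of the copy-and-index step in Access SQL (LocalSamples is a hypothetical name; run each statement separately, since Access executes one statement at a time and its SQL does not support comments):

SELECT ConsID, LabNumber, HospitalNumber, NHSNumber
INTO LocalSamples
FROM Samples;

CREATE INDEX idxNHSNumber ON LocalSamples (NHSNumber);
CREATE INDEX idxHospitalNumber ON LocalSamples (HospitalNumber);

Drop and recreate LocalSamples whenever you refresh; since a day-old copy is acceptable, a refresh on database startup would be enough.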

How to best query across both Oracle and SQL Server databases for large tables?

I have a stored procedure in SQL Server that also queries tables in the same database and in a different Oracle database. This is for a data warehouse project that joins several large tables across databases and queries them.
Is it better to copy the table (with ~3 million records) into the same database and then query it, or is the slowdown from the table being in a different database not significant? The query is complicated and can take hours.
I'm not necessarily looking for a specific answer, informed opinion and/or specific further reading are also very appreciated. Thanks!
I always prefer a stage layer (some call it an integration layer).
In your case (guessing blind), the best solution is perhaps to:
1. Copy the table once
2. Create a sync step (insert/update) based on the primary key(s)
3. Schedule step 2
4. Run your query
If there is some logical data-integrity rule, you can implement step 2 with simple SQL based on timestamps, as in the sketch below.
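A minimal sketch of the sync step (all names are assumptions: ORA is a linked server to the Oracle database, stage.Orders is the local staging copy, OrderID the primary key, LastModified the timestamp):

MERGE stage.Orders AS tgt
USING (
    SELECT OrderID, CustomerID, Amount, LastModified
    FROM ORA..SCHEMA1.ORDERS                            -- four-part name via the linked server
    WHERE LastModified >= DATEADD(DAY, -1, GETDATE())   -- only recent changes
) AS src
ON tgt.OrderID = src.OrderID
WHEN MATCHED THEN
    UPDATE SET CustomerID   = src.CustomerID,
               Amount       = src.Amount,
               LastModified = src.LastModified
WHEN NOT MATCHED THEN
    INSERT (OrderID, CustomerID, Amount, LastModified)
    VALUES (src.OrderID, src.CustomerID, src.Amount, src.LastModified);

Schedule this in a SQL Agent job, then run the heavy query against the local copy only.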

Store Many Rows In Sql Server Issue?

I'm working on a program that works with SQL Server.
For storing data in a database table, which of the approaches below is correct?
Store many rows in just one table (10 million records)
Store fewer rows in several tables (500,000 records each), e.g. create one table for each year
It depends on how often you access the data. If you are not using the old records, then you can archive those records. Splitting up tables is not desirable, as it may confuse you while fetching data.
I would say to store all the data in a single table, but implement table partitioning on the older data. Partitioning the data will increase query performance.
Here are some references:
http://www.mssqltips.com/sqlservertip/1914/sql-server-database-partitioning-myths-and-truths/
http://msdn.microsoft.com/en-us/library/ms188730.aspx
http://blog.sqlauthority.com/2008/01/25/sql-server-2005-database-table-partitioning-tutorial-how-to-horizontal-partition-database-table/
Please note that this table partitioning functionality is only available in Enterprise Edition.
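A minimal sketch of what yearly range partitioning looks like (the names, boundary dates, and single-filegroup shortcut are assumptions for illustration):

CREATE PARTITION FUNCTION pfYearly (DATETIME)
AS RANGE RIGHT FOR VALUES ('2011-01-01', '2012-01-01', '2013-01-01');

CREATE PARTITION SCHEME psYearly
AS PARTITION pfYearly ALL TO ([PRIMARY]);   -- real systems often map each year to its own filegroup

CREATE TABLE dbo.Readings
(
    ReadingID   BIGINT IDENTITY NOT NULL,
    ReadingDate DATETIME NOT NULL,
    Value       DECIMAL(18, 4) NOT NULL,
    CONSTRAINT PK_Readings PRIMARY KEY (ReadingID, ReadingDate)   -- the partitioning column must be part of the key
)
ON psYearly (ReadingDate);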
Well, it depends!
What are you going to do with the data? If you query this data often, it could be a better solution to split the data into (for example) one table per year. That way you would get better performance, since you would be querying smaller tables.
But on the other side, with a bigger table and well-written queries you might not even see a performance issue. If you only need to store this data, it would be better to just use one table.
BTW, for loading this data into the database you could use BCP (bulk copy), which is a fast way of inserting a lot of rows.
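For example, a minimal BULK INSERT sketch, which uses the same bulk-load path as bcp (the file path, table, and format are assumptions):

BULK INSERT dbo.Readings
FROM 'C:\data\readings.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    TABLOCK          -- a table lock helps qualify the load for minimal logging
);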

What is the most scalable design for this table structure

DataColumn, DataColumn, DateColumn
Every so often we put data into the table via date.
So everything seems great at first, but then I thought: what happens when there are a million or a billion rows in the table? Should I be breaking up the tables by date so that query performance never degrades? How do people deal with this sort of thing?
You can use partitioned tables starting with SQL 2K5: Partitioned Tables
This way you gain the benefits of keeping the logical design pure while being able to move old data into a different file group.
You should not break up your tables because of data volume. Instead, you should worry about your indexes, normalization and so on.
Update
A little deeper explanation. Let's suppose you have a table with a million records. If the dates in [DateColumn] vary, your greatest ally will be an index on [DateColumn]. Then make sure your queries always filter by at least [DateColumn].
This way, you will be fine.
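A minimal sketch of that advice (the table name and the two data columns are placeholders matching the structure above):

CREATE NONCLUSTERED INDEX IX_Measurements_DateColumn
ON dbo.Measurements (DateColumn)
INCLUDE (DataColumn1, DataColumn2);   -- covering the data columns avoids key lookups

SELECT DataColumn1, DataColumn2, DateColumn
FROM dbo.Measurements
WHERE DateColumn >= '2012-01-01'
  AND DateColumn <  '2012-02-01';     -- a range seek on the leading index column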
This easily qualifies as premature optimization, which is tough to achieve in DB design, IMHO, because optimization is (and should be) closer to the surface in data modeling.
But all you need to do is create an index on the DateColumn field. An index is actually a much better performance solution than any kind of table splitting, and it keeps your design, and therefore all of your programming, much simpler. (And you can decide to use partitioning without affecting your design in the future if it helps.)
Sounds like you could use a history table. If you are mostly going to query the current date's data, then migrate the old data to the history table and your main table will not grow so much.
If I understand your question correctly, you have a table with some data and a date, and your question is: will I see improved performance if I make a new table, say, every year, so that queries never have to look at more than one year's worth of data?
This is the wrong approach. Instead, what you should do is index the date field. The server will be able to give you the performance gain you need if it is indexed.
If you don't do this, your program's logic will get crazy and ultimately slow down your system.
Keep it simple.
(NB - There are some advanced partitioning features you can make use of, but these can be layered in later if needed -- it is unlikely you will need these features but the simple design should be able to migrate to them if needed.)
When tables and indexes become very large, partitioning can help by partitioning the data into smaller, more manageable sections. Microsoft SQL Server 2005 allows you to partition your tables based on specific data usage patterns using defined ranges or lists. SQL Server 2005 also offers numerous options for the long-term management of partitioned tables and indexes by the addition of features designed around the new table and index structure. Furthermore, if a large table exists on a system with multiple CPUs, partitioning the table can lead to better performance through parallel operations.
You might need to consider the following too: in SQL Server 2005, related tables (such as Order and OrderDetails tables) that are partitioned to the same partitioning key and the same partitioning function are said to be aligned. When the optimizer detects that two partitioned and aligned tables are joined, SQL Server 2005 can join the data that resides on the same partitions first and then combine the results. This allows SQL Server 2005 to more effectively use multiple-CPU computers.
Read about Partitioned Tables and Indexes in SQL Server 2005
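A minimal sketch of aligned tables (names and boundary values are illustrative only): both tables sit on the same partition scheme and are partitioned on the join key, so the join can proceed partition by partition.

CREATE PARTITION FUNCTION pfOrders (INT)
AS RANGE RIGHT FOR VALUES (100000, 200000, 300000);

CREATE PARTITION SCHEME psOrders
AS PARTITION pfOrders ALL TO ([PRIMARY]);

CREATE TABLE dbo.Orders
(
    OrderID   INT NOT NULL,
    OrderDate DATETIME NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY (OrderID)
) ON psOrders (OrderID);

CREATE TABLE dbo.OrderDetails
(
    OrderID    INT NOT NULL,
    LineNumber INT NOT NULL,
    Quantity   INT NOT NULL,
    CONSTRAINT PK_OrderDetails PRIMARY KEY (OrderID, LineNumber)
) ON psOrders (OrderID);

-- A join such as Orders JOIN OrderDetails ON OrderID can now be
-- collocated: matching rows always live in the same partition number.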

SQL Server 2008 Slow Table, Table Partitioning

I have a table that has grown to over 1 million records... today (all valid)
I need to speed it up... would table partitioning be the answer? If so, can I get some help on building the query?
The table has four bigint columns, that's all, with the primary key indexed and a descending index on userid; the other values are at most 139 (there are just over 10,000 users now).
Any help or direction would be appreciated :)
You should investigate your indexes and query workload before thinking about partitioning. If you have done a large number of inserts, your clustered index may be fragmented.
Even though you are using SQL Server Express, you can still profile using this free tool: Profiler for Microsoft SQL Server 2005/2008 Express Edition.
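Before reaching for partitioning, a minimal fragmentation check (the table name dbo.UserScores is a placeholder):

SELECT i.name AS index_name,
       s.avg_fragmentation_in_percent,
       s.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.UserScores'),
                                    NULL, NULL, 'LIMITED') AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id AND i.index_id = s.index_id;

-- A common rule of thumb: REORGANIZE between 5% and 30% fragmentation,
-- REBUILD above 30%.
ALTER INDEX ALL ON dbo.UserScores REBUILD;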
You probably just need to tune your queries and/or indexes. 1 million records shouldn't be causing you problems. I have a table with several hundred million records and am able to maintain pretty high performance. I have found the SQL Server Profiler to be pretty helpful with this stuff. It's available in SQL Server Management Studio (but not the Express version, unfortunately). You can also do Query > Include Actual Execution Plan to see a diagram of where time is being spent during the query.
I agree with the other comments. With a reasonably small database (largest table 1M records), it's unlikely that any activity in the database will create a noticeable load, provided queries are optimized and the rest of the code isn't abusing the database with redundant queries. It's a good opportunity to get a feel for the interplay between database queries and the rest of the code.
See my experiments on SQL table partitioning here: http://faiz.kera.la/2009/08/02/does-partitioning-improve-performance-for-sql-tables/. Hope this is helpful for you... And for your case, 1M is not a considerable figure; maybe you need to fine-tune the queries rather than going for partitioning.