SSIS crash after a few records - sql

I have an SSIS package which is supposed to take 100,000 records, loop over them, and for each one save the details to a few tables.
It works fine until it gets to somewhere around record 3,000, at which point Visual Studio crashes. By then devenv.exe has used about 500 MB and only 3,000 rows have been processed.
I'm sure the problem is not with a specific record, because it always crashes on a different batch of roughly 3,000 records.
I have a good computer with 2 GB of RAM available.
I'm using SSIS 2008.
Any idea what the issue might be?
Thanks.

Try increasing the default buffer size on your data flow tasks.
Example given here: http://www.mssqltips.com/sqlservertip/1867/sql-server-integration-services-ssis-performance-best-practices/
Best Practice #7 - DefaultBufferMaxSize and DefaultBufferMaxRows
As I said in "Best Practices #6", the execution tree creates
buffers for storing incoming rows and performing transformations. So
how many buffers does it create? How many rows fit into a single
buffer? And how does that impact performance?
The number of buffers created depends on how many rows fit into a
buffer, and how many rows fit into a buffer depends on a few other
factors. The first consideration is the estimated row size, which is
the sum of the maximum sizes of all the columns in the incoming
records. The second consideration is the DefaultBufferMaxSize property
of the data flow task, which specifies the default maximum size of a
buffer. Its default value is 10 MB, and its upper and lower boundaries
are constrained by two internal SSIS properties, MaxBufferSize (100 MB)
and MinBufferSize (64 KB); in other words, a buffer can be as small as
64 KB and as large as 100 MB. The third factor is DefaultBufferMaxRows,
another property of the data flow task, which specifies the default
number of rows in a buffer; its default value is 10,000.
Although SSIS does a good job of tuning these properties to create
an optimal number of buffers, if the estimated size exceeds
DefaultBufferMaxSize it reduces the number of rows in the buffer. For
better buffer performance you can do two things. First, remove
unwanted columns from the source and set the data type of each column
appropriately, especially if your source is a flat file. This lets
you fit as many rows as possible into each buffer. Second, if your
system has sufficient memory available, you can tune these properties
toward a small number of large buffers, which can improve performance.
Beware, though: if you raise these values to the point where page
spooling (see Best Practices #8) begins, performance suffers. So
before you set values for these properties, test thoroughly in your
environment and choose the values accordingly.
You can enable logging of the BufferSizeTuning event to learn how many
rows a buffer contains, and you can monitor the "Buffers spooled"
performance counter to see whether SSIS has begun page spooling. I
will talk more about event logging and performance counters in upcoming
tips in this series.
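The interaction between these three settings can be sketched as a quick back-of-the-envelope calculation. This is a simplification (the real engine also honors MinBufferSize/MaxBufferSize and other internals), and the row sizes used below are assumed examples, not values from the question:

```python
# Sketch of how SSIS caps rows per buffer; constants are the documented defaults.
DEFAULT_BUFFER_MAX_SIZE = 10 * 1024 * 1024   # 10 MB
DEFAULT_BUFFER_MAX_ROWS = 10_000

def rows_per_buffer(estimated_row_size: int) -> int:
    """The row-count cap applies unless the size cap is hit first."""
    return min(DEFAULT_BUFFER_MAX_ROWS,
               DEFAULT_BUFFER_MAX_SIZE // estimated_row_size)

print(rows_per_buffer(500))    # narrow rows: the 10,000-row cap wins
print(rows_per_buffer(5000))   # wide rows: the 10 MB size cap wins -> 2097 rows
```

Shrinking the estimated row size (by dropping columns or tightening data types) raises the rows-per-buffer figure without touching the memory cap, which is why that is the first tuning step the article recommends.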

Related

Unable to upload data even after partitioning in VoltDB

We are trying to upload 80 GB of data onto 2 host servers, each with 48 GB of RAM (96 GB in total). We have partitioned the table, too, but even after partitioning we are only able to upload about 10 GB of data. In the VMC interface we checked the size worksheet: the table has 400,000,000 rows, a maximum size of 1,053,200,000 KB, and a minimum size of 98,000,000 KB. So why can't we upload 80 GB even after partitioning, and what does this table size mean?
The size worksheet provides minimum and maximum size in memory that the number of rows would take, based on the schema of the table. If you have VARCHAR or VARBINARY columns, then the difference between min and max can be quite substantial, and your actual memory use is usually somewhere in between, but can be difficult to predict because it depends on the actual size of the strings that you load.
But I think the issue is that the minimum size is 98 GB according to the worksheet, i.e. the size if every nullable string were null and every not-null string were an empty string. Even without taking into account the heap size and any overhead, this is higher than your 96 GB capacity.
What is your kfactor setting? If it is 0, there will be only one copy of each record. If it is 1, there will be two copies of each record, so you would really need 196GB minimum in that configuration.
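The arithmetic behind that can be sketched quickly. The GB figures come from the worksheet numbers above, and the sketch ignores heap size and other overhead:

```python
# With kfactor k, VoltDB keeps k + 1 full copies of each record in the cluster.
MIN_TABLE_GB = 98       # worksheet minimum table size
CLUSTER_RAM_GB = 96     # 2 hosts x 48 GB

def required_ram_gb(min_size_gb: int, kfactor: int) -> int:
    return min_size_gb * (kfactor + 1)

print(required_ram_gb(MIN_TABLE_GB, 0))  # 98: already above the 96 GB available
print(required_ram_gb(MIN_TABLE_GB, 1))  # 196: with one replica of every record
```

Either way the minimum footprint exceeds the cluster's RAM, which is consistent with the load stalling around 10 GB of the 80 GB source file.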
The size per record in RAM depends on the datatypes chosen and if there are any indexes. Also, VARCHAR values longer than 15 characters or 63 bytes are stored in pooled memory which carries more overhead than fixed-width storage, although it can reduce the wasted space if the values are smaller than the maximum size.
If you want some advice on how to minimize the per-record size in memory, please share the definition of your table and any indexes, and I might be able to suggest adjustments that could reduce the size.
You can add more nodes to the cluster, or use servers with more RAM to add capacity.
Disclaimer: I work for VoltDB.

Does it make a difference in SQL Server whether to use a TinyInt or Bit? Both in size and query performance

I have a table with 124,387,133 rows; each row has 59 columns, and 18 of those 59 columns are of the TinyInt data type, with all values either 0 or 1. Some of the TinyInt columns are used in indexes and some are not.
My question: will it make a difference to query performance and table size if I change the TinyInt columns to Bit?
In case you don't know, a Bit uses less space to store information than a TinyInt (1 bit versus 8 bits). So you would save space by changing to Bit, and in theory performance should be better. Generally such an improvement is hard to notice, but with the amount of data you have it might actually make a difference; I would test it on a backup copy.
Actually, it's good to use the right data type. Below are the benefits I can see from using the Bit data type:
1. Buffer pool savings: a page is read into memory from storage, and less memory needs to be allocated.
2. Smaller index key size, so more rows fit on one page and there is less traversing.
You will also see storage space savings as an immediate benefit.
In theory yes; in practice the difference is going to be subtle. The 18 Bit fields get byte-packed and rounded up, so they shrink to 3 bytes. Depending on nullability (and any change in nullability) the storage cost will change again. Both types are held within the fixed-width portion of the row. So you will drop from 18 bytes to 3 bytes for those fields, and depending on the overall row size versus the page size, you may squeeze an extra row onto each page. (The rows-per-page density is where any performance gain will primarily show up, if you are to gain at all.)
This seems like a premature micro-optimization, however. If you are suffering from bad performance, investigate that and gather the evidence to support any changes. Making type changes on existing systems should be carefully considered: if you cause the need for a code change, which prompts a full regression test and so on, the cost of the change rises dramatically for very little end result. (The production change on a large dataset will also not be rapid, so you can factor some downtime into the cost as well.)
You would be saving about 15 bytes per record, for a total of 1.8 Gbytes.
You have 41 remaining fields. If I assume that those are 4-byte integers, then your current overall size is about 22 Gbytes. The overall savings is less than 10% -- and could be much less if the other fields are larger.
This does mean that a full table scan would be about 10% faster, so that gives you a sense of the performance gain and magnitude.
I believe that bit fields require an extra operation or two to mask the bits and read -- trivial overhead that is measured in nanoseconds these days -- but something to keep in mind.
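The savings arithmetic above can be checked with a quick sketch. The assumption that the 41 remaining columns are 4-byte integers comes from the answer itself and is a rough estimate, not a measured value:

```python
# Rough arithmetic behind the "about 15 bytes per record, less than 10%" estimate.
ROWS = 124_387_133
TINYINT_COLS = 18          # 1 byte each today
OTHER_COLS = 41            # assumed to be 4-byte integers

tinyint_bytes = TINYINT_COLS * 1           # 18 bytes per row now
bit_bytes = (TINYINT_COLS + 7) // 8        # 18 bit columns byte-pack into 3 bytes
saved_per_row = tinyint_bytes - bit_bytes  # 15 bytes

current_row = tinyint_bytes + OTHER_COLS * 4   # ~182 bytes, ignoring row overhead
total_saved_gb = ROWS * saved_per_row / 1e9

print(f"saved per row: {saved_per_row} B")         # 15 B
print(f"total saved:   {total_saved_gb:.2f} GB")   # ~1.87 GB
print(f"fraction:      {saved_per_row / current_row:.1%}")  # ~8.2%, under 10%
```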
The benefits of a smaller row size are that more records fit on a single page, so the table occupies less space in memory (assuming everything is read in at once) and less space on disk. But smaller data does not always mean improved query performance. Here are two caveats:
If you are reading a single record, then the entire page needs to be read into cache. It is true that you are a bit less likely to get a cache miss with a warm cache, but overall reading a single record from a cold cache would be the same.
If you are reading the entire table, SQL Server actually reads pages in blocks and implements some look-ahead (also called read-ahead or prefetching). If you are doing complex processing, you might not even notice the additional I/O time, because I/O operations can run in parallel with computing.
For other operations such as deletes and updates, locking is sometimes done at the page level. In these cases, sparser pages can be associated with better performance.

Postgres performance improvement and checklist

I'm studying a series of performance issues in my application written in Java, which gets about 100,000 hits per day, with each visit performing on average 5 to 10 reads/writes on the 2 principal database tables (split equally), each with a cardinality between 1 and 3 million records (I access the DB via Hibernate).
My two main tables store user information (about 60 columns of type varchar, integer, and timestamptz) and the data to be displayed (about 30 columns, mainly varchar, integer, and timestamptz).
The main problem, which may have caused a drop in my site's performance (we're talking about load times over 5 seconds, which obviously does not depend only on database performance), is the fill factor, which is currently at its default value of 100 (appropriate only when data never changes).
Obviously the fill factor is the same on the indexes (there are 10 of type btree on each of the 2 tables).
Currently on my main tables I perform:
40% select operations
30% update operations
20% insert operations
10% delete operations.
My database is also made up of 40 other tables of minor importance (just 3 others have the same cardinality as the user table).
My questions are:
How do you find the right value for the fill factor?
What would be a checklist of things to verify in order to improve the
performance of a database of this kind?
The database is on a dedicated server (16 GB RAM, 8 cores) and storage is on SSD (data is backed up daily and moved to another storage system).
You have likely hit the "knee" of your memory usage where the entire index of the heavily used tables no longer fits in shared memory, so disk I/O is slowing it down. Confirm by checking if disk I/O is higher than normal. If so, try increasing shared memory (shared_buffers), or if that's already maxed, adjust the system shared memory size or add more system memory so you can bump it higher. You'll also probably have to start adjusting temp buffers, work memory and maintenance memory, and WAL parameters like checkpoint_segments, etc.
There are some perf tuning hints on PostgreSQL.org, and Google is your friend.
Edit (to address the first comment): The first symptom of not enough memory is a big drop in performance, everything else being the same. Changing the table fill factor is not going to make a difference if you have hit a knee in memory usage; if anything it will make load times (by which I assume you mean "db reads") worse, because row information will be spread across more pages on disk with blank space in each page, so more disk I/O is needed for table scans. A fill factor below 100% can help with UPDATE operations, but I've found that adjusting the WAL parameters can compensate most of the time when using indexes (unless you've already optimized those). Bottom line: you need to profile all the heavy queries using EXPLAIN to see what will help. At first glance, though, I'm pretty certain this is a memory issue even with the database on an SSD. We're talking about a lot of random reads and random writes, and many SSDs actually perform worse than HDDs after a lot of small random writes.
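To make the fill factor trade-off concrete: lowering it simply reserves free space on each 8 KB heap page for in-page (HOT) updates, at the cost of spreading rows over more pages, which is the extra-I/O effect described above. A small sketch of the space reserved per page:

```python
PAGE_SIZE = 8192  # default PostgreSQL page size in bytes

def free_space_per_page(fillfactor: int) -> int:
    """Bytes left unfilled on each page at load time for future updates."""
    return PAGE_SIZE * (100 - fillfactor) // 100

for ff in (100, 90, 70):
    print(ff, free_space_per_page(ff))  # 100 -> 0, 90 -> 819, 70 -> 2457
```

With a 30% update workload like the one described, a moderate value (e.g. 90) is a common starting point, but as the answer says, only profiling with EXPLAIN will show whether it helps in this case.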

SSIS DataFlowTask DefaultBufferSize and DefaultBufferMaxRows

I have a task which pulls records from an Oracle DB into our SQL Server using a data flow task. This package runs every day for around 45 minutes and refreshes about 15 tables. All but one are incremental updates, so almost every task runs in 2 to 10 minutes.
The one package that does a full replacement runs for up to 25 minutes, and I want to tune this data flow task to run faster.
There are just 400k rows in the table. I have read some articles about DefaultBufferSize and DefaultBufferMaxRows, and I have the following doubts:
If I can set DefaultBufferSize up to 100 MB, is there anywhere to look, or any way to analyze, how much I can provide?
DefaultBufferMaxRows is set to 10k. If I set it to 50k but the 10 MB I gave DefaultBufferSize can only hold some 20k rows, what will SSIS do? Ignore the other 30k records, or still pull all 50k records (spooling)?
Can I use the logging options to set proper limits?
As a general practice (and if you have enough memory), a smaller number of large buffers is better than a larger number of small buffers, BUT only up to the point where you start paging to disk (which is bad for obvious reasons).
To test it, you can log the event BufferSizeTuning, which will show you how many rows are in each buffer.
Also, before you begin adjusting the sizing of the buffers, the most important improvement that you can make is to reduce the size of each row of data by removing unneeded columns and by configuring data types appropriately.
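On the spooling question: SSIS does not discard rows. If the size cap is hit before the row cap, it simply shrinks the rows per buffer and allocates more buffers until all rows have flowed through. A sketch with an assumed 500-byte estimated row size (an illustration, not a value from the question):

```python
import math

TOTAL_ROWS = 400_000
BUFFER_MAX_SIZE = 10 * 1024 * 1024   # 10 MB DefaultBufferSize
BUFFER_MAX_ROWS = 50_000             # the DefaultBufferMaxRows you are considering
EST_ROW_SIZE = 500                   # assumed estimated row size in bytes

# The size cap wins here, so each buffer carries fewer than 50k rows,
# and SSIS just uses more buffers; no rows are ignored.
rows_per_buf = min(BUFFER_MAX_ROWS, BUFFER_MAX_SIZE // EST_ROW_SIZE)
buffers_needed = math.ceil(TOTAL_ROWS / rows_per_buf)
print(rows_per_buf, buffers_needed)  # 20971 rows per buffer, 20 buffers
```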

Is altering the Page Size in SQL Server the best option for handling "Wide" Tables?

I have multiple tables in my application that are both very wide and very tall. The width comes from having sometimes 10-20 columns with a variety of datatypes: varchar/nvarchar as well as char/bigint/int/decimal. My understanding is that the default page size in SQL Server is 8 KB but can be manually changed, and that varchar/nvarchar columns are exempt from this restriction and are often (always?) moved to a separate location, a process called row overflow. Even so, the MS documentation states that row-overflow data degrades performance: "querying and performing other select operations, such as sorts or joins on large records that contain row-overflow data slows processing time, because these records are processed synchronously instead of asynchronously"
They recommend moving large columns into joinable metadata tables, which "can then be queried in an asynchronous JOIN operation".
My question is: is it worth enlarging the page size to accommodate the wide columns, and are there other performance problems that'd come up? If instead I partitioned the table into 1 or more metadata tables and the tables got "big", as in the 100MM-record range, wouldn't the cost of joining the partitioned tables far outweigh the benefits? Also, if SQL Server is on a single-core machine (or on SQL Azure), my understanding is that parallelism is disabled, so would that also eliminate the benefit of splitting the tables, given that the join would no longer be asynchronous? Any other strategies you'd recommend?
EDIT: Per the great comments below and some additional reading (that I should've done originally), you cannot manually alter the SQL Server page size. Also, related SO post: How do we change the page size of SQL Server?. There is an additional great answer there from @remus-rusanu.
You cannot change the page size.
varchar(x) and (MAX) are moved off-row when necessary - that is, there isn't enough space on the page itself. If you have lots of large values it may indeed be more effective to move them into other tables and then join them onto the base table - especially if you're not always querying for that data.
There is no concept of reading that off-row data synchronously or asynchronously. When you execute a query, it runs synchronously. You may have parallelization, but that is a completely different thing, and it is not affected in this case.
Edit: To give you more practical advice you'll need to show us your schema and some realistic data characteristics.
My understanding is that the default page size in SQL is 8k, but can
be manually changed
The 'large pages' setting refers to memory allocations, not to changing the database page size. See SQL Server and Large Pages Explained. I'm afraid your understanding is a little off.
As general, non-specific advice: for wide fixed-length columns the best strategy is to deploy row compression, and for nvarchar columns Unicode compression can help a lot. For specific advice, you need to measure. What is the exact performance problem you encountered? How did you measure it? Did you use a methodology like Waits and Queues to identify the bottlenecks, and are you positive that row size and off-row storage are the issue? It seems to me that you used the other 'methodology'...
You can't change the default 8 KB page size.
varchar and nvarchar are treated like any other field unless they are (max), in which case they are stored a bit differently, because they can exceed the size of a page, which no other datatype can do.
For example, if you try to execute this statement:
create table test_varchars(
    a varchar(8000),
    b varchar(8001),
    c nvarchar(4000),
    d nvarchar(4001)
)
Columns a and c are fine because both of them are at most 8,000 bytes long.
But, you would get the following errors on columns b and d:
The size (8001) given to the column 'b' exceeds the maximum allowed for any data type (8000).
The size (4001) given to the parameter 'd' exceeds the maximum allowed (4000).
because both of them exceed the 8,000-byte limit. (Remember that the n in front of varchar or char means Unicode, which occupies double the space.)
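The limits in those error messages come down to bytes per value: a plain varchar character takes 1 byte, an nvarchar (Unicode) character takes 2, and a single non-(max) string column must fit within 8,000 bytes. A small sketch of the same check the engine performs (the function name is illustrative, not a SQL Server API):

```python
IN_ROW_LIMIT = 8000   # max bytes for a single non-(max) string column

def column_fits(declared_length: int, unicode: bool) -> bool:
    """True if a varchar/nvarchar of this declared length stays within 8,000 bytes."""
    bytes_per_char = 2 if unicode else 1
    return declared_length * bytes_per_char <= IN_ROW_LIMIT

print(column_fits(8000, unicode=False))  # column a: True
print(column_fits(8001, unicode=False))  # column b: False
print(column_fits(4000, unicode=True))   # column c: True
print(column_fits(4001, unicode=True))   # column d: False
```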