I have an SSAS cube that has a dimension with about 500,000 members.
Performance is surprisingly good until I added a default member to one of the attributes (a boolean Yes/No value that we default to Yes).
This change makes a refresh go from 5 seconds to over 20 minutes.
Are default values known to be bad for performance? I can't really see anything on Google suggesting that they are bad performers.
For context, when refreshing a PivotTable, Excel was executing a heap of WITH MEMBER statements, with each taking about 3.5 seconds. Removing the default member brought that down to 40ms.
I'm performing an update on a DB that writes a 15-digit number into a single column across 270,000,000 rows. I expected the space required to be around 4GB, but the update is still running and the transaction log has just hit 180GB.
Transactions have to store a lot of information just in case the changes need to be rolled back.
There needs to be a sequential value to know which order the records were updated/inserted. It needs to store the original value for the column (some RDBMSs might even store the whole row!). It needs a unique identifier to tie the data back to the row's location.
It has to store so much data because if something catastrophic happens -- like the database crashing -- it needs to be able to return to a consistent state.
Yes, 15 digits * 270 mil may come out to 4 GB, but that completely ignores all of the very important metadata required.
If this is a one-off update that doesn't need to be repeated, it may be faster to simply recreate the table with the column updated. Compared to inserts/updates/deletes, table creates from selects require almost no transaction logging.
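For a one-off rebuild, it might look something like the sketch below; the table, column, and value names are placeholders rather than the poster's actual schema, and SELECT ... INTO is only minimally logged under the SIMPLE or BULK_LOGGED recovery model.

    -- Sketch only: rebuild the table with the new value instead of updating in place.
    -- All object names here are invented for illustration.
    SELECT col1,
           col2,
           CAST(123456789012345 AS bigint) AS updated_col  -- the new 15-digit value
    INTO   dbo.BigTable_new
    FROM   dbo.BigTable;

    -- Recreate indexes/constraints on the new table, verify the data, then swap names:
    -- EXEC sp_rename 'dbo.BigTable', 'BigTable_old';
    -- EXEC sp_rename 'dbo.BigTable_new', 'BigTable';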
Probably all pages are splitting because of the significant amount of data added (4/180 ≈ 2.2%; that might not seem like much, but it is probably enough to push many nearly-full pages over the edge).
Rebuild the clustered index with a fillfactor (probably 90 is enough). Then, you will not have any page splits when updating.
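A minimal sketch of that rebuild, assuming placeholder index and table names; FILLFACTOR = 90 leaves 10% of each leaf page free for rows to grow into.

    -- Placeholder names; adjust the fill factor to taste.
    ALTER INDEX PK_BigTable ON dbo.BigTable
    REBUILD WITH (FILLFACTOR = 90);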
If this does not help we need to dig deeper.
In any case there will be significant log growth, and it will certainly be more than 4GB. 180GB sounds like too much, though; that sounds like whole pages are being logged.
I'm studying a series of performance issues in my application written in Java, which gets about 100,000 hits per day; each visit performs on average 5 to 10 reads/writes (split roughly equally) on the 2 principal database tables, both of which have a cardinality between 1 and 3 million records (I access the DB via Hibernate).
My two main tables are one storing user information (about 60 columns of type varchar, integer and timestamptz) and another holding the data to be displayed (about 30 columns, mainly varchar, integer and timestamptz).
The main suspect I've identified for the drop in my site's performance (we're talking about load times over 5 seconds, which obviously does not depend only on database performance) is the fill factor, which is currently at the default value of 100 (the value normally used when data never changes).
Obviously the fill factor is the same on the indexes (each of the 2 tables has 10 b-tree indexes).
Currently, the operation mix on my main tables is:
40% select operations
30% update operations
20% insert operations
10% delete operations.
My database is also made up of 40 other tables of minor importance (only 3 others have the same cardinality as the user table).
My questions are:
How do you find the right fill factor value to set?
What would a checklist of things to check look like for improving the performance of a database of this kind?
The database is on a dedicated server (16GB RAM, 8 cores) and storage is on SSD (data is backed up daily and moved to other storage).
You have likely hit the "knee" of your memory usage where the entire index of the heavily used tables no longer fits in shared memory, so disk I/O is slowing it down. Confirm by checking if disk I/O is higher than normal. If so, try increasing shared memory (shared_buffers), or if that's already maxed, adjust the system shared memory size or add more system memory so you can bump it higher. You'll also probably have to start adjusting temp buffers, work memory and maintenance memory, and WAL parameters like checkpoint_segments, etc.
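One way to confirm that is to compare buffer-cache hits against disk reads for the hot tables and check the current memory settings. This is only a sketch: the table names are placeholders and the ~0.99 figure is just a rule of thumb.

    -- Buffer cache hit ratio for the two hot tables; ratios well below ~0.99
    -- suggest the working set no longer fits in shared_buffers.
    SELECT relname,
           heap_blks_read,
           heap_blks_hit,
           round(heap_blks_hit::numeric / nullif(heap_blks_hit + heap_blks_read, 0), 4) AS hit_ratio
    FROM   pg_statio_user_tables
    WHERE  relname IN ('app_user', 'display_data')   -- placeholder table names
    ORDER  BY heap_blks_read DESC;

    SHOW shared_buffers;
    SHOW work_mem;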
There are some perf tuning hints on PostgreSQL.org, and Google is your friend.
Edit: (to address the first comment) The first symptom of not enough memory is a big drop in performance, everything else being the same. Changing the table fill factor is not going to make a difference if you have hit a knee in memory usage; if anything it will make things worse w.r.t. load times (which I assume means "db reads"), because row data will be spread across more pages on disk with blank space in each page, so more disk I/O is needed for table scans. A fill factor below 100% can help with UPDATE operations, but I've found that adjusting the WAL parameters can compensate most of the time when using indexes (unless you've already optimized those). Bottom line: you need to profile all the heavy queries using EXPLAIN to see what will help. But at first glance, I'm pretty certain this is a memory issue, even with the database on an SSD. We're talking about a lot of random reads and random writes, and a lot of SSDs actually get worse than HDDs after a lot of small random writes.
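If you do experiment with the fill factor on the UPDATE-heavy tables, it might look like the sketch below; the object names and the profiled query are placeholders, and 90 is just a starting value to test. The table setting only applies to pages written after the change, while the index picks it up on the next rebuild.

    -- Placeholder object names; leave ~10% free space per page so updated rows can stay put.
    ALTER TABLE app_user SET (fillfactor = 90);
    ALTER INDEX app_user_email_idx SET (fillfactor = 90);
    REINDEX INDEX app_user_email_idx;   -- rebuild so the index honours the new setting

    -- Then profile the heavy statements, as suggested above (the query is a placeholder):
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM app_user WHERE id = 42;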
Take a look at this execution plan: http://sdrv.ms/1agLg7K
It’s not estimated, it’s actual. From an actual execution that took roughly 30 minutes.
Select the second statement (takes 47.8% of the total execution time – roughly 15 minutes).
Look at the top operation in that statement – View Clustered Index Seek over _Security_Tuple4.
The operation costs 51.2% of the statement – roughly 7 minutes.
The view contains about 0.5M rows (for reference, log2(0.5M) ≈ 19 – a mere 19 steps even if each index tree node had a fan-out of only two; in reality the fan-out is much higher, so it should take even fewer).
The result of that operator is zero rows (doesn’t match the estimate, but never mind that for now).
Actual executions – zero.
So the question is: how the bleep could that take seven minutes?! (and of course, how do I fix it?)
EDIT: Some clarification on what I'm asking here.
I am not interested in general performance-related advice, such as "look at indexes", "look at sizes", "parameter sniffing", "different execution plans for different data", etc.
I know all that already, I can do all that kind of analysis myself.
What I really need is to know what could cause that one particular clustered index seek to be so slow, and then what could I do to speed it up.
Not the whole query.
Not any part of the query.
Just that one particular index seek.
END EDIT
Also note how the second and third most expensive operations are seeks over _Security_Tuple3 and _Security_Tuple2 respectively, and they only take 7.5% and 3.7% of the time. Meanwhile, _Security_Tuple3 contains roughly 2.8M rows, nearly six times that of _Security_Tuple4.
Also, some background:
This is the only database from this project that misbehaves.
There are a couple dozen other databases of the same schema, none of them exhibit this problem.
The first time this problem was discovered, it turned out that the indexes were 99% fragmented.
Rebuilding the indexes did speed it up, but not significantly: the whole query took 45 minutes before rebuild and 30 minutes after.
While playing with the database, I have noticed that simple queries like “select count(*) from _Security_Tuple4” take several minutes. WTF?!
However, they only took several minutes on the first run, and after that they were instant.
The problem is not connected to the particular server, neither to the particular SQL Server instance: if I back up the database and then restore it on another computer, the behavior remains the same.
First I'd like to point out a little misconception here: although the delete statement is said to take nearly 48% of the entire execution, this does not have to mean it takes 48% of the time needed; those percentages are based on the optimizer's cost estimates, not on measured durations. In particular, the 51% assigned inside that part of the query plan most definitely should NOT be interpreted as taking 'half of the time' of the entire operation!
Anyway, going by your remark that it takes a couple of minutes to do a COUNT(*) of the table 'the first time', I'm inclined to say that you have an IO issue related to said table/view. Personally I don't like materialized views very much, so I have no real experience with how they behave internally, but normally I would suggest that fragmentation is taking its toll on the underlying storage system. The reason it works fast the second time is that it's much faster to access the pages from the cache than it was to fetch them from disk, especially when they are scattered all over the place. (Are there any (max) fields in the view?)
Anyway, to find out what is taking so long, I'd suggest you take this code out of the trigger it's currently in, 'fake' an inserted and deleted table, and then try running the queries again, adding timestamps and/or using a program like SQL Sentry Plan Explorer to see how long each part REALLY takes (it has a duration column when you run a script from within the program).
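A rough sketch of that approach (every object name and predicate below is a placeholder, not taken from the actual trigger):

    -- Capture representative rows into stand-ins for the trigger's inserted/deleted tables.
    SELECT * INTO #inserted FROM dbo.SomeTable WHERE Id BETWEEN 1    AND 1000;
    SELECT * INTO #deleted  FROM dbo.SomeTable WHERE Id BETWEEN 1001 AND 2000;

    DECLARE @t0 datetime2 = SYSDATETIME();

    -- Paste one statement from the trigger body here, reading from #inserted/#deleted
    -- instead of the real inserted/deleted tables.

    SELECT DATEDIFF(millisecond, @t0, SYSDATETIME()) AS elapsed_ms;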
It might well be that you're looking at the wrong part; experience shows that cost and actual execution times are not always as related as we'd like to think.
Observations include:
Is this the biggest of these databases that you are working with? If so, size matters to the optimizer. It will make quite a different plan for large datasets versus smaller data sets.
The estimated rows and the actual rows are quite divergent. This is most apparent in the fourth query, "delete c from #alternativeRoutes....", where the seek on _Security_Tuple5 is estimated to return 16 rows but actually processed 235,904 rows. For that many rows an Index Scan could be more performant than Index Seeks. Are the statistics on the table up to date, or do they need to be updated? (A sketch for refreshing them follows these observations.)
The "select count(*) from _Security_Tuple4" takes several minutes, the first time. The second time is instant. This is because the data is all now cached in memory (until it ages out) and the second query is fast.
Because the problem moves with the database, whatever is causing it (the statistics, any missing indexes, et cetera) is in the database itself. I would also suggest checking that the indexes match those in the other databases using the same schema.
This is not a full analysis, but it gives you some things to look at.
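As mentioned in the observation about divergent estimates, refreshing the statistics might look like this; it's only a sketch, the dbo schema is assumed, and WITH FULLSCAN is the thorough-but-slower option:

    -- Refresh statistics on the objects with the worst estimate-vs-actual gaps.
    UPDATE STATISTICS dbo._Security_Tuple4 WITH FULLSCAN;
    UPDATE STATISTICS dbo._Security_Tuple5 WITH FULLSCAN;
    -- or, more broadly, refresh whatever is out of date across the database:
    EXEC sp_updatestats;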
Fyodor,
First:
The problem is not connected to the particular server, neither to the particular SQL Server instance: if I back up the database and then restore it on another computer, the behavior remains the same.
I presume that: a) you run this query in an isolated environment, and b) the data is not being mutated.
Is this correct?
Second: post here your CREATE INDEX script. Do you have a funny FILLFACTOR? SORT_IN_TEMPDB?
Third: what type are your ParentId and ObjectId columns? int, smallint, uniqueidentifier, varchar?
I have a large table (~170 million rows, 2 nvarchar and 7 int columns) in SQL Server 2005 that is constantly being inserted into. Everything works ok with it from a performance perspective, but every once in a while I have to update a set of rows in the table which causes problems. It works fine if I update a small set of data, but if I have to update a set of 40,000 records or so it takes around 3 minutes and blocks on the table which causes problems since the inserts start failing.
If I just run a select to get back the data that needs to be updated I get back the 40k records in about 2 seconds. It's just the updates that take forever. This is reflected in the execution plan for the update, where the clustered index update takes up 90% of the cost and the index seek and top operator to get the rows take up 10% of the cost. The column I'm updating is not part of any index key, so it's not like it's reorganizing anything.
Does anyone have any ideas on how this could be sped up? My thought now is to write a service that will just see when these updates have to happen, pull back the records that have to be updated, and then loop through and update them one by one. This will satisfy my business needs but it's another module to maintain and I would love if I could fix this from just a DBA side of things.
Thanks for any thoughts!
Actually it might reorganise pages if you update the nvarchar columns.
Depending on what the update does to these columns they might cause the record to grow bigger than the space reserved for it before the update.
(See an explanation of how nvarchar is stored at http://www.databasejournal.com/features/mssql/physical-database-design-consideration.html.)
So say a record has a string of 20 characters saved in the nvarchar column - this takes 20*2 + 2 bytes of space (2 for the pointer). This is written at the initial insert into your table (based on the index structure). SQL Server will only use as much space as your nvarchar really needs.
Now comes the update and inserts a string of 40 characters. And oops, the space for the record within your leaf structure of your index is suddenly too small. So off goes the record to a different physical place with a pointer in the old place pointing to the actual place of the updated record.
This then causes your index to go stale and because the whole physical structure requires changing you see a lot of index work going on behind the scenes. Very likely causing an exclusive table lock escalation.
Not sure how best to deal with this. Personally if possible I take an exclusive table lock, drop the index, do the updates, reindex. Because your updates sometimes cause the index to go stale this might be the fastest option. However this requires a maintenance window.
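A hedged sketch of that maintenance-window approach; the index, table, and column names are all placeholders, and the exclusive-lock hint is just one way to keep other sessions out for the duration.

    BEGIN TRANSACTION;

    -- Take and hold an exclusive table lock for the whole transaction.
    SELECT TOP (1) 1 FROM dbo.BigTable WITH (TABLOCKX, HOLDLOCK);

    DROP INDEX IX_BigTable_SomeColumn ON dbo.BigTable;

    UPDATE dbo.BigTable
    SET    SomeNvarcharColumn = N'new, longer value'
    WHERE  NeedsUpdate = 1;

    CREATE INDEX IX_BigTable_SomeColumn ON dbo.BigTable (SomeColumn);

    COMMIT;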
You should batch up your update into several updates (say 10000 at a time, TEST!) rather than one large one of 40k rows.
This way you will avoid a table lock; SQL Server will only take out about 5000 locks (page or row) before escalating to a table lock, and even this is not very predictable (memory pressure etc.). Smaller updates made in this fashion will at least avoid the concurrency issues you are experiencing.
You can batch the updates using a service or firehose cursor.
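A minimal sketch of the batching loop, assuming placeholder table/column names and a placeholder predicate for the rows to update; treat the batch size as something to test (10000 is the suggestion above, while staying under roughly 5000 row locks per statement is what avoids escalation):

    DECLARE @batch int = 10000, @rows int = 1;

    WHILE @rows > 0
    BEGIN
        UPDATE TOP (@batch) t
        SET    t.SomeFlag = 1
        FROM   dbo.BigTable AS t
        WHERE  t.SomeFlag = 0
          AND  t.NeedsUpdate = 1;   -- placeholder predicate selecting the 40k rows

        SET @rows = @@ROWCOUNT;
    END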
Read this for more info:
http://msdn.microsoft.com/en-us/library/ms184286.aspx
Hope this helps
Robert
The most brute-force (and simplest) way is to have a basic service, as you mentioned. That has the advantage of being able to scale with the load on the server and/or the data load.
For example, if you have a set of updates that must happen ASAP, then you could turn up the batch size. Conversely, for less important updates, you could have the update "server" slow down if each update is taking "too long" to relieve some of the pressure on the DB.
This sort of "heartbeat" process is rather common in systems and can be very powerful in the right situations.
It's weird that your analyzer says the time is spent updating the clustered index. Did the size of the data change when you updated? It seems like the varchar is driving the data to be reorganized, which might require updates to index pointers (as KMB has already pointed out). In that case you might want to increase the percentage of free space on the data and index pages so that both can grow without relinking/reallocation. Since an update is an IO-intensive operation (unlike a read, which can be buffered), the performance also depends on several factors:
1) Are your tables partitioned by date?
2) Does the entire table lie on the same SAN disk (or is the SAN striped well)?
3) How verbose is the transaction logging? Can the transaction log buffer size be increased to support larger writes to the log for massive inserts?
It's also important which API/language you are using. E.g., JDBC supports a batch update feature, which makes the updates a bit more efficient if you are doing multiple updates.
Is there any difference in the performance of read and write operations in SQL? Using LINQ to SQL in an ASP.NET MVC application, I often update many values in one of my tables in single posts (during this process, many posts of this type will come in rapidly from the user, although the user is unable to submit new data until the previous update is complete). My current implementation is to loop through the input (a list of the current values for each row) and write them to the field (a nullable int). I wonder if there would be any performance difference if, instead, I read the current db value and only wrote if it had changed. Most of these operations change the values for roughly 1/4 to 2/3 of the rows, some change fewer, and few change more than 2/3 of the rows.
I don't know much about the comparative speeds of these operations (or if there is even any difference). Is there any benefit to be gained from doing this? If so, what table sizes would benefit the most/not benefit at all, and would there be any percentage of the rows changing that would be a threshold for this improvement?
It's always faster to read.
A write is actually always a read followed by a write.
SQL needs to know which row to write to, which involves reading either an index or the table itself in a seek or scan operation, then writing to the appropriate row.
Writing also needs to update any applicable indexes. Depending on the circumstance, the index may get "updated" even when the data doesn't change.
As a very general rule, it's a good idea only to modify the data that needs to be changed.
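One way to apply that rule on the SQL side is sketched below; the table, column, and variable names are invented, and the column is assumed to be a nullable int as in the question. The update itself skips rows whose stored value already matches, so unchanged rows get no write, no index maintenance, and no log record.

    DECLARE @id int = 42, @newValue int = 7;   -- placeholder inputs

    UPDATE dbo.Scores
    SET    CurrentValue = @newValue
    WHERE  Id = @id
      AND (CurrentValue <> @newValue
           OR (CurrentValue IS NULL AND @newValue IS NOT NULL)
           OR (CurrentValue IS NOT NULL AND @newValue IS NULL));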