I have a situation where I have 3 partitions in SSAS (BIDS 2008) for different years. I need to know which partition is used in the current context and why, and how to change it manually.
For example, I have three partitions: P2001, P2002 and P2001-2002, and a user queries for sales in 2002.
In this case, which partition comes into play, and why only that one? How can I change this? I want to use P2001 when the user queries for sales in 2002 (it makes no sense logically, but it will clarify my doubts).
I hope I made sense in elaborating my idea.
Thanks in advance.
First of all, your partitions should not have overlapping data. SSAS will read the overlapping data twice (or as many times as the number of partitions the data appears in). You do not control which partition is read: SSAS knows which partition each key is in, so it simply reads that partition when running a query.
You can use SQL Server Profiler to look at the queries being run and see which partitions are being read.
To be able to query without any cached data (so you can be sure which partitions are being read), you can run this XMLA to clear the cache for your cube, then run your queries again:
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <ClearCache>
    <Object>
      <DatabaseID> database id </DatabaseID>
      <CubeID> cube id </CubeID>
    </Object>
  </ClearCache>
</Batch>
Related
I'm working with MS SQL Server 2008, and I'd like to produce some statistics every week / month for the users who connect to this server (how many times they connected, which table was most popular, when the traffic peaked, etc.).
I cannot find anything about weekly / monthly statistics for SQL users.
I'll be glad if someone can help me. Thanks a lot.
If you're looking for general table access statistics, the sys.dm_db_index_usage_stats view is a great place to start. For every table and index in the database that's been accessed, there will be a row in that view with stats on the number of seeks, scans, and lookups it received, and when it was last accessed. You could set up a SQL Agent job to run every few minutes, taking a snapshot of this entire view, and then graph the results over time to show the rate at which each table/index in the database is accessed.
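For example, a bare-bones version of that snapshot step might look like this (dbo.IndexUsageSnapshot is just a made-up name for a table you'd create first):

INSERT INTO dbo.IndexUsageSnapshot
        (CapturedAt, DatabaseID, ObjectID, IndexID, UserSeeks, UserScans, UserLookups, LastUserSeek)
SELECT  GETDATE(), database_id, object_id, index_id,
        user_seeks, user_scans, user_lookups, last_user_seek
FROM    sys.dm_db_index_usage_stats
WHERE   database_id = DB_ID();   -- keep only rows for the current database

Graphing the deltas between successive snapshots then shows how heavily each table or index is used week by week or month by month.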
I did a quick write-up on that view a while ago at http://trycatchfinally.net/2010/01/finding-unused-tables-in-sql-server-2005-and-2008/, and it's pretty powerful - though the example I use helps identify indexes or tables that aren't being used, you could flip it around to show which ones are used most often.
You can create a SQL Agent job on a monthly schedule that executes the command below:
EXEC sp_updatestats
I am trying to optimize the search query which is the most used in our system. So far I have added some missing indexes and that has helped slightly. But I want to further reduce the load on the db server. One option that I will use is caching the result set as a LIST in the asp.net Cache so that I don't have to hit the db often.
However, I was wondering if there is a way to cache some portions of the select query at the db as well. E.g. for the search results we consider only users who have been active in the last 180 days and who have share-info set to true. So this is like a super set which the db processes every time before applying the other conditions that are passed in, such as the specified category, city, etc. Is it possible to somehow cache the super set so that I can run queries against the super set rather than against the whole table? Will creating a view help with this? I am a bit hesitant to create a view as I read that managing views can be an overhead and takes away some flexibility to modify the tables.
I am using SQL Server 2005, so I cannot create a filtered index on the table, which I think would have been helpful.
I agree with @Neville K. SQL Server is pretty smart about caching data in memory. You might see limited or no performance gains for your effort.
You could consider indexed views (Enterprise Edition only) for your sub-query: http://technet.microsoft.com/en-us/library/cc917715.aspx
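As a rough sketch only (dbo.Users, ShareInfo and the other names are made up, not your actual schema), an indexed view over the share-info superset might look like this. Note that the 180-day filter can't go into an indexed view because GETDATE() is non-deterministic, so only the ShareInfo condition gets materialized:

CREATE VIEW dbo.vw_SharedUsers
WITH SCHEMABINDING            -- required for an indexed view
AS
SELECT UserID, City, Category, LastActiveDate
FROM dbo.Users
WHERE ShareInfo = 1;
GO
-- Materialize the view by giving it a unique clustered index.
CREATE UNIQUE CLUSTERED INDEX IX_vw_SharedUsers ON dbo.vw_SharedUsers (UserID);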
It is, of course, possible to do this - but I'm not sure if it will help.
You can create a scheduled job - once a night, perhaps - which populates a table called "active_users_with_share_info" by truncating it and then repopulating it with a select query that keeps only users active in the last 180 days with "share_info = true".
Then you can join your search query to this table.
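A bare-bones version of that nightly step might look something like this (dbo.Users, ShareInfo, LastActiveDate and the column list are made-up names, not your actual schema):

TRUNCATE TABLE dbo.active_users_with_share_info;

INSERT INTO dbo.active_users_with_share_info (UserID, City, Category, LastActiveDate)
SELECT UserID, City, Category, LastActiveDate
FROM dbo.Users
WHERE ShareInfo = 1
  AND LastActiveDate >= DATEADD(DAY, -180, GETDATE());   -- active in the last 180 days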
However, I doubt this would do much good - SQL Server is pretty smart at caching. Unless you're dealing with huge volumes of data (hundreds of millions of records) or very limited hardware, I doubt you'd get any measurable performance improvements - but by all means try it!
Of course, the price for this would be more moving parts in your application, more interesting failure modes (what happens if the overnight batch fails silently?), and more training for any new developers you bring into the team.
There is a table which has 80,000 rows.
Every day I will clone this table to another log table, giving it a name like 20101129_TABLE,
with the prefix changed each day according to the date.
As you can calculate, that will be about 2,400,000 rows every month.
Please advise on how to save space, keep the service fast, and the other advantages and disadvantages. How should I go about designing the best archive or log?
It is a table that holds account info: branch code, balance, etc.
It is quite tricky to answer your question since you are a bit vague on some important facts:
How often do you need the archived tables?
How free are you in your design-choices?
If you don't need the archived data often and you are free in your design, I'd copy the data into an archive database. That will give you the option of storing the database on a separate disk (cost efficiency), and you can have a separate backup schedule for that database as well.
You could also store all the data in one table with just an additional column like ArchiveDate datetime. But I think this really depends on how you plan on accessing the data later.
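A rough sketch combining the two ideas (an archive database plus an ArchiveDate column); ArchiveDB, dbo.Accounts and the column names are just placeholders:

INSERT INTO ArchiveDB.dbo.AccountsArchive (BranchCode, AccountNo, Balance, ArchiveDate)
SELECT BranchCode, AccountNo, Balance, GETDATE()   -- stamp each copied row with the archive date
FROM dbo.Accounts;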
Consider TABLE PARTITIONING (MSDN) - it is designed for exactly this kind of scenario. Not only can you spread data across partitions (and map partitions to different disks), you can keep all the data in the same table and let MSSQL do all the hard work in the background (which partition to use based on the select criteria, etc.).
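For illustration only, a rough sketch of monthly partitioning (names and boundary dates are made up, and table partitioning needs Enterprise Edition):

CREATE PARTITION FUNCTION pfArchiveByMonth (DATETIME)
AS RANGE RIGHT FOR VALUES ('2010-11-01', '2010-12-01', '2011-01-01');

CREATE PARTITION SCHEME psArchiveByMonth
AS PARTITION pfArchiveByMonth ALL TO ([PRIMARY]);   -- or map each partition to its own filegroup/disk

CREATE TABLE dbo.AccountsLog
(
    BranchCode INT      NOT NULL,
    AccountNo  INT      NOT NULL,
    Balance    MONEY    NOT NULL,
    LogDate    DATETIME NOT NULL
) ON psArchiveByMonth (LogDate);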
I have a postgres database with several million rows, which drives a web app. The data is static: users don't write to it.
I would like to be able to offer users query-able aggregates (e.g. the sum of all rows with a certain foreign key value), but the size of the database now means it takes 10-15 minutes to calculate such aggregates.
Should I:
1. start pre-calculating aggregates in the database (since the data is static), or
2. move away from Postgres and use something else?
The only problem with option 1 is that I don't necessarily know which aggregates users will want, and it will obviously increase the size of the database even further.
If there was a better solution than postgres for such problems, then I'd be very grateful for any suggestions.
You are trying to solve an OLAP (On-Line Analytical Processing) problem with an OLTP (On-Line Transactional Processing) database structure.
You should build another set of tables that store just the aggregates and update those tables in the middle of the night. That way your customers can query the aggregate tables and it won't interfere with the on-line transaction processing system at all.
The only caveat is that the aggregate data will always be one day behind.
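A bare-bones sketch of such a nightly-built aggregate table in Postgres (facts and fact_totals are made-up names, not your schema):

-- Built once, then rebuilt from a nightly cron job:
CREATE TABLE fact_totals AS
SELECT customer_id, SUM(amount) AS total_amount, COUNT(*) AS row_count
FROM facts
GROUP BY customer_id;

-- Nightly refresh:
TRUNCATE fact_totals;
INSERT INTO fact_totals
SELECT customer_id, SUM(amount), COUNT(*)
FROM facts
GROUP BY customer_id;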
1. Yes.
2. Possibly. Presumably there are a whole heap of things you would need to consider before changing your RDBMS. If you moved to SQL Server, you would use indexed views to accomplish this: Improving Performance with SQL Server 2008 Indexed Views
If you store the aggregates in an intermediate object (something like MyAggregatedResult), you could consider a caching proxy:
class ResultsProxy {
    private final Map<CacheKey, Result> cache = new ConcurrentHashMap<>();

    Result calculateResult(Param param1, Param param2) {
        // retrieve from cache; if not found, calculate and store in cache
        return cache.computeIfAbsent(new CacheKey(param1, param2),
                key -> calculateFromDatabase(param1, param2));
    }
}
There are quite a few caching frameworks for Java, and most likely for other languages/environments such as .NET as well. These solutions can take care of invalidation (how long a result should be stored in memory) and memory management (removing old cache items when a memory limit is reached, etc.).
If you have a set of commonly-queried aggregates, it might be best to create an aggregate table that is maintained by triggers (or an observer pattern tied to your OR/M).
Example: say you're writing an accounting system. You keep all the debits and credits in a General Ledger table (GL). Such a table can quickly accumulate tens of millions of rows in a busy organization. To find the balance of a particular account on the balance sheet as of a given day, you would normally have to calculate the sum of all debits and credits to that account up to that date, a calculation that could take several seconds even with a properly indexed table. Calculating all figures of a balance sheet could take minutes.
Instead, you could define an account_balance table. For each account and dates or date ranges of interest (usually each month's end), you maintain a balance figure by using a trigger on the GL table to update balances by adding each delta individually to all applicable balances. This spreads the cost of aggregating these figures over each individual persistence to the database, which will likely reduce it to a negligible performance hit when saving, and will decrease the cost of getting the data from a massive linear operation to a near-constant one.
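Since the question mentions Postgres, here is a rough trigger sketch of that idea; gl(account_id, amount, posted_on) and account_balance(account_id, period_end, balance) are made-up names and a deliberately simplified schema:

CREATE OR REPLACE FUNCTION apply_gl_delta() RETURNS trigger AS $$
BEGIN
    -- Add the new debit/credit to every stored balance at or after the posting date.
    UPDATE account_balance
       SET balance = balance + NEW.amount
     WHERE account_id = NEW.account_id
       AND period_end >= NEW.posted_on;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER gl_after_insert
AFTER INSERT ON gl
FOR EACH ROW EXECUTE PROCEDURE apply_gl_delta();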
For that data volume you shouldn't have to move off Postgres.
I'd look at tuning first - 10-15 minutes seems pretty excessive for 'a few million rows'. This ought to take just a few seconds. Note that the out-of-the-box config settings for Postgres don't (or at least didn't) allocate much disk buffer memory. You might look at that as well.
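A quick way to see where the time goes (table and column names here are made up):

-- Shows whether the aggregate uses an index or scans the whole table.
EXPLAIN ANALYZE
SELECT SUM(amount)
FROM orders
WHERE customer_id = 42;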
More complex solutions involve implementing some sort of data mart or an OLAP front-end such as Mondrian over the database. The latter does pre-calculate aggregates and caches them.
If you have a set of common aggregates, you can calculate them beforehand (say, once a week) in a separate table and/or columns, and users get them fast.
But I'd pursue the tuning route too - revise your indexing strategy. As your database is read-only, you don't need to worry about index-update overhead.
Revise your database configuration as well; maybe you can squeeze some performance out of it - default configurations are normally targeted at easing the life of first-time users and quickly fall short on large databases.
Maybe even some denormalization can speed things up once you have revised your indexing and database configuration and still need more performance - but try that as a last resort.
Oracle supports a concept called Query Rewrite. The idea is this:
When you want a lookup (WHERE ID = val) to go faster, you add an index. You don't have to tell the optimizer to use the index - it just does. You don't have to change the query to read FROM the index... you hit the same table as you always did but now instead of reading every block in the table, it reads a few index blocks and knows where to go in the table.
Imagine if you could add something like that for aggregation. Something that the optimizer would just 'use' without being told to change. Let's say you have a table called DAILY_SALES for the last ten years. Some sales managers want monthly sales, some want quarterly, some want yearly.
You could maintain a bunch of extra tables that hold those aggregations and then tell the users to change their queries to use those tables. In Oracle, you'd instead build them as materialized views. You do no work except defining the MV and an MV log on the source table. Then if a user queries DAILY_SALES for a sum by month, Oracle will rewrite the query to use the appropriate level of aggregation. The key is that this happens WITHOUT changing the query at all.
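A rough sketch of that setup (column names are made up, and the exact fast-refresh requirements depend on your query):

-- MV log on the source table, so the MV can be refreshed incrementally.
CREATE MATERIALIZED VIEW LOG ON daily_sales
  WITH ROWID, SEQUENCE (sale_month, amount) INCLUDING NEW VALUES;

-- Aggregate MV that the optimizer can use transparently via query rewrite.
CREATE MATERIALIZED VIEW monthly_sales_mv
  REFRESH FAST ON COMMIT
  ENABLE QUERY REWRITE
AS
  SELECT sale_month,
         SUM(amount)   AS total_amount,
         COUNT(amount) AS amount_count,   -- needed for fast refresh of SUM
         COUNT(*)      AS row_count
  FROM daily_sales
  GROUP BY sale_month;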
Maybe other DBs support something like that... but this is clearly what you are looking for.
We have a warehouse database that contains a year of data up to now. I want to create a report database that represents the last 3 months of data for reporting purposes. I want to be able to keep the two databases in sync. Right now, every 10 minutes I execute a package that grabs the most recent rows from the warehouse and adds them to the report db. The problem is that I only get new rows, not updates to existing rows.
I would like to know what are the various ways of solving this scenario.
Thanks
Look into replication, mirroring, or log shipping.
If you are using SQL 2000 or below, replication is your best bet. Since you are doing this every ten minutes, you should definitely look at transactional replication.
If you are using SQL 2005 or greater, you have more options available to you. Database snapshots, log shipping, and mirroring as SQLMenace suggested above. The suitability of these vary depending on your hardware. You will have to do some research to pick the optimal one for your needs.
You should probably read about replication, or ask your DB admin about it.
Is it possible to add columns to this database? You could add a Last_Activity column to the DB and then write a trigger that updates the date/timestamp on that row to reflect the latest edit. For any new entries, the date/time would reflect the timestamp when the row was added.
This way, when you grab the last three months, you'd be grabbing the last three months' activity, not just the new stuff.
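A bare-bones sketch of that trigger on a hypothetical dbo.Orders table keyed by OrderID (your table and key will differ):

ALTER TABLE dbo.Orders
    ADD Last_Activity DATETIME NOT NULL DEFAULT (GETDATE());   -- new rows get their insert time

CREATE TRIGGER trg_Orders_LastActivity ON dbo.Orders
AFTER UPDATE
AS
BEGIN
    -- Stamp every updated row with the time of the edit.
    UPDATE o
       SET Last_Activity = GETDATE()
      FROM dbo.Orders AS o
      JOIN inserted AS i ON i.OrderID = o.OrderID;
END;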