MS SQL 2008R2 - Monthly statistic - sql

I´m working with MS SQL Server 2008, and I´d like make some stastistic every week / month for users who are connecting to this server (how many times the were connected, which table was most popular or when was the traffic most, etc...).
I cannot find anything about this weekly / monthly statistic for SQL users.
I´ll be glad if someone can help me. Thanks a lot.

If you're looking for general table access statistics, the sys.dm_db_index_usage_stats view is a great place to start. For every table and index in the database that's been accessed, there will be a row in that view with stats on the number of times it was seeked, scanned, or used as a lookup, and when the last access time was. You could set up a sql agent job to run every few minutes, taking a snapshot of this entire view, and then graph the results over time to show the rate at which each table/index in the database changes.
I did a quite write-up on that view a while ago at http://trycatchfinally.net/2010/01/finding-unused-tables-in-sql-server-2005-and-2008/, but it's pretty powerful - though the example I use helps identify indexes or tables that aren't being used, you could flip it around to show which ones are being used most often.

you can create job for Monthy Period That Execute Bellow Command
Exec Sp_UpdateStats

Related

Google BigQuery move to SQL Server, Big Data table optimisation

I have a curious question and as my name suggests I am a novice so please bear with me, oh and hi to you all, I have learned so much using this site already.
I have an MSSQL database for customers where I am trying to track their status on a daily basis, with various attributes being recorded in several tables, which are then joined together using a data table to create a master table which yields approximately 600million rows.
As you can imagine querying this beast on a middling server (Intel i5, SSD HD OS, 2tb 7200rpm HD, Standard SQL Server 2017) is really slow. I was using Google BigQuery, but that got expensive very quickly. I have implemented indexes which have somewhat sped up the process, but still not fast enough. A simple select distinct on customer id for a given attribute is still taking 12 minutes on average for a first run.
The whole point of having a daily view is to make it easier to have something like tableau or QLIK connect to a single table to make it easy for the end user to create reports by just dragging the required columns. I have thought of using the main query that creates the master table and parameterizes it, but visualization tools aren't great for passing many variables.
This is a snippet of the table, there are approximately 300,000 customers and a row per day is created for customers who join between 2010 and 2017. They fall off the list if they leave.
My questions are:
1) should I even bother creating a flat file or should I just parameterize the query.
2) Are there any techniques I can use aside from setting the smallest data types for each column to keep the DB size to a minimal.
3) There are in fact over a hundred attribute columns, a lot of them, once they are set to either a 0 or 1, seldom change, is there another way to achieve this and save space?
4)What types of indexes should I have on the master table if many of the attributes are binary
any ideas would be greatly received.

MS SQL Server Query caching

One of my projects has a very large database on which I can't edit indexes etc., have to work as it is.
What I saw when testing some queries that I will be running on their database via a service that I am writing in .net. Is that they are quite slow when ran the first time?
What they used to do before is - they have 2 main (large) tables that are used mostly. They showed me that they open SQL Server Management Studio and run a
SELECT *
FROM table1
JOIN table2
a query that takes around 5 minutes to run the first time, but then takes about 30 seconds if you run it again without closing SQL Server Management Studio. What they do is they keep open SQL Server Management Studio 24/7 so that when one of their programs executes queries that are related to these 2 tables (which seems to be almost all queries ran by their program) in order to have the 30 seconds run time instead of the 5 minutes.
This happens because I assume the 2 tables get cached and then there are no (or close to none) disk reads.
Is this a good idea to have a service which then runs a query to cache these 2 tables every now and then? Or is there a better solution to this, given the fact that I can't edit indexes or split the tables, etc.?
Edit:
Sorry just I was possibly unclear, the DB hopefully has indexes already, just I am not allowed to edit them or anything.
Edit 2:
Query plan
This could be a candidate for an indexed view (if you can persuade your DBA to create it!), something like:
CREATE VIEW transhead_transdata
WITH SCHEMABINDING
AS
SELECT
<columns of interest>
FROM
transhead th
JOIN transdata td
ON th.GID = td.HeadGID;
GO
CREATE UNIQUE CLUSTERED INDEX transjoined_uci ON transhead_transdata (<something unique>);
This will "precompute" the JOIN (and keep it in sync as transhead and transdata change).
You can't create indexes? This is your biggest problem regarding performance. A better solution would be to create the proper indexes and address any performance by checking wait stats, resource contention, etc... I'd start with Brent Ozar's blog and open source tools, and move forward from there.
Keeping SSMS open doesn't prevent the plan cache from being cleared. I would start with a few links.
Understanding the query plan cache
Check your current plan cache
Understanding why the cache would clear (memory constraint, too many plans (can't hold them all), Index Rebuild operation, etc. Brent talks about this in this answer
How to clear it manually
Aside from that... that query is suspect. I wouldn't expect your application to use those results. That is, I wouldn't expect you to load every row and column from two tables into your application every time it was called. Understand that a different query on those same tables, like selecting less columns, adding a predicate, etc could and likely would cause SQL Server to generate a new query plan that was more optimized. The current query, without predicates and selecting every column... and no indexes as you stated, would simply do two table scans. Any increase in performance going forward wouldn't be because the plan was cached, but because the data was stored in memory and subsequent reads wouldn't experience physical reads. i.e. it is reading from memory versus disk.
There's a lot more that could be said, but I'll stop here.
You might also consider putting this query into a stored procedure which can then be scheduled to run at a regular interval through SQL Agent that will keep the required pages cached.
Thanks to both #scsimon #Branko Dimitrijevic for their answers I think they were really useful and the one that guided me in the right direction.
In the end it turns out that the 2 biggest issues were hardware resources (RAM, no SSD), and Auto Close feature that was set to True.
Other fixes that I have made (writing it here for anyone else that tries to improve):
A helper service tool will rearrange(defragment) indexes once every
week and will rebuild them once a month.
Create a view which has all the columns from the 2 tables in question - to eliminate JOIN cost.
Advised that a DBA can probably help with better tables/indexes
Advised to improve server hardware...
Will accept #Branko Dimitrijevic 's answer as I can't accept both

Oracle ash report

Recently, i have been handed over with an ash report from DBA.
To me this report is like french. I have no idea what it is about and what all is written in this report. Can someone please guide me in reading it and explain me what all steps should i take to make my query stable and less cpu consumer. Moreover, which query is taking more of CPU.
1. Please recommend what actions should i perform.
2. Which query is consuming more of my CPU and what should i do to improve it.
An ASH (Active Session History) report shows what was happening within the database over a range of time.
The SQL with Top Events shows, for example, that 31% of time was consumed by the first query. The SQL ID column can be linked to the Complete List of SQL Text used was:
select some_giant_list_of_columns
from VU_PERSON_MINI
where lower (person_id) = ?
That same statement in Top SQL with Top Row Sources says it's doing a full table scan (TABLE ACCESS = FULL).
I suspect that there's not a function-based index on LOWER(PERSON_ID) on VU_PERSON_MINI. Adding one or fixing the underlying issue (is that a number column? can you enforce at the application that it's always stored in lower case) should help performance.
Also, the existence of direct path read events in that query is somewhat troublesome. That generally indicates sort activity to disk - or that you're reading LOB data. Do you mean to be as part of that SQL?

Is it possible to Cache the result set of a select query in the database?

I am trying to optimize the search query which is the most used in our system. So far I have added some missing indexes and that has helped slightly. But I want to further reduce the load on the db server. One option that I will use is caching the result set as a LIST in the asp.net Cache so that I don't have to hit the db often.
However, I was wondering if there is a way to Cache some portions of the select query at the db as well. e.g. for the search results we consider only users who have been active in the last 180 days and who have share-info set as true. So this is like a super set which the db processes everytime and then applies other conditions such as category specified, city etc. which are passed. Is it possible to somehow Cache the Super Set so that I can run queries against the super set rather than run the query against the whole table? Will creating a View help in this? I am a bit hesitant to create a view as I read managing views can be an overhead and takes away some flexibility to modfy the tables.
I am using Sql-Server 2005 so cannot create a filtered index on the table, which I think would have been helpful.
I agree with #Neville K. SQL Server is pretty smart at caching data in memory. You might see limited / no performance gains for your effort.
You could consider indexed views (Enterprise Edition only) http://technet.microsoft.com/en-us/library/cc917715.aspx for your sub-query.
It is, of course, possible to do this - but I'm not sure if it will help.
You can create a scheduled job - once a night, perhaps - which populates a table called "active_users_with_share_info" by truncating it, and then repopulating it based on a select query filtering out users active in the last 180 days with "share_info = true".
Then you can join your search query to this table.
However, I doubt this would do much good - SQL Server is pretty smart at caching. Unless you're dealing with huge volumes of data (100 of millions of records), or very limited hardware, I doubt you'd get any measurable performance improvements - but by all means try it!
Of course, the price for this would be more moving parts in your application, more interesting failure modes (what happens if the overnight batch fails silently?), and more training for any new developers you bring into the team.

What would be the most efficient method for storing/updating Interval based data in SQL?

I have a database table with about 700 millions rows plus (growing exponentially) of time based data.
Fields:
PK.ID,
PK.TimeStamp,
Value
I also have 3 other tables grouping this data into Days, Months, Years which contains the sum of the value for each ID in that time period. These tables are updated nightly by a SQL job, the situation has arisen where by the tables will need to updated on the fly when the data in the base table is updated, this can be however up to 2.5 million rows at a time (not very often, typically around 200-500k up to every 5 minutes), is this possible without causing massive performance hits or what would be the best method for achieving this?
N.B
The daily, monthly, year tables can be changed if needed, they are used to speed up queries such as 'Get the monthly totals for these 5 ids for the last 5 years', in raw data this is about 13 million rows of data, from the monthly table its 300 rows.
I do have SSIS available to me.
I cant afford to lock any tables during the process.
700M recors in 5 months mean 8.4B in 5 years (assuming data inflow doesn't grow).
Welcome to the world of big data. It's exciting here and we welcome more and more new residents every day :)
I'll describe three incremental steps that you can take. The first two are just temporary - at some point you'll have too much data and will have to move on. However, each one takes more work and/or more money so it makes sense to take it a step at a time.
Step 1: Better Hardware - Scale up
Faster disks, RAID, and much more RAM will take you some of the way. Scaling up, as this is called, breaks down eventually, but if you data is growing linearly and not exponentially, then it'll keep you floating for a while.
You can also use SQL Server replication to create a copy of your database on another server. Replication works by reading transaction logs and sending them to your replica. Then you can run the scripts that create your aggregate (daily, monthly, annual) tables on a secondary server that won't kill the performance of your primary one.
Step 2: OLAP
Since you have SSIS at your disposal, start discussing multidimensional data. With good design, OLAP Cubes will take you a long way. They may even be enough to manage billions of records and you'll be able to stop there for several years (been there done that, and it carried us for two years or so).
Step 3: Scale Out
Handle more data by distributing the data and its processing over multiple machines. When done right this allows you to scale almost linearly - have more data then add more machines to keep processing time constant.
If you have the $$$, use solutions from Vertica or Greenplum (there may be other options, these are the ones that I'm familiar with).
If you prefer open source / byo, use Hadoop, log event data to files, use MapReduce to process them, store results to HBase or Hypertable. There are many different configurations and solutions here - the whole field is still in its infancy.
Indexed views.
Indexed views will allow you to store and index aggregated data. One of the most useful aspects of them is that you don't even need to directly reference the view in any of your queries. If someone queries an aggregate that's in the view, the query engine will pull data from the view instead of checking the underlying table.
You will pay some overhead to update the view as data changes, but from your scenario it sounds like this would be acceptable.
Why don't you create monthly tables, just to save the info you need for that months. It'd be like simulating multidimensional tables. Or, if you have access to multidimensional systems (oracle, db2 or so), just work with multidimensionality. That works fine with time period problems like yours. At this moment I don't have enough info to give you, but you can learn a lot about it just googling.
Just as an idea.