How and why do we test BufferPool HIT RATIO graphs?

We are testing BufferPool HIT RATIO graphs for DB calls that happen through stored procedures (DB2).
What I don't understand is why and how we test them.
When I googled BUFFERPOOL, I gathered that when pages (which hold our data, I guess) have not changed, those pages are picked from the buffer pool; otherwise they are picked from the hard disk.
So my questions are:
1. How do we test this for a DB call?
2. If I am requesting some data continuously and the data is not changing, what should the BufferPool HIT RATIO graph look like?
3. If the graph is at 100 percent, or if it drops to 0, what does that mean? Basically, how do we check which level in the graph is good or bad?
4. Which scenario is right for testing this? If my database is not changing, can I still view and compare results against a DB which changes frequently?
On the whole, I want to understand how to read BufferPool HIT RATIO graphs. If anyone can share some links on this, that would also help.
Buffer Cache Hit Ratio shows how well SQL Server is utilizing the buffer cache:
“Percent of page requests satisfied by data pages from the buffer pool”
It gives the ratio of data page requests satisfied from the SQL Server buffer cache to all data page requests. Pages that are not found in the buffer cache are read from disk, which is significantly slower and hurts performance.
For more info : http://www.sqlshack.com/sql-server-memory-performance-metrics-part-4-buffer-cache-hit-ratio-page-life-expectancy/
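If you want to watch the counter itself, here is a minimal sketch for SQL Server (it reads the counter pair from the sys.dm_os_performance_counters DMV; the cumulative ratio counter must be divided by its companion "base" counter). On DB2, the corresponding per-bufferpool counters are exposed through monitoring functions such as MON_GET_BUFFERPOOL.

    -- Current buffer cache hit ratio as a percentage.
    SELECT CAST(r.cntr_value AS FLOAT) / NULLIF(b.cntr_value, 0) * 100.0
               AS BufferCacheHitRatioPct
    FROM sys.dm_os_performance_counters AS r
    JOIN sys.dm_os_performance_counters AS b
        ON r.object_name = b.object_name
    WHERE r.counter_name = 'Buffer cache hit ratio'
      AND b.counter_name = 'Buffer cache hit ratio base'
      AND r.object_name LIKE '%Buffer Manager%';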
This link answers my questions 1, 2 & 3.
Now if someone can share their experience on question 4, that would be a big help.
Actually, we are trying to anticipate performance issues that could arise in the actual application's DB by testing our stored procedures in another DB that we use for test/development purposes.

Allowing many users to view stale BigQuery data query results concurrently

If I have a BigQuery dataset with data that I would like to make available to 1000 people (where each of these people would only be allowed to view their subset of the data, and it is OK for them to view a 24-hour-stale version of it), how can I do this without exceeding the 50-concurrent-queries limit?
The BigQuery documentation mentions a limit of 50 concurrent queries, which give on-the-spot accurate data. I would surpass that limit if I needed everyone to view on-the-spot accurate data - which I don't.
The documentation also mentions that batch jobs are permitted and that results can be saved into destination tables, which I'm hoping would allow a reliable solution for my scenario. But I'm having difficulty finding information on how reliably or frequently those batch jobs can be expected to run, and on whether someone querying results that already exist in those destination tables itself counts towards the 50-concurrent-queries limit.
Any advice appreciated.
Without knowing the specifics of your situation and depending on how much data is in the output, I would suggest putting your own cache in front of BigQuery.
This sounds like a dashboarding/reporting solution, so I assume there is a large amount of data going in and a relatively small amount coming out (per user).
Run one query per day with a batch script to generate your output (grouped by user) and then export it to GCS. You can then break it up into multiple flat files (or just read it into memory on your frontend). Each user hits your frontend; you determine which part of the output to serve up to them and respond.
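As a sketch of that daily job (project, dataset, table, and bucket names below are placeholders, not from the question), BigQuery's EXPORT DATA statement is one way to script the export to GCS:

    -- Hypothetical daily batch job: compute the per-user output once,
    -- then export it to GCS as sharded CSV files.
    EXPORT DATA OPTIONS (
      uri = 'gs://my-report-bucket/daily/*.csv',
      format = 'CSV',
      overwrite = true,
      header = true
    ) AS
    SELECT user_id, metric, value
    FROM `my_project.my_dataset.source_table`;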
This should be relatively cheap if you can work off the cached data and it is small enough that handling the BigQuery output isn't too much additional processing.
Google Cloud Functions might be an easy way to handle this, if you don't want the extra work of setting up a new VM to host your frontend.

UI testing fails with error "Failed to get snapshot within 15.0s"

I have a table view with a huge number of cells. I tried to tap a particular cell in this table view, but the test ends with this error:
Failed to get snapshot within 15.0s
I assume that the system takes a snapshot of the whole table view before accessing its elements. Because of the huge number of cells, the snapshot could not be taken in time (15 seconds is probably the system's default timeout).
I manually set a sleep/wait time of 60 seconds. I still cannot access the cells after 60 seconds!
A strange thing I found is that before accessing the cell, if I print the object in the debugger like this:
po XCUIApplication().cells.debugDescription
It shows an error like:
Failed to get snapshot within 15.0s
error: Execution was interrupted, reason: internal ObjC exception breakpoint(-3)..
The process has been returned to the state before expression evaluation.
If I then print the same object again, using
po XCUIApplication().cells.debugDescription
it prints all the cells in the table view in the debugger.
I have no idea why this happens. Has anyone faced a similar issue? Help needed!
I assume that the system takes a snapshot of the whole table view before accessing its elements.
Your assumption is correct but there is even more to the story. The UI test requests a snapshot from the application. The application takes this snapshot and then sends the snapshot to the test where the test finally evaluates the query. For really large snapshots (like your table view) that means that:
The snapshot takes a long time for the application to generate and
the snapshot takes a long time to send back to the test for query evaluation.
I'm at WWDC 2017 right now and there is a lot of good news about testing - specifically some things that address your exact problem. I'll outline it here but you should go watch WWDC 2017 Session 409 and skip to timestamp 17:10.
The first improvement is remote queries. This is where the test transmits the query to the application, the application evaluates that query remotely from the test, and only the result of that query is transmitted back. Expect a marginal improvement from this enhancement: ~20% faster and ~30% less memory.
The second improvement is query analysis. This enhancement reduces the size of snapshots by using a minimum attribute set when taking them. This means that full snapshots of the view are not taken by default when evaluating a query. For example, if you're querying to tap a button, the snapshot is limited to buttons within the view. This means writing unambiguous queries is even more important, i.e. if you want to tap a navigation bar button, specify it in the query, like app.navigationBars.buttons["A button"]. You'll see even more performance improvement from this enhancement: ~50% faster and ~35% less memory.
The last and most notable (and dangerous) improvement is what they're calling the First Match API. This comes with some trade-offs/risks but offers the largest performance gain. It offers a new .firstMatch property that returns the first match for an XCUIElement query. Ambiguous matches that result in test failures will not occur when using .firstMatch, so you run the risk of evaluating or performing an action on an XCUIElement that you didn't intend to. Expect a performance increase of ~10x, with no memory spiking at all.
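For example, a query like the following (a sketch; the "MyCell" accessibility identifier is hypothetical) lets the framework stop as soon as one cell matches instead of snapshotting the whole table:

    import XCTest

    class FirstMatchExample: XCTestCase {
        func testTapCell() {
            let app = XCUIApplication()
            app.launch()
            // .firstMatch returns as soon as one element matches, so the
            // framework avoids snapshotting every cell in a huge table.
            let cell = app.tables.cells.matching(identifier: "MyCell").firstMatch
            XCTAssertTrue(cell.waitForExistence(timeout: 10))
            cell.tap()
        }
    }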
So, to answer your question: update to Xcode 9, macOS High Sierra, and iOS 11. Utilize .firstMatch where you can, with highly specific queries, and your issue of snapshots timing out should be solved. In fact, the timeout you're experiencing might already be fixed by the general improvements from remote queries and query analysis, without you having to use .firstMatch at all!

What's the best way to get a 'lot' of small pieces of data synced between a Mac App and the Web?

I'm considering MongoDB right now. Just so the goal is clear, here is what needs to happen:
In my app, Finch (finchformac.com for details), I have thousands and thousands of entries per day for each user: which window they had open, the time they opened it, the time they closed it, and a tag if they chose one for it. I need this data to be backed up online so it can sync to their other Mac computers, etc. I also need to be able to draw charts online from their data, which means some complex queries hitting hundreds of thousands of records.
Right now I have tried using Ruby/Rails/Mongoid with a JSON parser on the app side, sending up data in increments of 10,000 records at a time; the data is then processed into other collections with a background map-reduce job. But this all seems to block and is ultimately too slow. What recommendations does anyone have for how to go about this?
You've got a complex problem, which means you need to break it down into smaller, more easily solvable issues.
Problems (as I see it):
1. You've got an application which is collecting data. You just need to store that data somewhere locally until it gets synced to the server.
2. You've received the data on the server and now you need to shove it into the database fast enough that it doesn't slow down.
3. You've got to report on that data, and this sounds hard and complex.
You probably want to write this as some sort of API. For simplicity (and since you've got loads of spare processing cycles on the clients), you'll want these chunks of data processed on the client side into JSON, ready to import into the database. Once you've got JSON you don't need Mongoid (you can just throw the JSON into the database directly). You probably don't need Rails either, since you're just creating a simple API, so stick with just Rack or Sinatra (possibly using something like Grape).
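For instance, the ingest endpoint could be as small as this (a sketch only; the database and collection names, and the /entries route, are made up):

    require 'sinatra'
    require 'json'
    require 'mongo'

    client = Mongo::Client.new(['127.0.0.1:27017'], database: 'finch')

    post '/entries' do
      # The client ships pre-built JSON, so parse it and bulk-insert the
      # documents without mapping them to Ruby model objects first.
      docs = JSON.parse(request.body.read)
      client[:entries].insert_many(docs)
      status 201
    end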
Now you need to solve the whole "this all seems to block and is ultimately too slow" issue. We've already removed Mongoid (so no need to convert from JSON -> Ruby objects -> JSON) and Rails. Before we get on to doing a MapReduce on this data, you need to ensure it's getting loaded into the database quickly enough. Chances are you should architect the whole thing so that your MapReduce supports your reporting functionality. For syncing of data you shouldn't need to do anything but pass the JSON around. If your data isn't writing into your DB fast enough, you should consider sharding your dataset. This will probably be done using some user-based key, but you know your data schema better than I do. You need to choose your sharding key so that when multiple users are syncing at the same time they will probably be using different servers.
Once you've solved Problems 1 and 2, you need to work on your reporting. This is probably supported by your MapReduce functions inside Mongo. My first comment on this part is to make sure you're running at least Mongo 2.0; in that release 10gen sped up MapReduce (my tests indicate that it is substantially faster than 1.8). Beyond this you can achieve further increases by sharding and by directing reads to the secondary servers in your replica set (you are using a replica set?). If this still isn't working, consider structuring your schema to support your reporting functionality. This lets you use more cycles on your clients to do work rather than loading your servers. But this optimisation should be left until after you've proven that conventional approaches won't work.
I hope that wall of text helps somewhat. Good luck!

How does data load into memory in SQL Server?

I am a beginner in SQL Server. I have read about the buffer cache in SQL Server: I think it is the location in system RAM where data is stored once a query is executed. If that is correct, I have a few questions about query execution.
1) If my RAM size is 2 GB and I have 10 GB of data in my SQL Server, and I execute a SQL statement that retrieves all the data (10 GB) from the database, what will happen (work / not work)?
2) In the same case, if multiple users execute queries retrieving 5 GB each (10 GB total), what will happen?
When you issue a SELECT * FROM [MyTable] and your table is 10 GB on a system that has only 2 GB of RAM, the database does not have to read the entire 10 GB into memory at once. The select starts a scan of the data, beginning with the first data page. For this it only needs that one page (which is 8 KB) in memory, so it reads the page and consumes 8 KB of RAM. As it scans this page, it produces output that you see as the result set. As soon as it is done with this page, it needs the next one, so it reads that into memory, scans the records in it, and produces output for your result. Then the next page, and the next. The key point is that once the scan is done with a page, that page is no longer needed. As these 8 KB pages keep being read into RAM, they eventually add up and consume all free RAM. At that moment SQL frees the old, unused pages in RAM to make room for new ones, and it keeps doing so until the entire 10 GB of your table has been read.
If there are two users each reading a 5 GB table, things work exactly the same way. Each user's query scans only one page at a time, and as they make progress reading pages, they fill up the RAM. When all the available RAM is used, SQL starts discarding old pages from RAM to make room for new ones.
In the real world things are fancier because of considerations like read-ahead.
And as a side note, you should never scan 10 GB of data. Your application should always request only the data it needs, and that data should be retrievable quickly via an index, precisely to avoid a large scan that has to examine the entire table.
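For example (a sketch with made-up table and column names), a covering index turns the full-table scan into a seek that touches only a handful of pages:

    -- Covering index: the query below is answered from the index pages
    -- alone, so only the rows for one customer are ever read.
    CREATE INDEX IX_MyTable_CustomerId
        ON [MyTable] (CustomerId)
        INCLUDE (OrderId, OrderDate, Total);

    SELECT OrderId, OrderDate, Total
    FROM [MyTable]
    WHERE CustomerId = 42;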
Thankfully you don't have to worry about this. Sure, it is important to tune your queries, minimize result sets for network transfer, and so on, but SQL Server has been around a long time and it's very good at its own memory management. Unless you encounter a specific query that misbehaves, I'd say don't worry about it.
As you noted, data retrieved goes into the buffer cache. Some of that is real memory, some is spooled off to disk.
You can observe whether you're reusing memory by watching the Page Life Expectancy performance counter (there are other indicators too, but this one is a quick shorthand). If you run a query that returns huge amounts of data and pushes other data out of the cache, you can watch the page life expectancy drop. As you run lots and lots of small queries, you can see the page life expectancy grow, sometimes to a length of days and days.
It's not a great measure for memory pressure on the system, but it can give you an indication of how well you're seeing the data in cache get reused.
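For reference, a minimal sketch of reading that counter from SQL Server's performance-counter DMV:

    -- Page Life Expectancy, in seconds: how long a page is expected
    -- to stay in the buffer pool before being evicted.
    SELECT cntr_value AS PageLifeExpectancySeconds
    FROM sys.dm_os_performance_counters
    WHERE counter_name = 'Page life expectancy'
      AND object_name LIKE '%Buffer Manager%';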
The full result set of a query is not stored in RAM by SQL Server for reuse. What can be stored is the execution plan used by a query.
SQL Server stores data in order to manipulate it: on an update statement, for example, it reads the data from the DB into RAM, edits it there, and then another process writes it back to the DB. But as @n8wrl said, you don't need to worry about it.

Best practice for inserting and querying data from memory

We have an application that takes real-time data and inserts it into a database. It is online for 4.5 hours a day. We insert data second by second into 17 tables. A user may at any time query any table for the latest second's data and some records from the history...
Handling the feed and insertion is done using a C# console application...
Handling user requests is done through a WCF service...
We figured out that insertion is our bottleneck; most of the time is taken there. We invested a lot of time trying to fine-tune the tables and indexes, yet the results were not satisfactory.
Assuming that we have sufficient memory, what is the best practice for inserting data into memory instead of into the database? Currently we are using DataTables that are updated and inserted into every second.
A colleague of ours suggested putting another WCF service, instead of the database, between the feed handler and the WCF user-requests handler. This WCF mid-layer would be TCP-based and would keep the data in its own memory. One might say that the feed handler could deal with user requests itself instead of having a middle layer between the two processes, but we want to separate things, so that if the feed handler crashes we can still provide the user with the current records.
We are limited in time, and we want to move everything to memory in a short period. Is having a WCF service in the middle of two processes a bad thing to do? I know the requests add some overhead, but all three processes (feed handler, in-memory database (WCF), user-request handler (WCF)) are going to be on the same machine, and bandwidth will not be much of an issue.
Please assist!
I would look into creating a cache of the data (such that you can also reduce database selects), and invalidate data in the cache once it has been written to the database. This way, you can batch up calls to do a larger insert instead of many smaller ones, but keep the data in-memory such that the readers can read it. Actually, if you know when the data goes stale, you can avoid reading the database entirely and use it just as a backing store - this way, database performance will only affect how large your cache gets.
Invalidating data in the cache will be based either on whether it has been written to the database or on whether it has gone stale, whichever comes last, not first.
The cache layer doesn't need to be complicated; however, it should be multi-threaded so it can host the data and also save it in the background. This layer would sit just behind the WCF service (the connection medium), and the WCF service should be improved to contain the logic of the console app plus the batching idea. Then the console app can just connect to WCF and throw results at it.
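As a rough sketch of that batching idea (the type and names below are made up; the bulk-insert delegate would wrap something like SqlBulkCopy):

    using System;
    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading;

    // Write-behind buffer: the feed handler enqueues records as they
    // arrive, and a background timer flushes everything queued so far
    // to the database in one batch instead of many per-second inserts.
    public class WriteBehindBuffer<T>
    {
        private readonly ConcurrentQueue<T> _pending = new ConcurrentQueue<T>();
        private readonly Action<IReadOnlyList<T>> _bulkInsert;
        private readonly Timer _flushTimer;

        public WriteBehindBuffer(Action<IReadOnlyList<T>> bulkInsert, TimeSpan interval)
        {
            _bulkInsert = bulkInsert;
            _flushTimer = new Timer(_ => Flush(), null, interval, interval);
        }

        public void Add(T record) => _pending.Enqueue(record);

        private void Flush()
        {
            var batch = new List<T>();
            while (_pending.TryDequeue(out var item))
                batch.Add(item);
            if (batch.Count > 0)
                _bulkInsert(batch); // one large insert instead of many small ones
        }
    }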
Update: the only other thing to say is to invest in a profiler to see if you are introducing any performance issues in code that are being masked. Also, profile your database. You mention you need fast inserts and selects; unfortunately, those usually trade off against each other...
What kind of database are you using? MySQL has a MEMORY storage engine, which would seem to be suited to this sort of thing.
Are you using DataTable with DataAdapter? If so, I would recommend that you drop them completely. Insert your records directly using DbCommand. When users request reports, read the data using a DataReader, or populate DataTable objects using DataTable.Load(IDataReader).
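For example (a sketch; the Tick type and the table/column names are placeholders):

    using System;
    using System.Collections.Generic;
    using System.Data;
    using System.Data.SqlClient;

    class Tick { public string Symbol; public decimal Price; public DateTime At; }

    static class TickWriter
    {
        // Direct parameterized inserts with SqlCommand instead of
        // DataTable + DataAdapter: one command is built and reused
        // for every row.
        public static void Insert(string connectionString, IEnumerable<Tick> ticks)
        {
            using (var conn = new SqlConnection(connectionString))
            using (var cmd = new SqlCommand(
                "INSERT INTO Ticks (Symbol, Price, At) VALUES (@s, @p, @t)", conn))
            {
                cmd.Parameters.Add("@s", SqlDbType.NVarChar, 10);
                cmd.Parameters.Add("@p", SqlDbType.Decimal);
                cmd.Parameters.Add("@t", SqlDbType.DateTime2);
                conn.Open();
                foreach (var tick in ticks)
                {
                    cmd.Parameters["@s"].Value = tick.Symbol;
                    cmd.Parameters["@p"].Value = tick.Price;
                    cmd.Parameters["@t"].Value = tick.At;
                    cmd.ExecuteNonQuery();
                }
            }
        }
    }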
Storing data in memory carries the risk of losing it in the case of a crash or power failure.