Performance issue: SQL views vs LINQ for decreasing query execution time - sql

I have a system built in ASP.NET Web Forms that includes an accounts records generation form. In some specific situations I need to fetch all records, which comes to nearly 1 million rows.
One solution could be to reduce the number of records fetched, but when we need records spanning one to five years, the result set is still half a million or a million rows. How can I decrease the query time?
What points can I use to reduce its time? I can't show the full query here; it's a big view that calls several other views.
Would it take less time if I wrote it as a LINQ query? That's why I'm asking about LINQ vs. views.
I executed a "SELECT * FROM TableName" query; it has been running for 40 minutes and is still executing. The table has 117,000 records. Can we reduce this time?

I started this as a comment but ran out of room.
Use the server to do as much filtering for you as possible and return as few rows as possible. Client-side filtering is always going to be much slower than server-side filtering; for example, it does not have access to the indexes and optimisation techniques that exist on the server.
LINQ uses "lazy evaluation" (deferred execution), which means it builds up a filtering expression but does not execute it until it is forced to. I've used it and was initially impressed with the speed ... until I started to access the data it returned. Consuming the data you want from LINQ is what triggers the actual selection process, which you'll find is slow.
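The deferred-execution behaviour described above can be sketched with Python generators, which work analogously: building the query is instant, and the cost is only paid when results are consumed. (Python is used here only as an analogy for the LINQ-to-SQL behaviour; the function and data are invented.)

```python
# Deferred execution: creating the "query" does no work; the filtering
# predicate runs per row only when results are actually enumerated.
def expensive_filter(rows):
    for row in rows:
        if row % 2 == 0:  # imagine a costly per-row predicate here
            yield row

query = expensive_filter(range(1_000_000))  # returns immediately; nothing evaluated yet
first_page = [next(query) for _ in range(3)]  # the work happens here, on demand
print(first_page)  # [0, 2, 4]
```

This is why the LINQ query "looks fast" at first: the slow part has simply been postponed until the data is read.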
Use the server to return a series of small resultsets and then process those. If you need to join these resultsets on a key, save them into dictionaries with that key so you can join them quickly.
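The "join small resultsets via dictionaries" idea might look like the following Python sketch (the data and key names are invented; the same pattern applies to Dictionary(Of TKey, TValue) in .NET):

```python
# Two small resultsets returned by the server, both keyed on customer_id.
orders = [(1, "order-A"), (2, "order-B"), (1, "order-C")]
customers = [(1, "Alice"), (2, "Bob")]

# Index one side by its key once (O(n))...
name_by_id = {cust_id: name for cust_id, name in customers}

# ...then join the other side with O(1) dictionary lookups
# instead of a nested scan over both lists.
joined = [(name_by_id[cust_id], order) for cust_id, order in orders]
print(joined)  # [('Alice', 'order-A'), ('Bob', 'order-B'), ('Alice', 'order-C')]
```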
Another approach is to look at Entity Framework to create a mirror of the server database structure along with indexes so that the subset of data you retrieve can be joined quickly.

Related

Pagination very heavy SQL query

I am trying to improve the performance of a report in my system. The application is written in .NET Core 3.0; I use EF Core as the ORM framework and PostgreSQL as the database. The report returns thousands of records, which are presented to the user in a view. The results are paginated and ordered by a selected column (like start_time or agent_name).
The result comes from a heavy query (one execution takes about 10 seconds). To implement pagination, we need both the results for one page and the total count. I can see two approaches to this problem, each with disadvantages.
In the current solution I download the full report, then sort it and slice one page in memory. The advantage is that data is fetched from the database only once. The disadvantage: we load thousands of records when we actually need only one page (50 records).
The other approach is to slice the records in the DB (using the LIMIT and OFFSET operators). I would fetch only one page of data, but I wouldn't have the total count, so I'd need a second query with the same parameters that returns the count of all records.
What do you think about this problem? Which approach is better in your opinion? Is there some other technique that fits this issue best?
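One technique worth considering: databases with window-function support (PostgreSQL included) can return the page and the total count from a single query by attaching COUNT(*) OVER () to every row. A runnable sketch using SQLite from Python (requires SQLite 3.25+ for window functions; the table and column names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (id INTEGER PRIMARY KEY, agent_name TEXT)")
conn.executemany("INSERT INTO report (agent_name) VALUES (?)",
                 [(f"agent-{i}",) for i in range(123)])

page_size, page = 50, 2  # third page (0-based)
rows = conn.execute(
    """
    SELECT id, agent_name, COUNT(*) OVER () AS total_count
    FROM report
    ORDER BY agent_name
    LIMIT ? OFFSET ?
    """,
    (page_size, page * page_size),
).fetchall()

# The window count is computed before LIMIT applies, so every returned
# row carries the full count alongside the page data.
print(len(rows), rows[0][2])  # 23 rows on the last page, total_count = 123
```

Note that the heavy query still runs in full to compute the count, so this saves a round trip rather than query cost; whether that beats the two-query approach depends on how well the planner handles the repeated work.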

Is it OK to loop a SQL query in a programming language

I have a question about retrieving data from a database.
There are two tables, and the master table's id is always inserted into the other table.
I know the data can be retrieved from the two tables with a join, but I want to know:
if I first retrieve all my desired data from the master table, and then in a loop (in the programming language) query the other table and retrieve its data, which approach is more efficient, and why?
As far as efficiency goes the rule is you want to minimize the number of round trips to the database, because each trip adds a lot of time. (This may not be as big a deal if the database is on the same box as the application calling it. In the world I live in the database is never on the same box as the application.) Having your application loop means you make a trip to the database for every row in the master table, so the time your operation takes grows linearly with the number of master table rows.
Be aware that in dev or test environments you may be able to get away with inefficient queries if there isn't very much test data. In production you may see a lot more data than you tested with.
It is more efficient to work in the database, in fewer, larger queries, but unless the site or program is going to be very busy, I doubt it will make much difference whether the loop runs inside or outside the database. If it is a website application, however, looping over large result sets outside the database and waiting on the results will take a significantly longer time.
What you're describing is sometimes called the N+1 problem. The 1 is your first query against the master table, the N is the number of queries against your detail table.
This is almost always a big mistake for performance.*
The problem is typically associated with using an ORM. The ORM queries your database entities as though they are objects; the mistake is assuming that instantiating a data entity is no more costly than creating an ordinary object. But of course you can write code that does the same thing yourself, without using an ORM.
The hidden cost is that you now have code that automatically runs N queries, and N is determined by the number of matching rows in your master table. What happens when 10,000 rows match your master query? You won't get any warning before your database is expected to execute those queries at runtime.
And it may be unnecessary. What if the master query matches 10,000 rows, but you really only wanted the 27 rows for which there are detail rows (in other words, an INNER JOIN)?
Some people are concerned with the number of queries because of network overhead. I'm not as concerned about that. You should not have a slow network between your app and your database. If you do, then you have a bigger problem than the N+1 problem.
I'm more concerned about the overhead of running thousands of queries per second when you don't have to. The overhead is in memory and in all the work needed to parse and plan a SQL statement in the server process.
Just Google "sql n+1 problem" and you'll find lots of people discussing how bad this is, how to detect it in your code, and how to solve it (spoiler: do a JOIN).
* Of course every rule has exceptions, so to answer this for your application, you'll have to do load-testing with some representative sample of data and traffic.
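A minimal illustration of the N+1 pattern and the JOIN fix, using SQLite from Python (the tables and data are invented; in-memory SQLite has no network cost, so this only demonstrates the query count, not the latency):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE detail (id INTEGER PRIMARY KEY, master_id INTEGER, payload TEXT);
""")
conn.executemany("INSERT INTO master VALUES (?, ?)",
                 [(i, f"m{i}") for i in range(100)])
conn.executemany("INSERT INTO detail (master_id, payload) VALUES (?, ?)",
                 [(i, f"d{i}") for i in range(0, 100, 4)])  # only 25 masters have details

# N+1: one query for the masters, then one query per master row
# (101 round trips for 100 master rows).
masters = conn.execute("SELECT id, name FROM master").fetchall()
n_plus_1 = []
for mid, name in masters:
    for (payload,) in conn.execute(
            "SELECT payload FROM detail WHERE master_id = ?", (mid,)):
        n_plus_1.append((name, payload))

# The fix: a single JOIN, which also skips masters with no detail rows.
joined = conn.execute("""
    SELECT m.name, d.payload
    FROM master m
    JOIN detail d ON d.master_id = m.id
""").fetchall()

print(len(n_plus_1), len(joined))  # 25 25 -- same result, 1 query instead of 101
```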

Optimize calls to a commonly called, expensive query

I have a view in my database which returns the last-updated time for a number of tables. This is to prevent the application from querying those tables directly for changes; the application runs in a multi-user environment, and these tables may be updated frequently in short bursts, then ignored for hours at a time.
I have a view called vwLastUpdated:
CREATE VIEW vwLastUpdated AS
SELECT Tasks, Items, ListItems
FROM (SELECT MAX(ModifiedTime) AS Tasks FROM tblTasks) a
CROSS JOIN (SELECT MAX(ModifiedTime) AS Items FROM tblItems) b
CROSS JOIN (SELECT MAX(ModifiedTime) AS ListItems FROM tblListItem) c
Clients are configured to call this view every 10-30 seconds (user configurable). The trouble is that when there are a lot of clients (around 80 at one site), the view gets hit very, very frequently. It can take just a few milliseconds to run, but sometimes takes 200-300 ms while updates are occurring, and this seems to slow down the front end during heavy use. The tables are properly indexed on ModifiedTime DESC.
Some of these sites use SQL Server Express; at other sites they have the full version of SQL Server, where I can design the view differently and use SQL Server Agent to maintain a common table (tblLastUpdated) by essentially running the above query every 5 seconds.
What could I do to make the process more efficient and reduce the load on the database server where SQL Express is used?
The client sites are on a minimum of SQL Server 2008 (up to SQL 2012)
Do you have indexes on the following columns?
tblTasks(ModifiedTime)
tblItems(ModifiedTime)
tblListItem(ModifiedTime)
This should ensure pretty good performance.
If you do, and interacting locks are still a problem, then you might consider maintaining another table with this information. If you do updates/inserts directly on the tables, this would require triggers. If you wrap updates and inserts in stored procedures, then you can make the changes there.
This would basically be turning the view into a table and updating that table whenever the data changes. That update should be very fast and have minimal interaction with other queries.
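A sketch of the trigger approach, driven from Python with SQLite syntax for illustration (SQL Server trigger syntax differs; the trigger name is invented, the table names come from the question, and a real system would also need UPDATE triggers plus one column per source table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblTasks (id INTEGER PRIMARY KEY, ModifiedTime TEXT);

    -- One-row summary table the clients poll instead of the view.
    CREATE TABLE tblLastUpdated (Tasks TEXT);
    INSERT INTO tblLastUpdated VALUES (NULL);

    -- Keep the summary current on every insert, so readers never
    -- touch (or lock) the busy source table.
    CREATE TRIGGER trgTasksLastUpdated AFTER INSERT ON tblTasks
    BEGIN
        UPDATE tblLastUpdated SET Tasks = NEW.ModifiedTime;
    END;
""")

conn.execute("INSERT INTO tblTasks (ModifiedTime) VALUES ('2024-01-01 10:00')")
conn.execute("INSERT INTO tblTasks (ModifiedTime) VALUES ('2024-01-01 10:05')")

# Clients now read a single tiny row rather than scanning three MAX()s.
latest = conn.execute("SELECT Tasks FROM tblLastUpdated").fetchone()[0]
print(latest)  # 2024-01-01 10:05
```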

Windows 2003 server becomes very slow while executing a query that retrieves hundreds of thousands of records

I executed a query that retrieves more than 100,000 records and uses joins to do so.
While this query was running, the whole server became very slow, which affected other sites trying to run normal queries to get records.
In this case, the query fetching that many records and the other queries running simultaneously are against different databases.
Your query retrieving hundreds of thousands of records is probably causing significant IO and trashing the buffer pool. You need to address this from two directions:
Review your requirements. Why are you retrieving hundreds of thousands of records? No human can look at that many rows; any analysis should be pushed down to the server, returning only aggregate results.
Why do you need to analyse hundreds of thousands of records frequently? Perhaps you need an ETL pipeline to extract the required aggregates/summaries on a daily basis.
Maybe the query really does need to analyse hundreds of thousands of records, and perhaps you're missing an index.
If all of the above don't apply it simply means you need a bigger boat. Your current hardware cannot handle the requirements.
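The "push analysis to the server" point can be made concrete: instead of pulling every detail row to the client and summarising there, let the database aggregate and return one row per group. A small sketch using SQLite from Python (table, columns, and figures are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 1.0), ("north", 2.5),
                  ("south", 4.0), ("south", 6.0)])

# One aggregate row per region crosses the wire -- with real data this
# replaces hundreds of thousands of detail rows and the buffer-pool churn.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('north', 3.5), ('south', 10.0)]
```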

LINQ vs datasets - performance hit?

I am refactoring an existing VB.NET application to use LINQ. I've been able to get it working, but it takes ages (over a minute) on the client machine!
They have lots of rows in the database table, but the old version of the programme on the same machine (which uses DataSets) takes 5 seconds.
My Linq queries are pretty standard, like so:
Dim query = From t As TRANSACTION In db.TRANSACTIONs _
            Where t.transactionID = transactionID _
            Select t
They only ever return one or zero rows. Any thoughts?
I am surprised by the huge time differential (5 seconds to 60+ seconds). I guess it would depend on how complex the TRANSACTION entity is. LINQ to SQL will process each row from your result set and turn it into an object, then add some state-tracking information to the DataContext. A DataSet simply stores the data raw, and processes it into strongly typed data as you read it from the DataTable. I wouldn't expect L2S to incur a 12-fold cost increase, but I would expect some increase.
The code you've pasted doesn't actually access the database at all -- what you do next with 'query' will determine how much data ends up getting transferred to the client. Is it possible something you're doing later on is causing the LINQ version to download more data than the Dataset version?
I have done the same transition on a project and seen only equivalent or better performance from LINQ but there have been instances where the LINQ version was doing a lot more roundtrips to the server, e.g. doing a Count() followed by a fetch of the data as two separate server queries. I usually solved this by doing a .ToList() to get the data locally before working on it. You have to use SQL Profiler sometimes to find out what is going on behind the scenes.
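The Count()-then-fetch double round trip described above can be collapsed by materialising the result once and counting locally, which is what .ToList() achieves in LINQ. A sketch with SQLite in Python standing in for the server (table and data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
conn.executemany("INSERT INTO t (val) VALUES (?)", [("x",)] * 5)

# Anti-pattern (two server round trips for one logical operation):
#   SELECT COUNT(*) FROM t   -- then --   SELECT id, val FROM t
# Fix: fetch once, then count the materialised list locally.
rows = conn.execute("SELECT id, val FROM t").fetchall()  # one round trip
count = len(rows)
print(count)  # 5
```

The trade-off is memory: materialising is only a win when the result set is small enough to hold client-side, which is why profiling (as the answer suggests) should guide where to apply it.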