How to handle a big result set without load it into memory

How to handle a big result set without load it into memory - nhibernate

using nhibernate and getNamedQuery, how can I handle a huge result set, without load all rows to a list?
Let's suppose I need to do some logic with each row, but I don't need it after use it.
I don't want to use pagination, because I can't change the source (a stored procedure).

When you say you don't want to use pagination, do you consider using setMaxResults() and setFirstResult() pagination?
Query query = session.getNamedQuery("storedProc");
query.setFirstResult(fromIndex);
query.setMaxResults(pageSize);
I'm not really sure you have any other option with Hibernate. There are all sorts of ways you can incorporate partitioning to split the work, but that's only after you've loaded the data that you want to partition.
Assuming you're doing some sort of logic against a row and storing in the DB, and seeing as you're already using stored procs, your best bet may be another stored proc. Alternatively, if you want to keep your logic outside of your DB, then you should be working off of tables rather than data from a stored proc.

Related

Any downside to using a view to make sure stored procedures get all the columns they need?

Let me start by stating that when writing SELECT statements in a stored procedure or elsewhere in application code, I ALWAYS specify columns explicitly rather than using SELECT *.
That said, I have a case where I have many stored procedures that need exactly the same columns back because they are used to hydrate the same models on the client. In an effort to make the code less brittle and less prone forgetting to update a stored procedure with a new column, I am thinking of creating a view and selecting from that in my stored procedures using SELECT *.
To help clarify, here are examples of what the stored procedures might be named:
Entity_GetById
Entity_GetByX
Entity_GetForY
-- and on and on...
So in each of these stored procedures, I would have
SELECT *
FROM EntityView
WHERE...[criteria based on the stored procedure]
I'm wondering if there is any cost to doing this. I understand the pitfalls of SELECT * FROM Table but by selecting from a view that exists specifically to define the columns needed seems to mitigate this.
Are there any other reasons I wouldn't want to use this approach?

Personally, I don't see a problem with this approach.
However, there is a range of opinions on the use of select * in production code, which generally discourages it for the outermost query (of course, it is fine in subqueries).
You do want to make sure that the stored procedures are recompiled if there is any change to either the view or to the underlying tables supplying the view. You probably want to use schemabinding to ensure that (some) changes to the underlying tables do not inadvertently affect the view(s).

I don't know your system, but using a view would not affect performance?
SELECT * from the view makes sense, but does the view just selects specific columns from one table?
If not then look carefully into performance.
If I remember correctly in MS SQL stored procedure can return a recordset.
If I right you can try to wrap various queries into kind of sub queries stored procedure and have a one main which selects specific columns -- here complication should fail if you miss something in .
Even better would be having stored procedures which ask by various parameters and returns only primary keys (as record set or in temporary table) and one main which fetch all required columns based on returned primary keys.

View or Temporary Table - which to use in MS SQL Server?

I have a problem to decide whether to use a view or a temp table.
I have a stored procedure that i call from program. In that SP i store the result of a long query in a temp table, name the columns and make another queries on that table store the results in labels or a gridview or whatever and drop the Temp Table. I could also store the query-result in a view and make queries on that view. So what is better or in what case do i HAVE to use a VIEW/ Temp Table.
According to my research a view has the benefit of: Security, Simplicity and Column Name Specification. My temporary table fulfills all that too (according to my opinion).

If the query is "long" and you are accessing the results from multiple queries, then a temporary table is the better choice.
A view, in general, is just a short-cut for a select statement. If does not imply that the results are ever run and processed. If you use a view, the results will need to be regenerated each time it is used. Although subsequent runs of the view may be more efficient (say because the pages used by the view query are in cache), a temporary table actually stores the results.
In SQL Server, you can also use table variables (declare #t table . . .).
Using a temporary table (or table variable) within a single stored procedure would seem to have few implications in terms of security, simplicity, and column names. Security would be handled by access to the stored procedure. Column names are needed for either solution. Simplicity is hard to judge without more information, but nothing sticks out as being particularly complicated.

depends
A view must replicate the processing of your "long query" each time it is run, while a temp table stores the results.
so do you want to use more processing or more storage?
You can store some view values (persistent index) which could help on processing, but you don't provide enough info to really explore this.
If you are talking about just storing the data for the use within a single procedure call, then a temp table is the way to go.

I'd like to also mention that for temporary table,
You cannot refer to a TEMPORARY table more than once in the same query.
This make temp table inconvenient for the cases where you want to use self join on it.

It is really a situational and operation specific question and answer may vary depending on the requirements of the scenario.
However, a small point that i would like to add is that if you are using a view to store results of a complex query, which are in turn used in operations of a GridView, then it can be troublesome to perform update operations on complex views. On the contrary, Temp Tables can suffice to this perfectly.
Again, There are scenario's where Views may be a better choice [ as in Multiple Database Servers if not handled properly] but it depends on what you want to do.

In general I would use a temporary table when I want to refer multiple times to the same table within a stored procedure, and a view when I want to use the table across different stored procedures.
A view does not persist the data (in principle): each time you reference the view SQL uses the logic from the view to access the original table. So you would not want to build a view on a view on a view, or use multiple references to a view that has complex logic.

SQL Server execute several commands simultaneously

I have a stored proc that I need to execute 100 times (one for each parameter). I was wondering if I could execute these all at the same time using batch or something similar so that it would speed up processing instead of executing one and then the next.
Thanks!

Can you rewrite your procedure to accept TABLE parameter, fill it with 100 values, and process table instead of 100 scalars?

To directly answer your question, you could open 100 separate connections and execute 100 separate queries concurrently.
But as has been mentioned, I don't think this is the solution for you.
As you have a table with the 100 values, it seems you have a few options...
Change the StoredProcedure to a View, and join on the view.
Change the StoredProcedure to a Table Valued Function, and use CROSS APPLY.
(Inline functions will perform a lot better than Multi-Statement functions.)
These two are limitted by the fact that neither a view nor a function can have any side-effects... No writing to tables, etc, etc.
If you can't refactor your code to use a view or function, you still need a stored procedure to encapsulate the code.
In that case, you could either:
- pass the table of values in as a Table Valued Parameter.
- or simply have the Stored Procedure read from the table directly.
Depending on your needs, you may even wish to create a table specifically for this SP to read from. This introduces a couple of extra issues though...
- Concurrency : How to keep my data separate from someone elses? Have a field to hold a unique identifier, such as the session's ##SPID.
- Clean Up : You don't want processes inserting data all day, but never deleting it.
The one thing I would strongly suggest you avoid is using loops/cursors. If you can find a set based approach, use it :)
EDIT
A comment you just left mentions that you have millions of records to process.
This makes using set based approaches much more preferable. You may, however, find that this creates extremely large transactions (if you're doing a lot of INSERTs, UPDATEs, etc). In which case, still find a set based approach, then find a way of doing this in smaller pieces (say split by day if the data is time related, or just 1000 records at a time, whatever fits.)

Which is better, filtering results at db or application?

A simple question. There are cases when I fetch the data and then process it in my BLL. But I realized that the same processing/filtering can be done in my stored procedure and the filtered results returned to BLL.
Which is better, processing at DB or processing in BLL? And Why?
consider the scenario, I want to check whether a product exists in my db and if it exists add that to the order (Example taken from answer by Nour Sabony below)now i can do this checking at my BLL or I an do this at the stored procedure as well. If I combine things to one procedure, i reduce the whole operation to one db call. Is that better?

At the database level.
Databases should be optimized for querying, so why go through all the effort to return a large dataset, and then do filtering in your application afterwards?

As a general rule: Anything that reasonably belongs to the db's responsibilities and can be done at your db, do it at your db.

well, the fatest answer is at database,
but you may consider something like Linq2Sql,
I mean to write an expression at the presentation layer, and it will be parsed as Sql Statement at Data Access Layer.
of course there are some situation BLL should get some data from DAL ,process it ,and then return it to DAL.
take an example :
PutOrder(Order value) procedure , which should check for the availability of the ordered product.
public void PutOrder(Order _order)
{
foreach (OrderDetail _orderDetail in _order.Details)
{
int count = dalOrder.GetProductCount(_orderDetail.Product.ProductID);
if (count == 0)
throw new Exception (string.Format("Product {0} is not available",_orderDetail.Product.Name));
}
dalOrder.PutOrder(_order);
}
but if you are making a Browse view, it is not a good idea (from a performance viewpoint) to bring all the data from Dal and then choose what to display in the Browse view.
may be the following could help:
public List<Product> SearchProduts(Criteria _criteria)
{
string sql = Parser.Parse(_criteria);
///code to pass the sql statement to Database procedure and get the corresponding data.
}

at the database.

I have consistently been taught that data should be handled in the data layer whenever possible. Filtering data is what a DB is specifically there to do and is optimized to do. Therefore that is where it should occur.
This also prevents the duplication of effort. If the data filtering occurs in a data layer then other code/systems could benefit from the same stored procedure instead of duplicating the work in your BLL.
I have found that people like to put the primary logic in the section that they are most comfortable working in. Regardless of a developer's strengths, data processing should go in the data layer whenever possible.

To throw a differing view point out there it also really depends upon your data abstraction model if you have a large 2nd level cache that sits on top of your database and the majority (or entirety) of your data you will be filtering on is in the cache then yes stay inside the application and use the cached data.

The difference is not very important when you consider tables that will never contain more than a few dozen rows.
Filtering at the db becomes the obvious choice when you consider tables that will grow to thousands, hundreds of thousands or millions of rows. It would make zero sense to transfer that much data to filter for a subset of records.

I asked a similar question in a SSRS class earlier this year when I wanted to know "best practice" for having a stored procedure retrieve queried data instead of executing TSQL in the report's data set. The response I received was: Allow the database engine to do the "heavy lifting." That is, do all the selecting, grouping, sorting in the stored procedure and allow SSRS to concentrate on delivering the results.
The same is true for your scenario. By all means, my vote would be to do all the filtering in your stored procedure. Call the sproc from your application and let the database do the work.

If you filter at the database, then only the data that is required by the application is actually sent on the wire between the database and the application server. For this reason, I suggest that it is better to filter at the database.

Avoiding duplicating SQL code?

I know many ways to avoid duplicating PHP code (in my case PHP). However, I am developing a rather big application that does some calculations on the database with the data it finds, and I have noticed the need to use the same code (parts of SQL) in other places.
I don't like the idea of copying and pasting the same thing over and over again. What is a good way to do this? Should I use stored procedures? I could almost calculate some of the stuff in PHP except that most of the times the queries are calculating values based on also data not returned by the query and it seems stupid to return extra data to PHP so that it could its calculations. Sometimes that may be okay, but now it does not feel so.
What should I do?
For example, all over in many SQL queries I am calculating similar to this:
...
(SELECT SUM(amount) FROM IT INNER JOIN Invoice I WHERE IT.invoiceId=I.id) AS total
...
FROM InvoiceTransaction IT
...
Note that I'm at home now so I'm writing this off the top of my head.

I think you have 2 solutions:
if the SQL returns a small amount of data, I would simply wrap the SQL invocation in a method call and call it (parameterising as necessary)
if the SQL handles a lot of data, I would keep that data in the database and use a stored procedure. You can then call that stored procedure without duplicating the code (but wrap the stored proc call in a function and call it - i.e. as in option 1)
I wouldn't necessarily shy away from stored procedures. But I would advise keeping business logic out of them (keep it in the application itself) and make sure you have sufficient unit testing around it.

I do not prefer store procedure, especially not for the sake of refactoring. You should consider writing a function that return the record you need, and put your SQL queries in that function so you can call it instead of putting your SQL everywhere.

I think we would need an example of a query. Stored procs might be a good option. Or an alternative might be to use views. One advantage in having your queries in views or stored procs is that you can often use the database to see where your tables are used. Disadvantage is that you are locking yourself into one database, however you are probably doing this anyway.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas