When to use internal tables? - ABAP

So, I have read that using internal tables increases the performance of the program and that we should perform as few operations on DB tables as possible. But I have started working on a project that does not use internal tables at all.
Some details:
It is a scanner that adds or removes products in/from a store. First the primary key is checked (to see if that type of product exists) and then the product is added or removed. We use ‘Insert Into’ and ‘Delete From’ to add/remove the products directly from the DB table.
I have not asked why they do not use internal tables because I do not have a better solution so far.
Here's what I have so far: insert all new products into an internal table, and place the deleted products in another internal table.
FORM update.
  MODIFY zop_db_table FROM TABLE gt_table. " to add all new products

  LOOP AT gt_deleted INTO gs_deleted.
    DELETE FROM zop_db_table WHERE index_nr = gs_deleted-index_nr.
  ENDLOOP. " to delete products
ENDFORM.
But when can I perform this update?
I could add a 'Save' button to perform the update, but then there is the risk that the user forgets to save large amounts of data, or drops the scanner and shuts it down, or similar situations. So this is clearly not a good solution.
My final question is: Is there a (good) way to implement internal tables in a project like this?

Internal tables should be used for data processing, like lists or arrays in other languages (C#, Java, ...). From a performance and system-load perspective it is preferable to first load all the data you need into an internal table, then process that internal table instead of loading individual records from the database.
But that is mostly true for reporting, which is probably the most common type of custom ABAP program. You often see developers use SELECT ... ENDSELECT statements that in effect loop over a database table, transferring row after row to the report, one at a time. That is extremely slow compared to reading all records at once into an itab and then looping over the itab. More than once I've cut the execution time of a report down to a fraction just by eliminating round trips to the database.
If you have a good reason to read from the database or update records immediately, you should do so. If you can safely delay updates and deletes to a point in time where you can process all of them together, without risking inconsistencies, I'd consider that an improvement. But if there is a good reason (like consistency or data loss) to update immediately, do it.
Update: as @vwegert mentioned regarding the SELECT ... ENDSELECT statement, it doesn't actually create an individual database query for each row. The database interface of the application server optimizes the query, transferring rows in bulk to the application server. From there the records are transported to the ABAP report one by one (because the report only has a work area to store a single row), which has a significant performance impact, especially for queries with large result sets. A SELECT into an internal table can transport all rows directly to the ABAP report (as long as there is enough memory to hold them), because now the report has an internal table to hold those records.

Related

Is it possible to implement point in time recovery (PITR) in PostgreSQL for a single table?

Let's say I have a database with lots of tables, but there's one big table that's being updated regularly. At any given point in time, this table contains billions of rows, and it is updated so regularly that we can expect a 100% refresh of the table by the end of each quarter. So the volume of data being moved around is on the order of tens of billions of rows. Because this table is changing so constantly, I want to implement PITR, but only for this one table. I have two options:
Hack PostgreSQL's built-in PITR to apply to only one table.
Build it myself by creating a base backup, setting up continuous archiving, and using a Python script to replay the log of SQL statements up to a point in time (or using PostgreSQL's EXECUTE statement to loop through the archive). The big con with this is that it won't have the timeline functionality.
My problem is that I don't know whether option 1 is even possible, and I don't know whether option 2 even makes sense (looping through billions of rows sounds like it defeats the purpose of PITR, which is speed and convenience). What other options do I have?

jOOQ insert into select with a large number of records

I'm trying to insert a large number of records by selecting from a different table.
In the example below, the BAR table has around 1 million records, and I'm trying to insert all of them into the FOO table. Is there a way to do this efficiently with jOOQ, without the loader API or batch inserts?
FYI, I'm trying to avoid loading all the records into memory, so I'm not using the loader API, which expects jOOQ Records.
dslContext
    .insertInto(FOO)
    .columns(FOO.A, FOO.B)
    .select(
        select(A, B)
        .from(BAR))
    .execute();
This isn't strictly a jOOQ problem as you'd run into the same issues when writing the equivalent query in JDBC or even in a stored procedure. Such a bulk data transfer operation is usually the most efficient way to copy data between tables using SQL. There might be other tools available that bypass the SQL layer (e.g. pg_dump), but with SQL, this is optimal.
If you don't have enough resources to run everything in one go, you could partition your data set into several chunks using different techniques:
By transferring data of individual date ranges
By transferring data of individual ID ranges
By using keyset pagination
When partitioning your data as mentioned above, also check whether you can decrease the transaction size, e.g. to 1000 rows per commit. This isn't an exact science; you'll have to find appropriate chunk and transaction sizes empirically for your specific system.
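For illustration, one chunk of a keyset-paginated copy could look roughly like the plain SQL below. This is only a sketch: the id column, the :last_copied_id bind variable and the chunk size of 1000 are assumptions, the exact LIMIT syntax varies by database, and the same statement can also be built with jOOQ's DSL.
-- Copy the next chunk of BAR into FOO, commit, then advance :last_copied_id
-- to the largest id just copied and repeat until no rows are inserted.
INSERT INTO foo (a, b)
SELECT a, b
FROM   bar
WHERE  bar.id > :last_copied_id   -- keyset predicate: only rows not copied yet
ORDER  BY bar.id
LIMIT  1000;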
With all of these approaches, ACID is no longer guaranteed, so if your source data is modified during the move, you'll have to detect that somehow and "fix it" (e.g. by flagging rows that have been moved).
Or, just add more memory to the system.

What are the benefits of a Make Table vs a Select query in Access?

I know you can run SELECT queries on top of SELECT queries in Access, but the application also provides the Make Table query type.
I'm wondering what the benefits/reasons for using Make Table might be?
You would usually use Make Table for performance reasons. If you have a fairly complex query that returns a subset of your table's data, and that you may need to retrieve multiple times, it can be expensive to re-run that query each time.
Using Make Table allows you to incur the cost of running the expensive query once and save a copy of its results into a table. Querying this copy is then a lot less expensive than running your original expensive query.
This is usually a good option when you don't expect the original data to change frequently, or when you don't mind working off a copy of the data that may not be 100% up to date with the original.
Notice what the following article on Create a make table query has to say:
Typically, you create make table queries when you need to copy or archive data. For example, suppose you have a table (or tables) of past sales data, and you use that data in reports. The sales figures cannot change because the transactions are at least one day old, and constantly running a query to retrieve the data can take time — especially if you run a complex query against a large data store. Loading the data into a separate table and using that table as a data source can reduce workload and provide a convenient data archive. As you proceed, remember that the data in your new table is strictly a snapshot; it has no relationship or connection to its source table or tables.
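In Access SQL, a make table query is written as SELECT ... INTO. A minimal sketch, in which the table names, column names and date condition are only placeholders:
-- Run the expensive query once and keep a snapshot of its results in a new table.
SELECT CustomerID, OrderDate, OrderTotal
INTO   SalesSnapshot                    -- the table the make table query creates
FROM   Orders
WHERE  OrderDate < Date() - 1;          -- e.g. only transactions at least one day old
Subsequent reports can then read from SalesSnapshot instead of re-running the query, keeping in mind that the snapshot has no connection to its source tables.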
The main objection here is that a make table query creates a table. When you are done with that table, you then have to spend the time and effort to delete it and to recover the very large increase in database file size that it caused. For general reports and everyday queries of data, a plain query makes much more sense. A comparison would be building a NEW garage every time you want to park your car.
The database engine and query system can fetch rows at a very high rate, and those results can be rendered into a report or form without ever creating a temp table. It makes little sense to go to the trouble of having the system create a WHOLE NEW table for results that can easily be sent straight to a report.
In other words, creating a whole table just to display or use data that the database engine has already fetched and returned makes little sense. A table is a set of rows that holds data that can be updated, and its contents are permanent. A query is an "on the fly" result set, a subset of the data that exists only in memory and is discarded after you use it.
So for general reporting and display of data, it makes no sense to create a temp table. A MUCH WORSE issue is that if two users want to run a report, each needing different results, and you send both result sets to the SAME temp table, you get a big mess and a collision between the two users. So using a temp table in Access mostly makes little sense, and EVEN MORE so in a multi-user environment. And, as noted, once the table has been created and you are done with it, you need to delete and remove it; with many users in a multi-user database this becomes even more of a problem.
However, as pointed out, in a multi-user environment sending the results to a temp table can be useful if the resulting data needs additional processing. This approach assumes that EACH USER has their own front end, i.e. their own copy of the application side, and ideally the temp table is created in that local front end rather than in the shared database. Since the application part (front end) is placed on each computer, the temp table is then not created in the production database (back end), so multiple users can work correctly without each of them creating temp tables in the production back end. So if you adopt a make table query in a multi-user database application, it should probably run on each local workstation and not in the back end database.
Thus, for the most part, a make table query and the reporting or querying of data are VERY different goals and tasks. As a general rule, you don't want to create a whole brand new table for a simple query. In a multi-user system the users might run hundreds of reports in a given day, and FEW if any systems will send that data to a temp table instead of sending the query results directly to the report.
It creates a table, which is useful if you actually need one, for example for temporary use where you have to modify the data for calculations or further processing without disturbing the original data.

Is it OK to loop a SQL query in a programming language?

I have a doubt about retrieving data from a database.
There are two tables, and the master table's id is always inserted into the other table.
I know that data can be retrieved from the two tables with a join, but I want to know:
if I first retrieve all the data I need from the master table and then, in a loop (in the programming language), query the other table for each row, which approach is more efficient and why?
As far as efficiency goes the rule is you want to minimize the number of round trips to the database, because each trip adds a lot of time. (This may not be as big a deal if the database is on the same box as the application calling it. In the world I live in the database is never on the same box as the application.) Having your application loop means you make a trip to the database for every row in the master table, so the time your operation takes grows linearly with the number of master table rows.
Be aware that in dev or test environments you may be able to get away with inefficient queries if there isn't very much test data. In production you may see a lot more data than you tested with.
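Concretely, looping in the application issues one query per master row, roughly this pattern (the table and column names and the :current_master_id bind variable are just placeholders):
-- One round trip for the master rows...
SELECT id, name FROM master;
-- ...then, inside the application loop, one more round trip per master row:
SELECT * FROM detail WHERE master_id = :current_master_id;   -- executed N times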
It is more efficient to work in the database, in fewer, larger queries, but unless the site or program is going to be very busy, I doubt it will make much difference whether the loop runs inside the database or outside it. If it is a web application, then running large loops outside the database and waiting for the results will take a significantly larger amount of time.
What you're describing is sometimes called the N+1 problem. The 1 is your first query against the master table, the N is the number of queries against your detail table.
This is almost always a big mistake for performance.*
The problem is typically associated with using an ORM. The ORM queries your database entities as though they were plain objects, and the mistake is to assume that instantiating a data entity is no more costly than creating an ordinary object. But of course you can write code that does the same thing yourself, without using an ORM.
The hidden cost is that you now have code that automatically runs N queries, and N is determined by the number of matching rows in your master table. What happens when 10,000 rows match your master query? You won't get any warning before your database is expected to execute those queries at runtime.
And it may be unnecessary. What if the master query matches 10,000 rows, but you really only wanted the 27 rows for which there are detail rows (in other words, an INNER JOIN)?
Some people are concerned with the number of queries because of network overhead. I'm not as concerned about that. You should not have a slow network between your app and your database. If you do, then you have a bigger problem than the N+1 problem.
I'm more concerned about the overhead of running thousands of queries per second when you don't have to. The overhead is in memory and all the code needed to parse and create an SQL statement in the server process.
Just Google for "sql n+1 problem" and you'll find lots of people discussing how bad this is, how to detect it in your code, and how to solve it (spoiler: do a JOIN).
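A minimal sketch of the JOIN version, assuming the same placeholder master and detail tables as in the sketch above, so that one round trip replaces N+1 queries:
-- The database does the looping internally and returns everything in one result set.
SELECT m.id, m.name, d.*
FROM   master m
JOIN   detail d ON d.master_id = m.id;   -- an INNER JOIN also drops master rows with no details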
* Of course every rule has exceptions, so to answer this for your application, you'll have to do load-testing with some representative sample of data and traffic.

How should I keep accurate records summarising multiple tables?

I have a normalized database and need to frequently produce web-based reports that involve joins across multiple tables. These queries are taking too long, so I'd like to keep the computed results around so that I can load pages quickly. There are frequent updates to the tables I am summarising, and I need the summary to reflect all updates so far.
All tables have autoincrement integer primary keys, I almost always add new rows, and I can arrange to clear the computed results if existing rows change.
I approached a similar problem, where I needed a summary of a single table, by arranging to iterate over each row in the table and keeping track of the iterator state and the highest primary key (i.e. the "high water mark") seen. That's fine for a single table, but for multiple tables I'd end up keeping one high water value per table, and that feels complicated. Alternatively, I could denormalise down to one table (with fairly extensive application changes), which feels like a step backwards and would probably grow my database from about 5 GB to about 20 GB.
(I'm using sqlite3 at the moment, but MySQL is also an option).
I see two approaches:
You move the data into a separate, denormalized database with some precalculation, optimized for quick access and reporting (this sounds like a small data warehouse). This implies you need some jobs (scripts, a separate application, etc.) that copy and transform the data from the source to the destination. Depending on how the copying is done (full/incremental), the frequency of copying and the complexity of the data model (both source and destination), it might take a while to implement and then optimize the process. It has the advantage of leaving your source database untouched.
You keep the current database, but you denormalize it. As you said, this might imply changes in the logic of the application (but you might find a way to minimize the impact on the logic that uses the database; you know the situation better than I do :) ).
Can the reports be refreshed incrementally, or does reworking a report require a full recalculation? If it has to be a full recalculation, then you basically just want to cache the result set until the next refresh is required. You can create some tables to contain the report output (and a metadata table to define which report output versions are available), but most of the time this is overkill and you are better off just saving the query results to a file or other cache store.
If it is an incremental refresh then you need the PK ranges to work with anyhow, so you would want something like your high water mark data (except you may want to store min/max pairs).
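As a sketch of the high water mark approach, assuming SQLite 3.24+ (for the upsert) and hypothetical orders, summary and report_state tables: summary is assumed to have a unique index on customer_id, and report_state is a single-row table holding the high water mark.
-- Fold in only the order rows added since the last refresh;
-- run both statements in one transaction.
INSERT INTO summary (customer_id, total_order_value)
SELECT o.customer_id, SUM(o.order_value)
FROM   orders o
WHERE  o.id > (SELECT high_water FROM report_state)
GROUP  BY o.customer_id
ON CONFLICT (customer_id)
DO UPDATE SET total_order_value = total_order_value + excluded.total_order_value;

-- Advance the high water mark so the next refresh only sees newer rows.
UPDATE report_state
SET    high_water = (SELECT MAX(id) FROM orders);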
You can create triggers.
As soon as one of the calculated values changes, you can do one of the following:
Update the calculated field (Preferred)
Recalculate your summary table
Store a flag that a recalculation is necessary. The next time you need the calculated values check this flag first and do the recalculation if necessary
Example:
CREATE TRIGGER update_summary_table
AFTER UPDATE OF order_value ON orders
BEGIN
  UPDATE summary
  SET total_order_value = total_order_value
                          - old.order_value
                          + new.order_value;
  -- OR: do a complete recalculation here
  -- OR: just store a "recalculation needed" flag
END;
More information on SQLite triggers: http://www.sqlite.org/lang_createtrigger.html
In the end I arranged for a single program instance to make all database updates, and maintain the summaries in its heap, i.e. not in the database at all. This works very nicely in this case but would be inappropriate if I had multiple programs doing database updates.
You haven't said anything about your indexing strategy. I would look at that first - making sure that your indexes are covering.
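For example, an index that contains every column a summary query touches lets the engine answer that query from the index alone; the names below are illustrative only.
-- Covering index for a report that sums order_value per customer:
-- the query can be satisfied without visiting the base table.
CREATE INDEX idx_orders_customer_value ON orders (customer_id, order_value);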
Then I think the trigger option discussed is also a very good strategy.
Another possibility is the regular population of a data warehouse with a model suitable for high performance reporting (for instance, the Kimball model).