How to insert/update multiple records in a single call to create/update_attributes in Rhomobile

As per the performance tip in the Rhom API of Rhomobile, we should prepare the whole data set first and then call create/update_attributes once, for better performance than preparing a single record and calling create inside a loop.
As far as I know, the create method takes the attributes of a single record, like this:
@account = Account.create(
  {"name" => "some new record", "industry" => "electronics"}
)
So i wonder how to create/update multiple records on a single call?
Thanks in advance.

First, I have no idea how much this will actually affect performance, whether positively or negatively, and have never measured it.
That said, you can wrap all the CRUD calls in a transaction, to minimise the DB connections opened and closed. This can also help you with maintaining referential integrity, by rolling back changes if some record is causing a problem with your new dataset.
# Load all DB models, to ensure they are available before the first import
Rho::RHO.load_all_sources()

# Get a reference to the local database partition, to manage the transaction
db = ::Rho::RHO.get_db_partitions()['local']

db.start_transaction()   # BEGIN transaction

# ... do all your creates/updates/deletes here ...

if was_import_successful
  db.commit              # COMMIT transaction
else
  db.rollback()          # ROLLBACK transaction
end

Using Rhom, you can still write SQL queries against the underlying SQLite engine, but you need to understand which table format you're using.
The default PropertyBag models are all stored as a key-value store in a single table. If you're looking for maximum performance, you are better off switching to FixedSchema data models: you lose some flexibility, but you gain performance and save some space.
My suggestion is to use transactions, as you're already doing, switch to FixedSchema data models (a declaration sketch is shown below), and see if that is fast enough. If you really need more speed, maybe you can achieve what you want in a different way, such as importing a SQLite database created on the server side.
This is the method that RhoConnect uses for the bulk synchronization.
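For reference, a FixedSchema model in Rhodes is declared roughly like this (the model and property names below are illustrative; check the Rhom documentation for the exact syntax supported by your Rhodes version):

class Account
  include Rhom::FixedSchema

  set :schema_version, '1.0'

  property :name, :string
  property :industry, :string
end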

Related

When to use internal tables?

So, I have read that using internal tables increases the performance of the program and that we should perform as few operations on DB tables as possible. But I have started working on a project that does not use internal tables at all.
Some details:
It is a scanner that adds or removes products in/from a store. First the primary key is checked (to see if that type of product exists) and then the product is added or removed. We use ‘Insert Into’ and ‘Delete From’ to add/remove the products directly from the DB table.
I have not asked why they do not use internal tables because I do not have a better solution so far.
Here’s what I have so far: Insert all products in an internal table, place the deleted products in another internal table.
FORM update.
  MODIFY zop_db_table FROM TABLE gt_table.   " add all new products

  LOOP AT gt_deleted INTO gs_deleted.
    DELETE FROM zop_db_table WHERE index_nr = gs_deleted-index_nr.
  ENDLOOP.                                   " delete products
ENDFORM.
But when can I perform this update?
I could add a ‘Save’ button to perform the update, but then there is the risk that the user forgets to save large amounts of data, or drops the scanner and shuts it down, or similar situations. So this is clearly not a good solution.
My final question is: Is there a (good) way to implement internal tables in a project like this?
Internal tables should be used for data processing, like lists or arrays in other languages (C#, Java, ...). From a performance and system load perspective, it is preferable to first load all the data you need into an internal table, then process that internal table instead of loading individual records from the database.
But that is mostly true for reporting, which is probably the most common type of custom ABAP program. You often see developers use SELECT...ENDSELECT statements that, in effect, loop over a database table, transferring row after row to the report, one at a time. That is extremely slow compared to reading all records at once into an itab and then looping over the itab. More than once I've cut the execution time of a report down to a fraction just by eliminating roundtrips to the database.
If you have a good reason to read from the database or update records immediately, you should do so. If you can safely delay updates and deletes to a point in time where you can process all of them together, without risking inconsistencies, I'd consider that an improvement. But if there is a good reason (like consistency or data loss) to update immediately, do it.
Update: as @vwegert mentioned regarding the SELECT...ENDSELECT statement, the statement doesn't actually issue an individual database query for each row. The database interface of the application server optimizes the query, transferring rows in bulk to the application server. From there the records are transported to the ABAP report one by one (because the report only has a work area to store a single row), which has a significant performance impact, especially for queries with large result sets. A SELECT into an internal table can transport all rows directly to the ABAP report (as long as there is enough memory to hold them), since the internal table can then hold those records in the report.

Rails ActiveRecord - how can I lock a table for reading?

I have some Rails ActiveRecord code that looks like this:
new_account_number = Model.maximum(:account_number)
# Some processing that usually involves incrementing
# the new account number by one.
Model.create(foo: 12, bar: 34, account_number: new_account_number)
This code works fine on its own, but I have some background jobs that are processed by DelayedJob workers. There are two workers, and if they both start processing a batch of jobs that run this code, they end up creating new Model records with the same account_number, because of the delay between finding the maximum and creating a new record with a higher account number.
For now, I have solved it by adding a uniqueness constraint at the database level to the models table, and retrying by re-selecting the maximum whenever this constraint raises an exception.
However, it feels like a hack.
Adding auto-increment at the database level to the account_number column is not an option, because assigning the account_number entails more than just incrementing.
Ideally, I would like to lock the table in question for reading, so that no other process can execute the maximum select query against the table until I am done. However, I'm not sure how to go about that. I'm using PostgreSQL.
Based on the ActiveRecord::Locking docs it looks like Rails doesn't provide a built-in API for table-level locks.
But you can still do this with raw SQL. For Postgres, this looks like
ActiveRecord::Base.transaction do
  ActiveRecord::Base.connection.execute('LOCK table_name IN ACCESS EXCLUSIVE MODE')
  # ... work with the locked table here ...
end
The lock must be acquired within a transaction, and is automatically freed once the transaction ends.
Note that the SQL you use here will be different depending on your database.
Obviously, locking the entire table is neither elegant nor efficient, but for small apps it may well be the best solution for a while. It's simple and easy to reason about. In general, an advisory lock is a better fit for this kind of data race.
There are already answers on how to lock the entire table, but I believe you should try to avoid that. Instead I believe you should give advisory locks a look. It makes sure the same block of code isn't executed on two machines simultaneously, while still keeping the table open for other business.
It still uses the database, but it doesn't lock your tables.
You can use the gem called "with_advisory_lock" like this:
Model.with_advisory_lock("ADVISORY_LOCK_NAME") do
  # Your code
end
https://github.com/ClosureTree/with_advisory_lock
It doesn't work with SQLite.
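If you'd rather not add a gem, a roughly equivalent sketch uses PostgreSQL's built-in advisory lock functions through raw SQL inside a transaction (the lock key 42 is an arbitrary value you choose, and the simple +1 stands in for your real numbering logic):

ActiveRecord::Base.transaction do
  # pg_advisory_xact_lock blocks until the lock is free and is released
  # automatically when the transaction ends
  ActiveRecord::Base.connection.execute('SELECT pg_advisory_xact_lock(42)')
  new_account_number = Model.maximum(:account_number).to_i + 1
  Model.create(foo: 12, bar: 34, account_number: new_account_number)
end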
Setting a unique constraint IS NOT a hack. It is what keeps your data consistent.
By the way, you have a few more options here:
Lock some DB resource (e.g. a dedicated unique record) using SELECT FOR UPDATE, or use PostgreSQL's advisory locks (see docs).
Use a sequence (docs); a sketch follows below.
The main difference between the two approaches is that #1 does not allow gaps in your numbers, because other sessions wait for the transaction to commit, while #2 does allow gaps.
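A minimal sketch of the sequence approach, assuming a PostgreSQL sequence named account_numbers_seq has already been created with CREATE SEQUENCE (the name is illustrative):

# nextval is atomic, so two workers can never receive the same number
next_number = ActiveRecord::Base.connection.select_value(
  "SELECT nextval('account_numbers_seq')"
).to_i
Model.create(foo: 12, bar: 34, account_number: next_number)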
You don't have to lock the whole table to restrict a piece of code to a single process at a time; locking a full table causes performance problems. You can always lock the same single row with the with_lock method, so the code is fully protected. No extra gem is needed, and it also creates a transaction. Like this:
m = Model.order(:id).first
m.with_lock do  # acquire lock
  # some code here, run by a single process at a time
end             # lock released
Well, technically it's the same whether you lock the table itself or always lock a record in another table before accessing it.
So you may have another table with at most one record, and always lock that record with http://api.rubyonrails.org/classes/ActiveRecord/Locking/Pessimistic.html before reading from or writing to the table you want to lock:
LockTable.last.with_lock do
  # the things needed for your table
end

Encapsulating Data Access logic

I have recently purchased the DoFactory framework in an attempt to learn more about design patterns. The product is good, but it concentrates only on transactional operations, e.g. a user updating or inserting a customer record.
I have an app that has scheduled tasks, e.g. one thousand customers are created overnight via a web service. I am trying to understand the best way to approach this:
Option 1
Public Shared Sub InsertCustomerBatch(ByVal customers As List(Of Customer))
    For Each customer As Customer In customers
        ' Connect to database
        ' Insert customer
    Next
End Sub
Option 2
Public Sub InsertCustomer(ByVal list As List(Of typeCustomer))
    For Each item As typeCustomer In list
        Customer.Insert(item)
    Next
End Sub
Both options will obviously work; however, I believe that option 2 is "better" because it follows design principles, i.e. Customer.Insert is encapsulated in the Customer class.
However, after talking to a more senior developer earlier, he said to choose option 1, but I do not understand why. Is option 1 "better"?
I think one has to justify why a connection has to be opened and closed with every row in a batch scenario (option 1). One advantage may be the implicit commit. However, frequent commits are not usually required in batch processing in many LOB applications. A business decision may have to be taken to determine the sensitivity of the data, of course. However, it makes sense to commit rows in groups of reasonable size (bounded by the DB log size).
One way is to divide a large batch into several small logical batches and commit each batch separately. Another way is to use bulk copy to insert rows into the DB when appropriate (see, for example, bulk insert data into db). Also note that, by default, constraints on the table are not checked for the bulk copy operation unless CHECK_CONSTRAINTS is specified.
Also, it may be good to check the connection timeout setting in case it has an effect on long transaction processing (not sure about that one). However, I guess in your case the defaults should work fine.
In conclusion, I would suggest you go with option 2, possibly with some modifications as suggested above if your case involves a large number of rows.

How should I keep accurate records summarising multiple tables?

I have a normalized database and need to produce web-based reports frequently that involve joins across multiple tables. These queries are taking too long, so I'd like to keep the results precomputed so that I can load pages quickly. There are frequent updates to the tables I am summarising, and I need the summary to reflect all updates so far.
All tables have auto-incrementing primary integer keys, I almost always add new rows, and I can arrange to clear the computed results if they change.
I approached a similar problem where I needed a summary of a single table by iterating over each row and keeping track of the iterator state and the highest primary key (i.e. the "high-water mark") seen. That's fine for a single table, but for multiple tables I'd end up keeping one high-water value per table, and that feels complicated. Alternatively, I could denormalise down to one table (with fairly extensive application changes), which feels like a step backwards and would probably grow my database from about 5 GB to about 20 GB.
(I'm using sqlite3 at the moment, but MySQL is also an option.)
I see two approaches:
You move the data into a separate, denormalized database, with some precalculation, to optimize it for quick access and reporting (sounds like a small data warehouse). This implies you have to think of some jobs (scripts, a separate application, etc.) that copy and transform the data from the source to the destination. Depending on how you want the copying to be done (full/incremental), the frequency of copying, and the complexity of the data model (both source and destination), it might take a while to implement and then to optimize the process. It has the advantage of leaving your source database untouched.
You keep the current database, but you denormalize it. As you said, this might imply changes in the logic of the application (but you might find a way to minimize the impact on the logic using the database; you know the situation better than me :) ).
Can the reports be refreshed incrementally, or is it a full recalculation to rework the report? If it has to be a full recalculation then you basically just want to cache the result set until the next refresh is required. You can create some tables to contain the report output (and metadata table to define what report output versions are available), but most of the time this is overkill and you are better off just saving the query results off to a file or other cache store.
If it is an incremental refresh, then you need the PK ranges to work with anyhow, so you would want something like your high-water-mark data (except you may want to store min/max pairs), along the lines of the sketch below.
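A rough sketch of that kind of incremental refresh, using the sqlite3 Ruby gem and driven by a stored high-water mark (all table and column names here are illustrative):

require "sqlite3"

db = SQLite3::Database.new("app.db")

db.transaction do
  # last primary key that has already been folded into the summary
  last_id = db.get_first_value("SELECT max_order_id FROM summary_state") || 0

  # fold only the rows added since the last refresh into the summary
  db.execute(<<-SQL, last_id)
    UPDATE summary
    SET total_order_value = total_order_value +
        (SELECT COALESCE(SUM(order_value), 0) FROM orders WHERE id > ?)
  SQL

  # advance the high-water mark
  db.execute("UPDATE summary_state
              SET max_order_id = (SELECT COALESCE(MAX(id), 0) FROM orders)")
end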
You can create triggers.
As soon as one of the calculated values changes, you can do one of the following:
Update the calculated field (Preferred)
Recalculate your summary table
Store a flag that a recalculation is necessary. The next time you need the calculated values, check this flag first and recalculate if necessary
Example:
CREATE TRIGGER update_summary_table UPDATE OF order_value ON orders
BEGIN
  UPDATE summary
  SET total_order_value = total_order_value
                          - old.order_value
                          + new.order_value;
  -- OR: do a complete recalculation
  -- OR: store a flag
END;
More Information on SQLite triggers: http://www.sqlite.org/lang_createtrigger.html
In the end I arranged for a single program instance to make all database updates, and maintain the summaries in its heap, i.e. not in the database at all. This works very nicely in this case but would be inappropriate if I had multiple programs doing database updates.
You haven't said anything about your indexing strategy. I would look at that first - making sure that your indexes are covering.
Then I think the trigger option discussed is also a very good strategy.
Another possibility is the regular population of a data warehouse with a model suitable for high performance reporting (for instance, the Kimball model).

Is every DDL SQL command reversible? [database version control]

I want to set up a mechanism for tracking DB schema changes, such as the one described in this answer:
For every change you make to the database, you write a new migration. Migrations typically have two methods: an "up" method in which the changes are applied and a "down" method in which the changes are undone. A single command brings the database up to date, and can also be used to bring the database to a specific version of the schema.
My question is the following: is every DDL command in an "up" method reversible? In other words, can we always provide a "down" method? Can you imagine any DDL command that cannot be "down"ed?
Please do not consider the typical data migration problem where the "up" method loses data: e.g. when changing a field's type from datetime (DateOfBirth) to int (YearOfBirth), we lose data that cannot be restored.
In SQL Server, every DDL command that I know of has an up/down pair.
Other than loss of data, every migration I've ever done is reversible. That said, Rails offers a way to mark a migration as "destructive":
Some transformations are destructive in a manner that cannot be reversed. Migrations of that kind should raise an ActiveRecord::IrreversibleMigration exception in their down method.
See the API documentation here.
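For illustration, a minimal sketch of such a migration (the class, table, and column names are made up; the exact Migration superclass syntax depends on your Rails version):

class DropLegacyCode < ActiveRecord::Migration
  def up
    remove_column :accounts, :legacy_code
  end

  def down
    # The dropped column's data cannot be restored, so refuse to reverse
    raise ActiveRecord::IrreversibleMigration
  end
end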
Yes, you've identified cases where you lose data, either by transforming it or simply DROP COLUMN in the "up" migration.
Another example is that you could drop a SEQUENCE object, thus losing its state. The "down" migration would recreate the sequence, but it would start over at 1. This could cause duplicate values to be generated by the sequence. Not a problem if you're performing a migration on an empty database, and you want the sequence to start at 1 anyway, but if you have some number of rows of data, you'd want the sequence to be reset to the greatest value currently in use, which is hard to do reliably, unless you have an exclusive lock on that table.
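As a sketch of what such a "down" method would have to attempt (PostgreSQL syntax, run here through ActiveRecord; the sequence, table, and column names are illustrative):

conn = ActiveRecord::Base.connection
conn.execute("CREATE SEQUENCE account_number_seq")
# Reset the sequence to the greatest value currently in use (or 1 if empty)
conn.execute(<<-SQL)
  SELECT setval('account_number_seq',
                COALESCE((SELECT MAX(account_number) FROM accounts), 1))
SQL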
Any other DDL that depends on the state of data in the database has similar problems. That's probably not a good schema design in the first place; I'm just trying to think of cases that fit your question.