I use HSQLDB 2.3.2 and I've run into the following issues:
My database has a CACHED table with 10,000,000 records and no constraints or indexes; its size is about 900 MB. I turn off autocommit mode, and when I execute "TRUNCATE TABLE tableName", execution hangs while only dbName.backup keeps growing. The trace log shows why:
TRACE ENGINE:? - copyShadow [size, time] 2246252 9721
TRACE ENGINE:? - setFileModified flag set
TRACE ENGINE:? - cache save rows [count,time] totals 24801,9921 operation 24801,9921 txts 96
TRACE ENGINE:? - copyShadow [size, time] 4426920 7732
TRACE ENGINE:? - cache save rows [count,time] totals 49609,17775 operation 24808,7854 txts 96
TRACE ENGINE:? - copyShadow [size, time] 6574796 9024
It takes about 1500-2000 seconds, and in the end I get either an empty table or an exception like this:
Tests run: 0, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 1,525.509 sec
org.apache.maven.surefire.util.SurefireReflectionException: java.lang.reflect.InvocationTargetException; nested exception is java.lang.reflect.InvocationTargetException: null
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.maven.surefire.util.ReflectionUtils.invokeMethodWithArray(ReflectionUtils.java:164)
at org.apache.maven.surefire.booter.ProviderFactory$ProviderProxy.invoke(ProviderFactory.java:110)
at org.apache.maven.surefire.booter.SurefireStarter.invokeProvider(SurefireStarter.java:175)
at org.apache.maven.surefire.booter.SurefireStarter.runSuitesInProcessWhenForked(SurefireStarter.java:107)
at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:68)
Caused by: java.lang.OutOfMemoryError: Java heap space
After the truncation finishes, dbName.backup is removed. I don't need any backups in my application, so how can I avoid the copying?
Editing dbName.properties doesn't work. It contains the following text:
#HSQL Database Engine 2.3.2
#Thu Mar 19 08:42:10 EAT 2015
version=2.3.2
modified=no
I tried to append hsqldb.applog=1, but nothing happened. dbName.app.log only appears if you change the line SET DATABASE EVENT LOG LEVEL 1 in dbName.script.
After my application has worked with the database, dbName.properties is overwritten: 'modified' changes to 'yes' and any lines below it are deleted. What am I doing wrong?
My database has cached table with 10 000 000 records without any constraints and indexes.
It is simply wrong to have a large table without any constraints or indexes. Any SELECT, UPDATE or DELETE on this table that affects only a few rows still has to scan every row in the table.
I try to execute "Truncate table tableName"
The above statement allows you to roll back the action before you commit. The information needed for rollback is stored in memory, and memory runs out when the table is very large. Since you do want to commit the change, use this statement instead:
TRUNCATE TABLE tableName AND COMMIT
This is mentioned in the Guide: http://hsqldb.org/doc/2.0/guide/dataaccess-chapt.html#dac_truncate_statement
The only addition that you can make to the .properties file is to set the database files readonly. Any other change will be ignored and deleted.
The .backup file is for internal use of the database engine and you cannot stop its use.
I am using Spark's JDBC writer to load data from Hive into a Teradata view. I am using 200 vcores and have partitioned the data into 10,000 partitions.
Spark tasks are failing with the error below, resulting in stage failure. Sometimes the application finishes successfully, but with some duplicate records:
caused by: java.sql.SQLException: [Teradata Database] [TeraJDBC 16.20.00.10] [Error 2631] [SQLState 40001] Transaction ABORTed due to deadlock.
Below is the code I have used:
val df = spark.sql("select * from hive_table").distinct
df.repartition(10000)
  .write
  .mode("overwrite")
  .option("truncate", "true")
  .jdbc(url, dbTable, dproperties)
Teradata view is created with "AS LOCKING ROW FOR ACCESS". The table also has a unique PI.
I am unable to figure out why some Spark tasks fail with a deadlock error, and whether there is a way to keep my entire Spark application from failing because of these task failures.
Dozens of sessions trying to insert into the same table will likely cause a deadlock. Even though the view is defined with an access lock, a write lock must be obtained in order to insert rows into the backing table.
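One client-side mitigation (an addition beyond the answer above, not a Teradata feature) is to retry a partition's write when it is aborted with deadlock error 2631 / SQLState 40001. A minimal sketch in Python; the `write_partition` callable and the error-string matching are assumptions for illustration:

```python
import random
import time

def retry_on_deadlock(write_partition, max_attempts=5, base_delay=0.1):
    """Call write_partition(); retry with jittered exponential backoff
    when the raised error looks like a Teradata deadlock abort."""
    for attempt in range(1, max_attempts + 1):
        try:
            return write_partition()
        except Exception as exc:
            deadlock = "2631" in str(exc) or "40001" in str(exc)
            if not deadlock or attempt == max_attempts:
                raise
            # Sleep with jitter so retrying sessions don't collide again.
            time.sleep(base_delay * (2 ** (attempt - 1)) * random.random())
```

Note that blindly retrying a non-idempotent batch insert can produce exactly the duplicate records mentioned in the question, so the retried unit of work needs to be idempotent for its key range.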
I've created SQL that updates all values in one column:
UPDATE `Blackout_POC2.measurements_2020`
SET visitor.customerId_enc = enc.encrypted
FROM `Blackout_POC2.encrypted` AS enc
WHERE dateAmsterdam="2020-01-05"
AND session.visitId = enc.visitId
AND visitor.customerId = enc.plain
where dateAmsterdam is the partition key of the measurements_2020 table, and encrypted is a non-partitioned table that holds visitId, plain and encrypted fields. The code sets all values in the customerId_enc column with values from the encrypted table.
The code works perfectly fine when I run it one day at a time, but when I run several days in parallel, I occasionally (1% or so) get a stack trace from my bq query <sql> (see below).
I thought that I could modify a partitioned table in parallel as long as each statement touches its own partition, but that occasionally seems not to be the case. Could someone point me to where this behaviour is documented, and preferably how to avoid it?
I can probably just rerun the query, since it is idempotent, but I would like to know why this happens.
Thanks
Bart van Deenen, data-engineer Bol.com
Error in query string: Error processing job 'bolcom-dev-blackout-339:bqjob_r131fa5b3dfd24829_0000016faec5e5da_1':
domain: "cloud.helix.ErrorDomain"
code: "QUERY_ERROR"
argument: "Could not serialize access to table bolcom-dev-blackout-339:Blackout_POC2.measurements_2020 due to concurrent update"
debug_info: "[CONCURRENT_UPDATE] Table modified by concurrent UPDATE/DELETE/MERGE DML or truncation at 1579185217979. Storage set job_uuid: 03d3d5ec-2118-4e43-9fec-1eae99402c86:20200106, instance_id: ClonedTable-1579183484786, Reason: code=CONCURRENT_UPDATE message=Could not serialize access to table bolcom-dev-blackout-339:Blackout_POC2.measurements_2020 due to concurrent update debug=Table modified by concurrent UPDATE/DELETE/MERGE DML or truncation at 1579185217979. Storage set job_uuid: 03d3d5ec-2118-4e43-9fec-1eae99402c86:20200106, instance_id: ClonedTable-1579183484786
errorProto=domain: "cloud.helix.ErrorDomain"
code: "QUERY_ERROR"
argument: "Could not serialize access to table bolcom-dev-blackout-339:Blackout_POC2.measurements_2020 due to concurrent update"
debug_info: "Table modified by concurrent UPDATE/DELETE/MERGE DML or truncation at 1579185217979. Storage set job_uuid: 03d3d5ec-2118-4e43-9fec-1eae99402c86:20200106, instance_id: ClonedTable-1579183484786"
    at com.google.cloud.helix.common.Exceptions$Public.concurrentUpdate(Exceptions.java:381)
    at com.google.cloud.helix.common.Exceptions$Public.concurrentUpdate(Exceptions.java:373)
    at com.google.cloud.helix.server.metadata.StorageTrackerData.verifyStorageSetUpdate(StorageTrackerData.java:224)
    at com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner.validateUpdates(AtomicStorageTrackerSpanner.java:1133)
    at com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner.updateStorageSets(AtomicStorageTrackerSpanner.java:1310)
    at com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner.updateStorageSets(AtomicStorageTrackerSpanner.java:1293)
    at com.google.cloud.helix.server.metadata.MetaTableTracker.updateStorageSets(MetaTableTracker.java:2274)
    at com.google.cloud.helix.server.job.StorageSideEffects$1.update(StorageSideEffects.java:1123)
    at com.google.cloud.helix.server.job.StorageSideEffects$1.update(StorageSideEffects.java:976)
    at com.google.cloud.helix.server.metadata.MetaTableTracker$1.update(MetaTableTracker.java:2510)
    at com.google.cloud.helix.server.metadata.StorageTrackerSpanner.lambda$atomicUpdate$7(StorageTrackerSpanner.java:165)
    at com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner$Factory$1.run(AtomicStorageTrackerSpanner.java:3775)
    at com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner$Factory.lambda$performJobWithCommitResult$0(AtomicStorageTrackerSpanner.java:3792)
    at com.google.cloud.helix.server.metadata.persistence.SpannerTransactionContext$RetryCountingWork.run(SpannerTransactionContext.java:1002)
    at com.google.cloud.helix.server.metadata.persistence.SpannerTransactionContext$Factory.executeWithResultInternal(SpannerTransactionContext.java:840)
    at com.google.cloud.helix.server.metadata.persistence.SpannerTransactionContext$Factory.executeOptimisticWithResultInternal(SpannerTransactionContext.java:722)
    at com.google.cloud.helix.server.metadata.persistence.SpannerTransactionContext$Factory.lambda$executeOptimisticWithResult$1(SpannerTransactionContext.java:716)
    at com.google.cloud.helix.server.metadata.persistence.SpannerTransactionContext$Factory.executeWithMonitoring(SpannerTransactionContext.java:942)
    at com.google.cloud.helix.server.metadata.persistence.SpannerTransactionContext$Factory.executeOptimisticWithResult(SpannerTransactionContext.java:715)
    at com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner$Factory.performJobWithCommitResult(AtomicStorageTrackerSpanner.java:3792)
    at com.google.cloud.helix.server.metadata.AtomicStorageTrackerSpanner$Factory.performJobWithCommitResult(AtomicStorageTrackerSpanner.java:3720)
    at com.google.cloud.helix.server.metadata.StorageTrackerSpanner.atomicUpdate(StorageTrackerSpanner.java:159)
    at com.google.cloud.helix.server.metadata.MetaTableTracker.atomicUpdate(MetaTableTracker.java:2521)
    at com.google.cloud.helix.server.metadata.StatsRequestLoggingTrackers$LoggingStorageTracker.lambda$atomicUpdate$8(StatsRequestLoggingTrackers.java:494)
    at com.google.cloud.helix.server.metadata.StatsRequestLoggingTrackers$StatsRecorder.record(StatsRequestLoggingTrackers.java:181)
    at com.google.cloud.helix.server.metadata.StatsRequestLoggingTrackers$StatsRecorder.record(StatsRequestLoggingTrackers.java:158)
    at com.google.cloud.helix.server.metadata.StatsRequestLoggingTrackers$StatsRecorder.access$500(StatsRequestLoggingTrackers.java:123)
    at com.google.cloud.helix.server.metadata.StatsRequestLoggingTrackers$LoggingStorageTracker.atomicUpdate(StatsRequestLoggingTrackers.java:493)
    at com.google.cloud.helix.server.job.StorageSideEffects.apply(StorageSideEffects.java:1238)
    at com.google.cloud.helix.server.rosy.MergeStorageImpl.commitChanges(MergeStorageImpl.java:936)
    at com.google.cloud.helix.server.rosy.MergeStorageImpl.merge(MergeStorageImpl.java:729)
    at com.google.cloud.helix.server.rosy.StorageStubby.mergeStorage(StorageStubby.java:937)
    at com.google.cloud.helix.proto2.Storage$ServiceParameters$21.handleBlockingRequest(Storage.java:2100)
    at com.google.cloud.helix.proto2.Storage$ServiceParameters$21.handleBlockingRequest(Storage.java:2098)
    at com.google.net.rpc3.impl.server.RpcBlockingApplicationHandler.handleRequest(RpcBlockingApplicationHandler.java:28)
    ....
BigQuery DML operations don't support multi-statement transactions; nevertheless, you can execute some statements concurrently:
UPDATE and INSERT
DELETE and INSERT
INSERT and INSERT
For example, if you execute two UPDATE statements simultaneously against a table, only one of them will succeed.
Keeping in mind that an UPDATE can only run concurrently with INSERT statements, a likely cause is that you are executing multiple UPDATE statements against the same table simultaneously.
You could try using the scripting feature to manage the execution flow and prevent concurrent DML.
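The execution flow can also be managed client-side (a suggestion beyond the answer, not anything from the BigQuery docs): run the per-day UPDATE jobs one at a time and rerun any day that fails with the serialization error, which is safe here because the question states the query is idempotent. `run_update(day)` is a hypothetical stand-in for the BigQuery client call:

```python
import time

def run_serialized(days, run_update, max_attempts=3, delay=0.5):
    """Run one UPDATE job per partition day, one at a time, retrying a
    day whose job fails with a concurrent-update serialization error."""
    for day in days:
        for attempt in range(1, max_attempts + 1):
            try:
                run_update(day)
                break  # this day succeeded, move on to the next
            except Exception as exc:
                if "concurrent update" not in str(exc).lower() or attempt == max_attempts:
                    raise
                time.sleep(delay)  # back off, then rerun the idempotent query
```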
I am trying out a few things with regards to transactions. Consider the following operations:
Open a connection window 1 in SQL Server Management Studio
BEGIN a transaction, then delete about 2000 rows of data from table A
Open another connection window 2 in SQL Server Management Studio
Insert one new row of data into table A in window 2 (no transaction); it runs successfully
Then I repeat the same operations, but in step 2 I delete 10,000 rows of data. In that case, step 4 cannot complete; I waited for half an hour.
Window 2 just shows that it is executing the SQL and never finishes. If I instead run the insert from connection window 1, it works immediately.
Why does it work with 2k rows but not 10k rows?
The SQL statements I execute:
In connection window A, I execute:
BEGIN TRAN
DELETE FROM tableA -- (10K rows)
In connection window B, I execute:
INSERT INTO tableA (..) VALUES (...)
Window B can't execute successfully.
Many thanks @Gordon.
The cause:
I searched for the keyword "lock escalation".
I tried to trace lock escalation using SQL Server Profiler, and I got a lock:escalation event when I deleted a lot of data in a transaction (without committing or rolling back).
So now I understand the concept of lock escalation: I deleted too much data in one table, and the row locks escalated to a table lock. Because I didn't commit or roll back, other connections (or applications) couldn't access the table while that table lock was held.
[Screenshot: tracing lock:escalation in SQL Server Profiler]
When lock escalation happens in SQL Server:
"The locks option also affects when lock escalation occurs. When locks is set to 0, lock escalation occurs when the memory used by the current lock structures reaches 40 percent of the Database Engine memory pool. When locks is not set to 0, lock escalation occurs when the number of locks reaches 40 percent of the value specified for locks."
How to configure the locks option in SQL Server Management Studio (this affects lock escalation):
To configure the locks option
In Object Explorer, right-click a server and select Properties .
Click the Advanced node.
Under Parallelism , type the desired value for the locks option.
Use the locks option to set the maximum number of available locks, thus limiting the amount of memory SQL Server uses for them.
What you are experiencing is "lock escalation". By default, SQL Server uses row-level locks for the delete. However, if a single statement acquires more than a threshold of about 5,000 locks, SQL Server escalates the locking to the entire table.
This is an automatic mechanism, which you can turn off per table if you need to (ALTER TABLE tableA SET (LOCK_ESCALATION = DISABLE)).
There is a lot of information about this, both in SQL Server documents and related documents.
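Besides turning escalation off, a common pattern is to keep each transaction below the roughly 5,000-lock threshold by deleting in batches and committing between them. Here is a sketch of that pattern in Python, using SQLite from the standard library purely as a stand-in engine (SQLite itself does not do lock escalation; only the batching idea carries over):

```python
import sqlite3

def delete_in_batches(conn, table, batch_size=4000):
    """Delete all rows from `table` in commits of at most batch_size rows,
    so no single transaction accumulates locks on the whole table."""
    total = 0
    while True:
        cur = conn.execute(
            f"DELETE FROM {table} WHERE rowid IN "
            f"(SELECT rowid FROM {table} LIMIT ?)", (batch_size,))
        conn.commit()  # release the locks taken by this batch
        total += cur.rowcount
        if cur.rowcount == 0:
            return total

# Demo: delete 10,000 rows in batches of 4,000.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (v INTEGER)")
conn.executemany("INSERT INTO tableA VALUES (?)", [(i,) for i in range(10000)])
conn.commit()
deleted = delete_in_batches(conn, "tableA")
```

On SQL Server the equivalent loop would repeat DELETE TOP (4000) FROM tableA, committing after each statement.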
I am randomly getting an "execution timeout expired" error using Entity Framework (EF6). The error occurs intermittently while executing the update command below.
UPDATE [dbo].[EmployeeTable]
SET [Name] = @0, [JoiningDate] = @1
WHERE ([EmpId] = @2)
The above update command is simple and normally takes 2-5 seconds to update the EmployeeTable. But sometimes the same update query takes 40-50 seconds and fails with this error:
Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding. The statement has been terminated.
To work around it, I changed the constructor of my MyApplicationContext class to set the following property:
this.Database.CommandTimeout = 180;
The above setting resolves the timeout, but I can't find the root cause of the issue.
As I understand it, this type of timeout can have three causes:
There's a deadlock somewhere
The database's statistics and/or query plan cache are incorrect
The query is too complex and needs to be tuned
Can you please tell me what the main root cause of that error is?
This query:
UPDATE [dbo].[EmployeeTable]
    SET [Name] = @0,
        [JoiningDate] = @1
    WHERE ([EmpId] = @2);
Should not normally take 2 seconds. It is (presumably) updating a single row and that is not a 2-second operation, even on a pretty large table. Do you have an index on EmployeeTable(EmpId)? If not, that would explain the 2 seconds as well as the potential for deadlock.
If you do have an index, then perhaps something else is going on. One place to look is at triggers on the table.
If it's random, maybe something else is accessing the database while your application is updating rows.
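To make the distinction concrete, here is a database-free sketch (hypothetical, not EF or driver code) of what a client-side command timeout such as CommandTimeout amounts to: it only stops the client from waiting, while whatever is blocking the server keeps running:

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FuturesTimeout

def run_with_deadline(query, timeout_s):
    """Run `query` but stop waiting after timeout_s seconds.
    Like a driver-side command timeout, this only stops the client
    from waiting; the work blocking the server carries on regardless."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(query)
        try:
            return future.result(timeout=timeout_s)
        except FuturesTimeout:
            raise TimeoutError("Execution Timeout Expired (client-side)")

# A fast query returns normally; a blocked one surfaces the timeout.
result = run_with_deadline(lambda: 42, timeout_s=5.0)
```

This is why raising CommandTimeout to 180 hides the symptom without removing the blocking session that causes it.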
We had the exact same issue. In our case, the root cause was BO reports being run against the production database. These reports were locking rows and causing timeouts in our applications (randomly, since the reports were executed on demand).
Other things that you might want to check:
Do you have complex triggers on your table?
Are all the foreign keys used in your table indexed in the foreign tables?
If I run pg_dump to dump a table into a SQL file, does it take a snapshot of the table at a point in time and dump all rows up to that point?
Or does it keep dumping rows, even those inserted after pg_dump was started?
A secondary question is: Is it a good idea to stop all insert queries before running pg_dump?
pg_dump obtains a shared lock on your tables and runs inside a transaction with a consistent snapshot taken when the dump starts. Rows committed after the dump has started will not be included, and neither will changes from transactions that are still in progress and uncommitted when the snapshot is taken. So there is no need to stop insert queries before running pg_dump.
There is another pg_dump option that can be useful here (for example, pg_dump --lock-wait-timeout=10000 mydb > mydb.sql):
--lock-wait-timeout=timeout
Do not wait forever to acquire shared table locks at the beginning of
the dump. Instead fail if unable to lock a table within the specified
timeout. The timeout may be specified in any of the formats accepted
by SET statement_timeout. (Allowed formats vary depending on the
server version you are dumping from, but an integer number of
milliseconds is accepted by all versions.)