If I run pg_dump to dump a table into a SQL file, does it take a snapshot of the last row in the table, and dump all the rows up to this row?
Or does it keep dumping all the rows, even those that were inserted after pg_dump was run?
A secondary question is: Is it a good idea to stop all insert queries before running pg_dump?
pg_dump obtains a shared (ACCESS SHARE) lock on the tables it dumps and runs inside a single transaction with a consistent snapshot, so it dumps each table exactly as it existed at the moment the dump started. Rows inserted, and transactions committed, after the dump begins will not be included; likewise, transactions still in progress and uncommitted when the dump finishes won't be included. Because of this consistent snapshot, there is no need to stop inserts before running pg_dump.
pg_dump also accepts an option worth knowing about here:
--lock-wait-timeout=timeout
Do not wait forever to acquire shared table locks at the beginning of
the dump. Instead fail if unable to lock a table within the specified
timeout. The timeout may be specified in any of the formats accepted
by SET statement_timeout. (Allowed formats vary depending on the
server version you are dumping from, but an integer number of
milliseconds is accepted by all versions.)
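For example, a sketch of a dump that fails fast rather than waiting indefinitely for locks (the database name mydb, table name mytable, and the 5-second timeout are placeholders):

```shell
# Give up if the shared table locks cannot be acquired within 5000 ms
pg_dump --lock-wait-timeout=5000 --table=mytable mydb > mytable.sql
```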
TLDR: I think SQL Server did not release a Sch-M lock and I can't release it or find the transaction holding the lock
I was trying to build a copy of another table and run some benchmarking on the data updates from disk, and when running a BULK INSERT on the empty table it errored out because I had used the wrong file name. I was running the script by selecting a large portion of text from a notepad in SQL Server Management Studio and hitting the Execute button. Now, no queries regarding the table whatsoever can execute, including things like OBJECT_ID or schema information in SQL Server Management Studio when manually refreshed. The queries just hang, and in the latter case SSMS gives an error about a lock not being relinquished within a timeout.
I have taken a few debugging steps so far.
I took a look at the sys.dm_tran_locks table filtered on the DB's resource ID. There, I can see a number of shared locks on the DB itself and some exclusive key locks, and exactly 1 lock on an object, a Sch-M lock. When I try to get the object name from the resource ID in the sys.dm_tran_locks table, the query hangs just like OBJECT_ID() does (but does not for other table names/IDs). The lock cites session ID 54.
I have also taken a look at the sys.dm_exec_requests table to try and find more information about SPID 54, but there are no rows with that session id. In fact, the only processes there are ones owned by sa and the single query checking the sys.dm_exec_requests table, owned by myself.
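For reference, the checks described above look roughly like this (session ID 54 is taken from the situation described; the column choices are illustrative):

```sql
-- Locks held in the current database; this is where the lone
-- OBJECT-level Sch-M lock shows up
SELECT request_session_id, resource_type, resource_associated_entity_id,
       request_mode, request_status
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID();

-- Active requests for the session holding the Sch-M lock
-- (returns no rows in the situation described)
SELECT session_id, status, command
FROM sys.dm_exec_requests
WHERE session_id = 54;
```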
From this, if I understand everything correctly, it seems like the BULK INSERT statement somehow failed to release the Sch-M lock that it takes.
So here are my questions: Why is there still a Sch-M lock on the table if the process that owns it seems to no longer exist? Is there some way to recover access to the table without restarting the SQL Server process? Would SQL code in the script after the BULK INSERT have run, but just on an empty table?
I am using SQL Server 2016 and SQL Server Management Studio 2016.
While using the python connector for snowflake with queries of the form
UPDATE X.TABLEY SET STATUS = %(status)s, STATUS_DETAILS = %(status_details)s WHERE ID = %(entry_id)s
I sometimes get the following message:
(snowflake.connector.errors.ProgrammingError) 000625 (57014): Statement 'X' has locked table 'XX' in transaction 1588294931722 and this lock has not yet been released.
and soon after that
Your statement 'X' was aborted because the number of waiters for this lock exceeds the 20 statements limit
This usually happens when multiple queries are trying to update a single table. What I don't understand is that when I look at the query history in Snowflake, it says the query finished successfully (Succeeded status), but in reality the UPDATE never happened, because the table did not change.
So according to https://community.snowflake.com/s/article/how-to-resolve-blocked-queries I used
SELECT SYSTEM$ABORT_TRANSACTION(<transaction_id>);
to release the lock, but still nothing happened, and even with the Succeeded status the query seems not to have executed at all. So my question is: how does this really work, and how can a lock be released without losing the execution of the query? (Also, what happens to the other 20+ queries that are queued because of the lock? Sometimes it seems that when the lock is released, the next one takes the lock and has to be aborted as well.)
I would appreciate it if you could help me. Thanks!
Not sure if Sergio got an answer to this. The problem in this case is not with the table. Based on my experience with Snowflake, below is my understanding.
In Snowflake, every table operation also involves a change in the metadata table that keeps track of micro-partitions and min/max values. This metadata table supports only 20 concurrent DML statements by default. So if a table is continuously being updated and hit at the same partition, there is a chance this limit will be exceeded. In that case, we should look at redesigning the table's update/insert logic. In one of our use cases, we increased the limit to 50 after speaking to the Snowflake support team.
UPDATE, DELETE, and MERGE cannot run concurrently on a single table; they will be serialized, as only one can take the lock on a table at a time. The others queue up in the "blocked" state until it is their turn to take the lock, and there is a limit on the number of queries that can wait on a single lock.
If you see an update finish successfully but don't see the updated data in the table, then you are most likely not COMMITting your transactions. Make sure you run COMMIT after an update so that the new data is committed to the table and the lock is released.
Alternatively, you can make sure AUTOCOMMIT is enabled so that DML will commit automatically after completion. You can enable it with ALTER SESSION SET AUTOCOMMIT=TRUE; in any sessions that are going to run an UPDATE.
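A minimal sketch of both options (the table and column names mirror the question; the literal values are placeholders):

```sql
-- Option A: have every DML statement commit automatically
ALTER SESSION SET AUTOCOMMIT = TRUE;

-- Option B: explicit transaction control around the update
BEGIN;
UPDATE X.TABLEY
   SET STATUS = 'DONE', STATUS_DETAILS = 'processed'
 WHERE ID = 42;
COMMIT;  -- makes the change visible and releases the table lock
```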
Good day,
Two questions:
A) If I have something like this:
COMPLEX QUERY
WAIT FOR LOG TO FREE UP (DELAY)
COMPLEX QUERY
Would this actually work? Or would the log segment of tempdb remain just as full, because it is still holding on to the log of the first query?
B) In the situation above, is it possible to have the middle step perform a dump tran with truncate_only?
(It's a very long chain of various queries that are run together. They don't change anything in the databases and I don't care to even keep the logs if I don't have to.)
The reason for the chain is that I need the same two temp tables, and a whole bunch of variables, for various queries in the chain (some of them for all of the queries). To simplify the use of the query chain by a user with VERY limited SQL knowledge, I collect very simple information at the beginning of the long script, retrieve the rest automatically, and then use it throughout the script.
I doubt either of these would work, but I thought I may as well ask.
Sybase versions 15.7 and 12 (12.? I don't remember)
Thanks,
Ziv.
Per my understanding of michael-gardner's answers, this is what I plan:
FIRST TEMP TABLES CREATION
MODIFYING OPERATIONS ON FIRST TABLES
COMMIT
QUERY1: CREATE TEMP TABLE OF THIS QUERY
QUERY1: MODIFYING OPERATIONS ON TABLE
QUERY1: SELECT
COMMIT
(REPEAT)
DROP FIRST TABLES (end of script)
I read that 'select into' is not written to the log, so I'm creating the table with a create statement (I have to do it this way due to other reasons), and using 'select into' an existing table for the initial population (temp tables).
Once done with the table, I drop it, then 'commit'.
At various points in the chain I check the log segment of tempdb; if it's <70% (normally it's >98%), I use a goto to jump to the end of the script, where I drop the last temp tables and the script ends (so no need for a manual 'commit' there).
I misunderstood the whole "on commit preserve rows" thing, that's solely on IQ, and I'm on ASE.
Dumping the log mid-transaction won't have any effect on the amount of log space used. The Sybase log marker will only move if there is a commit (or rollback), AND if there isn't an older open transaction (which can be found in syslogshold).
There are a couple of different ways you can approach solving the issue:
Add log space to tempdb.
This would require no changes to your code, and is not very difficult. It's even possible that tempdb is not properly sized for the system, and the extra log space would be useful to other applications utilizing tempdb.
Rework your script to add a commit at the beginning, and query only for the later transactions.
This would accomplish a couple of things. The commit at the beginning would move the log marker forward, which would allow the log dump to reclaim space. Then, since the rest of your queries are only reads, there shouldn't be any transaction log space associated with them. Remember, the transaction log only stores information on inserts/updates/deletes, not reads.
In the example you listed above, the user's details could be stored and committed to the database, then the rest of the queries would just be select statements using those details for the variables, and a final transaction would clean up the table. In this scenario the log is only held for the first transaction and the last transaction, but the queries in the middle would not fill the log.
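A rough sketch of that shape (the table and column names are invented for illustration; syntax is ASE-style T-SQL):

```sql
-- Transaction 1: collect the user's input once, then commit so the
-- tempdb log marker can move forward
create table #details (user_id int, region varchar(30))
insert into #details select 42, 'EMEA'
commit

-- Middle of the chain: read-only queries against the committed details;
-- these generate no insert/update/delete log records
select o.order_id, o.total
from orders o
join #details d on o.user_id = d.user_id

-- Final transaction: cleanup
drop table #details
commit
```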
Without knowing more about the DB configuration or query details it's hard to get much more detailed.
I have 4 different services in my application which SELECT and UPDATE on the same table in my database (DB2 v9.1) on AIX 6.1. It's not a big table, around 300,000 records. The 4 services execute in parallel, but each service executes its statements sequentially (not in parallel).
The issue is that every day I face a horrible deadlock problem: the DB hangs for about 5 to 10 minutes, then returns to its normal performance.
My services are synchronized in a way that makes them never SELECT or UPDATE the same row, so I believe that even if a deadlock occurred, it should be at the row level, not the table level, RIGHT?
Also, in my SELECT queries I use "FOR FETCH ONLY WITH UR"; in DB2 v9.1 that means not to lock the row, as it's only for read purposes and there will be no update (UR = uncommitted read).
Any Ideas about whats happening and why?
Firstly, these are almost certainly not deadlocks: a deadlock would be resolved by DB2 within a few seconds by rolling back one of the conflicting transactions. What you are experiencing are most likely lock waits.
As to what's happening, you will need to monitor locks as they occur. You can use the db2pd utility, e.g.
db2pd -d mydb -locks showlocks
or
db2pd -d mydb -locks wait
You can also use the snapshot monitor:
db2 update monitor switches using statement on lock on
db2 get snapshot for locks on mydb
or the snapshot views:
select * from sysibmadm.locks_held
select * from sysibmadm.lockwaits
I have an ETL process that is building dimension tables incrementally in RedShift. It performs actions in the following order:
Begins transaction
Creates a table staging_foo like foo
Copies data from external source into staging_foo
Performs mass insert/update/delete on foo so that it matches staging_foo
Drop staging_foo
Commit transaction
Individually this process works, but in order to achieve continuous streaming refreshes to foo and redundancy in the event of failure, I have several instances of the process running at the same time. And when that happens I occasionally get concurrent serialization errors. This is because both processes are replaying some of the same changes to foo from foo_staging in overlapping transactions.
What happens is that the first process creates the staging_foo table, and the second process is blocked when it attempts to create a table with the same name (this is what I want). When the first process commits its transaction (which can take several seconds) I find that the second process gets unblocked before the commit is complete. So it appears to be getting a snapshot of the foo table before the commit is in place, which causes the inserts/updates/deletes (some of which may be redundant) to fail.
I am theorizing based on the documentation http://docs.aws.amazon.com/redshift/latest/dg/c_serial_isolation.html where it says:
Concurrent transactions are invisible to each other; they cannot detect each other's changes. Each concurrent transaction will create a snapshot of the database at the beginning of the transaction. A database snapshot is created within a transaction on the first occurrence of most SELECT statements, DML commands such as COPY, DELETE, INSERT, UPDATE, and TRUNCATE, and the following DDL commands:
ALTER TABLE (to add or drop columns)
CREATE TABLE
DROP TABLE
TRUNCATE TABLE
The documentation quoted above is somewhat confusing to me because it first says a snapshot will be created at the beginning of a transaction, but subsequently says a snapshot will be created only at the first occurrence of some specific DML/DDL operations.
I do not want to do a deep copy where I replace foo instead of incrementally updating it. I have other processes that continually query this table so there is never a time when I can replace it without interruption. Another question asks a similar question for deep copy but it will not work for me: How can I ensure synchronous DDL operations on a table that is being replaced?
Is there a way for me to perform my operations in a way that I can avoid concurrent serialization errors? I need to ensure that read access is available for foo so I can't LOCK that table.
OK, Postgres (and therefore Redshift [more or less]) uses MVCC (Multi Version Concurrency Control) for transaction isolation instead of a db/table/row/page locking model (as seen in SQL Server, MySQL, etc.). Simplistically every transaction operates on the data as it existed when the transaction started.
So your comment "I have several instances of the process running at the same time" explains the problem. If Process 2 starts while Process 1 is running then Process 2 has no visibility of the results from Process 1.
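One common workaround, sketched here under assumptions (refresh_mutex is an invented single-purpose table, foo.id is an invented key column, and the COPY step is elided), is to serialize the refresh transactions themselves with an explicit LOCK on a separate coordination table. Readers of foo never touch the mutex table, so SELECT access to foo stays available throughout:

```sql
-- One-time setup: a small table whose only job is to be locked
CREATE TABLE refresh_mutex (id INT);

-- Each refresh process runs its whole cycle in one transaction
BEGIN;
LOCK refresh_mutex;                   -- a second refresh waits HERE until the
                                      -- first commits, so its snapshot of foo
                                      -- is taken after that commit
CREATE TABLE staging_foo (LIKE foo);
-- ... COPY from the external source into staging_foo ...
DELETE FROM foo USING staging_foo WHERE foo.id = staging_foo.id;
INSERT INTO foo SELECT * FROM staging_foo;
DROP TABLE staging_foo;
COMMIT;                               -- releases the lock for the next refresh
```

Because the snapshot-taking statements only run after LOCK returns, each refresh sees its predecessor's committed changes, which is exactly what the serialization error was complaining about.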