Azure REST API: PutBlockList - Commit uncommitted blocks to a different blob?

I'm trying to do some automation using the Azure REST API (from a Windows 8 app) to read a blob that is being written to by another worker role. This logging worker role "PUT"s blocks to a blob, and at the top of the hour it commits all of those blocks, compresses the blob, and then starts a new one. I don't own this logger, so I'd like to work around it without disturbing anything.
Here is what I'm doing:
Get the list of all uncommitted blocks from that blob file
Commit that list to that blob and then read/transform it as I need
My current approach works for me, but it interferes with the logger, possibly crashing an instance, and it definitely loses data when the logger commits its blob. To prevent this, I've been trying, and failing, to commit these uncommitted blocks to another temp blob and then read them from there.
It always fails with 400: Invalid Block List when I call PutBlockList against the temp blob. The exact same call succeeds against the blob file that I got the uncommitted blocks from. It also succeeds against the temp blob when the list of uncommitted blocks is empty.
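For reference, here is a minimal sketch of the two REST calls described above, using Python and the requests library. The account, container, and blob names and the SAS token are placeholders, and it assumes SAS-based authentication rather than a signed Authorization header:

import requests
import xml.etree.ElementTree as ET

ACCOUNT = "myaccount"          # placeholder storage account
CONTAINER = "logs"             # placeholder container
SOURCE_BLOB = "current.log"    # blob the logger is writing blocks to
TEMP_BLOB = "temp.log"         # blob I am trying to commit them to
SAS = "sv=..."                 # placeholder SAS token with read/write rights

BASE = f"https://{ACCOUNT}.blob.core.windows.net/{CONTAINER}"

# Step 1: list the uncommitted blocks of the source blob.
resp = requests.get(
    f"{BASE}/{SOURCE_BLOB}?comp=blocklist&blocklisttype=uncommitted&{SAS}"
)
resp.raise_for_status()
root = ET.fromstring(resp.content)
block_ids = [b.findtext("Name") for b in root.iter("Block")]

# Step 2: try to commit that list. Against SOURCE_BLOB this succeeds;
# against TEMP_BLOB it is exactly the call that fails with
# "400: Invalid Block List", because the blocks were never PUT to TEMP_BLOB.
body = "<?xml version='1.0' encoding='utf-8'?><BlockList>"
body += "".join(f"<Uncommitted>{bid}</Uncommitted>" for bid in block_ids)
body += "</BlockList>"

resp = requests.put(
    f"{BASE}/{TEMP_BLOB}?comp=blocklist&{SAS}",
    data=body.encode("utf-8"),
    headers={"Content-Type": "application/xml"},
)
print(resp.status_code, resp.reason)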
I've tried copying the blob instead, but that fails with 404: BlobNotFound, since the blob has no content until it is committed.
I've tried snapshots, but the documentation states that a snapshot omits uncommitted blocks. There is also no documented way to get the actual data of the uncommitted blocks rather than just their block IDs.
So is it possible to commit a blob's uncommitted blocks to a different blob?

Related

Is SELECT retrieving data from the WAL files?

To my understanding, the database can postpone writing to the table files to boost I/O performance. When a transaction is COMMITted, the data is written to the WAL files.
I'm curious how long the writing to the table files can be delayed. In particular, when I use a simple SELECT, e.g.
SELECT * from myTable;
after the COMMIT, is it possible that the database has to retrieve data from the WAL files in addition to the table files?
The documentation talks about being able to postpone the flushing of data pages:
If we follow this procedure, we do not need to flush data pages to disk on every transaction commit.
What WAL files allow an RDBMS to do is keep "dirty" data pages in memory and flush them to disk at a later time. It does not mean that the data pages themselves are modified at a later time; only the flushing to disk is deferred.
So the answer to your question is "No, a SELECT always retrieves data from the data pages, not from the WAL files".
PostgreSQL does not read from WAL for this purpose during normal operations. It would only read WAL in order to apply changes to the data files after a crash (or during replication onto another server).
When data in ordinary data files is changed, the affected pages are kept in shared memory (in shared_buffers) until they are written to disk. Any other process wanting to see that data will find it in that shared memory, in its changed form. Processes always look in shared_buffers before they try to read from disk, so no one ever sees the stale on-disk version of the data, except for the recovery process after a crash.
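If you want to observe this, the pg_buffercache extension exposes which pages in shared_buffers are dirty. A rough sketch using psycopg2; the connection string and the table name are placeholders, and CREATE EXTENSION and CHECKPOINT generally require superuser privileges:

import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")  # placeholder DSN
conn.autocommit = True
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS pg_buffercache")

def dirty_count():
    cur.execute("SELECT count(*) FROM pg_buffercache WHERE isdirty")
    return cur.fetchone()[0]

# Write some data: the changes go to WAL (flushed at COMMIT) and to
# pages in shared_buffers, which stay dirty for now.
cur.execute("DROP TABLE IF EXISTS waldemo")
cur.execute(
    "CREATE TABLE waldemo AS SELECT g AS id FROM generate_series(1, 100000) g"
)
print("dirty pages after the write:", dirty_count())

# A SELECT at this point reads the rows from shared_buffers, not from
# WAL, even though the table pages on disk may still be stale.
cur.execute("SELECT count(*) FROM waldemo")
print("rows visible:", cur.fetchone()[0])

# CHECKPOINT forces the dirty pages out to the data files, so the
# dirty count typically drops sharply afterwards.
cur.execute("CHECKPOINT")
print("dirty pages after checkpoint:", dirty_count())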

When does DBWn write buffers from the database buffer cache to the database disk?

I am learning some basics of Oracle Database architecture, and there are two examples:
The steps involved in executing a data manipulation language (DML) statement.
The steps involved in executing the COMMIT command.
Executing DML steps
The steps are as follows:
The server process receives the statement and checks the library cache for any shared SQL area that contains a similar SQL statement.
If a shared SQL area is found, the server process checks the user’s access privileges for the requested data, and the existing shared SQL area is used to process the statement. If not, a new shared SQL area is allocated for the statement, so that it can be parsed and processed.
If the data and undo segment blocks are not already in the buffer cache, the server process reads them from the data files into the buffer cache. The server process locks the rows that are to be modified.
The server process records the changes to be made to the data buffers as well as the undo changes. These changes are written to the redo log buffer before the in-memory data and undo buffers are modified. This is called write-ahead logging.
The undo segment buffers contain the values of the data before it is modified. The undo buffers store the before image of the data so that the DML statements can be rolled back, if necessary. The data buffers record the new values of the data.
The user gets feedback from the DML operation (such as how many rows were affected by the operation).
COMMIT Process steps
The steps are as follows:
The server process places a commit record, along with the system change number (SCN), in the redo log buffer. The SCN is monotonically incremented and is unique within the database.
The LGWR background process performs a contiguous write of all the redo log buffer entries up to and including the commit record to the redo log files.
If modified blocks are still in the SGA, and if no other session is modifying them, then the database removes lock-related transaction information from the blocks.
The server process provides feedback to the user process about the completion of the transaction.
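To make the ordering in these two lists concrete, here is a toy model in Python. It is purely illustrative, not Oracle code: it only captures that redo is recorded before the buffers are modified, and that COMMIT flushes redo, not data.

class ToyInstance:
    def __init__(self):
        self.buffer_cache = {}     # in-memory data blocks (may be dirty)
        self.undo_buffers = {}     # before images, for rollback
        self.redo_log_buffer = []  # in-memory redo entries
        self.redo_log_file = []    # "on-disk" redo entries
        self.data_files = {}       # "on-disk" data blocks
        self.scn = 0

    def update(self, block, new_value):
        # Write-ahead logging: record the redo change first ...
        self.redo_log_buffer.append(("change", block, new_value))
        # ... then keep the before image in undo and modify the buffer.
        before = self.buffer_cache.get(block, self.data_files.get(block))
        self.undo_buffers.setdefault(block, before)
        self.buffer_cache[block] = new_value

    def commit(self):
        # A commit record with the next SCN goes into the redo buffer;
        # "LGWR" then writes the buffer contiguously to the redo file.
        self.scn += 1
        self.redo_log_buffer.append(("commit", self.scn))
        self.redo_log_file.extend(self.redo_log_buffer)
        self.redo_log_buffer.clear()
        # Note: data_files are untouched; "DBWn" writes them later.

db = ToyInstance()
db.update("block1", "new row value")
db.commit()
print(db.redo_log_file)  # the change is durable in the redo log
print(db.data_files)     # still empty: DBWn has not run yet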
First question: does a server process or a background process eventually move or migrate the redo log files into the data files? If yes, how does this process work?
Thanks to the comments from Nicholas Krasnov & JSapkota: no such "migration" process exists, because the two serve different purposes. Data files hold the data of the database, and redo log files are used to recover the database. DBWn is responsible for writing data to the data files, and LGWR writes the redo log buffer to the active redo log file on disk.
My second question: when does DBWn (the database writer process) write modified buffers from the cache to the database disk? Is the disk updated before COMMIT or after COMMIT?
DBWn does not write to the database files because a commit statement is issued. A commit just marks the end of a transaction: locks on tables or rows are released, the SCN is incremented, and LGWR writes the SCN and the changes to the online redo log file.
The database buffer cache has two lists:
1) Write list
2) least-recently-used (LRU) list
The least-recently-used (LRU) list holds dirty buffers, that is, buffers that have been modified. A commit will indirectly make a buffer dirty, and the buffer will stay in the cache since it has been recently accessed.
DBWn writes to the data files neither strictly before the commit nor strictly after it. It writes on its own triggers, such as when a checkpoint happens, when the number of dirty buffers reaches a threshold, or when there are no free buffers (see the sketch below).
I hope I answered your question. Thank you.
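To illustrate those triggers, here is a sketch extending the toy model from the previous question. The threshold numbers are invented for the sketch and are not Oracle parameters:

DIRTY_THRESHOLD = 100   # made-up number
CACHE_CAPACITY = 1000   # made-up number

def dbwn_should_write(db, checkpoint_requested):
    dirty = len(db.buffer_cache)  # in the toy model every cached buffer is dirty
    return (
        checkpoint_requested           # a checkpoint happens
        or dirty >= DIRTY_THRESHOLD    # too many dirty buffers
        or dirty >= CACHE_CAPACITY     # no free buffers left
    )

def dbwn_write(db):
    # Flush dirty buffers to the data files. Whether the owning
    # transactions have committed plays no role here.
    db.data_files.update(db.buffer_cache)
    db.buffer_cache.clear()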

When exactly does writing to the data files occur in SQL?

I am currently studying transaction management in DBMS, from Fundamentals of Database Systems by Elmasri and Navathe, 6th edition. Link: http://mathcomp.uokufa.edu.iq/staff/kbs/file/2/Fundamentals%20of%20Database%20Systems%20-%20Ramez%20Elmasri%20&%20Navathe.pdf
Can someone please describe (in short) the transaction commit process, i.e. without going into too much detail? I also read some of the Oracle documentation:
https://docs.oracle.com/cd/B19306_01/server.102/b14220/transact.htm
What I could understand is that the actual writing can take place before or after committing. But if the changes made have to be visible to all users, then it must take place before the commit, not after it, right?
Can someone please help me clear up the confusion?
As the linked documentation indicates, writes to the data files are completely independent of transaction control. Changes might be written before a transaction commits, or they might be written after it commits.
When changes are made, they are made to a version of the data in memory. In order for a transaction to commit successfully (assuming default commit settings), the change must be written to the redo logs. That allows the database to re-create the change if it has not been written to the data files and the database crashes. Conversely, if a change is written to a data file before the transaction is committed, information on how to reverse the change will be in the undo logs, so that the transaction can be rolled back if the database fails before it commits (or if the application issues a rollback).
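The undo side of that last point can be sketched by extending the same toy model from the previous question: even if a change has already reached the data file, the saved before image makes rollback possible. Again, this is an illustration, not Oracle code:

def rollback(db):
    for block, before in db.undo_buffers.items():
        if before is None:
            # The block did not exist before the transaction.
            db.buffer_cache.pop(block, None)
            db.data_files.pop(block, None)
        else:
            # Restore the before image, overwriting any premature
            # write that "DBWn" may already have done.
            db.buffer_cache[block] = before
            db.data_files[block] = before
    db.undo_buffers.clear()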

Possibility of restoring a deleted stream?

A new stream called stream 1 is created.
I deliver some changes to stream 1.
Later, I delete stream 1.
So:
Is there a possibility to restore a deleted stream?
If I am not able to restore the stream, will I lose the changes I delivered to it?
Is there a possibility to restore a deleted stream?
Not easily, unless you had created snapshots (we covered snapshots in your previous question "Consistency of snapshot code in rtc?"): in that case, when you delete a stream, RTC asks you to select another existing stream in order to keep ownership of those snapshots.
If you do, then it is trivial to re-create a new stream from a snapshot, ensuring that you recover all components at the exact state recorded by the snapshot.
But if you didn't set any snapshot, then you have to manually re-add all the components and set them to (for instance) their most recent baselines.
If I am not able to restore the stream, will I lose the changes I delivered to it?
In any case, as mentioned in the thread "Delete a Stream - any side-effects?"
Change-sets exist independently of any stream, so deleting a stream does not delete any change-sets.
It will just be harder to get the exact list of change sets back into a new stream if they were only delivered to stream 1 (the one you deleted), especially if those change sets were never grouped inside a baseline (for a given component) or, as explained above, recorded in a snapshot.
But those change sets are not gone.

SQL Server insert flow

As I understood from reading some articles on the internet, SQL Server has a buffer cache where it stores pages, and when an INSERT statement is executed, the modified data is written only to that buffer in memory, not to disk.
And when a system checkpoint occurs, all dirty pages are flushed to disk.
Does this mean that when we execute an INSERT statement and are told it succeeded, the data might still not be written to disk? And that in theory, if a system crash occurs before a checkpoint, the dirty pages won't be saved to disk, even though we were told that everything was OK and the transaction was committed?
No, because you are ignoring the second part of the mechanism: the LOG FILE. The log file records all changes and is flushed to disk. In case of a crash, the server replays the changes from the log file on startup.
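A toy sketch of that replay mechanism, illustrative only and not SQL Server internals: because the log is flushed before the commit is acknowledged, replaying it after a crash reconstructs the committed change even though the dirty page was never written.

log_on_disk = []    # durable log records (flushed at commit)
buffer_cache = {}   # volatile dirty pages
data_on_disk = {}   # data files, possibly stale

def committed_insert(page, row):
    log_on_disk.append((page, row))  # hardened before "OK" is returned
    buffer_cache[page] = row         # dirty page, not yet checkpointed

committed_insert("page1", "some row")

# Crash before any checkpoint: the buffer cache is lost.
buffer_cache.clear()

# On restart the server replays the log into the data files, so the
# committed insert is recovered even though its page was never flushed.
for page, row in log_on_disk:
    data_on_disk[page] = row
print(data_on_disk)  # {'page1': 'some row'}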