BigTable: When should I enable Single-Row Transaction?

The Cloud Bigtable docs on single-row transactions say:
Cloud Bigtable also supports some write operations that would require
a transaction in other databases:
Read-modify-write operations, including increments and appends. A
read-modify-write operation reads an existing value; increments or
appends to the existing value; and writes the updated value to the
table.
Check-and-mutate operations, also known as conditional
mutations or conditional writes. In a check-and-mutate operation,
Cloud Bigtable checks a row to see if it meets a specified condition.
If the condition is met, Cloud Bigtable writes new values to the row.
So, if I understand correctly, if I use "Read-modify-write" or "Check-and-mutate" operations, enabling single-row transactions is required.
Those operations are API methods like CheckAndMutateRow, right?
So what if a program uses that method and single-row transactions are not enabled? Will the app fail? Am I heading in the right direction?
My goal is to understand how, when and where (in an app) the single-row transaction setting on the app profile is used.
Thanks!
Gabriel

You should enable single-row transactions only if you make calls to CheckAndMutateRow or ReadModifyWriteRow from your app, as those calls will fail without the setting enabled. I would even go so far as to disable them if you don't use them, as it will reduce the number of warnings you see when using replication.
Note as Jeff pointed out in his comment that these are enabled by default, in particular if your instance was created with a single cluster. That's simply to avoid breakage of legacy clients, as this distinction didn't matter prior to the launch of replication.
For a little more color as to why this setting exists, see the section here on conflicts between single-row transactions when using replication.
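To make the behaviour described above concrete, here is a toy in-memory model in Python. It is not the Cloud Bigtable client library: AppProfile, ToyTable and the exception name are hypothetical, and the only thing it tries to show is that plain writes ignore the setting while check-and-mutate and read-modify-write calls are rejected when the app profile does not allow single-row transactions.

```python
# Toy in-memory model; this is NOT the Cloud Bigtable client library.
from dataclasses import dataclass, field


class SingleRowTransactionsDisabled(Exception):
    """Raised when a transactional call uses a profile that forbids it."""


@dataclass
class AppProfile:
    # Hypothetical stand-in for the app profile setting discussed above.
    allow_single_row_transactions: bool


@dataclass
class ToyTable:
    rows: dict = field(default_factory=dict)  # row key -> {column: value}

    def mutate_row(self, profile, key, column, value):
        # Plain writes do not depend on the setting.
        self.rows.setdefault(key, {})[column] = value

    def _require_transactions(self, profile):
        if not profile.allow_single_row_transactions:
            raise SingleRowTransactionsDisabled(
                "single-row transactions are not enabled on this app profile"
            )

    def check_and_mutate_row(self, profile, key, column, expected, new_value):
        # Conditional write: only applied if the current value matches.
        self._require_transactions(profile)
        row = self.rows.setdefault(key, {})
        if row.get(column) != expected:
            return False  # condition not met, nothing written
        row[column] = new_value
        return True

    def read_modify_write_increment(self, profile, key, column, amount):
        # Read the existing value, add to it, write it back.
        self._require_transactions(profile)
        row = self.rows.setdefault(key, {})
        row[column] = row.get(column, 0) + amount
        return row[column]
```

In this model, an app that only ever calls mutate_row behaves the same with either profile, which is why the setting only matters once you start using the two transactional calls.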

So, if I understand correctly, if I use "Read-modify-write" or "Check-and-mutate" operations, enabling single-row transactions is required.
That is not correct. Using those APIs results in single-row transactions; you do not need to enable anything beforehand.
Those operations are API methods like CheckAndMutateRow, right?
Yes.
So what if a program uses that method and single-row transactions is not enabled?
There is nothing to enable. Calling those APIs results in an atomic operation on the rows you are trying to change.
Will the app fail?
This is not applicable, see above.

Related

How to handle authentication count to be compliant with CQRS pattern?

I need to count failed authentication attempts for accounts in my application. If a certain threshold is reached, I need to block the account. In my understanding of CQS/CQRS, an authentication request is a kind of query, and queries should not modify any data on the server side. To solve this I would have to update some attribute in the database while handling the query, which I guess would violate CQRS principles. What should I do? Is authentication a command in my case (remember that commands cannot return any value, so how would I know, for example, whether authentication succeeded)? Maybe I should publish an event after an unsuccessful authentication? How can I solve this problem? Thanks for any answer.
Queries should not modify domain state.
I'm not aware of a general prohibition on a command returning data (strictly interpreted, that constraint would preclude any sort of acknowledgement of a command).
The "S" in CQRS is often interpreted too strictly in my experience: any write model (at least any write model which retains the right to decide that a command is inapplicable based on the results of previous commands) carries with it a read model (if event sourcing, that read model is typically going to be a snapshot derived from the write model, but the principle holds). If there's a query which can be effectively answered from that read model, there's not necessarily any gain from having a different read model handle the query (the best reason to do that separation in that scenario is if there'd be enough query volume that handling "query commands" degrades write performance).
So I'd advise modeling authentication as a command (it almost certainly changes the state of the system) and returning whatever auth status info is relevant/available from an auth as a response to the command.
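A minimal Python sketch of that advice, assuming an in-memory account and a lockout threshold of three (Account, AuthResult, handle_authenticate and MAX_FAILED_ATTEMPTS are all names made up for illustration): the handler is a command because it updates the failed-attempt counter, and it still returns the auth status to the caller.

```python
from dataclasses import dataclass

MAX_FAILED_ATTEMPTS = 3  # assumed threshold, not taken from the question


@dataclass
class Account:
    password: str  # plain text only to keep the sketch short
    failed_attempts: int = 0
    locked: bool = False


@dataclass(frozen=True)
class AuthResult:
    authenticated: bool
    locked: bool


def handle_authenticate(account: Account, supplied_password: str) -> AuthResult:
    """Command handler: changes account state and returns the outcome."""
    if account.locked:
        # A locked account rejects every attempt, even a correct password.
        return AuthResult(authenticated=False, locked=True)
    if supplied_password == account.password:
        account.failed_attempts = 0  # a success resets the counter
        return AuthResult(authenticated=True, locked=False)
    account.failed_attempts += 1
    if account.failed_attempts >= MAX_FAILED_ATTEMPTS:
        account.locked = True
    return AuthResult(authenticated=False, locked=account.locked)
```

Publishing an "authentication failed" event from the same handler would fit this shape too; the point is only that the state change and the response come from one command.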

nservicebus db insert duplicate

We have a data loader service that uses NServiceBus to insert data (if not already present) into a SQL DB. The queue is configured with a concurrency level > 1 because the data to load can get huge. Since the concurrency level is > 1, this results in duplicate inserts. Is there a way to handle this within NServiceBus?
Note: We have already considered and ruled out creating thread-safe locks.
Generally speaking, there's no need to run the endpoint with Concurrency Level of one. You also don't need to manage the threading and fiddle with concurrency/locks when it comes to NServiceBus. There are other factors on how the system needs to be designed to make it work:
Different transports have different levels of transaction support. Choose one that supports transactions, so that if a message is retried you won't get duplicated messages/data.
Design your system for idempotency. Then, even without transactions (not supported by the transport, or disabled in code), processing a message twice won't produce duplicate data or side effects. The 'how' part requires better knowledge about the data you're dealing with and your domain.
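One common way to get idempotency for inserts is to let the database enforce uniqueness on a natural key and treat a duplicate as "already done". The sketch below uses Python and sqlite3 only to stay self-contained; it is not NServiceBus code, and the records table, business_key column and handle_message function are assumptions for illustration.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # The primary key on the business key is what makes the insert idempotent.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "business_key TEXT PRIMARY KEY, payload TEXT NOT NULL)"
    )
    conn.commit()


def handle_message(conn: sqlite3.Connection, business_key: str, payload: str) -> bool:
    """Insert the row; return True if written, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back if the insert fails
            conn.execute(
                "INSERT INTO records (business_key, payload) VALUES (?, ?)",
                (business_key, payload),
            )
        return True
    except sqlite3.IntegrityError:
        # Another handler (or a retry) already stored this key: nothing to do.
        return False
```

Because the check is done by the constraint rather than by a "select, then insert" in application code, two handlers racing on the same key cannot both succeed.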

How to efficiently trigger system command with SQL query or table change?

I have a data conversion and caching service running as a self-hosted WCF service.
Currently it polls the database at short, constant intervals to update its data.
I think it's unnecessary. The data can be changed only if one of the tables is changed, and when the data is changed depends on system users actions.
There is no problem in setting a trigger for specific tables, however I would need an action outside SQL-Server to update my cache. My WCF service could perform update when receiving specific URI via HTTP. So all I need is a command in table trigger which would send a request. Is it even possible?
I'm thinking of a hack I used back in the day with HTTP requests: I held the response to an HTTP request at the server until a data packet arrived from somewhere else. There was no delay between polling requests, so I achieved fully asynchronous, "real-time" updates.
Maybe this approach can be applied to SQL? I'm thinking of a query that blocks until it receives a signal. It would eventually time out, but it's good enough to try. Then how do I signal and wait in SQL? By locking and unlocking a shared resource, like a cursor or a dummy table?
Any other options?
I need the cache update to run as rarely as possible (it's pretty expensive, so once per minute is great), but I need an immediate update when the data changes.
To answer your question, have you looked at xp_cmdshell?
https://msdn.microsoft.com/en-us/library/ms175046.aspx
However, the security/performance implications of such a decision could be non-trivial depending on your use case.

Good ways to decouple GUIs from SOAP/WS-API update/write calls?

Let's assume we have some configuration GUI that in its current form uses direct DB transactions to submit new configurations for more than one configurable component in a consistent manner.
Now let's move the data (DB) stuff behind some SOAP/WS API. The GUI has no direct DB access anymore. The transactional behaviour must remain, but the API should NOT be designed to explicitly accommodate the GUI form submissions. In fact, I don't even know how the new GUI will work or how the user input will be structured. Therefore I need to provide something like WS-AtomicTransaction on the API server side. However, there are (at least) two caveats:
The GUI is written in PHP: I don't think there is any WS-Transaction support in PHP available.
I don't want to keep DB transactions open on the server side while waiting for additional client requests.
Solutions I can think of:
using Camel's aggregation. However, that would make things more complicated in at least two ways:
You cannot use DB row ids of newly inserted rows in the subsequent calls inside the same transaction. You need to use some sort of symbolic back-referencing because there would be no communication between client and server while processing the aggregated messages.
call replies would not be immediate (or the immediate and separate reply to each single call would only be some sort of a stub, ie. not containing any useful information beyond "your message has been attached to TX xyz" -- if that's at all possible in the Camel aggregation case).
the two disadvantages of the previous solution make me think of request batches where possibly the WS standards provide means for referencing call results in subsequent calls inside the batch transaction. Is there any such thing already available? Maybe even as a PHP client?
trying to eliminate lock contention in the database by carefully using row-level locks etc. However, when inserting new elements, my guess is that usually pages and index pages need to be locked by the DB.
maybe some server-side persistence layer using optimistic locking? But again, that would not return any DB IDs back to the client before the final commit if DB writes would be postponed until the commit (don't know if that's possible at all).
What do YOU think?
Transactions are a powerful tool and we easily get into a thinking pattern in which we see every problem as a nail we hit with this big hammer. I can relate to your confusion because I've experienced it myself. Unfortunately I have no better advice for you than to try not to think in terms of transactions but of atomic API calls.
When I think in terms of transactions, my thought pattern usually goes like this:
start transaction
read (repeat as required)
update (repeat as required)
commit/roll back
It takes some time to realize that we overuse this pattern. Actual conflicts are rare and there are many other ways of dealing with them. Here is one commonly used in APIs:
read and send data to client (atomic API call)
update data (on the client)
send original + updates back to the server (atomic API call)
start transaction (on server)
read
compare with original from client
if not same, return error (client should retry)
if same, update
commit
The last six points are part of the implementation of the API call.
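A rough Python sketch of that pattern, using sqlite3 so it runs on its own. The config table, the function names and StaleDataError are hypothetical, and a real SOAP/WS API would wrap these functions in its own request handling; the part that matters is that the transaction lives entirely inside one server-side call.

```python
import sqlite3


class StaleDataError(Exception):
    """The row changed after the client read it; re-read and retry."""


def open_db() -> sqlite3.Connection:
    # isolation_level=None: transactions are controlled explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE config (id TEXT PRIMARY KEY, value TEXT)")
    return conn


def read_config(conn, config_id):
    """Atomic API call 1: send the current value to the client."""
    row = conn.execute(
        "SELECT value FROM config WHERE id = ?", (config_id,)
    ).fetchone()
    return None if row is None else row[0]


def update_config(conn, config_id, original, updated):
    """Atomic API call 2: the client sends the original plus its update."""
    conn.execute("BEGIN IMMEDIATE")  # start transaction (on server)
    try:
        current = read_config(conn, config_id)  # read
        if current != original:  # compare with original from client
            raise StaleDataError(config_id)  # not the same: return error
        conn.execute(
            "UPDATE config SET value = ? WHERE id = ?", (updated, config_id)
        )  # same: update
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

No transaction stays open while the server waits for the client, which addresses the second caveat in the question; the cost is that the client must be prepared to retry.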
Ferenc Mihaly
http://theamiableapi.com

Recover from SQL batch-abort errors inside a transaction? Alternative?

I'm looking for a way to continue execution of a transaction despite errors while inserting low-priority data. It seems like real nested transactions could be a solution, but they aren't supported by SQL Server 2005/2008. Another solution would be to have logic to decide if an error is critical or not, but it would seem that's not possible either.
Here's more detail on my scenario:
Data is periodically inserted in the database using ADO.NET/C#, and while some of it is vital, some could also be missing without problems. When the inserts are done, some computations are made on the data (both vital and non-vital). This whole process is inside a transaction so everything remains in sync.
Currently, transaction save points are used, and partial rollbacks are made on exceptions which occur during non-vital inserts. However, this doesn't work for "batch-abort" errors, which automatically roll back the entire transaction. I understand some errors are critical, but things like failed casts are considered by SQL Server to be batch-abort errors. (Info on batch errors) I'm trying to prevent these errors from bringing down the whole insert if they occur on low-priority data.
If what I'm describing isn't possible, I'm willing to consider any alternative way to achieve data integrity but allow the failure of the non-vital inserts.
Thanks for your help.
Unfortunately, can't be done as you describe (full support for nested transactions would be key here). Couple things I can think of that have been used to get around this in the past:
Best option would probably be to separate the commands into important/non-important commands that could be executed distinctly. Naturally, this would require that they not be order-dependent on each other.
Could also use a messaging based approach (see Service Broker) where you would execute the primary commands inline and push the non-primary commands onto a queue for execution later/separately. The push to the queue would be transactional within the batch, but the execution of the command when you pop off the queue would be separate. This too would require they not be order-dependent on each other.
If order-dependent, you could use the messaging approach for everything, which would ensure order and could have separate messages per operation, then grouping them together (via conversation groups) would allow you to pull them off the queue in order as well and use separate transactions for each 'type' of operation (i.e. primary vs. non-primary). This would require some special coding on your part if all the grouped messages must be a single autonomous operation, but could be done.
I hesitate to even mention this option, because it is a terrible option, but for full disclosure I suppose you could consider it at your discretion if you think it fits (but it is definitely not an architecture that would apply to almost any scenario). You could use xp_cmdshell to call out to the command line and execute sqlcmd/osql for the non-critical tasks - this sqlcmd execution would be in a separate transaction from the module you are executing from, and simply ignoring the xp_cmdshell failure should allow the primary batch to continue.
Those are some ideas...
Can you do your import into a temporary location, using transactions only for the important parts? Once the temp location is loaded, having absorbed any non-critical errors, you can copy the data into its final destination in a single transaction. It depends on the nature of the work you are doing, but it is potentially a viable option.
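A small sketch of the staging idea, written in Python with sqlite3 rather than ADO.NET/SQL Server so that it is runnable as-is. The staging and final tables, the vital flag and the int() conversion (standing in for a cast that might fail) are all assumptions for illustration.

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (name TEXT, amount INTEGER)")
    conn.execute("CREATE TABLE final (name TEXT, amount INTEGER)")
    conn.commit()
    return conn


def load_staging(conn, records) -> int:
    """Stage rows; skip bad non-vital rows, abort on a bad vital row."""
    skipped = 0
    for name, raw_amount, vital in records:
        try:
            amount = int(raw_amount)  # stands in for a cast that could fail
        except ValueError:
            if vital:
                conn.rollback()  # discard everything staged so far
                raise
            skipped += 1  # non-vital: absorb the error and move on
            continue
        conn.execute(
            "INSERT INTO staging (name, amount) VALUES (?, ?)", (name, amount)
        )
    conn.commit()
    return skipped


def publish(conn) -> None:
    """Copy the staged rows to their destination in a single transaction."""
    with conn:
        conn.execute(
            "INSERT INTO final (name, amount) SELECT name, amount FROM staging"
        )
        conn.execute("DELETE FROM staging")
```

The computations on the data would then run against the final table (or inside publish), so they only ever see a consistent, fully loaded set.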