Tasks in SQL Server and multiple worker role instances

Consider the following table in SQL Server: Tasks (Payload nvarchar, DateToExecute datetime, DateExecuted datetime null).
Now we have two worker processes (2 Azure worker role instances in our case). Both of them periodically query for records where DateExecuted IS NULL AND DateToExecute <= GETDATE(). Each worker then processes the record and sets (via a SQL UPDATE) DateExecuted to the current date.
The problem is that a single task should be processed only once by a single worker instance.
What's the best way to provide synchronization or locking for implementing such a scenario?
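For reference, the polling step described above looks roughly like the sketch below. This is a minimal sketch using SqlClient; the connection string, the TOP (1) selection, and the processing step are assumptions, and it deliberately shows the naive read-then-update flow that lets two instances grab the same row.

using System;
using System.Data.SqlClient;

// Sketch of the polling step each worker performs, as described in the question.
// The SELECT and the UPDATE are separate statements, which is exactly why two
// instances can pick up the same task.
public static class TaskPoller
{
    public static void PollOnce(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();

            // Read: find a task that is due and not yet executed.
            var select = new SqlCommand(
                "SELECT TOP (1) Payload FROM Tasks " +
                "WHERE DateExecuted IS NULL AND DateToExecute <= GETDATE()", conn);
            object payload = select.ExecuteScalar();
            if (payload == null)
                return; // nothing due

            // ... process the payload here ...

            // Write: mark the task as executed.
            var update = new SqlCommand(
                "UPDATE Tasks SET DateExecuted = GETDATE() " +
                "WHERE Payload = @payload AND DateExecuted IS NULL", conn);
            update.Parameters.AddWithValue("@payload", (string)payload);
            update.ExecuteNonQuery();
        }
    }
}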

The easiest way to do locking over multiple roles/instances in Windows Azure is by using blob leases. Steve Marx created a great class for this called AutoRenewLease (source, NuGet, blog post). If you already have a timer or while loop, you can write code like this:
using (var arl = new AutoRenewLease(leaseBlob))
{
    if (arl.HasLease)
    {
        // Query Tasks table and do work....
    }
    else
    {
        // Other worker is busy....
    }
}
Or you could use the DoEvery method, which allows you to schedule your code to run every X minutes:
AutoRenewLease.DoEvery(leaseBlob, TimeSpan.FromMinutes(15), () =>
{
    // Query Tasks table and do work....
});
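In both snippets, leaseBlob is a blob reference that all instances agree on. A minimal sketch of obtaining it, assuming the classic Microsoft.WindowsAzure.Storage client library; the container and blob names here are illustrative:

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

// Every instance must point at the same container/blob so the lease acts as a shared lock.
// storageConnectionString is assumed to come from your role configuration.
CloudStorageAccount account = CloudStorageAccount.Parse(storageConnectionString);
CloudBlobClient client = account.CreateCloudBlobClient();
CloudBlobContainer container = client.GetContainerReference("worker-leases");
container.CreateIfNotExists();
CloudBlockBlob leaseBlob = container.GetBlockBlobReference("tasks-lease");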

Related

DbUpdateConcurrencyException on inserting a new row in SQL Server using EF Core (expected to affect 1 row(s) but actually affected 0 row(s))

I am trying to insert data into one table using EF Core 5 with the repository pattern and unit of work.
Code sample:
var stateData = new State
{
    StateId = state.StateId,
    Action = state.Action,
    Event = state.Event,
    ExecutedOn = DateTime.Now
};
_unitOfWork.GetRepository<State>().Add(stateData);
var result = _unitOfWork.Commit();
The GetRepository method used to get the respective repository:
public IRepository<TEntity> GetRepository<TEntity>()
{
    return (IRepository<TEntity>)GetOrAddRepository(typeof(TEntity),
        new Repository<TEntity>(Context));
}
The Commit method:
public int Commit()
{
    return Context.SaveChanges();
}
I am trying to insert data into the State table, which has Id as its primary key and identity column. The other columns are StateId, Action, Event, and ExecutedOn (datatype: datetime2).
The application runs on multiple nodes, so there can be multiple insert requests at the same time from different nodes, each with different data.
I frequently get a DbUpdateConcurrencyException while inserting state records into the database. Sometimes it works, but most of the time I get a DbUpdateConcurrencyException with the message "Database operation expected to affect 1 row(s) but actually affected 0 row(s). Data may have been modified or deleted since entities were loaded. See http://go.microsoft.com/fwlink/?LinkId=527962 for information on understanding and handling optimistic concurrency exceptions".
There is no update operation, yet I still get the concurrency exception.
I have tried the solutions from other similar questions, but no luck.

How to read data from BigQuery periodically in Apache Beam?

I want to read data from BigQuery periodically in Beam; the test code is below:
pipeline.apply("Generate Sequence",
        GenerateSequence.from(0).withRate(1, Duration.standardMinutes(2)))
    .apply(Window.into(FixedWindows.of(Duration.standardMinutes(2))))
    .apply("Read from BQ", new ReadBQ())
    .apply("Convert Row",
        MapElements.into(TypeDescriptor.of(MyData.class)).via(MyData::fromTableRow))
    .apply("Map TableRow", ParDo.of(new MapTableRowV1()));
static class ReadBQ extends PTransform<PCollection<Long>, PCollection<TableRow>> {
    @Override
    public PCollection<TableRow> expand(PCollection<Long> input) {
        BigQueryIO.TypedRead<TableRow> rows = BigQueryIO.readTableRows()
            .fromQuery("select * from project.dataset.table limit 10")
            .usingStandardSql();
        return rows.expand(input.getPipeline().begin());
    }
}
static class MapTableRowV1 extends DoFn<MyData, Void> {
    @ProcessElement
    public void processElement(ProcessContext pc) {
        LOG.info("String of mydata is " + pc.element().toString());
    }
}
Since BigQueryIO.TypedRead starts from PBegin, one trick is done in ReadBQ via rows.expand(input.getPipeline().begin()). However, this job does NOT run every two minutes. How can I read data from BigQuery periodically?
Look at using Looping Timers. That provides the right pattern.
As written, your code would only fire once after the sequence is built. For fixed windows you would need an input value coming into the window for it to trigger. For example, have the pipeline read from a Pub/Sub input and then have an agent push events into the topic/subscription every 2 minutes.
I am going to assume that you are running in streaming mode here; however, another way to do this would be to use a batch job and run it every 2 minutes from Composer. The reason being that if your job is idle for effectively 90 seconds (2 minutes minus processing time), your streaming job is wasting some resources.
One other note: look at thinning down your column selection in your BigQuery SQL (to save time and money). Look at using a filter in your SQL to pick up a partition or cluster, and at using a timestamp filter to only scan records that are new in the last N. This could give you better control over how you deal with latency and variability at the database level.
As you have mentioned in the question, BigQueryIO read transforms start with PBegin, which puts them at the start of the graph. In order to achieve what you are looking for, you will need to make use of the BigQuery client libraries directly within a DoFn.
For an example of this, have a look at this transform.
Using a normal DoFn for this will be OK for small amounts of data, but for a large amount of data you will want to look at implementing that logic in an SDF (splittable DoFn).

Hangfire job timeout

I have certain jobs that appear to be 'hung' in Hangfire and may run for hours without actually doing anything. Is there a way for Hangfire to kill a job if it runs longer than a certain amount of time?
I'm running the latest version of Hangfire on SQL Server.
In your job creation call (it doesn't matter whether it's a recurring or a single background job), pass an extra parameter of type IJobCancellationToken to your job method, like this:
public static void Method1(string param1, IJobCancellationToken token) { }
When you create your job, create it with a null IJobCancellationToken value and save the jobId. Have another recurring job that polls these jobs and simply calls BackgroundJob.Delete(jobId) when a job exceeds your desired time limit. This will clear the job from Hangfire and also kill the process on your server.
Reference: https://discuss.hangfire.io/t/how-to-cancel-a-job/872
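A rough sketch of that pattern follows, using Hangfire's BackgroundJob / RecurringJob / JobCancellationToken APIs; the JobWatchdog class name, the in-memory job registry, and the one-hour limit are illustrative (in practice you would persist the registry, e.g. in SQL, so all servers share it):

using System;
using System.Collections.Concurrent;
using Hangfire;

public static class JobWatchdog
{
    // Hypothetical registry of job ids and their start times.
    private static readonly ConcurrentDictionary<string, DateTime> TrackedJobs =
        new ConcurrentDictionary<string, DateTime>();

    // The job method, with the extra IJobCancellationToken parameter.
    public static void Method1(string param1, IJobCancellationToken token)
    {
        // ... do work, calling token.ThrowIfCancellationRequested() periodically ...
    }

    public static void EnqueueTracked(string param1)
    {
        // Pass a null token; Hangfire substitutes the real one at execution time.
        string jobId = BackgroundJob.Enqueue(() => Method1(param1, JobCancellationToken.Null));
        TrackedJobs[jobId] = DateTime.UtcNow;
    }

    // Register once at startup: poll every 5 minutes for jobs over the limit.
    public static void RegisterWatchdog()
    {
        RecurringJob.AddOrUpdate("job-watchdog", () => KillLongRunningJobs(), "*/5 * * * *");
    }

    public static void KillLongRunningJobs()
    {
        foreach (var entry in TrackedJobs)
        {
            if (DateTime.UtcNow - entry.Value > TimeSpan.FromHours(1))
            {
                BackgroundJob.Delete(entry.Key); // clears the job and cancels it if it is running
                TrackedJobs.TryRemove(entry.Key, out _);
            }
        }
    }
}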
Yes, you can do this; you'll want to set FetchNextJobTimeout at startup. By setting FetchNextJobTimeout, you can control how long a job can run before Hangfire starts executing it again on another thread.
services.AddHangfire(config =>
{
    config.UseMemoryStorage(new MemoryStorageOptions { FetchNextJobTimeout = TimeSpan.FromHours(24) });
});

Using a SQL DB column as a lock for concurrent operations in Entity Framework

We have a long running user operation that is handled by a pool of worker processes. Data input and output is from Azure SQL.
The master Azure SQL table's columns are approximately:
[UserId, col1, col2, ... , col N, beingProcessed, lastTimeProcessed]
beingProcessed is a boolean and lastTimeProcessed is a DateTime. The logic in every worker role is shown below; with multiple workers processing (each with its own Entity Framework layer), beingProcessed is in essence being used as a lock for mutual-exclusion (MutEx) purposes.
Question: How can I deal with concurrency issues on the beingProcessed "lock" itself under the above load? I think the read-modify-write operation on beingProcessed needs to be atomic, but I'm open to other strategies. Open to other code refinements too.
[Update]: I wonder if TransactionScope is what's needed here ... http://msdn.microsoft.com/en-US/library/system.transactions.transactionscope(v=vs.110).aspx
Code:
public void WorkerRoleMain()
{
    while (true)
    {
        try
        {
            dbContext db = new dbContext();
            // Read
            foreach (UserProfile user in db.UserProfile
                .Where(u => DateTime.UtcNow.Subtract(u.lastTimeProcessed)
                                > TimeSpan.FromHours(24)
                            && u.beingProcessed == false))
            {
                user.beingProcessed = true; // Modify
                db.SaveChanges();           // Write
                // Do some long drawn processing here
                // ...
                // ...
                // ...
                user.lastTimeProcessed = DateTime.UtcNow;
                user.beingProcessed = false;
                db.SaveChanges();
            }
        }
        catch (Exception ex)
        {
            LogException(ex);
            Sleep(TimeSpan.FromMinutes(5));
        }
    } // while ()
}
What we usually do is this:
At the beginning of a long operation we start a transaction:
BEGIN TRANSACTION
Then we select a row from the table we would like to update/delete using these hints:
SELECT * FROM Table WITH (ROWLOCK, NOWAIT) WHERE ID = 123;
Then we check that we got the row. If the row is locked by another process, there will be a SQL error (because of NOWAIT). In this case we roll back the transaction and advise the user.
If we did get the record (i.e. it was not locked by another process), we process it and do the required updates, using the same transaction we used to lock the record:
UPDATE Table SET Col1='value' WHERE ID = 123;
Then we COMMIT the transaction.
COMMIT;
This is just pseudo-code for the process; you will have to implement it in your program.
One small note regarding the above process: when you lock the record in SQL Server (or Azure SQL), use the primary key in your WHERE clause; otherwise SQL Server may decide to use a page lock or a table lock.
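For reference, a rough C# sketch of that pattern with ADO.NET against the question's table (names such as MasterTable and the UserId type are assumptions based on the question). This sketch adds UPDLOCK so the row lock is actually held until the transaction ends; a plain shared lock from the SELECT would be released as soon as the statement completes under the default isolation level. Error 1222 is SQL Server's "lock request time out" error, which NOWAIT raises when another session holds the lock:

using System;
using System.Data.SqlClient;

// Sketch of the lock-then-process pattern described in the answer above.
public static class RowLockWorker
{
    public static void ProcessUser(string connectionString, Guid userId)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (SqlTransaction tx = conn.BeginTransaction())
            {
                try
                {
                    // Lock the row; NOWAIT fails immediately if another worker already holds it,
                    // and UPDLOCK keeps the lock until the transaction ends.
                    var select = new SqlCommand(
                        "SELECT UserId FROM MasterTable WITH (ROWLOCK, UPDLOCK, NOWAIT) " +
                        "WHERE UserId = @id", conn, tx);
                    select.Parameters.AddWithValue("@id", userId);
                    if (select.ExecuteScalar() == null)
                    {
                        return; // row not found; dispose rolls the transaction back
                    }

                    // ... long-running processing here ...

                    var update = new SqlCommand(
                        "UPDATE MasterTable SET lastTimeProcessed = SYSUTCDATETIME() " +
                        "WHERE UserId = @id", conn, tx);
                    update.Parameters.AddWithValue("@id", userId);
                    update.ExecuteNonQuery();

                    tx.Commit();
                }
                catch (SqlException ex) when (ex.Number == 1222)
                {
                    // Another worker holds the lock; skip this row.
                    // The using block rolls the transaction back on dispose.
                }
            }
        }
    }
}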

Transaction timeout expired while using Linq2Sql DataContext.SubmitChanges()

Please help me resolve this problem:
There is an ambient MSMQ transaction. I'm trying to use a new transaction for logging, but I get the following error when attempting to submit changes: "Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding." Here is the code:
public static void SaveTransaction(InfoToLog info)
{
    using (TransactionScope scope =
        new TransactionScope(TransactionScopeOption.RequiresNew))
    {
        using (TransactionLogDataContext transactionDC =
            new TransactionLogDataContext())
        {
            transactionDC.MyInfo.InsertOnSubmit(info);
            transactionDC.SubmitChanges();
        }
        scope.Complete();
    }
}
Please help me. Thanks.
You could consider increasing the timeout or eliminating it altogether.
Something like:
using (TransactionLogDataContext transactionDC = new TransactionLogDataContext())
{
    transactionDC.CommandTimeout = 0; // No timeout.
}
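If the timeout you want to raise is the TransactionScope's own timeout rather than the ADO.NET command timeout, you can also pass a TransactionOptions when creating the scope. A sketch based on the question's SaveTransaction method; the 10-minute value is only an example:

using System;
using System.Transactions;

public static void SaveTransaction(InfoToLog info)
{
    var options = new TransactionOptions
    {
        IsolationLevel = IsolationLevel.ReadCommitted,
        Timeout = TimeSpan.FromMinutes(10) // example value
    };

    using (var scope = new TransactionScope(TransactionScopeOption.RequiresNew, options))
    {
        using (var transactionDC = new TransactionLogDataContext())
        {
            transactionDC.CommandTimeout = 0; // no ADO.NET command timeout
            transactionDC.MyInfo.InsertOnSubmit(info);
            transactionDC.SubmitChanges();
        }
        scope.Complete();
    }
}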
Be careful
You said:
Thank you, but this solution raises a new question: if the transaction scope was changed, why does the submit operation become so time-consuming? The database and application are on the same machine.
That is because you are creating a new DataContext right there:
TransactionLogDataContext transactionDC = new TransactionLogDataContext()
With a new data context, ADO.NET opens up a new connection (even if the connection strings are the same, unless you do some clever connection pooling).
Within a transaction context, when you try to work with more than one connection instance (which you just did), ADO.NET automatically promotes the transaction to a distributed transaction and tries to enlist it in MSDTC. Enlisting the very first transaction per connection into MSDTC takes time (for me it takes 30+ seconds); consecutive transactions are fast, however (in my case 60 ms). Take a look at this: http://support.microsoft.com/Default.aspx?id=922430
What you can do is reuse the transaction and connection (if possible) when you create the new DataContext.
TransactionLogDataContext tempDataContext =
    new TransactionLogDataContext(ExistingDataContext.Transaction.Connection);
tempDataContext.Transaction = ExistingDataContext.Transaction;
Where ExistingDataContext is the one that started the ambient transaction.
Or attempt to speed up your MS DTC.
Also, do use the SQL Profiler suggested by billb and look at the SessionId across different commands (save and savelog in your case). If the SessionId changes, you are in fact using two different connections, and in that case you will have to reuse the transaction (if you don't want it to be promoted to MS DTC).