Redis - Reliable queue pattern with multiple pops in one request

I have implemented Redis's reliable queue pattern using BRPOPLPUSH because I want to avoid polling.
However, this results in a network request for each item. How can I augment this so that a worker BRPOPLPUSH'es multiple entries at once?

BRPOPLPUSH is the blocking version of RPOPLPUSH; it does not support transactions, and it cannot pop multiple entries in one call. You also cannot use a Lua script for this purpose, because of the nature of Lua execution in Redis: the server is blocked for new requests until the Lua script has finished.
You can implement the queue pattern you need with application-side logic. In pseudocode:
func MyBRPOPLPUSH(source, dest, maxItems = 1, timeOutTime = 0) {
    items = []
    timeOut = time() + timeOutTime
    while (items.count < maxItems && (timeOutTime == 0 || time() < timeOut)) {
        item = redis.RPOPLPUSH(source, dest)
        if (item == nil) {
            sleep(someTimeHere)
            continue
        }
        items.add(item)
    }
    return items
}
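For concreteness, here is a minimal sketch of the same loop in Java, assuming the Jedis client library (the class name, method name, and the 50 ms back-off are illustrative, not part of any library):

import redis.clients.jedis.Jedis;
import java.util.ArrayList;
import java.util.List;

public class ReliableQueue {
    // Drain up to maxItems using non-blocking RPOPLPUSH, backing off briefly
    // when the queue is empty. timeoutMillis == 0 means "no deadline", mirroring
    // the timeout semantics of the blocking Redis commands.
    public static List<String> popMany(Jedis redis, String source, String dest,
                                       int maxItems, long timeoutMillis)
            throws InterruptedException {
        List<String> items = new ArrayList<>();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (items.size() < maxItems
                && (timeoutMillis == 0 || System.currentTimeMillis() < deadline)) {
            String item = redis.rpoplpush(source, dest); // returns null when source is empty
            if (item == null) {
                Thread.sleep(50); // back off instead of busy-waiting
                continue;
            }
            items.add(item);
        }
        return items;
    }
}

This trades the true blocking of BRPOPLPUSH for short polling sleeps; the sleep interval is the knob between added latency and extra load on the Redis server.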


Hangfire executes job twice

I am using Hangfire.AspNetCore 1.7.17 and Hangfire.MySqlStorage 2.0.3 for software that is currently in production.
Now and then, we get a report of jobs being executed twice, despite the usage of the [DisableConcurrentExecution] attribute with a timeout of 30 seconds.
It seems that as soon as those 30 seconds have passed, another worker picks up that same job again.
The code is fairly straightforward:
public async Task ProcessPicking(HttpRequest incomingRequest)
{
var filePath = await StoreStreamAsync(incomingRequest, TriggerTypes.Picking);
var picking = await XmlHelper.DeserializeFileAsync<Picking>(filePath);
// delay by 20 minutes so outbound-out gets the chance to be sent first
BackgroundJob.Schedule(() => StartPicking(picking), TimeSpan.FromMinutes(20));
}
[TriggerAlarming("[IMPORTANT] Failed to parse picking message to **** object.")]
[DisableConcurrentExecution(30)]
public void StartPicking(Picking picking)
{
var orderlinePickModels = picking.ToSalesOrderlinePickQuantityRequests().ToList();
var orderlineStatusModels = orderlinePickModels.ToSalesOrderlineStatusRequests().ToList();
var isParsed = DateTime.TryParse(picking.Order.UnloadingDate, out var unloadingDate);
for (var i = 0; i < orderlinePickModels.Count; i++)
{
// prevents bugs with usage of i in the background jobs
var index = i;
var id = BackgroundJob.Enqueue(() => SendSalesOrderlinePickQuantityRequest(orderlinePickModels[index], picking.EdiReference));
BackgroundJob.ContinueJobWith(id, () => SendSalesOrderlineStatusRequest(
orderlineStatusModels.First(x => x.SalesOrderlineId == orderlinePickModels[index].OrderlineId),
picking.EdiReference, picking.Order.PrimaryReference, isParsed ? unloadingDate : DateTime.MinValue));
}
}
[TriggerAlarming("[IMPORTANT] Failed to send order line pick quantity request to ****.")]
[AutomaticRetry(Attempts = 2)]
[DisableConcurrentExecution(30)]
public void SendSalesOrderlinePickQuantityRequest(SalesOrderlinePickQuantityRequest request, string ediReference)
{
var audit = new AuditPostModel
{
Description = $"Finished job to send order line pick quantity request for item {request.Itemcode}, part of ediReference {ediReference}.",
Object = request,
Type = AuditTypes.SalesOrderlinePickQuantity
};
try
{
_logger.LogInformation($"Started job to send order line pick quantity request for item {request.Itemcode}.");
var response = _service.SendSalesOrderLinePickQuantity(request).GetAwaiter().GetResult();
audit.StatusCode = (int)response.StatusCode;
if (!response.IsSuccessStatusCode) throw new TriggerRequestFailedException();
audit.IsSuccessful = true;
_logger.LogInformation("Successfully posted sales order line pick quantity request to ***** endpoint.");
}
finally
{
Audit(audit);
}
}
It schedules the main task (StartPicking) that creates the objects required for the two subtasks:
Send picking details to customer
Send status update to customer
The first job is duplicated. Perhaps the second job is as well, but that is not important enough to care about, since it only concerns a status update. However, the first job causes the customer to think that more items have been picked than in reality.
I would assume that Hangfire updates the state of a job to e.g. "in progress", and checks this state before starting a job. Is my timeout on the disabled concurrent execution too low? Is it possible in this scenario that the database connection used to update the state takes about 30 seconds (to be fair, it is running on a slow server with ~8 GB RAM, 6 vCores), due to which the second worker is already picking up the job again?
Or is this a Hangfire specific issue that must be tackled?

ReactiveX collect elements processed before a failure

I'm using RxJava to create a background job that synchronizes my DB.
It connects to an external source and starts to process entries, map them, and insert them into the DB.
When it ends, I need the list of all the elements processed. I can get it when everything goes right, but how can I collect all the elements processed if something fails during the flow?
final List<String> res = Observable.create(onSubscribe)
.buffer(4)
.flatMap(TestRx::doStuff)
.buffer(8)
.map(TestRx::calculateList)
.toList()
.toBlocking()
.single();
System.out.println("strings = " + res);
What I would like is a way such that, if doStuff or calculateList throws an exception, the flow stops and returns the list of everything it processed until the error.
List<String> res = Observable.create(onSubscribe)
.buffer(4)
.flatMap(TestRx::doStuff)
.onErrorResumeNext(Observable.empty()) // turn error into completion
.buffer(8)
.map(TestRx::calculateList)
.onErrorResumeNext(Observable.empty()) // turn error into completion
.toList()
.toBlocking()
.single();
System.out.println("strings = " + res);
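To see why this works, here is a minimal, self-contained RxJava 1.x sketch (the numbers and the simulated failure are illustrative): onErrorResumeNext converts the error into a normal completion, so toList delivers everything emitted before the failure.

import rx.Observable;
import java.util.List;

public class PartialResults {
    public static void main(String[] args) {
        List<Integer> res = Observable.just(1, 2, 3, 4, 5)
            .map(i -> {
                if (i == 4) throw new RuntimeException("boom"); // simulated mid-stream failure
                return i * 10;
            })
            .onErrorResumeNext(Observable.empty()) // turn error into completion
            .toList()
            .toBlocking()
            .single();
        System.out.println(res); // prints [10, 20, 30] -- the elements processed before the failure
    }
}

One caveat for the buffered chain above: buffer emits a partially filled buffer on completion but drops it when an error passes through, which is why the first onErrorResumeNext sits before buffer(8): a failure in doStuff is turned into completion, letting buffer(8) flush what it holds.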

Longer than expected upload times for big query insert

I am uploading data to BigQuery in CSV format with JSON schemas, and I am seeing very long load times. I take the start and end load times from pollJob.getStatistics() when the load is DONE and compute a delta time as (endTime - startTime)/1000. Then I look at the number of bytes loaded. The data comes from files stored in Google Cloud Storage that I reprocess in App Engine to do some reformatting. I convert the string into a byte stream and then load it as the contents of the load job as follows:
public static void uploadFileToBigQuerry(TableSchema tableSchema,String tableData,String datasetId,String tableId,boolean formatIsJson,int waitSeconds,String[] fileIdElements) {
/* Init diagnostic */
String projectId = getProjectId();
if (ReadAndroidRawFile.testMode) {
String s = String.format("My project ID at start of upload to BQ:%s datasetID:%s tableID:%s json:%b \nschema:%s tableData:\n%s\n",
projectId,datasetId,tableId,formatIsJson,tableSchema.toString(),tableData);
log.info(s);
}
else {
String s = String.format("Upload to BQ tableID:%s tableFirst60Char:%s\n",
tableId,tableData.substring(0,60));
log.info(s);
}
/* Setup the data each time */
Dataset dataset = new Dataset();
DatasetReference datasetRef = new DatasetReference();
datasetRef.setProjectId(projectId);
datasetRef.setDatasetId(datasetId);
dataset.setDatasetReference(datasetRef);
try {
bigquery.datasets().insert(projectId, dataset).execute();
} catch (IOException e) {
if (ReadAndroidRawFile.testMode) {
String se = String.format("Exception creating datasetId:%s",e);
log.info(se);
}
}
/* Set destination table */
TableReference destinationTable = new TableReference();
destinationTable.setProjectId(projectId);
destinationTable.setDatasetId(datasetId);
destinationTable.setTableId(tableId);
/* Common setup line */
JobConfigurationLoad jobLoad = new JobConfigurationLoad();
/* Handle input format */
if (formatIsJson) {
jobLoad.setSchema(tableSchema);
jobLoad.setSourceFormat("NEWLINE_DELIMITED_JSON");
jobLoad.setDestinationTable(destinationTable);
jobLoad.setCreateDisposition("CREATE_IF_NEEDED");
jobLoad.setWriteDisposition("WRITE_APPEND");
jobLoad.set("Content-Type", "application/octet-stream");
}
else {
jobLoad.setSchema(tableSchema);
jobLoad.setSourceFormat("CSV");
jobLoad.setDestinationTable(destinationTable);
jobLoad.setCreateDisposition("CREATE_IF_NEEDED");
jobLoad.setWriteDisposition("WRITE_APPEND");
jobLoad.set("Content-Type", "application/octet-stream");
}
/* Setup the job config */
JobConfiguration jobConfig = new JobConfiguration();
jobConfig.setLoad(jobLoad);
JobReference jobRef = new JobReference();
jobRef.setProjectId(projectId);
Job outputJob = new Job();
outputJob.setConfiguration(jobConfig);
outputJob.setJobReference(jobRef);
/* Convert input string into byte stream */
ByteArrayContent contents = new ByteArrayContent("application/octet-stream",tableData.getBytes());
int timesToSleep = 0;
try {
Job job = bigquery.jobs().insert(projectId,outputJob,contents).execute();
if (job == null) {
log.info("Job is null...");
throw new Exception("Job is null");
}
String jobIdNew = job.getId();
//log.info("Job is NOT null...id:");
//s = String.format("job ID:%s jobRefId:%s",jobIdNew,job.getJobReference());
//log.info(s);
while (true) {
try{
Job pollJob = bigquery.jobs().get(jobRef.getProjectId(), job.getJobReference().getJobId()).execute();
String status = pollJob.getStatus().getState();
String errors = "";
String workingDataString = "";
if ((timesToSleep % 10) == 0) {
String statusString = String.format("Job status (%dsec) JobId:%s status:%s\n", timesToSleep, job.getJobReference().getJobId(), status);
log.info(statusString);
}
if (pollJob.getStatus().getState().equals("DONE")) {
status = String.format("Job done, processed %s bytes\n", pollJob.getStatistics().toString()); // getTotalBytesProcessed());
log.info(status); // compute load stats with this string
if (pollJob.getStatus().getErrors() != null) {
errors = pollJob.getStatus().getErrors().toString();
log.info(errors);
}
break;
}
/* Assumed completion of the truncated excerpt: sleep 1s per iteration, matching the "(%dsec)" log above */
timesToSleep++;
Thread.sleep(1000);
} catch (IOException e) {
log.info(String.format("Polling exception:%s", e));
}
}
} catch (Exception e) {
log.info(String.format("Exception inserting job:%s", e));
}
}
The performance I get is as follows: the median upload rate, bytes/deltaTime, is 17 bytes/sec! Yes, bytes, not kilo or mega...
Worse, sometimes a load of only a few hundred bytes, just one row, takes up to 5 minutes. I generally get no errors, but I am thinking that with this performance I will not be able to upload each app's data before more data arrives. I am processing with a task queue in a backend instance; this task queue gets a timeout after about an hour of processing.
Is this poor performance because of the contents method?
A couple of things:
If you are loading a small amount of data, you may be better off using TableData.insertAll() rather than a load job; that lets you post the data and have it available immediately (a sketch follows below).
Load jobs are batch-oriented jobs. That is, you can insert (more or less) as many as you'd like, and they'll be processed when there are resources to do so. Sometimes you create a job while the worker pool is resizing, so you have to wait. Sometimes the worker pool is full.
If you provide a project & job ID, we can look into the performance of individual jobs to see what's taking so long.
Load jobs process in parallel; that is, once they start executing they should go very quickly, but the time to start executing may take a long time.
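As a rough sketch of the first suggestion, here is a streaming insert of a single row with tabledata().insertAll(), reusing the bigquery client, projectId, datasetId, and tableId from the question (the column name is made up, and java.util.Collections is assumed imported):

// Stream one row instead of creating a load job; the data is available immediately.
TableDataInsertAllRequest.Rows row = new TableDataInsertAllRequest.Rows()
    .setJson(Collections.singletonMap("some_column", (Object) "some value")); // hypothetical column
TableDataInsertAllRequest body = new TableDataInsertAllRequest()
    .setRows(Collections.singletonList(row));
TableDataInsertAllResponse resp = bigquery.tabledata()
    .insertAll(projectId, datasetId, tableId, body)
    .execute();
if (resp.getInsertErrors() != null) {
    log.info("Streaming insert errors: " + resp.getInsertErrors().toString());
}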
There are three time fields in the job statistics: createTime, startTime, and endTime.
createTime is the moment the BigQuery server receives your request.
startTime is when BigQuery actually starts working on your job.
endTime is when the job is completely done.
I'd expect that most of the time is being spent between create and start. If that is not the case for small jobs, then something strange is going on, and a job ID would help diagnose the issue.
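A small sketch of how to split the measured delta accordingly, reusing pollJob from the question (note that in the Java model classes createTime is exposed as getCreationTime(); all three fields are epoch milliseconds):

// Separate queueing time (create -> start) from execution time (start -> end).
JobStatistics stats = pollJob.getStatistics();
long queuedMillis = stats.getStartTime() - stats.getCreationTime();
long execMillis = stats.getEndTime() - stats.getStartTime();
log.info(String.format("queued for %d ms, executed in %d ms", queuedMillis, execMillis));

If queuedMillis dominates, the bottleneck is job scheduling rather than the upload itself.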

Using Rx to Geocode an address in Bing Maps

I am learning to use the Rx extensions for a Silverlight 4 app I am working on. I created a sample app to nail down the process, and I cannot get it to return anything.
Here is the main code:
private IObservable<Location> GetGPSCoordinates(string Address1)
{
var gsc = new GeocodeServiceClient("BasicHttpBinding_IGeocodeService") as IGeocodeService;
Location returnLocation = new Location();
GeocodeResponse gcResp = new GeocodeResponse();
GeocodeRequest gcr = new GeocodeRequest();
gcr.Credentials = new Credentials();
gcr.Credentials.ApplicationId = APP_ID2;
gcr.Query = Address1;
var myFunc = Observable.FromAsyncPattern<GeocodeRequest, GeocodeResponse>(gsc.BeginGeocode, gsc.EndGeocode);
gcResp = myFunc(gcr) as GeocodeResponse;
if (gcResp.Results.Count > 0 && gcResp.Results[0].Locations.Count > 0)
{
returnLocation = gcResp.Results[0].Locations[0];
}
return returnLocation as IObservable<Location>;
}
gcResp comes back as null. Any thoughts or suggestions would be greatly appreciated.
The observable source you are subscribing to is asynchronous, so you can't access the result immediately after subscribing. You need to access the result in the subscription.
Better yet, don't subscribe at all and simply compose the response:
private IObservable<Location> GetGPSCoordinates(string Address1)
{
IGeocodeService gsc =
new GeocodeServiceClient("BasicHttpBinding_IGeocodeService");
Location returnLocation = new Location();
GeocodeResponse gcResp = new GeocodeResponse();
GeocodeRequest gcr = new GeocodeRequest();
gcr.Credentials = new Credentials();
gcr.Credentials.ApplicationId = APP_ID2;
gcr.Query = Address1;
var factory = Observable.FromAsyncPattern<GeocodeRequest, GeocodeResponse>(
gsc.BeginGeocode, gsc.EndGeocode);
return factory(gcr)
.Where(response => response.Results.Count > 0 &&
response.Results[0].Locations.Count > 0)
.Select(response => response.Results[0].Locations[0]);
}
If you only need the first valid value (the location of the address is unlikely to change), then add a .Take(1) between the Where and Select.
Edit: If you want to specifically handle the address not being found, you can either return results and have the consumer deal with it or you can return an Exception and provide an OnError handler when subscribing. If you're thinking of doing the latter, you would use SelectMany:
return factory(gcr)
.SelectMany(response => (response.Results.Count > 0 &&
response.Results[0].Locations.Count > 0)
? Observable.Return(response.Results[0].Locations[0])
: Observable.Throw<Location>(new AddressNotFoundException())
);
If you expand out the type of myFunc you'll see that it is Func<GeocodeRequest, IObservable<GeocodeResponse>>.
Func<GeocodeRequest, IObservable<GeocodeResponse>> myFunc =
Observable.FromAsyncPattern<GeocodeRequest, GeocodeResponse>
(gsc.BeginGeocode, gsc.EndGeocode);
So when you call myFunc(gcr) you have an IObservable<GeocodeResponse> and not a GeocodeResponse. Your code myFunc(gcr) as GeocodeResponse returns null because the cast is invalid.
What you need to do is either get the last value of the observable or just subscribe. Calling .Last() will block. If you call .Subscribe(...), your response will come through on the callback thread.
Try this:
gcResp = myFunc(gcr).Last();
Let me know how you go.
Richard (and others),
So I have the code returning the location and I have the calling code subscribing. Here is (hopefully) the final issue. When I call GetGPSCoordinates, the next statement gets executed immediately without waiting for the subscribe to finish. Here's an example in a button OnClick event handler.
Location newLoc = new Location();
GetGPSCoordinates(this.Input.Text).ObserveOnDispatcher().Subscribe(x =>
{
if (x.Results.Count > 0 && x.Results[0].Locations.Count > 0)
{
newLoc = x.Results[0].Locations[0];
Output.Text = "Latitude: " + newLoc.Latitude.ToString() +
", Longitude: " + newLoc.Longitude.ToString();
}
else
{
Output.Text = "Invalid address";
}
});
Output.Text = " Outside of subscribe --- Latitude: " + newLoc.Latitude.ToString() +
", Longitude: " + newLoc.Longitude.ToString();
The Output.Text assignment that takes place outside of Subscribe executes before the Subscribe has finished, so it displays zeros, and then the one inside the Subscribe displays the new location info.
The purpose of this process is to get location info that will then be saved in a database record, and I am processing multiple addresses sequentially in a foreach loop. I chose the Rx extensions as a solution to avoid the coding trap of the async callback, but it seems I have exchanged one trap for another.
Thoughts, comments, suggestions?

How to handle authentication and authorization with thrift?

I'm developing a system which uses thrift. I'd like clients identity to be checked and operations to be ACLed. Does Thrift provide any support for those?
Not directly. The only way to do this is to have an authentication method that creates a (temporary) key on the server, and then to change all your methods so that their first argument is this key and they can additionally raise a not-authenticated error. For instance:
exception NotAuthorisedException {
1: string errorMessage,
}
exception AuthTimeoutException {
1: string errorMessage,
}
service MyAuthService {
string authenticate( 1:string user, 2:string pass )
throws ( 1:NotAuthorisedException e ),
string mymethod( 1:string authstring, 2:string otherargs, ... )
throws ( 1:AuthTimeoutException e, ... ),
}
We use this method and save our keys to a secured memcached instance with a 30-minute timeout on keys, to keep everything "snappy". Clients who receive an AuthTimeoutException are expected to reauthorise and retry, and we have some firewall rules to stop brute-force attacks.
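To make the flow concrete, here is a minimal client-side sketch in Java, assuming the classes Thrift would generate from the IDL above (MyAuthService.Client and the two exceptions); the endpoint, credentials, and second argument are placeholders, and the trailing "..." arguments from the IDL are omitted:

import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;

public class AuthClientDemo {
    public static void main(String[] args) throws Exception {
        TSocket transport = new TSocket("localhost", 9090); // placeholder endpoint
        transport.open();
        MyAuthService.Client client =
                new MyAuthService.Client(new TBinaryProtocol(transport));
        // Authenticate once, then pass the returned key as the first argument of every call.
        String authstring = client.authenticate("user", "pass");
        String result;
        try {
            result = client.mymethod(authstring, "otherargs");
        } catch (AuthTimeoutException e) {
            // The server-side key expired (30 minutes above): re-authenticate and retry once.
            authstring = client.authenticate("user", "pass");
            result = client.mymethod(authstring, "otherargs");
        }
        System.out.println(result);
        transport.close();
    }
}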
Tasks like authorisation and permissions are not considered part of Thrift, mostly because these things are (usually) more related to the application logic than to a general RPC/serialization concept. The only thing Thrift supports out of the box right now is the TSASLTransport. I can't say much about that one myself, simply because I never felt the need to use it.
The other option could be to make use of THeaderTransport, which unfortunately, at the time of writing, is only implemented for C++. Hence, if you plan to use it with some other language, you may have to invest some additional work. Needless to say, we accept contributions ...
A bit late (I guess very late), but I modified the Thrift source code for this a couple of years ago.
I just submitted a ticket with the patch to https://issues.apache.org/jira/browse/THRIFT-4221 for exactly this.
Have a look at that. Basically, the proposal is to add a "BeforeAction" hook that does exactly that.
Example of the generated Go code, as a diff:
+ // Called before any other action is called
+ BeforeAction(serviceName string, actionName string, args map[string]interface{}) (err error)
+ // Called if an action returned an error
+ ProcessError(err error) error
}
type MyServiceClient struct {
@@ -391,7 +395,12 @@ func (p *myServiceProcessorMyMethod) Process(seqId int32, iprot, oprot thrift.TP
result := MyServiceMyMethodResult{}
var retval string
var err2 error
- if retval, err2 = p.handler.MyMethod(args.AuthString, args.OtherArgs_); err2 != nil {
+ err2 = p.handler.BeforeAction("MyService", "MyMethod", map[string]interface{}{"AuthString": args.AuthString, "OtherArgs_": args.OtherArgs_})
+ if err2 == nil {
+ retval, err2 = p.handler.MyMethod(args.AuthString, args.OtherArgs_)
+ }
+ if err2 != nil {
+ err2 = p.handler.ProcessError(err2)