I am trying to use the new Google BigQuery Storage Write API in a Dataflow job using Beam.
I am using
BigQueryIO.<Pair<String, String>>write().withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
However, when I run it I get an error saying:
When writing an unbounded PCollection via FILE_LOADS or STORAGE_API_WRITES, triggering frequency must be specified
However, the Beam docs for withTriggeringFrequency (https://beam.apache.org/releases/javadoc/2.7.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.Write.html#withTriggeringFrequency-org.joda.time.Duration) say:
This is only applicable when the write method is set to BigQueryIO.Write.Method.FILE_LOADS, and only when writing an unbounded PCollection.
To be clear, I am using the STORAGE_WRITE_API method, not FILE_LOADS.
I am confused as to why it is asking me to include the triggeringFrequency field.
Edit: documentation on this new Storage Write API is poor, but my guess is that under the hood it does a form of batching, so, like the FILE_LOADS method, it needs some frequency to determine the rate of batching.
Looking at the source code where the triggering frequency is fetched, getStorageApiTriggeringFrequency(BigQueryOptions options), the Storage API triggering frequency comes from the BigQueryOptions only if the triggeringFrequency of the IO is not set.
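The lookup boils down to roughly the following (a paraphrased sketch of that behaviour, not the exact Beam source):

    // Paraphrased sketch: the IO-level triggeringFrequency wins,
    // otherwise the value falls back to the pipeline option.
    Duration getStorageApiTriggeringFrequency(BigQueryOptions options) {
      if (triggeringFrequency != null) {
        return triggeringFrequency;
      }
      if (options.getStorageWriteApiTriggeringFrequencySec() != null) {
        return Duration.standardSeconds(options.getStorageWriteApiTriggeringFrequencySec());
      }
      return null;
    }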
So
When writing an unbounded PCollection via FILE_LOADS or STORAGE_API_WRITES, triggering frequency must be specified
is correct.
But
This is only applicable when the write method is set to BigQueryIO.Write.Method.FILE_LOADS, and only when writing an unbounded PCollection.
is wrong.
Should be
This is only applicable when the write method is set to BigQueryIO.Write.Method.FILE_LOADS or BigQueryIO.Write.Method.STORAGE_WRITE_API, and only when writing an unbounded PCollection.
Ideally, you should probably set it through BigQueryOptions.setStorageWriteApiTriggeringFrequencySec().
I think the documentation intentionally hides the implementation detail that the option can also be overridden through the I/O class builder itself.
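A minimal sketch of both ways to supply the frequency (the table name, element type, 30-second value and 'args' are placeholders; withTriggeringFrequency and setStorageWriteApiTriggeringFrequencySec are the two knobs mentioned above):

    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
    import org.apache.beam.sdk.io.gcp.bigquery.BigQueryOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.joda.time.Duration;

    // Option 1: set the frequency on the pipeline options, as suggested above.
    BigQueryOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(BigQueryOptions.class);
    options.setStorageWriteApiTriggeringFrequencySec(30); // placeholder value

    // Option 2: override it on the IO builder itself.
    BigQueryIO.Write<Pair<String, String>> write =
        BigQueryIO.<Pair<String, String>>write()
            .to("my-project:my_dataset.my_table") // placeholder table
            .withMethod(BigQueryIO.Write.Method.STORAGE_WRITE_API)
            // Depending on the Beam version, withNumStorageWriteApiStreams(...)
            // may also be needed when writing an unbounded PCollection.
            .withTriggeringFrequency(Duration.standardSeconds(30));
            // ... plus the schema / format function from your existing pipeline.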
I spent almost a full day debugging why my client couldn't post any forms, until I found out the anti-forgery mechanism got borked on the client side and the server just responded with a 400 error, with zero logs or information (it turns out anti-forgery validation failures are logged internally at Info level).
So I decided the server needs to handle this scenario specially; however, according to this answer I don't really know how to do that (aside from hacking).
Normally I would set up an IAlwaysRunResultFilter and check for IAntiforgeryValidationFailedResult. Easy.
Except that I use API controllers, so by default all results get transformed into ProblemDetails. So context.Result as mentioned here is always of type ObjectResult. The accepted solution there is to use options.SuppressMapClientErrors = true;, however I want to retain this mapping at the end of the pipeline. But if this option isn't set to true, I have no idea how to intercept the Result in the pipeline before this transformation.
So in my case, I want to do something with the result of the anti-forgery validation as mentioned in the linked post, but after that I want to retain the ProblemDetails transformation. But my question is titled generally, as it is about executing filters before the aforementioned client mapping filter.
Through hacking I am able to achieve what I want. If we take a look at the source code, we can see that the filter I want to precede has an order of -2000. So if I register my global filter like this: o.Filters.Add(typeof(MyResultFilter), -2001);, then the filter shown here correctly executes before ClientErrorResultFilter, and thus I can handle the result and retain the transformation after the handling. However, I feel like this is just exploiting the open-source nature of .NET 6, and of course, as you can see, it's an internal constant, so I have no guarantee the next patch doesn't change it and break my code. Surely there must be a proper way to order my filter to run before the API transform.
I am currently working on a REST API for a project, in which I need to search for events. I would like to make an endpoint for searching events within a period, that is, by specifying two parameters, from and to.
For a search you would normally use a GET operation. My question now is whether it makes sense to specify the two parameters in the path, or whether I should instead fall back to a POST operation for something like that.
Example for the path /Events{From}{To}
Is this even feasible with multiple parameters?
If you are not making a change to the resource, you should use a GET operation.
More detailed explanation:
If you were writing a plain old RPC API call, the two could technically be interchangeable, as long as the server-side processing were no different between the two calls. However, in order for the call to be RESTful, calling the endpoint via the GET method should have a distinct functionality (which is to get resource(s)) from the POST method (which is to create new resources).
GET request with multiple parameters: /events?param1=value1&param2=value2
GET request with an array as parameter: /events?param=value1,value2,value3
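For illustration, a minimal sketch of such an endpoint using Spring MVC (purely an assumption about the stack; Event, EventService and the ISO date format are placeholders):

    import java.time.LocalDate;
    import java.util.List;
    import org.springframework.format.annotation.DateTimeFormat;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RequestParam;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class EventController {

        private final EventService eventService; // hypothetical service

        public EventController(EventService eventService) {
            this.eventService = eventService;
        }

        // GET /events?from=2023-01-01&to=2023-01-31
        @GetMapping("/events")
        public List<Event> search(
                @RequestParam @DateTimeFormat(iso = DateTimeFormat.ISO.DATE) LocalDate from,
                @RequestParam @DateTimeFormat(iso = DateTimeFormat.ISO.DATE) LocalDate to) {
            return eventService.findBetween(from, to); // hypothetical query method
        }
    }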
I'd like to store some JMeterVariables together with the sampleResults in an InfluxDB, using a BackendListenerClient for InfluxDB (I am using the rocks.nt.apm.jmeter package to get the raw results).
My current test logs in as a random customer, requests some random entities and logs out. Most of the results are within a normal range; I'd like to zoom in on certain extreme sample results and find out which customer / requested entity they belong to. We have seen in the past that we can find performance issues with specific configurations this way.
I store customer and entity ID in a variable. My issue is that the JMeterVariables are not accessible from the BackendListenerClient. I looked at the sample_variables property, but this property will store the variables in the sampleEvent, which is not accessible in the BackendListener.
I could use the thread name or sample label to store the vars, but I saw that the CSV writer can actually write the variable values from the event, which is a much nicer solution.
Looking forward to your thoughts,
Best regards, Spud
You've got it right: the Backend Listener is not customizable in terms of fine-shaping the data you're sending to Influx.
Alas.
However, there's a Swiss Army Knife always available in JMeter: the JSR223 components.
The JSR223 listener, in your case.
The InfluxDB line protocol is as simple as can be, and HTTP/REST libraries are in abundance (Apache HttpClient is already included with standard JMeter, to my recollection, so no additional jars are needed). Just pick it all up, form your time series as you like, toss it at your InfluxDB REST endpoint, and the job's done.
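A rough sketch of what such a JSR223 Listener script could look like (written in Java-style syntax, which a Groovy JSR223 element accepts; the InfluxDB URL, database name, measurement and variable names are placeholders):

    // Imports resolve against the HttpClient that ships with JMeter.
    import org.apache.http.client.methods.HttpPost;
    import org.apache.http.entity.StringEntity;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;

    // 'prev' is the SampleResult this listener just observed,
    // 'vars' exposes the JMeterVariables you set during the test.
    String customer = vars.get("customerId"); // placeholder variable names
    String entity = vars.get("entityId");

    // InfluxDB line protocol: measurement,tag=value field=value timestamp(ns)
    String line = "samples,label=" + prev.getSampleLabel().replace(" ", "\\ ")
            + ",customer=" + customer + ",entity=" + entity
            + " elapsed=" + prev.getTime()
            + " " + (prev.getTimeStamp() * 1000000L);

    CloseableHttpClient client = HttpClients.createDefault();
    HttpPost post = new HttpPost("http://localhost:8086/write?db=jmeter"); // placeholder endpoint
    post.setEntity(new StringEntity(line));
    client.execute(post).close();
    client.close();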
I would like to create a kind of "smart" MoveIteratorFactory for my VRP (time-windowed) example-based app. This move factory should return an Iterator that generates, each time, a CompositeMove based on the current solution state.
Is it possible for the MoveIteratorFactory to create an Iterator that generates moves based on current solution state?
AFAIK MoveIteratorFactory's methods accept a ScoreDirector object, and it seems that the returned Iterator should generate moves using instances retrieved from the ScoreDirector's working solution. But are these instances updated while the solving process is running? Do they have all planning variables set according to the current working solution state when the hasNext and next methods are called? Or should an iterator have a field with a ScoreDirector instance and generate moves using instances retrieved each time from the ScoreDirector?
Yes, just make sure that the cacheType isn't PHASE or higher (by default it's fine, because the default is JUST_IN_TIME). See the docs, chapter 7.
At the beginning of every step it will call createRandomMoveIterator(), which can take into account the state of the current workingSolution.
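A bare-bones sketch of what that can look like (package names and generics differ between OptaPlanner versions; VehicleRoutingSolution, getCustomerList() and SmartCompositeMoveIterator stand in for your own classes):

    import java.util.Iterator;
    import java.util.Random;
    import org.optaplanner.core.impl.heuristic.move.Move;
    import org.optaplanner.core.impl.heuristic.selector.move.factory.MoveIteratorFactory;
    import org.optaplanner.core.impl.score.director.ScoreDirector;

    public class SmartCompositeMoveIteratorFactory implements MoveIteratorFactory {

        @Override
        public long getSize(ScoreDirector scoreDirector) {
            // A rough upper bound on how many moves this factory can produce.
            VehicleRoutingSolution solution = (VehicleRoutingSolution) scoreDirector.getWorkingSolution();
            return solution.getCustomerList().size();
        }

        @Override
        public Iterator<Move> createOriginalMoveIterator(ScoreDirector scoreDirector) {
            throw new UnsupportedOperationException("Only random selection is supported.");
        }

        @Override
        public Iterator<Move> createRandomMoveIterator(ScoreDirector scoreDirector, Random workingRandom) {
            // Called at the beginning of every step (with JUST_IN_TIME caching),
            // so the working solution already reflects all moves accepted so far.
            VehicleRoutingSolution solution = (VehicleRoutingSolution) scoreDirector.getWorkingSolution();
            return new SmartCompositeMoveIterator(solution, workingRandom); // your own iterator
        }
    }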
I've been fighting for some time to find a decent way to handle a workflow based on a series of asynchronous ASIHTTPRequests (I am using queues). So far it has eluded me, and I always end up with a hideous mess of delegate calls and spaghetti code exploding all over my project.
It works as follows:
1. Download a list of items (a single ASIHTTPRequest, added to a queue).
2. The items retrieved in step 1 need to be stored.
3. Each item from step 1 is then parsed, queuing one ASIHTTPRequest per item for its sub-items.
4. Each of the requests from step 3 is processed and the sub-items are stored.
5. I need to be able to update the UI with the progress percentage and messages.
I'm unable for the life of me to figure out a clean/maintainable way of doing this.
I've looked at the following links:
Manage Multiple Asynchronous Requests in iOS with ASINetworkQueue
Sync-Async Pair Pattern Easy Concurrency on iOS
But either I'm missing something, or they don't seem to adequately describe what I'm trying to achieve.
Could I use blocks?
I found myself facing quite a similar issue when I got the task of working on an app using a set of async HTTP and FTP handlers in a set of processes and workflows.
I'm not familiar with the ASIHTTP API, but I assume I did something similar.
I defined a so-called RequestOperationQueue, which can, for example, represent all request operations of a certain workflow. I also defined several template operations, for example FTPDownloadOperation. And here comes the key point: I implemented all these RequestOperations more or less according to the idea of http://www.dribin.org/dave/blog/archives/2009/05/05/concurrent_operations/. Instead of implementing the delegate logic in the operation itself, I implemented something like callback handlers specialized for the different protocols (HTTP, FTP, rsync, etc.) that provide a status property for the given request, which can be handled by the operation via KVO.
The UI can be notified about the workflow, for example by a delegate protocol for RequestOperationQueue, e.g. didReceiveCallbackForRQOperation:(RequestOperation)rqo.
From my point of view, coding workflows that include client-server operations becomes quite manageable with this approach.