Testing Flink window

I have a simple Flink application, which sums up the events with the same id and timestamp within the last minute:
DataStream<String> input = env
        .addSource(consumerProps)
        .uid("app");
DataStream<Pixel> pixels = input.map(record -> mapper.readValue(record, Pixel.class));
pixels
        .keyBy("id", "timestampRoundedToMinutes")
        .timeWindow(Time.minutes(1))
        .sum("constant")
        .addSink(dynamoDBSink);
env.execute(jobName);
I am trying to test this application using the approach recommended in the documentation. I have also looked at this Stack Overflow question, but adding the sink didn't help.
I do have a @ClassRule in my test class, as recommended. The test function looks like this:
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(2);
CollectSink.values.clear();
Pixel testPixel1 = Pixel.builder().id(1).timestampRoundedToMinutes("202002261219").constant(1).build();
Pixel testPixel2 = Pixel.builder().id(2).timestampRoundedToMinutes("202002261220").constant(1).build();
Pixel testPixel3 = Pixel.builder().id(1).timestampRoundedToMinutes("202002261219").constant(1).build();
Pixel testPixel4 = Pixel.builder().id(3).timestampRoundedToMinutes("202002261220").constant(1).build();
env.fromElements(testPixel1, testPixel2, testPixel3, testPixel4)
        .keyBy("id", "timestampRoundedToMinutes")
        .timeWindow(Time.minutes(1))
        .sum("constant")
        .addSink(new CollectSink());
JobExecutionResult result = env.execute("AggregationTest");
assertNotEquals(0, CollectSink.values.size());
CollectSink is copied from the documentation.
What am I doing wrong? Also, is there a simple way to test the application with an embedded Kafka?
Thanks!

Your test fails because the window is never triggered: the job runs to completion before the window reaches the end of its allotted time.
The reason for this has to do with the way you are working with time. By specifying
.keyBy("id","timestampRoundedToMinutes")
you are arranging for all the events for the same id and with timestamps within the same minute to be in the same window. But because you are using processing time windowing (rather than event time windowing), your windows won't close until the time of day when the test is running crosses over the boundary from one minute to the next. With only four events to process, your job is highly unlikely to run long enough for this to happen.
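To make the timing concrete, here is a plain-Java model (an illustration only, not Flink's API; the class and method names are hypothetical) of when a one-minute processing-time window can fire: only once the wall clock reaches the next minute boundary. A test that pushes four elements through in a few hundred milliseconds will almost always finish well before that.

```java
import java.time.Instant;

// Plain-Java model (not Flink API): a 1-minute tumbling processing-time window
// covers [minute boundary, next minute boundary) of the wall clock and can only
// fire once the clock reaches the end of that interval.
final class ProcessingTimeWindowModel {
    static final long WINDOW_SIZE_MS = 60_000L;

    // Start of the window that contains the given wall-clock time (epoch millis).
    static long windowStart(long nowMs) {
        return nowMs - (nowMs % WINDOW_SIZE_MS);
    }

    // Roughly how long the job must keep running before that window can fire.
    static long msUntilWindowEnd(long nowMs) {
        return windowStart(nowMs) + WINDOW_SIZE_MS - nowMs;
    }

    public static void main(String[] args) {
        long now = Instant.now().toEpochMilli();
        System.out.println("A job started now would need to run about "
                + msUntilWindowEnd(now) + " ms before its first window fires.");
    }
}
```

For example, a test started at 12:19:30 would need to stay alive for about 30 more seconds, which a four-element job never does.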
What you should do instead is something more like this: set the time characteristic to event time, and provide a timestamp extractor and watermark assigner. Note that by doing this, there's no need to key by the timestamp rounded to minute boundaries; grouping events into one-minute intervals is part of what event time windows do anyway.
public static void main(String[] args) throws Exception {
    ...
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
    env.fromElements(testPixel1, testPixel2, testPixel3, testPixel4)
            .assignTimestampsAndWatermarks(new TimestampsAndWatermarks())
            .keyBy("id")
            .timeWindow(Time.minutes(1))
            .sum("constant")
            .addSink(new CollectSink());
    env.execute();
}
// Event stands for the stream's element type (Pixel in the question)
private static class TimestampsAndWatermarks extends BoundedOutOfOrdernessTimestampExtractor<Event> {
    public TimestampsAndWatermarks() {
        // the constructor expects the maximum out-of-orderness as a Time
        super(/* delay to handle out-of-orderness */);
    }
    @Override
    public long extractTimestamp(Event event) {
        return event.timestamp; // event time in epoch milliseconds
    }
}
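The extractor above reads a long `timestamp` from a generic Event, whereas the question's Pixel only carries `timestampRoundedToMinutes` as a string such as "202002261219". A minimal sketch of that conversion, assuming the string is always in `yyyyMMddHHmm` form and in UTC; `PixelTimestamps` is a hypothetical helper, not part of Flink:

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: turns the question's "yyyyMMddHHmm" strings into epoch
// milliseconds so they can be used as event-time timestamps. Assumes UTC.
final class PixelTimestamps {
    // "uuuu" is the proleptic year, which avoids era handling when parsing.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("uuuuMMddHHmm");

    static long toEpochMillis(String timestampRoundedToMinutes) {
        return LocalDateTime.parse(timestampRoundedToMinutes, FORMAT)
                .toInstant(ZoneOffset.UTC)
                .toEpochMilli();
    }

    public static void main(String[] args) {
        System.out.println(toEpochMillis("202002261219"));
    }
}
```

An extractor for Pixel could then return `PixelTimestamps.toEpochMillis(...)` applied to that field; the exact getter name depends on how Pixel is generated.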
See the documentation and the tutorials for more about event time, watermarks, and windowing.
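The event-time version produces output in a test because, when a bounded source such as fromElements finishes, Flink emits a final watermark of Long.MAX_VALUE, which closes every remaining event-time window. The following plain-Java model (not Flink code; all names are hypothetical) mimics that for the question's four pixels, keyed by id only as in the answer:

```java
import java.time.Instant;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model (not Flink code) of keyed, 1-minute event-time tumbling
// windows. An event-time window fires once the watermark reaches its last
// millisecond; at the end of a bounded input Flink emits a final watermark of
// Long.MAX_VALUE, so every remaining window fires.
final class EventTimeWindowModel {
    static final long WINDOW_SIZE_MS = 60_000L;

    // Simplified stand-in for the question's Pixel, with a millisecond timestamp.
    static final class Pixel {
        final int id;
        final long timestampMs;
        final int constant;

        Pixel(int id, long timestampMs, int constant) {
            this.id = id;
            this.timestampMs = timestampMs;
            this.constant = constant;
        }
    }

    // Start of the tumbling window containing a (non-negative) timestamp.
    static long windowStart(long timestampMs) {
        return timestampMs - (timestampMs % WINDOW_SIZE_MS);
    }

    // Sums "constant" per (id, window) for the windows that have fired at the
    // given watermark. Keys look like "id@windowStartMillis".
    static Map<String, Integer> firedWindows(List<Pixel> input, long watermarkMs) {
        Map<String, Integer> fired = new TreeMap<>();
        for (Pixel p : input) {
            long start = windowStart(p.timestampMs);
            long maxTimestamp = start + WINDOW_SIZE_MS - 1; // last millisecond of the window
            if (watermarkMs >= maxTimestamp) {
                fired.merge(p.id + "@" + start, p.constant, Integer::sum);
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        long t1219 = Instant.parse("2020-02-26T12:19:00Z").toEpochMilli();
        long t1220 = t1219 + WINDOW_SIZE_MS;
        List<Pixel> pixels = List.of(
                new Pixel(1, t1219, 1), new Pixel(2, t1220, 1),
                new Pixel(1, t1219, 1), new Pixel(3, t1220, 1));

        // Watermark still inside 12:19: nothing has fired yet.
        System.out.println(firedWindows(pixels, t1219 + 30_000L));
        // Final watermark at end of input: all three windows fire.
        System.out.println(firedWindows(pixels, Long.MAX_VALUE));
    }
}
```

With the final watermark, the model yields three results: id 1 with a sum of 2, and ids 2 and 3 with a sum of 1 each, which is what CollectSink should then contain.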

Related

How to read data from BigQuery periodically in Apache Beam?

I want to read data from BigQuery periodically in Beam. My test code is below:
pipeline.apply("Generate Sequence",
                GenerateSequence.from(0).withRate(1, Duration.standardMinutes(2)))
        .apply(Window.into(FixedWindows.of(Duration.standardMinutes(2))))
        .apply("Read from BQ", new ReadBQ())
        .apply("Convert Row",
                MapElements.into(TypeDescriptor.of(MyData.class)).via(MyData::fromTableRow))
        .apply("Map TableRow", ParDo.of(new MapTableRowV1()));
static class ReadBQ extends PTransform<PCollection<Long>, PCollection<TableRow>> {
    @Override
    public PCollection<TableRow> expand(PCollection<Long> input) {
        BigQueryIO.TypedRead<TableRow> rows = BigQueryIO.readTableRows()
                .fromQuery("select * from project.dataset.table limit 10")
                .usingStandardSql();
        return rows.expand(input.getPipeline().begin());
    }
}
static class MapTableRowV1 extends DoFn<AdUnitECPM, Void> {
    @ProcessElement
    public void processElement(ProcessContext pc) {
        LOG.info("String of mydata is " + pc.element().toString());
    }
}
Since BigQueryIO.TypedRead has to be applied to a PBegin, ReadBQ uses a trick: rows.expand(input.getPipeline().begin()). However, this job does NOT run every two minutes. How can I read data from BigQuery periodically?

Look at using Looping Timers. That provides the right pattern.
As written, your code would fire only once, after the sequence is built. For fixed windows, you would need an input value coming into the window for it to trigger. For example, have the pipeline read a Pub/Sub input and then have an agent push events into the topic/subscription every 2 minutes.
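The idea behind looping timers is a timer that re-arms itself one interval after each firing, so something happens every interval even when no data arrives. In Beam this would live in a stateful DoFn using @TimerId and @OnTimer; the plain-Java model below (hypothetical names, not Beam's API) only shows the re-arming logic:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Plain-Java model (not Beam API) of a "looping" timer: each time the timer
// fires it re-arms itself one interval later, so a tick is produced for every
// interval even when no new elements arrive.
final class LoopingTimerModel {
    private final PriorityQueue<Long> timers = new PriorityQueue<>();
    private final long intervalMs;
    private final List<Long> ticks = new ArrayList<>();

    LoopingTimerModel(long firstFiringMs, long intervalMs) {
        if (intervalMs <= 0) throw new IllegalArgumentException("interval must be positive");
        this.intervalMs = intervalMs;
        timers.add(firstFiringMs); // armed once, e.g. when the first element is seen
    }

    // Advances simulated time; every due timer fires and schedules its successor.
    void advanceTo(long timeMs) {
        while (!timers.isEmpty() && timers.peek() <= timeMs) {
            long firedAt = timers.poll();
            ticks.add(firedAt);               // a real pipeline would emit or query here
            timers.add(firedAt + intervalMs); // the "looping" part
        }
    }

    List<Long> ticks() {
        return ticks;
    }

    public static void main(String[] args) {
        long twoMinutes = 2 * 60 * 1000L;
        LoopingTimerModel model = new LoopingTimerModel(0L, twoMinutes);
        model.advanceTo(10 * 60 * 1000L); // ten minutes with no input at all
        System.out.println(model.ticks());
    }
}
```

Ten minutes of simulated time with a two-minute interval produces six ticks (at 0, 2, 4, 6, 8 and 10 minutes), even though no element arrives after the first.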
I am going to assume that you are running in streaming mode here; however, another way to do this would be to use a batch job and run it every 2 minutes from Composer. The reason is that if your streaming job is effectively idle for 90 seconds (2 minutes minus the processing time), it is wasting resources.
One other note: consider thinning down the column selection in your BigQuery SQL (to save time and money). Consider using a filter in your SQL that picks up a partition or cluster, and a timestamp filter so that you only scan records that are new within the last N minutes. This could give you better control over how you deal with latency and variability at the database level.
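A sketch of such a query, assuming a hypothetical timestamp column (`event_ts`) and hypothetical column names; `TIMESTAMP_SUB` and `CURRENT_TIMESTAMP` are standard BigQuery SQL functions. The helper only builds the query string and does not call BigQuery:

```java
import java.util.List;

// Hypothetical helper: builds a Standard SQL query that selects only the
// needed columns and filters to rows newer than the lookback window.
final class IncrementalQuery {
    static String build(String table, List<String> columns,
                        String timestampColumn, int lookbackMinutes) {
        if (columns.isEmpty()) {
            throw new IllegalArgumentException("select at least one column");
        }
        return "SELECT " + String.join(", ", columns)
                + " FROM `" + table + "`"
                + " WHERE " + timestampColumn
                + " > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL " + lookbackMinutes + " MINUTE)";
    }

    public static void main(String[] args) {
        System.out.println(build("project.dataset.table", List.of("id", "ecpm"), "event_ts", 2));
    }
}
```

Matching the lookback to the polling interval (2 minutes here) keeps each run from rescanning older rows; whether partitions are actually pruned depends on how the table is partitioned.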

As you mentioned in the question, BigQueryIO read transforms start with PBegin, which puts them at the start of the graph. To achieve what you are looking for, you will need to use the BigQuery client libraries directly within a DoFn.
For an example of this, have a look at this transform.
Using a normal DoFn for this is fine for small amounts of data, but for large amounts of data you will want to look at implementing that logic in a Splittable DoFn (SDF).

FLOWABLE: How to change 5 minute default interval of Async job

I assume DefaultAsyncJobExecutor is the class that gets picked up by default as the implementation of the AsyncExecutor interface (I'm not sure whether this assumption is right).
Basically, I want to change the default time-out duration of an asynchronous job. The default time-out is 5 minutes, which is the value of two variables in AbstractAsyncExecutor.java: timerLockTimeInMillis and asyncJobLockTimeInMillis.
I tried to change both values with their respective setter methods, and also tried to modify them directly in the constructor of my custom implementation, like this:
public class AsyncExecutorConfigImpl extends DefaultAsyncJobExecutor
{
    // @Value( "${async.timeout.duration}" )
    private int customAsyncJobLockTimeInMillis = 10 * 60 * 1000;
    AsyncExecutorConfigImpl()
    {
        super();
        setTimerLockTimeInMillis( this.customAsyncJobLockTimeInMillis );
        setAsyncJobLockTimeInMillis( this.customAsyncJobLockTimeInMillis );
        super.timerLockTimeInMillis = this.customAsyncJobLockTimeInMillis;
        super.asyncJobLockTimeInMillis = this.customAsyncJobLockTimeInMillis;
    }
}
But the values seem to remain the same, because the time-out still happens after 5 minutes.
Initialisation is done via an API, like start-new-process-instance. In this API, the following code starts a workflow process instance asynchronously (using processInstanceName and processInstanceId):
ProcessInstance lProcessInstance = mRuntimeService.createProcessInstanceBuilder()
        .processDefinitionId( lProcessDefinition.get().getId() )
        .variables( processInstanceRequest.getVariables() )
        .name( lProcessInstanceName )
        .predefineProcessInstanceId( lProcessInstanceId )
        .startAsync();
Once this is done, the rest of the workflow consists of service tasks. While one instance is executing, I guess the time-out occurs and the instance gets restarted.
Since I have a listener configured, I can see in the logs that the start-event activity is started again every 5 minutes. For example, event-1 is the first event, and it gets restarted after 5 minutes (the duration is displayed in the console logs).
I'm not sure what I'm missing at this point; let me know if any other details are required.

If the jar file is not under your control, you cannot change the default value of count, because the classes in the jar are already compiled. You can only change the value inside an object, so you can use the super keyword:
class CustomImplementation extends DefaultExecutedClass {
    private int custom_count = 1234;
    CustomImplementation() {
        super();
        super.count = this.custom_count;
    }
}
Otherwise, if you really need to change the original file, you have to extract it from the jar.

When you are using the Flowable Spring Boot starters, the SpringAsyncExecutor is used, which uses the TaskExecutor from Spring; it is provided as a bean. To change its values, you can use properties, e.g.:
flowable.process.async.executor.timer-lock-time-in-millis=600000
flowable.process.async.executor.async-job-lock-time-in-millis=600000
Note: be careful when changing this. If starting your process takes more than 5 minutes, it means you have a transaction open for that entire duration.
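The symptom in the question is consistent with how a job lock behaves: a lock expires a fixed time after it is acquired, and a job whose lock has expired while it is still running can be treated as abandoned and executed again. A simplified model of that rule (not Flowable's implementation; names are hypothetical), using the 5-minute default and the 600000 ms value from the properties:

```java
// Simplified model (not Flowable code) of an async job lock: the lock expires
// lockTimeMs after it was acquired, after which the job may be picked up again.
final class JobLockModel {
    static final long DEFAULT_LOCK_MS = 5 * 60 * 1000L; // the 5-minute default from the question

    static boolean lockExpired(long acquiredAtMs, long lockTimeMs, long nowMs) {
        return nowMs > acquiredAtMs + lockTimeMs;
    }

    public static void main(String[] args) {
        long sixMinutes = 6 * 60 * 1000L; // a job that is still running after 6 minutes
        System.out.println(lockExpired(0, DEFAULT_LOCK_MS, sixMinutes)); // default lock
        System.out.println(lockExpired(0, 600_000L, sixMinutes));        // 10-minute lock
    }
}
```

A job running for six minutes has outlived the default lock, which would look like the start event "restarting", but not a 10-minute lock.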

JMeter - Avoid threads abrupt shutdown

I have a test plan with several transaction controllers (which I call UserJourneys), each composed of some samplers (JourneySteps).
The problem I'm facing is that once the test duration is over, JMeter kills all the threads without considering whether they are in the middle of a UserJourney (transaction controller).
In some of these UJs I do important work that needs to be done before the user logs in again; otherwise the next iterations (a new test run) will fail.
The question is: is there a way to tell JMeter to wait for every thread to reach the end of its flow/UJ/TransactionController before killing it?
Thanks in advance!

This is not possible as of version 5.1.1; you could request an enhancement at:
https://jmeter.apache.org/issues.html
A workaround is to add a Flow Control Action as the first child of the Thread Group, containing a JSR223 PreProcessor.
The JSR223 PreProcessor will contain this Groovy code:
import org.apache.jorphan.util.JMeterStopTestException;
long startDate = vars["TESTSTART.MS"].toLong();
long now = System.currentTimeMillis();
String testDuration = Parameters;
if ((now - startDate) >= testDuration.toLong()) {
    log.info("Test duration " + testDuration + " reached");
    throw new JMeterStopTestException("Test duration " + testDuration + " reached");
} else {
    log.info("Test duration " + testDuration + " not reached yet");
}
And be configured like this:
Finally, you can set the testDuration property (in milliseconds) on the command line using:
-JtestDuration=3600000
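The reason this avoids cutting journeys short is that the duration check runs only at the start of each iteration, so a journey that has already started always completes. A plain-Java model of that behaviour (hypothetical names, not JMeter code):

```java
// Plain-Java model of the workaround: the duration check runs only at the start
// of each iteration, so a journey that has already started always finishes.
final class IterationStopModel {
    // Returns the time at which a thread stops, given journey lengths in ms.
    static long stopTime(long testStartMs, long testDurationMs, long[] journeyLengthsMs) {
        long now = testStartMs;
        for (long length : journeyLengthsMs) {
            if (now - testStartMs >= testDurationMs) {
                return now; // same condition as the JSR223 PreProcessor
            }
            now += length; // the whole journey runs without interruption
        }
        return now;
    }

    public static void main(String[] args) {
        // 100 ms test, 40 ms journeys: stops at 120 ms, after the third journey completes.
        System.out.println(stopTime(0, 100, new long[] {40, 40, 40, 40}));
    }
}
```

The trade-off is that a thread may overrun the nominal duration by up to one journey length.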

Optaplanner : View intermediate score

Is there a way to keep track of the score from time to time while the solver is running?
I currently instantiate my solver as follows:
SolverFactory solverFactory = SolverFactory.createFromXmlResource("solver/SolverConfig.xml");
Solver solver = solverFactory.buildSolver();
solver.addEventListener(new SolverEventListener() {
    @Override
    public void bestSolutionChanged(BestSolutionChangedEvent event) {
        logger.info("New best score : " + event.getNewBestScore().toShortString());
    }
});
solver.solve(planningSolution);
This way I am able to see the logs every time the best score changes.
However, I would like to view the score after every 100 steps or after every 10 seconds. Is that possible?

If you turn on DEBUG (or TRACE) logging, you'll see it.
If you want to listen to it in java, that's not supported in the public API, but there's PhaseLifecycleListener in the internal implementation that has no backward compatibility guarantees...
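If reporting the best score (rather than the step score) on a fixed schedule is enough, one option that stays within the public API is to remember the latest best score in the listener and print it from a scheduled thread. A minimal sketch; `PeriodicScoreReporter` is a hypical helper name of my own, and the Optaplanner side would only call `update(...)` from `bestSolutionChanged`:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper: remembers the latest best score and logs it periodically,
// whether or not the best score changed in the meantime.
final class PeriodicScoreReporter implements AutoCloseable {
    private final AtomicReference<String> latest = new AtomicReference<>("(no score yet)");
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Call from bestSolutionChanged(...), e.g. with event.getNewBestScore().toShortString().
    void update(String score) {
        latest.set(score);
    }

    String current() {
        return latest.get();
    }

    // Starts printing the latest known best score every periodSeconds.
    void start(long periodSeconds) {
        scheduler.scheduleAtFixedRate(
                () -> System.out.println("Current best score: " + latest.get()),
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow(); // stop the reporting thread when solving ends
    }

    public static void main(String[] args) {
        try (PeriodicScoreReporter reporter = new PeriodicScoreReporter()) {
            reporter.start(10);              // log every 10 seconds while solving
            reporter.update("0hard/-5soft"); // normally done inside bestSolutionChanged
            System.out.println(reporter.current());
        }
    }
}
```

This reports on a time basis only; a "every 100 steps" report would still need step-level events, which the public API does not expose.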

StepVerifier expectNoEvent for 0ms does not work (want to check for no delay for special cases)

I have to implement per-customer backpressure, and I did this by returning Mono.justOrEmpty(authentication).delayElement(Duration.ofMillis(calculatedDelay));
This seems to work fine in my unit tests, BUT if calculatedDelay is 0, I can't test it. This snippet fails with java.lang.AssertionError: unexpected end during a no-event expectation:
@Test
public void testSomeDelay_8CPUs_Load7() throws IOException {
    when(cut.getLoad()).thenReturn("7");
    when(cut.getCpuCount()).thenReturn(8L);
    when(authentication.getName()).thenReturn("delayeduser");
    Duration duration = StepVerifier
            .withVirtualTime(() -> cut.get(Optional.of(authentication)))
            .expectSubscription() // swallow subscribe event
            .expectNoEvent(Duration.ofMillis(0)) // here is the culprit
            .expectNextCount(1)
            .verifyComplete();
}
I don't know how to check the cases where I expect no delay at all. By the way, the same thing happens when I return Mono.justOrEmpty(authentication) (without any delay). I can't seem to check that I have created the correct non-delayed flow.

Yeah, this is a corner case that is hard to cover, especially since foo.delayElement(0) actually just returns foo unmodified.
What you could do is test the delay differently, by appending an elapsed() operator:
Duration duration = StepVerifier
        .withVirtualTime(() -> cut.get(Optional.of(authentication))
                .elapsed() // this will get correct timing of 0 with virtual time
                .map(Tuple2::getT1) // we only care about the timing, T2 being the value
        )
        .expectNext(calculatedDelay) // elapsed() reports milliseconds as a Long
        .verifyComplete();
