Flink: How to process rest of finite stream with combination of countWindowAll() - kotlin

//assume following logic
val source = arrayOf(1,2,3,4,5,6,7,8,9,10,11,12) // total 12 elements
val env = StreamExecutionEnvironment.createLocalEnvironment(1);
val input = env.fromCollection(source)
.countWindowAll(5)
.aggregate(...) // pack them to List<Int> for bulk upload to DB
.addSink(...) // sends bulk
When i execute it - only first 10 processed, but rest 2 elements
are thrown away - flink shutdown without processing of them.
The only avoid for me - while i totally controll source data, i can push some well-known IGNORABLE_VALUES to source collection to fit window size and then ignore them in sink... but i think where is some far more professional way in flink.

You have a finite stream of 12 and a window that triggers for every 5 elements. So the first window gets 5 elements and then triggers, then the next 5 are received and it triggers, but the last 2 come and the job knows that no more are going to come. So since there aren't 5 elements in the window the trigger doesn't fire so nothing is done with them.

Related

Kafka streams: groupByKey and reduce not triggering action exactly once when error occurs in stream

I have a simple Kafka streams scenario where I am doing a groupyByKey then reduce and then an action. There could be duplicate events in the source topic hence the groupyByKey and reduce
The action could error and in that case, I need the streams app to reprocess that event. In the example below I'm always throwing an error to demonstrate the point.
It is very important that the action only ever happens once and at least once.
The problem I'm finding is that when the streams app reprocesses the event, the reduce function is being called and as it returns null the action doesn't get recalled.
As only one event is produced to the source topic TOPIC_NAME I would expect the reduce to not have any values and skip down to the mapValues.
val topologyBuilder = StreamsBuilder()
topologyBuilder.stream(
TOPIC_NAME,
Consumed.with(Serdes.String(), EventSerde())
)
.groupByKey(Grouped.with(Serdes.String(), EventSerde()))
.reduce { current, _ ->
println("reduce hit")
null
}
.mapValues { _, v ->
println(Id: "${v.correlationId}")
throw Exception("simulate error")
}
To cause the issue I run the streams app twice. This is the output:
First run
Id: 90e6aefb-8763-4861-8d82-1304a6b5654e
11:10:52.320 [test-app-dcea4eb1-a58f-4a30-905f-46dad446b31e-StreamThread-1] ERROR org.apache.kafka.streams.KafkaStreams - stream-client [test-app-dcea4eb1-a58f-4a30-905f-46dad446b31e] All stream threads have died. The instance will be in error state and should be closed.
Second run
reduce hit
As you can see the .mapValues doesn't get called on the second run even though it errored on the first run causing the streams app to reprocess the same event again.
Is it possible to be able to have a streams app re-process an event with a reduced step where it's treating the event like it's never seen before? - Or is there a better approach to how I'm doing this?
I was missing a property setting for the streams app.
props["processing.guarantee"]= "exactly_once"
By setting this, it will guarantee that any state created from the point of picking up the event will rollback in case of a exception being thrown and the streams app crashing.
The problem was that the streams app would pick up the event again to re-process but the reducer step had state which has persisted. By enabling the exactly_once setting it ensures that the reducer state is also rolled back.
It now successfully re-processes the event as if it had never seen it before

using .net StackExchange.Redis with "wait" isn't working as expected

doing a R/W test with redis cluster (servers): 1 master + 2 slaves. the following is the key WRITE code:
var trans = redisDatabase.CreateTransaction();
Task<bool> setResult = trans.StringSetAsync(key, serializedValue, TimeSpan.FromSeconds(10));
Task<RedisResult> waitResult = trans.ExecuteAsync("wait", 3, 10000);
trans.Execute();
trans.WaitAll(setResult, waitResult);
using the following as the connection string:
[server1 ip]:6379,[server2 ip]:6379,[server3 ip]:6379,ssl=False,abortConnect=False
running 100 threads which do 1000 loops of the following steps:
generate a GUID as key and random as value of 1024 bytes
writing the key (using the above code)
retrieve the key using "var stringValue =
redisDatabase.StringGet(key, CommandFlags.PreferSlave);"
compare the two values and print an error if they differ.
running this test a few times generates several errors - trying to understand why as the "wait" with (10 seconds!) operation should have guaranteed the write to all slaves before returning.
Any idea?
WAIT isn't supported by SE.Redis as explained by its prolific author at Stackexchange.redis lacks the "WAIT" support
What about improving consistency guarantees, by adding in some "check, write, read" iterations?
SET a new key value pair (master node)
Read it (set CommandFlags to DemandReplica.
Not there yet? Wait and Try X times.
4.a) Not there yet? SET again. go back to (3) or give up
4.b) There? You're "done"
Won't be perfect but it should reduce probability of losing a SET??

Ignite Data streamer optimization

I am using below settings:
allowOverwrite: false
nodeParallelOperations: 1
autoFlushFrequency: 10
perNodeBufferSize: 5000000
My records size is around 2000 bytes. And see the "grid-data-loader-flusher"
thread stats as below:
Thread Count Average Longest Duration
grid-data-loader-flusher-#100 38 4,737,793.579 30,427,862 180,036,156
What would be the best configurations for Data streamer?
Thanks
Its good to have parallel streaming mode for data streamer. You can achieve this by collecting you key-value records in java Map and call the streamer.addData() method in parallel mode over that map. Here is the snippet.
maptoStream.entrySet().parallelStream().forEach(streamer::addData);
Also, if you are setting allowOverWrite to false then you cant use your custom stream receiver to process your collection of records. In this case it will skip the record(s) if it is already there in cache.
Regarding buffersize, you need to wait till buffer gets full each time to get it flushed automatically to cache. flush frequency comes to your rescue in this case and it will do periodic flushing. so whatever condition first satisfies(either buffer gets full or flush frequency reach) it will do flush. I preferred calling manual flush after above method call.
I observed that streamer works well with much more big collection on which you will call streamer.addData() method in parallel.

Load Runner Session ID Changes Indefinitely

Good day
I'm trying to perform load testing with LoadRunner 11. Here's an issue:
I've got automatically generated script after actions recording
Need to catch Session ID. I do it with web_reg_save_param() in the next way:
web_reg_save_param("S_ID",
"LB=Set-Cookie: JSESSIONID=",
"RB=; Path=/app/;",
LAST);
web_add_cookie("S_ID; DOMAIN={host}");
I catch ID from the response (Tree View):
D2B6F5B05A1366C395F8E86D8212F324
Compare it with Replay Log and see:
"S_ID = 75C78912AE78D26BDBDE73EBD9ADB510".
Compare 2 IDs above with the next request ID and see 3rd ID (Tree View):
80FE367101229FA34EB6429F4822E595
Why do I have 3 different IDs?
Let me know if I have to provide extra information.
You should Use(Search=All) below Code. Provided your Right and left boundary is correct:
web_reg_save_param("S_ID",
"LB=Set-Cookie: JSESSIONID=",
"RB=; Path=/app/;",
"Search=All",
LAST);
web_add_cookie("{S_ID}; DOMAIN={host}");
For Details refer HP Mannual for web_reg_save_param function.
I do not see what the conflict or controversy is here. Yes, items related to state or session will definitely change from user to user, one recording session to the next. They may even change from one request to the next. You may need to record several times to identify the change and use pattern for when you need to collect and when you need to reuse the collected data from a response in a subsequent request.
Take a listen to this podcast. It should help
http://www.perfbytes.com/dynamic-data-correlation

JMeter HTTP Request Post Body from File

I am trying to send an HTTP request via JMeter. I have created a thread group with a loop count of 25. I have a ramp up period of 120 and number of threads set to 30. Within the thread group, I have 20 HTTP Requests. I am a little confused as to how JMeter runs these requests. Do each of the 20 requests within a thread group run in a single thread, and each loop over a thread group runs concurrently on a different thread? Or do each of the 20 requests run in different threads as and when they are available.
My other question is, Over each loop, I want to vary the body of the post data that is being sent via the HTTP request. Is it possible to pass the post data body via a file instead of inserting the data into the JMeter Body Data Tab as show below:
However, instead of doing that, I want to define some kind of variable that picks a file based on iteration of the threadgroup that is running, for example, if it is looping over the thread group the second time, i want to call test2.txt, if the third time test3.txt etc and these text files will contain different post data. Could anyone tell me if this is possible with JMeter please and if so, how would I go about doing this.
Point 1 - JMeter concurrency
JMeter starts with 1 thread and spawns more threads as per ramp-up set. In your case (30 threads and 120 seconds ramp-up) another thread is being added each 4 seconds. Each thread executes 20 requests and if there is another loop - starts over, if there is no loop - the threads shuts down. To control load and concurrency JMeter provides 2 options:
Synchronizing Timer - pause all threads till specified threshold is reached and then release all of them at the same time
Constant Throughput Timer - to specify the load in requests per minute.
Point 2 - Send file instead of text
You can replace your request body with __fileToString function. If you want to parametrize it you can use nested function to provide current iteration - see below.
Point 3 - adding iteration as a parameter
JMeter provides 2 options on how you can increment a counter each loop
Counter config element - starts from specified value and gets incremented by specified value each time it's called.
__counter function - start from 1 and gets incremented by 1 each time it's being called. Can be "per-user" or "global"
See How to Use JMeter Functions post series for comprehensive information on above and more JMeter functions.