Subscribe does not print any log when using publishOn in Project Reactor - spring-webflux

I've got a very simple stream, based on the book Hands-On Reactive Programming in Spring 5:
Flux.just(1, 2, 3).publishOn(Schedulers.elastic())
.concatMap(i -> Flux.range(0, i).publishOn(Schedulers.elastic()))
.subscribe(log::info);
However, there's no console output at all. But if I add doOnNext after just:
Flux.just(1, 2, 3).doOnNext(log::debug).publishOn(Schedulers.elastic())
.concatMap(i -> Flux.range(0, i).publishOn(Schedulers.elastic()))
.subscribe(log::info);
then I get both the debug and the info output. May I know why?
Edit 1:
Here's the console output of the following stream:
Flux.just(1, 2, 3).doOnNext(log::debug)
.publishOn(Schedulers.elastic()).doOnNext(log::warn)
.concatMap(i -> Flux.range(0, i).publishOn(Schedulers.elastic()))
.subscribe(log::info);
And output:
[main] INFO ReactiveTest - 1
[main] INFO ReactiveTest - 2
[elastic-2] WARN ReactiveTest - 1
[main] INFO ReactiveTest - 3
[elastic-2] DEBUG ReactiveTest - 0
[elastic-2] WARN ReactiveTest - 2
[elastic-2] DEBUG ReactiveTest - 0
[elastic-2] DEBUG ReactiveTest - 1
[elastic-2] WARN ReactiveTest - 3
[elastic-2] DEBUG ReactiveTest - 0
[elastic-2] DEBUG ReactiveTest - 1
[elastic-2] DEBUG ReactiveTest - 2
I think the log messages prove that the function passed to subscribe is called on the same thread as the function passed to concatMap.

Your first program is probably terminating right after you call subscribe. From the docs of subscribe:
Keep in mind that since the sequence can be asynchronous, this will immediately
return control to the calling thread. This can give the impression the consumer is
not invoked when executing in a main thread or a unit test for instance.
In the second program, doOnNext is invoked in the middle of processing, so the pipeline has time to output all the results. If you run the program many times you will see that it sometimes fails to output the second log.
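One way to confirm this is to keep the JVM alive until onComplete fires. A minimal sketch, assuming the snippet runs in a plain main method (System.out stands in for the logger here):
import reactor.core.publisher.Flux;
import reactor.core.scheduler.Schedulers;
import java.util.concurrent.CountDownLatch;

public class ReactiveTest {
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        Flux.just(1, 2, 3).publishOn(Schedulers.elastic())
            .concatMap(i -> Flux.range(0, i).publishOn(Schedulers.elastic()))
            .subscribe(
                System.out::println,        // onNext
                Throwable::printStackTrace, // onError
                latch::countDown);          // onComplete
        latch.await(); // block the main thread until the async pipeline completes
    }
}
In a test you could instead replace subscribe(...) with blockLast(), which subscribes and waits for the sequence to complete.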

Related

Parallel processing Flux::groupBy hangs

I'm using Reactor 3.4.18 and have a question about Flux.groupBy. I generated 1000 integers and split them into 100 groups. I expected each group to be processed on a separate thread, but the stream hangs after a few integers are processed.
@Test
void shouldGroupByKeyAndProcessInParallel() {
    final Scheduler scheduler = Schedulers.newParallel("group", 1000);
    StepVerifier.create(Flux.fromStream(IntStream.range(0, 1000).boxed())
            .groupBy(integer -> integer % 100)
            .flatMap(groupedFlux -> groupedFlux
                    .subscribeOn(scheduler) // this line doesn't help
                    .doOnNext(integer -> log.info("processing {}:{}", groupedFlux.key(), integer)),
                2))
        .expectNextCount(1000)
        .verifyComplete();
}
test execution logs:
10:47:58.670 [main] DEBUG reactor.util.Loggers - Using Slf4j logging framework
10:47:58.846 [group-1] INFO com.huawei.hwclouds.coney.spike.FluxGroupByTest - processing 0:0
10:47:58.866 [group-1] INFO com.huawei.hwclouds.coney.spike.FluxGroupByTest - processing 1:1
10:47:58.867 [group-1] INFO com.huawei.hwclouds.coney.spike.FluxGroupByTest - processing 0:100
10:47:58.867 [group-1] INFO com.huawei.hwclouds.coney.spike.FluxGroupByTest - processing 1:101
10:47:58.867 [group-1] INFO com.huawei.hwclouds.coney.spike.FluxGroupByTest - processing 0:200
10:47:58.867 [group-1] INFO com.huawei.hwclouds.coney.spike.FluxGroupByTest - processing 1:201
-------- start hanging ----------
I changed the flatMap concurrency to 2 to speed up the reproduction. I expected that flatMap would only slow down the overall processing, not hang it.
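For context, the groupBy javadoc notes that the operator works best when the number of groups is low compared to the downstream concurrency, and that every group must be consumed. With a flatMap concurrency of 2, only two groups are ever subscribed; items belonging to the other 98 groups pile up in groupBy's internal buffer (256 elements by default), and once that buffer fills, nothing more is requested from upstream, so the stream deadlocks. A sketch of the commonly suggested workaround (the same test, with the concurrency raised to at least the number of groups):
@Test
void shouldGroupByKeyAndProcessInParallel() {
    final Scheduler scheduler = Schedulers.newParallel("group", 100);
    StepVerifier.create(Flux.fromStream(IntStream.range(0, 1000).boxed())
            .groupBy(integer -> integer % 100)
            .flatMap(groupedFlux -> groupedFlux
                    .subscribeOn(scheduler)
                    .doOnNext(integer -> log.info("processing {}:{}", groupedFlux.key(), integer)),
                100)) // concurrency >= number of groups, so no group is left unconsumed
        .expectNextCount(1000)
        .verifyComplete();
}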

Calling a Karate feature file returns a response object that includes multiple copies of the parent scenario's previous response object

I am investigating an exponential increase in Java heap size when executing complex scenarios, especially ones with multiple reusable scenarios. This is my attempt to troubleshoot the issue with a simple example and a possible explanation of the JVM heap usage.
Environment: Karate 1.1.0.RC4 | JDK 14 | Maven 3.6.3
Example: Download the project, extract it, and execute the Maven command as per the README
Observation: As per the following example, if we call the same scenario multiple times, the response object grows exponentially, since it includes the response from the previously called scenario along with copies of the global variables.
@unexpected
Scenario: Not over-writing nested variable
* def response = call read('classpath:examples/library.feature@getLibraryData')
* string out = response
* def resp1 = response.randomTag
* karate.log('FIRST RESPONSE SIZE = ', out.length)
* def response = call read('classpath:examples/library.feature@getLibraryData')
* string out = response
* def resp2 = response.randomTag
* karate.log('SECOND RESPONSE SIZE = ', out.length)
Output:
10:26:23.863 [main] INFO c.intuit.karate.core.FeatureRuntime - scenario called at line: 9 by tag: getLibraryData
10:26:23.875 [main] INFO c.intuit.karate.core.FeatureRuntime - scenario called at line: 14 by tag: libraryData
10:26:23.885 [main] INFO com.intuit.karate - FIRST RESPONSE SIZE = 331
10:26:23.885 [main] INFO c.intuit.karate.core.FeatureRuntime - scenario called at line: 9 by tag: getLibraryData
10:26:23.894 [main] INFO c.intuit.karate.core.FeatureRuntime - scenario called at line: 14 by tag: libraryData
10:26:23.974 [main] INFO com.intuit.karate - SECOND RESPONSE SIZE = 1783
10:26:23.974 [main] INFO c.intuit.karate.core.FeatureRuntime - scenario called at line: 9 by tag: getLibraryData
10:26:23.974 [main] INFO c.intuit.karate.core.FeatureRuntime - scenario called at line: 14 by tag: libraryData
10:26:23.988 [main] INFO com.intuit.karate - THIRD RESPONSE SIZE = 8009
Do we really need to include the response and global variables in the result of a called feature file (non-shared scope)?
When we read a large JSON file and call multiple reusable scenario files, a copy of the read JSON data gets added to the response object each time. Is there a way to avoid this behavior?
Is there a better way to script complex tests using reusable scenarios without keeping multiple copies of the same variables?
Okay, can you look at this issue:
https://github.com/intuit/karate/issues/1675
I agree we can optimize the response and global variables. Would be great if you can contribute code.

Is there a better way to reference sub-features so that this test finishes?

When running the following scenario, the tests finish running but execution hangs immediately afterwards and the Gradle test command never finishes. The Cucumber report isn't built, so it hangs before that point.
It seems to be caused by having two call read() invocations to different scenarios that both call a third scenario. That third scenario references the parent context to inspect the current request.
When that parent request is stored in a variable, the tests hang. When that variable is cleared before leaving the third scenario, the test finishes as normal. So something about holding a reference to that context hangs the tests at the end.
Is there a reason this doesn't complete? Am I missing some important code that lets the tests finish?
I've added * def currentRequest = {} at the end of the special-request scenario and that allows the tests to complete, but that seems like a hack.
This is the top-level test scenario:
Scenario: Updates user id
* def user = call read('utils.feature@endpoint=create-user')
* set user.clientAccountId = user.accountNumber + '-test-client-account-id'
* call read('utils.feature@endpoint=update-user') user
* print 'the test is done!'
The test scenario calls two different scenarios in the same utils.feature file.
utils.feature:
@ignore
Feature: /users
Background:
* url baseUrl
@endpoint=create-user
Scenario: create a standard user for a test
Given path '/create'
* def restMethod = 'post'
* call read('special-request.feature')
When method restMethod
Then status 201
@endpoint=update-user
Scenario: set a user's client account ID
Given path '/update'
* def restMethod = 'put'
* call read('special-request.feature')
When method restMethod
Then status 201
And match response == {"status":"Success", "message":"Update complete"}
Both of the util scenarios call the special-request feature with different parameters/requests.
special-request.feature:
@ignore
Feature: Builds a special request
Scenario: special-request
# The next line causes the test to sit for a long time
* def currentRequest = karate.context.parentContext.getRequest()
# Without the following clear of currentRequest, the test never finishes
# De-referencing the parent context's request allows the test to finish
* def currentRequest = {}
Without * def currentRequest = {}, these are the last lines of output I get before the tests seem to stop:
12:21:38.816 [ForkJoinPool-1-worker-1] DEBUG com.intuit.karate - response time in milliseconds: 8.48
1 < 201
1 < Content-Type: application/json
{
"status": "Success",
"message": "Update complete"
}
12:21:38.817 [ForkJoinPool-1-worker-1] DEBUG com.jayway.jsonpath.internal.path.CompiledPath - Evaluating path: $
12:21:38.817 [ForkJoinPool-1-worker-1] DEBUG com.jayway.jsonpath.internal.path.CompiledPath - Evaluating path: $
12:21:38.817 [ForkJoinPool-1-worker-1] DEBUG com.jayway.jsonpath.internal.path.CompiledPath - Evaluating path: $
12:21:38.817 [ForkJoinPool-1-worker-1] DEBUG com.jayway.jsonpath.internal.path.CompiledPath - Evaluating path: $
12:21:38.818 [ForkJoinPool-1-worker-1] INFO com.intuit.karate - [print] the test is done!
12:21:38.818 [pool-1-thread-1] DEBUG com.jayway.jsonpath.internal.path.CompiledPath - Evaluating path: $
<==========---> 81% EXECUTING [39s]
With currentRequest = {}, the test completes and the Cucumber report generates successfully, which is what I would expect to happen even without that line.
Two comments:
* karate.context.parentContext.getRequest()
Wow, these are internal APIs not intended for users, so I would strongly advise passing values around as variables instead. All bets are off if you have trouble while using them.
It does sound like you have a null pointer in the above (no surprises here).
There is a bug in 0.9.4 where failures in some edge cases (such as the things you are doing, the pre-test life-cycle, or karate-config.js) can hang the parallel runner. You should see something in the logs that indicates a failure; if not, do try to help us replicate this problem.
This should be fixed in the develop branch, so you could help if you can build from source and test locally. Instructions are here: https://github.com/intuit/karate/wiki/Developer-Guide
And if you still see a problem, please do this: https://github.com/intuit/karate/wiki/How-to-Submit-an-Issue

Failure importing data from BigQuery to GCS

Dear support at Google,
We recently noticed that many of the GAP site import jobs that extract and upload data from Google BigQuery to Google Cloud Storage have been failing since April 4th. Our upload jobs ran fine before April 4th but have been failing since. After investigating, we believe this is an issue/error on the BigQuery side, not in our job. The details of the error info from the BigQuery API when uploading data are shown below:
216769 [main] INFO  org.mortbay.log  - Dataset : 130288123
217495 [main] INFO  org.mortbay.log  - Job is PENDING waiting 10000 milliseconds...
227753 [main] INFO  org.mortbay.log  - Job is PENDING waiting 10000 milliseconds...
237995 [main] INFO  org.mortbay.log  - Job is PENDING waiting 10000 milliseconds...
Heart beat
248208 [main] INFO  org.mortbay.log  - Job is PENDING waiting 10000 milliseconds..
258413 [main] INFO  org.mortbay.log  - Job is PENDING waiting 10000 milliseconds...
268531 [main] INFO  org.mortbay.log  - Job is RUNNING waiting 10000 milliseconds...
Heart beat
278675 [main] INFO  org.mortbay.log  - An internal error has occurred
278675 [main] INFO  org.mortbay.log  - ErrorProto : null
 
As per the log, it is an internal error, with ErrorProto: null.
 
Our Google account: ea.eadp@gmail.com
 
Our Google BigQuery projects:
Origin-BQ              origin-bq-1
Pulse-web              lithe-creek-712
The import failure is on the following data sets:
 
In Pulse-web, lithe-creek-712:
101983605
130288123
48135564
56570684
57740926
64736126
64951872
72220498
72845162
73148296
77517207
86821637
 
 
Please look into this and let us know if you have any updates.
Thank you very much and looking forward to hearing back from you.
 
Thanks

Spark execution occasionally gets stuck at mapPartitions at Exchange.scala:44

I am running a Spark job on a two node standalone cluster (v 1.0.1).
Spark execution often gets stuck at the task mapPartitions at Exchange.scala:44.
This happens at the final stage of my job, in a call to saveAsTextFile (as I expect from Spark's lazy execution).
It is hard to diagnose because I never experience it in local mode with local IO paths, and occasionally the job on the cluster does complete as expected with the correct output (the same output as in local mode).
This seems possibly related to reading a ~170MB file from S3 immediately prior, as I see the following logging in the console:
DEBUG NativeS3FileSystem - getFileStatus returning 'file' for key '[PATH_REMOVED].avro'
INFO FileInputFormat - Total input paths to process : 1
DEBUG FileInputFormat - Total # of splits: 3
...
INFO DAGScheduler - Submitting 3 missing tasks from Stage 32 (MapPartitionsRDD[96] at mapPartitions at Exchange.scala:44)
DEBUG DAGScheduler - New pending tasks: Set(ShuffleMapTask(32, 0), ShuffleMapTask(32, 1), ShuffleMapTask(32, 2))
The last logging I see before the task apparently hangs/gets stuck is:
INFO NativeS3FileSystem: Opening key '[PATH_REMOVED].avro' for reading at position '67108864'
Has anyone else experienced non-deterministic problems related to reading from S3 in Spark?