Webflux upload file not saving - spring-webflux

I have this piece of code:
#PostMapping(value = {"/store"})
public Mono<ResponseEntity<StoreResponse>> store(#RequestPart("file") Mono<FilePart> file,
#RequestPart("publicationId") String publicationId,
#RequestPart("visible") String visible) throws Exception {
return file
.doOnNext(this::checkFile)
.flatMap((f) -> this.saveFileToDatabase(UUID.fromString(publicationId),
f.filename(),
Boolean.parseBoolean(visible)))
.then(Mono.just(ResponseEntity.ok().body(new StoreResponse("", "", "Working",
null))))
.onErrorReturn(ResponseEntity.internalServerError().body(new StoreResponse("Not Working",
"", "Working", null)));
}
Question 1:
Strange thing with this is that this works as long as i use flatMap on the Mono.
When i switch to map instead of flatMap then it does not work (the file will not be written to a database this.saveFileToDatabase (via spring-data-r2dbc)).
Why is this the way it is?
Question 2:
When i want to do another operation (save file to a minio container) after the saving to database - how can i chain this into the given code? Another then()?

When I switch to map instead of flatMap then it does not work
map operator is designed to perform 1 to 1 synchronous operations like a simple object mapping. On the other hand, flatMap is used for asynchronous I/O operations like as in your case. See map vs flatMap in Reactor
When i want to do another operation after the saving to database
You could use a second flatmap like this:
.flatMap(file -> this.saveFileToDatabase(...).map(reponse -> file))
.flatMap(file -> this.saveFileToContainer(...))

Related

Flux File upload validation - file type

I am new to reactive programming. I am using flux for file upload. I need to make sure that all the files uploaded are of a specific type. If not I need to fail the request.
File.flatmap(input-> validate file())
.flatMAp(output->uploadtoazur())
My problem is when the second file is unacceptable type the first file has been processed. I want validateFile to scan all file and then do processing further
Basically if you want to process all the files at a time, not one-by-one, you should first collect them, since you're dealing with Flux. So, you can achieve it with collectList() on your Flux
And then, having List of your files, you can validate and process them. Here is an example of making your validation with handle()
On your Flux of files:
.collectList() // collect all files to List
.handle((files, sink) -> {
// validate all your files here, for example, using regular Stream API with allMatch()
...
if (allValid) {
// return these files if all files are valid
sink.next(files);
} else {
// throw Exception if some files are not valid
sink.error(new Exception("Some files are not valid"));
}
})
...
// further processing
This is one of the many possible ideas how to achieve what you want.
P.S. Actually you should have provided more code and format it properly.

Hwo to convert Flux<Item> to List<Item> by blocking

Background
I have a legacy application where I need to return a List<Item>
There are many different Service classes each belonging to an ItemType.
Each service class calls a few different backend APIs and collects the responses to create a SubType of the Item.
So we can say, each service class implementation returns an Item
All backend API access code is using WebClient which returns Mono of some type, and I can zip all Mono within the service to create an Item
The user should be able to look up many different types of items in one call. This requires many backend calls
So for performance sake, I wanted to make this all asynchronous using reactor, so I introduced Spring Reactive code.
Problem
If my endpoint had to return Flux<Item> then this code work fine,
But this is some service code which is used by other legacy code caller.
So eventually I want to return the List<Item> but When I try to convert my Flux into the List I get an error
"message": "block()/blockFirst()/blockLast() are blocking,
which is not supported in thread reactor-http-nio-3",
Here is the service, which is calling a few other service classes.
Flux<Item> itemFlux = Flux.fromIterable(searchRequestByItemType.entrySet())
.flatMap(e ->
getService(e.getKey()).searchItems(e.getValue()))
.subscribeOn(Schedulers.boundedElastic());
Mono<List<Item>> listMono = itemFlux
.collectList()
.block(); //This line throws error
Here is what the above service is calling
default Flux<Item> searchItems(List<SingleItemSearchRequest> requests) {
return Flux.fromIterable(requests)
.flatMap(this::searchItem)
.subscribeOn(Schedulers.boundedElastic());
}
Here is what a single-item search is which is used by above
public Mono<Item> searchItem(SingleItemSearchRequest sisr) {
return Mono.zip(backendApi.getItemANameApi(sisr.getItemIdentifiers().getItemId()),
sisr.isAddXXXDetails()
?backendApi.getItemAXXXApi(sisr.getItemIdentifiers().getItemId())
:Mono.empty(),
sisr.isAddYYYDetails()
?backendApi.getItemAYYYApi(sisr.getItemIdentifiers().getItemId())
:Mono.empty())
.map(tuple3 -> Item.builder()
.name(tuple3.getT1())
.xxxDetails(tuple3.getT2())
.yyyDetails(tuple3.getT3())
.build()
);
}
Sample project to replicate the problem..
https://github.com/mps-learning/spring-reactive-example
I’m new to spring reactor, feel free to pinpoint ALL errors in the code.
UPDATE
As per Patrick Hooijer Bonus suggestion, updating the Mono.zip entries to always contain some default.
#Override
public Mono<Item> searchItem(SingleItemSearchRequest sisr) {
System.out.println("\t\tInside " + supportedItem() + " searchItem with thread " + Thread.currentThread().toString());
//TODO: how to make these XXX YYY calls conditionals In clear way?
return Mono.zip(getNameDetails(sisr).defaultIfEmpty("Default Name"),
getXXXDetails(sisr).defaultIfEmpty("Default XXX Details"),
getYYYDetails(sisr).defaultIfEmpty("Default YYY Details"))
.map(tuple3 -> Item.builder()
.name(tuple3.getT1())
.xxxDetails(tuple3.getT2())
.yyyDetails(tuple3.getT3())
.build()
);
}
private Mono<String> getNameDetails(SingleItemSearchRequest sisr) {
return mockBackendApi.getItemCNameApi(sisr.getItemIdentifiers().getItemId());
}
private Mono<String> getYYYDetails(SingleItemSearchRequest sisr) {
return sisr.isAddYYYDetails()
? mockBackendApi.getItemCYYYApi(sisr.getItemIdentifiers().getItemId())
: Mono.empty();
}
private Mono<String> getXXXDetails(SingleItemSearchRequest sisr) {
return sisr.isAddXXXDetails()
? mockBackendApi.getItemCXXXApi(sisr.getItemIdentifiers().getItemId())
: Mono.empty();
}
Edit: Below answer does not solve the issue, but it contains useful information about Thread switching. It does not work because .block() is no problem for non-blocking Schedulers if it's used to switch to synchronous code.
This is because the block operator inherited the reactor-http-nio-3 Thread from backendApi.getItemANameApi (or one of the other calls in Mono.zip), which is non-blocking.
Most operators continue working on the Thread on which the previous operator executed, this is because the Thread is linked to the emitted item. There are two groups of operators where the Thread of the output item differs from the input:
flatMap, concatMap, zip, etc: Operators that emit items from other Publishers will keep the Thread link they received from this inner Publisher, not from the input.
Time based operators like delayElements, interval, buffer(Duration), etc. will schedule their tasks on the provided Scheduler, or Schedulers.parallel() if none provided. The emitted items will then be linked to the Thread the task was scheduled on.
In your case, Mono.zip emits items from backendApi.getItemANameApi linked to reactor-http-nio-3, which gets propagated downstream, goes outside both the flatMap in searchItems and in itemFlux, until it reaches your block operator.
You can solve this by placing a .publishOn(Schedulers.boundedElastic()), either in searchItem, searchItems or itemFlux. This will cause the item to switch to a Thread in the provided Scheduler.
Bonus: Since you requested to pinpoint errors: Your Mono.zip will not work if sisr.isAddXXXDetails() is false, as Mono.zip discards any element it could not zip. Since you return a Mono.empty() in that case, no items can be zipped and it will return an empty Mono.
If we have only spring-boot-starter-webflux defined as application dependency, then springbok spin up a `Netty server.
One is not expected to block() in a reactive application using a non-blocking server.
However, once we add spring-boot-starter-web dependency then even with the presence of spring-boot-starter-webflux, springboot spinup a tomcat server. Which is a thread-per-request model and is expected to have blocking calls
So to solve my problem, all I had to do above is, to add spring-boot-starter-web dependency in pom.xml. After that applications is started in Tomcat
with timcat .collectList().block() works in Controller class to return the List<Item>.
Whereas with the Netty server I could return only Flux<Item> not List<Item>, which is expected.

Neo4j 3.5's embeded database does not seem to persist data

I am trying to build a small command line tool that will store data in a neo4j graph. To do this I have started experimenting with Neo4j3.5's embedded databases. After putting together the following example I have found that either the nodes I am creating are not being saved to the database or the method of database creation is overwriting my previous run.
The Example:
fun main() {
//Spin up data base
val graphDBFactory = GraphDatabaseFactory()
val graphDB = graphDBFactory.newEmbeddedDatabase(File("src/main/resources/neo4j"))
registerShutdownHook(graphDB)
val tx = graphDB.beginTx()
graphDB.createNode(Label.label("firstNode"))
graphDB.createNode(Label.label("secondNode"))
val result = graphDB.execute("MATCH (a) RETURN COUNT(a)")
println(result.resultAsString())
tx.success()
}
private fun registerShutdownHook(graphDb: GraphDatabaseService) {
// Registers a shutdown hook for the Neo4j instance so that it
// shuts down nicely when the VM exits (even if you "Ctrl-C" the
// running application).
Runtime.getRuntime().addShutdownHook(object : Thread() {
override fun run() {
graphDb.shutdown()
}
})
}
I would expect that every time I run main the resulting query count will increase by 2.
That is currently not the case and I can find nothing in the docs that references a different method of opening an already created embedded database. Am I trying to use the embedded database incorrectly or am I missing something? Any help or info would be appreciated.
build Info:
Kotlin jvm 1.4.21
Neo4j-comunity-3.5.35
Transactions in neo4j 3.x have a 3 stage model
create
success / failure
close
you missed the third, which would then commit or rollback.
You can use Kotlin's use as Transaction is an AutoCloseable

Spring data - webflux - Chaining requests

i use reactive Mongo Drivers and Web Flux dependancies
I have a code like below.
public Mono<Employee> editEmployee(EmployeeEditRequest employeeEditRequest) {
return employeeRepository.findById(employeeEditRequest.getId())
.map(employee -> {
BeanUtils.copyProperties(employeeEditRequest, employee);
return employeeRepository.save(employee)
})
}
Employee Repository has the following code
Mono<Employee> findById(String employeeId)
Does the thread actually block when findById is called? I understand the portion within map actually blocks the thread.
if it blocks, how can I make this code completely reactive?
Also, in this reactive paradigm of writing code, how do I handle that given employee is not found?
Yes, map is a blocking and synchronous operation for which time taken is always going to be deterministic.
Map should be used when you want to do the transformation of an object /data in fixed time. The operations which are done synchronously. eg your BeanUtils copy properties operation.
FlatMap should be used for non-blocking operations, or in short anything which returns back Mono,Flux.
"how do I handle that given employee is not found?" -
findById returns empty mono when not found. So we can use switchIfEmpty here.
Now let's come to what changes you can make to your code:
public Mono<Employee> editEmployee(EmployeeEditRequest employeeEditRequest) {
return employeeRepository.findById(employeeEditRequest.getId())
.switchIfEmpty(Mono.defer(() -> {
//do something
}))
.map(employee -> {
BeanUtils.copyProperties(employeeEditRequest, employee);
return employee;
})
.flatMap(employee -> employeeRepository.save(employee));
}

How to avoid duplicates in BigQuery by streaming with Apache Beam IO?

We are using a pretty simple flow where messages are retrieved from PubSub, their JSON content is being flatten into two types (for BigQuery and Postgres) and then inserted into both sinks.
But, we are seeing duplicates in both sinks (Postgres was kinda fixed with a unique constraint and a "ON CONFLICT... DO NOTHING").
At first we trusted in the supposedly "insertId" UUId that the Apache Beam/BigQuery creates.
Then we add a "unique_label" attribute to each message before queueing them into PubSub, using data from the JSON itself, which gives them uniqueness (a device_id + a reading's timestamp). And subscribed to the topic using that attribute with "withIdAttribute" method.
Finally we paid for GCP Support, and their "solutions" do not work. They have told us to even use Reshuffle transform, which is deprecated by the way, and some windowing (that we do not won't since we want near-real time data).
This the main flow, pretty basic:
[UPDATED WITH LAST CODE]
Pipeline
val options = PipelineOptionsFactory.fromArgs(*args).withValidation().`as`(OptionArgs::class.java)
val pipeline = Pipeline.create(options)
var mappings = ""
// Value only available at runtime
if (options.schemaFile.isAccessible){
mappings = readCloudFile(options.schemaFile.get())
}
val tableRowMapper = ReadingToTableRowMapper(mappings)
val postgresMapper = ReadingToPostgresMapper(mappings)
val pubsubMessages =
pipeline
.apply("ReadPubSubMessages",
PubsubIO
.readMessagesWithAttributes()
.withIdAttribute("id_label")
.fromTopic(options.pubSubInput))
pubsubMessages
.apply("AckPubSubMessages", ParDo.of(object: DoFn<PubsubMessage, String>() {
#ProcessElement
fun processElement(context: ProcessContext) {
LOG.info("Processing readings: " + context.element().attributeMap["id_label"])
context.output("")
}
}))
val disarmedMessages =
pubsubMessages
.apply("DisarmedPubSubMessages",
DisarmPubsubMessage(tableRowMapper, postgresMapper)
)
disarmedMessages
.get(TupleTags.readingErrorTag)
.apply("LogDisarmedErrors", ParDo.of(object: DoFn<String, String>() {
#ProcessElement
fun processElement(context: ProcessContext) {
LOG.info(context.element())
context.output("")
}
}))
disarmedMessages
.get(TupleTags.tableRowTag)
.apply("WriteToBigQuery",
BigQueryIO
.writeTableRows()
.withoutValidation()
.withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
.withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
.withFailedInsertRetryPolicy(InsertRetryPolicy.neverRetry())
.to(options.bigQueryOutput)
)
pipeline.run()
DissarmPubsubMessage is a PTransforms that uses FlatMapElements transform to get TableRow and ReadingsInputFlatten (own class for Postgres)
We expect zero duplicates or the "best effort" (and we append some cleaning cron job), we paid for these products to run statistics and bigdata analysis...
[UPDATE 1]
I even append a new simple transform that logs our unique attribute through a ParDo which supposedly should ack the PubsubMessage, but this is not the case:
new flow with AckPubSubMessages step
Thanks!!
Looks like you are using the global window. One technique would be to window this into an N minute window. Then process the keys in the window and drop an items with dup keys.
The supported programming languages are Python and Java, your code seems to be Scala and as far as I know it is not supported. I strongly recommend using Java to avoid any unsupported feature for the programming language you use.
In addition, I would recommend the following approaches to work on duplicates, the option 2 could meet your need of near-real-time:
message_id. Probably you already read the FAQ - duplicates which points to deprecated doc. However, if you check the PubsubMessage object you will notice that messageId is still available and it will be populated if not set by the publisher:
"ID of this message, assigned by the server when the message is
published ... It must not be populated by the publisher in a
topics.publish call"
BigQuery Streaming. To validate duplicate during loading the data, right before inserting in BQ you can create UUID.Please refer the section Example sink: Google BigQuery.
Try the Dataflow template PubSubToBigQuery and validate there are not duplicates in BQ.