SpringIntegration message size to big, how to split - size

I have a SprintIntegration system with a JMS endpoint. The size limit for messages is 4mb. I have results which are larger then that, how do I get SI to split that up into several messages?
/A

In Spring Integration, you can use a Splitter to split your messages to not exceed e.g. 4MB.
<int:splitter id="splitter"
ref="splitterBean"
method="split"
input-channel="inputChannel"
output-channel="outputChannel" />
<beans:bean id="splitterBean" class="your.MessageSplitter"/>
or by using a #Splitter annotation.
When a message comes in to the splitter, you would apply the splitting logic inside your.MessageSplitter, and return a List<YourMessage>:
public class MessageSplitter {
public List<YourMessage> split( HugeMessage hugeMessage ) {
List nicelySizedMessages = new ArrayList<YourMessage>();
// splitting logic... that would parse "hugeMessage" and split it to
// nicelySizedMessages.add( ... ) "YourMessage"s
return nicelySizedMessages;
}
}
Spring Integration would take this list and would forward YourMessages from the list one by one.

Related

Chaining Reactive Asynchronus calls in spring

I’m very new to the SpringReactor project.
Until now I've only used Mono from WebClient .bodyToMono() steps, and mostly block() those Mono's or .zip() multiple of them.
But this time I have a usecase where I need to asynchronously call methods in multiple service classes, and those multiple service classes are calling multiple backend api.
I understand Project Reactor doesn't provide asynchronous flow by default.
But we can make the publishing and/or subscribing on different thread and make code asynchronous
And that's what I am trying to do.
I tried to read the documentation here reactor reference but still not clear.
For the purpose of this question, I’m making up this imaginary scenario. that is a little closer to my use case.
Let's assume we need to get a search response from google for some texts searched under images.
Example Scenario
Let's have an endpoint in a Controller
This endpoint accepts the following object from request body
MultimediaSearchRequest{
Set<String> searchTexts; //many texts.
boolean isAddContent;
boolean isAddMetadata;
}
in the controller, I’ll break the above single request object into multiple objects of the below type.
MultimediaSingleSearchRequest{
String searchText;
boolean isAddContent;
boolean isAddMetadata;
}
This Controller talks to 3 Service classes.
Each of the service classes has a method searchSingleItem.
Each service class uses a few different backend Apis, but finally combines the results of those APIs responses into the same type of response class, let's call it MultimediaSearchResult.
class JpegSearchHandleService {
public MultimediaSearchResult searchSingleItem
(MultimediaSingleSearchRequest req){
return comboneAllImageData(
getNameApi(req),
getImageUrlApi(req),
getContentApi(req) //dont call if req.isAddContent false
)
}
}
class GifSearchHandleService {
public MultimediaSearchResult searchSingleItem
(MultimediaSingleSearchRequest req){
return comboneAllImageData(
getNameApi(req),
gitPartApi(req),
someRandomApi(req),
soemOtherRandomApi(req)
)
}
}
class VideoSearchHandleService {
public MultimediaSearchResult searchSingleItem
(MultimediaSingleSearchRequest req){
return comboneAllImageData(
getNameApi(req),
codecApi(req),
commentsApi(req),
anotherApi(req)
)
}
}
In the end, my controller returns the response as a List of MultimediaSearchResult
Class MultimediaSearchResponse{
List< MultimediaSearchResult> results;
}
If I want to use this all asynchronously using the project reactor. how to achieve it.
Like calling searchSingleItem method in each service for each searchText asynchronously.
Even within the services call each backend API asynchronously (I’m already using WebClient and converting response bodyToMono for backend API calls)
First, I will outline a solution for the upper "layer" of your scenario.
The code (a simple simulation of the scenario):
public class ChainingAsyncCallsInSpring {
public Mono<MultimediaSearchResponse> controllerEndpoint(MultimediaSearchRequest req) {
return Flux.fromIterable(req.getSearchTexts())
.map(searchText -> new MultimediaSingleSearchRequest(searchText, req.isAddContent(), req.isAddMetadata()))
.flatMap(multimediaSingleSearchRequest -> Flux.merge(
classOneSearchSingleItem(multimediaSingleSearchRequest),
classTwoSearchSingleItem(multimediaSingleSearchRequest),
classThreeSearchSingleItem(multimediaSingleSearchRequest)
))
.collectList()
.map(MultimediaSearchResponse::new);
}
private Mono<MultimediaSearchResult> classOneSearchSingleItem(MultimediaSingleSearchRequest req) {
return Mono.just(new MultimediaSearchResult("1"));
}
private Mono<MultimediaSearchResult> classTwoSearchSingleItem(MultimediaSingleSearchRequest req) {
return Mono.just(new MultimediaSearchResult("2"));
}
private Mono<MultimediaSearchResult> classThreeSearchSingleItem(MultimediaSingleSearchRequest req) {
return Mono.just(new MultimediaSearchResult("3"));
}
}
Now, some rationale.
In the controllerEndpoint() function, first we create a Flux that will emit every single searchText from the request. We map these to MultimediaSingleSearchRequest objects, so that the services can consume them with the additional metadata that was provided with the original request.
Then, Flux::flatMap the created MultimediaSingleSearchRequest objects into a merged Flux, which (as opposed to Flux::concat) ensures that all three publishers are subscribed to eagerly i.e. they don't wait for one another. It works best on this exact scenario, when several independent publishers need to be subscribed to at the same time and their order is not important.
After the flat map, at this point, we have a Flux<MultimediaSearchResult>.
We continue with Flux::collectList, thus collecting the emitted values from all publishers (we could also use Flux::reduceWith here).
As a result, we now have a Mono<List<MultimediaSearchResult>>, which can easily be mapped to a Mono<MultimediaSearchResponse>.
The results list of the MultimediaSearchResponse will have 3 items for each searchText in the original request.
Hope this was helpful!
Edit
Extending the answer with a point of view from the service classes as well. Assuming that each inner (optionally skipped) call returns a different type of result, this would be one way of going about it:
public class MultimediaSearchResult {
private Details details;
private ContentDetails content;
private MetadataDetails metadata;
}
public Mono<MultimediaSearchResult> classOneSearchSingleItem(MultimediaSingleSearchRequest req) {
return Mono.zip(getSomeDetails(req), getContentDetails(req), getMetadataDetails(req))
.map(tuple3 -> new MultimediaSearchResult(
tuple3.getT1(),
tuple3.getT2().orElse(null),
tuple3.getT3().orElse(null)
)
);
}
// Always wanted
private Mono<Details> getSomeDetails(MultimediaSingleSearchRequest req) {
return Mono.just(new Details("details")); // api call etc.
}
// Wanted if isAddContent is true
private Mono<Optional<ContentDetails>> getContentDetails(MultimediaSingleSearchRequest req) {
return req.isAddContent()
? Mono.just(Optional.of(new ContentDetails("content-details"))) // api call etc.
: Mono.just(Optional.empty());
}
// Wanted if isAddMetadata is true
private Mono<Optional<MetadataDetails>> getMetadataDetails(MultimediaSingleSearchRequest req) {
return req.isAddMetadata()
? Mono.just(Optional.of(new MetadataDetails("metadata-details"))) // api call etc.
: Mono.just(Optional.empty());
}
Optionals are used for the requests that might be skipped, since Mono::zip will fail if either of the zipped publishers emit an empty value.
If the results of each inner call extend the same base class or are the same wrapped return type, then the original answer applies as to how they can be combined (Flux::merge etc.)

Spring WebFlux Web Client - Iterating paged REST API

I need to get the items from all pages of a pageable REST API. I also need to start processing items, as soon as they are available, not needing to wait for all the pages to be loaded. In order to do so, I'm using Spring WebFlux and its WebClient, and want to return Flux<Item>.
Also, the REST API I'm using is rate limited, and each response to it contains headers with details on the current limits:
Size of the current window
Remaining time in the current window
Request quota in window
Requests left in current window
The response to a single page request looks like:
{
"data": [],
"meta": {
"pagination": {
"total": 10,
"current": 1
}
}
}
The data array contains the actual items, while the meta object contains pagination info.
My current solution first does a "dummy" request, just to get the total number of pages, and the rate limits.
Mono<T> paginated = client.get()
.uri(uri)
.exchange()
.flatMap(response -> {
HttpHeaders headers = response.headers().asHttpHeaders();
Limits limits = new Limits();
limits.setWindowSize(headers.getFirst("X-Window-Size"));
limits.setWindowRemaining(headers.getFirst("X-Window-Remaining"));
limits.setRequestsQuota(headers.getFirst("X-Requests-Quota");
limits.setRequestsLeft(headers.getFirst("X-Requests-Remaining");
return response.bodyToMono(Paginated.class)
.map(paginated -> {
paginated.setLimits(limits);
return paginated;
});
});
Afterwards, I emit a Flux containing page numbers, and for each page, I do a REST API request, each request being delayed enough so it doesn't get past the limit, and return a Flux of extracted items:
return paginated.flatMapMany(paginated -> {
return Flux.range(1, paginated.getMeta().getPagination().getTotal())
.delayElements(Duration.ofMillis(paginated.getLimits().getWindowRemaining() / paginated.getLimits().getRequestsQuota()))
.flatMap(page -> {
return client.get()
.uri(pageUri)
.retrieve()
.bodyToMono(Item.class)
.flatMapMany(p -> Flux.fromIterable(p.getData()));
});
});
This does work, but I'm not happy with it because:
It does initial "dummy" request to get the number of pages, and then
repeats the same request to get the actual data.
It gets rate limits only with the initial request, and assumes the
limits won't change (eg, that it's the only one using the API) -
which may not be true, in which case it will get an error that it
exceeded the limit.
So my question is how to refactor it so it doesn't need the initial request (but rather get limits, page numbers and data from the first request, and continue through all pages, while updating (and respecting) the limits.
I think this code will do what you want. The idea is to make a flux that make a call to your resource server, but in the process to handle the response, to add a new event on that flux to be able to make the call to next page.
The code is composed of:
A simple wrapper to contains the next page to call and the delay to wait before executing the call
private class WaitAndNext{
private String next;
private long delay;
}
A FluxProcessor that will make HTTP call and process the response:
FluxProcessor<WaitAndNext, WaitAndNext> processor= DirectProcessor.<WaitAndNext>create();
FluxSink<WaitAndNext> sink=processor.sink();
processor
.flatMap(x-> Mono.just(x).delayElement(Duration.ofMillis(x.delay)))
.map(x-> WebClient.builder()
.baseUrl(x.next)
.defaultHeader("Accept","application/json")
.build())
.flatMap(x->x.get()
.exchange()
.flatMapMany(z->manageResponse(sink, z))
)
.subscribe(........);
I split the code with a method that only manage response: It simply unwrap your data AND add a new event to the sink (the event beeing the next page to call after the given delay)
private Flux<Data> manageResponse(FluxSink<WaitAndNext> sink, ClientResponse resp) {
if (resp.statusCode()!= HttpStatus.OK){
sink.error(new IllegalStateException("Status code invalid"));
}
WaitAndNext wn=new WaitAndNext();
HttpHeaders headers=resp.headers().asHttpHeaders();
wn.delay= Integer.parseInt(headers.getFirst("X-Window-Remaining"))/ Integer.parseInt(headers.getFirst("X-Requests-Quota"));
return resp.bodyToMono(Item.class)
.flatMapMany(p -> {
if (p.paginated.current==p.paginated.total){
sink.complete();
}else{
wn.next="https://....?page="+(p.paginated.current+1);
sink.next(wn);
}
return Flux.fromIterable(p.getData());
});
}
Now we just need to initialize the system by calling for the retrieval of the first page with no delay:
WaitAndNext wn=new WaitAndNext();
wn.next="https://....?page=1";
wn.delay=0;
sink.next(wn);

How to implement an integration test to check if my circuit breaker fallback is called?

In my application, I need to call an external endpoint and if it is too slow a fallback is activated.
The following code is an example of how my app looks like:
#FeignClient(name = "${config.name}", url = "${config.url:}", fallback = ExampleFallback.class)
public interface Example {
#RequestMapping(method = RequestMethod.GET, value = "/endpoint", produces = MediaType.APPLICATION_JSON_VALUE, consumes = MediaType.APPLICATION_JSON_VALUE)
MyReturnObject find(#RequestParam("myParam") String myParam);
}
And its fallback implementation:
#Component
public Class ExampleFallback implements Example {
private final FallbackService fallback;
#Autowired
public ExampleFallback(final FallbackService fallback) {
this.fallback = fallback;
}
#Override
public MyReturnObject find(final String myParam) {
return fallback.find(myParam);
}
Also, a configured timeout for circuit breaker:
hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds: 5000
How can I implement an integration test to check if my circuit break is working, i.e, if my endpoint (mocked in that case) is slow or if it returns an error like 4xx or 5xx?
I'm using Spring Boot 1.5.3 with Spring Cloud (Feign + Hystrix)
Note i donot know Feign or Hystrix.
In my opinion it is problematic to implement an automated integrationtest that simulates different implementatondetails of Feign+Hystrix - this implementation detail can change at any time. There are many different types of failure: primary-Endpoint not reachable, illegal data (i.e. receiving a html-errormessage, when exprecting xml data in a special format), disk-full, .....
if you mock an endpoint you make an assumption of implementationdetail of Feign+Hystrix how the endpoint behaves in a errorsituation (i.e. return null, return some specific errorcode, throw an exception of type Xyz....)
i would create only one automated integration test with a real primary-enpoint that has a never reachable url and a mocked-fallback-endpoint where you verify that the processed data comes from the mock.
This automated test assumes that handling of "networkconnection too slow" is the same as "url-notfound" from your app-s point of view.
For all other tests i would create a thin wrapper interface around Feign+Hystrix where you mock Feign+Hystrix. This way you can automatically test for example what happens if you receive 200bytes from primary interface and then get an expetion.
For details about hiding external dependencies see onion-architecture

How to enable parallelism for a custom U-SQL Extractor

I’m implementing a custom U-SQL Extractor for our internal file format (binary serialization). It works well in the "Atomic" mode:
[SqlUserDefinedExtractor(AtomicFileProcessing = true)]
public class BinaryExtractor : IExtractor
If I switch off the “Atomic“ mode, It looks like U-SQL is splitting the file in a random place (I guess just by 250MB chunks). This is not acceptable for me. The file format has a special row delimiter. Can I define a custom row delimiter in my Extractor and enable parallelism for it. Technically I can change our row delimiter to a new one if it can help.
Could anyone help me with this question?
The file is indeed split into chunks (I think it is 1 GB at the moment, but the exact value is implementation defined and may change for performance reasons).
If the file is indeed row delimited, and assuming your raw input data for the row is less than 4MB, you can use the input.Split() function inside your UDO to do the splitting into rows. The call will automatically handle the case if the raw input data spans the chunk boundary (assuming it is less than 4MB).
Here is an example:
public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow outputrow)
{
// this._row_delim = this._encoding.GetBytes(row_delim); in class ctor
foreach (Stream current in input.Split(this._row_delim))
{
using (StreamReader streamReader = new StreamReader(current, this._encoding))
{
int num = 0;
string[] array = streamReader.ReadToEnd().Split(new string[]{this._col_delim}, StringSplitOptions.None);
for (int i = 0; i < array.Length; i++)
{
// DO YOUR PROCESSING
}
}
yield return outputrow.AsReadOnly();
}
}
Please note that you cannot read across chunk boundaries yourself and you should make sure your data is indeed splittable into rows.

Logging all Bro streams

I want to log all the streams the Bro has to offer. I did the following for one stream but I am not getting the desired answer.
redef LogAscii::use_json=T;
redef LogAscii::json_timestamps = JSON::TS_ISO8601;
export
{
# Append the value LOG to the Log::ID enumerable.
redef enum Log::ID += { LOG };
}
event bro_init()
{
#Create the logging stream
Log::create_stream(LOG, [$columns=IRC::Info, $path="irc"]);
Log::write(LOG, IRC::Info) ;
}
Can I get any help with this?
Are you feeding traffic into Bro? Bro will only creates log files when it generates a log line which would go into that log.
Your script doesn't execute either, you are try to pass a type (IRC::Info) into a field that expects a value of that type.
You also don't need to call Log::create_stream, it is part of the base IRC support which is loaded by default.