Streaming objects from S3 using Spring AWS Integration - amazon-s3

I am working on a use case where I am supposed to poll S3 -> read the stream for the content -> do some processing and upload it to another bucket, rather than writing the file to my server.
I know I can achieve this using S3StreamingMessageSource in Spring AWS Integration, but the problem I am facing is that I do not know how to process the message stream received by polling.
public class S3PollerConfigurationUsingStreaming {

    @Value("${amazonProperties.bucketName}")
    private String bucketName;

    @Value("${amazonProperties.newBucket}")
    private String newBucket;

    @Autowired
    private AmazonClientService amazonClient;

    @Bean
    @InboundChannelAdapter(value = "s3Channel", poller = @Poller(fixedDelay = "100"))
    public MessageSource<InputStream> s3InboundStreamingMessageSource() {
        S3StreamingMessageSource messageSource = new S3StreamingMessageSource(template());
        messageSource.setRemoteDirectory(bucketName);
        messageSource.setFilter(new S3PersistentAcceptOnceFileListFilter(new SimpleMetadataStore(),
                "streaming"));
        return messageSource;
    }

    @Bean
    @Transformer(inputChannel = "s3Channel", outputChannel = "data")
    public org.springframework.integration.transformer.Transformer transformer() {
        return new StreamTransformer();
    }

    @Bean
    public S3RemoteFileTemplate template() {
        return new S3RemoteFileTemplate(new S3SessionFactory(amazonClient.getS3Client()));
    }

    @Bean
    public PollableChannel s3Channel() {
        return new QueueChannel();
    }

    @Bean
    IntegrationFlow fileStreamingFlow() {
        return IntegrationFlows
                .from(s3InboundStreamingMessageSource(),
                        e -> e.poller(p -> p.fixedDelay(30, TimeUnit.SECONDS)))
                .handle(streamFile())
                .get();
    }
}
Can someone please help me with the code to process the stream?

Not sure what your problem is, but I see that you have a mix of concerns. If you use messaging annotations (see @InboundChannelAdapter in your config), what is the point of using the same s3InboundStreamingMessageSource in the IntegrationFlow definition?
Anyway, it looks like you have already found the StreamTransformer for yourself. It has a charset property to convert the InputStream from the remote S3 resource to a String; otherwise it returns a byte[]. Everything else is up to you: what to do with this converted content and how.
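For example (UTF-8 here is just an assumption), the transformer bean from your config could set the charset like this:

@Bean
@Transformer(inputChannel = "s3Channel", outputChannel = "data")
public org.springframework.integration.transformer.Transformer transformer() {
    // with a charset the InputStream payload is converted to a String;
    // without one the transformer produces a byte[]
    return new StreamTransformer("UTF-8");
}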
Also, I don't see a reason to make s3Channel a QueueChannel, since the start of your flow is pollable anyway via the @InboundChannelAdapter.
At a high level, I would say we have more questions for you than vice versa...
UPDATE
It is not clear what your idea for the InputStream processing is, but it is a fact that after the S3StreamingMessageSource you are going to have exactly an InputStream as the payload in the next handler.
Also, I am not sure what your streamFile() is, but it really must expect an InputStream as an input from the payload of the request message.
You can also use the mentioned StreamTransformer over there:
@Bean
IntegrationFlow fileStreamingFlow() {
    return IntegrationFlows
            .from(s3InboundStreamingMessageSource(),
                    e -> e.poller(p -> p.fixedDelay(30, TimeUnit.SECONDS)))
            .transform(Transformers.fromStream("UTF-8"))
            .get();
}
And the next .handle() will then receive a String as the payload.
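For instance, a minimal sketch of such a handler (the FileHeaders.REMOTE_FILE key choice and the use of your newBucket and amazonClient fields are assumptions based on your config) that uploads the processed content to the other bucket:

@Bean
IntegrationFlow fileStreamingFlow() {
    return IntegrationFlows
            .from(s3InboundStreamingMessageSource(),
                    e -> e.poller(p -> p.fixedDelay(30, TimeUnit.SECONDS)))
            .transform(Transformers.fromStream("UTF-8"))
            .handle((payload, headers) -> {
                // the payload is now the file content as a String
                String key = (String) headers.get(FileHeaders.REMOTE_FILE);
                // do your processing here, then upload the result to the other bucket
                amazonClient.getS3Client().putObject(newBucket, key, (String) payload);
                return null; // returning null ends the flow
            })
            .get();
}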

Related

Check database for access in Spring Gateway Pre filter

I am using Spring Cloud Gateway, where I need to check further user access by request path using a DB call. My repository looks like this:
public Mono<ActionMapping> getByUri(String url)
....
This is my current filter, where I am using a custom UsernamePasswordAuthenticationToken implementation.
@Override
public GatewayFilter apply(Config config) {
    return (exchange, chain) -> exchange
            .getPrincipal()
            .filter(principal -> principal instanceof UserAuthenticationToken) // Custom implementation of UsernamePasswordAuthenticationToken
            .cast(UserAuthenticationToken.class)
            .map(userAuthenticationToken -> extractAuthoritiesAndSetThatToRequest(exchange, userAuthenticationToken))
            .defaultIfEmpty(exchange)
            .flatMap(chain::filter);
}

private ServerWebExchange extractAuthoritiesAndSetThatToRequest(ServerWebExchange exchange, UserAuthenticationToken authentication) {
    var uriActionMapping = uriActionMappingRepository.findOneByUri(exchange.getRequest().getPath().toString()).block();
    if ((uriActionMapping == null) || (authentication.getPermission().containsKey(uriActionMapping.getName()))) {
        ServerHttpRequest request = exchange.getRequest()
                .mutate()
                .header("X-Auth", authentication.getName())
                .build();
        return exchange.mutate().request(request).build();
    }
    ServerHttpResponse response = exchange.getResponse();
    response.setStatusCode(HttpStatus.UNAUTHORIZED);
    response.setComplete();
    return exchange.mutate().response(response).build();
}
However, there are several problems here. First, it is a blocking call. Also, I am not sure I need to mutate the exchange to return the response like that. Is there any way to achieve this using a filter in Spring Cloud Gateway?
Yes, it is a blocking call.
Firstly, Spring WebFlux is based on Reactor. In Reactor, most handling methods will not receive a null from a Mono emit, e.g. map, flatMap. Sure, there are counterexamples, such as doOnSuccess; see also the javadoc of Mono.
So, we can just use handling methods to filter results instead of block. Those handling methods will return an empty Mono when they receive a null value.
Secondly, when authorization fails, we should return an empty Mono instead of calling chain.filter. Calling chain.filter means "It's OK! Just do something after the filter!". See also RequestRateLimiterGatewayFilterFactory; it also mutates the response.
So, we should set the response to completed and return an empty Mono if authorization fails.
Try this:
@Override
public GatewayFilter apply(Config config) {
    return (exchange, chain) -> exchange
            .getPrincipal()
            .filter(principal -> principal instanceof UserAuthenticationToken) // Custom implementation of UsernamePasswordAuthenticationToken
            .cast(UserAuthenticationToken.class)
            .flatMap(userAuthenticationToken -> extractAuthoritiesAndSetThatToRequest(exchange, userAuthenticationToken))
            .switchIfEmpty(Mono.defer(() -> exchange.getResponse().setComplete().then(Mono.empty())))
            .flatMap(chain::filter);
}

// May return an empty Mono, e.g. findOneByUri finds nothing, or the permissions do not contain the name
private Mono<ServerWebExchange> extractAuthoritiesAndSetThatToRequest(ServerWebExchange exchange, UserAuthenticationToken authentication) {
    return uriActionMappingRepository.findOneByUri(exchange.getRequest().getPath().toString())
            .filter(it -> authentication.getPermission().containsKey(it.getName()))
            .map(it -> exchange.mutate()
                    .request(builder -> builder.header("X-Auth", authentication.getName()))
                    .build());
}
About mutating the request, see also RewritePathGatewayFilterFactory.

spring amqp RPC copy headers from request to response

I'm looking for a way to copy some headers from the request message to the response message when I use RabbitMQ in RPC mode.
So far I have tried setBeforeSendReplyPostProcessors, but I can only access the response and add headers to it; I don't have access to the request to get the values I need.
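For illustration, a minimal sketch of that attempt (the header value here is just a placeholder); the post processor only receives the reply message:

simpleRabbitListenerContainerFactory.setBeforeSendReplyPostProcessors(message -> {
    // only the reply Message is available here, not the original request
    message.getMessageProperties().setHeader("HEADER_RESPONSE", "value-from-request?");
    return message;
});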
I have also tried the advice chain, but the returnObject is null after proceeding, so I can't modify it (I admit I don't understand why it is null... I thought I could get the object and modify it):
@Bean
public SimpleRabbitListenerContainerFactory simpleRabbitListenerContainerFactory(SimpleRabbitListenerContainerFactoryConfigurer simpleRabbitListenerContainerFactoryConfigurer, ConnectionFactory connectionFactory) {
    SimpleRabbitListenerContainerFactory simpleRabbitListenerContainerFactory = new SimpleRabbitListenerContainerFactory();
    simpleRabbitListenerContainerFactory.setAdviceChain(new MethodInterceptor() {
        @Override
        public Object invoke(MethodInvocation invocation) throws Throwable {
            Object returnObject = invocation.proceed();
            // returnObject is null here
            return returnObject;
        }
    });
    simpleRabbitListenerContainerFactoryConfigurer.configure(simpleRabbitListenerContainerFactory, connectionFactory);
    return simpleRabbitListenerContainerFactory;
}
A working way is to change my method annotated with @RabbitListener so that it returns a Message; there I can access both the request message (via the arguments of the annotated method) and the response.
But I would like to do it automatically, since I need this feature in different places.
Basically, I want to copy one header from the request message to the response.
This code does the job, but I want to do it through an aspect or an interceptor:
@RabbitListener(queues = "myQueue",
        containerFactory = "simpleRabbitListenerContainerFactory")
public Message<MyResponseObject> execute(MyRequestObject myRequestObject, @Header("HEADER_TO_COPY") String headerToCopy) {
    MyResponseObject myResponseObject = compute(myRequestObject);
    return MessageBuilder.withPayload(myResponseObject)
            .setHeader("HEADER_RESPONSE", headerToCopy)
            .build();
}
The Message<?> return type support was added for this reason, but we could add an extension point to allow this; please open a GitHub issue.
Contributions are welcome.

Spring webflux : consume mono or flux from request

I have a resource API that handles an object (Product, for example).
I use PUT to update this object in the database.
And I want to return just an empty Mono to the user.
Here is my code:
public Mono<ServerResponse> updateProduct(ServerRequest request){
    Mono<Product> productReceived = request.bodyToMono(Product.class);
    productReceived.flatMap(item -> {
        doSomeThing(item);
        System.out.println("Called or not called!!");
        return Mono.just(productService.product);
    }).subscribe();
    return ok()
            .contentType(APPLICATION_JSON)
            .body(Mono.empty(), Product.class);
}
The problem is that my method doSomeThing() and the println are not called.
NB: I used subscribe, but it doesn't work.
Thanks.
I had a similar issue when I was new to WebFlux. In short, you can't call subscribe on the request body and asynchronously return a response, because the subscription might not have enough time to read the body. You can see a full explanation of a similar issue here.
To make your code work, you should couple the response with your logic stream. It should look something like the following:
public Mono<ServerResponse> updateProduct(ServerRequest request){
    return request
            .bodyToMono(Product.class)
            .flatMap(item -> {
                doSomeThing(item);
                System.out.println("Called or not called!!");
                return Mono.just(productService.product);
            })
            .then(ServerResponse.ok().build());
}
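Equivalently, since the intermediate value is discarded anyway, the side effects can be expressed with doOnNext (a sketch under the same assumptions about doSomeThing and productService):

public Mono<ServerResponse> updateProduct(ServerRequest request){
    return request
            .bodyToMono(Product.class)
            .doOnNext(item -> {
                // the side effects run once the body has actually been read
                doSomeThing(item);
                System.out.println("Called or not called!!");
            })
            .then(ServerResponse.ok().build());
}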

How to catch any exceptions thrown by BigQueryIO.Write and rescue the data which is failed to output?

I want to read data from Cloud Pub/Sub and write it to BigQuery with Cloud Dataflow. Each element contains a table ID where the data itself will be saved.
There are various factors that can make writing to BigQuery fail:
- The table ID format is wrong.
- The dataset does not exist.
- The dataset does not allow the pipeline to access it.
- A network failure occurs.
When one of these failures occurs, the streaming job retries the task and stalls. I tried using WriteResult.getFailedInserts() in order to rescue the bad data and avoid stalling, but it did not work well. Is there any good way?
Here is my code:
public class StarterPipeline {
    private static final Logger LOG = LoggerFactory.getLogger(StarterPipeline.class);

    public static class MyData implements Serializable {
        String table_id;
    }

    public interface MyOptions extends PipelineOptions {
        @Description("PubSub topic to read from, specified as projects/<project_id>/topics/<topic_id>")
        @Validation.Required
        ValueProvider<String> getInputTopic();
        void setInputTopic(ValueProvider<String> value);
    }

    public static void main(String[] args) {
        MyOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().as(MyOptions.class);
        Pipeline p = Pipeline.create(options);

        PCollection<MyData> input = p
                .apply("ReadFromPubSub", PubsubIO.readStrings().fromTopic(options.getInputTopic()))
                .apply("ParseJSON", MapElements.into(TypeDescriptor.of(MyData.class))
                        .via((String text) -> new Gson().fromJson(text, MyData.class)));

        WriteResult writeResult = input
                .apply("WriteToBigQuery", BigQueryIO.<MyData>write()
                        .to(new SerializableFunction<ValueInSingleWindow<MyData>, TableDestination>() {
                            @Override
                            public TableDestination apply(ValueInSingleWindow<MyData> input) {
                                MyData myData = input.getValue();
                                return new TableDestination(myData.table_id, null);
                            }
                        })
                        .withSchema(new TableSchema().setFields(new ArrayList<TableFieldSchema>() {{
                            add(new TableFieldSchema().setName("table_id").setType("STRING"));
                        }}))
                        .withFormatFunction(new SerializableFunction<MyData, TableRow>() {
                            @Override
                            public TableRow apply(MyData myData) {
                                return new TableRow().set("table_id", myData.table_id);
                            }
                        })
                        .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_IF_NEEDED)
                        .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND)
                        .withFailedInsertRetryPolicy(InsertRetryPolicy.neverRetry()));

        writeResult.getFailedInserts()
                .apply("LogFailedData", ParDo.of(new DoFn<TableRow, TableRow>() {
                    @ProcessElement
                    public void processElement(ProcessContext c) {
                        TableRow row = c.element();
                        LOG.info(row.get("table_id").toString());
                    }
                }));

        p.run();
    }
}
There is no easy way to catch exceptions when writing to the output in a pipeline definition. I suppose you could do it by writing a custom PTransform for BigQuery, but there is no way to do it natively in Apache Beam. I also recommend against this because it undermines Cloud Dataflow's automatic retry functionality.
In your code example, the failed insert retry policy is set to never retry. You can set the policy to always retry. This only helps with transient problems such as an intermittent network failure (4th bullet point):
.withFailedInsertRetryPolicy(InsertRetryPolicy.alwaysRetry())
If the table ID format is incorrect (1st bullet point), then the CREATE_IF_NEEDED create disposition should allow the Dataflow job to create a new table automatically without error, even though the table ID is not what you intended.
If the dataset does not exist, or the pipeline lacks access permission to the dataset (2nd and 3rd bullet points), then my opinion is that the streaming job should stall and ultimately fail. There is no way to proceed in those cases without manual intervention.

Sending DataStream in Flink using sockets; serialization issue

I want to send a stream of data from a VM to the host machine, and I am using the method writeToSocket() as shown below:
joinedStreamEventDataStream.writeToSocket("192.168.1.10", 6998);
Here joinedStreamEventDataStream is of type DataStream<Integer,Integer>.
Can someone please tell me how I should pass a serializer to the above method?
Thanks in advance.
It depends a little bit on how you would like to read the data from the socket. If you expect it to be the String representation of the data, then you could do it via:
joinedStreamEventDataStream.map(new MapFunction<Type, String>() {
    @Override
    public String map(Type value) throws Exception {
        return value.toString();
    }
}).writeToSocket(hostname, port, new SimpleStringSchema());
If you want to keep Flink's serialization format, then you can write:
joinedStreamEventDataStream.writeToSocket(
        hostname,
        port,
        new TypeInformationSerializationSchema<>(
                joinedStreamEventDataStream.getType(),
                env.getConfig()));
If you want to output it in your own serialization format, then you have to implement your own SerializationSchema as pointed out by Alex.
The writeToSocket() method takes 3 arguments: a socket host, a port, and an implementation of the SerializationSchema interface, which is used to serialize your data. So your implementation may look like this:
joinedStreamEventDataStream.writeToSocket(
        "192.168.1.10", // host name
        6998,           // port
        new SerializationSchema<Integer>() {
            @Override
            public byte[] serialize(Integer element) {
                return ByteBuffer.allocate(4).putInt(element).array();
            }
        }
);
This works if joinedStreamEventDataStream is of type DataStream<Integer>.
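For completeness, a minimal sketch (assuming the custom schema above and the port from the question) of how the receiving side could read those 4-byte big-endian integers; writeToSocket connects as a client, so the host must be listening first:

import java.io.DataInputStream;
import java.io.EOFException;
import java.net.ServerSocket;
import java.net.Socket;

public class IntSocketReader {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(6998);
             Socket socket = server.accept();
             DataInputStream in = new DataInputStream(socket.getInputStream())) {
            while (true) {
                try {
                    // DataInputStream.readInt() reads 4 big-endian bytes,
                    // matching ByteBuffer.allocate(4).putInt(element).array()
                    System.out.println(in.readInt());
                } catch (EOFException e) {
                    break; // sender closed the stream
                }
            }
        }
    }
}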