How to achieve fault tolerance when Flink sinks gzip-compressed data to HDFS?

We want to write compressed data to HDFS with Flink's BucketingSink or StreamingFileSink. I have written my own Writer, which works fine as long as no failure occurs. However, when the job encounters a failure and restarts from a checkpoint, the sink either writes a valid-length file (Hadoop < 2.7) or truncates the file. Unfortunately, gzip files are binary files with a trailer at the end, so simple truncation does not work in my case. Any ideas on how to get exactly-once semantics for a compressed HDFS sink?
That's my writer's code:
public class HdfsCompressStringWriter extends StreamWriterBaseV2<JSONObject> {
private static final long serialVersionUID = 2L;
/**
* The {@code CompressFSDataOutputStream} for the current part file.
*/
private transient GZIPOutputStream compressionOutputStream;
public HdfsCompressStringWriter() {}
@Override
public void open(FileSystem fs, Path path) throws IOException {
super.open(fs, path);
this.setSyncOnFlush(true);
compressionOutputStream = new GZIPOutputStream(this.getStream(), true);
}
public void close() throws IOException {
if (compressionOutputStream != null) {
compressionOutputStream.close();
compressionOutputStream = null;
}
resetStream();
}
@Override
public void write(JSONObject element) throws IOException {
if (element == null || !element.containsKey("body")) {
return;
}
String content = element.getString("body") + "\n";
compressionOutputStream.write(content.getBytes());
compressionOutputStream.flush();
}
@Override
public Writer<JSONObject> duplicate() {
return new HdfsCompressStringWriter();
}
}
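To make the failure mode concrete, here is a small stdlib-only simulation (no Flink or HDFS involved; the byte array stands in for the part file, and `validLength` for the length a sink would record at a checkpoint). It assumes the same sync-flushing GZIPOutputStream as the writer above. Truncating back to the checkpointed length leaves a stream that still inflates up to that point but has no trailer, so a strict reader fails at the end; only a stream that was finished before being committed reads back cleanly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Simulates "checkpoint, keep writing, fail, truncate back to the checkpointed length" on a gzip file.
final class GzipTruncationDemo {

    /** What could be inflated, and whether the gzip trailer (CRC32 + size) was reached. */
    static final class Decoded {
        final String text;
        final boolean complete;

        Decoded(String text, boolean complete) {
            this.text = text;
            this.complete = complete;
        }
    }

    static byte[] truncatedAfterCheckpoint() {
        try {
            ByteArrayOutputStream file = new ByteArrayOutputStream(); // stands in for the part file
            GZIPOutputStream gz = new GZIPOutputStream(file, true);   // syncFlush, like setSyncOnFlush(true)
            gz.write("a\nb\n".getBytes(StandardCharsets.UTF_8));
            gz.flush();                                               // all pending bytes reach the file
            int validLength = file.size();                            // length recorded at the checkpoint
            gz.write("c\n".getBytes(StandardCharsets.UTF_8));
            gz.flush();                                               // written after the checkpoint; then the job fails
            byte[] truncated = Arrays.copyOf(file.toByteArray(), validLength); // recovery truncates
            gz.close();                                               // release the Deflater; the copy is already taken
            return truncated;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static byte[] finishedFile() {
        try {
            ByteArrayOutputStream file = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(file, true)) {
                gz.write("a\nb\nc\n".getBytes(StandardCharsets.UTF_8));
            } // close() finishes the stream, which appends the trailer
            return file.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static Decoded decode(byte[] gzipBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        boolean complete;
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipBytes))) {
            byte[] buf = new byte[256];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            complete = true;
        } catch (EOFException e) {
            complete = false; // input ended before the trailer ("Unexpected end of ZLIB input stream")
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Decoded(new String(out.toByteArray(), StandardCharsets.UTF_8), complete);
    }

    public static void main(String[] args) {
        Decoded truncated = decode(truncatedAfterCheckpoint());
        Decoded finished = decode(finishedFile());
        System.out.println("truncated: complete=" + truncated.complete + ", text=" + truncated.text.replace("\n", "|"));
        System.out.println("finished:  complete=" + finished.complete + ", text=" + finished.text.replace("\n", "|"));
    }
}
```

The records up to the checkpoint survive, but the truncated file is not a valid gzip file, which is exactly why the sink has to finish each file rather than truncate it.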

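I would recommend implementing a BulkWriter for the StreamingFileSink that compresses the elements via a GZIPOutputStream. With bulk formats, the StreamingFileSink rolls the in-progress part file on every checkpoint, so each part file is finished, gzip trailer included, before the checkpoint that commits it completes; there is never a half-written gzip file that would have to be truncated on recovery. The code could look like the following: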
public static void main(String[] args) throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(1);
env.enableCheckpointing(1000);
final DataStream<Integer> input = env.addSource(new InfinitySource());
final StreamingFileSink<Integer> streamingFileSink = StreamingFileSink.<Integer>forBulkFormat(new Path("output"), new GzipBulkWriterFactory<>()).build();
input.addSink(streamingFileSink);
env.execute();
}
private static class GzipBulkWriterFactory<T> implements BulkWriter.Factory<T> {
@Override
public BulkWriter<T> create(FSDataOutputStream fsDataOutputStream) throws IOException {
final GZIPOutputStream gzipOutputStream = new GZIPOutputStream(fsDataOutputStream, true);
return new GzipBulkWriter<>(new ObjectOutputStream(gzipOutputStream), gzipOutputStream);
}
}
private static class GzipBulkWriter<T> implements BulkWriter<T> {
private final GZIPOutputStream gzipOutputStream;
private final ObjectOutputStream objectOutputStream;
public GzipBulkWriter(ObjectOutputStream objectOutputStream, GZIPOutputStream gzipOutputStream) {
this.gzipOutputStream = gzipOutputStream;
this.objectOutputStream = objectOutputStream;
}
@Override
public void addElement(T t) throws IOException {
objectOutputStream.writeObject(t);
}
@Override
public void flush() throws IOException {
objectOutputStream.flush();
}
@Override
public void finish() throws IOException {
objectOutputStream.flush();
gzipOutputStream.finish();
}
}
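Note that the writer above serializes Java objects with ObjectOutputStream, so the resulting files contain Java serialization data rather than the newline-delimited text the question's writer produces. If you want text lines, something like the following sketch could work. It assumes String elements and is written against plain java.io (the class name GzipLineWriter is hypothetical) so it can be tried without Flink; in a job, the same three methods would go into a class implementing BulkWriter<String>, created by a BulkWriter.Factory<String> like GzipBulkWriterFactory above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Same lifecycle as Flink's BulkWriter (addElement / flush / finish), kept free of Flink types so it runs standalone.
final class GzipLineWriter {
    private final GZIPOutputStream gzip;

    GzipLineWriter(OutputStream partFile) throws IOException {
        // syncFlush=true: flush() pushes every buffered record into partFile
        this.gzip = new GZIPOutputStream(partFile, true);
    }

    /** One record per line, UTF-8 encoded, like the body-plus-newline format of the original writer. */
    void addElement(String line) throws IOException {
        gzip.write((line + "\n").getBytes(StandardCharsets.UTF_8));
    }

    void flush() throws IOException {
        gzip.flush();
    }

    /** Writes the remaining data and the gzip trailer, but does not close partFile (the sink owns it). */
    void finish() throws IOException {
        gzip.finish();
    }

    /** Try-it helper: compress the lines with this writer, then decompress the result. */
    static String roundTrip(List<String> lines) {
        try {
            ByteArrayOutputStream partFile = new ByteArrayOutputStream();
            GzipLineWriter writer = new GzipLineWriter(partFile);
            for (String line : lines) {
                writer.addElement(line);
            }
            writer.flush();
            writer.finish();
            try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(partFile.toByteArray()))) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

finish() deliberately does not close the stream: BulkWriter's contract leaves closing to the sink, which is also why the writer above calls gzipOutputStream.finish() rather than close().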


Spring Cloud: testing an S3 client with Testcontainers

I use Spring Cloud's ResourceLoader to access S3, e.g.:
public class S3DownUpLoader {
private final ResourceLoader resourceLoader;
@Autowired
public S3DownUpLoader(ResourceLoader resourceLoader) {
this.resourceLoader = resourceLoader;
}
public String storeOnS3(String filename, byte[] data) throws IOException {
String location = "s3://" + bucket + "/" + filename;
WritableResource writeableResource = (WritableResource) this.resourceLoader.getResource(location);
FileCopyUtils.copy( data, writeableResource.getOutputStream());
return filename;
}
}
It works okay, and I need help testing the code with Localstack/Testcontainers. I've tried the following test, but it does not work: my production profile gets picked up (the S3 client with the Localstack config is not injected):
@RunWith(SpringRunner.class)
@SpringBootTest
public class S3DownUpLoaderTest {
@ClassRule
public static LocalStackContainer localstack = new LocalStackContainer().withServices(S3);
@Autowired
S3DownUpLoader s3DownUpLoader;
@Test
public void testA() throws IOException {
s3DownUpLoader.storeOnS3(...);
}
@TestConfiguration
@EnableContextResourceLoader
public static class S3Configuration {
@Primary
@Bean(destroyMethod = "shutdown")
public AmazonS3 amazonS3() {
return AmazonS3ClientBuilder
.standard()
.withEndpointConfiguration(localstack.getEndpointConfiguration(S3))
.withCredentials(localstack.getDefaultCredentialsProvider())
.build();
}
}
}
As we discussed on GitHub, we solve this problem in a slightly different way. I have actually never seen WritableResource used the way you use it, which looks very interesting. Nonetheless, this is how we solve this issue:
@RunWith(SpringRunner.class)
@SpringBootTest(properties = "spring.profiles.active=test")
@ContextConfiguration(classes = AbstractAmazonS3Test.S3Configuration.class)
public abstract class AbstractAmazonS3Test {
private static final String REGION = Regions.EU_WEST_1.getName();
/**
* Configure S3.
*/
@TestConfiguration
public static class S3Configuration {
@Bean
public AmazonS3 amazonS3() {
//localstack docker image is running locally on port 4572 for S3
final String serviceEndpoint = String.format("http://%s:%s", "127.0.0.1", "4572");
return AmazonS3Client.builder()
.withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(serviceEndpoint, REGION))
.withCredentials(new AWSStaticCredentialsProvider(new BasicAWSCredentials("dummyKey", "dummySecret")))
.build();
}
}
}
And a sample test:
public class CsvS3UploadServiceIntegrationTest extends AbstractAmazonS3Test {
private static final String SUCCESS_CSV = "a,b";
private static final String STANDARD_STORAGE = "STANDARD";
@Autowired
private AmazonS3 s3;
@Autowired
private S3ConfigurationProperties properties;
@Autowired
private CsvS3UploadService service;
@Before
public void setUp() {
s3.createBucket(properties.getBucketName());
}
@After
public void tearDown() {
final String bucketName = properties.getBucketName();
s3.listObjects(bucketName).getObjectSummaries().stream()
.map(S3ObjectSummary::getKey)
.forEach(key -> s3.deleteObject(bucketName, key));
s3.deleteBucket(bucketName);
}
@Test
public void uploadSuccessfulCsv() {
service.uploadSuccessfulCsv(SUCCESS_CSV);
final S3ObjectSummary s3ObjectSummary = getOnlyFileFromS3();
assertThat(s3ObjectSummary.getKey(), containsString("-success.csv"));
assertThat(s3ObjectSummary.getETag(), is("b345e1dc09f20fdefdea469f09167892"));
assertThat(s3ObjectSummary.getStorageClass(), is(STANDARD_STORAGE));
assertThat(s3ObjectSummary.getSize(), is(3L));
}
private S3ObjectSummary getOnlyFileFromS3() {
final ObjectListing listing = s3.listObjects(properties.getBucketName());
final List<S3ObjectSummary> objects = listing.getObjectSummaries();
assertThat(objects, iterableWithSize(1));
return Iterables.getOnlyElement(objects);
}
}
And the code under test:
@Service
@RequiredArgsConstructor
@EnableConfigurationProperties(S3ConfigurationProperties.class)
public class CsvS3UploadServiceImpl implements CsvS3UploadService {
private static final String CSV_MIME_TYPE = CSV_UTF_8.toString();
private final AmazonS3 amazonS3;
private final S3ConfigurationProperties properties;
private final S3ObjectKeyService s3ObjectKeyService;
@Override
public void uploadSuccessfulCsv(final String source) {
final String key = s3ObjectKeyService.getSuccessKey();
doUpload(source, key, getObjectMetadata(source));
}
private void doUpload(final String source, final String key, final ObjectMetadata metadata) {
try (ReaderInputStream in = new ReaderInputStream(new StringReader(source), UTF_8)) {
final PutObjectRequest request = new PutObjectRequest(properties.getBucketName(), key, in, metadata);
amazonS3.putObject(request);
} catch (final IOException ioe) {
throw new CsvUploadException("Unable to upload " + key, ioe);
}
}
private ObjectMetadata getObjectMetadata(final String source) {
final ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentType(CSV_MIME_TYPE);
metadata.setContentLength(source.getBytes(UTF_8).length);
metadata.setContentMD5(getMD5ChecksumAsBase64(source));
metadata.setSSEAlgorithm(SSEAlgorithm.KMS.getAlgorithm());
return metadata;
}
private String getMD5ChecksumAsBase64(final String source) {
final HashCode md5 = Hashing.md5().hashString(source, UTF_8);
return Base64.getEncoder().encodeToString(md5.asBytes());
}
}
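getMD5ChecksumAsBase64 relies on Guava's Hashing. If you would rather not depend on Guava for this, a JDK-only equivalent could look like the sketch below (the class name ContentMd5 is hypothetical). The detail that matters for S3 is that the Content-MD5 header carries the Base64 encoding of the raw 16-byte digest, not of its hex string, which is also what the Guava version computes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// JDK-only variant of getMD5ChecksumAsBase64: Guava's Hashing.md5() replaced by MessageDigest.
final class ContentMd5 {

    static byte[] md5(String source) {
        try {
            return MessageDigest.getInstance("MD5").digest(source.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // every Java SE platform is required to provide MD5, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    /** Content-MD5 header value: Base64 of the raw 16-byte digest. */
    static String base64(String source) {
        return Base64.getEncoder().encodeToString(md5(source));
    }

    /** Lower-case hex of the same digest, handy when comparing against a hex checksum in a test. */
    static String hex(String source) {
        StringBuilder sb = new StringBuilder();
        for (byte b : md5(source)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }
}
```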
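It seems the only way to provide a custom AmazonS3 bean to the ResourceLoader is to register it manually, by adding a SimpleStorageProtocolResolver built on that client to the application context. The test looks like this: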
@RunWith(SpringRunner.class)
@SpringBootTest
@ContextConfiguration(classes = S3DownUpLoaderTest.S3Configuration.class)
public class S3DownUpLoaderTest implements ApplicationContextAware {
private static final String BUCKET_NAME = "bucket";
@ClassRule
public static LocalStackContainer localstack = new LocalStackContainer().withServices(S3);
@Autowired
S3DownUpLoader s3DownUpLoader;
@Autowired
SimpleStorageProtocolResolver resourceLoader;
@Autowired
AmazonS3 amazonS3;
@Before
public void setUp(){
amazonS3.createBucket(BUCKET_NAME);
}
@Test
public void someTestA() throws IOException {
....
}
@After
public void tearDown(){
ObjectListing objectListing = amazonS3.listObjects(BUCKET_NAME);
while (true) {
for (S3ObjectSummary summary : objectListing.getObjectSummaries()) {
amazonS3.deleteObject(BUCKET_NAME, summary.getKey());
}
// more objects to retrieve?
if (objectListing.isTruncated()) {
objectListing = amazonS3.listNextBatchOfObjects(objectListing);
} else {
break;
}
}
amazonS3.deleteBucket(BUCKET_NAME);
}
@Override
public void setApplicationContext(ApplicationContext applicationContext) throws BeansException {
if (applicationContext instanceof ConfigurableApplicationContext) {
ConfigurableApplicationContext configurableApplicationContext = (ConfigurableApplicationContext) applicationContext;
configurableApplicationContext.addProtocolResolver(this.resourceLoader);
}
}
public static class S3Configuration {
@Bean
public S3DownUpLoader s3DownUpLoader(ResourceLoader resourceLoader){
return new S3DownUpLoader(resourceLoader);
}
@Bean(destroyMethod = "shutdown")
public AmazonS3 amazonS3() {
return AmazonS3ClientBuilder
.standard()
.withEndpointConfiguration(localstack.getEndpointConfiguration(S3))
.withCredentials(localstack.getDefaultCredentialsProvider())
.build();
}
@Bean
public SimpleStorageProtocolResolver resourceLoader(){
return new SimpleStorageProtocolResolver(amazonS3());
}
}

Unit testing a JEE filter

I am trying to test this filter:
public class HttpMethodOverrideHeaderFilter extends OncePerRequestFilter {
private static final String X_HTTP_METHOD_OVERRIDE_HEADER = "X-HTTP-Method-Override";
@Override
protected void doFilterInternal(HttpServletRequest request, HttpServletResponse response, FilterChain filterChain)
throws ServletException, IOException {
if (isMethodOverriden(request)) {
HttpServletRequest wrapper = new HttpMethodRequestWrapper(request, request.getHeader(X_HTTP_METHOD_OVERRIDE_HEADER).toUpperCase(Locale.ENGLISH));
filterChain.doFilter(wrapper, response);
}
else {
filterChain.doFilter(request, response);
}
}
private boolean isMethodOverriden(HttpServletRequest request) {
String methodOverride = request.getHeader(X_HTTP_METHOD_OVERRIDE_HEADER);
return RequestMethod.POST.name().equalsIgnoreCase(request.getMethod()) &&
(RequestMethod.PUT.name().equalsIgnoreCase(methodOverride) || RequestMethod.DELETE.name().equalsIgnoreCase(methodOverride));
}
protected static class HttpMethodRequestWrapper extends HttpServletRequestWrapper {
private final String method;
public HttpMethodRequestWrapper(HttpServletRequest request, String method) {
super(request);
this.method = method;
}
@Override
public String getMethod() {
return this.method;
}
}
}
And this is the unit test:
@RunWith(MockitoJUnitRunner.class)
public class HttpMethodOverrideHeaderFilterTest {
private static final String X_HTTP_METHOD_OVERRIDE_HEADER = "X-HTTP-Method-Override";
private HttpMethodOverrideHeaderFilter httpMethodOverrideHeaderFilter;
@Mock
private HttpServletRequest httpServletRequest;
@Mock
private HttpServletResponse httpServletResponse;
@Mock
private FilterChain filterChain;
@Before
public void setUp() {
httpMethodOverrideHeaderFilter = new HttpMethodOverrideHeaderFilter();
}
@Test
public void testDoFilterInternalWithPUTMethodAsOverrideHeader() throws Exception {
when(httpServletRequest.getHeader(X_HTTP_METHOD_OVERRIDE_HEADER)).thenReturn("PUT");
when(httpServletRequest.getMethod()).thenReturn("POST");
HttpServletRequest wrapper = new HttpMethodOverrideHeaderFilter.HttpMethodRequestWrapper(httpServletRequest, "PUT");
httpMethodOverrideHeaderFilter.doFilterInternal(httpServletRequest, httpServletResponse, filterChain);
verify(filterChain).doFilter(wrapper, httpServletResponse);
}
}
The test does not pass because wrapper is not the same instance that the filter passes to the chain. Basically, what I need to know is whether the wrapper was given the PUT method. Any ideas?
I found a way to do it:
@Test
public void testDoFilterInternalWithPUTMethodAsOverrideHeader() throws Exception {
when(httpServletRequest.getHeader(X_HTTP_METHOD_OVERRIDE_HEADER)).thenReturn("PUT");
when(httpServletRequest.getMethod()).thenReturn("POST");
httpMethodOverrideHeaderFilter.doFilterInternal(httpServletRequest, httpServletResponse, filterChain);
ArgumentCaptor<ServletRequest> requestCaptor = ArgumentCaptor.forClass(ServletRequest.class);
ArgumentCaptor<ServletResponse> responseCaptor = ArgumentCaptor.forClass(ServletResponse.class);
verify(filterChain).doFilter(requestCaptor.capture(), responseCaptor.capture());
HttpMethodOverrideHeaderFilter.HttpMethodRequestWrapper wrapper = (HttpMethodOverrideHeaderFilter.HttpMethodRequestWrapper) requestCaptor.getValue();
// JUnit's assertEquals takes the expected value first
assertEquals("PUT", wrapper.getMethod());
}
If anyone knows a better way, let me know!
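One further option, not from the original thread: move the override rule into a pure static helper (the names MethodOverride and effectiveMethod are hypothetical) that mirrors isMethodOverriden plus the toUpperCase call. The rule can then be checked with plain assertions, and the captor-based test above only has to confirm that the captured request reports what the helper returns.

```java
import java.util.Locale;

// Hypothetical helper: the decision HttpMethodOverrideHeaderFilter makes, without any servlet types.
final class MethodOverride {

    /**
     * Returns the HTTP method the request should report: the upper-cased override for a POST
     * carrying a PUT or DELETE override header, otherwise the original method unchanged.
     */
    static String effectiveMethod(String method, String overrideHeader) {
        boolean isPost = "POST".equalsIgnoreCase(method);
        // equalsIgnoreCase(null) is false, so a missing header never counts as supported
        boolean supported = "PUT".equalsIgnoreCase(overrideHeader) || "DELETE".equalsIgnoreCase(overrideHeader);
        return (isPost && supported) ? overrideHeader.toUpperCase(Locale.ENGLISH) : method;
    }
}
```

The filter would then wrap the request with new HttpMethodRequestWrapper(request, MethodOverride.effectiveMethod(request.getMethod(), request.getHeader(X_HTTP_METHOD_OVERRIDE_HEADER))) whenever the result differs from the original method.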

How to test a PUT RestController in Spring Boot

How can I test a PUT request with Spring Boot?
I have this method:
@RequestMapping(method = RequestMethod.PUT, value = "/")
public NaturezaTitulo save(@RequestBody NaturezaTitulo naturezaTitulo){
return naturezaTituloService.save(naturezaTitulo);
}
And this test class:
@RunWith(SpringJUnit4ClassRunner.class)
@SpringApplicationConfiguration(classes = Application.class)
@WebAppConfiguration
public class NaturezaTituloControllerTest {
private MediaType contentType = new MediaType(MediaType.APPLICATION_JSON.getType(),
MediaType.APPLICATION_JSON.getSubtype(),
Charset.forName("utf8"));
private MockMvc mockMvc;
private HttpMessageConverter mappingJackson2HttpMessageConverter;
private List<NaturezaTitulo> naturezaTituloList = new ArrayList<>();
@Autowired
private WebApplicationContext webApplicationContext;
@Autowired
void setConverters(HttpMessageConverter<?>[] converters) {
this.mappingJackson2HttpMessageConverter = Arrays.asList(converters).stream().filter(
hmc -> hmc instanceof MappingJackson2HttpMessageConverter).findAny().get();
Assert.assertNotNull("the JSON message converter must not be null",
this.mappingJackson2HttpMessageConverter);
}
@Before
public void setup() throws Exception {
this.mockMvc = webAppContextSetup(webApplicationContext).build();
}
@Test
public void naturezaTituloNotFound() throws Exception {
mockMvc.perform(get("/naturezatitulo/55ce2dd6222e629f4b8d6fe0"))
.andExpect(status().is4xxClientError());
}
@Test
public void naturezaTituloSave() throws Exception {
NaturezaTitulo naturezaTitulo = new NaturezaTitulo();
naturezaTitulo.setNatureza("Testando");
mockMvc.perform(put("/naturezatitulo/").content(this.json(naturezaTitulo))
.contentType(contentType))
.andExpect(jsonPath("$.id", notNullValue()));
}
protected String json(Object o) throws IOException {
MockHttpOutputMessage mockHttpOutputMessage = new MockHttpOutputMessage();
this.mappingJackson2HttpMessageConverter.write(
o, MediaType.APPLICATION_JSON, mockHttpOutputMessage);
return mockHttpOutputMessage.getBodyAsString();
}
}
But I got this error:
java.lang.IllegalArgumentException: json can not be null or empty at
com.jayway.jsonpath.internal.Utils.notEmpty(Utils.java:259)
How can I pass an object as the request body in a PUT test?
Thanks.

Initialize public static variable in Hadoop through arguments

I have a problem with changing public static variables in Hadoop.
I am trying to pass some values as arguments to the jar file from the command line.
Here is my code:
public class MyClass {
public static long myvariable1 = 100;
public static class Map extends Mapper<Object, Text, Text, Text> {
public static long myvariable2 = 200;
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
}
}
public static class Reduce extends Reducer<Text, Text, Text, Text> {
public void reduce(Text key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
}
}
public static void main(String[] args) throws Exception {
col_no = Long.parseLong(args[0]);
MyClass.myvariable1 = Long.parseLong(args[1]);
Map.myvariable2 = Long.parseLong(args[1]);
// other stuff here
}
}
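But it does not work: myvariable1 and myvariable2 always have the values 100 and 200.
I use Hadoop 0.20.203 on Ubuntu 10.04.
Static fields assigned in main() only change the JVM that submits the job. On a cluster, the map and reduce tasks run in separate JVMs, which load MyClass afresh and therefore see the initial values. What you can do to get the same behavior is to store your variables in the Configuration you use to launch the job and read them back from the task's context: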
public static class Map extends Mapper<Object, Text, Text, Text> {
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
Configuration conf = context.getConfiguration();
String var2String = conf.get("myvariable2");
long myvariable2 = Long.parseLong(var2String);
//etc.
}
}
public static void main(String[] args) throws Exception {
col_no = Long.parseLong(args[0]);
String myvariable1 = args[1];
String myvariable2 = args[1];
// add values to configuration
Configuration conf = new Configuration();
conf.set("myvariable1", myvariable1);
conf.set("myvariable2", myvariable2);
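//other stuff here: create the Job from this conf (e.g. new Job(conf, ...)) only after the set() calls, because the Job works on its own copy of the Configuration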
}

Question about a simple MINA client and server

I am just trying to create a simple MINA server and client to evaluate MINA. Here is my code.
public class Server {
private static final int PORT = 8080;
static class ServerHandler extends IoHandlerAdapter {
@Override
public void exceptionCaught(IoSession session, Throwable cause) throws Exception {
cause.printStackTrace();
}
@Override
public void sessionCreated(IoSession session) {
System.out.println("session is created");
session.write("Thank you");
}
@Override
public void sessionClosed(IoSession session) throws Exception {
System.out.println("session is closed.");
}
@Override
public void messageReceived(IoSession session, Object message) {
System.out.println("message=" + message);
session.write("Reply="+message);
}
}
/**
* @param args
*/
public static void main(String[] args) throws Exception {
SocketAcceptor acceptor = new NioSocketAcceptor();
acceptor.getFilterChain().addLast( "logger", new LoggingFilter() );
acceptor.getFilterChain().addLast( "codec", new ProtocolCodecFilter( new TextLineCodecFactory( Charset.forName( "UTF-8" ))));
acceptor.setHandler(new Server.ServerHandler());
acceptor.getSessionConfig().setReadBufferSize( 2048 );
acceptor.getSessionConfig().setIdleTime( IdleStatus.BOTH_IDLE, 10 );
acceptor.bind(new InetSocketAddress(PORT));
System.out.println("Listening on port " + PORT);
for (;;) {
Thread.sleep(3000);
}
}
}
public class Client {
private static final int PORT = 8080;
private IoSession session;
private ClientHandler handler;
public Client() {
super();
}
public void initialize() throws Exception {
handler = new ClientHandler();
NioSocketConnector connector = new NioSocketConnector();
connector.getFilterChain().addLast( "codec", new ProtocolCodecFilter( new TextLineCodecFactory( Charset.forName( "UTF-8" ))));
connector.getFilterChain().addLast("logger", new LoggingFilter());
connector.setHandler(handler);
for (;;) {
try {
ConnectFuture future = connector.connect(new InetSocketAddress(PORT));
future.awaitUninterruptibly();
session = future.getSession();
break;
} catch (RuntimeIoException e) {
System.err.println("Failed to connect.");
e.printStackTrace();
Thread.sleep(5000);
}
}
if (session == null) {
throw new Exception("Unable to get session");
}
Sender sender = new Sender();
sender.start();
session.getCloseFuture().awaitUninterruptibly();
connector.dispose();
System.out.println("client is done.");
}
/**
* @param args
*/
public static void main(String[] args) throws Exception {
Client client = new Client();
client.initialize();
}
class Sender extends Thread {
@Override
public void run() {
try {
Thread.sleep(3000);
} catch (InterruptedException e) {
e.printStackTrace();
}
handler.messageSent(session, "message");
}
}
class ClientHandler extends IoHandlerAdapter {
@Override
public void sessionOpened(IoSession session) {
}
@Override
public void messageSent(IoSession session, Object message) {
System.out.println("message sending=" + message);
session.write(message);
}
@Override
public void messageReceived(IoSession session, Object message) {
System.out.println("message receiving "+ message);
}
@Override
public void exceptionCaught(IoSession session, Throwable cause) {
cause.printStackTrace();
}
}
}
When I execute this code, the client seems to keep sending the message over and over instead of stopping after it has been sent once. It looks to me like there is a recursive call in the underlying MINA code, but I know that I am doing something wrong.
Can somebody tell me how to fix this?
Thanks.
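Try to initialize and start your sender, and use the session, within sessionOpened (ClientHandler). Also note that messageSent is a callback: MINA invokes it after a message written with session.write(...) has been sent out. Calling session.write(message) inside messageSent (and calling handler.messageSent(...) yourself from Sender) therefore queues another write whose completion triggers messageSent again, which is why the client never stops. Write with session.write(...) directly and use messageSent only for logging.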
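To see why the client never stops, here is a toy model in plain Java (the ToySession and Handler types are hypothetical, not MINA's API). Like MINA, the session notifies the handler's messageSent after each write; unlike MINA, the notification is synchronous and capped at maxWrites so the demo terminates. In real MINA the callback arrives asynchronously on an I/O thread, so instead of deep recursion you get an endless stream of writes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the write -> messageSent -> write feedback loop; not MINA code.
final class CallbackLoopDemo {

    interface Handler {
        void messageSent(ToySession session, Object message);
    }

    static final class ToySession {
        final List<Object> wire = new ArrayList<>();
        private final Handler handler;
        private final int maxWrites; // safety cap so the buggy variant terminates

        ToySession(Handler handler, int maxWrites) {
            this.handler = handler;
            this.maxWrites = maxWrites;
        }

        void write(Object message) {
            if (wire.size() >= maxWrites) {
                return;
            }
            wire.add(message);                  // the message goes out on the "wire"
            handler.messageSent(this, message); // "write completed" notification, as MINA does
        }
    }

    /** Mirrors the question's ClientHandler: writes again from inside messageSent. */
    static int writesWithEchoingHandler(int cap) {
        ToySession session = new ToySession((s, msg) -> s.write(msg), cap);
        session.write("message");
        return session.wire.size();
    }

    /** A handler that only observes the callback: the application's single write stays single. */
    static int writesWithLoggingHandler(int cap) {
        ToySession session = new ToySession((s, msg) -> { /* log only */ }, cap);
        session.write("message");
        return session.wire.size();
    }
}
```

With a cap of 5, the echoing handler fills the wire up to the cap from one application write, while the logging handler sends exactly one message.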