Apache Beam - PubSub Message to several BigQuery tables based on a field - google-bigquery

I have a pipeline that load PubSub message from a topic into a BigQuery table. In order to process less data, I want to store message based on the "customer_id" field into a BigQuery table called table-{customer_id}. However, I am struggling a lot to find out how to do it correctly.
Below, you will find my pipeline that is working but does not write into several tables.
public class PubsubAvroToBigQuery {
public interface Options extends PipelineOptions, StreamingOptions {
#Description("The Cloud Pub/Sub topic to read from.")
#Required
ValueProvider<String> getInputTopic();
void setInputTopic(ValueProvider<String> value);
#Description("BigQuery Table")
#Required
ValueProvider<String> getBigQueryTable();
void setBigQueryTable(ValueProvider<String> value);
}
public static void main(String[] args) {
Options options = PipelineOptionsFactory.fromArgs(args).withValidation().as(Options.class);
options.setStreaming(true);
try {
run(options);
} catch (IOException e) {
e.printStackTrace();
}
}
public static PipelineResult run(Options options) throws IOException {
// Create the pipeline
options.setStreaming(true);
Pipeline pipeline = Pipeline.create(options);
// load avro schema from classpath
Schema avroSchema = ApiCalls.SCHEMA$;
pipeline
.apply(
"Read PubSub Avro Message",
PubsubIO.readAvroGenericRecords(avroSchema).fromTopic(options.getInputTopic())
)
.apply("Write to BigQuery", BigQueryIO.<GenericRecord>write()
.to(options.getBigQueryTable())
.useBeamSchema()
.withWriteDisposition(WriteDisposition.WRITE_APPEND)
.withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
.optimizedWrites());
return pipeline.run();
}
}
I did lot of test based on TupleTag or even Windows (with data grouped) but it didn't work...
Hope you can help me
Thank you

To achieve your Use Case, you could use the DynamicDestinations class to chose the TableDestination to use according to the input event.
You can use it, as stated in the Javadoc like this:
events.apply(BigQueryIO.<UserEvent>write()
.to(new DynamicDestinations<UserEvent, String>() {
public String getDestination(ValueInSingleWindow<UserEvent> element) {
// here read your input and extract the customer_id
return element.getValue().getUserId();
}
public TableDestination getTable(String user) {
// here build the destination table based on what is output by getDestination method
return new TableDestination(tableForUser(user), "Table for user " + user);
}
public TableSchema getSchema(String user) {
return tableSchemaForUser(user);
}
})
.withFormatFunction(new SerializableFunction<UserEvent, TableRow>() {
public TableRow apply(UserEvent event) {
return convertUserEventToTableRow(event);
}
}));
Note that you can create a class that implements DynamicDestinations with a constructor to pass parameters that could then be used in the implemented methods.

Related

Migration guide for Schedulers.enableMetrics() in project Reactor

I noticed that Schedulers.enableMetrics() got deprecated but I don't know what I should I do to get all my schedulers metered in a typical use case (using Spring Boot application).
Javadoc suggests using timedScheduler but how should it be achieved for Spring Boot?
First off, here are my thoughts on why the Schedulers.enableMetrics() approach was deprecated:
The previous approach was flawed in several ways:
intrinsic dependency on the MeterRegistry#globalRegistry() without any way of using a different registry.
wrong level of abstraction and limited instrumentation:
it was not the schedulers themselves that were instrumented, but individual ExecutorService instances assumed to back the schedulers.
schedulers NOT backed by any ExecutorService couldn't be instrumented.
schedulers backed by MULTIPLE ExecutorService (eg. a pool of workers) would produce multiple levels of metrics difficult to aggregate.
instrumentation was all-or-nothing, potentially polluting metrics backend with metrics from global or irrelevant schedulers.
A deliberate constraint of the new approach is that each Scheduler must be explicitly wrapped, which ensures that the correct MeterRegistry is used and that metrics are recognizable and aggregated for that particular Scheduler (thanks to the mandatory metricsPrefix).
I'm not a Spring Boot expert, but if you really want to instrument all the schedulers including the global ones here is a naive approach that will aggregate data from all the schedulers of same category, demonstrated in a Spring Boot app:
#SpringBootApplication
public class DemoApplication {
public static void main(String[] args) {
SpringApplication.run(DemoApplication.class, args);
}
#Configuration
static class SchedulersConfiguration {
#Bean
#Order(1)
public Scheduler originalScheduler() {
// For comparison, we can capture a new original Scheduler (which won't be disposed by setFactory, unlike the global ones)
return Schedulers.newBoundedElastic(4, 100, "compare");
}
#Bean
public SimpleMeterRegistry registry() {
return new SimpleMeterRegistry();
}
#Bean
public Schedulers.Factory instrumentedSchedulers(SimpleMeterRegistry registry) {
// Let's create a Factory that does the same as the default Schedulers factory in Reactor-Core, but with instrumentation
return new Schedulers.Factory() {
#Override
public Scheduler newBoundedElastic(int threadCap, int queuedTaskCap, ThreadFactory threadFactory, int ttlSeconds) {
// The default implementation maps to the vanilla Schedulers so we can delegate to that
Scheduler original = Schedulers.Factory.super.newBoundedElastic(threadCap, queuedTaskCap, threadFactory, ttlSeconds);
// IMPORTANT NOTE: in this example _all_ the schedulers of the same type will share the same prefix/name
// this would especially be problematic if gauges were involved as they replace old gauges of the same name.
// Fortunately, for now, TimedScheduler only uses counters, timers and longTaskTimers.
String prefix = "my.instrumented.boundedElastic"; // TimedScheduler will add `.scheduler.xxx` to that prefix
return Micrometer.timedScheduler(original, registry, prefix);
}
#Override
public Scheduler newParallel(int parallelism, ThreadFactory threadFactory) {
Scheduler original = Schedulers.Factory.super.newParallel(parallelism, threadFactory);
String prefix = "my.instrumented.parallel"; // TimedScheduler will add `.scheduler.xxx` to that prefix
return Micrometer.timedScheduler(original, registry, prefix);
}
#Override
public Scheduler newSingle(ThreadFactory threadFactory) {
Scheduler original = Schedulers.Factory.super.newSingle(threadFactory);
String prefix = "my.instrumented.single"; // TimedScheduler will add `.scheduler.xxx` to that prefix
return Micrometer.timedScheduler(original, registry, prefix);
}
};
}
#PreDestroy
void resetFactories() {
System.err.println("Resetting Schedulers Factory to default");
// Later on if we want to disable instrumentation we can reset the Factory to defaults (closing all instrumented schedulers)
Schedulers.resetFactory();
}
}
#Service
public static class Demo implements ApplicationRunner {
final Scheduler forComparison;
final SimpleMeterRegistry registry;
final Schedulers.Factory factory;
Demo(Scheduler forComparison, SimpleMeterRegistry registry, Schedulers.Factory factory) {
this.forComparison = forComparison;
this.registry = registry;
this.factory = factory;
Schedulers.setFactory(factory);
}
public void generateMetrics() {
Schedulers.boundedElastic().schedule(() -> {});
Schedulers.newBoundedElastic(4, 100, "bounded1").schedule(() -> {});
Schedulers.newBoundedElastic(4, 100, "bounded2").schedule(() -> {});
Micrometer.timedScheduler(
forComparison,
registry,
"my.custom.instrumented.bounded"
).schedule(() -> {});
Schedulers.newBoundedElastic(4, 100, "bounded3").schedule(() -> {});
}
public String getCompletedSummary() {
return Search.in(registry)
.name(n -> n.endsWith(".scheduler.tasks.completed"))
.timers()
.stream()
.map(c -> c.getId().getName() + "=" + c.count())
.collect(Collectors.joining("\n"));
}
#Override
public void run(ApplicationArguments args) throws Exception {
generateMetrics();
System.err.println(getCompletedSummary());
}
}
}
Which prints:
my.instrumented.boundedElastic.scheduler.tasks.completed=4
my.custom.instrumented.bounded.scheduler.tasks.completed=1
Notice how the metrics for the four instrumentedFactory-produced Scheduler are aggregated together.
There's a bit of a hacky workaround for this: by default Schedulers uses ReactorThreadFactory, an internal private class which happens to be a Supplier<String>, supplying the "simplified name" (ie toString but without the configuration options) of the Scheduler.
One could use the following method to tentatively extract that name:
static String inferSimpleSchedulerName(ThreadFactory threadFactory, String defaultName) {
if (!(threadFactory instanceof Supplier)) {
return defaultName;
}
Object supplied = ((Supplier<?>) threadFactory).get();
if (!(supplied instanceof String)) {
return defaultName;
}
return (String) supplied;
}
Which can be applied to eg. the newParallel method in the factory:
String simplifiedName = inferSimpleSchedulerName(threadFactory, "para???");
String prefix = "my.instrumented." + simplifiedName; // TimedScheduler will add `.scheduler.xxx` to that prefix
This can then be demonstrated by submitting a few tasks to different parallel schedulers in the Demo#generateMetrics() part:
Schedulers.parallel().schedule(() -> {});
Schedulers.newParallel("paraOne").schedule(() -> {});
Schedulers.newParallel("paraTwo").schedule(() -> {});
And now it prints (blank lines for emphasis):
my.instrumented.paraOne.scheduler.tasks.completed=1
my.instrumented.paraTwo.scheduler.tasks.completed=1
my.instrumented.parallel.scheduler.tasks.completed=1
my.custom.instrumented.bounded.scheduler.tasks.completed=1
my.instrumented.boundedElastic.scheduler.tasks.completed=4

stop polling files when rabbitmq is down: spring integration

I'm working on a project where we are polling files from a sftp server and streaming it out into a object on the rabbitmq queue. Now when the rabbitmq is down it still polls and deletes the file from the server and losses the file while sending it on queue when rabbitmq is down. I'm using ExpressionEvaluatingRequestHandlerAdvice to remove the file on successful transformation. My code looks like this:
#Bean
public SessionFactory<ChannelSftp.LsEntry> sftpSessionFactory() {
DefaultSftpSessionFactory factory = new DefaultSftpSessionFactory(true);
factory.setHost(sftpProperties.getSftpHost());
factory.setPort(sftpProperties.getSftpPort());
factory.setUser(sftpProperties.getSftpPathUser());
factory.setPassword(sftpProperties.getSftpPathPassword());
factory.setAllowUnknownKeys(true);
return new CachingSessionFactory<>(factory);
}
#Bean
public SftpRemoteFileTemplate sftpRemoteFileTemplate() {
return new SftpRemoteFileTemplate(sftpSessionFactory());
}
#Bean
#InboundChannelAdapter(channel = TransformerChannel.TRANSFORMER_OUTPUT, autoStartup = "false",
poller = #Poller(value = "customPoller"))
public MessageSource<InputStream> sftpMessageSource() {
SftpStreamingMessageSource messageSource = new SftpStreamingMessageSource(sftpRemoteFileTemplate,
null);
messageSource.setRemoteDirectory(sftpProperties.getSftpDirPath());
messageSource.setFilter(new SftpPersistentAcceptOnceFileListFilter(new SimpleMetadataStore(),
"streaming"));
messageSource.setFilter(new SftpSimplePatternFileListFilter("*.txt"));
return messageSource;
}
#Bean
#Transformer(inputChannel = TransformerChannel.TRANSFORMER_OUTPUT,
outputChannel = SFTPOutputChannel.SFTP_OUTPUT,
adviceChain = "deleteAdvice")
public org.springframework.integration.transformer.Transformer transformer() {
return new SFTPTransformerService("UTF-8");
}
#Bean
public ExpressionEvaluatingRequestHandlerAdvice deleteAdvice() {
ExpressionEvaluatingRequestHandlerAdvice advice = new ExpressionEvaluatingRequestHandlerAdvice();
advice.setOnSuccessExpressionString(
"#sftpRemoteFileTemplate.remove(headers['file_remoteDirectory'] + headers['file_remoteFile'])");
advice.setPropagateEvaluationFailures(false);
return advice;
}
I don't want the files to get removed/polled from the remote sftp server when the rabbitmq server is down. How can i achieve this ?
UPDATE
Apologies for not mentioning that I'm using spring cloud stream rabbit binder. And here is the transformer service:
public class SFTPTransformerService extends StreamTransformer {
public SFTPTransformerService(String charset) {
super(charset);
}
#Override
protected Object doTransform(Message<?> message) throws Exception {
String fileName = message.getHeaders().get("file_remoteFile", String.class);
Object fileContents = super.doTransform(message);
return new customFileDTO(fileName, (String) fileContents);
}
}
UPDATE-2
I added TransactionSynchronizationFactory on the customPoller as suggested. Now it doesn't poll file when rabbit server is down, but when the server is up, it keeps on polling the same file over and over again!! I cannot figure it out why? I guess i cannot use PollerSpec cause im on 4.3.2 version.
#Bean(name = "customPoller")
public PollerMetadata pollerMetadataDTX(StartStopTrigger startStopTrigger,
CustomTriggerAdvice customTriggerAdvice) {
PollerMetadata pollerMetadata = new PollerMetadata();
pollerMetadata.setAdviceChain(Collections.singletonList(customTriggerAdvice));
pollerMetadata.setTrigger(startStopTrigger);
pollerMetadata.setMaxMessagesPerPoll(Long.valueOf(sftpProperties.getMaxMessagePoll()));
ExpressionEvaluatingTransactionSynchronizationProcessor syncProcessor =
new ExpressionEvaluatingTransactionSynchronizationProcessor();
syncProcessor.setBeanFactory(applicationContext.getAutowireCapableBeanFactory());
syncProcessor.setBeforeCommitChannel(
applicationContext.getBean(TransformerChannel.TRANSFORMER_OUTPUT, MessageChannel.class));
syncProcessor
.setAfterCommitChannel(
applicationContext.getBean(SFTPOutputChannel.SFTP_OUTPUT, MessageChannel.class));
syncProcessor.setAfterCommitExpression(new SpelExpressionParser().parseExpression(
"#sftpRemoteFileTemplate.remove(headers['file_remoteDirectory'] + headers['file_remoteFile'])"));
DefaultTransactionSynchronizationFactory defaultTransactionSynchronizationFactory =
new DefaultTransactionSynchronizationFactory(syncProcessor);
pollerMetadata.setTransactionSynchronizationFactory(defaultTransactionSynchronizationFactory);
return pollerMetadata;
}
I don't know if you need this info but my CustomTriggerAdvice and StartStopTrigger looks like this :
#Component
public class CustomTriggerAdvice extends AbstractMessageSourceAdvice {
#Autowired private StartStopTrigger startStopTrigger;
#Override
public boolean beforeReceive(MessageSource<?> source) {
return true;
}
#Override
public Message<?> afterReceive(Message<?> result, MessageSource<?> source) {
if (result == null) {
if (startStopTrigger.getStart()) {
startStopTrigger.stop();
}
} else {
if (!startStopTrigger.getStart()) {
startStopTrigger.stop();
}
}
return result;
}
}
public class StartStopTrigger implements Trigger {
private PeriodicTrigger startTrigger;
private boolean start;
public StartStopTrigger(PeriodicTrigger startTrigger, boolean start) {
this.startTrigger = startTrigger;
this.start = start;
}
#Override
public Date nextExecutionTime(TriggerContext triggerContext) {
if (!start) {
return null;
}
start = true;
return startTrigger.nextExecutionTime(triggerContext);
}
public void stop() {
start = false;
}
public void start() {
start = true;
}
public boolean getStart() {
return this.start;
}
}
Well, would be great to see what your SFTPTransformerService and determine how it is possible to perform an onSuccessExpression when there should be an exception in case of down broker.
You also should not only throw an exception do not perform delete, but consider to add a RequestHandlerRetryAdvice to re-send the file to the RabbitMQ: https://docs.spring.io/spring-integration/docs/5.0.6.RELEASE/reference/html/messaging-endpoints-chapter.html#retry-advice
UPDATE
So, well, since Gary guessed that you use Spring Cloud Stream to send message to the Rabbit Binder after your internal process (very sad that you didn't share that information originally), you need to take a look to the Binder error handling on the matter: https://docs.spring.io/spring-cloud-stream/docs/Elmhurst.RELEASE/reference/htmlsingle/#_retry_with_the_rabbitmq_binder
And that is true that ExpressionEvaluatingRequestHandlerAdvice is applied only for the SFTPTransformerService and nothing more. The downstream error (in the Binder) is not included in this process already.
UPDATE 2
Yeah... I think Gary is right, and we don't have choice unless configure a TransactionSynchronizationFactory on the customPoller level instead of that ExpressionEvaluatingRequestHandlerAdvice: ExpressionEvaluatingRequestHandlerAdvice .
The DefaultTransactionSynchronizationFactory can be configured with the ExpressionEvaluatingTransactionSynchronizationProcessor, which has similar goal as the mentioned ExpressionEvaluatingRequestHandlerAdvice, but on the transaction level which will include your process starting with the SFTP Channel Adapter and ending on the Rabbit Binder level with the send to AMQP attempts.
See Reference Manual for more information: https://docs.spring.io/spring-integration/reference/html/transactions.html#transaction-synchronization.
The point with the ExpressionEvaluatingRequestHandlerAdvice (and any AbstractRequestHandlerAdvice) that they have a boundary only around handleRequestMessage() method, therefore only during the component they are declared.

Storm Kafkaspout KryoSerialization issue for java bean from kafka topic

Hi I am new to Storm and Kafka.
I am using storm 1.0.1 and kafka 0.10.0
we have a kafkaspout that would receive java bean from kafka topic.
I have spent several hours digging to find the right approach for that.
Found few articles which are useful but none of the approaches worked for me so far.
Following is my codes:
StormTopology:
public class StormTopology {
public static void main(String[] args) throws Exception {
//Topo test /zkroot test
if (args.length == 4) {
System.out.println("started");
BrokerHosts hosts = new ZkHosts("localhost:2181");
SpoutConfig kafkaConf1 = new SpoutConfig(hosts, args[1], args[2],
args[3]);
kafkaConf1.zkRoot = args[2];
kafkaConf1.useStartOffsetTimeIfOffsetOutOfRange = true;
kafkaConf1.startOffsetTime = kafka.api.OffsetRequest.LatestTime();
kafkaConf1.scheme = new SchemeAsMultiScheme(new KryoScheme());
KafkaSpout kafkaSpout1 = new KafkaSpout(kafkaConf1);
System.out.println("started");
ShuffleBolt shuffleBolt = new ShuffleBolt(args[1]);
AnalysisBolt analysisBolt = new AnalysisBolt(args[1]);
TopologyBuilder topologyBuilder = new TopologyBuilder();
topologyBuilder.setSpout("kafkaspout", kafkaSpout1, 1);
//builder.setBolt("counterbolt2", countbolt2, 3).shuffleGrouping("kafkaspout");
//This is for field grouping in bolt we need two bolt for field grouping or it wont work
topologyBuilder.setBolt("shuffleBolt", shuffleBolt, 3).shuffleGrouping("kafkaspout");
topologyBuilder.setBolt("analysisBolt", analysisBolt, 5).fieldsGrouping("shuffleBolt", new Fields("trip"));
Config config = new Config();
config.registerSerialization(VehicleTrip.class, VehicleTripKyroSerializer.class);
config.setDebug(true);
config.setNumWorkers(1);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology(args[0], config, topologyBuilder.createTopology());
// StormSubmitter.submitTopology(args[0], config,
// builder.createTopology());
} else {
System.out
.println("Insufficent Arguements - topologyName kafkaTopic ZKRoot ID");
}
}
}
I am serializing the data at kafka using kryo
KafkaProducer:
public class StreamKafkaProducer {
private static Producer producer;
private final Properties props = new Properties();
private static final StreamKafkaProducer KAFKA_PRODUCER = new StreamKafkaProducer();
private StreamKafkaProducer(){
props.put("bootstrap.servers", "localhost:9092");
props.put("acks", "all");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "com.abc.serializer.MySerializer");
producer = new org.apache.kafka.clients.producer.KafkaProducer(props);
}
public static StreamKafkaProducer getStreamKafkaProducer(){
return KAFKA_PRODUCER;
}
public void produce(String topic, VehicleTrip vehicleTrip){
ProducerRecord<String,VehicleTrip> producerRecord = new ProducerRecord<>(topic,vehicleTrip);
producer.send(producerRecord);
//producer.close();
}
public static void closeProducer(){
producer.close();
}
}
Kyro Serializer:
public class DataKyroSerializer extends Serializer<Data> implements Serializable {
#Override
public void write(Kryo kryo, Output output, VehicleTrip vehicleTrip) {
output.writeLong(data.getStartedOn().getTime());
output.writeLong(data.getEndedOn().getTime());
}
#Override
public Data read(Kryo kryo, Input input, Class<VehicleTrip> aClass) {
Data data = new Data();
data.setStartedOn(new Date(input.readLong()));
data.setEndedOn(new Date(input.readLong()));
return data;
}
I need to get the data back to the Data bean.
As per few articles I need to provide with a custom scheme and make it part of topology but till now I have no luck
Code for Bolt and Scheme
Scheme:
public class KryoScheme implements Scheme {
private ThreadLocal<Kryo> kryos = new ThreadLocal<Kryo>() {
protected Kryo initialValue() {
Kryo kryo = new Kryo();
kryo.addDefaultSerializer(Data.class, new DataKyroSerializer());
return kryo;
};
};
#Override
public List<Object> deserialize(ByteBuffer ser) {
return Utils.tuple(kryos.get().readObject(new ByteBufferInput(ser.array()), Data.class));
}
#Override
public Fields getOutputFields( ) {
return new Fields( "data" );
}
}
and bolt:
public class AnalysisBolt implements IBasicBolt {
/**
*
*/
private static final long serialVersionUID = 1L;
private String topicname = null;
public AnalysisBolt(String topicname) {
this.topicname = topicname;
}
public void prepare(Map stormConf, TopologyContext topologyContext) {
System.out.println("prepare");
}
public void execute(Tuple input, BasicOutputCollector collector) {
System.out.println("execute");
Fields fields = input.getFields();
try {
JSONObject eventJson = (JSONObject) JSONSerializer.toJSON((String) input
.getValueByField(fields.get(1)));
String StartTime = (String) eventJson.get("startedOn");
String EndTime = (String) eventJson.get("endedOn");
String Oid = (String) eventJson.get("_id");
int V_id = (Integer) eventJson.get("vehicleId");
//call method getEventForVehicleWithinTime(Long vehicleId, Date startTime, Date endTime)
System.out.println("==========="+Oid+"| "+V_id+"| "+StartTime+"| "+EndTime);
} catch (Exception e) {
e.printStackTrace();
}
}
but if I submit the storm topology i am getting error:
java.lang.IllegalStateException: Spout 'kafkaspout' contains a
non-serializable field of type com.abc.topology.KryoScheme$1, which
was instantiated prior to topology creation.
com.minda.iconnect.topology.KryoScheme$1 should be instantiated within
the prepare method of 'kafkaspout at the earliest.
Appreciate help to debug the issue and guide to right path.
Thanks
Your ThreadLocal is not Serializable. The preferable solution would be to make your serializer both Serializable and threadsafe. If this is not possible, then I see 2 alternatives since there is no prepare method as you would get in a bolt.
Declare it as static, which is inherently transient.
Declare it transient and access it via a private get method. Then you can initialize the variable on first access.
Within the Storm lifecycle, the topology is instantiated and then serialized to byte format to be stored in ZooKeeper, prior to the topology being executed. Within this step, if a spout or bolt within the topology has an initialized unserializable property, serialization will fail.
If there is a need for a field that is unserializable, initialize it within the bolt or spout's prepare method, which is run after the topology is delivered to the worker.
Source: Best Practices for implementing Apache Storm

Spring JDBC and Java 8 - JDBCTemplate: retrieving SQL statement and parameters for debugging

I am using Spring JDBC and some nice Java 8 lambda-syntax to execute queries with the JDBCTemplate.
The reason for choosing Springs JDBCTemplate, is the implicit resource-handling that Spring-jdbc offers (I do NOT want a ORM framework for my simple usecase's).
My problem is that I want to debug the whole SQL statements with their parameters. Spring prints the SQL by default but not the parameters. Therefor I have subclassed the JDBCTemplate and overridden a query-method.
An example usage of the JDBCTemplate:
public List<Product> getProductsByModel(String modelName) {
List<Product> productList = jdbcTemplate.query(
"select * from product p, productmodel m " +
"where p.modelId = m.id " +
"and m.name = ?",
(rs, rowNum) -> new Product(
rs.getInt("id"),
rs.getString("stc_number"),
rs.getString("version"),
getModelById(rs.getInt("modelId")), // method not shown
rs.getString("displayName"),
rs.getString("imageUrl")
),
modelName);
return productList;
}
To get hold of the parameters I have, as mentioned, overridden the JDBCTemplate class. By doing a cast and using reflection I get the Object[] field with the parameters from an instance of ArgumentPreparedStatementSetter.
I suspect this implementation could potentially be dangerous, as the actual implementation of the PreparedStatementSetter may not always be ArgumentPreparedStatementSetter (Yes I should do an instanceOf check). Also, the reflection code may not be as elegant, but that is besides the point now though :).
Here's my custom implementation:
public class CustomJdbcTemplate extends JdbcTemplate {
private static final Logger log = LoggerFactory.getLogger(CustomJdbcTemplate.class);
public CustomJdbcTemplate(DataSource dataSource) {
super(dataSource);
}
public <T> T query(PreparedStatementCreator psc, final PreparedStatementSetter pss, final ResultSetExtractor<T> rse)
throws DataAccessException {
if(log.isDebugEnabled()) {
ArgumentPreparedStatementSetter aps = (ArgumentPreparedStatementSetter) pss;
try {
Field args = aps.getClass().getDeclaredField("args");
args.setAccessible(true);
Object[] parameters = (Object[]) args.get(aps);
log.debug("Parameters for SQL query: " + Arrays.toString(parameters));
} catch (NoSuchFieldException | IllegalAccessException e) {
throw new GenericException(e.toString(), e);
}
}
return super.query(psc, pss, rse);
}
}
So, when I execute the log.debug(...) statement I would also like to have the original SQL query logged (same line). Has anyone done something similar or are there any better suggestions as to how this can be achieved?
I do quite a few queries using this CustomJDBCTemplate and all my tests run, so I think it may be an acceptable solution of for most debug purposes.
Kind regards,
Thomas
I found a way to get the SQL-statement, so I will answer my own question :)
The PreparedStatementCreator has the following implementation:
private static class SimplePreparedStatementCreator implements PreparedStatementCreator, SqlProvider
So the SqlProvider has a getSql() method which does exactly what I need.
Posting the "improved" CustomJdbcTemplate class if anyone ever should need to do the same :)
public class CustomJdbcTemplate extends JdbcTemplate {
private static final Logger log = LoggerFactory.getLogger(CustomJdbcTemplate.class);
public CustomJdbcTemplate(DataSource dataSource) {
super(dataSource);
}
public <T> T query(PreparedStatementCreator psc, final PreparedStatementSetter pss, final ResultSetExtractor<T> rse)
throws DataAccessException {
if(log.isDebugEnabled()) {
if(pss instanceof ArgumentPreparedStatementSetter) {
ArgumentPreparedStatementSetter aps = (ArgumentPreparedStatementSetter) pss;
try {
Field args = aps.getClass().getDeclaredField("args");
args.setAccessible(true);
Object[] parameters = (Object[]) args.get(aps);
log.debug("SQL query: [{}]\tParams: {} ", getSql(psc), Arrays.toString(parameters));
} catch (NoSuchFieldException | IllegalAccessException e) {
throw new GenericException(e.toString(), e);
}
}
}
return super.query(psc, pss, rse);
}
private static String getSql(Object sqlProvider) { // this code is also found in the JDBCTemplate class
if (sqlProvider instanceof SqlProvider) {
return ((SqlProvider) sqlProvider).getSql();
}
else {
return null;
}
}
}

Load external properties files into EJB 3 app running on WebLogic 11

Am researching the best way to load external properties files from and EJB 3 app whose EAR file is deployed to WebLogic.
Was thinking about using an init servlet but I read somewhere that it would be too slow (e.g. my message handler might receive a message from my JMS queue before the init servlet runs).
Suppose I have multiple property files or one file here:
~/opt/conf/
So far, I feel that the best possible solution is by using a Web Logic application lifecycle event where the code to read the properties files during pre-start:
import weblogic.application.ApplicationLifecycleListener;
import weblogic.application.ApplicationLifecycleEvent;
public class MyListener extends ApplicationLifecycleListener {
public void preStart(ApplicationLifecycleEvent evt) {
// Load properties files
}
}
See: http://download.oracle.com/docs/cd/E13222_01/wls/docs90/programming/lifecycle.html
What would happen if the server is already running, would post start be a viable solution?
Can anyone think of any alternative ways that are better?
It really depends on how often you want the properties to be reloaded. One approach I have taken is to have a properties file wrapper (singleton) that has a configurable parameter that defines how often the files should be reloaded. I would then always read properties through that wrapper and it would reload the properties ever 15 minutes (similar to Log4J's ConfigureAndWatch). That way, if I wanted to, I can change properties without changing the state of a deployed application.
This also allows you to load properties from a database, instead of a file. That way you can have a level of confidence that properties are consistent across the nodes in a cluster and it reduces complexity associated with managing a config file for each node.
I prefer that over tying it to a lifecycle event. If you weren't ever going to change them, then make them static constants somewhere :)
Here is an example implementation to give you an idea:
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.*;
/**
* User: jeffrey.a.west
* Date: Jul 1, 2011
* Time: 8:43:55 AM
*/
public class ReloadingProperties
{
private final String lockObject = "LockMe";
private long lastLoadTime = 0;
private long reloadInterval;
private String filePath;
private Properties properties;
private static final Map<String, ReloadingProperties> instanceMap;
private static final long DEFAULT_RELOAD_INTERVAL = 1000 * 60 * 5;
public static void main(String[] args)
{
ReloadingProperties props = ReloadingProperties.getInstance("myProperties.properties");
System.out.println(props.getProperty("example"));
try
{
Thread.sleep(6000);
}
catch (InterruptedException e)
{
e.printStackTrace();
}
System.out.println(props.getProperty("example"));
}
static
{
instanceMap = new HashMap(31);
}
public static ReloadingProperties getInstance(String filePath)
{
ReloadingProperties instance = instanceMap.get(filePath);
if (instance == null)
{
instance = new ReloadingProperties(filePath, DEFAULT_RELOAD_INTERVAL);
synchronized (instanceMap)
{
instanceMap.put(filePath, instance);
}
}
return instance;
}
private ReloadingProperties(String filePath, long reloadInterval)
{
this.reloadInterval = reloadInterval;
this.filePath = filePath;
}
private void checkRefresh()
{
long currentTime = System.currentTimeMillis();
long sinceLastLoad = currentTime - lastLoadTime;
if (properties == null || sinceLastLoad > reloadInterval)
{
System.out.println("Reloading!");
lastLoadTime = System.currentTimeMillis();
Properties newProperties = new Properties();
FileInputStream fileIn = null;
synchronized (lockObject)
{
try
{
fileIn = new FileInputStream(filePath);
newProperties.load(fileIn);
}
catch (FileNotFoundException e)
{
e.printStackTrace();
}
catch (IOException e)
{
e.printStackTrace();
}
finally
{
if (fileIn != null)
{
try
{
fileIn.close();
}
catch (IOException e)
{
e.printStackTrace();
}
}
}
properties = newProperties;
}
}
}
public String getProperty(String key, String defaultValue)
{
checkRefresh();
return properties.getProperty(key, defaultValue);
}
public String getProperty(String key)
{
checkRefresh();
return properties.getProperty(key);
}
}
Figured it out...
See the corresponding / related post on Stack Overflow.