Is Apache Camel's idempotent consumer pattern scalable? - sql

I'm using Apache Camel 2.13.1 to poll a database table which will have upwards of 300k rows in it. I'm looking to use the Idempotent Consumer EIP to filter rows that have already been processed.
I'm wondering, though, whether the implementation is really scalable. My Camel context is:
<camelContext xmlns="http://camel.apache.org/schema/spring">
<route id="main">
<from
uri="sql:select * from transactions?dataSource=myDataSource&amp;consumer.delay=10000&amp;consumer.useIterator=true" />
<transacted ref="PROPAGATION_REQUIRED" />
<enrich uri="direct:invokeIdempotentTransactions" />
<!-- Any processors here will be executed on all messages -->
</route>
<route id="idempotentTransactions">
<from uri="direct:invokeIdempotentTransactions" />
<idempotentConsumer
messageIdRepositoryRef="jdbcIdempotentRepository">
<ognl>#{request.body.ID}</ognl>
<!-- Anything here will only be executed for non-duplicates -->
<log message="non-duplicate" />
<to uri="stream:out" />
</idempotentConsumer>
</route>
</camelContext>
It would seem that all 300k rows are going to be processed every 10 seconds (per the consumer.delay parameter), which seems very inefficient. I would expect some sort of feedback loop as part of the pattern, so that the query that feeds the filter could take advantage of the set of rows already processed.
However, the messageid column in the CAMEL_MESSAGEPROCESSED table has the pattern of
{1908988=null}
where 1908988 is the request.body.ID I've set the EIP to key on, so the value is not easy to incorporate into my query.
Is there a better way of using the CAMEL_MESSAGEPROCESSED table as a feedback loop into my select statement, so that the database server does most of the work?
Update:
So, I've since found out that it was my ognl code that was causing the odd message id column value. Changing it to
<el>${in.body.ID}</el>
has fixed it. Now that I have a usable messageId column, I can change my 'from' SQL query to
select * from transactions tr where tr.ID NOT IN (select cmp.messageid from CAMEL_MESSAGEPROCESSED cmp where cmp.processor = 'transactionProcessor')
but I still think I'm corrupting the Idempotent Consumer EIP.
Does anyone else do this? Any reason not to?

Yes, it is. But you need to use scalable storage for holding the set of already processed messages. You can use either Hazelcast - http://camel.apache.org/hazelcast-idempotent-repository-tutorial.html or Infinispan - http://java.dzone.com/articles/clustered-idempotent-consumer - depending on which solution is already in your stack. Of course, the JDBC repository would work too, but only if it meets your performance requirements.
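To make the role of the repository concrete, here is a minimal plain-Java sketch of what an idempotent filter does with it. It assumes an in-memory set as a stand-in for the shared store (Hazelcast, Infinispan or the JDBC table); the class and method names are hypothetical and are not Camel's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a message-id repository; this is not a Camel class.
final class InMemoryIdRepositorySketch {

    // A concurrent set lets several consumer threads share one instance.
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    // Returns true only the first time a given message id is offered.
    boolean add(String messageId) {
        return seen.add(messageId);
    }

    // Keeps the ids that were not seen before, preserving their order.
    List<String> filterNew(List<String> messageIds) {
        List<String> fresh = new ArrayList<>();
        for (String id : messageIds) {
            if (add(id)) {
                fresh.add(id);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        InMemoryIdRepositorySketch repository = new InMemoryIdRepositorySketch();
        // First poll: the repeated id is dropped, prints [1908988, 1908989]
        System.out.println(repository.filterNew(Arrays.asList("1908988", "1908989", "1908988")));
        // Second poll: only the id never seen before gets through, prints [1908990]
        System.out.println(repository.filterNew(Arrays.asList("1908989", "1908990")));
    }
}
```

Every candidate message costs one lookup against the store, so if each poll re-reads all rows, the store's lookup throughput is what bounds the route.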

Related

Apache Camel Timer - FixedRate true causing duplicate calls

<route>
<from uri="timer://SomeTimer?fixedRate=true&amp;period=60000" />
<to uri="seda:SomeProcessor" />
</route>
<route>
<from uri="seda:SomeProcessor" />
<setHeader headerName="ServiceName">
<constant>SomeService</constant>
</setHeader>
<to uri="bean:serviceConsumer?method=callService" />
</route>
This in turn calls a procedure which is supposed to poll and then insert some data into a table. It is intermittently inserting duplicates into the target table, and we can see that the sqlid is different for the duplicate rows. Lately the DB has been performing badly and taking a lot of time.
I think fixedRate=true is making timer events bunch up and then fire in rapid succession, creating a race condition that duplicates the data. Can anyone advise, please?

Play 2.4 - Display Ebeans SQL statement in logs

How can I display SQL statements in the log? I'm using Ebean and an insert fails for some reason, but I can't see what the problem is.
I tried to edit my config to:
db.default.logStatements=true
and add this to logback.xml
<logger name="com.jolbox" level="DEBUG" />
to follow some answers I found online, but it doesn't seem to work for 2.4…
Logging has changed with Play 2.4. From this version on, to display the SQL statements in the console, simply add the following line to the conf/logback.xml file:
<logger name="org.avaje.ebean.SQL" level="TRACE" />
It should work just fine.
As @Flo354 pointed out in the comments, with Play 2.6 you should use:
<logger name="io.ebean" level="TRACE" />
From Play 2.5 on, logging SQL statements is very easy. Play 2.5 has a built-in way to log SQL statements, built on jdbcdslog, that works across all JDBC databases, connection pool implementations and persistence frameworks (Anorm, Ebean, JPA, Slick, etc.). When you enable logging you will see each SQL statement sent to your database, as well as performance information about how long the statement takes to run.
The SQL logging feature in Play 2.5 can be configured per database, using the logSql property:
db.default.logSql=true
After that, you can configure the jdbcdslog-exp log level by adding these lines to logback.xml:
<logger name="org.jdbcdslog.ConnectionLogger" level="OFF" /> <!-- Won't log connections -->
<logger name="org.jdbcdslog.StatementLogger" level="INFO" /> <!-- Will log all statements -->
<logger name="org.jdbcdslog.ResultSetLogger" level="OFF" /> <!-- Won't log result sets -->
FYI, there's a nice video tutorial on Ebean's new doc page showing how to capture SQL statements only for selected areas of the code.
Thanks to this you can log statements only in problematic places while developing, and/or use the logged statements in tests, as shown in the video.
In short: add the latest avaje-ebeanorm-mocker dependency to your build.sbt as usual, so that later you can use it in your code like:
LoggedSql.start();
User user = User.find.byId(123);
// ... other queries
List<String> capturedLogs = LoggedSql.stop();
Note that you don't even need to fetch the List of statements if you do not need to process them, as they are displayed in the console as usual. So you can use it like this as well:
if (Play.isDev()) LoggedSql.start();
User user = User.find.byId(345);
// ... other queries
if (Play.isDev()) LoggedSql.stop();
I had success using jdbcdslog. As @Saeed Zarinfam mentioned here, Play 2.5 includes this by default.
Unlike this answer, this solution shows the parameter values instead of question marks.
Here are the steps I followed to get it working for Play 2.4 and MySQL:
Add to build.sbt:
"com.googlecode.usc" % "jdbcdslog" % "1.0.6.2"
Add to logback.xml:
<logger name="org.jdbcdslog.StatementLogger" level="INFO" /> <!-- Will log all statements -->
Create conf/jdbcdslog.properties file containing:
jdbcdslog.driverName=mysql
jdbcdslog.showTime=true
Change db.default.url (example):
jdbc:mysql://127.0.0.1:3306/mydb
changes to
jdbc:jdbcdslog:mysql://127.0.0.1:3306/mydb;targetDriver=com.mysql.jdbc.Driver
Change db.default.driver:
org.jdbcdslog.DriverLoggingProxy

Apache Camel get empty response from jetty

I'm facing a complex case.
These are the steps I'm following:
1) <from uri="jetty:http://0.0.0.0:30100/jetty/test"/>
2) <to uri="hazelcast-client:master-test-series" />
3) <to uri="bean:modelSeriesWrapperTest" />
4)
<split parallelProcessing="true" streaming="true">
<simple>${body}</simple> <to uri="direct:dw.model.test"/>
</split>
5) From another route
<from uri="direct:dw.model.test"/>
<aggregate strategyRef="myAggregatorStrategy"
completionTimeout="1000">
<correlationExpression>
<constant>true</constant>
</correlationExpression>
<marshal ref="modelSeriesVariantColourGson" />
<camel:to uri="file:src/data/catask/output?fileName=output.xml"/>
</aggregate>
The problem is that the jetty response is empty. I used a TCP trace to track the request and response, and the Content-Length is 0. But the output.xml file has correctly formatted JSON content.
Even if I take out the <camel:to uri="file:src/data/catask/output?fileName=output.xml"/>, the jetty response is still empty.
I tried the InOut pattern, but it doesn't work either.
It seems jetty returns immediately, without waiting for the split to finish. I tried setting the In and Out body, but that doesn't work either. I have Googled every case I can imagine and found nothing helpful.
Could you please help me? Thank you very much.
If you want the jetty response to include whatever information from your aggregator, then you must use the splitter only approach as documented at:
http://camel.apache.org/composed-message-processor.html
The splitter has built-in aggregation, which ensures that when the splitter is done, the results are aggregated as well, and you can then use that as the jetty response.
When you use <aggregate>, the aggregated output becomes a separate exchange. To understand this better, read more about the aggregate EIP, in other SO answers and in the various Camel books.

JMS Message Selector in Mule using date

In Mule 3.3.1, during async processing, when any of my external services is down, I would like to place the message on a queue (retryQueue) with a particular "next retry" timestamp. The flow that processes messages from this retryQueue should select messages based on the "next retry" time: if the "next retry" time is earlier than the current time, the message is selected for processing. This is similar to what is described in the following link.
Retry JMS queue implementation to deliver failed messages after certain interval of time
Could you please provide sample code to achieve this?
I tried:
<on-redelivery-attempts-exceeded>
<message-properties-transformer scope="outbound">
<add-message-property key="putOnQueueTime" value="#[function:datestamp:yyyy-MM-dd hh:mm:ssZ]" />
</message-properties-transformer>
<jms:outbound-endpoint ref="retryQueue"/>
</on-redelivery-attempts-exceeded>
and on the receiving flow
<jms:inbound-endpoint ref="retryQueue">
<!-- I have no idea how to do the selector....
I tried....<jms:selector expression="#[header:INBOUND:putOnQueueTime > ((function:now) - 30)]"/>, but obviously it doesn't work. Gives me an invalid message selector. -->
</jms:inbound-endpoint>.
Another note: If I set the outbound property using
<add-message-property key="putOnQueueTime" value="#[function:now]"/>,
it doesn't get carried over as part of header. That's why I changed it to:
<add-message-property key="putOnQueueTime" value="#[function:datestamp:yyyy-MM-dd hh:mm:ssZ]" />
The expression in:
<jms:selector expression="#[header:INBOUND:putOnQueueTime > ((function:now) - 30)]"/>
should evaluate to a valid JMS selector, which is not the case here. Try with:
<jms:selector expression="putOnQueueTime > #[XXX]"/>
replacing XXX with an expression that creates the time you want.
We were trying to achieve this in one of the projects I'm working on. We tried what was suggested in the other answer here, in several variations, and it did not work. The problem is that jms:selector doesn't support MEL, since it relies on ActiveMQ classes.
We opened a support ticket with MuleSoft, and their reply was that this is not supported.
What we ended up doing was this:
Create a simple Component, which does a Thread.sleep(numberOfMillis), where the number of millis is defined in a property.
In the flow that was supposed to delay processing, we added this component as the first step after reading the message from the inbound endpoint.
Not the best solution ever made, but it works.
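A minimal sketch of such a delay component could look like the following. It assumes a plain POJO with a setter for the delay; the class name, the setter and the process method are hypothetical, and the wiring into the Mule flow and the property injection are not shown.

```java
// Hypothetical delay component: a plain POJO with no Mule-specific interfaces.
// Declared package-private only to keep the sketch in a single file.
final class DelayComponentSketch {

    // Pause length in milliseconds; assumed to be injected from a property.
    private long delayMillis;

    void setDelayMillis(long delayMillis) {
        if (delayMillis < 0) {
            throw new IllegalArgumentException("delayMillis must not be negative");
        }
        this.delayMillis = delayMillis;
    }

    // Blocks the calling thread for the configured time, then returns the payload unchanged.
    Object process(Object payload) {
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            // Restore the interrupt flag so the caller can react to the interruption.
            Thread.currentThread().interrupt();
        }
        return payload;
    }

    public static void main(String[] args) {
        DelayComponentSketch component = new DelayComponentSketch();
        component.setDelayMillis(200);
        long start = System.nanoTime();
        Object result = component.process("retry me");
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(result + " released after about " + elapsedMillis + " ms");
    }
}
```

The sleep holds the consuming thread for every message, which is the main reason this is a workaround and not a real delayed-delivery mechanism.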

Quickest method for matching nested XML data against database table structure

I have an application which creates datarequests which can be quite complex. These need to be stored in the database as tables. An outline of a datarequest (as XML) would be...
<datarequest>
<datatask view="vw_ContractData" db="reporting" index="1">
<datefilter modifier="w0">
<filter index="1" datatype="d" column="Contract Date" param1="2009-10-19 12:00:00" param2="2012-09-27 12:00:00" daterange="" operation="Between" />
</datefilter>
<filters>
<alternation index="1">
<filter index="1" datatype="t" column="Department" param1="Stock" param2="" operation="Equals" />
</alternation>
<alternation index="2">
<filter index="1" datatype="t" column="Department" param1="HR" param2="" operation="Equals" />
</alternation>
</filters>
<series column="Turnaround" aggregate="avg" split="0" splitfield="" index="1">
<filters />
</series>
<series column="Requested 3" aggregate="avg" split="0" splitfield="" index="2">
<filters>
<alternation index="1">
<filter index="1" datatype="t" column="Worker" param1="Malcom" param2="" operation="Equals" />
</alternation>
</filters>
</series>
<series column="Requested 2" aggregate="avg" split="0" splitfield="" index="3">
<filters />
</series>
<series column="Reqested" aggregate="avg" split="0" splitfield="" index="4">
<filters />
</series>
</datatask>
</datarequest>
This encodes a datarequest comprising a daterange, main filters, series and series filters. Basically any element which has the index attribute can occur multiple times within its parent element - the exception to this being the filter within datefilter.
But the structure of this is kind of academic, the problem is more fundamental:
When a request comes through, XML like this is sent to SQLServer as a parameter to a stored proc. This XML is shredded into a de-normalised table and then written iteratively to normalised tables such as tblDataRequest (DataRequestID PK), tblDataTask, tblFilter, tblSeries. This is fine.
The problem occurs when I want to match a given XML definition with one already held in the DB. I currently do this by...
Shredding the XML into a de-normalised table
Using a CTE to pull all the existing data in the database into that same de-normalised form
Matching using a huge WHERE condition (34 lines long)
This returns any DataRequestID that exactly matches the given XML. I fear that this method will end up being painfully slow, partly because I don't believe the CTE will do any clever filtering; it will pull all the data every single time before applying the huge WHERE.
I have thought there must be better solutions to this, e.g.:
When storing a datarequest, also store a hash of the datarequest somehow and simply match on that. In the case of a collision, use the current method. However, I wanted to do this using set logic. I'm also concerned about irrelevant small differences in the XML, such as spurious spaces, changing the hash.
Somehow perform the matching iteratively from the bottom up. E.g. produce a list of filters which match on the lowest level. Use this as part of an IN to match Series. Use this as part of an IN to match DataTasks, and so on. The trouble is, I start to black out when I think about this for too long.
Basically: has anyone encountered this kind of problem before (they must have)? And what would be the recommended route for tackling it? Example (pseudo)code would be great :)
To get rid of the possibility of minor variances, I'd run the request through an XML transform (XSLT).
Alternatively, since you've already got the code to parse this out into a denormalized staging table, that's fine too. I would then simply use FOR XML to create a new XML doc.
Your goal here is to create a standardized XML document that respects ordering where appropriate and removes inconsistencies where it is not.
Once that is done, store this in a new table. Now you can run a direct comparison of the "standardized" request XML against existing data.
To do the actual comparison, you can use a hash, store the XML as a string and do a direct string comparison, or do a full XML comparison like this: http://beyondrelational.com/modules/2/blogs/28/posts/10317/xquery-lab-36-writing-a-tsql-function-to-compare-two-xml-values-part-2.aspx
My preference, as long as the XML is never over 8000 bytes, would be to store it as a string (either VARCHAR(8000), or NVARCHAR(4000) if you need to support special characters) and create a unique index on the column.
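As an illustration of the standardize-then-compare idea, here is a minimal sketch that does the normalization in Java on the application side, in place of XSLT or FOR XML, and then hashes the result. The class name and the normalization rules are assumptions: attributes are sorted by name, surrounding whitespace in text is trimmed, comments and CDATA sections are ignored, and child elements keep their document order.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: reduces a datarequest to a standardized string and hashes it.
final class DataRequestFingerprintSketch {

    // Returns the SHA-256 hex digest of the standardized form of the given XML.
    static String fingerprint(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))).getDocumentElement();
            StringBuilder canonical = new StringBuilder();
            appendCanonical(root, canonical);
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not fingerprint XML", e);
        }
    }

    // Writes one element: name, attributes sorted by name, then children in document order.
    private static void appendCanonical(Element element, StringBuilder out) {
        out.append('<').append(element.getTagName());
        Map<String, String> sorted = new TreeMap<>();
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            Node attribute = attributes.item(i);
            sorted.put(attribute.getNodeName(), attribute.getNodeValue());
        }
        for (Map.Entry<String, String> entry : sorted.entrySet()) {
            out.append(' ').append(entry.getKey()).append("=\"")
               .append(escape(entry.getValue())).append('"');
        }
        out.append('>');
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                appendCanonical((Element) child, out);
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                // Whitespace-only text between elements collapses to nothing.
                out.append(escape(child.getNodeValue().trim()));
            }
        }
        out.append("</").append(element.getTagName()).append('>');
    }

    // Escapes the characters that would make the standardized string ambiguous.
    private static String escape(String value) {
        return value.replace("&", "&amp;").replace("\"", "&quot;").replace("<", "&lt;");
    }

    public static void main(String[] args) {
        String a = "<filter index=\"1\" column=\"Department\" />";
        String b = "<filter   column=\"Department\"  index=\"1\">\n</filter>";
        // Same content, different spacing and attribute order: prints true
        System.out.println(fingerprint(a).equals(fingerprint(b)));
    }
}
```

The digest could be stored in an indexed column next to the request and used as the first lookup, with a comparison of the standardized strings as the check when two digests match.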