Using XADatasource or non-XA Datasource for JTA based transactions in JPA - jta

We are using JPA 1.0 for ORM-based operations and we want a JTA datasource for our application. We have only one database to which our application will connect.
We start our transaction boundary in the controller class and it runs down to the DAO layer: controller --> BOImpl --> DAO.
When I define the datasource in the WebSphere Application Server admin console, should I use a non-XA datasource or an XA datasource?
My understanding is that for a single datasource I should not use an XADatasource.
Please let me know which one I should use.

For a single resource (like a single DB) you indeed do not need an XA datasource.
On the other hand, bear in mind that most JTA/JTS implementations recognize when there is only one resource participating in a transaction, so the overhead for XA is then minimal or none. There can also be additional participants in the transaction that you might not be thinking of now, like sending JMS messages.
But if you're really sure you only have one resource participating, you can safely go for non-XA.

I hope your doubt is clear by now, but here is more information on that just in case.
The typical XA resources are databases, messaging queuing products such as JMS or WebSphere MQ, mainframe applications, ERP packages, or anything else that can be coordinated with the transaction manager. XA is used to coordinate what is commonly called a two-phase commit (2PC) transaction. The classic example of a 2PC transaction is when two different databases need to be updated atomically. Most people think of something like a bank that has one database for savings accounts and a different one for checking accounts. If a customer wants to transfer money between his checking and savings accounts, both databases have to participate in the transaction or the bank risks losing track of some money.
The problem is that most developers think, "Well, my application uses only one database, so I don't need to use XA on that database." This may not be true. The question that should be asked is, "Does the application require shared access to multiple resources that need to ensure the integrity of the transaction being performed?" For instance, does the application use Java 2 Connector Architecture adapters or the Java Message Service (JMS)? If the application needs to update the database and any of these other resources in the same transaction, then both the database and the other resource need to be treated as XA resources.
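To make that concrete, here is a minimal sketch of the case where XA actually matters: a container-managed JTA transaction that updates the database and sends a JMS message in the same unit of work. It assumes a Java EE 7 container; the account table, the jms/notifications queue and the class name are illustrative, not something from the original question.

    // Hypothetical stateless session bean: the container wraps debit() in a
    // JTA transaction (default attribute REQUIRED), so the database update and
    // the JMS send commit or roll back together. For a true two-phase commit,
    // both the datasource and the JMS connection factory must be XA-capable.
    import java.math.BigDecimal;
    import javax.annotation.Resource;
    import javax.ejb.Stateless;
    import javax.inject.Inject;
    import javax.jms.JMSContext;
    import javax.jms.Queue;
    import javax.persistence.EntityManager;
    import javax.persistence.PersistenceContext;

    @Stateless
    public class AccountService {

        @PersistenceContext
        private EntityManager em;                 // backed by the (XA) datasource

        @Inject
        private JMSContext jms;                   // joins the same JTA transaction

        @Resource(lookup = "jms/notifications")   // hypothetical queue
        private Queue notifications;

        public void debit(long accountId, BigDecimal amount) {
            em.createNativeQuery("UPDATE account SET balance = balance - ?1 WHERE id = ?2")
              .setParameter(1, amount)
              .setParameter(2, accountId)
              .executeUpdate();
            jms.createProducer().send(notifications, "debited account " + accountId);
        }
    }

If the only participant really is the single database, the same bean works unchanged with a plain non-XA datasource, which is exactly the point of the answers above.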

Related

When exactly do we use WCF transactions?

I was trying to get some info on WCF transactions, and I did manage to get info on how to use them. What I didn't get much info on is Why/When to use them.
What is the difference between database transactions and WCF transactions? Is there any specific case when either of these approach is preferred over the other?
What you are really asking about with WCF transactions is Microsoft's implementation of the WS-AtomicTransaction web service extension standard.
Why/When to use them
Similar to using a database transaction to guarantee consistency within the database, WS-AtomicTransaction is used to guarantee consistency within a larger, distributed system based on communication over SOAP 1.2 services. This distributed system may or may not include database writes, but more often than not it will.
Transactions propagated to the service from clients will cause the internal code of the service to execute within the context of the client's transaction.
So, in the same way that a database transaction can wrap multiple database writes into a single unit of work, a WCF transaction can wrap multiple service calls into a single unit of work, so that a failure in one will roll back the others.
This, as you can imagine, is hugely costly from a resource perspective, so these kinds of cross-network transactions should be rarely (if ever) used unless absolutely necessary.

What is the difference between JTA and a local transaction?

What is the difference between JTA and a local transaction?
An example that shows when to use JTA and when to use a local transaction would be great.
JTA is a general API for managing transactions in Java. It allows you to start, commit and rollback transactions in a resource neutral way. Transactional status is typically stored in TLS (Thread Local Storage) and can be propagated to other methods in a call-stack without needing some explicit context object to be passed around. Transactional resources can join the ongoing transaction. If there is more than one resource participating in such a transaction, at least one of them has to be a so-called XA resource.
A resource-local transaction is a transaction that you have with a specific single resource using its own specific API. Such a transaction typically does not propagate to other methods in a call stack and you are required to pass some explicit context object around. With the majority of resource-local transactions it's not possible to have multiple resources participating in the same transaction.
You would use a resource-local transaction in, for instance, low-level JDBC code in Java SE. Here the context object is expressed by an instance of java.sql.Connection. Another example of resource-local transactions is developers creating enterprise applications around 2002: since transaction managers (used by JTA) were expensive, closed-source and complicated to set up in that era, people went with the cheaper and easier-to-obtain resource-local variants.
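As a minimal sketch of such a resource-local transaction (the JDBC URL and the account table are made up for illustration):

    // Plain Java SE JDBC: the Connection itself is the explicit transaction
    // context that has to be passed around; nothing outside this single
    // database can join the transaction.
    import java.math.BigDecimal;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class LocalTxExample {
        public static void main(String[] args) throws SQLException {
            try (Connection con = DriverManager.getConnection(
                    "jdbc:h2:mem:appdb", "sa", "")) {        // hypothetical in-memory DB
                con.setAutoCommit(false);                     // begin the local transaction
                try (PreparedStatement ps = con.prepareStatement(
                        "UPDATE account SET balance = balance - ? WHERE id = ?")) {
                    ps.setBigDecimal(1, new BigDecimal("10.00"));
                    ps.setLong(2, 1L);
                    ps.executeUpdate();
                    con.commit();                             // only this one resource commits
                } catch (SQLException e) {
                    con.rollback();                           // undo on failure
                    throw e;
                }
            }
        }
    }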
You would use a JTA transaction in basically every other scenario. Very simple, small, free and open-source servers like TomEE (25MB) or GlassFish (35MB) have JTA support out of the box. There's nothing to set up and they Just Work.
Finally, technologies like EJB and Spring make even JTA easier to use by offering declarative transactions. In most cases it's advised to use those as they are easier, cleaner and less error prone. Both EJB and Spring can use JTA under the covers.
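For example, a hedged sketch of declarative transactions with Spring (the account table, the JdbcTemplate wiring and a configured JtaTransactionManager are assumptions, not something taken from the answer above):

    import java.math.BigDecimal;
    import org.springframework.jdbc.core.JdbcTemplate;
    import org.springframework.stereotype.Service;
    import org.springframework.transaction.annotation.Transactional;

    @Service
    public class TransferService {

        private final JdbcTemplate jdbc;

        public TransferService(JdbcTemplate jdbc) {
            this.jdbc = jdbc;
        }

        // Spring starts (or joins) a transaction before the method and commits
        // when it returns; a runtime exception triggers a rollback. With a
        // JtaTransactionManager configured, this is a JTA transaction under the covers.
        @Transactional
        public void transfer(long from, long to, BigDecimal amount) {
            jdbc.update("UPDATE account SET balance = balance - ? WHERE id = ?", amount, from);
            jdbc.update("UPDATE account SET balance = balance + ? WHERE id = ?", amount, to);
        }
    }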
The transaction-type should be set to "RESOURCE_LOCAL" for a Java SE application and to "JTA" for a Java EE application.
"RESOURCE_LOCAL" may work fine for some web applications deployed on Tomcat, but may cause issues when you run your application in a GlassFish environment.
If you are working with distributed transactions you must use "JTA" as your transaction manager.
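As a minimal sketch, assuming a persistence unit named "appPU-local" declared with transaction-type="RESOURCE_LOCAL" in persistence.xml (in a Java EE container with transaction-type="JTA" you would instead inject the EntityManager with @PersistenceContext and never touch EntityTransaction):

    import javax.persistence.EntityManager;
    import javax.persistence.EntityManagerFactory;
    import javax.persistence.EntityTransaction;
    import javax.persistence.Persistence;

    public class ResourceLocalUsage {
        public static void main(String[] args) {
            EntityManagerFactory emf = Persistence.createEntityManagerFactory("appPU-local");
            EntityManager em = emf.createEntityManager();
            EntityTransaction tx = em.getTransaction();   // only available for RESOURCE_LOCAL units
            try {
                tx.begin();
                // ... em.persist(...) / em.merge(...) your entities here ...
                tx.commit();
            } catch (RuntimeException e) {
                if (tx.isActive()) {
                    tx.rollback();
                }
                throw e;
            } finally {
                em.close();
                emf.close();
            }
        }
    }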
The Java Transaction API (JTA) is one of the Java Enterprise Edition (Java EE) APIs allowing distributed transactions to be done across multiple XA resources in a Java environment.
A J2EE application includes support for distributed transactions through two specifications:
JTA ---> Java Transaction API: the higher-level API, and it is always enabled.
JTS ---> Java Transaction Service: specifies the implementation of the transaction manager that supports JTA.
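As a minimal sketch of using JTA directly (the class is illustrative; in practice you would more often rely on the declarative transactions described earlier):

    // Programmatic JTA via UserTransaction, e.g. from a servlet or a
    // bean-managed-transaction EJB. Everything between begin() and commit()
    // can span multiple XA resources (JDBC, JMS, ...), and they all commit
    // or roll back together.
    import javax.annotation.Resource;
    import javax.transaction.UserTransaction;

    public class ManualJtaWork {

        @Resource
        private UserTransaction utx;   // injected by the container

        public void doWork() throws Exception {
            utx.begin();
            try {
                // ... use injected XA-capable datasources / connection factories here ...
                utx.commit();
            } catch (Exception e) {
                utx.rollback();
                throw e;
            }
        }
    }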

Why does Quartz Scheduler (JobStoreCMT) require the use of two datasources?

I found this answer:
1. Long answer to Quartz requiring two data sources; however, if you want an even deeper answer, I believe I’ll need to dig into the source code or do more research:
a. JobStoreCMT relies upon transactions being managed by the application which is using Quartz. A JTA transaction must be in progress before attempting to schedule (or unschedule) jobs/triggers. This allows the "work" of scheduling to be part of the application's "larger" transaction. JobStoreCMT actually requires the use of two datasources - one that has its connections' transactions managed by the application server (via JTA) and one datasource that has connections that do not participate in global (JTA) transactions. JobStoreCMT is appropriate when applications are using JTA transactions (such as via EJB Session Beans) to perform their work. (Ref: http://quartz-scheduler.org/documentation/quartz-1.x/configuration/ConfigJobStoreCMT)
However, there is a believed conflict with a non-transactional driver in our particular application. Does anyone know if Quartz (JobStoreCMT) can work with just a transactional data source?
Does anyone know if Quartz (JobStoreCMT) can work with just a transactional data source?
No, you must have a datasource of each type. Invocations on the API by the client application use the connections that are XA-capable, so that the work joins the application's transaction. Work done by the scheduler's internal threads uses the non-XA connections.
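As a minimal sketch of what that configuration looks like (the JNDI names and instance name are illustrative; the same keys can also be placed in quartz.properties):

    import java.util.Properties;
    import org.quartz.Scheduler;
    import org.quartz.impl.StdSchedulerFactory;

    public class QuartzCmtBootstrap {
        public static Scheduler createScheduler() throws Exception {
            Properties p = new Properties();
            p.put("org.quartz.scheduler.instanceName", "AppScheduler");
            p.put("org.quartz.jobStore.class",
                  "org.quartz.impl.jdbcjobstore.JobStoreCMT");
            p.put("org.quartz.jobStore.driverDelegateClass",
                  "org.quartz.impl.jdbcjobstore.StdJDBCDelegate");
            p.put("org.quartz.jobStore.tablePrefix", "QRTZ_");
            // Container-managed datasource: used for API calls made inside the
            // application's JTA transaction.
            p.put("org.quartz.jobStore.dataSource", "managedDS");
            p.put("org.quartz.dataSource.managedDS.jndiURL", "jdbc/QuartzManagedDS");       // hypothetical JNDI name
            // Non-managed datasource: used by the scheduler's own worker threads,
            // outside any global (JTA) transaction.
            p.put("org.quartz.jobStore.nonManagedTXDataSource", "unmanagedDS");
            p.put("org.quartz.dataSource.unmanagedDS.jndiURL", "jdbc/QuartzUnmanagedDS");   // hypothetical JNDI name
            return new StdSchedulerFactory(p).getScheduler();
        }
    }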

Database good system decoupling point?

We have two systems where system A sends data to system B. It is a requirement that each system can run independently of the other and neither will blow up if the other is down. The question is what is the best way for system A to communicate with system B while meeting the decoupling requirement.
System B currently has a process that polls data in a db table and processes any new rows that have been inserted.
One proposed design is for system A to just insert data into system B's db table and have system B process the new rows with the existing process. The question is: does this solution meet the requirement of decoupling the two systems? Is the database considered part of system B, which might become unavailable and cause system A to blow up?
Another solution is for system A to put data into an MQ queue and have a process that reads from MQ and then inserts into system B's database. But is this just extra overhead? Ultimately, is an MQ queue any more fault-tolerant than a db table?
Generally speaking, database sharing is a close coupling and not to be preferred except possibly for speed purposes. Not only for availability purposes, but also because system A and B will be changed and upgraded at several points in their future, and should have minimal dependencies on each other - message passing is an obvious dependency, whereas shared databases tend to bite you (or your inheritors) on the posterior when least expected. If you go the database sharing route, at least make the sharing interface explicit with dedicated tables or views.
There are four common levels of integration:
Database sharing
File sharing
Remote procedure call
Message passing
which can be applied and combined in various situations, with different availability and maintainability. You have an excellent overview at the enterprise integration patterns site.
As with any central integration infrastructure, MQ should be hosted in an environment with great availability, full failover &c. There are other queue solutions which allow you to distribute the queue coordination.
Use Queues for communication. Do not "pass" data from System A to System B through the database. You're using the database as a giant, expensive, complex message queue.
Use a message queue as a message queue.
This is not "Extra" overhead. This is the best way to decouple systems. It's called Service Oriented Architecture (SOA) and using messages is absolutely central to the design.
An MQ queue is far simpler than a DB table.
Don't compare "fault tolerance" because an RDBMS uses huge (almost unimaginable) overheads to achieve a reasonable level of assurance that your transaction finished properly. Locking. Buffering. Write Queues. Storage Management. Etc. Etc.
A reliable message queue implementation uses some backing store to keep the queue's state. The overhead is much, much less than an RDBMS. The performance is much better. And it's much, much simpler to interact with.
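As a minimal sketch of what that hand-off might look like with JMS (the queue name and class are hypothetical; system B would consume from the same queue with a message-driven bean or a plain JMSConsumer):

    // System A only talks to the queue, never to system B's database. If
    // system B is down, the message simply waits on the queue instead of
    // failing the caller.
    import javax.jms.ConnectionFactory;
    import javax.jms.JMSContext;
    import javax.jms.Queue;

    public class SystemAProducer {

        private final ConnectionFactory factory;   // looked up from JNDI or injected
        private final Queue inbound;               // e.g. jms/systemB.inbound (hypothetical)

        public SystemAProducer(ConnectionFactory factory, Queue inbound) {
            this.factory = factory;
            this.inbound = inbound;
        }

        public void send(String payload) {
            try (JMSContext ctx = factory.createContext()) {
                ctx.createProducer().send(inbound, payload);
            }
        }
    }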
In SQL Server I would do this through an SSIS package or a job (depending on the number of records and the complexity of what I was moving). Other databases also have ETL solutions. I like the ETL solution because I can keep logs of what was changed and what errors were processed, and I can send records which for some reason won't go to the other system (data structures are rarely the same between two databases) to a holding table without killing the rest of the process. I can also make changes to the data as it flows to adjust for database differences (things like lookup table values, say the completed status in db1 is 5 and it is 7 in db2, or say db2 has a required field that db1 does not and you have to add a default value to the field if it is null). If one or the other server is down, the job running the SSIS package will fail and neither system will be affected, so it keeps the databases decoupled in a way that using triggers or replication would not.

Application Level Replication Technologies

I am building out a solution that will be deployed in multiple data centers in multiple regions around the world, with each data center having a replicated copy of data actively updated in each region. I will have a combination of multiple databases and file systems in each data center, the state of which must be kept consistent (within a data center). These multiple repositories will be fronted by a SOA service tier.
I can tolerate some latency in the replication, and need to allow for regions to be off-line, and then catch up later.
Given the multiple back-end repositories of data, I can't easily rely on independent replication solutions for each one to maintain a consistent state. I am thus led to implementing replication at the application layer -- by replicating the SOA requests in some manner. I'll need to make sure that replication loops don't occur, and that last-writer conditions are sorted out correctly.
In your experience, what is the best pattern for solving this problem, and are there good products (free or otherwise) that should be investigated?
Lotus/Domino is your answer. I've been working with it for ten years and it's exactly what you need. It may not be trendy (a perception that I would challenge) but it's powerful, adaptable and very secure. The latest version, R8, is the best yet.
You should definitely consider IBM Lotus Domino. A Lotus Notes database can replicate between sites on a predefined schedule. Replication in Notes/Domino is definitely a very powerful feature and enables full replication of data between sites. Even if a server is unavailable, the next time it connects it will simply replicate and get back in sync.
As for the SOA service tier, you could then use Domino Designer to write a web service. Since Notes/Domino 7.5.x (I believe), Domino has been able to provide and consume web services.
As others have advised, I would also recommend Lotus Notes/Domino. 8.5 is really a very powerful application development platform.
You don't give enough specifics to be certain of your needs, but I think you should check out SQL Server merge replication. It allows for asynchronous replication of multiple databases with full conflict resolution. You will need to designate a global master and all the other databases will replicate to that one, but all the database instances are fully functional (read/write), so you can schedule replication at whatever intervals suit you. If any region goes offline it can catch up later with no issues; if the master goes offline, everyone will work independently until replication can resume.
I would be interested to know of other solutions this flexible (apart from Lotus Notes/Domino of course which is not very trendy these days).
I think that your answer is going to have to be based on a pub/sub architecture. I am assuming that you have reliable messaging between your data centers, so that you can rely on published updates being received eventually. If all of your access to the data repositories is via services, you can add an event notification to the orchestration of each of your update services that notifies all interested data centers of the event. Ideally the master database is the only one that sends out these updates. If the master database is the only one sending the updates, you can exclude routing the notifications to the node that generated them in the first place, thus avoiding update loops.
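As a minimal sketch of that notification step using JMS pub/sub (the topic name, data-center id and message properties are illustrative assumptions, not part of the answer above):

    // After a successful local update, the master data center publishes an
    // event to a topic that the other data centers subscribe to. The "origin"
    // property lets subscribers ignore events they generated themselves,
    // which is one way to avoid update loops.
    import javax.jms.ConnectionFactory;
    import javax.jms.JMSContext;
    import javax.jms.Topic;

    public class ReplicationPublisher {

        private final ConnectionFactory factory;
        private final Topic updates;               // e.g. jms/replication.updates (hypothetical)
        private final String localDataCenterId;    // e.g. "us-east" (hypothetical)

        public ReplicationPublisher(ConnectionFactory factory, Topic updates, String localDataCenterId) {
            this.factory = factory;
            this.updates = updates;
            this.localDataCenterId = localDataCenterId;
        }

        public void publishUpdate(String entityId, String payload) {
            try (JMSContext ctx = factory.createContext()) {
                ctx.createProducer()
                   .setProperty("origin", localDataCenterId)   // subscribers skip their own events
                   .setProperty("entityId", entityId)
                   .send(updates, payload);
            }
        }
    }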