My organization moves data between systems on behalf of customers. These integrations are built in BizTalk and are mostly file-based, sometimes to/from APIs. More and more customers are switching to APIs, so we are facing more and more API-to-API integrations.
I'm mostly a backend developer, but I have been tasked with finding a more generic pattern or system for building these integrations; we are talking close to a thousand integrations.
But not thousands of different APIs; many customers use the same sorts of systems.
What I want is a solution that:
Fetches data from the source API
Transforms the data to the format for the target API
Sends the data to the target API
Another requirement is that it should be possible to set a schedule for when these jobs should run.
This is easily done in BizTalk, but as mentioned there will be thousands of integrations, and if we need to change something in one of the steps it will be a lot of work.
My vision is something that holds interfaces to all the APIs we communicate with and also contains the scheduled jobs we want run between them. Preferably with logging/tracking.
There must be something out there that does this?
Suggestions?
NOTE: No cloud-based solutions since they are not allowed in our organization.
You can easily implement this using the temporal.io open source project. You code your integrations in a general-purpose programming language, and Temporal ensures that each integration runs to completion in the presence of all sorts of intermittent failures. Scheduling is also supported out of the box.
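For illustration, a minimal sketch of one fetch/transform/send integration with the Temporal Python SDK might look like the following; the endpoint URLs, field mapping, and timeouts are placeholders you would replace with your own:

```python
# One API-to-API integration expressed as a Temporal workflow (Python SDK).
# The HTTP calls live in activities so Temporal can retry them independently.
from datetime import timedelta

from temporalio import activity, workflow

with workflow.unsafe.imports_passed_through():
    import requests


@activity.defn
def fetch_from_source(source_url: str) -> list:
    # Hypothetical source API returning a JSON array of records.
    resp = requests.get(source_url, timeout=30)
    resp.raise_for_status()
    return resp.json()


@activity.defn
def send_to_target(target_url: str, records: list) -> None:
    requests.post(target_url, json=records, timeout=30).raise_for_status()


def transform(records: list) -> list:
    # Pure, deterministic mapping logic: easy to unit test per customer/system.
    return [{"externalId": r.get("id"), "name": r.get("name")} for r in records]


@workflow.defn
class ApiToApiIntegration:
    @workflow.run
    async def run(self, source_url: str, target_url: str) -> None:
        records = await workflow.execute_activity(
            fetch_from_source,
            source_url,
            start_to_close_timeout=timedelta(minutes=5),
        )
        await workflow.execute_activity(
            send_to_target,
            args=[target_url, transform(records)],
            start_to_close_timeout=timedelta(minutes=5),
        )
```

For the scheduling requirement, the client can start the workflow with a cron expression (start_workflow accepts a cron_schedule argument), or you can use Temporal's schedules feature; either way the schedule lives alongside the workflow definition rather than in a separate scheduler.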
Disclaimer: I'm a founder of the Temporal project.
I have some app data which is currently stored in Splunk. But I am looking for a way to feed the Splunk data directly into BigQuery. My target is to analyze the app data in BigQuery and perhaps create Data Studio dashboards on top of the BigQuery data.
I know there are a lot of third-party connectors that can help me with this, but I am looking for a solution where I can use features of Splunk or BigQuery to connect the two together and not rely on third-party connectors.
Based on your comment indicating that you're interested in resources to egress data from Splunk into BigQuery with custom software, I would suggest using either tool's REST API on either side.
You don't indicate whether this is a one-time or a recurring task - that may affect where you want the software that performs this operation to run. If it's a one-time thing and you have a decent internet connection yourself, you may just want to write a console application that runs on your own machine to perform the migration. If it's a recurring operation, you might instead look at any of the various "serverless" hosting options out there (e.g. Azure Functions, Google Cloud Functions, or AWS Lambda). Beyond the development experience, note that you may have to pay egress bandwidth costs on top of the normal service charges.
Beyond that, you need to decide whether it makes more sense to do a bulk export from Splunk to an external file that you load into Google Drive and then import into BigQuery, or to download the records as paged data via HTTPS so you can perform some ETL on top of them (e.g. replace nulls with empty strings, update datetime values to match Google's exacting standards, etc.). If you go the latter route, Splunk's REST API export documentation covers that side, and on the BigQuery side you can either use Google's newer, higher-performance Storage Write API or their legacy streaming API to ingest the data. Both options have SDKs in a variety of languages (e.g. C#, Go, Ruby, Node.js, Python), though only the legacy streaming API supports plain HTTP REST calls.
Finally, don't forget OAuth2 for authenticating on either side of the operation, though this is typically abstracted away by the SDKs each party offers, so it is rarely something you have to handle directly.
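To make that concrete, here is a rough sketch of the streamed-download-plus-streaming-ingest variant, using Splunk's search/jobs/export REST endpoint and the google-cloud-bigquery client. The host, credentials, search string, and table id are placeholders, and auth is deliberately simplified (basic auth to Splunk, application default credentials for BigQuery):

```python
# Stream the results of a Splunk search out via its REST export endpoint,
# lightly reshape each event, and push batches into BigQuery using the
# legacy streaming API (insert_rows_json).
import json

import requests
from google.cloud import bigquery

SPLUNK = "https://splunk.example.com:8089"        # placeholder management port
TABLE_ID = "my-project.analytics.app_events"      # placeholder table

bq = bigquery.Client()
rows, errors = [], []

resp = requests.post(
    f"{SPLUNK}/services/search/jobs/export",
    auth=("admin", "changeme"),                   # placeholder credentials
    data={"search": "search index=app_logs", "output_mode": "json"},
    stream=True,
    verify=False,                                 # only if Splunk still has its self-signed cert
)
resp.raise_for_status()

# The export endpoint emits one JSON object per line; each event sits under "result".
for line in resp.iter_lines():
    if not line:
        continue
    event = json.loads(line).get("result")
    if event is None:
        continue
    # Light ETL: drop null fields so they don't clash with the BigQuery schema.
    rows.append({k: v for k, v in event.items() if v is not None})
    if len(rows) >= 500:
        errors += bq.insert_rows_json(TABLE_ID, rows)
        rows = []

if rows:
    errors += bq.insert_rows_json(TABLE_ID, rows)
print("streaming insert errors:", errors)
```

If the volume is large, swapping the streaming inserts for a batch load job (or the Storage Write API) is the more economical route; the surrounding loop stays the same.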
I have two systems:
A REST web application which returns data in XML
A Windows service which gets data daily from the first web app and syncs it with its own database.
Question: how do I do integration testing for these applications (i.e. check whether the data is synchronised correctly)? Is it possible to automate such testing?
If I were you, I would trigger the sync from system 2 and then validate the database data at system 2. This forms a whole journey (E2E), thereby interacting with as many of the systems involved as possible. You may also need to consider different scenarios/paths so that as much of the interaction as possible is covered.
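It can certainly be automated. A rough pytest sketch of that journey is below; the URL, connection string, XML element names, and the way the sync job is triggered are all hypothetical stand-ins for whatever your service actually exposes:

```python
# End-to-end check: read what the REST app serves, run the sync, then assert
# the Windows service's database ended up with the same data.
import xml.etree.ElementTree as ET

import pyodbc
import requests

SOURCE_URL = "http://webapp.example.com/api/customers"   # system 1 (placeholder)
DB_CONN = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=svc-db;DATABASE=SyncDb;Trusted_Connection=yes"  # system 2's DB (placeholder)
)


def trigger_sync():
    # Placeholder: in practice, invoke the service's sync entry point directly,
    # run its console/test harness, or call a test-only endpoint it exposes.
    ...


def test_customers_are_synchronised():
    xml = ET.fromstring(requests.get(SOURCE_URL, timeout=30).content)
    expected = {el.findtext("Id"): el.findtext("Name") for el in xml.iter("Customer")}

    trigger_sync()

    with pyodbc.connect(DB_CONN) as conn:
        actual = {str(row.Id): row.Name for row in conn.execute("SELECT Id, Name FROM Customers")}

    assert actual == expected
```

Run it from a CI job against test instances of both systems and the automation part is covered as well.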
I am looking at extracting data from BigQuery and I have found out that it can be extracted using API or tools. Does any one know the advantage of using API over tools?
One advantage of the API that I can think of is that data extraction can be scheduled at fixed time intervals. Are there any other advantages of using the API?
Basically I want to know when to use API vs tools.
To state it explicitly, the BigQuery tools including the BQ CLI, the Web UI, and even third party tools are leveraging the BigQuery API to enable whatever functionality they expose. Google also provides client libraries for many popular programming languages that make working with the API more straightforward.
Your question then becomes whether your particular needs are best served by using one of these tools or building your own integration with the API. If you're simply loading data into tables once an hour, perhaps a local cron job that calls the BQ CLI tool is sufficient. If you're streaming some kind of event record into a table as they happen, the API route may be more appropriate as you're integrating more deeply into your own software stack.
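As a concrete example of the API route for the extraction case in the question, the Python client library (which wraps the same REST API the tools use) lets you run something like the sketch below on whatever schedule you like; the project, dataset, and query are placeholders:

```python
# Extract rows from BigQuery through the client library; authentication,
# paging, and retries are handled by the library on top of the REST API.
from google.cloud import bigquery

client = bigquery.Client()


def extract_daily_orders() -> list:
    job = client.query(
        "SELECT order_id, total "
        "FROM `my-project.sales.orders` "        # placeholder table
        "WHERE order_date = DATE '2024-01-01'"   # placeholder filter
    )
    return [dict(row) for row in job.result()]


rows = extract_daily_orders()
print(f"extracted {len(rows)} rows")
```

Wire that into cron or whatever scheduler you already run and you get the fixed-interval extraction mentioned in the question, which is exactly the point where rolling your own API integration starts to pay off over the interactive tools.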
I've just started at a new communications company, and we're looking at a workflow / intranet system to manage jobs and processes.
Basically, we receive data files from clients which we then process through our systems.
Receive data file (FTP, Email, etc)
Process data file (either generic script with data mapping to the file, or bespoke ETL package). Adds address values
Create printstream (send processed data file into a postscript / PDF composition engine), or create email output
Send output to production floor (copy to printer input stream, mailing machines)
Process other streams (e.g. send emails / faxes, upload to e-Archive)
Update internal systems (e.g. warehouse stock, invoicing)
We also have a lot of other internal business processes (e.g. reprocessing damaged output, processing dead/returned mail).
I'm trying to keep all elements separated. Some will be off the shelf (e.g. printstream composition, email sending / management, CRM). Some will be built in house (e.g. reprocess damaged output).
But, I'm looking for something to tie it all together, and put the business workflow processes in. E.g. scheduling jobs, kicking off data processing tasks in sequence and managing errors. A lot of this will have human steps. Also, put in SLA management and business activity monitoring / reporting.
One key requirement soon is for automated file receipt and processing (i.e. directory watching and matching to client / application).
I'm keen for something that's easy to manage and maintain (e.g. adding in new steps to a workflow, or conditional logic, or whatever).
I realise this is a big job, and at the moment we're focusing on each individual component and putting manual processes in place until we get a system to manage it. We don't want to design a gargantuan bespoke system to tie it all in, but would rather look at buying some kind of workflow or integration system.
Any suggestions? I've had a look at BizTalk, but I'm not sure whether it's overkill or even suited to internal-only systems. Another product I've been exposed to is Sagent Automation, but it looks a little pokey.
-- EDIT --
Forgot to mention: our existing skillset is largely Microsoft, so anything in Microsoft technologies / .NET would be preferable. But if there's a fantastic product, we're not averse to upskilling.
Check out Apache's ActiveMQ. It implements the Java Message Service 1.1 specification, layers on a servlet API, and has tons of features that should address your requirements. You can also layer on Camel, which adds a rich implementation of many enterprise integration patterns.
Typically, JMS messages are persisted in a transactional database, which can be configured to give you extremely high degrees of fault tolerance (e.g. RAID, master-backup database machine pairs, multiple copies of transaction log files). On top of the database can go multiple load-balanced app server machines running ActiveMQ, to give you scalability and high availability. I think you'll find that you can write your components in a very decoupled fashion if you use ActiveMQ as your common message bus.
In JMS, when a message is de-queued by a consumer, the consuming process must later confirm that the message was successfully handled. If that confirmation does not arrive in time, the JMS broker redelivers the message so another consuming process can attempt to handle it. This means you can run multiple copies of your application to gain reliability and fault tolerance.
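The above is Java/JMS-centric, but ActiveMQ also speaks cross-language protocols such as STOMP, so non-Java pieces of the workflow can join the same bus. Purely as an illustration (host, credentials, and queue name are made up; the stomp.py 8.x listener API is assumed):

```python
# Toy producer/consumer against ActiveMQ's STOMP connector (port 61613 by
# default) using the stomp.py client.
import time

import stomp


class JobListener(stomp.ConnectionListener):
    def on_message(self, frame):
        # With ack="client-individual" instead of "auto", a message that is
        # never acknowledged is redelivered, mirroring the JMS behaviour
        # described above.
        print("processing job:", frame.body)


conn = stomp.Connection([("activemq.internal", 61613)])    # placeholder host
conn.set_listener("", JobListener())
conn.connect("admin", "admin", wait=True)                  # placeholder credentials

conn.subscribe(destination="/queue/datafile.jobs", id="sub-1", ack="auto")
conn.send(destination="/queue/datafile.jobs",
          body='{"client": "acme", "file": "incoming/batch_001.csv"}')

time.sleep(2)   # give the listener a moment to receive the message
conn.disconnect()
```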
Take a look at O'Reilly's Java Message Service, 2nd Edition, which just came out this week.
A different avenue would be to look into BPEL (Business Process Execution Language).
Edit: I'm not very familiar with Microsoft offerings, but MSMQ seems like the equivalent to JMS.
You should be able to use ActiveMQ in a Microsoft environment. They claim to support "cross language clients" like "C# and .NET". And even if that should be problematic, since ActiveMQ has a Java servlet-based API for queueing and de-queuing messages, the outside world only has to be able to make HTTP requests to the ActiveMQ server. That should limit the amount of learning your team would have to do. Good luck, this sounds like an awesome project!
SharePoint has a workflow engine that works very well. You can build your workflow using SharePoint designer or Visual Studio 2008. It uses Windows Workflow, which is similar to BizTalk (if not the same engine), but without BizTalk's other services that may not be necessary for your application.
I am building out a solution that will be deployed in multiple data centers in multiple regions around the world, with each data center having a replicated copy of data actively updated in each region. I will have a combination of multiple databases and file systems in each data center, the state of which must be kept consistent (within a data center). These multiple repositories will be fronted by a SOA service tier.
I can tolerate some latency in the replication, and need to allow for regions to be off-line, and then catch up later.
Given the multiple back-end repositories of data, I can't easily rely on an independent replication solution for each one to maintain a consistent state. I am thus led to implementing replication at the application layer, by replicating the SOA requests in some manner. I'll need to make sure that replication loops don't occur, and that last-writer-wins conflicts are resolved correctly.
In your experience, what is the best pattern for solving this problem, and are there good products (free or otherwise) that should be investigated?
Lotus/Domino is your answer. I've been working with it for ten years and it's exactly what you need. It may not be trendy (a perception that I would challenge), but it's powerful, adaptable and very secure. The latest version, R8, is the best yet.
You should definitely consider IBM Lotus Domino. A Lotus Notes database can replicate between sites on a predefined schedule. Replication in Notes/Domino is definitely a very powerful feature and enables full replication of data between sites. Even if a server is unavailable, the next time it connects it will simply replicate and get back in sync.
As for the SOA service tier, you could then use Domino Designer to write a web service. Since Notes/Domino 7.5.x (I believe), Domino has been able to provide and consume web services.
As others have advised, I will also recommend Lotus Notes/Domino. 8.5 is a really powerful application development platform.
You don't give enough specifics to be certain of your needs, but I think you should check out SQL Server merge replication. It allows asynchronous replication of multiple databases with full conflict resolution. You will need to designate a global master, and all the other databases will replicate with that one, but all the database instances are fully functional (read/write), so you can schedule replication at whatever intervals suit you. If any region goes offline it can catch up later with no issues; if the master goes offline, everyone will work independently until replication can resume.
I would be interested to know of other solutions this flexible (apart from Lotus Notes/Domino of course which is not very trendy these days).
I think your answer is going to have to be based on a pub/sub architecture. I am assuming that you have reliable messaging between your data centers, so that you can rely on published updates eventually being received. If all of your access to the data repositories goes through services, you can add an event notification to the orchestration of each of your update services that notifies all interested data centers of the event. Ideally the master database is the only one that sends out these updates. If that is the case, you can avoid routing the notifications back to the node that generated them in the first place, thus avoiding update loops.
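To make the loop-avoidance part concrete, here is a toy sketch of the idea; the names are illustrative and the in-memory "bus" stands in for whatever reliable messaging actually connects the data centers:

```python
# Every published update carries the id of the data center that originated it;
# each subscriber applies updates locally but drops anything it originated
# itself and never re-publishes what it receives, so no update loops form.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Update:
    origin: str        # data center that first accepted the write
    entity: str
    payload: Dict


@dataclass
class DataCenterNode:
    dc_id: str
    apply_locally: Callable[[Update], None]

    def on_update(self, update: Update) -> None:
        if update.origin == self.dc_id:
            return                      # our own write echoed back: ignore it
        self.apply_locally(update)      # replicate locally, but do NOT re-publish


class Bus:
    def __init__(self) -> None:
        self.subscribers: List[DataCenterNode] = []

    def publish(self, update: Update) -> None:
        for node in self.subscribers:
            node.on_update(update)


# The update service in "eu-west" publishes after committing its local write.
bus = Bus()
bus.subscribers += [
    DataCenterNode("eu-west", apply_locally=lambda u: None),
    DataCenterNode("us-east", apply_locally=lambda u: print("us-east applies", u.payload)),
]
bus.publish(Update(origin="eu-west", entity="customer", payload={"id": 1, "name": "Acme"}))
```

Last-writer-wins or merge rules would then live inside apply_locally, typically by comparing timestamps or version vectors carried on the update.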