Pentaho Kettle - REST API Does Not Stop

I hope you are doing well. I am running a transformation in Pentaho Kettle which retrieves data from an API for nearly 9,000 records.
So the transformation hits the API 9,000 times to retrieve the data and inject it into MongoDB.
I am not sure why, but for the last 2 weeks the transformation has been getting stuck after fetching some amount of data (sometimes 1k records, sometimes 19), and when I stop it manually I notice that the REST API operation is in a halting state and never stops.

Related

Pentaho Data Integration - Transformation is stuck not able to fetch data from Rest API

I hope you are doing well. I am a newbie in Pentaho and I need help troubleshooting an issue.
Flow of the transformation:
Fetching 9,000 ID numbers from the previous step without any issue.
Requesting data from the API for the 9,000 IDs - "REST Client".
Injecting it into MongoDB.
Transformation Snapshot
I have attached a snapshot of the transformation (not the actual one, only the main steps).
After fetching some amount of data from the REST Client, I believe it does not send the next request, which is why the transformation gets stuck and never stops.
Steps I have taken to troubleshoot this issue, which did not work:
I broke the transformation down to retrieve 2k records at a time with the help of the "block until this step" operation.
Closely monitored the CPU & memory of the server - CPU max 40%, sometimes touching 90% for a few seconds; memory - less than 80%.
Not sure if this is a cache issue, a PDI issue, or something else?
Please help me resolve this issue; any suggestion will be much appreciated.
Thanks and Regards
Aslam Shaikh
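
One common cause of the symptom described above (a REST call that sits in a halting state forever) is an HTTP request with no read timeout: if the API stops responding mid-request, the step simply waits. The sketch below is plain Python rather than PDI, just to illustrate the principle; the URL is a placeholder, and if your PDI version's REST Client or HTTP step exposes connection/socket timeout settings, configuring those would be the equivalent fix inside the transformation.

```python
# Minimal sketch (not PDI itself) of the failure mode described above: an HTTP
# call without a read timeout blocks forever when the API stalls, which looks
# exactly like a step stuck "in a halting state". The endpoint is a placeholder.
import requests

def fetch_record(record_id):
    url = f"https://api.example.com/records/{record_id}"  # hypothetical endpoint
    try:
        # (connect timeout, read timeout) in seconds; without this, a stalled
        # response blocks the caller indefinitely.
        resp = requests.get(url, timeout=(5, 30))
        resp.raise_for_status()
        return resp.json()
    except requests.Timeout:
        # Log and retry (or skip) instead of hanging the whole pipeline.
        print(f"Timed out fetching {record_id}, retrying/skipping")
        return None
```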

Load data to Salesforce using ADF SF connector

We are planning the following to transfer data to Salesforce:
Databricks to do the logic transformation and put the result in SQL, then use the existing ADF Salesforce connector to load.
Want to know if ADF supports the latest Salesforce bulk API, which is Bulk API 2.0?
While creating the linked service to Salesforce I don't see it in the apiVersion options.
It's been a while since I used Azure Data Factory. The ADF2 plugin available on the marketplace is provided for free by "Simba Technologies". Their page says it's on API v45 at the moment, 8 releases behind, so almost 3 years. You could try asking them, but you aren't their customer directly, so you don't have much leverage...
In theory it should be fine; the ingest endpoint first appeared in API v41.
Whether they implemented it - ultimately you will have to try it out. Prepare a big job (over 10K records) and load it. Examine the login history to check the API version used. Go to Setup, start searching for "bulk", and check whether there's a Bulk API job or whether it was loaded the old-school way, 200 records at a time...
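If you would rather check outside Setup, a minimal sketch of the same verification is below, assuming you already have an OAuth access token and instance URL (both placeholders here): Bulk API 2.0 exposes a "get all jobs" ingest endpoint, so any job created by the ADF load should show up in it.

```python
# Minimal sketch: list Bulk API 2.0 ingest jobs after running the ADF load to
# see whether the connector actually used Bulk API 2.0. Token, instance URL,
# and API version below are placeholders.
import requests

INSTANCE_URL = "https://yourInstance.my.salesforce.com"  # placeholder
ACCESS_TOKEN = "00D...your_oauth_token"                   # placeholder
API_VERSION = "v52.0"                                     # anything >= v41.0

resp = requests.get(
    f"{INSTANCE_URL}/services/data/{API_VERSION}/jobs/ingest",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()

for job in resp.json().get("records", []):
    # If the ADF load shows up here it went through Bulk API 2.0; if the list
    # stays empty, the connector fell back to the old record-by-record path
    # (200 records at a time).
    print(job["id"], job["object"], job["operation"], job["state"])
```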

BigQuery python client library dropping data on insert_rows

I'm using the Python API to write to BigQuery -- I've had success previously, but I'm still a novice with the BigQuery platform.
I recently updated a table schema to include some new nested records. After creating this new table, I'm seeing significant portions of data not making it to BigQuery.
However, some of the data is coming through. In a single write statement, my code will try to send through a handful of rows. Some of the rows make it and some do not, but no errors are being thrown from the BigQuery endpoint.
I have access to the Stackdriver logs for this project and there are no errors or warnings indicating that a write would have failed. I'm not streaming the data -- I'm using the BigQuery client library to call the API endpoint (I saw other answers that mention issues with streaming data to a newly created table).
Has anyone else had issues with the BigQuery API? I haven't found any documentation mentioning a delay in accessing the data (I found the opposite -- it's supposed to be near real-time, right?), and I'm not sure what's causing the issue at this point.
Any help or reference would be greatly appreciated.
Edit: Apparently the API is the streaming API -- missed on my part.
Edit 2: This issue is related. Though I've been writing to the table every 5 minutes for about 24 hours, and I'm still seeing missing data. I'm curious whether writing to a BigQuery table within 10 minutes of its creation puts you in a permanent state of losing data, or whether it would be expected to catch everything after the initial 10 minutes from creation.
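
One thing worth ruling out, since insert_rows is indeed a streaming insert: the google-cloud-bigquery client does not raise an exception for per-row failures; it returns a list of row errors that has to be checked explicitly, and rows that no longer match an updated nested schema can disappear that way without any visible error. A minimal sketch, with placeholder table and row values:

```python
# Minimal sketch, assuming the google-cloud-bigquery client library:
# insert_rows() returns per-row errors instead of raising, so dropped rows
# often only show up here. Table name and rows are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my_project.my_dataset.my_table")  # placeholder

rows = [
    {"id": 1, "details": {"name": "a"}},  # placeholder rows matching the schema
    {"id": 2, "details": {"name": "b"}},
]

errors = client.insert_rows(table, rows)
if errors:
    # Each entry identifies the failing row and the reason, e.g. a field that
    # no longer matches the updated nested record definition.
    for err in errors:
        print("row failed:", err)
else:
    print("all rows accepted by the streaming buffer")
```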

Walmart Developer API connection

I have Celigo and I am trying to connect to the Walmart API manually. The Walmart API wants an epoch timestamp and an authentication key, which requires me to run a jar file, and I can get those two values.
The timestamp and authentication key change every time I run the jar file, so the connection ends up running for about 5 minutes before losing the connection. How can I make it so that it doesn't lose the connection to Walmart?
As you are using the jar to generate those two values, it will return WM_SEC.AUTH_SIGNATURE and WM_SEC.TIMESTAMP as per the documentation. These need to be regenerated using the jar every time you make an API call (even if you are making the same API call again).
It works for 5 minutes because WM_SEC.TIMESTAMP has a validity of 5 minutes. So, as mentioned earlier, generate a fresh WM_SEC.AUTH_SIGNATURE and WM_SEC.TIMESTAMP using the jar before each call and it will work fine for you.
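A minimal sketch of that pattern is below: regenerate the pair immediately before every request instead of reusing it. The jar name, its argument order, its output format, and the extra WM_CONSUMER/WM_SVC headers are assumptions here, so adjust them to whatever your jar and your Walmart API version actually expect.

```python
# Minimal sketch: run the signing jar before every call so WM_SEC.TIMESTAMP and
# WM_SEC.AUTH_SIGNATURE are always fresh. Jar name, arguments, output format,
# and the extra headers are assumptions -- adapt them to your setup.
import subprocess
import requests

def walmart_get(url, consumer_id, private_key_path, channel_type):
    # Hypothetical invocation: assumes the jar prints the timestamp on the
    # first line and the signature on the second.
    out = subprocess.run(
        ["java", "-jar", "DigitalSignatureUtil.jar",
         consumer_id, private_key_path, url, "GET"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    timestamp, signature = out[0].strip(), out[1].strip()

    headers = {
        "WM_SEC.TIMESTAMP": timestamp,
        "WM_SEC.AUTH_SIGNATURE": signature,
        "WM_CONSUMER.ID": consumer_id,            # assumed header set
        "WM_CONSUMER.CHANNEL.TYPE": channel_type,
        "WM_SVC.NAME": "Walmart Marketplace",
        "Accept": "application/json",
    }
    # No persistent "connection" to keep alive: each call stands on its own.
    return requests.get(url, headers=headers, timeout=30)
```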
The timestamp and authentication key change every time I run the jar file, so the connection ends up running for about 5 minutes before losing the connection. How can I make it so that it doesn't lose the connection to Walmart?
Well, the timestamp and signature generated by the jar file will always change.
You said the connection ends up running for about 5 minutes, but I think it is your script that was running for 5 minutes. The Walmart API does not allow you to make a live connection. As soon as you send a request, the Walmart API will respond in a few seconds.
For bulk item feeds up to 9.5 MB (a feed of 5K items), it takes 2-3 seconds maximum.
For inventory feeds up to 5 MB (about 2K items), it takes 2 seconds max.
No, you cannot. This is basically the entire point of the uniquely generated connection key. It prevents connections from being left in an open state that would cause unneeded server load.
Your question doesn't identify what you are trying to accomplish, nor why you would want to maintain an open connection to Walmart. After looking at Celigo's website, I'm still not sure what you are trying to accomplish, but based on the limited information, it appears that you are trying to do something with the Walmart API that it is not intended to do. Connections to the Walmart API should be on a request-by-request basis and do not consist of a live connection.
The Walmart API documentation indicates that you should use a uniquely generated authentication key for every request made to the API, so the fact that you are able to keep the connection alive for a full 5 minutes is even beyond what you are supposed to be doing.
What are you trying to accomplish?

High response time in WSO2 DSS

I have created a simple data service using WSO2 DSS for the following simple query.
"SELECT * FROM EMP_VIEW"
"EMP_VIEW" is having around 45 columns and 8500 entries(tuples). My DB instance is Oracle 11g Enterprise edition & i'm using ojdbc6.jar as the driver. Due to some reason Data Service takes around 14 mins to get the response once I try it in SoapUI.
But the same query takes around 14 or less seconds to retrieve all the records in Oracle SQL Developer/ Eclipse database explorer.
Any idea why it's taking high response time?
Not an answer, but a potential direction toward getting to an answer.
There may be multiple factors at play here. You have proven that the Oracle side is working well (assuming the 14s response time is acceptable).
You mention that SoapUI takes considerable time. This could be a SoapUI problem, where it is waiting for all results to be returned (time taken) and then building the full display (more time taken) before showing the full result.
The Oracle dev tool could be faster at showing results since it may not be waiting for the full result set and/or spending much time building the display.
Keep in mind that DSS takes the SQL result and converts it into XML; that in itself may add some time, but I suspect the SoapUI tool is taking a significant amount of time to decode the XML and render it on your screen.
To further narrow down the problem, I suggest you use another tool:
1. Possibly the TryIt tool from DSS - see what kind of timing it gets for the same calls.
2. Write a small client (C#/Java, etc.) and measure the actual time between your request and the response (a minimal sketch follows below). This will definitely tell you how long DSS is taking versus how long the client takes to build the display.
Please do post your results as this type of information is definitely helpful to others.
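Here is a minimal sketch of suggestion 2 (in Python, but the same measurement works in C# or Java). The endpoint URL, operation name, and SOAP body are placeholders - take the real ones from your data service's WSDL.

```python
# Minimal sketch: time the raw request/response against the data service so the
# DSS processing time is separated from the time SoapUI spends rendering the
# result. URL, SOAPAction, and body are placeholders from an assumed service.
import time
import requests

DSS_URL = "http://localhost:9763/services/EmpDataService"  # placeholder
SOAP_BODY = """<soapenv:Envelope
    xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:dat="http://ws.wso2.org/dataservice">
  <soapenv:Body><dat:getEmployees/></soapenv:Body>
</soapenv:Envelope>"""                                      # placeholder operation

start = time.monotonic()
resp = requests.post(
    DSS_URL,
    data=SOAP_BODY,
    headers={"Content-Type": "text/xml; charset=UTF-8",
             "SOAPAction": "urn:getEmployees"},             # placeholder action
    timeout=1200,
)
elapsed = time.monotonic() - start

# If this is close to the ~14 s Oracle number, the slowness is SoapUI's display;
# if it is close to 14 minutes, DSS itself is the bottleneck.
print(f"HTTP {resp.status_code}, {len(resp.content)} bytes in {elapsed:.1f}s")
```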
As per my understanding and observations, SoapUI waits until the whole message is received, and that is where the time is spent. But when you try curl, you will find the response is generated within a few seconds.
I tried curl to receive 2 MB messages from a streaming-enabled DSS service,
and the response was generated in less than one second.