I am trying to call a system whose API is not bulkified. Basically, I have 1 record and, for example, 1000 child records. To send this information to the other system, I am currently required to make 1000 API calls. Can we use the Dell Boomi middleware to do this for me?
In short, I would call a single Dell Boomi API with all 1000 records, and Dell Boomi would break this into 1000 individual calls and send them to the other system.
Is this scenario even possible? Any suggestion in the right direction would be helpful.
user2272821 makes the assumption that you already have Boomi...
A Dell Boomi process (program) treats your workflow like a flowchart.
The first step would read your parent record (or params) from a source (like a file)
The second step would use a connection object to access your 1000 records
The third step would use a Data Process shape to split your 'document' into 1000 bits.
The next step would be to process each 'bit' in some way - or many ways...
(There is built in functionality to handle REST and SOAP calls.)
So, yes, it should be able to help you.
Note that once you have Boomi, you'll find that you can add filtering, divert records with different values down different paths, send rejected records to a log, and all sorts of other cool stuff.
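Conceptually, the split-and-forward flow described above amounts to the loop below. This is only a Python sketch of the idea, not Boomi configuration: `fan_out` and `send_one` are hypothetical names, and `send_one` stands in for whatever single-record call the target system exposes.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


def fan_out(parent: Record,
            children: List[Record],
            send_one: Callable[[Record], None]) -> Tuple[int, List[Tuple[Record, str]]]:
    """Split one bulk payload into one outgoing call per child record."""
    sent = 0
    rejected: List[Tuple[Record, str]] = []
    for child in children:
        # Each outgoing document carries the parent context plus one child.
        document = {"parent": parent, "child": child}
        try:
            send_one(document)  # hypothetical single-record call
            sent += 1
        except Exception as exc:
            # Rejected records are collected so they can be logged later.
            rejected.append((child, str(exc)))
    return sent, rejected
```

The caller makes one request carrying all 1000 children; the middleware side runs the loop and reports how many records went through and which were rejected.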
Overview
I’m currently building a prototype to track and control a fleet of drones.
The prototype consists of a service and a web app. In the web app, the location of each drone is displayed in real-time on a map and the user can issue basic commands to each of these drones.
The service is automated and can also issue commands to each of the drones at random times when certain conditions occur.
I am using HiveMQ (an MQTT broker) to facilitate communication between drones, the web app and the service. The web app and the service are both subscribed to the 'telemetry' topic to receive real-time data about the network of drones. The broker will store the telemetry data for each drone directly into a database through the use of HiveMQ's extension functionality.
Specific commands can only be executed if certain criteria are met.
For example: To issue an 'execute mission' command to a drone the service or the web app will make a call to an API. The API will:
Check the drone is not currently on a mission (drone status value must be idle)
Check weather conditions are acceptable in the area the mission is to occur
(Note: by 'mission' I mean a drone flies to a series of set locations autonomously.)
If conditions aren't met a response indicating this will be returned to the requester (web app or service). If conditions are met the API will issue the command to the appropriate drone via the MQTT broker and send a response to the requester.
Requirements
I need a storage mechanism that meets the following criteria:
I need to ensure that a race condition does not occur between the web app and the service. That is if a request to issue a command to a drone is being made by the web app, a request made by the service in this time should be automatically rejected.
Drone status is not synchronized between the service and the web app; as a result, they need a shared point at which to check a drone's status.
Drones will update their status every second, and API call's to issue commands will be made every 10 - 30 seconds. There will be 5 drones in this prototype but I would like a solution that can scale to 50 drones.
Considered Solution
My solution would be that of a relational database - using a separate table with a 'request_lock' field, this field uses a row-level lock.
When an API call is made, it checks this field: if it is true, the request is rejected. If it is false, it sets the field to true, performs the necessary condition checks, and then sets the 'request_lock' field back to false once the command has reached the drone.
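A minimal sketch of that check-and-set, assuming the check and the update are folded into one conditional UPDATE so that two requesters cannot both see the field as false. It uses an in-memory SQLite database purely to stay runnable; the `drone_lock` table and the `try_acquire` and `release` helpers are hypothetical names.

```python
import sqlite3


def try_acquire(conn: sqlite3.Connection, drone_id: str) -> bool:
    """Flip request_lock from 0 to 1 in one statement; True if this caller won."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE drone_lock SET request_lock = 1 "
            "WHERE drone_id = ? AND request_lock = 0",
            (drone_id,),
        )
    # Exactly one row changes only if the lock was free.
    return cur.rowcount == 1


def release(conn: sqlite3.Connection, drone_id: str) -> None:
    """Clear the lock once the command has reached the drone."""
    with conn:
        conn.execute(
            "UPDATE drone_lock SET request_lock = 0 WHERE drone_id = ?",
            (drone_id,),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE drone_lock ("
    "drone_id TEXT PRIMARY KEY, "
    "request_lock INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO drone_lock (drone_id) VALUES ('drone-1')")
conn.commit()
```

A request that gets `False` back from `try_acquire` would be rejected immediately; the winner runs its condition checks and calls `release` afterwards.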
I am concerned the status update frequency from each drone does not fit a relational database model and won't scale well. Am I on the right track, or should I be looking to include a NoSQL database in some way to handle status updates?
Thank you to anyone who takes the time to answer.
There are a lot of questions here, so I'll try to pick what seems to be most important:
I am concerned the status update frequency from each drone does not fit a relational database model ..
Should I use a relational or non-relational database?
First, let's calculate the maximum number of drone status updates, per second.
Drones will update their status every second, and API call's [sic] to issue commands will be made every 10 - 30 seconds. There will be 5 drones in this prototype but I would like a solution that can scale to 50 drones.
50 drones * 1 drone-update per second = 50 drone-updates per second
50 drones * (1 / 10) drone-commands per second = 5 drone-commands per second
So, can a relational database handle ~60 queries per second?
Yes. Assuming reasonable query complexity, this is within the ability of a traditional relational database. I would not expect the database to need extraordinary system resources, either.
If you'd like to confirm this level of performance with a benchmark, I'd recommend a tool like pgbench.
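As a rough check of the load estimate, the sketch below computes the peak request rate. It assumes every drone is commanded at the fastest interval from the question (one command every 10 seconds, i.e. 0.1 commands per second per drone); the function name is hypothetical.

```python
def peak_queries_per_second(drones: int,
                            update_interval_s: float,
                            min_command_interval_s: float) -> float:
    """Worst-case queries per second from status updates plus commands."""
    updates = drones / update_interval_s        # one status write per interval
    commands = drones / min_command_interval_s  # fastest command cadence
    return updates + commands


peak = peak_queries_per_second(drones=50,
                               update_interval_s=1.0,
                               min_command_interval_s=10.0)
```

With these inputs the peak comes out just under the ~60 queries per second used as the working figure above.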
I am trying to sync OneDrive files (metadata and permissions) for a domain with the MSGraph API, using the list, children and permission endpoints.
I am using batching for the children and permission endpoints, sending 10-20 request URLs in a single batch request, concurrently for 10 users.
I am getting a lot of 429 errors by doing so, though I was also getting 429 errors when making single (non-batched) calls.
According to the documentation related to throttling, they ask to
1. Reduce the number of operations per request
2. Reduce the frequency of calls.
So, my question is
Does a batch call of 10 GET URLs count as 10 different operations and 10 different calls?
Does a batch call of 10 GET URLs count as 10 different operations and 10 different calls?
Normally, N URLs will be treated as N+1 operations (or even more): N operations for the URLs in the batch and one for the batch request itself.
Pay attention to the docs:
JSON batching allows you to optimize your application by combining
multiple requests into a single JSON object.
Because multiple requests are combined into one request, the server only needs to send back one response. But the underlying operation for each URL still needs to be handled, so the workload on the server side is still very high; it may just be reduced a little.
The answer lies somewhere in between.
Even though the documentation (I cannot find the actual page at this moment) says you can combine up to 20 requests, I found out by experimenting that the limit is currently set to 15. So if you reduce the number of calls in a single batch, you should be good to go.
I'm not sure but it might also help to restrict the batches to a single user.
The throttling limit is set to 10000 items per 10 minutes per user resource, see this blog item
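A sketch of both suggestions: smaller batches, and backing off when a 429 comes back. It only builds the JSON batch bodies and computes a wait time, with no network calls. The batch size of 15 is the experimental figure from this answer, and `build_batches` and `retry_delay_seconds` are hypothetical helper names.

```python
from typing import Dict, List, Mapping


def build_batches(urls: List[str], batch_size: int = 15) -> List[Dict]:
    """Chunk relative URLs into JSON batch bodies of at most batch_size GETs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batches = []
    for start in range(0, len(urls), batch_size):
        chunk = urls[start:start + batch_size]
        batches.append({
            "requests": [
                # ids only need to be unique within one batch body
                {"id": str(i + 1), "method": "GET", "url": url}
                for i, url in enumerate(chunk)
            ]
        })
    return batches


def retry_delay_seconds(headers: Mapping[str, str], attempt: int,
                        base: float = 1.0, cap: float = 60.0) -> float:
    """Prefer the server's Retry-After value; otherwise back off exponentially."""
    value = headers.get("Retry-After")
    if value is not None and value.isdigit():
        return float(value)
    return min(cap, base * (2 ** attempt))
```

Each returned body would be sent as its own batch request; on a 429, the caller sleeps for `retry_delay_seconds(...)` before resending only the throttled requests.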
What would be the best option for exposing 220k records to third-party applications?
SF style 'bulk API' - independent of the standard API to maintain availability
server-side pagination
Callback to an FTP-generated file?
webhooks?
This bulk will have to happen once a day or so. ANY OTHER SUGGESTIONS WELCOME!
How are the 220k records being used?
Must serve it all at once
Not ideal for human consumers of this endpoint without special GUI considerations and communication.
A. I think that using a 'bulk API' would be marginally better than reading a file of the same data. (Not 100% sure on this.) Opening and interpreting a file might take a little bit more time than directly accessing data provided in an endpoint's response body.
Can send it in pieces
B. If only a small amount of data is needed at once, then server-side pagination should be used and allows the consumer to request new batches of data as desired. This reduces unnecessary server load by not sending data without it being specifically requested.
C. If all of it needs to be received during a user-session, then find a way to send the consumer partial information along the way. Often users can be temporarily satisfied with partial data while the rest loads, so update the client periodically with information as it arrives. Consider AJAX Long-Polling, HTML5 Server Sent Events (SSE), HTML5 Websockets as described here: What are Long-Polling, Websockets, Server-Sent Events (SSE) and Comet?. Tech stack details and third party requirements will likely limit your options. Make sure to communicate to users that the application is still working on the request until it is finished.
Can send less data
D. If the third party applications only need to show updated records, could a different endpoint be created for exposing this more manageable (hopefully) subset of records?
E. If the end-result is displaying this data in a user-centric application, then maybe a manageable amount of summary data could be sent instead? Are there user-centric applications that show 220k records at once, instead of fetching individual ones (or small batches)?
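Option B (server-side pagination) can be sketched as follows. This is a minimal in-memory illustration, assuming offset/limit paging; `get_page`, `fetch_all` and the response field names are hypothetical, and a real endpoint would page through a database query rather than a list.

```python
from typing import Any, Dict, List, Optional


def get_page(records: List[Any], offset: int = 0, limit: int = 1000) -> Dict[str, Any]:
    """Return one page plus the offset of the next page (None when done)."""
    items = records[offset:offset + limit]
    end = offset + limit
    next_offset: Optional[int] = end if end < len(records) else None
    return {"items": items, "next_offset": next_offset, "total": len(records)}


def fetch_all(records: List[Any], limit: int = 1000) -> List[Any]:
    """Consumer side: keep requesting pages until next_offset is None."""
    collected: List[Any] = []
    offset: Optional[int] = 0
    while offset is not None:
        page = get_page(records, offset=offset, limit=limit)
        collected.extend(page["items"])
        offset = page["next_offset"]
    return collected
```

The consumer decides how much to pull and when, so the server never sends data that was not specifically requested.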
I would use a streaming API. This is an API that does a "select * from table" and then streams the results to the consumer. You do this using a for loop to fetch and output the records. This way you never use much memory and as long as you frequently flush the output the webserver will not close the connection and you will support any size of result set.
I know this works as I (shameless plug) wrote the mysql-crud-api that actually does this.
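The loop described above might look like this in Python. It is a sketch of the general technique, not the code from mysql-crud-api: it uses an in-memory SQLite table to stay runnable, and `stream_rows` and the `records` table are hypothetical names.

```python
import json
import sqlite3
from typing import Iterator


def stream_rows(conn: sqlite3.Connection, query: str,
                chunk_size: int = 500) -> Iterator[str]:
    """Yield one JSON line per row, holding only chunk_size rows in memory."""
    cur = conn.execute(query)
    columns = [d[0] for d in cur.description]
    while True:
        rows = cur.fetchmany(chunk_size)
        if not rows:
            break
        for row in rows:
            yield json.dumps(dict(zip(columns, row))) + "\n"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO records (id, name) VALUES (?, ?)",
    [(i, "r%d" % i) for i in range(1, 6)],
)
conn.commit()
```

A web handler would write each yielded line to the response and flush periodically, so memory use stays flat regardless of the size of the result set.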
First of all I'm developing my own C# library for controlling Philips Hue, which means I'm not using the official SDK. (I'm guessing that the SDK will make sure you won't have any problems)
I'm a little confused about the limitation in the Core concepts page in the API, which states:
We can’t send commands to the lights too fast. If you stick to around 10 commands per second to the /lights resource as maximum you should be fine. For /groups commands you should keep to a maximum of 1 per second.
I intend to respect this limitation, but does the limitation still apply when you are performing GET requests on the /lights resource, or is it only for sending actual commands with PUT requests to /lights/<id>/state that change the state of the light? Same question goes for the /groups resource.
Also is it even possible to damage anything by sending too many requests, or will it just take longer to get all responses?
Edit:
My overall question is: How should I understand the API limitation?
A more specific sub-question is: Should I wait 100 ms before sending another /lights command, relative to when I received a response, or relative to when I sent the previous command?
Another sub-question is: Should I consider this limitation only when using PUT requests on e.g. /lights/<id>/state, or on all request types (GET/PUT/POST/DELETE)?
I don't know if anything was changed in firmware updates, but I have discovered that the bridge might not be so simple as you would think, and that the API description isn't very clear.
I've done a little testing while running firmware 01009914.
The bridge seems to have some kind of queue of incoming commands. I sent {"bri":254} to a group 9 times, followed by 1 final command of {"bri":1}. From the first command to when the light is actually dimmed takes roughly 3-4 seconds. Each time I sent a command, the bridge replied almost instantly with a success token.
I did the same small tests sending other commands, 10 of each JSON object:
{"bri":254} 3-4 seconds
{"on":true, "bri":254} 6-7 seconds
{"on":true, "bri":254, "alert":"none", "effect":"none"} 12-13 seconds
This actually shows that each change of attributes takes roughly 0.3 seconds for the bridge to handle.
I will claim that for each attribute we change, the bridge takes about 300 ms to finish, and the limitation of commands should be understood as: As long as you stick with changing one attribute of a group each second, you should be fine.
Note: I only tried with one group consisting of three lights, and I don't know if the bridge actually does have a queue of incoming commands, and in case it does have a queue, I don't know what the limit of items in it is.
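The 300 ms estimate can be checked against the three measurements above. A small sketch, assuming the cost is simply 300 ms per attribute per command (the `estimated_queue_ms` helper is a hypothetical name, not part of the Hue API):

```python
from typing import Any, Dict, List

MS_PER_ATTRIBUTE = 300  # estimate from the tests above, not an official figure


def estimated_queue_ms(commands: List[Dict[str, Any]]) -> int:
    """Estimate how long the bridge needs to work through queued commands."""
    total_attributes = sum(len(command) for command in commands)
    return total_attributes * MS_PER_ATTRIBUTE


one_attr = estimated_queue_ms([{"bri": 254}] * 10)
two_attr = estimated_queue_ms([{"on": True, "bri": 254}] * 10)
four_attr = estimated_queue_ms(
    [{"on": True, "bri": 254, "alert": "none", "effect": "none"}] * 10
)
```

The three estimates come out at 3, 6 and 12 seconds, which sits at the lower end of each measured range.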
Edit:
Now we have some official clarification of the Hue System Performance.
I'm fairly certain that the 10 commands per second is a guideline to prevent failure of the Bridge, and is a technical limitation of the hardware. Any more than that and you're apt to overload the bridge. I believe this applies to commands as well as requests.
Both approaches are reasonable. For laziness' sake, you could wait for 100ms to send a response, but I would only rely on this method if you don't plan on any other interactions with the Bridge.
I consider this limitation on all request types.
You won't damage anything if you send commands too fast. However, if you send commands too fast the bridge might become unresponsive and/or some messages can be ignored.
When it comes to the bridge, the way I think of it is that the bridge is more or less single threaded, so it works best if you make sure you don't send the next command before the previous one has returned.
In practice we've found that this works much better than waiting a fixed time between each request. In fact, you can pretty much send commands as fast as you want as long as you wait for the previous one to finish.
When you send a command to the bridge, the bridge then has to send it to the lamps through Zigbee. Since it's a mesh network, in some cases the message has to make a couple of hops from lamp to lamp before it reaches the target. Depending on how many lamps you have and how many hops the signal needs to take, this can take a while. Also, it's possible that some messages randomly take much longer than others.
In general the system is not designed to handle very fast changes, but if you keep the above in mind you can make many cool effects :)
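A sketch of the "wait for the previous command to finish" approach. Here `send` is a hypothetical blocking callable (for example, a function that performs the HTTP PUT and returns the bridge's reply), and `min_gap_s` is an optional extra spacing for anyone who also wants to respect a fixed rate.

```python
import time
from typing import Any, Callable, Dict, Iterable, List


def send_sequentially(commands: Iterable[Dict[str, Any]],
                      send: Callable[[Dict[str, Any]], Any],
                      min_gap_s: float = 0.0) -> List[Any]:
    """Send commands one at a time, waiting for each reply before the next."""
    replies = []
    for command in commands:
        started = time.monotonic()
        replies.append(send(command))  # blocks until the bridge has replied
        remaining = min_gap_s - (time.monotonic() - started)
        if remaining > 0:
            time.sleep(remaining)  # optional fixed spacing between sends
    return replies
```

With `min_gap_s=0.0` the pace is set entirely by how quickly the bridge answers, which is the behaviour described above.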
I receive Graph API error #613 (message: "Calls to mailbox_fql have exceeded the rate of 300 calls per 600 seconds", type:OAuthException) when testing my app. It's a desktop app, and the only copy is the one running on my machine (so there's only one access_token and one user - me).
I query the inbox endpoint once every 15 seconds or so. Combined, the app makes about 12 API calls (to various endpoints) per minute. It consistently fails on whichever call fetches the 300th thread (there are about 25 threads on the first page of the inbox endpoint, and I'm only fetching the first page). I am not batching any calls to the Graph API.
I'm developing on Mac OS X 10.7 using Objective-C. I use NSURLConnection to call the Graph API asynchronously. As far as I know, each request processed by NSURLConnection should only result in one request to Facebook's API.
Going on the above, I'm having trouble figuring out why I am receiving this error. I suspect that it is because a single call to the inbox endpoint (i.e. a call to the URI https://graph.facebook.com/me/inbox?access_token=...) is counted as more than one call to mailbox_fql. In particular, I think that a single call that returns <n> threads counts as <n> calls against mailbox_fql. If this is the case, is there a way to reduce the number of calls to mailbox_fql per API call (e.g. by fetching only the <n> most recent threads in the inbox, rather than the whole first page)?
The documentation appears to be pretty sparse on this topic, so I've had to get by mostly through trial and error. I'd be thrilled if anyone else knows how to tackle this issue.
Edit: It turns out that you can pass a limit GET parameter that, unsurprisingly, limits the number of results. However, the Developer blog notes some limitations with this approach (namely that fewer results than requested may be returned if some are not visible to your user).
The blog recommends using until and/or since as GET parameters when calling the standard Graph API. These parameters take any strtotime()-compliant string (or Unix epoch time) and limit your results accordingly.
Original answer follows:
After some further research, it looks like my options are to fetch less frequently or use custom FQL queries to limit the number of calls to mailbox_fql. I haven't been able to find any way to limit the response of the standard Graph API call to the inbox endpoint. In the present case, I'm using an FQL query of the following form:
https://graph.facebook.com/fql?q=SELECT <fields> FROM thread WHERE folder_id=1 LIMIT <n>&access_token=...
<fields> is a comma-separated list of fields (described in Facebook's thread FQL docs). thread is the literal name of the table corresponding to the inbox endpoint; the new thread endpoint corresponds to the unified_thread table, but it's not publicly available yet. folder_id=1 indicates that we want to use the inbox (as opposed to outbox or updates folders).
In practice, I'm setting <n> to 5, which results in a reasonable 200 calls to mailbox_fql in a 10-minute span when using 15-second call intervals. In my tests, I haven't been receiving error #613, so I guess it works.
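The arithmetic behind that figure, as a short sketch. It assumes, as hypothesised in the question, that every thread returned counts as one call to mailbox_fql; both function names are hypothetical.

```python
def mailbox_calls_per_window(threads_per_request: int,
                             poll_interval_s: int,
                             window_s: int = 600) -> int:
    """Calls charged to mailbox_fql in one window, at one call per thread."""
    requests_in_window = window_s // poll_interval_s
    return requests_in_window * threads_per_request


def max_threads_within_limit(poll_interval_s: int,
                             limit: int = 300,
                             window_s: int = 600) -> int:
    """Largest LIMIT value that stays within the documented rate."""
    requests_in_window = window_s // poll_interval_s
    return limit // requests_in_window


with_limit_5 = mailbox_calls_per_window(threads_per_request=5, poll_interval_s=15)
full_page = mailbox_calls_per_window(threads_per_request=25, poll_interval_s=15)
largest_safe_limit = max_threads_within_limit(poll_interval_s=15)
```

Under that assumption, a full page of about 25 threads every 15 seconds would far exceed the 300-call limit, while a LIMIT of 5 stays comfortably below it.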
I imagine that most people here were already familiar with the ins and outs of FQL, but it was new to me. I hope that this helps some other newbies dealing with similar issues!