Best practice for getRefreshedUserItems - Yodlee

On the data extracts page, Yodlee describes best practices for using getRefreshedUserItems, but I think there are a few more details that should be shared:
Is the 1-minute recommendation just in place to avoid having to deal with large amounts of returned data? Is it reasonable to only poll for refreshed accounts every 5 minutes instead?
Say I do set up my process to retrieve refreshed items every 5 minutes as described above, but it fails to run during one of the iterations. If I leave it alone, does that mean there are a few items whose refresh I will have failed to pick up for 24 hours? If so, how are others handling this? Recording the timestamp of each successful call to getRefreshedUserItems, or perhaps iterating over their local cache of financial institutions that haven't been synced in more than 24 hours and retrieving updates for those as a one-off call? Or something else?

The main reason for keeping the limit at 1 minute is the high number of refreshes: you may not have a high number of users at the moment, but that number may grow in the future.
Coming to your question about handling failure cases: say one of your jobs fails to fetch the items for a particular instance (the duration passed in the request). You can keep a record of all such failed requests and run a follow-up job every hour that triggers a request for each of the failed durations. This way you won't miss any items and will keep your data in sync with Yodlee.
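A minimal sketch of that follow-up pattern (the `poll_windows` table, the `get_refreshed_user_items` wrapper, and the other function names are hypothetical, not part of the Yodlee SDK): every polling window is recorded with its outcome, and an hourly job replays the windows that failed.

```python
import sqlite3
from datetime import datetime

# Hypothetical wrappers -- shown only to illustrate the
# "record failed windows, replay them hourly" pattern.
def get_refreshed_user_items(window_start, window_end):
    raise NotImplementedError("call Yodlee's getRefreshedUserItems here")

def process(items):
    pass  # your own sync logic

def setup(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS poll_windows "
        "(id INTEGER PRIMARY KEY, start_ts TEXT, end_ts TEXT, status TEXT)"
    )

def poll_window(conn, start, end):
    """Scheduled every 5 minutes: record each window so a missed one can be replayed."""
    try:
        process(get_refreshed_user_items(start, end))
        status = "ok"
    except Exception:
        status = "failed"
    conn.execute(
        "INSERT INTO poll_windows (start_ts, end_ts, status) VALUES (?, ?, ?)",
        (start.isoformat(), end.isoformat(), status),
    )
    conn.commit()

def retry_failed_windows(conn):
    """Hourly follow-up job: re-request every window whose fetch failed."""
    rows = conn.execute(
        "SELECT id, start_ts, end_ts FROM poll_windows WHERE status = 'failed'"
    ).fetchall()
    for row_id, start_ts, end_ts in rows:
        try:
            process(get_refreshed_user_items(
                datetime.fromisoformat(start_ts), datetime.fromisoformat(end_ts)))
            conn.execute("UPDATE poll_windows SET status = 'ok' WHERE id = ?", (row_id,))
            conn.commit()
        except Exception:
            pass  # still failing; the next hourly run will try again

conn = sqlite3.connect("sync_state.db")  # local bookkeeping store (placeholder path)
setup(conn)
```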

Related

Can I use transactions to ensure a consistent walk through my records with repeated SELECTs using OFFSET and LIMIT?

I have a scheduled job that runs once a day, synchronizing entities between multiple APIs. I'm looking for a reliable way to pull "pages" of data from my DB, without downloading GBs worth of it in one go, using LIMIT and OFFSET.
From what I understand, starting a transaction at the beginning of the process and executing repeated SELECTs within it will ensure that no records in my result set are added or skipped due to other concurrent processes?
Hopefully, that would allow me to perform the synchronization job on the exact state of the DB records at the start of the transaction. Also, it may be worth knowing that the sync job itself won't alter the records from said result set.
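A rough sketch of that idea, assuming PostgreSQL and psycopg2 (the table, column, and function names are placeholders): the whole paged walk runs inside one REPEATABLE READ transaction, so every SELECT sees the same snapshot, whereas the default READ COMMITTED level takes a new snapshot per statement and would not give that guarantee.

```python
import psycopg2

PAGE_SIZE = 1000

def sync_entity(row):
    """Hypothetical: push one record to the external APIs."""
    pass

# Placeholder connection string and table/column names.
conn = psycopg2.connect("dbname=app user=sync")
# One snapshot for the whole transaction: concurrent inserts/deletes by
# other processes won't shift the LIMIT/OFFSET pages underneath us.
conn.set_session(isolation_level="REPEATABLE READ", readonly=True)

try:
    with conn.cursor() as cur:
        offset = 0
        while True:
            cur.execute(
                "SELECT id, payload FROM entities ORDER BY id LIMIT %s OFFSET %s",
                (PAGE_SIZE, offset),
            )
            rows = cur.fetchall()
            if not rows:
                break
            for row in rows:
                sync_entity(row)
            offset += PAGE_SIZE
finally:
    conn.rollback()  # read-only walk, nothing to commit; release the snapshot
    conn.close()
```

Note the deterministic ORDER BY on a unique column: even within a single snapshot, LIMIT/OFFSET paging without a stable sort can return overlapping or missing rows between pages.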

Storing vast amounts of "uptime" data for a website monitoring service

This is more of a general discussion than a code question.
I have a website monitoring platform where users can enter their website URL and we check it every X minutes, based on the customer's interval. At each interval, an entry is stored as an UptimeCheck model in the Laravel 8 project, with the status being down or up.
If a customer has 20 monitors, and each checks every minute, then over a 30-day period that one customer would accumulate close to a million rows (20 × 60 × 24 × 30 = 864,000).
My question really is: do I need to keep this number of rows?
The reason this number of rows is kept is so that we can present a graph showing the average website uptime.
My thinking is that if I created some kind of SVG programmatically for each day and stored it in the table, then I wouldn't need to store as many entries. But my concern here is: how would I merge SVG models into one to present a daily graph?
What kind of libraries could I use and how else might I approach this?
Unlike performance data, the trick for storing uptime data is simple. You don't store it. ;)
You need to store DOWNTIME data instead. Register only unavailability events and extrapolate uptime when displaying reports.
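A toy sketch of that extrapolation (the event list and dates are made up): store only outage intervals per monitor, and compute uptime for a reporting window as the window length minus the overlapping downtime.

```python
from datetime import datetime, timedelta

def uptime_percentage(window_start, window_end, downtime_events):
    """downtime_events: iterable of (down_start, down_end) datetimes."""
    window = (window_end - window_start).total_seconds()
    down = 0.0
    for start, end in downtime_events:
        # Clip each outage to the reporting window before summing it.
        overlap_start = max(start, window_start)
        overlap_end = min(end, window_end)
        if overlap_end > overlap_start:
            down += (overlap_end - overlap_start).total_seconds()
    return 100.0 * (window - down) / window

# Example: one 30-minute outage in a 30-day window -> ~99.93% uptime.
day0 = datetime(2021, 1, 1)
events = [(day0 + timedelta(days=3), day0 + timedelta(days=3, minutes=30))]
print(uptime_percentage(day0, day0 + timedelta(days=30), events))
```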

Abort a table import stuck in 'pending'

Similar questions have been asked, but none are exactly what I am looking for.
The problem: on some occasions, importing a table from Google Cloud to BigQuery gets stuck in a 'pending' state for hours, if not days. Tables that get stuck in this state never seem to come out of it, or at least we didn't bother waiting that long. I know it's not a queue issue, since in the meantime we can import other tables just fine. No errors are returned by BigQuery.
My question: in this situation, and in general, how can we safely abort/cancel an import to BigQuery without the table quietly importing later without our knowledge? This would actually apply to any table regardless of its state, as long as it hasn't finished importing.
Thanks.
You may be hitting load job rate limits. For example, if you try to start more than two load jobs per minute for the same table, the load jobs against that table will be deferred, while load jobs against other tables may continue at normal speed.
There are per-project limits on the rate at which load jobs can be started and on the number of load jobs that can be running per project at any one time. If you send jobs faster than this, we'll queue them, but as you've noticed, our queueing is not a fair queue and can start newer jobs before older ones.
Aborting pending jobs is a commonly requested feature. If you file a feature request here, that will help us prioritize it.
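For diagnosing which loads are affected, here is a small sketch using the google-cloud-bigquery Python client to list jobs still sitting in the pending state (the project ID is a placeholder):

```python
from google.cloud import bigquery

# Placeholder project; credentials are picked up from the environment as usual.
client = bigquery.Client(project="my-project")

# Jobs that have not started running yet; stuck loads show up here.
for job in client.list_jobs(state_filter="pending", max_results=50):
    print(job.job_id, job.job_type, job.created)
```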

Finding an application's scalability point using JMeter

I am trying to find an application's scalability point using JMeter. I define the scalability point as "the minimum number of concurrent users from which any increase no longer increases the throughput per second".
I am using the following technique: schedule my load test to run for an hour, starting a new thread sending SOAP/XML-RPC requests every 30 seconds. I do this by setting my number of threads to 120 and my ramp-up period to 3600 seconds.
Then I look at the TOTAL row's throughput in my Summary Report listener. A new thread is added every 30 seconds, and the total throughput rises until it plateaus, in my case at about 123 requests per second once 80 of the threads are active. The throughput then slowly drops to 120 per second as the last 20 threads are added. I conclude that my application's scalability point is 123 requests per second with 80 active users.
My question: is this a valid way to find an application's scalability point, or is there a different technique that I should be trying?
From a technical perspective, what you're doing does answer your question for one specific user scenario, though I think you might be missing the big picture.
First of all, keep in mind that the actual HTTP request you're sending and the ramp-up times can often affect what you call a scalability point. Are your requests hitting a cache? Are they not random enough? Are they too random? Do they represent real-world requests? Is 30 seconds going to give you the same results as 20 seconds or 10 seconds?
From my personal experience, it's MUCH easier and more intuitive to look at graphs when trying to analyze app performance. It's not just a question of raw numbers but also of looking at trends and rates of change.
For example, here is a test of the ghost.org blogging platform using JMeter, with an interactive JMeter results graph:
http://blazemeter.com/blog/ghost-performance-benchmark
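One way to look at the trend rather than a single plateau number is to post-process the results file; the sketch below assumes a CSV-format JTL that includes the standard timeStamp and allThreads columns, and buckets throughput by the number of active threads.

```python
import csv
from collections import defaultdict

# Requests completed per second, bucketed by the number of active threads
# (the "allThreads" column of a standard CSV JTL; timeStamp is epoch millis).
counts = defaultdict(int)    # active threads -> number of samples
seconds = defaultdict(set)   # active threads -> distinct seconds observed

with open("results.jtl", newline="") as f:
    for row in csv.DictReader(f):
        threads = int(row["allThreads"])
        counts[threads] += 1
        seconds[threads].add(int(row["timeStamp"]) // 1000)

for threads in sorted(counts):
    throughput = counts[threads] / max(len(seconds[threads]), 1)
    print(f"{threads:4d} active threads -> ~{throughput:6.1f} requests/s")
```

The scalability point is then the thread count after which the per-second rate stops rising (or starts to fall).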

REST philosophy for updating and getting records

In my app I'm displaying Race objects that essentially have three states: pending, inProgress and completed. I want to display all Races that are currently pending or inProgress, but not the ones that are completed. To do this, I want to create a RESTful API for getting these resources from my server, but I'm not sure what the best (i.e. most RESTful) approach would be.
The issue is that when someone opens or refreshes the app, I need to do two things:
Perform a GET on all the Races that are currently displayed in the client to update their status.
GET all of the new pending or inProgress Races that have been created since the client last updated
I've come up with a few different solutions, though I don't know which, if any, would be best:
Simply delete the old Race records on the client and always GET all new records
Perform 2 separate GET operations, the first which updates all the old records, and the second where I GET all the new pending / inProgress Races
Perform a single GET operation where I specify the created date of the last client record, and GET all records that are newer.
To me, this seems like a pretty common scenario but I haven't been able to find a specific answer to this type of problem. I'd like to see what SO thinks :)
Thanks in advance for your help!
Simply delete the old Race records on the client and always GET all new records
This is probably the easiest solution. However, you shouldn't do that if you need very smooth updates on your client (for games, data visualization, etc.).
Perform 2 separate GET operations (...) / Perform a single GET operation where I specify the created date of the last client record, and GET all records that are newer.
I would definitely do it with a single operation. Rather than an update timestamp (timestamp operations are costly, and several operations could happen at the same time), I would use a sequence number. This is the way CouchDB handles "changes".
Moreover, as you will see in the documentation, this solution can then be upgraded to asynchronous notifications (if you need them).
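A minimal sketch of the single-request variant with a sequence number, using Flask; the endpoint name, fields, and in-memory store are made up for illustration and are not tied to whatever framework you actually use.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Toy in-memory store: every change to a race bumps a global sequence number
# and stamps the race with it, mimicking CouchDB-style change tracking.
RACES = {
    1: {"id": 1, "status": "completed", "seq": 7},
    2: {"id": 2, "status": "inProgress", "seq": 9},
    3: {"id": 3, "status": "pending", "seq": 12},
}

@app.route("/races")
def races_changed_since():
    since = int(request.args.get("since", 0))
    changed = [r for r in RACES.values() if r["seq"] > since]
    latest = max((r["seq"] for r in RACES.values()), default=since)
    # The client stores "seq" and passes it back on the next refresh.
    return jsonify({"seq": latest, "races": changed})

# e.g. GET /races?since=8 returns the races stamped with seq 9 and 12.
```

The client keeps the `seq` value from each response and sends it back as `?since=` on the next refresh, so a single GET returns both status changes to races it already has and any newly created races.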