BigQuery: I have reached the daily limit for Load Jobs. When does the quota reset back to 0? - google-bigquery

I have exceeded the daily limit for the number of imports to a specific table.
(Max = 1000 imports according to the documentation here: https://developers.google.com/bigquery/quota-policy#import )
I would like to know exactly when the quota resets back to 0. Is it 24 hours after I exceeded the quota, or at a specific time of day?

As of July 18, 2014, all daily quotas are partially replenished every 10 minutes or so.
The first time you run a load job against a table (or if you haven't done so in a while), you'll get 1000 loads. Every few minutes the quota partially replenishes, up to a maximum of 1000 available.
While this sounds complex, it means you never get into a situation where you run out of daily quota and have to wait up to 24 hours for it to reset. Instead, if you run out of quota you can start running jobs again fairly soon afterwards (as long as you stay within the replenishment rate).
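As a rough back-of-the-envelope illustration (assuming the replenishment were spread evenly, which the quota system does not guarantee):

    1000 loads/day ÷ 144 ten-minute windows/day ≈ 7 loads replenished per window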
Hope that is helpful.

Related

How many short urls can be generated per day for Firebase Dynamic Links?

https://firebase.google.com/docs/dynamic-links/rest
The document says "Requests are limited to 5 requests/IP address/second, and 200,000 requests/day.".
So which is correct: "200,000 total requests per day" or "200,000 requests per IP address per day"?
Look at it this way: 5 requests/IP address/second is a rate limit - you can't exceed that number of requests per IP address in any given second.
Whereas 200,000 requests/day is the total limit of requests you can send per day.
So, you're probably looking for the answer as 200,000 total requests per day.
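As a quick sanity check on how the two limits interact (simple arithmetic, not a figure from the docs):

    5 requests/second × 86,400 seconds/day = 432,000 requests/day for a single IP running at the per-second ceiling

so the 200,000 requests/day project-wide cap is the limit you would hit first.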
If you require it, you can request to increase your quota using this form.
Enter the following fields to request an increase in the Firebase Dynamic Links API quota. The default spread quota is 200,000 queries per day and the default burst quota is 500 queries per 100 seconds. Consider spreading your load over a longer period of time before requesting an increase.

report scheduler system design using database as master

Problem
we have ~50k scheduled financial reports that we periodically deliver to clients via email
each report has its own delivery frequency (date & time, as configured by clients):
weekly
daily
hourly
weekdays only
etc.
Current architecture
we have a table called report_metadata that holds report information
report_id
report_name
report_type
report_details
next_run_time
last_run_time
etc...
every week, all 6 instances of our scheduler service poll the report_metadata database, extract metadata for all reports that are to be delivered in the following week, and put them in an in-memory timed queue.
Only in the master/leader instance (which is one of the 6 instances):
data in the timed-queue is popped at the appropriate time
processed
a few API calls are made to get a fully-complete and current/up-to-date report
and the report is emailed to clients
the other 5 instances do nothing - they simply exist for redundancy
Proposed architecture
Numbers:
db can handle up to 1000 concurrent connections - which is good enough
total existing report number (~50k) is unlikely to get much larger in the near/distant future
Solution:
instead of polling the report_metadata db every week and storing data in a timed-queue in-memory, all 6 instances will poll the report_metadata db every 60 seconds (with a 10 s offset for each instance)
on average the scheduler will attempt to pick up work every 10 seconds
data for any single report whose next_run_time is in the past is extracted, the table row is locked, and the report is processed/delivered to clients by that specific instance
after the report is successfully processed, the table row is unlocked and the next_run_time, last_run_time, etc. for the report are updated
In general, the database serves as the master; individual instances of the process can work independently, and the database ensures they do not overlap (a sketch of this locking pattern is shown below).
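A minimal sketch of that polling/claiming step, assuming a PostgreSQL-flavoured database where FOR UPDATE SKIP LOCKED is available (the column names come from the table above; the index name, the LIMIT, the interval, and the report id are illustrative):

    -- An index on next_run_time keeps the due-report scan cheap.
    CREATE INDEX idx_report_metadata_next_run_time ON report_metadata (next_run_time);

    -- Each instance runs this on its polling tick.
    BEGIN;

    -- Claim due reports; rows already locked by another instance are skipped.
    SELECT report_id, report_name, report_details
    FROM report_metadata
    WHERE next_run_time <= NOW()
    ORDER BY next_run_time
    LIMIT 10
    FOR UPDATE SKIP LOCKED;

    -- ... generate and email each claimed report ...

    -- After a claimed report is delivered, reschedule it and release the lock.
    UPDATE report_metadata
    SET last_run_time = NOW(),
        next_run_time = NOW() + INTERVAL '1 hour'  -- would follow the report's own frequency
    WHERE report_id = 42;                          -- the id claimed above (illustrative)

    COMMIT;

Note that keeping the transaction open while a report is generated holds the row lock for the whole delivery; if report generation is slow, a flag-based claim (discussed in the answers below) avoids long-lived transactions.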
It would help if you could let me know:
whether the proposed architecture is a good/correct solution
which table columns can/should be indexed
any other considerations
I have worked on a different kind of scheduler for a program that reported analyses at specific moments of the month/week. What I did was group the reports into so-called business-cycle-based time moments: "start of a new week", "start of the month", "start/end of a day/week/month/quarter/year". So I standardised the moments at which reports are sent and added the report ids to a table that carries the details of each report. You can then add reports to a cycle, or remove them when needed, by tagging them with something like EOD (end of day), EOM (end of month), SOW (start of week), etc.
So you could index the moments at which clients want to receive the reports and build on that. Hope this helps you with your challenge.
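A minimal sketch of that tagging idea, assuming a hypothetical cycle_tag column added to report_metadata (the column, the index name, and the values are illustrative, not from the post):

    -- Tag each report with its business-cycle moment and index the tag.
    ALTER TABLE report_metadata ADD COLUMN cycle_tag VARCHAR(8);  -- e.g. 'EOD', 'EOM', 'SOW'
    CREATE INDEX idx_report_metadata_cycle_tag ON report_metadata (cycle_tag);

    -- Pull everything due at a given moment, e.g. end of month:
    SELECT report_id, report_name, report_details
    FROM report_metadata
    WHERE cycle_tag = 'EOM';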
It seems good to simply query that metadata table by all 6 instances to check which is the next report to process as you are suggesting.
It seems odd though to have a staggered approach with a check once every 60 seconds, offset by 10 seconds per server. You have 6 servers now, but that may change. Also, I don't understand the "locking" you are suggesting - why not simply set a flag on the row, such as [State] = "processing", so the next scheduler knows to skip that row and move on to the next available one? Once a run is processed, you can simply update a [Date_last_processed] column, or maybe something like [last_cycle_complete] = 'YES'.
Alternatively, you could have a single server process go through the table and, for each available row, send it off to one of the instances in a round-robin fashion (or keep track of who is busy and who isn't).
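A minimal sketch of the flag-based claim, assuming hypothetical state and claimed_by columns on report_metadata and PostgreSQL-flavoured interval syntax (all names are illustrative; the conditional UPDATE is one way to make the claim atomic):

    -- Try to claim one due report; the claim only succeeds if nobody else has it.
    UPDATE report_metadata
    SET state = 'processing',
        claimed_by = 'scheduler-3'
    WHERE report_id = 42
      AND state = 'idle';

    -- If exactly 1 row was affected, this instance owns the report. After delivery:
    UPDATE report_metadata
    SET state = 'idle',
        last_run_time = NOW(),
        next_run_time = NOW() + INTERVAL '1 day'   -- per the report's own frequency
    WHERE report_id = 42;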

How to alert on an event that normally happens once a day?

I have a batch job that runs once per day.
At the end of the job I submit a meter metric with a count of the items processed.
I want to alert if one day this metric is not updated.
On http://metrics.librato.com the maximum time I can check "not reported for" when creating an alert is 60 minutes.
I thought maybe I can create a composite metric and take the avg rate of change over the past 24 hours, and alert if that reaches zero.
I've been trying:
derive(s("my.metric", "%", {function:"sum", period:"86400"}))
However it seems that, because I log only a single event, for values of period above roughly 250 s my rate of change simply drops to zero... I guess the low frequency means my single value is completely lost in the sampling.
Maybe I am using the wrong tool for the job...
Is there a way to achieve this in Librato?
There currently is not a way to achieve this as composite metrics are subject to the 60 minute limitation of alerts as well (as of 5/15/2015). You may need to look into configuring the metric (or a similar metric) to report within the 60m time range if possible.

Best way to increment health of a user in a game every minute

Previously, for my PHP app, I used a cron job that ran every 10 minutes and incremented the health of all users in SQL.
For my next app, I tried using MySQL events to increment the health every minute for each individual user, and ran into problems with them silently stopping after a while.
What's the best way to do this if I were to create a new app in Ruby on Rails? I'm open to using MySQL or PostgreSQL.
This is for a game where users will fight each other and lose health.
edit: Sometimes the user will encounter another user, and I need to select that user based on their health among other things. So I need the actual health stored in the database.
Instead of updating every record in the database every 10 minutes, store a last-modified timestamp in the same row as the health. Every time you read the player_health from the database, add (current_time - last_modified) / (10 min) to the value. Every time you write player_health to the database, update the last_modified.
I would create a rake task that increases all users' health by 10, and call it using the awesome whenever gem every 10 minutes.
UPDATE
However, as Dan said in his comment, it might be inefficient to do such a huge DB update every 10 minutes (especially if you have a huge number of users) if you can instead update each user's health only when they request it. But that's subject to how your game actually works.
The correct fix, given a health bump of 10 points every minute, is a hitpoints variable and a timestamp for the last time it was set. Then the select statement will say "hitpoints + minutes(now - timestamp) * 10". Converting that to SQL is left as an exercise for the reader.
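A minimal sketch of that conversion, assuming a MySQL-style users table with hitpoints and health_updated_at columns and a cap of 100 (the table, column names, and cap are illustrative, not from the answer):

    -- Effective health = stored hitpoints + 10 per full minute since the last write, capped at 100.
    SELECT LEAST(hitpoints + 10 * TIMESTAMPDIFF(MINUTE, health_updated_at, NOW()), 100)
           AS current_health
    FROM users
    WHERE id = 1;

    -- Whenever health actually changes (e.g. the user takes damage), write both columns:
    UPDATE users
    SET hitpoints = 37,                 -- the newly computed value
        health_updated_at = NOW()
    WHERE id = 1;

The same expression can also go into a WHERE or ORDER BY clause when you need to select an opposing user by their current health, which covers the edit in the question.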

Dealing with Amazon Product Advertising API Throttle limits

For those of you who use the Amazon Product Advertising API, what experience have you had with running into their throttle? Supposedly, the limit is set at 1 request per second, is that your experience?
I want my site to grow to be nation-wide, but I'm concerned about its capability to make all the Amazon API requests without getting throttled. We cache all the responses for 24 hours, and also throttle our own users who make too many searches within a short period.
Should I be concerned? Any suggestions?
I believe they have changed it. Per this link:
https://forums.aws.amazon.com/message.jspa?messageID=199771
Hourly request limit per account = 2,000 + 500 * [average associate revenue driven per day over the past 30-day period] / 24, up to a maximum of 25,000 requests per hour.
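To make that formula concrete (the $48/day figure is purely illustrative):

    2,000 + 500 × ($48 per day) / 24 = 2,000 + 1,000 = 3,000 requests per hour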
Here is the latest on request limits that I could find, effective Sept 3rd, 2012.
If your application is trying to submit requests that exceed the maximum request limit for your account, you may receive error messages from Product Advertising API. The request limit for each account is calculated based on revenue performance. Each account used to access the Product Advertising API is allowed an initial usage limit of 1 request per second. Each account will receive an additional 1 request per second (up to a maximum of 10 requests per second) for every $4,600 of shipped item revenue driven per hour in a trailing 30-day period.
https://affiliate-program.amazon.com/gp/advertising/api/detail/faq.html
They have updated their guidelines; you now get more requests as you sell more items.
Effective 23-Jan-2019, the request limit for each account is calculated based on revenue performance attributed to calls to the Product Advertising API (PA API) during the last 30 days. Each account used for Product Advertising API is allowed an initial usage limit of 8640 requests per day (TPD) subject to a maximum of 1 request per second (TPS). Your account will receive an additional 1 TPD for every 5 cents or 1 TPS (up to a maximum of 10) for every $4320 of shipped item revenue generated via the use of Product Advertising API for shipments in the last 30 days.
Source: https://docs.aws.amazon.com/AWSECommerceService/latest/DG/TroubleshootingApplications.html
Amazon enforces limits on how many calls you can make per hour and per second.
You can increase the former by following the sanctioned route (increase commission revenue) or by privately petitioning Amazon with a valid reason. When whitelisted, your limit will go up to 25,000 calls per hour, which is more than good enough for the vast majority of projects I can think of.
The latter limit is murkier and enforced depending on the type of query you make. My interpretation is that it is meant to keep serial crawlers who do batch item lookups in check. If you are simply doing keyword searches etc., I would not worry so much about it. Otherwise, the solution is to distribute your calls across multiple IPs.
One other point to keep in mind if you are querying multiple locales is to use a separate account per locale. Some locales are grouped and count toward the same call quota - the European Amazon sites, for instance, form such a pool.