I'm having trouble understanding slots in BigQuery. The documentation is a lot of marketing and at least for me not very helpful.
Specifically I was looking at Flex slots. This is what I think I understood so far:
If I buy 500 flex slots, I will not have to pay anything for the time being.
I have to create a reservation first to apply these slots.
My questions would be:
In the BQ UI, how do I define at query time whether I want to use flex slots or stay on my on-demand pricing?
How do I cancel the reservation afterwards, so it's only billed for the time the query runs?
How would I control costs in general?
There is no way to change constantly between both pricing methods. However, there is a workaround that might work for you:
Beforehand you need to specify which projects within your organization will be charged using the slots and which will be charged using on-demand billing.
Then you can switch to the project you want to run your query in; the project determines the billing type used for the query.
Make sure to give all the projects permission to access the BigQuery resources within the organization.
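A minimal sketch of this workaround, assuming two hypothetical project IDs (`analytics-flat-rate`, assigned to a reservation, and `analytics-on-demand`, left unassigned). Since the project a job runs in determines how it is billed, choosing the billing model comes down to choosing the project:

```python
# Hypothetical project IDs; replace with projects in your own organization.
BILLING_PROJECTS = {
    "slots": "analytics-flat-rate",      # assumed to be assigned to a reservation
    "on_demand": "analytics-on-demand",  # assumed to have no assignment
}


def project_for(billing_mode: str) -> str:
    """Return the project a query should run in for the given billing mode."""
    try:
        return BILLING_PROJECTS[billing_mode]
    except KeyError:
        raise ValueError(f"unknown billing mode: {billing_mode!r}") from None


# With the google-cloud-bigquery client, the job's project is chosen when the
# client is created, for example:
#   client = bigquery.Client(project=project_for("slots"))
print(project_for("slots"))
```

In the BQ UI the equivalent step is switching the selected project before running the query.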
I understand you mean how to cancel the commitment (bear in mind the difference between a commitment and a reservation). A commitment is the purchase of BigQuery slots. Reservations are only a way to divide the slots purchased in the commitment so that only specific projects or regions can use them (as explained in answer 1).
If you actually meant the commitment for flex slots, you cannot cancel it for the first 60 seconds after the commitment becomes active. Afterward, you can cancel at any time and it will stop charging you.
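As a rough illustration of what this means for cost control, here is a sketch that assumes flex slots are billed per second and that the 60-second lock-in acts as a minimum charge. The `price_per_slot_hour` argument is a placeholder, not a real rate; take the actual price from the current pricing page.

```python
def flex_billable_seconds(active_seconds: float) -> float:
    """Seconds billed for a flex commitment (assumed 60-second minimum)."""
    return max(active_seconds, 60.0)


def flex_cost(slots: int, active_seconds: float, price_per_slot_hour: float) -> float:
    """Estimated cost of holding `slots` flex slots for `active_seconds`.

    price_per_slot_hour is a hypothetical input, not an official rate.
    """
    hours = flex_billable_seconds(active_seconds) / 3600
    return slots * hours * price_per_slot_hour


# Example: 500 slots held for 10 minutes at a made-up rate of 0.04 per slot-hour.
print(round(flex_cost(500, 600, 0.04), 2))
```

The point of the sketch is that the bill scales with how long the commitment exists, not with how long the query runs, so the commitment has to be cancelled promptly after the work is done.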
Related
I am trying to use slots to run my queries, so I asked BigQuery to increase my quota. After my request was approved (5 days!), I clicked "buy slots" and selected 400 (I have 500 available). I thought that would be enough to make sure that my queries run on those 400 slots and not on the on-demand (serverless) model.
Unfortunately, I got the bill for my queries the day after and saw that I was charged at the on-demand rate.
I tried to use BigQuery chat support to understand how to use those 400 slots and guarantee that my queries run on them, but I didn't get any useful answer!
Does anyone know how I can use those slots to run my queries? What did I do wrong?
thanks,
After you buy BigQuery slots, you have to create a "reservation" and assign it to a project. At that point, all the queries running inside that project (note that they can reference tables outside the project) will use that reservation's slots. See Assign a project to a reservation for more details.
Very basic question. If I purchase flex slots on BigQuery related with a specific project id, without (1) creating a reservation manually, and (2) assigning those slots, are my queries related to this project automatically going to be billed using flex slots?
I assume so - the unclear documentation suggests that a 'default' reservation is created when you purchase slots. Therefore, I imagine BigQuery recognizes the user's intention, unless otherwise specified, is to use the purchased capacity.
It would be a double whammy though if I was charged on-demand pricing while my slots were idle. And, I sense that given I reserved 100 slots, my queries feel slower. But I can't see a way to confirm the jobs used reservations.
Reservations
After you purchase slots, you can assign them to different buckets, called reservations. Reservations let you allocate the slots in ways that make sense for your particular organization.
A reservation named default is automatically created when you purchase slots.
There is nothing special about the default reservation — it's created as a convenience. You can decide whether you need additional reservations or just use the default reservation.
For example, you might create a reservation named prod for production workloads, and a separate reservation named test for testing. That way, your test jobs won't compete for resources that your production workloads need. Or, you might create reservations for different departments in your organization.
Assignments
To use the slots that you purchase, you will assign projects, folders, or organizations to reservations. Each level in the resource hierarchy inherits the assignment from the level above it, unless you override. In other words, a project inherits the assignment of its parent folder, and a folder inherits the assignment of its organization.
When a job is started from a project that is assigned to a reservation, the job uses that reservation's slots.
If a project is not assigned to a reservation (either directly or by inheriting from its parent folder or organization), the jobs in that project use on-demand pricing.
None assignments represent an absence of an assignment. Projects assigned to None use on-demand pricing. The common use case for None assignments is to assign an organization to the reservation and to opt-out some projects or folders from that reservation by assigning them to None. For more information, see Assign a project to None.
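The inheritance rules above can be modeled in a few lines. This is only an illustrative sketch, not the Reservations API: the resource names, the `parents` map, and the `assignments` map are all hypothetical, and job types are ignored for simplicity.

```python
from typing import Dict, Optional

ON_DEMAND = "on-demand"
NONE = "None"  # explicit opt-out assignment


def effective_billing(resource: str,
                      parents: Dict[str, Optional[str]],
                      assignments: Dict[str, str]) -> str:
    """Walk up the hierarchy and return the first assignment found."""
    current: Optional[str] = resource
    while current is not None:
        assigned = assignments.get(current)
        if assigned is not None:
            # An explicit None assignment overrides anything inherited.
            return ON_DEMAND if assigned == NONE else assigned
        current = parents.get(current)
    # Nothing assigned anywhere up the chain.
    return ON_DEMAND


# Hypothetical hierarchy: two projects in one folder under one organization.
parents = {"proj-a": "folder-x", "proj-b": "folder-x", "folder-x": "org", "org": None}
# The organization is assigned to "prod"; proj-b opts out with None.
assignments = {"org": "prod", "proj-b": NONE}

print(effective_billing("proj-a", parents, assignments))  # prod
print(effective_billing("proj-b", parents, assignments))  # on-demand
```

Here `proj-a` inherits the organization's reservation through its folder, while `proj-b` falls back to on-demand pricing because of its own None assignment.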
Creating assignments
When you create an assignment, you specify the job type for that assignment:
QUERY: Use this reservation for query jobs, including SQL, DDL, DML, and BigQuery ML queries.
PIPELINE: Use this reservation for load, export, and other pipeline jobs.
By default, load and export jobs are free and use a shared pool of slots. BigQuery does not make guarantees about the available capacity of this shared pool. If you are loading large amounts of data, your job may wait as slots become available. In that case, you might want to purchase dedicated slots and assign pipeline jobs to them. We recommend creating an additional dedicated reservation with idle slot sharing disabled.
When load jobs are assigned to a reservation, they lose access to the free pool. Monitor performance to make sure the jobs have enough capacity. Otherwise, performance could actually be worse than using the free pool.
ML_EXTERNAL: Use this reservation for BigQuery ML queries that use services that are external to BigQuery.
Certain BigQuery ML queries use services that are external to BigQuery. To use reserved slots with these external services, create an assignment with job type ML_EXTERNAL.
Screenshots
A full guide with screenshots on how to work with Reservations and Assignments is here.
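To confirm whether recent jobs actually ran on a reservation, one option is to query the jobs metadata. The sketch below only builds the SQL text. It assumes the `INFORMATION_SCHEMA.JOBS_BY_PROJECT` view exposes `reservation_id`, `total_slot_ms`, and `total_bytes_billed` as I understand it does, and that `reservation_id` is empty for on-demand jobs; the function name and defaults are hypothetical.

```python
def reservation_usage_sql(region: str = "us", lookback_hours: int = 24) -> str:
    """Build a query listing recent query jobs and the reservation they used."""
    if not region.replace("-", "").isalnum():
        raise ValueError(f"unexpected region name: {region!r}")
    hours = int(lookback_hours)
    return f"""
SELECT
  job_id,
  creation_time,
  reservation_id,      -- expected to be empty for on-demand jobs
  total_slot_ms,
  total_bytes_billed
FROM `region-{region}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {hours} HOUR)
  AND job_type = 'QUERY'
ORDER BY creation_time DESC
""".strip()


print(reservation_usage_sql("us", 24))
```

Paste the printed SQL into the BQ UI while the project in question is selected. Rows with a reservation name used purchased slots; rows without one were most likely billed on demand.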
With the new beta BI Engine Reservations, I've noticed some queries speed up, but others remain unaffected. Will it be possible
- to monitor how the reservation is being used?
- to have some control over how the reservation is used?
When it comes to control, I've seen no indication that you'll have any—the system decides what the most efficient mechanism is (BI Engine, query cache, etc.) and then allocates accordingly. Also, the size of your reservation, usage, and age are factored into what is added and subsequently removed from the BI Engine reservation.
While that may seem frustrating, it's also the selling point: zero-config, automatic acceleration of your dashboards. As Google iterates quickly on these products, I would expect some controls to find their way in eventually.
As a workaround, you could use a separate project for data you want to ensure has access to the full reservation (since BI Engine is project-level).
As was mentioned elsewhere, there are a handful of metrics that can be viewed using Stackdriver logging (if you enable it). These are all high-level metrics, and are listed in the documentation:
Reservation Total Bytes
Reservation Used Bytes
Inflight Requests
Request Count
Request Execution Times
These won't likely give you a lot of the information you're looking for, but can be monitored for patterns.
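For example, the two byte metrics can be combined into a utilization figure and watched over time. This is a sketch over hand-entered samples; how the metric values are exported is left open, and the 0.9 threshold is an arbitrary assumption.

```python
def reservation_utilization(used_bytes: float, total_bytes: float) -> float:
    """Fraction of the BI Engine reservation currently in use (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def near_capacity(samples, threshold: float = 0.9):
    """Return the (used, total) samples at or above the utilization threshold."""
    return [s for s in samples if reservation_utilization(*s) >= threshold]


GIB = 2 ** 30
MIB = 2 ** 20
# Hypothetical (Reservation Used Bytes, Reservation Total Bytes) samples.
samples = [(512 * MIB, GIB), (973 * MIB, GIB)]
print(len(near_capacity(samples)))
```

A reservation that sits near full utilization for long stretches is a hint that data is being evicted and that some queries are falling back to regular execution.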
You can use Elasticsearch and Logstash for monitoring and for implementing a security environment. The way it works is simple, and it runs in near real time.
Google Cloud billing is not updating with the free trial (on monthly payments) and I cannot change it to a faster update cycle. As per https://cloud.google.com/free-trial/docs/billing-during-free-trial the bill should come every month.
It is therefore not easy to see how much of the $300 is left.
Is there any way to at least see how many TBs my queries used? This should be by far the biggest item on the bill.
I am concerned that I might get 'stuck' between some important queries that I otherwise could have managed better to have at least partial results available after the trial ends.
BigQuery analysis & storage costs should be listed under your GCP billing transactions:
https://console.cloud.google.com/billing/<INSERT_YOUR_BILLING_ID_HERE>/history?e=13803970,13803205
Another way to see how much you have queried is by enabling audit logging as described here.
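Once you have the bytes billed per query (from the transaction history or the audit logs), the remaining credit can be estimated yourself. A minimal sketch, assuming analysis is billed per TiB scanned; `price_per_tib` is a placeholder to be filled in from the current pricing page, and storage costs and any free quota are ignored.

```python
TIB = 2 ** 40


def on_demand_cost(bytes_billed: int, price_per_tib: float) -> float:
    """Estimated analysis cost for one query (price is a caller-supplied assumption)."""
    return bytes_billed / TIB * price_per_tib


def credit_left(credit: float, bytes_billed_per_query, price_per_tib: float) -> float:
    """Remaining trial credit after the listed queries."""
    spent = sum(on_demand_cost(b, price_per_tib) for b in bytes_billed_per_query)
    return credit - spent


# Example with a made-up price of 5.0 per TiB and two queries of 1 TiB each.
print(credit_left(300.0, [TIB, TIB], 5.0))  # 290.0
```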
I am trying to download as much information from Bloomberg for as many securities as I can. This is for a machine learning project, and I would like to have the data reside locally, rather than querying for it each time I need it. I know how to download information for a few fields for a specified security.
Unfortunately, I am pretty new to Bloomberg. I've taken a look at the excel add-in, and it doesn't allow me to specify that I want ALL securities and ALL their data fields.
Is there a way to blanket download data from Bloomberg via excel? Or do I have to do this programmatically. Appreciate any help on how to do this.
Such a request is unreasonable. Bloomberg has tens of thousands of fields for each security, from fundamental fields like sales, through technical analysis like Bollinger bands, to whether the CEO is a woman and whether the company abides by Islamic law. I doubt all of these interest you.
Also, some fields come in "flavors". Bloomberg allows you to set arguments when requesting a field, these are called "overrides". For example, when asking for an analyst recommendation, you could specify whether you're interested in yearly or quarterly recommendation, you could also specify how do you want the recommendation consensus calculated? Are you interested in GAAP or IFRS reporting? What type of insider buys do you want to consider? I hope I'm making it clear, the possibilities are endless.
My recommendation is, when approaching a project like you're describing: think in advance what aspects of the security do you want to focus on? Are you looking for value? growth? technical analysis? news? Then "sit down" with a Bloomberg rep and ask what fields apply to this aspect. Then download those fields.
Also, try to reduce your universe of securities. Bloomberg has data for hundreds of thousands of equities. The total number of securities (including non-equities) is probably many millions. You should reduce that universe to securities that interest you (only EU? only US? only above a certain market capitalization?). This could make your research more applicable to real life. What I mean is that if you find out that certain behavior indicates a stock is going to go up, but you can't buy that stock, then that's not that interesting.
I hope this helps, even if it doesn't really answer the question.
They have specific "Data Licence" products available if you or your company can fork out the (likely high) sums of money for bulk data dumps. Otherwise, as has been mentioned, there are daily and monthly restrictions on how much data (and what type of data) is downloaded via their API. These limits are not very high at all and so, by the sounds of your request, this will take a long and frustrating time. I think the daily limits are something like 500,000 hits, where one hit is one item of data, e.g. a price for one stock. So if you wanted to download only share price data for the 2500 or so US stocks, you'd only manage 200 days for each stock before hitting the limit. And they also monitor your usage, so if you were consistently hitting 500,000 each day, you'd get a phone call.
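The arithmetic behind that estimate, as a small sketch. The 500,000 figure is the approximate limit quoted above, not a confirmed number, and the function name is hypothetical.

```python
def days_of_history(daily_hit_limit: int, securities: int, fields: int = 1) -> int:
    """Days of history per security retrievable within one day's hit budget.

    Assumes one hit per security, per field, per day of history.
    """
    hits_per_day_of_history = securities * fields
    if hits_per_day_of_history <= 0:
        raise ValueError("securities and fields must be positive")
    return daily_hit_limit // hits_per_day_of_history


print(days_of_history(500_000, 2_500))     # 200
print(days_of_history(500_000, 2_500, 5))  # 40
```

Adding more fields shrinks the budget proportionally, which is why narrowing both the field list and the security universe matters.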
One tedious way around this is to manually retrieve data via the clipboard. You can load a chart of something (GP), right click and copy data to clipboard. This stores all data points that are on display, which you can dump in excel. This is obviously an extremely inefficient method but, crucially, has no impact on your data limits.
Unfortunately you will find no answer to your (somewhat unreasonable) request, without getting your wallet out. Data ain't cheap. Especially not "all securities and all data".
You say you want to download "ALL securities and ALL their data fields." You can't.
You should go to WAPI on your terminal and look at the terms of service.
From the "extended rules:"
There is a daily limit to the number of hits you can make to our data servers via the Bloomberg API. A "hit" is defined as one request for a single security/field pairing. Therefore, if you request static data for 5 fields and 10 securities, that will translate into a total of 50 hits.
There is a limit to the number of unique securities you can monitor at any one time, where the number of fields is unlimited via the Bloomberg API.
There is a monthly limit that is based on the volume of unique securities being requested per category (i.e. historical, derived, intraday, pricing, descriptive) from our data servers via the Bloomberg API.
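The hit definition in the first rule is easy to check before sending a request. A minimal sketch with placeholder security and field names (they are not real Bloomberg identifiers):

```python
def hits_for_request(securities, fields) -> int:
    """Hits consumed by a static-data request: one per security/field pairing."""
    return len(securities) * len(fields)


# Placeholder names for illustration only.
securities = [f"SEC{i}" for i in range(10)]
fields = [f"FIELD{i}" for i in range(5)]

print(hits_for_request(securities, fields))  # 50
```

Running this kind of count over a planned download shows how quickly "all securities, all fields" exhausts any daily limit.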