Are BigQuery reservations automatic?

Very basic question. If I purchase flex slots on BigQuery tied to a specific project ID, without (1) creating a reservation manually and (2) assigning those slots, will my queries in this project automatically be billed using flex slots?
I assume so - the unclear documentation suggests that a 'default' reservation is created when you purchase slots. Therefore, I imagine BigQuery recognizes the user's intention, unless otherwise specified, is to use the purchased capacity.
It would be a double whammy though if I was charged on-demand pricing while my slots were idle. And, I sense that given I reserved 100 slots, my queries feel slower. But I can't see a way to confirm the jobs used reservations.

Reservations
After you purchase slots, you can assign them to different buckets, called reservations. Reservations let you allocate the slots in ways that make sense for your particular organization.
A reservation named default is automatically created when you purchase slots.
There is nothing special about the default reservation — it's created as a convenience. You can decide whether you need additional reservations or just use the default reservation.
For example, you might create a reservation named prod for production workloads, and a separate reservation named test for testing. That way, your test jobs won't compete for resources that your production workloads need. Or, you might create reservations for different departments in your organization.
Assignments
To use the slots that you purchase, you will assign projects, folders, or organizations to reservations. Each level in the resource hierarchy inherits the assignment from the level above it, unless you override. In other words, a project inherits the assignment of its parent folder, and a folder inherits the assignment of its organization.
When a job is started from a project that is assigned to a reservation, the job uses that reservation's slots.
If a project is not assigned to a reservation (either directly or by inheriting from its parent folder or organization), the jobs in that project use on-demand pricing.
None assignments represent an absence of an assignment. Projects assigned to None use on-demand pricing. The common use case for None assignments is to assign an organization to the reservation and to opt-out some projects or folders from that reservation by assigning them to None. For more information, see Assign a project to None.
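To make the inheritance rule concrete, here is a minimal sketch of how an assignment could be resolved for a project. It is only a model of the behaviour described above, not BigQuery's implementation: the function name, the resource names, and the dictionaries are all hypothetical, and job types are ignored for simplicity.

```python
from typing import Dict, Optional

ON_DEMAND = "on-demand"


def resolve_billing(
    resource: str,
    parents: Dict[str, Optional[str]],
    assignments: Dict[str, Optional[str]],
) -> str:
    """Walk up project -> folder -> organization until an assignment is found.

    `parents` maps each resource to its parent (None at the top).
    `assignments` maps a resource to a reservation name, or to None for an
    explicit "None" assignment. Both structures are hypothetical.
    """
    current: Optional[str] = resource
    while current is not None:
        if current in assignments:
            reservation = assignments[current]
            # An explicit None assignment opts out of any inherited reservation.
            return ON_DEMAND if reservation is None else reservation
        current = parents.get(current)
    # No assignment anywhere in the hierarchy: jobs use on-demand pricing.
    return ON_DEMAND


# Hypothetical hierarchy: two projects in a folder, the folder in an organization.
parents = {"proj-a": "folder-x", "proj-b": "folder-x", "folder-x": "org-1", "org-1": None}

# Slots purchased but nothing assigned: every project stays on-demand.
print(resolve_billing("proj-a", parents, {}))

# Organization assigned to the default reservation, proj-b opted out via None.
assignments = {"org-1": "default", "proj-b": None}
print(resolve_billing("proj-a", parents, assignments))
print(resolve_billing("proj-b", parents, assignments))
```

The first call shows the case from the question: with slots purchased but no assignment made, the project resolves to on-demand pricing.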
Creating assignments
When you create an assignment, you specify the job type for that assignment:
QUERY: Use this reservation for query jobs, including SQL, DDL, DML, and BigQuery ML queries.
PIPELINE: Use this reservation for load, export, and other pipeline jobs.
By default, load and export jobs are free and use a shared pool of slots. BigQuery does not make guarantees about the available capacity of this shared pool. If you are loading large amounts of data, your job may wait as slots become available. In that case, you might want to purchase dedicated slots and assign pipeline jobs to them. We recommend creating an additional dedicated reservation with idle slot sharing disabled.
When load jobs are assigned to a reservation, they lose access to the free pool. Monitor performance to make sure the jobs have enough capacity. Otherwise, performance could actually be worse than using the free pool.
ML_EXTERNAL: Use this reservation for BigQuery ML queries that use services that are external to BigQuery.
Certain BigQuery ML queries use services that are external to BigQuery. To use reserved slots with these external services, create an assignment with job type ML_EXTERNAL.
Screenshots
A full guide with screenshots on how to work with Reservations and Assignments is here.

Related

BigQuery flex slot reservations

I'm having trouble understanding slots in BigQuery. The documentation is a lot of marketing and at least for me not very helpful.
Specifically I was looking at Flex slots. This is what I think I understood so far:
If I buy 500 flex slots, I will not have to pay anything for the time being.
I have to create a reservation first to apply these slots.
My questions would be:
In the BQ UI, how do I define on query time if I want to use flex slots or stay on my on demand pricing?
How do I cancel the reservation afterwards, so it's only billed for the time the query runs?
How would I control costs in general?
There is no way to switch between the two pricing methods on a per-query basis. However, there is a workaround that might work for you:
Beforehand, you need to specify which projects within your organization will be charged using the slots and which will be charged using on-demand billing.
Then you can switch to the project you want to run your query in; the project determines the billing type used for the query.
Make sure all the projects have permission to access the BigQuery resources within the organization.
I understand you mean how to cancel the commitment (bear in mind the difference between a commitment and a reservation). A commitment is the purchase of BigQuery slots. Reservations are only a way to divide the slots purchased in the commitment so that only specific projects or regions can use them (as explained in the workaround above).
If you actually meant the commitment for flex slots, you cannot cancel it for 60 seconds after the commitment becomes active. Afterward, you can cancel at any time and it will stop charging you.
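For cost control, a rough estimate of what a flex commitment costs can be sketched as below. This assumes the charge accrues per second for as long as the commitment is active (whether or not queries are running) and that the 60-second window above acts as a minimum. The price per slot-hour is a hypothetical parameter, not a quoted rate; check the current pricing page for the real figure.

```python
MIN_COMMITMENT_SECONDS = 60  # a flex commitment cannot be cancelled before this


def flex_commitment_cost(
    slots: int, active_seconds: float, price_per_slot_hour: float
) -> float:
    """Estimate the cost of a flex commitment that stays active for `active_seconds`.

    The bill follows the lifetime of the commitment, not the runtime of
    individual queries. `price_per_slot_hour` is an assumed input.
    """
    if slots <= 0 or active_seconds < 0 or price_per_slot_hour < 0:
        raise ValueError("slots must be positive; time and price non-negative")
    billed_seconds = max(active_seconds, MIN_COMMITMENT_SECONDS)
    return slots * (billed_seconds / 3600.0) * price_per_slot_hour


HYPOTHETICAL_PRICE = 0.04  # assumed price per slot-hour, for illustration only

# 500 slots cancelled after 30 seconds are still billed for the 60-second minimum.
print(round(flex_commitment_cost(500, 30, HYPOTHETICAL_PRICE), 4))

# 500 slots left active for one hour.
print(round(flex_commitment_cost(500, 3600, HYPOTHETICAL_PRICE), 4))
```

The practical consequence is that forgetting to cancel the commitment is the main cost risk: the slots keep accruing charges even when no query is running.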

Aggregate or not aggregate

I have a User model, which is an aggregate. I also plan to create a WorkingHours object: every user will have their own working hours per day. There will also be a graphical user interface, separate from User, for adding, removing, and updating hours. Should I put all the WorkingHours operations into UserRepository, or should I treat the WorkingHours model as an aggregate and create a separate WorkingHoursRepository, so that User holds a property with the ID of a WorkingHours object? Which option should I choose?
My inclination is not to make WorkingHours an aggregate, because every set of working hours belongs to a specific user, which, if I understand correctly, makes it dependent on User and unable to exist without it. The only reason I can see to make it an aggregate with its own repository is cleaner code, that is, not putting all the CRUD operations in the same repository. I suppose that alone is not a good reason to separate it, so to me the only option is to model WorkingHours as a value object rather than an aggregate and use UserRepository for it.
You design your Domain Model based on your business requirements and not on how it needs to be saved.
In this scenario, if Working Hours can only be manipulated within the User domain and you think User is the only aggregate required, then Working Hours should not be made an aggregate. That said, this does not stop you from saving your data in a clean manner in your data store. The strategy for storing your data also depends a lot on your type of data store.
For example, if you are using SQL and your data is stored in multiple tables, then you can commit or roll back the entire transaction. How you implement it is not tied to DDD as long as you adhere to the concept that aggregates should only be updated via the root entity.
If you are using a NoSQL database like Cosmos DB, you can choose to load or save the entire document. In that case, you would only be dealing with the User repository.
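As an illustration of updating only through the root entity, here is a minimal sketch with User as the aggregate root and WorkingHours as an immutable value object. The class and method names are hypothetical, and the in-memory repository merely stands in for whatever data store you use.

```python
from dataclasses import dataclass, field
from datetime import time
from typing import Dict, Optional


@dataclass(frozen=True)
class WorkingHours:
    """Value object: immutable, compared by value, has no identity of its own."""
    start: time
    end: time

    def __post_init__(self) -> None:
        if self.start >= self.end:
            raise ValueError("start must be earlier than end")


@dataclass
class User:
    """Aggregate root: every change to working hours goes through User."""
    user_id: str
    hours: Dict[str, WorkingHours] = field(default_factory=dict)

    def set_working_hours(self, day: str, start: time, end: time) -> None:
        # Replacing the value object keeps the invariant check in one place.
        self.hours[day] = WorkingHours(start, end)

    def remove_working_hours(self, day: str) -> None:
        self.hours.pop(day, None)

    def working_hours_for(self, day: str) -> Optional[WorkingHours]:
        return self.hours.get(day)


class InMemoryUserRepository:
    """Single repository: loads and saves the whole aggregate at once."""

    def __init__(self) -> None:
        self._users: Dict[str, User] = {}

    def save(self, user: User) -> None:
        self._users[user.user_id] = user

    def get(self, user_id: str) -> User:
        return self._users[user_id]


repo = InMemoryUserRepository()
user = User("u-1")
user.set_working_hours("monday", time(9, 0), time(17, 0))
repo.save(user)
print(repo.get("u-1").working_hours_for("monday"))
```

A separate GUI for editing hours does not change this design: the screen would load the User, call its methods, and save it back through the same repository.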
Hope this helps.

Controlling and monitoring use of BI Engine Reservations

With the new beta BI Engine Reservations, I've noticed some queries speed up, but others remain unaffected. Will it be possible
- to monitor how the reservation is being used?
- to have some control over how the reservation is used?
When it comes to control, I've seen no indication that you'll have any—the system decides what the most efficient mechanism is (BI Engine, query cache, etc.) and then allocates accordingly. Also, the size of your reservation, usage, and age are factored into what is added and subsequently removed from the BI Engine reservation.
While that may seem frustrating, it's also the selling point: zero-config, automatic acceleration of your dashboards. As Google iterates quickly on these products, I would expect some controls to find their way in eventually.
As a workaround, you could use a separate project for data you want to ensure has access to the full reservation (since BI Engine is project-level).
As was mentioned elsewhere, there are a handful of metrics that can be viewed using Stackdriver logging (if you enable it). These are all high-level metrics, and are listed in the documentation:
Reservation Total Bytes
Reservation Used Bytes
Inflight Requests
Request Count
Request Execution Times
These won't likely give you a lot of the information you're looking for, but can be monitored for patterns.
You can also use Elasticsearch and Logstash for monitoring and for implementing a secure environment. The setup is simple and works in near real time.

How can blockchains be used in audit trails?

I'm currently trying to figure out how to use blockchain in audit trails and potentially in accounting (and if they actually make sense). Both Deloitte and EY mention them.
I somehow cannot understand how this could be of benefit for audits and/or accounting.
To my understanding, to make use of the power of blockchains you need multiple users. With only one user you cannot validate the integrity, since all blocks of that user could be compromised (if one block of a user's blockchain was changed, maybe all of the following blocks were changed as well, making it impossible to detect the modification). Does this mean blockchains only make sense if you can share them with different users?
Data, and thus blockchains, however, aren't always shared between multiple users. In accounting you often have only one "user"/"owner" of the data. Sure, you could create multiple users in one company, but there wouldn't be any benefit since they are in one location (the company) and potentially all compromised. Or if the admin wants to change something, he could easily modify all users, making it useless for audits.
To make it work you would need different partners (supplier/customer) to share the information with. In that case you could however only have two users share the same blockchain (depending on legal regulations in your country) and then again who do you trust if one of the two doesn't validate?
Deloitte mentions that they can be used for files. Again, I don't see the benefit, since you would need multiple users AND files might get compressed with a different algorithm over time, rendering them invalid (the useful information didn't change, but the block will still be invalid). Or is this not an issue in your experience? To me it seems it could be a problem.
The same goes for all the internal data, which may be important for audits from my point of view. Which company would like to share that information with independent users? Or is it only intended for "public"/"shared" data?
To identify a modification of one block in a blockchain, the user would have to validate every single block (every hash in the header of a block needs to be compared to the data of the previous block). In terms of accounting, a blockchain could be all transactions of one account during one fiscal year. This, however, could easily be thousands of transactions. Wouldn't this be very slow to validate?
Maybe I'm misunderstanding the point in terms of audit trails but as long as the users are not independent data can always be modified making it useless for audits. And you need a critical mass to share the blockchain with.
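The validation step the question describes can be sketched as a plain hash chain. This is a simplified model, not any particular blockchain: the block layout and function names are hypothetical, and there is no consensus or signing. Validation is a single pass with one SHA-256 computation per block, so the work grows linearly with the number of transactions.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(block: Dict[str, Any]) -> str:
    """Hash a block deterministically by serializing it with sorted keys."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_chain(entries: List[str]) -> List[Dict[str, Any]]:
    """Link each entry to the hash of the block before it."""
    chain: List[Dict[str, Any]] = []
    prev = GENESIS_HASH
    for index, entry in enumerate(entries):
        block = {"index": index, "entry": entry, "prev_hash": prev}
        chain.append(block)
        prev = block_hash(block)
    return chain


def first_invalid_block(chain: List[Dict[str, Any]]) -> Optional[int]:
    """Return the index of the first block whose prev_hash does not match,
    or None if every link is consistent."""
    prev = GENESIS_HASH
    for block in chain:
        if block["prev_hash"] != prev:
            return block["index"]
        prev = block_hash(block)
    return None


# One fiscal year of hypothetical postings on a single account.
chain = build_chain([f"posting {i}" for i in range(5000)])
print(first_invalid_block(chain))

# Changing block 10 breaks the link stored in block 11.
chain[10]["entry"] = "posting 10 (altered)"
print(first_invalid_block(chain))
```

Note that this check only detects an edit that leaves later blocks untouched. Someone who controls the whole chain can recompute every subsequent hash, which is exactly the single-owner concern raised above; detecting that requires a copy of the latest hash held by an independent party.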
First of all, I think it's necessary to understand the power of blockchain. It gives us the chance to create decentralized databases, i.e. databases that are not controlled by a single authority. Also, the data in a blockchain is immutable and permanent, i.e. it cannot be modified or deleted. Thanks to this, you get a single decentralized registry in a distributed network, for example for audit trails.
It's true that it makes no sense if you use it only inside your company. But what if you use it among different companies? Each one could encrypt its data so that the other companies couldn't see it. However, all the data would be stored at all the companies, so no one could change it. Moreover, you can have more than one user (node) for each company.
Nowadays there are many implementations of blockchain, each one with a different objective. To better understand the power of blockchain, I suggest watching the video that explains the new version (v1.0) of Hyperledger Fabric.

DynamoDB for bank transactions

I am thinking of simulating bank-account-like transactions in DynamoDB.
Since it is a NoSQL database, would DynamoDB be suited to this use case?
Or should I stick with a SQL-based database?
Does anyone know how banks usually handle this?
Do they use something like DynamoDB, or do they keep our transactions in a separate table?
A SQL-based system allows transactions across multiple tables, while DynamoDB only allows transactions at the level of an individual item. However, there is a DynamoDB transaction library for Java that allows atomic transactions across multiple items at the expense of speed and substantially more writes per request.
DynamoDB does support atomic counters, which are bank-account-like (the example on the documentation page is along those lines).
DynamoDB also provides conditional writes, which can help avoid certain types of race conditions when an item is updated concurrently.
Could you build a suitable system on top of DynamoDB? Maybe, but this all really depends on your specific requirements and how you go about implementing such a system. That said, DynamoDB does provide some functionality that I outlined above to address the types of concerns that arise when building such a system.
As of 2018, DynamoDB has transaction support for reading and writing items in multiple tables within a single region. It is not exactly the same as SQL transactions, because a transaction may fail and return an error if the data is modified by another parallel operation, but it is ACID.
https://aws.amazon.com/blogs/aws/new-amazon-dynamodb-transactions/
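As a rough sketch of how a transfer between two accounts might combine conditional writes with the transaction support mentioned above, the helper below only builds the request for the TransactWriteItems operation and sends nothing. The table name, the account_id key, and the balance attribute are hypothetical; adapt them to your own schema.

```python
from typing import Any, Dict


def build_transfer_request(
    table: str, from_id: str, to_id: str, amount_cents: int
) -> Dict[str, Any]:
    """Build TransactWriteItems parameters that move money between two items.

    Assumes a hypothetical table keyed by a string `account_id` with a
    numeric `balance` attribute stored in cents.
    """
    if amount_cents <= 0:
        raise ValueError("amount must be positive")
    if from_id == to_id:
        # A single transaction cannot contain two operations on the same item.
        raise ValueError("source and destination must differ")

    amount = {"N": str(amount_cents)}  # DynamoDB numbers are sent as strings
    return {
        "TransactItems": [
            {
                "Update": {
                    "TableName": table,
                    "Key": {"account_id": {"S": from_id}},
                    "UpdateExpression": "SET balance = balance - :amt",
                    # Conditional write: reject the debit if funds are insufficient.
                    "ConditionExpression": "balance >= :amt",
                    "ExpressionAttributeValues": {":amt": amount},
                }
            },
            {
                "Update": {
                    "TableName": table,
                    "Key": {"account_id": {"S": to_id}},
                    "UpdateExpression": "SET balance = balance + :amt",
                    # Do not create a new account as a side effect of a credit.
                    "ConditionExpression": "attribute_exists(account_id)",
                    "ExpressionAttributeValues": {":amt": amount},
                }
            },
        ]
    }


request = build_transfer_request("accounts", "alice", "bob", 2500)
print(len(request["TransactItems"]))
```

With boto3 installed and credentials configured, the request could be sent with boto3.client("dynamodb").transact_write_items(**request). If either condition fails, the whole transaction is rejected and neither balance changes, so the caller should be prepared to handle the error and retry or report it.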