How to get costs in bigquery by job id - google-bigquery

I want to get the cost of each job I run.
I know that the table billing.gcp_billing_export_v1_XXXXX contains billing information, but I can't find a way to join it by job ID. I also know that the table region-us.INFORMATION_SCHEMA.JOBS_BY_PROJECT contains cost information.
I'm confused: what is the right way to calculate the costs, and how can I get the costs by job ID?
I have tried to join the two tables above but couldn't, because they have different granularity.
I need to be able to calculate this automatically, not manually with the Google Cloud Pricing Calculator.
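What I'm imagining is roughly the following: a minimal sketch against INFORMATION_SCHEMA using the Python client, assuming on-demand pricing of about $5 per TB (the exact rate and unit depend on your region and pricing model, and flat-rate/editions billing is not reflected):

    from google.cloud import bigquery

    client = bigquery.Client()

    # Estimate the on-demand cost of each job from the bytes it billed.
    # NOTE: $5 per TB is an assumed US on-demand rate; treat the result
    # as an estimate, not the actual invoiced amount.
    sql = """
    SELECT
      job_id,
      user_email,
      total_bytes_billed,
      total_bytes_billed / 1e12 * 5.0 AS estimated_cost_usd
    FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
    WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
    ORDER BY estimated_cost_usd DESC
    """

    for row in client.query(sql).result():
        print(row.job_id, row.user_email, row.estimated_cost_usd)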

Related

Get the cost of an individual pipeline execution in Data Factory

I'm looking into using Azure Data Factory V2 for integration imports and want to know if there's a way to track the cost of individual pipelines being run?
For example if I had 3 pipelines which represented 3 different integrations would there be a way to see the cost incurred from each?
Is there also a way to do this in near real time, so that during a month I could somehow put a budget on each integration/pipeline?
The Azure Data Factory bill only shows the total cost. We can't get the cost of each pipeline in Data Factory; we must calculate the price manually.
We can see the pipeline-level consumption under Monitor --> Pipeline run --> Consumption:
The Azure documentation says that "The pipeline run consumption view shows you the amount consumed for each ADF meter for the specific pipeline run, but it does not show the actual price charged". We need to calculate the cost manually with the Pricing calculator (a rough sketch of that calculation follows at the end of this answer).
To your question: is there a way to see the cost incurred from each?
No, there isn't; you must calculate the cost manually.
Is there also a way to do this in near real time, so that during a month I could somehow put a budget on each integration/pipeline?
I'm afraid not.
Others have posted almost the same question; please see: Azure Data Factory Pipeline Consumption Details
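For reference, a rough sketch of that manual calculation, using hypothetical meter names and placeholder unit prices; both must be replaced with your actual consumption figures from the Consumption view and the rates from the pricing calculator:

    # Placeholder unit prices; substitute the real rates for your region
    # from the Azure pricing calculator.
    unit_prices = {
        "activity_runs": 1.00 / 1000,   # per activity run (placeholder)
        "diu_hours": 0.25,              # per Data Integration Unit hour (placeholder)
        "external_hours": 0.000125,     # per external activity hour (placeholder)
    }

    # Hypothetical consumption read from Monitor --> Pipeline run --> Consumption.
    pipeline_consumption = {
        "integration_a": {"activity_runs": 120, "diu_hours": 4.5, "external_hours": 0.0},
        "integration_b": {"activity_runs": 45,  "diu_hours": 1.0, "external_hours": 2.0},
    }

    for pipeline, meters in pipeline_consumption.items():
        cost = sum(qty * unit_prices[meter] for meter, qty in meters.items())
        print(f"{pipeline}: ${cost:.2f}")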

Service that does advanced queries on a data set, and automatically returns relevant updated results every time new data is added to the set?

I'm looking for a cloud service that can do advanced statistics calculations on a large amount of votes submitted by users, in "real time".
In our app, users can submit different kind of votes like picking a favorite, rating 1-5, say yes/no etc. on various topics.
We also want to show "live" statistics to the user, showing the popularity of a person etc. This will be generated by a rather complex SQL where we are calculating the average number of times a person was picked as favorite, divided by total number of votes and the number of games in which the person has been participating etc. And the score for the latest X games should count higher than the overall score for all games. This is just an example, there are several other SQL queries with similar complexity.
All our presentable data (including calculated statistics) is served from Firestore documents, and the votes will be saved as Firestore documents.
Ideally, the Firebase-backend (functions, firestore etc) should not need to know about the query logic.
What I wish for is a pay as you go cloud service that does the following:
I define some schemas and set up the queries we need for the statistics we have (15-20 different SQLs). Like setting up views in MySQL
On every vote, we push the vote data to this service, which will store it in a row.
The service should then, based on its knowledge of the defined queries and the content of the pushed vote data, determine which statistics are affected by the newly added row, and recalculate those. A specific vote type can affect one or more statistics.
Every time a statistic is recalculated, the result should be automatically pushed back to our Firebase backend (for instance by calling an HTTPS endpoint that hits a cloud function) - so we can update the relevant Firestore documents.
The service should be able to throttle the calculations, like only regenerating new statistics every 1 minute despite having several votes per second on the same topic.
Is there any product like this in the market? Or can it be built by combining available cloud services? And what is the official term for such a product, if I should search for it myself?
I know that I can probably build a solution like this myself, and run it on a cloud hosted database server, which can scale as our need grows - but I believe that I'm not the first developer with a need of this, so I hope that someone has solved it before me :)
You can leverage the existing cloud services available on the Google Cloud Platform.
Google BigQuery, Google Cloud Firestore, Google App Engine (CRON Jobs), Google Cloud Tasks
The services can be used to solve the problems mentioned above:
1) Google BigQuery: Here you can define the schema for the data on which you're going to run the SQL queries. BigQuery supports standard and legacy SQL.
2) Every vote can be pushed to the defined BigQuery tables using its streaming insert service.
3) Every pushed vote can trigger the recalculation service, which calculates the statistics by executing the defined SQL queries; the query results can be stored as documents in collections in Google Cloud Firestore (a sketch of this flow follows at the end of this answer).
4) Google Cloud Firestore: Here you can store the live statistics for the user. This is a real-time database, so you'll be able to configure listeners for modifications to the statistics and show them as soon as the statistics are recalculated.
5) In the same service that inserts every vote, create a new record with a "syncId" in another table. The idea is to group the votes cast in a particular interval under their corresponding syncId. The syncId can be suffixed with a timestamp. Depending on your requirements, a particular time interval can be set so that the recalculation is triggered by the CRON jobs service, which invokes the recalculation service within that interval. Once the recalculation for a particular syncId is completed, the record corresponding to that syncId should be marked as completed.
We are leveraging the above technologies to build a web application on Google Cloud Platform, where inputs are recorded in Google Cloud Firestore and then stream-inserted into Google BigQuery. The data stored in BigQuery is queried 30 seconds after each update using SQL queries, and the results are stored in Google Cloud Firestore to serve dashboards, which are updated automatically via listeners configured on the collection that holds the dashboard information.
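A minimal sketch of steps 2-5, assuming a hypothetical votes table, an illustrative statistics query, and a Firestore collection named stats (none of these names are prescribed by the services themselves):

    from google.cloud import bigquery, firestore

    bq = bigquery.Client()
    db = firestore.Client()

    VOTES_TABLE = "my_project.voting.votes"  # hypothetical table

    def record_vote(vote: dict) -> None:
        # Step 2: stream the raw vote into BigQuery.
        errors = bq.insert_rows_json(VOTES_TABLE, [vote])
        if errors:
            raise RuntimeError(f"Streaming insert failed: {errors}")

    def recalculate_statistics() -> None:
        # Steps 3 and 5: run this periodically (e.g. from a CRON-triggered
        # Cloud Function) rather than once per vote, to throttle recalculation.
        sql = f"""
            SELECT person_id,
                   COUNTIF(vote_type = 'favorite') / COUNT(*) AS favorite_ratio
            FROM `{VOTES_TABLE}`
            GROUP BY person_id
        """
        for row in bq.query(sql).result():
            # Step 4: push the recalculated statistic back to Firestore so
            # the configured listeners update the live dashboards.
            db.collection("stats").document(str(row.person_id)).set(
                {"favorite_ratio": row.favorite_ratio}
            )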

BQ: How to check query cost every time

Is it possible to show an alert or message popup every time I run a query in the BQ GUI? I am afraid of spending too much on query costs.
I hope BQMate has this function.
Sometimes the cost of a query can only be determined when the query is finished, e.g., with federated tables and the newly released clustered tables. If you're concerned about the cost, the best option is to set the Maximum Bytes Billed option; then you can be sure you'll never be charged for more than that. You can set a default value for this option in your project, but right now you have to contact support to set it for your project.
A fast way to get a query cost estimate is to check the amount of data processed shown on the right side of the screen in the query validator, by performing a dry run (a sketch follows the two options below). See here for a "query validator" example. You have two options to calculate the cost:
Manually: query pricing is described here in TB units, so you can sum and multiply: 1 free TB per month, $5 per extra TB. If you expect to query more than 1 TB of data per month, you should sum the data used by your queries to know when to start counting costs.
Automatically: using the online pricing calculator, which is available for all Google Cloud Platform products.
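A small sketch of the dry-run estimate and the Maximum Bytes Billed safeguard with the Python client; the $5 per TB on-demand rate is an assumption and should be checked against your region's pricing:

    from google.cloud import bigquery

    client = bigquery.Client()
    sql = "SELECT name FROM `bigquery-public-data.usa_names.usa_1910_2013`"

    # Dry run: nothing is executed or billed, but BigQuery reports the bytes
    # that would be processed.
    dry_cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=dry_cfg)
    tb = job.total_bytes_processed / 1e12
    print(f"Would process ~{tb:.4f} TB, roughly ${tb * 5:.2f} at $5/TB on-demand")

    # Hard cap: the job fails instead of billing more than this many bytes.
    capped_cfg = bigquery.QueryJobConfig(maximum_bytes_billed=10**10)  # ~10 GB
    result = client.query(sql, job_config=capped_cfg).result()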
If you want to set custom cost controls, have a look at this page, since custom quotas are not enabled by default. Cost controls can be applied at the project level or user level by restricting the number of bytes billed. Nowadays you have to submit a request from the Google Cloud Platform Console to ask for them to be set, in 10 TB increments. If the usage exceeds a set quota, the error message is quite clear and differs depending on whether the project or user quota was exceeded. For the project quota:
Custom quota exceeded: Your usage exceeded the custom quota for
QueryUsagePerDay, which is set by your administrator. For more information,
see https://cloud.google.com/bigquery/cost-controls
With no remaining quota, BigQuery stops working for everyone in that project.
If you want to continuously monitor billing data for BigQuery, have a look at this tutorial, which explains how to create a billing dashboard using Data Studio.
I don't know about BQMate since this is from Vaint Inc.

BigQuery detailed charges just shows how much data was analyzed

I'm trying to find out what is causing my BigQuery bill to be so high but when I click View Detailed Charges on Google Cloud I just get how much data was analyzed and how much it costs. Is there a place where I can view a detailed breakdown of what jobs cost so much and what is causing the bill to get so large?
Is there a place where I can view a detailed breakdown of what jobs cost so much and what is causing the bill to get so large?
You should be able to use the Jobs.list API to list all jobs that you started in the specified project. Job information is available for a six-month period after creation. The job list is sorted in reverse chronological order, by job creation time. It requires the Can View project role, or the Is Owner project role if you set the allUsers property.
You can actually even do it without any coding - https://cloud.google.com/bigquery/docs/reference/rest/v2/jobs/list#try-it
Collect all your jobs' info and analyze it as you wish.
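A sketch of that collect-and-analyze step with the Python client, again assuming on-demand billing at roughly $5 per TB (the rate varies by region), ranking recent jobs by bytes billed:

    import datetime
    from google.cloud import bigquery

    client = bigquery.Client()
    since = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=7)

    # Equivalent of Jobs.list: walk recent jobs and rank them by bytes billed.
    costly = []
    for job in client.list_jobs(min_creation_time=since, all_users=True):
        billed = getattr(job, "total_bytes_billed", None) or 0  # load/copy jobs bill 0
        costly.append((billed, job.job_id, job.user_email))

    for billed, job_id, user in sorted(costly, reverse=True)[:20]:
        print(f"{job_id} ({user}): {billed / 1e12:.4f} TB ~ ${billed / 1e12 * 5:.2f}")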
For a long-term solution, you can either automate the above process or use BigQuery monitoring with Stackdriver.

How can I retrieve data from SAP EWM from the query 0WM_MP17_Q0001?

I wanted to know how one can retrieve data from the various query tools available in SAP EWM.
I found the queries in the following link: Extended Warehouse Management - SAP Library
The description of the query 0WM_MP17_Q0001 says:
0WM_MP17_Q0001
You can use this query to see the number and duration of confirmed warehouse orders by day, week, or month. This allows you to see when typical warehouse trends are changing, and thus take actions such as:
Adjusting work schedules to meet demands
Hiring new workers, or letting existing workers go
Requesting budget for expenses such as extra equipment
And I need to retrieve the data for the reasons above.
However, is there a transaction code that I can run to get this report? How can I retrieve this data?
I think you already asked this question on SDN and got a response, see your message and response.
This is BI content.