BigQuery - How to increase the expiration time of tables in the free sandbox? - google-bigquery

I am using the free BigQuery sandbox to generate some custom metrics based on my analytics data. I have read in the documentation that the expiration time of tables in a free account is 60 days. What does this expiration time mean? What exactly will happen after 60 days? Will all my data be lost? How can I increase the expiration time in this case? Do I need to pay for it? If yes, what will the cost be?

According to the documentation:
The BigQuery sandbox gives you free access to the power of BigQuery
subject to the sandbox's limits. The sandbox allows you to use the web
UI in the Cloud Console without providing a credit card. You can use
the sandbox without creating a billing account or enabling billing for
your project.
In addition, according to the limits:
All datasets have the default table expiration time and the default
partition expiration set to 60 days. Any tables, views, or partitions
in partitioned tables automatically expire after 60 days.
You can edit this expiration date once your data is exported to BigQuery but, in order to do that, you have to upgrade the project out of the sandbox if you have not already done so. You would then be billed by the amount of bytes processed; check the BigQuery billing options for details.
Thus, within BigQuery you can edit the expiration date. In the BigQuery console, go to Project > Dataset > Table > Details, click the pencil icon next to the table's name, and set the expiration date to never or select a date.
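For those who prefer a statement over the console, the same change can be expressed as BigQuery DDL (ALTER TABLE ... SET OPTIONS with expiration_timestamp, and ALTER SCHEMA ... SET OPTIONS with default_table_expiration_days; setting an option to NULL clears it). The Python sketch below only builds the statement text. The function names and the dataset and table names are hypothetical, and the statements would still have to be run in a project that has been upgraded from the sandbox.

```python
from datetime import datetime, timezone
from typing import Optional


def table_expiration_ddl(table: str, expires: Optional[datetime]) -> str:
    """Build DDL that sets (or, with None, clears) one table's expiration.

    `expires` must be a timezone-aware datetime; it is rendered in UTC.
    """
    if expires is None:
        value = "NULL"  # NULL clears the option, i.e. the table never expires
    else:
        stamp = expires.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
        value = f'TIMESTAMP "{stamp} UTC"'
    return f"ALTER TABLE `{table}` SET OPTIONS (expiration_timestamp = {value})"


def dataset_default_expiration_ddl(dataset: str, days: Optional[float]) -> str:
    """Build DDL that sets (or clears) the default expiration for new tables."""
    value = "NULL" if days is None else str(days)
    return f"ALTER SCHEMA `{dataset}` SET OPTIONS (default_table_expiration_days = {value})"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(table_expiration_ddl("my_dataset.my_table", None))
    print(dataset_default_expiration_ddl("my_dataset", None))
```

Note that the dataset default only applies to tables created afterwards, so existing tables would each need their own ALTER TABLE statement.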

Related

BigQuery: Will switching from the BQ sandbox to paid BQ change the 60-day data limit set up in the sandbox?

Will switching from the BQ sandbox to paid BQ change the 60-day data limit set up in the sandbox?
Currently we only have 60 days of data in the BQ sandbox and want to know if moving to the paid BQ service will remove this limitation.
Also, will I be able to export all GA4 data (the last year at minimum) after switching to paid BQ?
I'm not sure if the change from 60 days is automatic; you may have to change it manually.
Unfortunately, you can't export old data from GA4. Once you are out of the sandbox and have changed the expiration setting, more days of data will start to accumulate.

How to manually test a data retention requirement in a search functionality?

Say data needs to be kept for 2 years. Then all data that was created 2 years + 1 day ago should no longer be displayed and should be deleted from the server. How do you test that manually?
I’m new to testing and I can’t think of any other ways. Also, we cannot do automation due to time constraints.
You can create data in the database that is backdated by more than two years and test whether it is deleted automatically. Alternatively, you can change the current business date in the database and test against that.
For the data retention functionality, a manual tester needs to keep track of the search data so that the test cases for the search retention feature can be carried out.
Taking a social networking app as an example, as a manual tester you need to remember all the users that you searched for recently.
To check the retention period, you can ask a backend developer to shorten it (for example, from one year to 10 minutes) for testing purposes.
Even if you delete the search history and then start typing a previously entered search term, the related result should appear first in the search results. Data retention policies concern what data should be stored or archived, where that should happen, and for exactly how long. Once the retention period for a particular data set expires, the data can be deleted or moved as historical data to secondary or tertiary storage, depending on the requirement.
Let us understand this with an example. Suppose we have the data below in our database table, based on past searches made by users. With the help of this table, you can perform this testing with minimum effort. The current date is ‘2022-03-10’, and the Status column states whether the data is still available in the database: Visible means available, while Expired means deleted from the table.

Search Keyword | Search On Date | Search Expiry Date | Status
sport          | 2022-03-05     | 2024-03-04         | Visible
cricket news   | 2020-03-10     | 2022-03-09         | Expired - Deleted
holy books     | 2020-03-11     | 2022-03-10         | Visible
dance          | 2020-03-12     | 2022-03-11         | Visible
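The rule implied by the table (expiry date = search date + 2 years - 1 day, with the record still visible on the expiry date itself) can be sketched in Python to generate expected results for a manual test sheet. This is only an illustration: the function names are hypothetical, and the handling of 29 February is an assumption, since the example does not cover it.

```python
from datetime import date, timedelta

RETENTION_YEARS = 2  # retention period used in the example above


def expiry_date(searched_on: date, years: int = RETENTION_YEARS) -> date:
    """Last day a record is kept: search date + N years - 1 day."""
    try:
        anniversary = searched_on.replace(year=searched_on.year + years)
    except ValueError:
        # Assumption: a 29 February search date rolls over to 1 March
        # when the target year is not a leap year.
        anniversary = date(searched_on.year + years, 3, 1)
    return anniversary - timedelta(days=1)


def expected_status(searched_on: date, today: date) -> str:
    """Expected Status column value for a record on a given current date."""
    return "Visible" if today <= expiry_date(searched_on) else "Expired - Deleted"


if __name__ == "__main__":
    today = date(2022, 3, 10)  # current date from the example
    searches = {
        "sport": date(2022, 3, 5),
        "cricket news": date(2020, 3, 10),
        "holy books": date(2020, 3, 11),
        "dance": date(2020, 3, 12),
    }
    for keyword, searched_on in searches.items():
        print(keyword, expiry_date(searched_on), expected_status(searched_on, today))
```

The boundary rows (one day before, on, and one day after the expiry date) are the ones worth backdating in the database first.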

How can I set a date range for requestReport using Amazon Advertising API?

Is it possible to set a date range for requestReport?
POST /v2/sp/{recordType}/report
{
"segment": {segment},
"reportDate": {reportDate}, <-- here
"metrics": {metrics}
}
Or do I need to make 30 requests in order to get results for a month? Maybe snapshots can help?
Yes, you have to do multiple requests (one for each day) and merge them. The Advertising API currently does not offer an option to request data for a date range.
Snapshots may help you get data that has no click data for the requested date(s).
Yes, you do. What I do is run it daily using a cron job and load the data into a database, and query the database. Keep in mind certain metrics, like sales, don't finalize until after the attribution window e.g. 14 days.
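A minimal sketch of the one-request-per-day approach described above: it only builds the request bodies for each day in a range and leaves the HTTP calls, authentication, and merging to the caller. The function name is hypothetical, and the YYYYMMDD format for reportDate is an assumption that should be checked against the API documentation.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional


def daily_report_bodies(start: date, end: date, metrics: str,
                        segment: Optional[str] = None) -> List[Dict[str, str]]:
    """Return one report request body per day from start to end, inclusive."""
    bodies = []
    day = start
    while day <= end:
        # Assumption: reportDate is expected as YYYYMMDD.
        body = {"reportDate": day.strftime("%Y%m%d"), "metrics": metrics}
        if segment is not None:
            body["segment"] = segment
        bodies.append(body)
        day += timedelta(days=1)
    return bodies


if __name__ == "__main__":
    # Hypothetical metrics string, for illustration only.
    bodies = daily_report_bodies(date(2021, 4, 1), date(2021, 4, 30), "impressions,clicks")
    print(len(bodies), bodies[0], bodies[-1])
```

Because some metrics are not final until the attribution window has passed, a scheduled job would typically re-request the most recent days and overwrite the stored rows.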

Normal for BigQuery data to be higher than Firebase?

I'm running the following query to select the active users for a time frame on my project.
SELECT DISTINCT
active_users,
unix
FROM [mobileapp_logs].[dbo].[active_users]
WHERE (rtrim(app_id) + ':' + app_os) = 'tbl'
AND [aggregation] = '30-day-active'
AND [unix] BETWEEN 1491696000 AND 1494288000
AND active_users >= 100
The query seems to be working, but for every row returned for a given day it gives me about 10 - 30 more than what's in Firebase. Is this normal for BigQuery compared to Firebase?
I'm not familiar with the table you are querying. According to the documentation, Firebase imports data to app_events_intraday_YYYYMMDD. Could you provide more information about [mobileapp_logs].[dbo].[active_users]?
According to several SO questions, there may be a delay of a few days before offline devices upload their data. Also, Firebase updates data in BigQuery daily. Since you are querying up until today, you may be seeing data that has already been updated in Firebase but not in BigQuery. I would recommend changing your query to a range ending 3 days before today.
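A small Python sketch of that recommendation: it computes Unix timestamps for a 30-day window that ends at midnight UTC three days before the current date, which could then replace the hard-coded BETWEEN bounds in the query. The function name and the default values are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def settled_window(now: datetime, days: int = 30, lag_days: int = 3) -> Tuple[int, int]:
    """Return (start, end) Unix timestamps for a window ending lag_days before now.

    `now` must be timezone-aware; both bounds fall on midnight UTC.
    """
    midnight = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    end = midnight - timedelta(days=lag_days)
    start = end - timedelta(days=days)
    return int(start.timestamp()), int(end.timestamp())


if __name__ == "__main__":
    start, end = settled_window(datetime.now(timezone.utc))
    print(f"AND [unix] BETWEEN {start} AND {end}")
```

For example, calling it with 12 May 2017 as the current date reproduces the bounds used in the question (1491696000 and 1494288000).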

Where do you get Google BigQuery usage info (mainly for processed data)?

I know that BigQuery offers the first "1 TB of data processed" per month for free but I can't figure out where to look on my dashboard to see my monthly usage. I used to be able to "revert" to the old dashboard which had the info but for the past couple of weeks the "old dashboard" isn't accessible.
From the Google Cloud Console overview page for your project, click on the "details" section on the top-right, next to the charge estimate.
You'll get an estimate of the charges for the current month for each service and item in the service, including BigQuery analysis.
If you want to track this usage, you can also export the data to CSV every day by going to the Billing settings and enabling the usage export feature. Do not worry about the fact that it only mentions Compute Engine; it actually works for other services as well.
You can also access the billing history directly by clicking on the billing account link.
You will get a detailed bill with the usage info.
Post GCP Console Redesign Answer
The GCP console was redesigned and now the other answer here no longer applies, but it is still possible to view your usage by going to IAM & Admin -> Quotas.
What you're looking for is "Big Query API: Query usage per day". It doesn't seem possible to view your usage over 30 days unfortunately, but you can see your current usage (per day) and your peak usage over the past 7 days. You can also set a daily quota. If you're just working infrequently or doing a lot in one day, you can set a quota to 1 TiB and prevent yourself from blowing your whole allocation in one day.
You can try sending feedback about these limitations, like I did, by clicking the question mark at the top right and then send feedback.
Theo is correct that there is no way to view the number of bytes processed or billed since the start of the month (inside of the free tier) in the GCP Billing Console. However, you can extract the bytes processed and bytes billed data from logs in Cloud Logging and calculate the total bytes processed/billed since the start of the month inside of BigQuery.
Here are the steps to count total bytes billed in a month:
Under Cloud Logging, go to Logs Explorer (NOT the Legacy Logs Explorer) and run the following query in the query builder frame:
resource.type="bigquery_project" AND
protoPayload.metadata.jobChange.job.jobStats.queryStats.totalBilledBytes>1 AND
timestamp>="2021-04-01T00:00:00Z"
The timestamp clause is not actually necessary, but it will speed up the query. You can set timestamp >= <value> to any valid timestamp you want as long as it returns at least one result.
In the Query Results frame, click the "Action" button, and select "Create Sink".
In the window that opens, give your sink a name, click "Next", and in the "Select sink service" dropdown menu select "BigQuery dataset".
In the "Select BigQuery dataset" dropdown menu, either select an existing dataset where you would like to create your sink (which is a table containing logs) or, if you prefer, choose "Create new BigQuery dataset".
Finally, you will likely want to check the box for Partition Table, since this will help you control costs whenever you query this sink. As of the time of this answer, however, Google limits partitioned tables to 4000 partitions, so you may find it necessary to clear out old logs eventually.
Click "Create Sink" (there is no need for any inclusion or exclusion filters).
Run a query in BigQuery that produces bytes billed (i.e. a query that does not return a previously cached result). This is necessary to instantiate the sink. Moments after your query runs, you should see a table called <your_bigquery_dataset>.cloudaudit_googleapis_com_data_access.
Enter the following Standard SQL query in the BigQuery query editor:
WITH
  bytes_table AS (
    SELECT
      JSON_VALUE(protopayload_auditlog.metadataJson,
        '$.jobChange.job.jobStats.createTime') AS date_time,
      JSON_VALUE(protopayload_auditlog.metadataJson,
        '$.jobChange.job.jobStats.queryStats.totalBilledBytes') AS billedbytes
    FROM
      `<your_project>.<your_bigquery_dataset>.cloudaudit_googleapis_com_data_access`
    WHERE
      EXTRACT(MONTH FROM timestamp) = 4
      AND EXTRACT(YEAR FROM timestamp) = 2021)
SELECT
  (SUM(CAST(billedbytes AS INT64))/1073741824) AS total_GB
FROM
  bytes_table;
You will want to change the month from 4 to whatever month you intend to query, and 2021 to whatever year you intend to query. Also, you may find it helpful to save this query as a view if you intend to rerun it periodically.
Be advised that your sink does not contain your past BigQuery logs, only BigQuery logs produced after you created the sink. Therefore, in the first month the number of GB returned by this query will not be an accurate count of your bytes billed in the month unless you happen to have created the sink before running any queries in BigQuery during that month.
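To make the arithmetic in the query explicit, here is a small Python sketch of the same calculation: the totalBilledBytes values arrive as strings (as JSON_VALUE returns them), are summed, and are divided by 1073741824 to get GB. The function names are hypothetical, and treating the free allowance as 1024 GB is an assumption based on the "1 TB per month" figure mentioned in the question.

```python
from typing import Iterable, Optional

BYTES_PER_GB = 1073741824  # 1024**3, the divisor used in the query above
FREE_TIER_GB = 1024.0      # assumption: 1 TB of free query processing per month


def total_billed_gb(billed_bytes: Iterable[Optional[str]]) -> float:
    """Sum billed-byte strings (NULLs skipped, as SUM does) and convert to GB."""
    total = sum(int(value) for value in billed_bytes if value is not None)
    return total / BYTES_PER_GB


def free_tier_remaining_gb(billed_bytes: Iterable[Optional[str]]) -> float:
    """GB left in the assumed monthly allowance, never below zero."""
    return max(FREE_TIER_GB - total_billed_gb(billed_bytes), 0.0)


if __name__ == "__main__":
    sample = ["1073741824", "536870912", None]  # made-up values for illustration
    print(total_billed_gb(sample), free_tier_remaining_gb(sample))
```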
Might be related to How can I monitor incurred BigQuery billings costs (jobs completed) by table/dataset in real-time?
If you are fine with using BigQuery itself to get that information (instead of using a UI), you can use something like this:
DECLARE gb_divisor INT64 DEFAULT 1024*1024*1024;
DECLARE tb_divisor INT64 DEFAULT gb_divisor*1024;
DECLARE cost_per_tb_in_dollar INT64 DEFAULT 5;
DECLARE cost_factor FLOAT64 DEFAULT cost_per_tb_in_dollar / tb_divisor;
SELECT
ROUND(SUM(total_bytes_processed) / gb_divisor,2) as bytes_processed_in_gb,
ROUND(SUM(IF(cache_hit != true, total_bytes_processed, 0)) * cost_factor,4) as cost_in_dollar,
user_email,
FROM (
(SELECT * FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_USER)
UNION ALL
(SELECT * FROM `other-project.region-us`.INFORMATION_SCHEMA.JOBS_BY_USER)
)
WHERE
DATE(creation_time) BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY) and CURRENT_DATE()
GROUP BY
user_email
Explanation
Please consider the caveats I mentioned in my answer here
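As a rough cross-check of the cost formula in the query above, the same arithmetic can be sketched in Python: bytes from jobs that were not served from cache are multiplied by the price per TB (5 dollars in the query) divided by 1024^4. The function name and the sample rows are hypothetical; real rows would come from INFORMATION_SCHEMA.JOBS_BY_USER.

```python
from typing import Dict, Iterable, Optional

TB_DIVISOR = 1024 ** 4  # bytes per TB, as in the DECLARE statements above


def estimated_cost(jobs: Iterable[Dict[str, Optional[object]]],
                   cost_per_tb_in_dollar: float = 5.0) -> float:
    """Estimate on-demand cost from job rows, ignoring cached results.

    Each row needs `total_bytes_processed` and `cache_hit` keys. As in the SQL,
    a job only counts when cache_hit is explicitly False.
    """
    cost_factor = cost_per_tb_in_dollar / TB_DIVISOR
    billable = sum(
        int(job["total_bytes_processed"] or 0)
        for job in jobs
        if job["cache_hit"] is False
    )
    return round(billable * cost_factor, 4)


if __name__ == "__main__":
    sample_jobs = [  # made-up rows for illustration
        {"total_bytes_processed": 1024 ** 4, "cache_hit": False},
        {"total_bytes_processed": 1024 ** 4, "cache_hit": True},
    ]
    print(estimated_cost(sample_jobs))
```

This is only an estimate based on bytes processed; the actual bill depends on bytes billed and the pricing that applies to the project.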