Is it possible to query only rows that were modified in a certain timespan?
The table in question is generated by Google Analytics, but it doesn't offer an obvious way to query for this (for example, a last_modified timestamp or something similar).
As others have commented, BigQuery does not have a concept of updated rows. It's append only.
If you want to get newly inserted rows for a given timespan, you could either use timestamps when inserting and query using that column, or use table decorators [1].
[1] https://cloud.google.com/bigquery/table-decorators
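For the timestamp approach, a minimal sketch (assuming you add a column such as inserted_at when loading the data; the table and column names are illustrative, not something Google Analytics provides):
SELECT *
FROM `mydataset.ga_export`
WHERE inserted_at BETWEEN TIMESTAMP('2019-01-01') AND TIMESTAMP('2019-01-08');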
I am working in BigQuery with standard SQL and I have the following problem.
I am transforming a table with millions of rows, but I only need to work with yesterday's and today's data.
The result of that query (which I already have) must be stored in another table.
The problem is that this has to run every hour, and when I create the scheduled query with the "write append" option, the data that was previously saved gets duplicated.
I need something like "write to table if it does not exist".
You should write your scheduled query with replacement in mind:
CREATE OR REPLACE TABLE `dataset.mytable`
AS
SELECT 1;
This way you fully replace the table on each run.
Update:
You may use a MERGE statement to skip existing rows and add only new ones.
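A minimal sketch of such a MERGE (table and column names are illustrative; adjust to your schema):
MERGE `dataset.mytable` T
USING (
  SELECT id, col1, col2
  FROM `dataset.source`
  WHERE DATE(created_at) >= DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY)
) S
ON T.id = S.id
WHEN NOT MATCHED THEN
  INSERT (id, col1, col2) VALUES (S.id, S.col1, S.col2);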
Materialized views append only new data.
They can query only a single table, support only a limited set of aggregation functions (APPROX_COUNT_DISTINCT, ARRAY_AGG, AVG, COUNT, HLL_COUNT.INIT, MAX, MIN, SUM), and do not support computation on top of an aggregation, but they may fit your use case.
This article, https://cloud.google.com/bigquery/docs/writing-results, states that it is possible to overwrite a BigQuery table with new data; however, what I'd like to do is overwrite a partition (or multiple partitions). Is that possible?
I've read through tonnes of documentation about inserting data into BigQuery (e.g. https://cloud.google.com/bigquery/docs/creating-column-partitions) and can't find any reference to overwriting partitions, so I assume the answer to my question is "no", but thought I'd ask anyway.
You can always overwrite a partition of a partitioned table in BQ by using the YYYYMMDD suffix in the output table name of your query, along with WRITE_TRUNCATE as your write disposition (i.e. truncate whatever exists in that partition and write the new results).
So, let's say you run your query and want to overwrite the partition for date 2019-01-15 in your table named xyz: you just set the output destination for your query results to yourdataset.xyz$20190115 and specify the write disposition WRITE_TRUNCATE.
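A sketch of how this might look with the bq command-line tool (dataset, table, and query are placeholders; --replace corresponds to the WRITE_TRUNCATE disposition):
bq query --use_legacy_sql=false \
  --destination_table='yourdataset.xyz$20190115' \
  --replace \
  'SELECT * FROM yourdataset.source WHERE event_date = "2019-01-15"'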
Hope it helps.
You are in luck! This is possible through MERGE DML.
https://cloud.google.com/bigquery/docs/using-dml-with-partitioned-tables#pruning_partitions_when_using_a_merge_statement
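A sketch of a MERGE that limits the scan to a single partition (names are illustrative; the constant filter on the partitioning column in the ON clause is what allows pruning):
MERGE `dataset.partitioned_target` T
USING `dataset.staging` S
ON T.id = S.id AND T.partition_date = DATE '2019-01-15'
WHEN MATCHED THEN
  UPDATE SET value = S.value
WHEN NOT MATCHED THEN
  INSERT (id, partition_date, value) VALUES (S.id, S.partition_date, S.value);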
My advice is to play around with it a bit. If you can't get it working, post a new question with specific data/queries.
I have a table with 6 columns and I want to run a SELECT query filtering on all 6 columns, but one column's data is mistyped/incorrect, so no rows are returned. What could I use to still return data even though some of it is incorrect, so that the most relevant records come back?
I don't know how much this will help you, but you could try the SOUNDEX function in SQL Server.
Check this link for information on the SOUNDEX function.
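A minimal sketch of a fuzzy match with SOUNDEX (the table and column names are made up for illustration):
SELECT *
FROM customers
WHERE SOUNDEX(last_name) = SOUNDEX('Jonson');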
I have a bunch of data in my database and I want to filter out data that has been stored for longer than a week. I'm using SQL Server and I found that I could use the DATEDIFF function.
At the moment it works great and fast, but I don't have a lot of records yet, so everything runs quite smoothly.
After some research online I found that comparing integers in databases is faster than comparing strings, and I assume that comparing datetimes (using the given function) is even slower at a larger scale.
Let's say my database table content looks like this:
Currently I would filter out records that are older than a week like so:
SELECT * FROM onlineMainTable WHERE DATEDIFF(wk, Date, GETDATE()) > 1
I assume that this query would be quite slow if there were a thousand rows in the table.
The Status column represents a calculation status. I wondered if I could speed up the process by looking for a specific status instead of comparing datetimes. To set that status to the one that represents 'old records', I need to update those rows before I select them. It would look something like this:
UPDATE table SET Status = -1 WHERE NOT Status = -1 AND DATEDIFF(wk, Date, GETDATE()) > 1;
SELECT * FROM table WHERE Status = -1;
I used the '-1' as an example.
So obviously I could be wrong, but I think the update in this case would be fast enough, since there won't be many records to update (older ones have already had their status set). The selection would be faster as well, since I would be matching integers instead of datetimes.
The downside to my (possible) solution is that I would query twice every time I fetch data, even when it might not be needed (if every row is newer than 1 week).
It comes down to this: Should I compare datetimes or should I update an integer column based on that datetime and then select using the comparison of those ints?
If there is a different/better way of doing this, I'm all ears.
Context
I am making a webapp for quotation requests. Requests should expire after a week since they won't be valid at that point. I need to display both valid requests and expired requests (so customers have an overview). All these requests are stored in a database table.
Indexes are objects designed to improve the performance of SELECT queries; the drawback is that they slow down INSERT, DELETE, and UPDATE operations, so they should be used only where necessary. DBMSs generally provide tools to explain a query's execution plan.
Maybe you just need to add an index on the Date column:
CREATE INDEX index_name ON onlineMainTable(Date);
and the query could be
SELECT * FROM onlineMainTable WHERE Date > DATEADD(week, -1, GETDATE());
This form keeps the Date column out of a function call, so the index can actually be used, unlike the original DATEDIFF predicate.
I have a table called book; the attributes are booked_id, yearmon, and day_01...day_31. I need to unpivot the table and transform day_01...day_31 into rows, which I have managed to do. The problem is that yearmon has a format like 200805 and I need to append a day to it based on day_01, day_02, etc., so that I can create a new column with the date information. For example, for day_01 it should look like 20080501. Instead of writing a huge query, does anyone know how to do this transformation in SSIS?
You should be able to use the Unpivot component and the Derived Column component to do what you need. Look into those and post back if they don't seem to do what you need.
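For the Derived Column step, a sketch of the expression (assuming the Unpivot component outputs the original column name, e.g. day_01, into a pivot key column here called DayCol, and that yearmon is already a string; the names are illustrative):
yearmon + RIGHT(DayCol, 2)
If yearmon is numeric, you would first cast it, e.g. (DT_WSTR, 6)yearmon + RIGHT(DayCol, 2), giving a value like 20080501 that you can keep as a string or convert further.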