Configuration Change for Incremental Model on DBT - google-bigquery

On one of our previously created incremental models, I added the partition_by and partition_expiration_days parameters to the configuration to set up table partitioning and retention.
{{ config(
    materialized='incremental',
    unique_key='record_id',
    on_schema_change='append_new_columns',
    partition_by={
        "field": "row_ts",
        "data_type": "timestamp",
        "granularity": "day"
    },
    partition_expiration_days=365
) }}
I observed on the next run that the configuration wasn't applied to the table.
It seems a full-refresh operation is needed here. However, we have strict retention on the data source for this table, so some of the data would be lost in a full refresh.
Could anyone let me know how this issue can be addressed?
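One workaround, sketched below, is to back up the existing table, rebuild the model with a full refresh so the new partitioning takes effect, and then restore the historical rows the source no longer retains. This is only a sketch: the project, dataset, and table names are hypothetical, and the restore step assumes record_id uniquely identifies rows.
-- 1. Back up the current table before rebuilding
CREATE TABLE `my_project.my_dataset.my_model_backup` AS
SELECT * FROM `my_project.my_dataset.my_model`;

-- 2. Rebuild the model as a day-partitioned table:
--    dbt run --full-refresh --select my_model

-- 3. Restore rows the full refresh could no longer pull from the source
INSERT INTO `my_project.my_dataset.my_model`
SELECT * FROM `my_project.my_dataset.my_model_backup` b
WHERE NOT EXISTS (
    SELECT 1 FROM `my_project.my_dataset.my_model` m
    WHERE m.record_id = b.record_id
);
Keep in mind that once partition_expiration_days=365 is in effect, BigQuery will still expire restored rows whose row_ts is older than 365 days, so the retention setting needs to be reconciled with the restore goal.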

Related

How to use createDisposition with BigQuery insertAll

insertAll will fail if no table with that name exists. So currently we run a process to create the table, then re-run insertAll once the table exists.
With every other BigQuery API call you can use a createDisposition to create the table if it doesn't exist.
My question: is there something like this for insertAll? If not, why not! Haha.
Check out the templateSuffix property for insertAll. It does what you expect.
From the documentation:
[Experimental] If specified, treats the destination table as a base
template, and inserts the rows into an instance table named
"{destination}{templateSuffix}". BigQuery will manage creation of the
instance table, using the schema of the base template table. See
https://cloud.google.com/bigquery/streaming-data-into-bigquery#template-tables
for considerations when working with template tables.
So the request should look something like this:
var request = {
    projectId: "yourProject",
    datasetId: "yourDataset",
    tableId: "yourTable",
    resource: {
        "kind": "bigquery#tableDataInsertAllRequest",
        "skipInvalidRows": true,
        "ignoreUnknownValues": true,
        "templateSuffix": "YourTableSuffix",
        "rows": ...
    }
};
The resulting destination table will be yourTableYourTableSuffix.
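For completeness, here is a rough sketch of issuing that request with the Node.js googleapis client; the client and auth setup are assumptions of mine, not part of the original answer:
var google = require('googleapis');
var bigquery = google.bigquery('v2');

// request is the object built above; an authorized client is assumed
// to be attached, e.g. request.auth = authClient
bigquery.tabledata.insertAll(request, function (err, response) {
    if (err) {
        console.error('insertAll failed:', err);
        return;
    }
    // per-row validation problems, if any, are reported in insertErrors
    console.log(response.insertErrors || 'all rows inserted');
});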

Can an object have 2 active snapshots?

According to this page in the docs, only one snapshot can be active for a given object. However, I seem to have a Defect with 2 active snapshots. All snapshots are shown in the screenshot below:
As you can see, I have connected the snapshots with arrows, and they do not all link together. Is this a bug with Rally, or is it in fact possible for a Defect to have 2 snapshots with _ValidTo dates in the year 9999?
My query is taken from the example in the docs:
URI: https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/12345/artifact/snapshot/query.js
POST data:
{
    "find": {
        "ObjectID": my funky object
    },
    "fields": ["State", "_ValidFrom", "_ValidTo", "ObjectID", "FormattedID"],
    "hydrate": ["State"],
    "compress": true
}
The object should not have two current snapshots with _ValidTo set to 9999-01-01. Please contact CA Agile Central (Rally) support, and they will raise the issue with the LookbackAPI team, which I believe has a way of fixing the data on their end.

Query to get all projects in a workspace using lookback api

Is Project a valid _Type to use in a Lookback query?
I tried "_Type":"Project"
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/1234/artifact/snapshot/query.js?find={"_Type":"Project","State":"Open"}&fields=["Name"]
and also "_TypeHierarchy":"Project"
https://rally1.rallydev.com/analytics/v2.0/service/rally/workspace/1234/artifact/snapshot/query.js?find={"_TypeHierarchy":"Project","State":"Open"}&fields=["Name"]
and both returned 0 results. The same syntax works with "_TypeHierarchy":"Defect" but not with Project, and there are no errors. Thanks.
The Lookback API supports querying snapshots for a given Project or ProjectHierarchy. For example:
{
    [...]
    "Project": 12345
}
or
{
    [...]
    "_ProjectHierarchy": 12345
}
However, it's not possible to get a list of projects from the Lookback API outside the context of artifact snapshots. Getting projects would be a manual process: if you get a list of snapshots, you could iterate over the result set, extract the Project OIDs, and generate a list. You could even parse the _ProjectHierarchy values and construct the project tree. Another caveat is that hydrating the Project OIDs will require WSAPI calls.
Querying projects from the Lookback API may be expensive. You can specify fields to reduce the amount of data in the response, e.g.:
fields: ["Project", "_ProjectHierarchy"]

Creating a custom defect trend chart using App SDK 2.0

I've been tasked with plotting some defect charts using Rally historical data. Right now I'm using a simple REST client to pull data at certain points in time and plot the counts in a spreadsheet. What I'm doing is:
{
    find : {
        "_ProjectHierarchy": <projectId>,
        "_TypeHierarchy": -51006,
        "FoundInBuild" : {$regex: "3\\.3\\."},
        "State" : {$in : ["Submitted","Open"]},
        $or: [
            {"Severity" : {$in : ["Catastrophic","Severe"]}},
            {"Priority" : "showstopper"}
        ],
        "__At" : "<date>"
    },
    pagesize : 1
}
I just run this once for every date I need the data for. That's a lot of queries! What I'm looking for is a way to run a single query using _ValidFrom and _ValidTo to enclose a time range, then pass the results to a SnapshotStore, then plot that on a chart. I'm certain there's a way to do it, but I can't figure it out from the docs. Any help is much appreciated.
Unfortunately, the example space for AppSDK2 and the Lookback API is presently a bit thin. There are some cool apps out there; for instance, you may wish to check out David Thomas' Hackathon app as a starting point:
Defect Re-work Trend
It queries the LBAPI for Defects and stores the resulting data in a SnapshotStore. The app itself measures Defect "thrash", i.e. the trend in how many times a Defect is re-opened during a particular development cycle.
In reviewing Hackathon apps, just be aware that certain methods and syntax for the SnapshotStore may change slightly in a future release of AppSDK2.
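For the time-range part of the question, the usual pattern is to replace the single __At clause with an interval-overlap condition on _ValidFrom and _ValidTo, so one query returns every snapshot that was valid at any point in the window. This is a sketch only, with placeholder values mirroring the point-in-time query above:
{
    find : {
        "_ProjectHierarchy": <projectId>,
        "_TypeHierarchy": -51006,
        "_ValidFrom": {$lt: "<range end>"},
        "_ValidTo": {$gt: "<range start>"}
    },
    fields : ["_ValidFrom", "_ValidTo", "State", "Severity", "Priority"]
}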

What are the resources or tools used to manage temporal data in key-value stores?

I'm considering using MongoDB or CouchDB on a project that needs to maintain historical records. But I'm not sure how difficult it will be to store historical data in these databases.
For example, in his book "Developing Time-Oriented Database Applications in SQL," Richard Snodgrass points out tools for retrieving the state of data as of a particular instant, and he points out how to create schemas that allow for robust data manipulation (i.e. data manipulation that makes invalid data entry difficult).
Are there tools or libraries out there that make it easier to query, manipulate, or define temporal/historical structures for key-value stores?
edit:
Note that from what I hear, the 'version' data that CouchDB stores is erased during normal use, and since I would need to maintain historical data, I don't think that's a viable solution.
P.S. Here's a similar question that was never answered: key-value-store-for-time-series-data
There are a couple of options if you want to store the data in MongoDB. You could store each version as a separate document; then you can query to get the object at a certain time, the object at all times, objects over ranges of time, etc. Each document would look something like:
{
    object : whatever,
    date : new Date()
}
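With that shape, plus some per-version identifier (assumed here as objectId, which is not in the original example), those queries become simple range queries. A sketch in the Mongo shell:
// latest version of an object at or before a given time
db.versions.find({objectId : oid, date : {$lte : someDate}})
           .sort({date : -1})
           .limit(1)

// all versions of the object within a time range
db.versions.find({objectId : oid, date : {$gte : rangeStart, $lte : rangeEnd}})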
You could store all the versions of a document in the document itself, as mikeal suggested, using updates to push the object itself into a history array. In Mongo, this would look like:
db.foo.update({object: obj._id}, {$push : {history : {date : new Date(), object : obj}}})
// make changes to obj
...
db.foo.update({object: obj._id}, {$push : {history : {date : new Date(), object : obj}}})
A cooler (I think) and more space-efficient way, although less time-efficient, might be to store in the object itself a history of what changed at each point in time. Then you could replay the history to build the object as of a certain time. For instance, you could have:
{
    object : startingObj,
    history : [
        { date : d1, addField : { x : 3 } },
        { date : d2, changeField : { z : 7 } },
        { date : d3, removeField : "x" },
        ...
    ]
}
Then, if you wanted to see what the object looked like between time d2 and d3, you could take the startingObj, add the field x with the value 3, set the field z to the value of 7, and that would be the object at that time.
Whenever the object changed, you could atomically push actions to the history array:
db.foo.update({object : startingObj}, {$push : {history : {date : new Date(), removeField : "x"}}})
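A rough sketch of the replay step in plain JavaScript; the action names match the history entries above, and everything else is illustrative:
function replayHistory(startingObj, history, atDate) {
    // work on a copy so the stored starting object is not mutated
    var obj = JSON.parse(JSON.stringify(startingObj));
    history.forEach(function (entry) {
        if (entry.date > atDate) return; // only apply actions up to atDate
        if (entry.addField) {
            for (var k in entry.addField) obj[k] = entry.addField[k];
        } else if (entry.changeField) {
            for (var k in entry.changeField) obj[k] = entry.changeField[k];
        } else if (entry.removeField) {
            delete obj[entry.removeField];
        }
    });
    return obj;
}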
Yes, in CouchDB the revisions of a document are there for replication and are usually lost during compaction. I think UbuntuOne did something to keep them around longer but I'm not sure exactly what they did.
I have a document that I need historical data on, and this is what I do.
In CouchDB I have an _update function. The document has a "history" attribute, which is an array. Each time I call the _update function to update the document, I append the current document (minus the history attribute) to the history array, then update the document with the changes in the request body. This way I have the entire revision history of the document.
This is a little heavy for large documents; there are some JavaScript diff tools I was investigating, and I was thinking about storing only the diff between documents, but I haven't done that yet.
http://wiki.apache.org/couchdb/How_to_intercept_document_updates_and_perform_additional_server-side_processing
Hope that helps.
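A minimal sketch of the _update handler described above (CouchDB design-document JavaScript; it assumes the document already exists and that the request body is a JSON object of field changes):
function (doc, req) {
    var changes = JSON.parse(req.body);
    // snapshot the current state, minus the history attribute itself
    var snapshot = {};
    for (var k in doc) {
        if (k !== 'history' && k !== '_id' && k !== '_rev') {
            snapshot[k] = doc[k];
        }
    }
    doc.history = doc.history || [];
    doc.history.push(snapshot);
    // apply the incoming changes from the request body
    for (var c in changes) {
        doc[c] = changes[c];
    }
    return [doc, 'updated'];
}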
I can't speak for MongoDB, but for CouchDB it all really hinges on how you write your views.
I don't know the specifics of what you need, but if you have a unique id for a document throughout its lifetime and store a timestamp in that document, then you have everything you need for robust querying of that document.
For instance:
document structure:
{ "docid" : "doc1", "ts" : <unix epoch> ...<set of key value pairs> }
map function:
function (doc) {
if (doc.docid && doc.ts)
emit([doc.docid, doc.ts], doc);
}
}
The view will now output each doc and its revisions in historical order like so:
["doc1", 1234567], ["doc1", 1234568], ["doc2", 1234567], ["doc2", 1234568]
You can use view collation and start_key or end_key to restrict the returned documents.
start_key=["doc1", 1] end_key=["doc1", 9999999999999]
will return all historical copies of doc1
start_key=["doc2", 1234567] end_key=["doc2", 123456715]
will return all historical copies of doc2 between 1234567 and 123456715 unix epoch times.
See ViewCollation for more details.