DBT incremental models - Is there a way to keep track of the "updated" timestamp?

Is there a way to create an incremental model with dbt that updates a timestamp when a new row is added or an existing row is updated, but not when the row remains unchanged?
I have tried changing strategies, to no avail.
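One workable pattern is to hash the tracked columns, compare that hash against what is already in the table, and only rewrite rows that are new or genuinely changed, so the timestamp moves only for those rows. A minimal sketch, assuming a hypothetical model with a unique id, two tracked columns col_a and col_b, a staging model stg_source, and the dbt_utils package installed:

{{ config(materialized='incremental', unique_key='id') }}

with source_rows as (
    select
        id,
        col_a,
        col_b,
        -- hash of the tracked columns, used to detect real changes
        {{ dbt_utils.generate_surrogate_key(['col_a', 'col_b']) }} as row_hash
    from {{ ref('stg_source') }}
)

select
    s.id,
    s.col_a,
    s.col_b,
    s.row_hash,
    current_timestamp as updated_at
from source_rows as s
{% if is_incremental() %}
-- keep only rows that are new or whose content hash changed, so
-- updated_at is refreshed only on inserts and real updates
left join {{ this }} as t on s.id = t.id
where t.id is null or s.row_hash != t.row_hash
{% endif %}

Because unchanged rows never reach the merge, their existing updated_at values are left untouched regardless of the incremental strategy.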

Related

Want to add a new column to a Slowly Changing Dimension, but get an error

We have a slowly changing dimension ETL package, which reads data from the Task table to update the DimTask table. The thing is, we added a new column 'Category' to the Task table and want DimTask to slowly change on it (that is, once the value of 'Category' for a TaskID changes in the Task table, we want to add a new row to the DimTask table to record the new value with new start and end dates).
So we inserted 'Category' into both the Task and DimTask tables, then added 'Category' in the advanced editor of the ETL package, as well as in the OLE DB Source and Insert Destination. The error is that the advanced editor says 'There must be at least one column of Fixed, Changing, or Historical type on the input of a Slowly Changing Dimension transform.'
We are not sure why this appears. Does it mean we have to use the Slowly Changing Dimension Wizard to go through the whole process (choosing the primary key, which columns are historical, and so on) all over again every time we want to update the slowly changing dimension?
Is there any way we could add only this new column? We have hundreds of other columns in the table, and it would cost a lot of time to go through the Wizard again.
Thanks a lot for your help!
Oh, we found that the column types of the 'Input columns' under the Slowly Changing Dimension's advanced editor's 'Input and Output Properties' are automatically cleared when a new column is added. Once we filled in the column types (especially the Key type), the ETL started to work.
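For reference, this is roughly what the transform does in SQL when a historical column such as 'Category' changes; a sketch with hypothetical parameter names, assuming the active row is the one with a NULL EndDate:

-- close the currently active row for this task
UPDATE DimTask
SET EndDate = GETDATE()
WHERE TaskID = @TaskID AND EndDate IS NULL;

-- open a new row recording the new Category value
INSERT INTO DimTask (TaskID, Category, StartDate, EndDate)
VALUES (@TaskID, @NewCategory, GETDATE(), NULL);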

vb.net DataGridView won't add record and doesn't update after data is modified independently

I have a DataGridView bound to a data source at run time. The data source is filled from an Access database via a DataAdapter. The data fills and displays correctly, and updates to existing rows seem to work OK, but I have two problems:
When I type something in a new row and then press Return or switch to a different row, I want the DataAdapter to add that row then and there to the database, so I can retrieve the Autonumber index of the new record from Access and use it to add an associated record in a different table (Entries, a many-to-many linking table). This isn't happening. In the RowLeave event I call adapter.Update(dsSentences) and then check for the new row, but the RowCount doesn't reflect its presence even though the newly added data is visible in the grid, and adapter.Update doesn't seem to have triggered the Insert query I specified in the DataAdapter. So nothing is added.
(Edit: OK, so the new row has not yet been added when this event fires. Which event should I use instead to commit the data and retrieve the Autonumber primary key for my new record? I've tried UserAddedRow, but that one fires before you've entered any data into the new row.)
The second problem is that I need to update the data independently and then have the grid reflect those changes. How do I do that? Is there some call that will force the grid to get the updated data from the DataAdapter via the DataSet? Any help would be much appreciated. I'm almost ready to drop the whole idea of binding data and do it all through code; data binding is supposed to save time, but I'm finding it labyrinthine and unpredictable.
FWIW here's the query I'm using to fill the grid:
PARAMETERS nIdCollection Long;
SELECT tblSentences.IdSentence, tblSentences.SentenceText, tblSentences.SentenceParsed, Not IsNull([tblSentences]![SentenceParsed]) AS HasParsed, Entries.IdEntry
FROM tblSentences INNER JOIN Entries ON tblSentences.IdSentence = Entries.IdSentence
WHERE (((Entries.IdCollection)=[nIdCollection]))
ORDER BY Entries.SortValue;
As you can see, it requires a record in Entries. After I've entered a new record in tblSentences, before there are any Entries the IdEntry will be null, assuming the row shows up at all. That's why I need to intercept directly after the INSERT, add the record to Entries, and requery to keep everything in order. You could do it all in an SQL stored procedure, but I have to use Access.
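For what it's worth, the Jet/ACE OLE DB provider does expose the Autonumber just assigned: immediately after executing the INSERT, run SELECT @@IDENTITY as a separate command on the same connection. A sketch against the table from the query above:

INSERT INTO tblSentences (SentenceText) VALUES (?);

-- run as a second command on the same connection, straight after the INSERT
SELECT @@IDENTITY;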
Edit: After a lot of googling I've come to the conclusion that what I'm trying to do (add a record to a table through an additional INSERT query, apart from the one handled by the DataAdapter, every time a new row is added) simply can't be done if you are using data binding. I am going to have to delete all my code and start from scratch, populating the grid through code (unbound). I think it's the only way to do what I want. I will leave this here as a warning to anyone else not to make my mistake of trying to use data binding when the data comes from more than one table. Bad mistake.

Set default value for column in Mosaic Decisions

I'm using a data flow in Mosaic Decisions with a MySQL writer node. The result set I'm going to write has a field inserted-time, but I want to skip the value in this column and instead use the default value set for that column in the DB table. How do I do that?
You can simply drag the column that you want to skip into the "skip-insert-column" section of the writer node.
In this screenshot, for example, the column "Target" will not be inserted into the target table, and whatever default value is set for that column in the DB table will be applied automatically.
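On the SQL side, skipping the column simply means leaving it out of the INSERT column list, so MySQL falls back to the column's DEFAULT. A sketch with hypothetical names:

CREATE TABLE demo (
    id INT PRIMARY KEY,
    inserted_time TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- inserted_time is omitted, so the DB default is applied
INSERT INTO demo (id) VALUES (1);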

Add new column to existing table in Pentaho

I have a table input and I need to add a calculation to it, i.e. add a new column. I have tried:
1. Doing the calculation and then feeding it back. Obviously, that appended the new data to the old data.
2. Doing the calculation and then feeding it back, but truncating the table first. As the process got stuck at some point, I assume I was truncating the table while the data was still being extracted from it.
3. Using a stream lookup and then feeding back. Of course, that also appended the data on top of the existing data.
4. Using a stream lookup where I pull the data from the table input, do the calculation and, at the same time, pull the data from the same table and do a lookup based on the unique combination of date and id, then use the 'Update' step.
As the last one has been running for a while, I am positive it is not the right option either, but I have exhausted my ideas.
It seems that you need to update the table your data came from with this new field. Use the Update step with fields A and B as keys.
Actually, once you connect the hop, the result of the first step is automatically carried forward to the next step. So let's say you have a Table Input step and then add a Calculator step where you create the third column. After writing the logic, right-click on the Calculator step and click Preview; you will get the result with all three columns.
I'd say your issue is not ONLY in the Pentaho implementation; there are some things you can do before reaching the data staging in Pentaho.
'Workin Hard' is correct when he says you shouldn't use the same table: leave the input untouched and just upload/insert the new values into a new table. It doesn't have to be a new table EVERY TIME; instead of truncating the original, you truncate the staging (output) table.
How many 'new columns' will you need? Will every iteration of this run create a new column in the output? Or will you always have a 'C' column that is always A+B or some other calculation? I'm sorry, but this isn't clear. If it's the latter, you don't need Pentaho for the transformation: updating the 'C' column with a calculation over A and B can be done directly in most relational DBMSs with a simple UPDATE clause, as sketched below. Yes, it can be done in Pentaho, but you're adding a lot of overhead and processing time.
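A sketch of that direct approach, with hypothetical table and column names:

-- derive C from A and B in place, no round trip through the ETL tool
UPDATE my_table
SET c = a + b;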

Issue creating a dimension table with a last update time column in Pentaho Data Integration

I am creating a dimension table with a last updated time (from Get System Info) in Pentaho Data Integration (PDI). It works fine, except that it inserts new rows even when there are no changes in a row; the reason is that a lookup is also performed on the last updated time field, which should not happen. But when I remove this field from the key fields in the Dimension Lookup/Update attributes, it works as expected, except that the values in the last updated time field are then empty. Thanks in advance for any solution/suggestion.
I expect you are talking about SCD II (slowly changing dimension, type 2) here, and that you want to store the date when a row is inserted into the SCD table.
Instead of obtaining the value from the Get System Info step, you can use the 'Date of last insert (without stream field as source)' type of dimension update in the Fields tab of the Dimension Lookup / Update step, which stores a datetime automatically in the defined table column.
Additional hint: if you need to store the maximum value of some date from a source system table, which is relevant for loading new/changed data, store that maximum right after the Dimension Lookup / Update step into a separate table, and use it when loading updated data at the beginning of the ETL transformation.
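A sketch of that high-water-mark pattern, with a hypothetical control table:

-- refresh the stored maximum right after the dimension load finishes
UPDATE etl_control
SET max_source_date = (SELECT MAX(last_updated) FROM dim_task)
WHERE table_name = 'dim_task';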
I think it is better to use the components below:
Step 1: Using a Table Input step, get the max value from the target system and pass it to the next step.
Step 2: Take one more Table Input step, write the source query, and assign the previous value in the WHERE clause (as the ? placeholder).
Step 3: Then perform the usual operations at the target level.
I hope you can follow the above steps; the two queries are sketched below.
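A sketch of the two queries, with hypothetical table and column names:

-- Step 1 (first Table Input): read the high-water mark from the target
SELECT MAX(last_updated) AS max_updated FROM dim_task;

-- Step 2 (second Table Input, with 'Insert data from step' pointing at
-- Step 1): the ? placeholder receives max_updated from the previous step
SELECT *
FROM source_task
WHERE last_updated > ?;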