Can we schedule different times for two dbt projects in one account?

Right now we are working on organizing our dbt Cloud project (dividing it into staging, intermediate, marts, etc.), and the way we organize it depends on whether dbt Cloud can run two different projects at two different times. Can we schedule our first project to run every day and the other one only once a week? Thank you so much in advance; I sincerely appreciate it.
I guess this isn't really a coding question, so I'll leave it without any queries 🙈
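To make the ask concrete: as far as I can tell, dbt Cloud lets each job carry its own cron-style schedule, so I'm picturing something like this (the times below are just placeholders):

Job for project 1:  0 6 * * *    (every day at 06:00)
Job for project 2:  0 6 * * 1    (Mondays only, at 06:00)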

Related

Multi System Database structure based copying/updating best practice

After searching and not finding similar cases, I want to open a new question.
So here is the case:
We are working with a large database with a very complicated data structure. We also work on multiple systems to ensure stability (development, testing, quality, and production), and it's always a struggle to move data between those systems. As I said, the data structure is very large, and there is also a lot of logic inside the database. Customers are able to add new data as configuration, and there is also a steady inflow of data used for statistics and monitoring. So let me explain the problem with a small example:
Let's take this database as an example: we have some families holding contests with each other, and they want to create statistics about the points they score.
The Purple Tables are fixed configuration. They are created once and can only be changed by an operator; those changes are made and tested on the development system first.
The Yellow Tables are changing configuration. Each family can create or delete multiple contests and assign its kids to them.
The Red Table is just plain data. Each time a kid scores points, a new row is added with the amount, the current time, and the relation to the kid and the contest. This table will be the basis for the later statistics.
This database runs on two systems: a production one used by the families and a development one used by the programmers/operators.
During development, the programmers add test data such as kids, families, contests, and points. During use, the families create new contests, assign new kids, and fill up the points table.
It's necessary to copy new/tested/fixed families from the development system to the production system.
It's also necessary to copy Contests, Contest-Kid Assignments, and Points from the production system to the development system in order to find new errors.
It must also be possible to change the table structure on the development system and transfer that change to the production system. (This shouldn't be the main topic here; sometimes the changes are so large that there is just no easy way, so let's keep this point simple but keep it in mind.)
I want to copy parts of the tables to another system while being able to ignore some tables (for example, Points), and I want to make sure not to copy kids without their parent family, so that there are no "parentless" objects in the database.
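To make the "parentless" constraint concrete, I picture the copy running parent-first, something like this in T-SQL (the linked server PRODSRV and the table names are made up for illustration):

-- Families first, then only kids whose family already exists on the target,
-- so nothing ends up parentless.
-- (If the IDs are IDENTITY columns, SET IDENTITY_INSERT would also be needed.)
INSERT INTO PRODSRV.ContestDb.dbo.Family (FamilyId, Name)
SELECT f.FamilyId, f.Name
FROM dbo.Family AS f
WHERE NOT EXISTS (SELECT 1 FROM PRODSRV.ContestDb.dbo.Family AS p
                  WHERE p.FamilyId = f.FamilyId);

INSERT INTO PRODSRV.ContestDb.dbo.Kid (KidId, FamilyId, Name)
SELECT k.KidId, k.FamilyId, k.Name
FROM dbo.Kid AS k
WHERE EXISTS (SELECT 1 FROM PRODSRV.ContestDb.dbo.Family AS p
              WHERE p.FamilyId = k.FamilyId)
  AND NOT EXISTS (SELECT 1 FROM PRODSRV.ContestDb.dbo.Kid AS pk
                  WHERE pk.KidId = k.KidId);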
Question: What would be a good and safe way to do this?
I don't need a solution for a specific database type or specific scripts; I'm looking for tools, libraries, or good practices. (Just as a note: we're using MSSQL.)
We are currently building a tool for this problem (it's not going well: unstable, overly complicated, slow, and possibly reinventing the wheel).
A lot of devs I know just copy the whole database (taking a backup and restoring it onto another server), but this causes problems too: users are copied along, their GUIDs change, and they lose permissions, etc. I don't think this is a good solution. The database is also down for quite a long time, and it's never a smooth process.
Doing it manually is sometimes the easiest way, but considering the size of our data structure, it's not just a huge amount of work; there is also a large possibility of mistakes.
So I'm hoping someone knows a tool or something similar to help me out.
Welcome to the pains of development for a stateful entity like a database. :) Redgate makes a tool called SQL Source Control that is good for moving changed data and schema into production, and it can interface with source control solutions such as Git. It's a bit pricey, but it's the best I've found. One option for keeping dev up to date with prod data and dev changes is one I concocted at my last place of employment, which was... not 100% perfect, but better than nothing, and free. It was developed in PowerShell, and it went something like this:
1. Create Pre-restore, Pre-dacpac and Post-dacpac SQL scripts to store data and permission diffs between dev and prod
2. Use SQLPackage.EXE to make a DacPac of Dev (a DacPac is basically an XML schema of the db, no data)
3. Execute the Pre-restore script (often copying out test data that needs to be persisted)
4. Restore Prod over Dev
5. Execute the Pre-dacpac script (any DDL that could cause data loss may need to go here)
6. Use SQLPackage.EXE to apply the DacPac made in step 2 to the newly restored database
7. Execute the Post-dacpac script (permissions, restoration of data copied in step 3)
Again, like I said, it worked and automated the restoration of prod data into our dev environment while keeping our dev changes intact, but it required a good bit of upkeep and maintenance. Also, keep in mind, once your DB reaches a certain size, doing a nightly restore is no longer a viable option due to the time it takes to restore.
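For what it's worth, a bare-bones PowerShell sketch of those seven steps might look like the following. The server name DEVSRV, the database AppDb, and all paths are placeholders, and it assumes SqlPackage.exe and the SqlServer module (Invoke-Sqlcmd, Restore-SqlDatabase) are installed:

# Step 1 is manual: the pre-restore / pre-dacpac / post-dacpac scripts are maintained by hand.

# Step 2: extract a DacPac (schema only) from the current Dev database.
& SqlPackage.exe /Action:Extract /SourceServerName:DEVSRV /SourceDatabaseName:AppDb /TargetFile:C:\deploy\dev.dacpac

# Step 3: copy out any test data that has to survive the restore.
Invoke-Sqlcmd -ServerInstance DEVSRV -Database AppDb -InputFile C:\deploy\pre-restore.sql

# Step 4: restore the latest Prod backup over Dev.
Restore-SqlDatabase -ServerInstance DEVSRV -Database AppDb -BackupFile \\backupsrv\sql\AppDb.bak -ReplaceDatabase

# Step 5: run any DDL that could otherwise cause data loss during publish.
Invoke-Sqlcmd -ServerInstance DEVSRV -Database AppDb -InputFile C:\deploy\pre-dacpac.sql

# Step 6: publish the Dev schema from step 2 onto the freshly restored database.
& SqlPackage.exe /Action:Publish /SourceFile:C:\deploy\dev.dacpac /TargetServerName:DEVSRV /TargetDatabaseName:AppDb

# Step 7: restore permissions and the test data saved in step 3.
Invoke-Sqlcmd -ServerInstance DEVSRV -Database AppDb -InputFile C:\deploy\post-dacpac.sql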

SQL files management

Most of my day is spent writing SQL queries to perform small tasks, mainly getting information from the database and manipulating it somehow for data visualization, building reports for others.
At the end of the day I try to have a nice folder scheme to help me reuse code and so on, but it's becoming harder to handle so many files and keep track of everything I've done so far. I don't want to have huge SQL files either, because I might want to reuse only parts of them. In the end it's hard to avoid a war zone on my desktop and in these folders, and it's also a mess to handle so many folders and files.
For version control we're using a Git server, but there is plenty of code that is not in production that we would like to keep track of and reuse somehow.
We're using IPython Notebook, RStudio, and SSMS to build our code, and I wonder if there are more efficient ways to work. There must be an efficient way to work out there. What do you use to keep track of your (SQL) code, and more importantly, to reuse it?
Thanks in advance,
Rafael
I just use a folder system, and I keep the "shell scripts", so to speak, as the first files (the generic code to do X), whereas the specific code, where I take X and apply dates and other conditions, lives in the bottom half of the folder.
What do you use to keep track of your (SQL) code, and more importantly, to reuse it?
For ease of reuse, I have all my running SQL code backed up on a SQL server through routine INFORMATION_SCHEMA dumps. For all development code that I need to reuse with others, I have a Git server that gets automatic updates throughout the day. For reuse on my laptop itself, I have a local backup through Time Machine.
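For what it's worth, a minimal version of such a dump on SQL Server can be as simple as the queries below; note that INFORMATION_SCHEMA truncates long routine bodies at 4000 characters, so sys.sql_modules is the safer source for full definitions:

-- Quick inventory of stored procedures and their (truncated) definitions:
SELECT ROUTINE_SCHEMA, ROUTINE_NAME, ROUTINE_DEFINITION
FROM INFORMATION_SCHEMA.ROUTINES
WHERE ROUTINE_TYPE = 'PROCEDURE';

-- Full, untruncated source for every module:
SELECT OBJECT_SCHEMA_NAME(object_id) AS schema_name,
       OBJECT_NAME(object_id) AS module_name,
       definition
FROM sys.sql_modules;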
As for directory or folder structure, all code starts out project-based, and eventually I migrate the best and most useful code to a personal folder structure that is topic-based (date arithmetic, indexing, etc.). No matter how they are stored, all these folders are indexed using local and remote indexing features, so I can search and retrieve them with just a few keystrokes when needed. Ultimately, what's needed for optimum reuse is ease of retrieval: the quicker I can retrieve, the more reuse I get.
Lastly, it's not just SQL code, but all the supporting documents that led to that code solution. Sometimes this collection may include code from other languages, code from other servers, emails, text documents, images, workflows, etc. Keeping them all together enhances the value of reuse.

Rails update old database

Right now I'm working on updating a Rails app and the database has some issues. It's also being converted from MySQL to PostgreSQL.
There are three columns being used to track one time value: the time the facility opens on Monday is recorded as monday_open_hour, monday_open_minute, and monday_open_ampm. I'd like to merge these into a single time field.
There are also several fields being used for only 1% of the 3000+ records, so I'd like to break those out into a separate table.
What would be the best way to do this? I imagine it could probably be done in SQL with some kind of stored procedures/cursors. Is there a way to do it with Ruby/Rails?
The Rails way to deal with incremental database changes is to use migrations. Migrations let you apply incremental changes to your schema or database contents in an orderly fashion, even as you're collaborating with a team. There are nice helpers for common tasks like creating and dropping tables, renaming columns, and simple things like that, but you can drop down to arbitrary SQL if you need to (although be aware that doing so will most likely tie you to your current database and make further moves more difficult).
Basically, you can generate a new migration with rails generate migration ConsolidateDateColumns (for example). This will create a template for you in the db/migrate directory; see the Rails Guides entry to get started on writing them. When you're ready to apply it, run rake db:migrate.
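For the hour/minute/ampm consolidation specifically, a sketch of such a migration might look like this; the facilities table and the three column names are taken from the question, the rest is assumption, so treat it as a starting point rather than a drop-in:

class ConsolidateDateColumns < ActiveRecord::Migration
  # Lightweight stand-in so the migration doesn't depend on the app's model class.
  class Facility < ActiveRecord::Base; end

  def up
    add_column :facilities, :monday_open, :time
    Facility.reset_column_information

    # Assumes the three legacy columns are always populated integers/strings.
    Facility.find_each do |f|
      hour = f.monday_open_hour % 12            # 12 AM -> hour 0
      hour += 12 if f.monday_open_ampm == 'PM'  # 1 PM -> 13, 12 PM -> 12
      f.update_column(:monday_open, format('%02d:%02d', hour, f.monday_open_minute))
    end

    remove_column :facilities, :monday_open_hour
    remove_column :facilities, :monday_open_minute
    remove_column :facilities, :monday_open_ampm
  end

  def down
    raise ActiveRecord::IrreversibleMigration
  end
end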
The advantages of doing it this way are that it lets you easily apply the same changes to different environments (development, test, production, staging, or across your development team) and keep them in sync, and it encourages you to keep things reversible whenever possible, so you maintain some degree of freedom to migrate back and forth if you need to.
One more thing: it sounds like you're going to be doing a lot of major changes in quick succession. Make sure that you take a backup of your original database before you begin, and thoroughly test your work against a reduced test set in a separate environment before you run it against the real thing!

Finding subsets of ClearCase branch types

I'm working on a large project, for which several thousand branch types are defined, and would like to quickly retrieve a list of "my" branch types. This can be achieved either by listing branch types created by me, or by listing branch types whose names start with my username.
As the full list is huge and lstype runs for approximately an hour normally, is there a way to formulate a query that can be answered quickly?
I never found a native command able to return an answer quickly.
When looking at the cleartool lstype command, the technote "LSTYPE performance improvements" does mention that:
The -short, -nostatus and -unsorted options can be used to improve performance of the cleartool lstype command
But as with everything in ClearCase, this doesn't stand the test of the real world, where the number of (here) types can be really big...
So what I usually do for this kind of request, considering I don't create a brtype every five minutes, is to have a batch job running every two hours, updating a list of brtypes with the information I need (owner, date, ...).
I can then filter that file at any time (or at least its most recent version) to extract the list of brtypes I need.
There is a risk that the list isn't up to date, but in practice this works relatively well.
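For illustration, the batch job can be as small as the lines below (the VOB tag and paths are placeholders, and per the technote above the -short/-nostatus flags can be added if they help):

# Refresh the cache every couple of hours via cron or a scheduled task:
cleartool lstype -kind brtype -unsorted -fmt "%n %u %d\n" -invob /vobs/myproj > /tmp/brtypes.txt

# "My" branch types then come back instantly, either by owner...
grep " $USER " /tmp/brtypes.txt
# ...or by naming convention (brtypes prefixed with the username):
grep "^$USER" /tmp/brtypes.txt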

Alternatives to using Excel for reporting

Even with the advanced technologies and database tools (including free alternatives) available today, it seems that a huge number of users are still very comfortable using Excel FOR EVERYTHING! That's why, as a database developer working with these users, I am forced to let them use Excel simply because they are very comfortable with it, especially the older people who seem like they will never let Excel go and embrace a new tool.
Currently, to make their experience as smooth and automated as possible, I'm using a lot of database queries inside Excel, be it views, plain SQL, or stored procedures, mostly for ad hoc reports (which then became permanent). My question is: is there any hope of improving this situation? I'm sure a lot of organizations use this same method. Is it possible to completely replace this arrangement with something more logical and efficient, in both data collection and reliability? I'm thinking about using SharePoint. Am I on the right track?
I have also struggled with this problem in the past and can say that what worked for me was a two-pronged approach.
Step 1 – Make a good alternative
It sounds like you have already done this. Depending on the system, there will always be some random report that someone needs to run to suit their “business need”. There is no way you could cram all of these into your system; it would fill up with reports and the users would become snow-blind.
Step 2 – User education
Show them the new way of making their own reports (BusinessObjects, SSRS, whatever) and make sure they are comfortable with it. This is the hardest part, as some people like their comfort blanket of Excel and won't want to leave it. Give them some templates and some standard reports; maybe even pair-develop one or two reports at their desk with them so they get the knack of it.
I will leave you with a bit of a Daily WTF. There was once a business manager who was an expert in BusinessObjects. She made reports left, right, and centre, but she treated it like a giant version of Excel, and her work was littered with examples of this. For instance, one report she wrote was meant to get the deal totals for a year. No problem, I hear you cry, just do something like
SELECT SUM(DealAmount) FROM Deals WHERE DealDate BETWEEN X AND Y
Nope, not our business expert. In her Excel frame of mind this was too much like black magic, so what she did was return a row for EVERY SINGLE DEAL done in that year and then aggregate it client-side to get her total. In I stepped, wowing the users by reducing this 104 MB report that took 17 minutes to run to a 100 KB report that ran in about 15 seconds.
I would go the other way around. By that I mean not making queries and database connections within Excel, but using some sort of web application to let users (through wizards) generate the data they need and export it to Excel to work with; see the sketch after the list below.
That way you will have the following benefits:
No DB connections (and probably no passwords) in your Excel files
No problems distributing Excel files whenever queries or views change
A centralized approach to data retrieval
Excel for the users who are used to it
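For example, the centralized part can be as simple as a parameterized stored procedure that the web application executes and streams back as a spreadsheet, so the query text never lives inside any workbook (the names below are invented):

-- One server-side definition of the report; Excel only ever sees the result set.
CREATE PROCEDURE dbo.rpt_DealTotals
    @StartDate date,
    @EndDate   date
AS
BEGIN
    SELECT SUM(DealAmount) AS TotalDealAmount
    FROM dbo.Deals
    WHERE DealDate BETWEEN @StartDate AND @EndDate;
END;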
Back in the day, I loved using Crystal Reports for ad hoc reporting. I'm not sure about its current status, as it seems that SAP has purchased the product: http://www.sap.com/solutions/sapbusinessobjects/sap-crystal-solutions/index.epx