Source control in SSIS and Concurrent work on dtsx file

Source control in SSIS and Concurrent work on dtsx file - sql

I am working on building a new SSIS project from scratch. I want to work with couple of my teammates. I was hoping to get a suggestion on how we can have some have some source control, so that few of us can work concurrently on the same SSIS project (same dtsx file, building new packages.)
Version:
SQL Server Integration Service v11
Microsoft Visual Studio 2010

It is my experience that there are two opportunities for any source control system and SSIS projects to get out of whack: adding new items to the project and concurrent changes to an existing package.
Adding new items
An SSIS project has the .dtproj extension. Inside there, it's "just" XML defining what all belongs to the project. At least for 2005/2008 and 2012+ on the package deployment model. The 2012+ project deployment model carries a good bit more information about the state of the packages in the project.
When you add new packages (or project level connection managers or .biml files) the internal structure of the .dtproj file is going to change. Diff tools generally don't handle merging XML well. Or at all really. So, to prevent the need for merging the project definition, you need to find a strategy that works for you team.
I've seen two approaches work well. The first is to upfront define all the packages you think you'll need. DimFoo, DimDate, DimFoo, DimBar, FactBlee. Check that project and the associated empty packages in and everyone works on what is out there. When the initial cut of packages is complete, then you'll ensure everyone is sync'ed up and then add more empty packages to the project. The idea here is that there is one person, usually the lead, who is responsible for changing the "master" project definition and everyone consumes from their change.
The other approach requires communication between team members. If you discover a package needs to be added, communicate with your mates "I need to add a new package - has anyone modified the project?" The answer should be No. Once you've notified that a change to the project definition is coming, make it and immediately commit it. The idea here is that people commit and sync/check in whatever terminology with great frequency. If you as a developer don't keep your local repository up to date, you're going to be in for a bad time.
Concurrent edits
Don't. Really, that's about it. The general problem with concurrent changes to an SSIS package is that in addition to the XML diff issue above, SSIS also includes layout data alongside tasks so I can invert the layout and make things flow from bottom to top or right to left and there's no material change to SSIS package but as Siyual notes "Merging changes in SSIS is nightmare fuel"
If you find your packages are so large and that developers need to make concurrent edits, I would propose that you are doing too much in there. Decompose your packages into smaller, more tightly focused units of work and then control their execution through a parent package. That would allow a better level of granularity to your development and debugging process in addition to avoiding the concurrent edit issue.

A dtsx file is basically just an xml file. Compare it to a bunch of people trying to write the same book. The solution I suggest is to use Team Foundation Server as a source control. That way everyone can check in and out and merge packages. If you really dont have that option try to split your ETL process in logical parts and at the end create a master package that calls each sub packages in the right order.
An example: Let's say you need to import stock data from one source, branches and other company information from an internal server and sale amounts from different external sources. After u have all information gathered, you want to connect those and run some analyses.
You first design the target database entities that you need and the relations. One of your member creates a package that does all the import to staging tables. Another guy maybe handles external sources and parallelizes / optimizes the loading. You would build a package that in merges your staging and production tables, maybe historicizing and so on.
At the end you have a master package that calls each of the mentioned packages and maybe some additional logging or such.

In our multi-developer operation, we follow this rough plan:
Each dev has their own branch, separate from master branch
Once a week, devs push all their changes to remote
One of us pulls all changes, and merges all branches into master, manually resolving .dtproj conflicts as we go
Merge master in all dev branches - now all branches agree
Test in VS
Push all branches to remote, other devs can now pull and keep working
It's not a perfect solution, but it helps quarantine the amount of merge pain we have to experience.

We have large ssis solutions with 20+ packages in one solution, with TFS Git. One project required adding a bunch of new packages to the existing solution. We thought we were smart and knew to assign only one person to work on each new package, 2 people working on the same package would be suicide. Wasn't good enough. When 2 people tried add a different named, new, package at the same time, each showed dtproj as a file that had changed/needed to be checked in and suddenly I found myself looking at the xml for dtproj and trying to figure out which lines to keep (Microsoft should never ask end users to manually edit their internal files, which only they wrote and understand). Billinkc's solutions here are very good and the problem is very real. You may think that Microsoft is the great Wise One, and that your team can always add new packages to an existing solution without conflicts, but you'd be wrong. It also doesn't work to put dtproj in .gitignore. If you do that, you won't see other peoples new packages (actually the .dtsx file will come down in git, but you won't see that package in Solution Explorer because dtproj is what feeds Solution Explorer). This is a current problem (2021) and we are using Visual Studio 2017 Enterprise with SSDT.
To explain this problem to people, git obviously can handle a group of independent, individual files in a directory (like say .bat files) and can add, change, and delete those files easily. The problem comes in when you have a file that is naming, describing, and counting all the files in a directory (what dtproj does). When you have a file like dtproj you are creating a conflict on dtproj itself, when 2 people try to a add a new package at the same time. Your dtproj file has a line that shows the package you added, and my dtproj file shows the package I added, and tfs/git sees that as a Conflict.
Some are suggesting ways to deal with this if you have to add a lot of new packages, my idea is a little different. For the people who have to add new packages, don't work in the primary solution where this problem is, work somewhere else. Probably best to work in the "Projects" directory you get when you install Visual Studio, outside of TFS/Git. Obviously follow all the standards, Variable naming, and Package Configuration conventions for the target Solution. Then when the new packages are ready, give the .dtsx files to your Solution Gatekeeper for them to check in. Only the Gatekeeper can check in new packages using Add From Existing, avoiding conflicts. Once the package is checked in, developers can work on them in the main Solution.

Related

Optionally leave old version of component on upgrade

I've been trying to set up a WiX component such that the user can specify that the installer should not upgrade that component on a MajorUpgrade. I had the following code, but this means that if the condition is met then the new version is not installed, but the old version is also removed.
<Component Id="ExampleComponent" GUID="{GUID here}">
<Condition>NOT(KEEPOLDFILE="TRUE")</Condition>
<File Id="ExampleFile" Name="File.txt" KeyPath="yes" Source="File.txt"/>
</Component>
Ideally, if the user specifies "KEEPOLDFILE=TRUE", then the existing version of "File.txt" should be kept. I've looked into using the Permanent attribute, but this doesn't look relevant.
Is this possible to achieve without using CustomActions?

A bit more background information would be useful, however:
If your major upgrade is sequenced early (e.g. afterInstallInitialize) the upgrade is an uninstall followed by a fresh install, so saving the file is a tricky proposition because you'd save it, then do the new install, then restore it.
If the upgrade is late, then file overwrite rules apply during the upgrade, therefore it won't be replaced anyway. You'd need to do something such as make the creation and modify timestamps identical so that Windows will overwrite it with the new one. The solution in this case would be to run a custom action conditioned on "keep old file", so you'd do the reverse of this:
https://blogs.msdn.microsoft.com/astebner/2013/05/23/updating-the-last-modified-time-to-prevent-windows-installer-from-updating-an-unversioned-file/
And it's also not clear if that file is ALWAYS updated, so if in fact it has not been updated then why bother to ask the client whether to keep it?
It might be simpler to ignore the Windows Installer behavior by setting its component id to null, as documented here:
https://msdn.microsoft.com/en-us/library/windows/desktop/aa368007(v=vs.85).aspx
Then you can do what you want with the file. If you've already installed it with a component guid it's too late for this solution.
There are better solutions that require the app to get involved where you install a template version of this file. The app makes a copy of it that it always uses. At upgrade time that template file is always replaced, and when the app first runs after the upgrade it asks whether to use the new file (so it copies and overwrites the one it was using) or continue to use the existing file. In my opinion delegating these issues to the install is not often an optimal solution.
Setting attributes like Permanent is typically not a good idea because they are not project attributes you can turn on and off on a whim - they apply to that component id on the system, and permanent means permanent.

I tried to make this a comment, it became to long. I prefer option 4 that Phil describes. Data files should not be meddled with by the setup, but managed by your application exe (if there is one) during its launch sequence. I don't know about others, but I feel like a broken record repeating this advice, but hear us out...
There is a description of a way to manage your data file's overwriting or preservation here. Essentially you update your exe to be "aware" of how your data file should be managed - if it should be preserved or overwritten, and you can change this behavior per version of your application exe if you like. The linked thread describes registry keys, but the concept can be used for files as well.
So essentially:
Template: Install your file per-machine as a read-only template
Launch Sequence: Copy it in place with application.exe launch sequence magic
Complex File Revision: Update the logic for file overwrite or preservation for every release as you see fit along the lines as the linked thread proposes
Your setup will "never know" about your data file, only the template file. It will leave your data file alone in all cases. Only the template file it will deal with.
Liberating your data files from the setup has many advantages:
Setup.exe bugs: No unintended accidental file overwrites or file reset problems from problematic major upgrade etc... this is a very common problem with MSI.
Setup bugs are hard to reproduce and debug since the conditions found on the target systems can generally not be replicated and debugging involves a lot of unusual technical complexity.
This is not great - it is messy - but here is a list of common MSI problems: How do I avoid common design flaws in my WiX / MSI deployment solution? - "a best effort in the interest of helping sort of thing". Let's be honest, it is a mess, but maybe it is helpful.
Application.exe Bugs: Keep in mind that you can make new bugs in your application.exe file, so you can still see errors - obviously. Bad ones too - if you are not careful - but you can easily implement a backup feature as well - one that always runs in a predictable context.
You avoid the complicated sequencing, conditioning and impersonation concerns that make custom actions and modern setups so complicated to do right and make reliable.
Following from that and other, technical and practical reasons: it is much easier to debug problems in the application launch sequence than bugs in your setup.
You can easily set up test conditions and test them interactively. In other words you can re-create problem conditions easily and test them in seconds. It could take you hours to do so with a setup.
Error messages can be interactive and meaningful and be shown to the user.
QA people are more familiar with testing application functionality than setup functionality.
And I repeat it: you are always in the same impersonation context (user context) and you have no installation sequence to worry about.

SSDT - Build Deployment Script without dacpac

I've got a question about building a deployment script using SSDT.
Could anyone tell me if it's possible to build a deployment script using SQLPackage.exe where the source file is NOT a dacpac file, but uses the .sql files instead?
To give some background, I've created a project in Visual Studio 2012 for my database schema. This works great, and SSDT builds the folder structure without a problem (functions, stored procedures etc which contain all the .sql files).
Here's the problem - the database in question is from a legacy system, and is riddled with errors. Most of these errors we don't care about anymore and it's not practical or safe to fix them all, so for years we've basically ignored them. However it means we can't build the project and therefore can't generate the dacpac file. Now this doesn't prevent us from doing the schema compare and syncing the database with the file system (a local mercurial repository). However it does seemingly prevent us from building a deployment script.
What I'm looking for is a way of building the deployment script using SQLPackage.exe without having to generate the dacpac file. I need to use the .sql files in the file system instead. Visual Studio will produce a script of the differences without building the dacpac, so this makes me think it must be possible to do it using SQLPackage.exe using one of the parameters.
Here's an example of SQLPackage.exe which I'd like to adapt to use the .sql files instead of the dacpac:
sqlpackage.exe /Action:Script /SourceFile:"E:\SourceControl\Project\Database
\test_SSDTProject\bin\Debug\test_SSDTProject.dacpac" /TargetConnectionString:"Data
Source=local;Initial Catalog=TestDB;User ID=abc;Password=abc" /OutputPath:"C:
\temp\hbupdate.sql" /OverwriteFiles:true /p:IgnoreExtendedProperties=True
/p:IgnorePermissions=True /p:IgnoreRoleMembership=True /p:DropObjectsNotInSource=True
This works fine because it uses the dacpac file. However I need to point it at the folder structure where the .sql files are instead.
Any help would be great.

As has been suggested in comments, I think that biting the bullet and fixing the errors is the way ahead. You say
it's not practical or safe to fix them all,
but I think you should give this a bit more thought. I have recently been in a similar situation to you, and the key to emerging from it is to realise that the operational risk associated with dropping procedures and functions that will throw an exception as soon as they are called is zero.
Note that this does not apply if the reason these objects won't build is that they contain cross-database or cross-server references that are present in production but not in your project; this is a separate problem altogether, but also a solvable one.
Nor am I in favour of "exclude from build" as an alternative to "delete"; a while ago I saw a project where this technique had been deployed extensively; it makes it harder to see what does what from the source files and I am now of the opinion that "Build Action=None" is simply "commenting out the bits that don't work" for the Snapchat generation.
The key to all of this, of course, is source control. This addresses the residual risk that one day you might indeed want to implement a working version of one of your currently non-working procedures, using the non-working code as a starting point. It also obviates the need to keep stuff hanging around in the solution using Build Action=None, as one can simply summon an earlier revision of the code that contained the offending objects.
If my experience is any guide, 60 build errors is nothing; these could easily be caused by references to three or four objects that no longer exists, and can be consigned to the dustbin of source control with some enthusiastic use of the "Delete" key.

Do you have a copy of SQL Compare at your disposal? If not, it might be worth downloading the trial to see if it will work in your scenario.
Here are the available switches:
http://documentation.red-gate.com/display/SC10/Switches+used+in+the+command+line
At the very least you'll need to specify the following:
/scripts1:
/server2:
/database2:
/ScriptFile:

WiX: Paraffin and repository/build server integration

Short version: How can I make sure that my component GUIDs remain stable using Paraffin on a build server?
I am currently working on a project that should be deployed via WiX. As this is a web project, it contains many files (still in early stage and already almost 200 files). Also, during development, files are constantly added and deleted, so maintaining the WiX component lists manually is simply not an option.
Since I read a lot about component rules and that people breaking them go to hell, I decided to go with Paraffin as a harvester. This tool is capable of updating an existing component list, thus not re-creating new GUIDs for existing components.
However, when a new component is created, the tool assigns a new GUID. Even if the component files are identical, then initial GUIDs will be different on different machines or even only at different times.
So, obviously, I need a central authority for fixing the initial GUIDs. My idea was to commit empty component lists, which are then filled by the build server calling Paraffin on build. So when I only distribute the MSIs created by the build server, I can be sure that component rules are being followed.
However, the problem with this approach is, that I have no means of tracking my GUIDs, should the build server crash or empty its local repository. I was thinking about having the build server commit the generated component list to my repository, but that doesn't seem like a clean idea.
Another solution I thought of was having all developers build (and thus call Paraffin) before commiting. Thus, each developer would create the initial GUIDs for their newly added files and commit them to the component list.
The obvious problem with this approach: People (e.g. developer A) will forget to build before they commit. So in these cases the build server will create the initial GUIDs for the new files, but those will also only be stored locally. A few commits later, developer B will come along and build the solution, creating a new GUID for the files created by developer A. He will then commit the component list containing this GUID and the build server will check it out. Now the build server has obtained a GUID (created by developer A) for a package, for which it had previously used a different (self-created) GUID, even though the files didn't change in the meantime.
So, how can I make sure, that my GUIDs remain stable between builds without relying on developers to build their solution before they commit? The approaches outlined above both seem unsatisfying to me, but are all I can think of right now.

As far as I am concerned component rules only really come into play when you have multiple installers that share components with the same guids (which should then be exactly the same resource(s)) or you are using a wixlib or a merge module which is then included as part of different installers.
From what you have said above, to me it doesn't sound like you will so, there is no harm in having different component guids for each build. It will just mean that when you upgrade the website, files that have not changed will be removed and re-installed under a different component guid. IMHO that doesn't really matter as long as the installer correctly installs all files that are required for the site to function and doesn't remove components from other products.
If you use the MajorUpgrade element, the old product will be completely removed before the new one is installed so any component guid's that are shared between the two versions will be removed and then re-installed anyway.
I always just leave my guid elements as Guid='*' that way I know that the there will never^ be any guid clashes in any of my components across my multiple products.
^ I know this is not theoretically true but in this use case it is.

Not entirely true. Changing your component GUIDs from build to build is fine if and only if you schedule RemoveExistingProducts early so that the files are off the system before you reinstall the new GUIDs. This approach works well for smallish installers with not so many files, but as your installer grows you will feel the pinch of having twice as much IO to do, as you remove and then reinstall, rather than just overwrite your files. In short, it's up to you, but you should think carefully about how large your application is likely to get before jumping in with the suggested approach.

WIX MSBuild automation help - solution best practices

I know there are many questions out there regarding this same information. I have read them all, but my brain is all turned around and I don't know which way to go. Plus the lack of documentation really hurts.
Here is my scenerio. We are trying to use WIX to create an installer for our application that goes out to our dealers for our product information. The app includes about 2000 images and documents of our products and a SQL CE database that are updated via Microsoft Sync Framework. The data changes so often that keeping these 2000 as content files in the app's project is very undesirable. The app relies on .NET Framework 3.5 SP1, SQL Server CE 3.5, Microsoft Sync Framework 1.0 and ADO.NET Sync Services 2.0.
Here are the requirements for the app:
The dealers will be given the app on a CD every year for any updates (app or data updates).
The app must update itself from the internet to get any new images, documents or data.
The prerequisites must be installed if they do not exist on the client machine.
The complete installer should be generated from an MSBuild script with as little human interaction as possible (we don't want to be manually updating the 2000+ file list).
What we have accomplished so far is that we have a Votive project in our solution. We have manually specified the binaries in a .wxs file. Web have modified the .wixproj file to use the HeatDirectory task to gather our data (images and documents and database) from a specified location (This is broken and giving an ICE38 error). This seems all right, but still is a lot of work. We have to manually update our data by running the program in release mode and copying it to the specified directory.
I am looking to see what other people would do in this situation.
How would you arrange your solution with regards to the 2000+ data files? Would you create a custom build script that gets the current data from the server or would you include them as content files in the main project?
How would you get WIX to include all of the project output (including the referenced assemblies) and all of the data files? If you have any complete samples, that would be great. All I have found are little clips here and there and not an entire example from start to finish.
How would you deal with the version numbers? Would you put them as a constant in the build script and reference them through the $(var.VersionNumberName)? Would you have the version number automatically picked up from the project being deployed? If so, How?
If there is any better information than what I am finding, please include. I have read numerous articles, blogs, Stackoverflow questions, the tuturial, the wiki, etc. Everything seems to be in bits and pieces. The tutorial is nice, but doesn't explain anything about MSBuild and Votive. I would like to see a start to finish tutorial on using MSBuild and Votive and all the WIX MSBuild targets. If no one knows of a tutorial like this I may put one together. I have already spent the entire week gathering info and reading. I'm new to MSBuild as well, so if anyone has any great articles on MSBuild, please include them.

The key is to isolate the different types of complexities into separate merge modules and put them altogether into an MSI as part of the build. That way things that change often can change without impacting things that hardly change at all.
1) For the data files:
We use Paraffin to generate the WiX and hence the merge modules for an html + Flash based help system consisting of thousands of files (I can't convince the customer to go to CHM).
Compile these into a merge module all by themselves.
2) Assemblies: assuming that this is a set that changes less often just make a merge module by hand or with WixEdit with the correct files and dependencies.
3) For the version number there a lot of ways to manage this depending on your build system. The AssemblyInfoTask is pretty straight forward way to make sure all your assemblies are versioned appropriately. The MSBuild Extension Pack has some versioning stuff if you are using TFS.

I had a similar scenario and was unable to find a drop in solution so ended up with the following:
I wrote a custom command line program called wixgen.exe for generating wxs manifest files. It is pretty specific to our implementation in that it only knows how to create 2 types of wxs files. One for IIS Website/Virtual Directory deployments and another for Windows Service deployments.
Each time a build is triggered by our continuous integration server a post-build task runs wixgen with the right args to generate a new manifest.wxs for the project being changed. It automatically includes all the files needed for the deployment. These builds also version the dlls using a variation of the technique at: http://richardsbraindump.blogspot.com/2007/07/versioning-builds-with-tfs-and-msbuild.html
A seperate build which is manually triggered is then used to build the wixproj projects containing the generated wxs files and produce the msi's.

I would ditch the CD delivery (so 90's) and got with ClickOnce. This solution seems to fit well since you already use the .NET framework. With ClickOnce you should be able to just keep updating the content of your solution and make updates available to your heart's content. Let me know if you need, sample ClickOnce deployment code.
You can find more ClickOnce information here.

Similar to dkackman's answer, you should seperate your build into several components, isolating build components to be built seperately.
I come from a mainly Java background, however for building MSIs and NET executables we use maven; with the 'maven-wix-plugin' plugin for building the installers, and using the NMaven plugin for compiling any NET code. However, as we're only performing very basic development in NET, with most development in Java, we don't need too much complexity from the NMaven plugin (which is probably a 'good thing' (TM) as it's only at version 0.17).
If you're a purely NET house, you could also look into Blydan (http://www.codeplex.com/byldan), which seems to be the focus of development there at the moment (it's the same team for NMaven and Byldan).
If you do want more information on NMaven or Byldan raise another question and I'll give as much info as I can (which is not a huge amount, as stated I only do very limited NET development).

Best approach to perform a CMMI Physical Configuration Audit?

The organization I currently work for an organization that is moving into the whole CMMI world of documenting everything. I was assigned (along with one other individual) the title of Configuration Manager. Congratulations to me right.
Part of the duties is to perform on a regular basis (they are still defining regular basis, it will either by quarterly or monthly) a physical configuration audit. This is basically a check of source code versions deployed in production to what we believe to be the source code versions in production.
Our project is a relatively small web application with written in Java. The file types we work with are java, jsp, xml, property files, and sql packages.
The problem I have (and have expressed but seem to be going ignored) is how am I supposed to physical log on to the production server and verify file versions and even if I could it would take a ridiculous amount of time?
The file versions are not even currently in the file(i.e. in a comment or something). It was suggested that we place visible version numbers on each screen that is visible to the users also. I thought this ridiculous also, since the screens themselves represent only a small fraction of the code we maintain.
The tools we currently use are Netbeans for our IDE and Serena Dimensions as our versioning tool.
I am specifically looking for ideas on how to perform this audit in a hopefully more automated way, that will be both accurate and not time consuming.
My idea is currently to add a comment to the top of each file that contains the version number of that file, a script that runs when a production build is created to create an XML file or something similar containing the file name and version file of each file in the build. Then when I need to do an audit I go to the production server grab the the xml file with the info, and compare it programmatically to what we believe to be in production, and output a report.
Any better ideas. I know this has to have been done already, and seems crazy to me that I have not found any other resources.

You could compute a SHA1 hash of the source files on the production server, and compare that hash value to the versions stored in source control. If you can find the same hash in source control, then you know what version is in production. If you can't find the same hash in source control, then there are untracked modifications in production and your new job title is justified. :)

The typical trap organizations fall into with the CMMI is trying to overdo everything. If I could suggest anything, it'd be start small & only do what you need. So consider any problems that you may have had in the CM area peviously.
The CMMI describes WHAT an organisation should do, but leaves the HOW up to you. The CMMI specification, chapter 2 is well worth a read - it describes the required, expected, and informative components of the specification - basically the goals are required, the practices are expected, and everything else is informative. This means there is only a small part of the specification which a CMMI appraiser can directly demand - the goals. At the practice level, it is permissable to have either the practices as described, or acceptable alternatives to them.
In the case of configuration audits, goal SG3 is "Integrity of baselines is established and maintained". SP3.2 says "Perform configuration audits to maintain integrity of the configuration baselines." There is nothing stated here about how often these are done, or how long they may take.
In my previous organisation, FCA/PCA was usually only done as part of the product release process, and we used ClearCase as the versioning tool, with labels applied across the codebase to define baselines. We didn't have version numbers in all the source files, nor did we have version numbers on all the products screens - the CM activity was doing the right thing & was backed up by audits, and this was never an issue in any CMMI appraisal.
We could use the deltas between labels to look at what files had changed, perform diffs to see the actual code changes. An important part of the process is being able to link those changes back to either a requirement/bug report/whatever the reason was which initiated the change.
Our auditing did use scripts to automate the process, but these were in-house developed scripts are specific to ClearCase - basically they would list all the files, their versions in the CM system, and the baseline/config item to which they belonged.

can't you use your source control for this? if you deploy a version and tag your sourcecontrol with that deployment, you can then verify against the source control system

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas