I am using CruiseControl to continuously check my projects, and so far it appears to be working better than Continuum, except for some minor issues.
If I have a project that depends on several other projects and I update one of those dependencies without bumping its version number (so no change is needed in the parent pom), the parent never gets rebuilt.
Is this possible, or should I just go back to incrementing the version in the pom after each change and then making the corresponding changes in the parents as needed?
I was hoping this would be done automatically (if CruiseControl had tighter Maven integration), so that I would magically know this change broke something downstream.
I am guessing my new format will probably be:
${date}.${buildNumber}
2009.12.18.1
Thanks,
Walter
If your projects are in a parent->child relationship in Maven, building the parent should also build its modules with the newly resolved dependencies every time, if you are using the install goal.
Based on your description, it sounds like you don't have a parent project with modules but instead a handful of separate projects with dependencies between them. In this case, I would suggest you do two things:
Change the versions at the top level of your poms and in your dependencies to include -SNAPSHOT on the end. This forces Maven to check for new versions of any dependency within a specified timeframe, daily by default. This behavior can be altered by changing the updatePolicy for your repositories; you can have it check every single time if you wish. When it comes time to tag and then build/release your projects, you'll want to remove the -SNAPSHOT qualifier for the release, and then re-add it after incrementing your version number to support your next development cycle.
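For example, here is a minimal sketch of a repository entry with the snapshot updatePolicy set to always (the repository id and URL are placeholders):

    <!-- pom.xml or settings.xml profile: placeholder repository id/url -->
    <repositories>
      <repository>
        <id>internal-snapshots</id>
        <url>https://repo.example.com/snapshots</url>
        <snapshots>
          <enabled>true</enabled>
          <!-- check for new SNAPSHOT versions on every build instead of daily -->
          <updatePolicy>always</updatePolicy>
        </snapshots>
      </repository>
    </repositories>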
In your CI server, you can force projects to build in succession. Since it looks like you switched to Hudson, it should just involve setting up a build trigger based on the completion of another project, much like you'd add a trigger for SCM polling.
Unfortunately I don't know enough about cruise control to directly answer your question.
However, I have had positive experience with Hudson. It is widely used as a CI server, is free, has excellent Maven integration, and handles what you describe perfectly.
I am working on building a new SSIS project from scratch. I want to work with a couple of my teammates. I was hoping to get a suggestion on how we can set up some source control, so that a few of us can work concurrently on the same SSIS project (same dtsx file, building new packages).
Version:
SQL Server Integration Services v11
Microsoft Visual Studio 2010
It is my experience that there are two opportunities for any source control system and SSIS projects to get out of whack: adding new items to the project and concurrent changes to an existing package.
Adding new items
An SSIS project has the .dtproj extension. Inside there, it's "just" XML defining what all belongs to the project. At least for 2005/2008 and 2012+ on the package deployment model. The 2012+ project deployment model carries a good bit more information about the state of the packages in the project.
When you add new packages (or project level connection managers or .biml files) the internal structure of the .dtproj file is going to change. Diff tools generally don't handle merging XML well. Or at all really. So, to prevent the need for merging the project definition, you need to find a strategy that works for your team.
I've seen two approaches work well. The first is to define up front all the packages you think you'll need: DimFoo, DimDate, DimBar, FactBlee. Check that project and the associated empty packages in, and everyone works on what is out there. When the initial cut of packages is complete, you'll ensure everyone is synced up and then add more empty packages to the project. The idea here is that there is one person, usually the lead, who is responsible for changing the "master" project definition, and everyone consumes from their change.
The other approach requires communication between team members. If you discover a package needs to be added, communicate with your mates: "I need to add a new package - has anyone modified the project?" The answer should be No. Once you've notified that a change to the project definition is coming, make it and immediately commit it. The idea here is that people commit and sync (check in, whatever your terminology) with great frequency. If you as a developer don't keep your local repository up to date, you're going to be in for a bad time.
Concurrent edits
Don't. Really, that's about it. The general problem with concurrent changes to an SSIS package is that, in addition to the XML diff issue above, SSIS also stores layout data alongside the tasks, so I can invert the layout and make things flow from bottom to top or right to left with no material change to the SSIS package. And as Siyual notes, "Merging changes in SSIS is nightmare fuel."
If you find your packages are so large that developers need to make concurrent edits, I would propose that you are doing too much in there. Decompose your packages into smaller, more tightly focused units of work and then control their execution through a parent package. That allows a better level of granularity for your development and debugging process, in addition to avoiding the concurrent edit issue.
A dtsx file is basically just an XML file. Compare it to a bunch of people trying to write the same book. The solution I suggest is to use Team Foundation Server as source control. That way everyone can check packages in and out and merge them. If you really don't have that option, try to split your ETL process into logical parts and at the end create a master package that calls each sub package in the right order.
An example: let's say you need to import stock data from one source, branches and other company information from an internal server, and sale amounts from different external sources. After you have gathered all the information, you want to connect those and run some analyses.
You first design the target database entities that you need and their relations. One of your team members creates a package that does all the imports to staging tables. Another person maybe handles the external sources and parallelizes/optimizes the loading. You would build a package that merges your staging and production tables, maybe with historization and so on.
At the end you have a master package that calls each of the mentioned packages, plus maybe some additional logging and such.
In our multi-developer operation, we follow this rough plan:
Each dev has their own branch, separate from master branch
Once a week, devs push all their changes to remote
One of us pulls all changes, and merges all branches into master, manually resolving .dtproj conflicts as we go
Merge master into all dev branches - now all branches agree
Test in VS
Push all branches to remote, other devs can now pull and keep working
It's not a perfect solution, but it helps quarantine the amount of merge pain we have to experience.
We have large SSIS solutions with 20+ packages in one solution, using TFS Git. One project required adding a bunch of new packages to the existing solution. We thought we were smart and knew to assign only one person to work on each new package; 2 people working on the same package would be suicide. That wasn't good enough. When 2 people tried to add differently named, new packages at the same time, each showed .dtproj as a file that had changed and needed to be checked in, and suddenly I found myself looking at the XML for .dtproj and trying to figure out which lines to keep (Microsoft should never ask end users to manually edit their internal files, which only they wrote and understand).
Billinkc's solutions here are very good, and the problem is very real. You may think that Microsoft is the great Wise One and that your team can always add new packages to an existing solution without conflicts, but you'd be wrong. It also doesn't work to put .dtproj in .gitignore. If you do that, you won't see other people's new packages (the .dtsx file will actually come down in Git, but you won't see that package in Solution Explorer, because .dtproj is what feeds Solution Explorer). This is a current problem (2021), and we are using Visual Studio 2017 Enterprise with SSDT.
To explain this problem to people: Git obviously can handle a group of independent, individual files in a directory (like, say, .bat files) and can add, change, and delete those files easily. The problem comes in when you have a file that names, describes, and counts all the files in a directory (which is what .dtproj does). When you have a file like .dtproj, you create a conflict on .dtproj itself when 2 people try to add a new package at the same time. Your .dtproj file has a line that shows the package you added, and my .dtproj file shows the package I added, and TFS/Git sees that as a conflict.
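As a simplified illustration (the element names below are made up, not the real .dtproj schema): both branches add a different line to the same list in the project file, so the merge stops with a conflict on that file even though the new packages themselves don't overlap.

    <!-- Hypothetical, simplified project file in mid-merge; element names are illustrative -->
    <Packages>
    <<<<<<< dev-branch-a
      <Package Name="LoadCustomers.dtsx" />
    =======
      <Package Name="LoadOrders.dtsx" />
    >>>>>>> dev-branch-b
    </Packages>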
Some are suggesting ways to deal with this if you have to add a lot of new packages; my idea is a little different. For the people who have to add new packages: don't work in the primary solution where this problem is, work somewhere else. It's probably best to work in the "Projects" directory you get when you install Visual Studio, outside of TFS/Git. Obviously follow all the standards, Variable naming, and Package Configuration conventions for the target Solution. Then, when the new packages are ready, give the .dtsx files to your Solution Gatekeeper for them to check in. Only the Gatekeeper checks in new packages, using Add From Existing, avoiding conflicts. Once a package is checked in, developers can work on it in the main Solution.
We are struggling hard with how to use features the correct way.
Let’s say we have the plug-in org.acme.module which depends on org.thirdparty.specific and org.acme.core.
And we have the plug-in org.acme.other which depends on org.acme.core.
We want to create an application from these, which includes a target file and a product file. We have the following options:
One feature per module:
org.acme.core.feature
org.acme.core
org.acme.module.feature
org.acme.module
org.acme.other.feature
org.acme.other
org.thirdparty.specific.feature
org.thirdparty.specific
This makes the target and product files gigantic, and the dependencies are very hard to manage manually.
One feature per dependency group:
org.acme.module.feature
org.acme.core
org.acme.module
org.thirdparty.specific
org.acme.other.feature
org.acme.core
org.acme.other
This approach makes the dependencies very easy to manage, and the target and product files are easy to read and maintain. However it does not work at all. The moment org.acme.core changes, you need to change ALL the features. Furthermore, the application has no say in what to package, so it can’t even decide to update org.acme.core (because of a bugfix or something).
Platform Feature:
org.acme.platform.feature
org.acme.core
org.acme.other
org.thirdparty.specific (but could be its own feature)
org.acme.module.feature
org.acme.module
This is the approach used for Hello World applications and Eclipse add-ons - and it only works for those. Since all modules' target platforms would point to org.acme.platform.feature, every time anything changes for any platform plug-in, you'd have to update org.acme.platform.feature accordingly.
We actually tried that approach with only about 50 platform plug-ins. It's not feasible to have a developer change the feature for every bugfix. (And while Tycho supports version "0.0.0", Eclipse does not, so it's another bag of problems to use that. Also, we need reproducibility, so having PDE choose versions willy-nilly is out of the question.)
Again it all comes down to "I can't use org.acme.platform.feature and override org.acme.core's version for two weeks until the new feature gets released."
The entire problem is made even more difficult since sometimes more than one configuration of plug-ins is possible (let's say for different database providers), and then there are high-level modules using other child modules to work correctly, which has to be managed somehow.
Is there something we are missing? How do other companies manage these problems?
The Eclipse guys seem to use the “one feature per module” approach. Not surprisingly, since it’s the only one that works. But they don’t use target platforms or product files.
The key to successful grouping is knowing when to use "includes" in features and when to just use dependencies. The difference is that "includes" are really included, i.e. p2 will install included bundles and/or included features all the time. That's the reason why you need to update a bundle in every feature if it's included. If you don't update it, you will end up with multiple versions in the install.
Also, in the old days one had to specify dependencies in features. These days, p2 will mostly figure out dependencies from the bundles. Thus, I would actually stop specifying dependencies in features and just use includes. Think of features as a way to specify what gets aggregated.
Another key point to grouping is: less is more. If you have as many features as bundles, chances are pretty high that you have a granularity issue. Instead, think about what a user would install separately. There is no need to have four features for things that a user would never install alone. Features should not be understood as a way of grouping development/project structures - that's where folders in SCM or different SCM repos are OK. Think of features as deployment structures.
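To illustrate the difference using the plug-in names from the question, a minimal feature.xml sketch might look like this (the versions and grouping are illustrative, not a recommendation):

    <!-- org.acme.module.feature/feature.xml -->
    <feature id="org.acme.module.feature" version="1.0.0.qualifier">
       <!-- include another feature: p2 always installs it together with this feature -->
       <includes id="org.thirdparty.specific.feature" version="0.0.0"/>
       <!-- a plain dependency: resolved by p2 from wherever it is available, not re-packaged here -->
       <requires>
          <import plugin="org.acme.core" version="1.0.0" match="greaterOrEqual"/>
       </requires>
       <!-- an included bundle -->
       <plugin id="org.acme.module" version="0.0.0" download-size="0" install-size="0" unpack="false"/>
    </feature>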
With that approach, I would recommend a structure similar to the following example.
my.product.base
base feature containing the bare minimum of the product
could be org.acme.core plus a few minimum
my.product.base.dependencies
features with 3rd party libraries for my.product.base
my.addon.xyz
feature bundling an add-on
separate features for things that can be installed separately
my.addon.xyz.dependencies
3rd party libraries for add-on dependencies
Now in the product definition I would list just my.product.base. There is no need to also list the dependencies features. p2 will fetch and install the dependencies automatically. However, if you want to bind your product to specific versions of the dependencies and don't want p2 to select any matching one, then you must include the my.product.base.dependencies feature.
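For example, a feature-based product definition along those lines might list only the base feature (a sketch; the product name and uid are placeholders):

    <?xml version="1.0" encoding="UTF-8"?>
    <?pde version="3.5"?>
    <product name="My Product" uid="my.product" version="1.0.0.qualifier" useFeatures="true">
       <features>
          <!-- only the base feature is listed; p2 pulls in its dependencies -->
          <feature id="my.product.base"/>
          <!-- add my.product.base.dependencies here only if you want to pin the 3rd party versions -->
       </features>
    </product>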
In the target definition I would include a "my.product.sdk" feature. That feature is an aggregation feature of all other features. It makes target platform management easier. I typically create an sdk feature with everything.
Another feature that is also very often seen is a "master" feature. This is an "everything" feature that may be used for creating a p2 repository during the build. The resulting p2 repository is then used for assembling products.
For a more real world example see here:
http://git.eclipse.org/c/gyrex/gyrex-server.git/tree/releng/features
Features and Continuous Delivery
There was a comment regarding frequent updates to feature.xml. A feature.xml only needs to be modified when there is a change in structure. No updates need to happen when a bundle version is modified. You should reference bundles in features with version 0.0.0. That makes Tycho fill in the proper version at build time. Thus, all you need to do is commit a change to any bundle and then kick off a rebuild. Tycho also takes care of updating the feature qualifier based on the qualifiers of the contained bundles. Thus, the new feature qualifier will be different than in a previous build.
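For example, a bundle reference in feature.xml like the following lets Tycho substitute the actual bundle version at build time:

    <!-- feature.xml: 0.0.0 is replaced with the real bundle version by Tycho -->
    <plugin id="org.acme.core" version="0.0.0" download-size="0" install-size="0" unpack="false"/>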
My colleague pointed out a flaw in maintaining our artifacts (still somewhat new to Ivy):
The release builds are marked as “integration”, which means Ivy is rechecking for new versions on each build, slowing down the build even when it has cached the dependencies.
That did not make much sense to me, since, I think, Ivy still needs to check what is in the repo before making a decision about the version to deliver. So, I decided to research that a bit to understand exactly what the effects of marking libraries with different status values are.
I cannot find much in the documentation, though, or on the net. What am I missing?
Could someone please shed some light on this?
Thank you
The status is just a string that can be defined for Ivy. It doesn't affect the resolution of artifacts per se and has no effect on the default retrieval. It's just a marker for an artifact.
Status:
Status of a revision: A module's status indicates how stable a module revision can be considered. It can be used to consolidate the status of all the dependencies of a module, to prevent the use of an integration revision of a dependency in the release of your module. Three statuses are defined by default in Ivy:
integration: revisions built by a continuous build, a nightly build, and so on, fall into this category
milestone: revisions delivered to the public but not actually finished fall into this category
release: a revision fully tested and labelled falls into this category
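For reference, the default statuses correspond to a declaration like the following in ivysettings.xml (a sketch; you could add your own statuses the same way):

    <!-- ivysettings.xml: statuses are just labels and do not change resolution by themselves -->
    <statuses default="integration">
      <status name="release" integration="false"/>
      <status name="milestone" integration="false"/>
      <status name="integration" integration="true"/>
    </statuses>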
You need to declare the dependency as changing, or configure it on the resolver definition, to achieve what your co-worker mentioned:
Changes in artifacts: Some people, especially those coming from Maven 2 land, like to use one special revision to handle often updated modules. In Maven 2 this is called a SNAPSHOT version, and some argue that it helps save disk space to keep only one version for the high number of intermediary builds you can make whilst developing.
Ivy supports this kind of approach with the notion of "changing revision". A changing revision is just that: a revision for which Ivy should consider that the artifacts may change over time. To handle this, you can either specify a dependency as changing on the dependency tag, or use the changingPattern and changingMatcher attributes on your resolvers to indicate which revision or group of revisions should be considered as changing.
Once Ivy knows that a revision is changing, it will follow this principle to avoid checking your repository too often: if the module metadata has not changed, it will consider the whole module (including artifacts) as not changed. Even if the module descriptor file has changed, it will check the publication date of the module to see if this is a new publication of the same revision or not. Then if the publication date has changed, it will check the artifacts' last modified timestamps, and download them accordingly.
So if you want to use changing revisions, use the publish task to publish your modules; it will take care of updating the publication date, and everything will work fine. And remember to set checkModified="true" on your resolver too!
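In practice that means either of the following (a sketch; the organisation, module name, and repository URL are placeholders):

    <!-- ivy.xml: mark a single dependency as changing -->
    <dependency org="org.acme" name="acme-core" rev="1.0" changing="true"/>

    <!-- ivysettings.xml: or mark whole revision patterns as changing on the resolver -->
    <ibiblio name="shared" m2compatible="true"
             root="https://repo.example.com/releases"
             changingPattern=".*-SNAPSHOT" changingMatcher="regexp"
             checkmodified="true"/>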
I have a large Flash Builder project that is part of a much larger (.NET) solution. I typically have, for the entire project, a forward dev branch going, as well as one or more bug fix branches. What is the best way to set this up in Flash Builder, given that Flash Builder does NOT want to import a new project (bug fix branch) that has the same name as an existing (forward dev branch) project?
The best way is to understand the workspace limitations. Eclipse doesn't accept projects with the same name; it's an old problem. And it isn't only Eclipse - a lot of IDEs have this problem and bugs around it.
You want to create a project for production (the current stable version), at least one for bug fixes, and one for the next project version. In the Eclipse case, you should name them something like ProjectNameProd, ProjectNameBugFix, and ProjectNameNextVersion. It's also good for browsing files by path, as the folder name says what is inside.
You can put everything in one SVN or Git repo, or create one repository for each of these. Then configure Eclipse/Flash Builder to use SVN; there are well-explained guides for this.
Another, less ideal, approach is to use a single project with all your targets, but it is really a mess to organize and keep things separated.
Hope this helps.
If you are developing a large evolving multi-module Maven project, it seems inevitable that there are some dependencies given in the poms that are unnecessary, since they are transitively included by other dependencies. For example, this happens if you have a module A that originally includes C. Later you refactor and have A depend on a module B which in turn depends on C. If you are not careful enough, you'll wind up with both B and C in A's dependency list. But of course you do not need to put C into A's pom, since it is included transitively anyway. Is there a tool to find such unnecessary dependencies?
(These dependencies do not actually hurt, but they might obscure your actual module structure and having less stuff in the pom is usually better. :-)
To some extent you can use dependency:analyze, but it's not too helpful. Also check JBoss Tattletale.
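For reference, dependency:analyze can be run directly (mvn dependency:analyze) or bound into the build with the analyze-only goal; a minimal sketch (the plugin version is illustrative):

    <!-- pom.xml: warns about "used undeclared" and "unused declared" dependencies during verify -->
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-dependency-plugin</artifactId>
      <version>3.6.1</version>
      <executions>
        <execution>
          <id>analyze</id>
          <phase>verify</phase>
          <goals>
            <goal>analyze-only</goal>
          </goals>
        </execution>
      </executions>
    </plugin>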
Some time ago I started a maven-storyteller-plugin to be able to analyze the poms more deeply, but the project is very far from production/public use. You can use the storyteller:recount goal to analyze the unused/redundant dependencies.
The problem with the whole story is how to determine "unused" things. What is quite possible to analyze is, for instance, class references. But that won't work if you're using reflection - directly or indirectly.
Update November 2014.
I've just moved my old code of the Storyteller plugin to GitHub. I'll refresh it and release it to Central so that it's usable for others.
I personally use the pom editor of M2Eclipse to visually view the dependency tree (2D tree). Then I take a look at my deliverable's (war, ear) lib directories. Then, still in the M2Eclipse pom dependencies viewer, I go to every 3rd-party dependency and right-click on the one I want to exclude (an exclusion is added automatically in the right dependency).
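The result is a regular exclusion in the pom; a minimal sketch (the library coordinates are placeholders, the excluded parser is the kind of thing mentioned below):

    <!-- pom.xml: exclude a transitive parser pulled in by a 3rd-party library -->
    <dependency>
      <groupId>org.example</groupId>
      <artifactId>some-library</artifactId>
      <version>1.2.3</version>
      <exclusions>
        <exclusion>
          <groupId>xerces</groupId>
          <artifactId>xercesImpl</artifactId>
        </exclusion>
      </exclusions>
    </dependency>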
There are no golden rules, just some basic tips:
a lot of poms are not correct: a lot of 3rd-party libs out there declare way too many dependencies in the default compile scope; if everybody carefully crafted their pom, you would not have so many unwanted dependencies.
you need to guess from the names of dependencies what you will have to exclude; the best examples are parsers, transformers, document builders: xalan, xerces and co. Try to remove them and use the internal JDK 1.6 parser; common Apache stuff and log4j are also worth looking at.
also look regularly at your lib delivery to check that you do not have duplicate libraries with different versions (the dependency resolver of Maven should avoid that)
go bottom-up: start with your common modules, then go up to the service layer, trimming down dependencies in every module; don't try to start with the ear/war modules, it will be too difficult
check often that your deliverables are still working, by either testing or comparing an old deliverable with the new one (especially what has disappeared in the WEB-INF/lib directory, using WinMerge/Beyond Compare)
Say you have A -> B and B -> C, and then refactor such that A -> (B, C). If A still compiles directly against classes from C, you very much don't want to simply pick up that dependency transitively.
Think of the case when A -> (B-1.0, C-1.0) and B-1.0 -> C-1.0. Everything's in sync, so to avoid "duplication" you remove C from A's dependency list. Then you upgrade A to use B-2.0, which depends on C-2.0. You begin to see errors because A wants C-1.0 classes but finds C-2.0 classes. While quickly reconcilable in this scenario, it is far less so when you have lots of dependencies.
You very much want the information in A's pom that says it explicitly expects to find C-1.0 on the classpath, so that you can understand when you have transitive dependency conflicts. Again, Maven will do the job of ensuring that the "closest" version of any particular jar ends up on your classpath. But when things go wrong, you want all the dependency metadata you can get.
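A sketch of what that looks like in A's pom (coordinates are hypothetical): even though C arrives transitively via B, declaring it records the version A was actually compiled against.

    <!-- A's pom.xml: B is the direct dependency, but C is declared too
         because A uses C's classes directly -->
    <dependencies>
      <dependency>
        <groupId>org.example</groupId>
        <artifactId>B</artifactId>
        <version>1.0</version>
      </dependency>
      <dependency>
        <groupId>org.example</groupId>
        <artifactId>C</artifactId>
        <version>1.0</version>
      </dependency>
    </dependencies>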
On a slightly more practical note, a dependency is unused when you can remove it from your pom and all of your unit/integration/acceptance tests still pass. ;-)