Database repository (Oracle) vs File system as a repository for Pentaho - pentaho

I want to use Pentaho for my work. After a bit of research I found that to store the ktr/kjb files I can use either a database as a repository or the file system as a repository. However, I don't see any benefit of using a database repository over a file-system one. The basic purpose of the repository here is to create a common location where I can keep all the developed ktr/kjb files in the production environment. With a database repository, it would hold all the developed ktr/kjb files in production, and every time I need to run a job/transformation I would connect to the database to fetch the respective ktr/kjb file (similar to how Informatica stores transformations). A file-based repository, on the other hand, would simply be a folder holding all the developed files.
Can somebody here explain the pros and cons of both types of repository?
Please let me know if you need any other information.
Thanks in advance.

When several people develop the same jobs/transformations, the database repository will hold the changes and ensure everyone works with the latest versions.
The pros of a file-system repository are, of course, ease of backup, no database connection that can trouble you, and the possibility of using other, more modern and mature version control systems for the files than what the database repository offers.

If you are using the free community edition, I would definitely go with the file repository, along with external file-based version control and migration systems. If you are using the enterprise edition, then you might want to consider the database repository, since you can then use Pentaho's built-in version control and migration systems.
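On the command line the difference mostly shows up in how you point Kitchen (or Pan) at the job. A minimal sketch, assuming the standard Kitchen options; the repository name, credentials, directory, paths and job name are placeholders:

```bash
# Run a job stored as a plain .kjb file on disk (file-based repository)
./kitchen.sh -file=/srv/pentaho/jobs/load_warehouse.kjb -level=Basic

# Run the same job from a repository defined in repositories.xml
./kitchen.sh -rep=ProdRepo -user=admin -pass=admin \
             -dir=/warehouse -job=load_warehouse -level=Basic
```

Either way the scheduling side stays the same; only the lookup of the ktr/kjb file changes.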

Related

Repository for storing derived information (build artifacts)

I'm looking for a "repository" to store derived information (build artifacts).
We have a repository (currently Mercurial) to store our source code. When something is pushed to the source repository, the code goes through a continuous integration server and we do an incremental build, and as a result some DLLs will change. These should be added to some "repository" so that everybody can use that version without needing to do the build again.
I'm looking for the following features:
It should be easy to update the source code and get the corresponding binaries (we could probably make a script for that)
You should easily get all binaries at once (not only those that changed during the last incremental build).
Binaries that weren't changed should only be stored once in the repository.
When updating the source code and the binaries only the changed binaries should be transferred (and not all binaries). This is similar to what happens for source code.
When updating to some version, only that version should be stored locally, not the complete history.
We should be able to remove certain versions from the binary "repository" after a while. However, if the DLLs are still necessary for subsequent incremental builds, these DLLs should of course not be completely removed from the "repository".
What would fit these requirements?
I agree with Manfred, what you are looking for is a binary repository manager. Besides the Nexus repository manager you should consider Artifactory.
As for the feature list you asked about:
As you have mentioned, the CI server should be responsible for identifying a change in version control and starting a build process which creates the binaries. The CI server/build tool should also deploy the generated binaries to the repository manager, provided the build was successful. Artifactory offers a build integration feature which takes care of deploying the binaries together with the build metadata.
Using the build integration feature of Artifactory, you can get a list of all the binaries generated by a specific build and download them as an archive. Artifactory provides a REST API for those actions.
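For example, deploying and retrieving a single binary over the REST API is just a plain HTTP call; a hedged sketch, where the host, repository name, path and credentials are all placeholders:

```bash
# Upload (deploy) a binary to a repository path
curl -u deployer:secret -T build/libs/app-1.2.3.jar \
  "https://artifactory.example.com/artifactory/libs-release-local/com/example/app/1.2.3/app-1.2.3.jar"

# Download it again on another machine
curl -O "https://artifactory.example.com/artifactory/libs-release-local/com/example/app/1.2.3/app-1.2.3.jar"
```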
There are different approaches for storing artifacts in a repository manager. Some tools store multiple copies of the same binary. Others, for example Artifactory, use checksum-based storage which keeps only one copy per binary (based on its checksum). This pays off if you keep multiple copies of the same binary in different repositories, especially if you are dealing with large binaries (WAR files, Docker images, ISOs etc.). Another benefit is cheap copies/moves between repositories, which is a common practice for promotion workflows.
The Artifactory build integration uses checksum-based deployment, which deploys only binaries that do not already exist in Artifactory. For binaries which do exist and have not changed, it only creates a new reference to the existing binary, saving the need to send the actual bytes.
Artifactory provides multiple options for cleaning up binaries, including built-in cleanup policies and the option to develop your own custom logic using user plugins and the Artifactory Query Language (AQL).
In addition, I highly recommend taking a look at the binary repository comparison matrix.
Disclaimer: I work for JFrog, the company behind Artifactory.
You are basically asking for a repository manager like the Nexus Repository Manager as you have correctly identified with the tags.
In terms of specific requirements from your question, here are a couple of ideas.
Binary components are typically identified via coordinates that usually include some sort of name and version. A release and build process changes those and deploys the binaries to the repository. This allows you to match source code with binaries. You can also embed information like Git refs in the produced binaries (a short deploy sketch follows below).
Accessing the binaries is typically done via HTTP, so it's easy. You then just have to determine what it means to get "all binaries".
Not duplicating binaries that are essentially the same can be supported by the underlying file system or the build tool. I have seen both approaches work. Often, however, it is not worth the effort, since storage is cheap.
There are various ways to automatically clean up repositories, including scheduled tasks that do it regularly. Worst case, you have to implement your own logic in an extension.
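As an illustration of the coordinates point above, publishing an existing binary under explicit coordinates could look roughly like this; the group/artifact names, version and repository URL are placeholders, not a prescription:

```bash
# Publish a pre-built binary to a repository manager under explicit coordinates
mvn deploy:deploy-file \
  -DgroupId=com.example -DartifactId=reporting -Dversion=1.4.0 \
  -Dpackaging=jar -Dfile=build/reporting-1.4.0.jar \
  -Durl=https://nexus.example.com/repository/releases/ \
  -DrepositoryId=releases
```

Consumers and the CI server then resolve com.example:reporting:1.4.0 from the repository instead of rebuilding it.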
Disclaimer: I work as community advocate and trainer for the Nexus Repository Manager with Sonatype.

How to upload new/changed files from development server to the production one?

Recently I started to incorporate good practices into my development workflow, so I split the development server and the production one. I also incorporated a versioning system using Subversion (TortoiseSVN).
Now I have the problem of synchronizing the production server (Apache shared hosting) with the files of the latest development version on my local machine.
Before, I didn't have this problem because I worked directly with the server files through FileZilla. But now I don't know how to transfer the files in an efficient way and what the good practices are in this respect.
I read something about Ant and Phing, but I'm not sure whether this is appropriate for me or is unnecessary complexity.
Rsync is a cross-platform tool designed to help in situations like this; I've used it for similar purposes on multiple occasions. This DevShed tutorial may be of some help.
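A minimal sketch of such a sync over SSH, assuming your shared host allows SSH; the local directory, user and host are placeholders, and --dry-run lets you preview the changes first:

```bash
# Preview which files would be sent or removed
rsync -avz --delete --dry-run ./public_html/ user@example.com:/home/user/public_html/

# Do the actual sync (only changed files are transferred)
rsync -avz --delete ./public_html/ user@example.com:/home/user/public_html/
```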
I don't think you want to "automate" it; rather, establish control over your deployment and integration process. I generally like SVN, but it has some bugs, and one problem I have with it is that it doesn't support baselining; instead you need to make a physical branch of your repository if you want a stable version to promote to higher environments while continuing to advance the trunk.
Anyway, you should look at continuous integration and Jenkins. This is a rather broad topic to which no single specific answer can be given. There are many ins, outs, and what-have-yous. It depends on your application platform and components, whether you have database changes, whether you are dealing with external web services or 3rd-party APIs, etc.
There may be more structured solutions out there, but with TortoiseSVN you can export only the files changed between versions into a folder tree structure, and then upload them via FileZilla as usual.
Take a look at:
http://verysimple.com/2007/09/06/using-tortoisesvn-to-export-only-newmodified-files/
1. Using TortoiseSVN, right-click on your working folder and select “Show Log” from the TortoiseSVN menu.
2. Click the revision that was last published.
3. Ctrl+Click the HEAD revision (or whatever revision you want to release) so that both the old and the new revisions are highlighted.
4. Right-click on either of the highlighted revisions and select “Compare revisions.” This will open a dialog window that lists all new/modified files.
5. Select all files from this list (Ctrl+A), then right-click on the highlighted files and select “Export selection to…”
Side note:
You have to share more details about your workflow and configuration; the applicable solutions depend on them. I see four main nodes in the game: Workplace, Repo Server, DEV, PROD. Some nodes may be combined (1+2, 2+3) and may have different sets of tools (do you have SSH, rsync, NFS, or Subversion clients on DEV|PROD?). All details matter.
In any case, Subversion repositories have such a thing as hooks; in your case a post-commit hook (executed on the repository server side after each commit) may be used.
In this hook (any code which can be executed in unattended mode) you can define and implement any rules for performing a deploy to any target under any conditions. You only need to know:
Which transport will be used for transferring files
What your webspaces on the servers are (working copies or just clean unversioned files; both solutions have their pros and cons): this will define which deployment policy ("export" or "update") you have to implement in the hook
There are also scripts around which export the files affected by a revision (or a range of revisions) into an unversioned tree.
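A minimal post-commit hook sketch for the "update" policy, assuming the production webspace is a working copy reachable over SSH; the paths, user and host are placeholders:

```bash
#!/bin/sh
# hooks/post-commit on the repository server; Subversion passes the repo path and revision
REPOS="$1"
REV="$2"

# "update" policy: bring the production working copy up to the committed revision
ssh deploy@prod.example.com "svn update -r $REV /var/www/site" \
  >> /var/log/svn-deploy.log 2>&1
```

With the "export" policy you would instead svn export the affected files into a clean, unversioned tree and transfer them (e.g. with rsync).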

Management tools for configuration properties in the same style as tools like liquibase and flyway?

I'm wondering if there are any existing tools that allow for management of configuration properties (like Java properties files, or other similar formats) in the same style as database migration tools like Liquibase and Flyway.
I'd like a better way of managing my application's configuration properties which would allow me to track their evolution over time.
Does something like this exist?
Isn't it enough if all config is externalized and version controlled? Tracking evolution then is just a matter of version history. Unlike DB migration scripts, config doesn't execute to upgrade something from version x to version y. The Continuous Delivery book says:
Generally, we consider it bad practice to inject configuration information at build or packaging time. This follows from the principle that you should be able to deploy the same binaries to every environment so you can ensure that the thing that you release is the same thing that you tested.
And then there is this advice to store it all as environment variables.
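A small sketch of that approach, where per-environment properties files live in version control next to the code and an environment variable selects the active one at deploy time; the config/ layout, the APP_ENV variable and the target path are assumptions for illustration only:

```bash
# config/dev.properties, config/staging.properties and config/prod.properties
# are all committed, so their evolution is tracked by normal version history.
APP_ENV="${APP_ENV:-dev}"   # set to dev, staging or prod on each machine
cp "config/${APP_ENV}.properties" /etc/myapp/application.properties
```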

Migrating from GForge to TeamForge

Can anyone give a detailed procedure on how to migrate projects from GForge version 4.5 to TeamForge version 5.2.0?
Migration includes the source repository, bug tracking, wiki and discussions.
Is it possible to shift all of them?
What's the best way to handle a situation like this?
Thank you.
TeamForge uses Subversion, so assuming you have command-line access to the server, you can use the svnadmin dump and svnadmin load commands.
You will need to run these commands for each repository.
Some (I don't know which) of the pages and wiki are also stored in Subversion repositories, so you may be able to migrate those using the same method.
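A minimal sketch of moving a single repository with dump/load; the paths are placeholders, and the empty target repository has to exist (created through TeamForge, or with svnadmin create) before loading:

```bash
# On the GForge server: serialize the repository with its full history
svnadmin dump /svnroot/myproject > myproject.dump

# On the TeamForge server: load the dump into the empty target repository
svnadmin load /svnroot/myproject < myproject.dump
```

Repeat this for each repository you want to migrate.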

Migrating Kettle file repository into Kettle DB repository

Is it possible to migrate a Kettle file based repository (including history of all files) into a Kettle database repository?
Yes. Just open all the files (all together or one by one), and save them (all together or one by one) in the repository.
You will lose all the versioning, however (maybe that is what you call file history), unless you have the enterprise edition.
If you are on the community version, I suggest you use the database repository for the current version, and every week or so export it to a file repository as a backup.