Software configuration management tool for hundreds of binary files, many are large

Note: I've tried searching, but the results on Stack Overflow were nearly useless. I am not sure what kind of tool I need.
At my organization we need to keep track of the software configuration for many types of computers, including the binary installers and automation scripts. Change is infrequent, but the latest version of the configuration is several gigabytes in size.
We are trying to use Mercurial to store changes but it is just too slow, even without many revisions at all. I did an hg status but killed it after it took 10 minutes without finishing.
We are looking for a way to store the current configuration while keeping the old configurations around just in case. I have never done anything like this before and do not know what tools are available or even suitable for such tasks. Can someone point me in the right direction or tell me how they are solving this problem? Thanks

Since hard disk space is cheap and being able to view binary differences isn't very helpful, perhaps the best option you have is to store each configuration in a new directory that is indexed somehow. Example below:
/software/configs/2009-03-15
/software/configs/2009-09-28
/software/configs/2009-09-30
Given the size of your files and the infrequent number of changes, this would allow you to pick a configuration from a given 'tag' without the overhead of revision control.
If you pack your files into a single tar file and generate a SHA-512 hash, then you can be reasonably sure that no one has tampered with your files since they were archived.
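The tar-and-hash step could look roughly like the following. This is a minimal sketch, not a vetted archiving tool: the function names `archive_config` and `sha512_of` are made up for illustration, and it assumes the archive is written outside the directory being packed.

```python
import hashlib
import tarfile
from pathlib import Path


def archive_config(config_dir: str, archive_path: str) -> str:
    """Pack config_dir into an uncompressed tar file and return its SHA-512 hex digest."""
    with tarfile.open(archive_path, "w") as tar:
        # Store paths relative to the dated directory name, e.g. "2009-09-30/...".
        tar.add(config_dir, arcname=Path(config_dir).name)
    return sha512_of(archive_path)


def sha512_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so a multi-gigabyte archive is never loaded whole."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The digest is only useful as a tamper check if it is recorded somewhere that people who can modify the archive cannot also modify.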

While I don't know specific details about how to implement this strategy in Mercurial, I have been working with git and git-fat. It sets up a general procedure that is likely to be feasible in Mercurial as well. Basically, the idea is that whenever you add a binary file to the repository, under the hood, the repo creates a symlink to the file, which is actually stored in another location as a checksummed object.
This allows large files to be tracked by the repo without storing the actual data inside it. It requires the data to be stored in some other location (perhaps in a binary management system).
It might take some configuration to do it in Mercurial, but I think it's an elegantly simple solution.
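To make the idea concrete, here is a small sketch of checksum-addressed storage. It is not how git-fat itself is implemented: the function names are hypothetical, SHA-256 is an arbitrary choice, and it leaves a small text pointer in place of the file rather than a symlink, so the behaviour is the same on every platform.

```python
import hashlib
import shutil
from pathlib import Path


def stash_large_file(path: str, store_dir: str) -> str:
    """Move a file into a checksum-named object store and leave a pointer behind."""
    source = Path(path)
    digest = hashlib.sha256()
    with source.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    checksum = digest.hexdigest()

    store = Path(store_dir)
    store.mkdir(parents=True, exist_ok=True)
    target = store / checksum
    if target.exists():
        source.unlink()  # identical content is already in the store
    else:
        shutil.move(str(source), str(target))
    # The repository would track only this tiny pointer file.
    source.write_text(checksum + "\n")
    return checksum


def restore_large_file(path: str, store_dir: str) -> None:
    """Replace a pointer file with the real content it refers to."""
    pointer = Path(path)
    checksum = pointer.read_text().strip()
    shutil.copyfile(Path(store_dir) / checksum, pointer)
```

Because objects are named by their checksum, two identical installers are stored only once, and a corrupted object can be detected by re-hashing it.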

Related

Attaching a specific piece of non-intrusive info to a file or folder to keep a connection to a program

This is going to be a question with a lot of hypotheticals, but it's been on my mind for a while now and I finally want to get some perspectives on how to tackle this "issue". For the sake of the question, I'll make up an example requirement of how the program I want to make would work on a conceptual level without too many specifics.
The Problem
I want to create a program to keep track of miscellaneous info for files and folders. This miscellaneous info can be anything from comments, authors, to more specific info like the original source of the file (a URL for example), categories, tags, and more. All this info is kept track of in an SQLite database.
Now... how would you create a connection from the file (or folder) to the database? Whatever file is added to the program, the file should continue to operate independently of the program, meaning you should be able to edit, copy, move, rename or do anything else with the file you would usually do with your OS of choice, even deleting it.
You should even be able to archive it, zip it, upload it somewhere or do other things that temporarily or permanently remove the file from your system, without losing the connection to the database. The program itself never actually touches the files, except to generate a new entry in the database, but obviously there should be some kind of reference in the file to a database entry in the program.
Yes, I know that if you delete the file, you would have a dead entry in the database. For now, just treat this as an unfortunate reality that can't be solved unless you incorporate the file more closely into the program.
Possible solutions and why I decided against them
Reference inside Filename
Probably the most obvious choice, you could just have a reference inside the filename to point to a database entry, for example by including the id at the start of the filename:
#1 my-example-file.txt
#12814 this-is-one-of-many-files.txt
Obviously, that goes against what I established earlier, as you would be restricted from freely renaming the file. You would always have to keep in mind to not mess with the id inside the filename, or else the connection to your program is broken. Unfortunately, that is the best bet I currently have, but I would like to avoid using that approach if possible.
Alternate Data Streams (ADS)
A pretty cool feature I recently discovered that's available on NTFS file systems, ADS allows you to store different streams of data for your files, to grossly simplify it. You could attach a data stream to your file that saves the id for the database entry in the program, and a regular user would never be able to mess directly with that.
However, since this is a feature reserved for specific file systems, there are some ugly side effects to ADS, as you can easily lose that part of the file by:
moving/copying it to a file system that doesn't support ADS, such as the file systems most often used in removable drives
uploading it to a cloud then later downloading it
moving it to another OS that might not support ADS or treats it in an unexpected way
zipping it
Thus I can't really rely on ADS either.

Objective-C - Finding directory size without iterating contents

I need to find the size of a directory (and its sub-directories). I can do this by iterating through the directory tree and summing up the file sizes etc. There are many examples on the internet but it's a somewhat tedious and slow process, particularly when looking at exceptionally large directory structures.
I notice that Apple's Finder application can instantly display a directory size for any given directory. This implies that the operating system is maintaining this information in real time. However, I've been unable to determine how to access this information. Does anyone know where this information is stored and if it can be retrieved by an Objective-C application?

IIRC Finder iterates too. In the old days, it used to use FSGetCatalogInfo (an old File Manager call) to do this quickly. I think there's a newer POSIX call for that these days that's the fastest, lowest-level API for this, especially if you're not interested in all the other info besides the size and really need blazing speed over easily maintainable code.
That said, if it is cached somewhere in a publicly accessible place, it is probably Spotlight. Have you checked whether the spotlight info for a folder includes its size?
PS - One important thing to remember when determining the size of a file: Mac files can have two "forks", the data fork, and the resource fork (where e.g. Finder keeps the info if you override a particular file to open with another application than the default for its file type, and custom icons assigned to files). So make sure you add up both forks' sizes, or your measurements will be off.
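For reference, the iterate-and-sum approach described in the question can be sketched as follows. It is written in Python rather than Objective-C purely for illustration, the name `directory_size` is made up, and it counts only the logical file size reported by the file system, so resource forks are not included.

```python
import os


def directory_size(root: str) -> int:
    """Sum the sizes in bytes of all files below root, skipping symbolic links."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total
```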

Quick backup system for large projects

I've always backed up all my source code into .zip files, put them on my USB drive, and uploaded them to my server somewhere else in the world. However, I only do this once every two weeks, because my project is a little big.
Right now my project directories (I have a few of them) contain a hierarchy of C++ files, and interspersed with them are .o files, which would make backing up take a while if not ignored.
What tools exist that will let me back things up efficiently and conveniently, and let me specify which file types to back up (lots of .png, .jpg and some text types in there) and which directories to ignore (especially the build dirs)?
Or are there any ingenious methods out there that people use?

Though not a backup solution, a version control manager on a remote server responds to most of your needs:
only changes are saved, not the whole project
you can filter out what you don't want to save
Moreover, you can create archives of your repository for true backup purposes.
If you want to learn about version control, take a look at Eric Sink's weblog, in particular:
Source Control HOWTO, for the basics of source control
Mercurial, Subversion, and Wesley Snipes for the links to articles on distributed version control systems

I use Dropbox; I'm a single developer developing software. In some projects I work directly out of my Dropbox folder, which means they synchronize every time I build. For other projects I copy the source code there myself. Most important is that I can work on all my computers that have Dropbox installed. It works for my simple needs.

Agree with mouviciel. If you do not want that, consider rsync or unison to efficiently keep an up-to-date copy, be it on the same or a different machine.
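If a plain archive is still wanted, the filtering asked for in the question (skip .o files and build directories) is easy to script. The following is a minimal sketch: the ignore lists and the name `backup_project` are assumptions for illustration, and the zip file is assumed to be written outside the project directory.

```python
import os
import zipfile

IGNORED_DIRS = {"build", ".git"}  # hypothetical directory names to skip
IGNORED_SUFFIXES = {".o"}         # object files are rebuilt, not backed up


def backup_project(project_dir: str, zip_path: str) -> list:
    """Zip project_dir, skipping ignored directories and suffixes; return archived names."""
    archived = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for dirpath, dirnames, filenames in os.walk(project_dir):
            # Prune in place so os.walk never descends into ignored directories.
            dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
            for name in filenames:
                if os.path.splitext(name)[1] in IGNORED_SUFFIXES:
                    continue
                full = os.path.join(dirpath, name)
                relative = os.path.relpath(full, project_dir)
                archive.write(full, relative)
                archived.append(relative)
    return sorted(archived)
```

Unlike rsync or a version control system, this rewrites the whole archive each time, so it suits occasional snapshots rather than frequent incremental backups.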

What is your review process for Rhapsody development?

My team is using IBM's Rhapsody tool to do real-time embedded development. Unfortunately, we are unhappy with our current review process.
More specifically, we've had difficulty because:
there is a lack of a good diff tool for diagram changes
the Rhapsody diff tool doesn't generate reports that you can use in a review
source file history is spotty because source files are products in MDD thus not configured in a VCS at a high granularity
running diffs on source code sometimes pulls in unrelated changes made by other devs
sometimes changing a property of a model element changes dozens of source files
it's easy to change a source file through a property change and not know it
Does anyone have any tips for making peer reviews on Rhapsody development robust but low-hassle? Any best practices and lessons learned you would like to share? I'm not looking for a mature process write-up; tidbits I didn't know about would be great.

We use Rhapsody for the same purpose at my workplace. Reviews of model changes are done with a script that opens diffmerge on two copies of our repository (one at the start of the changes, one at the latest). That shows all of the pertinent changes, without any of the internal cruft Rhapsody adds.
Our repo doesn't track the generated sources, but we see plenty of irrelevant changes in Rhapsody's sbs files frequently. We've started setting sbs files as read-only on the filesystem, and then changing them to read/write from the properties panel in Rhapsody. That doesn't stop the files you mark as read/write from having cruft inserted, but it prevents unrelated files from being modified.
I still haven't found a way to make Rhapsody stop inserting irrelevant changes (for example: it sometimes adds and removes filename fields between saves, despite minimal changes to the model). It creates a lot of merge conflicts, and I've personally started taking 5 or so minutes per commit to only add the changes that matter.

We have been using Rhapsody for development for the past 5 years. Our current process involves using the Rhapsody COM interface and the Microsoft Word COM interface to dump review packages to Word for design reviews. We also do this to generate the reference manual portion of our SUM.
For code we review the generated source.
We put the model into our version control system, and lock down model elements after they have been reviewed. If your version control tool makes things read only when they are checked in, it prevents you from accidentally changing a model element.
The COM interface is also good for dumping the model to make PowerPoint slides of diagrams if you want to present your design to a customer. You will have to tweak the slides after they are generated, as the pictures usually end up looking a little funny, but it gives a quick starting point.

It is also possible to prevent Rhapsody from writing timestamps to the sbs files by setting the property CG::General::IncrementalCodeGenAcrossSession to false. This can help reduce the amount of unnecessary data.

Best approach to perform a CMMI Physical Configuration Audit?

The organization I currently work for is moving into the whole CMMI world of documenting everything. I was assigned (along with one other individual) the title of Configuration Manager. Congratulations to me, right?
Part of the duties is to perform a physical configuration audit on a regular basis (they are still defining regular basis; it will be either quarterly or monthly). This is basically a check of the source code versions deployed in production against what we believe to be the source code versions in production.
Our project is a relatively small web application written in Java. The file types we work with are java, jsp, xml, property files, and sql packages.
The problem I have (and have expressed, but it seems to be ignored) is this: how am I supposed to physically log on to the production server and verify file versions? Even if I could, it would take a ridiculous amount of time.
The file versions are not even currently in the files (i.e. in a comment or something). It was also suggested that we place version numbers on each screen that is visible to the users. I thought this ridiculous too, since the screens themselves represent only a small fraction of the code we maintain.
The tools we currently use are Netbeans for our IDE and Serena Dimensions as our versioning tool.
I am specifically looking for ideas on how to perform this audit in a hopefully more automated way, that will be both accurate and not time consuming.
My current idea is to add a comment to the top of each file that contains the version number of that file, plus a script that runs when a production build is created and writes an XML file or something similar containing the name and version of each file in the build. Then when I need to do an audit, I go to the production server, grab the XML file with the info, compare it programmatically to what we believe to be in production, and output a report.
Any better ideas? I know this has to have been done already, and it seems crazy to me that I have not found any other resources.

You could compute a SHA1 hash of the source files on the production server, and compare that hash value to the versions stored in source control. If you can find the same hash in source control, then you know what version is in production. If you can't find the same hash in source control, then there are untracked modifications in production and your new job title is justified. :)
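A hash-based audit along these lines might be sketched as below. The function names `sha1_manifest` and `audit` are invented for illustration, and the sketch assumes you can obtain two directory trees to compare: a checkout of the version you believe is deployed, and a copy (or mount) of what is actually on the production server.

```python
import hashlib
import os


def sha1_manifest(root: str) -> dict:
    """Map each file's path relative to root to the SHA-1 hex digest of its content."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha1()
            with open(full, "rb") as handle:
                for chunk in iter(lambda: handle.read(65536), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full, root)] = digest.hexdigest()
    return manifest


def audit(expected: dict, deployed: dict) -> dict:
    """Report files that are missing, unexpected, or changed in the deployed tree."""
    return {
        "missing": sorted(set(expected) - set(deployed)),
        "unexpected": sorted(set(deployed) - set(expected)),
        "changed": sorted(
            name for name in set(expected) & set(deployed)
            if expected[name] != deployed[name]
        ),
    }
```

An audit report with three empty lists means the deployed files match the expected version byte for byte; anything else points at the exact files to investigate.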

The typical trap organizations fall into with the CMMI is trying to overdo everything. If I could suggest anything, it'd be to start small and only do what you need. So consider any problems that you may have had in the CM area previously.
The CMMI describes WHAT an organisation should do, but leaves the HOW up to you. The CMMI specification, chapter 2 is well worth a read - it describes the required, expected, and informative components of the specification - basically the goals are required, the practices are expected, and everything else is informative. This means there is only a small part of the specification which a CMMI appraiser can directly demand - the goals. At the practice level, it is permissible to have either the practices as described, or acceptable alternatives to them.
In the case of configuration audits, goal SG3 is "Integrity of baselines is established and maintained". SP3.2 says "Perform configuration audits to maintain integrity of the configuration baselines." There is nothing stated here about how often these are done, or how long they may take.
In my previous organisation, FCA/PCA was usually only done as part of the product release process, and we used ClearCase as the versioning tool, with labels applied across the codebase to define baselines. We didn't have version numbers in all the source files, nor did we have version numbers on all the product's screens - the CM activity was doing the right thing and was backed up by audits, and this was never an issue in any CMMI appraisal.
We could use the deltas between labels to look at what files had changed, perform diffs to see the actual code changes. An important part of the process is being able to link those changes back to either a requirement/bug report/whatever the reason was which initiated the change.
Our auditing did use scripts to automate the process, but these were scripts developed in-house and specific to ClearCase - basically they would list all the files, their versions in the CM system, and the baseline/config item to which they belonged.

Can't you use your source control for this? If you deploy a version and tag your source control with that deployment, you can then verify against the source control system.
