Override Mercurial username with username from Apache authentication

I've set up a repository that is served through apache2. Users first need to authenticate to Apache in order to read from or write to the repository.
It has come to my attention that if users set some arbitrary name as their 'username', that name will be used for the commit, and not the Apache authentication name.
Now, is there a way to make sure that either
the username is replaced by the Apache login name, or
the Apache login name is added to the username as recorded in the commit?
I know that Subversion with Apache will always use the Apache login name, so that should be possible with Mercurial too, right?
EDIT:
I think what I need is to write a hook which extracts the HTTP username and checks whether it matches the commit username. If it doesn't, the push should be rejected.
Does anyone know how to do this?

This is the wrong approach, and it is guaranteed to cause more headaches and problems than whatever problem you're trying to solve right now.
Let's assume that you succeeded in implementing the proposed method, what would happen?
Well, in the local repository that I'm trying to push from, I have changesets 1, 2, and 3, with hashes ABC, DEF and KLM. For some reason, I did not use the Apache username when committing, so according to your proposed change they're all wrong.
I push to the server.
In-flight, your code changes my commits to use the Apache username instead. This causes the hashes of those changesets to be recalculated, so they change. In other words, my changesets 1, 2, and 3 will now have hashes XYZ, UVW and JKL.
So now my changes are on the server. I did not get a conflict during push since I was the last person cloning.
However, if I then pull, I suddenly discover there are 3 changesets I don't have, so I pull them, and find that I now have those 3 changesets in parallel with the 3 I had, with the same contents, a different committer name, and different hashes.
This is how every push and pull will behave from now on.
You push, and immediately you can pull the "same" changesets back, with new hashes, in a parallel branch to yours.
And now the fun begins. How does your local client figure out what to push? It asks the server, "what do you have?", and then compares that with what it has. Well, the server still doesn't have your 3 original changesets, so the outgoing command is going to figure that those 3 changesets should be pushed.
But if you try to push them, you then recreate the same 3 new changesets, which can't be pushed, so you're going to have trouble with that.
What you have to do is impose the following workflow on your users:
Push the new changesets
Pull the new changesets back, in their new form
Strip out the original changesets that were pushed
A better approach would be for the server to prevent the push in the first place, with a message about using the wrong commit name.
Then you place the burden on the user to fix those changesets before trying to push, for instance by importing them into MQ and reapplying them one at a time.
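For illustration, a minimal sketch of such a server-side hook, assuming hgweb runs behind Apache so that the authenticated login is available as REMOTE_USER (the file path and hook name below are placeholders, not anything from the original setup):

# registered in the served repository's .hg/hgrc:
# [hooks]
# pretxnchangegroup.checkuser = python:/path/to/checkuser.py:check_commit_user
import os

def check_commit_user(ui, repo, node, **kwargs):
    """Reject the push unless every incoming changeset's user contains
    the Apache login; returning True aborts the transaction."""
    http_user = os.environ.get('REMOTE_USER')
    if not http_user:
        ui.warn('no REMOTE_USER found, rejecting push\n')
        return True
    # 'node' is the first incoming changeset; check it and everything after it
    for rev in range(repo[node].rev(), len(repo)):
        user = repo[rev].user()
        if http_user not in user:
            ui.warn('changeset %s committed as %r, expected %r\n'
                    % (repo[rev], user, http_user))
            return True
    return False

Because this is a pretxnchangegroup hook, rejecting it rolls back the whole incoming changegroup, so nothing is rewritten on the server.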
Or... not.
What if I do a pull from you? You fix a bug, and you're not yet ready to push everything to the server, so you allow me to pull from you. Now I have outgoing changesets with your name on them, and a server that will enforce my name on them all.
About now you should realise that this approach is going to cause a lot of problems: you're basically trying to make a distributed version control system behave like a centralised version control system.


How do I detect changes in vCards?

I am developing a library to edit contacts on a CardDAV Server and I wonder what is the proper way to sync contacts.
So when I find an etag for a specific contact changed: How do I sync both?
Do I just combine the changed data, e.g. phone numbers? Or must one side (server or client) win? And how do I detect if a number changed or was added?
The Building a CardDAV client document explains all this very well.
But to address your questions:
So when I find an etag for a specific contact changed: How do I sync both?
You load the vCard from the server. Then it depends on the logic of your client. Do you want to auto merge? Do you want to prompt the user whether he wants to merge? Etc.
Usually you want to auto-merge. So do this. After you have the merged vCard, PUT that again to the server, but make sure to use the If-Match header to ensure that it didn't change again on the server side.
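For illustration, a minimal sketch of that conditional PUT using Python's requests library; the URL, auth object and merged vCard string are placeholders for your own client objects:

import requests

def put_merged_vcard(url, merged_vcard, last_seen_etag, auth):
    # If-Match makes the PUT conditional: the server only applies it if the
    # resource still carries the ETag we merged against
    resp = requests.put(
        url,
        data=merged_vcard.encode('utf-8'),
        headers={'Content-Type': 'text/vcard; charset=utf-8',
                 'If-Match': last_seen_etag},
        auth=auth,
    )
    if resp.status_code == 412:
        # Precondition Failed: it changed again on the server, so re-fetch and re-merge
        return False
    resp.raise_for_status()
    return True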
Do I just combine the changed data, e.g. phone numbers?
What you consider useful is entirely up to your application. But just combining fields may not be what you want. For example you wouldn't be able to detect deletes.
So in most cases this is going to be a three-way merge (sketched below) between:
the old version from the server (stored locally)
the new version from the server (that you just fetched)
the current version in the local application
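For illustration, a minimal sketch of a three-way merge for one multi-valued field (phone numbers), assuming each of the three versions has already been parsed into a set of strings; the field choice and the parsing are left to your own vCard handling:

def merge_phone_numbers(base, server, local):
    # base   = numbers in the old server copy (stored locally)
    # server = numbers in the version just fetched from the server
    # local  = numbers in the current local version
    added = (server - base) | (local - base)    # added on either side
    removed = (base - server) | (base - local)  # deleted on either side
    return (base | added) - removed

Deletes are only detectable here because the old base copy is kept around; without it, a number missing on one side is indistinguishable from a number added on the other.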
Or must one side (Server or client) win?
Some clients do it like that, but this is not required. However, if you write changes back automatically after a sync, you need to be VERY careful to avoid sync cycles!
And how to I detect if a number changed or was added?
You store the last copy you know about and diff against it.
In general it is a good idea to store the (last known) opaque server copy locally and just pick out the fields your client cares about. Then, when uploading the item again, you patch those fields back in (and preserve the rest of what the server sent you).
Summary: a proper vCard diff and local cache is non-trivial. Many clients fail at that and lose or duplicate user data.
So unless you plan to put the necessary work and testing into this, an easier way is to detect the changes and ask the user what he wants to do (let server win, force user copy, merge).

Git: merge two working copies without committing

I have a clone for server development and another for client development. Both sets of changes will eventually make it into the same branch, but I want to synchronize them now, and I want the result to be a merge as though I had committed, pushed and pulled, but without actually doing that.
I'm able to make a patch with this script I wrote:
git diff --cached
git diff
on the server, but applying that to the client is much harder.
I've tried the Unix patch command, but for some reason it keeps asking me which files to patch, as if it can't find them. (Yes, they're there.) I've tried
git apply -3 patch.patch
but that gives a lot of errors like "with conflicts" (without making any changes) and "does not match index". It doesn't even seem to be trying to patch the other half of the files.
Stashing, then applying the patch, and then popping from the stash doesn't work, because unstashing refuses to do merges.
It looks like doing it without the pulling isn't going to work--I haven't found a way to do it conveniently and safely. However, my problem with committing is that I didn't want to spam the git log with garbage like:
Sync'ing to client
Sync'ing back to server
Oops! Sync'ing something I forgot to the server again!
etc.
But I can avoid all this by committing, then pulling from the remote repos. In the end, I wouldn't have to push those commits, since I would use reset to remove them all from the local repo and then, with all my changes in the working directory, do a proper commit and push it.
Gotchas
They are many.
It's commonly known that you shouldn't reset your local repo if something has already pulled from it. This is probably because of the obvious confusion that results when one repo deletes commits that another repo believes were there. For that reason it's important that the same reset is performed on both repos before they start sharing code again.
If, after you've made the commits that you later want to reset, you then pull/merge, you could make things very difficult for yourself. There should be a way to manage it, but I haven't yet figured it out. One idea is to reset, stash, pull, merge, and commit again. Another involves revert with the -n option.
Instructions
The following example assumes you have 2 clones: one called "client" and the other "server".
1. Following https://help.github.com/articles/adding-a-remote, set up your client's and server's repos as remotes on each other's systems so they can pull from each other.
2. When you want to sync, just commit on the donor system, then instead of pulling from the origin, pull from the other remote. Say the client wanted a commit from the server; on the client: git pull myserver-repo mybranch.
3. Merge and resolve conflicts as necessary.
4. Loop back to step 2 as many times as necessary.
5. After several iterations of steps 2-4, you arrive at the point where you are ready to push your changes to the server. Go to whichever local repo has all the changes you want pushed, then run git log. Find the commit just before the first commit you made in step 2. Copy its hash to the clipboard.
6. Then reset to it: git reset <hash you copied in step 5>.
7. You should then see all the commits you don't want disappear from the log, and all the changes they contained appear in your working directory. Commit and push.
It's important that you do a cleanup on the repo from which you didn't perform steps 5-7. So if you pushed from your server repo, you need to perform the same reset operation on your client, then dispense with the changes as you see fit. My preferred method is git stash save "delete_me".

CloudTrail RunInstances event, who actually provisioned EC2 instance when STS AssumeRole used?

My client is in need of an AWS spring cleaning!
Before we can terminate EC2 instances, we need to find out who provisioned them and ask if they are still using the instance before we delete it. AWS doesn't seem to provide an out-of-the-box feature for reporting who the 'owner'/'provisioner' of an EC2 instance is; as I understand it, I need to parse through gobs of archived, gzipped log files residing in S3.
Problem is, their automation is making use of STS AssumeRole to provision instances. This means the RunInstances event in the logs doesn't trace back to an actual user (correct me if I'm wrong, please please I hope I am wrong).
An AWS blog post tells the story of a fictional character, Alice, and her steps tracing a TerminateInstances event back to a user, which involves 2 log events: the TerminateInstances event itself and an AssumeRole event "somewhere around the time" of it that contains the actual user details. Is there a pragmatic approach one can take to correlate these 2 events?
Here's my POC that parses a CloudTrail log from S3:
import boto3
import gzip
import json

# use the AWS profile that can read the CloudTrail bucket
boto3.setup_default_session(profile_name=<your_profile_name>)
s3 = boto3.resource('s3')
s3.Bucket(<your_bucket_name>).download_file(<S3_path>, "test.json.gz")

# CloudTrail delivers gzipped JSON files with a top-level 'Records' list
with gzip.open('test.json.gz', 'r') as fin:
    file_contents = fin.read().replace('\n', '')

json_data = json.loads(file_contents)
for record in json_data['Records']:
    if record['eventName'] == "RunInstances":
        user = record['userIdentity']['userName']
        principalid = record['userIdentity']['principalId']
        for index, instance in enumerate(record['responseElements']['instancesSet']['items']):
            print "instance id: " + instance['instanceId']
            print "user name: " + user
            print "principalid " + principalid
However, the details are generic since these roles are shared by many groups. How can I find, in a script, the details of the user before they assumed the role?
UPDATE: Did some research and it looks like I can correlate the RunInstances event to an AssumeRole event by a shared 'accessKeyId', and that should show me the account name before it assumed the role. Tricky though: not all RunInstances events contain this accessKeyId, for example if 'invokedby' was an autoscaling event.
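For illustration, a rough sketch of that correlation, assuming all relevant CloudTrail records for the time window have already been loaded into one records list, and that the AssumeRole event carries the issued temporary key under responseElements.credentials.accessKeyId (field names as I understand the record layout):

def correlate_run_instances(records):
    # index AssumeRole events by the temporary access key they issued
    assume_role_by_key = {}
    for rec in records:
        if rec.get('eventName') == 'AssumeRole':
            creds = (rec.get('responseElements') or {}).get('credentials', {})
            if creds.get('accessKeyId'):
                assume_role_by_key[creds['accessKeyId']] = rec['userIdentity']
    # map each launched instance to the identity that assumed the role
    results = []
    for rec in records:
        if rec.get('eventName') != 'RunInstances':
            continue
        key = (rec.get('userIdentity') or {}).get('accessKeyId')
        caller = assume_role_by_key.get(key)  # None if e.g. invoked by autoscaling
        for item in rec['responseElements']['instancesSet']['items']:
            results.append((item['instanceId'], caller))
    return results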
Direct answer:
For the solution you are proposing, you are unfortunately out of luck. You can take a look at http://docs.aws.amazon.com/IAM/latest/UserGuide/cloudtrail-integration.html#w28aac22b9b4b7b3b1. On the 4th row, it says that AssumeRole will record only the role identity for all subsequent calls.
I'd contact AWS Support to make sure of this, as I might very well be mistaken.
What I would do in your case:
First, wait a couple of days in case someone has a better idea, or I am mistaken and AWS support answers with an out-of-the-box solution. Then:
1. Create an AWS Config rule that deletes all instances carrying a certain tag. Then tell your developers to tag all instances that they are sure should be deleted; those will get deleted.
2. Tag all the production instances, and the development instances that are still needed, with a tag of their own.
3. Run a script that tags all of the untagged instances with a separate tag (see the sketch after this list). Double and triple check these instances.
4. Back up and turn off the instances tagged in step 3 (without deleting the instances).
5. If someone complains about something not being on, that means they missed an instance in step 1 or 2. Tag that instance correctly and turn it on again.
6. After a while (a week or so), delete the instances that are still stopped (keep the backups).
7. After a couple of months, delete the backups that were not restored.
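For illustration, a rough sketch of the tagging script from step 3, assuming boto3 and a made-up marker tag cleanup=review-me:

import boto3

ec2 = boto3.client('ec2')

def tag_untagged_instances(marker_key='cleanup', marker_value='review-me'):
    # collect every instance that has no tags at all
    untagged = []
    paginator = ec2.get_paginator('describe_instances')
    for page in paginator.paginate():
        for reservation in page['Reservations']:
            for instance in reservation['Instances']:
                if not instance.get('Tags'):
                    untagged.append(instance['InstanceId'])
    # mark them so they can be reviewed before shutdown
    if untagged:
        ec2.create_tags(Resources=untagged,
                        Tags=[{'Key': marker_key, 'Value': marker_value}])
    return untagged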
Note that this isn't foolproof as it has the possibility of human error and possible downtime, so double and triple check, make a clone of the same environment and test on that (if you have a development environment that already has such a configuration, that would be the best scenario), take it slow to be able to monitor everything, and be sure to keep backups of everything.
Good luck, and please tell me what your solution ended up being.
General guidelines for the future:
Note: the following points are very opinionated, and are general rules that I abide by because I find they save me a load of trouble from time to time. Read them, dismiss what you find unfit for you, and take the things that you find reasonable.
Don't use AssumeRole that often, as it obfuscates user access. If it's a script run on a developer's PC, let it run with their own username. If it's running on a server, keep the role it was created with. The amount of management will be less that way, as you cut out the middle-man (the assumed role) and don't need to create roles anymore, just assign the permissions to the correct group/user. See below for when I'd consider using AssumeRole a necessity.
Automate deletions. The first thing you should build is automation that keeps the AWS account as clean as possible, as this saves both $$$ and debugging pain. Tags, and scripts that act on those tags, are very powerful tools. So if a developer needs an instance for a day to try out something new, they can add a tag that times the instance out, and a script cleans it up when the time comes (a sketch follows these guidelines). These are project-specific, and not everyone needs all of them, so see and assess what you need for your project and act on that.
What I'd recommend is giving the permissions to the users themselves in the development environment, as it makes tracing things to their root and finding the most knowledgeable person to solve them easier. As for the production environment, everything should be automated anyway (creation when needed and deletion when no longer needed) and no one should have any write access to that account, ever.
As for AssumeRole, I only use it when I want to give read-only access to production logs on another account. Another case would be something that really shouldn't be happening often, if at all, but that some users still need access to. So, as an extra layer of protection against the 'I did it by mistake', I make them switch role to do it, and never have a script that automatically switches roles and does the action, in an attempt to make it as deliberate as possible (think deleting a database and such). Another case would be accessing sensitive information (a credit-card database, etc.). Many more scenarios can occur, and there it comes down to your judgement.
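As an example of the 'tag that times the instance out' idea above, a hedged sketch; the delete-after tag name and date format are made up for illustration:

import boto3
from datetime import datetime, timezone

ec2 = boto3.client('ec2')

def reap_expired_instances():
    # find instances that carry the hypothetical 'delete-after' tag
    expired = []
    paginator = ec2.get_paginator('describe_instances')
    for page in paginator.paginate(
            Filters=[{'Name': 'tag-key', 'Values': ['delete-after']}]):
        for reservation in page['Reservations']:
            for instance in reservation['Instances']:
                tags = {t['Key']: t['Value'] for t in instance.get('Tags', [])}
                deadline = datetime.strptime(
                    tags['delete-after'], '%Y-%m-%d').replace(tzinfo=timezone.utc)
                if deadline < datetime.now(timezone.utc):
                    expired.append(instance['InstanceId'])
    if expired:
        ec2.terminate_instances(InstanceIds=expired)
    return expired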
Again, Good Luck.

Schema migration in Redis

I have an application using Redis. I used the key name user:<id> to store user info. Then, locally, I changed my app code to use the key name user:<id>:data for that purpose.
I am scared that if I git push this new code to my production server, things will break, because my production Redis server already has the keys under the older key names.
So the only way I can think of is to stop my app, rename all the older keys to the new names, and then restart it. Do you have a better alternative? Thanks for the help :)
Pushing new code to your production environment is always a scary business (that's why only the toughest survive in this profession ;)). I strongly recommend that before you change your production code and database, make sure that you test the workflow and its results locally.
Almost any update to the application requires stopping it - even if only to replace the relevant files. This is even truer for any change that involves a database, exactly for the reason you mentioned.
Even if you can deploy your code changes without stopping the application per se (e.g. a PHP page), you will still want the database change to be done "atomically" - i.e. without any application requests intervening and possibly breaking. While some databases can be taken offline for maintenance, even then you usually stop the app, or else errors will be generated all over the place.
If that is indeed the case, you'll be stopping the app (or putting it into maintenance mode) regardless of the database change, so we take your question to actually mean: what's the fastest way to rename all/some keys in my database?
To answer that question, and similarly to the pseudo-code suggested in the other answer, I suggest you use a Lua script such as the following and EVAL it once you stop the app:
for _, k in ipairs(redis.call('keys', 'user:*')) do
  if k:sub(-5) ~= ':data' then
    redis.call('rename', k, k .. ':data')
  end
end
A few notes about this script that you should keep in mind:
Although the KEYS command is not safe to use in production, since you are doing maintenance it can be used safely here. For all other use cases where you need to scan your keys, Redis' SCAN is much more advisable.
Since Lua scripts are "atomic", you can in theory run this script without stopping the app - as long as the script runs (which depends on the size of your dataset) the app's requests will be blocked. Put differently, this approach solves the concern of getting mixed key names (old & new). This, however, is probably not what you'd want to do in any case because a) your app may still error/timeout during that time but mainly because b) it will need to be able to handle both types of key names (i.e. running with old keys -> short/long pause -> running with new keys) making your code much more complex.
The if condition is not required if you're going to run the script only once and it succeeds.
Depending on the actual contents of your database, you may want to further filter out keys that should not be renamed.
To ensure compatibility, refrain from hardcoding / computationally generating key names - instead, they should be passed as arguments to the script.
You can run a migration script in your redis client language, using RENAME.
If you don't have any other way to enumerate the keys, you first issue KEYS user:* to list all keys, then take the substring to get the numeric id, then rename.
You can issue all of this in a transaction.
So, a little pseudocode and Redis commands (fetch the key list first, then wrap the renames in a transaction):
KEYS user:*
MULTI
For each key {
    id = <Get id from key>
    RENAME user:<id> user:<id>:data
}
EXEC
Got it?
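For illustration, a hedged sketch of the same migration from the client side using Python's redis-py, with SCAN in place of KEYS and a pipeline in place of the hand-written MULTI/EXEC block:

import redis

r = redis.Redis()

def migrate_user_keys():
    pipe = r.pipeline()  # commands are buffered and sent in one transaction
    for key in r.scan_iter(match='user:*'):
        # keys come back as bytes with the default client settings
        if not key.endswith(b':data'):
            pipe.rename(key, key + b':data')
    pipe.execute()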

How to prevent Trac from showing some commits in the Timeline?

I'm trying to configure a Trac server we are using in my team, in order to avoid an undesired behaviour. We mainly develop free and open-source software in the team, but we sometimes need to keep our early prototypes completely private.
Because of the first constraint, we want our timeline to be visible to anonymous users. But because of the second constraint, we want some commits to be completely hidden from the outside world, i.e. we don't want anybody other than us to be able to read the message and content of some commits in the timeline.
Unfortunately, I've been unable to configure Trac to get this behaviour until now. I can't find a configuration that would let me manage the Timeline content with enough accuracy.
Consequently, I would like to know if such a configuration is possible with Trac.
For information, I'm using Trac 0.12.2. The installed plugins are:
Trac 0.12.2
TracAccountManager 0.2.1dev-r7731
TracNav 4.1
The only permission I can see that is related to Timeline is TIMELINE_VIEW.
EDIT:
I forgot to mention something. We don't want to lose the private commits, and we want them to be displayed to registered users. Consequently, removing them from the database is not a solution for us.
EDIT 2:
Ideally, we would like the commit messages to be displayed according to the rights to read the content of our Subversion repository. The idea is that if a commit touches a part someone can't access, that person should not be able to read the commit message either.
EDIT 3:
If we look at the Trac configuration file, we can already find:
permission_policies = AuthzSourcePolicy, DefaultPermissionPolicy, LegacyAttachmentPolicy
and the authz_file variable is properly set too. Moreover, the private folders of the SVN repositories can't be accessed by anonymous users.
You should set up authz checking for both your Subversion repository and your Trac installation. You can use the same permission file for both. For Subversion, see Path-based authorization in the SVN book. For Trac, enable and configure the trac.versioncontrol.svn_authz.AuthzSourcePolicy component.
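For illustration, the path-based rules in the shared authz file might look something like this (the group name and paths are placeholders for your own layout):

[groups]
devs = alice, bob

[/]
* = r
@devs = rw

[/prototypes/secret]
* =
@devs = rw

With AuthzSourcePolicy enabled and authz_file pointing at such a file, changesets that only touch the protected path should disappear from the Timeline for anonymous users while remaining visible to members of the group.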
This will allow you to have a very fine-grained control over who can access which part of the repository. Note that the implementation of AuthzSourcePolicy in Trac 0.12.2 has a few bugs that will be fixed in 0.12.3.
There are two ways of going about this:
1) You can directly edit the plugins that are running in Trac, and add a module that helps you filter these out at the code level (i.e. you can edit the behaviour of the script to, say, only include commits which exclude certain keywords). The timeline script lives here (for a Python 2.4 install): /usr/local/lib/python2.4/site-packages/trac/Timeline.py (here is an online diff snapshot of the source code: http://trac.edgewall.org/attachment/ticket/890/Timeline.py.diff)
2) You can remove the commits entirely - Trac commits are derived from the SQLite database (the schema is here: http://trac.edgewall.org/wiki/TracDev/DatabaseSchema).
Of course, there also might be some fancy tools out there that provide a nice interface for editing the way the timeline looks.
Finally, as a temporary measure, you can remove the timeline/roadmap entirely via the trac.ini file: http://www.gossamer-threads.com/lists/trac/users/28079
I confess that I have virtually no experience with the repository part of Trac, and even less with using a repository with a variety of permissions across its contents.
On the subject: configuration is certainly enough, see rblanks' answer. While I've never seen the code for that functionality, I was wrong to suggest it doesn't exist. Because it is a central place, developed and supported in Trac core, this is definitely the way to go.