Bamboo OnDemand: how to auto-generate snapshots? - bamboo

I've been busy setting up our bamboo on demand instance to run the builds we previously ran on our local instance.
As many others have already posted or filed tickets about (see for example https://answers.atlassian.com/questions/109719/automated-ebs-snapshots-in-bamboo-ondemand, https://jira.atlassian.com/browse/BAM-11525, https://jira.atlassian.com/browse/AOD-3293), I'm facing the issue of outdated Maven repositories on the attached EBS volume: the contents of those volumes are not automatically persisted when the EC2 instance is terminated or shut down.
Specifically, the post at https://answers.atlassian.com/questions/109719/automated-ebs-snapshots-in-bamboo-ondemand is more or less exactly what I want to achieve: automatically generating a snapshot before shutdown so that all the build artifacts are preserved and don't need to be recreated the next time the instance starts up. However, I was unable to get it to work as suggested in that post.
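In case it helps anyone experimenting with this, here is a rough sketch of the snapshot-before-shutdown idea using a boto3-style EC2 client. The function name and description string are my own, and you would still need to wire this into whatever shutdown hook or scheduled job your setup allows; in production you would pass `boto3.client("ec2")`:

```python
# Sketch: snapshot every EBS volume attached to an instance before shutdown.
# The ec2 argument is a boto3-style EC2 client (e.g. boto3.client("ec2")).
from datetime import datetime, timezone

def snapshot_attached_volumes(ec2, instance_id):
    """Create a snapshot of each EBS volume attached to instance_id.

    Returns the list of snapshot IDs created."""
    resp = ec2.describe_volumes(
        Filters=[{"Name": "attachment.instance-id", "Values": [instance_id]}]
    )
    snapshot_ids = []
    for vol in resp["Volumes"]:
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")
        snap = ec2.create_snapshot(
            VolumeId=vol["VolumeId"],
            Description=f"pre-shutdown snapshot of {vol['VolumeId']} at {stamp}",
        )
        snapshot_ids.append(snap["SnapshotId"])
    return snapshot_ids
```

This only creates the snapshots; restoring them as fresh volumes on the next startup would be a separate step.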
Is there anyone who has successfully tackled this issue? If so, how did you do it? If not, any ideas or pointers are greatly appreciated.
Regards,
Chris

Related

ZooKeeper showing non-existent node after network outage

I have a three-box SolrCloud setup with ZooKeeper; each server has a Solr and a ZK install (not ideal, I know). Everything was working fine until a network outage this morning.
After the outage, boxes A and C came back as expected. Box B did not; a restart of the Solr service revealed an error which states
A previous ephemeral live node still exists. Solr cannot continue.
Looking in node B's ZooKeeper live_nodes path, the Solr install is already showing as an active live node even though Solr is off. This node is not shown on boxes A and C within the live_nodes path. I'm also unable to delete or rmr this node, because ZooKeeper tells me it doesn't exist.
I have attempted solr stop -all in case there was a hidden process I wasn't seeing, but Solr states that there are no instances running.
My next move was installing a fresh ZooKeeper instance on B. After that was up, an ls /live_nodes continued showing this Solr instance that doesn't exist.
Any help is appreciated. Thank you.
FYI, I continued troubleshooting and eventually rebuilt all 3 ZooKeeper nodes. That led me to a separate error showing that the collection shard was broken. After troubleshooting the 'clusterstate.json' file, what ended up being the fix was creating a duplicate collection with a separate name, then an alias to redirect traffic to it. After this I was able to delete the broken collection.
I suspect a duplicate collection and alias would have fixed it from the start.
Hopefully this helps someone in the future.
Thanks.
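For anyone wanting to reproduce the duplicate-collection-plus-alias fix, the Collections API calls involved look roughly like this. Host, collection names, and shard/replica parameters are placeholders for this sketch, and whether an alias may share a name with a still-existing collection varies with Solr version, so the delete-before-alias ordering below is just one safe option:

```python
# Sketch of the duplicate-collection + alias fix via the Solr Collections API.
# Each URL would be fetched with e.g. urllib.request.urlopen(url).
from urllib.parse import urlencode

def collections_api_url(base, action, **params):
    """Build a Solr Collections API URL for the given action."""
    return f"{base}/admin/collections?" + urlencode({"action": action, **params})

BASE = "http://solr.node1.sp.local:8983/solr"

# 1. Create a duplicate of the broken collection under a new name.
create = collections_api_url(BASE, "CREATE", name="mycoll_fixed",
                             numShards=1, replicationFactor=3)

# 2. Delete the broken collection once its data has been copied/reindexed.
delete = collections_api_url(BASE, "DELETE", name="mycoll")

# 3. Alias the old name to the new collection so clients need no changes.
alias = collections_api_url(BASE, "CREATEALIAS", name="mycoll",
                            collections="mycoll_fixed")
```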
We had a similar issue recently and were able to delete the data from /solr/live_nodes by following the steps listed below, after which Solr was able to start up and get past the issue from the OP.
Adding this as hope it will help someone else in the future.
Example data ZK shell in /solr/live_nodes:
[solr.node1.sp.local:8983_solr, solr.node2.sp.local:8983_solr]
Create the solr nodes again (fails with Node already exists):
create /solr/live_nodes/solr.node1.sp.local:8983_solr
create /solr/live_nodes/solr.node2.sp.local:8983_solr
Set some data on the nodes:
set /solr/live_nodes/solr.node1.sp.local:8983_solr "hello"
set /solr/live_nodes/solr.node2.sp.local:8983_solr "hello"
Delete the nodes:
delete /solr/live_nodes/solr.node1.sp.local:8983_solr
delete /solr/live_nodes/solr.node2.sp.local:8983_solr
After that we were able to start up Solr; the issue was resolved and /solr/live_nodes was repopulated.
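The same create/set/delete sequence can be scripted against a kazoo-style Python client (KazooClient exposes create, set, and delete methods of these shapes). As in the zkCli session above, the create is expected to fail with "node already exists"; the subsequent set appears to force the stale znode into a consistent enough state that the delete then succeeds:

```python
# Sketch of the stale-live-node workaround over a kazoo-style ZooKeeper client.
def purge_stale_live_node(zk, path):
    """Replay the zkCli steps: create (expected to fail), set, then delete."""
    try:
        zk.create(path)          # expected to raise: "Node already exists"
    except Exception:
        pass
    zk.set(path, b"hello")       # touch the znode so it is fully materialised
    zk.delete(path)              # now the delete goes through

# Usage with a real client:
#   from kazoo.client import KazooClient
#   zk = KazooClient(hosts="localhost:2181"); zk.start()
#   purge_stale_live_node(zk, "/solr/live_nodes/solr.node1.sp.local:8983_solr")
```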

Prevent build concurrency in drone ci

My Jenkins setup, which provides this feature, saves me a lot of headache when I get 100 callbacks from services telling me that data has changed and asking to rebuild the same build. If I ask for 100 new builds, Jenkins just adds one to the queue instead of adding 100. It also has an option to wait for the last build to finish before starting a new one.
I found an old ticket (https://github.com/drone/drone/issues/683) related to this and was advised to ask here first to see whether the current version has a way of doing it. As far as I can tell, there doesn't seem to be an option to achieve this?
http://docs.drone.io/pipelines/
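For what it's worth, the Jenkins-style coalescing described above is small enough to sketch yourself, e.g. in a webhook proxy sitting in front of the CI server. Nothing below is a drone feature; it is just the queueing rule: at most one build runs, at most one waits, and every extra trigger is dropped:

```python
# Sketch: coalesce repeated build triggers for one job into at most one
# running build plus at most one queued build (Jenkins-style behaviour).
class BuildCoalescer:
    def __init__(self):
        self.running = False
        self.queued = False

    def trigger(self):
        """A change notification arrived. Returns True if a build was started
        or queued, False if the trigger was coalesced into the pending one."""
        if self.running:
            if self.queued:
                return False     # one build already waiting: drop the trigger
            self.queued = True   # queue exactly one follow-up build
            return True
        self.running = True      # idle: start immediately
        return True

    def finished(self):
        """The running build completed; promote the queued one, if any."""
        if self.queued:
            self.queued = False  # the queued build becomes the running build
        else:
            self.running = False
```

With this, 100 callbacks during a running build produce exactly one follow-up build, matching the Jenkins behaviour the question describes.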

File Addition and Synchronization issues in RavenFS

I am having a very hard time making RavenFS behave properly and was hoping that I could get some help.
I'm running into two separate issues: uploading files to RavenFS while using an embedded database inside a service causes RavenDB to fall over, and synchronizing two instances set up in the same way makes the destination server fall over.
I have tried to do my best in documenting this. Code and steps to reproduce these issues are located here (https://github.com/punkcoder/RavenFSFileUploadAndSyncIssue), and a video is located here (https://youtu.be/fZEvJo_UVpc). I looked for these issues in the issue tracker and didn't find anything that looked directly related, but I may have missed something.
The solution to this problem was to remove Raven from the project and replace it with MongoDB. Binary storage in Mongo can be done on the record without issue.
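To illustrate the "binary storage on the record" point: with a pymongo-style collection, Python bytes are stored as BSON binary directly on the document (for files over the 16 MB document limit you would use GridFS instead). A minimal sketch, with the collection object injected and the field names my own:

```python
# Sketch: store a file's bytes directly on the Mongo document describing it.
# `collection` is a pymongo-style collection (insert_one / find_one).
def save_file(collection, name, data: bytes):
    """Store a file's bytes on the record itself."""
    doc = {"name": name, "length": len(data), "content": data}
    return collection.insert_one(doc)

def load_file(collection, name) -> bytes:
    """Fetch the file's bytes back by name."""
    return collection.find_one({"name": name})["content"]
```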

Ubuntu + PBS + Apache? How can I show a list of running jobs as a website?

Is there a plugin/package to display status information for a PBS queue? I am currently running an Apache web server on the login node of my PBS cluster. I would like to display status info and have the ability to perform minimal queries without writing it from scratch (or modifying an age-old Python script, à la jobmonarch). Note: the accepted/bountied solution must work with Ubuntu.
Update: In addition to Ganglia, as noted below, I also looked at the Rocks Cluster Toolkit, but I firmly want to stay with Ubuntu. So I've updated the question to reflect that.
Update 2: I've also looked at PBSWeb as well as MyPBS; neither appears to suit my needs. The first is too out of date with the current system, and the second is more focused on cost estimation and project budgeting. They're both nice, but I'm more interested in resource availability, job completion, and general status updates. So I'm probably just going to write my own from scratch -- starting Aug 15th.
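If you do end up writing your own, the core of it is small: run qstat, parse its tabular output, and render an HTML table for Apache to serve (from a cron job or CGI script). A rough sketch, assuming one common qstat column layout; verify the columns against your PBS flavour's actual output before relying on it:

```python
# Sketch: parse `qstat` output and render a minimal HTML status table.
import html

def parse_qstat(text):
    """Turn tabular `qstat` output into a list of job dicts."""
    lines = text.strip().splitlines()
    jobs = []
    for line in lines[2:]:               # skip the header and separator rows
        job_id, name, user, time_use, state, queue = line.split()
        jobs.append({"id": job_id, "name": name, "user": user,
                     "time": time_use, "state": state, "queue": queue})
    return jobs

def render_html(jobs):
    """Render the parsed jobs as a bare HTML table."""
    rows = "".join(
        "<tr>" + "".join(f"<td>{html.escape(j[k])}</td>"
                         for k in ("id", "name", "user", "time", "state", "queue"))
        + "</tr>"
        for j in jobs)
    header = "<tr><th>Job</th><th>Name</th><th>User</th><th>Time</th><th>S</th><th>Queue</th></tr>"
    return f"<table>{header}{rows}</table>"

# In practice the input would come from:
#   subprocess.run(["qstat"], capture_output=True, text=True).stdout
```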
Have you tried Ganglia?
I have no personal experience, but a few sysadmins I know are using it.
The following pages may help:
http://taos.groups.wuyasea.com/articles/how-to-setup-ganglia-to-monitor-server-stats/3
http://coe04.ucalgary.ca/rocks-documentation/2.3.2/monitoring-pbs.html
my two cents
Have you tried using Nagios (http://www.nagios.org/)?

Installing and Removing Custom Performance Counters Issue

I just executed installutil on a DLL in which custom performance counters are installed. I installed 2 categories, but then realized I had an issue with the first category, so I deleted it; before deleting, I ran an ASP.NET app against it to make sure it was working.
The issue is that after deleting and then recreating the category, the application logs to the custom perfmon counter, but the values never get updated.
The second custom category works fine and its counter is getting populated. I can see both categories within perfmon, but the first category's counters never get updated when running an ASP.NET app against them.
Has anyone run into this issue? Do I need to delete the existing instance? I'm trying to avoid a reboot of the machine.
Depending on how you install the counters (assuming a transacted installation, let's say...), perf counters can get "orphaned".
IMHO this is because perf counters seem to get installed in the registry and "elsewhere" (I'm still trying to find out where else perf counter info gets stored).
In some cases, the registry keys get built appropriately and so register as expected, but the OS's "elsewhere" location is not properly built out. It's almost as if there is a perf counter cache somewhere. (Comments, anyone?)
So, in summary: after installation, run lodctr /R from the command line with the appropriate permissions, and this "seems" to solve the issue for most installations. I would be interested to see what others say about this, as the generally available documentation (i.e. MS) SUCKS beyond belief on this topic...
grrr.
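To make the advice concrete: the fix boils down to running lodctr /R from an elevated Windows prompt, which rebuilds the performance counter registry settings. A trivial wrapper, with the process runner injected purely so the sketch is testable off-Windows; in practice you would pass subprocess.run:

```python
# Sketch: rebuild the Windows performance counter registry via `lodctr /R`.
# `run` is a subprocess.run-style callable (injected for testability).
def rebuild_perf_counters(run):
    """Invoke `lodctr /R`; returns True on success (exit code 0)."""
    result = run(["lodctr", "/R"])
    return result.returncode == 0

# Usage on the affected machine, from an elevated prompt:
#   import subprocess
#   rebuild_perf_counters(subprocess.run)
```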