in icecream scheduler log, what does "<host> not eligible " mean? - ice

While diagnosing slow build times using icecream. I encounted several instances of the following message in the icecc-scheduler logs:
<hostname> not eligible
What is this trying to communicate to me?

This could stem from a few scenarios and they are documented in the source code here
bool CompileServer::is_eligible(const Job *job)
{
bool jobs_okay = int(m_jobList.size()) < m_maxJobs;
bool load_okay = m_load < 1000;
bool version_okay = job->minimalHostVersion() <= protocol;
return jobs_okay
&& (m_chrootPossible || job->submitter() == this)
&& load_okay
&& version_okay
&& m_acceptingInConnection
&& can_install(job).size()
&& this->check_remote(job);
}
I'd check to make sure that each host is:
iceccd is run as root
each host has the same version of icecc installed. Verify with /usr/bin/icecc --version
plenty of free space left
maximum number of jobs has not been exceeded

Related

Detect if a Tcl script is run in a background process

I'm looking for a preferably cross-platform way to detect from within a Tcl script if the interpreter is running in a foreground or in a background process.
I've seen how to do it via ps (or /proc/$$/stat on Linux); is there a better way or do I have to hack something around that approach? I already have a utility library written in C so exposing the lowlevel API that ps also uses so I don't have to parse process output (or special file content) would be fine.
There's no truly cross-platform notion of foreground, but the main platforms do have ways of doing it according to the notion they have of foreground.
Linux, macOS, and other Unix:
For determining if a process is foreground or not, you need to check if its process group ID is the terminal's controlling process group ID. For Tcl, you'd be looking to surface the getpgrp() and tcgetpgrp() system calls (both POSIX). Tcl has no built-in exposure of either, so you're talking either a compiled extension (may I recommend Critcl for this?) or calling an external program like ps. Fortunately, if you use the latter (a reasonable option if this is just an occasional operation) you can typically condition the output so that you get just the information you want and need to do next to no parsing.
# Tested on macOS, but may work on other platforms
proc isForeground {{pid 0}} {
try {
lassign [exec ps -p [expr {$pid ? $pid : [pid]}] -o "pgid=,tpgid="] pgid tpgid
} on error {} {
return -code error "no such process"
}
# If tpgid is zero, the process is a daemon of some kind
expr {$pgid == $tpgid && $tpgid != 0}
}
Windows
There's code to do it, and the required calls are supported by the TWAPI extension so you don't need to make your own. (WARNING! I've not tested this!)
package require twapi_ui
proc isForeground {{pid 0}} {
set forground_pid [get_window_thread [get_foreground_window]]
return [expr {($pid ? $pid : [pid]) == $foreground_pid}]
}
Thanks to Donal I came up with the implementation below that should work on all POSIX Unix variants:
/*
processIsForeground
synopsis: processIsForeground
Returns true if the process is running in the foreground or false
if in the background.
*/
int IsProcessForegroundCmd(ClientData clientData UNUSED, Tcl_Interp *interp, int objc, Tcl_Obj *CONST objv[])
{
/* Check the arg count */
if (objc != 1) {
Tcl_WrongNumArgs(interp, 1, objv, NULL);
return TCL_ERROR;
}
int fd;
errno = 0;
if ((fd = open("/dev/tty", O_RDONLY)) != -1) {
const pid_t pgrp = getpgrp();
const pid_t tcpgrp = tcgetpgrp(fd);
if (pgrp != -1 && tcpgrp != -1) {
Tcl_SetObjResult(interp, Tcl_NewBooleanObj(pgrp == tcpgrp));
close(fd);
return TCL_OK;
}
close(fd);
}
Tcl_SetErrno(errno);
Tcl_ResetResult(interp);
Tcl_AppendResult(interp, "processIsForeground: ", (char *)Tcl_PosixError(interp), NULL);
return TCL_ERROR;
}
int Pextlib_Init(Tcl_Interp *interp)
{
if (Tcl_InitStubs(interp, "8.4", 0) == NULL)
return TCL_ERROR;
// SNIP
Tcl_CreateObjCommand(interp, "processIsForeground", IsProcessForegroundCmd, NULL, NULL);
if (Tcl_PkgProvide(interp, "Pextlib", "1.0") != TCL_OK)
return TCL_ERROR;
return TCL_OK;
}

Lock between N Processes in Promela

I am trying to model one of my project in promela for model checking. In that, i have N no of nodes in network. So, for each node I am making a process. Something like this:
init {
byte proc;
atomic {
proc = 0;
do
:: proc < N ->
run node (q[proc],proc);
proc++
:: proc >= N ->
break
od
}
}
So, basically, here each 'node' is process that will simulate each node in my network. Now, Node Process has 3 threads which run parallelly in my original implementation and within these three threads i have lock at some part so that three threads don't access Critical Section at the same time. So, for this in promela, i have done something like this:
proctype node (chan inp;byte ppid)
{
run recv_A()
run send_B()
run do_C()
}
So here recv_A, send_B and do_C are the three threads running parallelly at each node in the network. Now, the problem is, if i put lock in recv_A, send_B, do_C using atomic then it will put lock lock over all 3*N processes whereas i want a lock such that the lock is applied over groups of three. That is, if process1's(main node process from which recv_A is made to run) recv_A is in its CS then only process1's send_B and do_C should be prohibited to enter into CS and not process2's recv_A, send_B, do_C. Is there a way to do this?
Your have several options, and all revolve around implementing some kind of mutual exclusion algorithm among N processes:
Peterson Algorithm
Eisenberg & McGuire Algorithm
Lamport's bakery Algorithm
SzymaƄski's Algorithm
...
An implementation of the Black & White Bakery Algorithm is available here. Note, however, that these algorithms -maybe with the exception of Peterson's one- tend to be complicated and might make the verification of your system impractical.
A somewhat simple approach is to resort on the Test & Set Algorithm which, however, still uses atomic in the trying section. Here is an example implementation taken from here.
bool lock = false;
int counter = 0;
active [3] proctype mutex()
{
bool tmp = false;
trying:
do
:: atomic {
tmp = lock;
lock = true;
} ->
if
:: tmp;
:: else -> break;
fi;
od;
critical:
printf("Process %d entered critical section.\n", _pid);
counter++;
assert(counter == 1);
counter--;
exit:
lock = false;
printf("Process %d exited critical section.\n", _pid);
goto trying;
}
#define c0 (mutex[0]#critical)
#define c1 (mutex[1]#critical)
#define c2 (mutex[2]#critical)
#define t0 (mutex[0]#trying)
#define t1 (mutex[1]#trying)
#define t2 (mutex[2]#trying)
#define l0 (_last == 0)
#define l1 (_last == 1)
#define l2 (_last == 2)
#define f0 ([] <> l0)
#define f1 ([] <> l1)
#define f2 ([] <> l2)
ltl p1 { [] !(c0 && c1) && !(c0 && c2) && !(c1 && c2)}
ltl p2 { []((t0 || t1 || t2) -> <> (c0 || c1 || c2)) }
ltl p3 {
(f0 -> [](t0 -> <> c0))
&&
(f1 -> [](t1 -> <> c1))
&&
(f2 -> [](t2 -> <> c2))
};
In your code, you should use a different lock variable for every group of 3 related threads. The lock contention would still happen at a global level, but some process working inside the critical section would not cause other processes to wait other than those who belong to the same thread group.
Another idea is that to exploit channels to achieve mutual exclusion: have each group of threads share a common asynchronous channel which initially contains one token message. Whenever one of these threads wants to access the critical section, it reads from the channel. If the token is not inside the channel, it waits until it becomes available. Otherwise, it can go forward in the critical section and when it finishes it puts the token back inside the shared channel.
proctype node (chan inp; byte ppid)
{
chan lock = [1] of { bool };
lock!true;
run recv_A(lock);
run send_B(lock);
run do_C(lock);
};
proctype recv_A(chan lock)
{
bool token;
do
:: true ->
// non-critical section code
// ...
// acquire lock
lock?token ->
// critical section
// ...
// release lock
lock!token
// non-critical section code
// ...
od;
};
...
This approach might be the simplest to start with, so I would pick this one first. Note however that I have no idea on how that affects performance during verification time, and this might very well depend on how channels are internally handled by Spin. A complete code example of this solution can be found here in the file channel_mutex.pml.
To conclude, note that you might want to add mutual exclusion, progress and lockout-freedom LTL properties to your model to ensure that it behaves correctly. An example of the definition of these properties is available here and a code example is available here.

How is Key * implemented inside Redis?

I would like to know the internal implementation of Redis Key * .
I am implementing a distributed cache functionality.
The internal behavior of the "KEYS *" command is to linearly scan the main dictionary to collect all the keys, and build the result. Expired keys are excluded. Here is the implementation:
void keysCommand(redisClient *c) {
dictIterator *di;
dictEntry *de;
sds pattern = c->argv[1]->ptr;
int plen = sdslen(pattern), allkeys;
unsigned long numkeys = 0;
void *replylen = addDeferredMultiBulkLength(c);
di = dictGetSafeIterator(c->db->dict);
allkeys = (pattern[0] == '*' && pattern[1] == '\0');
while((de = dictNext(di)) != NULL) {
sds key = dictGetKey(de);
robj *keyobj;
if (allkeys || stringmatchlen(pattern,plen,key,sdslen(key),0)) {
keyobj = createStringObject(key,sdslen(key));
if (expireIfNeeded(c->db,keyobj) == 0) {
addReplyBulk(c,keyobj);
numkeys++;
}
decrRefCount(keyobj);
}
}
dictReleaseIterator(di);
setDeferredMultiBulkLength(c,replylen,numkeys);
}
While this operation occurs, no other command can be executed on the Redis server, the event loop being pending on the result of the KEYS command. If the number of keys is high (> 10K), the clients will notice the server is not anymore responsive.
This is a command intended for debugging purpose only. Do not use it in applications.

Call to CFURLCreateFromFileSystemRepresentation sometimes fails

I have an application that loads in a bundle and in doing so I call CFURLCreateFromFileSystemRepresentation before CFBundleCreate: -
bundlePackageURL = CFURLCreateFromFileSystemRepresentation(
kCFAllocatorDefault,
(const UInt8*)bundlePackageFileSystemRepresentation,
strlen(bundlePackageFileSystemRepresentation),
true );
Most of the time, running the same application and loading the same bundle that resides in the application bundle's Resource directory, the function works and returns a valid CFURL. However, with exactly the same parameters passed in to the function, the call sometimes fails.
I now have code to handle the failure: -
CFURLRef bundlePackageURL = NULL;
int attempt = 0;
while((bundlePackageURL == NULL) && (attempt++ < 12000))
{
bundlePackageURL = CFURLCreateFromFileSystemRepresentation(
kCFAllocatorDefault,
(const UInt8*)bundlePackageFileSystemRepresentation,
strlen(bundlePackageFileSystemRepresentation),
true );
// failed to load, so try again
if(bundlePackageURL == NULL)
fprintf(stdout, "Retrying to obtain CFURL: %d...\n", attempt);
}
As you can see, this makes up to 12000 attempts to call the function and when it fails, I've seen it take anything between a few hundred to over 10000 repeated calls before it succeeds.
Can anyone please explain why the function may be failing at times and if this is normal?

Unix C code cp in system()

I have a C code..
i which I have following code for UNIX:
l_iRet = system( "/bin/cp -p g_acOutputLogName g_acOutPutFilePath");
when I am running the binary generated..I am getting the following error:
cp: cannot access g_acOutputLogName
Can any one help me out?
You should generally prefer the exec family of functions over the system function. The system function passes the command to the shell which means that you need to worry about command injection and accidental parameter expansion. The way to call a subprocess using exec is as follows:
pid_t child;
child = fork();
if (child == -1) {
perror("Could not fork");
exit(EXIT_FAILURE);
} else if (child == 0) {
execlp("/bin/cp", g_acOutputLogName, g_acOutPutFilePath, NULL);
perror("Could not exec");
exit(EXIT_FAILURE);
} else {
int childstatus;
if (waitpid(child, &childstatus, 0) == -1) {
perror("Wait failed");
}
if (!(WIFEXITED(childstatus) && WEXITSTATUS(childstatus) == EXIT_SUCCESS)) {
printf("Copy failed\n");
}
}
Presumably g_acOutputLogName and g_acOutPutFilePath are char[] (or char*) variables in your program, rather than the actual paths involved.
You need to use the values stored therein, rather than the variable names, for example:
char command[512];
snprintf( command, sizeof command, "/bin/cp -p %s %s",
g_acOutputLogName, g_acOutPutFilePath );
l_iRet = system( command );