Torque PBS Manager: queue attribute for jobs cannot be changed - insufficient permission

When I try to change the queue like so:
set queue standard total_jobs=16
I get the following error:
qmgr obj=standard svr=default: Cannot set attribute, read only or insufficient permission total_jobs
I am issuing the command as root.

total_jobs does not appear as a valid qmgr parameter in the documentation:
http://docs.adaptivecomputing.com/torque/help.htm#topics/12-appendices/serverParameters.htm
I'm not sure what total_jobs is supposed to do. Maybe you are looking for max_user_queuable instead.
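If a cap on queued jobs is what you are after, a minimal sketch with qmgr might look like this (the queue name "standard" and the limit of 16 are taken from your command; double-check the attribute names against the queue attributes appendix for your Torque version):
# run as root on the pbs_server host
qmgr -c "set queue standard max_user_queuable = 16"   # per-user cap on queued jobs
# or, to cap the total number of jobs allowed in the queue:
qmgr -c "set queue standard max_queuable = 16"
# verify the change
qmgr -c "list queue standard"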

Datastream troubleshooting: "An unknown error occurred. Please try again. If the error persists, contact Google support"

We are trying to replicate data from AlloyDB to BigQuery using Datastream.
We get "An unknown error occurred. Please try again. If the error persists, contact Google support."
In the Datastream console --> objects list, we see all source tables with Object Status "Failed" and Backfill Status "Completed".
In BigQuery we see only a subset of the tables (not all the "Completed" objects were synced).
In the Logs Explorer I can see an error on the BigQuery side, and I also see this error:
error: {
  code: 11
  message: "Unsupported primary key column either does not exist or is a pseudocolumn at [1:401]"
}
The column referred to in the error is of type enum.
The desired outcome is to have all the AlloyDB tables replicated into BigQuery.
The error message is not very informative...
What does it mean?
What would be the best way to go about troubleshooting this?
We're actively working on making these error messages more informative, and improvements are continuously being rolled out as we identify more edge cases. Assuming you followed all the steps in the documentation, you may need to open a ticket with support for further investigation. If a support ticket isn't an option, you can still report the issue through the public issue tracker.
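If you want to narrow things down yourself before opening a ticket, a query along these lines lists primary-key columns that are backed by an enum type, which matches the error above. This is only a sketch, assuming you can connect to the AlloyDB instance with psql; the connection details are placeholders:
# list primary-key columns whose type is an enum in the source database
psql "host=<alloydb-host> dbname=<db> user=<user>" -c "
  SELECT c.relname AS table_name,
         a.attname AS pk_column,
         format_type(a.atttypid, a.atttypmod) AS pk_type
  FROM pg_index i
  JOIN pg_class c ON c.oid = i.indrelid
  JOIN pg_attribute a ON a.attrelid = i.indrelid AND a.attnum = ANY(i.indkey)
  JOIN pg_type t ON t.oid = a.atttypid
  WHERE i.indisprimary AND t.typtype = 'e';"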
I just had this same issue, but connecting to a PostgreSQL instance in AWS RDS:
Beginning with PostgreSQL 10, passwords can be encrypted using SCRAM-SHA-256. Google Datastream still expects MD5 password encryption, or it will generate an "unknown error" in the logs and fail the backfills.
You'll need to update your postgresql.conf (or the RDS cluster parameter group if you're using AWS like me):
password_encryption = 'MD5'
Restart the database and make sure the parameter has changed with:
SHOW password_encryption;
Reset the password of your users:
ALTER USER "{username}" WITH PASSWORD '{password}';
More info from the PostgreSQL docs: https://www.postgresql.org/docs/current/auth-password.html
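To confirm that the stored password hashes were actually regenerated as MD5 (and not just that the parameter changed), you can inspect pg_authid. This is a sketch that requires superuser access, so it may not work on managed services such as RDS:
# an MD5 hash starts with "md5", a SCRAM hash with "SCRAM-SHA-256$"
psql -U postgres -c "SELECT rolname, left(rolpassword, 14) AS hash_prefix FROM pg_authid WHERE rolpassword IS NOT NULL;"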

How to call a package successfully

I got an error when calling a package. The error is:
Error starting at line : 1 in command -
PKG_Generate_GRNo.GenerateGR(TO_NUMBER(:P164_APP_ID,
'9999999'),:APP_USER,:P164_FIRST_NAME,:P164_LAST_NAME,:P164_EMAIL,:P164_SKYPE_ID,:P164_COUNTRY,:P164_DATE_OF_BIRTH)
Error report - Unknown Command
PKG_Generate_GRNo.GenerateGR(TO_NUMBER(:P164_APP_ID,
'9999999'),:APP_USER,:P164_FIRST_NAME,:P164_LAST_NAME,:P164_EMAIL,
:P164_SKYPE_ID,:P164_COUNTRY,:P164_DATE_OF_BIRTH);
"Session state protection violation" is definitely an APEX error, relating to your page settings. It seems your package is trying to change the state of a read-only page. See this other question.
The item identifier in the error message (P164_COURSECOUNT) has the same prefix as the parameters you pass to the package (:P164_APP_ID), so presumably they relate to the same page. We know nothing about your application or its architecture, so it's hard to offer concrete advice. Maybe you need to change the page or item settings, maybe you need to change what the package does. Only you can tell the right course of action.
As you didn't post the whole command, a note: you have to enclose the call in a BEGIN-END block, e.g.
BEGIN
PKG_Generate_GRNo.GenerateGR (TO_NUMBER ( :P164_APP_ID, '9999999'),
:APP_USER,
:P164_FIRST_NAME,
:P164_LAST_NAME,
:P164_EMAIL,
:P164_SKYPE_ID,
:P164_COUNTRY,
:P164_DATE_OF_BIRTH);
END;
/

DataStax: Block not found error from DSEFS

I have a Spark streaming job running in DSE that uses DSEFS for its checkpointing directory. I see this error in the debug log file. How do I resolve it?
ERROR [dsefs-netty-worker-5] 2017-12-01 05:23:02,679 DSE-FS RestServerHandler.scala:126 - [id: 0x9964e082, /<>:58874 :> 0.0.0.0/0.0.0.0:5598] Streaming data to remote end failed.
java.io.IOException: Block not found a3859f30-aa23-11e7-80b9-4b8bdaf197cd
at com.datastax.bdp.fs.server.blocks.BlockService$stateMachine$33$1.apply(BlockService.scala:706) ~[dsefs-server_2.10-5.0.19.jar:5.0.19]
at com.datastax.bdp.fs.server.blocks.BlockService$stateMachine$33$1.apply(BlockService.scala:703) ~[dsefs-server_2.10-5.0.19.jar:5.0.19]
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) [scala-library-2.10.6.jar:na]
at com.datastax.bdp.fs.exec.SameThreadExecutionContext$class.executeInSameThread(SameThreadExecutionContext.scala:24) ~[dsefs-common_2.10-5.0.19.jar:5.0.19]
at com.datastax.bdp.fs.exec.SameThreadExecutionContext$class.execute(SameThreadExecutionContext.scala:33) ~[dsefs-common_2.10-5.0.19.jar:5.0.19]
at com.datastax.bdp.fs.exec.SerialExecutionContextProvider$$anon$5$$anon$2.execute(SerialExecutionContextProvider.scala:24) ~[dsefs-common_2.10-5.0.19.jar:5.0.19]
at scala.concurrent.impl.CallbackRunnable.executeWithValue(Promise.scala:40) [scala-library-2.10.6.jar:na]
at scala.concurrent.impl.Promise$DefaultPromise.tryComplete(Promise.scala:248) ~[scala-library-2.10.6.jar:na]
at scala.concurrent.Promise$class.complete(Promise.scala:55) ~[scala-library-2.10.6.jar:na]
at scala.concurrent.impl.Promise$DefaultPromise.complete(Promise.scala:153) ~[scala-library-2.10.6.jar:na]
at com.datastax.bdp.fs.server.blocks.BlockService$stateMachine$1$1.apply(BlockService.scala:60) ~[dsefs-server_2.10-5.0.19.jar:5.0.19]
at com.datastax.bdp.fs.server.blocks.BlockService$stateMachine$1$1.apply(BlockService.scala:60) ~[dsefs-server_2.10-5.0.19.jar:5.0.19]
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32) [scala-library-2.10.6.jar:na]
at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:358) [netty-all-4.0.34.Final.jar:4.0.34.Final]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357) [netty-all-4.0.34.Final.jar:4.0.34.Final]
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112) [netty-all-4.0.34.Final.jar:4.0.34.Final]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_112]
This error means the DSEFS server failed to find the metadata of a data block in the dsefs.blocks Cassandra table. The IDs of the file blocks are stored in the dsefs.block_offsets table, and they reference blocks stored in dsefs.blocks. If a row in dsefs.block_offsets points to a block id that is absent from dsefs.blocks, you get this error when reading the file.
This error should not happen under normal circumstances, and it means the filesystem metadata somehow got into an inconsistent state. This may be a bug in the DSEFS implementation, a result of data loss caused by setting up the dsefs keyspace with an insufficient replication factor, or a result of a write operation that did not finish successfully and was applied only partially.
Please make sure you set the dsefs keyspace RF to at least 3 and run nodetool repair to avoid accidental data loss or unavailability of some DSEFS metadata.
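For example, raising the replication factor and repairing could look like the following sketch. The keyspace name dsefs comes from the tables mentioned above, the datacenter name DC1 is a placeholder for your own, and the replication settings must match your cluster topology:
# check the current replication settings of the dsefs keyspace
cqlsh -e "DESCRIBE KEYSPACE dsefs"
# raise the replication factor (datacenter name is a placeholder)
cqlsh -e "ALTER KEYSPACE dsefs WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3};"
# repair the keyspace on every node so the replicas converge
nodetool repair dsefs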
If this doesn't help, please contact me directly or through DataStax technical support and provide more details, including logs from the time before the error and more context on what the job was doing when the failure occurred.

Configuring SLURM so it requires the user to specify --account

I'm trying to figure out how to configure SLURM so that a user is required to specify --account when using the SLURM commands (salloc, sbatch, srun). Effectively I want to disable the default account behavior.
Has anyone out there found a simple way to do this?
I had the same requirement to force users to specify accounts and, after finding several ways to fulfill it with slurm, I decided to revive this post with the shortest/easiest solution.
The Slurm Lua submit plugin sees the job description before the default account is applied. Hence, you can install the slurm-lua package, add "JobSubmitPlugins=lua" to slurm.conf, restart the slurmctld, and test directly whether an account was defined via the job_submit.lua script (create the script wherever you keep your slurm.conf, typically /etc/slurm/):
-- /etc/slurm/job_submit.lua to reject jobs with no account specified
function slurm_job_submit(job_desc, part_list, submit_uid)
    if job_desc.account == nil then
        slurm.log_error("User %s did not specify an account.", job_desc.user_id)
        slurm.log_user("You must specify an account!")
        return slurm.ERROR
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end

return slurm.SUCCESS
Errors resulting from not specifying an account appear as follows:
# srun --pty bash
srun: error: You must specify an account!
srun: error: Unable to allocate resources: Unspecified error
# sbatch submit.slurm
sbatch: error: You must specify an account!
sbatch: error: Batch job submission failed: Unspecified error
These errors are also printed out to the slurmctld log so that you know what the resource allocation issue was for the particular job:
[2017-09-12T08:32:00.697] error: job_submit.lua: User 0 did not specify an account.
[2017-09-12T08:32:00.697] _slurm_rpc_submit_batch_job: Unspecified error
As an addendum, the Slurm Submit Plugins Guide is only moderately useful and you will probably be much better off simply examining the Lua job_submit plugin implementation for guidance.
One option is to set the AccountingStorageEnforce parameter to associations in slurm.conf.
AccountingStorageEnforce
This controls what level of association-based enforcement to impose on job submissions. Valid options are any combination of associations, limits, nojobs, nosteps, qos, safe, and wckeys, or all for all things (except nojobs and nosteps, which must be requested as well).
By enforcing associations, no new job is allowed to run unless a corresponding association exists in the system. If limits are enforced, users can be limited by association to whatever job size or run time limits are defined.
Then, with the sacctmgr command, make sure the default account has no access to the defined partitions. Effectively, the users will be denied submission if they do not specify a valid account.
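A minimal sketch of that setup (the account and user names below are placeholders, and the exact association layout depends on how your cluster is organized):
# in slurm.conf, enable association enforcement, then restart slurmctld:
#   AccountingStorageEnforce=associations
# create associations only for the accounts users should actually charge to
sacctmgr add account research Description="research projects" Organization="example"
sacctmgr add user alice Account=research
# review the resulting associations
sacctmgr show associations format=cluster,account,user,partition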
Another option is to write a custom submission plugin, which you can write in Lua. In that script, you can check whether the --account parameter was set and deny submission with a custom message if it was not.

Redis SET command can't fail, but can?

I'm trying to debug some Redis issues I am experiencing and came across some inconclusive documentation about the SET command.
In my Redis config, I have the following lines (snippet):
# Note: with all the kind of policies, Redis will return an error on write
# operations, when there are not suitable keys for eviction.
#
# At the date of writing this commands are: set setnx setex append
On the documentation page for the SET command I found:
Status code reply: always OK since SET can't fail.
Any insights on the definitive behaviour?
tl;dr: SET will return an error response if the Redis instance runs out of memory.
As far as I can tell from the source code in redis.c, essentially when a command is to be processed the flow goes like this (pseudo code):
IF memory is needed
    IF we can free keys
        Free keys
        Process the command
        SET -> process and return OK response
    ELSE
        Return error response
ELSE
    Process the command
    SET -> process and return OK response
It's not exactly written this way, but the basic idea comes down to that: memory is checked before the command is processed, so even if the command itself cannot fail, an error response will be returned when there is no memory available, regardless of the actual response of the command.
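You can reproduce this with redis-cli against a throwaway instance; this is a sketch where the 1mb limit and the key name are arbitrary. With the noeviction policy there are never suitable keys to free, so writes are rejected as soon as the limit is exceeded:
redis-cli config set maxmemory 1mb
redis-cli config set maxmemory-policy noeviction
# once used memory exceeds the limit, write commands such as SET are rejected:
redis-cli set somekey somevalue
# (error) OOM command not allowed when used memory > 'maxmemory'.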