How to fix 'File name too long' errors when using Snakemake

When using Snakemake, I store the values of my variables as part of the filenames (e.g. "processed/count_{project}.tsv"). Recently, I've started using R formulas with many covariates as a variable. Now I get an error because the filename is too long for the operating system. Has anyone else run into this issue and have any suggestions? Is there a canonical Snakemake approach for this problem?

Personally, I don't think it is a good idea to store information in the filename.
Rather, I would create a temp file in tabular or YAML format linking the file in question to the covariates or other data, and then read this file in R (or whatever tool you use) to extract the relevant information.
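Here is a minimal sketch of that idea, assuming a hypothetical models.tsv with two columns (model_id and formula) and a hypothetical fit.R script; only the short model ID ends up in the filename, and the rule looks the full formula up at run time:

import csv

# map short model IDs to full covariate formulas (models.tsv is an assumed lookup table)
MODELS = {}
with open("models.tsv") as fh:
    for row in csv.DictReader(fh, delimiter="\t"):
        MODELS[row["model_id"]] = row["formula"]

rule fit:
    output:
        "processed/count_{project}_{model_id}.tsv"
    params:
        # resolve the wildcard back to the full formula at run time
        formula=lambda wildcards: MODELS[wildcards.model_id]
    shell:
        "Rscript fit.R --formula '{params.formula}' --out {output}"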

One idea is to use paths instead, since a full path is allowed to be much longer than a single filename component.
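For illustration, a hedged sketch of that approach: on typical Linux filesystems each path component is limited to about 255 bytes, while the full path can be much longer, so a long formula string can be broken into several directory levels. The helper name and chunk size below are made up:

def formula_to_subdirs(formula, chunk=100):
    # drop characters that are awkward in paths, then split into directory-sized pieces
    safe = formula.replace("~", "").replace(" ", "")
    parts = [safe[i:i + chunk] for i in range(0, len(safe), chunk)]
    return "/".join(parts)

# e.g. "processed/" + project + "/" + formula_to_subdirs(formula) + "/count.tsv"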

Related

OpenVMS: Extracting an RMS indexed file to Windows as a sequential flat file

I haven't used OpenVMS for 20+ years. It was my first OS. I've been asked if it is possible to copy the data from RMS files on an OpenVMS server to Windows as a text file, so that it's readable.
No one has experience or knowledge of the record structures etc.
The files are xyz.DAT and are relative files. I'm hoping the DAT files are fixed length.
My first attempt would be to try to use Datatrieve (DTR), but I get an error that the image isn't loaded.
I thought it might be as easy as using CONVERT/FDL = nnnn.FDL and changing the organization from Relative to Sequential, but the file still seems to be unreadable.
Is there an easy way to stream an RMS indexed file to a flat ASCII file?
I used to use COBOL and C to access the data in the past, but I had lots of libraries to help....
I've noticed some solutions may use ODBC to connect, but I'm not sure what I can or cannot install on the server.
I can FTP to the server using FileZilla....
Another plan is writing a C application to read a file and output it as strings..... or DCL too..... it doesn't have to be quick...
Any ideas?
As mentioned before, the simple solution MIGHT be to just use: $ TYPE/OUT=test.TXT test.DAT
This will handle Relative and Indexed files alike.
It is much the same as $ CONVERT /FDL=NL: test.DAT test.TXT
Both will just read records from the source and transfer the bytes, byte for byte, to the records in a sequential file.
FTP in ASCII mode will transfer that nicely to Windows.
You can also use an 'inline' FDL file to generate a 'Unix' LF file like:
$ conv /fdl="record; format stream_lf" test.DAT test.TXT
Or a CR-LF file using:
$ conv /fdl="record; format stream" test.DAT test.TXT
Both can be transferred in Binary or ASCII mode with FTP.
MOSTLY - because this really only works well for a TEXT-ONLY source .DAT file.
There should be no CR, LF, FF or NUL characters in the source or things will break.
As 'habo' points out, use DUMP /RECORD=COUNT=3 to see how 'readable' the source data is.
If you spot 'binary' data using DUMP then you will need to find a record definition somewhere which maps bytes to integers, floating points or dates as needed.
These definitions can be COBOL LIB files or BASIC MAPs, and are often stored IN the CDD (Common Data Dictionary) or indeed in DATATRIEVE .DIC DICTIONARIES.
To use such a definition you likely need a program that just reads records following the 'map' and writes/prints them as text. Normally that's not too hard - notably not when you can find an example program on the server to tweak.
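For the case where you do find binary fields and a record definition, here is a rough, hypothetical sketch of such a program in Python, run on the receiving side after a binary transfer of a fixed-length-record file; the field layout is invented purely for illustration and must be replaced by the real map from the LIB/CDD definition:

import struct

# invented example layout: 20-byte text key, 32-bit integer, 64-bit float (little-endian)
LAYOUT = "<20sid"
RECORD_LEN = struct.calcsize(LAYOUT)

with open("test.DAT", "rb") as fh:
    while True:
        rec = fh.read(RECORD_LEN)
        if len(rec) < RECORD_LEN:
            break
        key, count, value = struct.unpack(LAYOUT, rec)
        print(key.decode("ascii", "replace").rstrip(), count, value)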
If it is just one or two 'suspect' byte ranges, then you can create a DCL loop to read and write and use F$EXTRACT to select the chunks you like.
If you want further help, kindly describe in words what kind of data is expected and perhaps provide the output from DUMP for 3 or 5 rows.
Good luck!
Hein.

Not able to filter files using pathGlobFilter

We are trying to read files from a directory in Azure blob storage based on a pattern. We are using the pathGlobFilter option to select files. The directory contains the following files:
Sales_51820_14529409_T_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_51820_14529409_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_61820_17529409_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_61820_17529409_T_7a3cc7d1d17261fd17e7e1fabd3.csv
We need to process only those files which do not have "T" in the file name, i.e. only these two files:
Sales_51820_14529409_7a3cc7d1d17261fd17e7e1fabd3.csv
Sales_61820_17529409_7a3cc7d1d17261fd17e7e1fabd3.csv
But we are not able to read only these two files.
Here is the code:
df = spark.read.format("csv").schema(structSchema).options(header=False, inferSchema=True, sep='|', pathGlobFilter="Sales_\d{5}_\d{8}_[a-z0-9]+.csv$").load("wasbs://abc#xxxxx.blob.core.windows.net/abc/2022/02/11/")
Regards,
Rajib
A glob is not a standard regular expression; there are differences between them.
For example, a glob cannot express a repetition count such as \d{5}.
For details, see here.
Back to this question: a somewhat clumsy workaround is below; hopefully someone can offer a cleaner solution.
pathGlobFilter="Sales_[0-9][0-9][0-9][0-9][0-9]_[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]_[a-z0-9]*.csv"

Is there any way to split variables in VSCode?

I wonder if there is any way to split some variables in VSCode.
My example will explain my question better:
I have an exe file at this path: C:\path\to\workspace\main\project\project.exe
My cpp source file that will create the exe file is this: C:\path\to\workspace\main\project\test.cpp
I want to create a task in tasks.json, but my combination of variables does not give me the right path.
As you can see:
${workspaceFolder} is C:\path\to\workspace
${fileDirname} returns C:\path\to\workspace\main\project
and ${relativeFileDirname}.exe returns main\project.exe
and the combination "${fileDirname}\\${relativeFileDirname}.exe" as a command will return C:\path\to\workspace\main\project\main\project.exe, which is wrong.
So I wanted to know whether there is any other variable that just returns the parent folder of the current file.
If not, can we split variables with \?
I hope this makes some sense.
Thanks
Add new fileDirnameBasename variable
see https://github.com/microsoft/vscode/commit/551db7ec94f02a4bdc8999092cf8bef642b3992d
${fileDirnameBasename} is being added to vscode v1.52 which I believe is what you are looking for.
You can use the extension Command Variable
Use the commands:
extension.commandvariable.file.fileDirBasename
extension.commandvariable.file.fileDirBasename1Up
extension.commandvariable.file.fileDirBasename2Up
By the way, as a workaround, this worked for me:
"${fileDirname}\\*.exe"
but I still need a variable that gives the parent folder of the currently open file.
Any ideas?

How to get values of variables after simulation in OpenModelica?

I've simulated a model in OpenModelica. Is it now possible to get the values that all variables had during the simulation? If yes, how can I get them?
When you simulate you can give outputFormat="csv" and then you get a result file Model_res.csv containing all variable values for all the time steps. You can then open this file in Excel if you want.
https://build.openmodelica.org/Documentation/OpenModelica.Scripting.simulate.html
In OpenModelica Connection Editor (OMEdit) you go to Simulation->Simulation Setup, tab Output and select csv.
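If you then want the values programmatically rather than in Excel, a small sketch (assuming the result file is called Model_res.csv and that a variable named x exists; both names are illustrative):

import pandas as pd

res = pd.read_csv("Model_res.csv")     # one column per variable, one row per time step
print(res.columns.tolist())            # all recorded variable names, including "time"
print(res[["time", "x"]].head())       # trajectory of the variable "x"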
I'm not sure I understand your question. Normally, when you run OpenModelica, you'll get a results file. This file contains all the simulation data for all variables.
How you read that file depends on what platform you are running on and what tools you are using. Is that the issue?

Ignore includes with #pycparser and define multiple Subgraphs in #pydot

I am new to Stack Overflow, but I have already gotten a lot of help here; thanks to the community for that.
I'm trying to create a tool that shows me caller dependencies for legacy code.
I'm parsing a directory of C code with pycparser, and for each file I want to create a subgraph with pydot.
Two questions:
When parsing a C file, the parser resolves the #includes, and I also get functions from the included files in my AST. How can I know whether a function comes from an include or originally from the actual file, or ignore the #includes?
For each file I want to create a subgraph and then add all functions in this file to that subgraph. I don't know how many subgraphs I have to create...
I have a set of files, where each file is a frozenset with the functions of that file.
Is something like this possible?
for files in SetOfFiles:
    # how to create a subgraph with the name of `files`?
    for function in files:
        self.graph.add_node(pydot.Node(function))  # --> add node to subgraph "files"
I hope you get my challenge... any ideas?
Thanks!
EDIT:
I solved the question about pydot; it was quite easy. So I'm left with my pycparser problem :(
for files in ListOfFuncs:
    cluster_x = pydot.Cluster(files, label=files)
    for functions in files:
        cluster_x.add_node(pydot.Node(functions))
    graph.add_subgraph(cluster_x)
I can address the pycparser part. The preprocessor leaves #line directives that specify which file and line the code came from, and pycparser consumes those. You can get that information from the AST it creates (see the pycparser tests for an example).
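A short sketch of that idea (paths and names are illustrative): each AST node carries a coord whose .file attribute reflects those #line directives, so functions pulled in via #include can be filtered out by comparing coord.file with the file actually being parsed:

from pycparser import c_ast, parse_file

class FuncDefVisitor(c_ast.NodeVisitor):
    def __init__(self, source_file):
        self.source_file = source_file
        self.own_funcs = []

    def visit_FuncDef(self, node):
        # keep only definitions whose coordinates point at the parsed file itself
        if node.coord is not None and node.coord.file == self.source_file:
            self.own_funcs.append(node.decl.name)

filename = "legacy/module.c"
ast = parse_file(filename, use_cpp=True)   # needs a C preprocessor on the PATH
visitor = FuncDefVisitor(filename)
visitor.visit(ast)
print(visitor.own_funcs)                   # functions defined in module.c itself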