Looking for a faster way to batch export pdf:s in InDesign - pdf

I'm using this script (below) to batch export pdf:s from several indesign files for a task i do every week. The filenames are always the same, i'm using 8-10 different indd files to create 12-15 different pdf:s.
The script is set up like this:
//Sets variables for print and web presets
var myPDFExportPreset = app.pdfExportPresets.item("my-present-for-print-pdf");
var myPDFExportPreset2 = app.pdfExportPresets.item("my-preset-for-web-pdf");
//sample of one pdf exported first with print, then web pdf preset as two different files
var firstFileIntoPdfs = function(){
var openDocument= app.open(File("MYFILEPATH/firstfile.indd"));
openDocument.exportFile(
ExportFormat.pdfType,
File("MYFILEPATH/print-pdfs/firstfile-print.pdf"),
false,
myPDFExportPreset
);
openDocument.exportFile(
ExportFormat.pdfType,
File("MYFILEPATH/web-pdfs/firstfile-web.pdf"),
false,
myPDFExportPreset2
);
};
I'm defining all exports like the one above as named functions, some using only one of the presets, some two like the one above. I'm calling all these functions at the end of the file
firstFileIntoPdfs();
secondFileIntoPdfs();
thirdFileIntoPdfs();
fourthFileIntoPdfs();
and so on... ยจ
The script is however quite slow, 10 files into 1 or 2 pdfs each, like the function above, can take 10 minutes. I don't think this is a CPU issue, what i noticed is that it seems like the script is waiting for the files in "firstFileIntoPdfs()" to be created, a process that takes some minutes, before proceeding to execute the next function. Then waiting again...
Selecting File -> Export manually you can set new files to be exported while the previous ones are still processing the pdf files, which to me has seemed faster than how this script is working. Manual clicking is however error prone and tedious, of course.
Is there a better way to write this batch export script than how i've done above, that would make all functions executed while pdfs from previous functions still are processed in the system? I'd like to keep them as separate functions in order to be able to comment out some when only needing certain specific pdf:s. (unless the process of exporting all becomes nearly as fast as exporting only 1 pdf).
I hope my question makes sense!

There is an asynch method available, replace exportFile with asynchrousExportFile:
var openDocument= app.open(File("MYFILEPATH/firstfile.indd"));
openDocument.asynchronousExportFile(
ExportFormat.pdfType,
File("MYFILEPATH/print-pdfs/firstfile-print.pdf"),
false,
myPDFExportPreset
);
which use a background task

Related

llvm-cov report so big, i want to clip it

When I want to generate the llvm cov report, I find that The profraw file is very large, sometimes more than 1G. How should I cut it? Or I can write a part every time, and then upload it to the server. I can immediately clear the part that has been written from the cache. When I write again, I no longer need the part that has been uploaded
I tried to use void__ llvm_ profile_ initialize_ file(void); And void__ llvm_ profile_ initialize(void); It doesn't seem to work. My assumption is that__ llvm_ profile_ write_ When writing files, for example, there are already 10M files. I generate a new file. The previous 10M will not be written to this new file. The following content will be written to new files. Repeat this operation continuously

Fetching big SQL table in the web app session

I'm quite new in web app so apologize if my question is abit basic. I'm developing a Web app with R shiny where the inputs are very large tables from Azure SQL server. They are 20 tables each in the order of hundred-thousand rows and hundreds of columns containing numbers, Characters and etc. I have no problem calling them, my main issue is that it takes so much time to fetch everything from Azure SQL server. It takes approximately 20 minutes. So the user of the web app needs to wait quite a long.
I'm using DBI package as follows:
db_connect <- function(database_config_name){
dbConfig <- config::get(database_config_name)
connection <- DBI::dbConnect(odbc::odbc(),
Driver = dbConfig$driver,
Server = dbConfig$server,
UID = dbConfig$uid,
PWD = dbConfig$pwd,
Database = dbConfig$database,
encoding = "latin1"
)
return(connection)
}
and then fetching tables by :
connection <- db_connect(db_config_name)
table <- dplyr::tbl(con, dbplyr::in_schema(fetch_schema_name(db_config_name,table_name,data_source_type), fetch_table_name(db_config_name,table_name,data_source_type)))
I searched a lot but didn't come across a good solution, I appreciate any solutions can tackle this problem.
I work with R accessing SQL Server (not Azure) daily. For larger data (as in your example), I always revert to using the command-line tool sqlcmd, it is significantly faster. The only pain point for me was learning the arguments and working around the fact that it does not return proper CSV, there is post-query munging required. You may have an additional pain-point of having to adjust my example to connect to your Azure instance (I do not have an account).
In order to use this in a shiny environment and preserve its interactivity, I use the processx package to start the process in the background and then poll its exit status periodically to determine when it has completed.
Up front: this is mostly a "loose guide", I do not pretend that this is a fully-functional solution for you. There might be some rough-edges that you need to work through yourself. For instance, while I say you can do it asynchronously, it is up to you to work the polling process and delayed-data-availability into your shiny application. My answer here provides starting the process and reading the file once complete. And finally, if encoding= is an issue for you, I don't know if sqlcmd does non-latin correctly, and I don't know if or how to fix this with its very limited, even antiquated arguments.
Steps:
Save the query into a text file. Short queries can be provided on the command-line, but past some point (128 chars? I don't know that it's clearly defined, and have not looked enough recently) it just fails. Using a query-file is simple enough and always works, so I always use it.
I always use temporary files for each query instead of hard-coding the filename; this just makes sense. For convenience (for me), I use the same tempfile base name and append .sql for the query and .csv for the returned data, that way it's much easier to match query-to-data in the temp files. It's a convention I use, nothing more.
tf <- tempfile()
# using the same tempfile base name for both the query and csv-output temp files
querytf <- paste0(tf, ".sql")
writeLines(query, querytf)
csvtf <- paste0(tf, ".csv")
# these may be useful in troubleshoot, but not always [^2]
stdouttf <- paste0(tf, ".stdout")
stderrtf <- paste0(tf, ".stderr")
Make the call. I suggest you see how fast this is in a synchronous way first to see if you need to add an async query and polling in your shiny interface.
exe <- "/path/to/sqlcmd" # or "sqlcmd.exe"
args <- c("-W", "b", "-s", "\037", "-i", querytf, "-o", csvtf,
"-S", dbConfig$server, "-d", dbConfig$database,
"-U", dbConfig$uid, "-P", dbConfig$pwd)
## as to why I use "\037", see [^1]
## note that the user id and password will be visible on the shiny server
## via a `ps -fax` command-line call
proc <- processx::process$new(command = exe, args = args,
stdout = stdouttf, stderr = stderrtf) # other args exist
# this should return immediately, and should be TRUE until
# data retrieval is done (or error)
proc$is_alive()
# this will hang (pause R) until retrieval is complete; if/when you
# shift to asynchronous queries, do not do this
proc$wait()
One can use processx::run instead of process$new and proc$wait(), but I thought I'd start you down this path in case you want/need to go asynchronous.
If you go with an asynchronous operation, then periodically check (perhaps every 3 or 10 seconds) proc$is_alive(). Once that returns FALSE, you can start processing the file. During this time, shiny will continue to operate normally. (If you do not go async and therefore choose to proc$wait(), then shiny will hang until the query is complete.)
If you make a mistake and do not proc$wait() and try to continue with reading the file, that's a mistake. The file may not exist, in which case it will err with No such file or directory. The file may exist, perhaps empty. It may exist and have incomplete data. So really, make a firm decision to stay synchronous and therefore call proc$wait(), or go asynchronous and poll periodically until proc$is_alive() returns FALSE.
Reading in the file. There are three "joys" of using sqlcmd that require special handling of the file.
It does not do embedded quotes consistently, which is why I chose to use "\037" as a separator. (See [^1].)
It adds a line of dashes under the column names, which will corrupt the auto-classing of data when R reads in the data. For this, we do a two-step read of the file.
Nulls in the database are the literal NULL string in the data. For this, we update the na.strings= argument when reading the file.
exitstat <- proc$get_exit_status()
if (exitstat == 0) {
## read #1: get the column headers
tmp1 <- read.csv(csvtf, nrows = 2, sep = "\037", header = FALSE)
colnms <- unlist(tmp1[1,], use.names = FALSE)
## read #2: read the rest of the data
out <- read.csv(csvtf, skip = 2, header = FALSE, sep = "\037",
na.strings = c("NA", "NULL"), quote = "")
colnames(out) <- colnms
} else {
# you should check both stdout and stderr files, see [^2]
stop("'sqlcmd' exit status: ", exitstat)
}
Note:
After a lot of pain with several issues (some in sqlcmd.exe, some in data.table::fread and other readers, all dealing with CSV-format non-compliance), at one point I chose to stop working with comma-delimited returns, instead opting for the "\037" field Delimiter. It works fine with all CSV-reading tools and has fixed so many problems (some not mentioned here). If you're not concerned, feel free to change the args to "-s", "," (adjusting the read as well).
sqlcmd seems to use stdout or stderr in different ways when there are problems. I'm sure there's rationale somewhere, but the point is that if there is a problem, check both files.
I added the use of both stdout= and stderr= because of a lot of troubleshooting I did, and continue to do if I munge a query. Using them is not strictly required, but you might be throwing caution to the wind if you omit those options.
By the way, if you choose to only use sqlcmd for all of your queries, there is no need to create a connection object in R. That is, db_connect may not be necessary. In my use, I tend to use "real" R DBI connections for known-small queries and the bulk sqlcmd for anything above around 10K rows. There is a tradeoff; I have not measured it sufficiently in my environment to know where the tipping point is, and it is likely different in your case.

Repast: how to add and set a new parameter directly from the code instead of GUI

I want to create a parameter that contains a list of string (list of hub codes). This list of string is created by reading an external csv file (this list could contain the different codes depending on the hub codes in the CSV file)
What I want is to find a easy auto way to perform batch runs by each hub code in the list.
So this question is:
1) how to add and set a new parameter directly from the code (during the initialization when reading the CSV) instead of GUI parameter panel?
2) how to avoid manual configuration of hub list in the batch run configuration
Something like this for adding the parameters should work in your ContextBuilder.
Parameters params = RunEnvironment.getInstance().getParameters();
((DefaultParameters)params).addParameter("foo", "Big Foo", Integer.class, 3, false);
You would read the csv file to get the parameter name and value.
I'm not sure I completely understand the batch run configuration question, but each batch run has a run number associated with it
RunState.getInstance().getRunInfo().getRunNumber()
If you can associate line numbers in your csv parameter file with run number (e.g. run number 1 should use line 1, and so on), then each batch run would use a different parameter line.

Set variables in Javascript job entry at root level

I need to set variables in root scope in one job to be used in a different job. The first job has a Javascript job entry, with the statements:
parent_job.setVariable("customers_full_path", "C:\\customers22.csv", "r");
true;
But the compilation fails with:
Couldn't compile javascript:
org.mozilla.javascript.EvaluatorException: Can't find method
org.pentaho.di.job.Job.setVariable(string,string,string). (#2)
How to set a variable at root level in a Javascript job entry?
Sorry for the passive agressive but:
I don't know if you are new to Pentaho but, the most common mistake for new users, with previous knowledge of programming, is to be sort of 'addicted' to know methods, as such you are using JavaScript for a functionality that is built in the tool. Both Transformations(KTR) and JOBs(KJB) have a similar step, you can better manipulate this in a KTR.
JavaScript steps slow down the flow considerably, so try to stay away from those as much as possible.
EDIT:
Reading This article, seems the only thing you're doing wrong is the actual syntax of the command..
Correct usage :
parent_job.setVariable("Desired Value", [name_of_variable]);
The command you described has 3 parameters, when it should be 2. If you have more than 1 variable you need to set, use 3 times the command. Try it out see if it works.

JSR 352 : How do you write to a MVS Dataset from a Java Batch program?

I need to write to a non-VSAM dataset in the mainframe. I know that we need to use the ZFile library to do it and I found how to do it here
I am running my Java batch job in the WebSphere Liberty on zOS. How do I specify the dataset? Can I directly give the DataSet a name like this?
dsnFile = new ZFile("X.Y.Z", "wb,type=record,noseek");
I am able to write it to a text file on the server itself using Java's File Writers but I don't know how to access a mvs dataset.
I am relatively new to the world of zOS and mainframe.
It sounds like you might be asking more generally how to use the ZFile API on WebSphere Liberty on z/OS.
Have you tried something like:
String pdsName = ZFile.getSlashSlashQuotedDSN("X.Y.Z");
ZFile zfile = new ZFile(pdsName , ...options...)
As far as batch-specific use cases, you might obviously have to differentiate between writing to a new file that's created for the first time on an original execution, as opposed to appending to an already-existing one on a restart.
You also might find some useful snipopets in this doctorbatch.io repo, along with the original link you posted.
For reference, I'll copy/paste from the ZFile Javadoc:
ZFile dd = new ZFile("//DD:MYDD", "r");
Opens the DD namee MYDD for reading
ZFile dsn = new ZFile("//'SYS1.HELP(ACCOUNT)'", "rt");
Opens the member ACCOUNT from the PDS SYS1.HELP for reading text records
ZFile dsn = new ZFile("//SEQ", "wb,type=record,recfm=fb,lrecl=80,noseek");
Opens the data set {MVS_USER}.SEQ for sequential binary writing. Note that ",noseek" should be specified with "type=record" if access is sequential, since performance is greatly improved.
One final note, another couple useful ZFile helper methods are: bpxwdyn() and getFullyQualifiedDSN().