llvm-cov report is too big and I want to split it - llvm-cov

When I generate an llvm-cov report, the profraw file is very large, sometimes more than 1 GB. How can I split it? Alternatively, can I write one part at a time and upload it to the server, then immediately clear the part that has been written from the cache, so that the next write no longer includes what was already uploaded?
I tried `void __llvm_profile_initialize_file(void);` and `void __llvm_profile_initialize(void);`, but they don't seem to work. My idea is this: suppose `__llvm_profile_write_...` has already written 10 MB to a file. I then start a new file, the previous 10 MB is not written to it, and only the content that follows goes into the new file. This is repeated continuously.

Related

File create time doesn't change even after it is deleted

I am using the following code:
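from datetime import datetime
import os, pickle, time
import numpy as np  # needed for np.arange below
import pandas as pd
df = pd.DataFrame(np.arange(1, 200))
fn = r'C:\z1.p'
# Write the file and print its creation time
with open(fn, 'wb') as f:
    pickle.dump(df, f)
print(datetime.fromtimestamp(os.stat(fn).st_ctime))
# Delete it, wait, then recreate it under the same name
os.remove(fn)
time.sleep(5)
with open(fn, 'wb') as f:
    pickle.dump(df, f)
print(datetime.fromtimestamp(os.stat(fn).st_ctime))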
But both print statements show the same creation time:
2022-03-16 08:43:30.885011
2022-03-16 08:43:30.885011
How do I make sure that a new time is printed by the second print statement?
This is a Windows feature, called "file system tunnelling".
From "The apocryphal history of file system tunnelling":
One of the file system features you may find yourself surprised by is
tunneling, wherein the creation timestamp and short/long names of a
file are taken from a file that existed in the directory previously.
In other words, if you delete some file “File with long name.txt” and
then create a new file with the same name, that new file will have the
same short name and the same creation time as the original file. You
can read this KB article for details on what operations are sensitive
to tunnelling.
Why does tunneling exist at all?
When you use a program to edit an existing file, then save it, you
expect the original creation timestamp to be preserved, since you’re
editing a file, not creating a new one. But internally, many programs
save a file by performing a combination of save, delete, and rename
operations (such as the ones listed in the linked article), and
without tunneling, the creation time of the file would seem to change
even though from the end user’s point of view, no file got created.
...
See this archived copy of Windows NT Contains File System Tunneling Capabilities:
When a name is removed from a directory (rename or delete), its
short/long name pair and creation time are saved in a cache, keyed by
the name that was removed. When a name is added to a directory (rename
or create), the cache is searched to see if there is information to
restore. The cache is effective per instance of a directory. If a
directory is deleted, the cache for it is removed.
These paired operations can cause tunneling on "name."
delete(name)/create(name)
delete(name)/rename(source, name)
rename(name, newname)/create(name)
rename(name, newname)/rename(source, name)
The idea is to mimic the behavior MS-DOS programs expect when they use
the safe save method. They copy the modified data to a temporary file,
delete the original and rename the temporary to the original. This
should seem to be the original file when complete. Windows performs
tunneling on both FAT and NTFS file systems to ensure long/short file
names are retained when 16-bit applications perform this safe save
operation.
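The cache behaviour described in the quote can be sketched as a toy model. This is only an illustration of the description, not Windows code: the `Directory` class, its method names, and the `max_age_seconds` lifetime are assumptions made for this sketch.

```python
import time


class Directory:
    """Toy model of one directory with a tunnel cache (not real Windows code)."""

    def __init__(self, max_age_seconds=15.0, clock=time.time):
        self.clock = clock
        self.max_age = max_age_seconds  # assumed cache lifetime for this model
        self.files = {}                 # name -> creation time
        self.tunnel_cache = {}          # removed name -> (creation time, removal time)

    def _remove_name(self, name):
        # Removing a name saves its creation time, keyed by that name.
        created = self.files.pop(name)
        self.tunnel_cache[name] = (created, self.clock())
        return created

    def _add_name(self, name, default_created):
        # Adding a name searches the cache for information to restore.
        entry = self.tunnel_cache.pop(name, None)
        if entry is not None and self.clock() - entry[1] <= self.max_age:
            self.files[name] = entry[0]
        else:
            self.files[name] = default_created

    def create(self, name):
        self._add_name(name, self.clock())

    def delete(self, name):
        self._remove_name(name)

    def rename(self, source, name):
        # A rename removes one name and adds another.
        created = self._remove_name(source)
        self._add_name(name, created)


# Reproduce the delete(name)/create(name) pair with a fake clock.
now = [100.0]
d = Directory(clock=lambda: now[0])
d.create("z1.p")
first = d.files["z1.p"]
d.delete("z1.p")
now[0] += 5           # five "seconds" later, as in the question
d.create("z1.p")
second = d.files["z1.p"]
print(first == second)  # True: the creation time was tunneled
```

In this model, recreating the name after the assumed cache lifetime has passed yields a fresh creation time.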
One Windows function related to file tunneling is FltGetTunneledName():
The FltGetTunneledName routine retrieves the tunneled name for a file, given the normalized name returned for the file by a previous call to FltGetFileNameInformation, FltGetFileNameInformationUnsafe, or FltGetDestinationFileNameInformation.
...
To disable tunnelling:
Open regedit
Navigate here:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem
On the Edit menu, point to New and then click DWORD Value
Type MaximumTunnelEntries and then press Enter
On the Edit menu, click Modify
Type 0 and then click OK
Restart your computer
Done
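If editing the registry is not an option, one alternative (not part of the answer above) is to read the modification time instead of the creation time. Going by the description quoted earlier, only the creation time and the short/long names are carried over. A minimal sketch, where `last_modified` and the temporary file name are made up for illustration:

```python
from datetime import datetime
import os
import tempfile


def last_modified(path):
    """Return the file's last-modification time as a local datetime."""
    return datetime.fromtimestamp(os.stat(path).st_mtime)


# Demo in a temporary directory (hypothetical file name).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "z1.p")

with open(demo_path, "wb") as f:
    f.write(b"first version")
first = last_modified(demo_path)

os.remove(demo_path)
with open(demo_path, "wb") as f:
    f.write(b"second version")
second = last_modified(demo_path)

print(first, second)
```

The second value reflects when the file was rewritten, regardless of what st_ctime reports.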

What is the best practice for downloading large CSV files from S3 in Java?

I'm trying to get a large CSV file from S3 but the download fails with “java.net.SocketException: Connection reset”, which is probably due to the InputStream simply being open for too long (the download often takes more than an hour since I am doing multiple time-consuming processes on the streamed content). This is how I currently parse the file:
InputStream inputStream = new GZIPInputStream(s3Client.getObject("bucket", "key").getObjectContent());
Reader decoder = new InputStreamReader(inputStream, Charset.defaultCharset());
BufferedReader isr = new BufferedReader(decoder);
CSVParser csvParser = new CSVParser(isr, CSVFormat.DEFAULT);
CSVRecord nextRecord = csvParser.iterator().next();
...
I know I have to split the download into multiple short getObject-calls with a defined offset for the GetObjectRequest, but I'm wondering how to define this offset in case of a CSV, since I need complete lines.
Do I have to ditch the parser library and parse each line into an Object myself so I can keep a count of the read bytes and use it as an offset for the next batch? That doesn't seem very robust to me. Is there any best practice way to achieve "batch downloading" of CSV records?
I decided on simply using the dedicated getObject(GetObjectRequest getObjectRequest, File destinationFile) method to copy the entire CSV to a temporary file on disk. This closes the HTTP connection as soon as possible and allows me to get the InputStream from the local file with no problems. It doesn't resolve the question of the best way to download in batches, but it's a nice and simple workaround.
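For the batching question that the workaround leaves open, one possible approach is to fetch fixed-size byte ranges and carry the trailing partial line over into the next batch. The sketch below is in Python rather than Java and makes several assumptions: `fetch_range(start, end)` is a hypothetical callable standing in for a ranged download, the object is uncompressed (unlike the gzipped file in the question), and no quoted CSV field contains a newline.

```python
import csv


def iter_complete_lines(fetch_range, total_size, chunk_size=1024 * 1024):
    """Yield complete lines (bytes, without the newline) from consecutive ranges.

    fetch_range(start, end) is a hypothetical callable returning the bytes in
    [start, end); a real ranged request may use an inclusive end offset.
    """
    leftover = b""
    offset = 0
    while offset < total_size:
        end = min(offset + chunk_size, total_size)
        data = leftover + fetch_range(offset, end)
        offset = end
        # Everything before the last newline is complete; the rest is carried over.
        *lines, leftover = data.split(b"\n")
        yield from lines
    if leftover:
        yield leftover  # final line without a trailing newline


# Usage with an in-memory payload standing in for the remote object.
payload = b"a,1\nbb,22\nccc,333\n"
fetch = lambda start, end: payload[start:end]
lines = iter_complete_lines(fetch, len(payload), chunk_size=4)
records = list(csv.reader(line.decode("utf-8") for line in lines))
print(records)  # [['a', '1'], ['bb', '22'], ['ccc', '333']]
```

Each range is fetched with its own short-lived call, so no single connection has to stay open while records are processed.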

Should rstan::sampling() not be run in #' @examples?

In package development, each example must run in under 5 seconds. However, stan_model() followed by rstan::sampling() takes much longer than 5 seconds, as follows:
Examples with CPU or elapsed time > 5s
    user system elapsed
fit 1.25   0.11   32.47
So I wrapped each rstan::sampling() call in \donttest{} in the roxygen #' @examples comments.
Should sampling() not be run in #' @examples at all, or is there some other way to handle this?
I tried to create my package with rstan_package_skeleton(path = 'BayesianAAA') as you taught me earlier (thank you!), but there are many things about it that I do not understand.
Previously, rstan_package_skeleton(path = 'BayesianAAA') raised errors on my computer (the errors no longer occur).
So I built my package, say BayesianAAA, without rstan_package_skeleton(). In my own setup, I put Model_A.stan, Model_B.stan, Model_C.stan, ... in inst/extdata and refer to my Stan files as follows:
scr <- system.file("extdata", "Model_A.stan", package="BayesianAAA")
scr <- rstan::stan_model(scr)
I have many questions about rstan_package_skeleton(path = 'BayesianAAA').
1) First question: how do I include my existing Stan files, and how do I refer to my .stan files for rstan::stan_model()?
According to the following page:
https://cran.r-project.org/web/packages/rstantools/vignettes/minimal-rstan-package.html
If we had existing .stan files to include with the package we could use the optional stan_files argument to rstan_package_skeleton to include them.
So I think, though I am not sure, that I should run something like:
`rstan_package_skeleton(path = 'BayesianAAA', stan_files = "Model_A.stan")`.
But I do not know how to write this for several Stan files, say Model_A.stan, Model_B.stan and Model_C.stan, from my existing package made without rstan_package_skeleton(). Is the following code correct? I also do not know where the files listed in stan_files end up in the new project created by rstan_package_skeleton().
`rstan_package_skeleton(path = 'BayesianAAA', stan_files = c("Model_A.stan", "Model_B.stan", "Model_C.stan"))`.
This raises another question:
2) Second question: where should I run rstan_package_skeleton(path = 'BayesianAAA', stan_files = "Model_A.stan")? I run it from the RStudio console inside my existing package project. Is that correct? When I do, a new project is created inside the old existing project. What should I do?
3) I do not know much about the package "rstanarm", but I tried to imitate it for my package and could not find any .stan file in it. Am I wrong?
I am sorry for my poor English and my lack of study about these things.
I would be grateful for any advice.
You generally should not be writing a package that calls stan_model at runtime, unless, like brms or tmbstan, you are generating a Stan program at runtime as opposed to writing it statically. There are dozens of packages on CRAN that provide compiled Stan programs, basically by following the build process developed for rstanarm, which is facilitated by the rstantools::rstan_package_skeleton function, the step-by-step guide, and the developer guidelines, which directly address your question:
CRAN policy permits long installation times but imposes restrictions on the time consumed by examples and unit tests that are much shorter than the time that it takes to compile even a simple Stan program. Thus, it is only possible to adequately test your package if it has pre-compiled Stan programs.
Even then, it can be difficult to sample from a posterior distribution (adequately) in five seconds, so you often have to use small datasets, one chain, a small number of iterations, etc.
It is best to pass the names of your Stan programs (which should end in a .stan extension, not use a period otherwise, and only have ASCII letters, numbers, and the underscore in their names) to rstantools::rstan_package_skeleton(). If doing so from RStudio, I would call it while not in an existing project. Then
During installation, all Stan programs will be compiled and saved in the list stanmodels that can then be used by R function in the package. The rule is that the Stan program compiled from the model code in src/stan_files/foo.stan is stored as list element stanmodels$foo.
There are dozens of R packages that have Stan programs in their src/stan_files directory (although the locations of the Stan programs are going to move to inst/stan for the next rstantools release) that for the most part just followed the vignettes and did not have to do any additional steps except write more R functions.

Looking for a faster way to batch export PDFs in InDesign

I'm using this script (below) to batch export PDFs from several InDesign files for a task I do every week. The filenames are always the same; I'm using 8-10 different .indd files to create 12-15 different PDFs.
The script is set up like this:
//Sets variables for print and web presets
var myPDFExportPreset = app.pdfExportPresets.item("my-present-for-print-pdf");
var myPDFExportPreset2 = app.pdfExportPresets.item("my-preset-for-web-pdf");
//sample of one pdf exported first with print, then web pdf preset as two different files
var firstFileIntoPdfs = function(){
    var openDocument = app.open(File("MYFILEPATH/firstfile.indd"));
    openDocument.exportFile(
        ExportFormat.pdfType,
        File("MYFILEPATH/print-pdfs/firstfile-print.pdf"),
        false,
        myPDFExportPreset
    );
    openDocument.exportFile(
        ExportFormat.pdfType,
        File("MYFILEPATH/web-pdfs/firstfile-web.pdf"),
        false,
        myPDFExportPreset2
    );
};
I define every export as a named function like the one above, some using only one of the presets and some using two. I call all these functions at the end of the file:
firstFileIntoPdfs();
secondFileIntoPdfs();
thirdFileIntoPdfs();
fourthFileIntoPdfs();
and so on...
The script is however quite slow: exporting 10 files into 1 or 2 PDFs each, like the function above, can take 10 minutes. I don't think this is a CPU issue. What I noticed is that the script seems to wait for the files in firstFileIntoPdfs() to be created, a process that takes some minutes, before proceeding to execute the next function. Then it waits again...
When you select File -> Export manually, you can start new exports while the previous ones are still being processed, which has seemed faster to me than how this script works. Manual clicking is, however, error-prone and tedious, of course.
Is there a better way to write this batch export script, so that all functions execute while PDFs from previous functions are still being processed? I'd like to keep them as separate functions so that I can comment some out when I only need certain PDFs (unless exporting everything becomes nearly as fast as exporting only one PDF).
I hope my question makes sense!
There is an asynchronous method available. Replace exportFile with asynchronousExportFile:
var openDocument = app.open(File("MYFILEPATH/firstfile.indd"));
openDocument.asynchronousExportFile(
    ExportFormat.pdfType,
    File("MYFILEPATH/print-pdfs/firstfile-print.pdf"),
    false,
    myPDFExportPreset
);
which uses a background task.

Loop over file names in sub job (Kettle job)

The task is to get file names from the folder and then loop the same task (job) over all the files one by one.
I created a simple job with a transformation (get file names) followed by a job with the "Execute for each row" flag set (for now it just logs the name of the file).
I did it the same way as described here: http://ramathoughts.blogspot.ch/2010/08/processing-group-of-files-with-kettle.html
However, the path of the received files is not passed to the sub-job (the log doesn't display the variable's value). But the sub-job is executed as many times as there are files in the input folder. So it looks like the rows are passed to some extent, but for some reason the path is not available as a variable.
Image with log details; as can be seen, the variable is displayed as ${path} instead of the value of the path:
http://i.imgur.com/pK1iHtl.png?1
The sample code is below as an archive with the jobs and transformation and also sample input files. Any help is appreciated, as I may be missing something simple here: https://www.hightail.com/download/bXBhL0dNcklCMTVsQXNUQw
The issue is that the second job (i.e. j_log_file_names.kjb) is unable to detect the parameter path. Try defining path as a parameter on this job.
This will make sure that the parameter coming from the previous step is correctly fetched into the job. The rest of your job looks absolutely fine.
Hope this helps :)