I have a GraphDB application whose "./logs" folder has grown very large.
I notice there are three types of log files: error, main, and query.
I assume we can delete them, but I couldn't find any mention of this in the documentation. Is it safe to remove them?
It's completely safe to delete the logs if you don't need them.
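If the folder keeps growing, one option is to prune old files periodically rather than wiping everything at once. This is a minimal Python sketch; `prune_old_logs` is a hypothetical helper, and it assumes the logs live under ./logs and that any file not modified for `max_age_days` is no longer being written to. It does not use any GraphDB API.

```python
import time
from pathlib import Path


def prune_old_logs(log_dir, max_age_days, now=None):
    """Delete files under log_dir whose modification time is older than max_age_days.

    Returns the list of removed paths.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    removed = []
    # Materialize the listing first so we are not deleting while iterating.
    for path in list(Path(log_dir).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


# Example: prune_old_logs("./logs", max_age_days=30)
```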
I have a workflow that produces tons of files; most of them are not the output of any rule (they are intermediate results). I'd like to have the option of deleting everything that is not the output of any rule after the workflow is complete. This would be useful for archiving.
Right now the only way I found to do that is to define all outputs of all rules as protected, and then run snakemake --delete-all-output. Two questions:
1. Is this the way to go, or is there a better solution?
2. Is there a way to automatically define all outputs as protected, or do I have to go through the entire code and wrap all outputs with protected()?
Thanks!
Maybe the option --list-untracked helps?
--list-untracked, --lu
List all files in the working directory that are not
used in the workflow. This can be used e.g. for
identifying leftover files. Hidden files and
directories are ignored.
In addition to @dariober's suggestion, here are a few ideas:
It sounds like you know this already, but you could wrap unneeded output in temp(), which will cause Snakemake to delete it automatically. You can combine this with --notemp for debugging. With temp(), deletion will happen progressively, not after the workflow is complete.
Another option may be to use the onsuccess hook defined by Snakemake. From the docs: "The onsuccess handler is executed if the workflow finished without error." So if, throughout the workflow, you put unneeded files in a temp/ folder or similar, you could call shutil.rmtree("temp") in onsuccess. That would delete all your unneeded files only after the workflow has finished successfully, as you require. (Note also the similar onerror handler, should you need it.)
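A minimal sketch of the cleanup step, assuming all scratch output goes under a single directory (called "temp" here, as in the answer). `remove_scratch_dir` is a hypothetical helper, not part of Snakemake; only the `onsuccess:` directive shown in the comment is Snakemake's.

```python
import shutil
from pathlib import Path


def remove_scratch_dir(path="temp"):
    """Recursively delete the scratch directory if it exists.

    Returns True if something was removed, False if the directory was absent.
    """
    scratch = Path(path)
    if not scratch.is_dir():
        return False
    shutil.rmtree(scratch)
    return True


# In the Snakefile this would be called from the success handler, e.g.:
#
# onsuccess:
#     remove_scratch_dir("temp")
```

Checking for the directory first keeps the handler from failing on a run that produced no scratch files.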
I have an internal Apache server for testing purposes; it is not client-facing.
I wanted to upgrade the server to Apache 2.4, but there is no space left, so I was trying to delete some files on the server.
After checking file sizes, I found that the folder /var/lib/elasticsearch takes 80 GB. For example, /var/lib/elasticsearch/elasticsearch/nodes/0/indices/logstash-2015.12.08 alone takes 60 GB. I'm not sure what Elasticsearch is. Is it safe to delete this logstash folder? Thanks!
Elasticsearch is a search engine, similar to a NoSQL database, and it stores its data in indices. What you are seeing is the data of one index.
Probably someone was using the index around 2015, when it was timestamped.
I would just delete it.
I'm afraid only you can answer that question. One use for Logstash + Elasticsearch is to help make sense of system logs. That combination isn't normally set up by default, so I presume someone set it up at some point for some reason, and it has obviously done some logging. Only you can know whether it is still being used, or whether it is safe to delete.
As other answers pointed out, Elasticsearch is a distributed search engine. I believe an earlier user was pushing application or system logs to this Elasticsearch instance using Logstash. If you can find the source application, check whether the log files are still there; if so, you can go ahead and delete your index. I highly doubt anyone still needs the logs from 2015, but it is really your call: check your application's archiving requirements and then take the necessary action.
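Rather than removing directories under /var/lib/elasticsearch by hand, an index can also be listed and deleted through Elasticsearch's REST API (`GET /_cat/indices?v` and `DELETE /<index>`). This Python sketch only builds the requests; it assumes the node listens on the default http://localhost:9200 with no authentication, and the helper names are hypothetical.

```python
import urllib.request


def list_indices_request(host="http://localhost:9200"):
    """Build a GET request for the human-readable index listing (sizes included)."""
    return urllib.request.Request(f"{host}/_cat/indices?v")


def delete_index_request(index, host="http://localhost:9200"):
    """Build (but do not send) a DELETE request for a single index."""
    return urllib.request.Request(f"{host}/{index}", method="DELETE")


# Sending the requests requires a running node, e.g.:
# with urllib.request.urlopen(list_indices_request()) as resp:
#     print(resp.read().decode())
# with urllib.request.urlopen(delete_index_request("logstash-2015.12.08")) as resp:
#     print(resp.status, resp.read().decode())
```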
I'm passing the flag -Denable-debug-rules, which, according to http://graphdb.ontotext.com/documentation/standard/rules-optimisations.html, should print something to the log at least every 5 minutes.
Unfortunately it doesn't, and I need to figure out why inferencing is taking so long.
Help?
The specific file is http://purl.obolibrary.org/obo/pr.owl, and I'm using the owl2-rl-optimized ruleset.
Version: graphdb-ee-6.3.1
An exchange with GraphDB tech support clarified that the built-in rulesets cannot be monitored. To monitor one, copy it into a new file and add that file as a ruleset, following http://graphdb.ontotext.com/documentation/enterprise/reasoning.html#operations-on-rulesets
I'm trying to do the following:
1. Read a file's attributes
2. If the attributes match a certain condition, delete the file
Right now I'm using NSFileManager to call attributesOfItemAtPath:error: followed by removeItemAtPath:error:. I'm worried something will happen between the two operations that invalidates the initial check.
What's the best way to make these two operations atomic?
Edit
The answers so far suggest file locking, which I have tried looking into. The closest thing I could find was setting the NSFileImmutable flag, but it seems like any other program could come along, unset it, and modify the file. Is there a better way to lock a file?
Edit 2
Someone asked for a use case. Let's say I'm trying to keep two folders in sync. Any changes made to the files in one folder are mirrored in the other, and vice versa. If I delete file 1 from folder A, I will also delete file 1 from folder B. But if file 1 in folder B changes right before I delete it, then instead of deleting it, I want to sync it back to folder A.
You can use mandatory (kernel enforced) file locking to lock the file in question to prevent changes being done to the file when you are operating on it. I know Linux and Solaris support mandatory file locking but I have no clue if OS X / HFS+ does and if so how to use it. Hope this helps.
So you have more than one attribute query, then? If so, why not just lock the file before starting the queries? Once done, unlock. Then, if the file should be deleted, delete it.
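A minimal POSIX sketch of the lock-then-check-then-delete idea, in Python rather than Cocoa. It uses `flock`, which is an advisory lock: it only protects against other processes that also take the lock, so it does not stop an arbitrary program from modifying the file, which is exactly the concern raised in the question's edit. Unlike the answer's order, this sketch keeps the lock held through the delete. `delete_if_matches` and the predicate are hypothetical names.

```python
import fcntl
import os


def delete_if_matches(path, predicate):
    """Delete path only if predicate(os.stat_result) is true, holding an
    exclusive advisory lock for the check and the delete.

    Returns True if the file was deleted.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # Blocks until no other cooperating process holds a lock on this file.
        fcntl.flock(fd, fcntl.LOCK_EX)
        st = os.fstat(fd)  # attributes of the file we actually opened and locked
        if not predicate(st):
            return False
        # Confirm the path still names the same file we locked; it could have
        # been replaced by a rename after os.open().
        try:
            current = os.stat(path)
        except FileNotFoundError:
            return False
        if (current.st_dev, current.st_ino) != (st.st_dev, st.st_ino):
            return False
        os.unlink(path)
        return True
    finally:
        os.close(fd)  # closing the descriptor also releases the flock
```

Usage example: `delete_if_matches("folderB/file1", lambda st: st.st_mtime <= last_synced)`, where `last_synced` is a hypothetical timestamp recorded at the last sync.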
There's a way to lock a file with Cocoa; I googled and worked through that problem a few days back, but I've already forgotten the specific message, sorry.
I suggest using this delegate method to accept or reject the deletion of the file:
fileManager:shouldRemoveItemAtPath:
The idea is to call the method that deletes the file, and in shouldRemoveItemAtPath: accept (return YES) or reject (return NO) the removal depending on the file's attribute values.
Hope this helps.
It seems to me that you should just go ahead and delete the files that matched. There's no point to locking unless you are worried some other app will change the file such that it can't be deleted. Think about it; you found a file that matches your delete criteria. You want to delete it. Does it really matter if it changes in the meantime?
Every once in a while, my client uploads a large data file that needs to be processed through CFML. The problem is that if I put the processing on a CF page, it runs into a timeout after 120 seconds. I was able to move the processing code to a CFC, where it doesn't seem to have the timeout issue. However, at some point during the processing, ColdFusion crashes and has to be restarted. Each of the file's 8,000+ lines requires a number of database queries (5 or more, a mix of updates and selects), as well as other logic I've written in CFML.
My question is: what would be the best way to go through this file? One caveat: I am not able to move the file to the database server and process it entirely within the DB. However, would it be more efficient to pass each line to a stored procedure that takes care of everything? It would still be a lot of calls to the database, but nothing compared to what I have now. Also, what would be the best way to give the user feedback on how much of the file has been processed?
Edit:
I'm running CF 6.1
I just did a similar thing and use CF often for data parsing.
1) Maintain a file upload table (parent table). For every file you upload, keep a record of the file and what status it is in (uploaded, processed, unprocessed).
2) Temp table (child table) to store all the rows of the data file. Import the entire data file into a temporary table; attempting to do it all in memory will inevitably lead to some errors. Each row in this table links to a file upload table entry above.
3) Maintain a processing status: for each row of the data file you bring in, set a "processed/unprocessed" flag. This way, if it breaks, you can start from where you left off. As you run through each line, set it to "processed".
4) Transaction - use cftransaction if possible to commit all of it at once, or at least one line at a time (with your 5 queries). That way if something goes boom, you don't have one row of data that is half computed/processed/updated/tested.
5) Once you're done processing, set the file name entry in the table in step 1 to be "processed"
By using the approach above, if something fails, you can set it to start where it left off, or at least have a clearer path of where to start investigating, or worst case clean up in your data. You will have a clear way of displaying to the user the status of the current upload processing, where it's at, and where it left off if there was an error.
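The steps above can be sketched with Python's built-in sqlite3 standing in for the real database; table and column names (`file_uploads`, `upload_rows`, `processed`) and the `handle_line` callback (which would run the 5+ per-line queries) are hypothetical. Each row's work and its "processed" flag are committed in one transaction, so a crash leaves no half-processed row, a rerun resumes from the first unprocessed row, and `progress()` gives the numbers for user feedback.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS file_uploads (
    id INTEGER PRIMARY KEY,
    filename TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'uploaded'
);
CREATE TABLE IF NOT EXISTS upload_rows (
    id INTEGER PRIMARY KEY,
    upload_id INTEGER NOT NULL REFERENCES file_uploads(id),
    line TEXT NOT NULL,
    processed INTEGER NOT NULL DEFAULT 0
);
"""


def import_file(conn, filename, lines):
    """Steps 1 and 2: record the upload and stage every line in the child table."""
    with conn:  # one transaction for the whole import
        cur = conn.execute("INSERT INTO file_uploads (filename) VALUES (?)", (filename,))
        upload_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO upload_rows (upload_id, line) VALUES (?, ?)",
            [(upload_id, line) for line in lines],
        )
    return upload_id


def process_upload(conn, upload_id, handle_line):
    """Steps 3 to 5: process unprocessed rows, one transaction per row."""
    pending = conn.execute(
        "SELECT id, line FROM upload_rows WHERE upload_id = ? AND processed = 0 ORDER BY id",
        (upload_id,),
    ).fetchall()
    for row_id, line in pending:
        with conn:  # commits on success, rolls back this row's work on an exception
            handle_line(conn, line)
            conn.execute("UPDATE upload_rows SET processed = 1 WHERE id = ?", (row_id,))
    with conn:
        conn.execute("UPDATE file_uploads SET status = 'processed' WHERE id = ?", (upload_id,))


def progress(conn, upload_id):
    """Return (rows processed, total rows) for displaying to the user."""
    done, total = conn.execute(
        "SELECT COALESCE(SUM(processed), 0), COUNT(*) FROM upload_rows WHERE upload_id = ?",
        (upload_id,),
    ).fetchone()
    return done, total
```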
If you have any questions, let me know.
Other thoughts:
You can increase timeouts, give the VM more memory, or move to 64-bit, but all of those will only increase your system's capacity so much. It's a good idea to do these per call and in conjunction with the above.
Java has some neat file processing libraries that are available as CFCs. If you run into a lot of speed issues, you can use one of those to read the file into a variable and then into the database.
If you are working with XML, do not use ColdFusion's XML parsing. It works well for smaller files but has fits when things get bigger. There are several CFCs out there (check RIAForge, etc.) that wrap some excellent Java libraries for parsing XML data. You can then create a cfquery manually with this data if need be.
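The underlying idea of those libraries is streaming: handle one record at a time instead of building the whole document tree in memory. As an illustration only, here is the same technique with Python's standard `xml.etree.ElementTree.iterparse` rather than the Java libraries the answer refers to; `iter_records` is a hypothetical helper, and it assumes each record is an element whose children are simple text fields.

```python
import xml.etree.ElementTree as ET


def iter_records(source, tag):
    """Yield one {child_tag: child_text} dict per <tag> element, streaming the input."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # At the "end" event the element and all its children are fully parsed.
            yield {child.tag: child.text for child in elem}
            elem.clear()  # drop the processed record's children to keep memory flat
```

Each yielded dict could then be inserted into the staging table one row at a time.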
It's hard to tell without more info, but from what you have said, here are three ideas.
First, with so many database operations, it's possible that you are generating too much debugging output. Make sure the following settings under Debug Output Settings in the Administrator are turned off:
Enable Robust Exception Information
Enable AJAX Debug Log Window
Request Debugging Output
Second, I would look at those DB queries and make sure they are optimized, e.g. that selects are using indexes.
Third, I suspect that keeping the whole file in memory is suboptimal.
I would try looping through the file using file looping:
<cfloop file="#VARIABLES.filePath#" index="VARIABLES.line">
<!--- Code to go here --->
</cfloop>
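For comparison, the same line-at-a-time pattern in Python, with a progress callback every N lines that could feed the user feedback the question asks about. This is a sketch, not a translation of cfloop's internals; `process_file_lines`, `handle_line`, and `report` are hypothetical names.

```python
def process_file_lines(path, handle_line, report_every=1000, report=print):
    """Call handle_line on each line of the file, reporting progress every report_every lines.

    Returns the number of lines processed.
    """
    processed = 0
    with open(path, encoding="utf-8") as f:
        for line in f:  # the file object yields one line at a time; the whole file is never loaded
            handle_line(line.rstrip("\n"))
            processed += 1
            if processed % report_every == 0:
                report(processed)
    return processed
```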
Have you tried an event gateway? I believe those threads are not subject to the same timeout settings as page request threads.
SQL Server Integration Services (SSIS) is Microsoft's tool for complex ETL (Extract, Transform, and Load) work, which is what this sounds like. (It can be configured to access files on other servers.) The question might be whether you can work up an interface between ColdFusion and SSIS.
If you can, upgrade to CF8 and take advantage of cfloop file="", which is faster and does not load the whole file into memory (loading it into memory is probably the cause of the crashing).
Depending on the situation you are encountering you could also use cfthread to speed up processing.
Currently, an event gateway is the only way to get around the timeout limits of an HTTP request cycle. CF does not have a way to process CF pages offline; that is, there is no command-line invocation (one of my biggest gripes about CF: very little offline processing).
Your best bet is to use an Event Gateway or rewrite your parsing logic in straight Java.
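The point of an event gateway here is that the request only hands off the work, and a separate thread does the long-running processing outside the request timeout. ColdFusion's gateway API is not reproduced below; this is a rough Python stand-in for the same idea, with a hypothetical `BackgroundProcessor` class. Failures are collected rather than killing the worker, so one bad file doesn't stop later jobs.

```python
import queue
import threading


class BackgroundProcessor:
    """Run jobs on a single background thread so the caller never waits for them."""

    def __init__(self, handle_job):
        self._handle_job = handle_job
        self._jobs = queue.Queue()
        self.errors = []  # (job, exception) pairs, kept so failures are not silently lost
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def submit(self, job):
        """Queue a job and return immediately (what the page request would do)."""
        self._jobs.put(job)

    def wait(self):
        """Block until every submitted job has been handled (useful in tests)."""
        self._jobs.join()

    def _run(self):
        while True:
            job = self._jobs.get()
            try:
                self._handle_job(job)
            except Exception as exc:
                self.errors.append((job, exc))
            finally:
                self._jobs.task_done()
```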
I had to do the same thing. Ben Nadel has written a bunch of great articles on using Java file I/O to read and write files more quickly.
They really helped improve the performance of our CSV importing application.