I've written a Kettle job that moves files from Pentaho 5.3 (SP201505) JCR folders to Windows file-system folders (on the same server; Server 2008 R2 Enterprise). The "move" part of the job uses the Copy Files step with the Remove source files option selected.
Initially the job runs as expected, moving all files from the source JCR folders to the destination file-system folders.
Before the job runs again, Pentaho users place new files into the source JCR folders. However, when I next run the job, it no longer sees any files in the source JCR folders, even though I can browse them from within the PUC.
I'm running the job from within Spoon (while coding and testing). It is using the VFS protocol jcr-solution to access files within JCR folders.
Does this job need to do some kind of repository refresh each time it runs in order to see changes to the JCR folders, and if so, how would this be done within the job?
Apparently multiple instances of the JCR file system are not dynamically coherent.
I reverse-engineered the Pentaho Repository Synchronizer plugin and figured out how to refresh my local instance of the JCR. The refresh can be accomplished with the following code in a User Defined Java Class step in a PDI transformation. The code expects the file-system root URI to be in an input field named RootURI:
import org.apache.commons.vfs.FileObject;
import org.pentaho.di.core.vfs.KettleVFS;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
  try
  {
    // Get a row from the input hop.
    Object[] r = getRow();

    // Are we done?
    if (r == null)
    {
      // Yes.
      setOutputDone();
      return false;
    }

    // No, pick up the file system root URI from a field named RootURI.
    // RootURI example: "jcr-solution:http://admin:password#localhost:8080/pentaho!/"
    String fileName = get(Fields.In, "RootURI").getString(r);

    // Get the file system object and close it, so the next access re-creates
    // the file system and sees the current contents of the JCR.
    FileObject jcrObject = KettleVFS.getFileObject(fileName);
    if ((jcrObject != null) && (jcrObject.exists()))
    {
      KettleVFS.getInstance().getFileSystemManager().closeFileSystem(jcrObject.getFileSystem());
      KettleVFS.getInstance().getFileSystemManager().getFilesCache().close();
      //System.out.println("*** JCR Refreshed ***");
    }
    return true;
  }
  catch (Exception e)
  {
    throw new KettleException(e);
  }
}
The above solution seems to have solved my problem.
I am cloning a large public triplestore for local development of a client app.
The data is too large to fit on the SSD partition where <graphdb.home>/data lives. How can I create a new repository at a different location to host this data?
On startup, GraphDB reads the value of the graphdb.home.data parameter. By default it points to ${graphdb.home}/data. You have two options:
Move all repositories to the big non-SSD partition
You need to start GraphDB with ./graphdb -Dgraphdb.home.data=/mnt/big-drive/, or edit the value of the graphdb.home.data parameter in ${graphdb.home}/conf/graphdb.properties.
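For example, the entry in ${graphdb.home}/conf/graphdb.properties could look like this (the path is just a placeholder for your big partition):

# ${graphdb.home}/conf/graphdb.properties
graphdb.home.data = /mnt/big-drive/graphdb-data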
Move a single repository to a different location
GraphDB does not allow creating a repository if its directory already exists. The easiest way to work around this is to create a new empty repository bigRepo, initialize it by making at least one request to it, and then shut down GraphDB. Then move the directory ${graphdb.home}/data/repositories/bigRepo/storage/ to your new big drive and create a symbolic link from the original location to the new one.
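A sketch of those commands, assuming the repository is called bigRepo and the big partition is mounted at /mnt/big-drive (adjust the paths to your installation):

# with GraphDB shut down
mv ${graphdb.home}/data/repositories/bigRepo/storage /mnt/big-drive/bigRepo-storage
ln -s /mnt/big-drive/bigRepo-storage ${graphdb.home}/data/repositories/bigRepo/storage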
You can apply the same technique also for moving only individual files.
Please make sure that all permissions are correctly set by using the same user to start GraphDB.
I am in the process of migrating Pentaho from a database repository to a file repository.
I exported the database repository into an XML file, then created a file repository and imported the repository...
The first issue I saw after importing is that all my database connections are now stored in the .ktr and .kjb files. This is going to be a big problem if I update a connection string, such as changing a password: I have more than a hundred sub-transformations and jobs, so do I have to update it in all of those files?
Is there any way to ignore the password and other connection settings stored in the .ktr and .kjb files and instead use the repository connection, or specify them in kettle.properties?
The other issue I face is that when I try to run the master job via Kitchen from the command line, it does not recognize the sub-transformations and jobs. However, when I change the transformation root to ${Internal.Entry.Current.Directory}, the sub-transformations are recognized and processed. As I mentioned, I have more than 100 sub-transformations and jobs; is there any way to update this root for all jobs and transformations at once?
Kitchen.bat /file:"C:\pentaho-8-1\Dev_Repo\home\jobs\MainProcess\MasterJob.kjb" /level:Basic /logfile:"C:\pentaho-8-1\logs\my-job.txt"
This fails with an error saying the .ktr is not a file or the repository is not defined.
However, when I change the root directory to ${Internal.Entry.Current.Directory}, it works!
For the database connections, you can create .kdb connection files in the repository, enter variables for all the properties (host, port, schema, user, etc.) and define those variables in kettle.properties or another properties file.
This works like a more convenient version of JNDI files, with one properties file per environment. You can easily inspect the current values by opening kettle.properties from within the Spoon client (don't edit it there or it will mess up the layout!), and you can also put Kettle-"encrypted" passwords in the properties file.
PDI will still save copies of the connections into all the .kjb and .ktr files (and should in theory update them from the .kdb or shared.xml when opening them), but since the contents are just generic variable names (${STAGING_DB_HOST}, etc.) you will almost never run into problems with this.
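For illustration, the kettle.properties entries for such a connection might look like the following; the variable names are examples that would match whatever ${...} placeholders you put in the connection definition, and the password line holds the string produced by Kettle's Encr tool (encr.bat -kettle <password>):

STAGING_DB_HOST=staging-db.example.com
STAGING_DB_PORT=5432
STAGING_DB_SCHEMA=staging
STAGING_DB_USER=etl_user
STAGING_DB_PASSWORD=Encrypted <paste the output of encr.bat -kettle yourpassword here>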
For the transformation filenames, a good text search-and-replace tool should fix most of your transformations in one go. Include some of the surrounding XML tag in the search string to avoid replacing too much.
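If you prefer to script it, a throwaway sketch along these lines would do the same job; the search and replacement strings are placeholders and must match whatever your exported .kjb/.ktr files actually contain, so check one file by hand first:

using System;
using System.IO;

class ReplaceTransformationRoot
{
    static void Main(string[] args)
    {
        // Usage: ReplaceTransformationRoot <repoDir> <searchText> <replaceText>
        // e.g. replace the old hard-coded root inside the <filename> tags with
        // ${Internal.Entry.Current.Directory}.
        string repoDir = args[0];
        string searchText = args[1];
        string replaceText = args[2];

        foreach (string pattern in new[] { "*.kjb", "*.ktr" })
        {
            foreach (string file in Directory.EnumerateFiles(repoDir, pattern, SearchOption.AllDirectories))
            {
                string xml = File.ReadAllText(file);
                string updated = xml.Replace(searchText, replaceText);
                if (updated != xml)
                {
                    File.WriteAllText(file, updated);
                    Console.WriteLine("Updated " + file);
                }
            }
        }
    }
}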
In my XHTML I use the p:fileUpload tag.
In my session-scoped bean I keep a reference to the UploadedFile.
Basically I am doing something like this:
private UploadedFile uploadedFile;
public void fileUpload(FileUploadEvent event) throws IOException {
    uploadedFile = event.getFile();
}
With a separate button, an action in my session bean is invoked which copies the uploaded file to another location. Basically I am doing this:
File fileTo = new File("/xyz/abc.def");
Files.copy(uploadedFile.getInputstream(), fileTo.toPath(), StandardCopyOption.REPLACE_EXISTING);
Now the situation is as follows:
With every upload invocation, a temporary file is created in my temp directory. This is ok.
With every upload invocation, the variable uploadedFile is overwritten by event.getFile(). Therefore a previous instance of UploadedFile is garbage collected some time later. PrimeFaces uses Apache Commons FileUpload under the covers, which deletes the temporary file when the UploadedFile is garbage collected. I can see that this works, so this is okay too. So the scenario is to upload one file after another: in the temp directory I can see that each upload creates a new temp file and that after a while (a minute or so) they are magically removed, i.e. they have been deleted during garbage collection. This works perfectly.
The moment I invoke my action that copies a previously uploaded file, that specific temp file is no longer deleted, even when I later upload other files and the variable uploadedFile holds references to other instances. I have tried several other techniques to copy the file, such as Apache's IOUtils or a completely hand-made while loop reading and writing the bytes through a buffer. No matter what I try, it seems that as soon as I start to read bytes from the input stream obtained from UploadedFile, the temporary file is never deleted any more.
Any ideas what could be wrong?
I see this behavior with PrimeFaces version 3.5 as well as with version 5.2.
Does anybody know why the error below occurs?
CrystalDecisions.CrystalReports.Engine.LoadSaveReportException: Load report failed
From your comments about windows\Temp, that is caused by the application pool's identity not having access to c:\windows\Temp (and possibly to the reports folder).
You can solve this problem by giving the application pool credentials that do have the necessary permissions, or giving read write permissions to "Network User" to the c:\windows\temp folder (and again, possibly to the reports folder).
The reason why this folder is required is that the crystal runtime creates a dynamic copy of the report at runtime and places it in the %temp% folder. It is the temp folder copy (with a GUID appended to the original file name) that is shown in the web browser. This is by design and is a useful feature to ensure the live report is safe.
Following from this, you will have to do a proper cleanup after loading every report because they just stay there and fill up the temp folder!
Something like:
CrystalReportViewer1.Dispose(); // if using the viewer
CrystalReportViewer1 = null;
report.Close(); // I can't remember if this is part of the reportDocument class
report.Dispose();
report = null;
GC.Collect(); // crazy but true. Monitor the temp folder to see the effect
Reckface's answer is clear enough, but just to add something:
I managed to get it working using this:
protected void Page_Unload(object sender, EventArgs e)
{
    if (reportDocument != null)
    {
        reportDocument.Close();
        reportDocument.Dispose();
        crystalReportViewer1.Dispose();
    }
}
Doing so may cause issues with the buttons on the viewer toolbar: they can't find the document path any more because the document has been disposed. In that case the document needs to be loaded from its path again during postbacks: source
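A minimal sketch of that idea, assuming the report path was stored in the session when the report was first loaded (the session key here is just a placeholder, and the fields are the same ones used above):

protected void Page_Load(object sender, EventArgs e)
{
    if (IsPostBack)
    {
        // The viewer toolbar (print, export, paging) posts back, and the document
        // disposed in Page_Unload is gone, so reload it from the stored path.
        reportDocument = new ReportDocument();
        reportDocument.Load((string)Session["reportPath"]);
        crystalReportViewer1.ReportSource = reportDocument;
    }
}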
This is a very old question, but I want to add that I got this error because the report was not embedded into the class library.
Solution: I removed the report from the project, restarted Visual Studio 2022 and then added the Crystal Report again. This time it was added as an embedded resource.
Did you even bother to Google it? This is a common exception; there are hundreds of posts about it scattered around the intertubes.
The Crystal .NET runtime has famously cryptic error messages. This one just means that the .rpt file (or embedded report) could not be loaded. There are several possible root causes: wrong filename or path, security violation, you are not disposing of old reports properly and windows/temp is getting hogged up, etc.
Do some research. If you're still stuck, come back and elaborate on the problem (do any of your reports work, is this a web app?, what code are you using, etc.)
I want to run a process that completely destroys and then rebuilds my Lucene.NET search index from scratch.
I'm stuck on the destroying part.
I've called:
IndexWriter.Commit();
IndexWriter.Close();
Analyzer.Close();
foreach (var name in Directory.ListAll())
{
    Directory.ClearLock(name);
    Directory.DeleteFile(name);
}
Directory.Close();
but the process is failing because there is still a file handle on the file '_0.cfs'.
Any ideas?
Are you hosted in IIS? Try an iisreset (sometimes IIS is holding onto the files themselves).
Just call IndexWriter.DeleteAll() followed by IndexWriter.Commit(); it removes the index content and lets you start off with an empty index, while already-open readers can still read the old data until they are closed. The old files are automatically removed once they are no longer used.
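In code, using the writer the question already has open, that looks roughly like this (a sketch; the exact close/dispose call depends on your Lucene.NET version):

// Reuse the already-open writer instead of deleting the index files by hand.
IndexWriter.DeleteAll();   // marks all existing documents/segments as deleted
IndexWriter.Commit();      // the index is now empty for newly opened readers

// ... add the rebuilt documents with IndexWriter.AddDocument(doc) ...

IndexWriter.Commit();
IndexWriter.Close();       // stale files such as _0.cfs are removed once no
                           // IndexReader still holds them open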