I am trying to read a file from an FTP folder. The route definition is simple and based on the FTP2 component:
Endpoint[sftp://server.com:22//path/to/file/?consumer.delay=900000&password=xxxxxx&username=user]
// Unmarshal the consumed file with JAXB inside the processor
JAXBContext jaxBContext = JAXBContext.newInstance(ObjectFactory.class);
Unmarshaller unmarshaller = jaxBContext.createUnmarshaller();
File authBatchFile = exchange.getIn().getBody(File.class);
AuthorizationFeed batchAuthFeed = (AuthorizationFeed) JAXBIntrospector
        .getValue(unmarshaller.unmarshal(authBatchFile));
The exchange has everything it should have:
Body [Body is file based: RemoteFile[fileName.txt]]
The header also shows the size of the file (CamelFileLength=81612). However, I get the following exception just after the exchange trace:
java.lang.IllegalArgumentException: The value for the "java.io.File" parameter cannot be null.
at com.ibm.xml.xlxp2.jaxb.unmarshal.AbstractUnmarshallerImpl.reportNullParameter(AbstractUnmarshallerImpl.java:180)
at com.ibm.xml.xlxp2.jaxb.unmarshal.AbstractUnmarshallerImpl.unmarshal(AbstractUnmarshallerImpl.java:72)
at com.wellpoint.clihub.hie.um.camel.processor.BatchCFFProcessor.process(BatchCFFProcessor.java:47)
I found the solution: adding
&localWorkDirectory=/tmp
to the route definition. With that option, the consumer downloads the remote file into the local work directory first, so the message body is a plain java.io.File rather than a remote file handle. I think Camel should incorporate that as a default feature when dealing with remote files. Per their documentation:
The route above is ultra efficient as it avoids reading the entire file content into memory. It will download the remote file directly to a local file stream. The java.io.File handle is then used as the Exchange body.
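For reference, here is a minimal sketch of the route with that option in place (the host, paths, credentials, and the BatchCFFProcessor wiring are placeholders taken from the question, not a verified configuration):

import org.apache.camel.builder.RouteBuilder;

public class AuthFeedRouteBuilder extends RouteBuilder {
    @Override
    public void configure() throws Exception {
        // localWorkDirectory tells the SFTP consumer to download the remote file
        // into /tmp first, so the exchange body is a java.io.File instead of a RemoteFile.
        from("sftp://server.com:22//path/to/file/"
                + "?username=user&password=xxxxxx"
                + "&consumer.delay=900000"
                + "&localWorkDirectory=/tmp")
            .process(new BatchCFFProcessor()); // the JAXB unmarshalling shown above happens here
    }
}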
I'm fairly new to working with NiFi. We're trying to validate an XML file, except we need to use a different XSD depending on some value passed in the file. Extracting and routing on the name wasn't an issue, and we stored the desired file path in an attribute (xsdFile).
However, when trying to use that attribute in the XML validation processor, it changes the path and gives an error. When I copy the path from the attribute directly into the Schema File property, it works, so the path itself isn't wrong.
Attribute passed in flowfile:
xsdFile:
C:\Users\MYNAME\Documents\NiFi\FLOW_RESOURCES\input\validatexml\camt.053.001.02_CvW_2.xsd
XMLValidation processor properties:
Schema File: ${xsdFile}
Error:
Failed to properly initialize Processor. If still scheduled to run, NiFi will attempt to initialize and run the Processor again after the 'Administrative Yield Duration' has elapsed. Failure is due to java.io.FileNotFoundException:
Schema file not found at specified location: C:\Users\MYNAME\DOCUME~1\NiFi\NIFI-1~1.0: java.io.FileNotFoundException:
Schema file not found at specified location: C:\Users\MYNAME\DOCUME~1\NiFi\NIFI-1~1.0
java.io.FileNotFoundException: Schema file not found at specified location: C:\Users\MYNAME\DOCUME~1\NiFi\NIFI-1~1.0
Why does this not work? Is there another way to do this, or do we need to route to different XMLValidators?
Check the documentation for this processor:
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.9.2/org.apache.nifi.processors.standard.ValidateXml/index.html
Schema File:
The path to the Schema file that is to be used for validation
Supports Expression Language: true
(will be evaluated using variable registry only)
So a flow file attribute can't be used for this property; the expression is evaluated against the variable registry only. You would either have to define the schema path in the variable registry or route to separate ValidateXml processors, each with a fixed Schema File.
I am new to S3 and need to use it for image storage. I found a half dozen versions of an S3 wrapper for CF, but it appears that the only one set up for v4 is the one modified by Leigh:
https://gist.github.com/Leigh-/26993ed79c956c9309a9dfe40f1fce29
I dropped it in the com directory and created a "test" page that contains the following code:
s3 = createObject('component','com.S3Wrapper').init(application.s3.AccessKeyId,application.s3.SecretAccessKey);
but got the following error:
So I changed line 37 from
variables.Sv4Util = createObject('component', 'Sv4').init(arguments.S3AccessKey, arguments.S3SecretAccessKey);
to
variables.Sv4Util = createObject('component', 'Sv4Util').init(arguments.S3AccessKey, arguments.S3SecretAccessKey);
Now I am getting:
I feel like going through Leigh's code and changing things is a bad idea, since I have lurked here for years and know Leigh's code is solid.
Does anyone know if there are examples of how to use this anywhere? If not, what am I doing wrong? If it makes a difference, I am using Lucee 5 and not Adobe's CF engine.
UPDATE:
I followed Leigh's directions and the error is now gone. I added some more code to my test page, which now looks like this:
<cfscript>
s3 = createObject('component','com.S3v4').init(application.s3.AccessKeyId,application.s3.SecretAccessKey);
bucket = "imgbkt.domain.com";
obj = "fake.ping";
region = "s3-us-west-1";
test = s3.getObject(bucket,obj,region);
writeDump(test);
test2 = s3.getObjectLink(bucket,obj,region);
writeDump(test2);
writeDump(s3);
</cfscript>
Regardless of what I put in for bucket, obj, or region I get:
Just in case, I did go to AWS and get new keys:
Leigh, if you are still around, or anyone who has used one of the S3 wrappers: any suggestions or guidance?
UPDATE #2:
Even after Alex's help I am not able to get this to work. The link I receive from getObjectLink is not valid, and getObject never downloads an object. I thought I would try the putObject method
test3 = s3.putObject(bucketName=bucket,regionName=region,keyName="favicon.ico");
writeDump(test3);
to see if there was any additional information, and I received this:
I did find this article https://shlomoswidler.com/2009/08/amazon-s3-gotcha-using-virtual-host.html but it is pretty old, and since S3 specifically suggests using dots in bucket names I don't think it is relevant any longer. There is obviously something I am doing wrong, but I have spent hours trying to resolve this and I can't seem to figure out what it might be.
I will give you a rundown of what the code does:
getObjectLink returns an HTTP URL for the file fake.ping, which is looked up in the bucket imgbkt.domain.com in the region s3-us-west-1. This link is temporary and expires after 60 seconds by default.
getObject invokes getObjectLink and immediately requests the URL using HTTP GET. The response is then saved to the directory of S3v4.cfc with the filename fake.ping by default. Finally, the function returns the full path of the downloaded file: E:\wwwDevRoot\taa\fake.ping
To save the file in a different location, you would invoke:
downloadPath = 'E:\';
test = s3.getObject(bucket,obj,region,downloadPath);
writeDump(test);
The HTTP request is synchronous, meaning the file will have been downloaded completely when the function returns the file path.
If you want to access the actual content of the file, you can do this:
test = s3.getObject(bucket,obj,region);
contentAsString = fileRead(test); // returns the file content as string
// or
contentAsBinary = fileReadBinary(test); // returns the content as binary (byte array)
writeDump(contentAsString);
writeDump(contentAsBinary);
(You might want to stream the content if the file is large, since fileRead/fileReadBinary read the whole file into a buffer. Use fileOpen to stream the content.)
Does that help you?
I am in the process of migrating to YARN and it seems the behavior of the DistributedCache changed.
Previously, I would add some files to the cache as follows:
for (String file : args) {
Path path = new Path(cache_root, file);
URI uri = new URI(path.toUri().toString());
DistributedCache.addCacheFile(uri, conf);
}
The path would typically look like
/some/path/to/my/file.txt
Which pre-exists on HDFS and would essentially end up in the DistributedCache as
/$DISTRO_CACHE/some/path/to/my/file.txt
I could symlink to it in my current working directory and use it with DistributedCache.getLocalCacheFiles().
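For context, with the old API the cached files were typically picked up inside the task roughly like this (a sketch; the setup(Context) placement and printing are my own illustration, not taken from the question):

// Old-style retrieval inside the mapper's setup(Context context)
// (getLocalCacheFiles throws IOException, so this sits in a method that declares it)
Path[] localFiles = DistributedCache.getLocalCacheFiles(context.getConfiguration());
for (Path localFile : localFiles) {
    // each entry points at the node-local copy under the distributed cache directory
    System.out.println("cached copy: " + localFile);
}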
With YARN, it seems this file instead ends up in the cache as:
/$DISTRO_CACHE/file.txt
i.e., the 'path' part of the file URI gets dropped and only the filename remains.
How does this work with different absolute paths ending up with the same filename? Consider the following case:
DistributedCache.addCacheFile(new URI("some/path/to/file.txt"), conf);
DistributedCache.addCacheFile(new URI("some/other/path/to/file.txt"), conf);
Arguably someone could use fragments:
DistributedCache.addCacheFile(new URI("some/path/to/file.txt#file1"), conf);
DistributedCache.addCacheFile(new URI("some/other/path/to/file.txt#file2"), conf);
But this seems unnecessarily hard to manage. Imagine the scenario where the paths are command-line arguments: you would somehow have to detect that the two filenames, although coming from different absolute paths, would clash in the DistributedCache, re-map them to unique fragments, and propagate those fragments to the rest of the program.
Is there an easier way to manage this?
Try adding the files via the Job API.
It's most likely a matter of how you're actually configuring the job and then accessing the files in the Mapper.
When you're setting up the job you're going to do something like
job.addCacheFile(new Path("cache/file1.txt").toUri());
job.addCacheFile(new Path("cache/file2.txt").toUri());
Then in your mapper code the URIs are stored in an array, which can be accessed like so:
URI file1Uri = context.getCacheFiles()[0];
URI file2Uri = context.getCacheFiles()[1];
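To deal with the filename clash raised in the question, the same API accepts URI fragments; the fragment becomes the link name in the task's working directory, so identical base filenames don't collide. A sketch (the paths and fragment names are just examples, and it assumes a surrounding method that declares the checked URISyntaxException):

// needs java.net.URI; the #alias fragment controls the local link name
job.addCacheFile(new URI("/some/path/to/file.txt#file1"));
job.addCacheFile(new URI("/some/other/path/to/file.txt#file2"));

// in the mapper, the links file1 and file2 appear in the task's working directory
// (java.io imports and IOException handling omitted in this fragment)
BufferedReader reader = new BufferedReader(new FileReader("file1"));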
Hope this helps.
I have a question about android-async-http. After reading the source code, I learned how to add a File or InputStream as a parameter to RequestParams. The RequestParams object is then passed to an AsyncHttpClient, which uses it to get/put/post... Just like this:
String url = ...;
File file = ...;
ResponseHandlerInterface respHandler = ...;
AsyncHttpClient client = new AsyncHttpClient();
RequestParams params = new RequestParams();
params.put("upload_file", file); // put(...) is the overload that accepts a File
client.get(url, params, respHandler);
As we all know, files of any type are essentially just bytes, so when delivered over the internet a file has to be converted into a byte stream. But I didn't find any code for this conversion. So I wonder how android-async-http accomplishes this, or did I miss something when reading the source code?
I think I found how android-async-http handles files/InputStreams. Uploading a file depends on calling put(...)/post(...), not get(...). If you search the overloaded put(...)/post(...) methods, you will find paramsToEntity(RequestParams, ResponseHandlerInterface), which returns an HttpEntity; HttpPost/HttpPut then call setEntity(HttpEntity) with it. Because the get(...) overloads don't support uploading files, you won't find any file-upload handling in them.
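To illustrate the conversion in general terms, here is a simplified sketch of what a file-backed HttpEntity effectively does when the request body is written; this is plain illustrative Java, not the library's actual code:

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public final class FileStreamingSketch {
    // Copy the file's raw bytes into the connection's output stream in chunks,
    // which is essentially what writeTo(OutputStream) amounts to for a file entity.
    static void writeFileTo(File file, OutputStream out) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read); // stream bytes; the whole file is never held in memory
            }
            out.flush();
        }
    }
}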
I'm writing a small nginx module that reads some URL pairs from a data file and stores its contents in an rb-tree in nginx's shared memory. In the handler's code, the tree contents are matched against the request URI, and a redirection is performed based on certain conditions.
This works fine at the moment. The module kicks in during the postconfiguration phase at the function ngx_http_mymodule_init, where it adds the shared memory and assigns the init function that will be called by nginx later.
ngx_http_mymodule_init(ngx_conf_t *cf) {
[...]
ngx_http_mymodule_shm_zone = ngx_shared_memory_add(...);
[...]
ngx_http_mymodule_shm_zone->init = ngx_http_mymodule_init_shm_zone;
return NGX_OK;
}
And that init_shm_zone function is the one that creates the tree, reads the data file, and initializes the tree contents.
But there should be one data file for each virtual server, so I need to read the path for the input data file from an nginx configuration directive, like this:
static ngx_command_t ngx_http_mymodule_commands[] = {
[...]
{ ngx_string("mymodule_input_filepath"),
NGX_HTTP_SRV_CONF|NGX_CONF_TAKE1,
ngx_conf_set_str_slot, // should be ngx_conf_set_path_slot i guess...
NGX_HTTP_SRV_CONF_OFFSET,
offsetof(ngx_http_mymodule_srv_conf_t, input_filepath),
NULL },
ngx_null_command
};
...and then use that file path to open the file and store its contents in shared memory. However, in the ngx_http_mymodule_init function I can't access the config object, so I can't read the file name.
So, my question is, in which handler or phase should I hook my initialization code, so that I can read the filename from the server config and use it to initialize my shared memory?
It has to be somewhere after the config file has been parsed, and after the shared memory has been created, but before any actual request processing.
I have tried hooking into the init_master and init_process handlers, but I can't seem to get at the config object from the cycle object that those handlers receive as a parameter:
mmcf = (ngx_my_module_conf_t *) ngx_get_conf(cycle->conf_ctx, ngx_my_module);
See here for example:
http://trac.nginx.org/nginx/browser/nginx/trunk/src/core/ngx_regex.c#L316