X-Tika-PDFocrStrategy is an invalid X-Tika-OCR header error - apache

I am trying to use TikaJAXRS and add headers to set PDFParser properties, specifically the ocrStrategy property. However, when I add the header X-Tika-PDFocrStrategy, I get an error stating that it is an invalid X-Tika-OCR header.
After looking into the source code, I believe the issue might be with the 'fillParseContext' method in the TikaResource.java file.
public static void fillParseContext(ParseContext parseContext, MultivaluedMap<String, String> httpHeaders,
        Parser embeddedParser) {
    TesseractOCRConfig ocrConfig = new TesseractOCRConfig();
    PDFParserConfig pdfParserConfig = new PDFParserConfig();
    for (String key : httpHeaders.keySet()) {
        if (StringUtils.startsWith(key, X_TIKA_OCR_HEADER_PREFIX)) {
            processHeaderConfig(httpHeaders, ocrConfig, key, X_TIKA_OCR_HEADER_PREFIX);
        } else if (StringUtils.startsWith(key, X_TIKA_PDF_HEADER_PREFIX)) {
            processHeaderConfig(httpHeaders, pdfParserConfig, key, X_TIKA_PDF_HEADER_PREFIX);
        }
    }
    parseContext.set(TesseractOCRConfig.class, ocrConfig);
    parseContext.set(PDFParserConfig.class, pdfParserConfig);
    if (embeddedParser != null) {
        parseContext.set(Parser.class, embeddedParser);
    }
}
The if statement first looks for a key that starts with the OCR header prefix. Since the PDFParser property name contains 'ocr', I suspect the code is trying to find a property named 'ocrStrategy' on the OCR config class (TesseractOCRConfig), where it doesn't exist.
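One way to check this hypothesis is to replay the branch order in isolation. The sketch below is not Tika code: the class name HeaderPrefixCheck is hypothetical, and the two prefix values are assumptions, since the constants are not shown in the excerpt. It uses String.startsWith, which is case-sensitive like StringUtils.startsWith for non-null input.

```java
public class HeaderPrefixCheck {
    // Assumed values: the real constants are not shown in the excerpt above.
    static final String X_TIKA_OCR_HEADER_PREFIX = "X-Tika-OCR";
    static final String X_TIKA_PDF_HEADER_PREFIX = "X-Tika-PDF";

    /** Mirrors the if / else-if order of fillParseContext and reports which branch a key takes. */
    public static String route(String key) {
        if (key.startsWith(X_TIKA_OCR_HEADER_PREFIX)) {
            return "ocr";
        } else if (key.startsWith(X_TIKA_PDF_HEADER_PREFIX)) {
            return "pdf";
        }
        return "ignored";
    }

    /** Strips the prefix, leaving the property name that remains for the config lookup. */
    public static String propertyName(String key, String prefix) {
        return key.startsWith(prefix) ? key.substring(prefix.length()) : key;
    }

    public static void main(String[] args) {
        String key = "X-Tika-PDFocrStrategy";
        System.out.println(route(key));
        System.out.println(propertyName(key, X_TIKA_PDF_HEADER_PREFIX));
    }
}
```

If the key reaches the PDF branch under these assumptions, the branch order is probably not the cause, and the property name left after stripping the prefix (including its capitalisation) would be the next thing to inspect.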
Can anyone verify this, or tell me how I could set that property via the restful service?
Side note: I am using Tika as a service in a Docker container and calling it from a Python app.

Related

Header values getting lost spring integration

I'm using a header enricher that uses existing headers to set the value of a new header. However, the existing header information is lost and only three headers remain: request-id, timestamp and raw-body.
public String vipul(Message<String> message) {
    MessageHeaders messageHeaders = message.getHeaders();
    if (messageHeaders.containsKey("x-death")) {
        List<HashMap<String, Object>> deathList = (List<HashMap<String, Object>>) messageHeaders
                .get("x-death");
        //logger.debug(message.get("messageId")+" "+deathList);
        if (deathList.size() > 0) {
            HashMap<String, Object> death = deathList.get(0);
            if (death.containsKey("original-expiration")) {
                return (String) death.get("original-expiration");
                //logger.info(messageHeaders.get("messageId")+" original-expiration = "+death.get("original-expiration"));
            }
        }
    } else {
        return null;
    }
    return "";
}
In this method the messageHeaders map has only three keys, not all the header keys that are normally there. I need to build a retry system using the original expiration.
My Spring Integration XML has the following snippet:
<int:header-enricher input-channel="fromPushAppointmentErrorHandler1"
output-channel="fromPushAppointmentErrorHandler">
<int:header name="original_expiration" method="vipul" ref="errorhelper"/>
</int:header-enricher>
First of all, it looks like you also need overwrite="true" on that <int:header name="original_expiration">: the logic in your vipul() produces a new value for an existing header, and that will not take effect if the value is already present in the headers.
The missing headers in your logic might be caused by some upstream <transformer> that returns a whole Message without copying the request headers.
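To tell a wiring problem (headers dropped upstream) apart from a lookup problem, it can help to exercise the extraction logic on its own, outside Spring Integration. The following is a minimal sketch that treats the headers as a plain Map; the class name XDeathLookup and the sample value "60000" are hypothetical. Unlike vipul(), it returns null in every "not found" case instead of mixing null and an empty string, which is a deliberate simplification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class XDeathLookup {
    /**
     * Returns the "original-expiration" value of the first x-death entry,
     * or null when the header, the first entry, or the key is missing.
     */
    public static String originalExpiration(Map<String, Object> headers) {
        Object raw = headers.get("x-death");
        if (!(raw instanceof List) || ((List<?>) raw).isEmpty()) {
            return null;
        }
        Object first = ((List<?>) raw).get(0);
        if (!(first instanceof Map)) {
            return null;
        }
        Object value = ((Map<?, ?>) first).get("original-expiration");
        return value == null ? null : value.toString();
    }

    public static void main(String[] args) {
        // Build headers shaped like the ones the question expects to receive.
        Map<String, Object> death = new HashMap<>();
        death.put("original-expiration", "60000");
        List<Object> deaths = new ArrayList<>();
        deaths.add(death);
        Map<String, Object> headers = new HashMap<>();
        headers.put("x-death", deaths);

        System.out.println(originalExpiration(headers));
        System.out.println(originalExpiration(new HashMap<String, Object>()));
    }
}
```

If this lookup behaves as expected with hand-built headers, the remaining suspect is whatever component upstream builds the message without the original headers.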

Extract org.restlet.http.headers value from Camel headers inside a .choice()

I'm trying to extract a value from the org.restlet.http.headers header collection in a Camel route.
My incoming POST has an HTTP header called IncomingRequestType: ABCD.
Camel moves this into the exchange headers collection, but it is buried inside org.restlet.http.headers, which is itself a collection of headers.
I can extract the value in a process using the code below:
.process(new Processor() {
    public void process(Exchange exchange) throws Exception {
        org.restlet.util.Series<Header> httpHeaders = null;
        httpHeaders = (Series<Header>) exchange.getIn().getHeader("org.restlet.http.headers");
        String reqType = httpHeaders.getValues("IncomingRequestType").toString();
    }
})
Outside of a processor, I need to access IncomingRequestType inside a .choice().when(),
e.g. I want to be able to do:
.choice()
.when(header("org.restlet.http.headers")["IncomingRequestType"]).isEqualTo("ABCD"))
Any suggestions on how this can be done? I've tried creating predicates but cannot find a suitable solution.
This can be done in the simple language:
.choice()
.when(simple("${in.header.org.restlet.http.headers[IncomingRequestType]} == 'ABCD'"))

Understanding seam filter url-pattern and possible conflicts

I made a custom editor plugin in a Seam 2.2.2 project that uploads files this way:
1) configure the editor to load my specific XHTML upload page;
2) call the following method inside this page and return a JavaScript callback:
public String sendImageToServer()
{
HttpServletRequest request = ServletContexts.instance().getRequest();
try
{
List<FileItem> items = new ServletFileUpload(new DiskFileItemFactory()).parseRequest(request);
processItems(items);//set the file data to specific att
saveOpenAttachment();//save the file to disk
}
//build callback
For this to work I have to put this inside components.xml:
<web:multipart-filter create-temp-files="false"
max-request-size="1024000" url-pattern="*"/>
The create-temp-files attribute does not seem to matter, whatever its value.
But url-pattern has to be "*" or "/myUploadPage.seam"; any other value makes the item list come back empty. Does anyone know why?
This turns into a problem because when I use a url-pattern that works for this case, every form with enctype="multipart/form-data" in my application stops submitting data, so other parts of the system end up crashing.
Could someone help me?
To solve my problem, I changed the solution to handle the request the way the Seam multipart filter does:
ServletRequest request = (ServletRequest) FacesContext.getCurrentInstance().getExternalContext().getRequest();
try
{
if (!(request instanceof MultipartRequest))
{
request = unwrapMultipartRequest(request);
}
if (request instanceof MultipartRequest)
{
MultipartRequest multipartRequest = (MultipartRequest) request;
String clientId = "upload";
setFileData(multipartRequest.getFileBytes(clientId));
setFileContentType(multipartRequest.getFileContentType(clientId));
setFileName(multipartRequest.getFileName(clientId));
saveOpenAttachment();
}
}
Now I handle the request the way Seam does, and I no longer need the web:multipart-filter config that was breaking other types of request.

Call OutgoingHeaders using NServiceBus.Host

Using NServiceBus 4.0.11
I would like to call
Bus.OutgoingHeaders["user"] = "john";
The Header Manipulation sample shows how to call it with a custom host.
I would like to call it while using the NServiceBus.Host.
So actually I would like to have a reference to the instance of the Bus, to call OutgoingHeaders on.
I tried IWantCustomInitialization, but that gives me an exception when calling CreateBus in it. INeedInitialization isn't the way to go either.
How should I call Bus.OutgoingHeaders["user"] = "john"; while using the NServiceBus.Host?
Reading your question makes me think that you want to add this header to a certain message that you want to send during initialization/startup or when handling a message. Usually, headers have a more generic behavior as they need to be applied to more than one message.
Instead of setting the header before sending the message you can also add the header via a message mutator or behavior.
Behavior
public class OutgoingBehavior : IBehavior<SendPhysicalMessageContext>
{
    public void Invoke(SendPhysicalMessageContext context, Action next)
    {
        Dictionary<string, string> headers = context.MessageToSend.Headers;
        headers["MyCustomHeader"] = "My custom value";
        next();
    }
}
Mutator
public class MutateOutgoingTransportMessages : IMutateOutgoingTransportMessages
{
    public void MutateOutgoing(object[] messages, TransportMessage transportMessage)
    {
        Dictionary<string, string> headers = transportMessage.Headers;
        headers["MyCustomHeader"] = "My custom value";
    }
}
Documentation
See: http://docs.particular.net/nservicebus/messaging/message-headers#replying-to-a-saga-writing-outgoing-headers for samples.

RazorEngine Error trying to send email

I have an MVC 4 application that sends out multiple emails. For example, I have an email template for submitting an order, a template for cancelling an order, etc...
I have an Email Service with multiple methods. My controller calls the Send method which looks like this:
public virtual void Send(List<string> recipients, string subject, string template, object data)
{
...
string html = GetContent(template, data);
...
}
The Send method calls GetContent, which is the method causing the problem:
private string GetContent(string template, object data)
{
string path = Path.Combine(BaseTemplatePath, string.Format("{0}{1}", template, ".html.cshtml"));
string content = File.ReadAllText(path);
return Engine.Razor.RunCompile(content, "htmlTemplate", null, data);
}
I am receiving the error:
The same key was already used for another template!
In my GetContent method should I add a new parameter for the TemplateKey and use that variable instead of always using htmlTemplate? Then the new order email template could have newOrderKey and CancelOrderKey for the email template being used to cancel an order?
Explanation
This happens because you use the same template key ("htmlTemplate") for multiple different templates.
Note that with the current implementation of GetContent you will run into several problems:
Even if you use a unique key, for example the template variable, you will trigger the exception when the templates are edited on disk.
Performance: You are reading the template file every time even when the template is already cached.
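The key collision can be pictured with a toy cache. This is a language-neutral illustration written in Java, not RazorEngine's actual implementation: the class TemplateKeyCache and its register method are hypothetical, and the assumption is only that a cache keyed by name rejects a different source registered under a key it already knows. That single rule would explain both the error with a shared key and the error after a template file is edited on disk.

```java
import java.util.HashMap;
import java.util.Map;

public class TemplateKeyCache {
    // Maps each template key to the source text first registered under it.
    private final Map<String, String> sourcesByKey = new HashMap<>();

    /** Registers a source under a key; rejects a different source for a key already in use. */
    public void register(String key, String source) {
        String existing = sourcesByKey.putIfAbsent(key, source);
        if (existing != null && !existing.equals(source)) {
            throw new IllegalStateException("The same key was already used for another template!");
        }
    }

    public static void main(String[] args) {
        TemplateKeyCache cache = new TemplateKeyCache();
        cache.register("htmlTemplate", "<p>New order</p>");
        cache.register("htmlTemplate", "<p>New order</p>"); // same source: accepted
        try {
            // Different source under the same key: rejected.
            cache.register("htmlTemplate", "<p>Order cancelled</p>");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```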
Solution:
Implement the ITemplateManager interface to manage your templates:
public class MyTemplateManager : ITemplateManager
{
    private readonly string baseTemplatePath;
    public MyTemplateManager(string basePath) {
        baseTemplatePath = basePath;
    }
    public ITemplateSource Resolve(ITemplateKey key)
    {
        string template = key.Name;
        string path = Path.Combine(baseTemplatePath, string.Format("{0}{1}", template, ".html.cshtml"));
        string content = File.ReadAllText(path);
        return new LoadedTemplateSource(content, path);
    }
    public ITemplateKey GetKey(string name, ResolveType resolveType, ITemplateKey context)
    {
        return new NameOnlyTemplateKey(name, resolveType, context);
    }
    public void AddDynamic(ITemplateKey key, ITemplateSource source)
    {
        throw new NotImplementedException("dynamic templates are not supported!");
    }
}
Setup on startup:
var config = new TemplateServiceConfiguration();
config.Debug = true;
config.TemplateManager = new MyTemplateManager(BaseTemplatePath);
Engine.Razor = RazorEngineService.Create(config);
And use it:
// You don't really need this method anymore.
private string GetContent(string template, object data)
{
return Engine.Razor.RunCompile(template, null, data);
}
RazorEngine will now fix all the problems mentioned above internally. Notice how it is perfectly fine to use the name of the template as key, if in your scenario the name is all you need to identify a template (otherwise you cannot use NameOnlyTemplateKey and need to provide your own implementation).
Hope this helps.
(Disclaimer: Contributor of RazorEngine)