How to parse XML to Stanza or XmlFragment in a Vysper environment - apache-vysper

To make a nice, readable test case, I'd like to parse some handwritten XML (copied from xmpp.org), transform it into a Stanza or XMLElement, and proceed with the actual tests. In other words, I'd like to avoid stanza builders altogether.
Is this possible with the non-blocking XML parser?

To obtain an XMLElement, use DefaultNonBlockingXMLReader and register a stanza listener on its content handler.
The trick is to open a "stream" first, so the XML of the stanza under test has to be wrapped in a stream element, as the code does with "<stream>" + xml + "</stream>".
The code:
private Stanza fetchStanza(String xml) throws SAXException {
    try {
        NonBlockingXMLReader reader = new DefaultNonBlockingXMLReader();
        reader.setContentHandler(new XMPPContentHandler(new XMLElementBuilderFactory()));
        XMPPContentHandler contentHandler = (XMPPContentHandler) reader.getContentHandler();
        final ArrayList<Stanza> container = new ArrayList<Stanza>(); // holds the most recently parsed stanza
        contentHandler.setListener(new XMPPContentHandler.StanzaListener() {
            public void stanza(XMLElement element) {
                Stanza stanza = StanzaBuilder.createClone(element, true, Collections.EMPTY_LIST).build();
                container.clear(); // we need only the last element
                container.add(stanza);
            }
        });
        // the trick is to wrap the XML in a stream element;
        // encode as UTF-8 explicitly so the bytes match the decoder below
        IoBuffer in = IoBuffer.wrap(("<stream>" + xml + "</stream>").getBytes("UTF-8"));
        reader.parse(in, CharsetUtil.UTF8_DECODER);
        return container.iterator().next();
    } catch (IOException ex) {
        throw new RuntimeException(ex);
    }
}
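For comparison, the same wrapping trick can be sketched with the JDK's own DOM parser instead of Vysper's non-blocking reader. This is only an illustration of the idea (a synthetic root element around a hand-written fragment, then picking the last child element); the class and method names are hypothetical, and the result is an org.w3c.dom.Element, not a Vysper Stanza.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class WrappedFragmentParser {

    /** Wraps the fragment in a synthetic root and returns the last child element found. */
    public static Element parseLastElement(String fragment) throws Exception {
        // A bare stanza is not a complete document, so give it a root element first.
        String wrapped = "<stream>" + fragment + "</stream>";

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(wrapped.getBytes(StandardCharsets.UTF_8)));

        // Keep only the last element, mirroring the single-slot container in the listener.
        Element last = null;
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                last = (Element) n;
            }
        }
        if (last == null) {
            throw new IllegalArgumentException("fragment contains no element");
        }
        return last;
    }

    public static void main(String[] args) throws Exception {
        Element iq = parseLastElement(
                "<iq type='get' id='v1'><query xmlns='jabber:iq:version'/></iq>");
        System.out.println(iq.getTagName() + " type=" + iq.getAttribute("type"));
    }
}
```

A test can then paste XML from a specification verbatim and assert on the returned element.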

Text Extraction, Not Image Extraction

Please help me understand if my solution is correct.
I'm trying to extract text from a PDF file with a LocationTextExtractionStrategy parser. I'm getting exceptions because the ParseContentMethod tries to parse inline images? The code is simple and looks similar to this:
RenderFilter[] filter = { new RegionTextRenderFilter(cropBox) };
ITextExtractionStrategy strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter);
PdfTextExtractor.GetTextFromPage(pdfReader, pageNumber, strategy);
I realize the images are in the content stream, but I have a PDF file that fails text extraction because of inline images. It throws an UnsupportedPdfException of "The filter /DCTDECODE is not supported" and then finally fails with an InlineImageParseException of "Could not find image data or EI", when all I really care about is the text. The BI/EI exists in my file, so I assume this failure is caused by the /DCTDECODE exception. But again, I don't care about images; I'm looking for text.
My current solution for this is to add a filterHandler in the InlineImageUtils class that assigns the Filter_DoNothing() filter to the DCTDECODE filterHandler dictionary. This way I don't get exceptions when I have InlineImages with DCTDECODE. Like this:
private static bool InlineImageStreamBytesAreComplete(byte[] samples, PdfDictionary imageDictionary) {
    try {
        IDictionary<PdfName, FilterHandlers.IFilterHandler> handlers = new Dictionary<PdfName, FilterHandlers.IFilterHandler>(FilterHandlers.GetDefaultFilterHandlers());
        handlers[PdfName.DCTDECODE] = new Filter_DoNothing();
        PdfReader.DecodeBytes(samples, imageDictionary, handlers);
        return true;
    } catch (IOException e) {
        return false;
    }
}
public class Filter_DoNothing : FilterHandlers.IFilterHandler
{
    public byte[] Decode(byte[] b, PdfName filterName, PdfObject decodeParams, PdfDictionary streamDictionary)
    {
        return b;
    }
}
My problem with this "fix" is that I had to change the iTextSharp library. I'd rather not do that so I can try to stay compatible with future versions.
Here's the PDF in question:
https://app.box.com/s/7eaewzu4mnby9ogpl2frzjswgqxn9rz5

How to skip value in config if it doesn't exist?

I am writing an app that needs to read its config at startup. Some of the keys are optional.
class ParseConfig
{
    string optionalkey;
    //...
    this()
    {
        this.optionalkey = config.getKey("key1");
    }
    //...
}
The problem is that I need a way to skip a key (not try to find and parse it) if it does not exist in the config. Right now the app tries to read it and shows me an error.
The only way I have found is to wrap everything in a try-catch block and, if the value can't be found in the config, set it to null in the catch block.
What is the best way to do it?
I am using dini for config.
Update (added example):
import std.stdio;
import std.path;
import std.file;
import dini;
void main()
{
    string confpath = buildPath(getcwd, "config.ini");
    if (!exists(confpath)) throw new Exception("ERROR: config.ini does not exist");
    auto config = Ini.Parse(confpath);
    try
    {
        string optionalkey;
        if(config.getKey("optionalkey"))
        {
            optionalkey = config.getKey("optionalkey");
        }
        writeln(optionalkey); // nothing is shown, because an exception is thrown
    }
    catch( Exception e)
    {
        writeln("Exception! :(");
        writeln(e.msg);
    }
}
Catching the exception is one way, but it is not ideal (especially if there are many optional config keys). A better way is to test whether the key exists:
class ParseConfig
{
    string optionalkey;
    //...
    this()
    {
        this.optionalkey = config.hasKey("key1") ? config.getKey("key1") : "defaultValue";
    }
    //...
}
Ideally, dini would have an overload of the getKey method so that you could write something like this:
this.optionalkey = config.getKey("key1", "defaultValue");
From the sources I can see that it does not have one, but I plan to add it and make a PR.
UPDATE
PR: https://github.com/robik/DIni/pull/3
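The two patterns discussed here (an explicit existence check, and a getter that takes a default) can be seen side by side in a small sketch. It uses Java's java.util.Properties rather than dini, purely as an illustration of the idea; the class name, helper name and key names are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class OptionalConfigKey {

    /** Parses key=value text into a Properties object (stand-in for reading a config file). */
    public static Properties load(String text) throws IOException {
        Properties config = new Properties();
        config.load(new StringReader(text));
        return config;
    }

    public static void main(String[] args) throws IOException {
        Properties config = load("requiredkey=abc\n");

        // Pattern 1: test for the key first, fall back to a default otherwise.
        String viaCheck = config.containsKey("optionalkey")
                ? config.getProperty("optionalkey")
                : "defaultValue";

        // Pattern 2: a getter overload that takes the default directly.
        String viaDefault = config.getProperty("optionalkey", "defaultValue");

        System.out.println(viaCheck + " / " + viaDefault);
    }
}
```

Neither pattern needs a try-catch, which keeps the constructor readable when there are many optional keys.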
Wrote a pretty advanced ini file wrapper today which supports sections, comments, thread-safety, default values for reading, writing/reading using template values, entry checks etc.
You can get it here:
https://github.com/BaussProjects/baussini
Here is an example usage (example.d from the repo)
module main;
import baussini;
import std.stdio : writefln, readln;
void main() {
    string fileName = "test.ini";
    // Thread-safe instance, for a non thread-safe instance replace "true" with "false"
    auto ini = new IniFile!(true)(fileName);
    // Use open() for reading and close() for write. Both can be combined ...
    if (!ini.exists()) {
        ini.addSection("Root");
        // Write way 1
        ini.write!string("Root", "StringValue1", "Hello World!");
        // Write way 2
        ini.getSection("Root").write!int("IntValue1", 9001);
        // Write way 3
        ini.getSection("Root")
            .write!string("StringValue2", "Hello Universe!")
            .write!int("IntValue2", 1000000);
        ini.close();
    }
    else {
        ini.open();
        // Read way 1
        string stringValue1 = ini.read!string("Root", "StringValue1");
        // Read way 2
        int intValue1 = ini.getSection("Root").read!int("IntValue1");
        // Read way 3
        string stringValue2;
        int intValue2;
        ini.getSection("Root")
            .read!string("StringValue2", stringValue2)
            .read!int("IntValue2", intValue2);
        writefln("%s is %s", "stringValue1", stringValue1);
        writefln("%s is %s", "intValue1", intValue1);
        writefln("%s is %s", "stringValue2", stringValue2);
        writefln("%s is %s", "intValue2", intValue2);
        readln();
    }
}
In your case you could either use IniFile.hasKey or IniSection().hasKey()
Example:
// Check way 1
if (ini.hasKey("Root", "StringValue1")) {
    // The section "Root" has an entry named "StringValue1"
}
// Check way 2
auto section = ini.getSection("Root");
if (section.hasKey("StringValue1")) {
    // The section "Root" has an entry named "StringValue1"
}
You could also use default values.
string stringValue1 = ini.getSection("Root").read!string("StringValue1", "Default");
// stringValue1 will be "Default" if it doesn't exist within "Root"
The default value has to be passed as a string, but it is always converted to T.
Example:
int defaultValue = ini.getSection("Root").read!int("IntValue3", "1000");
// defaultValue will be 1000 if it doesn't exist within "Root"
You can test whether a key is present with hasKey:
class ParseConfig
{
    string optionalkey;
    //...
    this()
    {
        if (config.hasKey("key1"))
            this.optionalkey = config.getKey("key1");
    }
    //...
}
(assuming that we are talking about the same dini).

Apache Tika Api consuming given stream

I use the Apache Tika bundle dependency in a project to determine the MIME types of files. Due to some issues, we have to detect them from an InputStream. Detection is supposed to be guaranteed to mark and reset the given InputStream. The Tika bundle includes the core and parser APIs and uses POIFSContainerDetector, ZipContainerDetector, OggDetector, MimeTypes and Magic for detection. I have been debugging for 3 hours, and all of the detectors mark and reset the stream after detection. I did it in the following way.
TikaInputStream tis = null;
try {
    TikaConfig config = new TikaConfig();
    tikaDetector = config.getDetector();
    tis = TikaInputStream.get(in);
    MediaType mediaType = tikaDetector.detect(tis, new Metadata());
    if (mediaType != null) {
        String[] types = mediaType.toString().split(",");
        for (int i = 0; i < types.length; i++) {
            mimeTypes.add(new MimeType(types[i]));
        }
    }
} catch (Exception e) {
    logger.error("Mime Type for given Stream could not be resolved: ", e);
}
But the stream is consumed. Does anyone know how to determine the MIME types without consuming the stream?
This problem bugged me for a while too before I finally solved it. The problem is that, while Detector.detect() methods are required to mark and reset the stream, this resetting will have no effect on your original stream (the in variable) if marking is not supported in that stream.
In order to get this to work, I had to first convert my stream to a BufferedInputStream before doing anything else. I would then pass that buffered stream to the detect algorithm, and I would use that same buffered stream later for parsing, reading, or whatever I needed to do.
BufferedInputStream buffStream = new BufferedInputStream(in);
TikaInputStream tis = null;
try {
    TikaConfig config = new TikaConfig();
    tikaDetector = config.getDetector();
    tis = TikaInputStream.get(buffStream);
    MediaType mediaType = tikaDetector.detect(tis, new Metadata());
    if (mediaType != null) {
        String[] types = mediaType.toString().split(",");
        for (int i = 0; i < types.length; i++) {
            mimeTypes.add(new MimeType(types[i]));
        }
    }
} catch (Exception e) {
    logger.error("Mime Type for given Stream could not be resolved: ", e);
}
// further along in my code...
doSomething(buffStream); // rather than doSomething(in)
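The underlying mechanism can be demonstrated without Tika at all. The sketch below uses only the JDK: a stream that reports no mark support is wrapped in a BufferedInputStream, a few leading bytes are peeked at (roughly what a magic-byte detector does), and the full content is still readable afterwards. The class name and the nonMarkable and peek helpers are hypothetical, and readAllBytes assumes Java 9 or later.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class PeekWithoutConsuming {

    /** Stand-in for a stream that cannot mark/reset, such as many network streams. */
    public static InputStream nonMarkable(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override public boolean markSupported() { return false; }
            @Override public void mark(int readlimit) { /* ignored */ }
            @Override public void reset() throws IOException {
                throw new IOException("mark/reset not supported");
            }
        };
    }

    /** Reads up to n leading bytes, then rewinds so the caller still sees the whole stream. */
    public static byte[] peek(BufferedInputStream in, int n) throws IOException {
        in.mark(n);                       // remember the current position
        try {
            byte[] buf = new byte[n];
            int total = 0;
            while (total < n) {
                int r = in.read(buf, total, n - total);
                if (r < 0) break;         // stream shorter than n bytes
                total += r;
            }
            return Arrays.copyOf(buf, total);
        } finally {
            in.reset();                   // rewind to the marked position
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.7 rest of the file".getBytes(StandardCharsets.US_ASCII);

        // Wrap once, then use the SAME buffered stream for detection and for later reading.
        BufferedInputStream buffered = new BufferedInputStream(nonMarkable(data));

        byte[] head = peek(buffered, 4);
        byte[] everything = buffered.readAllBytes();

        System.out.println("head: " + new String(head, StandardCharsets.US_ASCII));
        System.out.println("still complete: " + Arrays.equals(data, everything));
    }
}
```

The important detail is that the rewind happens inside the BufferedInputStream's own buffer, so reading from the original unbuffered stream afterwards would still miss the bytes already pulled into that buffer.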

How do I use Sitecore.Data.Serialization.Manager.LoadItem(path,LoadOptions) to restore a item to Sitecore?

I am trying to use the Sitecore API to serialize and restore Sitecore items. I have created a WCF app that retrieves an item name given an ID or Sitecore path (/sitecore/content/home), and retrieves a list of the names of an item's children given an ID or path. I can also serialize the content tree.
public void BackupItemTree(string id)
{
    Database db = Sitecore.Configuration.Factory.GetDatabase("master");
    Item itm = db.GetItem(id);
    Sitecore.Data.Serialization.Manager.DumpTree(itm);
}
The above code works great. After running it, I can see that the content tree has been serialized.
However, when I try to restore the serialized items using the following:
public void RestoreItemTree(string path)
{
    try
    {
        using (new Sitecore.SecurityModel.SecurityDisabler())
        {
            Database db = Sitecore.Configuration.Factory.GetDatabase("master");
            Data.Serialization.LoadOptions opt = new Data.Serialization.LoadOptions(db);
            opt.ForceUpdate = true;
            Sitecore.Data.Serialization.Manager.LoadItem(path, opt);
            //Sitecore.Data.Serialization.Manager.LoadTree(path, opt);
        }
    }
    catch (Exception)
    {
        throw; // rethrow without resetting the stack trace
    }
}
With this code I get no errors. It runs, but when I check Sitecore, nothing has changed. I have tested using the Office Core example. The path I sent in, which might be the issue, is:
C:\inetpub\wwwroot\sitecoretest\Data\serialization\master\sitecore\content\Home\Standard-Items\Teasers\Our-Clients.item
and
C:\inetpub\wwwroot\sitecorebfahnestockinet\Data\serialization\master\sitecore\content\Home\Standard-Items\Teasers\Our-Clients
Neither seems to do anything. I changed the teaser title of the item and am trying to restore it to its state before the change, but every time the change is still present.
Any help would be appreciated, as the Sitecore documentation is very limited.
You can always check how the Sitecore code works using Reflector. The following method is called when you click "Revert Item" in the back end:
protected virtual Item LoadItem(Item item, LoadOptions options)
{
    Assert.ArgumentNotNull(item, "item");
    return Manager.LoadItem(PathUtils.GetFilePath(new ItemReference(item).ToString()), options);
}
In LoadOptions you can specify whether you want to overwrite ("Revert Item") or just update ("Update Item") it.
See Sitecore.Shell.Framework.Commands.Serialization.LoadItemCommand for more info.
You have the correct LoadOptions for forcing an overwrite (aka Revert).
I suspect that the path you are using for the .item file is wrong. I would suggest modifying your method to take a path to a Sitecore item. Using that path, you should leverage the other serialization APIs to determine where the file should be.
public void RestoreItemTree(string itemPath)
{
    Sitecore.Data.Database db = Sitecore.Configuration.Factory.GetDatabase("master");
    Sitecore.Data.Serialization.ItemReference itemReference = new Sitecore.Data.Serialization.ItemReference(db.Name, itemPath);
    string path = Sitecore.Data.Serialization.PathUtils.GetFilePath(itemReference.ToString());
    Sitecore.Data.Serialization.LoadOptions opt = new Sitecore.Data.Serialization.LoadOptions(db);
    opt.ForceUpdate = true;
    using (new Sitecore.SecurityModel.SecurityDisabler())
    {
        Sitecore.Data.Serialization.Manager.LoadItem(path, opt);
    }
}
It took me a while to work out, but you have to remove .item from the path when restoring the tree.
Try this:
public void RestoreItemTree(string itemPath)
{
    var db = Factory.GetDatabase("master");
    var itemReference = new ItemReference(db.Name, itemPath);
    var path = PathUtils.GetFilePath(itemReference.ToString());
    if (!System.IO.File.Exists(path))
    {
        throw new Exception("File not found " + path);
    }
    var opt = new LoadOptions(db);
    opt.ForceUpdate = true;
    using (new SecurityDisabler())
    {
        Manager.LoadItem(path, opt);
        Manager.LoadTree(path.Replace(".item", ""), opt);
    }
}
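One detail worth noting about the last snippet: in C#, path.Replace(".item", "") replaces every occurrence of ".item", not just the file extension, so a folder whose name happens to contain ".item" would also be altered. A suffix-only version of that path derivation could look like the sketch below. It is written in Java only to illustrate the string handling; the class and method names are hypothetical and it does not call any Sitecore API.

```java
public class SerializationPaths {

    private static final String ITEM_SUFFIX = ".item";

    /**
     * Derives the directory that holds an item's serialized children from the
     * path of its .item file by dropping the extension at the end only.
     */
    public static String treeDirectoryFor(String itemFilePath) {
        if (itemFilePath.endsWith(ITEM_SUFFIX)) {
            return itemFilePath.substring(0, itemFilePath.length() - ITEM_SUFFIX.length());
        }
        return itemFilePath; // already a directory path
    }

    public static void main(String[] args) {
        // Example path shape only; the folder name is invented for the demonstration.
        String file = "serialization/master/sitecore/content/Home/My.items/Our-Clients.item";
        System.out.println(treeDirectoryFor(file));
    }
}
```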

Returning binary content from a JPF action with Weblogic Portal 10.2

One of the actions of my JPF controller builds up a PDF file and I would like to return this file to the user so that he can download it.
Is it possible to do that or am I forced to write the file somewhere and have my action forward a link to this file? Note that I would like to avoid that as much as possible for security reasons and because I have no way to know when the user has downloaded the file so that I can delete it.
I've tried to access the HttpServletResponse but nothing happens:
getResponse().setContentLength(file.getSize());
getResponse().setContentType(file.getMimeType());
getResponse().setHeader("Content-Disposition", "attachment;filename=\"" + file.getTitle() + "\"");
getResponse().getOutputStream().write(file.getContent());
getResponse().flushBuffer();
We have something similar, except returning images instead of a PDF; should be a similar solution, though, I'm guessing.
On a JSP, we have an IMG tag, where the src is set to:
<c:url value="/path/getImage.do?imageId=${imageID}" />
(I'm not showing everything, because I'm trying to simplify.) In your case, maybe it would be a link, where the href is done in a similar way.
That getImage.do maps to our JPF controller, obviously. Here's the code from the JPF getImage() method, which is the part you're trying to work on:
@Jpf.Action(forwards = {
    @Jpf.Forward(name = FWD_SUCCESS, navigateTo = Jpf.NavigateTo.currentPage),
    @Jpf.Forward(name = FWD_FAILURE, navigateTo = Jpf.NavigateTo.currentPage) })
public Forward getImage(final FormType pForm) throws Exception {
    final HttpServletRequest lRequest = getRequest();
    final HttpServletResponse lResponse = getResponse();
    final HttpSession lHttpSession = getSession();
    final String imageIdParam = lRequest.getParameter("imageId");
    final long header = lRequest.getDateHeader("If-Modified-Since");
    final long current = System.currentTimeMillis();
    // Client copy is recent enough: answer 304 and skip the back-end call
    if (header > 0 && current - header < MAX_AGE_IN_SECS * 1000) {
        lResponse.setStatus(HttpServletResponse.SC_NOT_MODIFIED);
        return null;
    }
    try {
        if (imageIdParam == null) {
            throw new IllegalArgumentException("imageId is null.");
        }
        // Call to EJB, which is retrieving the image from
        // a separate back-end system
        final ImageType image = getImage(lHttpSession, Long
                .parseLong(imageIdParam));
        if (image == null) {
            lResponse.sendError(404, IMAGE_DOES_NOT_EXIST);
            return null;
        }
        lResponse.setContentType(image.getType());
        lResponse.addDateHeader("Last-Modified", current);
        // public: Allows authenticated responses to be cached.
        lResponse.setHeader("Cache-Control", "max-age=" + MAX_AGE_IN_SECS
                + ", public");
        lResponse.setHeader("Expires", null);
        lResponse.setHeader("Pragma", null);
        lResponse.getOutputStream().write(image.getContent());
    } catch (final IllegalArgumentException e) {
        LogHelper.error(this.getClass(), "Illegal argument.", e);
        lResponse.sendError(404, IMAGE_DOES_NOT_EXIST);
    } catch (final Exception e) {
        LogHelper.error(this.getClass(), "General exception.", e);
        lResponse.sendError(500);
    }
    // The response has already been written, so there is nothing to forward to
    return null;
}
I've actually removed very little from this method, because there's very little in there that I need to hide from prying eyes--the code is pretty generic, concerned with images, not with business logic. (I changed some of the data type names, but no big deal.)
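The action above writes straight to the response and then returns null instead of a Forward. Two small pieces of its logic can be pulled out and tried on their own: the If-Modified-Since freshness check, and a Content-Disposition header value for a download like the PDF in the question. The sketch below is plain Java with no servlet or Beehive dependency; the class and method names are hypothetical, and replacing quotes and line breaks in the file name is an added precaution that the original code does not take.

```java
public class DownloadHeaders {

    /**
     * Mirrors the check in getImage(): a positive If-Modified-Since value that is
     * younger than maxAgeSecs means the client copy can be reused (HTTP 304).
     * A value of -1 (header absent) never counts as fresh.
     */
    public static boolean isStillFresh(long ifModifiedSinceMillis, long nowMillis, int maxAgeSecs) {
        return ifModifiedSinceMillis > 0
                && nowMillis - ifModifiedSinceMillis < maxAgeSecs * 1000L;
    }

    /** Builds an attachment header value, replacing characters that would break the quoting. */
    public static String attachmentDisposition(String fileName) {
        String safe = fileName.replaceAll("[\\r\\n\"\\\\]", "_");
        return "attachment; filename=\"" + safe + "\"";
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(isStillFresh(now - 5_000, now, 60));   // header 5 s old, 60 s max age
        System.out.println(isStillFresh(-1, now, 60));            // header missing
        System.out.println(attachmentDisposition("report 2024.pdf"));
    }
}
```

In the question's code, the headers would be set with these values before writing file.getContent() to the output stream, and the action would then return null as the image example does.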