how to update COSDictionary? - pdfbox

I'm creating a dictionary in the catalog:
PDDocumentCatalog catalog = template.getDocumentCatalog();
COSDictionary catalogDictionary = catalog.getCOSObject(); // PDFBox 2.x; in 1.8 use catalog.getCOSDictionary()
COSDictionary dssDictionary = new COSDictionary();
dssDictionary.setItem(COSName.getPDFName("Certs"), cosCerts);
catalogDictionary.setNeedToBeUpdate(true);
catalogDictionary.setItem(COSName.getPDFName("DSS"), dssDictionary);
OK, everything works!
QUESTION 1)
Now imagine that I need to update my dictionary.
I can get this dictionary like this:
COSBase dssCosBase = catalogDictionary.getDictionaryObject(COSName.getPDFName("DSS"));
OK, it contains my certificates. But how can I add more certificates here? There doesn't seem to be any method for that.
QUESTION 2)
I can get COSBase object but how can I get COSDictionary object?

To make my comment an answer...
I answer the questions last to first:
Question 2) I can get COSBase object but how can I get COSDictionary object?
Simply use instanceof to check whether the COSBase you got is also an instance of COSDictionary. If it is, cast it to COSDictionary.
Question 1) But how can I add another certificates here?
After casting (see above), you can get the contained Certs entry. In a valid document security store this is an array of certificate streams, so after checking it with instanceof and casting it to COSArray, you can work with it, e.g. add another certificate.
In reaction to the comment, you found the following to work as you desired:
COSDictionary dssDictionary = (COSDictionary) catalogDictionary.getDictionaryObject(COSName.getPDFName("DSS"));
While this will certainly work for valid PAdES part 4 / PDF 2.0 document security stores, you should in general be aware that there are broken documents in which the DSS entry of the catalog has a different type. Thus, I would always check with instanceof first.
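A minimal sketch of the instanceof-then-cast approach applied at both levels (the DSS dictionary and its Certs entry). It assumes PDFBox 2.x on the classpath and that /Certs is an array of certificate streams, as described for the DSS in ISO 32000-2 / PAdES. DssUpdater and addCertificate are hypothetical names, and writing the DER-encoded certificate bytes into the stream is left out.

```java
import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSStream;

// Hypothetical helper; assumes PDFBox 2.x and /Certs being an array of streams.
public final class DssUpdater {
    private static final COSName DSS = COSName.getPDFName("DSS");
    private static final COSName CERTS = COSName.getPDFName("Certs");

    /**
     * Appends certStream to /DSS /Certs, creating missing entries.
     * Returns false (and changes nothing) if an existing entry has an unexpected type.
     */
    public static boolean addCertificate(COSDictionary catalog, COSStream certStream) {
        COSBase dssBase = catalog.getDictionaryObject(DSS);
        COSDictionary dss;
        if (dssBase == null) {
            dss = new COSDictionary();
            catalog.setItem(DSS, dss);
        } else if (dssBase instanceof COSDictionary) {
            dss = (COSDictionary) dssBase;
        } else {
            return false; // broken document: DSS is not a dictionary
        }

        COSBase certsBase = dss.getDictionaryObject(CERTS);
        COSArray certs;
        if (certsBase == null) {
            certs = new COSArray();
            dss.setItem(CERTS, certs);
        } else if (certsBase instanceof COSArray) {
            certs = (COSArray) certsBase;
        } else {
            return false; // broken document: Certs is not an array
        }

        certs.add(certStream);
        // Mark the modified objects so an incremental save picks them up.
        catalog.setNeedToBeUpdate(true);
        dss.setNeedToBeUpdate(true);
        certs.setNeedToBeUpdate(true);
        return true;
    }
}
```

Returning false for an unexpected type is one possible design choice; throwing an exception would be equally reasonable if you prefer to fail loudly on broken documents.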

Related

Is there a function in spaCy to get the string for a given hash?

Basically, I'm looking for the opposite of spacy.strings.get_string_id(), which does not need to load the language model to get the vocabulary. I tried the StringStore methods, but you need to add the string first, or else you get a "Can't retrieve string for hash 'xxx'" error.
The use case: the hash is serialized and then deserialized somewhere else.
No, you will need to keep a copy of the StringStore from the pipeline you used to process your documents in order to look up strings for hashes in the future.
In the end, it's nothing more than a list of strings that have been seen before, either as tokens or annotations, which you can simply re-add to a new StringStore.
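spaCy's StringStore is Python; purely to illustrate the idea in the answer (persist the list of strings, then re-add them to a fresh store so hashes can be resolved again), here is a hypothetical Java analogue. It uses String.hashCode() as a stand-in hash and ignores collisions; spaCy uses its own 64-bit hash, so these values are not compatible with spaCy's.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

/** Hypothetical minimal string store: hash -> string, rebuilt from a plain list of strings. */
public class SimpleStringStore {
    private final Map<Long, String> byHash = new HashMap<>();

    // Stand-in hash for illustration only; spaCy uses a different 64-bit hash.
    static long hash(String s) {
        return s.hashCode();
    }

    public long add(String s) {
        long h = hash(s);
        byHash.put(h, s);
        return h;
    }

    public String get(long h) {
        String s = byHash.get(h);
        if (s == null) {
            // Same situation as spaCy's "Can't retrieve string for hash" error.
            throw new NoSuchElementException("Can't retrieve string for hash '" + h + "'");
        }
        return s;
    }

    /** The list of strings you would serialize alongside the hashes. */
    public List<String> strings() {
        return new ArrayList<>(byHash.values());
    }

    /** Rebuild the store elsewhere by simply re-adding the saved strings. */
    public static SimpleStringStore fromStrings(List<String> strings) {
        SimpleStringStore store = new SimpleStringStore();
        for (String s : strings) {
            store.add(s);
        }
        return store;
    }
}
```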

Add path segment at last part of URL with ZnUrl

I am using Pharo 3 and I want to add a path segment as the last part of a URL. For example, for http://example.com/myapp?key1=param1&key2=param2 I want to get /myParam added to the last part. With ZnUrl I tried #addPathSegment:
(ZnUrl fromString: 'http://example.com/myapp?key1=param1&key2=param2')
addPathSegment: 'myParam'
but it results in
http://example.com/myapp/myParam?key1=param1&key2=param2
How can I configure ZnUrl to get:
http://example.com/myapp?key1=param1&key2=param2/myParam
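What you are describing is not a path segment: in a URL the path always comes before the query (scheme://host/path?query#fragment), so anything appended after the query is part of the query. In your example, key2 would simply get the value param2/myParam.
So what you are talking about is not the addition of a path segment, but rather string concatenation.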
You can consider doing:
ZnUrl fromString: 'http://example.com/myapp?key1=param1&key2=param2/myParam'
or if you get a url from somewhere else,
(self asString, '/myParam') asUrl
should work too.
You can also do more magic to get everything to work, but first of all you should redesign your URL structure to fit the standards (if you can influence it).
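To see where the appended text actually ends up, here is a small Java sketch. It uses java.net.URI instead of ZnUrl (an assumption for illustration), performs the same string concatenation as (self asString, '/myParam') asUrl, and then parses the result; UrlSuffixDemo and appendRaw are hypothetical names.

```java
import java.net.URI;

public class UrlSuffixDemo {
    // Plain string concatenation, analogous to (self asString, '/myParam') asUrl
    static URI appendRaw(String url, String suffix) {
        return URI.create(url + suffix);
    }

    public static void main(String[] args) {
        URI u = appendRaw("http://example.com/myapp?key1=param1&key2=param2", "/myParam");
        System.out.println("path  = " + u.getPath());   // path  = /myapp
        System.out.println("query = " + u.getQuery());  // query = key1=param1&key2=param2/myParam
    }
}
```

The path stays /myapp; /myParam is parsed as part of the query, which is why no path-segment API can produce that URL.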

check if 2 linked list have the same elements regardless of order

Is there any way to check if 2 linked lists have the same elements regardless of order?
Edit to the question:
I have fixed the code and given some more details.
This is the method that compares 2 lists:
compare: object2
^ myList asBag = object2 getList asBag
The method belongs to the class myClass, which has a field myList. myList is a LinkedList of element objects.
I have run it in the workspace:
a := element new id: 1.
b := element new id: 2.
c := element new id: 3.
d := element new id: 1.
e := element new id: 2.
f := element new id: 3.
elements1 := myClass new.
elements1 getList addFirst: a.
elements1 getList addFirst: b.
elements1 getList addFirst: c.
elements2 := myClass new.
elements2 getList addFirst: d.
elements2 getList addFirst: e.
elements2 getList addFirst: f.
Transcript show: (elements1 compare:elements2).
So I am getting false. It seems like it checks for equality by reference rather than by value.
So I think the correct question to ask would be: how can I compare 2 Bags by value? I have tried '==', but it also returned false.
EDIT:
The question changed too much - I think it deserves a new question for itself.
The whole problem here is that (element new id: 1) = (element new id: 1) gives you false. Unless its class (or one of its superclasses) redefines it, the = message compares by identity (==) by default. That's why your code only works when a collection is compared with itself.
Test it with, for example, lists of numbers (which have the = method redefined to reflect what humans understand by numeric equality), and it will work.
You should redefine your element class's = and hash methods for this to work. Both are needed: a Bag uses hashing internally, so two elements that are = must also answer the same hash.
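The same idea, sketched in Java for illustration (equals and hashCode play the roles of Smalltalk's = and hash; all names are hypothetical). A bag is modelled as a map from element to count, so two lists compare equal when they contain equal elements with equal counts, regardless of order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BagCompare {
    /** Element with value-based equality: the counterpart of redefining = and hash. */
    static final class Element {
        final int id;
        Element(int id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof Element && ((Element) o).id == id;
        }

        @Override
        public int hashCode() {
            return Integer.hashCode(id);
        }
    }

    /** Element that keeps Object's identity-based equals/hashCode, like an un-redefined =. */
    static final class IdentityElement {
        final int id;
        IdentityElement(int id) { this.id = id; }
    }

    /** Bag (multiset) equality: same elements with the same counts, order ignored. */
    static <T> boolean sameElements(List<T> a, List<T> b) {
        return counts(a).equals(counts(b));
    }

    private static <T> Map<T, Integer> counts(List<T> xs) {
        Map<T, Integer> m = new HashMap<>();
        for (T x : xs) {
            m.merge(x, 1, Integer::sum); // relies on equals/hashCode of T
        }
        return m;
    }
}
```

With Element the reordered lists compare equal; with IdentityElement two separately created objects with the same id never do, which mirrors the false result in the question.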
Smalltalk handles everything by reference: all that exists are objects, which know (reference) other objects.
It would be wrong to say that two lists are equivalent if they are in different order, as the order is part of what a list means. A list without an order is what we call a bag.
The asBag message (as all of the other as<anotherCollectionType> messages) return a new collection of the named type with all the elements of the receiver. So, #(1 2 3 2) is an Array of four elements, and #(1 2 3 2) asBag is a bag containing those four elements. As it's a Bag, it doesn't have any particular order.
When you do bagA := Bag new. you are creating a new Bag instance, and reference it with bagA variable. But then you do bagA := myList asBag, so you lose the reference to the previous bag - the first assignment doesn't do anything useful in your code, as you don't use that bag.
Saying aBool ifTrue: [^true] ifFalse: [^false] has exactly the same meaning as saying ^aBool - so we prefer just to say that. And, as you only create those two new bags to compare them, you could simplify your whole method like this:
compareTo: anotherList
^ myList asBag = anotherList asBag
Read it out loud: this object (whatever it is) compares to another list if its list, without considering order, is the same as the other list without order.
The name compareTo: is kind of weird for returning a boolean (containsSameElements: would be more descriptive), but you get the point much faster with this code.
Just to be precise about your questions:
1) It doesn't work because you're comparing bag1 and bag2, but just defined bagA and bagB.
2) Creating those two extra bags for no reason and sending the pointless ifTrue:ifFalse: message is not efficient, but otherwise it's OK. You could implement a better way to compare the lists, but it's much better to rely on the implementation of asBag and Bag's = message being performant.
3) You could look at the asBag source code, but yes, you can assume it to be something like:
Collection>>asBag
|instance|
instance := Bag new.
instance addAll: self.
^instance
And, of course, the addAll: method could be:
Collection>>addAll: anotherCollection
anotherCollection do: [ :element | self add: element ]
So, yes - it creates a new Bag with all the receiver's elements.
mgarciaisaia's answer was good... maybe too good! This may sound harsh, but I want you to succeed if you're serious about learning, so I reiterate my suggestion from another question that you pick up a good Smalltalk fundamentals textbook immediately. Depending on indulgent do-gooders to rework your nonsensical snippets into workable code is a very inefficient way to learn to program ;)
EDIT: The question has changed dramatically. The following spoke to the original three-part question, so I paraphrased the original questions inline.
Q: What is the problem? A: The problem is lack of fundamental Smalltalk understanding.
Q: Is converting to bags an efficient way to make the comparison? A: Although it's probably not efficient, don't worry about that now. In general, and especially at the beginning when you don't have a good intuition about it, avoid premature optimization - "make it work", and then only "make it fast" if justified by real-world profiling.
Q: How does #asBag work? A: The implementation of #asBag is available in the same living world as your own code. The best way to learn is to view the implementation directly (perhaps by "browsing implementors" if you aren't sure where it's defined) and answer your own question! If you can't understand that implementation, see #1.

Fastest key/value pair container in Objective-C

I am creating a syntax highlighting engine. My need is very specific. Keywords will be associated to their respective attribute array via a pointer. The data structure will look something like:
dict = {
"printf": keyword_attr_ptr
, "sprintf": keyword_attr_ptr
, "#import": special_attr_ptr
, "string": lib_attr_ptr
}
The lookup needs to be very fast, as I will be iterating over this list on every keypress.
I'm asking this question because I cannot find any good documentation regarding how NSDictionary caches values (if it does) and looks them up by key (does it use a map? a hash map?). Can I rely on NSDictionary to be optimized for looking up keys that are strings?
When I was doing something similar a long while ago, I used the MFC CMap class with very good results. NSDictionary appears to be the equivalent of CMap, but the key type isn't specified, and the NSDictionary documentation clearly states that a key can be any type of object. I just want to make sure I can rely on it to return results extremely fast before I put a lot of energy into this problem.
UPDATE 1
After a day of research, I asked the question on SO and found the answer immediately afterwards... go figure.
This is the documentation related to Dictionaries:
https://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/Collections/Articles/Dictionaries.html
It uses a hash table to manage its storage. I guess the short answer is that it's almost equivalent to CMap.
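NSDictionary itself is Objective-C; as an illustration of the same hash-table idea, here is a hypothetical Java sketch using HashMap (also a hash table). The enum Attr stands in for the attribute-array pointers from the question. The design point is to look up each token in the table, which takes roughly constant time on average, instead of scanning the whole keyword list on every keypress.

```java
import java.util.HashMap;
import java.util.Map;

public class KeywordTable {
    // Hypothetical stand-ins for keyword_attr_ptr, special_attr_ptr and lib_attr_ptr.
    enum Attr { KEYWORD, SPECIAL, LIBRARY }

    private final Map<String, Attr> table = new HashMap<>();

    public KeywordTable() {
        table.put("printf", Attr.KEYWORD);
        table.put("sprintf", Attr.KEYWORD);
        table.put("#import", Attr.SPECIAL);
        table.put("string", Attr.LIBRARY);
    }

    /** Hash lookup for a single token; returns null for tokens that are not keywords. */
    public Attr lookup(String token) {
        return table.get(token);
    }
}
```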

Is it possible to add custom metadata to a Lucene field?

I've come to the point where I need to store some additional data about where a particular field comes from in my Lucene.Net index. Specifically, I want to attach a guid to certain fields of a document when the field is added to the document, and retrieve it again when I get the document from a search result.
Is this possible?
Edit:
Okay, let me clarify a bit by giving an example.
Let's say I have an object that I want to allow the user to tag with custom tags like "personal", "favorite", "some-project". I do this by adding multiple "tag" fields to the document, like so:
doc.Add( new Field( "tag", "personal", Field.Store.YES, Field.Index.NOT_ANALYZED ) );
doc.Add( new Field( "tag", "favorite", Field.Store.YES, Field.Index.NOT_ANALYZED ) );
The problem is that I now need to record some metadata about each individual tag, specifically a guid representing where that tag came from (imagine it as a user id). Each tag could potentially have a different guid, so I can't simply create a "tag-guid" field (unless the order of the values is preserved; see edit 2 below). I don't need this metadata to be indexed (in fact, I'd prefer it not to be, to avoid getting hits on metadata); I just need to be able to retrieve it again from the document/field.
doc.GetFields( "tag" )[0].Metadata...
(I'm making up syntax here, but I hope my point is clear now.)
Edit 2:
Since this is a completely different question, I've posted a new question for this approach: Is the order of multi-valued fields in Lucene stable?
Okay, let's try another approach. The key problem is the indeterminacy of multiple field values under the same field name (e.g. "tag"). If I could introduce or obtain some kind of determinacy here, I might be able to store the metadata in another field.
For example, if I could rely on the order of the values of the field never changing, I could use an index in the set of values to identify exactly which tag I am referring to.
Is there any guarantee that the order I add the values to a field will remain the same when I retrieve the document at a later time?
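Depending on your search requirements for this index, this may be possible: store all tags in a single delimited field and the corresponding guids in a second field, in the same order. That way you control the order of the values yourself. It would of course require updating both fields as the tag list changes, but the overhead may be worth it.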
doc.Add(new Field("tags", "{personal}|{favorite}"));
doc.Add(new Field("tagsref", "{1234}|{12345}"));
Note: using the {} allows you to qualify your search for uniqueness where similar values exist.
Example: If values were stored as "person|personal|personage" searching for "person" would return a document that has any one of person, personal or personage. By qualifying in curly brackets like so: "{person}|{personal}|{personage}", I can search for "{person}" and be sure it won't return false positives. Of course, this assumes you don't use curly brackets in your values.
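The pairing of the two parallel fields can be sketched independently of Lucene. Below is a hypothetical Java helper (not part of any Lucene API) that builds the "{personal}|{favorite}" format and zips the tags and tagsref values back together by position after retrieval. Like the answer, it assumes the values contain no '{', '}' or '|'.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParallelFields {
    /** Encode values as "{a}|{b}"; assumes no value contains '{', '}' or '|'. */
    static String encode(List<String> values) {
        List<String> wrapped = new ArrayList<>();
        for (String v : values) {
            wrapped.add("{" + v + "}");
        }
        return String.join("|", wrapped);
    }

    /** Inverse of encode: split on '|' and strip the surrounding braces. */
    static List<String> decode(String field) {
        List<String> out = new ArrayList<>();
        if (field.isEmpty()) {
            return out;
        }
        for (String part : field.split("\\|")) {
            out.add(part.substring(1, part.length() - 1));
        }
        return out;
    }

    /** Pair each tag with the ref at the same position (insertion order kept). */
    static Map<String, String> zip(String tagsField, String refsField) {
        List<String> tags = decode(tagsField);
        List<String> refs = decode(refsField);
        if (tags.size() != refs.size()) {
            throw new IllegalArgumentException("tags and tagsref are out of sync");
        }
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < tags.size(); i++) {
            pairs.put(tags.get(i), refs.get(i));
        }
        return pairs;
    }
}
```

The size check is there because the whole scheme depends on both fields always being updated together.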
I think you're asking about payloads.
Edit: From your use case, it sounds like you have no desire to use this metadata in your search, you just want it there. (Basically, you want to use Lucene as a database system.)
So, why can't you use a binary field?
ExtraData ed = new ExtraData { Tag = "tag", Type = "personal" }; // ExtraData must be marked [Serializable]
var ms = new MemoryStream(); // System.IO
new BinaryFormatter().Serialize(ms, ed); // System.Runtime.Serialization.Formatters.Binary
byte[] byteData = ms.ToArray();
doc.Add(new Field("myData", byteData, Field.Store.YES));
Then you can deserialize it on retrieval.