How to write WIC XMP people tags to jpg? - kotlin

I have images with people tagging information in xml format. I wish to edit this information and also add it to pictures that do not yet have it. By looking at the xml I assume it is based on the people tagging used in the microsoft imaging component.
I haven't quite understood the format, but I understood it sof far, that I can alter or gemerate the xml, I just do not know where to write it in the image. I am probably just doing some stupid mistake, because I am not experienced with these image metadatas. So if you think I'm just on the wrong track and that can be done much simpler, please tell me.
In those images that already contain this xml, I can use search and replace to update the xml. However I have a lot of pictures that do not yet contain that information and I do not know where I should write it to inside the image.
Images that already contain this information can be read with exiftool as follows:
exiftool -xmp -b existingTags.JPG
The result is the following xml:
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?> <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP
Core 4.4.0-Exiv2"> <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"> <rdf:Description rdf:about=""
xmlns:xmp="http://ns.adobe.com/xap/1.0/" xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:MP="http://ns.microsoft.com/photo/1.2/" xmlns:MPRI="http://ns.microsoft.com/photo/1.2/t/RegionInfo#"
xmlns:MPReg="http://ns.microsoft.com/photo/1.2/t/Region#" xmp:Rating="0"> <dc:subject> <rdf:Bag> <rdf:li>Valeriya
</rdf:li> </rdf:Bag> </dc:subject> <MP:RegionInfo rdf:parseType="Resource"> <MPRI:Regions> <rdf:Bag> <rdf:li
MPReg:Rectangle="0.48, 0.418, 0.059333, 0.089" MPReg:PersonDisplayName="findus_l"/> </rdf:Bag> </MPRI:Regions>
</MP:RegionInfo> </rdf:Description> </rdf:RDF> </x:xmpmeta> <?xpacket end="w"?>
However I cannot write the information using exiftool. When I ran this command, it simply reads the information again, instead of writing the contents of the file to the image:
exiftool -xmp<=alteredXMP.txt existingTags.JPG
A bit of research has shown me, that exiftool can only write specific xmp tags, and the people tagging tags from windows imaging component do not seem to be part of this.
Where in the image file should I write the information? Can I somehow find this spot programmatically and then just insert the xml there?
I am using Kotlin as programming language but I don't mind having to call command line functions or other programs.
Background: I have a Synology Diskstation and use the included software called photo station. The photo station supports tagging of people on the images and uses this given format. I like the photo station in many ways, but the face recognition is bad, so I want to use my own but have photo station be able to read it.

The data you are trying to write is part of the Microsoft Region Structure. XMP Structured data is a complex subject but you should be able to add the data with exiftool by writing region names to the RegionPersonDisplayName tag and the region dimensions to the RegionRectangle. Using the data in your example, the command would be:
exiftool -RegionPersonDisplayName=findus_l -RegionRectangle="0.48, 0.418, 0.059333, 0.089" /path/to/files
If you have to write multiple regions, you can just add them on, but you must keep names and the matching dimensions in the same order. For example
exiftool -RegionPersonDisplayName=findus_l -RegionRectangle="0.48, 0.418, 0.059333, 0.089" -RegionPersonDisplayName="John Smith" -RegionRectangle="0.37645533, 0.04499886, 0.35111009, 0.26633097" /path/to/files
These commands would overwrite any existing region data. If you are adding new names without overwriting, you would change the equal signs to PlusEqual +=.

Related

Xpages File Download collect and display extra information

I'm using file upload and download control. I understand how to use the provided display columns, but how would I go about collecting other info about each uploaded file and then displaying it (i.e. Display Name and Notes that the user would enter)?
<xp:fileUpload id="fileUpload1"
value="#{document1.files}" style="width:80%"
useUploadname="false">
<xp:eventHandler event="onchange"
submit="true" refreshMode="complete"
disableValidators="true">
</xp:eventHandler>
</xp:fileUpload>
<xp:br></xp:br>
<xp:fileDownload rows="30" id="FD1"
displayLastModified="false" value="#{document1.files}"
style="width:98%" hideWhen="true" displayType="false"
displayCreated="true" rules="all"
lastModifiedTitle="Last Modified">
<xp:this.allowDelete><![CDATA[${javascript:database.queryAccessRoles(session.getEffectiveUserName()).contains('[Admin]')}]]></xp:this.allowDelete>
</xp:fileDownload>
If I understand your question correctly: you want to add additional information columns into the file download control that are derived from information stored or computed elsewhere, e.g. from a NotesItem (a field in the Notes document)?
In this case you need to construct your own output using a repeat control. You can render a table or a list - whatever you deem fit for display.
The “trick” is how to construct the URL for download - which is simply:
/yourdatabase.nsf/0/unid/AttachmentName?OpenAttachment
(typed off memory. You might need to double check syntax).
Word of caution: if you have lots of attachments, you might consider having separate documents for them and use a view - above URL works in views too. Saves you a versioning headache (in case multiple users can upload to the same document).
Let us know how it goes

Downloading all full-text articles in PMC and PubMed databases

According to one of the answered questions by NCBI Help Desk , we cannot "bulk-download" PubMed Central. However, can I use "NCBI E-utilities" to download all full-text papers in PMC database using Efetch or at least find all corresponding PMCids using Esearch in Entrez Programming Utilities? If yes, then how? If E-utilities cannot be used, is there any other way to download all full-text articles?
First of all, before you go downloading files in bulk, I highly recommend you read the E-utilities usage guidelines.
If you want full-text articles, you're going to want to limit your search to open access files. Furthermore, I suggest also restricting your search to Medline articles if you want articles that are any good. Then you can do the search.
Using Biopython, this gives us :
search_query = 'medline[sb] AND "open access"[filter]'
# getting search results for the query
search_results = Entrez.read(Entrez.esearch(db="pmc", term=search_query, retmax=10, usehistory="y"))
You can use the search function on the PMC website and it will display the generated query that you can copy/paste into your code.
Now that you've done the search, you can actually download the files :
handle = Entrez.efetch(db="pmc", rettype="full", retmode="xml", retstart=0, retmax=int(search_results["Count"]), webenv=search_results["WebEnv"], query_key=search_results["QueryKey"])
You might want to download in batches by changing retstart and retmax by variables in a loop in order to avoid flooding the servers.
If handle contains only one file, handle.read() contains the whole XML file as a string. If it contains more, the articles are contained in <article></article> nodes.
The full text is only available in XML, and the default parser available in pubmed doesn't handle XML namespaces, so you're going to be on your own with ElementTree (or an other parser) to parse your XML.
Here, the articles are found thanks to the internal history of E-utilities, which is accessed with the webenv argument and enabled thanks to the usehistory="y" argument in Entrez.read()
A few tips about XML parsing with ElementTree : You can't delete a grandchild node, so you're probably going to want to delete some nodes recursively. node.text returns the text in node, but only up to the first child, so you'll need to do something along the lines of "".join(node.itertext()) if you want to get all the text in a given node.
According to one of the answered questions by NCBI Help Desk , we cannot "bulk-download" PubMed Central.
https://www.nlm.nih.gov/bsd/medline.html + https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/ will download a good portion of it (I don't know the percentage). It will indeed miss the PMC full-texts articles whose license doesn't allow redistribution as explained on https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/.

Concat in xslt, unable to get an image from server

I am new to xslt coding, I am trying to fix an issue in existing piece of code. I got stuck up at a point
<fo:external-graphic content-width="150pt"
content-height="50pt"
src="url:{concat('${OA_MEDIA}/',$revised_last_name,',',DOCUMENT_BUYER_FIRST_NAME,'.gif')}" />
The above piece of code is trying to find a .gif file in OA_MEDIA directory. Till that part I can understand fine.
When I am placing a file name as "Eckert,Tim.gif" (excluding the quotes) my program isn't picking that file
In the above piece, I printed $revised_lastname and $document_buyer_first_name..It's coming as Tim and Eckert, but it's still not picking the file. If I am hardcoding a file name like below it's working fine
<fo:external-graphic content-width="150pt"
content-height="50pt"
src="url:{concat('${OA_MEDIA}/','Tim','.gif')}" />
How can I print what value is coming into the src in above piece of code so I can see what file is it trying to look in the $OA_MEDIA.
Any suggestions are appreciated.
Thanks!
Your first problem appears to be that in your constructed URI and your file name, you have put the given name and the family name in different orders: Tim,Eckert.gif vs Eckert,Tim.gif. Choose one.
If you still have problems after that, your next step is to confirm that your concatenation is producing the value you expect. I'd add a line like
<xsl:message>Generating fo:external graphic with URI <xsl:value-of
select="concat('url:',
'${OA_MEDIA}/',
$revised_last_name,
',',
DOCUMENT_BUYER_FIRST_NAME,
'.gif')"/></xsl:message>
And if that did not shed light on what was going wrong, I'd insert individual messages to display the current value of $revised_last_name and DOCUMENT_BUYER_FIRST_NAME.
More than that it's difficult to say, because your question does not provide a short, self-contained, complete example that allows readers to reproduce the problem you are trying to solve. There is good advice on asking effective questions in the SO help files and in Eric Raymond and Rick Moen's essay How to ask questions the smart way.

Documentation with Diagram "Hyperlinks" in Enterprise Architect?

I'm struggling to get all the required (and only the required) information into the documentation of my Enterprise Architect Project. Precisely: we have modelled various requirements and displayed the source "standards" for these requirements in our diagrams by using the "hyperlink"-element out of the common toolbox. (This allows us to capture a title, the website where the documentation is found and a description of this documentation).
Now this element is visible on the diagram, but not in the package-view of our model and it does not get generated in our word (docx) documentation.
I can see that it should be possible to get this in the documentation, because a "Model Report" which basically prints everything does print the hyperlinks. But I can't find what I have to select in my template (in the package-tree view, as a package field, element field or diagram field) in order to get this printed. I can't just use the model report since this basically dumps the whole database in the document and reverse-engineering this model report has proven too difficult for me. Actually I would expect this to be in some kind of documentation for EA, but could not find such a thing with this level of detail... is there, is there a reproducible way of finding such things out in further cases? (btw I'm using EA 11.0)
[sorry there were illustrations here, but I'm not allowed to upload them...]
As Geert has already noted, there is a difference between "proper" elements and diagram-only elements. This is actually reflected in the document template editor, where there is an "Element" section inside the "Diagram" section. This will produce output for all elements in the diagram, whether or not they are also in the project browser.
Here's an example of the information you can pull out of your hyperlinks. Given a diagram with a hyperlink:
... and a template which outputs name, alias and hyperlink for each element in the diagram:
... EA will generate a document will the following contents:
So if you want the hyperlink to result in a hyperlink in the document, use the HyperlinkAlias field.
What might be a bit confusing is the fact that in addition to the Hyperlink element type in the Common diagram toolbox, EA allows you to create hyperlinks in regular elements (in the Element Properties dialog, Related tab: Files, which can be local files or web addresses).
In fact, I would recommend that you use those in your Requirement elements rather than diagram-only Hyperlinks if traceability is a priority in your model. The diagram-only Hyperlinks, on the other hand, give you a clearer visual.
Selecting a subset of the elements in a diagram ("only the required information") is a little more involved and depends on how your model is structured. Template fragments will get the job done, but you might be able to achieve your desired result by just using the filters in the document generation dialog.
The hyperlink is an element that is stored in the same package as the diagram it is used on, it is just not visible in the project browser (similar to a note element).
There's a good chance that it doesn't have a name, so make sure you don't omit nameless elements.
So if you print all the element of the package containing the diagram then you should be able to print the hyperlink as well.
In case that fails you might want to consider creating a template fragment based on an SQL query or a script. Those offer lots of flexibility to print whatever you need, even if it is located in a different package.
[Edited on 04.05.15 to reflect the comment by Uffe and provide a final solution]
Ok, based on Geerts answer, using the following custom query fragment in the diagram section:
select
t_object.ea_guid as CLASSGUID,
t_object.Object_Type as CLASSTYPE,
t_object.Object_Id as OBJECTID,
t_object.name as HL_Name,
t_object.Stereotype as HL_Stereotype,
t_object.object_type as HL_Type,
t_object.Alias as HL_Alias,
Note as Notes
--,t_object.*
from t_object
left join t_diagramobjects on (t_object.Object_ID = t_diagramobjects.Object_ID)
left join t_diagram on (t_diagram.Diagram_ID = t_diagramobjects.Diagram_ID)
where t_diagram.Diagram_ID = '#DIAGRAMID#'
and t_object.Object_Type='Text'
I was able to get a list of the hyperlinks following the diagram, this is the fragment:
custom >
{HL_Alias}: {HL_Name}
{Notes}
< custom
The "Notes" can be printed by getting the attribute directly out of the t_object table. Don't get confused as I was at first: the auto-completion on t_object and the results (t_object.*) DO NOT SHOW a Note-Attribute, but it does exist an when you write it into the query, it gets generated in the document.

Indexing Multiple documents and mapping to unique solr id

My use case is to index 2 files: metadata file and a binary PDF file to a unique solr id. Metadata file has content in form of XML file and some schema fields are mapped to elements in that XML file.
What I do: Extract content from PDF files(using pdftotext), process that content and retrieve specific information(example: PDF's first page/line has information about the medicine, research stage). Information retrieved(medicine/research stage) needs to be indexed and one should be able to search/sort/facet.
I can create a XML file with information retrieved(lets call this as metadata file). Now assuming my schema would be
<field name="medicine" type="text" stored="true" indexed="true"/>
<field name="researchStage". ../>
Is there a way to put this metadata file and the PDF file in Solr?
What I have tried:
Based on a suggestion in archives, I zipped these files and gave to ExtractRequestHandler. I was able to put all the content in SOLR and make it searchable. But it appear as content of zip file.(I had to apply some patches to Solr Code base to make this work). But this is not sufficient as the content in metadata file is not mapped to field names.
curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=#file.zip"
I tried to work with DataImportHandler(binURLdatasource). But I don't think I understand how it works. So could not go far.
I thought of adding metadata tags to PDF itself. For this to work, ExtractrequestHandler should process this metadata. I am not sure of that either.
So I tried "pdftk" to add metadata. Was not able to add custom tags to it. It only updates/adds title/author/keywords etc. Does anyone know similar unix tool.
If someone has tips, please share.
I want to avoid creating 1 file(by merging PDF text + metadata file).
Given a file record1234.pdf and metadata like:
<metadata>
<field1>value1</field1>
<field2>value2</field2>
<field3>value3</field3>
</metadata>
Do the programmatic equivalent of
curl "http://localhost:8983/solr/update/extract?
literal.id=record1234.pdf
&literal.field1=value1
&literal.field2=value2
&literal.field3=value3
&captureAttr=true&defaultField=text&capture=div&fmap.div=foo_txt&boost.foo_txt=3&" -F "tutorial=#tutorial.pdf"
Adapted from http://wiki.apache.org/solr/ExtractingRequestHandler#Literals .
This will create a new entry in the index containing the text output from Tika/Solr CEL as well as the fields you specify.
You should be able to perform these operations in your favorite language.
the content in metadata file is not mapped to field names
If they dont map to a predefined field, then use dynamic fields. For example you can set a *_i to be an integer field.
I want to avoid creating 1 file(by merging PDF text + metadata file).
That looks like programmer fatigue :-) But, do you have a good reason?