RDFa reference documents - google-custom-search

I have to implement a Google Custom Search for a website. The website has different content types. One of them is a "publication". A publication consists of different fields:
Title
Author
Published date
Document
Document type
Document is the URL of a PDF, text, MS Word, etc. document, and Document type is, as you would expect, the document type (e.g. PDF, DOC, TXT).
I will need this information to be in the Rich Snippet because I have to format the search results differently for each document type (e.g. include a different icon).
What schema should I be using for that? I could not find information about how to structure data for that kind of content. Can I use anything from Schema.org? Or should I create my own? Any idea?
Thanks in advance for any input on that.

For customizing results snippets in Google Custom Search, it doesn’t matter which vocabulary you use in your RDFa. You could use an existing one (like Schema.org), or create your own, or use any combination of multiple vocabularies.
You can see the extracted structured data that can be used for this purpose in the Google Structured Data Testing Tool by clicking on "Custom Search Result Filters" (or by changing the results filter from "All data" to "Custom Search Engine").
You can fetch this structured data and create your own presentation layer.
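As a rough illustration of such a presentation layer, the sketch below assumes the results arrive as JSON with a per-result `pagemap` object (the shape the Custom Search JSON API uses) and that the page marks up each publication with a hypothetical `publication` type carrying a `documenttype` property. Those two names and the icon file names are invented for the example; the real keys depend entirely on your own markup.

```python
# Hypothetical mapping from document type to icon file; adjust to your site.
ICONS = {"pdf": "icon-pdf.png", "doc": "icon-word.png", "txt": "icon-text.png"}
DEFAULT_ICON = "icon-generic.png"


def icon_for_result(item):
    """Pick an icon for one search result based on its structured data.

    `item` is one entry of the `items` list in the JSON response. The
    `publication` / `documenttype` keys are assumptions for this sketch.
    """
    publications = item.get("pagemap", {}).get("publication", [])
    for pub in publications:
        doc_type = pub.get("documenttype", "").strip().lower()
        if doc_type in ICONS:
            return ICONS[doc_type]
    return DEFAULT_ICON


# Made-up sample shaped like a Custom Search JSON API response.
sample_response = {
    "items": [
        {"title": "Annual report", "pagemap": {"publication": [{"documenttype": "PDF"}]}},
        {"title": "About us"},
    ]
}

icons = [icon_for_result(item) for item in sample_response["items"]]
```

Results without any matching structured data fall back to the generic icon, so pages that are not publications still render.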

Related

Search keyword in PDF and check if exists

The idea is that, as soon as I receive a mail with a PDF attached, the PDF is downloaded and searched for a specific keyword (for instance, my name). If the keyword appears on any page of the PDF, another mail is sent notifying the user that there’s a PDF in which they have been named.
This is to avoid having to check dozens of mails and PDFs daily just to see whether your name is in them.
I managed to do this using Zapier, but I relied on PDFco’s API for the search, and it is payware, so I’m taking a different approach.
My question is mainly: which library could run that search inside the PDF and return a Boolean value indicating whether the keyword exists?
Thank you!
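One possible approach, sketched here under assumptions: extract the text page by page with an open-source library and run a case-insensitive substring check. The sketch assumes the third-party pypdf package (`PdfReader` and `page.extract_text()`) is installed; the helper names are made up for illustration, and scanned PDFs without a text layer would need OCR first.

```python
import re


def contains_keyword(page_texts, keyword):
    """Return True if `keyword` occurs in any page text (case-insensitive).

    Whitespace is collapsed first, since extracted PDF text often breaks
    a name across lines.
    """
    needle = " ".join(keyword.split()).casefold()
    if not needle:
        return False
    for text in page_texts:
        haystack = re.sub(r"\s+", " ", text or "").casefold()
        if needle in haystack:
            return True
    return False


def pdf_contains_keyword(path, keyword):
    """Check a PDF file on disk; requires the third-party `pypdf` package."""
    from pypdf import PdfReader  # imported lazily so the helper above works without it

    reader = PdfReader(path)
    return contains_keyword((page.extract_text() for page in reader.pages), keyword)
```

Keeping the matching logic separate from the PDF reading makes it easy to swap in a different extraction library later.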

Implementing JSON-LD Schema in Ektron, is it possible?

This is my first time using Ektron and I'm trying to implement JSON-LD schema scripts for each page. I have 68 scripts to implement, each unique to its page.
I thought I would be able to implement these scripts through metadata, but now I'm unsure. Each script is over 1000 characters, and the HTML and meta tag types only allow 500 characters, so I'm assuming I'm in the wrong place. If anyone could shed some light, it would be much appreciated.
Ektron's metadata isn't intended for large chunks of data / content. So, yes, you will find limits there.
Here are two things you might try as workarounds.
Most direct:
Use the Ektron Library. Go to the Library tab and click on the Root node and view Properties. Add an extension to allow you to upload your JSON-LD as a file. Use metadata on the content item to reference the uploaded file. Combine the two upon output.
If you want the JSON-LD to be editable within the CMS...
Gaming the platform a bit
Create a new SmartForm definition and include in it a single plain-text, multi-line field (not Rich text). This should hold your JSON-LD. Set up a folder and, if your version supports it (you didn't specify CMS version, so I will assume relatively recent), set the folder to be non-searchable so these things don't come up in site search results. Add a restriction to the folder to only allow the Smart Form definition you just created. Create your JSON-LD there using the plain-text field. You should be able to store up to 1MB.
As above, add your JSON-LD as text, then reference this item from the content that should use it.
The metadata in this case (and possibly in the library case, though I'd have to test and I don't have an Ektron development environment anymore) will give you the Content ID of the object holding your JSON-LD. You'll have to make another API call, but this gives you the solution you appear to want.
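Whichever storage option you pick, the final step is the same: take the stored JSON-LD text and wrap it in a script element when the page is rendered. The sketch below shows that step in Python purely for illustration (an Ektron site would do this in its .NET template code); the function name is hypothetical.

```python
import json


def render_jsonld_script(raw_jsonld):
    """Validate stored JSON-LD text and wrap it in a <script> element."""
    data = json.loads(raw_jsonld)  # raises ValueError if the stored text is not valid JSON
    body = json.dumps(data, ensure_ascii=False)
    # Prevent a literal "</script>" inside a string value from closing the element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"


stored = '{"@context": "https://schema.org", "@type": "Article", "headline": "Hello"}'
html = render_jsonld_script(stored)
```

Parsing the text before output catches a broken script at render time instead of silently emitting invalid markup.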

Using the TBS (or equivalent parameter) in Google Custom Search

When using Google Custom Search the TBM parameter for selecting specific types of search engines (e.g. tbm=pts for patents or tbm=blg for blogs) seems to be supported even though this isn't properly documented in the list of parameters.
However, when using such "special" searches one usually extends the query with the TBS parameter. Unfortunately, this doesn't work for me. For example:
https://www.google.com/?tbm=pts&gws_rd=ssl#q=touch+screen&tbas=0&tbs=ptss:g,ptso:us&tbm=pts
filters correctly when posted from a browser.
But the equivalent custom search:
https://www.googleapis.com/customsearch/v1?key=*MY_KEY*&cx=*MY_ENGINE*&tbm=pts&q=lice+comb&tbs=ptss:g
Completely ignores the TBS parameter.
Is there a way to specify an equivalent parameter in custom search?
Set image search on in the Control Panel to generate a separate Image tab (can set to default). If you're editing an existing CSE, it's under Search Features in left-hand column, then click Imagesearch Settings. You can edit settings, but specifying image type (e.g., Faces) isn't one of them - also see https://support.google.com/customsearch/answer/2630972?hl=en&ref_topic=4513743 and https://support.google.com/customsearch/answer/3037004?hl=en&topic=3036698
If you want to do additional things like that, you're probably better off creating a series of bookmarklets (easily shareable with a free tool like https://w-shadow.com/bookmarklet-combiner/ ) or via your own JS-enhanced webpage where you can build in all those parameters.
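If you go the route of building the parameters yourself, the sketch below assembles a regular Google search URL with the `tbm` and `tbs` values from the question. It is shown in Python for illustration only; a bookmarklet or web page would do the same in JavaScript. The `build_search_url` helper is a made-up name, and since these parameters are undocumented, their behavior may change.

```python
from urllib.parse import urlencode


def build_search_url(query, tbm=None, tbs=None):
    """Build a google.com search URL with optional tbm/tbs filters."""
    params = {"q": query}
    if tbm:
        params["tbm"] = tbm
    if tbs:
        params["tbs"] = tbs
    # Keep ':' and ',' readable, matching the URL form used in the question.
    return "https://www.google.com/search?" + urlencode(params, safe=":,")


url = build_search_url("touch screen", tbm="pts", tbs="ptss:g,ptso:us")
```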

How to add a custom field provider to MS Word?

Foreword: I want to allow users to define high-quality document templates, then inject data from our information system into them and print the result. I think MS Word is a great starting point, because this work is aimed at business letters etc., not data reports.
Question: Is it possible to add a custom field provider to MS Word?
I don't have English MS Word, so I must try to describe what I mean in a few sentences. Normally we can insert "fields" like author name, current date etc. These fields work seamlessly. We can switch the view of fields between data and definition. The definition of the author field looks like this: { AUTHOR \* MERGEFORMAT }.
Now I want to inject external data into documents and let the user specify where to put it. A user should define a document template and mark the spots where external data should be injected. Since Word users generally aren't IT experts, the easiest way for them is to use some macros or the "insert field" option to do it. So I want to define my own set of fields and connect Word to my custom field and data provider. How do I do it? I am unable to find any documentation on this.
I think this approach is better than using an SQL database connection or something like that, because I want the external source, not the docx document itself, to define the list of known fields and their values. Also, the data source won't actually be an SQL database.
Yes you can do this by using Custom Document Properties as placeholders and then use some VBA code to set those properties to whatever you want. You can get the data via ODBC or from an Excel spreadsheet or from a text file.
First of all, experiment manually by going into File, Properties and creating some custom properties. Give them a value and then, in the document, insert some DocProperty fields. If you can't find DocProperty in your language version of Word then look through a list of the fields like Author etc. Since field names are visible to end users they might have been translated.
Then in order to complete your document template, create a VBA function that uses SetProperty. Read this article for more details. It is up to you whether the VBA is triggered by opening the file or whether you add a menu item to do that.
No need for special controls or any commercial add-ons. I'm going to add a VBA tag to your question since this is really a VBA programming question. In fact, this has been possible since Word for Windows 1.0.
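Using SetProperty in VBA is a bit more complex now. The following snippet, which I got from this forum posting, is C# rather than VBA: it sets a custom document property through late binding (reflection). It presumably expects wdDoc to be an open Word document, with propName and propValue holding the property name and its new value.
// Requires: using System; using System.Reflection;
object docProps = wdDoc.CustomDocumentProperties;
Type docPropsType = docProps.GetType();
// Look up the existing property by name.
object Prop = docPropsType.InvokeMember("Item",
    BindingFlags.Default | BindingFlags.GetProperty,
    null, docProps,
    new object[] { propName });
Type PropType = Prop.GetType();
// Assign the new value through the collection's indexed Item property.
PropType.InvokeMember("Item",
    BindingFlags.Default | BindingFlags.SetProperty,
    null, docProps,
    new object[] { propName, propValue });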
Absolutely, this is the exact kind of scenario that Content Controls and CustomXMLParts were built for (Word 2007/2010 only, not earlier .doc format).
Most of the Word Developer Center home page deals with these two: Content Controls and CustomXMLParts. If you go this route, you'll find the Word Content Control Toolkit an invaluable resource as well, especially when just starting out.
From an end-user perspective, it could be as simple as just creating buttons on the Ribbon for insertable Content Controls via a template or document add-in (VSTO or VBA).
If you want a fairly decent prebuilt solution, check out Windward reports.
Yeah, the name makes it sound like a reporting tool, but in reality, it's exactly what you're describing. They have a Word add-in that allows users to easily mark up a Word doc with fields to be inserted from your data source.
I built a very similar system for a law firm. Windward didn't do quite everything I needed it to do, but at the same time, it's pretty powerful.

Is it possible to store document fields in a SearchKit index?

Is it possible in SearchKit (on OS X) to actually store document fields in the search index, as it is possible in some other search libraries? This would allow for quickly displaying certain fields (author, subject, date, etc) without having to read the files corresponding to the underlying documents as the result of a search.
Do you just want to be able to read the attributes or do you want to be able to search against them? If you want to be able to read them, you can add them as attributes to your document via the SKIndexSetDocumentProperties call. You then retrieve them with the SKIndexCopyDocumentProperties call. In both cases, the properties are stored in a dictionary.
This assumes you know how to read the properties from the file and you're not just blindly relying on the Search Kit / Spotlight importers.
If you want to be able to search against the properties, you're probably going to need a search index for the properties you want to search.