How to create a searchable central repository of code documentation using DocFx - documentation

I'm looking to create a central repository for all of our published API documentation using DocFx. I have documentation auto-generated via my build (using TFS) and published through my release (using Octopus) just fine for multiple individual sites. However, I'm wanting to pull it altogether in one location. The thinking is that through a parent site you could filter content in any of the individual sites without having to drill down into them. Do you have a recommendation on how to do this?
Also, within this same documentation repository I want to provide the capability to search by all of the meta data (project-level documentation) across the hundreds of projects in our portfolio. This will give our BA, DEV and QA teams easier access to what all our systems do. I like the "filtering" capability built into DocFx, but I'm wanting full-text search across all of the meta data. Do you have a recommendation for this functionality as well?

To change the location of the docfx output, edit the docfx.json file and specify the dest value. By default it is "dest": "_site". For more formatting guidance, reference: https://dotnet.github.io/docfx/tutorial/docfx.exe_user_manual.html.
Regarding full-text search, that is possible by simply ensuring the ExtractSearchIndex post-processor is invoked (in order to generate an index.json file of keywords) and that the global _enableSearch value is set to true in the docfx.json file. A snippet from that file would look like:
"postProcessors": [ "ExtractSearchIndex" ],
"globalMetadata": {
"_enableSearch": "true"
}

For your first question:
I think what you expect is like the .NET API Browser. The source code behind this page is not open to public, so you need create this page by yourself, through collecting xrefmap.yml from multiple sites, and extract the needed data into this page.
For your second question:
DocFX uses Luna to scan all the output files and generate an index file called index.json for later search use. In your case, you should want to limit the search scope only in the metadata you defined. This is also not supported by DocFX by default. You can also use Luna in your central place to search these meta. You can create your specific index.json for each project first, and the cental place to collect them for the search page.

Related

Search for issues that starts with xxx in Jira

This is my first time working with Jira and their API. My company wants me to fetch all "ASAPSD" issues, but I don't understand how to. The core problem in itself is that I do not understand exactly how Jira works, and how issues are "built" up.
The issue starts with "ASAPSD" followed by some random characters and numbers. For example "ASAPSD-334". How can I, with a GET request, get all issues that start with ASAPSD?
Basic information about Jira and projects
The first part (prefix) is the Project Key representing a project/collection where all similar issues are stored (in this case, ASAPSD may stand for ASAP Service Desk:-). There are certainly more projects in every Jira instance. Some other projects are intended to track different activities.
Searching for the project issues
You can search for any issues using the search function (available also via REST API).
First, log in to the Jira and try to search for the issues manually by yourself - in Issue Navigator (via Issues top menu bar). Here you'll find that you can search all issues via Basic (Project is ASAPSD) or Advanced search (project = ASAPSD). This advanced search is called JQL (Jira Query Language).
You can then use this JQL in your REST API search method:
https://docs.atlassian.com/software/jira/docs/api/REST/latest/#api/2/search-search
Example
GET https://jira.yourdomain.com/rest/api/2/search?jql=project%3DASAPSD
Notes
The output lists only limited number of issues (usually 50) - to get more issues, you need to increase the limit (maxResults param) or paginate (startAt param) over next results.
Using expand and fields params you can alter the output to get more/less information.
There two types of Jira instances - on-premise Server/Data Center and Cloud one. REST API and usage might slightly differ.
Alternatively, you can get CSV export. When you search for issues in the Issue Navigator, there's an option to export results to CSV. Save this URL and you can request it in your script via GET.

Implementing JSon-LD Schema in Ektron, is it possible?

This is my first time using Ektron and i'm trying to implement Json-LD schema scripts for each page. I have 68 scripts that I need to implement that are unique for each page.
I thought I would be able to implement these scripts through meta data, but now i'm unsure. Each script is over 1000 characters, the html and meta tag types only allow 500 characters, so i'm assuming i'm in the wrong place. If anyone could shed some light it would be much appreciated.
Ektron's metadata isn't intended for large chunks of data / content. So, yes, you will find limits there.
Here are two things you might try as workarounds.
Most direct:
Use the Ektron Library. Go to the Library tab and click on the Root node and view Properties. Add an extension to allow you to upload your JSON-LD as a file. Use metadata on the content item to reference the uploaded file. Combine the two upon output.
If you want the JSON-LD to be editable within the CMS...
Gaming the platform a bit
Create a new SmartForm definition and include in it a single plain-text, multi-line field (not Rich text). This should hold your JSON-LD. Set up a folder and, if your version supports it (you didn't specify CMS version, so I will assume relatively recent), set the folder to be non-searchable so these things don't come up in site search results. Add a restriction to the folder to only allow the Smart Form definition you just created. Create your JSON-LD there using the plain-text field. You should be able to store up to 1MB.
Same as above, add your JSON-LD as text then use a reference to this item from the content you want to use it.
The metadata in this case (and possibly the library one, though I'd have to test and I don't have an Ektron environment for development anymore) will give you the Content ID for the object holding your JSON-LD. You'll have to make another API call but will give you the solution you appear to want from above.

Creating an information repository referencing bot

I would like to create a bot. Someone would type "!123" the bot will search the repository for the value "123" and return(paste) the information found for that value back. I'd like this to be universal..meaning it can be used anywhere, so some sort of firefox plugin maybe.
Can someone provide me with information on where i can start?
I have an understanding of programming in c# and java.
P.s There is no intention for this to be some sort of spam bot, i just want to have a collection of information where people can easily reference it.
there are multiple portions to your project.
Bot that would crawl the data from the web and save the data in the db. (given you are considering to build your repository from web). Google Web Crawler/scraper for that.
Data extractor/Cleanser that would clean the data and extract relevant information about a particular document. (this is important so that you could tag the information for relevant information)
Then is the Search Engine part which enables you to retrieve relevant data from the repository. try vector similarity algorithm for that

Semantic store and entity hub

I am working on a content platform that should provide semantic features such as querying with SPARQL and providing rdf documents for the contained content.
I would be very thankful for some
clarification on the following
questions:
Did I get that right, that an entity
hub can connect several semantic
stores to a single point of access?
And if not, what is the difference
between a semantic store and an
entity hub?
What frameworks would you use to
store content documents as well as
their semantic annotation?
It is important for the solution to be able to later on retrieve the document (html page / docs such as pdf, doc,...) and their annotated version.
Thanks in advance,
Chris
The only Entityhub term that I know is belong to Apache Stanbol project. And here is a paragraph from the original documentation explaining what Entityhub does:
The Entityhub provides two main services. The Entityhub provides the
connection to external linked open data sites as well as using indexes
of them locally. Its services allow to manage a network of sites to
consume entity information and to manage entities locally.
Entityhub documentation:
http://incubator.apache.org/stanbol/docs/trunk/entityhub.html
Enhancer component of Apache Stanbol provides extracting external entities related with the submitted content using the linked open data sites managed by Entityhub. These enhancements of contents are formed as RDF data. Then, it is also possible to store those content items in Apache Stanbol and run SPARQL queries on top of RDF enhancements. Contenthub component of Apache Stanbol also provides faceted search functionality over the submitted content items.
Documentation of Apache Stanbol:
http://incubator.apache.org/stanbol/docs/trunk/
Access to running demos:
http://dev.iks-project.eu/
You can also ask your further questions to stanbol-dev AT incubator.apache.org.
Alternative suggestion...
Drupal 7 has in-built RDFa support for annotation and is more of a general purpose CMS than Semantic MediaWiki
In more detail...
I'm not really sure what you mean by entity hub, where are you getting that definition from or what do you mean by it?
Yes one can easily write a system that connects to multiple semantic stores, given the context of your question I assume you are referring to RDF Triple Stores?
Any decent CMS should be assigning documents some form of unique/persistent ID to documents so even if the system you go with does not support semantic annotation natively you could build your own extension for this. The extension would simply store annotations against the documents ID in whatever storage layer you chose (I'd assume a Triple Store would be appropriate) and then you can build appropriate query and presentation layers for querying and viewing this data as required.
http://semantic-mediawiki.org/wiki/Semantic_MediaWiki
Apache Stanbol
Do you want to implement a traditional CMS extended with some Semantic capabilities, or do you want to build a Semantic CMS? It could look the same, but actually both a two completely opposite approaches.
It is important for the solution to be able to later on retrieve the document (html page / docs such as pdf, doc,...) and their annotated version.
You can integrate Apache Stanbol with a JCR/CMIS compliant CMS like Alfresco. To get custom annotations, I suggest creating your own custom enhancement engine (maven archetype) based on your domain and adding it to the enhancement engine chain.
https://stanbol.apache.org/docs/trunk/components/enhancer/
One this is done, you can use the REST API endpoints provided by Stanbol to retrieve the results in RDF/Turtle format.

How is the Trac Project List page customised?

We've been using Trac for a while now for our developers only. However we are now opening it up for our (internal) clients. We have a project listing page (based on the default one that comes with Trac). What we'd like to do, is display more information about the project than what is currently available.
I have searched google and here, to see if I can find how to get more information. There seems to be a variable called $project which has .name, .description and .href as attributes.
Is there somewhere, a list of the attributes available? Or perhaps a different solution altogether that will allow us to display more information on the project list page. Such as the number of open tickets etc.
As far as I known, you can use $project.env as well. It is an object, which provides a number of attributes:
$project.env.base_url
$project.env.base_url_for_redirect
$project.env.secure_cookies
$project.env.project_name
$project.env.project_description
$project.env.project_url
$project.env.project_admin
$project.env.project_admin_trac_url
$project.env.project_footer
$project.env.project_icon
$project.env.log_type
$project.env.log_file
$project.env.log_level
$project.env.log_format
More detail is available at env.py
On the project page customization page there is not much variables, indeed. Looking at the source code there is also trac.version, trac.time, but that's all. There is also project.env that may hold more information. I do not have a multiproject setup at hand, so you might be interested to see for yourself what variables are available with TracDeveloper plugin. It dumps variables if enabled and you add debug=true in the URL.