Apache NiFi: Bootstrap UserGroups and Policies with a file-based provider

Is it possible to bootstrap UserGroups and Policies with a file based provider?
Currently we use org.apache.nifi.authorization.FileUserGroupProvider to bootstrap an Initial User Identity and org.apache.nifi.authorization.FileAccessPolicyProvider to bootstrap the Initial Admin Identity when setting up a NiFi instance.
I inspected the code of the FileUserGroupProvider as well as the Authorizers.xml Setup section in the Administration Guide, and I couldn't find anything about bootstrapping UserGroups. I guess the same goes for bootstrapping AccessPolicies using the FileAccessPolicyProvider. I know that it is possible using LDAP, but we don't use that right now.
I already found a similar question here on StackOverflow, but the solution is not satisfactory, as we don't want to use the nifi-api for that task unless absolutely necessary. So what I would do is write a new file-based UserGroupProvider and AccessPolicyProvider to fulfill that task.
Is that the only possibility?
Could I use the CompositeUserGroupProvider or the CompositeConfigurableUserGroupProvider for that? That way, instead of re-implementing the functionality of the FileUserGroupProvider, I could combine it with my custom implementation.
Meaning something like this:
<userGroupProvider>
    <identifier>composite-user-group-provider</identifier>
    <class>org.apache.nifi.authorization.CompositeUserGroupProvider</class>
    <property name="User Group Provider 1">org.apache.nifi.authorization.FileUserGroupProvider</property>
    <property name="User Group Provider 2">MyFileUserGroupProvider</property>
</userGroupProvider>
What would the configuration look like in the authorizers.xml file?
If my assumption about how to use a CompositeProvider is correct, is there something similar for bootstrapping Policies?

If I understand correctly, you want to automate setting users, groups, and policies to fixed, predefined values.
I would recommend using the FileUserGroupProvider and the FileAccessPolicyProvider, as those both give you the ability to configure users, groups, and policies directly in NiFi itself. You should not have to create custom implementations of a UserGroupProvider or AccessPolicyProvider unless you need to customize the functionality beyond what the included file-based providers can supply.
You said you did not want to use the nifi-api, by which I assume you mean the HTTP REST API. (I am not trying to be pedantic; there is actually a library called nifi-api, which is a collection of Java interfaces for NiFi developers to use when writing extensions.) The REST API is a good option that I would normally recommend, as there are backwards-compatibility guarantees for NiFi 1.x going forward, but it is not the only way to achieve what you want to do.
You can create the users.xml and authorizations.xml files manually (or scripted), outside of NiFi; you just have to configure the FileUserGroupProvider and FileAccessPolicyProvider to use those files (or copy them to the default location for those files in the conf directory). On startup, NiFi reads the contents of these files into memory to create users, groups, and access policies. The Initial User and Initial Admin properties are only used to automate populating these files when they are absent or empty, so if you provide your own copies of these files, they will be used.
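For reference, pointing the providers at your own files in authorizers.xml could look like the sketch below (the identifiers are arbitrary; the property names are the ones the file-based providers use in NiFi 1.x):
    <userGroupProvider>
        <identifier>file-user-group-provider</identifier>
        <class>org.apache.nifi.authorization.FileUserGroupProvider</class>
        <property name="Users File">./conf/users.xml</property>
    </userGroupProvider>
    <accessPolicyProvider>
        <identifier>file-access-policy-provider</identifier>
        <class>org.apache.nifi.authorization.FileAccessPolicyProvider</class>
        <property name="User Group Provider">file-user-group-provider</property>
        <property name="Authorizations File">./conf/authorizations.xml</property>
    </accessPolicyProvider>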
The structure of these XML files is fairly simple. You can use a NiFi instance to create users, groups, and policies through the UI and see what is written to these files. You can then create them however you like: through the NiFi UI, by hand, or scripted from another source file. Once you have the files created, you can do the "bootstrapping" part by placing them in the NiFi conf dir and (re)starting NiFi. NiFi does not regenerate or modify these files unless users, groups, or policies are modified in the UI.
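As a rough sketch of the shape those files take (the identifiers here are made up; NiFi generates UUIDs, and any consistent identifiers will work as long as the policies reference the same ones), users.xml looks like:
    <tenants>
        <groups>
            <group identifier="group-uuid-1" name="admins">
                <user identifier="user-uuid-1"/>
            </group>
        </groups>
        <users>
            <user identifier="user-uuid-1" identity="CN=admin, OU=NiFi"/>
        </users>
    </tenants>
and authorizations.xml looks like:
    <authorizations>
        <policies>
            <policy identifier="policy-uuid-1" resource="/flow" action="R">
                <user identifier="user-uuid-1"/>
                <group identifier="group-uuid-1"/>
            </policy>
        </policies>
    </authorizations>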
The only downside to this approach is that these files are not guaranteed to have a stable schema going forward, so new fields could be added or changed over time. That said, they have been stable for the last several versions of NiFi.

Related

How to create a searchable central repository of code documentation using DocFx

I'm looking to create a central repository for all of our published API documentation using DocFx. I have documentation auto-generated via my build (using TFS) and published through my release (using Octopus) just fine for multiple individual sites. However, I want to pull it all together in one location. The thinking is that through a parent site you could filter content in any of the individual sites without having to drill down into them. Do you have a recommendation on how to do this?
Also, within this same documentation repository I want to provide the capability to search all of the metadata (project-level documentation) across the hundreds of projects in our portfolio. This will give our BA, DEV, and QA teams easier access to what all our systems do. I like the "filtering" capability built into DocFx, but I want full-text search across all of the metadata. Do you have a recommendation for this functionality as well?
To change the location of the DocFx output, edit the docfx.json file and specify the dest value. By default it is "dest": "_site". For more formatting guidance, see: https://dotnet.github.io/docfx/tutorial/docfx.exe_user_manual.html.
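A minimal sketch of that setting in docfx.json (your build section will contain other entries as well):
    {
        "build": {
            "dest": "_site"
        }
    }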
Regarding full-text search, that is possible by simply ensuring the ExtractSearchIndex post-processor is invoked (in order to generate an index.json file of keywords) and that the global _enableSearch value is set to true in the docfx.json file. A snippet from that file would look like:
"postProcessors": [ "ExtractSearchIndex" ],
"globalMetadata": {
"_enableSearch": "true"
}
For your first question:
I think what you expect is something like the .NET API Browser. The source code behind that page is not public, so you would need to create such a page yourself, by collecting the xrefmap.yml from each of your sites and extracting the data you need into that page.
For your second question:
DocFX uses lunr.js to scan all the output files and generate an index file called index.json for later search use. In your case, you want to limit the search scope to only the metadata you defined, which DocFX does not support by default. You can, however, use lunr.js in your central place to search that metadata yourself: create a specific index.json for each project first, and have the central place collect them for the search page.
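As a rough sketch of that central index (a Node.js script using the lunr library; the project paths are hypothetical, and the exact layout of the index.json that DocFX emits may differ slightly, so adjust the flattening step to match):
    var lunr = require('lunr');

    // Flatten a DocFX index.json (an object keyed by page path, in the
    // versions I have seen) into an array of { href, title, keywords } records.
    function flatten(index) {
        return Object.keys(index).map(function (k) { return index[k]; });
    }

    // Hypothetical per-project index files collected into one array.
    var docs = flatten(require('./projectA/index.json'))
        .concat(flatten(require('./projectB/index.json')));

    // Build one combined index over all projects.
    var idx = lunr(function () {
        this.ref('href');
        this.field('title');
        this.field('keywords');
        docs.forEach(function (doc) { this.add(doc); }, this);
    });

    console.log(idx.search('workflow'));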

How to query all the assignments from the repository?

My environment:
Alfresco Share v5.2.d (r134641-b15, Aikau 1.0.101.3, Spring Surf 5.2.d, Spring WebScripts 6.13, Freemarker 2.3.20-alfresco-patched, Rhino 1.7R4-alfresco-patched, Yui 2.9.0-alfresco-20141223)
Alfresco Community v5.2.0 (r134428-b13) schema 10005
When I start the workflow, I can assign executors - the list of users who will participate in the business process. Somehow I need to get a list of all those users.
There is an excellent guide, which shows how to use Lucene to get a list of whitepapers.
To interact with the repository through REST, I also use the Web Script Framework MVC.
But how can I get the list of assignments?..
I'm interested in what the query would look like in this case:
...
var assignments = search.luceneSearch("what should be here?");
...
If you want to know who is assigned to a specific running Workflow instance, then I don't think you can easily do that via the search service. Instead, you want to be using the Alfresco WorkflowService to get that.
Most likely you'll want to grab the WorkflowInstance for the specific running workflow, grab the tasks, and check the properties on those.
If you look at WorkflowPermissionInterceptor from the Alfresco source tree, you'll see pretty much the logic you want, covering both individual assignees and group assignments.
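As a rough JavaScript sketch of that approach, assuming the workflow instance id is already known (workflowId below is a hypothetical input) and using the workflow root object of the Alfresco JavaScript API; double-check the property names against your version, since WorkflowPermissionInterceptor shows exactly which ones are checked:
    // Look up the running instance and walk its paths and tasks.
    var instance = workflow.getInstance(workflowId);
    var assignees = [];
    var paths = instance.getPaths();
    for (var i = 0; i < paths.length; i++) {
        var tasks = paths[i].getTasks();
        for (var j = 0; j < tasks.length; j++) {
            // Individual assignees are typically held in cm:owner;
            // pooled tasks keep their candidates in bpm:pooledActors instead.
            var owner = tasks[j].properties["cm:owner"];
            if (owner !== null) {
                assignees.push(owner);
            }
        }
    }
    logger.log("Assignees: " + assignees.join(", "));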

Alfresco permissions depending on whether document is currently part of workflow or not

Out-of-the-box, an Alfresco user can read a document based on:
The document's permissions
The user's role
The user's groups
Whether the user owns the document or not
Maybe some other factors I forgot?
Now, I want to add a new factor: Whether the document is currently part of a workflow.
Alfresco's permissionDefinitions.xml allows me to define permissions based on authorities such as ROLE_LOCK_OWNER etc, but it does not seem to be the right place to add permission conditions.
I guess I will have to write some Java source code, but I am not sure what classes are responsible for this, and whether there is an Alfresco way to customize them?
So, I assume you want nodes that are attached to a workflow to have different access rights? You need to think about the behavior you want in all of the UIs and protocols you are exposing (e.g. Share, WebDAV, CIFS, FTP, etc.).
If you want to set a permission on a node, you can do that via JavaScript as well as Java (see http://docs.alfresco.com/5.2/references/API-JS-setPermission.html and http://docs.alfresco.com/5.2/references/dev-services-permission.html). As was mentioned in one of the comments, you can also get the number of active workflows on a node by referencing the activeWorkflows property in JavaScript (http://docs.alfresco.com/5.2/references/API-JS-ScriptNode.html) or in Java.
Depending on the specifics, I might implement this in different ways, but if all you want to do is have the permission change, you could just update it at the beginning and end of your workflow with a simple JavaScript call. The only drawback is that this doesn't take the workflow being cancelled into consideration. You could also create a policy/behavior on an aspect you attach, or even have a rule or job run that updates content based on the activeWorkflows values. A minimal sketch of the start/end script approach follows.
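This is only a sketch, assuming an Activiti script listener context where bpm_package is available; the group GROUP_SiteReaders is hypothetical:
    // At workflow start: restrict each document in the workflow package.
    for (var i = 0; i < bpm_package.children.length; i++) {
        var doc = bpm_package.children[i];
        doc.setInheritsPermissions(false); // stop inheriting from the parent folder
        doc.setPermission("Consumer", "GROUP_SiteReaders"); // read-only while in flight
    }
    // At workflow end: undo it again.
    for (var i = 0; i < bpm_package.children.length; i++) {
        var doc = bpm_package.children[i];
        doc.removePermission("Consumer", "GROUP_SiteReaders");
        doc.setInheritsPermissions(true);
    }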

How to config multi repositories in one Alfresco instance?

How can I configure multiple repositories in one Alfresco instance?
For example, in alfresco-global.properties I would like to configure multiple repository locations:
dir1.root=\\server1\driver1\alf_data
...
dir2.root=\\server1\driver2\alf_data
...
dir3.root=\\server2\driver1\alf_data
And I can manage all these repositories in this Alfresco instance.
Benefits:
1) I can manage them in one Alfresco instance.
2) I can increase my storage capacity at any time by adding new repositories.
3) Improved search & index performance, as the data is spread across many different storage disks.
How to do that?
We can also track this issue at the official Alfresco forum.
You can just add content stores to Alfresco; take a look here: http://wiki.alfresco.com/wiki/Content_Store_Selector or here: http://docs.alfresco.com/4.1/topic/com.alfresco.enterprise.doc/concepts/store-manage-content.html
So basically, you're adding a new content store to Alfresco next to the default workspace://SpacesStore.
By adding an aspect to a content item you can move the content to the other location.
Probably you'll need to do some more stuff, but this will get you started.
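That aspect-driven move could look roughly like this in JavaScript (a sketch, assuming the content store selector is configured in your Spring context; the store name "archiveStore" is hypothetical and must match a store you have defined):
    // Assuming "document" is the node in scope (e.g. in a rule or action script).
    document.addAspect("cm:storeSelector");
    document.properties["cm:storeName"] = "archiveStore"; // target content store
    document.save();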
Alfresco does not have a multi-repository feature. You always have one repository, but:
you can add & manage different content stores, as Tahir mentioned
you can also use non-file-system content stores like EMC Centera, NetApp Filer...
you can also use elastic content stores like Caringo CAStor
you can enable multi-tenancy mode
Without extensive programming, you'll always have one central DB and one central search index for now.
You can do it with the content store selector, but I have heard it has been removed in the 4.2 Community release. I have yet to verify that.

Semantic store and entity hub

I am working on a content platform that should provide semantic features such as querying with SPARQL and providing rdf documents for the contained content.
I would be very thankful for some clarification on the following questions:
Did I get that right, that an entity hub can connect several semantic stores to a single point of access? And if not, what is the difference between a semantic store and an entity hub?
What frameworks would you use to store content documents as well as their semantic annotation?
It is important for the solution to be able to later on retrieve the document (html page / docs such as pdf, doc,...) and their annotated version.
Thanks in advance,
Chris
The only Entityhub term that I know of belongs to the Apache Stanbol project. Here is a paragraph from the original documentation explaining what the Entityhub does:
The Entityhub provides two main services. The Entityhub provides the connection to external linked open data sites as well as using indexes of them locally. Its services allow to manage a network of sites to consume entity information and to manage entities locally.
Entityhub documentation:
http://incubator.apache.org/stanbol/docs/trunk/entityhub.html
The Enhancer component of Apache Stanbol extracts external entities related to the submitted content, using the linked open data sites managed by the Entityhub. These enhancements of the content are formed as RDF data. It is then also possible to store those content items in Apache Stanbol and run SPARQL queries on top of the RDF enhancements. The Contenthub component of Apache Stanbol also provides faceted search functionality over the submitted content items.
Documentation of Apache Stanbol:
http://incubator.apache.org/stanbol/docs/trunk/
Access to running demos:
http://dev.iks-project.eu/
You can also ask your further questions to stanbol-dev AT incubator.apache.org.
Alternative suggestion...
Drupal 7 has built-in RDFa support for annotation and is more of a general-purpose CMS than Semantic MediaWiki.
In more detail...
I'm not really sure what you mean by entity hub; where are you getting that definition from, or what do you mean by it?
Yes, one can easily write a system that connects to multiple semantic stores. Given the context of your question, I assume you are referring to RDF triple stores?
Any decent CMS should be assigning some form of unique/persistent ID to documents, so even if the system you go with does not support semantic annotation natively, you could build your own extension for this. The extension would simply store annotations against the document's ID in whatever storage layer you chose (I'd assume a triple store would be appropriate), and then you can build appropriate query and presentation layers for querying and viewing this data as required.
http://semantic-mediawiki.org/wiki/Semantic_MediaWiki
Apache Stanbol
Do you want to implement a traditional CMS extended with some semantic capabilities, or do you want to build a Semantic CMS? It could look the same, but these are actually two completely opposite approaches.
It is important for the solution to be able to later on retrieve the document (html page / docs such as pdf, doc,...) and their annotated version.
You can integrate Apache Stanbol with a JCR/CMIS-compliant CMS like Alfresco. To get custom annotations, I suggest creating your own custom enhancement engine (there is a Maven archetype for that) based on your domain and adding it to the enhancement engine chain.
https://stanbol.apache.org/docs/trunk/components/enhancer/
Once this is done, you can use the REST API endpoints provided by Stanbol to retrieve the results in RDF/Turtle format.
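As a rough sketch of such a call against a locally running Stanbol instance (the host, port, and payload here are hypothetical; the output format is negotiated via the Accept header):
    curl -X POST -H "Content-Type: text/plain" -H "Accept: text/turtle" \
         --data "The Eiffel Tower is in Paris." \
         http://localhost:8080/enhancer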