What is the purpose of Agent User Id in configuring Replication Agent in AEM? - replication

As per my understanding it is user who has access to publish the specific page/resource.
Documentation goes like this:
Depending on the environment, the agent will use this user account to:
collect and package the content from the author environment
create and write the content on the publish environment
Leave this field empty to use the system user account (the account defined in sling as the administrator user; by default this is admin).
means this replication agent comes into action only when replicating the content from packagemenager(by clicking replicate for specific package) ? or activating the page/resource from siteadmin?

The Agent User ID property is used to manage what part of content tree will be replicated using given replication queue. This has nothing to do with actual package creation - it applies to all replication process.
Multitenant use case
For the complicated infrastructure it may happen that the multi-tenant architecture involves some sharding approach. Imagine a geo-spread architecture with no CDN involved where the brand site should be quickly accessible from the given localisation. Due to technical limitations, pushing whole content (all sites) around the world might not be acceptable.
Dedicated DAM environment use case
When DAM storage is shared across multiple AEM implementations it is often desired to dettach that from the regular authoring by creating a separated DAM-only instance. On such platform the replication agents should be configured to have the read access to /content/dam only in order not to mess up with other content trees.
Solution
In this case, the user agent ID can be configured to use a dedicated user permission scheme. All the changes the preconfigured user sees will be replicated to the corresponding endpoint. There are technical alternatives like implementing a transport handler (see https://github.com/Cognifide/CQ-Transport-Handler/blob/master/README.md)

Related

How to work with AWS Cognito in production environment?

I am working on an application in which I am using AWS Cognito to store users data. I am working on understanding how to manage the back-up and disaster recovery scenarios for Cognito!
Following are the main queries I have:
I wanted to know what is the availability of this stored user data?
What are the possible scenarios with Cognito, which I need to take
care before we go in production?
AWS does not have any published SLA for AWS Cognito. So, there is no official guarantee for your data stored in Cognito. As to how secure your data is, AWS Cognito uses other AWS services (for example, Dynamodb, I think). Data in these services are replicated across Availability Zones.
I guess you are asking for Disaster Recovery scenarios. There is not much you can do on your end. If you use Userpools, there is no feature to export user data, as of now. Although you can do so by writing a custom script, a built-in backup feature would be much more efficient & reliable. If you use Federated Identities, there is no way to export & re-use Identities. If you use Datasets provided by Cognito Sync, you can use Cognito Streams to capture dataset changes. Not exactly a stellar way to backup your data.
In short, there is no official word on availability, no official backup or DR feature. I have heard that there are feature requests for the same but who knows when they would be released. And there is not much you can do by writing custom code or follow any best practices. The only thing I can think of is that periodically backup your Userpool's user data by writing a custom script using AdminGetUser API. But again, there are rate limits on how many times you can call this API. So, backup using this method can take a long time.
AWS now offers a SLA for Cognito. In the event they are unable to meet their availability target (99.9% at the time of writing), you will receive service credits.
Even through there are couple of third party solutions available, when restoring a user pool users will be created using admin flow (users are not restored rather they will be created from an admin) and they will end up with "Force Change Password" status. So the users will be forced to change the password using the temporary password and that has to be facilitated from the front end of the application.
More info : https://docs.amazonaws.cn/en_us/cognito/latest/developerguide/signing-up-users-in-your-app.html
Tools available.
https://www.npmjs.com/package/cognito-backup
https://github.com/mifi/cognito-backup
https://github.com/rahulpsd18/cognito-backup-restore
https://github.com/serverless-projects/cognito-tool
Pls bear in mind that some of these tools are outdated and can not be used. I have tested "cognito-backup-restore" and it is working as expected.
Also you have to think of how to secure the user information outputted by these tools. Usually they create a json file containing all the user information (except the passwords as passwords can not be backed up) and this file is not encrypted.
The best solution so far is to prevent accidental deletion of user pools with AWS SCPs.

Periodic Email Notifications (Windows Azure .Net)

I have an application written in C# ASP.Net MVC4 and running on Windows Azure Website. I would like to write a service / job to perform following:
1. Read the user information from the website database
2. Build a user-wise site activity summary
3. Generate an HTML email message that includes the summary for each user account
4. Periodically send such emails to each user
I am new to Windows Azure Cloud Services and would like to know best approach / solution to achieve the above.
Based on my study so far, I see that independent Worker Role of Cloud Services along with SendGrid and Postal would be a best fit. Please suggest.
You're on the right track, but... Remember that a Worker Role (or Web Role) is basically a blueprint for a Windows Server VM, and you run one or more instances of that role definition. And that VM, just like Windows Server running locally, can perform a bunch of tasks simultaneously. So... there's no need to create a separate worker role just for doing hourly emails. Think about it: For nearly an hour, it'll be sitting idle, and you'll be paying for it (for however many instances of the role you launch, and you cannot drop it to zero - you'll always need minimum one instance).
If, however, you create a thread on an existing worker or web role, which simply sleeps for an hour and then does the email updates, you basically get this ability at no extra cost (and you should hopefully cause minimal impact to the other tasks running on that web/worker role's instances).
One thing you'll need to do, independent of separate role or reused role: Be prepared for multiple instances. That is: If you have two role instances, they'll both be running the code to check every hour. So you'll need a scheme to prevent both instances doing the same task. This can be solved in several ways. For example: Use a queue message that stays invisible for an hour, then appears, and your code would check maybe every minute for a queue message (and the first one who gets it does the hourly stuff). Or maybe run quartz.net.
I didn't know postal, but it seems like the right combination to use.

Weblogic work manager

I am new to weblogic server. I am using work manager. I want to know what is work manager and why we need it. What is the difference between normal request with out work manager and with work manager !!
I think the documentation is rather good on this subject.
WebLogic Server prioritizes work and allocates threads based on an
execution model that takes into
account administrator-defined
parameters and actual run-time
performance and throughput.
Administrators can configure a set of
scheduling guidelines and associate
them with one or more applications, or
with particular application
components. For example, you can
associate one set of scheduling
guidelines for one application, and
another set of guidelines for other
application. At run-time, WebLogic
Server uses these guidelines to assign
pending work and enqueued requests to
execution threads.
Essentially, with work managers you can attach a scheduling policy to an application to e.g. make sure that a specific application gets a fair share of the available computing resources under a heavy load situation. Or you might want to restict the maximum number of threads that will be allocated to an application to prevent a buggy/untested application to bring the whole application server to its knees. (But surely all apps have been tested not to do anything like that.... ;) )
Outside of modifying the default allocation algorithms, the Work Manager is also useful if you are using a Foreign JMS Provider (such as IBM MQ) and need to process more than 16 messages at a time.

Selective replication with CouchDB

I'm currently evaluating possible solutions to the follwing problem:
A set of data entries must be synchonized between multiple clients, where each client may only view (or even know about the existence of) a subset of the data.
Each client "owns" some of the elements, and the decision who else can read or modify those elements may only be made by the owner. To complicate this situation even more, each element (and each element revision) must have an unique identifier that is equal for all clients.
While the latter sounds like a perfect task for CouchDB (and a document based data model would fit my needs perfectly), I'm not sure if the authentication/authorization subsystem of CouchDB can handle these requirements: While it should be possible to restict write access using validation functions, there doesn't seem to be a way to authorize read access. All solutions I've found for this problem propose to route all CouchDB requests through a proxy (or an application layer) that handles authorization.
So, the question is: Is it possible to implement an authorization layer that filters requests to the database so that access is granted only to documents that the requesting client has read access to and still use the replication mechanism of CouchDB? Simplified, this would be some kind of "selective replication" where only some of the documents, and not the whole database is replicated.
I would also be thankful for directions to some detailed information about how replication works. The CouchDB wiki and even the "Definite Guide" Book are not too specific about that.
this begs for replication filters. you filter outbound replication based on whatever criteria you impose, and give the owner of the target unrestricted access to their own copy.
i haven't had the opportunity to play with replication filters directly, but the idea would be that each doc would have some information about who has access to it, and the filtering mechanism would then allow outbound replication of only those documents that you have access to. replication from the target back to the master would be unrestricted, allowing for the master to remain a rollup copy, and potentially multicast changes to overlapping sets of data.
What you are after is replication filters. According to Chris Anderson, it is a 0.11 feature.
"The current status is that there is
an API for filtering the _changes
feed. The replicator in 0.10 consumes
the changes feed, so the next step is
getting the replicator to use the
filter API.
There is work in progress on this, so
it should be fully ready to go in
0.11."
See the orginal post
Here is a new link to the some documentation about this:
http://blog.couchbase.com/what%E2%80%99s-new-apache-couchdb-011-%E2%80%94-part-three-new-features-replication
Indeed, as others have said, replication filters are the way to go for this. Here is a link with some information on using them.
One caveat I would add is that at scale replication filters can be extremely slow. More information about this and other nuances about couchdb can be found in this excellent blog post: "what every developer should know about couchdb". For large scale systems performing replication in the application layer has proven faster and more reliable.

What does LDAP solve?

I've been in touch with LDAP in many projects I've been involved in but, the truth be told, I don't really understand it. I thought it was just a person directory but after I discovered that it can contain any objects in a hierarchical structure.
I installed openldap in my box and I found many tutorials regarding just the installation.
What is LDAP? What are the scenarios where LDAP is the right choice? What are the LDAP concepts I should know for working with it? What are the advantages of LDAP? Is it used just because old applications used it? Is there a good doc anywhere on internet explaining all this questions?
UPDATE:
Complementing the answers I found this link which contains a quick start guide for LDAP newbie like me.
What is LDAP? What are the scenarios where LDAP is the right choice?
At its core, LDAP is a protocol for accessing objects that are suitable for storage in a directory. Whether something is "suitable" is an entirely subjective determination that's left up to implementers, but typically this means collections of many objects that each have infrequently (or never) updated data, where each object has an obvious or canonical way to be looked up:
a phone book (look up by name or by phone number)
titles in a library (look up by title, author, etc.)
tenants in a building (look up by floor, suite, name, etc.)
and so on.
Note that LDAP itself is just a protocol and doesn't provide any actual storage -- in much the same way, HTTP doesn't imply anything about whether you're using Apache, Jetty, Tomcat, Mongrel, et al. as a web server. (One problem with LDAP in general is the confusing reuse of names to mean different things. Wikipedia has a good section on this.)
DITs are a hierarchical description scheme that lend themselves to B-Tree algos very nicely, resulting in tremendous search performance in most cases. Directory Server like OpenDS return indexed searches in micro-seconds, whereas RDBMS systems are much slower. Directory Servers (often called LDAP servers) trade resources (RAM, CPU) for fast read response. RDBMS systems provide greater functionality in terms of management of data in question. Need speed with few or zero updates, simplicity, and small network protocol? Use a Directory Server. Need data management and mining capabilities, and/or high rate-of-change of the database with relational aspects defined between data? Use an RDBMS (MySQL is your best bet here).
LDAP has O(1) read performance, in exchange for O(something worse) write performance. It's ideal for data that's accessed frequently, but changed rarely - directories of people, machine names and addresses, and so on. (hence the acronym: Lightweight Directory Access Protocol.)
LDAP is the right choice where the pain of using a database that isn't relational, in terms of decreased developer familiarity and strange performance characteristics, is less than the gain of blindingly fast read access.
This link will explain LDAP http://blogs.oracle.com/raghuvir/entry/ldap
We use LDAP in our office for email address lookups company wide. We use it as a single source sign on service for our internal apps as well.
One perspective I like to harp on is LDAP is an app on top of a persistence store and a database is a persistence store. Both can be used to store user information.
LDAP gives you a hierarchy which is harder to do in a database. You can make a hierarchy in a database but it's harder to do things like delegation (these rows belong to you only) or ACLs on rows. So pushing security problems out of the database is easier if you use LDAP for storing user identities. Trying to solve it in the database is weird.
At the same time, LDAP is terrible for reporting against (transform LDAP to a DB for reporting). Storing attributes deep in the tree that need to be searched quickly can be problematic for performance (don't do this, have a DB on the side or try to flatten the query by redesigning your DIT). Storing attributes all over the place in a really deep DIT is just bad LDAP or system design but sometimes it's unavoidable if you're tied to a vendor product or legacy app.
LDAP is just a protocol, the wikipedia article explains it adequately http://en.wikipedia.org/wiki/Lightweight_Directory_Access_Protocol
Its a way to query an underlying organizational structure like Microsoft's Active Directory. You can use LDAP queries to get all kinds of information about users, use it for setting application rights, etc.
I am working part time and a full time student. My curriculum encourages (read requires) many group projects.
I have used openLdap and phpLdapAdmin to control access to my Subversion and Mercurial repos, Trac projects, Hudson, etc. It wasn't easy to install, but the time saved in administration was a God send.
If you have projects where you will have many groups of people who need to be able to use different resources, it is a good tool.
See this link :
http://www.umich.edu/~dirsvcs/ldap/doc/guides/slapd/1.html#RTFToC1
Which explains deeply LDAP :
For example you can see this image in that documentation ,
(source: dirsvcs at www.umich.edu)
LDAP is an access protocol; it only provides an API to the underlying technology for which you are trying to find applications - a directory service. OpenLDAP is one of the open source directory services; Sun has another implementation called OpenDS. Active Directory and Novell NDS are another two commonly seen in the field.
The directory can be used for storing information about any sort of resource, and the relationships between the resources - for example, rights of a user to a directory, a printer, or a network access device.
Is there a good doc anywhere on internet explaining all this questions?
IBM published an excellent Red Book about LDAP. The title is:
Understanding LDAP - Design and Implementation.
It can be downloaded from the previous link.
In one of my old workplaces we used LDAP as our primary user authentication system.
This in turn provided our various systems with information which dept. they belonged to, where they should mount their home directories, contact information, employee management.
Not necessarily controlled by LDAP, but other things that we had mixed to work through LDAP was the existence of SQL users, K4, samba and email account generation.