How to delete one of the persistent volume claim attached to one Storage class - azure-storage

I created a 'file share' in the storage account and connected it to my Kubernetes cluster (AKS). I have connected as a dynamic file share following the example from this link:
https://github.com/HoussemDellai/aks-file-share/tree/main/dynamic-file-share.
Thus, I created a Storage Class. And then I made several Persistent Volume Claims in each required namespace.
I added Persistent Volume Claim to my deployment for each application and everything works fine. All my applications in all the necessary namespaces see the shared folder and can write and read to it.
Now I want to remove or detach this storage for one of the namespaces. If I delete one of the Persistent Volume Claims, then the entire file share in the storage account is deleted. But I, for example, need to delete all links and references to this resource in only one namespace. What I should do?
And the second question, after I deleted, for example, all references in the namespace to a file share, after a while if I want to create it again, what steps should I take here?
Once again I will emphasize the essence of the question, the main thing in all this is to remove the link somehow, Persistent Volume Claim or something else to the file share in only one namespac, so that in all the rest all the referring resources and Persistent Volume Claim continue to work correctly with the same 'file share'!

For your first question, I think you just need to change the reclaimpolicy of your storageclass from default ( which is Delete) to Retain.
See https://faun.pub/kubernetes-how-to-set-reclaimpolicy-for-persistentvolumeclaim-7eb7d002bb2e for more explanation.

Related

When AEM is configured to use a S3 data store will it make blue-green deployments faster?

Background
We know it's possible to setup a devops pipeline that deploys updates to AEM via a blue/green approach by using crx2oak to migrate the content from old to new environment. Why is out of scope of this question.
The problem with this approach is the content copy operation can take a significant time, as the amount of content in the JCR grows. Other ideas to mittigate this are appreciated.
We also know that AEM can have a S3 datastore that off-loads the binary content into a S3 bucket which would not be re-built during blue/green deployment as per:
https://helpx.adobe.com/experience-manager/6-3/sites/deploying/using/storage-elements-in-aem-6.html#OverviewofStorageinAEM6
What is unclear from Adobe's documentation is whether the same S3 bucket can be shared across AEM instances (i.e. blue/green instances). Maybe it's just my google fu that has failed...
Question(s)
When a new AEM instance is configured to use a S3 datastore that already has content in it from the old instance, when crx2oak is used to migrate content, will the new instance be able to access the existing content?
Are there any articles/blogs that describe what the potential time savings of this approach would be?
Yes I could do an experiment, and may do so in the future to answer my own question. I'm looking for information from anyone who has already done this? I'm an engineer so will not re-invent the wheel if someone else has done so.
You can certainly share the same S3 bucket between instances - in fact, this is commonly used along with binary-less replication from author->publisher(s) and is a tried and true configuration.
It's even possible to share the same bucket between completely different environments (e.g. DEV/STAGE, or BLUE/GREEN in your case). The main "gotcha" to be aware of is with regard to DataStore Garbage Collection (DSGC) because it's very possible that there will be blobs which are referenced by only some of the instances sharing the bucket and so when purging unused blobs this needs to be taken into account.
This is all part of the design though, and there is a flag designed specifically for this purpose which tells DSGC to only execute the first phase (the "mark" phase) of GC, and skip the 2nd "sweep" phase, until all instances have marked which blobs they wish to keep/discard. Once all instances have done so the sweep phase can be run to purge blobs not needed by any instances using the bucket.
For a more detailed explanation see the Oak docs:
https://jackrabbit.apache.org/oak/docs/plugins/blobstore.html#Shared_DataStore_Blob_Garbage_Collection_Since_1.2.0
I find it helps to understand that pretty much all of the datastore implementations are done such that blobs are stored according to their checksum, so the same file added uploaded twice will only have one copy stored in the datastore, and there will be two segment store records referencing that same blob. In the same way, multiple AEM instances sharing the same bucket will be able to find a given blob regardless of which instance put it there in the first place.
You can observe see this in action easily with FileDataStore by finding a blob and sha256'ing it - e.g. (this example is on OS X, the checksum command on Linux/Windows will be slightly different):
$ shasum -a256 crx-quickstart/repository/datastore/0c/9e/40/0c9e405fc8d0f0405930cd0044611cfbf014938a1837ae0cfaa266d7732d1002
0c9e405fc8d0f0405930cd0044611cfbf014938a1837ae0cfaa266d7732d1002 crx-quickstart/repository/datastore/0c/9e/40/0c9e405fc8d0f0405930cd0044611cfbf014938a1837ae0cfaa266d7732d1002
There you can see that a) the filename is the checksum, and b) it's nested using the first 3 pairs of characters from that checksum, so you can locate the file by just knowing the hash and if you store the same binary, even if the name or JCR metadata is different, the blob referenced will be the same literal file on disk.
From memory S3 datastore uses prefixes rather than directory nesting because this performance better, but the principle is the same.
Finally, a couple of things to consider are:
1) S3 storage is relatively cheap (and practically unlimited) so there is an argument to be made that it's not as necessary to perform regular DSGC unless you're really trying to pinch pennies.
2) If you do run DSGC you need to think about how this will work with whatever backup strategy you're using for the AEM instances. For instance, if you roll back a segment store shortly after running DSGC you'll likely have to recover some of those purged blobs. You can use versioning and/or lifecycle rules to help with this, but it can add significant additional complexity and time to your restore process.
If you opt to simply skip DSGC and leave the blobs there indefinitely it's a good idea to make sure the access key or IAM roles AEM is using doesn't have the DeleteObject permission for the bucket, just to be sure a rogue GC process can't delete anything.
Hope this helps.
Edit
In all that I forgot to actually answer your question - yes it will save some time in cloning in most cases. You'll still need to sync the segment store (obviously) and there are various approaches for this. crx2oak is certainly one - you'll see in the documentation there are specific options for using it w/ S3 where you supply a configuration file (basically a serialised .config file like you'd use with Felix/OSGi).
You can also use something like rsync to simply copy the TAR files over (while at least the target AEM is stopped. Oak is generally atomic so a hot copy from the source can work in theory, but YMMV).
Finally you could obviously use Mongo and cluster the segment store that way, but all the usual cost/complexity/performance issues with doing so apply).
Another interesting development on the horizon for blue/green type is the CompositeNodeStore - there is a good talk from the 2017 adaptTo() conference that talks about this:
https://adapt.to/2017/en/schedule/zero-downtime-deployments-for-the-sling-based-apps-using-docker.html
An external datastore will help a lot, as usually the most space is used by binary assets. The pure content typed in by real people is much less.
On my current project (quite small, but relations should be normal):
Repository 4,8 GB total (4.1 GB Segment Store, 780 MB Index)
File DataStore 222 GB total
If you wanna do it, I have the following remarks:
There are different datastores available. For testing I would start with the File DataStore.
The S3 DataStore makes only sense in my point of view, if you are hosting at Amazons AWS anyway. Adobe Managed Services is doing this, and so S3 makes sense for them. But also there only if you have more than 500 GB assets.
If you use the green/blue approach, then be careful the DataStore garbage collection (just do it manually). The shared Datastore is meant for several publishers, that have the same content. As example you could have the following situation: Your editors delete some assets, you run the DataStore GC and finally your rollback your environment. That means the assets are still in the content repository, but the binaries are cleaned out of the DataStore.
In order to to use a shared file datastore, you need to do the following:
Unpack Quickstart java -jar AEM_6.3_Quickstart.jar -unpack
Create an directory for the file datastore (anywhere outside of the crx-quickstart folder)
Create a directory install inside the extracted crx-quickstart folder
Create a file called org.apache.jackrabbit.oak.plugins.blob.datastore.FileDataStore.cfg inside this install folder
This file contains just 1 line path=<path to file datastore> (see https://jackrabbit.apache.org/oak/docs/osgi_config.html)
Place a reference.key file inside the datastore directory. First time it will be created automatically. But if you use always the same key, the same hash-values are used all datastores across all your environments. This is also a prerequisite for a feature called "binary-less replication" (so binary would only be replicated the first time between author and publisher)
kind regards,
Alex

CloudTrail RunInstances event, who actually provisioned EC2 instance when STS AssumeRole used?

My client is in need of an AWS spring cleaning!
Before we can terminate EC2 instances, we need to find out who provisioned them and ask if they are still using the instance before we delete them. AWS doesn't seem to provide out-of-the-box features for reporting who the 'owner'/'provisioner' of an EC2 instance is, as I understand, I need to parse through gobs of archived zipped log files residing in S3.
Problem is, their automation is making use of STS AssumeRole to provision instances. This means the RunInstances event in the logs doesn't trace back to an actual user (correct me if I'm wrong, please please I hope I am wrong).
AWS blog provides a story of a fictional character, Alice, and her steps tracing a TerminateInstance event back to a user which involves 2 log events: The TerminateInstance event and an event "somewhere around the time" of an AssumeRole event containing the actual user details. Is there a pragmatic approach one can take to correlate these 2 events?
Here's my POC that's parsing through a cloudtrail log from s3:
import boto3
import gzip
import json
boto3.setup_default_session(profile_name=<your_profile_name>)
s3 = boto3.resource('s3')
s3.Bucket(<your_bucket_name>).download_file(<S3_path>, "test.json.gz")
with gzip.open('test.json.gz','r') as fin:
file_contents = fin.read().replace('\n', '')
json_data = json.loads(file_contents)
for record in json_data['Records']:
if record['eventName'] == "RunInstances":
user = record['userIdentity']['userName']
principalid = record['userIdentity']['principalId']
for index, instance in enumerate(record['responseElements']['instancesSet']['items']):
print "instance id: " + instance['instanceId']
print "user name: " + user
print "principalid " + principalid
However, the details are generic since these roles are shared by many groups. How can I find details of the user before they Assumed Role in a script?
UPDATE: Did some research and it looks like I can correlate the Runinstances event to an AssumeRole event by a shared 'accessKeyId' and that should show me the account name before it assumed a role. Tricky though. Not all RunInstances events contain this accessKeyId, for example, if 'invokedby' was an autoscaling event.
Direct answer:
For the solution you are proposing, you are unfortunately out of luck. You can take a look at http://docs.aws.amazon.com/IAM/latest/UserGuide/cloudtrail-integration.html#w28aac22b9b4b7b3b1. On the 4th row, it says that the Assume Role will save the Role identity only for all subsequent calls.
I'd contact aws support to make sure of this as I might very well be mistaken.
What I would do in your case:
First, wait a couple of days in case someone had a better idea or I was mistaken and aws support answers with an out-of-the-box solution
Create an aws config rule that would delete all instances that have a certain tag. Then tell your developers to tag all instances that they are sure that should be deleted, then these will get deleted
Tag all the production instances and still needed development instances with a tag of their own
Run a script that would tag all of the untagged instances with a separate tag. Douple and triple check these instances.
Back up and turn off the instances tagged in step 3 (without
deleting the instances).
If someone complained about something not being on, that means they
missed an instance in step 1 or 2. Tag this instance correctly and
turn it on again.
After a while (a week or so), delete the instances that are still
stopped (keep the backups)
After a couple months, delete the backups that were not restored
Note that this isn't foolproof as it has the possibility of human error and possible downtime, so double and triple check, make a clone of the same environment and test on that (if you have a development environment that already has such a configuration, that would be the best scenario), take it slow to be able to monitor everything, and be sure to keep backups of everything.
Good luck and plzz tell me what your solution ended up being.
General guidelines for the future:
Note: The following points are very opiniated, and are general rules that I abide by as I find them saving me a load of trouble from time to time. Read them, dismiss what you find as unfit for you and take the things that you find reasonable.
Don't use assume role that often as it obfuscates user access. In case it was a script run on the developer's pc, let it run with their own username. If it's running on a server, keep it with the role it was created in. The amount of management will be less that way as you just cut the middle-man (the assume-role) and don't need to create roles anymore, just assign the permissions to the correct group/user. Take a look below for when I'd consider using the assume-role as a necessity.
Automate deletions. The first things you should create is automating the task of keeping the aws account as clean as possible as this would save both $$$ and debugging pain. Tags and scripts to act on these tags are very powerful tools. So if a developer needs an instance for a day to try out something new, he can create a tag that times the instance out, then there is a script that cleans it up when the time comes. These are project-specific, and not everyone needs all of these, so see and assess what you need for your project and act on them.
What I'd recommend is giving the permissions to the users themselves in the development environment as it would make tracking things to their root and finding the most knowledgeable person to solve things easier. As of the production environment, everything should be automated anyway (creation when needed and deletion when no longer needed) and no one should have any write access to that account, ever.
As for the assume-role, I only use it in case I want to give access to read-only production logs on another account. Another case would be something that really shouldn't be happening that often, if at all, but still need to give some users access to it. So, as an extra layer of protection against the 'I did it by mistake', I make them switch role to do it, and never have a script that automatically switches roles and do the action in an attempt to make it as deliberate as possible (think deleting a database and such). Another thing would be accessing sensitive information (credit-card database, etc.). Many more scenarios can occur, and here it comes to your judgement.
Again, Good Luck.

Delete all Storage Files in Backand

I need to delete all the storage files in the backand...
In Panel > Hosting > Storage Filess - it's possible to remove only one per.
In Actions it's possible to remove only one per to.
files.delete(**filename**);
When you delete an app from Backand, it will also delete any stored files for your application. There should be no additional actions needed on your part. If you still have concerns, you can use the command-line interface to point your hosting files at an empty directory, which will have a side effect of deleting the existing files. More information is available at http://docs.backand.com/en/latest/getting_started/hosting/index.html

Who can check in and deliver to the stream in a repo workspace? Only the owner?

In the RTC only owner of the Repository work space can check in the code ?
Other members can only take updates by changing the flow target.
Can I provide access for more than one user in a repository work space (Multiple owners) to deliver the code.
If i create a scoped work space , other members can only view the changes and take updates from it. Is it correct?
In the RTC only owner of the Repository work space can check in the code ?
Yes, the idea is for an owner to not have the surprise, when coming in the morning, to discover his/her own repo workspace with new checkins or having done new deliver without his/her knowledge.
Can I provide access for more than one user in a repository work space (Multiple owners) to deliver the code.
Not to deliver: you need to accept change set from that repo workspace, and publish said change sets from your very own repo workspace (that I recommend to be a scoped one, not a private one)
If i create a scoped work space , other members can only view the changes and take updates from it. Is it correct.
Yes, but the natural flow is to deliver to the stream, or to accept change set from the stream (only one source).
You change that flow target to a colleague repo workspace only if you need it (ie, not every morning, but in the rare occurrence when that colleague has committed some interesting change set but has not deliver them to the stream yet).
The rest of the time, you sync your own (scoped) repo workspace only from the Stream.

Streaming from iCloud?

Is it possible to recreate a scenario like itunes match with iCloud APIs available to-date (i am writing/editing this in april 2012) .. specifically..I mean something like this>
A user creates a media document (audio or video) in my app...and it is automatically uploaded to (his) iCloud space. Then user decides to delete his local copy. But the app still shows the document in let say a table view...and if the user presses play..it begins to stream from iCloud..The user also is able to recreate (download) a local copy of the document. (Just to be sure, I understand the difference between the document I describe here and the concept of UIDocument).
If yes..how would I implement the transfer of the file ..let say a recording of 1 minute video..to the cloud? What folder would it be?
Apple docs state that there shouldnt be a distinction between where the data is stored. No sense of local copy of cloud copy. I think it is possible to do what you ask, but you're likely to get rejected. Use something like amazon AWS for hosting instead. You'd have more control over the files and unless you're going to have tons of users, you'll also qualify for the free tier.