AWS Glue Access denied for crawler with administrator policy attached


I am trying to run a crawler across an S3 datastore in my account which contains two CSV files. However, when I try to run the crawler, no tables are loaded, and I see the following errors in CloudWatch for each of the files:
Error Access Denied (Service: Amazon S3; Status Code: 403; Error
Code: AccessDenied;
Tables created did not infer schemas from this file.
This is especially odd as the IAM role has the AdministratorAccess policy attached, so there should not be any access denied issue.
Any help would be appreciated.

Check to see if the files you are crawling are encrypted. If they are, then your Glue role probably doesn't have a policy that allows it to decrypt.
If so, it might need something like this:
{
"Version": "2012-10-17",
"Statement": {
"Effect": "Allow",
"Action": [
"kms:Decrypt"
],
"Resource": [
"arn:aws:kms:us-west-2:111122223333:key/1234abcd-12ab-34cd-56ef-1234567890ab",
"arn:aws:kms:us-west-2:111122223333:key/0987dcba-09fe-87dc-65ba-ab0987654321"
]
}
}

Make sure the policies attached to your IAM role include these:
AmazonS3FullAccess
AWSGlueConsoleFullAccess
AWSGlueServiceRole

We had a similar issue with an S3 crawler. According to AWS, S3 crawlers, unlike JDBC crawlers, do not create an ENI in your VPC. This means your bucket policy must allow access from outside the VPC.
Check that your bucket policy does not have an explicit deny somewhere on S3:*. If there is one, add a condition to that statement that exempts the crawler role, using the role ID as the aws:userId value in the condition. Keep in mind that the role ID and the role ARN are not the same thing.
To get the role id:
aws iam get-role --role-name Test-Role
Output:
{
"Role": {
"AssumeRolePolicyDocument": "<URL-encoded-JSON>",
"RoleId": "AIDIODR4TAW7CSEXAMPLE",
"CreateDate": "2013-04-18T05:01:58Z",
"RoleName": "Test-Role",
"Path": "/",
"Arn": "arn:aws:iam::123456789012:role/Test-Role"
}
}
You might also need to add a statement that allows s3:PutObject* and s3:GetObject* with the assumed role as the AWS principal. The assumed role will look something like:
arn:aws:sts::123456789012:assumed-role/Test-Role/AWS-Crawler
Hope this helps.
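As a rough illustration of the explicit-deny advice above, here is a minimal Python sketch that builds a Deny statement exempting one role by its role ID. The function name, the Sid, and the choice of StringNotLike with a "ROLEID:*" pattern are assumptions made for illustration, not something taken from the original bucket policy.

```python
import json


def deny_except_role(bucket, role_id):
    """Build a bucket-policy Deny statement that skips sessions of one role.

    role_id is the RoleId returned by `aws iam get-role`, not the role ARN.
    The ":*" suffix matches any session name of the assumed role.
    """
    return {
        "Sid": "DenyAllExceptCrawlerRole",  # hypothetical Sid
        "Effect": "Deny",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            f"arn:aws:s3:::{bucket}",
            f"arn:aws:s3:::{bucket}/*",
        ],
        "Condition": {
            "StringNotLike": {"aws:userId": [f"{role_id}:*"]}
        },
    }


statement = deny_except_role("my_bucket", "AIDIODR4TAW7CSEXAMPLE")
print(json.dumps(statement, indent=2))
```

A statement like this denies every principal that is not listed, administrators included, so add every role ID that must keep access before applying it.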

In my case the issue was that the crawler was configured in a different region than the S3 bucket it was meant to crawl. After configuring a new crawler in the same region as my S3 bucket, the problem was resolved.

This is an S3 bucket policy issue. I made my tables public (bad policy I know) and it worked.

Here are the complete roles you need to grant for the Glue crawler to work properly.
[Image: IAM Roles]

I made sure that I wasn't missing something offered in the other suggestions, but I wasn't. It turns out there was another level of restrictions on reading the bucket imposed by my organization, though I'm not sure what it was.

Related

How do I edit a bucket policy deployed by organizational-level CloudTrail

We have a multi-account setup where we deployed an organizational-level CloudTrail in our root account's Control Tower.
Organizational-level CloudTrail allows us to deploy CloudTrail in each of our respective accounts and provides them the ability to send logs to CloudWatch in our Root account and to an S3 logging bucket in our central logging account.
Now I have AWS Athena set up in our logging account to try and run queries on the logs generated through our organizational-level CloudTrail deployment. So far, I have managed to create the Athena Table that is built on the mentioned logging bucket and I also created a destination bucket for the query results.
When I try to run a simple "preview table" query, I get the following error:
Permission denied on S3 path: s3://BUCKET_NAME/PREFIX/AWSLogs/LOGGING_ACCOUNT_NUMBER/CloudTrail/LOGS_DESTINATION
This query ran against the "default" database, unless qualified by the query. Please post the error message on our forum or contact customer support with Query Id: f72e7dbf-929c-4096-bd29-b55c6c41f582
I figured that the error is caused by the logging bucket's policy lacking any statement allowing Athena access, but when I try to edit the bucket policy I get the following error:
Your bucket policy changes can’t be saved:
You either don’t have permissions to edit the bucket policy, or your bucket policy grants a level of public access that conflicts with your Block Public Access settings. To edit a bucket policy, you need s3:PutBucketPolicy permissions. To review which Block Public Access settings are turned on, view your account and bucket settings. Learn more about Identity and access management in Amazon S3
This is strange since the role I am using has full admin access to this account.
Please advise.
Thanks in advance!

I see this is a follow-up question to your previous one: S3 Permission denied when using Athena
Control Tower automatically deploys a guardrail which prohibits updating the aws-controltower bucket policy.
In your master account, go to AWS Organizations. Then, go to your Security OU. Then go to Policies tab. You should see 2 guardrail policies:
One of them will contain this policy:
{
"Condition": {
"ArnNotLike": {
"aws:PrincipalARN": "arn:aws:iam::*:role/AWSControlTowerExecution"
}
},
"Action": [
"s3:PutBucketPolicy",
"s3:DeleteBucketPolicy"
],
"Resource": [
"arn:aws:s3:::aws-controltower*"
],
"Effect": "Deny",
"Sid": "GRCTAUDITBUCKETPOLICYCHANGESPROHIBITED"
},
Add these principals below AWSControlTowerExecution:
arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/*/AWSReservedSSO_AWSAdministratorAccess*
arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/*/AWSReservedSSO_AdministratorAccess*
Your condition should look like this:
"Condition": {
"ArnNotLike": {
"aws:PrincipalArn": [
"arn:aws:iam::*:role/AWSControlTowerExecution",
"arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/*/AWSReservedSSO_AWSAdministratorAccess*",
"arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/*/AWSReservedSSO_AdministratorAccess*"
]
}
},
You should be able to update the bucket policy after this is applied.
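If you would rather script the change than edit the JSON by hand, the following Python sketch shows one way to add the extra principals to the ArnNotLike condition. It only transforms the policy document locally; the helper name exempt_principals is made up for this example, and pushing the result back to AWS Organizations is left out.

```python
import copy

EXTRA_PRINCIPALS = [
    "arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/*/AWSReservedSSO_AWSAdministratorAccess*",
    "arn:aws:iam::*:role/aws-reserved/sso.amazonaws.com/*/AWSReservedSSO_AdministratorAccess*",
]


def exempt_principals(statement, extra_arns):
    """Return a copy of a guardrail statement with extra ARNs in ArnNotLike."""
    updated = copy.deepcopy(statement)
    arn_not_like = updated["Condition"]["ArnNotLike"]
    # Condition key names are case-insensitive, so accept either spelling.
    key = next(k for k in arn_not_like if k.lower() == "aws:principalarn")
    current = arn_not_like[key]
    if isinstance(current, str):
        current = [current]  # a single value may be written as a plain string
    arn_not_like[key] = current + [a for a in extra_arns if a not in current]
    return updated


guardrail = {
    "Condition": {
        "ArnNotLike": {
            "aws:PrincipalARN": "arn:aws:iam::*:role/AWSControlTowerExecution"
        }
    },
    "Action": ["s3:PutBucketPolicy", "s3:DeleteBucketPolicy"],
    "Resource": ["arn:aws:s3:::aws-controltower*"],
    "Effect": "Deny",
    "Sid": "GRCTAUDITBUCKETPOLICYCHANGESPROHIBITED",
}

patched = exempt_principals(guardrail, EXTRA_PRINCIPALS)
print(patched["Condition"])
```

The original statement is left untouched, so you can compare the two documents before applying anything.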

AWS S3 Bucket Policy Source IP not working

I've been trying all possible options but with no results. My Bucket Policy works well with aws:Referer but it doesn't work at all with Source Ip as the condition.
My Server is hosted with EC2 and I am using the Public IP in this format xxx.xxx.xxx.xxx/32 (Public_Ip/32) as the Source Ip parameter.
Can anyone tell me what I am doing wrong?
Currently my Policy is the following
{
"Version": "2008-10-17",
"Id": "S3PolicyId1",
"Statement": [
{
"Sid": "IPDeny",
"Effect": "Deny",
"Principal": {
"AWS": "*"
},
"Action": "s3:*",
"Resource": "arn:aws:s3:::my_bucket/*",
"Condition": {
"NotIpAddress": {
"aws:SourceIp": "xx.xx.xxx.xxx/32"
}
}
}
]
}
I read all examples and case studies but it doesn't seem to allow access based on Source IP...
Thanks a lot!!!

While I won't disagree that policies are better than IP addresses wherever possible, the accepted answer didn't actually achieve the original question's goal. I needed to do this (I need access from a machine that wasn't EC2, and thus didn't have policies).
Here is a policy that only allows a certain IP (or multiple IPs) to access a bucket's objects. This assumes that there is no other policy to allow access to the bucket (by default, buckets grant no public access).
This policy also does not allow listing, so you can only fetch an object if you know its full URL. If you need more permissions, just add them to the Action bit.
{
"Id": "Policy123456789",
"Statement": [
{
"Sid": "IPAllow",
"Effect": "Allow",
"Principal": "*",
"Action": [
"s3:GetObject"
],
"Resource": "arn:aws:s3:::mybucket/*",
"Condition" : {
"IpAddress" : {
"aws:SourceIp": [
"xx.xx.xx.xx/32"
]
}
}
}
]
}
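To sanity-check which addresses a list of CIDR blocks like the one above would cover, here is a small Python sketch using the standard ipaddress module. It only illustrates CIDR membership on your own machine and is not how AWS evaluates the condition internally; the addresses are placeholder values from the documentation ranges.

```python
import ipaddress


def source_ip_allowed(source_ip, allowed_cidrs):
    """Return True if source_ip falls inside any of the allowed CIDR blocks."""
    address = ipaddress.ip_address(source_ip)
    return any(
        address in ipaddress.ip_network(cidr, strict=False)
        for cidr in allowed_cidrs
    )


allowed = ["203.0.113.10/32", "198.51.100.0/24"]  # placeholder example ranges
print(source_ip_allowed("203.0.113.10", allowed))   # exact /32 match
print(source_ip_allowed("198.51.100.77", allowed))  # inside the /24
print(source_ip_allowed("203.0.113.11", allowed))   # outside both blocks
```

A /32 block matches exactly one address, so the request has to arrive from precisely that public IP for the statement to apply.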

From the discussion on the comments on the question, it looks like your situation can be rephrased as follows:
How can I give a specific EC2 instance full access to an S3 bucket, and deny access from every other source?
Usually, the best approach is to create an IAM Role and launch your EC2 instance associated with that IAM Role. As I'm going to explain, it is usually much better to use IAM Roles to define your access policies than it is to specify source IP addresses.
IAM Roles
IAM, or Identity and Access Management, is a service that can be used to create users, groups and roles, manage access policies associated with those three kinds of entities, manage credentials, and more.
Once you have your IAM role created, you are able to launch an EC2 instance "within" that role. In simple terms, it means that the EC2 instance will inherit the access policy you associated with that role. You can also attach or replace the IAM Role on an instance that is already running (when this answer was first written, the role could only be set at launch), and you can modify the Access Policy associated with an IAM Role whenever you want.
The IAM service is free, and you don't pay anything extra when you associate an EC2 instance with an IAM Role.
In your situation
In your situation, what you should do is create an IAM Role to use within EC2 and attach a policy that will give the permissions you need, i.e., that will "Allow" all the "s3:xxx" operations it will need to execute on that specific resource "arn:aws:s3:::my_bucket/*".
Then you launch a new instance with this role (on the current AWS Management Console, on the EC2 Launch Instance wizard, you do this on the 3rd step, right after choosing the Instance Type).
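To make the two pieces of such a role concrete, here is a minimal Python sketch that builds the trust policy (which lets EC2 assume the role) and a permissions policy for the bucket. The function names and the chosen actions are illustrative assumptions; pick the "s3:xxx" operations your application actually needs.

```python
import json


def ec2_trust_policy():
    """Trust policy that lets the EC2 service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "ec2.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def bucket_access_policy(bucket, actions):
    """Permissions policy granting the listed S3 actions on a bucket's objects."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }


trust = ec2_trust_policy()
permissions = bucket_access_policy("my_bucket", ["s3:GetObject", "s3:PutObject"])
print(json.dumps(permissions, indent=2))
```

The trust policy is attached when the role is created, and the permissions policy is attached to the role afterwards.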
Temporary Credentials
When you associate an IAM Role with an EC2 instance, the instance is able to obtain a set of temporary AWS credentials (let's focus on the results and benefits, and not exactly on how this process works). If you are using the AWS CLI or any of the AWS SDKs, then you simply don't specify any credential at all and the CLI or SDK will figure out it has to look for those temporary credentials somewhere inside the instance.
This way, you don't have to hard code credentials, or inject the credentials into the instance somehow. The instance and the CLI or SDK will manage this for you. As an added benefit, you get increased security: the credentials are temporary and rotated automatically.
In your situation
If you are using the AWS CLI, you would simply run the commands without specifying any credentials. You'll be allowed to run the APIs that you specified in the IAM Role Access Policy. For example, you would be able to upload a file to that bucket:
aws s3 cp my_file.txt s3://my_bucket/
If you are using an SDK, say the Java SDK, you would be able to interact with S3 by creating the client objects without specifying any credentials:
AmazonS3 s3 = new AmazonS3Client(); // no credentials on the constructor!
s3.putObject("my_bucket", ........);
I hope this helps you solve your problem. If you have any further related questions, leave a comment and I will try to address them on this answer.

amazon s3 invalid principal in bucket policy

I'm trying to create a new bucket policy in the Amazon S3 console and get the error
Invalid principal in policy - "AWS" : "my_username"
The username I'm using in principal is my default bucket grantee.
My policy
{
"Id": "Policy14343243265",
"Statement": [
{
"Sid": "SSdgfgf432432432435",
"Action": [
"s3:DeleteObject",
"s3:DeleteObjectVersion",
"s3:GetObject",
"s3:GetObjectVersion",
"s3:GetObjectVersionAcl",
"s3:PutObject",
"s3:PutObjectAcl",
"s3:PutObjectVersionAcl"
],
"Effect": "Allow",
"Resource": "arn:aws:s3:::my_bucket/*",
"Principal": {
"AWS": [
"my_username"
]
}
}
]
}
I don't understand why I'm getting the error. What am I doing wrong?

As the error message says, your principal is incorrect. Check the S3 documentation on specifying Principals for how to fix it. As seen in the example policies, it needs to be something like arn:aws:iam::111122223333:root.

I was also getting the same error in the S3 Bucket policy generator. It turned out that one of the existing policies had a principal that had been deleted. The problem was not with the policy that was being added.
In this instance, to spot the policy that is bad you can look for a principal that does not have an account or a role in the ARN.
So, instead of looking like this:
"Principal": {
"AWS": "arn:aws:iam::123456789101:role/MyCoolRole"
}
It will look something like this:
"Principal": {
"AWS": "ABCDEFGHIJKLMNOP"
}
So instead of a proper ARN it will be an alphanumeric key like ABCDEFGHIJKLMNOP. In this case you will want to identify why the bad principal was there and most likely modify or delete it. Hopefully this will help someone as it was hard to track down for me and I didn't find any documentation to indicate this.
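If the policy is long, a quick script can help spot such entries. The sketch below is a rough Python check that flags any AWS principal that is not an ARN, a 12-digit account ID, or "*". The regular expressions are simplifying assumptions of mine, not an official validation rule.

```python
import re

ARN_PATTERN = re.compile(r"^arn:aws[a-z-]*:(iam|sts)::\d{12}:.+$")
ACCOUNT_ID_PATTERN = re.compile(r"^\d{12}$")


def iter_aws_principals(policy):
    """Yield (sid, principal) for every AWS principal in a bucket policy."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for statement in statements:
        principal = statement.get("Principal")
        if not isinstance(principal, dict):
            continue  # "*" or no Principal element at all
        values = principal.get("AWS", [])
        if isinstance(values, str):
            values = [values]
        for value in values:
            yield statement.get("Sid", "<no Sid>"), value


def suspicious_principals(policy):
    """Return principals that are neither an ARN, an account ID, nor "*"."""
    return [
        (sid, value)
        for sid, value in iter_aws_principals(policy)
        if value != "*"
        and not ARN_PATTERN.match(value)
        and not ACCOUNT_ID_PATTERN.match(value)
    ]


policy = {
    "Statement": [
        {"Sid": "Good", "Principal": {"AWS": "arn:aws:iam::123456789101:role/MyCoolRole"}},
        {"Sid": "Orphaned", "Principal": {"AWS": "ABCDEFGHIJKLMNOP"}},
        {"Sid": "Public", "Principal": "*"},
    ]
}
print(suspicious_principals(policy))
```

Anything the function reports is a candidate for the leftover-ID problem described above, or for a plain username used where an ARN is expected.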

Better solution:
Create an IAM policy that gives access to the bucket
Assign it to a group
Put user into that group
Instead of saying "This bucket is allowed to be touched by this user", you can define "These are the people that can touch this".
It sounds silly right now, but wait till you add 42 more buckets and 60 users to the mix. Having a central spot to manage all resource access will save the day.

The value for Principal should be the user ARN, which you can find in the Summary section by clicking on your username in IAM.
This is so that the specific user can be bound to the S3 bucket policy.
In my case, it is arn:aws:iam::332490955950:user/sample ==> sample is the username

I was getting the same error message when I tried creating the bucket, bucket policy and principal (IAM user) inside the same CloudFormation stack. Although I could see that CF completed the IAM user creation before even starting the bucket policy creation, the stack deployment failed. Adding a DependsOn: MyIamUser to the BucketPolicy resource fixed it for me.

Why am I getting the error "Invalid principal in policy" when I try to update my Amazon S3 bucket policy?
Issue
I'm trying to add or edit the bucket policy of my Amazon Simple Storage Service (Amazon S3) bucket using the web console, awscli or terraform (etc). However, I'm getting the error message "Error: Invalid principal in policy." How can I fix this?
Resolution
You receive "Error: Invalid principal in policy" when the value of a Principal in your bucket policy is invalid. To fix this error, review the Principal elements in your bucket policy. Check that they're using one of these supported values:
The Amazon Resource Name (ARN) of an AWS Identity and Access Management (IAM) user or role
Note: To find the ARN of an IAM user, run the aws iam get-user command. To find the ARN of an IAM role, run the aws iam get-role command, or check it in the IAM console of your account.
An AWS account ID
The string "*" to represent all users
Additionally, review the Principal elements in the policy and check that they're formatted correctly. If the Principal is one user, the element must be in this format:
"Principal": {
"AWS": "arn:aws:iam::AWS-account-ID:user/user-name1"
}
If the Principal is more than one user but not all users, the element must be in this format:
"Principal": {
"AWS": [
"arn:aws:iam::AWS-account-ID:user/user-name1",
"arn:aws:iam::AWS-account-ID:user/user-name2"
]
}
If the Principal is all users, the element must be in this format:
{
"Principal": "*"
}
If you find invalid Principal values, you must correct them so that you can save changes to your bucket policy.
Extra points!
AWS Policy Generator
Bucket Policy Examples
Ref-link: https://aws.amazon.com/premiumsupport/knowledge-center/s3-invalid-principal-in-policy-error/

I was facing the same issue when I created a bash script to initialize my Terraform S3 backend. After a few hours I decided to just put sleep 5 after the user creation, and that did the trick. You can see it at line 27 of my script.
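A fixed sleep works, but a retry loop is a common variation on the same idea. The Python sketch below assumes the error is caused by the new IAM user not yet being usable as a principal, which is my reading of this answer rather than something it states. put_policy_stub is a stand-in for whatever call saves the bucket policy.

```python
import time


def retry(operation, attempts=5, delay_seconds=5.0, retry_on=(Exception,)):
    """Call operation() until it succeeds or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay_seconds)


# Stand-in for "save the bucket policy": fails twice, then succeeds.
calls = {"count": 0}


def put_policy_stub():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("Invalid principal in policy")
    return "policy saved"


result = retry(put_policy_stub, attempts=5, delay_seconds=0)
print(result, "after", calls["count"], "calls")
```

In a real script you would narrow retry_on to the specific error your client raises, so that unrelated failures are not retried.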

If you are getting the error Invalid principal in policy in S3 bucket policies, the following 3 steps are the way to resolve it.
1 Your bucket policy uses supported values for a Principal element
The Amazon Resource Name (ARN) of an IAM user or role
An AWS account ID
The string "*" to represent all users
2 The Principal value is formatted correctly
If the Principal is one user
"Principal": {
"AWS": "arn:aws:iam::111111111111:user/user-name1"
}
If the Principal is more than one user but not all users
"Principal": {
"AWS": [
"arn:aws:iam::111111111111:user/user-name1",
"arn:aws:iam::111111111111:user/user-name2"
]
}
If the Principal is all users
{
"Principal": "*"
}
3 The IAM user or role wasn't deleted
If your bucket policy uses IAM users or roles as Principals, then confirm that those IAM identities weren't deleted. When you edit and then try to save a bucket policy with a deleted IAM ARN, you get the "Invalid principal in policy" error.
Read more here.

FYI: If you are trying to give access to a bucket for a region that is not enabled it will give the same error.
From AWS Docs: If your S3 bucket is in an AWS Region that isn't enabled by default, confirm that the IAM principal's account has the AWS Region enabled. For more information, see Managing AWS Regions.
If you are trying to give Account_X_ID access to my_bucket as shown below, you need to enable the region of my_bucket in Account_X_ID.
"Principal": {
"AWS": [
"arn:aws:iam::<Account_X_ID>:root"
]
}
"Resource": "arn:aws:s3:::my_bucket/*",
Hope this helps someone.

Why might boto be denied access to S3 with proper IAM keys?

I am trying to access a bucket on S3 with boto. I have been given read access to the bucket and my keys are working when I explore it in S3 Browser. The following code is returning 403 Forbidden Access Denied.
conn = S3Connection('Access_Key_ID', 'Secret_Access_Key')
conn.get_all_buckets()
This also occurs when using the access key and secret access key via the boto config file. Is there something else I need to be doing because the keys are from IAM perhaps? Could this indicate an error in the setup? I don't know much about IAM, I was just given the keys.

Some things to check...
If you are using boto, be sure you are using conn.get_bucket(bucket_name) to access only the bucket you have permission to access.
In your IAM (user) policy, if you are restricting access to a single
bucket, be sure that the policy includes adequate permissions to the
bucket and do not include a trailing slash+asterisks for the ARN name (see example below).
Be sure to set "Upload/Delete" permissions for "Authenticated Users" in S3 for the bucket.
Permissions sample:
IAM policy sample:
NOTE: The SID will be automatically generated when using the policy generator
{
"Version": "2012-10-17",
"Statement": [
{
"Action": [
"s3:*"
],
"Sid": "Stmt0000000000001",
"Resource": [
"arn:aws:s3:::myBucketName"
],
"Effect": "Allow"
}
]
}
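One detail worth adding to the sample above: bucket-level actions such as s3:ListBucket are matched against the bucket ARN, while object-level actions such as s3:GetObject are matched against the bucket ARN followed by /*. The Python sketch below builds a read-only policy with both; the function name and the selected actions are assumptions for illustration.

```python
def single_bucket_policy(bucket):
    """IAM policy with separate bucket-level and object-level statements."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "BucketLevel",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                "Sid": "ObjectLevel",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


policy = single_bucket_policy("myBucketName")
for statement in policy["Statement"]:
    print(statement["Sid"], statement["Resource"])
```

Neither statement grants s3:ListAllMyBuckets, so listing every bucket in the account would still be denied under a policy like this.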

My guess is that it's because you're calling conn.get_all_buckets() instead of conn.get_bucket(bucket_name) for the individual bucket you have access to.

from boto.s3.connection import S3Connection
conn = S3Connection('access key', 'secret access key')
allBuckets = conn.get_all_buckets()
for bucket in allBuckets:
    print(str(bucket.name))

How can I make a S3 bucket public (the amazon example policy doesn't work)?

Amazon provides an example for Granting Permission to an Anonymous User as follows (see Example Cases for Amazon S3 Bucket Policies):
{
"Version": "2008-10-17",
"Statement": [
{
"Sid": "AddPerm",
"Effect": "Allow",
"Principal": {
"AWS": "*"
},
"Action": "s3:GetObject",
"Resource": "arn:aws:s3:::bucket/*"
}
]
}
Within my policy I've changed "bucket" in "arn:aws:s3:::bucket/*" to "my-bucket".
However, once I try to access an image within a folder of that bucket, I get the following Access denied error:
This XML file does not appear to have any style information associated
with it. The document tree is shown below.
(if I explicitly change the properties of that image to public, then reload its url, the image loads perfectly)
What am I doing wrong?
Update #1: Apparently it has something to do with a third party site that I've given access to. Although it has all of the permissions as the main user (me), and its objects are in the same folder, with the exact same permissions, it still won't let me make them publicly viewable. No idea why.
Update #2: Bucket policies do not apply to objects "owned" by others, even though they are within your bucket, see my answer for details.

Update
As per GoodGets' comment, the real issue has been that bucket policies do not apply to objects "owned" by someone else, even though they are in your bucket; see GoodGets' own answer for details (+1).
Is this a new bucket/object setup or are you trying to add a bucket policy to a pre-existing setup?
In the latter case you might have stumbled over a related pitfall due to the interaction between the three different S3 access control mechanisms now available, which can be rather confusing indeed. This is addressed e.g. in Using ACLs and Bucket Policies Together:
When you have ACLs and bucket policies assigned to buckets, Amazon S3
evaluates the existing Amazon S3 ACLs as well as the bucket policy
when determining an account’s access permissions to an Amazon S3
resource. If an account has access to resources that an ACL or policy
specifies, they are able to access the requested resource.
While this sounds easy enough, unintentional interference may result from the subtly different defaults between ACLs and policies:
With existing Amazon S3 ACLs, a grant always provides access to a
bucket or object. When using policies, a deny always overrides a
grant. [emphasis mine]
This explains why adding an ACL grant always guarantees access. However, this does not apply to adding a policy grant, because an explicit policy deny provided elsewhere in your setup would still be enforced, as further illustrated in e.g. IAM and Bucket Policies Together and Evaluation Logic.
Consequently I recommend starting with a fresh bucket/object setup to test the desired configuration before applying it to a production scenario (which might still interfere of course, but identifying/debugging the difference will be easier in that case).
Good luck!

Bucket policies do not apply to files with other owners. So although I've given write access to a third party, the ownership remains with them, and my bucket policy will not apply to those objects.

I wasted hours on this, the root cause was stupid, and the solutions mentioned here didn't help (I tried them all), and the AWS s3 permissions docs didn't emphasize this point.
If you have the Requester Pays setting ON, you cannot enable anonymous access (either by bucket policy or by the ACL 'Everyone' grant). You can certainly write the policies and ACLs, apply them, and even use the console to explicitly set a file to public, but an unsigned URL will still get a 403 Access Denied 100% of the time on that file until you uncheck the Requester Pays setting in the console for the entire bucket (Properties tab when the bucket is selected) or, I assume, via some REST API call.
I unchecked Requester Pays and now anonymous access is working, with referrer restrictions, etc. In fairness, the AWS console does tell us:
While Requester Pays is enabled, anonymous access to this bucket is disabled.

The issue is with your Action: it should be in array format.
Try this:
{
"Version":"2012-10-17",
"Statement":[
{
"Sid":"AddPerm",
"Effect":"Allow",
"Principal": "*",
"Action":["s3:GetObject"],
"Resource":["arn:aws:s3:::examplebucket/*"]
}
]
}
Pass your Bucket name in 'Resource'

If you're having this problem with Zencoder uploads, check out this page: https://app.zencoder.com/docs/api/encoding/s3-settings/public

The following policy will make the entire bucket public:
{
"Version":"2012-10-17",
"Statement":[
{
"Sid":"AddPerm",
"Effect":"Allow",
"Principal": "*",
"Action":["s3:GetObject"],
"Resource":["arn:aws:s3:::examplebucket/*"]
}
]
}
If you want a specific folder under that bucket to be public using bucket policies, then you have to explicitly make that folder/prefix public and then apply the bucket policy as follows:
{
"Version":"2012-10-17",
"Statement":[
{
"Sid":"AddPerm",
"Effect":"Allow",
"Principal": "*",
"Action":["s3:GetObject"],
"Resource":["arn:aws:s3:::examplebucket/images/*"]
}
]
}
The above policy will allow public read on all of the objects under images, but you will not be able to access other objects inside the bucket.
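The two policies above differ only in the Resource, so they can be generated from one small helper. This is a minimal Python sketch; the function name public_read_policy and its prefix parameter are made up for illustration.

```python
import json


def public_read_policy(bucket, prefix=""):
    """Bucket policy allowing anonymous s3:GetObject under an optional prefix."""
    prefix = prefix.strip("/")
    path = f"{prefix}/*" if prefix else "*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AddPerm",
                "Effect": "Allow",
                "Principal": "*",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{path}"],
            }
        ],
    }


whole_bucket = public_read_policy("examplebucket")
images_only = public_read_policy("examplebucket", "images")
print(json.dumps(images_only, indent=2))
```

Calling it without a prefix reproduces the whole-bucket policy, and passing "images" reproduces the folder-scoped one.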

I know it is an old question but I would like to add information that may still be relevant today.
I believe that this bucket is meant to be a static site. Because of this, you must use a specific URL for your rules to be accepted: the bucket's "website" endpoint. Otherwise, S3 will treat the bucket just like an object repository.
Example:
With the problem pointed out:
https://name-your-bucket.sa-east-1.amazonaws.com/home
Without the problem pointed out:
http://name-your-bucket.s3-website-sa-east-1.amazonaws.com/home
Hope this helps :)

This works.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "PublicRead",
"Effect": "Allow",
"Principal": "*",
"Action": [
"s3:GetObject",
"s3:GetObjectVersion"
],
"Resource": "arn:aws:s3:::example-bucket/*"
}
]
}
