I am attempting to give access to parquet files on a Gen2 Data Lake container. I have owner RBAC on the container but would prefer to limit access in the container for other users.
My Query is very simple:
SELECT
TOP 100 *
FROM
OPENROWSET(
BULK 'https://aztsworddataaipocacldl.dfs.core.windows.net/pocacl/Top/Sub/part-00006-c62926ba-c530-4ad8-87d1-cf38c67a2da3-c000.snappy.parquet',
FORMAT='PARQUET'
) AS [result]
When I run this I have no problems connecting. I have attempted to add ACL rights onto the files (and of course the containing folders 'Top' and 'Sub').
I've given RWX on the 'Top' folder using Storage Explorer, and set it as the default as well, so that it cascades to the 'Sub' folder and to the parquet files as I add them.
When my colleague attempts to run the SQL script, they get this error message: Failed to execute query. Error: File 'https://aztsworddataaipocacldl.dfs.core.windows.net/pocacl/Top/Sub/part-00006-c62926ba-c530-4ad8-87d1-cf38c67a2da3-c000.snappy.parquet' cannot be opened because it does not exist or it is used by another process.
NB: similar results also occur in Spark, but with a 403 instead.
After the error, SQL on-demand provides a link to a help page, which suggests:
If your query fails with the error saying 'File cannot be opened because it does not exist or it is used by another process' and you're sure both file exist and it's not used by another process it means SQL on-demand can't access the file. This problem usually happens because your Azure Active Directory identity doesn't have rights to access the file. By default, SQL on-demand is trying to access the file using your Azure Active Directory identity. To resolve this issue, you need to have proper rights to access the file. Easiest way is to grant yourself 'Storage Blob Data Contributor' role on the storage account you're trying to query.
I don't wish to grant Storage Blob Data Contributor or Storage Blob Data Reader as this gives access to every file on the container and not just those I want end users to be able to query. We have found the same experience occurs for SSMS connecting to parquet external tables.
So then in parts:
Is this the correct pattern using ACL to grant access, or should I use another method?
Are there settings on the Storage Account or within my query/notebook that I should be enabling to support ACL?*
Has ACL been implemented on Synapse Workspace to date given that we're still in preview?
*I have resisted pasting my entire settings as I really have no idea what is relevant and what entirely irrelevant to this issue but of course can supply.
It would appear that the ACL feature was not working correctly in Preview for Azure Synapse Analytics.
I have now managed to get it to work. At present I see that once Read|Execute is provided to a folder it allows access to the files contained within that folder and sub folders. Access is available even when no specific ACL access is provided on a file in a sub folder. This is not quite what I expected however it provides enough for me to proceed: only giving access to the Gold folder allows for separation of access to the files I want to let users query and the working files that I want to keep hidden.
When you assign ACL to a folder it's not propagated recursively to all files inside the folder. Only new files inherit from the folder.
You can see this here
In Azure Storage Explorer, change the ACL permissions on the root folder, then right-click the folder and choose "Propagate Access Control Lists" to apply them to the existing files and sub-folders.
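The same propagation can also be scripted. The following is a minimal sketch, assuming the `azure-storage-file-datalake` Python package and a credential that is allowed to change ACLs; the helper names `build_acl_spec` and `apply_acl_recursively` are made up for illustration, and the object ID would be your colleague's Azure AD object ID.

```python
def build_acl_spec(object_id: str, perms: str = "r-x", include_default: bool = True) -> str:
    """Build a POSIX-style ACL string for one Azure AD user or group (hypothetical helper)."""
    allowed = ("r-", "w-", "x-")
    if len(perms) != 3 or any(ch not in ok for ch, ok in zip(perms, allowed)):
        raise ValueError(f"permissions must look like 'r-x', got {perms!r}")
    entries = [f"user:{object_id}:{perms}"]
    if include_default:
        # Default-scope entries only exist on directories; they are what
        # newly created children inherit.
        entries.append(f"default:user:{object_id}:{perms}")
    return ",".join(entries)


def apply_acl_recursively(account_url, container, directory, acl_spec, credential):
    """Merge acl_spec into the ACLs of a directory and everything already under it."""
    # Imported lazily so build_acl_spec works without the SDK installed.
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(account_url=account_url, credential=credential)
    directory_client = service.get_file_system_client(container).get_directory_client(directory)
    # "update" modifies the listed entries and leaves other existing entries alone.
    return directory_client.update_access_control_recursive(acl=acl_spec)
```

For example, `build_acl_spec("<object-id>")` yields `user:<object-id>:r-x,default:user:<object-id>:r-x`. Keep in mind that reading a file also requires Execute on every parent folder up to the container root, so the top-level folders need an entry as well.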
I have set up an S3 bucket to host static files.
When using the website endpoint (http://.s3-website-us-east-1.amazonaws.com/): it forces me to set an index file. When the file isn't found, it throws an error instead of listing directory contents.
When using the s3 endpoint (.s3.amazonaws.com): I get an XML listing of the files, but I need an HTML listing that users can click the link to the file.
I have tried setting the permissions of all files and the bucket itself to "List" for "Everyone" in the AWS Console, but still no luck.
I have also tried some of the javascript alternatives, but they either don't work under the website url (that redirects to the index file) or just don't work at all. As a last resort, a collapsible javascript listing would be better than nothing, but I haven't found a good one.
Is this possible? If so, do I need to change permissions, ACL or something else?
I've created a simple bit of JS that creates a directory index in HTML style that you are looking for: https://github.com/rgrp/s3-bucket-listing
The README has specific instructions for handling Amazon S3 "website" buckets: https://github.com/rgrp/s3-bucket-listing#website-buckets
You can see a live example of the script in action on this s3 bucket (in website mode): http://data.openspending.org/
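To illustrate the general idea behind such a listing script (this is not the code from the repository above, which is JavaScript running in the browser), here is a minimal Python sketch that turns the XML returned by the S3 endpoint into a clickable HTML list. The function name `listing_to_html` and the `base_url` parameter are assumptions made for this example.

```python
import html
import xml.etree.ElementTree as ET
from urllib.parse import quote

# Namespace used by S3 ListBucketResult documents.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def listing_to_html(listing_xml, base_url=""):
    """Convert an S3 bucket listing (XML) into an HTML unordered list of links."""
    root = ET.fromstring(listing_xml)
    items = []
    for contents in root.findall("s3:Contents", S3_NS):
        key = contents.findtext("s3:Key", default="", namespaces=S3_NS)
        size = contents.findtext("s3:Size", default="0", namespaces=S3_NS)
        # Percent-encode the key for the link, but keep "/" so folders still work.
        href = base_url + quote(key)
        items.append(
            f'<li><a href="{html.escape(href)}">{html.escape(key)}</a> '
            f"({html.escape(size)} bytes)</li>"
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

The sketch handles a single page of results only; a bucket with many objects returns a truncated listing that has to be fetched page by page.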
There is also this solution: https://github.com/caussourd/aws-s3-bucket-listing
It is similar to https://github.com/rgrp/s3-bucket-listing, which I couldn't make work with Internet Explorer. https://github.com/caussourd/aws-s3-bucket-listing works with IE and also adds the ability to sort the files by name, size and date. On the downside, it doesn't follow folders: only the files at one level are displayed.
This might solve your problem. Security settings for Everyone group:
(you need the bucketexplorer.com software for this)
If you are sharing files over HTTP, you may or may not want people to be able to list the contents of a bucket (folder). If you want the bucket contents to be listed when someone enters the bucket name (http://s3.amazonaws.com/bucket_name/), then edit the Access Control List and give the Everyone group the access level of Read (and do likewise with the contents of the bucket). If you don’t want the bucket contents to be listable but do want to share the files within it, disable Read access for the Everyone group for the bucket itself, and then enable Read access for the individual files within the bucket.
I created a much simpler solution. Just place the index.html file in root of your folder and it will do the job. No configuration required. https://github.com/prabhatsharma/s3-directorylisting
I had a similar problem and created a JavaScript-and-iframe solution that works pretty well for listing directories in S3 website files. You just have to drop a couple of .html files into the directory you want to list. You can find it here:
https://github.com/adam-p/s3-file-list-page
I found s3browser, which allowed me to set up a directory on the main web site that allowed browsing of the s3 bucket. It worked very well and was very easy to set up.
Here is another approach, based on pure JavaScript and the AWS SDK for JavaScript. There is no need for PHP or any other engine, just a plain web site (Apache or even IIS).
https://github.com/juvs/s3-bucket-browser
It is not intended to be deployed in the bucket itself (to me, that makes no sense).
Using the new IAM Users from AWS you can provide more specific and secure access to your buckets. There is no need to publish your bucket as a website and make everything public.
If you want to secure the access, you can use conventional methods to authenticate users on your existing web site.
Hope this helps too!
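As a rough illustration of the "more specific access" an IAM user can be given, here is a sketch of a read-only policy limited to one bucket, built as a Python dict. The function name and the bucket name `example-bucket` are placeholders, and this is not taken from the s3-bucket-browser project itself.

```python
import json


def read_only_bucket_policy(bucket: str) -> dict:
    """Return an IAM policy document allowing listing and reading of one bucket only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Downloading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(read_only_bucket_policy("example-bucket"), indent=2))
```

The printed JSON can be attached to the IAM user whose keys the browser page uses, so the bucket itself can stay private.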
I am unable to make a List Blobs request using the following URI, with myaccount replaced by my storage account name:
http://myaccount.blob.core.windows.net/mycontainer?restype=container&comp=list
Are you sure that your blob is public?
You can check this using CloudBerry Explorer, a great free tool to manage blobs. You can download it here: http://www.cloudberrylab.com/free-microsoft-azure-explorer.aspx
Once the application is installed, go to the container, right-click it, and check in Properties whether the security is set to public.
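If you would rather check from a script than from a GUI tool, the request can be issued anonymously and the result inspected. This is a minimal sketch using only the Python standard library; the function names are invented for the example, and it uses https where the question used http. An HTTP error on the anonymous request suggests the container does not allow public listing.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def list_blobs_url(account: str, container: str) -> str:
    """Build the List Blobs URI for a container."""
    query = urlencode({"restype": "container", "comp": "list"})
    return f"https://{account}.blob.core.windows.net/{container}?{query}"


def parse_blob_names(xml_text):
    """Extract blob names from a List Blobs (EnumerationResults) response body."""
    root = ET.fromstring(xml_text)
    return [name.text or "" for name in root.findall("./Blobs/Blob/Name")]


def try_anonymous_list(account: str, container: str):
    """Return the blob names if anonymous listing works, otherwise None."""
    try:
        with urllib.request.urlopen(list_blobs_url(account, container), timeout=10) as resp:
            return parse_blob_names(resp.read())
    except urllib.error.HTTPError as err:
        # The service rejected the unauthenticated request.
        print(f"Anonymous listing failed: HTTP {err.code} {err.reason}")
        return None
```

Calling `try_anonymous_list("myaccount", "mycontainer")` with your own account and container names shows whether the request succeeds without credentials.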
I am running Pentaho reports. The problem is that all the image URLs point to localhost:8080, so I am not able to view the images. Can you tell me where this property is configured?
The related setting for that is called "base-url" in your web.xml file. Change this to a public IP or domain name. The base-url is used by all content-generators to create links to other files.
Make sure you only access the server via the public IP/name that has been configured there - otherwise you may run into troubles with the security layer as it relies on cookies to authenticate yourself and thus it may prevent you from seeing that content as you appear to be not logged in.
I want to display an image in my web page whose src is a file outside the context root.
In the IDE, the image is shown as loaded.
But when I test the web page, nothing is displayed.
How can I configure the WebLogic server to allow the image to be displayed? If that is not possible, is there any way to work around this problem?
Thanks a lot.
You can use the Virtual Directory Mapping feature (that you declare in the weblogic.xml):
Using the virtual directory mapping feature, you can create one directory to serve static files such as images for multiple Web Applications. For example, you would create a mapping similar to the following:
<virtual-directory-mapping>
<local-path>c:/usr/gifs</local-path>
<url-pattern>/images/*</url-pattern>
</virtual-directory-mapping>
A request to http://localhost:7001/mywebapp/images/test.gif will cause your WebLogic Server implementation to look for the requested image at: c:/usr/gifs/images/*.
This directory must be located in the relative uri, such as "/images/test.gif".