Apache Oak Direct Binary Access with S3DataStore

I'm trying to figure out how the direct binary access feature works with Apache Oak.
My understanding so far is that I can set binary properties on nodes and later obtain a direct download link (from S3).
First, I created a node and added a binary property with the contents of some file.
val ntFile = session.getRootNode.addNode(path, "nt:file")
val ntResource = ntFile.addNode("jcr:content", "nt:resource")
ntResource.setProperty("jcr:mimeType", "application/octet-stream")
ntResource.setProperty("jcr:lastModified", Calendar.getInstance())
val fStream = new FileInputStream("/home/evren/cast.webm")
val bin = session.getValueFactory.asInstanceOf[JackrabbitValueFactory].createBinary(fStream)
ntResource.setProperty("jcr:data", bin)
And I can see in the AWS Console that my binary is uploaded.
But I still cannot generate a direct download URI, even following the documentation on the Oak website. (The code continues:)
session.save()
session.refresh(false)
val binary = session.getRootNode.getNode(path)
.getNode("jcr:content").getProperty("jcr:data").getValue.getBinary
val uri = binary.asInstanceOf[BinaryDownload].getURI(BinaryDownloadOptions.DEFAULT)
It's always returning null.
Could someone please point out what I am doing wrong, or whether my understanding is off?
Thanks in advance.

I figured it out. In case anyone else is facing the same issue, the trick is to register your BlobStore using a Whiteboard.
This also explains the issue: I could upload files directly using the BlobStore, but Oak itself could not use the BlobStore functionality to generate a direct download link.
val wb = new DefaultWhiteboard()
// register s3/azure as BlobAccessProvider
wb.register(
  classOf[BlobAccessProvider],
  blobStore.asInstanceOf[BlobAccessProvider],
  Collections.emptyMap()
)
val jcrRepo = new Jcr(nodeStore).`with`(wb).createRepository()
And once you create your JCR Repo like this, direct binary download/upload works as expected.
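For reference, the blobStore being registered must implement BlobAccessProvider; with an S3DataStore this typically means wrapping it in a DataStoreBlobStore, which implements that interface when the backing data store supports direct access. A rough sketch of such a setup (property names follow the Oak direct binary access documentation; credentials, bucket, and paths are placeholders):

import java.util.Properties
import org.apache.jackrabbit.oak.blob.cloud.s3.S3DataStore
import org.apache.jackrabbit.oak.plugins.blob.datastore.DataStoreBlobStore

val props = new Properties()
props.setProperty("accessKey", "<aws-access-key>")
props.setProperty("secretKey", "<aws-secret-key>")
props.setProperty("s3Bucket", "<bucket-name>")
props.setProperty("s3Region", "<region>")
// direct download URIs are disabled unless an expiry is configured;
// getURI returns null otherwise
props.setProperty("presignedHttpDownloadURIExpirySeconds", "300")

val dataStore = new S3DataStore()
dataStore.setProperties(props)
dataStore.init("/path/to/repository/home") // local cache/metadata directory

val blobStore = new DataStoreBlobStore(dataStore)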

Related

Update metadata of object in Google Cloud Storage with Java/Kotlin gives NullPointerException

I have created a function to update metadata of an object in Google Cloud Storage.
fun updateUserMetadata(objectName: String, userMetadata: Map<String, String>) {
    val blobId = BlobId.of(bucketName, basePath + objectName)
    val blobInfo: BlobInfo = BlobInfo.newBuilder(blobId)
        .setMetadata(userMetadata)
        .build()
    storage.update(blobInfo)
}
Somehow this function always gives me the following exception:
java.lang.NullPointerException
at com.google.cloud.storage.BlobId.fromPb(BlobId.java:119)
at com.google.cloud.storage.BlobInfo.fromPb(BlobInfo.java:1029)
at com.google.cloud.storage.Blob.fromPb(Blob.java:918)
at com.google.cloud.storage.StorageImpl.update(StorageImpl.java:428)
at com.google.cloud.storage.StorageImpl.update(StorageImpl.java:447)
at com.peerke.outdoorpuzzlegame.backend.common.gcp.cloudstorage.CloudStorageBase.updateUserMetadata(CloudStorageBase.kt:88)
at com.peerke.outdoorpuzzlegame.backend.common.gcp.cloudstorage.CloudStorageBaseTest.testUpdateUserMetadata(CloudStorageBaseTest.kt:71)
In the function above, none of the variables are null.
Is this a bug or am I doing something wrong?
I've been looking into how you are passing things, and at first I did not see anything directly wrong. What I did find, though, makes sense and will hopefully help you find a solution:
gsObjectName - the object name of this Blob, if it is stored on Google Cloud Storage, null otherwise.
I'm inclined to think it is a pathing issue in how you are pointing to the GCS object. Not being able to reach what you are pointing at would explain why a NullPointerException is raised.
I would recommend revising the way you are pointing to the path of the GCS object.
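One quick way to test the pathing theory is to resolve the BlobId before calling update; storage.get returns null when nothing exists at the computed name. A small sketch of that check against the same com.google.cloud.storage API (bucketName, basePath, and storage are the names from the question; the helper name is mine):

import com.google.cloud.storage.{BlobId, Storage}

// fails fast with a readable message instead of an NPE deep inside the client
def requireExistingBlob(storage: Storage, bucketName: String, basePath: String, objectName: String): BlobId = {
  val blobId = BlobId.of(bucketName, basePath + objectName)
  val blob = storage.get(blobId) // null if no object exists under this exact name
  require(blob != null, s"No object found at gs://$bucketName/${basePath + objectName}")
  blobId
}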
Hope this helps.
EDIT:
I've found another thing that might help you solve the issue. It is a similar way of doing what you are attempting, and the different perspective might help you out.
Have a look here

Qlik Sense: how to specify path in Google Drive?

I have a Google drive account divided into some folders (say, Folder1, Folder2, etc.), with some subfolders in it.
I successfully managed to connect my Qlik Sense app to it.
I need to make it look for files only in a given subfolder.
At the moment, I read it as follows ([...] is the location):
(URL IS [[...]connectorID=GoogleDriveConnector&table=ListSpreadsheets&appID=], qvx);
It works and reloads successfully, but I need it to filter the Spreadsheets properly. How could I get what I need?
To connect to Google Drive you in fact use the Web Connector. Once the Web Connector is installed, it can be initialized as a service or started manually from its folder.
Once it is installed (a recent version can be downloaded from https://qliksupport.force.com/apex/QS_Home_Page, but it seems you already have it, since Google Drive is part of it), it is much nicer to configure connections to online drives there.
You just go to http://localhost:5555/web and generate ready-made code.
In my implementation I used the following options, step by step, to get the data I wanted:
1) CanAuthenticate to generate permanent token
2) ListSpreadsheets
3) ListWorksheets
4) GetWorksheet
You can't just specify a path, but it is possible to retrieve it from the QWC services. Use an algorithm like this:
Use tables like ListFiles/ListWorksheets
Iterate through every row with a 'for' loop:
FOR i=0 to (NoOfRows('Google_ListWorksheets')-1);
  Let vWorksheetKey = Peek('worksheetKey', $(i), 'Google_ListWorksheets');
  Let vTitle = left(Peek('title', $(i), 'Google_ListWorksheets'),3);
Inside the loop, use an 'if' statement to find the desired folder id/worksheet key by its name (stored in the vTitle variable) and use it:
load * FROM [$(vQwcConnectionName)]
(URL IS [http://localhost:5555/data?connectorID=GoogleDriveConnector&table=GetWorksheet&worksheetKey=$(vWorksheetKey)&appID=], qvx);
NEXT;
At the end, you will get your files by their location.

Using Leigh's version of S3Wrapper.cfc: can't get past init()

I am new to S3 and need to use it for image storage. I found half a dozen versions of an S3 wrapper for CF, but it appears that the only one set up for v4 is the one modified by Leigh:
https://gist.github.com/Leigh-/26993ed79c956c9309a9dfe40f1fce29
I dropped it into the com directory and created a "test" page that contains the following code:
s3 = createObject('component','com.S3Wrapper').init(application.s3.AccessKeyId,application.s3.SecretAccessKey);
but got the following error:
So I changed line 37 from
variables.Sv4Util = createObject('component', 'Sv4').init(arguments.S3AccessKey, arguments.S3SecretAccessKey);
to
variables.Sv4Util = createObject('component', 'Sv4Util').init(arguments.S3AccessKey, arguments.S3SecretAccessKey);
Now I am getting:
I feel like going through Leigh's code and changing things is a bad idea, since I have lurked here for years and know Leigh's code is solid.
Does anyone know if there are any examples of how to use this anywhere? If not, what am I doing wrong? If it makes a difference, I am using Lucee 5 and not Adobe's CF engine.
UPDATE:
I followed Leigh's directions and the error is now gone. I added some more code to my test page, which now looks like this:
<cfscript>
    s3 = createObject('component', 'com.S3v4').init(application.s3.AccessKeyId, application.s3.SecretAccessKey);
    bucket = "imgbkt.domain.com";
    obj = "fake.ping";
    region = "s3-us-west-1";
    test = s3.getObject(bucket, obj, region);
    writeDump(test);
    test2 = s3.getObjectLink(bucket, obj, region);
    writeDump(test2);
    writeDump(s3);
</cfscript>
Regardless of what I put in for bucket, obj, or region, I get:
Just in case, I did go to AWS and get new keys:
Leigh, if you are still around, or anyone who has used one of the S3 wrappers: any suggestions or guidance?
UPDATE #2:
Even after Alex's help I am not able to get this to work. The link I receive from getObjectLink is not valid, and getObject never downloads an object. I thought I would try the putObject method:
test3 = s3.putObject(bucketName=bucket,regionName=region,keyName="favicon.ico");
writeDump(test3);
to see if there was any additional information, and I received this:
I did find this article, https://shlomoswidler.com/2009/08/amazon-s3-gotcha-using-virtual-host.html, but it is pretty old, and since S3 specifically suggests using dots in bucket names, I don't think it is relevant any longer. There is obviously something I am doing wrong, but I have spent hours trying to resolve this and can't seem to figure out what it might be.
I will give you a rundown of what the code does:
getObjectLink returns an HTTP URL for the file fake.ping, found by looking in the bucket imgbkt.domain.com of region s3-us-west-1. The link is temporary and expires after 60 seconds by default.
getObject invokes getObjectLink and immediately requests the URL using HTTP GET. The response is then saved to the directory of the S3v4.cfc, with the filename fake.ping by default. Finally, the function returns the full path of the downloaded file: E:\wwwDevRoot\taa\fake.ping
To save the file in a different location, you would invoke:
downloadPath = 'E:\';
test = s3.getObject(bucket,obj,region,downloadPath);
writeDump(test);
The HTTP request is synchronous, meaning the file will have been downloaded completely when the function returns the file path.
If you want to access the actual content of the file, you can do this:
test = s3.getObject(bucket,obj,region);
contentAsString = fileRead(test); // returns the file content as string
// or
contentAsBinary = fileReadBinary(test); // returns the content as binary (byte array)
writeDump(contentAsString);
writeDump(contentAsBinary);
(You might want to stream the content if the file is large, since fileRead/fileReadBinary read the whole file into a buffer. Use fileOpen to stream the content.)
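If it helps to sanity-check what getObjectLink should produce, here is a sketch of the same presigned-GET idea using the AWS SDK for Java v2 rather than the CFC (note the SDK expects the plain region id us-west-1, not the s3-us-west-1 endpoint prefix):

import java.time.Duration
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.model.GetObjectRequest
import software.amazon.awssdk.services.s3.presigner.S3Presigner
import software.amazon.awssdk.services.s3.presigner.model.GetObjectPresignRequest

val presigner = S3Presigner.builder().region(Region.US_WEST_1).build()
val presigned = presigner.presignGetObject(
  GetObjectPresignRequest.builder()
    .signatureDuration(Duration.ofSeconds(60)) // same 60-second lifetime as the CFC default
    .getObjectRequest(GetObjectRequest.builder()
      .bucket("imgbkt.domain.com")
      .key("fake.ping")
      .build())
    .build())
println(presigned.url()) // a SigV4-signed HTTPS URL; compare it with the CFC's output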
Does that help you?

Hadoop DistributedCache caching files without absolute path?

I am in the process of migrating to YARN and it seems the behavior of the DistributedCache changed.
Previously, I would add some files to the cache as follows:
for (String file : args) {
    Path path = new Path(cache_root, file);
    URI uri = new URI(path.toUri().toString());
    DistributedCache.addCacheFile(uri, conf);
}
The path would typically look like
/some/path/to/my/file.txt
Which pre-exists on HDFS and would essentially end up in the DistributedCache as
/$DISTRO_CACHE/some/path/to/my/file.txt
I could symlink to it in my current working directory and use it with DistributedCache.getLocalCacheFiles().
With YARN, it seems this file instead ends up in the cache as:
/$DISTRO_CACHE/file.txt
i.e., the 'path' part of the file URI got dropped and only the filename remains.
How does this work when different absolute paths end up with the same filename? Consider the following case:
DistributedCache.addCacheFile("some/path/to/file.txt", conf);
DistributedCache.addCacheFile("some/other/path/to/file.txt", conf);
Arguably someone could use fragments:
DistributedCache.addCacheFile("some/path/to/file.txt#file1", conf);
DistributedCache.addCacheFile("some/other/path/to/file.txt#file2", conf);
But this seems unnecessarily hard to manage. Imagine the scenario where those are command-line arguments: you somehow need to account for the fact that those two filenames, though coming from different absolute paths, would definitely clash in the DistributedCache, and therefore re-map them to fragments and propagate those fragments to the rest of the program.
Is there an easier way to manage this?
Try adding the files through the Job instead.
The issue is most likely in how you're actually configuring the job and then accessing the files in the Mapper.
When you're setting up the job, you're going to do something like:
job.addCacheFile(new Path("cache/file1.txt").toUri());
job.addCacheFile(new Path("cache/file2.txt").toUri());
Then in your mapper code, the URIs are stored in an array, which can be accessed like so:
URI file1Uri = context.getCacheFiles()[0];
URI file2Uri = context.getCacheFiles()[1];
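If several cached files share a basename, the fragment trick from the question composes cleanly with this newer API as well. A sketch (the hdfs:// paths are illustrative):

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.Job

val job = Job.getInstance(new Configuration())
// the #fragment names the symlink created in the task's working directory,
// so two files both named file.txt no longer clash
job.addCacheFile(new URI("hdfs:///some/path/to/file.txt#file1"))
job.addCacheFile(new URI("hdfs:///some/other/path/to/file.txt#file2"))
// inside a task they are then readable as ./file1 and ./file2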
Hope this helps.

Play 2.1 : Access files on server

Let's suppose I have a folder on my server. I want to be able to access this folder from a URL using Play 2.1.
I really don't know how and I have been searching a lot on the Internet, to no avail.
Here is a folder in which there are files I want to access:
/home/user/myFiles
I'd like that, when I type the following URL:
localhost:9000/filesOnServer/filename
the file named "filename" in the folder myFiles is downloaded.
This is not what I want to do:
GET /filesOnServer/*file controllers.Assets.at(path="/anything", file)
Indeed, this way I can only access files inside the play application directory.
Moreover, if I were to use dist, the files would be stored in a .jar, and we could no longer add files to the application.
Thank you for your help.
Are you using Scala or Java?
I think you should look at Play.getFile() and SimpleResult().
Here is a sample of a little controller method in Scala; it might not be the most efficient, but it seems to work!
def getFile() = Action {
  val file = Play.getFile("../../Images/img.png")(Play.current)
  val fileContent = Enumerator.fromFile(file)
  SimpleResult(
    header = ResponseHeader(200, Map(
      CONTENT_LENGTH -> file.length.toString,
      CONTENT_TYPE -> "image/png",
      CONTENT_DISPOSITION -> s"attachment; filename=${file.getName}")),
    body = fileContent)
}
Hope it will help you!
Note: you could also use new java.io.File(absolutePath)
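To wire this up to the URL scheme from the question, here is a sketch of a controller serving from /home/user/myFiles, with a canonical-path check so that "../" in the filename cannot escape the folder (the route line and action name are mine; Ok.sendFile is part of Play's Scala results API):

// conf/routes (hypothetical entry):
// GET  /filesOnServer/:name    controllers.FileServer.serve(name)

import java.io.File
import play.api.mvc.{Action, Controller}

object FileServer extends Controller {
  private val baseDir = new File("/home/user/myFiles")

  def serve(name: String) = Action {
    val file = new File(baseDir, name).getCanonicalFile
    // reject paths that resolve outside baseDir, e.g. name = "../../etc/passwd"
    if (file.getPath.startsWith(baseDir.getCanonicalPath) && file.isFile)
      Ok.sendFile(file)
    else
      NotFound
  }
}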