Problem storing data in S3 using @Lob annotation - amazon-s3

For my app, I'm storing data in SimpleDB, but since SimpleDB has a 1,024-character-per-attribute maximum, larger values have to be stored in S3.
I'm doing something like this:
    @Basic(fetch = FetchType.LAZY)
    @Lob
    private byte[] multimedia; // to be stored in S3

and @Lob on its getter and setter as well.
The JPA query gives no error, but no multimedia field is created in either S3 or SimpleDB. Kindly guide me on where to look for a solution to this problem.
Also, please suggest how to make multiple SimpleDB rows refer to the same multimedia object in S3 (to reduce data redundancy): when I tried this by manually creating a lob key and reusing that value in another object, it created a new copy in S3 with a new key.
Thanks

You can store your multimedia files on Amazon S3 and store the public URL of each file in Amazon SimpleDB.
Let's suppose you have an image, say my_image.png. Upload that image to Amazon S3 in your desired bucket, say my_bucket, and generate the public URL of the uploaded S3 object. It would be http://my_bucket.s3.amazonaws.com/my_image.png. Store that public URL in Amazon SimpleDB.
This simplifies your task and keeps your uploaded data in sync between Amazon SimpleDB and Amazon S3.
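For example, a minimal sketch with Python and boto3 (assuming the bucket my_bucket already exists and permits public reads; boto3 does not cover SimpleDB, so the URL would be written to SimpleDB separately):

    import boto3

    # Upload the file and build its public URL (my_bucket / my_image.png are
    # the illustrative names from above).
    s3 = boto3.client("s3")
    s3.upload_file(
        "my_image.png", "my_bucket", "my_image.png",
        ExtraArgs={"ACL": "public-read"},  # assumes the bucket allows public-read ACLs
    )
    public_url = "http://my_bucket.s3.amazonaws.com/my_image.png"
    # Store public_url as an attribute of the corresponding SimpleDB item.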

Related

Create data source in Immuta from S3 bucket

I want to create a data source in Immuta from an S3 bucket using their API. When I click + New Data Source on the Immuta website, one of the first options for storage technology is Amazon S3, but I cannot figure out how to create the source programmatically using their API (POSTing to /api/v2/data). Does anyone know how to do so?

Exporting data over an api from s3 using lambda

I have some data stored in DynamoDB and some high-res images of each user stored in S3. The requirement is to be able to export a user's data on demand: via an API endpoint, collate all the data and send it as a response. We are using AWS Lambda with Node.js for business logic, S3 for storing images, and a SQL DB for storing relational data.
I set up API Gateway to receive requests and put them on an SQS queue. The queue triggers a Lambda which runs queries to gather all the data and image paths, and we copy all the images and data into a new bucket with custId as the folder name. Here's where I'm stuck: how to stream this data from our new S3 bucket. All the collected data is about 4 GB. I have tried to stream it via Lambda but keep failing; I am able to stream single files but not all the data as a zip. I have done this in Node, but I would rather not set up an EC2 instance if possible and instead solve it directly with S3 and Lambdas.
I can't seem to find a way to stream an entire folder from AWS to the client as a response to an HTTP request.
Okay, found the answer. Instead of trying to return a zip stream, I'm now zipping the folder, saving it to the bucket itself, and returning a signed URL for it. Several Node modules can zip S3 folders without loading entire files into memory; using one of those, we zip the folder and return a signed URL. How it behaves under actual load remains to be seen; I'll test that soon.
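For reference, a rough sketch of the signed-URL part using boto3 (the original setup is in Node; the bucket and key below are placeholders):

    import boto3

    # After the folder has been zipped and saved back to the bucket, return a
    # time-limited signed URL instead of streaming the bytes through Lambda.
    s3 = boto3.client("s3")
    signed_url = s3.generate_presigned_url(
        "get_object",
        Params={"Bucket": "my_bucket", "Key": "exports/cust123/export.zip"},
        ExpiresIn=3600,  # URL stays valid for one hour
    )
    # Return signed_url in the API response; the client downloads directly from S3.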

Stream 'Azure Blob Storage' file URL to Amazon S3 bucket

I am consuming a document-sharing API in my application which, when used, returns a "downloadUrl" for a given file located in Azure Blob Storage.
I want to take that Azure Blob Storage URL and stream the document into an Amazon S3 bucket.
How would I go about doing this? I see similar questions such as Copy from Azure Blob to AWS S3 using C#, but in that example they seem to have access to the stream of the document itself. Is there any way for me to simply provide S3 with the link and have it do the rest? Or do I need to get the file on the server and stream it as in the example above?
Thanks in advance for the help.
There is only one case where S3 can be directed to fetch content "into" an object, and that is when the source is also an existing S3 object. It can be in the same bucket or a different bucket, or even a different AWS region or account, as long as the calling user has permissions on both source and target.
Any other case -- such as what you are contemplating -- requires that you fetch the object from the source, yourself, and then upload it into S3... as in the example.
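As an illustration, a small Python sketch with requests and boto3 (the same pattern applies in C#; the URL, bucket, and key are made-up placeholders):

    import boto3
    import requests

    # Hypothetical downloadUrl returned by the document-sharing API.
    download_url = "https://example.blob.core.windows.net/docs/report.pdf"

    # Open the Azure Blob Storage URL as a stream rather than reading it fully.
    resp = requests.get(download_url, stream=True)
    resp.raise_for_status()

    # upload_fileobj reads the file-like object in chunks (multipart upload for
    # large files), so the document is streamed to S3 without landing on disk.
    s3 = boto3.client("s3")
    s3.upload_fileobj(resp.raw, "my-destination-bucket", "docs/report.pdf")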

Getting data from S3 (client) to our S3 (company)

We have a requirement to get .csv files from a bucket at a client location (they would provide the S3 bucket info and other required information). Every day we need to pull this data into our S3 bucket so we can process it further. Please suggest the best way/technology we can use to achieve this.
I am planning to do it with Python boto (or pandas or PySpark) or Spark; the reason being that once we get this data, it might be processed further.
You can try a cross-account object copy using the S3 COPY operation. This is the more secure and suggested approach; please go through the link below for more details. It also works for different buckets within the same account. After copying, you can trigger a Lambda function with custom code (Python) to process the .csv files.
How to copy Amazon S3 objects from one AWS account to another by using the S3 COPY operation
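As a rough sketch of what that copy looks like with boto3 (bucket names and keys are placeholders; your IAM identity needs s3:GetObject on the source and s3:PutObject on the destination, granted via the client's bucket policy in the cross-account case):

    import boto3

    # Server-side copy from the client's bucket into ours; the data never
    # passes through our own machine. copy_object handles objects up to 5 GB;
    # for larger objects use the managed s3.copy() transfer instead.
    s3 = boto3.client("s3")
    s3.copy_object(
        CopySource={"Bucket": "client-bucket", "Key": "incoming/data.csv"},
        Bucket="our-bucket",
        Key="landing/data.csv",
    )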
If your customer keeps the data in an S3 bucket to which your account has been granted access, then it should be possible to use the .csv files as a direct source of data for a Spark job. Use s3a://theirbucket/nightly/*.csv as the source, and save it to s3a://mybucket/somewhere, ideally in a format other than CSV (Parquet, ORC, ...). This lets you do some basic transformation into a format that is easier to work with.
If you just want the raw CSV files, that S3 COPY operation is what you need, as it copies the data within S3 itself (6+ MiB/s if in the same S3 location) without involving any of your own VMs.
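A minimal PySpark sketch of that approach, using the illustrative paths from the answer (the S3A connector and credentials must already be configured on the cluster):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("nightly-csv-ingest").getOrCreate()

    # Read the client's nightly CSV drop directly from their bucket.
    df = spark.read.option("header", "true").csv("s3a://theirbucket/nightly/*.csv")

    # Any basic cleanup/transformation would happen here, then persist the data
    # in a friendlier columnar format.
    df.write.mode("overwrite").parquet("s3a://mybucket/somewhere")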

How to search a string in Amazon S3 files?

I have all types of files or documents stored in Amazon S3.
How can I perform a search on those documents using a keyword or string (full-text search, if possible)?
Is there anything like Documentum built on it?
The list of matching documents containing the search string would be displayed to the user for download.
Any help, please?
Searching documents in S3 is not possible.
S3 is not a document database. It is an object store, designed for storing data but inferring no "meaning" from the data -- essentially a key/value store supporting very large values. It has no sense of context. It doesn't index the content of the objects, or even the object metadata. The only way to "find" an object in S3 is to already know its key.
It is excellent for highly available and highly reliable storage, but searching is not part of its design.
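For illustration, the closest built-in operation to "search" is listing keys under a known prefix, which only matches key names and never inspects object contents (the bucket and prefix below are placeholders):

    import boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")

    # Enumerate keys that start with a known prefix; nothing about the objects'
    # contents is searched.
    for page in paginator.paginate(Bucket="my-documents", Prefix="reports/2020/"):
        for obj in page.get("Contents", []):
            print(obj["Key"])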
The solution depends on how structured your S3 file data is.
If it is structured or semi-structured, like CSV or JSON in a columnar-like format, AWS Athena is the best choice. With just a few clicks, you're ready to query your S3 files.
Otherwise, if the data is totally unstructured, you may want to use Elasticsearch or a similar tool.
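As a rough sketch of the Athena route with boto3 (the database, table, query, and output location are hypothetical, and an external table over the S3 data must already be defined in the catalog):

    import boto3

    athena = boto3.client("athena")

    # Kick off a query that filters rows containing the search term; results
    # are written to the given S3 output location.
    athena.start_query_execution(
        QueryString="SELECT * FROM documents WHERE body LIKE '%search term%'",
        QueryExecutionContext={"Database": "my_database"},
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
    )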
You can't search the way you want within Amazon S3 itself, but there is an alternate solution: I use the S3 Browser software for this.
Here is the link to download it: http://s3browser.com/
Download it and you will have the same access as in the Amazon S3 console, and you can also perform searches and other operations.