Any storage service like Amazon S3 which allows upload/download at the same time on a large file - amazon-s3

My requirement is to upload a large file (35GB) and, while the upload is in progress, start the download process on the same file. Is there any storage service that allows this and supports development of a .NET application?
I'm asking because Amazon S3 will not allow simultaneous upload and download on the same object.

You could use Microsoft Azure Storage Page or Append Blobs to solve this:
1) Begin uploading the large data
2) Concurrently download small ranges of data (no greater than 4MB so the client library can read it in one chunk) that have already been written to.
Page Blobs need to be 512-byte aligned and can be read and written in a random-access pattern, whereas Append Blobs must be written sequentially in an append-only pattern.
As long as you're reading sections that have already been written to, you should have no problems. Check out the Blob Getting Started Doc: https://azure.microsoft.com/en-us/documentation/articles/storage-dotnet-how-to-use-blobs/ and some info about Blob Types: https://msdn.microsoft.com/library/azure/ee691964.aspx
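For illustration, here is a rough sketch of that pattern using the classic .NET storage client from the linked doc. The blob name, the 4MB chunk size, and the polling-style reader are placeholder choices, and error handling/retries are omitted:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Blob;   // classic storage client, as in the linked doc

class AppendBlobStreaming
{
    // Writer: append 4 MB chunks of the large file as they become available.
    static async Task UploadAsync(CloudBlobContainer container, string sourcePath)
    {
        CloudAppendBlob blob = container.GetAppendBlobReference("bigfile.dat"); // placeholder name
        await blob.CreateOrReplaceAsync();

        var buffer = new byte[4 * 1024 * 1024]; // 4 MB, the maximum append block size
        using (FileStream file = File.OpenRead(sourcePath))
        {
            int read;
            while ((read = await file.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                using (var block = new MemoryStream(buffer, 0, read, writable: false))
                {
                    await blob.AppendBlockAsync(block);
                }
            }
        }
    }

    // Reader: download only the ranges that have already been committed.
    // Call this repeatedly (passing back the returned offset) until the upload is done.
    static async Task<long> DownloadCommittedAsync(CloudBlobContainer container, Stream destination, long offset)
    {
        CloudAppendBlob blob = container.GetAppendBlobReference("bigfile.dat"); // same placeholder
        await blob.FetchAttributesAsync();                  // refresh the committed length
        long committed = blob.Properties.Length;

        const long rangeSize = 4 * 1024 * 1024;             // stay within the 4 MB guidance above
        while (offset < committed)
        {
            long length = Math.Min(rangeSize, committed - offset);
            await blob.DownloadRangeToStreamAsync(destination, offset, length);
            offset += length;
        }
        return offset;
    }
}
```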
And feel free to contact us with any follow up questions.


Difference Between Tile and Data usage & Feature Service

With a Developer Account we get up to 5GB free for Tile and Data usage and up to 100MB free for Feature Services. We are not sure what the difference between the two is.
If I upload a 100MB+ GeoJSON file, will it be counted against the 100MB limit or the 5GB limit?
Thank you,
Raj
When you upload the data to ArcGIS, it will be published as a layer in a Feature Service, and that published layer counts towards the 100MB Feature Service limit. However, feature service storage is typically (always?) more efficient than GeoJSON storage. For example, in a quick test, a 521KB GeoJSON file downloaded from here turned into a 328KB Feature Service. Geometries in feature services are stored as binary fields, and various other efficiencies of the backing hosted feature service (such as compact storage of attribute data) also help. There are of course many factors that influence this, but I would expect you to always see an improvement over the raw GeoJSON size.
Note that the GeoJSON file you upload will also be stored as the source for the published feature service and counts against your 5GB limit (this is so you can upload updated GeoJSON and republish your feature service at the same URL). You can delete this source item if you won't ever need to update the feature service this way. For reference, here's the GeoJSON file I uploaded (it seems it was also compressed slightly for storage, to 509KB).

How to resolve this error in Google Data Fusion: "Stage x contains a task of very large size (2803 KB). The maximum recommended task size is 100 KB."

I need to move data from a parameterized S3 bucket into Google Cloud Storage. It's a basic data dump. I don't own the S3 bucket. It has the following syntax:
s3://data-partner-bucket/mykey/folder/date=2020-10-01/hour=0
I was able to transfer data at the hourly granularity using the Amazon S3 Client provided by Data Fusion. I wanted to bring over a day's worth of data, so I reset the path in the client to:
s3://data-partner-bucket/mykey/folder/date=2020-10-01
It seemed like it was working until it stopped. The status is "Stopped." When I review the logs just before it stopped I see a warning, "Stage 0 contains a task of very large size (2803 KB). The maximum recommended task size is 100 KB."
I examined the data in the S3 bucket. Each folder contains a series of log files. None of them are "big". The largest folder contains a total of 3MB of data.
I saw a similar question for this error, but the answer involved Spark coding that I don't have access to in Data Fusion.
Screenshot of Advanced Settings in Amazon S3 Client
These are the settings I see in the client. Maybe there is another setting somewhere I need to set? What do I need to do so that Data Fusion can import these files from S3 to GCS?
When you deploy the pipeline you are redirected to a new page with a Ribbon at the top. One of the tools in the Ribbon is Configure.
In the Resources section of the Configure modal you can specify the memory resources. I fiddled around with the numbers; 1000MB worked, 6MB was not enough (for me).
I processed 756K records in about 46 min.

How to create an on-the-fly zip streaming with photos from database?

In my ASP.NET Core 2.0 application, upon a user's request, I need to present the user with a zip file of the photos currently stored in a SQL Server database. The photos are about 1MB in size on average.
I saw a few examples that use a MemoryStream with a ZipArchive to create a stream-based zip file. However, this requires that all the photos (1000+) be zipped before the transfer starts, and I may not even have enough memory to hold the entire collection. I am wondering if there is a mechanism that lets me continue to fetch and archive photos while the streaming is going on. Also, are there any SQL Server or website timeout constraints I need to worry about? Regards.
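A minimal sketch of the streaming approach being asked about, assuming a hypothetical IPhotoRepository that fetches one photo at a time: ZipArchive in Create mode writes entries sequentially to the non-seekable response stream, so the whole collection never has to be buffered. (ZipArchive performs some synchronous writes, which is fine on ASP.NET Core 2.0; later versions block synchronous response I/O by default.) Fetching photo by photo also means any database command timeout applies per photo rather than to the whole export.

```csharp
using System.Collections.Generic;
using System.IO.Compression;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

// Hypothetical data-access abstraction over the photos table; swap in your own
// SqlDataReader / EF Core code that reads one photo at a time.
public interface IPhotoRepository
{
    Task<IEnumerable<int>> GetPhotoIdsAsync();
    Task<byte[]> GetPhotoBytesAsync(int id);
}

public class PhotoExportController : Controller
{
    private readonly IPhotoRepository _photos;

    public PhotoExportController(IPhotoRepository photos) => _photos = photos;

    [HttpGet("photos.zip")]
    public async Task Download()
    {
        Response.ContentType = "application/zip";
        Response.Headers["Content-Disposition"] = "attachment; filename=photos.zip";

        // ZipArchiveMode.Create writes each entry sequentially to the (non-seekable)
        // response stream, so only the photo currently being written is held in memory.
        using (var archive = new ZipArchive(Response.Body, ZipArchiveMode.Create, leaveOpen: true))
        {
            foreach (int id in await _photos.GetPhotoIdsAsync())
            {
                byte[] bytes = await _photos.GetPhotoBytesAsync(id);   // fetch next photo from the database
                var entry = archive.CreateEntry($"{id}.jpg", CompressionLevel.Fastest);
                using (var entryStream = entry.Open())
                {
                    await entryStream.WriteAsync(bytes, 0, bytes.Length);
                }
            }
        }
    }
}
```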

Need suggestion on usage of cloud file storage system, where reads are more than writes

I need a cloud service for saving file objects for my project.
Requirements:
1. Pushing (writing) files would happen far less often than reading files from the storage.
2. Need to maintain versions of the file objects
3. File objects must be indexed for fast retrieval.
Note:
We have already considered using an Amazon S3 bucket, but given our project requirements, reading the file objects would happen about 1 million times more often than writing a file into storage.
As Amazon charges for S3 usage based more on the number of reads than writes, it really is the last option for us to use.
Can anybody kindly provide suggestions on what we can use here?
Thanks!

Does S3 multipart upload actually create multiple objects in my bucket?

Here is an example to help me understand the under-the-hood mechanism.
I decide to upload a 2GB file to my S3 bucket, using a part size of 128MB. Then I will have
(2 * 1024) / 128 => 16 parts
Here are my questions:
1. Am I going to see 16 128MB objects in my bucket, or a single 2GB object?
2. How can S3 understand the order of the parts (1->2->...->16) and reassemble them into a single 2GB file when I download them back? Is there an extra 'meta' object (see the question above) that I need to download first to give the client the information it needs for this reassembly?
3. When the S3 client downloads the above in parallel, at what point does it write the file descriptor for this 2GB file in the local file system (I guess it does not know all the needed information before all the parts have been downloaded)?
While the individual parts are being uploaded, the in-progress multipart upload is stored in Amazon S3; you can view it with the ListMultipartUploads command.
When completing the multipart upload with the CompleteMultipartUpload command, you must specify the list of the individual parts in the correct order. The parts are then combined into a single object, so you will see one 2GB object in your bucket rather than 16 separate 128MB objects.
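For reference, a rough sketch of that low-level flow with the AWS SDK for .NET (bucket, key, and file path are placeholders; error handling and aborting the upload on failure are omitted):

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

class MultipartUploadExample
{
    // Low-level multipart upload: initiate, upload numbered parts, then complete.
    static async Task UploadAsync(IAmazonS3 s3, string bucket, string key, string filePath)
    {
        const long partSize = 128 * 1024 * 1024; // 128 MB parts, as in the question

        var init = await s3.InitiateMultipartUploadAsync(new InitiateMultipartUploadRequest
        {
            BucketName = bucket,
            Key = key
        });

        var partResponses = new List<UploadPartResponse>();
        long fileLength = new FileInfo(filePath).Length;
        long position = 0;

        for (int partNumber = 1; position < fileLength; partNumber++)
        {
            var partResponse = await s3.UploadPartAsync(new UploadPartRequest
            {
                BucketName = bucket,
                Key = key,
                UploadId = init.UploadId,
                PartNumber = partNumber,          // S3 orders the parts by this number
                PartSize = Math.Min(partSize, fileLength - position),
                FilePath = filePath,
                FilePosition = position
            });
            partResponses.Add(partResponse);      // each response carries the part's ETag
            position += partSize;
        }

        // Completing the upload with the ordered (PartNumber, ETag) list is what
        // turns the parts into one single object; no per-part objects remain.
        var completeRequest = new CompleteMultipartUploadRequest
        {
            BucketName = bucket,
            Key = key,
            UploadId = init.UploadId
        };
        completeRequest.AddPartETags(partResponses);
        await s3.CompleteMultipartUploadAsync(completeRequest);
    }
}
```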
Downloading depends upon the client/code you use -- you could download an object in parallel or just single-threaded.
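Purely as an illustration of the parallel option (not how any particular client implements it), here is a sketch that sizes the local file up front from a HEAD request and then fetches byte ranges concurrently; the 128MB range size is an arbitrary choice:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Amazon.S3;
using Amazon.S3.Model;

class ParallelDownloadExample
{
    static async Task DownloadAsync(IAmazonS3 s3, string bucket, string key, string destinationPath)
    {
        const long rangeSize = 128 * 1024 * 1024;   // arbitrary range size

        // The object's total size is known up front from a HEAD request,
        // so the local file (descriptor) can be created before any data arrives.
        var head = await s3.GetObjectMetadataAsync(bucket, key);
        long totalLength = head.ContentLength;

        using (var f = new FileStream(destinationPath, FileMode.Create, FileAccess.Write))
        {
            f.SetLength(totalLength);               // pre-size so workers can write at their own offsets
        }

        var tasks = new List<Task>();
        for (long offset = 0; offset < totalLength; offset += rangeSize)
        {
            long start = offset;
            long end = Math.Min(offset + rangeSize, totalLength) - 1;
            tasks.Add(Task.Run(async () =>
            {
                var response = await s3.GetObjectAsync(new GetObjectRequest
                {
                    BucketName = bucket,
                    Key = key,
                    ByteRange = new ByteRange(start, end)   // ranged GET, independent of how it was uploaded
                });
                using (response)
                using (var file = new FileStream(destinationPath, FileMode.Open, FileAccess.Write, FileShare.ReadWrite))
                {
                    file.Seek(start, SeekOrigin.Begin);
                    await response.ResponseStream.CopyToAsync(file);
                }
            }));
        }
        await Task.WhenAll(tasks);
    }
}
```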