MVC4 - How to upload a file partially (e.g. only the first 10 lines) - asp.net-mvc-4

ASP.NET MVC - Is it possible to upload only the first 10 lines of a file? Basically, we have some files that can range from 1-10GB, but the data we need is present only in the first 10 rows of each file. Using the typical web development approach, we'd upload the whole file to the server and then read the first 10 rows, but uploading a 10GB file just to read a few bytes of data seems like a big waste of resources. Is it possible to read such a file without uploading all of it to the webserver?
Solution - the File API's slice function solved this problem (thanks to Chris below). The simplified code is below for anyone interested -
var sampleFile = document.getElementById('yourfileelement').files[0];
var reader = new FileReader();
var fileData = sampleFile.slice(0, 500000); // read only the first 500,000 bytes
reader.onprogress = function (evt) { /* show progress bar, etc. */ };
reader.onloadend = function (evt) { alert(evt.target.result); }; // evt.target.result contains the file data that was read
reader.readAsText(fileData);

No, but you may be able to accomplish it using the File API client-side to read just the first 10 lines and send them to the server via AJAX. Note, however, that the File API is only supported in modern browsers, so this won't work with IE 9 or earlier. You might be able to create a more comprehensive solution using a Flash or Java applet, but ugh.
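For anyone who wants the upload half as well, here is a rough sketch of that approach; the element id and the /Home/UploadHead endpoint are placeholders rather than anything from the question:

var file = document.getElementById('yourfileelement').files[0];
var reader = new FileReader();
reader.onloadend = function (evt) {
    // keep only the first 10 lines of the chunk that was read
    var firstTenLines = evt.target.result.split('\n').slice(0, 10).join('\n');
    var xhr = new XMLHttpRequest();
    xhr.open('POST', '/Home/UploadHead'); // hypothetical MVC action that receives the lines
    xhr.setRequestHeader('Content-Type', 'text/plain');
    xhr.send(firstTenLines);
};
// read only the first 500 KB instead of the whole (possibly multi-GB) file
reader.readAsText(file.slice(0, 500000));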

Related

What is the best practice for downloading large CSV files from S3 in Java?

I'm trying to get a large CSV file from S3 but the download fails with “java.net.SocketException: Connection reset”, which is probably due to the InputStream simply being open for too long (the download often takes more than an hour since I am doing multiple time-consuming processes on the streamed content). This is how I currently parse the file:
InputStream inputStream = new GZIPInputStream(s3Client.getObject("bucket", "key").getObjectContent());
Reader decoder = new InputStreamReader(inputStream, Charset.defaultCharset());
BufferedReader isr = new BufferedReader(decoder);
CSVParser csvParser = new CSVParser(isr, CSVFormat.DEFAULT);
CSVRecord nextRecord = csvParser.iterator().next();
...
I know I have to split the download into multiple short getObject calls with a defined offset for the GetObjectRequest, but I'm wondering how to define this offset for a CSV, since I need complete lines.
Do I have to ditch the parser library and parse each line into an Object myself so I can keep a count of the read bytes and use it as an offset for the next batch? That doesn't seem very robust to me. Is there any best practice way to achieve "batch downloading" of CSV records?
I decided to simply use the dedicated getObject(GetObjectRequest getObjectRequest, File destinationFile) method to copy the entire CSV to a temporary file on disk. This closes the HTTP connection as soon as possible and lets me read the InputStream from the local file with no problems. It doesn't answer the question of the best way to download in batches, but it's a nice and simple workaround.
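For reference, a minimal sketch of that workaround using the same libraries as the snippet in the question (AWS SDK v1 and Commons CSV); the class and method names here are placeholders:

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.GetObjectRequest;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

import java.io.*;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.zip.GZIPInputStream;

public class S3CsvDownload {
    public static void parse(AmazonS3 s3Client, String bucket, String key) throws IOException {
        File tempFile = Files.createTempFile("s3-csv-", ".gz").toFile();
        tempFile.deleteOnExit();

        // copies the object straight to disk; the S3 connection is released here
        s3Client.getObject(new GetObjectRequest(bucket, key), tempFile);

        try (InputStream in = new GZIPInputStream(new FileInputStream(tempFile));
             Reader decoder = new InputStreamReader(in, Charset.defaultCharset());
             BufferedReader reader = new BufferedReader(decoder);
             CSVParser csvParser = new CSVParser(reader, CSVFormat.DEFAULT)) {
            for (CSVRecord record : csvParser) {
                // time-consuming processing can now take as long as it needs
            }
        }
    }
}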

OutOfMemory on custom extractor

I have stitched a lot of small XML files into one file and then made a custom extractor that returns one row per stitched file, with a byte array containing that file's XML.
Run on remote/master:
Running it for one file (gzipped, 11 MB) works fine.
Running it for more than one file, I get a System.OutOfMemoryException.
Run on local/master:
Running it for one or more files (gzipped, 500+ MB) works fine.
Extractor looks like this:
public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
{
    using (var stream = new StreamReader(input.BaseStream))
    {
        var xml = stream.ReadToEnd();
        // Clean stitched XML
        xml = UtilsXml.CleanXml(xml);
        // Get nodes - one for each stitched file
        var d = new XmlDocument();
        d.LoadXml(xml);
        var root = d.FirstChild;
        for (int i = 0; i < root.ChildNodes.Count; i++)
        {
            output.Set<object>(1, Encoding.ASCII.GetBytes(root.ChildNodes[i].OuterXml));
            yield return output.AsReadOnly();
        }
        yield break;
    }
}
and the error message looks like this:
==== Caught exception System.OutOfMemoryException
at System.Xml.XmlDocument.CreateTextNode(String text)
at System.Xml.XmlLoader.LoadAttributeNode()
at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlDocument.Load(XmlReader reader)
at System.Xml.XmlDocument.LoadXml(String xml)
at Microsoft.Analytics.Tools.Formats.Text.XmlByteArrayRowExtractor.<Extract>d__0.MoveNext()
at ScopeEngine.SqlIpExtractor<ScopeEngine::GZipInput,Extract_0_Data0>.GetNextRow(SqlIpExtractor<ScopeEngine::GZipInput\,Extract_0_Data0>* , Extract_0_Data0* output) in d:\data\ccs\jobs\bc367467-ef86-43d2-a937-46ba2d4cc524_v0\sqlmanaged.h:line 1924
So what am I doing wrong? And how do I debug this on remote?
Thanks!
Unfortunately the local run does not enforce memory limits, so you would have to check memory usage in local vertex debug yourself.
Looking at your code above, I see that you are loading the XML documents into a DOM. Please note that an XML DOM can explode the data size from the string representation by a factor of 10 or more (I have seen 2 to 12 in my time as the resident SQL XML guru).
Each UDO today only gets 1/2 GB of RAM to play with, so I assume that your XML DOM document(s) grow beyond that.
The normal recommendation is to use the XmlReader interface (there is a reader extractor in the samples on http://usql.io as well) and scan through the document(s) to find the information you are looking for.
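As an illustration only (this is not the sample extractor from http://usql.io), a streaming version of the extractor above could look roughly like this; it assumes the stitched file has a single root element whose children are the individual documents, and that the CleanXml step can be skipped or applied per child:

using System.Collections.Generic;
using System.Text;
using System.Xml;
using Microsoft.Analytics.Interfaces;

[SqlUserDefinedExtractor(AtomicFileProcessing = true)] // XML cannot be split at arbitrary byte boundaries
public class StreamingXmlExtractor : IExtractor
{
    public override IEnumerable<IRow> Extract(IUnstructuredReader input, IUpdatableRow output)
    {
        using (var reader = XmlReader.Create(input.BaseStream))
        {
            reader.MoveToContent();    // position on the root element
            reader.ReadStartElement(); // step into the root
            while (!reader.EOF && reader.NodeType != XmlNodeType.EndElement)
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    // ReadOuterXml materializes just the current child and advances the reader
                    var childXml = reader.ReadOuterXml();
                    output.Set<object>(1, Encoding.ASCII.GetBytes(childXml));
                    yield return output.AsReadOnly();
                }
                else
                {
                    reader.Read(); // skip whitespace/comments between children
                }
            }
        }
    }
}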
If your documents are always small enough (e.g., <20MB), you may want to make sure that you release the memory of the other documents and operate one document at a time.
We do have plans to allow you to annotate your UDO with memory needs, but that is still a bit out.

Using Leigh's version of S3Wrapper.cfc: can't get past init

I am new to S3 and need to use it for image storage. I found a half dozen versions of an S3 wrapper for CF, but it appears that the only one set up for v4 is the one modified by Leigh:
https://gist.github.com/Leigh-/26993ed79c956c9309a9dfe40f1fce29
I dropped it in the com directory and created a "test" page that contains the following code:
s3 = createObject('component','com.S3Wrapper').init(application.s3.AccessKeyId,application.s3.SecretAccessKey);
but got the following error:
So I changed line 37 from
variables.Sv4Util = createObject('component', 'Sv4').init(arguments.S3AccessKey, arguments.S3SecretAccessKey);
to
variables.Sv4Util = createObject('component', 'Sv4Util').init(arguments.S3AccessKey, arguments.S3SecretAccessKey);
Now I am getting:
I feel like going through Leigh's code and changing things is a bad idea, since I have lurked here for years and know Leigh's code is solid.
Does anyone know if there are any examples of how to use this anywhere? If not, what am I doing wrong? If it makes a difference, I am using Lucee 5 and not Adobe's CF engine.
UPDATE:
I followed Leigh's directions and the error is now gone. I added some more code to my test page, which now looks like this:
<cfscript>
s3 = createObject('component','com.S3v4').init(application.s3.AccessKeyId,application.s3.SecretAccessKey);
bucket = "imgbkt.domain.com";
obj = "fake.ping";
region = "s3-us-west-1";
test = s3.getObject(bucket,obj,region);
writeDump(test);
test2 = s3.getObjectLink(bucket,obj,region);
writeDump(test2);
writeDump(s3);
</cfscript>
Regardless of what I put in for bucket, obj, or region, I get:
Just in case, I did go to AWS and get new keys:
Leigh, if you are still around (or anyone who has used one of the S3 wrappers), any suggestions or guidance?
UPDATE #2:
Even after Alex's help I am not able to get this to work. The link I receive from getObjectLink is not valid, and getObject never actually downloads an object. I thought I would try the putObject method
test3 = s3.putObject(bucketName=bucket,regionName=region,keyName="favicon.ico");
writeDump(test3);
to see if there was any additional information, and I received this:
I did find this article https://shlomoswidler.com/2009/08/amazon-s3-gotcha-using-virtual-host.html but it is pretty old, and since S3 specifically suggests using dots in bucket names I don't think it is relevant any longer. There is obviously something I am doing wrong, but I have spent hours trying to resolve this and I can't seem to figure out what it might be.
I will give you a rundown of what the code does:
getObjectLink returns an HTTP URL for the file fake.ping, looked up in the bucket imgbkt.domain.com in region s3-us-west-1. The link is temporary and expires after 60 seconds by default.
getObject invokes getObjectLink and immediately requests the URL using HTTP GET. The response is then saved to the directory of S3v4.cfc with the filename fake.ping by default. Finally, the function returns the full path of the downloaded file: E:\wwwDevRoot\taa\fake.ping
To save the file in a different location, you would invoke:
downloadPath = 'E:\';
test = s3.getObject(bucket,obj,region,downloadPath);
writeDump(test);
The HTTP request is synchronous, meaning the file will have been downloaded completely when the function returns the file path.
If you want to access the actual content of the file, you can do this:
test = s3.getObject(bucket,obj,region);
contentAsString = fileRead(test); // returns the file content as string
// or
contentAsBinary = fileReadBinary(test); // returns the content as binary (byte array)
writeDump(contentAsString);
writeDump(contentAsBinary);
(You might want to stream the content if the file is large, since fileRead/fileReadBinary read the whole file into a buffer. Use fileOpen to read the content in chunks instead; a rough sketch follows.)
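For example, a rough sketch only (the 8 KB buffer size is arbitrary and the variable names are placeholders):

filePath = s3.getObject(bucket, obj, region);
fileObj = fileOpen(filePath, "readBinary");
try {
    while (!fileIsEOF(fileObj)) {
        chunk = fileRead(fileObj, 8192); // read 8 KB at a time instead of the whole file
        // process the chunk here (write it elsewhere, hash it, etc.)
    }
}
finally {
    fileClose(fileObj);
}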
Does that help you?

How to access files stored in SQL Server's FileTable?

As far as I know, SQL Server since version 2012 has a new feature, FileTable. It allows us to store files in the file system and to use them from T-SQL.
I am trying to use this feature and I have no idea how to do it properly.
Generally, I don't know how to access files stored in the file table. Let's suppose I have an ASP.NET MVC app and there are a lot of images which I show on web pages in img tags. I would like to store these images in a FileTable and access them as files from the filesystem. But I don't know where these files are stored or how to use them as files. Right now my images are stored in the web application directory in an images folder, and I write something like this:
<img src='/images/mypicture.png' />
And if I move my images to a FileTable, what should I write in src?
<img src='path-toimage-in-filetable' />
I don't think you still need this, but anyway I'll post my answer for anyone else interested.
First, a FileTable is still a table, so if you want to access data from it you need to use a SELECT statement. You'd need something like:
select name, file_stream from filetable_name
where name = 'file_name'
  and file_type = 'file_extension'
Just execute a statement like this in your ASP.NET app, then fetch the results and use the file_stream column to get the binary data of the stored file. If you want to retrieve the file from HTML, first you need to create an action in your controller which will return the retrieved file:
public ActionResult GetFile()
{
    // ... run the SELECT above and read file_stream and file_type into 'file' ...
    return File(file.file_stream, file.file_type);
}
After this, put something like this in your HTML:
<img src="/controller/GetFile" />
Hope this helps!
If you want to know the schema of a FileTable, see here.
I assume by FileTable you actually mean FILESTREAM. A couple of notes about that:
This feature is best used if your files are actually files
The files should be, on average, greater than 1 MB. There can be exceptions to this rule, but if they're smaller than 1 MB on average you may be better off using a VARBINARY(MAX) or XML data type as appropriate; if your images are very small on average (only a few KB), consider a VARBINARY(MAX) column.
Accessing these files will require an open transaction and that the database is properly configured for FILESTREAM
You can get some significant advantages bypassing the normal SQL engine/database file method of data access by telling SQL Server that you want to access the file directly; however, it's not meant for directly accessing the file on the file system, and attempting to do so can break SQL Server's management of these files (transactional consistency, tracking, locking, etc.).
It's pretty likely that your use case here would be better served by using a CDN and storing image URLs in the table, if you really need SQL for this. You can use FILESTREAM to do it (see the code sample below for one implementation), but you'll be hammering your SQL Server for every request unless you also store the images somewhere the browser can properly cache them (my example doesn't do that); and if you store them somewhere else for rendering in the browser, you might as well store them there to begin with (you won't have transactional consistency for those images once they're copied to some other drive/disk/location anyway).
With all that said, here's an example of how you'd access the FILESTREAM data using ADO.NET:
public static string connectionString = ...; // get your connection string from encrypted config

// assumes your FILESTREAM data column is called Img in a table called ImageTable
const string sql = @"
    SELECT
        Img.PathName(),
        GET_FILESTREAM_TRANSACTION_CONTEXT()
    FROM ImageTable
    WHERE ImageId = @id";

public string RetrieveImage(int id)
{
    string serverPath;
    byte[] txnToken;
    string base64ImageData = null;
    using (var ts = new TransactionScope())
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (SqlCommand cmd = new SqlCommand(sql, conn))
            {
                cmd.Parameters.Add("@id", SqlDbType.Int).Value = id;
                using (SqlDataReader rdr = cmd.ExecuteReader())
                {
                    rdr.Read();
                    serverPath = rdr.GetSqlString(0).Value;
                    txnToken = rdr.GetSqlBinary(1).Value;
                }
            }
            using (var sfs = new SqlFileStream(serverPath, txnToken, FileAccess.Read))
            {
                // sfs will now work basically like a FileStream. You can either copy it locally or return it as a base64 encoded string
                using (var ms = new MemoryStream())
                {
                    sfs.CopyTo(ms);
                    base64ImageData = Convert.ToBase64String(ms.ToArray());
                }
            }
        }
        ts.Complete();
        // assume this is PNG image data; replace png with jpeg etc. as appropriate. Might store the MIME type in the table if it will vary...
        return "data:image/png;base64," + base64ImageData;
    }
}
Obviously, if you have lots of images to handle like this, this is not an ideal method; don't try to make an instance of SQL Server into what you should be using a CDN for. However, if you have other really good reasons, you should try to grab as many images as possible in a single request/transaction (e.g. if you know you're displaying 50 images on a page, get all 50 with a single transaction scope, not 50 transaction scopes; this code won't handle that).

Error 3013 thrown when writing a file in Adobe AIR

I'm trying to write/create a JSON file from an AIR app without showing a 'Save As' dialog box.
Here's the code I'm using:
var fileDetails:Object = CreativeMakerJSX.getFileDetails();
var fileName:String = String(fileDetails.data.filename);
var path:String = String(fileDetails.data.path);
var f:File = File.userDirectory.resolvePath( path );
var stream:FileStream = new FileStream();
stream.open(f, FileMode.WRITE );
stream.writeUTFBytes( jsonToExport );
stream.close();
The problem I'm having is that I get 'Error 3013: File or directory in use'. The directory/path is gathered from a Creative Suite extension I'm building; this path is the same as that of the FLA being developed in CS that the extension is being used with.
So I'm not sure if the problem is that there are already files in the directory I'm writing the JSON file to?
Do I need to add a timer in order to close the stream after a slight delay, giving some time to writing the file?
Can you set up some trace() commands? I would need to know what the values of the String variables are, and the f.url.
Can you read from the file that you are trying to write to, or does nothing work?
Where is CreativeMakerJSX.getFileDetails() coming from? Is it giving you data about a file that is in use?
And from Googling around, this seems like it may be a bug. Try setting up a listener for when the stream is finished, in case you have had the file open previously; a rough sketch is below.
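Something along these lines (a rough sketch only, reusing the variables from the question) would surface those values and let the CLOSE event tell you when a previous write has fully finished:

trace("path: " + path + ", fileName: " + fileName + ", url: " + f.url);

var stream:FileStream = new FileStream();
stream.addEventListener(Event.CLOSE, function(e:Event):void {
    trace("stream closed; safe to touch the file again");
});
stream.openAsync(f, FileMode.WRITE); // async mode dispatches Event.CLOSE when close() completes
stream.writeUTFBytes(jsonToExport);
stream.close();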
I rewrote how the file was written and am no longer running into this issue.