convert csv to database and graph data: advice on optimisation - objective-c

Was hoping for some advice on best practice here. Working in objective C and Xcode.
I made a "FileConverter" class which has a method to reads a cvs file with 7 columns of float values into a SQLite database (after verifying the data and parsing it).
The way I have done this is to load the whole file into an NSString, then split it into row components, then split each row into column components (saving the result as a two-dimensional NSArray).
I then open the database and copy the array into it. I'm using the TEXT datatype for storage at the moment. Once the data is there, I plan to graph it.
It seems to check and convert the CSV OK. However, if the CSV is quite long (say 10,000 rows), I get the spinning wheel for several seconds while it does its work. For shorter files it converts almost instantly.
Ultimately, at the point the user clicks "Convert CSV", I will also be running another method which will graph the data, and I expect this will result in a huge delay while it reads the SQL database, assembles the data into CGPoints and then draws into the graph view.
My question is about how best to optimise the process, so it can handle the larger files without spinning wheels appearing. Is this possible?
a) Using NSStrings and NSArrays certainly makes the job of reading and splitting up the data super simple and makes verifying the data easy. Is this the best way? Should I malloc a float array instead?
b) I'm working on the basis that by saving the data as TEXT values in the database, converting them to CGFloat values will be straightforward, but I realise this will add processing time.
c) I'm imagining that an sqlite3 database would be a faster way of getting the data when I come to graph it, but I could also simply copy the CSV file and parse it at the point of graphing the data.
I'd really appreciate advice on this.
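For reference, a minimal sketch of the kind of import loop described above (the table name, column handling and error checking are illustrative only; this variant binds the values as REAL doubles, one of the options weighed in (b)). Wrapping all the inserts in a single transaction is the single biggest win for SQLite insert speed:

    // Sketch only: assumes an open sqlite3 *db and a table such as
    //   CREATE TABLE samples (c0 REAL, c1 REAL, c2 REAL, c3 REAL, c4 REAL, c5 REAL, c6 REAL);
    - (void)importCSVAtPath:(NSString *)path intoDatabase:(sqlite3 *)db
    {
        NSString *contents = [NSString stringWithContentsOfFile:path
                                                       encoding:NSUTF8StringEncoding
                                                          error:NULL];
        NSArray *rows = [contents componentsSeparatedByCharactersInSet:
                            [NSCharacterSet newlineCharacterSet]];

        sqlite3_stmt *stmt = NULL;
        sqlite3_prepare_v2(db, "INSERT INTO samples VALUES (?,?,?,?,?,?,?)", -1, &stmt, NULL);

        sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL);   // one transaction for the whole file
        for (NSString *row in rows) {
            NSArray *cols = [row componentsSeparatedByString:@","];
            if (cols.count != 7) continue;                         // skip blank or malformed lines
            for (int i = 0; i < 7; i++) {
                sqlite3_bind_double(stmt, i + 1, [cols[i] doubleValue]);
            }
            sqlite3_step(stmt);
            sqlite3_reset(stmt);
        }
        sqlite3_exec(db, "COMMIT", NULL, NULL, NULL);
        sqlite3_finalize(stmt);
    }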

Your main thread runs the UI. You are doing the CSV computation on that same thread, which makes the UI hang (stop updating), and that is what makes the OS show the spinning wheel. To avoid this you need to move compute-intensive operations onto a secondary thread.
When the CSV file is small, a synchronous operation can finish the computation and update the UI almost immediately. With big files, the UI (the app window) has to wait for the computation to finish before it can be updated. In such cases you need to run the CSV work asynchronously on a background thread. There are many ways to achieve this, the most popular being Apple's GCD (Grand Central Dispatch).
The following links, Apple's guide to a non-blocking UI and the GCD reference, explain it in detail:
Link to Apple's non blocking guide
Link to GCD
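As a minimal sketch of that advice (the fileConverter, graphView and method names below are illustrative, not part of the question's code), the heavy parse/insert work goes onto a background queue and only the UI update comes back to the main queue:

    - (IBAction)convertCSV:(id)sender
    {
        self.convertButton.enabled = NO;    // optionally show a progress indicator here

        dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
            // Off the main thread: parse the CSV, fill the database, build the points.
            NSArray *points = [self.fileConverter convertCSVAndBuildPoints];

            dispatch_async(dispatch_get_main_queue(), ^{
                // Back on the main thread: only here do we touch UIKit/AppKit.
                self.convertButton.enabled = YES;
                [self.graphView plotPoints:points];
            });
        });
    }

The window stays responsive because the main run loop is never blocked; when the background work finishes, the graph is drawn in one hop back to the main queue.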

Related

labview - buffer data then save to excel file

My question is with respect to a LabVIEW VI (2013) that I am trying to modify. (I am only just learning to use this language. I have searched the NI site and Stack Overflow for help without success; I suspect I am using the wrong keywords.)
My VI consists of a flat sequence, one pane of which contains a while loop where integer data is collected from a device and displayed on a graph.
I would like to be able to buffer this data and then send it to disk when a preset number of samples has been collected. My attempts so far result in only the last record being saved.
Specifically, I need to know how to save the data in a buffer (array) and then, when the correct number of samples has been captured, save it all to disk (saving as it is captured slows the process down too much).
Hope the question is clear and thanks very much in advance for any suggestions.
Tom
Below is a simple circular buffer that holds the most recent 100 readings. Each time the buffer is refilled, its contents are written to a text file. Drag the image onto a VI's block diagram to try it out.
As you learn more about LabVIEW and as your performance and multi-threaded needs increase, consider reading about some of the LabVIEW design patterns mentioned in the other answers:
State machine: http://www.ni.com/tutorial/7595/en/
Producer-consumer: http://www.ni.com/white-paper/3023/en/
I'd suggest splitting the data acquisition and the data saving into two different loops using a producer/consumer design pattern.
Moreover, if you need very high throughput, consider using the TDMS file format.
Have a look here for an overview: http://www.ni.com/white-paper/3727/en/
A screenshot would definitely help. However, some things are clear:
Unless you are dealing with a very high volume of data, very slow hard drives, or other unusual requirements, open the file before your while loop, write to it every time you acquire a sample (leaving the buffering to the OS), and close it afterwards.
If you decide you need to manage buffering on your own, you can use queues. See this example: https://decibel.ni.com/content/docs/DOC-14804 for reference (they stream data from disk, buffering it in the queue, but it is the same idea)
My VI consists of a flat sequence one pane of which
Replace the flat sequence with a finite state machine (e.g. http://forums.ni.com/t5/LabVIEW/Ending-a-Flat-Sequence-Inside-a-case-structure/td-p/3170025).
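For what it's worth, the buffer-then-flush idea the answers describe is language-agnostic. Sketched in ordinary imperative code (this is an illustration only, not LabVIEW G; the simulated device read is a stand-in), it is simply:

    #import <Foundation/Foundation.h>

    // Illustration of the pattern: collect 100 samples at a time, then flush to disk.
    // In LabVIEW this maps to a shift-register array (or queue) plus a case that
    // fires every 100 samples, which is what the producer/consumer pattern formalises.
    static void acquireSamples(int totalSamples)
    {
        NSString *path = @"/tmp/samples.txt";
        [[NSFileManager defaultManager] createFileAtPath:path contents:nil attributes:nil];
        NSFileHandle *log = [NSFileHandle fileHandleForWritingAtPath:path];

        NSMutableArray *buffer = [NSMutableArray arrayWithCapacity:100];
        for (int i = 0; i < totalSamples; i++) {
            [buffer addObject:@(arc4random_uniform(1000))];   // stand-in for the real device read
            if (buffer.count == 100) {                        // preset sample count reached
                NSString *block = [[buffer componentsJoinedByString:@"\n"]
                                      stringByAppendingString:@"\n"];
                [log writeData:[block dataUsingEncoding:NSUTF8StringEncoding]];
                [buffer removeAllObjects];                    // reset the buffer and keep acquiring
            }
        }
        [log closeFile];
    }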

How to handle a very big array in vb.net

I have a program producing a lot of data, which it writes to a CSV file line by line (as the data is created). If I were able to open the CSV file in Excel it would be about 1 billion cells (75,000*14,600). I get a System.OutOfMemoryException thrown every time I try to access it (or even create an array this size). If anyone has any idea how I can get the data into VB.NET so I can do some simple operations (all data needs to be available at once), then I'll try every idea you have.
I've looked at increasing the amount of RAM used, but other articles/posts say this will run short well before the 1 billion mark. There's no issue with time here; assuming it's no more than a few days/weeks I can deal with it (I'll only be running it once or twice a year). If you don't know any way to do it, the only other solutions I can think of would be increasing the number of columns in Excel to ~75,000 (if that's possible - I can't write the data the other way around), or I suppose another language that could handle this?
At present it fails right at the start:
Dim bigmatrix(75000, 14600) As Double
Many thanks,
Fraser :)
First, this will always require a 64-bit operating system and a fairly large amount of RAM, as you're trying to allocate roughly a billion doubles at 8 bytes each (on the order of 8 GB).
This is theoretically possible in Visual Basic targeting .NET 4.5 if you turn on gcAllowVeryLargeObjects. That being said, I would recommend using a jagged array instead of a multidimensional array if possible, as this removes the requirement for a single 8 GB allocation. (It will also potentially allow it to work in .NET 4 or earlier.)

Is it worthwhile to Big Query in realtime XML Data?

I have an XML file of around 2 MB (yes, a small 2 MB file). I want to sort the file into some predetermined format and show the formatted result. As of now it takes 2-3 seconds for the whole process, and we want to cut down on that time.
My questions are:
(a) Is there any way to push XML directly into BigQuery instead of CSV?
(b) I want to do this in real time, so how do I push data from my website and get the results back to my website? (Do you think the command line would do the trick?)
(c) I am working on .NET.
I don't think you can push XML directly into BigQuery. The documentation doesn't say, "You cannot import XML." But the fact that it only explains how to use CSV makes it pretty clear.
It doesn't sound like a perfect use case for BigQuery. BigQuery is great for huge data volumes, but you have small data (as you noted). Would it not be quicker to just sort your XML in memory without pushing it somewhere else?

Loading large amounts of data to an Oracle SQL Database

I was wondering if anyone had any experience with what I am about to embark on. I have several CSV files which are all around a GB or so in size, and I need to load them into an Oracle database. While most of my work after loading will be read-only, I will need to load updates from time to time. Basically I just need a good tool for loading several rows of data at a time into my db.
Here is what I have found so far:
I could use SQL*Loader to do a lot of the work
I could use Bulk-Insert commands
Some sort of batch insert.
Using prepared statements somehow might be a good idea. I guess I'm wondering what everyone thinks is the fastest way to get this insert done. Any tips?
I would be very surprised if you could roll your own utility that will outperform SQL*Loader Direct Path Loads. Oracle built this utility for exactly this purpose - the likelihood of building something more efficient is practically nil. There is also the Parallel Direct Path Load, which allows you to have multiple direct path load processes running concurrently.
From the manual:
Instead of filling a bind array buffer and passing it to the Oracle database with a SQL INSERT statement, a direct path load uses the direct path API to pass the data to be loaded to the load engine in the server. The load engine builds a column array structure from the data passed to it.

The direct path load engine uses the column array structure to format Oracle data blocks and build index keys. The newly formatted database blocks are written directly to the database (multiple blocks per I/O request using asynchronous writes if the host platform supports asynchronous I/O).

Internally, multiple buffers are used for the formatted blocks. While one buffer is being filled, one or more buffers are being written if asynchronous I/O is available on the host platform. Overlapping computation with I/O increases load performance.
There are cases where Direct Path Load cannot be used.
With that amount of data, you'd better be sure of your backing store - that the .dbf disks have enough free space.
sqlldr is script-driven and very efficient, generally more efficient than a SQL script.
The only thing I wonder about is the magnitude of the data. I personally would consider several to many sqlldr processes and assign each one a subset of data and let the processes run in parallel.
You said you wanted to load a few records at a time? That may take a lot longer than you think. Did you mean a few files at a time?
You may be able to create an external table on the CSV files and load them by SELECTing from the external table into another table. I'm not sure whether this method will be quicker, but it might save some of the messing around involved in getting SQL*Loader to work, especially when you have criteria for UPDATEs.

Saving large objects to file

I'm working on a project in Objective-C where I need to work with large quantities of data stored in an NSDictionary (around ~2 GB in RAM at most). After all the computations that I perform on it, it seems like it would be quicker to save/load the data when needed (versus re-parsing the original file).
So I started to look into saving large amounts of data. I've tried using NSKeyedArchiver and [NSDictionary writeToFile:atomically:], but both failed with malloc errors (Can not allocate ____ bytes).
I've looked around SO, Apple's dev forums and Google, but was unable to find anything. I'm wondering if it might be better to create the file bit by bit instead of all at once, but I can't find any way to append to an existing file. I'm not completely opposed to saving to a bunch of small files, but I would much rather use one big file.
Thanks!
Edited to include more information: I'm not sure how much overhead NSDictionary gives me, as I don't take all the information from the text files. I have a 1.5 GB file (of which I keep ~1/2), and it turns out to be around 900 MB to 1 GB in RAM. There will be some more data that I need to add eventually, but it will be constructed with references to what's already loaded into memory - it shouldn't double the size, but it may come close.
The data is all serial, and could be separated in storage, but needs to all be in memory for execution. I currently have integer/string pairs, and will eventually end up with string/string pairs (with all the values also being keys for a different set of strings, so the final storage requirements will be the same strings that I currently have, plus a bunch of references).
In the end, I will need to associate ~3 million strings with some other set of strings. However, the only important thing is the relationship between those strings - I could hash all of them, but NSNumber (as NSDictionary needs objects) might give me just as much overhead.
NSDictionary isn't going to give you the scalable storage that you're looking for, at least not for persistence. You should implement your own type of data structure/serialisation process.
Have you considered using an embedded SQLite database? Then you can process the data while perhaps only loading a fragment of the data structure at a time.
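A sketch of what that could look like with the plain sqlite3 C API (the table and column names here are made up): page through the store with LIMIT/OFFSET so only one slice is ever materialised in memory at a time.

    // Assumes an open sqlite3 *db containing a table such as
    //   CREATE TABLE pairs (k TEXT PRIMARY KEY, v TEXT);
    sqlite3_stmt *stmt = NULL;
    sqlite3_prepare_v2(db, "SELECT k, v FROM pairs LIMIT ? OFFSET ?", -1, &stmt, NULL);

    int offset = 0;                        // advance by the page size after processing each page
    sqlite3_bind_int(stmt, 1, 10000);      // page size
    sqlite3_bind_int(stmt, 2, offset);     // which page to load next

    NSMutableDictionary *page = [NSMutableDictionary dictionary];
    while (sqlite3_step(stmt) == SQLITE_ROW) {
        NSString *key   = @((const char *)sqlite3_column_text(stmt, 0));
        NSString *value = @((const char *)sqlite3_column_text(stmt, 1));
        page[key] = value;                 // only this page's pairs are held in memory
    }
    sqlite3_finalize(stmt);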
If you can, rebuilding your application in 64-bit mode will give you a much larger heap space.
If that's not an option for you, you'll need to create your own data structure and define your own load/save routines that don't allocate as much memory.
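As a sketch of such a routine (the tab-separated format and all names here are made up, not an Apple API): stream the pairs out one at a time, so the peak extra memory is a single pair rather than a serialised copy of the whole multi-gigabyte dictionary, and appending later batches is just reopening the stream in append mode.

    // Append-as-you-go save: one "key<TAB>value" line per pair.
    - (void)appendPairsFromDictionary:(NSDictionary *)dict toStream:(NSOutputStream *)stream
    {
        for (NSString *key in dict) {
            NSString *line = [NSString stringWithFormat:@"%@\t%@\n", key, dict[key]];
            NSData *data = [line dataUsingEncoding:NSUTF8StringEncoding];
            [stream write:data.bytes maxLength:data.length];   // tiny allocation per pair
        }
    }

    // Usage, from whichever method drives the conversion:
    //   NSOutputStream *out = [NSOutputStream outputStreamToFileAtPath:@"/tmp/pairs.tsv" append:YES];
    //   [out open];
    //   [self appendPairsFromDictionary:batch toStream:out];   // call once per computed batch
    //   [out close];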