CMSensorRecorder Processing Large List Troubles? - objective-c

I am using the new CMSensorRecorder with watchOS 2 and have no problems up until I try to get data out of CMRecordedAccelerometerData. I believe I am using fast enumeration to walk the list, as the docs recommend:
for (CMRecordedAccelerometerData *data in list) {
    NSLog(@"Sample: (%f),(%f),(%f)", data.acceleration.x, data.acceleration.y, data.acceleration.z);
}
However, going through even 60 minutes of data takes a long time to process (the accelerometer records at 50 Hz, i.e. 50 data points per second). The WWDC video on Core Motion recommends decimating the data to decrease processing time; how can this be implemented? Or am I misunderstanding the intention here: are we meant to send the CMSensorDataList to the iPhone for processing? I would like to add each axis to an array on the Apple Watch without the iPhone's help.
If, for example, I recorded for the maximum 12 hours, there would be over 2 million data points to read through (12 h * 3600 s/h * 50 Hz = 2,160,000). Even in this case, would it be possible for the Apple Watch to read through all of it?
At the moment it is taking minutes to read through a couple million data points.
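As I understand it, decimating just means keeping every Nth sample, e.g. dropping from 50 Hz to 10 Hz by keeping one sample in five. A minimal sketch of the idea (in Python just to illustrate the stride logic; on the watch the same counter would sit inside the fast-enumeration loop above):

    def decimate(samples, keep_every=5):
        """Keep one sample in every keep_every: 50 Hz becomes 10 Hz when keep_every is 5."""
        for i, sample in enumerate(samples):
            if i % keep_every == 0:
                yield sample

Is that the intended approach, or is there something smarter?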

Related

Detect sharp drops in time series data (Pandas)

I have a time series DataFrame indexed by "Cycles", and I'd like to know if there's a way to detect sharp, prolonged drops.
The data is smoothed using a rolling mean over 10k cycles, and the index is not evenly spaced: the cycle count doesn't increase by the same amount from row to row.
Index (Cycle)   RawData   DecimatedData
4               9.032     9.363
6               9.721     9.359
10              9.831     9.363
11              8.974     9.352
15              9.143     9.354
Even after decimating, there are many "small" drops along the way which I'd like to ignore.
I've tried using df['DecimatedData'].diff() with different windows, but it hasn't yielded good results.
I was thinking of keeping a counter that increases while the .diff() function yields negative values, and flagging the drop once the counter reaches a certain limit (roughly the sketch below). But I think there might be a more elegant way of doing this.
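To make the counter idea concrete, this is roughly what I have in mind (the thresholds are placeholders):

    import pandas as pd

    def flag_prolonged_drops(s, min_len=5, min_total_drop=0.5):
        """Flag samples inside a falling run that is both long and deep."""
        d = s.diff()
        falling = d < 0
        run_id = (falling != falling.shift()).cumsum()       # label consecutive runs
        run_len = falling.groupby(run_id).transform('sum')   # length of each run
        run_drop = d.groupby(run_id).transform('sum')        # total change over the run
        return falling & (run_len >= min_len) & (run_drop <= -min_total_drop)

    # flags = flag_prolonged_drops(df['DecimatedData'])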

Storing large amounts of data in Redis / NoSQL or a relational DB?

I need to store and access financial market candlestick information.
The number of candlesticks I will need to store is beginning to look staggering. There are thousands of markets, each market has many trading pairs, each pair has many time frames, and each time frame is an array of candles like the one below. The array below could be hourly or daily price data, for example.
I need to make this information available to multiple users at any given time, so need to store it and make it available somehow.
The data looks something like this:
[
  {
    time: 1528761600,
    openPrice: 100,
    closePrice: 20,
    highestPrice: 120,
    lowestPrice: 10
  },
  {
    time: 1528761610,
    openPrice: 100,
    closePrice: 20,
    highestPrice: 120,
    lowestPrice: 10
  },
  {
    time: 1528761630,
    openPrice: 100,
    closePrice: 20,
    highestPrice: 120,
    lowestPrice: 10
  }
]
Consumers of the data will mostly be a complex JavaScript-based charting app, but other consumers will be Node code and perhaps other backend code.
My current best idea is to save the candlesticks in Redis, though I have also considered a NoSQL database. I'm not very experienced with either, so I'm not 100% sure Redis is the right choice. It seems to be the most performant option, but perhaps harder to work with: I'm having to learn a lot, and I'm not convinced that Redis's model of saving and retrieval will make this easy, since I will need to continually add candles to each array.
I'm currently thinking something like:
Do an initial fetch from the candlestick API and then either:
1. Create a Redis hash with a suitable label and stringify the whole array of candles into it, so that it can be parsed back by JavaScript, etc.
Drawbacks of this approach:
Every time a new candle is created, I have to parse the JSON, add any new candlesticks, then stringify and save it again.
Pros of this approach:
I can use JavaScript to manage the array and make sure it's sorted, etc.
2. Create a Redis list of timestamps, which lets me just push new candles onto the list and trust it to be in the right order. I can then do a Redis SCAN (?) to return the timestamps between two dates, and use those timestamps to pull the data out of a Redis hash. After retrieving all of this, I'd build a JSON object similar to the one above to pass to JavaScript.
I have to say that both of these approaches feel far more painful to me than putting the data in a relational database. I imagine a NoSQL database could also be much easier, but I'm not experienced with them, so I can't say for sure.
I'm a bit lost and out of my depth here, as you can tell, and would love any advice anyone can give me.
Thanks :)
Your data is very regular: each candlestick is essentially one 64-bit long for the timestamp and four 32-bit numbers for the prices. This makes it very amenable to a Redis bitfield.
Storing the data
Here is how I would store it -
stock-symbol:daily_prices = a bitfield with 30 * 4 i32 slots, assuming you are storing data for the past 30 days
stock-symbol:hourly_prices = a bitfield with 24 * 4 i32 slots
This way, each candle takes 16 bytes, and your memory is (30 + 24) * 16 bytes = 864 bytes per symbol, plus a constant overhead per key.
You don't need to store the timestamp (see below). Also, I have assumed 4 bytes per price; you can store a price as a whole number by eliminating the decimal (e.g. storing cents).
Writing the data
To insert hourly prices, find the current hour (say 07:00). If you treat the bitfield as an array of 4-byte integers, you have to skip 7 * 4 = 28 integers; you then insert the prices at positions 28, 29, 30, 31 (0-based indexes).
So, to store the prices for AAPL at 07:00, you would run the command
BITFIELD AAPL:hourly_prices SET i32 #28 <open price> SET i32 #29 <close price> SET i32 #30 <highest price> SET i32 #31 <lowest price>
(the # prefix makes each offset count in units of the i32 type width rather than in bits).
You would do something similar for daily prices as well.
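For illustration, a minimal redis-py sketch of the hourly write, assuming prices have already been scaled to whole numbers (e.g. cents); the key naming follows the scheme above:

    import redis

    r = redis.Redis()

    def store_hourly_candle(symbol, hour, open_p, close_p, high_p, low_p):
        # each hour owns 4 consecutive i32 slots: open, close, high, low
        base = hour * 4
        r.execute_command(
            'BITFIELD', f'{symbol}:hourly_prices',
            'SET', 'i32', f'#{base}', open_p,
            'SET', 'i32', f'#{base + 1}', close_p,
            'SET', 'i32', f'#{base + 2}', high_p,
            'SET', 'i32', f'#{base + 3}', low_p,
        )

    store_hourly_candle('AAPL', 7, 10000, 10120, 10150, 9980)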
Reading Data
If you are building a charting library, most likely you will want to return data for multiple symbols over a given time range. Say you want to pull out the daily prices for the past 7 days; the logic would be:
For each symbol:
Compute the start and end byte range within the bitfield
Invoke the GETRANGE command
If you run this in a pipeline, it will be very fast.
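A sketch of such a pipelined read, again with redis-py; it assumes the layout above (4 big-endian i32s, i.e. 16 bytes, per day):

    import struct
    import redis

    r = redis.Redis()

    def daily_prices(symbols, start_day, end_day):
        """Fetch candles for many symbols in one round trip."""
        pipe = r.pipeline()
        for sym in symbols:
            pipe.getrange(f'{sym}:daily_prices', start_day * 16, end_day * 16 - 1)
        out = {}
        for sym, raw in zip(symbols, pipe.execute()):
            ints = struct.unpack(f'>{len(raw) // 4}i', raw)  # BITFIELD values are big-endian
            out[sym] = [ints[i:i + 4] for i in range(0, len(ints), 4)]  # (open, close, high, low)
        return out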
Other tips
Usually, you will want to filter by some property of the symbol; for example, "show me graphs of the top 10 tech companies for the last 5 days".
A symbol itself is relational data, and I recommend storing it in a relational database. Just get the symbol names as a list from the relational database, then fetch the prices from Redis.
Redis has its limits, like anything, but they're pretty high, and if you're clever about it you can get amazing performance out of Redis. If you outgrow one instance, you can start thinking about clustering, which should scale roughly linearly to the point where budget becomes a bigger concern than performance.
Without a really solid grasp of the data you're describing and its relations, it sounds like what you're looking for is a sorted set, perhaps scored by date. You can ZSCAN a sorted set to move through it sequentially, or do lots of other useful things with it. You might have data that requires a few structures together, e.g. a hash for the data itself plus an entry in one or more indexes that point at the hash. A simple Redis list might also do the job, since it's inherently ordered by insertion order (this may or may not work for your case; it depends on whether your input arrives in time order).
At the end of the day, Redis performance is generally dictated by how well the data is stored in Redis, in other words, how well the native Redis capabilities have been mapped onto your problem domain. It's pretty easy to use and to program against; I'd highly recommend you look into it.
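For comparison, a minimal redis-py sketch of the sorted-set idea, storing each candle as a JSON member scored by its timestamp (key names are illustrative):

    import json
    import redis

    r = redis.Redis()

    def add_candle(symbol, candle):
        # the timestamp as score keeps range queries in time order
        r.zadd(f'{symbol}:candles', {json.dumps(candle, sort_keys=True): candle['time']})

    def candles_between(symbol, start_ts, end_ts):
        return [json.loads(m) for m in r.zrangebyscore(f'{symbol}:candles', start_ts, end_ts)]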

What is the average consumption of a GPS app (data-wise)?

I'm currently working on a school project to design a network, and we're asked to assess the traffic on it. In our solution (dealing with taxi drivers), each driver has a smartphone whose position can be tracked (through Google Maps, for instance) in order to assign him the best ride possible.
What would be the size of data sent and received by a single app during one day? (I need a rough estimate, no real need for a precise answer to the closest bit)
Thanks
GPS positions stored compactly, but not compressed, need this many bytes:
time: 8 (4 bytes is possible, too)
latitude: 4 (if stored as an integer or float) or 8 (double)
longitude: 4 or 8
speed: 2-4 (short: 2, int: 4)
course: 2-4
So stored in binary in main memory, one location with its most important attributes needs 20-24 bytes.
If you store locations as individual objects in main memory, a simple (Java) solution needs an additional 16 bytes per object, for roughly 40 bytes each.
The maximum recording frequency is usually once per second (1/s). Per hour that needs 3600 s * 40 bytes = 144 kB, so a smartphone easily stores that, even in main memory.
Not sure if you want to transmit the data:
When transmitting it to a server, the volume will usually grow, depending on the transmission protocol used.
But it mainly depends on how you transmit the data and how often.
If you transmit a position every 5 minutes, you don't have to care, even
if you use a simple solution that transmits 100 times more bytes than necessary.
For your school project, try to transmit no more often than every 5, or better 10, minutes.
Encryption adds a huge overhead.
To save bytes (a back-of-envelope estimate follows below):
- Collect as long as feasible, then transmit all at once.
- Favor binary protocols over text-based ones (BSON is better than JSON); this might be out of scope for your school project.
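To put numbers on it, a quick back-of-envelope in Python using the figures above (one fix per second, all day):

    BYTES_PER_FIX = 24            # time(8) + lat(4) + lon(4) + speed(4) + course(4)
    OBJECT_OVERHEAD = 16          # per-object overhead in a simple (Java-style) solution
    FIXES_PER_DAY = 24 * 60 * 60  # one fix per second

    stored = (BYTES_PER_FIX + OBJECT_OVERHEAD) * FIXES_PER_DAY  # ~3.5 MB/day in memory
    naive_upload = BYTES_PER_FIX * FIXES_PER_DAY * 100          # text protocol, ~100x blow-up
    print(f"stored: {stored / 1e6:.1f} MB/day, worst-case upload: {naive_upload / 1e6:.0f} MB/day")

Even the worst case (~200 MB/day) assumes sending every single fix; batching every 10 minutes keeps the payload the same but cuts the per-request overhead by orders of magnitude.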

Using multiple threads for faster execution

Approximate program behavior:
I have a map image with data associated with the map, indexed by RGB value. The data has been populated into an MS Access database. I imported the information from the database into my program as an array and sorted it into the order I want the program to run in.
I want the program to find the nearest pixel that has a different color from the incumbent pixel being compared. (Colors are stored as string attributes of the Pixel object.)
First question: Should I use integers to represent my colors instead of strings? Would this make the comparison function run significantly faster?
In order to find the nearest pixel of a different color, the program begins with the 8 pixels adjacent to the incumbent. If a nonMatch is not found, it continues to the next "degree", and in this fashion it spirals out from the incumbent pixel until it hits a nonMatch. When found, the color of the nonMatch is saved as an attribute of the incumbent. After I find the nonMatch for each pixel, the data is re-inserted into the database.
The program accomplishes what I want in the manner I've written it, but it is very, very slow. After 24 hours, I am only about 3% of the way through execution.
Question Two: Does my program behavior sound about right? Is this the algorithm you would use if you had to accomplish this task?
Question Three: Would it be appropriate for me to use threads in order to finish execution of the program faster? How exactly does that work? (I am brand new to threads, but know a little of the syntax)
Question Four: Would it be more "intelligent" for my program to find the nonMatch for each pixel and insert it into the database immediately after finding it? (I'm guessing this would suit multithreading, because while one thread is accessing the database to insert, another can be accessing the array of pixels, a shared global variable in the program.)
Question Five: If threading is a good idea, I'm guessing I would split the records up into more manageable chunks (e.g. quarters) and have each thread run the same functions on its share of the records? Am I close at all?
Please let me know if I can clarify or provide code samples, I just figured that this is more of a conceptual topic so do not want to overburden the post.
1.) Yes, integers compare much faster than strings. Additionally, they use much less memory.
2.) I would adapt the algorithm in this way:
E.g. #1: Say that for pixel (87,23) you found the nearest nonMatch to be (88,24) at degree=1. You can immediately invert the relation and record that the nearest nonMatch to (88,24) is (87,23). At degree=1, you have finished 2 pixels with 1 search.
E.g. #2: Say that for pixel (17,18) you found the nearest nonMatch to be (17,20) at degree=2. You can immediately record that all pixels bordering on (16,19), (17,19) and (18,19) have their nearest nonMatch (17,20) at degree=1, and that one of them is the nearest nonMatch to (17,20). At degree=2 (or higher), you have finished 5 pixels with 1 search.
3.) Using threads is a double-edged sword: you can run searches in parallel, but you need locking if you write to your array. So it depends on how many CPU cores you can throw at the problem; with 3 or more, threads will surely speed up the search.
4.) The results from 2.) make it necessary to mark pixels as "done" in your array, as you might finish up to 5 pixels with 1 search. I recommend you put finished pixels into a queue and use a dedicated thread to write the queue back to the database: MS Access can't handle concurrent updates, so a single database-writer thread looks like a good idea (see the sketch after point 5).
5.) I recommend you NOT chunk up the array: you will run into problems with pixels on the edge of a chunk having their nearest nonMatch in a different chunk. Instead, if you use e.g. 4 threads, let them run: 1) from the NW corner E, then S; 2) from the SE corner W, then N; 3) from the NE corner S, then W; 4) from the SW corner N, then E.
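A minimal sketch of the queue-plus-single-writer pattern from point 4.), in Python for brevity (write_to_database is a placeholder for your existing insert/update code; the same structure applies in Java or C#):

    import queue
    import threading

    results = queue.Queue()
    DONE = object()  # sentinel that tells the writer to shut down

    def write_to_database(pixel, nearest):
        pass  # placeholder: replace with your existing MS Access insert/update

    def db_writer():
        # the only thread that touches the database
        while True:
            item = results.get()
            if item is DONE:
                break
            write_to_database(*item)

    writer = threading.Thread(target=db_writer)
    writer.start()

    # Search threads enqueue results instead of writing:
    #     results.put((pixel, nearest_non_match))
    # When all searches have finished:
    #     results.put(DONE); writer.join()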
Yes, using an integer would make it much faster.
You can reuse the work you have done for previous pixels. E.g., if (a,b) is the nearest non-equal pixel of (x,y), it is likely that points around (x,y) also have (a,b) as their nearest non-equal pixel.
You can use different threads to work on different pixels, instead of dividing up the search for a single pixel.
IMHO, steps 1 & 2 should make your program much faster, and you might not need multithreading at all.
Yes, I'd convert the colour strings to integers for speed, or even to Color structures if you intend to display them on the screen.
Don't work directly with the database if you can avoid it. Copy the necessary data out of the database into an array before you start, and copy your results back when you're finished.

Measure upload and download speed on iPhone

I would like to measure the upload and download speed of data on the iPhone. Is there any API available to achieve this? Is it correct to measure it by dividing the total bytes received by the time taken for the response?
Yes, it is correct to measure total bytes / time taken; that is exactly what the speed is. You might want to take a moving average if you want to display the download speed continuously, e.g. using the last 500 bytes and the time it took to download those particular ones.
To do this you could keep an NSMutableData as a buffer, which you empty every 2 seconds or so. Then [buffer length] / 2 gives you the number of bytes per second over those 2 seconds. When you empty the buffer, of course, append its contents to the data you are downloading.
There is no direct API to know the speed.
Total data received (or sent) divided by the total time only gives you the average speed. There tends to be a lot of variation in speed over time, so if you want a more accurate value, do the speed calculation over samples:
(data transferred in 1 minute) / (60 seconds). Use this only if you need greater accuracy in the speed calculation; the sampling duration can be changed based on the level of accuracy required (see the sketch below).
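A sketch of that sampling approach (in Python just to show the bookkeeping; on iOS you would call record() from your data-received delegate callback, and the 2-second window is arbitrary):

    import time
    from collections import deque

    class ThroughputMeter:
        """Average bytes/second over a sliding window of recent samples."""

        def __init__(self, window_seconds=2.0):
            self.window = window_seconds
            self.samples = deque()  # (timestamp, byte_count) pairs

        def record(self, nbytes):
            now = time.monotonic()
            self.samples.append((now, nbytes))
            # drop samples older than the window
            while self.samples and now - self.samples[0][0] > self.window:
                self.samples.popleft()

        def bytes_per_second(self):
            return sum(n for _, n in self.samples) / self.window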