Reverse Engineering Fixed Point Numbers - embedded

I am currently putting an engine into another car and I want to keep the fuel economy calculation in the board computer working. I managed to recode that part successfully, but I have been trying to figure out the (simple?) two-byte data format they used, without success. I assume it is fixed-point notation, but no matter how I shift it around, it does not line up. How do the two bytes represent the right number?
Some examples:
Bytes (dec) -> Result
174,10 -> 2,67
92,11 -> 2,84
128,22 -> 3,75
25,29 -> 4,85
225,23 -> 3,98
00,40 -> 5,00
128,34 -> 5,75

Here's a partial solution:
First, swap the bytes. Then join them:
The result (in hex) is:
0AAE
0B5C
1680
1D19
17E1
2800
2280
Then split the number into its first hex digit (4 bits) and the remaining three hex digits (12 bits), and keep the entire number (16 bits) as well. The result (in decimal) is:
0 2734 2734
0 2908 2908
1 1664 5760
1 3353 7449
1 2017 6113
2 2048 10240
2 640 8832
The first digit seems to select a divisor: 0 stands for 1024, 1 for 1536, 2 for 2048. The formula is possibly f = 1024 + n * 512.
Now divide the entire number by that factor. The result, rounded to two decimal places, is:
2734 / 1024 = 2.67
2908 / 1024 = 2.84
5760 / 1536 = 3.75
7449 / 1536 = 4.85
6113 / 1536 = 3.98
10240 / 2048 = 5.00
8832 / 2048 = 4.31
It works for all except the last number, which might contain a mistake.
So it seems to be some sort of floating-point number, but I don't recognize the specific format. Possibly there is a simpler formula that explains the numbers.
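The partial decoding above can be sketched in Python (a sketch of the steps described, with the f = 1024 + n * 512 divisor treated as a guess, not a confirmed format):

```python
def decode(lo, hi):
    """Decode the two-byte value using the scheme worked out above."""
    word = (hi << 8) | lo          # swap the bytes and join into 16 bits
    top = word >> 12               # first hex digit (4 bits)
    factor = 1024 + top * 512      # guessed divisor: 0 -> 1024, 1 -> 1536, 2 -> 2048
    return word / factor

print(round(decode(174, 10), 2))   # 2.67
print(round(decode(128, 34), 2))   # 4.31, not the expected 5.75
```

As noted, this reproduces every example except the last pair, which may simply contain a mistake.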


Can DEFLATE only compress duplicate strings up to 32 KiB apart?

According to DEFLATE spec:
Compressed representation overview
A compressed data set consists of a series of blocks, corresponding to successive blocks of input
data. The block sizes are arbitrary, except that non-compressible
blocks are limited to 65,535 bytes.
Each block is compressed using a combination of the LZ77 algorithm and
Huffman coding. The Huffman trees for each block are independent of
those for previous or subsequent blocks; the LZ77 algorithm may use a
reference to a duplicated string occurring in a previous block, up to
32K input bytes before.
Each block consists of two parts: a pair of Huffman code trees that
describe the representation of the compressed data part, and a
compressed data part. (The Huffman trees themselves are compressed
using Huffman encoding.) The compressed data consists of a series of
elements of two types: literal bytes (of strings that have not been
detected as duplicated within the previous 32K input bytes), and
pointers to duplicated strings, where a pointer is represented as a
pair <length, backward distance>. The representation used in the
"deflate" format limits distances to 32K bytes and lengths to 258
bytes, but does not limit the size of a block, except for
uncompressible blocks, which are limited as noted above.
So pointers to duplicate strings only go back 32 KiB, but since block size is not limited, could the Huffman code tree store two duplicate strings more than 32 KiB apart as the same code? Then is the limiting factor the block size?
The Huffman tree for distances contains codes 0 to 29 (table below); code 29, followed by 8191 in "plain" extra bits, means "distance 32768". That is a hard limit in the definition of Deflate. The block size is not the limiting factor; in fact, the block size is not stored anywhere: a block is effectively an unbounded stream. If you want to stop a block, you send an end-of-block code.
Distance Codes
--------------
     Extra             Extra              Extra                 Extra
Code Bits Dist   Code  Bits  Dist   Code  Bits  Distance   Code  Bits  Distance
---- ---- ----   ----  ---- ------  ----  ---- ---------   ----  ---- -----------
  0    0    1      8     3  17-24    16     7    257-384    24    11   4097-6144
  1    0    2      9     3  25-32    17     7    385-512    25    11   6145-8192
  2    0    3     10     4  33-48    18     8    513-768    26    12   8193-12288
  3    0    4     11     4  49-64    19     8   769-1024    27    12  12289-16384
  4    1   5,6    12     5  65-96    20     9  1025-1536    28    13  16385-24576
  5    1   7,8    13     5  97-128   21     9  1537-2048    29    13  24577-32768
  6    2   9-12   14     6  129-192  22    10  2049-3072
  7    2  13-16   15     6  193-256  23    10  3073-4096
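The table above can be turned into a tiny decoder to see the hard limit concretely (a sketch; the base-distance and extra-bit tables are copied from RFC 1951, section 3.2.5):

```python
# Base distance and extra-bit count for distance codes 0..29 (RFC 1951).
DIST_BASE = [1, 2, 3, 4, 5, 7, 9, 13, 17, 25, 33, 49, 65, 97, 129, 193,
             257, 385, 513, 769, 1025, 1537, 2049, 3073, 4097, 6145,
             8193, 12289, 16385, 24577]
DIST_EXTRA = [0, 0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6,
              7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13]

def distance(code, extra_bits_value):
    """Decode a back-reference distance from its code and extra bits."""
    assert 0 <= extra_bits_value < (1 << DIST_EXTRA[code])
    return DIST_BASE[code] + extra_bits_value

# Code 29 with all 13 extra bits set is the largest expressible distance.
print(distance(29, 8191))   # 32768
```

No combination of code and extra bits can express a larger distance, regardless of how big the block is.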
To add to Zerte's answer, the references to previous sequences have nothing to do with blocks or block boundaries. Such references can be within blocks, across blocks, and the referenced sequence can cross a block boundary.

How should I impute NaN values in a categorical column?

Should I encode the categorical column with label encoding and then impute the NaN values with the most frequent value, or are there other ways?
Encoding requires converting the dataframe to an array, and imputing would then require converting the array back to a dataframe (all this for a single column, and there are more columns like that).
For example, I have the variable BsmtQual, which evaluates the height of a basement and has the following categories:
Ex Excellent (100+ inches)
Gd Good (90-99 inches)
TA Typical (80-89 inches)
Fa Fair (70-79 inches)
Po Poor (<70 inches)
NA No Basement
Out of 2919 values in BsmtQual, 81 are NaN values.
For future problems like this that don't involve coding, you should post at https://datascience.stackexchange.com/.
This depends on a few things. First of all, how important is this variable in your exercise? Assuming that you are doing classification, you could try removing all rows with NaN values, running a few models, then removing the variable and running the same models again. If you haven't seen a dip in accuracy, then you might consider removing the variable completely.
If you do see a dip in accuracy or can't judge impact due to the problem being unsupervised, then there are several other methods you can try. If you just want a quick fix, and if there aren't too many NaNs or categories, then you can just impute with the most frequent value. This shouldn't cause too many problems if the previous conditions are satisfied.
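As a minimal sketch of that quick fix, assuming pandas and a made-up series standing in for BsmtQual:

```python
import pandas as pd

# Hypothetical data; BsmtQual is the column from the question.
df = pd.DataFrame({"BsmtQual": ["Gd", "TA", None, "Ex", "Gd", None]})

# Impute missing values with the most frequent category (the mode).
most_frequent = df["BsmtQual"].mode()[0]
df["BsmtQual"] = df["BsmtQual"].fillna(most_frequent)
```

This stays within the dataframe, so no array round-trip is needed for the imputation step itself.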
If you want to be more exact, then you could consider using the other variables you have to predict the class of the categorical variable (obviously this will only work if the categorical variable is correlated to some of your other variables). You could use a variety of algorithms for this, including classifiers or clustering. It all depends on the distribution of your categorical variable and how much effort you want to put it in to solve your issue.
(I'm only learning as well, but I think that covers most of your options.)
"… or are there other ways?"
Example:
Ex Excellent (100+ inches) 5 / 5 = 1.0
Gd Good (90-99 inches) 4 / 5 = 0.8
TA Typical (80-89 inches) 3 / 5 = 0.6
Fa Fair (70-79 inches) 2 / 5 = 0.4
Po Poor (<70 inches) 1 / 5 = 0.2
NA No Basement 0 / 5 = 0.0
However, labels like these express less precision (which affects accuracy if they are combined with actual measurements).
This could be solved either by scaling values over each category's range (e.g. scaling 0-69 inches over 0.0-0.2), or by using an approximation value for each category (more linearly accurate). For example, if the highest value is 200 inches:
Ex Excellent (100+ inches) 100 / 200 = 0.5000
Gd Good (90-99 inches) (90 + (99 - 90) / 2) / 200 = 0.4725
TA Typical (80-89 inches) (80 + (89 - 80) / 2) / 200 = 0.4225
Fa Fair (70-79 inches) (70 + (79 - 70) / 2) / 200 = 0.3725
Po Poor (<70 inches) (69 / 2) / 200 = 0.1725
NA No Basement 0 / 200 = 0.0000
Actual measurement 120 inch 120 / 200 = 0.6000
This produces a decent approximation (the mid-point of each range, except Ex, which uses its range minimum). If calculations on such columns produce inaccuracies, it is because of the imprecision of the notation (labels express ranges rather than exact values).
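For illustration, the mid-point mapping could be written out as a lookup table (Python; the 200-inch maximum from the example is an assumption):

```python
MAX_HEIGHT = 200  # assumed tallest observed basement, in inches

# Mid-point of each category's range, scaled by the assumed maximum.
# Ex is open-ended, so its range minimum is used; NA maps to zero.
BSMT_QUAL_SCORE = {
    "Ex": 100 / MAX_HEIGHT,                    # 0.5000
    "Gd": (90 + (99 - 90) / 2) / MAX_HEIGHT,   # 0.4725
    "TA": (80 + (89 - 80) / 2) / MAX_HEIGHT,   # 0.4225
    "Fa": (70 + (79 - 70) / 2) / MAX_HEIGHT,   # 0.3725
    "Po": (69 / 2) / MAX_HEIGHT,               # 0.1725
    "NA": 0.0,                                 # 0.0000
}
```

A real measurement of 120 inches would then map to 120 / 200 = 0.6 on the same scale.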

How number of MFCC coefficients depends on the length of the file

I have voice data with a length of 1.85 seconds, and I extract its features using MFCC (with the library from James Lyson). It returns 184 x 13 features. I am using a 10 millisecond frame step, a 25 millisecond frame size, and 13 coefficients from the DCT. How can it return 184? I still cannot understand, because the last frame's length is not 25 milliseconds. Is there a formula which explains how it can return 184? Thank you in advance.
A picture explains this best: the windows overlap, each starting one step after the previous, and the last window simply extends past the end of the signal, so it covers more than the remaining samples.
If you have 184 windows, the region you cover is 183 * 10 + 25 = 1855 ms, which matches your 1.85-second file.
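The frame count can be reproduced with a small formula, assuming the common convention of rounding up so that the trailing samples still get a final (zero-padded) frame:

```python
import math

def num_frames(signal_ms, frame_ms=25, step_ms=10):
    # One frame always fits; the rest are counted in steps, rounding up
    # so that the leftover tail still gets a frame of its own.
    if signal_ms <= frame_ms:
        return 1
    return 1 + math.ceil((signal_ms - frame_ms) / step_ms)

print(num_frames(1850))  # 184 frames for a 1.85 s signal
```

With rounding down instead, the same signal would give 183 frames and the tail would be dropped.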

How do I calculate sum of values in a column with different units?

Date      From      To        Upload    Download  Total
03/12/15  00:53:52  01:53:52  407 KB    4.55 MB   4.94 MB
          01:53:51  02:53:51  68.33 MB  1.60 GB   1.66 GB
          02:53:51  03:53:51  95.39 MB  2.01 GB   2.10 GB
          03:53:50  04:53:50  0 KB      208 KB    209 KB
          04:53:50  05:53:50  0 KB      10 KB     11 KB
          05:53:49  06:53:49  0 KB      7 KB      7 KB
          06:53:49  07:53:49  370 KB    756 KB    1.10 MB
          07:53:48  08:53:48  2.69 MB   64.05 MB  66.74 MB
I have this data in a spreadsheet. The last column contains the total data usage for each hour. I would like to add up all the data used in a day, in GB. As you can see, the totals use varying units: KB, MB and GB.
How can I do it in LibreOffice Calc?
Converting all the totals into kilobytes and then summing the column of kilobytes seems like the most straightforward method.
Assuming your "Total" column is column F, and the entries in this column are text (and not numbers formatted to show the various byte-size indicators on the end), this formula will convert GB into KB:
=IF(RIGHT(F2,2)="GB",1048576*VALUE(LEFT(F2,LEN(F2)-3)),"Not a GB entry")
The IF function takes parameters IF(Test is True, Then Do This, Else Do That). In this case we are telling Calc:
IF the right two characters in this string are "GB"
THEN take the left characters minus three, convert the string into a number with VALUE, and multiply by 1,048,576
ELSE give an error message
You want to handle GB, MB, and KB, which requires nested IF statements like so:
=IF(RIGHT(F2,2)="GB",1048576*VALUE(LEFT(F2,LEN(F2)-3)),IF(RIGHT(F2,2)="MB",1024*VALUE(LEFT(F2,LEN(F2)-3)),IF(RIGHT(F2,2)="KB",VALUE(LEFT(F2,LEN(F2)-3)),"No byte size given")))
Copy and paste the formula down however long your column is. Then SUM over the calculated KB values.
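The same conversion can be sketched outside the spreadsheet, e.g. in Python (assuming binary units, i.e. 1 MB = 1024 KB and 1 GB = 1024 MB, matching the formula above):

```python
def to_kb(entry):
    """Convert a string like '4.55 MB' to kilobytes."""
    value, unit = entry.rsplit(" ", 1)
    factor = {"KB": 1, "MB": 1024, "GB": 1024 ** 2}[unit]
    return float(value) * factor

# Sum a day's totals and express the result in GB.
day_total_kb = sum(to_kb(s) for s in ["4.94 MB", "1.66 GB", "66.74 MB"])
day_total_gb = day_total_kb / 1024 ** 2
```

The structure mirrors the nested IFs: one branch per suffix, then a plain SUM over the converted values.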
Here is the corresponding formula for G, M and K suffixes, taking the value from cell B2:
=IF(RIGHT(B2;1)="G";1048576*VALUE(LEFT(B2;LEN(B2)-1));IF(RIGHT(B2;1)="M";1024*VALUE(LEFT(B2;LEN(B2)-1));IF(RIGHT(B2;1)="K";VALUE(LEFT(B2;LEN(B2)-1));"No byte size given")))

Pulling from a very very specific location imbedded in a text file

I have finished every piece of code in my program save for one tidbit: how to pull two numbers from a text file. I know how to pull lines, and I know how to pull search strings, but I can't figure this one out to save my life.
Anyway, here is a sample of the automatically generated text that I need to pull from...
.......................................................................
Applications Memory Usage (kB):
Uptime: 6089044 Realtime: 6089040
** MEMINFO in pid 764 [com.lookout] **
native dalvik other total
size: 27908 8775 N/A 36683
allocated: 3240 4216 N/A 7456
free: 24115 4559 N/A 28674
(Pss): 1454 1142 6524 *9120*
(priv dirty): 1436 628 5588 *7652*
Objects
Views: 0 ViewRoots: 0
AppContexts: 0 Activities: 0
Assets: 3 AssetManagers: 3
Local Binders: 15 Proxy Binders: 41
Death Recipients: 3
OpenSSL Sockets: 0
SQL
heap: 98 MEMORY_USED: 98
PAGECACHE_OVERFLOW: 16 MALLOC_SIZE: 50
DATABASES
pgsz dbsz Lookaside(b) Dbname
1 14 120 google_analytics.db
Asset Allocations
zip:/system/app/com.lookout_6.0.1_r8234_Release.apk:/resources.arsc: 161K
.............................................................................
The two numbers that I need out of this are the two that I put between asterisks (the asterisks are not normally there). These numbers will be different every time this report is generated, their placement might differ, and they could have 4, 5, or 6 digits.
If anyone could shed any light on the subject it would be greatly appreciated
Thanks,
Zach
You just need to read the last word of the line and convert it to a number. Use String.LastIndexOf to find the last space " " in the line and read the data from that point forward.
Dim line As String = " (Pss): 1454 1142 6524 9120"
Dim value As Integer
' IndexOf returns 0 for a match at the start of the line, so test >= 0.
If line.IndexOf("(Pss)") >= 0 Then
    value = CInt(line.Substring(line.LastIndexOf(" ") + 1))
End If
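For comparison, the same last-word idea in Python (the labels and sample lines are taken from the report above):

```python
def last_number(text, label):
    """Return the last integer on the first line containing `label`."""
    for line in text.splitlines():
        if label in line:
            return int(line.split()[-1])
    return None

report = """\
(Pss):        1454     1142     6524     9120
(priv dirty): 1436      628     5588     7652
"""
print(last_number(report, "(Pss)"))         # 9120
print(last_number(report, "(priv dirty)"))  # 7652
```

Splitting on whitespace and taking the last token copes with varying digit counts and column positions automatically.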