I have formatted a thumbdrive with Fat32 and placed a file in the root directory named sampleFile.txt and with the contents "oblique". I looked at the drive in Disk Investigator and I found in the RootDirSector: sector 4096 the following
0040 53 41 4D 50 4C 45 7E 31 S A M P L E ~ 1 83 65 77 80 76 69 126 49
0048 54 58 54 20 00 36 81 5B T X T . 6 . [ 84 88 84 32 0 54 129 91
0050 2E 45 2E 45 00 00 89 5B . E . E . . . [ 46 69 46 69 0 0 137 91
0058 2E 45 03 00 07 00 00 00 . E . . . . . . 46 69 3 0 7 0 0 0
How do I find the location of the sector cluster where the actual data of the file is located? Here is some additional info:
Logical drive: G
Size: 3 Gb (popularly 3 Gb)
Logical sectors: 3889016
Bytes per sector: 1024
Sectors per Cluster: 8
Cluster size: 8192
File system: FAT32
Number of copies of FAT: 2
Sectors per FAT: 1899
Start sector for FAT1: 298
Start sector for FAT2: 2197
Root DIR Sector: 4096
Root DIR Cluster: 2
2-nd Cluster Start Sector: 4096
Ending Cluster: 485616
Media Descriptor: 248
Root Entries: 0
Heads: 255
Hidden sectors: 0
Backup boot sector: 6
Reserved sectors: 298
FS Info sector: 1
Sectors per track: 63
File system version: 0
SerialVolumeID: 4A95395B
Volume Label: NO NAME
The "Short File Name Entry" contains the starting cluster of the file. Because the test file is very small, it only requires a cluster disk space.
In this case, 8192 bytes for a 7 byte string. So therefore, the FAT does not matter, because the file is not span multiple clusters. However, your file entry is incomplete. A FAT32 file name entry is 32 bytes long.
Offset 1Ah contains the starting cluster (2 bytes length). If offset 14h (2 bytes length) contains a value, then 1Ah is the low word, 14h the high word of the starting cluster.
I'm not sure, but I think the system area is counted sector wise, the data area cluster wise. The data area begins after the fat2. Unusually, your disk has a sector size of 1024 bytes.
Related
I am trying to read data from kafka as you can see :
var config = new ConsumerConfig
{
BootstrapServers = ""*******,
GroupId = Guid.NewGuid().ToString(),
AutoOffsetReset = AutoOffsetReset.Earliest
};
MessageParser<AdminIpoChange> parser = new(() => new AdminIpoChange());
using (var consumer = new ConsumerBuilder<Ignore, byte[]>(config).Build())
{
consumer.Subscribe("AdminIpoChange");
while (true)
{
AdminIpoChange item = new AdminIpoChange();
var cr = consumer.Consume();
item = parser.ParseFrom(new ReadOnlySpan<byte>(cr.Message.Value).ToArray());
}
consumer.Close();
}
I am using google protobuf for send and receive data .This code returns this error in parser line:
KafkaConsumer.ConsumeAsync: Protocol message end-group tag did not match expected tag.
Google.Protobuf.InvalidProtocolBufferException: Protocol message end-group tag did not match expected tag.
at Google.Protobuf.ParsingPrimitivesMessages.CheckLastTagWas(ParserInternalState& state, UInt32 expectedTag)
at Google.Protobuf.ParsingPrimitivesMessages.ReadGroup(ParseContext& ctx, Int32 fieldNumber, UnknownFieldSet set)
at Google.Protobuf.UnknownFieldSet.MergeFieldFrom(ParseContext& ctx)
at Google.Protobuf.UnknownFieldSet.MergeFieldFrom(UnknownFieldSet unknownFields, ParseContext& ctx)
at AdminIpoChange.pb::Google.Protobuf.IBufferMessage.InternalMergeFrom(ParseContext& input) in D:\MofidProject\domain\obj\Debug\net6.0\Protos\Rlc\AdminIpoChange.cs:line 213
at Google.Protobuf.ParsingPrimitivesMessages.ReadRawMessage(ParseContext& ctx, IMessage message)
at Google.Protobuf.CodedInputStream.ReadRawMessage(IMessage message)
at AdminIpoChange.MergeFrom(CodedInputStream input) in D:\MofidProject\domain\obj\Debug\net6.0\Protos\Rlc\AdminIpoChange.cs:line 188
at Google.Protobuf.MessageExtensions.MergeFrom(IMessage message, Byte[] data, Boolean discardUnknownFields, ExtensionRegistry registry)
at Google.Protobuf.MessageParser`1.ParseFrom(Byte[] data)
at infrastructure.Queue.Kafka.KafkaConsumer.ConsumeCarefully[T](Func`2 consumeFunc, String topic, String group) in D:\MofidProject\infrastructure\Queue\Kafka\KafkaConsumer.cs:line 168
D:\MofidProject\mts.consumer.plus\bin\Debug\net6.0\mts.consumer.plus.exe (process 15516) exited with code -1001.
To automatically close the console when debugging stops, enable Tools->Options->Debugging->Automatically close the console when debugging stops.'
Updated:
My sample data that comes from Kafka :
- {"SymbolName":"\u0641\u062F\u0631","SymbolIsin":"IRo3pzAZ0002","Date":"1400/12/15","Time":"08:00-12:00","MinPrice":17726,"MaxPrice":21666,"Share":1000,"Show":false,"Operation":0,"Id":"100d8e0b54154e9d902054bff193e875","CreateDateTime":"2022-02-26T09:47:20.0134757+03:30"}
My rlc Model :
syntax = "proto3";
message AdminIpoChange
{
string Id =1;
string SymbolName =2;
string SymbolIsin =3;
string Date =4;
string Time=5;
double MinPrice =6;
double MaxPrice =7;
int32 Share =8;
bool Show =9;
int32 Operation =10;
string CreateDateTime=11;
enum AdminIpoOperation
{
Add = 0;
Edit = 1;
Delete = 2;
}
}
My data in bytes :
7B 22 53 79 6D 62 6F 6C 4E 61 6D 65 22 3A 22 5C 75 30 36 34 31 5C 75 30 36 32 46 5C 75 30
36 33 31 22 2C 22 53 79 6D 62 6F 6C 49 73 69 6E 22 3A 22 49 52 6F 33 70 7A 41 5A 30 30 30
32 22 2C 22 44 61 74 65 22 3A 22 31 34 30 30 2F 31 32 2F 31 35 22 2C 22 54 69 6D 65 22 3A
22 30 38 3A 30 30 2D 31 32 3A 30 30 22 2C 22 4D 69 6E 50 72 69 63 65 22 3A 31 37 37 32 36
2C 22 4D 61 78 50 72 69 63 65 22 3A 32 31 36 36 36 2C 22 53 68 61 72 65 22 3A 31 30 30 30
2C 22 53 68 6F 77 22 3A 66 61 6C 73 65 2C 22 4F 70 65 72 61 74 69 6F 6E 22 3A 30 2C 22 49
64 22 3A 22 31 30 30 64 38 65 30 62 35 34 31 35 34 65 39 64 39 30 32 30 35 34 62 66 66 31
39 33 65 38 37 35 22 2C 22 43 72 65 61 74 65 44 61 74 65 54 69 6D 65 22 3A 22 32 30 32 32
2D 30 32 2D 32 36 54 30 39 3A 34 37 3A 32 30 2E 30 31 33 34 37 35 37 2B 30 33 3A 33 30 22
7D
The data is definitely not protobuf binary; byte 0 starts a group with field number 15; inside this group is:
field 4, string
field 13, fixed32
field 6, varint
field 12, fixed32
field 6, varint
after this (at byte 151), an end-group token is encountered with field number 6
There are many striking things about this:
your schema doesn't use groups (in fact, the mere existence of groups is now hard to find in the docs), so ... none of this looks right
end-group tokens are always required to match the last start-group field number, which it doesn't
fields inside a single level are usually (although as a "should", not a "must") written in numerical order
you have no field 12 or 13 declared
your field 6 is of the wrong type - we expect fixed64 here, but got varint
So: there's no doubt about it: that data is ... not what you expect. It certainly isn't valid protobuf binary. Without knowing how that data is stored, all we can do is guess, but on a hunch: let's try decoding it as UTF8 and see what it looks like:
{"SymbolName":"\u0641\u062F\u0631","SymbolIsin":"IRo3pzAZ0002","Date":"1400/12/15","Time":"08:00-12:00","MinPrice":17726,"MaxPrice":21666,"Share":1000,"Show":false,"Operation":0,"Id":"100d8e0b54154e9d902054bff193e875","CreateDateTime":"2022-02-26T09:47:20.0134757+03:30"}
or (formatted)
{
"SymbolName":"\u0641\u062F\u0631",
"SymbolIsin":"IRo3pzAZ0002",
"Date":"1400/12/15",
"Time":"08:00-12:00",
"MinPrice":17726,
"MaxPrice":21666,
"Share":1000,
"Show":false,
"Operation":0,
"Id":"100d8e0b54154e9d902054bff193e875",
"CreateDateTime":"2022-02-26T09:47:20.0134757+03:30"
}
Oops! You've written the data as JSON, and you're trying to decode it as binary protobuf. Decode it as JSON instead, and you should be fine. If this was written with the protobuf JSON API: decode it with the protobuf JSON API.
I'm currently trying to extract some data from a .DLL library - I've figured out the file structure (there are 1039 data blocks compressed with zlib, starting at offset 0x3c00, the last one being the fat table). The fat table itself is divided into 1038 "blocks" (8 bytes + a base64 encoded string - the filename). As far as I've seen, byte 5 is the length of the filename.
My problem is that I can't seem to understand what bytes 1-4 are used for: my first guess was that they were an offset to locate the file block inside the .DLL (mainly because the values are increasing throughout the table), but for instance, in this case, the first "block" is:
Supposed offset: 2E 78 00 00
Filename length: 30 00 00 00
Base64 encoded filename: 59 6D 46 30 64 47 78 6C 58 32 6C 75 64 47 56 79 5A 6D 46 6A 5A 56 78 42 59 33 52 70 64 6D 56 51 5A 58 4A 72 63 31 4E 6F 62 33 63 75 59 77 3D 3D
yet, as I said earlier, the block itself is at 0x3c00, so things don't match. Same goes for the second block (starting at 0x3f0b, whereas the table supposed offset is 0x167e)
Any ideas?
Answering my own question lol
Anyway, those numbers are the actual offsets of the file blocks, except for the fact that the first one starts from some random number instead than from the actual location of the first block. Aside from that, though, differences between each couple of offsets do match the length of the corresponding block.
I'm trying to extract an ECDSA public key from my known_hosts file that ssh uses to verify a host. I have one below as an example.
This is the entry for "127.0.0.1 ecdsa-sha2-nistp256" in my known_hosts file:
AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBF3QCzKPRluwunLRHaFVEZNGCPD/rT13yFjKiCesA1qoU3rEp9syhnJgTbaJgK70OjoT71fDGkwwcnCZuJQPFfo=
I ran it through a Base64 decoder to get this:
���ecdsa-sha2-nistp256���nistp256���A]2F[rUF=wXʈ'ZSzħ2r`M::WL0rp
So I'm assuming those question marks are some kind of separator (no, those are lengths). I figured that nistp256 is the elliptical curve used, but what exactly is that last value?
From what I've been reading, the public key for ECDSA has a pair of values, x and y, which represent a point on the curve. Is there some way to extract x and y from there?
I'm trying to convert it into a Java public key object, but I need x and y in order to do so.
Not all of characters are shown since they are binary. Write the Base64-decoded value to the file and open it in a hex editor.
The public key for a P256 curve should be a 65-byte array, starting from the byte with value 4 (which means a non-compressed point). The next 32 bytes would be the x value, and the next 32 the y value.
Here is the result in hexadecimal:
Signature algorithm:
00 00 00 13
65 63 64 73 61 2d 73 68 61 32 2d 6e 69 73 74 70 32 35 36
(ecdsa-sha2-nistp256)
Name of domain parameters:
00 00 00 08
6e 69 73 74 70 32 35 36
(nistp256)
Public key value:
00 00 00 41
04
5d d0 0b 32 8f 46 5b b0 ba 72 d1 1d a1 55 11 93 46 08 f0 ff ad 3d 77 c8 58 ca 88 27 ac 03 5a a8
53 7a c4 a7 db 32 86 72 60 4d b6 89 80 ae f4 3a 3a 13 ef 57 c3 1a 4c 30 72 70 99 b8 94 0f 15 fa
So you first have the name of the digital signature algorithm to use, then the name of the curve and then the public component of the key, represented by an uncompressed EC point. Uncompressed points start with 04, then the X coordinate (same size as the key size) and then the Y coordinate.
As you can see, all field values are preceded by four bytes indicating the size of the field. All values and fields are using big-endian notation.
New to python (very cool), first question. I am reading a 50+ mb ascii file, scanning for property tags and parsing the data into a numpy array. I have placed timing reports throughout the loop and found the culprit, the while loop using np.append(). Wondering if there is a faster method.
This is a sample input file format with fake data for debugging:
...
tag parameter
char name "Poro"
array float data 100
1 2 3 4 5 6 7 8 9 10 11 12
13 14 15 16 17 18 19 20 21 22 23 24
25 26 27 28 29 30 31 32 33 34 35 36
37 38 39 40 41 42 43 44 45 46 47 48
49 50 51 52 53 54 55 56 56 58 59 60
61 62 63 64 65 66 67 68 69 70 71 72
73 74 75 76 77 78 79 80 81 82 83 84
85 86 87 88 89 90 91 92 93 94 95 96
97 98 99 100
endtag
...
and this is the code fragment, where it's the while loop that is taking 70 seconds for a 350k element array:
def readParameter(self, parameterName):
startTime = time.time()
intervalTime = time.time()
token = "tag parameter"
self.inputBuffer.seek(0)
for lineno, line in enumerate(self.inputBuffer, 1):
if token in line:
line = self.inputBuffer.next().replace('"', '').split()
elapsedTime = time.time() - intervalTime
logging.debug(" Time to readParameter find token: " + str(elapsedTime))
intervalTime = time.time()
if line[2] == parameterName:
line = self.inputBuffer.next()
line = self.inputBuffer.next()
np.parameterArray = np.fromstring(line, dtype=float, sep=" ")
line = self.inputBuffer.next()
**while not "endtag" in line:
np.parameterArray = np.append(np.parameterArray, np.fromstring(line, dtype=float, sep=" "))
line = self.inputBuffer.next()**
elapsedTime = time.time() - startTime
logging.debug(" Time to readParameter load array: " + str(elapsedTime))
break
elapsedTime = time.time() - startTime
logging.debug(" Time to readParameter: " + str(elapsedTime))
logging.debug(np.parameterArray)
np.parameterArray = self.make3D(np.parameterArray)
return np.parameterArray
Thanks, Jeff
Appending to an array requires resizing the array, which usually requires allocating a new block of memory that's big enough to hold the new array, copying the existing array to the new location, and freeing the memory it used to use. All of those operations are expensive, and you're doing them for each element. With 350k elements, it's basically garbage-collector memory fragmentation stress-test.
Pre-allocate your array. You've got the count parameter, so make an array that size, and inside your loop, just assign the newly-parsed element to the next spot in the array, instead of appending it. You'll have to keep your own counter of how many elements have been filled. (You could instead iterate over the elements of the blank array and replace them, but that would make error handling a bit trickier to add in.)
In my project I need to know what a zlib header looks like. I've heard it's rather simple but I cannot find any description of the zlib header.
For example, does it contain a magic number?
zlib magic headers
78 01 - No Compression/low
78 9C - Default Compression
78 DA - Best Compression
Link to RFC
0 1
+---+---+
|CMF|FLG|
+---+---+
CMF (Compression Method and flags)
This byte is divided into a 4-bit compression method and a 4-
bit information field depending on the compression method.
bits 0 to 3 CM Compression method
bits 4 to 7 CINFO Compression info
CM (Compression method)
This identifies the compression method used in the file. CM = 8
denotes the "deflate" compression method with a window size up
to 32K. This is the method used by gzip and PNG and almost everything else.
CM = 15 is reserved.
CINFO (Compression info)
For CM = 8, CINFO is the base-2 logarithm of the LZ77 window
size, minus eight (CINFO=7 indicates a 32K window size). Values
of CINFO above 7 are not allowed in this version of the
specification. CINFO is not defined in this specification for
CM not equal to 8.
In practice, this means the first byte is almost always 78 (hex)
FLG (FLaGs)
This flag byte is divided as follows:
bits 0 to 4 FCHECK (check bits for CMF and FLG)
bit 5 FDICT (preset dictionary)
bits 6 to 7 FLEVEL (compression level)
The FCHECK value must be such that CMF and FLG, when viewed as
a 16-bit unsigned integer stored in MSB order (CMF*256 + FLG),
is a multiple of 31.
FLEVEL (Compression level)
These flags are available for use by specific compression
methods. The "deflate" method (CM = 8) sets these flags as
follows:
0 - compressor used fastest algorithm
1 - compressor used fast algorithm
2 - compressor used default algorithm
3 - compressor used maximum compression, slowest algorithm
ZLIB/GZIP headers
Level | ZLIB | GZIP
1 | 78 01 | 1F 8B
2 | 78 5E | 1F 8B
3 | 78 5E | 1F 8B
4 | 78 5E | 1F 8B
5 | 78 5E | 1F 8B
6 | 78 9C | 1F 8B
7 | 78 DA | 1F 8B
8 | 78 DA | 1F 8B
9 | 78 DA | 1F 8B
Deflate doesn't have common headers
The ZLIB header (as defined in RFC1950) is a 16-bit, big-endian value - in other words, it is two bytes long, with the higher bits in the first byte and the lower bits in the second.
It contains these bitfields from most to least significant:
CINFO (bits 12-15, first byte)
Indicates the window size as a power of two, from 0 (256 bytes) to 7 (32768 bytes). This will usually be 7. Higher values are not allowed.
CM (bits 8-11)
The compression method. Only Deflate (8) is allowed.
FLEVEL (bits 6-7, second byte)
Roughly indicates the compression level, from 0 (fast/low) to 3 (slow/high)
FDICT (bit 5)
Indicates whether a preset dictionary is used. This is usually 0.
(1 is technically allowed, but I don't know of any Deflate formats that define preset dictionaries.)
FCHECK (bits 0-4)
A checksum (5 bits, 0..31), whose value is calculated such that the entire value divides 31 with no remainder.*
Typically, only the CINFO and FLEVEL fields can be freely changed, and FCHECK must be calculated based on the final value. Assuming no preset dictionary, there is no choice in what the other fields contain, so a total of 32 possible headers are valid. Here they are:
FLEVEL: 0 1 2 3
CINFO:
0 08 1D 08 5B 08 99 08 D7
1 18 19 18 57 18 95 18 D3
2 28 15 28 53 28 91 28 CF
3 38 11 38 4F 38 8D 38 CB
4 48 0D 48 4B 48 89 48 C7
5 58 09 58 47 58 85 58 C3
6 68 05 68 43 68 81 68 DE
7 78 01 78 5E 78 9C 78 DA
The CINFO field is rarely, if ever, set by compressors to be anything other than 7 (indicating the maximum 32KB window), so the only values you are likely to see in the wild are the four in the bottom row (beginning with 78).
* (You might wonder if there's a small amount of leeway on the value of FCHECK - could it be set to either of 0 or 31 if both pass the checksum? In practice though, this can only occur if FDICT=1, so it doesn't feature in the above table.)
Following is the Zlib compressed data format.
+---+---+
|CMF|FLG| (2 bytes - Defines the compression mode - More details below)
+---+---+
+---+---+---+---+
| DICTID | (4 bytes. Present only when FLG.FDICT is set.) - Mostly not set
+---+---+---+---+
+=====================+
|...compressed data...| (variable size of data)
+=====================+
+---+---+---+---+
| ADLER32 | (4 bytes of checksum)
+---+---+---+---+
Mostly, FLG.FDICT (Dictionary flag) is not set. In such cases the DICTID is simply not present. So, the total hear is just 2 bytes.
The header values(CMF and FLG) with no dictionary are defined as follows.
CMF | FLG
0x78 | 0x01 - No Compression/low
0x78 | 0x9C - Default Compression
0x78 | 0xDA - Best Compression
More at ZLIB RFC
All answers here are most probably correct, however - if you want to manipulate ZLib compression stream directly, and it was produced by using gz_open, gzwrite, gzclose functions - then there is extra 10 leading bytes header before zlib compression steam comes - and those are produced by function gz_open - header looks like this:
fprintf(s->file, "%c%c%c%c%c%c%c%c%c%c", gz_magic[0], gz_magic[1],
Z_DEFLATED, 0 /*flags*/, 0,0,0,0 /*time*/, 0 /*xflags*/, OS_CODE);
And results in following hex dump: 1F 8B 08 00 00 00 00 00 00 0B
followed by zlib compression stream.
But there is also trailing 8 bytes - they are uLong - crc over whole file, uLong - uncompressed file size - look for following bytes at end of stream:
putLong (s->file, s->crc);
putLong (s->file, (uLong)(s->in & 0xffffffff));