wrong output when decoding base64 string - vb.net

I always seem to get incorrect output when decoding this Base64 string in VB.NET (I think it's Base64; it really looks like it).
I'm using the Convert.FromBase64String function,
and I did it like this:
Dim b64str = "0DDQQL3uAikQBgAAc4cqK4WnSQBg4SAgExEAAF3BAmAILYojRgkBhUrBAgEDRw=="
Dim i As String = System.Text.Encoding.Unicode.GetString(Convert.FromBase64String(b64str))
MsgBox(i)
but I always get this output:
バ䃐⤂ؐ
that doesn't seem right

0DDQQL3uAikQBgAAc4cqK4WnSQBg4SAgExEAAF3BAmAILYojRgkBhUrBAgEDRw==
It looks like Base64: the length is a multiple of 4, the characters all belong to the Base64 character set, and the trailing "==" padding is reasonable. Of course, that does not prove it is Base64-encoded data.
Base64 decoding results in:
D0 30 D0 40 BD EE 02 29 10 06 00 00 73 87 2A 2B 85 A7 49 00 60 E1 20 20 13 11 00 00 5D C1 02 60 08 2D 8A 23 46 09 01 85 4A C1 02 01 03 47
Now the problem: this is not a character string, it is an array of 8-bit bytes, so it cannot be displayed reliably as characters. A 0x00 byte may signal the end of the string to the display routine, and non-representable values may be ignored, shown as placeholder characters, or combined so that several bytes display as one multi-byte Unicode character. The only display that is guaranteed to work, and the usual one, is hexadecimal as above.
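To make this concrete, here is a minimal Python sketch (Python is used only for illustration; the question itself is VB.NET). It decodes the string, prints the raw bytes as hex, and then mimics the question's Encoding.Unicode step by reading the bytes as UTF-16LE. Cutting the text at the first NUL is an assumption about how the message box displays it.

```python
import base64

b64str = "0DDQQL3uAikQBgAAc4cqK4WnSQBg4SAgExEAAF3BAmAILYojRgkBhUrBAgEDRw=="

# Base64 decodes to raw bytes, not to text.
raw = base64.b64decode(b64str)
hex_dump = raw.hex(" ").upper()
print(len(raw), "bytes:", hex_dump)

# .NET's Encoding.Unicode is UTF-16 little-endian: each byte pair becomes one code unit.
as_utf16 = raw.decode("utf-16-le", errors="replace")

# Assumption: the display stops at the first NUL (U+0000) character.
visible = as_utf16.split("\x00", 1)[0]
print([f"U+{ord(ch):04X}" for ch in visible])
```

The code units before the first NUL are U+30D0, U+40D0, U+EEBD, U+2902 and U+0610. U+EEBD is a private-use code point that usually has no glyph, which is consistent with the four visible characters in the question.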

That string could be virtually anything. It might be the output of an encryption algorithm or of a hash function such as SHA-*. Your mistake is assuming that it must be Base64-encoded text because it looks like it might be.
Noticing that it might be Base64 is a valid observation, so running that function was a perfectly reasonable thing to do. But you are the one who has to decide, from the results, whether it is Base64 or something else, using knowledge about where the data comes from, which the question does not describe.

How we can search structures recursively for a variable value under windbg using script

Can we write a script that searches for a particular value in structures recursively?
One manual (and time-consuming) way I can think of is to create a log file with "dt -r " and search it by hand.
What do you mean by searching for a value recursively?
dt -r follows structures within structures, and as such their addresses will not be contiguous.
You can use the simple s command to search within a limited range, like this:
kd> r? $t0 = sizeof(nt!_EPROCESS)
kd> ? @$t0
Evaluate expression: 704 = 000002c0
kd> r? $t1 = @@masm(@$proc)
kd> ? @$t1
Evaluate expression: -2063606960 = 84ffdb50
kd> $$ @$proc is pointer to current process in kernel mode
let us search for char 'k' within this range
kd> s -b @$t1 l?@$t0 'k'
84ffdcbc 6b 64 2e 65 78 65 00 00-00 00 00 00 00 00 00 02 kd.exe..........
kd> dt nt!_EPROCESS ImageFileName @$proc
+0x16c ImageFileName : [15] "kd.exe"
kd> ? @$t1+0x16c
Evaluate expression: -2063606596 = 84ffdcbc
kd>
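What the s -b search does in that session can be sketched outside the debugger. The Python below is only an illustration, not a WinDbg script: it builds a zero-filled stand-in buffer of the same size, places "kd.exe" at offset 0x16c (the base address, size and offset are copied from the session above), and reports every address where the byte pattern occurs. The buffer contents and the helper name search_bytes are hypothetical.

```python
BASE = 0x84FFDB50   # value of $t1 in the session above
SIZE = 0x2C0        # sizeof(nt!_EPROCESS) in that session

# Hypothetical stand-in for the structure's memory: zeros plus the image name.
memory = bytearray(SIZE)
memory[0x16C:0x16C + 6] = b"kd.exe"


def search_bytes(buf, base, pattern):
    """Return the address of every occurrence of pattern in buf, like a flat byte search."""
    hits = []
    pos = buf.find(pattern)
    while pos != -1:
        hits.append(base + pos)
        pos = buf.find(pattern, pos + 1)
    return hits


hits = search_bytes(memory, BASE, b"k")
print([hex(h) for h in hits])
```

A flat search like this only covers one contiguous range. Following embedded pointers, as dt -r does, would need one such search per structure reached.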

How many combinations does SHA-256 have?

By using an online tool and Wikipedia I found out that every SHA-256 output string is 64 characters long, containing digits and letters. Hence I assumed that there are 34^36 combinations (2^216, simplified by an algebra calculator).
After doing some research I found that most people say there are 2^256 combinations. Could someone explain? To make the context clear: I am writing a paper about cryptocurrencies and trying to explain how many different combinations there are, how long it could take to go through them (that is, how many guesses it could take), and how this compares to the total number of atoms in the universe (roughly 10^85).
SHA-256 produces 256 bits which is 32 bytes, not characters, each byte has 256 possible values.
There are 256 bits and each bit has 2 values (0 or 1), thus 2^256.
There are 32 bytes and each byte has 256 values, thus 256^32.
Note: 2^256 == 256^32 ~= 10^77.
The 32 bytes can be encoded many ways, in hexadecimal it would be 64 characters, in Base64 it would be 44 characters.
The total number of possible SHA-256 outputs is:
115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
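The counting argument above can be checked with a few lines of Python. This is a small sketch, and the input b"example input" is an arbitrary placeholder: it confirms that the bit, byte and hex-digit counts describe the same number, and that only the text encoding changes the length of the output.

```python
import base64
import hashlib

# Three ways of counting the same output space.
by_bits = 2 ** 256     # 256 bits, 2 values each
by_bytes = 256 ** 32   # 32 bytes, 256 values each
by_hex = 16 ** 64      # 64 hex digits, 16 values each
print(by_bits == by_bytes == by_hex)
print(len(str(by_bits)), "decimal digits")

# The digest is always 32 bytes; only the text encoding changes its length.
digest = hashlib.sha256(b"example input").digest()  # arbitrary sample input
hex_text = digest.hex()
b64_text = base64.b64encode(digest).decode("ascii")
print(len(digest), len(hex_text), len(b64_text))
```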
A SHA-256 hash written in hex has 64 characters, which form 32 two-character hex pairs, because each byte is written as 2 hex characters.
3a 7b d3 e2 36 0a 3d 29 ee a4 36 fc fb 7e 44 c7 35 d1 17 c4 2d 1c 18 35 42 0b 6b 99 42 dd 4f 1b
Above is a hash with the hex pairs separated so you can count all 32.
Hex has 16 available characters (0-9 and a-f), so each pair has 16^2 = 256 possible values.
With 32 pairs in a SHA-256 hash, you compute 256^32 to get:
115792089237316195423570985008687907853269984665640564039457584007913129639936
possible SHA-256 hashes.

u-sql: filtering out empty// Null strings (microsoft academic graph)

I am new to U-SQL in Azure Data Lake Analytics.
I want to do what I think is a very simple operation, but I ran into trouble.
Basically, I want to create a query that ignores empty strings.
Using the check in SELECT works, but not in the WHERE clause.
Below are the statement I am running and the cryptic error I get.
JOB
@xsel_res_1 =
    EXTRACT
        x_paper_id long,
        x_Rank uint,
        x_doi string,
        x_doc_type string,
        x_paper_title string,
        x_original_title string,
        x_book_title string,
        x_paper_year int,
        x_paper_date DateTime?,
        x_publisher string,
        x_journal_id long?,
        x_conference_series_id long?,
        x_conference_instance_id long?,
        x_volume string,
        x_issue string,
        x_first_page string,
        x_last_page string,
        x_reference_count long,
        x_citation_count long?,
        x_estimated_citation int?
    FROM @"adl://xmag.azuredatalakestore.net/graph/2018-02-02/Papers.txt"
    USING Extractors.Tsv()
;
@xsel_res_2 =
    SELECT
        x_paper_id AS x_paper_id,
        x_doi.ToLower() AS x_doi,
        x_doi.Length AS x_doi_length
    FROM @xsel_res_1
    WHERE NOT string.IsNullOrEmpty(x_doi)
;
@xsel_res_3 =
    SELECT
        *
    FROM @xsel_res_2
    SAMPLE ANY (5)
;
OUTPUT @xsel_res_3
TO @"/graph/2018-02-02/x_output/x_papers_x6.tsv"
USING Outputters.Tsv();
THE ERROR
Vertex failed
Vertex failure triggered quick job abort. Vertex failed: SV1_Extract[0][1] with error: Vertex user code error.
VertexFailedFast: Vertex failed with a fail-fast error
E_RUNTIME_USER_EXTRACT_ROW_ERROR: Error occurred while extracting row after processing 10 record(s) in the vertex' input split. Column index: 5, column name: 'x_original_title'.
E_RUNTIME_USER_EXTRACT_EXTRACT_INVALID_CHARACTER_AFTER_QUOTED_FIELD: Invalid character following the ending quote character in a quoted field.
Row selected
Component
RUNTIME
Message
Invalid character following the ending quote character in a quoted field.
Resolution
Column should be fully surrounded with double-quotes and double-quotes within the field escaped as two double-quotes.
Description
Invalid character is detected following the ending quote character in a quoted field. A column delimiter, row delimiter or EOF is expected. This error can occur if double-quotes within the field are not correctly escaped as two double-quotes.
Details
Row Delimiter: 0x0
Column Delimiter: 0x9
HEX: 61 76 6E 69 20 74 65 72 6D 69 6E 20 75 20 70 6F 76 61 6C 6A 73 6B 6F 6A 20 6C 69 73 74 69 6E 69 20 69 20 6E 61 74 70 69 73 75 20 67 20 31 31 38 35 09 22 50 6F 20 6B 6F 6E 63 75 22 ### 20 28 73 74 61 72 69 20 68 72
UPDATE
By the way, the same operations work on other datasets, so as far as I can tell the problem is not the syntax.
//Define schema of file, must map all columns
@searchlog =
    EXTRACT UserId int,
            Start DateTime,
            Region string,
            Query string,
            Duration int,
            Urls string,
            ClickedUrls string
    FROM @"/Samples/Data/SearchLog.tsv"
    USING Extractors.Tsv();
@searchlog_1 =
    SELECT * FROM @searchlog
    WHERE NOT string.IsNullOrEmpty(ClickedUrls);
OUTPUT @searchlog_1
TO @"/Samples/Output/SearchLog_output_x1.tsv"
USING Outputters.Tsv();
This is an unfortunate error display for this case.
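Assuming the text is UTF-8, you can use a site like www.hexutf8.com to convert the hex to the following (0x09 is the tab column delimiter, shown here as <TAB>):
avni termin u povaljskoj listini i natpisu g 1185<TAB>"Po koncu" (stari hr
It looks like the input row contains at least one " character that is not properly escaped. The field after the tab starts with a quote, so the extractor treats "Po koncu" as a complete quoted field and then fails on the text that follows the closing quote. As the Resolution text says, the field should be fully surrounded with double-quotes and the embedded quotes doubled, so it should start like this (with a closing " at the end of the field):
avni termin u povaljskoj listini i natpisu g 1185<TAB>"""Po koncu"" (stari hr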
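The HEX line from the error details can also be decoded locally instead of through a website. A minimal Python sketch, assuming the bytes are UTF-8 and treating the "###" in the dump as the marker for where the extractor stopped:

```python
# HEX line copied from the error details; "###" marks where the extractor stopped.
hex_line = (
    "61 76 6E 69 20 74 65 72 6D 69 6E 20 75 20 70 6F 76 61 6C 6A 73 6B 6F 6A "
    "20 6C 69 73 74 69 6E 69 20 69 20 6E 61 74 70 69 73 75 20 67 20 31 31 38 35 "
    "09 22 50 6F 20 6B 6F 6E 63 75 22 ### 20 28 73 74 61 72 69 20 68 72"
)

before_hex, after_hex = hex_line.split("###")

# bytes.fromhex ignores the spaces between the hex pairs.
before = bytes.fromhex(before_hex).decode("utf-8")
after = bytes.fromhex(after_hex).decode("utf-8")

print(repr(before))  # repr() makes the tab visible as \t
print(repr(after))
```

Printing with repr() shows the tab (0x09) right before the opening quote, and that the error position sits directly after the closing quote.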
@Saveenr's answer assumes that the values in your file are all quoted. Alternatively, if they are not quoted (and do not contain your column separator as values), then setting Extractors.Tsv(quoting:false) could help as well.
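The difference between quote-aware parsing and parsing with quoting disabled can be illustrated with Python's csv module. This is only an analogy under stated assumptions, not the U-SQL extractor itself: the sample row is a hypothetical two-column line modeled on the failing input, and parse is a helper invented for this sketch.

```python
import csv

# Hypothetical two-column row modeled on the failing input:
# a tab, then a field that starts with a quote.
row = 'g 1185\t"Po koncu" (stari hr'


def parse(line, **options):
    """Parse one tab-separated line with the given csv options."""
    return next(csv.reader([line], delimiter="\t", **options))


# Quote-aware and strict: text after the closing quote is rejected.
try:
    parse(row, strict=True)
    strict_failed = False
except csv.Error:
    strict_failed = True

# Quoting disabled: the quote characters are ordinary data.
unquoted = parse(row, quoting=csv.QUOTE_NONE)

# A fully quoted field with the embedded quotes doubled parses to the same value.
escaped = parse('g 1185\t"""Po koncu"" (stari hr"', strict=True)

print(strict_failed, unquoted, escaped)
```

With quoting disabled the raw field survives unchanged, which is why that option only suits files whose values never contain the column separator.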

How do I run md5() on a bigint in Presto?

select md5(15)
returns
Query failed (#20160818_193909_00287_8zejd): line 1:8:
Unexpected parameters (bigint) for function md5. Expected: md5(varbinary)
How do I hash 15 and get back a string? I'd like to select 1 in 16 items at random, e.g. where md5(id) like '%3'.
FYI I might be on version 0.147, don't know how to tell.
FYI I found this PR. md5 would be cross-platform, which is nice, but I'd take a Presto-dependent hash function that spread ids relatively uniformly. I suppose I could implement my own linear formula. Seems awkward.
Best thing I could come up with was to cast the integer as a varchar, then turn it into varbinary via utf8, then apply md5 on the varbinary:
presto> select md5(to_utf8(cast(15 as varchar)));
_col0
-------------------------------------------------
9b f3 1c 7f f0 62 93 6a 96 d3 c8 bd 1f 8f 2f f3
(1 row)
If this is not the result you get, you can always turn it into a hex string manually:
presto> select to_hex(md5(to_utf8(cast(15 as varchar))));
_col0
----------------------------------
9BF31C7FF062936A96D3C8BD1F8F2FF3
(1 row)
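The same chain (integer to decimal text, to UTF-8 bytes, to MD5, to hex) can be reproduced outside Presto to check the value and to try the 1-in-16 sampling idea from the question. A Python sketch; md5_hex is a helper name made up here, and the id range is arbitrary:

```python
import hashlib


def md5_hex(n):
    """Mirror of to_hex(md5(to_utf8(cast(n as varchar)))): hash the decimal text of n."""
    return hashlib.md5(str(n).encode("utf-8")).hexdigest().upper()


print(md5_hex(15))

# 1-in-16 sampling: keep ids whose hex digest ends in one chosen digit.
sample = [i for i in range(16000) if md5_hex(i).endswith("3")]
print(len(sample))
```

In Presto itself, the LIKE filter from the question would presumably have to be applied to the to_hex(...) result, since md5 returns varbinary rather than a string.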

BER Encoding of a "Choice"

I am trying to parse an LDAP bind request using the Apache Harmony ASN.1/BER classes (could use another library, I just chose that as it has an Apache License).
My question is specifically about the encoding of a "CHOICE" in ASN.1. The RFC that defines the LDAP ASN.1 schema (http://www.rfc-editor.org/rfc/rfc2251.txt) gives the following as part of a bind request:
BindRequest ::= [APPLICATION 0] SEQUENCE {
version INTEGER (1 .. 127),
name LDAPDN,
authentication AuthenticationChoice }
AuthenticationChoice ::= CHOICE {
simple [0] OCTET STRING,
-- 1 and 2 reserved
sasl [3] SaslCredentials }
SaslCredentials ::= SEQUENCE {
mechanism LDAPString,
credentials OCTET STRING OPTIONAL }
How is that CHOICE there actually encoded?
I generated a sample bind request using JXplorer and captured the raw data that was sent. It looks like this:
00000000 30 31 02 01 01 60 2c 02 01 03 04 1b 75 69 64 3d |01...`,.....uid=|
00000010 74 65 73 74 75 73 65 72 2c 64 63 3d 74 65 73 74 |testuser,dc=test|
00000020 2c 64 63 3d 63 6f 6d 80 0a 74 65 73 74 69 6e 67 |,dc=com..testing|
00000030 31 32 33 |123|
The 80 there (at offset 0x27) seems to represent that choice. Fair enough, and I get that (per http://en.wikipedia.org/wiki/Basic_Encoding_Rules#BER_encoding) the high bit is set in order to indicate that it's "context specific" (i.e. defined by this application/protocol). But how would I know whether this is a "simple" or "sasl" auth? What indicates which option of the choice is being used? In this case it looks like the next byte (0x0a) is the length of the string, so this could be an OctetString or something of the sort, but I don't see anything here that indicates what the actual type is other than 0x80...
I'm also not sure what the [0] and [3] mean in the CHOICE section above. Is that saying there are four options but only options numbered 0 and 3 are in use?
Below you can see the output of the openssl asn1parse command. The CHOICE members are encoded using so-called context-specific tags, which means the normal tag value is replaced with the one specified in the ASN.1 definition for the respective item in the CHOICE. Here the tag has value 0, which indicates that the alternative tagged [0] ("simple") is selected; that alternative is of type OCTET STRING. So the value 0 of the context-specific tag tells you the type of the value. If there were no context tag, the normal OCTET STRING tag would be used.
0:d=0 hl=2 l= 49 cons: SEQUENCE
2:d=1 hl=2 l= 1 prim: INTEGER :01
5:d=1 hl=2 l= 44 cons: appl [ 0 ]
7:d=2 hl=2 l= 1 prim: INTEGER :03
10:d=2 hl=2 l= 27 prim: OCTET STRING :uid=testuser,dc=test,dc=com
39:d=2 hl=2 l= 10 prim: cont [ 0 ]
The '80'H in the encoded message above is called the "identifier octets" (in general it may be more than one octet). This value of the identifier octet(s) indicates that the selected alternative of the CHOICE is "simple", because the five low-order bits of '80'H are '00000'B, which matches the tag number of the tag of "simple" ([0]).
If the sender had selected the "sasl" alternative, the identifier octet would be 'A3'H instead of '80'H. The '3'H in 'A3'H (the five low-order bits) is the tag number of the tag of "sasl" ([3]). The two highest-order bits of the identifier octet are set to '10'B for both alternatives because both [0] and [3] are "context-specific" tags (this just means that these tags don't contain the APPLICATION keyword or the PRIVATE keyword). The next bit of the identifier octet (the "constructed" bit) is set to '0' for "simple" but is set to '1' for "sasl", because the encoding of "sasl" contains nested tags whereas the encoding of "simple" does not contain any nested tags.
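The bit layout described above can be checked against the captured bytes with a short Python sketch. It is a minimal illustration, not a full BER parser: it assumes single-octet identifiers and short-form lengths (both hold for this capture), and the helper names decode_identifier and read_tlv are made up for the example.

```python
CLASS_NAMES = ("universal", "application", "context-specific", "private")

# Tag numbers of AuthenticationChoice, from the ASN.1 definition quoted in the question.
AUTH_ALTERNATIVES = {0: "simple", 3: "sasl"}


def decode_identifier(octet):
    """Split a single identifier octet into (class, constructed, tag number)."""
    tag_class = CLASS_NAMES[octet >> 6]   # two high-order bits
    constructed = bool(octet & 0x20)      # the "constructed" bit
    tag_number = octet & 0x1F             # five low-order bits
    return tag_class, constructed, tag_number


def read_tlv(data, pos):
    """Read one tag-length-value triple; only short-form lengths are handled."""
    identifier, length = data[pos], data[pos + 1]
    if length & 0x80:
        raise ValueError("long-form length not supported in this sketch")
    start = pos + 2
    return identifier, data[start:start + length], start + length


# Raw bytes copied from the hex dump in the question.
capture = bytes.fromhex(
    "30 31 02 01 01 60 2c 02 01 03 04 1b 75 69 64 3d "
    "74 65 73 74 75 73 65 72 2c 64 63 3d 74 65 73 74 "
    "2c 64 63 3d 63 6f 6d 80 0a 74 65 73 74 69 6e 67 "
    "31 32 33"
)

_, message, _ = read_tlv(capture, 0)          # outer SEQUENCE
_, _, pos = read_tlv(message, 0)              # first INTEGER (message id)
_, bind_request, _ = read_tlv(message, pos)   # [APPLICATION 0] BindRequest
_, _, pos = read_tlv(bind_request, 0)         # version
_, name, pos = read_tlv(bind_request, pos)    # name
auth_id, auth_value, _ = read_tlv(bind_request, pos)

tag_class, constructed, tag_number = decode_identifier(auth_id)
chosen = AUTH_ALTERNATIVES[tag_number]
print(hex(auth_id), tag_class, constructed, tag_number, chosen, auth_value)
```

Calling decode_identifier(0xA3) yields a context-specific, constructed tag with number 3, which is the "sasl" case described above.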