Using pyOpenSSL to generate p12 / pfx containers - ssl-certificate

I have just started using the pyOpenSSL library to generate certificates and to read existing ones. However, I want my program to generate a p12/pfx bundle instead of the standard PEM files. I wasn't able to find an appropriate API for this, only one for dumping PKCS12 objects. Can anyone tell me how to do this?
Thanks

Using the example PEM private key data in privkeydata and certificate data in certdata (which I moved to the bottom of the answer for better readability), I think the following is what you are looking for:
>>> from OpenSSL import crypto
>>> cert = crypto.load_certificate(crypto.FILETYPE_PEM, certdata)
>>> privkey = crypto.load_privatekey(crypto.FILETYPE_PEM, privkeydata)
>>> pfx = crypto.PKCS12Type()
>>> pfx.set_privatekey(privkey)
>>> pfx.set_certificate(cert)
>>> pfxdata = pfx.export('passphrase')
>>> with open('test.pfx', 'wb') as pfxfile:
...     pfxfile.write(pfxdata)
...
>>>
Checking the result by invoking openssl in the shell:
$ openssl pkcs12 -info -in test.pfx -passin pass:passphrase -passout pass:otherpassphrase
MAC Iteration 1
MAC verified OK
PKCS7 Encrypted data: pbeWithSHA1And3-KeyTripleDES-CBC, Iteration 2048
Certificate bag
Bag Attributes
localKeyID: 97 AD B9 5B EC 5B BA 6D BC F7 D3 06 EA CC 12 A1 52 AE 90 7B
subject=/C=nl/ST=Noord-Holland/O=Mobilefish.com/L=Zaandam/OU=Marketing/CN=www.mobilefish.com/emailAddress=contact@mobilefish.com
issuer=/C=nl/ST=Noord-Holland/O=Mobilefish.com/L=Zaandam/OU=Marketing/CN=www.mobilefish.com/emailAddress=contact@mobilefish.com
-----BEGIN CERTIFICATE-----
MIID0zCCAzygAwIBAgIBADANBgkqhkiG9w0BAQQFADCBqDELMAkGA1UEBhMCbmwx
FjAUBgNVBAgTDU5vb3JkLUhvbGxhbmQxFzAVBgNVBAoTDk1vYmlsZWZpc2guY29t
MRAwDgYDVQQHEwdaYWFuZGFtMRIwEAYDVQQLEwlNYXJrZXRpbmcxGzAZBgNVBAMT
End3dy5tb2JpbGVmaXNoLmNvbTElMCMGCSqGSIb3DQEJARYWY29udGFjdEBtb2Jp
bGVmaXNoLmNvbTAeFw0xNTExMTQwMjAyNDlaFw0xNjExMTMwMjAyNDlaMIGoMQsw
CQYDVQQGEwJubDEWMBQGA1UECBMNTm9vcmQtSG9sbGFuZDEXMBUGA1UEChMOTW9i
aWxlZmlzaC5jb20xEDAOBgNVBAcTB1phYW5kYW0xEjAQBgNVBAsTCU1hcmtldGlu
ZzEbMBkGA1UEAxMSd3d3Lm1vYmlsZWZpc2guY29tMSUwIwYJKoZIhvcNAQkBFhZj
b250YWN0QG1vYmlsZWZpc2guY29tMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKB
gQC2Yw+5xKhhelVmH7Weu9eMhreuRvQXuNsyi5SA0sBXboOybox5oJZAWbL84KN5
gX1qN7U62szotl3K49bRlzbKu/TmcVdJYlRlnwusL5XQJDKv+uERlUU0QDXeswEu
M93UxkeN/j0vKfjp8k/Ny4qc5pNOT/dqNRyx01pVFV8NFwIDAQABo4IBCTCCAQUw
HQYDVR0OBBYEFKEXjyTmz/vOVxHbtJCJUraUZhxsMIHVBgNVHSMEgc0wgcqAFKEX
jyTmz/vOVxHbtJCJUraUZhxsoYGupIGrMIGoMQswCQYDVQQGEwJubDEWMBQGA1UE
CBMNTm9vcmQtSG9sbGFuZDEXMBUGA1UEChMOTW9iaWxlZmlzaC5jb20xEDAOBgNV
BAcTB1phYW5kYW0xEjAQBgNVBAsTCU1hcmtldGluZzEbMBkGA1UEAxMSd3d3Lm1v
YmlsZWZpc2guY29tMSUwIwYJKoZIhvcNAQkBFhZjb250YWN0QG1vYmlsZWZpc2gu
Y29tggEAMAwGA1UdEwQFMAMBAf8wDQYJKoZIhvcNAQEEBQADgYEAanK63a/8Emwl
v4i8XI57hkt3Iq0NbMveGT01DrBiRUJ/Uf7jpS+j4blcaUUJ6JuOk+wrwYZIZqZE
9mHfiPKMNps22OYXoHkaZPcxtofpyTGE2tnW2ReauTKCVPSczQPqn7mhBG2t6TJs
YBpp0s2I/q7a4bVbowibPbO3RK1kBcA=
-----END CERTIFICATE-----
PKCS7 Data
Shrouded Keybag: pbeWithSHA1And3-KeyTripleDES-CBC, Iteration 2048
Bag Attributes
localKeyID: 97 AD B9 5B EC 5B BA 6D BC F7 D3 06 EA CC 12 A1 52 AE 90 7B
Key Attributes: <No Attributes>
-----BEGIN ENCRYPTED PRIVATE KEY-----
MIICxjBABgkqhkiG9w0BBQ0wMzAbBgkqhkiG9w0BBQwwDgQIQ4sDzexzf6gCAggA
MBQGCCqGSIb3DQMHBAjmWBnhSdfEJgSCAoCQMrLa0Y+V3zrgRtjesa6Er/dJFz40
rpN2unNBpdrFMkuEIcCAnlNoLKJpe3x20ly4QrYaDG7sxMbdxnr3jqf4Jy0TxgnC
nC5x8hDhIV+M7gnXQiiGTK2VPDeJ2n3/hmmIEgleBOSdbz39O1Ik52+E47Fee+pB
W9b2au/p8NUE66v7JgN+VQVG6EcXCsyFkFivl1O+eokcTwa9q3sqPW+xTiPJ43LH
yKAvjT7vWOYark6QK8Gcth4Y8FdKMA6kHNim/LAtl4Vc1Af5qHMubBO1C+Avw0HE
Qt3DP/mkdwLYjisBbqjpAFkTsdEuMIwyhuExCSu0w+QfxjVAezyC6y+7IWfBfRpG
j9+MNy9qe0DqKIQ/P09GeoXJH8Yy0RQiA1XpQBcGSuRHj6B3lWUlxtTlGlTmxlzO
yPDJXxaUmMNTCNQlYu7CBj2FOXXewAuGi0nv8/bbZpWxSgyZcVcJlCtYZq+9NmYv
RhGwfhWuNsQZQmtFDgtpg/GYD8TFV6oc6mmTurBkLEL2KGCnPWVRH8xyJeb87/EF
/H/2gA5P9aS/K3cN3OsgC5uUi38jgFZ2p69TPNLjxBHK5HakaCgh1Txdx9dcAoMt
lA/GRBu/CoqA48O4vV3RyrB0ZNSYyAYTuVRjJ+50d427InaUwrwaYCakpbxXKrlH
jvb2gKtXnvIpNnE32N1whORBGU+srEO8tz/Il5AYrZ21ESIixX9pftAgIiEMc7Xw
WmV3NexkHZGvyCG1vq62LzNxgEBN3Ng013gYdLXbO1y/pXcSRHGRdidvIwYefBbs
Yo6yvsUgdtfeAwlCC+ojgB6rTKhlbk2Yex6y9sxRCSMHibiwnveuNez+
-----END ENCRYPTED PRIVATE KEY-----
The example PEMs are created on and copy/pasted from mobilefish:
>>> certdata = """-----BEGIN CERTIFICATE-----
... MIID0zCCAzygAwIBAgIBADANBgkqhkiG9w0BAQQFADCBqDELMAkGA1UEBhMCbmwx
... FjAUBgNVBAgTDU5vb3JkLUhvbGxhbmQxFzAVBgNVBAoTDk1vYmlsZWZpc2guY29t
... MRAwDgYDVQQHEwdaYWFuZGFtMRIwEAYDVQQLEwlNYXJrZXRpbmcxGzAZBgNVBAMT
... End3dy5tb2JpbGVmaXNoLmNvbTElMCMGCSqGSIb3DQEJARYWY29udGFjdEBtb2Jp
... bGVmaXNoLmNvbTAeFw0xNTExMTQwMjAyNDlaFw0xNjExMTMwMjAyNDlaMIGoMQsw
... CQYDVQQGEwJubDEWMBQGA1UECBMNTm9vcmQtSG9sbGFuZDEXMBUGA1UEChMOTW9i
... aWxlZmlzaC5jb20xEDAOBgNVBAcTB1phYW5kYW0xEjAQBgNVBAsTCU1hcmtldGlu
... ZzEbMBkGA1UEAxMSd3d3Lm1vYmlsZWZpc2guY29tMSUwIwYJKoZIhvcNAQkBFhZj
... b250YWN0QG1vYmlsZWZpc2guY29tMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKB
... gQC2Yw+5xKhhelVmH7Weu9eMhreuRvQXuNsyi5SA0sBXboOybox5oJZAWbL84KN5
... gX1qN7U62szotl3K49bRlzbKu/TmcVdJYlRlnwusL5XQJDKv+uERlUU0QDXeswEu
... M93UxkeN/j0vKfjp8k/Ny4qc5pNOT/dqNRyx01pVFV8NFwIDAQABo4IBCTCCAQUw
... HQYDVR0OBBYEFKEXjyTmz/vOVxHbtJCJUraUZhxsMIHVBgNVHSMEgc0wgcqAFKEX
... jyTmz/vOVxHbtJCJUraUZhxsoYGupIGrMIGoMQswCQYDVQQGEwJubDEWMBQGA1UE
... CBMNTm9vcmQtSG9sbGFuZDEXMBUGA1UEChMOTW9iaWxlZmlzaC5jb20xEDAOBgNV
... BAcTB1phYW5kYW0xEjAQBgNVBAsTCU1hcmtldGluZzEbMBkGA1UEAxMSd3d3Lm1v
... YmlsZWZpc2guY29tMSUwIwYJKoZIhvcNAQkBFhZjb250YWN0QG1vYmlsZWZpc2gu
... Y29tggEAMAwGA1UdEwQFMAMBAf8wDQYJKoZIhvcNAQEEBQADgYEAanK63a/8Emwl
... v4i8XI57hkt3Iq0NbMveGT01DrBiRUJ/Uf7jpS+j4blcaUUJ6JuOk+wrwYZIZqZE
... 9mHfiPKMNps22OYXoHkaZPcxtofpyTGE2tnW2ReauTKCVPSczQPqn7mhBG2t6TJs
... YBpp0s2I/q7a4bVbowibPbO3RK1kBcA=
... -----END CERTIFICATE-----"""
>>> privkeydata = """-----BEGIN RSA PRIVATE KEY-----
... MIICXAIBAAKBgQC2Yw+5xKhhelVmH7Weu9eMhreuRvQXuNsyi5SA0sBXboOybox5
... oJZAWbL84KN5gX1qN7U62szotl3K49bRlzbKu/TmcVdJYlRlnwusL5XQJDKv+uER
... lUU0QDXeswEuM93UxkeN/j0vKfjp8k/Ny4qc5pNOT/dqNRyx01pVFV8NFwIDAQAB
... AoGBAIzWW/tYV6nGHJHapJWpeZ4DHW2PTsfOsD0MuaTsmSgqp7muUf1Nuxh/644I
... LVQTYPQXhnOnJ5n/0NduLqD0ApMk2IAdP0w224Yk3HJaMTu/KgOMj7gyDJvUOncY
... GNoxRZ9Fz/ByNUdL+OmZdECaSbcVR/PftYlduEFdy5PEcGBBAkEA8ab14UgMz7Tw
... 5zy32QWljTlmLBAuFZ73tbxNpDlX4WtP3ye1eAGm2usNVjf9vtfpfXspicgPI9z8
... Va2en2q1twJBAME3SZw/pmhijjn8+0FLO7ieooHfnEJ7XZWeEVnPU9cW66fe6EqN
... foToJadmU6avWFiIRYPazRECCgzOxkDrY6ECQCXzBmIeooRr8fkee/DFBj6raPQ6
... hkI2+Me9jqPfrYFlDOIKpmD2QXHXv/xuRpcV6UEfemJ83IPRTH9YCLUYWPkCQEu8
... eT0m8fquzyNJ188DR3iZrgeMeDrTEp7oI9L5YtrH4D2gMZuvlO1R9hiFErsetlmV
... qPIDXSiSjQ/yKWIfIqECQH8Q7WuTIpNbJjoMOoLZ18NqTDPFOG/L0BFeb/ovMZ06
... LNLN9K1eJ0ZQUHy447A3auCeMhJLG8JfBG7Kjk4wul4=
... -----END RSA PRIVATE KEY-----"""

decoding octal escape sequences in input with awk

Updated
Let's suppose that you got octal escape sequences in a stream:
backslash \134 is escaped as \134134
single quote ' and double quote \042
linefeed `\012` and carriage return `\015`
%s &
etc...
note: The escaped characters are limited to 0x01-0x1F 0x22 0x5C 0x7F
How can you revert those escape sequences back to their corresponding character with awk?
While awk understands them out of the box when they are used in a literal string or as a parameter argument, I can't find a way to leverage this capability when the escape sequence is part of the data. For now I'm using one gsub per escape sequence, but that doesn't feel efficient.
Here's the expected output for the given sample:
backslash \ is escaped as \134
single quote ' and double quote "
linefeed `
` and carriage return `
%s &
etc...
PS: While I have the additional constraint of unescaping each line into an awk variable before printing the result, it doesn't really matter.
Using GNU awk for strtonum() and lots of meaningfully-named variables to show what each step does:
$ cat tst.awk
function octs2chars(str,    head,tail,oct,dec,char) {
    head = ""
    tail = str
    while ( match(tail,/\\[0-7]{3}/) ) {
        oct = substr(tail,RSTART+1,RLENGTH-1)
        dec = strtonum(0 oct)
        char = sprintf("%c", dec)
        head = head substr(tail,1,RSTART-1) char
        tail = substr(tail,RSTART+RLENGTH)
    }
    return head tail
}
{ print octs2chars($0) }
$ awk -f tst.awk file
backslash \ is escaped as \134
single quote ' and double quote "
linefeed `
` and carriage return `
%s &
etc...
If you don't have GNU awk then write a small function to convert octal to decimal, e.g. oct2dec() below, and then call that instead of strtonum():
$ cat tst2.awk
function oct2dec(oct,    dec) {
    dec = substr(oct,1,1) * 8 * 8
    dec += substr(oct,2,1) * 8
    dec += substr(oct,3,1)
    return dec
}
function octs2chars(str,    head,tail,oct,dec,char) {
    head = ""
    tail = str
    while ( match(tail,/\\[0-7]{3}/) ) {
        oct = substr(tail,RSTART+1,RLENGTH-1)
        dec = oct2dec(oct) # replaced "strtonum(0 oct)"
        char = sprintf("%c", dec)
        head = head substr(tail,1,RSTART-1) char
        tail = substr(tail,RSTART+RLENGTH)
    }
    return head tail
}
{ print octs2chars($0) }
$ awk -f tst2.awk file
backslash \ is escaped as \134
single quote ' and double quote "
linefeed `
` and carriage return `
%s &
etc...
The above assumes that, as discussed in comments, the only backslashes in the input will be in the context of the start of octal numbers as shown in the provided sample input.
With GNU awk, which supports the strtonum() function, would you please try:
awk '{
    while (match($0, /\\[0-7]{1,3}/)) {
        printf("%s", substr($0, 1, RSTART - 1))                          # print the substring before the match
        printf("%c", strtonum("0" substr($0, RSTART + 1, RLENGTH - 1)))  # convert the octal string to a character
        $0 = substr($0, RSTART + RLENGTH)                                # update $0 with the remaining substring
    }
    print
}' input_file
It processes the matched substrings (the octal representations) one by one in the while loop.
substr($0, RSTART + 1, RLENGTH - 1) skips the leading backslash and takes only the octal digits.
"0" prepended to the substring makes an octal string.
strtonum() converts the octal string to its numeric value.
The final print outputs the remaining substring.
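For comparison, the same match, convert, advance loop can be sketched in Python rather than awk. This is only an illustration under two assumptions: every escape is exactly three octal digits, as in the sample input, and values are reduced to a single byte. The names `OCTAL_ESCAPE` and `unescape_octal` are hypothetical.

```python
import re

# Assumption: every escape is a backslash followed by exactly three octal digits.
OCTAL_ESCAPE = re.compile(rb"\\([0-7]{3})")


def unescape_octal(data: bytes) -> bytes:
    """Replace each \\ooo escape with the single byte it encodes."""
    # int(..., 8) parses the digits as octal; "& 0xFF" keeps the result in byte range.
    return OCTAL_ESCAPE.sub(lambda m: bytes([int(m.group(1), 8) & 0xFF]), data)


sample = rb"backslash \134 is escaped as \134134"
print(unescape_octal(sample))  # b'backslash \\ is escaped as \\134'
```

Because `re.sub` never rescans the text it has just substituted, the backslash produced from `\134` is not combined with the following `134`, which matches the expected output in the question.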
UPDATE :: about gawk's strtonum() in unicode mode :
echo '\666' |
LC_ALL='en_US.UTF-8' gawk -e '
$++NF = "<( "(sprintf("%c", strtonum((_=_<_) substr($++_, ++_))))" )>"'
0000000 909522524 539507744 690009798 2622
\ 6 6 6 < ( ƶ ** ) > \n
134 066 066 066 040 074 050 040 306 266 040 051 076 012
\ 6 6 6 sp < ( sp ? ? sp ) > nl
92 54 54 54 32 60 40 32 198 182 32 41 62 10
5c 36 36 36 20 3c 28 20 c6 b6 20 29 3e 0a
0000016
By default, gawk in Unicode mode decodes a multi-byte character instead of the single byte \266 (0xB6). If you want to ensure that a single byte is always produced, even in gawk Unicode mode, this should do the trick:
echo '\666' |
LC_ALL='en_US.UTF-8' gawk -e '$++NF = sprintf("<( %c )>",
strtonum((_=_<_) substr($++_, ++_)) + _*++_^_++*_^++_)'
0000000 909522524 539507744 1042882742 10
\ 6 6 6 < ( 266 ) > \n
134 066 066 066 040 074 050 040 266 040 051 076 012
\ 6 6 6 sp < ( sp ? sp ) > nl
92 54 54 54 32 60 40 32 182 32 41 62 10
5c 36 36 36 20 3c 28 20 b6 20 29 3e 0a
0000015
long story short : add 4^5 * 54 to output of strtonum(), which happens to be 0xD800, the starting point of UTF-16 surrogates
=================== =================== ===================
one quick note about @Gene's proposed perl-based solution :
echo 'abc \555 456' | perl -p -e 's/\\([0-7]{3})/chr(oct($1))/ge'
Wide character in print at -e line 1, <> line 1.
abc ŭ 456
octal codes wrap around, meaning \4xx = \0xx ; \6xx = \2xx etc :
printf '\n %s\n' $'\555'
m
so perl is incorrectly decoding these as multi-byte characters, when in fact \555, as confirmed by printf, is merely lowercase "m" (0x6D)
ps : my perl is version 5.34
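The wrap-around described above amounts to keeping only the low 8 bits of the parsed octal value. A minimal arithmetic check in Python, where `octal_escape_to_byte` is a hypothetical helper name used only for illustration:

```python
def octal_escape_to_byte(digits: str) -> int:
    """Parse up to three octal digits and keep only the low 8 bits."""
    return int(digits, 8) & 0xFF


# 0o555 is 365 in decimal; 365 - 256 = 109 = 0x6D, the code of lowercase "m".
print(chr(octal_escape_to_byte("555")))  # m

# 0o666 is 438 in decimal; 438 - 256 = 182 = 0o266 = 0xB6.
print(oct(octal_escape_to_byte("666")))  # 0o266
```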
I came up with my own POSIX awk solution, so I'm posting it here for reference.
The main idea is to build a hash that translates an octal escape sequence to its corresponding character. You can then use it while splitting the line during the search for escape sequences:
LANG=C awk '
    BEGIN {
        for ( i = 1; i <= 255; i++ )
            tr[ sprintf("\\%03o",i) ] = sprintf("%c",i)
    }
    {
        remainder = $0
        while ( match(remainder, /\\[0-7]{3}/) ) {
            printf("%s%s", \
                substr(remainder, 1, RSTART-1), \
                tr[ substr(remainder, RSTART, RLENGTH) ] \
            )
            remainder = substr(remainder, RSTART + RLENGTH)
        }
        print remainder
    }
' input.txt
backslash `\`
single quote `'` and double quote `"`
linefeed `
` and carriage return `
%s &
etc...
This separate post is made specifically to showcase how to extend the octal lookup reference tables in gawk Unicode mode to all 256 bytes without external dependencies or warning messages:
ASCII bytes reside in table o2bL
8-bit bytes reside in table o2bH
# gawk profile, created Fri Sep 16 09:53:26 2022
'BEGIN {
1 makeOctalRefTables(PROCINFO["sorted_in"] = "#val_str_asc" \
(ORS = ""))
128 for (_ in o2bL) {
128 print o2bL[_]
}
128 for (_ in o2bH) {
128 print o2bH[_]
}
}
function makeOctalRefTables(_,__,___,____)
{
1 _=__=___=____=""
for (_ in o2bL) {
break
}
1 if (!(_ in o2bL)) {
1 ____=_+=((_+=_^=_<_)-+-++_)^_--
128 do { o2bL[sprintf("\\%o",_)] = \
sprintf("""%c",_)
} while (_--)
1 o2bL["\\" ((_+=(_+=_^=_<_)+_)*_--+_+_)] = "\\&"
1 ___=--_*_^_--*--_*++_^_*(_^=++_)^(! --_)
128 do { o2bH[sprintf("\\%o", +_)] = \
sprintf("%c",___+_)
} while (____<--_)
}
1 return length(o2bL) ":" length(o2bH)
}'
|
\0 \1 \2 \3 \4 \5 \6 \7 \10\11 \12
\13
\14
\16 \17
\20 \21 \22 \23 \24 \25 \26 \27 \30 \31 \32 \33 34 \35 \36 \37
\40 \41 !\42 "\43 #\44 $\45 %\47 '\50 (\51 )\52 *\53 +\54 ,\55 -\56 .\57 /
\60 0\61 1\62 2\63 3\64 4\65 5\66 6\67 7\70 8\71 9\72 :\73 ;\74 <\75 =\76 >\77 ?
\100 @\101 A\102 B\103 C\104 D\105 E\106 F\107 G\110 H\111 I\112 J\113 K\114 L\115 M\116 N\117 O
\120 P\121 Q\122 R\123 S\124 T\125 U\126 V\127 W\130 X\131 Y\132 Z\133 [\134 \\46 \&\135 ]\136 ^\137 _
\140 `\141 a\142 b\143 c\144 d\145 e\146 f\147 g\150 h\151 i\152 j\153 k\154 l\155 m\156 n\157 o
\160 p\161 q\162 r\163 s\164 t\165 u\166 v\167 w\170 x\171 y\172 z\173 {\174 |\175 }\176 ~\177
\200 ?\201 ?\202 ?\203 ?\204 ?\205 ?\206 ?\207 ?\210 ?\211 ?\212 ?\213 ?\214 ?\215 ?\216 ?\217 ?
\220 ?\221 ?\222 ?\223 ?\224 ?\225 ?\226 ?\227 ?\230 ?\231 ?\232 ?\233 ?\234 ?\235 ?\236 ?\237 ?
\240 ?\241 ?\242 ?\243 ?\244 ?\245 ?\246 ?\247 ?\250 ?\251 ?\252 ?\253 ?\254 ?\255 ?\256 ?\257 ?
\260 ?\261 ?\262 ?\263 ?\264 ?\265 ?\266 ?\267 ?\270 ?\271 ?\272 ?\273 ?\274 ?\275 ?\276 ?\277 ?
\300 ?\301 ?\302 ?\303 ?\304 ?\305 ?\306 ?\307 ?\310 ?\311 ?\312 ?\313 ?\314 ?\315 ?\316 ?\317 ?
\320 ?\321 ?\322 ?\323 ?\324 ?\325 ?\326 ?\327 ?\330 ?\331 ?\332 ?\333 ?\334 ?\335 ?\336 ?\337 ?
\340 ?\341 ?\342 ?\343 ?\344 ?\345 ?\346 ?\347 ?\350 ?\351 ?\352 ?\353 ?\354 ?\355 ?\356 ?\357 ?
\360 ?\361 ?\362 ?\363 ?\364 ?\365 ?\366 ?\367 ?\370 ?\371 ?\372 ?\373 ?\374 ?\375 ?\376 ?\377 ?

Convert an image in a PySpark dataframe to a Numpy array

I have a DataFrame in PySpark (version 3.1.2) which contains images:
img_path = "s3://multimedia-commons/data/images/000/24a/00024a73d1a4c32fb29732d56a2.jpg"
df = spark.read.format("image").load(img_path)
df.printSchema()
df.select("image.height", "image.width"
,"image.nChannels", "image.mode"
,"image.data").show()
root
|-- image: struct (nullable = true)
| |-- origin: string (nullable = true)
| |-- height: integer (nullable = true)
| |-- width: integer (nullable = true)
| |-- nChannels: integer (nullable = true)
| |-- mode: integer (nullable = true)
| |-- data: binary (nullable = true)
+------+-----+---------+----+--------------------+
|height|width|nChannels|mode| data|
+------+-----+---------+----+--------------------+
| 260| 500| 3| 16|[00 00 00 00 00 0...|
+------+-----+---------+----+--------------------+
I need to convert the image into a Numpy array to pass to a machine learning model.
The approach in https://stackoverflow.com/a/69215982/11262633 seems reasonable, but is giving me incorrect image values.
import pyspark.sql.functions as F
from pyspark.ml.image import ImageSchema
from pyspark.ml.linalg import DenseVector, VectorUDT
import numpy as np
img2vec = F.udf(lambda x: DenseVector(ImageSchema.toNDArray(x).flatten()), VectorUDT())
print(f'Image fields = {ImageSchema.imageFields}')
df_new = df.withColumn('vecs',img2vec('image'))
row_dict = df_new.first().asDict()
img_vec = row_dict['vecs']
img_dict = row_dict['image']
width = img_dict['width']
height = img_dict['height']
nChannels = img_dict['nChannels']
img_np = img_vec.reshape(height, width, nChannels)
m = np.ma.masked_greater(img_np, 100)
m_mask = m.mask
args = np.argwhere(m_mask)
for idx, (r, c, _) in enumerate(args):
    print(r, c, img_np[r,c])
    if idx > 5:
        break
Output:
46 136 [ 0. 13. 101.]
47 104 [ 1. 15. 102.]
47 105 [ 1. 16. 104.]
47 106 [ 1. 16. 104.]
47 107 [ 1. 16. 104.]
47 108 [ 1. 16. 104.]
47 109 [ 1. 15. 105.]
Here's a visualization of the image:
Desired Results
Reading the image using Pillow gives a different result:
from PIL import Image
import numpy as np
img = Image.open('/home/hadoop/00024a73d1a4c32fb29732d56a2.jpg')
img_np = np.asarray(img)
m = np.ma.masked_greater(img_np, 100)
m_mask = m.mask
args = np.argwhere(m_mask)
for idx, (r, c, _) in enumerate(args):
    print(r, c, img_np[r,c])
    if idx > 5:
        break
Output:
47 104 [101 16 9]
47 105 [103 16 9]
47 106 [103 16 9]
47 107 [103 16 9]
47 108 [103 16 9]
47 109 [104 15 9]
47 110 [105 16 10]
My question
Why are the images different, both in appearance, and when I read individual pixels?
Using np.asarray on the bytes data returned by PySpark gave the same issue. Maybe PySpark is fine and there's just some error in my manipulations of the returned data. I've spent about 8 hours working on this. Thanks in advance for any insights you may have.
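This is because Spark stores the image data as
data: BinaryType (Image bytes in OpenCV-compatible order: row-wise BGR in most cases)
while Pillow returns the channels in RGB order, so the first and third channel values are swapped between the two arrays.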
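A minimal sketch of the channel swap, written against the raw bytes of `image.data` so that it runs with the standard library only. It assumes three channels per pixel (as `nChannels` = 3 in the question indicates), and `bgr_to_rgb` is a hypothetical helper name.

```python
def bgr_to_rgb(data: bytes) -> bytes:
    """Swap the first and third byte of every 3-byte pixel (BGR -> RGB)."""
    if len(data) % 3:
        raise ValueError("expected exactly 3 bytes per pixel")
    out = bytearray(data)
    out[0::3] = data[2::3]  # red goes to the first position
    out[2::3] = data[0::3]  # blue goes to the third position
    return bytes(out)


# Two pixels in BGR order, using values of the kind printed in the question.
bgr = bytes([0, 13, 101, 1, 15, 102])
print(list(bgr_to_rgb(bgr)))  # [101, 13, 0, 102, 15, 1]
```

If the data has already been reshaped into a NumPy array of shape (height, width, 3), reversing the last axis with `img_np[..., ::-1]` has the same effect.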

How to show the public key for a website certificate

I've created a Go program to connect to a website and get the certificates it uses. I'm not sure how to get the correct representation of the public key.
I can fetch the certificate and I can type-switch on Certificate.PublicKey. Once I know whether it's an rsa.PublicKey or an ecdsa.PublicKey, I need to print its hex representation.
switch cert.PublicKey.(type) {
case *rsa.PublicKey:
    logrus.Error("this is RSA")
    // TODO: print hex representation of key
case *ecdsa.PublicKey:
    logrus.Error("this is ECDSA")
    // TODO: print hex representation of key
default:
    fmt.Println("it's something else")
}
I'd expect it to print something like:
04 4B F9 47 1B A8 A8 CB A4 C6 C0 2D 45 DE 43 F3 BC F5 D2 98 F4 25 90 6F 13 0D 78 1A AC 05 B4 DF 7B F6 06 5C 80 97 9A 53 06 D0 DB 0E 15 AD 03 DE 14 09 D3 77 54 B1 4E 15 A8 AF E3 FD DC 9D AD E0 C5
It seems you are asking for the SHA-1 sum of the certificates involved.
Here is a working example that takes a host:port argument and prints the SHA-1 fingerprints of the certificates involved:
package main

import (
    "crypto/sha1"
    "crypto/tls"
    "fmt"
    "log"
    "os"
)

func main() {
    if len(os.Args) != 2 {
        log.Panic("call with argument of host:port")
    }
    log.SetFlags(log.Lshortfile)
    conf := &tls.Config{
        //InsecureSkipVerify: true,
    }
    fmt.Printf("dialing:%s\n", os.Args[1])
    conn, err := tls.Dial("tcp", os.Args[1], conf)
    if err != nil {
        log.Println(err)
        return
    }
    defer conn.Close()
    for i, v := range conn.ConnectionState().PeerCertificates {
        //edit: use %X for uppercase hex printing
        fmt.Printf("cert %d sha1 fingerprint:%x \n", i, sha1.Sum(v.Raw))
    }
}
run as:
./golang-tls www.google.com:443
dialing:www.google.com:443
cert 0 sha1 fingerprint:34781c3be98cf958f514aecb1ae2e4e866effe34
cert 1 sha1 fingerprint:eeacbd0cb452819577911e1e6203db262f84a318
For general background on SSL, I have found this Stack Exchange answer to be extremely valuable.
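The fingerprint step itself is just a hash over the certificate's DER bytes, printed as hex. A small Python sketch of that step follows, for illustration only. It formats the bytes in the spaced, uppercase style shown in the question, which in Go corresponds to the `% X` format verb. The helper names `spaced_hex` and `sha1_fingerprint` are hypothetical, and the input is a stand-in byte string, not a real certificate.

```python
import hashlib


def spaced_hex(data: bytes) -> str:
    """Format bytes as uppercase hex pairs separated by spaces."""
    return " ".join(f"{b:02X}" for b in data)


def sha1_fingerprint(der: bytes) -> str:
    """SHA-1 digest of the given bytes, formatted like the question's output."""
    return spaced_hex(hashlib.sha1(der).digest())


print(spaced_hex(b"\x04\x4b\xf9"))  # 04 4B F9
print(sha1_fingerprint(b"abc"))     # stand-in input, not a certificate
```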

GNURadio PSK bit recovery

I have followed the wonderful GNURadio Guided Tutorial PSK Demodulation:
https://wiki.gnuradio.org/index.php/Guided_Tutorial_PSK_Demodulation
I've created a very simple DBPSK modulator
I feed in a series of bits that are sliding. So the first byte I feed in is 0x01, the next byte is 0x02, 0x04, 0x08 and so on. This is the output of hd:
00000000 00 00 ac 0e d0 f0 20 40 81 02 04 08 10 00 20 40 |...... @...... @|
00000010 81 02 04 08 10 00 20 40 81 02 04 08 10 00 20 40 |...... @...... @|
*
00015000
The first few bytes are garbage, but then you can see the pattern. Looking at the second line you see:
0x81, 0x02, 0x04, 0x08, 0x10, 0x00, 0x20, 0x40, 0x81
The walking-ones pattern is there, but after 0x10 the PSK demodulator receives a 0x00, and a few bytes later it receives a 0x81. It almost seems like the timing recovery is off.
Has anyone else seen something like this?
OK, I figured it out. Below is my DBPSK modulation.
If you let this run, the BER will continue to drop. Some things to keep in mind: the PSK Mod takes an 8-bit value (or perhaps a short or an int as well), grabs the bits and modulates them. Then the PSK Demod does the same. If you save the output to a file, you will not get the exact bits out; you will need to shift the bits to align them. I added the Vector Insert block to generate a preamble of sorts.
Then I wrote some Python to find my preamble:
import numpy as np


def findPreamble(preamble, x):
    # Compare element by element: summing the differences could cancel out
    # to zero on a window that does not actually match.
    n = len(preamble)
    for i in range(0, len(x) - n + 1):
        if all(x[i + j] == preamble[j] for j in range(n)):
            print("Found a preamble at {0}".format(i))
            return True, x[i + n:]
    return False, x


def shiftBits(x):
    # Shift the whole byte stream left by one bit, carrying the top bit of
    # the next byte into the bottom bit of the current one.
    for i in range(0, len(x) - 1):
        a = x[i] << 1
        if x[i + 1] & 0x80:
            a = a | 1
        x[i] = (a & 0xFF)
    return x


with open('test.bits', 'rb') as f:
    x = f.read()

preamble = [0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08]
x = np.frombuffer(x, dtype='uint8')
x = x.astype('int')  # writable copy that can hold the shifted values
print(x)

searchForBit = True
while searchForBit:
    x = shiftBits(x)
    print(x)
    found, y = findPreamble(preamble, x)
    if found:
        searchForBit = False

y = y.astype('uint8')
with open('test.bits', 'wb') as f:
    f.write(y)
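The same alignment can also be sketched as a single search at bit level instead of shifting the stream one bit at a time. This is an illustration only, using the standard library: `to_bits` and `align_to_preamble` are hypothetical helper names, and the sample stream is synthetic (three stray bits ahead of the preamble), not data captured from the flowgraph.

```python
def to_bits(data: bytes) -> str:
    """Render bytes as a string of '0'/'1' characters, most significant bit first."""
    return "".join(f"{b:08b}" for b in data)


def align_to_preamble(data: bytes, preamble: bytes):
    """Return (bit_offset, payload) for the first preamble match, or None."""
    bits = to_bits(data)
    pattern = to_bits(preamble)
    pos = bits.find(pattern)
    if pos < 0:
        return None
    payload_bits = bits[pos + len(pattern):]
    usable = len(payload_bits) - len(payload_bits) % 8  # drop a trailing partial byte
    payload = bytes(int(payload_bits[i:i + 8], 2) for i in range(0, usable, 8))
    return pos, payload


preamble = bytes([0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08])
# Synthetic stream: 3 stray bits, the preamble, a walking-ones payload, 5 padding bits.
bits = "101" + to_bits(preamble + bytes([0x01, 0x02, 0x04, 0x08])) + "00000"
stream = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
print(align_to_preamble(stream, preamble))  # (3, b'\x01\x02\x04\x08')
```

Building a bit string is wasteful for large captures, but it makes any of the eight possible bit offsets visible in one `find` call.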

How to use HMAC SHA256?

From the various docs that I have read on HMAC SHA256, I have understood that the construction is:
H (K XOR opad, H (K XOR ipad, text)) where H in my case is SHA256.
But SHA256 takes only one input, i.e. a message,
whereas H(K, text) has two inputs.
So how do I calculate H(K, text)?
Should I first encode text with K and then use H(encoded_text), where encoded_text is used as the message?
Thank you
H() is your cryptographic hash function, in this case SHA256(), but it could also be MD5 or whatever;
K is your predefined key;
Text is the message to be authenticated;
opad is the outer padding (0x5c5c5c…5c5c, a one-block-long hexadecimal constant);
ipad is the inner padding (0x363636…3636, a one-block-long hexadecimal constant).
Then HMAC(K,m) is mathematically defined by
HMAC(K,m) = H((K ⊕ opad) ∥ H((K ⊕ ipad) ∥ m)),
where ∥ denotes concatenation. The comma in H(K, text) therefore means "concatenate", so each call to H still takes a single input.
The block size is determined by your hash function (64 bytes for both MD5 and SHA256), and the key is zero-padded to that length before the XOR:
o_key_pad = [0x5c * blocksize] ⊕ key
i_key_pad = [0x36 * blocksize] ⊕ key
Your result would be:
H(o_key_pad || H(i_key_pad || TEXT))
You can find a good read here:
http://timdinh.nl/index.php/hmac/
It also has the following pseudocode, which looks almost like mine:
function hmac (key, message)
    opad = [0x5c * blocksize] // Where blocksize is that of the underlying hash function
    ipad = [0x36 * blocksize]
    if (length(key) > blocksize) then
        key = hash(key) // Where 'hash' is the underlying hash function
    end if
    for i from 0 to length(key) - 1 step 1
        ipad[i] = ipad[i] XOR key[i]
        opad[i] = opad[i] XOR key[i]
    end for
    return hash(opad || hash(ipad || message)) // Where || is concatenation
end function
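The pseudocode above translates almost line for line into Python. The sketch below is for illustration, with `hmac_sha256` as a hypothetical function name; it checks its result against the standard library's `hmac` module, which is what should be used in practice.

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """HMAC-SHA256 written out by hand, following the pseudocode."""
    block_size = hashlib.sha256().block_size  # 64 bytes for SHA-256
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()    # long keys are hashed first
    key = key.ljust(block_size, b"\x00")      # short keys are zero-padded
    o_key_pad = bytes(b ^ 0x5C for b in key)
    i_key_pad = bytes(b ^ 0x36 for b in key)
    inner = hashlib.sha256(i_key_pad + message).digest()  # H(i_key_pad || m)
    return hashlib.sha256(o_key_pad + inner).digest()     # H(o_key_pad || inner)


key = b"key"
msg = b"The quick brown fox jumps over the lazy dog"
expected = hmac.new(key, msg, hashlib.sha256).digest()
print(hmac.compare_digest(hmac_sha256(key, msg), expected))  # True
```

Each call to the hash function receives one byte string, namely the padded key concatenated with the data, which is how the two-argument notation H(K, text) maps onto a one-input hash.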