ESP32 SSL connection works when CA Certificate is a constant, but not when read from a file - ssl

I have the following Arduino code I'm using with an ESP32:
// Mount SPIFFS (true = format the filesystem if mounting fails)
if (!SPIFFS.begin(true)) {
    Serial.println("Error mounting SPIFFS.");
}

File file = SPIFFS.open("/root.cer");
if (!file) {
    Serial.println("Error opening the file.");
}

// Read the certificate into a String and hand its buffer to WiFiClientSecure
Serial.println("CA Root certificate: ");
String ca_cert = file.readString();
Serial.println(ca_cert);
espClient.setCACert(ca_cert.c_str());
file.close();
This is the relevant code for loading a file and setting the WiFiClientSecure's CA certificate. This code does not work.
However, if I replace espClient.setCACert(ca_cert.c_str()); with espClient.setCACert(ROOTCERT); where ROOTCERT is defined as such:
#define ROOTCERT "-----BEGIN CERTIFICATE-----\n" \
"MIIDSjCCAjKgAwIBAgIQRK+wgNajJ7qJMDmGLvhAazANBgkqhkiG9w0BAQUFADA/\n" \
"MSQwIgYDVQQKExtEaWdpdGFsIFNpZ25hdHVyZSBUcnVzdCBDby4xFzAVBgNVBAMT\n" \
"DkRTVCBSb290IENBIFgzMB4XDTAwMDkzMDIxMTIxOVoXDTIxMDkzMDE0MDExNVow\n" \
"PzEkMCIGA1UEChMbRGlnaXRhbCBTaWduYXR1cmUgVHJ1c3QgQ28uMRcwFQYDVQQD\n" \
"Ew5EU1QgUm9vdCBDQSBYMzCCASIwDQYJKoZIhvcNAQEBBQADggEPADCCAQoCggEB\n" \
"AN+v6ZdQCINXtMxiZfaQguzH0yxrMMpb7NnDfcdAwRgUi+DoM3ZJKuM/IUmTrE4O\n" \
"rz5Iy2Xu/NMhD2XSKtkyj4zl93ewEnu1lcCJo6m67XMuegwGMoOifooUMM0RoOEq\n" \
"OLl5CjH9UL2AZd+3UWODyOKIYepLYYHsUmu5ouJLGiifSKOeDNoJjj4XLh7dIN9b\n" \
"xiqKqy69cK3FCxolkHRyxXtqqzTWMIn/5WgTe1QLyNau7Fqckh49ZLOMxt+/yUFw\n" \
"7BZy1SbsOFU5Q9D8/RhcQPGX69Wam40dutolucbY38EVAjqr2m7xPi71XAicPNaD\n" \
"aeQQmxkqtilX4+U9m5/wAl0CAwEAAaNCMEAwDwYDVR0TAQH/BAUwAwEB/zAOBgNV\n" \
"HQ8BAf8EBAMCAQYwHQYDVR0OBBYEFMSnsaR7LHH62+FLkHX/xBVghYkQMA0GCSqG\n" \
"SIb3DQEBBQUAA4IBAQCjGiybFwBcqR7uKGY3Or+Dxz9LwwmglSBd49lZRNI+DT69\n" \
"ikugdB/OEIKcdBodfpga3csTS7MgROSR6cz8faXbauX+5v3gTt23ADq1cEmv8uXr\n" \
"AvHRAosZy5Q6XkjEGB5YGV8eAlrwDPGxrancWYaLbumR9YbK+rlmM6pZW87ipxZz\n" \
"R8srzJmwN0jP41ZL9c8PDHIyh8bwRLtTcm1D9SZImlJnt1ir/md2cXjbDaJWFBM5\n" \
"JDGFoqgCWjBH4d1QB7wCCZAA62RjYJsWvIjJEubSfZGL+T0yjWW06XyxV3bqxbYo\n" \
"Ob8VZRzI9neWagqNdwvYkQsEjgfbKbYK7p2CNTUQ\n" \
"-----END CERTIFICATE-----\n"
The code works.
The ROOTCERT string is taken directly from the certificate file, so they must be identical.
The certificate file was downloaded and exported using Windows's certificate exporter. I've tried converting line endings to no avail.
EDIT: I've found a clue.
If I do the following:
String constString = ROOTCERT;
espClient.setCACert(constString.c_str());
It also does not work.
And I added this code:
if (strcmp(constString.c_str(), ROOTCERT))
    Serial.println("Constant and converted string are equal.");
else
    Serial.println("Constant and converted string are different.");
And it prints "Constant and converted string are different."
So it appears to be some kind of problem with how .c_str() works? I have no idea what it could be, though. When printed to the console, the .c_str() result, ROOTCERT and the ca_cert String all appear IDENTICAL.
I am completely confused here.
Turns out I was using strcmp() incorrectly (it returns 0 when the strings are equal, so the two branches above are swapped). Things are still not working.

After messing around, I fixed it.
.c_str() is just a pointer to the internal buffer of the String object, and setCACert() appears to keep using that pointer rather than copying the certificate, so the buffer has to stay valid after this code runs.
Copying the certificate into a heap buffer that is never freed fixed it:
// Make a persistent copy of the certificate; it is intentionally never freed,
// so the buffer stays valid for the lifetime of the TLS connection.
char *dest = (char *)malloc(ca_cert.length() + 1);
strcpy(dest, ca_cert.c_str());
espClient.setCACert(dest);

Related

Caused by: java.lang.IllegalStateException: Could not read class: VirtualFile: Kotlin+ Apache Beam defined runner

I implemented an example using Kotlin + Apache Beam to define my pipeline in Kotlin, but when I ran the project I got this error:
Caused by: java.lang.IllegalStateException: Could not read class: VirtualFile: /Users/duanybaro/.gradle/caches/modules-2/files-2.1/org.apache.beam/beam-runners-google-cloud-dataflow-java/2.27.0/3e551e54b23441cc58c9d01e6614ff67216a7e87/beam-runners-google-cloud-dataflow-java-2.27.0.jar!/org/apache/beam/runners/dataflow/DataflowPipelineJob.class
at org.jetbrains.kotlin.load.java.structure.impl.classFiles.BinaryJavaClass.<init>(BinaryJavaClass.kt:122)
at org.jetbrains.kotlin.load.java.structure.impl.classFiles.BinaryJavaClass.<init>(BinaryJavaClass.kt:34)
This error only occurs in Kotlin; the equivalent code written in Java works perfectly. Can you give me any suggestions to solve the error?
I really recommend using the latest version of Apache Beam; your version is very old.
You can also use the starter project for Beam Kotlin.
I am sharing an example of a Kotlin Beam project from my GitHub repo, based on Maven.
For the Beam pipeline options, instead of using DataflowPipelineOptions, can you try the following:
val options = PipelineOptionsFactory
    .fromArgs(*args)
    .withValidation()
    .`as`(TeamLeagueOptions::class.java)

val pipeline = Pipeline.create(options)
Example of PipelineOptions:
import org.apache.beam.sdk.options.Description
import org.apache.beam.sdk.options.PipelineOptions
interface TeamLeagueOptions : PipelineOptions {
    @get:Description("Path of the input Json file to read from")
    var inputJsonFile: String

    @get:Description("Path of the slogans file to read from")
    var inputFileSlogans: String

    @get:Description("Path of the file to write to")
    var teamLeagueDataset: String

    @get:Description("Team stats table")
    var teamStatsTable: String

    @get:Description("Job type")
    var jobType: String

    @get:Description("Failure output dataset")
    var failureOutputDataset: String

    @get:Description("Failure output table")
    var failureOutputTable: String

    @get:Description("Feature name for failures")
    var failureFeatureName: String
}
And pass the program arguments on the mvn command line:
mvn compile exec:java \
-Dexec.mainClass=fr.groupbees.application.TeamLeagueApp \
-Dexec.args=" \
--project=my-project \
--runner=DataflowRunner \
--jobName=team-league-kotlin-job-$(date +'%Y-%m-%d-%H-%M-%S') \
--region=europe-west1 \
--streaming=false \
--zone=europe-west1-d \
--tempLocation=gs://mazlum_dev/dataflow/temp \
--gcpTempLocation=gs://mazlum_dev/dataflow/temp \
--stagingLocation=gs://mazlum_dev/dataflow/staging \
--inputJsonFile=gs://mazlum_dev/team_league/input/json/input_teams_stats_raw.json \
--inputFileSlogans=gs://mazlum_dev/team_league/input/json/input_team_slogans.json \
--teamLeagueDataset=mazlum_test \
--teamStatsTable=team_stat \
--jobType=team_league_kotlin_ingestion_job \
--failureOutputDataset=mazlum_test \
--failureOutputTable=job_failure \
--failureFeatureName=team_league \
" \
-Pdataflow-runner

bitbake do_package_qa issue contains bad RPATH

I am clueless about an issue I am facing.
While cross-compiling one of our apps, I am getting the following error, which makes no sense to me.
If someone can help me debug the issue, it would be really helpful.
ERROR: lib32-audiod-1.0.0-161.jcl4tv.85-r26audiod-automation-10Feb_00 do_package_qa: QA Issue: package lib32-audiod-ptest contains bad RPATH /home/work/ashutosh.tripathi/o20_build/build-starfish/BUILD/work/o20-starfishmllib32-linux-gnueabi/lib32-audiod/1.0.0-161.jcl4tv.85-r26audiod-automation-10Feb_00/audiod-1.0.0-161.jcl4tv.85 in file /home/work/ashutosh.tripathi/o20_build/build-starfish/BUILD/work/o20-starfishmllib32-linux-gnueabi/lib32-audiod/1.0.0-161.jcl4tv.85-r26audiod-automation-10Feb_00/packages-split/lib32-audiod-ptest/opt/webos/tests/audiod/gtest_audiod
package lib32-audiod-ptest contains bad RPATH /home/work/ashutosh.tripathi/o20_build/build-starfish/BUILD/work/o20-starfishmllib32-linux-gnueabi/lib32-audiod/1.0.0-161.jcl4tv.85-r26audiod-automation-10Feb_00/audiod-1.0.0-161.jcl4tv.85 in file /home/work/ashutosh.tripathi/o20_build/build-starfish/BUILD/work/o20-starfishmllib32-linux-gnueabi/lib32-audiod/1.0.0-161.jcl4tv.85-r26audiod-automation-10Feb_00/packages-split/lib32-audiod-ptest/opt/webos/tests/audiod/gtest_audiod [rpaths]
ERROR: lib32-audiod-1.0.0-161.jcl4tv.85-r26audiod-automation-10Feb_00 do_package_qa: QA run found fatal errors. Please consider fixing them.
ERROR: lib32-audiod-1.0.0-161.jcl4tv.85-r26audiod-automation-10Feb_00 do_package_qa: Function failed: do_package_qa
ERROR: Logfile of failure stored in: /home/work/ashutosh.tripathi/o20_build/build-starfish/BUILD/work/o20-starfishmllib32-linux-gnueabi/lib32-audiod/1.0.0-161.jcl4tv.85-r26audiod-automation-10Feb_00/temp/log.do_package_qa.4873
ERROR: Task (virtual:multilib:lib32:/home/work/ashutosh.tripathi/o20_build/build-starfish/meta-lg-webos/meta-webos/recipes-multimedia/audiod/audiod.bb:do_package_qa) failed with exit code '1'
NOTE: Tasks Summary: Attempted 2622 tasks of which 2608 didn't need to be rerun and 1 failed.
Here is the audiod recipe file:
DEPENDS = "glib-2.0 libpbnjson luna-service2 pmloglib luna-prefs boost pulseaudio"
RDEPENDS_${PN} = "\
libasound \
libasound-module-pcm-pulse \
libpulsecore \
pulseaudio \
pulseaudio-lib-cli \
pulseaudio-lib-protocol-cli \
pulseaudio-misc \
pulseaudio-module-cli-protocol-tcp \
pulseaudio-module-cli-protocol-unix \
pulseaudio-server \
"
WEBOS_VERSION = "1.0.0-161.open.12_49f981e4e5a599b75d893520b30393914657a4ae"
PR = "r26"
inherit webos_component
inherit webos_enhanced_submissions
inherit webos_cmake
inherit webos_library
inherit webos_daemon
inherit webos_system_bus
inherit webos_machine_dep
inherit gettext
inherit webos_lttng
inherit webos_public_repo
inherit webos_test_provider
# TODO: move to WEBOS_GIT_REPO_COMPLETE
WEBOS_REPO_NAME = "audiod-pro"
SRC_URI = "${WEBOS_PRO_GIT_REPO_COMPLETE}"
S = "${WORKDIR}/git"
EXTRA_OECMAKE += "${@bb.utils.contains('WEBOS_LTTNG_ENABLED', '1', '-DWEBOS_LTTNG_ENABLED:BOOLEAN=True', '', d)}"
EXTRA_OECMAKE += "-DAUDIOD_PALM_LEGACY:BOOLEAN=True"
EXTRA_OECMAKE += "-DAUDIOD_TEST_API:BOOLEAN=True"
FILES_${PN} += "${datadir}/alsa/"
FILES_${PN} += "/data"
FILES_${PN} += "${webos_mediadir}/internal"
I would like to thank the stackoverflow community for the help offered.
Adding the following flags helped resolve the issue.
Posting it here so that others may benefit from it if they ever face a similar problem.
set(CMAKE_INSTALL_RPATH "$ORIGIN")
set(CMAKE_BUILD_WITH_INSTALL_RPATH TRUE)
Or, skipping RPATH also does the job:
SET(CMAKE_SKIP_BUILD_RPATH TRUE)
SET(CMAKE_BUILD_WITH_INSTALL_RPATH FALSE)
SET(CMAKE_INSTALL_RPATH_USE_LINK_PATH FALSE)
It also worked by adding the following line to the recipe:
EXTRA_OECMAKE += "-DCMAKE_SKIP_RPATH=TRUE"
https://www.yoctoproject.org/docs/current/ref-manual/ref-manual.html#ref-classes-cmake

Select all alias on modifySSLConfig using JACL script

I want to edit all of the SSL configurations on all of my aliases. I have found some resources to do this, and my code so far is:
$AdminTask modifySSLConfig {-alias NodeDefaultSSLSettings -sslProtocol TLSv1.2}
$AdminConfig save
I want to be able to do this for all of the aliases that can be found on my server, but I don't know how.
Any ideas or leads on how to do this will help. Thank you.
Edit:
I am now able to find all of the SSL configs by using this code:
$AdminTask listSSLConfigs {-scopeName (cell):Node01Cell:(node):Node01}
My next problem is: how can I extract just the alias string from that output? I only need the alias so that I can put it in a variable and use it in a foreach loop like this:
$AdminTask modifySSLConfig {-alias ${aliasvariablegoeshere} -sslProtocol TLSv1.2}
EDIT :
set hold [list [$AdminTask listSSLConfigs {-scopeName (cell):Node01Cell:(node):Node01}]]
foreach aliasList [$AdminConfig show $hold] {
    foreach aliasName [$AdminConfig show $aliasList] {
        set testTrim "alias "
        set test5 [string trimleft $aliasName $testTrim]
        $AdminTask modifySSLConfig {-alias ${test5} -sslProtocol TLSv1.2}
    }
}
$AdminControl save
I have done this and was able to extract just the alias name and put it in the variable like I wanted, but it gives me an invalid parameter error. Any ideas why this is happening and how I can resolve it?
You can list all the SSL configs using:
AdminTask.listSSLConfigs('[-all true]')
for JACL use:
$AdminTask listSSLConfigs {-all true}
and then iterate over the list and change whatever you need.
Instead of -all you can provide a scope, for example: -scopeName (cell):localhostNode01Cell:(node):localhostNode01
For details about the SSLConfig commands, check the SSLConfigCommands command group for the AdminTask object.
UPDATE:
in general this should work:
foreach aliasList [$AdminTask listSSLConfigs {-scopeName (cell):PCCell1:(node):Node1}] {
    puts $aliasList
    set splitList [split $aliasList " "]
    puts $splitList
    set aliasname [lindex $splitList 1]
    puts $aliasname
    $AdminTask modifySSLConfig { -alias $aliasname -sslProtocol TLSv1.2 }
}
but I cannot get $AdminTask to correctly resolve the $aliasname parameter (the curly braces in Tcl suppress variable substitution, so the literal string $aliasname is passed instead of its value)...
I strongly suggest you switch to Jython. ;-)
I have managed to make it work. It seems that whatever I do, I can't get the extracted alias to be accepted as a parameter, so I built the whole parameter list as a string command instead. Here is my code.
foreach aliasList [$AdminConfig list SSLConfig] {
    foreach aliasName [$AdminConfig show $aliasList alias] {
        set strTrim "alias "
        set strFinal [string trimleft $aliasName $strTrim]
        set command "-alias $strFinal -sslProtocol TLSv1.2"
        $AdminTask modifySSLConfig $command
        puts saved
    }
}
$AdminConfig save
I was able to figure it out for Jython:
import sys
import os
import string
import re
#$HOME/IBM/WebSphere/AppServer/bin/wsadmin.sh -lang jython -f $HOME/tls12.py
#Updates Websphere security to TLSv1.2
AdminTask.convertCertForSecurityStandard('[-fipsLevel SP800-131 -signatureAlgorithm SHA256withRSA -keySize 2048 ]')
AdminConfig.save()
AdminNodeManagement.syncActiveNodes()
sslConfigList = AdminTask.listSSLConfigs('[-all true]').splitlines()
for sslConfig in sslConfigList:
    sslElems = sslConfig.split(" ")
    AdminTask.modifySSLConfig(['-alias', sslElems[1], '-scopeName', sslElems[3], '-sslProtocol', 'TLSv1.2', '-securityLevel', 'HIGH'])
AdminConfig.save()
AdminNodeManagement.syncActiveNodes()
After that you should also update all your ssl.client.props files with:
com.ibm.ssl.protocol=TLSv1.2
Restart your deployment manager and force manual syncNode on all nodes, for example:
~/IBM/WebSphere/AppServer/profiles/*/bin/syncNode.sh <DeploymentManagerHost> <dmgr port=8879> -username <username> -password <password>

SQL query in Pro-C fails with Error:02115

I am getting some weird behavior from a Pro*C procedure, as shown below:
#define BGHCPY_TO_ORA(dest, source) \
{ \
    (void)strcpy((void*)(dest).arr, (void*)(source)); \
    (dest).len = strlen((const char *)(dest).arr); \
}

#define BGHCPY_FROM_ORA(dest, source) \
{ \
    (void)memcpy((void*)(dest), (void*)(source).arr, (size_t)(source).len); \
    (dest)[(source).len] = '\0'; \
}

long fnSQLMarkProcessed (char *pszRowId, char *pszMarker)
{
    BGHCPY_TO_ORA (O_rowid_stack, pszRowId);
    BGHCPY_TO_ORA (O_cust_processed, pszMarker);

    EXEC SQL
        UPDATE document_all
        SET processed_by_bgh = :O_cust_processed
        WHERE rowid = :O_rowid_stack;

    return (sqlca.sqlcode);
}
The input argument values passed to the above function are:
pszRowId = [AAAF1lAAIAABOoRAAB], pszMarker=X
The query returns error code 02115 with the following message:
SQL Error:02115 Code interpretation problem -- check COMMON_NAME usage
I am using Oracle as the backend database.
Can anyone provide information on the possible causes of this failed query?
Any help is highly appreciated.
The flags used during Pro*C compilation are shown below:
------/u01/app/oracle/product/8.1.6/ORACLE_HOME/bin/proc `echo -Dbscs5 -Dsun5 -I/export/home/bscsobw/bscs6/src/CoreDumpIssue/final_Code_Fix_004641 -DNDEBUG -DSunOS53 -D_POSIX_4SOURCES -I/usr/generic++/generic++2.5.3.64_bit/include -DFEATURE_212298 -DBSCS_CONFIG -I/export/home/bscsobw/bscs6//src/bat/include -DFEATURE_00203808_GMD -DFEATURE_00241737 -DORACLE_DB_BRAND -I/u01/app/oracle/product/8.1.6/ORACLE_HOME/rdbms/demo -I/u01/app/oracle/product/8.1.6/ORACLE_HOME/precomp/public -I/export/home/bscsobw/bscs6/src/CoreDumpIssue/final_Code_Fix_004641/include -I../bat/include -DFEATURE61717 -DFEATURE52824 -DFEATURE56178 -DD236312_d -DSDP -g | sed -e 's/-I/INCLUDE=/g' -e 's/-D[^ ]=[^ ]*//g' -e 's/-D\([^ ]*\)/DEFINE=\1/g'` select_error=no DEFINE=FEATURE61717 DEFINE=FEATURE52824 DEFINE=FEATURE56178 \
lines=yes iname=bgh_esql.pc oname=bgh_esql.c lname=bgh_esql.lis
I think you should check this message:
[oracle@sb-rac02 ~]$ oerr sql 2115
02115, 00000, "Code interpretation problem -- check COMMON_NAME usage"
// *Cause: With PRO*FORTRAN, this error occurs if the precompiler option
// COMMON_NAME is specified incorrectly. **With other Oracle
// Precompilers, this error occurs when the precompiler cannot
// generate a section of code.**
// *Action: With Pro*FORTRAN, when using COMMON_NAME to precompile two or
// more source modules, make sure to specify a different common name
// for each module. With other Oracle Precompilers, if the error
// persists, call customer support for assistance.
So you can determine that the problem is not in your variables.
Please try your code like this:
long fnSQLMarkProcessed (char *pszRowId, char *pszMarker)
{
    BGHCPY_TO_ORA (O_rowid_stack, pszRowId);
    BGHCPY_TO_ORA (O_cust_processed, pszMarker);

    EXEC SQL UPDATE document_all
        SET processed_by_bgh = :O_cust_processed
        WHERE rowid = :O_rowid_stack;

    return (sqlca.sqlcode);
}

Etag definition changed in Amazon S3

I've used Amazon S3 a little bit for backups for some time. Usually, after I upload a file I check that the MD5 sum matches to ensure I've made a good backup. S3 has the "etag" header, which used to give this sum.
However, when I uploaded a large file recently, the ETag no longer seems to be an MD5 sum. It has extra digits and a hyphen: "696df35ad1161afbeb6ea667e5dd5dab-2861". I've checked using the S3 management console and with Cyberduck.
I can't find any documentation about this change. Any pointers?
You will always get this style of ETag when uploading a multipart file. If you upload the whole file as a single file, then you will get an ETag without the -{xxxx} suffix.
Bucket Explorer will show the unsuffixed ETag for a multipart file up to 5 GB.
AWS:
The ETag for an object created using the multipart upload api will contain one or more non-hexadecimal characters and/or will consist of less than 16 or more than 16 hexadecimal digits.
Reference: https://forums.aws.amazon.com/thread.jspa?messageID=203510#203510
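As a quick client-side check (my own sketch, not something from the AWS docs), you can tell the two styles apart by looking for the dash and, if present, read off the part count:
def parse_etag(etag):
    """Split an S3 ETag into (hex_digest, part_count); part_count is None for single-part uploads."""
    etag = etag.strip('"')
    if "-" in etag:
        digest, parts = etag.split("-", 1)
        return digest, int(parts)
    return etag, None

# Example with the ETag from the question:
print(parse_etag('"696df35ad1161afbeb6ea667e5dd5dab-2861"'))  # ('696df35ad1161afbeb6ea667e5dd5dab', 2861)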
Amazon S3 calculates the ETag with a different algorithm (not the plain MD5 sum) when you upload a file using multipart.
This algorithm is detailed here: http://permalink.gmane.org/gmane.comp.file-systems.s3.s3tools/583
"Calculate the MD5 hash for each uploaded part of the file,
concatenate the hashes into a single binary string and calculate the
MD5 hash of that result."
I just developed a tool in bash to calculate it, s3md5: https://github.com/Teachnova/s3md5
For example, to calculate the ETag of a file foo.bin that has been uploaded using multipart with a chunk size of 15 MB:
# s3md5 15 foo.bin
Now you can check the integrity of a very big file (bigger than 5 GB), because you can calculate the ETag of the local file and compare it with the S3 ETag.
Also in python...
#!/usr/bin/env python3
import binascii
import hashlib
import os
# Max size in bytes before uploading in parts.
AWS_UPLOAD_MAX_SIZE = 20 * 1024 * 1024
# Size of parts when uploading in parts
# note: 2022-01-27 bitnami-minio container uses 5 mib
AWS_UPLOAD_PART_SIZE = int(os.environ.get('AWS_UPLOAD_PART_SIZE', 5 * 1024 * 1024))
def md5sum(sourcePath):
    '''
    Function: md5sum
    Purpose: Get the md5 hash of a file stored in S3
    Returns: Returns the md5 hash that will match the ETag in S3
    '''
    filesize = os.path.getsize(sourcePath)
    hash = hashlib.md5()

    if filesize > AWS_UPLOAD_MAX_SIZE:
        block_count = 0
        md5bytes = b""
        with open(sourcePath, "rb") as f:
            block = f.read(AWS_UPLOAD_PART_SIZE)
            while block:
                hash = hashlib.md5()
                hash.update(block)
                block = f.read(AWS_UPLOAD_PART_SIZE)
                md5bytes += binascii.unhexlify(hash.hexdigest())
                block_count += 1
        hash = hashlib.md5()
        hash.update(md5bytes)
        hexdigest = hash.hexdigest() + "-" + str(block_count)
    else:
        with open(sourcePath, "rb") as f:
            block = f.read(AWS_UPLOAD_PART_SIZE)
            while block:
                hash.update(block)
                block = f.read(AWS_UPLOAD_PART_SIZE)
        hexdigest = hash.hexdigest()

    return hexdigest
Here is an example in Go:
func GetEtag(path string, partSizeMb int) string {
    partSize := partSizeMb * 1024 * 1024
    content, _ := ioutil.ReadFile(path)
    size := len(content)
    contentToHash := content
    parts := 0

    if size > partSize {
        pos := 0
        contentToHash = make([]byte, 0)
        for size > pos {
            endpos := pos + partSize
            if endpos >= size {
                endpos = size
            }
            hash := md5.Sum(content[pos:endpos])
            contentToHash = append(contentToHash, hash[:]...)
            pos += partSize
            parts += 1
        }
    }

    hash := md5.Sum(contentToHash)
    etag := fmt.Sprintf("%x", hash)
    if parts > 0 {
        etag += fmt.Sprintf("-%d", parts)
    }
    return etag
}
This is just an example; you should handle errors and so on.
Here's a powershell function to calculate the Amazon ETag for a file:
$blocksize = (1024*1024*5)
$startblocks = (1024*1024*16)
function AmazonEtagHashForFile($filename) {
    $lines = 0
    [byte[]] $binHash = @()

    $md5 = [Security.Cryptography.HashAlgorithm]::Create("MD5")
    $reader = [System.IO.File]::Open($filename,"OPEN","READ")

    if ((Get-Item $filename).length -gt $startblocks) {
        $buf = new-object byte[] $blocksize
        while (($read_len = $reader.Read($buf,0,$buf.length)) -ne 0){
            $lines += 1
            $binHash += $md5.ComputeHash($buf,0,$read_len)
        }
        $binHash = $md5.ComputeHash( $binHash )
    }
    else {
        $lines = 1
        $binHash += $md5.ComputeHash($reader)
    }

    $reader.Close()

    $hash = [System.BitConverter]::ToString( $binHash )
    $hash = $hash.Replace("-","").ToLower()

    if ($lines -gt 1) {
        $hash = $hash + "-$lines"
    }

    return $hash
}
If you use multipart uploads, the "etag" is not the MD5 sum of the data (see What is the algorithm to compute the Amazon-S3 Etag for a file larger than 5GB?). One can identify this case by the etag containing a dash, "-".
Now, the interesting question is how to get the actual MD5 sum of the data without downloading it. One easy way is to just "copy" the object onto itself; this requires no download:
s3cmd cp s3://bucket/key s3://bucket/key
This will cause S3 to recompute the MD5 sum and store it as "etag" of the just copied object. The "copy" command runs directly on S3, i.e., no object data is transferred to/from S3, so this requires little bandwidth! (Note: do not use s3cmd mv; this would delete your data.)
The underlying REST command is:
PUT /key HTTP/1.1
Host: bucket.s3.amazonaws.com
x-amz-copy-source: /bucket/key
x-amz-metadata-directive: COPY
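If you prefer boto3 to s3cmd, here is a minimal sketch of the same server-side self-copy (bucket and key names are placeholders):
import boto3

s3 = boto3.client("s3")

# Server-side copy of the object onto itself; no object data is downloaded.
# Note: S3 may reject a self-copy that changes nothing, in which case using
# MetadataDirective="REPLACE" (re-sending the existing metadata) is the usual workaround.
s3.copy_object(
    Bucket="bucket",
    Key="key",
    CopySource={"Bucket": "bucket", "Key": "key"},
    MetadataDirective="COPY",
)

# The recomputed ETag should now be a plain MD5 (for objects small enough
# to be copied in a single operation).
print(s3.head_object(Bucket="bucket", Key="key")["ETag"])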
Copying to s3 with aws s3 cp can use multipart uploads and the resulting etag will not be an md5, as others have written.
To upload files without multipart, use the lower level put-object command.
aws s3api put-object --bucket bucketname --key remote/file --body local/file
This AWS support page - How do I ensure data integrity of objects uploaded to or downloaded from Amazon S3? - describes a more reliable way to verify the integrity of your s3 backups.
Firstly determine the base64 encoded md5sum of the file you wish to upload:
$ md5_sum_base64="$( openssl md5 -binary my-file | base64 )"
Then use the s3api to upload the file:
$ aws s3api put-object --bucket my-bucket --key my-file --body my-file --content-md5 "$md5_sum_base64"
Note the use of the --content-md5 flag, the help for this flag states:
--content-md5 (string) The base64-encoded 128-bit MD5 digest of the part data.
This does not say much about why to use this flag, but we can find this information in the API documentation for put object:
To ensure that data is not corrupted traversing the network, use the Content-MD5 header. When you use this header, Amazon S3 checks the object against the provided MD5 value and, if they do not match, returns an error. Additionally, you can calculate the MD5 while putting an object to Amazon S3 and compare the returned ETag to the calculated MD5 value.
Using this flag causes S3 to verify that the file hash serverside matches the specified value. If the hashes match s3 will return the ETag:
{
"ETag": "\"599393a2c526c680119d84155d90f1e5\""
}
The ETag value will usually be the hexadecimal md5sum (see this question for some scenarios where this may not be the case).
If the hash does not match the one you specified you get an error.
A client error (InvalidDigest) occurred when calling the PutObject operation: The Content-MD5 you specified was invalid.
In addition to this you can also add the file md5sum to the file metadata as an additional check:
$ aws s3api put-object --bucket my-bucket --key my-file --body my-file --content-md5 "$md5_sum_base64" --metadata md5chksum="$md5_sum_base64"
After upload you can issue the head-object command to check the values.
$ aws s3api head-object --bucket my-bucket --key my-file
{
    "AcceptRanges": "bytes",
    "ContentType": "binary/octet-stream",
    "LastModified": "Thu, 31 Mar 2016 16:37:18 GMT",
    "ContentLength": 605,
    "ETag": "\"599393a2c526c680119d84155d90f1e5\"",
    "Metadata": {
        "md5chksum": "WZOTosUmxoARnYQVXZDx5Q=="
    }
}
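The same upload-and-verify flow works in boto3 if you'd rather not shell out to the CLI (a sketch; the bucket and key names are placeholders):
import base64
import hashlib
import boto3

s3 = boto3.client("s3")

with open("my-file", "rb") as f:
    body = f.read()

md5_b64 = base64.b64encode(hashlib.md5(body).digest()).decode()

# S3 rejects the PUT with InvalidDigest if the body does not match Content-MD5.
s3.put_object(
    Bucket="my-bucket",
    Key="my-file",
    Body=body,
    ContentMD5=md5_b64,
    Metadata={"md5chksum": md5_b64},
)

# Verify what S3 stored: the ETag should be the hex MD5, the metadata the base64 MD5.
head = s3.head_object(Bucket="my-bucket", Key="my-file")
assert head["ETag"].strip('"') == hashlib.md5(body).hexdigest()
assert head["Metadata"]["md5chksum"] == md5_b64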
Here is a bash script that uses content md5 and adds metadata and then verifies that the values returned by S3 match the local hashes:
#!/bin/bash
set -euf -o pipefail
# assumes you have aws cli, jq installed
# change these if required
tmp_dir="$HOME/tmp"
s3_dir="foo"
s3_bucket="stack-overflow-example"
aws_region="ap-southeast-2"
aws_profile="my-profile"
test_dir="$tmp_dir/s3-md5sum-test"
file_name="MailHog_linux_amd64"
test_file_url="https://github.com/mailhog/MailHog/releases/download/v1.0.0/MailHog_linux_amd64"
s3_key="$s3_dir/$file_name"
return_dir="$( pwd )"
cd "$tmp_dir" || exit
mkdir "$test_dir"
cd "$test_dir" || exit
wget "$test_file_url"
md5_sum_hex="$( md5sum $file_name | awk '{ print $1 }' )"
md5_sum_base64="$( openssl md5 -binary $file_name | base64 )"
echo "$file_name hex = $md5_sum_hex"
echo "$file_name base64 = $md5_sum_base64"
echo "Uploading $file_name to s3://$s3_bucket/$s3_dir/$file_name"
aws \
--profile "$aws_profile" \
--region "$aws_region" \
s3api put-object \
--bucket "$s3_bucket" \
--key "$s3_key" \
--body "$file_name" \
--metadata md5chksum="$md5_sum_base64" \
--content-md5 "$md5_sum_base64"
echo "Verifying sums match"
s3_md5_sum_hex=$( aws --profile "$aws_profile" --region "$aws_region" s3api head-object --bucket "$s3_bucket" --key "$s3_key" | jq -r '.ETag' | sed 's/"//'g )
s3_md5_sum_base64=$( aws --profile "$aws_profile" --region "$aws_region" s3api head-object --bucket "$s3_bucket" --key "$s3_key" | jq -r '.Metadata.md5chksum' )
if [ "$md5_sum_hex" == "$s3_md5_sum_hex" ] && [ "$md5_sum_base64" == "$s3_md5_sum_base64" ]; then
    echo "checksums match"
else
    echo "something is wrong checksums do not match:"
    cat <<EOM | column -t -s ' '
$file_name file hex: $md5_sum_hex s3 hex: $s3_md5_sum_hex
$file_name file base64: $md5_sum_base64 s3 base64: $s3_md5_sum_base64
EOM
fi
echo "Cleaning up"
cd "$return_dir"
rm -rf "$test_dir"
aws \
--profile "$aws_profile" \
--region "$aws_region" \
s3api delete-object \
--bucket "$s3_bucket" \
--key "$s3_key"
Here is a C# version:
string etag = HashOf("file.txt", 8);
Source code:
private string HashOf(string filename, int chunkSizeInMb)
{
    string returnMD5 = string.Empty;
    int chunkSize = chunkSizeInMb * 1024 * 1024;

    using (var crypto = new MD5CryptoServiceProvider())
    {
        int hashLength = crypto.HashSize / 8;

        using (var stream = File.OpenRead(filename))
        {
            if (stream.Length > chunkSize)
            {
                int chunkCount = (int)Math.Ceiling((double)stream.Length / (double)chunkSize);
                byte[] hash = new byte[chunkCount * hashLength];
                Stream hashStream = new MemoryStream(hash);

                long nByteLeftToRead = stream.Length;
                while (nByteLeftToRead > 0)
                {
                    int nByteCurrentRead = (int)Math.Min(nByteLeftToRead, chunkSize);
                    byte[] buffer = new byte[nByteCurrentRead];
                    nByteLeftToRead -= stream.Read(buffer, 0, nByteCurrentRead);

                    byte[] tmpHash = crypto.ComputeHash(buffer);
                    hashStream.Write(tmpHash, 0, hashLength);
                }

                returnMD5 = BitConverter.ToString(crypto.ComputeHash(hash)).Replace("-", string.Empty).ToLower() + "-" + chunkCount;
            }
            else
            {
                returnMD5 = BitConverter.ToString(crypto.ComputeHash(stream)).Replace("-", string.Empty).ToLower();
            }
            stream.Close();
        }
    }

    return returnMD5;
}
To go one step beyond the OP's question: chances are these chunked ETags are making your life difficult when trying to compare them client-side.
If you are publishing your artifacts to S3 using the awscli commands (cp, sync, etc.), the default threshold at which multipart upload kicks in seems to be 10 MB. Recent awscli releases allow you to configure this threshold, so you can disable multipart and get an easy-to-use MD5 ETag:
aws configure set default.s3.multipart_threshold 64MB
Full documentation here: http://docs.aws.amazon.com/cli/latest/topic/s3-config.html
A consequence of this could be downgraded upload performance (I honestly did not notice). But the result is that all files smaller than your configured threshold will now have normal MD5 hash ETags, making them much easier to delta client side.
This does require a somewhat recent awscli install. My previous version (1.2.9) did not support this option, so I had to upgrade to 1.10.x.
I was able to set my threshold up to 1024MB successfully.
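If you upload with boto3 rather than the awscli, the equivalent knob (as far as I can tell) is TransferConfig.multipart_threshold, for example:
import boto3
from boto3.s3.transfer import TransferConfig

# Raise the threshold so files below 64 MB go up as a single PUT,
# which keeps the ETag a plain MD5 (bucket/key/file names here are placeholders).
config = TransferConfig(multipart_threshold=64 * 1024 * 1024)

s3 = boto3.client("s3")
s3.upload_file("local/file", "bucketname", "remote/file", Config=config)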
Based on answers here, I wrote a Python implementation which correctly calculates both multi-part and single-part file ETags.
import hashlib

def calculate_s3_etag(file_path, chunk_size=8 * 1024 * 1024):
    md5s = []

    with open(file_path, 'rb') as fp:
        while True:
            data = fp.read(chunk_size)
            if not data:
                break
            md5s.append(hashlib.md5(data))

    if len(md5s) == 1:
        return '"{}"'.format(md5s[0].hexdigest())

    digests = b''.join(m.digest() for m in md5s)
    digests_md5 = hashlib.md5(digests)
    return '"{}-{}"'.format(digests_md5.hexdigest(), len(md5s))
The default chunk_size of 8 MB is what the official aws cli tool uses, and it does a multipart upload for 2+ chunks. It should work under both Python 2 and 3.
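For example, to compare a local file against the object stored in S3 (a sketch; the bucket and key are placeholders, and the chunk size has to match whatever the uploader used):
import boto3

local_etag = calculate_s3_etag("local/file")  # returns the ETag already wrapped in quotes
remote_etag = boto3.client("s3").head_object(Bucket="my-bucket", Key="remote/file")["ETag"]

print("match" if local_etag == remote_etag else "mismatch")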
Improving on @Spedge's and @Rob's answers, here is a Python 3 md5 function that takes a file-like object and does not rely on being able to get the file size with os.path.getsize.
# Function : md5sum
# Purpose : Get the md5 hash of a file stored in S3
# Returns : Returns the md5 hash that will match the ETag in S3
# https://github.com/boto/boto3/blob/0cc6042615fd44c6822bd5be5a4019d0901e5dd2/boto3/s3/transfer.py#L169
import hashlib

def md5sum(file_like,
           multipart_threshold=8 * 1024 * 1024,
           multipart_chunksize=8 * 1024 * 1024):
    md5hash = hashlib.md5()
    file_like.seek(0)
    filesize = 0
    block_count = 0
    md5string = b''

    for block in iter(lambda: file_like.read(multipart_chunksize), b''):
        md5hash = hashlib.md5()
        md5hash.update(block)
        md5string += md5hash.digest()
        filesize += len(block)
        block_count += 1

    if filesize > multipart_threshold:
        md5hash = hashlib.md5()
        md5hash.update(md5string)
        md5hash = md5hash.hexdigest() + "-" + str(block_count)
    else:
        md5hash = md5hash.hexdigest()

    file_like.seek(0)
    return md5hash
Of course, multipart upload could be a common cause, but in my case I was serving static files through S3, and the ETag of a .js file was coming out different from the local file even though the content was the same.
It turned out that the line endings were different. I fixed the line endings in my git repository, uploaded the changed files to S3, and it works fine now.
I built on r03's answer and have a standalone Go utility for this here:
https://github.com/lambfrier/calc_s3_etag
Example usage:
$ dd if=/dev/zero bs=1M count=10 of=10M_file
$ calc_s3_etag 10M_file
669fdad9e309b552f1e9cf7b489c1f73-2
$ calc_s3_etag -chunksize=15 10M_file
9fbaeee0ccc66f9a8e3d3641dca37281-1
The Python example works great, but when working with Bamboo, they set the part size to 5 MB, which is NON STANDARD!! (s3cmd uses 15 MB.) I also adjusted it to use 1024 to calculate bytes.
Revised to work for Bamboo artifact S3 repos:
import hashlib
import binascii
import os

# Max size in bytes before uploading in parts.
AWS_UPLOAD_MAX_SIZE = 20 * 1024 * 1024
# Size of parts when uploading in parts
AWS_UPLOAD_PART_SIZE = 5 * 1024 * 1024

#
# Function : md5sum
# Purpose : Get the md5 hash of a file stored in S3
# Returns : Returns the md5 hash that will match the ETag in S3
# Note: this is Python 2 code (binary digests are concatenated onto a str).
def md5sum(sourcePath):
    filesize = os.path.getsize(sourcePath)
    hash = hashlib.md5()

    if filesize > AWS_UPLOAD_MAX_SIZE:
        block_count = 0
        md5string = ""
        with open(sourcePath, "rb") as f:
            for block in iter(lambda: f.read(AWS_UPLOAD_PART_SIZE), ""):
                hash = hashlib.md5()
                hash.update(block)
                md5string = md5string + binascii.unhexlify(hash.hexdigest())
                block_count += 1

        hash = hashlib.md5()
        hash.update(md5string)
        return hash.hexdigest() + "-" + str(block_count)
    else:
        with open(sourcePath, "rb") as f:
            for block in iter(lambda: f.read(AWS_UPLOAD_PART_SIZE), ""):
                hash.update(block)
        return hash.hexdigest()