I would like to apply arbitrarily defined bit mask as virtual aperture and apply it to 4D-STEM data set in an EFFICIENT way.
I did it using the SliceN function and apply the mask pixel-by-pixel, which is very slow for large datasets. How to optimize it to so to run faster?
Image 4DSTEM := GetFrontImage() // dimention [ScanX, ScanY, Dx, Dy]
Image mask: = iradius // just an arbitrary mask (aperture)
Image out // dimention [ScanX, ScanY]
for (number i=0; i<ScanX; i++)
{ for (number j=0; j<ScanY; j++)
{
Diff2D = 4DSTEM.SliceN(4,2,i,j,0,0,2,Dx,1,3,Dy,1)
out.setpixel(i,j, sum(diff2D*mask))
}
}
out.showimage()
for an [100,100,512,512] dataset, that took few minutes to finish. When I have to repeat the operation several times, that is way to slow compare to matrix operation. but I dont know how to make it in an efficient way.
Thanks!
you're hitting the limitations of scripting languages here. Using sliceN is already pretty much the optimum you can get to, unfortunately. Everything else in speed optimization requires parallelized, compiled code. (i.e. you could code C++ code and use the SDK to compile your own plugin.)
However, there is a bit of room for improvement over your example.
First of all, your example above doesn't run :c) But that is quickly fixed.
Point #1:
Try to avoid number type casting. DM script only knows number but internally there is a difference between the proper number types (integer, floating point, signed/unsigned, byte-size). The script languages uses real-4-byte as the default unless told differently explicitly. And some methods will return real-4-byte by default. For this reason, the processing will be fastest, if both data and mask use real-4-byte data as well.
In my testing, the time-difference between running with uint16 data plus uint8 mask and *real4 data plus real4 mask) was significant! Nearly 30% time difference.
Point #2:
Don't copy you sliced image! Use := not = for your Dif2D.
The SliceN command returns an expression directly addressing the required memory. You can use it directly in any other expression (like I do below) or you can assign an image variable to it using := to give it a name.
The speed increase is not huge, but it's one copy-operation less per loop iteration.
Point #3:
You additional knowledge: Now for arbitrary masks there is not much you can do, but most often masks are zero-valued over large stretches and it is possible to define a smaller ROI containing all non-zero points. If this is the case, you can limit your math operations to that region.
i.e. instead of multiplying the whole DP with the same sized mask, just use a smaller mask and use the according sub-section of the DP.
This can actually make a big difference, but it will depend on your mask.
Of course you need to "find" this ROI first. In my script below I'm having a helper method to do that, utilizing the comparatively fast max() command and image rotation as trick for speed-up.
Point #4:
...would be to get rid of the double-for loop and replace it with image-expressions. Unfortunately, DigitalMicrograph does currently (GMS 3.3) not support this for 4D or 5D data.
The script below executed on a [53 x 52 x 512 x 512] STEM DI (of real-4 byte data) gave me the following timings:
Original: 12.80910 sec
Test 1 : 10.77700 sec
Test 2 : 1.83017 sec
// Helper class for timing
class CTimer{
number s
string n
~CTimer(object self){result("\n"+n+": "+ (GetHighResTickCount()-s)/GetHighResTicksPerSecond()+" sec");}
object Start(object self, string n_) { n=n_; s=GetHighResTickCount(); return self;}
}
// Helper method to find best non-zero containing ROI
void GetNonZeroArea( image src, number &t, number &l, number &b, number &r )
{
image work = !!src // Make a binary image which is 0 only where src==0
number d
max(work,d,t) // get "first" non-zero pixel coordinate, this is y = dist from TOP
rotateRight(work) // rotate image right
max(work,d,l) // get "first" non-zero pixel coordinate, this is y = dist from LEFT
rotateRight(work) // rotate image right
max(work,d,b) // get "first" non-zero pixel coordinate, this is y = dist from BOTTOM
b = work.ImageGetDimensionSize(1) - b // Opposite side!
rotateRight(work) // rotate image right
max(work,d,r) // get "first" non-zero pixel coordinate
r = work.ImageGetDimensionSize(1) - r // Opposite side!
}
// The original proposed script (plus fixes to make it actually run)
image Original(image STEM4D, image mask)
{
Number ScanX = STEM4D.ImageGetDimensionSize(0)
Number ScanY = STEM4D.ImageGetDimensionSize(1)
Number Dx = STEM4D.ImageGetDimensionSize(2)
Number Dy = STEM4D.ImageGetDimensionSize(3)
Image out := RealImage("Test1",4,ScanX,ScanY)
for (number i=0; i<ScanX; i++)
{ for (number j=0; j<ScanY; j++)
{
image Diff2D = STEM4D.SliceN(4,2,i,j,0,0,2,Dx,1,3,Dy,1)
out.setpixel(i,j, sum(Diff2D*mask))
}
}
return out
}
// Remove copying the slice, just reference it
image Test1(image STEM4D, image mask)
{
Number ScanX = STEM4D.ImageGetDimensionSize(0)
Number ScanY = STEM4D.ImageGetDimensionSize(1)
Number Dx = STEM4D.ImageGetDimensionSize(2)
Number Dy = STEM4D.ImageGetDimensionSize(3)
Image out := RealImage("Test1",4,ScanX,ScanY)
for (number i=0; i<ScanX; i++)
{ for (number j=0; j<ScanY; j++)
{
image Diff2D := STEM4D.SliceN(4,2,i,j,0,0,2,Dx,1,3,Dy,1)
out.setpixel(i,j, sum(Diff2D*mask))
}
}
return out
}
// Limit mask size to what is needed!
image Test2(image STEM4D, image mask )
{
Number ScanX = STEM4D.ImageGetDimensionSize(0)
Number ScanY = STEM4D.ImageGetDimensionSize(1)
Number Dx = STEM4D.ImageGetDimensionSize(2)
Number Dy = STEM4D.ImageGetDimensionSize(3)
Image out := RealImage("Test1",4,ScanX,ScanY)
Number t,l,b,r
GetNonZeroArea(mask,t,l,b,r)
Number w = r - l
Number h = b - t
image subMask := mask.slice2(l,t,0, 0,w,1, 1,h,1 )
for (number i=0; i<ScanX; i++)
for (number j=0; j<ScanY; j++)
out.setpixel(i,j, sum(STEM4D.SliceN(4,2,i,j,l,t,2,w,1,3,h,1)*subMask))
return out
}
Image src := GetFrontImage() // dimention [ScanX, ScanY, Dx, Dy]
Number ScanX = src.ImageGetDimensionSize(0)
Number ScanY = src.ImageGetDimensionSize(1)
Number Dx = src.ImageGetDimensionSize(2)
Number Dy = src.ImageGetDimensionSize(3)
Number r = 50 // mask radius
Image maskImg := RealImage("Mask",4,Dx,Dy)
maskImg = iradius < r ? 1 : 0 // just an aperture mask
image resultImg
{
object timer = Alloc(CTimer).Start("Original")
resultImg := Original(src,maskImg)
}
resultImg.SetName("Oringal")
resultImg.ShowImage()
{
object timer = Alloc(CTimer).Start("Test 1")
Test1(src,maskImg).ShowImage()
}
resultImg.SetName("Test 1")
resultImg.ShowImage()
{
object timer = Alloc(CTimer).Start("Test 2")
Test2(src,maskImg).ShowImage()
}
resultImg.SetName("Test 2")
resultImg.ShowImage()
Compiled code comparison:
Now, it should be added that the above script still is rather slow. Because it is iterating and using script language. The fully compiled c++ code of DigitalMicrograph is much faster. So if you have the licensed packages giving you the SI menu, then you want to use the SI/Map/Signal command. This is near-instantaneous for the example STEM DI I've mentioned above. My other answer shows how one could utilize this functionality by script.
As mentioned in my other answer, a real speed-win comes when compiled, parallelized code is used. DigitalMicrograph does this, after all, in the available SI "signal" map functionality. This feature is not available in the free version, but if you have Spectrum-Imaging acquisition, you most likely have the appropriated license as well.
The answer below utilizes this functionality by accessing the UI with the command ChooseMenuItem() and applying a few more tricks. The script is a bit lengthy, but its parts also show some other nice tricks worthwhile knowing:
TestSignalIntegrationInSI is the main script demoing how things can work.
CreatePickerByScript shows how one can create picker-spectra on SIs. This is used to open a 'Picker Diffraction Pattern' image from the STEM DI.
AddTestMasksToDP_ROIs programmatically adds ROIs to the diffraction pattern to be used as mask
AddTestMasksToDP_Threshold programmatically adds an intensity-threshold mask to be used as mask.
AddTestMasksToDP_DPMasks programmatically adds the various types of diffraction-masks to be used as mask
GetIntegratedSignalViaSIMenu is the central step of the script. With a picker-DP and required 'masks' on it front-most, the menu command is called to perform the signal-extraction (as fast as possible.) Then the displayed result-image is returned.
GetNewestImage is just a utility method showing how on can access the latest memory-created image.
Here is the script:
image GetNewestImage()
{
// New images get the next higher imageID.
// This can be used to identify the "latest" created image.
if ( 0 == CountImages() ) Throw( "No image in memory!" )
// We create a temp. image to get the uppder limit
number lastID = RealImage("Dummy",4,1).ImageGetID()
// Then we search for the next lower existing one
image lastImg
for( number ID = lastID - 1; ID>0; ID-- )
{
lastImg := FindImageByID(ID)
if ( lastImg.ImageIsValid() ) break
}
return lastImg
}
image CreatePickerByScript( image SI, number t, number l, number b, number r )
{
if ( SI.ImageGetNumDimensions()<3 ) Throw( "Sorry, LineScans are not supprorted here." )
// Adding a non-volatile ROI of specific RoiNAME acts as if using
// the picker-tool. The ID string must be unique!
ROI pickerROI = NewROI()
pickerROI.RoiSetVolatile( 0 )
string uniqueID = GetDate(0)+"#"+GetTime(1)+";"+round(random()*1000)
pickerROI.RoiSetName( "SICursor(##"+uniqueID+"##)" )
SI.ImageGetImageDisplay(0).ImageDisplayAddROI( pickerROI )
// This creates the picker image.
// So the child is now the "newest" image in memory
image child := GetNewestImage()
return child
}
void AddTestMasksToDP_ROIs( image DP )
{
// Add ROIs to the DP which are your masks (any numebr and type of ROI works)
imageDisplay DPdisp = DP.ImageGetImageDisplay(0)
number dpX = DP.ImageGetDimensionSize(0)
number dpY = DP.ImageGetDimensionSize(1)
// Only simple RECT ROIs are supported
ROI maskRoi1 = NewROI()
maskRoi1.ROISetRectangle( dpY*0.1, dpX*0.1, dpY*0.8, dpX*0.3 )
DPdisp.ImageDisplayAddROI(maskRoi1)
// Arbitrary multi-vertex (use for ovals etc.)
ROI maskRoi2 = NewROI()
maskRoi2.ROISetRectangle( dpY*0.7, dpX*0.1, dpY*0.9, dpX*0.9 )
DPdisp.ImageDisplayAddROI(maskRoi2)
}
void AddTestMasksToDP_Threshold( image DP )
{
// Add intensity treshhold mask (highest 95% intensity range)
imageDisplay DPdisp = DP.ImageGetImageDisplay(0)
DPdisp.RasterImageDisplaySetThresholdOn( 1 )
number low = max(DP) * 0.05
number high = max(DP)
DPdisp.RasterImageDisplaySetThresholdLimits( low, high )
}
void AddTestMasksToDP_DPMasks( image DP )
{
// Add Diffraction masks to the DP
imageDisplay DPdisp = DP.ImageGetImageDisplay(0)
// Spot masks (always symmetric pair)
Component spotMask = NewComponent(8,0,0,0,0) // 8 = Spotmask
spotMask.ComponentSetControlPoint(4, 0, 0,0) // 4 = TopLeft of one spot [Size only]
spotMask.ComponentSetControlPoint(7,10,10,0) // 7 = BottomRight of one spot [Size only]
spotMask.ComponentSetControlPoint(8,150,0,0) // 8 = Spot position [center]
DPdisp.ComponentAddChildAtEnd(spotMask)
// Bandpass mask (Only circles are correctly supported)
Component bandpassMask = NewComponent(15,0,0,0,0) // 15 = Bandpass (ring)
number r1 = 100
number r2 = 120
bandpassMask.ComponentSetControlPoint(7,r1,r1,0) // 7 = BottomRight of one ring [Size only]
bandpassMask.ComponentSetControlPoint(14,r2,r2,0) // 14 = BottomRight of one ring [Size only]
DPdisp.ComponentAddChildAtEnd(bandpassMask)
// Wege mask (symmetric)
Component wedgeMask = NewComponent(19,0,0,0,0) // 19 = wedgemask (ringsegment)
wedgeMask.ComponentSetControlPoint(9,10,20,0) // 9 = One wedge vector
wedgeMask.ComponentSetControlPoint(10,-20,40,0) // 10 = Other wedge vector
DPdisp.ComponentAddChildAtEnd(wedgeMask)
// Array mask (symmetric)
Component arrayMask = NewComponent(9,0,0,0,0) // 9 = arrayMask (ringsegment)
arrayMask.ComponentSetControlPoint(9,-70,-60,0) // 9 = One array vector
arrayMask.ComponentSetControlPoint(10,99,-99,0) // 10 = Other array vector
arrayMask.ComponentSetControlPoint(4, 0, 0,0) // 4 = TopLeft of one spot [Size only]
arrayMask.ComponentSetControlPoint(7,20,20,0) // 7 = BottomRight of one spot [Size only]
DPdisp.ComponentAddChildAtEnd(arrayMask)
}
image GetIntegratedSignalViaSIMenu( image pickerChild )
{
// Call the Menu to do the work
// The picker-spectrum or DP needs to be front-most
pickerChild.SelectImage()
ChooseMenuItem("SI","Map","Signal")
// The created signal map is NOT the newest image
// (some internal iamges are created for the mask)
// but it is the front-most displayed one.
image signalMap := GetFrontImage()
return signalMap
}
image GetMaskFromSignalMap( image signalMap, number DPx, number DPy )
{
// The actual mask is stored in the tags
string tagPath = "Processing:[0]:Parameters:Mask"
tagGroup tg = signalMap.ImageGetTagGroup()
if ( !tg.TagGroupDoesTagExist(tagPath) )
Throw( "Sorry, no mask tag found." )
image mask := RealImage("Mask",4,DPx, DPy )
if ( !tg.TagGroupGetTagAsArray(tagPath,mask) )
Throw( "Sorry, could not retrieve mask. Maybe wrong size?" )
return mask
}
void TestSignalIntegrationInSI()
{
image STEMDI := GetFrontImage()
image DP := STEMDI.CreatePickerByScript(0,0,1,1)
if ( TwoButtonDialog( "Add ROIs as mask?", "Yes", "No" ) )
AddTestMasksToDP_ROIs( DP )
else if ( TwoButtonDialog( "Add intensity treshold as mask?", "Yes", "No" ) )
AddTestMasksToDP_Threshold( DP )
else if ( TwoButtonDialog( "Add diffraction masks as mask?", "Yes", "No" ) )
AddTestMasksToDP_DPMasks( DP )
image signalMap := GetIntegratedSignalViaSIMenu( DP )
number dpX = DP.ImageGetDimensionSize(0)
number dpY = DP.ImageGetDimensionSize(1)
// We may want to close the DP again. No longer needed
//DP.DeleteImage()
// Verification: Get Mask image form SignalMap
image usedMask := GetMaskFromSignalMap( signalMap, dpX, dpY )
usedMask.SetName( "This mask was used." )
usedMask.ShowImage()
}
TestSignalIntegrationInSI()
The solution below utilizes the intrinsic expression loops by performing in-place multiplication and then projection.
Disappointingly, it turns out the solution is actually a bit slower then the for-loop with the SliceN command.
For the same test-data of size [53 x 52 x 512 x 512] the resulting timing is:
Data copy: 1.28073 sec
Inplace multiply: 30.1978 sec
Project 1/2: 1.1208 sec
Project 2/2: 0.0019557 sec
InPlace multiplication with projections (total): 32.9045 sec
InPlace multiplication with projections (total): 34.9853 sec
// Helper class for timing
class CTimer{
number s
string n
~CTimer(object self){result("\n"+n+": "+ (GetHighResTickCount()-s)/GetHighResTicksPerSecond()+" sec");}
object Start(object self, string n_) { n=n_; s=GetHighResTickCount(); return self;}
}
image MaskMultipliedSum( image STEM4D, image MASK2D, number copyFirst )
{
// Boring feasability checks...
if ( 4 != STEM4D.ImageGetNumDimensions() )
Throw( "Input data is not 4D." )
if ( 2 != MASK2D.ImageGetNumDimensions() )
Throw( "Input mask is not 2D." )
Number ScanX = STEM4D.ImageGetDimensionSize(0)
Number ScanY = STEM4D.ImageGetDimensionSize(1)
Number Dx = STEM4D.ImageGetDimensionSize(2)
Number Dy = STEM4D.ImageGetDimensionSize(3)
if ( Dx != MASK2D.ImageGetDimensionSize(0) )
Throw ("X dimension of mask does not match input data." )
if ( Dy != MASK2D.ImageGetDimensionSize(1) )
Throw ("Y dimension of mask does not match input data." )
// Do the maths!
image workCopy4D
if ( copyFirst )
{
object timer = Alloc(CTimer).Start("Data copy")
workCopy4D = STEM4D
}
else
workCopy4D := STEM4D
{
object timer = Alloc(CTimer).Start("Inplace multiply")
workCopy4D *= MASK2D[idimindex(2),idimindex(3)]
}
// Now we want to "sum up" over Dx and Dy
image p1,p2
{
object timer = Alloc(CTimer).Start("Project 1/2")
p1 := project( workCopy4D, 3 )
}
{
object timer = Alloc(CTimer).Start("Project 2/2")
p2 := project( p1, 2 )
}
return p2
}
image stack4D, mask2D
If ( GetTwoLabeledImagesWithPrompt("Please select 4D data and 2D mask", "Select input", "4D data", stack4D, "2D mask", mask2D ) )
{
number doCopy = TwoButtonDialog("Create workcopy?","Yes (takes time)","No (overwrites input data!)")
object timer = Alloc(CTimer).Start("InPlace multiplication with projections (total)")
MaskMultipliedSum(stack4D,mask2D,doCopy).ShowImage()
}