Pytesseract OCR with different colors

I am trying to read this type of image with pytesseract, but I have an issue with the parts in yellow: the color transformation that works for the other characters won't work for those in yellow boxes. I also want to keep the numbers of each row well separated.
Any idea how I could manage that?
Thanks
import cv2
import pytesseract

image = cv2.imread("scoreboard.png")  # hypothetical file name
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# invert = 255 - thresh
# OCR
data = pytesseract.image_to_string(thresh, config="--psm 6")
print(data)
cv2.imshow("thresh", thresh)
# cv2.imshow("invert", invert)
cv2.waitKey()
Returns: '> SKAPOVALOY 4 (15\nRINDERKNECH 6 [EY 15\n'
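One possible direction (just a sketch, not a verified fix): isolate the yellow boxes with an HSV range mask and flip the threshold polarity only inside them, so the digits in the yellow boxes end up with the same black-on-white polarity as the rest before OCR. The HSV bounds and file name below are illustrative and would need tuning:
import cv2
import numpy as np
import pytesseract

image = cv2.imread("scoreboard.png")  # hypothetical file name
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
# rough yellow range; adjust to the actual shade in the image
yellow = cv2.inRange(hsv, np.array([20, 100, 100]), np.array([35, 255, 255]))
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# flip polarity only inside the yellow boxes
thresh[yellow > 0] = cv2.bitwise_not(thresh)[yellow > 0]
print(pytesseract.image_to_string(thresh, config="--psm 6"))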

Related

Jumbled text on pymupdf textbox creation

I've created a textbox redaction with PyMuPDF that seems to work perfectly.
But when viewing the result on macOS, the numbers appear incorrect and jumbled. Anyone have an idea what could change a PDF's rendering for an identical file across operating systems?
import copy
import fitz

def apply_overlay(
    page, new_area, variable, fontsize, color, align, font, is_column=False
):
    col = fitz.utils.getColor("white")
    if is_column:  # assumed guard; the pasted snippet lost this line before "else"
        variable_area = copy.deepcopy(new_area)
        variable_area.y1 = new_area.y0 + fontsize + 3
        redaction = page.addRedactAnnot(
            variable_area, fill=col, text=" "
        )  # flags not available
    else:
        redaction = page.addRedactAnnot(
            new_area, fill=col, text=" "
        )
    page.apply_redactions(images=fitz.PDF_REDACT_IMAGE_NONE)
    writer = fitz.TextWriter(page.rect, color=color)
    writer.fill_textbox(
        new_area, variable, fontsize=fontsize, warn=True, align=align, font=font
    )
    writer.write_text(page)
    # To show what happened, draw the rectangles, etc.
    shape = page.newShape()
    shape.drawRect(new_area)  # the rect within which we had to stay
    shape.finish(stroke_opacity=0)  # keep the outline invisible
    shape.commit()
    shape = page.newShape()
    shape.drawRect(writer.text_rect)  # the generated TextWriter rectangle
    shape.drawCircle(writer.last_point, 2)  # coordinates of end of text
    shape.finish(stroke_opacity=0)  # keep the outline invisible
    shape.commit()
    return shape
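For reference, a hypothetical invocation of the function above might look like this (file names, rectangle, and font are illustrative, not from the original post):
import fitz

doc = fitz.open("input.pdf")        # hypothetical input file
page = doc[0]
rect = fitz.Rect(72, 72, 300, 100)  # area to redact and refill
font = fitz.Font("helv")            # built-in Helvetica
apply_overlay(page, rect, "REPLACEMENT", 11,
              fitz.utils.getColor("black"), 0, font)  # align=0 is left-aligned
doc.save("output.pdf")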

Remove background using U2NET-produced mask

I am trying to remove the background from an image. For this purpose I am using U2NET. I am writing the network structure in TensorFlow by following this repository. I have changed the model architecture according to my needs. It takes a 96x96 image and produces 7 masks. I take the 1st mask (out of 7) and multiply it against all channels of the original 96x96 image.
The code that predicts 7 masks is:
import os
import numpy as np
from copy import deepcopy
from PIL import Image

img = Image.open(os.path.join('DUTS-TE', 'DUTS-TE-Image', test_x_names[90]))
copied = deepcopy(img)
copied = copied.resize((96, 96))
copied = np.expand_dims(copied, axis=0)
preds = model.predict(copied)
preds = np.squeeze(preds)
"preds[0]" is:
predicted mask
Multiplying the mask against the original image produces: [image: masked image]. The corresponding code is ("img2" is the original image):
import matplotlib.pyplot as plt

img2 = np.asarray(img2)
immg = np.zeros((96, 96, 3), np.uint8)
for i in range(0, 3):
    immg[:, :, i] = img2[:, :, i] * preds[0]
plt.imshow(immg)
plt.show()
If I binarize the mask and then multiply it against the original image, it produces: [image: result with binarized mask]. The corresponding code is:
from sklearn.preprocessing import binarize  # assumed source of binarize

frame = binarize(preds[0, :, :], threshold=0.5)
img2 = np.asarray(img2)
immg = np.zeros((96, 96, 3), np.uint8)
for i in range(0, 3):
    immg[:, :, i] = img2[:, :, i] * frame
plt.imshow(immg)
plt.show()
Multiplying the original image with the mask or the binarized mask does not segment the foreground properly from the background. So, what can be done? Am I missing something?
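Not a definitive diagnosis, but one thing worth checking: the reference U2-Net implementation min-max normalizes each predicted map before using it, so the raw network output may not span [0, 1]. A minimal sketch of that post-processing, reusing preds and img2 from above:
import numpy as np

mask = preds[0]
# rescale the prediction to span the full [0, 1] range, as the reference
# U2-Net code does before any thresholding
mask = (mask - mask.min()) / (mask.max() - mask.min() + 1e-8)
binary = (mask > 0.5).astype(np.uint8)
# broadcast the single-channel mask across all three color channels
segmented = np.asarray(img2)[:, :, :3] * binary[:, :, None]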

Combining multiple values from database into one image

I'm trying to take 5 consecutive pixels from each image of a database and position them consecutively to create a new image of 250x250 px. All images in the database are 250x250 px.
The NumPy array I'm getting has only 250 items in it, although the database has about 13,000 photos in it. Can someone help me spot the problem?
Current output: len(new_img_pxl) is 250.
[image: illustration]
Edit:
from imutils import paths
import cv2
import numpy as np

# access database
database_path = list(paths.list_images('database'))
# greyscale database
img_gray = []
x = -5
y = 0
r = 0
new_img_pxl = []
# open as grayscale, resize
for img_path in database_path:
    img = cv2.imread(img_path, cv2.IMREAD_GRAYSCALE)
    img_resize = cv2.resize(img, (250, 250))
    img_gray.append(img_resize)
# take five consecutive pixels from each image
for item in img_gray:
    x += 5
    y += 5
    five_pix = item[[r][x:y]]
    for pix in five_pix:
        new_img_pxl.append(pix)
    if y == 250:
        r += 1
        x = -5
        y = 0
# convert to array
new_img_pxl_array = np.array(new_img_pxl)
reshape_new_img = new_img_pxl_array.reshape(25, 10)
# Convert the pixels into an array using numpy
array = np.array(reshape_new_img, dtype=np.uint8)
new_img_output = cv2.imwrite('new_output_save/001.png', reshape_new_img)
Your bug is in the second loop:
for item in img_gray:
For every image (i) in the list img_gray you do:
for a in item:
i.e., for each row (j) of the image (i), you extract 5 pixels and append them to new_img_pxl.
The first bug is that you don't take just 5 pixels from each image; you take 5 pixels from each row of each image.
Your second bug is that after extracting 250 pixels, the values of the variables x and y are higher than 250 (the length of a row). As a result, when you try to access the pixels [250:255] and so on, you get empty slices.
If I understand your intentions, then the way you should have implemented this is as follows:
r = 0
# As Mark Setchell suggested, you might want to change iterating
# over a list of images to iterating over the list of paths
# for img_path in database_path:
for item in img_gray:
    # As Mark Setchell suggested, you might want to load and
    # process your image here, overwriting the previous image and
    # letting its memory be released
    x += 5
    y += 5
    # when you finish a row, jump to the next
    if x == 250:
        x = 0
        y = 5
        r += 1
    # not sure what you want to do when you get to the end of the image.
    # roll back to the start?
    if r == 249 and x == 250:
        r = 0
        x = 0
        y = 5
    five_pix = item[r, x:y]
    for pix in five_pix:
        new_img_pxl.append(pix)
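For what it's worth, a more compact sketch of the same idea: take one run of 5 pixels per image and lay the runs out row-major into a 250x250 mosaic, which consumes exactly 250 * 250 / 5 = 12500 images (assuming img_gray holds at least that many):
import numpy as np

chunks = []
for k, item in enumerate(img_gray[:12500]):
    row, col = divmod(k * 5, 250)        # destination (and source) coordinates
    chunks.append(item[row, col:col + 5])
mosaic = np.concatenate(chunks).reshape(250, 250).astype(np.uint8)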

Tesseract and multiple line license plates: How can I get characters from a two line license plate?

I tried getting individual characters from the image and passing them through the OCR, but the result is jumbled characters. Passing the whole image at least returns the characters in order, but it seems like the OCR is trying to read all the other contours as well.
Example image: [image of the license plate being used]
The result: 6A7J7B0
Desired result: AJB6779
The code:
import re
import cv2
import pytesseract

img = cv2.imread("data/images/car6.jpg")
gray = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
# resize image to three times as large as original for better readability
gray = cv2.resize(gray, None, fx=3, fy=3, interpolation=cv2.INTER_CUBIC)
# perform Gaussian blur to smoothen image
blur = cv2.GaussianBlur(gray, (5, 5), 0)
# threshold the image using Otsu's method to preprocess for tesseract
ret, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV)
# create rectangular kernel for dilation
rect_kern = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
# apply dilation to make regions more clear
dilation = cv2.dilate(thresh, rect_kern, iterations=1)
# find contours of regions of interest within license plate
try:
    contours, hierarchy = cv2.findContours(dilation, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
except:
    ret_img, contours, hierarchy = cv2.findContours(dilation, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
# sort contours left-to-right
sorted_contours = sorted(contours, key=lambda ctr: cv2.boundingRect(ctr)[0])
# create copy of gray image
im2 = gray.copy()
# create blank string to hold license plate number
plate_num = ""
# loop through contours and find individual letters and numbers in license plate
for cnt in sorted_contours:
    x, y, w, h = cv2.boundingRect(cnt)
    height, width = im2.shape
    # if height of box is not tall enough relative to total height then skip
    if height / float(h) > 6: continue
    ratio = h / float(w)
    # if height to width ratio is less than 1.5 skip
    if ratio < 1.5: continue
    # if width is not wide enough relative to total width then skip
    if width / float(w) > 15: continue
    area = h * w
    # if area is less than 100 pixels skip
    if area < 100: continue
    # draw the rectangle
    rect = cv2.rectangle(im2, (x, y), (x + w, y + h), (0, 255, 0), 2)
    # grab character region of image
    roi = thresh[y - 5:y + h + 5, x - 5:x + w + 5]
    # perform bitwise not to flip image to black text on white background
    roi = cv2.bitwise_not(roi)
    # perform another blur on character region
    roi = cv2.medianBlur(roi, 5)
    try:
        text = pytesseract.image_to_string(roi, config='-c tessedit_char_whitelist=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ --psm 8 --oem 3')
        # clean tesseract text by removing any unwanted blank spaces
        clean_text = re.sub('[\W_]+', '', text)
        plate_num += clean_text
    except:
        text = None
if plate_num:
    print("License Plate #: ", plate_num)
For me, PSM mode 11 worked; it was able to detect both single-line and multi-line plates:
pytesseract.image_to_string(img, lang='eng', config='--oem 3 --psm 11').replace("\n", "")
PSM 11: Sparse text. Find as much text as possible in no particular order.
If you want to extract the license plate number from two rows, you can replace the following line:
sorted_contours = sorted(contours, key=lambda ctr: cv2.boundingRect(ctr)[0])
with
sorted_contours = sorted(contours, key=lambda ctr: cv2.boundingRect(ctr)[0] + cv2.boundingRect(ctr)[1] * img.shape[1])
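To see why the x + y * width key produces reading order, here is a tiny self-contained illustration with made-up bounding-box coordinates and an assumed image width of 1000:
boxes = [(900, 10), (50, 10), (40, 60), (700, 60)]  # (x, y) of four contours
width = 1000
print(sorted(boxes, key=lambda b: b[0] + b[1] * width))
# [(50, 10), (900, 10), (40, 60), (700, 60)] -> top row first, then bottom row
Because y is multiplied by the full image width, it always dominates x, so contours are grouped by row first and ordered left-to-right within each row.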

Grayscale image using opencv from numpy array failed

I have the following NumPy array that holds a black-and-white image with this shape:
print(img.shape)
(28, 112)
When I try to grayscale the image, in order to get contours with OpenCV, using the following steps
# grayscale the image
grayed = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# threshold image
thresh = cv2.threshold(grayed, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
I get the following error:
<ipython-input-178-7ebff17d1c18> in get_digits(img)
6
7 #grayscale the image
----> 8 grayed = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
9
10
error: C:\projects\opencv-python\opencv\modules\imgproc\src\color.cpp:11073: error: (-215) depth == 0 || depth == 2 || depth == 5 in function cv::cvtColor
OpenCV's error messages contain too little information to tell what is wrong.
Here is the working code for how you were trying it:
img = np.stack((img,) * 3,-1)
img = img.astype(np.uint8)
grayed = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(grayed, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
A simpler way of getting the same result is to invert the image yourself:
img = (255-img)
thresh = cv2.threshold(img, 0, 255, cv2.THRESH_OTSU)[1]
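As a side note, the depth codes in the assertion map to OpenCV element types (0 = CV_8U, 2 = CV_16U, 5 = CV_32F), so a quick dtype check usually pinpoints this kind of failure. A minimal sketch:
import numpy as np

print(img.dtype)            # anything other than uint8/uint16/float32 trips
                            # the (-215) depth assertion in cvtColor
img = img.astype(np.uint8)  # cast before handing the array to OpenCV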
As you discovered, as you perform different operations on images, the image is required to be in different formats.
cv2.THRESH_BINARY_INV and cv2.THRESH_BINARY are designed to take a color image (and convert it to grayscale), so you need a three-channel representation.
cv2.THRESH_OTSU works with grayscale images, so one channel is okay for that.
Since your image was already grayscale from the start, you weren't able to convert it from color to grayscale, nor did you really need to. I assume you were trying to invert the image, but that's easy enough to do yourself (255 - img).
At one point you tried to do a cv2.THRESH_OTSU with floating-point values, but cv2.THRESH_OTSU requires integers between 0 and 255.
If OpenCV had more user-friendly error messages, it would really help with issues like these.