Detect text language in VBA

I have a textbox in PowerPoint which I store into an array with Split.
Is there any way to detect what language the text is in VBA?
There will actually only be English or Chinese text, so I guess an alternative solution would be to detect if the text is not English, or is/isn't Unicode?

It should be possible by checking that one of the characters is Chinese:
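Function IsChinese(text As String) As Boolean
    Dim c&, i&
    For i = 1 To Len(text)
        ' AscW returns a signed 16-bit Integer, so mask it to get 0..65535;
        ' otherwise characters from U+8000 upwards come back negative.
        c = AscW(Mid$(text, i, 1)) And &HFFFF&
        ' CJK Unified Ideographs block: U+4E00 to U+9FFF
        If c >= &H4E00& And c <= &H9FFF& Then
            IsChinese = True
            Exit Function
        End If
    Next
End Function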

The shape's .TextFrame.TextRange.LanguageID will tell you what language the text is set to. US English is 1033, for example. There's a list of language IDs here (use the Decimal LCID, right-hand column in this case):
https://msdn.microsoft.com/en-us/goglobal/bb964664.aspx?f=255&MSPPError=-2147217396
It's worth looking at the hex values as well. The rightmost two digits give you the main language code (Chinese is 04, for example) and the leftmost two digits identify the specific locale (PRC, Singapore, Taiwan, etc).
If you're likely to have mixed language text in a single text box, look at the LanguageID property of each .Run of text. For example, with a shape selected, try this:
Dim x As Long
With ActiveWindow.Selection.ShapeRange(1).TextFrame.TextRange
    For x = 1 To .Runs.Count
        Debug.Print .Runs(x).LanguageID
    Next
End With
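The hex layout described above can be turned into a small check. This is only a sketch: the function names are hypothetical, and it assumes the usual Windows convention that the low 10 bits of an LCID hold the primary language (Chinese is &H04). Keep in mind that LanguageID reports the language the text is set to, which may not match what was actually typed.

```vba
' Hypothetical helper: returns the primary language part of an LCID
' (the low 10 bits, by the Windows LANGID convention).
Function PrimaryLanguageId(ByVal lcid As Long) As Long
    PrimaryLanguageId = lcid And &H3FF&
End Function

' Hypothetical helper: True for any Chinese locale (PRC, Taiwan, Singapore, ...).
Function IsChineseLanguageId(ByVal lcid As Long) As Boolean
    IsChineseLanguageId = (PrimaryLanguageId(lcid) = &H4)
End Function
```

Inside the loop over runs, it could be called as IsChineseLanguageId(.Runs(x).LanguageID).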

Related

Checking the data type (integer or string) in a word table

I am trying to do some conditional formatting in a Word table based on the value in a specific cell.
If the value is <1, set the background to green; if the value is between 1 and 10, set the background to yellow; and if the value is above 10, set the background to red.
I am able to loop through a table and Debug.Print the content of each cell, but I am struggling with checking the data type of the corresponding cell.
I tried IsNumeric, Int and Fix, but none of them work:
Sub ConditionalFormat()
Dim tbl As Table, r As Long, c As Long
Set tbl = ActiveDocument.Tables(1)
For r = 1 To tbl.Rows.Count
For c = 1 To tbl.Columns.Count
If tbl.Cell(r, c) = Int(tbl.Cell(r, c)) Then
tbl.Cell(r, c).Shading.BackgroundPatternColor = wdColorBlueGray
End If
Next c
Next r
End Sub
Where am I going wrong?
Word tables have "end of cell" characters that can get in the way when you process a cell's content.
In your case,
Int(tbl.Cell(r, c))
won't work because tbl.Cell(r, c) returns the Cell, not its value or content. To get its content, you really need
tbl.Cell(r, c).Range
But even that just specifies a block of material in the cell, so it might contain text, images etc. What you are typically looking for is the plain text of the cell, which is really
tbl.Cell(r, c).Range.Text
So you might hope that, for example, if your cell contained the text "42" the expression
IsNumeric(tbl.Cell(r, c).Range.Text)
would return True. But it doesn't, because each Word table cell has an end-of-cell character that is returned at the end of the .Range.Text, and that means VBA does not recognise the text as numeric. To deal with that, you can use
Dim rng As Word.Range
Set rng = tbl.Cell(r, c).Range
rng.End = rng.End - 1
Debug.Print IsNumeric(rng.Text)
Set rng = Nothing
Some VBA functions will ignore the end-of-cell marker anyway, as they are intended to be reasonably flexible about how to recognise a number, e.g. you should be able to use
Val(tbl.Cell(r, c).Range.Text)
without running into problems.
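If you would rather work on the string than adjust the Range, the marker can also be stripped from the text itself. The sketch below assumes the marker arrives as Chr(13) & Chr(7) at the end of .Range.Text; StripCellMarker and DemoCellText are hypothetical names, and the demo uses a hand-built string so it can be tried without a table.

```vba
' Hypothetical helper: removes a trailing end-of-cell marker
' (assumed to be Chr(13) & Chr(7)) from a cell's text.
Function StripCellMarker(ByVal cellText As String) As String
    Dim marker As String
    marker = Chr$(13) & Chr$(7)
    If Right$(cellText, Len(marker)) = marker Then
        cellText = Left$(cellText, Len(cellText) - Len(marker))
    End If
    StripCellMarker = cellText
End Function

Sub DemoCellText()
    Dim raw As String
    raw = "42" & Chr$(13) & Chr$(7)              ' simulated cell text
    Debug.Print IsNumeric(raw)                   ' marker still attached
    Debug.Print IsNumeric(StripCellMarker(raw))  ' marker removed
    Debug.Print Val(raw)                         ' Val stops reading at the marker
End Sub
```

With a real table it would be called as StripCellMarker(tbl.Cell(r, c).Range.Text).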
As for which functions to use to test/convert the value, that really depends on how much you can assume about your data, how much validation you need to do and what you need to do with your data.
In a nutshell, Val looks for "bare numbers", e.g. 123, 123.45, and numbers in scientific notation. It stops reading at the first character it cannot interpret as part of a number, and returns 0 if the text does not start with one. AFAICR Int and Fix expect a numeric value and differ only in how they truncate negative numbers (Int rounds down, Fix rounds toward zero). IsNumeric, CInt, CDbl and so on recognise numbers taking account of the Regional Settings in your OS (e.g. Windows) and accept or ignore digit grouping separators (so they might recognise 1,234,567.89 and even 1,,234,567.89 as 1234567.89 on a typical US system, and 1.234.567,89 as the "same number" on a German system). CInt etc. will raise an error if they don't recognise a number.
Anything more than that and you'll probably have to find or write a piece of code that does exactly what you need. I expect there are thousands of such routines out there.
Probably worth noting that the Range objects in Excel and Word have different members. Excel has a Range.Value property but Word does not.
Try:
Sub ConditionalFormat()
    Dim r As Long, c As Long
    With ActiveDocument.Tables(1)
        For r = 1 To .Rows.Count
            For c = 1 To .Columns.Count
                With .Cell(r, c)
                    Select Case Val(Split(.Range.Text, vbCr)(0))
                        Case Is < 1: .Shading.BackgroundPatternColor = wdColorGreen
                        Case Is > 10: .Shading.BackgroundPatternColor = wdColorRed
                        Case Else: .Shading.BackgroundPatternColor = wdColorYellow
                    End Select
                End With
            Next c
        Next r
    End With
End Sub

How to check if a cell value is Chinese text in Excel

I have a list of cells which contain the first name of users: Amy, Jim, 梅, 明, ธนกาญจน์, Андрей, etc. From the name, I would like to determine if a user is Chinese.
Does anyone know if there is any formula or VBA method to determine this?
Since the question is about checking for Chinese text, I would prefer using the AscW function, like below:
Function DetectChineseText(cel As Range) As Boolean
    Dim i As Long
    Dim code As Long
    ' Assume Chinese until a character outside the range is found.
    ' Note that an empty cell therefore also returns True.
    DetectChineseText = True
    For i = 1 To Len(cel.Value)
        ' Mask AscW's signed Integer result into 0..65535
        code = AscW(Mid(cel.Value, i, 1)) And &HFFFF&
        If Not (code >= 19968 And code <= 25343) Then
            DetectChineseText = False
            Exit For
        End If
    Next i
End Function
However, as per this link, there are 89,602 code points assigned to Chinese characters in the Unicode character set, so every range that needs to be detected has to be listed explicitly. For now I have used only one range (19968 to 25343), as per a Wikipedia article.
You can add further ranges as required.
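As a rough sketch of how more ranges might be added, the version below checks two blocks: CJK Unified Ideographs (U+4E00 to U+9FFF) and Extension A (U+3400 to U+4DBF). The function names are hypothetical, and treating empty text as "not Chinese" is an assumption. Characters outside the Basic Multilingual Plane (Extension B and later) are stored as surrogate pairs and are not handled here.

```vba
' Hypothetical helper: True if a UTF-16 code unit lies in one of the listed blocks.
Function IsCjkIdeograph(ByVal code As Long) As Boolean
    Select Case code
        Case &H4E00& To &H9FFF&     ' CJK Unified Ideographs
            IsCjkIdeograph = True
        Case &H3400& To &H4DBF&     ' CJK Unified Ideographs Extension A
            IsCjkIdeograph = True
    End Select
End Function

' Hypothetical helper: True only if every character is in a listed block.
Function IsAllChineseText(ByVal s As String) As Boolean
    Dim i As Long
    If Len(s) = 0 Then Exit Function    ' assumption: empty text is not Chinese
    For i = 1 To Len(s)
        ' AscW returns a signed Integer; mask it into 0..65535 first
        If Not IsCjkIdeograph(AscW(Mid$(s, i, 1)) And &HFFFF&) Then Exit Function
    Next i
    IsAllChineseText = True
End Function
```

On a worksheet this could be used as =IsAllChineseText(A1); further blocks can be added as extra Case lines.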

Extract first two digits that comes after some string in Excel

I have rows with values something like the ones below. How can I extract the first two digits that come after the text 'ABCD' into another cell? Is there a formula or VBA for this? There may be a few characters in between, or sometimes none.
ABCD 10 sadkf sdfas
ABCD-20sdf asdf
ABCD 40
ABCD50 asdf
You can do this with a worksheet formula. No need for VBA.
Assuming you do not need to test for the presence of two digits:
=MID(A1,MIN(FIND({1,2,3,4,5,6,7,8,9,0},A1&"1234567890")),2)
If you need to test for the presence of two digits, you can try:
=IF(ISNUMBER(-RIGHT(MID(A1,MIN(FIND({1,2,3,4,5,6,7,8,9,0},A1&"1234567890")),2),1)),MID(A1,MIN(FIND({1,2,3,4,5,6,7,8,9,0},A1&"1234567890")),2),"Invalid")
In general, it is always a good idea to show some code on Stack Overflow. That way, you show that you have tried something and you give some direction for the answer.
Concerning extracting the first two digits, there are many ways to do this, ranging from RegEx to simply looping over the characters and checking each one of them.
This is the loop option:
Public Function ExtractTwoDigits(inputString As String) As Long
    Application.Volatile
    Dim cnt As Long
    Dim curChar As String
    Dim digits As String    ' digits collected so far (kept as text)
    For cnt = 1 To Len(inputString)
        curChar = Mid(inputString, cnt, 1)
        If IsNumeric(curChar) Then
            digits = digits & curChar
            If Len(digits) = 2 Then
                ExtractTwoDigits = CLng(digits)
                Exit Function
            End If
        End If
    Next cnt
    ExtractTwoDigits = -1   ' fewer than two digits found
End Function
Application.Volatile makes sure that the formula recalculates every time the sheet recalculates;
-1 is returned if inputString does not contain two digits;
IsNumeric checks whether the current character is numeric.
As a further step, you could make the function more flexible by extracting the first 1, 3, 4 or 5 digits, depending on a parameter that you pass. Something like =ExtractTwoDigits("tarato123ra2",4), returning 1232.
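One way that parameterised version might look is sketched below. ExtractDigits and digitCount are hypothetical names; the 9-digit cap is an assumption made so the result always fits in a Long, and Application.Volatile is left out to keep the sketch independent of Excel.

```vba
' Hypothetical generalisation: returns the first digitCount digits found
' anywhere in inputString (they need not be adjacent), or -1 if there are
' too few digits or digitCount is out of range.
Public Function ExtractDigits(ByVal inputString As String, _
                              Optional ByVal digitCount As Long = 2) As Long
    Dim cnt As Long
    Dim curChar As String
    Dim digits As String
    ' Assumption: cap at 9 digits so CLng cannot overflow
    If digitCount < 1 Or digitCount > 9 Then
        ExtractDigits = -1
        Exit Function
    End If
    For cnt = 1 To Len(inputString)
        curChar = Mid$(inputString, cnt, 1)
        If curChar Like "[0-9]" Then
            digits = digits & curChar
            If Len(digits) = digitCount Then
                ExtractDigits = CLng(digits)
                Exit Function
            End If
        End If
    Next cnt
    ExtractDigits = -1
End Function
```

Called as =ExtractDigits("tarato123ra2",4) it collects 1, 2, 3 and 2, matching the example above.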
RegEx Version:
Public Function GetFirstTwoNumbers(ByVal strInput As String) As Integer
    Dim reg As New RegExp, matches As MatchCollection
    With reg
        .Global = True
        .Pattern = "(\d{2})"
    End With
    Set matches = reg.Execute(strInput)
    If matches.Count > 0 Then
        ' Take the first match and convert its text to a number
        GetFirstTwoNumbers = CInt(matches(0).Value)
    Else
        GetFirstTwoNumbers = -1
    End If
End Function
You have to enable Microsoft VBScript Regular Expressions 5.5 under Tools > References ("Extras" in some localized versions of the editor). The pattern (\d{2}) matches two digits; the return value is the number, or -1 if there is none.
Note: it only extracts two consecutive digits.
If you place this function in a module, you can use it like a normal formula.
Here is a great site to get into RegEx.
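If setting the reference is inconvenient, the same idea can probably be written with late binding. This is a sketch, not a drop-in replacement: FirstTwoDigitsLateBound is a hypothetical name, and it assumes the VBScript.RegExp object is available, which is only the case on Windows.

```vba
' Hypothetical late-bound variant: needs no reference, but relies on
' CreateObject("VBScript.RegExp") being available (Windows only).
Public Function FirstTwoDigitsLateBound(ByVal strInput As String) As Long
    Dim reg As Object, matches As Object
    Set reg = CreateObject("VBScript.RegExp")
    reg.Pattern = "\d{2}"               ' two consecutive digits
    Set matches = reg.Execute(strInput)
    If matches.Count > 0 Then
        FirstTwoDigitsLateBound = CLng(matches(0).Value)
    Else
        FirstTwoDigitsLateBound = -1    ' no pair of consecutive digits
    End If
End Function
```

The trade-off is that late binding gives up IntelliSense and compile-time checking in exchange for not depending on the reference being set.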

From a range, return a specific word's index

I have a range (rng) which has the word "means" somewhere in it. I'm trying to determine if a word two words before "means" is underlined but can't quite figure out how.
Here's what my rng.Text is (note the brackets indicate the underlined text)
"[Automobile] - means a car that isn't a bus but can be an SUV"
Sometimes, it is "The way you have to go about it is with the various means of thinking".
The first one is a definition, since it has "means" preceded by an underlined word. The second example is NOT a definition.
I'm trying to get my macro to look two words before "means", but can't quite figure out how.
I am able to figure out how many characters in it is with this:
Dim meansLoc&
meansLoc = InStr(rng.Text, "means")
Then, I can test If rng.Characters(meansLoc-9).Font.Underline = wdUnderlineSingle, but I run into problems if my defined word is only, say, 3 characters long ("Dad - means a father" would error out, since the index of "means" there is 7, and 7-9 = -2). This is why I'd like to use words. (I can use one or two words before "means".)
How can I return the character index of "means" in my rng? And how do I get the "word index" (i.e. 2) from my rng?
Both Characters and Words are ranges, so one approach would be to compare the Start of the Character's range with each Word in the rng, e.g. you could start with
' assumes you have already declared and populated rng
Dim bDefinition As Boolean
Dim i As Long
Dim meansLoc As Long
Dim meansStart As Long   ' Long, because document positions can exceed the Integer range
meansLoc = InStr(rng.Text, "means")
meansStart = -1          ' stays -1 (no match below) if "means" is not found
If meansLoc > 0 Then meansStart = rng.Characters(meansLoc).Start
bDefinition = False
For i = 1 To rng.Words.Count
    If rng.Words(i).Start = meansStart Then ' i is your Word index (i.e. 3, not 2!)
        If i > 2 Then
            If rng.Words(i - 2).Font.Underline = wdUnderlineSingle Then
                Debug.Print "Looks like a definition"
                bDefinition = True
                Exit For
            End If
        End If
    End If
Next
If Not bDefinition Then
    Debug.Print "Couldn't see a definition"
End If
Just bear in mind that what Word considers to be a "word" may be different from your normal understanding of what a "word" is.

VB determining values within a string

I am looking for assistance with my program. I have a user enter 6 digits; of these the input must be alpha-numeric. I have already done the TryParse method for the numbers, but I am looking for validation that the string contains an alpha.
I am aware you must use Asc, but I am unsure both how to express a range, say Asc((Chr(65) <= Chr(90))) (between A-Z), and how to say: if my input contains any of these values within the 6 characters, return True. I keep getting an overload resolution error and wish to know how to code this properly so the variables are accurate.
This is a great place to use a regular expression
Dim input = ...
If Regex.IsMatch(input, "^\w+$") AndAlso input.Length = 6 Then
' It's a match
Else
' It's not a match
End If
This will match any string of length 6 that consists only of word characters. Note that \w matches letters, digits and underscores, not just letters.
You can iterate through each char and check if it's a letter. If so, set a flag to true.
Dim containsAlpha As Boolean = False
For i As Integer = 0 To input.Length - 1
    If Char.IsLetter(input(i)) Then
        containsAlpha = True
        Exit For
    End If
Next
Char.IsLetter will match Unicode alphabetic letters, so not just Latin A-Z (which may or may not be what you actually want).