Excel file contains invalid hidden characters that can't be removed - vba

I have a peculiar problem with hidden characters in an Excel spreadsheet that uses VBA to create a text file. I've attached a link to a test version of the file, and I'll explain the issue as best I can.
The workbook creates a plain .txt file that we use to feed data into another system. It normally works well, but we've been supplied approximately 15,000 rows of data, and at random points throughout there are hidden characters.
The test file contains one row, and cell B11 has hidden characters at the beginning and end of its value. If you put your cursor at the end of the value and press Backspace, it looks as if nothing has happened, but you've actually deleted one of the hidden characters.
Excel treats those hidden characters as question marks, but they aren't: if they were, the TextStream would write them out without complaint. Instead it throws an "Invalid procedure call" error.
I've tried Excel's CLEAN formula, its VBA equivalent, and Replace, but nothing seems to recognise those characters. Excel is convinced they're just question marks; even an ASCII character lookup (Asc) gives me the same answer (63). Yet Replace doesn't replace them as question marks; it just skips them!
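As a side note on why Asc reports 63: as far as I know, VBA's Asc first converts the character to the system's ANSI code page, and a character with no mapping there becomes "?" (63), while AscW returns the real UTF-16 code (as a signed Integer, so values above 32767 come out negative). The same unmappable-character problem is probably what makes an ASCII-mode TextStream raise its error; FileSystemObject's CreateTextFile takes an optional third argument (unicode) that writes UTF-16 instead. Below is a minimal Python illustration of the Asc/AscW difference. It assumes cp1252 as the ANSI code page and uses U+200B (zero-width space) only as a hypothetical stand-in for the hidden character.

```python
def ansi_code(ch, codepage="cp1252"):
    """Mimic an ANSI lookup like Asc: unmappable characters become '?' (63)."""
    return ch.encode(codepage, errors="replace")[0]


def unicode_code(ch):
    """The real code point, which is what AscW reports (for values up to 32767)."""
    return ord(ch)


hidden = "\u200b"  # hypothetical example: zero-width space

print(ansi_code(hidden))     # 63
print(ansi_code("?"))        # 63, indistinguishable from a real question mark
print(unicode_code(hidden))  # 8203
```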
Any help with this would be appreciated, even if it's just a formula I could apply. In the interests of data protection, the data in the file is fake, by the way; it's nobody's real NI number.
The Excel file with the VBA code is here

This VBA macro can be run on its own or together with the ClearFormatting macro. It did strip the rogue Unicode characters out of the sample.
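Sub strip_Rogue_Unichars()
    Dim uc As Long
    ' Code points 8000-8390 (U+1F40 to U+20C6) include the zero-width and
    ' directional formatting characters (U+200B-U+200F, U+202A-U+202E), but also
    ' legitimate characters such as dashes, curly quotes and the euro sign.
    With Cells(11, 1).CurrentRegion
        For uc = 8000 To 8390
            .Replace What:=ChrW(uc), Replacement:=vbNullString, LookAt:=xlPart
            DoEvents
        Next uc
    End With
End Sub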
There's probably a better way to do this; restricting the range of Unicode characters to search for and replace would obviously speed things up. Turning off Application.EnableEvents, Application.ScreenUpdating, etc. would likewise help. I believe calculation was already set to manual. I intentionally left a DoEvents in the loop because my first run tried several thousand different Unicode characters.
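One way to restrict the scope (a different approach from the macro above, not something the answer tested) is to first find out which code points are actually present and then strip only those. In VBA the lookup would rely on AscW rather than Asc; the sketch below shows the idea in Python using the standard unicodedata module to flag format (Cf) and control (Cc) characters. The sample value, including its left-to-right and right-to-left marks, is hypothetical, since the real contents of B11 aren't known here.

```python
import unicodedata

# Unicode general categories treated as "hidden" here: Cf (format characters such
# as U+200B zero-width space or U+200E left-to-right mark) and Cc (control
# characters). Cc also covers tab, CR and LF; remove "Cc" if those must survive.
HIDDEN_CATEGORIES = ("Cf", "Cc")


def find_hidden_chars(text):
    """Return (position, code point, name) for each hidden character in text."""
    return [
        (i, ord(ch), unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) in HIDDEN_CATEGORIES
    ]


def strip_hidden_chars(text):
    """Remove only hidden characters, leaving dashes, curly quotes, etc. intact."""
    return "".join(
        ch for ch in text if unicodedata.category(ch) not in HIDDEN_CATEGORIES
    )


sample = "\u200eAB123456C\u200f"  # hypothetical cell value
print(find_hidden_chars(sample))
print(strip_hidden_chars(sample))  # AB123456C
```

Unlike a fixed 8000-8390 loop, this leaves an en dash or a euro sign alone, because those belong to punctuation and symbol categories rather than Cf.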

Related

Modifying string in cell without getting rid of formatting (underlines)

I'm currently parsing out words in an Excel file using VBA, and one way I've decided to split up the words is by their existing underlines (some are already underlined, some aren't). The line that's giving me trouble is
Rng.Value2 = Replace(Rng.Value2, Left(Rng.Value2, I - 1), "")
where Rng refers to a single cell and I is the counter in a For loop. This line of code turns this
Euthyatira pudens Dogwood Thyatirid Habrosyne scripta Lettered Habrosyne Habrosyne gloriosa Glorious Habrosyne
into this
Habrosyne scripta Lettered Habrosyne Habrosyne gloriosa Glorious Habrosyne
Except instead of bolded words, they're underlined (this text editor can't do underlines). Any idea why this is happening and how to keep the underlines?
Edit: I can make it work in a long, roundabout way (stepping through the string one character at a time and comparing it to the original string to see whether that character should be re-underlined), but this takes the computer a long time. Are there any faster ways that don't involve stepping through the string one character at a time?
Thanks
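No answer to this question is included here, so what follows is only a sketch under stated assumptions. As far as I know, assigning to .Value2 rewrites the whole cell text, so Excel cannot keep the per-character formatting; Excel's Characters(Start, Length).Delete works on part of the text in place and should leave the remaining characters' formatting alone, though that is worth testing on your data. The Python below models the idea outside Excel: the cell text is a list of formatted runs (the Run class, delete_prefix function, and which words are underlined are all hypothetical), and deleting a prefix trims whole runs instead of visiting every character.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """A stretch of cell text sharing one format (hypothetical model)."""
    text: str
    underline: bool


def delete_prefix(runs, n):
    """Delete the first n characters, keeping each surviving run's formatting."""
    result = []
    remaining = n
    for run in runs:
        if remaining >= len(run.text):
            remaining -= len(run.text)  # this run is deleted entirely
            continue
        result.append(Run(run.text[remaining:], run.underline))
        remaining = 0
    return result


cell = [
    Run("Euthyatira pudens", True),
    Run(" Dogwood Thyatirid ", False),
    Run("Habrosyne scripta", True),
    Run(" Lettered Habrosyne", False),
]
trimmed = delete_prefix(cell, len("Euthyatira pudens Dogwood Thyatirid "))
print(trimmed)
```

The cost is proportional to the number of formatted runs rather than the number of characters, which is the point of avoiding a character-by-character loop.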

VBA: How to Reference Large Unicode Characters like Paperclip?

I know similar questions have been asked many times before, but everything I found was about characters up to 2 bytes long. I need:
MyString = "📎"
The PAPERCLIP character is U+1F4CE (http://www.fileformat.info/info/unicode/char/1f4ce/index.htm), but
ChrW(128206) 'throws an error
How can I reference Unicode characters longer than 2 bytes?
This is a job that your text editor ought to take care of. My memory of the VBA editor is hazy; I don't recall any way to force the text encoding of the source code file, and a quick try with the VBA editor in Excel 2013 looks very unpromising: it turns the UTF-16 surrogate pair into two question marks.
Switching to another editor could work; Notepad, for example, works fine with the Encoding setting in the Save As dialog forced to "Unicode". But that is a hardship, with high odds that the string gets mangled again when you continue editing in the VBA editor. The workaround is to specify the surrogate pair explicitly. Try:
MyString = ChrW(&HD83D) & ChrW(&HDCCE)
Google "utf16 surrogate pair calculator" if you need to do this more than once.
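Instead of an online calculator, the surrogate pair can be computed with the UTF-16 encoding rule: subtract 0x10000, add the top 10 bits to 0xD800 and the bottom 10 bits to 0xDC00. A small Python sketch (the function names are made up for illustration) that also prints the matching ChrW expression:

```python
def surrogate_pair(code_point):
    """Split a code point in U+10000..U+10FFFF into its UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not above the Basic Multilingual Plane")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)   # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits
    return high, low


def vba_chrw_expression(code_point):
    """Build the VBA expression that concatenates the two surrogates."""
    high, low = surrogate_pair(code_point)
    return f"ChrW(&H{high:X}) & ChrW(&H{low:X})"


print(vba_chrw_expression(0x1F4CE))  # ChrW(&HD83D) & ChrW(&HDCCE)
```

For the paperclip this reproduces the pair used above.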

Convert xls File to csv, but extra rows added?

So, I am trying to convert some .xls files to CSV, and everything works great except for one part. The SaveAs method in the Excel interop seems to export all of the rows, including blank ones. I can see these rows when I look at the file in Notepad (all of the rows I expect, 15 rows with two single quotes, then the rest are just blank). I then have a stored procedure that takes this CSV and imports it into the desired table. This works on spreadsheets that have been manually converted to CSV (e.g. opened, then File > Save As, etc.).
Here is the line of code I am using for SaveAs. I have tried xlCSV, xlCSVWindows, and xlCSVMSDOS as the file format, but they all do the same thing.
wb.SaveAs(aFiles(i).Replace(".xls", "B.csv"), Excel.XlFileFormat.xlCSVMSDOS, , , , False) 'saves a copy of the spreadsheet as a csv
So, is there some additional step or setting I need so that the extraneous rows don't show up in the CSV?
Note that if I open this newly created CSV, click Save As, and choose CSV, my procedure accepts it again.
When you create a CSV from a Workbook, the CSV is generated based upon your UsedRange. Since the UsedRange can be expanded simply by having formatting applied to a cell (without any contents) this is why you are getting blank rows. (You can also get blank columns due to this issue.)
When you open the generated CSV, those empty cells no longer contribute to the UsedRange, because they have neither content nor formatting (only values are saved in CSVs).
You can correct this issue by updating your used range before the save. Here's a brief sub I wrote in VBA that would do the trick. This code makes you lose all formatting, but I figured that wasn't important since you're saving to CSV anyway. I'll leave the conversion to VB.NET up to you.
Sub CorrectUsedRange()
    Dim values As Variant
    Dim usedRangeAddress As String
    ' Get the UsedRange address before deleting anything
    usedRangeAddress = ActiveSheet.UsedRange.Address
    ' Store the cell values in an array (formulas are stored as their results)
    values = ActiveSheet.UsedRange.Value
    ' Delete all cells in the sheet
    ActiveSheet.Cells.Delete
    ' Restore the values to their original locations
    ActiveSheet.Range(usedRangeAddress).Value = values
End Sub
I tested your code with VBA and Excel 2007, and it works fine.
However, I could partly replicate the problem by making an empty cell below my data cells bold. Then I would get empty single quotes in the CSV. But this also happened when I used Save As.
So, my suggestion would be to clear all non-data cells, then to save your file. This way you can at least exclude this point of error.
I'm afraid that may not be enough. It seems there's an Excel bug that makes even deleting the non-data cells insufficient to prevent them from being written out as empty cells when saving as csv.
http://answers.microsoft.com/en-us/office/forum/office_2010-excel/excel-bug-save-as-csv-saves-previously-deleted/2da9a8b4-50c2-49fd-a998-6b342694681e
Another way, without a script: press Ctrl+End. If that lands on a row after your real data, select the rows from the first row after your data down to at least the row it landed on, right-click, and choose "Clear Contents".
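If neither resetting the UsedRange nor clearing cells helps (for instance because of the bug linked above), one more option, not suggested in the answers here, is to post-process the generated CSV and drop trailing rows whose fields are all empty. A minimal Python sketch, assuming the extra rows only ever appear at the end of the file and parse as empty fields (e.g. ",,"); rows containing literal characters would need a different test:

```python
import csv
import io


def drop_trailing_blank_rows(csv_text):
    """Remove rows at the end of csv_text whose fields are all empty."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    while rows and all(field == "" for field in rows[-1]):
        rows.pop()
    out = io.StringIO()
    csv.writer(out, lineterminator="\r\n").writerows(rows)
    return out.getvalue()


print(repr(drop_trailing_blank_rows("a,b\r\n1,2\r\n,\r\n,\r\n")))
```

Blank rows in the middle of the data are kept on purpose, since they may be meaningful.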

Range(...).Formula does not translate fully

I cannot figure this one out.
We mostly use the French version of Excel (we live in a French-speaking province of Canada). Somewhere in my VBA code I set a cell's formula directly. Normally we have to write the formula in English and Excel does the translation (as far as I know, writing the formula in any language other than English in VBA results in an error). However, only HALF of this formula gets translated, which I think is causing my issues (typing the correct formula into another cell gives different, and most probably correct, results).
range("J2").Formula = "=round(IF(F2="",0,F2),2)-round(IF(G2="",0,G2),2)"
Is translated to this in the cell:
=ARRONDI(SI(F2=",0,F2),2)-round(IF(G2=",0,G2),2)
As you can see, the right part should read "ARRONDI(SI(...", but it doesn't. I have tried adding spaces, removing the minus sign altogether, etc. Nothing works; it's always half translated. Any idea?
In VBA you need to escape your quotation marks like this:
range("J2").Formula = "=round(IF(F2="""",0,F2),2)-round(IF(G2="""",0,G2),2)"
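This is because the " character marks the start and end of a string in VBA, so to include one inside a string you type it twice in a row. In the original line, each "" was therefore read as a single quote character, and Excel received =round(IF(F2=",0,F2),2)-round(IF(G2=",0,G2),2). Everything between those two quotes is a text constant, which is why the second round(IF(...)) was never translated.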
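The escaping rule can be checked mechanically. This Python sketch (hypothetical helper names) encodes text as a VBA string literal by doubling embedded quotes, and decodes a literal back, showing exactly what Excel received from the original line:

```python
def to_vba_literal(text):
    """Wrap text in double quotes, doubling any embedded quote (VBA escaping)."""
    return '"' + text.replace('"', '""') + '"'


def from_vba_literal(literal):
    """Inverse: strip the outer quotes and collapse each doubled quote to one."""
    if len(literal) < 2 or literal[0] != '"' or literal[-1] != '"':
        raise ValueError("not a quoted VBA string literal")
    return literal[1:-1].replace('""', '"')


wanted = '=round(IF(F2="",0,F2),2)-round(IF(G2="",0,G2),2)'
typed = '"=round(IF(F2="",0,F2),2)-round(IF(G2="",0,G2),2)"'  # the original literal

print(from_vba_literal(typed))  # =round(IF(F2=",0,F2),2)-round(IF(G2=",0,G2),2)
print(to_vba_literal(wanted))   # "=round(IF(F2="""",0,F2),2)-round(IF(G2="""",0,G2),2)"
```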

Removing invisible question mark from text - #​E using vba

I have to read the text from the cells of a column in Excel and search for it in another sheet.
For example, the text in Sheet1 column A is "Evoked Potential Amplitude N2 - P2.", and it has to be found in Sheet2 column C. The search fails because a question mark appears before the "E", which is not present in the Sheet2 value.
Both are representations of the same character in different applications. Maybe someone will recognize it.
In the Excel sheet I don't see any junk characters, but when handling the value in the VBA code I see a question mark before the word "Evoked".
This data was extracted from a SharePoint application, and this character (?) is not visible to the naked eye. Search and replace functions don't work in this case.
Unicode character 8203 (U+200B) is a zero-width space. I'm not sure where it's coming from; it is probably a flaw in the way the data is imported into Excel that you haven't noticed before, and it might be worth fixing.
In the meantime, you can simply use the Mid() function in Excel VBA to remove the unwanted character. For example, instead of
x = cells(1,1).value
use
x = Mid(cells(1,1).value,2)
which deletes the first character.
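Mid(x, 2) drops the first character unconditionally, so it would also cut the "E" off any value that doesn't start with the zero-width space. A safer variant is to remove U+200B wherever it appears; in VBA that would presumably be Replace(x, ChrW(8203), ""), though I haven't tested it against this SharePoint data. This Python sketch shows the difference on a hypothetical sample string:

```python
ZERO_WIDTH_SPACE = "\u200b"  # Unicode 8203


def remove_zero_width_spaces(text):
    """Remove every U+200B, wherever it occurs."""
    # Presumed VBA analogue: Replace(text, ChrW(8203), "")
    return text.replace(ZERO_WIDTH_SPACE, "")


def drop_first_char(text):
    """Equivalent of VBA Mid(text, 2): always drops the first character."""
    return text[1:]


dirty = ZERO_WIDTH_SPACE + "Evoked Potential Amplitude N2 - P2."
clean = "Evoked Potential Amplitude N2 - P2."

print(remove_zero_width_spaces(dirty) == clean)  # True
print(remove_zero_width_spaces(clean) == clean)  # True
print(drop_first_char(clean))                    # voked Potential Amplitude N2 - P2.
```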