Superscripts greater than three do not work; only one, two, and three are available - sql

I'm querying a database and trying to get superscripts. The first three work fine, but the others don't. What are the valid superscript characters for anything over three?
1: NCHAR(185)
2: NCHAR(178)
3: NCHAR(179)
4: NCHAR(8308)
5: NCHAR(8309)
6: NCHAR(8310)
7: NCHAR(8311)
8: NCHAR(8312)
9: NCHAR(8313)

Simply search the web for characters named SUPERSCRIPT FOUR, etc.
You will find pages such as this, for superscript 4:
http://www.fileformat.info/info/unicode/char/2074/index.htm
The characters you have seem to be at the correct decimal codepoints; perhaps the issue is with the fonts. You'll just have to find fonts that support these characters.
Also check that the character encoding of your database is UTF-8 and not, say, Latin-1.
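If it helps to confirm the codepoints outside the database, here is a minimal check in Python (chr(n) maps to the same codepoint that NCHAR(n) does in SQL Server); all nine print as valid superscript characters, so if you see boxes the font is the culprit:
import unicodedata
# Decimal codepoints from the question: superscript one through nine.
for n in (185, 178, 179, 8308, 8309, 8310, 8311, 8312, 8313):
    ch = chr(n)
    print(f"{n:>5}  U+{n:04X}  {ch}  {unicodedata.name(ch)}")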

Parser not recognizing a dash

My program makes calculations on physics vectors, and it allows copy/pasting from websites and then tries to parse the input into the x, y, and z components automatically. I've come across one website (http://mathinsight.org/cross_product_examples) that has (3,−3,1). While that looks normal, that minus is actually not recognized by VB. Visually, it is longer than the normal minus (− vs. -), but both return the same Unicode value of 45. This picture shows the Unicode value for every character in the Textbox (I added a minus in front of the first 3 for comparison). Also, from this website, I had to use Ctrl+C because right-clicking shows that this is not simple HTML.
One is valid (the first), but the second gives VB fits, as shown below: either it won't compile (shown by the blue line below), or a simple assignment (the second one) wreaks havoc on my form.
I have tried using
vectorString.Replace("–", "-")
and pasting in the longer dash as the target string and a normal keystroke dash as the replacement, but nothing happens. I'm guessing that's because they both have the same Unicode value.
Is there some way to convert the longer, invalid dash into the one recognized by VB? I tried using the dash symbol that Word likes to replace the minus sign with, and it comes up as Unicode 150. So apparently there are at least three different kinds of dashes. Any thoughts?
The character from Math Insight is U+2212, minus sign. The character you tried using in your Replace call is U+2013, en dash. That's why your replace didn't work.
Beyond the standard ASCII hyphen-minus (-, U+002D), there are two common dashes: the en dash (–, U+2013) and the em dash (—, U+2014). There is also a figure dash (‒, U+2012), but it is not as common.
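A defensive fix is to fold every dash-like character into the ASCII hyphen-minus before parsing. The question is about VB, but the mapping is language-independent; here is a minimal sketch in Python (the list of dashes is mine and not exhaustive):
# Fold common dash-like characters into ASCII hyphen-minus (U+002D).
DASHES = "\u2212\u2013\u2014\u2012"  # minus sign, en dash, em dash, figure dash

def normalize_dashes(s: str) -> str:
    for dash in DASHES:
        s = s.replace(dash, "-")  # replace() returns a new string
    return s

print(normalize_dashes("(3,\u22123,1)"))  # prints (3,-3,1)
Note also that .NET strings are immutable, so vectorString.Replace(...) returns a new string rather than modifying vectorString in place; the result has to be assigned back, or it will look like nothing happened.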

IE 10 not rendering Japanese correctly

I recently discovered an issue with IE10. We have a web page that displays English text beside a translation in Japanese. Some of the Japanese characters display as squares. In the view-source page all characters are rendered correctly, and the database also has the characters stored correctly. The unusual part is that when I highlight the characters with the cursor, they convert to the correct characters.
I believe IE10 has a bug.
Is anyone having a similar issue, or does anyone know of a fix? I have checked all language settings, regional settings, browser font settings, and run many other tests. Nothing corrects this issue.
This issue was related to a combining-character sequence, which some fonts and Windows applications support and others do not.
Some older fonts require a sequence of two codepoints to be rendered together as a single character. Some fonts support this combining and some do not.
In this case the characters at issue were the following:
ジ
シ and ゙
The latter two, I think, are characters that, combined, are intended to represent ジ.
The Unicode code charts on the Unicode web site define them like so:
Decimal   Character   Hex    Name
12472     ジ          30B8   KATAKANA LETTER ZI
12471     シ          30B7   KATAKANA LETTER SI
12441     っ゙          3099   COMBINING KATAKANA-HIRAGANA VOICED SOUND MARK (shown combined with small tsu (っ))
So some fonts render 12471 + 12441 as 12472. This is what I found: the actual string contains 12471 + 12441 (hex 0x30B7, 0x3099) rather than the precomposed 12472 (0x30B8).
Any time the font being used does not support this combining, a box is displayed. The challenge is that something as simple as someone creating a birthday card with a font that lacks this support could cause a PC to not display the character correctly.
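On the data side, one workaround is to normalize the decomposed sequence into its precomposed form before it reaches the browser. A minimal sketch using Python's standard unicodedata module:
import unicodedata

# The string as stored: SI (U+30B7) followed by the combining voiced sound
# mark (U+3099) -- a pair that a font may or may not know how to combine.
decomposed = "\u30b7\u3099"

# NFC normalization folds the pair into the single precomposed ZI (U+30B8),
# which any font with basic katakana coverage can display.
composed = unicodedata.normalize("NFC", decomposed)

print([hex(ord(c)) for c in decomposed])  # ['0x30b7', '0x3099']
print([hex(ord(c)) for c in composed])    # ['0x30b8']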

Approximate search with openldap

I am trying to write a search that queries our directory server running openldap.
The users are going to be searching using the first or last name of the person they're interested in.
I found a problem with accented characters (like áéíóú): first and last names are written in Spanish, so while the proper spelling is Pérez, for the sake of the search it may be typed as Perez, without the accent.
If I use '(cn=*Perez*)' I get only the non-accented results.
If I use '(cn=*Pérez*)' I get only accented results.
If I use '(cn=~Perez)' I get weird results (or at least nothing I can use: while the results contain both Perez and Pérez occurrences, I also get some results that apparently have nothing to do with the query).
In Spanish this happens quite a lot... be it laziness, be it whatever you want to call it, the fact is that for this kind of thing people tend NOT to write the accents, because it's assumed all these searches work with both options (I guess since Google allows it, everybody assumes it's supposed to work that way).
Other than updating the database and removing all accents and trimming them on the query... can you think of another solution?
You have your ~ and = swapped above; it should be (cn~=Perez). I still don't know how well that will work, since soundex has always been strange. Since many attributes, including cn, are multi-valued, you could store a second value on the attribute that has the extended characters converted to their base versions. You would at least still have the original value to go off of when you needed it. You could also get really fancy: prefix the converted value with something and use the valuesReturnFilter to filter it out of your results.
# Sample object
dn: cn=Pérez,ou=x,dc=y
cn: Pérez
cn: {stripped}Perez
sn: Pérez
# etc.
Then modify your query to use an or expression.
(|(cn=Pérez)(cn={stripped}Perez))
And you would include a valuesReturnFilter that looked like
(!(cn={stripped}*))
See RFC3876 http://www.networksorcery.com/enp/rfc/rfc3876.txt for details. The method for adding a request control varies by what platform/library you are using to access the directory.
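Generating the stripped second value can be automated when entries are provisioned. A minimal sketch in Python ({stripped} is just the marker convention suggested above, not anything LDAP defines):
import unicodedata

def strip_accents(s: str) -> str:
    # Decompose to NFD, then drop the combining marks, keeping base letters.
    return "".join(c for c in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(c))

name = "Pérez"
print(f"cn: {name}")                             # cn: Pérez
print(f"cn: {{stripped}}{strip_accents(name)}")  # cn: {stripped}Perez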
Search filters ("queries") are specified by RFC2254.
Encoding: RFC 2254 actually requires filters (indirectly defined) to be an OCTET STRING, i.e. a string of 8-bit octets: AttributeValue is an OCTET STRING, and MatchingRuleId and AttributeDescription are LDAPString, which is itself an OCTET STRING.
The standard on escaping: use "\<ASCII HEX NUMBER>", a backslash followed by the two hex digits of the octet, to replace special characters (https://www.rfc-editor.org/rfc/rfc4515#page-4; examples at https://www.rfc-editor.org/rfc/rfc4515#page-5).
Quote:
The <valueencoding> rule ensures that the entire filter string is a valid UTF-8 string and provides that the octets that represent the ASCII characters "*" (ASCII 0x2a), "(" (ASCII 0x28), ")" (ASCII 0x29), "\" (ASCII 0x5c), and NUL (ASCII 0x00) are represented as a backslash "\" (ASCII 0x5c) followed by the two hexadecimal digits representing the value of the encoded octet.
Additionally, you should probably replace all characters that semantically modify the filter (RFC 4515's grammar gives a list), and do a Regex replace of non-ASCII characters with wildcards (*) to be sure. This will also help you with characters like "é".
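Here is a minimal hand-rolled sketch of that escaping in Python (illustrative only; if the LDAP library you use ships an escaping helper, prefer it over your own):
def escape_filter_value(value: str) -> str:
    # Escape the characters RFC 4515 singles out (*, (, ), \, NUL), and
    # also escape every non-ASCII octet of the UTF-8 encoding as \XX.
    out = []
    for byte in value.encode("utf-8"):
        if byte in b'*()\\' or byte == 0 or byte > 0x7F:
            out.append(f"\\{byte:02x}")
        else:
            out.append(chr(byte))
    return "".join(out)

print(escape_filter_value("a*b"))    # a\2ab
print(escape_filter_value("Pérez"))  # P\c3\a9rez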

Adobe PDF Guide encoding tables

In the Adobe PDF 1.7 Guide, there's a table in section "D.2 Latin Character Set and Encodings".
Are the columns "MAC" and "WIN" the wrong way round? [For example, the table as it stands implies "WIN" has fraction characters whereas "MAC" does not!?]
I don't see why the columns "MAC" and "WIN" should be the wrong way round. (Your logic for coming up with such a suspicion is flawed: just because the MAC column has empty values where WIN has entries doesn't suggest swapping them would make the situation any more comfortable. In that case a different person would be complaining about the same empty entries in the WIN column...)
It's not that WIN is "complete" while MAC is "incomplete". For example WinAnsiEncoding doesn't have dotlessi or breve while MacRomanEncoding does.
No -- indeed, characters like fraction slash or threequarters are not present in MacRomanEncoding. (That does not mean Macs or MacRomanEncoding-using PDFs can't display these characters if they should occur in a PDF: these characters just need to be encoded in a font using a custom encoding or one of the encodings which support them...)
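You can see the asymmetry from Python, if you accept Windows code page 1252 as a stand-in for WinAnsiEncoding and Mac OS Roman for MacRomanEncoding (the PDF encodings differ from these code pages in a few details, so treat this as illustrative):
def encodable(ch: str, codec: str) -> bool:
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# threequarters is in the WIN column but not MAC; dotlessi and breve go
# the other way, so neither column is a superset of the other.
for ch in ("\u00be", "\u0131", "\u02d8"):  # ¾, ı, ˘
    print(f"U+{ord(ch):04X}  cp1252={encodable(ch, 'cp1252')}  "
          f"mac_roman={encodable(ch, 'mac_roman')}")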

Why is there a convention of 1-based line numbers but 0-based char numbers?

According to TkDocs:
The "1.0" here represents where to insert the text, and can be read as "line 1, character 0". This refers to the first character of the first line; for historical conventions related to how programmers normally refer to lines and characters, line numbers are 1-based, and character numbers are 0-based.
I hadn't heard of this convention before, and I can't find anything relevant on Google. Can anyone explain this to me please?
I think you're referring to Tk's text widget. The man page says:
Lines are numbered from 1 for consistency with other UNIX programs that use this numbering scheme.
Although, I'm not sure which Unix tools it's talking about.
Update:
As mentioned in the comments, it looks like a lot of Unix text manipulation tools start line numbering at 1. And Tcl/Tk having a Unix origin, it makes sense for it to be as compatible as possible with the underlying OS environment.
It really is nothing more than convention, but here is a suggestion.
Character positions are generally thought of in the same way as a Java iterator, which is a "pointer" to a position between two characters. Thus the first character is the one after index position 0. Substrings are taken between two inter-character positions, for instance.
Line positions on the other hand are generally thought of more in the way of a .NET enumerator, which is a "pointer" to the item itself, not to a position in between. Thus the first line is the line at position 1.
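Python happens to show both conventions side by side: string indices name the gaps between characters, so slices compose cleanly, while idiomatic line numbering starts at 1 to match grep, compilers, and editors. A small illustration:
text = "hello"

# 0-based character positions point between characters: position 0 is the
# gap before 'h', and s[:k] + s[k:] == s for any k.
assert text[:2] + text[2:] == text
print(text[0])  # 'h', the character just after position 0

# Line numbers name the lines themselves, starting at 1, as grep,
# compiler diagnostics, and editors do.
for lineno, line in enumerate(["first line", "second line"], start=1):
    print(lineno, line)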