I am just researching building a multilingual website. I've looked into database structure but I am now looking at how the URLs would work.
My main concern is that I will have an English and a Chinese version of the website. I want to use search-engine-friendly URLs. How would this be possible with Chinese characters?
For the English site I may use something like:
www.domain.com/en/products/[productname]/
With the product name coming from the English translation in the database.
What would I do for the Chinese website?
www.domain.com/cn/products/[productname]/
Would I just be able to put the Chinese translation from the database straight into the URL?
From the Java URLEncoder documentation:
When encoding a String, the following rules apply:
The alphanumeric characters "a" through "z", "A" through "Z" and "0"
through "9" remain the same. The special characters ".", "-", "*", and
"_" remain the same. The space character " " is converted into a plus
sign "+". All other characters are unsafe and are first converted into
one or more bytes using some encoding scheme. Then each byte is
represented by the 3-character string "%xy", where xy is the two-digit
hexadecimal representation of the byte. The recommended encoding
scheme to use is UTF-8. However, for compatibility reasons, if an
encoding is not specified, then the default encoding of the platform
is used.
So encode the product name as specified, using UTF-8. I would expect the search engines to decode it correctly.
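As a rough illustration of the rules quoted above, here is a minimal sketch in Python rather than Java. It uses the standard library's urllib.parse.quote, which percent-encodes the UTF-8 bytes of each unsafe character. The helper name product_url and the sample product name are made up for the example. Note that quote() turns a space into %20, whereas Java's URLEncoder turns it into "+".

```python
from urllib.parse import quote, unquote


def product_url(lang: str, product_name: str) -> str:
    """Build a product URL; product_url is a hypothetical helper for this example."""
    # quote() encodes the text as UTF-8 by default and writes each unsafe
    # byte as %XY. safe="" means "/" inside the name is escaped as well.
    slug = quote(product_name, safe="")
    return f"www.domain.com/{lang}/products/{slug}/"


# Hypothetical Chinese product name ("teapot").
url = product_url("cn", "茶壶")
print(url)

# The encoding is reversible, so the original text can be recovered.
print(unquote("%E8%8C%B6%E5%A3%B6"))
```

Browsers generally display such URLs with the original characters, while the percent-encoded form is what goes over the wire.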
Does anyone know how to derive test cases by using equivalence partitioning on email address field validation?
Test cases
1) Email Length
The format of an email address is local-part@domain, where the local part may be up to 64 characters long and the domain name may have a maximum of 255 characters. However, the 256-character maximum length of a forward or reverse path restricts the entire email address to no more than 254 characters.
So, divide test cases in two scenarios:
i) email id between 0 to 254 characters
ii) email id greater than 254 characters
2) Characters and Numbers
Email accepts uppercase and lowercase English letters (a-z, A-Z) and digits 0 to 9.
So, check email addresses containing lowercase and uppercase letters and numbers. Check whether the login ID accepts a user name starting with a capital letter, a number, or a special character.
e.g. niceandsimple@example.com, niceand122simple123@example.com
3) Special Characters
The characters !#$%&'*+-/=?^_`{|}~ are accepted. So, write two scenarios:
i) email id with the characters !#$%&'*+-/=?^_`{|}~ should be accepted
ii) email id containing special characters other than !#$%&'*+-/=?^_`{|}~ should not be accepted
e.g.
---> !#$%&'*+-/=?^_`{}|~@example.org
---> " "@example.org
4) Special Characters with restrictions
Special characters are allowed with restrictions. They are:
Space and "(),:;<>@[\]
The restrictions for special characters are that they must only be used when contained between quotation marks, and that 2 of them (the backslash \ and quotation mark " (ASCII: 92, 34)) must also be preceded by a backslash \ (e.g. "\\"").
Two scenarios:
i) characters "(),:;<>@[\] within double quotes
ii) characters "(),:;<>@[\] without double quotes
e.g.
----> "()<>[]:,;@\\"!#$%&'*+-/=?^_`{}| ~.a"@example.org
5) Email with Dots (.)
i) email id with single dots between characters should be accepted
a.little.lengthy.but.fine@dept.example.com
ii) email id with multiple consecutive dots should not be accepted
a.little.....fine@dept.example.com
iii) Leading dot in address is not allowed
.abc123@gmail.com
iv) Trailing dot in the local part is not allowed
abc123.@gmail.com
v) Multiple consecutive dots in the domain portion are invalid
abc123@gmail..com
6) domain name
i) same domain name ----> check that the local part can be the same as the domain name, i.e. gmail@gmail.com
ii) Domain is valid IP address
iii) Square bracket around IP address is considered valid
iv) Dash in domain name is valid
v) Missing @ sign and domain
vi) Garbage ( ##%^%#$##$##.com )
vii) Two @ signs
viii) Leading dash in front of domain is invalid
ix) .web is not a valid top level domain
x) Invalid IP format
7) Text in email
i) Text following the email address is not allowed
email@domain.com (Joe Smith)
ii) Text before the email address is allowed
(Joe Smith)email@domain.com
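Two of the partitions above (overall length and the dot rules) can be turned into small checks. The following Python is only a sketch, not a full validator: the function names are invented for the example, the separator is assumed to be the usual @, and quoted local parts are ignored.

```python
def length_class(address: str) -> str:
    """Case 1: classify an address by total length (0 to 254 is the valid class)."""
    return "valid" if len(address) <= 254 else "invalid"


def dots_ok(address: str) -> bool:
    """Case 5: no leading, trailing, or consecutive dots in either part."""
    # Split on the last "@"; an address without one fails outright.
    local, sep, domain = address.rpartition("@")
    if not sep:
        return False
    for part in (local, domain):
        if part.startswith(".") or part.endswith(".") or ".." in part:
            return False
    return True


# One representative per equivalence class, taken from the examples above.
for candidate in [
    "a.little.lengthy.but.fine@dept.example.com",  # valid
    "a.little.....fine@dept.example.com",          # consecutive dots
    ".abc123@gmail.com",                           # leading dot
    "abc123.@gmail.com",                           # trailing dot in local part
    "abc123@gmail..com",                           # consecutive dots in domain
]:
    print(candidate, dots_ok(candidate))
```

Picking one representative per class is the point of equivalence partitioning: any other member of the same class is expected to behave the same way.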
Take each input condition described in the specification and derive at least two equivalence classes for it. One class represents the set of cases which satisfy the condition (the valid class) and one represents cases which do not (the invalid class), example as below:
Number of email field: valid range 1 to 20 (0 < n < 21)
•Class 1: any value less than 1 (invalid input)
•Class 2: 1-20 (valid input)
•Class 3: any value more than 20 (invalid input)
•Select at least one value from each class as test data for the "Number of email" field
Values used to test validation and verification of the "Number of email" field: -5, 5, 25
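The three classes can be written down directly. This is a small Python sketch with an invented function name, showing that each of the chosen values lands in a different class:

```python
def classify_count(n: int) -> str:
    """Map a "Number of email" value to its equivalence class."""
    if n < 1:
        return "class 1 (invalid, below range)"
    if n > 20:
        return "class 3 (invalid, above range)"
    return "class 2 (valid)"


# One test value per class, as chosen above.
for value in (-5, 5, 25):
    print(value, "->", classify_count(value))
```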
I would like a regular expression to find the %s in the source string that don't form the start of a valid two-hex-digit escaped character (defined as a % followed by exactly two hexadecimal digits, upper or lower case) that can be used to replace only these % symbols with %25.
(The motivation is to make the best guess attempt to create legally escaped strings from strings of various origins that may be legally percent escaped and may not, and may even be a mixture of the two, without damaging the data intent if the original string was already correctly encoded, e.g. by blanket re-encoding).
Here's an example input string.
He%20has%20a%2050%%20chance%20of%20living%2C%20but%20there%27s%20only%20a%2025%%20chance%20of%20that.
This doesn't conform to any encoding standard because it is a mix of valid escaped characters eg. %20 and two loose percentage symbols. I'd like to convert those %s to %25s.
My progress so far is to identify a regex %[0-9a-z]{2} that finds the % symbols that are legal but I can't work out how to modify it to find the ones that aren't legal.
%(?![0-9a-fA-F]{2})
Should do the trick. Use a look-ahead to find a % NOT followed by a valid two-digit hexadecimal value then replace the found % symbol with your %25 replacement.
(Hopefully this works with (presumably) NSRegularExpression, or whatever you're using)
%(?![a-fA-F0-9]{2})
That's a percent followed by a negative lookahead for two hex digits.
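For illustration, here is the same lookahead applied with Python's re module (the question may well be using a different regex engine, so treat this as a sketch; the function name is made up). Python supports the same negative lookahead syntax.

```python
import re

# A "%" that is NOT followed by two hexadecimal digits.
LOOSE_PERCENT = re.compile(r"%(?![0-9a-fA-F]{2})")


def escape_loose_percents(text: str) -> str:
    """Replace only the stray % symbols with %25; valid escapes are untouched."""
    return LOOSE_PERCENT.sub("%25", text)


sample = (
    "He%20has%20a%2050%%20chance%20of%20living%2C%20but%20"
    "there%27s%20only%20a%2025%%20chance%20of%20that."
)
print(escape_loose_percents(sample))
```

Running the function twice gives the same result as running it once, since every % it leaves behind starts a valid escape. It is still a best guess: a literal % that happens to be followed by two hex digits (as in "100%abc") is indistinguishable from a real escape and will be left alone.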
When I search an LDAP server using the following filter
(cn=%*)
it returns all results under the base DN. Does LDAP treat '%' specially? I haven't found any description of this.
What is your directory server?
Are you sure that '%' is not being replaced by your command-line interpreter or your compiler?
According to RFC 2254, % is not a special character:
If a value should contain any of the following characters
Character ASCII value
---------------------------
* 0x2a
( 0x28
) 0x29
\ 0x5c
NUL 0x00
the character must be encoded as the backslash '\' character (ASCII
0x5c) followed by the two hexadecimal digits representing the ASCII
value of the encoded character. The case of the two hexadecimal
digits is not significant.
This simple escaping mechanism eliminates filter-parsing ambiguities
and allows any filter that can be represented in LDAP to be
represented as a NUL-terminated string. Other characters besides the
ones listed above may be escaped using this mechanism, for example,
non-printing characters.
For example, the filter checking whether the "cn" attribute contained
a value with the character "*" anywhere in it would be represented as
"(cn=*\2a*)".
Note that although both the substring and present productions in the
grammar above can produce the "attr=*" construct, this construct is
used only to denote a presence filter.
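The escaping rule quoted above is easy to apply by hand. Below is a minimal Python sketch (the function name is invented, and real LDAP libraries usually ship their own helper for this) that escapes only the five characters listed in the RFC and shows that % passes through unchanged:

```python
# The five characters RFC 2254 requires to be escaped, with their
# backslash-plus-two-hex-digit replacements.
_ESCAPES = {
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\\": r"\5c",
    "\x00": r"\00",
}


def escape_filter_value(value: str) -> str:
    """Escape a literal value for use inside an LDAP search filter."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


# "%" is not special, so the filter from the question is sent as written.
print("(cn=" + escape_filter_value("%") + "*)")

# A literal asterisk anywhere in cn, as in the RFC's own example.
print("(cn=*" + escape_filter_value("*") + "*)")
```

If (cn=%*) really returns everything, the cause is more likely to be something between you and the server (a shell, a client library, or server-specific behaviour) than the filter syntax itself.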
The documentation below is for a module which has now been "decommissioned", and I'm writing its replacement.
Before I write the replacement I want to get my terms right.
I know the terms are wrong in the documentation. It was hacked together quickly so I could instruct a colleague working on the hardware side of this project on how to use a program I'd made.
The full documentation can be found here for any who are interested (in so much as has been written and added to our wiki). The website may only be available to certain IPs (it depends on your ISP; university internet connections are most likely to work), and the SVN repo is private.
So there are a lot of terms that are wrong, such as:
deliminators
formatted string containing value expressions (might not be wrong, but it is hard to say)
What are the correct terms for these?
And what other mistakes have I made?
==== formatted string containing value expressions ====
Before I start on actual programs an explanation of:
"formatted string containing value expressions" and how to encode values in them.
The ''formatted string containing value expressions'' is at the core of doing low level transmission.
We know the decimal 65, hex 41, binary 0100 0001, and the ascii character 'A' all have the same binary representation, so to tell which we are using we have a series of deliminators - numbers preceded by:
# are decimal
$ are Hex
@ are binary
No deliminator, then ascii.
Putting a sign indicator after the deliminator is optional. It is required if you want to send a negative number.
You may put multiple values in the same string.
eg: "a#21@1001111$-0F"
All values in a ''formatted string containing value expressions'' must be in the range -128 to 255 (inclusive) as they must fit in 8bytes (other values will cause an error). Negative numbers have the compliment of 2 representation for their binary form.
There are some problems with ascii - characters that can't be sent (in future versions this will be fixed by giving ascii a delineator and some more code to make that deliminator work, I think).
Characters that can't be sent:
* The delineator characters: $#@
* Numbers written immediately after a value that could have contained those digits:
* 0,1,2,3,4,5,6,7,8,9 for decimal
* 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f,A,B,C,D,E,F for hex
* 0,1 for binary
For a start, deliminator would probably be delimiter, although I notice your text has both delineator and deliminator in it - perhaps deliminator is a special delimiter/terminator combination :-)
However, a delimiter is usually used to separate fields and is usually present no matter what. What you have is an optional prefix which dictates the following field type. So I would probably call that a "prefix" or "type prefix" instead.
The "formatted string containing value expressions" I would just call a "value expression string" or "value string" to change it to a shorter form.
One other possible problem:
must be in the range -128 to 255 (inclusive) as they must fit in 8bytes
I think you mean 8 bits.
Try something like the following:
==== Value string encoding ====
The value string is at the core of the data used for low level
transmissions.
Within the value string the following prefixes are used:
# decimal
$ Hex
@ binary
No prefix - ASCII.
An optional sign may be included after the prefix for negative numbers.
Negative numbers are represented using two's complement.
The value string may contain multiple values:
eg: "a#21@1001111$-0F"
All elements of the value string must represent an 8-bit value and must
be in the range -128 to 255.
When using ASCII representation the following characters can't be sent:
* The prefix characters: $#@ (use a prefixed hex value instead.)
* Numbers written immediately after a value that could have
contained those digits:
* 0,1,2,3,4,5,6,7,8,9 for decimal
* 0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f,A,B,C,D,E,F for hex
* 0,1 for binary
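To make the rules concrete, here is a rough Python sketch of a parser for such a value string. It is not the original module. The function name is invented, and it assumes the three prefixes are # (decimal), $ (hex) and @ (binary); the binary prefix character in particular is an assumption.

```python
import re

# One token is a prefixed number (with optional sign) or a single plain character.
TOKEN = re.compile(
    r"#([+-]?[0-9]+)"          # "#" prefix: decimal
    r"|\$([+-]?[0-9a-fA-F]+)"  # "$" prefix: hex
    r"|@([+-]?[01]+)"          # "@" prefix: binary (assumed prefix character)
    r"|(.)",                   # no prefix: one ASCII character
    re.DOTALL,
)


def parse_value_string(text: str) -> bytes:
    """Convert a value string into the bytes it describes."""
    out = bytearray()
    for match in TOKEN.finditer(text):
        dec, hexa, binary, char = match.groups()
        if dec is not None:
            value = int(dec, 10)
        elif hexa is not None:
            value = int(hexa, 16)
        elif binary is not None:
            value = int(binary, 2)
        elif char in "#$@":
            # A bare prefix cannot be sent as ASCII; use a prefixed hex value.
            raise ValueError(f"prefix {char!r} without digits")
        else:
            value = ord(char)
        if not -128 <= value <= 255:
            raise ValueError(f"{match.group(0)!r} does not fit in 8 bits")
        # Masking with 0xFF gives the two's complement byte for negatives.
        out.append(value & 0xFF)
    return bytes(out)


print(parse_value_string("a#21@1001111$-0F").hex(" "))
```

Because digits are consumed greedily, "#21" followed by a plain "5" is read as the decimal value 215, which is exactly the limitation described in the list above.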