I am trying to decide on the correct way to handle internationalization in a WCF web service, specifically the numeric and date formats. The service will support multiple languages and cultures. My question is how to handle input and output serialization for the different date and number formats.
For dates, I am thinking I will simply require that they be submitted and returned as UTC. This seems reasonable to me, and a developer in, say, the Netherlands should be able to deal with this format.
My other concern is with numeric types, specifically decimals. I am using the xsd:decimal data type, which uses the "." character as the decimal separator. I am wondering if this will be a problem for, say, a Dutch developer, who uses "," rather than "." as the decimal separator.
I am not returning any translated text, so that is not a concern. I just want to make sure that returning and accepting a standard numeric and date format is the right thing to do. As I am not an international developer, I am having a hard time putting myself in their shoes.
Any help is appreciated.
If possible, I'd make the WCF service work with value types (numerics and date/time) in a culture-agnostic manner. Since the XSD standard defines decimal using a period for the decimal separator, it should be the client software's responsibility to deserialize it and apply any culture-specific formatting. The same applies to date/time serialization, except that times should be UTC, as you point out.
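To make the contrast concrete, here is a minimal sketch (in Java, since any client language faces the same issue) of keeping the wire format culture-agnostic while applying culture-specific formatting only at the display edge:

```java
import java.math.BigDecimal;
import java.text.NumberFormat;
import java.time.Instant;
import java.util.Locale;

public class CultureAgnosticWire {
    public static void main(String[] args) {
        // xsd:decimal always uses '.' on the wire, regardless of locale.
        BigDecimal amount = new BigDecimal("1234.56");
        System.out.println(amount.toPlainString());   // 1234.56

        // Locale-specific rendering belongs in the client UI only.
        NumberFormat dutch = NumberFormat.getNumberInstance(Locale.forLanguageTag("nl-NL"));
        System.out.println(dutch.format(amount));     // 1.234,56

        // Dates on the wire: ISO 8601 in UTC.
        Instant utc = Instant.parse("2011-05-01T12:00:00Z");
        System.out.println(utc);                      // 2011-05-01T12:00:00Z
    }
}
```

The same split applies on the .NET side: serialize with the invariant culture and use the user's culture only when formatting for display.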
I am creating an archive of pdf files (magazine issues) and currently in the process of determining the file naming convention I want to adopt. So far I have thought of Title_Title-YYYY-MM-DD-VVV-NNNN where Title is the magazine name (possibly two words separated by an underscore), then the ISO 8601 date standard, followed by volume number, then issue number.
My problem is the following: not all magazines have the same types of data; for example, some have volume numbers while others have only issue numbers, and, more to the point here, some are issued on specific days while others only on a month without a day.
My question: when faced by a magazine for which a field (say DD or VVV) does not apply, should I replace the field with zeros or drop the field completely?
And would adding zeros ruin the compatibility of my file names with ISO 8601 and any services working with it?
I am thinking about both human- and machine-readability. These files will be hosted on a website, and my idea is to maximize compatibility (with Google SEO, for example) as well as maintain a convention that facilitates retrieval locally.
Thank you very much,
We are building a solution to optimize crews. We have started working in OptaPlanner with time windows, but I don't know the format of time.
What is the actual format of time in OptaPlanner (service time and due start time), and how can I implement time conversion between Java and C#?
It depends on your OptaPlanner implementation of the use case. Normally, you'd use the java.time.* classes on the Java/DRL side to deal accurately with timezones etc. in your constraints.
As for bridging Java and C#, just connect through a REST service and use the standard ISO 8601 format (yyyy-MM-dd'T'HH:mm:ss) in the JSON or XML input/output.
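For example, a minimal round trip of that wire format on the Java side might look like this (the pattern string is an assumption chosen to match C#'s sortable "s" date format):

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class IsoBridge {
    // Same shape that C# produces with DateTime.ToString("s") and
    // parses with DateTime.ParseExact(s, "yyyy-MM-ddTHH:mm:ss", ...).
    static final DateTimeFormatter ISO = DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss");

    static String toWire(LocalDateTime t)   { return t.format(ISO); }
    static LocalDateTime fromWire(String s) { return LocalDateTime.parse(s, ISO); }

    public static void main(String[] args) {
        LocalDateTime t = LocalDateTime.of(2017, 3, 8, 10, 30, 0);
        String wire = toWire(t);
        System.out.println(wire);                     // 2017-03-08T10:30:00
        System.out.println(fromWire(wire).equals(t)); // true
    }
}
```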
I was replacing internal serializations in my application from Jil to Bond.
I'm switching simple classes over to MS Bond attributes, and everything worked fine until I got to one with a DateTime.
I then got a Dictionary KeyNotFoundException during serialization.
I suspect Bond does not support DateTime; is that so?
And if so, why is it not implemented? DateTime is not a basic type, but adding a custom converter is not worth it: the speed gain over protobuf-net is minimal, and I don't need generics, just a simple, fast de/serializer.
I hope I'm missing something. I really want to use Bond, but I need an easy tool too; I cannot risk breaking the application because something basic like a Date or a Guid is not supported by default.
I'm writing here after hours of research, and the Young Guide to C# Bond does not clearly state what is and is not supported.
No, there is no built-in timestamp type in Bond. The built-in types in Bond are documented in the manual for the gbc compiler.
For GUIDs, there's Bond.GUID, which has implicit conversions to/from System.Guid. Note that Bond.GUID lives in bond.bond, so if you want to refer to this from a .bond file, you'll need to use Bond's import functionality and import "bond/core/bond.bond"
There's an example showing how to use DateTime with a custom type alias.
The reason there is no built-in timestamp type in Bond is that there are so many different ways (and standards) for representing timestamps. There's a similar C++ example that shows representing time with a boost::posix_time::ptime, highlighting the various different ways that time is represented.
Our experience has been that projects usually already have a representation for timestamps that they want to use, so, we recommend using a converter so that you can use the representation that's appropriate for your circumstances.
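As a sketch of that recommendation, a converter can map timestamps onto an ordinary int64 field. This is shown in Java purely for illustration, with epoch milliseconds as an assumed wire representation; the C# type-alias example linked above follows the same shape:

```java
import java.time.Instant;

public class TimestampConverter {
    // Assumed wire representation: milliseconds since the Unix epoch,
    // stored in a plain int64 field via a type alias.
    static long toWire(Instant t)         { return t.toEpochMilli(); }
    static Instant fromWire(long millis)  { return Instant.ofEpochMilli(millis); }

    public static void main(String[] args) {
        Instant t = Instant.parse("2015-06-01T00:00:00Z");
        long wire = toWire(t);
        System.out.println(wire);           // 1433116800000
        System.out.println(fromWire(wire)); // 2015-06-01T00:00:00Z
    }
}
```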
As a side note, my experience has been that DateTimeOffset is a more generally useful type, compared to DateTime.
We're using Lucene to develop a free text search box for data delivered to a user, as in the case of an email Inbox. We'd like to allow for the box to handle dates, for instance 5/1/2011. To make things easier, we are limiting the current version of the feature to just two date formats:
mm/dd/yy
mm/dd/yyyy
For our prototype we hacked the query analysis process to pre-process the query string, looking for these two date patterns. This was about two years ago, and we were on Lucene 2.4. I'm curious to see whether there are any tools in Lucene out of the box to accept a DateFormat and return a TokenStream with any identified dates. Looking through the javadocs for Lucene 2.9, I found the class:
org.apache.lucene.analysis.sinks.DateRecognizerSinkFilter
which seems to do what I need, but it implements a SinkFilter, a concept which doesn't seem to be documented in the Lucene Wiki. Has anyone used this filter before, and if so, what is the most effective way to use it?
There is a bit of sample code (which is, admittedly, over-complicated) in the documentation for TeeSinkTokenFilter. Note that the way the DateRecognizerSinkFilter is designed, it does not store the actual date; it just detects that a token is a date that conforms to the specified format. What I would try is to re-implement the DateRecognizerSinkFilter class to take an array of DateFormat instances, create a new Attribute class called DateAttribute (or some-such) and use the date recognizer subclass to set the parsed date into the DateAttribute if one of its formats matches. That way, you can always test whether you have a valid date by interrogating the DateAttribute, and localize the date formats to one class. Another advantage is that you won't have to handle multiple sinks, thereby simplifying the code from the linked example.
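The recognition logic itself is easy to prototype independently of Lucene's sink machinery. Here is a sketch of the multi-format matching such a subclass would perform (the pattern strings are assumptions covering the question's mm/dd/yyyy and mm/dd/yy):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Optional;

public class DateTokenRecognizer {
    // The two formats from the question: mm/dd/yyyy and mm/dd/yy.
    static final List<DateTimeFormatter> FORMATS = List.of(
            DateTimeFormatter.ofPattern("M/d/yyyy"),
            DateTimeFormatter.ofPattern("M/d/yy"));

    // Return the parsed date if any format matches, else empty --
    // the value you would store in a custom DateAttribute.
    static Optional<LocalDate> recognize(String token) {
        for (DateTimeFormatter f : FORMATS) {
            try {
                return Optional.of(LocalDate.parse(token, f));
            } catch (DateTimeParseException ignored) { }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(recognize("5/1/2011")); // Optional[2011-05-01]
        System.out.println(recognize("hello"));    // Optional.empty
    }
}
```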
How do you test your app for Iñtërnâtiônàlizætiøn compliance? I tell people to store the Unicode string Iñtërnâtiônàlizætiøn into each field and then see if it is displayed correctly on output, including output as a cell's content in Excel reports, in RTF format for docs, in XML files, and so on.
What other tests should be done?
Added idea from #Paddy:
Also try a right-to-left language, e.g. שלום ירושלים ([The] Peace of Jerusalem). It should look like:
(source: kluger.com)
Note: Stack Overflow renders this correctly. If the text does not match the image, then you have a problem with your browser, OS, or possibly a proxy.
Also note: you should not have to change or "set up" your already-running app to accept either the Western European characters or the Hebrew example. You should be able to just type those characters into your app and have them come back correctly in your output. In case you don't have a Hebrew keyboard lying around, copy and paste the examples from this question into your app.
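The store-and-display test above can also be automated as a simple encode/decode round trip; a minimal sketch in Java, assuming UTF-8 as the storage encoding:

```java
import java.nio.charset.StandardCharsets;

public class RoundTripTest {
    public static void main(String[] args) {
        String western = "Iñtërnâtiônàlizætiøn";
        String hebrew = "שלום ירושלים";

        // Simulate storage: encode to UTF-8 bytes and decode back.
        for (String s : new String[] { western, hebrew }) {
            byte[] stored = s.getBytes(StandardCharsets.UTF_8);
            String restored = new String(stored, StandardCharsets.UTF_8);
            System.out.println(s.equals(restored)); // true
        }
    }
}
```

A real test would, of course, push the string through your actual storage layer (database, file, report generator) rather than an in-memory buffer.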
Pick a culture where the text reads from right to left and set your system up for that - make sure that it reads properly (easier said than done...).
Use one of the three "pseudo-locales" available since Windows Vista:
The three pseudo-locales are for testing three kinds of locales:

Base: the qps-ploc locale is used for English-like pseudo-localizations. Its strings are longer versions of English strings, using non-Latin and accented characters instead of the normal script. Additionally, simple Latin strings should sort in reverse order with this locale.

Mirrored: qps-plocm is used for right-to-left pseudo data, which is another area of interest for testing.

East Asian: qps-ploca is intended to utilize the large CJK character repertoire, which is also useful for testing.
Windows will start formatting dates, times, numbers, and currencies in a made-up pseudo-locale that looks enough like English that you can work with it, but makes it obvious when you're not respecting the locale:
[Шěđлеśđαỳ !!!], 8 ōf [Μäŕςћ !!] ōf 2006
There is more to internationalization than Unicode handling. You also need to make sure that dates show up localized to the user's time zone, if you know it (and make sure there's a way for people to tell you what their time zone is).
One handy fact for testing timezone handling is that there are two timezones (Pacific/Tongatapu and Pacific/Midway) that are actually 24 hours apart. So if timezones are being handled properly, the dates should never be the same for users in those two timezones for any timestamp. If you use any other timezones in your tests, results may vary depending on the time of day you run your test suite.
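That property is straightforward to assert in a test; for example, in Java:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

public class TimezoneGapTest {
    public static void main(String[] args) {
        // Any fixed instant works; the two zones are 24 hours apart.
        Instant now = Instant.parse("2021-06-01T00:00:00Z");
        ZonedDateTime tonga  = now.atZone(ZoneId.of("Pacific/Tongatapu")); // UTC+13
        ZonedDateTime midway = now.atZone(ZoneId.of("Pacific/Midway"));    // UTC-11

        System.out.println(tonga.toLocalDate());  // 2021-06-01
        System.out.println(midway.toLocalDate()); // 2021-05-31

        // Same instant, but the local dates always differ by one day.
        System.out.println(tonga.toLocalDate().minusDays(1).equals(midway.toLocalDate())); // true
    }
}
```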
You also need to make sure dates and times are formatted in a way that makes sense for the user's locale, or failing that, that any potential ambiguity in the rendering of dates is explained (e.g. "05/11/2009 (dd/mm/yyyy)").
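A small illustration of labelling the format next to the value (Java here; any language with explicit format patterns works the same way):

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class UnambiguousDates {
    public static void main(String[] args) {
        LocalDate d = LocalDate.of(2009, 11, 5);
        // On its own, "05/11/2009" is ambiguous: 5 Nov or 11 May?
        // Spelling out the pattern removes the ambiguity.
        String labelled = d.format(DateTimeFormatter.ofPattern("dd/MM/yyyy")) + " (dd/mm/yyyy)";
        System.out.println(labelled); // 05/11/2009 (dd/mm/yyyy)
    }
}
```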
"Iñtërnâtiônàlizætiøn" is a really bad string to test with since all the characters in it also appear in ISO-8859-1, so the string can work completely without any Unicode support at all! I've no idea why it's so commonly used when it utterly fails at its primary function!
Even Chinese or Hebrew text isn't a good choice (though right-to-left is a whole can of worms by itself) because it doesn't necessarily contain anything outside 3-byte UTF-8, which curiously was a very large hole in MySQL's default UTF-8 implementation (which is limited to 3-byte chars), until it was fixed by the addition of the utf8mb4 charset in MySQL 5.5. These days one of the more common uses of >3-byte UTF-8 is Emojis like these: [💝🐹🌇⛔]. If you don't see some pretty little coloured pictures between those brackets, congratulations, you just found a hole in your Unicode stack!
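A quick way to probe for that hole is to push a code point beyond the BMP through your stack; in Java, for example (U+1F49D needs a surrogate pair in UTF-16 and four bytes in UTF-8):

```java
import java.nio.charset.StandardCharsets;

public class AstralPlaneTest {
    public static void main(String[] args) {
        // U+1F49D HEART WITH RIBBON, built from its code point.
        String heart = new String(Character.toChars(0x1F49D));

        System.out.println(heart.length());                          // 2 (UTF-16 surrogate pair)
        System.out.println(heart.codePointCount(0, heart.length())); // 1 (a single character)
        System.out.println(heart.getBytes(StandardCharsets.UTF_8).length); // 4 (breaks 3-byte-max UTF-8)
    }
}
```

If the 4-byte sequence survives a round trip through your database and back, that particular hole is closed.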
First, learn The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets.
Make sure your application can handle Turkish. It has several quirks that break applications that assume English rules. Because there are four kinds of letter "i" (dotted and dot-less, upper and lower case), applications that assume uppercase(i) => I will break when using Turkish rules, where uppercase(i) => İ.
A common thing to do is to check whether the user typed the command "exit" by using lowercase(userInput) == "exit" or uppercase(userInput) == "EXIT". This works as expected under English rules but will fail under Turkish rules, where "exıt" != "exit" and "EXİT" != "EXIT". To do this correctly, one must use the culture-invariant (ordinal) case-insensitive comparison routines that are built into all modern languages.
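The trap is easy to reproduce; in Java, for instance:

```java
import java.util.Locale;

public class TurkishITest {
    public static void main(String[] args) {
        Locale tr = Locale.forLanguageTag("tr-TR");

        // Under Turkish rules, dotted lower 'i' uppercases to dotted 'İ' (U+0130),
        // and upper 'I' lowercases to dotless 'ı' (U+0131).
        System.out.println("exit".toUpperCase(tr)); // EXİT
        System.out.println("EXIT".toLowerCase(tr)); // exıt

        // So a naive locale-sensitive round trip fails...
        System.out.println("exit".toUpperCase(tr).equals("EXIT")); // false

        // ...while a locale-independent comparison matches as intended.
        System.out.println("exit".equalsIgnoreCase("EXIT"));       // true
    }
}
```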
I was thinking about this question from a completely different angle. I can't recall exactly what we did, but on a previous project I think we wound up changing the Regional Settings (in the Regional and Language Options control panel?) to help us ensure the localized strings were working.