I am writing a simple Java applet for my friend. Is there a character I can use as a delimiter that will practically never be typed from the keyboard, or will almost never appear when taking notes? I need to record a name, and then a note for that name. But the note may contain newlines, so I cannot just use newlines as the delimiter.
You could escape any newlines in the note if you want to use newlines as a delimiter. What would be more sensible, though, would be to just use a well-defined, lightweight file format that already exists, like CSV.
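A minimal sketch of the escaping approach in Java. It assumes a hypothetical file layout where each entry is a name line followed by a note line. Backslashes, carriage returns and newlines in the note are escaped, so a raw line break always ends a field. The class name NoteEscaper is illustrative only.

```java
/** Hypothetical helper: escapes a note so it fits on a single line of a text file. */
final class NoteEscaper {
    private NoteEscaper() {}

    /** Escapes backslash, '\n' and '\r' so the result contains no raw line breaks. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Reverses escape(). */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                char next = s.charAt(++i);
                switch (next) {
                    case 'n': out.append('\n'); break;
                    case 'r': out.append('\r'); break;
                    default: out.append(next); // an escaped backslash
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String note = "first line\nsecond line with a \\ backslash";
        String stored = escape(note);
        System.out.println(stored);
        System.out.println(unescape(stored).equals(note)); // true
    }
}
```

Escaping '\r' as well matters because BufferedReader.readLine treats a lone carriage return as a line terminator too.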
If you really want to go rogue and bash together your own format, though, the traditional choice is NUL, a.k.a. \0 or \u0000.
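A minimal Java sketch of the NUL-delimited idea, assuming a hypothetical layout of name NUL note NUL for each entry. It still relies on the user never typing a NUL character, which is the whole bet of this approach. If you save the string to a file, write it with an explicit charset such as StandardCharsets.UTF_8.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical name/note store that uses NUL as the field terminator. */
final class NulFormat {
    private static final char SEP = '\0';

    private NulFormat() {}

    /** Writes each entry as name NUL note NUL; notes may contain newlines freely. */
    static String encode(Map<String, String> notes) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : notes.entrySet()) {
            sb.append(e.getKey()).append(SEP).append(e.getValue()).append(SEP);
        }
        return sb.toString();
    }

    /** Parses the output of encode(), preserving entry order. */
    static Map<String, String> decode(String data) {
        Map<String, String> result = new LinkedHashMap<>();
        int pos = 0;
        while (pos < data.length()) {
            int nameEnd = data.indexOf(SEP, pos);
            if (nameEnd < 0) throw new IllegalArgumentException("truncated record at " + pos);
            int noteEnd = data.indexOf(SEP, nameEnd + 1);
            if (noteEnd < 0) throw new IllegalArgumentException("truncated record at " + pos);
            result.put(data.substring(pos, nameEnd), data.substring(nameEnd + 1, noteEnd));
            pos = noteEnd + 1;
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> notes = new LinkedHashMap<>();
        notes.put("Alice", "Call back Monday.\nBring the report.");
        notes.put("Bob", "");
        System.out.println(decode(encode(notes)).equals(notes)); // true
    }
}
```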
For the data I'm processing, the field separator is ':'. I was wondering if there's an FS value that can be set to allow fields to contain ':' by escaping it ('\:').
From what I've seen, I don't think it is possible with AWK's regular expressions, but I figured I'd ask anyway.
Luckily it isn't necessary for my current project; I'm just curious for future reference.
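AWK's FS is a POSIX extended regular expression, and those have no lookbehind, so "a colon not preceded by a backslash" is hard to express there. For comparison only, here is a small Java sketch of the splitting rule the question describes, using java.util.regex's negative lookbehind. It assumes '\:' is the only escape, so a field that itself ends in a backslash cannot be represented.

```java
import java.util.Arrays;

/** Splits a ':'-separated record where a literal colon inside a field is written as "\:". */
final class ColonFields {
    private ColonFields() {}

    static String[] split(String record) {
        // Split on ':' only when it is not preceded by a backslash (negative lookbehind).
        String[] fields = record.split("(?<!\\\\):", -1);
        for (int i = 0; i < fields.length; i++) {
            fields[i] = fields[i].replace("\\:", ":"); // turn an escaped colon back into ':'
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("host\\:8080:admin:x"))); // [host:8080, admin, x]
    }
}
```

The -1 limit keeps trailing empty fields, which matches how AWK counts fields on a line such as "a:b:".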
I am processing a lot of CSV files that contain people's data, and occasionally names use non-ASCII characters like á. Those all become � symbols in the DataTable. How do I prevent this? I just want to leave all the names as they are in the file, without making any changes.
The most common reason for this is that the file is actually encoded in ISO-8859-1 but interpreted as UTF-8. For less common causes the same principle applies: something is in a different encoding than it claims to be.
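To see the effect concretely, this Java sketch writes a name as ISO-8859-1 bytes and then decodes those bytes as UTF-8, which is the mismatch described above. The byte for é (0xE9) is not valid UTF-8 on its own, so the decoder substitutes U+FFFD, the � symbol.

```java
import java.nio.charset.StandardCharsets;

/** Shows how ISO-8859-1 bytes turn into U+FFFD when they are decoded as UTF-8. */
final class MojibakeDemo {
    private MojibakeDemo() {}

    static String writeLatin1ReadUtf8(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1); // how the file was written
        return new String(bytes, StandardCharsets.UTF_8);          // how it was (wrongly) read
    }

    public static void main(String[] args) {
        String name = "Jos\u00e9"; // "Jose" with an acute accent on the e
        String garbled = writeLatin1ReadUtf8(name);
        System.out.println(garbled.indexOf('\uFFFD') >= 0); // true: the accented e became U+FFFD
    }
}
```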
Change the character encoding in the database or decode it when you read from the DB.
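While processing, you need a reader of some kind (a StreamReader, for instance). I suggest you configure it with a System.Text.Encoding that matches the file's actual encoding: for example UTF8Encoding, UnicodeEncoding or UTF32Encoding, or Encoding.GetEncoding("ISO-8859-1") for a Latin-1 file.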
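The answer is about .NET, but the idea carries over: name the charset explicitly instead of relying on a default. A minimal Java analogue, assuming (hypothetically) that the CSV files are ISO-8859-1. Files.readAllLines reports a mismatched charset with a MalformedInputException instead of silently producing � characters, which makes the problem visible early.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Reads a CSV file's lines with an explicitly chosen charset instead of the platform default. */
final class CsvNameReader {
    private CsvNameReader() {}

    static List<String> readLines(Path csv, Charset fileEncoding) throws IOException {
        // Throws MalformedInputException if the bytes are not valid in fileEncoding.
        return Files.readAllLines(csv, fileEncoding);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("names", ".csv");
        Files.write(tmp, "Jos\u00e9,Garc\u00eda\n".getBytes(StandardCharsets.ISO_8859_1));
        // Assumption: the files are Latin-1; use whatever encoding they were really written in.
        List<String> lines = readLines(tmp, StandardCharsets.ISO_8859_1);
        System.out.println(lines.get(0).equals("Jos\u00e9,Garc\u00eda")); // true
        Files.delete(tmp);
    }
}
```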
I've got a Prolog program where I'm doing a brute-force search over all strings up to a certain length. I'm checking which strings match a certain pattern, and I keep adding patterns until I hopefully find a set of patterns that covers all strings. I would like to store the strings that don't match any of my patterns in a file, so that when I add a new pattern I only need to check the leftovers instead of doing the entire brute-force search again.
If I were writing this in Python, I would just pickle the list of strings and load it from the file. Does anybody know how to do something similar in Prolog?
I have a good amount of Prolog programming experience, but very little with Prolog IO. I could probably write a predicate to read a file and parse it into a term, but I figured there might be a way to do it more easily.
If you want to write out a term and be able to read it back later at any time (barring variable names), use the ISO built-in write_canonical/1 or write_canonical/2. It is quite well supported by current systems. writeq/1 and write/1 often work too, but not always: writeq/1 uses operator syntax (so you need to read it back with the very same operators present), and write/1 does not use quotes. So they work "most of the time", until they break.
Alternatively, you may use the ISO write options [quoted(true), ignore_ops(true), numbervars(false)] with write_term/2 or write_term/3. This might be interesting if you want to use further options like variable_names/1 to also retain the names of the variables.
Also note that the term written does not include a period at the end, so you have to write a space and a period manually. The space is needed to ensure that an atom consisting of graphic characters does not glue together with the final period. Think of writing the atom '---', which must be written as --- . and not as ---. You might write the space only in the case of an atom, or only for an atom that would "glue" with the period.
writeq/1 and read/1 do a similar job, but read the note on writeq/1 about operators if you declare any.
Consider using read/1 to read a Prolog term. For more complex or different kinds of parsing, consider using DCGs and then phrase_from_file/2 with SWI's library(pio).
I have generated the CREATE statement for a SQL Server view.
Pretty standard, although there is some replacing happening on a varchar column, such as:
select Replace(txt, '�', '-')
What the heck is '�'?
When I run that against a row that contains that character, I am seeing the literal '?' being replaced.
Any ideas? Do I need some special encoding in my editor?
Edit: If it helps, the endpoint is a Google feed.
You need to read the script in the same encoding as that in which it was written. Even then, if your editor's font doesn't include a glyph for the character, it may still not display correctly.
When the script was created, did you choose an encoding or accept the default? If the latter, you need to find out which encoding was used. UTF-8 is likely.
However, in this case the character may not be a misrepresentation. The Unicode replacement character (U+FFFD) is used as a substitute for some other character that cannot be represented. It's possible that the code you are looking at is simply saying: if we have some data that could not be represented, treat it as a hyphen instead. In other words, this may have nothing to do with the script generation/viewing process, but rather be a deliberate piece of code.
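The view's Replace(txt, '�', '-') amounts to the hyphenate method below. The sketch also illustrates one plausible, unconfirmed reason for the literal '?' the asker sees: if the '�' literal ends up in a single-byte code page that has no U+FFFD, it is stored as '?', and the Replace then targets '?' instead. Java's String.getBytes performs that kind of substitution; ISO-8859-1 is used here only as a stand-in for a varchar code page, and this does not reproduce SQL Server's own conversion rules.

```java
import java.nio.charset.StandardCharsets;

final class ReplacementCharDemo {
    private ReplacementCharDemo() {}

    /** U+FFFD, the Unicode replacement character that the view's Replace call targets. */
    static final char REPLACEMENT = '\uFFFD';

    /** Same idea as the view's Replace: turn every replacement character into a hyphen. */
    static String hyphenate(String txt) {
        return txt.replace(REPLACEMENT, '-');
    }

    /** Round-trips text through a single-byte charset; unmappable characters come back as '?'. */
    static String throughLatin1(String txt) {
        return new String(txt.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(hyphenate("caf\uFFFD menu")); // caf- menu
        System.out.println(throughLatin1("\uFFFD"));      // ?
    }
}
```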
I'm thinking about writing a templating tool for generating T-SQL code, which will include delimited sections like the one below:
SELECT
~~idcolumn~~
FROM
~~table~~
WHERE
~~table~~.flag = 1
Notice the double tildes delimiting some bits? This is an idea for an escape sequence in my templating language. But I want to be certain that the escape sequence is valid, i.e. that it will never occur in a valid T-SQL statement. The problem is, I can't find any official Microsoft description of the T-SQL language.
Does anyone know of an official specification for the T-SQL language, or at least the lexing rules, so that I can make an informed decision about the escape sequence?
UPDATES:
Thanks for the suggestions so far, but I'm not looking for confirmation of the '~~' escape sequence per se. What I need is a document I can point to and say 'Microsoft says this character sequence is totally impossible in T-SQL.' For instance, Microsoft publishes the language specification for C# here, which includes a description of what characters can go into valid C# programs (see page 67 of the PDF). I'm looking for a similar reference.
The double tilde "~~" is actually perfectly good T-SQL. For instance, "(SELECT ~~1)" returns 1.
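The reason "~~1" is legal is that ~ is the unary bitwise NOT operator, so applying it twice gives back the original value. Java has the same operator, which makes for a quick sanity check of the arithmetic (this is Java, not T-SQL):

```java
/** Applying bitwise NOT twice is the identity, which is why ~~1 evaluates to 1. */
final class DoubleTilde {
    private DoubleTilde() {}

    static int not(int x) {
        return ~x;
    }

    static int notNot(int x) {
        return ~~x;
    }

    public static void main(String[] args) {
        System.out.println(not(1));    // -2 (two's complement)
        System.out.println(notNot(1)); // 1
    }
}
```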
There are several well-known and often-used formats for template parameters, one example being $(paramname), which is used in other scripts as well as in T-SQL scripts.
Why not use an existing format?
It doesn't matter if ~~ is legal TSQL or not, if you provide an escape for producing ~~ in actual TSQL when you need it.
Since template parameters have to have a nonzero-length identifier, you have a peculiar case where the identifier length is ridiculously "zero", e.g., ~~~~. This kind of thing makes an ideal escape sequence, since it is useless for anything else. Simply process your template text: whenever you find ~~name~~, replace it by the named parameter's string, and whenever you find ~~~~, replace it by ~~. Now, if ~~ is needed in the final T-SQL, just write ~~~~ in your template.
I suspect that even if you do this, the number of times you'll actually write ~~~~ in practice will be close to zero, so the reason for doing it is theoretical completeness and the warm fuzzy feeling that you can write anything in a template.
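A minimal Java sketch of that escaping rule, assuming the ~~name~~ syntax from the question and a hypothetical expand method that takes parameter values from a Map. ~~~~ is the only escape; an unknown name or an unclosed ~~ raises an error rather than passing through silently.

```java
import java.util.Map;

/** Hypothetical expander for ~~name~~ placeholders; ~~~~ is the escape for a literal ~~. */
final class TildeTemplate {
    private TildeTemplate() {}

    static String expand(String template, Map<String, String> params) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (true) {
            int open = template.indexOf("~~", pos);
            if (open < 0) {
                out.append(template, pos, template.length()); // copy the tail verbatim
                return out.toString();
            }
            int close = template.indexOf("~~", open + 2);
            if (close < 0) {
                throw new IllegalArgumentException("unclosed ~~ at index " + open);
            }
            out.append(template, pos, open);
            String name = template.substring(open + 2, close);
            if (name.isEmpty()) {
                out.append("~~"); // zero-length name: the escape for a literal ~~
            } else {
                String value = params.get(name);
                if (value == null) {
                    throw new IllegalArgumentException("no value for parameter " + name);
                }
                out.append(value);
            }
            pos = close + 2;
        }
    }

    public static void main(String[] args) {
        Map<String, String> params = Map.of("idcolumn", "id", "table", "Orders");
        System.out.println(expand("SELECT ~~idcolumn~~ FROM ~~table~~ WHERE ~~table~~.flag = 1", params));
        System.out.println(expand("SELECT ~~~~1", params)); // SELECT ~~1
    }
}
```

Failing on unknown names is a design choice: it catches typos in placeholder names at generation time instead of emitting broken T-SQL.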
Well, I'm not sure about a complete description of the language, but it appears that ~~ could occur in an identifier provided that it is quoted (in brackets, typically).
You may have more luck with a convention saying you don't support identifiers with ~~ in them. Or, just reserve your own lexical symbols and don't worry about ~~ occurring elsewhere.
You could treat quoted literals and strings as content, regardless of whether they contain your escape sequence. That would make the tool more robust.
Run the text through a lexer to separate the tokens. If a token is a string or a quoted literal, treat it as such. But if it is a literal that begins and ends with ~~, you can safely assume it is a template placeholder.
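A rough Java sketch of that idea, using a hypothetical scanner that only understands T-SQL's '...' string literals and "..." / [...] quoted identifiers (a doubled closing character counts as an escaped one). Comments (-- and /* */) are not handled, so treat it as an outline rather than a real T-SQL lexer. A ~~ run only counts as a placeholder when the text between the tildes is a plain identifier, so "(SELECT ~~1)" is left alone.

```java
import java.util.ArrayList;
import java.util.List;

/** Finds ~~name~~ placeholders outside quoted T-SQL spans (simplified; ignores comments). */
final class PlaceholderScanner {
    private PlaceholderScanner() {}

    static List<String> placeholders(String sql) {
        List<String> names = new ArrayList<>();
        int i = 0;
        while (i < sql.length()) {
            char c = sql.charAt(i);
            if (c == '\'' || c == '"' || c == '[') {
                i = skipQuoted(sql, i, c == '[' ? ']' : c); // content, not template syntax
            } else if (sql.startsWith("~~", i)) {
                int end = sql.indexOf("~~", i + 2);
                String name = end < 0 ? "" : sql.substring(i + 2, end);
                if (!name.isEmpty()
                        && name.chars().allMatch(ch -> Character.isLetterOrDigit(ch) || ch == '_')) {
                    names.add(name);
                    i = end + 2;
                } else {
                    i += 2; // not a placeholder: leave the tildes as ordinary T-SQL
                }
            } else {
                i++;
            }
        }
        return names;
    }

    /** Returns the index just past the quoted span that starts at 'start'. */
    private static int skipQuoted(String s, int start, char close) {
        int i = start + 1;
        while (i < s.length()) {
            if (s.charAt(i) == close) {
                if (i + 1 < s.length() && s.charAt(i + 1) == close) {
                    i += 2; // doubled closing character is an escaped one
                    continue;
                }
                return i + 1;
            }
            i++;
        }
        return s.length(); // unterminated: treat the rest as quoted
    }

    public static void main(String[] args) {
        System.out.println(placeholders("SELECT ~~idcolumn~~ FROM ~~table~~ WHERE note = '~~table~~'"));
        // [idcolumn, table]
    }
}
```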
I'm not sure you'll find something that will never occur in a valid statement. Consider:
DECLARE @TemplateBreakingString varchar(100) = '~~I hope this works~~'
or
CREATE TABLE [~~TemplateBreakingTable~~] (IDField INT Identity)
Your escape sequence can occur in string literals and quoted identifiers. That said, Microsoft owns T-SQL, and they are free to do anything they want with it in future versions of SQL Server. Still, I think ~~ is safe enough.