I am printing to a Zebra ZQ510 using "line_print" mode on continuous-feed receipt paper.
! U1 setvar "ezpl.media_type" "continuous" \r\n
! U1 setvar "device.languages" "line_print" \r\n
! U1 ENCODING UTF-8 \r\n
! U1 SETLP 5 0 28 \r\n
! U1 PAGE-WIDTH 580 \r\n
! U1 BEGIN-PAGE \r\n
Has the person to be vaccinated ever had \r\n
Guillain-Barré syndrome?: \r\n
! U1 END-PAGE \r\n
The above commands are sent to the printer connection using this Objective-C code, which encodes the string as UTF-8 data:
-(void) printCommands: (NSString*) tPrintCommands {
...
[aPrinterConn write:[tPrintCommands dataUsingEncoding:NSUTF8StringEncoding] error:&aError];
...
}
But instead of "Barré", the "é" comes out as two wrong characters on the printout. It seems the Unicode "é" is being treated as 2 separate characters by the printer.
I have tried replacing ! U1 ENCODING UTF-8 \r\n with ! U1 COUNTRY UTF-8 \r\n, and I have positioned ! U1 ENCODING UTF-8 \r\n both before and after BEGIN-PAGE. Same result every time.
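For reference, here is a quick sketch in Rust (purely illustrative; Rust just happens to be what I had handy) showing the two bytes UTF-8 uses for "é" and what they look like when each byte is read as its own single-byte, Latin-1-style character:

fn main() {
    // "é" (U+00E9) is two bytes in UTF-8.
    let bytes = "é".as_bytes();
    println!("{:02X?}", bytes); // [C3, A9]

    // Reading each byte as a single Latin-1 character gives two characters
    // instead of one, which is the kind of output I'm seeing on the printer.
    let as_latin1: String = bytes.iter().map(|&b| b as char).collect();
    println!("{}", as_latin1); // Ã©
}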
I have a program that iterates over all lines of a text file, adds spaces between the characters, and writes the output to the same file. However, if there are multiple lines in the input, I want the output to have separate lines as well. I tried:
let text = format!(r"{}\n", line); // Add newline character to each line (while iterating)
file.write_all(text.as_bytes()); // Write each line + newline
Here is an example input text file:
foo
bar
baz
And its output:
f o o\n b a r\n b a z
It seems that Rust treats "\n" as an escaped n character, but using r"\n" treats it as a string. How can I have Rust treat \n as a newline character to write multiple lines to a text file?
Note: I can include the rest of my code if you need it, let me know.
Edit: I am on Windows 7 64 bit
The problem is the 'r' in front of your string. Remove it and your program will print newlines instead of '\n'.
Also note that only Unix-like systems use '\n' as the newline; Windows uses "\r\n".
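For completeness, a minimal sketch of the difference (file writing omitted; the contents of the strings are the point):

fn main() {
    let line = "foo";

    let raw = format!(r"{}\n", line);  // raw string: a literal backslash followed by 'n'
    let real = format!("{}\n", line);  // ordinary string: a real newline (0x0A)

    assert_eq!(raw, "foo\\n");
    assert_eq!(real, "foo\n");

    // On Windows you may want a CRLF line ending instead:
    assert_eq!(format!("{}\r\n", line), "foo\r\n");
}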
I have a MIME header:
Subject: =?ISO-2022-JP?B?GyRCJzEnYidWJ1UnWSdRJ1wnURsoQg==?=
=?ISO-2022-JP?B?GyRCJ1kbKEIgGyRCLWIbKEIxNzUzNTk=?=
=?ISO-2022-JP?B?IBskQidjGyhCIBskQidjJ1EnWydkJ1EbKEI=?=
=?ISO-2022-JP?B?IBskQidXGyhCLRskQideJ2AnUidaJ10nbhsoQg==?=
When I try to decode the first string GyRCJzEnYidWJ1UnWSdRJ1wnURsoQg== (base64 decode and then NSString initWithData:encoding:), it works fine. My code works fine for hundreds of different MIME headers, except for the following...
...When I try to decode the second string GyRCJ1kbKEIgGyRCLWIbKEIxNzUzNTk=, NSString initWithData:encoding: returns nil.
For example, http://2cyr.com/decode/?lang=en decodes all the strings correctly (don't forget to decode the strings from base64 before using that site).
This isn't a base64 problem, it's an ISO-2022-JP problem. Actually it's a JIS-X-0208 problem. If you look at the base64-decoded (but still ISO-2022-JP encoded) string, you'll see that it contains the sequence ESC $ B - b (bytes 9 through 13). The first three are the ISO-2022-JP shift sequence to shift into JIS-X-0208-1983 (see RFC 1468 for details), and the next two are supposed to be a 2-byte encoding of a character, but if you work it out it's on line 13 of the kuten grid, which isn't defined.
tl;dr: That's not a valid character.
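If you want to check this yourself, here is a rough Rust sketch (it assumes the third-party base64 crate with its 0.21+ API; any base64 decoder will do) that dumps the decoded bytes and works out the kuten row/cell of the offending pair:

use base64::{engine::general_purpose::STANDARD, Engine as _};

fn main() {
    let bytes = STANDARD
        .decode("GyRCJ1kbKEIgGyRCLWIbKEIxNzUzNTk=")
        .expect("valid base64");

    // 1B 24 42 27 59 1B 28 42 20 1B 24 42 2D 62 1B 28 42 31 37 35 33 35 39
    // i.e. ESC $ B, one 2-byte character, ESC ( B, a space, ESC $ B, 2D 62, ESC ( B, "175359"
    println!("{:02X?}", bytes);

    // JIS X 0208 kuten row/cell are each byte minus 0x20.
    let (first, second) = (0x2Du8, 0x62u8);
    println!("row {}, cell {}", first - 0x20, second - 0x20); // row 13, cell 66 -- row 13 is undefined
}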
Maybe you are missing a final = in your string?
I'm using a "fun" HTML special character (✰) (see http://html5boilerplate.com/ for more info) in a Server HTTP header and am wondering whether it is "allowed" per spec.
Using the Network tab in the dev tools in Chrome on Windows XP Pro SP3, I see the ✰ just fine.
In IE8 the ✰ is not rendered correctly.
The w3.org HTML validator does not render it correctly (displays "â°" instead).
Now, I'm not too keen on character encodings ... and frankly I don't really care too much about them; I just blindly use UTF-8 cus I'm told to. :-)
Is the disparity caused by bugs in the different parsers/browses/engines/(whatever-they-are-called)?
Is there a spec for this or maybe a list of allowed characters for an HTTP-header "value"?
In short: Only ASCII is guaranteed to work. Some non-ASCII bytes are allowed for backwards compatibility, but are not supposed to be displayable.
HTTPbis gave up and specified that in the headers there is no useful encoding besides ASCII:
Historically, HTTP has allowed field content with text in the
ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
through use of [RFC2047] encoding. In practice, most HTTP header
field values use only a subset of the US-ASCII charset [USASCII].
Newly defined header fields SHOULD limit their field values to
US-ASCII octets. A recipient SHOULD treat other octets in field
content (obs-text) as opaque data.
Previously, RFC 2616 from 1999 defined this:
Words of *TEXT MAY contain characters from character sets other than
ISO-8859-1 [22] only when encoded according to the rules of RFC 2047 [14].
and RFC 2047 is the MIME encoding, so it'd be:
=?UTF-8?Q?=E2=9C=B0?=
but I don't think that many (if any) clients support it.
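If you ever do need to produce such an encoded-word, here is a rough Rust sketch (simplified, and the helper name is just for illustration: it Q-encodes every byte rather than only the ones that strictly need it):

fn q_encode_word(s: &str) -> String {
    // Build an RFC 2047 "Q" encoded-word from the UTF-8 bytes of the input.
    let mut out = String::from("=?UTF-8?Q?");
    for b in s.bytes() {
        out.push_str(&format!("={:02X}", b));
    }
    out.push_str("?=");
    out
}

fn main() {
    println!("{}", q_encode_word("✰")); // =?UTF-8?Q?=E2=9C=B0?=
}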
Please read the comments first; this answer likely draws the wrong conclusions from the right sources and needs editing.
You can use any printable ASCII characters, but no special characters like ✰ (which is not ASCII).
Tip: you can encode anything in JSON.
Edit: it may not be obvious at first, but the character encoding declared in the header only applies to the response body, not to the header itself. (Otherwise there would be a chicken-and-egg problem.)
I'd like to sum up all the relevant definitions as per the spec linked by Penchant.
message-header = field-name ":" [ field-value ]
field-name = token
field-value = *( field-content | LWS )
So, we are after field-value.
LWS = [CRLF] 1*( SP | HT )
CRLF = CR LF
CR = <US-ASCII CR, carriage return (13)>
LF = <US-ASCII LF, linefeed (10)>
SP = <US-ASCII SP, space (32)>
HT = <US-ASCII HT, horizontal-tab (9)>
LWS stands for Linear White Space. Essentially, LWS is Space or Tab, but you can break your field-value into multiple lines by starting a new line before a Space or Tab.
Let's simplify it to this:
field-value = <any field-content or Space or Tab>
Now we are after field-content.
field-content = <the OCTETs making up the field-value
and consisting of either *TEXT or combinations
of token, separators, and quoted-string>
OCTET = <any 8-bit sequence of data>
TEXT = <any OCTET except CTLs,
but including LWS>
CTL = <any US-ASCII control character
(octets 0 - 31) and DEL (127)>
token = 1*<any CHAR except CTLs or separators>
separators = "(" | ")" | "<" | ">" | "@"
| "," | ";" | ":" | "\" | <">
| "/" | "[" | "]" | "?" | "="
| "{" | "}" | SP | HT
TEXT is the most general rule and includes all the rest, so forget about the rest.
Here is the US-ASCII charset (= ASCII)
As you can see, all printable ASCII chars are allowed.
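If you want to check a candidate value against that grammar programmatically, a minimal sketch of the TEXT rule above (in Rust, just for illustration, ignoring the obsolete line-folding form of LWS) could look like this:

// TEXT = any octet except CTLs (0-31) and DEL (127); HT (9) is allowed via LWS.
fn is_valid_field_value(value: &[u8]) -> bool {
    value.iter().all(|&b| b == b'\t' || (b > 31 && b != 127))
}

fn main() {
    assert!(is_valid_field_value(b"text/html; charset=UTF-8"));
    // The three UTF-8 bytes of the star (E2 9C B0) are above 127, so they pass
    // the grammar as opaque octets -- but recipients need not display them.
    assert!(is_valid_field_value("✰".as_bytes()));
    assert!(!is_valid_field_value(b"bad\r\nvalue")); // CR and LF are CTLs
}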
I'm not very experienced with lower-level things such as how many bytes a character takes. I tried finding out whether one character equals one byte, but without success.
I need to set a delimiter used for socket connections between a server and clients. This delimiter has to be as small (in bytes) as possible, to minimize bandwidth.
The current delimiter is "#". Would choosing another delimiter decrease my bandwidth?
It depends on what character encoding you use to translate between characters and bytes (which are not at all the same thing):
In ASCII or ISO 8859, each character is represented by one byte
In UTF-32, each character is represented by 4 bytes
In UTF-8, each character uses between 1 and 4 bytes
In ISO 2022, it's much more complicated
US-ASCII characters (of which # is one) take only 1 byte in UTF-8, which is the most popular encoding that allows multibyte characters.
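A quick sketch illustrating the point (in Rust, where str::len() counts UTF-8 bytes):

fn main() {
    println!("{}", "#".len()); // 1 byte in UTF-8
    println!("{}", "é".len()); // 2 bytes in UTF-8
    println!("{}", "✰".len()); // 3 bytes in UTF-8

    // The same "#" costs 2 bytes in UTF-16 and 4 bytes in UTF-32.
    println!("{}", "#".encode_utf16().count() * 2); // 2
    println!("{}", "#".chars().count() * 4);        // 4
}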
It depends on the encoding. In single-byte character sets such as ANSI and the various ISO 8859 character sets, it is one byte per character. Some encodings, such as UTF-8, are variable-width, where the number of bytes needed to encode a character depends on the character being encoded.
The answer, of course, is that it depends. If you are in a pure ASCII environment, then yes, every character takes 1 byte, but if you are in a Unicode environment (all of Windows, for example), then characters can range from 1 to 4 bytes in size.
If you choose a character from the ASCII set, then yes, your delimiter is as small as possible.
No, all characters are 1 byte, unless you're using Unicode or wide characters (for accents and other symbols for example).
A character is 1 byte, or 8 bits, long, which gives 256 possible combinations to form characters with. 1-byte characters are called ASCII characters. They use only 7 of the 8 available bits to form the standard alphabet and the various symbols that were in use when teletypes and typewriters were still common.
You can find an ASCII chart and what numbers correspond to what characters here.
Can anyone tell me the set of control characters for a PDF file, and how to escape them? I have a (non-deflated (inflated?)) PDF document that I would like to edit the text in, but I'm afraid of accidentally making some control sequence using parentheses and stuff.
Thanks.
Okay, I think I found it. On page 15 of the PDF 1.7 spec (PDF link), it appears that the only characters I need to worry about are the parentheses and the backslash.
Sequence | Meaning
---------------------------------------------
\n | LINE FEED (0Ah) (LF)
\r | CARRIAGE RETURN (0Dh) (CR)
\t | HORIZONTAL TAB (09h) (HT)
\b | BACKSPACE (08h) (BS)
\f | FORM FEED (0Ch) (FF)
\( | LEFT PARENTHESIS (28h)
\) | RIGHT PARENTHESIS (29h)
\\ | REVERSE SOLIDUS (5Ch) (Backslash)
\ddd | Character code ddd (octal)
Hopefully this was helpful to someone.
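Based on that table, a minimal escaping sketch (in Rust, just for illustration; the helper name is made up, and it handles only the parentheses and the backslash, ignoring the octal \ddd form):

fn escape_pdf_literal(s: &str) -> String {
    // Escape the three characters that are special inside a PDF literal string.
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '(' => out.push_str("\\("),
            ')' => out.push_str("\\)"),
            '\\' => out.push_str("\\\\"),
            _ => out.push(c),
        }
    }
    out
}

fn main() {
    assert_eq!(escape_pdf_literal(r"a(b)c\d"), r"a\(b\)c\\d");
}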
You likely already know this, but PDF files have an index at the end that contains byte offsets to everything in the document. If you edit the doc by hand, you must ensure that the new text you write has exactly the same number of characters as the original.
If you want to extract PDF page content and edit that, it's pretty straightforward. My CAM::PDF library lets you do it programmatically or via the command line:
use CAM::PDF;
my $pdf = CAM::PDF->new($filename);
my $page_content = $pdf->getPageContent($pagenum);
# ...
$pdf->setPageContent($pagenum, $page_content);
$pdf->cleanoutput($out_filename);
or
getpdfpage.pl in.pdf 1 > page1.txt
setpdfpage.pl in.pdf page1.txt 1 out.pdf