change display format for all string variables to as short as possible - printf

After compressing my data, I have several string variables with storage type str4 or str1 and format %9s. I would like to revert them all to the default display format, which help dformat reports should be %#s for str#. Is there a quick way to do this?
This is the structure of my best guess:
ds, has(type string)
foreach v of varlist `r(varlist)' {
format `v'
}
This does not work because, instead of reformatting to the default value with this command, the format function just displays the format.
A reproducible example:
clear
input str50 mystr
"b"
"a"
end
compress
format myst
This is the situation I was confronted with. I am not sure if it applies beyond strL formatted variables. (Roberto suspects that it does not; see comments.)
Addendum. My goal here was to make browse-ing my data easier. It seems that format is respected in the browser (truncating to length of one for %1s, say), while it is overridden by the actual length of the string when printing to the console.

I am surprised that you seem surprised, as with your syntax the format command (not function) does indeed just display the format, as is documented. Incidentally, you don't need a loop to do that as format will take a varlist as argument:
. clear
. set obs 1
obs was 0, now 1
. foreach t in a b c {
2. gen `t' = "`t'"
3. }
. format a b c
variable name display format
-----------------------------
a %9s
b %9s
c %9s
-----------------------------
Joshing apart, I think you need just one line here, which is something like
. format a b c %1s
or
. format a b c %-1s
to signal which justification. Stata doesn't truncate displayed strings just because they don't match the string display format; it might well truncate strings because there isn't space to show them, but (I'm naturally open to counter-examples) the above display formats for variables will do well for most purposes.
EDIT: The following device may help.
gen length = 0
ds, has(type string)
quietly foreach v in `r(varlist)' {
replace length = length(`v')
su length, meanonly
format `v' %`r(max)'s
}
drop length

Related

how to correct incoming data that contains char not in Ascii to unicode before saving to the database

I have a webservice api in vb.net that accepts string. but i cannot control the data coming to this API. I sometimes receive chars in between words in this format (–, Á, •ï€,ââ€ï€, etc. ) Is there a way for me to handle these or convert these characters to their correct symbols before saving to the database?
i know that the best solution would be to go after the source where the characters get malformed.. but i'll make that as plan B
my code is already using utf-8 as encoding pattern, but what if the client that uses my API messed up and inadvertently sent the malformed char thru the API. can i clean that string and convert the malformed char to the correct symbol?
If you only want to accept ASCII characters, you could remove non-ASCII characters by encoding and decoding the string - the default ASCII encoding uses "?" as a substitute for unrecognized characters, so you probably want to override that:
' Using System.Text
Dim input As String = "âh€eÁlâl€o¢wïo€râlâd€ï€"
Dim ascii As Encoding = Encoding.GetEncoding(
"us-ascii",
New EncoderReplacementFallback(" "),
New DecoderReplacementFallback(" ")
)
Dim bytes() As Byte = ascii.GetBytes(input)
Dim output As String = ascii.GetString(bytes)
Output:
h e l l o w o r l d
The replacement given to the En/DecoderReplacementFallback can be empty if you just want to drop the non-ASCII characters.
You could use a different encoding than ASCII if you want to accept more characters - but I would imagine that most of the characters you listed are valid in most European character sets.
While you are kind of vague I could guide you in something I think you could potentially do:
Sub Main()
Dim allowedValues = "abcdefghijklmnopqrstuvwxyz".ToCharArray()
Dim someGoodSomeBad = "###$##okay##$##"
Dim onlyGood = New String(someGoodSomeBad.ToCharArray().Where(Function(x) allowedValues.Contains(x)).ToArray)
Console.WriteLine(onlyGood)
End Sub
The first line would be valid characters, in my example I chose to use alpha characters, you could add more characters and numbers too. Basically you are creating a whitelist of acceptable characters, that you the developer would make.
The next line would be an output from your API that has some good and some bad lines.
The next part is really more simple than it looks. I am extending the string to be an array of characters, then I am finding ONLY the characters that match my whitelist in a lambda statement. Then I extend this to an array again because if I do a new String in .NET from a char array.
I then get a good string, but I could make 'good' to be subjective based on a whitelist.
The bigger question though is WHY is your Web API sending garbled data over? It should be sending well formed JSON or XML that is then able to well parsed and strongly type to models. Doing what I have shown above is more of a bandaide than a real fix to the underlying problem and it will have MANY holes.

Trailing space in i to string conversion in ABAP

On a SAP system, ABAP version 7.40 SP05, I just encountered a failure in unit tests on string comparison, but both strings should be the same?! Turns out it's not the case, as preceding conversion from i to string seems to produce extra trailing space in one of the strings.
This code bit:
DATA(i) = 111.
DATA(s1) = CONV string( i ).
DATA(s2) = '111'.
DATA(s3) = |111|.
Produces (as seen in debugger):
S1 111 3100310031002000 CString{4}
S2 111 310031003100 C(3)
S3 111 310031003100 CString{3}
The converted one has an extra trailing space. How does this happen and how can I prevent this to happen in i to string conversions? Obviously stuff like this makes me debug for a long time to find what is up (because unless I check the hex values, the debuger does not show that extra space...).
To understand why the space is added in the first place, check the documentation on the default conversion rules that are applied by CONV:
The character "-" is set at the last position for a negative value,
and a blank is set for a positive value.
Since you can't use the formatting options of string expressions with the CONV operator, I'd suggest changing the code to use |{ i }| (which might be a good idea for other values as well, since you'll probably need some formatting options when comparing date / time values in unit tests anyway).
You cannot prevent it. The best way I found so far in ABAP is use CONDENSE s1
DATA i type i VALUE 12.
DATA idx TYPE string.
idx = i. " idx = '12 '.
CONDENSE idx. " idx = '12'.

Limitting character input to specific characters

I'm making a fully working add and subtract program as a nice little easy project. One thing I would love to know is if there is a way to restrict input to certain characters (such as 1 and 0 for the binary inputs and A and B for the add or subtract inputs). I could always replace all characters that aren't these with empty strings to get rid of them, but doing something like this is quite tedious.
Here is some simple code to filter out the specified characters from a user's input:
local filter = "10abAB"
local input = io.read()
input = input:gsub("[^" .. filter .. "]", "")
The filter variable is just set to whatever characters you want to be allowed in the user's input. As an example, if you want to allow c, add c: local filter = "10abcABC".
Although I assume that you get input from io.read(), it is possible that you get it from somewhere else, so you can just replace io.read() with whatever you need there.
The third line of code in my example is what actually filters out the text. It uses string:gsub to do this, meaning that it could also be written like this:
input = string.gsub(input, "[^" .. filter .. "]", "").
The benefit of writing it like this is that it's clear that input is meant to be a string.
The gsub pattern is [^10abAB], which means that any characters that aren't part of that pattern will be filtered out, due to the ^ before them and the replacement pattern, which is the empty string that is the last argument in the method call.
Bonus super-short one-liner that you probably shouldn't use:
local input = io.read():gsub("[^10abAB]", "")

Fortran 90: How to correctly read an integer among other real

I have created a Fortran 90 code to filter and convert the text output of another program in a csv form. The file contains a table with columns of various types (character, real, integer). There is a column that generally contains decimal values (probability values). BUΤ, in some rows, where the value should be decimal "1.000", the value is actually integer "1".
I use "F5.3" specifier to read this column and I have the same format statement for every row of the table. So, when the code finds "1", it reads ".001", because it does not find a decimal point.
What ways could I use to correctly (and generally) read integers among other decimals?
Could I specify "unformatted" input only for a number of "spaces"?
The data edit descriptor fw.d for floating point format specification is for input normally used with zero d (it cannot be ommited). Nonzero d is used in the rare case when the floating point data is stored as scaled integers, or you do some unit conversion from the integer values.
You could try using list-directed input: use a * instead of a format specifier. This would be for the entire read, not selected items. Or you could read the lines into a string test their contents to decide how to read them. If the sub-string has a decimal point: read (string(M:N), '(F5.3)') value. If it doesn't, use a different format, e.g., perhaps read as as F5.0.
P.S. "unformatted" is reading binary data without conversion ... it is a direct copy of the data from the file to the data item. "listed-directed" is the Fortran term for reading & converting data without using a format specification.
well here's someting new to me: f90 allows a mix of comma and space delimiters for a simple list directed read:
read(unit,*)v1,v2,v3,v4
with input
1.222 2 , 3.14 , 4
yields
1.222000 2.000000 3.140000 4.000000

How to split lines in Haskell?

I have made a program which takes a 1000 digit number as input.
It is fixed, so I put this input into the code file itself.
I would obviously be storing it as Integer type, but how do I do it?
I have tried the program by having 1000 digits in the same line. I know this is the worst possible code format! But it works.
How can assign the variable this number, and split its lines. I read somewhere something about eos? Ruby, end of what?
I was thinking that something similar to comments could be used here.
Help will be appreciated.
the basic idea is to make this work:
a=3847981438917489137897491412341234
983745893289572395725258923745897232
instead of something like this:
a=3847981438917489137897491412341234983745893289572395725258923745897232
Haskell doesn't have a way to split (non-String) literals across multiple lines. Since Strings are an exception, we can shoehorn in other literals by parsing a multiline String:
v = read
"32456\
\23857\
\23545" :: Integer
Alternately, you can use list syntax if you think it's prettier:
v = read . concat $
["32456"
,"24357"
,"23476"
] :: Integer
The price you pay for this is that some work will be done (once) at runtime, namely, the parsing (e.g. read).