Converting string into xstring without using function module - abap

I want to convert a string into an xstring. I know that there is a function module named "SCMS_STRING_TO_XSTRING".
But since using function modules is no longer considered good practice, a class-based solution would be my preferred way to go.
I know that there is a class
cl_abap_conv_in_ce
but as far as I can tell, this class can only convert xstrings into strings. I want the reverse case. Does anyone have experience doing that with a class?

Meanwhile, I found the solution on my own. For people who might be interested:
DATA(lo_conv) = cl_abap_conv_out_ce=>create( ).
lo_conv->write( data = lv_content ).
DATA(lv_xstring) = lo_conv->get_buffer( ).

The help text for XSTRING provides a nice functional method for this:
cl_abap_codepage=>convert_to( )

Firstly, you need to decide how you want it encoded. UTF-8? UTF-16? Just plain HEX?
For UTF-8 you can do the following using system calls (instead of function calls):
First do a global once-off initialization:
STATICS: g_conv_utf8 TYPE xstring. " used for conversion
DATA: l_flags TYPE c LENGTH 1.
system-call convert id 20
srcenc 'SET LOCALE LANGUAGE'
dstenc 'UTF-8'
replacement '#'
type l_flags
cinfo g_conv_utf8.
And then do subsequent calls: l_string -> l_xstring (+ l_len)
SYSTEM-CALL CONVERT ID 24
DATA l_string
ENDIAN ' '
IGNORE_CERR 'X'
N -1
BUFFER l_xstring
LEN l_length
CINFO g_conv_utf8.
This is the essence of what cl_abap_codepage=>convert_to( ) does internally.

Related

regular expression, how to replace a part of a text preserving its length

I have, in a database, records that are serialized PHP strings that I must obfuscate emails if there are any. The simplest record is like {s:20:"pika.chu#pokemon.com"}. It is basically saying: this is a string of length 20 which is pika.chu#pokemon.com. This field can be kilobytes long with lot of emails (or none) and sometimes it is empty.
I wish I could use a SQL regular expression function to obfuscate the user part of the email while preserving the length of the string in order not to break the PHP serialization. The example email above shall be turned into {s:20:"xxxxxxxx#pokemon.com"} where the number of x matches the length of pika.chu.
Any thoughts?
Here is a more complete example of what can be found as serialized PHP:
a:4:{s:7:"locales";a:3:{i:0;s:5:"fr_FR";i:1;s:5:"de_DE";i:2;s:5:"en_US";}s:9:"publisher";s:18:"john#something.com";s:7:"authors";a:2:{i:0;s:21:"william#something.com";i:1;s:19:"debbie#software.org";}s:12:"published_at";O:8:"DateTime":3:{s:4:"date";s:26:"2022-01-26 13:05:26.531289";s:13:"timezone_type";i:3;s:8:"timezone";s:3:"UTC";}}
I tried to do it using native functions, but it did not work because functions like REGEXP_REPLACE don't let you manipulate the match, for example to get its length.
Instead, I've created a UDF to do that:
CREATE TEMP FUNCTION hideEmail(str STRING)
RETURNS STRING
LANGUAGE js AS """
return str
.replace(/([a-zA-Z.0-9_\\+-:]*)#/g, function(txt){return '*'.repeat(txt.length-1)+"#";})
""";
select hideEmail('a:4:{s:7:"locales";a:3:{i:0;s:5:"fr_FR";i:1;s:5:"de_DE";i:2;s:5:"en_US";}s:9:"publisher";s:18:"john#something.com";s:7:"authors";a:2:{i:0;s:21:"william#something.com";i:1;s:19:"debbie#software.org";}s:12:"published_at";O:8:"DateTime":3:{s:4:"date";s:26:"2022-01-26 13:05:26.531289";s:13:"timezone_type";i:3;s:8:"timezone";s:3:"UTC";}}')
Result:
a:4:{s:7:"locales";a:3:{i:0;s:5:"fr_FR";i:1;s:5:"de_DE";i:2;s:5:"en_US";}s:9:"publisher";s:18:"****#something.com";s:7:"authors";a:2:{i:0;s:21:"*******#something.com";i:1;s:19:"******#software.org";}s:12:"published_at";O:8:"DateTime":3:{s:4:"date";s:26:"2022-01-26 13:05:26.531289";s:13:"timezone_type";i:3;s:8:"timezone";s:3:"UTC";}}
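For comparison, the same length-preserving idea can be sketched outside the database. This is only a minimal sketch in Go, not a replacement for the UDF above: the function name maskEmails and the character class are my own assumptions, and it keeps the '#' separator used in the sample data. Since PHP's s:N counts bytes, the sketch only matches ASCII characters, so one '*' per matched byte keeps N valid.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// userPart matches the run of ASCII characters directly before the '#'
// separator used in the sample data. The character class is an assumption.
var userPart = regexp.MustCompile(`[A-Za-z0-9._+-]+#`)

// maskEmails (hypothetical name) replaces each user part with the same
// number of '*' bytes, so the s:<length> prefixes stay valid.
func maskEmails(s string) string {
	return userPart.ReplaceAllStringFunc(s, func(m string) string {
		// m includes the trailing '#', hence len(m)-1 stars.
		return strings.Repeat("*", len(m)-1) + "#"
	})
}

func main() {
	in := `a:2:{i:0;s:21:"william#something.com";i:1;s:19:"debbie#software.org";}`
	fmt.Println(maskEmails(in))
}
```

Strings without a '#' pass through unchanged, so empty or email-free records are not touched.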

Why Language Key is displayed like 1 character instead of 2?

I was trying to learn how to use string templates and ran into a problem displaying characters.
What I was trying to display is:
SAP Logon Language Key EN
using this line of code:
WRITE: |{ text-003 } { sy-langu }|.
But instead, it only displays:
SAP Logon Language Key E
It only displays the first character of the language key instead of the full two letters, EN.
SAP language codes are displayed as two letters, but are internally stored as just one. There are various data-types where the internal and the external representation differ. It's called a conversion-routine and it's defined on the level of the domain in the ABAP dictionary.
If you want to convert to the external representation of a language field, use the function module CONVERSION_EXIT_ISOLA_OUTPUT. If you want to do the reverse - convert a UI representation to the database representation - use CONVERSION_EXIT_ISOLA_INPUT.
To complete Philipp's answer, you may also use WRITE to convert from the database representation to the external one; it automatically looks up the right conversion routine (ISOLA in the case of SY-LANGU):
DATA display_language_code TYPE c LENGTH 2.
WRITE sy-langu TO display_language_code.
ASSERT display_language_code = 'EN'.

Generating Random String of Numbers and Letters Using Go's "testing/quick" Package

I've been breaking my head over this for a few days now and can't seem to be able to figure it out. Perhaps it's glaringly obvious, but I don't seem to be able to spot it. I've read up on all the basics of unicode, UTF-8, UTF-16, normalisation, etc, but to no avail. Hopefully somebody's able to help me out here...
I'm using Go's Value function from the testing/quick package to generate random values for the fields in my data structs, in order to implement the Generator interface for the structs in question. Specifically, given a Metadata struct, I've defined the implementation as follows:
func (m *Metadata) Generate(r *rand.Rand, size int) (value reflect.Value) {
value = reflect.ValueOf(m).Elem()
for i := 0; i < value.NumField(); i++ {
if t, ok := quick.Value(value.Field(i).Type(), r); ok {
value.Field(i).Set(t)
}
}
return
}
Now, in doing so, I'll end up with both the receiver and the return value being set with randomly generated values of the appropriate type (strings, ints, etc. in the receiver and reflect.Value in the returned reflect.Value).
Now, the implementation for the Value function states that it will return something of type []rune converted to type string. As far as I know, this should allow me to then use the functions in the runes, unicode and norm packages to define a filter which filters out everything which is not part of 'Latin', 'Letter' or 'Number'. I defined the following filter which uses a transform to filter out letters which are not in those character rangetables (as defined in the unicode package):
func runefilter(in reflect.Value) (out reflect.Value) {
out = in // Make sure you return something
if in.Kind() == reflect.String {
instr := in.String()
t := transform.Chain(norm.NFD, runes.Remove(runes.NotIn(rangetable.Merge(unicode.Letter, unicode.Latin, unicode.Number))), norm.NFC)
outstr, _, _ := transform.String(t, instr)
out = reflect.ValueOf(outstr)
}
return
}
Now, I think I've tried just about anything, but I keep ending up with a series of strings which are far from the Latin range, e.g.:
𥗉똿穊
𢷽嚶
秓䝏小𪖹䮋
𪿝ท솲
𡉪䂾
ʋ𥅮ᦸ
堮𡹯憨𥗼𧵕ꥆ
𢝌𐑮𧍛併怃𥊇
鯮
𣏲𝐒
⓿ꐠ槹𬠂黟
𢼭踁퓺𪇖
俇𣄃𔘧
𢝶
𝖸쩈𤫐𢬿詢𬄙
𫱘𨆟𑊙
欓
So, can anybody explain what I'm overlooking here and how I could instead define a transformer which removes/replaces non-letter/number/latin characters so that I can use the Value function as intended (but with a smaller subset of 'random' characters)?
Thanks!
Confusingly, the Generator interface needs the method defined on the type, not on a pointer to the type. You want your method signature to look like
func (m Metadata) Generate(r *rand.Rand, size int) (value reflect.Value)
You can play with this here. Note: the most important thing to do in that playground is to switch the type of the generate function from m Metadata to m *Metadata and see that Hi Mom! never prints.
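To see the receiver difference without the playground, here is a minimal sketch. It assumes (as far as I can tell from the package source) that testing/quick tests a zero value of the target type against quick.Generator; ByValue, ByPointer and implementsGenerator are hypothetical names made up for this demo.

```go
package main

import (
	"fmt"
	"math/rand"
	"reflect"
	"testing/quick"
)

// ByValue and ByPointer are hypothetical types used only for this demo.
type ByValue struct{ N int }
type ByPointer struct{ N int }

// Value receiver: the method belongs to the method set of ByValue itself.
func (ByValue) Generate(r *rand.Rand, size int) reflect.Value {
	return reflect.ValueOf(ByValue{N: r.Intn(size + 1)})
}

// Pointer receiver: the method belongs only to the method set of *ByPointer.
func (*ByPointer) Generate(r *rand.Rand, size int) reflect.Value {
	return reflect.ValueOf(&ByPointer{N: r.Intn(size + 1)})
}

// implementsGenerator reports whether a zero value of v's type
// satisfies quick.Generator.
func implementsGenerator(v any) bool {
	_, ok := reflect.Zero(reflect.TypeOf(v)).Interface().(quick.Generator)
	return ok
}

func main() {
	fmt.Println(implementsGenerator(ByValue{}))   // true
	fmt.Println(implementsGenerator(ByPointer{})) // false
}
```

With the pointer receiver, a struct value of that type is not a Generator, so the custom Generate method is never picked up for it.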
In addition, I think you would be better served using your own type and writing a generate method for that type using a list of all of the characters you want to use. For example:
type LatinString string
const latin = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
and then use the generator
func (l LatinString) Generate(rand *rand.Rand, size int) reflect.Value {
var buffer bytes.Buffer
for i := 0; i < size; i++ {
buffer.WriteString(string(latin[rand.Intn(len(latin))]))
}
s := LatinString(buffer.String())
return reflect.ValueOf(s)
}
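To show how such a type plugs into the package, here is a small self-contained sketch that runs quick.Check against it. The property onlyLatin and the wrapper checkLatinStrings are hypothetical names of mine, and the generator is restated with strings.Builder so the block compiles on its own.

```go
package main

import (
	"fmt"
	"math/rand"
	"reflect"
	"strings"
	"testing/quick"
)

type LatinString string

const latin = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

// Generate builds a string of exactly size characters drawn from latin.
func (LatinString) Generate(r *rand.Rand, size int) reflect.Value {
	var sb strings.Builder
	for i := 0; i < size; i++ {
		sb.WriteByte(latin[r.Intn(len(latin))])
	}
	return reflect.ValueOf(LatinString(sb.String()))
}

// onlyLatin is a hypothetical property: every generated character
// must come from the latin alphabet above.
func onlyLatin(s LatinString) bool {
	for _, c := range string(s) {
		if !strings.ContainsRune(latin, c) {
			return false
		}
	}
	return true
}

// checkLatinStrings runs the property with testing/quick's default config.
func checkLatinStrings() error {
	return quick.Check(onlyLatin, nil)
}

func main() {
	fmt.Println(checkLatinStrings()) // <nil> when the property holds
}
```

Because the property takes a LatinString argument, quick.Check uses the custom Generate method instead of its default string generator.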
Edit: also this library is pretty cool, thanks for showing it to me
The answer to my own question is, it seems, a combination of the answers provided in the comments by #nj_ and #jimb and the answer provided by #benjaminkadish.
In short, the answer boils down to:
"Not such a great idea as you thought it was", or "Bit of an ill-posed question".
"You were using the union of 'Letter', 'Latin' and 'Number' (Letter || Number || Latin), instead of the intersection of 'Latin' with the union of 'Letter' and 'Number' ((Letter || Number) && Latin)".
Now for the longer version...
The idea behind me using the testing/quick package is that I wanted random data for (fuzzy) testing of my code. In the past, I've always written the code for doing things like that myself, again and again. This meant a lot of the same code across different projects. Now, I could of course have written my own package for it, but it turns out that, even better than that, there's actually a standard package which does just about exactly what I want.
Now, it turns out the package does exactly what I want very well. The codepoints in the strings which it generates are actually random and not just restricted to what we're accustomed to using in everyday life. Now, this is of course exactly the thing which you want in doing fuzzy testing in order to test the code with values outside the usual assumptions.
In practice, that means I'm running into two problems:
There are some limits on what I would consider reasonable input for a string. In testing the processing of a Name field or a URL field, I can reasonably assume there's not going to be a value like 'James Mc⌢' (let alone 'James Mc🙁') or 'www.🕸site.com', but just 'James McFrown' and 'www.website.com'. Hence, I can't expect a reasonable system to support such values. Of course, things shouldn't completely break down, but the system also can't be expected to handle the former examples without any problems.
When I filter the generated string on values which one might consider reasonable, the chance of ending up with a valid string is very small. The set of possible characters in the set used by the testing/quick is just so large (0x10FFFF) and the set of reasonable characters so small, you end up with empty strings most of the time.
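The union-versus-intersection mix-up from the short version can be illustrated with the standard unicode package alone. This is only a sketch with helper names I made up (keepUnion, keepIntersection, filter); it does not use the transform chain from the question.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// keepUnion mirrors Letter || Latin || Number: letters of any script pass.
func keepUnion(r rune) bool {
	return unicode.IsLetter(r) || unicode.Is(unicode.Latin, r) || unicode.IsNumber(r)
}

// keepIntersection mirrors (Letter || Number) && Latin.
func keepIntersection(r rune) bool {
	return (unicode.IsLetter(r) || unicode.IsNumber(r)) && unicode.Is(unicode.Latin, r)
}

// filter drops every rune for which keep returns false.
func filter(s string, keep func(rune) bool) string {
	return strings.Map(func(r rune) rune {
		if keep(r) {
			return r
		}
		return -1 // a negative value tells strings.Map to drop the rune
	}, s)
}

func main() {
	in := "Abʋ鯮솲"
	fmt.Println(filter(in, keepUnion))        // Abʋ鯮솲
	fmt.Println(filter(in, keepIntersection)) // Abʋ
}
```

One caveat with the strict intersection: the ASCII digits belong to the Common script rather than Latin, so they are dropped as well and would have to be allowed explicitly if they are wanted.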
So, what do we need to take away from this?
So, whilst I hoped to use the standard testing/quick package to replace my often repeated code to generate random data for fuzzy testing, it does this so well that it provides data outside the range of what I would consider reasonable for the code to be able to handle. It seems that the choice, in the end, is to:
Either be able to actually handle all fuzzy options, meaning that if somebody's name is 'Arnold 💰💰' ('Arnold Moneybags'), it shouldn't go arse over end. Or...
Use custom/derived types with their own Generator. This means you're going to have to use the derived type instead of the basic type throughout the code. (Comparable to defining a string as wchar_t instead of char in C++ and working with those by default.). Or...
Don't use testing/quick for fuzzy testing, because as soon as you run into a generated string value, you can (and should) get a very random string.
As always, further comments are of course welcome, as it's quite possible I overlooked something.

How to add text plus the text written from a Parameter type C in ABAP?

I am working in an ABAP program and I have a question.
For example, in C#, when we have a string variable, string name;, we may want to fill it with data from a textbox but also add some other text.
For example:
string name = "Hello: " + textBox1.text;
How can I do this in ABAP? How do I add text plus the text written from a parameter of type C?
CONCATENATE and the concatenate operator && will do it as answered by Jagger and vwegert. To do it with string expressions, you use the below where name is the screen field or whatever that has the name in it (it doesn't need to be a field-symbol):
greeting = |Hello: { <name> }|.
String expressions are extremely useful as they can be used to build up complex values without declaring extra variables - e.g. they can be passed directly as function module or method parameters without first being assigned to a local variable.
You can either use the CONCATENATE keyword or -- in newer releases -- string expressions. Be sure to check the online documentation and sample programs available using the transaction ABAPDOCU, it will save you a ton of seemingly basic questions.
The equivalent operator is &&.
So in your case it would be:
name = 'Hello: ' && textBox1->text.

What's the suffix (type character) for "Byte" numeric constants in VB.NET?

Just out of curiosity:
I know I can tell the compiler if I want a value to be interpreted as a certain numeric type, e.g. as Integer (32 bit signed), this way appending an "I" (type character) to the constant value:
Private Function GetTheAnswerAsInteger() As Integer
Return 42I
End Function
There's also "S" for Short, "D" for Decimal, etc.
But what is the suffix for Byte? Hint: it's not the obvious one "B"...
There isn't one. If you need to distinguish between an integer and a byte (e.g. to call an appropriate overload) for a constant, you need to cast.
(The same is true in C#, by the way.)
MSDN provides confirmation:
Byte has no literal type character or
identifier type character.
There's also a list of type characters and literal suffixes.
So, we added binary literals in VB last fall and got similar feedback
from early testers. We did decide to add a suffix for byte for VB. We
settled on SB (for signed byte) and UB (for unsigned byte). The reason
it's not just B and SB is two-fold.
One, the B suffix is ambiguous if you're writing in hexadecimal (what
does 0xFFB mean?) and even if we had a solution for that, or another
character than 'B' ('Y' was considered, F# uses this) no one could
remember whether the default was signed or unsigned - .NET bytes are
unsigned by default so it would make sense to pick B and SB but all
the other suffixes are signed by default so it would be consistent
with other type suffixes to pick B and UB. In the end we went for
unambiguous SB and UB.
-- Anthony D. Green,
https://roslyn.codeplex.com/discussions/542111
It has been integrated into the upcoming VB.NET release, and this is the way it will work:
Public Const MyByte As Byte = 4UB
Public Const MyByte2 As SByte = 4SB
This answer does not really provide a suffix, but it's as close as it gets.
If you define an extension method as
Imports System.Runtime.CompilerServices
Module IntegerExtensions
<Extension()> _
Public Function B(ByVal iNumber As Integer) As Byte
Return Convert.ToByte(iNumber)
End Function
End Module
You can use it like this:
Private Function GetTheAnswerAsByte() As Byte
Return 42.B
End Function
There's no byte literal in .NET.