I'm on this frustrating journey trying to get a specific character from a Swift string. I have an Objective-C function, something like
- ( NSString * ) doIt: ( char ) c
that I want to call from Swift.
This c is eventually passed to a C function in the back end that does the heavy lifting, but that function trips up when c is a no-break space (A0).
Now I have two questions (apologies SO).
I have tried different encodings, especially the ASCII variants, hoping one would convert the no-break space (A0) to a plain space (hex 20, decimal 32). The verdict seems to be that I need to hardcode this, but if there is a failsafe, non-hardcoded way I'd like to hear about it!
I am really struggling with the conversion itself. How do I access a specific character using a specific encoding in Swift?
a) I can use
s.utf8CString[ i ]
but then I am bound to UTF8.
b) I can use something like
let s = "\u{a0}"
let p = UnsafeMutablePointer < CChar >.allocate ( capacity : n )
defer
{
p.deallocate()
}
// Convert to ASCII
NSString ( string : s ).getCString ( p,
maxLength : n,
encoding : CFStringConvertEncodingToNSStringEncoding ( CFStringBuiltInEncodings.ASCII.rawValue ) )
// Hope for 32
let c = p[ i ]
but this seems overkill. The string is converted to NSString to apply the encoding and I need to allocate a pointer, all just to get a single character.
c) Here it seems Swift String's withCString is the man for the job, but I can not even get it to compile. Below is what Xcode's completion gives but even after fiddling with it for a long time I am still stuck.
// How do I use this
// ??
s.withCString ( encodedAs : _UnicodeEncoding.Protocol ) { ( UnsafePointer < FixedWidthInteger & UnsignedInteger > ) -> Result in
// ??
}
TIA
There are two withCString() methods: withCString(_:) calls the given closure with a pointer to the contents of the string, represented as a null-terminated sequence of UTF-8 code units. Example:
// An emulation of your Objective-C method.
func doit(_ c: CChar) {
print(c, terminator: " ")
}
let s = "a\u{A0}b"
s.withCString { ptr in
var p = ptr
while p.pointee != 0 {
doit(p.pointee)
p += 1
}
}
print()
// Output: 97 -62 -96 98
Here -62 -96 is the signed character representation of the UTF-8 sequence C2 A0 of the NO-BREAK SPACE character U+00A0.
If you just want to iterate over all UTF-8 code units of the string sequentially then you can simply use the .utf8 view. The (unsigned) UInt8 bytes must be converted to the corresponding (signed) CChar:
let s = "a\u{A0}b"
for c in s.utf8 {
doit(CChar(bitPattern: c))
}
print()
I am not aware of a method which transforms U+00A0 to a “normal” space character, so you have to do that manually. With
let s = "a\u{A0}b".replacingOccurrences(of: "\u{A0}", with: " ")
the output of the above program would be 97 32 98.
The withCString(encodedAs:_:) method calls the given closure with a pointer to the contents of the string, represented as a null-terminated sequence of code units. Example:
let s = "a\u{A0}b€"
s.withCString(encodedAs: UTF16.self) { ptr in
var p = ptr
while p.pointee != 0 {
print(p.pointee, terminator: " ")
p += 1
}
}
print()
// Output: 97 160 98 8364
This method is probably of limited use for your purpose because it can only be used with UTF8, UTF16 and UTF32.
For other encodings you can use the data(using:) method. It produces a Data value which is a sequence of UInt8 (an unsigned type). As above, these must be converted to the corresponding signed character:
let s = "a\u{A0}b"
if let data = s.data(using: .isoLatin1) {
data.forEach {
doit(CChar(bitPattern: $0))
}
}
print()
// Output: 97 -96 98
Of course this may fail if the string is not representable in the given encoding.
I have tried to find a solution to this problem, but I keep running my head into the wall with this one.
This function is part of a Go SQL wrapper, and getJSON is called to extract the information from the SQL response.
The problem is that the id parameter comes out as gibberish and does not match the desired response. All the other parameters are read correctly, though, so this really puzzles me.
Thank you in advance for any attempt at figuring this out; it is really appreciated :-)
func getJSON(rows *sqlx.Rows) ([]byte, error) {
columns, err := rows.Columns()
rawResult := make([][]byte, len(columns))
dest := make([]interface{}, len(columns))
for i := range rawResult {
dest[i] = &rawResult[i]
}
defer rows.Close()
var results []map[string][]byte
for rows.Next() {
result := make(map[string][]byte, len(columns))
rows.Scan(dest...)
for i, raw := range rawResult {
if raw == nil {
result[columns[i]] = []byte("")
} else {
result[columns[i]] = raw
fmt.Println(columns[i] + " : " + string(raw))
}
}
results = append(results, result)
}
s, err := json.Marshal(results)
if err != nil {
panic(err)
}
rows.Close()
return s, nil
}
An example of the response, taken from the terminal:
id : r�b�X��M���+�2%
name : cat
issub : false
Expected result:
id : E262B172-B158-4DEF-8015-9BA12BF53225
name : cat
issub : false
That's not about type conversion.
A UUID (of any type; presently there are four) is defined to be a 128-bit-long lump of bytes, which is 128/8=16 bytes.
This means any bytes, not necessarily printable.
What you're after is a string representation of a UUID value, which:
- separates certain groups of bytes using dashes;
- formats each byte in these groups using hexadecimal (base-16) representation.
Since base-16 positional count represents values 0 through 15 using a single digit ('0' through 'F'), a single byte is represented by two such digits — a digit per each group of 4 bits.
I think any sensible UUID package should implement a "decoding" function/method which would produce a string representation out of those 16 bytes.
I picked a package more or less at random from a search: github.com/google/uuid has FromBytes, which produces a UUID from a given byte slice, and the type of the resulting value implements the String() method, which produces what you're after.
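For illustration, here is a minimal sketch in Go that uses only the standard library rather than a UUID package. The helper names formatUUID and swapMixedEndian are made up for this example. The byte values are inferred from the printable characters in the terminal output above (r, b, X, M, +, 2, %), which suggests that the first three groups arrive byte-reversed; that is an assumption drawn from the output, not something the question confirms.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// formatUUID renders 16 raw bytes in the 8-4-4-4-12 hexadecimal form.
func formatUUID(b []byte) (string, error) {
	if len(b) != 16 {
		return "", fmt.Errorf("want 16 bytes, got %d", len(b))
	}
	h := strings.ToUpper(hex.EncodeToString(b)) // 32 hex digits
	return strings.Join([]string{h[0:8], h[8:12], h[12:16], h[16:20], h[20:32]}, "-"), nil
}

// swapMixedEndian returns a copy with the first three groups byte-reversed.
// Hypothetical helper: only needed if the driver hands back that layout.
func swapMixedEndian(b []byte) []byte {
	out := append([]byte(nil), b...)
	if len(out) < 8 {
		return out
	}
	out[0], out[1], out[2], out[3] = b[3], b[2], b[1], b[0]
	out[4], out[5] = b[5], b[4]
	out[6], out[7] = b[7], b[6]
	return out
}

func main() {
	// Assumed raw column value, reconstructed from the garbled terminal output.
	raw := []byte{0x72, 0xB1, 0x62, 0xE2, 0x58, 0xB1, 0xEF, 0x4D,
		0x80, 0x15, 0x9B, 0xA1, 0x2B, 0xF5, 0x32, 0x25}
	asIs, _ := formatUUID(raw)
	swapped, _ := formatUUID(swapMixedEndian(raw))
	fmt.Println(asIs)    // 72B162E2-58B1-EF4D-8015-9BA12BF53225
	fmt.Println(swapped) // E262B172-B158-4DEF-8015-9BA12BF53225
}
```

If the plain formatting already matches what the database shows, the swap is not needed; which of the two applies depends on the database and driver in use.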
This is an extension of the following question:
Size of Serialized data is not reducing using flatbuffer
As mentioned in the answer there, we should use a struct to reduce space. In my case I need to define an IDL file for a polygon.
Each polygon will have five or more points, and I will have another data structure that holds
an array of polygons.
I have defined my fbs file as follows:
namespace MyFlat;
struct Vertices {
x : double;
y :double;
}
table Polygon {
polygons : [Vertices];
}
table Layer {
polygons : [Polygon];
}
root_type Layer;
As expected, my serialized data comes out quite large with this. Is there any way to optimize the padding in the tables to reduce the serialized buffer size?
There's no need to further optimize the structure of your data here, since >90% of the size of these buffers will typically be taken up by Vertices.
One thing to consider is using float for x and y, given that you're unlikely to need the extra resolution; that would almost halve the size of your buffer.
Thanks for your answer. But when I print the size of 100 polygons with 5 vertices each, it comes to around 10.24 KB. Ideally the size should be around 8000 bytes (8 KB).
b := flatbuffers.NewBuilder(0)
var polyoffset []flatbuffers.UOffsetT
size := 100
StartedAtMarshal := time.Now()
for k := 0; k < size; k++ {
MyFlat.PolygonStartPolygonsVector(b, 5)
for i := 0; i < 5; i++ {
MyFlat.CreateVertices(b, 2.0, 2.4)
}
vec := b.EndVector(5)
MyFlat.PolygonStart(b)
MyFlat.PolygonAddPolygons(b, vec)
polyoffset = append(polyoffset, MyFlat.PolygonEnd(b))
}
MyFlat.LayerStartPolygonsVector(b, size)
for _, offset := range polyoffset {
b.PrependUOffsetT(offset)
}
vec := b.EndVector(size)
MyFlat.LayerStart(b)
MyFlat.LayerAddPolygons(b, vec)
finalOffset := MyFlat.LayerEnd(b)
b.Finish(finalOffset)
EndedAtMarshal := time.Now()
fmt.Println("Elapes Time for Seri", EndedAtMarshal.Sub(StartedAtMarshal).String())
mybyte := b.FinishedBytes()
fmt.Println(len(mybyte))
Is this the expected size, or is my implementation wrong?
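The 8000 bytes only account for the coordinates themselves. A rough lower bound on the bookkeeping can be sketched in Go as follows. The function names are made up for this estimate, and it deliberately ignores alignment padding, the shared vtables and the root Layer table, so the real buffer will be somewhat larger than the number it prints.

```go
package main

import "fmt"

const (
	vertexBytes  = 16 // struct Vertices: two doubles stored inline
	offsetBytes  = 4  // a 32-bit FlatBuffers offset
	lengthPrefix = 4  // every vector starts with a 32-bit element count
)

// vertexPayload is the size of the raw coordinate data alone.
func vertexPayload(polygons, verticesPerPolygon int) int {
	return polygons * verticesPerPolygon * vertexBytes
}

// minOverhead counts bookkeeping that cannot be avoided with this schema.
// It is a lower bound, not an exact size.
func minOverhead(polygons int) int {
	perPolygon := lengthPrefix + // element count of the vertex vector
		offsetBytes + // Polygon table: offset to its vtable
		offsetBytes // Polygon table: offset to the vertex vector
	layerVector := lengthPrefix + polygons*offsetBytes // one offset per Polygon
	return polygons*perPolygon + layerVector
}

func main() {
	payload := vertexPayload(100, 5)
	overhead := minOverhead(100)
	fmt.Println(payload, overhead, payload+overhead) // 8000 1604 9604
}
```

That is roughly 9.6 KB before any padding, so a figure a little above 10 KB looks like it is in the expected range rather than a sign of a broken implementation.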
I have the following questions regarding BLOBs in sqlite:
Does sqlite keep track of sizes of BLOBs?
I'm guessing that it does, but then, does the length function use it, or does it read the BLOB's content?
If sqlite keeps track of the size of the BLOB and length doesn't use it, is the size accessible via some other functionality?
I'm asking this because I'm wondering if I should implement triggers that set BLOBs' sizes in additional columns, or if I can obtain the sizes dynamically without the performance hit of sqlite reading the BLOBs.
From the source:
** In an SQLite index record, the serial type is stored directly before
** the blob of data that it corresponds to. In a table record, all serial
** types are stored at the start of the record, and the blobs of data at
** the end. Hence these functions allow the caller to handle the
** serial-type and data blob seperately.
**
** The following table describes the various storage classes for data:
**
**   serial type        bytes of data      type
**   --------------     ---------------    ---------------
**      0                     0            NULL
**      1                     1            signed integer
**      2                     2            signed integer
**      3                     3            signed integer
**      4                     4            signed integer
**      5                     6            signed integer
**      6                     8            signed integer
**      7                     8            IEEE float
**      8                     0            Integer constant 0
**      9                     0            Integer constant 1
**     10,11                               reserved for expansion
**    N>=12 and even       (N-12)/2        BLOB
**    N>=13 and odd        (N-13)/2        text
In other words, the blob size is encoded in the serial type, and its length is simply "(serial_type-12)/2".
The serial type is stored before the actual blob, so you don't need to read the blob to get its size.
Call sqlite3_blob_open and then sqlite3_blob_bytes to get this value.
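The table from the source comment can be turned into a small decoder. This is only an illustrative sketch in Go of the arithmetic, not SQLite's own code; decodeSerialType and blobSerialType are names invented for the example.

```go
package main

import "fmt"

// decodeSerialType maps a record serial type to its storage class and the
// number of payload bytes, following the table quoted above.
func decodeSerialType(n int64) (class string, size int64) {
	switch {
	case n == 0:
		return "NULL", 0
	case n >= 1 && n <= 4:
		return "integer", n
	case n == 5:
		return "integer", 6
	case n == 6:
		return "integer", 8
	case n == 7:
		return "float", 8
	case n == 8 || n == 9:
		return "integer constant", 0
	case n == 10 || n == 11:
		return "reserved", 0
	case n >= 12 && n%2 == 0:
		return "BLOB", (n - 12) / 2
	case n >= 13:
		return "text", (n - 13) / 2
	default:
		return "invalid", 0 // negative values are not in the table
	}
}

// blobSerialType is the inverse mapping for a BLOB of the given size.
func blobSerialType(size int64) int64 { return size*2 + 12 }

func main() {
	class, size := decodeSerialType(blobSerialType(10000))
	fmt.Println(class, size) // BLOB 10000
}
```

The point is that the size falls out of a single integer in the record header, with no need to touch the payload bytes.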
Write a 1-byte blob and a very large one (hundreds of megabytes; SQLite's default limit is about 1 GB per blob) into a test database. If length() takes the same time for both, the stored size is probably being used. Otherwise the blob content is probably being read.
OR: download the source code and debug through it: http://www.sqlite.org/download.html. These are some relevant bits:
/*
** Implementation of the length() function
*/
static void lengthFunc(
sqlite3_context *context,
int argc,
sqlite3_value **argv
){
int len;
assert( argc==1 );
UNUSED_PARAMETER(argc);
switch( sqlite3_value_type(argv[0]) ){
case SQLITE_BLOB:
case SQLITE_INTEGER:
case SQLITE_FLOAT: {
sqlite3_result_int(context, sqlite3_value_bytes(argv[0]));
break;
}
case SQLITE_TEXT: {
const unsigned char *z = sqlite3_value_text(argv[0]);
if( z==0 ) return;
len = 0;
while( *z ){
len++;
SQLITE_SKIP_UTF8(z);
}
sqlite3_result_int(context, len);
break;
}
default: {
sqlite3_result_null(context);
break;
}
}
}
and then
/*
** Return the number of bytes in the sqlite3_value object assuming
** that it uses the encoding "enc"
*/
SQLITE_PRIVATE int sqlite3ValueBytes(sqlite3_value *pVal, u8 enc){
Mem *p = (Mem*)pVal;
if( (p->flags & MEM_Blob)!=0 || sqlite3ValueText(pVal, enc) ){
if( p->flags & MEM_Zero ){
return p->n + p->u.nZero;
}else{
return p->n;
}
}
return 0;
}
You can see that the length of text data is calculated on the fly. That of blobs... well, I'm not fluent enough in C... :-)
If you have access to the raw C API, sqlite3_blob_bytes will do the job for you. If not, please provide additional information.
I was recently asked to complete a task for a C++ role. Since the application did not progress any further, I thought I would post it here for some feedback, advice, improvements, and reminders of concepts I've forgotten.
The task was:
The following data is a time series of integer values
int timeseries[32] = {67497, 67376, 67173, 67235, 67057, 67031, 66951,
66974, 67042, 67025, 66897, 67077, 67082, 67033, 67019, 67149, 67044,
67012, 67220, 67239, 66893, 66984, 66866, 66693, 66770, 66722, 66620,
66579, 66596, 66713, 66852, 66715};
The series might be, for example, the closing price of a stock each day
over a 32 day period.
As stored above, the data will occupy 32 x sizeof(int) bytes = 128 bytes
assuming 4 byte ints.
Using delta encoding , write a function to compress, and a function to
uncompress data like the above.
OK, before this I had never looked at compression, so my solution is far from perfect. I approached the problem by compressing the array of integers into an array of byte arrays. For each delta I calculate the position of the most significant non-zero byte (msb), keep every byte up to that point, and throw the rest away. The kept bytes are then added to the byte array. For negative values I increment the msb by 1 so that we can differentiate between positive and negative values when decoding, by keeping the leading 1 bits.
When decoding I parse this jagged byte array and simply reverse the steps performed when compressing. Since I had never looked at compression prior to this task, I came up with my own method to compress the data. I had been looking at C++/CLI recently without having really used it before, so I decided to write it in that language for no particular reason. Below is the class, with a unit test at the very bottom. Any advice, improvements, or enhancements will be much appreciated.
Thanks.
array<array<Byte>^>^ CDeltaEncoding::CompressArray(array<int>^ data)
{
int temp = 0;
int original;
int size = 0;
array<int>^ tempData = gcnew array<int>(data->Length);
data->CopyTo(tempData, 0);
array<array<Byte>^>^ byteArray = gcnew array<array<Byte>^>(tempData->Length);
for (int i = 0; i < tempData->Length; ++i)
{
original = tempData[i];
tempData[i] -= temp;
temp = original;
int msb = GetMostSignificantByte(tempData[i]);
byteArray[i] = gcnew array<Byte>(msb);
System::Buffer::BlockCopy(BitConverter::GetBytes(tempData[i]), 0, byteArray[i], 0, msb );
size += byteArray[i]->Length;
}
return byteArray;
}
array<int>^ CDeltaEncoding::DecompressArray(array<array<Byte>^>^ buffer)
{
System::Collections::Generic::List<int>^ decodedArray = gcnew System::Collections::Generic::List<int>();
int temp = 0;
for (int i = 0; i < buffer->Length; ++i)
{
int retrievedVal = GetValueAsInteger(buffer[i]);
decodedArray->Add(retrievedVal);
decodedArray[i] += temp;
temp = decodedArray[i];
}
return decodedArray->ToArray();
}
int CDeltaEncoding::GetMostSignificantByte(int value)
{
array<Byte>^ tempBuf = BitConverter::GetBytes(Math::Abs(value));
int msb = tempBuf->Length;
for (int i = tempBuf->Length -1; i >= 0; --i)
{
if (tempBuf[i] != 0)
{
msb = i + 1;
break;
}
}
if (!IsPositiveInteger(value))
{
//We need an extra byte to differentiate the negative integers
msb++;
}
return msb;
}
bool CDeltaEncoding::IsPositiveInteger(int value)
{
//Treat zero as non-negative; value / Math::Abs(value) divides by zero when value == 0
return value >= 0;
}
int CDeltaEncoding::GetValueAsInteger(array<Byte>^ buffer)
{
array<Byte>^ tempBuf;
if(buffer->Length % 2 == 0)
{
//With even integers there is no need to allocate a new byte array
tempBuf = buffer;
}
else
{
tempBuf = gcnew array<Byte>(4);
System::Buffer::BlockCopy(buffer, 0, tempBuf, 0, buffer->Length );
unsigned int val = buffer[buffer->Length-1] &= 0xFF;
if ( val == 0xFF )
{
//We have negative integer compressed into 3 bytes
//Copy over the this last byte as well so we keep the negative pattern
System::Buffer::BlockCopy(buffer, buffer->Length-1, tempBuf, buffer->Length, 1 );
}
}
switch(tempBuf->Length)
{
case sizeof(short):
return BitConverter::ToInt16(tempBuf,0);
case sizeof(int):
default:
return BitConverter::ToInt32(tempBuf,0);
}
}
And then in a test class I had:
void CTestDeltaEncoding::TestCompression()
{
array<array<Byte>^>^ byteArray = CDeltaEncoding::CompressArray(m_testdata);
array<int>^ decompressedArray = CDeltaEncoding::DecompressArray(byteArray);
int totalBytes = 0;
for (int i = 0; i<byteArray->Length; i++)
{
totalBytes += byteArray[i]->Length;
}
//sizeof(int) is the size of one element; sizeof(m_testdata) would be the size of the handle
Assert::IsTrue(m_testdata->Length * sizeof(int) > totalBytes, "Expected the total bytes to be less than the original array!!");
//Expected totalBytes = 53
}
This smells a lot like homework to me. The crucial phrase is: "Using delta encoding."
Delta encoding means you encode the delta (difference) between each number and the next:
67497, 67376, 67173, 67235, 67057, 67031, 66951, 66974, 67042, 67025, 66897, 67077, 67082, 67033, 67019, 67149, 67044, 67012, 67220, 67239, 66893, 66984, 66866, 66693, 66770, 66722, 66620, 66579, 66596, 66713, 66852, 66715
would turn into:
[Base: 67497]: -121, -203, +62
and so on. Assuming 8-bit bytes, the original numbers require 3 bytes apiece (and given how few compilers offer a 3-byte integer type, you're normally going to end up with 4 bytes apiece). From the looks of things, the differences will fit quite easily in 2 bytes apiece, and if you can ignore one (or possibly two) of the least significant bits, you can fit them in one byte apiece.
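As a minimal sketch of the lossless two-byte variant, here is one way it could look in Go. The type and function names (deltaEncoded, compress, decompress) are invented for the example, and the choice of a 32-bit base plus 16-bit deltas is an assumption that happens to fit this series.

```go
package main

import (
	"fmt"
	"math"
)

// deltaEncoded holds the first sample in full and every later sample as a
// 16-bit difference from its predecessor.
type deltaEncoded struct {
	Base   int32
	Deltas []int16
}

// compress fails rather than truncates if a difference needs more than 16 bits.
func compress(series []int32) (deltaEncoded, error) {
	if len(series) == 0 {
		return deltaEncoded{}, fmt.Errorf("empty series")
	}
	enc := deltaEncoded{Base: series[0], Deltas: make([]int16, 0, len(series)-1)}
	for i := 1; i < len(series); i++ {
		d := int64(series[i]) - int64(series[i-1])
		if d < math.MinInt16 || d > math.MaxInt16 {
			return deltaEncoded{}, fmt.Errorf("delta %d at index %d does not fit in 16 bits", d, i)
		}
		enc.Deltas = append(enc.Deltas, int16(d))
	}
	return enc, nil
}

// decompress rebuilds the series by accumulating the deltas onto the base.
func decompress(enc deltaEncoded) []int32 {
	out := make([]int32, 0, len(enc.Deltas)+1)
	cur := enc.Base
	out = append(out, cur)
	for _, d := range enc.Deltas {
		cur += int32(d)
		out = append(out, cur)
	}
	return out
}

func main() {
	series := []int32{67497, 67376, 67173, 67235, 67057, 67031, 66951,
		66974, 67042, 67025, 66897, 67077, 67082, 67033, 67019, 67149, 67044,
		67012, 67220, 67239, 66893, 66984, 66866, 66693, 66770, 66722, 66620,
		66579, 66596, 66713, 66852, 66715}
	enc, err := compress(series)
	if err != nil {
		panic(err)
	}
	fmt.Println(enc.Deltas[:3])                    // [-121 -203 62]
	fmt.Println(4 + 2*len(enc.Deltas))             // 66 bytes, versus 128 for 32 four-byte ints
	fmt.Println(decompress(enc)[31] == series[31]) // true
}
```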
Delta encoding is most often used for things like sound encoding where you can "fudge" the accuracy at times without major problems. For example, if you have a change from one sample to the next that's larger than you've left space to encode, you can encode a maximum change in the current difference, and add the difference to the next delta (and if you don't mind some back-tracking, you can distribute some to the previous delta as well). This will act as a low-pass filter, limiting the gradient between samples.
For example, in the series you gave, a simple delta encoding requires ten bits to represent all the differences. By dropping the LSB, however, nearly all the samples (all but one, in fact) can be encoded in 8 bits. That one has a difference (right shifted one bit) of -173, so if we represent it as -128, we have 45 left. We can distribute that error evenly between the preceding and following sample. In that case, the output won't be an exact match for the input, but if we're talking about something like sound, the difference probably won't be particularly obvious.
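A rough Go sketch of that lossy one-byte scheme follows. It is a simplification: the leftover error is carried forward only (each step is measured against the value the decoder will actually have), rather than split between the preceding and following samples as described above. The names encodeLossy and decodeLossy are made up for the example.

```go
package main

import "fmt"

// encodeLossy stores each step as one signed byte holding half the difference
// between the next sample and the value the decoder will have reconstructed.
// Any error from rounding or clamping is picked up by the following step.
func encodeLossy(series []int32) (base int32, codes []int8) {
	if len(series) == 0 {
		return 0, nil
	}
	base = series[0]
	recon := base
	for _, target := range series[1:] {
		half := (target - recon) / 2 // drop the least significant bit
		if half > 127 {
			half = 127
		}
		if half < -128 {
			half = -128
		}
		codes = append(codes, int8(half))
		recon += 2 * half
	}
	return base, codes
}

// decodeLossy replays the halved steps, doubling each one.
func decodeLossy(base int32, codes []int8) []int32 {
	out := []int32{base}
	cur := base
	for _, c := range codes {
		cur += 2 * int32(c)
		out = append(out, cur)
	}
	return out
}

func main() {
	series := []int32{67497, 67376, 67173, 67235, 67057, 67031, 66951,
		66974, 67042, 67025, 66897, 67077, 67082, 67033, 67019, 67149, 67044,
		67012, 67220, 67239, 66893, 66984, 66866, 66693, 66770, 66722, 66620,
		66579, 66596, 66713, 66852, 66715}
	base, codes := encodeLossy(series)
	out := decodeLossy(base, codes)
	for i := range series {
		fmt.Println(series[i], out[i], series[i]-out[i]) // original, decoded, error
	}
}
```

With this series only the step into 66893 hits the clamp, and the step after it absorbs the leftover, so the decoded values drift away from the input at that one sample and then rejoin it.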
I did mention that it was an exercise I had to complete. The solution I submitted was deemed not good enough, so I wanted some constructive feedback, seeing as actual companies never tell you what you did wrong.
When the array is compressed I store the differences rather than the original values, except for the first one, as this was my understanding of delta encoding. If you look at my code, I have provided a full solution; my question was how bad it is.