How to configure Serdes with different value data formats?

I have the following code:
private Properties getStreamProperties(String suffix) {
    Properties streamsConfiguration = new Properties();
    streamsConfiguration.put(StreamsConfig.APPLICATION_ID_CONFIG, groupId + "-" + suffix);
    streamsConfiguration.put(StreamsConfig.CLIENT_ID_CONFIG, applicationId + "-" + suffix);
    streamsConfiguration.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
    streamsConfiguration.put(AbstractKafkaSchemaSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, schemaRegistryUrl);
    // Specify default (de)serializers for record keys and for record values.
    streamsConfiguration.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    streamsConfiguration.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass().getName());
    streamsConfiguration.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
    streamsConfiguration.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 10 * 1000);
    return streamsConfiguration;
}
But my values consist of the types string, double and long - not just string. How can I configure the properties to read all of these types? Currently, in the messages being produced I can see all the values pushed together as one string value rather than each having its own field.

You'd need to look at your producer code to see what formats are really being sent. StringSerializer will always toString the data it is given, and you can expect Serdes.String() to behave the same way on the consuming side (if it consumes data produced by an IntegerSerializer, for example, you will get strings back rather than integers).
If you really want to mix types, use JSON/Avro/Protobuf serializers/deserializers consistently in all components of your code.
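As a concrete illustration, one option with the built-in serdes is to keep string keys and declare the value serde per stream instead of relying on the default value serde. This is only a minimal sketch, not your actual topology; the topic names and value types below are assumptions. If the string, double and long values are really fields of a single record, then a JSON/Avro/Protobuf serde for that record class, as suggested above, is the better fit.
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;

public class MixedValueTypesTopology {

    // Keys stay strings; each stream declares the value serde matching the
    // data on its topic, overriding DEFAULT_VALUE_SERDE_CLASS_CONFIG locally.
    static Topology build() {
        StreamsBuilder builder = new StreamsBuilder();

        KStream<String, String> names =
                builder.stream("names-topic", Consumed.with(Serdes.String(), Serdes.String()));
        KStream<String, Double> prices =
                builder.stream("prices-topic", Consumed.with(Serdes.String(), Serdes.Double()));
        KStream<String, Long> counts =
                builder.stream("counts-topic", Consumed.with(Serdes.String(), Serdes.Long()));

        // ... build the rest of the topology from these typed streams ...

        return builder.build();
    }
}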

Related

Finding best delimiter by size of resulting array after split Kotlin

I am trying to obtain the best delimiter for my CSV file; I've seen answers that find the delimiter giving the biggest header row. Instead of the standard method, which would look something like this:
val supportedDelimiters: Array<Char> = arrayOf(',', ';', '|', '\t')
fun determineDelimiter(headerRow: String): Char {
    var headerLength = 0
    var chosenDelimiter = ' '
    supportedDelimiters.forEach {
        if (headerRow.split(it).size > headerLength) {
            headerLength = headerRow.split(it).size
            chosenDelimiter = it
        }
    }
    return chosenDelimiter
}
I've been trying to do it with some in-built Kotlin collections methods like filter or maxOf, but to no avail (the code below does not work).
fun determineDelimiter(headerRow: String): Char {
return supportedDelimiters.filter({a,b -> headerRow.split(a).size < headerRow.split(b)})
}
Is there any way I could do it without forEach?
Edit: The header row could look something like this:
val headerRow = "I;am;delimited;with;'semi,colon'"
I put the quotes around an entry that could contain another potential delimiter.
You're mostly there, but this seems simpler than you think!
Here's one answer:
fun determineDelimiter(headerRow: String)
    = supportedDelimiters.maxByOrNull { headerRow.split(it).size } ?: ' '
maxByOrNull() does all the hard work: you just tell it the number of headers that a delimiter would give, and it searches through each delimiter to find which one gives the largest number.
It returns null if the list is empty, so the method above returns a space character, like your standard method. (In this case we know that the list isn't empty, so you could replace the ?: ' ' with !! if you wanted that impossible case to give an error, or you could drop it entirely if you wanted it to give a null which would be handled elsewhere.)
As mentioned in a comment, there's no foolproof way to guess the CSV delimiter in general, and so you should be prepared for it to pick the wrong delimiter occasionally. For example, if the intended delimiter was a semicolon but several headers included commas, it could wrongly pick the comma. Without knowing any more about the data, there's no way around that.
With the code as it stands, there could be multiple delimiters which give the same number of headers; it would simply pick the first. You might want to give an error in that case, and require that there's a unique best delimiter. That would give you a little more confidence that you've picked the right one — though there's still no guarantee. (That's not so easy to code, though…)
Just like gidds said in the comment above, I would advise against choosing the delimiter based on how many times each delimiter appears. You would get the wrong answer for a header row like this:
Type of shoe, regardless of colour, even if black;Size of shoe, regardless of shape
In the above header row, the delimiter is obviously ; but your method would erroneously pick ,.
Another problem is that a header column may itself contain a delimiter, if it is enclosed in quotes. Your method doesn't take any notice of possible quoted columns. For this reason, I would recommend that you give up trying to parse CSV files yourself, and instead use one of the many available Open Source CSV parsers.
Nevertheless, if you still want to know how to pick the delimiter based on its frequency, there are a few readability improvements you can make.
First, note that Kotlin strings are iterable; therefore you don't have to use a List of Char. Use a String instead.
Secondly, all you're doing is counting the number of times a character appears in the string, so there's no need to break the string up into pieces just to do that. Instead, count the number of characters directly.
Third, instead of finding the maximum value by hand, take advantage of what the standard library already offers you.
const val supportedDelimiters = ",;|\t"

fun determineDelimiter(headerRow: String): Char =
    supportedDelimiters.maxBy { delimiter -> headerRow.count { it == delimiter } }

fun main() {
    val headerRow = "one,two,three;four,five|six|seven"
    val chosenDelimiter = determineDelimiter(headerRow)
    println(chosenDelimiter) // prints ',' as expected
}

Model binding fails in webapi 2.0

I am using Web API 2.0. I am passing one parameter whose value is vb/c4t+UuRLnQ2W/g8SQ==. After model binding, the value of authId in my code is vb/c4t UuRLnQ2W/g8SQ==; the (+) sign gets replaced with a space. Could you please help me figure out how to get the original value?
Url: api/employee/1234?authId=vb/c4t+UuRLnQ2W/g8SQ==
[HttpGet]
public IHttpActionResult Get(string eid, string authId)
{
}
The + sign has a special meaning in the query string: it is used to represent a space. Another character with semantic importance in the query string is &, which separates the various var=value pairs.
Most server-side frameworks decode the query parameters before using them, so that a + gets properly converted to a space. If you want a literal + to be present in the query string, you need to specify %2B instead.
Example: yourString.replace("+", "%2B")
Alternative method: URL-encode your query string values to make sure you are not losing any content.
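If the caller happens to be Java, for example, a minimal sketch of that encoding step could look like this (the value is the one from the question; the class name is made up for illustration):
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EncodeAuthId {
    public static void main(String[] args) {
        String authId = "vb/c4t+UuRLnQ2W/g8SQ==";
        // URLEncoder turns '+' into %2B (and '/' into %2F, '=' into %3D),
        // so the value survives the trip through the query string intact.
        String encoded = URLEncoder.encode(authId, StandardCharsets.UTF_8);
        System.out.println(encoded); // vb%2Fc4t%2BUuRLnQ2W%2Fg8SQ%3D%3D
    }
}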
Another alternative is to invent your own code for the + sign: for example, 12sfdhjsj8722nsn2232dfsdd could represent a + sign. You replace the + sign with that code before sending, and on the server side you convert it back using the same code.

Knex, How do I query for strings instead of objects?

I have the following query for a PostgreSQL database using knex:
knex('mytable').select('name').then(function(rows) {
    console.log(rows[1].name);
    var a = "Test";
    var b = rows[1].name;
    console.log(a + " " + b);
});
The query works; however, the rows[1].name value is an... object of some kind, which looks like {"value"} instead of simply a string containing the value 'value'.
My question here is: am I doing something "wrong"? Generally speaking, are we supposed to work with these kinds of values when using SQL databases rather than plain old strings? If so, how exactly should I treat these objects (say, if I wished to display the value inside of it on an HTML page)?
Furthermore, if I am to convert this object to a string, is there a knex function that allows me to do so? (Obviously I could do it using plain JS and substr, but I'd think that would be rather inefficient, and possibly not "the right way" to do such a thing.)

Java: StringTokenizer does not respect separator

I have the following code that extracts tab-separated strings into a string array:
static public List<String> getContents(File aFile, String separator){
    // all strings, split based on separator
    List<String> contentList = new ArrayList<String>();
    StringTokenizer tokenizer = new StringTokenizer(Util.getContents(aFile), separator);
    while (tokenizer.hasMoreTokens()){
        contentList.add(tokenizer.nextToken());
    }
    return contentList;
}
The separator in this case is therefore a "\t".
As long as two strings are separated by one tab, everything is great. However, my dataset sometimes has two strings separated by two tabs. This means that one parameter is missing and an empty string should be added to the list. However, the method ignores that and just returns an array with one string less.
In my particular case, I always want an array of 5 strings back. That means a line containing only 4 tabs and no text should return an array of 5 empty strings (this is needed for a parsing job based on that). Unfortunately, I have no control over the content, and I am working with millions of files that are generated out of my control.
Is there a better way to do this with StringTokenizer? Or do I have to implement something on my own?
Here some examples:
String ok = a\tb\tc\td\te
String nok = a\tb\tc\t\te
Ralf
Found this: How to split a string in Java
and that I can do it with
"myString".split("\t", -1);
to obtain the empty strings if there are multiple separators clustering in one place.
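For reference, a small sketch showing the difference on the two sample lines from above:
public class SplitDemo {
    public static void main(String[] args) {
        String ok  = "a\tb\tc\td\te";
        String nok = "a\tb\tc\t\te"; // the field between c and e is missing

        // A negative limit keeps empty and trailing tokens,
        // so both lines come back as exactly 5 elements.
        System.out.println(ok.split("\t", -1).length);        // 5
        System.out.println(nok.split("\t", -1).length);       // 5
        System.out.println(nok.split("\t", -1)[3].isEmpty()); // true
    }
}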
Thanks anyway!

Determining whether a column is an encryption key or plain text

We have a column of type varchar(25) in a SQL Server table that mistakenly had plain text values inserted when they should have been encrypted with AES. We are going to remove the plain text values from the database. The plan was to verify the block size of the field, though this would still leave some unencrypted values behind. Are there any other criteria I can check to reliably identify valid encrypted data?
We need it to be a T-SQL only solution.
Update
Just dug a little deeper: the values come back from a web service, which encrypts them using AES in ASP.NET. It takes the returned byte array and then uses this method to convert the byte array to a string:
// Encodes each byte as a zero-padded three-digit decimal string,
// e.g. { 4, 43, 200 } becomes "004043200".
static public string ByteArrToString(byte[] byteArr)
{
    byte val;
    string tempStr = "";
    for (int i = 0; i <= byteArr.GetUpperBound(0); i++)
    {
        val = byteArr[i];
        if (val < (byte)10)
            tempStr += "00" + val.ToString();
        else if (val < (byte)100)
            tempStr += "0" + val.ToString();
        else
            tempStr += val.ToString();
    }
    return tempStr;
}
For clarity, I should say I did not originally write this code!
Cheers
Not really, especially since the encoding method doesn't look standard to me. It is more common to Base64-encode the data, which makes it very distinctive. How easily you can tell encrypted from unencrypted data really depends on what the unencrypted data consists of: is it words or numbers, does it contain spaces, etc. (the encoded data, for instance, contains no spaces).
It looks like your encoded data will all be numeric, represented as a string, so depending on the length of the data you could check whether the column value casts to a BIGINT.
I'm not sure of the best way off the top of my head, but there is an answer that might help you "try cast" in T-SQL: StackOverflow-8453861
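If it helps to see the criterion spelled out, here is a small sketch (in Java rather than T-SQL, purely to make the logic explicit; the stricter triplet test is an assumption derived from the ByteArrToString method above, and short numeric plain-text values could still slip through):
public class EncodedValueCheck {

    // A value produced by ByteArrToString is all digits, its length is a
    // multiple of 3, and every 3-digit group is within the byte range 0..255.
    static boolean looksEncoded(String value) {
        if (value == null || value.isEmpty() || value.length() % 3 != 0) {
            return false;
        }
        for (int i = 0; i < value.length(); i += 3) {
            String group = value.substring(i, i + 3);
            if (!group.chars().allMatch(Character::isDigit)) {
                return false;
            }
            if (Integer.parseInt(group) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksEncoded("004043200")); // true  - three valid byte triplets
        System.out.println(looksEncoded("plaintext")); // false - not numeric
        System.out.println(looksEncoded("004999200")); // false - 999 is not a byte value
    }
}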