Good way to format data in a large DataTable - vb.net

I have a large data.DataTable and some formatting rules to apply. I'm sure this is not a unique problem.
For example, the LASTNAME column has a value of "Jones" but my formatting rule requires it be 10 characters padded with spaces on the right and uppercase only. Like: "JONES "
My initial thought is to loop through each row and generate a string. But, I wonder if I could accomplish this more efficiently with a DataView, LINQ or something else.
Can someone point me in a direction?

It really depends how you display the results. I would say if you display it in a grid, the easiest would be to do a quick loop, no real performance harm there in a datatable.
If you display the records individually you can create an extension method for your string, and simply call it like this for example. LastName.Padded()
public static class StringExtensions
{
public static string Padded(this string s)
{
return s.ToUpper().PadRight(10);
}
}

Related

Finding best delimiter by size of resulting array after split Kotlin

I am trying to obtain the best delimiter for my CSV file, I've seen answers that find the biggest size of the header row. Now instead of doing the standard method that would look something like this:
val supportedDelimiters: Array<Char> = arrayOf(',', ';', '|', '\t')
fun determineDelimiter(headerRow): Char {
var headerLength = 0
var chosenDelimiter =' '
supportedDelimiters.forEach {
if (headerRow.split(it).size > headerLength) {
headerLength = headerRow.split(it).size
chosenDelimiter = it
}
}
return chosenDelimiter
}
I've been trying to do it with some in-built Kotlin collections methods like filter or maxOf, but to no avail (the code below does not work).
fun determineDelimiter(headerRow: String): Char {
return supportedDelimiters.filter({a,b -> headerRow.split(a).size < headerRow.split(b)})
}
Is there any way I could do it without forEach?
Edit: The header row could look something like this:
val headerRow = "I;am;delimited;with;'semi,colon'"
I put the '' over an entry that could contain other potential delimiter
You're mostly there, but this seems simpler than you think!
Here's one answer:
fun determineDelimiter(headerRow: String)
= supportedDelimiters.maxByOrNull{ headerRow.split(it).size } ?: ' '
maxByOrNull() does all the hard work: you just tell it the number of headers that a delimiter would give, and it searches through each delimiter to find which one gives the largest number.
It returns null if the list is empty, so the method above returns a space character, like your standard method. (In this case we know that the list isn't empty, so you could replace the ?: ' ' with !! if you wanted that impossible case to give an error, or you could drop it entirely if you wanted it to give a null which would be handled elsewhere.)
As mentioned in a comment, there's no foolproof way to guess the CSV delimiter in general, and so you should be prepared for it to pick the wrong delimiter occasionally. For example, if the intended delimiter was a semicolon but several headers included commas, it could wrongly pick the comma. Without knowing any more about the data, there's no way around that.
With the code as it stands, there could be multiple delimiters which give the same number of headers; it would simply pick the first. You might want to give an error in that case, and require that there's a unique best delimiter. That would give you a little more confidence that you've picked the right one — though there's still no guarantee. (That's not so easy to code, though…)
Just like gidds said in the comment above, I would advise against choosing the delimiter based on how many times each delimiter appears. You would get the wrong answer for a header row like this:
Type of shoe, regardless of colour, even if black;Size of shoe, regardless of shape
In the above header row, the delimiter is obviously ; but your method would erroneously pick ,.
Another problem is that a header column may itself contain a delimiter, if it is enclosed in quotes. Your method doesn't take any notice of possible quoted columns. For this reason, I would recommend that you give up trying to parse CSV files yourself, and instead use one of the many available Open Source CSV parsers.
Nevertheless, if you still want to know how to pick the delimiter based on its frequency, there are a few optimizations to readability that you can make.
First, note that Kotlin strings are iterable; therefore you don't have to use a List of Char. Use a String instead.
Secondly, all you're doing is counting the number of times a character appears in the string, so there's no need to break the string up into pieces just to do that. Instead, count the number of characters directly.
Third, instead of finding the maximum value by hand, take advantage of what the standard library already offers you.
const val supportedDelimiters = ",;|\t"
fun determineDelimiter(headerRow: String): Char =
supportedDelimiters.maxBy { delimiter -> headerRow.count { it == delimiter } }
fun main() {
val headerRow = "one,two,three;four,five|six|seven"
val chosenDelimiter = determineDelimiter(headerRow)
println(chosenDelimiter) // prints ',' as expected
}

wxWidgets - wxGrid - reading/writing non string cell values

I have a wxGrid to edit an array of numerical data.
I was wondering what's the best way to get non-string data in and out of the cells without going through the string to numeric conversion all the time.
I've used SetCellEditor() to control the data entry.
currently I use this:
// numeric value into cell
str.clear();
str << val1;
m_grid4->SetCellValue(row, col, str);
..
// read value from back into variable
val = atoi(m_grid4->GetCellValue(row, col));
Apart from the fact that atoi() is a bit ugly and a template function with a stringstream would be better, is there a way do get non-string values a bit better in and out of cells?
I was looking at the editors and renderers but can't figure it out.
If you worry about efficiency, you almost certainly should use a custom table class deriving from wxGridTableBase instead of using the default trivial wxGridStringTable implementation which stores everything as strings. Then, and much less importantly, if it makes sense in your case, you can use wxGridCellNumberRenderer which will call your table GetValueAsLong() method instead of GetValue() (which returns a string).
Both of those are demonstrated in wxGrid sample, notably look at BugsGridTable there.
Good luck!

Regex match SQL values string with multiple rows and same number of columns

I tried to match the sql values string (0),(5),(12),... or (0,11),(122,33),(4,51),... or (0,121,12),(31,4,5),(26,227,38),... and so on with the regular expression
\(\s*\d+\s*(\s*,\s*\d+\s*)*\)(\s*,\s*\(\s*\d+\s*(\s*,\s*\d+\s*)*\))*
and it works. But...
How can I ensure that the regex does not match a values string like (0,12),(1,2,3),(56,7) with different number of columns?
Thanks in advance...
As i mentioned in comment to the question, the best way to check if input string is valid: contains the same count of numbers between brackets, is to use client side programm, but not clear SQL.
Implementation:
List<string> s = new List<string>(){
"(0),(5),(12)", "(0,11),(122,33),(4,51)",
"(0,121,12),(31,4,5),(26,227,38)","(0,12),(1,2,3),(56,7)"};
var qry = s.Select(a=>new
{
orig = a,
newst = a.Split(new string[]{"),(", "(", ")"},
StringSplitOptions.RemoveEmptyEntries)
})
.Select(a=>new
{
orig = a.orig,
isValid = (a.newst
.Sum(b=>b.Split(new char[]{','},
StringSplitOptions.RemoveEmptyEntries).Count()) %
a.newst.Count()) ==0
});
Result:
orig isValid
(0),(5),(12) True
(0,11),(122,33),(4,51) True
(0,121,12),(31,4,5),(26,227,38) True
(0,12),(1,2,3),(56,7) False
Note: The second Select statement gets the modulo of sum of comma instances and the count of items in string array returned by Split function. If the result isn't equal to zero, it means that input string is invalid.
I strongly believe there's a simplest way to achieve that, but - at this moment - i don't know how ;)
:(
Unless you add some more constraints, I don't think you can solve this problem only with regular expressions.
It isn't able to solve all of your string problems, just as it cannot be used to check that the opening and closing of brackets (like "((())()(()(())))") is invalid. That's a more complicated issue.
That's what I learnt in class :P If someone knows a way then that'd be sweet!
I'm sorry, I spent a bit of time looking into how we could turn this string into an array and do more work to it with SQL but built in functionality is lacking and the solution would end up being very hacky.
I'd recommend trying to handle this situation differently as large scale string computation isn't the best way to go if your database is to gradually fill up.
A combination of client and serverside validation can be used to help prevent bad data (like the ones with more numbers) from getting into the database.
If you need to keep those numbers then you could rework your schema to include some metadata which you can use in your queries, like how many numbers there are and whether it all matches nicely. This information can be computed inexpensively from your server and provided to the database.
Good luck!

What is the best way to search an Oracle CLOB column?

I need to search a CLOB column and am looking for the best way to do this, I've seen variants online of using the DBMS_LOB package as well as using something called Oracle Text. Can someone provide a quick example of how to do this?
Oracle Text indexing is the way go. You can use either CONTEXT or CTXRULE index. CONTEXT can be used on unstructured document where CTXRULE is more helpful on structured documents.
This link will provide more info the index types & syntax.
The most important factor you need to consider is LEXER & STOPLIST.
You can also read the posts on asktom.oracle.com
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:5533095920114
What is in your CLOB and what are you searching for ?
Oracle Text is good if you are searching for words or phrases (which is probably what you have in a CLOB). Sometimes you'll store something 'strange' in a CLOB, like XML or the return value of a web-service call and that might be a different kettle of fish.
I needed to do this just recently and came up with the following solution (uses Spring JDBC)
String sql = "select * from clobtest where dbms_lob.instr(myclob, ? , 1, 1) > 0";
return (String) getSimpleJdbcTemplate().getJdbcOperations().queryForObject(sql, new RowMapper<Object>() {
public String mapRow(ResultSet rs, int rowNum) throws SQLException {
String clobText = lobHandler.getClobAsString(rs, "myclob");
return clobText;
}
}, searchText);
Seems to work pretty well, but I'm going to do some performance testing to see how well it works under load.

sorting and getting uniques

i have a string that looks like this
"apples,fish,oranges,bananas,fish"
i want to be able to sort this list and get only the uniques. how do i do it in vb.net? please provide code
A lot of your questions are quite basic, so rather than providing the code I'm going to provide the thought process and let you learn from implementing it.
Firstly, you have a string that contains multiple items separated by commas, so you're going to need to split the string at the commas to get a list. You can use String.Split for that.
You can then use some of the extension methods for IEnumerable<T> to filter and order the list. The ones to look at are Enumerable.Distinct and Enumerable.OrderBy. You can either write these as normal methods, or use Linq syntax.
If you need to get it back into a comma-separated string, then you'll need to re-join the strings using the String.Join method. Note that this needs an array so Enumerable.ToArray will be useful in conjunction.
You can do it using LINQ, like this:
Dim input = "apples,fish,oranges,bananas,fish"
Dim strings = input.Split(","c).Distinct().OrderBy(Function(s) s)
I'm not a VB.NET programmer, but I can give you a suggestion:
Split the string into an array
Create a second array
Cycle through the first array, adding any value that is not in the second.
Upon completion, your second array will have only unique values.