Compare 2 datasets with dbunit?

Currently I need to create tests for my application. I used dbunit to achieve that and now need to compare two datasets:
1) The records from the database, which I get with a QueryDataSet
2) The expected results, written as FlatXML in a file and read in as a dataset as well
Basically, two datasets can be compared this way.
The problem now is columns with a timestamp: they will never match the expected dataset. I would really like to ignore them during the comparison, but it doesn't work the way I want it to.
It does work when I compare each table on its own and add a column filter with ignoreColumns. However, this approach is very cumbersome, as many tables are involved in the comparison, and it forces one to add so much code that it eventually gets bloated.
The same applies to fields that have null values.
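For reference, the per-table workaround I mean looks roughly like this (a sketch; the table name, the timestamp column name and the dataset variables are made up):
// Drop the timestamp column from both sides, then compare one table
ITable expectedOrders = expectedDataset.getTable("ORDERS");
ITable actualOrders = actualDataset.getTable("ORDERS");
String[] ignoredColumns = {"LAST_MODIFIED"};
ITable expectedFiltered = DefaultColumnFilter.excludedColumnsTable(expectedOrders, ignoredColumns);
ITable actualFiltered = DefaultColumnFilter.excludedColumnsTable(actualOrders, ignoredColumns);
Assertion.assertEquals(expectedFiltered, actualFiltered);
// ...and the same block again for every other table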
A possible solution would also be if I could compare only the very first column of every table, addressing it not by its column name but by its column index. But I can't find anything like that.
Maybe I am missing something, or maybe it just doesn't work any other way than comparing each table on its own?

For the sake of completeness, some additional information must be posted. Actually, my previously posted solution will not work at all, as the process of reading data from the database got me trapped.
The process using QueryDataSet did read the data from the database and save it as a dataset, but the data could not be accessed from this dataset any more (although I could see the data in debug mode)!
Instead the whole operation failed with an UnsupportedOperationException at org.dbunit.database.ForwardOnlyResultSetTable.getRowCount(ForwardOnlyResultSetTable.java:73).
Example code to produce the failure:
QueryDataSet qds = new QueryDataSet(connection);
qds.addTable("specificTable");
qds.getTable("specificTable").getRowCount();
Even if you try it this way it fails:
IDataSet tmpDataset = connection.createDataSet(tablenames);
tmpDataset.getTable("specificTable").getRowCount();
In order to make extraction work you need to add this line (the second one):
IDataSet tmpDataset = connection.createDataSet(tablenames);
IDataSet actualDataset = new CachedDataSet(tmpDataset);
Great that this is documented nowhere...
But that is not all: you would certainly think that you could add this line after using a QueryDataSet as well... but no! It still throws the same exception! It doesn't make any sense to me, and I wasted so much time on it...
It should be noted that extracting data from a dataset that was read in from an XML file works without any problem. This annoyance only happens when trying to get a dataset directly from the database.
If you have done the above, you can then continue as below, which compares only the columns present in the expected XML file:
// put in here some code to read in the dataset from the xml file...
// and name it "expectedDataset"
// then get the tablenames from it...
String[] tablenames = expectedDataset.getTableNames();
// read dataset from database table using the same tables as from the xml
IDataSet tmpDataset = connection.createDataSet(tablenames);
IDataSet actualDataset = new CachedDataSet(tmpDataset);
for (int i = 0; i < tablenames.length; i++)
{
    ITable expectedTable = expectedDataset.getTable(tablenames[i]);
    ITable actualTable = actualDataset.getTable(tablenames[i]);
    ITable filteredActualTable = DefaultColumnFilter.includedColumnsTable(actualTable, expectedTable.getTableMetaData().getColumns());
    Assertion.assertEquals(expectedTable, filteredActualTable);
}
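For completeness, the part only hinted at by the comments above (reading the expected dataset from the FlatXML file) might look roughly like this; a sketch with a made-up file name, using dbunit's FlatXmlDataSetBuilder:
// Build the expected dataset from the FlatXML file (file name is assumed)
IDataSet expectedDataset = new FlatXmlDataSetBuilder()
        .build(new File("expected-dataset.xml"));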

You can also use this format:
// Assert actual database table matches expected table
String[] columnsToIgnore = {"CONTACT_TITLE","POSTAL_CODE"};
Assertion.assertEqualsIgnoreCols(expectedTable, actualTable, columnsToIgnore);

SHOW KEYS in Aerospike?

I'm new to Aerospike and am probably missing something fundamental, but I'm trying to see an enumeration of the Keys in a Set (I'm purposefully avoiding the word "list" because it's a datatype).
For example,
To see all the Namespaces, the docs say to use SHOW NAMESPACES
To see all the Sets, we can use SHOW SETS
If I want to see all the unique Keys in a Set ... what command can I use?
It seems like one can use client.scan() ... but that seems like a super heavy way to get just the key (since it fetches all the bin data as well).
Any recommendations are appreciated! As of right now, I'm thinking of inserting into (and deleting from) a meta-record.
Thank you @pgupta for pointing me in the right direction.
This actually has two parts:
In order to retrieve the original keys from the server, one must -- during put() calls -- set the policy to save the key value server-side (otherwise, it seems, only a digest/hash is stored).
Here's an example in Python:
aerospike_client.put(key, {'bin': 'value'}, policy={'key': aerospike.POLICY_KEY_SEND})
Then (adapted from Aerospike's own documentation), you perform a scan and set the policy to not return the bin data. From this, you can extract the keys:
Example:
keys = []
scan = client.scan('namespace', 'set')
scan_opts = {
    'concurrent': True,
    'nobins': True,
    'priority': aerospike.SCAN_PRIORITY_MEDIUM
}
for x in scan.results(policy=scan_opts):
    keys.append(x[0][2])
The need to iterate over the result still seems a little clunky to me; I still think that using a 'master-key' Record to store a list of all the other keys will be more performant, in my case -- in this way, I can simply make one get() call to the Aerospike server to retrieve the list.
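If you do go the master-key route, a minimal sketch could look like this (the 'all_keys' record, the 'keys' bin name and the helper functions are made up; list_append is the Python client's list operation):
# Keep every user key in one 'master' record (record and bin names are made up)
MASTER_KEY = ('namespace', 'set', 'all_keys')

def register_key(client, user_key_value):
    # Append the application-level key (string/int) to the list bin;
    # note this does not enforce uniqueness by itself
    client.list_append(MASTER_KEY, 'keys', user_key_value)

def all_keys(client):
    # A single get() returns the whole list of keys
    (key, meta, bins) = client.get(MASTER_KEY)
    return bins.get('keys', [])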
You can choose not to bring the data back by setting includeBinData in ScanPolicy to false.

Keeping table formatting in Sage with multiple tables

As the title suggests, I am trying to keep proper table formatting in Sage while displaying multiple tables (this is strictly a formatting question, so no knowledge of the math involved is necessary). Currently, I am using the following code:
my_table2 = table([column1, column2], frame = True)
my_table1 = table([in_the_cone, lengths_in_cone], frame = True)
result_table1 = my_table1.transpose()
result_table2 = my_table2.transpose()
result_table1
result_table2
With this, I receive no output for table1 and the following output for table2:
I want both tables to look this way, but having no output for the first table is no good. So I tried changing the bottom two lines to:
result_table1, result_table2
While this does display both tables, the formatting now looks like:
Is there a way I can display both tables at the same time with the first formatting?
It would have been nice for you to include a full minimal working example, but in any case it does depend a little on the output.
Basically, in a notebook or other "cell", only the last return value is printed to the screen in some fashion (sometimes via a "hook", as in your case). But if you use the comma, that implicitly creates a tuple, which is then printed as a tuple, so you lose the "hook" that displays things with math modes (since a tuple doesn't have one).
In this case, the (newish) canonical way to achieve what you want is
pretty_print(result_table1)
pretty_print(result_table2)
though you may want to put print "\n" in between so they don't end up right on top of each other.
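Putting it together, a minimal sketch (assuming the result_table1/result_table2 from the question, and using the parenthesized print form):
pretty_print(result_table1)
print("")   # blank line so the two tables don't collide
pretty_print(result_table2)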
Edit: Here is a picture in Jupyter inside of Sage.

JSON issue with SQL lines return

I have an issue when I try to parse my JSON. I create my JSON "by hand" in PHP, like this:
$outp = '{"records":[' . $outp . ']}';
I build it this way so I can take fields from my database and show them in the page. The thing is, in my database I have a field "description" where people can give a description about something. So some people enter line returns, for example like this:
Interphone
Equipe:
Canape-lit
Autre:
Local
And when I try to parse my JSON there is an error because of these line returns: "SyntaxError: Unexpected token".
Here's an example of my JSON :
{"records":[{"Parking":"Aucun","Description":"Interphone
Equipé :
Canapé-lit
","Chauffage":"Fioul"}]}
Can someone help me, please?
You've really dug yourself into a very bad hole here.
The problem
The problem you're running into is that newline characters (line feed and carriage return) are not valid inside a JSON string. They must be escaped as \n and \r. You can see the full JSON standard here.
You need to do two things.
Fix your code
In spite of the fact that the JSON standard is comparatively simple, you should not create your JSON by hand. You already know why. You have to handle several edge cases and the like. Your users could enter anything on the page, and you need to make sure that it gets properly encoded no matter what.
You need to use a JSON serialization tool. json_encode is built in as of PHP 5.2. If you can't use it for any reason, find an existing, widely used (and therefore heavily tested) third-party library with a JSON serializer.
If you're asking, "Why can't I create my own serializer?", you could, in theory. Realistically, there is no point. Yours won't be better than existing ones. It will be much more likely to have bugs and to perform worse than something a lot of people have used in production. It will also take much longer to create and test than using an existing one.
If you need this data in code after you pull it back out of the database, then you need a JSON deserializer. json_decode should also be fine, but again, if you can't use it, look for a widely used third party library.
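For illustration, building the same structure with json_encode might look like this (a sketch; the field names come from your example, and $dbRows and the row shape are assumptions):
$records = array();
foreach ($dbRows as $row) {                      // however you fetch your rows
    $records[] = array(
        'Parking'     => $row['Parking'],
        'Description' => $row['Description'],    // newlines are escaped automatically
        'Chauffage'   => $row['Chauffage'],
    );
}
$outp = json_encode(array('records' => $records)); // assumes the data is UTF-8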
Fix your data
If you haven't hit production yet, you have really dodged a bullet here, and you can skip this whole section. If you have gone to production and you have data from users, you've got a major problem.
Even after you fix your code, you still have bad data in your production database that won't parse correctly. You have to do something to make this data usable. Unfortunately, it is impossible to automatically recover the original data for every possible case. This is because users might have entered the characters/substrings you added to the data to turn it into "JSON"; for example, they might have entered a comma separated list of quoted words: "dog","cat","pig", and "cow". That is an intractable problem, since you know for a fact you didn't properly serialize all your incoming input. There's no way to tell the difference between text your code generated and text the user entered. You're going to have to settle for a best effort and try to throw errors when you can't figure it out in code, and it might mess up a user's data in some special cases. You might have to fix some things manually.
Start by discussing this with your manager, team lead, whoever you answer to. Assuming that you can't lose the data, this is the most sound process to follow for creating a fix for your data:
Create a database dump of your production data.
Import that dump into a development database.
Develop and test your method of repairing this data against the development database from the last step.
Ensure you have a recovery plan for deployments gone wrong. Test this plan in your testing environment.
Once you've gone through your typical release process, it's time to release the fixed code and the data update together.
Take the website offline.
Back up the database.
Update the website with the new code.
Implement your data fix.
Verify that it worked.
Bring the site online.
If your data fix doesn't work (possibly because you didn't think of an edge case or something), then you have a nice back up you can restore and you can cancel the release. Then go back to step 1.
As for how you can fix the data, I don't recommend queries here. I recommend a little script tool. It would have to load the data from the database, pull the string apart, try to identify all the pieces, build up an object from those pieces, and finally serialize them to JSON correctly, and put them back into the database.
Here's an example function of how you might go about pulling the string apart:
const ELEMENT_SEPARATOR = '","';
const PAIR_SEPARATOR = '":"';

function recover_object_from_malformed_json($malformed_json, $known_keys) {
    $tempData = substr($malformed_json, 14);            // Removes {"records":[{" prefix
    $tempData = substr($tempData, 0, -4);               // Removes "}]} suffix
    $tempData = explode(ELEMENT_SEPARATOR, $tempData);  // Split into what we think are pairs
    $data = array();
    $lastKey = NULL;
    foreach ($tempData as $i) {
        $explodedI = explode(PAIR_SEPARATOR, $i, 2); // Split what we think is a key/value into key and value
        if (in_array($explodedI[0], $known_keys)) {  // Check if it's actually a key
            // It's a key
            $lastKey = $explodedI[0];
            if (array_key_exists($lastKey, $data)) {
                throw new RuntimeException('Duplicate key: ' . $lastKey);
            }
            // Assign the value to the key
            $data[$lastKey] = $explodedI[1];
        }
        else {
            // This isn't a key/value pair, near as we can tell,
            // so it must actually be part of the last value,
            // and the user actually entered the delimiter as part of the value.
            if (is_null($lastKey)) {
                // This one is REALLY messed up
                throw new RuntimeException('Does not begin with a known key');
            }
            $data[$lastKey] .= ELEMENT_SEPARATOR;
            $data[$lastKey] .= $i;
        }
    }
    return $data;
}
Note that I'm assuming that your "list" is a single element. This gets much harder and much messier if you have more than one. You'll also need to know ahead of time what keys you expect to have. The bottom line is that you have to undo whatever your code did to create the "JSON", and you have to do everything you can to try to not mess up a user's data.
You would use it something like this:
$knownKeys = ["Parking", "Description", "Chauffage"];
// Fetch your rows and loop over them
foreach ($dbRows as $row) {
    try {
        $dataFromDb = $row['myData']; // or however you would pull out this string
        $recoveredData = recover_object_from_malformed_json($dataFromDb, $knownKeys);
        // Save it back to the DB
        $row['myData'] = json_encode($recoveredData);
        // Make sure to commit here.
    }
    catch (Exception $e) {
        // Log the row's ID, the content that couldn't be fixed, and the exception
        // Make sure to roll back here
    }
}
(Forgive me if the database stuff looks really wonky. I don't do PHP, so I have no idea how that code should look. Hopefully, you can at least get the concept.)
Why I don't recommend trying to parse your data as JSON to recover it.
The bottom line is that your data in the database is not JSON. If you try to parse it as such, all the other edge cases you didn't handle properly will get screwed up in the process. You'll see bad things like:
\\ becomes \
\j becomes j
\t becomes a tab character
In the end, it will just mess up your data even more.
Conclusion
This is a huge mess, and you should never try to convert something into a standard format without using a properly built, well tested serializer. Fixing the data is going to be hard, and it's going to take time. I also seriously doubt you have a lot of background in text processing techniques, and lacking that knowledge is going to make this harder. You can get some good info on text processing by studying how compilers are made. Good luck.

Using Orderby on BatchedJoinBlock(Of T1, T2) - Dataflow (Task Parallel Library)

I'm just looking to be able to sort the results of a BatchedJoinBlock (http://msdn.microsoft.com/en-us/library/hh194683.aspx) so that the different results of the different targets stay together. I will explain! Example in some pseudo-code:
Dim batchedJoin = New BatchedJoinBlock(Of String, object)(4)
batchedJoin.Target1.Post("String1Target1")
batchedJoin.Target2.Post(CType(BuildIt, StringBuilder1))
batchedJoin.Target1.Post("String1Target2")
batchedJoin.Target2.Post(CType(BuildIt, StringBuilder2))
Dim results = batchedJoin.Receive()
'This sorts one result...
Dim SortByResult = results.Item1.OrderBy(Function(item) item.ToString, New NaturalStringComparer)
Basically I've got a string and an object, the SortByResult variable above sorts the strings exactly as I'd like them to sort. I'm looking for a way to get the objects that used to be at the same index number in target2 into the same order. e.g. if "String1Target1" changes order I'd like to somehow reliably refer to/pair it together with "StringBuilder1". The actual end result just needs to be that the objects (target2) are sorted in the order that is dictated by the strings being sorted (target1). Something like:
Dim EndResult = results.Item2.OrderBy(strings in target1)
but I'll gladly take an intermediate solution! I've also tried using a dictionary (results.Item2.ToDictionary) with the string as a key (which would also be a fine solution), but it's a bit beyond my ken to use lambda expressions in the proper context. I can realistically do this in several steps with a list or something, but I'm trying to get something more efficient and learn something, and it seems like there are a lot of default options on the results of the join block that I'm just not experienced enough to use. Thanks in advance for any help you can provide!
To me, it looks like you don't actually want BatchedJoinBlock, because the two pieces of data always come together. A better option for that would be a BatchBlock of Tuple<string, object>. When you have that, you can then use LINQ directly to sort each batch:
results.OrderBy(Function(tuple) tuple.Item1)
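A rough sketch of that in VB (the placeholder objects someObject1/someObject2, the batch size of 4 and the NaturalStringComparer from the question are assumptions):
' Batch the string and its object together so sorting can't separate them
Dim batch = New BatchBlock(Of Tuple(Of String, Object))(4)
batch.Post(Tuple.Create("String1Target1", CType(someObject1, Object)))
batch.Post(Tuple.Create("String1Target2", CType(someObject2, Object)))
batch.Complete() ' flush a partial batch so Receive() returns

Dim results = batch.Receive() ' an array of Tuple(Of String, Object)
Dim sorted = results.OrderBy(Function(t) t.Item1, New NaturalStringComparer())
Dim objectsInOrder = sorted.Select(Function(t) t.Item2).ToList()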

Searching Multiple Results using Streamreader

Does anyone know of a way to search for a string in a text file using StreamReader that accounts for multiple instances of the result? Basically, I am creating a booking application, and each time a customer books a seat, their PrimaryKey, FirstName, LastName and the co-ordinates of the seat on a data grid (which I have used as a method to book seats) are generated and then saved to a text file.
I want the ability to read multiple instances of a PrimaryKey, find the seat co-ordinates on each line that this PrimaryKey is listed on, and repopulate another similar DataGridView with these co-ordinates, all driven by a ComboBox index change.
It seems a bit complicated to understand, but if anyone can help then please let me know.
I just need the know-how for searching for multiple instances: after it has found the string once, it should look through the rest of the file to find another instance. I can do the rest by myself.
I'm coding using Visual Basic.Net
Yes, it's possible to search through a file multiple times, but you'd either have to reopen the file or rewind the stream (FileStream.Seek).
Wildly inefficient, though.
If it has to remain an unsorted and unstructured file, build an in memory index to it.
If your key is an integer, create a Dictionary<int,int> of Key and Position in the stream.
Then when you want to find key X you use FileStream.Seek to move to it, and read a line to get the data. If you find yourself grouping by, say, aeroplaneID, build a Dictionary<int, List<KeyValuePair<int, int>>>
where the key is the aeroplane id and the list is a list of primary keys and positions in the file.
You could push all of that off to a background thread. You could try to get really clever and build the indexes up as you need them. Personally, though, I'd be trying to move my storage to a more suitable format. You aren't struggling to do this because you've missed a class; you are struggling because you shouldn't have to do it this way.
Something like
Dictionary<int, int> _fileIndex = new Dictionary<int, int>();

using (FileStream fs = new FileStream(DataFileName, FileMode.Open, FileAccess.Read))
{
    StreamReader reader = new StreamReader(fs);
    int lastPosition = 0;
    string currentLine = null;
    while ((currentLine = reader.ReadLine()) != null)
    {
        string[] data = currentLine.Split(new char[] { ',' });
        int key = int.Parse(data[0]);          // the primary key is the first field on the line
        _fileIndex.Add(key, lastPosition);     // remember where this line started
        lastPosition = (int)fs.Position;       // note: StreamReader buffers, so this is only approximate
    }
}
NB: I didn't test the above, and there should be a bit more error checking in it. If there's a lot of data in a line, it might be better not to use Split and just pull out everything up to the correct delimiter. Also be careful how many indexes you keep live; it wouldn't be long before they used up more space than just reading the entire thing into memory.
Then you could create a class or structure to represent a line in the file, and write a bit of code
that uses FileStream.Seek to get there. If you wanted to load up a bunch of them, it would make sense to get the list of positions of each one in the file, sort them in order, and then rip through the file in that order, picking them out.
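A minimal sketch of the lookup side, assuming the _fileIndex and DataFileName from above (FindRecord is a made-up helper name):
// Fetch the record line for one key using the index built above
string FindRecord(int key)
{
    int position;
    if (!_fileIndex.TryGetValue(key, out position))
        return null; // key not in the file

    using (FileStream fs = new FileStream(DataFileName, FileMode.Open, FileAccess.Read))
    {
        fs.Seek(position, SeekOrigin.Begin);
        StreamReader reader = new StreamReader(fs);
        return reader.ReadLine(); // the comma-separated record for that key
    }
}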