Elasticsearch testing (unit/integration) best practices in C# using NEST - testing

I've been searching for a while for how I should test my data access layer, without much success. Let me list my concerns:
Unit tests
This guy (here: Using InMemoryConnection to test ElasticSearch) says that:
Although asserting the serialized form of a request matches your
expectations in the SUT may be sufficient.
Is it really worth asserting the serialized form of requests? Do these kinds of tests have any value? It seems unlikely that someone would change a function in a way that should leave the serialized request untouched.
If it is worth it, what is the correct way to assert these requests?
Unit tests once again
Another guy (here: ElasticSearch 2.0 Nest Unit Testing with MOQ) shows a good looking example:
void Main()
{
    var people = new List<Person>
    {
        new Person { Id = 1 },
        new Person { Id = 2 },
    };
    // Stub the response so that Documents always returns the list above
    var mockSearchResponse = new Mock<ISearchResponse<Person>>();
    mockSearchResponse.Setup(x => x.Documents).Returns(people);
    // Stub the client so that any search returns the stubbed response
    var mockElasticClient = new Mock<IElasticClient>();
    mockElasticClient.Setup(x => x
        .Search(It.IsAny<Func<SearchDescriptor<Person>, ISearchRequest>>()))
        .Returns(mockSearchResponse.Object);
    var result = mockElasticClient.Object.Search<Person>(s => s);
    // Assert.AreEqual returns void, so there is nothing to Dump()
    Assert.AreEqual(2, result.Documents.Count());
}
public class Person
{
    public int Id { get; set; }
}
I'm probably missing something, but I can't see the point of this code snippet. First he mocks an ISearchResponse to always return the people list. Then he mocks an IElasticClient to return that search response for any search request he makes.
Well, it doesn't really surprise me that the assertion is true after that. What is the point of these kinds of tests?
Integration tests
Integration tests make more sense to me for testing a data access layer. After a little searching I found this package (https://www.nuget.org/packages/elasticsearch-inside/). If I'm not mistaken, it just bundles an embedded JVM and an Elasticsearch instance. Is it good practice to use it? Shouldn't I use my already running instance?
If anyone has good experience with testing approaches that I didn't include, I would happily hear about those as well.

Each of the approaches that you have listed may be reasonable, depending on exactly what you are trying to achieve with your tests. You haven't specified this in your question :)
Let's go over the options that you have listed:
1. Asserting the serialized form of the request to Elasticsearch may be a sufficient approach if you build a request to Elasticsearch based on a varying number of inputs. You may have tests that provide different input instances and assert the form of the query that will be sent to Elasticsearch for each. These kinds of tests are fast to execute, but they assume that the generated query, whose form you are asserting, will actually return the results that you expect.
2. This is another form of unit test that stubs out the interaction with the Elasticsearch client. The system under test (SUT) in this example is not the client but another component that internally uses the client, so the interaction with the client is controlled through the stub object to return an expected response. The example is contrived in that, in a real test, you wouldn't assert on the results of the client call, as you point out, but rather on the output of the SUT.
3. Integration/behavioural tests against a known data set within an Elasticsearch cluster may provide the most value and go beyond points 1 and 2: they not only incidentally test the generated queries sent to Elasticsearch for a given input, but also test the interaction itself and that it produces an expected result. There is no doubt, however, that these types of tests are harder to set up than 1 and 2, but the investment in setup may be outweighed by their benefit for your project.
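As a rough illustration of the first two options, here is a minimal sketch written in Python rather than C#/NEST, purely to keep it short. `build_person_query`, `PersonRepository` and the stubbed `search` method are hypothetical names made up for this example; they are not part of NEST or of any Elasticsearch client.

```python
import json
from unittest.mock import Mock


def build_person_query(name=None, min_age=None):
    """Hypothetical query builder: turns optional inputs into a query body."""
    must = []
    if name is not None:
        must.append({"match": {"name": name}})
    if min_age is not None:
        must.append({"range": {"age": {"gte": min_age}}})
    if not must:
        return {"query": {"match_all": {}}}
    return {"query": {"bool": {"must": must}}}


class PersonRepository:
    """Hypothetical SUT: uses a client internally and maps documents to ids."""

    def __init__(self, client):
        self._client = client

    def find_ids(self, name=None, min_age=None):
        response = self._client.search(build_person_query(name, min_age))
        return sorted(doc["id"] for doc in response["documents"])


# Option 1: assert the serialized form of the request for a given input.
serialized = json.dumps(build_person_query(name="ann", min_age=30), sort_keys=True)
expected = json.dumps(
    {"query": {"bool": {"must": [
        {"match": {"name": "ann"}},
        {"range": {"age": {"gte": 30}}},
    ]}}},
    sort_keys=True,
)
assert serialized == expected

# Option 2: stub the client and assert on the output of the SUT,
# not on the result of the client call itself.
stub_client = Mock()
stub_client.search.return_value = {"documents": [{"id": 2}, {"id": 1}]}
repository = PersonRepository(stub_client)
assert repository.find_ids(name="ann") == [1, 2]
```

Neither test tells you whether the query returns the right documents from a real cluster; that is what the third option covers.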
So, you need to ask yourself what kinds of tests are sufficient to achieve the level of assurance that you require to assert that your system is doing what you expect it to do; it may be a combination of all three different approaches for different elements of the system.
You may want to check out how the .NET client itself is tested; there are components within the Tests project that spin up an Elasticsearch cluster with different plugins installed, seed it with known generated data and make assertions on the results. The source is open and licensed under Apache 2.0 license, so feel free to use elements within your project :)

Related

Karate Standalone as Mock Server with multiple Feature Files

I am trying to set up an integration/API test suite with Karate and am considering Karate Netty for mocking the required services. For the test setup, the system under test A (a Spring Boot app) is started up completely. The Karate tests are then executed by a Maven test run against this instance.
Service A depends on multiple other services, and these need to be mocked away for the tests. To do so, my idea was to configure a running Karate Netty standalone instance as an HTTP proxy (done via JVM args of service A).
Now my idea was to create one test feature file: xyz-test.feature
And the required mocks for this file are defined in an associated mock feature file: xyz-mock.feature
(The test scenarios are rather complex and the responses of the external services could vary)
This means for a full test run I need to load up a couple of mock feature files. So:
What is the matching strategy for multiple mock feature files? Which scenario wins, so to say.
Is there any way to ensure, that the right mock file is used for the associated test file?
(Clearly I can reconfigure the running standalone instance and advise it to use xyz-mock.feature next.
But this would stop me from using parallel execution for my API tests, right?)
I already thought about reusing the Correlation-Id which I can send in for each test and then match against this in the mock file (it is also sent to all called services). But:
Is there a way to define a global matcher per mock file?
It sounds like you need only one mock file. You could boot 2 on different ports if you wanted, but there is no way to "merge" them into one port - if that is what you were looking for.
In my experience, you will be able to have a single mock take care of all your edge cases. This is because Karate's approach is un-conventional: you pretty much write a stateful server. But by keeping variables in memory and some clever JSON-path, you can simulate CRUD with very few lines of code: https://github.com/intuit/karate/tree/master/karate-netty#background
You can use only one at a time, by design
Given the above limitation, here's an interesting idea: add something like an extra pathMatches('/__test/reset') scenario that cleans-up your state and sets the Background variables to things like * def cats = []. Now in each feature, just call the special "reset" URL at the start. The good thing is Karate is thread-safe. Another idea as you said is you can maintain two or three different variables and use some logic to "route" based on a header, again very easy IMO. Use a map of maps, e.g:
* def data = { cats1: {}, cats2: {}, cats3: {} }
And you can get the header, e.g. if it is mode: cats1
* def mode = karate.get('requestHeaders.mode[0]')
* def cats = data[mode]
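The two ideas above (a reset URL and routing on a header) do not depend on Karate as such. Here is a minimal sketch of the same logic in plain Python, just to show the shape of the state handling. The `handle` function, the `/cats` path and the id counter are illustrative assumptions, not Karate API.

```python
import copy

INITIAL = {"cats1": {}, "cats2": {}, "cats3": {}}
data = copy.deepcopy(INITIAL)
next_id = {"value": 0}


def handle(method, path, headers, body=None):
    """Hypothetical mock handler: one state bucket per 'mode' header value."""
    if path == "/__test/reset":
        # Clean up state so that each test feature starts from a known baseline.
        data.clear()
        data.update(copy.deepcopy(INITIAL))
        next_id["value"] = 0
        return 200, None
    cats = data[headers["mode"]]  # route on the header, e.g. mode: cats1
    if method == "POST" and path == "/cats":
        next_id["value"] += 1
        cat = dict(body, id=next_id["value"])
        cats[cat["id"]] = cat
        return 201, cat
    if method == "GET" and path == "/cats":
        return 200, list(cats.values())
    return 404, None


# Each test sends its own 'mode' header, so tests do not see each other's data.
handle("POST", "/cats", {"mode": "cats1"}, {"name": "Billie"})
assert handle("GET", "/cats", {"mode": "cats2"}) == (200, [])
```

Because every test only touches its own bucket, the reset call is only needed when a bucket is reused between features.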
Not sure if this answers your question, but if the last Scenario has an "empty" description, it is a "catch all" and can in theory delegate to another server (or mock): https://github.com/intuit/karate/tree/develop/karate-netty#proxy-mode
Your question is a little confusing, so you may have to edit and re-word it if I haven't understood.
EDIT: using multiple mock files should be possible in 1.1.0 onwards: https://github.com/intuit/karate/issues/1566

Porting PHP API over to Parse

I am a PHP dev looking to port my API over to the Parse platform.
Am I right in thinking that you only need cloud code for complex operations? For example, consider the following methods:
// Simple function to fetch a user by id (the SQL is pseudocode)
function getUser($userid) {
    return (SELECT * FROM users WHERE userid=$userid LIMIT 1);
}
// Another simple function, fetches all of a user's allergies (by their user id)
function getAllergies($userid) {
    return (SELECT * FROM allergies WHERE userid=$userid);
}
// Creates a script (story?) about the user using their user id
// Uses their name and allergies to create the story
function getScript($userid) {
    $user = getUser($userid);
    $allergies = getAllergies($userid);
    return "My name is {$user->getName()}. I am allergic to {$allergies}";
}
Would I need to implement getUser()/getAllergies() endpoints in Cloud Code? Or can I simply use Parse.Query("User")... thus leaving me with only the getScript() endpoint to implement in cloud code?
Cloud code is for computation-heavy operations that should not be performed on the client, e.g. handling a large dataset.
It is also for performing beforeSave/afterSave and similar hooks.
In your example, providing you have set up a reasonable data model, none of the operations require cloud code.
Your approach sounds reasonable. I tend to put simple queries that will most likely not change on the client side, but it all depends on your scenario. When developing mobile apps I tend to put a lot of code in cloud code. I've found that it speeds up my development cycle. For example, if someone finds a bug and it's in cloud code, I make the fix, run parse deploy, done! The change is available to all mobile environments instantly. If that same code is in my mobile app, it really sucks, because now I have to fix the bug, rebuild, push it to the App Store/Google Play, wait x number of days for it to be approved, have the users download it... you see where I'm going here.
Take for example your
SELECT * FROM allergies WHERE userid=$userid query.
Even though this is a simple query, what if you want to sort it? Or maybe add some additional filtering?
These are the kinds of things I think of when deciding where to put the code. Hope this helps!
As a side note, I have also found cloud code very handy when needing to add extra security to my apps.

How to apply TDD to web API development

I'm designing a web service running on Google App Engine that scrapes a number of websites and presents their data via a RESTful interface. Based on some background reading, I think I'd like to attempt Test Driven Development (TDD) and develop my tests before I write any business code.
My problem is caused by the fact that my list of scraped elements includes timetables and other records that change quite frequently. The limit of my knowledge on TDD is that you write tests that examine the results of code execution and compare these results to a hardcoded result set. Seeing as the data set changes frequently, this method seems impossible. Assuming that this is true, what would be the best approach to test such an API? How would a large-scale web API be tested (Twitter, Google, Netflix etc.)?
You have to choose the type of test:
Unit tests just test the proper operation of your modules (units). You provide input data and test that the code outputs the proper results. If there are system-dependent classes, you try to mock them or, in the case of GAE services, you use the Google-provided local services. Unit tests can be run locally on your machine or on CI servers. There are two popular unit test libraries for Java: JUnit and TestNG.
Integration tests check that various modules (internal & external) work together - they basically check that APIs between modules are working. They are usually run on real servers and call real external services. They are technology specific and are harder to run.
In your case, I'd go with unit tests and provide sets of different input data which your logic should parse and act upon. Since your flow is pretty simple (load data from a fixed URL, parse it), you could also embed loading of real data into unit tests (we do this when we parse external sources).
From what you are describing you could easily find yourself writing integration tests. If your aim is to test the logic for processing what is returned from the scraped data (e.g. you know that you are going to get a timetable in a specific format coming in and you now have logic to process that data) you will need to create a SEAM between your web services logic and your processing logic. Once you have done this you should be able to mock the data that is returned from the web service call to always return the same table data and then you can write consistent unit tests against it.
using System;

public class ScrapingService : IScrapingService
{
    public string Scrape(string url)
    {
        // scraping logic goes here
        throw new NotImplementedException();
    }
}
public interface IScrapingService
{
    string Scrape(string url);
}
public class ScrapingProcessor
{
    private readonly IScrapingService _scrapingService;
    // inject the dependency
    public ScrapingProcessor(IScrapingService scrapingService)
    {
        _scrapingService = scrapingService;
    }
    public void Process(string url)
    {
        var scrapedData = _scrapingService.Scrape(url);
        // now process the scrapedData
    }
}
To test, you can now create a FakeScrapingService that implements the IScrapingService interface and returns whatever data you like from the Scrape method. There are some very good mocking frameworks out there that make this type of thing easy. My personal favorite is NSubstitute.
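For illustration, a hand-written fake along those lines might look like the sketch below. It is in Python rather than C# to keep it short, and the timetable format ("HH:MM destination" per line) and the parsing inside `process` are assumptions invented for the example.

```python
class FakeScrapingService:
    """Test double: always returns the same canned page and never hits the network."""

    def __init__(self, canned_text):
        self._canned_text = canned_text
        self.requested_urls = []

    def scrape(self, url):
        self.requested_urls.append(url)
        return self._canned_text


class ScrapingProcessor:
    def __init__(self, scraping_service):
        # The dependency is injected, so a test can pass in the fake.
        self._scraping_service = scraping_service

    def process(self, url):
        """Hypothetical processing step: parse 'HH:MM destination' lines."""
        scraped = self._scraping_service.scrape(url)
        timetable = []
        for line in scraped.splitlines():
            line = line.strip()
            if line:
                departure, destination = line.split(" ", 1)
                timetable.append((departure, destination))
        return timetable


fake = FakeScrapingService("09:15 Central Station\n10:40 Airport\n")
processor = ScrapingProcessor(fake)
assert processor.process("http://example.test/timetable") == [
    ("09:15", "Central Station"),
    ("10:40", "Airport"),
]
```

Because the fake returns fixed data, the expected result can be hardcoded even though the real site changes frequently.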
I hope this explanation helps.

SpecFlow - How to use data driven tests like NUnits TestCaseSource property?

I'm a QA who decided to use SpecFlow for my test automation after some consideration. I think it's brilliant, but it is missing one feature which I used often with other test runners such as NUnit: something similar to NUnit's TestCaseSource property, which specifies a potentially dynamic set of data for tests to be run against at run time.
I often have different data in each environment the tests should run in, so I cannot specify hardcoded values for test parameters. A trivial example is checking that each type of user account is able to log in; the user account credentials can be retrieved using a DB query to populate each test case dynamically in NUnit:
public List<User> GetTestData()
{
    // Load the users for the current environment at run time
    return MyDatabase.GetAllUsersInfo().ToList();
}
[Test, TestCaseSource("GetTestData")]
public void CallLoginService(User user)
{
    // Use the injected test case (user), not the User type
    var response = LoginController.TryLogin(user.UserName, user.Password);
    if (response.Error != null)
    {
        Assert.Fail("Failed to Login: {0}", response.Error);
    }
    Assert.AreEqual("Logged in ok", response.Message, "Login message not as expected");
}
Obviously this is a simple example of that feature, but I think it describes it well enough. I know we have the ability in SpecFlow to use a Scenario Outline and table of test run input data, but that is still static, so doesn't fit the bill.
I've been looking for a while and have not found anything in SpecFlow like this yet, does anybody know of anything similar to the above which can be used (or planned if anyone who works on the project reads this)?
Thanks :)
I have no idea if anything like this is planned, but for now the problem is that there is a background code generation step when you edit your feature file via Visual Studio.
When the feature file is saved in Visual Studio, it is parsed and converted into the feature.cs file, and that is the file that is compiled and used for testing.
So your process would become:
1. edit your data source
2. export to feature file
3. get SpecFlow's VS plugin to convert to feature.cs
4. run msbuild
5. run tests via NUnit or similar
I wouldn't do this. Instead I'd focus on getting my tests to be better examples. It sounds like you are trying to exhaustively cover every possibility. Don't come up with examples to cover every possible case; instead, cover as much logic as possible with fewer tests.

In Integration Testing, does it make sense to replace Async process with a Synchronous one for the sake of testing?

In integration tests, asynchronous processes (methods, external services) make for very tough test code. If I instead factored out the async part into a dependency and replaced it with a synchronous one for the sake of testing, would that be a "good thing"?
By replacing the async process with a synchronous one, am I no longer testing in the spirit of integration testing? I guess I'm assuming that integration testing refers to testing close to the real thing.
Nice question.
In a unit test this approach would make sense, but for integration testing you should be testing the real system as it will behave in real life. This includes any asynchronous operations and any side effects they may have. This is the most likely place for bugs to exist and is probably where you should concentrate your testing, not factor it out.
I often use a "waitFor" approach where I poll to see if an answer has been received and time out after a while if not. A good implementation of this pattern is the JUnitConditionRunner; it is Java-specific, but you can get the gist. For example:
conditionRunner = new JUnitConditionRunner(browser, WAIT_FOR_INTERVAL, WAIT_FOR_TIMEOUT);
protected void waitForText(String text) {
    try {
        conditionRunner.waitFor(new Text(text));
    } catch (Throwable t) {
        throw new AssertionFailedError("Expecting text " + text + " failed to become true. Complete text [" + browser.getBodyText() + "]");
    }
}
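The polling idea itself is small enough to sketch without any library. The following is a minimal Python version, not the JUnitConditionRunner implementation; `wait_for`, its default timeout and interval, and the timer that stands in for the asynchronous operation are all assumptions made for the example.

```python
import threading
import time


def wait_for(condition, timeout=2.0, interval=0.05):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise AssertionError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Stand-in for the real asynchronous operation: a result arrives later on another thread.
results = []
threading.Timer(0.2, lambda: results.append("done")).start()

# The test blocks here until the result shows up, or fails after the timeout.
wait_for(lambda: results == ["done"])
```

The asynchronous code stays asynchronous; only the test waits, and a missing answer turns into a test failure instead of a hang.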
We have a number of automated unit tests that send off asynchronous requests and need to test the output/results. The way we handle it is to perform all of the testing as if it were part of the actual application; in other words, asynchronous requests remain asynchronous. But the test harness acts synchronously: it sends off the asynchronous request, sleeps for [up to] a period of time (the maximum in which we would expect a result to be produced), and if still no result is available, then the test has failed. There are callbacks, so in almost all cases the test is awakened and continues running before the timeout has expired, but the timeouts mean that a failure (or change in expected performance) will not stall/halt the entire test suite.
This has a few advantages:
The unit test is very close to the actual calling patterns of the application
No new code/stubs are needed to make the application code (the code being tested) run synchronously
Performance is tested implicitly: if the test slept for too short a period, then some performance characteristic has changed, and that needs looking into
The last point may need a small amount of explanation. Performance testing is important, and it is often left out of test plans. The way these unit tests are run, they end up taking a lot longer (running time) than if we had rearranged the code to do everything synchronously. However this way, performance is tested implicitly, and the tests are more faithful to their usage in the application. Plus all of our message queueing infrastructure gets tested "for free" along the way.
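A harness of the kind described here (asynchronous request, callback that wakes the test, timeout as the failure condition) can be sketched in a few lines of Python. `send_async` is a hypothetical stand-in for the application's real asynchronous API, and the two-second timeout is an arbitrary choice.

```python
import threading


def send_async(request, callback):
    """Hypothetical async API: does the work on another thread, then calls back."""
    worker = threading.Thread(target=lambda: callback(request.upper()))
    worker.start()
    return worker


def call_and_wait(request, timeout=2.0):
    """Send the request asynchronously; block until the callback fires or time runs out."""
    done = threading.Event()
    received = []

    def on_result(result):
        received.append(result)
        done.set()  # wakes the waiting test before the timeout expires

    send_async(request, on_result)
    if not done.wait(timeout):
        # No result in time: fail this test without stalling the whole suite.
        raise AssertionError(f"no result within {timeout} seconds")
    return received[0]


assert call_and_wait("ping") == "PING"
```

The timeout doubles as a coarse performance check: if it starts firing, something about the response time has changed.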
Edit: Added note about callbacks
What are you testing? The behaviour of your class in response to certain stimuli? In which case don't suitable mocks do the job?
class Orchestrator implements AsynchCallback {
    TheAsynchService myDelegate; // initialised by injection
    public void doSomething(Request aRequest) {
        myDelegate.doTheWork(aRequest, this);
    }
    public void tellMeTheResult(Response aResponse) {
        // process response
    }
}
Your test can do something like
Orchestrator orch = new Orchestrator(mockAsynchService);
orch.doSomething(request);
// assertions here that the mockAsynchService received the expected request
// now either the mock really does call back
// or (probably more easily) make explicit call to the tellMeTheResult() method
// assertions here that the Orchestrator did the right thing with the response
Note that there's no true asynch processing here, and the mock itself need have no logic other than to allow verification of the receipt of the correct request. For a Unit test of the Orchestrator this is sufficient.
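The same test shape, sketched in Python with the standard library's `unittest.mock`. The method names mirror the Java outline above, while `last_status` and the dictionary-shaped request and response are assumptions added so that there is something to assert on.

```python
from unittest.mock import Mock


class Orchestrator:
    def __init__(self, async_service):
        self._async_service = async_service  # injected dependency
        self.last_status = None

    def do_something(self, request):
        # Hand the work off, passing ourselves as the callback target.
        self._async_service.do_the_work(request, self)

    def tell_me_the_result(self, response):
        # Hypothetical processing of the response.
        self.last_status = "ok" if response.get("success") else "failed"


mock_service = Mock()
orchestrator = Orchestrator(mock_service)

orchestrator.do_something({"id": 42})
# The mock only needs to record that the expected request was received.
mock_service.do_the_work.assert_called_once_with({"id": 42}, orchestrator)

# Simulate the callback explicitly; no real asynchronous processing takes place.
orchestrator.tell_me_the_result({"success": True})
assert orchestrator.last_status == "ok"
```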
I used this variation on the idea when testing BPEL processes in WebSphere Process Server.