We run BDD tests (Cucumber/Selenium) with Jenkins in a Continuous Integration process. The number of tests is increasing day by day and the time to run them keeps growing, making the whole CI process far less responsive (if you commit in the afternoon, you risk not seeing your build results until the next day). Is there a way/pattern to keep the CI process quick in spite of the increasing number of tests?
You could choose one of the following schemes:
Separate projects for unit tests and integration tests. The unit tests will return their results faster, and the integration project will run once or just a couple of times per day rather than after each commit. The drawback is obvious: if the integration test suite breaks, there is no correlation with the breaking change.
Google approach - sort your tests according to their size: small, medium, large and enormous. Use separate projects for each kind of test, grouped according to the total time it takes to run the specific suite. You can read more in this book. Also, read this blog to get more ideas.
Try to profile your current test suite to eliminate bottlenecks. This might bring it back to giving feedback in a timely fashion; see the sketch below for one way to find the slowest scenarios.
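If Jenkins already archives JUnit-style XML reports for your Cucumber runs, a quick, low-effort way to profile is to rank scenarios by their recorded duration. A rough sketch in Python; the report location is an assumption you would adapt to your job:

import glob
import xml.etree.ElementTree as ET

# Assumed location of the JUnit-style XML reports; adjust to where your job writes them.
REPORT_GLOB = 'reports/**/*.xml'

durations = []
for path in glob.glob(REPORT_GLOB, recursive=True):
    root = ET.parse(path).getroot()
    # JUnit XML records one <testcase> element per scenario, with a 'time' attribute.
    for case in root.iter('testcase'):
        durations.append((float(case.get('time', 0)), case.get('classname'), case.get('name')))

# The ten slowest scenarios are usually where splitting or optimising pays off first.
for secs, classname, name in sorted(durations, key=lambda d: d[0], reverse=True)[:10]:
    print('%8.1fs  %s :: %s' % (secs, classname, name))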
Hope that helps.
#Ikaso gave some great answers there. One more option would be to set up some build slaves (if you haven't already) and split the integration tests into multiple jobs that can be run in parallel on the slaves.
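One way to do the split is to shard the feature (or spec) files across N jobs so each slave runs only its own bucket. A hypothetical sketch in Python; the directory layout and shard counts are assumptions:

import glob
import sys

# Usage: shard.py <total_shards> <this_shard_index>, e.g. "shard.py 4 2" on the third job.
total_shards = int(sys.argv[1])
shard_index = int(sys.argv[2])

# Assumed location of the Cucumber feature files; adjust to your repository layout.
features = sorted(glob.glob('src/test/resources/features/**/*.feature', recursive=True))

# Round-robin assignment keeps the buckets roughly the same size.
bucket = [f for i, f in enumerate(features) if i % total_shards == shard_index]

# Hand this list to the test runner invoked by the Jenkins job for this shard.
print(' '.join(bucket))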
We started to develop some jobs with Flink. Our current development/deployment process looks like this:
1. develop code in local IDE and compile
2. upload jar-file to server (via UI)
3. register new job
However, it turns out that the generated jar file is ~70 MB and the upload process takes a few minutes. What is the best way to speed up development (e.g. using an on-server IDE)?
One solution is to use a version control system and, after committing and pushing your changes, build the jar on the server itself. You could write a simple script for this.
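A hypothetical version of such a script in Python (the repository path and the Maven goal are assumptions; substitute Gradle/sbt as appropriate):

import subprocess

# Assumed location of the job's checkout on the server.
REPO_DIR = '/opt/jobs/my-flink-job'

def build_latest_jar():
    # Pull the commits you just pushed, then build the jar on the server itself,
    # so only small source diffs travel over the network instead of a ~70 MB jar.
    subprocess.run(['git', '-C', REPO_DIR, 'pull', '--ff-only'], check=True)
    subprocess.run(['mvn', '-q', 'clean', 'package', '-DskipTests'], cwd=REPO_DIR, check=True)

if __name__ == '__main__':
    build_latest_jar()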
The other solution, which takes more time and effort, is to set up a CI/CD pipeline that automates the entire process and minimises the manual effort.
Also, try not to use a fat jar, to keep the jar size down if you have to scp it to the cluster.
First off, uploading 70 MB shouldn't take minutes nowadays. You might want to check your network configuration. Of course, if your internet connection is simply not good, you can't help it.
In general, I'd try to avoid cluster test runs as much as possible. They are slow and hard to debug, and should only be used for performance tests or right before releasing into production.
All logic should be unit tested. The complete job should be integration tested, and ideally you'd also have an end-to-end test. I recommend using a Docker-based approach for external systems, for example Testcontainers for Kafka, so that you can run all tests from your IDE.
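To illustrate the idea (assuming the Python testcontainers and kafka-python packages, and a placeholder run_my_job step; with a Java/Scala job you would use Testcontainers for the JVM instead), an integration test can look roughly like this:

import pytest
from kafka import KafkaConsumer, KafkaProducer
from testcontainers.kafka import KafkaContainer

@pytest.fixture(scope='session')
def kafka_bootstrap():
    # Starts a throwaway Kafka broker in Docker for the duration of the test session.
    with KafkaContainer() as kafka:
        yield kafka.get_bootstrap_server()

def test_job_end_to_end(kafka_bootstrap):
    producer = KafkaProducer(bootstrap_servers=kafka_bootstrap)
    producer.send('input-topic', b'some test record')
    producer.flush()

    # run_my_job(kafka_bootstrap) is a placeholder for starting the job under test
    # against the containerised broker (e.g. via a local MiniCluster or PyFlink).

    consumer = KafkaConsumer('output-topic',
                             bootstrap_servers=kafka_bootstrap,
                             auto_offset_reset='earliest',
                             consumer_timeout_ms=10000)
    assert any(consumer)  # the job produced at least one output record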
Going onto the test cluster should then be a rare thing. If you find any issue that has not been covered by your automated tests, add a new test case and solve it locally, so that there is a high probability that it will also be solved on the test cluster.
edit (addressing comment):
If it's much easier for you to write a Flink job to generate data, then it sounds like a viable option. I'm just fearing that you would also need to test that job...
It rather sounds like you want an end-to-end setup where you run several Flink jobs in succession and compare the final results. That's a common setup for complex pipelines consisting of several Flink jobs. However, it's rather hard to maintain and may have unclear ownership if it involves components from several teams. I prefer to solve this with lots of metrics (how many records are produced, how many invalid records are filtered in each pipeline...) and with specific validation jobs that just assess the quality of the (intermediate) output (potentially involving humans).
So in my experience, either you can test the logic in some local job setup, or the setup is so big that you spend much more time setting up and maintaining the test environment than actually writing production code. In the latter case, I'd rather trust and invest in the monitoring and quality-assurance capabilities of (pre-)production, which you need to have anyway.
If you really just want to test one Flink job with another, you can also run the Flink job in testcontainers, so technically it's not an alternative but an addition.
There is a built-in mechanism in Protractor to run multiple instances of Chrome for a given number of test suites.
However, two tests running in parallel can and will change common data, causing one or both to fail.
My best bet at the moment is to use Docker containers running the app with separate Mongo DBs, which I'm thinking is a pain to set up.
This probably won't be the answer you want, but... the trick to running parallel tests is to ALWAYS write your tests so they can be run in parallel. This means taking advantage of any and all strategies towards this goal, including using multiple users/accounts, and creating/deleting test data for each test. This also means tests cannot depend on other tests (coupling), which is a bad idea regardless of sharding.
The reason you need to do it up front is that there are no situations where you wouldn't want your tests to run faster. And in addition to just sharding Protractor tests, you may want to further increase test speed by also employing Docker containers in parallel.
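For example, instead of sharing one account across specs, each test can create and tear down its own uniquely named data. A sketch in Python/pytest terms, where api_client and its methods are hypothetical stand-ins for whatever test-data API your app exposes; the same pattern carries over to Protractor specs:

import uuid
import pytest

@pytest.fixture
def unique_user(api_client):
    # api_client and its methods are hypothetical, assumed to come from your own conftest.
    # Every test gets its own uniquely named account, so parallel tests never share data.
    username = 'e2e-%s' % uuid.uuid4().hex[:8]
    user = api_client.create_user(username, password='test-password')
    yield user
    api_client.delete_user(user)  # clean up so reruns start from a known state

def test_update_profile(unique_user, api_client):
    api_client.login(unique_user)
    api_client.update_profile(unique_user, display_name='New Name')
    assert api_client.get_profile(unique_user)['display_name'] == 'New Name'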
Again... probably not what you want to hear, but...
Good luck!
This topic is the beginning of the answer I am looking for. I need to know more.
Short story:
Why use GRID if pure TestNG parallel execution seems to work just fine?
Long story:
Background:
We are running about 40 tests now, growing.
We only use one browser (Chrome).
To make tests faster we do parallel testing (makes sense).
We face issues configuring the Grid solution; in many cases we just drop it and run pure TestNG parallel.
Question:
I need to know whether it even makes sense to be so stubborn about the whole Grid thing. For now it only seems to consume time without giving any additional value.
My own thoughts:
The only thing I can think of to justify the Grid is running the tests on different machines, if we actually needed to balance the load across several servers. But at this point even my own laptop is doing the job perfectly, and this situation will not change dramatically in the near future, so why bother?
The link mentioned above claims that the results of no-Grid parallel tests may become unpredictable. We do not see that. So the question may be: unpredictable in what sense? What should we watch out for?
Thanks in advance for your help.
cheers,
Greg
The Grid acts as a load balancer and distributes tests to nodes according to the desired capabilities, while the parallel attribute in the TestNG XML just instructs the TestNG runner to trigger n tests in one go.
CAVEAT: If you do not use the Grid for parallel test execution, your single host will get overloaded as you scale up the thread count. The results of no-Grid parallel tests may become unpredictable because multiple sessions fill up the heap memory quickly, and a general-purpose computer has limited heap memory. You may not be facing this issue simply because you have not hit that limit yet.
Let's consider some examples:
Your target is to check functionality on Windows as well as on macOS. Without the Grid you would run the cases twice.
You have a test case where some functionality breaks on an older version of a browser, and now it's time for a regression test. Without the Grid you would run the test cases multiple times, once per older browser version.
A case that depends on different screen resolutions.
The Grid can simplify the configuration effort for all of these.
It's all about keeping the time needed to run a given number of test cases as short as possible.
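For what it's worth, the Grid changes almost nothing in the test code itself; only the driver construction differs, which is why keeping both options open is cheap. A minimal Python sketch (the hub URL is an assumption; the same idea applies to Java's RemoteWebDriver):

from selenium import webdriver

GRID_HUB = 'http://my-grid-host:4444/wd/hub'  # hypothetical hub address

def make_driver(use_grid):
    options = webdriver.ChromeOptions()
    if use_grid:
        # The hub picks a node matching the requested capabilities
        # (browser, version, platform) and balances the sessions across nodes.
        return webdriver.Remote(command_executor=GRID_HUB, options=options)
    # Plain local driver: fine until parallel sessions exhaust a single machine.
    return webdriver.Chrome(options=options)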
I am a little confused about the concept of test automation (using Selenium etc.) when doing regression testing. If the system being tested is constantly changing, how does that affect the test cases? And is automation the best way to go in this scenario?
Regression testing means you test to see if the system still behaves the way it should, in particular, if it still does everything it did correctly before any change.
Most users will complain when you break features in software, so you can't get around regression testing before a release. That leaves the question of how you do it.
You can manually test. Hire a bunch of monkeys, interns, testers, or whatever, and let them test. In order for them to find any regressions, they need to know what to test. So you need test scripts, which tell the tester what functionality to test: which button to click and what text to enter and then what result to expect. (This part rules out most monkeys, unfortunately.)
The alternative is automated testing: you still have a kind of test script, but this time no manual tester works through the script; a computer does instead.
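To make that concrete, an automated test is just code that performs the same steps and checks the same expected result as the written test script; a tiny hypothetical Selenium example in Python (the URL and element names are made up):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
try:
    # Test-script step: "log in with a valid user and expect to land on the dashboard".
    driver.get('https://example.test/login')                     # hypothetical app URL
    driver.find_element(By.NAME, 'username').send_keys('demo')
    driver.find_element(By.NAME, 'password').send_keys('secret')
    driver.find_element(By.ID, 'login-button').click()
    assert 'Dashboard' in driver.title                           # expected result
finally:
    driver.quit()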
Advantages of automated testing:
It's usually faster than manual testing.
You don't need to hire testers, interns, or monkeys.
You don't need to worry about humans getting tired of the repetitive work, missing a step, or losing focus while clicking through the same old program over and over.
Disadvantages of automated testing:
It won't catch everything; in particular, some UI aspects may be hard to automate: a person will notice overlapping text or pink text on a neon-green background, but Selenium is happy as long as it can click it.
First you need to write the tests, and then maintain them. If you just add features, maintenance is not so bad, but if you e.g. restructure your user interface, you may have to adjust all tests (Page Objects come in handy here; see the sketch below). Then again, you would also have to rewrite all the manual test scripts in such a situation.
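The Page Object idea is to funnel all knowledge of a page's structure through one class, so a restructured screen means editing that class rather than every test; a hypothetical sketch:

from selenium.webdriver.common.by import By

class LoginPage:
    # All locators for the login screen live here; if the UI is restructured,
    # only this class changes, not every test that needs to log in.
    URL = 'https://example.test/login'  # hypothetical URL

    def __init__(self, driver):
        self.driver = driver

    def open(self):
        self.driver.get(self.URL)
        return self

    def login(self, username, password):
        self.driver.find_element(By.NAME, 'username').send_keys(username)
        self.driver.find_element(By.NAME, 'password').send_keys(password)
        self.driver.find_element(By.ID, 'login-button').click()

# A test now expresses intent rather than page structure:
# LoginPage(driver).open().login('demo', 'secret')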
Regression automation testing tools are among the most widely used tools in the industry today. Let me help you with an example, considering your scenario of software undergoing continuous change. Assume we are following a Scrum-based model in which the software is developed across several sprints, each consisting of, say, 5 user stories/features. Sprint 1 is developed, tested and delivered.
The team then moves on to Sprint 2, which again has 5 big features. By the time the development team hands those features over to the testing team, the testing team starts writing automated scripts for Sprint 1. The testing team runs the scripts, say on a daily basis, to check that the features being developed in Sprint 2 do not break the previously working and tested features of Sprint 1. This is nothing but automated regression testing.
Of course, this is not as easy as it sounds. A lot of investment is needed for automated testing. Investment not only in terms of money but also time, training costs, hiring specialists etc.
I worked on a project that consisted of approximately 25 sprints and hundreds of user stories, and it spanned 2 years. With just 2 testers on the team, imagine the plight of the project had there been no automated test suite.
Again, automation cannot entirely replace manual testing, but it can to quite some extent. Automated tests can be functional as well as visual regression ones. You can very well use Selenium to automate your functional tests and any visual regression tool to check for CSS breaks.
NOTE: Not every project needs to be automated. You have to consider the ROI (Return on Investment) when thinking about automating any project.
Regression testing is usually performed to verify that code changes made to a system do not break existing code, introduce new bugs, or alter the system's functionality. As such, it should be performed every time you deploy new functionality to your application, add a new module, alter system configuration, fix a defect, or make changes to improve system performance.
Below is a simple regression test harness written in Python using some commonly used standard-library services. The script helps catch errors that stem from changes made to a program's source code.
#!/usr/bin/env python3
import os, sys                       # OS and interpreter services
from stat import ST_SIZE            # or use os.path.getsize
from glob import glob               # file name expansion
from os.path import exists          # file exists test
from time import time, ctime        # time functions

print('RegTest start.')
print('user:', os.environ['USER'])  # environment variables
print('path:', os.getcwd())         # current directory
print('time:', ctime(time()), '\n')

program = sys.argv[1]               # two command-line args: program and test dir
testdir = sys.argv[2]

for test in glob(testdir + '/*.in'):            # for all matching input files
    if not exists('%s.out' % test):
        # no prior results: generate and save the expected output
        os.system('%s < %s > %s.out 2>&1' % (program, test, test))
        print('GENERATED:', test)
    else:
        # back up prior results, rerun, and compare
        os.rename(test + '.out', test + '.out.bkp')
        os.system('%s < %s > %s.out 2>&1' % (program, test, test))
        os.system('diff %s.out %s.out.bkp > %s.diffs' % ((test,) * 3))
        if os.stat(test + '.diffs')[ST_SIZE] == 0:
            print('PASSED:', test)
            os.remove(test + '.diffs')
        else:
            print('FAILED:', test, '(see %s.diffs)' % test)

print('RegTest done:', ctime(time()))
Regression tests like the one above are designed to cover both functional and non-functional aspects of an application. This ensures bugs are caught with every build, thereby enhancing the overall quality of the final product. While regression tests are a vital part of the software QA process, performing these repetitive tests manually comes with a number of challenges: manual tests can be tedious, time-consuming, and less accurate. Additionally, the number of test cases increases with every build, and the regression test suite keeps growing with it.
An easy way of addressing these challenges and maintaining a robust and cohesive set of regression test scripts is automation. Test automation increases the accuracy and widens the coverage of your regression tests as your test suite grows.
It is worth mentioning that even with automation, your regression tests are only as good as your test scripts. For this reason, you must understand which events trigger the need to improve and modify the test scenarios. For every change pushed to your codebase, you should evaluate its impact and modify the scripts to ensure all affected code paths are verified.
I hope this answered your question.
Let's say I have a bunch of unit tests, integration tests, and e2e tests that cover my app. Does it make sense to have these continuously running against prod, e.g. every 10 mins?
I'm thinking no, here's why:
My tests are already run after every prod deploy. If they passed and no code has changed since then, they should continue to pass, so re-running them afterwards doesn't make sense.
What I really want to test continuously is my infrastructure: is it still running? In this case, running an API integration test every 10 minutes to check whether my API is still working makes sense. So I'm dealing with a subset of my test suites, the ones that test my infrastructure's availability (integration + e2e) rather than only single bits of code (unit tests). So in practice, would I have separate test suites that test prod uptime, distinct from the suites used to test pre/post deploy?
Such "redundant" verifications (they can include building as well, BTW, not only testing) offer additional datapoints increasing the monitoring precision for your actual production process.
Depending on the complexity of your production environment even the simple "is it up/running?" question might not have a simple answer and subset/shortcut versions of the verifications might not cut it - you'd only cover those versions, not the actual production ones.
For example just because a build server is up doesn't mean it's also capable of building the product successfully, you'd need to check every aspect of the build itself: availability of every tool, storage, dependencies, OS resources, etc. For complex builds it's probably simpler to just perform the build itself than to manage the code reliably checking if the build would be feasible ;)
There are 2 attributes of the production process that would benefit from more precise monitoring (and for which subset/shortcut verifications won't be suitable either):
reliability/stability - the types, occurrence rates and root causes of intermittent failures (yes, those nasty surprises which could make the difference between meeting the release date or not)
performance - the avg/min/max durations of the various verifications; especially important if the verifications are expensive in terms of duration or resources; trending can be useful for planning, budgeting, production ETAs, etc. (a trivial way to start collecting such data is sketched below)
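If you want to start trending those durations without buying any tooling, even a small wrapper that times each verification run and appends the result to a CSV gives you min/avg/max and failure rates over time; a hypothetical sketch (the command being timed is a placeholder):

import csv
import subprocess
import time
from datetime import datetime, timezone

def run_and_record(name, command, logfile='verification_timings.csv'):
    # Run one verification (build, test suite, smoke check...) and record when it ran,
    # how long it took and whether it passed, so the data can be trended later.
    start = time.monotonic()
    result = subprocess.run(command)
    duration = time.monotonic() - start
    with open(logfile, 'a', newline='') as fh:
        csv.writer(fh).writerow([datetime.now(timezone.utc).isoformat(), name,
                                 '%.1f' % duration, result.returncode == 0])

# Placeholder command; point it at whichever verification you schedule against prod.
run_and_record('api-smoke-test', ['pytest', 'tests/smoke', '-q'])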
Dunno if any of these are applicable to, or have acceptable cost/benefit ratios for, your context, but they are definitely important for most very large/complex software projects.