Testing pandas code [closed] - pandas

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
I have a script of about 50 lines that reads data from a database, loads it into a pandas dataframe and then performs numerous operations on the dataframe.
I was wondering how people generally test this type of code? I'm not talking about tools like assert_frame_equal, but rather about the principles people follow.
For instance, should I create 50 separate tests to test each operation performed, or should I try to break up the script into smaller parts?
If anybody knows of quality open source projects that I can use as inspiration, please let me know.

If you want to start writing Python unit tests, this question is recommended.
Since the 50 lines are all related to each other, you probably want a functional test.
Read up on the difference between unit, functional, acceptance, and integration testing.
If you know the SOLID principles of object-oriented design, you will see that the code needs refactoring.
On how to design a good test, see "What are the properties of good unit tests".
Specific to pandas: use less data so that the tests run faster.
Make a dummy copy for testing rather than using the original data.
And check mainly the key feature you want to verify.

I would suggest the following approach:
Split the script into a data retrieval part and a data processing part. It's better to test your data access/query code and your computations separately.
Prepare a fixed dataset to use for the tests. It may be a subset of your production data or a special dataset that covers boundary conditions (NaNs, zeroes, negative values, etc.).
Write test cases that check the results of your computations. You may check values directly, or compute aggregates (COUNT, SUM) and compare them with expected values.
The number of checks depends on the data and the computations. In some cases it might be enough to check only the SUM() of all elements; in others, you need to check every item.
I prefer to check only a few general conditions that would fail if something went wrong, rather than cover all possible cases.
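As a rough sketch of the approach above, assuming the processing step can be pulled out into a pure function: `add_revenue`, `make_fixture` and the column names `price`, `quantity` and `revenue` are hypothetical and not taken from the original script. The database read stays outside the function, so the test needs only a small hand-built frame with a few boundary values.

```python
import numpy as np
import pandas as pd


def add_revenue(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical processing step: DataFrame in, DataFrame out, no database access."""
    out = df.copy()  # leave the caller's frame untouched
    out["revenue"] = out["price"] * out["quantity"]
    return out


def make_fixture() -> pd.DataFrame:
    """Small fixed dataset covering a NaN, a zero and a negative value."""
    return pd.DataFrame(
        {
            "price": [10.0, np.nan, 5.0, -2.0],
            "quantity": [3, 1, 0, 4],
        }
    )


def test_add_revenue() -> None:
    result = add_revenue(make_fixture())
    # Aggregate checks: row count and SUM (pandas skips NaN when summing).
    assert len(result) == 4
    assert result["revenue"].sum() == 22.0
    # One direct check on a boundary row: a NaN price gives a NaN revenue.
    assert pd.isna(result.loc[1, "revenue"])


test_add_revenue()
```

A test runner such as pytest would collect `test_add_revenue` on its own; the explicit call at the end is only there so the snippet also runs as a plain script.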

Related

Test result reporting [closed]

Closed 2 years ago.
I am using TestNG and Selenium WebDriver with Java.
Is there any tool that can help generate detailed test results? For example, if I have a test case that fails more often than not, is there a tool that can report statistically which test cases fail more often than others, for instance as a graph or pie chart?
XL Testview
Have a look at XL Testview from XebiaLabs.
Test analytics and decision support that spans testing tools
See all your test results in one single dashboard
Analyze test results across multiple test tools
Track release metrics and quality trends over time
Use real-time quality data to make the best go/no-go release decisions
I haven't used it, but it seems to track results over time. It looks pretty interesting.
Test Result Analyzer
Or have a look at the Test Result Analyzer plugin for Jenkins.
Many of us have a requirement of knowing the execution status of a
test package, test class or test method across multiple builds. This
plugin is an implementation of the said requirement and shows a table
containing the execution status of a package, class or test method
across builds.
This plugin supports JUnit and TestNG result sets. It looks like the minimum you want, and it is free. :)
Tesults - it handles all this including identifying recurring / continuing fails. In general it's a great central place for a team to view and assign failing tests. Please be aware that I work at Tesults.
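None of the tools above is shown here. The snippet below only sketches the underlying counting idea, with invented test names and results. In practice the per-build results would have to be parsed from each build's test reports, and `failure_rates` is a hypothetical helper, not part of any of these products.

```python
from collections import Counter

# Hypothetical results: one dict per build, mapping test name to passed (True/False).
builds = [
    {"testLogin": True, "testCheckout": False, "testSearch": True},
    {"testLogin": True, "testCheckout": False, "testSearch": False},
    {"testLogin": True, "testCheckout": True, "testSearch": True},
    {"testLogin": False, "testCheckout": False, "testSearch": True},
]


def failure_rates(builds):
    """Return (test, failures / runs) pairs, most frequently failing first."""
    runs, failures = Counter(), Counter()
    for build in builds:
        for test, passed in build.items():
            runs[test] += 1
            if not passed:
                failures[test] += 1
    rates = {test: failures[test] / runs[test] for test in runs}
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)


for test, rate in failure_rates(builds):
    print(f"{test}: failed {rate:.0%} of runs")
```

The sorted list can be fed to any charting library to get the graph or pie chart asked about in the question.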

Optimal amount of Automation rework [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
During the development cycle of a feature, the feature is constantly changing, even after the point where it meets all requirements (UI improvements, etc.). If you have automated tests for that feature, these changes can break your automation and you will have to rework it. If the feature keeps changing, though, it does not make sense to rework the automation after every revision. At some point, however, you have to automate it so you can do regression testing. How can we find the optimal time to rework automation? How do we get the optimal amount?
My team agreed that we over-reworked the automation of one of our features. One mistake we made was to rework the automation right before a conference where we showed the feature off to customers to get their feedback. We should have known that customer feedback would result in more changes to the feature. Functional testing should have been enough in that case.
Does anyone have any tips or experiences to share?
The general tip would be to come to a consensus on what "done" means for the feature before you start building it.
If during the build you come across some new tweaks that you'd like to add to improve the feature (or whatever) don't add them to the existing story - write a new one... and make sure that you prioritise it against the other things that you need to be doing.
This is also sometimes, but not always, a sign that you're working with increments of functionality that are too large. Try splitting and thinning the stories further until you can write down some quite concrete definitions of "done" for the feature. Consider automating those tests of "done" before you start building (but don't go overboard).
You might find the Specification By Example book of use.
In my experience, this means that the feature you've been developing is not yet fully understood by the customers.
Separate the feature into small parts, as #adrianh suggested.
One more tip for customers who keep changing their minds: let them see a rough prototype first hand, maybe even in the planning meeting (code it directly in HTML, or use something easier such as a prototyping/diagramming tool). Let them play with it. This way, you will have an easier time with your features.

Generating dummy data for my web application - looking for dictionaries [closed]

Closed 8 years ago.
Sorry if this is off topic - but it is certainly programming related.
I need to test my web application at scale (concurrent users and amount of data in the system). For the latter, I need some way of generating dummy data for a variety of types (name, address, email and some other data types).
Are there any open source (free) or commercial providers of dummy data dictionaries, in any format but preferably MySQL? I don't really need a whole application, just the data.
How have others solved this problem?
edit:
Sorry if I wasn't clear. I don't need a way to code this - I just need the dummy data(base) files to provide the raw information. I don't want nonsense data (like randomly generated characters) because this won't allow us to perform usability tests or demonstrations. If this isn't available in open source - does anyone know why not?
edit 2:
I've seen generatedata.com, but the database that backs the application is too small. I need to test around 100,000 users, and I need data types that are not supported by that application. Even just an English dictionary in database form would be useful.
This website offers a lot of free data for testing purposes: www.fakenamegenerator.com
Could you just write a simple script to programmatically randomly generate the required data? I would use python, but you could do it in practically anything.
Something along the lines of this pseudo code should do the trick:
    for i in range(0, 100000):
        name = randomName()
        email = randomEmail()
        insertIntoSomeTable(name, email)
Where randomName generates a random name, randomEmail generates a random email and insertIntoSomeTable takes the randomly generated data and inserts it into one of your tables. These functions should be trivial to implement.
Repeat for all of the tables you need random data for.
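A runnable version of the pseudo code above, as a minimal sketch. The word lists, the `users` table and the in-memory SQLite database are placeholders chosen for illustration (the question targets MySQL), and the helper names mirror the pseudo code rather than any real library.

```python
import random
import sqlite3

# Tiny placeholder dictionaries; a real run would load much larger lists.
FIRST_NAMES = ["Alice", "Bob", "Carol", "David", "Eve"]
LAST_NAMES = ["Smith", "Jones", "Brown", "Taylor", "Wilson"]
DOMAINS = ["example.com", "example.org", "example.net"]


def random_name(rng: random.Random) -> str:
    return f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"


def random_email(name: str, index: int, rng: random.Random) -> str:
    # The index keeps addresses unique even when names repeat.
    local = name.lower().replace(" ", ".")
    return f"{local}.{index}@{rng.choice(DOMAINS)}"


def fill_users(conn: sqlite3.Connection, count: int, seed: int = 0) -> None:
    rng = random.Random(seed)  # a fixed seed makes the data reproducible
    conn.execute("CREATE TABLE IF NOT EXISTS users (name TEXT, email TEXT)")
    rows = []
    for i in range(count):
        name = random_name(rng)
        rows.append((name, random_email(name, i, rng)))
    # One batched insert is much faster than 100,000 single statements.
    conn.executemany("INSERT INTO users (name, email) VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
fill_users(conn, 100_000)
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])
```

Because the names come from word lists rather than random characters, the rows remain readable enough for demonstrations, which was the concern raised in the question's first edit.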

Traceability Matrix between Requirements and Design Document [closed]

Closed 5 years ago.
I have been asked to create a traceability matrix that maps between the Requirements and Design documents. I am having a lot of trouble working out how to link a single requirement to the design, as the link is nearly always 1:M and is therefore difficult to map and maintain. Can anyone point me in the direction of any examples, or provide some advice on how you manage the matrix in this context? Requirements to Testing makes sense to me, but I fail to see why I need Requirements to Design; apparently this is required for our CMMI3 audit.
Thanks for the help
It appears to me that you are talking about the role of a requirements analyst. There are various tools to help in this process; the leading commercial contender is IBM DOORS. However, I believe this can equally well be achieved using a wiki, with hyperlinks between wiki pages to denote dependency and linkage.
If you have a Requirements Spec and a Design and they aren't already linked in some way then your boss has missed the point of Requirements Management in the first place.
Requirements should guide the design process and be linked from the beginning, not merely linked afterwards to keep an auditor happy. Anything you design should be done in a particular way to meet a requirement.
To cut a long story short... Personally, I would stick both the Requirements and the design in a wiki and link them together as I mentioned above. You're basically being asked to make the documentation for a process that either didn't occur or wasn't written down.
The compliance matrix is a two-dimensional table that maps the functional requirements of the product to the prepared test cases. The column headings hold the requirements and the row headings hold the test scenarios. A mark at an intersection means that the requirement in that column is covered by the test scenario in that row.
The compliance matrix is used by QA engineers to validate that the product is covered by tests. The traceability matrix (TM) is an integral part of the test plan.
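To make the table concrete, here is a small sketch with invented requirement and test-case IDs. The `coverage` mapping and the helper names `build_matrix` and `uncovered` are illustrative only.

```python
# Hypothetical IDs: which requirements each test case covers.
coverage = {
    "TC-1": {"REQ-1", "REQ-2"},
    "TC-2": {"REQ-2"},
    "TC-3": {"REQ-4"},
}
requirements = ["REQ-1", "REQ-2", "REQ-3", "REQ-4"]


def build_matrix(requirements, coverage):
    """Rows are test cases, columns are requirements; True marks coverage."""
    return {
        test: [req in covered for req in requirements]
        for test, covered in coverage.items()
    }


def uncovered(requirements, coverage):
    """Requirements that no test case covers."""
    covered = set().union(*coverage.values())
    return [req for req in requirements if req not in covered]


matrix = build_matrix(requirements, coverage)
print("      " + " ".join(requirements))
for test, row in matrix.items():
    print(f"{test:<6}" + " ".join(("X" if hit else ".").center(5) for hit in row))
print("Uncovered:", uncovered(requirements, coverage))
```

The same structure handles the 1:M links from the question: a requirement may appear in any number of rows, and a column with no marks shows a gap in coverage.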

SQL proc diagram generating software of a program flow [closed]

Closed 8 years ago.
I have a couple of very long procs in Oracle (2000+ lines) with lots of calls, and I'd like to generate a program flow diagram (algorithm) to better understand the process before further refactoring.
I didn't write the code, so I don't know the logic well enough.
What would you advise in this case? I tried to draw a text-like flow, but it takes a lot of time and it is still hard to cover all the logic well enough to understand it.
The best approach I can see would be a flow chart generated from the SQL proc, with links to "jump" between code and chart.
UPDATE:
Found a couple of tools that do this:
ClearSQL - makes CRUD diagrams, call maps, and flow charts.
Quest SQL Navigator Expert (using it now): it has the Outline feature (shows the code flow with the ability to collapse and expand blocks of code - really cool!) and the Code Explorer feature (enumerates all functions and parameters with links to the SQL text, within the interface only).
There is a product, Code Visual to Flowchart, which can take code in various languages and do something like what you're describing. Unfortunately, Oracle doesn't appear to be in the list of supported languages, but Microsoft TSQL is; maybe you could at least translate your proc from Oracle into TSQL and use this to roughly visualize your proc's flow.
Failing that, burnall's suggestion sounds like the best way to go: essentially divide and conquer.
I doubt that such a tool, even if it exists, would help you understand the code better. I think the time of big flow charts is over.
I would advise understanding the logic through step-by-step refactoring:
iteratively extract parts of the procedure into smaller procedures and add tests.