In BDD, should scenarios be abstract descriptions of behavior, or should they contain concrete examples? - testing

I feel I have reached a fundamental dilemma in writing BDD scenarios as a tester.
When writing BDD scenarios from a testing perspective, I tend to end up using concrete examples with concrete data and observing the state, i.e. Given these initial values, When the user performs an action, Then these final values should be observed. Example with an initial dataset given in Background:
Background:
  Given following items are in the store
    | type   | name | X  | Y    | Z    | tags |
    | single | el1  | 10 | 20   | 1.03 | t1   |
    | multi  | el2  | 10 | 20   | 30   | t2   |
    | single | el3  | 10 | 3.02 | 30   | t3   |

Scenario: Adding tag to multi-type item
  Given Edit Item Popup is opened for item: el2
  When user adds tag NEWTAG
  And user clicks on Apply changes button
  Then item store should display following items
    | type   | name | X  | Y    | Z    | tags       |
    | single | el1  | 10 | 20   | 1.03 | t1         |
    | multi  | el2  | 10 | 20   | 30   | t2, NEWTAG |
    | single | el3  | 10 | 3.02 | 30   | t3         |
The initial dataset from Background can be reused in all (or most) scenarios that deal with modifying and adding/deleting items in relation to a particular feature. I can also iterate the scenario over some data set that explores the problem space, boundary conditions etc. (a trivial example here: tags with too many or forbidden characters), as in the sketch below.
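For illustration, a minimal Scenario Outline along those lines might look like this; the step wording, the length limit, and the error texts are invented for the example, not taken from a real feature:

    Scenario Outline: Rejecting an invalid tag
      Given Edit Item Popup is opened for item: el2
      When user adds tag <tag>
      And user clicks on Apply changes button
      Then error message "<error>" should be displayed
      And item el2 should still have tags t2

      Examples:
        | tag                                   | error                             |
        | t@g/!                                 | Tag contains forbidden characters |
        | a-tag-name-longer-than-32-characters  | Tag is too long                   |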
But when requirements are not entirely clear, I sometimes go with a different approach and start from a more abstract description of the behavior (so that scenarios can become the specification), which seems to me the more (for lack of a better word) correct way of doing BDD. So I end up with behavior descriptions which are perfectly clear when a human reads them from the requirement-analysis position, but appear extremely vague when you shift to the testing perspective:
Scenario: Adding tag to multi-type item
  Given Edit Item Popup is opened for multi-type item
  When user adds a new tag
  And user clicks on Apply changes button
  Then that item should have that tag displayed in item store
For some reason I feel way better writing a scenario like that, as it seems closer to BDD ideals (describing the behavior, doh!). But at the same time I feel terrible, for two reasons:
A lot of details are implicit here and thus hidden deep in the implementation. Because of that, while implementing, we need to ask ourselves a ton of questions like 'what initial data should I use here?', 'how do we keep track of which item we are handling?', 'how deeply should I examine the final state?'. This all goes away when you just compare the final state with a reference table, as in the first approach.
(Possibly more serious) I am not exploring the problem space here at all, while bugs often await us somewhere in the dark corners of that space.
One could argue that these two approaches I presented are just extreme ends of a spectrum, but I still see them as fundamentally different, and often find myself wondering which approach to choose.
So, how do you write your BDD (test) scenarios? Data-driven and state-comparing, or full-blown abstract descriptions of behavior?

Related

In a Cucumber feature file, can a scenario outline's data table contain < >-delimited parameters from Examples?

Please tell me if it is possible to have an angle-bracket parameter in a data table of a Scenario Outline. I am new to feature file development, hence I need your help.
Feature: Testing table can have angle parameter of examples

Scenario Outline: outline
  When a table step:
    | Day | Months   |
    | 30  | <Months> |
  Then verify if day exist in this month<DoesContainInThisMonth>

  Examples:
    | Months | DoesContainInThisMonth |
    | Jan    | No                     |
    | Feb    | No                     |
    | Mar    | Yes                    |
I do not have a direct solution to this problem in the form you are looking for.
But maybe a different approach could help.
To give an example: in the column "Months" you could have a list of the months that meet each requirement ("Days"?), and then check, in the StepDefinitions code, whether a specific month is included in this list. This way, you only need one table.
In my opinion that makes the test clearer, which is one of the benefits of BDD.
If you provide more information about the objective of this scenario, it is likely that more helpful answers will come up.
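As a rough sketch of that idea (the step names and table layout are invented for illustration), the scenario could carry a single table listing, per day, the months that contain it, with the membership check living in the step definitions:

    Scenario: Verify which months contain day 30
      When I look up the months containing each day:
        | Day | MonthsContaining                                      |
        | 30  | Jan, Mar, Apr, May, Jun, Jul, Aug, Sep, Oct, Nov, Dec |
      Then month Feb should not contain day 30
      And month Mar should contain day 30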

Best data structure for finding tags of nested locations

Somebody pointed out that my data structure architecture sucks.
The task
I have a locations table which stores the name of a location. Then I have a tags table which stores information about those locations. The locations form a hierarchy, which I want to use to get all tags.
Example
Locations:
USA <- California <- San Francisco <- Mission St
Tags:
USA: English
California: Sunny
California: West coast
San Francisco: Sea side
Mission St: Cable car station
If somebody requests information about Mission St, I want to deliver all tags of it and its ancestors (["English", "Sunny", "West coast", "Sea side", "Cable car station"]). If I request all tags of California, the answer would be ["English", "Sunny", "West coast"].
I'm looking for the best read performance! I don't care about write performance; this data is not changed very often. And I don't care about table sizes either. If I need more or larger tables to solve this quicker, so be it.
The tables
So currently I'm thinking about setting up these tables:
locations

id | name
---|--------------
1  | USA
2  | California
3  | San Francisco
4  | Mission St

tags

id | location_id | name
---|-------------|------------------
1  | 1           | English
2  | 2           | Sunny
3  | 2           | West coast
4  | 3           | Sea side
5  | 4           | Cable car station
ancestors
I added a position field to store the hierarchy.
| id | location_id | ancestor_id | position |
|----|-------------|-------------|----------|
| 1 | 2 | 1 | 1 |
| 2 | 3 | 2 | 1 |
| 3 | 3 | 1 | 2 |
| 4 | 4 | 3 | 1 |
| 5 | 4 | 2 | 2 |
| 6 | 4 | 1 | 3 |
Question
Is this a good solution to the problem, or is there a better one? I want to select, as fast as possible, all tags of any given location, including all the tags of its ancestors. I'm using a PostgreSQL database, but I think this is a pure SQL architecture problem.
Your problem seems to consist of two challenges. The most interesting is "how do I store hierarchies in a relational database?". There are lots of answers to that - the one you've proposed (a closure table) is the most common.
There's an alternative called "nested set", which is faster for reading (in your example, finding all locations within a particular hierarchy would be a "between x and y" range check).
Postgres has dedicated support for hierarchies (recursive queries and the ltree extension); I'd assume this would also provide great performance.
The second part of your question is "given a path in my hierarchy, retrieve all matching tags". The easiest option is to join to the tags table, as you suggest.
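A rough sketch of that join, using the table and column names from the question (4 is Mission St in the sample data):

    -- All tags for a location plus its ancestors, via the ancestors table.
    SELECT t.name
    FROM tags t
    WHERE t.location_id = 4
       OR t.location_id IN (SELECT a.ancestor_id
                            FROM ancestors a
                            WHERE a.location_id = 4);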
The final aspect is "should you denormalize/precalculate". I usually recommend building and optimizing the "normalized" solution and only denormalize when you need to.
If you want to deliver all tags for a particular location, then I would recommend replicating the data and storing the tags in a tags array on a row for each location.
You say that the locations don't change very much. So I would simply batch-create the entire table whenever any underlying data changes.
Modifying the data in situ is rather problematic. A single update could end up affecting a zillion different rows -- consider a tag change on USA. Recalculating the entire table is going to be more efficient.
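A minimal sketch of that rebuild in Postgres, assuming the locations/tags/ancestors tables above (the table name location_tags is made up):

    -- Rebuild the denormalized read table from scratch: one row per
    -- location, carrying its own tags plus all ancestor tags.
    DROP TABLE IF EXISTS location_tags;
    CREATE TABLE location_tags AS
    SELECT l.id AS location_id,
           array_agg(DISTINCT t.name) AS tags
    FROM locations l
    LEFT JOIN ancestors a ON a.location_id = l.id
    JOIN tags t
      ON t.location_id = l.id
      OR t.location_id = a.ancestor_id
    GROUP BY l.id;

    -- A read is then a single indexed lookup:
    -- SELECT tags FROM location_tags WHERE location_id = 4;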
If you need to search on the tags as well as return them, then I would go for a more traditional structure of a table with two important columns, location and tag. Then you can have indexes on both (location) and (tag) to facilitate searching in either direction.
If write performance is not crucial, I would go for denormalization of the database. That means you use the above structure for your write operations and fill a table for your read operations via a trigger or some async job, if you are afraid of triggers. Then the read performance is optimal, but you have to invest a bit more in the write logic.
Using the above structure directly for read operations is indeed not a smart solution, because you don't know how deep the tree can get.

Google Refine / Open Refine: Columns to Rows

I'm afraid this might be a somewhat simple question, but I can't seem to figure it out.
I have a spreadsheet with many objects, each of which has many attributes (one per column), like this (sorry, I can't post images, so this is the best I can do):
OBJECT ID | PERIOD             | COLOR        | REPRESENTATION
1         | Early Intermediate | Bichrome     | Abstract
2         | Middle Horizon     | Multicolored | Representational
… and I'd like each attribute value to become a separate row, which would mean that each object would be listed several times. Like this:
OBJECT   | ATTRIBUTE
Object 1 | Early Intermediate
Object 1 | Bichrome
Object 1 | Abstract
Object 2 | Middle Horizon
Object 2 | Multicolored
Object 2 | Representational
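(In SQL terms, just to pin down the goal, this is the classic wide-to-long unpivot; the table name objects is made up here, and this is not an OpenRefine answer:)

    SELECT object_id, period         AS attribute FROM objects
    UNION ALL
    SELECT object_id, color          AS attribute FROM objects
    UNION ALL
    SELECT object_id, representation AS attribute FROM objects
    ORDER BY object_id;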
I'm not seeing an obvious way to do this, and I can't find an answer here, though perhaps I'm not using the right search terms.
Thanks for any help you can offer!

Get the beginning of a union of intervals

Disclaimer
While searching for an answer, I found this question, but I couldn't find a way to express the solution in SQL:
Union of intervals
Background
I'm trying to calculate how long the people in the company I work for have been employed. In the database I have (which has been in the company for years and is [sadly] not changeable), each contract is stored as one line. Each line has a lot of information about the employee and the contract, including a contract creation date, a contract rescission date (or infinity, if still active) and the current contract situation ("active" or "deactivated"). There are, however, two problems that prevent me from simply doing what could seem obvious:
People can be "multicontractual", so the same person could have multiple active lines at the same time.
Sometimes there are transfers that result in deactivating one of a person's contracts and creating a new contract line. These transfers must not be counted as interruptions (i.e., I should take both timelines into account). There is, however, no explicit flag for a transfer's existence in the database, so it was defined that "it is a transfer if there was any contract rescission up to 60 days before a new contract is created".
When trying to account for the multiple cases that could arise from this scenario (e.g., if the same person had many contracts over time, then no contracts for more than 60 days, and then some other contracts, I'd want to start counting from after the "more-than-60-days" gap), I found that two rules solve the problem. I need:
The last contract creation where there was no other contract already active at the time (this solves problem 1),
&& where there was no other active contract up to 60 days before.
To the DB
To solve the problem, I decided to rearrange the rules. I wanted to take all contracts for which there was no other active contract up to 60 days before their creation, and then take the "MAX()" of them. So, for example, for the following person, I would say she has been active since 1973:
+----------+-----+-----------+-------------+---------------+-----------------+
| CONTRACT | ... | PERSON_ID | STATUS      | CREATION_DATE | RESCISSION_DATE |
+----------+-----+-----------+-------------+---------------+-----------------+
| 1        | ... | 1         | deactivated | 1973/10/01    | 1999/07/01      |
| 2        | ... | 1         | deactivated | 1978/06/01    | 2000/07/01      |
| 3        | ... | 1         | deactivated | 2000/08/01    | 2008/06/01      |
| 4        | ... | 1         | active      | 2000/08/01    | infinity        |
| 5        | ... | 1         | active      | 2000/08/01    | infinity        |
+----------+-----+-----------+-------------+---------------+-----------------+
I am treating the dates as if they were integers (in fact, they are in the real database). My question is: how could I create a query that picks out the "1973/10/01"? I.e., how could I get all the "creation_date"s that are at least 60 days away from (later than) the others, and that do not fall within the intervals described by the other lines?
[and, anyway, does this seem the best way to solve the problem? (I don't think so)]
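One possible shape for such a query, as a sketch only (Postgres syntax; the table name contracts is assumed, columns as shown above):

    -- A contract opens a new employment period if no other contract of the
    -- same person was active at any point during the 60 days before its
    -- creation; the MAX() of those creations is the "employed since" date.
    SELECT MAX(c.creation_date) AS employed_since
    FROM contracts c
    WHERE c.person_id = 1
      AND NOT EXISTS (
            SELECT 1
            FROM contracts o
            WHERE o.person_id = c.person_id
              AND o.contract <> c.contract
              AND o.creation_date < c.creation_date  -- same-day starts don't count
              AND o.rescission_date >= c.creation_date - INTERVAL '60 days'
          );
    -- For the sample data this returns 1973/10/01: every later contract has
    -- some earlier contract still active, or rescinded less than 60 days before.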

Data Modeling : Item With Dimensions (One-to-Many)

I need suggestions on how to properly model an item record along with all of its corresponding dimensions.
Consider the following:
| ITEM_ID | ITEM_DESCRIPTION | ITEM_PRICE | SIZE  | LENGTH | COLOR |
| SH01    | POLO SHIRT       | 22.95      | LARGE |        |       |
| PA02    | KHAKI PANTS      | 9.95       | 38    | 32     |       |
| BR22    | BRACELET         | 10.95      |       |        | GREEN |
All of the items have different dimensions that may/may not be used by other items. Shirts and pants have sizes and lengths. The bracelet, however, has only a color.
Also, new dimensions may be necessary as new items are added (weight, pattern, etc.).
I've looked at EAV (entity-attribute-value), but from what I understand, reporting would be a nightmare with such a model.
How can I manage the dimensions for each item? Any and all suggestions would be greatly appreciated.
By using the word 'dimension' you imply you are building a star schema. The physical representation of these 'optional' attributes mostly depends on your query tool and the performance you need.
IMHO, in dimensional modelling you should not be afraid of very wide dimensions, particularly if they make querying easier.
If a user runs a query on all product sizes, including watches and pants, does it make sense to bucket watches etc. into an N/A size?
EAV is in many ways the opposite of dimensional modelling: dimensional modelling is about making querying as fast and as simple as possible by rearranging data in the ETL process.
Design is often easier if you find a proven design approach and stick with it.
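As a sketch of that wide-dimension approach (Postgres syntax; all names and column sizes are invented for illustration, with 'N/A' standing in for attributes a given item type doesn't have):

    -- One wide dimension row per item; attributes that don't apply
    -- are bucketed into 'N/A' rather than split into an EAV structure.
    CREATE TABLE dim_item (
        item_key         SERIAL       PRIMARY KEY,
        item_id          VARCHAR(10)  NOT NULL,
        item_description VARCHAR(100) NOT NULL,
        item_price       NUMERIC(9,2) NOT NULL,
        size             VARCHAR(20)  NOT NULL DEFAULT 'N/A',
        length           VARCHAR(20)  NOT NULL DEFAULT 'N/A',
        color            VARCHAR(20)  NOT NULL DEFAULT 'N/A'
        -- new attributes (weight, pattern, etc.) become new columns,
        -- populated in the ETL process as new item types appear
    );

    INSERT INTO dim_item (item_id, item_description, item_price, size, length, color)
    VALUES ('SH01', 'POLO SHIRT',  22.95, 'LARGE', 'N/A', 'N/A'),
           ('PA02', 'KHAKI PANTS',  9.95, '38',    '32',  'N/A'),
           ('BR22', 'BRACELET',    10.95, 'N/A',   'N/A', 'GREEN');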