I'm looking for a mechanism that will allow me to build more flexible scenarios.
For example, consider these two very similar scenarios that test the existence of records in a database:
Scenario Outline: Testing query with 1 attribute with these 2 records in and another 2 out of the result
Given I'm connected to <db> database
When I select <query> from database
Then Result should contain fields:
| <row> |
| <yes1> |
| <yes2> |
And Result should not contain fields:
| <row> |
| <no1> |
| <no2> |
Examples:
| db | row | yes1 | yes2 | no1 | no2 | query |
| 1 | model | 1013 | 1006 | 1012 | 1007 | "SELECT model FROM pc WHERE speed >= 3.0;" |
| 1 | maker | E | A | C | H | "SELECT maker FROM product NATURAL JOIN laptop WHERE hd >= 100;" |
Scenario Outline: Testing query with 2 attributes with these 2 records in and another 2 out of the result
Given I'm connected to <db> database
When I select <query> from database
Then Result should contain fields:
| <rowA> | <rowB> |
| <yes1A> | <yes1B> |
| <yes2A> | <yes2B> |
And Result should not contain fields:
| <rowA> | <rowB> |
| <no1A> | <no1B> |
| <no2A> | <no2B> |
Examples:
| db | rowA | rowB | yes1A | yes1B | yes2A | yes2B | no1A | no1B | no2A | no2B | query |
| 1 | model | price | 1004 | 649 | 2007 | 1429 | 2004 | 1150 | 3007 | 200 | "SELECT model,price FROM product" |
| 2 | name | country | Yamato | Japan | North | USA | Repulse | Brit | Cal | USA | "SELECT name, country FROM clases" |
I would like to be able to write one scenario with an arbitrary number of attributes. It would be great if the number of tested rows were not fixed either.
My dream is to write only one general scenario:
Testing query with N attributes with these M records in and another L out of the result
How can I do this in Gherkin? Is it possible with any hacks?
The short answer is: no. Gherkin is not about flexibility; Gherkin is about concrete examples. And concrete examples are anything but flexible.
A long answer is:
You are describing a usage of Gherkin as a test tool. The purpose of Gherkin, however, is not to test things. The purpose of Gherkin is to facilitate communication between development and the stakeholders who want a specific behaviour.
If you want to test something, there is other tooling that will support exactly what you want. Any test framework will be usable; my personal choice would be JUnit, since I work mostly with Java.
The litmus test for deciding on the tooling is: who will have to be able to understand this?
If the answer is non-technical people, I would probably use Gherkin with very concrete examples. Concrete examples are most likely not comparing things in a database; they tend to describe the external, observable behaviour of the system.
If the answer is developers, then I would probably use a test framework where I have access to a programming language. This would allow for the flexibility you are asking for.
In your case, you are asking for a programming language, so Gherkin and Cucumber are not the right tools.
You can do it without any hacks, but I don't think you want to, at least not with the entire scenario in a single line.
You will want to follow the BDD structure; otherwise, why use BDD at all?
You should have and follow a structure like:
Given
When
Then
You need to split the scenario and keep a clear delimitation between the initial context, the action(s) and the result(s). It would be bad practice not to have a boundary between these.
Also note that a clear delimitation will increase reusability and readability, and will also help you a lot in debugging.
Please do some research on what BDD means and how it helps. It may help to keep a checklist of BDD best practices, which could also be used in code reviews of the automated scenarios.
I have two Excel tables: the first one is the data table and the second one is a lookup table. Here is how they are structured:
Data Table
+----------+-------------+----------+----------+
| Category | Subcategory | Division | Business |
+----------+-------------+----------+----------+
| A | Red | Home | Q |
| B | Blue | Office | R |
| C | Green | City | S |
| D | Yellow | State | T |
| D | Red | State | T |
| D | Green | Office | Q |
+----------+-------------+----------+----------+
Lookup Table
+----------+-------------+----------+----------+--------------+
| Category | Subcategory | Division | Business | LookUp Value |
+----------+-------------+----------+----------+--------------+
| 0 | 0 | 0 | Q | ABC |
| B | 0 | Office | 0 | DEF |
| C | Green | 0 | 0 | MNO |
| D | 0 | State | T | RST |
+----------+-------------+----------+----------+--------------+
So I want to add the lookup value column to the data table based on the criteria given in the lookup table. E.g., for the first row in the lookup table, I don't want to look up on Category, Subcategory, or Division, but if the Business is Q, then I want to populate the lookup value as ABC. Similarly, for the second row I don't want to consider the Subcategory and Business, but if the Category is "B" and the Division is "Office", I want it to populate DEF. So the result should look like this:
[Final Resulting Data Table]
+----------+-------------+----------+----------+--------------+
| Category | Subcategory | Division | Business | LookUp Value |
+----------+-------------+----------+----------+--------------+
| A | Red | Home | Q | ABC |
| B | Blue | Office | R | DEF |
| C | Green | City | S | MNO |
| D | Yellow | State | T | RST |
| D | Red | State | T | RST |
| D | Green | Office | Q | ABC |
+----------+-------------+----------+----------+--------------+
I am very new to SQL, and the actual data set is very complex, with multiple lookup values based on different criteria. If you think any other scripting language would work better, I am open to that too. My data is currently in Excel.
If the data is so complex, you should first consider if you want to put it in a (relational) database (like MS Access, MySQL, etc.) instead of in a spreadsheet (like MS Excel).
Both kinds of programs are used for structured data handling, but databases focus primarily on efficient data storage and data integrity (including guarding type safety, required fields, unique fields, required references between various datasets/tables, etc.), while spreadsheets focus primarily on data analysis and calculations.
Relational databases support Structured Query Language (SQL) to let clients query their data. Spreadsheets normally do not use or support SQL (as far as I know).
It is possible to let MS Excel import or reference data in an external data source (like a relational database) to perform analysis and calculations on it.
The other way around is (sometimes) possible too: to link to spreadsheet worksheets as external tables inside a relational database system to - within certain limits - allow that data to be queried using SQL. But using a database to store the data and a spreadsheet (as a database client) to perform analysis on the data in the database would be a more logical design in my opinion.
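For example, SQL Server can query an Excel worksheet ad hoc through the ACE OLE DB provider. A minimal sketch (the file path and sheet name are placeholders; the provider must be installed and ad hoc distributed queries enabled):
-- Query an Excel worksheet directly from SQL Server (illustrative only):
SELECT *
FROM OPENROWSET(
    'Microsoft.ACE.OLEDB.12.0',
    'Excel 12.0;Database=C:\data\DataTable.xlsx;HDR=YES',
    'SELECT * FROM [Sheet1$]'
);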
However, creating such an integrated solution using multiple MS Office applications and/or external databases can be a complex challenge, especially when you are just starting to learn about them.
To be honest, I am not experienced with designing MS Office based solutions, so I cannot guide you around any pitfalls. I do hope that this answer helps you a little in finding the right way to go here.
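If you do import both tables into a database, the lookup itself can then be expressed as a single join in which 0 acts as a wildcard. A minimal sketch, assuming the tables are imported as data_table and lookup_table with the columns shown above (all names are illustrative):
-- 0 in a lookup column means "match anything" for that attribute.
SELECT d.Category, d.Subcategory, d.Division, d.Business, l.LookUpValue
FROM data_table AS d
JOIN lookup_table AS l
    ON (l.Category    = '0' OR l.Category    = d.Category)
   AND (l.Subcategory = '0' OR l.Subcategory = d.Subcategory)
   AND (l.Division    = '0' OR l.Division    = d.Division)
   AND (l.Business    = '0' OR l.Business    = d.Business);
Note that if a data row matches more than one lookup row, this returns one result row per match; you would then need a precedence rule (for example, the most specific match wins) to pick a single value.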
One of the W3C standards for RDB2RDF is Direct Mapping. I have heard that there is a problem when converting many-to-many relationships from a relational database, and they say it loses semantics. I need more explanation about that.
...there is a problem when converting many-to-many relationships from a relational database
I'd say that direct mapping introduces additional "parasitic" semantics, treating normalization artefacts as first-class objects.
Let's consider the D011-M2MRelations testcase.
Student
+---------+-----------+----------+
| ID (PK) | FirstName | LastName |
+---------+-----------+----------+
| 10 | Venus | Williams |
| 11 | Fernando | Alonso |
| 12 | David | Villa |
+---------+-----------+----------+
Student_Sport
+------------+----------+
| ID_Student | ID_Sport |
+------------+----------+
| 10 | 110 |
| 11 | 111 |
| 11 | 112 |
| 12 | 111 |
+------------+----------+
Sport
+---------+-------------+
| ID (PK) | Description |
+---------+-------------+
| 110 | Tennis |
| 111 | Football |
| 112 | Formula1 |
+---------+-------------+
Direct mapping generates a lot of triples of this kind:
<Student_Sport/ID_Student=11;ID_Sport=111> <Student_Sport#ref-ID_Student> <Student/ID=11>.
<Student_Sport/ID_Student=11;ID_Sport=111> <Student_Sport#ref-ID_Sport> <Sport/ID=111>.
<Student_Sport/ID_Student=11;ID_Sport=112> <Student_Sport#ref-ID_Student> <Student/ID=11>.
<Student_Sport/ID_Student=11;ID_Sport=112> <Student_Sport#ref-ID_Sport> <Sport/ID=112>.
Modeling from scratch, you'd probably write something like this instead (R2RML allows you to achieve that):
<http://example.com/student/11> <http://example.com/plays> <http://example.com/sport/111>.
<http://example.com/student/11> <http://example.com/plays> <http://example.com/sport/112>.
Moreover, one can't improve the results by denormalizing the original tables or by creating SQL views: without primary keys, the results are probably even worse.
In order to improve the results, a subsequent DELETE/INSERT (or CONSTRUCT) seems to be the only option available; the process should then be called ELT rather than ETL. Perhaps the following DM-generated triples were intended to help in such a transformation:
<Student_Sport/ID_Student=11;ID_Sport=111> <Student_Sport#ID_Student> "11"^^xsd:integer.
<Student_Sport/ID_Student=11;ID_Sport=111> <Student_Sport#ID_Sport> "111"^^xsd:integer.
<Student_Sport/ID_Student=11;ID_Sport=112> <Student_Sport#ID_Student> "11"^^xsd:integer.
<Student_Sport/ID_Student=11;ID_Sport=112> <Student_Sport#ID_Sport> "112"^^xsd:integer.
...they say it loses semantics
@JuanSequeda means that DM doesn't generate an OWL ontology from a relational schema; this behaviour is not specific to many-to-many relations.
See also links from Issue 14.
My task is to combine two tables in a specific way. I have a table Demands that contains demands for some goods (tovar). Each record has its own ID, Tovar, Date of demand and Amount. And I have another table Unloads that contains unloads of tovar. Each record has its own ID, Tovar, Order of unload and Amount. Demands and Unloads do not correspond to each other, and the amounts in demands and unloads are not exactly equal: one demand may be for 10 units with two unloads of 4 and 6 units, and two demands may be for 3 and 5 units with one unload of 11 units.
The task is to get a table which shows how the demands are covered by the unloads. I have a solution (SQL Fiddle), but I think that there is a better one. Can anybody tell me how such tasks are solved?
What I have:
-----------------------------------------
| DemandNumber | Tovar | Amount | Order |
-----------------------------------------
| Demand#1     | Meat  |      2 |     1 |
| Demand#2     | Meat  |      3 |     2 |
| Demand#3     | Milk  |      6 |     1 |
| Demand#4     | Eggs  |      1 |     1 |
| Demand#5     | Eggs  |      5 |     2 |
| Demand#6     | Eggs  |      3 |     3 |
-----------------------------------------
---------------------------------------
| SaleNumber | Tovar | Amount | Order |
---------------------------------------
| Sale#1     | Meat  |      6 |     1 |
| Sale#2     | Milk  |      2 |     1 |
| Sale#3     | Milk  |      1 |     2 |
| Sale#4     | Eggs  |      2 |     1 |
| Sale#5     | Eggs  |      1 |     2 |
| Sale#6     | Eggs  |      4 |     3 |
---------------------------------------
What I want to receive
-------------------------------------------------
| DemandNumber | SaleNumber | Tovar | Amount |
-------------------------------------------------
| Demand#1 | Sale#1 | Meat | 2 |
| Demand#2 | Sale#1 | Meat | 3 |
| Demand#3 | Sale#2 | Milk | 2 |
| Demand#3 | Sale#3 | Milk | 1 |
| Demand#4 | Sale#4 | Eggs | 1 |
| Demand#5 | Sale#4 | Eggs | 1 |
| Demand#5 | Sale#5 | Eggs | 1 |
| Demand#5 | Sale#6 | Eggs | 3 |
| Demand#6 | Sale#6 | Eggs | 1 |
-------------------------------------------------
Here is additional explanation from the author's comment:
Demand#1 needs 2 Meat and it can take them from Sale#1.
Demand#2 needs 3 Meat and can take them from Sale#1.
Demand#3 needs 6 Milk, but there are only 2 Milk in Sale#2 and 1 Milk in Sale#3, so we show only the available amounts.
And so on.
The field Order in the example determines the order of calculations. We have to process Demands according to their Order: Demand#1 must be processed before Demand#2. Sales must likewise be allocated according to their Order number; we cannot assign eggs from a sale while an earlier sale (with a lower Order) still has unallocated eggs.
The only way I can see to get this is by using loops. Is it possible to avoid loops and solve this task with T-SQL alone?
If the Amount values are int and not too large (not millions), then I'd use a table of numbers to generate as many rows as the value of each Amount.
Here is a good article describing how to generate it.
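For instance, one common way to fill such a table (a sketch; the table name Numbers and the row count of 100,000 are illustrative):
-- Helper table holding the integers 1..100000:
CREATE TABLE Numbers (Number int NOT NULL PRIMARY KEY);

INSERT INTO Numbers (Number)
SELECT t.rn
FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
    FROM sys.all_objects AS a
    CROSS JOIN sys.all_objects AS b   -- cross join just to get enough rows
) AS t
WHERE t.rn <= 100000;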
Then it is easy to join Demand with Sale and group and sum as needed.
Otherwise, a plain straightforward cursor (in fact, two cursors) would be simple to implement, easy to understand and of O(n) complexity. If the Amounts are small, the set-based variant is likely to be faster than the cursor; if the Amounts are large, the cursor may be faster. You need to measure performance with actual data.
Here is a query that uses a table of numbers. To understand how it works, run each query in the CTEs separately and examine its output.
SQLFiddle
WITH
CTE_Demands
AS
(
    -- Expand each demand into one row per unit of Amount,
    -- numbering the units per Tovar in demand order.
    SELECT
        D.DemandNumber
        ,D.Tovar
        ,ROW_NUMBER() OVER (PARTITION BY D.Tovar ORDER BY D.SortOrder, CA_D.Number) AS rn
    FROM
        Demands AS D
        CROSS APPLY
        (
            SELECT TOP(D.Amount) Numbers.Number
            FROM Numbers
            ORDER BY Numbers.Number
        ) AS CA_D
)
,CTE_Sales
AS
(
    -- Expand each sale the same way, so that unit k of a Tovar's demands
    -- lines up with unit k of the same Tovar's sales.
    SELECT
        S.SaleNumber
        ,S.Tovar
        ,ROW_NUMBER() OVER (PARTITION BY S.Tovar ORDER BY S.SortOrder, CA_S.Number) AS rn
    FROM
        Sales AS S
        CROSS APPLY
        (
            SELECT TOP(S.Amount) Numbers.Number
            FROM Numbers
            ORDER BY Numbers.Number
        ) AS CA_S
)
-- Match unit to unit and collapse back into (demand, sale) pairs.
SELECT
    CTE_Demands.DemandNumber
    ,CTE_Sales.SaleNumber
    ,CTE_Demands.Tovar
    ,COUNT(*) AS Amount
FROM
    CTE_Demands
    INNER JOIN CTE_Sales ON
        CTE_Sales.Tovar = CTE_Demands.Tovar
        AND CTE_Sales.rn = CTE_Demands.rn
GROUP BY
    CTE_Demands.Tovar
    ,CTE_Demands.DemandNumber
    ,CTE_Sales.SaleNumber
ORDER BY
    CTE_Demands.DemandNumber
    ,CTE_Sales.SaleNumber
;
Having said all this, it is usually better to perform this kind of processing on the client using a procedural programming language. You still have to transmit all rows from Demands and Sales to the client, so by joining the tables on the server you don't reduce the number of bytes that must go over the network. In fact, you increase it, because an original row may be split into several rows.
This kind of processing is sequential in nature, not set-based, so it is easy to do with arrays, but tricky in SQL.
I have no idea what your requirements are or what the business rules are or what the goals are, but I can say this: you are doing it wrong.
This is SQL. In SQL you do not do loops. In SQL you work with sets. Sets are defined by select statements.
If this problem cannot be solved with a select statement (maybe with sub-selects), then you probably want to implement it in another way (a C# program? some other ETL system?).
However, I can also say there is probably a way to do this with a single select statement. But you have not given enough information for me to know what that statement is. Saying that you have a working example and that this should be enough fails on this site, because this site is about answering questions about problems, and you don't have a problem: you have some code.
Re-phrase the question with inputs, expected outputs, what you have tried and what your question is. This is covered well in the FAQ.
Or, if you have working code you want reviewed, it may be appropriate for the Code Review site.
I see two additional possible ways:
1. For 'advanced' data processing and calculations you can use cursors.
2. You can use SELECT with a CASE construction.
I have a project with a MySQL database, and I would like to be able to upload various datasets. Say I am building a restaurant review aggregator: we would like to keep adding every source of restaurant reviews we can get our hands on, and keep all the information.
I have a table review_sources
=========================
| id | name |
=========================
| 1 | Zagat |
| 2 | GoodEats Magazine|
| ... |
| 50 | Allergy News |
=========================
Now say I have a table reviews
=====================================================================
| id | Restaurant Name | source_id | Star Rating | Description |
=====================================================================
| 0 | Joey's Burgers | 1 | 3.5 | Wow! |
| 1 | Jamal's Steaks | 1 | 3.5 | Yummy! |
| 2 | Jenny's Crepes | 1 | 4.5 | Sweet! |
| .... |
| 253| Jeeva's Curries | 3 | 4 | Spicy! |
=====================================================================
Now suppose someone wants to add reviews from "Allergy News", which have a field "nut-free". Or a source of reviews could describe the degree of kashrut compliance, halal compliance or vegan-friendliness. As the designer, I don't know what optional fields future data sources may have. I want to be able to answer queries like:
What are all the fields in the Zagat reviews?
For review id=x, what is the value of the optional field "vegan-friendly"?
So how do I design a schema that can handle these disparate data sources and answer these queries? My reasons for not going for NoSQL are that I do want certain types of normalization, and that this is part of an existing MySQL based project.
I'd use a many-to-many relationship: a table of fields (e.g. "vegan-friendly") and a reviews_fields table mapping one to the other, containing a review_id, a field_id and the value of the field for that review.
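A minimal sketch of that design in MySQL (assuming reviews.id is the primary key of your reviews table; the other table and column names are illustrative):
-- Catalogue of optional fields that any source may introduce:
CREATE TABLE fields (
    id   INT AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(64) NOT NULL UNIQUE
);

-- One row per (review, field) pair, holding that field's value:
CREATE TABLE reviews_fields (
    review_id INT NOT NULL,
    field_id  INT NOT NULL,
    value     VARCHAR(255),
    PRIMARY KEY (review_id, field_id),
    FOREIGN KEY (review_id) REFERENCES reviews (id),
    FOREIGN KEY (field_id)  REFERENCES fields (id)
);

-- "What are all the fields in the Zagat reviews?"
SELECT DISTINCT f.name
FROM reviews r
JOIN reviews_fields rf ON rf.review_id = r.id
JOIN fields f          ON f.id = rf.field_id
WHERE r.source_id = 1;

-- "For review id = x, what is the value of 'vegan-friendly'?"
SELECT rf.value
FROM reviews_fields rf
JOIN fields f ON f.id = rf.field_id
WHERE rf.review_id = 42 AND f.name = 'vegan-friendly';
This is essentially the entity-attribute-value (EAV) pattern: the schema stays fixed while each source can introduce arbitrary fields, at the cost of storing every value as a string and pushing type checks into the application.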
Cheers
I have been working to build a more abstract schema: where there had been several tables modeling remarkably similar relationships, I want to model just the "essence". Due to the environment I am working with (Drupal 7), I can't change the nature of the issue: a relationship of the same essential type could reference one of two different tables for the object in one role. Let's bring in an example to clarify (this is not my actual problem domain, but a similar problem). Here are the requirements:
First, if you are unfamiliar with Drupal, here's the gist: users are in one table, every other entity is in a single second table (a gross generalization, but enough).
Let's say we want to model the "works for" relationship, and let's take as given that "companies" are of type "entity" and a "supervisor" is of type "user" (by "type" I mean the table in the database where their tuples reside). Here are the simplified requirements:
A user can work for a company
A company can work for a company
These "works for" relationships should be in the same table.
I have two ideas, and neither sits entirely well with my current disposition toward schema quality; this is where I would like some insight.
One foreign-key column paired with a 'type' column
Two foreign-key columns, always at most one utilized (ick!)
In case you are a visual thinker, here are the two options representing the fact that users 123 and 632, as well as entity 123, all work for entity 435:
Option 1
+---------------+-------------+---------------+-------------+
| employment_id | employee_id | employee_type | employer_id |
+---------------+-------------+---------------+-------------+
| 1 | 123 | user | 435 |
+---------------+-------------+---------------+-------------+
| 2 | 123 | entity | 435 |
+---------------+-------------+---------------+-------------+
| 3 | 632 | user | 435 |
+---------------+-------------+---------------+-------------+
Option 2
+---------------+------------------+--------------------+-------------+
| employment_id | employee_user_id | employee_entity_id | employer_id |
+---------------+------------------+--------------------+-------------+
| 1 | 123 | <NULL> | 435 |
+---------------+------------------+--------------------+-------------+
| 2 | <NULL> | 123 | 435 |
+---------------+------------------+--------------------+-------------+
| 3 | 632 | <NULL> | 435 |
+---------------+------------------+--------------------+-------------+
Thoughts on option 1: I like that the employee_id column has a concrete role, but I despise that it has an ambiguous target. Option 2 has an ambiguous role (which column is the employee?) but a concrete target for any given FK, so I can think of it this way:
+-----------+-----------+----------+
| | ROLE |
| | ambiguous | concrete |
+-----------+-----------+----------+
| T | | |
| A ambig. | | 1 |
| R | | |
| G -------+-----------+----------+
| E | | |
| T concr. | 2 | ? |
| | | |
+-----------+-----------+----------+
Option 2 has very pragmatic benefits for my project, but I do not feel comfortable with so many NULLs (you might not even call it 1NF!).
So here's the crux of my question for SO: how can option 1 be improved, or else what knowledge gap might I have that leaves me unsettled? While I can't bring to mind a specific rule that it violates, the design is clearly not in keeping with the intentions of normalization (requiring two columns to uniquely identify a relationship is not doing me any favors in safeguarding against anomalies).
I do understand that the ideal solution would be to redesign the users entity to be the same as what I have been calling "entity" here, but please consider that beside the point/circumstantial (or at least let's draw the pragmatic line right exactly there for this question).
Again, the essential question: What, in terms of normalization, is wrong with schema option 1, and how might you model this relationship given the constraint of not refactoring "user" into "entity"?
note: For this, I am more interested in theoretical purity than a pragmatic solution
The solutions you present contravene 4th normal form, as @podiluska says. If the design is recast into the form below, this difficulty is removed and the solution is in 5NF (and even 6NF?).
Adopt one of the patterns for sub/supertypes. This uses the relation definitions set out below, plus the super/subtype constraint. That constraint is that each tuple in the supertype relation must correspond to exactly one subtype tuple; in other words, the subtypes must form a disjoint, covering set over the supertype.
I suspect the performance of this in a real situation might require some heavy tuning:
Table: Employment
+---------------+-------------+
| employee_id | employer_id |
+---------------+-------------+
| 1 | 435 |
+---------------+-------------+
| 2 | 435 |
+---------------+-------------+
| 3 | 435 |
+---------------+-------------+
Table: Employee (SuperType)
+---------------+
| employee_id |
+---------------+
| 1 |
+---------------+
| 2 |
+---------------+
| 3 |
+---------------+
Table: User employee (SubType)
+---------------+-------------+
| employee_id | user_id |
+---------------+-------------+
| 1 | 123 |
+---------------+-------------+
| 3 | 632 |
+---------------+-------------+
Table: Entity employee (SubType)
+---------------+-------------+
| employee_id | entity_id |
+---------------+-------------+
| 2 | 123 |
+---------------+-------------+
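A rough DDL sketch of how the disjoint half of that constraint can be enforced declaratively, using a discriminator column that is my addition rather than part of the relations above (standard SQL; some engines ignore CHECK constraints, and the covering half usually still needs triggers or application code):
-- Supertype carries a discriminator so each subtype table can
-- reference only "its" supertype rows.
CREATE TABLE Employee (
    employee_id   INT NOT NULL PRIMARY KEY,
    employee_type CHAR(6) NOT NULL CHECK (employee_type IN ('user', 'entity')),
    UNIQUE (employee_id, employee_type)
);

CREATE TABLE UserEmployee (
    employee_id   INT NOT NULL PRIMARY KEY,
    employee_type CHAR(6) NOT NULL CHECK (employee_type = 'user'),
    user_id       INT NOT NULL,   -- would reference Drupal's users table
    FOREIGN KEY (employee_id, employee_type)
        REFERENCES Employee (employee_id, employee_type)
);

CREATE TABLE EntityEmployee (
    employee_id   INT NOT NULL PRIMARY KEY,
    employee_type CHAR(6) NOT NULL CHECK (employee_type = 'entity'),
    entity_id     INT NOT NULL,   -- would reference Drupal's entity table
    FOREIGN KEY (employee_id, employee_type)
        REFERENCES Employee (employee_id, employee_type)
);

CREATE TABLE Employment (
    employee_id INT NOT NULL REFERENCES Employee (employee_id),
    employer_id INT NOT NULL,     -- the employing entity
    PRIMARY KEY (employee_id, employer_id)
);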
What is wrong with option 1 (and option 2) is that it involves a multivalued dependency, and as such a breach of 4th normal form. However, within the constraints you have given, there's not a lot you can do about that.
If you could replace the worksfor table with a view, then you could keep user-company and company-company relations separate.
Of your two choices, Option 2 has the advantage that it may be easier to enforce the referential integrity, depending on your platform.
One potential, if icky, pragmatic solution within your current constraints could be to give companies positive IDs and users negative IDs. That eliminates the empty column of option 2 and turns the type column of option 1 into an implication, but I feel dirty even suggesting it.
Similarly, if you don't need to know what type the entity is, as long as you can determine it via joining, then using GUIDs as IDs would eliminate the need for the type column.