Aggregate with groupby but with distinct condition on the aggregated column [duplicate] - sql

I have run into the following problem. There is a table A that I want to aggregate with `group by x`, ordering each group by diff (which is abs(x-y)) ascending.
x and y always increase. When two different x values could be paired with the same y, the smaller x has priority.
x y diff
1 2 1
1 4 3
1 6 5
3 2 1
3 4 1
3 6 3
4.5 2 2.5
4.5 4 0.5
4.5 6 1.5
The aggregate function I want is:
take the y in each group that has the smallest difference from x (the smallest diff value),
BUT a y that has been taken cannot be reused (for example, y=2 is taken by the x=1 group, so it cannot be reused by the x=3 group).
Expected result:
x y diff
1 2 1
3 4 1
4.5 4 0.5
This seems very tricky in plain SQL. I am using PostgreSQL. The real data will be much
more complicated and longer than this toy example.

If I have understood your question properly:
test=# select * from A;
x | y | diff
---+---+------
1 | 2 | 1
1 | 4 | 3
1 | 6 | 5
3 | 2 | 1
3 | 4 | 1
3 | 6 | 3
5 | 2 | 3
5 | 4 | 1
5 | 6 | 1
(9 rows)
test=# SELECT MIN(x) AS x, y FROM A WHERE diff = 1 GROUP BY y ORDER BY x;
x | y
---+---
1 | 2
3 | 4
5 | 6
(3 rows)
SELECT MIN(x) AS x, y, MIN(diff) FROM A WHERE diff = 1 GROUP BY y ORDER BY x;
x | y | min
---+---+-----
1 | 2 | 1
3 | 4 | 1
5 | 6 | 1
(3 rows)
I added MIN(diff); it can be removed if it is not needed.

Try something like this, with t1 as the table name and d as the diff column:
with cte as (
    select x, y, d
    from t1
    where d = (select min(d) from t1)
    order by x
)
select t1.x, min(t1.y), min(t1.d)
from t1
inner join cte on t1.x = cte.x
where t1.y not in (select y from cte where cte.x < t1.x)
group by t1.x

This is more of a comment.
This problem is essentially a graph problem: finding the closest set of pairs between two discrete sets (x and y in this case). Technically, this is a maximum matching of a weighted bipartite graph (see [here][1]). I don't think this problem is NP-complete, but it can still be hard to solve, particularly in SQL.
Regardless of whether or not it is hard in the theoretical sense (NP-complete is considered "hard theoretically"), it is hard to do in SQL. One issue is that greedy algorithms don't work. The same y value might be closest to all the x values. Which one should be chosen? The algorithm has to look further.
The only way that I can think of to do this accurately in SQL is an exhaustive approach. That is, generate all possible combinations and then check for the one that meets your conditions. Finding all possible combinations requires generating N-factorial combinations of the x's (or y's). That, in turn, requires a lot of computation. My first thought would be to use recursive CTEs for this. However, that would only work on small problems.
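The exhaustive approach described above can be sketched outside SQL. The following is a minimal Python illustration, not the answerer's method: it assumes the goal is the one-to-one pairing with the smallest total abs(x - y), and the function name best_assignment is hypothetical. It also assumes there are at least as many y values as x values.

```python
from itertools import permutations

def best_assignment(xs, ys):
    """Try every one-to-one pairing of xs with ys and keep the one whose
    total abs(x - y) is smallest (assumed objective, see lead-in).
    The number of candidates grows factorially with len(ys)."""
    best_pairs, best_total = None, float("inf")
    # Each permutation gives xs[i] a distinct y, so no y is ever reused.
    for chosen in permutations(ys, len(xs)):
        total = sum(abs(x - y) for x, y in zip(xs, chosen))
        if total < best_total:
            best_pairs, best_total = list(zip(xs, chosen)), total
    return best_pairs, best_total

# x and y values taken from the question's sample table.
pairs, total = best_assignment([1, 3, 4.5], [2, 4, 6])
print(pairs, total)
```

With only three values per side there are six candidates to check; the factorial growth is what makes this impractical for larger inputs.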

Related

Using temporary extended table to make a sum

From a given table I want to be able to sum values having the same number (should be easy, right?)
Problem: A given value can be assigned to between 2 and n consecutive numbers.
For some reason, this information is stored in a single row that holds the value, the starting number and the ending number, as below.
TABLE A
 id | starting_number | ending_number | value
----+-----------------+---------------+-------
  1 |               2 |             5 |     8
  2 |               0 |             3 |     5
  3 |               4 |             6 |     6
  4 |               7 |             8 |    10
For instance the first row means:
value '8' is assigned to numbers: 2, 3 and 4 (5 is excluded)
So, I would like the following intermediate result table:
TABLE B
 id | number | value
----+--------+-------
  1 |      2 |     8
  1 |      3 |     8
  1 |      4 |     8
  2 |      0 |     5
  2 |      1 |     5
  2 |      2 |     5
  3 |      4 |     6
  3 |      5 |     6
  4 |      7 |    10
So I can sum 'value' for elements having identical 'number'
SELECT number, sum(value)
FROM B
GROUP BY number
TABLE C
 number | sum(value)
--------+------------
      2 |         13
      3 |          8
      4 |         14
      0 |          5
      1 |          5
      5 |          6
      7 |         10
I don't know how to do this and didn't find any answer on the web (maybe I was not searching with the appropriate keywords...)
Any idea?
You can do what you want with generate_series(). So, TableB is basically:
select id, generate_series(starting_number, ending_number - 1, 1) as n, value
from tableA;
Your aggregation is then:
select n, sum(value)
from (select id, generate_series(starting_number, ending_number - 1, 1) as n, value
from tableA
) a
group by n;
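To make the expansion step concrete, here is a rough Python equivalent of the same idea: expand each row into its numbers (ending number excluded), then sum per number. The names table_a and sum_by_number are made up for this sketch; the rows are the ones from TABLE A in the question.

```python
from collections import defaultdict

# Rows of TABLE A: (id, starting_number, ending_number, value)
table_a = [(1, 2, 5, 8), (2, 0, 3, 5), (3, 4, 6, 6), (4, 7, 8, 10)]

def sum_by_number(rows):
    """Expand each row to start..end-1 and sum the values per number."""
    totals = defaultdict(int)
    for _id, start, end, value in rows:
        # range() excludes the end, like generate_series(start, end - 1).
        for n in range(start, end):
            totals[n] += value
    return dict(sorted(totals.items()))

print(sum_by_number(table_a))
```

The result should agree with TABLE C in the question, only sorted by number.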

Qlikview - Scatter chart dot colors dimension setup not working

I have some data that I want to display in scatter chart. I have the following two dimensions:
Dimension1: This is each record in the table - say unique id for each row. So the number of dots should be equal to number of records.
Dimension2: This is a combination of 2 columns, tp and vc. The color of each dot is based on these 2 columns.
id tp vc
1  a  1
2  b  2
3  c  1
So there will be dots of 3 colors based on the above tp and vc combinations. Then there are 3 expressions representing X and Y and Size of dot. I am not sure how to configure the dimensions to achieve the goal.
Thanks
You will need a calculated dimension, which is the concatenation expression, defined as =tp & vc in your case.
This will be your single dimension. Your x, y and size expressions then make up the remaining requirements for this chart.
This will give you three colors, one for each unique record combination, and they will be labeled a1, b2 and c1.
id | tp | vc | x | y | size
1 | a | 1 | 3 | 5 | 7
2 | b | 2 | 1 | 2 | 10
3 | c | 1 | 9 | 5 | 5

Returning all children with a recursive select

Good day everyone! I've got a graph. I know how to build simple recursive selects, and I have read some information on MSDN.
In this image you can see that, for example, the top node of the graph, numbered 0, influences node number 1. The other edges are (2->4), (3->4), (4->5), (5->6) and (1->5).
TASK: for every node, show the nodes that it influences. For example,
number 1 influences 5 and 6.
The result SQL must return something like this:
who_acts| on_whom_influence
0 | 1
0 | 5
0 | 6
1 | 5
1 | 6
2 | 4
2 | 5
2 | 6
3 | 4
3 | 5
3 | 6
4 | 5
4 | 6
5 | 6
The starting data that I can get using the anchor member of a CTE is:
who_acts| on_whom_influence
2 | 4
3 | 4
4 | 5
5 | 6
1 | 5
0 | 1
Can I make this selection using SQL syntax and a recursive select? How can I do it?
That sounds like a straightforward CTE. You can pass along the root of the influence in a separate column:
; with Influence as
(
    select  who_acts
    ,       on_whom_influence
    ,       who_acts as root
    from    dbo.YourTable
    union all
    select  child.who_acts
    ,       child.on_whom_influence
    ,       parent.root
    from    Influence parent
    join    dbo.YourTable child
    on      parent.on_whom_influence = child.who_acts
)
select  root
,       on_whom_influence
from    Influence
order by
        root
,       on_whom_influence
Example on SQL Fiddle.
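For comparison, the same "carry the root along" idea can be sketched in Python as a traversal from every starting node. This is only an illustration: the function name influences is hypothetical, and the seen set is an addition of this sketch that guards against cycles and duplicate rows, which the CTE above does not do.

```python
from collections import defaultdict

# Direct influences from the question: (who_acts, on_whom_influence)
edges = [(2, 4), (3, 4), (4, 5), (5, 6), (1, 5), (0, 1)]

def influences(edges):
    """Return every (root, reachable node) pair, sorted."""
    children = defaultdict(list)
    for who, whom in edges:
        children[who].append(whom)
    result = set()
    for root in list(children):
        stack = list(children[root])   # direct influences = anchor member
        seen = set()                   # guards against cycles and repeats
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            result.add((root, node))
            # Follow the node's own influences = recursive member.
            stack.extend(children.get(node, ()))
    return sorted(result)

for who, whom in influences(edges):
    print(who, whom)
```

On the sample edges this should produce the 14 rows listed in the question.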

Pairwise testing: How to create the table?

Hello, I have a question about how to create the table for pairwise testing.
For example, say I have three parameters, each of which can take two different values. How do I create a table of inputs with all possible combinations? Would it look something like this?
| 1 2 3
-----------
1 | 1 1 1
2 | 1 2 2
3 | 1 1 2
4 | 1 2 1
Does each column correspond to a parameter?
However, since I have 3 parameters, each of which can take 2 different values, the number of test cases should be 2^3, shouldn't it?
There's a good article with links to some useful tools here:
http://blog.josephwilk.net/ruby/pairwise-testing-with-cucumber.html
For the parameters: each column is a parameter, and each row is a possible combination. Here is the table:
| 1 2 3
-----------
1 | 1 1 1
2 | 2 1 1
3 | 1 2 1
4 | 1 1 2
5 | 2 2 1
6 | 2 1 2
7 | 1 2 2
8 | 2 2 2
so 2^3=8 possible combinations as you can see :)
For the values: each column is a value, and each row is a possible combination:
| 1 2
--------
1 | 1 1
2 | 2 1
3 | 1 2
4 | 2 2
There are 2^2=4 possible combinations. Hope it helps.
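As a quick check of the counts above, the full tables can be generated with Python's itertools.product. This is only a sketch; the variable names are made up, and the rows come out in a different order than in the tables above.

```python
from itertools import product

values = [1, 2]  # the two values each parameter can take

# Every combination for three parameters: one tuple per table row.
three_params = list(product(values, repeat=3))
# Every combination for two parameters.
two_params = list(product(values, repeat=2))

for row in three_params:
    print(row)
print(len(three_params), len(two_params))
```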
1) Please note that pair-wise testing is not about exhaustively scanning all possible combinations of values of all parameters. Firstly, such a scan would give you an enormous number of test cases, more than almost any existing system could run.
Secondly, pair-wise testing of a software system is based on the hope that the two parameters with the highest number of possible values are responsible for the highest percentage of that system's faults.
This is of course only a hope, and almost no rigorous scientific research exists so far to prove it.
2) What I often see in documentation discussing pair-wise testing, like this, is that the list of all possible values (aka the pair-wise test table) is not constructed in a well-thought-out way. This creates confusion.
In your case, all the parameters have the same number of possible values (2 values), so you could choose any two of the three parameters to build the table. What you should pay attention to is the ordering of the combinations: you iterate first over the rightmost parameter, then over the next parameter to the left, and so on.
Say you have two parameters p1 and p2, where p1 has two possible values, apple and orange, and p2 has two possible values, red and blue. Then your pair-wise test table would be:
index| p1 p2
------------------
1 | apple red
2 | apple blue
3 | orange red
4 | orange blue
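One way to check whether a candidate table is a pair-wise table is to test it against the usual all-pairs criterion: every pair of values for every pair of parameters appears in at least one row. That criterion is an assumption here, since neither answer spells it out. In this sketch, uncovered_pairs and pairwise_rows are hypothetical; question_rows is the 4-row table from the question.

```python
from itertools import combinations, product

def uncovered_pairs(rows, domains):
    """Return the (param_i, value_i, param_j, value_j) pairs no row covers."""
    needed = {
        (i, a, j, b)
        for i, j in combinations(range(len(domains)), 2)
        for a, b in product(domains[i], domains[j])
    }
    covered = {
        (i, row[i], j, row[j])
        for row in rows
        for i, j in combinations(range(len(row)), 2)
    }
    return needed - covered

domains = [[1, 2], [1, 2], [1, 2]]           # three parameters, two values each
question_rows = [(1, 1, 1), (1, 2, 2), (1, 1, 2), (1, 2, 1)]
pairwise_rows = [(1, 1, 1), (1, 2, 2), (2, 1, 2), (2, 2, 1)]  # hypothetical table

print(sorted(uncovered_pairs(question_rows, domains)))
print(sorted(uncovered_pairs(pairwise_rows, domains)))
```

The question's table never sets the first parameter to 2, so every pair involving that value is reported as uncovered.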

Return the last sub sorted row in a table (sql)

It's quite hard to describe this problem, but it's easy to see it graphically:
x y
1 1
2 1
3 1
* 4 1 *
5 2
* 6 2 *
7 3
8 3
9 3
* 10 3 *
I have sorted a table by x, then sub-sorted it by y. I need to return the x value of the last item in each sub-sorted section (the starred rows).
I'm aware of the LAST command, but I don't know how to apply it recursively, i.e. to each sub-sorted section.
Best,
Dan
SELECT y, Max(x) FROM [table] group by Y
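The query works because, within each y group, the last row in x order is the row with the largest x. A small Python sketch of the same grouping, using the rows from the question (last_x_per_y is a made-up name):

```python
# (x, y) rows from the question.
rows = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 2), (6, 2),
        (7, 3), (8, 3), (9, 3), (10, 3)]

def last_x_per_y(rows):
    """Keep the largest x seen for each y, like MAX(x) ... GROUP BY y."""
    last = {}
    for x, y in rows:
        if y not in last or x > last[y]:
            last[y] = x
    return last

print(last_x_per_y(rows))
```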