sql problem,challenge - sql

I want to get
id a b c
--------------------
1 1 100 90
6 2 50 100
...from:
id a b c
--------------------
1 1 100 90
2 1 300 50
3 1 200 20
4 2 200 30
5 2 300 70
6 2 50 100
That is, for each value of a, I want the row with the smallest b.
How can I do this in SQL?
EDIT
I thought it could be achieved with
select * from table group by a having min(b);
but I later found out that this is wrong.
Is it possible to do this with a HAVING clause?
I'm using MySQL.

SELECT t1.*
FROM mytable t1
LEFT OUTER JOIN mytable t2
ON (t1.a=t2.a AND t1.b>t2.b)
WHERE t2.a IS NULL;
This works because there should be no matching row t2 with the same a and a lesser b.
update: This solution has the same issue with ties that other folks have identified. However, we can break ties:
SELECT t1.*
FROM mytable t1
LEFT OUTER JOIN mytable t2
ON (t1.a=t2.a AND (t1.b>t2.b OR t1.b=t2.b AND t1.id>t2.id))
WHERE t2.a IS NULL;
This assumes, for instance, that in the case of a tie the row with the lower id is the one we want.
This doesn't do the trick:
select * from table group by a having min(b);
Because HAVING MIN(b) only tests that the least value in the group is not false (which in MySQL means not zero). The condition in a HAVING clause is for excluding groups from the result, not for choosing the row within the group to return.
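A minimal sketch of the tie-breaking anti-join above, run against an in-memory SQLite database through Python's sqlite3 module rather than MySQL. The helper name min_b_rows and the extra row with id 7 (a deliberate tie on b within a = 2) are assumptions added for illustration; they are not part of the original question.

```python
import sqlite3


def min_b_rows(rows):
    """Return one row per value of a: the smallest b, ties going to the lower id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY, a INT, b INT, c INT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    # A row survives only if no other row in the same group "beats" it,
    # i.e. the LEFT JOIN finds no t2 and t2.a comes back as NULL.
    query = """
        SELECT t1.id, t1.a, t1.b, t1.c
        FROM mytable t1
        LEFT OUTER JOIN mytable t2
          ON t1.a = t2.a
         AND (t1.b > t2.b OR (t1.b = t2.b AND t1.id > t2.id))
        WHERE t2.a IS NULL
        ORDER BY t1.id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Sample data from the question, plus a hypothetical tie (id 7) on b = 50.
rows = [
    (1, 1, 100, 90), (2, 1, 300, 50), (3, 1, 200, 20),
    (4, 2, 200, 30), (5, 2, 300, 70), (6, 2, 50, 100),
    (7, 2, 50, 10),
]
print(min_b_rows(rows))  # [(1, 1, 100, 90), (6, 2, 50, 100)]
```

Without the id comparison in the join condition, both tied rows (ids 6 and 7) would be returned for a = 2.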

In MySQL:
select t1.* from test as t1
inner join
(select t2.a, min(t2.b) as min_b from test as t2 group by t2.a) as subq
on subq.a=t1.a and subq.min_b=t1.b;
Here is the proof:
mysql> create table test (id int unsigned primary key auto_increment, a int unsigned not null, b int unsigned not null, c int unsigned not null) engine=innodb;
Query OK, 0 rows affected (0.55 sec)
mysql> insert into test (a,b,c) values (1,100,90), (1,300,50), (1,200,20), (2,200,30), (2,300,70), (2,50,100);
Query OK, 6 rows affected (0.39 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql> select * from test;
+----+---+-----+-----+
| id | a | b | c |
+----+---+-----+-----+
| 1 | 1 | 100 | 90 |
| 2 | 1 | 300 | 50 |
| 3 | 1 | 200 | 20 |
| 4 | 2 | 200 | 30 |
| 5 | 2 | 300 | 70 |
| 6 | 2 | 50 | 100 |
+----+---+-----+-----+
6 rows in set (0.00 sec)
mysql> select t1.* from test as t1 inner join (select t2.a, min(t2.b) as min_b from test as t2 group by t2.a) as subq on subq.a=t1.a and subq.min_b=t1.b;
+----+---+-----+-----+
| id | a | b | c |
+----+---+-----+-----+
| 1 | 1 | 100 | 90 |
| 6 | 2 | 50 | 100 |
+----+---+-----+-----+
2 rows in set (0.00 sec)

Use:
SELECT DISTINCT
x.*
FROM TABLE x
JOIN (SELECT t.a,
MIN(t.b) AS min_b
FROM TABLE t
GROUP BY t.a) y ON y.a = x.a
AND y.min_b = x.b

You're right. select min(b), a from table group by a gives you the minimum of b per group. If you want the entire row, then you have to use an analytic function. Whether that is available depends on the database software.

It depends on the implementation, but this is usually faster than the self-join method:
SELECT id, a, b, c
FROM
(
SELECT id, a, b, c
, ROW_NUMBER() OVER(PARTITION BY a ORDER BY b ASC) AS [b IN a]
FROM mytable
) As SubqueryA
WHERE [b IN a] = 1
Of course, it does require that your SQL implementation be fairly up to date with the standard.
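As a rough illustration of the ROW_NUMBER() approach, here is a sketch that runs it in SQLite via Python's sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (window functions are not available before that). The function name first_row_per_group, the alias rn, and the extra id tie-breaker in the ORDER BY are assumptions for this example.

```python
import sqlite3


def first_row_per_group(rows):
    """Return the row with the smallest b for each a, using ROW_NUMBER()."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER PRIMARY KEY, a INT, b INT, c INT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?)", rows)
    # Number the rows inside each a-partition by ascending b (id breaks ties),
    # then keep only the first row of every partition.
    query = """
        SELECT id, a, b, c
        FROM (
            SELECT id, a, b, c,
                   ROW_NUMBER() OVER (PARTITION BY a ORDER BY b ASC, id ASC) AS rn
            FROM mytable
        ) AS ranked
        WHERE rn = 1
        ORDER BY id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


rows = [
    (1, 1, 100, 90), (2, 1, 300, 50), (3, 1, 200, 20),
    (4, 2, 200, 30), (5, 2, 300, 70), (6, 2, 50, 100),
]
print(first_row_per_group(rows))  # [(1, 1, 100, 90), (6, 2, 50, 100)]
```

Because ROW_NUMBER() assigns exactly one row the number 1 per partition, this form never returns two rows for a group, even when b values tie.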

Related

select all rows by distinct values with limit on distinct values

Let's say we have 2 tables:
Table1: Table2:
id | t2id id | col
---------- ----------
1 | 1 1 | a
2 | 2 2 | b
3 | 2 3 | c
4 | 1 4 | d
5 | 3 5 | e
6 | 3 6 | f
7 | 4 7 | g
8 | 5 8 | h
9 | 1 9 | i
10 | 4 10 | j
My question is:
Is there a short way to put a limit on the distinct values of the Table1.t2id column?
For example: if limit = 2, then all rows with t2id from 1 to 2 (or any other two values) are selected.
Expected result (with limit = 2):
Res:
id | t2id
----------
1 | 1
2 | 2
3 | 2
4 | 1
9 | 1
Note:
Any information or suggestions are welcome.
You could use just the where clause
Select id,t2id
from table1
where t2id<=2
Or you can use where .. between
Select id,t2id
from table1
where t2id between 1 and 2
I believe you want to:
Create a subquery with all the columns you need plus this one: DENSE_RANK() OVER (ORDER BY Table1.t2id) AS MyRank
Outside of the subquery, add a WHERE on MyRank
Complete solution:
SELECT id, t2id
FROM (
SELECT id, t2id, DENSE_RANK() OVER (ORDER BY Table1.t2id) AS MyRank
FROM table1
) MySubQuery
WHERE MyRank <= 2
This will adapt to JOINs with table2 (with a potential increase in multiplicity) and to non-consecutive values in t2id.
You can also use IN:
select t1.*
from table1 t1
where t1.t2id in (select t2.id from table2 t2 limit 2);
The advantage of this approach is that it is easy to make it random:
select t1.*
from table1 t1
where t1.t2id in (select t2.id from table2 t2 order by random() limit 2);
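To make the DENSE_RANK() idea concrete, here is a small sketch that runs it on the Table1 data from the question using Python's sqlite3 module. It assumes SQLite 3.25 or newer for window function support; the function name limit_distinct and the alias my_rank are made up for this example.

```python
import sqlite3


def limit_distinct(rows, limit):
    """Return all rows whose t2id is among the first `limit` distinct t2id values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, t2id INTEGER)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    # DENSE_RANK gives equal t2id values the same rank with no gaps,
    # so "rank <= limit" means "one of the first `limit` distinct values".
    query = """
        SELECT id, t2id
        FROM (
            SELECT id, t2id, DENSE_RANK() OVER (ORDER BY t2id) AS my_rank
            FROM table1
        ) AS ranked
        WHERE my_rank <= ?
        ORDER BY id
    """
    result = conn.execute(query, (limit,)).fetchall()
    conn.close()
    return result


table1 = [(1, 1), (2, 2), (3, 2), (4, 1), (5, 3),
          (6, 3), (7, 4), (8, 5), (9, 1), (10, 4)]
print(limit_distinct(table1, 2))  # [(1, 1), (2, 2), (3, 2), (4, 1), (9, 1)]
```

Unlike the plain where t2id <= 2 filter, this keeps working when the t2id values are not consecutive, because the limit applies to the rank and not to the value itself.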

SQL that returns all the permutations of a summed column

Shot in the dark here. I'd personally struggle to come up with a simple SQL statement to do the following (if it can even be done), so I thought I'd throw this out here:
Let's say we have the following data:
ID VALUE
-- -----
1 60
2 60
3 60
4 60
And I wanted to find all the permutations of records that SUM to 120. Meaning, the results would be 6 rows:
1 AND 2
1 AND 3
1 AND 4
--2 AND 1 (already used)
2 AND 3
2 AND 4
--3 AND 1 (already used)
--3 AND 2 (already used)
3 AND 4
They actually want a "random sampling" of that result-set, but I need to know if I can even get that result-set. Of course, the real data wouldn't be that easy (everything 60), and the question was posed as "10 records that add up to 5 minutes" (the field is a duration field), which leads to other questions on how to handle that, but let me see if I can start with just getting permutations before actually getting more sophisticated.
Thanks.
These are combinations, not permutations. If you want all 2-way combinations, then use a self-join:
select t1.*, t2.*
from t t1 join
t t2
on t1.id < t2.id and
t1.value + t2.value = 120;
For a random sample of about 10%, you can use:
select t1.*, t2.*
from t t1 join
t t2
on t1.id < t2.id and
t1.value + t2.value = 120
where rand() < 0.1;
select l.id, r.id, l.value+r.value as sum
from t l
inner join t r
on l.id < r.id
where l.value+r.value = 120
order by l.id, r.id
rextester demo: http://rextester.com/FWCLT49699
returns:
+----+----+-----+
| id | id | sum |
+----+----+-----+
| 1 | 2 | 120 |
| 1 | 3 | 120 |
| 1 | 4 | 120 |
| 2 | 3 | 120 |
| 2 | 4 | 120 |
| 3 | 4 | 120 |
+----+----+-----+
Table Tvalues
ID VALUE
-- -----
1 60
2 60
3 60
4 60
Select A.ID, B.ID from TValues A
join TValues B on B.ID != A.ID
where
(A.Value+B.Value) = 120
and
A.ID < B.ID -- eliminates duplicates: if (1,3) is printed, (3,1) will not be printed
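The self-join answers above can be tried out with a short sketch like the following, which uses Python's sqlite3 module on the sample data. Drawing the random sample in Python with random.sample is an assumption of this example (the answers above sample inside SQL with rand()), as are the names pairs_summing_to and sample.

```python
import random
import sqlite3


def pairs_summing_to(rows, target):
    """Return every unordered pair of ids whose values add up to `target`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # l.id < r.id keeps each pair once and rules out pairing a row with itself.
    query = """
        SELECT l.id, r.id
        FROM t l
        JOIN t r ON l.id < r.id
        WHERE l.value + r.value = ?
        ORDER BY l.id, r.id
    """
    result = conn.execute(query, (target,)).fetchall()
    conn.close()
    return result


rows = [(1, 60), (2, 60), (3, 60), (4, 60)]
pairs = pairs_summing_to(rows, 120)
print(pairs)  # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]

# Pick 3 of the pairs at random; a seeded generator keeps the run repeatable.
sample = random.Random(0).sample(pairs, 3)
print(sample)
```

Sampling after the query returns an exact number of pairs, whereas a filter such as rand() < 0.1 only returns roughly that fraction.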

Discard rows which is not MAX in that group

I have data like this:
a b c
-------|--------|--------
100 | 3 | 50
100 | 4 | 60
101 | 3 | 70
102 | 3 | 70
102 | 4 | 80
102 | 5 | 90
a : key
b : sub_id
c : value
I want to set c to NULL in every row that is not the maximum row within its group of a.
My resulting table must look like:
a b c
-------|--------|--------
100 | 3 | NULL
100 | 4 | 60
101 | 3 | 70
102 | 3 | NULL
102 | 4 | NULL
102 | 5 | 90
How can I do this with an SQL Query?
#UPDATE
My relational table has about a billion rows. Please keep that in mind when providing an answer; I cannot wait a couple of hours or a day for the query to run.
Updated after the requirement was changed to "update the table":
with max_values as (
select a,
b,
max(c) over (partition by a) as max_c
from the_table
)
update the_table
set c = null
from max_values mv
where mv.a = the_table.a
and mv.b = the_table.b
and mv.max_c <> the_table.c;
SQLFiddle: http://sqlfiddle.com/#!15/1e739/1
Another possible solution, which might be faster (but you need to check the execution plan)
update the_table t1
set c = null
where exists (select 1
from the_table t2
where t2.a = t1.a
and t1.c < t2.c);
SQLFiddle: http://sqlfiddle.com/#!15/1e739/2
But with "billion" rows there is no way this is going to be really fast.
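For a small-scale check of the update logic, here is a sketch using Python's sqlite3 module on the six sample rows. It uses a correlated MAX(c) subquery, which is a variation on the statements above and not the same SQL; the function name null_non_max is an assumption, and nothing here says anything about performance on a billion rows.

```python
import sqlite3


def null_non_max(rows):
    """Set c to NULL wherever c is below the largest c of its a-group."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE the_table (a INT, b INT, c INT)")
    conn.executemany("INSERT INTO the_table VALUES (?, ?, ?)", rows)
    # The subquery looks up the group maximum for the row being updated.
    # The row holding the maximum is never touched, so the maximum stays stable.
    conn.execute(
        """
        UPDATE the_table
        SET c = NULL
        WHERE c < (SELECT MAX(t2.c)
                   FROM the_table t2
                   WHERE t2.a = the_table.a)
        """
    )
    result = conn.execute("SELECT a, b, c FROM the_table ORDER BY a, b").fetchall()
    conn.close()
    return result


rows = [(100, 3, 50), (100, 4, 60), (101, 3, 70),
        (102, 3, 70), (102, 4, 80), (102, 5, 90)]
for row in null_non_max(rows):
    print(row)
# (100, 3, None)
# (100, 4, 60)
# (101, 3, 70)
# (102, 3, None)
# (102, 4, None)
# (102, 5, 90)
```

If several rows in a group share the maximum c, all of them keep their value under this formulation.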
DECLARE @TAB TABLE (A INT,B INT,C INT)
INSERT INTO @TAB VALUES
(100,3,50),
(100,4,60),
(101,3,70),
(102,3,70),
(102,4,80),
(102,5,90)
UPDATE X
SET C = NULL
FROM @TAB X
LEFT JOIN (
SELECT A,MAX(C) C
FROM @TAB
GROUP BY A) LU ON X.A = LU.A AND X.C = LU.C
WHERE LU.A IS NULL
SELECT * FROM @TAB
Result:
this approach will help you
How about this formulation?
select a, b,
(case when c = max(c) over (partition by a) then c end) as c
from table t;
I'm not sure if you can get this faster. An index on a, c might help.
SELECT a, b,
CASE ROW_NUMBER() OVER (PARTITION BY a ORDER BY b DESC) WHEN 1 THEN c END c
FROM mytable

Select greatest number of unique pairs from table

I have the following table:
| a | b |
|---|---|
| 2 | 4 | x
| 2 | 5 |
| 3 | 1 | x
| 6 | 4 |
| 6 | 5 | x
| 7 | 5 |
| 7 | 4 |
|---|---|
I want to select the greatest number of unique pairs possible where neither a nor b is repeated. So the entries with x's next to them should be what the select would grab. Any ideas how to do this?
Currently I have some SQL that will do the opposite, select those that aren't unique and delete them but it has not been working the way I want it to. This is the SQL I have right now, but I think I'm going to scrap it and work at it from the angle I have stated above.
delete t
from #temp2 t
where (exists(select * from #temp2
where (b = t.b
and a < t.a))
or exists(select * from #temp2
where a = t.a
and (b < t.b and ) and
(not exists(select * from #temp2
where b = t.b
and a < t.a)
or not exists(select * from #temp2
where a = t.a
and b < t.b))
Thanks!
I'm assuming here that being non-unique and unique are mutually exclusive and will encompass all records in your table. If so, use your existing script, write it to a CTE, then join to the CTE from your source table selecting those records not in the CTE.
With Non_Unique_Records as (
--Insert your existing script here
)
Select t.a
, t.b
From #temp2 t
Left Outer Join Non_Unique_Records CTE
on t.a = CTE.a
and t.b = CTE.b
Where CTE.b is null
Then just delete the records that the Select statement returns.

SELECT query with cross rows WHERE statement

I'll try to explain the type of the query that I want:
Assume I have a table like this:
| ID | someID | Number |
|----|--------|--------|
| 1 | 1 | 10 |
| 2 | 1 | 11 |
| 3 | 1 | 14 |
| 4 | 2 | 10 |
| 5 | 2 | 13 |
Now, I want to find the someID values that have exactly a specific set of numbers. (For example, a query for the numbers 10, 11, 14 will return someID 1, and a query for the numbers 10, 13 will return 2.) But if a someID contains all the queried numbers and also more numbers, the query should not return it. (For example, a query for 10, 11 will return nothing.)
Is it possible?
SELECT t1.someId
FROM yourTable t1
WHERE t1.number IN (10,14,11)
GROUP BY t1.someID
HAVING COUNT(DISTINCT t1.ID) = (SELECT COUNT(DISTINCT t2.ID) FROM yourTable t2 WHERE t1.someID=t2.someID)
select someID
from yourtable
where number in (10,11,14)
and not exists (select * from yourtable t2 where number not in(10,11,14)
and t2.someid=yourtable.someid)
group by someID
having count(distinct ID) = 3
Where 3 is the number of items you are querying for
Yes. Once you get the query numbers into a table variable (say it's called @QNums, with one column named QNum), try
Select distinct someId
From table t
Where exists (Select * from @QNums
where QNum = t.Number)
And not Exists (Select * From table t2
Where t2.someId = t.someId
And not exists (Select * From @QNums
where QNum = t2.Number))
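One way to check the "exactly these numbers and no others" requirement end to end is the sketch below, run in SQLite through Python's sqlite3 module. It uses a single GROUP BY with two HAVING conditions, which is a different formulation from the answers above, and it assumes each (someID, Number) pair appears at most once. The function name exact_match is hypothetical.

```python
import sqlite3


def exact_match(rows, numbers):
    """Return the someID values whose set of numbers equals `numbers` exactly."""
    wanted = sorted(set(numbers))
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE yourTable (ID INTEGER PRIMARY KEY, someID INT, Number INT)"
    )
    conn.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", rows)
    placeholders = ", ".join("?" for _ in wanted)
    # First condition: every wanted number is present in the group.
    # Second condition: the group holds no rows beyond those numbers.
    query = f"""
        SELECT someID
        FROM yourTable
        GROUP BY someID
        HAVING SUM(CASE WHEN Number IN ({placeholders}) THEN 1 ELSE 0 END) = ?
           AND COUNT(*) = ?
        ORDER BY someID
    """
    params = (*wanted, len(wanted), len(wanted))
    result = [row[0] for row in conn.execute(query, params)]
    conn.close()
    return result


rows = [(1, 1, 10), (2, 1, 11), (3, 1, 14), (4, 2, 10), (5, 2, 13)]
print(exact_match(rows, [10, 11, 14]))  # [1]
print(exact_match(rows, [10, 13]))      # [2]
print(exact_match(rows, [10, 11]))      # []
```

Checking only that no extra numbers exist is not enough on its own: a someID holding just 10 would otherwise match a query for 10, 11, 14, which is why the count of matched numbers is compared against the size of the query set as well.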