Query to select rows with minimum distinct value of a column

I need to select the row with the minimum value of column B for each value of column A, but that B must be distinct from the B values already selected for earlier values of A. So the order of A matters. Also, if all of the B values are used up, the later values of A should either be NULL or not appear in the result.
Both A and B are numeric (or timestamps).
Example:
A | B |
----+---+
1 | 3 |
1 | 5 |
1 | 6 |
2 | 3 |
2 | 5 |
9 | 3 |
9 | 5 |
So the desired result is:
A | B |
----+---+
1 | 3 |
2 | 5 |
select A, min(B) group by A obviously doesn't work because I don't want B to be repeated. Distinct also doesn't work because the rows are already distinct. I couldn't really find any question similar to this anywhere.
The actual data I am working with is a time-series database on Redshift, so A and B are timestamps. Solutions using CTEs would be especially welcome.

At first I thought this could be solved with ROW_NUMBER() OVER (PARTITION BY A ORDER BY B), but there is a problem: the values in B must not be repeated.
At the moment the only thing that comes to mind is to use table variables and a loop. I know this is not the best way, but you can probably improve on it:
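DECLARE @Tabla1 TABLE(A INT) -- A values still to process
DECLARE @Tabla2 TABLE(B INT) -- B values already used
DECLARE @Tabla3 TABLE(A INT, B INT) -- result
DECLARE @A INT, @B INT;
INSERT INTO @Tabla1 SELECT DISTINCT A FROM PRUEBA
WHILE (SELECT COUNT(*) FROM @Tabla1) > 0
BEGIN
-- Process A in ascending order, since the order of A matters
SET @A = (SELECT TOP 1 A FROM @Tabla1 ORDER BY A);
SET @B = (SELECT MIN(B) FROM PRUEBA WHERE A = @A AND B NOT IN (SELECT B FROM @Tabla2));
-- Skip the insert when no unused B is left: a NULL in @Tabla2 would make NOT IN match nothing
IF @B IS NOT NULL INSERT INTO @Tabla2 VALUES (@B)
DELETE FROM @Tabla1 WHERE A = @A
INSERT INTO @Tabla3 SELECT A, B FROM PRUEBA WHERE A = @A AND B = @B
END
SELECT * FROM @Tabla3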
Maybe you can use a cursor instead, but you would have to measure which is more expensive, the cursor or the table variables.
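To make the intended rule concrete, here is a minimal Python sketch of the same greedy pass, outside the database. The function name pick_distinct_minimums is hypothetical, and the sketch assumes A is processed in ascending order and that an A with no unused B is simply left out of the result.

```python
def pick_distinct_minimums(rows):
    """Greedy pass: for each A in ascending order, take the smallest unused B."""
    used_b = set()
    result = []
    for a in sorted({a for a, _ in rows}):
        candidates = [b for row_a, b in rows if row_a == a and b not in used_b]
        if not candidates:
            continue  # every B for this A is already taken, so A is left out
        best = min(candidates)
        used_b.add(best)
        result.append((a, best))
    return result


# Sample data from the question
rows = [(1, 3), (1, 5), (1, 6), (2, 3), (2, 5), (9, 3), (9, 5)]
picked = pick_distinct_minimums(rows)
print(picked)
```

On the sample data this keeps 3 for A = 1 and 5 for A = 2, and drops A = 9 because both of its B values are already used.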

This is basically a "find the diagonal" problem. You need to know the rank of B within A and the rank of A within all. I believe this works for the data given:
select A, B from (
select row_number() over (partition by A order by B) as RN,
dense_rank() over (order by A) as DR,
A, B
from <table> )
where RN = DR;
For more complex cases this solution will get more complex.
Addendum:
Because I know it will be asked and this is an interesting problem, I worked out what such a more complex solution would look like:
select min(A) as A, B from (
select decode(A <> nvl(min(A) over (order by DRB, DRA rows between unbounded preceding and 1 preceding),-1), true, 'good', 'no good') as Y,
A, B from (
select dense_rank() over (partition by B order by A) as DRA,
dense_rank() over ( order by B) as DRB,
A, B from <table>
)
where DRA <= DRB
)
where Y = 'good'
group by B
order by A, B;

Related

SQL select a row X times and insert into new

I am trying to migrate a bunch of data from an old database to a new one, the old one used to just have the number of alarms that occurred on a single row. The new database inserts a new record for each alarm that occurs. Here is a basic version of how it might look. I want to select each row from Table 1 and insert the number of alarm values as new rows into Table 2.
Table 1:
| Alarm ID | Alarm Value |
|--------------|----------------|
| 1 | 3 |
| 2 | 2 |
Should go into the alarm table as the below values.
Table 2:
| Alarm New ID | Value |
|--------------|----------|
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 2 |
I want to create a select insert script that will do this, so the select statement will bring back the number of rows that appear in the "Value" column.

A recursive CTE can be convenient for this:
with cte as (
select id, alarm, 1 as n
from t
union all
select id, alarm, n + 1
from cte
where n < alarm
)
select row_number() over (order by id) as alarm_id, id as value
from cte
order by 1
option (maxrecursion 0);
Note: If your values do not exceed 100, then you can remove OPTION (MAXRECURSION 0).
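As a rough way to check the row replication without a SQL Server instance, the same recursive CTE idea can be run against SQLite from Python. This is only a sketch: SQLite stands in for SQL Server here, the table t(id, alarm) follows the query above, and the new IDs are assigned in Python rather than with row_number().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, alarm INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 3), (2, 2)])

# Each source row is emitted once per unit of its alarm count
query = """
WITH RECURSIVE cte(id, alarm, n) AS (
    SELECT id, alarm, 1 FROM t
    UNION ALL
    SELECT id, alarm, n + 1 FROM cte WHERE n < alarm
)
SELECT id FROM cte ORDER BY id, n
"""

# Number the replicated rows sequentially to get the new alarm IDs
new_rows = [
    (new_id, old_id)
    for new_id, (old_id,) in enumerate(conn.execute(query), start=1)
]
print(new_rows)
```

Note that the anchor row is always emitted, so a source row with an alarm count of 0 would still produce one output row in this sketch.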

Replicate values out with a CTE.
DECLARE @T TABLE(AlarmID INT, Value INT)
INSERT @T VALUES
(1,3),
(2,2)
;WITH ReplicateAmount AS
(
SELECT AlarmID, Value FROM @T
UNION ALL
SELECT R.AlarmID, Value=(R.Value - 1)
FROM ReplicateAmount R
INNER JOIN @T T ON R.AlarmID = T.AlarmID
WHERE R.Value > 1
)
SELECT
AlarmID = ROW_NUMBER() OVER( ORDER BY AlarmID),
Value = AlarmID --??
FROM
ReplicateAmount
ORDER BY
AlarmID
This answers your question. I would think the query below would be more useful, however, you did not include usage context.
SELECT
AlarmID,
Value
FROM
ReplicateAmount
ORDER BY
AlarmID

Rather than using an rCTE, which is recursive (as the name suggests) and by default fails once it exceeds 100 levels of recursion, you can use a Tally table, which tends to be far faster as well:
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3)
SELECT ROW_NUMBER() OVER (ORDER BY V.AlarmID,T.I) AS AlarmNewID,
V.AlarmID
FROM (VALUES(1,3),(2,2))V(AlarmID,AlarmValue)
JOIN Tally T ON V.AlarmValue >= T.I;

Two rows with the same id and two different values, getting the second value into another column

I have two rows with the same id but different values. I want a query to get the second value and display it in the first row.
There are only two rows for each productId and 2 different values.
I've looked everywhere for a solution to this.
What I have, example:
+-----+-------+
| ID | Value |
+-----+-------+
| 123 | 1 |
| 123 | 2 |
+-----+-------+
What I want
+------+-------+---------+
| ID | Value | Value 1 |
+------+-------+---------+
| 123 | 1 | 2 |
+------+-------+---------+

Not sure whether order matters to you. Here is one way:
SELECT MIN(Value), MAX(Value), ID
FROM Table
GROUP BY ID;

This is a self-join:
SELECT a.ID, a.Value, b.Value
FROM table a
JOIN table b on a.ID = b.ID
and a.Value < b.Value -- < rather than <> so that each pair is returned only once
You can use a LEFT JOIN instead if there are IDs that only have one value and would be lost by the above JOIN

Maybe you can try this:
DECLARE @T TABLE
(
Id INT,
Val INT
)
INSERT INTO @T
VALUES(123,1),(123,2),
(456,1),(789,1),(789,2)
;WITH CTE
AS
(
SELECT
RN = ROW_NUMBER() OVER(PARTITION BY Id ORDER BY Val),
*
FROM @T
)
SELECT
*
FROM CTE
PIVOT
(
MAX(Val)
FOR
RN IN
(
[1],[2]--Add More Numbers here if there are more values
)
)Q

SQL SELECT Convert Min/Max into Separate Rows

I have a table with a min and a max value, and I'd like to create a row for each valid number between them in a SELECT statement.
Original table:
| Foobar_ID | Min_Period | Max_Period |
---------------------------------------
| 1 | 0 | 2 |
| 2 | 1 | 4 |
I'd like to turn that into:
| Foobar_ID | Period_Num |
--------------------------
| 1 | 0 |
| 1 | 1 |
| 1 | 2 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
The SELECT results need to come out as one result-set, so I'm not sure if a WHILE loop would work in my case.

If you expect just a handful of rows per foobar, then this is a good opportunity to learn about recursive CTEs:
with cte as (
select foobar_id, min_period as period_num, max_period
from original t
union all
select foobar_id, period_num + 1 as period_num, max_period
from cte
where period_num < max_period
)
select foobar_id, period_num
from cte
order by foobar_id, period_num;
You can extend this to any number of periods by setting the MAXRECURSION option to 0.

One method would be to use a Tally table. There are plenty of examples out there, but I'm going to create a very small one in this example. Then you can JOIN onto that and return your result set.
--Create the Tally Table
CREATE TABLE #Tally (I int);
WITH ints AS(
SELECT 0 AS i
UNION ALL
SELECT i + 1
FROM ints
WHERE i + 1 <= 10)
--And in the numbers go!
INSERT INTO #Tally
SELECT i
FROM ints;
GO
--Create the sample table
CREATE TABLE #Sample (ID int IDENTITY(1,1),
MinP int,
MaxP int);
--Sample data
INSERT INTO #Sample (Minp, MaxP)
VALUES (0,2),
(1,4);
GO
--And the solution
SELECT S.ID,
T.I AS P
FROM #Sample S
JOIN #Tally T ON T.I BETWEEN S.MinP AND S.MaxP
ORDER BY S.ID, T.I;
GO
--Clean up
DROP TABLE #Sample;
DROP TABLE #Tally;
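For anyone who wants to try the tally join without SQL Server, here is a small sketch of the same join using SQLite from Python. It is an approximation under stated assumptions: the table and column names (tally, sample, min_p, max_p) are lowercase stand-ins for the temp tables above, and the tally is filled from Python instead of a recursive CTE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Tally table holding 0..10, mirroring the small tally above
conn.execute("CREATE TABLE tally (i INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO tally VALUES (?)", [(i,) for i in range(11)])

# Sample data from the question
conn.execute("CREATE TABLE sample (id INTEGER, min_p INTEGER, max_p INTEGER)")
conn.executemany("INSERT INTO sample VALUES (?, ?, ?)", [(1, 0, 2), (2, 1, 4)])

# One output row per tally number that falls inside each [min_p, max_p] range
periods = conn.execute(
    """
    SELECT s.id, t.i
    FROM sample s
    JOIN tally t ON t.i BETWEEN s.min_p AND s.max_p
    ORDER BY s.id, t.i
    """
).fetchall()
print(periods)
```

The tally only needs to cover the largest max value in the data; any range that runs past the end of the tally would be silently truncated.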

Depending on the size of the data and the range of the period, the easiest way to do this is to use a dynamic number fact table, as follows:
WITH rn AS (SELECT ROW_NUMBER() OVER (ORDER BY object_id) -1 as period_num FROM sys.objects)
SELECT f.foobar_id, rn.period_num
FROM foobar f
INNER JOIN rn ON rn.period_num BETWEEN f.min_period AND f.max_period
However, if you're working with a larger volume of data, it will be worth creating a number fact table with an index. You can even use a table variable for this:
-- Declare the number fact table
DECLARE @rn TABLE (period_num INT IDENTITY(0, 1) primary key, dummy int)
-- Populate the fact table so that all periods are covered (period_num starts at 0, so MAX(max_period) + 1 rows are needed)
WHILE (SELECT COUNT(1) FROM @rn) <= (SELECT MAX(max_period) FROM foobar)
INSERT @rn select 1 from sys.objects
-- Select using a join to the fact table
SELECT f.foobar_id, rn.period_num
FROM foobar f
inner join @rn rn on rn.period_num between f.min_period and f.max_period

Just create a table-valued function and use it:
CREATE FUNCTION [dbo].[Ufn_GetMInToMaxVal] (@Min_Period INT,@Max_Period INT )
RETURNS @OutTable TABLE
(
DATA INT
)
AS
BEGIN
;WITH cte
AS
(
SELECT @Min_Period AS Min_Period
UNION ALL
SELECT Min_Period+1 FROM
cte
WHERE Min_Period < @Max_Period
)
INSERT INTO @OutTable
SELECT * FROM cte
RETURN
END
Get the result by executing the following SQL statement:
DECLARE @Temp AS TABLE(
Foobar_ID INT,
Min_Period INT,
Max_Period INT
)
INSERT INTO @Temp
SELECT 1, 0,2 UNION ALL
SELECT 2, 1,4
SELECT Foobar_ID ,
DATA
FROM @Temp
CROSS APPLY
[dbo].[Ufn_GetMInToMaxVal] (Min_Period,Max_Period)
Result
Foobar_ID DATA
----------------
1 0
1 1
1 2
2 1
2 2
2 3
2 4

T-SQL - Get a list of all As which have the same set of Bs

I'm struggling with a tricky SQL query that I'm trying to write. Have a look at the following table:
+---+---+
| A | B |
+---+---+
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
| 3 | 2 |
| 3 | 3 |
| 4 | 2 |
| 4 | 3 |
| 4 | 4 |
+---+---+
Now, from this table, I essentially want a list of all As which have the exact same set of Bs and give each set an incrementing ID.
Hence, the output set for the above would be:
+---+----+
| A | ID |
+---+----+
| 1 | 1 |
| 3 | 1 |
| 2 | 2 |
| 4 | 2 |
+---+----+
Thanks.
Edit: If it helps, I have a list of all distinct values of B that are possible in another table.
Edit: Thank you so much for all the innovative answers. Was able to learn a lot indeed.

Here is a mathematical trick to solve your tricky select:
with pow as(select *, b * power(10, row_number()
over(partition by a order by b)) as rn from t)
select a, dense_rank() over( order by sum(rn)) as rn
from pow
group by a
order by rn, a
Fiddle http://sqlfiddle.com/#!3/6b98d/11
This will of course only work when each A has a limited number of distinct B values, because otherwise the powers of 10 overflow. Here is a more general solution with strings:
select a,
dense_rank() over(order by (select '.' + cast(b as varchar(max))
from t t2 where t1.a = t2.a
order by b
for xml path(''))) rn
from t t1
group by a
order by rn, a
Fiddle http://sqlfiddle.com/#!3/6b98d/29

Something like this:
select a, dense_rank() over (order by g) as id_b
from (
select a,
(select b from MyTable s where s.a=a.a order by b FOR XML PATH('')) g
from MyTable a
group by a
) a
order by id_b,a
Or maybe using a CTE (I avoid them when possible)
As a side note, this is the output of the inner query using the sample data in the question:
a g
1 <b>2</b><b>3</b>
2 <b>2</b><b>3</b><b>4</b>
3 <b>2</b><b>3</b>
4 <b>2</b><b>3</b><b>4</b>

Here's a long-winded approach. It finds sets with the same elements (using EXCEPT in both directions to eliminate mismatches, over just half of the diagonal of the Cartesian product), pairs the equal sets up, and stamps each pair with a ROW_NUMBER(). It then unpivots the pairs of A's into your final output, where equivalent sets are projected as rows that have the same Id.
WITH joinedSets AS
(
SELECT t1.A as t1A, t2.A AS t2A
FROM MyTable t1
INNER JOIN MyTable t2
ON t1.B = t2.B
AND t1.A < t2.A
),
equalSets AS
(
SELECT js.t1A, js.t2A, ROW_NUMBER() OVER (ORDER BY js.t1A) AS Id
FROM joinedSets js
GROUP BY js.t1A, js.t2A
HAVING NOT EXISTS ((SELECT mt.B FROM MyTable mt WHERE mt.A = js.t1A)
EXCEPT (SELECT mt.B FROM MyTable mt WHERE mt.A = js.t2A))
AND NOT EXISTS ((SELECT mt.B FROM MyTable mt WHERE mt.A = js.t2A)
EXCEPT (SELECT mt.B FROM MyTable mt WHERE mt.A = js.t1A))
)
SELECT A, Id
FROM equalSets
UNPIVOT
(
A
FOR ACol in (t1A, t2A)
) unp;
As it stands, this solution will only work with pairs of sets, not triples etc. A general NTuple type solution is probably possible (but beyond my brain right now).

Here is a very simple, fast, but approximate solution.
It is possible that CHECKSUM_AGG returns the same checksum for different sets of B.
DECLARE @T TABLE (A int, B int);
INSERT INTO @T VALUES
(1, 2),(1, 3),(2, 2),(2, 3),(2, 4),(3, 2),(3, 3),(4, 2),(4, 3),(4, 4);
SELECT
A
,CHECKSUM_AGG(B) AS CheckSumB
,ROW_NUMBER() OVER (PARTITION BY CHECKSUM_AGG(B) ORDER BY A) AS GroupNumber
FROM @T
GROUP BY A
ORDER BY A, GroupNumber;
Result set
A CheckSumB GroupNumber
-----------------------------
1 1 1
2 5 1
3 1 2
4 5 2
For an exact solution, group by A and concatenate all B values into a long (binary) string using FOR XML, CLR, or a T-SQL function. Then you can partition ROW_NUMBER by that concatenated string to assign numbers to the groups, as shown in other answers.
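The exact variant can be sketched outside SQL as well. The Python below is only an illustration of the idea, not a T-SQL replacement: it collects the B values per A, uses the sorted tuple of B values as the grouping key (the counterpart of the concatenated string), and assigns dense ranks to the distinct keys. The function name group_ids_by_b_set is hypothetical.

```python
from collections import defaultdict


def group_ids_by_b_set(rows):
    """Give every A an ID such that As with identical sets of Bs share an ID."""
    b_sets = defaultdict(set)
    for a, b in rows:
        b_sets[a].add(b)
    # The sorted tuple plays the role of the concatenated string key
    keys = {a: tuple(sorted(bs)) for a, bs in b_sets.items()}
    # Dense rank over the distinct keys
    rank = {key: i for i, key in enumerate(sorted(set(keys.values())), start=1)}
    pairs = [(a, rank[key]) for a, key in keys.items()]
    return sorted(pairs, key=lambda pair: (pair[1], pair[0]))


# Sample data from the question
rows = [(1, 2), (1, 3), (2, 2), (2, 3), (2, 4),
        (3, 2), (3, 3), (4, 2), (4, 3), (4, 4)]
groups = group_ids_by_b_set(rows)
print(groups)
```

Unlike the checksum, the tuple key cannot collide: two As get the same ID only if their B sets are identical.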

EDIT
I am changing the code, but it will get bigger now. I took help from the question "Concatenate many rows into a single text string?" for concatenating the strings.
Select [A],
Left(M.[C],Len(M.[C])-1) As [D] into #tempSomeTable
From
(
Select distinct T2.[A],
(
Select Cast(T1.[B] as VARCHAR) + ',' AS [text()]
From sometable T1
Where T1.[A] = T2.[A]
ORDER BY T1.[B]
For XML PATH ('')
) [C]
From sometable T2
)M
SELECT t.A, DENSE_RANK() OVER(ORDER BY t.[D]) [ID] FROM
#tempSomeTable t
inner join
(SELECT [D] FROM(
SELECT [D], COUNT([A]) [D_A] from
#tempSomeTable t
GROUP BY [D] )P where [D_A]>1)t1 on t1.[D]=t.[D]

Here is an exact, rather than approximate, solution. It uses nothing more advanced than INNER JOIN and GROUP BY (and, of course, the DENSE_RANK() to get the ID you want).
It is also general, in that it allows for B values to be repeated within an A group.
SELECT A,
DENSE_RANK() OVER (ORDER BY MIN_EQUIVALENT_A) AS ID
FROM (
SELECT MATCHES.A1 AS A,
MIN(MATCHES.A2) AS MIN_EQUIVALENT_A
FROM (
SELECT T1.A AS A1,
T2.A AS A2,
COUNT(*) AS NUM_B_VALS_MATCHED
FROM (
SELECT A,
B,
COUNT(*) AS B_VAL_FREQ
FROM MyTable
GROUP BY A,
B
) AS T1
INNER JOIN
(
SELECT A,
B,
COUNT(*) AS B_VAL_FREQ
FROM MyTable
GROUP BY A,
B
) AS T2
ON T1.B = T2.B
AND T1.B_VAL_FREQ = T2.B_VAL_FREQ
GROUP BY T1.A,
T2.A
) AS MATCHES
INNER JOIN
(
SELECT A,
COUNT(DISTINCT B) AS NUM_B_VALS_TOTAL
FROM MyTable
GROUP BY A
) AS CHECK_TOTALS_A1
ON MATCHES.A1 = CHECK_TOTALS_A1.A
AND MATCHES.NUM_B_VALS_MATCHED
= CHECK_TOTALS_A1.NUM_B_VALS_TOTAL
INNER JOIN
(
SELECT A,
COUNT(DISTINCT B) AS NUM_B_VALS_TOTAL
FROM MyTable
GROUP BY A
) AS CHECK_TOTALS_A2
ON MATCHES.A2 = CHECK_TOTALS_A2.A
AND MATCHES.NUM_B_VALS_MATCHED
= CHECK_TOTALS_A2.NUM_B_VALS_TOTAL
GROUP BY MATCHES.A1
) AS EQUIVALENCE_TABLE
ORDER BY 2,1
;

Populate "Lookup Table" with random values

I have three tables, A B and C. For every entry in A x B (where x is a Cartesian product, or cross join) there is an entry in C.
In other words, the table for C might look like this, if there were 2 entries for A and 3 for B:
| A_ID | B_ID | C_Val |
----------------------|
| 1 | 1 | 100 |
| 1 | 2 | 56 |
| 1 | 3 | 19 |
| 2 | 1 | 67 |
| 2 | 2 | 0 |
| 2 | 3 | 99 |
Thus, for any combination of A and B, there's a value to be looked up in C. I hope this all makes sense.
In practice, the size of A x B may be relatively small for a database, but far too large to populate by hand for testing data. Thus, I would like to randomly populate C's table for whatever data may already be in A and B.
My knowledge of SQL is fairly basic. What I've determined I can do so far is get that cartesian product as an inner query, like so:
(SELECT B.B_ID, C.C_ID
FROM B CROSS JOIN C)
Then I want to say something like follows:
INSERT INTO A(B_ID, C_ID, A_Val) VALUES
(SELECT B.B_ID, C.C_ID, FLOOR(RAND() * 100)
FROM B CROSS JOIN C)
Not surprisingly, this doesn't work. I don't think it's valid syntax to generate a column on the fly like that, nor to try to insert a whole table as values.
How can I basically convert this normal programming pseudocode to proper SQL?
foreach(A_ID in A){
foreach(B_ID in B){
C.insert(A_ID, B_ID, Rand(100));
}
}

The syntax problem is because:
INSERT INTO A(B_ID, C_ID, A_Val) VALUES
(SELECT B.B_ID, C.C_ID, FLOOR(RAND() * 100)
FROM B CROSS JOIN C)
Should be:
INSERT INTO A(B_ID, C_ID, A_Val)
SELECT B.B_ID, C.C_ID, FLOOR(RAND() * 100)
FROM B CROSS JOIN C;
(You don't use VALUES with INSERT/SELECT.)
However you will still have the problem that RAND() is not evaluated for every row; it will have the same value for every row. Assuming the combination of B_ID and C_ID is unique, you can use something like this:
INSERT INTO A(B_ID, C_ID, A_Val)
SELECT B.B_ID, C.C_ID, ABS(CHECKSUM(RAND(B.B_ID*C.C_ID))) % 100
FROM B CROSS JOIN C;
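The pseudocode in the question maps directly onto a nested loop with one random draw per pair, which is the behaviour the per-row seed is trying to recover. The sketch below shows that target behaviour in Python; populate_lookup is a hypothetical helper, and the fixed seed is only there to make the run repeatable.

```python
import itertools
import random


def populate_lookup(b_ids, c_ids, rng):
    """Build one (b_id, c_id, value) row per pair, drawing a fresh value each time."""
    return [
        (b_id, c_id, rng.randrange(100))  # independent draw in 0..99 for every row
        for b_id, c_id in itertools.product(b_ids, c_ids)
    ]


rng = random.Random(0)  # seeded generator, an assumption for reproducibility
lookup = populate_lookup([1, 2], [1, 2, 3], rng)
for row in lookup:
    print(row)
```

The key point is that the random function is called once per row here, whereas an unseeded RAND() in a single SQL Server SELECT is evaluated once for the whole statement.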

select A_id,B_Id, abs(checksum(newid()))%101 as C_val from A cross join B
This will give you different values in the range 0 to 100.

Use CTE
With cte as
(SELECT B.B_ID, C.C_ID, ABS(CAST(CAST(NEWID() AS VARBINARY) AS INT)) as A_Val
FROM B CROSS JOIN C)
Insert into Table(B_ID, C_ID, A_Val)
Select B_ID,C_ID,A_Val from cte
Since RAND() generates the same number for every row, you can use NEWID() instead.
