How to do equivalent of "limit distinct"? - sql

How can I limit a result set to n distinct values of a given column(s), where the actual number of rows may be higher?
Input table:
client_id, employer_id, other_value
1, 2, abc
1, 3, defg
2, 3, dkfjh
3, 1, ldkfjkj
4, 4, dlkfjk
4, 5, 342
4, 6, dkj
5, 1, dlkfj
6, 1, 34kjf
7, 7, 34kjf
8, 6, lkjkj
8, 7, 23kj
Desired output, where "limit distinct" = 5 distinct values of client_id:
1, 2, abc
1, 3, defg
2, 3, dkfjh
3, 1, ldkfjkj
4, 4, dlkfjk
4, 5, 342
4, 6, dkj
5, 1, dlkfj
The intended platform is MySQL.

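You can use a subselect. MySQL rejects LIMIT directly inside an IN subquery, so join against the limited list as a derived table (MyTable stands in for your table name):
select t.* from MyTable t
join (select distinct client_id from MyTable order by client_id limit 5) ids
on ids.client_id = t.client_id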
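To check the idea against the sample data from the question, here is a minimal sketch that runs the derived-table form of the query through Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here, and the table name MyTable and the helper limit_distinct are assumptions made for the illustration.

```python
import sqlite3

# Sample rows from the question: (client_id, employer_id, other_value).
rows = [
    (1, 2, "abc"), (1, 3, "defg"), (2, 3, "dkfjh"), (3, 1, "ldkfjkj"),
    (4, 4, "dlkfjk"), (4, 5, "342"), (4, 6, "dkj"), (5, 1, "dlkfj"),
    (6, 1, "34kjf"), (7, 7, "34kjf"), (8, 6, "lkjkj"), (8, 7, "23kj"),
]


def limit_distinct(n):
    """Return every row that belongs to the first n distinct client_id values."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MyTable (client_id INTEGER, employer_id INTEGER, other_value TEXT)"
    )
    conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", rows)
    query = """
        SELECT t.client_id, t.employer_id, t.other_value
        FROM MyTable AS t
        JOIN (SELECT DISTINCT client_id
              FROM MyTable
              ORDER BY client_id
              LIMIT ?) AS ids
          ON ids.client_id = t.client_id
        ORDER BY t.client_id, t.employer_id
    """
    result = conn.execute(query, (n,)).fetchall()
    conn.close()
    return result


for row in limit_distinct(5):
    print(row)
```

With n = 5 this returns the eight rows for client_id 1 through 5, which is the desired output listed in the question.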

This is for SQL Server. If I remember correctly, MySQL uses a LIMIT keyword instead of TOP. If LIMIT and DISTINCT can be combined in the same subquery, you can drop the innermost subquery, which may make the query more efficient. (It looks like Vinko used this method and that LIMIT is correct. I'll leave this here as a second possible answer, though.)
SELECT
client_id,
employer_id,
other_value
FROM
MyTable
WHERE
client_id IN
(
SELECT TOP 5
client_id
FROM
(
SELECT DISTINCT
client_id
FROM
MyTable
) SQ
ORDER BY
client_id
)
Of course, add in your own WHERE clause and ORDER BY clause in the subquery.
Another possibility (compare performance and see which works out better) is:
SELECT
client_id,
employer_id,
other_value
FROM
MyTable T1
WHERE
T1.client_id IN
(
SELECT
T2.client_id
FROM
MyTable T2
WHERE
(SELECT COUNT(DISTINCT T3.client_id) FROM MyTable T3 WHERE T3.client_id < T2.client_id) < 5
)
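One detail worth checking in the correlated-count approach is whether the inner count runs over rows or over distinct values. The sketch below, again using sqlite3 as a stand-in and a hypothetical helper clients_kept, compares the two on the question's sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (client_id INTEGER, employer_id INTEGER, other_value TEXT)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        (1, 2, "abc"), (1, 3, "defg"), (2, 3, "dkfjh"), (3, 1, "ldkfjkj"),
        (4, 4, "dlkfjk"), (4, 5, "342"), (4, 6, "dkj"), (5, 1, "dlkfj"),
        (6, 1, "34kjf"), (7, 7, "34kjf"), (8, 6, "lkjkj"), (8, 7, "23kj"),
    ],
)


def clients_kept(count_expr):
    """List the client_id values that pass the '< 5 smaller values' filter."""
    # count_expr is a fixed string chosen below, not user input.
    query = f"""
        SELECT DISTINCT T1.client_id
        FROM MyTable T1
        WHERE (SELECT {count_expr}
               FROM MyTable T3
               WHERE T3.client_id < T1.client_id) < 5
        ORDER BY T1.client_id
    """
    return [r[0] for r in conn.execute(query)]


with_distinct = clients_kept("COUNT(DISTINCT T3.client_id)")
with_star = clients_kept("COUNT(*)")
print(with_distinct)  # [1, 2, 3, 4, 5]
print(with_star)      # [1, 2, 3, 4]
```

Counting rows stops one client early on this data because client_id 1 and 4 each own several rows; counting distinct values yields the five clients the question asks for.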

-- Using a common table expression in Microsoft SQL Server.
-- MS SQL has no LIMIT clause; use TOP instead.
WITH CTE
AS
(SELECT DISTINCT [COLUMN_NAME]
FROM [TABLE_NAME])
SELECT TOP (5) [COLUMN_NAME]
FROM CTE
ORDER BY [COLUMN_NAME];

This works for MS SQL if anyone is on that platform:
SET ROWCOUNT 10;
SELECT DISTINCT
column1, column2, column3,...
FROM
Table1
WHERE ...

Related

BigQuery arrays - SELECT DISTINCT ordering guarantees?

I want to filter out the duplicates from a BigQuery array. I also need the order of the elements to be preserved. The docs mention that this can be done by combining SELECT DISTINCT with UNNEST. However, it doesn't mention any ordering behavior. I ran this query and got the desired ordering of [5, 3, 1, 4, 10, 8].
WITH an_array AS (
SELECT [5, 5, 3, 1, 4, 4, 10, 8, 5, 1] AS nums
)
SELECT
ARRAY((
SELECT DISTINCT num
FROM UNNEST(nums) num
))
FROM an_array;
I don't know if that's coincidence or if that ordering is guaranteed. I also tried adding WITH OFFSET with an ORDER BY to specify the order explicitly, but in that case I get Query error: ORDER BY clause expression references table alias offset which is not visible after SELECT DISTINCT.
You should always be explicit about ordering if you care about it:
WITH an_array AS (
SELECT [5, 5, 3, 1, 4, 4, 10, 8, 5, 1] AS nums
)
SELECT ARRAY((SELECT num
FROM UNNEST(nums) num WITH OFFSET o
GROUP BY num
ORDER BY MIN(o)
)
)
FROM an_array;
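The GROUP BY num ... ORDER BY MIN(o) pattern can be tried outside BigQuery as well. The following is a small sketch that emulates UNNEST ... WITH OFFSET by loading (offset, value) pairs into a SQLite table; the table name unnested is an assumption for the illustration, and SQLite is only standing in for BigQuery.

```python
import sqlite3

nums = [5, 5, 3, 1, 4, 4, 10, 8, 5, 1]

conn = sqlite3.connect(":memory:")
# Emulate UNNEST(nums) WITH OFFSET o: one row per element, with its position.
conn.execute("CREATE TABLE unnested (o INTEGER, num INTEGER)")
conn.executemany("INSERT INTO unnested VALUES (?, ?)", list(enumerate(nums)))

# Keep one row per value and order by the first position where it appeared.
deduped = [
    r[0]
    for r in conn.execute(
        "SELECT num FROM unnested GROUP BY num ORDER BY MIN(o)"
    )
]
print(deduped)  # [5, 3, 1, 4, 10, 8]
```

Ordering by the smallest offset of each group is what makes the first-occurrence order explicit instead of relying on whatever order DISTINCT happens to produce.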

SQL ARRAY: Select ID from my_table where "arrayvalue" = "defined_arrayvalue"

This is a beginner question relating to arrays. I hope the answer is simple.
The example is taken from Oracle Spatial, but I think it is valid for all arrays.
I have this SELECT:
SELECT
D.FID
, D.GEOM.SDO_ELEM_INFO -- column GEOM contains spatial data
FROM
my_table D
I get this result:
73035 MDSYS.SDO_ELEM_INFO_ARRAY(1, 2, 1)
73036 MDSYS.SDO_ELEM_INFO_ARRAY(1, 4, 3, 1, 2, 1, 11, 2, 2, 19, 2, 1)
73037 MDSYS.SDO_ELEM_INFO_ARRAY(1, 2, 1)
Now I want to SELECT all rows where (1,2,1) is defined:
SELECT
D.FID
, D.GEOM.SDO_ELEM_INFO
FROM
my_table D
WHERE
-- Pseudo-Code is following
D.GEOM.SDO_ELEM_INFO is "(1, 2, 1)";
So, in simple words: "array_from_row = defined_array".
I found a lot about IMPLODE and TABLE and COLLECT etc. But how to define a clause on two arrays?
Thanks for help!
Try an IN clause; you can also combine it with OR conditions:
SELECT
D.FID
, D.GEOM.SDO_ELEM_INFO
FROM
my_table D
WHERE
D.GEOM.SDO_ELEM_INFO in (1, 2, 1) or ( D.GEOM.SDO_ELEM_INFO = 1 or D.GEOM.SDO_ELEM_INFO = 2 or D.GEOM.SDO_ELEM_INFO = 3);

How to write sql for this case?

Suppose I have a table(a relationship) like
MyTab(ID1,ID2,IsMarked, data,....)
The sample data might look like:
1, 1, 1, ...
1, 2, 0, ...
1, 3, 0, ...
2, 34, 1, ...
3, 4, 0, ...
4, 546, 0, ...
4, 8, 0, ...
At most one row can be marked for each ID1. For every ID1, I want the row marked as 1; if there is no marked row, I want the first one, or any one of them.
For above sample data, the result should be:
1, 1, 1, ...
2, 34, 1, ...
3, 4, 0, ...
4, 546, 0, ...
A UNION could be a solution, but it is too long and may perform badly.
My idea is to sort the data by ID1 and IsMarked desc, then get the first row for each ID1, but how do I write the SQL for this?
Given that only one row can be marked for each ID1, the following should work:
;with cte as (
select *, rn=row_number() over (partition by ID1 order by IsMarked desc)
from MyTab
)
select *
from cte
where rn=1
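As a quick check of the ROW_NUMBER() idea against the sample data, here is a sketch using sqlite3 (window functions need SQLite 3.25 or later). The extra ID2 DESC tiebreaker is an assumption added only to make the pick deterministic when no row is marked; the question accepts any row in that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTab (ID1 INTEGER, ID2 INTEGER, IsMarked INTEGER)")
conn.executemany(
    "INSERT INTO MyTab VALUES (?, ?, ?)",
    [(1, 1, 1), (1, 2, 0), (1, 3, 0), (2, 34, 1), (3, 4, 0), (4, 546, 0), (4, 8, 0)],
)

query = """
    WITH cte AS (
        SELECT ID1, ID2, IsMarked,
               -- Marked rows sort first; ID2 DESC is an assumed tiebreaker.
               ROW_NUMBER() OVER (
                   PARTITION BY ID1
                   ORDER BY IsMarked DESC, ID2 DESC
               ) AS rn
        FROM MyTab
    )
    SELECT ID1, ID2, IsMarked
    FROM cte
    WHERE rn = 1
    ORDER BY ID1
"""
picked = conn.execute(query).fetchall()
print(picked)  # [(1, 1, 1), (2, 34, 1), (3, 4, 0), (4, 546, 0)]
```

Each ID1 gets exactly one row: the marked one where it exists, otherwise one of the unmarked rows.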
Shot in the dark:
SELECT A.*
FROM MYTAB A
-- B: the ID1 values that have a marked row
LEFT JOIN (
SELECT ID1
FROM MYTAB
WHERE ISMARKED=1
GROUP BY ID1
) B ON A.ID1=B.ID1
-- C: one unmarked row (highest ID2) per ID1, used as the fallback
LEFT JOIN (
SELECT MAX(ID2) AS MAXID2, ID1
FROM MYTAB
WHERE ISMARKED=0
GROUP BY ID1
) C ON A.ID2=C.MAXID2 AND A.ID1=C.ID1
WHERE
(A.ISMARKED=1)
OR
(B.ID1 IS NULL AND C.ID1 IS NOT NULL);

Select one line of each code

I've got a Table that stores messages
like this:
codMsg, message, anotherCod
1, 'hi', 1
2, 'hello', 1
3, 'wasup', 1
4, 'yo', 2
5, 'yeah', 2
6, 'gogogo', 3
I was wondering if it is possible to select the top 1 of each anotherCod.
What I expect:
1, 'hi', 1
4, 'yo', 2
6, 'gogogo', 3
I want the whole line, not just the number of the anotherCod, so group by should not work
select mytable.*
from mytable
join (select min(codMsg) as codMsg, anotherCod from mytable group by 2) x
on mytable.codMsg = x.codMsg
SQL Server 2005+, Oracle :
SELECT codMsg,
message,
anotherCod
FROM
(
SELECT codMsg,
message,
anotherCod,
RANK() OVER (PARTITION BY anotherCod ORDER BY codMsg ASC) AS Rank
FROM mytable
) tmp
WHERE Rank = 1
SELECT
*
FROM
myTable
WHERE
codMSG = (SELECT MIN(codMsg) FROM myTable AS lookup WHERE anotherCod = myTable.anotherCod)
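The MIN-per-group approaches above can be verified on the sample messages. This is a minimal sketch with sqlite3 as a stand-in database; it uses the join form and groups by the column name rather than by position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (codMsg INTEGER, message TEXT, anotherCod INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, "hi", 1), (2, "hello", 1), (3, "wasup", 1),
     (4, "yo", 2), (5, "yeah", 2), (6, "gogogo", 3)],
)

query = """
    SELECT m.codMsg, m.message, m.anotherCod
    FROM mytable AS m
    -- x holds the smallest codMsg of every anotherCod group.
    JOIN (SELECT MIN(codMsg) AS codMsg
          FROM mytable
          GROUP BY anotherCod) AS x
      ON m.codMsg = x.codMsg
    ORDER BY m.anotherCod
"""
first_per_code = conn.execute(query).fetchall()
print(first_per_code)  # [(1, 'hi', 1), (4, 'yo', 2), (6, 'gogogo', 3)]
```

Joining back on codMsg alone is enough here only because codMsg is unique across the table; if it were not, the join would also need anotherCod.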

Maintain WHERE Order In SQL Select

Is it possible to maintain the order of the WHERE clause when doing a SELECT for specific records?
For instance, given the following SELECT statement:
SELECT [RecSeq] FROM [MyData] WHERE
[RecSeq]=3 OR [RecSeq]=2 OR [RecSeq]=1 OR [RecSeq]=21 OR [RecSeq]=20 OR
[RecSeq]=19 OR [RecSeq]=110 OR [RecSeq]=109 OR [RecSeq]=108 OR
[RecSeq]=53 OR [RecSeq]=52 OR [RecSeq]=51;
I'd like the results to come back as:
3
2
1
21
20
19
110
109
108
53
52
51
However, what I get back isn't in any particular order. Currently I have a loop that calls the SELECT statement for each record required. This could range anywhere from 1 to 700,000 times. Needless to say the performance isn't the best.
Any solutions or am I stuck in the loop?
You need the ORDER BY FIELD clause.
SELECT RecSeq From MyData WHERE RecSeq IN (3, 2, 1, 21, 20, 19, 110, 109, 108, 53, 52, 51)
ORDER BY FIELD (RecSeq, 3, 2, 1, 21, 20, 19, 110, 109, 108, 53, 52, 51);
You don't say what database system you are using - I know this works in MySQL.
There is exactly one way to reliably enforce an ordering of the results of a SQL statement: use an ORDER BY clause. I don't know if it is standard SQL, but in Oracle you could do something like this:
select ... from ...
where recseq in ( 3, 2, 1, 21, 20, 19, 110, 109, 108, 53, 52, 51)
order by decode(recseq, 3,1, 2,2, 1,3, 21,4, 20,5, 19,6, 110,7, 109,8, 108,9, 53,10, 52,11, 51,12, 13)
A WHERE clause cannot specify your output order.
You will have to sort your results using an ORDER BY.
If you absolutely need this order, try a pseudo-column (a fake column) with a UNION clause (performance warning here).
select 0 as my_fake_column, blah_columns from table where recseq = 3
UNION
select 1, blah_columns from table where recseq = 2
UNION
select 2, blah_columns from table where recseq = 1
UNION
select 3, blah_columns from table where recseq = 21
order by my_fake_column
The above will deliver the results in your specific order 3,2,1,21.
As the other poster said, adding a column could be an option.
You can use a derived table for filtering and sorting like this
SELECT t.RecSeq
FROM MyData t
JOIN (
SELECT 3, 1 UNION ALL
SELECT 2, 2 UNION ALL
SELECT 1, 3 UNION ALL
SELECT 21, 4 UNION ALL
SELECT 20, 5 UNION ALL
SELECT 19, 6
...
) f(RecSeq, SortKey)
ON t.RecSeq = f.RecSeq
ORDER BY f.SortKey
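For a long list of keys, the same sort-key join can be fed from application code instead of a hand-written UNION ALL chain. Below is a sketch under stated assumptions: sqlite3 stands in for the real database, and the temporary table wanted and the helper fetch_in_order are hypothetical names introduced for the example.

```python
import sqlite3


def fetch_in_order(conn, wanted):
    """Return the RecSeq values found in MyData, in the order given by wanted."""
    conn.execute("CREATE TEMP TABLE wanted (RecSeq INTEGER, SortKey INTEGER)")
    # The position of each key in the caller's list becomes its sort key.
    conn.executemany(
        "INSERT INTO wanted VALUES (?, ?)",
        [(value, position) for position, value in enumerate(wanted)],
    )
    rows = conn.execute(
        """
        SELECT t.RecSeq
        FROM MyData AS t
        JOIN wanted AS f ON t.RecSeq = f.RecSeq
        ORDER BY f.SortKey
        """
    ).fetchall()
    conn.execute("DROP TABLE wanted")
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyData (RecSeq INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO MyData VALUES (?)", [(i,) for i in range(1, 201)])

wanted = [3, 2, 1, 21, 20, 19, 110, 109, 108, 53, 52, 51]
ordered = fetch_in_order(conn, wanted)
print(ordered)  # [3, 2, 1, 21, 20, 19, 110, 109, 108, 53, 52, 51]
```

This replaces one SELECT per record with a single bulk insert and one query, which is the main point when the list can run to hundreds of thousands of keys.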
Yes, there is a way, although some might consider it a hack. Also, I want to point out that you can (and should) use the IN operator instead of the giant conditional statement.
SELECT [RecSeq]
FROM [MyData]
WHERE [RecSeq] in (3,2,1,21,20,19,110,109,108,53,52,51)
ORDER BY DECODE (recseq, 3,1, 2,2, 1,3, 21,4,......)
You could try using a UNION. Something like:
SELECT [RecSeq], 1 FROM [MyData] WHERE [RecSeq]=3
UNION
SELECT [RecSeq], 2 FROM [MyData] WHERE [RecSeq]=2
UNION
SELECT [RecSeq], 3 FROM [MyData] WHERE [RecSeq]=1
*etc...*
ORDER BY 2