SQL Pick by Index When Using ORDER BY - sql

I want to pick say the 10th, 20th, and 50th entry in a dataset after it has been ordered by a column. What's the best way to achieve this?

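The easiest way is to use LIMIT/OFFSET. OFFSET is zero-based, so the 10th row sits at OFFSET 9. In MySQL and PostgreSQL each branch needs its own parentheses so that ORDER BY and LIMIT apply to that branch rather than to the whole UNION:
(SELECT * FROM MyTable ORDER BY whatever LIMIT 1 OFFSET 9)
UNION ALL
(SELECT * FROM MyTable ORDER BY whatever LIMIT 1 OFFSET 19)
UNION ALL
(SELECT * FROM MyTable ORDER BY whatever LIMIT 1 OFFSET 49)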
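As a minimal sketch of the LIMIT/OFFSET idea, the snippet below runs it against an in-memory SQLite database through Python's sqlite3 module. The table name `my_table`, the column `score`, and the sample data are made up for illustration, and it issues one query per position instead of a single UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (score INTEGER)")
# Insert 100..1 so the insertion order differs from the sorted order.
conn.executemany("INSERT INTO my_table VALUES (?)", [(n,) for n in range(100, 0, -1)])


def pick(position):
    """Return the score at the given 1-based position, or None if out of range."""
    # OFFSET is zero-based, so the Nth row sits at OFFSET N - 1.
    row = conn.execute(
        "SELECT score FROM my_table ORDER BY score LIMIT 1 OFFSET ?",
        (position - 1,),
    ).fetchone()
    return row[0] if row else None


picked = [pick(p) for p in (10, 20, 50)]
print(picked)
```

A position past the end of the table simply yields no row, which `pick` reports as `None`.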

Let's assume that we have the following table:
create table Test
(
value int
);
Here is a query that returns first, third and sixth row:
select value
from
(
select value, (select count(*) + 1 from Test t2 where t2.value < t1.value) as OrderId
from Test t1
) tbl
where tbl.OrderId in (1,3,6)
If there are duplicates in the Test table, the query above can return more than 3 rows.
UPDATE
If you want to sort by a column other than value, modify the condition t2.value < t1.value. The general form is t2.COLUMN_NAME < t1.COLUMN_NAME.
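The duplicate caveat is easy to see on a small example. This sketch runs the counting query on an in-memory SQLite database via Python's sqlite3 module; the six sample values are an assumption chosen so that 30 appears twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (value INTEGER)")
conn.executemany(
    "INSERT INTO Test VALUES (?)", [(v,) for v in (50, 10, 30, 30, 20, 40)]
)

query = """
SELECT value, OrderId
FROM (
    SELECT value,
           (SELECT COUNT(*) + 1 FROM Test t2 WHERE t2.value < t1.value) AS OrderId
    FROM Test t1
) tbl
WHERE tbl.OrderId IN (1, 3, 6)
ORDER BY OrderId
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Both rows holding 30 get OrderId 3 because the same number of values is smaller than each of them, so four rows come back instead of three, and no row receives OrderId 4.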

Related

How to exclude all rows with the same ID based on one record's value in psql?

Say I have the results above, and want to exclude all rows with ID of 14010497 because at least one of the rows has a date of 2/25. How would I filter it down? Using a WHERE table.end_date > '2019-02-25' would still include the row with a date of 2-23
Try something like this:
select * from your_table
where id not in (
select distinct id
from your_table
where end_date > '2019-02-25'
)
I would use not exists:
select t.*
from t
where not exists (select 1
from t t2
where t2.id = t.id and t2.end_date = '2019-02-25'
);
I strongly advise using not exists over not in because it handles NULL values much more intuitively. NOT IN will return no rows at all if any value in the subquery is NULL.
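A small sketch of the NULL behaviour, using an in-memory SQLite database through Python's sqlite3 module. The table `t` and its rows are invented for illustration; one row deliberately has a NULL id on the excluded date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, end_date TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (1, "2019-02-23"),
        (1, "2019-02-25"),
        (2, "2019-02-20"),
        (None, "2019-02-25"),  # NULL id that ends up in the subquery result
    ],
)

# The subquery yields 1 and NULL, so "id NOT IN (...)" is never true.
not_in = conn.execute(
    "SELECT id FROM t WHERE id NOT IN "
    "(SELECT id FROM t WHERE end_date = '2019-02-25')"
).fetchall()

# NOT EXISTS compares row by row and is unaffected by the NULL.
not_exists = conn.execute(
    "SELECT t.id FROM t WHERE NOT EXISTS "
    "(SELECT 1 FROM t t2 WHERE t2.id = t.id AND t2.end_date = '2019-02-25')"
).fetchall()

print(not_in, not_exists)
```

Here NOT IN returns nothing, while NOT EXISTS keeps id 2. It also keeps the NULL-id row, because `t2.id = t.id` can never match a NULL.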

How do I sample the number of records in another table?

I have code where I'm sampling 50,000 random records. I.e.,
SELECT * FROM Table1
SAMPLE 50000;
That works. However, what I really want to do is sample the number of records that are in a different table. I.e.,
SELECT * FROM Table1
SAMPLE count(*) FROM Table2;
I get an error. What am I doing wrong?
This is not randomized the way SAMPLE is, so bear that in mind. There won't be an obvious pattern either; I believe the order is determined by disk location, but I'm not certain of that.
SELECT *
FROM Table1
QUALIFY ROW_NUMBER() OVER
( PARTITION BY 1
ORDER BY 1
) <=
( SELECT COUNT(*)
FROM Table2
);
A better way, which does randomize the rows:
SELECT TMP.* -- Or list the columns you want with "rnd"
FROM ( SELECT RANDOM(-10000000,10000000) rnd,
T1.*
FROM Table1 T1
) TMP
QUALIFY ROW_NUMBER() OVER
( ORDER BY rnd
) <=
( SELECT COUNT(*)
FROM Table2
);
SELECT TOP 50000 * FROM Table1 ORDER BY NEWID()
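The row-numbering idea can also be tried outside Teradata. The sketch below assumes SQLite 3.25 or later (for window functions) driven from Python's sqlite3 module; SQLite has no QUALIFY, so the filter moves to an outer WHERE, and the table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (id INTEGER)")
conn.execute("CREATE TABLE Table2 (id INTEGER)")
conn.executemany("INSERT INTO Table1 VALUES (?)", [(n,) for n in range(1000)])
conn.executemany("INSERT INTO Table2 VALUES (?)", [(n,) for n in range(25)])

# Number the rows of Table1 in random order, then keep as many
# as there are rows in Table2.
query = """
SELECT id
FROM (
    SELECT id, ROW_NUMBER() OVER (ORDER BY random()) AS rn
    FROM Table1
) numbered
WHERE rn <= (SELECT COUNT(*) FROM Table2)
"""
sample = [row[0] for row in conn.execute(query)]
print(len(sample))
```

Because ROW_NUMBER assigns each row a distinct number, the result has exactly as many rows as Table2 (or all of Table1 if it is smaller).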

What is the quickest way in Oracle SQL to find out if one or more duplicates exist in a table?

I'm looking to create a statement that stops and returns true the very second it finds a duplicate value on a column. I don't care what the value is and simply need to know whether a duplicate exists or not; nothing else.
I know I can write Select count(*) from myTable group by primary_id having count(*) > 1; but this goes through every single row of the table, whereas I want the query to stop as soon as it encounters a single duplicate.
The best attempt I've come up with is this:
select 1 as thingy from dual outer_qry
where exists
(
select * from
(
select some_ID,
case when COUNT(*) > 1 then 'X' else 'N' end as TRIG
from myTable
group by some_ID
)INNER_QRY
where INNER_QRY.trig = outer_qry.dummy
);
However this takes 13 seconds and I doubt it takes that long to find the first duplicate.
Can anyone suggest where my thinking is going wrong? My assumption is that EXISTS will be checked for every row returned by inner_qry, but this doesn't seem to be the case.
You would use exists. This returns all the duplicates:
select t.*
from mytable t
where exists (select 1
from mytable t2
where t2.some_id = t.some_id and t2.rowid <> t.rowid
);
In Oracle 12c and later, add fetch first 1 row only so the query stops at the first match. It can also take advantage of an index on mytable(some_id).
In earlier versions:
select 1 as HasDuplicate
from (select t.*
from mytable t
where exists (select 1
from mytable t2
where t2.some_id = t.some_id and t2.rowid <> t.rowid
)
) t
where rownum = 1;
If this returns no rows, then there are no duplicates.
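The same "stop at the first duplicate" pattern can be sketched in SQLite, which also exposes a rowid, using Python's sqlite3 module. LIMIT 1 stands in for Oracle's rownum = 1 here; the index name `idx_some_id` and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (some_id INTEGER)")
# An index on some_id lets the correlated lookup avoid a full scan per row.
conn.execute("CREATE INDEX idx_some_id ON mytable (some_id)")


def has_duplicate():
    """Return True as soon as one row shares some_id with a different row."""
    row = conn.execute(
        """
        SELECT 1
        FROM mytable t
        WHERE EXISTS (SELECT 1
                      FROM mytable t2
                      WHERE t2.some_id = t.some_id
                        AND t2.rowid <> t.rowid)
        LIMIT 1
        """
    ).fetchone()
    return row is not None


conn.executemany("INSERT INTO mytable VALUES (?)", [(1,), (2,), (3,)])
before = has_duplicate()
conn.execute("INSERT INTO mytable VALUES (2)")
after = has_duplicate()
print(before, after)
```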
select * from table1 t1 natural join table1 t2 where t1.rowid < t2.rowid;
You can use this to find out which ids are duplicated:
select some_ID
from myTable
group by some_ID having count(*) >1

Select 10 records after id=somevalue, (id>somevalue) and select first 10 records if id=somevalue doesn't exist

I have an SQL query that selects 10 records after id = somevalue, but I want to select the first 10 records if that record doesn't exist. The query has the structure below.
SELECT * FROM TABLE WHERE ID > x ORDER BY METRIC LIMIT 10
Note that id here is a varchar field, and the rows are sorted by another field.
This comes close to what you want:
SELECT *
FROM TABLE
ORDER BY (CASE WHEN ID > X THEN 1 ELSE 0 END) DESC,
METRIC
LIMIT 10
It will always return 10 records (assuming you have at least 10 records in the table). It will put the ones with id > x first. If there are not enough of those, then it will fill in with other records.
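To see the fill-in behaviour, here is a small sketch on an in-memory SQLite database via Python's sqlite3 module. The table `items`, its five rows, and the limit of 3 (instead of 10) are assumptions chosen to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT, metric INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("a", 5), ("b", 1), ("c", 4), ("d", 2), ("e", 3)],
)


def next_rows(x, limit=3):
    """Rows with id > x come first (by metric); other rows fill any gap."""
    return conn.execute(
        """
        SELECT id
        FROM items
        ORDER BY (CASE WHEN id > ? THEN 1 ELSE 0 END) DESC, metric
        LIMIT ?
        """,
        (x, limit),
    ).fetchall()


after_b = next_rows("b")  # three ids are greater than 'b'
after_z = next_rows("z")  # no id is greater than 'z'
print(after_b, after_z)
```

With "b" the result is the rows after it, ordered by metric; with "z" nothing qualifies, so the query falls back to the first rows by metric. Note that the switch depends on whether rows with id > x exist, not on whether x itself is present.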
This will also work:
SELECT TOP 10 col1, col2
FROM #yourtable
WHERE col1 > @ID
UNION ALL
SELECT TOP 10 col1, col2
FROM #yourtable
WHERE NOT EXISTS (SELECT * FROM #yourtable WHERE col1 = @ID)
However, this assumes you have an ID that you can query on using greater than/less than to retrieve the desired "next ten" records. Also, you would probably need to add an "ORDER BY" clause to ensure the records have the desired values.

Select DISTINCT, return entire row

I have a table with 10 columns.
I want to return all rows for which Col006 is distinct, but return all columns...
How can I do this?
if column 6 appears like this:
| Column 6 |
| item1 |
| item1 |
| item2 |
| item1 |
I want to return two rows, one of the records with item1 and the other with item2, along with all other columns.
In SQL Server 2005 and above:
;WITH q AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY col6 ORDER BY id) rn
FROM mytable
)
SELECT *
FROM q
WHERE rn = 1
In SQL Server 2000, provided that you have a primary key column:
SELECT mt.*
FROM (
SELECT DISTINCT col6
FROM mytable
) mto
JOIN mytable mt
ON mt.id =
(
SELECT TOP 1 id
FROM mytable mti
WHERE mti.col6 = mto.col6
-- ORDER BY
-- id
-- Uncomment the lines above if the order matters
)
Update:
Check your database version and compatibility level:
SELECT @@VERSION
SELECT COMPATIBILITY_LEVEL
FROM sys.databases
WHERE name = DB_NAME()
The keyword "DISTINCT" in SQL means "unique value". When applied to a column in a query, it returns as many rows as there are different values for that column. As a consequence it creates a grouped result set, and the values of the other columns are arbitrary unless they are defined by aggregate functions (such as max, min, avg, etc.).
If you meant to say you want to return all rows for which Col006 has a specific value, then use the "where Col006 = value" clause.
If you meant to say you want to return all rows for which Col006 is different from all other values of Col006, then you still need to specify what that value is => see above.
If you want to say that the value of Col006 can only be evaluated once all rows have been retrieved, then use the "having Col006 = value" clause. This has the same effect as the "where" clause, but "where" gets applied when rows are retrieved from the raw tables, whereas "having" is applied once all other calculations have been made (i.e. aggregation functions have been run etc.) and just before the result set is returned to the user.
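The difference in timing between "where" and "having" is easiest to see with an aggregate. This is a minimal sketch on an in-memory SQLite database via Python's sqlite3 module; the table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TheTable (Col006 TEXT, Col001 INTEGER)")
conn.executemany(
    "INSERT INTO TheTable VALUES (?, ?)",
    [("item1", 1), ("item1", 2), ("item2", 3), ("item1", 4)],
)

# WHERE filters individual rows before they are grouped.
where_rows = conn.execute(
    "SELECT Col006, COUNT(*) FROM TheTable "
    "WHERE Col001 > 1 GROUP BY Col006 ORDER BY Col006"
).fetchall()

# HAVING filters whole groups after the aggregates are computed.
having_rows = conn.execute(
    "SELECT Col006, COUNT(*) FROM TheTable "
    "GROUP BY Col006 HAVING COUNT(*) > 1 ORDER BY Col006"
).fetchall()

print(where_rows, having_rows)
```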
UPDATE:
After having seen your edit, I have to point out that if you use any of the other suggestions, you will end up with random values in all other 9 columns for the row that contains the value "item1" in Col006, due to the constraint further up in my post.
You can group on Col006 to get the distinct values, but then you have to decide what to do with the multiple records in each group.
You can use aggregates to pick a value from the records. Example:
select Col006, min(Col001), max(Col002)
from TheTable
group by Col006
order by Col006
If you want the values to come from a specific record in each group, you have to identify it somehow. Example of using Col002 to identify the record in each group:
select t.Col006, t.Col001, t.Col002
from TheTable t
inner join (
select Col006, min(Col002) as Col002
from TheTable
group by Col006
) x on t.Col006 = x.Col006 and t.Col002 = x.Col002
order by t.Col006
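A runnable sketch of this join-back approach, using an in-memory SQLite database through Python's sqlite3 module. The sample rows are assumptions; the derived column is given an alias and the outer columns are qualified so that the references are unambiguous.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TheTable (Col001 INTEGER, Col002 INTEGER, Col006 TEXT)"
)
conn.executemany(
    "INSERT INTO TheTable VALUES (?, ?, ?)",
    [(10, 1, "item1"), (20, 2, "item1"), (30, 3, "item2"), (40, 4, "item1")],
)

# For each Col006 value, keep the row holding the smallest Col002.
query = """
SELECT t.Col006, t.Col001, t.Col002
FROM TheTable t
INNER JOIN (
    SELECT Col006, MIN(Col002) AS Col002
    FROM TheTable
    GROUP BY Col006
) x ON t.Col006 = x.Col006 AND t.Col002 = x.Col002
ORDER BY t.Col006
"""
rows = conn.execute(query).fetchall()
print(rows)
```

If two rows in a group share the minimum Col002, both are returned, so the identifying column should be unique within each group.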
SELECT *
FROM (SELECT DISTINCT YourDistinctField FROM YourTable) AS A
CROSS APPLY
( SELECT TOP 1 * FROM YourTable B
WHERE B.YourDistinctField = A.YourDistinctField ) AS NewTableName
I tried the answers posted above with no luck... but this does the trick!
select * from yourTable where column6 in (select distinct column6 from yourTable);
SELECT *
FROM harvest
GROUP BY estimated_total;
You can use GROUP BY and MIN() to get a more specific result.
Let's say that id is the primary key.
We want all the DISTINCT values of a column, say estimated_total, along with one complete sample row for each distinct value. The following query should do the trick.
SELECT *, min(id)
FROM harvest
GROUP BY estimated_total;
create table #temp
(C1 TINYINT,
C2 TINYINT,
C3 TINYINT,
C4 TINYINT,
C5 TINYINT,
C6 TINYINT)
INSERT INTO #temp
SELECT 1,1,1,1,1,6
UNION ALL SELECT 1,1,1,1,1,6
UNION ALL SELECT 3,1,1,1,1,3
UNION ALL SELECT 4,2,1,1,1,6
SELECT * FROM #temp
SELECT *
FROM(
SELECT ROW_NUMBER() OVER (PARTITION BY C6 Order by C1) ID,* FROM #temp
)T
WHERE ID = 1
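The ROW_NUMBER approach can be checked with the same four sample rows. This sketch assumes SQLite 3.25 or later run through Python's sqlite3 module, and uses an ordinary table named `sample_rows` (a made-up name) in place of the SQL Server temp table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sample_rows "
    "(C1 INTEGER, C2 INTEGER, C3 INTEGER, C4 INTEGER, C5 INTEGER, C6 INTEGER)"
)
conn.executemany(
    "INSERT INTO sample_rows VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 1, 1, 6),
        (1, 1, 1, 1, 1, 6),
        (3, 1, 1, 1, 1, 3),
        (4, 2, 1, 1, 1, 6),
    ],
)

# Number the rows within each C6 value and keep only the first of each.
query = """
SELECT C1, C2, C3, C4, C5, C6
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY C6 ORDER BY C1) AS rn, *
    FROM sample_rows
) numbered
WHERE rn = 1
ORDER BY C6
"""
rows = conn.execute(query).fetchall()
print(rows)
```

One full row survives per distinct C6 value; the ORDER BY inside the window decides which row that is.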