Select random sample of N rows from Oracle SQL query result

Select random sample of N rows from Oracle SQL query result - sql

I want to reduce the number of rows exported from a query result. I have had no luck adapting the accepted solution posted on this thread.
My query looks as follows:
select
round((to_date('2019-12-31') - date_birth) / 365, 0) as age
from
personal_info a
where
exists
(
select person_id b from credit_info where credit_type = 'C' and a.person_id = b.person_id
)
;
This query returns way more rows than I need, so I was wondering if there's a way to use sample() to select a fixed number of rows (not a percentage) from however many rows result from this query.

You can sample your data by ordering randomly and then fetching first N rows.
DBMS_RANDOM.RANDOM
select round((to_date('2019-12-31') - date_birth) / 365, 0) as age
From personal_info a
where exists ( select person_id b from credit_info where credit_type = 'C' and a.person_id = b.person_id )
Order by DBMS_RANDOM.RANDOM
Fetch first 250 Rows
Edit: for oracle 11g and prior
Select * from (
select round((to_date('2019-12-31') - date_birth) / 365, 0) as age
From personal_info a
where exists ( select person_id b from credit_info where credit_type = 'C' and a.person_id = b.person_id )
Order by DBMS_RANDOM.RANDOM
)
Where rownum< 250

You can use fetch first to return a fixed number of rows. Just add:
fetch first 100 rows
to the end of your query.
If you want these sampled in some fashion, you need to explain what type of sampling you want.

If you are using 12C, you can use the row limiting clause below
select
round((to_date('2019-12-31') - date_birth) / 365, 0) as age
from
personal_info a
where
exists
(
select person_id b from credit_info where credit_type = 'C' and a.person_id = b.person_id
)
FETCH NEXT 5 ROWS ONLY;
Instead of 5, you can use any number you want.

Related

"ORA-00923: FROM keyword not found where expected\n what should I fix

I have an oracle query as follows but when I make changes to pagination the results are different. what should i pass for my code
SELECT *
FROM (
SELECT b.*,
ROWNUM r__
FROM (
select a.KODE_KLAIM,
a.NO_SS,
a.LA,
a.NAMA_TK,
a.KODE_K,
(
select tk.TEM_LAHIR
from KN.VW_KN_TK tk
where tk.KODE_K = a.KODE_K and rownum=1
) TEM_LAHIR,
(
select TO_CHAR(tk.TLAHIR, 'DD/MM/RRRR')
from KN.VW_KTK tk
where tk.KODE_K = a.KODE_K
and rownum=1
) TLAHIR
from PN.KLAIM a
where nvl(a.STATUS_BATAL,'X') = 'T'
and A.NOMOR IS NOT NULL
and A.TIPE_KLAIM = 'JPN01'
)b
)
where 1 = 1
WHERE ROWNUM < ( ( ? * ? ) + 1 )
WHERE r__ >= ( ( ( ? - 1 ) * ? ) + 1 )
but i run this query i have result ORA-00900: invalid SQL statement

You have three WHERE clauses at the end (and no ORDER BY clause). To make it syntactically valid you could change the second and third WHERE clauses to AND.
However, you mention pagination so what you probably want is to use:
SELECT *
FROM (
SELECT b.*,
ROWNUM r__
FROM (
select ...
from ...
ORDER BY something
)b
WHERE ROWNUM < :page_size * :page_number + 1
)
WHERE r__ >= ( :page_number - 1 ) * :page_size + 1
Note: You can replace the named bind variables with anonymous bind variables if you want.
Or, if you are using Oracle 12 or later then you can use the OFFSET x ROWS FETCH FIRST y ROWS ONLY syntax:
select ...
from ...
ORDER BY something
OFFSET (:page_number - 1) * :page_size ROWS
FETCH FIRST :page_size ROWS ONLY;
Additionally, you have several correlated sub-queries such as:
select tk.TEM_LAHIR
from KN.VW_KN_TK tk
where tk.KODE_K = a.KODE_K and rownum=1
This will find the first matching row that the SQL engine happens to read from the datafile and is effectively finding a random row. If you want a specific row then you need an ORDER BY clause and you need to filter using ROWNUM AFTER the ORDER BY clause has been applied.
From Oracle 12, the correlated sub-query would be:
select tk.TEM_LAHIR
from KN.VW_KN_TK tk
where tk.KODE_K = a.KODE_K
ORDER BY something
FETCH FIRST ROW ONLY

Compare the same table and fetch the satisfied results

I am trying to achieve the below requirement and need some help.
I created the below query,
SELECT * from
(
select b.extl_acct_nmbr, b.TRAN_DATE, b.tran_time,
case when (a.amount > b.amount) then b.amount
end as amount
,b.ivst_grup, b.grup_prod, b.pensionpymt
from ##pps a
join #pps b
on a.extl_acct_nmbr = b.extl_acct_nmbr
where a.pensionpymt <=2 and b.pensionpymt <=2) rslt
where rstl.amount is not null
Output I am getting,
Requirement is to get
The lowest amount row having same account number. (Completed and getting in the output)
In case both the amounts are same for same account (get the pensionpymt =1) (not sure how to get)
In case only one pensionpymt there add that too in the result set. (not sure how to get)
could you please help, expected output should be like this,

you can use window function:
select * from (
select * , row_number() over (partition by extl_acct_nmbr order by amount asc,pensionpymt) rn
from ##pps a
join #pps b
on a.extl_acct_nmbr = b.extl_acct_nmbr
) t
where rn = 1

ROW_NUMBER() Query Plan SORT Optimization

The query below accesses the Votes table that contains over 30 million rows. The result set is then selected from using WHERE n = 1. In the query plan, the SORT operation in the ROW_NUMBER() windowed function is 95% of the query's cost and it is taking over 6 minutes to complete execution.
I already have an index on same_voter, eid, country include vid, nid, sid, vote, time_stamp, new to cover the where clause.
Is the most efficient way to correct this to add an index on vid, nid, sid, new DESC, time_stamp DESC or is there an alternative to using the ROW_NUMBER() function for this to achieve the same results in a more efficient manner?
SELECT v.vid, v.nid, v.sid, v.vote, v.time_stamp, v.new, v.eid,
ROW_NUMBER() OVER (
PARTITION BY v.vid, v.nid, v.sid ORDER BY v.new DESC, v.time_stamp DESC) AS n
FROM dbo.Votes v
WHERE v.same_voter <> 1
AND v.eid <= #EId
AND v.eid > (#EId - 5)
AND v.country = #Country

One possible alternative to using ROW_NUMBER():
SELECT
V.vid,
V.nid,
V.sid,
V.vote,
V.time_stamp,
V.new,
V.eid
FROM
dbo.Votes V
LEFT OUTER JOIN dbo.Votes V2 ON
V2.vid = V.vid AND
V2.nid = V.nid AND
V2.sid = V.sid AND
V2.same_voter <> 1 AND
V2.eid <= #EId AND
V2.eid > (#EId - 5) AND
V2.country = #Country AND
(V2.new > V.new OR (V2.new = V.new AND V2.time_stamp > V.time_stamp))
WHERE
V.same_voter <> 1 AND
V.eid <= #EId AND
V.eid > (#EId - 5) AND
V.country = #Country AND
V2.vid IS NULL
The query basically says to get all rows matching your criteria, then join to any other rows that match the same criteria, but which would be ranked higher for the partition based on the new and time_stamp columns. If none are found then this must be the row that you want (it's ranked highest) and if none are found that means that V2.vid will be NULL. I'm assuming that vid otherwise can never be NULL. If it's a NULLable column in your table then you'll need to adjust that last line of the query.

PostgreSQL use case when result in where clause

I use complex CASE WHEN for selecting values. I would like to use this result in WHERE clause, but Postgres says column 'd' does not exists.
SELECT id, name, case when complex_with_subqueries_and_multiple_when END AS d
FROM table t WHERE d IS NOT NULL
LIMIT 100, OFFSET 100;
Then I thought I can use it like this:
select * from (
SELECT id, name, case when complex_with_subqueries_and_multiple_when END AS d
FROM table t
LIMIT 100, OFFSET 100) t
WHERE d IS NOT NULL;
But now I am not getting a 100 rows as result. Probably (I am not sure) I could use LIMIT and OFFSET outside select case statement (where WHERE statement is), but I think (I am not sure why) this would be a performance hit.
Case returns array or null. What is the best/fastest way to exclude some rows if result of case statement is null? I need 100 rows (or less if not exists - of course). I am using Postgres 9.4.
Edited:
SELECT count(*) OVER() AS count, t.id, t.size, t.price, t.location, t.user_id, p.city, t.price_type, ht.value as houses_type_value, ST_X(t.coordinates) as x, ST_Y(t.coordinates) AS y,
CASE WHEN t.classification='public' THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
WHEN t.classification='protected' THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
WHEN t.id IN (SELECT rl.table_id FROM table_private_list rl WHERE rl.owner_id=t.user_id AND rl.user_id=41026) THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
ELSE null
END AS main_image_description
FROM table t LEFT JOIN table_modes m ON m.id = t.mode_id
LEFT JOIN table_types y ON y.id = t.type_id
LEFT JOIN post_codes p ON p.id = t.post_code_id
LEFT JOIN table_houses_types ht on ht.id = t.houses_type_id
WHERE datetime_sold IS NULL AND datetime_deleted IS NULL AND t.published=true AND coordinates IS NOT NULL AND coordinates && ST_MakeEnvelope(17.831490030182, 44.404640972306, 12.151558389557, 47.837396630872) AND main_image_description IS NOT NULL
GROUP BY t.id, m.value, y.value, p.city, ht.value ORDER BY t.id LIMIT 100 OFFSET 0

To use the CASE WHEN result in the WHERE clause you need to wrap it up in a subquery like you did, or in a view.
SELECT * FROM (
SELECT id, name, CASE
WHEN name = 'foo' THEN true
WHEN name = 'bar' THEN false
ELSE NULL
END AS c
FROM case_in_where
) t WHERE c IS NOT NULL
With a table containing 1, 'foo', 2, 'bar', 3, 'baz' this will return records 1 & 2. I don't know how long this SQL Fiddle will persist, but here is an example: http://sqlfiddle.com/#!15/1d3b4/3 . Also see https://stackoverflow.com/a/7950920/101151
Your limit is returning less than 100 rows if those 100 rows starting at offset 100 contain records for which d evaluates to NULL. I don't know how to limit the subselect without including your limiting logic (your case statements) re-written to work inside the where clause.
WHERE ... AND (
t.classification='public' OR t.classification='protected'
OR t.id IN (SELECT rl.table_id ... rl.user_id=41026))
The way you write it will be different and it may be annoying to keep the CASE logic in sync with the WHERE limiting statements, but it would allow your limits to work only on matching data.

#1222 - The used SELECT statements have a different number of columns

Why am i getting a #1222 - The used SELECT statements have a different number of columns
? i am trying to load wall posts from this users friends and his self.
SELECT u.id AS pid, b2.id AS id, b2.message AS message, b2.date AS date FROM
(
(
SELECT b.id AS id, b.pid AS pid, b.message AS message, b.date AS date FROM
wall_posts AS b
JOIN Friends AS f ON f.id = b.pid
WHERE f.buddy_id = '1' AND f.status = 'b'
ORDER BY date DESC
LIMIT 0, 10
)
UNION
(
SELECT * FROM
wall_posts
WHERE pid = '1'
ORDER BY date DESC
LIMIT 0, 10
)
ORDER BY date DESC
LIMIT 0, 10
) AS b2
JOIN Users AS u
ON b2.pid = u.id
WHERE u.banned='0' AND u.email_activated='1'
ORDER BY date DESC
LIMIT 0, 10
The wall_posts table structure looks like id date privacy pid uid message
The Friends table structure looks like Fid id buddy_id invite_up_date status
pid stands for profile id. I am not really sure whats going on.

The first statement in the UNION returns four columns:
SELECT b.id AS id,
b.pid AS pid,
b.message AS message,
b.date AS date
FROM wall_posts AS b
The second one returns six, because the * expands to include all the columns from WALL_POSTS:
SELECT b.id,
b.date,
b.privacy,
b.pid.
b.uid message
FROM wall_posts AS b
The UNION and UNION ALL operators require that:
The same number of columns exist in all the statements that make up the UNION'd query
The data types have to match at each position/column
Use:
FROM ((SELECT b.id AS id,
b.pid AS pid,
b.message AS message,
b.date AS date
FROM wall_posts AS b
JOIN Friends AS f ON f.id = b.pid
WHERE f.buddy_id = '1' AND f.status = 'b'
ORDER BY date DESC
LIMIT 0, 10)
UNION
(SELECT id,
pid,
message,
date
FROM wall_posts
WHERE pid = '1'
ORDER BY date DESC
LIMIT 0, 10))

You're taking the UNION of a 4-column relation (id, pid, message, and date) with a 6-column relation (* = the 6 columns of wall_posts). SQL doesn't let you do that.

(
SELECT b.id AS id, b.pid AS pid, b.message AS message, b.date AS date FROM
wall_posts AS b
JOIN Friends AS f ON f.id = b.pid
WHERE f.buddy_id = '1' AND f.status = 'b'
ORDER BY date DESC
LIMIT 0, 10
)
UNION
(
SELECT id, pid , message , date
FROM
wall_posts
WHERE pid = '1'
ORDER BY date DESC
LIMIT 0, 10
)
You were selecting 4 in the first query and 6 in the second, so match them up.

Beside from the answer given by #omg-ponies; I just want to add that this error also occur in variable assignment. In my case I used an insert; associated with that insert was a trigger. I mistakenly assign different number of fields to different number of variables. Below is my case details.
INSERT INTO tab1 (event, eventTypeID, fromDate, toDate, remarks)
-> SELECT event, eventTypeID,
-> fromDate, toDate, remarks FROM rrp group by trainingCode;
ERROR 1222 (21000): The used SELECT statements have a different number of columns
So you see I got this error by issuing an insert statement instead of union statement. My case difference were
I issued a bulk insert sql
i.e. insert into tab1 (field, ...) as select field, ... from tab2
tab2 had an on insert trigger; this trigger basically decline duplicates
It turns out that I had an error in the trigger. I fetch record based on new input data and assigned them in incorrect number of variables.
DELIMITER ##
DROP TRIGGER trgInsertTrigger ##
CREATE TRIGGER trgInsertTrigger
BEFORE INSERT ON training
FOR EACH ROW
BEGIN
SET #recs = 0;
SET #trgID = 0;
SET #trgDescID = 0;
SET #trgDesc = '';
SET #district = '';
SET #msg = '';
SELECT COUNT(*), t.trainingID, td.trgDescID, td.trgDescName, t.trgDistrictID
INTO #recs, #trgID, #trgDescID, #proj, #trgDesc, #district
from training as t
left join trainingDistrict as tdist on t.trainingID = tdist.trainingID
left join trgDesc as td on t.trgDescID = td.trgDescID
WHERE
t.trgDescID = NEW.trgDescID
AND t.venue = NEW.venue
AND t.fromDate = NEW.fromDate
AND t.toDate = NEW.toDate
AND t.gender = NEW.gender
AND t.totalParticipants = NEW.totalParticipants
AND t.districtIDs = NEW.districtIDs;
IF #recs > 0 THEN
SET #msg = CONCAT('Error: Duplicate Training: previous ID ', CAST(#trgID AS CHAR CHARACTER SET utf8) COLLATE utf8_bin);
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = #msg;
END IF;
END ##
DELIMITER ;
As you can see i am fetching 5 fields but assigning them in 6 var. (My fault totally I forgot to delete the variable after editing.

You are using MySQL Union.
UNION is used to combine the result from multiple SELECT statements into a single result set.
The column names from the first SELECT statement are used as the column names for the results returned. Selected columns listed in corresponding positions of each SELECT statement should have the same data type. (For example, the first column selected by the first statement should have the same type as the first column selected by the other statements.)
Reference: MySQL Union
Your first select statement has 4 columns and second statement has 6 as you said wall_post has 6 column.
You should have same number of column and also in same order in both statement.
otherwise it shows error or wrong data.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Select random sample of N rows from Oracle SQL query result - sql

You can use fetch first to return a fixed number of rows. Just add: fetch first 100 rows to the end of your query. If you want these sampled in some fashion, you need to explain what type of sampling you want.

Related

"ORA-00923: FROM keyword not found where expected\n what should I fix

Compare the same table and fetch the satisfied results

ROW_NUMBER() Query Plan SORT Optimization

PostgreSQL use case when result in where clause

#1222 - The used SELECT statements have a different number of columns

Categories

Resources