Selecting a single (random) row for an SQL join

I've got an SQL query that selects data from several tables, but I only want to match a single (randomly selected) row from another table.
Easier to show some code, I guess ;)
Table K is (k_id, selected)
Table C is (c_id, image)
Table S is (c_id, date)
Table M is (c_id, k_id, score)
All ID-columns are primary keys, with appropriate FK constraints.
What I want, in English, is this: for each row in K that has selected = 1, get a random row from C where there exists a row in M with that (k_id, c_id) pair, where the score is higher than a given value, where c.image is not null, and where there is a row in S with that c_id.
Something like:
select k.k_id, c.c_id, m.score
from k,c,m,s
where k.selected = 1
and m.score > some_value
and m.k_id = k.k_id
and m.c_id = c.c_id
and c.image is not null
and s.c_id = c.c_id;
The only problem is this returns all the rows in C that match the criteria - I only want one...
I can see how to do it using PL/SQL to select all relevant rows into a collection and then select a random one, but I'm stuck as to how to select a random one.
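For what it's worth, picking a random element from such a collection would look roughly like this; a sketch only, treating some_value as the question's placeholder threshold:
DECLARE
  TYPE t_ids IS TABLE OF c.c_id%TYPE;
  l_ids      t_ids;
  l_pick     PLS_INTEGER;
  some_value NUMBER := 0;  -- placeholder threshold from the question
BEGIN
  SELECT c.c_id BULK COLLECT INTO l_ids
  FROM k, c, m, s
  WHERE k.selected = 1
  AND m.score > some_value
  AND m.k_id = k.k_id
  AND m.c_id = c.c_id
  AND c.image IS NOT NULL
  AND s.c_id = c.c_id;

  IF l_ids.COUNT > 0 THEN
    -- dbms_random.value(low, high) returns a value in [low, high)
    l_pick := TRUNC(dbms_random.value(1, l_ids.COUNT + 1));
    dbms_output.put_line('random c_id: ' || l_ids(l_pick));
  END IF;
END;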

You can add an 'ORDER BY dbms_random.value' to your query.
i.e.:
SELECT column FROM
(
SELECT column FROM table
ORDER BY dbms_random.value
)
WHERE rownum = 1
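Applied to the question's tables (assuming a single random row overall is wanted, rather than one per k_id), that pattern might look like:
SELECT k_id, c_id, score
FROM (SELECT k.k_id, c.c_id, m.score
      FROM k, c, m, s
      WHERE k.selected = 1
      AND m.score > some_value
      AND m.k_id = k.k_id
      AND m.c_id = c.c_id
      AND c.image IS NOT NULL
      AND s.c_id = c.c_id
      ORDER BY dbms_random.value)
WHERE rownum = 1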
References:
http://awads.net/wp/2005/08/09/order-by-no-order/
http://www.petefreitag.com/item/466.cfm

With analytic functions:
SELECT k_id, c_id, score
FROM (SELECT k.k_id, c.c_id, m.score,
             row_number() over(PARTITION BY k.k_id ORDER BY NULL) rk
      FROM k, c, m, s
      WHERE k.selected = 1
      AND m.score > some_value
      AND m.k_id = k.k_id
      AND m.c_id = c.c_id
      AND c.image IS NOT NULL
      AND s.c_id = c.c_id)
WHERE rk = 1
This will select one row per k_id that satisfies your criteria. It will likely return the same set of rows each time you run the query. If you want more randomness (each run producing a different set of rows), replace ORDER BY NULL with ORDER BY dbms_random.value.
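For instance, the randomized variant would only change the ORDER BY inside row_number():
SELECT k_id, c_id, score
FROM (SELECT k.k_id, c.c_id, m.score,
             row_number() over(PARTITION BY k.k_id ORDER BY dbms_random.value) rk
      FROM k, c, m, s
      WHERE k.selected = 1
      AND m.score > some_value
      AND m.k_id = k.k_id
      AND m.c_id = c.c_id
      AND c.image IS NOT NULL
      AND s.c_id = c.c_id)
WHERE rk = 1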

I'm not too familiar with Oracle SQL, but try using LIMIT with random(), if such a function is available.

Related

How to deselect duplicate entries in a query?

I've got a query like this:
SELECT *
FROM RecipeTable, RecipeIngredientTable, SyncRecipeIngredientTable
WHERE RecipeTable.recipe_id = SyncRecipeIngredientTable.recipe_id
AND RecipeIngredientTable.recipe_ingredient_id =
SyncRecipeIngredientTable.recipe_ingredient_id
AND RecipeIngredientTable.recipe_item_name in ("ayva", "pirinç", "su")
GROUP by RecipeTable.recipe_id
HAVING COUNT(*) >= 3;
and this query returns the result like this:
As you can see in the image, there are 3 duplicate, unnecessary entries (no, I can't delete them because of the multiple foreign keys). How can I deselect these duplicate entries from the query result? In the end I want to return 6 entries, not 9.
What you want to eliminate in the result set is not duplication of recipe_id values but of recipe_name values.
You just need to group (partition) by recipe_name using the ROW_NUMBER() analytic function:
SELECT recipe_id, author_name ...
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY recipe_name) AS rn,
sr.recipe_id, author_name ...
FROM SyncRecipeIngredientTable sr
JOIN RecipeIngredientTable ri
ON ri.recipe_ingredient_id = sr.recipe_ingredient_id
JOIN RecipeTable rt
ON rt.recipe_id = sr.recipe_id
WHERE ri.recipe_item_name in ("ayva", "pirinç", "su")
)
WHERE rn = 1
This way, you pick only one of the records in each group (those with rn = 1). An ORDER BY clause can be added to the analytic function after the PARTITION BY clause if a specific record needs to be picked, as sketched below.
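For example, if the row with the smallest recipe_id per recipe_name should be the one kept, the analytic call could become (a sketch; any deterministic column works as the tie-breaker):
ROW_NUMBER() OVER (PARTITION BY recipe_name ORDER BY recipe_id) AS rn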

SQL query to join the same table multiple times

I have a scenario where I need to join the same table multiple times to get the desired output. For example, I have two tables, TABLE A and TABLE B.
Step 1: I want to take all the parties from TABLE A which have the lowest idate. The lowest idate is determined per partyid using the idate column.
Step 2: Then, based on the CID fetched from TABLE A in step 1, we need to fetch the corresponding MID from TABLE B which has MIDTYPE = 130300.
Step 3: Then, based on the MID fetched in step 2, we need to traverse the same table and find the latest record for that MID based on idate in TABLE B, and fetch the corresponding CID for that MID.
Step 4: Now, for that CID, we need to fetch the MID value for MIDTYPE 130307 in the same table (TABLEB). My final output should be the combination of the MID fetched in step 3 and the MID fetched for 130307 in step 4.
I wrote a query like this, but it takes a lot of time to run because we go through the same table (TABLEB) multiple times and TABLEB has millions of rows. Is there any way to rewrite this query differently? Could someone help me with this?
SELECT
ident.mid mid1,
b.mid mid2
FROM
(
SELECT
*
FROM
tableb
WHERE
midtype = 130307
) ident
INNER JOIN (
SELECT
s.cid,
s.mid,
s.midtype
FROM
(
SELECT
cid,
partyid,
admin_sys_tp_cd,
mid,
ilast
FROM
(
SELECT
cq.cid,
RANK() OVER(
PARTITION BY cq.partyid
ORDER BY
cq.idate ASC
) rnk,
cq.idate,
cq.partyid,
i.mid,
i.idate AS ilast
FROM
tablea cq
INNER JOIN tableb i ON cq.cid = i.cid
INNER JOIN tablec c ON i.cid = c.cid
WHERE
i.midtype = 130300
)
WHERE
rnk = 1
) a
INNER JOIN (
SELECT
*
FROM
(
SELECT
cid,
mid,
midtype,
RANK() OVER(
PARTITION BY mid
ORDER BY
idate DESC
) rnk_mpid
FROM
tableb
)
WHERE
rnk_mpid = 1
) s ON a.mid = s.mid
AND s.midtype = 130300
) b ON ident.cid = b.cid
AND ident.midtype = 130307
Not what you asked, but before others and I spend time trying out different approaches for you, let's make sure the basics are covered.
No matter how differently you write an SQL query, it will never perform fast against a million-row base table if you don't have the proper indexes for it, especially in your case, as you have to access that table at least 3 times.
Just by looking at your detailed steps, I would say that you should have at least 3 different indexes created to support this query:
TableA_Index1 (PARTYID, IDATE) INCLUDE (CID)
TableB_Index1 (CID, MIDTYPE) INCLUDE (MID)
TableB_Index2 (MID, IDATE) INCLUDE (CID)
Do you have them ?
Have you ever tried to run this query on db2-advisor (db2advis) to get recommended indexes for it ?
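As a rough sketch only (plain composite indexes, with the INCLUDE columns from the suggestions above simply folded into the key so the statements stay portable across platforms):
CREATE INDEX tablea_index1 ON tablea (partyid, idate, cid);
CREATE INDEX tableb_index1 ON tableb (cid, midtype, mid);
CREATE INDEX tableb_index2 ON tableb (mid, idate, cid);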

Get top 1 row for every ID

There are a few posts about it but I can't make it work...
I just want to select one row per ID, something like ROW_NUMBER() OVER (PARTITION BY ...) in Oracle, but in Access.
Thanks.
SELECT a.*
FROM DATA as a
WHERE a.a_sku = (SELECT top 1 b.a_sku
FROM DATA as b
WHERE a.a_sku = b.a_sku)
but I get the same table DATA back out of it.
Sample of table DATA
https://ibb.co/X4492fY
You should try the query below:
SELECT a.*
FROM DATA as a
WHERE a.Active = (SELECT MAX(b.Active)
                  FROM DATA as b
                  WHERE a.a_sku = b.a_sku)
If you don't care which record within each group of records with matching a_sku values is returned, you can use the First or Last functions, e.g.:
select t.a_sku, first(t.field2), first(t.field3), ..., first(t.fieldN)
from data t
group by t.a_sku
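If you do care which row comes back, another Access-friendly pattern is a correlated subquery on a unique column; a sketch assuming the table has a unique ID column (hypothetical name):
SELECT a.*
FROM DATA AS a
WHERE a.ID = (SELECT MIN(b.ID)
              FROM DATA AS b
              WHERE b.a_sku = a.a_sku);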

Recursive calculation in SQL (Oracle)

I'm having a tough time finding a solution to ETL some data into my resulting table. I think I cannot accomplish this using pure SQL and need to use PL/SQL due to the looping. Could the SQL gurus point me in the right direction or provide some pointers to solve this problem?
Here's the scenario:
Tables: TABLEA and TABLEB.
Steps:
Group records in TABLEA by A_CD and SUM the A_AMT field. (Let's assume A_FLAG is always the same for any A_CD.) Let's call the grouped result set TABLEA_GRP (this is not a table, it is a grouped query).
Pick a row from TABLEB; if B_FLAG is 'N' then pick all rows in TABLEA_GRP where A_FLAG is 'N'. If B_FLAG is 'Y' then pick all rows in TABLEA_GRP.
Starting with the first record of the rows picked in step 2, calculate the ratio of its TOTAL_AMT to the SUM of all TOTAL_AMT for the selected rows. Multiply the ratio by B_AMT and add the resulting amount to the row's TOTAL_AMT, storing it in RESULTING_AMT. Repeat this calculation for all rows picked in step 2 (see the expression sketched below).
Repeat steps 2 and 3, now using as the starting TOTAL_AMT value the RESULTING_AMT value from the previous calculation for the same A_CD.
The RESULTING_RATIO field does not need to be saved; it is just given for demo purposes. How would you do this?
Basically I want to get data in RESULTING_TABLE from TABLEA and TABLEB
Could anyone help? Thanks a lot in advance for any guidance.
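For clarity, the per-row calculation in steps 3 and 4 boils down to the expression below; a sketch using the question's column names, where picked_rows stands for whatever row set step 2 produced (the recursion in step 4 then feeds RESULTING_AMT back in as the next TOTAL_AMT):
SELECT a_cd,
       total_amt,
       total_amt / SUM(total_amt) OVER () AS resulting_ratio,
       total_amt + (total_amt / SUM(total_amt) OVER ()) * b_amt AS resulting_amt
FROM picked_rows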
EDIT: I added A_DATE and B_DATE to support the join between the two tables. For simplicity you can just do A.A_DATE = B.B_DATE, for example this basic join:
SELECT
A.A_CD,
SUM(A.A_AMT) AS TOTAL_AMT,
A.A_FLAG,
A.A_DATE,
B.B_ID,
B.B_AMT,
B.B_FLAG
FROM
TABLEA A
JOIN TABLEB B
ON A.A_DATE = B.B_DATE
GROUP BY
A.A_CD,
A.A_FLAG,
A.A_DATE,
B.B_ID,
B.B_AMT,
B.B_FLAG
;
Okay, I think I've got the solution. The numbers are a bit different from yours, but I'm fairly sure mine is doing what you want. We can do everything in steps 1 & 2 using a single query (main_sql). Steps 3 and 4 have to be done using a recursive statement (recur_sql).
with main_sql as (
select a.*,
b.*,
sum(a_amt) over (partition by b_id) as cd_amt,
rank() over (partition by a_cd order by b_id) as rnk
from (select a_cd, a_flag, sum(a_amt) as a_amt
from tablea
group by a_cd, a_flag) a,
tableb b
where a.a_flag = case when b.b_flag = 'Y' then a.a_flag else b.b_flag end
order by b_id, a_cd
),
recur_sql (a_cd, b_id, total_amt, cd_amt, resulting_ratio, resulting_amt, rnk) as (
select m.a_cd,
m.b_id,
m.a_amt as total_amt,
m.cd_amt, m.a_amt / m.cd_amt as resulting_ratio,
m.a_amt + (m.a_amt / m.cd_amt * m.b_amt) as resulting_amt,
rnk
from main_sql m
where rnk = 1
union all
select m.a_cd,
m.b_id,
r.resulting_amt as total_amt,
m.cd_amt,
r.resulting_amt / m.cd_amt as resulting_ratio,
r.resulting_amt + (r.resulting_amt / m.cd_amt * m.b_amt) as resulting_amt,
m.rnk
from recur_sql r,
main_sql m
where m.rnk > 1
and r.a_cd = m.a_cd
and m.rnk - 1 = r.rnk
)
select a_cd, b_id, total_amt, resulting_ratio, resulting_amt
from recur_sql
order by 2, 1

How can I select adjacent rows to an arbitrary row (in sql or postgresql)?

I want to select some rows based on certain criteria, and then take one entry from that set and the 5 rows before it and after it.
Now, I can do this numerically if there is a primary key on the table, (e.g. primary keys that are numerically 5 less than the target row's key and 5 more than the target row's key).
So select the row with the primary key of 7 and the nearby rows:
select primary_key from table where primary_key > (7-5) order by primary_key limit 11;
2
3
4
5
6
-=7=-
8
9
10
11
12
But if I select only certain rows to begin with, I lose that numeric method of using primary keys (and that was assuming the keys didn't have any gaps in their order anyway), and need another way to get the closest rows before and after a certain targeted row.
The primary key output of such a select might look more random and thus less susceptible to mathematical locating (since some results would be filtered out, e.g. with a where active=1):
select primary_key from table where primary_key > (34-5)
and active=1 order by primary_key limit 11;
30
-=34=-
80
83
100
113
125
126
127
128
129
Note how, due to the gaps in the primary keys caused by the example where condition (for example because there are many inactive items), I'm no longer getting the closest 5 above and 5 below; instead I'm getting the closest 1 below and the closest 9 above.
There's a lot of ways to do it if you run two queries with a programming language, but here's one way to do it in one SQL query:
(SELECT * FROM table WHERE id >= 34 AND active = 1 ORDER BY id ASC LIMIT 6)
UNION
(SELECT * FROM table WHERE id < 34 AND active = 1 ORDER BY id DESC LIMIT 5)
ORDER BY id ASC
This would return the 5 rows above, the target row, and 5 rows below.
Here's another way to do it with the analytic functions lead and lag. It would be nice if we could use analytic functions in the WHERE clause, but we can't, so instead you need to use subqueries or CTEs. Here's an example that will work with the pagila sample database.
WITH base AS (
SELECT lag(customer_id, 5) OVER (ORDER BY customer_id) lag,
lead(customer_id, 5) OVER (ORDER BY customer_id) lead,
c.*
FROM customer c
WHERE c.active = 1
AND c.last_name LIKE 'B%'
)
SELECT base.* FROM base
JOIN (
-- Select the center row, coalesce so it still works if there aren't
-- 5 rows in front or behind
SELECT COALESCE(lag, 0) AS lag, COALESCE(lead, 99999) AS lead
FROM base WHERE customer_id = 280
) sub ON base.customer_id BETWEEN sub.lag AND sub.lead
The problem with sgriffinusa's solution is that you don't know which row_number your center row will end up being. He assumed it will be row 30.
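One way around that is to compute the center row's row number first and filter a window around it; a sketch (not tested) using the question's columns and its example target id 34, with the_table standing in for the question's table:
WITH numbered AS (
  SELECT ROW_NUMBER() OVER (ORDER BY primary_key) AS r, t.*
  FROM the_table t
  WHERE active = 1
)
SELECT n.*
FROM numbered n
JOIN (SELECT r AS center FROM numbered WHERE primary_key = 34) c
  ON n.r BETWEEN c.center - 5 AND c.center + 5
ORDER BY n.r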
For a similar query I use analytic functions without a CTE. Something like:
select ...,
LEAD(gm.id) OVER (ORDER BY Cit DESC) as leadId,
LEAD(gm.id, 2) OVER (ORDER BY Cit DESC) as leadId2,
LAG(gm.id) OVER (ORDER BY Cit DESC) as lagId,
LAG(gm.id, 2) OVER (ORDER BY Cit DESC) as lagId2
...
where id = 25912
or leadId = 25912 or leadId2 = 25912
or lagId = 25912 or lagId2 = 25912
Such a query works faster for me than the CTE with a join (answer from Scott Bailey), but of course it's less elegant.
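Note that the leadId/lagId aliases are window-function results, so they can't be referenced in the WHERE clause of the same query; in practice the snippet needs to be wrapped in a subquery. A minimal sketch keeping the snippet's names, with gm assumed to be the table:
SELECT *
FROM (SELECT gm.id,
             LEAD(gm.id)    OVER (ORDER BY Cit DESC) AS leadId,
             LEAD(gm.id, 2) OVER (ORDER BY Cit DESC) AS leadId2,
             LAG(gm.id)     OVER (ORDER BY Cit DESC) AS lagId,
             LAG(gm.id, 2)  OVER (ORDER BY Cit DESC) AS lagId2
      FROM gm) t
WHERE id = 25912
   OR leadId = 25912 OR leadId2 = 25912
   OR lagId = 25912 OR lagId2 = 25912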
You could do this utilizing row_number() (available as of 8.4). This may not be the correct syntax (I'm not familiar with PostgreSQL), but hopefully the idea will be illustrated:
SELECT *
FROM (SELECT ROW_NUMBER() OVER (ORDER BY primary_key) AS r, *
FROM table
WHERE active=1) t
WHERE 25 < r and r < 35
This will generate a first column having sequential numbers. You can use this to identify the single row and the rows above and below it.
If you wanted to do it in a 'relationally pure' way, you could write a query that sorted and numbered the rows. Like:
select (
select count(*) from employees b
where b.name < a.name
) as idx, name
from employees a
order by name
Then use that as a common table expression. Write a select which filters it down to the rows you're interested in, then join it back onto itself using a criterion that the index of the right-hand copy of the table is no more than k larger or smaller than the index of the row on the left. Project over just the rows on the right. Like:
with numbered_emps as (
select (
select count(*)
from employees b
where b.name < a.name
) as idx, name
from employees a
order by name
)
select b.*
from numbered_emps a, numbered_emps b
where a.name like '% Smith' -- this is your main selection criterion
and ((b.idx - a.idx) between -5 and 5) -- this is your adjacency fuzzy-join criterion
What could be simpler!
I'd imagine the row-number based solutions will be faster, though.