Efficiently selecting a subset from a bigger selection in SQL


I have a monstrous query in Oracle SQL. My problem is that I need to take the % of two related queries.
So, what I'm doing is:
SELECT type*100/decode(total, 0, 1, total) as result
from (SELECT
(select count(*) from tb1, tb2, tb3
where tb1.fieldA = tb2.fieldB
and tb2.fieldC = tb3.fieldD
and tb3.fieldE = 'Some stuf') as type,
(select count(*) from tb1, tb2, tb3
where tb1.fieldA = tb2.fieldB
and tb2.fieldC = tb3.fieldD) as total from dual) auxTable;
As you can see, the value I call type is a subset of total. This is a simplified example of a much bigger problem.
Is there an efficient way of selecting the subset (type) from the total and then getting the percentage?

Yes, there is a more efficient way of doing this:
SELECT type*100/DECODE(total, 0, 1, total) FROM (
SELECT COUNT(*) AS total, SUM(DECODE(tb3.fieldE, 'Some stuf', 1, 0)) AS type
FROM tb1, tb2, tb3
WHERE tb1.fieldA = tb2.fieldB
AND tb2.fieldC = tb3.fieldD
);
SUM(DECODE(tb3.fieldE, 'Some stuf', 1, 0)) counts the records for which tb3.fieldE = 'Some stuf'. Alternatively, you can use:
COUNT(CASE WHEN tb3.fieldE = 'Some stuf' THEN 1 END) AS type
The CASE returns NULL when fieldE is not the chosen value, and COUNT() does not count NULLs.
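As a rough illustration of the single-pass approach, here is a minimal sketch using Python's sqlite3 module instead of Oracle. The table `joined` is a hypothetical stand-in for the result of joining tb1, tb2 and tb3, and the sample rows are made up.

```python
import sqlite3

# Hypothetical, flattened stand-in for the tb1/tb2/tb3 join result.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE joined (fieldE TEXT)")
conn.executemany(
    "INSERT INTO joined (fieldE) VALUES (?)",
    [("Some stuf",), ("Some stuf",), ("Other",), (None,)],
)

# One scan yields both counts: CASE gives NULL for non-matching rows,
# and COUNT(expression) skips NULLs.
total, matching = conn.execute(
    """
    SELECT COUNT(*),
           COUNT(CASE WHEN fieldE = 'Some stuf' THEN 1 END)
    FROM joined
    """
).fetchone()

# Guard against division by zero, mirroring DECODE(total, 0, 1, total).
percentage = matching * 100 / (total if total else 1)
print(total, matching, percentage)
```

The join is evaluated once instead of twice, which is where the saving comes from.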

Related

Using COUNTIF inside aggregation function

I'm making a betting system app. I need to count the points of a user based on his bets, knowing that some of the bets can be 'combined', which makes the calculation a bit more complex than a simple addition.
So if I have 3 bets: {points: 3, combined: false}, {points: 5, combined: true}, {points: 10, combined: true}, there are two combined bets here, so the total points should be 3 + (5 * 2) + (10 * 2). Reality is a bit more complex, since the points are not directly in the bet object but in the match it refers to.
Here is part of my query. As you can see, I first check whether the bet is right based on the match result. In that case, if the bet is combined I multiply it by the value of combinedLength; otherwise I just sum the value of that bet. I tried to replicate the COUNTIF inside the CASE, which gave me an error like 'cannot aggregation inside aggregation'.
SELECT
JSON_EXTRACT_SCALAR(data, '$.userId') AS userId,
COUNTIF(JSON_EXTRACT_SCALAR(data, '$.combined') = 'true') AS combinedLength,
SUM (
(
CASE WHEN JSON_EXTRACT_SCALAR(data, '$.value') = match.result
THEN IF(JSON_EXTRACT_SCALAR(data, '$.combined') = "true", match.odd * combinedLength, match.odd)
ELSE 0
END
)
) AS totalScore,
FROM data.user_bets_raw_latest
INNER JOIN matchLines ON matchLines.match.matchId = JSON_EXTRACT(data, '$.fixtureId')
GROUP BY userId
I've been looking for days... thanks so much for the help !

If I follow you correctly, then you need to count the total number of combined bets per user in a subquery, using a window function. Then, you can use this information while aggregating.
Consider:
select
userid,
sum(case when combined = 'true' then odd * cnt_combined else odd end) total_score
from (
select
b.*,
m.match.odd,
countif(b.combined = 'true') over(partition by b.userid) as cnt_combined
from (
select
json_extract_scalar(data, '$.userId') userid,
json_extract_scalar(data, '$.combined') combined,
json_extract_scalar(data, '$.value') value,
json_extract_scalar(data, '$.fixtureId') fixtureid
from data.user_bets_raw_latest
) b
left join matchLines m
on m.match.matchId = b.fixtureid
and m.match.result = b.value
) t
group by userid
I find that it is simpler to use a left join and put the condition on the match result in there.
I moved the json extractions to a subquery to reduce the length of the query.
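To make the "window count first, aggregate second" idea concrete, here is a small sketch with Python's sqlite3 module. SQLite has no COUNTIF, so the sketch swaps in SUM(CASE ...) OVER (...), and it needs SQLite 3.25 or later for window functions. The `bets` table and its rows are hypothetical: they reproduce the three-bet example from the question and assume every bet was already matched to a correct result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per winning bet, odd already joined in.
conn.execute("CREATE TABLE bets (userid TEXT, combined TEXT, odd REAL)")
conn.executemany(
    "INSERT INTO bets VALUES (?, ?, ?)",
    [
        ("u1", "false", 3),
        ("u1", "true", 5),
        ("u1", "true", 10),
        ("u2", "false", 4),
    ],
)

rows = conn.execute(
    """
    SELECT userid,
           SUM(CASE WHEN combined = 'true'
                    THEN odd * cnt_combined
                    ELSE odd END) AS total_score
    FROM (
        -- The window function attaches the per-user combined count to every
        -- row, so the outer SUM never nests one aggregate inside another.
        SELECT userid, combined, odd,
               SUM(CASE WHEN combined = 'true' THEN 1 ELSE 0 END)
                   OVER (PARTITION BY userid) AS cnt_combined
        FROM bets
    ) AS t
    GROUP BY userid
    ORDER BY userid
    """
).fetchall()
print(rows)
```

For user u1 this evaluates 3 + (5 * 2) + (10 * 2), the total described in the question.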

select subquery using data from the select statement?

I have two tables, headers and lines. I need to grab the batch_submission_date from the header table, but sometimes a query for batch_id will return a null for batch_submission_date, but will also return a parent_batch_id, and if we query THAT parent_batch_id as a batch_id, it will then return the correct batch_submission_date.
e.g.
SELECT t1.batch_id,
t1.parent_batch_id,
t2.batch_submission_date
FROM db.headers t1, db.lines t2
WHERE t1.batch_id = '12345';
output = 12345, 99999, null
Then we use that parent batch_id as a batch_id :
SELECT t1.batch_id,
t1.parent_batch_id,
t2.batch_submission_date
FROM db.headers t1, db.lines t2
WHERE t1.batch_id = '99999';
and we get output = 99999,99999,'2018-01-01'
So I'm trying to write a query that will do this for me - anytime a batch_id's batch_submission_date is null, we find that batch_id's parent batch_id and query that instead.
This was my idea - but I just get back null both for bp_batch_submission_date and for new_submission_date.
SELECT
t1.parent_id as parent_id,
t1.BATCH_ID as bp_batch_id,
t2.BATCH_LINE_NUMBER as bp_batch_li,
t1.BATCH_SUBMISSION_DATE as bp_batch_submission_date,
CASE
WHEN t1.BATCH_SUBMISSION_DATE is null
THEN
(SELECT a.BATCH_SUBMISSION_DATE
FROM
db.headers a,
db.lines b
WHERE
a.SD_BATCH_HEADERS_SKEY = b.SD_BATCH_HEADERS_SKEY
and a.parent_batch_id = bp_batch_id
and b.batch_line_number = bp_batch_li
) END as new_submission_date
FROM
db.headers t1,
db.lines t2
WHERE
t1.SD_BATCH_HEADERS_SKEY = t2.SD_BATCH_HEADERS_SKEY
and (t1.BATCH_ID = '12345' or t1.PARENT_BATCH_ID = '12345')
and t2.BATCH_LINE_NUMBER = '1'
GROUP BY
t2.BATCH_CLAIM_LINE_STATUS_DESC,
t1.PARENT_BATCH_ID,
t1.BATCH_ID,
t2.BATCH_LINE_NUMBER,
t1.BATCH_SUBMISSION_DATE;
Is what I'm trying to do possible, using the bp_batch_id and bp_batch_li aliases inside the subquery?

Use a CTE (common table expression) to avoid redundant code, then use coalesce() to fall back to the parent's date when the batch's own date is null. Your first queries have no join condition between the two tables; I assumed the join is on sd_batch_headers_skey, as in your last query.
with t as (
select h.batch_id, h.parent_batch_id, l.batch_submission_date bs_date
from headers h
join lines l on l.sd_batch_headers_skey = h.sd_batch_headers_skey
and l.batch_line_number = '1' )
select batch_id, parent_batch_id,
coalesce(bs_date, (select bs_date from t x where x.batch_id = t.parent_batch_id)) bs_date
from t
where batch_id = 12345;
You could use simpler syntax with connect by and level <= 2, but if your data really contains rows whose batch_id equals their own parent_batch_id (99999, 99999), you will get a cycle error.
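The coalesce() fallback can be tried out in isolation. The sketch below uses Python's sqlite3 module rather than Oracle, and a single hypothetical table `batches` that stands in for the joined headers/lines CTE; the two rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the CTE "t" (headers joined to lines).
conn.execute(
    "CREATE TABLE batches (batch_id TEXT, parent_batch_id TEXT, bs_date TEXT)"
)
conn.executemany(
    "INSERT INTO batches VALUES (?, ?, ?)",
    [
        ("12345", "99999", None),          # child without a date
        ("99999", "99999", "2018-01-01"),  # parent carrying the date
    ],
)

row = conn.execute(
    """
    SELECT t.batch_id,
           t.parent_batch_id,
           -- Use the row's own date; if it is NULL, look up the parent's.
           COALESCE(t.bs_date,
                    (SELECT x.bs_date
                     FROM batches x
                     WHERE x.batch_id = t.parent_batch_id)) AS bs_date
    FROM batches t
    WHERE t.batch_id = '12345'
    """
).fetchone()
print(row)
```

The scalar subquery assumes a parent batch_id matches at most one row; if several lines can match, it would need the same line-number filter as the CTE.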

PostgreSQL use case when result in where clause

I use a complex CASE WHEN for selecting values. I would like to use this result in the WHERE clause, but Postgres says column 'd' does not exist.
SELECT id, name, case when complex_with_subqueries_and_multiple_when END AS d
FROM table t WHERE d IS NOT NULL
LIMIT 100 OFFSET 100;
Then I thought I can use it like this:
select * from (
SELECT id, name, case when complex_with_subqueries_and_multiple_when END AS d
FROM table t
LIMIT 100 OFFSET 100) t
WHERE d IS NOT NULL;
But now I am not getting 100 rows as a result. I could probably move LIMIT and OFFSET outside the inner select (next to the WHERE clause), but I suspect, without being sure why, that this would be a performance hit.
The CASE returns an array or null. What is the best/fastest way to exclude rows for which the result of the CASE is null? I need 100 rows (or fewer if there are not that many, of course). I am using Postgres 9.4.
Edited:
SELECT count(*) OVER() AS count, t.id, t.size, t.price, t.location, t.user_id, p.city, t.price_type, ht.value as houses_type_value, ST_X(t.coordinates) as x, ST_Y(t.coordinates) AS y,
CASE WHEN t.classification='public' THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
WHEN t.classification='protected' THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
WHEN t.id IN (SELECT rl.table_id FROM table_private_list rl WHERE rl.owner_id=t.user_id AND rl.user_id=41026) THEN
ARRAY[(SELECT i.filename FROM table_images i WHERE i.table_id=t.id ORDER BY i.weight ASC LIMIT 1), t.description]
ELSE null
END AS main_image_description
FROM table t LEFT JOIN table_modes m ON m.id = t.mode_id
LEFT JOIN table_types y ON y.id = t.type_id
LEFT JOIN post_codes p ON p.id = t.post_code_id
LEFT JOIN table_houses_types ht on ht.id = t.houses_type_id
WHERE datetime_sold IS NULL AND datetime_deleted IS NULL AND t.published=true AND coordinates IS NOT NULL AND coordinates && ST_MakeEnvelope(17.831490030182, 44.404640972306, 12.151558389557, 47.837396630872) AND main_image_description IS NOT NULL
GROUP BY t.id, m.value, y.value, p.city, ht.value ORDER BY t.id LIMIT 100 OFFSET 0

To use the CASE WHEN result in the WHERE clause you need to wrap it up in a subquery like you did, or in a view.
SELECT * FROM (
SELECT id, name, CASE
WHEN name = 'foo' THEN true
WHEN name = 'bar' THEN false
ELSE NULL
END AS c
FROM case_in_where
) t WHERE c IS NOT NULL
With a table containing 1, 'foo', 2, 'bar', 3, 'baz' this will return records 1 & 2. I don't know how long this SQL Fiddle will persist, but here is an example: http://sqlfiddle.com/#!15/1d3b4/3 . Also see https://stackoverflow.com/a/7950920/101151
Your limit returns fewer than 100 rows when the 100 rows starting at offset 100 include records for which d evaluates to NULL. I don't know of a way to limit the subselect correctly without rewriting your limiting logic (the CASE conditions) so that it works inside the WHERE clause:
WHERE ... AND (
t.classification='public' OR t.classification='protected'
OR t.id IN (SELECT rl.table_id ... rl.user_id=41026))
The way you write it will be different and it may be annoying to keep the CASE logic in sync with the WHERE limiting statements, but it would allow your limits to work only on matching data.
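The effect of where the LIMIT sits can be seen on a toy table. This is a sketch with Python's sqlite3 module rather than Postgres; the `items` table, its ten rows and the simplified CASE expression are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, classification TEXT)")
# Ids 1..10: even ids are 'public', odd ids are 'private'.
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [(i, "public" if i % 2 == 0 else "private") for i in range(1, 11)],
)

# Simplified stand-in for the complex CASE: NULL for rows to exclude.
case_expr = "CASE WHEN classification = 'public' THEN id END"

# LIMIT inside the subquery: 4 rows are taken first, then NULLs are dropped.
limited_first = conn.execute(
    f"""
    SELECT id FROM (
        SELECT id, {case_expr} AS d FROM items ORDER BY id LIMIT 4
    ) AS t
    WHERE d IS NOT NULL
    ORDER BY id
    """
).fetchall()

# LIMIT outside: NULLs are dropped first, then 4 rows are taken.
filtered_first = conn.execute(
    f"""
    SELECT id FROM (
        SELECT id, {case_expr} AS d FROM items
    ) AS t
    WHERE d IS NOT NULL
    ORDER BY id
    LIMIT 4
    """
).fetchall()

print(limited_first, filtered_first)
```

Only the second form returns a full page; whether it costs more depends on the planner and the data, so it is worth measuring rather than assuming.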

Not getting SQL query for UNION to return correct value

ISSUES with a UNION query, where from one table, I need the COUNT of records for "Term" where I specify "ID" and "Count" fields.
NOTE The grader is grading down if it suspects UNION ALL was used.
I'm trying a few different formats...and none are working correctly.
------(VERSION 1)-------
e.g.
select count(term) from frequency where (select docid =
'10398_txt_earn' where count = '1' UNION select docid =
'925_txt_trade' where count = '1');
Gives me "0." The grader tells me "0" is incorrect.
If I split into the two parts, I get a value for each (110 and 225).
------(VERSION 2)-------
I modified the query, using the format here:
http://sqlite.1065341.n5.nabble.com/UNION-QUERY-td42629.html
so that I now have for the query (BTW...unlike what the poster suggests, I have to use UNION...unless I can use UNION ALL then select unique records...but this seems like an unnecessary computational step)
select count(term) from (select * from frequency where
(docid='10398_txt_earn' and count='1'))
UNION
select term from (select * from frequency where
(docid='925_txt_trade' and count='1'));
gives me two lines of output: line 1: 110, line 2: 225
If I replace 'select count(term)' to just 'select term'...I get a list whose length is the sum of the two values above
------(VERSION 3)-------
select (term) from frequency where (select docid = '10398_txt_earn'
where count = '1')
UNION
select (term) from frequency where (select docid = '925_txt_trade'
where count = '1');
again, gives me two lines of output: line 1: 110, line 2: 225
QUESTION???
How would you modify it so that you would get one value with count of the UNIQUE shared terms?

I would use this:
select count(term) from frequency where docid in ('10398_txt_earn', '925_txt_trade') AND count = '1';
However if you need to use a UNION
select sum(termCount) from (
select count(term) termCount from (select * from frequency where
docid='10398_txt_earn' and count='1' group by term)
UNION
select count(term) from (select * from frequency where
docid='925_txt_trade' and count='1' group by term)
);

You're looking for a count of both items (a where clause) and need it to also be where "count" = '1'? This sounds straightforward to me: a single where clause per branch.
SELECT SUM(TermCounts)
FROM (
SELECT COUNT(DISTINCT term) AS TermCounts
FROM frequency
WHERE docid = '10398_txt_earn'
AND "count" = '1'
UNION ALL
SELECT COUNT(DISTINCT term) AS TermCounts
FROM frequency
WHERE docid = '925_txt_trade'
AND "count" = '1'
) CountUnion;

Got it, this works:
select count(term)
from (
select term
from frequency
where docid in ("10398_txt_earn") AND count = '1'
UNION
select term
from frequency
where docid in ("925_txt_trade") AND count = '1');
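To see why UNION (and not UNION ALL) gives the count of unique terms, here is a small sketch using Python's sqlite3 module. The document ids `docA` and `docB` and the rows are hypothetical; only the shape of the `frequency` table follows the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE frequency (docid TEXT, term TEXT, "count" INTEGER)')
conn.executemany(
    "INSERT INTO frequency VALUES (?, ?, ?)",
    [
        ("docA", "apple", 1), ("docA", "bank", 1), ("docA", "cash", 1),
        ("docB", "bank", 1), ("docB", "cash", 1), ("docB", "dollar", 1),
        ("docB", "euro", 2),  # excluded: "count" is not 1
    ],
)

def count_terms(set_operator):
    # Count the rows produced by combining both documents' terms.
    return conn.execute(
        f"""
        SELECT COUNT(*) FROM (
            SELECT term FROM frequency WHERE docid = 'docA' AND "count" = 1
            {set_operator}
            SELECT term FROM frequency WHERE docid = 'docB' AND "count" = 1
        )
        """
    ).fetchone()[0]

unique_terms = count_terms("UNION")    # duplicates across documents removed
all_terms = count_terms("UNION ALL")   # duplicates kept
print(unique_terms, all_terms)
```

UNION removes the terms that occur in both documents, so the outer count sees each term once.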

#1222 - The used SELECT statements have a different number of columns

Why am I getting "#1222 - The used SELECT statements have a different number of columns"? I am trying to load wall posts from this user's friends and from the user himself.
SELECT u.id AS pid, b2.id AS id, b2.message AS message, b2.date AS date FROM
(
(
SELECT b.id AS id, b.pid AS pid, b.message AS message, b.date AS date FROM
wall_posts AS b
JOIN Friends AS f ON f.id = b.pid
WHERE f.buddy_id = '1' AND f.status = 'b'
ORDER BY date DESC
LIMIT 0, 10
)
UNION
(
SELECT * FROM
wall_posts
WHERE pid = '1'
ORDER BY date DESC
LIMIT 0, 10
)
ORDER BY date DESC
LIMIT 0, 10
) AS b2
JOIN Users AS u
ON b2.pid = u.id
WHERE u.banned='0' AND u.email_activated='1'
ORDER BY date DESC
LIMIT 0, 10
The wall_posts table structure looks like id date privacy pid uid message
The Friends table structure looks like Fid id buddy_id invite_up_date status
pid stands for profile id. I am not really sure whats going on.

The first statement in the UNION returns four columns:
SELECT b.id AS id,
b.pid AS pid,
b.message AS message,
b.date AS date
FROM wall_posts AS b
The second one returns six, because the * expands to include all the columns from WALL_POSTS:
SELECT b.id,
b.date,
b.privacy,
b.pid,
b.uid,
b.message
FROM wall_posts AS b
The UNION and UNION ALL operators require that:
The same number of columns exist in all the statements that make up the UNION'd query
The data types have to match at each position/column
Use:
FROM ((SELECT b.id AS id,
b.pid AS pid,
b.message AS message,
b.date AS date
FROM wall_posts AS b
JOIN Friends AS f ON f.id = b.pid
WHERE f.buddy_id = '1' AND f.status = 'b'
ORDER BY date DESC
LIMIT 0, 10)
UNION
(SELECT id,
pid,
message,
date
FROM wall_posts
WHERE pid = '1'
ORDER BY date DESC
LIMIT 0, 10))
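The column-count rule is easy to reproduce. The sketch below uses Python's sqlite3 module instead of MySQL, so the error text differs from #1222, but the cause is the same; the one-row `wall_posts` table is made up and only follows the column list given in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wall_posts "
    "(id INTEGER, date TEXT, privacy TEXT, pid INTEGER, uid INTEGER, message TEXT)"
)
conn.execute(
    "INSERT INTO wall_posts VALUES (1, '2020-01-01', 'public', 1, 1, 'hello')"
)

# Four columns on the left, six on the right (* expands to every column).
try:
    conn.execute(
        "SELECT id, pid, message, date FROM wall_posts "
        "UNION SELECT * FROM wall_posts"
    )
    mismatch_error = None
except sqlite3.OperationalError as exc:
    mismatch_error = str(exc)

# Listing the same four columns on both sides makes the UNION valid.
rows = conn.execute(
    "SELECT id, pid, message, date FROM wall_posts "
    "UNION SELECT id, pid, message, date FROM wall_posts"
).fetchall()

print(mismatch_error)
print(rows)
```

Naming the columns explicitly also keeps the query from breaking if a column is later added to the table.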

You're taking the UNION of a 4-column relation (id, pid, message, and date) with a 6-column relation (* = the 6 columns of wall_posts). SQL doesn't let you do that.

(
SELECT b.id AS id, b.pid AS pid, b.message AS message, b.date AS date FROM
wall_posts AS b
JOIN Friends AS f ON f.id = b.pid
WHERE f.buddy_id = '1' AND f.status = 'b'
ORDER BY date DESC
LIMIT 0, 10
)
UNION
(
SELECT id, pid , message , date
FROM
wall_posts
WHERE pid = '1'
ORDER BY date DESC
LIMIT 0, 10
)
You were selecting 4 in the first query and 6 in the second, so match them up.

Besides the answer given by @omg-ponies, I just want to add that this error can also occur in variable assignment. In my case I used an insert, and a trigger was associated with that insert. I mistakenly assigned a number of fields to a different number of variables. Below are the details of my case.
INSERT INTO tab1 (event, eventTypeID, fromDate, toDate, remarks)
-> SELECT event, eventTypeID,
-> fromDate, toDate, remarks FROM rrp group by trainingCode;
ERROR 1222 (21000): The used SELECT statements have a different number of columns
So you see, I got this error from an insert statement rather than a union statement. The differences in my case were:
I issued a bulk insert,
i.e. insert into tab1 (field, ...) select field, ... from tab2
tab2 had an on-insert trigger; this trigger basically declines duplicates
It turns out that I had an error in the trigger: I fetched a record based on the new input data and assigned its fields to the wrong number of variables.
DELIMITER @@
DROP TRIGGER trgInsertTrigger @@
CREATE TRIGGER trgInsertTrigger
BEFORE INSERT ON training
FOR EACH ROW
BEGIN
SET @recs = 0;
SET @trgID = 0;
SET @trgDescID = 0;
SET @trgDesc = '';
SET @district = '';
SET @msg = '';
SELECT COUNT(*), t.trainingID, td.trgDescID, td.trgDescName, t.trgDistrictID
INTO @recs, @trgID, @trgDescID, @proj, @trgDesc, @district
from training as t
left join trainingDistrict as tdist on t.trainingID = tdist.trainingID
left join trgDesc as td on t.trgDescID = td.trgDescID
WHERE
t.trgDescID = NEW.trgDescID
AND t.venue = NEW.venue
AND t.fromDate = NEW.fromDate
AND t.toDate = NEW.toDate
AND t.gender = NEW.gender
AND t.totalParticipants = NEW.totalParticipants
AND t.districtIDs = NEW.districtIDs;
IF @recs > 0 THEN
SET @msg = CONCAT('Error: Duplicate Training: previous ID ', CAST(@trgID AS CHAR CHARACTER SET utf8) COLLATE utf8_bin);
SIGNAL SQLSTATE '45000' SET MESSAGE_TEXT = @msg;
END IF;
END @@
DELIMITER ;
As you can see, I am fetching 5 fields but assigning them to 6 variables. (Totally my fault: I forgot to delete the variable after editing.)

You are using a MySQL UNION.
UNION is used to combine the result from multiple SELECT statements into a single result set.
The column names from the first SELECT statement are used as the column names for the results returned. Selected columns listed in corresponding positions of each SELECT statement should have the same data type. (For example, the first column selected by the first statement should have the same type as the first column selected by the other statements.)
Reference: MySQL Union
Your first SELECT statement has 4 columns and the second has 6, since, as you said, wall_posts has 6 columns.
Both statements should have the same number of columns, in the same order.
Otherwise you get an error or wrong data.
