Cleaner way to write this SQL

Sometimes when I write SQL I encounter the following situation:
select A = (
select sum(A)
--... big query using abc
),
B = (
select sum(B)
--... same big query using abc
)
from abc
Maybe it doesn't look very good, but it's the only way I can think of in some situations. The big query is repeated, so is there a cleaner way to write the same thing?
Clarifications: abc is a bunch of joins. "Using abc" means using the current abc row's data. The big query is not the same as abc.

Outer apply will help here:
select *
from abc
outer apply (
select sum(a) as sumA, sum(b) as sumB
-- big query using abc
) sums

If the 'big query' is the same in all the subselects, can't you just do:
select sum(a), sum(b)
from abc
where ...big query
I can't be more helpful without a decent set of example data and the corresponding query.

Your query could be simplified to
SELECT sum(a) as A, sum(b) as B
FROM abc
Although I suspect you've oversimplified your situation.

It's hard to say what to do without seeing the actual query and what you are trying to achieve. There are some approaches that might be useful:
1. Use CTE or derived table for your big query
2. In some cases it can be replaced with a number of SUM(CASE WHEN [condition] THEN field END)
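A minimal sketch of the CTE idea, using Python's built-in sqlite3 module as a stand-in for SQL Server (OUTER APPLY is not available in SQLite). The tables abc and detail, their columns and the data are all hypothetical. The big query is written once, grouped per abc row, and both sums come out of that single pass.

```python
import sqlite3

# Hypothetical schema: abc stands in for the joined rows, detail for the
# tables the "big query" reads.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE abc (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE detail (abc_id INTEGER, a INTEGER, b INTEGER);
    INSERT INTO abc VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO detail VALUES (1, 10, 1), (1, 20, 2), (2, 5, 7);
""")

# The big query appears once in the CTE; both sums come from the same pass,
# and the LEFT JOIN keeps abc rows that have no detail rows.
rows = conn.execute("""
    WITH big AS (
        SELECT abc_id, SUM(a) AS sum_a, SUM(b) AS sum_b
        FROM detail
        GROUP BY abc_id
    )
    SELECT abc.id, big.sum_a, big.sum_b
    FROM abc
    LEFT JOIN big ON big.abc_id = abc.id
    ORDER BY abc.id
""").fetchall()
print(rows)  # [(1, 30, 3), (2, 5, 7), (3, None, None)]
```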

If A and B are fields, you can just put both sums in the query:
select sum(a), sum(b) from abc
If what you want to do is to aggregate the same rows depending on different conditions, you can often use case. Imagine you have a table TASKS with fields STATUS and EFFORT, and you want to count both ACTIVE and PASSIVE tasks, and get the total effort of each aggregate. You could do:
select
count(case when status = 'ACTIVE' then 1 end) active_nr,
sum(case when status = 'ACTIVE' then effort else 0 end) active_effort,
count(case when status = 'PASSIVE' then 1 end) passive_nr,
sum(case when status = 'PASSIVE' then effort else 0 end) passive_effort
from tasks;
This is a simple example; the predicates tested by CASE can be as complex as you need, involving multiple fields, etc. As a bonus, this approach will usually be nicer to the database, because the rows are read once instead of once per subquery.
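To check the pattern end to end, here is a sketch against an in-memory SQLite database through Python's sqlite3 module. The TASKS rows are invented for illustration. The 'DONE' row shows that rows matching no CASE branch are simply ignored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tasks (status TEXT, effort INTEGER);
    INSERT INTO tasks VALUES
        ('ACTIVE', 3), ('ACTIVE', 5), ('PASSIVE', 2), ('DONE', 8);
""")

# One scan of tasks; each CASE routes a row into at most one aggregate.
row = conn.execute("""
    SELECT
        COUNT(CASE WHEN status = 'ACTIVE' THEN 1 END) AS active_nr,
        SUM(CASE WHEN status = 'ACTIVE' THEN effort ELSE 0 END) AS active_effort,
        COUNT(CASE WHEN status = 'PASSIVE' THEN 1 END) AS passive_nr,
        SUM(CASE WHEN status = 'PASSIVE' THEN effort ELSE 0 END) AS passive_effort
    FROM tasks
""").fetchone()
print(row)  # (2, 8, 1, 2)
```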

select sum(A),sum(B)
--... big query using abc
from abc
No need to split it up.

Related

Filtering the same query 3 different times. Performance?

I have a query that is really slow. I will post pseudocode here.
SELECT
ListofDates.Date as Event,
(SELECT COUNT(DISTINCT TableofExtensiveJoins1.ID)
FROM TableofExtensiveJoins1
WHERE ListofDates.Date = TableofExtensiveJoins1.Date AND Condition1),
(SELECT COUNT(DISTINCT TableofExtensiveJoins2.ID)
FROM TableofExtensiveJoins2
WHERE ListofDates.Date = TableofExtensiveJoins2.Date AND Condition2),
(SELECT COUNT(DISTINCT TableofExtensiveJoins3.ElementID)
FROM TableofExtensiveJoins3
WHERE ListofDates.Date = TableofExtensiveJoins3.Date AND Condition3)
FROM
ListOfDates
One thing to notice here is that TableOfExtensiveJoins1, 2 and 3 are exactly the same query, but the WHERE condition is different in each one. Running the same query three times just to filter it three different ways seems a bit excessive. But as you can see it is necessary, because I want to count things in the table, and the table is filtered differently each time. Because of the COUNT, I fear that SQL Server builds the table again every time.
I have that fear because the query runs exceptionally long. The subqueries are really complicated themselves. To give you an example: getting only one record of the main query takes around 15 seconds. The subquery itself takes 5 seconds, which would explain the 15 seconds (3*5 = 15). The whole main query would likely return a few thousand records. I let it run for 50 minutes one day and it didn't finish. Obviously it's not linear, but that is beside the point. I just wanted to stress how bad the query is.
So obviously I need to improve the performance of that query. For the sake of the optimization, let's say I cannot create new tables in the database; otherwise it would be too easy, I guess. Let's also assume that TableOfExtensiveJoins is already optimized.
So my question is: how can I rewrite the query to run faster, i.e. build the table only once and then run the filters on the result? The query is run in Microsoft SQL Server Reporting Services, so there might be limitations on what kinds of queries can be run, but I'm not 100% sure about this.
Edit: The desired result might help in finding the right answer.
TableOfExtensiveJoins is basically an event table. Every time something specific happens (it doesn't matter what), a new entry is created.
I now want to count, for any given date, the number of events that meet certain conditions. ListOfDates holds a list of dates: it takes the first occurrence of the event, creates a list of dates from there, and then filters it with Day(Date) % 5 = 1, so roughly every fifth date.
Try conditional aggregation, something like:
SELECT lod.Date as Event,
COUNT(DISTINCT CASE WHEN Condition 1 THEN tej.ID END) cnt1,
COUNT(DISTINCT CASE WHEN Condition 2 THEN tej.ID END) cnt2,
COUNT(DISTINCT CASE WHEN Condition 3 THEN tej.ID END) cnt3
from ListOfDates lod
left join TableofExtensiveJoins tej on lod.Date=tej.Date
group by lod.Date
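One way to convince yourself that the rewrite returns the same numbers as the three correlated subqueries is to run both on toy data. This sketch uses Python's sqlite3 module. The tables list_of_dates and events, and the three conditions (kind = 'A', kind = 'B', priority > 1), are hypothetical stand-ins for ListOfDates, TableofExtensiveJoins and Condition1..3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE list_of_dates (date TEXT);
    CREATE TABLE events (id INTEGER, date TEXT, kind TEXT, priority INTEGER);
    INSERT INTO list_of_dates VALUES ('2024-01-01'), ('2024-01-06'), ('2024-01-11');
    INSERT INTO events VALUES
        (1, '2024-01-01', 'A', 1),
        (1, '2024-01-01', 'A', 1),   -- duplicate row, as joins often produce
        (2, '2024-01-01', 'B', 3),
        (3, '2024-01-06', 'A', 2);
""")

# Original shape: one correlated subquery per condition.
three_subqueries = """
    SELECT lod.date,
           (SELECT COUNT(DISTINCT e.id) FROM events e
             WHERE e.date = lod.date AND e.kind = 'A'),
           (SELECT COUNT(DISTINCT e.id) FROM events e
             WHERE e.date = lod.date AND e.kind = 'B'),
           (SELECT COUNT(DISTINCT e.id) FROM events e
             WHERE e.date = lod.date AND e.priority > 1)
    FROM list_of_dates lod
    ORDER BY lod.date
"""

# Conditional aggregation: one join, conditions moved into CASE expressions.
conditional = """
    SELECT lod.date,
           COUNT(DISTINCT CASE WHEN e.kind = 'A' THEN e.id END),
           COUNT(DISTINCT CASE WHEN e.kind = 'B' THEN e.id END),
           COUNT(DISTINCT CASE WHEN e.priority > 1 THEN e.id END)
    FROM list_of_dates lod
    LEFT JOIN events e ON e.date = lod.date
    GROUP BY lod.date
    ORDER BY lod.date
"""

a = conn.execute(three_subqueries).fetchall()
b = conn.execute(conditional).fetchall()
print(b)  # [('2024-01-01', 1, 1, 1), ('2024-01-06', 1, 0, 1), ('2024-01-11', 0, 0, 0)]
```

Thanks to the LEFT JOIN, a date with no events still appears, and COUNT(DISTINCT ...) over its all-NULL group returns 0, just like the subquery version.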
The query below should perform better, as it evaluates TableofExtensiveJoins only once and needs only one operation to get the distinct counts:
WITH DistCounts
AS (SELECT COUNT(DISTINCT ID) AS DistCount,
condition_flag,
Date
FROM TableofExtensiveJoins
CROSS APPLY (SELECT 1 WHERE Condition1
UNION ALL
SELECT 2 WHERE Condition2
UNION ALL
SELECT 3 WHERE Condition3) CA(condition_flag)
GROUP BY condition_flag,
Date),
Pivoted
AS (SELECT Date,
MAX(CASE WHEN condition_flag = 1 THEN DistCount END) AS DistCount1,
MAX(CASE WHEN condition_flag = 2 THEN DistCount END) AS DistCount2,
MAX(CASE WHEN condition_flag = 3 THEN DistCount END) AS DistCount3
FROM DistCounts
GROUP BY Date)
SELECT lod.Date as Event,
DistCount1,
DistCount2,
DistCount3
from ListOfDates lod
left join Pivoted p on lod.Date=p.Date
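SQLite has no CROSS APPLY, so this sketch (Python's sqlite3, hypothetical tables and conditions) swaps in a join against a three-row flags CTE, with the conditions in the ON clause. That tags each event row once per condition it satisfies, which is what the APPLY with UNION ALL does. The rest follows the answer: distinct counts per flag and date, then a MAX(CASE ...) pivot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE list_of_dates (date TEXT);
    CREATE TABLE events (id INTEGER, date TEXT, kind TEXT, priority INTEGER);
    INSERT INTO list_of_dates VALUES ('2024-01-01'), ('2024-01-06'), ('2024-01-11');
    INSERT INTO events VALUES
        (1, '2024-01-01', 'A', 1), (1, '2024-01-01', 'A', 1),
        (2, '2024-01-01', 'B', 3), (3, '2024-01-06', 'A', 2);
""")

rows = conn.execute("""
    WITH flags(condition_flag) AS (VALUES (1), (2), (3)),
    dist_counts AS (
        -- Each event row is repeated once per condition it satisfies,
        -- tagged with that condition's number (the role CROSS APPLY plays).
        SELECT f.condition_flag, e.date, COUNT(DISTINCT e.id) AS dist_count
        FROM events e
        JOIN flags f
          ON (f.condition_flag = 1 AND e.kind = 'A')
          OR (f.condition_flag = 2 AND e.kind = 'B')
          OR (f.condition_flag = 3 AND e.priority > 1)
        GROUP BY f.condition_flag, e.date
    ),
    pivoted AS (
        SELECT date,
               MAX(CASE WHEN condition_flag = 1 THEN dist_count END) AS c1,
               MAX(CASE WHEN condition_flag = 2 THEN dist_count END) AS c2,
               MAX(CASE WHEN condition_flag = 3 THEN dist_count END) AS c3
        FROM dist_counts
        GROUP BY date
    )
    SELECT lod.date, p.c1, p.c2, p.c3
    FROM list_of_dates lod
    LEFT JOIN pivoted p ON p.date = lod.date
    ORDER BY lod.date
""").fetchall()
print(rows)
```

Unlike the plain conditional aggregation, this shape yields NULL rather than 0 when a date has no matching events for a condition.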
I think you want OUTER APPLY:
SELECT lod.Date as Event, tej.*
From ListOfDates lod OUTER APPLY
(SELECT SUM(CASE WHEN <condition 1> THEN 1 ELSE 0 END) as col1,
SUM(CASE WHEN <condition 2> THEN 1 ELSE 0 END) as col2,
SUM(CASE WHEN <condition 3> THEN 1 ELSE 0 END) as col3
FROM TableofExtensiveJoins tej
WHERE lod.Date = tej.Date
) tej;
Assuming that tej.ID is unique, you don't need the COUNT(DISTINCT). However, if you do:
SELECT lod.Date as Event, tej.*
From ListOfDates lod OUTER APPLY
(SELECT COUNT(DISTINCT CASE WHEN <condition 1> THEN tej.ID END) as col1,
COUNT(DISTINCT CASE WHEN <condition 2> THEN tej.ID END) as col2,
COUNT(DISTINCT CASE WHEN <condition 3> THEN tej.ID END) as col3
FROM TableofExtensiveJoins tej
WHERE lod.Date = tej.Date
) tej;
This generalizes to whatever conditions you might have in the subqueries. As a bonus, lateral joins (the technical term for what APPLY is doing in this case) often have the best performance in SQL Server.
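The remark about uniqueness matters when the "extensive joins" repeat an event. In this small sketch (Python's sqlite3, hypothetical events table) the same ID appears twice. SUM(CASE ...) counts rows, while COUNT(DISTINCT CASE ...) counts IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER, date TEXT, kind TEXT);
    INSERT INTO events VALUES
        (1, '2024-01-01', 'A'),
        (1, '2024-01-01', 'A'),  -- same event repeated by a join
        (2, '2024-01-01', 'A');
""")

row = conn.execute("""
    SELECT SUM(CASE WHEN kind = 'A' THEN 1 ELSE 0 END) AS sum_count,
           COUNT(DISTINCT CASE WHEN kind = 'A' THEN id END) AS distinct_count
    FROM events
    WHERE date = '2024-01-01'
""").fetchone()
print(row)  # (3, 2)
```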

How to get three different types of score for each text in one row in SQL Server?

I am a new SQL developer and I am trying to write a query that will retrieve the following results from three tables in the database:
ID Text Score #1 Score #2 Score #3
The schemas for the three tables are:
T_TextScore Table: Id, TextId, Label, Score, TypeId
T_Text: Id, Text
T_Status Table: Id, Type
The T_TextScore table contains three different types of scores for each text. I was able to retrieve the scores for all texts, but I am still unable to show each type of score for each text in one row as illustrated above. Could you please tell me how I can get the desired result?
Here's the query I have and I think it is not efficient in terms of performance as well:
SELECT T_TextScore.TextId, T_Text.Text, T_TextScore.Label, T_TextScore.Score
FROM T_TextScore INNER JOIN
T_Text ON T_TextScore.TextId = T_Text.Id
WHERE (T_TextScore.TypeId = 3)
UNION
SELECT T_TextScore.TextId, T_Text.Text, T_TextScore.Label, T_TextScore.Score
FROM T_TextScore INNER JOIN
T_Text ON T_TextScore.TextId = T_Text.Id
WHERE (T_TextScore.TypeId = 4)
UNION
SELECT T_TextScore.TextId, T_Text.Text, T_TextScore.Label, T_TextScore.Score
FROM T_TextScore INNER JOIN
T_Text ON T_TextScore.TextId = T_Text.Id
WHERE (T_TextScore.TypeId = 5);
UPDATE:
After using the query suggested by @Craig Young, I got two rows for each text and I don't know why. Could you please explain why?
You want conditional aggregation. My best guess is simply:
SELECT ts.TextId,
MAX(CASE WHEN ts.TypeId = 3 THEN ts.Score END) as Score_1,
MAX(CASE WHEN ts.TypeId = 4 THEN ts.Score END) as Score_2,
MAX(CASE WHEN ts.TypeId = 5 THEN ts.Score END) as Score_3
FROM T_TextScore ts
GROUP BY ts.TextId;
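A quick way to see the MAX(CASE ...) pivot produce one row per text is the following sketch using Python's sqlite3 module. The rows are invented. It also joins T_Text to pull in the text itself. Grouping by t.Text as well as ts.TextId is safe here only on the assumption that each Id maps to exactly one Text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T_Text (Id INTEGER, Text TEXT);
    CREATE TABLE T_TextScore (Id INTEGER, TextId INTEGER, Label TEXT,
                              Score INTEGER, TypeId INTEGER);
    INSERT INTO T_Text VALUES (1, 'first text'), (2, 'second text');
    INSERT INTO T_TextScore VALUES
        (1, 1, 'pos', 10, 3), (2, 1, 'pos', 20, 4), (3, 1, 'pos', 30, 5),
        (4, 2, 'neg', 40, 3), (5, 2, 'neg', 50, 4);
""")

# Each CASE picks out one score type; MAX collapses the group to one value.
rows = conn.execute("""
    SELECT ts.TextId, t.Text,
           MAX(CASE WHEN ts.TypeId = 3 THEN ts.Score END) AS Score_1,
           MAX(CASE WHEN ts.TypeId = 4 THEN ts.Score END) AS Score_2,
           MAX(CASE WHEN ts.TypeId = 5 THEN ts.Score END) AS Score_3
    FROM T_TextScore ts
    INNER JOIN T_Text t ON t.Id = ts.TextId
    GROUP BY ts.TextId, t.Text
    ORDER BY ts.TextId
""").fetchall()
print(rows)  # [(1, 'first text', 10, 20, 30), (2, 'second text', 40, 50, None)]
```

A score type that is missing for a text simply shows up as NULL.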
Your query isn't doing quite what you asked for. The following would be a much better way of doing what you're currently doing:
SELECT ts.TextId, t.Text, ts.Label, ts.Score
FROM T_TextScore ts /* Table alias makes query much more readable */
INNER JOIN T_Text t ON
ts.TextId = t.Id
WHERE ts.TypeId IN (3, 4, 5)
However, the first part of your question suggests you actually want to pivot your data?
If so you can use PIVOT syntax. Or manual pivoting:
SELECT ts.TextId,
/* Use aggregate function to get only 1 per TextId */
MAX(t.Text) AS Text, MAX(ts.Label) AS Label,
/* Simply move ts.Score to the correct column to be summed. */
SUM(CASE WHEN ts.TypeId = 3 THEN ts.Score ELSE 0 END) AS Score3,
SUM(CASE WHEN ts.TypeId = 4 THEN ts.Score ELSE 0 END) AS Score4,
SUM(CASE WHEN ts.TypeId = 5 THEN ts.Score ELSE 0 END) AS Score5
FROM T_TextScore ts
INNER JOIN T_Text t ON
ts.TextId = t.Id
WHERE ts.TypeId IN (3, 4, 5)
GROUP BY ts.TextId
NOTE: PIVOT syntax is a little more succinct. But strangely I have seen it run slightly slower than manual pivot on occasion. So if performance is important, you'll have to benchmark.
Based on your comment:
After using the query suggested by @Craig Young, I got two rows for each text and I don't know why.
You probably either removed the GROUP BY clause, or included Text and Label in the GROUP BY. This made me realise that I'd forgotten to deal with these 2 columns which weren't part of either aggregate or GROUP BY.
I've updated my query above appropriately. However, I should point out lack of sample data makes it tricky to determine exactly what you're trying to achieve - particularly with the Label column which could be different per Score Type.
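The two-rows-per-text symptom can be reproduced with a small sketch (Python's sqlite3, invented data) in which one text has a different Label for score type 5. Grouping by Label splits the text across two rows. Grouping by TextId alone and aggregating Label collapses it back to one, although MAX(Label) then just picks one of the labels.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T_TextScore (Id INTEGER, TextId INTEGER, Label TEXT,
                              Score INTEGER, TypeId INTEGER);
    INSERT INTO T_TextScore VALUES
        (1, 1, 'pos', 10, 3), (2, 1, 'pos', 20, 4), (3, 1, 'neg', 30, 5);
""")

pivot = """
    MAX(CASE WHEN ts.TypeId = 3 THEN ts.Score END) AS Score3,
    MAX(CASE WHEN ts.TypeId = 4 THEN ts.Score END) AS Score4,
    MAX(CASE WHEN ts.TypeId = 5 THEN ts.Score END) AS Score5
"""

# Label in GROUP BY: one row per (TextId, Label) pair, so text 1 splits in two.
by_label = conn.execute(
    "SELECT ts.TextId, ts.Label, " + pivot +
    " FROM T_TextScore ts GROUP BY ts.TextId, ts.Label ORDER BY ts.Label"
).fetchall()

# Label aggregated instead: one row per TextId.
by_text = conn.execute(
    "SELECT ts.TextId, MAX(ts.Label), " + pivot +
    " FROM T_TextScore ts GROUP BY ts.TextId"
).fetchall()

print(by_label)  # [(1, 'neg', None, None, 30), (1, 'pos', 10, 20, None)]
print(by_text)   # [(1, 'pos', 10, 20, 30)]
```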

Count instances of value (say, '4') in several columns/ rows

I have survey responses in a SQL database. Scores are 1-5.
Current format of the data table is this:
Survey_id, Question_1, Question_2, Question_3
383838, 1,1,1
392384, 1,5,4
393894, 4,3,5
I'm running a new query where I need % 4's, % 5's ... question doesn't matter, just overall.
At first glance I'm thinking
sum(iif(Question_1 =5,1,0)) + sum(iif(Question_2=5,1,0)) .... as total5s
sum(iif(Question_1=4,1,0)) + sum(iif(Question_2=4,1,0)) .... as total4s
But I am unsure if this is the quickest or most elegant way to achieve this.
EDIT: Hmm, on first test this query already appears not to work correctly.
EDIT2: I think I need sum instead of count in my example, will edit.
You have to unpivot the data and calculate the % responses thereafter. Because there are a limited number of questions, you can use union all to unpivot the data.
select 100.0*count(case when question=4 then 1 end)/count(*) as pct_4s
from (select survey_id,question_1 as question from tablename
union all
select survey_id,question_2 from tablename
union all
select survey_id,question_3 from tablename
) responses
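Running the UNION ALL unpivot on the three sample rows from the question gives a concrete check. This sketch uses Python's sqlite3 module and keeps the table name tablename from the answer. Two of the nine responses are 4s and two are 5s.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablename (survey_id INTEGER, question_1 INTEGER,
                            question_2 INTEGER, question_3 INTEGER);
    INSERT INTO tablename VALUES
        (383838, 1, 1, 1), (392384, 1, 5, 4), (393894, 4, 3, 5);
""")

# UNION ALL stacks the three columns into one "question" column first.
pct_4s, pct_5s = conn.execute("""
    SELECT 100.0 * COUNT(CASE WHEN question = 4 THEN 1 END) / COUNT(*),
           100.0 * COUNT(CASE WHEN question = 5 THEN 1 END) / COUNT(*)
    FROM (SELECT survey_id, question_1 AS question FROM tablename
          UNION ALL
          SELECT survey_id, question_2 FROM tablename
          UNION ALL
          SELECT survey_id, question_3 FROM tablename) responses
""").fetchone()
print(round(pct_4s, 2), round(pct_5s, 2))  # 22.22 22.22
```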
Another way to do this could be:
select 100.0*(count(case when question_1=4 then 1 end)
+count(case when question_2=4 then 1 end)
+count(case when question_3=4 then 1 end))
/(3*count(*))
from tablename
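The column-wise variant can be checked the same way. This sketch (Python's sqlite3, the question's sample data) assumes exactly three question columns, which is why the denominator is 3*COUNT(*).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablename (survey_id INTEGER, question_1 INTEGER,
                            question_2 INTEGER, question_3 INTEGER);
    INSERT INTO tablename VALUES
        (383838, 1, 1, 1), (392384, 1, 5, 4), (393894, 4, 3, 5);
""")

# Count 4s per column, add them up, divide by the total number of answers.
(pct_4s,) = conn.execute("""
    SELECT 100.0 * (COUNT(CASE WHEN question_1 = 4 THEN 1 END)
                  + COUNT(CASE WHEN question_2 = 4 THEN 1 END)
                  + COUNT(CASE WHEN question_3 = 4 THEN 1 END))
           / (3 * COUNT(*))
    FROM tablename
""").fetchone()
print(round(pct_4s, 2))  # 22.22
```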
With UNPIVOT, as @Dudu suggested:
with unpivoted as (select *
from tablename
unpivot (response for question in (question_1,question_2,question_3)) u
)
select 100.0*count(case when response=4 then 1 end)/count(*)
from unpivoted

SQL query same column twice

I need to query the same column for different values depending on another relational value.
Table is set up like this : athleteID, meetName, eventName, score
The events are all the same, but there are different meets, and my query needs to return: aid, event, score from meetName = 'whatever1', score from meetName = 'whatever2'.
I've tried every basic way of doing this but cannot get it to work. Most recently I tried:
SELECT distinct athleteID, event,
(select score from performances where meetName='Snowflake') as SnowScore,
(select score from performances where meetName='Valentine') as ValScore
from performances
where event='high jump'
which returns: single-row subquery returns more than one row
My expected result would be like this:
aid, event, SnowScore, ValScore
1 , high jump, 6, 8
2 , high jump, 3, 5
3, high jump, 8, 10
You don't specify the RDBMS; my answer is for SQL Server.
If you wanted to use a subquery, you would need to reference the athleteID and eventName. Also, if there were more than one result (not clear from your question, but I assume athletes compete at multiple meets), you would need to aggregate.
There may be a better way but as a simple one off query I would probably do it like:
SELECT athleteID, eventName,
sum(CASE WHEN meetName='Snowflake' THEN score ELSE 0 END) as SnowScore,
sum(CASE WHEN meetName='Valentine' THEN score ELSE 0 END) as ValScore
FROM performances
GROUP BY athleteID, eventName
A better longer-term solution would be PIVOT, and if the meetNames will change over time you can create dynamic pivot queries; a good example I found is here
I didn't try it, but it gives the idea:
SELECT athleteID, eventName,
sum(case when meetName='Snowflake' then score else 0 end) as SnowScore,
sum(case when meetName='Valentine' then score else 0 end) as ValScore
from performances
group by athleteID, eventName
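Here is the conditional-aggregation query run against the expected output from the question, as a sketch in Python's sqlite3 module. The long-jump row is invented to show why a WHERE filter on eventName is needed. With ELSE 0, an athlete who skipped a meet shows 0; dropping ELSE 0 would give NULL instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE performances (athleteID INTEGER, meetName TEXT,
                               eventName TEXT, score INTEGER);
    INSERT INTO performances VALUES
        (1, 'Snowflake', 'high jump', 6), (1, 'Valentine', 'high jump', 8),
        (2, 'Snowflake', 'high jump', 3), (2, 'Valentine', 'high jump', 5),
        (3, 'Snowflake', 'high jump', 8), (3, 'Valentine', 'high jump', 10),
        (1, 'Snowflake', 'long jump', 20);
""")

# Each CASE moves a meet's score into its own column; SUM folds the rows.
rows = conn.execute("""
    SELECT athleteID, eventName,
           SUM(CASE WHEN meetName = 'Snowflake' THEN score ELSE 0 END) AS SnowScore,
           SUM(CASE WHEN meetName = 'Valentine' THEN score ELSE 0 END) AS ValScore
    FROM performances
    WHERE eventName = 'high jump'
    GROUP BY athleteID, eventName
    ORDER BY athleteID
""").fetchall()
print(rows)  # [(1, 'high jump', 6, 8), (2, 'high jump', 3, 5), (3, 'high jump', 8, 10)]
```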
I would like to add that a NATURAL INNER JOIN is what should have been done here in basic (non-commercial) SQL.
The syntax would be: select * from (subquery1) s natural join (subquery2) v
The subqueries:
select athleteID, score as ValScore from performances natural join athletes where meetName = 'Valentine' and eventName = 'high jump'
and
select athleteID, score as SnowScore from performances natural join athletes where meetName = 'Snowflake' and eventName = 'high jump'
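A sketch of the natural-join idea using Python's sqlite3 module and the question's sample scores. The athletes table is left out, because joining the two derived tables on their only shared column, athleteID, is enough here. Athlete 4 is invented to show that, as an inner join, it drops athletes who appear in only one of the two meets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE performances (athleteID INTEGER, meetName TEXT,
                               eventName TEXT, score INTEGER);
    INSERT INTO performances VALUES
        (1, 'Snowflake', 'high jump', 6), (1, 'Valentine', 'high jump', 8),
        (2, 'Snowflake', 'high jump', 3), (2, 'Valentine', 'high jump', 5),
        (3, 'Snowflake', 'high jump', 8), (3, 'Valentine', 'high jump', 10),
        (4, 'Snowflake', 'high jump', 7);
""")

# NATURAL JOIN matches on every column name the two sides share: athleteID.
rows = conn.execute("""
    SELECT s.athleteID, s.SnowScore, v.ValScore
    FROM (SELECT athleteID, score AS SnowScore FROM performances
          WHERE meetName = 'Snowflake' AND eventName = 'high jump') s
    NATURAL JOIN
         (SELECT athleteID, score AS ValScore FROM performances
          WHERE meetName = 'Valentine' AND eventName = 'high jump') v
    ORDER BY s.athleteID
""").fetchall()
print(rows)  # [(1, 6, 8), (2, 3, 5), (3, 8, 10)]
```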

Oracle/SQL - Need help optimizing this union/group/count query

I'm trying to optimize this query as much as possible. On my test tables it does exactly what I want it to, but on the live tables it takes a VERY long time to run.
select THING_,
count(case STATUS_ when '_Good_' then 1 end) as GOOD,
count(case STATUS_ when '_Bad_' then 1 end) as BAD,
count(case STATUS_ when '_Bad_' then 1 end) / nullif(count(case STATUS_ when '_Good_' then 1 end), 0) * 100 as FAIL_PERCENT
from
(
select THING_,
STATUS_
from <good table>
where TIMESTAMP_ > (sysdate - 1) and
STATUS_ = '_Good_' and
upper(THING_) like '%TEST%'
UNION ALL
select THING_,
STATUS_
from <bad table>
where TIMESTAMP_ > (sysdate - 1) and
STATUS_ = '_Bad_' and
upper(THING_) like '%TEST%'
) u
group by THING_
I think the query itself should make clear what I want to do, but if not, or if additional info is needed, please let me know and I will post some sample tables.
Thanks!
Create composite indexes on (STATUS_, TIMESTAMP_) in both tables.
(1) Looking at the execution plan should always be your first step in diagnosing SQL performance issues
(2) A possible problem with the query as written is that, because SYSDATE is a function that is not evaluated until execution time (i.e. after the execution plan is determined), the optimizer cannot make use of histograms on the timestamp column to evaluate the utility of an index. I have seen that lead to bad optimizer decisions. If you can work out a way to calculate the date first then feed it into the query as a bind or a literal, that may help, although this is really just a guess.
(3) Maybe a better overall way to structure the query would be as a join (possibly full outer join) between aggregate queries on each of the tables.
SELECT COALESCE(g.thing_,b.thing_), COALESCE(good_count,0), COALESCE(bad_count,0)
FROM (SELECT thing_,count(*) good_count from good_table WHERE ... GROUP BY thing_) g
FULL OUTER JOIN
(SELECT thing_,count(*) bad_count from bad_table WHERE ... GROUP BY thing_) b
ON b.thing_ = g.thing_
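A sketch of the aggregate-then-join structure, using Python's sqlite3 module and invented rows in good_table and bad_table. FULL OUTER JOIN only arrived in SQLite 3.39, so this version swaps in a common emulation: collect the union of thing_ values from both aggregates, then LEFT JOIN each aggregate to it. The time filter is omitted, and NULLIF guards the failure percentage against a zero good count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE good_table (thing_ TEXT, status_ TEXT);
    CREATE TABLE bad_table (thing_ TEXT, status_ TEXT);
    INSERT INTO good_table VALUES
        ('test-a', '_Good_'), ('test-a', '_Good_'), ('test-a', '_Good_'),
        ('test-b', '_Good_'), ('prod-x', '_Good_');
    INSERT INTO bad_table VALUES
        ('test-a', '_Bad_'), ('test-c', '_Bad_'), ('test-c', '_Bad_');
""")

rows = conn.execute("""
    WITH g AS (SELECT thing_, COUNT(*) AS good_count FROM good_table
               WHERE upper(thing_) LIKE '%TEST%' GROUP BY thing_),
         b AS (SELECT thing_, COUNT(*) AS bad_count FROM bad_table
               WHERE upper(thing_) LIKE '%TEST%' GROUP BY thing_),
         -- every thing_ seen on either side (stands in for FULL OUTER JOIN)
         all_things AS (SELECT thing_ FROM g UNION SELECT thing_ FROM b)
    SELECT k.thing_,
           COALESCE(g.good_count, 0) AS good,
           COALESCE(b.bad_count, 0) AS bad,
           100.0 * COALESCE(b.bad_count, 0)
                 / NULLIF(COALESCE(g.good_count, 0), 0) AS fail_percent
    FROM all_things k
    LEFT JOIN g ON g.thing_ = k.thing_
    LEFT JOIN b ON b.thing_ = k.thing_
    ORDER BY k.thing_
""").fetchall()
print(rows)
```

Each table is scanned and aggregated once, so the join only has to match one row per thing_ from each side.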
(Have to say, it seems kind of weird that you have two separate tables when you also have a status column to indicate "good" or "bad". But maybe I am overinterpreting.)
Have you tried using analytic functions? They might reduce the execution time. Here is an example:
select distinct col1, col2, col3
from (select col1,
count(col2) over (partition by col1) col2,
count(col3) over (partition by col1) col3
from table
)
It's something like that.