Drawing query trees using AVG? - sql

I was wondering how AVG and GROUP BY would be represented in query trees.
I have a query like this:
SELECT Stats.StuId, Stats.CrsAvg
FROM (SELECT T.StuId, AVG(T.Grd) AS CrsAvg
FROM Transcript T
WHERE T.Semester IN ('F2004', 'S2006')
GROUP BY T.StuId) AS Stats
WHERE Stats.CrsAvg > 3.5
The GROUP BY and AVG parts are what worry me: how are they drawn?

You still need AVG, but to simplify (and possibly speed up) the query you can avoid the nested SELECT by adding a HAVING clause:
SELECT T.StuId, AVG(T.Grd) AS CrsAvg
FROM Transcript T
WHERE T.Semester IN ('F2004', 'S2006')
GROUP BY T.StuId
HAVING AVG(T.Grd) > 3.5
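To check that the HAVING rewrite returns the same rows as the nested query, here is a small sketch that runs both against an in-memory SQLite database through Python's standard sqlite3 module. SQLite is only a stand-in for the asker's database, and the sample rows are invented for illustration.

```python
import sqlite3

# In-memory stand-in for the asker's database; the rows below are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transcript (StuId INTEGER, Semester TEXT, Grd REAL)")
conn.executemany(
    "INSERT INTO Transcript VALUES (?, ?, ?)",
    [
        (1, "F2004", 4.0), (1, "S2006", 3.8),  # average above 3.5
        (2, "F2004", 3.0), (2, "S2006", 3.6),  # average below 3.5
        (3, "F2004", 3.2), (3, "S2005", 4.0),  # S2005 is filtered out by WHERE
    ],
)

nested = """
SELECT Stats.StuId, Stats.CrsAvg
FROM (SELECT T.StuId, AVG(T.Grd) AS CrsAvg
      FROM Transcript T
      WHERE T.Semester IN ('F2004', 'S2006')
      GROUP BY T.StuId) AS Stats
WHERE Stats.CrsAvg > 3.5
ORDER BY Stats.StuId
"""

with_having = """
SELECT T.StuId, AVG(T.Grd) AS CrsAvg
FROM Transcript T
WHERE T.Semester IN ('F2004', 'S2006')
GROUP BY T.StuId
HAVING AVG(T.Grd) > 3.5
ORDER BY T.StuId
"""

nested_rows = conn.execute(nested).fetchall()
having_rows = conn.execute(with_having).fetchall()
print(nested_rows)
print(having_rows)
```

WHERE filters individual rows before grouping, while HAVING filters whole groups after AVG has been computed, which is why the average condition has to go in HAVING.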
You could also consider adding appropriate indexes to the table.
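On the query-tree part of the question: in the extended relational algebra used by many database textbooks, GROUP BY together with its aggregates is drawn as a single grouping/aggregation node, usually written with gamma (exact notation varies between textbooks). A sketch of the original nested query in that notation:

```latex
\pi_{\mathit{StuId},\,\mathit{CrsAvg}}\Big(
  \sigma_{\mathit{CrsAvg} > 3.5}\big(
    \gamma_{\mathit{StuId};\ \mathrm{AVG}(\mathit{Grd}) \rightarrow \mathit{CrsAvg}}\big(
      \sigma_{\mathit{Semester} \in \{\text{'F2004'},\ \text{'S2006'}\}}(\mathit{Transcript})
    \big)
  \big)
\Big)
```

Read as a tree, Transcript is the leaf, the Semester selection sits above it, the gamma node (grouping on StuId, computing AVG(Grd) as CrsAvg) above that, then the CrsAvg > 3.5 selection, with the projection at the root. The HAVING version produces the same tree: HAVING is simply a selection placed above the aggregation node.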

Related

Convert an Access GROUP BY query to a SQL Server query

I'm unable to convert an MS Access query to a SQL Server query without changing the GROUP BY columns, because changing them would affect the final result. The purpose of this query is to calculate the creditor and debtor balances of project accounts.
I tried rewriting it with a CTE but couldn't get a correct result. I hope someone can help me. Thanks in advance.
This is the query I want to convert:
SELECT Sum(ZABC.M) AS M, Sum(ZABC.D) AS D, ZABC.ACC_NUMBER, ZABC.PROJECT_NUMBER, [M]-[D] AS RM, [D]-[M] AS RD
FROM ZABC
GROUP BY ZABC.ACC_NUMBER, ZABC.PROJECT_NUMBER
ORDER BY ZABC.PROJECT_NUMBER;
The problem with the query is [M] and [D] in the SELECT clause: these columns must either be repeated in the GROUP BY clause or wrapped in an aggregate function. Your current GROUP BY clause gives you one row per (acc_number, project_number) tuple, so you need to choose which computation you want for D and M, since they may have several different values per group.
You did not explain the purpose of the original query. Maybe you meant:
SELECT
Sum(ZABC.M) AS M,
Sum(ZABC.D) AS D,
ZABC.ACC_NUMBER,
ZABC.PROJECT_NUMBER,
Sum(ZABC.M) - SUM(ZABC.D) AS RM,
SUM(ZABC.D) - SUM(ZABC.M) AS RD
FROM ZABC
GROUP BY ZABC.ACC_NUMBER, ZABC.PROJECT_NUMBER
ORDER BY ZABC.PROJECT_NUMBER;
There is a vast variety of aggregate functions available for you to pick from, such as MIN(), MAX(), AVG(), and so on.
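A quick way to sanity-check the rewritten query is to run it against a few rows. The sketch below uses Python's standard sqlite3 module with SQLite standing in for SQL Server (the rewrite uses only standard aggregate syntax, so it runs unchanged there); the ZABC rows are invented, and M and D are declared INTEGER so the sums are exact.

```python
import sqlite3

# Hypothetical ZABC rows: (ACC_NUMBER, PROJECT_NUMBER, M, D).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ZABC (ACC_NUMBER TEXT, PROJECT_NUMBER TEXT, M INTEGER, D INTEGER)"
)
conn.executemany(
    "INSERT INTO ZABC VALUES (?, ?, ?, ?)",
    [("A1", "P1", 100, 30), ("A1", "P1", 50, 20), ("A2", "P2", 10, 40)],
)

# Every non-grouped column is wrapped in an aggregate, so the query is valid
# in engines that, like SQL Server, reject bare [M] / [D] in a grouped SELECT.
query = """
SELECT
    SUM(ZABC.M) AS M,
    SUM(ZABC.D) AS D,
    ZABC.ACC_NUMBER,
    ZABC.PROJECT_NUMBER,
    SUM(ZABC.M) - SUM(ZABC.D) AS RM,
    SUM(ZABC.D) - SUM(ZABC.M) AS RD
FROM ZABC
GROUP BY ZABC.ACC_NUMBER, ZABC.PROJECT_NUMBER
ORDER BY ZABC.PROJECT_NUMBER
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Each output row corresponds to one (ACC_NUMBER, PROJECT_NUMBER) group, with RM and RD computed from the group totals rather than from individual rows.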

Efficient way to simultaneously calculate AVG and stddev_pop in Postgres

stddev_pop() must be calculating AVG() as part of the full standard deviation calculation (unless there's a shortcut I'm unaware of).
For context, the goal is to test for a difference of means between these two geom columns.
Is there any way to access that in order to avoid recalculating AVG()?
Here's an example query:
select
avg(st_length(cons.geom)) as source_avg_length,
avg(st_length(csn.geom)) as target_avg_length,
stddev_pop(st_length(cons.geom)) as source_std_length,
stddev_pop(st_length(csn.geom)) as target_std_length
from
received.conflation_osm_no_service cons,
received.conflation_stress_network csn ;
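The output of EXPLAIN ANALYZE makes me think that if I ask for both avg() and stddev_pop(), it will only compute the average once and reuse it. Is that right?
To combine both tables in a single result, you must aggregate before joining. The comma between the two tables in your FROM clause is a cross join, so every aggregate runs over every pairing of a source row with a target row (source rows × target rows), which is likely where the time goes: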
select *
from
(
select
avg(st_length(geom)) as source_avg_length,
stddev_pop(st_length(geom)) as source_std_length
from received.conflation_osm_no_service cons
) as src
cross join
(
select
avg(st_length(geom)) as target_avg_length,
stddev_pop(st_length(geom)) as target_std_length
from
received.conflation_stress_network csn
) as tgt;
Or, to get one row per table:
select 'source' as tablename,
avg(st_length(geom)) as avg_length,
stddev_pop(st_length(geom)) as std_length
from
received.conflation_osm_no_service cons
union all
select 'target',
avg(st_length(geom)),
stddev_pop(st_length(geom))
from
received.conflation_stress_network csn ;
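The cost difference between the comma join and aggregating first can be seen by counting rows. This sketch uses Python's sqlite3 module with two invented tables holding precomputed lengths (SQLite has neither PostGIS nor stddev_pop, so only COUNT and AVG are shown); the table names src and tgt are hypothetical stand-ins for the two geometry tables.

```python
import sqlite3

# Hypothetical stand-ins for the two geometry tables: just a length column each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (len REAL)")
conn.execute("CREATE TABLE tgt (len REAL)")
conn.executemany("INSERT INTO src VALUES (?)", [(1.0,), (2.0,), (3.0,)])
conn.executemany("INSERT INTO tgt VALUES (?)", [(10.0,), (20.0,)])

# Comma join: the aggregates see 3 * 2 = 6 paired rows instead of 3 and 2.
joined_rows = conn.execute("SELECT COUNT(*) FROM src, tgt").fetchone()[0]

# The averages survive the cross join (each value is just repeated),
# but all the work is done on the inflated row set.
joined_avgs = conn.execute("SELECT AVG(src.len), AVG(tgt.len) FROM src, tgt").fetchone()

# Aggregate first, then cross join the two one-row results.
agg_first = conn.execute("""
SELECT *
FROM (SELECT COUNT(*) AS n_src, AVG(len) AS src_avg FROM src) AS s
CROSS JOIN (SELECT COUNT(*) AS n_tgt, AVG(len) AS tgt_avg FROM tgt) AS t
""").fetchone()

print(joined_rows, joined_avgs, agg_first)
```

With real tables of thousands of rows each, the comma join makes every aggregate process millions of pairs, while the aggregate-first form reads each table once.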
Per the comments, I had been attributing the slow execution time to the multiple average aggregations, when in reality it was due to an unnecessary join.
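The asker's intuition that the mean falls out of a standard deviation calculation holds for common one-pass methods. As an illustration, here is a sketch of Welford's online algorithm, which produces the mean and the population standard deviation from a single pass over the values. This is a general technique, not a description of how PostgreSQL implements stddev_pop internally.

```python
import math


def mean_and_stddev_pop(values):
    """Welford's one-pass algorithm: return (mean, population standard deviation)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        m2 += delta * (x - mean)  # uses both the old and the updated mean
    if count == 0:
        raise ValueError("need at least one value")
    # Population standard deviation divides by n, matching stddev_pop().
    return mean, math.sqrt(m2 / count)


mean, std = mean_and_stddev_pop([2, 4, 4, 4, 5, 5, 7, 9])
print(mean, std)
```

Dividing m2 by count - 1 instead would give the sample standard deviation, which corresponds to stddev_samp().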

Division of counts doesn't work in the full query

I have a problem with a task: my division gives a different value when I run it on its own than when I use it in the full query. Say I run this:
SELECT (count(paimta))::numeric / count(distinct paimta) as average
FROM Stud.Egzempliorius;
The number I get is 2.(6)7, but when I use it in the full query, which is:
SELECT Stud.Egzempliorius.Paimta, COUNT(PAIMTA) as PaimtaKnyga
FROM Stud.Skaitytojas, Stud.Egzempliorius
WHERE Stud.Skaitytojas.Nr=Stud.Egzempliorius.Skaitytojas
GROUP BY Stud.Egzempliorius.Paimta
HAVING count(paimta) > (count(paimta))::numeric / count(distinct paimta);
its value changes because the division no longer works; instead of having
HAVING count(paimta) > (count(paimta))::numeric / count(distinct paimta);
my code effectively turns into
HAVING count(paimta) > (count(paimta))::numeric;
and both sides are equal, so I can't get the final answer. This is the database I use: https://klevas.mif.vu.lt/~baronas/dbvs/biblio/show-table.php?table=Stud.Egzempliorius
I've been struggling for 10 hours now and have finally lost my patience. So my question is: what do I have to do so that the value of this code:
SELECT (count(paimta))::numeric / count(distinct paimta) as average
FROM Stud.Egzempliorius;
doesn't change in the full query?
Your solution fails because the two queries operate on different groups of rows. The first query computes over the whole dataset, while the second one groups by paimta: within each (non-null) paimta group every row has the same paimta value, so count(distinct paimta) is 1 and the right-hand side of the HAVING condition reduces to count(paimta) itself.
One option would have been to use window functions, but as far as I know Postgres does not support count(distinct) as a window function.
I think the simplest approach is to use a subquery:
select e.paimta, count(paimta) as paimtaknyga
from stud.skaitytojas s
inner join stud.egzempliorius e on s.nr = e.skaitytojas
group by e.paimta
having count(paimta) > (
select (count(paimta))::numeric / count(distinct paimta) from stud.egzempliorius
)
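The difference between the two HAVING conditions can be reproduced on a toy dataset. This sketch uses Python's sqlite3 module, with SQLite standing in for PostgreSQL; because SQLite lacks PostgreSQL's `::numeric` cast syntax, it uses `CAST(... AS REAL)` instead. The tables are simplified, hypothetical versions of Stud.Skaitytojas and Stud.Egzempliorius with invented rows.

```python
import sqlite3

# Simplified, hypothetical versions of the two tables with invented rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE skaitytojas (nr INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE egzempliorius (paimta TEXT, skaitytojas INTEGER)")
conn.executemany("INSERT INTO skaitytojas VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO egzempliorius VALUES (?, ?)",
    [
        ("2020-01-01", 1), ("2020-01-01", 2), ("2020-01-01", 3),
        ("2020-02-01", 1), ("2020-02-01", 2),
        ("2020-03-01", 1),
    ],
)

# Broken: inside each paimta group COUNT(DISTINCT paimta) is 1,
# so the condition becomes count > count and no group passes.
broken = conn.execute("""
SELECT e.paimta, COUNT(e.paimta)
FROM skaitytojas s JOIN egzempliorius e ON s.nr = e.skaitytojas
GROUP BY e.paimta
HAVING COUNT(e.paimta) > CAST(COUNT(e.paimta) AS REAL) / COUNT(DISTINCT e.paimta)
""").fetchall()

# Fixed: the scalar subquery computes the ratio over the whole table (6 / 3 = 2.0).
fixed = conn.execute("""
SELECT e.paimta, COUNT(e.paimta) AS paimtaknyga
FROM skaitytojas s JOIN egzempliorius e ON s.nr = e.skaitytojas
GROUP BY e.paimta
HAVING COUNT(e.paimta) > (
    SELECT CAST(COUNT(paimta) AS REAL) / COUNT(DISTINCT paimta) FROM egzempliorius
)
""").fetchall()

print(broken)
print(fixed)
```

Because the subquery does not reference the outer query, it is evaluated against the whole egzempliorius table rather than against each group.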

GROUP BY using wildcards in BigQuery

I have this query:
SELECT SomeTableA.*
FROM SomeTableB
LEFT JOIN SomeTableA USING (XYZ)
GROUP BY SomeTableA.*
I know that I cannot do the GROUP BY part with wildcards. At the same time, I don't really like listing all the columns (can be up to 20) manually.
Could this be added as a new feature? Or is there an easy way to get the list of all 20 columns from SomeTableA for the GROUP BY part?
If you really have the exact query shown in your question, try the one below instead; no grouping is required (assuming every xyz in tableB has a match in tableA; otherwise the LEFT JOIN version also returns one all-NULL row):
#standardSQL
SELECT DISTINCT *
FROM `project.dataset.tableA`
WHERE xyz IN (SELECT xyz FROM `project.dataset.tableB`)
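The sketch below compares the two approaches on a toy dataset and also shows the "generate the column list" idea from the question. It uses Python's sqlite3 module, where `PRAGMA table_info` supplies the column names; this is SQLite-specific, and in BigQuery the analogous metadata lives in the INFORMATION_SCHEMA.COLUMNS view. Table and column names other than xyz are hypothetical, and every tableB row has a match, so no all-NULL row appears.

```python
import sqlite3

# Hypothetical stand-ins for SomeTableA / SomeTableB; only xyz comes from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableA (xyz INTEGER, name TEXT, score INTEGER)")
conn.execute("CREATE TABLE tableB (xyz INTEGER)")
conn.executemany(
    "INSERT INTO tableA VALUES (?, ?, ?)",
    [(1, "a", 10), (1, "a", 10), (2, "b", 20), (3, "c", 30)],
)
conn.executemany("INSERT INTO tableB VALUES (?)", [(1,), (1,), (2,)])

# Build the GROUP BY list from table metadata instead of typing 20 columns by hand.
columns = [row[1] for row in conn.execute("PRAGMA table_info(tableA)")]
group_by_list = ", ".join(f"tableA.{c}" for c in columns)

grouped = conn.execute(f"""
SELECT tableA.*
FROM tableB
LEFT JOIN tableA USING (xyz)
GROUP BY {group_by_list}
ORDER BY tableA.xyz
""").fetchall()

# The suggested rewrite: no join, no grouping, just DISTINCT with a semi-join filter.
distinct = conn.execute("""
SELECT DISTINCT *
FROM tableA
WHERE xyz IN (SELECT xyz FROM tableB)
ORDER BY xyz
""").fetchall()

print(grouped)
print(distinct)
```

Both queries collapse the duplicate (1, "a", 10) rows and drop xyz = 3, which has no match in tableB.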
As for GROUP BY with wildcards in BigQuery: that is essentially grouping by a struct, which is not supported, so you can submit a feature request if you want: https://issuetracker.google.com/issues/new?component=187149&template=0

Using a calculated field in a SQL query

I have a SQL query with a calculated field that computes the Contribution Margin. It displays, and the math works fine. The problem I'm having is that I only want to display the records in which the Contribution Margin is lower than 0.25. I know you can't use a column alias in the WHERE clause. I was wondering what the best way to go about this would be. I'm also using Visual Studio for this.
SELECT *
FROM (
SELECT m.*,
compute_margin(field1, field2) AS margin
FROM mytable m
) q
WHERE margin < 0.25
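compute_margin in the answer above is a placeholder for whatever expression or user-defined function computes the margin. To show the derived-table pattern end to end, this sketch registers a hypothetical compute_margin as a SQLite function through Python's sqlite3 `create_function`, then filters on the alias from the outer query. The table and its rows are invented.

```python
import sqlite3


def compute_margin(contribution, total):
    """Hypothetical margin function: contribution as a fraction of the total."""
    return contribution / total


conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL as compute_margin(a, b).
conn.create_function("compute_margin", 2, compute_margin)
conn.execute("CREATE TABLE mytable (id INTEGER, field1 REAL, field2 REAL)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10.0, 100.0), (2, 50.0, 100.0), (3, 20.0, 100.0)],
)

# The alias 'margin' is defined in the inner query, so the outer WHERE can use it.
rows = conn.execute("""
SELECT *
FROM (
    SELECT m.*, compute_margin(field1, field2) AS margin
    FROM mytable m
) q
WHERE margin < 0.25
ORDER BY id
""").fetchall()

print(rows)
```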
You can't use the column alias (unless you use your original query as a subquery), but you can use the expression that you're using to define the calculated value.
For example, if your query is this now:
select
contribution_amount,
total_amount,
contribution_amount / total_amount as contribution_margin
from records
You could do this:
select
contribution_amount,
total_amount,
contribution_amount / total_amount as contribution_margin
from records
where contribution_amount / total_amount < 0.25
Or this:
select * from
(
select
contribution_amount,
total_amount,
contribution_amount / total_amount as contribution_margin
from records
) as t
where contribution_margin < 0.25
(Personally I find the first version preferable, but both will likely perform the same.)
You can either repeat the calculation in the WHERE clause, wrap the query in a table expression (a CTE or derived table) and use the alias in the WHERE clause, or assign the alias in a CROSS APPLY.
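The repeated-expression and table-expression approaches can be checked against each other. This sketch runs them, plus a CTE variant, through Python's sqlite3 module with an invented records table; SQLite stands in for SQL Server here. The columns are declared REAL because with integer columns both SQL Server and SQLite would perform integer division and every margin below 1 would become 0.

```python
import sqlite3

# Hypothetical records table; REAL columns avoid integer division.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (contribution_amount REAL, total_amount REAL)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [(10.0, 100.0), (30.0, 100.0), (5.0, 50.0)],
)

# 1. Repeat the expression in WHERE.
repeat_expression = """
SELECT contribution_amount, total_amount,
       contribution_amount / total_amount AS contribution_margin
FROM records
WHERE contribution_amount / total_amount < 0.25
ORDER BY contribution_amount
"""

# 2. Derived table: the alias becomes a real column of the outer query.
derived_table = """
SELECT * FROM (
    SELECT contribution_amount, total_amount,
           contribution_amount / total_amount AS contribution_margin
    FROM records
) AS t
WHERE contribution_margin < 0.25
ORDER BY contribution_amount
"""

# 3. CTE: same idea as the derived table, written up front.
cte = """
WITH t AS (
    SELECT contribution_amount, total_amount,
           contribution_amount / total_amount AS contribution_margin
    FROM records
)
SELECT * FROM t
WHERE contribution_margin < 0.25
ORDER BY contribution_amount
"""

results = [conn.execute(q).fetchall() for q in (repeat_expression, derived_table, cte)]
for r in results:
    print(r)
```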
To give an example of the last approach:
select doubled_schema_id,*
from sys.objects
cross apply (select schema_id*2 as doubled_schema_id) c
where doubled_schema_id= 2
Two ways: either the solution that Quassnoi posted (you can also use a CTE, which is similar), or WHERE compute_margin(field1, field2) < 0.25.