BigQuery Standard SQL count original rows after CROSS JOIN UNNEST

I have a table with a repeated field that requires a CROSS JOIN UNNEST, and I want to get the count of the original, nested rows. For example:
SELECT studentId, COUNT(1) AS studentCount
FROM myTable
CROSS JOIN UNNEST(classes) AS classes
WHERE classes.id IN ('1', '2')
Right now, if a student is in both class 1 and class 2, that student is counted twice in studentCount.
I know I can use count(distinct(student.id)) to work around this, but that ends up being a lot slower than a simple count. It doesn't take advantage of the fact that there's exactly one row per student.
So is there any way to compute the count of the original rows before unnesting (but after the WHERE clause) while still including the UNNEST in the query?
Note this must be in Standard SQL.

I understood your "challenge" as showing only the students in classes 1 and 2 while still showing the total count of students across all classes. If that is the case, see below:
#standardSQL
SELECT studentId, studentCount
FROM myTable
CROSS JOIN (SELECT COUNT(1) studentCount FROM myTable)
WHERE studentId IN (
SELECT studentID FROM UNNEST(classes) AS classes
WHERE classes.id IN ('1', '2')
)
You can test and play with it using dummy data, as below:
#standardSQL
WITH myTable AS (
SELECT 1 AS studentId, [STRUCT<id STRING>('1'),STRUCT('2'),STRUCT('3')] AS classes UNION ALL
SELECT 2, [STRUCT<id STRING>('4'),STRUCT('5')]
)
SELECT studentId, studentCount
FROM myTable
CROSS JOIN (SELECT COUNT(1) studentCount FROM myTable)
WHERE studentId IN (
SELECT studentID FROM UNNEST(classes) AS classes
WHERE classes.id IN ('1', '2')
)
If your desired output is different from what I guessed, you might still find the above useful for calculating studentCount.

Given just the original constraints (unnesting is required and you need to count the number of students), you can use a query of this form:
SELECT studentId, (SELECT COUNT(*) FROM myTable) AS studentCount
FROM myTable
CROSS JOIN UNNEST(classes) AS classes
WHERE classes.id IN ('1', '2')
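The behaviour of the two answers above can be tried outside BigQuery with a rough sketch like the following. It is only an analogue: SQLite has no UNNEST, so the repeated `classes` field is modelled as a hypothetical child table `enrollment`, and the `student` table and its column names are assumptions made for illustration. It shows that the join repeats a student, that the scalar subquery counts every parent row once (without applying the class filter), and that a semi-join with EXISTS counts only the matching students once each.

```python
import sqlite3

# Hypothetical stand-in for the BigQuery table: the repeated `classes` field
# is modelled as a child table because SQLite has no UNNEST.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_id INTEGER PRIMARY KEY);
    CREATE TABLE enrollment (student_id INTEGER, class_id TEXT);
    INSERT INTO student VALUES (1), (2), (3);
    INSERT INTO enrollment VALUES
        (1, '1'), (1, '2'), (1, '3'),
        (2, '4'), (2, '5'),
        (3, '2');
""")

# Joining to the child rows repeats student 1 (enrolled in classes 1 and 2).
joined_rows = conn.execute("""
    SELECT COUNT(*)
    FROM student s
    JOIN enrollment e ON e.student_id = s.student_id
    WHERE e.class_id IN ('1', '2')
""").fetchone()[0]

# The scalar subquery counts all parent rows once, independent of the join
# and of the class filter.
rows = conn.execute("""
    SELECT s.student_id, e.class_id,
           (SELECT COUNT(*) FROM student) AS student_count
    FROM student s
    JOIN enrollment e ON e.student_id = s.student_id
    WHERE e.class_id IN ('1', '2')
    ORDER BY s.student_id, e.class_id
""").fetchall()

# A semi-join counts each matching student once, without DISTINCT.
matching_students = conn.execute("""
    SELECT COUNT(*)
    FROM student s
    WHERE EXISTS (SELECT 1 FROM enrollment e
                  WHERE e.student_id = s.student_id
                    AND e.class_id IN ('1', '2'))
""").fetchone()[0]

print(joined_rows, rows, matching_students)
```

Whether the total or the filtered count is wanted depends on how the question is read, which is why both are shown.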

Related

How to create an additional column with the percentages related to a count distinct statement

I'm trying to query each distinct medical speciality (e.g. oncologist, pediatrician, etc.) in a table and then count the number of times a claim (claim_id) is linked to it, which I've done using this:
select distinct specialization, count(distinct claim_id) AS Claim_Totals
from table1
group by specialization
order by Claim_Totals DESC
However, I also want to include an additional column which lists the % that each speciality makes up in the table (based on the number of claim_id related to it). So for instance, if there were 100 total claims and "cardiologist" had 25 claim_id records related to it, "oncologist" had 15, "general surgeon" had 10, and so forth, I want the output to look like this:
specialization  | Claims_Totals | PERCENTAGE
----------------|---------------|-----------
cardiologist    | 25            | 25%
oncologist      | 15            | 15%
general surgeon | 10            | 10%

Could you do this? I'm not familiar with Barbaros's syntax; if that works, it's more concise and better.
select specialization, count(distinct claim_id) AS Claim_Totals, count(distinct claim_id)/total_claims AS percentage_calc
from table1
INNER JOIN ( SELECT COUNT(DISTINCT claim_id)*1.0000 AS total_claims
FROM table1 ) TMP
ON 1 = 1
group by specialization, total_claims
order by Claim_Totals DESC
Or, with a scalar subquery instead of the join:
select specialization,
count(distinct claim_id) AS claim_by_spec,
count(distinct claim_id)/
( SELECT COUNT(DISTINCT claim_id)*1.0000
FROM table1 ) AS percentage_calc
from table1
group by specialization
order by claim_by_spec DESC

You can use sum(count(distinct)) over() to get the overall claims and use it in the denominator to get the percentage.
select specialization
,count(distinct claim_id) AS Claim_Totals
,round(100*count(distinct claim_id)/sum(count(distinct claim_id)) over(),3) as percentage
from table1
group by specialization

You can use
,concat_ws('',count(distinct claim_id),'%') as percentage
or
,concat(count(distinct claim_id),'%') as percentage
appended to the end of the select list.
By the way, the distinct before specialization in the select list is redundant, since specialization is already in the group by list.

Because you are using count(distinct), window functions are less useful. You can try:
select t1.specialization,
count(distinct t1.claim_id) AS Claim_Totals,
count(distinct t1.claim_id) / tt1.num_claims
from table1 t1 cross join
(select count(distinct claim_id) as num_claims
from table1
) tt1
group by t1.specialization
order by Claim_Totals DESC
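As a hedged illustration of the cross-join approach, here is a small sketch that runs the same shape of query against an in-memory SQLite database. The sample rows are made up, and the `100.0 *` factor is an addition of this sketch: on engines that use integer division, dividing one count by another would otherwise truncate to 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (claim_id INTEGER, specialization TEXT);
    -- Made-up sample data; claim 2 appears twice to exercise DISTINCT.
    INSERT INTO table1 VALUES
        (1, 'cardiologist'), (2, 'cardiologist'), (2, 'cardiologist'),
        (3, 'oncologist'), (4, 'general surgeon');
""")

rows = conn.execute("""
    SELECT t1.specialization,
           COUNT(DISTINCT t1.claim_id) AS claim_totals,
           -- 100.0 forces floating-point division
           ROUND(100.0 * COUNT(DISTINCT t1.claim_id) / tt1.num_claims, 1)
               AS percentage
    FROM table1 t1
    CROSS JOIN (SELECT COUNT(DISTINCT claim_id) AS num_claims
                FROM table1) tt1
    GROUP BY t1.specialization, tt1.num_claims
    ORDER BY claim_totals DESC, t1.specialization
""").fetchall()

for row in rows:
    print(row)
```

Adding `tt1.num_claims` to the GROUP BY is not needed in SQLite, but stricter engines reject a non-aggregated column that is missing from the GROUP BY list.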

How to reduce this SQL query

I have two tables, class_students and school_students. I need the total school population and each class's proportion of it, but my query runs the same subquery twice. This is my query:
SELECT t.class_name,
(SELECT COUNT (1) FROM school_students) as total_school_population,
COUNT (1) / (SELECT COUNT (1) FROM school_students)
FROM class_students t;
So, how do I optimize it?

If those two tables don't have any relationship, you can use a CROSS JOIN so that the subquery computes its result once, and then reference its column:
SELECT t.class_name,
t1.cnt total_school_population,
COUNT(1)/ t1.cnt
FROM class_students t CROSS JOIN
(
SELECT COUNT(1) cnt
from school_students
) t1
group by t.class_name,t1.cnt
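A minimal sketch of this CROSS JOIN pattern, run against SQLite with made-up data, might look as follows. The `student` column is hypothetical (the question does not list the columns), and `* 1.0` is added here so the proportion is not truncated by integer division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- The `student` column is an assumption made for this sketch.
    CREATE TABLE school_students (student TEXT);
    CREATE TABLE class_students (class_name TEXT, student TEXT);
    INSERT INTO school_students VALUES ('a'), ('b'), ('c'), ('d');
    INSERT INTO class_students VALUES
        ('math', 'a'), ('math', 'b'), ('math', 'c'), ('art', 'd');
""")

rows = conn.execute("""
    SELECT t.class_name,
           t1.cnt AS total_school_population,
           COUNT(*) * 1.0 / t1.cnt AS class_proportion
    FROM class_students t
    CROSS JOIN (SELECT COUNT(*) AS cnt FROM school_students) t1
    GROUP BY t.class_name, t1.cnt
    ORDER BY t.class_name
""").fetchall()

for row in rows:
    print(row)
```

The derived table `t1` has exactly one row, so the cross join does not multiply the rows of class_students.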

To avoid recalculating the total school count for each record, save the value in a SQL*Plus substitution variable:
COLUMN school_count NEW_VALUE school_count
SELECT count(*) school_count
FROM school_students;
Then use this variable for the division expression
SELECT class_name,
'&school_count' as total_school_population,
COUNT (1) / &school_count as class_proportion
FROM class_students
GROUP BY class_name;
If the tables contain a very large number of records, gather statistics and/or use optimizer hints such as
/*+ ALL_ROWS*/

Trying to figure out how to join these queries

I have a table named Grades with columns Student, Practical and Written. I am trying to find the top 5 students by total score on the test. Here are the queries I have; I'm not sure how to join them correctly. I am using Oracle 11g.
This gets me the total sum for each student:
SELECT Student, Practical, Written, (Practical+Written) AS SumColumn
FROM Grades;
This gets the top 5 students:
SELECT Student
FROM ( SELECT Student
, DENSE_RANK() OVER (ORDER BY Score DESC) as Score_dr
FROM Grades )
WHERE Score_dr <= 5
order by Score_dr;

The approach I prefer is data-centric, rather than row-position centric:
SELECT g.Student, g.Practical, g.Written, (g.Practical+g.Written) AS SumColumn
FROM Grades g
LEFT JOIN Grades g2 on g2.Practical+g2.Written > g.Practical+g.Written
GROUP BY g.Student, g.Practical, g.Written
HAVING COUNT(*) < 5
ORDER BY g.Practical+g.Written DESC
This works by joining each student with all students that have greater scores, then using a HAVING clause to keep only those with fewer than 5 students scoring higher, giving you the top 5.
The left join is needed to return the top scorer(s), which have no other students with greater scores to join to.
Ties are all returned, leading to more than 5 rows in the case of a tie for 5th.
By not using row-position logic, which varies from database to database, this query is also portable across most SQL databases.
Note that the ORDER BY is optional.
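The self-join approach can be checked with a small sketch like the one below, run against SQLite with invented scores. One deliberate difference from the query above: it uses COUNT(g2.student), which counts only matched rows, so the top scorer gets 0 rather than 1. The set of returned rows is the same either way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grades (student TEXT, practical INTEGER, written INTEGER);
    -- Invented data; Eve and Fay tie for 5th place with 78.
    INSERT INTO grades VALUES
        ('Ann', 50, 45), ('Bob', 48, 44), ('Cy', 40, 45), ('Dee', 42, 40),
        ('Eve', 40, 38), ('Fay', 39, 39), ('Gus', 30, 30);
""")

rows = conn.execute("""
    SELECT g.student, g.practical + g.written AS sum_column
    FROM grades g
    LEFT JOIN grades g2
           ON g2.practical + g2.written > g.practical + g.written
    GROUP BY g.student, g.practical, g.written
    -- keep students with fewer than 5 higher-scoring students
    HAVING COUNT(g2.student) < 5
    ORDER BY sum_column DESC, g.student
""").fetchall()

for row in rows:
    print(row)
```

Because Eve and Fay each have exactly four students above them, both are kept and six rows come back.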

In Oracle SQL you can do:
SELECT score.Student, Practical, Written, (Practical+Written) AS SumColumn
FROM ( SELECT Student, DENSE_RANK() OVER (ORDER BY Score DESC) AS Score_dr
FROM VOTES ) score, students
WHERE score.Score_dr <= 5
AND score.Student = students.Student
ORDER BY score.Score_dr;

You can easily include the projection of the first query in the sub-query of the second.
SELECT Student
, Practical
, Written
, tot_score
FROM (
SELECT Student
, Practical
, Written
, (Practical+Written) AS tot_score
, DENSE_RANK() OVER (ORDER BY (Practical+Written) DESC) as Score_dr
FROM Grades
)
WHERE Score_dr <= 5
order by Score_dr;
One virtue of analytic functions is that we can just use them in any query. This distinguishes them from aggregate functions, where we need to include all non-aggregate columns in the GROUP BY clause (at least with Oracle).
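For readers without an Oracle instance, the DENSE_RANK query can be approximated in SQLite, assuming a build with window-function support (SQLite 3.25 or later). The scores below are invented, and the derived-table alias `ranked` is only there for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE grades (student TEXT, practical INTEGER, written INTEGER);
    -- Invented data; Ann and Bob tie for first place with 95.
    INSERT INTO grades VALUES
        ('Ann', 50, 45), ('Bob', 47, 48), ('Cy', 40, 45), ('Dee', 42, 40),
        ('Eve', 40, 38), ('Fay', 35, 35), ('Gus', 30, 30);
""")

rows = conn.execute("""
    SELECT student, tot_score, score_dr
    FROM (
        SELECT student,
               practical + written AS tot_score,
               DENSE_RANK() OVER (ORDER BY practical + written DESC)
                   AS score_dr
        FROM grades
    ) ranked
    WHERE score_dr <= 5
    ORDER BY score_dr, student
""").fetchall()

for row in rows:
    print(row)
```

DENSE_RANK leaves no gaps after a tie, so the filter keeps the five highest distinct scores, which here means six students. RANK would instead number Cy as 3 and stop at Eve.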

Select entry of each group having exactly 1 entry

I am looking for an optimized query.
Let me show you a small example.
Let's suppose I have a table with three fields: studentId, teacherId and subject.
Now I want the rows in which a physics teacher is teaching only one student, i.e.
teacher 300 is teaching only student 3, and so on.
What I have tried so far:
select sid,tid from tabletesting with(nolock)
where tid in (select tid from tabletesting with(nolock)
where subject='physics' group by tid having count(tid) = 1)
and subject='physics'
The above query works fine, but I want a different solution in which I don't have to scan the same table twice.
I also tried using Rank() and Row_Number(), but with no result.
FYI:
This is only an example, not the actual table I am working with. My table contains a huge number of rows and columns, and the WHERE clause is also very complex (date comparisons, etc.), so I don't want to repeat the same WHERE clause in the subquery and the outer query.

You can do this with window functions. Assuming that there are no duplicate students for a given teacher (as in your sample data):
select tt.sid, tt.tid
from (select tt.*, count(*) over (partition by tid) as scnt
from TableTesting tt
) tt
where scnt = 1;
Another way to approach this, which might be more efficient, is to use an exists clause:
select tt.sid, tt.tid
from TableTesting tt
where not exists (select 1 from TableTesting tt1 where tt1.tid = tt.tid and tt1.sid <> tt.sid)

Another option is to use an analytic function:
select sid, tid, subject from
(
select sid, tid, subject, count(sid) over (partition by subject, tid) cnt
from tabletesting
) X
where cnt = 1
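The window-function and NOT EXISTS variants can be compared side by side with a sketch such as the following, run against SQLite (3.25 or later for window functions). The sample rows are invented, since the question's sample table is not reproduced here, and the `subject = 'physics'` filter is added to both queries to match what the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tabletesting (sid INTEGER, tid INTEGER, subject TEXT);
    -- Invented rows: teacher 100 has two physics students,
    -- teachers 300 and 500 have one each.
    INSERT INTO tabletesting VALUES
        (1, 100, 'physics'), (2, 100, 'physics'),
        (3, 300, 'physics'), (4, 300, 'math'),
        (5, 500, 'physics');
""")

# Single pass over the table: count rows per (subject, teacher) partition.
window_rows = conn.execute("""
    SELECT sid, tid
    FROM (
        SELECT sid, tid,
               COUNT(*) OVER (PARTITION BY subject, tid) AS cnt
        FROM tabletesting
        WHERE subject = 'physics'
    ) x
    WHERE cnt = 1
    ORDER BY sid
""").fetchall()

# Anti-join: keep rows with no other physics student for the same teacher.
not_exists_rows = conn.execute("""
    SELECT tt.sid, tt.tid
    FROM tabletesting tt
    WHERE tt.subject = 'physics'
      AND NOT EXISTS (SELECT 1 FROM tabletesting tt1
                      WHERE tt1.tid = tt.tid
                        AND tt1.subject = 'physics'
                        AND tt1.sid <> tt.sid)
    ORDER BY tt.sid
""").fetchall()

print(window_rows, not_exists_rows)
```

The window version states the filter once, which is what the question asked for; the NOT EXISTS version has to repeat it inside the subquery.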

How to find max value and its associated field values in SQL?

Say I have a list of student names and their marks. I want to find the highest mark and the student who got it. How can I write one select statement to do that?

Assuming you mean marks rather than remarks, use:
select name, mark
from students
where mark = (
select max(mark)
from students
)
This will generally result in a fairly efficient query. The subquery should be executed once only (unless your DBMS is brain-dead) and the result fed into the second query. You may want to ensure that you have an index on the mark column.
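A quick way to see what the subquery form returns when several students share the top mark is a sketch like this one, using SQLite and made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (name TEXT, mark INTEGER);
    -- Made-up data; Bob and Cy share the highest mark.
    INSERT INTO students VALUES
        ('Ann', 90), ('Bob', 95), ('Cy', 95), ('Dee', 70);
""")

# Every row whose mark equals the overall maximum is returned.
rows = conn.execute("""
    SELECT name, mark
    FROM students
    WHERE mark = (SELECT MAX(mark) FROM students)
    ORDER BY name
""").fetchall()

print(rows)
```

Both tied students come back, whereas an ORDER BY ... LIMIT 1 query would return only one of them.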

If you don't want to use a subquery:
SELECT name, remark
FROM students
ORDER BY remark DESC
LIMIT 1

select name, remarks
from student
where remarks =(select max(remarks) from student)

If you are using a database that supports windowing,
SELECT name, mark FROM
(SELECT name, mark, rank() OVER (ORDER BY mark DESC) AS rk
FROM student_marks
) AS subqry
WHERE subqry.rk=1;
This probably does not run as fast as the mark=(SELECT MAX(mark)... style query, but it would be worth checking out.

In SQL Server:
SELECT TOP 1 WITH TIES *
FROM Students
ORDER BY Mark DESC
This will return all the students that have the highest mark, whether there is just one of them or more than one. If you want only one row, drop the WITH TIES specifier. (But the actual row is not guaranteed to be always the same then.)

You can create a view and join it with the original table:
CREATE VIEW V1 AS
select id, Max(columName) AS columName
from t1
group by id
Then:
select t1.* from t1, V1
where t1.id = V1.id and t1.columName = V1.columName
This is the right approach if you need the max values along with their related info.

I recently had a need for something "kind of similar" to this post and wanted to share a technique. Say you have an Order and OrderDetail table, and you want to return info from the Order table along with the product name associated with the highest priced detail row. Here's a way to pull that off without subtables, RANK, etc. The key is to create an aggregate that combines the key and value from the detail table, then take the max of that and substring out the value you want. Note that the max is taken over strings, so prices are compared character by character; this only gives the right answer when the converted prices have the same number of digits.
create table CustOrder(ID int)
create table CustOrderDetail(OrderID int, Price money, ProdName varchar(20))
insert into CustOrder(ID) values(1)
insert into CustOrderDetail(OrderID,Price,ProdName) values(1,10,'AAA')
insert into CustOrderDetail(OrderID,Price,ProdName) values(1,50,'BBB')
insert into CustOrderDetail(OrderID,Price,ProdName) values(1,10,'CCC')
select
o.ID,
JoinAggregate=max(convert(varchar,od.price)+'*'+od.prodName),
maxProd=
SUBSTRING(
max(convert(varchar,od.price)+'*'+od.prodName)
,CHARINDEX('*',max(convert(varchar,od.price)+'*'+convert(varchar,od.prodName))
)+1,9999)
from
CustOrder o
inner join CustOrderDetail od on od.orderID = o.ID
group by
o.ID
