Percentage to total in BigQuery Legacy SQL (Subqueries?)

I can't work out how to calculate a percentage of the total in BigQuery Legacy SQL.
So, I have a table:
ID | Name | Group | Mark
1 | John | A | 10
2 | Lucy | A | 5
3 | Jane | A | 7
4 | Lily | B | 9
5 | Steve | B | 14
6 | Rita | B | 11
I want to calculate percentage like this:
ID | Name | Group | Mark | Percent
1 | John | A | 10 | 10/(10+5+7)=45%
2 | Lucy | A | 5 | 5/(10+5+7)=23%
3 | Jane | A | 7 | 7/(10+5+7)=32%
4 | Lily | B | 9 | 9/(9+14+11)=26%
5 | Steve | B | 14 | 14/(9+14+11)=41%
6 | Rita | B | 11 | 11/(9+14+11)=32%
My table is quite large (3 million rows).
I thought I could do it with subqueries, but I can't use subqueries in SELECT.
Does anyone know a way to do it?

SELECT
ID, Name, [Group], Mark,
RATIO_TO_REPORT(Mark) OVER(PARTITION BY [Group]) AS percent
FROM YourTable
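RATIO_TO_REPORT returns each value's share of its partition total as a fraction between 0 and 1, so multiply by 100 if you want a percentage. See the BigQuery Legacy SQL documentation for more about RATIO_TO_REPORT.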
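For readers outside Legacy SQL, the same per-group ratio can be sketched with a standard window function: divide each Mark by SUM(Mark) OVER (PARTITION BY "Group"). The sketch below is only an illustration under assumptions: it runs against an in-memory SQLite database through Python's sqlite3 module (SQLite 3.25 or newer is needed for window functions), and the table name marks is hypothetical.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for BigQuery;
# the table name "marks" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE marks (ID INTEGER, Name TEXT, "Group" TEXT, Mark INTEGER)')
conn.executemany(
    "INSERT INTO marks VALUES (?, ?, ?, ?)",
    [
        (1, "John", "A", 10),
        (2, "Lucy", "A", 5),
        (3, "Jane", "A", 7),
        (4, "Lily", "B", 9),
        (5, "Steve", "B", 14),
        (6, "Rita", "B", 11),
    ],
)

# Each mark divided by the sum of marks in its own group, as a percentage.
# 100.0 forces floating-point division.
rows = conn.execute(
    """
    SELECT ID, Name, "Group", Mark,
           ROUND(100.0 * Mark / SUM(Mark) OVER (PARTITION BY "Group"), 1) AS percent
    FROM marks
    ORDER BY ID
    """
).fetchall()

for row in rows:
    print(row)
```

The window function computes the group total once per partition, so no correlated subquery in SELECT is needed.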

Related

SQL JOIN each id in JSON object

I have a JSON column containing col_values for another table. I want to return rows from that other table for each item in the JSON object.
If this was an INT column, I would use JOIN, but I need to JOIN every entry in the JSON object.
Take:
writers :
| id | name | projects (JSON) |
|:-- |:-----|:------------------|
| 1 | Andy | ["1","2","3","4"] |
| 2 | Hank | ["3","4","5","6"] |
| 3 | Alex | ["1","7","8","9"] |
| 4 | Joe | ["1","5","6","7"] |
| 5 | Ken | ["2","4","5","6"] |
| 6 | Zach | ["2","7","8","9"] |
| 7 | Walt | ["2","5","6","7"] |
| 8 | Mike | ["2","3","4","5"] |
cities :
| id | name | project |
|:-- |:---------|:--------|
| 1 | Boston | 1 |
| 2 | Chicago | 2 |
| 3 | Cisco | 3 |
| 4 | Seattle | 4 |
| 5 | North | 5 |
| 6 | West | 6 |
| 7 | Miami | 7 |
| 8 | York | 8 |
| 9 | Tainan | 9 |
| 10 | Seoul | 1 |
| 11 | South | 2 |
| 12 | Tokyo | 3 |
| 13 | Carlisle | 4 |
| 14 | Fugging | 5 |
| 15 | Turkey | 6 |
| 16 | Paris | 7 |
| 17 | Midguard | 8 |
| 18 | Fugging | 9 |
| 19 | Madrid | 1 |
| 20 | Salvador | 2 |
| 21 | Everett | 3 |
I need every city ordered by name for Mike (id=8).
Desired results:
This is what I'm getting and what I need to get (ORDER BY name).
Output :
| id | name | project |
|:---|:---------|:--------|
| 13 | Carlisle | 4 |
| 2 | Chicago | 2 |
| 3 | Cisco | 3 |
| 21 | Everett | 3 |
| 14 | Fugging | 5 |
| 5 | North | 5 |
| 20 | Salvador | 2 |
| 4 | Seattle | 4 |
| 11 | South | 2 |
| 12 | Tokyo | 3 |
Current query, but this can't be the best way...
SQL >
SELECT c.*
FROM cities c
WHERE EXISTS (
SELECT 1
FROM writers w
WHERE JSON_CONTAINS(
w.projects, CONCAT('\"', c.project, '\"'))
AND w.id = '8'
)
ORDER BY c.name;
DB Fiddle with the above. Is there a better way to do this "properly"?
Background
If it matters, I need to keep using JSON as the datatype because my server-side software that uses this database normally reads that column best if presented as a JSON object.
I would normally just do several database calls and iterate through that JSON object in my server-side language, but that is way too expensive with so many database calls, notwithstanding that it is even more costly to do multiple database calls for pagination.
I need all the results in a single database call. So, I need to JOIN or otherwise loop through each item in the JSON object within SQL.

Start with JOIN
Per a comment from a user, there is a better way...
SQL >
SELECT c.*
FROM writers w
JOIN cities c ON JSON_CONTAINS(w.projects, CONCAT('\"', c.project, '\"'))
WHERE w.id = '8'
ORDER BY c.name;
Output is the same...
Output :
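| id | name | project |
|:---|:---------|:--------|
| 13 | Carlisle | 4 |
| 2 | Chicago | 2 |
| 3 | Cisco | 3 |
| 21 | Everett | 3 |
| 14 | Fugging | 5 |
| 5 | North | 5 |
| 20 | Salvador | 2 |
| 4 | Seattle | 4 |
| 11 | South | 2 |
| 12 | Tokyo | 3 |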
DB Fiddle
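Another way to think about "JOIN every entry in the JSON object" is to expand the array into one row per element and then join on that. MySQL 8.0 has JSON_TABLE for this purpose; the sketch below shows the same idea with SQLite's json_each, purely because it can be run from Python's standard library. It assumes a SQLite build with the JSON functions available, and it loads only two of the writers from the question.

```python
import sqlite3

# Assumption: SQLite (with JSON functions) stands in for the real database.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE writers (id INTEGER PRIMARY KEY, name TEXT, projects TEXT);
    CREATE TABLE cities  (id INTEGER PRIMARY KEY, name TEXT, project INTEGER);
    """
)
# Only two writers from the question are loaded, to keep the sketch short.
conn.executemany(
    "INSERT INTO writers VALUES (?, ?, ?)",
    [(1, "Andy", '["1","2","3","4"]'), (8, "Mike", '["2","3","4","5"]')],
)
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [
        (1, "Boston", 1), (2, "Chicago", 2), (3, "Cisco", 3),
        (4, "Seattle", 4), (5, "North", 5), (6, "West", 6),
        (7, "Miami", 7), (8, "York", 8), (9, "Tainan", 9),
        (10, "Seoul", 1), (11, "South", 2), (12, "Tokyo", 3),
        (13, "Carlisle", 4), (14, "Fugging", 5), (15, "Turkey", 6),
        (16, "Paris", 7), (17, "Midguard", 8), (18, "Fugging", 9),
        (19, "Madrid", 1), (20, "Salvador", 2), (21, "Everett", 3),
    ],
)

# json_each turns the JSON array into one row per element, so the
# cities table can be joined with an ordinary equality condition.
rows = conn.execute(
    """
    SELECT c.id, c.name, c.project
    FROM writers AS w
    CROSS JOIN json_each(w.projects) AS j
    JOIN cities AS c ON c.project = CAST(j.value AS INTEGER)
    WHERE w.id = ?
    ORDER BY c.name
    """,
    (8,),
).fetchall()

for row in rows:
    print(row)
```

The array elements are stored as strings ("2", "3", ...), which is why the sketch casts each value to an integer before comparing it with cities.project.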

How to structure a proper SQL subquery?

I'm trying to wrap my head around how to write a proper subquery, and it's not making sense to me. Let's say I have two tables, books and chapters:
Books
+----+------------------+----------+---------------------+
| id | name | author | last_great_chapters |
+----+------------------+----------+---------------------+
| 1 | some book title | john doe | 2 |
| 2 | foo novel title | some guy | 4 |
| 3 | other book title | lol man | 3 |
+----+------------------+----------+---------------------+
Chapters
+----+---------+----------------+
| id | book_id | chapter_number |
+----+---------+----------------+
| 1 | 1 | 1 |
| 2 | 1 | 3 |
| 3 | 1 | 4 |
| 4 | 1 | 5 |
| 5 | 2 | 1 |
| 6 | 2 | 2 |
| 7 | 2 | 3 |
| 8 | 2 | 4 |
| 9 | 2 | 5 |
| 10 | 3 | 1 |
| 11 | 3 | 2 |
| 12 | 3 | 3 |
| 13 | 3 | 4 |
| 14 | 3 | 5 |
+----+---------+----------------+
How can I join the two tables and, for each book, print only as many chapter rows (sorted, with the limit taken from last_great_chapters) as the "last_great_chapters" value in the books table specifies?

If I understood correctly, you want to print the books table together with a count of the matching rows in the Chapters table?
If so, try this:
SELECT b.id, b.name, b.author, b.last_great_chapters, COUNT(c.chapter_number) AS rownumbers
FROM Books AS b
LEFT JOIN Chapters AS c ON c.chapter_number = b.last_great_chapters
GROUP BY b.id, b.name, b.author, b.last_great_chapters
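The question can also be read differently: for each book, return its last N chapters, where N is that book's last_great_chapters value. Under that reading (an assumption, since the question is ambiguous), one option is to number each book's chapters from the end with ROW_NUMBER() in a subquery and keep the rows whose number does not exceed N. The sketch runs on SQLite 3.25+ through Python's sqlite3 module; the alias names ranked and rn are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT, author TEXT,
                        last_great_chapters INTEGER);
    CREATE TABLE chapters (id INTEGER PRIMARY KEY, book_id INTEGER,
                           chapter_number INTEGER);
    """
)
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?)",
    [
        (1, "some book title", "john doe", 2),
        (2, "foo novel title", "some guy", 4),
        (3, "other book title", "lol man", 3),
    ],
)
conn.executemany(
    "INSERT INTO chapters VALUES (?, ?, ?)",
    [
        (1, 1, 1), (2, 1, 3), (3, 1, 4), (4, 1, 5),
        (5, 2, 1), (6, 2, 2), (7, 2, 3), (8, 2, 4), (9, 2, 5),
        (10, 3, 1), (11, 3, 2), (12, 3, 3), (13, 3, 4), (14, 3, 5),
    ],
)

# The subquery numbers each book's chapters from the last one backwards;
# the outer query keeps only the first last_great_chapters of them.
rows = conn.execute(
    """
    SELECT book_id, name, chapter_number
    FROM (
        SELECT b.id AS book_id, b.name, b.last_great_chapters, c.chapter_number,
               ROW_NUMBER() OVER (
                   PARTITION BY b.id ORDER BY c.chapter_number DESC
               ) AS rn
        FROM books AS b
        JOIN chapters AS c ON c.book_id = b.id
    ) AS ranked
    WHERE rn <= last_great_chapters
    ORDER BY book_id, chapter_number
    """
).fetchall()

for row in rows:
    print(row)
```

A plain LIMIT cannot vary per book, which is why the row number is compared against the per-book column instead.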

SQL - aggregation with column value as column name

For a table like the one below, I need an aggregation such that, for each unique value in one column, I get the count of occurrences of each discrete value in another column.
The input table is:
id | model | datetime | driver | distance
---|-----|------------|--------|---------
1 | S | 04/03/2009 | john | 399
2 | X | 04/03/2009 | juliet | 244
3 | 3 | 04/03/2009 | borat | 555
4 | 3 | 03/03/2009 | john | 300
5 | X | 03/03/2009 | juliet | 200
6 | X | 03/03/2009 | borat | 500
7 | S | 24/12/2008 | borat | 600
8 | X | 01/01/2009 | borat | 700
Required output:
model | john | juliet | borat
-----|--------|-------|------
S | 1 | 0 | 1
X | 0 | 2 | 2
3 | 1 | 0 | 1
One potential way is to group by model with an aggregation like
SUM(CASE WHEN driver = 'value' THEN 1 ELSE 0 END) AS value for each discrete value of the driver column. The challenge is that sometimes there are too many discrete values (around 50 in my case), and in some cases I don't even know all the possible values. Is there an alternative way to do this?

The aggregation part needs a little more work.
Here are the details:
First, calculate all the possible (model, driver) combinations.
Then use a LEFT JOIN to find which combinations have no data.
DEMO
WITH "allDrivers" as (
SELECT DISTINCT "driver"
FROM Table1
),
"allModels" as (
SELECT DISTINCT "model"
FROM Table1
),
"source" as (
SELECT d."driver", m."model"
FROM "allDrivers" d
CROSS JOIN "allModels" m
)
SELECT s."model", s."driver", COUNT(t."datetime")
FROM "source" s
LEFT JOIN table1 t
ON s."model" = t."model"
AND s."driver" = t."driver"
GROUP BY s."model", s."driver"
OUTPUT
| model | driver | count |
|-------|--------|-------|
| 3 | borat | 1 |
| 3 | john | 1 |
| 3 | juliet | 0 |
| S | borat | 1 |
| S | john | 1 |
| S | juliet | 0 |
| X | borat | 2 |
| X | john | 0 |
| X | juliet | 2 |
Then you can do the dynamic pivot
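A dynamic pivot usually means building the SUM(CASE ...) columns at run time from the values actually present in the driver column. The following is a minimal sketch of that idea, assuming the query is generated from application code; it uses Python's sqlite3 module with an in-memory database, and the helper quote_ident is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER, model TEXT, datetime TEXT, "
    "driver TEXT, distance INTEGER)"
)
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?)",
    [
        (1, "S", "04/03/2009", "john", 399),
        (2, "X", "04/03/2009", "juliet", 244),
        (3, "3", "04/03/2009", "borat", 555),
        (4, "3", "03/03/2009", "john", 300),
        (5, "X", "03/03/2009", "juliet", 200),
        (6, "X", "03/03/2009", "borat", 500),
        (7, "S", "24/12/2008", "borat", 600),
        (8, "X", "01/01/2009", "borat", 700),
    ],
)


def quote_ident(name):
    """Quote a value for use as a column alias (hypothetical helper)."""
    return '"' + name.replace('"', '""') + '"'


# Step 1: discover the distinct drivers instead of hard-coding them.
drivers = [r[0] for r in conn.execute(
    "SELECT DISTINCT driver FROM table1 ORDER BY driver"
)]

# Step 2: generate one SUM(CASE ...) column per driver. The driver value is
# bound as a parameter; only the quoted alias is interpolated into the SQL.
columns = ", ".join(
    f"SUM(CASE WHEN driver = ? THEN 1 ELSE 0 END) AS {quote_ident(d)}"
    for d in drivers
)
sql = f"SELECT model, {columns} FROM table1 GROUP BY model ORDER BY model"

cursor = conn.execute(sql, drivers)
header = [col[0] for col in cursor.description]
pivot = cursor.fetchall()

print(header)
for row in pivot:
    print(row)
```

Because the column list is generated, the same code keeps working when new drivers appear in the data.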

Selecting Multiple ID's in one Select

I have a database with entries that have to be grouped together:
id | Name | Surname | Time
1 | Michael | Kane | 3
2 | Torben | Dane | 4
3 | Dinge | Chain | 5
4 | Django | Fain | 5
5 | Juliett | Bravo | 6
6 | Django | Fain | 7
7 | Django | Fain | 3
8 | Django | Fain | 4
9 | Dinge | Chain | 4
10 | Torben | Dane | 4
Now I want to group the items while maintaining all IDs. I'm coming close with the following query, but I am losing my IDs:
SELECT id, Name, Surname, sum(Time) from Names group by(Name)
The Result of the Query is
id | Name | Surname | Time
9 | Dinge | Chain | 9
8 | Django | Fain | 19
5 | Juliett | Bravo | 6
1 | Michael | Kane | 3
10 | Torben | Dane | 8
while I would need all ids like this
ids | Name | Surname | Time
3,9 | Dinge | Chain | 9
4,6,7,8 | Django | Fain | 19
5 | Juliett | Bravo | 6
1 | Michael | Kane | 3
2,10 | Torben | Dane | 8
How can I accomplish this?

You would do this using group_concat():
select group_concat(id) as ids, name, surname, sum(time) as time
from Names
group by name, surname;
Just don't store the results back in the database. Comma-separated values are useful for returning results, but they are the wrong format for storing data in the database.
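To see the grouping on the sample data, here is a small sketch that runs a group_concat() query from Python's sqlite3 module. SQLite is an assumption made so the example can run anywhere; the question does not say which database is in use. SQLite does not guarantee the order of the concatenated ids, so the sketch does not rely on it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Names (id INTEGER PRIMARY KEY, Name TEXT, Surname TEXT, Time INTEGER)"
)
conn.executemany(
    "INSERT INTO Names VALUES (?, ?, ?, ?)",
    [
        (1, "Michael", "Kane", 3),
        (2, "Torben", "Dane", 4),
        (3, "Dinge", "Chain", 5),
        (4, "Django", "Fain", 5),
        (5, "Juliett", "Bravo", 6),
        (6, "Django", "Fain", 7),
        (7, "Django", "Fain", 3),
        (8, "Django", "Fain", 4),
        (9, "Dinge", "Chain", 4),
        (10, "Torben", "Dane", 4),
    ],
)

# One row per person: all ids joined with commas, and the summed time.
rows = conn.execute(
    """
    SELECT group_concat(id) AS ids, Name, Surname, SUM(Time) AS Time
    FROM Names
    GROUP BY Name, Surname
    ORDER BY Name
    """
).fetchall()

for row in rows:
    print(row)
```

Grouping by both Name and Surname avoids merging two different people who happen to share a first name.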

List the name of division that all employees are working on some project(s)

The full question is: list the names of the divisions in which ALL employees are working on some project(s), i.e. divisions where no employee is without a project. I'm having trouble getting an actual answer for this one, and my professor is no help in telling me what I'm doing wrong. The code I have is
select dname
from division d, employee e, workon w
where e.did = d.did
and w.empid = e.empid
and not exists
(select empid
from workon
group by empid
having count (empid) >= all(select e.empid
from employee ee
where e.did = ee.did
group by ee.empid))
group by dname
The tables I have are
Employee
| EMPID | NAME | SALARY | DID |
--------------------------------
| 1 | kevin | 32000 | 2 |
| 2 | joan | 46200 | 1 |
| 3 | brian | 37000 | 3 |
| 4 | larry | 82000 | 5 |
| 5 | harry | 92000 | 4 |
| 6 | peter | 45000 | 2 |
| 7 | peter | 68000 | 3 |
| 8 | smith | 39000 | 4 |
| 9 | chen | 71000 | 1 |
| 10 | kim | 46000 | 5 |
Division
| DID | DNAME | MANAGERID |
----------------------------------------------
| 1 | engineering | 2 |
| 2 | marketing | 1 |
| 3 | human resource | 3 |
| 4 | Research and development | 5 |
| 5 | accounting | 4 |
Workon
| PID | EMPID | HOURS |
-----------------------
| 3 | 1 | 30 |
| 2 | 3 | 40 |
| 5 | 4 | 30 |
| 6 | 6 | 60 |
| 4 | 3 | 70 |
| 2 | 4 | 45 |
| 5 | 3 | 90 |
| 3 | 3 | 100 |
| 6 | 8 | 30 |
| 4 | 4 | 30 |
| 5 | 8 | 30 |
| 6 | 7 | 30 |
| 6 | 9 | 40 |
| 5 | 9 | 50 |
| 4 | 6 | 45 |
| 2 | 7 | 30 |
| 2 | 8 | 30 |
| 2 | 9 | 30 |
| 1 | 9 | 30 |
| 1 | 8 | 30 |
| 1 | 7 | 30 |
| 1 | 5 | 30 |
| 1 | 6 | 30 |
| 2 | 6 | 30 |

You're very close. What you're trying to do is called a "correlated subquery". You're relating a key from a table you are querying to a key in a query that doesn't contribute to the candidate set, but does act as a filter in your where clause.
The key line in your code that demonstrates this is the line in the NOT EXISTS clause that says:
e.did = ee.did
Instead of trying to do this by comparing aggregate COUNT(...) results, do an outer join between the Employee and Workon tables to find out if there are any employees who aren't doing anything, then find your departments based on those employees not existing for a given department.
Here's an example query using the Oracle standard HR example tutorial tables representing the same join conditions as you have here. You probably have access to these tables wherever you're running the query, and so should anyone else here who might be interested in the answer, so they can run the query without building your tables to play around with the answer. It's a relatively trivial matter to convert the query to your tables, so I'll leave that exercise to you! :)
The final capitalized line in my query below is the join condition that makes this query a correlated subquery, like you tried to do in yours.
select
*
from
hr.departments d
where
not exists
(
select
ee.employee_id
,ee.first_name
,ee.last_name
,dd.department_id
,dd.department_name
,jj.job_id
from
hr.employees ee
,hr.departments dd
,hr.job_history jj
where
ee.department_id = dd.department_id
and ee.employee_id = jj.employee_id (+)
and jj.job_id is null
AND D.DEPARTMENT_ID = DD.DEPARTMENT_ID
)
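For anyone without the Oracle HR tables, the same pattern can be sketched directly on the tables from the question: an outer join from Employee to Workon finds employees with no project, and a correlated NOT EXISTS keeps only divisions that have no such employee. This is one possible conversion rather than a definitive answer, and it assumes SQLite via Python's sqlite3 module and uses ANSI LEFT JOIN in place of Oracle's (+) syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE division (did INTEGER PRIMARY KEY, dname TEXT, managerid INTEGER);
    CREATE TABLE employee (empid INTEGER PRIMARY KEY, name TEXT, salary INTEGER, did INTEGER);
    CREATE TABLE workon   (pid INTEGER, empid INTEGER, hours INTEGER);
    """
)
conn.executemany("INSERT INTO division VALUES (?, ?, ?)", [
    (1, "engineering", 2), (2, "marketing", 1), (3, "human resource", 3),
    (4, "Research and development", 5), (5, "accounting", 4),
])
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    (1, "kevin", 32000, 2), (2, "joan", 46200, 1), (3, "brian", 37000, 3),
    (4, "larry", 82000, 5), (5, "harry", 92000, 4), (6, "peter", 45000, 2),
    (7, "peter", 68000, 3), (8, "smith", 39000, 4), (9, "chen", 71000, 1),
    (10, "kim", 46000, 5),
])
conn.executemany("INSERT INTO workon VALUES (?, ?, ?)", [
    (3, 1, 30), (2, 3, 40), (5, 4, 30), (6, 6, 60), (4, 3, 70), (2, 4, 45),
    (5, 3, 90), (3, 3, 100), (6, 8, 30), (4, 4, 30), (5, 8, 30), (6, 7, 30),
    (6, 9, 40), (5, 9, 50), (4, 6, 45), (2, 7, 30), (2, 8, 30), (2, 9, 30),
    (1, 9, 30), (1, 8, 30), (1, 7, 30), (1, 5, 30), (1, 6, 30), (2, 6, 30),
])

# Keep a division only if it has no employee lacking a Workon row.
rows = conn.execute(
    """
    SELECT d.dname
    FROM division AS d
    WHERE NOT EXISTS (
        SELECT 1
        FROM employee AS e
        LEFT JOIN workon AS w ON w.empid = e.empid
        WHERE e.did = d.did        -- correlation with the outer query
          AND w.empid IS NULL      -- this employee works on no project
    )
    ORDER BY d.did
    """
).fetchall()
division_names = [r[0] for r in rows]
print(division_names)
```

In the sample data, joan and kim have no Workon rows, so their divisions are the ones filtered out.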
