Bigquery - Count how many time a value show up in a column - sql

I have a table like this
Col1 | Col2 | Col3
A | 1 | 23
B | 3 | 23
B | 2 | 64
A | 4 | 75
C | 5 | 23
A | 6 | 12
A | 2 | 33
B | 3 | 52
A | 1 | 83
C | 5 | 24
A | 6 | 74
and I need a query that will show how many times the value in Col1 appeared:
Col1 | Col2 | Col3 | Col4
A | 1 | 23 | 6
B | 3 | 23 | 3
B | 2 | 64 | 6
A | 4 | 75 | 6
C | 5 | 23 | 2
A | 6 | 12 | 6
A | 2 | 33 | 6
B | 3 | 52 | 3
A | 1 | 83 | 6
C | 5 | 24 | 2
A | 6 | 74 | 6
How can I do it in BigQuery?

Easier with a window function
select *, count(*) over (partition by col1) as col4
from t;

You just need to use window function
select *, count(col1) over (partition by col

Related

Display Rows if Column Value is repeated

I have a SQL table that looks like this:
DATA | TEST_ID | PARAM_ID
-------------------------------------
c:\desktop\image1| 11 | 1
c:\desktop\image2| 12 | 1
c:\desktop\image3| 13 | 1
c:\desktop\image4| 14 | 1
Fail | 14 | 2
0.45 | 14 | 3
c:\desktop\image5| 15 | 1
Fail | 15 | 2
0.68 | 15 | 3
c:\desktop\image6| 16 | 1
Fail | 16 | 2
0.25 | 16 | 3
I would like to create a query where the result only shows DATA if TEST_ID has the same value repeated 3 times.
Ideal Result:
DATA | TEST_ID | PARAM_ID
-------------------------------------
c:\desktop\image4| 14 | 1
Fail | 14 | 2
0.45 | 14 | 3
c:\desktop\image5| 15 | 1
Fail | 15 | 2
0.68 | 15 | 3
c:\desktop\image6| 16 | 1
Fail | 16 | 2
0.25 | 16 | 3
Would the best approach be to use COUNT(*)>2 for the TEST_ID column?
Use window functions:
select t.*
from (select t.*, count(*) over (partition by test_id) as cnt
from t
) t
where cnt >= 3;

How to minus by period in another column

I need results in minus column like:
For example, we take first result by A = 23(1)
and we 34(2) - 23(1) = 11, then 23(3) - 23(1)...
And so on. For each category.
+--------+----------+--------+-------+
| Period | Category | Result | Minus |
+--------+----------+--------+-------+
| 1 | A | 23 | n/a |
| 1 | B | 24 | n/a |
| 1 | C | 25 | n/a |
| 2 | A | 34 | 11 |
| 2 | B | 23 | -1 |
| 2 | C | 1 | -24 |
| 3 | A | 23 | 0 |
| 3 | B | 90 | 66 |
| 3 | C | 21 | -4 |
+--------+----------+--------+-------+
Could you help me?
Could we use partitions or lead here?
SELECT
*,
Result - FIRST_VALUE(Result) OVER (PARTITION BY Category ORDER BY Period) AS Minus
FROM
yourTable
This doesn't create the hello values, but returns 0 instead. I'm not sure returning arbitrary string in an integer column makes sense, so I didn't do it.
If you really need to avoid the 0 you could just use a CASE statement...
CASE WHEN 1 = Period
THEN NULL
ELSE Result - FIRST_VALUE(Result) OVER (PARTITION BY Category ORDER BY Period)
END
Or, even more robustly...
CASE WHEN 1 = ROW_NUMBER() OVER (PARTITION BY Category ORDER BY Period)
THEN NULL
ELSE Result - FIRST_VALUE(Result) OVER (PARTITION BY Category ORDER BY Period)
END
(Apologies for any typos, etc, I'm on my phone.)
You can do:
select b.*, b.result - a.result as "minus"
from t a
join t b on b.category = a.category and a.period = 1
Result:
period category result minus
------- --------- ------- -----
1 A 23 0
1 B 24 0
1 C 25 0
2 A 34 11
2 B 23 -1
2 C 1 -24
3 A 23 0
3 B 90 66
3 C 21 -4
See running example at DB Fiddle.
Ok, just not to duplicate my question
How to do it?
If
For each new Sub period we should repeat first_value logic
+------------+----------+------------+----------+---------+
| Sub period | Period | Category | Result | Minus |
+------------+----------+------------+----------+---------+
| SA | 1 | A | 23 | n/a |
| SA | 2 | A | 34 | 11 |
| SA | 3 | A | 35 | 12 |
| SA | 4 | A | 36 | 13 |
| KS | 1 | A | 23 | n/a |
| KS | 2 | A | 21 | -2 |
| KS | 3 | A | 23 | 0 |
| KS | 4 | A | 21 | -2 |
+------------+----------+------------+----------+---------+

SQL Select and Group By clause

I have data as per the table below, I pass in a list of numbers and need the raceId where all the numbers appear in the the data column for that race.
+-----+--------+------+
| Id | raceId | data |
+-----+--------+------+
| 14 | 1 | 1 |
| 12 | 1 | 2 |
| 13 | 1 | 3 |
| 16 | 1 | 8 |
| 47 | 2 | 1 |
| 43 | 2 | 2 |
| 46 | 2 | 6 |
| 40 | 2 | 7 |
| 42 | 2 | 8 |
| 68 | 3 | 3 |
| 69 | 3 | 6 |
| 65 | 3 | 7 |
| 90 | 4 | 1 |
| 89 | 4 | 2 |
| 95 | 4 | 6 |
| 92 | 4 | 7 |
| 93 | 4 | 8 |
| 114 | 5 | 1 |
| 116 | 5 | 2 |
| 117 | 5 | 3 |
| 118 | 5 | 8 |
| 138 | 6 | 2 |
| 139 | 6 | 6 |
| 140 | 6 | 7 |
| 137 | 6 | 8 |
+-----+--------+------+
Example I pass in 1,2,7 I would get the following Id's:
2 and 4
I have tried the simple statement
SELECT * FROM table WHERE ((data = 1) or (data = 2) or (data = 7))
But I don't really understand the grouping by clause or indeed if it is the correct way of doing this.
select raceId
from yourtable
where data in (1,2,7)
group by raceId
having count(raceId) = 3 /* length(1,2,7) */
This is assuming raceId, data pair is unique. If it's not the you should use
select raceId
from (select distinct raceId, data
from yourtable
where data in(1,2,7))
group by raceId
having count(raceId) = 3
SELECT DISTINCT raceId WHERE data IN (1, 2, 7)
This is an example of a "set-within-sets" query. I like to solve these with group by and having.
select raceid
from races
where data in (1, 2, 7)
group by raceid
having count(*) = 3;

Oracle SQL: getting all row maximum number from specific multiple criteria

I have the following table named foo:
ID | D1 | D2 | D3 |
---------------------
1 | 47 | 3 | 71 |
2 | 47 | 98 | 82 |
3 | 0 | 99 | 3 |
4 | 3 | 100 | 6 |
5 | 48 | 10 | 3 |
6 | 49 | 12 | 4 |
I want to run a select query and have the results show like this
ID | D1 | D2 | D3 | Result |
------------------------------
1 | 47 | 3 | 71 | D3 |
2 | 47 | 98 | 82 | D2 |
3 | 0 | 99 | 3 | D2 |
4 | 3 | 100 | 6 | D2 |
5 | 48 | 10 | 3 | D1 |
6 | 49 | 12 | 4 | D1 |
So, basically I want to get Maximum value between D1, D2, D3 column divided by id.
As You may seen , ID 1 have D3 in the Result column since Maximum value between
D1 : D2 : D3
That Means 4 : 3 : 71 , Max value is 71. Thats Why The Result show 'D3'
Is there a way to do this in a sql query ?
Thanks!
For Oracle please try this one
select foo.*, case when greatest(d1, d2, d3) = d1 then 'D1'
when greatest(d1, d2, d3) = d2 then 'D2'
when greatest(d1, d2, d3) = d3 then 'D3'
end result
from foo
Consider the following - a normalized approach...
DROP TABLE IF EXISTS my_table;
CREATE TABLE my_table
(id INT NOT NULL
,d INT NOT NULL
,val INT NOT NULL
,PRIMARY KEY(id,d)
);
INSERT INTO my_table VALUES
(1,1,47),
(2,1,47),
(3,1,0),
(4,1,3),
(5,1,48),
(6,1,49),
(1,2,3),
(2,2,98),
(3,2,99),
(4,2,100),
(5,2,10),
(6,2,12),
(1,3,71),
(2,3,82),
(3,3,3),
(4,3,6),
(5,3,3),
(6,3,4);
SELECT * FROM my_table;
+----+---+-----+
| id | d | val |
+----+---+-----+
| 1 | 1 | 47 |
| 1 | 2 | 3 |
| 1 | 3 | 71 |
| 2 | 1 | 47 |
| 2 | 2 | 98 |
| 2 | 3 | 82 |
| 3 | 1 | 0 |
| 3 | 2 | 99 |
| 3 | 3 | 3 |
| 4 | 1 | 3 |
| 4 | 2 | 100 |
| 4 | 3 | 6 |
| 5 | 1 | 48 |
| 5 | 2 | 10 |
| 5 | 3 | 3 |
| 6 | 1 | 49 |
| 6 | 2 | 12 |
| 6 | 3 | 4 |
+----+---+-----+
SELECT x.*
FROM my_table x
JOIN
( SELECT id,MAX(val) max_val FROM my_table GROUP BY id) y
ON y.id = x.id
AND y.max_val = x.val;
+----+---+-----+
| id | d | val |
+----+---+-----+
| 1 | 3 | 71 |
| 2 | 2 | 98 |
| 3 | 2 | 99 |
| 4 | 2 | 100 |
| 5 | 1 | 48 |
| 6 | 1 | 49 |
+----+---+-----+
(This is intended as a MySQL solution - I'm not familiar with ORACLE syntax, so apologies if this doesn't port)
Does this answer your comment?
SELECT x.* , y.max_val
FROM my_table x
JOIN
( SELECT id,MAX(val) max_val FROM my_table GROUP BY id) y
ON y.id = x.id ;
+----+---+-----+---------+
| id | d | val | max_val |
+----+---+-----+---------+
| 1 | 1 | 47 | 71 |
| 1 | 2 | 3 | 71 |
| 1 | 3 | 71 | 71 |
| 2 | 1 | 47 | 98 |
| 2 | 2 | 98 | 98 |
| 2 | 3 | 82 | 98 |
| 3 | 1 | 0 | 99 |
| 3 | 2 | 99 | 99 |
| 3 | 3 | 3 | 99 |
| 4 | 1 | 3 | 100 |
| 4 | 2 | 100 | 100 |
| 4 | 3 | 6 | 100 |
| 5 | 1 | 48 | 48 |
| 5 | 2 | 10 | 48 |
| 5 | 3 | 3 | 48 |
| 6 | 1 | 49 | 49 |
| 6 | 2 | 12 | 49 |
| 6 | 3 | 4 | 49 |
+----+---+-----+---------+

SQL Joining 2 Tables

I would like to merge two tables into one and also add a counter next to that. What i have now is
SELECT [CUCY_DATA].*, [DIM].[Col1], [DIM].[Col2],
(SELECT COUNT([Cut Counter]) FROM [MSD]
WHERE [CUCY_DATA].[Cut Counter] = [MSD].[Cut Counter]
) AS [Nr Of Errors]
FROM [CUCY_DATA] FULL JOIN [DIM]
ON [CUCY_DATA].[Cut Counter] = [DIM].[Cut Counter]
This way the data is inserted but where the values don't match nulls are inserted. I want for instance this
Table CUCY_DATA
|_Cut Counter_|_Data1_|_Data2_|
| 1 | 12 | 24 |
| 2 | 13 | 26 |
| 3 | 10 | 20 |
| 4 | 11 | 22 |
Table DIM
|_Cut Counter_|_Col1_|_Col2_|
| 1 | 25 | 40 |
| 3 | 50 | 45 |
And they need to be merged into:
|_Cut Counter_|_Data1_|_Data2_|_Col1_|_Col2_|
| 1 | 12 | 24 | 25 | 40 |
| 2 | 13 | 26 | 25 | 40 |
| 3 | 10 | 20 | 50 | 45 |
| 4 | 11 | 22 | 50 | 45 |
SO THIS IS WRONG:
|_Cut Counter_|_Data1_|_Data2_|_Col1__|_Col2__|
| 1 | 12 | 24 | 25 | 40 |
| 2 | 13 | 26 | NULL | NULL |
| 3 | 10 | 20 | 50 | 45 |
| 4 | 11 | 22 | NULL | NULL |
Kind regards, Bob
How are you getting the col1 and col2 values where there is no corresponding row in your DIM table? (Rows 2 and 4). Your "wrong" result is exactly correct, that's what the outer join does.