Trying to create Row number for Distinct values in BigQuery

Trying to create Row number for Distinct values in BigQuery - google-bigquery

On BigQuery, I'm trying to get the row count of the distinct values to display for easy reference.
Assuming I have 1000 distinct values and I'm trying to get the 340th row of distinct value, how should i code it.
I tried to run
SELECT
DISTINCT column_2
FROM
table
and sure it turns out all the DISTINCT values of column_2. But how do i add the row number beside, and would I be able to put a WHERE for the row number?

Consider below approach
select distinct column_2
from your_table
qualify 340 = dense_rank() over(order by column_2)

Since BigQuery works parallelized there is no guarantee/need for any sorting of table rows. That also means there are no row numbers.
If you want the nth element of a query result you need to define a sorting logic beforehand. You can use navigational functions for that, or a LIMIT with OFFSET if you need one exact value
with t as (
select 'a' as val
union all select 'a'
union all select 'b'
union all select 'c'
union all select 'c'
union all select 'f'
union all select 'y'
union all select 'z'
union all select 'a'
)
select
distinct val
from t
order by 1
-- 5th element has offset 4
limit 1 offset 4

Related

Return COUNT rows of two SELECT related by UNION

I want to return the number of rows with columns from two Tables related by UNION.
I wrote this query
SELECT(
(SELECT * FROM(
(SELECT
ID_COMPTE,
TITLE,
LINK,
DATE_CREAT,
DATE_MODIF,
'TF1' AS "TYPE_FICHIER",
case when DATE_MODIF is null then DATE_CREAT else DATE_MODIF end as LAST_UPDATE FROM FIRST_TABLE FFF where ID_COMPTE= 11111111)
UNION
(SELECT
ID_COMPTE,
TITLE,
LINK,
DATE_CREAT,
DATE_MODIF,
'TF2' AS "TYPE_FICHIER",
case when DATE_MODIF is null then DATE_CREAT else DATE_MODIF end as LAST_UPDATE FROM SECOND_TABLE SSS where ID_COMPTE= 11111111)
order by LAST_UPDATE desc
) parentSelect WHERE ROWNUM BETWEEN 0 AND 2)), count(firstSelect) FROM firstSelect;
The purpose is to return the the last two rows with the count of all the rows of Table 1 and Table 2.
The query without the count works fine it's just the count that cause problem, I don't know how to insert it.
I tried also to use count() for each SELECT and SUM in the parent SELECT but it doesn't work.

This concept should work for you. Basically you select the data you want in the with clause. Then in the main select, you select your data, and the count.
WITH
base AS
(
SELECT 'TEST1' DATA FROM DUAL
UNION ALL
SELECT 'TEST2' DATA FROM DUAL
UNION ALL
SELECT 'TEST3' DATA FROM DUAL
)
SELECT (SELECT COUNT(*) FROM base) AS KOUNT, base.*
FROM base
;

You can use #Temp ( TempTable ).
Insert or manupulate the rows in it which you want to have and finally return it from stored procedure.

How to remove duplicate rows in Google BigQuery based on a unique identifier

In SQL, I use the following code to remove duplicates from a table based on a unique ID:
1. SELECT Unique_ID INTO holdkey FROM [Origination] GROUP BY Unique_ID HAVING count(*) > 1
2. SELECT DISTINCT Origination.*
INTO holddups
FROM [Origination], holdkey
WHERE [Origination].Unique_ID = holdkey.Unique_ID
3. DELETE Origination
FROM Origination, holdkey
WHERE Origination.Unique_ID = holdkey.Unique_ID
4. INSERT Origination SELECT * FROM holddups
The second process does not work on BigQuery. Regardless of how I change the query, I get errors for unrecognized columns and tables.
I obviously take out "select into" queries and just set the destination tables manually. I have SQL experience, and I know the process works. Does anyone have a sample of syntax that they use for the process of removing duplicate records based on a unique ID for BQ? Or a way to modify this that would make it run?

So, the trick is in having proper SELECT here
Below example is for BigQuery Standard SQL
#standardSQL
SELECT row[OFFSET(0)].* FROM (
SELECT ARRAY_AGG(t ORDER BY value DESC LIMIT 1) row
FROM `project.dataset.table_with_dups` t
GROUP BY id
)
you can test / play with above using dummy data as below
#standardSQL
WITH `project.dataset.table_with_dups` AS (
SELECT 1 id, 2 value UNION ALL SELECT 1,3 UNION ALL SELECT 1,4 UNION ALL
SELECT 2,5 UNION ALL
SELECT 3,6 UNION ALL SELECT 3,7 UNION ALL
SELECT 4,8 UNION ALL
SELECT 5,9 UNION ALL SELECT 5,10
)
SELECT row[OFFSET(0)].* FROM (
SELECT ARRAY_AGG(t ORDER BY value DESC LIMIT 1) row
FROM `project.dataset.table_with_dups` t
GROUP BY id
)
with result as
Row id value
1 1 4
2 2 5
3 3 7
4 4 8
5 5 10
As you can see it easily dedups table by id leaving row with largest value. Does not matter how many more other columns in that table - above still works (it does not care of schema rather than id and value)
So, now, you can just use above SELECT and insert result into new table or overwrite original, etc. - all in one shot!

How to create table with multiple rows and columns using only SELECT clause (i.e. using SELECT without FROM clause)

I know that in SQL Server, one can use SELECT clause without FROM clause, and create a table with one row and one column
SELECT 1 AS n;
But I was just wondering, is it possible to use SELECT clause without FROM clause, to create
a table with one column and multiple rows
a table with multiple columns and one row
a table with multiple columns and multiple rows
I have tried many combinations such as
SELECT VALUES(1, 2) AS tableName(n, m);
to no success.

You can do it with CTE and using union(Use union all if you want to display duplicates)
Rextester Sample for all 3 scenarios
One Column and multiple rows
with tbl1(id) as
(select 1 union all
select 2)
select * from tbl1;
One row and multiple columns
with tbl2(id,name) as
(select 1,'A')
select * from tbl2;
Multiple columns and multiple rows
with tbl3(id,name) as
(select 1,'A' union all
select 2,'B')
select * from tbl3;

-- One column, multiple rows.
select 1 as ColumnName union all select 2; -- Without FROM;
select * from ( values ( 1 ), ( 2 ) ) as Placeholder( ColumnName ); -- With FROM.
-- Multiple columns, one row.
select 1 as TheQuestion, 42 as TheAnswer; -- Without FROM.
select * from ( values ( 1, 42 ) ) as Placeholder( TheQuestion, TheAnswer ); -- With FROM.
-- Multiple columns and multiple rows.
select 1 as TheQuestion, 42 as TheAnswer union all select 1492, 12; -- Without FROM.
select * from ( values ( 1, 2 ), ( 2, 4 ) ) as Placeholder( Column1, Column2 ); -- With FROM.

You can do all that by using UNION keyword
create table tablename as select 1 as n,3 as m union select 2 as n,3 as m
In Oracle it will be dual:
create table tablename as select 1 as n,3 as m from dual union select 2 as n,3 as m from dual

You can use UNION operator:
CREATE TABLE AS SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
The UNION operator selects only distinct values by default. To allow duplicate values you can use UNION ALL.
The column names in the result-set are usually equal to the column names in the first SELECT statement in the UNION.

try this:
--1) a table with one column and multiple rows
select * into tmptable0 from(
select 'row1col1' as v1
union all
select 'row2col1' as v1
) tmp
--2) a table with multiple columns and one row
select 'row1col1' as v1, 'row1col2' as v2 into tmptable1
--3) a table with multiple columns and multiple rows
select * into tmptable2 from(
select 'row1col1' as v1, 'row1col2' as v2
union all
select 'row2col1' as v2, 'row2col2' as v2
) tmp

One can create a view and later can query it whenever required
-- table with one column and multiple rows
create view vw1 as
(
select 'anyvalue' as col1
union all
select 'anyvalue' as col1
)
select * from vw1
-- table with multiple columns and multiple rows
create view vw2 as
(
select 'anyvalue1' as col1, 'anyvalue1' as col2
union all
select 'anyvalue2' as col1, 'anyvalue2' as col2
)
select * from vw2

Counting the rows of a column where the value of a different column is 1

I am using a select count distinct to count the number of records in a column. However, I only want to count the records where the value of a different column is 1.
So my table looks a bit like this:
Name------Type
abc---------1
def----------2
ghi----------2
jkl-----------1
mno--------1
and I want the query only to count abc, jkl and mno and thus return '3'.
I wasn't able to do this with the CASE function, because this only seems to work with conditions in the same column.
EDIT: Sorry, I should have added, I want to make a query that counts both types.
So the result should look more like:
1---3
2---2

SELECT COUNT(*)
FROM dbo.[table name]
WHERE [type] = 1;
If you want to return the counts by type:
SELECT [type], COUNT(*)
FROM dbo.[table name]
GROUP BY [type]
ORDER BY [type];
You should avoid using keywords like type as column names - you can avoid a lot of square brackets if you use a more specific, non-reserved word.

I think you'll want (assuming that you wouldn't want to count ('abc',1) twice if it is in your table twice):
select count(distinct name)
from mytable
where type = 1
EDIT: for getting all types
select type, count(distinct name)
from mytable
group by type
order by type

select count(1) from tbl where type = 1

;WITH MyTable (Name, [Type]) AS
(
SELECT 'abc', 1
UNION
SELECT 'def', 2
UNION
SELECT 'ghi', 2
UNION
SELECT 'jkl', 1
UNION
SELECT 'mno', 1
)
SELECT COUNT( DISTINCT Name)
FROM MyTable
WHERE [Type] = 1

Union returns two rows if data is different, one if it's the same! Why?

Taking the following statement:
select count( 1 ) as cnt from tbl where val= 1
union
select count( 1 ) as cnt from tbl where val = 0
If the two selects return the same value the result is a single row with that value. If the selects return different values the result is two rows with the two values. Why?
I am trying to find the total count of rows using:
select sum (cnt) from
(
select count( 1 ) as cnt from tbl where value = 1
union
select count( 1 ) as cnt from tbl where value = 0
) as tbl2
which works as expected if the counts are different but gives half the value if the counts are the same...
(PS : More interested in why sql behaves this way than in a solution)

This behavior is by design. You should use UNION ALL to achieve the behavior you want. Basically, UNION performs a set union operation, removing the duplicates in the set.
http://www.fmsinc.com/free/NewTips/SQL/SQLtip5.asp

the main difference between union and union all is that union does a distinct over all fields returned. Where union all just returns and joins the various result sets

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Trying to create Row number for Distinct values in BigQuery - google-bigquery

Consider below approach select distinct column_2 from your_table qualify 340 = dense_rank() over(order by column_2)

Related

Return COUNT rows of two SELECT related by UNION

How to remove duplicate rows in Google BigQuery based on a unique identifier

How to create table with multiple rows and columns using only SELECT clause (i.e. using SELECT without FROM clause)

Counting the rows of a column where the value of a different column is 1

Union returns two rows if data is different, one if it's the same! Why?

Categories

Resources