Divide Column in Half in BigQuery

Using BigQuery, I would like to divide one column, column1, into two separate columns, column2 and column3, with 50% of the records in column1 going to column2 and the other 50% going to column3. For example, column1 has 8 records of the number 2. I'd like to create a column2 with 4 records of the number 2 and a column3 with 4 records of the number 2.
Is there a query that does this in BigQuery?
Column1
2
2
2
2
2
2
2
2
Column2
2
2
2
2
Column3
2
2
2
2

Try:
SELECT
Column1 AS Column2
FROM `my-project.my-dataset.my-table`
WHERE 1=1
QUALIFY ROW_NUMBER() OVER (ORDER BY Column1) <= (
SELECT COUNT(*)/2
FROM `my-project.my-dataset.my-table`
);
SELECT
Column1 AS Column3
FROM `my-project.my-dataset.my-table`
WHERE 1=1
QUALIFY ROW_NUMBER() OVER (ORDER BY Column1) > (
SELECT COUNT(*)/2
FROM `my-project.my-dataset.my-table`
);
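This will give you 2 results, one for Column2 and one for Column3, holding the first and second half of the data respectively, ordered by Column1. ROW_NUMBER() does not strictly require an ORDER BY inside the OVER clause, but without one the numbering is arbitrary. Even with it, rows that tie on Column1 are numbered in an arbitrary order.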
For random order, number the rows once in a temporary table and split on that number. Filtering the second half with Column1 NOT IN (SELECT * FROM a) would compare values rather than rows, so it returns nothing when Column1 holds duplicates, as in your example.
CREATE TEMP TABLE a AS (
SELECT
Column1,
ROW_NUMBER() OVER (ORDER BY RAND()) AS rn,
COUNT(*) OVER () AS total
FROM `my-project.my-dataset.my-table`
);
SELECT Column1 AS Column2 FROM a WHERE rn <= total / 2;
SELECT Column1 AS Column3 FROM a WHERE rn > total / 2;
In this case you'll get 3 results: the first is the temporary table creation and the other 2 are Column2 and Column3.
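As a rough, runnable sketch of the same idea outside BigQuery, the snippet below uses Python's built-in sqlite3 module. The table name my_table and the helper split_in_half are made up for illustration, and it assumes the bundled SQLite is 3.25 or newer (needed for window functions). It numbers every row once and splits on the row number, not on the value.

```python
import sqlite3


def split_in_half(values):
    """Return (first_half, second_half) of the rows of a one-column table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (Column1 INTEGER)")  # hypothetical table
    conn.executemany("INSERT INTO my_table VALUES (?)", [(v,) for v in values])
    # Number each row once and carry the total row count along with it.
    rows = conn.execute(
        """
        SELECT Column1,
               ROW_NUMBER() OVER (ORDER BY Column1) AS rn,
               COUNT(*) OVER () AS total
        FROM my_table
        ORDER BY rn
        """
    ).fetchall()
    conn.close()
    # Split on the row number, so duplicate values cannot affect the split.
    column2 = [v for v, rn, total in rows if rn * 2 <= total]
    column3 = [v for v, rn, total in rows if rn * 2 > total]
    return column2, column3


column2, column3 = split_in_half([2] * 8)
print(column2)
print(column3)
```

With an odd number of rows, the extra row ends up in the second half.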

Related

Get unique rows from two tables, but keep duplicates from the same table

I want to split a table into two tables (or more, but let's say two).
table_original
id column1 column2
1 1 2
2 1 3
3 1 4
4 1 4
5 1 5
We can also assume that id is a unique identifier. Now I split this table into two, by using a CREATE TABLE table1 AS SELECT * FROM table_original WHERE column2 <= 4 and CREATE TABLE table2 AS SELECT * FROM table_original WHERE column2 >= 4. Now I have these two tables:
table1
id column1 column2
1 1 2
2 1 3
3 1 4
4 1 4
table2
id column1 column2
3 1 4
4 1 4
5 1 5
How to get the same results from those two tables that I can get from the original table? If I run a query SELECT * FROM table1 UNION SELECT * FROM table2 it will be the same as SELECT * FROM table_original because of the unique id value, however if I run a query SELECT column1, column2 FROM table1 UNION SELECT column1, column2 FROM table2 it returns:
column1, column2
1 2
1 3
1 4
1 5
which is not the same as SELECT column1, column2 FROM table_original, which returns:
column1, column2
1 2
1 3
1 4
1 4
1 5
Duplicates from the same table are removed. However, if I wanted to let's say do a count on duplicates, the results will be different, which is bad. So is there a way to do a UNION type operation but keep duplicates that are found in the same table?

I'm not sure what you are trying to achieve, but you need to use UNION ALL:
SELECT column1, column2 FROM table1
UNION ALL
SELECT column1, column2 FROM table2
UNION ALL keeps the duplicates.
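To see what each operator does with the sample data from the question, here is a small sketch using Python's sqlite3 module (SQLite stands in for whatever database you use; count_rows is a helper name invented for this example). Note that UNION ALL keeps every row from both tables, so the rows with id 3 and 4, which live in both tables, are counted twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_original (id INTEGER PRIMARY KEY, column1 INTEGER, column2 INTEGER);
    INSERT INTO table_original VALUES (1, 1, 2), (2, 1, 3), (3, 1, 4), (4, 1, 4), (5, 1, 5);
    CREATE TABLE table1 AS SELECT * FROM table_original WHERE column2 <= 4;
    CREATE TABLE table2 AS SELECT * FROM table_original WHERE column2 >= 4;
    """
)


def count_rows(set_operator):
    """Count the rows produced by combining both tables with the given operator."""
    query = (
        "SELECT COUNT(*) FROM ("
        "SELECT column1, column2 FROM table1 "
        f"{set_operator} "
        "SELECT column1, column2 FROM table2)"
    )
    return conn.execute(query).fetchone()[0]


original_count = conn.execute("SELECT COUNT(*) FROM table_original").fetchone()[0]
union_count = count_rows("UNION")          # removes every duplicate (column1, column2) pair
union_all_count = count_rows("UNION ALL")  # keeps everything, including shared rows
print(original_count, union_count, union_all_count)
```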

The UNION on whole rows in your solution will be painfully expensive for big tables (and wide rows). And it fails outright with any column type that doesn't support the equality operator (like json). See:
UNION ALL on JSON data type
This query is substantially faster, making use of the unique index on table1(id). (Create that index if you don't have it!)
SELECT column1, column2
FROM table1 -- bigger table first to micro-optimize some more
UNION ALL
SELECT column1, column2
FROM table2 t2
WHERE NOT EXISTS (SELECT FROM table1 WHERE id = t2.id)
See:
Select rows which are not present in other table
About UNION ALL (as opposed to just UNION):
Is order preserved after UNION in PostgreSQL?
Combining 3 SELECT statements to output 1 table
The question remains: Why keep completely duplicate rows in multiple tables?

I've figured out the answer.
To keep the duplicates found in the same table, but eliminate everything else, I used a query SELECT column1, column2 FROM (SELECT * FROM table1 UNION SELECT * FROM table2) AS t;
This way the UNION uses the unique id values to eliminate real duplicates, and after that I just filter the result to get the columns I need.
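For comparison, this sketch checks two ways of getting the original rows back against the sample data: UNION ALL with a NOT EXISTS filter on id, and UNION on whole rows followed by a projection. It runs on SQLite through Python's sqlite3 module, so it says nothing about relative speed on a real server. The inner SELECT 1 replaces PostgreSQL's empty select list, which SQLite does not accept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_original (id INTEGER PRIMARY KEY, column1 INTEGER, column2 INTEGER);
    INSERT INTO table_original VALUES (1, 1, 2), (2, 1, 3), (3, 1, 4), (4, 1, 4), (5, 1, 5);
    CREATE TABLE table1 AS SELECT * FROM table_original WHERE column2 <= 4;
    CREATE TABLE table2 AS SELECT * FROM table_original WHERE column2 >= 4;
    """
)

expected = sorted(conn.execute("SELECT column1, column2 FROM table_original").fetchall())

# Approach 1: keep all of table1, add only those table2 rows whose id is not in table1.
not_exists_rows = sorted(conn.execute(
    """
    SELECT column1, column2 FROM table1
    UNION ALL
    SELECT column1, column2 FROM table2 AS t2
    WHERE NOT EXISTS (SELECT 1 FROM table1 WHERE table1.id = t2.id)
    """
).fetchall())

# Approach 2: deduplicate whole rows (id included) first, then pick the columns.
union_then_project_rows = sorted(conn.execute(
    """
    SELECT column1, column2
    FROM (SELECT * FROM table1 UNION SELECT * FROM table2) AS t
    """
).fetchall())

print(not_exists_rows == expected, union_then_project_rows == expected)
```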

SQL sum values in columns for each row

I have the following table:
column1 | column2 | column3
1 3 4
5 7 6
How do I sum the values of, say, column2 and column3 in each row?
The expected result is:
res
7
13

You can do maths within a select statement, so the following will work:
SELECT column2 + column3 AS res FROM table
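A quick way to try this is with Python's sqlite3 module. The table name example is invented for the sketch, and the ORDER BY is only there to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE example (column1 INTEGER, column2 INTEGER, column3 INTEGER);
    INSERT INTO example VALUES (1, 3, 4), (5, 7, 6);
    """
)

# The addition is evaluated once per row; no aggregate function is involved.
res = [
    row[0]
    for row in conn.execute(
        "SELECT column2 + column3 AS res FROM example ORDER BY column1"
    )
]
print(res)
```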

This works in PostgreSQL:
select sum(col2+col3) from (
select col1, col2,col3,row_number() over() as rows from column_sum ) as foo
group by rows order by rows;

MS SQL count AS to new column

Good day,
I have a problem with counting rows in my table.
TABLE1
COLUMN1 COLUMN2
3 jjd
5 jd
3 jjd
4 kg
5 jd
48 gjh
446 djj
… …
I need
TABLE1
COLUMN1 COLUMN2 COLUMN3
3 jjd 2
5 jd 2
4 kg 1
48 gjh 1
446 djj 1
... ... …
I am trying this, but it is not working:
SELECT * , COUNT(Column1) as column3 FROM TABLE1
Thanks for your help with my counting.

Use GROUP BY to count the rows in each group, and ORDER BY with DESC to sort the groups by that count:
SELECT COLUMN1, COLUMN2, COUNT(Column1) AS COLUMN3
FROM Table1
GROUP BY COLUMN1, COLUMN2
ORDER BY COUNT(Column1) DESC
Output
COLUMN1 COLUMN2 COLUMN3
5 jd 2
3 jjd 2
4 kg 1
446 djj 1
48 gjh 1
SQL Fiddle: http://sqlfiddle.com/#!6/89f49/4/0
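If the fiddle link is unavailable, the grouping can be reproduced locally. This is a sketch on SQLite via Python's sqlite3 module, not MS SQL, using the seven rows shown in the question. The extra COLUMN1 sort key is my addition so that ties come out in a fixed order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (COLUMN1 INTEGER, COLUMN2 TEXT)")
conn.executemany(
    "INSERT INTO TABLE1 VALUES (?, ?)",
    [(3, "jjd"), (5, "jd"), (3, "jjd"), (4, "kg"), (5, "jd"), (48, "gjh"), (446, "djj")],
)

# One output row per distinct (COLUMN1, COLUMN2) pair, with its row count.
grouped = conn.execute(
    """
    SELECT COLUMN1, COLUMN2, COUNT(*) AS COLUMN3
    FROM TABLE1
    GROUP BY COLUMN1, COLUMN2
    ORDER BY COLUMN3 DESC, COLUMN1
    """
).fetchall()
for row in grouped:
    print(row)
```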

Try using GROUP BY:
SELECT COLUMN1,
COLUMN2,
COUNT(Column1) AS COLUMN3
FROM cte_TABLE1
GROUP BY COLUMN1, COLUMN2
ORDER BY COLUMN1
Or use a window function:
SELECT DISTINCT COLUMN1,
COLUMN2,
COUNT(Column1) OVER (PARTITION BY COLUMN1, COLUMN2 ORDER BY COLUMN1) AS COLUMN3
FROM cte_TABLE1
Result:
COLUMN1 COLUMN2 column3
-----------------------
3 jjd 2
4 kg 1
5 jd 2
48 gjh 1
446 djj 1

Using OVER, we can achieve it easily. A window function is evaluated after GROUP BY, so combining the two would count one row per group; use DISTINCT instead:
SELECT DISTINCT COLUMN1,
COLUMN2,
COUNT(Column1) OVER (PARTITION BY COLUMN1, COLUMN2) AS COLUMN3
FROM cte_TABLE1
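How a window count interacts with GROUP BY is easy to check on a small table. The sketch below runs on SQLite (3.25 or newer assumed) through Python's sqlite3 module, so treat it as an illustration of standard evaluation order and not as MS SQL output. It compares a window count with DISTINCT against a window count applied on top of GROUP BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (COLUMN1 INTEGER, COLUMN2 TEXT)")
conn.executemany(
    "INSERT INTO TABLE1 VALUES (?, ?)",
    [(3, "jjd"), (5, "jd"), (3, "jjd"), (4, "kg"), (5, "jd"), (48, "gjh"), (446, "djj")],
)

# Window count over the raw rows; DISTINCT then collapses the repeated rows.
windowed = conn.execute(
    """
    SELECT DISTINCT COLUMN1, COLUMN2,
           COUNT(*) OVER (PARTITION BY COLUMN1, COLUMN2) AS COLUMN3
    FROM TABLE1
    ORDER BY COLUMN1
    """
).fetchall()

# The window runs after GROUP BY, so each partition holds exactly one row.
grouped_then_windowed = conn.execute(
    """
    SELECT COLUMN1, COLUMN2,
           COUNT(*) OVER (PARTITION BY COLUMN1, COLUMN2) AS COLUMN3
    FROM TABLE1
    GROUP BY COLUMN1, COLUMN2
    ORDER BY COLUMN1
    """
).fetchall()

print(windowed)
print(grouped_then_windowed)
```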

SQL Table and Stored Procedure Creation of Total Column

I have created a table in SQL and am trying to retrieve a row that returns the total of all rows in the value column, with 'Total' as its description. Is this done in the stored procedure?
EX: Table1
Column1 Column2 Desc ValueColumn
1 6/30/14 One 11.1
2 6/30/14 Two 10.2
3 6/30/14 Three 9.0
I want the table to end looking like the following:
Table1
Column1 Column2 Desc ValueColumn
1 6/30/14 One 11.1
2 6/30/14 Two 10.2
3 6/30/14 Three 9.0
4 6/30/14 Total 30.3
Can you please help with how to do this?
Thank you.

Here is a SQL statement that does what you ask. Desc is a reserved word, so it is written as [Desc] here (SQL Server bracket quoting); replace Table1 with your table name:
SELECT Column1, Column2, [Desc], ValueColumn
FROM
(
SELECT 1 AS rolluporder, Column1, Column2, [Desc], ValueColumn
FROM Table1
UNION ALL
SELECT 2 AS rolluporder, NULL AS Column1, NULL AS Column2,
'Total' AS [Desc], SUM(ValueColumn) AS ValueColumn
FROM Table1
) T
ORDER BY rolluporder, Column1
This looks a little different but I expect it is what you really want:
Column1 Column2 Desc ValueColumn
1 6/30/14 One 11.1
2 6/30/14 Two 10.2
3 6/30/14 Three 9.0
null null Total 30.3
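The rollup pattern can be tried without a server. This sketch uses SQLite through Python's sqlite3 module, which is an assumption on my part since the question does not name a database. There the reserved word Desc is quoted with double quotes, and the date is stored as plain text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table1 (Column1 INTEGER, Column2 TEXT, "Desc" TEXT, ValueColumn REAL);
    INSERT INTO Table1 VALUES
        (1, '6/30/14', 'One', 11.1),
        (2, '6/30/14', 'Two', 10.2),
        (3, '6/30/14', 'Three', 9.0);
    """
)

# rolluporder keeps the detail rows (1) ahead of the single total row (2).
rows = conn.execute(
    """
    SELECT Column1, Column2, "Desc", ValueColumn
    FROM (
        SELECT 1 AS rolluporder, Column1, Column2, "Desc", ValueColumn FROM Table1
        UNION ALL
        SELECT 2, NULL, NULL, 'Total', SUM(ValueColumn) FROM Table1
    ) AS t
    ORDER BY rolluporder, Column1
    """
).fetchall()
for row in rows:
    print(row)
```

The total is a floating-point sum, so it may print with rounding noise instead of exactly 30.3.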

-- Next free Column1 value, used as the id of the Total row
DECLARE @IDC AS INT
SET @IDC = (
SELECT MAX(Column1) + 1
FROM [yourtable]
)
SELECT *
FROM [yourtable]
UNION ALL
SELECT @IDC, Column2, 'Total', SUM(ValueColumn)
FROM [yourtable]
GROUP BY Column2

SQLite select and group matches across columns

I have an SQLite table called match that has two columns: column1 and column2 that contain integer values:
column1 column2
------------------
5 6
6 8
8 9
90 91
1 20
10 20
I want to match duplicate numbers found in either column and join them, including each match's second value, so that my search result would be:
5, 6, 8, 9
1, 20, 10
(notice that 90 and 91 have no matches and therefore are not included).
My 'guess' at making this is:
SELECT column1, column2
FROM match
WHERE column2
IN (SELECT column1
FROM match
GROUP BY column1 HAVING (COUNT(column1) > 0))
UNION
SELECT column1, column2
FROM match
WHERE column1
IN (SELECT column2
FROM match
GROUP BY column1 HAVING (COUNT(column2) > 0))
UNION
SELECT column1, column2
FROM match
WHERE column1
IN (SELECT column1
FROM match
GROUP BY column1 HAVING (COUNT(column1) > 1))
UNION
SELECT column1, column2
FROM match
WHERE column2
IN (SELECT column2
FROM match
GROUP BY column2 HAVING (COUNT(column2) > 1))
and the result is almost what I need:
5 6
6 8
8 9
1 20
10 20
But what I really need is to have the result grouped somehow. For example:
(5, 6, 8, 9) (1, 10, 20)
Is this possible? And is my SQL attempt over-complicated?

I think this is what you want: http://sqlfiddle.com/#!7/05747/9
SELECT column1 as newColumn
FROM match WHERE column1 in (
SELECT myColumn
FROM(
SELECT count(*) as cnt, myColumn
FROM (
SELECT column1 as myColumn
FROM match
UNION ALL
SELECT column2 as myColumn
FROM match
) x
GROUP BY myColumn
HAVING cnt > 1
) y
) OR column2 in (
SELECT myColumn
FROM(
SELECT count(*) as cnt, myColumn
FROM (
SELECT column1 as myColumn
FROM match
UNION ALL
SELECT column2 as myColumn
FROM match
) x
GROUP BY myColumn
HAVING cnt > 1
) y
)
UNION
SELECT column2 as newColumn
FROM match WHERE column1 in (
SELECT myColumn
FROM(
SELECT count(*) as cnt, myColumn
FROM (
SELECT column1 as myColumn
FROM match
UNION ALL
SELECT column2 as myColumn
FROM match
) x
GROUP BY myColumn
HAVING cnt > 1
) y
) OR column2 in (
SELECT myColumn
FROM(
SELECT count(*) as cnt, myColumn
FROM (
SELECT column1 as myColumn
FROM match
UNION ALL
SELECT column2 as myColumn
FROM match
) x
GROUP BY myColumn
HAVING cnt > 1
) y
)
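The query above returns the matching numbers as one flat column. If the grouping itself is needed, one possible approach (a different technique from the answer above, offered only as a sketch) is to treat each row as a link between two numbers and walk the links with a recursive CTE. The CTE names edges, reach and component are invented here, and every group is labeled by its smallest member. It assumes a group of interest has more than two members, which is what excludes the lone pair 90, 91. The reach step grows quickly on large tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE "match" (column1 INTEGER, column2 INTEGER);
    INSERT INTO "match" VALUES (5, 6), (6, 8), (8, 9), (90, 91), (1, 20), (10, 20);
    """
)

rows = conn.execute(
    """
    WITH RECURSIVE
    -- Every pair in both directions, so links can be followed either way.
    edges(a, b) AS (
        SELECT column1, column2 FROM "match"
        UNION
        SELECT column2, column1 FROM "match"
    ),
    -- All numbers reachable from each starting number (UNION stops the loop).
    reach(root, node) AS (
        SELECT a, a FROM edges
        UNION
        SELECT reach.root, edges.b
        FROM reach JOIN edges ON edges.a = reach.node
    ),
    -- Label each number with the smallest number that can reach it.
    component(node, label) AS (
        SELECT node, MIN(root) FROM reach GROUP BY node
    )
    SELECT label, node
    FROM component
    WHERE label IN (SELECT label FROM component GROUP BY label HAVING COUNT(*) > 2)
    ORDER BY label, node
    """
).fetchall()

# Collect the (label, node) rows into one list per group.
groups = {}
for label, node in rows:
    groups.setdefault(label, []).append(node)
result = list(groups.values())
print(result)
```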
