Big Query - String Function - google-bigquery

I am very new to the BigQuery platform. I want to take the following strings:
SOCKETIOEXCEPTION##APS.COM, NULLPOINTEREXCEPTION##RSJAVA.COM, CLASSCASTEEXCEPTION##MPS.COM
and get this as a result: SOCKETIOEXCEPTION, NULLPOINTEREXCEPTION, CLASSCASTEEXCEPTION
I want to extract the part of each string that comes before the ## characters, and then group by it to count the number of rows for each value, such as SOCKETIOEXCEPTION, NULLPOINTEREXCEPTION, CLASSCASTEEXCEPTION.
Sample db details
How do I write this query?

Below is for BigQuery Standard SQL
#standardSQL
SELECT SPLIT(line, '##')[OFFSET(0)] type, COUNT(1) cnt
FROM `project.dataset.table`
GROUP BY type
You can test and play with the above using sample data from your question, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 'SOCKETIOEXCEPTION##111' line UNION ALL
SELECT 'SOCKETIOEXCEPTION##222' UNION ALL
SELECT 'SOCKETIOEXCEPTION##333' UNION ALL
SELECT 'NULLPOINTEREXCEPTION##444' UNION ALL
SELECT 'NULLPOINTEREXCEPTION##555' UNION ALL
SELECT 'CLASSCASTEEXCEPTION##666' UNION ALL
SELECT 'CLASSCASTEEXCEPTION##777' UNION ALL
SELECT 'CLASSCASTEEXCEPTION##888'
)
SELECT SPLIT(line, '##')[OFFSET(0)] type, COUNT(1) cnt
FROM `project.dataset.table`
GROUP BY type
with result
Row type cnt
1 SOCKETIOEXCEPTION 3
2 NULLPOINTEREXCEPTION 2
3 CLASSCASTEEXCEPTION 3
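As a possible alternative to SPLIT, the same prefix can be pulled out with REGEXP_EXTRACT. This is only a sketch: it assumes the same line column and placeholder table name used above, and the ORDER BY is an addition that was not requested. Unlike SPLIT, it yields NULL for rows that contain no ## at all, which may or may not be what you want.

```sql
#standardSQL
-- Sketch: capture everything before the first '##' in line.
-- Rows without '##' produce NULL here (SPLIT would return the whole string).
SELECT REGEXP_EXTRACT(line, r'^(.*?)##') type, COUNT(1) cnt
FROM `project.dataset.table`
GROUP BY type
ORDER BY cnt DESC
```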

Related

use distinct and order by in STRING_AGG function

I am trying to string_agg a column while at the same time ordering it and showing only unique values. Consider the following demo. Is there a syntax issue, or is this simply not possible with the method I am using?
SELECT STRING_AGG(DISTINCT foo.a::TEXT,',' ORDER BY foo.a DESC)
FROM (
SELECT 1 As a
UNION ALL
SELECT 1
UNION ALL
SELECT 1
UNION ALL
SELECT 2
) AS foo
[2019-11-22 13:29:32] [42P10] ERROR: in an aggregate with DISTINCT, ORDER BY expressions must appear in argument list
[2019-11-22 13:29:32] Position: 53
The error message is quite clear. The expression that you use in the ORDER BY clause must also appear in the aggregated part.
You could do:
SELECT STRING_AGG(DISTINCT foo.a::TEXT, ',' ORDER BY foo.a::TEXT DESC)
FROM (
SELECT 1 As a
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 2
) AS foo
While this works, the problem with this solution is that it orders the numbers as strings, which do not follow the same ordering rules. String-wise, 10 is less than 2.
Another option is to use arrays: first, ARRAY_AGG() can be used to aggregate the numbers (with proper, numeric ordering), then you can turn the array into a comma-separated list of strings with ARRAY_TO_STRING().
SELECT ARRAY_TO_STRING(ARRAY_AGG(DISTINCT a ORDER BY a DESC), ',')
FROM (
SELECT 1 As a
UNION ALL SELECT 1
UNION ALL SELECT 1
UNION ALL SELECT 2
) AS foo
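To make the ordering difference concrete, here is a small PostgreSQL sketch that runs both approaches side by side on the values 2 and 10. The VALUES list and the column aliases text_order and num_order are made up for illustration.

```sql
-- Sketch: text ordering versus numeric ordering of the same distinct values.
SELECT
  -- Ordered as text, descending: '2' sorts after '10', so the result is '2,10'
  STRING_AGG(DISTINCT foo.a::TEXT, ',' ORDER BY foo.a::TEXT DESC) AS text_order,
  -- Ordered as integers, descending: the result is '10,2'
  ARRAY_TO_STRING(ARRAY_AGG(DISTINCT foo.a ORDER BY foo.a DESC), ',') AS num_order
FROM (VALUES (2), (10), (10)) AS foo(a);
```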

sql - single line per distinct values in a given column

Is there a way using SQL, in BigQuery more specifically, to get one line per unique value in a given column?
I know that this is possible using a sequence of UNION queries, with as many unions as there are distinct values in the column of interest, but I'm wondering if there is a better way to do it.
You can use row_number():
select t.* except (seqnum)
from (select t.*, row_number() over (partition by col order by col) as seqnum
from t
) t
where seqnum = 1;
This returns an arbitrary row. You can control which row by adjusting the order by.
Another fun solution in BigQuery uses structs:
select array_agg(t limit 1)[ordinal(1)].*
from t
group by col;
You can add an order by (order by X limit 1) if you want a particular row.
Here is the same query, just formatted more readably:
select tab.* except(seqnum)
from (
select *, row_number() over (partition by column_x order by column_x) as seqnum
from `project.dataset.table`
) as tab
where seqnum = 1
Below is for BigQuery Standard SQL
#standardSQL
SELECT AS VALUE ANY_VALUE(t)
FROM `project.dataset.table` t
GROUP BY col
You can test and play with the above using dummy data, as in the example below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 id, 1 col UNION ALL
SELECT 2, 1 UNION ALL
SELECT 3, 1 UNION ALL
SELECT 4, 2 UNION ALL
SELECT 5, 2 UNION ALL
SELECT 6, 3
)
SELECT AS VALUE ANY_VALUE(t)
FROM `project.dataset.table` t
GROUP BY col
with result
Row id col
1 1 1
2 4 2
3 6 3
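Another way to express the row_number() idea in BigQuery is the QUALIFY clause, which filters on a window function without a wrapping subquery. The following is a hedged sketch on the same dummy data; ordering by id is an assumption chosen here to make the kept row deterministic.

```sql
#standardSQL
-- Sketch: keep one row per col, choosing the lowest id in each group.
WITH `project.dataset.table` AS (
  SELECT 1 id, 1 col UNION ALL
  SELECT 2, 1 UNION ALL
  SELECT 3, 1 UNION ALL
  SELECT 4, 2 UNION ALL
  SELECT 5, 2 UNION ALL
  SELECT 6, 3
)
SELECT *
FROM `project.dataset.table`
WHERE TRUE  -- harmless filter, kept in case QUALIFY needs an accompanying WHERE, GROUP BY, or HAVING
QUALIFY ROW_NUMBER() OVER (PARTITION BY col ORDER BY id) = 1
```

With this ordering, the rows kept are ids 1, 4, and 6, one per distinct col.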

Dynamic Pivoting in Oracle

Hi, I want to apply dynamic pivoting on a table with the following structure:
ID Type Amount
--- ------ ------
1 AB 50
2 PQR 100
3 AB 60
4 PQR 120
I want the result in the below format:
In my table, the values of the Type column change every month, so I want to pivot the table dynamically to get the desired result. I tried pivoting as per the syntax, but whenever I place a sub-query in the IN operator of the pivot, it gives an error. I am using Oracle 10g.
Can anyone please assist me with this issue? Thanks.
Select * from(
Select ID , Type, Value
from mytable)x
pivot(sum(Value) for Type IN (Select distinct Type from myTable))
If you want dynamic results, you may prefer the XML option of pivoting:
with t(ID, Type, Amount) as
(
select 1,'AB',50 from dual union all
select 2,'PQR',100 from dual union all
select 3,'AB',60 from dual union all
select 4,'PQR',120 from dual
)
select *
from(
select ID , Type, Amount
from t )
pivot xml(
sum(Amount) as sum_amount for (type)
in (Select distinct Type from t)
);
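For comparison, here is a sketch of the non-XML form, where the Type values must be listed as literals in the IN clause; this is why a sub-query there raises an error. It assumes the two Type values from the sample data are known in advance and that the PIVOT clause is available, which to my knowledge means Oracle 11g or later. ID is left out of the inner select so the sums are totals per Type rather than per row.

```sql
-- Sketch: static pivot with the Type values written out by hand.
with t(ID, Type, Amount) as
(
  select 1,'AB',50 from dual union all
  select 2,'PQR',100 from dual union all
  select 3,'AB',60 from dual union all
  select 4,'PQR',120 from dual
)
select *
from (
  select Type, Amount
  from t )
pivot (
  sum(Amount) for Type
  in ('AB' as ab, 'PQR' as pqr)  -- literals only; a sub-query is not accepted here
);
```

On the sample rows this gives a single row with AB = 110 and PQR = 220.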

How to remove duplicate rows in Google BigQuery based on a unique identifier

In SQL, I use the following code to remove duplicates from a table based on a unique ID:
1. SELECT Unique_ID INTO holdkey FROM [Origination] GROUP BY Unique_ID HAVING count(*) > 1
2. SELECT DISTINCT Origination.*
INTO holddups
FROM [Origination], holdkey
WHERE [Origination].Unique_ID = holdkey.Unique_ID
3. DELETE Origination
FROM Origination, holdkey
WHERE Origination.Unique_ID = holdkey.Unique_ID
4. INSERT Origination SELECT * FROM holddups
The second process does not work on BigQuery. Regardless of how I change the query, I get errors for unrecognized columns and tables.
I obviously take out "select into" queries and just set the destination tables manually. I have SQL experience, and I know the process works. Does anyone have a sample of syntax that they use for the process of removing duplicate records based on a unique ID for BQ? Or a way to modify this that would make it run?
So, the trick is in having the proper SELECT here.
Below example is for BigQuery Standard SQL
#standardSQL
SELECT row[OFFSET(0)].* FROM (
SELECT ARRAY_AGG(t ORDER BY value DESC LIMIT 1) row
FROM `project.dataset.table_with_dups` t
GROUP BY id
)
You can test and play with the above using dummy data, as below:
#standardSQL
WITH `project.dataset.table_with_dups` AS (
SELECT 1 id, 2 value UNION ALL SELECT 1,3 UNION ALL SELECT 1,4 UNION ALL
SELECT 2,5 UNION ALL
SELECT 3,6 UNION ALL SELECT 3,7 UNION ALL
SELECT 4,8 UNION ALL
SELECT 5,9 UNION ALL SELECT 5,10
)
SELECT row[OFFSET(0)].* FROM (
SELECT ARRAY_AGG(t ORDER BY value DESC LIMIT 1) row
FROM `project.dataset.table_with_dups` t
GROUP BY id
)
with result as
Row id value
1 1 4
2 2 5
3 3 7
4 4 8
5 5 10
As you can see, it easily dedups the table by id, leaving the row with the largest value. It does not matter how many other columns there are in that table; the above still works (it does not care about the schema beyond id and value).
So now you can just use the above SELECT and insert the result into a new table, or overwrite the original, etc. - all in one shot!
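One way to write the result out, sketched here with BigQuery DDL, is to wrap the same SELECT in CREATE OR REPLACE TABLE. The destination name `project.dataset.table_deduped` is hypothetical; pointing it at the original table instead would overwrite it, so try it on a copy first.

```sql
#standardSQL
-- Sketch: materialize the deduplicated rows into a new table.
-- `project.dataset.table_deduped` is a hypothetical destination name.
CREATE OR REPLACE TABLE `project.dataset.table_deduped` AS
SELECT row[OFFSET(0)].* FROM (
  SELECT ARRAY_AGG(t ORDER BY value DESC LIMIT 1) row
  FROM `project.dataset.table_with_dups` t
  GROUP BY id
)
```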

BigQuery - Concatenate multiple rows into a single row

I have a BigQuery table with 2 columns:
id|name
1|John
1|Tom
1|Bob
2|Jack
2|Tim
Expected output: Concatenate names grouped by id
id|Text
1|John,Tom,Bob
2|Jack,Tim
For BigQuery Standard SQL:
#standardSQL
--WITH yourTable AS (
-- SELECT 1 AS id, 'John' AS name UNION ALL
-- SELECT 1, 'Tom' UNION ALL
-- SELECT 1, 'Bob' UNION ALL
-- SELECT 2, 'Jack' UNION ALL
-- SELECT 2, 'Tim'
--)
SELECT
id,
STRING_AGG(name ORDER BY name) AS Text
FROM yourTable
GROUP BY id
The optional ORDER BY name within STRING_AGG gives you a sorted list of names, as below:
id Text
1 Bob,John,Tom
2 Jack,Tim
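STRING_AGG also accepts a delimiter as its second argument when a plain comma is not wanted. A small sketch on the same sample rows follows; the ' | ' separator is an arbitrary choice for illustration.

```sql
#standardSQL
-- Sketch: same aggregation with an explicit delimiter.
WITH yourTable AS (
  SELECT 1 AS id, 'John' AS name UNION ALL
  SELECT 1, 'Tom' UNION ALL
  SELECT 1, 'Bob' UNION ALL
  SELECT 2, 'Jack' UNION ALL
  SELECT 2, 'Tim'
)
SELECT
  id,
  STRING_AGG(name, ' | ' ORDER BY name) AS Text
FROM yourTable
GROUP BY id
```

For id 1 this gives Bob | John | Tom, and for id 2 it gives Jack | Tim.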
For Legacy SQL
#legacySQL
SELECT
id,
GROUP_CONCAT(name) AS Text
FROM yourTable
GROUP BY id
If you need a sorted list here, you can use the query below (formally, BigQuery Legacy SQL does not guarantee a sorted list, but in most practical cases I had, it worked):
#legacySQL
SELECT
id,
GROUP_CONCAT(name) AS Text
FROM (
SELECT id, name
FROM yourTable
ORDER BY name
)
GROUP BY id
You can use GROUP_CONCAT
SELECT id, GROUP_CONCAT(name) AS Text FROM <dataset>.<table> GROUP BY id