I have a bigquery table containing a field candidate of array type. How can I query distinct rows from this table?
In this case my query should return just the first row.
I think the approach below is the simplest, and it works for any element type, array length, etc.
#standardSQL
SELECT ANY_VALUE(candidate) candidate
FROM `project.dataset.table`
GROUP BY FORMAT('%T', candidate)
Previously I used TO_JSON_STRING() for this, but I recently realized that FORMAT() fits best for most cases like this.
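For a quick sanity check, below is the same query against made-up inline data (the WITH clause and its values are purely illustrative); it returns one row per distinct array:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT ['a', 'b', 'c'] AS candidate UNION ALL
  SELECT ['a', 'b', 'c'] UNION ALL
  SELECT ['x', 'y']
)
SELECT ANY_VALUE(candidate) candidate
FROM `project.dataset.table`
GROUP BY FORMAT('%T', candidate)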
Something like:
select split(combed, ".") as candidate
from (
  select distinct array_to_string(candidate, ".") as combed
  from `dataset.table`
)
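Note this assumes the array elements are strings and that the "." delimiter never appears inside an element; otherwise the round trip through array_to_string / split changes the data. A minimal sketch on made-up inline data:
select split(combed, ".") as candidate
from (
  select distinct array_to_string(candidate, ".") as combed
  from (
    select ['a', 'b'] as candidate union all
    select ['a', 'b'] union all
    select ['c']
  )
)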
I have a pretty big query (no pun intended) written out in BigQuery that returns about 5 columns. I simply want to append an extra column to it that is not joined to any other table and just returns a single word in every row, as if it were an ID for the entire table.
Just wrap the original select and add the new constant column, or add it directly into the original query. The answer could be more precise if you added your query and expected result to your question.
select q.*, 'JOHN' as new_column
from ( <your_big_query> ) q
previous (now unrelated) answer follows
You can use the row_number window function:
select q.*, row_number() over (order by null) as id
from ( <your_big_query> ) q
It returns the values 1, 2, etc.
Depending on how complicated your query is, the row_number could be inlined directly into your query.
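For example, if the big query were a plain select, the window function could just be added to its column list (the table and column names here are hypothetical):
select order_id, customer, region, amount, status,
       row_number() over (order by null) as id
from `project.dataset.orders`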
If all you want is one static column, just add an extra static column at the end of your existing select columns list.
select {ALL_COLUMNS_YOU_ARE_JOINING_COMPUTING_ETC}, 'something' as your_new_static_col from {YOUR_QUERY}
This static column does not need to be a string, it can be an int or some other type.
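For instance, with an integer constant instead of a string (hypothetical table and columns):
select order_id, customer, amount, 42 as batch_id
from `project.dataset.orders`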
I'd like to count the number of distinct rows in a table. I know that I can do that using GROUP BY or by naming all the columns one by one, but I would like to just do:
select count(distinct *) from my_table
Is that possible?
Do SELECT DISTINCT in a derived table (the subquery), then count the number of rows returned.
select count(*) from
(select distinct * from my_table) dt
(Doesn't your table have any primary key?)
You can use to_json_string():
select count(distinct to_json_string(t))
from t;
Below are more options for BigQuery Standard SQL:
select count(distinct format('%t', t))
from `project.dataset.table` t
Depending on your use case, an approximate count can be an even better option:
select approx_count_distinct(format('%t', t))
from `project.dataset.table` t
APPROX_COUNT_DISTINCT - returns the approximate result for COUNT(DISTINCT expression). The value returned is a statistical estimate—not necessarily the actual value. This function is less accurate than COUNT(DISTINCT expression), but performs better on huge input.
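A side-by-side comparison of the exact and approximate counts on made-up inline data (illustrative only; both return 2 here):
#standardSQL
WITH t AS (
  SELECT 1 AS a, 'x' AS b UNION ALL
  SELECT 1, 'x' UNION ALL
  SELECT 2, 'y'
)
SELECT
  COUNT(DISTINCT FORMAT('%t', t)) AS exact_distinct_rows,
  APPROX_COUNT_DISTINCT(FORMAT('%t', t)) AS approx_distinct_rows
FROM t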
The use of count(distinct *) is not permitted.
Alternatively you could explicitly name the columns (what defines uniqueness).
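For example, if col1, col2 and col3 are the columns that define uniqueness (hypothetical names):
select count(*)
from (select distinct col1, col2, col3 from my_table) dt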
I'm using the code on this page to create a concatenated list of strings on a group-by aggregation basis.
https://dwgeek.com/netezza-group_concat-alternative-working-example.html/
I'm trying to get the concatenated string in sorted order, so that, for example, for DB1 I'd get data1,data2,data5,data9
I tried modifying the original code to select from a pre-sorted table, but it doesn't seem to make any difference.
select Col1
, count(*) as NUM_OF_ROWS
, trim(trailing ',' from SETNZ..replace(SETNZ..replace (SETNZ..XMLserialize(SETNZ..XMLagg(SETNZ..XMLElement('X',col2))), '<X>','' ),'</X>' ,',' )) AS NZ_CONCAT_STRING
from
(select * from tbl_concat_demo order by 1,2) AS A
group by Col1
order by 1;
Is there a way to sort the strings before they get aggregated?
BTW - I'm aware there is a GROUP_CONCAT UDF function for Netezza, but I won't have access to it.
This is notoriously difficult to accomplish in SQL, since sorting is usually done while returning the data, whereas here you want to sort the 'input' set.
Try this:
1) Create a temp table, pre-sorted on col2:
create temp table X as
select * from tbl_concat_demo
order by col2
partition by (col1)
2) In your original code above, select from X instead of tbl_concat_demo (see the sketch after this answer).
Let me know if it works.
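The modified aggregation would then look like this (a sketch only, assuming the temp table X was created as in step 1 above):
select Col1
, count(*) as NUM_OF_ROWS
, trim(trailing ',' from SETNZ..replace(SETNZ..replace(SETNZ..XMLserialize(SETNZ..XMLagg(SETNZ..XMLElement('X', col2))), '<X>', ''), '</X>', ',')) AS NZ_CONCAT_STRING
from X
group by Col1
order by 1;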
I have the following data in a matches table:
5;{"Id":1,"Teams":[{"Name":"TeamA","Players":[{"Name":"AAA"},{"Name":"BBB"}]},{"Name":"TeamB","Players":[{"Name":"CCC"},{"Name":"DDD"}]}],"TeamRank":[1,2]}
6;{"Id":2,"Teams":[{"Name":"TeamA","Players":[{"Name":"CCC"},{"Name":"BBB"}]},{"Name":"TeamB","Players":[{"Name":"AAA"},{"Name":"DDD"}]}],"TeamRank":[1,2]}
I want to select the last occurrence of each distinct team in the table, by team name. I.e. I want a query that will return:
6;{"Name":"TeamA","Players":[{"Name":"CCC"},{"Name":"BBB"}
6;{"Name":"TeamB","Players":[{"Name":"AAA"},{"Name":"DDD"}
So, each team from the last time that team appears in the table.
I have been using the following (from here):
WITH t AS (SELECT id, json_array_elements(match->'Teams') AS team FROM matches)
SELECT MAX(id) AS max_id, team FROM t GROUP BY team->'Name';
But this returns:
ERROR: could not identify an equality operator for type json
SQL state: 42883
Character: 1680
I understand that Postgres doesn't have equality for JSON. I only need equality for the team's name (a string), the players on that team don't need to be compared.
Can anyone suggest an alternative way to do this?
For reference:
SELECT id, json_array_elements(match->'Teams') AS team FROM matches
returns:
5;"{"Name":"TeamA","Players":[{"Name":"AAA"},{"Name":"BBB"}]}"
5;"{"Name":"TeamB","Players":[{"Name":"CCC"},{"Name":"DDD"}]}"
6;"{"Name":"TeamA","Players":[{"Name":"CCC"},{"Name":"BBB"}]}"
6;"{"Name":"TeamB","Players":[{"Name":"AAA"},{"Name":"DDD"}]}"
EDIT: I cast to text and, following this question, used DISTINCT ON instead of GROUP BY. Here's my full query:
WITH t AS (SELECT id, json_array_elements(match->'Teams') AS team
FROM matches ORDER BY id DESC)
SELECT DISTINCT ON (team->>'Name') id, team FROM t;
Returns what I wanted above. Does anyone have a better solution?
Shorter, faster and more elegant with a LATERAL join:
SELECT DISTINCT ON (t.team->>'Name') t.team
FROM matches m, json_array_elements(m.match->'Teams') t(team)
ORDER BY t.team->>'Name', m.id DESC; -- to get the "last"
If you just want distinct teams, the ORDER BY can go. Related:
Query for element of array in JSON column
Query for array elements inside JSON type
JSON and equality
There is no equality operator for the json data type in Postgres, but there is one for jsonb (Postgres 9.4+):
How to query a json column for empty objects?
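For example, after a cast to jsonb, DISTINCT over the whole team object works, whereas the same query over plain json raises the 42883 error above (a sketch, assuming Postgres 9.4+):
SELECT DISTINCT t.team
FROM matches m, jsonb_array_elements(m.match::jsonb->'Teams') t(team);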
Is it possible to create / have an auto-generated id column in a select statement in Oracle?
Example:
Assume we have a table ITEMS without an id
Normal select-statement
Select name
from ITEMS
What I'm looking for is something like this
select AutoIdGen(), name
from ITEMS
You can use ROWID or ROWNUM in Oracle, like this:
SELECT ROWID, ROWNUM, name FROM ITEMS;
You can use row_number for this. The row_number analytic function works a little differently than rownum: you can partition the results if you want to, and the numbering can be ordered by different columns than the result set itself.
select row_number() over (order by name)
, name
from ITEMS
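For example, restarting the numbering per group and sorting the output independently of the numbering (CATEGORY is a hypothetical column here, since ITEMS only shows name in the question):
select row_number() over (partition by category order by name) as id
     , category
     , name
from ITEMS
order by category, id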