Unable to translate BigQuery legacy SQL to standard SQL for HAVING LEFT(...) - google-bigquery

I would like to use BigQuery Standard SQL for a query like this one:
SELECT package, COUNT(*) count
FROM (
SELECT REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package, id
FROM (
SELECT SPLIT(content, '\n') line, id
FROM [github-groovy-files:github.contents]
WHERE content CONTAINS 'import'
HAVING LEFT(line, 6)='import' )
GROUP BY package, id
)
GROUP BY 1
ORDER BY count DESC
LIMIT 30;
I cannot get past something like this (works but doesn't GROUP or COUNT):
with lines as
(SELECT SPLIT(c.content, '\n') line, c.id as id
FROM `<dataset>.contents` c, `<dataset>.files` f
WHERE c.id = f.id AND f.path LIKE '%.groovy')
select
array(select REGEXP_REPLACE(l, r'import |;', '') AS class from unnest(line) as l where l like 'import %') imports, id
from lines;
LEFT() is not in Standard SQL and there doesn't seem to be a function that will accept an array type.

LEFT() is not in Standard SQL ...
In BigQuery Standard SQL you can use SUBSTR(value, position[, length]) instead of Legacy's LEFT
... and there doesn't seem to be a function that will accept an array type.
There are plenty of array-related functions, as well as functions that accept an array as an argument, for example UNNEST()
I would like to use BigQuery Standard SQL for a query like this one:
Below is equivalent query for BigQuery Standard SQL
SELECT package, COUNT(*) COUNT
FROM (
SELECT REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package, id
FROM (
SELECT line, id
FROM `github-groovy-files.github.contents`,
UNNEST(SPLIT(content, '\n')) line
WHERE SUBSTR(line, 1, 6)='import'
)
GROUP BY package, id
)
GROUP BY 1
ORDER BY COUNT DESC
LIMIT 30
Instead of WHERE SUBSTR(line, 1, 6)='import' you can use WHERE line LIKE 'import%'
Also note that this query can be written in a number of ways. In the example above I focused on "translating" your query from legacy to standard SQL while preserving the core structure and approach of the original query.
But if you would like to rewrite it using the power of Standard SQL, you would end up with something like below
SELECT REGEXP_EXTRACT(line, r' ([a-z0-9\._]*)\.') package, COUNT(DISTINCT id) count
FROM `github-groovy-files.github.contents`,
UNNEST(SPLIT(content, '\n')) line
WHERE line LIKE 'import%'
GROUP BY 1
ORDER BY count DESC
LIMIT 30
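To make the row-level logic of the rewritten query easier to follow, here is a minimal Python sketch of the same steps: split each file into lines, keep lines starting with import, extract the package with the same regular expression, and count distinct file ids per package. The function name top_packages and the (id, content) input shape are assumptions for illustration only; this is not BigQuery code.

```python
import re
from collections import Counter

# Same pattern as in the query: a space, then everything up to the last dot.
PACKAGE_RE = re.compile(r" ([a-z0-9\._]*)\.")


def top_packages(contents, limit=30):
    """contents: iterable of (id, content) pairs standing in for table rows (assumed shape)."""
    seen = set()  # distinct (package, id) pairs, mirrors COUNT(DISTINCT id)
    for file_id, content in contents:
        for line in content.split("\n"):        # UNNEST(SPLIT(content, '\n'))
            if not line.startswith("import"):   # WHERE line LIKE 'import%'
                continue
            match = PACKAGE_RE.search(line)     # REGEXP_EXTRACT(line, ...)
            package = match.group(1) if match else None  # no match behaves like NULL
            seen.add((package, file_id))
    counts = Counter(package for package, _ in seen)      # GROUP BY package
    return counts.most_common(limit)                      # ORDER BY count DESC LIMIT n
```

Each package is counted once per file no matter how many import lines in that file mention it, which is what COUNT(DISTINCT id) does in the query.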

Related

How do I convert a T-SQL query to PostgreSQL that concatenates substrings?

This is my query:
SELECT
*,
CONCAT(
RIGHT( account_no, 4),
RIGHT( customer_id, 5 )
) AS "password for my diplomo"
FROM
account_info;
But I get this error:
Error: function left(bigint, integer) does not exist;
My table is:
CREATE TABLE account_info (
account_no bigint NOT NULL PRIMARY KEY,
customer_id varchar(...)
)
Postgres functions left and right expect their first argument to be text.
So first cast account_no to type text and your query (a bit simplified) will work.
SELECT *,
right(account_no::text, 4) || right(customer_id, 5) as pfmd
FROM account_info;
Unrelated but the best practice under Postgres is to use type text instead of char or varchar.
You seem to be using a reference for T-SQL or JET Red SQL (for MS SQL Server and MS Access respectively) when you're actually using PostgreSQL which uses completely different functions (and syntax) for string/text processing.
This is the PostgreSQL v12 manual page for string functions and other syntax. You should read it.
As for making your query run on PostgreSQL, change it to this:
Convert account_no to a varchar type so you can use SUBSTRING with it.
I think it might work without it, but I don't like relying on implicit conversion, especially when localization/locale/culture issues might pop up.
The LEFT and RIGHT functions for extracting substrings can be reimplemented like so (SUBSTRING positions are 1-based):
LEFT( text, length ) == SUBSTRING( text FROM 1 FOR length )
RIGHT( text, length ) == SUBSTRING( text FROM CHAR_LENGTH( text ) - length + 1 )
And use || to concatenate text values together.
Like so:
SELECT
q.*,
(
SUBSTRING( q.account_no_text FROM CHAR_LENGTH( q.account_no_text ) - 4 + 1 )
||
SUBSTRING( q.customer_id FROM CHAR_LENGTH( q.customer_id ) - 5 + 1 )
) AS "password for my diplomo"
FROM
(
SELECT
ai.*,
ai.account_no::varchar(10) AS account_no_text
FROM
account_info AS ai
)
AS q
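Because SUBSTRING positions in PostgreSQL are 1-based, the window arithmetic is easy to get off by one. The Python sketch below emulates SUBSTRING(text FROM start [FOR count]) so the start positions can be checked by hand. The helper names pg_substring, pg_left and pg_right are hypothetical, and the emulation only covers non-negative lengths.

```python
def pg_substring(text, start, count=None):
    """Emulate SUBSTRING(text FROM start [FOR count]) with 1-based positions."""
    if count is None:
        end = len(text) + 1           # no FOR: run to the end of the string
    else:
        if count < 0:
            raise ValueError("negative substring length")
        end = start + count           # exclusive end, still 1-based
    begin = max(start, 1)             # positions before 1 hold no characters
    end = min(end, len(text) + 1)
    return text[begin - 1:end - 1] if end > begin else ""


def pg_left(text, length):
    # LEFT(text, length) for length >= 0
    return pg_substring(text, 1, length)


def pg_right(text, length):
    # RIGHT(text, length) for length >= 0
    return pg_substring(text, len(text) - length + 1)
```

In this emulation, starting at position 0 with FOR 3 yields only two characters, and starting at CHAR_LENGTH - 4 yields five, so the start has to be 1 for LEFT and CHAR_LENGTH - length + 1 for RIGHT.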
Here is a runnable DB-Fiddle.

How to easily remove count=1 on aliased field in SQL?

I have the following data in a table:
GROUP1|FIELD
Z_12TXT|111
Z_2TXT|222
Z_31TBT|333
Z_4TXT|444
Z_52TNT|555
Z_6TNT|666
And I engineer in a field that removes the leading numbers after the '_'
GROUP1|GROUP_ALIAS|FIELD
Z_12TXT|Z_TXT|111
Z_2TXT|Z_TXT|222
Z_31TBT|Z_TBT|333 <- to be removed
Z_4TXT|Z_TXT|444
Z_52TNT|Z_TNT|555
Z_6TNT|Z_TNT|666
How can I easily query the original table for only the GROUP1 values whose GROUP_ALIAS covers more than one FIELD, i.e. drop the aliases that occur only once?
Desired result:
GROUP1|GROUP_ALIAS|FIELD
Z_12TXT|Z_TXT|111
Z_2TXT|Z_TXT|222
Z_4TXT|Z_TXT|444
Z_52TNT|Z_TNT|555
Z_6TNT|Z_TNT|666
This is how I get all the GROUP_ALIAS's I don't want:
SELECT GROUP_ALIAS
FROM
(SELECT
GROUP1,FIELD,
case when instr(GROUP1, '_') = 2
then
substr(GROUP1, 1, 2) ||
ltrim(substr(GROUP1, 3), '0123456789')
else
substr(GROUP1 , 1, 1) ||
ltrim(substr(GROUP1, 2), '0123456789')
end GROUP_ALIAS
FROM MY_TABLE)
GROUP BY GROUP_ALIAS
HAVING COUNT(FIELD)=1
Probably I could make the engineered field a second time simply on the original table and check that it isn't in the result from the latter, but want to avoid so much nesting. I don't know how to partition or do anything more sophisticated on my case statement making this engineered field, though.
UPDATE
Thanks for all the great replies below. Something about the SQL used must differ from what I thought because I'm getting info like:
GROUP1|GROUP_ALIAS|FIELD
111,222|,111|111
111,222|,222|222
etc.
Not sure why since the solutions work on my unabstracted data in db-fiddle. If anyone can spot what db it's actually using that would help but I'll also check on my end.
Here is one way, using analytic count. If you are not familiar with the with clause, read up on it - it's a very neat way to make your code readable. The way I declare column names in the with clause works since Oracle 11.2; if your version is older than that, the code needs to be re-written just slightly.
I also computed the "engineered field" in a more compact way. Use whatever you need to.
I used sample_data for the table name; adapt as needed.
with
add_alias (group1, group_alias, field) as (
select group1,
substr(group1, 1, instr(group1, '_')) ||
ltrim(substr(group1, instr(group1, '_') + 1), '0123456789'),
field
from sample_data
)
, add_counts (group1, group_alias, field, ct) as (
select group1, group_alias, field, count(*) over (partition by group_alias)
from add_alias
)
select group1, group_alias, field
from add_counts
where ct > 1
;
With Oracle you can use REGEXP_REPLACE and analytic functions:
select Group1, group_alias, field
from (select group1, REGEXP_REPLACE(group1,'_\d+','_') group_alias, field,
count(*) over (PARTITION BY REGEXP_REPLACE(group1,'_\d+','_')) as count from test) a
where count > 1
db-fiddle
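Both answers follow the same idea: compute the alias, count rows per alias, then keep rows whose alias occurs more than once. A minimal Python sketch of that logic, using the sample rows from the question, might look like this. The function names are hypothetical, and the alias helper assumes every GROUP1 value contains an underscore.

```python
from collections import Counter


def group_alias(group1):
    # Keep everything up to and including the first '_', then strip leading digits.
    # Assumes group1 contains '_'; values without one are returned unchanged.
    head, sep, tail = group1.partition("_")
    return head + sep + tail.lstrip("0123456789")


def rows_with_shared_alias(rows):
    """rows: list of (group1, field) pairs; returns (group1, alias, field) triples."""
    aliased = [(g, group_alias(g), f) for g, f in rows]
    # Equivalent of count(*) over (partition by group_alias)
    counts = Counter(alias for _, alias, _ in aliased)
    return [row for row in aliased if counts[row[1]] > 1]


sample = [
    ("Z_12TXT", 111), ("Z_2TXT", 222), ("Z_31TBT", 333),
    ("Z_4TXT", 444), ("Z_52TNT", 555), ("Z_6TNT", 666),
]
result = rows_with_shared_alias(sample)
```

With the sample rows, result holds the five rows from the desired output and leaves out Z_31TBT, whose alias Z_TBT occurs only once.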

DB2 : Find distinct from a comma separated values

Find distinct from a comma separated values in ANSI SQL. I am trying this on DB2 database.
Scenario
Id Val
1 A,B,C
2 A,D,A,C,B
3 B,A,C,C,D
Expected output
Id Val
1 A,B,C
2 A,D,C,B
3 B,A,C,D
DB2 offers a way to tokenize strings using XML. Using this, you can split the string into tokens and then use listagg(distinct):
select v.id, listagg(distinct tokens.token, ',')
from (values (1, 'A,B,C'), (2, 'A,D,A,C,B'), (3, 'B,A,C,C,D')) v(id, val),
xmltable('for $id in tokenize($s, ",") return <i>{string($id)}</i>' passing v.val as "s"
columns seq for ordinality, token varchar(20) path '.'
) tokens
group by v.id;
Here is a db<>fiddle.
Note: I strongly recommend that you fix the data model. Storing multiple values in a string is a bad way to store data in any database.
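The expected output in the question keeps each token in the order of its first occurrence (A,D,A,C,B becomes A,D,C,B). An aggregate without an explicit ordering may not return tokens in that order, so it helps to have a reference for the target result. Here is a small Python sketch of that behaviour; the function name distinct_tokens is hypothetical.

```python
def distinct_tokens(val, sep=","):
    """Drop repeated tokens, keeping the first occurrence of each in its original position."""
    # dict preserves insertion order, so fromkeys() deduplicates without reordering.
    return sep.join(dict.fromkeys(val.split(sep)))


rows = [(1, "A,B,C"), (2, "A,D,A,C,B"), (3, "B,A,C,C,D")]
deduplicated = [(row_id, distinct_tokens(val)) for row_id, val in rows]
```

In SQL, the seq column produced by for ordinality is the natural candidate for preserving this order, if the platform allows ordering the aggregate by it.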
Depending on your platform and version of Db2, you may have the functions SPLIT() and LISTAGG() available.
with dist as (
select
distinct id, element
from tbl, table(split(val,','))
)
select
id
, listagg(element, ',') within group (order by element)
as distinct_list
from dist
group by id
;
EDIT
Corrected the name of the column returned by SPLIT(): the IBM-provided version returns ELEMENT; we happened to have an older user-defined version that used COLUMN_VALUE.

How to use cursor with LISTAGG in PL/SQL?

I have used LISTAGG to concatenate data from two different tables to form the following output:
How do I display the above output neatly like this:
I am using ORACLE PL/SQL. I am thinking if this can be done by implementing cursor, but I am not sure how to do it. Or maybe is there any other way to achieve this? Thanks.
Looks like the NATION.N_NAME column's datatype is CHAR, as those names are blank-padded. I'd switch to VARCHAR2 (if possible) or try TRIM, e.g.
select ...
listagg(trim(n.n_name), ', ') within group ...
-- note the TRIM around n.n_name
WITH CTE AS
(SELECT r.R_REGION_KEY AS REGION_KEY
,r.R_NAME
,LISTAGG(trim(n.N_NAME),',') WITHIN GROUP (ORDER BY R_NAME) AS REGION_NATION
FROM REGION r
INNER JOIN NATION n
ON r.R_REGION_KEY = n.N_REGIONKEY
GROUP BY r.R_REGION_KEY
,r.R_NAME
)
SELECT REGION_KEY
,R_NAME || ':' || REGION_NATION as REGION_TEXT
FROM CTE

Concatenating fields when BREAK ON command is used

I have built a command that uses the BREAK ON command to stop the output of duplicate field names. For example:
f.name | f.value
f.name | f.value
f.name | f.value
becomes:
f.name | f.value
| f.value
| f.value
Is there any way to have this output as:
f.name | f.value,f.value,f.value
In some instances the f.name field will have over 20 f.values associated with it.
The output will eventually be used to import into other places so I am trying to make the output as friendly as possible.
You're not looking for a SQL*Plus command, you're looking for a string aggregation.
Assuming your current query is:
select name, value from my_table
you can change it as follows to get your desired result. The DISTINCT is included to eliminate duplicate results in your list.
select name, listagg(value, ', ') within group (order by value) as value
from ( select distinct name, value from my_table )
group by name
LISTAGG() was only released in 11.2. If you're using an earlier version of Oracle, you could use the undocumented function WM_CONCAT() or the user-defined function STRAGG(), as outlined in this useful page on string aggregation techniques.
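For readers who want to check what the aggregated output should look like before importing it elsewhere, here is a minimal Python sketch of the same string aggregation: deduplicate values per name, sort them, and join them with a separator. The function name aggregate_values is hypothetical, and values are assumed to be strings.

```python
from collections import defaultdict


def aggregate_values(rows, sep=", "):
    """rows: iterable of (name, value) pairs; returns {name: 'v1, v2, ...'}."""
    grouped = defaultdict(set)           # the set plays the role of SELECT DISTINCT
    for name, value in rows:
        grouped[name].add(value)
    # sorted() mirrors WITHIN GROUP (ORDER BY value)
    return {name: sep.join(sorted(values)) for name, values in grouped.items()}


rows = [("a", "x"), ("a", "y"), ("a", "x"), ("b", "z")]
aggregated = aggregate_values(rows)
```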