Convert Legacy to Standard SQL (Join Each and comma Like) - sql

I'm struggling to convert this Legacy SQL query to Standard SQL. The particular things that need converting are FLATTEN and JOIN EACH, and I also run into errors such as "No matching signature for function REGEXP_REPLACE for argument types: ARRAY, STRING, STRING. Supported signatures: REGEXP_REPLACE(STRING, STRING, STRING); REGEXP_REPLACE(BYTES, BYTES, BYTES)", and so on. Can anyone please help?
Thanks!
SELECT a.name, b.name, COUNT(*) as count
FROM (FLATTEN(
SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*'," ")) name
FROM [gdelt-bq:gdeltv2.gkg]
WHERE DATE>20180901000000 and DATE < 20180910000000 and V2Persons like '%Trump%'
,name)) a
JOIN EACH (
SELECT GKGRECORDID, UNIQUE(REGEXP_REPLACE(SPLIT(V2Persons,';'), r',.*'," ")) name
FROM [gdelt-bq:gdeltv2.gkg]
WHERE DATE>20180901000000 and DATE < 20180910000000 and V2Persons like '%Trump%'
) b
ON a.GKGRECORDID=b.GKGRECORDID
WHERE a.name<b.name
GROUP EACH BY 1,2
ORDER BY 3 DESC
LIMIT 250

SELECT a.name, b.b_name, COUNT(*) as count
FROM (
SELECT DISTINCT GKGRECORDID, REGEXP_REPLACE(name, r',.*'," ") name
FROM `gdelt-bq.gdeltv2.gkg`, UNNEST(SPLIT(V2Persons,';')) as name
WHERE DATE>20180901000000 and DATE < 20180910000000 and V2Persons like '%Trump%'
) a
JOIN (
SELECT DISTINCT GKGRECORDID, REGEXP_REPLACE(b_name, r',.*'," ") b_name
FROM `gdelt-bq.gdeltv2.gkg`, UNNEST(SPLIT(V2Persons,';')) as b_name
WHERE DATE>20180901000000 and DATE < 20180910000000 and V2Persons like '%Trump%'
) b
ON a.GKGRECORDID=b.GKGRECORDID
WHERE a.name<b.b_name
GROUP BY 1,2
ORDER BY 3 DESC
LIMIT 250

Re: FLATTEN, I would consult the documentation here: https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql#removing_repetition_with_flatten
Among other examples, the documentation notes:
"Standard SQL does not have a FLATTEN function as in legacy SQL, but you can achieve similar semantics using the JOIN (comma) operator."
Re: JOIN EACH, this has been answered here: BigQuery - equivalent of GROUP EACH in standard SQL
Basically, the EACH modifier is not necessary at all in Standard SQL; a plain JOIN or GROUP BY is enough.
Re: "LIKE that has comma separated parameters...", your syntax is fine for Standard SQL. It should not operate any differently than it did when you ran it in Legacy SQL. One of the big pluses of Standard SQL is that you can compare columns using functions in the WHERE clause with more flexibility than Legacy SQL allowed (if necessary). For instance, if you wanted to split V2Persons before running a LIKE comparison, you could do that right in the WHERE clause.
UPDATE: Realizing I missed your last question about data type mismatches. In Standard SQL you will probably want to cast everything explicitly when you run into these errors. It is more finicky than Legacy SQL with regard to comparisons between different data types, but I find this to be more in line with other SQL databases. For the REGEXP_REPLACE error in particular, SPLIT returns an ARRAY, so the array has to be unnested first and the function applied to each element.
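To make the FLATTEN-to-UNNEST idea concrete, here is a minimal Python sketch of what the query computes per record. It is only an illustration of the logic, not BigQuery code: the function name person_pairs and the sample strings are made up, and Python's re module stands in for REGEXP_REPLACE.

```python
import re
from collections import Counter
from itertools import combinations

def person_pairs(records):
    """Count co-occurring name pairs, mirroring the UNNEST + self-join logic."""
    counts = Counter()
    for v2persons in records:
        # UNNEST(SPLIT(V2Persons, ';')): one element per name entry
        entries = v2persons.split(";")
        # The regex is applied to each element, never to the array as a whole;
        # the set plays the role of SELECT DISTINCT within one record
        names = {re.sub(r",.*", " ", entry) for entry in entries}
        # a.name < b.name keeps each unordered pair exactly once
        for a, b in combinations(sorted(names), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical sample rows in the "name,offset;name,offset" shape
sample = [
    "Donald Trump,12;Mike Pence,40;Donald Trump,90",
    "Donald Trump,5;Mike Pence,7",
]
pairs = person_pairs(sample)
```

Note that the replacement string is a single space, as in the query, so the resulting names carry a trailing space.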

Related

How to SELECT a postgresql tsrange column as json?

I want to get a tsrange column returned as json but do not understand how get it to work in my query:
reserved is of type TSRANGE.
c.id and a.reservationid are of type TEXT.
-- query one
SELECT DISTINCT c.*, to_json(a.reserved) FROM complete_reservations c
JOIN availability a ON (c.id = a.reservationid)
throws
ERROR: could not identify an equality operator for type json
LINE 1: SELECT DISTINCT c.*, to_json(a.reserved) FROM complete_reser...
^
SQL state: 42883
Character: 22
It works if I try it like this:
-- query two
SELECT to_json('[2011-01-01,2011-03-01)'::tsrange);
Result:
"[\"2011-01-01 00:00:00\",\"2011-03-01 00:00:00\")"
and I do not understand the difference between both scenarios.
How do I get query one to behave like query two?
As pointed out in this comment by Edouard, there seems to be no JSON representation of a tsrange. Therefore I framed my question wrong. (The error itself comes from DISTINCT, which needs an equality operator that the json type does not have; query two has no DISTINCT.)
In my concrete case it was sufficient to turn the tsrange into an array of the lower() and upper() values of the TSRANGE and cast those values to strings. This way I can use the output as is and let my downstream tools handle them as JSON.
-- HH24 keeps the 24-hour clock; plain HH is the 12-hour form (01-12)
SELECT DISTINCT c.*, ARRAY[to_char(lower(a.reserved),'YYYY-MM-DD HH24:MI:SS'),to_char(upper(a.reserved),'YYYY-MM-DD HH24:MI:SS')] reserved FROM complete_reservations_with_itemidArray c
JOIN availability a ON (c.id = a.reservationid)
which returns a value like this in the reserved column: {"2023-02-04 04:57:00","2023-02-05 04:57:00"} which can be parsed as json if needed.
I post this for reference. I am not sure if it exactly answers my question as it was framed.
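One caveat on the downstream side: the braces form shown above is PostgreSQL's text array literal, so a strict JSON parser would likely reject it as is. A minimal Python sketch of the conversion, assuming a one-dimensional array whose elements are all double-quoted and contain no braces or escapes (the helper name pg_text_array_to_list is made up):

```python
import json

def pg_text_array_to_list(literal):
    """Convert a simple one-dimensional PostgreSQL text array literal to a list.

    Assumes every element is double-quoted and free of braces and escapes,
    which holds for the timestamp strings produced by to_char above.
    """
    if not (literal.startswith("{") and literal.endswith("}")):
        raise ValueError("not an array literal")
    # Swap the outer braces for brackets and let the JSON parser do the rest
    return json.loads("[" + literal[1:-1] + "]")

reserved = pg_text_array_to_list('{"2023-02-04 04:57:00","2023-02-05 04:57:00"}')
as_json = json.dumps(reserved)
```

Many database drivers already hand arrays back as native lists, in which case this step is not needed at all.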

How to convert this SAS code to SQL Server code?

SAS CODE:
data table1;
set table2;
_sep1 = findc(policynum,'/&,');
_count1 = countc(policynum,'/&,');
_sep2 = findc(policynum,'-');
_count2 = countc(policynum,'-');
_sep3 = findc(policynum,'_*');
_count3 = countc(policynum,'_*');
How can I convert this into a select statement like below:
select
*,
/*Code converted to SQL from above*/
from table2
For example I tried the below code:
select
*,
charindex('/&,',policynum) as _sep1,
LEN(policynum) - LEN(REPLACE(policynum,'/&,','')) as _count1
from table2
But I got an error: ERROR 42S02: Function 'CHARINDEX(UNKNOWN, VARCHAR)' does not exist. Unable to identify a function that satisfies that given argument types. You may need to add explicit typecasts.
Please note that the variable pol_no is: 'character varying(50) not null'.
I am running this using Aginity Workbench for Netezza. I believe this is IBM.
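Assuming SQL Server, based on CHARINDEX() (Oracle has no function of that name), this may work:
CHARINDEX searches for the whole string it is given, not for any one of its characters, so you need to apply it once for each character and take the minimum to find the first occurrence. Note that a character that does not occur returns 0, so zeros have to be excluded before taking the minimum.
There may be a better suited function within SQL Server, but I don't know enough to suggest one.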
select
*,
min(charindex('/',policynum), charindex('&', policynum)) as _sep1
from table2
EDIT: based on the OP's notes.
Netezza is an IBM product, which means you should use the INSTR function, not CHARINDEX.
select
*,
min(instr(policynum, '/'), instr(policynum, '&')) as _sep1
from table2
https://www.ibm.com/support/knowledgecenter/en/SSGU8G_12.1.0/com.ibm.sqls.doc/ids_sqs_2336.htm
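For reference, the behaviour that any SQL translation has to reproduce can be sketched in Python. This is my reading of the SAS functions without modifiers (FINDC returns the 1-based position of the first character that appears in the list, or 0; COUNTC counts such characters); the helper first_of_instr is a made-up name showing the per-character search approach, including the zero handling.

```python
def findc(s, chars):
    """1-based position of the first character of s found in chars, 0 if none."""
    for position, ch in enumerate(s, start=1):
        if ch in chars:
            return position
    return 0

def countc(s, chars):
    """Number of characters of s that appear in chars."""
    return sum(1 for ch in s if ch in chars)

def first_of_instr(s, chars):
    """Same result as findc, built from one INSTR-style search per character."""
    # str.find returns -1 when absent, so +1 yields 0, like INSTR/CHARINDEX
    hits = [s.find(c) + 1 for c in chars]
    # Drop the "not found" zeros before taking the minimum
    hits = [h for h in hits if h > 0]
    return min(hits) if hits else 0

policynum = "AB-12/34&5"  # hypothetical policy number
sep1 = findc(policynum, "/&,")
count1 = countc(policynum, "/&,")
```

The count can also be derived the way the question attempted, as the length difference after removing the characters, provided each character is removed separately rather than as the single string '/&,'.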
The FINDC and COUNTC functions are basically used for searching for any of a set of characters and for counting them.
You can use SQL's LIKE operator with the '%' and '_' wildcards to find the rows that contain a character,
e.g.
SELECT * FROM <table_name> WHERE <column_name> LIKE '%-%';
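and, to count the matching rows (not the occurrences within a single value, which is what COUNTC returns):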
SELECT COUNT(*) FROM <table_name> WHERE <column_name> LIKE '%-%';
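Some databases also accept character sets in LIKE patterns (SQL Server's [...] syntax, for example); full regular expressions usually need a dedicated function.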

remove first two digits of customer_number in impala sql

In Cloudera / Impala SQL I need to remove the first two digits of a customer_number.
I tried the following, but it does not work. Can you please help?
Many thanks
CREATE TABLE new
STORED AS PARQUET AS
SELECT DISTINCT
CASE t1.customer_number = RIGHT(t1.customer_number, LEN(t1.customer_number) - 2)
from Old;
customer_number should become short_cust_no
33764703 764703
36764624 764624
36763795 763795
37764829 764829
39766002 766002
Impala supports substr() with two arguments. You can simply do:
SELECT DISTINCT SUBSTR(t1.customer_number, 3)
FROM Old t1;
EDIT:
I had assumed customer_number was a string, because the OP uses string functions.
If it is a number, use mod():
SELECT DISTINCT MOD(t1.customer_number, 1000000)
FROM Old t1;
Note: The types for the arguments to mod() need to be compatible so this might require a cast() of some sort.
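The two approaches agree on the sample data, but they are not interchangeable in general. A small Python sketch of the arithmetic, assuming 8-digit customer numbers as in the question (the function names and the value 33004703 are made up for illustration):

```python
def drop_first_two_as_number(customer_number: int) -> int:
    # MOD(customer_number, 1000000): keeps the last six digits of an 8-digit number
    return customer_number % 1_000_000

def drop_first_two_as_string(customer_number: str) -> str:
    # SUBSTR(customer_number, 3): everything from the third character on
    return customer_number[2:]

samples = [33764703, 36764624, 36763795, 37764829, 39766002]
numeric = [drop_first_two_as_number(n) for n in samples]
textual = [drop_first_two_as_string(str(n)) for n in samples]

# Hypothetical value where the remainder starts with zeros
numeric_edge = drop_first_two_as_number(33004703)
textual_edge = drop_first_two_as_string("33004703")
```

The numeric version drops leading zeros from the remainder and only works when every number has exactly eight digits, while the string version keeps the zeros and works for any length.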
If all your customer numbers are 14 characters then I think you should be able to do that with
RIGHT(t1.customer_number, 12)
This addresses the DOUBLE, TINYINT mistake
SELECT DISTINCT
SUBSTR(cast(t1.customer_number as string), 3,10)
FROM old;

Oracle SQL: Filtering rows with non-numeric characters

My question is very similar to this one: removing all the rows from a table with columns A and B, where some records include non-numeric characters (looking like '1234#5' or '1bbbb'). However, the solutions I read around don't seem to work for me. For example,
SELECT count(*) FROM tbl
--962060;
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'[^0-9]') OR REGEXP_like(B,'[^0-9]') ) ;
--17
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'[0-9]') and REGEXP_like(B,'[0-9]') )
;
--962060
From the 3rd query, I'd expect to see (962060-17)=962043. Why is it still 962060? An alternative query like this also gives the same answer:
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'[[:digit:]]')and REGEXP_like(B,'[[:digit:]]') )
;
--962060
Of course, I could bypass the problem by doing query1 minus query2, but I'd like to learn how to do that using regular expressions.
If you use a regular expression, you should take into account that the pattern may match any part of the string, so '[0-9]' is true for every value that contains at least one digit. For your example, you should specify that the whole string must contain only digits: ^ is the beginning of the string and $ is the end. You may also use \d, which stands for a digit.
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'^[0-9]+$') and REGEXP_like(B,'^[0-9]+$') )
or
SELECT count(*)
FROM tbl
WHERE (REGEXP_like(A,'^\d+$') and REGEXP_like(B,'^\d+$') )
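The difference between the unanchored and the anchored pattern is easy to check outside the database. A minimal sketch using Python's re module as a stand-in (it is not Oracle's regex engine, but it treats these particular patterns the same way), with the example values from the question plus one clean value:

```python
import re

values = ["12345", "1234#5", "1bbbb"]

# Unanchored: true as soon as any single character is a digit
contains_digit = [v for v in values if re.search(r"[0-9]", v)]

# Anchored: the whole value must consist of one or more digits
all_digits = [v for v in values if re.search(r"^[0-9]+$", v)]

# The complement of the anchored match is the set of rows to filter out
not_all_digits = [v for v in values if not re.search(r"^[0-9]+$", v)]
```

This is why the third query in the question still counted every row: each value contains at least one digit somewhere.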
I know you specifically asked for a regex solution, but translate can solve this kind of question as well (and usually faster, because regexes use more processing power):
select count(1)
from tbl
where translate(a, 'x0123456789', 'x') is null
and translate(b, 'x0123456789', 'x') is null;
What this does: translate removes the characters 0123456789, because they have no counterpart in the third argument, and if the result is null, then the input must have been all digits. The 'x' is just there because the third argument to translate cannot be null.
Thought I should add this here, might be helpful to other readers.
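For readers who want to see why the translate trick works, here is a small Python emulation of the behaviour as I understand it: characters in the second argument map to the character at the same position in the third, characters with no counterpart are removed, and an empty result counts as NULL. The function name oracle_translate is made up, and None stands in for NULL.

```python
def oracle_translate(s, from_chars, to_chars):
    """Emulate TRANSLATE(s, from_chars, to_chars) with empty-string-as-NULL."""
    mapping = {}
    for index, ch in enumerate(from_chars):
        if ch not in mapping:  # the first occurrence of a character wins
            # None marks "no counterpart", meaning the character is removed
            mapping[ch] = to_chars[index] if index < len(to_chars) else None
    result = []
    for ch in s:
        if ch in mapping:
            if mapping[ch] is not None:
                result.append(mapping[ch])
        else:
            result.append(ch)  # characters outside from_chars pass through
    joined = "".join(result)
    return joined if joined else None  # an empty string is treated as NULL

only_digits = oracle_translate("12345", "x0123456789", "x")
with_hash = oracle_translate("1234#5", "x0123456789", "x")
with_letters = oracle_translate("1bbbb", "x0123456789", "x")
```

Anything left over after the digits are removed is exactly the non-numeric part of the value.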

Concatenate variable to table element to compare three tables in SQL

I'm using Alpha Anywhere to take a SQL query that my company uses to create a grid.
The query is as follows:
SELECT t.name,cat.description,i.item_num,i.type_of_unit,i.brand,i.pack,i.description2
FROM cim i, tname t, cim cat
WHERE 'P'||i.price_book_code=t.nameid and i.price_book_group=cat.item_num and i.category!=95 and
i.buyer_num!=8 and cat.warehouse_num=0 and i.broken_case != 'Y' and i.item_num not in (select
item_num from proprietary_items)
ORDER BY t.name,cat.description,i.description2, type_of_unit;
In the WHERE clause, 'P' is concatenated to i.price_book_code to equal t.nameid, because all those values have a P at the beginning.
This query works fine in SQL Developer, however Alpha Anywhere will not run it. It claims an invalid token at the 'P' level. Apparently this type of concatenation is not compatible with portable SQL. Is there any other way I can concatenate and compare?
Thank you,
Howard
The CONCAT function should work. Here's an example of using it in the WHERE clause:
SELECT *
FROM dual
WHERE CONCAT('Foo','Bar') = 'FooBar';