I have a BQ table that looks like this
| website |
| -------- |
| xyz.com |
| abc.xyz.com |
| 123.abc.xyz.com |
| 098.com |
I want to clean up this table so that I only get the domain. In my ideal world I want to execute the following steps.
For each row - Count number of '.'
If there are more than 1 '.', THEN extract everything to the right of the second '.' from the right. So 'abc.xyz.com' gets extracted as 'xyz.com'
If there is just 1 '.', THEN do nothing and give me the input as output. So 'xyz.com' gets outputed as 'xyz.com'
Use below approach
select website, net.reg_domain(website) as result
from your_table
if applied to sample data in your question - output is
A simplier approach to extract only the domain using regex:
with sample as (
select 'xyz.com' as website
union all
select 'abc.xyz.com' as website
union all
select '123.abc.xyz.com' as website
union all
select '098.com' as website
)
select
regexp_extract(website, r"[\w]+\.[\w]+$") as website
from sample;
Output:
xyz.com
xyz.com
xyz.com
098.com
Explanation:
[\w]+ will match any sequence of one or more letters
\. will match a single dot
[\w]+$ will match any sequence of one or more letters only on the end of the string
Related
I have a table
id | status | outgoing
-------------------------
1 | paid | {"a945248027_14454878":"processing","old.a945248027_14454878":"cancelled"}
2 | pending| {"069e5248cf_45299995":"processing"}
I am trying to extract the values after each underscore in the outgoing column e.g from a945248027_14454878 I want 14454878
Because the json data is not standardised I can't seem to figure it out.
You may extract the json key part after the underscore using regexp version of substring.
select id, status, outgoing,
substring(key from '_([^_]+)$') as key
from the_table, lateral jsonb_object_keys(outgoing) as j(key);
See demo.
Pretty simple question, specific to BigQuery. I'm sure there's a command I'm missing. I'm used to using "collate" in another query which doesn't work here.
email
| -------- |
| eric#email.com |
| JOHN#EMAIL.COM |
| STACY#EMAIL.COM |
| tanya#email.com |
Desired return:
JOHN#EMAIL.COM,STACY#EMAIL.COM
Consider below
select *
from your_table
where upper(email) = email
If applied to sample data in your question - output is
In case you want the output as a comma separated list - use below
select string_agg(email) emails
from your_table
where upper(email) = email
with output
You can use below cte (which is exact data sample from your question) for testing purposes
with your_table as (
select 'eric#email.com' email union all
select 'JOHN#EMAIL.COM' union all
select 'STACY#EMAIL.COM' union all
select 'tanya#email.com'
)
Sorry, I don't know how to describe that as a title.
With a query (example: Select SELECT PKEY, TRUNC (CREATEDFORMAT), STATISTICS FROM BUSINESS_DATA WHERE STATISTICS LIKE '% business_%'), I can display all data that contains the value "business_xxxxxx".
For example, the data field can have the following content: c01_ad; concierge_beendet; business_start; or also skill_my; pre_initial_markt; business_request; topIntMaster; concierge_start; c01_start;
Is it now possible in a temp-only output the corresponding value in another column?
So the output looks like this, for example?
PKEY | TRUNC(CREATEDFORMAT) | NEW_STATISTICS
1 | 13.06.2020 | business_start
2 | 14.06.2020 | business_request
That means removing everything that does not start with business_xxx? Is this possible in an SQL query? RegEx would not be the right one, I think.
I think you want:
select
pkey,
trunc(createdformat) createddate,
regexp_substr(statistics, 'business_\S*') new_statistics
from business_data
where statistics like '% business_%'
You can also use the following regexp_substr:
SQL> select regexp_substr(str,'business_[^;]+') as result
2 from
3 --sample data
4 (select 'skill_my; pre_initial_markt; business_request; topIntMaster; concierge_start; c01_start;' as str from dual
5 union all
6 select 'c01_ad; concierge_beendet; business_start;' from dual);
RESULT
--------------------------------------------------------------------------------
business_request
business_start
SQL>
My client wants the possibility to match a set of data against an array of regular expressions, meaning:
table:
name | officeId (foreignkey)
--------
bob | 1
alice | 1
alicia | 2
walter | 2
and he wants to do something along those lines:
get me all records of offices (officeId) where there is a member with
ANY name ~ ANY[.*ob, ali.*]
meaning
ANY of[alicia, walter] ~ ANY of [.*ob, ali.*] results in true
I could not figure it out by myself sadly :/.
Edit
The real Problem was missing form the original description:
I cannot use select disctinct officeId .. where name ~ ANY[.*ob, ali.*], because:
This application, stored data in postgres-xml columns, which means i do in fact have (after evaluating xpath('/data/clients/name/text()'))::text[]):
table:
name | officeId (foreignkey)
-----------------------------------------
[bob, alice] | 1
[anthony, walter] | 2
[alicia, walter] | 3
There is the Problem. And "you don't do that, that is horrible, why would you do it like this, store it like it is meant to be stored in a relation database, user a no-sql database for Document-based storage, use json" are no options.
I am stuck with this datamodel.
This looks pretty horrific, but the only way I can think of doing such a thing would be a hybrid of a cross-join and a semi join. On small data sets this would probably work pretty well. On large datasets, I imagine the cross-join component could hit you pretty hard.
Check it out and let me know if it works against your real data:
with patterns as (
select unnest(array['.*ob', 'ali.*']) as pattern
)
select
o.name, o.officeid
from
office o
where exists (
select null
from patterns p
where o.name ~ p.pattern
)
The semi-join helps protect you from cases where you have a name like "alicia nob" that would meet multiple search patterns would otherwise come back for every match.
You could cast the array to text.
SELECT * FROM workers WHERE (xpath('/data/clients/name/text()', xml_field))::text ~ ANY(ARRAY['wal','ant']);
When casting a string array into text, strings containing special characters or consisting of keywords are enclosed in double quotes kind of like {jimmy,"walter, james"} being two entries. Also when matching with ~ it is matched against any part of the string, not the same as LIKE where it's matched against the whole string.
Here is what I did in my test database:
test=# select id, (xpath('/data/clients/name/text()', name))::text[] as xss, officeid from workers WHERE (xpath('/data/clients/name/text()', name))::text ~ ANY(ARRAY['wal','ant']);
id | xss | officeid
----+-------------------------+----------
2 | {anthony,walter} | 2
3 | {alicia,walter} | 3
4 | {"walter, james"} | 5
5 | {jimmy,"walter, james"} | 4
(4 rows)
I'm working with Bigquery to process some Adwords data and, more precisely, to extract all the url parameters from our destination URLs so we can organize it better and etc.
I wrote the following Query to give me back all the parameters available in the "DestinationURL" field in the table. As follows:
SELECT Parameter
FROM (SELECT NTH(1, SPLIT(Params,'=')) as Parameter,
FROM (SELECT
AdID,
NTH(1, SPLIT(DestinationURL,'?')) as baseurl,
split(NTH(2, SPLIT(DestinationURL,'?')),'&') as Params
FROM [adwords_accounts_ads.ads_all]
HAVING Params CONTAINS '='))
GROUP BY 1
Runnig this will give me 6 parameters. That is correct but incomplete, because in this testing table I know there are 2 other parameters in the URLs that were not fetched. One called 'group' and the other called 'utm_content'.
Now if I run:
SELECT Parameter
FROM (SELECT NTH(1, SPLIT(Params,'=')) as Parameter,
FROM (SELECT
AdID,
NTH(1, SPLIT(DestinationURL,'?')) as baseurl,
split(NTH(2, SPLIT(DestinationURL,'?')),'&') as Params
FROM [adwords_accounts_ads.ads_all]
HAVING Params CONTAINS 'p='))
GROUP BY 1
I get the "group" parameter showing.
question is: shouldn't the
"CONTAINS '='"
condition include the
"CONTAINS 'p='"
In the result? same happens for 't=' instead of '='
Does anyone know how I can fix that? or even how to extract all the parameters from a string that contains a URL?
ps: using LIKE yields the exact same thing
Thanks!
Split creates a REPEATED output type, and you have to FLATTEN the table to see correctly.
Here I used flatten on params and the output is now good:
SELECT nth(1,SPLIT(Params,'=')) AS Param,
nth(2,SPLIT(Params,'=')) AS Value
FROM flatten(SELECT
AdID,
NTH(1, SPLIT(DestinationURL,'?')) AS baseurl,
split(NTH(2, SPLIT(DestinationURL,'?')),'&') AS Params
FROM
(SELECT 1 AS AdID,'http://www.example.com.br/?h=Passagens+Aereas&source=google&vt=0' AS DestinationURL)
HAVING Params CONTAINS '=',
params
)
Outputs:
+-----+--------+------------------+---+
| Row | Param | Value | |
+-----+--------+------------------+---+
| 1 | h | Passagens+Aereas | |
| 2 | source | google | |
| 3 | vt | 0 | |
+-----+--------+------------------+---+
NOTE: The Web UI always flattens your result but If you select a destination table and uncheck "flatten results", you will get a single row with a repeated parts column.