I am looking for a way to rename a nested column that comes from my Avro schema. I tried the options in Google's docs (https://cloud.google.com/bigquery/docs/manually-changing-schemas), but any time I try to alias or cast to a nested structure it doesn't work.
For example:
SELECT
* EXCEPT(user.name.first, user.name.last),
user.name.first AS user.name.firstName,
user.name.last AS user.name.lastName
FROM
mydataset.mytable
However, this fails because aliases that contain paths are not accepted. Another option, which I am trying to avoid, is pulling in all my previous Avro files and converting them using Dataflow. I am hoping for a more elegant solution than that. Thanks.
You need to rebuild the structure at each level. Here's an example over some sample data:
SELECT
* REPLACE(
(SELECT AS STRUCT user.* REPLACE (
(SELECT AS STRUCT user.name.* EXCEPT (first, last),
user.name.first AS firstName,
user.name.last AS lastName
) AS name)
) AS user)
FROM (
SELECT
STRUCT(
STRUCT('elliott' AS first, '???' AS middle, 'brossard' AS last) AS name,
'Software Engineer' AS occupation
) AS user
)
The idea is to replace the user struct with a new one where name has the desired struct type using the nested replacement/struct construction syntax.
You have to re-build those structs. You can do something like this:
select
struct(
struct(
user.name.first as firstName,
user.name.last as lastName
) as name,
user.height as height
) as user,
address,
age
from mydataset.mytable
Once you have verified the results, you can either create a new table from them or overwrite the existing table. Overwriting is essentially a workaround for renaming columns, so do it with caution. Hope it helps.
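As a minimal sketch of the "create a new table" option, the query above can be wrapped in a CREATE TABLE ... AS SELECT statement. The target name mydataset.mytable_renamed is hypothetical, and the column list (height, address, age) is simply the one assumed in the answer's query; adjust both to the real schema.

```sql
-- Hypothetical target table; writing to a new table leaves the original untouched.
CREATE TABLE mydataset.mytable_renamed AS
SELECT
  STRUCT(
    STRUCT(
      user.name.first AS firstName,  -- renamed nested field
      user.name.last AS lastName     -- renamed nested field
    ) AS name,
    user.height AS height
  ) AS user,
  address,
  age
FROM mydataset.mytable;
```

Writing to a separate table first makes it possible to compare row counts and spot-check values before anything is overwritten.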
I am relatively new to both SQL and BigQuery. I have a table where one of the rows looks like home/desktop/parent/child/grandchild. I want the flexibility to choose the 'group by' condition, so that sometimes it groups by the root, which is 'home' here, and sometimes by any other folder name in the hierarchy. Is there a generic way to achieve this? All I can think of is parsing the row values as strings and doing some substring operations, but I am not clear about how to make a generic query for that. I appreciate any corrections to the question; I know it's kind of vague, but I have tried my best to put it. Thanks!
To implement your idea of parsing the row values with string manipulation, here are two approaches, shown on some simple sample data.
Approach 1:
Use regexp_extract to capture the part of the path that you want to group by. To group by a different folder name, add another regexp_extract that extracts that folder and use it in the group by.
with data as (
select 'home/desktop/parent/child/grandchild' as path, 'John' as owner,
union all select 'home/desktop/parent/child/grandchild_1' as path, 'Mark' as owner,
union all select 'home/desktop/parent/child/grandchild_2' as path, 'Ron' as owner,
union all select 'root/desktop/parent/child/grandchild_3' as path, 'Jason' as owner,
union all select 'root/desktop/parent/child/grandchild_4' as path, 'Pat' as owner,
),
get_root_path as (
select
regexp_extract(path, r'^(\w+)\/\w+') as root_path,
path,
owner
from data
)
select
count(root_path) as count_root_path,
root_path from get_root_path
group by root_path
Approach 2:
Use regexp_extract_all to capture every folder name matched by the regex. The values can then be accessed by index. Change the index in [OFFSET(0)] to pick a deeper folder (for example, [OFFSET(1)] returns desktop, and so on). Just make sure that your regex is correct.
with data as (
select 'home/desktop/parent/child/grandchild' as path, 'John' as owner,
union all select 'home/desktop/parent/child/grandchild_1' as path, 'Mark' as owner,
union all select 'home/desktop/parent/child/grandchild_2' as path, 'Ron' as owner,
union all select 'root/desktop/parent/child/grandchild_3' as path, 'Jason' as owner,
union all select 'root/desktop/parent/child/grandchild_4' as path, 'Pat' as owner,
),
get_folder_names as (
select
regexp_extract_all(path, r'(\w+)\/?') as folder_name,
path,
owner
from data
)
select
count(folder_name[OFFSET(0)]) as count_folder,
folder_name[OFFSET(0)] as folder
from get_folder_names
group by folder_name[OFFSET(0)]
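Both approaches hard-code the folder position. One possible way to make the level a parameter, sketched here with BigQuery scripting, is to split the path on '/' and index the resulting array with a variable. The variable name folder_level and the shortened sample data are assumptions for illustration only.

```sql
-- 0 = root folder, 1 = the folder below it, and so on (hypothetical variable).
DECLARE folder_level INT64 DEFAULT 1;

WITH data AS (
  SELECT 'home/desktop/parent/child/grandchild' AS path
  UNION ALL SELECT 'home/desktop/parent/child/grandchild_1'
  UNION ALL SELECT 'root/desktop/parent/child/grandchild_3'
)
SELECT
  -- SAFE_OFFSET returns NULL instead of an error when a path is too short.
  SPLIT(path, '/')[SAFE_OFFSET(folder_level)] AS folder,
  COUNT(*) AS row_count
FROM data
GROUP BY folder;
```

Changing the DEFAULT value switches the grouping level without touching the query body.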
I have a table with nested values, like the following:
I'd like to grab the values, with keys as columns without multiple cross joins.
i.e.
SELECT
owner_id,
owner_type,
domain,
metafields.value AS name,
metafields.value AS image,
metafields.value AS location,
metafields.value AS draw
FROM
example_table
Obviously, the above won't work for this, but the following output would be desired:
In the actual table there are hundreds of metafields per owner_id, and hundreds of owner_ids, and owner_types. Multiple joins to other tables for owner_types is fine, but for the same owner type, I don't want to have to join multiple times.
Basically, I need to be able to select the key that each column corresponds to and display the relevant value for that column, without having to display every metafield available.
Any way of doing this?
Consider the approach below:
select * except(id) from (
select t.* except(metafields),
to_json_string(t) id, key, value
from your_table t, unnest(metafields) kv
)
pivot (min(value) for key in ('name', 'image', 'location', 'draw'))
if applied to sample data in your question - output is
You can use subqueries and the SAFE_OFFSET operator to get a value from an array at a specific position.
You also need STRING_AGG, which returns a value (either STRING or BYTES) obtained by concatenating non-null values.
With the information you shared, you can use the query below.
The first part produces all the values joined by commas:
WITH sequences AS
(
SELECT 1 as ID,"product" AS owner_type,"beta.com" AS domain,["name","image","location","draw"] AS metafields_key, ["big","pic.png","utah","1"] AS metafields_value
),
Val as(
SELECT distinct id, owner_type,domain, value FROM sequences, sequences.metafields_value as value, sequences.metafields_key
), text as(
SELECT
id, owner_type, domain,
STRING_AGG(value ORDER BY value) AS Text
FROM Val
GROUP BY owner_type, domain, id
)
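The final SELECT then splits that string on the commas and returns each element as its own column. Note that STRING_AGG orders by value here, so the offsets below match the sorted order of these particular sample values and would need adjusting for other data.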
SELECT DISTINCT t1.id, t1.owner_type,domain,
split(t1.text, ',')[SAFE_offset(1)] as name,
split(t1.text, ',')[SAFE_offset(2)] as image,
split(t1.text, ',')[SAFE_offset(3)] as location,
split(t1.text, ',')[SAFE_offset(0)] as draw
from text as t1
You can see the result.
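Because the positional offsets above depend on how the values happen to sort, a variant that looks each value up by its key may be more robust. The sketch below assumes metafields is an ARRAY of STRUCT<key STRING, value STRING>, which is also what the PIVOT answer assumes; the single sample row is made up for illustration.

```sql
WITH example_table AS (
  -- Made-up sample row; the real table is assumed to have the same shape.
  SELECT
    1 AS owner_id,
    'product' AS owner_type,
    'beta.com' AS domain,
    [STRUCT('name' AS key, 'big' AS value),
     STRUCT('image', 'pic.png'),
     STRUCT('location', 'utah'),
     STRUCT('draw', '1')] AS metafields
)
SELECT
  owner_id,
  owner_type,
  domain,
  -- One correlated subquery per wanted key; no join against other rows.
  (SELECT value FROM UNNEST(metafields) WHERE key = 'name' LIMIT 1) AS name,
  (SELECT value FROM UNNEST(metafields) WHERE key = 'image' LIMIT 1) AS image,
  (SELECT value FROM UNNEST(metafields) WHERE key = 'location' LIMIT 1) AS location,
  (SELECT value FROM UNNEST(metafields) WHERE key = 'draw' LIMIT 1) AS draw
FROM example_table;
```

A key that is missing from a row simply yields NULL in that column, and LIMIT 1 keeps the subquery scalar if a key is repeated.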
Let's say there is a profile table with the fields nickname and fullname. The nickname field is text. I want to sort this table by the nickname field in a case-insensitive manner, removing duplicates of the nickname field. How should I do it?
With this table and query:
CREATE TABLE profile(
nickname text,
fullname text );
SELECT DISTINCT * FROM profile
ORDER BY lower(nickname)
This produces an error: ERROR: for SELECT DISTINCT, ORDER BY expressions must appear in select list
Unfortunately, lower() does not work with varchar (it raises an error related to utf8), and there is no way to use the CITEXT extension. How can I solve this problem?
As the error message says, with SELECT DISTINCT every ORDER BY expression must appear in the select list. Add the lower() call to the select list and it should work:
SELECT DISTINCT lower(nickname), fullname
FROM profile
ORDER BY lower(nickname);
In case you want to arbitrarily eliminate duplicated nicknames try DISTINCT ON:
SELECT DISTINCT ON (lower(nickname)) *
FROM profile
ORDER BY lower(nickname);
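DISTINCT ON keeps the first row of each group in ORDER BY order, so adding tie-breakers after lower(nickname) makes the surviving row predictable rather than arbitrary. This is only a sketch: the sample rows are invented, and which duplicate should win (here, the one that sorts first by nickname and then fullname) is an assumption.

```sql
WITH profile(nickname, fullname) AS (
  -- Invented sample rows standing in for the real profile table.
  VALUES ('Bob', 'Robert A.'),
         ('bob', 'Robert B.'),
         ('alice', 'Alice C.')
)
SELECT DISTINCT ON (lower(nickname)) nickname, fullname
FROM profile
-- The leading ORDER BY expression must match the DISTINCT ON expression;
-- the remaining ones decide which duplicate is kept.
ORDER BY lower(nickname), nickname, fullname;
```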
As the error says, the ORDER BY expression must appear in the select list:
SELECT DISTINCT lower(nickname),* FROM profile
ORDER BY lower(nickname);
I am querying a database in Postgres using psql. I have used the following query to search a field called tags that has an array of text as its data type:
select count(*) from planet_osm_ways where 'highway' = ANY(tags);
I now need to create a query that searches the tags fields for any word starting with the letter 'A'. I tried the following:
select count(*) from planet_osm_ways where 'A%' LIKE ANY(tags);
This gives me a syntax error. Any suggestions on how to use LIKE with an array of text?
Use the unnest() function to convert the array to a set of rows:
SELECT count(distinct id)
FROM (
SELECT id, unnest(tags) tag
FROM planet_osm_ways) x
WHERE tag LIKE 'A%'
The count(distinct id) counts unique entries from the planet_osm_ways table; just replace id with the name of your primary key.
That being said, you should really think about storing tags in a separate table with a many-to-one relationship to planet_osm_ways, or in a separate tags table with a many-to-many relationship to planet_osm_ways. The way you store tags now prevents an ordinary index from helping with a LIKE search on individual tags, which means that each such search performs a full table scan.
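A rough sketch of the separate-table idea follows. The table name way_tags, the index name, and the assumption that planet_osm_ways has a bigint key column called id are all hypothetical; adapt them to the real schema.

```sql
-- Hypothetical side table: one row per (way, tag) pair.
CREATE TABLE way_tags (
    way_id bigint NOT NULL,  -- assumes planet_osm_ways has a bigint key named id
    tag    text   NOT NULL
);

-- Copy the array elements out of the existing table.
INSERT INTO way_tags (way_id, tag)
SELECT DISTINCT w.id, t.tag
FROM planet_osm_ways AS w
CROSS JOIN LATERAL unnest(w.tags) AS t(tag)
WHERE t.tag IS NOT NULL;

-- text_pattern_ops lets a B-tree index serve left-anchored LIKE patterns
-- even when the database collation is not "C".
CREATE INDEX way_tags_tag_idx ON way_tags (tag text_pattern_ops);

SELECT count(DISTINCT way_id)
FROM way_tags
WHERE tag LIKE 'A%';
```

The side table has to be kept in sync with the array column (or replace it), which is the main cost of this design.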
Here is another way to do it within the WHERE clause:
SELECT COUNT(*)
FROM planet_osm_ways
WHERE (
0 < (
SELECT COUNT(*)
FROM unnest(tags) AS tag
WHERE tag LIKE 'A%'
)
);
I'm trying to integrate with some software (that I can't modify) that queries a database that I can modify.
I can give this software SQL queries, like so "select username, firstname, lastname from users where username in ?"
The software then fills in the ? with something like ('alice', 'bob'), and gets user information for them.
Thing is, there's another piece of software, which I again can't modify, which occasionally generates users like 'user2343290' and feeds them through to the first piece of software. Of course, it throws errors because it can't find that user.
So the query I want to run is something like this:
select username, firstname, lastname from users where username in ?
UNION ALL
select t.column1, 'Unknown', 'Unknown' from create_table(?) t
where create_table generates a table with the rows mentioned in ?, with the first column named column1.
Or alternatively:
select username, firstname, lastname from users where username in ?
UNION ALL
select t.column1, 'Unknown', 'Unknown' from _universe_ t where t.column1 in ?
where _universe_ is some fake table that contains every possible value in column1 (i.e. infinitely large).
I've tried select ? from dual, but unfortunately this only worked when ? was something like ('x'), not ('x', 'y').
Keep in mind I can't change the format of how the ? comes out, so I can't do select 'alice' from dual union all select 'bob' from dual.
Anyone know how I could do what I've mentioned, or something else to have a similar effect?
You can turn the delimited string of names into a table type like so:
CREATE TYPE name_tab AS TABLE OF VARCHAR2(30);
/
SELECT * FROM table(name_tab('alice','bob'));
So you would just need to create the type then your example would become:
select username, firstname, lastname from users where username in ?
UNION ALL
select t.column_value, 'Unknown', 'Unknown' from table(name_tab ?) t
(I'm assuming that the ? is replaced by simple text substitution -- because the IN wouldn't work if it was done as a bind variable -- and that the substituted text includes the parentheses.)
However, I am not sure the result of this will be helpful, since when a list of good usernames is given, you'll now have two result rows for each username, one with the actual information and another with the 'Unknown' values.
A better way to phrase the query might be:
select t.column_value username,
NVL(users.firstname,'Unknown'),
NVL(users.lastname,'Unknown')
from table(name_tab ?) t left join users on users.username = t.column_value
That should give you one row per username, with the actual data if it exists, or the 'Unknown' values if it does not.
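To make the substitution concrete, this is roughly what the statement would look like once the software has replaced the ? with a parenthesised list. It assumes the name_tab type from above already exists and that the replacement really is plain text substitution; the two usernames are just the examples from the question.

```sql
-- After text substitution of ? with ('alice', 'user2343290'):
select t.column_value as username,
       nvl(u.firstname, 'Unknown') as firstname,
       nvl(u.lastname, 'Unknown') as lastname
from table(name_tab('alice', 'user2343290')) t
left join users u on u.username = t.column_value;
```

A username with no match in users gets NULLs from the left join, which NVL turns into 'Unknown'.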
You could use a pipelined function:
create type empname_t is table of varchar2(100);
/
create or replace function to_list(p_Names in string) return empname_t pipelined is
begin
pipe row(p_Names);
return;
end;
/
select * from table(to_list('bob'));
If you need to split the names (e.g. 'bob,alice'), you could use a function accepting a string and returning an empname_t, e.g. Tom Kyte's in_list, see
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:210612357425
and modify the to_list function to iterate over the collection and pipe each item from the collection.
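As a rough illustration of the splitting variant, the sketch below does not use Tom Kyte's in_list; it swaps in a simple regexp_substr loop inside to_list itself. It assumes the empname_t type created above and a comma as the only delimiter; the local variable names are made up.

```sql
create or replace function to_list(p_names in varchar2)
  return empname_t pipelined
is
  l_item varchar2(100);
  l_pos  pls_integer := 1;  -- which comma-separated element to fetch next
begin
  loop
    -- Fetch the l_pos-th run of non-comma characters; NULL when none are left.
    l_item := regexp_substr(p_names, '[^,]+', 1, l_pos);
    exit when l_item is null;
    pipe row(trim(l_item));
    l_pos := l_pos + 1;
  end loop;
  return;
end;
/
select * from table(to_list('bob,alice'));
```

Empty elements (two commas in a row) are skipped by this pattern, which may or may not be the desired behaviour.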