So I ran into what I thought was a bizarre error this morning when I accidentally referred to a non-existent "count" column in my CTE. Postgres was looking for a GROUP BY clause even though I didn't think I was doing an aggregate. A little more playing around and it appears that table.count is equivalent to a count(*) call. Consider the following:
SELECT
c.clinic_id,
c.count,
count(*) as count_star
FROM clinic_member c
GROUP BY
c.clinic_id
ORDER BY clinic_id;
This will generate results that look like this in my dataset:
Intrigued to understand what the actual Postgres syntax rules are, I tried searching the documentation for references to this syntax and was unable to find anything; just lots of documentation for count(*). Can anyone explain whether this is valid SQL and whether there are other aggregate functions that can also be called similarly? Links to Postgres documentation would be awesome if they exist.
Note: This is on Postgres 9.5.9.
It is valid Postgres syntax, because in Postgres a function whose single argument is of a table's row type can be called in two different ways:
Assuming a table named foo and a function named some_function with a single argument of type foo, the following:
select some_function(f)
from foo f;
is equivalent to
select f.some_function
from foo f;
The alias is actually not necessary:
select foo.some_function
from foo;
This is a result of the "object oriented" structure of Postgres.
count() can take any argument, including a row reference (i.e. a record), therefore
select count(f)
from foo f;
is equivalent to
select f.count
from foo f;
This is documented in the chapter about Function Calls in the manual:
A function that takes a single argument of composite type can optionally be called using field-selection syntax, and conversely field selection can be written in functional style. That is, the notations col(table) and table.col are interchangeable. This behavior is not SQL-standard but is provided in PostgreSQL because it allows use of functions to emulate “computed fields”. For more information see Section 8.16.5.
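As a minimal sketch of that "computed field" idea (the table and function names here are made up, not from the question):
CREATE TABLE person (first_name text, last_name text);

CREATE FUNCTION full_name(p person) RETURNS text AS $$
  SELECT p.first_name || ' ' || p.last_name;
$$ LANGUAGE sql;

-- The two notations are interchangeable:
SELECT full_name(p) FROM person p;  -- functional style
SELECT p.full_name  FROM person p;  -- field-selection style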
I've used Glue to generate tables for Athena. I have some nested array/struct values (complex types) that I'm having trouble accessing via query.
I have two tables; the one in question is named "sample_parquet".
ids (array<struct<idType:string,idValue:string>>)
The cell has the value:
[{idtype=ttd_id, idvalue=cf275376-8116-4cad-a035-e241e14b1470}, {idtype=md5_email, idvalue=932babe184fb11c92b09b3e13e936124}]
And I've tried:
select ids.idtype from sample_parquet limit 1
Which yields:
SYNTAX_ERROR: line 1:8: Expression "ids" is not of type ROW
And:
select s.idtype from sample_parquet.ids s limit 1;
Which yields:
SYNTAX_ERROR: line 1:22: Schema sample_parquet does not exist
I've also tried:
select json_extract(ids, '$.idtype') as idtype from sample_parquet limit 1;
Which yields:
SYNTAX_ERROR: line 8:8: Unexpected parameters (array(row(idtype varchar,idvalue varchar)), varchar(8)) for function json_extract. Expected: json_extract(varchar(x), JsonPath) , json_extract(json, JsonPath)
Thanks for any help.
You are trying to access the elements of an array like you'd access a dictionary/key-value.
Use UNNEST to flatten the array and then you can use the . operator.
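For example, a sketch using the table and column names from the question (untested against your exact schema):
SELECT id.idtype, id.idvalue
FROM sample_parquet
CROSS JOIN UNNEST(ids) AS t(id)   -- one output row per element of the ids array
LIMIT 10;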
For more information, see the AWS docs on working with JSON and arrays.
ids is a column of type array, not a relation (e.g. a table, view, or a subquery). Confusingly, when dealing with nested types in Athena/Presto you have to stop thinking in terms of SQL and instead think more as you would in a programming language.
There are dedicated functions that act on arrays, maps, as well as lambda functions (no relationship with the AWS service), that can be used to dig into nested types.
When you say SELECT ids.idtype … I assume that what you're after could be written like ids.map((id) => id.idtype) in JavaScript. In Athena/Presto this could be expressed as SELECT transform(ids, id -> id.idtype) ….
The result of transform will be a relation with a column of type array<string>. If you want each element of that array as a separate row, you need to use UNNEST, but if you instead want just the first value you can use the element_at function. There are also other functions that you may be familiar with, such as filter, slice, and flatten, which produce new arrays, as well as reduce, which produces a scalar value.
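Sketches of the transform and element_at calls mentioned above, assuming the question's table and column names:
-- All idType values, as an array per row:
SELECT transform(ids, id -> id.idtype) AS id_types
FROM sample_parquet
LIMIT 10;

-- Only the first idType value per row:
SELECT element_at(transform(ids, id -> id.idtype), 1) AS first_id_type
FROM sample_parquet
LIMIT 10;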
I am just studying how to use SQL in snowflake. Here is a snapshot:
And this is the code used in here:
use schema SNOWFLAKE_SAMPLE_DATA.TPCH_SF1;
--use schema SNOWFLAKE_SAMPLE_DATA.TPCH_SF10;
select *
from LINEITEM
limit 200
You can see the table includes two fields: L_LINENUMBER, L_QUANTITY. Now I want to try a user-defined function, which can:
use L_LINENUMBER, L_QUANTITY as two parameters transferred into the function,
calculate L_LINENUMBER1=L_LINENUMBER+1, and L_QUANTITY1=mean(L_QUANTITY).
join the two new fields (L_LINENUMBER1, L_QUANTITY1) to the original table (LINEITEM)
How do I use CREATE FUNCTION to do this? I have read a lot of examples regarding CREATE FUNCTION, but I just cannot get the point, maybe because I am not good at SQL. So, could anyone give me a comprehensive example with all the details?
I understand that your question is about UDFs, but using UDFs for your purpose here is overkill.
You can increment an attribute in a table using the following statement.
SELECT
L_LINENUMBER+1 as L_LINENUMBER1
FROM LINEITEM;
To calculate the mean of an attribute in a table, you should understand that this is an aggregate function, which is typically used in conjunction with a GROUP BY clause. An example with your data is shown below.
SELECT
AVG(L_QUANTITY) AS L_QUANTITY1
FROM LINEITEM
GROUP BY L_ORDERKEY;
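If the goal is to attach both derived values back to the original LINEITEM rows, as the question asks, a window function avoids an explicit join. A sketch, assuming the mean should be per L_ORDERKEY as in the example above:
SELECT
    l.*,
    l.L_LINENUMBER + 1 AS L_LINENUMBER1,
    AVG(l.L_QUANTITY) OVER (PARTITION BY l.L_ORDERKEY) AS L_QUANTITY1
FROM LINEITEM l
LIMIT 200;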
Since your question was originally on UDFs and you seem to be following along with Snowflake's sample data, the example that they provide is the following UDF, which accepts a temperature in Kelvin and converts it to Fahrenheit (from the definition you can see that it can be applied to any attribute of NUMBER type).
CREATE OR REPLACE FUNCTION
UTIL_DB.PUBLIC.convert_fahrenheit(t NUMBER)
RETURNS NUMBER
COMMENT='Convert from Kelvin to Fahrenheit'
AS '(t - 273.15) * 1.8000 + 32.00';
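Once created, the UDF can be called like any built-in function, on a literal or on any numeric column (here L_QUANTITY, just to show the mechanics); a quick sketch:
SELECT UTIL_DB.PUBLIC.convert_fahrenheit(290)        AS from_literal,
       UTIL_DB.PUBLIC.convert_fahrenheit(L_QUANTITY) AS from_column
FROM LINEITEM
LIMIT 5;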
I can't seem to figure out why this won't work - can someone please help? This is part of a larger query, so I don't want to have to update the one that already exists - just wanna add to it -
SELECT INNERPART.*,
SUBSTR(status_remday, 1,1) AS COMPLETE,
--this line shows if it is completed or not
DECODE(SUBSTR(status_remday, 1,1),'Y','Complete','N','Incomplete', null) AS qualCompleted,
--need this to show if the curriculum is complete or not, in its own row; will eventually have about 10 or more qual_ids
decode(INNERPART.qualID,'ENG_CURR_SAFETY CERT', qualCompleted) as SAFETY
FROM (Innerpart)
The problem is that the SQL syntax (the Oracle dialect, anyway) doesn't allow you to define an alias in a SELECT clause and then reference the same alias in the same SELECT clause (even if it's later in the clause).
You define qualCompleted as a DECODE, and then you reference qualCompleted in the second DECODE. That won't work.
If you don't want to define qualCompleted at one level and then wrap everything within an outer SELECT where you can reference that name, your other option is to use the first DECODE, as is (not by alias) in the second DECODE.
This:
decode(INNERPART.qualID,'ENG_CURR_SAFETY CERT', qualCompleted) as SAFETY
should instead be written as
decode(INNERPART.qualID,'ENG_CURR_SAFETY CERT',
DECODE(SUBSTR(status_remday, 1,1),'Y','Complete','N','Incomplete', null) )
as SAFETY
One more thing: by default, DECODE returns NULL if the first parameter doesn't match any of the search values. So you don't actually need to give the last parameter (null) in your definition of qualCompleted.
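A quick way to convince yourself, runnable against DUAL:
SELECT DECODE('X', 'Y', 'Complete', 'N', 'Incomplete')       AS implicit_null,
       DECODE('X', 'Y', 'Complete', 'N', 'Incomplete', NULL) AS explicit_null
FROM dual;   -- both columns come back NULL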
EDIT: here is what the Oracle documentation says about column aliases.
Link: https://docs.oracle.com/database/121/SQLRF/statements_10002.htm#i2080424
c_alias: Specify an alias for the column expression. Oracle Database will use this alias in the column heading of the result set. The AS keyword is optional. The alias effectively renames the select list item for the duration of the query. The alias can be used in the order_by_clause but not other clauses in the query.
This means a few things. An alias like the qualCompleted you created cannot be used in the same query in the WHERE clause, GROUP BY, etc., and not even in the SELECT clause where it was created; it can ONLY be used in the ORDER BY clause of the same query. Any other use must be in a surrounding, "outer" query.
In your case, if you ONLY created qualCompleted so that you can reference it in another DECODE, and had no other use for it, then you don't even need to define it at all (since it doesn't help anyway); just define SAFETY directly as a nested call to DECODE.
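For completeness, a sketch of the wrap-in-an-outer-SELECT option mentioned above, keeping the question's INNERPART placeholder:
SELECT t.*,
       DECODE(t.qualID, 'ENG_CURR_SAFETY CERT', t.qualCompleted) AS SAFETY
FROM (
    SELECT INNERPART.*,
           SUBSTR(status_remday, 1, 1) AS COMPLETE,
           DECODE(SUBSTR(status_remday, 1, 1), 'Y', 'Complete', 'N', 'Incomplete') AS qualCompleted
    FROM (Innerpart)
) t;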
I'd like to reuse an alias that contains a non-alphabetic character in a query, something like:
SELECT 42 AS "the#answer", "the#answer"+8 AS "fifty";
Output I want: 42|50; output I get: 42|8.
I've tried almost every possible combination of quote types, and looked for documentation, but I can't seem to find a working solution.
Any idea?
SQL cannot refer to an alias from the same output clause in which it was introduced. (It has nothing to do with quoting, which only allows otherwise invalid identifiers; some SQL vendors would have thrown an error, but SQLite appears "more relaxed" in the handling of this case.)
You could use a nested query (sqlfiddle).
SELECT fortytwo, fortytwo + 8 as fifty
FROM (
SELECT 42 AS fortytwo)
This works because the referenced identifier, fortytwo, was introduced in a "previous" output clause.
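A CTE works the same way and keeps the original quoted alias, if you prefer that form (sketch):
WITH t AS (SELECT 42 AS "the#answer")
SELECT "the#answer", "the#answer" + 8 AS fifty
FROM t;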
I am trying to run a simple query in BigQuery that restricts results with LIKE '%', but LIKE is not in their syntax, so how can it be implemented?
You can use the REGEXP_MATCH function (see the query reference page):
REGEXP_MATCH('str', 'reg_exp')
Instead of using the % syntax used by LIKE, you should use regular expressions (detailed syntax definition here)
LIKE is officially supported in BigQuery Standard SQL -
https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#comparison_operators
And I think it also works in Legacy SQL!
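For example, a Standard SQL sketch (the table name here is hypothetical):
SELECT name
FROM `project.dataset.list`
WHERE name LIKE 'sa%';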
REGEXP_MATCH returns true if str matches the regular expression. For string matching without regular expressions, use CONTAINS instead of REGEXP_MATCH.
https://developers.google.com/bigquery/docs/query-reference#stringfunctions
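A Legacy SQL sketch of both options (dataset and table names are hypothetical):
SELECT name FROM [mydataset.list] WHERE REGEXP_MATCH(name, '^sa');
SELECT name FROM [mydataset.list] WHERE name CONTAINS 'sa';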
REGEXP_MATCH is great if you know how to use it, but if you aren't sure whether the lookup string is free of commonly used special characters such as '.', '$' or '?', you can use LEFT('str', numeric_expr) or RIGHT('str', numeric_expr) instead.
i.e. if you had a list of names and wanted to return all those that are LIKE 'sa%',
you'd use:
select name from list where LEFT(name,2)='sa'; (with 2 being the length of 'sa')
Additionally, if you wanted to say where one column's values are LIKE another's, you could swap out the 2 for LENGTH(column_with_lookup_strings) and ='sa' for =column_with_lookup_strings, leaving it looking something like this:
select name from list where LEFT(name,LENGTH(column_with_lookup_strings))= column_with_lookup_strings;
https://cloud.google.com/bigquery/query-reference