Accessing an 'f0_' field - sql

I have created a common table expression with a list of dates within a range using the following SQL code in BigQuery:
WITH calendar AS(
SELECT * FROM UNNEST(GENERATE_DATE_ARRAY('2020-01-01', CURRENT_DATE(), INTERVAL 1 DAY))
)
If I were to run that as a query without the 'WITH' clause, the query would return a single column called 'f0_' (you should be able to reproduce this result yourself).
Now, the problem I'm having is that when I attempt to access that column in a separate SELECT statement, the name 'f0_' is not recognised. ("Unrecognized name: f0_")
WITH calendar AS(
SELECT * FROM UNNEST(GENERATE_DATE_ARRAY('2020-01-01', CURRENT_DATE(), INTERVAL 1 DAY))
)
SELECT f0_
FROM calendar
Of course, that query in itself is pointless, as I could just run the first SELECT statement without the WITH clause and not need the second SELECT statement. I get that. The end result I'm after is a little more complex, though, and the logic above should be sufficient to explain the problem I'm having. Basically, if the SELECT statement inside that common table expression returns a column called 'f0_' when I run it as a standalone query, why does my second SELECT statement return an error when it references a common table expression that seems like it should return a column called 'f0_'?
I assume it's something along the lines of 'f0_' not being a real name - it's just something that gets assigned in the absence of any specified name, or maybe the naming works differently when you run it as a common table expression rather than a simple SELECT statement. Is there a way I can alias the unnested date array within my common table expression so that I can access it in the second part of my query? Or some other solution?

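Just use an alias. The name f0_ is only a placeholder that BigQuery puts on an otherwise unnamed column in the final output, so it cannot be referenced from inside the query: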
WITH calendar AS(
SELECT *
FROM UNNEST(GENERATE_DATE_ARRAY('2020-01-01', CURRENT_DATE(), INTERVAL 1 DAY)) as dt
)
SELECT dt
FROM calendar

Related

Using a calculation with an aliased column in ORDER BY

As we all know, the ORDER BY clause is processed after the SELECT clause, so a column alias in the SELECT clause can be used.
However, I find that I can’t use the aliased column in a calculation in the ORDER BY clause.
WITH data AS(
SELECT *
FROM (VALUES
('apple'),
('banana'),
('cherry'),
('date')
) AS x(item)
)
SELECT item AS s
FROM data
-- ORDER BY s; -- OK
-- ORDER BY item + ''; -- OK
ORDER BY s + ''; -- Fails
I know there are alternative ways of doing this particular query, and I know that this is a trivial calculation, but I’m interested in why the column alias doesn’t work when in a calculation.
I have tested in PostgreSQL, MariaDB, SQLite and Oracle, and it works as expected. SQL Server appears to be the odd one out.
The documentation clearly states that:
The column names referenced in the ORDER BY clause must correspond to
either a column or column alias in the select list or to a column
defined in a table specified in the FROM clause without any
ambiguities. If the ORDER BY clause references a column alias from
the select list, the column alias must be used standalone, and not as
a part of some expression in ORDER BY clause:
Technically speaking, your query should work, since the ORDER BY clause is logically evaluated after the SELECT clause and should have access to all expressions declared in the SELECT clause. But without access to the SQL specs I cannot say whether this is a limitation of SQL Server or a bonus feature that the other RDBMSs implement.
Anyway, you can use CROSS APPLY as a trick.... it is part of FROM clause so the expressions should be available in all subsequent clauses:
SELECT item
FROM t
CROSS APPLY (SELECT item + '') AS CA(item_for_sort)
ORDER BY item_for_sort
It is simply due to the way expressions are evaluated. A more illustrative example:
;WITH data AS
(
SELECT * FROM (VALUES('apple'),('banana')) AS sq(item)
)
SELECT item AS s
FROM data
ORDER BY CASE WHEN 1 = 1 THEN s END;
This returns the same Invalid column name error. The CASE expression (and the concatenation of s + '' in the simpler case) is evaluated before the alias in the select list is resolved.
One workaround for your simpler case is to append the empty string in the select list:
SELECT
item + '' AS s
...
ORDER BY s;
There are more complex ways, like using a derived table or CTE:
;WITH data AS
(
SELECT * FROM (VALUES('apple'),('banana')) AS sq(item)
),
step2 AS
(
SELECT item AS s FROM data
)
SELECT s FROM step2 ORDER BY s+'';
This is just the way that SQL Server works, and I think you could say "well SQL Server is bad because of this" but SQL Server could also say "what the heck is this use case?" :-)
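The derived-table workaround above is portable: once the alias has been introduced in an inner query, the outer query sees it as an ordinary column and can use it inside an ORDER BY expression. Here is a minimal sketch of that idea, assuming Python's built-in sqlite3 module as a stand-in engine (SQLite concatenates with || rather than +, and the DESC direction is added only to make the sort visible):

```python
import sqlite3

# In-memory SQLite database; the rows mirror the question's sample data.
conn = sqlite3.connect(":memory:")

query = """
WITH data(item) AS (
    VALUES ('apple'), ('banana'), ('cherry'), ('date')
),
step2 AS (
    -- The alias is introduced one level down ...
    SELECT item AS s FROM data
)
-- ... so here s is a real column and may appear inside an expression.
SELECT s FROM step2 ORDER BY s || '' DESC;
"""

sorted_items = [row[0] for row in conn.execute(query)]
print(sorted_items)  # ['date', 'cherry', 'banana', 'apple']
conn.close()
```

The same two-step shape is what the SQL Server version relies on; only the concatenation operator differs.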

PostgreSQL limiting results by year

I have a working PostgreSQL query. The column "code" is common to both tables, and table test.a has a date column. I want to limit the search results by year; the date format is like 2010-08-25.
SELECT *
FROM test.a
WHERE form IN ('xyz')
AND code IN (
SELECT code
FROM test.city)
Any help is appreciated.
To return rows with date_col values in the year 2010:
SELECT *
FROM test.a
WHERE form = 'xyz'
AND EXISTS (
SELECT 1
FROM test.city
WHERE code = a.code
)
AND date_col >= '2010-01-01'
AND date_col < '2011-01-01';
This way, the query can use an index on date_col (or, ideally, on (form, date_col) or (form, code, date_col) for this particular query). And the filter works correctly for data types date and timestamp alike (you did not disclose data types; the "date format" is irrelevant).
If performance is of any concern, do not use an expression like EXTRACT(YEAR FROM dateColumn) = 2010. While that seems clean and simple to the human eye it kills performance in a relational DB. The left-hand expression has to be evaluated for every row of the table before the filter can be tested. What's more, simple indexes cannot be used. (Only an expression index on (EXTRACT(YEAR FROM dateColumn)) would qualify.) Not important for small tables, crucial for big tables.
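The difference between the two filter styles can be observed in a query plan. The sketch below is only an illustration, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL, with strftime('%Y', ...) playing the role of EXTRACT(YEAR FROM ...). The table layout, sample rows, and the index name idx_a_date are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER PRIMARY KEY, form TEXT, code TEXT, date_col TEXT);
CREATE INDEX idx_a_date ON a (date_col);  -- hypothetical index name
INSERT INTO a (form, code, date_col) VALUES
    ('xyz', 'c1', '2009-12-31'),
    ('xyz', 'c1', '2010-01-01'),
    ('xyz', 'c2', '2010-08-25'),
    ('xyz', 'c2', '2011-01-01');
""")

# Half-open range on the bare column: the planner can use the index.
range_sql = "SELECT * FROM a WHERE date_col >= '2010-01-01' AND date_col < '2011-01-01'"
# Expression wrapped around the column: a plain index on date_col does not apply.
expr_sql = "SELECT * FROM a WHERE strftime('%Y', date_col) = '2010'"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

range_plan = plan(range_sql)
expr_plan = plan(expr_sql)
range_rows = conn.execute(range_sql + " ORDER BY id").fetchall()
expr_rows = conn.execute(expr_sql + " ORDER BY id").fetchall()

print(range_plan)  # mentions idx_a_date
print(expr_plan)   # a scan that does not mention idx_a_date
conn.close()
```

Both queries return the same two rows from 2010; only the range form lets the engine search the index instead of evaluating the expression for every row. The exact plan wording varies between SQLite versions, so the sketch checks only whether the index name appears.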
EXISTS can be faster than IN, except for simple cases where the query plan ends up being the same. The opposite NOT IN can be a trap if NULL values are involved, though:
Select rows which are not present in other table
If by "limit" you mean "filter", then here is one option:
SELECT
*
FROM
test_a
WHERE
form IN ('xyz')
AND code IN (
SELECT code
FROM test_city
)
AND EXTRACT(YEAR FROM dateColumn) = 2010;
db-fiddle for you to run and play with it: https://www.db-fiddle.com/f/5ELU6xinJrXiQJ6u6VH5/6

How to extract record's table name when using Table wildcard functions [duplicate]

I have a set of day-sharded data where individual entries do not contain the day. I would like to use table wildcards to select all available data and get back data that is grouped by both the column I am interested in and the day that it was captured. Something, in other words, like this:
SELECT table_id, identifier, Sum(AppAnalytic) as AppAnalyticCount
FROM (TABLE_QUERY(database_main,'table_id CONTAINS "Title_" AND length(table_id) >= 4'))
GROUP BY identifier, table_id order by AppAnalyticCount DESC LIMIT 10
Of course, this does not actually work because table_id is not visible in the table aggregation resulting from the TABLE_QUERY function. Is there any way to accomplish this? Some sort of join on table metadata perhaps?
This functionality is now available in BigQuery through the _TABLE_SUFFIX pseudocolumn. Full documentation is at https://cloud.google.com/bigquery/docs/querying-wildcard-tables.
A couple of things to note:
You will need to use Standard SQL to enable table wildcards.
You will have to rename _TABLE_SUFFIX to something else in your SELECT list, as the following example illustrates:
SELECT _TABLE_SUFFIX as table_id, ... FROM `MyDataset.MyTablePrefix_*`
Not available today, but something I'd love to have too. The team takes feature requests seriously, so thanks for adding support for this one :).
In the meantime, a workaround is doing a manual union of a SELECT of each table, plus an additional column with the date data.
For example, instead of:
SELECT x, #TABLE_ID
FROM table201401, table201402, table201403
You could do:
SELECT x, month
FROM
(SELECT x, '201401' AS month FROM table201401),
(SELECT x, '201402' AS month FROM table201402),
(SELECT x, '201403' AS month FROM table201403)
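In legacy BigQuery SQL the comma between tables in FROM acts as UNION ALL, so the workaround above amounts to tagging each shard with a literal label before combining them. Here is a minimal runnable sketch of the same idea, assuming Python's built-in sqlite3 module as a stand-in engine, an explicit UNION ALL, and made-up sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table201401 (x INTEGER);
CREATE TABLE table201402 (x INTEGER);
CREATE TABLE table201403 (x INTEGER);
INSERT INTO table201401 VALUES (1), (2);
INSERT INTO table201402 VALUES (3);
INSERT INTO table201403 VALUES (4), (5), (6);
""")

# Each branch adds a constant column naming the shard it came from,
# so the outer query can group by it.
query = """
SELECT month, COUNT(*) AS n, SUM(x) AS total
FROM (
    SELECT x, '201401' AS month FROM table201401
    UNION ALL
    SELECT x, '201402' AS month FROM table201402
    UNION ALL
    SELECT x, '201403' AS month FROM table201403
) AS shards
GROUP BY month
ORDER BY month;
"""
per_month = conn.execute(query).fetchall()
print(per_month)  # [('201401', 2, 3), ('201402', 1, 3), ('201403', 3, 15)]
conn.close()
```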

HQL count from multiple tables

I would like to query my database using a HQL query to retrieve the total number of rows having a MY_DATE greater than SOME_DATE.
So far, I have come up with a native Oracle query to get that result, but I am stuck when writing in HQL:
SELECT
(
SELECT COUNT(MY_DATE)
FROM Table1
WHERE MY_DATE >= TO_DATE('2011-09-07','yyyy-MM-dd')
)
+
(
SELECT COUNT(MY_DATE)
FROM Table2
WHERE MY_DATE >= TO_DATE('2011-09-07','yyyy-MM-dd')
)
AS total
I actually have more than 2 tables but I keep having an IllegalArgumentException (unexpected end of subtree).
The working native Oracle basically ends with FROM dual.
What HQL query should I use to get the total number of rows I want?
First off, if you have a working SQL query, why not just use that instead of trying to translate it to HQL? Since you're returning a single scalar in the first place, it's not like you need anything HQL provides (e.g. dependent entities, etc.).
Secondly, do you have 'dual' mapped in Hibernate? :-) If not, how exactly are you planning on translating that?
That said, the "unexpected end of subtree" error is usually caused by idiosyncrasies of Hibernate's AST parser. A commonly used workaround is to prefix the expression with '0 +':
select 0 + (
... nested select #1 ...
) + (
... nested select #2 ...
) as total
from <from what exactly?>
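For the first suggestion (keep the native SQL), the shape of the query is just two scalar subqueries added together. The sketch below illustrates that shape only, assuming Python's built-in sqlite3 module as a stand-in for Oracle: SQLite needs no FROM dual, ISO date strings replace TO_DATE, and the sample rows and the :some_date parameter name are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (MY_DATE TEXT);
CREATE TABLE Table2 (MY_DATE TEXT);
INSERT INTO Table1 VALUES ('2011-09-01'), ('2011-09-07'), ('2011-10-15');
INSERT INTO Table2 VALUES ('2011-08-30'), ('2011-09-08'), (NULL);
""")

# Each scalar subquery yields one count; the outer SELECT adds them up.
# COUNT(MY_DATE) skips NULLs, as it does in the original Oracle query.
query = """
SELECT
    (SELECT COUNT(MY_DATE) FROM Table1 WHERE MY_DATE >= :some_date)
  + (SELECT COUNT(MY_DATE) FROM Table2 WHERE MY_DATE >= :some_date)
  AS total
"""
total = conn.execute(query, {"some_date": "2011-09-07"}).fetchone()[0]
print(total)  # 3
conn.close()
```

With more than two tables, each additional table contributes one more parenthesised subquery to the sum.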