Athena subquery not working - incorrect table name - sql

I'm trying to write an Athena query that uses a subquery in the WHERE clause because I want to filter on an array-type field. I don't want to use CROSS JOIN UNNEST because I don't want to flatten each row.
Example query:
SELECT
foo.some_scalar_field,
foo.some_other_scalar_field
FROM "fake_db"."table_name"
WHERE EXISTS (SELECT NULL FROM foo.some_array_field as T(item) WHERE item = "THING")
When I do this, I get an error saying that "foo" isn't a valid table name. I'm a bit new to Athena, but a SQL engine I used previously supported this kind of query. Is this not possible in Athena?
Edit:
I went ahead and added an unnest:
SELECT
foo.some_scalar_field,
foo.some_other_scalar_field
FROM "fake_db"."table_name"
WHERE EXISTS (SELECT NULL FROM UNNEST(foo.some_array_field) as T(item) WHERE item = "THING")
Now I get an error saying that the correlated subquery here is not supported. It seems like Athena doesn't support correlated subqueries?

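There is no need for UNNEST or a correlated subquery; just use the array functions. Note that string literals take single quotes in Athena (double quotes denote identifiers), and that foo has to be declared as an alias for the table. Something along these lines:
SELECT
foo.some_scalar_field,
foo.some_other_scalar_field
FROM "fake_db"."table_name" AS foo
WHERE any_match(foo.some_array_field, item -> item = 'THING')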
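To make the semantics concrete, here is a small Python sketch of what an any_match filter does: each input row is kept or dropped as a whole, so nothing is flattened. This only emulates the behaviour and is not Athena code; the rows are hypothetical sample data, and NULL handling is left out.

```python
# Hypothetical rows standing in for "fake_db"."table_name".
rows = [
    {"some_scalar_field": 1, "some_other_scalar_field": "a",
     "some_array_field": ["THING", "x"]},
    {"some_scalar_field": 2, "some_other_scalar_field": "b",
     "some_array_field": ["y"]},
    {"some_scalar_field": 3, "some_other_scalar_field": "c",
     "some_array_field": []},
]


def any_match(array, predicate):
    """True if at least one element satisfies the predicate (NULLs ignored here)."""
    return any(predicate(item) for item in array)


# One output row per input row that passes the filter; the array is never unnested.
selected = [
    (r["some_scalar_field"], r["some_other_scalar_field"])
    for r in rows
    if any_match(r["some_array_field"], lambda item: item == "THING")
]
print(selected)
```

Only the first row survives, and it appears once even though its array has two elements.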

Related

Cross joining to an unnested mapping field in HQL (works in athena, not in Hive)

So I have two (mapping) fields I need to unpack and break out into rows. In Athena, I can use the following approach to unpack either of them:
SELECT
unique_id,
key,
value
FROM
(
select
unique_id,
purchase_history
from table
)
CROSS JOIN unnest(purchase_history) t(key,value)
This works perfectly in Athena: I get one row for each purchase along with its unique identifier. However, when I try to test it in Hive, it doesn't work. Is there anything specific in here that doesn't fly in HQL? I think cross joins are allowed, but perhaps the way I am calling the field isn't working? Or is it the "unnest"? Please let me know if you need further explanation.
You can do the same in Hive using lateral view explode. If purchase_history is of type map, this will work:
SELECT
s.unique_id,
t.key,
t.value
FROM
(
select
unique_id,
purchase_history
from table
) s --alias for sub-queries is a must in Hive
lateral view explode(s.purchase_history) t as key,value
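As a rough illustration of what exploding a map column produces, the Python sketch below emits one output row per map entry, carrying the parent row's unique_id along. It only emulates the row shape and is not Hive code; the sample data and the function name lateral_view_explode are made up for the example.

```python
# Hypothetical rows: unique_id plus a map-typed purchase_history column.
rows = [
    {"unique_id": "u1", "purchase_history": {"2021-01-01": 9.99, "2021-02-01": 4.50}},
    {"unique_id": "u2", "purchase_history": {"2021-03-15": 20.00}},
    {"unique_id": "u3", "purchase_history": {}},
]


def lateral_view_explode(rows):
    """Yield (unique_id, key, value) for every entry of every row's map."""
    for row in rows:
        for key, value in row["purchase_history"].items():
            yield (row["unique_id"], key, value)


exploded = list(lateral_view_explode(rows))
for record in exploded:
    print(record)
```

A row whose map is empty (u3 here) contributes no output rows, which matches a plain LATERAL VIEW; Hive's LATERAL VIEW OUTER would keep such a row with NULLs instead.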

Matching multiple columns in a nested query

I'm trying to make a nested query, where I return values from multiple columns and match those with values from another column. Something similar to the code below (this is for work so I can't put the exact code here)
select *
from store1
where (orderNum, customer, total) not in (
select orderNum, customer, total from store2)
When I try to run the code, I get the error:
An expression of non-boolean type specified in a context where a condition is expected, near ','.
Is there a way to do this in SQL Server? I know using a join is an option, but I'd prefer to avoid that at this time. Thanks for any help!
SQL Server doesn't support multi-column (row value) comparisons with IN like that, but there are lots of other ways to do it. My preferred approach is NOT EXISTS:
SELECT * FROM dbo.store1
WHERE NOT EXISTS
(
SELECT 1 FROM dbo.store2
WHERE store1.orderNum = store2.orderNum
AND store1.customer = store2.customer
AND store1.total = store2.total
);
For other approaches, see:
Should I use NOT IN, OUTER APPLY, LEFT OUTER JOIN, EXCEPT, or NOT EXISTS?
"count" returns a single value, works fast. Easy to understand.
select *
from store1
where
(select count(*)
from store2
where store2.orderNum = store1.orderNum
and store2.customer = store1.customer
and store2.total = store1.total) = 0
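The two approaches can be checked side by side on a toy data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption made purely so the example is runnable), and the rows in store1 and store2 are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store1 (orderNum INTEGER, customer TEXT, total REAL);
    CREATE TABLE store2 (orderNum INTEGER, customer TEXT, total REAL);
    INSERT INTO store1 VALUES (1, 'ann', 10.0), (2, 'bob', 20.0), (3, 'cy', 30.0);
    -- Order 1 matches on all three columns; order 3 differs in total.
    INSERT INTO store2 VALUES (1, 'ann', 10.0), (3, 'cy', 99.0);
""")

# Rows of store1 with no row in store2 matching on all three columns.
not_exists = conn.execute("""
    SELECT * FROM store1
    WHERE NOT EXISTS (
        SELECT 1 FROM store2
        WHERE store1.orderNum = store2.orderNum
          AND store1.customer = store2.customer
          AND store1.total    = store2.total
    )
    ORDER BY orderNum
""").fetchall()

# Same filter expressed as a correlated count compared with 0.
count_zero = conn.execute("""
    SELECT * FROM store1
    WHERE (SELECT count(*) FROM store2
           WHERE store2.orderNum = store1.orderNum
             AND store2.customer = store1.customer
             AND store2.total    = store1.total) = 0
    ORDER BY orderNum
""").fetchall()

print(not_exists)
print(count_zero)
```

Both queries return orders 2 and 3 here. Note that with plain = comparisons, a NULL in any of the compared columns never matches, so such rows of store1 would be returned by both forms.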

BigQuery - running count and split functions together

I am trying to run a count on the result of the split functions. The query below shows an example:
select a.name,
count(if(split(b.name,",")='test',null,1)) > 0 hasTest,
from (select * from (select 'test,this' as name) a left join (select '2' as name) b on
a.name=b.name)
This query yields an error: SELECT clause has mix of aggregations 'hasTest' and fields 'a.name' without GROUP BY clause
If I change the hasTest column to be an integer instead of boolean, so that:
count(if(split(b.name,",")='test',null,1))
The query succeeds.
For some reason BigQuery knows how to evaluate the count function (working on a nested element created in place, therefore not requiring a group by clause), but is not able to take the same capability when the count is wrapped in a boolean operator.
I think it's just an unclear error message.
The problem here seems to be with the data type of NULL. BigQuery needs you to define the NULL's data type; the default is boolean. If you don't define it, the same field ends up with a mix of data types.

CROSS JOIN of query and single-row table

I have a big query and a table with a single row (I store some constants in it).
What is the best way to join the row of the table with every row of the query considering that Access doesn't support cross joins with queries?
SELECT * from (subquery), table -- Invalid in Access
Access will accept a cross join between a query named some_query and a table named some_table like this ...
SELECT *
FROM some_query, some_table;
With your names, try it this way ...
SELECT * from [some query], [table]
IOW, get rid of the parentheses, and enclose the data source names in square brackets because of the space in some query and because table is a reserved word.
OTOH, if you meant some query to be a placeholder for the text of a SQL statement instead of the name of a saved query, consider this example.
SELECT *
FROM
(SELECT * FROM agents) AS sub, Dual;
According to Microsoft and this previous question, cross joins are legal. You say it is invalid, but did you get an error message when you tried?
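The general pattern, a comma join between an aliased subquery and a single-row table, can be tried out quickly. The sketch below runs it in SQLite through Python's sqlite3 module rather than in Access, so it only shows the shape of the result; the constants table and its tax_rate column are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agents (agent_id INTEGER, agent_name TEXT);
    CREATE TABLE constants (tax_rate REAL);  -- hypothetical single-row table
    INSERT INTO agents VALUES (1, 'ann'), (2, 'bob');
    INSERT INTO constants VALUES (0.2);
""")

# Comma (cross) join: every subquery row is paired with the one constants row.
joined = conn.execute("""
    SELECT sub.agent_id, sub.agent_name, constants.tax_rate
    FROM (SELECT * FROM agents) AS sub, constants
    ORDER BY sub.agent_id
""").fetchall()

print(joined)
```

Because constants has exactly one row, the result has the same number of rows as the subquery, each with the constant appended.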

SQL Query Syntax : Using table alias in a count is invalid? Why?

Could someone please explain to me why the following query is invalid? I'm running this query against an Oracle 10g database.
select count(test.*) from my_table test;
I get the following error: ORA-01747: invalid user.table.column, table.column, or column specification
however, the following two queries are valid.
select count(test.column) from my_table test;
select test.* from my_table test;
COUNT(expression) will count all rows where expression is not null. COUNT(*) is an exception: it returns the number of rows, and * is not an alias for my_table.*.
As far as I know, Count(Table.*) is not officially supported in the SQL specification; only Count(*) (count all rows returned) and Count(Table.ColumnName) (count all non-null values in the given column) are. So, even if the DBMS supported it, I would recommend against using it.
This syntax only works in PostgreSQL and only because it has a record datatype (for which test.* is a meaningful expression).
Just use COUNT(*).
This query:
select count(test.column) from my_table test;
will return you the number of records for which test.column is not NULL.
This query:
select test.* from my_table test;
will just return you all records from my_table.
COUNT as such is probably the only aggregate that makes sense without parameters, and using an expression like COUNT(*) is just a way to call a function without providing any actual parameters to it.
You might reasonably want to find the number of records where test.column is not NULL if you are doing an outer join. As every table should have a PK (which is not null) you should be able to count the rows like that if you want:
select count(y.pk)
from x
left outer join y on y.pk = x.ck
COUNT(*) is no good here because the outer join still produces a row, with NULLs in y's columns, for every x row that has no match in y, and COUNT(*) counts those rows too.
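The difference is easy to see on a tiny example. The sketch below uses SQLite through Python's sqlite3 module rather than Oracle, on the assumption that both follow the standard COUNT semantics for this case; the contents of x and y are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE x (ck INTEGER);
    CREATE TABLE y (pk INTEGER PRIMARY KEY);
    INSERT INTO x VALUES (1), (2), (3);
    INSERT INTO y VALUES (1), (2);   -- nothing in y matches x.ck = 3
""")

# count(*) counts every joined row; count(y.pk) skips rows where y.pk is NULL.
count_star, count_pk = conn.execute("""
    SELECT count(*), count(y.pk)
    FROM x
    LEFT OUTER JOIN y ON y.pk = x.ck
""").fetchone()

print(count_star, count_pk)
```

The unmatched x row is padded with a NULL y.pk, so it is included by count(*) but not by count(y.pk).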