Getting SemanticException on a very simple query - hive

I have the following very simple query.
SELECT "test" FROM mydb.mytable
INNER JOIN yourdb.yourtable
ON yourtable.id = mytable.id LIMIT 10;
It fails with the following error (which suggests a syntax error):
Error while compiling statement: FAILED: SemanticException MetaException(message:Exception thrown when executing query)
The column id is of bigint type.
Surprisingly, it works if I keep the query as it is but use a different table, with exactly the same schema, in place of mytable. It looks like Hive is not showing a meaningful error here. Does anyone have a clue?
To add to the confusion, consider the following:
-- SUCCESS – All the data up to 24th Sep 2016
SELECT "test" AS col FROM mydb.mytable INNER JOIN yourdb.yourtable ON yourtable.id = mytable.id where mytable.id <= 2016092423 LIMIT 1;
-- SUCCESS – All the data after 24th Sep 2016
SELECT "test" AS col FROM mydb.mytable INNER JOIN yourdb.yourtable ON yourtable.id = mytable.id where mytable.id >= 2016092423 LIMIT 1;
-- ERROR – All the data up to 25th Sep 2016
SELECT "test" AS col FROM mydb.mytable INNER JOIN yourdb.yourtable ON yourtable.id = mytable.id where mytable.id <= 2016092523 LIMIT 1;
This is totally contradictory behavior. The first two queries together cover the entire range of ids. Given that both of them succeed, it is very surprising that the third query fails.

Related

Query works in SQL Server but not in R

I wrote a query in SQL Server and it ran without a problem. Here's the query with the names changed for privacy reasons.
SELECT *
FROM table1 (nolock)
LEFT JOIN table2 (nolock)
ON table1.ID = table2.ID
WHERE table1.Date = '2021-03-05' AND table1.ID = '120';
This works fine and pulls 30k rows. I have created an ODBC connection using the DBI and odbc packages to the server in R. I can run queries from R just fine. I've run many without an issue. For example, this runs with no errors:
DBI::dbGetQuery(conn, believeNRows = FALSE, "
SELECT TOP 10 *
FROM table1 (nolock);
")
But when I include a LEFT JOIN, then the query in R fails and returns an error.
Here's the same query in R:
DBI::dbGetQuery(conn, believeNRows = FALSE, "
SELECT *
FROM table1 (nolock)
LEFT JOIN table2 (nolock)
ON table1.ID = table2.ID
WHERE table1.Date = '2021-03-05' AND table1.ID = '120';
")
I'm working in VSCode so the error message isn't very informative:
Error in app$vspace(new_style$margin-top %||% 0) : attempt to
apply non-function
Based on some other answers, I included a couple of extra options in the query, but they didn't help:
DBI::dbGetQuery(conn, believeNRows = FALSE, "
SET ANSI_WARNINGS OFF;
SET NOCOUNT ON;
SELECT *
FROM table1 (nolock)
LEFT JOIN table2 (nolock)
ON table1.ID = table2.ID
WHERE table1.Date = '2021-03-05' AND table1.ID = '120';
")
Does anyone have any idea why this perfectly good query isn't working when passed to SQL Server from within R?
In case anyone else has this problem in the future, there were two hurdles in my way. First, relating to @r2evans's comment, my error messages were being masked. Per a readr GitHub issue, I reinstalled the cli package and started getting unmasked error messages. I reran my code and received this error:
Error:
! Column names `ID1` and `Col1` must not be duplicated.
Use .name_repair to specify repair.
Caused by error in `stop_vctrs()`:
! Names must be unique.
x These names are duplicated:
* "ID1" at locations 3 and 72.
* "Col1" at locations 45 and 75.
Run `rlang::last_error()` to see where the error occurred.
Turns out, the SQL call brings back all columns from both tables, including the ones that share a name. SSMS displays such a result without complaint, but R does not accept duplicate column names. So I changed my SELECT clause to choose the specific columns I wanted and the code worked.
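For anyone who wants to see the duplicate-name problem in isolation, here is a minimal sketch. It uses Python's standard-library sqlite3 module instead of SQL Server and R, purely so that it is self-contained; the tiny tables and the aliases t1_col1 and t2_col1 are made up for illustration. It compares the column names returned by SELECT * over a join with those returned by an explicit, aliased column list.

```python
import sqlite3

# In-memory stand-ins for table1 and table2 (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ID INTEGER, Col1 TEXT);
    CREATE TABLE table2 (ID INTEGER, Col1 TEXT);
    INSERT INTO table1 VALUES (120, 'left');
    INSERT INTO table2 VALUES (120, 'right');
""")

def column_names(sql):
    """Return the result-set column names reported by the driver."""
    cur = conn.execute(sql)
    return [d[0] for d in cur.description]

# SELECT * returns every column of both tables, so names repeat.
star_names = column_names(
    "SELECT * FROM table1 LEFT JOIN table2 ON table1.ID = table2.ID"
)

# Listing and aliasing the columns gives each one a unique name.
explicit_names = column_names("""
    SELECT table1.ID   AS ID,
           table1.Col1 AS t1_col1,
           table2.Col1 AS t2_col1
    FROM table1
    LEFT JOIN table2 ON table1.ID = table2.ID
""")

print(star_names)
print(explicit_names)
```

The same idea applies to the SQL Server query: name the columns you need and alias any that exist in both tables.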

How to do a join in hive with a subquery?

I'm using Hive 1.2.1 and I'm running into some problems when trying to join using a subquery.
My main table is applications and I'm trying to join it to table credits, based on account and dates. The date condition is giving me troubles when I try to get just one row (the credit has to be after the application, and it can only be one to avoid dupes in the join). I'm using the following code:
SELECT COUNT(1)
FROM applications apps
LEFT JOIN credits c
ON c.python_id =
(
SELECT python_id
FROM credits cr
WHERE cr.ind in ('NP','0P')
AND cr.acct_nbr = apps.acct_nbr
AND cr.date >= apps.date
ORDER BY cr.date DESC
LIMIT 1
)
I'm getting the following error:
[Code: 40000, SQL State: 42000] Error while compiling statement: FAILED: ParseException line 8:24 cannot recognize input near 'SELECT' 'python_id' 'FROM' in expression specification
Could you please help?
Thank you
The issues with your query are:
> Hive does not support a subquery with an equals clause; you can write a subquery only for the IN, NOT IN, EXISTS and NOT EXISTS clauses.
> You cannot have a subquery that returns more than one row.
Please look into https://cwiki.apache.org/confluence/display/Hive/Subqueries+in+SELECT
There is an issue with your logic as well.
My understanding is that you are trying to get a count from the main table with a left join, and there is no filter condition on the outer query to say which records you want.
So the count will always be equal to the number of records in the main table (applications). If you can provide sample input data and the expected output, we can help you with the query.
Hope this helps.
You should join based on acct_nbr and use the date as a filter in the WHERE clause.
SELECT COUNT(1)
FROM applications apps
JOIN credits c
ON c.acct_nbr = apps.acct_nbr
WHERE c.ind in ('NP','0P')
AND c.date >= apps.date
ORDER BY c.date DESC
LIMIT 1
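Another way to express "at most one credit per application" without a subquery in the ON clause is to join first and then rank the matches with ROW_NUMBER. The sketch below shows the shape of that query. It runs on Python's standard-library sqlite3 (SQLite 3.25 or later is needed for window functions) rather than Hive, and the sample rows are made up. Note also that Hive 1.2.1 may reject the non-equality date condition inside ON, in which case that condition has to be handled differently; treat this as an illustration of the ranking idea, not as tested Hive code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE applications (acct_nbr INTEGER, date TEXT);
    CREATE TABLE credits (python_id INTEGER, acct_nbr INTEGER,
                          ind TEXT, date TEXT);
    INSERT INTO applications VALUES (1, '2016-01-10'), (2, '2016-02-01');
    INSERT INTO credits VALUES
        (10, 1, 'NP', '2016-01-05'),  -- before the application: excluded
        (11, 1, 'NP', '2016-01-15'),
        (12, 1, '0P', '2016-03-01'),  -- latest qualifying credit
        (13, 1, 'XX', '2016-04-01');  -- wrong ind: excluded
""")

# Join every qualifying credit, rank them per application (latest first),
# then keep only the top-ranked row so the join cannot create duplicates.
rows = conn.execute("""
    SELECT acct_nbr, app_date, python_id
    FROM (
        SELECT apps.acct_nbr AS acct_nbr,
               apps.date     AS app_date,
               c.python_id   AS python_id,
               ROW_NUMBER() OVER (
                   PARTITION BY apps.acct_nbr, apps.date
                   ORDER BY c.date DESC
               ) AS rn
        FROM applications apps
        LEFT JOIN credits c
               ON c.acct_nbr = apps.acct_nbr
              AND c.ind IN ('NP', '0P')
              AND c.date >= apps.date
    ) ranked
    WHERE rn = 1
    ORDER BY acct_nbr
""").fetchall()

print(rows)
```

Applications without any qualifying credit are kept, with a NULL python_id, because the filter conditions sit in the ON clause of the LEFT JOIN.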

Why does Oracle SQL update query return "invalid identifier" on existing column?

I have an update query for an Oracle SQL db. Upon execution the query returns ORA-00904: "t1"."sv_id": invalid identifier
So, why do I get an "invalid identifier" error message although the column exists?
Here is the complete query (replaced actual table and column names by dummies in np++)
UPDATE table_1 t1 SET (type) =
CASE
WHEN
((SELECT COUNT(dateCheck.id) FROM table_2 dateCheck
WHERE dateCheck.s_id = t1.s_id AND dateCheck.sv_id = t1.sv_id) = 0)
THEN
(SELECT sv.type FROM table_3 sv WHERE sv.id = t1.sv_id)
ELSE
(SELECT type FROM
(SELECT d.type as type FROM table_2 d
WHERE d.s_id = t1.s_id AND d.sv_id = t1.sv_id
ORDER BY d.creationTimestamp ASC)
WHERE ROWNUM = 1)
END
Now I don't understand why that error occurs.
Here is what I already know:
The Queries in the CASE statement work when executed separately, provided they are wrapped into a query that provides table_1 t1 for sure.
t1.s_id seems to work, since Oracle doesn't complain about it. When I change it to a column that really doesn't exist, Oracle complains about that non-existent column before returning anything about t1.sv_id. So somehow the alias might work, although I'm not sure about it.
I'm 100% sure that the column t1.sv_id exists and that no typo was made. I executed a query on t1 directly and double-checked everything in Notepad++ by marking all occurrences.
A (completely unrelated) update query like the following works as well (note the alias t1 is used in the select query). Don't assume table_1/2 to be the same as in the update query above; I just reused the names. This should just illustrate that I have successfully used an alias in an update query before.
update table_1 t1 set (t2_id) = (select id from table_2 t2 where t1.id = t2.t1_id)
UPDATE
Thanks a lot for pointing me to the "you don't have access to aliases in deeper subquery layers" issue. That got me on track again pretty fast.
So here is the query I ended up with. It seems to work fine. It eliminates the access to t1 in the deeper layers and selects the oldest row, so the ELSE part should return the same result I expected from the original query.
UPDATE table_1 t1 SET (type) =
CASE
WHEN
((SELECT COUNT(dateCheck.id) FROM table_2 dateCheck
WHERE dateCheck.s_id = t1.s_id AND dateCheck.sv_id = t1.sv_id) = 0)
THEN
(SELECT sv.type FROM table_3 sv WHERE sv.id = t1.sv_id)
ELSE
(SELECT d.type as type FROM table_2 d
WHERE d.s_id = t1.s_id
AND d.sv_id = t1.sv_id
AND d.creation = (SELECT MIN(id.creation) FROM table_2 id
WHERE d.s_id = id.s_id AND d.sv_id = id.sv_id))
END
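To check that the MIN-based ELSE branch really picks the oldest row, here is a small sketch on made-up data. It uses Python's standard-library sqlite3, not Oracle, so it does not reproduce ORA-00904 (SQLite's scoping rules differ); it only exercises the logic of the rewritten statement. The column name creation follows the query above, the sample values are hypothetical, and the table name table_1 is used directly instead of the alias t1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_1 (s_id INTEGER, sv_id INTEGER, type TEXT);
    CREATE TABLE table_2 (id INTEGER, s_id INTEGER, sv_id INTEGER,
                          type TEXT, creation TEXT);
    CREATE TABLE table_3 (id INTEGER, type TEXT);

    INSERT INTO table_1 VALUES (1, 100, NULL), (2, 200, NULL);
    INSERT INTO table_2 VALUES
        (1, 1, 100, 'newer',  '2015-06-01'),
        (2, 1, 100, 'oldest', '2015-01-01');
    INSERT INTO table_3 VALUES (100, 'from_t3_100'), (200, 'from_t3_200');

    -- The outer table is referenced only one subquery level deep;
    -- the innermost subquery correlates with d, not with table_1.
    UPDATE table_1 SET type =
        CASE
            WHEN (SELECT COUNT(dc.id) FROM table_2 dc
                  WHERE dc.s_id = table_1.s_id
                    AND dc.sv_id = table_1.sv_id) = 0
            THEN (SELECT sv.type FROM table_3 sv
                  WHERE sv.id = table_1.sv_id)
            ELSE (SELECT d.type FROM table_2 d
                  WHERE d.s_id = table_1.s_id
                    AND d.sv_id = table_1.sv_id
                    AND d.creation = (SELECT MIN(m.creation) FROM table_2 m
                                      WHERE m.s_id = d.s_id
                                        AND m.sv_id = d.sv_id))
        END;
""")

result = conn.execute(
    "SELECT s_id, sv_id, type FROM table_1 ORDER BY s_id"
).fetchall()
print(result)
```

One caveat with this approach: if two table_2 rows share the same minimum creation value for an (s_id, sv_id) pair, the ELSE subquery returns more than one row, which Oracle reports as ORA-01427 (single-row subquery returns more than one row).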
You can't reference a table alias in a subquery of a subquery; the alias doesn't apply (or doesn't exist, or isn't in scope, depending on how you prefer to look at it). With the code you posted the error is reported against line 11 character 24, which is:
(SELECT type FROM
(SELECT d.type as type FROM table_2 d
WHERE d.s_id = t1.s_id AND d.sv_id = t1.sv_id
^^^^^^^^
If you change the t1.s_id reference on the same line to something invalid then the error doesn't change and is still reported as ORA-00904: "T1"."SV_ID": invalid identifier. But if you change the same reference on line 5 instead to something like
((SELECT COUNT(dateCheck.id) FROM table_2 dateCheck
WHERE dateCheck.s_id = t1.s_idXXX AND dateCheck.sv_id = t1.sv_id) = 0)
... then the error changes to ORA-00904: "T1"."S_IDXXX": invalid identifier. This is down to how the statement is being parsed. In your original version the subquery in the WHEN clause is valid, and you only break it by changing that identifier. The subquery in the ELSE is also OK. But the nested subquery in the ELSE has the problem, and changing the t1.s_id in that doesn't make any difference because the parser reads that part of the statement backwards (I don't know, or can't remember, why!).
So you have to eliminate the nested subquery. A general approach would be to make the whole CASE an inline view which you can then join using s_id and sv_id, but that's complicated as there may be no matching table_2 record (based on your count); and there may be no s_id value to match against as that isn't being checked in table_3.
It isn't clear if there will always be a table_3 record even when there is a table_2 record, or if they're mutually exclusive. If I've understood what the CASE is doing then I think you can use an outer join between those two tables and compare the combined data with the row you're updating, but because of that ambiguity it needs to be a full outer join. I think.
Here's a stab at using that construct with a MERGE instead of an update.
MERGE INTO table_1 t1
USING (
SELECT t2.s_id,
coalesce(t2.sv_id, t3.id) as sv_id,
coalesce(t2.type, t3.type) as type,
row_number() over (partition by t2.s_id, t2.sv_id
order by t2.creationtimestamp) as rn
FROM table_2 t2
FULL OUTER JOIN table_3 t3
ON t3.id = t2.sv_id
) tmp
ON ((tmp.s_id is null OR tmp.s_id = t1.s_id) AND tmp.sv_id = t1.sv_id AND tmp.rn = 1)
WHEN MATCHED THEN UPDATE SET t1.type = tmp.type;
If there will always be a table_3 record then you could use that as the driver and have a left outer join to table_2 instead, but hard to tell which might be appropriate. So this is really just a starting point.
SQL Fiddle with some made-up data that I believe would have hit both branches of your case. More realistic data would expose the flaws and misunderstandings, and suggest a more robust (or just more correct) approach...
Your query and your analysis seem sound to me. I have no solution, but here are a few things you can try that might trigger something that explains this odd behavior:
Quote the column (just in case it happens to be a SQL keyword).
Use table_1.sv_id - this works as long as the whole query contains this table only once.
Make sure that the alias t1 exists only once
Run the query with a query tool like SQuirrel SQL - the tool can examine the exact position where Oracle reports the problem. Maybe it's in a different place of the query than you think
Check () and make sure they are around the parts where they should be.
Swap the order of expressions around =

How does SQL Server Update rows with more than one value?

In an update statement for a temp table, how does SQL Server decide which value to use when there are multiple values returned, for example:
UPDATE A
SET A.dte_start_date = table1.dte_start_date
FROM #temp_table A
INNER JOIN table1 ON A.id = table1.id
In this situation the problem is that more than one dte_start_date is returned for each id value in the temp table. There is no index or unique value in the tables I'm working on, so I need to know how SQL Server will choose between the different values.
It is non-deterministic. See the following example for a better understanding. Though it is not exactly the same scenario as the one described here, it is pretty similar.
When a single value is to be retrieved from the database, use the SET statement with a query to set the value. For example:
SET @v_user_user_id = (SELECT u.user_id FROM users u WHERE u.login = @v_login);
Reason: Unlike Oracle, SQL Server does not raise an error if more than one row is returned from a SELECT query that is used to populate variables. The SET query above throws an exception in that case, whereas the following SELECT does not, and the variable ends up holding an arbitrary value from the queried table(s).
SELECT @v_user_user_id = u.user_id FROM users u WHERE u.login = @v_login;
Which value is used is non-deterministic if you have a one-to-many relationship.
In SQL Server (>= 2005) I would use a CTE, since it's a readable way to specify what I want using ROW_NUMBER. Another advantage of a CTE is that you can easily change it to do a select instead of an update (or delete) to see what will happen.
Assuming that you want the latest record (according to dte_start_date) for every id:
WITH CTE AS
(
SELECT table1.id, table1.dte_start_date,
rn = ROW_NUMBER() OVER (PARTITION BY table1.id
ORDER BY table1.dte_start_date DESC)
FROM table1
)
UPDATE A
SET A.dte_start_date = CTE.dte_start_date
FROM #temp_table A INNER JOIN CTE ON A.id = CTE.id
WHERE CTE.rn = 1
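The underlying idea is to state explicitly which of the candidate values should win, so the result no longer depends on the order in which the engine happens to visit rows. Here is a minimal sketch of that idea, run on Python's standard-library sqlite3 rather than SQL Server. The table is called temp_table because SQLite has no #temp_table naming, the rows are made up, and "latest date wins" is an assumed rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE temp_table (id INTEGER, dte_start_date TEXT);
    CREATE TABLE table1 (id INTEGER, dte_start_date TEXT);
    INSERT INTO temp_table VALUES (1, NULL), (2, NULL);
    INSERT INTO table1 VALUES
        (1, '2020-01-01'), (1, '2021-06-30'), (1, '2019-03-15'),
        (2, '2022-02-02');

    -- MAX() makes the choice explicit: the latest date for each id.
    -- The EXISTS guard leaves rows without any match untouched.
    UPDATE temp_table
    SET dte_start_date = (
        SELECT MAX(t.dte_start_date)
        FROM table1 t
        WHERE t.id = temp_table.id
    )
    WHERE EXISTS (SELECT 1 FROM table1 t WHERE t.id = temp_table.id);
""")

updated = conn.execute(
    "SELECT id, dte_start_date FROM temp_table ORDER BY id"
).fetchall()
print(updated)
```

An aggregate is enough when only one column is copied; ranking with ROW_NUMBER is the more general tool when several columns must come from the same source row.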

SQL Select from table twice

I am trying to select from the same table twice within SQL.
I have a POLICIES table that has an index (NEXTPOLICYID) that refers to itself.
I need to compare the current premium with the estimated premium.
How can I get a result that shows the following on the same result row?
t1 = Current
t2 = Future
End result should be:
t1.POLICIES_ID | t1.WRITTEN_PREMIUM | t2.POLICIES_ID | t2.ESTIMATED_PREMIUM
This is what I have right now, and I am getting an error on my join statement, but I fear that is not my only problem.
SELECT
t1.POLICIES_ID, t1.WRITTEN_PREMIUM, t1.NEXTPOLICYID, t2.ESTIMATED_PREMIUM
FROM
POLICIES t1 JOIN
POLICIES t2
ON t1.NEXTPOLICYID = t2.POLICIES_ID
I am getting the following error:
Message: odbc_exec(): SQL error: [Rocket U2][UVODBC][1401233]Error ID: 29 Severity: ERROR Facility: FPSRVERR - Line 5, column 17 (around "JOIN"): Syntax error., SQL state S1000 in SQLExecDirect
This is an ODBC connection to a UniVerse database. I have tested it with many other functions and it works fine. The error tells me it does not like something before the JOIN statement.
Thank you
Apart from the comma, the only other issues are:
You don't need to include t2.POLICIES_ID in the select list, because you have t1.NEXTPOLICYID
You might want to consider a left outer join, to keep policies that have no next policy.
The query might be:
SELECT t1.POLICIES_ID, t1.WRITTEN_PREMIUM, t1.NEXTPOLICYID,
t2.ESTIMATED_PREMIUM
FROM POLICIES t1 JOIN
POLICIES t2
ON t1.NEXTPOLICYID = t2.POLICIES_ID;
This:
SELECT
t1.POLICIES_ID,
t1.WRITTEN_PREMIUM
t1.NEXTPOLICYID,
t2.ESTIMATED_PREMIUM,
t2.POLICIES_ID
FROM
POLICIES t1,
JOIN POLICIES t2 ON t1.NEXTPOLICYID = t2.POLICIES_ID
has some issues with commas in the wrong places and should be:
SELECT
t1.POLICIES_ID,
t1.WRITTEN_PREMIUM,
t1.NEXTPOLICYID,
t2.ESTIMATED_PREMIUM,
t2.POLICIES_ID
FROM
POLICIES t1
JOIN POLICIES t2 ON t1.NEXTPOLICYID = t2.POLICIES_ID
I'm guessing that's the reason for the error.
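To illustrate the self-join and the left outer join suggestion, here is a small sketch on made-up data. It uses Python's standard-library sqlite3, not UniVerse, so the exact syntax accepted by the UniVerse ODBC driver may differ; the premium values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE POLICIES (
        POLICIES_ID       INTEGER PRIMARY KEY,
        NEXTPOLICYID      INTEGER,
        WRITTEN_PREMIUM   REAL,
        ESTIMATED_PREMIUM REAL
    );
    -- Policy 1 points at policy 2; policy 2 has no next policy.
    INSERT INTO POLICIES VALUES
        (1, 2,    500.0, 480.0),
        (2, NULL, 550.0, 530.0);
""")

select_list = """
    SELECT t1.POLICIES_ID, t1.WRITTEN_PREMIUM,
           t2.POLICIES_ID, t2.ESTIMATED_PREMIUM
    FROM POLICIES t1
"""

# Inner join: only policies that have a next policy appear.
inner_rows = conn.execute(
    select_list
    + "JOIN POLICIES t2 ON t1.NEXTPOLICYID = t2.POLICIES_ID "
    + "ORDER BY t1.POLICIES_ID"
).fetchall()

# Left join: policies without a next policy are kept, with NULLs for t2.
left_rows = conn.execute(
    select_list
    + "LEFT JOIN POLICIES t2 ON t1.NEXTPOLICYID = t2.POLICIES_ID "
    + "ORDER BY t1.POLICIES_ID"
).fetchall()

print(inner_rows)
print(left_rows)
```

Each result row carries the current policy (t1) and its future policy (t2) side by side, which is the layout asked for in the question.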