Clarification when using the OVER clause - sql

I'm new to Oracle 11g and I noticed that the OVER clause seems pretty useful for analytics. I'm having trouble understanding the syntax, even after reading the Oracle manual's section on the OVER clause.
I'm trying to get the cumulative amount for all gifts donated in chronological order. This is all from only one table, Donations which includes all the columns seen below in the query.
SELECT Donations.donationid, Donation.charity, Donation.giftdate, Donation.amount, SUM(Donation.amount)
OVER (ORDER BY Donations.amount) AS Total_Gift_Amount
FROM Donations.donations
ORDER BY Total_Gift_Amount DESC;
I thought I was on the right track but there is something I'm missing that's making my columns be out of scope. The error I receive is
Error at line 1: ORA-00904:"DONATION"."AMOUNT": invalid identifier (its the SUM(donations.donations))
Donations table includes: DonationID, Charity, Amount, GiftDate, DonorID
My main confusion is that when I don't use the OVER clause I can get the result set with no problem. However, when I try using OVER I start to get lots of syntax errors and the like. I want to learn how to use OVER properly, though.
I know that this error message usually appears when you type an invalid column name or when the column is out of scope. Why wouldn't it be able to see that Donations.amount is a valid column name? I could just be messing up the syntax of this new clause.

The error has nothing to do with analytical (window) functions.
The table is simply named Donations, in plural, and one of the columns you're selecting is Donation.amount, with Donation, in singular. Slap on the missing "s" there and you should be fine.
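As a minimal sketch of the corrected query, the snippet below runs a running total over a small in-memory SQLite table from Python. The sample rows, the `d` alias, and the choice to order the window by GiftDate (to get the chronological total the question asks for) are assumptions for illustration; SQLite 3.25 or newer is assumed for window-function support, and Oracle syntax for this part of the query is the same.

```python
import sqlite3

# In-memory stand-in for the Donations table (sample rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Donations ("
    "DonationID INTEGER PRIMARY KEY, Charity TEXT, "
    "Amount INTEGER, GiftDate TEXT, DonorID INTEGER)"
)
conn.executemany(
    "INSERT INTO Donations VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Red Cross", 100, "2015-01-10", 7),
        (2, "UNICEF", 50, "2015-02-01", 8),
        (3, "Red Cross", 25, "2015-03-15", 7),
    ],
)

# Every column is qualified with the same alias, so there is no
# singular/plural mismatch; the window is ordered by gift date.
running_totals = conn.execute(
    """
    SELECT d.DonationID, d.GiftDate, d.Amount,
           SUM(d.Amount) OVER (ORDER BY d.GiftDate) AS Total_Gift_Amount
    FROM Donations d
    ORDER BY d.GiftDate
    """
).fetchall()

for row in running_totals:
    print(row)
```

Using one short alias for the table throughout makes this kind of typo harder to introduce.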


Can't query one record in BigQuery table, but can query others

I export Google Workspace logs to BigQuery. There are a small number of top-level records and then many nested groups of records. I can query the top level of records and most sub-levels fine but I can't select the groups records. select group_id,admin.user_email,admin.group_email works fine, for example.
But when I try to run a very similar query on the Groups records it fails with Syntax error: Expected end of input but got keyword GROUPS
SELECT
group_id,
groups.group_email
FROM
`workspace-analytics.workspace_prod.activity`
WHERE
groups.group_email='group@domain.com'
LIMIT
100;
What am I doing wrong? Why does this record in particular refuse to work the way the others do?
Answer from @MatBailie, posted as a wiki answer:
The error message tells you that GROUPS is a keyword. If you quote it, BigQuery will realise it's a reference and not a keyword: `groups`.group_email.
admin works unquoted because admin isn't a keyword. Imagine you had a column named from: you couldn't write SELECT from FROM table without thoroughly confusing the parser, but SELECT `from` FROM table isn't ambiguous at all. You can choose to quote all references regardless, but if they're keywords then they must be quoted.
Make sure you quote using backticks, the same ones you use in dataset names.
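The same effect can be reproduced outside BigQuery. The sketch below uses SQLite from Python as a stand-in (an assumption; it is not BigQuery, but it also accepts backtick-quoted identifiers) with a hypothetical table `t` that has a column literally named from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a column whose name is a reserved keyword.
conn.execute("CREATE TABLE t (`from` TEXT)")
conn.execute("INSERT INTO t VALUES ('somewhere')")

# Unquoted, the parser reads "from" as the FROM keyword and gives up.
try:
    conn.execute("SELECT from FROM t")
    unquoted_ok = True
except sqlite3.OperationalError as exc:
    unquoted_ok = False
    print("unquoted failed:", exc)

# Quoted with backticks, the same name is treated as a column reference.
quoted_rows = conn.execute("SELECT `from` FROM t").fetchall()
print(quoted_rows)
```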

Select Distinct Not working in Spring with DSL Context

I have the following code.
@Override
public Optional<TransactionJournalRecord> findByReferenceNumber(final String referenceNumber) {
return this.dsl
.select(TRANSACTION_JOURNAL.fields())
.distinctOn(TRANSACTION_JOURNAL.CUSTOMER_NUMBER)
.from(TRANSACTION_JOURNAL)
.where(TRANSACTION_JOURNAL.REFERENCE_NUMBER.eq(referenceNumber))
.fetchOptionalInto(TransactionJournalRecord.class);
}
All I want is to query by a specific reference number but fetch only the first distinct record for it, because other duplicate transactions with the same reference number and customer number should be processed later on.
But I keep getting this error:
org.springframework.jdbc.BadSqlGrammarException: Access database using jOOQ; bad SQL grammar [select distinct on (`transaction_journal`.`customer_number`) `transaction_journal`.`id`, `transaction_journal`.`reference_number`, `transaction_journal`.`future_dated_transaction_id`, `transaction_journal`.`send_money_type_id`, `transaction_journal`.`source_account_number`, `transaction_journal`.`source_account_type`, `transaction_journal`.`customer_number`, `transaction_journal`.`request_id`, `transaction_journal`.`destination_account_number`, `transaction_journal`.`destination_account_type`, `transaction_journal`.`destination_validation`, `transaction_journal`.`transfer_schedule_type`, `transaction_journal`.`currency_id`, `transaction_journal`.`amount`, `transaction_journal`.`service_fee`, `transaction_journal`.`transaction_date`, `transaction_journal`.`posting_date`, `transaction_journal`.`status`, `transaction_journal`.`remarks`, `transaction_journal`.`created_date`, `transaction_journal`.`updated_date`, `transaction_journal`.`source_account_name`, `transaction_journal`.`username`, `transaction_journal`.`reason`, `transaction_journal`.`card_number`, `transaction_journal`.`status_remarks`, `transaction_journal`.`creditor_bank_code`, `transaction_journal`.`creditor_details`, `transaction_journal`.`mobile_number`, `transaction_journal`.`address`, `transaction_journal`.`channel_id`, `transaction_journal`.`system`, `transaction_journal`.`purpose_of_transaction`, `transaction_journal`.`esb_posted_date`, `transaction_journal`.`currency_id_destination`, `transaction_journal`.`gl_pa_status`, `transaction_journal`.`gl_sf_status`, `transaction_journal`.`gl_status_remarks`, `transaction_journal`.`email_address`, `transaction_journal`.`exchange_rate`, `transaction_journal`.`contact_type`, `transaction_journal`.`contact_value`, `transaction_journal`.`is_validated` from `transaction_journal` where `transaction_journal`.`reference_number` = ?]; nested exception is 
java.sql.SQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'on (`transaction_journal`.`customer_number`) `transaction_journal`.`id`, `transa' at line 1
at org.jooq_3.11.12.MYSQL_8_0.debug(Unknown Source)
Using DISTINCT ON
You're not using DISTINCT, you're using DISTINCT ON, which is a PostgreSQL vendor specific SQL feature. In newer versions of jOOQ, DISTINCT ON is being emulated using window functions for other dialects, so you might want to upgrade.
You'll still need an ORDER BY clause for DISTINCT ON to work. It's a bit of an esoteric PostgreSQL invention, adding to the confusion of the logical order of operations in SQL.
Using LIMIT
While what you want to do is possible with DISTINCT ON, it seems overly complicated. Here's a much simpler way to solve your problem, producing an arbitrary record, or optionally, if you uncomment the ORDER BY clause, the first/last record given some ordering:
SELECT *
FROM transaction_journal
WHERE transaction_journal.reference_number = :referenceNumber
-- ORDER BY something
LIMIT 1
With jOOQ:
@Override
public Optional<TransactionJournalRecord> findByReferenceNumber(
final String referenceNumber
) {
return this.dsl
.selectFrom(TRANSACTION_JOURNAL)
.where(TRANSACTION_JOURNAL.REFERENCE_NUMBER.eq(referenceNumber))
// .orderBy(something)
.limit(1)
.fetchOptional();
}
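To see the LIMIT approach in isolation, here is a rough sketch against an in-memory SQLite table from Python. The three-column table and its rows are simplified assumptions, and ordering by id is just one possible choice for "something".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical version of transaction_journal.
conn.execute(
    "CREATE TABLE transaction_journal ("
    "id INTEGER PRIMARY KEY, reference_number TEXT, customer_number TEXT)"
)
conn.executemany(
    "INSERT INTO transaction_journal VALUES (?, ?, ?)",
    [(1, "REF-1", "C1"), (2, "REF-1", "C1"), (3, "REF-2", "C2")],
)

query = """
    SELECT id, reference_number, customer_number
    FROM transaction_journal
    WHERE reference_number = :referenceNumber
    ORDER BY id
    LIMIT 1
"""

# Two rows share REF-1, but LIMIT 1 returns only the first by id.
first = conn.execute(query, {"referenceNumber": "REF-1"}).fetchone()
# No match yields None, the analogue of an empty Optional.
missing = conn.execute(query, {"referenceNumber": "REF-9"}).fetchone()
print(first, missing)
```

Because at most one row can come back, the duplicate rows no longer cause an exception when a single result is expected.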
Using GROUP BY
Note that in MySQL, if the ONLY_FULL_GROUP_BY SQL mode is turned off, then the GROUP BY approach you've mentioned in your comments will also produce an arbitrary value for all non-GROUP BY columns, which is not correct standard SQL.
Unlike with DISTINCT ON or LIMIT, you have no control over which value is produced. As a matter of fact, you can't even be sure that two non-GROUP BY values belong to the same record. It is never a good idea to depend on this outdated, MySQL-specific behaviour.
Using DISTINCT
There is no way to solve this with DISTINCT only. If you don't have a unique constraint on your search criteria, then you will always get duplicates, which will throw an exception when using fetchOptional(), in jOOQ.

SQL relaX error i18n is not defined. Trying to find Students (IDs and names) along with info about courses (course_ids) they took more than 1 time

I am using the Silberschatz - UniversityDB gist f03130d8e6a7f0a9bcba3190fee1f0a8 with the relaX calculator.
This is the problem I am having. I am not sure what this error means when using the relaX relational algebra calculator; all of the problems I have seen with this error are related to web design. Any assistance would be greatly appreciated.
There are several errors in your query.
You select from student and course, but mention a table or view called takes in the ON clause.
You group by course_id (so as to get one result row per course), but select student.id, student.name and takes.course_id. This is invalid, because there can be multiple students per course, so which one should be selected for the group? You would need aggregation functions to select student data (e.g. MAX(student.id) for the highest student ID). And again: there is no table takes in your FROM clause.
You are trying to create an alias in GROUP BY which is not allowed.
The error message i18n is not defined doesn't seem to make any sense here, though. Fix your errors and maybe you will get rid of the strange message, too. Maybe the app is just trying to give you an error message in your language and fails to do so because of some internal error with internationalization (which is what i18n stands for).
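For reference, a query that avoids the three problems listed above could look roughly like the following. It is written in plain SQL and run through SQLite from Python, not in relaX's own syntax; the cut-down student and takes tables and their rows are assumptions modelled on the Silberschatz university schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, hypothetical versions of the university tables.
conn.executescript(
    """
    CREATE TABLE student (ID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE takes (ID INTEGER, course_id TEXT, semester TEXT, year INTEGER);
    INSERT INTO student VALUES (1, 'Zhang'), (2, 'Shankar');
    INSERT INTO takes VALUES
        (1, 'CS-101', 'Fall', 2009),
        (1, 'CS-101', 'Spring', 2010),
        (1, 'CS-347', 'Fall', 2009),
        (2, 'CS-101', 'Fall', 2009);
    """
)

# takes appears in FROM, every selected non-aggregate column is in
# GROUP BY, and the alias is declared in SELECT rather than GROUP BY.
repeats = conn.execute(
    """
    SELECT s.ID, s.name, t.course_id, COUNT(*) AS times_taken
    FROM student s
    JOIN takes t ON t.ID = s.ID
    GROUP BY s.ID, s.name, t.course_id
    HAVING COUNT(*) > 1
    ORDER BY s.ID, t.course_id
    """
).fetchall()
print(repeats)
```

Grouping by student and course together gives one row per (student, course) pair, and HAVING keeps only the pairs that occur more than once.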

Active Record embed Table.where('x').count inside of select statement

I'm setting up an AR query that is basically meant to find an average of a few values that span three different tables. I'm getting hung up on how to embed the result of a particular Count query inside of the Active Record select statement.
Just by itself, this query returns "3":
Order.where(user_id: 319).count => 3
My question is, can I embed this into a select statement as a SQL alias similar to below:
Table.xxxxxx.select("Order.where(user_id: 319).count AS count,user_id, SUM(quantity*current_price) AS revenue").xxxxx
It seems to be throwing an error and generally not recognizing what I'm trying to do when I declare that first count alias. Any ideas on the syntax?
Well, after looking into it a bit, I got a clearer picture of the ActiveRecord select() syntax.
It's a method that takes a variable number of arguments, each of which is SQL text, not Ruby. So, your failing call:
Table.xxxxxx.select("Order.where(user_id: 319).count AS count,user_id, SUM(quantity*current_price) AS revenue").xxxxx
with the misplaced ActiveRecord expression replaced by proper SQL, should look more like this (be careful: you can't use AS count in most cases, because count is reserved):
Table.xxxxx.select("(SELECT count(id) from orders where user_id=319) as usercount", "user_id","SUM(quantity*current_price) AS revenue").xxxx
But I guess what you really need is a table of counts per user_id.
So I'd skip the models and go to direct SQL, always being careful to avoid injections:
ActiveRecord::Base.connection.execute('SELECT COUNT(orders.id) as usercount, users.id from users, orders where users.id=orders.user_id group by users.id')
This is simplified, of course; you can add the rest of the data (which I don't currently know) accordingly. The simplified, partial solution above could also be written as:
Order.joins(:user).select("count(orders.id) as usercount, users.id").group(:user_id)
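The two SQL shapes discussed above (a scalar subquery for one user, and a grouped join for all users) can be compared side by side. This sketch runs them against in-memory SQLite from Python rather than through ActiveRecord; the minimal users and orders tables and their rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal, hypothetical users and orders tables.
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users VALUES (319), (320);
    INSERT INTO orders (user_id) VALUES (319), (319), (319), (320);
    """
)

# Scalar subquery: the count for one user, embedded as a column.
# The user id is passed as a bound parameter to avoid injection.
single = conn.execute(
    "SELECT (SELECT COUNT(id) FROM orders WHERE user_id = ?) AS usercount",
    (319,),
).fetchone()[0]

# Grouped join: one count per user in a single statement.
per_user = conn.execute(
    """
    SELECT users.id, COUNT(orders.id) AS usercount
    FROM users
    JOIN orders ON users.id = orders.user_id
    GROUP BY users.id
    ORDER BY users.id
    """
).fetchall()
print(single, per_user)
```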

Can scalar functions be applied before filtering when executing a SQL Statement?

I suppose I have always naively assumed that scalar functions in the select part of a SQL query will only get applied to the rows that meet all the criteria of the where clause.
Today I was debugging some code from a vendor and had that assumption challenged. The only reason I can think of for this code failing is that the Substring() function is being called on data that should have been filtered out by the WHERE clause. It appears that the substring call is applied before the filtering happens, and so the query fails.
Here is an example of what I mean. Let's say we have two tables, each with 2 columns and having 2 rows and 1 row respectively. The first column in each is just an id. NAME is just a string, and NAME_LENGTH tells us how many characters in the name with the same ID. Note that only names with more than one character have a corresponding row in the LONG_NAMES table.
NAMES: ID, NAME
1, "Peter"
2, "X"
LONG_NAMES: ID, NAME_LENGTH
1, 5
If I want a query to print each name with the last 3 letters cut off, I might first try something like this (assuming SQL Server syntax for now):
SELECT substring(NAME,1,len(NAME)-3)
FROM NAMES;
I would soon find out that this gives me an error, because when it reaches "X" it will try to use a negative number for the length in the substring call, and it will fail.
The way my vendor decided to solve this was by filtering out rows where the strings were too short for the len - 3 query to work. He did it by joining to another table:
SELECT substring(NAMES.NAME,1,len(NAMES.NAME)-3)
FROM NAMES
INNER JOIN LONG_NAMES
ON NAMES.ID = LONG_NAMES.ID;
At first glance, this query looks like it might work. The join condition will eliminate any rows that have NAME fields short enough for the substring call to fail.
However, from what I can observe, SQL Server will sometimes try to calculate the substring expression for everything in the table, and then apply the join to filter out rows. Is this supposed to happen this way? Is there a documented order of operations where I can find out when certain things will happen? Is it specific to a particular database engine or part of the SQL standard? If I decided to include some predicate on my NAMES table to filter out short names (like len(NAME) > 3), could SQL Server also choose to apply that after trying to apply the substring? If so, then it seems the only safe way to do a substring would be to wrap it in a "case when" construct in the select?
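For illustration, the "case when" guard mentioned at the end of the question could be shaped like this. The sketch runs on SQLite from Python, which is an assumption made only so the example is runnable: SQLite spells the functions substr and length, and unlike SQL Server it does not raise an error for a negative length, so this shows the structure of the guard, not the original failure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The NAMES table from the example above.
conn.execute("CREATE TABLE names (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO names VALUES (?, ?)", [(1, "Peter"), (2, "X")])

# The length check and the substring live in the same expression, so
# the substring is only evaluated for rows that pass the check,
# whatever order the engine applies joins and filters in.
trimmed = conn.execute(
    """
    SELECT id,
           CASE WHEN length(name) > 3
                THEN substr(name, 1, length(name) - 3)
           END AS trimmed_name
    FROM names
    ORDER BY id
    """
).fetchall()
print(trimmed)
```

Rows that are too short come back as NULL (None in Python) and can be filtered out afterwards if they are not wanted.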
Martin gave this link that pretty much explains what is going on: the query optimizer has free rein to reorder things however it likes. I am including this as an answer so I can accept something. Martin, if you create an answer with your link in it, I will gladly accept that instead of this one.
I do want to leave my question here because I think it is a tricky one to search for, and my particular phrasing of the issue may be easier for someone else to find in the future.
TSQL divide by zero encountered despite no columns containing 0
EDIT: As more responses have come in, I am again confused. It does not seem clear yet when exactly the optimizer is allowed to evaluate things in the select clause. I guess I'll have to go find the SQL standard myself and see if I can make sense of it.
Joe Celko, who helped write early SQL standards, has posted something similar to this several times in various USENET newsgroups. (I'm skipping over the clauses that don't apply to your SELECT statement.) He usually said something like "This is how statements are supposed to act like they work". In other words, SQL implementations should behave exactly as if they did these steps, without actually being required to do each of these steps.
1. Build a working table from all of the table constructors in the FROM clause.
2. Remove from the working table those rows that do not satisfy the WHERE clause.
3. Construct the expressions in the SELECT clause against the working table.
So, following this, no SQL dbms should act like it evaluates functions in the SELECT clause before it acts like it applies the WHERE clause.
In a recent posting, Joe expands the steps to include CTEs.
CJ Date and Hugh Darwen say essentially the same thing in chapter 11 ("Table Expressions") of their book A Guide to the SQL Standard. They also note that this chapter corresponds to the "Query Specification" section (sections?) in the SQL standards.
What you are describing is the query execution plan. It is based on query optimization rules, indexes, temporary buffers and execution-time statistics. If you are using SQL Server Management Studio, the toolbar above the query editor lets you display the estimated execution plan, which shows how the engine intends to rearrange your query to gain speed. So if you have just used your NAMES table and it is still in the buffer, the engine might first process that data, and only then join it with the other table.