Select Distinct Not working in Spring with DSL Context - sql

I have the following code.
@Override
public Optional<TransactionJournalRecord> findByReferenceNumber(final String referenceNumber) {
    return this.dsl
        .select(TRANSACTION_JOURNAL.fields())
        .distinctOn(TRANSACTION_JOURNAL.CUSTOMER_NUMBER)
        .from(TRANSACTION_JOURNAL)
        .where(TRANSACTION_JOURNAL.REFERENCE_NUMBER.eq(referenceNumber))
        .fetchOptionalInto(TransactionJournalRecord.class);
}
All I want is to query by a specific reference number and get only the first distinct row for it, because the other duplicate transactions with the same reference number and customer number should be processed later on.
But I keep getting this error:
org.springframework.jdbc.BadSqlGrammarException: Access database using jOOQ; bad SQL grammar [select distinct on (`transaction_journal`.`customer_number`) `transaction_journal`.`id`, `transaction_journal`.`reference_number`, `transaction_journal`.`future_dated_transaction_id`, `transaction_journal`.`send_money_type_id`, `transaction_journal`.`source_account_number`, `transaction_journal`.`source_account_type`, `transaction_journal`.`customer_number`, `transaction_journal`.`request_id`, `transaction_journal`.`destination_account_number`, `transaction_journal`.`destination_account_type`, `transaction_journal`.`destination_validation`, `transaction_journal`.`transfer_schedule_type`, `transaction_journal`.`currency_id`, `transaction_journal`.`amount`, `transaction_journal`.`service_fee`, `transaction_journal`.`transaction_date`, `transaction_journal`.`posting_date`, `transaction_journal`.`status`, `transaction_journal`.`remarks`, `transaction_journal`.`created_date`, `transaction_journal`.`updated_date`, `transaction_journal`.`source_account_name`, `transaction_journal`.`username`, `transaction_journal`.`reason`, `transaction_journal`.`card_number`, `transaction_journal`.`status_remarks`, `transaction_journal`.`creditor_bank_code`, `transaction_journal`.`creditor_details`, `transaction_journal`.`mobile_number`, `transaction_journal`.`address`, `transaction_journal`.`channel_id`, `transaction_journal`.`system`, `transaction_journal`.`purpose_of_transaction`, `transaction_journal`.`esb_posted_date`, `transaction_journal`.`currency_id_destination`, `transaction_journal`.`gl_pa_status`, `transaction_journal`.`gl_sf_status`, `transaction_journal`.`gl_status_remarks`, `transaction_journal`.`email_address`, `transaction_journal`.`exchange_rate`, `transaction_journal`.`contact_type`, `transaction_journal`.`contact_value`, `transaction_journal`.`is_validated` from `transaction_journal` where `transaction_journal`.`reference_number` = ?]; nested exception is 
java.sql.SQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'on (`transaction_journal`.`customer_number`) `transaction_journal`.`id`, `transa' at line 1
at org.jooq_3.11.12.MYSQL_8_0.debug(Unknown Source)

Using DISTINCT ON
You're not using DISTINCT, you're using DISTINCT ON, which is a PostgreSQL vendor-specific SQL feature. In newer versions of jOOQ, DISTINCT ON is emulated using window functions for other dialects, so you might want to upgrade.
You'll still need an ORDER BY clause for DISTINCT ON to work. It's a bit of an esoteric PostgreSQL invention, adding to the confusion of the logical order of operations in SQL.
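To make the window function idea concrete, here is a minimal sketch of how a "one row per customer_number" query can be written without DISTINCT ON. It uses Python's sqlite3 module as a stand-in database (SQLite 3.25 or newer is assumed for window function support), the table is a hypothetical, heavily reduced version of transaction_journal, and the SQL is hand-written for illustration, not the SQL that jOOQ actually generates.

```python
import sqlite3

# Hypothetical, heavily reduced stand-in for the transaction_journal table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transaction_journal (
        id INTEGER PRIMARY KEY,
        reference_number TEXT,
        customer_number TEXT,
        amount INTEGER
    );
    INSERT INTO transaction_journal VALUES
        (1, 'REF-1', 'C-1', 100),
        (2, 'REF-1', 'C-1', 100),
        (3, 'REF-1', 'C-2', 250),
        (4, 'REF-2', 'C-1', 75);
""")

# Number the rows inside each customer_number partition and keep only the
# first one. The ORDER BY inside OVER() decides which row "wins", which is
# the role ORDER BY plays for DISTINCT ON in PostgreSQL.
emulated_distinct_on = conn.execute("""
    SELECT id, reference_number, customer_number, amount
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY customer_number
                   ORDER BY id
               ) AS rn
        FROM transaction_journal AS t
        WHERE reference_number = ?
    ) AS numbered
    WHERE rn = 1
    ORDER BY customer_number
""", ("REF-1",)).fetchall()

print(emulated_distinct_on)
```

With the sample data, the duplicate row with id 2 is dropped and one row per customer remains.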
Using LIMIT
While what you want to do is possible with DISTINCT ON, it seems overly complicated. Here's a much simpler way to solve your problem, producing an arbitrary record, or optionally, if you uncomment the ORDER BY clause, the first/last record given some ordering:
SELECT *
FROM transaction_journal
WHERE transaction_journal.reference_number = :referenceNumber
-- ORDER BY something
LIMIT 1
With jOOQ:
@Override
public Optional<TransactionJournalRecord> findByReferenceNumber(
    final String referenceNumber
) {
    return this.dsl
        .selectFrom(TRANSACTION_JOURNAL)
        .where(TRANSACTION_JOURNAL.REFERENCE_NUMBER.eq(referenceNumber))
        // .orderBy(something)
        .limit(1)
        .fetchOptional();
}
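If you want to try the LIMIT 1 behaviour in isolation, here is a minimal sketch using Python's sqlite3 module as a stand-in for MySQL and jOOQ. The table is a reduced, hypothetical version of transaction_journal, and find_by_reference_number is an illustrative helper that returns None where the jOOQ version would return an empty Optional.

```python
import sqlite3
from typing import Optional

# Hypothetical, reduced stand-in for the transaction_journal table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transaction_journal (
        id INTEGER PRIMARY KEY,
        reference_number TEXT,
        customer_number TEXT
    );
    INSERT INTO transaction_journal VALUES
        (1, 'REF-1', 'C-1'),
        (2, 'REF-1', 'C-1'),
        (3, 'REF-2', 'C-2');
""")


def find_by_reference_number(reference_number: str) -> Optional[tuple]:
    """Return the row with the lowest id for the reference number, or None."""
    return conn.execute(
        "SELECT id, reference_number, customer_number "
        "FROM transaction_journal "
        "WHERE reference_number = ? "
        "ORDER BY id "   # makes the choice of row deterministic
        "LIMIT 1",       # never more than one row, even with duplicates
        (reference_number,),
    ).fetchone()


print(find_by_reference_number("REF-1"))
print(find_by_reference_number("no-such-reference"))
```

Ordering by id is an assumption made for the sketch; any column that defines "first" for your use case works the same way.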
Using GROUP BY
Note that in MySQL, if the ONLY_FULL_GROUP_BY SQL mode is turned off, the GROUP BY approach you've mentioned in your comments will also produce an arbitrary value for all non-GROUP BY columns, which is not valid standard SQL.
Unlike with DISTINCT ON or LIMIT, you have no control over which value is produced. As a matter of fact, you can't even be sure that two non-GROUP BY values belong to the same record. It is never a good idea to depend on this outdated, MySQL-specific behaviour.
Using DISTINCT
There is no way to solve this with DISTINCT only. If you don't have a unique constraint on your search criteria, then you will always get duplicates, which will throw an exception when using fetchOptional() in jOOQ.
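A small sketch of why DISTINCT alone doesn't help, again using Python's sqlite3 module as a stand-in and a hypothetical, reduced transaction_journal table: DISTINCT only collapses rows that are identical in every selected column, so two rows that differ in id survive.

```python
import sqlite3

# Hypothetical, reduced stand-in for the transaction_journal table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE transaction_journal (
        id INTEGER PRIMARY KEY,
        reference_number TEXT,
        customer_number TEXT
    );
    INSERT INTO transaction_journal VALUES
        (1, 'REF-1', 'C-1'),
        (2, 'REF-1', 'C-1');
""")

# Both rows share the reference and customer number, but their ids differ,
# so DISTINCT over all selected columns keeps both of them.
distinct_rows = conn.execute("""
    SELECT DISTINCT id, reference_number, customer_number
    FROM transaction_journal
    WHERE reference_number = ?
    ORDER BY id
""", ("REF-1",)).fetchall()

print(len(distinct_rows), distinct_rows)
```

Any "fetch at most one row" call on such a result has to deal with two rows, which is the situation described above.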

Related

Clarification when using the OVER clause

I'm new to Oracle 11g and I noticed that the OVER clause seems pretty useful for some analytics. I'm having trouble understanding the syntax, even after looking at the Oracle manual on the OVER clause.
I'm trying to get the cumulative amount for all gifts donated in chronological order. This is all from only one table, Donations which includes all the columns seen below in the query.
SELECT Donations.donationid, Donation.charity, Donation.giftdate, Donation.amount, SUM(Donation.amount)
OVER (ORDER BY Donations.amount) AS Total_Gift_Amount
FROM Donations.donations
ORDER BY Total_Gift_Amount DESC;
I thought I was on the right track but there is something I'm missing that's making my columns be out of scope. The error I receive is
Error at line 1: ORA-00904:"DONATION"."AMOUNT": invalid identifier (its the SUM(donations.donations))
Donations table includes: DonationID, Charity, Amount, GiftDate, DonorID
My main confusion is that when I don't use the OVER clause, I get the result set with no problem. However, when I try using OVER, I start to get lots of syntax errors and things of that nature. I want to learn how to use OVER properly, though.
I know that error message usually is when you type an invalid column header or if it is out of scope. Why wouldn't it be able to see that Donations.amount is a valid column name? I could just be messing up the syntax of this new clause.
The error has nothing to do with analytical (window) functions.
The table is simply named Donations, in plural, and one of the columns you're selecting is Donation.amount, with Donation, in singular. Slap on the missing "s" there and you should be fine.
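For reference, here is a minimal sketch of the query with consistent table qualifiers, run against SQLite through Python's sqlite3 module as a stand-in for Oracle (SQLite 3.25 or newer is assumed). The sample rows are made up, and the window is ordered by giftdate rather than amount, on the assumption that a chronological running total is what the question is after.

```python
import sqlite3

# Made-up sample data using the column names listed in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE donations (
        donationid INTEGER PRIMARY KEY,
        charity TEXT,
        giftdate TEXT,
        amount INTEGER,
        donorid INTEGER
    );
    INSERT INTO donations VALUES
        (1, 'Red Cross', '2013-01-05', 50, 10),
        (2, 'UNICEF',    '2013-02-11', 20, 11),
        (3, 'Red Cross', '2013-03-02', 30, 10);
""")

# Every column is qualified with the same name, donations. The ORDER BY
# inside OVER() turns SUM into a running total in gift date order.
running_totals = conn.execute("""
    SELECT donations.donationid,
           donations.giftdate,
           donations.amount,
           SUM(donations.amount) OVER (
               ORDER BY donations.giftdate
           ) AS total_gift_amount
    FROM donations
    ORDER BY donations.giftdate
""").fetchall()

for row in running_totals:
    print(row)
```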

Active Record embed Table.where('x').count inside of select statement

I'm setting up an AR query that is basically meant to find an average of a few values that span three different tables. I'm getting hung up on how to embed the result of a particular Count query inside of the Active Record select statement.
Just by itself, this query returns "3":
Order.where(user_id: 319).count => 3
My question is, can I embed this into a select statement as a SQL alias similar to below:
Table.xxxxxx.select("Order.where(user_id: 319).count AS count,user_id, SUM(quantity*current_price) AS revenue").xxxxx
It seems to be throwing an error and generally not recognizing what I'm trying to do when I declare that first count alias. Any ideas on the syntax?
After looking into it a bit, I got a clearer picture of the ActiveRecord select() syntax.
It's a method that takes a variable number of arguments. So, your failing statement:
Table.xxxxxx.select("Order.where(user_id: 319).count AS count,user_id, SUM(quantity*current_price) AS revenue").xxxxx
After replacing your misplaced ActiveRecord statement with proper SQL, it should look more like this [be careful, in most cases you can't use count as an alias, since count is reserved]:
Table.xxxxx.select("(SELECT count(id) from orders where user_id=319) as usercount", "user_id","SUM(quantity*current_price) AS revenue").xxxx
But I guess what you actually need is a per-user_id table.
So, I'd skip Models and go to direct SQL, always being careful to avoid injections:
ActiveRecord::Base.connection.execute('SELECT COUNT(orders.id) as usercount, users.id from users, orders where users.id=orders.user_id group by users.id')
This is simplified, of course; you can add the rest of the data (which I currently do not know) accordingly. The same simplified, partial solution could also be written as:
Order.joins(:user).select("count(orders.id) as usercount, users.id").group(:user_id)
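To show what the database ends up evaluating, here is a minimal sketch of both SQL shapes, run through Python's sqlite3 module as a stand-in. The orders table layout and its rows are assumptions for illustration, since the question doesn't show the schema; integer prices keep the arithmetic exact.

```python
import sqlite3

# Assumed, simplified orders table; the real schema is not shown in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER,
        quantity INTEGER,
        current_price INTEGER
    );
    INSERT INTO orders VALUES
        (1, 319, 2, 10),
        (2, 319, 1, 5),
        (3, 319, 3, 1),
        (4, 7,   1, 100);
""")

# Shape 1: a scalar subquery in the select list. It is pure SQL, so the
# database can evaluate it; it repeats the count for user 319 on every row.
with_subquery = conn.execute("""
    SELECT user_id,
           (SELECT COUNT(id) FROM orders WHERE user_id = 319) AS usercount,
           SUM(quantity * current_price) AS revenue
    FROM orders
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

# Shape 2: a per-user count via GROUP BY, closer to a per-user_id table.
per_user = conn.execute("""
    SELECT user_id,
           COUNT(id) AS usercount,
           SUM(quantity * current_price) AS revenue
    FROM orders
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()

print(with_subquery)
print(per_user)
```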

Feature not implemented: WINDOW/ORDER BY

I am using an embedded Apache Derby database and execute the following query:
SELECT
someUniqueValue,
row_number() over(ORDER BY someUniqueValue) as ROWID
FROM
myTable;
someUniqueValue is a varchar.
I am getting the Exception:
java.sql.SQLFeatureNotSupportedException: Feature not implemented: WINDOW/ORDER BY
If I change the row_number() line in my query to:
row_number() over() as ROWID
The query runs fine (although the result is useless for me).
The Derby documentation states this is supported. What am I doing wrong?
The link you posted is just a draft to specify how the feature should be implemented.
If you scroll down a bit you find:
An implementation of the ROW_NUMBER() window function is included in Derby starting with the 10.4.1.3 release. Limitations and usage description may be found in the Derby Reference Manual
When you then look at the Derby manual (your link is not the manual), http://db.apache.org/derby/docs/10.10/ref/rreffuncrownumber.html, you'll find a list of limitations:
Derby does not currently allow the named or unnamed window specification to be specified in the OVER() clause, but requires an empty parenthesis. This means the function is evaluated over the entire result set.
The ROW_NUMBER function cannot currently be used in a WHERE clause.
Derby does not currently support ORDER BY in subqueries, so there is currently no way to guarantee the order of rows in the SELECT subquery. An optimizer override can be used to force the optimizer to use an index ordered on the desired column(s) if ordering is a firm requirement.
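Given those limitations, one possible workaround (an assumption on my part, not something the Derby manual prescribes) is to sort with a plain ORDER BY on the outer query and assign the row numbers in the client while reading the result. A minimal sketch of the idea, using Python's sqlite3 module as a stand-in for the embedded Derby database and made-up values:

```python
import sqlite3

# Stand-in for the Derby table from the question, with made-up values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE myTable (someUniqueValue TEXT PRIMARY KEY);
    INSERT INTO myTable VALUES ('pear'), ('apple'), ('mango');
""")

# Let the database do the sorting with an ordinary ORDER BY ...
cursor = conn.execute(
    "SELECT someUniqueValue FROM myTable ORDER BY someUniqueValue"
)

# ... and number the rows while iterating, starting at 1 like ROW_NUMBER().
numbered = [(value, row_id) for row_id, (value,) in enumerate(cursor, start=1)]

print(numbered)
```

The same pattern carries over to a JDBC ResultSet loop with a counter.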

Why is my ActiveRecord order method denying knowledge of a derived column from a select method?

I have written an ActiveRecord query designed to order my invoices by the sum of a column in an associated table (Invoice has_many :item_numbers). It involves some complex (for me) class methods but it ends up like this;
Invoice.where(user_id: 1, deleted: false, status: 'Sent').joins(:item_numbers).select('invoices.*, sum(item_numbers.amount) as total').group('invoices.id').order('total asc').limit(20)
If I run this query in the console I get the expected result - my first twenty invoices ordered by the total of their item_numbers. When it runs in the development server though, I get the following error from Postgresql;
PG::Error: ERROR: column "total" does not exist
As the query that is run on the server depends on a lot of scopes and class methods, to check that the query is correct, I called .to_sql on it and the output in the browser was;
SELECT invoices.*, sum(item_numbers.amount_with_gst) as total FROM "invoices" INNER JOIN "item_numbers" ON "item_numbers"."invoice_id" = "invoices"."id" WHERE "invoices"."user_id" = 1 AND "invoices"."deleted" = 'f' AND "invoices"."status" = 'Sent' GROUP BY invoices.id ORDER BY total asc LIMIT 20
I get exactly the same output if I call .to_sql on the query itself in the console, and if I put this output into Invoice.find_by_sql in the console I don't get the error.
This feels like some sort of weird bug, but I know that the bug is most likely mine. I have hunted for a few hours now with no clues - can anyone see what I'm doing wrong?
This is not a problem with ActiveRecord. As you demonstrated, ActiveRecord has no problem with your code and happily creates a SQL statement, just one that PostgreSQL doesn't like: the PG::Error exception is a low-level exception coming from the database adapter, stating that your SQL query is not valid.
PostgreSQL accepts an output-column alias in ORDER BY only when the alias stands alone as the sort key (not inside a larger expression), and only if the select list that defines the alias is part of the statement that is actually sent to the database. (Have a look at the documentation for ORDER BY)
Repeating the expression in the order statement avoids depending on the alias:
Invoice.select('invoices.*, sum(item_numbers.amount) as total')
.order('sum(item_numbers.amount) asc')
Don't worry, the query optimizer will detect that and your sum is still calculated just once.
Most likely you are using a different DBMS on your console and your development server. (MySQL or SQLite?) Some database engines accept expression aliases, some don't.
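Here is a minimal sketch of the "repeat the expression" form as plain SQL, run through Python's sqlite3 module. SQLite is only a stand-in and says nothing about how PostgreSQL resolves aliases; the tables are reduced, hypothetical versions of invoices and item_numbers with made-up rows.

```python
import sqlite3

# Reduced, hypothetical versions of the two tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE invoices (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE item_numbers (
        id INTEGER PRIMARY KEY,
        invoice_id INTEGER,
        amount INTEGER
    );
    INSERT INTO invoices VALUES (1, 1), (2, 1), (3, 1);
    INSERT INTO item_numbers VALUES
        (1, 1, 50), (2, 1, 25),
        (3, 2, 10),
        (4, 3, 40), (5, 3, 5);
""")

# The aggregate is repeated in ORDER BY instead of referring to the alias,
# so the sort key does not depend on the alias being visible.
invoices_by_total = conn.execute("""
    SELECT invoices.id, SUM(item_numbers.amount) AS total
    FROM invoices
    INNER JOIN item_numbers ON item_numbers.invoice_id = invoices.id
    WHERE invoices.user_id = 1
    GROUP BY invoices.id
    ORDER BY SUM(item_numbers.amount) ASC
    LIMIT 20
""").fetchall()

print(invoices_by_total)
```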

SQL Server: Is SELECTing a literal value faster than SELECTing a field? [duplicate]

I've seen some people use EXISTS (SELECT 1 FROM ...) rather than EXISTS (SELECT id FROM ...) as an optimization--rather than looking up and returning a value, SQL Server can simply return the literal it was given.
Is SELECT(1) always faster? Would Selecting a value from the table require work that Selecting a literal would avoid?
In SQL Server, it does not make a difference whether you use SELECT 1 or SELECT * within EXISTS. You are not actually returning the contents of the rows, but rather testing whether the set determined by the WHERE clause is non-empty. Try running the queries side by side with SET STATISTICS IO ON and you can prove that the approaches are equivalent. Personally, I prefer SELECT * within EXISTS.
For google's sake, I'll update this question with the same answer as this one (Subquery using Exists 1 or Exists *) since (currently) an incorrect answer is marked as accepted. Note the SQL standard actually says that EXISTS via * is identical to a constant.
No. This has been covered a bazillion times. SQL Server is smart and knows it is being used for an EXISTS, and returns NO DATA to the system.
Quoth Microsoft:
http://technet.microsoft.com/en-us/library/ms189259.aspx?ppud=4
The select list of a subquery introduced by EXISTS almost always consists of an asterisk (*). There is no reason to list column names because you are just testing whether rows that meet the conditions specified in the subquery exist.
Also, don't believe me? Try running the following:
SELECT whatever
FROM yourtable
WHERE EXISTS( SELECT 1/0
FROM someothertable
WHERE a_valid_clause )
If it was actually doing something with the SELECT list, it would throw a div by zero error. It doesn't.
EDIT: Note, the SQL Standard actually talks about this.
ANSI SQL 1992 Standard, pg 191 http://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt
3) Case:
a) If the <select list> "*" is simply contained in a <subquery> that is immediately contained in an <exists predicate>, then the <select list> is equivalent to a <value expression> that is an arbitrary <literal>.
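As a quick illustration of that equivalence at the result level, here is a minimal sketch that runs the same EXISTS query with three different select lists. It uses Python's sqlite3 module, so it shows nothing about SQL Server's optimizer or execution plans, and the customers and orders tables are hypothetical.

```python
import sqlite3

# Hypothetical tables: Bob has no orders, Ann and Cy do.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO orders VALUES (1, 1), (2, 1), (3, 3);
""")

results = {}
# The select lists are fixed literals chosen here, not user input, so
# formatting them into the SQL string is safe in this sketch.
for select_list in ("1", "*", "NULL"):
    sql = f"""
        SELECT name
        FROM customers
        WHERE EXISTS (
            SELECT {select_list}
            FROM orders
            WHERE orders.customer_id = customers.id
        )
        ORDER BY name
    """
    results[select_list] = conn.execute(sql).fetchall()

print(results)
```

All three variants return the same customers, because EXISTS only tests whether the subquery yields at least one row.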
When you use SELECT 1, you clearly show (to whoever is reading your code later) that you are testing whether the record exists. Even if there is no performance gain (which is to be discussed), there is gain in code readability and maintainability.
Yes, because when you select a literal it does not need to read from disk (or even from cache).
It doesn't matter what you select in an EXISTS clause. Most people use SELECT *, and SQL Server then automatically picks the best index.
As someone pointed out, SQL Server ignores the column selection list in EXISTS, so it doesn't matter. I personally tend to use "SELECT null ..." to indicate that the value is not used at all.
If you look at the execution plan for
select COUNT(1) from master..spt_values
and look at the stream aggregate you will see that it calculates
Scalar Operator(Count(*))
So the 1 actually gets converted to *
However I have read somewhere in the "Inside SQL Server" series of books that * might incur a very slight overhead for checking column permissions. Unfortunately the book didn't go into any more detail than that as I recall.
Select 1 should be better to use in your example. Select * gets all the metadata associated with the objects before runtime, which adds overhead during the compilation of the query. You may not see any difference between the two queries in their execution plans, though.