Limit in subquery with JPQL - orm

I'd like to get the average price of my top 100 products via JPA2. The query should look something like this (my SQL is a little rusty):
select avg(price) from (
select p.price from Product p order by p.price desc limit 100)
but that is not working at all. I also tried this:
select avg(p.price) from Product p where p.id =
(select pj.id from Product pj order by pj.price desc limit 100)
this is working up until the limit keyword.
I read that limit is not available in JPQL.
Any idea on how to do this? Criteria would also be fine.

LIMIT is not supported by JPQL. Below is sample code using the Criteria API.
CriteriaBuilder builder = entityManager.getCriteriaBuilder();
CriteriaQuery<Double> criteriaQuery = builder.createQuery(Double.class);
Root<Product> productRoot = criteriaQuery.from(Product.class);
criteriaQuery.select(builder.avg(productRoot.get("price")));
criteriaQuery.orderBy(builder.desc(productRoot.get("price")));
Double average = (Double)entityManager.createQuery(criteriaQuery).setMaxResults(100).getSingleResult();
or
Double average = (Double)entityManager.createQuery("select avg(p.price) from Product p order by p.price desc").setMaxResults(100).getSingleResult();
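If this doesn't work (setMaxResults limits the rows the query returns, and an aggregate query returns only one row, so it may not restrict which products are averaged), you will have to execute two queries: select the ordered top records first, then average them.
Otherwise, go for a native query if portability is not an issue. You can accomplish the same with a single query, as many RDBMSs support restricting the number of rows fetched from the database.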

SELECT AVG(PRICE) FROM (SELECT PRICE FROM PRODUCT ORDER BY PRICE DESC LIMIT 100) AS TOP_PRODUCTS
See this post regarding the JPQL LIMIT work around.
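To see why the limit has to sit inside a derived table, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in database. The product table, its columns, and the helper average_of_top_n are made up for illustration; this is not JPA code, only the native-query idea.

```python
import sqlite3

def average_of_top_n(conn, n):
    # LIMIT is applied in the derived table, so AVG only sees the n most
    # expensive rows.
    row = conn.execute(
        "SELECT AVG(price) FROM "
        "(SELECT price FROM product ORDER BY price DESC LIMIT ?)",
        (n,),
    ).fetchone()
    return row[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO product (price) VALUES (?)",
    [(p,) for p in (10.0, 20.0, 30.0, 40.0, 50.0)],
)

# Average of the three most expensive products (50, 40, 30).
top3_avg = average_of_top_n(conn, 3)

# A limit on the outer aggregate query only limits the single result row,
# so every product is still averaged.
outer_limit_avg = conn.execute(
    "SELECT AVG(price) FROM product LIMIT 3"
).fetchone()[0]
```

In this sketch top3_avg covers only the top three prices, while outer_limit_avg is still the average over the whole table, which is the same effect you would expect from a result limit applied to an aggregate query.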

Related

strange Hibernate-generated SQL with OVER() function and ORDER BY ORDER OF

I'm having a problem with a slight ordering anomaly in a legacy web application, and figured I'd start with the back-end SQL query generated by Hibernate with DB2Dialect:
FROM (SELECT inner2_.*,
ROWNUMBER()
OVER(
ORDER BY ORDER OF inner2_) AS rownumber_
FROM (SELECT this_.sohn AS SOHN1_15_11_,
this_.aslc AS ASLC2_15_11_,
this_.cc AS CC3_15_11_,
bb1_.sbn AS SBN1_2_0_,
bb1_.abc AS ABC3_4_5_,
mh2_.smhn AS SMHN1_9_1_,
mh2_.sabc AS SABC3_4_6_,
og8_.sogn AS SOGN1_11_2_,
og8_.sogo AS SOGO3_4_7_,
oc9_.socn AS SOCN_1_13_3_,
oc9_.soco AS SOCO_3_4_8_
FROM ott.oh this_
INNER JOIN ott.bb1_
ON this_.sbn = bb1_.sbn
INNER JOIN ott.mh2_
ON this_.smhn = mh2_.smhn
LEFT OUTER JOIN ott.og og8_
ON this_.sogn = og8_.sogn
LEFT OUTER JOIN ott.oc oc9_
ON this_.socn = oc9_.socn
WHERE ( 1 = 1 )
AND bb1_.sbn = ?
AND mh2_.smhn = ?
FETCH first 200 ROWS only) AS inner2_) AS inner1_
WHERE rownumber_ > 190
ORDER BY rownumber_
What does this query do? I am especially curious about OVER(), which isn't coming up when I google for such a SQL function (but it is an MDX function?).
This query functions in the application to grab the last page of a paginated list that is ordered by a field that doesn't even appear in the query. The query to populate the first page on initial load is different, and its generated SQL does ORDER BY the desired field.
So to get through this I need to understand how the query functions. Takers?
OVER() is part of the so-called OLAP functions. A good description can be found in the DB2 SQL Cookbook, available e.g. here:
http://www.ids-system.de/images/Downloads/DB2V97CK.PDF
It is a group of really useful functions.
Some good additional material:
http://www.ibm.com/developerworks/data/library/techarticle/dm-0401kuznetsov/
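As a rough illustration of what the generated query does, here is a sketch of the same pagination pattern in Python with sqlite3 (window functions need SQLite 3.25 or newer). The orders table and fetch_page helper are hypothetical. SQLite has no ORDER BY ORDER OF, so the sketch numbers the rows with an explicit ORDER BY; in DB2, ORDER OF inner2_ reuses whatever ordering the inner select has, and as far as I can tell the inner select in the question has no ORDER BY at all, which may be where the anomaly comes from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.executemany(
    "INSERT INTO orders (customer) VALUES (?)",
    [("c%02d" % i,) for i in range(1, 26)],
)

def fetch_page(conn, first_row, last_row):
    # Inner query: fetch only the first last_row rows (FETCH FIRST n ROWS ONLY).
    # Middle query: number those rows with ROW_NUMBER() OVER (...).
    # Outer query: keep the rows numbered above first_row.
    return conn.execute(
        "SELECT id, customer FROM ("
        "  SELECT id, customer,"
        "         ROW_NUMBER() OVER (ORDER BY customer) AS rownumber_"
        "  FROM (SELECT id, customer FROM orders ORDER BY customer LIMIT ?)"
        ") WHERE rownumber_ > ? ORDER BY rownumber_",
        (last_row, first_row),
    ).fetchall()

# Rows 21 to 25, i.e. the last page when the page size is 5.
last_page = fetch_page(conn, 20, 25)
```

With an explicit sort column in both the inner query and the OVER clause, the page contents are deterministic; without one, the row numbers follow whatever order the database happens to return.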

How to combine this query

In the query
cr is customers,
chh? is customer_pays,
cari_kod is customer code,
cari_unvan1 is customer name,
cha_tarihi is date of payment,
cha_meblag is payment amount.
The purpose of the query is to get a specified list of customers together with the date and amount of their last payment.
Actually my manager needs more details, but the query is very slow, which is why I'm using only 3 subqueries.
The question is: how can I combine them?
I have researched CTEs (the "with" clause) and subqueries in the "where" clause, but without luck.
Does anybody have a suggestion?
The operating system is Windows Server 2003 and the SQL Server version is MSSQL 2005.
Regards
select cr.cari_kod,cr.cari_unvan1, cr.cari_temsilci_kodu,
(select top 1
chh1.cha_tarihi
from dbo.CARI_HESAP_HAREKETLERI chh1 where chh1.cha_kod=cr.cari_kod order by chh1.cha_RECno) as sontar,
(select top 1
chh2.cha_meblag
from dbo.CARI_HESAP_HAREKETLERI chh2 where chh2.cha_kod=cr.cari_kod order by chh2.cha_RECno) as sontutar
from dbo.CARI_HESAPLAR cr
where (select top 1
chh3.cha_tarihi
from dbo.CARI_HESAP_HAREKETLERI chh3 where chh3.cha_kod=cr.cari_kod order by chh3.cha_RECno) >'20130314'
and
cr.cari_bolge_kodu='322'
or
cr.cari_bolge_kodu='324'
order by cr.cari_kod
You will probably speed up the query by changing your last where clause to:
where (select top 1 chh3.cha_tarihi
from dbo.CARI_HESAP_HAREKETLERI chh3 where chh3.cha_kod=cr.cari_kod
order by chh3.cha_RECno
) >'20130314' and
cr.cari_bolge_kodu in ('322', '324')
order by cr.cari_kod
This assumes that you want the date condition met together with one of the two codes. Because AND binds more tightly than OR, your original logic is (date AND code = 322) OR (code = 324).
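The difference between the two WHERE clauses is easy to check on a toy table. This is a small sketch with Python's sqlite3; the customers table and its columns are invented for the example and only mirror the shape of the conditions above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (code TEXT, region TEXT, last_pay TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("A", "322", "20130401"),  # region 322, recent payment
        ("B", "322", "20130101"),  # region 322, old payment
        ("C", "324", "20130401"),  # region 324, recent payment
        ("D", "324", "20130101"),  # region 324, old payment
    ],
)

# Original shape: parsed as (date AND region = 322) OR (region = 324).
original = [r[0] for r in conn.execute(
    "SELECT code FROM customers "
    "WHERE last_pay > '20130314' AND region = '322' OR region = '324' "
    "ORDER BY code"
)]

# Revised shape: the date condition applies to both regions.
revised = [r[0] for r in conn.execute(
    "SELECT code FROM customers "
    "WHERE last_pay > '20130314' AND region IN ('322', '324') "
    "ORDER BY code"
)]
```

Customer D, who has an old payment in region 324, shows up in the original result but not in the revised one.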
The overall query can be improved by finding the record in the chh table and then just using that. For this, you want to use the window function row_number(). I think this is the query that you want:
select cari_kod, cari_unvan1, cari_temsilci_kodu,
cha_tarihi, cha_meblag
from (select cr.*, chh.*,
ROW_NUMBER() over (partition by chh.cha_kod order by chh.cha_recno) as seqnum
from dbo.CARI_HESAPLAR cr join
dbo.CARI_HESAP_HAREKETLERI chh
on chh.cha_kod=cr.cari_kod
where cr.cari_bolge_kodu in ('322', '324')
) t
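where cha_tarihi > '20130314' and seqnum = 1
order by cari_kod;
This version assumes the revised date/code logic. Note that the outer query can only refer to the columns exposed by the subquery t, not to the aliases cr and chh.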
The inner subquery select might generate an error if there are two columns with the same name in both tables. If so, then just list the columns instead of using *.
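Here is a small runnable sketch of the row_number() approach, using Python's sqlite3 (SQLite 3.25 or newer) instead of SQL Server. The customer and payment tables are simplified stand-ins for the real ones. One assumption to flag: the sketch orders by recno DESC so that seqnum = 1 is the payment with the highest record number, on the guess that this is the most recent one; the queries above order ascending, so check which end of the sequence you actually need.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (code TEXT, name TEXT, region TEXT)")
conn.execute(
    "CREATE TABLE payment ("
    "recno INTEGER PRIMARY KEY, code TEXT, paid_on TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [("A", "Alpha", "322"), ("B", "Beta", "324"), ("C", "Gamma", "999")],
)
conn.executemany(
    "INSERT INTO payment (code, paid_on, amount) VALUES (?, ?, ?)",
    [
        ("A", "20130101", 10.0),
        ("A", "20130320", 25.0),
        ("B", "20130201", 40.0),
        ("C", "20130401", 99.0),
    ],
)

# Number each customer's payments, newest record first, then keep row 1.
latest = conn.execute(
    """
    SELECT code, name, paid_on, amount
    FROM (
        SELECT c.code, c.name, p.paid_on, p.amount,
               ROW_NUMBER() OVER (
                   PARTITION BY p.code ORDER BY p.recno DESC
               ) AS seqnum
        FROM customer c
        JOIN payment p ON p.code = c.code
        WHERE c.region IN ('322', '324')
    ) t
    WHERE seqnum = 1 AND paid_on > '20130314'
    ORDER BY code
    """
).fetchall()
```

Listing the columns explicitly in the subquery, as done here, also avoids the duplicate-column-name problem mentioned above.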

BigQuery - Shuffle By error

I have a table of about 5M rows. Note this is just a poc. Ultimately we will need to be in the TB range. I am doing a self join to find permutations of products for a market basket analysis.
I need to find the number of times the combination occurs in a basket, the ratio of occurrences to total baskets, and the number of times the item occurs in all baskets. This is pretty standard. BigQuery does not support selects in the predicate of another select so I needed to create another join I suppose. Here's what I came up with -
select twoItem.upc1,twoItem.upc2,twoItem.twoItemOccurrences, totalUpc.totalUpcCount
from
(
select purchase1.upc as upc1,purchase2.upc as upc2,count(upc1) as twoItemOccurrences
from
conagra.purchase as purchase1
join each conagra.purchase as purchase2
on purchase1.upc = purchase2.upc
group by upc1,upc2
) as twoItem
JOIN EACH
(
select purchase3.upc as upc3, count(*) as totalUpcCount
from conagra.purchase as purchase3
group by upc3
) as totalUpc
on totalUpc.upc3 = twoItem.upc1
LIMIT 50;
I get the following error:
SHUFFLE BY may only be applied to parallelizable queries, but query is not parallelizable: (SELECT * FROM (SELECT [purchase3.upc] AS [upc3], COUNT(*) AS [totalUpcCount]...
Maybe an unpublished limitation?
Any help would be appreciated.
Try running these with GROUP EACH BY on your inner queries. We'll improve the response message for queries like this.

Why does added RAND() cause MySQL to overload?

OK I have this query which gives me DISTINCT product_series, plus all the other fields in the table:
SELECT pi.*
FROM (
SELECT DISTINCT product_series
FROM cart_product
) pd
JOIN cart_product pi
ON pi.product_id =
(
SELECT product_id
FROM cart_product po
WHERE product_brand = "everlon"
AND product_type = "'.$type.'"
AND product_available = "yes"
AND product_price_contact = "no"
AND product_series != ""
AND po.product_series = pd.product_series
ORDER BY product_price
LIMIT 1
) ORDER BY product_price
This works fine. I am also ordering by price so I can get the starting price for each series. Nice.
However, today my boss told me that all the products that are showing up from this query are of metal_type white gold, and he wants to show random metal types. So I added RAND() to the ORDER BY after the price, so that I will still get the lowest price, but a random metal at the lowest price. Here is the new query:
SELECT pi.*
FROM (
SELECT DISTINCT product_series
FROM cart_product
) pd
JOIN cart_product pi
ON pi.product_id =
(
SELECT product_id
FROM cart_product po
WHERE product_brand = "everlon"
AND product_type = "'.$type.'"
AND product_available = "yes"
AND product_price_contact = "no"
AND product_series != ""
AND po.product_series = pd.product_series
ORDER BY product_price, RAND()
LIMIT 1
) ORDER BY product_price, RAND()
When I run this query, MySQL completely shuts down and tells me that there are too many connections, and I get a phone call from the host admin asking me what the hell I did.
I didn't believe that could be just from adding RAND() to the query, and I thought it had to be a coincidence. I waited a few hours after everything was fixed and ran the query again. Immediately... same issue.
So what is going on? Because I have no clue. Is there something wrong with my query?
Thanks!!!!
Using RAND() for ORDER BY is not a good idea, because it does not scale as the data increases. You can see more information on it, including two alternatives you can adapt, in my answer to this question.
Here's a blog post that explains the issue quite well, and workarounds:
http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/
And here's a similar warning against ORDER BY RAND() for MySQL, I think the cause is basically the same there:
http://www.webtrenches.com/post.cfm/avoid-rand-in-mysql
Depending on the number of products in your site, that function call is going to execute once per record, potentially slowing the query down.. considerably.
The Too Many Connections error is probably due to this query blocking others while it tries to compute those numbers.
Find another way. ;)
Instead, you can generate the random number in the programming language you're using rather than on the MySQL side, where rand() is called for each row.
If you know how many records you have, you can select a random record like this (this is Perl):
$handle->Sql("SELECT COUNT(0) AS nor FROM table");
$handle->FetchRow();
$nor = $handle->Data('nor');
$rand = int(rand()*$nor);  # LIMIT offsets are 0-based, so valid values are 0 .. $nor-1
$handle->Sql("SELECT * FROM table LIMIT $rand,1");
$handle->FetchRow();
.
.
.
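For comparison, here is the same count-then-offset idea as a sketch in Python with sqlite3. The items table and the random_row helper are hypothetical; the point is only that the random number is drawn once in application code instead of once per row in the database.

```python
import random
import sqlite3

def random_row(conn, rng=random):
    # One cheap COUNT query, then a single-row fetch at a random offset.
    count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    if count == 0:
        return None
    offset = rng.randrange(count)  # 0 .. count-1, matching 0-based offsets
    return conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
rows = [(1, "white gold"), (2, "yellow gold"), (3, "platinum")]
conn.executemany("INSERT INTO items VALUES (?, ?)", rows)

picked = random_row(conn)

empty = sqlite3.connect(":memory:")
empty.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
nothing = random_row(empty)
```

A large offset still makes the database skip rows, so this is a trade-off rather than a free lunch, but it avoids sorting the whole table by a per-row random value.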

GROUP BY, ORDER BY - How to make group by consider latest appearance of item

I have a query like this (Mysql 5.X, PHP - formatted for legibility)
$query ="
SELECT
p.p_pid,
p.p_name,
p.p_url
FROM
activity a,
products p
WHERE
a.a_uid= ".$uid_int."
AND a.a_pid > 0
AND p.p_pid = a.a_pid
GROUP BY
a.a_pid
ORDER BY
a.a_time DESC LIMIT 6
";
In general it should produce a unique list of the 6 latest products the user has seen.
The problem occurs when the user has seen a product more than once, with one view among the latest 6 activities and another before them: the query does not return the product. I assume that GROUP BY does not leave a_time with the latest time of appearance of the product. How can I correct it?
Have you tried ordering by MAX(a.a_time) ?
SELECT
p.p_pid,
p.p_name,
p.p_url
FROM products p
INNER JOIN activity a on p.p_pid = a.a_pid
WHERE
a.a_uid= ".$uid_int."
GROUP BY
p_pid, p_name, p_url
ORDER BY
max(a.a_time) DESC
LIMIT 6
As a best practice, list every column you select without an aggregate in the GROUP BY. MySQL is one of the few databases that allow you to select a column that's not being grouped on. It'll give you the value from an arbitrary row in each group.
I sure hope that $uid_int variable is double checked for SQL injection.
$query ="
SELECT
MAX(p.p_pid) p_pid,
MAX(p.p_name) p_name,
MAX(p.p_url) p_url
FROM
activity a
INNER JOIN products p ON p.p_pid = a.a_pid
WHERE
a.a_uid= ".$uid_int."
AND a.a_pid > 0
GROUP BY
a.a_pid
ORDER BY
MAX(a.a_time) DESC
LIMIT 6
";
Sometimes I wonder if it was a good design decision from MySQL to allow grouping without explicit aggregation...
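To check the ORDER BY MAX(a_time) idea, here is a small sketch in Python with sqlite3 rather than PHP and MySQL. The table layout is a cut-down guess at the one in the question, and the user id is passed as a bound parameter instead of being concatenated into the SQL string, which also takes care of the injection worry.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (p_pid INTEGER PRIMARY KEY, p_name TEXT)")
conn.execute("CREATE TABLE activity (a_uid INTEGER, a_pid INTEGER, a_time INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [(i, "product %d" % i) for i in range(1, 8)],
)
# User 1 views products 1..7 in order, then views product 1 again at time 8.
views = [(1, pid, pid) for pid in range(1, 8)] + [(1, 1, 8)]
conn.executemany("INSERT INTO activity VALUES (?, ?, ?)", views)

def latest_products(conn, uid, limit=6):
    # Group per product and sort the groups by their most recent view.
    return [r[0] for r in conn.execute(
        "SELECT p.p_pid "
        "FROM products p "
        "JOIN activity a ON p.p_pid = a.a_pid "
        "WHERE a.a_uid = ? "
        "GROUP BY p.p_pid, p.p_name "
        "ORDER BY MAX(a.a_time) DESC "
        "LIMIT ?",
        (uid, limit),
    )]

recent = latest_products(conn, 1)
```

Product 1 comes first because its latest view is the most recent activity, even though its first view is older than everything else in the list.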