Rank over multiple columns executed by the server? - sql

How can I implement the RANK() function taking into account two columns for the ranking? The main column does not have unique values. This is the query:
select *, RANK() over (order by score, posteddate desc) as rank from Post
I need to implement pagination without the offset/limit pattern, and I thought some kind of ranking function would work. I have a partial implementation that only works with unique values, using the '>' or '<' operators on the key used for pagination.
Any idea? I cannot find a solution online.
Cheers.
Edit upon request: I am using c#.

We do paging with .Skip(pageIndex * pageSize) and .Take(pageSize). You just need to make sure you have ordered the data first. This will all execute on the server.
For example, if you want the content for page index 4 (the fifth page, since pageIndex is zero-based) and each page should contain 20 results:
var pageIndex = 4;
var pageSize = 20;
var pageContent = db.Post
.OrderBy(p => p.score)
.ThenByDescending(p => p.posteddate)
.Skip(pageIndex * pageSize)
.Take(pageSize);
The generated SQL here automatically includes the addition of ROW_NUMBER() and TOP. It works pretty well up to a couple hundred thousand records.

Not exactly an answer, but I found a better approach as described in http://use-the-index-luke.com/sql/partial-results/fetch-next-page
SELECT *
FROM ( SELECT *
FROM sales
WHERE sale_date <= ?
AND NOT (sale_date = ? AND sale_id >= ?)
ORDER BY sale_date DESC, sale_id DESC
)
WHERE rownum <= 10
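The same seek idea can be applied to the ordering from the question (score ascending, posteddate descending). What follows is only a minimal sketch in Python with sqlite3, not the approach from the linked article verbatim: the post table layout, the id tie-breaker column and the fetch_page helper are assumptions made for illustration. Because the two sort directions differ, the "rows after the cursor" condition is written out column by column instead of as a single row comparison.

```python
import sqlite3

def fetch_page(conn, page_size, after=None):
    """Return the next page ordered by score ASC, posteddate DESC, id ASC.

    `after` is the (score, posteddate, id) tuple of the last row already shown;
    id is a hypothetical unique column used only to break ties.
    """
    sql = "SELECT score, posteddate, id FROM post"
    params = []
    if after is not None:
        score, posteddate, post_id = after
        # Keep only rows that sort strictly after the cursor row.
        sql += (
            " WHERE score > ?"
            " OR (score = ? AND posteddate < ?)"
            " OR (score = ? AND posteddate = ? AND id > ?)"
        )
        params = [score, score, posteddate, score, posteddate, post_id]
    sql += " ORDER BY score ASC, posteddate DESC, id ASC LIMIT ?"
    params.append(page_size)
    return conn.execute(sql, params).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post (id INTEGER PRIMARY KEY, score INTEGER, posteddate TEXT)"
)
conn.executemany(
    "INSERT INTO post (id, score, posteddate) VALUES (?, ?, ?)",
    [
        (1, 10, "2024-01-03"),
        (2, 10, "2024-01-03"),  # same score and date as id 1
        (3, 10, "2024-01-01"),
        (4, 20, "2024-01-02"),
        (5, 20, "2024-01-05"),
    ],
)

first_page = fetch_page(conn, 2)
second_page = fetch_page(conn, 2, after=first_page[-1])
third_page = fetch_page(conn, 2, after=second_page[-1])
```

Each call passes the last row of the previous page as the cursor, so no OFFSET is needed and rows with duplicate score and posteddate values are neither skipped nor repeated.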

Related

How do I show the latest content from my SQL Database (multiple tables) on my website?

I have just started a website where I show pictures and videos. On my front page, I want the latest video or picture to appear. I upload one picture or video each day, so all pictures and videos have different dates related to them in my database.
I have two tables: "Pictures" and "Videos" with the same structure.
The codes I use for fetching data from my databases look like this:
First two codes are the ones that show my Pictures or videos in a list.
$GET_picture = mysql_query("SELECT * FROM Pictures ORDER BY Dato DESC LIMIT 0,1");
while($picture = mysql_fetch_array($GET_picture))
and
$GET_video = mysql_query("SELECT * FROM Videos ORDER BY Dato DESC LIMIT 0,1");
while($video = mysql_fetch_array($GET_video)){
Next two codes are for showing a specific Picture or video:
$GET_spec_picture = mysql_query("SELECT * FROM Pictures WHERE id='$id'");
$spec = mysql_fetch_array($GET_spec_picture);
and
$GET_spec_video = mysql_query("SELECT * FROM Videos WHERE id='$id'");
$spec = mysql_fetch_array($GET_spec_video);
Again, what I want to do is to show the latest (and only the latest) picture or video on my front page.
I have tried using UNION, but it did not work. Could anyone show me how to use it correctly for this situation, or do I have another problem?
Thank you in advance!
If your PHP really is
if($_GET['mode'] == ""){
$get_content = mysql_fetch_array("
(SELECT * FROM Pictures
ORDER BY Dato DESC LIMIT 0,1)
union all (SELECT * FROM Videos
ORDER BY Dato DESC LIMIT 0,1)
order by Dato DESC LIMIT 0, 1");
while($content = mysql_fetch_array($get_content)){...}
}
then you should replace the first mysql_fetch_array with mysql_query!
Did you try this?
(SELECT * FROM Pictures ORDER BY Dato DESC LIMIT 0,1)
union all
(SELECT * FROM Videos ORDER BY Dato DESC LIMIT 0,1)
order by Dato desc
LIMIT 0, 1;
The order by in each subquery is, strictly speaking, unnecessary. However, if you have an index on Dato, each subquery will quickly fetch the most recent row from its table, and the final sort on two rows is trivial.
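To see the query shape end to end, here is a small sketch in Python using sqlite3 as a stand-in for MySQL. The sample rows, the title column and the added kind column are assumptions for illustration. SQLite does not accept parenthesised SELECTs around UNION ALL, so each branch is wrapped in a subquery; the idea (newest row per table, then newest of the two) is unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Pictures (id INTEGER PRIMARY KEY, title TEXT, Dato TEXT);
CREATE TABLE Videos   (id INTEGER PRIMARY KEY, title TEXT, Dato TEXT);
INSERT INTO Pictures (title, Dato) VALUES ('sunset', '2013-05-01'), ('harbour', '2013-05-03');
INSERT INTO Videos   (title, Dato) VALUES ('timelapse', '2013-05-02'), ('concert', '2013-05-04');
""")

# Take the newest row of each table, then keep the newer of those two rows.
# The 'kind' column tells the page which table the winning row came from.
LATEST_SQL = """
SELECT kind, id, title, Dato
FROM (SELECT 'picture' AS kind, id, title, Dato
      FROM Pictures ORDER BY Dato DESC LIMIT 1)
UNION ALL
SELECT kind, id, title, Dato
FROM (SELECT 'video' AS kind, id, title, Dato
      FROM Videos ORDER BY Dato DESC LIMIT 1)
ORDER BY Dato DESC
LIMIT 1
"""

latest = conn.execute(LATEST_SQL).fetchone()
```

With the sample data, latest holds the 'concert' video row, since its Dato is the most recent across both tables.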

Limit in subquery with JPQL

I'd like to get the average price of my top 100 products via JPA2. The query should look something like this (my sql is a little rusty):
select avg(price) from (
select p.price from Product p order by p.price desc limit 100)
but that is not working at all. I also tried this:
select avg(p.price) from Product p where p.id =
(select pj.id from Product pj order by pj.price desc limit 100)
this is working up until the limit keyword.
I read that limit is not available in JPQL.
Any idea on how to do this? Criteria would also be fine.
'LIMIT' is not supported by JPQL. Below is the sample-code using Criteria-API.
CriteriaBuilder builder = entityManager.getCriteriaBuilder();
CriteriaQuery<Double> criteriaQuery = builder.createQuery(Double.class);
Root<Product> productRoot = criteriaQuery.from(Product.class);
criteriaQuery.select(builder.avg(productRoot.get("price")));
criteriaQuery.orderBy(builder.desc(productRoot.get("price")));
Double average = (Double)entityManager.createQuery(criteriaQuery).setMaxResults(100).getSingleResult();
or
Double average = (Double)entityManager.createQuery("select avg(p.price) from Product p order by p.price").setMaxResults(100).getSingleResult();
If this doesn't work, you will have to execute two queries: select the ordered records first, then average them.
Otherwise, if portability is not an issue, use a native query; this can be done in a single query, since many RDBMSs support restricting the number of rows fetched from the database.
SELECT AVG(PRICE) FROM (SELECT PRICE FROM PRODUCT ORDER BY PRICE DESC LIMIT 100) AS TOP_PRODUCTS
See this post regarding the JPQL LIMIT work around.
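Both fallbacks mentioned above (a single native query, or two steps with the averaging done in the application) can be sketched as follows. This is an illustration in Python with sqlite3 rather than JPA; the product table and the two helper functions are hypothetical names. The point is that the row limit must be applied before the aggregate, which is why the limited SELECT sits in a derived table.

```python
import sqlite3

def average_of_top(conn, n):
    """Average price of the n most expensive products, in a single query."""
    row = conn.execute(
        "SELECT AVG(price) FROM "
        "(SELECT price FROM product ORDER BY price DESC LIMIT ?)",
        (n,),
    ).fetchone()
    return row[0]

def average_of_top_two_step(conn, n):
    """Same result in two steps: fetch the top n prices, then average them here."""
    prices = [
        r[0]
        for r in conn.execute(
            "SELECT price FROM product ORDER BY price DESC LIMIT ?", (n,)
        )
    ]
    return sum(prices) / len(prices) if prices else None

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany(
    "INSERT INTO product (price) VALUES (?)",
    [(p,) for p in (10.0, 20.0, 30.0, 40.0, 50.0)],
)

single_query_avg = average_of_top(conn, 3)
two_step_avg = average_of_top_two_step(conn, 3)
```

For the five sample prices, both helpers average 50, 40 and 30.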

LINQ to SQL Every Nth Row From Table

Does anybody know how to write a LINQ to SQL statement to return every nth row from a table? I need to get the title of the item at the top of each page in a paged data grid for fast user scanning. So if I wanted the first record, then every 3rd one after that, from the following names:
Amy, Eric, Jason, Joe, John, Josh, Maribel, Paul, Steve, Tom
I'd get Amy, Joe, Maribel, and Tom.
I suspect this can be done... LINQ to SQL statements already invoke the ROW_NUMBER() SQL function in conjunction with sorting and paging. I just don't know how to get back every nth item. The SQL Statement would be something like WHERE ROW_NUMBER MOD 3 = 0, but I don't know the LINQ statement to use to get the right SQL.
Sometimes, TSQL is the way to go. I would use ExecuteQuery<T> here:
var data = db.ExecuteQuery<SomeObjectType>(@"
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS [__row]
FROM [YourTable]) x WHERE (x.__row % 25) = 1");
You could also swap out the n:
// {0} is replaced by a SQL parameter carrying the value of n
var data = db.ExecuteQuery<SomeObjectType>(@"
SELECT * FROM
(SELECT *, ROW_NUMBER() OVER (ORDER BY id) AS [__row]
FROM [YourTable]) x WHERE (x.__row % {0}) = 1", n);
Once upon a time, there was no such thing as Row_Number, and yet such queries were possible. Behold!
var query =
from c in db.Customers
let i = (
from c2 in db.Customers
where c2.ID < c.ID
select c2).Count()
where i%3 == 0
select c;
This generates the following Sql
SELECT [t2].[ID], [t2]. --(more fields)
FROM (
SELECT [t0].[ID], [t0]. --(more fields)
(
SELECT COUNT(*)
FROM [dbo].[Customer] AS [t1]
WHERE [t1].[ID] < [t0].[ID]
) AS [value]
FROM [dbo].[Customer] AS [t0]
) AS [t2]
WHERE ([t2].[value] % @p0) = @p1
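The counting trick can be tried outside LINQ as well. Below is a minimal sketch in Python with sqlite3, using the names from the question; the customer table and the every_nth helper are assumptions for illustration. For each row, the subquery counts how many rows have a smaller id, which gives a zero-based position without ROW_NUMBER.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
names = ["Amy", "Eric", "Jason", "Joe", "John",
         "Josh", "Maribel", "Paul", "Steve", "Tom"]
conn.executemany("INSERT INTO customer (name) VALUES (?)", [(n,) for n in names])

def every_nth(conn, n):
    """Return the first name and every nth name after it, ordered by id."""
    sql = """
    SELECT c.name
    FROM customer AS c
    WHERE (SELECT COUNT(*)            -- zero-based position of row c
           FROM customer AS c2
           WHERE c2.id < c.id) % ? = 0
    ORDER BY c.id
    """
    return [row[0] for row in conn.execute(sql, (n,))]

page_tops = every_nth(conn, 3)
```

Note that the correlated count is evaluated once per row, so the cost grows roughly quadratically with the table size unless the database can optimise it.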
Here's an option that works, but it might be worth checking that it doesn't have any performance issues in practice:
var nth = 3;
var ids = Table
.Select(x => x.Id)
.ToArray()
.Where((x, n) => n % nth == 0)
.ToArray();
var nthRecords = Table
.Where(x => ids.Contains(x.Id));
Just googling around a bit I haven't found (or experienced) an option for Linq to SQL to directly support this.
The only option I can offer is that you write a stored procedure with the appropriate SQL query written out and then calling the sproc via Linq to SQL. Not the best solution, especially if you have any kind of complex filtering going on.
There really doesn't seem to be an easy way to do this:
How do I add ROW_NUMBER to a LINQ query or Entity?
How to find the ROW_NUMBER() of a row with Linq to SQL
But there's always:
peopleToFilter.AsEnumerable().Where((x,i) => i % AmountToSkipBy == 0)
NOTE: This still doesn't execute on the database side of things!
This will do the trick, but it isn't the most efficient query in the world:
var count = query.Count();
var pageSize = 10;
var pageTops = query.Take(1);
for(int i = pageSize; i < count; i += pageSize)
{
pageTops = pageTops.Concat(query.Skip(i - (i % pageSize)).Take(1));
}
return pageTops;
It dynamically constructs a query to pull the first row of each page (rows 0, pageSize, 2*pageSize, etc.) from the given query. If you use this technique, you'll probably want to cap it at maybe ten or twenty names, similar to how a Google results page shows pages 1-10 and Next, in order to avoid building an expression so large that the database refuses to parse it.
If you need better performance, you'll probably have to use a stored procedure or a view to represent your query, and include the row number as part of the stored proc results or the view's fields.

How to optimize this low-performance MySQL query?

I’m currently using the following query for jsPerf. In the likely case you don’t know jsPerf — there are two tables: pages containing the test cases / revisions, and tests containing the code snippets for the tests inside the test cases.
There are currently 937 records in pages and 3817 records in tests.
As you can see, it takes quite a while to load the “Browse jsPerf” page where this query is used.
The query takes about 7 seconds to execute:
SELECT
id AS pID,
slug AS url,
revision,
title,
published,
updated,
(
SELECT COUNT(*)
FROM pages
WHERE slug = url
AND visible = "y"
) AS revisionCount,
(
SELECT COUNT(*)
FROM tests
WHERE pageID = pID
) AS testCount
FROM pages
WHERE updated IN (
SELECT MAX(updated)
FROM pages
WHERE visible = "y"
GROUP BY slug
)
AND visible = "y"
ORDER BY updated DESC
I’ve added indexes on all fields that appear in WHERE clauses. Should I add more?
How can this query be optimized?
P.S. I know I could implement a caching system in PHP — I probably will, so please don’t tell me :) I’d just really like to find out how this query could be improved, too.
Use:
SELECT x.id AS pID,
x.slug AS url,
x.revision,
x.title,
x.published,
x.updated,
y.revisionCount,
COALESCE(z.testCount, 0) AS testCount
FROM pages x
JOIN (SELECT p.slug,
MAX(p.updated) AS max_updated,
COUNT(*) AS revisionCount
FROM pages p
WHERE p.visible = 'y'
GROUP BY p.slug) y ON y.slug = x.slug
AND y.max_updated = x.updated
LEFT JOIN (SELECT t.pageid,
COUNT(*) AS testCount
FROM tests t
GROUP BY t.pageid) z ON z.pageid = x.id
ORDER BY updated DESC
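To check what the rewritten query returns, here is a small sketch that runs a trimmed version of it from Python against sqlite3 instead of MySQL. The sample rows and the reduced column list are assumptions for illustration; the join structure (latest visible revision per slug, plus pre-aggregated counts) follows the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages (id INTEGER PRIMARY KEY, slug TEXT, revision INTEGER,
                    title TEXT, updated TEXT, visible TEXT);
CREATE TABLE tests (id INTEGER PRIMARY KEY, pageID INTEGER, code TEXT);
INSERT INTO pages (id, slug, revision, title, updated, visible) VALUES
  (1, 'loops',  1, 'Loops v1',  '2010-01-01', 'y'),
  (2, 'loops',  2, 'Loops v2',  '2010-02-01', 'y'),
  (3, 'concat', 1, 'Concat v1', '2010-01-15', 'y');
INSERT INTO tests (pageID, code) VALUES (1, 'a'), (2, 'b'), (2, 'c');
""")

# Each derived table is computed once, instead of once per outer row as with
# the correlated subqueries in the original query.
QUERY = """
SELECT x.id, x.slug, x.revision, y.revisionCount,
       COALESCE(z.testCount, 0) AS testCount
FROM pages x
JOIN (SELECT slug, MAX(updated) AS max_updated, COUNT(*) AS revisionCount
      FROM pages
      WHERE visible = 'y'
      GROUP BY slug) y
  ON y.slug = x.slug AND y.max_updated = x.updated
LEFT JOIN (SELECT pageID, COUNT(*) AS testCount
           FROM tests
           GROUP BY pageID) z
  ON z.pageID = x.id
ORDER BY x.updated DESC
"""

rows = conn.execute(QUERY).fetchall()
```

Each slug appears once, represented by its latest revision, with the revision count and the test count for that revision; pages without tests get 0 through COALESCE.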
You want to learn how to use EXPLAIN. Prefixing a SELECT with EXPLAIN makes MySQL show its execution plan for the statement rather than the result rows, including which indexes it would use and roughly how many rows it expects to examine. The goal is to reduce the number of rows scanned (i.e., the database searching row by row for values).
You may want to try the subqueries one at a time to see which one is slowest.
This query:
SELECT MAX(updated)
FROM pages
WHERE visible = "y"
GROUP BY slug
Makes it sort the result by slug. This is probably slow.

Why does added RAND() cause MySQL to overload?

OK I have this query which gives me DISTINCT product_series, plus all the other fields in the table:
SELECT pi.*
FROM (
SELECT DISTINCT product_series
FROM cart_product
) pd
JOIN cart_product pi
ON pi.product_id =
(
SELECT product_id
FROM cart_product po
WHERE product_brand = "everlon"
AND product_type = "'.$type.'"
AND product_available = "yes"
AND product_price_contact = "no"
AND product_series != ""
AND po.product_series = pd.product_series
ORDER BY product_price
LIMIT 1
) ORDER BY product_price
This works fine. I am also ordering by price so I can get the starting price for each series. Nice.
However, today my boss told me that all the products showing up from this query have metal_type white gold, and he wants to show random metal types. So I added RAND() to the ORDER BY after the price, so that I still get the lowest price, but a random metal at that lowest price. Here is the new query:
SELECT pi.*
FROM (
SELECT DISTINCT product_series
FROM cart_product
) pd
JOIN cart_product pi
ON pi.product_id =
(
SELECT product_id
FROM cart_product po
WHERE product_brand = "everlon"
AND product_type = "'.$type.'"
AND product_available = "yes"
AND product_price_contact = "no"
AND product_series != ""
AND po.product_series = pd.product_series
ORDER BY product_price, RAND()
LIMIT 1
) ORDER BY product_price, RAND()
When I run this query, MySQL completely shuts down and tells me that there are too many connections And I get a phone call from the host admin asking me what the hell I did.
I didn't believe that could be just from adding RAND() to the query, and I thought it had to be a coincidence. I waited a few hours after everything was fixed and ran the query again. Immediately... same issue.
So what is going on? Because I have no clue. Is there something wrong with my query?
Thanks!!!!
Using RAND() for ORDER BY is not a good idea, because it does not scale as the data increases. You can see more information on it, including two alternatives you can adapt, in my answer to this question.
Here's a blog post that explains the issue quite well, and workarounds:
http://www.titov.net/2005/09/21/do-not-use-order-by-rand-or-how-to-get-random-rows-from-table/
And here's a similar warning against ORDER BY RAND() for MySQL, I think the cause is basically the same there:
http://www.webtrenches.com/post.cfm/avoid-rand-in-mysql
Depending on the number of products in your site, that function call is going to execute once per record, potentially slowing the query down.. considerably.
The Too Many Connections error is probably due to this query blocking others while it tries to compute those numbers.
Find another way. ;)
Instead, you can generate the random number in the programming language you're using rather than on the MySQL side, where rand() is called for every row.
If you know how many records you have you can select a random record like this (this is Perl):
$handle->Sql("SELECT COUNT(0) AS nor FROM table");
$handle->FetchRow();
$nor = $handle->Data('nor');
# LIMIT offsets are zero-based, so pick an offset in 0 .. $nor-1
$rand = int(rand()*$nor);
$handle->Sql("SELECT * FROM table LIMIT $rand,1");
$handle->FetchRow();
.
.
.
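The same count-then-offset idea, sketched in Python with sqlite3 rather than Perl and MySQL. The cart_product rows, the metal_type values and the random_row helper are assumptions for illustration. The random number is generated once in application code, so the database never calls RAND() per row.

```python
import random
import sqlite3

def random_row(conn, rng=random):
    """Pick one row at a random zero-based offset; returns None for an empty table."""
    count = conn.execute("SELECT COUNT(*) FROM cart_product").fetchone()[0]
    if count == 0:
        return None
    offset = rng.randrange(count)  # 0 .. count-1
    return conn.execute(
        "SELECT product_id, metal_type FROM cart_product "
        "ORDER BY product_id LIMIT 1 OFFSET ?",
        (offset,),
    ).fetchone()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cart_product (product_id INTEGER PRIMARY KEY, metal_type TEXT)"
)
conn.executemany(
    "INSERT INTO cart_product (product_id, metal_type) VALUES (?, ?)",
    [(1, "white gold"), (2, "yellow gold"), (3, "platinum")],
)

picked = random_row(conn)
```

The database still has to step over the skipped rows for a large offset, so on big tables this is cheaper than ORDER BY RAND() but not free.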