Slightly complex SQL row position problem - sql

Basically, I have albums, each with 50 images in it. When I show a list of images, I know which rows are showing (showing: 20 to 30 of 50), meaning rows 20 - 30. The problem is that I want to select an image and still show the position it was selected at, so I can move back and forth while keeping the position too.
For example, if I select the 5th image, whose id is 'sd564', I want to show (6 of 50 images), meaning you are seeing the 6th of 50 images. If I then get the next row id and show that, I want to show (7 of 50 images).
I can do all this easily with a pagination pointer in the URL, say (after=5, after=6), which moves with the position. But what if I don't have this (after=6) and just have an id? How can I still do that?
I also don't want to use (after=6) because it's a dynamic site where images are added and deleted, so positions change. If someone shares a link and comes back to the same old link later, it would show the wrong position.
What kind of SQL query should I be running for this?
Currently I have:
select * from images where id = 'sd564';
Obviously I need to add a LIMIT or something else to the query to get what I want, or maybe run another query to get the result while keeping this old query in place too. Anyway, I just want the position. I hope you can help me solve this.
Example: http://media.photobucket.com/image/color%20splash/aly3265/converse.jpg
sample http://img41.imageshack.us/img41/5631/viewing3of8240.png
Album Query Request (check post below)
select images.* from images, album
where album_id = '5'
and album_id = image_album_id
order by created_date DESC
limit ....;

Assuming created_date is unique per album_id and (album_id,created_date) is unique for all rows in images, then this:
select i1.*, count(*) as position
from images i1
inner join images i2
on i1.album_id = i2.album_id -- get all other pics in this album
and i1.created_date >= i2.created_date -- in case they were created before this pic
where i1.album_id = 5
group by i1.created_date
will reliably get you the images and their position. Please understand that this only works reliably if (album_id,created_date) is unique throughout the images table. If that is not the case, the position won't be reliable, and you might not see all photos due to the GROUP BY. Also note that a GROUP BY clause like this, listing only some of the columns that appear in the SELECT list (in this case i1.*), is not valid in most RDBMSs. For a detailed discussion on that matter, see: http://dev.mysql.com/tech-resources/articles/debunking-group-by-myths.html
By doing this:
select i1.*, count(*) as position
from images i1
inner join images i2
on i1.album_id = i2.album_id -- get all other pics in this album
and i1.created_date >= i2.created_date -- in case they were created before this pic
where i1.album_id = 5
group by i1.created_date
having count(*) = 4
you select the image at the 4th position (note the having count(*) = 4)
By doing this:
select i1.*, count(*) as position
from images i1
inner join images i2
on i1.album_id = i2.album_id -- get all other pics in this album
and i1.created_date >= i2.created_date -- in case they were created before this pic
where i1.album_id = 5
group by i1.created_date
having count(*) between 1 and 10
you select all photos with positions 1 through 10 (note the having clause again.)
Of course, if you just want one particular image, you can simply do:
select i1.*, count(*) as position
from images i1
inner join images i2
on i1.album_id = i2.album_id -- get all other pics in this album
and i1.created_date >= i2.created_date -- in case they were created before this pic
where i1.image_id = 's1234'
group by i1.created_date
This will correctly report the position of the image within the album (assuming, of course, that image_id is unique within the images table). You don't need the having clause in that case since you have already pinpointed the image you want.
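As a rough sketch of the same idea, the position can also be computed from just the id with two correlated counts, which sidesteps the partial GROUP BY. The example below uses Python's sqlite3 module with a made-up images table; the column names follow the answer above, while the sample rows and the position_in_album helper are assumptions for illustration only. It counts in ascending created_date order, like the queries above; flip the comparison to >= to count in the DESC order used by the album query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (image_id TEXT PRIMARY KEY, album_id INTEGER, created_date TEXT)"
)
# Hypothetical sample data: album 5 has four images, album 7 has one.
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?)",
    [
        ("a1", 5, "2010-01-01"),
        ("a2", 5, "2010-01-02"),
        ("sd564", 5, "2010-01-03"),
        ("a4", 5, "2010-01-04"),
        ("b1", 7, "2010-01-01"),
    ],
)


def position_in_album(conn, image_id):
    """Return (position, total) for an image within its own album,
    ordered by created_date ascending, or None if the id is unknown."""
    return conn.execute(
        """
        SELECT
            (SELECT COUNT(*) FROM images i2
              WHERE i2.album_id = i1.album_id
                AND i2.created_date <= i1.created_date) AS position,
            (SELECT COUNT(*) FROM images i3
              WHERE i3.album_id = i1.album_id) AS total
        FROM images i1
        WHERE i1.image_id = ?
        """,
        (image_id,),
    ).fetchone()


position, total = position_in_album(conn, "sd564")
print(f"{position} of {total} images")
```

Because the position is derived from the id at request time, a shared link that carries only the id keeps pointing at the same image even after other images are added or deleted.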

From what you are saying here:
I also don't want to use (after=6) because it's a dynamic site where images are added and deleted, so positions change. If someone shares a link and comes back to the same old link later, it would show the wrong position.
I get the impression that this is not a SQL problem at all. The problem is that the positions of the photos are local to the search resultset. To reliably navigate by position, you would need to make a snapshot (no pun intended) of some kind. That is, you need to have some way to "freeze" the dataset while it is being browsed.
A simple way to do it, would be to execute the search, and cache the result outside of the actual current datastore. For example, you could use "scratch tables" in your database, simply store it in temporary files, or in some memory caching layer if you have the mem for it. With this model, you'd let the user browse the resultset from the cache, and you would need to clean out the cache when the user's session ends (or after some timeout, you don't want to kill your server because some users don't log out)
Another way to do it is to simply allow yourself to lie now and then. Let's say you have result pages of 10 images, and a typical search delivers 50 pages of results. Well, you could simply send a resultset for a fixed number of items, say 100 photos (so 10 pages), to the client. These search results would then be your snapshot, and contain references to the actual pictures. If you are storing the URLs in the database, and not the binary data, this reference is simply the URL. Or you could store the database id there. Anyway, the user is allowed to browse the initial resultset, and chances are that they never browse the entire set. If they do, you re-execute the query on the server side for the next chunk of pages. If many photos were added in the meantime that would end up at positions 1..100, then the user will see stale data: that's the price they pay for having so much time on their hands that they can allow themselves to browse 10 pages of 10 photos.
(of course, you should tweak the parameters to your liking but you get the idea I'm sure.)
If you don't want to 'lie' and it is really important that people can reliably browse all the results they searched, you could extend your database schema to support snapshots at that level. Now assuming that there are only two operations for photos, namely "add" and "delete", you would have a TIMESTAMP_ADDED and a TIMESTAMP_REMOVED in your photo table. On add, you do the INSERT in your db, and fill TIMESTAMP_ADDED with the current timestamp. The TIMESTAMP_REMOVED would be filled with the theoretical maximum value for whatever data type you like to use to store the timestamp. (For this particular case I would probably go for an INT column and simply store the UNIX_TIMESTAMP.) On delete, you don't DELETE the row from the db; rather, you mark it as deleted by updating the TIMESTAMP_REMOVED column, setting it to the current timestamp. Now when you have to do a search, you use a query like:
SELECT *
FROM photo
WHERE timestamp_added < timestamp_of_initial_search
AND timestamp_removed > timestamp_of_initial_search
AND ...various search criteria...
ORDER BY ...something
LIMIT ...page offset and num items in page...
The timestamp_of_initial_search is the timestamp of executing the initial search for a particular set of criteria. You should store that in the application session while the user is browsing a particular search resultset so you can use it in the subsequent queries required for fetching the pages. The first two WHERE criteria are there to implement the snapshot. The condition timestamp_added < timestamp_of_initial_search ensures we can only see photos that were added before the timestamp of executing the search. The condition timestamp_removed > timestamp_of_initial_search ensures we only see photos that were not already removed by the time the initial search was executed.
Of course, you still have to do something with the photos that were marked for delete. You could schedule periodical physical deletion for all photos that have a timestamp removed that is smaller than any of the current search resultsets.
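The snapshot scheme can be sketched roughly as follows, using Python's sqlite3 module. The photo table and the two timestamp columns come from the description above; MAX_TS, the helper functions, and the integer "clock" values are assumptions made up for the example, not part of any real schema.

```python
import sqlite3

# Assumed stand-in for "theoretical maximum timestamp" (not removed yet).
MAX_TS = 2**31 - 1

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE photo ("
    " id INTEGER PRIMARY KEY,"
    " timestamp_added INTEGER NOT NULL,"
    " timestamp_removed INTEGER NOT NULL)"
)


def add_photo(conn, photo_id, now):
    # A new photo is visible from `now` until it is marked as removed.
    conn.execute("INSERT INTO photo VALUES (?, ?, ?)", (photo_id, now, MAX_TS))


def remove_photo(conn, photo_id, now):
    # Soft delete: keep the row, only record when it was removed.
    conn.execute(
        "UPDATE photo SET timestamp_removed = ? WHERE id = ?", (now, photo_id)
    )


def snapshot_page(conn, search_ts, offset, limit):
    # Only rows that existed at search_ts are visible to this search.
    rows = conn.execute(
        """
        SELECT id FROM photo
        WHERE timestamp_added < ? AND timestamp_removed > ?
        ORDER BY id
        LIMIT ? OFFSET ?
        """,
        (search_ts, search_ts, limit, offset),
    ).fetchall()
    return [r[0] for r in rows]


add_photo(conn, 1, now=100)
add_photo(conn, 2, now=110)
add_photo(conn, 3, now=120)

search_ts = 130  # stored in the session when the search is first run
first_view = snapshot_page(conn, search_ts, 0, 10)

remove_photo(conn, 2, now=140)  # changes made while the user is browsing
add_photo(conn, 4, now=150)

second_view = snapshot_page(conn, search_ts, 0, 10)  # same snapshot
fresh_view = snapshot_page(conn, 160, 0, 10)         # a new search
print(first_view, second_view, fresh_view)
```

The browsing session keeps seeing the same rows at the same positions, while a search started later sees the current state.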

If I understood your problem correctly, you can use the ROW_NUMBER() function (in SQL Server). To get the desired result, you can use a query similar to this:
select images1.* from
(SELECT ROW_NUMBER() OVER (ORDER BY image_album_id) as rowID,(SELECT COUNT(*) FROM images) AS totCount, * FROM images) images1
JOIN album ON (album_id = images1.image_album_id)
where album_id = '5'
order by images1.image_album_id
limit ....;
Here images1.rowID gives you the position of the row and images1.totCount gives you the total number of rows.
Hope it helps.
Thanks.
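A minimal sketch of the window-function approach, run here against SQLite through Python's sqlite3 module rather than SQL Server (window functions need SQLite 3.25 or later). The sample rows, the created_date ordering, and the numbered_position helper are assumptions for illustration; COUNT(*) OVER () is used for the total so that it counts only the album's rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (id TEXT PRIMARY KEY, image_album_id INTEGER, created_date TEXT)"
)
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?)",
    [
        ("p1", 5, "2010-01-01"),
        ("p2", 5, "2010-01-02"),
        ("sd564", 5, "2010-01-03"),
        ("p4", 5, "2010-01-04"),
        ("q1", 7, "2010-01-01"),
    ],
)


def numbered_position(conn, album_id, image_id):
    """Return (row_id, tot_count) of an image within one album,
    numbering rows newest first, or None if it is not in the album."""
    return conn.execute(
        """
        SELECT row_id, tot_count
        FROM (
            SELECT id,
                   ROW_NUMBER() OVER (ORDER BY created_date DESC) AS row_id,
                   COUNT(*) OVER () AS tot_count
            FROM images
            WHERE image_album_id = ?
        ) AS numbered
        WHERE id = ?
        """,
        (album_id, image_id),
    ).fetchone()


print(numbered_position(conn, 5, "sd564"))
```

The row numbers are assigned in the inner query before the outer filter on id, which is why the single returned row still carries its position within the whole album.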

Related

Get filtered row count using dm_db_partition_stats

I'm using paging in my app but I've noticed that paging has gone very slow and the line below is the culprit:
SELECT COUNT (*) FROM MyTable
On my table, which only has 9 million rows, it takes 43 seconds to return the row count. I read in another article which states that to return the row count for 1.4 billion rows, it takes over 5 minutes. This obviously cannot be used with paging as it is far too slow and the only reason I need the row count is to calculate the number of available pages.
After a bit of research I found out that I get the row count pretty much instantly (and accurately) using the following:
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('MyTable')
AND (index_id=0 or index_id=1)
But the above returns me the count for the entire table which is fine if no filters are applied but how do I handle this if I need to apply filters such as a date range and/or a status?
For example, what is the row count for MyTable when the DateTime field is between 2013-04-05 and 2013-04-06 and status='warning'?
Thanks.
UPDATE-1
In case I wasn't clear, I require the total number of rows available so that I can determine the number of pages required that will match my query when using 'paging' feature. For example, if a page returns 20 records and my total number of records matching my query is 235, I know I'll need to display 12 buttons below my grid.
01 - (row 1 to 20) - 20 rows displayed in grid.
02 - (row 21 to 40) - 20 rows displayed in grid.
...
11 - (row 201 to 220) - 20 rows displayed in grid.
12 - (row 221 to 235) - 15 rows displayed in grid.
There will be additional logic added to handle a large amount of pages but that's a UI issue, so this is out of scope for this topic.
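The page arithmetic described here can be written down as a small sketch. The function names are made up for the example, and pages and rows are assumed to be numbered from 1 as in the list above.

```python
def page_count(total_rows, page_size):
    # Integer ceiling division: a partially filled last page still counts.
    return (total_rows + page_size - 1) // page_size


def page_bounds(page_no, page_size, total_rows):
    # First and last row (1-based, inclusive) shown on a 1-based page.
    first = (page_no - 1) * page_size + 1
    last = min(page_no * page_size, total_rows)
    return first, last


print(page_count(235, 20))
print(page_bounds(12, 20, 235))
```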
My problem with using "Select count(*) from MyTable" is that it was taking 40+ seconds on 9 million records (though it isn't anymore and I need to find out why!), but with this method I was able to add the same filter as my query to determine the page count. For example,
SELECT COUNT(*) FROM [MyTable]
WHERE [DateTime] BETWEEN '2018-04-05' AND '2018-04-06' AND
[Status] = 'Warning'
Once I determine the page count, I would then run the same query but include the fields instead of count(*), the CurrentPageNo and PageSize in order to filter my results by page number using the row ids and navigate to a specific pages if needed.
SELECT RowId, DateTime, Status, Message FROM [MyTable]
WHERE [DateTime] BETWEEN '2018-04-05' AND '2018-04-06' AND
[Status] = 'Warning' AND
RowId BETWEEN (CurrentPageNo * PageSize) AND ((CurrentPageNo + 1) * PageSize)
Now, if I use the other mentioned method to get the row count i.e.
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('MyTable')
AND (index_id=0 or index_id=1)
It returns the count instantly but how do I filter this so that I can include the same filters as if I was using the SELECT COUNT(*) method, so I could end up with something like:
SELECT SUM (row_count)
FROM sys.dm_db_partition_stats
WHERE object_id=OBJECT_ID('MyTable') AND
(index_id=0 or index_id=1) AND
([DateTime] BETWEEN '2018-04-05' AND '2018-04-06') AND
([Status] = 'Warning')
The above clearly won't work as I'm querying dm_db_partition_stats, but I would like to know if I can somehow perform a join or something similar to provide me with the total number of rows instantly, filtered rather than applied to the entire table.
Thanks.
Have you ever asked for directions to Alpha Centauri? No? Well, the answer is: you can't get there from here.
Adding indexes, re-orgs/re-builds, updating stats will only get you so far. You should consider changing your approach.
sp_spaceused will typically return the record count instantly. You may be able to use this; however, depending on what you are using the count for (you've not quite given us enough information), it might not be adequate.
I am not sure if you are trying to use this count as a means to short circuit a larger operation or how you are using the count in your application. When you start to highlight 1.4 billion records and you're looking for a window in said set, it sounds like you might be a candidate for partitioned tables.
This allows you to assign several smaller tables, typically separated by date (years / months), that act as a single table. When you give a date range on 1.4+ billion records, SQL Server can then meet performance expectations. This does depend on the SQL Server edition, but there is also view partitioning as well.
Kimberly Tripp has a blog and some videos out there, and Kendra Little also has some good content on how they are used and how to set them up. This would be a design change. It is a bit complex and not something you would want to implement on a whim.
Here is a link to Kimberly's Blog: https://www.sqlskills.com/blogs/kimberly/sqlskills-sql101-partitioning/
Dev banter:
Also, I hear you blaming SQL, are you using entity framework by chance?

SQL add up rows in a column

I'm running SQL queries in Orion Report Writer for Solarwinds Netflow Traffic Analyzer and am trying to add up data usage for specific conversations coming from the same general sources. In this case it is netflix. I've made some progress with my query.
SELECT TOP 10000 FlowCorrelation_Source_FlowCorrelation.FullHostname AS Full_Hostname_A,
SUM(NetflowConversationSummary.TotalBytes) AS SUM_of_Bytes_Transferred,
SUM(NetflowConversationSummary.TotalBytes) AS Total_Bytes
FROM
((NetflowConversationSummary LEFT OUTER JOIN FlowCorrelation FlowCorrelation_Source_FlowCorrelation ON (NetflowConversationSummary.SourceIPSort = FlowCorrelation_Source_FlowCorrelation.IPAddressSort)) LEFT OUTER JOIN FlowCorrelation FlowCorrelation_Dest_FlowCorrelation ON (NetflowConversationSummary.DestIPSort = FlowCorrelation_Dest_FlowCorrelation.IPAddressSort)) INNER JOIN Nodes ON (NetflowConversationSummary.NodeID = Nodes.NodeID)
WHERE
( DateTime BETWEEN 41539 AND 41570 )
AND
(
(FlowCorrelation_Source_FlowCorrelation.FullHostname LIKE 'ipv4_1.lagg0%')
)
GROUP BY FlowCorrelation_Source_FlowCorrelation.FullHostname, FlowCorrelation_Dest_FlowCorrelation.FullHostname, Nodes.Caption, Nodes.NodeID, FlowCorrelation_Source_FlowCorrelation.IPAddress
So I've got an output that filters everything but netflix sessions (Full_Hostname_A) and their total usage for each session (Sum_Of_Bytes_Transferred)
I want to add up Sum_Of_Bytes_Transferred to get a total usage for all netflix sessions
listed, which will output to Total_Bytes. I created the column Total_Bytes, but don't know how to output a total to it.
For some asked clarification, here is the output from the above query:
I want the Total_Bytes Column to be all added up into one number.
I have no familiarity with the reporting tool you are using.
From reading your post I'm thinking you want the first 2 columns of data that you've got, plus, at a later point in the report, a single figure being the sum of the Total_Bytes column you're already producing.
Your reporting tool probably has some means of totalling a column, but you may need to get the support people for the reporting tool to tell you how to do that.
Aside from this, if you can find a way of calling a separate query in a later section of the report, or if you embed a new report inside your existing report, after the detail section, and use that to run a separate query, then you should be able to get the data you want with this:
SELECT Sum(Total_Bytes) as [Total Total Bytes]
FROM ( yourExistingQuery ) x
yourExistingQuery means the query you've already got, in full (it doesn't have to be put on one line). The parentheses are required, and so is the "x". (The latter provides a syntax-required name for the virtual table which your query defines.)
Hope this helps.
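To illustrate the wrapping technique, here is a small sketch using Python's sqlite3 module instead of the Orion reporting tool. The conversations table, its columns, and the sample rows are invented stand-ins for the much larger NetFlow query; only the outer SELECT Sum(...) FROM ( ... ) x shape is taken from the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE conversations (hostname TEXT, total_bytes INTEGER)")
# Hypothetical sample data.
conn.executemany(
    "INSERT INTO conversations VALUES (?, ?)",
    [
        ("ipv4_1.lagg0.a", 100),
        ("ipv4_1.lagg0.a", 50),
        ("ipv4_1.lagg0.b", 25),
        ("other.host", 999),
    ],
)

# Stand-in for "yourExistingQuery": one summed row per hostname.
existing_query = """
    SELECT hostname, SUM(total_bytes) AS Total_Bytes
    FROM conversations
    WHERE hostname LIKE 'ipv4_1.lagg0%'
    GROUP BY hostname
"""

per_host = conn.execute(existing_query).fetchall()

# Wrap the existing query as a derived table named x and total its column.
grand_total = conn.execute(
    f"SELECT SUM(Total_Bytes) AS Total_Total_Bytes FROM ({existing_query}) x"
).fetchone()[0]

print(per_host, grand_total)
```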

Displaying data in grid view page by page

I have more than 30,000 rows in a table. It takes a lot of time to load all the data in the gridview. So I want to display 100 rows at a time. When I click next page button, another 100 rows should be displayed. When I click previous page button, previous 100 rows should be displayed. If I type page 5 in a text box, then I want to jump over to the 5th lot of rows.
I also want to display how many pages there will be. Can we implement this concept in vb.net [winform] gridview. I am using database PostgreSQL.
Can anybody give me a hint or some concept?
Look at OFFSET and LIMIT in PostgreSQL.
Your query for the 5th page could look like this:
SELECT *
FROM tbl
ORDER BY id
OFFSET 400
LIMIT 100;
id is the primary key in my example, therefore an index is in place automatically.
If you access the table a lot in this fashion, performance may profit from using CLUSTER.
Total number of pages:
SELECT ceil(count(*)::real / 100)::int
FROM tbl;
If you wanted the number rounded down, just simplify to:
SELECT count(*) / 100
FROM tbl;
With both operands being integer types, the result is an integer type and fractional digits are truncated automatically. But I think you need to round up here.
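A rough sketch of the paging logic follows, using Python's sqlite3 module in place of PostgreSQL and VB.NET (the LIMIT and OFFSET clauses work the same way in both databases). The tbl table matches the example above; the payload column, the 1,235 sample rows, and the helper names are assumptions for illustration.

```python
import sqlite3

PAGE_SIZE = 100

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, payload TEXT)")
# Hypothetical sample data: 1235 rows with ids 1..1235.
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 1236)],
)


def total_pages(conn, page_size=PAGE_SIZE):
    # Round up so a partially filled last page is counted.
    (count,) = conn.execute("SELECT count(*) FROM tbl").fetchone()
    return (count + page_size - 1) // page_size


def fetch_page(conn, page_no, page_size=PAGE_SIZE):
    # page_no is 1-based: page 5 skips the first 400 rows.
    offset = (page_no - 1) * page_size
    return conn.execute(
        "SELECT id, payload FROM tbl ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    ).fetchall()


print(total_pages(conn))
print(fetch_page(conn, 5)[0])
```

The "next" and "previous" buttons then only need to change page_no by one and call fetch_page again, and the text box can pass any page number up to total_pages.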

What is an unbounded query?

Is an unbounded query a query without a WHERE param = value statement?
Apologies for the simplicity of this one.
An unbounded query is one where the search criteria are not particularly specific, and which is thus likely to return a very large result set. A query without a WHERE clause would certainly fall into this category, but let's consider for a moment some other possibilities. Let's say we have tables as follows:
CREATE TABLE SALES_DATA
(ID_SALES_DATA NUMBER PRIMARY KEY,
TRANSACTION_DATE DATE NOT NULL,
LOCATION NUMBER NOT NULL,
TOTAL_SALE_AMOUNT NUMBER NOT NULL,
...etc...);
CREATE TABLE LOCATION
(LOCATION NUMBER PRIMARY KEY,
DISTRICT NUMBER NOT NULL,
...etc...);
Suppose that we want to pull in a specific transaction, and we know the ID of the sale:
SELECT * FROM SALES_DATA WHERE ID_SALES_DATA = <whatever>
In this case the query is bounded, and we can guarantee it's going to pull in either one or zero rows.
Another example of a bounded query, but with a large result set would be the one produced when the director of district 23 says "I want to see the total sales for each store in my district for every day last year", which would be something like
SELECT S.LOCATION, TRUNC(S.TRANSACTION_DATE), SUM(S.TOTAL_SALE_AMOUNT)
FROM SALES_DATA S,
LOCATION L
WHERE S.TRANSACTION_DATE >= DATE '2009-01-01' AND
S.TRANSACTION_DATE < DATE '2010-01-01' AND
L.LOCATION = S.LOCATION AND
L.DISTRICT = 23
GROUP BY S.LOCATION,
TRUNC(S.TRANSACTION_DATE)
ORDER BY S.LOCATION,
TRUNC(S.TRANSACTION_DATE)
In this case the query should return 365 (or fewer, if stores are not open every day) rows for each store in district 23. If there's 25 stores in the district it'll return 9125 rows or fewer.
On the other hand, let's say our VP of Sales wants some data. He/she/it isn't quite certain what's wanted, but he/she/it is pretty sure that whatever it is happened in the first six months of the year...not quite sure about which year...and not sure about the location, either - probably in district 23 (he/she/it has had a running feud with the individual who runs district 23 for the past 6 years, ever since that golf tournament where...well, never mind...but if a problem can be hung on the door of district 23's director so be it!)...and of course he/she/it wants all the details, and have it on his/her/its desk toot sweet! And thus we get a query that looks something like
SELECT L.DISTRICT, S.LOCATION, S.TRANSACTION_DATE,
S.something, S.something_else, S.some_more_stuff
FROM SALES_DATA S,
LOCATION L
WHERE EXTRACT(MONTH FROM S.TRANSACTION_DATE) <= 6 AND
L.LOCATION = S.LOCATION
ORDER BY L.DISTRICT,
S.LOCATION
This is an example of an unbounded query. How many rows will it return? Good question - that depends on how business conditions were, how many locations were open, how many days there were in February, etc.
Put more simply, if you can look at a query and have a pretty good idea of how many rows it's going to return (even though that number might be relatively large) the query is bounded. If you can't, it's unbounded.
Share and enjoy.
http://hibernatingrhinos.com/Products/EFProf/learn#UnboundedResultSet
An unbounded result set is where a query is performed and does not explicitly limit the number of returned results from a query. Usually, this means that the application assumes that a query will always return only a few records. That works well in development and in testing, but it is a time bomb waiting to explode in production.
The query may suddenly start returning thousands upon thousands of rows, and in some cases, it may return millions of rows. This leads to more load on the database server, the application server, and the network. In many cases, it can grind the entire system to a halt, usually ending with the application servers crashing with out of memory errors.
Here is one example of a query that will trigger the unbounded result set warning:
var query = from post in blogDataContext.Posts
where post.Category == "Performance"
select post;
If the performance category has many posts, we are going to load all of them, which is probably not what was intended. This can be fixed fairly easily by using pagination by utilizing the Take() method:
var query = (from post in blogDataContext.Posts
where post.Category == "Performance"
select post)
.Take(15);
Now we are assured that we only need to handle a predictable, small result set, and if we need to work with all of them, we can page through the records as needed. Paging is implemented using the Skip() method, which instructs Entity Framework to skip (at the database level) N number of records before taking the next page.
But there is another common occurrence of the unbounded result set problem from directly traversing the object graph, as in the following example:
var post = postRepository.Get(id);
foreach (var comment in post.Comments)
{
// do something interesting with the comment
}
Here, again, we are loading the entire set without regard for how big the result set may be. Entity Framework does not provide a good way of paging through a collection when traversing the object graph. It is recommended that you would issue a separate and explicit query for the contents of the collection, which will allow you to page through that collection without loading too much data into memory.

Paging in SQL with LIMIT/OFFSET sometimes results in duplicates on different pages

I'm developing an online gallery with voting and have a separate table for pictures and votes (for every vote I'm storing the ID of the picture and the ID of the voter). The tables related like this: PICTURE <--(1:n, using VOTE.picture_id)-- VOTE. I would like to query the pictures table and sort the output by votes number. This is what I do:
SELECT
picture.votes_number,
picture.creation_date,
picture.author_id,
picture.author_nickname,
picture.id,
picture.url,
picture.name,
picture.width,
picture.height,
coalesce(anon_1."totalVotes", 0)
FROM picture
LEFT OUTER JOIN
(SELECT
vote.picture_id as pid,
count(*) AS "totalVotes"
FROM vote
WHERE vote.device_id = <this is the query parameter> GROUP BY pid) AS anon_1
ON picture.id = anon_1.pid
ORDER BY picture.votes_number DESC
LIMIT 10
OFFSET 0
OFFSET is different for different pages, of course.
However, there are pictures with the same ID that are displayed on the different pages. I guess the reason is the sorting, but can't construct any better query, which will not allow duplicates. Could anybody give me a hint?
Thanks in advance!
Do you execute one query per page to display? If yes, I suspect that the database doesn't guarantee a consistent order for items with the same number of votes. So the first query may return { item 1, item 2 } and a second query may return { item 2, item 1 } if both items have the same number of votes. If the items are actually items 10 and 11, then the same item may appear on page 1 and then again on page 2.
I had such a problem once. If that's also your case, append an extra column to the ORDER BY to ensure a consistent ordering of items with the same vote number, e.g.:
ORDER BY picture.votes_number DESC, picture.id
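The effect of the tie-breaker can be checked with a small sketch using Python's sqlite3 module. The picture table is reduced to the two columns that matter, and the sample data (25 pictures with heavily tied vote counts) and the fetch_page helper are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE picture (id INTEGER PRIMARY KEY, votes_number INTEGER)")
# Hypothetical data: even ids have 7 votes, odd ids have 3, so ties abound.
conn.executemany(
    "INSERT INTO picture VALUES (?, ?)",
    [(i, 7 if i % 2 == 0 else 3) for i in range(1, 26)],
)


def fetch_page(conn, page_no, page_size=10):
    # The unique id as a second sort key makes the order fully determined,
    # so consecutive pages cannot overlap.
    rows = conn.execute(
        """
        SELECT id FROM picture
        ORDER BY votes_number DESC, id
        LIMIT ? OFFSET ?
        """,
        (page_size, (page_no - 1) * page_size),
    ).fetchall()
    return [r[0] for r in rows]


pages = [fetch_page(conn, n) for n in (1, 2, 3)]
seen = [pid for page in pages for pid in page]
print(len(seen), len(set(seen)))
```

With only votes_number in the ORDER BY, the database would be free to return tied rows in a different order for each query, which is how the same picture can land on two pages.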
The simplest explanation is that some data was added or some votes occurred while you were looking at different pages.
I am sure that if you sorted by ID or creation_date this issue would go away.
I.e. there is no issue with your code.
In my case this problem was due to NULL values in the ORDER BY clause; I solved it by adding another unique ID field to the ORDER BY clause along with the other field.