Question:
I have a website where I gather browser statistics.
Thus, I have an SQL table (T_Visits), with the following columns:
uniqueidentifier Visit_UID,
uniqueidentifier User_UID,
datetime Visit_DateTime,
float Screen_w,
float Screen_h,
float Resolution = Screen_w * Screen_h,
varchar resolutionstring = Screen_w + ' x ' + Screen_h
Since a user can visit the site from several computers, different screen sizes can be recorded for the same user across visits.
Now I want to get the maximum/minimum resolution each user had:
Select User_UID, max(resolution) from T_Visits GROUP BY User_UID
How can I get the corresponding resolution string?
I mean, I could take max(Screen_w) and max(Screen_h), but there's no guarantee that the corresponding resolutionstring would be max(Screen_w) + ' x ' + max(Screen_h).
Try something like:
;WITH resCTE
AS
(
SELECT User_UID
,resolutionstring
,ROW_NUMBER() OVER (PARTITION BY User_UID
ORDER BY Resolution desc
,Screen_w desc
) AS rnMax
,ROW_NUMBER() OVER (PARTITION BY User_UID
ORDER BY Resolution
,Screen_w
) AS rnMin
FROM T_Visits
)
SELECT maxr.User_UID
,maxr.resolutionstring AS maxRes
,minr.resolutionstring AS minRes
FROM resCTE AS maxr
JOIN resCTE AS minr
ON minr.User_UID = maxr.User_UID
AND minr.rnMin = 1
WHERE maxr.rnMax = 1
(untested)
Note that this assumes you only want to see 1 row per user id, regardless of whether more than one HxW gives the same resolution.
It would be possible to modify the query to use RANK() rather than ROW_NUMBER() if this isn't the behaviour you want.
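For instance, a sketch of the RANK() variant (untested; since RANK() gives every tied row the same rank, a user whose maximum or minimum pixel count is shared by several different resolutionstrings will come back with more than one row, one per max/min combination):
;WITH resCTE
AS
(
SELECT User_UID
      ,resolutionstring
      ,RANK() OVER (PARTITION BY User_UID
                        ORDER BY Resolution DESC
                   ) AS rnkMax
      ,RANK() OVER (PARTITION BY User_UID
                        ORDER BY Resolution
                   ) AS rnkMin
FROM T_Visits
)
SELECT maxr.User_UID
      ,maxr.resolutionstring AS maxRes
      ,minr.resolutionstring AS minRes
FROM resCTE AS maxr
JOIN resCTE AS minr
  ON minr.User_UID = maxr.User_UID
 AND minr.rnkMin = 1
WHERE maxr.rnkMax = 1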
EDIT
Amended to show the max/min resolution sub-sorted by screen width
SELECT DISTINCT T_Visits.User_UID, T_Visits.Resolution, T_Visits.resolutionstring
FROM T_Visits
INNER JOIN (SELECT User_UID, max(resolution) AS max
FROM T_Visits
GROUP BY User_UID) temp
ON T_Visits.User_UID = temp.User_UID
AND T_Visits.Resolution = temp.max
This query first builds a derived table (aliased temp) of each user's id and max resolution, then inner joins it with the T_Visits table on the user id and resolution fields, which should give you the corresponding resolutionstring(s).
There are some problems with this kind of query, however. First off, while the DISTINCT takes care of multiple rows being returned for the same resolutionstring, it will still return multiple rows per user if they have multiple monitors with the same resolution but different resolution strings. For example, what if someone visits your site with an iPhone, and you record a hit with 320x480, but then they turn their phone sideways and hit your site again, which now registers 480x320 because the X and Y values are swapped due to orientation? This would produce multiple max-resolution hits with different resolutionstrings.
The same thing can happen for monitors. It is not uncommon for document editors to rotate their monitors for a more "legal paper" style view. However, when they visit your site from their homes, they might not have the same setup, but do have the same resolution.
What exactly would you want your query to return if that is the case?
This should work to get a list of User_UIDs with the string of the maximum resolution. Adapt it to get all the minimum resolutions. Maybe it's not the most efficient way...
Select User_UID, resolutionstring from T_Visits as p
WHERE resolution = (Select max(resolution) from T_Visits where User_UID =p.User_UID)
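A possible adaptation (untested) that returns both the maximum and the minimum resolutionstring per user in one statement, again via correlated subqueries:
Select User_UID, resolutionstring, resolution from T_Visits as p
WHERE resolution = (Select max(resolution) from T_Visits where User_UID = p.User_UID)
   OR resolution = (Select min(resolution) from T_Visits where User_UID = p.User_UID)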
Related
In a project I am working on, there are measurements stored in a database. A measurement consists of a world coordinate (posX, posY, posZ), a station identification number (stationID), and a time of measurement (time).
Sometimes a measurement is redone in the field for different reasons and then there are several measurements with the same coordinate and station id but performed at different times.
Is there a way to write an SQL query such that I get all VALID measurements, i.e., only the latest ones in the case where the coordinates and station id are the same?
I am not very adept at SQL so I don't even really know what to google for, so any pointers are very much appreciated, even if you only know what type of command I should use :)
EDIT:
My task was just changed, apparently station id does not matter, only coordinates and times.
Also, I am using DISQLite3 that implements SQL-92.
Yes, you can do it in SQL.
It seems you want to take the latest entry for each combination of station and co-ordinates - look at GROUP BY or ROW_NUMBER()
Depending on your SQL variant (It's helpful if you specify it), something like...
select *
from
(Select *,
row_number() over (Partition by coordinates, stationid order by measurementtime desc) rn
from yourtable
) v
where rn = 1
Without Ranking functions
select yourtable.*
from yourtable
inner join
(
select coordinate, MAX(time) maxtime from yourtable
group by coordinate
) v
on yourtable.coordinate = v.coordinate
and yourtable.time = v.maxtime
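Since DISQLite3 implements SQL-92, which has no ROW_NUMBER(), the join form above is the one to adapt. A sketch against the column names from the question, ignoring stationID as per the edit (the table name measurements is an assumption):
select m.*
from measurements m
inner join
(
  select posX, posY, posZ, MAX(time) as maxtime
  from measurements
  group by posX, posY, posZ
) v
on  m.posX = v.posX
and m.posY = v.posY
and m.posZ = v.posZ
and m.time = v.maxtime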
Given a table of responses with columns:
Username, LessonNumber, QuestionNumber, Response, Score, Timestamp
How would I run a query that returns which users got a score of 90 or better on their first attempt at every question in their last 5 lessons? "Last 5 lessons" is a limiting condition, rather than a requirement, so if they completed only 1 lesson but got all of their first attempts for each question right, they should be included in the results. We just don't want to look back farther than 5 lessons.
About the data: Users may be on different lessons. Some users may have not yet completed five lessons (may only be on lesson 3 for example). Each lesson has a different number of questions. Users have different lesson paths, so they may skip some lesson numbers or even complete lessons out of sequence.
Since this seems to be a problem of transforming temporally non-uniform/discontinuous values into uniform/contiguous values per-user, I think I can solve the bulk of the problem with a couple ranking function calls. The conditional specification of scoring above 90 for "first attempt at every question in their last 5 lessons" is also tricky, because the number of questions completed is variable per-user.
So far...
As a starting point or hint at what may need to happen, I've transformed Timestamp into an "AttemptNumber" for each question, by using "row_number() over (partition by Username,LessonNumber,QuestionNumber order by Timestamp) as AttemptNumber".
I'm also trying to transform LessonNumber from an absolute value into a contiguous ranked value for individual users. I could use "dense_rank() over (partition by Username order by LessonNumber desc) as LessonRank", but that assumes the order lessons are completed corresponds with the order of LessonNumber, which is unfortunately not always the case. However, let's assume that this is the case, since I do have a way of producing such a number through a couple of joins, so I can use the dense_rank transform described to select the "last 5 completed lessons" (i.e. LessonRank <= 5).
For the >= 90 condition, I think I can transform the score into an integer so that it's "1" if >= 90 and "0" if < 90. I can then introduce a clause like "group by Username having SUM(Score) = COUNT(Score)", which will select only those users whose transformed scores are all equal to 1.
Any solutions or suggestions would be appreciated.
You kind of gave away the solution:
SELECT DISTINCT Username
FROM Results
WHERE Username NOT in (
SELECT DISTINCT Username
FROM (
SELECT
r.Username,r.LessonNumber, r.QuestionNumber, r.Score, r.Timestamp
, row_number() over (partition by r.Username,r.LessonNumber,r.QuestionNumber order by r.Timestamp) as AttemptNumber
, dense_rank() over (partition by r.Username order by r.LessonNumber desc) AS LessonRank
FROM Results r
) as f
WHERE LessonRank <= 5 and AttemptNumber = 1 and Score < 90
)
Concerning the LessonRank, I used exactly what you described, since it is not clear how to order the lessons otherwise: the timestamp of the first attempt of the first question of a lesson? Or the timestamp of the first attempt of any question of a lesson? Or simply the first (or the most recent?) timestamp of any result of any question of a lesson?
The innermost SELECT adds the AttemptNumber and LessonRank columns as you described.
The next SELECT retains only the results which would disqualify a user from the final list: all first attempts with an insufficient score in the last 5 lessons. We end up with a list of users we do not want to display in the final result.
Therefore, in the outermost SELECT, we can select all the users which are not in that exclusion list: basically, all the other users who have answered any question.
EDIT: As so often, second try should be better...
One more EDIT:
Here's a version including your remarks in the comments.
SELECT Username
FROM
(
SELECT Username, CASE WHEN Score >= 90 THEN 1 ELSE 0 END AS QuestionScoredWell
FROM (
SELECT
r.Username,r.LessonNumber, r.QuestionNumber, r.Score, r.Timestamp
, row_number() over (partition by r.Username,r.LessonNumber,r.QuestionNumber order by r.Timestamp) as AttemptNumber
, dense_rank() over (partition by r.Username order by r.LessonNumber desc) AS LessonRank
FROM Results r
) as f
WHERE LessonRank <= 5 and AttemptNumber = 1
) as ff
Group BY Username
HAVING MIN(QuestionScoredWell) = 1
I used a Having clause with a MIN expression on the calculated QuestionScoredWell value.
When comparing the execution plans for both queries, this query is actually faster. Not sure though whether this is partially due to the low number of data rows in my table.
Random suggestions:
1
The conditional specification of scoring above 90 for "first attempt at every question in their last 5 lessons" is also tricky, because the number of questions is variable per-user.
is equivalent to
There exists no first attempt with a score below 90 in the most-recent 5 lessons
which strikes me as a little easier to capture with a NOT EXISTS subquery.
2
First attempt is the same as where timestamp = (select min(timestamp) ... )
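A rough sketch of that NOT EXISTS shape, combining both points (it uses the Results table name from the answer above, and leaves out the last-5-lessons restriction for brevity, which would still need the LessonRank logic):
SELECT DISTINCT r.Username
FROM Results r
WHERE NOT EXISTS (
    SELECT 1
    FROM Results bad
    WHERE bad.Username = r.Username
      AND bad.Score < 90
      -- "first attempt": no earlier timestamp exists for the same question
      AND bad.Timestamp = (SELECT MIN(t.Timestamp)
                           FROM Results t
                           WHERE t.Username = bad.Username
                             AND t.LessonNumber = bad.LessonNumber
                             AND t.QuestionNumber = bad.QuestionNumber)
)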
You need to identify the top 5 lessons per user first, using the timestamp to prioritize lessons, then you can limit by score. Try:
Select t.username
from table t inner join
    (select top 5 username, lessonNumber
     from table
     order by timestamp desc) l
  on t.username = l.username and t.lessonNumber = l.lessonNumber
where t.score >= 90
I have a table of users' profiles. Every user can have many profiles, and the user has the ability to arrange the order in which they are displayed in a grid.
There are 2 tables Users and Profiles (1:M)
I've added an orderby column to the Users table which will hold values like 1, 2, 3...
So far it seems to be okay. But when a user changes the order so that the last record becomes the first, I have to go through all the records and increment their values by 1. This seems pretty ugly to me.
Is there any more convenient solution for this kind of situation?
Leave gaps in the sequence or use a decimal rather than an integer data type.
The best solution is one which mirrors functionality, and that's a simple list of integers. Keeping the list in order is only a few SQL statements, and easier to understand than the other solutions suggested (floats, gapped integers).
If your lists were very large (in the tens of thousands) then performance considerations might come into play, but I assume these lists aren't that long.
How about using floating points for the order by column?
This way, you can always squeeze a profile between two others, without having to change those two values.
E.g., if I want to place profile A between profiles B (ordervalue 1) and C (ordervalue 2), I can assign ordervalue 1.5 to A.
To place it at the top, where the current top has ordervalue say 1, you can use ordervalue 0.5.
There's no reason to have integers for orderby and no reason to have increments of 1 between the order of profiles.
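A minimal sketch of such a move, assuming a Profiles table with a float ordervalue column and application-supplied variables:
-- @prev_value and @next_value are the ordervalues of the two neighbours the
-- profile is dropped between (e.g. 1 and 2); use 0 for @prev_value when
-- moving a profile to the very top.
UPDATE Profiles
SET ordervalue = (@prev_value + @next_value) / 2.0
WHERE profile_id = @profile_id;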
If the data set is small (which seems to be the case), I'd prefer to use a normal list of integers and update them in batch when a profile gets a new position. This better reflects the application functionality.
In Sql Server, for the following table User_Profiles (user_id, profile_id, position), I'd have something like this:
-- The variables are:
-- @user_id      - id of the user
-- @profile_id   - id of the profile to change
-- @new_position - new position that the profile will take
-- @old_position - current position of the profile
select @old_position = position
from User_Profiles where
user_id = @user_id and profile_id = @profile_id
update p set position = pp.new_position
from User_Profiles p join (
    select user_id, profile_id,
        case
            when position = @old_position then @new_position
            when @new_position > @old_position then -- move up
                case
                    when @old_position < position and
                         position <= @new_position
                    then position - 1
                    else position
                end
            when @new_position < @old_position then -- move down
                case
                    when position < @old_position and
                         @new_position <= position
                    then position + 1
                    else position
                end
            else position -- the same
        end as new_position
    from User_Profiles p where user_id = @user_id
) as pp on
p.user_id = pp.user_id and p.profile_id = pp.profile_id
As a user adds profiles, set each new profile's ordering number to the previous one +1000000. e.g. to start off with:
p1 1000000
p2 2000000
p3 3000000
When reordering, set the profile's order to the middle of the two it is going in between:
p1 1000000
p2 2000000
p3 1500000
This gives the order p1,p3,p2
I think that instead of keeping the order in the orderby column, you could introduce a linked-list concept into your design: add a column like nextId that contains the id of the next profile in the chain.
When you query the profiles table, you can then sort the profiles out in your code (Java, C#, etc.).
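A rough sketch of that idea (T-SQL flavoured; the Profiles table, the next_id column and the @moved / @new_prev variables are all assumptions):
-- next_id points at the profile displayed after this one; NULL marks the end.
-- Moving profile @moved so that it is displayed right after profile @new_prev
-- only touches three pointers:

-- 1) unlink @moved: its old predecessor now points at @moved's old successor
UPDATE Profiles
SET next_id = (SELECT next_id FROM Profiles WHERE profile_id = @moved)
WHERE next_id = @moved;

-- 2) @moved takes over @new_prev's old successor
UPDATE Profiles
SET next_id = (SELECT next_id FROM Profiles WHERE profile_id = @new_prev)
WHERE profile_id = @moved;

-- 3) @new_prev now points at @moved
UPDATE Profiles
SET next_id = @moved
WHERE profile_id = @new_prev;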
I think the idea of leaving gaps between the orders is interesting, but I don't know if it is a "more convenient" solution for your problem.
I think you would be better off just updating your orderby column, because you still have to determine which rows a profile has moved between, and what to do if two profiles are switched in position (do you calculate the new orderby value for the first one and then the second one?). And what happens if the gap in between isn't large enough?
It shouldn't be that data intensive to just enumerate down the order the user put them in and update each record to that order.
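For example, if the client posts back the new ordering, the whole update can be as plain as one statement per profile (a sketch; @user_id, @profile_id and @new_position come from the application):
-- run once per profile, in the order the user arranged them
UPDATE User_Profiles
SET position = @new_position
WHERE user_id = @user_id
  AND profile_id = @profile_id;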
I am trying to wrap my head around this one this morning.
I am trying to show inventory status for parts (for our products) and this query only becomes complex if I try to return all parts.
Let me lay it out:
single table inventoryReport
I have a distinct list of X parts I wish to display, the result of which must be X # of rows (1 row per part showing latest inventory entry).
table is made up of dated entries of inventory changes (so I only need the LATEST date entry per part).
all data contained in this single table, so no joins necessary.
Currently for 1 single part, it is fairly simple and I can accomplish this by doing the following sql (to give you some idea):
SELECT TOP (1) ldDate, ptProdLine, inPart, inSite, inAbc, ptUm, inQtyOh + inQtyNonet AS in_qty_oh, inQtyAvail, inQtyNonet, ldCustConsignQty, inSuppConsignQty
FROM inventoryReport
WHERE (ldPart = 'ABC123')
ORDER BY ldDate DESC
That gets me my TOP 1 row, so it's simple per part; however, I need to show all X parts (let's say 30). So I need 30 rows, each with that result. Of course the simple solution would be to loop X SQL calls in my code and that would suffice, but it would be costly, so I would love to work this SQL some more to reduce the X calls back to the db down to just one query.
From what I can see here I need to keep track of the latest date per item somehow while looking for my result set.
I would ultimately do a
WHERE ldPart in ('ABC123', 'BFD21', 'AA123', etc)
to limit the parts I need. Hopefully I made my question clear enough. Let me know if you have an idea. I cannot do a DISTINCT as the rows are not the same, the date needs to be the latest, and I need a maximum of X rows.
Thoughts? I'm stuck...
SELECT *
FROM (SELECT i.*,
ROW_NUMBER() OVER(PARTITION BY ldPart ORDER BY ldDate DESC) r
FROM inventoryReport i
WHERE ldPart in ('ABC123', 'BFD21', 'AA123', etc)
      ) x
WHERE r = 1
EDIT: Be sure to test the performance of each solution. As pointed out in this question, the CTE method may outperform using ROW_NUMBER.
;with cteMaxDate as (
select ldPart, max(ldDate) as MaxDate
from inventoryReport
group by ldPart
)
SELECT md.MaxDate, ir.ptProdLine, ir.inPart, ir.inSite, ir.inAbc, ir.ptUm, ir.inQtyOh + ir.inQtyNonet AS in_qty_oh, ir.inQtyAvail, ir.inQtyNonet, ir.ldCustConsignQty, ir.inSuppConsignQty
FROM cteMaxDate md
INNER JOIN inventoryReport ir
on md.ldPart = ir.ldPart
and md.MaxDate = ir.ldDate
You need to join against a sub-query:
SELECT i.ldPart, x.LastDate, i.inAbc
FROM inventoryReport i
INNER JOIN (Select ldPart, Max(ldDate) As LastDate FROM inventoryReport GROUP BY ldPart) x
on i.ldPart = x.ldPart and i.ldDate = x.LastDate
Basically, I have albums, each of which has 50 images in it. Now if I show a list of images, I know which rows are being shown (showing: 20 to 30 of 50), meaning 10 rows from 20 to 30. The problem is, I want to select an image but still show the position it was selected at, so I can move back and forth while keeping the position.
For example, if I select the 5th image, whose id is 'sd564', I want to show (6 of 50 images), meaning you are seeing the 6th of 50 images. If I then get the next row's id and show that, I want to show (7 of 50 images).
Well, I can do all this easily with a pagination pointer, like saying in the URL (after=5, after=6)... it moves with the position. But what if I don't have this (after=6) and just have an id, how can I still do that?
I also don't want to use (after=6) because it's a dynamic site and images get added and deleted, so positions change; if the link is shared with someone else, or someone comes back to the same old link, the position would be wrong.
What kind of SQL query should I be running for this?
Currently I have
select * from images where id = 'sd564';
Obviously I need to add a limit or something else to the query to get what I want, or maybe run another query to get the result while keeping this old query in place too. Anyway, I just want positioning. I hope you can help me solve this.
Example: http://media.photobucket.com/image/color%20splash/aly3265/converse.jpg
sample http://img41.imageshack.us/img41/5631/viewing3of8240.png
Album Query Request (check post below)
select images.* from images, album
where album_id = '5'
and album_id = image_album_id
order by created_date DESC
limit ....;
Assuming created_date is unique per album_id and (album_id,created_date) is unique for all rows in images, then this:
select i1.*, count(*) as position
from images i1
inner join images i2
on i1.album_id = i2.album_id -- get all other pics in this album
and i1.created_date >= i2.created_date -- in case they were created before this pic
where i1.album_id = 5
group by i1.created_date
will reliably get you the images and their position. Please understand that this will only work reliably if (album_id, created_date) is unique throughout the images table. If that is not the case, the position won't be reliable, and you might not see all photos due to the GROUP BY. Also note that a GROUP BY clause like this, listing only some of the columns that appear in the SELECT list (in this case images.*), is not valid in most RDBMSes. For a detailed discussion on that matter, see: http://dev.mysql.com/tech-resources/articles/debunking-group-by-myths.html
By doing this:
select i1.*, count(*) as position
from images i1
inner join images i2
on i1.album_id = i2.album_id -- get all other pics in this album
and i1.created_date >= i2.created_date -- in case they were created before this pic
where i1.album_id = 5
group by i1.created_date
having count(*) = 4
you select the image at the 4th position (note the having count(*) = 4)
By doing this:
select i1.*, count(*) as position
from images i1
inner join images i2
on i1.album_id = i2.album_id -- get all other pics in this album
and i1.created_date >= i2.created_date -- in case they were created before this pic
where i1.album_id = 5
group by i1.created_date
having count(*) between 1 and 10
you select all photos with positions 1 through 10 (note the having clause again.)
Of course, if you just want one particular image, you can simply do:
select i1.*, count(*) as position
from images i1
inner join images i2
on i1.album_id = i2.album_id -- get all other pics in this album
and i1.created_date >= i2.created_date -- in case they were created before this pic
where i1.image_id = 's1234'
group by i1.created_date
This will correctly report the position of the image within the album (of course, assuming that image_id is unique with in the images table). You don't need the having clause in that case since you already pinpointed the image you want.
From what you are saying here:
I don't want to use (after=6) also because it's a dynamic site and images get added and deleted, so positions change; sharing the link with someone else and going back to the same old link would then give the wrong position.
I get the impression that this is not a SQL problem at all. The problem is that the positions of the photos are local to the search resultset. To reliably navigate by position, you would need to make a snapshot (no pun intended) of some kind. That is, you need some way to "freeze" the dataset while it is being browsed.
A simple way to do it, would be to execute the search, and cache the result outside of the actual current datastore. For example, you could use "scratch tables" in your database, simply store it in temporary files, or in some memory caching layer if you have the mem for it. With this model, you'd let the user browse the resultset from the cache, and you would need to clean out the cache when the user's session ends (or after some timeout, you don't want to kill your server because some users don't log out)
Another way to do it is to simply allow yourself to lie now and then. Let's say you have result pages of 10 images, and a typical search delivers 50 pages of results. Well, you could simply send a resultset for a fixed number of items, say 100 photos (so 10 pages), to the client. These search results would then be your snapshot, and contain references to the actual pictures. If you are storing the URLs in the database, and not the binary data, this reference is simply the URL. Or you could store the database id there. Either way, the user is allowed to browse the initial resultset, and chances are that they never browse the entire set. If they do, you re-execute the query on the server side for the next chunk of pages. If many photos were added in the meantime that would have ended up at positions 1..100, then the user will see stale data: that's the price they pay for having so much time on their hands that they can allow themselves to browse 10 pages of 10 photos.
(of course, you should tweak the parameters to your liking but you get the idea I'm sure.)
If you don't want to 'lie' and it is really important that people can reliably browse all the results they searched, you could extend your database schema to support snapshots at that level. Assuming that there are only two operations for photos, namely "add" and "delete", you would have a TIMESTAMP_ADDED and a TIMESTAMP_REMOVED column in your photo table. On add, you do the INSERT in your db and fill TIMESTAMP_ADDED with the current timestamp. TIMESTAMP_REMOVED would be filled with the theoretical maximum value for whatever data type you use to store the timestamp (for this particular case I would probably go for an INT column and simply store the UNIX_TIMESTAMP). On delete, you don't DELETE the row from the db; rather, you mark it as deleted by updating the TIMESTAMP_REMOVED column, setting it to the current timestamp. Now when you have to do a search, you use a query like:
SELECT *
FROM photo
WHERE timestamp_added < timestamp_of_initial_search
AND timestamp_removed > timestamp_of_initial_search
AND ...various search criteria...
ORDER BY ...something
LIMIT ...page offset and num items in page...
The timestamp_of_initial_search is the timestamp of executing the initial search for a particular set of criteria. You should store it in the application session while the user is browsing a particular search resultset, so you can use it in the subsequent queries required for fetching the pages. The first two WHERE criteria implement the snapshot. The condition timestamp_added < timestamp_of_initial_search ensures we only see photos that were added before the initial search was executed. The condition timestamp_removed > timestamp_of_initial_search ensures we only search photos that had not already been removed by the time the initial search was executed.
Of course, you still have to do something with the photos that were marked for deletion. You could schedule periodic physical deletion of all photos whose timestamp_removed is smaller than the timestamp of every currently active search resultset.
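As a sketch of the add/remove side under that scheme (the photo_id and url columns and the @now, @photo_id, @url and @oldest_active_search variables are assumptions; @now holds the current UNIX timestamp):
-- adding a photo: timestamp_removed starts at the INT maximum, i.e. "not removed yet"
INSERT INTO photo (photo_id, url, timestamp_added, timestamp_removed)
VALUES (@photo_id, @url, @now, 2147483647);

-- "deleting" a photo: mark it removed instead of physically deleting the row
UPDATE photo
SET timestamp_removed = @now
WHERE photo_id = @photo_id;

-- periodic cleanup: physically delete photos removed before the oldest
-- still-active search snapshot
DELETE FROM photo
WHERE timestamp_removed < @oldest_active_search;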
If I understood your problem correctly, you can use the ROW_NUMBER() function (in SQL Server). To get the desired result, you can use a query similar to this:
select images1.* from
(SELECT ROW_NUMBER() OVER (ORDER BY image_album_id) as rowID,(SELECT COUNT(*) FROM images) AS totCount, * FROM images) images1
JOIN album ON (album_id = images1.image_album_id)
where album_id = '5'
order by images1.image_album_id
limit ....;
Here images1.rowID gives you the position of the row and images1.totCount gives you the total number of rows.
Hope it helps.
Thanks.