How do you SELECT items based on COUNT(*) from another table? - SQLite - sql

My tables are like so:
Tile
---------------------------------
Id integer not null primary key
TestSet
--------------------------------
Id integer not null primary key,
TileId integer not null,
CreatedAt datetime not null,
Outcome boolean
There is a 1:n relationship between Tile and TestSet.
I want to get the tiles that have a testSet with a true Outcome value on the most recent testSet (ordered by the CreatedAt column). My first attempt at it doesn't work as I want it to.
SELECT * FROM Tile
JOIN (SELECT TileId FROM (SELECT * FROM TestSet
WHERE tileId == 'Tile1'
ORDER BY __createdAt DESC
LIMIT 1)
WHERE Outcome=1) as ts ON ts.TileId == Tile.id;
The problem with the above statement is that the WHERE clause in the innermost SELECT statement is hardcoded.
Here's how I broke up the process:
Grab the most recent testSet.
SELECT only the tileId column from that testSet.
JOIN the tile table on the above statements to get a list of all the tiles.
I feel like I'm thinking about this the wrong way and I shouldn't really be doing a JOIN. But I don't have enough SQL experience to know exactly how to go about this problem. I'm using SQLite specifically in a mobile app, so are there any SQLite statements that can help me get the correct set of tiles?

Here's an alternative method. Since tileid is already in TestSet, this example uses only TestSet, but feel free to JOIN it with the Tile table.
select *
from testset t1
where createdat in
(
-- search the same table for the tileid and also ensure outcome is 1
-- sort it by createdat from latest to oldest and choose only the first
-- record
select createdat
from testset t2
where
t2.tileid = t1.tileid
and outcome = 1
order by t2.createdat desc
limit 1
);
Example: http://sqlfiddle.com/#!7/91548/3
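Note that with outcome = 1 inside the subquery, the query above returns each tile's latest passing test set, which is not necessarily that tile's most recent test set. Below is a minimal sketch of the stricter reading of the question (the most recent test set itself must have a true Outcome), using Python's built-in sqlite3 module. The sample rows and the variable name passing_tile_ids are assumptions made up for illustration.

```python
import sqlite3

# In-memory database with the schema from the question and made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Tile (Id INTEGER NOT NULL PRIMARY KEY);
CREATE TABLE TestSet (
    Id INTEGER NOT NULL PRIMARY KEY,
    TileId INTEGER NOT NULL,
    CreatedAt DATETIME NOT NULL,
    Outcome BOOLEAN
);
INSERT INTO Tile (Id) VALUES (1), (2), (3);
INSERT INTO TestSet (TileId, CreatedAt, Outcome) VALUES
    (1, '2015-01-01 10:00:00', 0),
    (1, '2015-01-02 10:00:00', 1),  -- tile 1: latest test set passed
    (2, '2015-01-01 10:00:00', 1),
    (2, '2015-01-03 10:00:00', 0),  -- tile 2: latest test set failed
    (3, '2015-01-01 10:00:00', 1);  -- tile 3: only test set passed
""")

# The correlated subquery finds the latest CreatedAt per tile;
# the Outcome check is applied to that latest row, not inside the subquery.
query = """
SELECT t.Id
FROM Tile AS t
JOIN TestSet AS ts ON ts.TileId = t.Id
WHERE ts.Outcome = 1
  AND ts.CreatedAt = (SELECT MAX(ts2.CreatedAt)
                      FROM TestSet AS ts2
                      WHERE ts2.TileId = t.Id)
ORDER BY t.Id
"""
passing_tile_ids = [row[0] for row in conn.execute(query)]
print(passing_tile_ids)
```

Tile 2 is excluded here even though it has an older passing test set, because its most recent one failed.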

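SELECT Tile.Id FROM Tile
JOIN (SELECT TileId, Outcome, MAX(CreatedAt)
-- in SQLite the bare Outcome column is taken from the row holding MAX(CreatedAt),
-- so group by TileId only; grouping by Outcome as well would match any past success
FROM TestSet
GROUP BY TileId
) AS ts
ON ts.TileId = Tile.Id
WHERE ts.Outcome = 1;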

Related

SQL WHERE Clause to get rows depending on the database content

Use case:
I have the customer_id and the task_id.
The database will always contain rows with a filled customer_id and an empty task_id.
Sometimes it will also contain rows with the task_id filled (as in the example below).
Example 1
SELECT *
FROM table
WHERE customer_id = 11422412
AND task_id = 28870055
Here I expect to return the last two rows.
Example 2
SELECT *
FROM table
WHERE customer_id = 11432515
AND task_id = 22256884
Here I expect to return the only empty row.
Question:
How do I create a SQL Query to make sure that, in case the task_id exists in the database, I only return the records with task_id?
You could do something like the following with LIMIT. This will match the empty task_id and the set task_id (if it exists), order them so that the row with the non-empty task_id comes first (if it exists), then return only the first one. (NULLS LAST is the default behavior for ascending sorts in PostgreSQL.)
SELECT *
FROM table
WHERE customer_id = 11432515
AND (task_id = 22256884 OR task_id IS NULL)
ORDER BY task_id
LIMIT 1
I am assuming that you always want exactly one row like in your examples.
But there are other ways of doing it depending on your specific scenario (if your final query is more complicated than your examples).
Edited to add another way to handle case where more than one row matches customer_id and task_id:
SELECT *
FROM table t1
WHERE customer_id = 11432515
AND (task_id = 22256884
OR (
task_id is null
AND NOT EXISTS (SELECT * FROM table t2 WHERE t2.customer_id = 11432515 AND t2.task_id = 22256884)
)
)
This doesn't look super elegant, but it should work and you could use it as a starting point at least.
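To make the fallback logic concrete, here is a minimal runnable sketch of the NOT EXISTS variant using Python's built-in sqlite3 module. The table name records, the id column, the sample rows, and the helper fetch_rows are assumptions for illustration (the question's table is just called "table", which is a reserved word).

```python
import sqlite3

# Hypothetical table standing in for the question's "table".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (
    id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    task_id INTEGER
);
INSERT INTO records (customer_id, task_id) VALUES
    (11422412, NULL),
    (11422412, 28870055),
    (11422412, 28870055),
    (11432515, NULL);
""")

# Return rows matching the task_id; only if none exist for this customer,
# fall back to the rows with an empty task_id.
QUERY = """
SELECT id, customer_id, task_id
FROM records
WHERE customer_id = :customer_id
  AND (task_id = :task_id
       OR (task_id IS NULL
           AND NOT EXISTS (SELECT 1
                           FROM records AS r2
                           WHERE r2.customer_id = :customer_id
                             AND r2.task_id = :task_id)))
ORDER BY id
"""

def fetch_rows(customer_id, task_id):
    params = {"customer_id": customer_id, "task_id": task_id}
    return conn.execute(QUERY, params).fetchall()

with_task = fetch_rows(11422412, 28870055)  # task exists: only the task rows
fallback = fetch_rows(11432515, 22256884)   # task missing: the empty row
print(with_task)
print(fallback)
```

Passing the ids as named parameters keeps the customer_id and task_id values in one place even though the query mentions each of them twice.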

How to return sample row from database one by one

Web page should show one product image for specific product category from PostgreSql database.
This image should change automatically to another image every 25 seconds.
The returned product may be random or follow some sequence. Some products may be missed and some repeated, but most of the products matching the criteria should be returned.
The total available image count may change slightly between sample retrievals.
Currently the code below is used; it is executed every 25 seconds.
This requires two queries to the database: one for the count, which may be slow, and a second for single image retrieval. In both queries the WHERE clauses are duplicated; in the real application the WHERE clause is very big, and changing it requires changes in two places.
How can this be improved so that a single query returns the sample?
Column types cannot be changed; natural primary keys are used. Additional columns, triggers, indexes, or sequences can be added if this helps.
ASP.NET/Mono MVC3 , npgsql are used.
$count = select count(*)
from products
where prodtype=$sometype and productid in (select productid from images);
$random = next random integer between 0 .. $count-1;
-- $productsample is result: desired sample product
$productsample = select product
from products
where prodtype=$sometype and productid in (select productid from images)
offset $random
limit 1;
create table products ( productid char(20) primary key,
prodtype char(10) references producttype
);
create table images(
id serial primary key,
productid char(20) references products,
mainimage bool
);
An ORDER BY will always be expensive, especially if the expression in the ORDER BY is not indexed. So don't order. Instead, use a random offset into the count as in your queries, but do it all at once.
with t as (
select *
from
products p
inner join
images i using (productid)
where
prodtype = $sometype
)
select *
from t
offset floor(random() * (select count(*) from t))
limit 1
This version might be faster
with t as (
select *, count(*) over() total
from
products p
inner join
images i using (productid)
where
prodtype = $sometype
)
select *
from t
offset floor(random() * (select total from t limit 1))
limit 1
PostgreSQL:
SELECT column FROM table
ORDER BY RANDOM()
LIMIT 1
This gives you one, random row. You can of course add back in your WHERE filter to make sure it is the right category.
This removes your requirement to do a count first; and also has the advantage of letting the database engine do the selection, reducing round trips.
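As a rough illustration of the single-query approach with the category filter added back in, here is a sketch using Python's built-in sqlite3 module rather than PostgreSQL (SQLite also provides random()). The sample rows and the helper name sample_product are assumptions for illustration.

```python
import sqlite3

# Simplified versions of the question's tables, with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (productid TEXT PRIMARY KEY, prodtype TEXT);
CREATE TABLE images (
    id INTEGER PRIMARY KEY,
    productid TEXT REFERENCES products,
    mainimage BOOLEAN
);
INSERT INTO products VALUES
    ('P1', 'shoes'), ('P2', 'shoes'), ('P3', 'hats'), ('P4', 'shoes');
INSERT INTO images (productid, mainimage) VALUES
    ('P1', 1), ('P2', 1), ('P3', 1);  -- P4 has no image
""")

def sample_product(prodtype):
    """Return one random product id of the given type that has an image."""
    row = conn.execute(
        """
        SELECT productid
        FROM products
        WHERE prodtype = ?
          AND productid IN (SELECT productid FROM images)
        ORDER BY random()
        LIMIT 1
        """,
        (prodtype,),
    ).fetchone()
    return row[0] if row else None

# Repeated calls only ever return eligible products.
samples = {sample_product("shoes") for _ in range(50)}
print(samples)
```

The WHERE clause appears only once, so a change to the criteria is made in one place; the trade-off is that the database sorts all matching rows on every call.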
Note: For people looking at ways to do this in other SQL engines: http://www.petefreitag.com/item/466.cfm

How to get one common value from Database using UNION

The 2 records in the above image are from the DB. The table's unique constraint is on (SID, LINE_ITEM_ID): together, the SID and LINE_ITEM_ID columns identify a unique record.
My issue:
I am looking for a query that fetches a record from the DB depending on these conditions when I search for PART_NUMBER = 'PAU43-IMB-P6':
1. It should fetch one record, no matter which SID the item belongs to, if there is only one record under either SID = 1 or SID = 2.
2. It should fetch only the record under SID = 2 if there are 2 items, one under SID = 1 and the other under SID = 2.
In other words, I am looking for a query that searches for a given part_number across both SID 1 and 2, returns the value under SID = 2, and returns the value under SID = 1 only if there are no records under SID = 2 (the query has to withstand searching a million records).
Thank you
Select *
from Table
where SID||LINE_ITEM_ID = (
select Max(SID)||Max(LINE_ITEM_ID)
from table
where PART_NUMBER = 'PAU43-IMB-P6'
);
If I understand correctly, for each considered LINE_ITEM_ID you want to return only the one with the largest value for SID. This is a common requirement and, as with most things in SQL, can be written in many different ways; the best performing will depend on many factors, not least of which is the SQL product you are using.
Here's one possible approach:
SELECT DISTINCT * -- use a column list
FROM YourTable AS T1
INNER JOIN (
SELECT T2.LINE_ITEM_ID,
MAX(T2.SID) AS max_SID
FROM YourTable AS T2
GROUP
BY T2.LINE_ITEM_ID
) AS DT1 (LINE_ITEM_ID, max_SID)
ON T1.LINE_ITEM_ID = DT1.LINE_ITEM_ID
AND T1.SID = DT1.max_SID;
That said, I don't recall seeing one that relies on the UNION relational operator. You could easily rewrite the above using the INTERSECT relational operator but it would be more verbose.
Well, in my case it worked something like this:
select LINE_ITEM_ID,SID,price_1,part_number from (
(select LINE_ITEM_ID,SID,price_1,part_number from Table where SID = 2)
UNION
(select LINE_ITEM_ID,SID,price_1,part_number from Table where SID = 1 and line_item_id NOT IN (select LINE_ITEM_ID from Table where SID = 2))) t
This query solved my issue.
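The "prefer SID = 2, fall back to SID = 1" idea behind this UNION can be tried out with a small sketch in Python's built-in sqlite3 module. The table name items, the sample rows, the derived-table alias preferred, and the helper lookup are assumptions for illustration.

```python
import sqlite3

# Hypothetical table with the columns mentioned in the thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (
    SID INTEGER NOT NULL,
    LINE_ITEM_ID INTEGER NOT NULL,
    PART_NUMBER TEXT,
    PRICE_1 REAL,
    PRIMARY KEY (SID, LINE_ITEM_ID)
);
INSERT INTO items VALUES
    (1, 10, 'PAU43-IMB-P6', 5.0),
    (2, 10, 'PAU43-IMB-P6', 7.0),  -- same line item exists under SID 2
    (1, 11, 'OTHER-PART', 3.0);    -- exists under SID 1 only
""")

# First branch: every SID = 2 row.
# Second branch: SID = 1 rows whose line item has no SID = 2 counterpart.
QUERY = """
SELECT LINE_ITEM_ID, SID, PRICE_1, PART_NUMBER FROM (
    SELECT LINE_ITEM_ID, SID, PRICE_1, PART_NUMBER
    FROM items WHERE SID = 2
    UNION
    SELECT LINE_ITEM_ID, SID, PRICE_1, PART_NUMBER
    FROM items
    WHERE SID = 1
      AND LINE_ITEM_ID NOT IN (SELECT LINE_ITEM_ID FROM items WHERE SID = 2)
) AS preferred
WHERE PART_NUMBER = ?
"""

def lookup(part_number):
    return conn.execute(QUERY, (part_number,)).fetchall()

both_sids = lookup("PAU43-IMB-P6")  # SID 2 wins
sid1_only = lookup("OTHER-PART")    # falls back to SID 1
print(both_sids, sid1_only)
```

Note that the NOT IN subquery returns a single column; a multi-column subquery cannot be compared against line_item_id alone.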

How can I query rankings for the users in my DB, but only consider the latest entry for each user?

Let's say I have a database table called "Scrape", possibly set up like:
UserID (int)
UserName (varchar)
Wins (int)
Losses (int)
ScrapeDate (datetime)
I'm trying to be able to rank my users based on their Wins/Loss ratio. However, each week I'll be scraping for new data on the users and making another entry in the Scrape table.
How can I query a list of users sorted by wins/losses, but only taking into consideration the most recent entry (ScrapeDate)?
Also, do you think it matters that people will be hitting the site and the scrape may possibly be in the middle of completing?
For example I could have:
1 - Bob - Wins: 320 - Losses: 110 - ScrapeDate: 7/8/09
1 - Bob - Wins: 360 - Losses: 122 - ScrapeDate: 7/17/09
2 - Frank - Wins: 115 - Losses: 20 - ScrapeDate: 7/8/09
Where, this represents a scrape that has only updated Bob so far, and is in the process of updating Frank but has yet to be inserted. How would you handle this situation as well?
So, my question is:
How would you handle querying only the most recent scrape of each user to determine the rankings
Do you think the fact that the database may be in a state of updating (especially if a scrape could take up to 1 day to complete), and not all users have completely updated yet matters? If so, how would you handle this?
Thank you, and thank you for your responses you have given me on my related question:
When scraping a lot of stats from a webpage, how often should I insert the collected results in my DB?
This is what I call the "greatest-n-per-group" problem. It comes up several times per week on StackOverflow.
I solve this type of problem using an outer join technique:
SELECT s1.*, s1.wins / s1.losses AS win_loss_ratio
FROM Scrape s1
LEFT OUTER JOIN Scrape s2
ON (s1.username = s2.username AND s1.ScrapeDate < s2.ScrapeDate)
WHERE s2.username IS NULL
ORDER BY win_loss_ratio DESC;
This will return only one row for each username -- the row with the greatest value in the ScrapeDate column. That's what the outer join is for, to try to match s1 with some other row s2 with the same username and a greater date. If there is no such row, the outer join returns NULL for all columns of s2, and then we know s1 corresponds to the row with the greatest date for that given username.
This should also work when you have a partially-completed scrape in progress.
This technique isn't necessarily as speedy as the CTE and RANKING solutions other answers have given. You should try both and see what works better for you. The reason I prefer my solution is that it works in any flavor of SQL.
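Here is a small runnable sketch of the outer join technique against the sample data from the question, using Python's built-in sqlite3 module. The ISO date strings, the join on UserID, and the 1.0 * factor (to avoid integer division in engines that truncate) are assumptions added for illustration.

```python
import sqlite3

# Sample rows from the question, with dates rewritten as ISO strings
# so that string comparison matches chronological order.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Scrape (
    UserID INTEGER,
    UserName TEXT,
    Wins INTEGER,
    Losses INTEGER,
    ScrapeDate TEXT
);
INSERT INTO Scrape VALUES
    (1, 'Bob',   320, 110, '2009-07-08'),
    (1, 'Bob',   360, 122, '2009-07-17'),
    (2, 'Frank', 115,  20, '2009-07-08');
""")

# s2 matches any later row for the same user; rows of s1 that find
# no such match (s2.UserID IS NULL) are the latest entry per user.
rows = conn.execute("""
SELECT s1.UserName, s1.Wins, s1.Losses, s1.ScrapeDate
FROM Scrape AS s1
LEFT OUTER JOIN Scrape AS s2
  ON s1.UserID = s2.UserID AND s1.ScrapeDate < s2.ScrapeDate
WHERE s2.UserID IS NULL
ORDER BY 1.0 * s1.Wins / s1.Losses DESC
""").fetchall()
print(rows)
```

Frank still appears with his older row because no newer row exists for him yet, which is how the query copes with a partially completed scrape. A user with zero losses would need separate handling, since dividing by zero yields NULL in SQLite and an error in some other engines.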
Try something like:
Select user id and max date of last entry for each user.
Select and order records to get ranking based on above query results.
This should work, although performance depends on your database size.
DECLARE
@last_entries TABLE(id int, dte datetime)
-- insert date (dte) of last entry for each user (id)
INSERT INTO
@last_entries (id, dte)
SELECT
UserID,
MAX(ScrapeDate)
FROM
Scrape WITH (NOLOCK)
GROUP BY
UserID
-- select ranking
SELECT
-- optionally you can use the RANK() OVER() function to get the rank value
UserName,
Wins,
Losses
FROM
@last_entries
JOIN
Scrape WITH (NOLOCK)
ON
UserID = id
AND ScrapeDate = dte
ORDER BY
Wins,
Losses
I haven't tested this code, so it may not run on the first try.
The answer to part one of your question depends on the version of SQL server you are using - SQL 2005+ offers ranking functions which make this kind of query a bit simpler than in SQL 2000 and before. I'll update this with more detail if you will indicate which platform you're using.
I suspect the clearest way to handle part 2 is to display the stats for the latest complete scraping exercise, otherwise you aren't showing a time-consistent ranking (although, if your data collection exercise takes 24 hours, there's a certain amount of latitude already).
To simplify this, you could create a table to hold metadata about each scrape operation, giving each one an id, start date and completion date (at a minimum), and display those records which relate to the latest complete scrape. To make this easier, you could remove the "scrape date" from the data collection table, and replace it with a foreign key linking each data row to a row in the scrape table.
EDIT
The following code illustrates how to rank users by their latest score, regardless of whether they are time-consistent:
create table #scrape
(userName varchar(20)
,wins int
,losses int
,scrapeDate datetime
)
INSERT #scrape
select 'Alice',100,200,'20090101'
union select 'Alice',120,210,'20090201'
union select 'Bob' ,200,200,'20090101'
union select 'Clara',300,100,'20090101'
union select 'Clara',300,210,'20090201'
union select 'Dave' ,100,10 ,'20090101'
;with latestScrapeCTE
AS
(
SELECT *
,ROW_NUMBER() OVER (PARTITION BY userName
ORDER BY scrapeDate desc
) AS rn
,wins + losses AS totalPlayed
,wins - losses as winDiff
from #scrape
)
SELECT userName
,wins
,losses
,scrapeDate
,winDiff
,totalPlayed
,RANK() OVER (ORDER BY winDiff desc
,totalPlayed desc
) as rankPos
FROM latestScrapeCTE
WHERE rn = 1
ORDER BY rankPos
EDIT 2
An illustration of the use of a metadata table to select the latest complete scrape:
create table #scrape_run
(runID int identity
,startDate datetime
,completedDate datetime
)
create table #scrape
(userName varchar(20)
,wins int
,losses int
,scrapeRunID int
)
INSERT #scrape_run
select '20090101', '20090102'
union select '20090201', null --null completion date indicates that the scrape is not complete
INSERT #scrape
select 'Alice',100,200,1
union select 'Alice',120,210,2
union select 'Bob' ,200,200,1
union select 'Clara',300,100,1
union select 'Clara',300,210,2
union select 'Dave' ,100,10 ,1
;with latestScrapeCTE
AS
(
SELECT TOP 1 runID
,startDate
FROM #scrape_run
WHERE completedDate IS NOT NULL
ORDER BY startDate DESC --without an ORDER BY, TOP 1 returns an arbitrary completed run
)
SELECT userName
,wins
,losses
,startDate AS scrapeDate
,wins - losses AS winDiff
,wins + losses AS totalPlayed
,RANK() OVER (ORDER BY (wins - losses) desc
,(wins + losses) desc
) as rankPos
FROM #scrape
JOIN latestScrapeCTE
ON runID = scrapeRunID
ORDER BY rankPos

Fetch two next and two previous entries in a single SQL query

I want to display an image gallery, and on the view page, one should be able to have a look at a bunch of thumbnails: the current picture, wrapped with the two previous entries and the two next ones.
The problem with fetching the two next/previous entries is that I can't (unless I'm mistaken) select something like MAX(id) WHERE id < xx.
Any idea?
Note: of course the ids are not consecutive, since the rows are the result of multiple WHERE conditions.
Thanks
Marshall
You'll have to forgive the SQL Server style variable names, I don't remember how MySQL does variable naming.
(SELECT *
FROM photos
WHERE photo_id = @current_photo_id)
UNION ALL
(SELECT *
FROM photos
WHERE photo_id > @current_photo_id
ORDER BY photo_id ASC
LIMIT 2)
UNION ALL
(SELECT *
FROM photos
WHERE photo_id < @current_photo_id
ORDER BY photo_id DESC
LIMIT 2);
This query assumes that you might have non-contiguous IDs. It could become problematic in the long run, though, if you have a lot of photos in your table, since LIMIT is often evaluated after the entire result set has been retrieved from the database. YMMV.
In a high load scenario, I would probably use these queries, but I would also prematerialize them on a regular basis so that each photo had a PreviousPhotoOne, PreviousPhotoTwo, etc column. It's a bit more maintenance, but it works well when you have a lot of static data and need performance.
If your IDs are contiguous you could do
where id >= @id-2 and id <= @id+2
Otherwise I think you'd have to union 3 queries, one to get the record with the given id and two others messing about with top and order by like this
select *
from table
where id = @id
union
select *
from (select top 2 *
from table
where id < @id
order by id desc) as prev_rows
union
select *
from (select top 2 *
from table
where id > @id
order by id) as next_rows
Performance will not be too bad as you aren't retrieving massive sets of data but it won't be great due to using a union.
If you find performance starts being a problem you could add columns to hold the ids of the previous and next items; calculating the ids using a trigger or overnight process or something. This will mean you only do the hard query once rather than each time you need it.
I think this method should work fine for non-contiguous IDs and should be more efficient than using UNIONs. currentID would be set either as a constant in the SQL or passed in from your program.
SELECT * FROM photos WHERE ID = currentID OR ID IN (
SELECT ID FROM photos WHERE ID < currentID ORDER BY ID DESC LIMIT 2
) OR ID IN (
SELECT ID FROM photos WHERE ID > currentID ORDER BY ID ASC LIMIT 2
) ORDER BY ID ASC
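A quick way to check this query is a sketch with Python's built-in sqlite3 module, where LIMIT inside an IN subquery is accepted; MySQL has historically rejected that construct, so treat this as an illustration of the logic rather than something verified on MySQL. The photos rows and the helper name neighbours are assumptions for illustration.

```python
import sqlite3

# Photos with deliberately non-contiguous ids.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE photos (ID INTEGER PRIMARY KEY);
INSERT INTO photos (ID) VALUES (1), (3), (4), (7), (9), (12), (15);
""")

# Current photo, plus the two closest ids below it and the two closest above it.
QUERY = """
SELECT ID FROM photos
WHERE ID = :current
   OR ID IN (SELECT ID FROM photos WHERE ID < :current ORDER BY ID DESC LIMIT 2)
   OR ID IN (SELECT ID FROM photos WHERE ID > :current ORDER BY ID ASC LIMIT 2)
ORDER BY ID ASC
"""

def neighbours(current_id):
    rows = conn.execute(QUERY, {"current": current_id}).fetchall()
    return [row[0] for row in rows]

middle = neighbours(7)   # two on each side
first = neighbours(1)    # nothing before the first photo
last = neighbours(15)    # nothing after the last photo
print(middle, first, last)
```

At either end of the gallery the query simply returns fewer rows, so the calling code does not need a special case for the first or last photo.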
If you are just interested in the previous and next records by id, couldn't you just restrict the rows to id = xx-2, xx-1, xx, xx+1, xx+2, either with multiple conditions in the WHERE clause or with WHERE id IN (...)?