SQLite design question - sql

Similar to a feed reader, I'm storing a bunch of articles, each pertaining to a source (feed) and each feed can belong to a category. What I'm trying to do is:
1. Retrieve the articles of the feeds that belong to a certain category.
2. Group the articles. One scenario would be by date(published_time), so that I have groups, for example: (12.04.09 - 3 articles, 17.04.09 - 9 articles, and so on)
3. Loop through each group and display each article. Pseudo-code:
foreach (Group group in results)
{
print(group.Name);
foreach (Article article in group.Articles)
{
print(article.Title);
print(article.Content);
}
}
I thought something simple like:
SELECT group_concat(item_id, '#') FROM items GROUP BY date(published_time)
would work. But then I'd have to split the resulting rows and loop through that (and there is no group_concat(*) function)
I'm confused as to how I would group(2) the results so that I can iterate through each one, preserving the group name. I thought that a SQL query returns ONE big table, and so, it seems to be impossible to accomplish this with just one query.
I reckon this is more of a DB design question. I'm also new to SQLite (and to SQL, for that matter), so I ask you, gurus: how would one get this done efficiently?

SELECT Title, Content, date(published_time) AS Date
FROM items
ORDER BY date(published_time);
Pseudocode:
last = None
for r in results:
    if last is None or r.Date != last.Date:
        print("Group", r.Date)
    print(r.Title, r.Content)
    last = r
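The same idea can be sketched as a small runnable Python script using the standard sqlite3 module and itertools.groupby. The feeds table, the feed_id and category_id columns, and the sample rows are assumptions made for illustration; the question only mentions items, item_id and published_time.

```python
import sqlite3
from itertools import groupby

def articles_by_day(conn, category_id):
    """Return [(day, [(title, content), ...]), ...] for one category."""
    rows = conn.execute(
        """
        SELECT date(i.published_time) AS day, i.title, i.content
        FROM items AS i
        JOIN feeds AS f ON f.feed_id = i.feed_id   -- hypothetical schema
        WHERE f.category_id = ?
        ORDER BY day, i.item_id
        """,
        (category_id,),
    ).fetchall()
    # groupby only merges adjacent rows, which is why the ORDER BY matters.
    return [(day, [(title, content) for _, title, content in grp])
            for day, grp in groupby(rows, key=lambda r: r[0])]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE feeds (feed_id INTEGER PRIMARY KEY, category_id INTEGER);
    CREATE TABLE items (item_id INTEGER PRIMARY KEY, feed_id INTEGER,
                        title TEXT, content TEXT, published_time TEXT);
    INSERT INTO feeds VALUES (1, 10), (2, 20);
    INSERT INTO items VALUES
        (1, 1, 'A', 'first',  '2009-04-12 08:00:00'),
        (2, 1, 'B', 'second', '2009-04-12 19:30:00'),
        (3, 1, 'C', 'third',  '2009-04-17 10:00:00'),
        (4, 2, 'D', 'other',  '2009-04-17 11:00:00');
""")

groups = articles_by_day(conn, 10)
for day, articles in groups:
    print(day, "-", len(articles), "articles")
    for title, content in articles:
        print("  ", title, content)
```

One ordered query is enough here: the grouping happens in the client while iterating, so there is no need for group_concat or for one query per group.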

Related

how to count relationships from initial query

Hi I would like to make a query from a query (if that makes any sense)
My original solution is
PROFILE MATCH (q:Question)-[:TAGGED]-> (:Tag {name:"python"})
CALL { WITH q
MATCH (q:Question)-[:TAGGED]-> (t:Tag)
WITH q, count(t) as c
RETURN c}
RETURN max(c)
The aim is to find all questions q that have a TAGGED relationship to the python tag. For the q nodes found this way, the second objective is to count how many TAGGED relationships each one has, and the goal is the maximum of those counts. The problem is that this is not optimized enough, as I am trying to limit the db hits. Another idea was the following:
MATCH (:Tag {name: 'python'}) <-[:TAGGED]- (q:Question)-[:TAGGED]->(t: Tag)
WITH q, count(t) + 1 AS c
RETURN max(c)
In the first case, I tried to find first the questions that had at least the tag python and then pipeline the questions to count the number of relationships the filtered questions had but this seemed to be worse compared to the second query.
In the second query, PROFILE shows a problem at the expand stage (q)-[anon_2:TAGGED]->(t), where I incur too many db hits.
I'm confused as to how my first query doesn't work as well as the second.
I would try the following query:
MATCH (q:Question)
WHERE (q)-[:TAGGED]-> (:Tag {name:"python"})
WITH q,size((q)-[:TAGGED]->()) AS count
RETURN max(count)
Let's try this one:
PROFILE MATCH (q:Question)-[:TAGGED]-> (t:Tag)
WITH q, collect(t) AS tags
WHERE ANY(tag IN tags WHERE tag.name = 'python')
WITH q, size(tags) AS tagSize
RETURN max(tagSize)

Sql statement with multi ANDs querying the same column

I don't know if the title of the post is appropriate. I have the following table
and an array in PHP with some items, parsed_array. What I want to do is find all the SupermarketIDs which have all the items of parsed_array.
For example, if parsed_array contains [111,121,131] I want the result to be 21 which is the ID of the Supermarket that contains all these items.
I tried to do it like that:
$this->db->select('SupermarketID');
$this->db->from('productinsupermarket');
for ($i=0; $i<sizeof($parsed_array); $i++)
{
$this->db->where('ItemID', $parsed_array[$i]);
}
$query = $this->db->get();
return $query->result_array();
If there is only one item in the parsed_array the result is correct because the above is equal to
SELECT SupermarketID
FROM productinsupermarket
WHERE ItemID=parsed_array[0];
but if there is more than one item, let's say two, it is equal to
SELECT SupermarketID
FROM productinsupermarket
WHERE ItemID=parsed_array[0]
AND ItemID=parsed_array[1];
which of course returns an empty table, since no single row can have two different ItemID values. Any idea how this can be solved?
There are at least two ways of generating the result you want, either a self join (no fun to generate with a dynamic number of items) or using IN, GROUP BY and HAVING.
I can't really tell you how to generate it using CodeIgniter, I assume you're better at that than I am :)
SELECT SupermarketID
FROM productinsupermarket
WHERE ItemID IN (111,121,131) -- The 3 item id's you're looking for
GROUP BY SupermarketID
HAVING COUNT(ItemId) = 3; -- All 3 must match
An SQLfiddle to test with.
EDIT: As @ypercube mentions below, if the ItemId can show up more than once for a SupermarketID, you'll want to use COUNT(DISTINCT ItemId) to count only unique rows instead of counting every occurrence.
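The IN / GROUP BY / HAVING approach can be tried out with a short Python sketch against an in-memory SQLite database. The table and column names follow the question; the helper name supermarkets_with_all and the sample rows are made up for illustration, and COUNT(DISTINCT ...) is used so that duplicate rows do not inflate the count.

```python
import sqlite3

def supermarkets_with_all(conn, item_ids):
    """Return IDs of supermarkets that stock every item in item_ids."""
    wanted = sorted(set(item_ids))
    if not wanted:
        return []
    # One "?" per item keeps the query parameterized for any list length.
    placeholders = ",".join("?" * len(wanted))
    sql = f"""
        SELECT SupermarketID
        FROM productinsupermarket
        WHERE ItemID IN ({placeholders})
        GROUP BY SupermarketID
        HAVING COUNT(DISTINCT ItemID) = ?
        ORDER BY SupermarketID
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE productinsupermarket (SupermarketID INTEGER, ItemID INTEGER)"
)
conn.executemany(
    "INSERT INTO productinsupermarket VALUES (?, ?)",
    [(21, 111), (21, 121), (21, 131), (21, 111),  # 111 listed twice on purpose
     (22, 111), (22, 121),
     (23, 131)],
)

result = supermarkets_with_all(conn, [111, 121, 131])
print(result)
```

The number compared in HAVING is derived from the list itself, so the same function works for any number of items.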
You can use where_in in codeigniter as below,
if(count($parsed_array) > 0)
{
$this->db->where_in('ItemID', $parsed_array);
}
Active record class in codeigniter
Try an IN clause or multiple ORs:
SELECT SupermarketID
FROM productinsupermarket
WHERE ItemID=parsed_array[0]
OR ItemID=parsed_array[1];

Paging in SQL with LIMIT/OFFSET sometimes results in duplicates on different pages

I'm developing an online gallery with voting and have separate tables for pictures and votes (for every vote I'm storing the ID of the picture and the ID of the voter). The tables are related like this: PICTURE <--(1:n, using VOTE.picture_id)-- VOTE. I would like to query the pictures table and sort the output by number of votes. This is what I do:
SELECT
picture.votes_number,
picture.creation_date,
picture.author_id,
picture.author_nickname,
picture.id,
picture.url,
picture.name,
picture.width,
picture.height,
coalesce(anon_1."totalVotes", 0)
FROM picture
LEFT OUTER JOIN
(SELECT
vote.picture_id as pid,
count(*) AS "totalVotes"
FROM vote
WHERE vote.device_id = <this is the query parameter> GROUP BY pid) AS anon_1
ON picture.id = anon_1.pid
ORDER BY picture.votes_number DESC
LIMIT 10
OFFSET 0
OFFSET is different for different pages, of course.
However, there are pictures with the same ID that are displayed on the different pages. I guess the reason is the sorting, but can't construct any better query, which will not allow duplicates. Could anybody give me a hint?
Thanks in advance!
Do you execute one query per page to display? If yes, I suspect that the database doesn't guarantee a consistent order for items with the same number of votes. So the first query may return { item 1, item 2 } and a second query may return { item 2, item 1 } if both items have the same number of votes. If the items are actually items 10 and 11, then the same item may appear on page 1 and then again on page 2.
I had such a problem once. If that's also your case, append an extra clause to the order by to ensure a consistent ordering of items with same vote number, e.g.:
ORDER BY picture.votes_number DESC, picture.id
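The tie-breaker idea can be checked with a small Python sketch against an in-memory SQLite database. Only the id and votes_number columns are taken from the question; the page size, the helper name fetch_page and the sample data are assumptions for illustration.

```python
import sqlite3

PAGE_SIZE = 10

def fetch_page(conn, page):
    """Return the picture IDs on a zero-based page, ordered deterministically."""
    rows = conn.execute(
        """
        SELECT id
        FROM picture
        ORDER BY votes_number DESC, id ASC  -- id breaks ties between equal vote counts
        LIMIT ? OFFSET ?
        """,
        (PAGE_SIZE, page * PAGE_SIZE),
    )
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE picture (id INTEGER PRIMARY KEY, votes_number INTEGER)")
# 25 pictures, many of them sharing the same vote count.
conn.executemany("INSERT INTO picture VALUES (?, ?)",
                 [(i, i % 3) for i in range(1, 26)])

pages = [fetch_page(conn, p) for p in range(3)]
seen = [pid for page in pages for pid in page]
print(pages)
```

Because id is unique, the combined sort key is unique too, so every row has exactly one position in the overall order and cannot show up on two pages as long as the data does not change between requests.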
The simplest explanation is that some data was added or some votes occurred while you were looking at different pages.
I am sure that if you sorted by ID or creation_date, this issue would go away.
I.e. there is no issue with your code.
In my case this problem was due to NULL values in the ORDER BY clause. I solved it by adding another unique ID field to the ORDER BY clause along with the other field.

NHibernate How to Select distinct objects based on specific property using HQL?

How can HQL be used to select specific objects that meet certain criteria?
We've tried the following to generate a list of top ten subscribed RSS feeds (where SubscriptionCount is a derived property):
var topTen = UoW.Session.CreateQuery( @"SELECT distinct rss
FROM RssFeedSubscription rss
group by rss.FeedUrl
order by rss.SubscriptionCount DESC
")
.SetMaxResults(10)
.List<RssFeedSubscription>();
Where the intention is only to select the two unique feed URLs in the database, rather than the ten rows in the database instantiated as objects. The result of the above is:
Column 'RssSubscriptions.Id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
ORDER BY items must appear in the select list if SELECT DISTINCT is specified.
It's possible just to thin out the results so that we take out the two unique feed URLs after we get the data back from the database, but there must be a way to do this at the DB level using HQL?
EDIT: We realise it's possible to do a Scalar query and then manually pull out values, but is there not a way of simply specifying a match criteria for objects pulled back?
If you change your HQL a bit to look like that:
var topTen = UoW.Session.CreateQuery( @"SELECT distinct rss.FeedUrl
FROM RssFeedSubscription rss
group by rss.FeedUrl
order by rss.SubscriptionCount DESC
")
.SetMaxResults(10)
.List();
the topTen variable will be an object[] with 2 elements, namely the 2 feed URLs.
You can have this returned as a strongly typed collection if you use the SetResultTransformer() method of the IQuery interface.
You need to perform a scalar query. Here is an example from the NHibernate docs:
IEnumerable results = sess.Enumerable(
"select cat.Color, min(cat.Birthdate), count(cat) from Cat cat " +
"group by cat.Color"
);
foreach ( object[] row in results )
{
Color type = (Color) row[0];
DateTime oldest = (DateTime) row[1];
int count = (int) row[2];
.....
}
It's the group by rss.FeedUrl that's causing you the problem. It doesn't look like you need it since you're selecting the entities themselves. Remove that and I think you'll be good.
EDIT - My apologies, I didn't notice the part about the "derived property". By that I assume you mean it's not a Hibernate-mapped property and thus doesn't actually have a column in the table? That would explain the second error message you received in your query. You may need to remove the "order by" clause as well and do your sorting in application code if that's the case.

Limiting MySQL results within query

I'm looking to see if I can get the results I need with a single query, and my MySQL skills are still in their adolescence over here.
I have 4 tables: shows, artists, venues and tours. A simplified version of my main query right now looks like this:
SELECT *
FROM artists AS a,
venues AS v,
shows AS s
LEFT JOIN tours AS t ON s.show_tour_id = t.tour_id
WHERE s.show_artist_id = a.artist_id
AND s.show_venue_id = v.venue_id
ORDER BY a.artist_name ASC, s.show_date ASC;
What I want to add is a limit on how many shows are returned per artist. I know I could SELECT * FROM artists, and then run a query with a simple LIMIT clause for each returned row, but I figure there must be a more efficient way.
UPDATE: to put this more simply, I want to select up to 5 shows for each artist. I know I could do this (stripping away all irrelevancies):
<?php
$artists = $db->query("SELECT * FROM artists");
foreach($artists as $artist) {
$db->query("SELECT * FROM shows WHERE show_artist_id = $artist->artist_id LIMIT 5");
}
?>
But it seems wrong to be putting another query within a foreach loop. I'm looking for a way to achieve this within one result set.
This is the kind of thing stored procedures are for.
Select a list of artists, then loop through that list, adding 5 or fewer shows for each artist to a temp table.
Then, return the temp table.
As a plan-B, if you can't figure the proper SQL statement to use you can read the whole thing into a memory construct (array, class, etc) and loop it that way. If the data is sufficiently small and memory available sufficiently large this would let you do only one query. Not elegant, but may work for you.
Well I hesitate to suggest this because it certainly won't be computationally efficient (see the stored procedures answer for that...) but it will all be in one query like you wanted. I'm also taking some liberties and assuming that you want the 5 most recent shows...hopefully you can modify to your actual requirements.
SELECT *
FROM artists AS a,
venues AS v,
shows AS s
LEFT JOIN tours AS t ON s.show_tour_id = t.tour_id
WHERE s.show_artist_id = a.artist_id
AND s.show_venue_id = v.venue_id
AND s.show_id IN
(SELECT subS.show_id FROM shows subS
WHERE subS.show_artist_id = s.show_artist_id
ORDER BY subS.show_date DESC
LIMIT 5)
ORDER BY a.artist_name ASC, s.show_date ASC;
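As far as I know, MySQL rejects a LIMIT inside an IN (...) subquery (error 1235), so the query above may not run there as written. Where window functions are available (MySQL 8.0+, SQLite 3.25+), an alternative is to number each artist's shows with ROW_NUMBER() and keep only the first five. The sketch below uses Python's sqlite3 module so that it is runnable. Column names follow the question, the sample data is invented, venues and tours are left out for brevity, and "5 most recent" is the same assumption the answer above makes.

```python
import sqlite3

def latest_shows_per_artist(conn, per_artist=5):
    """Return (artist_name, show_date) rows, at most per_artist per artist."""
    return conn.execute(
        """
        SELECT artist_name, show_date
        FROM (
            SELECT a.artist_name, s.show_date,
                   ROW_NUMBER() OVER (
                       PARTITION BY s.show_artist_id
                       ORDER BY s.show_date DESC, s.show_id DESC
                   ) AS rn
            FROM shows AS s
            JOIN artists AS a ON a.artist_id = s.show_artist_id
        ) AS ranked
        WHERE rn <= ?
        ORDER BY artist_name ASC, show_date ASC
        """,
        (per_artist,),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artists (artist_id INTEGER PRIMARY KEY, artist_name TEXT);
    CREATE TABLE shows (show_id INTEGER PRIMARY KEY, show_artist_id INTEGER,
                        show_date TEXT);
    INSERT INTO artists VALUES (1, 'Alpha'), (2, 'Beta');
""")
# Alpha has 7 shows, Beta has only 2.
conn.executemany(
    "INSERT INTO shows (show_artist_id, show_date) VALUES (?, ?)",
    [(1, f"2009-01-{d:02d}") for d in range(1, 8)]
    + [(2, "2009-02-01"), (2, "2009-02-02")],
)

rows = latest_shows_per_artist(conn)
for name, date in rows:
    print(name, date)
```

The inner query ranks shows within each artist and the outer query filters on that rank, so everything comes back in one result set without a query inside a loop.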