I seem to have reached a mental block on this and hope someone can give me a kick in the right direction.
I have a web application similar to a newsreader client. It's written in Python and uses SQLAlchemy, but that's not important here as I'm trying to get my head around the SQL. I'm using SQLite as a backend.
There is a Users table and an Articles table. The Users table is obvious enough, and the Articles table stores individual articles (like posts on a news server). I track which user has read which article through a many-to-many relationship, using another table, Users_Articles.
The (cut down) schema is something like this:
Users:
user_id
user_name
Articles:
article_id
article_body
Users_Articles:
user_id
article_id
What I'm trying to do is SELECT a list of articles and also display which articles have already been read by the current user. So I'd like to add a boolean column to the SELECT statement that indicates whether there is a row in Users_Articles referring to the article for the current user.
You can go with a left outer join:
select
a.article_id, a.article_body,
ua.article_id as been_read --will be not null for read articles
from Articles a
left outer join Users_Articles ua
on (ua.article_id = a.article_id and ua.user_id = $current_user_id)
or with subselect
select
a.article_id, a.article_body,
(select 1 from Users_Articles ua
where ua.article_id = a.article_id
and ua.user_id = $current_user_id) as been_read --will be not null for read articles
from Articles a
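A minimal sketch of the left outer join variant, run against an in-memory SQLite database from Python. The sample rows and the helper name articles_with_read_flag are made up for illustration. It assumes you want a real true/false value rather than a nullable column, so it wraps the joined column in IS NOT NULL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Users (user_id INTEGER PRIMARY KEY, user_name TEXT);
    CREATE TABLE Articles (article_id INTEGER PRIMARY KEY, article_body TEXT);
    CREATE TABLE Users_Articles (
        user_id INTEGER,
        article_id INTEGER,
        PRIMARY KEY (user_id, article_id)
    );
    -- Illustrative sample data (not from the question)
    INSERT INTO Users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO Articles VALUES (10, 'first'), (11, 'second'), (12, 'third');
    INSERT INTO Users_Articles VALUES (1, 10), (1, 12), (2, 11);
""")


def articles_with_read_flag(conn, current_user_id):
    """Return (article_id, article_body, been_read) for every article."""
    rows = conn.execute(
        """
        SELECT a.article_id, a.article_body,
               ua.article_id IS NOT NULL AS been_read
        FROM Articles a
        LEFT OUTER JOIN Users_Articles ua
          ON ua.article_id = a.article_id AND ua.user_id = ?
        ORDER BY a.article_id
        """,
        (current_user_id,),
    ).fetchall()
    # SQLite has no boolean type; the flag comes back as 0 or 1.
    return [(aid, body, bool(flag)) for aid, body, flag in rows]


result = articles_with_read_flag(conn, 1)
```

Passing the user id as a bound parameter (the ? placeholder) stands in for $current_user_id in the queries above.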
I am new to Google BigQuery and am trying to understand the best practices here.
I have a (.NET) component that implements some article-reader behavior.
I have two tables.
one is articles and the other is user action.
Articles is a general table containing thousands of possible articles to read.
User actions simply register when a user reads an article.
I have about 200,000 users in my system.
At a certain time, I need to prepare a bucket of possible articles for each user by taking 1000 articles from the articles table and omitting the ones he has already read.
As I have over 100,000 users to build a bucket for, I am looking for the best possible way to do this:
Possible solution:
a. query for all articles,
b. query for all users actions.
c. creating the user bucket in code: a long operation to omit the ones he has already read.
That means I perform about (users count) + 1 queries in BigQuery, but I have to perform a large search in my code.
Is there a smart join I can do here? I am unsure how it would work.
It should leave the searching work to BigQuery and also use fewer query calls than the number of users.
Any help on 2 will be appreciated.
Thank you.
I would do something like this to populate a single table for all readers in one call:
Select User,Article
from
(
Select User,Article,
Row_Number() Over (Partition by User) as NBR -- to extract only 1000 per users
From
(
((Select User From
UserAction
Group Each by User) -- Unique Users table
Cross Join
Articles) as A -- A contains a list of users with all available articles
Left Join Each
(Select User,Article
From UserAction
where activity="read"
Group Each By User,Article
) as B --Using left join to add all available articles and..
On A.User=B.User
and A.Article=B.Article
where B.User Is Null --..filter out already read
)
)
where NBR<=1000 -- filter top 1000 per user
If you want to generate a query per user and you can add the user to the query, I'd go for something simpler such as:
Select top 1000 Article
from Articles
where Article not in
(Select Article from UserAction where User = "your user here" )
Hope this helps
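As a rough illustration of the simpler per-user query, here is a sketch that runs the same NOT IN filter against an in-memory SQLite database from Python. SQLite is only a stand-in here, not BigQuery: it uses LIMIT instead of TOP, and the sample rows, the activity filter and the helper name unread_bucket are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Articles (Article INTEGER PRIMARY KEY);
    CREATE TABLE UserAction ("User" TEXT, Article INTEGER, activity TEXT);
    -- Illustrative sample data
    INSERT INTO Articles VALUES (1), (2), (3), (4), (5);
    INSERT INTO UserAction VALUES
        ('u1', 1, 'read'), ('u1', 3, 'read'), ('u2', 2, 'read');
""")


def unread_bucket(conn, user, bucket_size=1000):
    """Return up to bucket_size article ids the given user has not read."""
    rows = conn.execute(
        """
        SELECT Article
        FROM Articles
        WHERE Article NOT IN (
            -- NOT IN matches nothing if this subquery yields a NULL,
            -- so UserAction.Article is assumed to be non-null.
            SELECT Article FROM UserAction
            WHERE "User" = ? AND activity = 'read'
        )
        ORDER BY Article
        LIMIT ?
        """,
        (user, bucket_size),
    ).fetchall()
    return [row[0] for row in rows]


bucket_u1 = unread_bucket(conn, "u1", bucket_size=2)
```

This still costs one query per user, which is the trade-off against the single cross join query.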
I've been looking for a way to improve this dangerous combination of functions in one SQL statement...
To give you some context, I have a table with information about articles (article_id, author, ...) and another one containing the article_id with one tag_id. As an article can have several tags, that second table can have 2 rows with the same article_id and different tag_id.
In order to get a list of the 8 articles that have the most tags in common with the one that I want (in this case 1354), I have written the following query:
SELECT articles.article_id, articles.author, count(articles_tags.article_id) as times
FROM articles
INNER JOIN articles_tags ON (articles.article_id=articles_tags.article_id)
WHERE id_tag IN
(SELECT article_id FROM articles_tags WHERE article_id=1354)
AND article_id <> 1354
GROUP BY article_id
ORDER BY times DESC
LIMIT 8
It is EXTREMELY slow... like 90 seconds for half a million articles.
If I delete the "order by times" clause it works almost instantly, but then I won't get the most similar articles.
What can I do?
Thanks!!
A query on a sub-select is often a time-killer... Also, as the query didn't really appear to be accurate, or was missing something, I am assuming that your articles_tags table has two columns: one for the actual article ID, and another for the tag ID associated with it.
That said, I would pre-query just the tag IDs for article 1354 (the one you are interested in). Join that back to the article tags table on the tag IDs being the same. From that, you grab the SECOND articles_tags alias and get ITS article ID, and then the count that MATCH (via a join and not a left join). Apply the group by on the article ID as you had, and for grins, join to the articles table to get the author.
Now, note. Some SQL engines require you to group by all non-aggregate fields, so you MAY have to either add the author to the group by (which will always be the same per article ID anyway), or change it to MAX( A.author ) as Author which would give the same results.
I would have an index on the (tag_id, article_id) so the tags are found from the "common" tags you are looking to find in common. You could have one article with 10 tags, and another article with 10 completely different tags resulting in 0 in common. This will prevent the other article from even appearing in the result set.
You STILL have the time associated with blowing through half-million articles as you described, which could be millions of actual tag entries.
select
AT2.article_id,
A.Author,
count(*) as Times
from
( select ATG.id_tag
from articles_tags ATG
where ATG.Article_ID = 1354
order by ATG.id_tag ) CommonTags
JOIN articles_tags AT2
on CommonTags.ID_Tag = AT2.ID_Tag
AND AT2.Article_ID <> 1354
JOIN articles A
on AT2.Article_ID = A.Article_ID
group by
AT2.article_id
order by
Times DESC
limit 8
It seems that it should be possible to do this without any subqueries, and then a quicker query may result.
Here the article of interest is joined to its tags, and then further to other articles having these tags. Then the number of tags for each article is counted and ordered:
SELECT a2.article_id, a2.author, COUNT(t2.tag_id) AS times
FROM articles a1
INNER JOIN articles_tags t1
ON t1.article_id = a1.article_id -- find tags for starting article
INNER JOIN articles_tags t2
ON t2.tag_id = t1.tag_id -- find other instances of those tags
AND t2.article_id <> t1.article_id
INNER JOIN articles a2
ON a2.article_id = t2.article_id -- and the articles where they are used
WHERE a1.article_id = 1354
GROUP BY a2.article_id, a2.author -- count common tags by articles
ORDER BY times DESC
LIMIT 8
If you know a lower bound on the number of tags in common (e.g. 3), inserting HAVING times > 2 before ORDER BY times DESC could give a further speed improvement.
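To make the self-join idea concrete, here is a small sketch against an in-memory SQLite database from Python. It assumes the two-column articles_tags(article_id, tag_id) layout and the (tag_id, article_id) index discussed above. The sample rows, the helper name most_similar and its min_common parameter are invented for illustration, and timings on a toy table say nothing about the half-million-row case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (article_id INTEGER PRIMARY KEY, author TEXT);
    CREATE TABLE articles_tags (article_id INTEGER, tag_id INTEGER);
    CREATE INDEX idx_tag_article ON articles_tags (tag_id, article_id);
    -- Illustrative sample data
    INSERT INTO articles VALUES (1354, 'ann'), (1, 'bob'), (2, 'cy'), (3, 'di');
    INSERT INTO articles_tags VALUES
        (1354, 1), (1354, 2), (1354, 3),
        (1, 1), (1, 2), (1, 3),
        (2, 2), (2, 9),
        (3, 8);
""")


def most_similar(conn, article_id, limit=8, min_common=1):
    """Articles sharing the most tags with article_id, best match first."""
    return conn.execute(
        """
        SELECT t2.article_id, a.author, COUNT(*) AS times
        FROM articles_tags t1
        JOIN articles_tags t2
          ON t2.tag_id = t1.tag_id            -- same tag ...
         AND t2.article_id <> t1.article_id   -- ... on a different article
        JOIN articles a ON a.article_id = t2.article_id
        WHERE t1.article_id = ?
        GROUP BY t2.article_id, a.author
        HAVING COUNT(*) >= ?                  -- optional lower bound
        ORDER BY times DESC, t2.article_id
        LIMIT ?
        """,
        (article_id, min_common, limit),
    ).fetchall()


similar = most_similar(conn, 1354)
```

Article 3 shares no tag with 1354, so the inner join drops it before any counting happens.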
The query I am trying to perform is
With getusers As
(Select userID from userprofspecinst_v where institutionID IN
(select institutionID, professionID from userprofspecinst_v where userID=#UserID)
and professionID IN
(select institutionID, professionID from userprofspecinst_v where userID=#UserID))
select username from user where userID IN (select userID from getusers)
Here's what I'm trying to do. Given a userID and a view which contains the userID and the ID of their institution and profession, I want to get the list of other userIDs who also have the same institutionID and professionID. Then with that list of userIDs I want to get the usernames that correspond to each userID from another table (user). The error I am getting when I try to create the procedure is, "Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.". Am I taking the correct approach to how I should build this query?
The following query should do what you want to do:
SELECT u.username
FROM user AS u
INNER JOIN userprofspecinst_v AS up ON u.userID = up.userID
INNER JOIN (SELECT institutionID, professionID FROM userprofspecinst_v
WHERE userID = #userID) AS ProInsts
ON (up.institutionID = ProInsts.institutionID
AND up.professionID = ProInsts.professionID)
Effectively the crucial part is the last INNER JOIN statement: this creates a table of the institution ids and profession ids the user id belongs to. We then get all matching items in the view with the same institution id and profession id (the ON condition) and then link these back to the user table on the corresponding userIDs (the first JOIN).
You can either run this for each user id you are interested in, or JOIN onto the result of a query (your getusers) (it depends on what database engine you are running).
If you aren't familiar with JOINs, Jeff Atwood's introductory post is a good starting place.
The JOIN statement effectively allows you to exploit the logical links between your tables. The userID, institutionID and professionID are all examples of candidates for foreign keys, so rather than having to constantly subquery each table and piece the results together, you can link all the tables together and filter down to the rows you want. It's usually a cleaner, more maintainable approach (although that is opinion).
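Here is a small sketch of that join, run from Python against an in-memory SQLite database, with a plain table standing in for the userprofspecinst_v view. The sample rows and the helper name peers are made up. Two things are added on top of the query above and are assumptions about what is wanted: DISTINCT, in case a user appears in the view more than once, and a filter that leaves the given user out of their own result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "user" (userID INTEGER PRIMARY KEY, username TEXT);
    -- A table standing in for the view in this sketch
    CREATE TABLE userprofspecinst_v (
        userID INTEGER, institutionID INTEGER, professionID INTEGER
    );
    -- Illustrative sample data
    INSERT INTO "user" VALUES (1, 'ann'), (2, 'bob'), (3, 'cy'), (4, 'di');
    INSERT INTO userprofspecinst_v VALUES
        (1, 100, 7), (2, 100, 7), (3, 100, 8), (4, 200, 7);
""")


def peers(conn, user_id):
    """Usernames sharing both institution and profession with user_id."""
    rows = conn.execute(
        """
        SELECT DISTINCT u.username
        FROM "user" AS u
        JOIN userprofspecinst_v AS up ON up.userID = u.userID
        JOIN (SELECT institutionID, professionID
              FROM userprofspecinst_v
              WHERE userID = ?) AS mine
          ON up.institutionID = mine.institutionID
         AND up.professionID = mine.professionID
        WHERE u.userID <> ?        -- added: exclude the user themselves
        ORDER BY u.username
        """,
        (user_id, user_id),
    ).fetchall()
    return [row[0] for row in rows]


same_as_ann = peers(conn, 1)
```

Matching on both columns in one ON condition is what the two separate IN subqueries in the question could not express: they would also accept a user who shares the institution through one row and the profession through another.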
I am trying to write a sql query which fetches all the tags related to every topic being displayed on the page.
like this
TITLE: feedback1
POSTED BY: User1
CATEGORY: category1
TAGS: tag1, tag2, tag3
TITLE: feedback2
POSTED BY: User2
CATEGORY: category2
TAGS: tag2, tag5, tag7,tag8
TITLE: feedback3
POSTED BY: User3
CATEGORY: category3
TAGS: tag1, tag5, tag6, tag3
The relationship of tags to topics is many to many.
Right now I am first fetching all the topics from the "topics" table and to fetch the related tags of every topic I loop over the returned topics array for fetching tags.
But this method is very expensive in terms of speed and not efficient too.
Please help me write this sql query.
Query for fetching all the topics and its information is as follows:
SELECT
tbl_feedbacks.pk_feedbackid as feedbackId,
tbl_feedbacks.type as feedbackType,
DATE_FORMAT(tbl_feedbacks.createdon,'%M %D, %Y') as postedOn,
tbl_feedbacks.description as description,
tbl_feedbacks.upvotecount as upvotecount,
tbl_feedbacks.downvotecount as downvotecount,
(tbl_feedbacks.upvotecount)-(tbl_feedbacks.downvotecount) as totalvotecount,
tbl_feedbacks.viewcount as viewcount,
tbl_feedbacks.title as feedbackTitle,
tbl_users.email as userEmail,
tbl_users.name as postedBy,
tbl_categories.pk_categoryid as categoryId,
tbl_clients.pk_clientid as clientId
FROM
tbl_feedbacks
LEFT JOIN tbl_users
ON ( tbl_users.pk_userid = tbl_feedbacks.fk_tbl_users_userid )
LEFT JOIN tbl_categories
ON ( tbl_categories.pk_categoryid = tbl_feedbacks.fk_tbl_categories_categoryid )
LEFT JOIN tbl_clients
ON ( tbl_clients.pk_clientid = tbl_feedbacks.fk_tbl_clients_clientid )
WHERE
tbl_clients.pk_clientid = '1'
What is the best practice that should be followed in such cases when you need to display all the tags related to every topic being displayed on a single page.
How do I alter the above sql query, so that all the tags plus related information of topics is fetched using a single query.
What I am trying to achieve is similar to the 'questions' page of Stack Overflow.
All the information (tags + information of every topic being displayed) is properly displayed.
Thanks
To do this, I would have three tables:
Topics
topic_id
[whatever else you need to know for a topic]
Tags
tag_id
[etc]
Map
topic_id
tag_id
select t.[whatever], tag.[whatever]
from topics t
join map m on t.topic_id = m.topic_id
join tags tag on tag.tag_id = m.tag_id
where [conditionals]
Set up partitions and/or indexes on the map table to maximize the speed of your query. For example, if you have many more topics than tags, partition the table on topics. Then, each time you grab all the tags for a topic, it will be 1 read from 1 area, no seeking needed. Make sure to have both topics and tags indexed on their _id.
Use your 'explain plan' tool. (I am not familiar with mysql, but I assume there is some tool that can tell you how a query will be run, so you can optimize it)
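A minimal sketch of the three-table layout, using Python with an in-memory SQLite database (not MySQL, so partitioning is left out). One query returns one row per topic/tag pair, and the rows are grouped per topic in application code instead of issuing a query per topic. The column names title and tag, the sample rows and the helper name topics_with_tags are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE topics (topic_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (tag_id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE map (
        topic_id INTEGER,
        tag_id INTEGER,
        PRIMARY KEY (topic_id, tag_id)   -- doubles as an index on the map
    );
    -- Illustrative sample data
    INSERT INTO topics VALUES (1, 'feedback1'), (2, 'feedback2');
    INSERT INTO tags VALUES (1, 'tag1'), (2, 'tag2'), (3, 'tag3'), (5, 'tag5');
    INSERT INTO map VALUES (1, 1), (1, 2), (1, 3), (2, 2), (2, 5);
""")


def topics_with_tags(conn):
    """Fetch every topic/tag pair in one query and group the tags per topic."""
    rows = conn.execute(
        """
        SELECT t.topic_id, t.title, tag.tag
        FROM topics t
        JOIN map m ON t.topic_id = m.topic_id
        JOIN tags tag ON tag.tag_id = m.tag_id
        ORDER BY t.topic_id, tag.tag_id
        """
    )
    grouped = {}
    for topic_id, title, tag in rows:
        entry = grouped.setdefault(topic_id, {"title": title, "tags": []})
        entry["tags"].append(tag)
    return grouped


page = topics_with_tags(conn)
```

Because these are inner joins, a topic with no tags would not appear at all; a LEFT JOIN from topics would keep it.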
EDIT:
So you have the following tables:
tbl_feedbacks
tbl_users
tbl_categories
tbl_clients
tbl_tags
tbl_topics
tbl_topics_tags
The query you provide as a starting point shows how feedback, users, categories and clients relate to each other.
I assume that tbl_topics_tags contains FKs to tags and topics, showing which topic has which tag. Is this correct?
What of (feedbacks, users, categories, and clients) has a FK to topics or tags? Or, do either topics or tags have a FK to any of the initial 4?
Once I know this, I'll be able to show how to modify the query.
EDIT #2
There are two different ways to go about this:
The easy way is to just join on your FK. This will give you one row for each tag. It is much easier and more flexible to put together the SQL to do it this way. If you are using some other language to take the results of the query and translate them to present them to the user, this method is better. If nothing else, it will be far more obvious what is going on, and will be easier to debug and maintain.
However, you may want each row of the query results to contain one feedback (and the tags that go with it).
SQL joining question <- this is a question I posted on how to do this. The answer I accepted is an oracle-only answer AFAIK, but there are other non-oracle answers.
Adapting Kevin's answer (which is supposed to work in SQL92 compliant systems):
select
[other stuff: same as in your post],
(select tag
from tbl_tag tt
join tbl_feedbacks_tags tft on tft.tag_id = tt.tag_id
where tft.fk_feedbackid = tbl_feedbacks.pk_feedbackid
order by tt.tag_id
limit 1
offset 0 ) as tag1,
(select tag
from tbl_tag tt
join tbl_feedbacks_tags tft on tft.tag_id = tt.tag_id
where tft.fk_feedbackid = tbl_feedbacks.pk_feedbackid
order by tt.tag_id
limit 1
offset 1 ) as tag2,
(select tag
from tbl_tag tt
join tbl_feedbacks_tags tft on tft.tag_id = tt.tag_id
where tft.fk_feedbackid = tbl_feedbacks.pk_feedbackid
order by tt.tag_id
limit 1
offset 2 ) as tag3
from [same as in the OP]
This should do the trick.
Notes:
This will pull the first three tags. AFAIK, there isn't a way to have an arbitrary number of tags. You can expand the number of tags shown by copying and pasting more of those parts of the query. Make sure to increase the offset setting.
If this does not work, you'll probably have to write up another question, focusing on how to do the pivot in mysql. I've never used mysql, so I'm only guessing that this will work based on what others have told me.
One tip: you'll usually get more attention to your question if you strip away all the extra details. In the question I linked to above, I was really joining between 4 or 5 different tables, with many different fields. But I stripped it down to just the part I didn't know (how to get oracle to aggregate my results into one row). I know some stuff, but you can usually do far better than just one person if you trim your question down to the essentials.
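For what it's worth, a different technique from the fixed tag1/tag2/tag3 columns is a string aggregate: MySQL has GROUP_CONCAT, and SQLite has a similar group_concat, which collapses all tags of a feedback into one column however many there are. The sketch below uses SQLite from Python rather than MySQL, and the table and column names (tbl_tag, tbl_feedbacks_tags, fk_feedbackid, tag) are the guessed names from the query above, not a confirmed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Guessed schema, cut down to the columns the sketch needs
    CREATE TABLE tbl_feedbacks (pk_feedbackid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tbl_tag (tag_id INTEGER PRIMARY KEY, tag TEXT);
    CREATE TABLE tbl_feedbacks_tags (fk_feedbackid INTEGER, tag_id INTEGER);
    -- Illustrative sample data
    INSERT INTO tbl_feedbacks VALUES
        (1, 'feedback1'), (2, 'feedback2'), (3, 'feedback3');
    INSERT INTO tbl_tag VALUES (1, 'tag1'), (2, 'tag2'), (3, 'tag3'), (5, 'tag5');
    INSERT INTO tbl_feedbacks_tags VALUES (1, 1), (1, 2), (1, 3), (2, 2), (2, 5);
""")

rows = conn.execute(
    """
    SELECT f.pk_feedbackid, f.title, group_concat(tt.tag, ', ') AS tags
    FROM tbl_feedbacks f
    LEFT JOIN tbl_feedbacks_tags tft ON tft.fk_feedbackid = f.pk_feedbackid
    LEFT JOIN tbl_tag tt ON tt.tag_id = tft.tag_id
    GROUP BY f.pk_feedbackid, f.title
    ORDER BY f.pk_feedbackid
    """
).fetchall()

# SQLite does not promise an order inside group_concat, so sort in code.
# A feedback with no tags gets NULL (None) because of the LEFT JOINs.
tags_by_feedback = {
    fid: sorted(tags.split(", ")) if tags is not None else []
    for fid, _title, tags in rows
}
```

The result is one row per feedback, which is the shape the page needs.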
I've got two SQL Server tables, authors and articles, where the authors primary key (AuthorID) is a foreign key in the articles table, representing a simple one-to-many relationship between authors and articles. Now here's the problem: I need to run a full-text search on the authors table based on the first name, last name, and biography columns. The full-text search is working well, ranking and all. Now I need to add one more criterion to my search: all contributors without articles should be ignored. To achieve that, I chose to create a view with all the contributors that have articles and search against this view. So I created the view this way:
Create View vw_Contributors_With_Articles
AS
Select * from Authors
Where Authors.ContributorID
IN ( Select Distinct (Articles.ContributorId) From Articles)
It's working, but I really don't like the subquery. A join gets me all the redundant author IDs; I tried DISTINCT, but it didn't work with the biography column as its type is ntext. GROUP BY wouldn't do it for me because I need all the columns, not an aggregate of them.
What do you think, guys? How can I improve this?
An EXISTS allows for the potential duplicate entries when there are multiple articles per author:
Select * from Authors
Where EXISTS (SELECT *
FROM Articles
WHERE Articles.ContributorId = Authors.ContributorId)
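A quick way to see the difference is to run both forms on a tiny data set. The sketch below uses Python with an in-memory SQLite database rather than SQL Server, so it says nothing about ntext or full-text indexing; the Name and ArticleID columns and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Authors (ContributorID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Articles (ArticleID INTEGER PRIMARY KEY, ContributorID INTEGER);
    -- Illustrative sample data: ann has two articles, bob has none
    INSERT INTO Authors VALUES (1, 'ann'), (2, 'bob'), (3, 'cy');
    INSERT INTO Articles VALUES (10, 1), (11, 1), (12, 3);
""")

# EXISTS only asks whether a matching article is there, so each author
# appears at most once no matter how many articles they have.
exists_rows = conn.execute(
    """
    SELECT ContributorID, Name
    FROM Authors
    WHERE EXISTS (SELECT 1
                  FROM Articles
                  WHERE Articles.ContributorID = Authors.ContributorID)
    ORDER BY ContributorID
    """
).fetchall()

# A plain inner join returns one row per matching article instead.
join_rows = conn.execute(
    """
    SELECT Authors.ContributorID, Authors.Name
    FROM Authors
    INNER JOIN Articles ON Articles.ContributorID = Authors.ContributorID
    ORDER BY Authors.ContributorID
    """
).fetchall()
```

Here exists_rows has one row each for ann and cy, while join_rows lists ann twice.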
Edit:
To clarify, you can not DISTINCT on ntext columns. So, you can not have a JOIN solution, unless you use a derived table on articles in the JOIN and avoid using articles directly. Or you convert the ntext to nvarchar(max).
EXISTS or IN is your only option.
Edit 2:
...unless you really want to use a JOIN and you have SQL Server 2005 or higher, you can CAST and DISTINCT (aggregate) to avoid multiple rows in the output...
select DISTINCT
Authors.ContributorID,
Authors.AnotherColumn,
CAST(Authors.biography AS nvarchar(max)) AS biography,
Authors.YetAnotherColumn,
...
from
Authors
inner join
Articles on
Articles.ContributorID = Authors.ContributorID
You want an inner join
select
*
from
Authors
inner join
Articles on
Articles.ContributorID = Authors.ContributorID
This will return only authors who have an entry in the Articles table, matched by ContributorID.
Select the distinct contributorIDs from the Articles table to get the individual authors who have written an article, and join the Authors table to that query - so something like
select distinct Articles.ContributorID, Authors.*
from Articles
join Authors on Articles.ContributorID = Authors.ContributorID