Here is the situation: each page shows 30 topics, so I have to execute at least 1 SQL statement. Besides that, I also want to show how many replies each topic has and who its author is, so
I have to use 30 statements to count the replies and another 30 statements to find the authors. In total, that is 61 statements, and I am really worried about the efficiency.
My tables look like this:
Topic        Reply        User
---------    ---------    ---------
id           id           id
title        topic_id     username
...          ...
author_id
You should look into joining tables during a query.
Joins in SQL Server http://msdn.microsoft.com/en-us/library/ms191517.aspx
Joins in MySQL http://dev.mysql.com/doc/refman/5.0/en/join.html
As an example, I could do the following:
SELECT reply.id, reply.authorid, reply.text, reply.topicid,
topic.title,
user.username
FROM reply
LEFT JOIN topic ON (topic.id = reply.topicid)
LEFT JOIN user ON (user.id = reply.authorid)
WHERE (reply.isactive = 1)
ORDER BY reply.postdate DESC
LIMIT 10
If I read your requirements correctly, you want the result of the following query:
SELECT Topic.title, User.username, COUNT(Reply.topic_id) Replies
FROM Topic, User, Reply
WHERE Topic.id = Reply.topic_id
AND Topic.author_id = User.id
GROUP BY Topic.title, User.username
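One caveat with the comma-style join above: a topic with no replies has no matching Reply row, so it drops out of the result. The following is a minimal sketch, not the answerer's exact method, that uses a LEFT JOIN so such topics are listed with a count of 0. It runs against an in-memory SQLite database purely for illustration; the sample rows and the LIMIT 30 page size are assumptions based on the question.

```python
import sqlite3

# In-memory database with the three tables from the question (sample data is made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE User  (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE Topic (id INTEGER PRIMARY KEY, title TEXT, author_id INTEGER);
    CREATE TABLE Reply (id INTEGER PRIMARY KEY, topic_id INTEGER);
    INSERT INTO User  VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO Topic VALUES (1, 'Joins', 1), (2, 'Indexes', 2);
    INSERT INTO Reply VALUES (1, 1), (2, 1);
""")

# One statement returns title, author and reply count for a whole page of topics.
# COUNT(Reply.id) ignores the NULLs produced by the LEFT JOIN, so topics
# without replies get 0 instead of disappearing.
rows = conn.execute("""
    SELECT Topic.title, User.username, COUNT(Reply.id) AS replies
    FROM Topic
    JOIN User ON User.id = Topic.author_id
    LEFT JOIN Reply ON Reply.topic_id = Topic.id
    GROUP BY Topic.id, Topic.title, User.username
    ORDER BY Topic.id
    LIMIT 30
""").fetchall()

for title, username, replies in rows:
    print(title, username, replies)
```

Grouping by Topic.id as well as the title keeps two topics that happen to share a title from being merged into one row.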
When I was first starting out with database-driven web applications, I had similar problems. I then spent several years working in a database-rich environment where I actually learned SQL. If you intend to continue developing web applications (which I find are very fun to create), it would be worth your time to pick up a book or check out some how-tos on basic and advanced SQL.
One thing to add, on top of JOINs:
It may be that your groups of data do not match or relate, so JOINs won't work. Put another way: you may have 2 main chunks of data that are awkward to join.
Stored procedures can return multiple result sets.
For example, for a summary page you could return one aggregate result set and another "last 20" result set in one SQL call. JOINing the 2 is awkward because they don't "fit" together.
You certainly can use some "left joins" on this one. However, since the output only changes when someone updates or adds to your tables, you could try to cache it in an XML/text file. Another way could be to build in some redundancy by adding extra columns to the topic table that keep the reply count, username, etc., and update them only when changes occur.
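As a rough illustration of the redundancy idea, here is a sketch that keeps a reply counter on the topic row and lets triggers maintain it. It assumes SQLite (trigger syntax differs in MySQL and SQL Server); the column name reply_count and the trigger names are hypothetical and not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- reply_count is a hypothetical denormalized column added for this sketch.
    CREATE TABLE Topic (id INTEGER PRIMARY KEY, title TEXT,
                        reply_count INTEGER NOT NULL DEFAULT 0);
    CREATE TABLE Reply (id INTEGER PRIMARY KEY, topic_id INTEGER);

    -- Keep the counter in step with inserts and deletes on Reply.
    CREATE TRIGGER reply_added AFTER INSERT ON Reply
    BEGIN
        UPDATE Topic SET reply_count = reply_count + 1 WHERE id = NEW.topic_id;
    END;
    CREATE TRIGGER reply_removed AFTER DELETE ON Reply
    BEGIN
        UPDATE Topic SET reply_count = reply_count - 1 WHERE id = OLD.topic_id;
    END;

    INSERT INTO Topic (id, title) VALUES (1, 'Joins');
    INSERT INTO Reply (topic_id) VALUES (1), (1), (1);
    DELETE FROM Reply WHERE id = 3;
""")

# The listing page can now read the count straight from Topic, with no join.
count = conn.execute("SELECT reply_count FROM Topic WHERE id = 1").fetchone()[0]
print(count)
```

The trade-off is that every write to Reply now also touches Topic, so this pays off mainly when reads far outnumber writes.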
Good morning/afternoon! I was hoping someone could help me out with something that probably should be very simple.
Admittedly, I’m not the strongest SQL query designer. That said, I’ve spent a couple hours beating my head against my keyboard trying to get a seemingly simple three way join working.
NOTE: I'm querying a Vertica DB.
Here is my query:
SELECT A.CaseOriginalProductNumber, A.CaseCreatedDate, A.CaseNumber, B.BU2_Key as BusinessUnit, C.product_number_desc as ModelNumber
FROM pps_sfdc.v_Case A
INNER JOIN reference_data.DIM_PRODUCT_LINE_HIERARCHY B
ON B.PL_Key = A.CaseOriginalProductLine
INNER JOIN reference_data.DIM_PRODUCT C
ON C.product_line_code = A.CaseOriginalProductLine
WHERE B.BU2_Key = 'XWT'
LIMIT 20
I have a view (v_Case) that I’m trying to join to two other tables so I can look up a value from each of them. The above query returns identical data in every column EXCEPT the last one (see below). It's as if it is iterating through the last column to pull out the unique entries, sort of like a "GROUP BY" clause. What SHOULD be happening is that I get unique rows with the specific "BusinessUnit" and "ModelNumber" for each record.
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 1
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 2
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 3
DUMEPRINT 5/2/2014 8:56:27 AM 3002845327 JJT Product 4
I modeled my solution after this post:
How to deal with multiple lookup tables for beginners of SQL?
What am I doing wrong?
Thank you for any help you can provide.
This is a data issue. A general rule when troubleshooting these: the column that differs from row to row (in this case C.product_number_desc as ModelNumber) usually points to where the problem is, which is why I pointed you towards dim_product.
If you receive duplicates, the query below will help identify whether this table is causing them. Remember that key in this statement can be multiple fields, whatever you are joining the table on:
Select key,count(1) from table group by key having count(1)>1
For the future: don't assume it's your code. Duplicates like this almost always point to dirty data (the other possibility is that you are causing cross joins because the join keys are not correct). If you had commented out the 'c' table and the column referring to it in the select clause, you would have received one row, so your dupes were coming from the 'c' table here.
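To make the fan-out concrete, here is a small sketch using SQLite in place of Vertica. The table and column names are simplified stand-ins for v_Case and DIM_PRODUCT, and the rows are invented: one case joins to a lookup table in which the join key is not unique, and the duplicate-key check then flags that key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Simplified stand-ins for v_Case and DIM_PRODUCT (names are assumptions).
    CREATE TABLE cases    (case_number INTEGER, product_line TEXT);
    CREATE TABLE products (product_line_code TEXT, product_number_desc TEXT);
    INSERT INTO cases    VALUES (3002845327, 'PL1');
    INSERT INTO products VALUES ('PL1', 'Product 1'), ('PL1', 'Product 2'),
                                ('PL2', 'Product 9');
""")

# One case row comes back once per matching product row.
joined = conn.execute("""
    SELECT c.case_number, p.product_number_desc
    FROM cases c
    JOIN products p ON p.product_line_code = c.product_line
    ORDER BY p.product_number_desc
""").fetchall()

# The diagnostic from the answer: which join keys occur more than once?
duplicate_keys = conn.execute("""
    SELECT product_line_code, COUNT(*)
    FROM products
    GROUP BY product_line_code
    HAVING COUNT(*) > 1
""").fetchall()

print(joined)
print(duplicate_keys)
```

If the duplicate-key query returns rows, the join column is not a key of the lookup table, and the join needs a more specific condition (for example a product number rather than a product line).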
Good luck with it
I am new to Google BigQuery and am trying to understand the best practices here.
I have a (.NET) component that implements some article-reader behavior.
I have two tables:
one is articles and the other is user actions.
Articles is a general table containing thousands of possible articles to read.
User actions simply register when a user reads an article.
I have about 200,000 users in my system.
At a certain time, I need to prepare a bucket of possible articles for each user by taking 1000 articles from the articles table and omitting the ones the user has already read.
As I have over 100,000 users to build a bucket for, I am looking for the best possible solution to perform this:
Possible solution:
a. query for all articles,
b. query for all users actions.
c. build each user's bucket in code, which is a long operation because it has to omit the articles the user has already read.
That means I perform about (user count) + 1 queries in BigQuery, but I have to perform a large search in my code.
Alternatively, there may be some smart join I can do here, but I am unsure how it would work:
leaving the searching work to BigQuery, and also using fewer query calls than the number of users.
Any help on this second option will be appreciated.
Thank you.
I would do something like this to populate a single table for all readers in one call:
Select User,Article
from
(
Select User,Article,
Row_Number() Over (Partition by User) as NBR -- to extract only 1000 per users
From
(
((Select User From
UserAction
Group Each by User) -- Unique Users table
Cross Join
Articles) as A -- A contains a list of users with all available articles
Left Join Each
(Select User,Article
From UserAction
where activity="read"
Group Each By User,Article
) as B --Using left join to add all available articles and..
On A.User=B.User
and A.Article=B.Article
where B.User Is Null --..filter out already read
)
)
where NBR<=1000 -- filter top 1000 per user
If you want to generate a query per user and can add the user to the query, I'd go for something simpler, such as:
Select Article
from Articles
where Article not in
(Select Article from UserAction where User = "your user here" )
limit 1000
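The per-user filter can be tried locally before running it at scale. The sketch below uses SQLite rather than BigQuery, so the syntax is only an approximation of what BigQuery accepts; the helper name unread_bucket and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Articles   (Article TEXT);
    CREATE TABLE UserAction (User TEXT, Article TEXT, activity TEXT);
    INSERT INTO Articles   VALUES ('a1'), ('a2'), ('a3');
    INSERT INTO UserAction VALUES ('u1', 'a1', 'read'), ('u2', 'a3', 'read');
""")

def unread_bucket(user, limit=1000):
    """Return up to `limit` articles for which the user has no 'read' action."""
    rows = conn.execute("""
        SELECT a.Article
        FROM Articles a
        WHERE NOT EXISTS (
            SELECT 1 FROM UserAction ua
            WHERE ua.User = ? AND ua.Article = a.Article AND ua.activity = 'read'
        )
        ORDER BY a.Article
        LIMIT ?
    """, (user, limit)).fetchall()
    return [r[0] for r in rows]

print(unread_bucket("u1"))
print(unread_bucket("u2", limit=1))
```

This still costs one query per user, so for 100,000+ users the single-query approach that builds every bucket at once is likely the better fit; the per-user form is mainly useful for on-demand requests.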
Hope this helps
I need to display a list of posts. For each post, I need to also show:
How many people "like" the post.
Three names of those who "like" the post (preferably friends of viewing user).
If the viewing user "likes" the post, I'd like for him/her to be one of the three.
I don't know how to do it without querying for each item in a for loop, which is proving to be very slow. Sure, caching/denormalization will help, but I'd like to know if this can be done otherwise. How does Facebook do it?
Assuming this basic db structure, any suggestions?
users
-----
id
username
posts
---------
id
user_id
content
friendships
-----------
user_id
friend_id
is_confirmed (bool)
users_liked_posts
-----------------
user_id
post_id
As a side note, if anyone knows how to do this in SQLAlchemy, that would be very much appreciated.
EDIT: SQLFiddle http://sqlfiddle.com/#!2/9e703
You can try this in your sqlfiddle. The condition "WHERE user_id = 2" needs 2 replaced by your current user id.
SELECT numbered.*
FROM
(SELECT ranked.*,
IF (post_id=@prev_post,
@n := @n + 1,
@n := 1 AND @prev_post := post_id) as position
FROM
(SELECT users_liked_posts.post_id,
users_liked_posts.user_id,
visitor.user_id as u1,
friendships.user_id as u2,
IF (visitor.user_id is not null, 1, IF(friendships.user_id is not null, 2, 3)) as rank
FROM users_liked_posts
INNER JOIN posts
ON posts.id = users_liked_posts.post_id
LEFT JOIN friendships
ON users_liked_posts.user_id = friendships.user_id
AND friendships.friend_id = posts.user_id
LEFT JOIN (SELECT post_id, user_id FROM users_liked_posts WHERE user_id = 2) visitor
ON users_liked_posts.post_id = visitor.post_id
AND users_liked_posts.user_id = visitor.user_id
ORDER BY users_liked_posts.post_id, rank) as ranked
JOIN
(SELECT @n := 0, @prev_post := 0) as setup) as numbered
WHERE numbered.position < 4
You can easily join the subquery "numbered" with the table "users" to obtain additional user information. There are extra fields u1 and u2 to help see what is happening. You can remove these.
General idea of the query:
1) left join users_liked_posts two times: once with a copy of itself restricted to the current visitor (the subquery visitor), and once with friendships to pick up friends.
2) the column rank, IF (visitor.user_id is not null, 1, IF(friendships.user_id is not null, 2, 3)), assigns a rank to each user in users_liked_posts. This query is sorted by post and by rank.
3) use the previous as a subquery to create the same data but with a running position for the users, per post.
4) use the previous as a subquery to extract the top 3 positions per post.
No, these steps cannot be merged, in particular because MySQL does not allow a computed column to be referenced by its alias in the WHERE condition.
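On engines that support window functions, the rank-then-number-then-filter steps can be written with ROW_NUMBER() instead of user variables. The sketch below is an alternative formulation, not the answer's query. It assumes SQLite 3.25 or later, assumes friendships stores one row per (viewer, friend) pair, and ranks friends of the viewing user (as the question asks); the helper name top_likers and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE friendships (user_id INTEGER, friend_id INTEGER, is_confirmed INTEGER);
    CREATE TABLE users_liked_posts (user_id INTEGER, post_id INTEGER);
    INSERT INTO friendships VALUES (2, 5, 1);   -- user 5 is a friend of user 2
    INSERT INTO users_liked_posts VALUES (1, 10), (3, 10), (5, 10), (2, 10), (4, 11);
""")

def top_likers(viewer_id, per_post=3):
    """Up to `per_post` likers per post: the viewer first, then friends, then others."""
    return conn.execute("""
        SELECT post_id, user_id, priority
        FROM (
            SELECT r.post_id, r.user_id, r.priority,
                   ROW_NUMBER() OVER (
                       PARTITION BY r.post_id ORDER BY r.priority, r.user_id
                   ) AS position
            FROM (
                SELECT l.post_id, l.user_id,
                       CASE WHEN l.user_id = :viewer THEN 1
                            WHEN f.friend_id IS NOT NULL THEN 2
                            ELSE 3 END AS priority
                FROM users_liked_posts l
                LEFT JOIN friendships f
                       ON f.user_id = :viewer
                      AND f.friend_id = l.user_id
                      AND f.is_confirmed = 1
            ) AS r
        ) AS numbered
        WHERE position <= :n
        ORDER BY post_id, position
    """, {"viewer": viewer_id, "n": per_post}).fetchall()

for row in top_likers(2):
    print(row)
```

The total like count per post can come from a separate GROUP BY query or from COUNT(*) OVER (PARTITION BY post_id) added to the middle query.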
@koriander gave the SQL answer, but as to how Facebook does it, you already partially answered that: they use highly denormalized data and caching. They also implement atomic counters and in-memory edge lists to perform graph traversals, and they most certainly don't use relational database concepts (like JOINs), since those don't scale. Even the MySQL clusters they run are essentially just key/value stores which only get accessed when there's a miss in the cache layer.
Instead of an RDBMS, I might suggest a graph database for your purposes, like Neo4j.
Good luck.
EDIT:
You're really going to have to play with Neo4j if you're interested in using it. You may or may not find it easier coming from a SQL background, but it will certainly provide more powerful, and likely faster, queries for performing graph traversals.
Here are a couple of examples of Cypher queries which may be useful to you.
Count how many people like a post:
START post=node({postId})
MATCH post<-[:like]-user
RETURN count(*)
(really you should use an atomic counter, instead, if it's something you're going to be querying for a lot)
Get three people who liked a post with the following constraints:
The first likingUser will always be the current user if he/she liked the post.
If friends of the current user liked the post, they will show up before any non-friends.
START post=node({postId}), user=node({currentUserId})
MATCH path = post<-[:like]-likingUser-[r?:friend*0..1]-user
RETURN likingUser, count(r) as rc, length(path) as len
ORDER BY rc desc, len asc
LIMIT 3
I'll try to explain the above query... if I can.
Start by grabbing two nodes, the post and the current user
Match all users who like the post (likingUser)
Additionally, test whether there is a path of length 0 or 1 which connects likingUser through a friendship relationship to the current user (a path of length 0 indicates that likingUser==user).
Now, order the results first by whether or not relationship r exists (it will exist if the likingUser is friends with user or if likingUser==user). So, count(r) will be either 0 or 1 for each result. Since we prefer results where count(r)==1, we'll sort this in descending order.
Next, perform a secondary sort which forces the current user to the top of the list if he/she was part of the result set. We do this by checking the length of path. When user==likingUser, the path length will be shorter than when user is a friend of likingUser, so we can use length(path) to force user up to the top by sorting in ascending order.
Lastly, we limit the results to only the top three results.
Hopefully that makes some sense. As a side note, you may actually get better performance by separating out your queries. For example, one query to see if the user likes the post, then another to get up to three friends who liked the post, and finally another to get up to three non-friends who like the post. I say it may be faster because each query can short-circuit after it gets three results, whereas the big single-query I wrote has to consider all possibilities, then sort them. So, just keep in mind that just because you can combine multiple questions into a single query, it may actually perform worse than multiple queries.
I am working on an application that allows users to build a "book" from a number of "pages" and then place them in any order that they'd like. It's possible that multiple people can build the same book (the same pages in the same order). The books are built by the user prior to them being processed and printed, so I need to group books together that have the same exact layout (the same pages in the same order). I've written a million queries in my life, but for some reason I can't grasp how to do this.
I could simply write a big SELECT query, and then loop through the results and build arrays of objects that have the same pages in the same sequence, but I'm trying to figure out how to do this with one query.
Here is my data layout:
dbo.Books
BookId
Quantity
dbo.BookPages
BookId
PageId
Sequence
dbo.Pages
PageId
DocName
So, I need some clarification on a few things:
Once a user orders the pages the way they want, are they saved back down to a database?
If yes, then is the question to run a query to group book orders that have the same page-numbering, so that they are sent to the printers in an optimal way?
OR, does the user lay out the pages, then send the order directly to the printer? If so, it seems more complicated and less efficient to capture the requested print jobs and order them on the fly on the way out to the printers ...
What language/technology are you using to create this solution? .NET? Java?
With the answers to these questions, I can better gauge what you need.
Pending the answers to my questions, I am also assuming that:
You are using some type of many-to-many table to store customer page ordering. If so, then you'll need to write a query to select distinct page-orderings, and group by those page orderings. This is possible with a single SQL query.
However, if you feel you want more control over how this data is joined, then doing this programmatically may be the way to go, although you will lose performance by reading in all the data, and then outputting that data in a way that is consumable by your printers.
The books are identical only if the page count equals the match count.
The question was tagged T-SQL when I started, so the syntax may differ in other SQL dialects.
;WITH BookPageCount
AS
(
select bp.BookId, COUNT(*) as [individualCount]
from dbo.BookPages bp with (nolock)
group by bp.BookId
),
BookCombinedCount
AS
(
select b1.BookId as [book1ID], b2.BookId as [book2ID], COUNT(*) as [combinedCount]
from dbo.BookPages b1 with (nolock)
join dbo.BookPages b2 with (nolock)
on b1.BookId < b2.BookId
and b1.Sequence = b2.Sequence
and b1.PageId = b2.PageId
group by b1.BookId, b2.BookId
)
select BookCombinedCount.book1ID, BookCombinedCount.book2ID
from BookCombinedCount
join BookPageCount as book1 on book1.BookId = BookCombinedCount.book1ID
join BookPageCount as book2 on book2.BookId = BookCombinedCount.book2ID
where BookCombinedCount.combinedCount = book1.individualCount
and BookCombinedCount.combinedCount = book2.individualCount
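To check the "page count = match count" idea on a small data set, here is a sketch of the same counting approach run against SQLite instead of SQL Server (so without the nolock hints). The table follows the dbo.BookPages layout from the question; the CTE names and sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BookPages (BookId INTEGER, PageId INTEGER, Sequence INTEGER);
    -- Books 1 and 2: same pages in the same order.
    -- Book 3: same pages, swapped order. Book 4: only the first page of book 1.
    INSERT INTO BookPages VALUES
        (1, 100, 1), (1, 200, 2),
        (2, 100, 1), (2, 200, 2),
        (3, 200, 1), (3, 100, 2),
        (4, 100, 1);
""")

identical_pairs = conn.execute("""
    WITH PageCount AS (
        -- Number of pages in each book.
        SELECT BookId, COUNT(*) AS n FROM BookPages GROUP BY BookId
    ),
    MatchCount AS (
        -- Number of positions at which two different books hold the same page.
        SELECT b1.BookId AS book1, b2.BookId AS book2, COUNT(*) AS n
        FROM BookPages b1
        JOIN BookPages b2
          ON b1.BookId < b2.BookId
         AND b1.Sequence = b2.Sequence
         AND b1.PageId = b2.PageId
        GROUP BY b1.BookId, b2.BookId
    )
    -- Two books are identical only if every page of both books matched.
    SELECT m.book1, m.book2
    FROM MatchCount m
    JOIN PageCount p1 ON p1.BookId = m.book1
    JOIN PageCount p2 ON p2.BookId = m.book2
    WHERE m.n = p1.n AND m.n = p2.n
    ORDER BY m.book1, m.book2
""").fetchall()

print(identical_pairs)
```

Comparing against both page counts is what rules out book 4: it matches book 1 on its only page, but book 1 has a second page that goes unmatched.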
I have a query as follows in MS Access
SELECT tblUsers.Forename, tblUsers.Surname,
(SELECT COUNT(ID)
FROM tblGrades
WHERE UserID = tblUsers.UserID
AND (Grade = 'A' OR Grade = 'B' OR Grade = 'C')) AS TotalGrades
FROM tblUsers
I've put this into a report, and now when I try to view the report it displays the alert "Multi-level GROUP BY clause is not allowed in subquery".
What I don't get is that I don't even have any GROUP BY clauses in the query, so why is it returning this error?
From Allen Browne's excellent website of Access tips: Surviving Subqueries
Error: "Multi-level group by not allowed"
You spent half an hour building a query with subquery, and verifying it all works. You create a report based on the query, and immediately it fails. Why?
The problem arises from what Access does behind the scenes in response to the report's Sorting and Grouping or aggregation. If it must aggregate the data for the report, that's the "multi-level" grouping that is not permitted.
Solutions
In report design, remove everything from the Sorting and Grouping dialog, and do not try to sum anything in the Report Header or Report Footer. (In most cases this is not a practical solution.)
In query design, uncheck the Show box under the subquery. (This solution is practical only if you do not need to show the results of the subquery in the report.)
Create a separate query that handles the subquery. Use this query as a source "table" for the query the report is based on. Moving the subquery to the lower level query sometimes (not always) avoids the problem, even if the second query is as simple as
SELECT * FROM Query1;
Use a domain aggregate function such as DSum() instead of a subquery. While this is fine for small tables, performance will be unusable for large ones.
If nothing else works, create a temporary table to hold the data for the report. You can convert your query into an Append query (Append on Query menu in query design) to populate the temporary table, and then base the report on the temporary table.
IMPORTANT NOTE: I'm reposting the info here because I believe Allen Browne explicitly allows it. From his website:
Permission
You may freely use anything (code, forms, algorithms, ...) from these articles and sample databases for any purpose (personal, educational, commercial, resale, ...). All we ask is that you acknowledge this website in your code, with comments such as:
'Source: http://allenbrowne.com
'Adapted from: http://allenbrowne.com
Try this version:
SELECT users.Forename, users.Surname, grades.TotalGrades
FROM tblUsers AS users
LEFT JOIN (SELECT COUNT(ID) AS TotalGrades, UserID
FROM tblGrades
WHERE (Grade = 'A' OR Grade = 'B' OR Grade = 'C')
GROUP BY UserID) AS grades ON grades.UserID = users.UserID
I have not tested it. The query itself should be OK, but I'm not sure whether it works in the report data source.
Try this:
SELECT users.Forename, users.Surname, COUNT(grades.ID) AS TotalGrades
FROM tblUsers AS users
INNER JOIN tblGrades AS grades ON users.UserID = grades.UserID
WHERE grades.Grade IN ("A","B","C")
GROUP BY users.UserID, users.Forename, users.Surname;
This is a simple join. Basically, it selects all cases where a user has a grade of "A", "B" or "C", which gives you a table like this:
user1 | A
user1 | B
user1 | A
user2 | A
...
It then groups the rows by user, counting how many times a grade appeared, which gives you the number of grades in the desired range for each user.
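One difference from the original subquery is worth noting: an INNER JOIN with the grade filter in the WHERE clause omits users who have no A, B or C grades, whereas the subquery version lists them with a count of 0. The sketch below shows both behaviours using SQLite, not Access, so the exact join syntax may need adjusting for Access; the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblUsers  (UserID INTEGER PRIMARY KEY, Forename TEXT, Surname TEXT);
    CREATE TABLE tblGrades (ID INTEGER PRIMARY KEY, UserID INTEGER, Grade TEXT);
    INSERT INTO tblUsers  VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Diaz');
    INSERT INTO tblGrades VALUES (1, 1, 'A'), (2, 1, 'B'), (3, 1, 'D'), (4, 2, 'E');
""")

# INNER JOIN + WHERE: users without a matching grade are dropped.
inner_rows = conn.execute("""
    SELECT u.Forename, COUNT(g.ID)
    FROM tblUsers u
    INNER JOIN tblGrades g ON g.UserID = u.UserID
    WHERE g.Grade IN ('A', 'B', 'C')
    GROUP BY u.UserID, u.Forename
    ORDER BY u.UserID
""").fetchall()

# LEFT JOIN with the filter in the ON clause: every user is kept, count may be 0.
left_rows = conn.execute("""
    SELECT u.Forename, COUNT(g.ID)
    FROM tblUsers u
    LEFT JOIN tblGrades g
           ON g.UserID = u.UserID AND g.Grade IN ('A', 'B', 'C')
    GROUP BY u.UserID, u.Forename
    ORDER BY u.UserID
""").fetchall()

print(inner_rows)
print(left_rows)
```

Which form is right depends on whether the report should list users with zero qualifying grades.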