BigQuery: Select a column with any value not in group by clause - sql

It is a classic question, and I know there are many workarounds, like the ones in "Select a Column in SQL not in Group By", but they do not work for my issue on BigQuery.
I have a table of tweets from Twitter, and I want a ranking of the urls that includes any one tweet text for each url.
ID | tweet | url
1 | my github tweet | http://www.github.com/xyz
2 | RT github tweet | http://www.github.com/xyz
3 | another tweet | http://www.twitter.com
4 | more tweeting | http://www.github.com/abc
I tried the following query, but then IDs 1 and 2 are counted separately, because their tweet texts differ.
SELECT tweet, count(url) as popularity, url FROM table group by tweet, url order by popularity desc
How can I count/rank the urls correctly and still preserve any associated tweet text in the result? I do not care if it is from ID 1 or 2.

Here is one approach:
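SELECT url, COUNT(*) AS popularity, GROUP_CONCAT(tweet) AS tweets
FROM Table GROUP BY url ORDER BY popularity DESC
The GROUP_CONCAT aggregate function concatenates all the tweets associated with the same URL, using a comma as the separator (you can pass another separator as the second parameter to GROUP_CONCAT). GROUP_CONCAT belongs to BigQuery's legacy SQL dialect; in standard SQL the equivalent is STRING_AGG(tweet, ','), and ANY_VALUE(tweet) returns a single arbitrary tweet per url.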
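A minimal sketch of the group-by-url approach, run against SQLite through Python's sqlite3 module as a stand-in for BigQuery (an assumption: SQLite is not BigQuery, but its group_concat(X, Y) behaves like the legacy GROUP_CONCAT used above). The table name tweets is hypothetical, and MIN(tweet) is used here as a portable way to keep exactly one tweet per url, where BigQuery standard SQL would use ANY_VALUE.

```python
import sqlite3

# In-memory SQLite database holding the sample tweets from the question.
# SQLite stands in for BigQuery; the table name "tweets" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, tweet TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [
        (1, "my github tweet", "http://www.github.com/xyz"),
        (2, "RT github tweet", "http://www.github.com/xyz"),
        (3, "another tweet", "http://www.twitter.com"),
        (4, "more tweeting", "http://www.github.com/abc"),
    ],
)

# Group only by url, so IDs 1 and 2 land in the same group. The tweet text
# survives inside aggregates instead of being part of the GROUP BY list.
rows = conn.execute(
    """
    SELECT url,
           COUNT(*) AS popularity,
           GROUP_CONCAT(tweet, ' | ') AS tweets,  -- order inside is not guaranteed
           MIN(tweet) AS any_tweet               -- exactly one tweet per url
    FROM tweets
    GROUP BY url
    ORDER BY popularity DESC, url
    """
).fetchall()

for row in rows:
    print(row)
```

Grouping by url alone is what fixes the double counting; whether you keep all tweets (concatenated) or just one is a separate choice made in the aggregate.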

I'm not sure whether this will work with google-bigquery, since I don't have experience with it, but this is a pure-SQL solution that I thought might work for you.
Get the count of urls in a subquery, then join it with the table on url:
select t.id,t.tweet,t.url,q.popularity
from table t
join
(SELECT url, count(url) as popularity
FROM table group by url) q
on t.url=q.url
order by q.popularity desc
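A sketch of the join-back approach, again using SQLite via Python's sqlite3 as a stand-in (the table name tweets is hypothetical). It shows that every original tweet row is kept and simply tagged with its url's popularity; a tiebreaker on t.id is added here only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, tweet TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [
        (1, "my github tweet", "http://www.github.com/xyz"),
        (2, "RT github tweet", "http://www.github.com/xyz"),
        (3, "another tweet", "http://www.twitter.com"),
        (4, "more tweeting", "http://www.github.com/abc"),
    ],
)

# The derived table q has one row per url with its popularity;
# joining it back on url attaches that count to every tweet row.
rows = conn.execute(
    """
    SELECT t.id, t.tweet, t.url, q.popularity
    FROM tweets t
    JOIN (SELECT url, COUNT(url) AS popularity
          FROM tweets
          GROUP BY url) q
      ON t.url = q.url
    ORDER BY q.popularity DESC, t.id
    """
).fetchall()

for row in rows:
    print(row)
```

Because nothing collapses the tweet rows, http://www.github.com/xyz appears twice (once for ID 1, once for ID 2). If you need exactly one row per url, an aggregate over tweet is still required.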

Related

How do I select one row in SQL when another row has same id but a different value in a column?

I am creating a website where you can share posts with multiple tags. Now I have run into the problem that each post is shown multiple times, once per tag. In my database I have a posts table and a tags table, which references the post through post_id. My question is: how can I get each post only once, together with all of its tags?
This should work; adjust the column names if needed:
SELECT *,
(SELECT GROUP_CONCAT(DISTINCT tag) FROM tags WHERE post_id = posts.id)
FROM posts
You can use GROUP_CONCAT() in MySQL and STRING_AGG() in MS SQL Server (2017 and later):
SELECT posts.id,GROUP_CONCAT(tags.tag)
FROM posts
Left JOIN tags on tags.post_id = posts.id
GROUP BY posts.id
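A minimal sketch of the LEFT JOIN plus GROUP_CONCAT query, run on SQLite through Python's sqlite3 (SQLite also provides group_concat). The posts/tags schema and the sample rows are hypothetical, chosen to include a post with no tags.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tags (post_id INTEGER REFERENCES posts(id), tag TEXT);
    INSERT INTO posts VALUES (1, 'First post'), (2, 'Second post'), (3, 'Untagged post');
    INSERT INTO tags VALUES (1, 'sql'), (1, 'mysql'), (2, 'sql');
    """
)

# LEFT JOIN keeps posts that have no tags; GROUP BY posts.id collapses the
# one-row-per-tag join result back to a single row per post.
rows = conn.execute(
    """
    SELECT posts.id, posts.title, GROUP_CONCAT(tags.tag) AS tags
    FROM posts
    LEFT JOIN tags ON tags.post_id = posts.id
    GROUP BY posts.id
    ORDER BY posts.id
    """
).fetchall()

for row in rows:
    print(row)
```

For the untagged post, GROUP_CONCAT only sees NULLs and returns NULL (None in Python), so the post still appears once.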

COUNT(*) function is returning multiple values

I am writing a specific sql query that needs to return the position of a particular entry, based on a grouped table.
Background info: I am coding a Golf Club Data Management system using Java and MS Access. In this system, users can store each of their scores as a new entry in the Scores table. Using this table, I have managed to extract a ranking of the top 3 golf players from all of their recorded scores (I only used the top 3 to save screen space).
Select TOP 3 Username, Sum(Points)
FROM Scores
GROUP By Username
ORDER BY Sum(Points) desc
This produces the required result. However, if the current user falls outside of the top 3, I want to be able to tell the user where they currently sit in the complete ranking of all the players. So, I tried to write a query that counts the number of players having a sum of points below the current user. Here is my query:
Select COUNT(*)
From Scores
GROUP BY Username
HAVING Sum(Points) < (Select Sum(Points)
FROM Scores
WHERE Username = 'Golfer210'
GROUP By Username)
This does not produce the expected number 2; instead it returns one row for each player below Golfer210, each holding that player's number of score entries.
I have tried removing the GROUP BY clause, but then the query returns null. COUNT(DISTINCT ...) refuses to work as well and keeps returning a syntax error message, no matter how I word it.
Questions: Is there a way to count the number of groups that a GROUP BY query returns? If not, is there an easier, more practical way to get the position of an entry in the grouped table? Or can this only be done in Java, after the ranking has been extracted from the database? I have not been able to find a solution anywhere.
You need an additional level of aggregation:
SELECT COUNT(*)
FROM (SELECT COUNT(*)
FROM Scores
GROUP BY Username
HAVING Sum(Points) < (SELECT Sum(Points)
FROM Scores
WHERE Username = 'Golfer210'
)
) as s;
Note: You might want to check if your logic does what you expect when there are ties.
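A sketch of the "additional level of aggregation" idea, run on SQLite via Python's sqlite3 as a stand-in for MS Access (an assumption; Access syntax differs in places, e.g. parameters). The sample scores are hypothetical. The inner query returns one row per qualifying player, and the outer COUNT(*) counts those rows. Flipping the comparison to ">" and adding 1 gives the player's 1-based position, where tied players share a position.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Scores (Username TEXT, Points INTEGER)")
conn.executemany(
    "INSERT INTO Scores VALUES (?, ?)",
    [
        ("Ace", 30), ("Ace", 20),              # total 50
        ("Birdie", 40),                        # total 40
        ("Golfer210", 10), ("Golfer210", 20),  # total 30
        ("Hacker", 15), ("Hacker", 5),         # total 20
        ("Duffer", 10),                        # total 10
    ],
)


def count_players(comparison: str, username: str) -> int:
    """Count players whose total is below ('<') or above ('>') username's total."""
    # Only fixed operators are interpolated; the username is a bound parameter.
    if comparison not in ("<", ">"):
        raise ValueError("comparison must be '<' or '>'")
    sql = f"""
        SELECT COUNT(*)
        FROM (SELECT Username
              FROM Scores
              GROUP BY Username
              HAVING SUM(Points) {comparison} (SELECT SUM(Points)
                                               FROM Scores
                                               WHERE Username = ?)) AS s
    """
    return conn.execute(sql, (username,)).fetchone()[0]


below = count_players("<", "Golfer210")
position = count_players(">", "Golfer210") + 1  # ties share a position

print(below, position)
```

With this data, two players (Hacker, Duffer) are below Golfer210, and two (Ace, Birdie) are above, so Golfer210 sits in position 3.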

Rails subquery without SQL?

I have a User model that has many Post.
I want to get, on a single query, a list of users IDs, ordered by name, and include the ID of their last post.
Is there a way to do this using the ActiveRecord API instead of a SQL query like the following?
SELECT users.id,
(SELECT id FROM posts
WHERE user_id = users.id
ORDER BY id DESC LIMIT 1) AS last_post_id
FROM users
ORDER BY id ASC;
You should be able to do this with the query generator:
User.joins(:posts).group('users.id').order('users.id').pluck(:id, 'MAX(posts.id)')
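There are a lot of options on the relation that you can use to get data out of it; pluck is handy for getting values without instantiating models. Note that joins produces an INNER JOIN, so users without any posts are left out, whereas the correlated subquery in the question returns them with a NULL last_post_id.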
Update: To get models instead:
User.joins(:posts).group('users.id').order('users.id').select('users.*', 'MAX(posts.id) AS max_post_id')
That will create a field called max_post_id which works as any other attribute.
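To see the difference between the question's correlated subquery and the join/group/MAX form, here is a sketch that runs both as plain SQL on SQLite via Python's sqlite3. The second query is only an approximation of what the pluck call sends to the database (an assumption; the exact SQL ActiveRecord generates may differ in quoting), and the users/posts rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER REFERENCES users(id));
    INSERT INTO users VALUES (1, 'Alice'), (2, 'Bob'), (3, 'Carol');
    INSERT INTO posts VALUES (10, 1), (11, 2), (12, 1);  -- Carol has no posts
    """
)

# The question's correlated subquery: one row per user, NULL when there are no posts.
subquery_rows = conn.execute(
    """
    SELECT users.id,
           (SELECT id FROM posts
            WHERE user_id = users.id
            ORDER BY id DESC LIMIT 1) AS last_post_id
    FROM users
    ORDER BY id ASC
    """
).fetchall()

# Roughly the SQL behind joins(:posts).group('users.id').pluck(:id, 'MAX(posts.id)'):
# the INNER JOIN drops users that have no posts.
joined_rows = conn.execute(
    """
    SELECT users.id, MAX(posts.id)
    FROM users
    INNER JOIN posts ON posts.user_id = users.id
    GROUP BY users.id
    ORDER BY users.id
    """
).fetchall()

print(subquery_rows)  # [(1, 12), (2, 11), (3, None)]
print(joined_rows)    # [(1, 12), (2, 11)]
```

Both forms agree on the last post ID for users who have posts (the highest ID, assuming IDs increase over time); they differ only in whether post-less users appear.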

Query to ORDER BY the number of rows returned from another SELECT

I'm trying to wrap my head around SQL and I need some help figuring out how to do the following query in PostgreSQL 9.3.
I have a users table, and a friends table that lists user IDs and the user IDs of friends in multiple rows.
I would like to query the user table, and ORDER BY the number of mutual friends in common to a user ID.
So, the friends table would look like:
user_id | friend_user_id
1 | 4
1 | 5
2 | 10
3 | 7
And so on: user 1 lists 4 and 5 as friends, and user 2 lists 10 as a friend. I want to sort the users by how many friend_user_id entries each user_id has, highest count first.
The Postgres way to do this:
SELECT *
FROM users u
LEFT JOIN (
SELECT user_id, count(*) AS friends
FROM friends
GROUP BY user_id
) f USING (user_id)
ORDER BY f.friends DESC NULLS LAST, user_id -- as tiebreaker
The keyword AS is just noise for table aliases. But don't omit it from column aliases. The manual on "Omitting the AS Key Word":
In FROM items, both the standard and PostgreSQL allow AS to be omitted
before an alias that is an unreserved keyword. But this is impractical
for output column names, because of syntactic ambiguities.
ISNULL() is a custom extension of MySQL or SQL Server. Postgres uses the SQL-standard function COALESCE(). But you don't need either here. Use the NULLS LAST clause instead, which is faster and cleaner. See:
PostgreSQL sort by datetime asc, null first?
Multiple users will have the same number of friends. These peers would be sorted arbitrarily. Repeated execution might yield different sort order, which is typically not desirable. Add more expressions to ORDER BY as tiebreaker. Ultimately, the primary key resolves any remaining ambiguity.
If the two tables share the same column name user_id (like they should) you can use the syntax shortcut USING in the join clause. Another standard SQL feature. Welcome side effect: user_id is only listed once in the output for SELECT *, as opposed to when joining with ON. Many clients wouldn't even accept duplicate column names in the output.
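A sketch of the LEFT JOIN / count / sort pattern, run on SQLite through Python's sqlite3 as a stand-in for PostgreSQL. Because older SQLite versions lack NULLS LAST, this swaps in the portable equivalent "f.friends IS NULL" as the first sort key (an assumption about your environment; in Postgres, keep NULLS LAST). The friends rows are the question's, plus one hypothetical row (3, 8) so that two users tie; the users table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE friends (user_id INTEGER, friend_user_id INTEGER);
    INSERT INTO users VALUES (1, 'one'), (2, 'two'), (3, 'three'), (4, 'four');
    -- The question's rows, plus a hypothetical (3, 8) so that users 1 and 3 tie.
    INSERT INTO friends VALUES (1, 4), (1, 5), (2, 10), (3, 7), (3, 8);
    """
)

rows = conn.execute(
    """
    SELECT u.user_id, u.name, f.friends
    FROM users u
    LEFT JOIN (
        SELECT user_id, COUNT(*) AS friends
        FROM friends
        GROUP BY user_id
    ) f USING (user_id)
    -- "f.friends IS NULL" is 0 for users with friend rows and 1 otherwise,
    -- which emulates NULLS LAST; u.user_id breaks ties deterministically.
    ORDER BY f.friends IS NULL, f.friends DESC, u.user_id
    """
).fetchall()

for row in rows:
    print(row)
```

User 4 has no rows in friends, so the LEFT JOIN gives it a NULL count and it sorts last; users 1 and 3 both have two entries and are ordered by user_id.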
Something like this?
SELECT * FROM [users] u
LEFT JOIN (SELECT user_id, COUNT(*) AS friends FROM friends GROUP BY user_id) f
ON u.user_id = f.user_id
ORDER BY ISNULL(f.friends,0) DESC

SELECT USING COUNT in mysql

I have a very large database with about 120 million records in one table. I need to clean up the data in this table before I divide it into several tables (possibly normalizing it). The columns of this table are "id (primary key), userId, Url, Tag". This is basically a subset of the dataset from the Delicious website. As I said, each row has an id, a userId, a url and only "one" tag. A bookmark on Delicious is composed of several tags for a single url, so it corresponds to several rows in my database. For example:
"id"; "user" ;"url" ;"tag"
"38";"12c2763095ec44e498f870ed67ee948d";"http://forkjavascript.org/";"ajax"
"39";"12c2763095ec44e498f870ed67ee948d";"http://forkjavascript.org/";"api"
"40";"12c2763095ec44e498f870ed67ee948d";"http://forkjavascript.org/";"javascript"
"41";"12c2763095ec44e498f870ed67ee948d";"http://forkjavascript.org/";"library"
"42";"12c2763095ec44e498f870ed67ee948d";"http://forkjavascript.org/";"rails"
I need a query to count the number of times that a tag is used for a url.
Thank you for your help.
This query should work for you:
SELECT tag, url, count(tag) FROM table GROUP BY tag, url
Haven't tested it for you though.
Is this what you are looking for?
SELECT COUNT(tag) FROM TABLENAME
WHERE tag='sometag'
I think it's actually more like SELECT tag, COUNT(tag) FROM TABLENAME WHERE URL='someurl' GROUP BY tag
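A sketch of counting tag usage per url with GROUP BY tag, url, run on SQLite via Python's sqlite3 as a stand-in for MySQL. The table name bookmarks is hypothetical (TABLE is a reserved word). The sample uses the question's five rows plus two hypothetical rows from a made-up second user, so that some counts exceed 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookmarks (id INTEGER PRIMARY KEY, user_id TEXT, url TEXT, tag TEXT)"
)
u1 = "12c2763095ec44e498f870ed67ee948d"
sample = [
    (38, u1, "http://forkjavascript.org/", "ajax"),
    (39, u1, "http://forkjavascript.org/", "api"),
    (40, u1, "http://forkjavascript.org/", "javascript"),
    (41, u1, "http://forkjavascript.org/", "library"),
    (42, u1, "http://forkjavascript.org/", "rails"),
    # Hypothetical rows from a made-up second user, so some counts exceed 1.
    (43, "hypothetical-user-2", "http://forkjavascript.org/", "ajax"),
    (44, "hypothetical-user-2", "http://forkjavascript.org/", "javascript"),
]
conn.executemany("INSERT INTO bookmarks VALUES (?, ?, ?, ?)", sample)

# One output row per (url, tag) pair, with the number of times that tag
# was applied to that url across all users.
counts = conn.execute(
    """
    SELECT url, tag, COUNT(*) AS uses
    FROM bookmarks
    GROUP BY url, tag
    ORDER BY url, uses DESC, tag
    """
).fetchall()

for row in counts:
    print(row)
```

Adding WHERE url = ? before the GROUP BY restricts the same query to a single url, which matches the last suggestion above.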