I have a very large database with about 120 million records in one table. I have to clean up the data in this table before I divide it into several tables (possibly normalizing it). The columns of this table are as follows: "id (primary key), userId, url, tag". This is basically a subset of the dataset from the Delicious website. As I said, each row has an id, a userId, a url and only one tag. A bookmark on Delicious consists of several tags for a single URL, so it corresponds to several rows in my database. For example:
"id"; "user" ;"url" ;"tag"
"38";"12c2763095ec44e498f870ed67ee948d";"http://forkjavascript.org/";"ajax"
"39";"12c2763095ec44e498f870ed67ee948d";"http://forkjavascript.org/";"api"
"40";"12c2763095ec44e498f870ed67ee948d";"http://forkjavascript.org/";"javascript"
"41";"12c2763095ec44e498f870ed67ee948d";"http://forkjavascript.org/";"library"
"42";"12c2763095ec44e498f870ed67ee948d";"http://forkjavascript.org/";"rails"
I need a query to count the number of times that a tag is used for a url.
Thank you for your help.
This query should work for you:
SELECT tag, url, count(tag) FROM table GROUP BY tag, url
I haven't tested it, though.
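To make the GROUP BY answer above concrete, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. The table name `bookmarks`, the short user ids and the sample rows are assumptions made up for illustration; the real table and its 120 million rows are not reproduced here.

```python
import sqlite3

# In-memory database; "bookmarks" is a hypothetical name for the table in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bookmarks (id INTEGER PRIMARY KEY, userId TEXT, url TEXT, tag TEXT)"
)

# Made-up sample rows: two users tag the same URL with 'ajax'.
rows = [
    (38, "u1", "http://forkjavascript.org/", "ajax"),
    (39, "u1", "http://forkjavascript.org/", "api"),
    (40, "u2", "http://forkjavascript.org/", "ajax"),
    (41, "u2", "http://example.org/", "ajax"),
]
conn.executemany("INSERT INTO bookmarks VALUES (?, ?, ?, ?)", rows)

# One result row per (tag, url) pair, with the number of times that pair occurs.
tag_counts = conn.execute(
    """
    SELECT tag, url, COUNT(*) AS uses
    FROM bookmarks
    GROUP BY tag, url
    ORDER BY uses DESC, tag, url
    """
).fetchall()

for tag, url, uses in tag_counts:
    print(tag, url, uses)
```

Grouping by both columns is what gives a count per tag and URL pair; grouping by tag alone would count a tag across all URLs.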
Is this what you are looking for?
SELECT COUNT(tag) FROM TABLENAME
WHERE tag='sometag'
I think it's actually more like SELECT tag, COUNT(tag) FROM TABLENAME WHERE URL='someurl' GROUP BY tag
Related
I am creating a website where you can share posts with multiple tags. I have run into the problem that a post is shown multiple times, once for each tag. In my database I have a table posts and a table tags, which links to the post through post_id. My question is: how can I get each post only once, together with all of its tags?
This should work, edit column names if needed:
SELECT *,
(SELECT GROUP_CONCAT(DISTINCT tag) FROM tags WHERE post_id = posts.id)
FROM posts
You can use GROUP_CONCAT() in MySQL and string_agg() in MS SQL Server
SELECT posts.id,GROUP_CONCAT(tags.tag)
FROM posts
LEFT JOIN tags ON tags.post_id = posts.id
GROUP BY posts.id
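As a rough illustration of the GROUP_CONCAT approach, the sketch below uses SQLite (which also provides GROUP_CONCAT) from Python. The `title` column and the sample rows are assumptions added for the example; only `posts.id`, `tags.post_id` and `tags.tag` come from the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);  -- title is hypothetical
    CREATE TABLE tags (post_id INTEGER, tag TEXT);
    INSERT INTO posts VALUES (1, 'First post'), (2, 'Untagged post');
    INSERT INTO tags VALUES (1, 'sql'), (1, 'mysql');
    """
)

# LEFT JOIN keeps posts without tags; GROUP BY collapses each post to one row.
posts_with_tags = conn.execute(
    """
    SELECT posts.id, posts.title, GROUP_CONCAT(tags.tag) AS tag_list
    FROM posts
    LEFT JOIN tags ON tags.post_id = posts.id
    GROUP BY posts.id
    ORDER BY posts.id
    """
).fetchall()

for post_id, title, tag_list in posts_with_tags:
    # tag_list is None for a post that has no tags at all.
    print(post_id, title, tag_list)
```

Each post comes back exactly once. The order of the tags inside the concatenated string is not guaranteed unless the database offers a way to specify it.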
In a postgres db, I need to find, extract and count URLs embedded in a text column. (Pseudocode)
SELECT id,
body,
xxx? AS the_url,
COUNT(DISTINCT(the_url)) as count
FROM messages
WHERE body LIKE '%://%'
GROUP BY the_url;
How do I accomplish that?
You can use a FILTER combined with count(*) to count how many records in your table contain a certain pattern, e.g.
SELECT count(*) FILTER (WHERE body LIKE '%://%')
FROM messages;
If you want to count the occurrences of a certain text inside a column, you could try this (note that string_to_array takes a literal delimiter, not a LIKE pattern):
SELECT id, body,
array_length(string_to_array(body, '://'), 1) - 1 AS occurrences
FROM messages;
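The idea of counting how often a marker such as `://` appears in each row can be tried out without a Postgres server. The sketch below swaps in a different technique, because SQLite has no string_to_array: it compares the length of the text before and after removing the marker. The sample messages are made up, and this only counts markers; it does not extract the URLs themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [
        (1, "see http://a.example and https://b.example"),
        (2, "no links here"),
        (3, "ftp://c.example only"),
    ],
)

# Characters removed by REPLACE, divided by the marker length,
# equals the number of times the marker occurs in the row.
marker_counts = conn.execute(
    """
    SELECT id,
           (LENGTH(body) - LENGTH(REPLACE(body, '://', ''))) / LENGTH('://') AS n_markers
    FROM messages
    ORDER BY id
    """
).fetchall()

print(marker_counts)
```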
It is a classic question, and I know there are many workarounds, like here: Select a Column in SQL not in Group By, but they do not work for my issue in BigQuery.
I have a table with tweets from Twitter and I want a ranking for the urls including any tweet text.
ID  tweet            url
1   my github tweet  http://www.github.com/xyz
2   RT github tweet  http://www.github.com/xyz
3   another tweet    http://www.twitter.com
4   more tweeting    http://www.github.com/abc
I tried the following query, but then id 1 and 2 are counted separately.
SELECT tweet, count(url) as popularity, url FROM table group by tweet, url order by popularity desc
How can I count/rank the urls correctly and still preserve any associated tweet text in the result? I do not care if it is from ID 1 or 2.
Here is one approach:
SELECT url, COUNT(*) AS popularity, GROUP_CONCAT(tweet)
FROM Table GROUP BY url ORDER BY popularity DESC
The GROUP_CONCAT aggregate function concatenates all the tweets associated with the same URL, using a comma as the separator (you can pick another separator as the second parameter to GROUP_CONCAT).
I'm not sure whether this will work with BigQuery, as I have no experience with it, but this is a plain SQL solution that may work for you.
Get the count of URLs in a subquery and then join it with the table on url:
select t.id,t.tweet,t.url,q.popularity
from table t
join
(SELECT url, count(url) as popularity
FROM table group by url) q
on t.url=q.url
order by q.popularity desc
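Here is a small sketch of the join-with-subquery answer, run against SQLite from Python rather than BigQuery, so dialect details may differ. The table name `tweets` is an assumption (the question only calls it "table"), and the second query, which picks one tweet per URL with MIN, is an extra variant that does not appear in the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, tweet TEXT, url TEXT)")
conn.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [
        (1, "my github tweet", "http://www.github.com/xyz"),
        (2, "RT github tweet", "http://www.github.com/xyz"),
        (3, "another tweet", "http://www.twitter.com"),
        (4, "more tweeting", "http://www.github.com/abc"),
    ],
)

# Every tweet, annotated with how many tweets share its URL.
ranked = conn.execute(
    """
    SELECT t.id, t.tweet, t.url, q.popularity
    FROM tweets AS t
    JOIN (SELECT url, COUNT(*) AS popularity FROM tweets GROUP BY url) AS q
      ON t.url = q.url
    ORDER BY q.popularity DESC, t.id
    """
).fetchall()

# Variant (an assumption, not from the answers): one row per URL with an
# arbitrary representative tweet, chosen here with MIN.
per_url = conn.execute(
    """
    SELECT url, COUNT(*) AS popularity, MIN(tweet) AS sample_tweet
    FROM tweets
    GROUP BY url
    ORDER BY popularity DESC, url
    """
).fetchall()

print(ranked)
print(per_url)
```

The join keeps one row per tweet, so a URL shared by two tweets still appears twice, each time with popularity 2. The second query returns one row per URL, which may be closer to a ranking.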
I'm modifying a query I have that pulls news items from my database. These news items have tags, which are stored in the database in a single column as a comma-separated string.
For example:
'content,video,featured video,foo'
What I'm trying to do is grab all the items in the table but not the items that contain 'video' in the tags string, unless the tag string also contains 'featured video'
What is the best way to do this?
Here is my query:
SELECT *
FROM posts
WHERE status = 2
ORDER BY postDate
What I'm offering is horrible, but if you want to stick to your table structure, you could try the following (the parentheses matter, because AND binds more tightly than OR):
SELECT * FROM posts
WHERE status = 2 AND
(INSTR(tags, 'featured video') > 0
OR
INSTR(tags, 'video') = 0)
At least use a FULLTEXT index on that field, so it won't be this painful to use.
Probably not the best way to do it...
SELECT *
FROM posts
WHERE status = 2
AND postID NOT IN
(SELECT postID FROM posts
WHERE tag LIKE '%Video%'
AND tag NOT LIKE '%Featured Video%')
I am assuming PostId to be the PK from the posts table btw...
I would use a query like this:
SELECT
*
FROM
posts
WHERE
status = 2
AND (CONCAT(',', tags, ',') LIKE '%,featured video,%'
OR CONCAT(',', tags, ',') NOT LIKE '%,video,%')
ORDER BY
postDate
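The delimiter-wrapping trick in this last query can be checked with a small sketch. It runs on SQLite from Python, so the `||` operator stands in for CONCAT, and it orders by id because the sample table has no postDate column. The table layout and all rows are assumptions chosen to cover the interesting cases, including a tag ('videography') that merely contains the word 'video'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, status INTEGER, tags TEXT)")
conn.executemany(
    "INSERT INTO posts VALUES (?, ?, ?)",
    [
        (1, 2, "content,video,featured video,foo"),  # video + featured video: keep
        (2, 2, "content,video"),                     # plain video: drop
        (3, 2, "content,news"),                      # no video at all: keep
        (4, 1, "content,news"),                      # wrong status: drop
        (5, 2, "videography"),                       # not the 'video' tag: keep
    ],
)

# Wrapping the list in commas lets each tag be matched as a whole item,
# so 'video' no longer matches inside 'featured video' or 'videography'.
kept_ids = [
    row[0]
    for row in conn.execute(
        """
        SELECT id
        FROM posts
        WHERE status = 2
          AND ((',' || tags || ',') LIKE '%,featured video,%'
               OR (',' || tags || ',') NOT LIKE '%,video,%')
        ORDER BY id
        """
    )
]

print(kept_ids)
```

A plain substring test on 'video' would also drop row 5, which is the main reason to match on the delimiters.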
The tagging table has 3 columns: id (the primary key), tag, and resource.
I want to select the tags that are associated with at least 3 resources. A resource can be associated several times with the same tag, so a single GROUP BY is not enough.
My current SQL query is the following:
SELECT tag FROM
(SELECT resource, tag FROM tagging GROUP BY resource, tag) AS tagging
GROUP BY tag HAVING count(*) > 2;
I need to convert this query to HQL, and HQL does not accept subqueries in the FROM clause.
Is there a (fast) way to do the same thing without using a subquery, or with a subquery in the WHERE clause?
Thank you
To find tags that are associated with more than 2 different resources you can use
SELECT tag
FROM tagging
GROUP BY tag
HAVING count(DISTINCT resource) > 2;
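To check that the COUNT(DISTINCT ...) form gives the same result as the original subquery, here is a minimal sketch in plain SQL on SQLite, driven from Python. It does not involve HQL or Hibernate, and the sample rows are invented: 'java' has four rows but only two distinct resources, so it should be excluded.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tagging (id INTEGER PRIMARY KEY, tag TEXT, resource TEXT)")
conn.executemany(
    "INSERT INTO tagging (tag, resource) VALUES (?, ?)",
    [
        ("sql", "r1"), ("sql", "r1"), ("sql", "r2"), ("sql", "r3"),
        ("java", "r1"), ("java", "r1"), ("java", "r1"), ("java", "r2"),
    ],
)

# Original form: deduplicate (resource, tag) pairs in a subquery, then count.
with_subquery = [
    row[0]
    for row in conn.execute(
        """
        SELECT tag FROM
          (SELECT resource, tag FROM tagging GROUP BY resource, tag) AS dedup
        GROUP BY tag HAVING COUNT(*) > 2
        ORDER BY tag
        """
    )
]

# Subquery-free form: count distinct resources per tag directly.
popular_tags = [
    row[0]
    for row in conn.execute(
        """
        SELECT tag
        FROM tagging
        GROUP BY tag
        HAVING COUNT(DISTINCT resource) > 2
        ORDER BY tag
        """
    )
]

print(with_subquery, popular_tags)
```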