SQL Order by posts containing occurrences of string in another table - sql

So I have a table full of tag keywords.
Tags
------------------
asp
sql
html
and a table full of posts
posts
------------------
I really like ASP
This week with stuff about ASP
This post contains SQL
I want to display the top 10 tags in order of most unique posts containing occurrences in the posts table.
This is what I have but it is rubbish, I am ashamed.
SELECT Tag,(SELECT Count(*) FROM Posts WHERE post LIKE '%Tags.Tag%') As Mentions FROM Tags ORDER BY Mentions DESC
Please help! I know there is some sort of mystical UNION or GROUP BY I am missing here.

SELECT TOP 10 *
FROM (
SELECT tag,COUNT(tag) AS Total
FROM tags t
JOIN posts p ON p.post LIKE '%' + t.tag + '%'
GROUP BY tag
) totals
ORDER BY Total Desc

Tested in MySQL:
SELECT tag,count(*) AS n
FROM tags
JOIN posts ON post LIKE CONCAT("%",tag,"%")
GROUP BY tag
ORDER BY n DESC
LIMIT 10
To illustrate how this works, it is instructive to run first without the GROUP BY
SELECT tag,post
FROM tags
JOIN posts ON post LIKE CONCAT("%",tag,"%");
+------+------------------------+
| tag  | post                   |
+------+------------------------+
| asp  | I really like asp      |
| asp  | this week: asp         |
| sql  | this post contains sql |
+------+------------------------+
3 rows in set (0.00 sec)
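A minimal sketch of the same join-then-group idea, runnable with Python's built-in sqlite3 module. SQLite is only a stand-in for MySQL here (an assumption, not what the answer was tested on): it concatenates with || instead of CONCAT, and its LIKE is case-insensitive for ASCII text by default. The sample rows are the ones from the question.

```python
import sqlite3

# In-memory database populated with the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (tag TEXT);
CREATE TABLE posts (post TEXT);
INSERT INTO tags VALUES ('asp'), ('sql'), ('html');
INSERT INTO posts VALUES
  ('I really like ASP'),
  ('This week with stuff about ASP'),
  ('This post contains SQL');
""")

# The join yields one row per (tag, post) pair where the post contains
# the tag; grouping by tag then counts the matching posts.
rows = conn.execute("""
SELECT tag, COUNT(*) AS n
FROM tags
JOIN posts ON post LIKE '%' || tag || '%'
GROUP BY tag
ORDER BY n DESC
LIMIT 10
""").fetchall()

print(rows)
```

Because this is an inner join, a tag that matches no post (html in this data) does not appear in the result at all; a LEFT JOIN with COUNT(post) would list it with a count of 0.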

Related

Why does adding a SUM(column) throw a group by error [SQL]

I found some similar questions, but none of the solutions would work, nor did they explain what was causing the issue.
I have a working query
SELECT pages.pageString pageName, timeSpent
FROM
(SELECT `page_id`, SUM(`time_spent`) as timeSpent
FROM `pageViews`
WHERE `time_spent` > 0
GROUP BY `page_id`) myTable
JOIN pages ON pages.id = page_id
ORDER BY timeSpent DESC
LIMIT 5
This returns results that look like
+------------------------------+-----------+
| pageName | timeSpent |
+------------------------------+-----------+
| page 1 | 394292 |
| page 2 | 66990 |
| page 3 | 53896 |
| page 4 | 37796 |
| page 5 | 14982 |
+------------------------------+-----------+
I'd like to add a column containing the percentage of timeSpent relative to the other pages. To start, I added a SUM(timeSpent) to my query, but that throws an error:
In aggregated query without GROUP BY, expression #1 of SELECT list contains nonaggregated column 'pages.pageString'
I'm not sure why this column is affected by adding the new column to the select statement.
Sadly any solution involving changing sql settings won't work due to company policy.
I appreciate any advice
UPDATE
The failing sql statement is
SELECT pages.pageString pageName, timeSpent FROM
(SELECT `page_id`, SUM(`time_spent`) as timeSpent FROM
`pageViews` WHERE `time_spent` > 0 GROUP BY `page_id`) myTable
JOIN pages ON pages.id = page_id ORDER BY timeSpent DESC LIMIT 5
As per the first answer I added a GROUP BY, which resolves the error
SELECT pages.pageString pageName, timeSpent, SUM(timeSpent) FROM
(SELECT `page_id`, SUM(`time_spent`) as timeSpent FROM `pageViews` WHERE `time_spent` > 0 GROUP BY `page_id`) myTable
JOIN pages ON pages.id = page_id GROUP BY pageName ORDER BY timeSpent DESC LIMIT 5
This however does not give the proper output
+------------------------------+-----------+----------------+
| pageName | timeSpent | SUM(timeSpent) |
+------------------------------+-----------+----------------+
| page 1. | 390210 | 390210 |
| page 2 | 66972 | 66972 |
| page3 | 52332 | 52332 |
| page4 | 25454 | 25454 |
| page5 | 13552 | 13552 |
+------------------------------+-----------+----------------+
Ideally this SUM(timeSpent) would be 390210 + 66972 + 52332 + 25454 + 13552, so that I can compute timeSpent / SUM(timeSpent)
You did not say where you tried to put the sum(timeSpent), but I believe it can be reconstructed from the error message:
In aggregated query without GROUP BY, expression #1 of SELECT list contains nonaggregated column 'pages.pageString'
It says what the problem is: you added sum(timeSpent) to the projection, but the SQL statement has no GROUP BY. In particular, it names the first select item that is not aggregated, pages.pageString.
It would mention the other ones too, once you fix this one.
On the other hand, please make sure you post exactly the failing SQL statement instead of trying to describe how to get the error you have. It's better for us who try to help.
Update:
You have two tables/views pages and pageViews. The first one is used to get the page name. I would just focus on the time calculation to make things easier. Figuring out the name afterwards is simple, because it is directly connected to the page_id.
1. The first piece of information you want is the sum of all times spent, so that you can calculate each page's ratio to this sum. This is simply an aggregation where you sum the times over all pages.
2. The second piece of information you want is the sum of the times per page_id. You already know how to do that: you group by page_id while aggregating the sum for each.
3. Now try to put those two together. The result of the first statement has to be applied to each row of the second statement, so that you get a table of the form page_id, time_spent_page, time_spent_all.
4. Once you have step 3, it is easy to add the page_name, since you have the page_id that is required for a simple join.
I tried not to give away the solution. Maybe you would like to try again following the steps above. If you have difficulties, simply leave a comment (maybe showing how far you got).
It might look complex in the beginning, but once you have done that successfully I hope you'll see that it can be simple.
Adding a column containing the percentage of timeSpent relative to the sum of all pages
SELECT pages.pageString pageName, timeSpent
, timeSpent / sum(timeSpent) over() * 100 p
FROM
(SELECT `page_id`, SUM(`time_spent`) as timeSpent
FROM `pageViews`
WHERE `time_spent` > 0
GROUP BY `page_id`) myTable
JOIN pages ON pages.id = page_id
ORDER BY timeSpent DESC
LIMIT 5
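The window-function approach can be tried out with a small sketch like the following, again using Python's sqlite3 as a stand-in engine (an assumption; the thread is about MySQL, where window functions need version 8.0 or later, and SQLite needs 3.25 or later). The sample rows are invented for illustration, and the expression multiplies by 100.0 first so the division is done in floating point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pages (id INTEGER PRIMARY KEY, pageString TEXT);
CREATE TABLE pageViews (page_id INTEGER, time_spent INTEGER);
INSERT INTO pages VALUES (1, 'page 1'), (2, 'page 2'), (3, 'page 3');
-- Hypothetical sample data: totals are 60, 30 and 10.
INSERT INTO pageViews VALUES
  (1, 40), (1, 20), (2, 30), (3, 10), (3, 0);
""")

# SUM(...) OVER () is evaluated over every row of the joined result,
# before ORDER BY / LIMIT, so each row sees the grand total.
pct_rows = conn.execute("""
SELECT pages.pageString AS pageName,
       timeSpent,
       timeSpent * 100.0 / SUM(timeSpent) OVER () AS pct
FROM (SELECT page_id, SUM(time_spent) AS timeSpent
      FROM pageViews
      WHERE time_spent > 0
      GROUP BY page_id) AS myTable
JOIN pages ON pages.id = myTable.page_id
ORDER BY timeSpent DESC
LIMIT 5
""").fetchall()

print(pct_rows)
```

Note that the percentages are relative to all pages, not just the five that survive the LIMIT, because the window is computed before the limit is applied.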

PostgreSQL finding the 3 most popular articles in a news database

I'm currently trying to find the 3 most popular articles in a database. I want to print out the title and amount of views for each. I know I'll have to join two of the tables together (articles & log) in order to do so.
The articles table has a column of the titles, and one with a slug for the title.
The log table has a column of the paths in the format of /article/'slug'.
How would I join these two tables, filter out the path to compare to the slug column of the articles table, and use count to display the number of times it was viewed?
The correct query used was:
SELECT title, count(*) as views
FROM articles a, log l
WHERE a.slug=substring(l.path, 10)
GROUP BY title
ORDER BY views DESC
LIMIT 3;
If I understood you correctly you just need to join two tables based on one column using aggregation. The catch is that you can't compare them directly but have to use some string functions before.
Assuming a schema like this:
article
| title | slug |
-------------------
| title1 | myslug |
| title2 | myslug |
log
| path |
--------------------------
| /article/'myslug' |
| /article/'unmentioned' |
Try out something like the following:
select title, count(*) from article a join log l on concat('''', a.slug, '''') = substring(l.path, 10) group by title;
For more complex queries it can be helpful to first write smaller queries that help you build up the whole query later. For example, just check whether the string functions return what you expect:
select substring(l.path, 10) from log l;
select concat('''', a.slug, '''') from article a;
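As a rough, runnable illustration of joining on a derived string, here is a sketch using Python's sqlite3 in place of PostgreSQL (an assumption). It follows the asker's final query, so it assumes paths are stored without quotes, e.g. /article/first-post; the table contents are made up. SQLite spells the function substr, and position 10 is the first character after the 9-character prefix /article/.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (title TEXT, slug TEXT);
CREATE TABLE log (path TEXT);
-- Hypothetical sample rows.
INSERT INTO articles VALUES
  ('First', 'first-post'), ('Second', 'second-post');
INSERT INTO log VALUES
  ('/article/first-post'), ('/article/first-post'),
  ('/article/second-post'), ('/');
""")

# Strip the '/article/' prefix from the path and match the remainder
# against the slug; rows such as '/' simply find no partner.
view_rows = conn.execute("""
SELECT a.title, COUNT(*) AS views
FROM articles AS a
JOIN log AS l ON a.slug = substr(l.path, 10)
GROUP BY a.title
ORDER BY views DESC
LIMIT 3
""").fetchall()

print(view_rows)
```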

Return a grouped list with occurrences using Rails and PostgreSQL

I have a list of Tags in my rails application which are linked to Posts using Taggings. In some views, I'd like to show a list of the 5 most commonly used tags, together with the times they have been tagged. To create a complete example, assume a table with 3 posts:
POSTS
ID | title
1 | Lorem
2 | Ipsum
3 | Dolor
And a table with 2 Tags
TAGS
ID | name
1 | Tag1
2 | Tag2
Now, if Post 1 is tagged with Tag1 and post 2 is tagged with tags 1 and 2, our taggings table looks like this:
TAGGINGS
tag_id | post_id
1 | 1
1 | 2
2 | 2
Then the question is how to fetch the required information (preferably without going back to the DB multiple times) to display the following result:
Commonly used tags:
tag1 (2 times)
tag2 (1 time)
I've managed to do this using MySQL by including the tag, grouping by tag_id and ordering by the count:
Ruby:
taggings = Tagging.select("*, count(tag_id) AS count").includes(:tag).group(:tag_id).order('count(*) DESC').limit(5)
taggings.each { |t| puts "#{t.tag.name} (#{t.count})" }
However, I'm moving the app to Heroku, and therefore I'm required to move to PostgreSQL 9.1. Unfortunately the strictness of Postgres breaks that query because it requires all fields to be specified in the group by clause. I've tried going that way, but it has resulted in the fact that I can't use t.count anymore to get the count of the rows.
So, to finally get to the question:
What is the best way to query this kind of information from postgres (v9.1) and display it to the user?
Your problem:
Unfortunately the strictness of Postgres breaks that query because it requires all fields to be specified in the group by clause.
Now, that has changed somewhat with PostgreSQL 9.1 (quoting release notes of 9.1):
Allow non-GROUP BY columns in the query target list when the primary
key is specified in the GROUP BY clause (Peter Eisentraut)
What's more, the basic query you describe would not even run into this:
Show a list of the 5 most commonly used tags, together
with the times they have been tagged.
SELECT tag_id, count(*) AS times
FROM taggings
GROUP BY tag_id
ORDER BY times DESC
LIMIT 5;
Works in any case.
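If the tag name is wanted in the same round trip, one option is to join tags and group by its key and name. The sketch below uses Python's sqlite3 as a stand-in for PostgreSQL (an assumption) with the sample rows from the question; the table and column names follow the question's TAGS and TAGGINGS layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE taggings (tag_id INTEGER, post_id INTEGER);
INSERT INTO tags VALUES (1, 'Tag1'), (2, 'Tag2');
INSERT INTO taggings VALUES (1, 1), (1, 2), (2, 2);
""")

# Every selected non-aggregate column is listed in GROUP BY, so the
# query is valid under strict grouping rules as well.
tag_rows = conn.execute("""
SELECT tags.name, COUNT(*) AS times
FROM taggings
JOIN tags ON tags.id = taggings.tag_id
GROUP BY tags.id, tags.name
ORDER BY times DESC
LIMIT 5
""").fetchall()

for name, times in tag_rows:
    print(f"{name} ({times})")
```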

Access join on first record

I have two tables in an Access database, tblProducts and tblProductGroups.
I am trying to run a query that joins both of these tables and brings back a single record for each product. The problem is that the current design allows a product to be listed in the tblProductGroups table more than once, i.e. a product can be a member of more than one group (I didn't design this!)
The query is this:
select tblProducts.intID, tblProducts.strTitle, tblProductGroups.intGroup
from tblProducts
inner join tblProductGroups on tblProducts.intID = tblProductGroups.intProduct
where tblProductGroups.intGroup = 56
and tblProducts.blnActive
order by tblProducts.intSort asc, tblProducts.curPrice asc
At the moment this returns results such as:
intID | strTitle | intGroup
1 | Product 1 | 1
1 | Product 1 | 2
2 | Product 2 | 1
2 | Product 2 | 2
Whereas I only want the join to be based on the first matching record, so that would return:
intID | strTitle | intGroup
1 | Product 1 | 1
2 | Product 2 | 1
Is this possible in Access?
Thanks in advance
Al
This option runs a subquery to find the minimum intGroup for each tblProducts.intID.
SELECT tblProducts.intID
, tblProducts.strTitle
, (SELECT TOP 1 intGroup
FROM tblProductGroups
WHERE intProduct=tblProducts.intID
ORDER BY intGroup ASC) AS intGroup
FROM tblProducts
WHERE tblProducts.blnActive
ORDER BY tblProducts.intSort ASC, tblProducts.curPrice ASC
This works for me. Maybe this helps someone:
SELECT
a.Lagerort_ID,
FIRST(a.Regal) AS frstRegal,
FIRST(a.Fachboden) AS frstFachboden,
FIRST(a.xOffset) AS frstxOffset,
FIRST(a.yOffset) AS frstyOffset,
FIRST(a.xSize) AS frstxSize,
FIRST(a.ySize) AS frstySize,
FIRST(a.Platzgr) AS frstyPlatzgr,
FIRST(b.Artikel_ID) AS frstArtikel_ID,
FIRST(b.Menge) AS frstMenge,
FIRST(c.Breite) AS frstBreite,
FIRST(c.Tiefe) AS frstTiefe,
FIRST(a.Fachboden_ID) AS frstFachboden_ID,
FIRST(b.BewegungsDatum) AS frstBewegungsDatum,
FIRST(b.ErzeugungsDatum) AS frstErzeugungsDatum
FROM ((Lagerort AS a)
LEFT JOIN LO_zu_ART AS b ON a.Lagerort_ID = b.Lagerort_ID)
LEFT JOIN Regal AS c ON a.Regal = c.Regal
GROUP BY a.Lagerort_ID
ORDER BY FIRST(a.Regal), FIRST(a.Fachboden), FIRST(a.xOffset), FIRST(a.yOffset);
I have non unique entries for Lagerort_ID on the table LO_zu_ART. My goal was to only use the first found entry from LO_zu_ART to match into Lagerort.
The trick is to use FIRST() on every column except the grouped one. This may also work with MIN() or MAX(), but I have not tested it.
Also make sure that the alias you give each field with "AS" differs from the original field name. I used frstFIELDNAME. This is important; otherwise I got errors.
Create a new query, qryFirstGroupPerProduct:
SELECT intProduct, Min(intGroup) AS lowest_group
FROM tblProductGroups
GROUP BY intProduct;
Then JOIN qryFirstGroupPerProduct (instead of tblProductGroups) to tblProducts.
Or you could do it as a subquery instead of a separate saved query, if you prefer.
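The subquery variant might look like the sketch below. It runs on Python's sqlite3 rather than Access (an assumption made so the example is runnable), with the sample rows from the question; the derived table plays the role of qryFirstGroupPerProduct.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblProducts (intID INTEGER PRIMARY KEY, strTitle TEXT);
CREATE TABLE tblProductGroups (intProduct INTEGER, intGroup INTEGER);
INSERT INTO tblProducts VALUES (1, 'Product 1'), (2, 'Product 2');
INSERT INTO tblProductGroups VALUES (1, 1), (1, 2), (2, 1), (2, 2);
""")

# The derived table collapses each product to its lowest group, so the
# outer join can match at most one group row per product.
first_rows = conn.execute("""
SELECT p.intID, p.strTitle, g.lowest_group
FROM tblProducts AS p
JOIN (SELECT intProduct, MIN(intGroup) AS lowest_group
      FROM tblProductGroups
      GROUP BY intProduct) AS g
  ON g.intProduct = p.intID
ORDER BY p.intID
""").fetchall()

print(first_rows)
```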
It's not very optimal, but if you're bringing in a few thousand records this will work:
Create a query that gets the max of tblProducts.intID from one table and call it qry_Temp.
Create another query and join qry_temp to the table you are trying to join against, and you should get your results.

How do you concat multiple rows into one column in SQL Server?

I've searched high and low for the answer to this, but I can't figure it out. I'm relatively new to SQL Server and don't quite have the syntax down yet. I have this data structure (simplified):
Table "Users" | Table "Tags":
UserID UserName | TagID UserID PhotoID
1 Bob | 1 1 1
2 Bill | 2 2 1
3 Jane | 3 3 1
4 Sam | 4 2 2
-----------------------------------------------------
Table "Photos": | Table "Albums":
PhotoID UserID AlbumID | AlbumID UserID
1 1 1 | 1 1
2 1 1 | 2 3
3 1 1 | 3 2
4 3 2 |
5 3 2 |
I'm looking for a way to get all the photo info (easy) plus all the tags for that photo concatenated, like CONCAT(username, ', ') AS Tags, of course with the last comma removed. I'm having a bear of a time trying to do this. I've tried the method in this article but I get an error when I try to run the query saying that I can't use DECLARE statements... do you guys have any idea how this can be done? I'm using VS08 and whatever DB is installed in it (I normally use MySQL so I don't know what flavor of DB this really is... it's an .mdf file?)
Ok, I feel like I need to jump in to comment about How do you concat multiple rows into one column in SQL Server? and provide a more preferred answer.
I'm really sorry, but using scalar-valued functions like this will kill performance. Just open SQL Profiler and have a look at what's going on when you use a scalar-function that calls a table.
Also, the "update a variable" technique for concatenation is not encouraged, as that functionality might not continue in future versions.
The preferred way of doing string concatenation is to use FOR XML PATH instead.
select
stuff((select ', ' + t.tag from tags t where t.photoid = p.photoid order by tag for xml path('')),1,2,'') as taglist
,*
from photos p
order by photoid;
For examples of how FOR XML PATH works, consider the following, imagining that you have a table with two fields called 'id' and 'name'
SELECT id, name
FROM table
order by name
FOR XML PATH('item'),root('itemlist')
;
Gives:
<itemlist><item><id>2</id><name>Aardvark</name></item><item><id>1</id><name>Zebra</name></item></itemlist>
But if you leave out the ROOT, you get something slightly different:
SELECT id, name
FROM table
order by name
FOR XML PATH('item')
;
<item><id>2</id><name>Aardvark</name></item><item><id>1</id><name>Zebra</name></item>
And if you put an empty PATH string, you get even closer to ordinary string concatenation:
SELECT id, name
FROM table
order by name
FOR XML PATH('')
;
<id>2</id><name>Aardvark</name><id>1</id><name>Zebra</name>
Now comes the really tricky bit... If you name a column starting with an @ sign, it becomes an attribute, and if a column doesn't have a name (or you call it [*]), then it leaves out that tag too:
SELECT ',' + name
FROM table
order by name
FOR XML PATH('')
;
,Aardvark,Zebra
Now finally, to strip the leading comma, the STUFF command comes in. STUFF(s,x,n,s2) pulls out n characters of s, starting at position x. In their place, it puts s2. So:
SELECT STUFF('abcde',2,3,'123456');
gives:
a123456e
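To make the STUFF semantics concrete, here is a small Python model of the function as described above. The name stuff and its parameter names are hypothetical, and the sketch ignores T-SQL edge cases such as NULL arguments or a start position outside the string.

```python
def stuff(s: str, start: int, length: int, replacement: str) -> str:
    """Model of T-SQL STUFF: delete `length` characters of `s` beginning
    at 1-based position `start`, then insert `replacement` there."""
    i = start - 1  # convert the 1-based SQL position to a 0-based index
    return s[:i] + replacement + s[i + length:]


# The example from the text: 'bcd' is removed and '123456' takes its place.
print(stuff("abcde", 2, 3, "123456"))

# Stripping the leading ', ' from a concatenated list, which is how the
# tag list query uses it.
print(stuff(", html, sql", 1, 2, ""))
```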
So now have a look at my query above for your taglist.
select
stuff((select ', ' + t.tag from tags t where t.photoid = p.photoid order by tag for xml path('')),1,2,'') as taglist
,*
from photos p
order by photoid;
For each photo, I have a subquery which grabs the tags and concatenates them (in order) with a comma and a space. Then I wrap that subquery in a STUFF call to strip the leading comma and space.
I apologise for any typos - I haven't actually created the tables on my own machine to test this.
Rob
I'd create a UDF:
create function dbo.GetTags(@PhotoID int) returns varchar(max)
as
begin
declare @mytags varchar(max)
set @mytags = ''
select @mytags = @mytags + ', ' + tag from tags where photoid = @PhotoID
return substring(@mytags, 3, 8000)
end
Then, all you have to do is:
select dbo.GetTags(photoID) as tagList from photos
Street_Name ; Street_Code
west | 14
east | 7
west+east | 714
If I want to show two different rows concatenated into one, how can I do it?
(I mean the last row is what I want the select to return. My table only has the first and second records.)