How can I select only rows with multiple hits for a specific column? - sql

I am not sure how to phrase this question so I'll give an example:
Suppose there is a table called tagged that has two columns: tagger and taggee. What would the SQL query look like to return the taggee(s) that are in multiple rows? That is to say, they have been tagged 2 or more times by any tagger.
I would like a 'generic' SQL query and not something that only works on a specific DBMS.
EDIT: Added "tagged 2 or more times by any tagger."

HAVING can operate on the result of aggregate functions. So if you have data like this:
Row tagger | taggee
--------+----------
1. Joe | Cat
2. Fred | Cat
3. Denise | Dog
4. Joe | Horse
5. Denise | Horse
It sounds like you want Cat, Horse.
To get the taggees that are in multiple rows, you would execute:
SELECT taggee, count(*) FROM tagged GROUP BY taggee HAVING count(*) > 1
That being said, when you say "select only rows with multiple hits for a specific column", which row do you want? Do you want row 1 for Cat, or row 2?
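As a quick check of the HAVING approach, here is a minimal sketch that loads the sample rows above into an in-memory SQLite database via Python's sqlite3 module (the choice of SQLite and the variable names are assumptions made for illustration; the SQL itself is plain GROUP BY / HAVING). The second query shows one possible way to get back every original row for the repeated taggees, if that is what is wanted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tagged (tagger TEXT, taggee TEXT)")
conn.executemany(
    "INSERT INTO tagged VALUES (?, ?)",
    [("Joe", "Cat"), ("Fred", "Cat"), ("Denise", "Dog"),
     ("Joe", "Horse"), ("Denise", "Horse")],
)

# Taggees that appear in two or more rows, with their row counts.
repeated = conn.execute(
    "SELECT taggee, COUNT(*) FROM tagged "
    "GROUP BY taggee HAVING COUNT(*) > 1 ORDER BY taggee"
).fetchall()
print(repeated)  # [('Cat', 2), ('Horse', 2)]

# All original rows belonging to those taggees.
all_rows = conn.execute(
    "SELECT tagger, taggee FROM tagged WHERE taggee IN ("
    "  SELECT taggee FROM tagged GROUP BY taggee HAVING COUNT(*) > 1"
    ") ORDER BY taggee, tagger"
).fetchall()
print(all_rows)
```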

select distinct t1.taggee from tagged t1 inner join tagged t2
on t1.taggee = t2.taggee and t1.tagger != t2.tagger;
This will give you all the taggees who have been tagged by more than one distinct tagger.
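The self-join and the COUNT(*) query differ when the same tagger tags the same taggee twice. The sketch below illustrates this on a small made-up data set (not the one from the question), again using SQLite through Python as an assumed test bed. It also shows COUNT(DISTINCT tagger), which should give the same answer as the self-join without joining the table to itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tagged (tagger TEXT, taggee TEXT)")
# Hypothetical data: Joe tags Dog twice, so Dog has two rows but one tagger.
conn.executemany(
    "INSERT INTO tagged VALUES (?, ?)",
    [("Joe", "Cat"), ("Fred", "Cat"), ("Joe", "Dog"), ("Joe", "Dog")],
)

def taggees(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# Two or more rows, regardless of who tagged.
by_rows = taggees(
    "SELECT taggee FROM tagged GROUP BY taggee "
    "HAVING COUNT(*) > 1 ORDER BY taggee"
)

# Two or more different taggers, via the self-join.
by_self_join = taggees(
    "SELECT DISTINCT t1.taggee FROM tagged t1 "
    "INNER JOIN tagged t2 "
    "ON t1.taggee = t2.taggee AND t1.tagger != t2.tagger "
    "ORDER BY t1.taggee"
)

# Two or more different taggers, via COUNT(DISTINCT ...).
by_distinct = taggees(
    "SELECT taggee FROM tagged GROUP BY taggee "
    "HAVING COUNT(DISTINCT tagger) > 1 ORDER BY taggee"
)

print(by_rows)       # ['Cat', 'Dog']
print(by_self_join)  # ['Cat']
print(by_distinct)   # ['Cat']
```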

Related

Get total count and first 3 columns

I have the following SQL query:
SELECT TOP 3 accounts.username
,COUNT(accounts.username) AS count
FROM relationships
JOIN accounts ON relationships.account = accounts.id
WHERE relationships.following = 4
AND relationships.account IN (
SELECT relationships.following
FROM relationships
WHERE relationships.account = 8
);
I want to return the total count of accounts.username and the first 3 accounts.username (in no particular order). Unfortunately accounts.username and COUNT(accounts.username) cannot coexist. The query works fine after removing one of them. I don't want to send the request twice with different select bodies. The count could reach 1000+, so I would prefer to calculate it in SQL rather than in code.
The current query returns the error Column 'accounts.username' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. That has not led me anywhere, and my case differs from other questions because I do not want to use the GROUP BY clause. Is there a way to do this with FOR JSON AUTO?
The desired output could be:
+-------+----------+
| count | username |
+-------+----------+
| 1551 | simon1 |
| 1551 | simon2 |
| 1551 | simon3 |
+-------+----------+
or
+----------------------------------------------------------------+
| JSON_F52E2B61-18A1-11d1-B105-00805F49916B |
+----------------------------------------------------------------+
| [{"count": 1551, "usernames": ["simon1", "simon2", "simon3"]}] |
+----------------------------------------------------------------+
If you want to display the total count of rows that satisfy the filter conditions (and where username is not null) in an additional column in your resultset, then you could use window functions:
SELECT TOP 3
a.username,
COUNT(a.username) OVER() AS cnt
FROM relationships r
JOIN accounts a ON r.account = a.id
WHERE
r.following = 4
AND EXISTS (
SELECT 1 FROM relationships r1 WHERE r1.account = 8 AND r1.following = r.account
)
;
Side notes:
if username is not nullable, use COUNT(*) rather than COUNT(a.username): this is more efficient since it does not require the database to check every value for nullity
table aliases make the query easier to write, read and maintain
I usually prefer EXISTS over IN (but here this is mostly a matter of taste, as both techniques should work fine for your use case)
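To see the window-function idea in action, here is a small sketch on SQLite through Python. Several things are assumptions for illustration: the sample rows are invented, SQLite stands in for SQL Server, LIMIT 3 replaces TOP 3 (SQLite has no TOP), and window functions need SQLite 3.25 or newer. The point it is meant to show is that COUNT(*) OVER () is evaluated over all filtered rows before the row limit is applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE accounts (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
CREATE TABLE relationships (account INTEGER, following INTEGER);
""")
# Hypothetical data: accounts 1, 2, 3, 5 and 6 follow account 4;
# account 8 follows 1, 2, 3 and 5 (but not 6).
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [(1, "simon1"), (2, "simon2"), (3, "simon3"), (5, "simon5"), (6, "other")],
)
conn.executemany(
    "INSERT INTO relationships VALUES (?, ?)",
    [(1, 4), (2, 4), (3, 4), (5, 4), (6, 4),
     (8, 1), (8, 2), (8, 3), (8, 5)],
)

rows = conn.execute("""
    SELECT a.username, COUNT(*) OVER () AS cnt
    FROM relationships r
    JOIN accounts a ON r.account = a.id
    WHERE r.following = 4
      AND EXISTS (
          SELECT 1 FROM relationships r1
          WHERE r1.account = 8 AND r1.following = r.account
      )
    LIMIT 3  -- stands in for SQL Server's TOP 3
""").fetchall()

for username, cnt in rows:
    print(username, cnt)  # three usernames, each paired with the total 4
```

Only three rows come back, but each carries the count of all four matching rows, which is the shape of the first desired output in the question.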

PostgreSQL finding the 3 most popular articles in a news database

I'm currently trying to find the 3 most popular articles in a database. I want to print out the title and amount of views for each. I know I'll have to join two of the tables together (articles & log) in order to do so.
The articles table has a column of the titles, and one with a slug for the title.
The log table has a column of the paths in the format of /article/'slug'.
How would I join these two tables, filter out the path to compare to the slug column of the articles table, and use count to display the number of times it was viewed?
The correct query used was:
SELECT title, count(*) as views
FROM articles a, log l
WHERE a.slug=substring(l.path, 10)
GROUP BY title
ORDER BY views DESC
LIMIT 3;
If I understood you correctly you just need to join two tables based on one column using aggregation. The catch is that you can't compare them directly but have to use some string functions before.
Assuming a schema like this:
article
| title | slug |
-------------------
| title1 | myslug |
| title2 | myslug |
log
| path |
--------------------------
| /article/'myslug' |
| /article/'unmentioned' |
Try out something like the following:
select title, count(*) from article a join log l on concat('''', a.slug, '''') = substring(l.path, 10) group by title;
For more complex queries it can help to first write smaller queries that check the individual pieces, then assemble the full query. For example, just check whether the string functions return what you expect:
select substring(l.path, 10) from log l;
select concat('''', a.slug, '''') from article a;
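The join-on-substring idea can be tried end to end with the sketch below. It makes a few assumptions: the data is invented, the paths contain no quote characters (as the query the asker ended up using implies), and SQLite's substr stands in for PostgreSQL's substring so that the example runs with Python's standard library alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE articles (title TEXT, slug TEXT);
CREATE TABLE log (path TEXT);
""")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [("First post", "first"), ("Second post", "second"),
     ("Third post", "third"), ("Fourth post", "fourth")],
)
# Hypothetical access log: 3 views of first, 2 of second, 1 of third,
# 4 of fourth, plus two paths that match no article.
paths = (["/article/first"] * 3 + ["/article/second"] * 2 +
         ["/article/third"] + ["/article/fourth"] * 4 +
         ["/", "/article/missing"])
conn.executemany("INSERT INTO log VALUES (?)", [(p,) for p in paths])

# '/article/' is 9 characters long, so the slug starts at position 10.
top3 = conn.execute("""
    SELECT a.title, COUNT(*) AS views
    FROM articles a
    JOIN log l ON a.slug = substr(l.path, 10)
    GROUP BY a.title
    ORDER BY views DESC
    LIMIT 3
""").fetchall()
print(top3)  # [('Fourth post', 4), ('First post', 3), ('Second post', 2)]
```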

Clash of multivalued attribute

I have a database with a name and hobbies (as a multivalued attribute), and I want to find the hobbies that occur more than once together with their number of occurrences.
For example
If this is a sample database
A reading
A dancing
B reading
B dancing
Then the result should be
List of hobbies | Number of occurrence
-----------------|---------------------
reading, dancing | 2
I think you have a query like this:
SELECT hobbies, Count(*) As hNo
FROM t
GROUP BY hobbies
That gives a result set like this:
hobbies | hNo
--------+------
reading | 2
dancing | 2
Now, for this result set, you can follow the answers to the question [Concatenate many rows into a single text string] to get the hobbies into one row.
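As one possible way to do that concatenation step, the sketch below uses SQLite's group_concat aggregate on top of the counting query. This is an assumption about the target system: other databases offer their own equivalents, such as string_agg in PostgreSQL or LISTAGG in Oracle. The table name t and column names follow the query above; the order of items inside the concatenated string is not guaranteed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, hobbies TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("A", "reading"), ("A", "dancing"), ("B", "reading"), ("B", "dancing")],
)

# Inner query: one row per hobby with its count.
# Outer query: merge hobbies that share the same count into one string.
result = conn.execute("""
    SELECT group_concat(hobbies, ', ') AS hobby_list, hNo
    FROM (
        SELECT hobbies, COUNT(*) AS hNo
        FROM t
        GROUP BY hobbies
    )
    GROUP BY hNo
""").fetchall()

for hobby_list, count in result:
    print(hobby_list, "|", count)
```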

JavaDB: get ordered records in the subquery

I have the following "COMPANIES_BY_NEWS_REPUTATION" in my JavaDB database (this is some random data just to represent the structure)
COMPANY | NEWS_HASH | REPUTATION | DATE
-------------------------------------------------------------------
Company A | 14676757 | 0.12345 | 2011-05-19 15:43:28.0
Company B | 454564556 | 0.78956 | 2011-05-24 18:44:28.0
Company C | 454564556 | 0.78956 | 2011-05-24 18:44:28.0
Company A | -7874564 | 0.12345 | 2011-05-19 15:43:28.0
One news_hash may relate to several companies while a company can relate to several news_hashes as well. Reputation and date are bound to the news_hash.
What I need to do is calculate the average reputation of the last 5 news items for every company. To do that, I feel that I need to use 'order by' and 'offset' in a subquery, as shown in the code below.
select COMPANY, avg(REPUTATION) from
(select * from COMPANY_BY_NEWS_REPUTATION order by "DATE" desc
offset 0 rows fetch next 5 row only) as TR group by COMPANY;
However, JavaDB allows neither ORDER BY, nor OFFSET in a subquery. Could anyone suggest a working solution for my problem please?
Which version of JavaDB are you using? According to the chapter TableSubquery in the JavaDB documentation, table subqueries do support order by and fetch next, at least in version 10.6.2.1.
Given that subqueries can be ordered and the size of the result set can be limited, the following (untested) query might do what you want:
select COMPANY, (select avg(REPUTATION)
from (select REPUTATION
from COMPANY_BY_NEWS_REPUTATION
where COMPANY = TR.COMPANY
order by DATE desc
fetch first 5 rows only))
from (select distinct COMPANY
from COMPANY_BY_NEWS_REPUTATION) as TR
This query retrieves all distinct company names from COMPANY_BY_NEWS_REPUTATION, then retrieves the average of the last five reputation rows for each company. I have no idea whether it will perform sufficiently; that will likely depend on the size of your data set and what indexes you have in place.
If you have a list of unique company names in another table, you can use that instead of the select distinct ... subquery to retrieve the companies for which to calculate averages.
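For comparison, here is a different formulation of the same "average of the latest five rows per company" idea, using ROW_NUMBER() with PARTITION BY instead of a correlated subquery. It is a sketch under several assumptions: it runs on SQLite (3.25 or newer) through Python rather than on JavaDB, whether JavaDB accepts this form is not verified here, and the table name company_news, the column news_date, and the data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE company_news (company TEXT, reputation REAL, news_date TEXT)"
)
# Hypothetical data: Company A has six news items, Company B has two.
rows = [("Company A", float(i), f"2011-05-{i:02d}") for i in range(1, 7)]
rows += [("Company B", 5.0, "2011-05-01"), ("Company B", 7.0, "2011-05-02")]
conn.executemany("INSERT INTO company_news VALUES (?, ?, ?)", rows)

# Number each company's rows from newest to oldest, keep the first five,
# then average what is left.
averages = conn.execute("""
    SELECT company, AVG(reputation)
    FROM (
        SELECT company, reputation,
               ROW_NUMBER() OVER (
                   PARTITION BY company ORDER BY news_date DESC
               ) AS rn
        FROM company_news
    )
    WHERE rn <= 5
    GROUP BY company
    ORDER BY company
""").fetchall()
print(averages)  # [('Company A', 4.0), ('Company B', 6.0)]
```

Company A's oldest item (reputation 1.0) is excluded, so its average covers 2.0 through 6.0; Company B has fewer than five items, so all of them count.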

Postgresql (Rails 3) merge rows on column (same table)

First, I've been using mysql for forever and am now upgrading to postgresql. The sql syntax is much stricter and some behavior different, thus my question.
I've been searching around for how to merge rows in a postgresql query on a table such as
id | name | amount
0 | foo | 12
1 | bar | 10
2 | bar | 13
3 | foo | 20
and get
name | amount
foo | 32
bar | 23
The closest I've found is Merge duplicate records into 1 records with the same table and table fields
sql returning duplicates of 'name':
scope :tallied, lambda { group(:name, :amount).select("charges.name AS name,
SUM(charges.amount) AS amount,
COUNT(*) AS tally").order("name, amount desc") }
What I need is
scope :tallied, lambda { group(:name, :amount).select("DISTINCT ON(charges.name) charges.name AS name,
SUM(charges.amount) AS amount,
COUNT(*) AS tally").order("name, amount desc") }
except that, rather than returning the first row for a given name, it should return a single merged row for all rows with that name (amounts added together).
In mysql, appending .group(:name) (not needing the initial group) to the select would work as expected.
This seems like an everyday sort of task which should be easy. What would be a simple way of doing this? Please point me on the right path.
P.S. I'm trying to learn here (so are others), don't just throw sql in my face, please explain it.
I've no idea what RoR is doing in the background, but I'm guessing that group(:name, :amount) will run a query that groups by name, amount. The one you're looking for is group by name:
select name, sum(amount) as amount, count(*) as tally
from charges
group by name
If you append amount to the group by clause, the query will do just that -- i.e. count(*) would return the number of times each amount appears per name, and the sum() would return that number times that amount.
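The difference between the two groupings can be seen on the table from the question. The sketch below runs both queries on SQLite through Python, which is an assumption made only so the example is self-contained; the SQL is the same plain GROUP BY discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE charges (id INTEGER, name TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO charges VALUES (?, ?, ?)",
    [(0, "foo", 12), (1, "bar", 10), (2, "bar", 13), (3, "foo", 20)],
)

# One row per name: amounts are summed across all rows with that name.
by_name = conn.execute("""
    SELECT name, SUM(amount) AS amount, COUNT(*) AS tally
    FROM charges
    GROUP BY name
    ORDER BY name
""").fetchall()
print(by_name)  # [('bar', 23, 2), ('foo', 32, 2)]

# One row per (name, amount) pair: nothing is merged here because every
# pair is unique, which is why the names come back duplicated.
by_name_amount = conn.execute("""
    SELECT name, SUM(amount) AS amount, COUNT(*) AS tally
    FROM charges
    GROUP BY name, amount
    ORDER BY name, amount
""").fetchall()
print(by_name_amount)
# [('bar', 10, 1), ('bar', 13, 1), ('foo', 12, 1), ('foo', 20, 1)]
```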