PostgreSQL finding the 3 most popular articles in a news database - sql

I'm currently trying to find the 3 most popular articles in a database. I want to print out the title and amount of views for each. I know I'll have to join two of the tables together (articles & log) in order to do so.
The articles table has a column of the titles, and one with a slug for the title.
The log table has a column of the paths in the format of /article/'slug'.
How would I join these two tables, filter out the path to compare to the slug column of the articles table, and use count to display the number of times it was viewed?
The correct query used was:
SELECT title, count(*) as views
FROM articles a, log l
WHERE a.slug=substring(l.path, 10)
GROUP BY title
ORDER BY views DESC
LIMIT 3;

If I understood you correctly, you just need to join two tables on one column and aggregate. The catch is that you can't compare the columns directly; you have to apply some string functions first.
Assuming a schema like this:
article
| title | slug |
-------------------
| title1 | myslug |
| title2 | myslug |
log
| path |
--------------------------
| /article/'myslug' |
| /article/'unmentioned' |
Try out something like the following:
select title, count(*) from article a join log l on concat('''', a.slug, '''') = substring(l.path, 10) group by title;
For more complex queries it can be helpful to first write smaller queries, which help you figure out the whole query later. For example, just check whether the string functions return what you expect:
select substring(l.path, 10) from log l;
select concat('''', a.slug, '''') from article a;
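For anyone who wants to try this without a PostgreSQL instance, here is a minimal sketch using Python's built-in sqlite3 module. It assumes the paths are stored as /article/slug with no quotes around the slug (which is what the asker's final query implies), it uses SQLite's substr in place of substring, and the sample rows are made up purely for illustration.

```python
import sqlite3


def top_articles(conn, limit=3):
    """Return (title, views) for the most viewed articles."""
    # '/article/' is 9 characters long, so the slug starts at position 10.
    query = """
        SELECT a.title, COUNT(*) AS views
        FROM articles AS a
        JOIN log AS l ON a.slug = substr(l.path, 10)
        GROUP BY a.title
        ORDER BY views DESC
        LIMIT ?
    """
    return conn.execute(query, (limit,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (title TEXT, slug TEXT);
    CREATE TABLE log (path TEXT);
""")

# Hypothetical sample data: four articles with 4, 3, 2 and 1 views.
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [("First", "first"), ("Second", "second"),
     ("Third", "third"), ("Fourth", "fourth")],
)
paths = (
    ["/article/first"] * 4
    + ["/article/second"] * 3
    + ["/article/third"] * 2
    + ["/article/fourth"]
    + ["/", "/article/unmentioned"]  # rows that match no article
)
conn.executemany("INSERT INTO log VALUES (?)", [(p,) for p in paths])

result = top_articles(conn)
print(result)
```

Log rows whose path does not match any slug simply drop out of the inner join, so they do not need to be filtered separately.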


Combine same named columns from different tables *without* merging the columns

I've got a table to store collected data from several energy meters, then I created some views to show data from specific meters only. Now I want to combine those views for an overview of only interesting data.
As far as I understood from reading other questions (of which my question here could be a possible duplicate), JOIN would be what I need, and it creates new columns, but the columns with the values of the meters get merged. I guess this is because the columns with the interesting values all have the exact same name, but that is not what I want. I want the columns with the interesting values (named "1.8.0") not merged but in separate columns as they are in the views, just next to each other for a better overview.
To shorten the post I created following example to show my problem:
http://sqlfiddle.com/#!17/a886d/31 (and maybe also http://sqlfiddle.com/#!17/a886d/30 )
The related query:
SELECT public.meter354123."0.9.2" AS datestamp,
public.meter354123."1.8.0" AS meter354123
FROM public.meter354123
FULL JOIN public.meter354124 ON public.meter354123."1.8.0" = public.meter354124."1.8.0";
For some reason I do not understand yet, the JOIN does not work for me as I would expect. If I JOIN ON the values (column "1.8.0") I get NULL rows, if I JOIN ON the datestamps (column "0.9.2"), one column is missing completely in the result.
(If it is meaningful, feel free to edit the code from the fiddle into the question. I thought it would be too much code to paste here, and I don't know how to explain my issue more simply.)
In the end I would like to have a result like:
| datestamp (=col "0.9.2") | meterdata1 (=col "1.8.0") | meterdata2 (=col "1.8.0") | etc...
| 1220101 | value1 | value1 | ...
| 1220201 | value2 | value2 | ...
| 1220301 | value3 | value3 | ...
Maybe the intermediate views are not necessary at all and it is even possible to pull off this result from the original table without going through those views?
I'm not a database expert so I went with my current knowledge to accomplish that.
Thank you very much for looking into this and for any hints!
You could aggregate meter data into a CSV:
SELECT
"0.9.2" AS datestamp,
string_agg("1.8.0"::text, ',') AS meterdata
FROM public.meter354123
GROUP BY "0.9.2"
Or to get an actual array:
SELECT
"0.9.2" AS datestamp,
array_agg("1.8.0") AS meterdata
FROM public.meter354123
GROUP BY "0.9.2"
Thank you for looking into this. I could "solve" this by using several more intermediate views and then simply JOINing those views as follows:
see fiddle: http://sqlfiddle.com/#!17/a886d/40
CREATE VIEW meter354123 AS SELECT meterdata."0.0.0",
meterdata."0.9.1",
meterdata."0.9.2",
meterdata."1.8.0"
FROM meterdata
WHERE meterdata."0.0.0" = 354123::numeric AND meterdata."0.9.1" = 0::numeric
ORDER BY meterdata."0.0.0", meterdata."0.9.2" DESC
LIMIT 12;
CREATE VIEW meter354124 AS SELECT meterdata."0.0.0",
meterdata."0.9.1",
meterdata."0.9.2",
meterdata."1.8.0"
FROM meterdata
WHERE meterdata."0.0.0" = 354124::numeric AND meterdata."0.9.1" = 0::numeric
ORDER BY meterdata."0.0.0", meterdata."0.9.2" DESC
LIMIT 12;
CREATE VIEW meter354127 AS SELECT meterdata."0.0.0",
meterdata."0.9.1",
meterdata."0.9.2",
meterdata."1.8.0"
FROM meterdata
WHERE meterdata."0.0.0" = 354127::numeric AND meterdata."0.9.1" = 0::numeric
ORDER BY meterdata."0.0.0", meterdata."0.9.2" DESC
LIMIT 12;
CREATE VIEW "meter354123_1.8.0" AS SELECT public.meter354123."0.9.2" AS datestamp,
public.meter354123."1.8.0" AS meter354123
FROM public.meter354123
ORDER BY datestamp DESC
LIMIT 12;
CREATE VIEW "meter354124_1.8.0" AS SELECT public.meter354124."0.9.2" AS datestamp,
public.meter354124."1.8.0" AS meter354124
FROM public.meter354124
ORDER BY datestamp DESC
LIMIT 12;
CREATE VIEW "meter354127_1.8.0" AS SELECT public.meter354127."0.9.2" AS datestamp,
public.meter354127."1.8.0" AS meter354127
FROM public.meter354127
ORDER BY datestamp DESC
LIMIT 12;
SELECT "meter354123_1.8.0".datestamp,
"meter354123_1.8.0".meter354123,
"meter354124_1.8.0".meter354124,
"meter354127_1.8.0".meter354127
FROM "meter354123_1.8.0"
JOIN "meter354124_1.8.0" ON "meter354123_1.8.0".datestamp = "meter354124_1.8.0".datestamp
JOIN "meter354127_1.8.0" ON "meter354123_1.8.0".datestamp = "meter354127_1.8.0".datestamp;
which results in:
datestamp | meter354123 | meter354124 | meter354127
-----------+-------------+-------------+-------------
1220301 | 11055.66 | 5403.16 | 88556.23
1220201 | 11054.64 | 5399.47 | 88195.41
1220101 | 11053.33 | 5395.27 | 87799.84
I don't know if there is a more efficient/elegant solution, but at least this gives the wanted result.
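On the question of whether the intermediate views are needed at all: one option that was not used in this thread is conditional aggregation, which pivots the meters into columns straight from the original table. The sketch below is only an illustration under assumptions: it runs against SQLite through Python's sqlite3 module rather than PostgreSQL, it assumes "0.0.0" holds the meter number and "0.9.1" = 0 selects the wanted rows (as in the view definitions above), and it reuses the three result rows shown above as sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE meterdata ("0.0.0" NUMERIC, "0.9.1" NUMERIC, '
    '"0.9.2" NUMERIC, "1.8.0" NUMERIC)'
)

# (meter number, "0.9.1" flag, datestamp, reading), taken from the result above.
sample = [
    (354123, 0, 1220101, 11053.33), (354123, 0, 1220201, 11054.64),
    (354123, 0, 1220301, 11055.66),
    (354124, 0, 1220101, 5395.27), (354124, 0, 1220201, 5399.47),
    (354124, 0, 1220301, 5403.16),
    (354127, 0, 1220101, 87799.84), (354127, 0, 1220201, 88195.41),
    (354127, 0, 1220301, 88556.23),
]
conn.executemany("INSERT INTO meterdata VALUES (?, ?, ?, ?)", sample)

# One output row per datestamp; each CASE picks out the reading of one meter.
pivot_query = """
    SELECT "0.9.2" AS datestamp,
           MAX(CASE WHEN "0.0.0" = 354123 THEN "1.8.0" END) AS meter354123,
           MAX(CASE WHEN "0.0.0" = 354124 THEN "1.8.0" END) AS meter354124,
           MAX(CASE WHEN "0.0.0" = 354127 THEN "1.8.0" END) AS meter354127
    FROM meterdata
    WHERE "0.9.1" = 0
    GROUP BY "0.9.2"
    ORDER BY datestamp DESC
    LIMIT 12
"""
rows = conn.execute(pivot_query).fetchall()
for row in rows:
    print(row)
```

Unlike the inner joins between views, a datestamp that is missing for one meter still produces a row here, with NULL in that meter's column.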

Aggregating or Bundle a Many to Many Relationship in SQL Developer

So I have 1 single table with 2 columns : Sales_Order called ccso, Arrangement called arrmap
The table has distinct values for this combination and both these fields have a Many to Many relationship
1 ccso can have Multiple arrmap
1 arrmap can have Multiple ccso
All such combinations should be considered as one single bundle
Objective :
Assign a final map to each of the Sales Order as the Largest Arrangement in that Bundle
Example:
ccso : 100-10015 has 3 arrangements --> Now each of those arrangements have a set of Sales Orders --> Now those sales orders will also have a list of other arrangements and so on
(Image : 1)
Therefore the answer definitely points to something checking recursively. I've managed to write the code below, and it works as long as I hard-code a ccso in the WHERE clause, but I don't know how to proceed from here. (I'm an accountant by profession but have been finding more passion in coding recently.) I've searched the forums and the web for things like
Recursive CTEs,
many to many aggregation
cartesian product etc
and I'm sure there must be a term for this which I don't know yet. I've also tried
I have to use sqldeveloper or googlesheet query and filter formulas
sqldeveloper has restrictions on some CTEs. If recursion is the way, I'd like to know how, and whether I can limit the depth to, say, 4 or 5 iterations
Ideally I'd want to update a third column with the final map if possible but if not, then a select query result is just fine
Codes I've tried
Code 1: As per Screenshot
WITH a1(ccso, amap) AS
(SELECT distinct a.ccso, a.arrmap
FROM rg_consol_map2 A
WHERE a.ccso = '100-10115' -- this condition defines the ultimate ancestors in your chain, change it as appropriate
UNION ALL
SELECT m.ccso, m.arrmap
FROM rg_consol_map2 m
JOIN a1
ON M.arrmap = a1.amap -- or m.ccso=a1.ccso
) /*if*/ CYCLE amap SET nemap TO 1 /*else*/ DEFAULT 0
SELECT DISTINCT amap FROM (SELECT ccso, amap FROM a1 ORDER BY 1 DESC) WHERE ROWNUM = 1
In this the main challenge is how to remove the hardcoded ccso and do a join for each of the ccso
Code 2 : Manual CTEs for depth
Here again the join outside the CTE gives me an error and sqldeveloper does not allow WITH clause with UPDATE statement - only works for select and cannot be enclosed within brackets as subtable
SELECT distinct ccso FROM
(
WITH ar1 AS
(SELECT distinct arrmap
FROM rg_consol_map
WHERE ccso = a.ccso
)
,so1 AS
(SELECT DISTINCT ccso
FROM rg_consol_map
WHERE arrmap IN (SELECT arrmap FROM ar1)
)
,ar2 AS
(SELECT DISTINCT ccso FROM rg_consol_map
where arrmap IN (select distinct arrmap FROM rg_consol_map
WHERE ccso IN (SELECT ccso FROM so1)
))
SELECT ar1.arrmap, NULL ccso FROM ar1
union all
SELECT null, ar2.ccso FROM ar2
UNION ALL
SELECT NULL arrmap, so1.ccso FROM so1
)
Am I missing something here, or is there an easier way to do this? I read something about MERGE and PROC SQL JOIN but was unable to get them to work. If that's the way to go, I will try further if someone can point me in the right direction.
(Image : 2)
(CSV File : [3])
Edit : Fixing CSV file link
https://github.com/karan360note/karanstackoverflow.git
I suppose can be downloaded from here IC mapping many to many.csv
Oracle 11g version is being used
Apologies in advance for the wall of text.
Your problem is a complex, multi-layered many-to-many query; there is no "easy" solution to this, because that is not an ideal design choice. The safest bet is literally to include multiple layers of CTEs or subqueries to reach all the depths you want, as the only ways I know to do so recursively rely on an anchor column (like "parentID") to direct the recursion in a linear fashion. We don't have that option here; we'd go in circles without a way to track our path.
Therefore, I went basic, and with several subqueries. Every level checks for a) All orders containing a particular ARRMAP item, and then b) All additional items on those orders. It's clear enough for you to see the logic and modify to your needs. It will generate a new table that contains the original CCSO, the linking ARRMAP, and the related CCSO. Link: https://pastebin.com/un70JnpA
This should enable you to go back and perform the desired updates you want, based on order # or order date, etc... in a much more straightforward fashion. Once you have an anchor column, a CTE in the future is much more trivial (just search for "CTE recursion tree hierarchy").
SELECT DISTINCT
CCSO, RELATEDORDER
FROM myTempTable
WHERE CCSO = '100-10115'; /* to find all orders by CCSO, query SELECT DISTINCT RELATEDORDER */
--WHERE ARRMAP = 'ARR10524'; /* to find all orders by ARRMAP, query SELECT DISTINCT CCSO */
EDIT:
To better explain what this table generates, let me simplify the problem.
If you have order
A with arrangements 1 and 2;
B with arrangement 2, 3; and
C with arrangement 3;
then, by your initial inquiry and image, order A should be related to orders B and C, right? The query generates the following table when you SELECT DISTINCT ccso, relatedOrder:
+-------+--------------+
| CCSO | RelatedOrder |
+----------------------+
| A | B |
| A | C |
+----------------------+
| B | C |
| B | A |
+----------------------+
| C | A |
| C | B |
+-------+--------------+
You can see here if you query WHERE CCSO = 'A' OR RelatedOrder = 'A', you'll get the same relationships, just flipped between the two columns.
+-------+--------------+
| CCSO | RelatedOrder |
+----------------------+
| A | B |
| A | C |
+----------------------+
| B | A |
+----------------------+
| C | A |
+-------+--------------+
So query only CCSO or RelatedOrder.
As for the results of WHERE CCSO = '100-10115', see image here, which includes all the links you showed in your Image #1, as well as additional depths of relations.
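A term that may help when searching further is "connected components": each bundle is one connected component of the graph whose nodes are orders and arrangements. If doing the grouping outside the database is an option (an assumption, since the question asks for SQL Developer or Google Sheets), a small union-find in Python handles any depth without hard-coding a ccso. This is only a sketch: "largest" is assumed to mean a plain comparison of the arrangement values, and the sample pairs are the A/B/C example above plus a hypothetical unrelated order D.

```python
def largest_arrangement_per_order(pairs):
    """Map each ccso to the largest arrmap found anywhere in its bundle.

    pairs is an iterable of (ccso, arrmap) rows, as in the mapping table.
    """
    pairs = list(pairs)
    parent = {}

    def find(node):
        # Follow parent links to the representative of the node's bundle.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    # Orders and arrangements are tagged so equal values cannot collide.
    for ccso, arrmap in pairs:
        union(("ccso", ccso), ("arrmap", arrmap))

    # Track the largest arrangement seen in each bundle.
    largest = {}
    for ccso, arrmap in pairs:
        root = find(("ccso", ccso))
        if root not in largest or arrmap > largest[root]:
            largest[root] = arrmap

    return {ccso: largest[find(("ccso", ccso))] for ccso, _ in pairs}


sample_pairs = [
    ("A", 1), ("A", 2),
    ("B", 2), ("B", 3),
    ("C", 3),
    ("D", 9),  # hypothetical order that shares nothing with A, B or C
]
final_map = largest_arrangement_per_order(sample_pairs)
print(final_map)
```

The resulting dictionary could then be written back as the third column, for example through a generated list of UPDATE statements.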

Get total count and first 3 columns

I have the following SQL query:
SELECT TOP 3 accounts.username
,COUNT(accounts.username) AS count
FROM relationships
JOIN accounts ON relationships.account = accounts.id
WHERE relationships.following = 4
AND relationships.account IN (
SELECT relationships.following
FROM relationships
WHERE relationships.account = 8
);
I want to return the total count of accounts.username and the first 3 accounts.username (in no particular order). Unfortunately accounts.username and COUNT(accounts.username) cannot coexist. The query works fine removing one of the them. I don't want to send the request twice with different select bodies. The count column could span to 1000+ so I would prefer to calculate it in SQL rather in code.
The current query returns the error Column 'accounts.username' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. which has not led me anywhere and this is different to other questions as I do not want to use the 'group by' clause. Is there a way to do this with FOR JSON AUTO?
The desired output could be:
+-------+----------+
| count | username |
+-------+----------+
| 1551 | simon1 |
| 1551 | simon2 |
| 1551 | simon3 |
+-------+----------+
or
+----------------------------------------------------------------+
| JSON_F52E2B61-18A1-11d1-B105-00805F49916B |
+----------------------------------------------------------------+
| [{"count": 1551, "usernames": ["simon1", "simon2", "simon3"]}] |
+----------------------------------------------------------------+
If you want to display the total count of rows that satisfy the filter conditions (and where username is not null) in an additional column in your resultset, then you could use window functions:
SELECT TOP 3
a.username,
COUNT(a.username) OVER() AS cnt
FROM relationships r
JOIN accounts a ON r.account = a.id
WHERE
r.following = 4
AND EXISTS (
SELECT 1 FROM relationships r1 WHERE r1.account = 8 AND r1.following = r.account
)
;
Side notes:
if username is not nullable, use COUNT(*) rather than COUNT(a.username): this is more efficient since it does not require the database to check every value for nullity
table aliases make the query easier to write, read and maintain
I usually prefer EXISTS over IN (but here this is mostly a matter of taste, as both techniques should work fine for your use case)
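To see that the window count really reports the total and not just the three returned rows, here is a small sketch run against SQLite through Python's sqlite3 module. The assumptions: LIMIT 3 stands in for SQL Server's TOP 3, the SQLite build is 3.25 or newer (needed for window functions), username is declared NOT NULL so COUNT(*) is used, and all sample accounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, username TEXT NOT NULL);
    CREATE TABLE relationships (account INTEGER, following INTEGER);
""")

# Hypothetical data: accounts 10..14 follow account 4 and are followed by 8.
mutual_ids = [10, 11, 12, 13, 14]
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)",
    [(4, "target"), (8, "viewer"), (20, "stranger"), (21, "other")]
    + [(i, f"simon{i}") for i in mutual_ids],
)
conn.executemany(
    "INSERT INTO relationships VALUES (?, ?)",
    [(i, 4) for i in mutual_ids]
    + [(8, i) for i in mutual_ids]
    + [(20, 4), (8, 21)],  # noise rows that must not be counted
)

query = """
    SELECT a.username,
           COUNT(*) OVER () AS cnt
    FROM relationships AS r
    JOIN accounts AS a ON r.account = a.id
    WHERE r.following = 4
      AND EXISTS (
          SELECT 1 FROM relationships AS r1
          WHERE r1.account = 8 AND r1.following = r.account
      )
    LIMIT 3
"""
rows = conn.execute(query).fetchall()
for username, cnt in rows:
    print(username, cnt)
```

The window function is evaluated over the full filtered set before the row limit is applied, which is why every returned row carries the same total.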

JavaDB: get ordered records in the subquery

I have the following table "COMPANIES_BY_NEWS_REPUTATION" in my JavaDB database (this is some random data just to represent the structure)
COMPANY | NEWS_HASH | REPUTATION | DATE
-------------------------------------------------------------------
Company A | 14676757 | 0.12345 | 2011-05-19 15:43:28.0
Company B | 454564556 | 0.78956 | 2011-05-24 18:44:28.0
Company C | 454564556 | 0.78956 | 2011-05-24 18:44:28.0
Company A | -7874564 | 0.12345 | 2011-05-19 15:43:28.0
One news_hash may relate to several companies while a company can relate to several news_hashes as well. Reputation and date are bound to the news_hash.
What I need to do is calculate the average reputation of the last 5 news items for every company. To do that, I feel that I somehow need to use 'order by' and 'offset' in a subquery, as shown in the code below.
select COMPANY, avg(REPUTATION) from
(select * from COMPANY_BY_NEWS_REPUTATION order by "DATE" desc
offset 0 rows fetch next 5 row only) as TR group by COMPANY;
However, JavaDB allows neither ORDER BY, nor OFFSET in a subquery. Could anyone suggest a working solution for my problem please?
Which version of JavaDB are you using? According to the chapter TableSubquery in the JavaDB documentation, table subqueries do support order by and fetch next, at least in version 10.6.2.1.
Given that subqueries can be ordered and the size of the result set can be limited, the following (untested) query might do what you want:
select COMPANY, (select avg(REPUTATION)
from (select REPUTATION
from COMPANY_BY_NEWS_REPUTATION
where COMPANY = TR.COMPANY
order by DATE desc
fetch first 5 rows only))
from (select distinct COMPANY
from COMPANY_BY_NEWS_REPUTATION) as TR
This query retrieves all distinct company names from COMPANY_BY_NEWS_REPUTATION, then retrieves the average of the last five reputation rows for each company. I have no idea whether it will perform sufficiently; that will likely depend on the size of your data set and what indexes you have in place.
If you have a list of unique company names in another table, you can use that instead of the select distinct ... subquery to retrieve the companies for which to calculate averages.
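If your JavaDB version really does reject ORDER BY and FETCH inside subqueries, a different technique avoids them altogether: keep a row only when fewer than five rows of the same company are newer than it, counted with a correlated COUNT(*). The sketch below is an illustration under assumptions, not something verified on JavaDB: it runs on SQLite through Python's sqlite3 module, the sample rows are hypothetical, and rows sharing the same DATE within a company can let more than five rows through.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE COMPANY_BY_NEWS_REPUTATION '
    '(COMPANY TEXT, NEWS_HASH INTEGER, REPUTATION REAL, "DATE" TEXT)'
)

# Hypothetical data: Company A has 7 news items (reputation 1.0 .. 7.0,
# one per day), Company B has only 2.
sample = [
    ("Company A", 1000 + day, float(day), f"2011-05-{day:02d} 12:00:00")
    for day in range(1, 8)
] + [
    ("Company B", 2001, 2.0, "2011-05-01 12:00:00"),
    ("Company B", 2002, 4.0, "2011-05-02 12:00:00"),
]
conn.executemany(
    "INSERT INTO COMPANY_BY_NEWS_REPUTATION VALUES (?, ?, ?, ?)", sample
)

# A row survives the filter when fewer than 5 rows of its company are newer.
query = """
    SELECT t.COMPANY, AVG(t.REPUTATION) AS avg_reputation
    FROM COMPANY_BY_NEWS_REPUTATION AS t
    WHERE (
        SELECT COUNT(*)
        FROM COMPANY_BY_NEWS_REPUTATION AS newer
        WHERE newer.COMPANY = t.COMPANY
          AND newer."DATE" > t."DATE"
    ) < 5
    GROUP BY t.COMPANY
    ORDER BY t.COMPANY
"""
averages = conn.execute(query).fetchall()
print(averages)
```

For Company A only the five most recent items (reputation 3.0 to 7.0) are averaged; Company B has fewer than five items, so all of them count.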

How can I select only rows with multiple hits for a specific column?

I am not sure how to phrase this question so I'll give an example:
Suppose there is a table called tagged that has two columns: tagger and taggee. What would the SQL query look like to return the taggee(s) that are in multiple rows? That is to say, they have been tagged 2 or more times by any tagger.
I would like a 'generic' SQL query and not something that only works on a specific DBMS.
EDIT: Added "tagged 2 or more times by any tagger."
HAVING can operate on the result of aggregate functions. So if you have data like this:
Row tagger | taggee
--------+----------
1. Joe | Cat
2. Fred | Cat
3. Denise | Dog
4. Joe | Horse
5. Denise | Horse
It sounds like you want Cat, Horse.
To get the taggees that are in multiple rows, you would execute:
SELECT taggee, count(*) FROM tagged GROUP BY taggee HAVING count(*) > 1
That being said, when you say "select only rows with multiple hits for a specific column", which row do you want? Do you want row 1 for Cat, or row 2?
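select distinct t1.taggee from tagged t1 inner join tagged t2
on t1.taggee = t2.taggee and t1.tagger <> t2.tagger;
This will give you all the taggees who have been tagged by more than one tagger (<> is the standard SQL spelling of "not equal", in keeping with the request for a generic query).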
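The two queries in this thread answer slightly different questions, which a quick run makes visible. The sketch below uses Python's sqlite3 module with the five sample rows from the first answer plus one hypothetical extra row in which Denise tags Dog a second time. The GROUP BY / HAVING query counts rows, so it reports Dog; the self-join requires two different taggers, so it does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tagged (tagger TEXT, taggee TEXT)")
conn.executemany(
    "INSERT INTO tagged VALUES (?, ?)",
    [
        ("Joe", "Cat"),
        ("Fred", "Cat"),
        ("Denise", "Dog"),
        ("Joe", "Horse"),
        ("Denise", "Horse"),
        ("Denise", "Dog"),  # hypothetical duplicate tag by the same tagger
    ],
)

# Taggees that appear in two or more rows, regardless of who tagged them.
by_row_count = conn.execute("""
    SELECT taggee, COUNT(*)
    FROM tagged
    GROUP BY taggee
    HAVING COUNT(*) > 1
    ORDER BY taggee
""").fetchall()

# Taggees that were tagged by at least two different taggers.
by_distinct_tagger = conn.execute("""
    SELECT DISTINCT t1.taggee
    FROM tagged t1
    INNER JOIN tagged t2
        ON t1.taggee = t2.taggee AND t1.tagger <> t2.tagger
    ORDER BY t1.taggee
""").fetchall()

print(by_row_count)
print(by_distinct_tagger)
```

Which one is right depends on whether repeated tags by the same tagger should count as "tagged 2 or more times".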