Explain this short SQL query to me please - sql

"SELECT * from posts A, categories B where A.active='1'
AND A.category=B.CATID order by A.time_added desc
limit $pagingstart, $config[items_per_page]";
I think it selects the rows from the 'posts' table where the active column in each row is equal to 1, but I don't understand the rest. Please explain. Thank you.

It selects all columns from Posts (referred to with the alias "A") and the associated row from Categories (referred to as "B") for each post, where:
Posts.Active = 1
The post's category exists in the "Categories" table (if a post doesn't have a matching category in this table, the row won't be returned)
Orders the results by A.Time_added (in descending order, newest to oldest)
Returns at most "$config[items_per_page]" rows, skipping the first "$pagingstart" rows
I'm not sure what brand of SQL this is, as I don't recognize the limit statement or the $variables, but that's the gist.

You'll get rows
from A and B where category and CATID match (the "intersection" bit of a Venn diagram)
The rows for A are filtered to those where Active = 1
sorted by time_added, latest first
limit says: return y rows starting after row x, where x is $pagingstart and y is $config[items_per_page]. Which rows those are is determined by the sort

posts A, categories B is a so-called "implicit JOIN". Logically, it produces all possible combinations of records from A and B, which are then filtered by the WHERE conditions.
Explicit join syntax is much more readable:
SELECT *
FROM posts A
JOIN categories B
ON B.CATID = A.category
WHERE A.active='1'
ORDER BY
A.time_added DESC
LIMIT $pagingstart, $config[items_per_page]
This means: "for each record from A, take all records from B whose catid is the same as A's category".
ORDER BY A.time_added DESC returns your posts from latest to earliest.
LIMIT 100, 10 skips the first 100 rows and returns the next 10, that is, the 101st to the 110th posts.
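To check that the implicit and explicit join forms return the same rows, here is a minimal sketch using Python's built-in sqlite3 module instead of MySQL. The table contents are made up for illustration; only the table and column names come from the question.

```python
import sqlite3

# In-memory database with invented sample data (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE categories (CATID INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, category INTEGER,
                        active TEXT, time_added INTEGER);
    INSERT INTO categories VALUES (1, 'news'), (2, 'howto');
    INSERT INTO posts VALUES
        (1, 1, '1', 100),
        (2, 2, '1', 300),
        (3, 1, '0', 400),   -- inactive: removed by the WHERE filter
        (4, 9, '1', 500),   -- no matching category: dropped by the join
        (5, 2, '1', 200);
""")

# Implicit join: comma-separated tables, join condition in WHERE.
implicit = conn.execute("""
    SELECT A.id, B.name
    FROM posts A, categories B
    WHERE A.active = '1' AND A.category = B.CATID
    ORDER BY A.time_added DESC
""").fetchall()

# Explicit join: same logic, join condition in ON.
explicit = conn.execute("""
    SELECT A.id, B.name
    FROM posts A
    JOIN categories B ON B.CATID = A.category
    WHERE A.active = '1'
    ORDER BY A.time_added DESC
""").fetchall()

print(implicit)
print(explicit)
```

Both queries return posts 2, 5 and 1 in that order; post 3 is inactive and post 4 has no matching category.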

It looks like this is trying to select all active posts, order them with the newest at the top, and limit the number of records to fit on a page. The semantics of A.active='1' probably mean that the post is active, but I'm guessing.
It looks like MySQL with PHP.

This selects entries from posts and categories, joining them together where posts.category=categories.CATID. It filters out all rows where posts.active!=1, and then orders by descending posts.time_added, returning at most $config[items_per_page] items starting from $pagingstart.

It selects all the active posts (and their category), newest first. However, it has a paging mechanism, so it shows only $config[items_per_page] posts starting at number $pagingstart.
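The paging part can be tried out in isolation. The sketch below uses Python's sqlite3 module, which also accepts the MySQL-style "LIMIT offset, count" form. The helper fetch_page and its page_number argument are hypothetical names, and it binds the two values as parameters rather than interpolating them into the string as the PHP code in the question does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, time_added INTEGER)")
# Seven sample posts; a higher id means a later time_added (invented data).
conn.executemany("INSERT INTO posts VALUES (?, ?)",
                 [(i, i * 10) for i in range(1, 8)])

def fetch_page(page_number, items_per_page):
    """Return the post ids on a 1-based page, newest first (hypothetical helper)."""
    # Equivalent of $pagingstart: how many rows to skip.
    paging_start = (page_number - 1) * items_per_page
    rows = conn.execute(
        "SELECT id FROM posts ORDER BY time_added DESC LIMIT ?, ?",
        (paging_start, items_per_page),
    ).fetchall()
    return [row[0] for row in rows]

print(fetch_page(1, 3))
print(fetch_page(2, 3))
print(fetch_page(3, 3))
```

With 3 items per page, page 1 holds posts 7, 6, 5, page 2 holds 4, 3, 2, and the last page holds only post 1.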

Select the rows from the posts table and the categories table, joined into a single table by the category ID (using what I call a lazy join, but that may just be my opinion and I'm not really a database guy), sorted in descending order by the time added, displaying only $items_per_page records starting at $pagingstart.

It selects all columns from the tables posts and categories where posts.active is equal to 1 and where posts.category is joined to categories.catid. The result is ordered by posts.time_added, and the start offset and page size are set by the two variables $pagingstart and $config[items_per_page].

It's saying:
1) Select everything from both Posts & Categories where Posts.Active = 1 and Posts.Category = Categories.CATID.
2) The Order by statement then specifies that they should be presented (from top to bottom) with the newest Post.Time_Added first.
3) Finally, the limit clause says (I think, I don't use limit very often): Skip the first $pagingstart rows (a variable which has been set at some point), and only return $config[items_per_page] rows at a time.

Related

Why ORDER BY works only when I gave an alias name for the column, but didn't work just as column name?

This code cannot be executed; it shows an error like
column "o.total_amt_usd" must appear in the GROUP BY clause or be used in an aggregate function
SELECT a.name, MIN(o.total_amt_usd)
FROM accounts a
JOIN orders o ON a.id = o.account_id
GROUP BY a.name
ORDER BY o.total_amt_usd
LIMIT 3
But after I use an alias, it worked:
SELECT a.name, MIN(o.total_amt_usd) small
FROM accounts a
JOIN orders o ON a.id = o.account_id
GROUP BY a.name
ORDER BY small
LIMIT 3
Could anybody explain a little bit about this, please?
Both logically make sense to me. But one of them is not working.
Thanks a lot.
You can't order by o.total_amt_usd since it's not available in the result set (after grouping on name).
You need to order by a grouped field or using an aggregate function like MIN. In this case you'll want to order by MIN(o.total_amt_usd) which is essentially what you are doing after using the alias in the order-clause.
Think about what you're asking of the first query and perhaps try a visual example by hand.
Imagine a table with just 4 rows, John has two rows with amounts of 50 and 20, Bob has two rows with amounts of 30 and 60.
You are asking for each unique name and the corresponding minimum amount, so the results are naturally John:20, Bob:30.
By asking to order your results by referring specifically to every row's total (and not the aggregated total), you are saying: order my two rows by looking at all four rows. That means John could go both before Bob and after Bob, given 20 is less than 30 and 50 is greater than 30.
You might look at the data visually and see the "correct" order, but for the query engine this makes no sense. You can only prioritise the resulting two rows based on their aggregated values, so you must order by those aggregated values, either using that column's alias or using the same expression. You cannot order by non-aggregated columns.
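The John and Bob example can be run as a quick sketch with Python's sqlite3 module. The single orders table here is a simplification of the accounts/orders join in the question. Note that SQLite is more lenient than PostgreSQL and would not raise the error from the question, so this only shows that ordering by the alias and ordering by the aggregate expression give the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (name TEXT, total_amt_usd INTEGER);
    INSERT INTO orders VALUES
        ('John', 50), ('John', 20), ('Bob', 30), ('Bob', 60);
""")

# Order by the alias given to the aggregate.
by_alias = conn.execute("""
    SELECT name, MIN(total_amt_usd) AS small
    FROM orders
    GROUP BY name
    ORDER BY small
""").fetchall()

# Order by repeating the aggregate expression itself.
by_expression = conn.execute("""
    SELECT name, MIN(total_amt_usd)
    FROM orders
    GROUP BY name
    ORDER BY MIN(total_amt_usd)
""").fetchall()

print(by_alias)
print(by_expression)
```

Both forms sort the two grouped rows by their minimum amount, so John (20) comes before Bob (30).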

SQL: different results when using wildcard?

Using PostgreSQL 9.6.12.
Given an author has many blog posts.
When I run the following query I get a row for each associated post.
SELECT authors.id
FROM authors
LEFT JOIN posts ON authors.id = posts.author_id
When I run the following, I only get a row for each author:
SELECT authors.*
FROM authors
LEFT JOIN posts ON authors.id = posts.author_id
When I run a count on either one, however, I get the higher row count. E.g. the count of all the posts.
Why don't I get the higher row count result when I use the wildcard to select all the columns?
The problem could be caused by how you are running the query and by the settings of your IDE. These queries should return the same row count. Please run the following queries to check (PostgreSQL requires an alias on a subquery in FROM, hence the "AS t"):
select count(*) from (SELECT authors.id
FROM authors
LEFT JOIN posts ON authors.id = posts.author_id) AS t;
select count(*) from (SELECT authors.*
FROM authors
LEFT JOIN posts ON authors.id = posts.author_id) AS t;
"Why don't I get a cartesian product result when I use the wildcard to select all the columns?"
You do not get a cartesian product in either of the two SQL queries.
"When I run a count on either one, however, I get the cartesian product number of rows. E.g. the count of all the posts."
You are not calculating the count of all the posts. You are retrieving all posts that have an author in the authors table.
I am afraid you are misusing the term cartesian product. A cartesian product pairs every row of the first table with every row of the second table, without any limiting clause/condition, so its row count is the number of rows in the first table times the number of rows in the second. In simple SQL it would correspond to the following, e.g.:
SELECT * FROM authors, posts
The two queries in your question return the exact same rows, except that the first query displays only the column id of the authors table while the second displays all the columns of the authors table.
This is standard SQL, and I am very confident that every database respecting the SQL standard behaves as described above.
I hope you see what I mean and suggest that you review the question. It may help if you can show some concrete example, in particular you would have to clarify:
what do you mean by "cartesian product"? (your definition differs from the common usage)
how do you count rows? (according to your example I find it hard to believe you count a different number of rows; they must be equal)
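The row counts can be checked on a toy data set. This is a minimal sketch using Python's sqlite3 module rather than PostgreSQL, with invented authors and posts; the extra name column is an assumption so that authors.* differs from authors.id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Ben'), (3, 'Cy');
    -- Ann has two posts, Ben has one, Cy has none.
    INSERT INTO posts VALUES (1, 1), (2, 1), (3, 2);
""")

def count(sql):
    """Return the single number produced by a COUNT(*) query."""
    return conn.execute(sql).fetchone()[0]

id_only = count("""
    SELECT COUNT(*) FROM (SELECT authors.id
    FROM authors
    LEFT JOIN posts ON authors.id = posts.author_id) AS t
""")

all_columns = count("""
    SELECT COUNT(*) FROM (SELECT authors.*
    FROM authors
    LEFT JOIN posts ON authors.id = posts.author_id) AS t
""")

# A true cartesian product: every author paired with every post.
cartesian = count("SELECT COUNT(*) FROM authors, posts")

print(id_only, all_columns, cartesian)
```

Both LEFT JOIN queries yield 4 rows (two for Ann, one for Ben, one for Cy with NULL post columns), while the cartesian product yields 3 x 3 = 9.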

Full text search across columns

Sorry for the bad post title but I couldn't summarize this better.
It's better to use an example. Say I have this simple table with two text columns (I'm leaving the other columns out).
Id|Text_1|Text_2
1|a|a b
2|c|a b
Now if I want to search for '"a" and not "b"', in my current implementation I'm getting record 1 back. I understand why this is: the search condition is a match on column "Text_1" of record 1, while for record 2 it's not a match on any column.
However, for the end user this may not be intuitive, as they probably mean to exclude record 1 as well most of the time.
So my question is, if I want to tell SQL Server to do the matching "across all columns" (meaning that if the "NOT" portion is found on ANY column, the record shouldn't match), is it possible?
EDIT: This is what my query would look like for this example:
SELECT Id, TextHits.RANK Rank, Text_1, Text_2 FROM simple_table
JOIN CONTAINSTABLE(simple_table, (Text_1, Text_2), '"a" and not "b"') TextHits
ON TextHits.[KEY] = simple_table.Id
ORDER BY Rank DESC
The actual query is a bit more complicated (more columns, more joins, etc) but this is the general idea :)
Thanks!
The logic is going to be evaluated against each column separately, so if you want an exclusion hit in any one column of a row to exclude the whole row, you should use NOT EXISTS and break the full-text query into separate inclusionary and exclusionary parts...
SELECT Id,
TextHits.RANK Rank,
Text_1,
Text_2
FROM simple_table
JOIN CONTAINSTABLE(simple_table, (Text_1, Text_2), '"a"') TextHits
ON TextHits.[KEY] = simple_table.Id
WHERE NOT EXISTS (SELECT 1
FROM CONTAINSTABLE(simple_table, (Text_1, Text_2), '"b"') exclHits
WHERE TextHits.[KEY] = exclHits.[KEY])
ORDER BY Rank DESC
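To see why splitting the condition changes the result, here is a plain Python sketch of the two evaluation strategies. It does not use SQL Server full-text search at all; it only mimics the logic with whole-word matching, assuming the sample table reads Text_1 = "a", Text_2 = "a b" for record 1 and Text_1 = "c", Text_2 = "a b" for record 2. The function names are hypothetical.

```python
# Sample rows keyed by Id (values as read from the question's table).
rows = {
    1: {"Text_1": "a", "Text_2": "a b"},
    2: {"Text_1": "c", "Text_2": "a b"},
}

def words(text):
    """Split a column value into a set of whole words."""
    return set(text.split())

def per_column_match(row):
    # '"a" and not "b"' checked inside each column on its own;
    # the row matches if any single column satisfies the whole condition.
    return any("a" in words(t) and "b" not in words(t) for t in row.values())

def across_columns_match(row):
    # Inclusion and exclusion checked separately over all columns,
    # mirroring the JOIN on "a" plus NOT EXISTS on "b".
    has_a = any("a" in words(t) for t in row.values())
    has_b = any("b" in words(t) for t in row.values())
    return has_a and not has_b

per_column = [key for key, row in rows.items() if per_column_match(row)]
across = [key for key, row in rows.items() if across_columns_match(row)]

print(per_column)
print(across)
```

The per-column check returns record 1 (its Text_1 contains "a" and no "b"), while the across-columns check returns nothing, because both records contain "b" somewhere.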

Filtering Database Results to Top n Records for Each Value in a Lookup Column

Let's say I have two tables in my database.
TABLE:Categories
ID|CategoryName
01|CategoryA
02|CategoryB
03|CategoryC
and a table that references the Categories and also has a column storing some random number.
TABLE:CategoriesAndNumbers
CategoryType|Number
CategoryA|24
CategoryA|22
CategoryC|105
.....(20,000 records)
CategoryB|3
Now, how do I filter out this data? So, I want to know what the 3 smallest numbers are out of each category and delete the rest. The end result would be like this:
TABLE:CategoriesAndNumbers
CategoryType|Number
CategoryA|2
CategoryA|5
CategoryA|18
CategoryB|3
CategoryB|500
CategoryB|1601
CategoryC|1
CategoryC|4
CategoryC|62
Right now, I can get the smallest numbers between all the categories, but I would like each category to be compared individually.
EDIT: I'm using Access and here's my code so far
SELECT TOP 10 cdt1.sourceCounty, cdt1.destCounty, cdt1.distMiles
FROM countyDistanceTable as cdt1, countyTable
WHERE cdt1.sourceCounty = countyTable.countyID
ORDER BY cdt1.sourceCounty, cdt1.distMiles, cdt1.destCounty
EDIT2: Thanks to Remou, here would be the working query that solved my problem. Thank you!
DELETE
FROM CategoriesAndNumbers a
WHERE a.Number NOT IN (
SELECT Top 3 [Number]
FROM CategoriesAndNumbers b
WHERE b.CategoryType=a.CategoryType
ORDER BY [Number])
You could use something like:
SELECT a.CategoryType, a.Number
FROM CategoriesAndNumbers a
WHERE a.Number IN (
SELECT Top 3 [Number]
FROM CategoriesAndNumbers b
WHERE b.CategoryType=a.CategoryType
ORDER BY [Number])
ORDER BY a.CategoryType
The difficulty with this is that Jet/ACE Top returns tied values where they exist, so you will not necessarily get three values, but more if there are ties. The problem can often be solved with a key field, if one exists:
WHERE a.Number IN (
SELECT Top 3 [Number]
FROM CategoriesAndNumbers b
WHERE b.CategoryType=a.CategoryType
ORDER BY [Number], [KeyField])
However, I do not think it will help in this instance, because the outer table will include ties.
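The correlated subquery approach can be tried outside Access. This is a minimal sketch using Python's sqlite3 module, where "LIMIT 3" stands in for Jet/ACE "TOP 3" (an assumption: SQLite's LIMIT never returns extra tied rows from the subquery itself). The sample numbers are invented, with a deliberate tie in CategoryB to show that the outer query still returns both tied rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CategoriesAndNumbers (CategoryType TEXT, Number INTEGER)"
)
sample = (
    [("CategoryA", n) for n in (24, 22, 2, 5, 18)]
    + [("CategoryB", n) for n in (3, 500, 1601, 1601)]  # tie on 1601
)
conn.executemany("INSERT INTO CategoriesAndNumbers VALUES (?, ?)", sample)

# For each outer row, keep it only if its Number is among the three
# smallest Numbers of its own category.
top_three = conn.execute("""
    SELECT a.CategoryType, a.Number
    FROM CategoriesAndNumbers a
    WHERE a.Number IN (
        SELECT b.Number
        FROM CategoriesAndNumbers b
        WHERE b.CategoryType = a.CategoryType
        ORDER BY b.Number
        LIMIT 3)
    ORDER BY a.CategoryType, a.Number
""").fetchall()

for row in top_three:
    print(row)
```

CategoryA comes back with exactly 2, 5 and 18, but CategoryB comes back with four rows, because both rows holding 1601 match the value list produced by the subquery.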
Order it by number and take 3, find out what the biggest of those three numbers is, and then remove the rows where Number is greater than that number.
I imagine it would need to be two separate queries, as your business tier would hold the value for the biggest number out of the 3 results and dynamically build the query to delete the rest.

Fetch last item in a category that fits specific criteria

Let's assume I have a database with two tables: categories and articles. Every article belongs to a category.
Now, let's assume I want to fetch the latest article of each category that fits a specific criterion (read: the article does). If it weren't for that extra criterion, I could just add a column called last_article_id or something similar to the categories table, even though that wouldn't be properly normalized.
How can I do this though? I assume there's something using GROUP BY and HAVING?
Try with:
SELECT *
FROM categories AS c
LEFT JOIN (SELECT * FROM articles ORDER BY id DESC) AS a
ON c.id = a.id_category
AND /criterias about joining/
WHERE /more criterias/
GROUP BY c.id
If you provide us with the table schemas, we could be a little more specific, but you could try something like this (see 12.2.9.6. EXISTS and NOT EXISTS, SELECT Syntax for LIMIT):
SELECT *
FROM articles a
WHERE EXISTS (
SELECT 1
FROM articles
where category_id = a.category_id
AND <YourCriteria Here>
ORDER BY <Order Required : ID DESC, LastDate DESC or something?>
LIMIT 1
)
Assuming the ids in the articles table are always increasing, this should work. Using the id is not semantically correct IMHO; you should actually use a date/time stamp field if one is available.
SELECT * FROM articles WHERE article_id IN
(
SELECT
MAX(article_id)
FROM
articles
WHERE [your filters here]
GROUP BY
category_id
)
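Here is a minimal sketch of the MAX(article_id) approach using Python's sqlite3 module. The published column stands in for "[your filters here]" and is purely hypothetical, as is the sample data; like the query above, it assumes that a higher article_id means a newer article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (
        article_id INTEGER PRIMARY KEY,
        category_id INTEGER,
        published INTEGER   -- hypothetical criterion: 1 = fits, 0 = does not
    );
    INSERT INTO articles VALUES
        (1, 1, 1),
        (2, 1, 1),
        (3, 1, 0),   -- newest in category 1, but does not fit the criterion
        (4, 2, 1),
        (5, 2, 0),
        (6, 3, 0);   -- category 3 has no article that fits
""")

# Inner query: highest qualifying article_id per category.
# Outer query: fetch the full rows for those ids.
latest = conn.execute("""
    SELECT article_id, category_id
    FROM articles
    WHERE article_id IN (
        SELECT MAX(article_id)
        FROM articles
        WHERE published = 1
        GROUP BY category_id)
    ORDER BY category_id
""").fetchall()

print(latest)
```

Category 1 yields article 2 rather than the newer article 3, category 2 yields article 4, and category 3 does not appear because none of its articles fit the criterion.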