T-SQL: change an IN query to an AND query - sql

I have a one-to-many relationship table:
ReviewId | EffectId
1 | 2
1 | 5
1 | 8
2 | 2
2 | 5
2 | 9
2 | 3
3 | 3
3 | 2
3 | 9
On the site, the user selects the effects he wants, and I return all the relevant reviews.
I do this with an IN query.
For example, if the user selects effects 2 and 5,
my query is:
select reviewid from table_name where effectid in (2,5)
Now I need to get all the reviews that contain both effects,
that is, all reviews that have effect 2 and effect 5.
What is the best query for this?
It is important to me that the query runs as quickly as possible.
For this I can also change the table schema if needed, for example by adding a cached field that contains all the effects separated by commas, like:
ReviewId | cachedEffects
1 | ,2,5,8
2 | ,2,5,9,3,
3 | ,3,2,9

You can do it this way:
select reviewid
from
tbl
where effectid in (2,5)
group by reviewid
having count(distinct effectid) > 1
count (distinct effectid) is used to ensure that the results contain only those reviewIDs which have multiple records with different values of effectID. The where clause is used to filter out based on your filter condition of having both 2 and 5.
The key thing to note here is that we are grouping by reviewID, and also using the count of distinct effectID values to ensure that only those records which have both 2 and 5 are returned. If we did not do so, the query would return all rows which have effectID equal to either 2 or 5.
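To improve performance, you could create an index on reviewID. Since the query filters on effectID first, a composite index on (effectID, reviewID) may help more; check the execution plan to confirm.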
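The same GROUP BY / HAVING idea can be generalized to any number of selected effects by comparing the distinct count with the number of requested values instead of using "> 1". The following is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server; the table name review_effects and the helper reviews_with_all_effects are assumptions made for illustration, and the query text itself should carry over to T-SQL unchanged.

```python
import sqlite3

# In-memory database holding the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE review_effects (reviewid INTEGER, effectid INTEGER)")
conn.executemany(
    "INSERT INTO review_effects VALUES (?, ?)",
    [(1, 2), (1, 5), (1, 8),
     (2, 2), (2, 5), (2, 9), (2, 3),
     (3, 3), (3, 2), (3, 9)],
)


def reviews_with_all_effects(conn, effect_ids):
    """Return review ids that have every effect in effect_ids (hypothetical helper)."""
    wanted = sorted(set(effect_ids))
    placeholders = ",".join("?" for _ in wanted)
    sql = f"""
        SELECT reviewid
        FROM review_effects
        WHERE effectid IN ({placeholders})
        GROUP BY reviewid
        HAVING COUNT(DISTINCT effectid) = ?   -- must match ALL requested effects
        ORDER BY reviewid
    """
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]


both = reviews_with_all_effects(conn, [2, 5])
three = reviews_with_all_effects(conn, [2, 3, 9])
print(both, three)
```

With the sample data, reviews 1 and 2 have both effect 2 and effect 5, while review 3 lacks effect 5.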

Related

Flatten tree structure represented in SQL [duplicate]

I'm using an engineering calculation package and trying to extract some information from it using a built-in reporting tool that allows SQL queries.
An abbreviated example of the SQL table is as follows:
Id | Description | Ref
---|---------------------
1 | system 1 |
3 | block 4 | 6
3 | block 4 | 1
5 | formula1 | 3
6 | f |
7 | something | 1
9 | cheese | 5
The "Ref" column identifies rows that are subrecords of other items.
What I want to do is run a query that will produce a list showing all items that appear on each page. As you can see from the table above, "ID" is not a unique key; each item can appear in multiple locations within the table. In the example above:
ID 5 is a subitem of ID 3
ID 3 is a subitem of ID 1 AND ID 6
ID 1 and ID 6 aren't subitems of anything
So effectively it is representing a tree structure:
ID 1
+---- ID 7
+---- ID 3
      +---- ID 5
            +---- ID 9
ID 6
+---- ID 3
      +---- ID 5
            +---- ID 9
What I'm hoping to do is work out which items appear under each top-level item (so the end result should be a table where only top-level items appear in the "Ref" column):
Id | Description | Ref
---|---------------------
1 | system 1 |
3 | block 4 | 6
3 | block 4 | 1
5 | formula1 | 1
5 | formula1 | 6
6 | f |
9 | cheese | 1
9 | cheese | 6
7 | something | 1
The tree structure can be up to 5 levels deep.
I've been trying to use left joins to build up a list of page references, but I think I'm also going to need to union results tables (because obviously rows like ID=9, ID=5, and ID = 6 have to be duplicated in the final results set). It starts to get a bit messy!
WITH A
AS (SELECT *
    FROM [RbdBlocks]),
B
AS (SELECT [x].[Id],
           [x].[Description],
           [x].[Page] AS Page1,
           [y].[Page] AS Page2  -- no trailing comma before FROM
    FROM A AS x
         LEFT OUTER JOIN
         A AS y
         ON y.Id = x.Page)
SELECT *
FROM B
The above gives me some of the nested references, but I'm not sure if there's a better way to get this data together, and to manage the recursion rather than just duplicating the set of queries 4 times?
Have a look at Recursive Common Table Expressions (CTEs). They should be able to accomplish exactly what you need.
Have a look at Example D on the SQL Docs page.
Basically what you'd do in your case is:
In the "anchor member" of the CTE, select all top-level items
In the "recursive member" of the CTE, join all of the nested children to the top-level item
Recursive CTEs are not really trivial to understand, so be sure to read the docs carefully.
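As a rough illustration of the anchor/recursive split described above, here is a minimal sketch run against the abbreviated example table using Python's built-in sqlite3 module. The table name items and the CTE column names root and depth are assumptions made for this example. SQLite requires the RECURSIVE keyword; in SQL Server the CTE is written with a plain WITH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, description TEXT, ref INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [
        (1, "system 1", None),
        (3, "block 4", 6),
        (3, "block 4", 1),
        (5, "formula1", 3),
        (6, "f", None),
        (7, "something", 1),
        (9, "cheese", 5),
    ],
)

flatten_sql = """
WITH RECURSIVE tree (id, description, root, depth) AS (
    -- Anchor member: top-level items are their own root.
    SELECT id, description, id, 0
    FROM items
    WHERE ref IS NULL
    UNION ALL
    -- Recursive member: attach each child to the root of its parent.
    SELECT child.id, child.description, tree.root, tree.depth + 1
    FROM items AS child
    JOIN tree ON child.ref = tree.id
)
SELECT id,
       description,
       CASE WHEN depth = 0 THEN NULL ELSE root END AS ref
FROM tree
ORDER BY id, ref
"""

flattened = conn.execute(flatten_sql).fetchall()
for row in flattened:
    print(row)
```

Because ID 3 is listed under both ID 1 and ID 6, everything below it (ID 5 and ID 9) is reported once per top-level item, which matches the desired output in the question.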

How does DISTINCT interact with ORDER BY?

Consider the two tables below:
user:
ID | name
---+--------
1 | Alice
2 | Bob
3 | Charlie
event:
order | user
------+------------
1 | 1 (Alice)
2 | 2 (Bob)
3 | 3 (Charlie)
4 | 3 (Charlie)
5 | 2 (Bob)
6 | 1 (Alice)
If I run the following query:
SELECT DISTINCT user FROM event ORDER BY "order" DESC;
will it be guaranteed that I get the results in the following order?
1 (Alice)
2 (Bob)
3 (Charlie)
If the last three rows of event are selected, I know this is the order I get, because it would be ordering 4, 5, 6 in descending order. But if the first three rows are selected, and then DISTINCT prevents the last three from being loaded for consideration, I would get it in reversed order.
Is this behavior well defined in SQL? Which of the two will happen? What about in SQLite?
No, it will not be guaranteed.
Find Itzik Ben-Gan's Logical Query Processing Phases poster for MS SQL. It migrates over many sites, currently found at https://accessexperts.com/wp-content/uploads/2015/07/Logical-Query-Processing-Poster.pdf .
DISTINCT precedes ORDER BY .. TOP, and SQL Server is free to return either of the rows 1 | 1 (Alice) or 6 | 1 (Alice) for Alice. So any of (1,2,3), (1,4,5) and so on are valid results of DISTINCT.
Here's a query solution that I believe solves your problem.
SELECT
MAX([order]) AS MaxOrd
, [user]
FROM Event
GROUP BY [User]
ORDER BY MaxOrd DESC
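The GROUP BY version makes the ordering key explicit: each user is ranked by its latest event. Below is a minimal sketch of that query against the sample data, using Python's built-in sqlite3 module (which also covers the SQLite part of the question). The column alias max_ord is an assumption for illustration, and "order" and "user" are quoted because "order" is a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE event ("order" INTEGER, "user" INTEGER)')
conn.executemany(
    "INSERT INTO event VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 3), (4, 3), (5, 2), (6, 1)],
)

# One row per user, sorted by that user's most recent event.
latest_first = conn.execute(
    """
    SELECT "user", MAX("order") AS max_ord
    FROM event
    GROUP BY "user"
    ORDER BY max_ord DESC
    """
).fetchall()
print(latest_first)
```

Here Alice (1) comes first because her latest event is 6, followed by Bob (5) and Charlie (4), and that order no longer depends on which duplicate row the engine happens to keep.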

PostgreSQL: Distribute rows evenly and according to frequency

I have trouble with a complex ordering problem. I have the following example data:
table "categories"
id | frequency
1 | 0
2 | 4
3 | 0
table "entries"
id | category_id | type
1 | 1 | a
2 | 1 | a
3 | 1 | a
4 | 2 | b
5 | 2 | c
6 | 3 | d
I want to put the entries rows in an order so that category_id
and type are distributed evenly.
More precisely, I want to order entries so that:
1. category_ids that refer to a category with frequency=0 are
distributed evenly, so that a row is followed by a different category_id
whenever possible, e.g. category_ids of rows: 1,2,1,3,1,2.
2. Rows with category_ids of categories with frequency<>0 are
inserted from around the beginning, with a minimum of frequency rows between them
(the gaps should vary). In my example these are the rows with category_id=2.
So the result could start with row id #1, then #4, then a minimum of 4 rows of other
categories, then #5.
3. In the end result, rows with the same type should not be next to each other.
Example result:
id | category_id | type
1 | 1 | a
4 | 2 | b
2 | 1 | a
6 | 3 | d
.. some other row ..
.. some other row ..
.. some other row ..
5 | 2 | c
entries are like a stream of things the user gets (one at a time).
The whole ordering should give users some variation. It's just there to not
present them similar entries all the time, so it doesn't have to be perfect.
The query also does not have to give the same result on each call - using
random() is totally fine.
frequencies are there to give entries of certain categories a higher
priority so that they are not distributed across the whole range, but are placed more
at the beginning of the result list. Even if there are a lot of these entries, they
should not completely crowd out the frequency=0 entries at the beginning, though.
I'm not sure how to start this. I think I can use window functions and
ntile() to distribute rows by category_id and type.
But I have no idea how to insert the non-0-category-entries afterwards.

Don't understand why inner join is necessary for filtering in sql

I have the following tables:
Basically I have a many2many relation between students and courses using the junction table students_courses
Here is some data populated into the tables:
students:
courses
students_courses:
So basically I would like to select the full_name and c_id for a given student. For example, for the student with id=3 I would get Aurica 5 and Aurica 6.
My first approach was to write:
select s.full_name,sc.c_id from students s, students_courses sc
where sc.s_id=3
But I obtain this:
Aurica 5
Aurica 6
Aurica 5
Aurica 6
Aurica 5
Aurica 6
So it is duplicated by the number of rows of the students_courses table. Now I'm not sure why this happens.
If I would be an SQL parser, I would parse it like this:
"take the c_id from students_courses, full_name from students, and display them if the students_course row respects the where filter"
Now it works using a join, but I don't really understand why the inner join is necessary.
select s.full_name, sc.c_id from students s
inner join students_courses sc
on sc.s_id=s.id and s.id=3;
Could you explain how the first SQL statement is interpreted by the SQL parser and why it works with the join?
Thanks,
When you select from two tables, the engine forms a cross product of all the records and then keeps only the records that satisfy the WHERE clause. You have 3 records in the students table:
id | full_name
---+----------
3 | Aurica
4 | Aurica
5 | Aurica
And 6 records in the students_courses table:
s_id | c_id
-----+-----
3 | 5
3 | 6
4 | 7
4 | 8
5 | 9
5 | 10
So before your WHERE clause is applied, it creates 18 different records. To make this easy to see, I will include all of the columns:
s.id | s.full_name | sc.s_id | sc.c_id
-----+-------------+---------+--------
3 | Aurica | 3 | 5
3 | Aurica | 3 | 6
3 | Aurica | 4 | 7
3 | Aurica | 4 | 8
3 | Aurica | 5 | 9
3 | Aurica | 5 | 10
4 | Aurica | 3 | 5
4 | Aurica | 3 | 6
4 | Aurica | 4 | 7
4 | Aurica | 4 | 8
4 | Aurica | 5 | 9
4 | Aurica | 5 | 10
5 | Aurica | 3 | 5
5 | Aurica | 3 | 6
5 | Aurica | 4 | 7
5 | Aurica | 4 | 8
5 | Aurica | 5 | 9
5 | Aurica | 5 | 10
From there it only displays the ones where sc.s_id=3:
s.full_name | sc.c_id
------------+--------
Aurica | 5
Aurica | 6
Aurica | 5
Aurica | 6
Aurica | 5
Aurica | 6
The second query compares sc.s_id with s.id and only displays the rows where those values are the same and s.id=3.
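The three stages described above (cross product, filter without a join predicate, filter with a join predicate) can be reproduced with a short script. This is a minimal sketch using Python's built-in sqlite3 module and the sample rows from this answer; the variable names are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER, full_name TEXT)")
conn.execute("CREATE TABLE students_courses (s_id INTEGER, c_id INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [(3, "Aurica"), (4, "Aurica"), (5, "Aurica")],
)
conn.executemany(
    "INSERT INTO students_courses VALUES (?, ?)",
    [(3, 5), (3, 6), (4, 7), (4, 8), (5, 9), (5, 10)],
)

# No join predicate at all: every student paired with every enrolment.
cross = conn.execute(
    "SELECT s.id, sc.s_id, sc.c_id FROM students s, students_courses sc"
).fetchall()

# The question's first query: the filter touches only students_courses,
# so each matching enrolment is still repeated once per student.
filtered_only = conn.execute(
    "SELECT s.full_name, sc.c_id "
    "FROM students s, students_courses sc "
    "WHERE sc.s_id = 3"
).fetchall()

# With the join predicate, each enrolment is paired only with its own student.
joined = conn.execute(
    "SELECT s.full_name, sc.c_id "
    "FROM students s "
    "INNER JOIN students_courses sc ON sc.s_id = s.id "
    "WHERE s.id = 3 "
    "ORDER BY sc.c_id"
).fetchall()

print(len(cross), len(filtered_only), joined)
```

The counts line up with the tables above: 3 x 6 = 18 rows in the cross product, 6 rows after filtering on sc.s_id alone, and 2 rows once sc.s_id = s.id is part of the query.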
The SQL parser doesn't try to guess how your two tables are related. It would seem like the database engine has enough information to figure this out itself by following constraints, but SQL intentionally doesn't use the FK relationships to decide how to join your tables; you might want to remove constraints at a future date for some reason (such as in order to improve performance), and you wouldn't want dropping a constraint to alter how joins were made. The DBA needs freedom to change indexes and constraints without having to worry about having changed what results are returned by queries.
Since it can't count on having complete information to go on, the SQL engine is not in the business of deducing/guessing relationships. It's up to the person writing the SQL to specify what they are joining on. If you don't give it any instructions telling it how to hook up the tables (using a JOIN ON clause or WHERE clause) then it creates a cross join, which gives you the duplicated results.
First of all, SQL is a set-based language, you operate on sets of data, not on single (rows of) data.
If I would be an SQL parser, I would parse it like this: "take the
c_id from students_courses, full_name from students, and display them
if the students_course row respects the where filter"
Here, you're overlooking the sets students_courses and students, and just thinking about each row of data, as in: if this row respects the filter, give me all the information.
The JOIN doesn't filter data (that's what WHERE does), but instead it puts it together.
When you SELECT from table A, you ask for the set of rows in A, all of them.
When you SELECT from table A WHERE some condition, you ask for the set of rows in A that respect the condition (so the SQL engine discards rows from A that do not belong to the set you described with your query).
When you JOIN table_a and table_b, you ask to join the set of rows in a with the set of rows in b, obtaining a new set whose rows are the "concatenation" (let me use that term) of the columns from a row in A and the columns from a row in B; this, without giving any other information about how to join the rows, simply results in each row of table_a joined with each row of table_b.
That's why you don't get what you expect.
Finally, from a conceptual point of view, I'd like to point out that the SQL engine doesn't take the columns you request from one table or another. Instead, after (1) joining the rows of all the tables you requested and (2) filtering out any row that doesn't match the WHERE condition, it returns the columns you requested from the rows of the resulting set.
In real life, an RDBMS may reorder these operations and apply any optimization it finds possible based on indexes and other information it has about the query and the tables.
This should give you a rough idea of what's going on. But as @GordonLinoff suggested, I think you should get a stronger grounding in SQL and relational databases before you go any further, or it will get harder than this.
As a side note, what you had in your FROM clause, is a sort of implicit join, a former join syntax in which the FROM clause specifies the tables involved, and the WHERE clause the join predicate (the columns whose values should match to join the data).
If you would have done something like
select s.full_name,sc.c_id
from students s, students_courses sc
where sc.s_id = s.id --<-- you left this out
AND sc.s_id=3
you would have got the same results. An inner join is not strictly necessary for this statement, but it is good practice to use the newer INNER JOIN syntax to retrieve data.
Both of your queries are in fact joins, only in your first example there is no word "join" (but it is there, trust me).
However, that's an old-style join and it is no longer recommended. With the old syntax it is easy to leave out the join predicate, as happened here with sc.s_id = s.id, and that omission is why you got the wrong result.

Sort by data from multiple columns

For customer reviews on my products, I have them stored in SQL something like the below:
durability | cost | appearance
----------------------------------
5 | 3 | 4
2 | 4 | 2
1 | 5 | 5
Each value is a score out of five in one of the three categories.
When I want to print this information on page, I'd like to order them in descending order by the average score of an individual review.
SELECT *
FROM reviews
SORT BY (durability+cost+appearance)/3 DESC
Obviously this doesn't work, but is there a way to get my result? I don't want to include an average column in SQL because outside of this one small application, it serves zero purpose.
Use ORDER BY instead of SORT BY:
SELECT *
FROM reviews
ORDER BY (durability+cost+appearance)/3 DESC
EDIT:
To see the order by value, try adding one more column in the select clause:
SELECT *,(durability+cost+appearance)/3 as OrderValue
FROM reviews
ORDER BY (durability+cost+appearance)/3 DESC
Sample output:
DURABILITY COST APPEARANCE ORDERVALUE
5 3 4 4
1 5 5 3
2 4 2 2
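Note that the sample output shows whole numbers (4, 3, 2) because the three columns are integers and the engine used truncates integer division. That can produce ties between reviews whose true averages differ. A minimal sketch of the difference, using Python's built-in sqlite3 module (SQLite also truncates integer division; other engines may behave differently), with the column alias order_value assumed for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews (durability INTEGER, cost INTEGER, appearance INTEGER)"
)
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?, ?)",
    [(5, 3, 4), (2, 4, 2), (1, 5, 5)],
)

# Integer division: 11/3 becomes 3 and 8/3 becomes 2.
int_avg = conn.execute(
    "SELECT durability, cost, appearance, "
    "       (durability + cost + appearance) / 3 AS order_value "
    "FROM reviews ORDER BY order_value DESC"
).fetchall()

# Dividing by 3.0 keeps the fractional part of the average.
real_avg = conn.execute(
    "SELECT durability, cost, appearance, "
    "       (durability + cost + appearance) / 3.0 AS order_value "
    "FROM reviews ORDER BY order_value DESC"
).fetchall()

print(int_avg)
print(real_avg)
```

For sorting alone, ordering by the plain sum durability+cost+appearance gives the same ranking as the exact average and avoids the division entirely.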