I'm trying to solve a task from an online SQL course. There are 2 tables: city and shipment linked by city_id primary key. City table has a long_name column and shipment has a weight column, which I need to have in the output. The task is to find cities with maximum and minimum weight of a single shipment.
I tried making 2 queries with ORDER BY and LIMIT and then combining them with UNION, but it turns out that I can only apply ORDER BY and LIMIT to the combined result, not to each query separately.
I searched for possible solutions, but they involve sub-queries, UNPIVOT, greatest() or least(), which haven't been introduced in the course yet. The task has to be solved with UNION if I understood correctly.
I appreciate any help with this.
Here is the query I tried, which results in ERROR: syntax error at or near "UNION" Position: 127
SELECT
c.city_name, s.weight
FROM
sql.city c
JOIN sql.shipment s ON c.city_id = s.city_id
ORDER BY 2 DESC
LIMIT 1
UNION
SELECT
c.city_name, s.weight
FROM
sql.city c
JOIN sql.shipment s ON c.city_id = s.city_id
ORDER BY 2
LIMIT 1
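For what it's worth, the syntax error comes from attaching ORDER BY and LIMIT to the first SELECT without parentheses. In PostgreSQL, wrapping each SELECT in parentheses lets each branch keep its own ORDER BY and LIMIT. Below is a minimal sketch in Python with the built-in sqlite3 module so that it can be run anywhere. The sample rows are made up, the sql. schema prefix is dropped, and because SQLite does not accept parenthesized branches, each branch is wrapped in a derived table there instead.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city (city_id INTEGER PRIMARY KEY, city_name TEXT);
    CREATE TABLE shipment (ship_id INTEGER PRIMARY KEY, city_id INTEGER, weight INTEGER);
    INSERT INTO city VALUES (1, 'Springfield'), (2, 'Shelbyville'), (3, 'Ogdenville');
    INSERT INTO shipment VALUES (1, 1, 500), (2, 2, 12000), (3, 3, 40), (4, 1, 7000);
""")

# PostgreSQL form (not executed here): parenthesize each branch.
#   (SELECT ... ORDER BY 2 DESC LIMIT 1)
#   UNION
#   (SELECT ... ORDER BY 2 LIMIT 1)
# SQLite workaround: wrap each branch in a derived table instead.
query = """
    SELECT * FROM (
        SELECT c.city_name, s.weight
        FROM city c JOIN shipment s ON c.city_id = s.city_id
        ORDER BY s.weight DESC LIMIT 1
    )
    UNION
    SELECT * FROM (
        SELECT c.city_name, s.weight
        FROM city c JOIN shipment s ON c.city_id = s.city_id
        ORDER BY s.weight LIMIT 1
    )
    ORDER BY 2 DESC
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The final ORDER BY applies to the combined result, so the heaviest shipment is listed first.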
Related
I have two database tables:
Cities with columns:
Country_Code | City_Code | City_Name
Countries with columns
Country_Code | Country_Name
Based on a few chars entered by User, it checks the City_Name column to return results to populate a City autocomplete box. The result needs to have the city code, city name, country code, and country name, hence the need for a join.
The query I am using is
SELECT TOP 10
ci.Country_Code, ci.City_Code, ci.City_Name, co.Country_Name
FROM
Cities ci
LEFT OUTER JOIN
Countries co ON ci.Country_Code = co.Country_Code
WHERE
ci.City_Name LIKE '#CityName'
ORDER BY
ci.City_Name
The results I get are correct, but the query takes a long time to complete. From what I understand, first, the results contain join of both the tables, then the where clause kicks in to get the specific rows only, which are ordered by City Name and top 10 results returned.
My question is: is there a way to speed up the query? Can the WHERE clause be checked first and the join performed only afterwards, or better still, only on the top 10 results? I tried putting my WHERE clause in the ON clause, but that gave wrong results.
EDIT : #CityName contains 2-3 chars entered by the user and then a '%'.
I'd suggest starting by adding a clustered index on Countries.Country_Code (and making it the primary key of the Countries table if it is not already). A clustered index keeps the table sorted by that column, which speeds up the lookups done by the join.
This appears to be your query:
SELECT TOP 10 ci.Country_Code, ci.City_Code, ci.City_Name, co.Country_Name
FROM Cities ci LEFT OUTER JOIN
Countries co
ON ci.Country_Code = co.Country_Code
WHERE ci.City_Name LIKE #CityName
ORDER BY ci.City_Name ;
Quotes should not be needed around #CityName.
I don't understand the LEFT JOIN. It suggests that there are cities without a valid Country_Code -- and that seems unlikely.
Assuming #CityName does not start with a wildcard (as suggested by your question), then this can make use of an index. I would suggest the following indexes:
cities(city_name, country_code)
countries(country_code, country_name)
The second is not needed if country_code is a primary key.
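As a rough illustration of the suggested indexes, here is a sketch in Python using the built-in sqlite3 module rather than SQL Server. The sample rows and the index name ix_cities_name are made up, TOP 10 becomes LIMIT 10 in SQLite, and whether an engine actually uses the index for a LIKE prefix search depends on its collation settings, so treat this as a starting point to check against your own query plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Country_Code as primary key already gives Countries an index.
    CREATE TABLE Countries (Country_Code TEXT PRIMARY KEY, Country_Name TEXT);
    CREATE TABLE Cities (Country_Code TEXT, City_Code TEXT, City_Name TEXT);
    -- Index on the searched column first, then the join column.
    CREATE INDEX ix_cities_name ON Cities (City_Name, Country_Code);

    INSERT INTO Countries VALUES
        ('GB', 'United Kingdom'), ('US', 'United States'), ('FR', 'France');
    INSERT INTO Cities VALUES
        ('GB', 'LON', 'London'), ('US', 'LGB', 'Long Beach'), ('FR', 'PAR', 'Paris');
""")

# The search pattern is passed as a bound parameter, so no quotes are needed.
query = """
    SELECT ci.Country_Code, ci.City_Code, ci.City_Name, co.Country_Name
    FROM Cities ci
    LEFT JOIN Countries co ON ci.Country_Code = co.Country_Code
    WHERE ci.City_Name LIKE ?
    ORDER BY ci.City_Name
    LIMIT 10
"""
rows = conn.execute(query, ("Lon%",)).fetchall()
for row in rows:
    print(row)
```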
I wrote several SQL queries and executed them against my table. Each individual query worked. I kept adding functionality until I got a really ugly working query. The problem is that I have to manually change a value every time I want to use it. Can you assist in making this query automatic rather than “manual”?
I am working with DB2.
Table below shows customers (cid) from 1 to 3. 'club' is a book seller, and 'qnty' is the number of books the customer bought from each 'club'. The full table has 45 customers.
Image below shows all the table elements for the first 3 users (cid=1 OR cid=2 OR cid=3). The final purpose of all my queries (once combined) is to find the single 'club' with the largest 'qnty' for each 'cid'. So for 'cid=1' the 'club' is Readers Digest with 'qnty' of 3. For 'cid=2' the 'club' is YRB Gold with 'qnty' of 5. And so on until cid 45 is reached.
To give you a background on what I did here are my queries:
(Query 1-starting point for cid=1)
SELECT * FROM yrb_purchase WHERE cid=1
(Query 2 - find the 'club' with the highest 'qnty' for cid=1)
SELECT *
FROM
(SELECT club,
sum(qnty) AS t_qnty
FROM yrb_purchase
WHERE cid=1
GROUP BY club)results
ORDER BY t_qnty DESC
(Query 3 - combine the record from the above query with its cid)
SELECT cid,
temp.club,
temp.t_qnty
FROM yrb_purchase AS p,
(SELECT *
FROM
(SELECT club,
sum(qnty) AS t_qnty
FROM yrb_purchase
WHERE cid=1
GROUP BY club)results
ORDER BY t_qnty DESC FETCH FIRST 1 ROWS ONLY) AS TEMP
WHERE p.cid=1
AND p.club=temp.club
(Query 4 - make sure there is only one record for cid=1)
SELECT cid,
temp.club,
temp.t_qnty
FROM yrb_purchase AS p,
(SELECT *
FROM
(SELECT club,
sum(qnty) AS t_qnty
FROM yrb_purchase
WHERE cid=1
GROUP BY club)results
ORDER BY t_qnty DESC FETCH FIRST 1 ROWS ONLY) AS TEMP
WHERE p.cid=1
AND p.club=temp.club FETCH FIRST ROWS ONLY
To get the 'club' with the highest 'qnty' for customer 2, I would simply change the text cid=1 to cid=2 in the last query above. My query seems to always produce the correct results. My question is: how do I modify my query to get the results for all 'cid's from 1 to 45 in a single table? That is, how do I get one table with all the cid values, the club that sold that cid the most books, and how many books were sold? Please keep in mind I am hoping you can modify my query as opposed to providing a better query.
If you decide that my query is way too ugly (I agree with you) and choose to provide another query, please be aware that I just started learning SQL and may not be able to understand your query. You should be aware that I already asked this question: For common elements, how to find the value based on two columns? SQL but I was not able to make the answer work (due to my SQL limitations - not because the answer wasn't good); and in the absence of a working answer I could not reverse engineer it to understand how it works.
Thanks in advance
****************************EDIT #1*******************************************
The results of the answer is:
You could use OLAP/Window Functions to achieve this:
SELECT
cid,
club,
qnty
FROM
(
SELECT
cid,
club,
qnty,
ROW_NUMBER() OVER (PARTITION BY cid order by qnty desc) as cid_club_rank
FROM
(
SELECT
cid,
club,
sum(qnty) as qnty
FROM yrb_purchase
GROUP BY cid, club
) as sub1
) as sub2
WHERE cid_club_rank = 1
The innermost statement (sub1) just grabs a total quantity for each cid/club combination. The second innermost statement (sub2) assigns a row_number to each cid/club combination within its cid, ordered by quantity (highest first). Then the outermost query keeps only the records where that row_number() is 1.
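To see the three layers at work, here is a small sketch that runs the same query through Python's built-in sqlite3 module instead of DB2 (window functions need SQLite 3.25 or newer). The purchase rows are made up, chosen only so that cid 1 ends up with Readers Digest at 3 and cid 2 with YRB Gold at 5, as in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yrb_purchase (cid INTEGER, club TEXT, qnty INTEGER);
    -- Hypothetical rows, not the real course data.
    INSERT INTO yrb_purchase VALUES
        (1, 'Readers Digest', 1), (1, 'Readers Digest', 2), (1, 'Oprah', 1),
        (2, 'YRB Gold', 5), (2, 'Basic', 2), (2, 'Basic', 1);
""")

query = """
    SELECT cid, club, qnty
    FROM (
        SELECT cid, club, qnty,
               ROW_NUMBER() OVER (PARTITION BY cid ORDER BY qnty DESC) AS cid_club_rank
        FROM (
            -- sub1: total quantity per cid/club combination
            SELECT cid, club, SUM(qnty) AS qnty
            FROM yrb_purchase
            GROUP BY cid, club
        ) AS sub1
    ) AS sub2
    WHERE cid_club_rank = 1
    ORDER BY cid
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If two clubs tie for a customer, ROW_NUMBER() picks one of them arbitrarily; adding a second ORDER BY column inside the OVER clause would make the choice deterministic.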
I have four different tables I am trying to query on, the first table is where I will be doing most of the querying, but if there is no match in car I am to look in other fields in the other tables to see if there is a match from a VIN parameter.
Example:
Select
c.id,
c.VIN,
c.uniqueNumber,
c.anotheruniqueNumber
FROM Cars c, Boat b
WHERE
c.VIN = #VIN(parameter),
b.SerialNumber = #VIN
Now say that I have no match in Cars, but there is a match in Boat, how would I be able to pull the matching Boat record vs the car record? I have tried to JOIN the tables, but the tables have no unique identifier to reference the other table.
I am trying to figure out the best way to search all the tables by one parameter with the least amount of code. I thought about doing UNION ALL, but I'm not sure that is what I really want for this situation, seeing as the number of records could get extremely large.
I am currently using SQL Server 2012. Thanks in advance!
UPDATED:
CAR table
ID VIN UniqueIdentifier AnotherUniqueIdentifier
1 2002034434 HH54545445 2016-A23
2 2002035555 TT4242424242 2016-A24
3 1999034534 AGH0000034 2016-A25
BOAT table
ID SerialNumber Miscellaneous
1 32424234243 545454545445
2 65656565656 FF24242424242
3 20023232323 AGH333333333
Expected Result if #VIN parameter matches a Boat identifier:
BOAT
ID SerialNumber Miscellaneous
2 65656565656 FF24242424242
Some sort of union all might be the best approach -- at least the fastest with the right indexes:
Select c.id, c.VIN, c.uniqueNumber, c.anotheruniqueNumber
from Cars c
where c.VIN = #VIN
union all
select b.id, b.VIN, b.uniqueNumber, b.anotheruniqueNumber
from Boats b
where b.VIN = #VIN and
not exists (select 1 from Cars C where c.VIN = #VIN);
This assumes that you have the corresponding columns in each of the tables (which your question implies is true).
The chain of not exists can get longer as you add more entity types. A simple way around is to do sorting instead -- assuming you want only one row:
select top 1 x.*
from (Select c.id, c.VIN, c.uniqueNumber, c.anotheruniqueNumber, 1 as priority
from Cars c
where c.VIN = #VIN
union all
select b.id, b.VIN, b.uniqueNumber, b.anotheruniqueNumber, 2 as priority
from Boats b
where b.VIN = #VIN
) x
order by priority;
There is a slight overhead for the order by. But frankly speaking, ordering 1-4 rows is trivial from a performance perspective.
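Here is a sketch of the priority approach against the sample tables from the question, run through Python's built-in sqlite3 module rather than SQL Server (so TOP 1 becomes LIMIT 1). Because the Boat table in the question has SerialNumber instead of VIN, the sketch maps both to a common identifier column; the source label and the helper name find_by_identifier are my own additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cars (ID INTEGER, VIN TEXT, UniqueIdentifier TEXT, AnotherUniqueIdentifier TEXT);
    CREATE TABLE Boat (ID INTEGER, SerialNumber TEXT, Miscellaneous TEXT);
    INSERT INTO Cars VALUES
        (1, '2002034434', 'HH54545445', '2016-A23'),
        (2, '2002035555', 'TT4242424242', '2016-A24'),
        (3, '1999034534', 'AGH0000034', '2016-A25');
    INSERT INTO Boat VALUES
        (1, '32424234243', '545454545445'),
        (2, '65656565656', 'FF24242424242'),
        (3, '20023232323', 'AGH333333333');
""")

# Each branch tags its rows with a priority; the lowest priority wins.
QUERY = """
    SELECT x.source, x.id, x.identifier
    FROM (
        SELECT 'CAR' AS source, c.ID AS id, c.VIN AS identifier, 1 AS priority
        FROM Cars c
        WHERE c.VIN = :vin
        UNION ALL
        SELECT 'BOAT', b.ID, b.SerialNumber, 2
        FROM Boat b
        WHERE b.SerialNumber = :vin
    ) x
    ORDER BY x.priority
    LIMIT 1
"""

def find_by_identifier(vin):
    """Return the best-priority match as (source, id, identifier), or None."""
    return conn.execute(QUERY, {"vin": vin}).fetchone()

print(find_by_identifier("65656565656"))
print(find_by_identifier("2002035555"))
```

With an index on Cars.VIN and Boat.SerialNumber, each branch is a single lookup, so the size of the tables should matter little.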
I am having a slow brain day...
The tables I am joining:
Policy_Office:
PolicyNumber OfficeCode
1 A
2 B
3 C
4 D
5 A
Office_Info:
OfficeCode AgentCode OfficeName
A 123 Acme
A 456 Acme
A 789 Acme
B 111 Ace
B 222 Ace
B 333 Ace
... ... ....
I want to perform a search to return all policies that are affiliated with an office name. For example, if I search for "Acme", I should get two policies: 1 & 5.
My current query looks like this:
SELECT
*
FROM
Policy_Office P
INNER JOIN Office_Info O ON P.OfficeCode = O.OfficeCode
WHERE
O.OfficeName = 'Acme'
But this query returns multiple rows, which I know is because there are multiple matches from the second table.
How do I write the query to only return two rows?
SELECT DISTINCT a.PolicyNumber
FROM Policy_Office a
INNER JOIN Office_Info b
ON a.OfficeCode = b.OfficeCode
WHERE b.officeName = 'Acme'
SQLFiddle Demo
To further gain more knowledge about joins, kindly visit the link below:
Visual Representation of SQL Joins
A simple join returns every matching combination of rows: you have 2 rows with office code A in the first table and 3 in the second, so you get 2 × 3 = 6 results. If you want only the policy number, then you should do a DISTINCT on it.
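The row multiplication is easy to reproduce with the sample data from the question. This is only a sketch using Python's built-in sqlite3 module; the variable names joined and policies are mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Policy_Office (PolicyNumber INTEGER, OfficeCode TEXT);
    CREATE TABLE Office_Info (OfficeCode TEXT, AgentCode INTEGER, OfficeName TEXT);
    INSERT INTO Policy_Office VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'A');
    INSERT INTO Office_Info VALUES
        ('A', 123, 'Acme'), ('A', 456, 'Acme'), ('A', 789, 'Acme'),
        ('B', 111, 'Ace'), ('B', 222, 'Ace'), ('B', 333, 'Ace');
""")

# Plain join: each of the 2 'A' policies pairs with each of the 3 'A' agents.
joined = conn.execute("""
    SELECT P.PolicyNumber, O.AgentCode
    FROM Policy_Office P
    INNER JOIN Office_Info O ON P.OfficeCode = O.OfficeCode
    WHERE O.OfficeName = 'Acme'
""").fetchall()

# DISTINCT on the policy number collapses the duplicates.
policies = [row[0] for row in conn.execute("""
    SELECT DISTINCT a.PolicyNumber
    FROM Policy_Office a
    INNER JOIN Office_Info b ON a.OfficeCode = b.OfficeCode
    WHERE b.OfficeName = 'Acme'
    ORDER BY a.PolicyNumber
""")]

print(len(joined), policies)
```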
(using MS-Sqlserver)
I know this thread is 10 years old, but I don't like distinct (in my head it means that the engine gathers all possible data, computes every selected row in each record into a hash and adds it to a tree ordered by that hash; I may be wrong, but it seems inefficient).
Instead, I use CTE and the function row_number(). The solution may very well be a much slower approach, but it's pretty, easy to maintain and I like it:
Given are a person table and a telephone table tied together with a foreign key (in the telephone table). This construct means that a person can have several numbers, but I only want the first, so that each person only appears once in the result set (I ought to be able to concatenate multiple telephone numbers into one string (pivot, I think), but that's another issue).
; -- don't forget this one!
with telephonenumbers
as
(
select [id]
, [person_id]
, [number]
, row_number() over (partition by [person_id] order by [activestart] desc) as rowno
from [dbo].[telephone]
where ([activeuntil] is null or [activeuntil] > getdate())
)
select p.[id]
,p.[name]
,t.[number]
from [dbo].[person] p
left join telephonenumbers t on t.person_id = p.id
and t.rowno = 1
This does the trick (in fact the last line does), and the syntax is readable and easy to expand. The example is simple, but when creating large scripts that join tables left and right (literally), it is difficult to keep unwanted duplicates out of the result, and difficult to identify which tables create them. CTE works great for me.
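For anyone who wants to try the pattern without a SQL Server instance, here is a rough equivalent run through Python's built-in sqlite3 module. The rows are invented, getdate() is replaced by a :today parameter with ISO date strings, and the square brackets and dbo schema are dropped; those substitutions are assumptions of this sketch, not part of the original script.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE telephone (
        id INTEGER PRIMARY KEY, person_id INTEGER, number TEXT,
        activestart TEXT, activeuntil TEXT
    );
    -- Hypothetical rows: Ann has two active numbers, Bob's has expired, Cy has none.
    INSERT INTO person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO telephone VALUES
        (1, 1, '555-0001', '2020-01-01', NULL),
        (2, 1, '555-0002', '2022-01-01', NULL),
        (3, 2, '555-0003', '2021-01-01', '2021-06-01');
""")

query = """
    WITH telephonenumbers AS (
        SELECT id, person_id, number,
               ROW_NUMBER() OVER (PARTITION BY person_id
                                  ORDER BY activestart DESC) AS rowno
        FROM telephone
        WHERE activeuntil IS NULL OR activeuntil > :today
    )
    SELECT p.id, p.name, t.number
    FROM person p
    LEFT JOIN telephonenumbers t
           ON t.person_id = p.id
          AND t.rowno = 1   -- keep only the most recently started number
    ORDER BY p.id
"""
rows = conn.execute(query, {"today": "2024-01-01"}).fetchall()
for row in rows:
    print(row)
```

Keeping rowno = 1 in the ON clause rather than in a WHERE clause is what lets people without any active number still appear, with a NULL number.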
I have the following data:
ExamEntry Student_ID Grade
11 1 80
12 2 70
13 3 20
14 3 68
15 4 75
I want to find all the students that passed an exam. If a student attended several exams, I need to use the last result.
So, in this case I'd get that all students passed.
Can I find it with one fast query? I do it this way:
Find the list of entries by
select max(ExamEntry) from data group by Student_ID
Find the results:
select ExamEntry from data where ExamEntry in ( ).
But this is VERY slow - I get around 1000 entries, and this 2 step process takes 10 seconds.
Is there a better way?
Thanks.
If your query is very slow with 1000 records in your table, there is something wrong.
For a modern database system, a table containing 1000 entries is considered very, very small.
Most likely, you did not provide a (primary) key for your table?
Assuming that a student would pass if at least one of the grades is above the minimum needed, the appropriate query would be:
SELECT
Student_ID
, MAX(Grade) AS maxGrade
FROM table_name
GROUP BY Student_ID
HAVING maxGrade > MINIMUM_GRADE_NEEDED
If you really need the latest grade to be above the minimum:
SELECT
Student_ID
, Grade
FROM table_name
WHERE ExamEntry IN (
SELECT
MAX(ExamEntry)
FROM table_name
GROUP BY Student_ID
)
AND Grade > MINIMUM_GRADE_NEEDED
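The two queries above answer different questions, which shows up as soon as a student passes and later fails. A small sketch using Python's built-in sqlite3 module: the first five rows are from the question, student 5 is a made-up addition, and the pass mark of 60 is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE data (ExamEntry INTEGER PRIMARY KEY, Student_ID INTEGER, Grade INTEGER);
    INSERT INTO data VALUES
        (11, 1, 80), (12, 2, 70), (13, 3, 20), (14, 3, 68), (15, 4, 75),
        (16, 5, 90), (17, 5, 27);   -- hypothetical: passed first, failed later
""")
PASS_MARK = 60  # assumed threshold

# "Ever passed": the best grade is above the threshold.
ever_passed = [r[0] for r in conn.execute("""
    SELECT Student_ID
    FROM data
    GROUP BY Student_ID
    HAVING MAX(Grade) > ?
    ORDER BY Student_ID
""", (PASS_MARK,))]

# "Latest passed": only the most recent entry per student counts.
latest_passed = [r[0] for r in conn.execute("""
    SELECT Student_ID
    FROM data
    WHERE ExamEntry IN (SELECT MAX(ExamEntry) FROM data GROUP BY Student_ID)
      AND Grade > ?
    ORDER BY Student_ID
""", (PASS_MARK,))]

print(ever_passed, latest_passed)
```

Student 3 (20, then 68) passes under both definitions, while the made-up student 5 (90, then 27) passes only under the first.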
SELECT student_id, MAX(ExamEntry)
FROM data
WHERE Grade > :threshold
GROUP BY student_id
Like this?
I'll make some assumptions that you have a student table and test table and the table you are showing us is the test_result table... (if you don't have a similar structure, you should revisit your schema)
select s.id, s.name, t.name, max(r.score)
from student s
left outer join test_result r on r.student_id = s.id
left outer join test t on r.test_id = t.id
group by s.id, s.name, t.name
All the fields with id in it should be indexed.
If you really only have a single test (type) in your domain... then the query would be
select s.id, s.name, max(r.score)
from student s
left outer join test_result r on r.student_id = s.id
group by s.id, s.name
I've used the hints given here, and here is the query I found, which runs almost 3 orders of magnitude faster than my first one (.03 sec instead of 10 sec):
SELECT ExamEntry, Student_ID, Grade from data,
( SELECT max(ExamEntry) as ExId FROM data GROUP BY Student_ID) as newdata
WHERE `data`.`ExamEntry`=`newdata`.`ExId` AND Grade > 60;
Thanks All!
As mentioned, indexing is a powerful tool for speeding up queries. The order of the index, however, is fundamentally important.
An index in order of (ExamEntry) then (Student_ID) then (Grade) would be next to useless for finding exams where the student passed.
An index in the opposite order would fit perfectly, if all you wanted was to find what exams had been passed. This would enable the query engine to quickly identify rows for exams that have been passed, and just process those.
In MS SQL Server this can be done with...
CREATE INDEX [IX_results] ON [dbo].[results]
(
[Grade],
[Student_ID],
[ExamEntry]
)
ON [PRIMARY]
(I recommend reading more about indexes to see what other options there are, such as clustered indexes, etc.)
With that index, the following query would be able to ignore the 'failed' exams very quickly, and just display the students who ever passed the exam...
(This assumes that if you ever get over 60, you're counted as a pass, even if you subsequently take the exam again and get 27.)
SELECT
Student_ID
FROM
[results]
WHERE
Grade >= 60
GROUP BY
Student_ID
Should you definitely need the most recent value, then you need to change the order of the index back to something like...
CREATE INDEX [IX_results] ON [dbo].[results]
(
[Student_ID],
[ExamEntry],
[Grade]
)
ON [PRIMARY]
This is because the first thing we are interested in is the most recent ExamEntry for any given student, which can be achieved using the following query...
SELECT
*
FROM
[results]
WHERE
[results].ExamEntry = (
SELECT
MAX([student_results].ExamEntry)
FROM
[results] AS [student_results]
WHERE
[student_results].Student_ID = [results].student_id
)
AND [results].Grade > 60
Having a sub query like this can appear slow, especially since it appears to be executed for every row in [results].
This, however, is not the case...
- Both main and sub query reference the same table
- The query engine scans through the Index for every unique Student_ID
- The sub query is executed, for that Student_ID
- The query engine is already in that part of the index
- So a new Index Lookup is not needed
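To experiment with this, here is a sketch that builds the second index and runs the correlated query through Python's built-in sqlite3 module instead of MS SQL Server, using the five rows from the question. SQLite's planner is not SQL Server's, so the plan it prints only shows whether this engine chooses the index; the bracket quoting and ON [PRIMARY] are dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE results (ExamEntry INTEGER, Student_ID INTEGER, Grade INTEGER);
    -- Student first, then entry, then grade, as in the second index above.
    CREATE INDEX IX_results ON results (Student_ID, ExamEntry, Grade);
    INSERT INTO results VALUES
        (11, 1, 80), (12, 2, 70), (13, 3, 20), (14, 3, 68), (15, 4, 75);
""")

query = """
    SELECT r.ExamEntry, r.Student_ID, r.Grade
    FROM results AS r
    WHERE r.ExamEntry = (
              SELECT MAX(sr.ExamEntry)
              FROM results AS sr
              WHERE sr.Student_ID = r.Student_ID
          )
      AND r.Grade > 60
    ORDER BY r.ExamEntry
"""
rows = conn.execute(query).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is a human-readable step.
plan = [step[-1] for step in conn.execute("EXPLAIN QUERY PLAN " + query)]

for row in rows:
    print(row)
for step in plan:
    print(step)
```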
EDIT:
A comment was made that at 1000 records indexes are not relevant. It should be noted that the question states that there are 1000 records returned, not that the table contains 1000 records. For a basic query to take as long as stated, I'd wager there are many more than 1000 records in the table. Maybe this can be clarified?
EDIT:
I have just investigated 3 queries, with 999 records in each (3 exam results for each of 333 students)
Method 1: WHERE a.ExamEntry = (SELECT MAX(b.ExamEntry) FROM results [b] WHERE a.Student_ID = b.Student_ID)
Method 2: WHERE a.ExamEntry IN (SELECT MAX(ExamEntry) FROM results GROUP BY Student_ID)
Method 3: USING an INNER JOIN instead of the IN clause
The following times were found:
Method QueryCost(No Index) QueryCost(WithIndex)
1 23% 9%
2 38% 46%
3 38% 46%
So, Method 1 is faster regardless of indexes, and an index makes Method 1 substantially faster still.
The reason for this is that indexes allow lookups, where otherwise you need a scan. The difference between a linear law and a square law.
Thanks for the answers!!
I think that Dems is probably closest to what I need, but I will elaborate a bit on the issue.
Only the latest grade counts. If the student had passed first time, attended again and failed, he failed in total. He/She could've attended 3 or 4 exams, but still only the last one counts.
I use MySQL server. I experience the problem on both Linux and Windows installations.
My data set is around 2K entries now and grows with the speed of ~ 1K per new exam.
The query for a specific exam also returns ~1K entries, where ~1K is the number of students who attended (obtained by SELECT DISTINCT STUDENT_ID from results;); almost all of them have passed and some have failed.
I perform the following query in my code:
SELECT ExamEntry, Student_ID from exams WHERE ExamEntry in ( SELECT MAX(ExamEntry) from exams GROUP BY Student_ID).
As the subquery returns ~1K entries, it appears that the main query scans them in a loop, making the whole query run for a very long time and with 50% server load (100% on Windows).
I feel that there is a better way :-), just can't find it yet.
select examentry, student_id, grade
from data
where examentry in
(select max(examentry)
from data
group by student_id)
and grade > 60
Instead of
grade > 60
you could try
grade between 60 and 100
which may go faster (note that BETWEEN also includes 60).