For joins in SQL, is a common column not compulsory?

I came to know that we can join on something other than JOIN ON a.col = b.col, such as JOIN ON a.col CONDITION b.col2. How does that work?
Example:
The two tables are Students (Name, Marks) and Grades (Grade, Min_Mark, Max_Mark).
We can join them as follows (ignoring the exercise's required output, how does that join work?):
SELECT Students.Name, Grades.Grade, Students.Marks FROM Students
INNER JOIN Grades ON Students.Marks BETWEEN Grades.Min_Mark AND Grades.Max_Mark
WHERE Grades.Grade > 7
ORDER BY Grades.Grade DESC, Students.Name ASC;
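The query above can be run end to end in SQLite via Python. The sample rows below are invented to illustrate the BETWEEN condition; they are not from the original exercise.

```python
import sqlite3

# Minimal sketch of the Students/Grades band join (sample rows invented).
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Students (Name TEXT, Marks INT)")
cur.execute("CREATE TABLE Grades (Grade INT, Min_Mark INT, Max_Mark INT)")
cur.executemany("INSERT INTO Students VALUES (?, ?)",
                [("Julia", 88), ("Samantha", 68), ("Britney", 95)])
cur.executemany("INSERT INTO Grades VALUES (?, ?, ?)",
                [(7, 60, 69), (8, 70, 79), (9, 80, 89), (10, 90, 100)])

# Each student row is matched against every grade band; BETWEEN keeps only
# the band whose range contains the student's mark.
rows = cur.execute("""
    SELECT Students.Name, Grades.Grade, Students.Marks
    FROM Students
    INNER JOIN Grades
        ON Students.Marks BETWEEN Grades.Min_Mark AND Grades.Max_Mark
    WHERE Grades.Grade > 7
    ORDER BY Grades.Grade DESC, Students.Name ASC
""").fetchall()
print(rows)  # [('Britney', 10, 95), ('Julia', 9, 88)]
```

Samantha's mark of 68 lands in the grade-7 band, so the WHERE clause filters her out.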

Joins don't have to nominate a column at all:
SELECT * FROM People CROSS JOIN Addresses
This combines all people with all addresses. If there were 2 people and 3 addresses, 6 records would result
Person.Name, Address.Name
-------------------------
Person1, Address1
Person1, Address2
Person1, Address3
Person2, Address1
Person2, Address2
Person2, Address3
Joins that have a condition don't have to use any columns from the data sets being joined; the condition just has to evaluate to true for a row to appear in the output. You can think of any join as the database first producing the cross product of every row (like above), then checking the truth of the condition per row to decide which of those rows make it into the output.
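The row-count arithmetic of that mental model is easy to verify. This sketch (table and column names invented to match the example above) shows 2 people x 3 addresses yielding 6 combinations:

```python
import sqlite3

# CROSS JOIN: every row of one table paired with every row of the other.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE People (Name TEXT)")
cur.execute("CREATE TABLE Addresses (Name TEXT)")
cur.executemany("INSERT INTO People VALUES (?)", [("Person1",), ("Person2",)])
cur.executemany("INSERT INTO Addresses VALUES (?)",
                [("Address1",), ("Address2",), ("Address3",)])

pairs = cur.execute(
    "SELECT p.Name, a.Name FROM People p CROSS JOIN Addresses a"
).fetchall()
print(len(pairs))  # 6
```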
First let's do a join on something that makes some sense, using the above data
SELECT * FROM People Person INNER JOIN Addresses Address ON Person.Name = Address.Name
That would produce nothing, because 'Person1' is never equal to 'Address1', and so on. But suppose we altered it to:
SELECT * FROM People Person INNER JOIN Addresses Address ON REPLACE(Person.Name, 'Person', 'Address') = Address.Name
The 6 rows would be prepared like before:
Person1, Address1
Person1, Address2
Person1, Address3
Person2, Address1
Person2, Address2
Person2, Address3
The DB would replace the word Person with the word Address just while it was evaluating the truth of the join, and the tests would be performed:
Person1, Address1 --'Person1'->'Address1', does 'Address1'='Address1'? YES; OUTPUT the row
Person1, Address2 --'Person1'->'Address1', does 'Address1'='Address2'? no; discard the row
Person1, Address3 --'Person1'->'Address1', does 'Address1'='Address3'? no; discard the row
Person2, Address1 --'Person2'->'Address2', does 'Address2'='Address1'? no; discard the row
Person2, Address2 --'Person2'->'Address2', does 'Address2'='Address2'? YES; OUTPUT the row
Person2, Address3 --'Person2'->'Address2', does 'Address2'='Address3'? no; discard the row
So you get 2 rows:
Person1, Address1
Person2, Address2
Now let's make it really wacky; Suppose you had:
SELECT * FROM People INNER JOIN Addresses ON DAYNAME(NOW()) = 'Monday'
The query would produce 6 rows, but only on a Monday; as soon as it turned to Tuesday, the query would produce 0 rows. It doesn't make much sense to do, but you're still allowed to do it. As long as you provide something the DB can evaluate to true or false, the DB will join every row to every other row, check the truth of the condition for every combination, and discard any combination where it sees false.
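SQLite has no day-of-week function by that name, so this sketch demonstrates the same point with plain constant conditions, which is the essence of the example: the ON clause need not mention either table at all.

```python
import sqlite3

# Join conditions that ignore both tables entirely.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE People (Name TEXT)")
cur.execute("CREATE TABLE Addresses (Name TEXT)")
cur.executemany("INSERT INTO People VALUES (?)", [("Person1",), ("Person2",)])
cur.executemany("INSERT INTO Addresses VALUES (?)",
                [("Address1",), ("Address2",), ("Address3",)])

# Always true: every combination survives, same as a CROSS JOIN.
always = cur.execute(
    "SELECT * FROM People p INNER JOIN Addresses a ON 1 = 1").fetchall()
# Always false: every combination is discarded.
never = cur.execute(
    "SELECT * FROM People p INNER JOIN Addresses a ON 1 = 0").fetchall()
print(len(always), len(never))  # 6 0
```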
Imagine other scenarios. Person and Address are supposed to be related by Address having a PersonId.
If you did:
SELECT * FROM People Person LEFT JOIN Addresses Address ON Person.Id = (Address.PersonId + 1)
You'd see:
PersonId, Name, Address_PersonId, Street
0, Tim, NULL, NULL
1, John, 0, TheRoad
2, Mary, 1, TheAvenue
It's supposed to be Tim who lives at TheRoad, but we wrote a nonsensical join condition that was evaluated and churned out results anyway.
You could divide the ID by 2, or join on a random number being less than 0.5; it doesn't matter. The condition is just a truth test, and most of the time it's at its most useful when it uses column data.
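The offset join above can be reproduced exactly. The rows below are invented to match the output shown (Tim has Id 0, and the two addresses carry PersonId 0 and 1):

```python
import sqlite3

# The "nonsensical but valid" off-by-one join condition, run in SQLite.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE People (Id INT, Name TEXT)")
cur.execute("CREATE TABLE Addresses (PersonId INT, Street TEXT)")
cur.executemany("INSERT INTO People VALUES (?, ?)",
                [(0, "Tim"), (1, "John"), (2, "Mary")])
cur.executemany("INSERT INTO Addresses VALUES (?, ?)",
                [(0, "TheRoad"), (1, "TheAvenue")])

rows = cur.execute("""
    SELECT p.Id, p.Name, a.PersonId, a.Street
    FROM People p LEFT JOIN Addresses a ON p.Id = a.PersonId + 1
    ORDER BY p.Id
""").fetchall()
print(rows)
# [(0, 'Tim', None, None), (1, 'John', 0, 'TheRoad'), (2, 'Mary', 1, 'TheAvenue')]
```

Tim matches nothing (no address has PersonId -1), so the LEFT JOIN pads his row with NULLs.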
how does that join work?
This BETWEEN form is actually quite a useful one: it lets you sort lots of different scores into set bands.
Suppose you have two people's scores and 3 bands (I'm keeping it small to make it easier to type out):
Name, Score
Tim, 79
John, 68
ScoreLower, ScoreHigher, Rating
0, 50, Bronze
51, 75, Silver
76, 100, Gold
And you do
SELECT * FROM People JOIN Scores ON Score BETWEEN ScoreLower AND ScoreHigher
Remember that the DB conceptually combines EVERY person with EVERY score band first:
Tim, 79, 0, 50, Bronze
Tim, 79, 51, 75, Silver
Tim, 79, 76, 100, Gold
John, 68, 0, 50, Bronze
John, 68, 51, 75, Silver
John, 68, 76, 100, Gold
And then it goes through knocking out the ones that aren't true
Tim, 79, 0, 50, Bronze --FALSE, 79 is not BETWEEN 0 and 50, discard this one
Tim, 79, 51, 75, Silver --FALSE, 79 is not BETWEEN 51 and 75, discard this one
Tim, 79, 76, 100, Gold --TRUE, keep
John, 68, 0, 50, Bronze --FALSE, discard
John, 68, 51, 75, Silver --TRUE, keep
John, 68, 76, 100, Gold --FALSE, discard
And you get just the keeps:
Tim, 79, 76, 100, Gold
John, 68, 51, 75, Silver
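The whole banding walkthrough can be executed in SQLite with the same Tim/John data:

```python
import sqlite3

# The score-banding BETWEEN join from the walkthrough above.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE People (Name TEXT, Score INT)")
cur.execute("CREATE TABLE Scores (ScoreLower INT, ScoreHigher INT, Rating TEXT)")
cur.executemany("INSERT INTO People VALUES (?, ?)", [("Tim", 79), ("John", 68)])
cur.executemany("INSERT INTO Scores VALUES (?, ?, ?)",
                [(0, 50, "Bronze"), (51, 75, "Silver"), (76, 100, "Gold")])

# Every person is paired with every band, then BETWEEN keeps the one band
# whose range contains the person's score.
rows = cur.execute("""
    SELECT * FROM People p
    JOIN Scores s ON p.Score BETWEEN s.ScoreLower AND s.ScoreHigher
    ORDER BY p.Name
""").fetchall()
print(rows)
# [('John', 68, 51, 75, 'Silver'), ('Tim', 79, 76, 100, 'Gold')]
```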
You could have a list of all the Chinese astrological years and a list of people with known birthdays, then join on birthday BETWEEN yearstart AND yearend to find out whether each person is a Horse, a Dog, etc. You could have a list of all the letters of the alphabet and a color for each, and put people into color groups based on the first letter of their name. The condition doesn't have to be =; we could use LIKE:
People JOIN AlphabetColors ON People.FirstName LIKE AlphabetColors.Letter + '%'
Or we could:
People JOIN AlphabetColors ON LEFT(People.FirstName, 1) = AlphabetColors.Letter
Either way, data like:
Albert
Bill
Charlie
A, Red
B, Green
C, Blue
Ends up as
Albert, A, Red
Bill, B, Green
Charlie, C, Blue
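Here is that letter-to-color join, runnable in SQLite. Note SQLite has no LEFT() function, so this sketch uses substr() for the first letter (and SQLite's string concatenation operator is ||, if you prefer the LIKE variant):

```python
import sqlite3

# First-letter color grouping via a join on substr().
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE People (FirstName TEXT)")
cur.execute("CREATE TABLE AlphabetColors (Letter TEXT, Color TEXT)")
cur.executemany("INSERT INTO People VALUES (?)",
                [("Albert",), ("Bill",), ("Charlie",)])
cur.executemany("INSERT INTO AlphabetColors VALUES (?, ?)",
                [("A", "Red"), ("B", "Green"), ("C", "Blue")])

rows = cur.execute("""
    SELECT p.FirstName, c.Letter, c.Color
    FROM People p
    JOIN AlphabetColors c ON substr(p.FirstName, 1, 1) = c.Letter
    ORDER BY p.FirstName
""").fetchall()
print(rows)
# [('Albert', 'A', 'Red'), ('Bill', 'B', 'Green'), ('Charlie', 'C', 'Blue')]
```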

Related

Write a query that returns students with the largest number of the best marks in each subject

There is a table(named Students) containing columns:
Name, Subject, Mark
It is required to return the name and the Subject of the best students.
A student is considered best in a subject if he or she has the largest number of the best marks.
So, if there are entries in the table:
('John', 'Math', 10),
('John', 'Math', 10),
('John', 'Math', 11),
('Mia', 'Math', 10),
('Mia', 'Math', 11),
('Bob', 'Science', 12),
('Bob', 'Science', 11),
('Ross', 'Science', 11),
('Ross', 'Science', 12),
('Ross', 'Science', 12)
The query should return
John Math
Ross Science
Because John has two tens and one eleven, while Mia has one fewer ten.
I understand that I need to group entries by Subject, Name, and Mark and count the number of identical marks. I tried the following query:
SELECT NAME, SUBJECT, MARK, COUNT(*) AS COUNT
FROM STUDENTS
GROUP BY SUBJECT, NAME, MARK
It returns:
John Math 10 2
John Math 11 1
Mia Math 10 1
Mia Math 11 1
Bob Science 12 1
Bob Science 11 1
Ross Science 11 1
Ross Science 12 2
I have an idea that I need to discard the entries where students have the same number of a particular mark. Here those are Mia's and John's elevens, which tie in count. The table would then look like this:
John Math 10 2
Mia Math 10 1
Bob Science 12 1
Ross Science 12 2
And now I have to pick the student with the larger number of marks. But the problem is I have no idea how to do this.
I am not asking for the full solution. I am asking whether my idea is reasonable and, if not, for a suggested alternative.
I grouped the marks per name and subject, then ranked within each subject by max(mark) desc and count(*) desc, keeping only the rank-1 result.
select name, subject
from (
    select name, subject,
           rank() over (partition by subject
                        order by max(mark) desc, count(*) desc) as rnk
    from t
    group by name, subject
) t
where rnk = 1
name    subject
----    -------
John    Math
Ross    Science
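This ranking query runs as-is in SQLite (window functions need SQLite 3.25+, which ships with Python 3.8 and later). Using the question's Students table instead of t:

```python
import sqlite3

# The ranking solution, run against the question's sample data.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Students (name TEXT, subject TEXT, mark INT)")
cur.executemany("INSERT INTO Students VALUES (?, ?, ?)", [
    ("John", "Math", 10), ("John", "Math", 10), ("John", "Math", 11),
    ("Mia", "Math", 10), ("Mia", "Math", 11),
    ("Bob", "Science", 12), ("Bob", "Science", 11),
    ("Ross", "Science", 11), ("Ross", "Science", 12), ("Ross", "Science", 12),
])

# Rank per subject by the best mark, breaking ties on how many rows the
# student has in that subject; rank 1 is the best student.
best = cur.execute("""
    SELECT name, subject FROM (
        SELECT name, subject,
               RANK() OVER (PARTITION BY subject
                            ORDER BY MAX(mark) DESC, COUNT(*) DESC) AS rnk
        FROM Students
        GROUP BY name, subject
    ) ranked
    WHERE rnk = 1
    ORDER BY subject
""").fetchall()
print(best)  # [('John', 'Math'), ('Ross', 'Science')]
```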
This sounds like homework so I will answer in the most general terms for you to investigate to completion.
Investigate the SQL language looking for topics dealing with:
aggregating data using different functions
ordering results
limiting the number of results returned
You will most likely have to use different functions and group by syntax in the end.

VB.NET query - COUNT / GROUP BY query produces different results

I got a query from a question I asked earlier. It worked perfectly. I wanted to count the total number of entries for each student and display their total in a datagridview.
So Jason Smith attended 2 different days which are 2 different entries by date in my table and the datagridview displays 2 for Jason's total classes. David Harris attended 1 class and the datagridview displays 1 for David's total classes.
Here is the query that worked.
SELECT
FirstName AS [First Name],
LastName AS [Last Name],
TrainDate AS [Training Date], Count(*) AS TotalCount
FROM
ACJATTENDANCE
GROUP BY FirstName, LastName, TrainDate
The query that didn't work displays every date attended for each person, with a 1 for total classes on each date in the datagridview.
SELECT
ID AS [STUDENT ID],
FirstName AS [First Name],
LastName AS [Last Name],
TrainDate AS [Training Date], Count(*) AS TotalCount
FROM
ACJATTENDANCE
GROUP BY ID, FirstName, LastName, TrainDate
Why didn't my modification work the same way?
Because you group by ID, which (if it's a proper ID) is unique, meaning that every group will have a size of 1.
Let's look at an artificial example:
Animals
ID, Type
1, Cat
2, Cat
3, Cat
4, Cat
5, Dog
6, Dog
4 cats, 2 dogs.
SELECT Type, COUNT(*) FROM Animals GROUP BY Type
Cat, 4
Dog, 2
In your mind, imagine a bucket with "Cat" written on it and another with "Dog" written on it. Write the rows out on individual sheets of paper and put them into the buckets. That's what the grouping does: you'll end up with 4 sheets of paper in the cat bucket and 2 in the dog bucket. There is one bucket per unique value: cat is one value, dog is another, so 2 buckets.
Now if you group by ID, you need one bucket for every unique value. That's 6 buckets, numbered 1 to 6, with one piece of paper in each
If you have multiple clauses in a group by, you end up with as many buckets as there are combinations of unique values:
Animals
ID, Type, Neutered/Spayed
1, Cat, Yes
2, Cat, No
3, Cat, Yes
4, Cat, No
5, Dog, Yes
6, Dog, Yes
SELECT Type, Neutered, COUNT(*) FROM Animals GROUP BY Type, Neutered
Cat, Yes, 2
Cat, No, 2
Dog, Yes, 2
GROUP BY has generated 3 buckets this time with 2 papers in each, because some cats have been neutered and some have not, but all the dogs have been neutered. There is no "Dog/No" grouping because none of the rows have this combination.
Thus, the number of groups you get is the number of distinct combinations of the column values specified in the GROUP BY. When you group by an ID that is unique for every row, it doesn't matter what else you group by: every group will have a size of 1.
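The bucket analogy can be executed directly with the Animals data above:

```python
import sqlite3

# One group per distinct combination of the GROUP BY columns.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Animals (ID INT, Type TEXT, Neutered TEXT)")
cur.executemany("INSERT INTO Animals VALUES (?, ?, ?)", [
    (1, "Cat", "Yes"), (2, "Cat", "No"), (3, "Cat", "Yes"),
    (4, "Cat", "No"), (5, "Dog", "Yes"), (6, "Dog", "Yes"),
])

by_type = cur.execute(
    "SELECT Type, COUNT(*) FROM Animals GROUP BY Type ORDER BY Type").fetchall()
print(by_type)  # [('Cat', 4), ('Dog', 2)]

# Grouping by a unique ID: six buckets with one row in each.
by_id = cur.execute("SELECT ID, COUNT(*) FROM Animals GROUP BY ID").fetchall()
print(len(by_id))  # 6

# Grouping by two columns: one bucket per combination that actually occurs,
# so there is no ('Dog', 'No') bucket.
both = cur.execute(
    "SELECT Type, Neutered, COUNT(*) FROM Animals GROUP BY Type, Neutered"
).fetchall()
print(len(both))  # 3
```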

Why am I having so many records after a join?

I'm trying to join two tables; one reporting information on media campaigns and another on TV spots... Because the second table does not contain any information on campaigns, and because I don't really need it now, I'm joining the two tables based on the id of the TV spot (here, "creative_id") as well as date.
The problem is, there are approx. 6.7 million records in the first table, so I don't understand why, when I run this, I get more than 17 million... :( Can you help, please?
alter view halo2 as
select
h.date,h.channel,h.strategy,h.creative_id,h.programme,h.sub_programme,h.l_c_b,
h.media_plan_split,h.pu,h.conversion_type,h.conversion_new_or_upgrade,
case when h.conversion_new_or_upgrade like '%new%' then 1 else 0 end #acquisitions,
case when h.conversion_new_or_upgrade like '%upg%' then 1 else 0 end #upgrades,
case when h.conversion_contract_length like '%12%' then 12
when h.conversion_contract_length like '%24%' then 24
when h.conversion_contract_length like '%30%' then 1
else 0 end contract_length_in_months,
h.conversion_device_manufacturer,
h.conversion_device,
h.media_spend,
h.#halo,
ft.u10/100 as upfront_cost,
(ft.sales_value+cast(ft.u28 as float))/100 as monthly_cost
from halo h left join in_ft_conversion ft
on h.creative_id=ft.creative_id
and
h.date=ft.sales_date
Here are two tables:
TableA
Number, Text
----
1, Hello
1, There
1, World
TableB
Number, Text
----
1, Foo
1, Bar
1, Baz
Here is a query that joins them:
SELECT * FROM TableA a INNER JOIN TableB b ON a.Number = b.Number
Here are the results:
a.Number, a.Text, b.Number, b.Text
----------------------------------
1, Hello, 1, Foo
1, Hello, 1, Bar
1, Hello, 1, Baz
1, There, 1, Foo
1, There, 1, Bar
1, There, 1, Baz
1, World, 1, Foo
1, World, 1, Bar
1, World, 1, Baz
This is called a Cartesian product. There isn't a 1:1 or even a 1:many mapping between A and B; there is a many:many mapping. Each row in A maps to each row in B. We started out with 3 rows in A and 3 rows in B; had they been 1:1 we'd have a 3-row result set, but because each of the 3 rows matches another 3 rows, we got a 9-row result (3 * 3).
Any time one of your tables has rows that match more than one row of another table, according to your join condition, your resulting row count will multiply.
The most common situation is that you have more than one record in the joined table satisfying the join condition (predicate). If your predicate is not unique on the second table, records will repeat on a left join.
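The 3 x 3 multiplication above is easy to reproduce in SQLite:

```python
import sqlite3

# Every row of A matches every row of B on Number = 1, so 3 * 3 = 9 rows.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE TableA (Number INT, Text TEXT)")
cur.execute("CREATE TABLE TableB (Number INT, Text TEXT)")
cur.executemany("INSERT INTO TableA VALUES (?, ?)",
                [(1, "Hello"), (1, "There"), (1, "World")])
cur.executemany("INSERT INTO TableB VALUES (?, ?)",
                [(1, "Foo"), (1, "Bar"), (1, "Baz")])

rows = cur.execute(
    "SELECT * FROM TableA a INNER JOIN TableB b ON a.Number = b.Number"
).fetchall()
print(len(rows))  # 9
```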

SQL Join Challenge

Ok, so I've been stuck on this for 2 days! I've solved it from a semantic point of view, but the query can take up to 10 minutes to execute. My database of choice for this is SQLite (for reasons I do not want to elaborate on here), but I have tried running the same thing on SQL Server 2012; it didn't make much of a difference in performance.
So, the problem is that I have 2 tables
prices (product_id INT, for_date DATE, value INT)
events (starts_on DATE, ends_on DATE NULLABLE)
I have approximately 500K rows in the prices table and around 100 rows in the events table.
Now I need to write a query to do the following.
Pseudo code is:
For each event:
    IF the event has an ends_on value, THEN fetch all product_id(s) that have a for_date matching it; for products that do not match, fetch the last for_date that is less than the ends_on value but greater than starts_on for that event.
    ELSE IF the ends_on date of the event is NULL, THEN fetch all product_id(s) that have a for_date matching starts_on; for products that do not match, fetch the last for_date that is less than the starts_on value.
The query I have written in SQL Server 2012 is
SELECT sp.for_date, sp.value
FROM prices sp
INNER JOIN events ev ON
    (ev.ends_on IS NOT NULL
     AND sp.for_date = (SELECT for_date
                        FROM prices
                        WHERE for_date <= ev.ends_on
                          AND for_date > ev.starts_on
                        ORDER BY for_date DESC
                        OFFSET 0 ROWS
                        FETCH NEXT 1 ROWS ONLY))
    OR
    (ev.ends_on IS NULL
     AND sp.for_date = (SELECT for_date
                        FROM prices
                        WHERE for_date <= ev.starts_on
                          AND for_date > DATEADD(day, -14, ev.starts_on)
                        ORDER BY for_date DESC
                        OFFSET 0 ROWS
                        FETCH NEXT 1 ROWS ONLY));
Btw, I have also tried creating temp tables with partial data and doing the same operation on them; it just gets stuck.
The strange thing is that if I run the two OR conditions separately, the response time is perfect!
Update
Sample Dataset and Expected Result
Price Entries
Product ID, ForDt, Value
1, 25-01-2010, 123
1, 26-01-2010, 112
1, 29-01-2010, 334
1, 02-02-2010, 512
1, 03-02-2010, 765
1, 04-02-2010, 632
1, 05-02-2010, 311
1, 06-02-2010, 555
2, 03-02-2010, 854
2, 04-02-2010, 625
2, 05-02-2010, 919
3, 20-01-2010, 777
3, 06-02-2010, 877
3, 10-03-2010, 444
3, 11-03-2010, 888
Event Entries (to make it more understandable, I'm adding an event id as well)
Event ID, StartsOn, EndsOn
22, 27-01-2010, NULL
33, 02-02-2010, 06-02-2010
44, 01-03-2010, 13-03-2010
Expected Result Set
Event ID, Product ID, ForDt, Value
22, 1, 26-01-2010, 112
33, 1, 06-02-2010, 311
44, 1, 06-02-2010, 311
33, 2, 05-02-2010, 919
44, 2, 05-02-2010, 919
22, 3, 20-01-2010, 777
33, 3, 06-02-2010, 877
44, 3, 11-03-2010, 888
Okay, now that you have shown the expected results as a list of events with their associated products, the question makes sense. Your original query, selecting only dates and values, didn't.
You are looking for the best product price record per event. This would be easily done with analytic functions, but SQLite doesn't support them. So we must write a more complicated query.
Let's look at events with ends_on null first. Here is how to find the best product prices (i.e. last before starts_on):
select e.event_id, p.product_id, max(for_date) as best_for_date
from events e
join prices p on p.for_date < e.starts_on
where e.ends_on is null
group by e.event_id, p.product_id;
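As a sanity check, here is that ends_on-IS-NULL query run in SQLite against the question's sample data, with the dates rewritten as ISO yyyy-mm-dd strings so that text comparison and MAX() order them correctly (an assumption of this sketch; the original DD-MM-YYYY strings would not sort properly as text):

```python
import sqlite3

# Best (latest-before-starts_on) price per product for open-ended events.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE prices (product_id INT, for_date TEXT, value INT)")
cur.execute("CREATE TABLE events (event_id INT, starts_on TEXT, ends_on TEXT)")
cur.executemany("INSERT INTO prices VALUES (?, ?, ?)", [
    (1, "2010-01-25", 123), (1, "2010-01-26", 112), (1, "2010-01-29", 334),
    (1, "2010-02-02", 512), (1, "2010-02-03", 765), (1, "2010-02-04", 632),
    (1, "2010-02-05", 311), (1, "2010-02-06", 555),
    (2, "2010-02-03", 854), (2, "2010-02-04", 625), (2, "2010-02-05", 919),
    (3, "2010-01-20", 777), (3, "2010-02-06", 877),
    (3, "2010-03-10", 444), (3, "2010-03-11", 888),
])
cur.executemany("INSERT INTO events VALUES (?, ?, ?)", [
    (22, "2010-01-27", None),
    (33, "2010-02-02", "2010-02-06"),
    (44, "2010-03-01", "2010-03-13"),
])

best = cur.execute("""
    SELECT e.event_id, p.product_id, MAX(p.for_date) AS best_for_date
    FROM events e
    JOIN prices p ON p.for_date < e.starts_on
    WHERE e.ends_on IS NULL
    GROUP BY e.event_id, p.product_id
    ORDER BY p.product_id
""").fetchall()
print(best)  # [(22, 1, '2010-01-26'), (22, 3, '2010-01-20')]
```

Product 2 has no price before 2010-01-27, so event 22 correctly produces no row for it.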
We extend this query to also find the best product prices for events with an ends_on, and then access the prices table again so we get the full records with the values:
select ep.event_id, p.product_id, p.for_date, p.value
from
(
select e.event_id, p.product_id, max(for_date) as best_for_date
from events e
join prices p on (e.ends_on is null and p.for_date < e.starts_on)
or (e.ends_on is not null and p.for_date between e.starts_on and e.ends_on)
group by e.event_id, p.product_id
) ep
join prices p on p.product_id = ep.product_id and p.for_date = ep.best_for_date;
(By the way: You are describing a very special case here. The databases I have seen so far would treat an ends_on null as unlimited or "still active". Thus the price to retrieve for such an event would not be the last before starts_on, but the most current one at or after starts_on.)

SQL 2 Tables, get counts on first, group by second

I'm working in MS Access 2003.
I have Table with records of that kind of structure:
ID, Origin, Destination, Attr1, Attr2, Attr3, ... AttrX
for example:
1, 1000, 1100, 20, M, 5 ...
2, 1000, 1105, 30, F, 5 ...
3, 1001, 1000, 15, M, 10 ...
...
I also have a table with the Origin and Destination codes grouped:
Code, Country, Continent
1000, Albania, Europe
1001, Belgium, Europe
...
1100, China, Asia
1105, Japan, Asia
...
What I need is to get 2 tables which would count records based on criteria related to the attributes I specify, but grouped by:
1. Origin Continent and Destination Country
2. Origin Continent and Destination Continent
for example:
Case 1.
Origin, Destination, Total, Females, Males, Older than 20, Younger than 20, ...
Europe, China, 300, 100, 200, 120, 180 ...
Europe, Japan, 150, 100, 50, ...
...
Case 2.
Origin, Destination, Total, Females, Males, Older than 20, Younger than 20, ...
Europe, Asia, 1500, 700, 800 ...
Asia, Europe, 1200, ...
...
Can that be done in the way so I could add more columns/criteria easily enough?
Case 1 (the lookup table of codes is joined twice, once for the origin code and once for the destination code):
select count(1) as total, t2.continent, t3.country, t1.attr1, t1.attr2, t1.attr3 ... t1.attrX
from table1 t1
join codes t2 on t1.origin = t2.code
join codes t3 on t1.destination = t3.code
group by t2.continent, t3.country, t1.attr1, t1.attr2, t1.attr3 ... t1.attrX
order by total desc
Case 2:
select count(1) as total, t2.continent, t3.continent, t1.attr1, t1.attr2, t1.attr3 ... t1.attrX
from table1 t1
join codes t2 on t1.origin = t2.code
join codes t3 on t1.destination = t3.code
group by t2.continent, t3.continent, t1.attr1, t1.attr2, t1.attr3 ... t1.attrX
order by total desc
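Joining the same lookup table twice under different aliases is the key trick here. A minimal SQLite sketch (table names and rows invented to match the question's shape):

```python
import sqlite3

# One codes table, joined twice: once for the origin, once for the destination.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE records (id INT, origin INT, destination INT)")
cur.execute("CREATE TABLE codes (code INT, country TEXT, continent TEXT)")
cur.executemany("INSERT INTO records VALUES (?, ?, ?)",
                [(1, 1000, 1100), (2, 1000, 1105), (3, 1001, 1000)])
cur.executemany("INSERT INTO codes VALUES (?, ?, ?)",
                [(1000, "Albania", "Europe"), (1001, "Belgium", "Europe"),
                 (1100, "China", "Asia"), (1105, "Japan", "Asia")])

# Continent-to-continent totals: the codes table appears as both o and d.
totals = cur.execute("""
    SELECT o.continent, d.continent, COUNT(*) AS total
    FROM records r
    JOIN codes o ON r.origin = o.code
    JOIN codes d ON r.destination = d.code
    GROUP BY o.continent, d.continent
    ORDER BY total DESC
""").fetchall()
print(totals)  # [('Europe', 'Asia', 2), ('Europe', 'Europe', 1)]
```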
You can join queries with queries, so this is a crosstab for Male/Female (attr2)
TRANSFORM Count(Data.ID) AS CountOfID
SELECT Data.Origin, Data.Destination, Count(Data.ID) AS Total
FROM Data
GROUP BY Data.Origin, Data.Destination
PIVOT Data.Attr2;
This is ages:
TRANSFORM Count(Data.ID) AS CountOfID
SELECT Data.Origin, Data.Destination, Count(Data.ID) AS Total
FROM Data
GROUP BY Data.Origin, Data.Destination
PIVOT Partition([Attr1],10,100,10);
This combines the two:
SELECT Ages.Origin, Ages.Destination, Ages.Total,
MF.F, MF.M, Ages.[10: 19], Ages.[20: 29], Ages.[30: 39]
FROM Ages INNER JOIN MF
ON Ages.Origin = MF.Origin AND Ages.Destination = MF.Destination;
As you can see, this could be easier to manage in VBA.
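TRANSFORM ... PIVOT is Access/Jet-specific. In engines without it, the same Male/Female crosstab can be sketched with conditional aggregation; the table name and rows below are invented for illustration:

```python
import sqlite3

# Portable crosstab via SUM(CASE ...), equivalent in spirit to the
# Access TRANSFORM/PIVOT counting Attr2 values per Origin/Destination.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE Data (ID INT, Origin INT, Destination INT, Attr2 TEXT)")
cur.executemany("INSERT INTO Data VALUES (?, ?, ?, ?)",
                [(1, 1000, 1100, "M"), (2, 1000, 1105, "F"),
                 (3, 1000, 1100, "F")])

crosstab = cur.execute("""
    SELECT Origin, Destination,
           COUNT(*) AS Total,
           SUM(CASE WHEN Attr2 = 'F' THEN 1 ELSE 0 END) AS F,
           SUM(CASE WHEN Attr2 = 'M' THEN 1 ELSE 0 END) AS M
    FROM Data
    GROUP BY Origin, Destination
    ORDER BY Destination
""").fetchall()
print(crosstab)  # [(1000, 1100, 2, 1, 1), (1000, 1105, 1, 1, 0)]
```

Adding another criterion is just another SUM(CASE ...) column, which is also how you would add the age bands.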