I asked a similar question the other day, but it seems no one was able to answer it, and a few days of searching the internet turned up nothing. Perhaps I am not asking the question the right way: One-to-many Query in MySQL
So I will try again and word it a bit differently. This is essentially a simplified version of what I am trying to do:
CREATE TABLE Customer(
customer_id INT NOT NULL,
first_name varchar(20),
last_name varchar(20)
);
CREATE TABLE Payment(
customer_id INT NOT NULL,
amount_paid INT,
year YEAR,
FOREIGN KEY (customer_id) REFERENCES Customer(customer_id)
);
What I want is to organize the first_name on the left, only occurring once, and then for each year list the payment amount in separate columns because I am going to be attaching this to WPF and I want a spreadsheet style representation of the data. So, ideally it would look like this:
name 2009 2008 2007
John 500 600 NULL
Anne NULL 500 600
Bob NULL NULL 600
My approach is to count the number of distinct payment years and use that as a loop counter, then loop through and collect the data for each year, labelling each amount_paid column with its year. I am just not sure how to do that. My initial approach was to use UNION, but then I realized that it puts everything in the same column rather than in separate ones. So what should I be using? I am only asking for some guidance. Thank you!
Use:
SELECT c.first_name,
MAX(CASE WHEN p.year = 2009 THEN p.amount_paid ELSE NULL END) AS `2009`,
MAX(CASE WHEN p.year = 2008 THEN p.amount_paid ELSE NULL END) AS `2008`,
MAX(CASE WHEN p.year = 2007 THEN p.amount_paid ELSE NULL END) AS `2007`
FROM Customer c
JOIN Payment p ON p.customer_id = c.customer_id
GROUP BY c.first_name
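Since the number of years is not fixed, the loop described in the question can live in application code that generates one CASE column per distinct year. Here is a minimal sketch of that idea. It uses Python's built-in sqlite3 module rather than MySQL so that it runs on its own; the sample rows mirror the table in the question, and the last names are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY,
                           first_name TEXT, last_name TEXT);
    CREATE TABLE Payment (customer_id INTEGER NOT NULL
                              REFERENCES Customer(customer_id),
                          amount_paid INTEGER, year INTEGER);
    -- Sample data taken from the desired output; last names are invented.
    INSERT INTO Customer VALUES (1, 'John', 'A'), (2, 'Anne', 'B'), (3, 'Bob', 'C');
    INSERT INTO Payment VALUES (1, 500, 2009), (1, 600, 2008),
                               (2, 500, 2008), (2, 600, 2007),
                               (3, 600, 2007);
""")

# Step 1: find the distinct years, newest first.
years = [row[0] for row in conn.execute(
    "SELECT DISTINCT year FROM Payment ORDER BY year DESC")]

# Step 2: build one conditional-aggregate column per year.
# int() guarantees that only numbers end up in the generated SQL.
columns = ",\n       ".join(
    f'MAX(CASE WHEN p.year = {int(y)} THEN p.amount_paid END) AS "{int(y)}"'
    for y in years)

pivot_sql = f"""
SELECT c.first_name,
       {columns}
FROM Customer c
JOIN Payment p ON p.customer_id = c.customer_id
GROUP BY c.customer_id, c.first_name
ORDER BY c.customer_id
"""

rows = conn.execute(pivot_sql).fetchall()
for row in rows:
    print(row)
```

Against MySQL the generated column aliases would need backticks instead of double quotes, but the two-step shape (read the years, then generate the query) stays the same.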
MySQL unfortunately has no built-in PIVOT operator, so the alternative to writing out one column per year in SQL is to format the result set in application code, in whatever programming language you are using.
I haven't tested it, but I think something like this works:
select c.first_name, a.`2009`, b.`2008`, d.`2007`
from Customer c
left join (select customer_id, sum(amount_paid) as `2009` from Payment
    where year = 2009 group by customer_id) a on a.customer_id = c.customer_id
left join (select customer_id, sum(amount_paid) as `2008` from Payment
    where year = 2008 group by customer_id) b on b.customer_id = c.customer_id
left join (select customer_id, sum(amount_paid) as `2007` from Payment
    where year = 2007 group by customer_id) d on d.customer_id = c.customer_id;
I am a beginner reading the book Sams Teach Yourself SQL in 10 Minutes (Fifth Edition) to learn SQL.
Here is an example SQL query from Chapter 14 - Combining Queries.
My question is about a note that mentions another UNION type, INTERSECT.
First, see the structure of the tables: Db Structure - Image
Here is the query:
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_state IN ('IL','IN','MI')
INTERSECT
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_name = 'Fun4All'
ORDER BY cust_name, cust_contact;
In the note after this example, it says that
INTERSECT can be used to retrieve only the rows that exist in both tables.
And below that line:
In practice, however, these UNION types are rarely used because the same results can be accomplished using joins.
So how can I accomplish this using joins instead of INTERSECT?
I tried everything I have learned so far but couldn't get the result.
You typically need an appropriate unique key (a PRIMARY KEY or a UNIQUE constraint) so that you can join on it safely. Assuming you have a cust_id column, you can join the table to itself:
SELECT c1.cust_name, c1.cust_contact, c1.cust_email
FROM Customers c1
JOIN (
SELECT *
FROM Customers
WHERE cust_name = 'Fun4All'
) as c2
ON c1.cust_id = c2.cust_id
WHERE c1.cust_state IN ('IL','IN','MI')
ORDER BY c1.cust_name, c1.cust_contact;
Note, however, that in your example it's even simpler, because both conditions apply to the same table:
SELECT cust_name, cust_contact, cust_email
FROM Customers
WHERE cust_state IN ('IL','IN','MI') AND cust_name = 'Fun4All'
ORDER BY cust_name, cust_contact;
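To check that the three forms really agree, here is a small sketch that runs them side by side. It uses Python's built-in sqlite3 module, and both the cust_id column and the rows are hypothetical stand-ins, not the book's actual data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (
        cust_id INTEGER PRIMARY KEY, cust_name TEXT, cust_contact TEXT,
        cust_email TEXT, cust_state TEXT);
    -- Hypothetical rows for illustration only.
    INSERT INTO Customers VALUES
        (1, 'Village Toys',  'John Smith',   'sales@villagetoys.example', 'MI'),
        (2, 'Fun4All',       'Jim Jones',    'jjones@fun4all.example',    'IN'),
        (3, 'Fun4All',       'Denise Stone', 'dstone@fun4all.example',    'AZ'),
        (4, 'The Toy Store', 'Kim Howard',   NULL,                        'IL');
""")

with_intersect = conn.execute("""
    SELECT cust_name, cust_contact, cust_email
    FROM Customers
    WHERE cust_state IN ('IL','IN','MI')
    INTERSECT
    SELECT cust_name, cust_contact, cust_email
    FROM Customers
    WHERE cust_name = 'Fun4All'
    ORDER BY cust_name, cust_contact
""").fetchall()

with_join = conn.execute("""
    SELECT c1.cust_name, c1.cust_contact, c1.cust_email
    FROM Customers c1
    JOIN (SELECT * FROM Customers WHERE cust_name = 'Fun4All') AS c2
      ON c1.cust_id = c2.cust_id
    WHERE c1.cust_state IN ('IL','IN','MI')
    ORDER BY c1.cust_name, c1.cust_contact
""").fetchall()

with_and = conn.execute("""
    SELECT cust_name, cust_contact, cust_email
    FROM Customers
    WHERE cust_state IN ('IL','IN','MI') AND cust_name = 'Fun4All'
    ORDER BY cust_name, cust_contact
""").fetchall()

print(with_intersect)
print(with_join == with_intersect, with_and == with_intersect)
```

One difference to keep in mind: INTERSECT also removes duplicate rows, so the join and AND versions only give identical output when the selected columns cannot repeat across rows.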
For the sake of simplicity, let’s assume that the table in question is called app and it has only three fields:
person_id | employee_id | appointment_time
----------+-------------+-----------------
int | int | date
The table holds details of all medical appointments, past and future, for all clients (person_id) and specialists (employee_id).
What I am trying to figure out is how to create a list of appointments for a given specialist (let's say with an id of 235) and their corresponding "referrals" (if any): the previous appointment for the same person_id, with an earlier date, serviced by another specialist (id <> 235).
SELECT
qLast.person_id,
qLast.employee_id,
qLast.LastDate,
qPrevious.employee_id,
qPrevious.PreviousDate
FROM
(
SELECT
app.person_id,
app.employee_id,
Max(app.appointment_time) AS LastDate
FROM
app
GROUP BY
app.person_id,
app.employee_id
HAVING
app.person_id <> 0
AND app.employee_id = 235
) qLast
LEFT JOIN (
SELECT
qSub.person_id,
app.employee_id,
qSub.MaxOfappointment_time AS PreviousDate
FROM
(
SELECT
app.person_id,
Max(app.appointment_time) AS MaxOfappointment_time
FROM
app
GROUP BY
app.person_id,
app.employee_id
HAVING
app.person_id <> 0
AND app.employee_id <> 235
) qSub
INNER JOIN app ON (
qSub.MaxOfappointment_time = app.appointment_time
)
AND (qSub.person_id = app.person_id)
) qPrevious ON qLast.person_id = qPrevious.person_id;
My mangled attempt almost works but sadly falls on its confused face when there is an appointment for a specialist with id<>235 with a later date than the last appointment for id=235. For now I run another query on the results of this one to filter out the unwanted records, but it is a rather ugly kludge. I'm sure there is a better and more elegant way of solving it. Help please!
I think you basically want lag(), but that is not available in SQL Server 2008 (time to upgrade to supported software!).
You can use apply instead (outer apply keeps the appointments that have no referral):
select a.*, a2.*
from app a outer apply
(select top (1) a2.*
from app a2
where a2.person_id = a.person_id and
a2.employee_id <> a.employee_id and
a2.appointment_time < a.appointment_time
order by a2.appointment_time desc
) a2
where a.employee_id = 235;
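For engines without APPLY, the same "latest earlier appointment with a different specialist" lookup can be written with a correlated subquery. The sketch below swaps in that technique and runs it through Python's built-in sqlite3 module; the rows are invented, dates are stored as ISO text, and person 1 deliberately has a later appointment with another specialist, which is the case that broke the original attempt.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE app (person_id INTEGER, employee_id INTEGER,
                      appointment_time TEXT);
    -- Invented sample rows.
    INSERT INTO app VALUES
        (1, 100, '2020-01-05'),
        (1, 235, '2020-02-10'),
        (1, 300, '2020-03-01'),  -- later than the 235 visit: must be ignored
        (2, 235, '2020-01-20'),  -- no earlier visit: no referral
        (3, 100, '2020-01-01'),
        (3, 200, '2020-01-15'),
        (3, 235, '2020-02-01');
""")

referrals = conn.execute("""
    SELECT a.person_id, a.employee_id, a.appointment_time,
           p.employee_id      AS referred_by,
           p.appointment_time AS referral_time
    FROM app a
    LEFT JOIN app p
           ON p.person_id = a.person_id
          AND p.employee_id <> a.employee_id
          AND p.appointment_time = (
                -- latest earlier appointment with any other specialist
                SELECT MAX(p2.appointment_time)
                FROM app p2
                WHERE p2.person_id = a.person_id
                  AND p2.employee_id <> a.employee_id
                  AND p2.appointment_time < a.appointment_time)
    WHERE a.employee_id = 235
    ORDER BY a.person_id, a.appointment_time
""").fetchall()

for row in referrals:
    print(row)
```

If two other specialists saw the person at exactly the same earlier time, this returns both rows, whereas top (1) would pick one of them arbitrarily.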
Table Structure:
Article(
model int(key),
year int(key),
author varchar(key),
num int)
num: number of articles written during the year
Find all the authors who, in at least one year, wrote the maximal number of articles (relative to all the other authors).
I tried:
SELECT author FROM Article,
(SELECT year,max(sumnum) s FROM
(SELECT year,author,SUM(num) sumnum FROM Article GROUP BY year,author)
GROUP BY year) AS B WHERE Article.year=B.year and Article.num=B.s;
Is this the right answer?
Thanks.
You might want to try JOINing the per-author totals to the per-year maximum to get what you are looking for:
SELECT DISTINCT SumMain.author
FROM (
SELECT year
,author
,SUM(num) AS sumnum
FROM Article
GROUP BY year
,author
) AS SumMain
INNER JOIN (
SELECT year
,MAX(sumnum) AS maxnum
FROM (
SELECT year
,author
,SUM(num) AS sumnum
FROM Article
GROUP BY year
,author
) AS PerAuthor
GROUP BY year
) AS MaxMain
ON MaxMain.year = SumMain.year
AND MaxMain.maxnum = SumMain.sumnum
;
This is plain ANSI SQL: it takes the MAX of the SUMmed nums for each year and brings back only the authors whose total matches it. Keep in mind I only JOINed on the year and the total because of the information provided ... if you require more specificity to get a 1-to-1 match, adjust accordingly.
Depending on what DBMS you are using, it can be simplified one of two ways:
SELECT DISTINCT author
FROM Article AS Main
GROUP BY year
,author
HAVING SUM(num) >= ALL (
SELECT SUM(num)
FROM Article AS Other
WHERE Other.year = Main.year
GROUP BY Other.author
)
;
Most DBMSes support the ALL quantifier on a subquery (SQLite is one that does not), and this works there.
If your DBMS supports OLAP (window) functions and the QUALIFY clause (Teradata, for example), you can do something like this:
SELECT author
FROM (
SELECT year
,author
,SUM(num) AS sumnum
FROM Article
GROUP BY year
,author
) AS Main
QUALIFY (
RANK() OVER (
PARTITION BY year
ORDER BY sumnum DESC
) = 1
)
;
This limits the result set to the authors with the highest sumnum in each year. RANK() is used rather than ROW_NUMBER() so that authors who tie for the top spot are all returned, and an author who tops more than one year appears once per year.
Hope this helps!
You mention this is for homework and you have made a valid attempt, however incorrect.
This is under a premise (unclear since no sample data) that the model column is like an auto-increment, and there is only going to be one entry per author per year and never multiple records for the same author within the same year. Ex:
model year author num
===== ==== ====== ===
1 2013 A 15
2 2013 C 18
3 2013 X 17
4 2014 A 16
5 2014 B 12
6 2014 C 16
7 2014 X 18
8 2014 Y 18
So the result expected is highest article count in 2013 = 18 and would only return author "C". In 2014, highest article count is 18 and would return authors "X" and "Y"
First, get a query of what was the maximum number of articles written...
select
year,
max( num ) as ArticlesPerYear
from
Article
GROUP BY
year
This would give you one record per year, and the maximum number of articles published... so if you had data for years 2010-2014, you would at MOST have 5 records returned. Now, it is as simple as joining this to the original table that had the matching year and articles
select
A2.*
from
( select
year,
max( num ) as ArticlesPerYear
from
Article
GROUP BY
year ) PreQuery
JOIN Article A2
on PreQuery.Year = A2.Year
AND PreQuery.ArticlesPerYear = A2.num
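As a quick check, the join above can be run against the sample rows listed earlier. This is only a sketch using Python's built-in sqlite3 module. The second query is an assumption-free variant in the sense that it sums num per author and year first, so it would still work if the one-row-per-author-per-year premise turned out to be false.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Article (model INTEGER, year INTEGER, author TEXT, num INTEGER);
    INSERT INTO Article VALUES
        (1, 2013, 'A', 15), (2, 2013, 'C', 18), (3, 2013, 'X', 17),
        (4, 2014, 'A', 16), (5, 2014, 'B', 12), (6, 2014, 'C', 16),
        (7, 2014, 'X', 18), (8, 2014, 'Y', 18);
""")

# The join from the answer: yearly maximum joined back to the table.
top_simple = conn.execute("""
    SELECT A2.year, A2.author, A2.num
    FROM (SELECT year, MAX(num) AS ArticlesPerYear
          FROM Article
          GROUP BY year) PreQuery
    JOIN Article A2
      ON PreQuery.year = A2.year
     AND PreQuery.ArticlesPerYear = A2.num
    ORDER BY A2.year, A2.author
""").fetchall()

# Variant that first totals num per (year, author).
top_summed = conn.execute("""
    WITH per_author AS (
        SELECT year, author, SUM(num) AS total
        FROM Article
        GROUP BY year, author)
    SELECT p.year, p.author, p.total
    FROM per_author p
    JOIN (SELECT year, MAX(total) AS best
          FROM per_author
          GROUP BY year) m
      ON m.year = p.year AND m.best = p.total
    ORDER BY p.year, p.author
""").fetchall()

print(top_simple)
print(top_summed)
```

On this sample both queries return author C for 2013 and authors X and Y for 2014, matching the expected result described above.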
I suggest a CTE
WITH maxyear AS
(SELECT year, max(num) AS max_articles
FROM article
GROUP BY year)
SELECT DISTINCT author
FROM article a
JOIN maxyear m
ON a.year=m.year AND a.num=m.max_articles;
and compare that in performance to a partition, which is another way
SELECT DISTINCT author FROM
(SELECT author, rank()
OVER (PARTITION BY year ORDER BY num DESC) AS r
FROM article) AS subq
WHERE r = 1;
Some RDBMSs (Teradata, for example) have a QUALIFY clause that lets you filter on rank() directly, so you don't need to nest queries.
I would like to retrieve only the customers from an order table who have paid for all the orders. (Paid = 'Y').
The order table looks like this:
Cus # Order # Paid
111 1 Y
111 2 Y
222 3 Y
222 4 N
333 5 N
In this example the query should only return customer 111.
The query
Select * from order where Paid = 'Y';
returns customers that have paid and unpaid orders (ex. customer 222) in addition to customers who have paid for all of their orders (customer 111).
How do I structure the query to evaluate all the orders for a customer and only return information for a customer that has paid for all the orders?
Looking at the problem a different way, you need only the customers who don't have any unpaid orders.
select cus from order group by cus having min(Paid) = 'Y';
The above query also utilizes the fact that 'Y' > 'N'.
SQL Fiddle:
http://sqlfiddle.com/#!4/f6022/1
If you need to select all different orders for eligible customers, you may use OLAP functions:
select cus,order,paid from (select cus,order,paid,min(paid)
over (partition by cus)minz from order)dt where minz='Y';
With Oracle, you can also do select customer from your_table where paid='Y' minus select customer from your_table where paid='N', although I don't know if this is faster and, of course, you don't get the other fields in this case.
I don't know Oracle but often work in SQL Server; I would write the query something like this:
Select Cus from order group by Cus having count(*) = sum(case when Paid = 'Y' then 1 else 0 end);
Basically you retrieve the customers whose total order count equals the number of paid orders.
Again, I apologize for not giving proper Oracle syntax, but hopefully this will point you in the right direction.
SELECT *
FROM orders oo
WHERE NOT EXISTS
(
SELECT 1
FROM orders io
WHERE oo.cus# = io.cus#
AND io.paid = 'N'
)
;
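The approaches in this thread can be compared on the sample data from the question. The sketch below does that with Python's built-in sqlite3 module; the table is called orders and the columns cus, order_no and paid, which are names chosen here to avoid the reserved word ORDER and the # character.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (cus INTEGER, order_no INTEGER, paid TEXT);
    INSERT INTO orders VALUES
        (111, 1, 'Y'), (111, 2, 'Y'),
        (222, 3, 'Y'), (222, 4, 'N'),
        (333, 5, 'N');
""")

# 1. MIN trick: relies on 'N' sorting before 'Y'.
by_min = conn.execute("""
    SELECT cus FROM orders
    GROUP BY cus
    HAVING MIN(paid) = 'Y'
""").fetchall()

# 2. Compare the total order count with the count of paid orders.
by_count = conn.execute("""
    SELECT cus FROM orders
    GROUP BY cus
    HAVING COUNT(*) = SUM(CASE WHEN paid = 'Y' THEN 1 ELSE 0 END)
""").fetchall()

# 3. NOT EXISTS: no unpaid order for the same customer.
by_not_exists = conn.execute("""
    SELECT DISTINCT oo.cus
    FROM orders oo
    WHERE NOT EXISTS (SELECT 1
                      FROM orders io
                      WHERE io.cus = oo.cus
                        AND io.paid = 'N')
""").fetchall()

print(by_min, by_count, by_not_exists)
```

All three return only customer 111 here. The NOT EXISTS form is the one to keep if the full order rows are needed rather than just the customer number.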
In SQL Server 2005, I have a table of input coming in of successful sales, and a variety of tables with information on known customers, and their details. For each row of sales, I need to match 0 or 1 known customers.
We have the following information coming in from the sales table:
ServiceId,
Address,
ZipCode,
EmailAddress,
HomePhone,
FirstName,
LastName
The customers information includes all of this, as well as a 'LastTransaction' date.
Any of these fields can map back to 0 or more customers. We count a match as being any time that a ServiceId, Address+ZipCode, EmailAddress, or HomePhone in the sales table exactly matches a customer.
The problem is that we have information on many customers, sometimes multiple in the same household. This means that we might have John Doe, Jane Doe, Jim Doe, and Bob Doe in the same house. They would all match on Address+ZipCode and HomePhone, and possibly more than one of them would match on ServiceId as well.
I need some way to elegantly keep track of, in a transaction, the 'best' match of a customer. If one matches 6 fields, and the others only match 5, that customer should be kept as a match to that record. In the case of multiple matching 5, and none matching more, the most recent LastTransaction date should be kept.
Any ideas would be quite appreciated.
Update: To be a little more clear, I am looking for a good way to verify the number of exact matches in the row of data, and choose which rows to associate based on that information. If the last name is 'Doe', it must exactly match the customer last name, to count as a matching parameter, rather than be a very close match.
for SQL Server 2005 and up try:
;WITH SalesScore AS (
SELECT
s.PK_ID as S_PK
,c.PK_ID AS c_PK
,CASE
WHEN c.PK_ID IS NULL THEN 0
ELSE CASE WHEN s.ServiceId=c.ServiceId THEN 1 ELSE 0 END
+CASE WHEN (s.Address=c.Address AND s.ZipCode=c.ZipCode) THEN 1 ELSE 0 END
+CASE WHEN s.EmailAddress=c.EmailAddress THEN 1 ELSE 0 END
+CASE WHEN s.HomePhone=c.HomePhone THEN 1 ELSE 0 END
END AS Score
FROM Sales s
LEFT OUTER JOIN Customers c ON s.ServiceId=c.ServiceId
OR (s.Address=c.Address AND s.ZipCode=c.ZipCode)
OR s.EmailAddress=c.EmailAddress
OR s.HomePhone=c.HomePhone
)
SELECT
s.*,c.*
FROM (SELECT
S_PK,MAX(Score) AS Score
FROM SalesScore
GROUP BY S_PK
) dt
INNER JOIN Sales s ON dt.s_PK=s.PK_ID
INNER JOIN SalesScore ss ON dt.s_PK=ss.S_PK AND dt.Score=ss.Score
LEFT OUTER JOIN Customers c ON ss.c_PK=c.PK_ID
EDIT
I hate to write so much actual code when there was no schema given, because I can't actually run this and be sure it works. However, to answer the question of how to handle ties using the last transaction date, here is a newer version of the above code:
;WITH SalesScore AS (
SELECT
s.PK_ID as S_PK
,c.PK_ID AS c_PK
,CASE
WHEN c.PK_ID IS NULL THEN 0
ELSE CASE WHEN s.ServiceId=c.ServiceId THEN 1 ELSE 0 END
+CASE WHEN (s.Address=c.Address AND s.ZipCode=c.ZipCode) THEN 1 ELSE 0 END
+CASE WHEN s.EmailAddress=c.EmailAddress THEN 1 ELSE 0 END
+CASE WHEN s.HomePhone=c.HomePhone THEN 1 ELSE 0 END
END AS Score
FROM Sales s
LEFT OUTER JOIN Customers c ON s.ServiceId=c.ServiceId
OR (s.Address=c.Address AND s.ZipCode=c.ZipCode)
OR s.EmailAddress=c.EmailAddress
OR s.HomePhone=c.HomePhone
)
SELECT
*
FROM (SELECT
s.*,c.*,row_number() over(partition by s.PK_ID order by s.PK_ID ASC,c.LastTransaction DESC) AS RankValue
FROM (SELECT
S_PK,MAX(Score) AS Score
FROM SalesScore
GROUP BY S_PK
) dt
INNER JOIN Sales s ON dt.s_PK=s.PK_ID
INNER JOIN SalesScore ss ON dt.s_PK=ss.S_PK AND dt.Score=ss.Score
LEFT OUTER JOIN Customers c ON ss.c_PK=c.PK_ID
) dt2
WHERE dt2.RankValue=1
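Since no schema was given, here is a small self-contained sketch of the same scoring idea that can actually be run. It uses Python's built-in sqlite3 module instead of SQL Server, the PK_ID columns and all rows are invented, and the "keep the best row per sale" step is done in Python after sorting rather than with row_number().

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sales (PK_ID INTEGER PRIMARY KEY, ServiceId TEXT, Address TEXT,
                        ZipCode TEXT, EmailAddress TEXT, HomePhone TEXT);
    CREATE TABLE Customers (PK_ID INTEGER PRIMARY KEY, ServiceId TEXT, Address TEXT,
                            ZipCode TEXT, EmailAddress TEXT, HomePhone TEXT,
                            LastTransaction TEXT);
    -- Invented rows: one household sharing an address and a phone number.
    INSERT INTO Customers VALUES
        (10, 'S1', '1 Main St', '12345', 'john@doe.example', '555-0100', '2009-01-01'),
        (11, 'S1', '1 Main St', '12345', 'jane@doe.example', '555-0100', '2009-06-01'),
        (12, 'S2', '1 Main St', '12345', 'jim@doe.example',  '555-0100', '2009-09-01');
    INSERT INTO Sales VALUES
        (1, 'S1', '1 Main St', '12345', 'john@doe.example',   '555-0100'),
        (2, 'S9', '9 Elm St',  '99999', 'nobody@doe.example', '555-0999'),
        (3, 'S7', '1 Main St', '12345', 'other@doe.example',  '555-0100');
""")

scored = conn.execute("""
    SELECT s.PK_ID AS sale_id,
           c.PK_ID AS customer_id,
           CASE WHEN s.ServiceId = c.ServiceId THEN 1 ELSE 0 END
         + CASE WHEN s.Address = c.Address
                 AND s.ZipCode = c.ZipCode THEN 1 ELSE 0 END
         + CASE WHEN s.EmailAddress = c.EmailAddress THEN 1 ELSE 0 END
         + CASE WHEN s.HomePhone = c.HomePhone THEN 1 ELSE 0 END AS score
    FROM Sales s
    LEFT JOIN Customers c
           ON s.ServiceId = c.ServiceId
           OR (s.Address = c.Address AND s.ZipCode = c.ZipCode)
           OR s.EmailAddress = c.EmailAddress
           OR s.HomePhone = c.HomePhone
    ORDER BY sale_id, score DESC, c.LastTransaction DESC
""").fetchall()

# Rows arrive best-first within each sale, so keep only the first one seen.
best = {}
for sale_id, customer_id, score in scored:
    best.setdefault(sale_id, (customer_id, score))

print(best)
```

Sale 1 matches customer 10 on all four criteria, sale 2 matches nobody, and sale 3 is a three-way tie at two criteria that is broken by the most recent LastTransaction (customer 12).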
Here's a fairly ugly way to do this, using SQL Server code. Assumptions:
- Column CustomerId exists in the Customer table, to uniquely identify customers.
- Only exact matches are supported (as implied by the question).
SELECT top 1 CustomerId, LastTransaction, count(*) HowMany
from (select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.ServiceId = sa.ServiceId
union all select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.EmailAddress = sa.EmailAddress
union all select Customerid, LastTransaction
from Sales sa
inner join Customers cu
on cu.Address = sa.Address
and cu.ZipCode = sa.ZipCode
union all [etcetera -- repeat for each possible link]
) xx
group by CustomerId, LastTransaction
order by count(*) desc, LastTransaction desc
I dislike using "top 1", but it is quicker to write. (The alternative is to use ranking functions, and that would require either another subquery level or implementing it as a CTE.) Of course, if your tables are large this would fly like a cow unless you had indexes on all your columns.
Frankly I would be wary of doing this at all as you do not have a unique identifier in your data.
John Smith lives with his son John Smith and they both use the same email address and home phone. These are two people, but you would match them as one. We run into this all the time with our data and have no solution for automated matching because of it. We identify possible dups and actually physically call to find out if they are dups.
I would probably create a stored function for that (in Oracle) and order by the highest match:
SELECT * FROM (
SELECT c.*, MATCH_CUSTOMER( c.Id, par1, par2, par3 ) matches FROM Customer c
) WHERE matches >0 ORDER BY matches desc
The function match_customer returns the number of matches based on the input parameters... I guess it is probably slow, as this query will always scan the complete customer table.
For close matches you can also look at a number of string similarity algorithms.
For example, in Oracle there is the UTL_MATCH.JARO_WINKLER_SIMILARITY function:
http://www.psoug.org/reference/utl_match.html
There is also the Levenshtein distance algorithm.
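To give an idea of what that measures, here is a short sketch of the standard dynamic-programming Levenshtein distance in plain Python. It is only an illustration of the algorithm, not how Oracle implements it, and the function name is chosen here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b."""
    # Keep the shorter string in b so the working rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the first i-1 chars of a and first j of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


print(levenshtein("Smith", "Smyth"))
print(levenshtein("kitten", "sitting"))
```

A distance of 0 means an exact match, so a small threshold such as 1 or 2 could be used to flag near-duplicates for manual review.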