I am completely new to SQL and trying to learn it, but I'm stuck on something. Say I have a table with two columns, customername and customer address. Multiple customers can be mapped to the same address. How can I retrieve the address with the most customers?
This can be done using grouping (to get the counts), ordering (descending) and limiting (to get the top row). In MySQL for instance, it might look like this:
SELECT customer_address, COUNT(DISTINCT customer_id) AS number_of_customers
FROM your_table
GROUP BY customer_address
ORDER BY number_of_customers DESC
LIMIT 1;
This will yield something like:
+------------------+---------------------+
| customer_address | number_of_customers |
+------------------+---------------------+
| foo | 42 |
+------------------+---------------------+
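One thing LIMIT 1 does not handle is a tie: if two addresses share the highest count, only one of them comes back. A minimal sketch of one way around that, run through Python's sqlite3 module so it is self-contained. The table name `customers`, its columns, and the sample rows are made up for illustration, and it counts rows rather than a customer_id because the question only mentions two columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table and data, just for this sketch.
conn.execute("CREATE TABLE customers (customer_name TEXT, customer_address TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        ("Ann", "1 Main St"),
        ("Bob", "1 Main St"),
        ("Cid", "2 Oak Ave"),
        ("Dee", "2 Oak Ave"),
        ("Eve", "3 Elm Rd"),
    ],
)

# Keep every address whose count equals the highest count, so ties are not dropped.
top_addresses = conn.execute(
    """
    SELECT customer_address, COUNT(*) AS number_of_customers
    FROM customers
    GROUP BY customer_address
    HAVING COUNT(*) = (
        SELECT MAX(n)
        FROM (SELECT COUNT(*) AS n FROM customers GROUP BY customer_address)
    )
    ORDER BY customer_address
    """
).fetchall()
print(top_addresses)
```

If ties don't matter to you, the ORDER BY ... LIMIT 1 version above is shorter and does the job.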
I'm reading this article by Miguel Grinberg, and in the 'The Join' part I'm a bit confused by the result.
To sum up the part I'm concerned with, he joined a query and a subquery on the SAME table, on the condition that their customer_id values are the same.
Query selected: id, customer_id, order_date
Subquery selected: customer_id, max(order_date) AS last_order_date
When he joined it I was expecting something like:
id | customer_id | order_date | customer_id | last_order_date
--------------------------------------------------------------
But his result was:
id | customer_id | order_date | last_order_date
-----------------------------------------------
Where is the other customer_id selected from the subquery?
With that, I would like to confirm whether my understanding is correct: does a JOIN also combine two COLUMNS if they have the same NAME and VALUE?
The fact that the article uses select * when it should be using select orders.*, last_orders.last_order_date already makes me suspicious of anything else in the article.
Most databases would run the query and return two columns with customer_id -- as you suggest should happen. However, there is then a problem in accessing both those columns in an application. They have the same name. So, the columns might be elided in some way.
All that said, this is a rather poor example, because the query is much better written using window functions:
select o.*, max(order_date) over (partition by customer_id) as last_order_date
from orders o;
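If you want to see the duplicate-column behaviour for yourself, here is a small sketch using Python's sqlite3 module. The table contents are invented, and the join is my reconstruction of the kind of query described in the question, not the article's exact code. It compares the column list of `select *` against the explicit `orders.*, last_orders.last_order_date` form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for illustration.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, "2024-01-05"), (2, 10, "2024-02-01"), (3, 20, "2024-01-20")],
)

# The same FROM/JOIN is reused so only the select list differs.
join_sql = """
FROM orders
JOIN (
    SELECT customer_id, MAX(order_date) AS last_order_date
    FROM orders
    GROUP BY customer_id
) AS last_orders ON last_orders.customer_id = orders.customer_id
"""

# SELECT * pulls in the columns of both sides of the join.
star = conn.execute("SELECT * " + join_sql)
star_columns = [d[0] for d in star.description]

# Naming the columns explicitly leaves out the subquery's customer_id.
explicit = conn.execute(
    "SELECT orders.*, last_orders.last_order_date " + join_sql + " ORDER BY orders.id"
)
explicit_columns = [d[0] for d in explicit.description]
explicit_rows = explicit.fetchall()

print(len(star_columns), star_columns)
print(explicit_columns)
```

In SQLite the star version reports five columns and the explicit version four, so the JOIN itself does not merge anything; whichever tool displayed the article's result must have dropped the repeated column.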
So, say I have a table of entries which have a product name, a user, and the product's pricing.
My problem is that I want to obtain a result set that groups the products bought by a single user together, and then sorts those products lexicographically.
So, something where every product bought by a user whose name starts with an A is grouped in its own little block, with each product also appearing in alphabetical order (Candy before Cat food, for example), and with a user whose name starts with P coming afterward.
Can someone explain how I might begin to do this?
An SQL query returns a table of rows and columns. You can have one column for the client and another for the product and sort by client and inside by product (ORDER BY client, product). You don't get different "blocks" of data.
If you want this more beautiful, you need some software to create a report (i.e. data with a layout) based on the query.
What you can do with SQL, though, is suppress data, such as:
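select
case when client = lag(client) over (order by client, product) then null else client end
as client,
product
from bought
-- qualify the columns so the sort uses the original client, not the blanked-out alias
order by bought.client, bought.product;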
Sample result:
client | product
--------+--------
Elsa | mug
| plate
Max | cup
| plate
| saucer
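As for doing the layout in software rather than in SQL, here is a rough sketch of what that could look like in Python. The `rows` list stands in for whatever your database driver returns from a plain `ORDER BY client, product` query, and `format_report` is a hypothetical helper name, not part of any library:

```python
from itertools import groupby

# Stand-in for the result of: SELECT client, product FROM bought ORDER BY client, product
rows = [
    ("Elsa", "mug"),
    ("Elsa", "plate"),
    ("Max", "cup"),
    ("Max", "plate"),
    ("Max", "saucer"),
]


def format_report(sorted_rows):
    """Print the client name only on the first line of each client's block."""
    lines = []
    # groupby relies on the rows already being sorted by client.
    for client, group in groupby(sorted_rows, key=lambda r: r[0]):
        for i, (_, product) in enumerate(group):
            label = client if i == 0 else ""
            lines.append(f"{label:<6}| {product}")
    return lines


report = format_report(rows)
print("\n".join(report))
```

This keeps the query itself simple and leaves the "blocks" to the presentation layer, which is usually easier to change later.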
For an assignment I must combine 3 tables and write a query that returns the names of all people who have less than half of the wealth of the richest person. We define the wealth of a person as the total money in all of his/her accounts.
The 3 tables are:
Persons
id | name | address | age | eyeColor | gender
BankAccounts
id | balance
AccountOf
id | person_id → Persons | account_id → BankAccounts
I know how to use the SUM() function and the MAX() function, but combining them is a pain in my ass.
There is also someone without a bank account.
Does anyone know how to do this assignment or can give me a hint?
Not to give it away, since it's an assignment and that kind of ruins the whole thing, but... you'll need to find the sum(balance) for the richest person, which would be the max of all the persons' sum(balance). This will look something like:
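SELECT
max(personbalance)
FROM
(
SELECT
sum(bankaccounts.balance) AS personbalance
FROM
persons
JOIN accountof ON accountof.person_id = persons.id
JOIN bankaccounts ON bankaccounts.id = accountof.account_id
GROUP BY persons.id
) subForSum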
This will just be a subquery in your main query, but it should give you enough direction to slap the rest of it together. When in doubt with these things, just subquery and subquery and subquery. You can clean it up after you get the answer you expect.
For future students who are looking for an answer:
Use a LEFT JOIN, as some persons might not have a bank account.
Those persons get a NULL sum; use COALESCE to replace it with 0.
Use this as a subquery to obtain the wealth of the richest person and compare values:
SELECT max(personbalance)
FROM
(
SELECT
coalesce(sum(bankaccounts.balance), 0) AS personbalance
FROM
persons
LEFT JOIN accountof ON accountof.person_id = persons.id
LEFT JOIN bankaccounts ON bankaccounts.id = accountof.account_id
GROUP BY persons.id
) subForSum
Good Luck!
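For anyone who wants to check their own attempt, here is one possible way to put the pieces together, run through Python's sqlite3 module. Treat it as a sketch under assumptions: the sample people and balances are invented, only the id and name columns of Persons are created, and the `wealth` CTE is just one of several ways to structure it:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: Ann owns 600 + 400, Bob owns 300, Cid has no account.
conn.executescript(
    """
    CREATE TABLE Persons (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE BankAccounts (id INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE AccountOf (id INTEGER PRIMARY KEY, person_id INTEGER, account_id INTEGER);
    INSERT INTO Persons VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO BankAccounts VALUES (1, 600), (2, 400), (3, 300);
    INSERT INTO AccountOf VALUES (1, 1, 1), (2, 1, 2), (3, 2, 3);
    """
)

poorer_than_half = conn.execute(
    """
    WITH wealth AS (
        -- LEFT JOINs keep persons without accounts; COALESCE turns their NULL sum into 0
        SELECT p.id, p.name, COALESCE(SUM(b.balance), 0) AS total
        FROM Persons p
        LEFT JOIN AccountOf ao ON ao.person_id = p.id
        LEFT JOIN BankAccounts b ON b.id = ao.account_id
        GROUP BY p.id, p.name
    )
    SELECT name
    FROM wealth
    WHERE total < (SELECT MAX(total) FROM wealth) / 2.0
    ORDER BY name
    """
).fetchall()
print(poorer_than_half)
```

With this data the richest person has 1000, so anyone below 500 is returned, including the person with no account at all.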
I have three tables:
Messages
messageid | userid | text
Ex: 1 | 1303 | hey guys
Users
userid | username
Ex:
1303 | trantor
1301 | tranro1
1302 | trantor2
Favorites
messageid | userid
Ex:
1 | 1302
1 | 1301
What I want to do is display a table that has usernames and counts how many of their messages were favorited a certain number of times. In the example above, I want a query that answers "how many messages does each user have that have been liked exactly twice?"
and it would show a table that has a row saying
trantor | 1
A natural extension is to replace exactly twice with "at least 2", "more than 6", etc. I'm trying to combine count with joins and find myself confused. And since the tables are large, I'm getting counts but am not confident that my query is working correctly. I have read this article but am still confused :L
What I have so far:
SELECT USERS.username, COUNT(FAVORITES.id) FROM USERS INNER JOIN FAVORITES ON FAVORITES.userID=USERS.id WHERE COUNT(FAVORITES.id) > 2;
But I don't think it works.
On S.O. I've found these questions on "correlated subqueries" but am thoroughly confused.
Would it be something like this?
SELECT USERS.username,
, ( SELECT COUNT(FAVORTIES.userid)
FROM FAVORITES INNER JOIN ON MESSAGES
WHERE FAVORITES.messageid = MESSAGES.messageid
)
FROM USERS
There are a couple of things you should know about aggregate functions in SQL. First off, you need a GROUP BY if you're selecting an aggregate function alongside other, non-aggregated columns. Second, any conditions involving aggregate functions go in a HAVING clause rather than a WHERE.
The GROUP BY is to be applied to the column(s) you're selecting alongside any aggregate functions.
Here's a basic structure:
SELECT attribute1, COUNT(attribute2)
FROM someTable
GROUP BY attribute1
HAVING COUNT(attribute2) > 2;
Apply anything else you're using, such as JOINs and ORDER BY and whatnot.
Note: these clauses have to appear in a certain order. For example, ORDER BY goes after HAVING, which comes after GROUP BY, and so forth.
If I'm remembering correctly, the clauses are written in this order:
SELECT
FROM
WHERE
GROUP BY
HAVING
ORDER BY
When you use an aggregate function such as COUNT(), you will need to use GROUP BY together with HAVING rather than WHERE:
SELECT USERS.username, COUNT(FAVORITES.id)
FROM USERS
INNER JOIN FAVORITES
ON FAVORITES.userID=USERS.id
GROUP BY USERS.username
HAVING COUNT(FAVORITES.id) > 2;
From the documentation:
If you use a group function in a statement containing no GROUP BY clause, it is equivalent to grouping on all rows.
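To tie GROUP BY and HAVING back to the tables in the question, here is a sketch that sticks to the column names from the listed schema (messageid, userid, username) rather than the `id` columns used in the attempts. It runs through Python's sqlite3 module; the second message and its single favorite are invented so there is something that should not be counted:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Users (userid INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE Messages (messageid INTEGER PRIMARY KEY, userid INTEGER, text TEXT);
    CREATE TABLE Favorites (messageid INTEGER, userid INTEGER);
    INSERT INTO Users VALUES (1303, 'trantor'), (1301, 'tranro1'), (1302, 'trantor2');
    -- Message 2 and its favorite are hypothetical extra rows.
    INSERT INTO Messages VALUES (1, 1303, 'hey guys'), (2, 1301, 'hello');
    INSERT INTO Favorites VALUES (1, 1302), (1, 1301), (2, 1303);
    """
)

liked_twice = conn.execute(
    """
    SELECT u.username, COUNT(*) AS message_count
    FROM Users u
    JOIN (
        -- Inner step: one row per message that was favorited exactly twice.
        SELECT m.messageid, m.userid
        FROM Messages m
        JOIN Favorites f ON f.messageid = m.messageid
        GROUP BY m.messageid, m.userid
        HAVING COUNT(*) = 2
    ) AS liked ON liked.userid = u.userid
    -- Outer step: count those messages per author.
    GROUP BY u.username
    ORDER BY u.username
    """
).fetchall()
print(liked_twice)
```

Changing `= 2` in the HAVING clause to `>= 2` or `> 6` gives the "at least 2" and "more than 6" variants. Note the join goes through Messages, so the favorites are attributed to the message's author rather than to the user who clicked favorite.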
Here's my question: how do I maintain record integrity using aggregate functions with a group by?
To explain further, here's an example.
I have a table with the following columns: (Think of it as an "order" table)
Customer_Summary (first 10 char of name + first 10 char of address)
Customer_Name
Customer_Address
Customer_Postal Code
Order_weekday
There is one row per "order", so many rows with the same customer name, address, and summary.
What I want to do is show the customer's name, address, and postal code, as well as the number of orders they've placed on each weekday, grouped by the customer's summary.
So the data should look like:
Summary | Name | Address | PCode | Monday | Tuesday | Wednesday | Thursday | Friday
test custntest addre|test custname|test address|123456 | 1 | 1 | 1 | 1 | 1
I only want to group records of similar customer summary together, but obviously I want one name, address, and postal code to show. I'm using min() at the moment, so my query looks like:
SELECT Customer_Summary, min(customer_name), min(customer_address), min(customer_postal_code)
FROM Order
Group by customer_summary
I've omitted my weekday logic as I didn't think it was necessary.
My issue is this - some of these customers with the same customer summary have different addresses and postal codes.
So I might have two customers, looking like:
test custntest addre|test custname |test address |323456|
test custntest addre|test custname2|test address2|123456|
Using the group by, my query will return the following:
test custntest addre|test custname |test address |123456|
Since I'm using min, it's going to give me the minimum value for all of the fields, but not necessarily from the same record. So I've lost my record integrity here - the address and name returned by the query do not correctly match the postal code.
So how do I maintain data integrity on non-grouped fields when using a group by clause?
Hopefully I explained it clearly enough, and thanks in advance for the help.
EDIT: Solved. Thanks everyone!
You can always use ROW_NUMBER instead of GROUP BY
WITH A AS (
SELECT Customer_Summary, customer_name, customer_address, customer_postal_code,
ROW_NUMBER() OVER (PARTITION BY Customer_Summary ORDER BY customer_name, customer_address) AS rn
FROM Order
)
SELECT Customer_Summary, customer_name, customer_address, customer_postal_code
FROM A
WHERE rn = 1
Then you are free to choose which customer to use via the ORDER BY clause. Currently I am ordering them by name and then address.
Edit:
My solution does what you asked for. But I surely agree with the others: If you are allowed to change the database structure, this would be a good idea... which you are not (saw your comment). Well, then ROW_NUMBER() is a good way.
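Here is a small sketch that puts the MIN() approach and the ROW_NUMBER() approach side by side, using the two sample rows from the question. It runs through Python's sqlite3 module and assumes an SQLite build with window functions (3.25 or later); the table is called `orders` here only because ORDER is a reserved word:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (customer_summary TEXT, customer_name TEXT,"
    " customer_address TEXT, customer_postal_code TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("test custntest addre", "test custname", "test address", "323456"),
        ("test custntest addre", "test custname2", "test address2", "123456"),
    ],
)

# MIN() per column: each value is the smallest on its own, so rows get mixed.
mixed = conn.execute(
    """
    SELECT customer_summary, MIN(customer_name), MIN(customer_address),
           MIN(customer_postal_code)
    FROM orders
    GROUP BY customer_summary
    """
).fetchone()

# ROW_NUMBER(): pick one whole row per summary, so the fields stay together.
picked = conn.execute(
    """
    WITH a AS (
        SELECT customer_summary, customer_name, customer_address, customer_postal_code,
               ROW_NUMBER() OVER (PARTITION BY customer_summary
                                  ORDER BY customer_name, customer_address) AS rn
        FROM orders
    )
    SELECT customer_summary, customer_name, customer_address, customer_postal_code
    FROM a
    WHERE rn = 1
    """
).fetchall()

print(mixed)
print(picked)
```

The MIN() row pairs the first customer's name and address with the second customer's postal code, while the ROW_NUMBER() row is an actual record from the table.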
I think you need to re-think your structure.
Ideally you would have a Customer table with a unique ID. Then you would use that unique ID in the Order table. Then you don't need the strange "first 10 characters" method that you are using. Instead, you just group by the unique ID from the Customer table.
You could even then also have a separate table for addresses, relating each address to the customer, with multiple rows (with fields marking them as home address, delivery address, billing address, etc).
This way you separate the Customer information from the Address information and from the Order information, so that if the customer changes name (marriage) or address (moving home) you don't break your data. Everything is related by the IDs, not the data itself.
[This relationship is known as a Foreign Key.]
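A rough sketch of what that structure could look like, again via Python's sqlite3 module. The table and column names (`customers`, `orders`, `customer_id`, and so on) are my own placeholders, not the asker's real schema, and the address table is left out to keep it short:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        order_weekday TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'test custname'), (2, 'test custname2');
    INSERT INTO orders (customer_id, order_weekday)
    VALUES (1, 'Monday'), (1, 'Tuesday'), (2, 'Monday');
    """
)

# Grouping by the ID keeps the two similarly named customers apart.
counts = conn.execute(
    """
    SELECT c.name, COUNT(*) AS order_count
    FROM customers c
    JOIN orders o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.customer_id
    """
).fetchall()

# The foreign key rejects an order that points at a customer who does not exist.
try:
    conn.execute("INSERT INTO orders (customer_id, order_weekday) VALUES (99, 'Friday')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(counts, rejected)
```

Because the grouping key is the ID, renaming a customer later only touches one row in `customers` and the order counts stay attached to the right person.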