Multiple unwanted records in Group by clause in Postgres - sql

I have two tables that I join together, and then I run a GROUP BY clause on the result. The problem is that I keep getting unwanted data.
client table
----------
name
company_id
created_at
company table
-----------
name
Query:
SELECT company.name, client.name, MIN (created_at) created_at
FROM company
INNER JOIN client
ON client.company_id = company.id
group by company.name, client.name
The query returns all the clients, but I only want the one that was created first in each company. What should I change, given that I need the clients' names?

If you want the first one in each company, then use DISTINCT ON. This is a nice construct available only in Postgres; it keeps the first row of each group, as determined by the ORDER BY:
SELECT DISTINCT ON (co.name) co.name, cl.name, cl.created_at
FROM company co INNER JOIN
client cl
ON cl.company_id = co.id
ORDER BY co.name, cl.created_at asc;
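
For engines without DISTINCT ON, a rough portable equivalent is to join back to a per-company MIN(created_at) subquery. The sketch below runs it against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the id column on company are assumptions made for illustration. Unlike DISTINCT ON, this form returns more than one row per company when two clients share the earliest created_at.

```python
import sqlite3

# In-memory database with made-up sample rows (assumed for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE client (name TEXT, company_id INTEGER, created_at TEXT);
    INSERT INTO company VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO client VALUES
        ('alice', 1, '2020-01-05'),
        ('bob',   1, '2020-01-02'),
        ('carol', 2, '2020-03-01'),
        ('dave',  2, '2020-04-01');
""")

# Find the earliest created_at per company, then join back to client
# to recover the name of the client created at that time.
first_clients = conn.execute("""
    SELECT co.name, cl.name, cl.created_at
    FROM company co
    INNER JOIN client cl
        ON cl.company_id = co.id
    INNER JOIN (
        SELECT company_id, MIN(created_at) AS first_created
        FROM client
        GROUP BY company_id
    ) f
        ON f.company_id = cl.company_id
       AND f.first_created = cl.created_at
    ORDER BY co.name
""").fetchall()

print(first_clients)
```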

SELECT "NOT IN", INNER JOIN and COUNT in SQL Query

I am trying to select which co-ordinates from OA table are NOT found in the CUSTOMER address table.
SELECT DISTINCT
OA.CO_ORDS
FROM
CUSTOMER
INNER JOIN
OA ON customer.address=oa.co_ords
ORDER BY ID ASC;
This returns the co-ordinates which ARE in the customer table. How do I return those that are not in the customer table?
Can I also COUNT how many customers are in each co-ordinate? (The co-ords are not specific or accurate; this is purely for query testing.)
We can use NOT EXISTS to find those co-ordinates which don't appear in the customer table:
SELECT co_ords
FROM oa
WHERE
NOT EXISTS
(SELECT 1 FROM customers
WHERE address = oa.co_ords)
ORDER BY id;
In order to count how many customers belong to a certain co-ordinate, we can use COUNT with GROUP BY, something like this:
SELECT c.address, COUNT(*)
FROM customers c
JOIN oa
ON c.address = oa.co_ords
GROUP BY c.address;
It can be better to count a specific column instead of *: COUNT(column) ignores rows where that column is NULL, whereas COUNT(*) counts every row.
It can also be better to use an IN clause instead of joining the tables:
SELECT c.address, COUNT(*)
FROM customers c
WHERE address IN
(SELECT co_ords FROM oa)
GROUP BY c.address;
Such details depend on your exact table structure, so please try this out or provide more details.
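
The difference between COUNT(*) and COUNT(column) matters most with an outer join. The following sketch uses Python's sqlite3 module with made-up rows. The LEFT JOIN from oa is an assumption added here so that co-ordinates with no customers still appear in the output.

```python
import sqlite3

# In-memory database with made-up sample rows (assumed for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oa (id INTEGER PRIMARY KEY, co_ords TEXT);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, address TEXT);
    INSERT INTO oa VALUES (1, 'A1'), (2, 'B2'), (3, 'C3');
    INSERT INTO customers VALUES (1, 'A1'), (2, 'A1'), (3, 'B2');
""")

# COUNT(*) counts joined rows, including the NULL-padded row that the
# LEFT JOIN produces for 'C3'; COUNT(c.address) skips that NULL.
counts = conn.execute("""
    SELECT oa.co_ords,
           COUNT(*)         AS row_count,
           COUNT(c.address) AS customer_count
    FROM oa
    LEFT JOIN customers c
        ON c.address = oa.co_ords
    GROUP BY oa.co_ords
    ORDER BY oa.co_ords
""").fetchall()

print(counts)
```

With these rows, 'C3' has a row_count of 1 but a customer_count of 0, which is usually the number you actually want.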
You could also do:
SELECT co_ords
FROM oa
MINUS
SELECT address
FROM customers;
which can sometimes be faster than doing an anti-join. Note that MINUS is Oracle syntax (the standard SQL keyword is EXCEPT) and that it removes duplicates from the result set.
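
To see the duplicate handling side by side, here is a small sketch using Python's sqlite3 module. SQLite spells the set operator EXCEPT rather than MINUS, and the sample rows are invented for the demonstration.

```python
import sqlite3

# In-memory database with made-up sample rows (assumed for illustration).
# 'C3' appears twice in oa on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oa (id INTEGER PRIMARY KEY, co_ords TEXT);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, address TEXT);
    INSERT INTO oa VALUES (1, 'A1'), (2, 'C3'), (3, 'C3'), (4, 'D4');
    INSERT INTO customers VALUES (1, 'A1');
""")

# Anti-join with NOT EXISTS: keeps every unmatched oa row, duplicates included.
not_exists_rows = [row[0] for row in conn.execute("""
    SELECT co_ords
    FROM oa
    WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.address = oa.co_ords)
    ORDER BY id
""")]

# Set difference with EXCEPT (MINUS in Oracle): the result is de-duplicated.
except_rows = [row[0] for row in conn.execute("""
    SELECT co_ords FROM oa
    EXCEPT
    SELECT address FROM customers
    ORDER BY 1
""")]

print(not_exists_rows)
print(except_rows)
```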

How to Group By all columns of table A when joining two tables to create one big nested table on BigQuery?

I'm trying to create a big nested table that would be composed of many tables, such as my Clients table, Phone Numbers, Emails... They all have in common the client_id field.
For the moment I have the following query, which works well (it "joins" the Clients table fields with the corresponding Phone Numbers fields):
SELECT Clients.*, ARRAY_AGG( STRUCT(Timestamp, Country_Code, Local_Number, Phone_Number, Whatsapp)) as Phones
FROM Clients LEFT JOIN Phones USING(client_id)
GROUP BY Clients.client_id, Clients.Timestamp, Clients.First_Name, Clients.Last_Name, Clients.DOB
Clients.client_id, Clients.Timestamp, Clients.First_Name, Clients.Last_Name, Clients.DOB are all the fields in my Clients table.
I would like to use this query as a subquery to "join" it to the Emails table in a similar way (using WITH and renaming the result of the subquery).
The thing is that I would like to GROUP BY all the fields of the Clients table without writing them all out every time. Neither GROUP BY Clients.* nor GROUP BY ALL works...
What can I do to shorten this?
If client_id is unique, then you can just aggregate by that. What you want is to get all the columns when you do that. A very BigQuery'ish approach is:
SELECT ANY_VALUE(c).*,
ARRAY_AGG( STRUCT(p.Timestamp, p.Country_Code, p.Local_Number, p.Phone_Number, p.Whatsapp)) as Phones
FROM Clients c LEFT JOIN
Phones p
USING (client_id)
GROUP BY c.client_id;
This works fine when I run it:
WITH clients as (
select 'x' as name, 1 as client_id union all
select 'y' as name, 2 as client_id
),
phones as (
select current_timestamp as timestamp, 1 as client_id, 'abc' as country_code, 111 as local_number, 222 as phone_number, null as whatsapp
)
SELECT ANY_VALUE(c).*,
ARRAY_AGG( STRUCT(p.Timestamp, p.Country_Code, p.Local_Number, p.Phone_Number, p.Whatsapp)) as Phones
FROM Clients c LEFT JOIN
Phones p
USING (client_id)
GROUP BY c.client_id;
Try the following:
SELECT ANY_VALUE(c).*,
ARRAY_AGG((SELECT AS STRUCT p.* EXCEPT(client_id))) as Phones
FROM Clients c
LEFT JOIN Phones p
USING (client_id)
GROUP BY c.client_id

How to join one to many tables in Oracle database

I have four tables:
Order, Employee, Supply, Supply_company
Order
------------------
Order_id
Order name
Emp_id
Employee
------------------
Emp_id
Emp_name
Supply
--------------------
Supply_id
Order_id
SupplierName
Supply_company
----------------------
Supply_company_id
Supply_id
Supplier_desc
address
In these 4 tables, one employee has more than one order, one order has many supply IDs, and each supply ID has one supplier desc. I want to display Supplier_desc based on Emp_id. I am getting all the descriptions associated with all orders, but I need to get the specific desc for a specific order. I have tried DISTINCT, LISTAGG, inner join, left outer join and a subquery in the WHERE clause, but I didn't find any solution.
SELECT
O.EMP_ID,SC.*
FROM
SUPPLY_COMPANY SC
INNER JOIN
SUPPLY S
ON S.Supply_id=SC.Supply_id
INNER JOIN
Order O
ON O.Order_id=S.Order_id
WHERE O.Emp_id=123
Assuming that you have a particular EMP_ID and an ORDER_ID the following query will get you the EMP_NAME and all SUPPLIER_DESCs associated with that employee and order:
SELECT DISTINCT e.EMP_ID, e.EMP_NAME, c.SUPPLIER_DESC
FROM EMPLOYEE e
INNER JOIN ORDER o
ON o.EMP_ID = e.EMP_ID
INNER JOIN SUPPLY s
ON s.ORDER_ID = o.ORDER_ID
INNER JOIN SUPPLY_COMPANY c
ON c.SUPPLY_ID = s.SUPPLY_ID
WHERE e.EMP_ID = your_emp_id AND
o.ORDER_ID = your_order_id
Here we just walk down from EMPLOYEE, through ORDER and SUPPLY, to SUPPLY_COMPANY, where the SUPPLIER_DESC can be found, then use the WHERE clause to filter the results down to the particular employee and order we care about. Since you probably don't want repeated rows, we put a DISTINCT on the SELECT to say we only want a single example of each unique combination. Just replace your_emp_id and your_order_id with the EMP_ID and ORDER_ID values you're interested in. Note that if you supply an EMP_ID and an ORDER_ID which do not belong together in the EMPLOYEE and ORDER tables, you'll get nothing returned by this query.
Best of luck.
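
As a quick way to check the join chain, the sketch below rebuilds the four tables in an in-memory SQLite database via Python's sqlite3 module. The sample rows are invented, and the order table is named orders here only because ORDER is a reserved word; both are assumptions for the demonstration rather than part of the original schema.

```python
import sqlite3

# In-memory database with made-up sample rows (assumed for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_name TEXT, emp_id INTEGER);
    CREATE TABLE supply (supply_id INTEGER PRIMARY KEY, order_id INTEGER, suppliername TEXT);
    CREATE TABLE supply_company (supply_company_id INTEGER PRIMARY KEY,
                                 supply_id INTEGER, supplier_desc TEXT, address TEXT);
    INSERT INTO employee VALUES (123, 'Ann');
    INSERT INTO orders VALUES (1, 'bolts order', 123), (2, 'nuts order', 123);
    INSERT INTO supply VALUES (10, 1, 'S-bolts'), (20, 2, 'S-nuts');
    INSERT INTO supply_company VALUES (100, 10, 'Bolts Inc', '1 Main St'),
                                      (200, 20, 'Nuts Ltd', '2 Side St');
""")

# Walk employee -> orders -> supply -> supply_company, then filter on
# both the employee and the order so only that order's descriptions remain.
descs = conn.execute("""
    SELECT DISTINCT e.emp_id, e.emp_name, c.supplier_desc
    FROM employee e
    INNER JOIN orders o ON o.emp_id = e.emp_id
    INNER JOIN supply s ON s.order_id = o.order_id
    INNER JOIN supply_company c ON c.supply_id = s.supply_id
    WHERE e.emp_id = ? AND o.order_id = ?
""", (123, 1)).fetchall()

print(descs)
```

Filtering on Emp_id alone would return both 'Bolts Inc' and 'Nuts Ltd' for this data, which matches the symptom described in the question.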

SQL Query to find MAX Date

I have some software that uses dBase4 for its database. I am attempting to construct a report using fields from 3 tables (Customer, Service & History).
In all of the tables the ACCOUNT field is the same. The 'Customer' and the 'Service' tables only have one record for each Customer. The 'History' table has multiple records for each Customer.
I need to write a query so that only the record with the MAX date in 'History.BILLTHRU' is returned for each Customer. The code below returns all of the records for each Customer in the History table:
SELECT Customer.ACCOUNT,
Customer.FIRSTNAME,
(more fields...),
History.ACCOUNT,
History.BILLTHRU,
Service.ACCOUNT,
Service.OFFERCODE
FROM "C:\Customer.dbf" Customer
INNER JOIN "C:\History.dbf" History
ON (Customer.ACCOUNT = History.ACCOUNT)
INNER JOIN "C:\Service.dbf" Service
ON (Customer.ACCOUNT = Service.ACCOUNT)
WHERE Customer.STATUS = "A"
ORDER BY Customer.LAST_BUS_NAME
Use a sub-query and a group by:
SELECT Customer.ACCOUNT,
Customer.FIRSTNAME,
(more fields...),
History.ACCOUNT,
History.BILLTHRU,
Service.ACCOUNT,
Service.OFFERCODE
FROM "C:\Customer.dbf" Customer
INNER JOIN (SELECT ACCOUNT, MAX(BILLTHRU) AS BILLTHRU
FROM "C:\History.dbf"
GROUP BY ACCOUNT) History
ON (Customer.ACCOUNT = History.ACCOUNT)
INNER JOIN "C:\Service.dbf" Service
ON (Customer.ACCOUNT = Service.ACCOUNT)
WHERE Customer.STATUS = "A"
ORDER BY Customer.LAST_BUS_NAME
I like to use common table expressions (CTEs). Subqueries are good, but breaking it out like this sometimes makes it easier to keep separate.
with GetMaxDate as (
select account, max(billthru) as MaxBillThru
from "C:\History.dbf"
group by account
)
SELECT Customer.ACCOUNT,
Customer.FIRSTNAME,
(more fields...),
GetMaxDate.ACCOUNT,
GetMaxDate.MaxBillThru,
Service.ACCOUNT,
Service.OFFERCODE
.....
FROM "C:\Customer.dbf" Customer
INNER JOIN GetMaxDate on customer.ACCOUNT = GetMaxDate.Account
INNER JOIN "C:\Service.dbf" Service
ON (Customer.ACCOUNT = Service.ACCOUNT)
WHERE Customer.STATUS = "A"
ORDER BY Customer.LAST_BUS_NAME
EDIT: This is a SQL Server function. I'm leaving it in case it can help you or someone else. I'll delete it if it just clouds the issue.
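
The MAX-per-account subquery can be tried out on a small data set. The sketch below uses Python's sqlite3 module with plain tables instead of .dbf files. The rows and the extra amount column are assumptions for illustration. It also joins back to history, which is one way to pick up other columns from the latest row.

```python
import sqlite3

# In-memory database with made-up sample rows (assumed for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (account INTEGER PRIMARY KEY, firstname TEXT, status TEXT);
    CREATE TABLE history (account INTEGER, billthru TEXT, amount REAL);
    INSERT INTO customer VALUES (1, 'Ann', 'A'), (2, 'Bob', 'A'), (3, 'Cy', 'I');
    INSERT INTO history VALUES
        (1, '2021-01-31', 10.0),
        (1, '2021-06-30', 20.0),
        (2, '2021-01-31', 15.0),
        (3, '2021-02-28', 5.0);
""")

# The derived table finds the latest BILLTHRU per account; joining it back
# to history recovers the other columns (here: amount) of that latest row.
latest_rows = conn.execute("""
    SELECT c.account, c.firstname, h.billthru, h.amount
    FROM customer c
    INNER JOIN (
        SELECT account, MAX(billthru) AS billthru
        FROM history
        GROUP BY account
    ) latest ON latest.account = c.account
    INNER JOIN history h
        ON h.account = latest.account
       AND h.billthru = latest.billthru
    WHERE c.status = 'A'
    ORDER BY c.account
""").fetchall()

print(latest_rows)
```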

Select the row with the max value in a specific column, SQL Server

I've been working on a school project for the past few days and I chose to work on a DVD club database. I have six tables, but for this question only two are relevant: the clients table and the loans table. What I am trying to do is count, for every client, how many loans he's made so far, and then pick the client with the max number of loans, so he can be rewarded with a free DVD next month. Here is the code I've written, but it doesn't pick the specific client; it shows all the clients, each with the max number of loans of one specific client:
SELECT tblClients.Client_ID, MAX(x.Number_Of_Loans) AS MAX_NOL
FROM
(
SELECT COUNT(tblLoans.Client_ID) AS Number_Of_Loans
FROM tblClients, tblLoans WHERE tblClients.Client_ID=tblLoans.Client_ID
GROUP BY tblLoans.Client_ID
)x, tblClients, tblLoans
WHERE tblClients.Client_ID=tblLoans.Client_ID
GROUP BY tblClients.Client_ID, tblClients.Given_Name,
tblClients.Family_Name, tblClients.Phone, tblClients.Address, tblClients.Town_ID
Use the following
SELECT TOP 1 tblClients.Client_ID,COUNT(tblLoans.Client_ID) AS MAX_NOL
FROM tblClients, tblLoans
WHERE tblClients.Client_ID=tblLoans.Client_ID
GROUP BY tblClients.Client_ID
ORDER BY COUNT(tblLoans.Client_ID) DESC
You can do this with a single aggregate GROUP, ordered by the client with the max loans:
SELECT TOP 1 tblClients.Client_ID, tblClients.Given_Name, tblClients.Family_Name,
tblClients.Phone, tblClients.Address, tblClients.Town_ID,
COUNT(tblLoans.Client_ID) AS MAX_NOL
FROM
tblClients INNER JOIN tblLoans
ON tblClients.Client_ID=tblLoans.Client_ID
GROUP BY tblClients.Client_ID, tblClients.Given_Name, tblClients.Family_Name,
tblClients.Phone, tblClients.Address, tblClients.Town_ID
ORDER BY MAX_NOL DESC;
Any selected columns from the client need to be included in the GROUP, and I would recommend using JOINs instead of WHERE joins.
Edit
What might be tidier is to separate determining the Client_ID with the most loans from fetching the rest of the client's data, like so (rather than the ungainly GROUP BY over many columns):
SELECT c.Client_ID, c.Given_Name, c.Family_Name,
c.Phone, c.Address, c.Town_ID,
x.MaxLoans
FROM
tblClients c
INNER JOIN
(SELECT TOP 1 tblClients.Client_ID, COUNT(tblLoans.Client_ID) AS MaxLoans
FROM tblClients
INNER JOIN tblLoans
ON tblClients.Client_ID=tblLoans.Client_ID
GROUP BY tblClients.Client_ID
ORDER BY MaxLoans DESC) x
ON c.Client_ID = x.Client_ID;
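
The same split can be checked outside SQL Server. The sketch below runs it in SQLite through Python's sqlite3 module, where LIMIT 1 stands in for TOP 1. The sample rows, the reduced column list, and the extra Client_ID tie-breaker in the ORDER BY are assumptions added for the demonstration.

```python
import sqlite3

# In-memory database with made-up sample rows (assumed for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblClients (Client_ID INTEGER PRIMARY KEY, Given_Name TEXT, Family_Name TEXT);
    CREATE TABLE tblLoans (Loan_ID INTEGER PRIMARY KEY, Client_ID INTEGER);
    INSERT INTO tblClients VALUES (1, 'Al', 'Ames'), (2, 'Bea', 'Burns'), (3, 'Cal', 'Cole');
    INSERT INTO tblLoans (Client_ID) VALUES (1), (2), (2), (2), (3), (3);
""")

# Inner query: count loans per client and keep only the top one.
# Outer query: fetch that client's details without a wide GROUP BY.
top_client = conn.execute("""
    SELECT c.Client_ID, c.Given_Name, c.Family_Name, x.MaxLoans
    FROM tblClients c
    INNER JOIN (
        SELECT Client_ID, COUNT(*) AS MaxLoans
        FROM tblLoans
        GROUP BY Client_ID
        ORDER BY MaxLoans DESC, Client_ID
        LIMIT 1
    ) x ON x.Client_ID = c.Client_ID
""").fetchone()

print(top_client)
```

If several clients tie for the most loans, both TOP 1 and LIMIT 1 return only one of them, so a tie-breaking rule is worth deciding on up front.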