SQL Server finding maximum query


Being a humble SQL beginner, I have a question. Let's say I have two tables: Items, and People (to whom an item is sold). Is there any way in SQL to write a query that shows the data of the person who bought the most items, and the number of items he or she bought?
My CREATE statements are:
CREATE TABLE PEOPLE (
ID INT PRIMARY KEY,
(other data)
);
CREATE TABLE ITEMS (
ID INT PRIMARY KEY,
(other stuff....)
bought_by INT REFERENCES PEOPLE
);
Any help would be appreciated:)

Yes. Group the ITEMS table by bought_by, sort the result by the count in descending order, and take the top record:
select top 1 bought_by, count(*)
from ITEMS
group by bought_by
order by count(*) desc
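A minimal sketch of the same group, sort, take-the-top idea, run against an in-memory SQLite database as a stand-in for SQL Server (SQLite uses LIMIT where T-SQL uses TOP). The `name` column and the sample rows are hypothetical. Because `TOP 1` picks one row arbitrarily when two buyers tie, the second query shows a portable tie-aware variant; in T-SQL, `SELECT TOP 1 WITH TIES` does the same job.

```python
import sqlite3

# In-memory SQLite database; schema follows the question, data is made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE items (id INTEGER PRIMARY KEY,
                        bought_by INTEGER REFERENCES people(id));
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO items (bought_by) VALUES (1), (2), (2), (2), (3), (3);
""")

# Group by buyer, sort by the count descending, keep the first row.
top_buyer = conn.execute("""
    SELECT bought_by, COUNT(*) AS amount
    FROM items
    GROUP BY bought_by
    ORDER BY COUNT(*) DESC
    LIMIT 1
""").fetchone()
print(top_buyer)  # (2, 3)

# Tie-aware variant: keep every buyer whose count equals the maximum count.
tied = conn.execute("""
    SELECT bought_by, COUNT(*) AS amount
    FROM items
    GROUP BY bought_by
    HAVING COUNT(*) = (SELECT MAX(cnt)
                       FROM (SELECT COUNT(*) AS cnt
                             FROM items
                             GROUP BY bought_by) AS per_buyer)
""").fetchall()
print(tied)  # [(2, 3)]
```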

This query may also help if you need more of the person's data:
SELECT TOP 1 PEOPLE.ID, PEOPLE.OtherFieldsInPeopleTable, COUNT(ITEMS.ID) AS AMOUNT
FROM PEOPLE
JOIN ITEMS ON PEOPLE.ID = ITEMS.bought_by
-- every PEOPLE column in the SELECT list must also appear in GROUP BY
GROUP BY PEOPLE.ID, PEOPLE.OtherFieldsInPeopleTable
ORDER BY COUNT(ITEMS.ID) DESC
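If you want all of the person's columns without listing each one in GROUP BY, one option is to count the items in a derived table first and then join PEOPLE to it. A sketch using SQLite as a stand-in (LIMIT instead of TOP); the `name` column and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE items (id INTEGER PRIMARY KEY,
                        bought_by INTEGER REFERENCES people(id));
    INSERT INTO people VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO items (bought_by) VALUES (1), (2), (2), (2), (3), (3);
""")

# Aggregate items per buyer first, then join back to get every people column.
row = conn.execute("""
    SELECT p.*, t.amount
    FROM people AS p
    JOIN (SELECT bought_by, COUNT(*) AS amount
          FROM items
          GROUP BY bought_by) AS t
      ON t.bought_by = p.id
    ORDER BY t.amount DESC
    LIMIT 1
""").fetchone()
print(row)  # (2, 'Bob', 3)
```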

Related

SQL Server 2016: Query to create multiple unique pairs of IDs from the same table?

I'm working on building a query to handle random pairings of people so that each one can assess many others. I'm looking for a way to handle this in bulk (a cross join, perhaps?) rather than using a cursor to loop through people one at a time, which was pretty slow when tested, as there will likely be hundreds of pairings at a time.
There are a few main parameters:
Each pair must be unique - two IDs can only be paired once.
There will be a specific number of pairs per ID: both the person being assessed and the person doing the assessing must have exactly that number of pairs.
All IDs are in this one table.
The pairs must be created in random order.
No ID can be paired with itself.
Any ideas for how I could approach this?
Here's the query I've been working on:
DECLARE @assessmentID INT=[N];
DECLARE @assessmentPairs TABLE(
assessorID INT,
authorID INT,
assessorCounter INT,
authorCounter INT,
UNIQUE NONCLUSTERED ([assessorID], [authorID])
);
INSERT INTO @assessmentPairs
SELECT assessorID,authorID,assessorCounter,authorCounter
FROM (
SELECT
e1.personID AS assessorID,
e2.personID AS authorID,
assessorCounter=ROW_NUMBER() OVER(PARTITION BY e1.personID ORDER BY e1.personID),
authorCounter=ROW_NUMBER() OVER(PARTITION BY e2.personID ORDER BY NEWID())
FROM People e1
JOIN Assessments a ON a.courseOfferingID=e1.courseOfferingID
CROSS JOIN People e2
WHERE e2.personID<>e1.personID
AND a.assessmentID=@assessmentID
GROUP BY e1.personID,e2.personID
) AS x
WHERE authorCounter<=10
ORDER BY assessorID,authorCounter,authorID,assessorCounter
SELECT *
FROM @assessmentPairs
ORDER BY authorID,assessorID
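One possible set-based approach (a circular-shift assignment, not taken from the question) is sketched below with SQLite as a stand-in for SQL Server. It gives every person a random position 0 to n-1, then lets the person at position p assess the people at positions p+1 through p+K (mod n). As long as 2·K < n, each person assesses exactly K others and is assessed exactly K times, nobody is paired with themselves, and no two people are paired twice in either direction. `K` and the sample IDs are hypothetical. The shuffle is stored once in a temp table because both sides of the join must see the same random order; in T-SQL you would use NEWID() instead of RANDOM() and a temp table or table variable. This only randomizes within one family of pairings, so not every valid set of pairs is equally likely.

```python
import sqlite3

K = 2  # hypothetical pairs per person; the guarantees need 2 * K < number of people

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (personID INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO people VALUES (?)", [(i,) for i in range(1, 8)])

# Materialize one random permutation: each person gets a position 0 .. n-1.
conn.execute("""
    CREATE TEMP TABLE shuffled AS
    SELECT personID, ROW_NUMBER() OVER (ORDER BY RANDOM()) - 1 AS pos
    FROM people
""")
n = conn.execute("SELECT COUNT(*) FROM shuffled").fetchone()[0]

# Offsets 1 .. K from a recursive CTE; position p assesses p + k (mod n).
pairs = conn.execute("""
    WITH RECURSIVE offsets(k) AS (
        SELECT 1
        UNION ALL
        SELECT k + 1 FROM offsets WHERE k < :k
    )
    SELECT a.personID AS assessorID, b.personID AS authorID
    FROM shuffled AS a
    CROSS JOIN offsets AS o
    JOIN shuffled AS b ON b.pos = (a.pos + o.k) % :n
    ORDER BY RANDOM()
""", {"k": K, "n": n}).fetchall()
print(pairs)
```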

Return all data when grouping on a field

I have the following 2 tables (there are more fields in the real tables):
create table publisher(id serial not null primary key,
name text not null);
create table product(id serial not null primary key,
name text not null,
publisherRef int not null references publisher(id));
Sample data:
insert into publisher (id,name) values (1,'pub1'),(2,'pub2'),(3,'pub3');
insert into product (name,publisherRef) values('p1',1),('p2',2),('p3',2),('p4',2),('p5',3),('p6',3);
And I would like the query to return:
name, numProducts
pub2, 3
pub3, 2
pub1, 1
A product is published by a publisher. Now I need a list of the name and id of all publishers that have at least one product, ordered by the total number of products each publisher has.
I can get the id of the publishers ordered by number of products with:
select publisherRef AS id, count(*)
from product
group by publisherRef
order by count(*) desc;
But I also need the name of each publisher in the result. I thought I could use a subquery like:
select *
from publisher
where id in (
select publisherRef
from product
group by publisherRef
order by count(*) desc)
But the order of rows in the subquery is lost in the outer SELECT.
Is there any way to do this with a single sql query?

SELECT pub.name, pro.num_products
FROM (
SELECT publisherref AS id, count(*) AS num_products
FROM product
GROUP BY 1
) pro
JOIN publisher pub USING (id)
ORDER BY 2 DESC;
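A quick way to check the aggregate-then-join query against the question's sample data, sketched with SQLite standing in for Postgres (INTEGER PRIMARY KEY instead of serial, and explicit column names instead of the ordinals in GROUP BY 1 / ORDER BY 2):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                          publisherRef INTEGER NOT NULL REFERENCES publisher(id));
    INSERT INTO publisher (id, name) VALUES (1, 'pub1'), (2, 'pub2'), (3, 'pub3');
    INSERT INTO product (name, publisherRef)
    VALUES ('p1', 1), ('p2', 2), ('p3', 2), ('p4', 2), ('p5', 3), ('p6', 3);
""")

# Aggregate the "n" table first, then join the "1" table on the aliased id.
rows = conn.execute("""
    SELECT pub.name, pro.num_products
    FROM (
        SELECT publisherRef AS id, COUNT(*) AS num_products
        FROM product
        GROUP BY publisherRef
    ) AS pro
    JOIN publisher AS pub USING (id)
    ORDER BY pro.num_products DESC
""").fetchall()
print(rows)  # [('pub2', 3), ('pub3', 2), ('pub1', 1)]
```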
Or (since the title mentions "all data") return all columns of the publisher with pub.*. After products have been aggregated in the subquery, you are free to list anything in the outer SELECT.
This only lists publishers which "have at least one product", and the result is ordered by "the total number of products each publisher has".
It's typically faster to aggregate the "n"-table before joining to the "1"-table. Then use an [INNER] JOIN (not a LEFT JOIN) to exclude publishers without products.
Note that the order of rows in an IN expression (or of the items in a given list; there are two syntax variants) is insignificant.
The column alias in publisherref AS id is entirely optional; it only serves to allow the simpler USING clause, which requires identical column names, in the following join condition.
Aside: avoid CaMeL-case names in Postgres. Use unquoted, legal, lowercase names exclusively to make your life easier.
Are PostgreSQL column names case-sensitive?

Draw data from two tables

I'm learning SQL on test sites (TestDome) and cannot get past this question.
Each item in a web shop belongs to a seller. To ensure service quality, each seller has a rating.
The data are kept in the following two tables:
TABLE sellers
id INTEGER PRIMARY KEY,
name VARCHAR(30) NOT NULL,
rating INTEGER NOT NULL
TABLE items
id INTEGER PRIMARY KEY,
name VARCHAR(30) NOT NULL,
sellerId INTEGER REFERENCES sellers(id)
Write a query that selects the item name and the name of its seller for each item that belongs to a seller with a rating greater than 4. The query should return the name of the item as the first column and name of the seller as the second column.
I've tried the following which has gotten me closest of my attempts:
SELECT items.name, sellers.name
FROM sellers, items
WHERE sellers.rating > 4;
Thanks in advance for bothering with my noob question.

You were almost there, but you forgot to specify the join condition; without it, the query is a cross join (all possible combinations of rows from both tables):
SELECT items.name, sellers.name
FROM sellers, items
WHERE sellers.rating > 4
AND items.sellerId = sellers.id;
Written with explicit JOIN syntax, the query above is equivalent to:
SELECT items.name, sellers.name
FROM sellers
INNER JOIN items
ON sellers.id = items.sellerId
WHERE sellers.rating > 4;
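A small sketch showing why the join column matters, using SQLite and made-up sample rows (the seller and item names are hypothetical). Joining on items.sellerId returns the items each highly rated seller actually sells; joining on items.id instead matches unrelated rows that merely share an id value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sellers (id INTEGER PRIMARY KEY, name VARCHAR(30) NOT NULL,
                          rating INTEGER NOT NULL);
    CREATE TABLE items (id INTEGER PRIMARY KEY, name VARCHAR(30) NOT NULL,
                        sellerId INTEGER REFERENCES sellers(id));
    INSERT INTO sellers VALUES (1, 'Alice', 5), (2, 'Bob', 3);
    INSERT INTO items VALUES (1, 'Lamp', 2), (2, 'Desk', 1), (3, 'Chair', 1);
""")

# Correct: link each item to its seller through the foreign key.
rows = conn.execute("""
    SELECT items.name, sellers.name
    FROM sellers
    INNER JOIN items ON sellers.id = items.sellerId
    WHERE sellers.rating > 4
    ORDER BY items.id
""").fetchall()
print(rows)  # [('Desk', 'Alice'), ('Chair', 'Alice')]

# Wrong: matching seller ids to item ids pairs Alice with Bob's lamp.
wrong = conn.execute("""
    SELECT items.name, sellers.name
    FROM sellers
    INNER JOIN items ON sellers.id = items.id
    WHERE sellers.rating > 4
""").fetchall()
print(wrong)  # [('Lamp', 'Alice')]
```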

You want to join the tables.
Here's a link to the W3Schools tutorial; the topic is explained quite well there: https://www.w3schools.com/sql/sql_join.asp

Easier way to limit rows in SELECT subquery?

I perform queries on an Oracle database. Let's say I have a table, PEOPLE. Each person can have multiple reference numbers. The reference numbers are stored in a different table, REFERENCENUMBERS.
REFERENCENUMBERS contains a column, PERSON_ID, which is identical to the ID column of the PEOPLE table. It is through this ID that the tables are joined.
Let's say I want to perform a query on the PEOPLE table. However, I only want a single reference number returned per person record: i.e., if a person has multiple reference numbers, I don't want one row returned per reference number.
I choose a criterion for how to select only one reference number: the one which was created earliest. The date of reference number creation is stored in the REFERENCENUMBERS table as DATECREATED.
The following code does this job:
SELECT
PEOPLE.ID,
PEOPLE.NAME,
PEOPLE.AGE,
PEOPLE.ADDRESS,
-- Subquery to return the earliest-created reference number for this person
(
SELECT
REFERENCENUMBERS.NUMBER
FROM
REFERENCENUMBERS
WHERE
REFERENCENUMBERS.PERSON_ID = PEOPLE.ID -- Link back to the main people ID
AND REFERENCENUMBERS.DATECREATED =
-- Sub-sub query simply to match the earliest date
(
SELECT
MIN(R.DATECREATED) -- To ensure that only the earliest-created reference number is returned.
FROM
REFERENCENUMBERS R -- Give this sub-sub query an alias for the table
WHERE
R.PERSON_ID = PEOPLE.ID -- Link back to the main people ID
)
)
FROM
PEOPLE
WHERE
PEOPLE.AGE > 18 -- Or whatever
However, my question to you knowledgeable SQL people is: is there an easier way of doing this? It seems cumbersome to include a sub-sub-query solely to find the earliest date and restrict the WHERE clause of the sub-query.
There must be an easier, or cleaner way of doing this. Any suggestions?
(By the way - the sample code is greatly simplified from what I'm actually working on. Please don't provide answers which substantively modify my primary query with different-style JOINs etc - thanks).

The simplest would be a top-n filter:
select people.id
, people.name
, people.age
, people.address
, ( select referencenumbers.number
from referencenumbers
where referencenumbers.person_id = people.id
order by referencenumbers.datecreated
fetch first row only )
from people
where people.age > 18;
FETCH FIRST requires Oracle 12.1 or later.
Or use this, which also works in earlier versions:
select people.id
, people.name
, people.age
, people.address
, ( select min(rn.number) keep (dense_rank first order by rn.datecreated)
from referencenumbers rn
where rn.person_id = people.id )
from people
where people.age > 18;
(I gave referencenumbers a shorter alias for readability.)
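A sketch of the top-n scalar subquery, run in SQLite where LIMIT 1 plays the role of Oracle's FETCH FIRST ROW ONLY. The sample people, reference numbers and dates are hypothetical; dates are stored as ISO strings so they sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE referencenumbers (person_id INTEGER, number TEXT,
                                   datecreated TEXT);
    INSERT INTO people VALUES (1, 'Ann', 30), (2, 'Bob', 40), (3, 'Cid', 12);
    INSERT INTO referencenumbers VALUES
        (1, 'A-2', '2020-05-01'), (1, 'A-1', '2019-01-01'),
        (2, 'B-1', '2021-03-15'),
        (3, 'C-1', '2022-01-01');
""")

# For each adult, sort that person's reference numbers by creation date
# and keep only the first one.
rows = conn.execute("""
    SELECT people.id, people.name,
           (SELECT rn.number
            FROM referencenumbers AS rn
            WHERE rn.person_id = people.id
            ORDER BY rn.datecreated
            LIMIT 1) AS first_number
    FROM people
    WHERE people.age > 18
    ORDER BY people.id
""").fetchall()
print(rows)  # [(1, 'Ann', 'A-1'), (2, 'Bob', 'B-1')]
```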

Try this:
SELECT
PEOPLE.ID,
PEOPLE.NAME,
PEOPLE.AGE,
PEOPLE.ADDRESS,
REFERENCENUMBERS.NUMBER
FROM PEOPLE
JOIN REFERENCENUMBERS ON REFERENCENUMBERS.PERSON_ID = PEOPLE.ID -- Link back to the main people ID
JOIN
(
SELECT
R.PERSON_ID,
MIN(R.DATECREATED) minc -- To ensure that only the earliest-created reference number is returned.
FROM
REFERENCENUMBERS R -- Give this sub-sub query an alias for the table
GROUP BY R.PERSON_ID
) t ON t.minc = REFERENCENUMBERS.DATECREATED and
t.PERSON_ID = REFERENCENUMBERS.PERSON_ID
WHERE
PEOPLE.AGE > 18 -- Or whatever
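One thing to watch with the join-on-MIN approach: if two reference numbers share the earliest DATECREATED, the person comes back twice. (The question's scalar-subquery version would fail in that case with ORA-01427, "single-row subquery returns more than one row".) A sketch with SQLite and hypothetical data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE referencenumbers (person_id INTEGER, number TEXT,
                                   datecreated TEXT);
    INSERT INTO people VALUES (1, 'Ann', 30);
    -- two reference numbers tie on the earliest date
    INSERT INTO referencenumbers VALUES
        (1, 'A-1', '2019-01-01'), (1, 'A-2', '2019-01-01'),
        (1, 'A-3', '2020-01-01');
""")

# Same shape as the answer: join to a per-person MIN(datecreated) derived table.
rows = conn.execute("""
    SELECT people.id, referencenumbers.number
    FROM people
    JOIN referencenumbers ON referencenumbers.person_id = people.id
    JOIN (SELECT r.person_id, MIN(r.datecreated) AS minc
          FROM referencenumbers AS r
          GROUP BY r.person_id) AS t
      ON t.minc = referencenumbers.datecreated
     AND t.person_id = referencenumbers.person_id
    WHERE people.age > 18
    ORDER BY referencenumbers.number
""").fetchall()
print(rows)  # [(1, 'A-1'), (1, 'A-2')]
```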

How to refactor complicated SQL query which is broken

Here is the simplified model of the domain
In a nutshell, a unit grants documents to a customer. There are two types of units: main units and their child units. Both belong to the same province, and a province may contain multiple cities. A document has numerous events (processing history). A customer belongs to one city and province.
I have to write a query that returns a random set of documents, given a target main unit code. Here are the criteria:
Return 10 documents whose newest event has event_code = 10
Each document must belong to a different customer living in any city of the unit's region (prefer different cities)
Return each customer's newest document that meets the criteria
There must be both document types present in the result
Result (customers chosen) should be random with each query
But...
If there's not enough customers, try to use multiple documents of the same customer as a last resort
If there aren't enough documents either, return as much as possible
If there's not a single instance of the other document type, return documents of the same type only
There may be millions of rows, and the query must be as fast as possible, since it is executed frequently.
I'm not sure how to structure this kind of complex query in a sane manner. I'm using Oracle and PL/SQL. Here is something I tried, but it isn't working as expected (it returns wrong data). How should I refactor this query to get a random result while also honoring all those edge-case rules? I'm also worried about the performance of the joins and WHERE clauses.
CURSOR c_documents IS
WITH documents_cte AS (
SELECT d.document_id AS document_id, d.create_dt AS create_dt,
c.customer_id
FROM documents d
JOIN customers c ON (c.customer_id = d.customer_id AND
c.province_id = (SELECT region_id FROM unit WHERE unit_code = 1234))
WHERE exists (
SELECT 1
FROM event
where document_id = d.document_id AND
event_code = 10
AND create_dt = (
SELECT MAX(create_dt)
FROM event
WHERE document_id = d.document_id)))
SELECT * FROM documents_cte d
WHERE create_dt = (SELECT MAX(create_dt)
from documents_cte
WHERE customer_id = d.customer_id);
How can I write this query correctly, with efficiency and randomness in mind? I'm not asking for an exact solution, just guidelines at least.

I'd avoid hierarchical tables whenever possible. In your case you are using a hierarchical table to allow for unlimited depth, but in the end you only store two levels: provinces and their cities. That would be better as two tables: one for provinces and one for cities. Not a big deal, but it would make your data model simpler and easier to query.
Below I start with a WITH clause to get a city table, as no such table exists. Then I go step by step: get the customers belonging to the unit, then get their documents and rank them. Finally, I select the ranked documents and randomly take 10 of the best-ranked ones.
with cities as
(
select
c.region_id as city_id,
p.region_id as province_id
from region c
join region p on p.region_id = c.parent_region_id
)
, unit_customers as
(
select customer_id
from customer
where city_id in
(
select city_id
from cities
where
(
select region_id
from unit
where unit_code = 1234
) in (city_id, province_id)
)
)
, ranked_documents as
(
select
document.*,
row_number() over (partition by customer_id order by create_dt desc) as rn
from document
where customer_id in -- customers belonging to the unit
(
select customer_id
from unit_customers
)
and document_id in -- documents with latest event code = 10
(
select document_id
from event
group by document_id
having max(event_code) keep (dense_rank last order by create_dt) = 10
)
)
select *
from ranked_documents
order by rn, dbms_random.value
fetch first 10 rows only;
This doesn't guarantee that both document types are present, as that requirement contradicts the rule to take the latest document per customer.
FETCH FIRST is available as of Oracle 12c. In earlier versions you would use one more subquery and another ROW_NUMBER instead.
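A sketch of that pre-12c pattern, run in SQLite: rank documents per customer, number the ranked rows again in (rn, random) order, and keep numbers 1 to 10. SQLite's RANDOM() stands in for dbms_random.value, the region and event filters are left out, and the data (3 hypothetical customers with 4 documents each) is made up. The newest document of every customer comes first, and the remaining slots are filled from each customer's next-newest documents in random order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (document_id INTEGER PRIMARY KEY, "
             "customer_id INTEGER, create_dt TEXT)")
# Hypothetical data: document_id = customer * 10 + day, newest is day 4.
docs = [(c * 10 + d, c, f"2024-01-0{d}") for c in (1, 2, 3) for d in range(1, 5)]
conn.executemany("INSERT INTO document VALUES (?, ?, ?)", docs)

rows = conn.execute("""
    SELECT document_id, customer_id, rn
    FROM (
        -- second ROW_NUMBER replaces FETCH FIRST: best rank first, random within rank
        SELECT ranked.*,
               ROW_NUMBER() OVER (ORDER BY rn, RANDOM()) AS pick
        FROM (
            -- first ROW_NUMBER: 1 = the customer's newest document
            SELECT document.*,
                   ROW_NUMBER() OVER (PARTITION BY customer_id
                                      ORDER BY create_dt DESC) AS rn
            FROM document
        ) AS ranked
    ) AS picked
    WHERE pick <= 10
    ORDER BY pick
""").fetchall()
print(rows)
```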
As to speed, I'd recommend these indexes for the query:
create index idx_r1 on region(region_id); -- already exists for region_id = primary key
create index idx_r2 on region(parent_region_id, region_id);
create index idx_u1 on unit(unit_code, region_id);
create index idx_c1 on customer(city_id, customer_id);
create index idx_e1 on event(document_id, create_dt, event_code);
create index idx_d1 on document(document_id, customer_id, create_dt);
create index idx_d2 on document(customer_id, document_id, create_dt);
One of the last two will be used, the other not. Check which with EXPLAIN PLAN and drop the unused one.
