How to identify the most common category referencing the same element? - sql

I have two relations: Location(category,item) and Item(item)
Each item can be listed under multiple categories.
What SQL query can be used to figure out which two categories from Location(category,item) most frequently contain the same item?
Note: I am looking for a SQL statement, but I tagged this question as algorithm / math because I am willing to accept a solution in the form of an algorithm in case a SQL query cannot be provided.

You can do this easily in SQL with join and group by. Join the location table to itself on item, then count the matches. Order by this descending and choose the first one, if you want the pair with the most matches:
select l1.category, l2.category, count(*) as cnt
from location l1 join
location l2
on l1.item = l2.item and
l1.category < l2.category
group by l1.category, l2.category
order by count(*) desc
limit 1;
Note that this assumes that each (category, item) pair is unique in location. Otherwise, replace the select clause with this one:
select l1.category, l2.category, count(distinct l1.item) as cnt
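The self-join above can be tried out on a small data set. The following is a minimal sketch, assuming an in-memory SQLite database and made-up rows (the category and item values are hypothetical); it uses the count(distinct ...) variant so that a duplicated (category, item) row does not inflate the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE location (category TEXT, item TEXT)")
# Hypothetical sample rows; ("books", "atlas") is deliberately duplicated.
conn.executemany(
    "INSERT INTO location VALUES (?, ?)",
    [
        ("books", "atlas"), ("maps", "atlas"), ("gifts", "atlas"),
        ("books", "globe"), ("maps", "globe"),
        ("gifts", "mug"),
        ("books", "atlas"),
    ],
)

# Pair each category with every alphabetically later category sharing an item.
# l1.category < l2.category keeps each pair once and drops self-pairs.
query = """
    SELECT l1.category, l2.category, COUNT(DISTINCT l1.item) AS cnt
    FROM location l1
    JOIN location l2
      ON l1.item = l2.item
     AND l1.category < l2.category
    GROUP BY l1.category, l2.category
    ORDER BY cnt DESC
    LIMIT 1
"""
top_pair = conn.execute(query).fetchone()
print(top_pair)
```

With plain count(*) the duplicated row would be counted twice for the pairs that involve "books", which is why the distinct count is the safer choice when uniqueness is not guaranteed.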

Related

SQL Server question - subqueries in column result with a join?

I have a distinct list of part numbers from one table. It is basically a table that contains a record of all the company's part numbers. I want to add columns that will pull data from different tables but only pertaining to the part number on that row of the distinct part list.
For example: if I have part A, B, C from the unique part list I want to add columns for Purchase quantity, repair quantity, loan quantity, etc... from three totally unique tables.
So it's almost like I need 3 subqueries that will sum the data from the different tables for each part.
Can anybody steer me in the direction of how to do this? Please and thank you so much!
One method is correlated subqueries. Something like this:
select p.*,
(select count(*)
from purchases pu
where pu.part_id = p.part_id
) as num_purchases,
(select count(*)
from repairs r
where r.part_id = p.part_id
) as num_repairs,
(select count(*)
from loans l
where l.part_id = p.part_id
) as num_loans
from parts p;
Another option is joins with aggregation before the join. Or lateral joins (which are quite similar to correlated subqueries).
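The "aggregate before the join" option could look roughly like the sketch below. It is run against an in-memory SQLite database rather than SQL Server, and the qty column and the sample rows are assumptions made for illustration. Each child table is summed per part first, so joining three tables cannot multiply rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and data; only part_id comes from the answer above.
conn.executescript("""
    CREATE TABLE parts (part_id TEXT PRIMARY KEY);
    CREATE TABLE purchases (part_id TEXT, qty INTEGER);
    CREATE TABLE repairs (part_id TEXT, qty INTEGER);
    CREATE TABLE loans (part_id TEXT, qty INTEGER);
    INSERT INTO parts VALUES ('A'), ('B'), ('C');
    INSERT INTO purchases VALUES ('A', 5), ('A', 3), ('B', 2);
    INSERT INTO repairs VALUES ('A', 1), ('C', 4);
    INSERT INTO loans VALUES ('B', 7);
""")

# Aggregate each table per part, then LEFT JOIN so parts without rows keep 0.
query = """
    SELECT p.part_id,
           COALESCE(pu.total, 0) AS purchase_qty,
           COALESCE(r.total, 0)  AS repair_qty,
           COALESCE(l.total, 0)  AS loan_qty
    FROM parts p
    LEFT JOIN (SELECT part_id, SUM(qty) AS total
               FROM purchases GROUP BY part_id) pu ON pu.part_id = p.part_id
    LEFT JOIN (SELECT part_id, SUM(qty) AS total
               FROM repairs GROUP BY part_id) r ON r.part_id = p.part_id
    LEFT JOIN (SELECT part_id, SUM(qty) AS total
               FROM loans GROUP BY part_id) l ON l.part_id = p.part_id
    ORDER BY p.part_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```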

Rank order ST_DWithin results by the number of radii a result appears in

I have a table of existing customers and another table of potential customers. I want to return a list of potential customers rank ordered by the number of radii of existing purchasers that they appear in.
There are many rows in the potential customers table per each existing customer, and the radius around a given existing customer could encompass multiple potential customers. I want to return a list of potential customers ordered by the count of the existing customer radii that they fall within.
SELECT pur.contact_id AS purchaser, count(pot.*) AS nearby_potential_customers
FROM purchasers_geocoded pur, potential_customers_geocoded pot
WHERE ST_DWithin(pur.geom,pot.geom,1000)
GROUP BY purchaser;
Does anyone have advice on how to proceed?
EDIT:
With some help, I wrote this query, which seems to do the job, but I'm verifying now.
WITH prequalified_leads_table AS (
SELECT *
FROM nearby_potential_customers
WHERE market_val > 80000
AND market_val < 120000
)
, proximate_to_existing AS (
SELECT pot.prop_id AS prequalified_leads
FROM purchasers_geocoded pur, prequalified_leads_table pot
WHERE ST_DWithin(pot.geom,pur.geom,100)
)
SELECT prequalified_leads, count(prequalified_leads)
FROM proximate_to_existing
GROUP BY prequalified_leads
ORDER BY count(*) DESC;
You wrote: "I want to return a list of potential customers ordered by the count of the existing customer radii that they fall within."
Your query does the opposite of that statement: it counts potential customers around existing ones.
Inverting that, and after adding some tweaks:
SELECT pot.contact_id AS potential_customer
, rank() OVER (ORDER BY pur.nearby_customers DESC
, pot.contact_id) AS rnk
, pur.nearby_customers
FROM potential_customers_geocoded pot
LEFT JOIN LATERAL (
SELECT count(*) AS nearby_customers
FROM purchasers_geocoded pur
WHERE ST_DWithin(pur.geom, pot.geom, 1000)
) pur ON true
ORDER BY 2;
I suggest a subquery with LEFT JOIN LATERAL ... ON true to get counts. It should make use of the spatial index that you undoubtedly have:
CREATE INDEX ON purchasers_geocoded USING gist (geom);
The LEFT JOIN thereby retains rows with 0 nearby customers in the result; your original join style would exclude those. Related:
What is the difference between LATERAL and a subquery in PostgreSQL?
Then ORDER BY the resulting nearby_customers in the outer query (not: nearby_potential_customers).
It's not clear whether you want to add an actual rank. Use the window function rank() if so. While at it, I made the rank deterministic by breaking ties with an additional ORDER BY expression: pot.contact_id. Otherwise, peers are returned in arbitrary order, which can change with every execution.
ORDER BY 2 is short syntax for "order by the 2nd output column". See:
Select first row in each GROUP BY group?
Related:
How do I query all rows within a 5-mile radius of my coordinates?
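The logic of the lateral count and the deterministic ordering can be illustrated without PostGIS. The sketch below is plain Python under stated assumptions: planar coordinates in meters, Euclidean distance standing in for ST_DWithin, and invented ids and points. It is not a substitute for the indexed query above, only a way to see what it computes.

```python
from math import hypot

# Hypothetical planar coordinates in meters.
purchasers = {"pur1": (0.0, 0.0), "pur2": (500.0, 0.0)}
potentials = {
    "pot_a": (100.0, 0.0),
    "pot_b": (1400.0, 0.0),
    "pot_c": (5000.0, 5000.0),
    "pot_d": (400.0, 0.0),
}
RADIUS = 1000.0


def nearby_count(point):
    """Count purchasers whose radius covers the given point."""
    return sum(
        1
        for px, py in purchasers.values()
        if hypot(px - point[0], py - point[1]) <= RADIUS
    )


# Every potential customer gets a count, including 0 (the LEFT JOIN effect).
counts = {pid: nearby_count(pt) for pid, pt in potentials.items()}

# Order by count descending, then by id, so ties always sort the same way.
ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# With the id as tiebreaker there are no peers, so rank equals row position.
ranked = [(pid, pos, n) for pos, (pid, n) in enumerate(ordered, start=1)]
for row in ranked:
    print(row)
```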

PostgreSQL: get the max values from a query

I need to get the max values from a list of values obtained from a query.
Basically, the problem is this:
I have 2 tables:
Lawyer
id (PK)
surname
name
Case
id (PK)
id_Client
date
id_Lawyer (FK)
And I need to get the Lawyer with the largest number of cases (there is no problem with that), but if more than one lawyer has the largest number of cases, I should list all of them.
Any help on this would be appreciated.
SELECT l.*, cases
FROM (
SELECT "id_Lawyer", count(*) AS cases, rank() OVER (ORDER BY count(*) DESC) AS rnk
FROM "Case"
GROUP BY 1
) c
JOIN "Lawyer" l ON l.id = c."id_Lawyer"
WHERE c.rnk = 1;
Basics for the technique (like @FuzzyTree provided):
PostgreSQL equivalent for TOP n WITH TIES: LIMIT "with ties"?
You only need a single subquery level since you can run window functions over aggregate functions:
Get the distinct sum of a joined table column
Best way to get result count before LIMIT was applied
Aside: It's better to use legal, lower case, unquoted identifiers in Postgres. Never use a reserved word like Case, that can lead to very confusing errors.
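The tie-keeping effect of rnk = 1 can be checked on toy data. This is a minimal sketch, assuming an in-memory SQLite database with window-function support (SQLite 3.25 or later), a table renamed to law_case to sidestep the reserved word, and invented surnames. To stay conservative on SQLite it computes the counts in an inner query before ranking, rather than ranking over the aggregate in a single level as the PostgreSQL query above does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows; lawyers 1 and 2 tie with two cases each.
conn.executescript("""
    CREATE TABLE lawyer (id INTEGER PRIMARY KEY, surname TEXT);
    CREATE TABLE law_case (
        id INTEGER PRIMARY KEY,
        id_lawyer INTEGER REFERENCES lawyer(id)
    );
    INSERT INTO lawyer VALUES (1, 'Diaz'), (2, 'Lopez'), (3, 'Ruiz');
    INSERT INTO law_case (id_lawyer) VALUES (1), (1), (2), (2), (3);
""")

# rank() gives every lawyer with the highest count the same rank 1,
# so filtering on rnk = 1 returns all of them, not just one.
query = """
    SELECT l.surname, c.cases
    FROM (
        SELECT id_lawyer, cases,
               rank() OVER (ORDER BY cases DESC) AS rnk
        FROM (SELECT id_lawyer, count(*) AS cases
              FROM law_case
              GROUP BY id_lawyer) AS per_lawyer
    ) AS c
    JOIN lawyer l ON l.id = c.id_lawyer
    WHERE c.rnk = 1
    ORDER BY l.surname
"""
top_lawyers = conn.execute(query).fetchall()
print(top_lawyers)
```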

Selecting a grouping that matches a certain criteria, SQL

I have two relations: one is a list of the areas an instructor is able to teach (AreasOfInstructor(InstructorNo,AreaName)), and the other is the result of a subquery that returns a list of AreaNames. I want to group the AreasOfInstructor relation by InstructorNo, and then return each instructor (as represented by InstructorNo) that is able to teach all the areas returned by the subquery.
My attempt:
SELECT InstructorNo
FROM AreasofInstructor
GROUP BY InstructorNo
/**WHERE THE GROUP CONTAINS* (the list of AreaNames returned by the subquery)*/
I'm not sure what the actual SQL commands are that will implement the stuff between the stars on the last line. Thanks for the help!
Edit: Just to be clear, what I'm looking for is the set of instructors that are able to teach in the areas that are returned by the subquery.
To do this, you can join both relations, group by InstructorNo, and then validate that the distinct count of AreaNames per InstructorNo matches the distinct count of AreaNames in the AreaNames relation.
with AreaNames as (subquery)
select i.InstructorNo, count(distinct i.AreaName)
from AreasofInstructor i
join AreaNames n
on n.AreaName = i.AreaName
group by i.InstructorNo
having count(distinct i.AreaName) = (select count(distinct AreaName) from AreaNames)
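To see the count-matching idea at work, here is a small sketch against an in-memory SQLite database. The RequiredAreas table stands in for the unspecified subquery, and all rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: RequiredAreas plays the role of the subquery result.
conn.executescript("""
    CREATE TABLE AreasOfInstructor (InstructorNo INTEGER, AreaName TEXT);
    CREATE TABLE RequiredAreas (AreaName TEXT);
    INSERT INTO AreasOfInstructor VALUES
        (1, 'Algebra'), (1, 'Calculus'), (1, 'Statistics'),
        (2, 'Algebra'),
        (3, 'Calculus'), (3, 'Algebra');
    INSERT INTO RequiredAreas VALUES ('Algebra'), ('Calculus');
""")

# The join keeps only required areas; an instructor qualifies when the
# number of matched areas equals the number of required areas.
query = """
    WITH AreaNames AS (SELECT AreaName FROM RequiredAreas)
    SELECT i.InstructorNo
    FROM AreasOfInstructor i
    JOIN AreaNames n ON n.AreaName = i.AreaName
    GROUP BY i.InstructorNo
    HAVING COUNT(DISTINCT i.AreaName) =
           (SELECT COUNT(DISTINCT AreaName) FROM AreaNames)
    ORDER BY i.InstructorNo
"""
qualified = [row[0] for row in conn.execute(query)]
print(qualified)
```

Instructor 1 teaches an extra area that is not required, which does not matter because the join discards it before counting.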
It's better to use a Common Table Expression, which is more readable than a sub-query.
Check if this is what you are looking for?
WITH Areas (AreaName)
AS
(
*sub-query goes here*
)
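SELECT
InstructorNo
FROM
AreasOfInstructor AOI
INNER JOIN
Areas A ON AOI.AreaName = A.AreaName
GROUP BY
InstructorNo
-- keep only instructors who match every area in the list, not just one of them
HAVING
COUNT(DISTINCT AOI.AreaName) = (SELECT COUNT(DISTINCT AreaName) FROM Areas)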

Can peewee nest SELECT queries such that the outer query selects on an aggregate of the inner query?

I'm using peewee 2.1 with Python 3.3 and an SQLite 3.7 database.
I want to perform certain SELECT queries in which:
(1) I first select some aggregate (count, sum), grouping by some id column; then
(2) I select from the results of (1), aggregating over its aggregate. Specifically, I want to count the number of rows in (1) that have each aggregated value.
My database has an 'Event' table with 1 record per event, and a 'Ticket' table with 1..N tickets per event. Each ticket record contains the event's id as a foreign key. Each ticket also contains a 'seats' column that specifies the number of seats purchased. (A "ticket" is really best thought of as a purchase transaction for 1 or more seats at the event.)
Below are two examples of working SQLite queries of this sort that give me the desired results:
SELECT ev_tix, count(1) AS ev_tix_n FROM
(SELECT count(1) AS ev_tix FROM ticket GROUP BY event_id)
GROUP BY ev_tix
SELECT seat_tot, count(1) AS seat_tot_n FROM
(SELECT sum(seats) AS seat_tot FROM ticket GROUP BY event_id)
GROUP BY seat_tot
But using Peewee, I don't know how to select on the inner query's aggregate (count or sum) when specifying the outer query. I can of course specify an alias for that aggregate, but it seems I can't use that alias in the outer query.
I know that Peewee has a mechanism for executing "raw" SQL queries, and I've used that workaround successfully. But I'd like to understand if / how these queries can be done using Peewee directly.
I posted the same question on the peewee-orm Google group. Charles Leifer responded promptly with both an answer and new commits to the peewee master. So although I'm answering my own question, obviously all credit goes to him.
You can see that thread here: https://groups.google.com/forum/#!topic/peewee-orm/FSHhd9lZvUE
But here's the essential part, which I've copied from Charles' response to my post:
I've added a couple commits to master which should make your queries possible (https://github.com/coleifer/peewee/commit/22ce07c43cbf3c7cf871326fc22177cc1e5f8345).
Here is the syntax, roughly, for your first example:
SELECT ev_tix, count(1) AS ev_tix_n FROM
(SELECT count(1) AS ev_tix FROM ticket GROUP BY event_id)
GROUP BY ev_tix
ev_tix = SQL('ev_tix') # the name of the alias.
(Ticket
.select(ev_tix, fn.count(ev_tix).alias('ev_tix_n'))
.from_(
Ticket.select(fn.count(Ticket.id).alias('ev_tix')).group_by(Ticket.event))
.group_by(ev_tix))
This yields the following SQL:
SELECT ev_tix, count(ev_tix) AS ev_tix_n FROM (SELECT Count(t2."id")
AS ev_tix FROM "ticket" AS t2 GROUP BY t2."event_id")
GROUP BY ev_tix
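As a way to sanity-check what the nested query returns, independent of peewee, the first example can be run directly with Python's built-in sqlite3 module. This is only a sketch: the ticket rows are invented, and an ORDER BY is added so the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tickets: events 1 and 2 have two tickets each, event 3 has one.
conn.executescript("""
    CREATE TABLE ticket (
        id INTEGER PRIMARY KEY,
        event_id INTEGER,
        seats INTEGER
    );
    INSERT INTO ticket (event_id, seats) VALUES
        (1, 2), (1, 1),
        (2, 4), (2, 2),
        (3, 3);
""")

# Inner query: tickets per event. Outer query: how many events share
# each ticket count, i.e. a histogram over the inner aggregate.
query = """
    SELECT ev_tix, count(1) AS ev_tix_n FROM
        (SELECT count(1) AS ev_tix FROM ticket GROUP BY event_id)
    GROUP BY ev_tix
    ORDER BY ev_tix
"""
histogram = conn.execute(query).fetchall()
print(histogram)
```

Each tuple is (tickets per event, number of events with that many tickets), which is the shape the peewee expression above is meant to produce.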