Selecting distinct value pairs in EAV - sql

I'm working on a user database where the profile data has been changed from a simple table into an Entity-Attribute-Value (EAV) table.
Whereas before the structure was along these lines:
userid (int)
address 1 (varchar)
city (varchar)
country (varchar)
It's now along these lines:
userid (int)
key (varchar)
value (varchar)
e.g.:
userid  key      value
150     city     London
150     country  UK
151     city     New York
151     country  USA
152     country  Mexico
I need to get a distinct list of city / country pairs and a count of all users for each country:
city      country  count
London    UK       18
New York  USA      25
There is no guarantee that each key/value pair will exist for each user, i.e. there could be a city, a country, both, or neither, as well as any number of other key/value pairs.
This was straightforward with the old structure, but I can't even think how to begin with this, and would be grateful for some pointers.

Your best solution is to go back to the traditional table because EAV makes most querying much harder than it should be - witness your problems here. You're going to be doing self-joins until you're sick of them, remanufacturing the table structure that allows you to perform sensible queries.
Cities and countries for each user ID:
SELECT a.userID, a.value AS city, b.value AS country
FROM EAV AS a
JOIN EAV AS b ON a.UserID = b.UserID
WHERE a.key = 'city'
AND b.key = 'country';
So, you end up with:
SELECT city, country, count(*)
FROM (SELECT a.userID, a.value AS city, b.value AS country
FROM EAV AS a
JOIN EAV AS b ON a.UserID = b.UserID
WHERE a.key = 'city'
AND b.key = 'country'
) AS c
GROUP BY city, country;
If there's a chance that someone might have two city or two country records, this will give you a Cartesian product with as many rows for that user as the product of the number of city and country records for that user.
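A quick way to try the grouped self-join is an in-memory SQLite database driven from Python's built-in sqlite3 module. This is an assumption: the question doesn't name a DBMS, and SQLite only stands in for it. The rows are the question's sample data plus one hypothetical user (153) who has two city rows. That user shows the Cartesian-product caveat, because it contributes one joined row per city.

```python
import sqlite3

# In-memory SQLite stands in for the real DBMS; table/column names follow the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EAV (userid INTEGER, key TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO EAV VALUES (?, ?, ?)",
    [
        (150, "city", "London"), (150, "country", "UK"),
        (151, "city", "New York"), (151, "country", "USA"),
        (152, "country", "Mexico"),  # no city: dropped by the inner self-join
        # hypothetical user with two city rows
        (153, "city", "London"), (153, "city", "Leeds"), (153, "country", "UK"),
    ],
)

# Self-join pairs each user's city row with that user's country row, then groups.
rows = conn.execute("""
    SELECT city, country, COUNT(*) AS cnt
    FROM (SELECT a.userid, a.value AS city, b.value AS country
          FROM EAV AS a
          JOIN EAV AS b ON a.userid = b.userid
          WHERE a.key = 'city' AND b.key = 'country') AS c
    GROUP BY city, country
    ORDER BY city, country
""").fetchall()
print(rows)

# User 153 yields 2 cities x 1 country = 2 joined rows (the Cartesian-product caveat).
per_user = conn.execute("""
    SELECT COUNT(*)
    FROM EAV AS a
    JOIN EAV AS b ON a.userid = b.userid
    WHERE a.key = 'city' AND b.key = 'country' AND a.userid = 153
""").fetchone()[0]
print(per_user)
```

This prints [('Leeds', 'UK', 1), ('London', 'UK', 2), ('New York', 'USA', 1)] and then 2. User 152 (Mexico, no city) does not appear at all.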
This quite deliberately and consciously ignores users who have a city and no country or a country and no city (let alone those who have neither). Extending the solution to deal with those is only modestly painful - you end up with a 3-way UNION, I think, though you might be able to devise something with multiple left outer joins. But the fact that data can be entered into an EAV system without the necessary constraints to ensure that there is a city and a country for a user is simply one of the many reasons for rejecting EAV.
I'm sorry you had this foisted on you. I recommend looking at http://careers.stackoverflow.com/ as a way out of your pain, for this is only the beginning of it.
To deal with users who lack a city, a country, or both, I think this will more or less do it:
SELECT a.userID, b.value AS city, c.value AS country
FROM (SELECT DISTINCT UserID FROM EAV) AS a
-- the key filters belong in the ON clauses; in a WHERE clause they would discard the NULL rows
LEFT JOIN EAV AS b ON a.UserID = b.UserID AND b.key = 'city'
LEFT JOIN EAV AS c ON a.UserID = c.UserID AND c.key = 'country';
This should give you one record per user, as long as no user has multiple city or country records. The subquery a gives you the list of unique user IDs in the EAV table. The two outer joins supply the corresponding city (or cities) and country (or countries) for each of those user IDs, with NULLs generated when there is no city record, no country record, or neither for that user ID.
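Where the key filters sit decides whether the outer joins keep users with missing keys. The following is a minimal sketch, assuming SQLite through Python's sqlite3 module and a few hypothetical rows (user 154 has only an unrelated email key). It runs the same outer joins twice: once with the filters in the ON clauses and once with them in WHERE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EAV (userid INTEGER, key TEXT, value TEXT)")
conn.executemany("INSERT INTO EAV VALUES (?, ?, ?)", [
    (150, "city", "London"), (150, "country", "UK"),
    (152, "country", "Mexico"),                        # country but no city
    (154, "email", "someone@example.com"),             # hypothetical: neither key
])

# Filters in ON: an unmatched key just yields NULL, so every user survives.
on_filter = conn.execute("""
    SELECT a.userid, b.value AS city, c.value AS country
    FROM (SELECT DISTINCT userid FROM EAV) AS a
    LEFT JOIN EAV AS b ON a.userid = b.userid AND b.key = 'city'
    LEFT JOIN EAV AS c ON a.userid = c.userid AND c.key = 'country'
    ORDER BY a.userid
""").fetchall()

# Filters in WHERE: NULL-extended rows fail the test, so this behaves like an inner join.
where_filter = conn.execute("""
    SELECT a.userid, b.value AS city, c.value AS country
    FROM (SELECT DISTINCT userid FROM EAV) AS a
    LEFT JOIN EAV AS b ON a.userid = b.userid
    LEFT JOIN EAV AS c ON a.userid = c.userid
    WHERE b.key = 'city' AND c.key = 'country'
    ORDER BY a.userid
""").fetchall()

print(on_filter)
print(where_filter)
```

Only the ON-clause version returns rows for users 152 and 154 (with None for the missing values). The WHERE-clause version returns just user 150.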

re: I need to get a distinct list of city / country pairs
-- EAV stands for your table; KEY is a reserved word in some DBMSs (e.g. MySQL, SQL Server) and may need quoting
SELECT DISTINCT country, city
FROM
(SELECT DISTINCT userid, VALUE AS country FROM EAV WHERE KEY = 'country') country INNER JOIN
(SELECT DISTINCT userid, VALUE AS city FROM EAV WHERE KEY = 'city') city ON
country.userid = city.userid;
--count of all users for each country
SELECT VALUE AS country,
COUNT(DISTINCT userid) AS user_count
FROM EAV
WHERE KEY = 'country'
GROUP BY
VALUE;
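Another common way to query EAV data, different from the joins in these answers, is to pivot each user's rows into one row with conditional aggregation and then group the result. Here is a minimal sketch, assuming SQLite through Python's sqlite3 and the question's sample rows plus one hypothetical second London user (155). MAX ignores the NULLs that CASE produces for non-matching keys, so each user collapses to a single (city, country) row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EAV (userid INTEGER, key TEXT, value TEXT)")
conn.executemany("INSERT INTO EAV VALUES (?, ?, ?)", [
    (150, "city", "London"), (150, "country", "UK"),
    (151, "city", "New York"), (151, "country", "USA"),
    (152, "country", "Mexico"),
    (155, "city", "London"), (155, "country", "UK"),  # hypothetical second London user
])

# Inner query: one row per user (pivot). Outer query: count users per (city, country).
rows = conn.execute("""
    SELECT city, country, COUNT(*) AS user_count
    FROM (SELECT userid,
                 MAX(CASE WHEN key = 'city' THEN value END) AS city,
                 MAX(CASE WHEN key = 'country' THEN value END) AS country
          FROM EAV
          GROUP BY userid) AS p
    GROUP BY city, country
    ORDER BY country
""").fetchall()
print(rows)
```

Unlike the inner join, this version keeps user 152 as (None, 'Mexico'). If a user has two city rows, MAX silently picks one of them instead of multiplying rows, so duplicates are hidden rather than reported.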

Related

How to make sure result pairs are unique - without using distinct?

I have three tables I want to query. They are quite big, so I'll show only a small snippet of each. The first table is Students:
id  name         address
1   John Smith   New York
2   Rebeka Jens  Miami
3   Amira Sarty  Boston
The second table is TakingCourse. It records which courses the students are taking; student_id is the id from Students.
id  student_id  course_id
20  1           26
19  2           27
18  3           28
The last table is Courses. Its id matches course_id in the previous table. These are the courses the students are following, and the table looks like this:
id  type
26  History
27  Maths
28  Science
I want to return a table with the location (address) and the type of courses that are taken there. So the results table should look like this:
address  type
The pairs should be unique, and that is what's going wrong. I tried this:
select S.address, C.type
from Students S, Courses C, TakingCourse TC
where TC.course_id = C.id
and S.id = TC.student_id
And this does work, but the pairs are not all unique. I tried select distinct and it's still the same.
Multiple students can (and will) reside at the same address. So don't expect unique results from this query.
Only an overview is needed, so that's why I don't want duplicates.
So fold duplicates. The simple way is with DISTINCT:
SELECT DISTINCT s.address, c.type
FROM students s
JOIN takingcourse t ON s.id = t.student_id
JOIN courses c ON t.course_id = c.id;
Or to avoid DISTINCT (why would you for this task?) and, optionally, get counts, too:
SELECT c.type, s.address, count(*) AS ct
FROM students s
JOIN takingcourse t ON s.id = t.student_id
JOIN courses c ON t.course_id = c.id
GROUP BY c.type, s.address
ORDER BY c.type, s.address;
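To see that the DISTINCT and GROUP BY versions return the same pairs, here is a minimal sketch in SQLite through Python's sqlite3. It uses the question's sample rows plus one hypothetical student (Jane Doe, id 4) who lives in Miami and also takes Maths. That student creates the duplicate (Miami, Maths) pair that both queries fold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE courses (id INTEGER PRIMARY KEY, type TEXT);
    CREATE TABLE takingcourse (id INTEGER PRIMARY KEY, student_id INTEGER, course_id INTEGER);
    INSERT INTO students VALUES (1, 'John Smith', 'New York'), (2, 'Rebeka Jens', 'Miami'),
                                (3, 'Amira Sarty', 'Boston'),
                                (4, 'Jane Doe', 'Miami');  -- hypothetical extra student
    INSERT INTO courses VALUES (26, 'History'), (27, 'Maths'), (28, 'Science');
    INSERT INTO takingcourse VALUES (20, 1, 26), (19, 2, 27), (18, 3, 28), (21, 4, 27);
""")

# DISTINCT folds the duplicate (Miami, Maths) pair into one row.
distinct = conn.execute("""
    SELECT DISTINCT s.address, c.type
    FROM students s
    JOIN takingcourse t ON s.id = t.student_id
    JOIN courses c ON t.course_id = c.id
    ORDER BY s.address, c.type
""").fetchall()

# GROUP BY folds the same pairs and can also report how many rows each pair had.
grouped = conn.execute("""
    SELECT c.type, s.address, count(*) AS ct
    FROM students s
    JOIN takingcourse t ON s.id = t.student_id
    JOIN courses c ON t.course_id = c.id
    GROUP BY c.type, s.address
    ORDER BY c.type, s.address
""").fetchall()

print(distinct)
print(grouped)
```

Both queries return the same three (address, type) pairs. Only the GROUP BY version shows that (Miami, Maths) came from two rows.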
A missing UNIQUE constraint on takingcourse(student_id, course_id) could be an additional source of duplicates. See:
How to implement a many-to-many relationship in PostgreSQL?
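A composite UNIQUE constraint makes the database reject a repeated (student_id, course_id) pair. The DDL below is a hypothetical sketch run in SQLite through Python's sqlite3, not the asker's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical DDL: one enrolment per student/course pair.
conn.execute("""
    CREATE TABLE takingcourse (
        id INTEGER PRIMARY KEY,
        student_id INTEGER NOT NULL,
        course_id INTEGER NOT NULL,
        UNIQUE (student_id, course_id)
    )
""")
conn.execute("INSERT INTO takingcourse (student_id, course_id) VALUES (1, 26)")

try:
    # Same pair again: the UNIQUE constraint makes SQLite raise IntegrityError.
    conn.execute("INSERT INTO takingcourse (student_id, course_id) VALUES (1, 26)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM takingcourse").fetchone()[0]
print(duplicate_rejected, row_count)
```

The second insert fails, so the table still holds a single row for that pair.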

Postgres Question: Aren't both a and b correct?

For questions below, use the following schema definition.
restaurant(rid, name, phone, street, city, state, zip)
customer(cid, fname, lname, phone, street, city, state, zip)
carrier(crid, fname, lname, lp)
delivery(did, rid, cid, tim, size, weight)
pickup(did, tim, crid)
dropoff(did, tim, crid)
It's a schema for a food delivery business that employs food carriers (carrier table).
Customers (customer table) order food from restaurants (restaurant table).
The restaurants order a delivery (delivery table); to deliver food from restaurant to customer.
The pickup table records when a carrier picks up food at a restaurant.
The dropoff table records when a carrier drops off food at a customer.
1. Find customers who have fewer than 5 deliveries.
a. select cid,count(*)
from delivery
group by cid
having count(*) < 5;
b. select a.cid,count(*)
from customer a
inner join delivery b
using(cid)
group by a.cid
having count(*) < 5;
c. select a.cid,count(*)
from customer a
left outer join delivery b
on a.cid=b.cid
group by a.cid
having count(*) < 5;
d. select cid,sum(case when b.cid is not null then 1 else 0 end)
from customer a
left outer join delivery b
using (cid)
group by cid
having sum(case when b.cid is not null then 1 else 0 end) < 5;
e. (write your own answer)
No, they are not correct. They miss customers who have had no deliveries.
The last is the best of a bunch of not-so-good queries. A better version would be:
select c.cid, count(d.cid)
from customer c left outer join
delivery d
on c.cid = d.cid
group by c.cid
having count(d.cid) < 5;
The sum(case) is overkill. And Postgres even offers a better solution than that!
count(*) filter (where d.cid is not null)
But count(d.cid) is still more concise.
Also note the use of meaningful table aliases. Don't get into the habit of using arbitrary letters for tables. That just makes queries hard to understand.
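Here is a small check of options a, c and the count(d.cid) version, sketched in SQLite through Python's sqlite3. The question is about Postgres, but these three queries need no Postgres-specific features. The data are hypothetical: customer 1 has 2 deliveries, customer 2 has 6, and customer 3 has none. Only the cid columns are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cid INTEGER PRIMARY KEY);
    CREATE TABLE delivery (did INTEGER PRIMARY KEY, cid INTEGER);
    INSERT INTO customer VALUES (1), (2), (3);
    -- hypothetical data: customer 1 has 2 deliveries, 2 has 6, 3 has none
    INSERT INTO delivery (cid) VALUES (1), (1), (2), (2), (2), (2), (2), (2);
""")

def run(sql):
    """Execute a query and return all rows."""
    return conn.execute(sql).fetchall()

# Option a: customers without deliveries never appear in delivery, so they are missed.
option_a = run("""SELECT cid, count(*) FROM delivery
                  GROUP BY cid HAVING count(*) < 5 ORDER BY cid""")

# Option c: count(*) also counts the NULL-extended row, so customer 3 shows 1, not 0.
option_c = run("""SELECT a.cid, count(*) FROM customer a
                  LEFT OUTER JOIN delivery b ON a.cid = b.cid
                  GROUP BY a.cid HAVING count(*) < 5 ORDER BY a.cid""")

# count(d.cid) skips NULLs, so customer 3 is reported with 0 deliveries.
better = run("""SELECT c.cid, count(d.cid) FROM customer c
                LEFT OUTER JOIN delivery d ON c.cid = d.cid
                GROUP BY c.cid HAVING count(d.cid) < 5 ORDER BY c.cid""")

print(option_a, option_c, better)
```

Option a drops customer 3 entirely. Option c keeps customer 3 but with a count of 1. Only count(d.cid) reports the expected 0.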

Excluding a value from a count with SQL

I have two temp tables set up. Table A consists of members and the businesses that they manage; multiple members can be associated with a single business. Table B consists of just the members, their IDs, and the class of their business relationship (Retail, Business, or Retail and Business).
The query I need to come up with is to find out which of the members from Table B do not have a Retail relationship at all. Unfortunately a simple WHERE clause will not suffice, because a member may have multiple relationships, e.g. John Doe has a Retail AND Business relationship, or possibly all three.
I can try SELECT * FROM B WHERE class = 'Business', which pulls all members who have a Business relationship listed in the column, but that still includes members who also have a Retail relationship; on the flip side, WHERE class = 'Retail' only finds the members I want to leave out. I want to exclude from my count anyone who has a retail relationship at all, so from my example above, John Doe would not be included.
I don't have any test data, but give this a try
Select
ta.*
From seequillTableA as ta
Left Join
(Select
ID
, COUNT(*) as cntRetail
From seequillTableB
Where Class IN ('Retail', 'Retail and Business')  -- the member's retail rows
Group By ID
) as tb
On ta.ID = tb.ID
Where tb.ID Is Null  -- no retail rows at all for this member
The retail relationships in Table B all start with "Retail", so we can select them using LIKE 'Retail%' and then exclude them from the members we select from Table A by using NOT IN.
SELECT *
FROM TableA
WHERE MemberID NOT IN
(SELECT MemberID FROM TableB WHERE class LIKE 'Retail%')
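Here is a minimal sketch of the NOT IN query in SQLite through Python's sqlite3, with a NOT EXISTS variant for comparison. The data are hypothetical: member 1 is Retail and Business, member 2 is Business only, and member 3 is Retail only. The last step adds a row with a NULL MemberID to show a standard SQL pitfall. NOT IN returns no rows when its subquery yields a NULL, but NOT EXISTS is unaffected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (MemberID INTEGER, business TEXT);
    CREATE TABLE TableB (MemberID INTEGER, class TEXT);
    -- hypothetical members: 1 = Retail and Business, 2 = Business only, 3 = Retail only
    INSERT INTO TableA VALUES (1, 'Acme'), (2, 'Acme'), (3, 'Bolt');
    INSERT INTO TableB VALUES (1, 'Retail and Business'), (1, 'Business'),
                              (2, 'Business'), (3, 'Retail');
""")

NOT_IN_SQL = """
    SELECT MemberID FROM TableA
    WHERE MemberID NOT IN (SELECT MemberID FROM TableB WHERE class LIKE 'Retail%')
    ORDER BY MemberID
"""
NOT_EXISTS_SQL = """
    SELECT a.MemberID FROM TableA a
    WHERE NOT EXISTS (SELECT 1 FROM TableB b
                      WHERE b.MemberID = a.MemberID AND b.class LIKE 'Retail%')
    ORDER BY a.MemberID
"""

not_in = conn.execute(NOT_IN_SQL).fetchall()
not_exists = conn.execute(NOT_EXISTS_SQL).fetchall()

# A retail row with a NULL MemberID makes every NOT IN comparison unknown.
conn.execute("INSERT INTO TableB VALUES (NULL, 'Retail')")
not_in_with_null = conn.execute(NOT_IN_SQL).fetchall()
not_exists_with_null = conn.execute(NOT_EXISTS_SQL).fetchall()

print(not_in, not_exists, not_in_with_null, not_exists_with_null)
```

On clean data both forms return only member 2. After the NULL row is added, the NOT IN form returns nothing, while NOT EXISTS still returns member 2.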

How to append multiple columns from two select sub queries together using the same primary key?

I have two tables I need to join into a view. My first table, called ttddocseg, is a history of all segments of a flight itinerary. It contains a departure city code and an arrival city code, among other irrelevant info. My second table is a city table that connects to the segment table on the city code key. What I want is to pull the extra city information from the city table into a single view with the segment transaction data, when the 'key' (city code) is used twice: for the arrival city and the departure city.
Example:
SELECT arvlctycode, dpartctycode FROM ttddocseg
Yields:
DFW, DEN
DEN, ORD
LAX, DEN
ORD, LAX
DEN, DCA
...
And
SELECT ctycode, ctyname FROM trfcty
Yields:
DFW, Dallas/Fortworth
DEN, Denver
LAX, Los Angeles
...
So my desired output would be, when joining the segment and city tables:
DFW, Dallas/Fortworth, DEN, Denver
DEN, Denver, ORD, Chicago/OHARE
...
So in theory I would join two subqueries that each join the tables, one on the arrival city code and the other on the departure city code, and then put those sets of columns next to each other, ordering by my table's key to make sure the arrival/departure pairs line up properly. Nothing I have tried has worked so far, though. My best effort so far:
select
(
select a.ctycode, b.arvlctycode, b.arvldate, b.actualmile, b.aircrrcode, b.tdtrxnum, b.tddocnum, b.segnum
from trfcty a inner join ttddocseg b on a.client = b.client and a.ctycode = b.arvlctycode
where a.client = 'TT' and ctytype = 'A'
--order by b.tdtrxnum, b.tddocnum, b.segnum
) AS Arrival,
(
select a.ctycode, b.dpartctycode , b.dpartdate, b.actualmile, b.aircrrcode, b.tdtrxnum, b.tddocnum, b.segnum
from trfcty a inner join ttddocseg b on a.client = b.client and a.ctycode = b.dpartctycode
where a.client = 'TT' and ctytype = 'A'
--order by b.tdtrxnum, b.tddocnum, b.segnum
) AS Departure
The commented-out ORDER BY clauses are my attempt to keep the arrival/departure city pairs lined up, since those columns are the primary key of the segment table.
I get errors doing that, of course, but I think the idea is clear. I just don't know how to do it the right way.
Is this what you are looking for?
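SELECT s.arvlctycode, ca.ctyname AS arvlctyname, s.dpartctycode, cd.ctyname AS dpartctyname
FROM ttddocseg s JOIN
trfcty ca -- arrival city
ON s.arvlctycode = ca.ctycode JOIN
trfcty cd -- departure city
ON s.dpartctycode = cd.ctycode;
-- add the client/ctytype conditions from your own query if trfcty holds rows for several clients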
This returns the cities associated with each of the codes.
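The key idea is to join the city table twice under two aliases, once per role. This can be checked with a minimal sketch in SQLite through Python's sqlite3. The schema is simplified and hypothetical: only the code and name columns are kept, with segnum as the segment key, and the client/ctytype columns are left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE trfcty (ctycode TEXT PRIMARY KEY, ctyname TEXT);
    CREATE TABLE ttddocseg (segnum INTEGER PRIMARY KEY, arvlctycode TEXT, dpartctycode TEXT);
    INSERT INTO trfcty VALUES ('DFW', 'Dallas/Fortworth'), ('DEN', 'Denver'),
                              ('ORD', 'Chicago/OHARE'), ('LAX', 'Los Angeles');
    INSERT INTO ttddocseg VALUES (1, 'DFW', 'DEN'), (2, 'DEN', 'ORD');
""")

# ca and cd are two independent copies of trfcty, one per city role.
rows = conn.execute("""
    SELECT s.arvlctycode, ca.ctyname, s.dpartctycode, cd.ctyname
    FROM ttddocseg s
    JOIN trfcty ca ON s.arvlctycode = ca.ctycode
    JOIN trfcty cd ON s.dpartctycode = cd.ctycode
    ORDER BY s.segnum
""").fetchall()
print(rows)
```

Each segment row carries both city names side by side. No ordering trick is needed to keep the pairs aligned, because both lookups happen on the same row.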

sql query to select matching rows for all or nothing criteria

I have a table of cars where each car belongs to a company. In another table I have a list of company locations by city.
I want to select all cars from the cars table whose company has locations in all the cities passed into the stored procedure, and to exclude those cars altogether if the company falls short by even one city.
So, I've tried something like:
select id, cartype from cars where companyid in
(
select id from locations where cityid in
(
select id from cities
)
)
This doesn't work as it obviously satisfies the condition if ANY of the cities are in the list, not all of them.
It sounds like a GROUP BY/COUNT problem, but I can't make it work with what I've tried.
I'm using MS SQL Server 2005.
One example:
select id, cartype from cars c
where ( select count(1) from cities where id in (...))
= ( select count(distinct cityid)
from locations
where c.companyid = locations.id and cityid in (...) )
Maybe try counting all the cities, and then select the car if the company has locations in as many distinct cities as there are cities in total.
SELECT id, cartype FROM cars
WHERE
--Subquery to find the number of distinct cities where the car's company has a location
(SELECT count(distinct cities.id) FROM cities
INNER JOIN locations on locations.cityid = cities.id
WHERE locations.companyId = cars.companyId)
=
--Subquery to find the total number of cities
(SELECT count(distinct cities.id) FROM cities)
I haven't tested this, and it may not be the most efficient query, but I think this might work.
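The GROUP BY/COUNT shape the question guessed at can be sketched in SQLite through Python's sqlite3. Two things here are hypothetical: the `requested` table, which stands in for the list of cities passed to the stored procedure, and the data. Counting only the requested cities matches the first answer's `in (...)` filter, so extra locations do not inflate the count. Company 10 covers both requested cities; company 20 misses city 200.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cars (id INTEGER PRIMARY KEY, cartype TEXT, companyid INTEGER);
    CREATE TABLE locations (companyid INTEGER, cityid INTEGER);
    CREATE TABLE requested (cityid INTEGER PRIMARY KEY);  -- hypothetical: the cities passed in
    INSERT INTO cars VALUES (1, 'sedan', 10), (2, 'van', 20), (3, 'coupe', 10);
    INSERT INTO locations VALUES (10, 100), (10, 200), (10, 300), (20, 100), (20, 300);
    INSERT INTO requested VALUES (100), (200);
""")

# A company qualifies when the number of distinct requested cities it covers
# equals the number of requested cities.
rows = conn.execute("""
    SELECT c.id, c.cartype
    FROM cars c
    WHERE c.companyid IN (
        SELECT l.companyid
        FROM locations l
        JOIN requested r ON r.cityid = l.cityid
        GROUP BY l.companyid
        HAVING COUNT(DISTINCT l.cityid) = (SELECT COUNT(*) FROM requested))
    ORDER BY c.id
""").fetchall()
print(rows)
```

Only company 10's cars (1 and 3) are returned. Company 20's location in city 300 does not help it, because 300 was not requested.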
Try this
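SELECT e.*
FROM cars e
-- assumes Cities holds only the cities passed in, and locations(companyid, cityid)
WHERE NOT EXISTS (
SELECT 1
FROM Cities p
WHERE NOT EXISTS (
SELECT 1
FROM locations l
WHERE l.companyid = e.companyid
AND l.cityid = p.id
)
)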
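The classic double NOT EXISTS formulation of relational division keeps a car unless some requested city exists where its company has no location. Below is a minimal sketch in SQLite through Python's sqlite3, again with a hypothetical `requested` table and hypothetical data. It also shows an edge case: with an empty list, the condition is vacuously true and every car is returned. A count-based version that joins to the requested list would return no cars in that case.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cars (id INTEGER PRIMARY KEY, cartype TEXT, companyid INTEGER);
    CREATE TABLE locations (companyid INTEGER, cityid INTEGER);
    CREATE TABLE requested (cityid INTEGER PRIMARY KEY);  -- hypothetical list of cities passed in
    INSERT INTO cars VALUES (1, 'sedan', 10), (2, 'van', 20), (3, 'coupe', 10);
    INSERT INTO locations VALUES (10, 100), (10, 200), (10, 300), (20, 100), (20, 300);
    INSERT INTO requested VALUES (100), (200);
""")

# "There is no requested city for which the car's company lacks a location."
DIVISION_SQL = """
    SELECT c.id, c.cartype
    FROM cars c
    WHERE NOT EXISTS (
        SELECT 1 FROM requested r
        WHERE NOT EXISTS (
            SELECT 1 FROM locations l
            WHERE l.companyid = c.companyid AND l.cityid = r.cityid))
    ORDER BY c.id
"""
rows = conn.execute(DIVISION_SQL).fetchall()

# Edge case: an empty list leaves nothing to violate, so all cars qualify.
conn.execute("DELETE FROM requested")
rows_when_empty = conn.execute(DIVISION_SQL).fetchall()

print(rows)
print(rows_when_empty)
```

With cities 100 and 200 requested, only company 10's cars come back. Whether an empty list should mean "all cars" or "no cars" is a decision for the stored procedure.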