Subquery that matches column with several ranges defined in table - sql

I've got a pretty common setup for an address database: a person is tied to a company with a join table, the company can have an address and so forth.
All pretty normalized and easy to use. But for search performance, I'm creating a materialized, rather denormalized view. I only need a very limited set of information and quick queries. Most of everything that's usually done via a join table is now in an array. Depending on the query, I can either search it directly or join it via unnest.
As a complement to my zipcodes column (varchar[]), I'd like to add a states column that has the (German federal) states already precomputed, so that I don't have to transform a query to include all kinds of range comparisons.
My mapping data is in a table like this:
CREATE TABLE zip2state (
state TEXT NOT NULL,
range_start CHARACTER VARYING(5) NOT NULL,
range_end CHARACTER VARYING(5) NOT NULL
)
Each state has several ranges, and ranges can overlap (one zip code can be for two different states). Some ranges have range_start = range_end.
Now I'm a bit at wit's end on how to get that into a materialized view all at once. Normally, I'd feel tempted to just do it iteratively (via trigger or on the application level).
Or as we're just talking about 5 digits, I could create a big table mapping zip to state directly instead of doing it via a range (my current favorite, yet something ugly enough that it prompted me to ask whether there's a better way)
Any way to do that in SQL, with a table like the above (or something similar)? I'm at postgres 9.3, all features allowed...
For completeness' sake, here's the subquery for the zip codes:
(select array_agg(distinct address.zipcode)
from affiliation
join company
on affiliation.ins_id = company.id
join address
on address.com_id = company.id
where affiliation.per_id = person.id) AS zipcodes,

I suggest a LATERAL join instead of the correlated subquery to conveniently compute both columns at once. Could look like this:
SELECT p.*, z.*
FROM   person p
LEFT   JOIN LATERAL (
   SELECT array_agg(DISTINCT d.zipcode) AS zipcodes
        , array_agg(DISTINCT z.state)   AS states
   FROM   affiliation a
   -- JOIN company c ON a.ins_id = c.id  -- suspect you don't need this
   JOIN   address d ON d.com_id = a.ins_id  -- c.id
   LEFT   JOIN zip2state z ON d.zipcode BETWEEN z.range_start AND z.range_end
   WHERE  a.per_id = p.id
   ) z ON true;
If referential integrity is guaranteed, you don't need to join to the table company at all. I took the shortcut.
Be aware that varchar or text behaves differently than expected for numbers. For example: '333' > '0999'. If all zip codes have 5 digits you are fine.
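To see the range match and the text-comparison caveat in action, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in (SQLite has neither LATERAL nor array_agg, so this only exercises the BETWEEN condition on text columns). The ranges are made up for illustration and states_for is a hypothetical helper, not part of the schema above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE zip2state (
        state       TEXT NOT NULL,
        range_start TEXT NOT NULL,
        range_end   TEXT NOT NULL
    );
    -- Made-up ranges for illustration only, not real German zip data.
    INSERT INTO zip2state VALUES
        ('A', '01000', '01999'),
        ('B', '01500', '02999'),   -- overlaps with A
        ('C', '10115', '10115');   -- range_start = range_end
""")


def states_for(zipcode):
    """Return the sorted list of states whose range contains zipcode (compared as text)."""
    rows = conn.execute(
        "SELECT DISTINCT state FROM zip2state "
        "WHERE ? BETWEEN range_start AND range_end "
        "ORDER BY state",
        (zipcode,),
    ).fetchall()
    return [r[0] for r in rows]


overlap = states_for("01700")    # falls into both A and B
single = states_for("10115")     # matches the single-value range
unpadded = states_for("1700")    # leading zero missing: text comparison finds nothing

# Text comparison is character by character, so '333' sorts after '0999'.
text_cmp = conn.execute("SELECT '333' > '0999'").fetchone()[0]

print(overlap, single, unpadded, text_cmp)
```

As long as every zip code is stored as exactly five characters, text order and numeric order agree and the BETWEEN condition behaves as intended.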
Related:
What is the difference between LATERAL and a subquery in PostgreSQL?

Related

select items where id is contained in another table field

I have the following database schema in sqlite3:
Basically, a member has multiple characters. A character plays in an activity (with a mode type) and has results for that activity (character_activity_stats)
I select all of the stats (activity / character_activity_stats) for a specific character and mode like so:
SELECT
*,
activity.mode as activity_mode,
character_activity_stats.id as character_activity_stats_index
FROM
character_activity_stats
INNER JOIN
activity ON character_activity_stats.activity = activity.id,
modes ON modes.activity = activity.id
WHERE
modes.mode = 5 AND
character_activity_stats.character = 1
This works great.
However, now I want to select the same set of data, but by member (basically combine results for all characters for a member).
However, I am not really sure how to even approach this.
Basically, I need to retrieve all character_activity_stats where character_activity_stats.character is a character of the specified member (by id). Any suggestions or pointers? (I am very new to sql).
Join those 3 tables on the right keys:
select *
from character_activity_stats
join character on character_activity_stats.character = character.id
join member on member.id = character.member
where member.id = ?
If you don't need any data from member other than limit by id, then you leave that join off and just do character.member = ? instead.
It's much easier if you use the same name for the primary and foreign keys (i.e. don't use id for the primary key). It also allows you to use natural joins, so you don't even need to give the join conditions. For the primary key, the convention is usually _id. You use id and _in in most of the tables, so I don't know what that is about.
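A minimal sketch of the shorter variant (filtering on character.member directly, without joining member), run through Python's sqlite3 module. The table layout is guessed from the question, and the kills column and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE "character" (
        id     INTEGER PRIMARY KEY,
        member INTEGER REFERENCES member(id)
    );
    CREATE TABLE character_activity_stats (
        id          INTEGER PRIMARY KEY,
        "character" INTEGER REFERENCES "character"(id),
        kills       INTEGER            -- hypothetical stat column
    );
    INSERT INTO member VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO "character" VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO character_activity_stats VALUES
        (10, 1, 5), (11, 2, 7), (12, 3, 9);
""")

member_id = 1

# All stats rows for every character that belongs to the given member.
rows = conn.execute(
    """
    SELECT s.id, s."character", s.kills
    FROM character_activity_stats AS s
    JOIN "character" AS c ON s."character" = c.id
    WHERE c.member = ?
    ORDER BY s.id
    """,
    (member_id,),
).fetchall()

print(rows)
```

The activity and modes joins from the original query can be added back in the same way if the mode filter is still needed.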

Coding Inner Join subquery as field in query

After looking at example after example of both inner joins and subqueries as fields, I'm apparently not getting some aspect, and I would appreciate help please. I am trying to write one query that must, alas, run in MS Access 2007 to talk to an Oracle database. I have to get values from several different places for various bits of data. One of those bits of data is GROUP_CODE (e.g., faculty, staff, student, alum, etc.). Getting that is non-trivial. I am trying to use two inner joins to get the specific value. The value of borrower category must be the value for my main row in the outer query. Here is what this looks like:
Patron table     Patron_Barcode table     Patron_Group table
Patron_id        Barcode                  Patron_Group_iD
Barcode          Patron_Group_id          PATRON_Group_Code
I want to get the PATRON_GROUP.PATRON_GROUP_CODE. This is only one of 35 fields I need to get in my query. (Yes, that's terrible, but wearing my librarian hat, I can't write the Java program I'd like to write to do this in a snap.)
So as a test, I wrote this query:
select PATRON.PATRON_ID As thePatron,
(SELECT PATRON_GROUP.PATRON_GROUP_CODE As borrowwerCategory
FROM (PATRON_GROUP
INNER JOIN PATRON_BARCODE ON PATRON_GROUP.PATRON_GROUP_ID = PATRON_BARCODE.PATRON_GROUP_ID
) INNER JOIN PATRON ON PATRON_BARCODE.PATRON_ID = thePatron.PATRON_ID
));
I don't know what I'm doing wrong, but this doesn't work. I've written a fair amount of SQL in my time, but never anything quite like this. What am I doing wrong?
PATRON.BARCODE is the foreign key for the BARCODE table.
PATRON_BARCODE.PATRON_GROUP_ID is the foreign key for the PATRON_GROUP table. PATRON_GROUP_CODE in PATRON_GROUP is the column value that I need.
PATRON.BARCODE -> BARCODE.PATRON_GROUP_ID -> PATRON_GROUP.PATRON_GROUP_CODE
The main table, PATRON, will have lots of other things, like inner and outer join to PATRON_ADDRESS, etc., and I can't just do an inner join directly to what I want in my main query. This has to happen in a subquery as a field. Thanks.
Ken

How can I improve a mostly "degenerate" inner join?

This is Oracle 11g.
I have two tables whose relevant columns are shown below (I have to take the tables as given -- I cannot change the column datatypes):
CREATE TABLE USERS
(
UUID VARCHAR2(36),
DATA VARCHAR2(128),
ENABLED NUMBER(1)
);
CREATE TABLE FEATURES
(
USER_UUID VARCHAR2(36),
FEATURE_TYPE NUMBER(4)
);
The tables express the concept that a user can be assigned a number of features. The (USER_UUID, FEATURE_TYPE) combination is unique.
I have two very similar queries I am interested in. The first one, expressed in English, is "return the UUIDs of enabled users who are assigned feature X". The second one is "return the UUIDs and DATA of enabled users who are assigned feature X". The USERS table has about 5,000 records and the FEATURES table has about 40,000 records.
I originally wrote the first query naively as:
SELECT u.UUID FROM USERS u
JOIN FEATURES f ON f.USER_UUID=u.UUID
WHERE f.FEATURE_TYPE=X and u.ENABLED=1
and that had lousy performance. As an experiment I tried to see what would happen if I didn't care about whether or not a user was enabled and that inspired me to try:
SELECT USER_UUID FROM FEATURES WHERE FEATURE_TYPE=X
and that ran very quickly. That in turn inspired me to try
(SELECT USER_UUID FROM FEATURES WHERE FEATURE_TYPE=X)
INTERSECT
(SELECT UUID FROM USERS WHERE ENABLED=1)
That didn't run as quickly as the second query, but ran much more quickly than the first.
After more thinking I realized that in the case at hand every user or almost every user was assigned at least one feature, which meant that the join condition was always or almost always true, which meant that the inner join completely or mostly degenerated into a cross join. And since 5,000 x 40,000 = 200,000,000 that is not a good thing. Obviously the INTERSECT version would be dealing with many fewer rows which presumably is why it is significantly faster.
Question: Is INTERSECT really the way to go in this case or should I be looking at some other type of join?
I wrote the query for the one that also needs to return DATA similarly to the very first one:
SELECT u.UUID, u.DATA FROM USERS u
JOIN FEATURES f ON f.USER_UUID=u.UUID
WHERE f.FEATURE_TYPE=X and u.ENABLED=1
But it would seem I can't do the INTERSECT trick here because there's no column in FEATURES that matches the DATA column.
Question: How can I rewrite this to avoid the degenerate join problem and perform like the query that doesn't return DATA?
I would intuitively use the EXISTS clause:
SELECT u.UUID
FROM USERS u
WHERE u.ENABLED=1
AND EXISTS (SELECT 1 FROM FEATURES f where f.FEATURE_TYPE=X and f.USER_UUID=u.UUID)
or similarly:
SELECT u.UUID, u.DATA
FROM USERS u
WHERE u.ENABLED=1
AND EXISTS (SELECT 1 FROM FEATURES f where f.FEATURE_TYPE=X and f.USER_UUID=u.UUID)
This way you can select every field from USERS since there is no need for INTERSECT anymore (which was a rather good choice for the 1st case, IMHO).
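As a quick sanity check that the EXISTS form returns the same users as the original join and the INTERSECT version, here is a small sketch using Python's sqlite3 module as a stand-in for Oracle. It says nothing about Oracle's execution plans or timing; the sample rows and feature number 7 are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE USERS (UUID TEXT, DATA TEXT, ENABLED INTEGER);
    CREATE TABLE FEATURES (
        USER_UUID    TEXT,
        FEATURE_TYPE INTEGER,
        UNIQUE (USER_UUID, FEATURE_TYPE)
    );
    INSERT INTO USERS VALUES ('u1', 'a', 1), ('u2', 'b', 0), ('u3', 'c', 1);
    INSERT INTO FEATURES VALUES ('u1', 7), ('u1', 8), ('u2', 7), ('u3', 8);
""")

feature = 7  # plays the role of X

# Semi-join: each enabled user is returned at most once, and DATA is available.
exists_rows = conn.execute(
    """
    SELECT u.UUID, u.DATA
    FROM USERS u
    WHERE u.ENABLED = 1
      AND EXISTS (SELECT 1 FROM FEATURES f
                  WHERE f.FEATURE_TYPE = ? AND f.USER_UUID = u.UUID)
    ORDER BY u.UUID
    """,
    (feature,),
).fetchall()

# The original inner join, for comparison.
join_rows = conn.execute(
    """
    SELECT u.UUID, u.DATA
    FROM USERS u
    JOIN FEATURES f ON f.USER_UUID = u.UUID
    WHERE f.FEATURE_TYPE = ? AND u.ENABLED = 1
    ORDER BY u.UUID
    """,
    (feature,),
).fetchall()

# INTERSECT can only return the column both sides share (the UUID).
intersect_rows = conn.execute(
    """
    SELECT USER_UUID FROM FEATURES WHERE FEATURE_TYPE = ?
    INTERSECT
    SELECT UUID FROM USERS WHERE ENABLED = 1
    ORDER BY 1
    """,
    (feature,),
).fetchall()

print(exists_rows, join_rows, intersect_rows)
```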

How to use the result from a second select in my first select

I am trying to use a second SELECT to get some ID, then use that ID in my first SELECT, and I have no idea how.
SELECT Employee.Name
FROM Emplyee, Employment
WHERE x = Employment.DistributionID
(SELECT Distribution.DistributionID FROM Distribution
WHERE Distribution.Location = 'California') AS x
This post got long, but here is a short "tip"
While the syntax of my select is bad, the logic is not. I need that "x" somehow. Thus the second select is the most important. Then I have to use that "x" within the first select. I just don't know how
/Tip
This is the only thing I could imagine, I'm very new at Sql, I think I need a book before practicing, but now that I've started I'd like to finish my small program.
EDIT:
Ok I looked up joins, still don't get it
SELECT Employee.Name
FROM Emplyee, Employment
WHERE x = Employment.DistributionID
LEFT JOIN Distribution ON
(SELECT Distribution.DistributionID FROM Distribution
WHERE Distribution.Location = 'California') AS x
Get error msg at AS and Left
I use name to find ID from upper red, I use the ID I find FROM upper red in lower table. Then I match the ID I find with Green. I use Green ID to find corresponding Name
I have California as output data from C#. I want to use California to find the DistributionID. I use the DistributionID to find the EmployeeID. I use EmployeeID to find Name
My logic:
Parameter: Distribution.Name (from C#)
Find DistributionID that has Distribution.Name
Look in Employment WHERE given DistributionID
reveals Employees that I am looking for (BY ID)
Use that ID to find Name
return Name
Tables:
NOTE: In this example picture the Employee repeats because of the select, they are in fact singular
In "Locatie" (middle table) is Location, I get location (again) from C#, I use California as an example. I need to find the ID first and foremost!
Sorry, they are not in English, but here are the create tables:
Try this:
SELECT angajati.Nume
FROM angajati
JOIN angajari ON angajati.AngajatID = angajari.AngajatID
JOIN distribuire ON angajari.distribuireid = distribuire.distribuireid
WHERE distribuire.locatie = 'california'
As you have a table mapping employees to their distribution locations, you just need to join that one in the middle to create the mapping. You can use variables if you like for the WHERE clause so that you can call this as a stored procedure or whatever you need from the output of your C# code.
Try this solution:
DECLARE @pLocatie VARCHAR(40)='Alba'; -- p=parameter
SELECT a.AngajatID, a.Nume
FROM Angajati a
JOIN Angajari j ON a.AngajatID=j.AngajatID
JOIN Distribuire d ON j.DistribuireID=d.DistribuireID
WHERE d.Locatie=@pLocatie
You should add a unique key on the Angajari table (Employment), thus:
ALTER TABLE Angajari
ADD CONSTRAINT IUN_Angajari_AngajatID_DistribuireID UNIQUE (AngajatID, DistribuireID);
This will prevent duplicate (AngajatID, DistribuireID) pairs.
I don't know how you are connecting Emplyee(sic?) and Employment, but you want to use a join to connect two tables and in the join specify how the tables are related. Joins usually look best when they have aliases so you don't have to repeat the entire table name. The following query will get you all the information from both Employment and Distribution tables where the distribution location is equal to california. You can join employee to employment to get name as well.
SELECT *
FROM Employment e
JOIN Distribution d on d.DistributionID = e.DistributionID
WHERE d.Location = 'California'
This will return the contents of both tables. To select particular records use the alias.[Col_Name] separated by a comma in the select statement, like d.DistributionID to return the DistributionID from the Distribution Table
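Extending that join one step further to Employee gives the names directly. Below is a rough sketch with Python's sqlite3 module. The English table names follow the question, but the column names EmployeeID and Name and all sample rows are assumptions, since the real create statements are not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed layout; the real tables may use different column names.
    CREATE TABLE Employee (EmployeeID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Distribution (DistributionID INTEGER PRIMARY KEY, Location TEXT);
    CREATE TABLE Employment (
        EmployeeID     INTEGER REFERENCES Employee(EmployeeID),
        DistributionID INTEGER REFERENCES Distribution(DistributionID),
        UNIQUE (EmployeeID, DistributionID)
    );
    INSERT INTO Employee VALUES (1, 'Ana'), (2, 'Ion'), (3, 'Maria');
    INSERT INTO Distribution VALUES (10, 'California'), (20, 'Texas');
    INSERT INTO Employment VALUES (1, 10), (2, 20), (3, 10);
""")

location = "California"  # the value that would come from the C# side

# Employment sits in the middle and links employees to distributions.
names = [
    row[0]
    for row in conn.execute(
        """
        SELECT e.Name
        FROM Employee e
        JOIN Employment m ON m.EmployeeID = e.EmployeeID
        JOIN Distribution d ON d.DistributionID = m.DistributionID
        WHERE d.Location = ?
        ORDER BY e.Name
        """,
        (location,),
    )
]

print(names)
```

Passing the location as a bound parameter rather than concatenating it into the SQL string keeps the query safe when the value comes from user input.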

MS Access Distinct Records in Recordset

So, I once again seem to have an issue with MS Access being finicky, although it seems to also be an issue when trying similar queries in SSMS (SQL Server Management Studio).
I have a collection of tables, loosely defined as follows:
table widget_mfg { id (int), name (nvarchar) }
table widget { id (int), name (nvarchar), mfg_id (int) }
table widget_component { id (int), name (nvarchar), widget_id (int), component_id }
table component { id (int), name (nvarchar), ... } -- There are ~25 columns in this table
What I'd like to do is query the database and get a list of all components that a specific manufacturer uses. I've tried some of these queries:
SELECT c.*, wc.widget_id, w.mfg_id
FROM ((widget_component wc INNER JOIN widget w ON wc.widget_id = w.id)
INNER JOIN widget_manufacturer wm on w.mfg_id = wm.id)
INNER JOIN component c on c.id = wc.component_id
WHERE wm.id = 1
The previous example displays duplicates of any part that is contained in multiple widget_component lists for different widgets.
I've also tried doing:
SELECT DISTINCT c.id, c.name, wc.widget_id, w.mfg_id
FROM component c, widget_component wc, widget w, widget_manufacturer wm
WHERE wm.id=w.mfg_id AND wm.id = 1
This doesn't display anything at all. I was reading about sub-queries, but I do not understand how they work or how they would apply to my current application.
Any assistance in this would be beneficial.
As an aside, I am not very good with either MS Access or SQL in general. I know the basics, but not a lot beyond that.
Edit:
I just tried this code, and it works to get all the component.id's while limiting them to a single entry each. How do I go about using the results of this to get a list of all the rest of the component data (component.*) where the id's from the first part are used to select this data?
SELECT DISTINCT c.part_no
FROM component c, widget w, widget_component wc, widget_manufacturer wm
WHERE(((c.id=wc.component_id AND wc.widget_id=w.id AND w.mfg_id=wm.id AND wm.id=1)))
(P.S. this is probably not the best way to do this, but I am still learning SQL.)
What I'd like to do is query the database and get a list of all
components that a specific manufacturer uses
There are several ways to do this. IN is probably the easiest to write
SELECT c.*
FROM component c
WHERE c.id IN (SELECT wc.component_id
               FROM widget w
               INNER JOIN widget_component wc
                       ON w.id = wc.widget_id
               WHERE w.mfg_id = 123)
The IN subquery finds all the component ids that a specific manufacturer uses. The outer query then selects any component whose id is in that result. It doesn't matter if it's in there once or 1000 times; it will only get the component record once.
The other ways of doing this are using an EXISTS sub query or using a join to the query (but then you do need to de-dup it)
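The three options can be compared side by side with a small sketch. It uses Python's sqlite3 module rather than Access, the tables are cut down to the columns that matter, and the sample rows are made up (manufacturer 123 uses the bolt in two different widgets).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE widget (id INTEGER PRIMARY KEY, name TEXT, mfg_id INTEGER);
    CREATE TABLE component (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE widget_component (
        id INTEGER PRIMARY KEY, widget_id INTEGER, component_id INTEGER
    );
    INSERT INTO widget VALUES (1, 'w1', 123), (2, 'w2', 123), (3, 'w3', 999);
    INSERT INTO component VALUES (1, 'bolt'), (2, 'nut'), (3, 'gear');
    INSERT INTO widget_component VALUES (1, 1, 1), (2, 2, 1), (3, 1, 2), (4, 3, 3);
""")

mfg = 123

in_rows = conn.execute(
    """
    SELECT c.* FROM component c
    WHERE c.id IN (SELECT wc.component_id
                   FROM widget w
                   JOIN widget_component wc ON w.id = wc.widget_id
                   WHERE w.mfg_id = ?)
    ORDER BY c.id
    """,
    (mfg,),
).fetchall()

exists_rows = conn.execute(
    """
    SELECT c.* FROM component c
    WHERE EXISTS (SELECT 1
                  FROM widget w
                  JOIN widget_component wc ON w.id = wc.widget_id
                  WHERE w.mfg_id = ? AND wc.component_id = c.id)
    ORDER BY c.id
    """,
    (mfg,),
).fetchall()

# A plain join repeats a component once per widget that uses it ...
join_sql = """
    SELECT {distinct} c.* FROM component c
    JOIN widget_component wc ON wc.component_id = c.id
    JOIN widget w ON w.id = wc.widget_id
    WHERE w.mfg_id = ?
    ORDER BY c.id
"""
join_rows = conn.execute(join_sql.format(distinct=""), (mfg,)).fetchall()
# ... so it needs DISTINCT to remove the duplicates.
distinct_rows = conn.execute(join_sql.format(distinct="DISTINCT"), (mfg,)).fetchall()

print(in_rows, exists_rows, join_rows, distinct_rows)
```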
It sounds like your component -to- widget relationship is one-to-many. Hence the duplicates. (i.e., the same component is used by more than one widget).
Your Select is almost OK --
SELECT c.*, wc.widget_id, w.mfg_id
but the wc.widget_id is causing the duplicates (per the assumption above).
So remove wc.widget_id from the SELECT, or else aggregate it (min, max, count, etc.). Removing is easier. If you aggregate, remember to add a GROUP BY clause.
Try this:
SELECT DISTINCT c.*, w.mfg_id
Also -- FWIW, it's generally a better practice to use field names, instead of the *