SQL Query Help (PostgreSQL)

I'm having trouble wrapping my head around a query. I have the following 3 tables:
documents (
id,
title
);
positions (
id,
title
);
documents_positions (
document_id,
position_id
);
What I am trying to get is a matrix of documents and the positions they apply to. Each row would have the document title, then a column for every position, with a column next to it saying True or False depending on whether the position applies to the document. I suspect some kind of LEFT JOIN is required, because on each row, after the document, I want to list every position from the positions table and whether or not the document applies to it. Make sense?

You could use a cross join to build the matrix, and then left join to find the positions in the matrix that are filled:
select d.title
, p.title
, case when dp.document_id is null then 'hrmm' else 'HALLELUJAH' end
from documents d
cross join
positions p
left join
documents_positions dp
on dp.document_id = d.id
and dp.position_id = p.id
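Here is a runnable sketch of the cross join plus left join. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the document and position titles are invented sample data. Instead of the placeholder strings it returns a true/false flag: in PostgreSQL `dp.document_id IS NOT NULL` yields a boolean, while SQLite returns 1 or 0.

```python
import sqlite3

# SQLite stands in for PostgreSQL; table/column names follow the question,
# the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE positions (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE documents_positions (document_id INTEGER, position_id INTEGER);
INSERT INTO documents VALUES (1, 'Handbook'), (2, 'Safety Policy');
INSERT INTO positions VALUES (1, 'Clerk'), (2, 'Driver');
INSERT INTO documents_positions VALUES (1, 1), (2, 1), (2, 2);
""")

# CROSS JOIN builds every (document, position) pair; the LEFT JOIN then
# marks which pairs actually exist in documents_positions.
MATRIX_SQL = """
SELECT d.title AS document,
       p.title AS position,
       dp.document_id IS NOT NULL AS applies
FROM documents d
CROSS JOIN positions p
LEFT JOIN documents_positions dp
       ON dp.document_id = d.id
      AND dp.position_id = p.id
ORDER BY d.id, p.id
"""

rows = conn.execute(MATRIX_SQL).fetchall()
for row in rows:
    print(row)
```

Every document appears once per position, so the result always has (number of documents) x (number of positions) rows.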

Since you want to turn position rows into columns, you have to "pivot" them. In PostgreSQL this is done with the crosstab function from the tablefunc extension. However, crosstab seems to require you to define the output columns in the query, which you can't do if the number of rows in positions is subject to change.
I'm not a PostgreSQL user myself, so perhaps there is some trick for building the dynamic query that I don't know of, but it seems easier to use a query like the one Andomar posted and then pivot the rows in your client code.
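Pivoting in the client might look like this minimal Python sketch. It assumes the rows arrive as (document, position, applies) tuples, for example from the cross join query; the function name `pivot` and the sample rows are hypothetical.

```python
def pivot(rows):
    """Turn (document, position, applies) rows into one row per document.

    The column list is taken from whatever positions appear in the data,
    so it adapts when positions are added or removed.
    """
    positions = []  # column order, in first-seen order
    matrix = {}     # document -> {position: bool}
    for document, position, applies in rows:
        if position not in positions:
            positions.append(position)
        matrix.setdefault(document, {})[position] = bool(applies)
    return positions, matrix


rows = [
    ("Handbook", "Clerk", 1),
    ("Handbook", "Driver", 0),
    ("Safety Policy", "Clerk", 1),
    ("Safety Policy", "Driver", 1),
]
positions, matrix = pivot(rows)

# Print a simple header plus one line per document.
print(["document"] + positions)
for document, flags in matrix.items():
    print([document] + [flags[p] for p in positions])
```

Because the column list is discovered at run time, nothing has to be redeclared when the positions table changes, which is the limitation of crosstab described above.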

Related

In PostgreSQL, return rows with unique values of one column based on the minimum value of another

Background
I've got this PostgreSQL join that works pretty well for me:
select m.id,
m.zodiac_sign,
m.favorite_color,
m.state,
c.combined_id
from people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c on m.id = c.id
As you can see, I'm joining two tables to bring in a combined_id, which I need for later analysis elsewhere.
The Goal
I'd like to write a query that returns one row per combined_id, picking the row with the lowest m.id for that combined_id (along with the other variables too). This ought to result in a new table with unique/distinct values of combined_id.
The Problem
The issue is that the current query returns ~300 records, but I need it to return ~100. Why? Each combined_id has, on average, 3 different m.id's. I don't actually care about the m.id's; I care about getting a unique combined_id. Because of this, I decided that a good "selection criterion" would be to select rows based on the lowest value m.id for rows with the same combined_id.
What I've tried
I've consulted several posts on this and I feel like I'm fairly close. See for instance this one or this one. This other one does exactly what I need (with MAX instead of MIN) but he's asking for it in Unix Bash 😞
Here's an example of something I've tried:
select m.id,
m.zodiac_sign,
m.favorite_color,
m.state,
c.combined_id
from people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c on m.id = c.id
WHERE m.id IN (select min(m.id))
This returns the error ERROR: aggregate functions are not allowed in WHERE.
Any ideas?
Postgres's DISTINCT ON is probably the best approach here:
SELECT DISTINCT ON (c.combined_id)
m.id,
m.zodiac_sign,
m.favorite_color,
m.state,
c.combined_id
FROM people."People" m
LEFT JOIN people.person_to_person_composite_crosstable c
ON m.id = c.id
ORDER BY
c.combined_id,
m.id;
As for performance, the following index on the crosstable might speed up the query:
CREATE INDEX idx ON people.person_to_person_composite_crosstable (id, combined_id);
If used, the above index should let the join happen faster. Note that the index also covers the combined_id column, which the SELECT needs.
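SQLite has no DISTINCT ON, so this Python/sqlite3 sketch emulates the same "lowest m.id per combined_id" rule with MIN() per group, joined back to the rows. It assumes flattened table names (no `people` schema) and invented data. `IS` is SQLite's null-safe equality, so people without a combined_id collapse into a single group, as they would under DISTINCT ON; PostgreSQL spells that comparison `IS NOT DISTINCT FROM`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE People (id INTEGER PRIMARY KEY, zodiac_sign TEXT,
                     favorite_color TEXT, state TEXT);
CREATE TABLE person_to_person_composite_crosstable (id INTEGER, combined_id INTEGER);
INSERT INTO People VALUES
  (1, 'Aries', 'red', 'NY'), (2, 'Leo', 'blue', 'CA'),
  (3, 'Virgo', 'green', 'TX'), (4, 'Libra', 'black', 'FL'),
  (5, 'Pisces', 'white', 'WA');
-- person 5 has no combined_id at all
INSERT INTO person_to_person_composite_crosstable VALUES
  (2, 100), (1, 100), (4, 200), (3, 200);
""")

# The derived table finds the lowest m.id per combined_id; joining on both
# the id and the combined_id keeps exactly that row from each group.
QUERY = """
SELECT m.id, m.zodiac_sign, m.favorite_color, m.state, c.combined_id
FROM People m
LEFT JOIN person_to_person_composite_crosstable c ON m.id = c.id
JOIN (
    SELECT c2.combined_id, MIN(m2.id) AS min_id
    FROM People m2
    LEFT JOIN person_to_person_composite_crosstable c2 ON m2.id = c2.id
    GROUP BY c2.combined_id
) firsts
  ON firsts.min_id = m.id
 AND firsts.combined_id IS c.combined_id
ORDER BY m.id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```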

Insert Into where not exists from specific category

I have a table that contains several repair categories, and items that are associated with each repair category. I am trying to insert all the standard items from a specific repair category that don't already exist into a Details table.
TblEstimateDetails is a join table for an Estimate Table and StandardItem Table. And TblCategoryItems is a join table for the Repair Categories and their respective Standard Items.
For example, in the attached image, the left side shows all the Standard Items in a Repair Category, and the right side shows all the Standard Items that are already in the EstimateDetails table.
[Image: Standard Items All vs Already Included]
I need to insert the 6 missing GUIDs from the left into the table on the right, and only for a specific estimate GID.
This is being used in an Access VBA script, which I will translate into the appropriate code once I get the sql syntax correct.
Thank you!
INSERT INTO TblEstimateDetails(estimate_GID, standard_item_GID)
SELECT
'55DEEE29-7B79-4830-909C-E59E831F4297' AS estimate_GID
, standard_item_GID
FROM TblCategoryItems
WHERE repair_category_GID = '32A8AE6D-A512-4868-8E1A-EF0357AB100E'
AND NOT EXISTS
(SELECT standard_item_GID
FROM TblEstimateDetails
WHERE estimate_GID = '55DEEE29-7B79-4830-909C-E59E831F4297');
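Some things to try: 1) simplify it to a SELECT query to see whether it selects the right records; 2) use NOT IN instead of NOT EXISTS. NOT EXISTS can work too, but only if its subquery is correlated with the outer row; as written, the subquery ignores standard_item_GID, so it finds a row whenever the estimate has any details at all, and nothing gets inserted.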
SELECT '55DEEE29-7B79-4830-909C-E59E831F4297' AS estimate_GID,
standard_item_GID
FROM TblCategoryItems
WHERE repair_category_GID = '32A8AE6D-A512-4868-8E1A-EF0357AB100E'
AND standard_item_GID NOT IN
(SELECT standard_item_GID FROM TblEstimateDetails
WHERE estimate_GID = '55DEEE29-7B79-4830-909C-E59E831F4297');
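The difference between the question's uncorrelated NOT EXISTS and the NOT IN version can be reproduced in SQLite through Python. Short hypothetical IDs replace the GUIDs, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TblCategoryItems (repair_category_GID TEXT, standard_item_GID TEXT);
CREATE TABLE TblEstimateDetails (estimate_GID TEXT, standard_item_GID TEXT);
INSERT INTO TblCategoryItems VALUES ('cat1', 'A'), ('cat1', 'B'), ('cat1', 'C');
INSERT INTO TblEstimateDetails VALUES ('est1', 'A');
""")

# Question's version: the subquery never refers to the outer row, so
# NOT EXISTS is false for every row as soon as est1 has any detail.
UNCORRELATED = """
SELECT standard_item_GID FROM TblCategoryItems
WHERE repair_category_GID = 'cat1'
  AND NOT EXISTS (SELECT standard_item_GID FROM TblEstimateDetails
                  WHERE estimate_GID = 'est1')
"""

# NOT IN checks each outer item against the estimate's list of items.
NOT_IN = """
SELECT standard_item_GID FROM TblCategoryItems
WHERE repair_category_GID = 'cat1'
  AND standard_item_GID NOT IN (SELECT standard_item_GID FROM TblEstimateDetails
                                WHERE estimate_GID = 'est1')
ORDER BY standard_item_GID
"""

uncorrelated = [r[0] for r in conn.execute(UNCORRELATED)]
not_in = [r[0] for r in conn.execute(NOT_IN)]
print(uncorrelated)
print(not_in)
```

One caveat with NOT IN: if the subquery can return a NULL standard_item_GID, the whole NOT IN test evaluates to unknown and no rows are returned.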
Got it figured out. The NOT EXISTS subquery needs to be correlated with the main query to work, so I made the WHERE clause in the subquery compare against the matching columns in the main query. I also had to join the Estimates table so that it picks only the items for a specific estimate.
SELECT
'06A2E0A9-9AE5-4073-A856-1CCE6D9C48BB' AS estimate_GID
, CI.standard_item_GID
FROM TblCategoryItems CI
INNER JOIN TblEstimates E ON CI.repair_category_GID=E.repair_category_GID
WHERE E.estimate_GID = '06A2E0A9-9AE5-4073-A856-1CCE6D9C48BB'
AND E.repair_category_GID = '15238097-305E-4456-B86F-3787C9B8219B'
AND NOT EXISTS
(SELECT ED.standard_item_GID
FROM TblEstimateDetails ED
WHERE E.estimate_GID=ED.estimate_GID
AND CI.standard_item_GID=ED.standard_item_GID
);
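This is a sketch of the correlated pattern wrapped in the INSERT, run in SQLite through Python as a stand-in for Access. The IDs are shortened, the data is invented, and the `?` placeholder is sqlite3's parameter style, not Access's. The subquery is correlated on the item as well as the estimate. Correlating on the estimate alone would still reject every item once the estimate had any detail row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TblCategoryItems (repair_category_GID TEXT, standard_item_GID TEXT);
CREATE TABLE TblEstimates (estimate_GID TEXT, repair_category_GID TEXT);
CREATE TABLE TblEstimateDetails (estimate_GID TEXT, standard_item_GID TEXT);
INSERT INTO TblCategoryItems VALUES ('cat1','A'), ('cat1','B'), ('cat1','C'), ('cat2','D');
INSERT INTO TblEstimates VALUES ('est1','cat1'), ('est2','cat1');
INSERT INTO TblEstimateDetails VALUES ('est1','A'), ('est2','B');
""")

# Only items of the estimate's category that are missing from THIS
# estimate pass the NOT EXISTS test.
INSERT_MISSING = """
INSERT INTO TblEstimateDetails (estimate_GID, standard_item_GID)
SELECT E.estimate_GID, CI.standard_item_GID
FROM TblCategoryItems CI
INNER JOIN TblEstimates E ON CI.repair_category_GID = E.repair_category_GID
WHERE E.estimate_GID = ?
  AND NOT EXISTS (SELECT 1 FROM TblEstimateDetails ED
                  WHERE ED.estimate_GID = E.estimate_GID
                    AND ED.standard_item_GID = CI.standard_item_GID)
"""

conn.execute(INSERT_MISSING, ("est1",))
details = conn.execute(
    "SELECT estimate_GID, standard_item_GID FROM TblEstimateDetails "
    "ORDER BY estimate_GID, standard_item_GID"
).fetchall()
print(details)
```

Running the insert a second time adds nothing, because every item is then already present for that estimate.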

Why does my left join in Access have fewer rows than the left table?

I have two tables in an MS Access 2010 database: TBLIndividuals and TblIndividualsUpdates. They have a lot of the same data, but the primary key may not be the same for a given person's record in both tables. So I'm doing a join between the two tables on names and birthdates to see which records correspond. I'm using a left join so that I also get rows for the people who are in TblIndividualsUpdates but not in TBLIndividuals. That way I know which records need to be added to TBLIndividuals to get it up to date.
SELECT TblIndividuals.PersonID AS OldID,
TblIndividualsUpdates.PersonID AS UpdateID
FROM TblIndividualsUpdates LEFT JOIN TblIndividuals
ON ( (TblIndividuals.FirstName = TblIndividualsUpdates.FirstName)
and (TblIndividuals.LastName = TblIndividualsUpdates.LastName)
AND (TblIndividuals.DateBorn = TblIndividualsUpdates.DateBorn
or (TblIndividuals.DateBorn is null
and (TblIndividuals.MidName is null and TblIndividualsUpdates.MidName is null
or TblIndividuals.MidName = TblIndividualsUpdates.MidName))));
TblIndividualsUpdates has 4149 rows, but the query returns only 4103 rows. There are about 50 new records in TblIndividualsUpdates, but only 4 rows in the query result where OldID is null.
If I export the data from Access to PostgreSQL and run the same query there, I get all 4149 rows.
Is this a bug in Access? Is there a difference between Access's left join semantics and PostgreSQL's? Is my database corrupted (Compact and Repair doesn't help)?
ON (
TblIndividuals.FirstName = TblIndividualsUpdates.FirstName
and
TblIndividuals.LastName = TblIndividualsUpdates.LastName
AND (
TblIndividuals.DateBorn = TblIndividualsUpdates.DateBorn
or
(
TblIndividuals.DateBorn is null
and
(
TblIndividuals.MidName is null
and TblIndividualsUpdates.MidName is null
or TblIndividuals.MidName = TblIndividualsUpdates.MidName
)
)
)
);
What I would do is systematically remove all the join conditions except the first two until you find the records drop off. Then you will know where your problem is.
This should never happen. Unless rows are being inserted/deleted in the meantime,
the query:
SELECT *
FROM a LEFT JOIN b
ON whatever ;
should never return fewer rows than:
SELECT *
FROM a ;
If it happens, it's a bug. Are you sure the queries are exactly like this (and you haven't omitted some detail, like a WHERE clause)? Are you sure that the first returns 4103 rows and the second one 4149 rows? You could make another check by changing the * above to COUNT(*).
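The COUNT(*) comparison can be tried on any engine. This sketch uses SQLite via Python with two invented tables, `a` and `b`. A left-table row that matches several right-table rows is repeated, so the joined count can be larger than the left table but never smaller. Unmatched rows show up with NULLs on the right side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, name TEXT);
CREATE TABLE b (id INTEGER, name TEXT);
INSERT INTO a VALUES (1, 'x'), (2, 'y'), (3, 'z');
INSERT INTO b VALUES (1, 'x'), (1, 'x'), (4, 'w');
""")

JOIN_ON = "FROM a LEFT JOIN b ON a.id = b.id AND a.name = b.name"

count_a = conn.execute("SELECT COUNT(*) FROM a").fetchone()[0]
# Every row of a survives at least once; row 1 matches twice.
count_join = conn.execute("SELECT COUNT(*) " + JOIN_ON).fetchone()[0]
# Rows of a with no match at all (the "OldID is null" rows in the question).
count_unmatched = conn.execute(
    "SELECT COUNT(*) " + JOIN_ON + " WHERE b.id IS NULL"
).fetchone()[0]
print(count_a, count_join, count_unmatched)
```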
Drop any indexes from both tables which include those JOIN fields (FirstName, LastName, and DateBorn). Then see whether you get the expected 4,149 rows with this simplified query.
SELECT
i.PersonID AS OldID,
u.PersonID AS UpdateID
FROM
TblIndividualsUpdates AS u
LEFT JOIN TblIndividuals AS i
ON
(
(i.FirstName = u.FirstName)
AND (i.LastName = u.LastName)
AND (i.DateBorn = u.DateBorn)
);
For whatever it is worth, since this seems to be an insidious bug and any additional information could help resolve it: I have had the same problem.
The query is too big to post here and I don't have the time to reduce it now to something suitable, but I can report what I found. In the below, all joins are left joins.
I was gradually refining and changing my query. It had a derived table in it (D), and the whole thing was made into a derived table (T) and then joined to a last table (L). At one point in its development, no field in T that originated in D participated in the join to L. That is when the problem occurred: the total number of rows mysteriously became smaller than the number of rows in the main table, which should be impossible. As soon as I again let a field from D participate (via T) in the join to L, the number returned to normal.
It was as if the join condition to D was moved to a WHERE clause when no field in it was participating (via T) in the join to L. But I don't really know what the explanation is.

SQL / Access - Left Join question, where do the values come from?

I have a cross tab query that looks like this:
State Building 1 2 3 4 5
NY
SC
FL
The problem I am having is that I want all of the states to show up, regardless of whether or not there is data. So, I need a Left Join. Unfortunately, when I replace the Inner Join with a Left Join in the code, nothing changes. I am just trying to figure out where the problem is coming from, and I think it may be one of the following causes:
1) The query doesn't know where to pull the values from (the states are all listed in a lookup, but this may not be where it's looking).
2) Left Joins don't work on crosstab queries.
Could someone please tell me what I am doing wrong?
Here's the SQL:
TRANSFORM Nz(Count(Demographics.ID))+0 AS CountOfID
SELECT Demographics.State
FROM Research
INNER JOIN ( Demographics
INNER JOIN [Status]
ON Demographics.ID=[Status].ID
)
ON (Research.ID=Demographics.ID)
AND (Research.ID=[Status].ID)
WHERE ((([Status].Building_Status)='Complete'))
GROUP BY Demographics.State,
[Status].Building_Status
PIVOT Research.Site In (1,2,3,4,5,6,7,8,9,10,11)
Ideally, I could specify the row values in an In list like the one above (which currently specifies the column values 1-11), but I don't think this can be done.
I didn't completely get your example, but from your comment "I want all of the states to show up, regardless of whether or not there is data", I think you want an "OUTER" join. Outer joins do just that -- they include data regardless of whether or not there is a "match". Inner joins (the default) include data only if there is a match.
Hope this helps,
John
You should change it this way:
TRANSFORM Nz(Count(Demographics.ID))+0 AS CountOfID
SELECT Demographics.State
FROM Demographics
LEFT JOIN ( Research
LEFT JOIN [Status]
ON Research.ID=[Status].ID
)
ON (Research.ID=Demographics.ID)
AND (Research.ID=[Status].ID)
WHERE ((([Status].Building_Status)='Complete'))
GROUP BY Demographics.State,
[Status].Building_Status
PIVOT Research.Site In (1,2,3,4,5,6,7,8,9,10,11)
If your problem is that there may be 0 rows in table Demographics with State = 'NY' (for example) but you want to see state 'NY' in the results anyway, then you need another table e.g. States that has all the states in it, and make this your driving table:
SELECT States.State
FROM States
LEFT OUTER JOIN Demographics ON Demographics.state = States.state
...
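Here is a reduced sketch of the driving-table idea in SQLite through Python. Access's TRANSFORM/PIVOT is left out, and the `States` table and all rows are assumptions for illustration. FL has no Demographics rows but still appears, because the outer join starts from States.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE States (State TEXT PRIMARY KEY);
CREATE TABLE Demographics (ID INTEGER, State TEXT);
INSERT INTO States VALUES ('FL'), ('NY'), ('SC');
INSERT INTO Demographics VALUES (1, 'NY'), (2, 'NY'), (3, 'SC');
""")

# States drives the query; COUNT(Demographics.ID) ignores the NULLs the
# outer join produces for states without data, giving 0 for them.
rows = conn.execute("""
SELECT States.State, COUNT(Demographics.ID) AS CountOfID
FROM States
LEFT OUTER JOIN Demographics ON Demographics.State = States.State
GROUP BY States.State
ORDER BY States.State
""").fetchall()
print(rows)
```

Any filter on the right-hand tables (such as the Building_Status condition) would need to go into the ON clause rather than the WHERE clause, or it would remove the states that have no data again.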
You can use a two-step solution (it is more readable).
First, wrap your current (working) SQL statement in a view:
create view step1 as
TRANSFORM ....
SELECT ...
FROM ...
PIVOT ...
Then create a SELECT like this
(you will need a state table, as Tony Andrews says):
select * -- or whatever
from state
left join step1 on state.id = step1.state
And that's all.
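The two-step approach can be sketched in SQLite through Python. SQLite has no TRANSFORM/PIVOT, so the view builds two site columns with conditional sums as a stand-in for the crosstab. Putting a Site column directly on Demographics is a simplifying assumption, and the data is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE state (id TEXT PRIMARY KEY);
CREATE TABLE Demographics (ID INTEGER, State TEXT, Site INTEGER);
INSERT INTO state VALUES ('FL'), ('NY'), ('SC');
INSERT INTO Demographics VALUES (1, 'NY', 1), (2, 'NY', 2), (3, 'SC', 1);

-- Step 1: the existing per-state query, saved as a view
-- (conditional sums stand in for the Access crosstab).
CREATE VIEW step1 AS
SELECT State,
       SUM(CASE WHEN Site = 1 THEN 1 ELSE 0 END) AS site1,
       SUM(CASE WHEN Site = 2 THEN 1 ELSE 0 END) AS site2
FROM Demographics
GROUP BY State;
""")

# Step 2: left join the full state list to the view; COALESCE turns the
# missing counts for states without data into 0.
rows = conn.execute("""
SELECT state.id, COALESCE(step1.site1, 0), COALESCE(step1.site2, 0)
FROM state
LEFT JOIN step1 ON state.id = step1.State
ORDER BY state.id
""").fetchall()
print(rows)
```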

Sql Query Linking Two Tables

A question about MSSQL.
There are two tables in a database.
Table 1, named Property, contains the fields PRPT_Id (int), PRPT_Name (varchar), PRPT_Status (bit).
Table 2, named PropertyImages, contains the fields PIMG_Id (int), PIMG_ImageName (varchar), PRPT_Id (int), PIMG_Status (bit).
These two tables have a one-to-many relationship: each Property can have zero, one, or more PropertyImages.
What is required is a query that displays:
- PRPT_Id
- PRPT_Name
- ImageCount: the count of all images for the PRPT_Id where PIMG_Status is true (0 if there aren't any images)
- FirstImageName: if there are n images, the name of the first image in the image table for the PRPT_Id with PIMG_Status true (blank/whitespace if there aren't any images)
Another condition is that PRPT_Status should be true.
Edit note: both tables have auto-incremented integers as primary keys.
So the first image name will be the one with MIN(PIMG_Id), isn't that so?
I want the PIMG_ImageName corresponding to MIN(PIMG_Id) in the result set.
Assuming that FirstImage means the one with the lowest Id, then this should be at least close enough to test to completion:
SELECT
p.PRPT_Id,
p.PRPT_Name,
ISNULL(pi1.PIMG_ImageName, '') AS FirstImageName,
COUNT(pi.PIMG_Id) AS ImageCount
FROM Property AS p
LEFT JOIN PropertyImages AS pi
ON p.PRPT_Id = pi.PRPT_Id
AND pi.PIMG_Status = 1
LEFT JOIN PropertyImages AS pi1
ON p.PRPT_Id = pi1.PRPT_Id
AND pi1.PIMG_Status = 1
LEFT JOIN PropertyImages AS pi2
ON p.PRPT_Id = pi2.PRPT_Id
AND pi2.PIMG_Status = 1
AND pi1.PIMG_Id > pi2.PIMG_Id
WHERE p.PRPT_Status = 1
AND pi2.PIMG_Id IS NULL
GROUP BY p.PRPT_Id, p.PRPT_Name, pi1.PIMG_ImageName
The double LEFT JOIN (pi1 and pi2) ensures that pi1 holds the first image record. If the "first" rule is different, adjust this join accordingly.
This should be about as efficient as possible. It has no subqueries.
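The exclusion-join approach can be tested in SQLite through Python with invented rows. All three image aliases are restricted to active images, and `COUNT(pi.PIMG_Id)` makes a property without images report 0. SQLite uses COALESCE where SQL Server uses ISNULL, and 0/1 integers stand in for bit columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Property (PRPT_Id INTEGER PRIMARY KEY, PRPT_Name TEXT, PRPT_Status INTEGER);
CREATE TABLE PropertyImages (PIMG_Id INTEGER PRIMARY KEY, PIMG_ImageName TEXT,
                             PRPT_Id INTEGER, PIMG_Status INTEGER);
INSERT INTO Property VALUES (1, 'Villa', 1), (2, 'Cabin', 1), (3, 'Shed', 0);
INSERT INTO PropertyImages VALUES
  (10, 'villa_old.jpg', 1, 0),
  (11, 'villa_front.jpg', 1, 1),
  (12, 'villa_pool.jpg', 1, 1),
  (13, 'shed.jpg', 3, 1);
""")

QUERY = """
SELECT p.PRPT_Id,
       p.PRPT_Name,
       COALESCE(pi1.PIMG_ImageName, '') AS FirstImageName,
       COUNT(pi.PIMG_Id) AS ImageCount
FROM Property AS p
LEFT JOIN PropertyImages AS pi
       ON pi.PRPT_Id = p.PRPT_Id AND pi.PIMG_Status = 1
LEFT JOIN PropertyImages AS pi1
       ON pi1.PRPT_Id = p.PRPT_Id AND pi1.PIMG_Status = 1
-- pi2 looks for an active image with a lower id than pi1; keeping only
-- rows where none exists leaves pi1 as the first active image.
LEFT JOIN PropertyImages AS pi2
       ON pi2.PRPT_Id = p.PRPT_Id AND pi2.PIMG_Status = 1
      AND pi2.PIMG_Id < pi1.PIMG_Id
WHERE p.PRPT_Status = 1
  AND pi2.PIMG_Id IS NULL
GROUP BY p.PRPT_Id, p.PRPT_Name, pi1.PIMG_ImageName
ORDER BY p.PRPT_Id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```

The inactive image 10 is skipped, so villa_front.jpg (id 11) counts as the Villa's first image. The Shed is excluded because its own PRPT_Status is 0.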
It seems that you have to write nested queries to display what you want.
If that's the case (I'm no SQL expert), I'd recommend starting with the innermost query and working outward until you reach the outermost (final) query.
First, you have to retrieve the PIMGs and group them by PRPT.
SELECT PRPT_Id, COUNT(PIMG_Id) AS PRPT_ImageCount, MIN(PIMG_Id) AS PRPT_MinImage
FROM PropertyImages
WHERE PIMG_Status = 1
GROUP BY PRPT_Id
That retrieves the PRPT_Id's of the properties that do have associated images. However, you don't get any results for the properties that don't have any associated images.
After that, we left join the Property table with the previous query. The left join ensures that all properties are retrieved, even if they don't appear in the results of the right-hand query (that is, even if they don't have any associated images).
SELECT Property.*, PRPT_ImageCount, PRPT_MinImage
FROM Property LEFT JOIN (
SELECT PRPT_Id, COUNT(PIMG_Id) AS PRPT_ImageCount, MIN(PIMG_Id) AS PRPT_MinImage
FROM PropertyImages
WHERE PIMG_Status = 1
GROUP BY PRPT_Id ) Temp ON ( Property.PRPT_Id = Temp.PRPT_Id )
WHERE Property.PRPT_Status = 1
I hope that my SQL isn't wrong and that this post helps you.
Regards,
Edit:
SELECT Property.*,
ISNULL(PRPT_ImageCount, 0) AS ImageCount,
PRPT_MinImage,
ISNULL(PIMG_ImageName, '') AS FirstImageName
FROM Property LEFT JOIN
( SELECT PRPT_Id,
COUNT(PIMG_Id) AS PRPT_ImageCount,
MIN(PIMG_Id) AS PRPT_MinImage
FROM PropertyImages
WHERE PIMG_Status = 1
GROUP BY PRPT_Id ) Temp1
ON ( Property.PRPT_Id = Temp1.PRPT_Id )
LEFT JOIN PropertyImages
ON ( PropertyImages.PIMG_Id = Temp1.PRPT_MinImage )
WHERE Property.PRPT_Status = 1
Now, I'm really unsure of my SQL.
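The nested-query approach can be checked the same way. This sketch runs in SQLite through Python with invented rows. It joins the aggregated derived table back to PropertyImages on the minimum id and filters on both status flags. COALESCE (SQLite's counterpart to ISNULL) produces 0 and a blank name when a property has no active images.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Property (PRPT_Id INTEGER PRIMARY KEY, PRPT_Name TEXT, PRPT_Status INTEGER);
CREATE TABLE PropertyImages (PIMG_Id INTEGER PRIMARY KEY, PIMG_ImageName TEXT,
                             PRPT_Id INTEGER, PIMG_Status INTEGER);
INSERT INTO Property VALUES (1, 'Villa', 1), (2, 'Cabin', 1), (3, 'Shed', 0);
INSERT INTO PropertyImages VALUES
  (10, 'villa_old.jpg', 1, 0),
  (11, 'villa_front.jpg', 1, 1),
  (12, 'villa_pool.jpg', 1, 1),
  (13, 'shed.jpg', 3, 1);
""")

# Temp1 aggregates the active images per property; the second LEFT JOIN
# fetches the name of the image whose id is the minimum.
QUERY = """
SELECT Property.PRPT_Id,
       Property.PRPT_Name,
       COALESCE(Temp1.PRPT_ImageCount, 0) AS ImageCount,
       COALESCE(PropertyImages.PIMG_ImageName, '') AS FirstImageName
FROM Property
LEFT JOIN (SELECT PRPT_Id,
                  COUNT(PIMG_Id) AS PRPT_ImageCount,
                  MIN(PIMG_Id) AS PRPT_MinImage
           FROM PropertyImages
           WHERE PIMG_Status = 1
           GROUP BY PRPT_Id) AS Temp1
       ON Property.PRPT_Id = Temp1.PRPT_Id
LEFT JOIN PropertyImages
       ON PropertyImages.PIMG_Id = Temp1.PRPT_MinImage
WHERE Property.PRPT_Status = 1
ORDER BY Property.PRPT_Id
"""

rows = conn.execute(QUERY).fetchall()
print(rows)
```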
the name of the first image in the image table corresponding to the PRPT_Id with PIMG_Status true
You may want to define "first" in this context. Tables aren't really ordered, so unless you keep your own ordering, "first" has to mean "first one found".
Assuming the above is true (you want the first image found), that's the only really hard part of the query (and the type of thing that's tripped me up before). The rest of it seems fairly straightforward. If I can find the time tomorrow, I'll try to put something together for you, but it seems likely someone else will provide the answer before then.