"Simple" SQL Query - sql

Each of my clients can have many todo items and every todo item has a due date.
What would be the query for discovering the next undone todo item by due date for each client? In the event that a client has more than one todo with the same due date, the one with the lowest id is the correct one.
Assuming the following minimal schema:
clients (id, name)
todos (id, client_id, description, timestamp_due, timestamp_completed)
Thank you.

I haven't tested this yet, so you may have to tweak it:
SELECT
TD1.client_id,
TD1.id,
TD1.description,
TD1.timestamp_due
FROM
Todos TD1
LEFT OUTER JOIN Todos TD2 ON
TD2.client_id = TD1.client_id AND
TD2.timestamp_completed IS NULL AND
(
TD2.timestamp_due < TD1.timestamp_due OR
(TD2.timestamp_due = TD1.timestamp_due AND TD2.id < TD1.id)
)
WHERE
TD1.timestamp_completed IS NULL AND
TD2.id IS NULL
Instead of trying to sort and aggregate, you're basically answering the question, "Is there any other todo that would come before this one?" (based on your definition of "before"). If not, then this is the one that you want.
This should be valid on most SQL platforms.
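As a rough check of the "is there any other todo that would come before this one?" idea, here is a minimal sketch that runs the anti-join against an in-memory SQLite database from Python. The sample rows and the helper name next_todos are made up for illustration; only the todos columns come from the question. The query in the sketch filters TD1 on timestamp_completed IS NULL as well, so a completed todo is never reported.

```python
import sqlite3

# Hypothetical sample data; only the column names come from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE todos (
        id INTEGER PRIMARY KEY,
        client_id INTEGER,
        description TEXT,
        timestamp_due TEXT,
        timestamp_completed TEXT
    );
    INSERT INTO todos VALUES
        (1, 1, 'call back',    '2024-01-05', NULL),
        (2, 1, 'send invoice', '2024-01-03', NULL),
        (3, 1, 'file report',  '2024-01-03', NULL),
        (4, 1, 'old task',     '2024-01-01', '2024-01-02'),
        (5, 2, 'kickoff',      '2024-02-01', NULL);
""")


def next_todos(conn):
    """Return (client_id, todo_id) for each client's next open todo."""
    sql = """
        SELECT TD1.client_id, TD1.id
        FROM todos TD1
        LEFT OUTER JOIN todos TD2 ON
            TD2.client_id = TD1.client_id AND
            TD2.timestamp_completed IS NULL AND
            (
                TD2.timestamp_due < TD1.timestamp_due OR
                (TD2.timestamp_due = TD1.timestamp_due AND TD2.id < TD1.id)
            )
        WHERE TD1.timestamp_completed IS NULL  -- skip todos that are already done
          AND TD2.id IS NULL                   -- nothing open comes before TD1
        ORDER BY TD1.client_id
    """
    return conn.execute(sql).fetchall()


print(next_todos(conn))
```

For client 1, todos 2 and 3 share the earliest open due date, so the lower id (2) wins; todo 4 is due earlier but is already completed.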

This question is the classic pick-a-winner for each group. It gets posted about twice a day.
SELECT *
FROM todos t
WHERE t.timestamp_completed is null
and
(
SELECT top 1 t2.id
FROM todos t2
WHERE t.client_id = t2.client_id
and t2.timestamp_completed is null
--there is no earlier record
and
(t.timestamp_due > t2.timestamp_due
or (t.timestamp_due = t2.timestamp_due and t.id > t2.id)
)
) is null

SELECT c.name, MIN(t.id)
FROM clients c, todos t
WHERE c.id = t.client_id AND t.timestamp_completed IS NULL
GROUP BY c.id
HAVING t.timestamp_due <= MIN(t.timestamp_due)
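Avoids a subquery, correlated or otherwise, but introduces a bunch of aggregate operations, which aren't much better. Note that referencing the ungrouped t.timestamp_due in HAVING is only accepted by engines with permissive GROUP BY handling, such as MySQL with ONLY_FULL_GROUP_BY disabled.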

Some Jet SQL. I realize it is unlikely that the questioner is using Jet, but the reader may be.
SELECT c.name, t.description, t.timestamp_due
FROM (clients c
INNER JOIN
(SELECT t.client_id, Min(t.id) AS MinOfid
FROM todos t
WHERE t.timestamp_completed Is Null
GROUP BY t.client_id) AS tm
ON c.id = tm.client_id)
INNER JOIN todos t ON tm.MinOfid = t.id

The following should get you close: first get the minimum due date for each client, then look up the client and todo information.
SELECT
C.Id,
C.Name,
T.Id,
T.Description,
T.timestamp_due
FROM
(
SELECT
client_id,
MIN(timestamp_due) AS "DueDate"
FROM todos
WHERE timestamp_completed IS NULL
GROUP BY client_id
) AS MinValues
INNER JOIN Clients C
ON (MinValues.client_id = C.Id)
INNER JOIN todos T
ON (MinValues.client_id = T.client_id
AND MinValues.DueDate = T.timestamp_due
AND T.timestamp_completed IS NULL)
ORDER BY C.Name
NOTE: Written assuming SQL Server

Related

postgres: LEFT JOIN table and field does not exist

This is my query
SELECT org.id,
org.name,
org.type,
org.company_logo,
(SELECT org_profile.logo_url FROM org_profile WHERE org_profile.org_id = org.id AND org_profile.status = 'active' ORDER BY org_profile.id DESC LIMIT 1) as logo_url,
org.short_description,
org_profile.value_prop,
count(*) OVER () AS total
FROM org
LEFT JOIN user_info ON user_info.id = org.approved_by
INNER JOIN (select distinct org_profile.org_id from org_profile) org_profile ON org_profile.org_id = org.id
WHERE
org.type = 'Fintech'
AND org.status = 'APPROVED'
AND org.draft != TRUE
ORDER BY org.id DESC
I am joining my org_profile table through a subquery that uses DISTINCT to get unique org ids, but the org_profile.value_prop column does not work. The error says: column org_profile.value_prop does not exist
I have been trying to solve this, but I can't figure it out.
Basically, the error tells you that you are trying to read the value_prop field from the org_profile subquery, which only returns org_id, so the field does not exist there.
It's difficult to give a working query without being able to run it, but I think that:
it's worth giving each subquery a handy alias
deduplication, if required, should be done in the subquery. When multiple fields are selected, DISTINCT may be insufficient and a RANK function may be required.
you fetch logo_url with a scalar subquery, which seems a bit strange given that the same table is also used in the JOIN. I would suggest moving all the logic related to org_profile into the subquery. A scalar subquery throws an error if it returns more than one row.
SELECT
org.id,
org.name,
org.type,
org.company_logo,
prof.logo_url,
org.short_description,
prof.value_prop,
count(*) OVER () AS total
FROM org
JOIN (
select distinct org_id, logo_url, value_prop -- other deduplication type (RANK) may be required
from org_profile
where status = 'active' -- ?
) prof ON org.id = prof.org_id
LEFT JOIN user_info usr ON usr.id = org.approved_by
WHERE
org.type = 'Fintech'
AND org.status = 'APPROVED'
AND org.draft != TRUE
ORDER BY org.id DESC
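To illustrate the remark about DISTINCT versus a ranking function, here is a small sketch using SQLite from Python. It assumes SQLite 3.25 or newer (for window functions), uses invented sample rows, and uses ROW_NUMBER rather than RANK so that exactly one row per org survives. The ORDER BY id DESC mirrors the "latest active profile" rule from the scalar subquery in the question.

```python
import sqlite3

# Invented sample rows; org 1 has two active profiles.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE org_profile (
        id INTEGER PRIMARY KEY, org_id INTEGER,
        logo_url TEXT, value_prop TEXT, status TEXT
    );
    INSERT INTO org_profile VALUES
        (1, 1, 'a.png',   'vp1', 'active'),
        (2, 1, 'b.png',   'vp2', 'active'),
        (3, 2, 'c.png',   'vp3', 'active'),
        (4, 2, 'old.png', 'vp0', 'inactive');
""")

# DISTINCT over several columns still returns both active rows for org 1.
distinct_rows = conn.execute("""
    SELECT DISTINCT org_id, logo_url, value_prop
    FROM org_profile
    WHERE status = 'active'
""").fetchall()

# ROW_NUMBER keeps only the newest active profile (highest id) per org.
latest_rows = conn.execute("""
    SELECT org_id, logo_url, value_prop
    FROM (
        SELECT org_id, logo_url, value_prop,
               ROW_NUMBER() OVER (PARTITION BY org_id ORDER BY id DESC) AS rn
        FROM org_profile
        WHERE status = 'active'
    ) AS prof
    WHERE rn = 1
    ORDER BY org_id
""").fetchall()

print(len(distinct_rows), latest_rows)
```

Joining org to the second subquery would give one row per org, whereas joining to the DISTINCT version would duplicate org 1.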

LEFT JOIN not keeping only records that occur in a SELECT query

I have the following SQL select statement that I use to get a subset of products, or wines:
SELECT pv.SkProdVariantId AS id,
pa.Colour AS colour
FROM Dim.ProductVariant AS pv
JOIN ProductAttributes_new AS pa
ON pv.SkProdVariantId = pa.SkProdVariantId
WHERE pv.ProdTypeName = 'Wines'
This query returns 3,905 rows. I want to get all the transactional data for these products.
At the moment I'm using this select statement
SELECT c.CalDate AS timestamp,
f.SkProductVariantId AS sku_id,
f.Quantity AS quantity
FROM fact.FTransactions AS f
LEFT JOIN Dim.Calendar AS c
ON f.SkDateId = c.SkDateId
LEFT JOIN (
SELECT pv.SkProdVariantId AS id,
pa.Colour AS colour
FROM Dim.ProductVariant AS pv
JOIN ProductAttributes_new AS pa
ON pv.SkProdVariantId = pa.SkProdVariantId
WHERE pv.ProdTypeName = 'Wines'
) AS s
ON s.id = f.SkProductVariantId
WHERE c.CalDate LIKE '%2019%'
The calendar dates are correct, but the number of unique products returned is 5,648, rather than the expected 3,905 from the select query.
Why does my LEFT JOIN on the first select query not work as I expect it to, please?
Thanks for any help!
If you want all the rows from your query, it needs to be the first reference in the LEFT JOIN. Then, I am guessing that you want transactions in 2019:
select . . .
from (SELECT pv.SkProdVariantId AS id, pa.Colour AS colour
FROM Dim.ProductVariant pv JOIN
ProductAttributes_new pa
ON pv.SkProdVariantId = pa.SkProdVariantId
WHERE pv.ProdTypeName = 'Wines'
) s LEFT JOIN
(fact.FTransactions f JOIN
Dim.Calendar c
ON f.SkDateId = c.SkDateId AND
c.CalDate >= '2019-01-01' AND
c.CalDate < '2020-01-01'
)
ON s.id = f.SkProductVariantId;
Note that this assumes that CalDate is really a date and not a string. LIKE should only be used on strings.
You have somewhat misunderstood how outer joins work. See Gordon's answer and my comment on it.
As to the task: It seems you want to select transactions of 2019, but you want to restrict your results to wine products. We typically restrict query results in the WHERE clause. You can use IN or EXISTS for that.
SELECT
c.CalDate AS timestamp,
f.SkProductVariantId AS sku_id,
f.Quantity AS quantity
FROM fact.FTransactions AS f
INNER JOIN Dim.Calendar AS c ON f.SkDateId = c.SkDateId
WHERE DATEPART(YEAR, c.CalDate) = 2019
AND f.SkProductVariantId IN
(
SELECT pv.SkProdVariantId
FROM Dim.ProductVariant AS pv
WHERE pv.ProdTypeName = 'Wines'
);
(I've removed the join to ProductAttributes_new, because it doesn't seem to play any part in this query.)
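To see why the LEFT JOIN in the question does not restrict the result, the following sketch compares it with the IN filter on a tiny in-memory SQLite database. The table and column names (transactions, products, prod_type) are simplified stand-ins, not the schema from the question, and the rows are invented.

```python
import sqlite3

# Simplified stand-in schema with invented rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, prod_type TEXT);
    CREATE TABLE transactions (
        id INTEGER PRIMARY KEY, product_id INTEGER, quantity INTEGER
    );
    INSERT INTO products VALUES (1, 'Wines'), (2, 'Wines'), (3, 'Beers');
    INSERT INTO transactions VALUES (10, 1, 5), (11, 3, 2), (12, 3, 7), (13, 4, 1);
""")

# LEFT JOIN keeps every row of the left table (transactions);
# non-wine rows simply get NULLs for the columns of s.
left_join_skus = conn.execute("""
    SELECT DISTINCT t.product_id
    FROM transactions t
    LEFT JOIN (SELECT id FROM products WHERE prod_type = 'Wines') s
        ON s.id = t.product_id
    ORDER BY t.product_id
""").fetchall()

# IN in the WHERE clause actually filters transactions down to wines.
in_skus = conn.execute("""
    SELECT DISTINCT t.product_id
    FROM transactions t
    WHERE t.product_id IN (SELECT id FROM products WHERE prod_type = 'Wines')
    ORDER BY t.product_id
""").fetchall()

print(left_join_skus, in_skus)
```

The LEFT JOIN version still reports products 3 and 4, which is the same effect as seeing 5,648 products instead of 3,905.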

SELECT statement where rows are omitted based on another table

My orders table has a related table with positions. I want to show the orders, but with only the most up-to-date position for each. Below is a picture of the 3 rows I want showing; the rest should be omitted.
SELECT DispatchTable.ordernumber, DispatchTable.truck,
DispatchTable.driver, DispatchTable.actualpickup,
DispatchTable.actualdropoff, orders.pickupdateandtime,
orders.dropoffdateandtime, Truck002.lastposition,
Truck002.lastdateandtime
FROM DispatchTable
INNER JOIN orders ON DispatchTable.ordernumber = orders.id
INNER JOIN Truck002 ON DispatchTable.truck = Truck002.name
WHERE (orders.status = 'onRoute')
Assuming that you want the row having the latest lastdateandtime for the truck name, this should work:
SELECT DispatchTable.ordernumber,
DispatchTable.truck,
DispatchTable.driver,
DispatchTable.actualpickup,
DispatchTable.actualdropoff,
orders.pickupdateandtime,
orders.dropoffdateandtime,
TruckLatest.lastposition,
TruckLatest.lastdateandtime
FROM DispatchTable
INNER JOIN orders ON DispatchTable.ordernumber = orders.id
INNER JOIN (SELECT name,
lastposition,
lastdateandtime
FROM Truck002 Truck1
WHERE lastdateandtime =
(SELECT MAX(lastdateandtime)
FROM Truck002 Truck2
WHERE Truck2.name = Truck1.name)) TruckLatest
ON DispatchTable.truck = TruckLatest.name
WHERE (orders.status = 'onRoute')
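The inner part of that query (keep only the row with the latest lastdateandtime per truck name) can be tried in isolation. This is a minimal sketch with SQLite from Python; the table name truck_positions and the rows are invented, and timestamps are stored as ISO-style strings so that MAX compares them correctly.

```python
import sqlite3

# Invented table and rows; timestamps are ISO-style strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE truck_positions (
        name TEXT, lastposition TEXT, lastdateandtime TEXT
    );
    INSERT INTO truck_positions VALUES
        ('T1', 'Depot',   '2020-01-01 10:00'),
        ('T1', 'Highway', '2020-01-01 12:00'),
        ('T2', 'Harbor',  '2020-01-01 09:00');
""")

# Correlated subquery: a row survives only if its timestamp equals
# the maximum timestamp among the rows for the same truck name.
latest = conn.execute("""
    SELECT t1.name, t1.lastposition
    FROM truck_positions t1
    WHERE t1.lastdateandtime = (SELECT MAX(t2.lastdateandtime)
                                FROM truck_positions t2
                                WHERE t2.name = t1.name)
    ORDER BY t1.name
""").fetchall()

print(latest)
```

If two rows for the same truck share the maximum timestamp, both are returned, so a tie-breaker would be needed in that case.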
If I understand correctly, you can get the most recent record for a truck using ROW_NUMBER():
SELECT dt.ordernumber, dt.truck,
dt.driver, dt.actualpickup,
dt.actualdropoff, o.pickupdateandtime,
o.dropoffdateandtime, t.lastposition,
t.lastdateandtime
FROM DispatchTable dt INNER JOIN
orders o
ON dt.ordernumber = o.id INNER JOIN
(SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY t.name ORDER BY t.lastdateandtime DESC) as seqnum
FROM Truck002 t
) t
ON dt.truck = t.name
WHERE o.status = 'onRoute' AND seqnum = 1;
Firstly, why are you using Truck002's name field rather than its id field as the link to DispatchTable? Joining on name is generally less efficient than joining on id (which is either a numeric field or a shorter string than name).
Secondly, you should mention in your question that each order can have many DispatchTable rows and that each DispatchTable row can have many Truck002 rows; otherwise many people will start by assuming that it is the other way round between DispatchTable and Truck002.
Thirdly, please try...
SELECT DispatchTable.ordernumber,
DispatchTable.truck,
DispatchTable.driver,
DispatchTable.actualpickup,
DispatchTable.actualdropoff,
orders.pickupdateandtime,
orders.dropoffdateandtime,
Truck002.lastposition,
Truck002.lastdateandtime
FROM DispatchTable
INNER JOIN orders ON DispatchTable.ordernumber = orders.id
INNER JOIN Truck002 ON DispatchTable.truck = Truck002.name
WHERE (orders.status = 'onRoute')
GROUP BY ordernumber
HAVING lastdateandtime = MAX( lastdateandtime )
If you have any questions or comments, then please feel free to post a Comment accordingly.
Further Reading
https://msdn.microsoft.com/en-us/library/bb177906(v=office.12).aspx (on HAVING)
https://www.w3schools.com/sql/sql_having.asp (on HAVING)
https://msdn.microsoft.com/en-us/library/bb177905(v=office.12).aspx (on GROUP BY)
https://www.w3schools.com/sql/sql_groupby.asp (on GROUP BY)

SELECT DISTINCT count() in Microsoft Access

I've created a database where we can track bugs we have raised with our developers (Table: ApplixCalls) and track any correspondence related to the logged bugs (Table: Correspondence).
I'm trying to create a count where we can see the number of bugs which have no correspondence or only correspondence from us. This should give us the visibility to see where we should be chasing our developers for updates etc.
So far I have this SQL:
SELECT DISTINCT Count(ApplixCalls.OurRef) AS CountOfOurRef
FROM ApplixCalls LEFT JOIN Correspondence ON ApplixCalls.OurRef = Correspondence.OurRef
HAVING (((Correspondence.OurRef) Is Null)
AND ((ApplixCalls.Position)<>'Closed'))
OR ((ApplixCalls.Position)<>'Closed')
AND ((Correspondence.[SBSUpdate?])=True);
I'm finding that this part is counting every occasion we have sent an update, when I need it to count 1 where OurRef is unique and it only has updates from us:
OR ((ApplixCalls.Position)<>'Closed')
AND ((Correspondence.[SBSUpdate?])=True);
Hopefully that makes sense...
Is there a way around this?
MS Access does not support count(distinct). In your case, you can use a subquery. In addition, your query should not work. Perhaps this is what you intend:
SELECT COUNT(*)
FROM (SELECT ApplixCalls.OurRef
FROM ApplixCalls LEFT JOIN
Correspondence
ON ApplixCalls.OurRef = Correspondence.OurRef
WHERE ((Correspondence.OurRef Is Null) AND (ApplixCalls.Position <> 'Closed')) OR
((ApplixCalls.Position <> 'Closed') AND (Correspondence.[SBSUpdate?] = True))
GROUP BY ApplixCalls.OurRef
) as x;
Modifications:
You have a HAVING clause with no GROUP BY. I think this should be a WHERE (although I am not 100% sure of the logic you intend).
The SELECT DISTINCT is replaced by SELECT . . . GROUP BY.
The COUNT(DISTINCT) is now COUNT(*) with a subquery.
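A quick way to see the difference between counting joined rows and counting distinct references through a subquery is the sketch below. It runs on SQLite from Python rather than Access, and the names calls, correspondence, state and sbs_update are simplified stand-ins for ApplixCalls, Correspondence, Position and [SBSUpdate?]; the rows are invented.

```python
import sqlite3

# Simplified stand-in schema with invented rows; 1/0 stand for True/False.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calls (our_ref INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE correspondence (
        id INTEGER PRIMARY KEY, our_ref INTEGER, sbs_update INTEGER
    );
    INSERT INTO calls VALUES (1, 'Open'), (2, 'Open'), (3, 'Closed'), (4, 'Open');
    INSERT INTO correspondence VALUES
        (1, 2, 1), (2, 2, 1), (3, 3, 1), (4, 4, 1), (5, 4, 0);
""")

# Same filter in both queries: open calls with no correspondence,
# or open calls with at least one update from us.
FILTER = """
    (co.our_ref IS NULL AND c.state <> 'Closed')
    OR (c.state <> 'Closed' AND co.sbs_update = 1)
"""

# Counting joined rows counts call 2 twice (it has two updates from us).
row_count = conn.execute(f"""
    SELECT COUNT(c.our_ref)
    FROM calls c LEFT JOIN correspondence co ON c.our_ref = co.our_ref
    WHERE {FILTER}
""").fetchone()[0]

# Grouping in a subquery first yields one row per call, then COUNT(*) counts calls.
distinct_count = conn.execute(f"""
    SELECT COUNT(*)
    FROM (SELECT c.our_ref
          FROM calls c LEFT JOIN correspondence co ON c.our_ref = co.our_ref
          WHERE {FILTER}
          GROUP BY c.our_ref) AS x
""").fetchone()[0]

print(row_count, distinct_count)
```

Note that call 4 is still counted here even though it also has a reply from the developers; excluding such calls needs an extra condition on the group.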
EDIT:
Based on the description in your comments:
SELECT COUNT(*)
FROM (SELECT ApplixCalls.OurRef
FROM ApplixCalls LEFT JOIN
Correspondence
ON ApplixCalls.OurRef = Correspondence.OurRef
WHERE ((Correspondence.OurRef Is Null) AND (ApplixCalls.Position <> 'Closed')) OR
((ApplixCalls.Position <> 'Closed') AND (Correspondence.[SBSUpdate?] = True))
GROUP BY ApplixCalls.OurRef
HAVING SUM(IIF(Correspondence.[SBSUpdate?] = False, 1, 0)) = 0
) as x;
I cannot understand why you are using a HAVING clause. I hope this query will fulfill your need.
SELECT DISTINCT Count(ApplixCalls.OurRef) AS CountOfOurRef
FROM ApplixCalls LEFT JOIN Correspondence ON ApplixCalls.OurRef = Correspondence.OurRef
HAVING (((Correspondence.OurRef) Is Null)
AND ((ApplixCalls.Position)<>'Closed'))
OR ((ApplixCalls.Position)<>'Closed')
AND ((Correspondence.[SBSUpdate?])=True);
If you are counting all the elements that match your condition, you don't need DISTINCT on the SELECT; DISTINCT is for removing duplicate results.
SELECT Count(distinct ApplixCalls.OurRef) AS CountOfOurRef
FROM ApplixCalls LEFT JOIN Correspondence ON ApplixCalls.OurRef = Correspondence.OurRef
WHERE (((Correspondence.OurRef) Is Null)
AND ((ApplixCalls.Position)<>'Closed'))
OR ((ApplixCalls.Position)<>'Closed')
AND ((Correspondence.[SBSUpdate?])=True);

Optimizing sql condition to apply condition to all dependent rows

I have the following query, split up into a view for readability:
CREATE TEMPORARY VIEW task_depcount AS
SELECT
t.*,
COUNT(p.id) AS unfinished_dep_count
FROM
task t
LEFT JOIN taskdependency d on t.id = d.task_id
LEFT JOIN task p on d.parent_task_id = p.id and p.status != 'SUCCESS'
GROUP BY t.id;
SELECT t.id, t.task_type, t.status
FROM task_depcount t
WHERE t.status = 'READY' AND t.unfinished_dep_count = 0;
Now, if we look at the EXPLAIN ANALYZE output, this is obviously very inefficient, as we cannot really do index scans over a COUNT() result. Rewriting it into a single query with HAVING would not improve it either.
So here's the question: Is there a way to write this query so that the database isn't forced to do sequential scans all over? The database is PostgreSQL 9.2, with no option to upgrade to a newer version.
Or, to state the intended result in plain English: I need all the tasks where either all of its dependencies have status "success", or there are no dependencies at all.
You can use not exists:
SELECT t.*
FROM task t
WHERE NOT EXISTS (SELECT 1
FROM taskdependency d JOIN
task p
ON d.parent_task_id = p.id
WHERE t.id = d.task_id AND p.status <> 'SUCCESS'
);
With the right indexes, this should be much faster.
The use of an aggregation function such as COUNT() -- whether in a view, subquery, or CTE -- requires processing all the data. With NOT EXISTS, the processing can stop for each task at the first unsuccessful dependency (if any) and does not have to do any aggregation.
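Here is a minimal sketch of the NOT EXISTS form, run on SQLite from Python rather than PostgreSQL, with invented rows. The index name taskdependency_task_id_idx is hypothetical, and the status = 'READY' filter from the question is added to the outer query. It only checks that the result matches the intended rule; it says nothing about the PostgreSQL query plan.

```python
import sqlite3

# Invented rows; task 6 depends on one finished and one unfinished task.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE task (id INTEGER PRIMARY KEY, task_type TEXT, status TEXT);
    CREATE TABLE taskdependency (task_id INTEGER, parent_task_id INTEGER);
    -- Hypothetical index so the dependency lookup per task can use an index.
    CREATE INDEX taskdependency_task_id_idx ON taskdependency (task_id);
    INSERT INTO task VALUES
        (1, 'build',  'READY'),
        (2, 'test',   'READY'),
        (3, 'fetch',  'SUCCESS'),
        (4, 'deploy', 'READY'),
        (5, 'lint',   'RUNNING'),
        (6, 'report', 'READY');
    INSERT INTO taskdependency VALUES (2, 3), (4, 5), (6, 3), (6, 5);
""")

# A READY task qualifies if no dependency points at a parent
# whose status is anything other than SUCCESS.
runnable = conn.execute("""
    SELECT t.id
    FROM task t
    WHERE t.status = 'READY'
      AND NOT EXISTS (SELECT 1
                      FROM taskdependency d
                      JOIN task p ON d.parent_task_id = p.id
                      WHERE d.task_id = t.id AND p.status <> 'SUCCESS')
    ORDER BY t.id
""").fetchall()

print(runnable)
```

Task 1 has no dependencies and task 2 depends only on a finished task, so both qualify; tasks 4 and 6 each have an unfinished parent.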
create temporary view task_depcount as
select t.*
from
task t
left join
taskdependency d on t.id = d.task_id
left join
task p on d.parent_task_id = p.id
group by t.id
having not bool_or(p.status != 'SUCCESS') or not bool_or(d.task_id is not null)
;
select t.id, t.task_type, t.status
from task_depcount t
where t.status = 'READY'