SELECT DISTINCT count() in Microsoft Access - sql

I've created a database where we can track bugs we have raised with our developers (Table: ApplixCalls) and track any correspondence related to the logged bugs (Table: Correspondence).
I'm trying to create a count where we can see the number of bugs which have no correspondence or only correspondence from us. This should give us the visibility to see where we should be chasing our developers for updates etc.
So far I have this SQL:
SELECT DISTINCT Count(ApplixCalls.OurRef) AS CountOfOurRef
FROM ApplixCalls LEFT JOIN Correspondence ON ApplixCalls.OurRef = Correspondence.OurRef
HAVING (((Correspondence.OurRef) Is Null)
AND ((ApplixCalls.Position)<>'Closed'))
OR ((ApplixCalls.Position)<>'Closed')
AND ((Correspondence.[SBSUpdate?])=True);
I'm finding that this part is counting every occasion we have sent an update, when I need it to count 1 where OurRef is unique and it only has updates from us:
OR ((ApplixCalls.Position)<>'Closed')
AND ((Correspondence.[SBSUpdate?])=True);
Hopefully that makes sense...
Is there a way around this?

MS Access does not support count(distinct). In your case, you can use a subquery. In addition, your query should not work. Perhaps this is what you intend:
SELECT COUNT(*)
FROM (SELECT ApplixCalls.OurRef
FROM ApplixCalls LEFT JOIN
Correspondence
ON ApplixCalls.OurRef = Correspondence.OurRef
WHERE (((orrespondence.OurRef Is Null) AND (ApplixCalls.Position) <> 'Closed')) OR
(ApplixCalls.Position <> 'Closed') AND (Correspondence.[SBSUpdate?] = True))
)
GROUP BY ApplixCalls.OurRef
) as x;
Modifications:
You have a HAVING clause with no GROUP BY. I think this should be a WHERE (although I am not 100% sure of the logic you intend).
The SELECT DISTINCT is replaced by SELECT . . . GROUP BY.
The COUNT(DISTINCT) is now COUNT(*) with a subquery.
EDIT:
Based on the description in your comments:
SELECT COUNT(*)
FROM (SELECT ApplixCalls.OurRef
FROM ApplixCalls LEFT JOIN
Correspondence
ON ApplixCalls.OurRef = Correspondence.OurRef
WHERE (((orrespondence.OurRef Is Null) AND (ApplixCalls.Position) <> 'Closed')) OR
(ApplixCalls.Position <> 'Closed') AND (Correspondence.[SBSUpdate?] = True))
)
GROUP BY ApplixCalls.OurRef
HAVING SUM(IIF(Correspondence.[SBSUpdate?] = False, 1, 0)) = 0
) as x;

I can not understand why are you using having clause. I hope this query will fullfill youe need.
SELECT DISTINCT Count(ApplixCalls.OurRef) AS CountOfOurRef
FROM ApplixCalls LEFT JOIN Correspondence ON ApplixCalls.OurRef = Correspondence.OurRef
HAVING (((Correspondence.OurRef) Is Null)
AND ((ApplixCalls.Position)<>'Closed'))
OR ((ApplixCalls.Position)<>'Closed')
AND ((Correspondence.[SBSUpdate?])=True);

If you are counting all the element that respond to you condition you don't need DISTINCT .. distinct if for removing duplicate result
SELECT Count(distinct ApplixCalls.OurRef) AS CountOfOurRef
FROM ApplixCalls LEFT JOIN Correspondence ON ApplixCalls.OurRef = Correspondence.OurRef
WHERE (((Correspondence.OurRef) Is Null)
AND ((ApplixCalls.Position)<>'Closed'))
OR ((ApplixCalls.Position)<>'Closed')
AND ((Correspondence.[SBSUpdate?])=True);

Related

postgres: LEFT JOIN table and field does not exist

This is my query
SELECT org.id,
org.name,
org.type,
org.company_logo,
(SELECT org_profile.logo_url FROM org_profile WHERE org_profile.org_id = org.id AND org_profile.status = 'active' ORDER BY org_profile.id DESC LIMIT 1) as logo_url,
org.short_description,
org_profile.value_prop,
count(*) OVER () AS total
FROM org
LEFT JOIN user_info ON user_info.id = org.approved_by
INNER JOIN (select distinct org_profile.org_id from org_profile) org_profile ON org_profile.org_id = org.id
WHERE
org.type = 'Fintech'
AND org.status = 'APPROVED'
AND org.draft != TRUE
ORDER BY org.id DESC
I am using LEFT JOIN query with my org_profile table. I used distinct for unique org id but the problem is org_profile.value_prop column does not work. The error is showing column org_profile.value_prop does not exist
I'm trying to solve this issue. But I don't figure out this issue.
basically, the error informs that you try to get the value_prop field from org_profile subquery, which basically doesn't exist.
It's difficult to give a working query by writting just on the paper, but I think that:
it's worth to apply the handy aliasses for each subquery
deduplication, if required, should be done in the subquery. When multiple fields used DISTINCT may be insufficient - RANK function may be required.
you make some operations to get the logo_url by a scalar subquery - it seems a bit strange, especially the same table is used in JOIN - I would suggest to move all logic related to org_profile to the subquery. Scalar expressions would throw an error in case multiple values would be found in output.
SELECT
org.id,
org.name,
org.type,
org.company_logo,
prof.logo_url,
org.short_description,
prof.value_prop,
count(*) OVER () AS total
FROM org
JOIN (
select distinct org_id, logo_url, value_prop -- other deduplication type (RANK) may be required
from org_profile
where status = 'active' -- ?
) prof ON org.id = prof.org_id
LEFT JOIN user_info usr ON usr.id = org.approved_by
WHERE
org.type = 'Fintech'
AND org.status = 'APPROVED'
AND org.draft != TRUE
ORDER BY org.id DESC

Subquery returning more than one result

I am still fairly new to SQL and the stored procedure I recently created keeps telling me that a subquery is returning more than one result but I can't figure out which one is the problem. If anyone has a moment and can tell me what I am missing, I would greatly appreciate it!
Thanks!
SELECT DISTINCT a.customer_no [id],
x.esal1_desc [constituent],
a.perf [activity],
a.sp_act_dt [activity_date],
c.description[activity_type],
d.display_name_tiny [solicitor],
s.description [status],
ISNULL(a.num_attendees,0)[attending],
a.notes [notes],
e.address [email]
FROM [dbo].t_special_activity a
left outer join [dbo].tr_special_activity_status s ON s.id = a.status
left outer join [dbo].tr_special_activity c ON c.id = a.sp_act
left outer JOIN [dbo].FT_CONSTITUENT_DISPLAY_NAME() d ON a.worker_customer_no = d.customer_no
left outer JOIN [dbo].T_EADDRESS e on a.customer_no=e.customer_no and primary_ind='Y'
left outer JOIN [dbo].TX_CUST_SAL x on a.customer_no=x.customer_no and default_ind='Y'
WHERE a.status IN (ISNULL(#status, (SELECT DISTINCT id FROM TR_SPECIAL_ACTIVITY_STATUS)))
AND a.sp_act_dt BETWEEN (ISNULL(#activity_start,(SELECT MIN(sp_act_dt) FROM T_SPECIAL_ACTIVITY)))
AND (ISNULL(#activity_end,(SELECT MAX(sp_act_dt) FROM T_SPECIAL_ACTIVITY)))
AND ((ISNULL(#list,0) = 0) OR EXISTS (SELECT customer_no FROM T_LIST_CONTENTS lc WITH (NOLOCK)
WHERE a.customer_no = lc.customer_no and lc.list_no = #list))
Alas, you cannot use this expression:
WHERE a.status IN (ISNULL(#status, (SELECT DISTINCT id FROM TR_SPECIAL_ACTIVITY_STATUS)))
The subquery is in a place where a single value is expected. In any case, I think you want:
WHERE #status IS NULL OR
a.status IN (SELECT id FROM TR_SPECIAL_ACTIVITY_STATUS)
Note that select distinct is irrelevant in an IN clause. At best it does nothing; at worst it impedes the optimizer.
I realize this is a little confusing. You are thinking that IN takes a list -- and the list could even be a subquery. But, the elements of the list are scalars not lists. So, when a subquery is an element of the list, then it is assumed to be a single value.

Refactoring slow SQL query

I currently have this very very slow query:
SELECT generators.id AS generator_id, COUNT(*) AS cnt
FROM generator_rows
JOIN generators ON generators.id = generator_rows.generator_id
WHERE
generators.id IN (SELECT "generators"."id" FROM "generators" WHERE "generators"."client_id" = 5212 AND ("generators"."state" IN ('enabled'))) AND
(
generators.single_use = 'f' OR generators.single_use IS NULL OR
generator_rows.id NOT IN (SELECT run_generator_rows.generator_row_id FROM run_generator_rows)
)
GROUP BY generators.id;
An I'm trying to refactor it/improve it with this query:
SELECT g.id AS generator_id, COUNT(*) AS cnt
from generator_rows gr
join generators g on g.id = gr.generator_id
join lateral(select case when exists(select * from run_generator_rows rgr where rgr.generator_row_id = gr.id) then 0 else 1 end as noRows) has on true
where g.client_id = 5212 and "g"."state" IN ('enabled') AND
(g.single_use = 'f' OR g.single_use IS NULL OR has.norows = 1)
group by g.id
For reason it doesn't quite work as expected(It returns 0 rows). I think I'm pretty close to the end result but can't get it to work.
I'm running on PostgreSQL 9.6.1.
This appears to be the query, formatted so I can read it:
SELECT gr.generators_id, COUNT(*) AS cnt
FROM generators g JOIN
generator_rows gr
ON g.id = gr.generator_id
WHERE gr.generators_id IN (SELECT g.id
FROM generators g
WHERE g.client_id = 5212 AND
g.state = 'enabled'
) AND
(g.single_use = 'f' OR
g.single_use IS NULL OR
gr.id NOT IN (SELECT rgr.generator_row_id FROM run_generator_rows rgr)
)
GROUP BY gr.generators_id;
I would be inclined to do most of this work in the FROM clause:
SELECT gr.generators_id, COUNT(*) AS cnt
FROM generators g JOIN
generator_rows gr
ON g.id = gr.generator_id JOIN
generators gg
on g.id = gg.id AND
gg.client_id = 5212 AND gg.state = 'enabled' LEFT JOIN
run_generator_rows rgr
ON g.id = rgr.generator_row_id
WHERE g.single_use = 'f' OR
g.single_use IS NULL OR
rgr.generator_row_id IS NULL
GROUP BY gr.generators_id;
This does make two assumptions that I think are reasonable:
generators.id is unique
run_generator_rows.generator_row_id is unique
(It is easy to avoid these assumptions, but the duplicate elimination is more work.)
Then, some indexes could help:
generators(client_id, state, id)
run_generator_rows(id)
generator_rows(generators_id)
Generally avoid inner selects as in
WHERE ... IN (SELECT ...)
as they are usually slow.
As it was already shown for your problem it's a good idea to think of SQL as of set- theory.
You do NOT join tables on their sole identity:
In fact you take (SQL does take) the set (- that is: all rows) of the first table and "multiply" it with the set of the second table - thus ending up with n times m rows.
Then the ON- clause is used to (often strongly) reduce the result by simply selecting each one of those many combinations by evaluating this portion to either true (take) or false (drop). This way you can chose any arbitrary logic to select those combinations in favor.
Things get trickier with LEFT JOIN and RIGHT JOIN, but one can easily think of them as to take one side for granted:
output the combinations of that row IF the logic yields true (once at least) - exactly like JOIN does
output exactly ONE row, with 'the other side' (right side on LEFT JOIN and vice versa) consisting of ALL NULL for every column.
Count(*) is great either, but if things getting complicated don't stick to it: Use Sub- Selects for the keys only, and once all the hard word is done join the Fun- Stuff to it. Like in
SELECT SUM(VALID), ID
FROM SELECT
(
(1 IF X 0 ELSE) AS VALID, ID
FROM ...
)
GROUP BY ID) AS sub
JOIN ... AS details ON sub.id = details.id
Difference is: The inner query is executed only once. The outer query does usually have no indices left to work with and will be slow, but if the inner select here doesn't make the data explode this is usually many times faster than SELECT ... WHERE ... IN (SELECT..) constructs.

Optimizing sql condition to apply condition to all dependent rows

I have the following query, split up into a view for readability:
CREATE TEMPORARY VIEW task_depcount AS
SELECT
t.*,
COUNT(p.id) AS unfinished_dep_count
FROM
task t
LEFT JOIN taskdependency d on t.id = d.task_id
LEFT JOIN task p on d.parent_task_id = p.id and p.status != 'SUCCESS'
GROUP BY t.id;
SELECT t.id, t.task_type, t.status
FROM task_depcount t
WHERE t.status = 'READY' AND t.unfinished_dep_count = 0;
Now If we're looking at the EXPLAIN ANALYZE output, this is obviously very inefficient, as we cannot really do index scans over a COUNT() result. Rewriting into a single query with HAVING would also not improve it.
So here's the question: Is there a way to write this query so that the database isn't forced to do sequence scans all over? Database is PostgreSQL 9.2, with no option to upgrade to newer versions.
Or, to state the intended result in plain english: I need all the tasks where either all it's dependencies are of status "success", or there are no dependencies at all.
You can use not exists:
SELECT t.*
FROM task t
WHERE NOT EXISTS (SELECT 1
FROM taskdependency d JOIN
task p
ON d.parent_task_id = p.id
WHERE t.id = d.task_id AND p.status <> 'SUCCESS'
);
With the right indexes, this should be much faster.
The use of an aggregation function such as COUNT() -- whether in a view, subquery, or CTE -- requires processing all the data. With NOT EXISTS, the processing can stop for each at the first unsuccessful one (if any) and not have to do any aggregation.
create temporary view task_depcount as
select t.*
from
task t
left join
taskdependency d on t.id = d.task_id
left join
task p on d.parent_task_id = p.id
group by t.id
having not bool_or(p.status != success) or not bool_or(d.task_id is not null)
;
select t.id, t.task_type, t.status
from task_depcount t
where t.status = 'READY'

"Simple" SQL Query

Each of my clients can have many todo items and every todo item has a due date.
What would be the query for discovering the next undone todo item by due date for each file? In the event that a client has more than one todo, the one with the lowest id is the correct one.
Assuming the following minimal schema:
clients (id, name)
todos (id, client_id, description, timestamp_due, timestamp_completed)
Thank you.
I haven't tested this yet, so you may have to tweak it:
SELECT
TD1.client_id,
TD1.id,
TD1.description,
TD1.timestamp_due
FROM
Todos TD1
LEFT OUTER JOIN Todos TD2 ON
TD2.client_id = TD1.client_id AND
TD2.timestamp_completed IS NULL AND
(
TD2.timestamp_due < TD1.timestamp_due OR
(TD2.timestamp_due = TD1.timestamp_due AND TD2.id < TD1.id)
)
WHERE
TD2.id IS NULL
Instead of trying to sort and aggregate, you're basically answering the question, "Is there any other todo that would come before this one?" (based on your definition of "before"). If not, then this is the one that you want.
This should be valid on most SQL platforms.
This question is the classic pick-a-winner for each group. It gets posted about twice a day.
SELECT *
FROM todos t
WHERE t.timestamp_completed is null
and
(
SELECT top 1 t2.id
FROM todos t2
WHERE t.client_id = t2.client_id
and t2.timestamp_completed is null
--there is no earlier record
and
(t.timestamp_due > t2.timestamp_due
or (t.timestamp_due = t2.timestamp_due and t.id > t2.id)
)
) is null
SELECT c.name, MIN(t.id)
FROM clients c, todos t
WHERE c.id = t.client_id AND t.timestamp_complete IS NULL
GROUP BY c.id
HAVING t.timestamp_due <= MIN(t.timestamp_due)
Avoids a subquery, correlated or otherwise but introduces a bunch of aggregate operations which aren't much better.
Some Jet SQL, I realize it is unlikely that the questioner is using Jet, however the reader may be.
SELECT c.name, t.description, t.timestamp_due
FROM (clients c
INNER JOIN
(SELECT t.client_id, Min(t.id) AS MinOfid
FROM todos t
WHERE t.timestamp_completed Is Null
GROUP BY t.client_id) AS tm
ON c.id = tm.client_id)
INNER JOIN todos t ON tm.MinOfid = t.id
The following should get you close, first get the min time for each client, then lookup the client/todo information
SELECT
C.Id,
C.Name,
T.Id
T.Description,
T.timestamp_due
FROM
{
SELECT
client_id,
MIN(timestamp_due) AS "DueDate"
FROM todos
WHERE timestamp_completed IS NULL
GROUP BY ClientId
} AS MinValues
INNER JOIN Clients C
ON (MinValues.client_id = C.Id)
INNER JOIN todos T
ON (MinValues.client_id = T.client_id
AND MinValues.DueDate = T.timestamp_due)
ORDER BY C.Name
NOTE: Written assuming SQL Server