Access sql to retrieve counts of values meeting a condition - sql

I'm trying to write a query in Access that will return a count of values for each site in a table where the value exceeds a specified level, but also, for sites that have no values exceeding that level, return a specified value, such as "NA".
I've tried Iif, Switch, Union, sub queries, querying a different query, but no luck. I can get all the counts exceeding the level, or all sites with "NA" correct but showing total count for the rest, not just count above the level.
For example, in the table below, assuming level > 10, Houston = "NA", Detroit = 2, Pittsburgh PA = 3. I just can't get both sides of the query to work.
Apologies in advance for the poor formatting.
+------------+-------+
| Site       | Value |
+------------+-------+
| Houston    | 10    |
| Houston    | 3     |
| Houston    | 0     |
| Detroit    | 15    |
| Detroit    | 7     |
| Detroit    | 4     |
| Detroit    | 12    |
| Pittsburgh | 23    |
| Pittsburgh | 2     |
| Pittsburgh | 18    |
| Pittsburgh | 12    |
+------------+-------+

Another solution is to use conditional aggregation, as follows :
SELECT site, SUM(IIf(value > 10, 1, 0)) AS value
FROM mytable
GROUP BY site
This approach should be more efficient than self-joining the table, since it only needs to scan the table once.
The SUM(IIf ...) is a handy construct to count how many records satisfy a given condition.
NB: it is generally not a good idea to return two different data types in the same column (in your use case, either a number or the string 'NA'). Most RDBMS do not allow that. So I provided a query that returns 0 when there are no matches, instead of NA. If you really want 'NA', you can try:
IIF(
SUM(IIf(value > 10, 1, 0)) = 0,
'NA',
STR(SUM(IIf(value > 10, 1, 0)))
) AS value
This demo on DB Fiddle, with your sample data returns :
site | value
:--------- | ----:
Detroit | 2
Houston | 0
Pittsburgh | 3
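To try the conditional-aggregation idea outside Access, here is a minimal sketch using Python's built-in sqlite3 module. SQLite standing in for Access is an assumption, CASE WHEN plays the role of IIf, and the 'NA' label is applied in Python so the SQL column stays numeric.

```python
import sqlite3

# Assumption: SQLite stands in for Access; CASE WHEN replaces IIf.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (site TEXT, value INTEGER)")
rows = [
    ("Houston", 10), ("Houston", 3), ("Houston", 0),
    ("Detroit", 15), ("Detroit", 7), ("Detroit", 4), ("Detroit", 12),
    ("Pittsburgh", 23), ("Pittsburgh", 2), ("Pittsburgh", 18), ("Pittsburgh", 12),
]
conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)

# One pass over the table: each row contributes 1 if it exceeds the level, else 0.
query = """
SELECT site,
       SUM(CASE WHEN value > 10 THEN 1 ELSE 0 END) AS cnt
FROM mytable
GROUP BY site
ORDER BY site
"""
counts = conn.execute(query).fetchall()

# Replace a zero count with 'NA' on the client side.
labelled = [(site, "NA" if cnt == 0 else str(cnt)) for site, cnt in counts]
print(labelled)
```

Keeping the count numeric in SQL and labelling it afterwards avoids mixing data types in one column.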

Get a list of all sites independent of the counts (the SiteList derived table below).
LEFT JOIN this back to your base table (SiteValues) to get the rows for each site that meet the threshold. Note: the join should be on the table's key, and I'm not sure what that is for this table; site alone isn't enough.
Count the values from the SiteValues side: the NULLs produced by the LEFT JOIN are not counted, so sites with no match get 0.
WORKING DEMO:
SELECT SiteList.Site, Count(Sitevalues.Site)
FROM (SELECT site, value
FROM TableName) SiteList
LEFT JOIN TableName SiteValues
on SiteList.Site = SiteValues.Site
and SiteValues.Value > 10
and SiteValues.Value = SiteList.value
GROUP BY SiteList.Site
GIVING US:
+----+------------+------------------+
| | Site | (No column name) |
+----+------------+------------------+
| 1 | Detroit | 2 |
| 2 | Houston | 0 |
| 3 | Pittsburgh | 3 |
+----+------------+------------------+
Or if you need the NA you have to cast the count to a varchar
SELECT SiteList.Site, case when Count(Sitevalues.Site) = 0 then 'NA' else cast(count(Sitevalues.site) as varchar(10)) end as SitesMeetingThreshold
FROM (SELECT site, value
FROM TableName) SiteList
LEFT JOIN TableName SiteValues
on SiteList.Site = SiteValues.Site
and SiteValues.Value > 10
and SiteValues.Value = SiteList.value
GROUP BY SiteList.Site
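As a variation on the LEFT JOIN idea, the sketch below builds the site list with SELECT DISTINCT, so the join no longer needs the extra condition on value. It runs on SQLite through Python's sqlite3 module, which is an assumption (the question is about Access), and the table and alias names simply mirror the ones used above.

```python
import sqlite3

# Assumption: SQLite stands in for Access.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (site TEXT, value INTEGER)")
rows = [
    ("Houston", 10), ("Houston", 3), ("Houston", 0),
    ("Detroit", 15), ("Detroit", 7), ("Detroit", 4), ("Detroit", 12),
    ("Pittsburgh", 23), ("Pittsburgh", 2), ("Pittsburgh", 18), ("Pittsburgh", 12),
]
conn.executemany("INSERT INTO TableName VALUES (?, ?)", rows)

# SiteList has one row per site; the LEFT JOIN keeps sites with no value above 10.
# COUNT(SiteValues.site) ignores the NULLs from unmatched rows, giving 0 for them.
query = """
SELECT SiteList.site, COUNT(SiteValues.site) AS n
FROM (SELECT DISTINCT site FROM TableName) AS SiteList
LEFT JOIN TableName AS SiteValues
       ON SiteList.site = SiteValues.site
      AND SiteValues.value > 10
GROUP BY SiteList.site
ORDER BY SiteList.site
"""
result = conn.execute(query).fetchall()
print(result)
```

With a distinct site list, duplicate (site, value) rows in the base table cannot inflate the count.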

Just use conditional aggregation:
select site,
sum(iif(value > 10, 1, 0)) as cnt_11plus
from t
group by site;
I think 0 is better than N/A. But if you want that you'll need to convert the results to a string.
select site,
iif(sum(iif(value > 10, 1, 0)) > 0,
str(sum(iif(value > 10, 1, 0))),
"N/A"
) as cnt_11plus
from t
group by site;

You can use UNION like this:
SELECT site, count(value) AS counter
FROM sites
WHERE value > 10
GROUP BY site
UNION
SELECT s.site, 'NA' AS counter
FROM sites AS s
WHERE value <= 10
AND NOT EXISTS (
SELECT 1 FROM sites WHERE site = s.site AND value > 10
)
GROUP BY site
Results:
site counter
Detroit 2
Houston NA
Pittsburgh 3
There is no need to convert the integer counter to Text, because Access does this implicitly for you.

Related

Calculate overall percentage of Access Query

I have an MS Access Query which returns the following sample data:
+-----+------+------+
| Ref | ANS1 | ANS2 |
+-----+------+------+
| 123 | A | A |
| 234 | B | B |
| 345 | C | C |
| 456 | D | E |
| 567 | F | G |
| 678 | H | I |
+-----+------+------+
Is it possible to have Access return the overall percentage where ANS1 = ANS2?
So my new query would return:
50
I know how to get a count of the records returned by the original query, but not how to calculate the percentage.
Since you're looking for a percentage of some condition being met across the entire dataset, the task can be reduced to having a function return either 1 (when the condition is validated), or 0 (when the condition is not validated), and then calculating an average across all records.
This could be achieved in a number of ways; one example is to use a basic iif expression:
select avg(iif(t.ans1=t.ans2,1,0)) from YourTable t
Or, using the knowledge that a boolean value in MS Access is represented using -1 (True) or 0 (False), the expression can be reduced to:
select -avg(t.ans1=t.ans2) from YourTable t
In each of the above, change YourTable to the name of your table.
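The averaging trick can be checked with a small sketch in Python's sqlite3 module. SQLite is an assumption here (the question is about Access), so CASE WHEN replaces iif. Note that the average is a fraction (0.5 for the sample data), so it is multiplied by 100 to get the 50 the question asks for.

```python
import sqlite3

# Assumption: SQLite stands in for Access; column names are lower-cased for brevity.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE YourTable (ref INTEGER, ans1 TEXT, ans2 TEXT)")
data = [
    (123, "A", "A"), (234, "B", "B"), (345, "C", "C"),
    (456, "D", "E"), (567, "F", "G"), (678, "H", "I"),
]
conn.executemany("INSERT INTO YourTable VALUES (?, ?, ?)", data)

# Each row contributes 1 when the answers match and 0 otherwise;
# the average of those flags is the matching fraction.
pct = conn.execute(
    "SELECT 100.0 * AVG(CASE WHEN ans1 = ans2 THEN 1 ELSE 0 END) FROM YourTable"
).fetchone()[0]
print(pct)  # 50.0
```

Unlike Access, SQLite represents True as 1 rather than -1, so the negated form of the expression does not carry over unchanged.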
If you know how to get a count, then apply that same knowledge twice:
SELECT Count([ANS1]) As MatchCount FROM [Data]
WHERE [ANS1] = [ANS2]
divided by the total count
SELECT Count([ANS1]) As AllCount FROM [Data]
To combine both of these in a basic SQL query, one needs a "dummy" query since Access doesn't allow selection of only raw data:
SELECT TOP 1
((SELECT Count([ANS1]) As MatchCount FROM [Data] WHERE [ANS1] = [ANS2])
/
(SELECT Count([ANS1]) As AllCount FROM [Data]))
AS MatchPercent
FROM [Data]
This of course assumes that there is at least one row... so it doesn't divide by zero.

SQL Query to return a distinct count of one column while allowing a full summation of a second column, grouped by a third

I'm writing a query in Access 2010 and I can't use count(distinct ...), so I'm running into a bit of trouble with the query below:
An example of my table is as follows
Provider | Member ID | Dollars | Status
FacilityA | 1001 | 50 | Pended
FacilityA | 1001 | 100 | Paid
FacilityA | 1002 | 200 | Paid
FacilityB | 1005 | 30 | Pended
FacilityB | 1009 | 90 | Pended
FacilityC | 1001 | 100 | Paid
FacilityC | 1008 | 500 | Paid
I want to return the total # of unique members that have visited each facility, but I also want to get the total dollar amount that is Pended, so for this example the ideal output would be
Provider | # members | Total Pended charges
FacilityA | 2 | 50
FacilityB | 2 | 120
FacilityC | 2 | 0
I tried using some code I found here: Count Distinct in a Group By aggregate function in Access 2007 SQL
and here:
SQL: Count distinct values from one column based on multiple criteria in other columns
Copying the code from the first link provided by gzaxx:
SELECT cd.DiagCode, Count(cd.CustomerID)
FROM (select distinct DiagCode, CustomerID from CustomerTable) as cd
Group By cd.DiagCode;
I can make this work for counting the members:
SELECT cd.Provider_Number, Count(cd.Member_ID)
FROM (select distinct Provider_Number, Member_ID from Claims_Table) as cd
Group By cd.Provider_Number;
However, no matter what I try I can't get a second portion dealing with the dollars to work without causing an error or messing up the calculation on the member count.
SELECT cd.Provider_Number,
-- claims_table.Member_ID, claims_table.Dollars
SUM(IIF ( Claims_Table.Status = 'Pended' , Claims_Table.Dollars , 0 )) as Dollars_Pending,
Count(cd.Member_ID) as Uniq_Members,
Sum(Dollars) as Dollar_Wrong
FROM (select distinct Provider_Number, Member_ID from Claims_Table) as cd inner join #claims_table
ON claims_table.Provider_Number=cd.Provider_Number and claims_table.Member_ID = cd.Member_ID
Group By cd.Provider_Number;
This should work fine based only on the table you described (named Tabelle1):
SELECT Provider, count(MemberID) as [# Members],
NZ(SUM(SWITCH([Status]='Pended', Dollars)),0) as [Total pending charges]
FROM Tabelle1
GROUP BY Provider;
Explanation
I think the first and second column are self-explanatory.
The third column is where most of the work is done. SWITCH([Status]='Pended', Dollars) returns Dollars only if the status is 'Pended', and NULL otherwise. SUM then adds these up. NZ(..,0) sets the column to 0 if SUM returns NULL.
EDIT: This was tested on Access 2016
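One point to watch: Count(MemberID) counts rows, not distinct members, so on the sample data FacilityA would come out as 3 rather than 2. A possible way to get both numbers is to compute the distinct member count and the pended total in two separate derived tables and join them by provider. The sketch below does that on SQLite through Python's sqlite3 module; SQLite, the table name claims, and the column names are assumptions for illustration.

```python
import sqlite3

# Assumption: SQLite stands in for Access; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE claims (provider TEXT, member_id INTEGER, dollars INTEGER, status TEXT)"
)
rows = [
    ("FacilityA", 1001, 50, "Pended"),
    ("FacilityA", 1001, 100, "Paid"),
    ("FacilityA", 1002, 200, "Paid"),
    ("FacilityB", 1005, 30, "Pended"),
    ("FacilityB", 1009, 90, "Pended"),
    ("FacilityC", 1001, 100, "Paid"),
    ("FacilityC", 1008, 500, "Paid"),
]
conn.executemany("INSERT INTO claims VALUES (?, ?, ?, ?)", rows)

# m: unique members per provider, counted over a DISTINCT subquery.
# p: pended dollars per provider, summed over the untouched base table.
# Aggregating separately keeps the join from duplicating dollar amounts.
query = """
SELECT m.provider, m.members, p.pended
FROM (SELECT provider, COUNT(*) AS members
      FROM (SELECT DISTINCT provider, member_id FROM claims) AS d
      GROUP BY provider) AS m
JOIN (SELECT provider,
             SUM(CASE WHEN status = 'Pended' THEN dollars ELSE 0 END) AS pended
      FROM claims
      GROUP BY provider) AS p
  ON p.provider = m.provider
ORDER BY m.provider
"""
result = conn.execute(query).fetchall()
print(result)
```

The result matches the ideal output listed in the question for all three facilities.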

Select the difference of two consecutive columns

I have a table car that looks like this:
| mileage | carid |
------------------
| 30 | 1 |
| 50 | 1 |
| 100 | 1 |
| 0 | 2 |
| 70 | 2 |
I would like to get the average difference for each car. So for example for car 1 I would like to get ((50-30)+(100-50))/2 = 35. So I created the following query
SELECT AVG(diff),carid FROM (
SELECT (mileage-
(SELECT Max(mileage) FROM car Where mileage<mileage AND carid=carid GROUP BY carid))
AS diff,carid
FROM car GROUP BY carid)
But this doesn't work as I'm not able to use current row for the other column. And I'm quite clueless on how to actually solve this in a different way.
So how would I be able to obtain the value of the next row somehow?
The average difference is the maximum minus the minimum, divided by one less than the count. The consecutive differences telescope, so their sum is just the last value minus the first: for car 1, (50-30)+(100-50) = 100-30.
Hence:
select carid,
( (max(mileage) - min(mileage)) / nullif(count(*) - 1, 0)) as avg_diff
from car
group by carid;
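The telescoping argument can be checked against a direct computation of the gaps. This is a minimal sketch on SQLite through Python's sqlite3 module (an assumption, since the question does not name a database). The multiplication by 1.0 is added to avoid integer division, and mean_gap is a hypothetical helper used only for the cross-check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (mileage INTEGER, carid INTEGER)")
rows = [(30, 1), (50, 1), (100, 1), (0, 2), (70, 2)]
conn.executemany("INSERT INTO car VALUES (?, ?)", rows)

# (max - min) / (count - 1); NULLIF turns a single-row car into NULL
# instead of a division by zero.
query = """
SELECT carid,
       (MAX(mileage) - MIN(mileage)) * 1.0 / NULLIF(COUNT(*) - 1, 0) AS avg_diff
FROM car
GROUP BY carid
ORDER BY carid
"""
result = conn.execute(query).fetchall()


def mean_gap(values):
    """Average of consecutive differences, computed the long way."""
    ordered = sorted(values)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return sum(gaps) / len(gaps)


print(result)
print(mean_gap([30, 50, 100]), mean_gap([0, 70]))
```

Both approaches agree on the sample data, which is what the telescoping sum predicts when mileage only increases.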

Find spectators that have seen the same shows (match multiple rows for each)

For an assignment I have to write several SQL queries for a database stored in a PostgreSQL server running PostgreSQL 9.3.0. However, I find myself stuck on the last query. The database models a reservation system for an opera house. The query is about associating each spectator with the other spectators that attended the same events every time.
The model looks like this:
Reservations table
id_res | create_date | tickets_presented | id_show | id_spectator | price | category
-------+---------------------+---------------------+---------+--------------+-------+----------
1 | 2015-08-05 17:45:03 | | 1 | 1 | 195 | 1
2 | 2014-03-15 14:51:08 | 2014-11-30 14:17:00 | 11 | 1 | 150 | 2
Spectators table
id_spectator | last_name | first_name | email | create_time | age
---------------+------------+------------+----------------------------------------+---------------------+-----
1 | gonzalez | colin | colin.gonzalez@gmail.com | 2014-03-15 14:21:30 | 22
2 | bequet | camille | bequet.camille@gmail.com | 2014-12-10 15:22:31 | 22
Shows table
id_show | name | kind | presentation_date | start_time | end_time | id_season | capacity_cat1 | capacity_cat2 | capacity_cat3 | price_cat1 | price_cat2 | price_cat3
---------+------------------------+--------+-------------------+------------+----------+-----------+---------------+---------------+---------------+------------+------------+------------
1 | madama butterfly | opera | 2015-09-05 | 19:30:00 | 21:30:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
2 | don giovanni | opera | 2015-09-12 | 19:30:00 | 21:45:00 | 2 | 315 | 630 | 945 | 195 | 150 | 100
So far I've started by writing a query to get the id of the spectator and the date of the show he's attending to, the query looks like this.
SELECT Reservations.id_spectator, Shows.presentation_date
FROM Reservations
LEFT JOIN Shows ON Reservations.id_show = Shows.id_show;
Could someone help me understand the problem better and point me towards a solution? Thanks in advance.
So the result I'm expecting should be something like this
id_spectator | other_id_spectators
-------------+--------------------
1| 2,3
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
Note based on comments: Wanted to make clear that this answer may be of limited use as it was answered in the context of SQL-Server (tag was present at the time)
There is probably a better way to do it, but you could do it with the STUFF function. The only drawback here is that, since your ids are ints, placing a comma between values requires a workaround (the values need to be cast to strings). Below is the method I can think of using that workaround.
SELECT [id_spectator], [id_show]
, STUFF((SELECT ',' + CAST(A.[id_spectator] as NVARCHAR(10))
FROM reservations A
Where A.[id_show]=B.[id_show] AND a.[id_spectator] != b.[id_spectator] FOR XML PATH('')),1,1,'') As [other_id_spectators]
From reservations B
Group By [id_spectator], [id_show]
This will show you all other spectators that attended the same shows.
Meaning that every time spectator with id 1 went to a show, spectators 2 and 3 did too.
In other words, you want a list of ...
all spectators that have seen all the shows that a given spectator has seen (and possibly more than the given one)
This is a special case of relational division. We have assembled an arsenal of basic techniques here:
How to filter SQL results in a has-many-through relation
It is special because the list of shows each spectator has to have attended is dynamically determined by the given prime spectator.
Assuming that (id_spectator, id_show) is unique in reservations, which has not been clarified.
A UNIQUE constraint on those two columns (in that order) also provides the most important index.
For best performance in query 2 and 3 below also create an index with leading id_show.
1. Brute force
The primitive approach would be to form a sorted array of shows the given user has seen and compare the same array of others:
SELECT 1 AS id_spectator, array_agg(sub.id_spectator) AS id_other_spectators
FROM (
SELECT id_spectator
FROM reservations r
WHERE id_spectator <> 1
GROUP BY 1
HAVING array_agg(id_show ORDER BY id_show)
@> (SELECT array_agg(id_show ORDER BY id_show)
FROM reservations
WHERE id_spectator = 1)
) sub;
But this is potentially very expensive for big tables. The whole table has to be processed, and in a rather expensive way, too.
2. Smarter
Use a CTE to determine relevant shows, then only consider those
WITH shows AS ( -- all shows of id 1; 1 row per show
SELECT id_spectator, id_show
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
)
SELECT sub.id_spectator, array_agg(sub.other) AS id_other_spectators
FROM (
SELECT s.id_spectator, r.id_spectator AS other
FROM shows s
JOIN reservations r USING (id_show)
WHERE r.id_spectator <> s.id_spectator
GROUP BY 1,2
HAVING count(*) = (SELECT count(*) FROM shows)
) sub
GROUP BY 1;
@> (used in query 1) is the "contains" operator for arrays, so we get all spectators that have seen at least the same shows.
Query 2 is faster than query 1 because only relevant shows are considered.
3. Real smart
To also exclude spectators that are not going to qualify early from the query, use a recursive CTE:
WITH RECURSIVE shows AS ( -- produces exactly 1 row
SELECT id_spectator, array_agg(id_show) AS shows, count(*) AS ct
FROM reservations
WHERE id_spectator = 1 -- your prime spectator here
GROUP BY 1
)
, cte AS (
SELECT r.id_spectator, 1 AS idx
FROM shows s
JOIN reservations r ON r.id_show = s.shows[1]
WHERE r.id_spectator <> s.id_spectator
UNION ALL
SELECT r.id_spectator, idx + 1
FROM cte c
JOIN reservations r USING (id_spectator)
JOIN shows s ON s.shows[c.idx + 1] = r.id_show
)
SELECT s.id_spectator, array_agg(c.id_spectator) AS id_other_spectators
FROM shows s
JOIN cte c ON c.idx = s.ct -- has an entry for every show
GROUP BY 1;
Note that the first CTE is non-recursive. Only the second part is recursive (iterative really).
This should be fastest for small selections from big tables. Rows that don't qualify are excluded early. The two indices I mentioned are essential.
SQL Fiddle demonstrating all three.
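The core of the second approach (join on the prime spectator's shows, then keep only spectators whose match count equals the number of those shows) can be sketched on SQLite through Python's sqlite3 module. SQLite is an assumption here, since the question uses PostgreSQL, and the reservation rows are made up for illustration. As above, (id_spectator, id_show) is assumed to be unique.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reservations (id_spectator INTEGER, id_show INTEGER)")
rows = [
    (1, 1), (1, 14),           # prime spectator saw shows 1 and 14
    (2, 1), (2, 14), (2, 20),  # saw both, plus one more
    (3, 1), (3, 14),           # saw exactly the same shows
    (9, 1),                    # saw only one of them
]
conn.executemany("INSERT INTO reservations VALUES (?, ?)", rows)

# Relational division: a spectator qualifies when the number of rows matching
# the prime spectator's shows equals the number of those shows.
query = """
WITH shows AS (
    SELECT id_show FROM reservations WHERE id_spectator = 1
)
SELECT r.id_spectator
FROM reservations r
JOIN shows s ON s.id_show = r.id_show
WHERE r.id_spectator <> 1
GROUP BY r.id_spectator
HAVING COUNT(*) = (SELECT COUNT(*) FROM shows)
ORDER BY r.id_spectator
"""
others = [row[0] for row in conn.execute(query)]
print(others)
```

Spectator 9 is dropped because only one of the two required shows matches, while spectator 2 still qualifies despite having seen an extra show.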
It sounds like you have one half of the total question--determining which id_shows a particular id_spectator attended.
What you want to ask yourself is how you can determine which id_spectators attended an id_show, given an id_show. Once you have that, combine the two answers to get the full result.
So the final answer I got, looks like this :
SELECT id_spectator, id_show,(
SELECT string_agg(to_char(A.id_spectator, '999'), ',')
FROM Reservations A
WHERE A.id_show=B.id_show
) AS other_id_spectators
FROM Reservations B
GROUP By id_spectator, id_show
ORDER BY id_spectator ASC;
Which prints something like this:
id_spectator | id_show | other_id_spectators
-------------+---------+---------------------
1 | 1 | 1, 2, 9
1 | 14 | 1, 2
Which suits my needs, however if you have any improvements to offer, please share :) Thanks again everybody!

Finding the difference between two sets of data from the same table

My data looks like:
run | line | checksum | group
-----------------------------
1 | 3 | 123 | 1
1 | 7 | 123 | 1
1 | 4 | 123 | 2
1 | 5 | 124 | 2
2 | 3 | 123 | 1
2 | 7 | 123 | 1
2 | 4 | 124 | 2
2 | 4 | 124 | 2
and I need a query that returns me the new entries in run 2
run | line | checksum | group
-----------------------------
2 | 4 | 124 | 2
2 | 4 | 124 | 2
I tried several things, but I never got to a satisfying answer.
In this case I'm using H2, but of course I'm interested in a general explanation that would help me to wrap my head around the concept.
EDIT:
OK, it's my first post here so please forgive if I didn't state the question precisely enough.
Basically, given two run values (r1, r2, with r2 > r1), I want to determine which rows having run = r2 have a different line, checksum or group from any row where run = r1.
select * from yourtable
where run = 2 and checksum = (select max(checksum)
from yourtable)
Assuming your last run will have the higher run value than others, below SQL will help
select * from table1 t1
where t1.run in
(select max(t2.run) from table1 t2)
Update:
Above SQL may not give you the right rows because your requirement is not so clear. But the overall idea is to fetch the rows based on the latest run parameters.
SELECT line, checksum, group
FROM TableX
WHERE run = 2
EXCEPT
SELECT line, checksum, group
FROM TableX
WHERE run = 1
or (with slightly different result):
SELECT *
FROM TableX x
WHERE run = 2
AND NOT EXISTS
( SELECT *
FROM TableX x2
WHERE run = 1
AND x2.line = x.line
AND x2.checksum = x.checksum
AND x2.group = x.group
)
A slightly different approach:
select min(run) run, line, checksum, group
from mytable
where run in (1,2)
group by line, checksum, group
having count(*)=1 and min(run)=2
Incidentally, I assume that the "group" column in your table isn't actually called group - this is a reserved word in SQL and would need to be enclosed in double quotes (or backticks or square brackets, depending on which RDBMS you are using).
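The "slightly different result" between EXCEPT and NOT EXISTS shows up on the sample data: EXCEPT is a set operation and removes duplicates, while NOT EXISTS keeps every qualifying row. The sketch below runs both on SQLite through Python's sqlite3 module, which is an assumption (the question uses H2). The group column is quoted because it is a reserved word.

```python
import sqlite3

# Assumption: SQLite stands in for H2.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE TableX (run INTEGER, line INTEGER, checksum INTEGER, "group" INTEGER)'
)
rows = [
    (1, 3, 123, 1), (1, 7, 123, 1), (1, 4, 123, 2), (1, 5, 124, 2),
    (2, 3, 123, 1), (2, 7, 123, 1), (2, 4, 124, 2), (2, 4, 124, 2),
]
conn.executemany("INSERT INTO TableX VALUES (?, ?, ?, ?)", rows)

# EXCEPT compares whole rows as sets, so the duplicated new row appears once.
except_rows = conn.execute("""
SELECT line, checksum, "group" FROM TableX WHERE run = 2
EXCEPT
SELECT line, checksum, "group" FROM TableX WHERE run = 1
""").fetchall()

# NOT EXISTS filters row by row, so both copies of the new row are kept.
not_exists_rows = conn.execute("""
SELECT run, line, checksum, "group"
FROM TableX x
WHERE run = 2
  AND NOT EXISTS (
      SELECT 1
      FROM TableX x2
      WHERE x2.run = 1
        AND x2.line = x.line
        AND x2.checksum = x.checksum
        AND x2."group" = x."group"
  )
""").fetchall()

print(except_rows)
print(not_exists_rows)
```

If the duplicates in run 2 matter, as in the expected output in the question, the NOT EXISTS form is the one that preserves them.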