Select if distinct count > 2? - sql

[EDITED 10/03/2019 as per request]
I have a large dataset and need to map two ids, on different rows, to one 'conflict id'.
For example.
These rows must be attributed to a single value, "Apples & Pears".
I then need to know which persons have both "Apple" and "Pear" through this value, not just one of them.
It's a many-to-one relationship that must be complete. Users must not be attributed "Apples & Pears" if they do not have BOTH requirements.
I have a master table with a one-to-one relation from each fruit to each combination, but I only want it joined with the user table if the user has both fruits.
Table 1: User, Fruit
Josh, Apple
Josh, Pear
Tom, Apple
Kate, Pear
Table 2: Fruit, Product
Apple, Apples&Pears
Pear, Apples&Pears
Table 3 (OUTCOME I WANT TO ACHIEVE):
Josh, Apple, Apples&Pears
Josh, Pear, Apples&Pears
Tom, Apple, NULL
Kate, Pear, NULL

Welcome to S/O. Typically you want to provide more realistic sample table structures and data that portray at least what is critical to your question. If the data is private, make some up, but keep it in accurate context.
That being said, I don't think you actually have a table of apples, pears and apples & pears, but I will play along.
I assume (example) you have a table of people, and another table of things they are associated with in a child table. I would do a join per person FIRST, to those that DO have an association with the "apples & pears" record, then do a LEFT-JOIN to same child table once for apples, another for pears. You can determine what you want to do with results (such as delete from the individual apples, pears child records).
Select
p.LastName,
p.FirstName,
case when justApples.PersonID IS NULL
then 'No' else 'Yes' end as AlsoHasApples,
case when justPears.PersonID IS NULL
then 'No' else 'Yes' end as AlsoHasPears
from
Person p
JOIN Fruits f
on p.PersonID = f.personID
AND f.Description = 'Apples & Pears'
LEFT JOIN Fruits justApples
on p.PersonID = justApples.PersonID
AND justApples.Description = 'Apples'
LEFT JOIN Fruits justPears
on p.PersonID = justPears.PersonID
AND justPears.Description = 'Pears'
Since the first "(inner) JOIN" explicitly is looking for those that ARE marked as both apples & pears record, that minimizes the list to just those people pre-qualified.
THEN, the two "LEFT-JOIN" are explicitly looking for the OTHER individualized "Apples" and "Pears" respectively.
Testing for IS NULL indicates whether the person has that item (not NULL) or does not (NULL).
If you ONLY WANT to return those that are Apples & Pears AND have one or both of the other individual items, you can add a WHERE clause at the end such as
WHERE
justApples.PersonID IS NOT NULL
OR justPears.PersonID IS NOT NULL
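A minimal sketch of this join pattern, run from Python against an in-memory SQLite database. The Person and Fruits layouts and all sample rows are made up for illustration, since the question does not show the real tables. Note that each LEFT JOIN filters on its own alias's Description.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical tables and rows, assumed only for this sketch.
conn.executescript("""
CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
CREATE TABLE Fruits (PersonID INTEGER, Description TEXT);
INSERT INTO Person VALUES (1, 'Josh', 'A'), (2, 'Tom', 'B'), (3, 'Kate', 'C');
INSERT INTO Fruits VALUES
    (1, 'Apples & Pears'), (1, 'Apples'), (1, 'Pears'),
    (2, 'Apples & Pears'), (2, 'Apples'),
    (3, 'Pears');
""")

rows = conn.execute("""
SELECT p.FirstName,
       CASE WHEN justApples.PersonID IS NULL THEN 'No' ELSE 'Yes' END AS AlsoHasApples,
       CASE WHEN justPears.PersonID  IS NULL THEN 'No' ELSE 'Yes' END AS AlsoHasPears
FROM Person p
JOIN Fruits f                      -- inner join: only people with the combined record
  ON p.PersonID = f.PersonID
 AND f.Description = 'Apples & Pears'
LEFT JOIN Fruits justApples        -- optional: the individual 'Apples' record
  ON p.PersonID = justApples.PersonID
 AND justApples.Description = 'Apples'
LEFT JOIN Fruits justPears         -- optional: the individual 'Pears' record
  ON p.PersonID = justPears.PersonID
 AND justPears.Description = 'Pears'
ORDER BY p.PersonID
""").fetchall()

print(rows)
```

Kate drops out because she has no combined record, and Tom shows 'No' for pears because the LEFT JOIN found no matching row.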
The inverse would be to pre-qualify on justApples and justPears by REQUIRING the JOINs to apples and pears individually, and to LEFT-JOIN the combination record to see who is (or is NOT) marked as associated with Apples & Pears:
Select
p.LastName,
p.FirstName,
case when f.PersonID IS NULL
then 'No' else 'Yes' end as IsAssociatedWithApplesAndPearsRecord
from
Person p
JOIN Fruits justApples
on p.PersonID = justApples.PersonID
AND justApples.Description = 'Apples'
JOIN Fruits justPears
on p.PersonID = justPears.PersonID
AND justPears.Description = 'Pears'
LEFT JOIN Fruits f
on p.PersonID = f.personID
AND f.Description = 'Apples & Pears'
The two joins to the individual records REQUIRE each individual part, and the LEFT-JOIN shows IF they are also marked with the combination entry. As you can see from the ambiguity in your question, providing a little more detail about what you want can significantly help in getting a more accurate answer to your true needs. Try not to mask the context; mask only the sample data itself where privacy is a concern.
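For the exact sample tables in the question, one possible approach is to compare how many distinct fruits a user holds for a product against how many that product requires. This is only a sketch under assumptions: the table names user_fruit and fruit_product and the column name user_name are invented here, and it runs on SQLite from Python rather than on the asker's database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table and column names are assumptions; the rows are the question's sample data.
conn.executescript("""
CREATE TABLE user_fruit (user_name TEXT, fruit TEXT);
CREATE TABLE fruit_product (fruit TEXT, product TEXT);
INSERT INTO user_fruit VALUES
    ('Josh', 'Apple'), ('Josh', 'Pear'), ('Tom', 'Apple'), ('Kate', 'Pear');
INSERT INTO fruit_product VALUES
    ('Apple', 'Apples&Pears'), ('Pear', 'Apples&Pears');
""")

rows = conn.execute("""
SELECT uf.user_name, uf.fruit, complete.product
FROM user_fruit uf
LEFT JOIN fruit_product fp
  ON fp.fruit = uf.fruit
LEFT JOIN (
    -- (user, product) pairs where the user holds every fruit the product needs
    SELECT u.user_name, p.product
    FROM user_fruit u
    JOIN fruit_product p ON p.fruit = u.fruit
    GROUP BY u.user_name, p.product
    HAVING COUNT(DISTINCT u.fruit) =
           (SELECT COUNT(*) FROM fruit_product r WHERE r.product = p.product)
) AS complete
  ON complete.user_name = uf.user_name
 AND complete.product = fp.product
ORDER BY uf.rowid
""").fetchall()

for row in rows:
    print(row)
```

Users with an incomplete set keep their rows but get NULL (None in Python) for the product, which matches the outcome table in the question.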

Related

Combining two records via Group By

I'm still new and learning in Access vba and appreciate if you can help me with my current scenario.
I have developed a code in VBA which pull the data from a table named Tblsrce
sqlStr = "SELECT zYear, zMonth, Product, Sum(Dollar) as totalAmt FROM Tblsrce " & _
"WHERE fruits IN (NOT NULL, '" & Replace(strFruits, ", ", "', '") & "') " & _
"GROUP BY zYear, zMonth, Product;"
The usual data that the field fruits contains Mango, Apples, Cherry, Banana, etc.
strFruits is a variable that came from users (which is separated by comma if they want to pull more than 1 fruit).
However, I have a problem when there are 2 related fruits with different names (e.g. Red Apple and Green Apple) which I need to combine. Is there any way I can Group By those records and tag them as Apples in my current query?
Thanks!
Yes, you could use conditionals like the switch function to calculate some fruit group field.
Switch(
Product='Red Apple', 'Apple',
Product='Green Apple', 'Apple',
Product='Orange', 'Citrus') As ProductGroup
You can then use that field in a higher level query:
Select zYear, zMonth, ProductGroup,
Count(*)
From
(Select f.*,
Switch( .... ) As ProductGroup
From Fruits f) As grouped
Group By zYear, zMonth, ProductGroup
Of course it would be easier if this data isn't calculated dynamically in the query like this, but instead is stored in a separate table, so you know a product group for each of the products. That's also way easier to maintain (just add data instead of modify a query), and probably performs better.
You could, but you would have to have an additional table where you list all fruits, and their groups. Then you can join that in, and group by the groups.
Sample structure:
+-------------+---------------+
| Fruit       | FruitCategory |
+-------------+---------------+
| Red apple   | Apple         |
+-------------+---------------+
| Green apple | Apple         |
+-------------+---------------+
| Banana      | Banana        |
+-------------+---------------+
You can prepopulate the table with a quick SELECT DISTINCT Fruits from Tblsrce and insert that in both columns, and then adjust the categories where you want.
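Here is a rough sketch of the lookup-table idea, using SQLite from Python instead of Access so it can be run anywhere. The FruitCategories table name, the reduced Tblsrce column list, and the sample amounts are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified stand-in for the asker's table (assumed columns and rows).
CREATE TABLE Tblsrce (zYear INTEGER, zMonth INTEGER, fruits TEXT, Dollar REAL);
INSERT INTO Tblsrce VALUES
    (2019, 1, 'Red Apple', 10), (2019, 1, 'Green Apple', 5),
    (2019, 1, 'Banana', 3),     (2019, 2, 'Red Apple', 7);

-- Hypothetical lookup table: one row per fruit, with the group it belongs to.
CREATE TABLE FruitCategories (Fruit TEXT PRIMARY KEY, FruitCategory TEXT);

-- Prepopulate both columns with the distinct fruit names...
INSERT INTO FruitCategories SELECT DISTINCT fruits, fruits FROM Tblsrce;
-- ...then adjust only the categories that should be merged.
UPDATE FruitCategories SET FruitCategory = 'Apple'
WHERE Fruit IN ('Red Apple', 'Green Apple');
""")

rows = conn.execute("""
SELECT s.zYear, s.zMonth, c.FruitCategory, SUM(s.Dollar) AS totalAmt
FROM Tblsrce s
JOIN FruitCategories c ON c.Fruit = s.fruits
GROUP BY s.zYear, s.zMonth, c.FruitCategory
ORDER BY s.zYear, s.zMonth, c.FruitCategory
""").fetchall()

for row in rows:
    print(row)
```

Adding a new variety later only needs a new row in the lookup table; the grouping query itself stays unchanged.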

Query returning multiple identical rows instead one

I have the tables: juices, juice_ingredients and ingredients.
The juices table has the attributes:
name
barcode
The juice_ingredients table has the attributes:
juice_id
ingredient_id
And the ingredients table has
name
optional (boolean)
For the sake of the customer's logistics, various juices may have the same barcode but different ingredients, some of which are optional.
I need to select, by barcode, the single juice that contains no optional ingredients.
I registered four ingredients: water (optional: false), sugar (optional: true), pineapple pulp (optional: false) and mint (optional: true). I also registered four juices: one with only water and pineapple pulp, another with water, pineapple pulp and sugar, another with water, pineapple pulp and mint, and another with water, pineapple pulp, mint and sugar. All have the same barcode. I wrote a query to select only the juice made with non-optional ingredients, in this case water and pineapple pulp.
SELECT *
FROM juices
INNER JOIN juice_ingredients ON (juice_ingredients.juice_id = juices.id)
INNER JOIN ingredients ON (juice_ingredients.ingredient_id = ingredients.id)
WHERE juices.barcode = '000000000001' AND ingredients.optional = false
But it returns multiple rows. What should I change in this query so that it returns only one row: the juice with no optional ingredients in its composition?
You could do it with a having clause:
SELECT juices.*
FROM juices
JOIN juice_ingredients ON juice_ingredients.juice_id = juices.id
JOIN ingredients ON juice_ingredients.ingredient_id = ingredients.id
WHERE juices.barcode = '000000000001'
GROUP BY 1, 2
HAVING MAX(ingredients.optional::text) = 'false'
Since you didn't specify which database you are using, you may have to adjust the SQL for your specific database:
select *
from juices j
where j.barcode = '000000000001'
and not exists (select *
from juice_ingredients ji
inner join ingredients i
on (i.id = ji.ingredient_id
and i.optional = true)
where ji.juice_id = j.id)
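To see why the original join returns several rows and how NOT EXISTS avoids that, here is a small sketch on SQLite via Python. The juice names are invented, and optional is stored as 0/1 because SQLite has no boolean type; the ingredients and barcode follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE juices (id INTEGER PRIMARY KEY, name TEXT, barcode TEXT);
CREATE TABLE ingredients (id INTEGER PRIMARY KEY, name TEXT, optional INTEGER);
CREATE TABLE juice_ingredients (juice_id INTEGER, ingredient_id INTEGER);
INSERT INTO ingredients VALUES
    (1, 'water', 0), (2, 'sugar', 1), (3, 'pineapple pulp', 0), (4, 'mint', 1);
-- Juice names are hypothetical; all four share one barcode as in the question.
INSERT INTO juices VALUES
    (1, 'plain', '000000000001'), (2, 'sweet', '000000000001'),
    (3, 'minty', '000000000001'), (4, 'sweet and minty', '000000000001');
INSERT INTO juice_ingredients VALUES
    (1, 1), (1, 3),
    (2, 1), (2, 3), (2, 2),
    (3, 1), (3, 3), (3, 4),
    (4, 1), (4, 3), (4, 4), (4, 2);
""")

barcode = "000000000001"

# The question's query: one row per (juice, non-optional ingredient) pair.
joined = conn.execute("""
SELECT juices.id, ingredients.name
FROM juices
JOIN juice_ingredients ON juice_ingredients.juice_id = juices.id
JOIN ingredients ON juice_ingredients.ingredient_id = ingredients.id
WHERE juices.barcode = ? AND ingredients.optional = 0
""", (barcode,)).fetchall()

# NOT EXISTS: keep a juice only if it has no optional ingredient at all.
only_required = conn.execute("""
SELECT j.id, j.name
FROM juices j
WHERE j.barcode = ?
  AND NOT EXISTS (SELECT 1
                  FROM juice_ingredients ji
                  JOIN ingredients i ON i.id = ji.ingredient_id
                  WHERE ji.juice_id = j.id AND i.optional = 1)
""", (barcode,)).fetchall()

print(len(joined), only_required)
```

The filter `optional = false` in the original only hides the optional ingredient rows; it does not remove juices that also contain optional ingredients, so every juice still appears once per required ingredient.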

SSRS query and WHERE with multiple

I'm new to SQL and SSRS. I can do many things already, but I think I must be missing some basics, so I keep banging my head against the wall.
I have a report that is almost working but needs to include more results, based on conditions.
My working query so far is like this:
SELECT projects.project_number, project_phases.project_phase_id, project_phases.project_phase_number, project_phases.project_phase_header, project_phase_expensegroups.projectphase_expense_total, invoicerows.invoicerow_total
FROM projects INNER JOIN
project_phases ON projects.project_id = project_phases.project_id
LEFT OUTER JOIN
project_phase_expensegroups ON project_phases.project_phase_id = project_phase_expensegroups.project_phase_id
LEFT OUTER JOIN
invoicerows ON project_phases.project_phase_id = invoicerows.project_phase_id
WHERE ( projects.project_number = #iProjectNumber )
AND
( project_phase_expensegroups.projectphase_expense_total >0 )
The parameter is for a selection list that is used to choose a project for the report.
How can I also include records that have
( project_phase_expensegroups.projectphase_expense_total ) with value 0, but where there might be invoices for that project phase?
I already tried adding another condition like this:
WHERE ( projects.project_number = #iProjectNumber )
AND
( project_phase_expensegroups.projectphase_expense_total > 0 )
OR
( invoicerows.invoicerow_total > 0 )
While it gives some results, including the ones where projectphase_expense_total is 0, the report is a total mess.
So my question is: what am I doing wrong here?
There is a core problem with your query in that you are left joining to two tables, implying that rows may not exist, but then putting conditions on those tables, which will eliminate NULLs. That means your query is internally inconsistent as is.
The next problem is that you're joining two tables to project_phases that both may have multiple rows. Since these data are not related to each other (as proven by the fact that you have no join condition between project_phase_expensegroups and invoicerows), your query is not going to work correctly. For example, given a list of people, a list of those people's favorite foods, and a list of their favorite colors like so:
People
Person
------
Joe
Mary
FavoriteFoods
Person  Food
------  ---------
Joe     Broccoli
Joe     Bananas
Mary    Chocolate
Mary    Cake
FavoriteColors
Person  Color
------  ----------
Joe     Red
Joe     Blue
Mary    Periwinkle
Mary    Fuchsia
When you join these with links between Person <-> Food and Person <-> Color, you'll get a result like this:
Person  Food       Color
------  ---------  ----------
Joe     Broccoli   Red
Joe     Bananas    Red
Joe     Broccoli   Blue
Joe     Bananas    Blue
Mary    Chocolate  Periwinkle
Mary    Chocolate  Fuchsia
Mary    Cake       Periwinkle
Mary    Cake       Fuchsia
This is essentially a cross-join, also known as a Cartesian product, between the Foods and the Colors, because they have a many-to-one relationship with each person, but no relationship with each other.
There are a few ways to deal with this in the report.
Create ExpenseGroup and InvoiceRow subreports, that are called from the main report by a combination of project_id and project_phase_id parameters.
Summarize one or the other set of data into a single value. For example, you could sum the invoice rows. Or, you could concatenate the expense groups into a single string separated by commas.
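The "summarize into a single value" option can be sketched like this, on SQLite via Python. The table and column names are shortened, hypothetical versions of the ones in the question, and the amounts are invented. The first query shows how the unrelated child rows inflate the sums; the second aggregates each child table before joining.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified tables: phase 1 has 2 expense groups and 3 invoice rows.
conn.executescript("""
CREATE TABLE phases (phase_id INTEGER PRIMARY KEY);
CREATE TABLE expensegroups (phase_id INTEGER, expense_total REAL);
CREATE TABLE invoicerows (phase_id INTEGER, invoicerow_total REAL);
INSERT INTO phases VALUES (1), (2);
INSERT INTO expensegroups VALUES (1, 100), (1, 50);
INSERT INTO invoicerows VALUES (1, 10), (1, 20), (1, 30), (2, 5);
""")

# Joining both child tables directly pairs every expense with every invoice.
naive = conn.execute("""
SELECT p.phase_id, SUM(e.expense_total), SUM(i.invoicerow_total)
FROM phases p
LEFT JOIN expensegroups e ON e.phase_id = p.phase_id
LEFT JOIN invoicerows i ON i.phase_id = p.phase_id
GROUP BY p.phase_id
ORDER BY p.phase_id
""").fetchall()

# Summarizing each child table to one row per phase first avoids the blow-up.
summarized = conn.execute("""
SELECT p.phase_id,
       COALESCE(e.expense_sum, 0),
       COALESCE(i.invoice_sum, 0)
FROM phases p
LEFT JOIN (SELECT phase_id, SUM(expense_total) AS expense_sum
           FROM expensegroups GROUP BY phase_id) e ON e.phase_id = p.phase_id
LEFT JOIN (SELECT phase_id, SUM(invoicerow_total) AS invoice_sum
           FROM invoicerows GROUP BY phase_id) i ON i.phase_id = p.phase_id
ORDER BY p.phase_id
""").fetchall()

print(naive)
print(summarized)
```

With the direct joins, phase 1 reports 450 and 120 instead of the real totals of 150 and 60, because each expense row is repeated once per invoice row and vice versa.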
Some notes:
Please, please format your query before posting it in a question. It is almost impossible to read when not formatted. It seems pretty clear that you're using a GUI to create the query, but do us the favor of not having to format it ourselves just to help you.
While formatting, please use aliases. Don't use full table names; they just make the query that much harder to understand.
You need an extra set of parentheses in your WHERE clause in order to get the logic right.
WHERE ( projects.project_number = #iProjectNumber )
AND (
(project_phase_expensegroups.projectphase_expense_total > 0)
OR
(invoicerows.invoicerow_total > 0)
)
Also, you're using a column in your WHERE clause from a table that is left joined without checking for NULLs. That basically makes it a (slow) inner join. If you want to include rows that don't match from that table you also need to check for NULL. Any comparison other than IS NULL never evaluates to true for NULL values. See this page for more information about SQL's three-valued predicate logic: http://www.firstsql.com/idefend3.htm
To keep your LEFT JOINs working as you intended you would need to do this:
WHERE ( projects.project_number = #iProjectNumber )
AND (
project_phase_expensegroups.projectphase_expense_total > 0
OR project_phase_expensegroups.project_phase_id IS NULL
OR invoicerows.invoicerow_total > 0
OR invoicerows.project_phase_id IS NULL
)
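A small sketch of that behavior on SQLite via Python, with hypothetical cut-down tables: phase 1 has an expense above zero, phase 2 has an expense of 0, and phase 3 has no expense row at all. The third query, which moves the filter into the ON clause, is an extra variation that is not part of the answers above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE phases (phase_id INTEGER PRIMARY KEY);
CREATE TABLE expensegroups (phase_id INTEGER, expense_total REAL);
INSERT INTO phases VALUES (1), (2), (3);
INSERT INTO expensegroups VALUES (1, 100), (2, 0);
""")

# Filter on the left-joined table in WHERE: unmatched rows (NULL) are dropped.
filtered = conn.execute("""
SELECT p.phase_id
FROM phases p
LEFT JOIN expensegroups e ON e.phase_id = p.phase_id
WHERE e.expense_total > 0
ORDER BY p.phase_id
""").fetchall()

# Adding an explicit IS NULL check keeps the phase that has no expense row.
with_null_check = conn.execute("""
SELECT p.phase_id
FROM phases p
LEFT JOIN expensegroups e ON e.phase_id = p.phase_id
WHERE e.expense_total > 0 OR e.phase_id IS NULL
ORDER BY p.phase_id
""").fetchall()

# Variation (an assumption, not from the thread): filter inside the ON clause,
# so every phase is kept and non-qualifying expenses simply show as NULL.
filter_in_on = conn.execute("""
SELECT p.phase_id, e.expense_total
FROM phases p
LEFT JOIN expensegroups e
  ON e.phase_id = p.phase_id AND e.expense_total > 0
ORDER BY p.phase_id
""").fetchall()

print(filtered, with_null_check, filter_in_on)
```

Note that the IS NULL version still excludes phase 2, whose expense row exists with a total of 0, so it is not by itself a way to show zero-expense phases.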
I found the solution and it was kind of easy after all. I changed only the second LEFT OUTER JOIN to an INNER JOIN and left out the condition that restricted the query to results over zero. I also used SELECT DISTINCT.
Now my report is working perfectly.

How to specify row names in MS Access 2007

I have a cross tab query and it pulls only the row name if there is data associated with it in the database. For example, if I have three types of musical instruments:
Guitar
Piano
Drums
Other
My results will show up as:
Guitar 1
Drums 2
It doesn't list Piano because there is no ID associated with Piano in the DB. I know I can specify columns in the properties menu, i.e. "1, 2, 3, 4, 5" will put columns in the DB for each, regardless of whether or not there is data to populate them.
I am looking for a similar solution for rows. Any ideas?
Also, I need NULL values to show up as 0.
Here's the actual SQL (forget the instrument example above)
TRANSFORM Count(Research.Patient_ID) AS CountOfPatient_ID
SELECT
Switch(
[Age]<22,"21 and under",
[Age]>=22 And [AGE]<=24,"Between 22 And 24",
[Age]>=25 And [AGE]<=29,"Between 25 And 29",
[Age]>=30 And [AGE]<=34,"30-34",
[Age]>=35 And [AGE]<=39,"35-39",
[Age]>=40 And [AGE]<=44,"40-44",
[Age]>44,"Over 44"
) AS Age_Range
FROM (Research
INNER JOIN (
SELECT ID, DateDiff("yyyy",DOB,Date()) AS AGE FROM Demographics
) AS Demographics ON Research.Patient_ID=Demographics.ID)
INNER JOIN [Letter Status] ON Research.Patient_ID=[Letter Status].Patient_ID
WHERE ((([Letter Status].Letter_Count)=1))
GROUP BY Demographics.AGE, [Letter Status].Letter_Count
PIVOT Research.Site In (1,2,3,4,5,6,7,8,9,10);
In short, I need all of the rows to show up regardless of whether or not there is a value (for some reason the LEFT JOIN isn't working, so if you can, please use my code to form your answer), and I also need to replace NULL values with 0.
Thanks
I believe this has to do with the way you are joining the instruments table to the IDs table. If you use a left outer join from instruments to IDs, Piano should be included. It would be helpful to see your actual tables and queries though, as your question is kind of vague.
What if you union the select with a hard-coded select with one value for each age group?
select 1 as Guitar, 1 as Piano, 1 as Drums, 1 as Other
When you do the transform, each row will have a result that is +1 of the result you want.
foo       barTmpCount
--------  ------------
Guitar    2
Piano     1
Drums     3
Other     1
You can then do a
select foo, barTmpCount - 1 as barCount from <query>
and get something like this
foo       barCount
--------  ---------
Guitar    1
Piano     0
Drums     2
Other     0
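The padding trick can be sketched as follows, using the instrument example and SQLite from Python rather than an Access crosstab. The plays table and its rows are invented. The sketch assumes UNION ALL, since a plain UNION would remove duplicates and lose the counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: one Guitar row, two Drums rows, nothing for Piano or Other.
conn.executescript("""
CREATE TABLE plays (instrument TEXT);
INSERT INTO plays VALUES ('Guitar'), ('Drums'), ('Drums');
""")

rows = conn.execute("""
SELECT foo, COUNT(*) - 1 AS barCount      -- subtract the padding row again
FROM (
    SELECT instrument AS foo FROM plays
    UNION ALL SELECT 'Guitar'             -- one hard-coded row per category
    UNION ALL SELECT 'Piano'
    UNION ALL SELECT 'Drums'
    UNION ALL SELECT 'Other'
) AS padded
GROUP BY foo
""").fetchall()

counts = dict(rows)
print(counts)
```

Every category appears because of its padding row, and categories with no real data come out as 0 instead of being missing or NULL.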

mysql where IN on large dataset or Looping?

I have the following scenario:
Table 1:
articles
id article_text category author_id
1 "hello world" 4 1
2 "hi" 5 2
3 "wasup" 4 3
Table 2
authors
id name friends_with
1 "Joe" "Bob"
2 "Sue" "Joe"
3 "Fred" "Bob"
I want to know the total number of authors that are friends with "Bob" for a given category.
So for example, for category 4 how many authors are there that are friends with "Bob".
The authors table is quite large, in some cases I have a million authors that are friends with "Bob"
So I have tried:
Get list of authors that are friends with bob, and then loop through them and get the count for each of them of that given category and sum all those together in my code.
The issue with this approach is it can generate a million queries, even though they are very fast, it seems there should be a better way.
I was thinking of trying to get a list of authors that are friends with bob and then building an IN clause with that list, but I fear that would blow out the amt of memory allowed in the query set.
Seems like this is a common problem. Any ideas?
thanks
SELECT COUNT(DISTINCT auth.id)
FROM authors auth
INNER JOIN articles art ON auth.id = art.author_id
WHERE friends_with = 'bob' AND art.category = 4
COUNT(DISTINCT auth.id) is required because the join to articles can produce multiple rows for each author.
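A quick way to check the effect of DISTINCT is to run the join on the question's sample data, here on SQLite via Python instead of MySQL. The fourth article (a second category 4 article by Joe) is an extra row assumed for this sketch, so that one author matches twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT, friends_with TEXT);
CREATE TABLE articles (id INTEGER PRIMARY KEY, article_text TEXT,
                       category INTEGER, author_id INTEGER);
INSERT INTO authors VALUES (1, 'Joe', 'Bob'), (2, 'Sue', 'Joe'), (3, 'Fred', 'Bob');
INSERT INTO articles VALUES
    (1, 'hello world', 4, 1), (2, 'hi', 5, 2), (3, 'wasup', 4, 3),
    (4, 'again', 4, 1);   -- assumed extra article, not in the question
""")

sql = """
SELECT COUNT(*), COUNT(DISTINCT auth.id)
FROM authors auth
INNER JOIN articles art ON auth.id = art.author_id
WHERE auth.friends_with = ? AND art.category = ?
"""
joined_rows, distinct_authors = conn.execute(sql, ("Bob", 4)).fetchone()

print(joined_rows, distinct_authors)
```

The join produces three rows (two for Joe, one for Fred), but only two distinct authors, which is the number the question asks for. It is a single query, so there is no need for a loop or a huge IN list built in application code.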
But if you have any control over the database, I would use a link table for friends_with. With your current design, either friends_with has to hold a comma-separated list of names, which will be disastrous for performance and require a completely different query, or each author can only have one friend.
Friends
id friend_id
then the query would look like this
SELECT COUNT(DISTINCT auth.id)
FROM authors auth
INNER JOIN articles art ON auth.id = art.author_id
INNER JOIN friends f ON auth.id = f.id
INNER JOIN authors fauth ON fauth.id = f.friend_id
WHERE fauth.name = 'bob' AND art.category = 4
It's more complex but allows for many friends. Just remember, this construct calls for 2 rows in friends for each pair: one from Joe to Bob and one from Bob to Joe.
You could build it differently but that would make the query even more complex.
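A sketch of the link-table variant, again on SQLite via Python. It assumes Bob is himself a row in authors (id 4 here, which the question's sample data does not show) and that each friendship is stored in both directions, as described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE articles (id INTEGER PRIMARY KEY, article_text TEXT,
                       category INTEGER, author_id INTEGER);
CREATE TABLE friends (id INTEGER, friend_id INTEGER);

-- Bob as author 4 is an assumption made for this sketch.
INSERT INTO authors VALUES (1, 'Joe'), (2, 'Sue'), (3, 'Fred'), (4, 'Bob');
INSERT INTO articles VALUES
    (1, 'hello world', 4, 1), (2, 'hi', 5, 2), (3, 'wasup', 4, 3);

-- Two rows per friendship, one for each direction.
INSERT INTO friends VALUES
    (1, 4), (4, 1),   -- Joe  <-> Bob
    (2, 1), (1, 2),   -- Sue  <-> Joe
    (3, 4), (4, 3);   -- Fred <-> Bob
""")

(friend_count,) = conn.execute("""
SELECT COUNT(DISTINCT auth.id)
FROM authors auth
INNER JOIN articles art ON auth.id = art.author_id
INNER JOIN friends f ON auth.id = f.id
INNER JOIN authors fauth ON fauth.id = f.friend_id
WHERE fauth.name = ? AND art.category = ?
""", ("Bob", 4)).fetchone()

print(friend_count)
```

On a large table, indexes on friends (friend_id, id) and articles (author_id, category) would likely matter more than the exact query shape, but that depends on the real data.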
Maybe something like
select fr.name,
fr.id,
au.name,
ar.article_text,
ar.category,
ar.author_id
from authors fr, authors au, articles ar
where fr.id = ar.author_id
and au.friends_with = fr.name
and ar.category = 4 ;
Just the count...
select count(distinct fr.name)
from authors fr, authors au, articles ar
where fr.id = ar.author_id
and au.friends_with = fr.name
and ar.category = 4 ;
A version without using joins (hopefully will work!)
SELECT count(distinct id) from authors where friends_with = 'Bob' and id in(select author_id from articles where category = 4)
I found statements with 'IN' easier to understand when I started out with SQL.