Rails/SQL - How to filter by distinct value in a group - sql

I'm working in Rails, but an answer in SQL is equally helpful. Let's say I have a table of Users and a table of Purchases. I want to find the Users who have only ever bought Item A. I was hoping to use a query along the lines of:
User.joins(:purchases).group(:id).having("DISTINCT(item) = 'A'").pluck(:id)
This is a simplification of the question I need to answer, but this grouping issue is my main roadblock. For that reason, I'm hoping for an answer that is logically very similar, as other workarounds would likely not apply.

Does this work in Rails?
User.joins(:purchases).group(:id).having("MIN(item) = MAX(item) AND MIN(item) = 'A'").pluck(:id)
This phrase as: there is only one distinct value (since MIN() and MAX() are equal), that is 'A'.
Alternatively:
having("MAX(CASE WHEN item <> 'A' THEN 1 ELSE 0 END) = 0")
Which would stand for: no other value than 'A'.

In having you can use only aggregate functions (e.g. having count(id) > 2) or expressions on columns you did the grouping on e.g. having("id > 1").
So depending on your db you may try to find an aggregate function that identifies existence of item in the grouping per id.
For PostgreSql that would be something like (haven't tested):
...
GROUP BY id
HAVING 'A' = ANY(ARRAY_AGG(item))

Related

SQL Oracle: Trying to pull a count with AND operators, New and needs experienced eyes

I am new to SQL and have had pretty good luck figuring things out thus far but I am missing something in this query:
The question is how to return a distinct count from two columns using another column and the criteria if the value is greater than 0.
I have tried IF and AND operators (My current query returns a 0 not an error, and it works when only using one .shp criteria)
select count (distinct ti.TO_ADDRESS)
from ti
where ti.input_id = 'xxx_029_01z_c_zzzzbab_ecrm.shp'
and ti.input_id = 'xxx_030_01z_c_zzzzbab_ecrm.shp'
and ti.OPENED>0;
Thanks so much!!
I think you want two levels of aggregation:
select count(*)
from (select ti.TO_ADDRESS
from ti
where ti.input_id in ('xxx_029_01z_c_zzzzbab_ecrm.shp', 'xxx_030_01z_c_zzzzbab_ecrm.shp') and
ti.OPENED > 0
group by ti.TO_ADDRESS
having count(distinct ti.input_id) = 2 -- has both of them
) ti;

Getting a unique value from an aggregated result set

I've got an aggregated query that checks if I have more than one record matching certain conditions.
SELECT RegardingObjectId, COUNT(*) FROM [CRM_MSCRM].[dbo].[AsyncOperationBase] a
where WorkflowActivationId IN ('55D9A3CF-4BB7-E311-B56B-0050569512FE',
'1BF5B3B9-0CAE-E211-AEB5-0050569512FE',
'EB231B79-84A4-E211-97E9-0050569512FE',
'F0DDF5AE-83A3-E211-97E9-0050569512FE',
'9C34F416-F99A-464E-8309-D3B56686FE58')
and StatusCode = 10
group by RegardingObjectId
having COUNT(*) > 1
That's nice, but then there is one field in AsyncOperationBase that will be unique. Say count(*) = 3, well, AsyncOperationBaseId in AsyncOperationBase will have 3 different values since AsyncOperationBase is the table's primary key.
To be honest, I would not even know what terms and expressions to Google to find a solution.
If anyone has a solution and also, is there any words to describe what I'm looking for ? Perhaps BI people are often faced with such a requirement or something...
I could do it with an SSRS report where the report would visually do the grouping then I could expand each grouped row to get the AsyncOperationBaseId value, but simply through SQL, I can't seem to find a way out...
Thanks.
select * from [CRM_MSCRM].[dbo].[AsyncOperationBase]
where RegardingObjectId in
(
SELECT RegardingObjectId
FROM [CRM_MSCRM].[dbo].[AsyncOperationBase] a
where WorkflowActivationId IN
(
'55D9A3CF-4BB7-E311-B56B-0050569512FE',
'1BF5B3B9-0CAE-E211-AEB5-0050569512FE',
'EB231B79-84A4-E211-97E9-0050569512FE',
'F0DDF5AE-83A3-E211-97E9-0050569512FE',
'9C34F416-F99A-464E-8309-D3B56686FE58'
)
and StatusCode = 10
group by RegardingObjectId
having COUNT(*) > 1
)

Writing a query to include certain values but exclude others when looking for a latest time period

I am trying to write a query that looks for a people that have a certain code with the latest period (year) but not if they have another code with that latest period(year). I'll be explicit just so my example makes sense.
I want people who have the code A1,A2,A3,A4,A5 but not AG,AP,AQ. There are people who have an A1 code for a period (like 2014) and an AG code for a the same period. I'd like to exclude them. Not everyone has a code so the field value could be NULL.
Is there a way to express this in a different way (i.e. less characters) than the way I did?
SELECT
people.firstName
FROM
people
WHERE EXISTS (
SELECT *
FROM codes
WHERE
codes.people_id = people.id
AND period = (SELECT MAX(period) FROM codes codes2 WHERE codes2.people_id = codes.people_id)
AND code LIKE 'A[1-5]'
)
AND NOT EXISTS (
SELECT *
FROM codes
WHERE
codes.people_id = people.id
AND period = (
SELECT MAX(period)
FROM codes codes2
WHERE codes2.people_id = codes.people_id
)
AND code LIKE 'A[GPQ]'
)
Schema is as follows:
People
id (PK)
firstName
Codes
people_id (FK) many to one relation with People table
code (e.g. "A1", "A2", "AG")
period (e.g. "2013", "2014")
There are so many ways you could do that, I'm not an SQL expert but I can't see your query being too bad, if you want to try and reduce the number of sub-queries you could consider using the GROUP BY clause along with a SUM Aggregate function in a HAVING clause.
I started updating your code as follows:
SELECT
people.firstName
FROM
people
LEFT JOIN codes AS a15 ON a15.people_id = people.id AND a15.code LIKE 'A[1-5]'
LEFT JOIN codes AS agpq ON agpq.people_id = people.id AND agpq.code LIKE 'A[GPQ]'
GROUP BY
people.firstName
HAVING
SUM(CASE WHEN a15.code IS NULL THEN 0 ELSE 1 END) > 0
AND SUM(CASE WHEN agpq.code IS NULL THEN 0 ELSE 1 END) = 0
This however doesn't take into account anything to do with period specific requirements described. You could add the period to the GROUP BY clause or add it to a WHERE or one of the JOIN constraints but I'm not quite sure from your description exactly what you're after (I don't believe this is through any fault of your own, I just can't personally align the code provided to the description).
I would also like to point out that the SUM functions above will not give an accurate count of the number of matching codes. This is because if both A[GPQ] and A[1_5] return at least one row, the number returned by each constraint will be multiplied by the number returned for the other, it can however be used to determine if there are "any" returned items as if the criteria is matched it will have a SUM(...) > 0
I'm sure a more experienced SQL Developer / DBA will be able to poke many holes in my proposed query but it might give them or someone else something to work from and hopefully gives you ideas for alternatives to using sub-queries.

Oracle Group by issue

I have the below query. The problem is the last column productdesc is returning two records and the query fails because of distinct. Now i need to add one more column in where clause of the select query so that it returns one record. The issue is that the column i need
to add should not be a part of group by clause.
SELECT product_billing_id,
billing_ele,
SUM(round(summary_net_amt_excl_gst/100)) gross,
(SELECT DISTINCT description
FROM RES.tariff_nt
WHERE product_billing_id = aa.product_billing_id
AND billing_ele = aa.billing_ele) productdescr
FROM bil.bill_sum aa
WHERE file_id = 38613 --1=1
AND line_type = 'D'
AND (product_billing_id, billing_ele) IN (SELECT DISTINCT
product_billing_id,
billing_ele
FROM bil.bill_l2 )
AND trans_type_desc <> 'Change'
GROUP BY product_billing_id, billing_ele
I want to modify the select statement to the below way by adding a new filter to the where clause so that it returns one record .
(SELECT DISTINCT description
FROM RRES.tariff_nt
WHERE product_billing_id = aa.product_billing_id
AND billing_ele = aa.billing_ele
AND (rate_structure_start_date <= TO_DATE(aa.p_effective_date,'yyyymmdd')
AND rate_structure_end_date > TO_DATE(aa.p_effective_date,'yyyymmdd'))
) productdescr
The aa.p_effective_date should not be a part of GROUP BY clause. How can I do it? Oracle is the Database.
So there are multiple RES.tariff records for a given product_billing_id/billing_ele, differentiated by the start/end dates
You want the description for the record that encompasses the 'p_effective_date' from bil.bill_sum. The kicker is that you can't (or don't want to) include that in the group by. That suggests you've got multiple rows in bil.bill_sum with different effective dates.
The issue is what do you want to happen if you are summarising up those multiple rows with different dates. Which of those dates do you want to use as the one to get the description.
If it doesn't matter, simply use MIN(aa.p_effective_date), or MAX.
Have you looked into the Oracle analytical functions. This is good link Analytical Functions by Example

Group by SQL statement

So I got this statement, which works fine:
SELECT MAX(patient_history_date_bio) AS med_date, medication_name
FROM biological
WHERE patient_id = 12)
GROUP BY medication_name
But, I would like to have the corresponding medication_dose also. So I type this up
SELECT MAX(patient_history_date_bio) AS med_date, medication_name, medication_dose
FROM biological
WHERE (patient_id = 12)
GROUP BY medication_name
But, it gives me an error saying:
"coumn 'biological.medication_dose' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.".
So I try adding medication_dose to the GROUP BY clause, but then it gives me extra rows that I don't want.
I would like to get the latest row for each medication in my table. (The latest row is determined by the max function, getting the latest date).
How do I fix this problem?
Use:
SELECT b.medication_name,
b.patient_history_date_bio AS med_date,
b.medication_dose
FROM BIOLOGICAL b
JOIN (SELECT y.medication_name,
MAX(y.patient_history_date_bio) AS max_date
FROM BIOLOGICAL y
GROUP BY y.medication_name) x ON x.medication_name = b.medication_name
AND x.max_date = b.patient_history_date_bio
WHERE b.patient_id = ?
If you really have to, as one quick workaround, you can apply an aggregate function to your medication_dose such as MAX(medication_dose).
However note that this is normally an indication that you are either building the query incorrectly, or that you need to refactor/normalize your database schema. In your case, it looks like you are tackling the query incorrectly. The correct approach should the one suggested by OMG Poinies in another answer.
You may be interested in checking out the following interesting article which describes the reasons behind this error:
But WHY Must That Column Be Contained in an Aggregate Function or the GROUP BY clause?
You need to put max(medication_dose) in your select. Group by returns a result set that contains distinct values for fields in your group by clause, so apparently you have multiple records that have the same medication_name, but different doses, so you are getting two results.
By putting in max(medication_dose) it will return the maximum dose value for each medication_name. You can use any aggregate function on dose (max, min, avg, sum, etc.)