How to get the most recent rows in a group - sql

I have a Rails 4.2.5.x project running Postgres. I have a table with a structure similar to this:
id, contact_id, date, domain, f1, f2, f3, etc
1, ABC, 01-01-16, abc.com, 1, 2, 3, ...
2, ABC, 01-01-15, abc.com, 1, 2, 3, ...
3, ABC, 01-01-14, abc.com, 1, 2, 3, ...
4, DEF, 01-01-15, abc.com, 1, 2, 3, ...
5, DEF, 01-01-14, abc.com, 1, 2, 3, ...
6, GHI, 01-11-16, abc.com, 1, 2, 3, ...
7, GHI, 01-01-16, abc.com, 1, 2, 3, ...
8, GHI, 01-01-15, abc.com, 1, 2, 3, ...
9, GHI, 01-01-14, abc.com, 1, 2, 3, ...
...
...
99, ZZZ, 01-01-16, xyz.com, 1, 2, 3, ...
I need to query to find:
The most recent rows by date
filtered by domain
for a distinct contact_id (grouped by?)
row-limited result. I'm not adding this complication in this example, but it needs to be factored in: if there are 50 distinct contacts, I am only interested in the top 3 by date.
ID is the primary key.
There are indexes on the other columns.
The fX columns represent other data in the model that is needed (such as the contact's email, for example).
In MySQL, this would be a simple SELECT * FROM table WHERE domain='abc.com' GROUP BY contact_id ORDER BY date DESC; Postgres, however, complains in this case that:
ActiveRecord::StatementInvalid: PG::GroupingError: ERROR: column "table.id" must appear in the GROUP BY clause or be used in an aggregate function
I expect to get back 3 rows; 1, 4 and 6. Ideally, I'd like to get back the full rows in a single query... but I accept that I may need to do one query to get the IDs first, then another to find the items I want.
This is the closest I have got:
ExampleContacts
.select(:contact_id, 'max(date) AS max_date')
.where(domain: 'abc.com')
.group(:contact_id)
.order('max_date desc')
.limit(3)
However, this returns only the contact_id and max_date, not the id. I cannot add the id column for the row.
EDIT:
Essentially, I need to get the primary key back for the row which is grouped on the non-primary key and sorted by another field.

If you want the rows, you don't need grouping. It's simply Contact.select('DISTINCT ON (contact_id) *').where(domain: 'abc.com').order(:contact_id, date: :desc).limit(3)

Just to clarify #murad-yusufov's accepted answer, I ended up doing this:
subquery = ExampleContacts.select('DISTINCT ON (contact_id) *')
                          .where(domain: 'abc.com')
                          .order(:contact_id, date: :desc)

ExampleContacts.from("(#{subquery.to_sql}) example_contacts")
               .order(date: :desc)
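For reference, the SQL this generates is roughly the following (a sketch, assuming the table is named example_contacts; the column list is abbreviated to *):
SELECT example_contacts.* FROM (
    SELECT DISTINCT ON (contact_id) *
    FROM example_contacts
    WHERE domain = 'abc.com'
    ORDER BY contact_id, date DESC
) example_contacts
ORDER BY date DESC
DISTINCT ON keeps the first row per contact_id according to the inner ORDER BY (i.e. the most recent row for each contact), and the outer query re-sorts those one-per-contact rows by date, so adding LIMIT 3 to the outer query yields the three contacts with the most recent rows.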

Related

SQL query for sales pipeline

I need to build a sales pipeline with one query in SQL (BigQuery).
The table has these columns:
- timestamp (event time)
- id (user id)
- event
Each event is a number from 1 to 8, and I need to calculate how many unique users reached each step.
A step only counts if the previous steps have been completed.
The steps do not have to follow one another immediately; what matters is that step n-1 happened at some earlier point before step n.
If you sort the table by 'timestamp', you often get sequences like this for one 'id' within one day:
4, 4, 1, 1, 3, 6, 5, 5, 6, 5, 6, 7, 8, 1, 2, 5, 3, 4.
In this example, the longest valid sequence is 1, 2, 3, 4.
A sequence only counts within a single day!
I could not solve the problem with the max/min/lag/lead window functions. I even tried a CASE with sequential comparisons against lag+n values.
I have already wasted two days on this task.
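One possible shape for this, sketched for the first three steps only (the same pattern continues through step 8); it assumes a table named events with columns timestamp, id and event, and is not a tested solution. The idea is to find, per user and per day, the earliest time step 1 occurs, then the earliest time step 2 occurs after that, and so on, so that out-of-order events are ignored; a user counts at step n when the nth time is not NULL.
WITH s1 AS (
  SELECT id, DATE(timestamp) AS day,
         MIN(IF(event = 1, timestamp, NULL)) AS t1
  FROM events
  GROUP BY id, day
),
s2 AS (
  SELECT e.id, s1.day, s1.t1,
         MIN(IF(e.event = 2 AND e.timestamp > s1.t1, e.timestamp, NULL)) AS t2
  FROM events e
  JOIN s1 ON e.id = s1.id AND DATE(e.timestamp) = s1.day
  GROUP BY e.id, s1.day, s1.t1
),
s3 AS (
  SELECT e.id, s2.day, s2.t1, s2.t2,
         MIN(IF(e.event = 3 AND e.timestamp > s2.t2, e.timestamp, NULL)) AS t3
  FROM events e
  JOIN s2 ON e.id = s2.id AND DATE(e.timestamp) = s2.day
  GROUP BY e.id, s2.day, s2.t1, s2.t2
)
-- ...continue the same pattern with s4 through s8...
SELECT
  COUNT(DISTINCT IF(t1 IS NOT NULL, id, NULL)) AS step1_users,
  COUNT(DISTINCT IF(t2 IS NOT NULL, id, NULL)) AS step2_users,
  COUNT(DISTINCT IF(t3 IS NOT NULL, id, NULL)) AS step3_users
FROM s3;
Because MIN ignores NULLs and the comparison against a NULL previous step yields NULL, a user who never reached step n-1 on a given day automatically gets NULL for step n as well.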

Something like GROUP BY HAVING ALL IN [closed]

I want to select a ProductConfig that has exactly a given set of variants. Since it is a many-to-many relationship, I have an association table, and I have been trying to GROUP BY on it so that I can work with the other column.
The problem I am having is that I need an exact-equality comparison against a set of values inside the HAVING, something like HAVING variant_id = (1, 2, 3, 99).
For now I have the following query, which has some problems:
SELECT productconfig_id
FROM association_productconfig_elementvariant
GROUP BY productconfig_id
HAVING variant_id IN (1, 2, 3, 99);
This will match if productconfig_id has variant_id equal to ANY subset of {1, 2, 3, 99}, like {1, 2} or {1, 3}, but I only want it to match the exact set {1, 2, 3, 99}.
I also have the opposite problem. If productconfig_id has variant_id equal to {1, 2, 50}, it will still match, because the first two values are in the set even though the last one is not.
Basically, I want to compare a column for equality against a set of values. This second problem would be solved if there were something like HAVING ALL IN.
This is probably more on-target with what you need. Here, I am doing both a COUNT() of all rows per configuration and a SUM() of the rows whose variant_id is in the set in question. This makes sure that the records that DO qualify add up to the expected 4, but ALSO that the total count() of variants per configuration is 4.
So, a product with variants (1, 2, 3, 5, 12, 99, 102, 150) would have count(*) = 8 but a specific match of only 4 for those in question, and would be excluded.
If you can ignore the overall count of 8, just remove that AND portion from below, but at least you know the primary 4 in consideration are accounted for.
SELECT productconfig_id
FROM association_productconfig_elementvariant
GROUP BY productconfig_id
HAVING sum( case when variant_id in ( 1, 2, 3, 99 )
            then 1 else 0 end ) = 4
   AND count(*) = 4
Could you try this:
SELECT productconfig_id
FROM (
    SELECT productconfig_id, count(1) AS _count
    FROM association_productconfig_elementvariant
    WHERE variant_id IN (1, 2, 3, 99)
    GROUP BY productconfig_id
) AS s
WHERE _count = 4;
Basically, the only productconfig with a count of 4 will be the one you are looking for.
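Note that filtering on variant_id IN (...) alone still matches configurations that contain those four variants plus others (e.g. {1, 2, 3, 99, 150} also yields a count of 4). For an exact match, the subquery style above can be combined with the total-count check from the first answer; a sketch, assuming the same table and column names:
SELECT productconfig_id
FROM (
    SELECT productconfig_id,
           COUNT(*) AS total_count,
           SUM(CASE WHEN variant_id IN (1, 2, 3, 99) THEN 1 ELSE 0 END) AS match_count
    FROM association_productconfig_elementvariant
    GROUP BY productconfig_id
) AS s
WHERE match_count = 4
  AND total_count = 4;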

Set column value when multiple rows exist

I have a flattened data set that includes Order#, Shipment#, and ShippingCharges. There can be multiple shipments per order, but shipping charges are collected at the order level. Here is an example dataset:
1, 1, $5.00
2, 1, $6.00
2, 2, $6.00
3, 1, $10.00
3, 2, $10.00
3, 3, $10.00
4, 1, $4.00
As you can see, the order's ShippingCharges are repeated for each shipment in the data set. I need to come up with a query that will set ShippingCharges to 0 on all but the first shipment when there are multiple shipments on the order. The resulting dataset would look like this:
1, 1, $5.00
2, 1, $6.00
2, 2, $0.00
3, 1, $10.00
3, 2, $0.00
3, 3, $0.00
4, 1, $4.00
It is important to note that the Shipment# numbers do not all reset to 1 for each order. I did this in the sample dataset to make it easier to follow. Shipment# is actually a sequential integer that increments each time a shipment is created, so a simple UPDATE dataset SET ShippingCharges=0 WHERE Shipment# > 1 is NOT the answer.
It seems like I would need to do an UPDATE when there is more than 1 shipment for an order, but only for rows where the Shipment# is greater than the minimum Shipment# for the order.
Any ideas what that query might look like, especially for Microsoft Access?
UPDATE dataset SET ShippingCharges=0 WHERE Shipment# > 1 is NOT the answer.
Then set the charge to zero when Shipment# does not match the minimum Shipment# for that Order#.
UPDATE dataset
SET ShippingCharges=0
WHERE [Shipment#] <> DMin("[Shipment#]", "dataset", "[Order#]=" & [Order#])
If the Order# field is text datatype, add quotes in the third DMin argument (Criteria):
DMin("[Shipment#]", "dataset", "[Order#]='" & [Order#] & "'")
This was written for Oracle, so I am not sure whether you can do something similar in Access.
Sub-select against the table to see whether the row is the minimum shipment; if it is, use the shipping charge, otherwise zero it out.
select a.order_num, a.shipment_num,
case when a.shipment_num = (
select min(b.shipment_num)
from order_table b
where b.order_num = a.order_num
) then max(a.shipment_charges) else 0 end as shipment_charges
from order_table a
group by order_num, shipment_num
order by order_num, shipment_num
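If a SELECT that returns the adjusted charges is enough (rather than an UPDATE), the same idea can be written for Access with IIf and the DMin function used in the accepted answer; a sketch, untested, assuming the table is named dataset (as in the question's UPDATE) with fields [Order#], [Shipment#], and ShippingCharges:
SELECT d.[Order#], d.[Shipment#],
       IIf(d.[Shipment#] = DMin("[Shipment#]", "dataset", "[Order#]=" & d.[Order#]),
           d.ShippingCharges, 0) AS AdjustedCharges
FROM dataset AS d;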

Can SQL Server perform an update on rows with a set operation on the aggregate max or min value?

I am a fairly experienced SQL Server developer but this problem has me REALLY stumped.
I have a FUNCTION. The function is referencing a table that is something like this...
PERFORMANCE_ID, JUDGE_ID, JUDGING_CRITERIA, SCORE
--------------------------------------------------
101, 1, 'JUMP_HEIGHT', 8
101, 1, 'DEXTERITY', 7
101, 1, 'SYNCHRONIZATION', 6
101, 1, 'SPEED', 9
101, 2, 'JUMP_HEIGHT', 6
101, 2, 'DEXTERITY', 5
101, 2, 'SYNCHRONIZATION', 8
101, 2, 'SPEED', 9
101, 3, 'JUMP_HEIGHT', 9
101, 3, 'DEXTERITY', 6
101, 3, 'SYNCHRONIZATION', 7
101, 3, 'SPEED', 8
101, 4, 'JUMP_HEIGHT', 7
101, 4, 'DEXTERITY', 6
101, 4, 'SYNCHRONIZATION', 5
101, 4, 'SPEED', 8
In this example there are 4 judges (with IDs 1, 2, 3, and 4) judging a performance (101) against 4 different criteria (JUMP_HEIGHT, DEXTERITY, SYNCHRONIZATION, SPEED).
(Please keep in mind that in my real data there are 10+ criteria and at least 6 judges.)
I want to aggregate the results in a score BY JUDGING_CRITERIA and then aggregate those into a final score by summing...something like this...
SELECT SUM (Avgs) FROM
(SELECT AVG(SCORE) Avgs
FROM PERFORMANCE_SCORES
WHERE PERFORMANCE_ID=101
GROUP BY JUDGING_CRITERIA) result
BUT... that is not quite what I want IN THAT I want to EXCLUDE from the AVG the highest and lowest values for each JUDGING_CRITERIA grouping. That is the part that I can't figure out. The AVG should be applied only to the MIDDLE values of the GROUPING FOR EACH JUDGING_CRITERIA. The HI value and the LO value for JUMP_HEIGHT should not be included in the average. The HI value and the LO value for DEXTERITY should not be included in the average. ETC.
I know this could be accomplished with a cursor to set the hi and lo for each criteria to NULL. But this is a FUNCTION and should be extremely fast.
I am wondering if there is a way to do this as a SET operation but still automatically exclude HI and LO from the aggregation?
Thanks for your help. I have a feeling it can probably be done with some advanced SQL syntax but I don't know it.
One last thing. This example is actually a simplification of the problem I am trying to solve. I have other constraints not mentioned here for the sake of simplicity.
Seth
EDIT: -Moved the WHERE clause to inside the CTE.
-Removed JudgeID from the partition
This would be my approach
;WITH Agg1 AS
(
SELECT PERFORMANCE_ID
,JUDGE_ID
,JUDGING_CRITERIA
,SCORE
,MinFind = ROW_NUMBER() OVER ( PARTITION BY PERFORMANCE_ID
,JUDGING_CRITERIA
ORDER BY SCORE ASC )
,MaxFind = ROW_NUMBER() OVER ( PARTITION BY PERFORMANCE_ID
,JUDGING_CRITERIA
ORDER BY SCORE DESC )
FROM PERFORMANCE_SCORES
WHERE PERFORMANCE_ID=101
)
SELECT AVG(Score)
FROM Agg1
WHERE MinFind > 1
AND MaxFind > 1
GROUP BY JUDGING_CRITERIA
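The question also asks for those per-criteria averages to be rolled up into a single final score, and AVG over an integer SCORE column truncates in SQL Server, so a cast may be wanted. A sketch that builds on the same CTE (same table and column names assumed):
;WITH Agg1 AS
(
    SELECT PERFORMANCE_ID,
           JUDGE_ID,
           JUDGING_CRITERIA,
           SCORE,
           MinFind = ROW_NUMBER() OVER (PARTITION BY PERFORMANCE_ID, JUDGING_CRITERIA ORDER BY SCORE ASC),
           MaxFind = ROW_NUMBER() OVER (PARTITION BY PERFORMANCE_ID, JUDGING_CRITERIA ORDER BY SCORE DESC)
    FROM PERFORMANCE_SCORES
    WHERE PERFORMANCE_ID = 101
),
CriteriaAvgs AS
(
    SELECT JUDGING_CRITERIA,
           AVG(CAST(SCORE AS decimal(10, 2))) AS Avgs
    FROM Agg1
    WHERE MinFind > 1
      AND MaxFind > 1
    GROUP BY JUDGING_CRITERIA
)
SELECT SUM(Avgs) AS FinalScore
FROM CriteriaAvgs;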

SQL COUNT of COUNT

I have some data I am querying. The table is composed of two columns - a unique ID, and a value. I would like to count the number of times each unique value appears (which can easily be done with a COUNT and GROUP BY), but I then want to be able to count that. So, I would like to see how many items appear twice, three times, etc.
So for the following data (ID, val)...
1, 2
2, 2
3, 1
4, 2
5, 1
6, 7
7, 1
The intermediate step would be (val, count)...
1, 3
2, 3
7, 1
And I would like to have (count_from_above, new_count)...
3, 2 -- since three appears twice in the previous table
1, 1 -- since one appears once in the previous table
Is there any query which can do that? If it helps, I'm working with Postgres. Thanks!
Try something like this (my_table stands in for your table's name):
select
    times,
    count(*) as new_count
from ( select
           val,
           count(*) as times
       from my_table
       group by val ) a
group by times