GROUP BY with flagged value - sql

The query below should return the flag 'Y' if c.LAST_UPDATED_TIMESTAMP < MAX(t.LATEST_ACTION_TIMESTAMP):
SELECT
'Y' as CAN_UPDATE, t.LATEST_ACTION_TIMESTAMP
FROM CUSTOMERS c
LEFT JOIN TRANSACTIONS t on (c.customer_id = t.customer_id)
WHERE
t.status_active_flag = 'Y' and c.customer_ID ='CUST_019'
GROUP BY t.LATEST_ACTION_TIMESTAMP
HAVING c.LAST_UPDATED_TIMESTAMP < MAX(t.LATEST_ACTION_TIMESTAMP);
This raises ORA-00979 (not a GROUP BY expression). I understand that all columns in the SELECT need to be included in the GROUP BY clause. How can I handle the flagged value 'Y' in this case?

The column c.LAST_UPDATED_TIMESTAMP needs to be added to the GROUP BY clause as well:
GROUP BY t.LATEST_ACTION_TIMESTAMP, c.LAST_UPDATED_TIMESTAMP
here is a dbfiddle with a dumb example

The HAVING clause effectively turns the LEFT JOIN into an INNER JOIN. Because you are taking the maximum of the very column you are grouping by, the aggregation is irrelevant: there is only a single value per group. You can therefore move the HAVING comparison into the ON clause of the join, without aggregation:
SELECT 'Y' as CAN_UPDATE,
t.LATEST_ACTION_TIMESTAMP
FROM CUSTOMERS c
INNER JOIN TRANSACTIONS t
ON ( c.customer_id = t.customer_id
AND c.LAST_UPDATED_TIMESTAMP < t.LATEST_ACTION_TIMESTAMP
)
WHERE t.status_active_flag = 'Y'
AND c.customer_id ='CUST_019'
GROUP BY t.LATEST_ACTION_TIMESTAMP;
You could also use a DISTINCT (or UNIQUE) clause instead of the GROUP BY clause.
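The DISTINCT variant mentioned above can be tried out with a small script. This is only a sketch: it uses Python's built-in sqlite3 module as a stand-in for Oracle, and the lowercase table layout and the sample rows are invented for illustration. It shows that with the comparison in the ON clause, only the transaction timestamps newer than the customer's last update come back, each once.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Oracle (assumption);
# the sample rows below are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id TEXT, last_updated_timestamp TEXT);
    CREATE TABLE transactions (customer_id TEXT, status_active_flag TEXT,
                               latest_action_timestamp TEXT);
    INSERT INTO customers VALUES ('CUST_019', '2020-01-10');
    INSERT INTO transactions VALUES
        ('CUST_019', 'Y', '2020-01-05'),  -- older than the customer update
        ('CUST_019', 'Y', '2020-01-15'),  -- newer: should be flagged
        ('CUST_019', 'Y', '2020-01-15'),  -- same timestamp again
        ('CUST_019', 'N', '2020-01-20');  -- inactive: filtered out
""")

# The timestamp comparison lives in the ON clause; DISTINCT replaces GROUP BY.
rows = conn.execute("""
    SELECT DISTINCT 'Y' AS can_update, t.latest_action_timestamp
    FROM customers c
    INNER JOIN transactions t
            ON c.customer_id = t.customer_id
           AND c.last_updated_timestamp < t.latest_action_timestamp
    WHERE t.status_active_flag = 'Y'
      AND c.customer_id = 'CUST_019'
""").fetchall()
print(rows)
```

The timestamps are stored as ISO-8601 text here, so plain string comparison orders them correctly. In Oracle the columns would normally be real TIMESTAMP values.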

Related

I'm trying to remove duplicates while getting a MAX result at the same time. I can't remove the duplicates.

select distinct person.person_id, MAX(patient_encounter.enc_timestamp) as LastAppt
from patient_encounter inner join person on patient_encounter.person_id = person.person_id
where enc_timestamp between '2018-04-05 00:00:00.000' and '2020-04-05 23:59:59.999'
and patient_encounter.person_id = person.person_id
and billable_ind = 'y' and person.last_name <> 'ztest'
group by person.person_id, patient_encounter.enc_timestamp
order by person.person_id
I think you want:
select p.person_id, max(pe.enc_timestamp) as LastAppt
from patient_encounter pe inner join
person p
on pe.person_id = p.person_id
where pe.enc_timestamp >= '2018-04-05' and
pe.enc_timestamp < '2020-04-06' and
?.billable_ind = 'y' and -- what table is this in ???
p.last_name <> 'ztest'
group by p.person_id
order by p.person_id;
The GROUP BY specifies the definition of each row in the result set. Here it specifies that you want one row per distinct value of person_id.
Notes:
The fix is to remove the timestamp from the GROUP BY.
SELECT DISTINCT is almost never needed with GROUP BY. In fact, it is rarely needed at all.
There is no reason to redundantly duplicate the JOIN conditions in the WHERE clause.
Table aliases make the query easier to write and to read.
You should qualify all column references. What table is billable_ind in?
You can simplify the date comparisons. As a benefit, you can think in terms of days rather than milliseconds.
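The effect of removing the timestamp from the GROUP BY can be checked with a few rows. The sketch below runs on Python's sqlite3 module. The single `encounter` table and its data are hypothetical, cut down from the tables in the question.

```python
import sqlite3

# Hypothetical, simplified version of the encounter table from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE encounter (person_id INTEGER, enc_timestamp TEXT);
    INSERT INTO encounter VALUES
        (1, '2019-03-01 09:00:00'),
        (1, '2019-07-15 14:30:00'),
        (2, '2018-11-20 08:15:00');
""")

# Grouping by the timestamp as well: every distinct timestamp is its own
# group, so MAX() just returns that timestamp and person 1 appears twice.
per_timestamp = conn.execute("""
    SELECT person_id, MAX(enc_timestamp)
    FROM encounter
    GROUP BY person_id, enc_timestamp
    ORDER BY person_id, enc_timestamp
""").fetchall()

# Grouping by person_id only: one row per person with the latest timestamp.
per_person = conn.execute("""
    SELECT person_id, MAX(enc_timestamp)
    FROM encounter
    GROUP BY person_id
    ORDER BY person_id
""").fetchall()

print(per_timestamp)
print(per_person)
```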

Sum not selecting the values with Zero

I have two tables, CDMachine and Transaction.
CDMachine Table with columns CDMachineID, CDMachineName, InstallationDate
Transaction table with columns TransactionID,CDMachineID,TransactionTime,Amount
I am calculating revenue using the query below, but it eliminates the machines without any transactions:
SELECT CDMachine.CDMachineName,
SUM(Transaction.Amount)
FROM CDMachine
LEFT JOIN TRANSACTION ON CDMachine.CDMachineID = Transaction.CDMachineID
WHERE Transaction.TransactionTime BETWEEN '2019-01-01' AND '2019-01-31'
GROUP BY CDMachine.CDMachineName
ORDER BY 2
Move the WHERE condition to the ON clause:
select m.CDMachineName, sum(t.Amount)
from CDMachine m left join
Transaction t
on m.CDMachineID = t.CDMachineID and
t.TransactionTime between '2019-01-01' and '2019-01-31'
group by m.CDMachineName
order by 2;
The WHERE clause turns the outer join to an inner join -- meaning that you are losing the values that do not match.
If you want 0 rather than NULL for the sum, then use:
select m.CDMachineName, coalesce(sum(t.Amount), 0)
Even though you are using a LEFT JOIN, the fact that you have a filter on a column from the joined table causes rows that don't meet the join condition to be removed from the result set.
You need to apply the filter on transaction time to the transactions table, before joining it or as part of the join condition. I would do it like this:
SELECT CDMachine.CDMachineName,
SUM(Transaction.Amount)
FROM CDMachine
LEFT JOIN (
SELECT * FROM TRANSACTION
WHERE Transaction.TransactionTime BETWEEN '2019-01-01' AND '2019-01-31'
) AS Transaction
ON CDMachine.CDMachineID = Transaction.CDMachineID
GROUP BY CDMachine.CDMachineName
ORDER BY 2
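The difference between filtering in WHERE and filtering in ON is easy to reproduce. The following sketch uses Python's sqlite3 module. The table names `machine` and `txn` and the sample rows are invented for illustration and are not the asker's schema. Machine B has a transaction only outside January and machine C has none at all.

```python
import sqlite3

# Hypothetical miniature of the machine/transaction tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE machine (machine_id INTEGER, machine_name TEXT);
    CREATE TABLE txn (machine_id INTEGER, txn_time TEXT, amount INTEGER);
    INSERT INTO machine VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO txn VALUES
        (1, '2019-01-10', 50),
        (1, '2019-01-20', 25),
        (2, '2019-02-03', 40);  -- outside the January window
""")

# Date filter in WHERE: rows where t.txn_time is NULL or out of range are
# discarded after the join, so machines B and C disappear.
filter_in_where = conn.execute("""
    SELECT m.machine_name, COALESCE(SUM(t.amount), 0)
    FROM machine m
    LEFT JOIN txn t ON m.machine_id = t.machine_id
    WHERE t.txn_time BETWEEN '2019-01-01' AND '2019-01-31'
    GROUP BY m.machine_name
    ORDER BY m.machine_name
""").fetchall()

# Date filter in ON: unmatched machines are kept with NULL amounts,
# which COALESCE turns into 0.
filter_in_on = conn.execute("""
    SELECT m.machine_name, COALESCE(SUM(t.amount), 0)
    FROM machine m
    LEFT JOIN txn t
           ON m.machine_id = t.machine_id
          AND t.txn_time BETWEEN '2019-01-01' AND '2019-01-31'
    GROUP BY m.machine_name
    ORDER BY m.machine_name
""").fetchall()

print(filter_in_where)
print(filter_in_on)
```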

Need to understand specific LEFT OUTER JOIN behavior in SQL SELECT

I have two tables, transactions and dates. One date may have one or more transactions. I need to get a list of dates with or without transactions of a specific account (account number 111 below).
select d.the_date, t.account, t.amount from dates as d
LEFT OUTER JOIN transactions as t ON t.tx_date=d.the_date
where t.account=111 AND d.the_date>='2016-01-02' and d.the_date<='2017-12-30'
order by d.the_date;
The issue is that when I specify in the condition t.account=111 I don't get the dates on which account 111 did NOT make any transactions.
Only if I remove t.account=111 from the condition do I get the dates with no transactions (i.e. the LEFT OUTER JOIN works). Why does this happen?
Conditions on the second table need to go into the on clause:
select d.the_date, t.account, t.amount
from dates d left join
transactions t
on t.tx_date = d.the_date and t.account = 111
where d.the_date >= '2016-01-02' and d.the_date <= '2017-12-30'
order by d.the_date;
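Otherwise, for dates with no matching transaction, the value of t.account is NULL in the where clause, so t.account = 111 is not true and those rows are filtered out, turning the outer join into an inner join.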

Firebird can't recognize calculated column in group by clause

I have the following SQL:
select
inv.salesman_id,
(select salesman_goals.goal from salesman_goals
where salesman_goals.salesman_id = inv.salesman_id
and salesman_goals.group_id = g.group_id
and salesman_goals.subgroup_id = sg.subgroup_id
and salesman_goals.variation_id = v.variation_id)
as goal,
sum(i.quantity) as qnt
from invoiceitem i
inner join invoice inv on inv.invoice_id = i.invoice_id
inner join product p on p.product_id = i.product_id
left join groups g on g.group_id = p.group_id
left join subgroup sg on sg.group_id = g.group_id and sg.subgroup_id = p.subgroup_id
left join variation v on v.group_id = sg.group_id and v.subgroup_id = sg.subgroup_id and v.variation_id = p.variation_id
group by
1,2
which returns three columns, the first one is the salesman id, the second is a sub select to get the sales quantity goal, and the third is the actual sales quantity.
Even when grouping by the first and second columns, Firebird throws an error when executing the query:
Invalid expression in the select list (not contained in either an aggregate function or the GROUP BY clause).
What's the reason for this?
There is a column "in the select list (not contained in either an aggregate function or the GROUP BY clause)": namely, each outer column referenced in your subselect other than inv.salesman_id, that is g.group_id, sg.subgroup_id and v.variation_id. Such a column can have many values per group. When there is a GROUP BY (or just a HAVING, which treats the whole result as a single group), the SELECT clause returns one row per group, so there is no single value to return for such a column. So you want (as you put in an answer yourself):
group by
inv.salesman_id,
g.group_id,
sg.subgroup_id,
v.variation_id
OK guys, I found the solution for this problem.
The thing is, if you have a subquery in a column that will be in the GROUP BY clause, the outer columns referenced inside this subquery must also appear in the GROUP BY. So in this case, all I had to do was:
group by
inv.salesman_id,
g.group_id,
sg.subgroup_id,
v.variation_id
And that's it. Hope it helps if someone has the same issue in the future.

How can I join 3 tables and calculate the correct sum of fields from 2 tables, without duplicate rows?

I have tables A, B, C. Table A is linked to B, and table A is linked to C. I want to join the 3 tables and find the sum of B.cost and the sum of C.clicks. However, it is not giving me the expected value, and when I select everything without the group by, it is showing duplicate rows. I am expecting the row values from B to roll up into a single sum, and the row values from C to roll up into a single sum.
My query looks like
select A.*, sum(B.cost), sum(C.clicks) from A
join B
left join C
group by A.id
having sum(cost) > 10
I tried to group by B.a_id and C.another_field_in_a also, but that didn't work.
Here is a DB fiddle with all of the data and the full query:
http://sqlfiddle.com/#!9/768745/13
Notice how the sum fields are greater than the sum of the individual tables? I'm expecting the sums to be equal, containing only the rows of the table B and C once. I also tried adding distinct but that didn't help.
I'm using Postgres. (The fiddle is set to MySQL though.) Ultimately I will want to use a having clause to select the rows according to their sums. This query will be for millions of rows.
If I understand the logic correctly, the problem is the Cartesian product caused by the two joins. Your query is a bit hard to follow, but I think the intent is better handled with correlated subqueries:
select k.*,
(select sum(cost)
from ad_group_keyword_network n
where n.event_date >= '2015-12-27' and
n.ad_group_keyword_id = 1210802 and
k.id = n.ad_group_keyword_id
) as cost,
(select sum(clicks)
from keyword_click c
where (c.date is null or c.date >= '2015-12-27') and
k.keyword_id = c.keyword_id
) as clicks
from ad_group_keyword k
where k.status = 2 ;
Here is the corresponding SQL Fiddle.
EDIT:
The subselect should be faster than the group by on the unaggregated data. However, you need the right indexes: ad_group_keyword_network(ad_group_keyword_id, event_date, cost) and keyword_click(keyword_id, date, clicks).
I found this (MySQL joining tables group by sum issue) and created a query like this:
select *
from A
join (select B.a_id, sum(B.cost) as cost
from B
group by B.a_id) B on A.id = B.a_id
left join (select C.keyword_id, sum(C.clicks) as clicks
from C
group by C.keyword_id) C on A.keyword_id = C.keyword_id
group by A.id
having sum(cost) > 10
I don't know if it's efficient though. I don't know if it's more or less efficient than Gordon's. I ran both queries and this one seemed faster, 27s vs. 2m35s. Here is a fiddle: http://sqlfiddle.com/#!15/c61c74/10
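To see why the sums come out too large, and why aggregating each child table before joining fixes it, here is a minimal sketch using Python's sqlite3 module. The tables a, b and c mirror the A/B/C description in the question, but the rows are invented, and the final filter is written as a plain WHERE on the pre-aggregated cost, which is an assumption about the intent of the HAVING clause.

```python
import sqlite3

# Invented data: one row in a, two matching rows in b, three in c.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, keyword_id INTEGER);
    CREATE TABLE b (a_id INTEGER, cost INTEGER);
    CREATE TABLE c (keyword_id INTEGER, clicks INTEGER);
    INSERT INTO a VALUES (1, 10);
    INSERT INTO b VALUES (1, 5), (1, 7);             -- true cost sum: 12
    INSERT INTO c VALUES (10, 1), (10, 2), (10, 3);  -- true click sum: 6
""")

# Joining both child tables first yields 2 x 3 = 6 rows for a.id = 1,
# so each cost is counted 3 times and each click count twice.
naive = conn.execute("""
    SELECT a.id, SUM(b.cost), SUM(c.clicks)
    FROM a
    JOIN b ON b.a_id = a.id
    LEFT JOIN c ON c.keyword_id = a.keyword_id
    GROUP BY a.id
""").fetchall()

# Aggregating each child table in a derived table first leaves at most one
# row per key on each side, so the join cannot multiply rows.
pre_aggregated = conn.execute("""
    SELECT a.id, b.cost, c.clicks
    FROM a
    JOIN (SELECT a_id, SUM(cost) AS cost
          FROM b GROUP BY a_id) b ON b.a_id = a.id
    LEFT JOIN (SELECT keyword_id, SUM(clicks) AS clicks
               FROM c GROUP BY keyword_id) c ON c.keyword_id = a.keyword_id
    WHERE b.cost > 10
""").fetchall()

print(naive)
print(pre_aggregated)
```

Because each derived table is already reduced to one row per key, the outer query in this sketch needs no GROUP BY at all.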
Simply split the aggregate of the second table into a subquery as follows:
http://sqlfiddle.com/#!9/768745/27
select ad_group_keyword.*, SumCost, sum(keyword_click.clicks)
from ad_group_keyword
left join keyword_click on ad_group_keyword.keyword_id = keyword_click.keyword_id
left join (select ad_group_keyword.id, sum(cost) SumCost
from ad_group_keyword join ad_group_keyword_network on ad_group_keyword.id = ad_group_keyword_network.ad_group_keyword_id
where event_date >= '2015-12-27'
group by ad_group_keyword.id
having sum(cost) > 20
) Cost on Cost.id=ad_group_keyword.id
where
(keyword_click.date is null or keyword_click.date >= '2015-12-27')
and status = 2
group by ad_group_keyword.id