How to avoid using aggregation function in another aggregation function in ClickHouse - sql

I want to get the average number of days between creation user profile ads_clients_data.create_date and the first day of its posts min(ads_data.time).
I wrote the following SQL statement:
select
avg(dateDiff(dd, ads_clients_data.create_date, ads_data.time),
min(ads_data.time))
from
ads_data
inner join
ads_clients_data on ads_clients_data.client_union_id = ads_data.client_union_id;
But it is impossible to use min in avg function. It is the first time I worked with ClickHouse and would be very thankful if someone can help me.

If I follow you correctly, you can use two levels of aggregation:
select avg(datediff(dd, c.create_date, d.min_time)
from (
select client_union_id, min(ads_data.time) min_time
from ads_data
group by client_union_id
) d
inner join ads_clients_data c on c.client_union_id = d.client_union_id;

Related

SQL BigQuery - Error that variable is not grouped by even though it is

SQL Code:
SELECT community_table.community_name,
community_table.id,
DATE(timestamp) as date,
ifnull(COUNT(distinct app_opened.user_id), 0) as num_opened_DAU,
lag(COUNT(distinct app_opened.user_id)) OVER
(ORDER BY community_table.community_name, community_table.id, DATE(timestamp)) as pre_Value
FROM *** app_opened
LEFT JOIN (
SELECT DISTINCT id, community_id_2, context_traits_first_name, context_traits_last_name
FROM (
SELECT *
FROM ***,
UNNEST (JSON_EXTRACT_ARRAY(context_traits_community_ids, "$")) as community_id_2
)
GROUP by community_id_2, id, context_traits_first_name, context_traits_last_name) as community_id_table
ON community_id_table.id = app_opened.user_id
LEFT JOIN (
SELECT DISTINCT id, name as community_name
FROM ***) as community_table
ON TO_JSON_STRING(community_table.id) = community_id_table.community_id_2
WHERE app_opened.user_id is not null AND
EXTRACT(DAYOFWEEK FROM DATE(timestamp)) = 2 AND
community_table.community_name is not null
GROUP BY community_table.community_name, community_table.id, DATE(timestamp)
Error Message:
I am quite confused on what could be going wrong here, as the error says that timestamp is not grouped, even though I have grouped it at the bottom. I tried including just timestamp rather than Date(timestamp), but that ruins the table data that I am trying to create, where I find the number of users on a single day. Does anyone have any other ideas? My goal is for a single row, get the previous row's data, but because I am grouping by specific metrics, I need to make sure they are ordered by them as well. Thank you so much!
I think you simply need to modify OVER part as:
OVER (PARTITION BY community_table.community_name, community_table.id, DATE(timestamp)) as pre_Value
UPDATE. Seems that the problem was caused by using DATE() function within OVER so it can be solved by using DATE(timestamp) inside of subquery and passing alias to OVER

Datediff and aggregate

I am new to SQL so please excuse my lack of knowledge. This is the table i have based on the following statement:
'select S_OPERATION.OPERATIONID, CHANGE_H.SERVICEREQNO, CHANGE_H.UPDATEDDATE
from sunrise.S_OPERATION inner join
CHANGE_H on S_OPERATION.OPERATIONID = CHANGE_H.OPERATIONID
where (S_OPERATION.OPERATIONID = 102005212) OR
(S_OPERATION.OPERATIONID = 102005218) or
(s_operation.operationid = 102005406) or
(s_operation.operationid = 102005401) or
(s_operation.operationid = 102005215)'
enter image description here
I would like to be able to calculate the time difference between events within the same job.
Please note: OperationID=event, Servicereqno=job
My end goal is to calculate the average time taken between each event and export this into a report, but i am having problems getting past the first hurdle.
I have tried the following statement however it does not work:
WITH cteOps AS
(
SELECT
row_number() OVER (PARTITION BY change.servicereqid ORDER BY change.updateddate) seqid,
updateddate,
servicereqid
FROM CHANGE.updateddate, CHANGE.addedby, S_OPERATION.operationid, CHANGE.servicereqid
)
SELECT
DATEDIFF(millisecond, o1.updateddate, o2.updateddate) updateddatediff,
servicereqid
FROM cteOps o1
JOIN cteOps o2 ON o1.seqid=o2.seqid+1 AND o1.servicereqid=o2.servicereqid;
Many thanks in advance.
Your two queries look quite different having different table names, etc. So you'd probably have to adjust my query below to match what you actually have.
You can look into the previous record with LAG. So a query showing all those events with a time difference to the previous one could be:
select
c.updateddate
, c.addedby
, so.operationid
, c.servicereqid
, so.updateddate
, datediff
( millisecond
, lag(so.updateddate) over (partition by c.servicereqid order by so.updateddate)
, so.updateddate
) as updateddatediff
from change c
inner join change_h ch
on c.servicereqid = ch.servicereqno
and ch.operationid in (102005212, 102005218, 102005406, 102005401, 102005215)
inner join s_operation so
on ch.operationid = so.operationid
order by
c.servicereqid,
so.updateddate;
You can build up on this by using it as a derived table (a subquery in a FROM clause).

Sum is not getting after grouping also in SQL Server

I have a query like this.
select
TC.F_Exhibition_Code, TC.F_Exhibition,
c.F_Customer_Code, c.F_Customer_Name,
c.F_Address, c.F_ContactPerson,
c.F_Phone, c.F_Fax,
Tc.F_CreditInvoiceNo, tc.F_CreditInvoiceDate,
TC.F_Paymentmethod, TC.F_Currency,
TC.F_Description, TC.F_Price,
TC.F_quanity, TC.F_ReceivedAmt, TC.F_Totalamt,
sum(TC.F_Totalamt) as sum
from
T_CreditInvoice TC
left join
T_Customer c on c.F_Customer_Code = tc.F_Customer_Code
where
TC.F_CreditInvoiceNo = 'INV100098'
group by
TC.F_Exhibition_Code, TC.F_Exhibition, c.F_Customer_Code, c.F_Customer_Name,
c.F_Address, c.F_ContactPerson, c.F_Phone, c.F_Fax,
Tc.F_CreditInvoiceNo, tc.F_CreditInvoiceDate, TC.F_Paymentmethod,
TC.F_Currency, TC.F_Description, TC.F_Price,
TC.F_quanity, TC.F_ReceivedAmt, TC.F_Totalamt
I trying to get sum of my F_Totllamt column, but I am not getting it.
What is wrong with my query?
You can use window function and write as:
sum(TC.F_Totalamt) OVER (PARTITION BY TC.F_CreditInvoiceNo ORDER BY (SELECT 1))as sum
in existing query.. hope it helps!!
Demo
You're grouping on the variable you want to get the sum of. Hence you get the sum of every single "TC.F_Totalamt" because you're also grouping over that same variable.
Just remove "TC.F_Totalamt" from the "group by" statement and it will work.

Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause

I'm trying to select a bunch of patients with their unit and division and I want to group the result by unit name, but this code doesn't execute and gives the error as the topic of this question.
SELECT TOP (100) PERCENT
Pat.PatName AS Name,
srvDiv.sgMType AS Perkhidmatan,
Pat.PatMRank AS Pangkat,
Pat.PatMilitaryID AS [No# Tentera],
unt.untName AS Unit,
fct.pesStatusCode as StatusCode,
fct.pesSignedDate AS SignedDate
FROM dbo.FactPES AS fct INNER JOIN
dbo.DimPatient AS Pat ON fct.pesPatID = Pat.PatID LEFT OUTER JOIN
dbo.DimUnit AS unt ON fct.pesUnitID = unt.untID LEFT OUTER JOIN
dbo.DimServiceDiv AS srvDiv ON fct.pesServiceDivID = srvDiv.sgID
GROUP BY unt.untName
HAVING (deas.diDate BETWEEN
CONVERT(DATETIME, #FromDate, 102)
AND
CONVERT(DATETIME, #ToDate, 102))
I assume it's because unt.UntName is in my left join so I can't use it outside the join maybe ? I'm a bit confused because when I put it like this it works:
GROUP BY unt.untName, Pat.PatName, srvDiv.sgMType,
Pat.PatMRank, Pat.PatMilitaryID, unt.untName,
fct.pesStatusCode, fct.pesSignedDate
Any help is appreciated
First, please don't use TOP (100) PERCENT; it hurts even to read.
Second, your query contains no aggregate function, no SUM or COUNT for example. When you say you want to "group by unit name", I suspect you may simply want the results sorted by unit name. In that case, you want ORDER BY instead. (The advice from other to study what group by does is well taken.)
Finally, you might not need those CONVERT functions at the end, depending on your DBMS.
Whenever you use a GROUP BY - it should be present in the SELECT statement as a column. And if you do not want to contain it in a GROUP BY use it as an AGGREGATE column in SELECT.
So now in your case the second GROUP BY stated in your question will work.
Read this to understand more about GROUP BY

MySQL to PostgreSQL: GROUP BY issues

So I decided to try out PostgreSQL instead of MySQL but I am having some slight conversion problems. This was a query of mine that samples data from four tables and spit them out all in on result.
I am at a loss of how to convey this in PostgreSQL and specifically in Django but I am leaving that for another quesiton so bonus points if you can Django-fy it but no worries if you just pure SQL it.
SELECT links.id, links.created, links.url, links.title, user.username, category.title, SUM(votes.karma_delta) AS karma, SUM(IF(votes.user_id = 1, votes.karma_delta, 0)) AS user_vote
FROM links
LEFT OUTER JOIN `users` `user` ON (`links`.`user_id`=`user`.`id`)
LEFT OUTER JOIN `categories` `category` ON (`links`.`category_id`=`category`.`id`)
LEFT OUTER JOIN `votes` `votes` ON (`votes`.`link_id`=`links`.`id`)
WHERE (links.id = votes.link_id)
GROUP BY votes.link_id
ORDER BY (SUM(votes.karma_delta) - 1) / POW((TIMESTAMPDIFF(HOUR, links.created, NOW()) + 2), 1.5) DESC
LIMIT 20
The IF in the select was where my first troubles began. Seems it's an IF true/false THEN stuff ELSE other stuff END IF yet I can't get the syntax right. I tried to use Navicat's SQL builder but it constantly wanted me to place everything I had selected into the GROUP BY and that I think it all kinds of wrong.
What I am looking for in summary is to make this MySQL query work in PostreSQL. Thank you.
Current Progress
Just want to thank everybody for their help. This is what I have so far:
SELECT links_link.id, links_link.created, links_link.url, links_link.title, links_category.title, SUM(links_vote.karma_delta) AS karma, SUM(CASE WHEN links_vote.user_id = 1 THEN links_vote.karma_delta ELSE 0 END) AS user_vote
FROM links_link
LEFT OUTER JOIN auth_user ON (links_link.user_id = auth_user.id)
LEFT OUTER JOIN links_category ON (links_link.category_id = links_category.id)
LEFT OUTER JOIN links_vote ON (links_vote.link_id = links_link.id)
WHERE (links_link.id = links_vote.link_id)
GROUP BY links_link.id, links_link.created, links_link.url, links_link.title, links_category.title
ORDER BY links_link.created DESC
LIMIT 20
I had to make some table name changes and I am still working on my ORDER BY so till then we're just gonna cop out. Thanks again!
Have a look at this link GROUP BY
When GROUP BY is present, it is not
valid for the SELECT list expressions
to refer to ungrouped columns except
within aggregate functions, since
there would be more than one possible
value to return for an ungrouped
column.
You need to include all the select columns in the group by that are not part of the aggregate functions.
A few things:
Drop the backticks
Use a CASE statement instead of IF() CASE WHEN votes.use_id = 1 THEN votes.karma_delta ELSE 0 END
Change your timestampdiff to DATE_TRUNC('hour', now()) - DATE_TRUNC('hour', links.created) (you will need to then count the number of hours in the resulting interval. It would be much easier to compare timestamps)
Fix your GROUP BY and ORDER BY
Try to replace the IF with a case;
SUM(CASE WHEN votes.user_id = 1 THEN votes.karma_delta ELSE 0 END)
You also have to explicitly name every column or calculated column you use in the GROUP BY clause.