Group by function in sql - sql

It is throwing a error for the group by function. how do i fix it
SELECT TO_CHAR(TRANSBDATE,'MON-YYYY')PERIOD,B.CURCODE,COUNT(DISTINCT CUSTOMERID) UNIQUECUSTOMERS,COUNT(CURCODE)TRANS
FROM FX_TRANSHEADER A,FX_TRANSDETAIL B
WHERE A.TRANSNO = B.TRANSNO AND A.TRANSBDATE BETWEEN '01-JAN-2018' AND '30-JUN-2019' AND C_IDTYPE NOT IN ('CR')
ORDER BY 1 , 4
GROUP BY TO_CHAR(TRANSBDATE,'MON-YYYY')PERIOD,COUNT(DISTINCT CUSTOMERID),COUNT(CURCODE)

The GROUP BY only needs the fields on which you want to aggregate:
GROUP BY TO_CHAR(TRANSBDATE,'MON-YYYY')PERIOD, B.CURCODE
So not on COUNT(B.CURCODE) but on CURCODE

SELECT TO_CHAR(TRANSBDATE,'MON-YYYY')PERIOD,B.CURCODE,COUNT(DISTINCT CUSTOMERID) UNIQUECUSTOMERS,COUNT(CURCODE)TRANS
FROM FX_TRANSHEADER A,FX_TRANSDETAIL B
WHERE A.TRANSNO = B.TRANSNO AND A.TRANSBDATE BETWEEN '01-JAN-2018' AND '30-JUN-2019' AND C_IDTYPE NOT IN ('CR')
GROUP BY TO_CHAR(TRANSBDATE,'MON-YYYY'),B.CURCODE
ORDER BY 1 , 4

You cannot include aggregation functions in the GROUP BY. You only want the unaggregated columns from the SELECT.
I would recommend writing the query as:
SELECT TO_CHAR(H.TRANSBDATE,'MON-YYYY') as PERIOD,
D.CURCODE,
COUNT(DISTINCT ?.CUSTOMERID) as UNIQUECUSTOMERS,
COUNT(*) as TRANS
FROM FX_TRANSHEADER H JOIN
FX_TRANSDETAIL D
ON H.TRANSNO = D.TRANSNO
WHERE H.TRANSBDATE >= DATE '2018-01-01' AND
H.TRANSBDATE < DATE '2018-07-01' AND
?.C_IDTYPE NOT IN ('CR')
ORDER BY 1 , 4
GROUP BY TO_CHAR(H.TRANSBDATE, 'MON-YYYY');
The ? is for the column aliases for the tables that have the specified columns.
Notes:
Never use commas in the FROM clause. Always use proper, explicit, standard JOIN syntax.
Use meaningful table aliases (table abbreviations) rather than arbitrary letters.
Use DATE to represent date literals, so your code is not tied to particular internationalization settings on your server.
Replaced BETWEEN with two direct comparisons. This is particularly important in databases such as Oracle, where the DATE data type has a time component.

Related

null result for average number of days by month

select a.clientid, a.CaseType, b.EnrollmentStartDate, a.EligibilityStartDate, datediff(day, a.EligibilityStartDate, b.EnrollmentStartDate) as date_diff
INTO ##temptable1
FROM dbo.Client a, dbo.ClientEnrollment b
WHERE a.ClientId = b.ClientId
AND a.CaseType = 99
ORDER BY a.ClientId
select avg (date_diff) from ##temptable1
so the above query gives me the overall average number of days it takes for a client to enroll into a program from their eligibility start date. I now want to sort the results by each month
select avg (date_diff) from ##temptable1
where EligibilityStartDate = '2019-03-01
for some reason I'm getting NULL no matter what date I specify ( even though the original query produces over 40k results ) I've tried inserting EligibilityStartDate = '2019-03-01' into the table itself but that did not work either.
Presumably, you want something like this:
SELECT YEAR(c.EligibilityStartDate) as yyyy,
MONTH(c.EligibilityStartDate) as mm,
AVG(DATEDIFF(DAY, c.EligibilityStartDate, ce.EnrollmentStartDate) as date_diff
FROM dbo.Client c JOIN
dbo.ClientEnrollment ce
ON c.ClientId = ce.ClientId AND c.CaseType = 99
GROUP BY YEAR(c.EligibilityStartDate), MONTH(c.EligibilityStartDate)
ORDER BY YEAR(c.EligibilityStartDate), MONTH(c.EligibilityStartDate);
Notes:
Never use commas in the FROM clause.
Always use proper, explicit, standard JOIN syntax.
Use meaningful table aliases (i.e. abbreviations of table names) rather than meaningless ones.
You seem to want an aggregation query.

Oracle SQL - return multiple columns from subquery

Let's take a simple query in Oracle:
SELECT
CASE.ID,
CASE.TYPE,
CASE.DATE_RAISED
FROM
CASE
WHERE
CASE.DATE_RAISED > '2019-01-01'
Now let's say another table, EVENT, contains multiple events which may be associated with each case (linked via EVENT.CASE_ID). OR not exist at all. I want to report on the earliest-dated future event per case - or if nothing exists, return NULL. I can do this with a subquery in the SELECT clause, as follows:
SELECT
CASE.ID,
CASE.TYPE,
CASE.DATE_RAISED,
(
SELECT
MIN(EVENT.DATE)
FROM
EVENT
WHERE
EVENT.CASE_ID = CASE.ID
AND EVENT.DATE >= CURRENT_DATE
) AS MIN_EVENT_DATE
FROM
CASE
WHERE
CASE.DATE_RAISED > '2019-01-01'
This will return a table like this:
Case ID Case Type Date Raised Min Event Date
76 A 03/01/2019 10/05/2019
43 B 02/02/2019 [NULL]
89 A 29/01/2019 08/07/2019
90 A 04/03/2019 [NULL]
102 C 15/04/2019 20/05/2019
Note that if there do not exist any Events which match the criteria, the line is still returned but without a value. This is because the subquery is in the SELECT clause. This works just fine.
My problem, however, is if I want to return more than one column from the EVENT table - while still at the same time preserving the possibility that there are no matching rows from the EVENT table. The above code only returns EVENT.DATE as the single subquery result, to ONE column of the main query. But what if I also want to return EVENT.ID, or EVENT.TYPE? While still allowing for them to be NULL (if no matching records from CASE are found)?
I suppose I could use multiple subqueries in the SELECT clause: each returning just ONE column. But this seems horribly inefficient, given that each subquery would be based on the same criteria (the minimum-dated EVENT whose CASE ID matches that of the main query; or NULL if no such events found).
I suspect some nifty joins would be the answer - although I'm struggling to understand which ones exactly.
Please note that the above examples are vastly simplified versions of my actual code, which already contains multiple joins in the "old style" Oracle format, eg:
WHERE
CASE.ID(+) = EVENT.CASE_ID
There are reasons why this is so - therefore a request to anyone answering this, please would you demonstrate any solutions in this style of coding, as my SQL isn't far enough advanced to be able to re-factor the "newer" style joins into existing code.
You can use a join and window functions. For instance:
select c.*, e.*
from c left join
(select e.*,
row_number() over (partition by e.case_id order by e.date desc) as seqnum
from events e
) e
on e.case_id = c.id and e.seqnum = 1;
where c.date_raised > date '2019-01-01'; -- assuming the value is a date
Is this what you mean? I just rewrote Gordon's answer with old Oracle join syntax and your code style.
SELECT
CASE.ID,
CASE.TYPE,
CASE.DATE_RAISED,
MIN_E.DATE AS MIN_EVENT_DATE
FROM
CASE,
(SELECT EVENT.*,
ROW_NUMBER() OVER (PARTITION BY EVENT.CASE_ID ORDER BY EVENT.DATE DESC) AS SEQNUM
FROM
EVENT
WHERE
EVENT.DATE >= CURRENT_DATE
) MIN_E
WHERE
CASE.DATE_RAISED > DATE '2019-01-01'
AND MIN_E.CASE_ID (+) = CASE.ID
AND MIN_E.SEQNUM (+) = 1;
Create object type with columns you want and return it from subquery. Your query will be like
SELECT
CASE.ID,
CASE.TYPE,
CASE.DATE_RAISED,
(
SELECT
t_your_new_type ( MIN(EVENT.DATE) , min ( EVENT.your_another_column ) )
FROM
EVENT
WHERE
EVENT.CASE_ID = CASE.ID
AND EVENT.DATE >= CURRENT_DATE
) AS MIN_EVENT_DATE
FROM
CASE
WHERE
CASE.DATE_RAISED > '2019-01-01'

sql oracle group by subquery

I get the same ecommerce number for each date. I am trying to get ecommerce value count depending on the date, which is different for each date as the total number is only 105 for all October, not 391958.
Any idea how to group by the output of a subquery?
Thank you!
SELECT to_char(wcs1.start_tms,'DD/MM/YYYY') as dates,
(
SELECT count(*)
FROM ft_t_wcs1 wcs1,ft_t_stup stup
WHERE stup.modl_id='ECOMMERC'
AND stup.CROSS_REF_ID=wcs1.acct_id
AND stup.end_tms IS NULL
) AS ecommerce
FROM ft_t_wcs1 wcs1, ft_t_stup stup
WHERE wcs1.scenario='CREATE'
AND wcs1.acct_id IS NOT NULL
AND wcs1.start_tms BETWEEN add_months(TRUNC(SYSDATE,'mm'),-1) AND LAST_DAY(add_months(TRUNC(SYSDATE,'mm'),-1))
GROUP BY to_char(wcs1.start_tms,'DD/MM/YYYY')
ORDER BY to_char(wcs1.start_tms,'DD/MM/YYYY');
OUTPUT
Try below modified queries
select to_char(wcs1.start_tms,'DD/MM/YYYY') as dates,count(*) AS
ecommerce
from ft_t_wcs1 wcs1, ft_t_stup stup
where stup.modl_id='ECOMMERC' and stup.CROSS_REF_ID=wcs1.acct_id and stup.end_tms is null wcs1.scenario='CREATE' and wcs1.acct_id is not null and
wcs1.start_tms between add_months(TRUNC(SYSDATE,'mm'),-1) and
LAST_DAY(add_months(TRUNC(SYSDATE,'mm'),-1))
group by to_char(wcs1.start_tms,'DD/MM/YYYY')
order by to_char(wcs1.start_tms,'DD/MM/YYYY');
-- Another way using JOIN clause
select to_char(wcs1.start_tms,'DD/MM/YYYY') as dates,count(*) AS
ecommerce
from ft_t_wcs1 wcs1
join ft_t_stup stup
ON stup.CROSS_REF_ID=wcs1.acct_id
where stup.modl_id='ECOMMERC' and stup.end_tms is null wcs1.scenario='CREATE' and wcs1.acct_id is not null and
wcs1.start_tms between add_months(TRUNC(SYSDATE,'mm'),-1) and
LAST_DAY(add_months(TRUNC(SYSDATE,'mm'),-1))
group by to_char(wcs1.start_tms,'DD/MM/YYYY')
order by to_char(wcs1.start_tms,'DD/MM/YYYY');
It's hard to suggest an answer without understanding your table relationship, but I can tell that your problem is there is no relationship between your subquery and your main query. Your subquery simply returns a count where modl_id='ECOMMERC', so that value will always be the same - in your case, 105. You need to add a JOIN criteria to the subquery that ties the unique value to your main query. You'll also want to alias the table names differently to ensure you're joining correctly.
You are doing unnecessary joins when you just want a correlated subquery:
SELECT to_char(wcs1.start_tms,'DD/MM/YYYY') as dates,
(SELECT count(*)
FROM ft_t_stup stup
WHERE stup.modl_id= 'ECOMMERC' AND
stup.CROSS_REF_ID = wcs1.acct_id
stup.end_tms IS NULL
) AS ecommerce
FROM ft_t_wcs1 wcs1
WHERE wcs1.scenario = 'CREATE' AND
wcs1.acct_id IS NOT NULL AND
wcs1.start_tms BETWEEN add_months(TRUNC(SYSDATE,'mm'),-1) AND LAST_DAY(add_months(TRUNC(SYSDATE,'mm'),-1))
GROUP BY to_char(wcs1.start_tms, 'DD/MM/YYYY')
ORDER BY to_char(wcs1.start_tms, 'DD/MM/YYYY');

How to get datetime duplicate rows in SQL Server?

Im trying to find duplicate DATETIME rows in a table,
My column has datetime values such as 2015-01-11 11:24:10.000.
I must get the duplicates in 2015-01-11 11:24 type. Rest of it, not important. I can get the right value when I use SELECT with 'convert(nvarchar(16),column,121)', but when I put this in my code, I have to use 'group by' statement, so
My code is:
SELECT ID,
RECEIPT_BARCODE,
convert(nvarchar(16),TRANS_DATE,121),
PTYPE
FROM TRANSACTION_HEADER
WHERE TRANS_DATE BETWEEN '11.01.2015' AND '12.01.2015'
GROUP BY ID,RECEIPT_BARCODE,convert(nvarchar(16),TRANS_DATE,121),PTYPE
HAVING COUNT(convert(nvarchar(16),TRANS_DATE,121)) > 1
Since SQL forces me to use 'convert(nvarchar(16),TRANS_DATE,121)' in GROUP BY statement, I can't get the duplicate values.
Any idea for this?
Thanks in advance.
If you want the actual rows that are duplicated, then use window functions instead:
SELECT th.*, convert(nvarchar(16),TRANS_DATE,121)
FROM (SELECT th.*, COUNT(*) OVER (PARTITION BY convert(nvarchar(16),TRANS_DATE,121)) as cnt
FROM TRANSACTION_HEADER th
WHERE TRANS_DATE BETWEEN '11.01.2015' AND '12.01.2015'
) th
WHERE cnt > 1;
SELECT ID,RECEIPT_BARCODE,convert(nvarchar(16),TRANS_DATE,121), PTYPE ,COUNT(*)
FROM TRANSACTION_HEADER
WHERE TRANS_DATE BETWEEN '11.01.2015' AND '12.01.2015'
GROUP ID,RECEIPT_BARCODE,convert(nvarchar(16),TRANS_DATE,121), PTYPE
HAVING COUNT(*)>1;
I think you can use count(*) directly here.try the above one.

Trying to select multiple columns on an inner join query with group and where clauses

I'm trying to run a query where it will give me one Sum Function, then select two columns from a joined table and then to group that data by the unique id i gave them. This is my original query and it works.
SELECT Sum (Commission_Paid)
FROM [INTERN_DB2].[dbo].[PaymentList]
INNER JOIN [INTERN_DB2]..[RealEstateAgentList]
ON RealEstateAgentList.AgentID = PaymentList.AgentID
WHERE Close_Date >= '1/1/2013' AND Close_Date <= '12/31/2013'
GROUP BY RealEstateAgentList.AgentID
I've tried the query below, but I keep getting an error and I don't know why. It says its a syntax error.
SELECT Sum (Commission_Paid)
FROM [INTERN_DB2].[dbo].[PaymentList]
INNERJOIN [INTERN_DB2]..[RealEstateAgentList](
Select First_Name, Last_Name
From [Intern_DB2]..[RealEstateAgentList]
Group By Last_name
)
ON RealEstateAgentList.AgentID = PaymentList.AgentID
WHERE Close_Date >= '1/1/2013' AND Close_Date <= '12/31/2013'
GROUP BY RealEstateAgentList.AgentID
Your query has multiple problems:
SELECT rl.AgentId, rl.first_name, rl.last_name, Sum(Commission_Paid)
FROM [INTERN_DB2].[dbo].[PaymentList] pl inner join
(Select agent_id, min(first_name) as first_name, min(last_name) as last_name
From [Intern_DB2]..[RealEstateAgentList]
GROUP BY agent_id
) rl
ON rl.AgentID = pl.AgentID
WHERE Close_Date >= '2013-01-01' AND Close_Date <= '2013-12-31'
GROUP BY rl.AgentID, rl.first_name, rl.last_name;
Here are some changes:
INNERJOIN --> inner join.
Fixed the syntax of the subquery next to the table name.
Removed columns for first and last name. They are not used.
Changed the subquery to include agent_id.
Added agent_id, first_name, and last_name to the outer aggregation, so you can tell where the values are coming from.
Changed the date formats to a less ambiguous standard form.
Added table alias for subquery.
I suspect the subquery on the agent list is not important. You can probably do:
SELECT rl.AgentId, rl.first_name, rl.last_name, Sum(pl.Commission_Paid)
FROM [INTERN_DB2].[dbo].[PaymentList] pl inner join
[Intern_DB2]..[RealEstateAgentList] rl
ON rl.AgentID = pl.AgentID
WHERE pl.Close_Date >= '2013-01-01' AND pl.Close_Date <= '2013-12-31'
GROUP BY rl.AgentID, rl.first_name, rl.last_name;
EDIT:
I'm glad this solution helped. As you continue to write queries, try to always do the following:
Use table aliases that are abbreviations of the table names.
Always use table aliases when referring to columns.
When using date constants, either use "YYYY-MM-DD" format or use convert() to convert a string using the specified format. (The latter is actually the safer method, but the former is more convenient and works in almost all databases.)
Pay attention to the error messages; they can be informative in SQL Server (unfortunately, other databases are not so clear).
Format your query so other people can understand it. This will help you understand and debug your queries as well. I have a very particular formatting style (which no one is going to change at this point); the important thing is not the particular style but being able to "see" what the query is doing. My style is documented in my book "Data Analysis Using SQL and Excel.
There are other rules, but these are a good way to get started.
SELECT Sum (Commission_Paid)
FROM [INTERN_DB2].[dbo].[PaymentList] pl
INNER JOIN (
Select First_Name, Last_Name
From [Intern_DB2]..[RealEstateAgentList]
Group By Last_name
) x ON x.AgentID = pl.AgentID
WHERE Close_Date >= '1/1/2013'
AND Close_Date <= '12/31/2013'
GROUP BY RealEstateAgentList.AgentID
This is how the query should look... however, if you subquery first and last name, you'll also have to include them in the group by. Assuming Close_Date is in the PaymentList table, this is how I would write the query:
SELECT
al.AgentID,
al.FirstName,
al.LastName,
Sum(pl.Commission_Paid) AS Commission_Paid
FROM [INTERN_DB2].[dbo].[PaymentList] pl
INNER JOIN [Intern_DB2].dbo.[RealEstateAgentList] al ON al.AgentID = pl.AgentID
WHERE YEAR(pl.Close_Date) = '2013'
GROUP BY al.AgentID, al.FirstName, al.LastName
Subqueries are evil, for the most part. There's no need for one here, because you can just get the columns from the join.