Group By Creates Duplicate Rows


I am using Oracle SQL to build a sample data grid (GridView) and have run into a very basic issue. I have to organize the data month-wise, for example the number of employees in a month based on a status column: status = 0 goes into Jan1 and status > 0 goes into Jan2. I am not elaborating further because the data comes from an existing view that I have to use as-is. Here is the query I am using, along with sample output that is fine except for one problem:
SELECT DISTINCT SYEAR, DEPT_NAME,
--Month-wise data - Starts
DECODE ( upper((MONTHNAMESHORT)), 'JAN', NVL((FirstLetter), 0), NULL) "JAN1" ,
DECODE ( upper((MONTHNAMESHORT)), 'JAN', NVL((SecondLetter), 0), NULL) "JAN2",
DECODE ( upper((MONTHNAMESHORT)), 'FEB', NVL((FirstLetter), 0), NULL) "FEB1" ,
DECODE ( upper((MONTHNAMESHORT)), 'FEB', NVL((SecondLetter), 0), NULL) "FEB2"
--Month-wise data - Ends
FROM
--Sub-query - starts
(SELECT DISTINCT VWWEBLETTERSTATUS2.SYEAR, MONTHRANK.MONTHNAMESHORT,VWWEBLETTERSTATUS2.DEPT_NAME,
nvl(fnfirstletter(DEPT_NAME,upper(MONTHRANK.MONTHNAMESHORT),VWWEBLETTERSTATUS2.SYEAR),0) FirstLetter,
nvl(fnSecondLetter(DEPT_NAME,upper(MONTHRANK.MONTHNAMESHORT),VWWEBLETTERSTATUS2.SYEAR),0) SecondLetter,MONTHRANK.RANK
FROM
MONTHRANK,VWWEBLETTERSTATUS2 where VWWEBLETTERSTATUS2.SYEAR = '2018' AND
nvl(fnfirstletter(DEPT_NAME,upper(MONTHRANK.MONTHNAMESHORT),VWWEBLETTERSTATUS2.SYEAR), 0) <> 0 AND
nvl(fnSecondLetter(DEPT_NAME,upper(MONTHRANK.MONTHNAMESHORT),VWWEBLETTERSTATUS2.SYEAR), 0) <> 0
order by DEPT_NAME, rank) q
--Sub-query - Ends
GROUP BY SYEAR, (MONTHNAMESHORT), DEPT_NAME; --Issue here - For the month-wise group by
Output
Year Dept Jan1 Jan2 Feb1 Feb2
2018 UNIT-I3 93 87
2018 UNIT-I5 62 66
2018 QA 0 0
2018 UNIT-I5 87 66
With MONTHNAMESHORT in the GROUP BY clause, the query creates duplicate rows for a department within the same year. For example, when UNIT-I5 has data for both months, it gets a separate row per month, although everything should be in a single row.
Is there any way to overcome this while keeping the rest of the query the same, perhaps an alternative to the GROUP BY?
Update 1: I also tried this, but it didn't work:
SUM(CASE WHEN Q.MONTHNAMESHORT = 'JAN' THEN Q.FirstLetter ELSE 0 END) "JAN1",
SUM(CASE WHEN Q.MONTHNAMESHORT = 'JAN' THEN Q.SecondLetter ELSE 0 END) "JAN2"
N.B: FirstLetter and SecondLetter are counted in the view.

SELECT DISTINCT is almost never appropriate with GROUP BY.
Your problem is that you are including MONTHNAMESHORT in the GROUP BY, so each department gets one row per month instead of a single row.
Your query is very difficult to decipher. But it should look something like this:
SELECT SYEAR, DEPT_NAME,
SUM(CASE WHEN upper(MONTHNAMESHORT) = 'JAN' THEN FirstLetter END) as "JAN1" ,
SUM(CASE WHEN upper(MONTHNAMESHORT) = 'JAN' THEN SecondLetter END) as "JAN2" ,
SUM(CASE WHEN upper(MONTHNAMESHORT) = 'FEB' THEN FirstLetter END) as "FEB1" ,
SUM(CASE WHEN upper(MONTHNAMESHORT) = 'FEB' THEN SecondLetter END) as "FEB2"
FROM . . .
GROUP BY SYEAR, DEPT_NAME;
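
To see the effect of the GROUP BY columns in isolation, here is a minimal sketch using Python's built-in sqlite3 module rather than Oracle. The table name `letters` and the assignment of the sample numbers to Jan or Feb are assumptions made for illustration; they are not taken from the original view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE letters (syear TEXT, dept_name TEXT, monthnameshort TEXT, "
    "firstletter INTEGER, secondletter INTEGER)"
)
# Hypothetical rows: one row per department and month.
conn.executemany("INSERT INTO letters VALUES (?, ?, ?, ?, ?)", [
    ("2018", "UNIT-I3", "Jan", 93, 87),
    ("2018", "UNIT-I5", "Jan", 62, 66),
    ("2018", "UNIT-I5", "Feb", 87, 66),
])

# Same conditional aggregation, only the GROUP BY list differs.
PIVOT = """
SELECT syear, dept_name,
       SUM(CASE WHEN UPPER(monthnameshort) = 'JAN' THEN firstletter END) AS jan1,
       SUM(CASE WHEN UPPER(monthnameshort) = 'JAN' THEN secondletter END) AS jan2,
       SUM(CASE WHEN UPPER(monthnameshort) = 'FEB' THEN firstletter END) AS feb1,
       SUM(CASE WHEN UPPER(monthnameshort) = 'FEB' THEN secondletter END) AS feb2
FROM letters
GROUP BY {group_cols}
ORDER BY dept_name
"""

# Grouping by month as well: UNIT-I5 appears twice.
by_month = conn.execute(
    PIVOT.format(group_cols="syear, dept_name, monthnameshort")
).fetchall()

# Grouping by year and department only: one row per department.
by_dept = conn.execute(PIVOT.format(group_cols="syear, dept_name")).fetchall()

print(len(by_month))
for row in by_dept:
    print(row)
```

With the month column removed from the GROUP BY, both months of UNIT-I5 land in the same row, and months without data stay NULL.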

Related

Query to show all months and show values where there are data for the corresponding months

I have a query and it shows the months where there is corresponding data. However, I would like to show all of the months in the year and have the months where there are no data shown as zero.
Here is my SQL statement:
SELECT DATENAME(MONTH, hb_Disputes.OPENED) AS MonthValue,
COUNT(CASE WHEN REV_CLS = 2 THEN 1 END) AS SmallCommercialIndust,
COUNT(CASE WHEN REV_CLS <> 2 THEN 1 END) AS Residential
FROM hb_Disputes
WHERE (hb_Disputes.ASSGNTO = 'E099255') AND (YEAR(hb_Disputes.OPENED) = YEAR(GETDATE()))
GROUP BY hb_Disputes.OPENED
And this is my output:
I also have a table named MonthName that lists all of the months in a year. I know I may need to use it to accomplish what I'm trying to achieve, but I'm not sure how to get there.

If you have data in the table for all months, but the where clause is filtering it out, then the simplest method is to extend the conditional aggregation:
SELECT DATENAME(MONTH, d.OPENED) AS MonthValue,
SUM(CASE WHEN d.ASSGNTO = 'E099255' AND d.REV_CLS = 2 THEN 1 ELSE 0 END) AS SmallCommercialIndust,
SUM(CASE WHEN d.ASSGNTO = 'E099255' AND d.REV_CLS <> 2 THEN 1 ELSE 0 END) AS Residential
FROM hb_Disputes d
WHERE YEAR(d.OPENED) = YEAR(GETDATE())
GROUP BY DATENAME(MONTH, d.OPENED)
ORDER BY MIN(d.OPENED);
Note: This does not fix the issue in all cases. It should just be a simple way to modify your query -- and will often work.
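
The difference between filtering in WHERE and filtering inside the CASE can be sketched with Python's sqlite3 module. This is an illustration under assumptions, not the SQL Server query itself: the table `disputes`, its sample rows, and the use of `strftime` instead of `DATENAME` are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE disputes (opened TEXT, assgnto TEXT, rev_cls INTEGER)")
conn.executemany("INSERT INTO disputes VALUES (?, ?, ?)", [
    ("2024-01-10", "E099255", 2),
    ("2024-01-15", "E099255", 1),
    ("2024-02-03", "OTHER", 2),    # February has rows, but none for E099255
    ("2024-04-20", "E099255", 1),  # March has no rows at all
])

# Filter in WHERE: months without matching rows disappear.
filtered_in_where = conn.execute("""
    SELECT strftime('%m', opened) AS month_no,
           COUNT(CASE WHEN rev_cls = 2 THEN 1 END) AS small_commercial,
           COUNT(CASE WHEN rev_cls <> 2 THEN 1 END) AS residential
    FROM disputes
    WHERE assgnto = 'E099255'
    GROUP BY month_no
    ORDER BY month_no
""").fetchall()

# Filter inside the CASE: every month that has any row is kept, with zeros.
filtered_in_case = conn.execute("""
    SELECT strftime('%m', opened) AS month_no,
           SUM(CASE WHEN assgnto = 'E099255' AND rev_cls = 2 THEN 1 ELSE 0 END),
           SUM(CASE WHEN assgnto = 'E099255' AND rev_cls <> 2 THEN 1 ELSE 0 END)
    FROM disputes
    GROUP BY month_no
    ORDER BY month_no
""").fetchall()

print(filtered_in_where)
print(filtered_in_case)
```

February shows up with zeros in the second result, while March is still missing because the table holds no rows for it at all. That is the case the note above refers to; covering it would need a join from a month list such as the MonthName table.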

SQL - Group data with same ID and Date that has been to every Machine but has a different Name

I am trying to create a query that will group data by CT ID and Date that have all 3 MachineID's (1, 10, and 20) and at least one different Sawing Pattern Name.
This image shows a highlighted example of the data I'm trying to get back and the code I'm currently using.
I'm trying to only show data similar to the highlighted rows in the image (CT ID 501573833) and exclude the data in the rows around it where the Sawing Pattern Name is the same at all 3 MachineID's.

Your description suggests group by and having. The conditions you describe can all go in the having clause:
select ct_id, date
from t
group by ct_id, date
having sum(case when machineid = 1 then 1 else 0 end) > 0 and
sum(case when machineid = 10 then 1 else 0 end) > 0 and
sum(case when machineid = 20 then 1 else 0 end) > 0 and
min(sawing_pattern_name) <> max(sawing_pattern_name)
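
A small runnable sketch of this HAVING filter, using Python's sqlite3 module instead of SQL Server. The table `t` follows the simplified column names from the query above; the three sample groups are invented to cover one matching case and two excluded cases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (ct_id INTEGER, date TEXT, machineid INTEGER, "
    "sawing_pattern_name TEXT)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", [
    # 501: all three machines, pattern names differ -> should be returned
    (501, "2024-03-01", 1, "A"), (501, "2024-03-01", 10, "A"), (501, "2024-03-01", 20, "B"),
    # 502: all three machines, same pattern name everywhere -> excluded
    (502, "2024-03-01", 1, "A"), (502, "2024-03-01", 10, "A"), (502, "2024-03-01", 20, "A"),
    # 503: pattern names differ, but machine 20 is missing -> excluded
    (503, "2024-03-01", 1, "A"), (503, "2024-03-01", 10, "B"),
])

result = conn.execute("""
    select ct_id, date
    from t
    group by ct_id, date
    having sum(case when machineid = 1 then 1 else 0 end) > 0 and
           sum(case when machineid = 10 then 1 else 0 end) > 0 and
           sum(case when machineid = 20 then 1 else 0 end) > 0 and
           min(sawing_pattern_name) <> max(sawing_pattern_name)
""").fetchall()

print(result)
```

The `min(...) <> max(...)` comparison is true exactly when a group contains at least two distinct non-NULL names, which is why it works as an "at least one different name" test.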

Seems to me that an EXISTS could be useful here.
SELECT
[CT ID],
[MachineID],
[Sawing Pattern name],
[Time],
CAST([Time] AS DATE) AS [Date]
FROM [DataCollector].[dbo].[Maxicut] t
WHERE EXISTS
(
SELECT 1
FROM [DataCollector].[dbo].[Maxicut] d
WHERE d.[CT ID] = t.[CT ID]
AND CAST(d.[Time] AS DATE) = CAST(t.[Time] AS DATE)
AND d.[MachineID] != t.[MachineID]
AND REPLACE(d.[Sawing Pattern name],',','') != REPLACE(t.[Sawing Pattern name],',','')
);

SQL Efficiency on Date Range or Separate Tables

I'm calculating historical amounts from a table by year range (e.g. 2015-2016, 2014-2015, etc.). I would like to know whether it is more efficient to do it in one batch or to repeat the query multiple times, each filtered by the required date range.
Thanks in advance!
OPTION 1:
select
id,
sum(case when year(getdate()) - year(txndate) between 5 and 6 then amt else 0 end) as amt_6_5,
...
sum(case when year(getdate()) - year(txndate) between 0 and 1 then amt else 0 end) as amt_1_0
from
mytable
group by
id
OPTION 2:
select
id, sum(amt) as amt_6_5
from
mytable
where
year(getdate()) - year(txndate) between 5 and 6
group by
id
...
select
id, sum(amt) as amt_1_0
from
mytable
where
year(getdate()) - year(txndate) between 0 and 1
group by
id

1. Unless you have resource issues, I would go with the CASE version.
Although it has no impact on the results, filtering on the requested period in the WHERE clause might give a significant performance advantage.
2. Your period definitions overlap: a transaction with a year difference of 5 falls into both "between 5 and 6" and "between 4 and 5". Use one bucket per year difference instead:
select id
,sum(case when year(getdate()) - year(txndate) = 6 then amt else 0 end) as amt_6
-- ...
,sum(case when year(getdate()) - year(txndate) = 0 then amt else 0 end) as amt_0
from mytable
where txndate >= dateadd(year, datediff(year,0, getDate())-6, 0)
group by id
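
The overlap can be checked with a small sketch in Python's sqlite3 module. To keep it deterministic, the current year is passed in as a parameter and the table stores a plain `txn_year` column; both are simplifications assumed for the example, not part of the original schema.

```python
import sqlite3

REFERENCE_YEAR = 2016  # stands in for year(getdate())

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, txn_year INTEGER, amt INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", [
    (1, 2010, 5),    # year difference 6
    (1, 2011, 100),  # year difference 5
    (1, 2012, 40),   # year difference 4
])

# BETWEEN ranges that share an endpoint: the difference-5 row is counted twice.
overlapping = conn.execute("""
    SELECT id,
           SUM(CASE WHEN :y - txn_year BETWEEN 5 AND 6 THEN amt ELSE 0 END) AS amt_6_5,
           SUM(CASE WHEN :y - txn_year BETWEEN 4 AND 5 THEN amt ELSE 0 END) AS amt_5_4
    FROM mytable
    GROUP BY id
""", {"y": REFERENCE_YEAR}).fetchall()

# One bucket per year difference: every row is counted exactly once.
exact = conn.execute("""
    SELECT id,
           SUM(CASE WHEN :y - txn_year = 6 THEN amt ELSE 0 END) AS amt_6,
           SUM(CASE WHEN :y - txn_year = 5 THEN amt ELSE 0 END) AS amt_5,
           SUM(CASE WHEN :y - txn_year = 4 THEN amt ELSE 0 END) AS amt_4
    FROM mytable
    GROUP BY id
""", {"y": REFERENCE_YEAR}).fetchall()

print(overlapping)
print(exact)
```

The overlapping buckets add up to more than the table total of 145, while the exact buckets add up to exactly 145.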

This may help you:
WITH CTE
AS
(
SELECT id,
(CASE WHEN year(getdate()) - year(txndate) BETWEEN 5 AND 6 THEN 'year_5-6'
WHEN year(getdate()) - year(txndate) BETWEEN 4 AND 5 THEN 'year_4-5'
...
END) AS my_year,
amt
FROM mytable
)
SELECT id,my_year,sum(amt)
FROM CTE
GROUP BY id,my_year
Here, inside the CTE, each record is assigned a year tag (based on your conditions). The outer query then sums the amounts from the CTE, grouped by id and that tag.

Sql ISNULL condition in Sql Pivot and Sql case

I searched for many solutions on SO and elsewhere but couldn't quite understand how to write a query for my problem.
Anyway my query looks like below
SELECT * FROM
(
SELECT Id, Date, Name, Amount,
CASE
WHEN DATEDIFF(DAY,Date,GETDATE()) <=0
THEN 'Current'
WHEN DATEDIFF(DAY,Date,GETDATE()) <30
THEN 'Due30'
WHEN DATEDIFF(DAY,Date,GETDATE()) <60
THEN 'Due60'
ELSE 'Due90'
END AS [Age]
FROM Statement
WHERE (Amount <> 0)
) AS S
PIVOT
(
SUM(Amount)
FOR[Age] IN ([Current],[Due30],[Due60],[Due90])
) P
and the result looks like this
Id Date Name Current Due30 Due60 Due90
----------- ---------- --------------------------------------------
1 2016-04-03 Alan NULL NULL NULL 110.00
2 2016-05-02 TC NULL NULL 30.00 NULL
Where should I insert an ISNULL condition so that the NULLs in the result are replaced with zeros?
I tried inserting ISNULL in the PIVOT clause itself, but that is not meant to work.

You have to add it repeatedly, once per pivoted column, in the final SELECT, when you replace the SELECT * (which should only exist in ad-hoc queries or EXISTS tests) with the column list:
SELECT
Id,
Date,
Name,
COALESCE([Current],0) as [Current],
COALESCE(Due30,0) as Due30,
COALESCE(Due60,0) as Due60,
COALESCE(Due90,0) as Due90
FROM
(
SELECT Id, Date, Name, Amount,
CASE
WHEN DATEDIFF(DAY,Date,GETDATE()) <=0
THEN 'Current'
WHEN DATEDIFF(DAY,Date,GETDATE()) <30
THEN 'Due30'
WHEN DATEDIFF(DAY,Date,GETDATE()) <60
THEN 'Due60'
ELSE 'Due90'
END AS [Age]
FROM Statement
WHERE (Amount <> 0)
) AS S
PIVOT
(
SUM(Amount)
FOR[Age] IN ([Current],[Due30],[Due60],[Due90])
) P
I've also used COALESCE since it's generally the preferred option (ANSI standard, extends to more than two arguments, applies normal type precedence rules) instead of ISNULL.
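
The per-column COALESCE can be tried out with Python's sqlite3 module. SQLite has no PIVOT operator, so this sketch swaps in conditional aggregation to produce the same shape; the table name `statements`, the precomputed `days_overdue` column, and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE statements (id INTEGER, name TEXT, amount REAL, days_overdue INTEGER)"
)
conn.executemany("INSERT INTO statements VALUES (?, ?, ?, ?)", [
    (1, "Alan", 110.0, 95),  # lands in Due90
    (2, "TC", 30.0, 45),     # lands in Due60
])

BUCKETS = ["Current", "Due30", "Due60", "Due90"]


def pivot(wrap):
    """Build one aggregate column per bucket; `wrap` decorates each aggregate."""
    cols = ", ".join(
        wrap.format(f"SUM(CASE WHEN age = '{b}' THEN amount END)") for b in BUCKETS
    )
    sql = f"""
        SELECT id, name, {cols}
        FROM (SELECT id, name, amount,
                     CASE WHEN days_overdue <= 0 THEN 'Current'
                          WHEN days_overdue < 30 THEN 'Due30'
                          WHEN days_overdue < 60 THEN 'Due60'
                          ELSE 'Due90' END AS age
              FROM statements
              WHERE amount <> 0) AS s
        GROUP BY id, name
        ORDER BY id
    """
    return conn.execute(sql).fetchall()


raw = pivot("{}")                  # empty buckets come back as NULL (None)
filled = pivot("COALESCE({}, 0)")  # empty buckets come back as 0

print(raw)
print(filled)
```

As in the PIVOT version, the replacement has to wrap each output column individually; there is no single place that fixes all columns at once.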

SELECT Id
, [Date]
, Name
, [Current] = SUM(CASE WHEN val <= 0 THEN Amount ELSE 0 END)
-- Lower bounds keep the buckets exclusive, matching the first-match CASE in the question
, Due30 = SUM(CASE WHEN val > 0 AND val < 30 THEN Amount ELSE 0 END)
, Due60 = SUM(CASE WHEN val >= 30 AND val < 60 THEN Amount ELSE 0 END)
, Due90 = SUM(CASE WHEN val >= 60 THEN Amount ELSE 0 END)
FROM dbo.[Statement] t
CROSS APPLY (
SELECT val = DATEDIFF(DAY, [Date], GETDATE())
) s
WHERE Amount <> 0
GROUP BY Id, [Date], Name

Annual Count by Criteria

I am working on a project with our HR department.
I have a table called [EEMaster] that keeps a record of the Active/Termed employees.
It is updated from a flat File using a Slowly Changing Dimension.
At the end of the year I need a count of the number of Active employees and the number of termed employees and then the year.
Here is an example of the data I need returned annually.
| 2010 | 2011 | 2012 | 2013 |
HistoricalHC | 447 | 419 | 420 | 418 |
NumbTermEmp | 57 | 67 | 51 | 42 |
I currently have the data connected to an excel spreadsheet providing a rolling count by Division. I use the following columns from the [EEMaster] for it.
ChangeStatus (1/0 from the SCD)
EmpStatusName ("Active" for current employees and "Withdrawn" for Termed Employees)
HireYear (set to All in the pivot table)
Term Year (set to 2013 in the pivot table)
PONumb (The employee numbers, I use for the count)
I have created a table to hold the data. I will manually load the counts for the previous years into it, since the current development is a rolling number. What I want to do is develop an SSIS package that captures the counts on Jan 1, 2014 and inserts the number of "Active Employees", the number of "Termed Employees", and the year that just finished into that table.
UPDATE:
I have created two queries. One that provides the number of Active Employees
SELECT COUNT([PersNo]) AS HistoricalHC
FROM [dbo].[EEMaster]
WHERE [ChangeStatus] = 'Current' AND [EmpStatusName] = 'Active'
it returns
|HistoricHC|
|418 |
And another that provides the number of terms by Term Year
SELECT COUNT([PersNo]) AS NumbOfTermEE
FROM [dbo].[EEMaster]
WHERE [ChangeStatus] = 'Current' AND [EmpStatusName] = 'Withdrawn'
AND [TermYear] = '2013'
it returns
|NumbOfTermEE|
|42 |
I need the [TermYear] to be dynamic. Since this will run on Jan 1st of every year. It would need to pull the number of terms for the previous year (continually).
Then I need both of these numbers to be added into the new row with the year the data was calculated.
|Year|HistoricalHC|NumbOfTermEmp|
|2010|447 |57 |
|2011|419 |67 |
|2012|420 |51 |
|2013|418 |42 |

You are looking for a CASE expression inside an aggregate, which lets you count or sum several different things in one pass:
Sum(Case when (expression) then 1 end)
It seems you also want the years in the columns, so you can group by year and pivot on it. You mention dynamic, but I don't think you need much dynamic logic just for the year. It is not clear to me whether you want a SQL statement that goes into a data flow to generate an Excel sheet. If you just want a grid in which each row is one set of conditions, I would UNION two or more SELECTs; as long as the data is not too large, that should not be hard. Here is a simple self-contained example with dummy data to show what I mean.
It will run as is in SQL Server Management Studio 2008 and up.
-- Table variables are declared with @ (a # prefix is for temp tables created with CREATE TABLE)
declare @Person Table ( personID int identity, person varchar(8));
insert into @Person values ('Brett'),('Sean'),('Chad'),('Michael'),('Ray'),('Erik'),('Queyn');
declare @Orders table ( OrderID int identity, PersonID int, OrderCnt int, dt Date);
insert into @Orders values (1, 10, '1-7-11'),(1, 12, '2-12-12'),(2, 20, '7-1-13'),(2, 12, '1-5-10'),(3, 20, '6-4-11')
,(3, 12, '2-3-10'),(3, 6, '6-10-10'),(4, 20, '7-10-11'),(5, 20, '1-8-10'),(5, 9, '2-10-11'),
(6, 20, '3-1-11'),(6, 34, '4-6-12'),(7, 20, '5-1-11'),(7, 12, '6-8-12'),(7, 56, '7-25-13')
-- As is just joining sets
select *
from @Person p
join @Orders o on p.personID = o.PersonID
order by dt
-- Years on the rows
select
year(o.dt) as Year
, sum(o.OrderCnt) as Orders
, count(p.personID) as People
, count(distinct p.personID) as DistinctPeople
from @Person p
join @Orders o on p.personID = o.PersonID
group by year(o.dt)
-- Custom grouping on rows and doing the years with pivots for the columns
Select
'BulkOrders' as Description
, sum(case when year(o.dt) = '2010' then OrderCnt end) as [2010Orders]
, sum(case when year(o.dt) = '2011' then OrderCnt end) as [2011Orders]
, sum(case when year(o.dt) = '2012' then OrderCnt end) as [2012Orders]
, sum(case when year(o.dt) = '2013' then OrderCnt end) as [2013Orders]
, sum(OrderCnt) as Totals
from @Person p
join @Orders o on p.personID = o.PersonID
union
select
'OrdersByPerson'
, Count(case when year(o.dt) = '2010' then p.personID end)
, Count(case when year(o.dt) = '2011' then p.personID end)
, Count(case when year(o.dt) = '2012' then p.personID end)
, Count(case when year(o.dt) = '2013' then p.personID end)
, Count(p.personID)
from @Person p
join @Orders o on p.personID = o.PersonID
union
select
'OrdersByPersonDistinct'
, Count(distinct case when year(o.dt) = '2010' then p.personID end)
, Count(distinct case when year(o.dt) = '2011' then p.personID end)
, Count(distinct case when year(o.dt) = '2012' then p.personID end)
, Count(distinct case when year(o.dt) = '2013' then p.personID end)
, Count(distinct p.personID)
from @Person p
join @Orders o on p.personID = o.PersonID

Here is the solution I came up with.
I will create an SSIS package that will run the following Stored Procedure.
INSERT INTO [dbo].[TORateFY] (Year,HistoricalHC,NumbTermedEmp)
SELECT DISTINCT YEAR(GETDATE()) AS [Year],
SUM(CASE WHEN EmpStatusName = 'Active' THEN 1 ELSE 0 END) AS HistoricalHC,
SUM(CASE WHEN EmpStatusName = 'Withdrawn' AND TermYear = YEAR(GETDATE()) THEN 1 ELSE 0 END) AS NumbOfTermEE
FROM dbo.EEMaster
This will be scheduled to run Annually on the 31st of Dec.

Update to my previous Answer:
I worked with a guy on another forum and he provided an excellent script that gives the correct count for both the active and termed employees on a monthly basis, instead of waiting until the end of the year to get an overall count. This puts the reporting more in line with what was originally done manually.
MERGE dbo.TORateFY AS tgt
USING (
SELECT DATENAME(YEAR, GETDATE()) AS [Year],
SUM(CASE WHEN EmpStatusName = 'Active' THEN 1 ELSE 0 END) AS HistoricalHC,
SUM(CASE WHEN EmpStatusName = 'Withdrawn' AND TermYear = DATENAME(YEAR, GETDATE()) THEN 1 ELSE 0 END) AS NumbOfTermEE
FROM dbo.EEMaster
WHERE ChangeStatus = 'Current'
AND EmpStatusName IN ('Active', 'Withdrawn')
OR TermYear <= DATENAME(YEAR, GETDATE())
) AS src ON src.[Year] = tgt.[Year]
WHEN MATCHED
THEN UPDATE
SET tgt.HistoricalHC = src.HistoricalHC,
tgt.NumbTermedEmp = src.NumbOfTermEE
WHEN NOT MATCHED BY TARGET
THEN INSERT (
[Year],
HistoricalHC,
NumbTermedEmp
)
VALUES (
src.[Year],
src.HistoricalHC,
src.NumbOfTermEE
);
I wanted to share this in case anyone else runs into a similar situation.
Thank you everyone for your input and guidance.
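
For anyone who wants to experiment with the "update the row for the year if it exists, otherwise insert it" behaviour without a SQL Server instance, here is a rough sketch using Python's sqlite3 module. SQLite has no MERGE, so the sketch swaps in INSERT OR REPLACE on a primary key; the snake_case table and column names, the sample employees, and passing the year in as a parameter are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ee_master (pers_no INTEGER, emp_status_name TEXT, term_year INTEGER)"
)
# The primary key on year is what lets a rerun replace the existing row.
conn.execute(
    "CREATE TABLE to_rate_fy (year INTEGER PRIMARY KEY, "
    "historical_hc INTEGER, numb_termed_emp INTEGER)"
)
conn.executemany("INSERT INTO ee_master VALUES (?, ?, ?)", [
    (1, "Active", None),
    (2, "Active", None),
    (3, "Withdrawn", 2013),
    (4, "Withdrawn", 2012),
])


def snapshot(year):
    """Write (or overwrite) the head-count row for the given year."""
    conn.execute("""
        INSERT OR REPLACE INTO to_rate_fy (year, historical_hc, numb_termed_emp)
        SELECT :year,
               SUM(CASE WHEN emp_status_name = 'Active' THEN 1 ELSE 0 END),
               SUM(CASE WHEN emp_status_name = 'Withdrawn'
                         AND term_year = :year THEN 1 ELSE 0 END)
        FROM ee_master
    """, {"year": year})


snapshot(2013)
first = conn.execute("SELECT * FROM to_rate_fy").fetchall()

# One more employee leaves in 2013; rerunning updates the same row.
conn.execute(
    "UPDATE ee_master SET emp_status_name = 'Withdrawn', term_year = 2013 "
    "WHERE pers_no = 2"
)
snapshot(2013)
second = conn.execute("SELECT * FROM to_rate_fy").fetchall()

print(first)
print(second)
```

Passing the year explicitly, rather than reading it from the current date, is a choice made here so that the same routine can be run for the year that just finished when the job fires on Jan 1.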
