SQL: removing duplicates based on different criteria actually creates new records

I have a database table (dbo) with duplicates. In particular, one employee can work two roles (Role Number) in the same business (Business Code), or work two roles or the same role in different businesses in the same or different areas (Area Code), see below:
What I want is to remove the duplicate records, so I wrote this query:
Select
dbo.year,
min(dbo.RoleNumber) AS Role,
min(dbo.AreaCode) AS Area,
min(dbo.BusinessCode) AS BCode,
dbo.EmployeeNumber
From dbo
Group by dbo.year, dbo.EmployeeNumber
This code works well when an individual works the lowest role in the business with the lowest code and in the area with the lowest code (e.g., rows 3 and 4 in my example), or when the area code and business code are the same in the duplicate records (e.g., rows 1 and 2).
However, I have some cases where an individual’s lowest role is associated with a higher business code and/or area code. In these cases SQL creates new records by combining values from different rows; see the examples below:
rows 5-10: 2018, 651, 5110, 3, 17;
rows 11-13: 2018, 649, 6215, 4, 20;
rows 14-15: 2018, 750, 5101, 5, 24.
This is not a problem per se, but it becomes one when I join tables to get additional data for these employees. The join keys are the area code, the business code and the employee number. Because my query produces combinations that do not exist in the other tables, the additional data comes back as NULL.
Is there a way to fix this? I need SQL to always select the lowest Role number first; if the role number is the same, the lowest establishment number should be selected; and if that is also the same, the lowest Area code should be selected.
So for instance, I would expect that the three records creating problems would be retrieved like this:
rows 5-10: 2018, 651, 6319, 3, 17;
rows 11-13: 2018, 650, 6215, 4, 20;
rows 14-15: 2018, 750, 8076, 5, 24.
Thank you
Silvia

You can use a window function:
select * from
(
select * , row_number() over (partition by year, employeenumber order by rolenumber,businesscode,areacode) rn
from yourtable
) t
where rn = 1
You can adjust the ORDER BY inside the window function to choose which row is kept for each year and employee.
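To see why the per-column MIN() approach invents rows while ROW_NUMBER() keeps real ones, here is a minimal sketch. It runs the SQL through Python's sqlite3 module (SQLite 3.25 or later is needed for window functions) as a stand-in for the asker's database; the table name `staff` and the sample rows are made up for illustration and are not the asker's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE staff (year INT, RoleNumber INT, AreaCode INT, "
    "BusinessCode INT, EmployeeNumber INT)"
)
# Hypothetical rows: employee 17 has two roles in different areas,
# employee 24 has an exact duplicate record.
rows = [
    (2018, 651, 6319, 3, 17),
    (2018, 700, 5110, 3, 17),
    (2018, 750, 8076, 5, 24),
    (2018, 750, 8076, 5, 24),
]
conn.executemany("INSERT INTO staff VALUES (?, ?, ?, ?, ?)", rows)

# MIN() is evaluated per column, so values from different rows get mixed.
per_column_min = conn.execute("""
    SELECT year, MIN(RoleNumber), MIN(AreaCode), MIN(BusinessCode), EmployeeNumber
    FROM staff
    GROUP BY year, EmployeeNumber
    ORDER BY EmployeeNumber
""").fetchall()

# ROW_NUMBER() ranks whole rows, so the row kept is one that really exists.
first_row = conn.execute("""
    SELECT year, RoleNumber, AreaCode, BusinessCode, EmployeeNumber
    FROM (
        SELECT *, ROW_NUMBER() OVER (
            PARTITION BY year, EmployeeNumber
            ORDER BY RoleNumber, BusinessCode, AreaCode
        ) AS rn
        FROM staff
    ) AS t
    WHERE rn = 1
    ORDER BY EmployeeNumber
""").fetchall()

print(per_column_min)  # employee 17 gets role 651 paired with area 5110
print(first_row)       # employee 17 keeps role 651 with its own area 6319
```

Because every returned row exists in the source table, later joins on area code, business code and employee number should find their matches.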

Related

How to make a query that return data of rows related to each row in table

I have some tables for double-entry bookkeeping.
The VoucherDetail table contains the accounting entries for each voucher, and
the other tables are Accounts Group/Ledger/Definitive.
Here are diagrams of the tables:
I'm trying to get the opposite side of an entry and show it in a custom column that matches the entry's debit/credit amount (see image 2).
I searched Google and found nothing. Here is the query I have so far (see image 1):
SELECT
dbo.Vouchers.VoucherId,
vd.VoucherDetailIndex AS ind,
vd.Debit,
vd.Credit,
vd.Description,
CONCAT ( ag.Name, '_', al.Name, '_', ad.Name ) AS names,
CONCAT ( ag.GroupId, '_', al.LedgerId, '_', ad.DefinitiveId ) AS ids
FROM dbo.Vouchers
JOIN dbo.VoucherDetails AS vd ON vd.Voucher_VoucherIndex = dbo.Vouchers.VoucherIndex
JOIN dbo.AccDefinitives AS ad ON vd.AccDefinitive_DefinitiveIndex = ad.DefinitiveIndex
JOIN dbo.AccLedgers AS al ON ad.AccLedger_LedgerIndex = al.LedgerIndex
JOIN dbo.AccGroups AS ag ON al.AccGroup_GroupIndex = ag.GroupIndex
Here is the result I'm getting:
The result I want:
Here is an example to explain what I need:
EVENT:
We put $10 in the bank as our equity, so we need to create a voucher for it:
INSERT INTO Vouchers(VoucherIndex, VoucherId, VoucherDate, Description) VALUES
(1, 1, '2019-01-01', 'initial investment');
Now we need to add the entries for this event to the VoucherDetail rows of voucher 1,
which will have 2 entries: 1 for cash and 1 for equity:
INSERT INTO VoucherDetails(VoucherDetailIndex, Debit, Credit, Description, AccDefinitive_DefinitiveIndex, AccLedger_LedgerIndex, Voucher_VoucherIndex, EntityOrder) VALUES
(1, 10, 0, 'Put Cash on Bank as initial Investment', 10101, 101, 1, 1),
(2, 0, 10, 'initial Investment', 50101, 501, 1, 2);
Now we run the first query I provided; here is the result:
That is the ordinary result. Now to the problem.
Imagine someone has filled these tables with 10000 rows of data
and we need to find voucher no. 10, which has 20 entries in VoucherDetail.
We can get these entries with a simple query,
but we don't know which entry relates to which (as in the example above, Cash with a $10 debit relates to Equity with a $10 credit).
To find out, we have to spend time on it every time we look something up.
The query needs to search the whole table and find the opposite side of each row based on the row's Debit or Credit value.
This is the result I want, written out in Excel:
As you can see in the image above, 2 new columns are added:
Account in opposite Side and Account ID in opposite side.
The first row refers to Equity, which relates to Cash, and
the second row refers to Cash, which relates to Equity.
As far as I can see, what you need is to join two VoucherDetail records that have the same Voucher_VoucherIndex value (let's call this VoucherID for brevity). However, the only two things these records have in common are their VoucherID and the fact that the Debit value of one equals the Credit value of the other, and vice versa.
In the comments you mentioned that multiple VoucherDetail rows with the same VoucherID can have the same Debit value (and I presume Credit value). If this wasn't the case, you could add something like this to your query:
JOIN dbo.VoucherDetails AS vd_opposite
ON vd.Voucher_VoucherIndex = vd_opposite.Voucher_VoucherIndex
AND (vd.Debit = vd_opposite.Credit OR vd.Credit = vd_opposite.Debit)
You can't do this though, because Debit/Credit and VoucherID together are not enough to be unique, so you might pick up extra rows in the join that you don't want.
Therefore, your only option is to add a new ID field to your table (maybe called SaleID or something) that definitively links the two rows that represent opposite sides of the same "sale" with a common ID. Then, the above JOIN would look like this:
JOIN dbo.VoucherDetails AS vd_opposite
ON vd.Voucher_VoucherIndex = vd_opposite.Voucher_VoucherIndex
AND vd.SaleID = vd_opposite.SaleID
AND vd.VoucherDetailIndex <> vd_opposite.VoucherDetailIndex -- keep a row from matching itself
In addition to adding that JOIN, you would need to join the new vd_opposite table against all of the dbo.Acc* tables again to get access to the data you want, and obviously add the fields from those tables that you want in the results to your SELECT fields.
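As a rough illustration of the two joins discussed above, the sketch below builds a cut-down VoucherDetails table in SQLite through Python's sqlite3 module. The `SaleID` column is the hypothetical pairing key suggested in this answer, `AccountName` is a simplification standing in for the names that really come from the dbo.Acc* tables, and the four rows are invented. The sketch also excludes a row from pairing with itself, which a join on SaleID alone would otherwise allow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE VoucherDetails (
        VoucherDetailIndex INTEGER PRIMARY KEY,
        Voucher_VoucherIndex INTEGER,
        Debit NUMERIC,
        Credit NUMERIC,
        AccountName TEXT,   -- simplification: really looked up via the Acc* tables
        SaleID INTEGER      -- hypothetical pairing key, not in the original schema
    )
""")
conn.executemany(
    "INSERT INTO VoucherDetails VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 10, 0, "Cash", 1),
        (2, 1, 0, 10, "Equity", 1),
        (3, 1, 10, 0, "Inventory", 2),
        (4, 1, 0, 10, "Payables", 2),
    ],
)

# Matching on amounts alone: two pairs share the same amount, so rows cross-match.
ambiguous = conn.execute("""
    SELECT vd.VoucherDetailIndex, vd_opposite.VoucherDetailIndex
    FROM VoucherDetails AS vd
    JOIN VoucherDetails AS vd_opposite
      ON vd.Voucher_VoucherIndex = vd_opposite.Voucher_VoucherIndex
     AND (vd.Debit = vd_opposite.Credit OR vd.Credit = vd_opposite.Debit)
""").fetchall()

# Matching on the pairing key: each row finds exactly one opposite row.
pairs = conn.execute("""
    SELECT vd.VoucherDetailIndex, vd.AccountName, vd_opposite.AccountName
    FROM VoucherDetails AS vd
    JOIN VoucherDetails AS vd_opposite
      ON vd.Voucher_VoucherIndex = vd_opposite.Voucher_VoucherIndex
     AND vd.SaleID = vd_opposite.SaleID
     AND vd.VoucherDetailIndex <> vd_opposite.VoucherDetailIndex
    ORDER BY vd.VoucherDetailIndex
""").fetchall()

print(len(ambiguous))  # more rows than the 4 entries in the voucher
print(pairs)
```

With the pairing key, the join returns one opposite account per entry; with amounts alone, the row count grows as soon as two pairs share an amount.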

How to count unique occurrences of string in table for separate records in apex 5

I am trying to automatically count the unique occurrences of a string saved in the table. Currently I have a count of a string but only when a user selects the string and it gives every record the same count value.
For example
Below is a image of my current table:
From the image you can see that there is a Requirement column and a Count column. I have got it to the point where, when the user selects a requirement record (each requirement record has a link), the requirement text is inserted into a page item called 'P33_REQUIREMENT' so the count has a value to compare against.
This is the SQL that I have currently:
SELECT (SELECT COUNT(*)
FROM DIA_ASSOCIATED_QMS_DOCUMENTS
WHERE REQUIREMENT = :P33_REQUIREMENT
group by REQUIREMENT
) AS COUNT,
DPD.DIA_SELECTED,
DPD.Q_NUMBER_SELECTED,
DPD.SECTION_SELECTED,
DPD.ASSIGNED_TO_PERSON,
DAQD.REFERENCE,
DAQD.REQUIREMENT,
DAQD.PROGRESS,
DAQD.ACTION_DUE_DATE,
DAQD.COMPLETION_DATE,
DAQD.DIA_REF,
DA.DIA,
DA.ORG_RISK_SCORE
FROM DIA_PROPOSED_DETAIL DPD,
DIA_ASSOCIATED_QMS_DOCUMENTS DAQD,
DIA_ASSESSMENTS DA
WHERE DPD.DIA_SELECTED = DAQD.DIA_REF
AND DPD.DIA_SELECTED = DA.DIA
This is the SQL used to make the table in the image.
The issue with this is that it gives every record the same count when the user selects a requirement value. I can partly fix this by also adding AND DIA_SELECTED = :P33_DIA to the WHERE clause of the count, DIA_SELECTED being the first column in the table and :P33_DIA being the item that stores the DIA ref number of the chosen record.
The output of this looks like:
As you can see, there is only one count. That still doesn't fix the entire issue, but it is a bit better.
So to sum up: is there a way to have the count tally the occurrences individually and show the result against every record with the same requirement? If there are three records with 'test', as in the images, there would be a '3' in the count column wherever requirement = 'test', and if there is one record with 'test the system' there would be a '1' in its count column.
For more context, I won't know what the user will enter as the requirement, so I can't compare against pre-determined strings.
I'm new to Stack Overflow; I hope I have explained enough and that it's not too confusing.
The following extract:
SELECT (SELECT COUNT(*)
FROM DIA_ASSOCIATED_QMS_DOCUMENTS
WHERE REQUIREMENT = :P33_REQUIREMENT group by REQUIREMENT ) AS COUNT
Could be replaced by
SELECT (SELECT COUNT(*)
FROM DIA_ASSOCIATED_QMS_DOCUMENTS
WHERE REQUIREMENT = DAQD.REQUIREMENT ) AS COUNT
This would give you, for each line, the number of records that have the same requirement.
I'm not completely certain it is what you are after, but if it isn't, it should give you some ideas on how to progress (or allow you to indicate where I failed to understand your request)
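A minimal sketch of that correlated subquery, run in SQLite through Python's sqlite3 module rather than in Oracle APEX. Only two columns of DIA_ASSOCIATED_QMS_DOCUMENTS are modelled, the sample rows are invented, and the alias `REQ_COUNT` is my own choice to sidestep naming a column COUNT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DIA_ASSOCIATED_QMS_DOCUMENTS (REFERENCE INTEGER, REQUIREMENT TEXT)"
)
conn.executemany(
    "INSERT INTO DIA_ASSOCIATED_QMS_DOCUMENTS VALUES (?, ?)",
    [(1, "test"), (2, "test"), (3, "test"), (4, "test the system")],
)

# The inner query is re-evaluated for each outer row, comparing against
# that row's own REQUIREMENT instead of a single page item value.
counts = conn.execute("""
    SELECT DAQD.REFERENCE,
           DAQD.REQUIREMENT,
           (SELECT COUNT(*)
              FROM DIA_ASSOCIATED_QMS_DOCUMENTS
             WHERE REQUIREMENT = DAQD.REQUIREMENT) AS REQ_COUNT
    FROM DIA_ASSOCIATED_QMS_DOCUMENTS DAQD
    ORDER BY DAQD.REFERENCE
""").fetchall()

for row in counts:
    print(row)
```

Each 'test' row carries 3 and the 'test the system' row carries 1, without the user having to select anything first.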

SQL Aggregate Function over partitions

I'm relatively new to SQL but have learned some cool stuff. I'm getting results that don't make sense. I've got a query with several subqueries and what-not but I have a windowed function that isn't working like I'm expecting.
The part that isn't working is this (simplified from the 300 line query):
SELECT AVG(table.sales_amount)
OVER (PARTITION BY table.month, table.sales_rep, table.department)
FROM table
The problem is that when I pull the data non-aggregated and average it myself I get a different value (107) than the query above returns (95).
I've used windowed functions for COUNT and SUM and they work fine, but AVG is acting strangely. Am I missing something about how this works with AVG?
The subquery that table is a standin for looks like:
sales_rep, month, department, sales_amount
1, 2017-1, abc, 125.20
1, 2017-2, abc, 120.00
2, 2017-1, def, 100.00
...etc
I'm working in SQL Server Management Studio.
SOLVED: I did finally figure it out. The results I was joining this subquery to had the sales rep multiple times in a month (selling objects A and B), which caused whoever sold both to be counted twice. Whoops, my bad.
The results that you get should be the same values as in:
SELECT AVG(table.sales_amount)
FROM table
GROUP BY table.month, table.sales_rep, table.department;
Of course, the rows will be different. You need to match up the three key columns.
Based on your sample data, it looks like the partitioning keys uniquely define each row. Perhaps you really intend:
SELECT AVG(table.sales_amount) OVER () as overall_average
FROM table;
EDIT:
For the departmental average:
SELECT AVG(table.sales_amount) OVER (partition by table.department) as department_average
FROM table;
After some brute-forcing of potential errors I finally figured out the issue. I was joining that subquery to another one which had multiple instances of a sales_rep in a given month (selling objects a and b), which caused the sales of anyone who sold both objects to be counted twice in the average instead of once.
So sales rep 1 sold objects a and b, which made his sales count for 66% of the dept avg instead of 50%, while sales rep 2 counted for only 33%.
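The effect can be reproduced with a small sketch. It uses SQLite through Python's sqlite3 module (3.25 or later for window functions) instead of SQL Server; the table names `sales` and `items_sold` and all the figures are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (sales_rep INT, month TEXT, department TEXT, sales_amount REAL);
    INSERT INTO sales VALUES (1, '2017-1', 'abc', 100.0), (2, '2017-1', 'abc', 50.0);

    -- Hypothetical second table: rep 1 appears twice (objects a and b).
    CREATE TABLE items_sold (sales_rep INT, month TEXT, object TEXT);
    INSERT INTO items_sold VALUES (1, '2017-1', 'a'), (1, '2017-1', 'b'), (2, '2017-1', 'a');
""")

# Average over the sales rows alone: each rep counts once.
before_join = conn.execute("""
    SELECT DISTINCT AVG(sales_amount) OVER (PARTITION BY department)
    FROM sales
""").fetchone()[0]

# After the join, rep 1's row is duplicated, so it weighs 2/3 of the average.
after_join = conn.execute("""
    SELECT DISTINCT AVG(s.sales_amount) OVER (PARTITION BY s.department)
    FROM sales AS s
    JOIN items_sold AS i
      ON i.sales_rep = s.sales_rep AND i.month = s.month
""").fetchone()[0]

print(before_join, after_join)
```

COUNT and SUM are distorted by the same fan-out; the average is simply where it was noticed here. Aggregating before joining, or joining to a de-duplicated set, avoids the double weighting.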

Wrapping a range of data

How would I select a rolling/wrapping* set of rows from a table?
I am trying to select a number of records (per type, 2 or 3) for each day, wrapping when I 'run out'.
Eg.
2018-03-15: YyBiz, ZzCo, AaPlace
2018-03-16: BbLocation, CcStreet, DdInc
These are rendered within a SSRS report for Dynamics CRM, so I can do light post-query operations.
Currently I get to:
2018-03-15: YyBiz, ZzCo
2018-03-16: AaPlace, BbLocation, CcStreet
First, getting a number for each record with:
SELECT name, ROW_NUMBER() OVER (PARTITION BY type ORDER BY name) as RN
FROM table
Within SSRS, I then adjust RN to reflect the number of each type I need:
OnPageNum = FLOOR((RN+num_of_type-1)/num_of_type)-1
--Shift RN to be 0-indexed.
Resulting in AaPlace, BbLocation and CcStreet having a PageNum of 0, DdInc of 1, ... YyBiz and ZzCo of 8.
Then using an SSRS Table/Matrix linked to the dataset, I set the row filter to something like:
RowFilter = MOD(DateNum, NumPages(type)) == OnPageNum
Where DateNum is essentially days since epoch, and each page has a separate table and day passed in.
At this point it shows only N records of a type per page, but if the total number of records of a type isn't a multiple of the number of records per page for that type, there will be pages with fewer records than required.
Is there an easier way to approach this/what's the next step?
*Wrapping such as Wraparound found in videogames, seamless resetting to 0.
To achieve this effect, I found that offsetting the RowNumber by -DateNum*num_of_type (negative for positive ordering), then modulo COUNT(type) would provide the correct "wrap around" effect.
In order to achieve the desired pagination, it then just had to be divided by num_of_type and floor'd, as below:
RowFilter: FLOOR(((RN-DateNum*num_of_type) % count(type))/num_of_type) == 0
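The filter can be checked outside SSRS with a short Python sketch. It assumes RN is 0-indexed, as in the shift mentioned in the question, and uses 26 made-up two-letter names with 3 records per day. Python's % always returns a non-negative result for a positive modulus, which the formula relies on; an environment whose modulo can go negative would need an adjustment.

```python
from math import floor


def on_page(rn, date_num, num_of_type, count):
    """The RowFilter expression: true when row rn belongs to day date_num."""
    return floor(((rn - date_num * num_of_type) % count) / num_of_type) == 0


def names_for_day(names, date_num, num_of_type):
    """Return the names shown on a given day, in wrapped order."""
    ordered = sorted(names)
    count = len(ordered)
    picked = []
    for rn, name in enumerate(ordered):  # rn is the 0-indexed row number
        if on_page(rn, date_num, num_of_type, count):
            offset = (rn - date_num * num_of_type) % count
            picked.append((offset, name))
    return [name for _, name in sorted(picked)]


# Hypothetical data: "Aa", "Bb", ..., "Zz" (26 names, not a multiple of 3).
names = [chr(65 + i) + chr(97 + i) for i in range(26)]

print(names_for_day(names, 0, 3))  # first three names
print(names_for_day(names, 8, 3))  # wraps from the end back to the start
print(names_for_day(names, 9, 3))  # carries on after the wrap
```

Day 8 picks the last two names plus the first one, and day 9 continues from the second name, which matches the wrapping described in the question.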

Bringing back multiple max on a single column in sql

I have a spreadsheet with customer accounts, and when we get a new account it is added using a reference account number, e.g. Anderson Electrical would be AND01. I'm trying to use SQL to bring back the highest number for each letter prefix, e.g. if AND01 already existed and our highest account was AND34, it would bring back just AND34 rather than everything from 1 to 34.
Each account has the first 3 letters of their company name followed by whatever the next number is.
Hope I have explained this well enough for someone to understand :)
For a single reference account:
select max(AcctNum)
from Accounts
where left(AcctNum, 3) = <reference account>
If you want it for all at once:
select left(AcctNum, 3) as ReferenceAcct, max(AcctNum)
from Accounts
group by left(AcctNum, 3)
Not sure if that's what you're asking but if you need to find max value that is part of a string you can do it with substring. So if you need to find the highest number from a column that contains those values you can do it with:
;WITH tmp AS(
SELECT 'AND01' AS Tmp
UNION ALL
SELECT 'AND34'
) SELECT MAX(SUBSTRING(Tmp, 4, 2)) FROM tmp GROUP BY SUBSTRING(Tmp, 1, 3) -- SUBSTRING is 1-based in SQL Server
That's a little test query that returns 34 because I'm grouping by first 3 letters, you probably want to group it by some ID.
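One caveat with both answers is that MAX on text compares strings, which works while every number is zero-padded to the same width but not once an account such as AND100 appears. The sketch below shows the difference in SQLite via Python's sqlite3 module; SQLite has no LEFT(), so substr() stands in for it, and the account numbers are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (AcctNum TEXT)")
conn.executemany(
    "INSERT INTO Accounts VALUES (?)",
    [("AND01",), ("AND34",), ("AND100",), ("BRO02",), ("BRO07",)],
)

# Per 3-letter prefix: the text maximum versus the maximum of the numeric part.
maxima = conn.execute("""
    SELECT substr(AcctNum, 1, 3)                    AS ReferenceAcct,
           MAX(AcctNum)                             AS MaxAsText,
           MAX(CAST(substr(AcctNum, 4) AS INTEGER)) AS MaxAsNumber
    FROM Accounts
    GROUP BY substr(AcctNum, 1, 3)
    ORDER BY ReferenceAcct
""").fetchall()

for row in maxima:
    print(row)
```

For the AND prefix the text maximum is AND34 while the numeric maximum is 100, so casting the numeric part is the safer choice if the numbers can outgrow their padding.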