avoiding group by for column used in datediff? - sql

As the database is currently constructed, I can only use a date field of a certain table in a DATEDIFF function that is also part of a count aggregation (not the date field itself, but the entity where that date field is not null). The GROUP BY at the end messes up the counting, since that one entry is counted on its own, as its own group.
In some detail:
Our lead recruiter wants a report that shows the number of applications and conducted interviews per opening. So far no problem. Additionally, he would like to see the total duration per opening, from making it public to signing a new employee, and of course only if the opening has already been filled.
I have 4 tables to join:
table 1 holds the data of the opening
table 2 has the single applications
table 3 has the interview data of the applications
table 4 has the data regarding the publication of the openings (with the date when a certain opening was made public)
The problem is the duration requirement. table 4 holds the starting point, and in table 2 one (or no) applicant per opening has a date field filled with the time he returned a signed contract, at which point the opening counts as filled. When I use that field in a DATEDIFF, I'm forced to also put that column in the GROUP BY clause, and that results in 2 rows per opening: 1 row has all the numbers as wanted, and the second row always holds that one person who has an entry in that date field...
So far I haven't come up with a way of avoiding that problem, except for explaining to the colleague that he gets his time-to-fill number in another report.
SELECT
table1.col1 as NameOfProject,
table1.col2 as Company,
table1.col3 as OpeningType,
table1.col4 as ReasonForOpening,
count (table2.col2) as NumberOfApplications,
sum (case when table2.colSTATUS = 'withdrawn' then 1 else 0 end) as NumberOfApplicantsWhoWithdraw,
sum (case when table3.colTypeInterview = 'PhoneInterview' then 1 else 0 end) as NumberOfPhoneInterview,
...more sum columns...,
table1.finished, -- shows "1" if the opening is filled
DATEDIFF(day, table4.colValidFrom, table2.colContractReceived) as DaysToCompletion
FROM
table2 left join table3 on table2.REF_NR = table3.REF_NR
join table1 on table2.PROJEKT = table1.KBEZ
left join table4 on table1.REFNR = table4.PRJ_REFNR
GROUP BY
table2.colContractReceived -- plus every other column that is not inside an aggregate (sum and count) function
ORDER BY table1.NameOfProject
Here is a short rebuild of what it looks like. First a row where the opening is not filled and all aggregations come out in one row as wanted. The next project/opening shows up double, because the field used in the datediff is grouped independently...
project   company  no_of_applications  no_of_phoneinterview  no_of_personalinterview  ...  time_to_fill_in_days  filled?
2018_312  comp a   27                  4                     2                        ...  null                  0
2018_313  comp b   54                  7                     4                        ...  null                  0
2018_313  comp b   1                   1                     1                        ...  42                    1
I'd be glad to get any idea how to solve this. Thanks for considering my request!
(During the 'translation' of all the specific column and table names I might have built in a syntax error here and there, but the query worked well except for that unwanted extra aggregation per filled opening.)

If I've understood your requirement properly, you need to show the number of days between the starting point and the time at which an applicant returned the signed contract, but with only a single row per opening: the filled row if the position was filled, otherwise the unfilled row.
I've achieved this result by assuming that you count a position as filled using the "ContractsReceived" column. This may be wrong, but the principle should still provide what you are looking for.
I've essentially wrapped your query in a subquery, added a rank ordered by the ContractsReceived column descending and partitioned by project, and then filtered for the first row of this ranking in the outer query.
Even if my assumption about the column structure and data types is wrong, this should provide you with a model to work with.
The only issue you might have with this ranking solution is if you want to aggregate over both rows within one (so include all of the summed columns for both the position filled and position not filled row per project). If this is the case let me know and we can work around that.
Please let me know if you have any questions.
create table #table1 (
REFNR int,
NameOfProject nvarchar(20),
Company nvarchar(20),
OpeningType nvarchar(20),
ReasonForOpening nvarchar(20),
KBEZ int
);
create table #table2 (
NumberOfApplications int,
Status nvarchar(15),
REF_NR int,
ReturnedApplicationDate datetime,
ContractsReceived bit,
PROJEKT int
);
create table #table3 (
TypeInterview nvarchar(25),
REF_NR int
);
create table #table4 (
PRJ_REFNR int,
StartingPoint datetime
);
insert into #table1 (REFNR, NameOfProject, Company, OpeningType, ReasonForOpening, KBEZ)
values (1, '2018_312', 'comp a' ,'Permanent', 'Business growth', 1),
(2, '2018_313', 'comp a', 'Permanent', 'Business growth', 2),
(3, '2018_313', 'comp a', 'Permanent', 'Business growth', 3);
insert into #table2 (NumberOfApplications, Status, REF_NR, ReturnedApplicationDate, ContractsReceived, PROJEKT)
values (27, 'Processed', 4, '2018-04-01 08:00', 0, 1),
(54, 'Withdrawn', 5, '2018-04-02 10:12', 0, 2),
(1, 'Processed', 6, '2018-04-15 15:00', 1, 3);
insert into #table3 (TypeInterview, REF_NR)
values ('Phone', 4),
('Phone', 5),
('Personal', 6);
insert into #table4 (PRJ_REFNR, StartingPoint)
values (1, '2018-02-25 08:00'),
(2, '2018-03-04 15:00'),
(3, '2018-03-04 15:00');
select * from
(
SELECT
RANK()OVER(Partition by NameOfProject, Company order by ContractsReceived desc) as rowno,
table1.NameOfProject,
table1.Company,
table1.OpeningType,
table1.ReasonForOpening,
case when ContractsReceived >0 then datediff(DAY, StartingPoint, ReturnedApplicationDate) else null end as TimeToFillInDays,
ContractsReceived Filled
FROM
#table2 table2 left join #table3 table3 on table2.REF_NR = table3.REF_NR
join #table1 table1 on table2.PROJEKT = table1.KBEZ
left join #table4 table4 on table1.REFNR = table4.PRJ_REFNR
group by NameOfProject, Company, OpeningType, ReasonForOpening, ContractsReceived,
StartingPoint, ReturnedApplicationDate
) x where rowno=1
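If all the counts should stay in one row per opening, one possible workaround is to wrap the DATEDIFF itself in an aggregate, so the contract date never has to appear in the GROUP BY. Below is a minimal T-SQL sketch of that idea. The tables #opening and #application and their columns are simplified stand-ins I made up, not the real schema, and it assumes at most one application per opening carries a contract date.

```sql
-- Hypothetical, simplified tables: one row per opening, one row per application.
CREATE TABLE #opening (opening_id int, project nvarchar(20), valid_from date);
CREATE TABLE #application (opening_id int, status nvarchar(15), contract_received date NULL);

INSERT INTO #opening VALUES
  (1, '2018_312', '2018-02-25'),
  (2, '2018_313', '2018-03-04');

INSERT INTO #application VALUES
  (1, 'processed', NULL),
  (1, 'withdrawn', NULL),
  (2, 'processed', NULL),
  (2, 'withdrawn', NULL),
  (2, 'processed', '2018-04-15');  -- the one applicant who returned a signed contract

SELECT
  o.project,
  COUNT(*) AS NumberOfApplications,
  SUM(CASE WHEN a.status = 'withdrawn' THEN 1 ELSE 0 END) AS NumberWithdrawn,
  -- DATEDIFF is NULL for applications without a contract date, and MAX skips NULLs,
  -- so only the signed contract (if any) contributes to the result.
  MAX(DATEDIFF(day, o.valid_from, a.contract_received)) AS DaysToCompletion
FROM #application a
JOIN #opening o ON o.opening_id = a.opening_id
GROUP BY o.project
ORDER BY o.project;

-- project   NumberOfApplications  NumberWithdrawn  DaysToCompletion
-- 2018_312  2                     1                NULL
-- 2018_313  3                     1                42
```

Since both date columns sit inside the aggregate, neither of them needs to be listed in the GROUP BY, and the filled opening no longer splits into two rows.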

Related

Remove clients who don't have 2 rows by their name in SQL

What I'm trying to do is filter for the clients that registered at least twice in the DB. I need to know which of them came back, so I'm working with a table that records every registration in the system, as follows:
order #  client  date
One      Andrew  XX
Two      Andrew  XX+1
Three    Andrew  XX+2
One      David   YY
One      Marc    ZZ
Two      Marc    ZZ+1
In this case I want to remove David's record, as I only want people who have order numbers other than "One".
I tried this SQL:
select *
from table
where order_number > 1
However, this removes all the first-order rows, including those of clients who came back.
Does somebody know an easy way to compare rows by client name and filter on that, or how I could delete the rows of clients with only one entry?
You need something like this:
select * from yourtable t1
where exists (select 1 from yourtable t2 where t2.client = t1.client and t2.order_number > 1)
or:
select client
from tablename
group by client
having count(*) > 1
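The GROUP BY / HAVING query above returns only the client names. If the full rows are wanted, a window count is one way to keep them; here is a small sketch. The table name client_orders, the numeric order_number and the dates are made up for illustration.

```sql
-- Hypothetical table mirroring the question's sample rows (dates are invented).
CREATE TABLE client_orders (order_number int, client varchar(20), order_date date);

INSERT INTO client_orders VALUES
  (1, 'Andrew', '2020-01-01'),
  (2, 'Andrew', '2020-01-02'),
  (3, 'Andrew', '2020-01-03'),
  (1, 'David',  '2020-02-01'),
  (1, 'Marc',   '2020-03-01'),
  (2, 'Marc',   '2020-03-02');

-- Count the rows per client without collapsing them, then keep clients with more than one row.
SELECT order_number, client, order_date
FROM (
  SELECT order_number, client, order_date,
         COUNT(*) OVER (PARTITION BY client) AS registrations
  FROM client_orders
) counted
WHERE registrations > 1
ORDER BY client, order_number;
-- returns the three Andrew rows and the two Marc rows; David is filtered out
```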
CREATE TABLE records (
ID INTEGER PRIMARY KEY,
order_number TEXT NOT NULL,
client TEXT NOT NULL,
date DateTime NOT NULL
);
INSERT INTO records VALUES (1,'ONE', 'Andrew', '01.01.1999');
INSERT INTO records VALUES (2, 'TWO','Andrew', '02.02.1999');
INSERT INTO records VALUES (3, 'THREE','Andrew', '03.03.1999');
INSERT INTO records VALUES (4, 'ONE', 'David', '01.01.1999');
INSERT INTO records VALUES (5, 'ONE','Marc', '01.01.1999');
INSERT INTO records VALUES (6, 'TWO','Marc', '01.03.1999');
-- Delete the rows of every client that appears only once.
DELETE FROM records WHERE client IN
(
SELECT client FROM records
GROUP BY client HAVING COUNT(client) = 1
);

sql join using recursive cte

Edit: Added another case scenario in the notes and updated the sample attachment.
I am trying to write a sql to get an output attached with this question along with sample data.
There are two tables: one with distinct IDs (pk) and their current flag,
and another with Active ID (fk to the pk of the first table) and Inactive ID (fk to the pk of the first table).
Final output should return two columns, first column consist of all distinct ID's from the first table and second column should contain Active ID from the 2nd table.
Below is the sql:
IF OBJECT_ID('tempdb..#main') IS NOT NULL DROP TABLE #main;
IF OBJECT_ID('tempdb..#merges') IS NOT NULL DROP TABLE #merges
IF OBJECT_ID('tempdb..#final') IS NOT NULL DROP TABLE #final
SELECT DISTINCT id,
current
INTO #main
FROM tb_ID t1
--get list of all active_id and inactive_id
SELECT DISTINCT active_id,
inactive_id,
Update_dt
INTO #merges
FROM tb_merges
-- Combine where the id from the main table matched to the inactive_id (should return all the rows from #main)
SELECT id,
active_id AS merged_to_id
INTO #final
FROM (SELECT t1.*,
t2.active_id,
Update_dt ,
Row_number()
OVER (
partition BY id, active_id
ORDER BY Update_dt DESC) AS rn
FROM #main t1
LEFT JOIN #merges t2
ON t1.id = t2.inactive_id) t3
WHERE rn = 1
SELECT *
FROM #final
This SQL partially works. It fails where an ID was once active and then became inactive.
Please note:
the active ID should be the most recent active ID
an ID which doesn't have any active ID should return either null or the ID itself
for IDs where current = 0, the active ID should be the ID that is current in tb_ID
IDs may get interchanged. For example, there are two IDs 6 and 7; when 6 is active, 7 is inactive, and vice versa. The only way to know the most current active state is by the update date
Attached sample might be easy to understand
Looks like I might have to use a recursive CTE to achieve the results. Can someone please help?
thank you for your time!
I think you're correct that a recursive CTE looks like a good solution for this. I'm not entirely certain that I've understood exactly what you're asking for, particularly with regard to the update_dt column, just because the data is a little abstract as-is, but I've taken a stab at it, and it does seem to work with your sample data. The comments explain what's going on.
create table #tb_id (id bigint, [current] bit);
create table #tb_merges (active_id bigint, inactive_id bigint, update_dt datetime2);
insert #tb_id values
-- Sample data from the question.
(1, 1),
(2, 1),
(3, 1),
(4, 1),
(5, 0),
-- A few additional data to illustrate a deeper search.
(6, 1),
(7, 1),
(8, 1),
(9, 1),
(10, 1);
insert #tb_merges values
-- Sample data from the question.
(3, 1, '2017-01-11T13:09:00'),
(1, 2, '2017-01-11T13:07:00'),
(5, 4, '2013-12-31T14:37:00'),
(4, 5, '2013-01-18T15:43:00'),
-- A few additional data to illustrate a deeper search.
(6, 7, getdate()),
(7, 8, getdate()),
(8, 9, getdate()),
(9, 10, getdate());
if object_id('tempdb..#ValidMerge') is not null
drop table #ValidMerge;
-- Get the subset of merge records whose active_id identifies a "current" id and
-- rank by date so we can consider only the latest merge record for each active_id.
with ValidMergeCTE as
(
select
M.active_id,
M.inactive_id,
[Priority] = row_number() over (partition by M.active_id order by M.update_dt desc)
from
#tb_merges M
inner join #tb_id I on M.active_id = I.id
where
I.[current] = 1
)
select
active_id,
inactive_id
into
#ValidMerge
from
ValidMergeCTE
where
[Priority] = 1;
-- Here's the recursive CTE, which draws on the subset of merges identified above.
with SearchCTE as
(
-- Base case: any record whose active_id is not used as an inactive_id is an endpoint.
select
M.active_id,
M.inactive_id,
Depth = 0
from
#ValidMerge M
where
not exists (select 1 from #ValidMerge M2 where M.active_id = M2.inactive_id)
-- Recursive case: look for records whose active_id matches the inactive_id of a previously
-- identified record.
union all
select
S.active_id,
M.inactive_id,
Depth = S.Depth + 1
from
#ValidMerge M
inner join SearchCTE S on M.active_id = S.inactive_id
)
select
I.id,
S.active_id
from
#tb_id I
left join SearchCTE S on I.id = S.inactive_id;
Results:
id active_id
------------------
1 3
2 3
3 NULL
4 NULL
5 4
6 NULL
7 6
8 6
9 6
10 6

Merging two rows on to one in different columns

Firstly sorry if this has already been answered somewhere else, I have been unable to find an answer though after days of searching.
Is there a way to merge two rows into one row using different columns?
As you will see from the image below, the rows are identical other than the date and location. What I am looking for is to have the details on one row, where the date column is displayed twice under different column names, for example 'Date sent to X location' and 'Date sent to Y location'. The location itself would not need to be displayed if the dates are in the correct columns, since the column name already shows which location it was.
So far I use this query, and I am unsure on how to adjust it to do what I need?
select
l.lot_number,
trunc(l.start_tran_date) AS "Date sent to location",
l.location_id_2 AS "Location"
FROM t_tran_log l
WHERE
(l.location_id_2 = 'SENTTOMAP' OR l.location_id_2 = 'WAITINGFORCOLLECTION')
;
This is what I would like the above result to look like:
This would be my approach:
1- Create a test table
create table MESSYLOG
(
lotn varchar(20),
datesent date,
location varchar(20)
);
insert into messylog values ('abc', '06-JUN-16', 'waiting');
insert into messylog values ('abc', '07-JUN-16', 'sent');
insert into messylog values ('def', '08-JUN-16', 'waiting');
insert into messylog values ('def', '10-JUN-16', 'sent');
--select * from MESSYLOG
2- Write 2 subqueries
select t1.lotn, t2.DateWait, t1.DateSentmap
from
(
select e.lotn, e.datesent as DateSentmap
from messylog e
where e.location = 'sent'
) t1
JOIN
(
select m.lotn, m.datesent as DateWait
from messylog m
where m.location = 'waiting'
)t2
on t1.lotn = t2.lotn
3-Resultset
LOTN DATEWAIT DATESENTMAP
abc 06-JUN-16 07-JUN-16
def 08-JUN-16 10-JUN-16
If there is only a single date per location for a given lot, you could try something like this:
SELECT lot_number
,MAX(CASE WHEN location_id_2 = 'WAITINGFORCOLLECTION' THEN TRUNC(start_tran_date) ELSE NULL END) AS "Date waiting for collection"
,MAX(CASE WHEN location_id_2 = 'SENTTOMAP' THEN TRUNC(start_tran_date) ELSE NULL END) AS "Date sent to map"
FROM t_tran_log
GROUP BY lot_number
The aggregate function (MAX) will skip the NULL column values leaving the single value for the location.

SQL Showing Less information depending on date

I have this code. It returns a list of clients, but it lists too many, because it lists the same client several times with different dates. I only want to show the latest date and none of the others. I tried a GROUP BY Client_Code but it didn't work; it just threw a "not an aggregate function" error or something similar (I can get the exact message if needed). What I have been asked to get is all of our clients, with all the details listed in the 'as' parts, and they all pull through properly. If I take out:
I.DATE_LAST_POSTED as 'Last Posted',
I.DATE_LAST_BILLED as 'Last Billed'
it shows up okay, but I need the last billed date to appear. Putting these lines in shows the client several times, listing all the different bill dates. I think that is because it is pulling across the different matters in the Matter_Master table. Essentially, I would like to show the client information only once, for the highest matter, with its last billed date.
Please let me know if this needs clarification, I'm trying to explain as best I can...
SELECT DISTINCT
A.DIWOR as 'ID',
B.Client_alpha_Name as 'Client Name',
A.ClientCODE as 'Client Code',
B.Client_address as 'Client Address',
D.COMM_NO AS 'Contact',
E.Contact_full_name as 'Possible Key Contact',
G.LOBSICDESC as 'LOBSIC Code',
H.EARNERNAME as 'Client Care Partner',
A.CLIENTCODE + '/' + LTRIM(STR(A.LAST_MATTER_NUM)) as 'Last Matter Code',
I.DATE_LAST_POSTED as 'Last Posted',
I.DATE_LAST_BILLED as 'Last Billed'
FROM CLIENT_MASTER A
JOIN CLIENT_INFO B
ON A.CLIENTCODE=B.CLIENT_CODE
JOIN MATTER_MASTER C
ON A.DIWOR=C.CLIENTDIWOR
JOIN COMMINFO D
ON A.DIWOR=D.DIWOR
JOIN CONTACT E
ON A.CLIENTCODE=E.CLIENTCODE
JOIN VW_CONTACT F
ON E.NAME_DIWOR=F.NAME_DIWOR
JOIN LOBSIC_CODES G
ON A.LOBSICDIWOR=G.DIWOR
JOIN STAFF H
ON A.CLIENTCAREPARTNER=H.DIWOR
JOIN MATTER I
ON C.DIWOR=I.MATTER_DIWOR
WHERE F.COMPANY_FLAG='Y'
AND C.MATTER_MANAGER NOT IN ('78','466','2','104','408','73','51','561','504','101','13','534','16','461','531','144','57','365','83','107','502','514','451')
AND I.DATE_LAST_BILLED > 0
GROUP BY A.ClientCODE
ORDER BY A.DIWOR
Your problem is that you aren't using enough aggregate functions. Which is probably why you're using both the DISTINCT clause and the GROUP BY clause (the recommendation is to use GROUP BY, and not DISTINCT).
So... remove DISTINCT, add the necessary (unique, more or less) list of columns to the GROUP BY clause, and wrap the rest in aggregate functions, constants, or subselects. In the specific case of wanting the largest date, wrap it in a MAX() function.
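A minimal sketch of that advice in T-SQL, reduced to three columns. The temp table #client_bills and its rows are made up to stand in for the output of the big join; in the real query every non-aggregated column from the SELECT list would go into the GROUP BY.

```sql
-- Hypothetical stand-in for the joined result: several bill dates per client.
CREATE TABLE #client_bills (ClientCode int, ClientAddress varchar(50), DateLastBilled datetime);

INSERT INTO #client_bills VALUES
  (1, 'address1', '2011-01-01'),
  (1, 'address1', '2011-01-04'),
  (2, 'address2', '2011-01-07'),
  (2, 'address2', '2011-01-10');

-- No DISTINCT: group by the columns that identify a client and aggregate the date.
SELECT
  ClientCode,
  ClientAddress,
  MAX(DateLastBilled) AS LastBilled
FROM #client_bills
GROUP BY ClientCode, ClientAddress
ORDER BY ClientCode;
-- one row per client: client 1 with 2011-01-04, client 2 with 2011-01-10
```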
If I understood right:
--=======================
-- sample data - simplified output of your query
--=======================
create table #t
(
ClientCode int,
ClientAddress varchar(50),
DateLastBilled datetime
-- the rest of fields is skipped
)
insert into #t values (1, 'address1', '2011-01-01')
insert into #t values (1, 'address1', '2011-01-02')
insert into #t values (1, 'address1', '2011-01-03')
insert into #t values (1, 'address1', '2011-01-04')
insert into #t values (2, 'address2', '2011-01-07')
insert into #t values (2, 'address2', '2011-01-08')
insert into #t values (2, 'address2', '2011-01-09')
insert into #t values (2, 'address2', '2011-01-10')
--=======================
-- solution
--=======================
select distinct
ClientCode,
ClientAddress,
DateLastBilled
from
(
select
ClientCode,
ClientAddress,
DateLastBilled,
-- list of remaining fields
MaxDateLastBilled = max(DateLastBilled) over(partition by ClientCode)
from
(
-- here should be your query
select * from #t
) t
) t
where MaxDateLastBilled = DateLastBilled

Continuous sequences in SQL

Having a table with the following fields:
Order,Group,Sequence
it is required that all orders in a given group form a continuous sequence. For example: 1,2,3,4 or 4,5,6,7. How can I check using a single SQL query what orders do not comply with this rule? Thank you.
Example data:
Order Group Sequence
1 1 3
2 1 4
3 1 5
4 1 6
5 2 3
6 2 4
7 2 6
Expected result:
Order
5
6
7
Also accepted if the query returns only the group which has the wrong sequence, 2 for the example data.
Assuming that the sequences are generated and therefore cannot be duplicated:
SELECT [Group]
FROM theTable
GROUP BY [Group]
HAVING MAX(Sequence) - MIN(Sequence) <> (COUNT(*) - 1);
How about this?
select Group from Table
group by Group
having count(Sequence) <= max(Sequence)-min(Sequence)
[Edit] This assumes that Sequence does not allow duplicates within a particular group. It might be better to use: count != max - min + 1
[Edit again] D'oh, still not perfect. Another query to flush out duplicates would take care of that though.
[Edit the last] The original query worked fine in sqlite, which is what I had available for a quick test. It is much more forgiving than SQL server. Thanks to Bell for the pointer.
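For what it's worth, the gap check and the duplicate check mentioned in the edits can probably be combined in one HAVING clause. This is only a sketch: the table name seq_check and the columns OrderId and GroupId are my own renaming, chosen to avoid the reserved words Order and Group.

```sql
-- Hypothetical table holding the example data from the question.
CREATE TABLE seq_check (OrderId int, GroupId int, Sequence int);

INSERT INTO seq_check VALUES
  (1, 1, 3), (2, 1, 4), (3, 1, 5), (4, 1, 6),
  (5, 2, 3), (6, 2, 4), (7, 2, 6);

-- A group is continuous when it has exactly max - min + 1 rows and no value repeats.
SELECT GroupId
FROM seq_check
GROUP BY GroupId
HAVING COUNT(*) <> MAX(Sequence) - MIN(Sequence) + 1
    OR COUNT(*) <> COUNT(DISTINCT Sequence);
-- returns GroupId 2 for the rows above
```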
Personally, I think I would consider rethinking the requirement. It is the nature of relational databases that gaps in sequences can easily occur due to records that are rolled back. For instance, suppose an order starts to create four items, but one fails for some reason and is rolled back. If you precomputed the sequences manually, you would then have a gap if the one rolled back is not the last one. In other scenarios, you might get a gap due to multiple users looking for sequence values at approximately the same time, or because a customer deleted one record from the order at the last minute. What are you honestly looking to gain from having contiguous sequences that you don't get from a parent-child relationship?
This SQL selects the orders 3 and 4, which have non-continuous sequences.
CREATE TABLE #Orders ([Order] INTEGER, [Group] INTEGER, Sequence INTEGER)
INSERT INTO #Orders VALUES (1, 1, 0)
INSERT INTO #Orders VALUES (1, 2, 0)
INSERT INTO #Orders VALUES (1, 3, 0)
INSERT INTO #Orders VALUES (2, 4, 0)
INSERT INTO #Orders VALUES (2, 5, 0)
INSERT INTO #Orders VALUES (2, 6, 0)
INSERT INTO #Orders VALUES (3, 4, 0)
INSERT INTO #Orders VALUES (3, 6, 0)
INSERT INTO #Orders VALUES (4, 1, 0)
INSERT INTO #Orders VALUES (4, 2, 0)
INSERT INTO #Orders VALUES (4, 8, 0)
SELECT o1.[Order]
FROM #Orders o1
LEFT OUTER JOIN #Orders o2 ON o2.[Order] = o1.[Order] AND o2.[Group] = o1.[Group] + 1
WHERE o2.[Order] IS NULL
GROUP BY o1.[Order]
HAVING COUNT(*) > 1
So your table is in the form of
Order Group Sequence
1 1 4
1 1 5
1 1 7
..and you want to find out that 1,1,6 is missing?
With
select
min(Sequence) MinSequence,
max(Sequence) MaxSequence
from
Orders
group by
[Order],
[Group]
you can find out the bounds for a given Order and Group.
Now you can simulate the correct data by using a special numbers table, which just contains every single number you could ever use for a sequence. Here is a good example of such a numbers table. It's not important how you create it, you could also create an excel file with all the numbers from x to y and import that excel sheet.
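One way to build such a numbers table directly in T-SQL is a small recursive CTE. This is just a sketch: the upper bound of 1000 is an arbitrary assumption and should be raised to cover the largest sequence value you expect.

```sql
-- Numbers table with a single column n, as assumed by the queries in this answer.
CREATE TABLE Numbers (n int PRIMARY KEY);

-- Generate 1..1000 recursively and store the result.
WITH seq AS (
  SELECT 1 AS n
  UNION ALL
  SELECT n + 1 FROM seq WHERE n < 1000
)
INSERT INTO Numbers (n)
SELECT n FROM seq
OPTION (MAXRECURSION 1000);  -- the default limit of 100 recursions would be too low here
```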
In my example I assume such a numbers table called "Numbers" with only one column "n":
select
[Order],
[Group],
n Sequence
from
(select [Order], [Group], min(Sequence) MinSequence, max(Sequence) MaxSequence from Orders group by [Order], [Group]) MinMaxSequence
left join Numbers on n >= MinSequence and n <= MaxSequence
Put that SQL into a new view. In my example I will call the view "vwCorrectOrders".
This gives you the data where the sequences are continuous. Now you can join that data with the original data to find out which sequences are missing:
select
co.*
from
vwCorrectOrders co
left join Orders o on
co.[Order] = o.[Order]
and co.[Group] = o.[Group]
and co.Sequence = o.Sequence
where
o.Sequence is null
Should give you
Order Group Sequence
1 1 6
After a while I came up with the following solution. It seems to work but it is highly inefficient. Please add any improvement suggestions.
SELECT OrdMain.Order
FROM ((Orders AS OrdMain
LEFT OUTER JOIN Orders AS OrdPrev ON (OrdPrev.Group = OrdMain.Group) AND (OrdPrev.Sequence = OrdMain.Sequence - 1))
LEFT OUTER JOIN Orders AS OrdNext ON (OrdNext.Group = OrdMain.Group) AND (OrdNext.Sequence = OrdMain.Sequence + 1))
WHERE ((OrdMain.Sequence < (SELECT MAX(Sequence) FROM Orders OrdMax WHERE (OrdMax.Group = OrdMain.Group))) AND (OrdNext.Order IS NULL)) OR
((OrdMain.Sequence > (SELECT MIN(Sequence) FROM Orders OrdMin WHERE (OrdMin.Group = OrdMain.Group))) AND (OrdPrev.Order IS NULL))
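As one possible improvement, on databases that support window functions the self-joins and correlated subqueries could be replaced by LAG. The sketch below is not equivalent to the query above: it returns only the row that follows each gap, not both neighbours. The table name order_seq and the columns OrderId and GroupId are renamed by me to avoid reserved words.

```sql
-- Hypothetical table holding the example data from the question.
CREATE TABLE order_seq (OrderId int, GroupId int, Sequence int);

INSERT INTO order_seq VALUES
  (1, 1, 3), (2, 1, 4), (3, 1, 5), (4, 1, 6),
  (5, 2, 3), (6, 2, 4), (7, 2, 6);

-- Compare each row with the previous Sequence in its group; a difference other than 1 is a gap.
SELECT OrderId, GroupId, Sequence
FROM (
  SELECT OrderId, GroupId, Sequence,
         LAG(Sequence) OVER (PARTITION BY GroupId ORDER BY Sequence) AS PrevSequence
  FROM order_seq
) s
WHERE PrevSequence IS NOT NULL
  AND Sequence <> PrevSequence + 1;
-- returns the row (7, 2, 6): in group 2, Sequence 6 follows 4
```

To list every order of an affected group, as in the expected result of the question, the GroupId values found this way could be joined back to the table.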