SQL query to retrieve discrepancies in punch order - sql

Consider the table below.
The rule is: an employee cannot go on BREAK from job num 1 and then clock in to job num 2; they need to clock OUT of job num 1 first. In this case employee "A" was supposed to clock OUT instead of BREAK on JobNum 1, because he later clocked in to JobNum 2.
Is it possible to write a query to find this in plain SQL?

The idea is to check whether the next record is a proper one. To find the next record, find the earliest punchtime after the current one for the same employee. Once you have that time, you can isolate the record itself and check the fields of interest: specifically, whether jobnum is the same and [optionally] whether punch_type is 'IN'. If no such record exists, not exists evaluates to true and the break is output.
select *
from #punch p
-- Isolate breaks only
where p.punch_type = 'BREAK'
-- The ones having no proper entry
and not exists
(
select null
-- The same table
from #punch a
where a.emplid = p.emplid
and a.jobnum = p.jobnum
-- Next record has punchtime from subquery
and a.punchtime = (select min (n.punchtime)
from #punch n
where n.emplid = p.emplid
and n.punchtime > p.punchtime
)
-- Optionally you might force next record to be 'IN'
and a.punch_type = 'IN'
)
Replace #punch with your table name. -- starts a comment in SQL Server; if your database does not accept it, remove those lines. It is a good idea to tag your database and version, as there are probably faster/better ways to do this.
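To try the NOT EXISTS approach without a SQL Server instance, here is a minimal sketch using Python's built-in sqlite3 module. The table name punch and the sample rows are assumptions made up for illustration; only the column names come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE punch (emplid TEXT, jobnum INTEGER, punch_type TEXT, punchtime TEXT);
-- Made-up sample data: A breaks on job 1 and then clocks in to job 2.
INSERT INTO punch VALUES
  ('A', 1, 'IN',    '2024-01-01 08:00'),
  ('A', 1, 'BREAK', '2024-01-01 10:00'),
  ('A', 2, 'IN',    '2024-01-01 10:30'),
  ('A', 2, 'OUT',   '2024-01-01 17:00'),
  ('B', 1, 'IN',    '2024-01-01 08:00'),
  ('B', 1, 'BREAK', '2024-01-01 12:00'),
  ('B', 1, 'IN',    '2024-01-01 12:30'),
  ('B', 1, 'OUT',   '2024-01-01 17:00');
""")

bad_breaks = conn.execute("""
SELECT p.emplid, p.jobnum, p.punchtime
FROM punch p
WHERE p.punch_type = 'BREAK'
  AND NOT EXISTS (
    SELECT NULL
    FROM punch a
    WHERE a.emplid = p.emplid
      AND a.jobnum = p.jobnum
      AND a.punch_type = 'IN'
      -- the next punch for the same employee
      AND a.punchtime = (SELECT MIN(n.punchtime)
                         FROM punch n
                         WHERE n.emplid = p.emplid
                           AND n.punchtime > p.punchtime)
  )
""").fetchall()

print(bad_breaks)
```

Note that a BREAK that is the employee's very last punch is also reported, because there is no next record to match; whether that is desirable depends on your data.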

Here is the SQL
select * from employees e1 cross join employees e2 where e2.JOBNUM = (e1.JOBNUM + 1)
and e1.PUNCH_TYPE = 'BREAK' and e2.PUNCH_TYPE = 'IN'
and e1.PUNCHTIME < e2.PUNCHTIME
and e1.EMPLID = e2.EMPLID

Related

Is there a way to exclude SQL results that are ALMOST duplicates?

I have a query that runs daily that shows old and new member addresses as they are updated. The query works fine except for the times when a USPS address match is done in our core system and just changes some of the abbreviations.
For example:
Old Address - 1234 East Main Street
New Address - 1234 E Main St
I don't need to see these results.
I have tried removing based on unique fields in the core, however, the USPS match process creates all new fields so the query can't remove based on that information.
The main SP for this is:
INSERT INTO #results
SELECT
distinct i.INDIVIDUAL_ID,
i.FIRST_NAME,
i.MIDDLE_NAME,
i.LAST_NAME,
i.D1NAME,
CurrentAddress.ADDRESS1,
PreviousAddress.ADDRESS1,
CurrentAddress.ADDRESS2,
PreviousAddress.ADDRESS2,
CurrentAddress.ADDRESS3,
PreviousAddress.ADDRESS3,
CurrentAddress.CITY,
PreviousAddress.CITY,
CurrentAddress.STATE,
PreviousAddress.STATE,
CurrentAddress.ZIP_STR,
PreviousAddress.ZIP_STR,
CurrentAddress.ZIP4_STR,
PreviousAddress.ZIP4_STR,
CurrentAddress.COUNTRY,
PreviousAddress.COUNTRY
FROM INDIVIDUAL i
INNER JOIN MEMBERSHIPPARTICIPANT mpt
ON i.INDIVIDUAL_ID = mpt.INDIVIDUAL_ID
AND i.DL_LOAD_DATE = mpt.DL_LOAD_DATE
INNER JOIN AGR_MEMBERTOTAL_TODAY m
ON mpt.MEMBER_NBR = m.MEMBER_NBR
AND mpt.DL_LOAD_DATE = m.DL_LOAD_DATE
INNER JOIN BRANCH b
ON i.BRANCH_NBR = b.BRANCH_NBR
CROSS APPLY dbo.GetCurrentAddress(i.INDIVIDUAL_ID, @latestDate) AS CurrentAddress
CROSS APPLY dbo.GetCurrentAddress(i.INDIVIDUAL_ID, @previousDate) AS PreviousAddress
WHERE i.DL_LOAD_DATE = @latestDate
AND ( m.OPN_LN_ALL_CNT > 0 OR m.OPN_SV_ALL_CNT > 0 )
order by i.FIRST_NAME asc
DELETE #results
WHERE Address1_Today = Address2_Yesterday
AND Address2_Today = Address1_Yesterday
SELECT *
FROM #results
WHERE (Address1_Today != Address1_Yesterday
OR Address2_Today != Address2_Yesterday
OR Address3_Today != Address3_Yesterday
OR City_Today != City_Yesterday
OR State_Today != State_Yesterday
OR ZipCode_Today != ZipCode_Yesterday
--OR FullZip_Today != FullZip_Yesterday
OR Country_Today != Country_Yesterday)
I'd like to remove the almost duplicate rows
For example:
Old Address - 1234 East Main Street
New Address - 1234 E Main St
There isn't a built-in way to test for this in SQL; the matching has to be defined by your own logic in a procedure. The first thing I'd do is count the space-separated substrings in both Old Address and New Address. For rows where the two counts are equal, split each address on spaces and compare the parts. Think of each address field as three parts [street_nbr, street_nm, street_suffix]. The street_nm can carry an abbreviated prefix (such as E for East), which pushes the count past 3, and that is why counting the substrings matters. Secondary lookup tables that map full words to the abbreviations you identify can then be used to "un-duplicate" those suffixes and prefixes.
CREATE TABLE lookup_abbreviations(
unabbreviated_name varchar(50),
abbreviated_name varchar(50));
INSERT INTO lookup_abbreviations(unabbreviated_name, abbreviated_name)
VALUES ('East', 'E');
INSERT INTO lookup_abbreviations(unabbreviated_name, abbreviated_name)
VALUES ('Street', 'St');
-- Use Cross Applies and functions(LEN, LEFT, RIGHT, CHARINDEX, SUBSTRING) to split the address
-- into equal parts. This is where you'll have to figure out the best logic for grouping.
SELECT DISTINCT
-- Everything before the first space; assumes the address contains at least one space
Old_Street_Nbr = LEFT(Old_Address, CHARINDEX(' ', Old_Address) - 1),
Old_Street_Nm_Prefix = CASE WHEN /*Here is where the count of substrings is tested*/ END,
Old_Street_Nm = CASE WHEN /*Here is where the count of substrings is tested*/ END,
Old_Street_Suffix = [] -- placeholder, to be filled in the same way
INTO #AbbreviationSort
FROM Results;
SELECT
Old_Street_Nbr ,
Old_Street_Nm_Prefix = CASE
WHEN Old_Street_Nm_Prefix IN (SELECT abbreviated_name from
lookup_abbreviations)
THEN (SELECT unabbreviated_name from
lookup_abbreviations WHERE abbreviated_name =
Old_Street_Nm_Prefix)
ELSE Old_Street_Nm_Prefix
END
INTO #SortedAddresses
FROM #AbbreviationSort
;
SELECT DISTINCT * FROM
(
SELECT Old_Street_Nbr, Old_Prefix FROM #SortedAddresses
UNION ALL
SELECT New_Street_Nbr, New_Prefix FROM #SortedAddresses
) AS DupSearch
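The lookup-table idea can be sketched outside SQL as well. The following is a minimal illustration in Python, assuming a small hand-made abbreviation map (the entries and the function names normalize_address and is_real_change are hypothetical, not part of the original procedure). Both addresses are expanded to their unabbreviated form before being compared.

```python
# Hypothetical lookup table: abbreviation -> unabbreviated word (extend as needed).
ABBREVIATIONS = {
    "e": "east", "w": "west", "n": "north", "s": "south",
    "st": "street", "ave": "avenue", "rd": "road",
}


def normalize_address(address: str) -> str:
    """Lower-case, drop periods, and expand known abbreviations token by token."""
    tokens = address.lower().replace(".", "").split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def is_real_change(old: str, new: str) -> bool:
    """True when the addresses still differ after normalization."""
    return normalize_address(old) != normalize_address(new)


print(is_real_change("1234 East Main Street", "1234 E Main St"))
print(is_real_change("1234 East Main Street", "99 Oak Ave"))
```

A plain word map like this is lossy: "St" can also mean "Saint", for example, so position-aware rules (prefix vs. suffix), as described above, would be needed for production use.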

Having trouble converting strings to numbers

I am having an issue with casting or converting my varchar2 value to a number. The problem is that my TO_EMPLOYEE column (in PROJ_NOTIFY_HIST) contains both email addresses and employee IDs.
SELECT NOTIFY.PROJ_ID
FROM PROJ_NOTIFY_HIST NOTIFY
WHERE NOTIFY.NOTIFIED_SENT = 0
AND CAST(NOTIFY.TO_EMPLOYEE AS NUMBER) NOT IN (SELECT EMPLOYEE_ID
FROM V_ACTIVE_EMPLOYEE_INFO);
Is there any way to get employee IDs only, and compare them to my sub-query?
You could test each to_employee value to see if it consists of only numeric characters, and only then try to convert it to a number and compare with your view. In recent versions of Oracle you can just use to_number() with its default ... on conversion error clause to prevent the ORA-01722, or use validate_conversion() to do the test.
But that will only tell you if the value is numeric, not if it is an employee ID which you can then check to see if it's active. You could have other numbers in there that aren't supposed to be IDs, like phone numbers for example.
If you have a table that contains all employee IDs, not just the active ones, then you could cast those the other way, from numbers to strings, to find matches; e.g. if you have an employee_info table that holds all active and inactive employees, something like:
select notify.proj_id
from proj_notify_hist notify
join employee_info e
on notify.to_employee = cast(e.employee_id as varchar2(10)) -- use suitable size
where notify.notified_sent = 0
and e.employee_id not in (select employee_id
from v_active_employee_info);
or
select notify.proj_id
from proj_notify_hist notify
join employee_info e
on notify.to_employee = cast(e.employee_id as varchar2(10)) -- use suitable size
where notify.notified_sent = 0
and not exists (select null
from v_active_employee_info active
where active.employee_id = e.employee_id);
If v_active_employee_info is a view over the employee_info table then you could skip that not in or exists look-up and instead directly test for whatever conditions the view is using to filter active employees.
I suspect you want something like this:
SELECT NOTIFY.PROJ_ID
FROM PROJ_NOTIFY_HIST NOTIFY
WHERE NOTIFY.NOTIFIED_SENT = 0 AND
NOT EXISTS (SELECT 1
FROM V_ACTIVE_EMPLOYEE_INFO A
WHERE TO_CHAR(A.EMPLOYEE_ID) = NOTIFY.TO_EMPLOYEE
);
The main thing? Cast to a string not a number.
If you only want to limit this to values that look like a number, then:
SELECT NOTIFY.PROJ_ID
FROM PROJ_NOTIFY_HIST NOTIFY
WHERE NOTIFY.NOTIFIED_SENT = 0 AND
REGEXP_LIKE(NOTIFY.TO_EMPLOYEE, '^[0-9]+$') AND
NOT EXISTS (SELECT 1
FROM V_ACTIVE_EMPLOYEE_INFO A
WHERE TO_CHAR(A.EMPLOYEE_ID) = NOTIFY.TO_EMPLOYEE
);
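The "compare as strings" approach can be tried with a small sketch using Python's sqlite3 module. This is not Oracle: the sample rows are made up, v_active_employee_info is a plain table standing in for the view, CAST(... AS TEXT) replaces TO_CHAR, and SQLite's GLOB replaces REGEXP_LIKE for the digits-only test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE proj_notify_hist (proj_id INTEGER, to_employee TEXT, notified_sent INTEGER);
CREATE TABLE v_active_employee_info (employee_id INTEGER);  -- stand-in for the view
INSERT INTO proj_notify_hist VALUES
  (1, '1001', 0),                  -- active employee
  (2, '1002', 0),                  -- numeric, but not active
  (3, 'someone@example.com', 0),   -- e-mail address, not an ID
  (4, '1003', 1);                  -- already notified
INSERT INTO v_active_employee_info VALUES (1001);
""")

rows = conn.execute("""
SELECT n.proj_id
FROM proj_notify_hist n
WHERE n.notified_sent = 0
  AND n.to_employee <> ''
  AND n.to_employee NOT GLOB '*[^0-9]*'   -- digits only
  AND NOT EXISTS (SELECT 1
                  FROM v_active_employee_info a
                  WHERE CAST(a.employee_id AS TEXT) = n.to_employee)
ORDER BY n.proj_id
""").fetchall()

print(rows)
```

Because the number is converted to a string rather than the other way round, the e-mail addresses never reach a numeric conversion and cannot raise a conversion error.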
Use a CASE statement to check whether you should convert the value (the field consists only of digits) or not.
CASE [ expression ]
WHEN condition_1 THEN result_1
ELSE result
END

SQL Filtering duplicate rows due to bad ETL

The database is Postgres but any SQL logic should help.
I am retrieving the set of sales quotations that contain a given product within the bill of materials. I'm doing that in two steps: step 1, retrieve all DISTINCT quote numbers which contain a given product (by product number).
The second step, retrieve the full quote, with all products listed for each unique quote number.
So far, so good. Now the tough bit. Some rows are duplicates, some are not. Those that are duplicates (quote number & quote version & line number) might or might not have maintenance on them. I want to pick the row that has maintenance greater than 0. The duplicate rows I want to exclude are those that have a 0 maintenance. The problem is that some rows, which have no duplicates, have 0 maintenance, so I can't just filter on maintenance.
To make this exciting, the database holds quotes over 20+ years. And the data scientists guys have just admitted that maybe the ETL process has some bugs...
--- step 0
--- cleanup the workspace
SET CLIENT_ENCODING TO 'UTF8';
DROP TABLE IF EXISTS product_quotes;
--- step 1
--- get list of Product Quotes
CREATE TEMPORARY TABLE product_quotes AS (
SELECT DISTINCT master_quote_number
FROM w_quote_line_d
WHERE item_number IN ( << model numbers >> )
);
--- step 2
--- Now join on that list
SELECT
d.quote_line_number,
d.item_number,
d.item_description,
d.item_quantity,
d.unit_of_measure,
f.ref_list_price_amount,
f.quote_amount_entered,
f.negtd_discount,
--- need to calculate discount rate based on list price and negtd discount (%)
CASE
WHEN ref_list_price_amount > 0
THEN 100 - (ref_list_price_amount + negtd_discount) / ref_list_price_amount *100
ELSE 0
END AS discount_percent,
f.warranty_months,
f.master_quote_number,
f.quote_version_number,
f.maintenance_months,
f.territory_wid,
f.district_wid,
f.sales_rep_wid,
f.sales_organization_wid,
f.install_at_customer_wid,
f.ship_to_customer_wid,
f.bill_to_customer_wid,
f.sold_to_customer_wid,
d.net_value,
d.deal_score,
f.transaction_date,
f.reporting_date
FROM w_quote_line_d d
INNER JOIN product_quotes pq ON (pq.master_quote_number = d.master_quote_number)
INNER JOIN w_quote_f f ON
(f.quote_line_number = d.quote_line_number
AND f.master_quote_number = d.master_quote_number
AND f.quote_version_number = d.quote_version_number)
WHERE d.net_value >= 0 AND item_quantity > 0
ORDER BY f.master_quote_number, f.quote_version_number, d.quote_line_number
The logic to filter the duplicate rows is like this:
For each master_quote_number / version_number pair, check to see if there are duplicate line numbers. If so, pick the one with maintenance > 0.
Even in a CASE statement, I'm not sure how to write that.
Thoughts? The database is Postgres but any SQL logic should help.
I think you will want to use Window Functions. They are, in a word, awesome.
Here is a query that would "dedupe" based on your criteria:
select *
from (
select
* -- simplifying here to show the important parts
,row_number() over (
-- duplicates are defined per quote / version / line number
partition by f.master_quote_number, f.quote_version_number, f.quote_line_number
order by f.maintenance_months desc) as seqnum
from w_quote_line_d d
inner join product_quotes pq
on (pq.master_quote_number = d.master_quote_number)
inner join w_quote_f f
on (f.quote_line_number = d.quote_line_number
and f.master_quote_number = d.master_quote_number
and f.quote_version_number = d.quote_version_number)
) x
where seqnum = 1
The use of row_number() with this partition by and order by guarantees that only ONE row for each combination of quote number/version number/line number gets the value 1, and it will be the one with the highest value in maintenance_months (if your colleagues are right, there would only be one with a value > 0 anyway). Lines without duplicates form a partition of one row and are kept even when their maintenance is 0.
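Here is a small, self-contained sketch of the same dedupe, run through Python's sqlite3 module (window functions need SQLite 3.25 or later). The single table quote_lines and its rows are made up for illustration; the real query joins w_quote_line_d and w_quote_f as shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE quote_lines (
  master_quote_number TEXT, quote_version_number INTEGER,
  quote_line_number INTEGER, item_number TEXT, maintenance_months INTEGER);
INSERT INTO quote_lines VALUES
  ('Q1', 1, 1, 'A', 0),    -- duplicate with no maintenance: drop
  ('Q1', 1, 1, 'A', 12),   -- duplicate with maintenance: keep
  ('Q1', 1, 2, 'B', 0),    -- no duplicate, zero maintenance: keep
  ('Q2', 1, 1, 'A', 0),    -- both duplicates have zero maintenance:
  ('Q2', 1, 1, 'A', 0);    -- keep exactly one of them
""")

rows = conn.execute("""
SELECT master_quote_number, quote_version_number,
       quote_line_number, maintenance_months
FROM (
  SELECT q.*,
         ROW_NUMBER() OVER (
           PARTITION BY master_quote_number, quote_version_number, quote_line_number
           ORDER BY maintenance_months DESC) AS seqnum
  FROM quote_lines q
) x
WHERE seqnum = 1
ORDER BY master_quote_number, quote_version_number, quote_line_number
""").fetchall()

print(rows)
```

The partition here includes the line number, so each quote line is deduplicated separately rather than collapsing a whole quote version to one row.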
Can you do something like...
select
*
from
w_quote_line_d d
inner join
(
select
...
,max(maintenance)
from
w_quote_line_d
group by
...
) d1
on
d1.id = d.id
and d1.maintenance = d.maintenance;
Am I understanding your problem correctly?
Edit: Forgot the group by!
I'm not sure, but maybe you could Group By all other columns and use MAX(Maintenance) to get only the greatest.
What do you think?

SQL Statement to select row where previous row status = 'C' AS400

This is being run on sql for IBMI Series 7
I have a table which stores info about orders. Each row has an order number (ON), part number (PN), and sequence number (SEQ). Each ON has multiple PNs linked to it, and each part number has multiple SEQ numbers. Each sequence number represents the order in which to do work on the part. Elsewhere in the system, once the part is at a location and ready to be worked on, a flag is shown. What I want is a list of orders for a location that have not yet arrived but have been closed out at the previous location (which means the part is on its way).
I have a query listed below that I believe should work but I get the following error: "The column qualifier or table t undefined". Where is my issue at?
Select * From (SELECT M2ON as Order__Number , M2SEQ as Sequence__Number,
M2PN as Product__Number,ML2OQ as Order__Quantity
FROM M2P
WHERE M2pN in (select R1PN FROM R1P WHERE (RTWC = '7411') AND (R1SEQ = M2SEQ)
)
AND M2ON IN (SELECT M1ON FROM M1P WHERE ML1RCF = '')
ORDER BY ML2OSM ASC) as T
WHERE
T.Order__Number in (Select t3.m2on from (SELECT *
FROM(Select * from m2p
where m2on = t.Order__Number and m2pn = t.Product__Number
order by m2seq asc fetch first 2 rows only
)as t1 order by m2seq asc fetch first row only
) as t3 where t3.m2stat = 'C')
EDIT- Answer for anyone else with this issue
Clutton's answer worked with a slight modification, so thank you to him for the fast response! I had to name my outer table and reference that name in the subquery, otherwise the AS400 would kick back and tell me it couldn't find the columns. I also had to order by the sequence number descending so that I grabbed the highest record below the current one (otherwise, for example, if my sequence number was 20 it could grab 5 even though 10 was available and should be shown first). Here is the subquery I now use. Please note the actual query names m2p as T1.
IFNULL((
SELECT
M2STAT
FROM
M2P as M2P_1
WHERE
M2ON = T1.M2ON
AND M2SEQ < T1.M2SEQ
AND M2PN IN (select R1PN FROM R1P WHERE (RTWC = #WC) AND (R1SEQ = T1.M2SEQ))
ORDER BY M2SEQ DESC
FETCH FIRST ROW ONLY
), 'NULL') as PRIOR_M2STAT
Just reading your question, it looks like something I do frequently to emulate RPG READPE op codes. Is the key to M2P Order/Seq? If so, here is a basic piece that may help you build out the rest of the query.
I am assuming that you are trying to get the prior record by key using SQL. In RPG this would be like doing a READPE on the key for a file with Order/Seq key.
Here is an example using a subquery to get the status field of the prior record.
SELECT
M2ON, M2PN, M2OQ, M2STAT,
IFNULL((
SELECT
M2STAT
FROM
M2P as M2P_1
WHERE
M2P_1.M2ON = M2ON
AND M2P_1.M2SEQ < M2SEQ
FETCH FIRST ROW ONLY
), '') as PRIOR_M2STAT
FROM
M2P
Note that this wraps the subquery in an IFNULL to handle the case where it is the first sequence number and no prior sequence exists.
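The "prior record" lookup, including the two adjustments mentioned in the edit above (an alias on the outer table and a descending order on the sequence number), can be sketched with Python's sqlite3 module. The sample rows are made up, the extra match on part number is an assumption about the key, and SQLite's LIMIT 1 stands in for DB2's FETCH FIRST ROW ONLY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE m2p (m2on TEXT, m2pn TEXT, m2seq INTEGER, m2stat TEXT);
INSERT INTO m2p VALUES
  ('ORD1', 'P1', 5,  ''),
  ('ORD1', 'P1', 10, 'C'),
  ('ORD1', 'P1', 20, ''),
  ('ORD2', 'P9', 10, ''),
  ('ORD2', 'P9', 20, '');
""")

rows = conn.execute("""
SELECT t1.m2on, t1.m2seq,
       IFNULL((SELECT p.m2stat
               FROM m2p AS p
               WHERE p.m2on = t1.m2on
                 AND p.m2pn = t1.m2pn
                 AND p.m2seq < t1.m2seq
               ORDER BY p.m2seq DESC   -- nearest lower sequence first
               LIMIT 1), '') AS prior_m2stat
FROM m2p AS t1
ORDER BY t1.m2on, t1.m2seq
""").fetchall()

# Rows whose immediately preceding step has been closed out.
ready = [(order, seq) for order, seq, prior in rows if prior == 'C']
print(ready)
```

Without the ORDER BY ... DESC, sequence 20 of ORD1 could be matched against sequence 5 instead of sequence 10 and would be missed.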

Return overlapping date records in SQL

I used the following query to fetch the overlapping records in SQL:
SELECT QUOTE_ID,FUNCTION_ID,FUNCTION_DT,FUNC_SPACE_ID,FN_START_TIME,FN_END_TIME,DATE_AUTH_LEVEL
FROM R_13_ALL_RESERVED A
WHERE
A.FUNC_SPACE_ID = '401-ZFU-52'
AND A.FUNCTION_DT = TO_DATE('09/03/2015','MM/DD/YYYY')
AND EXISTS ( SELECT 'X'
FROM R_13_ALL_RESERVED B
WHERE A.PROPERTY = B.PROPERTY
AND A.FUNCTION_DT = B.FUNCTION_DT
AND A.FUNCTION_ID <> B.FUNCTION_ID
AND ( ( A.FN_START_TIME > B.FN_START_TIME
AND A.FN_START_TIME < B.FN_END_TIME)
OR ( B.FN_START_TIME > A.FN_START_TIME
AND B.FN_START_TIME < A.FN_END_TIME)
OR ( A.FN_START_TIME = B.FN_START_TIME
AND A.FN_END_TIME = B.FN_END_TIME)
)
)
But even though the dates are not overlapping, it still returns the records as overlapping.
Am I missing something here?
Also, if the date records overlap, I need to compare the count of overlapping function_id records with DATE_AUTH_LEVEL: if 2 function_id records overlap, so the count of function_id is 2, and DATE_AUTH_LEVEL is 1, such a record should be in the result set.
Please find the data set in SQLFiddle
http://sqlfiddle.com/#!9/95874/1
Desired Output: The SQL should return overlapping FN_START_TIME and FN_END_TIME for a function_space_id and its function_dt.
In the provided example, rows 5 and 6 overlap for the function space id '401-ZFU-12' and function_dt 'August, 15 2015', and all others are not overlapping.
The simplest predicate (where clause condition) for detecting the overlap of two ranges is to compare the start of the first range with the end of the 2nd range, and the start of the 2nd range with the end of the first range:
WHERE R1.Start_Date <= R2.End_Date
AND R2.Start_Date <= R1.End_Date
As you can see, each of the two inequalities looks at a start and an end value from separate records (R1 and R2, and then R2 and R1 respectively). All that remains is to add the conditions that correlate the records and ensure that you aren't comparing a row to itself. So if you want to find all Common_IDs that have Distinct_IDs with overlapping date ranges:
select *
from Your_Table R1
where exists (select 1 from Your_Table R2
where R1.Common_ID = R2.Common_ID
and R1.Distinct_ID <> R2.Distinct_ID
and R1.Start_Date <= R2.End_Date
and R2.Start_Date <= R1.End_Date)
If there is no Distinct_ID to use, you can use R1.rowid <> R2.rowid in place of R1.Distinct_ID <> R2.Distinct_ID
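Applied to a reservations-style table, the two-inequality test can be sketched as follows with Python's sqlite3 module. The table name reserved, the room IDs, and the rows are made up; the times are zero-padded HH:MM strings so that plain string comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE reserved (function_id INTEGER, func_space_id TEXT, function_dt TEXT,
                       fn_start_time TEXT, fn_end_time TEXT);
INSERT INTO reserved VALUES
  (1, 'ROOM-1', '2015-08-15', '09:00', '11:00'),
  (2, 'ROOM-1', '2015-08-15', '10:00', '12:00'),  -- overlaps function 1
  (3, 'ROOM-1', '2015-08-15', '13:00', '14:00'),  -- same room, no overlap
  (4, 'ROOM-2', '2015-08-15', '09:30', '10:30');  -- overlapping time, other room
""")

rows = conn.execute("""
SELECT r1.function_id
FROM reserved r1
WHERE EXISTS (SELECT 1
              FROM reserved r2
              WHERE r1.func_space_id = r2.func_space_id   -- same room
                AND r1.function_dt   = r2.function_dt     -- same day
                AND r1.function_id  <> r2.function_id     -- not the row itself
                AND r1.fn_start_time <= r2.fn_end_time
                AND r2.fn_start_time <= r1.fn_end_time)
ORDER BY r1.function_id
""").fetchall()

print(rows)
```

With <=, a booking that starts exactly when another one ends counts as overlapping; use < on both lines if back-to-back bookings should be allowed.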
Here is an approach to troubleshooting the issue on your end.
My first suspicion is that the results of your exists clause are too broad and thus returning rows for every record matching in the outer clause unexpectedly. Likely there are rows that do not fall on the desired date or spaceid that share one component of their interval with your inner criteria.
Inspect the results of the inner select statement (the one within the exists clause) for an example row: replace all the 'A'-aliased values with the actual values from one of the returned rows that you did not expect to receive.
Additionally, you can inspect what I think would be a semi join in the execution profile to see what the join criteria are. If you expect it to be filtered by a constant for 'FUNC_SPACE_ID' of '401-ZFU-52', you will discover that it is not.