Having trouble converting strings to numbers - sql

I am having an issue with casting or converting my varchar2 value to a number. The problem is that my TO_EMPLOYEE column (in PROJ_NOTIFY_HIST) contains both email addresses and employee IDs.
SELECT NOTIFY.PROJ_ID
FROM PROJ_NOTIFY_HIST NOTIFY
WHERE NOTIFY.NOTIFIED_SENT = 0
AND CAST(NOTIFY.TO_EMPLOYEE AS NUMBER) NOT IN (SELECT EMPLOYEE_ID
FROM V_ACTIVE_EMPLOYEE_INFO);
Is there any way to get employee IDs only, and compare them to my sub-query?

You could test each to_employee value to see if it consists of only numeric characters, and only then try to convert it to a number and compare with your view. In recent versions of Oracle (12.2 and later) you can just use to_number() with its default ... on conversion error clause to prevent the ORA-01722, or use validate_conversion() to do the test.
But that will only tell you if the value is numeric, not if it is an employee ID which you can then check to see if it's active. You could have other numbers in there that aren't supposed to be IDs, like phone numbers for example.
If you have a table that contains all employee IDs, not just the active ones, then you could cast those the other way, from numbers to strings, to find matches; e.g. if you have an employee_info table that holds all active and inactive employees, something like:
select notify.proj_id
from proj_notify_hist notify
join employee_info e
on notify.to_employee = cast(e.employee_id as varchar2(10)) -- use suitable size
where notify.notified_sent = 0
and e.employee_id not in (select employee_id
from v_active_employee_info);
or
select notify.proj_id
from proj_notify_hist notify
join employee_info e
on notify.to_employee = cast(e.employee_id as varchar2(10)) -- use suitable size
where notify.notified_sent = 0
and not exists (select null
from v_active_employee_info active
where active.employee_id = e.employee_id);
If v_active_employee_info is a view over the employee_info table then you could skip that not in or exists look-up and instead directly test for whatever conditions the view is using to filter active employees.

I suspect you want something like this:
SELECT NOTIFY.PROJ_ID
FROM PROJ_NOTIFY_HIST NOTIFY
WHERE NOTIFY.NOTIFIED_SENT = 0 AND
NOT EXISTS (SELECT 1
FROM V_ACTIVE_EMPLOYEE_INFO A
WHERE TO_CHAR(A.EMPLOYEE_ID) = NOTIFY.TO_EMPLOYEE
);
The main thing? Cast to a string not a number.
If you only want to limit this to values that look like a number, then:
SELECT NOTIFY.PROJ_ID
FROM PROJ_NOTIFY_HIST NOTIFY
WHERE NOTIFY.NOTIFIED_SENT = 0 AND
REGEXP_LIKE(NOTIFY.TO_EMPLOYEE, '^[0-9]+$') AND
NOT EXISTS (SELECT 1
FROM V_ACTIVE_EMPLOYEE_INFO A
WHERE TO_CHAR(A.EMPLOYEE_ID) = NOTIFY.TO_EMPLOYEE
);
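As a rough, runnable illustration of the "compare as strings" idea, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Oracle. The sample rows are invented, CAST(... AS TEXT) plays the role of TO_CHAR, and the GLOB test replaces REGEXP_LIKE; none of this is Oracle syntax.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the Oracle schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE proj_notify_hist (proj_id INTEGER, to_employee TEXT, notified_sent INTEGER);
CREATE TABLE v_active_employee_info (employee_id INTEGER);
-- Hypothetical sample data: two numeric IDs, one email address, one already-sent row.
INSERT INTO proj_notify_hist VALUES
  (1, '100', 0), (2, 'someone@example.com', 0), (3, '200', 0), (4, '300', 1);
INSERT INTO v_active_employee_info VALUES (100);
""")

query = """
SELECT n.proj_id
FROM proj_notify_hist n
WHERE n.notified_sent = 0
  AND n.to_employee <> ''
  AND n.to_employee NOT GLOB '*[^0-9]*'   -- keep values made of digits only
  AND NOT EXISTS (SELECT 1
                  FROM v_active_employee_info a
                  WHERE CAST(a.employee_id AS TEXT) = n.to_employee)
ORDER BY n.proj_id
"""
inactive_project_ids = [row[0] for row in conn.execute(query)]
print(inactive_project_ids)  # [3]
```

The email row never reaches a numeric conversion, because the number is converted to text rather than the other way round.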

Use a CASE statement to check whether you should convert it (i.e. whether the field consists only of digits) or not.
CASE [ expression ]
WHEN condition_1 THEN result_1
ELSE result
END

Related

SELECT DISTINCT to return at most one row

Given the following db structure:
Regions:
id | name
1  | EU
2  | US
3  | SEA
Customers:
id | name  | region
1  | peter | 1
2  | henry | 1
3  | john  | 2
There is also a PL/pgSQL function in place, defined as sendShipment() which takes (among other things) a sender and a receiver customer ID.
There is a business constraint around this which requires us to verify that both sender and receiver sit in the same region, and we need to do this as part of sendShipment(). So from within this function, we need to query the customer table for both the sender and receiver ID and verify that their region IDs are identical. We will also need the region ID itself for further processing down the line.
So maybe something like this:
SELECT DISTINCT region FROM customers WHERE id IN (?, ?)
The problem with this is that the result will be either an array (if the customers are not within the same region) or a single value.
Is there a more elegant way of solving this constraint? I was thinking of SELECT INTO and using a temporary table, or I could SELECT COUNT(DISTINCT region) and then do another SELECT for the actual value if the count is less than 2, but I'd like to avoid the performance hit if possible.
This query should work:
WITH q AS (
SELECT
COUNT( * ) AS CountCustomers,
COUNT( DISTINCT c.region ) AS CountDistinctRegions,
MIN( c.region ) AS MinRegion
-- If MIN is not defined for the column type (e.g. UUID), use instead:
-- ( ARRAY_AGG( c.region ORDER BY c.region ) )[1] AS MinRegion
FROM
customers AS c
WHERE
c.id = $senderCustomerId
OR
c.id = $receiverCustomerId
)
SELECT
CASE WHEN q.CountCustomers = 2 AND q.CountDistinctRegions = 1 THEN 'OK' ELSE 'BAD' END AS "Status",
CASE WHEN q.CountDistinctRegions = 1 THEN q.MinRegion END AS SingleRegion
FROM
q
The above query will always return a single row with 2 columns: Status and SingleRegion.
SQL doesn't have a "SINGLE( col )" aggregate function (i.e. a function that is NULL unless the aggregation group has a single row), but we can abuse MIN (or MAX) with a CASE WHEN COUNT() in a CTE or derived-table as an equivalent operation.
Alternatively, windowing-functions could be used, but annoyingly they don't work in GROUP BY queries despite being so similar, argh.
Once again, this is the ISO SQL committee's fault, not PostgreSQL's.
As your Region column is UUID you cannot use it with MIN. A window function such as FIRST_VALUE( c.Region ) OVER ( ORDER BY c.Region ) cannot sit next to the aggregates in the same SELECT either, so take the first element of an ordered ARRAY_AGG instead, as in the commented-out line.
As for the columns:
The Status column is either 'OK' or 'BAD' based on those business-constraints you mentioned. You might want to change it to a bit column instead of a textual one, though.
The SingleRegion column will be NOT NULL (with a valid region) if CountDistinctRegions = 1 regardless of CountCustomers, but feel free to change that, just-in-case you still want that info.
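To see the "single value or NULL" idea in action, here is a minimal sketch using Python's built-in sqlite3 module with the sample customers from the question. It is an illustration under assumptions: the function name check_regions is made up, SQLite stands in for PostgreSQL, and the region counts as "single" only when exactly one distinct region is found.

```python
import sqlite3

# SQLite stand-in for the PostgreSQL table, loaded with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region INTEGER);
INSERT INTO customers VALUES (1, 'peter', 1), (2, 'henry', 1), (3, 'john', 2);
""")

def check_regions(sender_id, receiver_id):
    """Return (status, single_region); the aggregate query always yields one row."""
    return conn.execute(
        """
        SELECT
            CASE WHEN COUNT(*) = 2 AND COUNT(DISTINCT region) = 1
                 THEN 'OK' ELSE 'BAD' END,
            CASE WHEN COUNT(DISTINCT region) = 1 THEN MIN(region) END
        FROM customers
        WHERE id IN (?, ?)
        """,
        (sender_id, receiver_id),
    ).fetchone()

print(check_regions(1, 2))   # ('OK', 1)      same region
print(check_regions(1, 3))   # ('BAD', None)  different regions
print(check_regions(1, 99))  # ('BAD', 1)     receiver does not exist
```

The third call shows why the customer count is checked separately: a missing customer still leaves one distinct region behind.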
For anybody else who's interested in a simple solution, I finally came up with the (kind of obvious) way to do it:
SELECT
r.region
FROM
customers s
INNER JOIN customers r ON
s.region = r.region
WHERE s.id = 'sender_id' and r.id = 'receiver_id';
Huge credit to the author of the other answer to this question, who helped me out a lot on this and also posted a viable solution.
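For anyone who wants to try the self-join quickly, here is a minimal sketch with Python's built-in sqlite3 module and the sample rows from the question. The helper name shared_region is hypothetical and SQLite merely stands in for PostgreSQL; the point is that the join yields one row when both customers share a region and no row otherwise.

```python
import sqlite3

# SQLite stand-in for the PostgreSQL table, loaded with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region INTEGER);
INSERT INTO customers VALUES (1, 'peter', 1), (2, 'henry', 1), (3, 'john', 2);
""")

def shared_region(sender_id, receiver_id):
    """Return the common region ID, or None if the regions differ or an ID is unknown."""
    row = conn.execute(
        """
        SELECT r.region
        FROM customers s
        JOIN customers r ON s.region = r.region
        WHERE s.id = ? AND r.id = ?
        """,
        (sender_id, receiver_id),
    ).fetchone()
    return row[0] if row else None

print(shared_region(1, 2))  # 1
print(shared_region(1, 3))  # None
```

Because id is the primary key, each side of the join matches at most one row, so the result can never contain more than one row.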

SQL - Returning fields based on where clause then joining same table to return max value?

I have a table named Ticket Numbers, which (for this example) contains the columns:
Ticket_Number
Assigned_Group
Assigned_Group_Sequence_No
Reported_Date
Each ticket number could contain 4 rows, depending on how many times the ticket changed assigned groups. Some of these rows could contain an assigned group of "Desktop Support," but some may not. Here is an example:
Example of raw data
What I am trying to accomplish is to get an output that contains any ticket numbers that contain 'Desktop Support', but also the assigned group of the max sequence number. Here is what I am trying to accomplish with SQL:
Queried Data
I'm trying to use SQL with the following query but have no clue what I'm doing wrong:
select ih.incident_number,ih.assigned_group, incident_history2.maxseq, incident_history2.assigned_group
from incident_history_public as ih
left join
(
select max(assigned_group_seq_no) maxseq, incident_number, assigned_group
from incident_history_public
group by incident_number, assigned_group
) incident_history2
on ih.incident_number = incident_history2.incident_number
and ih.assigned_group_seq_no = incident_history2.maxseq
where ih.ASSIGNED_GROUP LIKE '%DS%'
Does anyone know what I am doing wrong?
You might want to create a proper alias for incident_history. e.g.
from incident_history as incident_history1
and
on incident_history1.ticket_number = incident_history2.ticket_number
and incident_history1.assigned_group_seq_no = incident_history2.maxseq
In my humble opinion a first error could be that I don't see any column named "incident_history2.assigned_group".
I would try to use a common table expression to get only the ticket numbers that contain "Desktop Support":
WITH desktop as (
SELECT distinct Ticket_Number
FROM incident_history
WHERE Assigned_Group = 'Desktop Support' -- single quotes: double quotes denote identifiers
),
Then an inner join of that result with your inner query gives the ticket number and maxseq, so that in a second step you can also get the "MAX_Group":
tmp AS ( -- second CTE, continuing the WITH clause started above
SELECT i2.Ticket_Number, i2.maxseq
FROM desktop D inner join
(SELECT Ticket_number, max(assigned_group_seq_no) as maxseq
FROM incident_history
GROUP BY ticket_number) as i2
ON D.Ticket_Number = i2.Ticket_Number
)
SELECT i.Ticket_Number, i.Assigned_Group as MAX_Group, T.maxseq, i.Reported_Date
FROM tmp T inner join incident_history i
ON T.Ticket_Number = i.Ticket_Number and i.assigned_group_seq_no = T.maxseq
I think there are several different ways to solve this, but I really hope this is helpful for you!
For more information about Common Table Expression: https://www.essentialsql.com/introduction-common-table-expressions-ctes/
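Put together as a single statement, the two CTEs can be tried out with a minimal sketch like the one below, which uses Python's built-in sqlite3 module. The ticket rows are invented for illustration (the original example data was only posted as an image), and last_seq is a hypothetical name for the "max sequence per ticket" step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE incident_history (
    ticket_number TEXT, assigned_group TEXT, assigned_group_seq_no INTEGER);
-- Hypothetical sample data: T1 and T3 touch Desktop Support, T2 never does.
INSERT INTO incident_history VALUES
  ('T1', 'Service Desk', 1), ('T1', 'Desktop Support', 2), ('T1', 'Network', 3),
  ('T2', 'Service Desk', 1), ('T2', 'Network', 2),
  ('T3', 'Desktop Support', 1);
""")

query = """
WITH desktop AS (      -- tickets that were ever assigned to Desktop Support
    SELECT DISTINCT ticket_number
    FROM incident_history
    WHERE assigned_group = 'Desktop Support'
),
last_seq AS (          -- highest sequence number per ticket
    SELECT ticket_number, MAX(assigned_group_seq_no) AS maxseq
    FROM incident_history
    GROUP BY ticket_number
)
SELECT i.ticket_number, i.assigned_group AS max_group, l.maxseq
FROM desktop d
JOIN last_seq l ON l.ticket_number = d.ticket_number
JOIN incident_history i
  ON i.ticket_number = l.ticket_number
 AND i.assigned_group_seq_no = l.maxseq
ORDER BY i.ticket_number
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('T1', 'Network', 3), ('T3', 'Desktop Support', 1)]
```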

Is there a way to exclude SQL results that are ALMOST duplicates?

I have a query that runs daily and shows old and new member addresses as they are updated. The query works fine except for the times when a USPS address match is done in our core system and just changes some of the abbreviations.
For example:
Old Address - 1234 East Main Street
New Address - 1234 E Main St
I don't need to see these results.
I have tried removing based on unique fields in the core, however, the USPS match process creates all new fields so the query can't remove based on that information.
The main SP for this is:
INSERT INTO #results
SELECT
distinct i.INDIVIDUAL_ID,
i.FIRST_NAME,
i.MIDDLE_NAME,
i.LAST_NAME,
i.D1NAME,
CurrentAddress.ADDRESS1,
PreviousAddress.ADDRESS1,
CurrentAddress.ADDRESS2,
PreviousAddress.ADDRESS2,
CurrentAddress.ADDRESS3,
PreviousAddress.ADDRESS3,
CurrentAddress.CITY,
PreviousAddress.CITY,
CurrentAddress.STATE,
PreviousAddress.STATE,
CurrentAddress.ZIP_STR,
PreviousAddress.ZIP_STR,
CurrentAddress.ZIP4_STR,
PreviousAddress.ZIP4_STR,
CurrentAddress.COUNTRY,
PreviousAddress.COUNTRY
FROM INDIVIDUAL i
INNER JOIN MEMBERSHIPPARTICIPANT mpt
ON i.INDIVIDUAL_ID = mpt.INDIVIDUAL_ID
AND i.DL_LOAD_DATE = mpt.DL_LOAD_DATE
INNER JOIN AGR_MEMBERTOTAL_TODAY m
ON mpt.MEMBER_NBR = m.MEMBER_NBR
AND mpt.DL_LOAD_DATE = m.DL_LOAD_DATE
INNER JOIN BRANCH b
ON i.BRANCH_NBR = b.BRANCH_NBR
CROSS APPLY dbo.GetCurrentAddress(i.INDIVIDUAL_ID, @latestDate) AS CurrentAddress
CROSS APPLY dbo.GetCurrentAddress(i.INDIVIDUAL_ID, @previousDate) AS PreviousAddress
WHERE i.DL_LOAD_DATE = @latestDate
AND ( m.OPN_LN_ALL_CNT > 0 OR m.OPN_SV_ALL_CNT > 0 )
order by i.FIRST_NAME asc
DELETE #results
WHERE Address1_Today = Address2_Yesterday
AND Address2_Today = Address1_Yesterday
SELECT *
FROM #results
WHERE (Address1_Today != Address1_Yesterday
OR Address2_Today != Address2_Yesterday
OR Address3_Today != Address3_Yesterday
OR City_Today != City_Yesterday
OR State_Today != State_Yesterday
OR ZipCode_Today != ZipCode_Yesterday
--OR FullZip_Today != FullZip_Yesterday
OR Country_Today != Country_Yesterday)
I'd like to remove the almost duplicate rows
For example:
Old Address - 1234 East Main Street
New Address - 1234 E Main St
There isn't a built in way to test via SQL, and it will have to be defined by logic via procedure. The first thing I'd do is group the substrings in both Old Address and New Address by count of those substrings. The ones where the counts equal each other at the row level, you can split by space and break up the address. Think of each address field as three parts [street_nbr, street_nm, street_suffix]. The street_nm can have an abbreviated prefix, which is why grouping the count of substrings is important thereby increasing the count past 3. Secondary lookup tables that match words/abbreviations that you identify can then be used to "un-duplicate" those suffixes and prefixes.
CREATE TABLE lookup_abbreviations(
unabbreviated_name varchar(50),
abbreviated_name varchar(50));
INSERT INTO lookup_abbreviations(unabbreviated_name, abbreviated_name)
VALUES ('East', 'E');
INSERT INTO lookup_abbreviations(unabbreviated_name, abbreviated_name)
VALUES ('Street', 'St');
-- Use Cross Applies and functions(LEN, LEFT, RIGHT, CHARINDEX, SUBSTRING) to split the address
-- into equal parts. This is where you'll have to figure out the best logic for grouping.
SELECT DISTINCT
-- Everything before the first space is treated as the street number
Old_Street_Nbr = LEFT(Old_Address, CHARINDEX(' ', Old_Address + ' ') - 1),
Old_Street_Nm_Prefix = CASE WHEN /*Here is where the count of substrings is tested*/ END,
Old_Street_Nm = CASE WHEN /*Here is where the count of substrings is tested*/ END,
Old_Street_Suffix = []
INTO #AbbreviationSort
FROM Results;
SELECT
Old_Street_Nbr ,
Old_Street_Nm_Prefix = CASE
WHEN Old_Street_Nm_Prefix IN (SELECT abbreviated_name from
lookup_abbreviations)
THEN (SELECT unabbreviated_name from
lookup_abbreviations WHERE abbreviated_name =
Old_Street_Nm_Prefix)
ELSE Old_Street_Nm_Prefix
END
INTO #SortedAddresses
FROM #AbbreviationSort
;
SELECT DISTINCT * FROM
(
SELECT Old_Street_Nbr, Old_Prefix FROM #SortedAddresses
UNION ALL
SELECT New_Street_Nbr, New_Prefix FROM #SortedAddresses
) AS DupSearch
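If part of the comparison can happen outside the database, the lookup-table idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the abbreviation list is a tiny invented sample (not the USPS list), and normalize_address and is_real_change are hypothetical helper names.

```python
# Tiny, made-up sample of word -> abbreviation pairs; a real list would be much longer.
ABBREVIATIONS = {
    "east": "e", "west": "w", "north": "n", "south": "s",
    "street": "st", "avenue": "ave",
}

def normalize_address(address):
    """Lower-case the address, drop periods, and abbreviate every known word."""
    tokens = address.lower().replace(".", "").split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)

def is_real_change(old_address, new_address):
    """True only if the addresses still differ after normalization."""
    return normalize_address(old_address) != normalize_address(new_address)

print(is_real_change("1234 East Main Street", "1234 E Main St"))  # False
print(is_real_change("1234 East Main Street", "99 Oak Avenue"))   # True
```

Normalizing both sides to the abbreviated form means neither address needs to be split into number, name and suffix first, at the cost of treating any occurrence of a listed word as abbreviable.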

How to group by more than one row value?

I am working with PostgreSQL and I can't figure out how to solve a problem. I have a model called Foobar. Some of its attributes are:
FOOBAR
check_in:datetime
qr_code:string
city_id:integer
In this table there is a lot of redundancy (qr_code is not unique) but that is not my problem right now. What I am trying to get are the foobars that have same qr_code and have been in a well known group of cities, that have checked in at different moments.
I got this by querying:
SELECT * FROM foobar AS a
WHERE a.city_id = 1
AND EXISTS (
SELECT * FROM foobar AS b
WHERE a.check_in < b.check_in
AND a.qr_code = b.qr_code
AND b.city_id = 2
AND EXISTS (
SELECT * FROM foobar as c
WHERE b.check_in < c.check_in
AND c.qr_code = b.qr_code
AND c.city_id = 3
AND EXISTS(...)
)
)
where '...' represents more queries to get more persons with the same qr_code, different check_in date and those well known cities.
My problem is that I want to group this by qr_code, and I want to show the check_in fields of each qr_code like this:
2015-11-11 14:14:14 => [2015-11-11 14:14:14, 2015-11-11 16:16:16, 2015-11-11 17:18:20] (this for each different qr_code)
where the data at the left is the 'smaller' date for that qr_code, and the right part are all the other dates for that qr_code, including the first one.
Is this possible to do with a sql query only? I am asking this because I am actually doing this app with rails, and I know that I can make a different approach with array methods of ruby (a solution with this would be well received too)
You could solve that with a recursive CTE - if I interpret your question correctly:
Assuming you have a given list of cities that must be visited in order by the same qr_code. Your text doesn't say so, but your query indicates as much.
WITH RECURSIVE
c AS (SELECT '{1,2,3}'::int[] AS cities) -- your list of city_id's here
, route AS (
SELECT f.check_in, f.qr_code, 2 AS idx
FROM foobar f
JOIN c ON f.city_id = c.cities[1]
UNION ALL
SELECT f.check_in, f.qr_code, r.idx + 1
FROM route r
JOIN foobar f USING (qr_code)
JOIN c ON f.city_id = c.cities[r.idx]
WHERE r.check_in < f.check_in
)
SELECT qr_code, array_agg(check_in) AS check_in_list
FROM (
SELECT *
FROM route
ORDER BY qr_code, idx -- or check_in
) sub
GROUP BY 1
HAVING count(*) = (SELECT array_length(cities, 1) FROM c);
Provide the list as array in the first (non-recursive) CTE c.
In the recursive part start with any rows in the first city and travel along your array until the last element.
In the final SELECT aggregate your check_in column in order. Only return qr_code that have visited all cities of the array.
Similar:
Recursive query used for transitive closure
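Since the question also welcomes an application-side approach, here is a minimal in-memory sketch in Python. It is not the recursive CTE above but a simpler greedy walk, and it rests on assumptions: rows arrive as (check_in, qr_code, city_id) tuples, timestamps are ISO-formatted strings (so they sort chronologically), and routes_by_qr_code is a made-up name.

```python
from collections import defaultdict

def routes_by_qr_code(rows, cities):
    """Map each qr_code that visited `cities` in order to its list of check-ins."""
    visits_by_code = defaultdict(list)
    for check_in, qr_code, city_id in rows:
        visits_by_code[qr_code].append((check_in, city_id))

    result = {}
    for qr_code, visits in visits_by_code.items():
        visits.sort()              # chronological order
        route = []                 # check-ins matched so far
        for check_in, city_id in visits:
            step = len(route)      # index of the city we are waiting for
            if (step < len(cities) and city_id == cities[step]
                    and (not route or check_in > route[-1])):
                route.append(check_in)
        if len(route) == len(cities):
            result[qr_code] = route
    return result

rows = [
    ("2015-11-11 14:14:14", "A", 1),
    ("2015-11-11 16:16:16", "A", 2),
    ("2015-11-11 17:18:20", "A", 3),
    ("2015-11-11 10:00:00", "B", 2),   # B visits city 2 before city 1
    ("2015-11-11 11:00:00", "B", 1),
    ("2015-11-11 12:00:00", "B", 3),
]
routes = routes_by_qr_code(rows, [1, 2, 3])
print(routes)
```

The first element of each list is the earliest check-in for that qr_code, which gives the "smallest date => all dates" shape asked for.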

SQL query to retrieve discrepancies in punch order

Consider the table below.
The rule is - an employee cannot take a break (needs to clock out) from job num 1 before clocking in to job num 2. In this case the employee "A" was supposed to clock OUT instead of BREAK on jobnum 1 because he later clocked in to JobNum#2
Is it possible to write a query to find this in plain SQL?
The idea is to check whether the next record is the proper one. To find the next record, find the first punchtime after the current one for the same employee. Once you have that, you can isolate the record itself and check the fields of interest: whether jobnum is the same and, optionally, whether punch_type is 'IN'. If no such record exists, NOT EXISTS evaluates to true and the break record is output.
select *
from #punch p
-- Isolate breaks only
where p.punch_type = 'BREAK'
-- The ones having no proper entry
and not exists
(
select null
-- The same table
from #punch a
where a.emplid = p.emplid
and a.jobnum = p.jobnum
-- Next record has punchtime from subquery
and a.punchtime = (select min (n.punchtime)
from #punch n
where n.emplid = p.emplid
and n.punchtime > p.punchtime
)
-- Optionally you might force next record to be 'IN'
and a.punch_type = 'IN'
)
Replace #punch with your table name. -- starts a comment in SQL Server; if you are not using this database, remove these lines. It is a good idea to tag your database and version as there are probably faster/better ways to do this.
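The NOT EXISTS query above can be tried on a small data set with Python's built-in sqlite3 module. This is a minimal sketch under assumptions: the original table was only posted as an image, so the punch rows below are invented, and the temp table #punch is replaced by an ordinary table named punch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE punch (emplid TEXT, jobnum INTEGER, punch_type TEXT, punchtime TEXT);
-- Hypothetical data: A takes a break on job 1 and then clocks in to job 2 (a violation);
-- B takes a break on job 1 and returns to job 1 (fine).
INSERT INTO punch VALUES
  ('A', 1, 'IN',    '2020-01-01 08:00'),
  ('A', 1, 'BREAK', '2020-01-01 10:00'),
  ('A', 2, 'IN',    '2020-01-01 10:30'),
  ('B', 1, 'IN',    '2020-01-01 08:00'),
  ('B', 1, 'BREAK', '2020-01-01 10:00'),
  ('B', 1, 'IN',    '2020-01-01 10:15');
""")

suspicious_breaks = conn.execute("""
SELECT p.emplid, p.jobnum, p.punchtime
FROM punch p
WHERE p.punch_type = 'BREAK'
  AND NOT EXISTS (
      SELECT 1
      FROM punch a
      WHERE a.emplid = p.emplid
        AND a.jobnum = p.jobnum
        AND a.punch_type = 'IN'
        -- the employee's very next punch after the break
        AND a.punchtime = (SELECT MIN(n.punchtime)
                           FROM punch n
                           WHERE n.emplid = p.emplid
                             AND n.punchtime > p.punchtime)
  )
""").fetchall()
print(suspicious_breaks)  # [('A', 1, '2020-01-01 10:00')]
```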
Here is the SQL
select * from employees e1 cross join employees e2 where e1.JOBNUM = (e2.JOBNUM + 1)
and e1.PUNCH_TYPE = 'BREAK' and e2.PUNCH_TYPE = 'IN'
and e1.PUNCHTIME < e2.PUNCHTIME
and e1.EMPLID = e2.EMPLID