Finding most specific prefix with SQL


I have a bit of an SQL problem. Here are my tables:
areas(id, name, sla_id)
areas_groups(id, group_id, area_prefix)
The sla_id is an identifier from a different source - it is unique, but areas has its own auto-incrementing primary key.
The area_prefix field is the interesting one. It just contains the first few digits of the sla_id and is unique. Each area can only exist in one group, so the area belongs to the group with the most specific prefix. Example:
Group 12's area prefixes: 105, 110, 115, 805
Group 13's area prefixes: 1, 8
Area sla_id = 10533071 matches both group 12 (105*) and group 13 (1*)
"105" is longer, so this area is in group 12
Area sla_id = 81031983 matches only group 13 (8*)
The reason it's done like this is so we can easily make a "catch-all" group for areas which don't fall into any other group.
I can find which group an area is in like this:
-- eg: area with sla_id 105055200
SELECT * FROM (
SELECT group_id
FROM areas_groups
WHERE SUBSTR('105055200', 0, LENGTH(area_prefix)) = area_prefix
ORDER BY LENGTH(area_prefix) DESC
)
WHERE rownum = 1;
(Did I mention this is Oracle?)
Going the other way is the tricky one: Given a group Id, I want to find all the areas which belong to that group. That is, given group 13, I want all the areas that start with 1 or 8 but not 105, 110, 115 or 805 (in this example).
The closest I've come is this:
SELECT a.id, a.sla_id, MAX(LENGTH(ag.area_prefix)), ag.group_id
FROM areas a INNER JOIN areas_groups ag
ON (SUBSTR(a.sla_id, 0, LENGTH(ag.area_prefix)) = ag.area_prefix)
WHERE a.sla_id IS NOT NULL
GROUP BY a.id, a.sla_id, ag.group_id
That returns data like this:
id sla_id leng group_id
583 105308400 3 12
583 105308400 1 13
584 105556700 3 12
584 105556700 1 13
So if I could only grab the group_id which has the longest length for each id... I have a feeling that I'm really close but just missing a tiny little thing... Can anyone help put me out of my misery?

select id
, sla_id
, leng
, group_id
from
(
select id
, sla_id
, leng
, group_id
, row_number() over (partition by id order by leng desc) rn
from
(
SELECT a.id, a.sla_id, MAX(LENGTH(ag.area_prefix)) leng, ag.group_id
FROM areas a INNER JOIN areas_groups ag
ON (SUBSTR(a.sla_id, 0, LENGTH(ag.area_prefix)) = ag.area_prefix)
WHERE a.sla_id IS NOT NULL
GROUP BY a.id, a.sla_id, ag.group_id
)
)
-- rn is filtered in an outer query because a WHERE clause cannot reference
-- an analytic-function alias defined in the same SELECT
where rn = 1
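To try the idea without an Oracle instance, here is a minimal sketch that runs the same "keep the longest matching prefix per area" logic in SQLite through Python's sqlite3 module. It is only an illustration under assumptions: sla_id is stored as text, SUBSTR starts at position 1 (SQLite does not treat 0 like Oracle does), the area names and areas 585 and 586 are made up, and areas_in_group is a hypothetical helper name. Window functions need SQLite 3.25 or later.

```python
import sqlite3


def areas_in_group(conn, group_id):
    """Return (id, sla_id) of areas whose longest matching prefix belongs to group_id."""
    sql = """
        SELECT id, sla_id
        FROM (
            SELECT a.id, a.sla_id, ag.group_id,
                   ROW_NUMBER() OVER (
                       PARTITION BY a.id
                       ORDER BY LENGTH(ag.area_prefix) DESC
                   ) AS rn
            FROM areas a
            JOIN areas_groups ag
              ON SUBSTR(a.sla_id, 1, LENGTH(ag.area_prefix)) = ag.area_prefix
        )
        WHERE rn = 1 AND group_id = ?
        ORDER BY id
    """
    return conn.execute(sql, (group_id,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE areas (id INTEGER PRIMARY KEY, name TEXT, sla_id TEXT);
    CREATE TABLE areas_groups (
        id INTEGER PRIMARY KEY, group_id INTEGER, area_prefix TEXT UNIQUE);
    INSERT INTO areas (id, name, sla_id) VALUES
        (583, 'a', '105308400'), (584, 'b', '105556700'),
        (585, 'c', '81031983'),  (586, 'd', '120000000');
    INSERT INTO areas_groups (group_id, area_prefix) VALUES
        (12, '105'), (12, '110'), (12, '115'), (12, '805'), (13, '1'), (13, '8');
""")

print(areas_in_group(conn, 12))  # areas starting with 105
print(areas_in_group(conn, 13))  # catch-all: start with 1 or 8, no longer prefix matches
```

The group filter is applied after rn = 1, so an area that also matches a shorter prefix of another group is not reported for that group.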

This is untested on Oracle, but I believe Oracle has supported COALESCE since version 9i, so this should be OK unless you're working on an older version of Oracle.
I have assumed that prefixes are at most three characters long, and that there is also a group of area_prefix records with two characters.
select a.id
,a.sla_id
,coalesce(ag3.area_prefix,ag2.area_prefix,ag1.area_prefix) area_prefix
,coalesce(ag3.group_id,ag2.group_id,ag1.group_id) group_id
from areas a
left join areas_groups ag3
on substr(a.sla_id,1,3) = ag3.area_prefix
left join areas_groups ag2
on substr(a.sla_id,1,2) = ag2.area_prefix
left join areas_groups ag1
on substr(a.sla_id,1,1) = ag1.area_prefix

Related

How to show data that's not in a table. SQL ORACLE

I've a data base with two tables.
Table Players    Table Wins
ID  Name         ID  Player_won
1   Mick         1   2
2   Frank        2   1
3   Sarah        3   4
4   Eva          4   5
5   Joe          5   1
I need a SQL query which shows "The players who have not won any game".
I tried but I don't know even how to begin.
Thank you

You need all the rows from players that don't have corresponding rows in wins. For this you need a left join, filtering for rows that don't join:
select
p.id,
p.name
from Players p
left join Wins w on w.Player_won = p.id
where w.Player_won is null
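You can also use not in, provided Player_won is never NULL (a single NULL in the subquery makes not in return no rows at all):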
select
id,
name
from Players
where id not in (select Player_won from Wins)
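The difference between the two queries only shows up when the subquery can return NULL. Here is a small sketch using SQLite through Python's sqlite3 module (an assumption made so it runs anywhere; Oracle handles NULL in NOT IN the same way). The extra Wins row with a NULL Player_won is hypothetical and exists only to trigger the behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Players (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Wins (id INTEGER PRIMARY KEY, Player_won INTEGER);
    INSERT INTO Players VALUES (1,'Mick'),(2,'Frank'),(3,'Sarah'),(4,'Eva'),(5,'Joe');
    INSERT INTO Wins VALUES (1,2),(2,1),(3,4),(4,5),(5,1);
""")

LEFT_JOIN = """
    SELECT p.name
    FROM Players p
    LEFT JOIN Wins w ON w.Player_won = p.id
    WHERE w.Player_won IS NULL
"""
NOT_IN = "SELECT name FROM Players WHERE id NOT IN (SELECT Player_won FROM Wins)"


def names(sql):
    """Run a query and return the first column as a list."""
    return [row[0] for row in conn.execute(sql)]


before = (names(LEFT_JOIN), names(NOT_IN))

# Hypothetical row: a game whose winner was never recorded.
conn.execute("INSERT INTO Wins VALUES (6, NULL)")
after = (names(LEFT_JOIN), names(NOT_IN))

print(before)  # both queries agree
print(after)   # NOT IN now returns nothing; the left join is unaffected
```

If Player_won is nullable, either add "where Player_won is not null" to the subquery or use the left join (or NOT EXISTS) form.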

How about the MINUS set operator?
Sample data:
with players (id, name) as
(select 1, 'Mick' from dual union all
select 2, 'Frank' from dual union all
select 3, 'Sarah' from dual union all
select 4, 'Eva' from dual union all
select 5, 'Joe' from dual
),
wins (id, player_won) as
(select 1, 2 from dual union all
select 2, 1 from dual union all
select 3, 4 from dual union all
select 4, 5 from dual union all
select 5, 1 from dual
)
-- Query begins here:
select id from players
minus
select player_won from wins;
ID
----------
3
So, yes ... player 3 didn't win any game so far.

I think you should provide your attempts next time, but here you go:
select p.name
from players p
where not exists (select * from wins w where p.id = w.player_won);
MINUS is often not the best option here: it has to read and de-duplicate both result sets in full, so it typically ends up scanning both tables instead of using an index lookup on wins.player_won.

"I've a data base with two tables."
You don't show the names or any definition of the tables, leaving me to make an educated guess about their structure.
"I tried but I don't know even how to begin."
What exactly did you try? Possibly what you are missing here is the concept of a LEFT OUTER JOIN.
Assuming the tables are named players_table and wins_table, and have column names exactly as you showed, and that the player_won column is intended to express the number of games won by the player of that row's ID, and without knowing whether or not wins_table will have rows for players with zero wins… this should cover it:
select Name
from players_table pt
left join wins_table wt on (pt.ID = wt.ID)
-- Either this player is explicitly specified to have Player_won=0
-- or there is no row for this player ID in the wins table
-- (but excluding the possibility of an explicit NULL value, since its meaning would be unclear)
where Player_won = 0 or wt.ID is null;

As you can see from the variety of answers you've gotten, there are many ways to accomplish this.
One additional way to do this is to use COUNT in a correlated subquery, as in:
SELECT *
FROM PLAYERS p
WHERE 0 = (SELECT COUNT(*)
FROM WINS w
WHERE w.PLAYER_WON = p.ID)

SELECT *
FROM Players p
INNER JOIN Wins w
ON p.ID = w.ID
WHERE w.Player_won = 0
I have not done SQL in a while, but I think this might be right if you are looking for players with 0 wins (it assumes Player_won holds a win count and that Wins has one row per player).

Select column with maximum value in another column but with aggregate SUM calculation

For each name, I need to output the category with the MAX net revenue and I am not sure how to do this. I have tried a bunch of different approaches, but it basically looks like this:
SELECT Name, Category, MAX(CatNetRev)
FROM (
SELECT Name, Category, SUM(Price*(Shipped-Returned)) AS "CatNetRev"
FROM a WITH (NOLOCK)
INNER JOIN b WITH (NOLOCK) ON b.ID = a.ID
...
-- (bunch of additional joins here, not relevant to question)
WHERE ... -- (not relevant to question)
GROUP BY Name, Category
) a GROUP BY Name;
This currently doesn't work because "Category" is not contained in an aggregate function or Group By (and this is obvious) but other approaches I have tried have failed for different reasons.
Each Name can have a bunch of different Categories, and Names can have the same Categories but the overlap is irrelevant to the question. I need to output just each unique Name that I have (we can assume they are already all unique) along with the "Top Selling Category" based on that Net Revenue calculation.
So for example if I have:
Name  Category  CatNetRev
A     1         100
A     2         300
A     3         50
B     1         300
B     2         500
C     1         40
C     2         20
C     3         10
I would want to output:
Name  Category
A     2
B     2
C     1
What's the best way to go about doing this?

Having to guess at your data schema a bit, as you didn't alias any of your columns, or define what table a vs b really was (as Gordon alluded). I'd use CROSS APPLY to get the max value, then bind the revenues in a WHERE clause, like so.
DECLARE @Revenue TABLE
(
Name VARCHAR(50)
,Category VARCHAR(50)
,NetRevenue DECIMAL(16, 9)
);
INSERT INTO @Revenue
(
Name
,Category
,NetRevenue
)
SELECT Name
,Category
,SUM(a.Price * (b.Shipped - b.Returned)) AS CatNetRev
FROM Item AS a
INNER JOIN ShipmentDetails AS b ON b.ID = a.ID
WHERE 1 = 1
GROUP BY
Name
,Category;
SELECT r.Name
,r.Category
FROM @Revenue AS r
CROSS APPLY (
SELECT MAX(r2.NetRevenue) AS MaxRevenue
FROM @Revenue AS r2
WHERE r.Name = r2.Name
) AS mr
WHERE r.NetRevenue = mr.MaxRevenue;

You can use window functions (rank() keeps ties, so use row_number() instead if you want exactly one row per Name):
select * from
(
select * , rank() over (partition by Name order by CatNetRev desc) rn
from table
) t
where t.rn = 1
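As a quick check of the window-function approach against the sample data in the question, here is a sketch that runs it in SQLite through Python's sqlite3 module. The table name revenue is an assumption standing in for the aggregated subquery from the question; the query itself is the same shape in SQL Server. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "revenue" is a hypothetical stand-in for the grouped subquery in the question.
conn.execute("CREATE TABLE revenue (Name TEXT, Category INTEGER, CatNetRev INTEGER)")
conn.executemany(
    "INSERT INTO revenue VALUES (?, ?, ?)",
    [("A", 1, 100), ("A", 2, 300), ("A", 3, 50),
     ("B", 1, 300), ("B", 2, 500),
     ("C", 1, 40), ("C", 2, 20), ("C", 3, 10)],
)

TOP_CATEGORY = """
    SELECT Name, Category
    FROM (
        SELECT Name, Category,
               RANK() OVER (PARTITION BY Name ORDER BY CatNetRev DESC) AS rn
        FROM revenue
    ) t
    WHERE t.rn = 1
    ORDER BY Name
"""

top = conn.execute(TOP_CATEGORY).fetchall()
print(top)
```

With this data each Name has a single best category, so rank() and row_number() give the same result; they differ only when two categories tie for the maximum.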

Query to select distinct values from different tables and not have them repeat (show them as a flat file)

I'm trying to get all phones, emails, and organizations for a person and show it in a flat file format. There should be n number of rows, where n is the max count of organizations, emails, or phones. NULL values will be shown once all values have been shown in the rows, with NULL being the last values. The emails and phones can only have 1 PreferredInd per person. I want these to be on the same row (1 of them can be NULL). I've tried to do this on a more complex query, but couldn't get it to work, so I've started over using this simpler example.
Example tables and values:
#ContactPerson
Id  Name
1   John Doe
#ContactEmail
Id  PersonId  Email              PreferredInd
1   1         johndoe@us.gov     0
2   1         jdoe@us.gov        1
3   1         johndoe@gmail.com  0
#ContactPhone
Id  PersonId  Phone         PreferredInd
1   1         888-867-5309  0
2   1         305-476-5234  1
#ContactOrganization
Id  PersonId  Organization
1   1         US Government
2   1         US Army
I want a resulting set to look like:
Name      Organization   PreferredInd  Email              Phone
John Doe  US Government  1             jdoe@us.gov        888-867-5309
John Doe  US Army        0             johndoe@us.gov     305-467-5234
John Doe  NULL           0             johndoe@gmail.com  NULL
The complete sql code that I have for this example is here on pastebin. It also includes code to create the sample tables. It works when the count of emails exceeds the count of organizations or phones, but that won't always be true. I can't seem to figure out how to get the result that I'm looking for. The actual tables I'm working with can have 0 or infinity emails, phones, or organizations per person. There will also be many more values, but I can fix that myself.
Can you help me fix my query or show me a simpler way to do it? If you have any questions, just let me know and I can try to answer them.

Something like this? (The partition by PersonId keeps the row numbers separate for each person.)
with cte_e as (
select
*,
row_number() over(partition by PersonId order by PreferredInd desc, Id) as rn
from ContactEmail
), cte_p as (
select
*,
row_number() over(partition by PersonId order by PreferredInd desc, Id) as rn
from ContactPhone
), cte_o as (
select
*,
row_number() over(partition by PersonId order by Organization) as rn
from ContactOrganization
), cte_d as (
select distinct rn, PersonId from cte_e union
select distinct rn, PersonId from cte_p union
select distinct rn, PersonId from cte_o
)
select
pr.Name, o.Organization, e.Email, p.Phone
from cte_d as d
left outer join ContactPerson as pr on pr.Id = d.PersonId
left outer join cte_e as e on e.PersonId = d.PersonId and e.rn = d.rn
left outer join cte_p as p on p.PersonId = d.PersonId and p.rn = d.rn
left outer join cte_o as o on o.PersonId = d.PersonId and o.rn = d.rn
It's a bit clumsy, and I can think of a couple of other ways to do this, but I think this one is the most readable.
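Here is a runnable sketch of the row-number alignment idea, using SQLite through Python's sqlite3 module instead of SQL Server (an assumption made for portability; window functions need SQLite 3.25 or later). The second person, Jane Roe, and her email are hypothetical and are there to show why each row_number() is partitioned by PersonId; without the partition her row numbers would be shifted by John Doe's rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ContactPerson (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE ContactEmail (Id INTEGER PRIMARY KEY, PersonId INTEGER,
                               Email TEXT, PreferredInd INTEGER);
    CREATE TABLE ContactPhone (Id INTEGER PRIMARY KEY, PersonId INTEGER,
                               Phone TEXT, PreferredInd INTEGER);
    CREATE TABLE ContactOrganization (Id INTEGER PRIMARY KEY, PersonId INTEGER,
                                      Organization TEXT);
    INSERT INTO ContactPerson VALUES (1, 'John Doe'), (2, 'Jane Roe');
    INSERT INTO ContactEmail VALUES
        (1, 1, 'johndoe@us.gov', 0), (2, 1, 'jdoe@us.gov', 1),
        (3, 1, 'johndoe@gmail.com', 0), (4, 2, 'jroe@example.com', 1);
    INSERT INTO ContactPhone VALUES
        (1, 1, '888-867-5309', 0), (2, 1, '305-476-5234', 1);
    INSERT INTO ContactOrganization VALUES
        (1, 1, 'US Government'), (2, 1, 'US Army');
""")

FLAT_SQL = """
WITH cte_e AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY PersonId
                                 ORDER BY PreferredInd DESC, Id) AS rn
    FROM ContactEmail
), cte_p AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY PersonId
                                 ORDER BY PreferredInd DESC, Id) AS rn
    FROM ContactPhone
), cte_o AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY PersonId
                                 ORDER BY Organization) AS rn
    FROM ContactOrganization
), cte_d AS (
    -- every (person, row number) pair that appears in any of the three lists
    SELECT rn, PersonId FROM cte_e UNION
    SELECT rn, PersonId FROM cte_p UNION
    SELECT rn, PersonId FROM cte_o
)
SELECT pr.Name, o.Organization, e.Email, p.Phone
FROM cte_d AS d
LEFT JOIN ContactPerson AS pr ON pr.Id = d.PersonId
LEFT JOIN cte_e AS e ON e.PersonId = d.PersonId AND e.rn = d.rn
LEFT JOIN cte_p AS p ON p.PersonId = d.PersonId AND p.rn = d.rn
LEFT JOIN cte_o AS o ON o.PersonId = d.PersonId AND o.rn = d.rn
ORDER BY d.PersonId, d.rn
"""

flat_rows = conn.execute(FLAT_SQL).fetchall()
for row in flat_rows:
    print(row)
```

The preferred email and the preferred phone both get row number 1 within a person, which is what puts them on the same output row.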

Step 1
Write a query that does the full join of all the tables, which will end up with lots of duplicate rows for each person (for each email or phone number)
Step 2
Write a second query that uses GROUP BY to group the rows, and that uses CASE or DECODE expressions (similar to a C# switch statement) to find the preferred row value and select it as the value to display

How can a group by be converted to a self-join

For a table such as:
employeeID | groupCode
1          | red111
2          | red111
3          | blu123
4          | blu456
5          | red553
6          | blu423
7          | blu341
How can I count the number of employeeIDs that are in parent groups (such as red or blu, but there are many more groups in the real table) whose total number of group members, excluding the employee itself, is greater than 2 (so all those with blu in this particular example)?
To expand: groupCode consists of a parent group (three letters), followed by some numbers for the subgroup.
I would like to do this using a self-join, or at least without using a GROUP BY statement.
So far I have:
SELECT T1.employeeID
FROM TABLE T1, TABLE T2
WHERE T1.groupCode <> T2.groupCode
AND SUBSTR(T1.groupCode, 1, 3) = SUBSTR(T2.groupCode, 1, 3);
but that doesn't do much for me...

Add an index on the first 3 characters of EMPLOYEE.groupCode.
Then try this one:
SELECT ed.e3
, COUNT(*)
FROM EMPLOYEE e
JOIN
( SELECT DISTINCT
SUBSTR(groupCode, 1, 3) AS e3
FROM EMPLOYEE
) ed
ON e.groupCode LIKE CONCAT(ed.e3, '%')
GROUP BY ed.e3
HAVING COUNT(*) >= 3 --- or whatever is wanted

What about
SELECT DISTINCT
substring(t.empshirtno, 1, 3),
(SELECT Count(*) FROM myTable t2
WHERE substring(t2.empshirtno, 1, 3) = substring(t.empshirtno, 1, 3))
FROM myTable t
Counting in a correlated subquery like this avoids GROUP BY, and it may be faster if there is an index on the prefix expression.
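For the original requirement (employees whose parent group has more than 2 other members, without GROUP BY), a correlated subquery over the same table is one way to write it. This is only a sketch, run in SQLite through Python's sqlite3 module; the table name employee is an assumption, and the data is the sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (employeeID INTEGER PRIMARY KEY, groupCode TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(1, "red111"), (2, "red111"), (3, "blu123"), (4, "blu456"),
     (5, "red553"), (6, "blu423"), (7, "blu341")],
)

# For each employee, count the *other* rows sharing the same 3-letter parent
# group, and keep the employee only if that count is greater than 2.
BIG_GROUP_MEMBERS = """
    SELECT e.employeeID
    FROM employee e
    WHERE (SELECT COUNT(*)
           FROM employee o
           WHERE o.employeeID <> e.employeeID
             AND SUBSTR(o.groupCode, 1, 3) = SUBSTR(e.groupCode, 1, 3)) > 2
    ORDER BY e.employeeID
"""

ids = [row[0] for row in conn.execute(BIG_GROUP_MEMBERS)]
print(ids, len(ids))
```

The red group has three members, so each red employee has only two others and is excluded; every blu employee has three others and is kept.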

Joining onto a table that doesn't have ranges, but requires ranges

Trying to find the best way to write this SQL statement.
I have a customer table that has the internal credit score of each customer. Then I have another table with definitions of that credit score. I would like to join these tables together, but the second table doesn't have any way to link it easily.
The score of the customer is an integer between 1-999, and the definition table has these columns:
Score
Description
And these rows:
60 LOW
99 MED
999 HIGH
So basically if a customer has a score between 1 and 60 they are low, 61-99 they are med, and 100-999 they are high.
I can't really INNER JOIN these, because it would only join them IF the score was exactly 60, 99, or 999, and that would exclude everyone with any other score.
I don't want to do a case statement with the static numbers, because our scores may change in the future and I don't want to have to update my initial query when/if they do. I also cannot create any tables or functions to do this- I need to create a SQL statement to do it for me.
EDIT:
A coworker said this would work, but it's a little crazy. I'm thinking there has to be a better way:
SELECT
internal_credit_score,
(
SELECT
credit_score_short_desc
FROM
cf_internal_credit_score
WHERE
internal_credit_score = (
SELECT
max(credit.internal_credit_score)
FROM
cf_internal_credit_score credit
WHERE
cs.internal_credit_score <= credit.internal_credit_score
AND credit.internal_credit_score <= (
SELECT
min(credit2.internal_credit_score)
FROM
cf_internal_credit_score credit2
WHERE
cs.internal_credit_score <= credit2.internal_credit_score
)
)
)
FROM
customer_statements cs

Try this: change your table to contain the range of scores for each description:
ScoreTable
-------------
LowScore int
HighScore int
ScoreDescription string
data values
LowScore HighScore ScoreDescription
-------- --------- ----------------
1 60 Low
61 99 Med
100 999 High
query:
Select
.... , Score.ScoreDescription
FROM YourTable
INNER JOIN ScoreTable Score ON YourTable.Score>=Score.LowScore
AND YourTable.Score<=Score.HighScore
WHERE ...

Assuming your table is named CreditTable, this is what you want:
select * from
(
select Description, Score
from CreditTable
where Score > 80 /*client's credit*/
order by Score
)
where rownum = 1
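Also, make sure your high score reference value is 1000, even though the client's highest possible score is 999: Score > ... is a strict comparison, so a client scoring exactly 999 would otherwise match no row. For the same reason a client scoring exactly 60 or 99 lands in the next band; use >= instead if the thresholds are meant to be inclusive upper bounds, as in the question.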
Update
The above SQL gives you the credit record for a given value. If you want to join with, say, Clients table, you'd do something like this:
select
c.Name,
c.Score,
(select Description from
(select Description from CreditTable where Score > c.Score order by Score)
where rownum = 1)
from clients c
I know this is a sub-select that is executed for each returned row, but then again, CreditTable is ridiculously small, so the sub-select should cause no significant performance loss.

You can use analytic functions to convert the data in your score description table to ranges (I assume that you meant that 100-999 should map to 'HIGH', not 99-999).
with x as (
select 60 score, 'Low' description from dual union all
select 99, 'Med' from dual union all
select 999, 'High' from dual
)
select description,
nvl(lag(score) over (order by score),0) + 1 low_range,
score high_range
from x;
DESC  LOW_RANGE  HIGH_RANGE
----  ---------  ----------
Low           1          60
Med          61          99
High        100         999
You can then join this to your CUSTOMER table with something like
SELECT c.*,
sd.*
FROM customer c,
(select description,
nvl(lag(score) over (order by score),0) + 1 low_range,
score high_range
from score_description) sd
WHERE c.credit_score BETWEEN sd.low_range AND sd.high_range
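To see the range join end to end, here is a sketch in SQLite through Python's sqlite3 module (an assumption so it runs without Oracle; COALESCE replaces NVL, and LAG needs SQLite 3.25 or later). The customer names and scores are hypothetical and were picked to sit on the band boundaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE score_description (score INTEGER, description TEXT);
    INSERT INTO score_description VALUES (60, 'LOW'), (99, 'MED'), (999, 'HIGH');
    CREATE TABLE customer (name TEXT, credit_score INTEGER);
    -- hypothetical customers sitting on each band boundary
    INSERT INTO customer VALUES
        ('a', 1), ('b', 60), ('c', 61), ('d', 99), ('e', 100), ('f', 999);
""")

RANGE_JOIN = """
    SELECT c.name, c.credit_score, sd.description
    FROM customer c
    JOIN (SELECT description,
                 -- previous threshold + 1 is the lower bound; 0 + 1 for the first band
                 COALESCE(LAG(score) OVER (ORDER BY score), 0) + 1 AS low_range,
                 score AS high_range
          FROM score_description) sd
      ON c.credit_score BETWEEN sd.low_range AND sd.high_range
    ORDER BY c.credit_score
"""

banded = conn.execute(RANGE_JOIN).fetchall()
for row in banded:
    print(row)
```

Because the ranges are derived from the thresholds at query time, changing a threshold in score_description changes the bands without touching the query.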
