How to match a phone number prefix to a country in SQL

I am trying to extract the country code prefix from a list of numbers, and match them to the region that they belong to. The data might look something like this:
| id | phone_number |
|----|----------------|
| 1 | +27000000000 |
| 2 | +16840000000 |
| 3 | +10000000000 |
| 4 | +27000000000 |
The country codes here are:
American Samoa: +1684
United States and Caribbean: +1
South Africa: +27
And the desired result would be something like this:
| country | count |
|-----------------------------|-------|
| South Africa | 2 |
| American Samoa | 1 |
| United States and Caribbean | 1 |
There are some difficulties: country prefix codes vary from 1 to 4 digits, and even without the country prefix, phone number length varies from place to place.
I do not have write access to this DB, so adding another column, while probably the best solution, will not work in this use case.
This is my current solution:
SELECT
CASE
WHEN SUBSTRING(phone_number,1,5) = '+1684' THEN 'American Samoa'
WHEN SUBSTRING(phone_number,1,5) = '+1264' THEN 'Anguilla'
...
WHEN SUBSTRING(phone_number,1,5) = '+1599' THEN 'Saint Martin'
WHEN SUBSTRING(phone_number,1,4) = '+355' THEN 'Albania'
WHEN SUBSTRING(phone_number,1,4) = '+213' THEN 'Algeria'
...
WHEN SUBSTRING(phone_number,1,4) = '+263' THEN 'Zimbabwe'
WHEN SUBSTRING(phone_number,1,3) = '+93' THEN 'Afghanistan'
WHEN SUBSTRING(phone_number,1,3) = '+54' THEN 'Argentina'
...
WHEN SUBSTRING(phone_number,1,3) = '+58' THEN 'Venezuela'
WHEN SUBSTRING(phone_number,1,3) = '+84' THEN 'Vietnam'
WHEN SUBSTRING(phone_number,1,2) = '+1' THEN 'United States and Caribbean'
WHEN SUBSTRING(phone_number,1,2) = '+7' THEN 'Kazakhstan, Russia'
ELSE 'unknown'
END as country_name,
count(*)
FROM users
GROUP BY country_name
order by count desc
There are ~205 WHEN ... THEN cases. This seems very inefficient, and the query times out. I assume this is because it runs the pattern matching on every row. It would need to scale to tens of millions of rows.
Is there a more efficient way to do this?
I am using PostgreSQL 9.6.16.
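Because the prefixes have different lengths, the order of the WHEN branches matters: +1684 must be tested before +1, or every American Samoa number would be counted as United States and Caribbean. A minimal sketch of that longest-prefix rule in plain Python, assuming a hypothetical PREFIXES dict that holds only the three codes listed above:

```python
from collections import Counter

# Hypothetical prefix table: only the three codes listed in the question.
PREFIXES = {
    "+1684": "American Samoa",
    "+1": "United States and Caribbean",
    "+27": "South Africa",
}

def country_for(phone_number: str, prefixes: dict = PREFIXES) -> str:
    """Return the country of the longest prefix that matches phone_number."""
    # '+' plus 1 to 4 digits gives candidate prefixes of length 5 down to 2,
    # so the longest possible prefix is always tried first.
    for length in range(5, 1, -1):
        country = prefixes.get(phone_number[:length])
        if country is not None:
            return country
    return "unknown"

numbers = ["+27000000000", "+16840000000", "+10000000000", "+27000000000"]
counts = Counter(country_for(n) for n in numbers)
print(counts.most_common())
```

Each lookup here costs at most four dictionary probes instead of walking through up to ~205 comparisons in order.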

Even though the whole table has to be read, an index could help here. To aggregate the data per country code, the DBMS must sort all rows by country code. Sorting is an expensive operation, especially on large data sets. If you had an index on the country codes, the DBMS would find the codes already pre-sorted in the index and could avoid the work of sorting the data.
You don't have the separate country code in a column, but each phone number starts with the code, so you could index the complete phone number:
create index idx on users (phone_number);
Then you must make it obvious to the DBMS that you are interested in the beginnings of the strings, so that it will consider using the index. Invoking a function like SUBSTRING on the phone number is likely to make the DBMS blind to this, so use LIKE instead. According to the docs (https://www.postgresql.org/docs/9.3/indexes-types.html), B-tree indexes on strings can be used with LIKE 'something%' (if your database does not use the C locale, the index must be created with the text_pattern_ops or varchar_pattern_ops operator class for this to work):
WHEN phone_number LIKE '+1684%' THEN 'American Samoa'
There is no guarantee this will help, but it's worth a try I think. It depends on whether the optimizer sees the advantage of using the pre-sorted phone numbers from the index.
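A minimal sketch of the LIKE-based CASE on the sample rows, using Python's sqlite3 module as a stand-in for PostgreSQL (an assumption). It only checks that the rewritten CASE, with longer prefixes tested first, produces the desired counts; whether the planner actually uses the index depends on the database, and in PostgreSQL on the locale and operator class described in the linked docs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone_number TEXT)")
conn.executemany(
    "INSERT INTO users (id, phone_number) VALUES (?, ?)",
    [(1, "+27000000000"), (2, "+16840000000"), (3, "+10000000000"), (4, "+27000000000")],
)
# Index on the complete phone number, as suggested above.
conn.execute("CREATE INDEX idx ON users (phone_number)")

# Longer prefixes must come first, otherwise '+1%' would swallow '+1684%'.
QUERY = """
SELECT
  CASE
    WHEN phone_number LIKE '+1684%' THEN 'American Samoa'
    WHEN phone_number LIKE '+27%'   THEN 'South Africa'
    WHEN phone_number LIKE '+1%'    THEN 'United States and Caribbean'
    ELSE 'unknown'
  END AS country_name,
  COUNT(*) AS user_count
FROM users
GROUP BY country_name
ORDER BY user_count DESC, country_name
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```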


How to do an exact match followed by ORDER BY in PostgreSQL

I'm trying to write a query that puts some results (in my case a single result) at the top, and then sorts the rest. I have yet to find a PostgreSQL solution.
Say I have a table called airports like so.
id | code | display_name
----+------+----------------------------
1 | SDF | International
2 | INT | International Airport
3 | TES | Test
4 | APP | Airport Place International
In short, I have a query in a controller method that is called asynchronously when a user searches for an airport by either code or display_name. When the input matches a code exactly (airport codes are unique), I want that result to appear first, followed by all other airports that have int in their display_name, in ascending order. If there is no exact match, it should return any wildcard matches sorted by display_name ascending. So if a user types INT, the row (2, INT, International Airport) should be returned first, followed by the others:
Results:
1. INT | International Airport
2. APP | Airport Place International
3. SDF | International
Here's the kind of query I was tinkering with that is slightly simplified to make sense outside the context of my application but same concept nonetheless.
SELECT * FROM airports a
WHERE a.display_name LIKE 'somesearchtext%'
ORDER BY (CASE WHEN a.code = 'somesearchtext' THEN a.code ELSE a.display_name END)
Right now, if I type INT, I get:
Results:
1. APP | Airport Place International
2. INT | International Airport
3. SDF | International
My ORDER BY must be incorrect, but I can't seem to get it right.
Any help would be greatly appreciated :)
If you want an exact match on code to return first, then I think this does the trick:
SELECT a.*
FROM airports a
WHERE a.display_name LIKE 'somesearchtext%'
ORDER BY (CASE WHEN a.code = 'somesearchtext' THEN 1 ELSE 2 END),
a.display_name
You could also write this as:
ORDER BY (a.code = 'somesearchtext') DESC, a.display_name
This isn't standard SQL, but it is quite readable.
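Both ORDER BY variants can be checked with Python's sqlite3 module as a stand-in (an assumption, not PostgreSQL). To reproduce the question's expected results, the sketch matches the term anywhere in display_name and also keeps the exact code match; SQLite's LIKE is case-insensitive for ASCII, whereas PostgreSQL would need ILIKE for that.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airports (id INTEGER PRIMARY KEY, code TEXT UNIQUE, display_name TEXT)")
conn.executemany("INSERT INTO airports VALUES (?, ?, ?)", [
    (1, "SDF", "International"),
    (2, "INT", "International Airport"),
    (3, "TES", "Test"),
    (4, "APP", "Airport Place International"),
])

WHERE = "WHERE a.code = :term OR a.display_name LIKE '%' || :term || '%'"

def search(term):
    # Exact code match sorts into group 1, everything else into group 2.
    sql = f"""SELECT a.code FROM airports a {WHERE}
              ORDER BY (CASE WHEN a.code = :term THEN 1 ELSE 2 END), a.display_name"""
    return [code for (code,) in conn.execute(sql, {"term": term})]

def search_bool(term):
    # The comparison evaluates to 1 for the exact match, so DESC puts it first.
    sql = f"""SELECT a.code FROM airports a {WHERE}
              ORDER BY (a.code = :term) DESC, a.display_name"""
    return [code for (code,) in conn.execute(sql, {"term": term})]

print(search("INT"), search_bool("INT"), search("air"))
```

With no exact match (for example "air"), both versions fall back to plain display_name order.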
I think you can achieve your goal by using a UNION.
First get the exact match, then append the rest of the data as you wish.
e.g. (you will need to work on this a bit):
SELECT *, 1 AS sort_group FROM airports
WHERE code = 'somesearchtext'
UNION ALL
SELECT *, 2 AS sort_group FROM airports
WHERE code <> 'somesearchtext' AND display_name LIKE 'somesearchtext%'
ORDER BY sort_group, display_name
The ORDER BY applies to the whole combined result, so the sort_group column is what keeps the exact match first.
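A minimal sketch of the UNION ALL approach with a sort-key column, run through Python's sqlite3 module as a stand-in (an assumption). As above, the term is matched anywhere in display_name so that the question's expected order can be reproduced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airports (id INTEGER PRIMARY KEY, code TEXT UNIQUE, display_name TEXT)")
conn.executemany("INSERT INTO airports VALUES (?, ?, ?)", [
    (1, "SDF", "International"),
    (2, "INT", "International Airport"),
    (3, "TES", "Test"),
    (4, "APP", "Airport Place International"),
])

# sort_group carries the "exact match first" rule through the UNION ALL;
# the single ORDER BY at the end sorts the combined result.
QUERY = """
SELECT code, display_name, 1 AS sort_group FROM airports WHERE code = :term
UNION ALL
SELECT code, display_name, 2 FROM airports
WHERE code <> :term AND display_name LIKE '%' || :term || '%'
ORDER BY sort_group, display_name
"""

def search_union(term):
    return [code for code, _name, _group in conn.execute(QUERY, {"term": term})]

print(search_union("INT"))
```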

Single record buffering in SAP ABAP

My table is stud.
+-----+------+-------+
| no | name | grade |
+-----+------+-------+
| 101 | naga | A |
| 102 | raj | A |
| 103 | john | A |
+-----+------+-------+
The query I'm using is:
SELECT * FROM stud WHERE no = 101 AND grade = 'A'.
If I am using single record buffering, how much data is stored in the buffer area?
This query doesn't do anything. There is no INTO clause, meaning it won't store anything it selects.
You are probably looking to do something like this:
SELECT * FROM stud into wa_stud WHERE no = 101 AND grade = 'A'.
"processing of each single row is performed here
endselect.
or perhaps something like this, where only one row (the first row ordered by primary key) is selected:
select single * from stud into wa_stud where no = 101 and grade = 'A' .
or perhaps you want everything brought into an internal table, if no and grade do not make up the full primary key:
select * from stud into table it_stud where no = 101 and grade = 'A'.
This is from the ABAP keyword documentation in SE38:
SAP Buffer - Single Record Buffering
Only those rows in the table are buffered that are actually accessed.
This requires less space in the buffer than when using generic or full
buffering. On the other hand, more administration work is required and
significantly more direct database accesses.
So since your query returns a single record (based on the data you displayed), it should read just one row and hold it in the buffer.
I'd suggest looking at SAP Help and Google. Also have a look at SELECT SINGLE with incompletely specified keys; there used to be a problem with the buffer being bypassed in some situations, so it is worth reading up on.

Postgres matching against an array of regular expressions

My client wants the possibility to match a set of data against an array of regular expressions, meaning:
table:
name | officeId (foreignkey)
--------
bob | 1
alice | 1
alicia | 2
walter | 2
and he wants to do something along these lines:
get me all records of offices (officeId) where there is a member with
ANY name ~ ANY[.*ob, ali.*]
meaning
ANY of[alicia, walter] ~ ANY of [.*ob, ali.*] results in true
I could not figure it out by myself sadly :/.
Edit
The real problem was missing from the original description:
I cannot use select distinct officeId .. where name ~ ANY[.*ob, ali.*], because:
This application stores data in PostgreSQL XML columns, which means that after evaluating (xpath('/data/clients/name/text()', ...))::text[] I actually have:
table:
name | officeId (foreignkey)
-----------------------------------------
[bob, alice] | 1
[anthony, walter] | 2
[alicia, walter] | 3
That is the problem. And "you don't do that, that is horrible, why would you do it like this, store it the way it is meant to be stored in a relational database, use a NoSQL database for document-based storage, use JSON" are not options.
I am stuck with this datamodel.
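The requirement ("ANY name ~ ANY pattern", evaluated per office) can be stated precisely as a small Python sketch over the three sample rows. Python's re.search is used because, like PostgreSQL's ~, it matches anywhere in the string; the function name offices_matching is hypothetical.

```python
import re

# Sample rows from the question: (names extracted from the XML column, officeId).
rows = [
    (["bob", "alice"], 1),
    (["anthony", "walter"], 2),
    (["alicia", "walter"], 3),
]
patterns = [".*ob", "ali.*"]

def offices_matching(rows, patterns):
    """Return officeIds where ANY name matches ANY pattern (unanchored search)."""
    compiled = [re.compile(p) for p in patterns]
    return [
        office_id
        for names, office_id in rows
        if any(rx.search(name) for name in names for rx in compiled)
    ]

print(offices_matching(rows, patterns))
```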
This looks pretty horrific, but the only way I can think of doing such a thing would be a hybrid of a cross-join and a semi join. On small data sets this would probably work pretty well. On large datasets, I imagine the cross-join component could hit you pretty hard.
Check it out and let me know if it works against your real data:
with patterns as (
select unnest(array['.*ob', 'ali.*']) as pattern
)
select
o.name, o.officeid
from
office o
where exists (
select null
from patterns p
where o.name ~ p.pattern
)
The semi-join protects you from cases where a name like "alicia nob" matches multiple search patterns and would otherwise come back once for every match.
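The semi-join can be tried out with Python's sqlite3 module (a stand-in, not PostgreSQL). SQLite has no ~ operator, so the sketch registers a regexp() function backed by re.search, which, like ~, matches anywhere in the string; SQLite turns X REGEXP Y into regexp(Y, X). The row "alicia nob" is hypothetical, added to show that it comes back once from the EXISTS version but twice from a plain join.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# X REGEXP Y calls regexp(Y, X): the pattern comes first.
conn.create_function("regexp", 2, lambda pattern, value: re.search(pattern, value) is not None)
conn.execute("CREATE TABLE office (name TEXT, officeid INTEGER)")
conn.executemany("INSERT INTO office VALUES (?, ?)", [
    ("bob", 1), ("alice", 1), ("alicia", 2), ("walter", 2),
    ("alicia nob", 3),  # hypothetical row matching both patterns
])

PATTERNS = "WITH patterns(pattern) AS (VALUES ('.*ob'), ('ali.*'))"

semi_join = conn.execute(PATTERNS + """
SELECT o.name, o.officeid
FROM office o
WHERE EXISTS (SELECT NULL FROM patterns p WHERE o.name REGEXP p.pattern)
ORDER BY o.officeid, o.name
""").fetchall()

# A plain join returns one row per (name, matching pattern) pair.
join_rows = conn.execute(PATTERNS + """
SELECT o.name FROM office o JOIN patterns p ON o.name REGEXP p.pattern
WHERE o.name = 'alicia nob'
""").fetchall()

print(semi_join, len(join_rows))
```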
You could cast the array to text.
SELECT * FROM workers WHERE (xpath('/data/clients/name/text()', xml_field))::text ~ ANY(ARRAY['wal','ant']);
When casting a string array to text, elements containing special characters (or consisting of the word NULL) are enclosed in double quotes; for example, {jimmy,"walter, james"} is two entries. Also, ~ matches against any part of the string, unlike LIKE, where the pattern must match the whole string.
Here is what I did in my test database:
test=# select id, (xpath('/data/clients/name/text()', name))::text[] as xss, officeid from workers WHERE (xpath('/data/clients/name/text()', name))::text ~ ANY(ARRAY['wal','ant']);
id | xss | officeid
----+-------------------------+----------
2 | {anthony,walter} | 2
3 | {alicia,walter} | 3
4 | {"walter, james"} | 5
5 | {jimmy,"walter, james"} | 4
(4 rows)

What's the best way to query a column to see if it contains a particular number? The column is varchar

I have a table with a column that contains a handful of comma-delimited numbers. I need to select all rows that include a particular value. I am using SQL Server and C#, so the solution can be in SQL or LINQ.
The data in my channels column (varchar) looks something like this: 1,5,8,22,27,33
My Media table looks like this:
MediaID | MediaName | MediaDate | ChannelIDs
------- | --------- | --------- | ----------
1 | The Cow Jumped Over The Moon | 01/18/2015 | 1,5,8,22,27,33
2 | The Cat In The Hat | 01/18/2015 | 2,4,9,25,28,31
3 | Robin Hood The Thief | 01/18/2015 | 3,5,6,9,22,33
4 | Jingle Bells Batman Smells | 01/18/2015 | 6,7,9,24,25,32
5 | Up The River Down The River | 01/18/2015 | 5,6,10,25,26,33
etc...
My Channels Table looks like this:
ChannelID ChannelName
--------- -----------
1 Animals
2 Television
3 Movies
4 Nursery Rhymes
5 Holidays
etc...
Each row of Media could contain multiple channels.
Should I be using a contains search like this?
SELECT * FROM Media WHERE CONTAINS (Channels,'22')
This would require me to full-text index this column but I don't really want to include this column in my full-text index.
Is there a better way to do this?
Thanks
You should fix your data format so you are not storing numbers as comma-delimited strings. SQL has a great data structure for lists, it is called a table not a string. In particular, you want a junction table with one row per "media" entity and id.
That said, sometimes you are stuck with a particular data structure. If so, you can use like:
where ','+channels+',' like '%,22,%'
Note: this cannot take advantage of regular indexes, so performance will not be good. Fix the data structure if you have a large table and need better performance.
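The point of wrapping both sides in commas is that a search for 22 must not hit 122 or 220. A minimal sketch using Python's sqlite3 module as a stand-in for SQL Server (an assumption): SQLite concatenates with || where SQL Server uses +, and the table keeps only the columns the query needs. Row 6 is hypothetical, added to show the false positive a naive '%22%' would produce.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Media (MediaID INTEGER PRIMARY KEY, MediaName TEXT, ChannelIDs TEXT)")
conn.executemany("INSERT INTO Media VALUES (?, ?, ?)", [
    (1, "The Cow Jumped Over The Moon", "1,5,8,22,27,33"),
    (2, "The Cat In The Hat", "2,4,9,25,28,31"),
    (3, "Robin Hood The Thief", "3,5,6,9,22,33"),
    (4, "Jingle Bells Batman Smells", "6,7,9,24,25,32"),
    (5, "Up The River Down The River", "5,6,10,25,26,33"),
    (6, "Hypothetical Row", "122,220"),  # contains "22" only as part of other numbers
])

def media_in_channel(channel_id):
    # Comma-wrap the list and the value so only whole entries can match.
    sql = """SELECT MediaID FROM Media
             WHERE ',' || ChannelIDs || ',' LIKE '%,' || ? || ',%'
             ORDER BY MediaID"""
    return [media_id for (media_id,) in conn.execute(sql, (str(channel_id),))]

naive = [m for (m,) in conn.execute("SELECT MediaID FROM Media WHERE ChannelIDs LIKE '%22%' ORDER BY MediaID")]
print(media_in_channel(22), naive)
```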

Need lowest price in each region in a MySQL query

I am trying to write a query for WordPress that will give me all the post_ids with the lowest fromprice for each region. The trick is that these are custom fields in WordPress, so the information is stored row-based: there are no region and fromprice columns.
So the data I have is (but of course containing a lot more rows):
Post_ID | Meta_Key | Meta_Value
1 | Region | Location1
1 | FromPrice | 150
2 | Region | Location1
2 | FromPrice | 160
3 | Region | Location2
3 | FromPrice | 145
The query I am endeavoring to build should return the post_id of the "lowest priced" matching post grouped by each region with results like:
Post_ID | Region | From Price
1 | Location1 | 150
3 | Location2 | 145
This will let me easily iterate over the post_ids and print the required information. In fact, I would be happy just returning the post_ids if the rest is harder; I can fetch the other information separately if need be.
Thanks a lot; I'm tearing my hair out over this one. I don't often have to pivot results from row-based to column-based, but this time I need to!
So you get an idea of the table structure I have, you can use the query below as a guide. I thought I had it: this query does print each distinct region with the lowest from price found in that region, but the post_id is completely wrong. I don't know why; it seems to just take the first post_id it finds and use that.
SELECT pm.post_id,
pm2.meta_value as region,
MIN(pm.meta_value) as price
FROM `wp_postmeta` pm
inner join `wp_postmeta` pm2
on pm2.post_id = pm.post_id
AND pm2.meta_key = 'region'
AND pm.meta_key = 'fromprice'
group by region
I suggest changing MIN(pm.meta_value) in your query to be MIN(CAST(pm.meta_value AS DECIMAL)). Meta_value is a character field, so your existing query will be returning the minimum string value, not the minimum numeric value; for example, "100" will be deemed to be lower than "21".
EDIT - amended CAST syntax.
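The string-versus-number point can be checked directly, and the cast also fits into a groupwise-minimum query that returns the matching post_id. This is a sketch run through Python's sqlite3 module as a stand-in for MySQL (an assumption), not taken from either answer; the CTE needs MySQL 8.0 or later, and ties for the lowest price return one row per tied post.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wp_postmeta (post_id INTEGER, meta_key TEXT, meta_value TEXT)")
conn.executemany("INSERT INTO wp_postmeta VALUES (?, ?, ?)", [
    (1, "Region", "Location1"), (1, "FromPrice", "150"),
    (2, "Region", "Location1"), (2, "FromPrice", "160"),
    (3, "Region", "Location2"), (3, "FromPrice", "145"),
])

# Text MIN compares character by character, so '100' < '21'.
VALUES = "(SELECT '100' AS v UNION ALL SELECT '21')"
text_min = conn.execute(f"SELECT MIN(v) FROM {VALUES}").fetchone()[0]
numeric_min = conn.execute(f"SELECT MIN(CAST(v AS DECIMAL)) FROM {VALUES}").fetchone()[0]

# Pivot region/price pairs per post, then keep each post whose price is its region's minimum.
LOWEST_PER_REGION = """
WITH prices AS (
  SELECT r.post_id, r.meta_value AS region, CAST(p.meta_value AS DECIMAL) AS price
  FROM wp_postmeta r
  JOIN wp_postmeta p ON p.post_id = r.post_id AND p.meta_key = 'FromPrice'
  WHERE r.meta_key = 'Region'
)
SELECT x.post_id, x.region, x.price
FROM prices x
WHERE x.price = (SELECT MIN(y.price) FROM prices y WHERE y.region = x.region)
ORDER BY x.region
"""
rows = conn.execute(LOWEST_PER_REGION).fetchall()
print(text_min, numeric_min, rows)
```

In MySQL itself, a bare DECIMAL means DECIMAL(10,0), so prices with cents would be rounded; a cast such as DECIMAL(10,2) avoids that.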
It's hard to figure out without being able to execute the query, but would it help to just change your group by to:
group by pm.post_id, region