I have a MySQL table containing domain names:
+----+---------------+
| id | domain |
+----+---------------+
| 1 | amazon.com |
| 2 | google.com |
| 3 | microsoft.com |
| | ... |
+----+---------------+
I'd like to be able to search through this table for a full hostname (i.e. 'www.google.com'). If it were the other way round where the table contained the full URL I'd use:
SELECT * FROM table WHERE domain LIKE '%google.com%'
But the inverse is not so straightforward. My current thinking is to search for the full hostname, then progressively strip off each part of the domain, and search again. (i.e. search for 'www.google.com' then 'google.com')
This is not particular efficient or clever, there must be a better way. I am sure it is a common problem, and no doubt easy to solve!
You can use the column on the right of the like too:
SELECT domain FROM table WHERE 'www.google.com' LIKE CONCAT('%', domain);
or
SELECT domain FROM table WHERE 'www.google.com' LIKE CONCAT('%', domain, '%');
It's not particularly efficient but it works.
In mysql you can use regular expressions (RLIKE) to perform matches. Given this ability you could do something like this:
SELECT * FROM table WHERE 'www.google.com' RLIKE domain;
It appears that the way RLIKE has been implemented it is even smart enough to treat the dot in that field (normally a wildcard in regex) as a literal dot.
MySQL's inclusion of regular expressions gives you a very powerful ability to parse and search strings. If you would like to know more about regular expressions, just google "regex". You can also use one of these links:
http://en.wikipedia.org/wiki/Regular_expression
http://www.regular-expressions.info/
http://www.codeproject.com/KB/string/re.aspx
You could use a bit of SQL string manipulation to generate the equivalent of string.EndsWith():
SELECT * FROM table WHERE
substring('www.google.com',
len('www.google.com') - len([domain]) ,
len([domain])+1) = [domain]
Related
I have a problem where I have two tables. One table constains urls and their information and another groups of urls that should be grouped by a pattern.
Urls table:
------------------------------------------------
| url | files |
| https://myurl1/test/one/es/main.html | 530 |
| https://myurl1/test/one/en/main.html | 530 |
| https://myurl1/test/one/ar/main.html | 530 |
------------------------------------------------
Urls patterns table:
---------------------------------------------
| group | url_pattern |
| group1 | https://myurl1/test/one/(es|en)/%|
| group2 | https://myurl1/test/one/(ar)/% |
---------------------------------------------
I have tried something like this bearing in mind that url_patterns will only have one row per group.
SELECT * FROM urls_table
WHERE url SIMILAR TO (SELECT MAX (url_pattern) FROM url_patterns WHERE group='group1')
LIMIT 10
The main problem here is that it seems that applying SIMILAR TO with a column argument is not working.
Could anyone give me some advices?
Thanks in advance.
You are running into the requirement that regexp patterns are compiled and that SIMILAR TO is a layer on regexp. So what you are trying to do won't work. I believe there are a number of other ways to do this.
I) Change to LIKE pattern matching: LIKE patterns aren't precompiled so can use dynamic patterns. The downside is that they are more limited but I think you can still do what you want. Just change your patterns to be set of pattern columns (if the number of patterns is limited) and test for all the patterns. Unneeded patterns can just be a value that can never match. Definitely a brute force hack.
II) Change to LIKE pattern matching w/ SQL to provide OR behavior: have multiple LIKE patterns in the url_pattern column separated by '|' (for example). Then use split_part to match each sub-pattern - a bit complex and possible slow but works. Like this:
SELECT url
FROM urls_table
LEFT JOIN (SELECT split_part(pattern, '|', part_no::int) as pattern
FROM url_patterns
CROSS JOIN (SELECT row_number() over () as part_no FROM urls_table)
WHERE "group" = 'group1'
)
ON url LIKE pattern
WHERE p.pattern IS NOT NULL;
You will also need to change your pattern strings to use the simpler LIKE format and use '|' for multiple possibilities - Ex: Group1 pattern becomes 'https://myurl1/test/one/es/%|https://myurl1/test/one/en/%'
III) Use some front-end query modification to find the pattern for the group and apply it to query BEFORE it is sent to the compiler. This could be an external tool or a stored procedure on Redshift. Get the pattern in one query and use it to issue the second query.
Do you want exists?
SELECT u.*
FROM urls_table u
WHERE EXISTS (SELECT 1
FROM url_patterns p
WHERE u.url SIMILAR TO p.url_pattern AND
p.group = 'group1'
)
LIMIT 10;
I need to test if any part of a column value is in a given string, instead of whether the string is part of a column value.
For instance:
This way, I can find if any of the rows in my table contains the string 'bricks' in column:
SELECT column FROM table
WHERE column ILIKE '%bricks%';
But what I'm looking for, is to find out if any part of the sentence "The ships hung in the sky in much the same way that bricks don’t" is in any of the rows.
Something like:
SELECT column FROM table
WHERE 'The ships hung in the sky in much the same way that bricks don’t' ILIKE '%' || column || '%';
So the row from the first example, where the column contains 'bricks', will show up as result.
I've looked through some suggestions here and some other forums but none of them worked.
Your simple case can be solved with a simple query using the ANY construct and ~*:
SELECT *
FROM tbl
WHERE col ~* ANY (string_to_array('The ships hung in the sky ... bricks don’t', ' '));
~* is the case insensitive regular expression match operator. I use that instead of ILIKE so we can use original words in your string without the need to pad % for ILIKE. The result is the same - except for words containing special characters: %_\ for ILIKE and !$()*+.:<=>?[\]^{|}- for regular expression patterns. You may need to escape special characters either way to avoid surprises. Here is a function for regular expressions:
Escape function for regular expression or LIKE patterns
But I have nagging doubts that will be all you need. See my comment. I suspect you need Full Text Search with a matching dictionary for your natural language to provide useful word stemming ...
Related:
IN vs ANY operator in PostgreSQL
PostgreSQL LIKE query performance variations
Pattern matching with LIKE, SIMILAR TO or regular expressions in PostgreSQL
This query:
SELECT
regexp_split_to_table(
'The ships hung in the sky in much the same way that bricks don’t',
'\s' );
gives a following result:
| regexp_split_to_table |
|-----------------------|
| The |
| ships |
| hung |
| in |
| the |
| sky |
| in |
| much |
| the |
| same |
| way |
| that |
| bricks |
| don’t |
Now just do a semijoin against a result of this query to get desired results
SELECT * FROM table t
WHERE EXISTS (
SELECT * FROM (
SELECT
regexp_split_to_table(
'The ships hung in the sky in much the same way that bricks don’t',
'\s' ) x
) x
WHERE t.column LIKE '%'|| x.x || '%'
)
i keep data on table rows as followed like this;
t_course
+------+------------------------------------------+
| sid | courses |
+------+------------------------------------------+
| 1 | cs101.math102.ns202-2.phy104 |
+------+------------------------------------------+
| 2 | cs101.math201.ens202-1.phy104-10.chm105 |
+------+------------------------------------------+
| 3 | cs101.ns202-2.math201.ens202-1.phy104 |
+------+------------------------------------------+
Now, i want to take the sum of courses mentioned ns202 and ens202 in same time. Normally it should only brings record which id is 3, it brings all of the records (because of instr). i have used many methods for this, but it doesn't work. For example;
select count(*) from
t_course
where
instr(courses, 'ns202') > 0
and instr(courses, 'ens202') > 0;
Above code doesn't work properly because it takes ns202 but ens202 contains ns202 in itself.
I tried using regular expressions, i converted all course to row (split) but this has both broke working logic and slowed down.
How can i do this with regular expressions instead of instr according to begin withs (for example ns202%) logic? (Begining with ns202 first or after dot)
You can use regexp_like with word boundaries to get rows which have both ns202 and ens_202. Normally you would use \b for word-boundaries. As Oracle doesn't support it, the alternate is to use (\s|\W) with start ^ and end $ anchors.
\s - space character, \W - non word character. Add more characters as needed, as word-boundaries based on your requirements.
select *
from t_course
where regexp_like(courses,'(^|\s|\W)ns202(\s|\W|$)')
and regexp_like(courses,'(^|\s|\W)ens202(\s|\W|$)')
You will have the same problem with ens202, by the way - what if there is also cens202or tens202?
You can solve your problem with regular expressions. You can also solve it with the LIKE operator:
select <whatever>
from <table or tables>
where (courses like 'ns202%' or courses like '%.ns202%')
and (courses like 'ens202%' or courses like '%.ens202%')
You can test both approaches to see which works best for your data.
So we have a table with a field that contains strings.
These strings can contain wildcards.
For example:
id | name
---+----------------
1 | thomas
2 | san*
3 | *max*
Now I want to select from that table with respect to these wildcards.
For example something like this:
SELECT * FROM table WHERE name = 'sandra'.
That SELECT should fetch the record with ID = 2 from my table.
Note that it would be ok to use % instead of * as the wildcard character in the table.
Any way to achieve this in OpenSQL?
You can use wildcards, just the sign (like Matecki said), is %.
Take a look here:
https://scn.sap.com/thread/1418148
Additionally You could create and use a ranges table in the where clause. If You do not know, what it is, and how it can be done, just tell me. Populate the ranges table like this: OPTION = CP, SIGN = I, LOW = san.
Ok for You?
UPDATE:
I was wrong and changed the answer
When using BigQuery query with Data containing URLs we noticed that the DOMAIN function behaves differently from the case of the URL.
This can be demonstrated with this simple query:
SELECT
domain('WWW.FOO.COM.AU'),
domain(LOWER('http://WWW.FOO.COM.AU/')),
domain('http://WWW.FOO.COM.AU/')
The result of URL of full uppercase does not seem to be right and the documentation does not mentioned anything regarding case in URLs.
DOMAIN (and the other URL-handling functions in legacy SQL) have a number of limitations, unfortunately. While we don't have an equivalent yet in standard SQL (uncheck the "Use Legacy SQL" box under Options), you can make up your own that works in more cases using a regular expression. There are a number of StackOverflow questions about domain extraction, and we can put one of the answers to use as:
CREATE TEMPORARY FUNCTION GetDomain(url STRING) AS (
REGEXP_EXTRACT(url, r'^(?:https?:\/\/)?(?:[^#\n]+#)?(?:www\.)?([^:\/\n]+)'));
WITH T AS (
SELECT url
FROM UNNEST(['WWW.FOO.COM.AU:8080', 'google.com',
'www.abc.xyz', 'http://example.com']) AS url)
SELECT
url,
GetDomain(url) AS domain
FROM T;
+---------------------+----------------+
| url | domain |
+---------------------+----------------+
| www.abc.xyz | abc.xyz |
| WWW.FOO.COM.AU:8080 | WWW.FOO.COM.AU |
| google.com | google.com |
| http://example.com | example.com |
+---------------------+----------------+