SQL find similar content - sql

I have a table ITEMS with a column URL. I need to find rows whose items.url values are similar to each other.
Example of similar rows:
ITEM_ID | URL
1 | www.google.com/test1/test2/test3.php
2 | www.yahoo.com/test1/test2/test3.php
3 | www.google.com/test5.php
4 | www.facebook.com/test5.php
As you can see, the URLs are the same apart from the domain.
My query should be something like:
SELECT * FROM ITEMS
WHERE URL LIKE `%google.com%`...
AND `here code probably` ???
My query should return ITEM_ID 2 and 4.

You could group by the substring starting from the first '/' character, and take the max ID in each group. Using PostgreSQL syntax, it should look like this:
SELECT *
FROM ITEMS t
WHERE t.item_id IN (SELECT MAX(s.item_id)
FROM ITEMS s
GROUP BY SUBSTRING(s.url FROM POSITION('/' IN s.url)))
ORDER BY t.item_id;
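To try the grouping idea without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module. It is an adaptation, not the answer's exact query: SQLite has no SUBSTRING ... FROM POSITION syntax, so substr(url, instr(url, '/')) is used as the equivalent, and the helper name latest_per_path is made up for illustration.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (1, "www.google.com/test1/test2/test3.php"),
    (2, "www.yahoo.com/test1/test2/test3.php"),
    (3, "www.google.com/test5.php"),
    (4, "www.facebook.com/test5.php"),
]


def latest_per_path(rows):
    """Return the highest item_id for each distinct URL path (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (item_id INTEGER PRIMARY KEY, url TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    # substr(url, instr(url, '/')) keeps everything from the first '/' onward,
    # i.e. the path without the domain.
    query = """
        SELECT t.item_id
        FROM items t
        WHERE t.item_id IN (SELECT MAX(s.item_id)
                            FROM items s
                            GROUP BY substr(s.url, instr(s.url, '/')))
        ORDER BY t.item_id
    """
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids


print(latest_per_path(ROWS))  # [2, 4]
```

Both paths occur twice in the sample data, so the query keeps the row with the larger ID from each pair.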
Update: if you want only the google.com rows that have a similar row on a different domain, you could filter with EXISTS:
SELECT *
FROM ITEMS t
WHERE t.url LIKE 'www.google.com%'
AND EXISTS (SELECT 1
FROM ITEMS s
WHERE s.url NOT LIKE 'www.google.com%'
AND SUBSTRING(t.url FROM POSITION('/' IN t.url)) =
SUBSTRING(s.url FROM POSITION('/' IN s.url)));
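The EXISTS variant can be checked the same way. This is a sketch under the same assumptions as before (Python's sqlite3, with substr/instr standing in for the PostgreSQL SUBSTRING/POSITION pair). Row 5 is an invented extra row, added only to show that a google.com URL with no twin on another domain is filtered out.

```python
import sqlite3

# Rows 1-4 come from the question; row 5 is a hypothetical row with no twin.
ROWS = [
    (1, "www.google.com/test1/test2/test3.php"),
    (2, "www.yahoo.com/test1/test2/test3.php"),
    (3, "www.google.com/test5.php"),
    (4, "www.facebook.com/test5.php"),
    (5, "www.google.com/only-here.php"),
]


def google_rows_with_twins(rows):
    """Return google.com item_ids whose path also exists on another domain."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (item_id INTEGER PRIMARY KEY, url TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    query = """
        SELECT t.item_id
        FROM items t
        WHERE t.url LIKE 'www.google.com%'
          AND EXISTS (SELECT 1
                      FROM items s
                      WHERE s.url NOT LIKE 'www.google.com%'
                        AND substr(s.url, instr(s.url, '/')) =
                            substr(t.url, instr(t.url, '/')))
        ORDER BY t.item_id
    """
    ids = [row[0] for row in conn.execute(query)]
    conn.close()
    return ids


print(google_rows_with_twins(ROWS))  # [1, 3]
```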

Related

Select rows from table with specific string value endings from list

I have a table with two columns, item_name and value, where each item_name looks like "abracadabra_prefix.tag_name". I need to select the rows whose tag_name appears in a list; the list entries do not include the prefix.
Should be something like:
tag_names = ['f1', 'k500', '23_g']
SELECT * FROM table WHERE item_name IN (LIKE "%{tag_names});
input table:
item_name | value
fasdaf.f1 | 1
asdfe.f2 | 2
eywvs.24_g | 2
asdfe.l500 | 2
asdfe.k500 | 2
eywvs.23_g | 2
output table:
item_name | value
fasdaf.f1 | 1
asdfe.k500 | 2
eywvs.23_g | 2
I have tried concatenating a string in a loop to get a query like this:
SELECT * FROM table WHERE item_name LIKE '%f1' OR item_name LIKE '%k500' OR item_name LIKE '%23_g';
But I can have anywhere from 1 to 200 tags, and as I understand it, a large number of tags makes the query too complicated.
You can extract the suffix of item_name using substring with regexp and then use the any operator for comparison in the where clause.
select * from the_table
where substring (item_name from '\.(\w+)$') = any('{f1,k500,23_g}'::text[]);
If you intend to use the query as a parameterized one then it will be convenient to replace '{f1,k500,23_g}'::text[] with string_to_array('f1,k500,23_g', ','), i.e. pass the list of suffixes as a comma-separated string. Please note that this query will result in a sequential scan.
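For anyone who wants to experiment with the "extract the suffix, then compare against a list" idea from application code, here is a rough sketch using Python's sqlite3. It is not the PostgreSQL query above: SQLite has no regexp substring or = any(array), so the sketch assumes each item_name contains exactly one '.', takes the suffix with substr/instr, and binds one placeholder per tag. The function name rows_with_tags is made up.

```python
import sqlite3

# Sample rows copied from the question's input table.
ROWS = [
    ("fasdaf.f1", 1),
    ("asdfe.f2", 2),
    ("eywvs.24_g", 2),
    ("asdfe.l500", 2),
    ("asdfe.k500", 2),
    ("eywvs.23_g", 2),
]


def rows_with_tags(rows, tag_names):
    """Return rows whose suffix after the '.' is one of tag_names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE the_table (item_name TEXT, value INTEGER)")
    conn.executemany("INSERT INTO the_table VALUES (?, ?)", rows)
    # One "?" per tag; the tag values are bound as parameters, never
    # concatenated into the SQL text.
    placeholders = ", ".join("?" for _ in tag_names)
    query = f"""
        SELECT item_name, value
        FROM the_table
        WHERE substr(item_name, instr(item_name, '.') + 1) IN ({placeholders})
        ORDER BY rowid
    """
    result = conn.execute(query, list(tag_names)).fetchall()
    conn.close()
    return result


print(rows_with_tags(ROWS, ["f1", "k500", "23_g"]))
```

Because the comparison is exact equality rather than LIKE, the underscore in 23_g is treated as a literal character and 24_g is not matched.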
You can use:
UNNEST to extract tag values from your array,
CROSS JOIN to associate tag value to each row of your table
LIKE to make a comparison between your item_name and your tag
SELECT item_name, value_
FROM tab
CROSS JOIN UNNEST(ARRAY['f1', 'k500', '23_g']) AS tag
WHERE item_name LIKE '%' || tag
Output:
item_name | value_
fasdaf.f1 | 1
asdfe.k500 | 2
eywvs.23_g | 2

Searching for a number in a database column that contains a series of numbers separated by the delimiter "&" in SQLite

My table structure is as follows :
id category
1 1&2&3
2 18&2&1
3 11
4 1&11
5 3&1
6 1
My question: I need a SQL query that produces the following result set when the user searches for category 1:
id category
1 1&2&3
2 18&2&1
4 1&11
5 3&1
6 1
But I am getting all the rows, not just the expected ones.
I have tried the regexp and like operators, but with no success.
select * from mytable where category like '%1%'
select * from mytable where category regexp '([.]*)(1)(.*)'
I really don't know much about regexp; I just found it.
So please help me out.
For matching a list item separated by &, use:
SELECT * FROM mytable WHERE '&'||category||'&' LIKE '%&1&%';
This matches a whole list item (i.e. only 1, not 11), whether it sits at the beginning, middle, or end of the list.
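A quick way to convince yourself that the delimiter-wrapping trick works is to run it against the question's data. The sketch below uses Python's sqlite3 module; the function name ids_in_category is invented, and the searched value is bound as a parameter instead of being written into the pattern.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (1, "1&2&3"),
    (2, "18&2&1"),
    (3, "11"),
    (4, "1&11"),
    (5, "3&1"),
    (6, "1"),
]


def ids_in_category(rows, category):
    """Return ids whose '&'-separated list contains category as a whole item."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, category TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    # Wrapping both sides in '&' turns every item, including the first and
    # the last, into the form '&item&', so a single LIKE pattern covers all positions.
    query = """
        SELECT id
        FROM mytable
        WHERE '&' || category || '&' LIKE '%&' || ? || '&%'
        ORDER BY id
    """
    ids = [row[0] for row in conn.execute(query, (str(category),))]
    conn.close()
    return ids


print(ids_in_category(ROWS, 1))   # [1, 2, 4, 5, 6]
print(ids_in_category(ROWS, 11))  # [3, 4]
```

Row 3 (category 11) is excluded when searching for 1, which is exactly what the plain '%1%' pattern got wrong.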

Can I use column labels in SELECT list as columns?

Let's say I have a table like this:
id | name |
1 | Can add permission |
Can I somehow refer to already defined column labels in select list?
To write something like this:
select id as A, A > 1 as B from auth_permission LIMIT 1;
Looks like not.
But maybe someone knows some clever trick? I use PostgreSQL, just in case.
SELECT a,
a > 1 AS b
FROM (SELECT id AS a
FROM auth_permission
LIMIT 1) subsel
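The point of the subquery is that the label a is defined one level down, so the outer select list can treat it as an ordinary column. Here is a small sketch of that using Python's sqlite3 (so SQLite rather than PostgreSQL, where the comparison comes back as 0/1 instead of a boolean). The second row is made up, and the LIMIT is dropped so both outcomes of a > 1 are visible.

```python
import sqlite3


def labels_via_subquery():
    """Reuse a column label by defining it in a subquery first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE auth_permission (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO auth_permission VALUES (?, ?)",
        # The first row is from the question; the second is hypothetical.
        [(1, "Can add permission"), (2, "Can change permission")],
    )
    query = """
        SELECT a,
               a > 1 AS b
        FROM (SELECT id AS a
              FROM auth_permission) AS subsel
        ORDER BY a
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(labels_via_subquery())  # [(1, 0), (2, 1)]
```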

Postgres: regex and nested queries something like Unix pipes

The command should give 1 as output if the pattern "*#he.com" occurs in the row (excluding the headings):
user_id | username | email | passhash_md5 | logged_in | has_been_sent_a_moderator_message | was_last_checked_by_moderator_at_time | a_moderator
---------+----------+-----------+----------------------------------+-----------+-----------------------------------+---------------------------------------+-------------
9 | he | he#he.com | 6f96cfdfe5ccc627cadf24b41725caa4 | 0 | 1 | 2009-08-23 19:16:46.316272 |
In short, I want to connect many SELECT-commands with Regex, rather like Unix pipes. The output above is from a SELECT-command. A new SELECT-command with matching the pattern should give me 1.
Did you mean
SELECT regexp_matches( (SELECT whatevername FROM users WHERE username='masi'), 'masi');
You obviously cannot feed the record (*) to regexp_matches, but I assume that is not your problem, since you mention the issue of nesting SQL queries in the subject.
Maybe you meant something like
SELECT regexp_matches( wn, 'masi' ) FROM (SELECT whatevername AS wn FROM users WHERE username LIKE '%masi%') AS sq;
for the case when your subquery yields multiple results.
It looks like you could use a regular expression query to match on the email address:
select * from table where email ~ '.*#he.com';
To return 1 from this query if there is a match:
select distinct 1 from table where email ~ '.*#he.com';
This will return a single row containing a column with 1 if there is a match, otherwise no rows at all. There are many other possible ways to construct such a query.
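To see the "one row containing 1, or no rows at all" behaviour in action, here is a sketch in Python's sqlite3. Note the swap: PostgreSQL's ~ operator is not available in SQLite, so the sketch registers a user-defined REGEXP function backed by Python's re module. The table name users and the helper has_match are assumptions made for the example.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X), so the pattern arrives first.
    if value is None:
        return 0
    return int(re.search(pattern, value) is not None)


def has_match(emails, pattern):
    """Return (1,) if any email matches pattern, otherwise None."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("REGEXP", 2, regexp)
    conn.execute("CREATE TABLE users (email TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [(e,) for e in emails])
    # DISTINCT collapses any number of matching rows into a single row holding 1.
    row = conn.execute(
        "SELECT DISTINCT 1 FROM users WHERE email REGEXP ?", (pattern,)
    ).fetchone()
    conn.close()
    return row


print(has_match(["he#he.com", "me#he.com"], r"#he\.com$"))  # (1,)
print(has_match(["someone#else.org"], r"#he\.com$"))        # None
```

The pattern here escapes the dot and anchors the end of the string, which is a little stricter than the '.*#he.com' used above.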
Let's say that your original query is:
select * from users where is_active = true;
And that you really want to match in any column (which is a bad idea for a lot of reasons), and you just want to check whether "*#he.com" matches any row. By the way, "*#he.com" is not a correct regexp; the correct form would be .*#he.com, but since there are no anchors (^ or $) you can just write #he.com:
select 1 from (
select * from users where is_active = true
) as x
where textin(record_out( x )) ~ '#he.com'
limit 1;
Of course, you can also select all columns:
select * from (
select * from users where is_active = true
) as x
where textin(record_out( x )) ~ '#he.com'
limit 1;

Concatenate several fields into one with SQL

I have three tables tag, page, pagetag
With the data below
page
ID NAME
1 page 1
2 page 2
3 page 3
4 page 4
tag
ID NAME
1 tag 1
2 tag 2
3 tag 3
4 tag 4
pagetag
ID PAGEID TAGID
1 2 1
2 2 3
3 3 4
4 1 1
5 1 2
6 1 3
I would like to get a string containing the corresponding tag names for each page with SQL in a single query. This is my desired output.
ID NAME TAGS
1 page 1 tag 1, tag 2, tag 3
2 page 2 tag 1, tag 3
3 page 3 tag 4
4 page 4
Is this possible with SQL?
I am using MySQL. Nonetheless, I would like a database vendor independent solution if possible.
Sergio del Amo:
However, I am not getting the pages without tags. I guess I need to write my query with left outer joins.
SELECT pagetag.id, page.name, group_concat(tag.name)
FROM
(
page LEFT JOIN pagetag ON page.id = pagetag.pageid
)
LEFT JOIN tag ON pagetag.tagid = tag.id
GROUP BY page.id;
Not a very pretty query, but should give you what you want - pagetag.id and group_concat(tag.name) will be null for page 4 in the example you've posted above, but the page shall appear in the results.
Yep, you can do it across the three tables with something like the below:
SELECT page_tag.id, page.name, group_concat(tag.name)
FROM tag, page, page_tag
WHERE page_tag.page_id = page.id AND page_tag.tag_id = tag.id
GROUP BY page.id;
Has not been tested, and could probably be written a tad more efficiently, but should get you started!
Also, MySQL is assumed, so may not play so nice with MSSQL! And MySQL isn't wild about hyphens in field names, so changed to underscores in the above examples.
As far as I'm aware SQL92 doesn't define how string concatenation should be done. This means that most engines have their own method.
If you want a database independent method, you'll have to do it outside of the database.
(untested in all but Oracle)
Oracle
SELECT field1 || ', ' || field2
FROM table;
MS SQL
SELECT field1 + ', ' + field2
FROM table;
MySQL
SELECT concat(field1,', ',field2)
FROM table;
PostgreSQL
SELECT field1 || ', ' || field2
FROM table;
I got a solution playing with joins. The query is:
SELECT
page.id AS id,
page.name AS name,
tagstable.tags AS tags
FROM page
LEFT OUTER JOIN
(
SELECT pagetag.pageid, GROUP_CONCAT(distinct tag.name) AS tags
FROM tag INNER JOIN pagetag ON tagid = tag.id
GROUP BY pagetag.pageid
)
AS tagstable ON tagstable.pageid = page.id
GROUP BY page.id
And this will be the output:
id name tags
---------------------------
1 page 1 tag2,tag3,tag1
2 page 2 tag1,tag3
3 page 3 tag4
4 page 4 NULL
Is it possible to boost the query speed writing it another way?
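For reference, the LEFT OUTER JOIN plus GROUP_CONCAT approach can be reproduced with Python's sqlite3, since SQLite also provides group_concat. This is only a sketch against the question's sample data, not a MySQL benchmark. It leaves out the outer GROUP BY, on the assumption that page.id is unique, because the subquery already yields one row per page. The order of names inside group_concat is not guaranteed, so the helper (pages_with_tags, an invented name) sorts them in Python.

```python
import sqlite3

# Sample data copied from the question.
PAGES = [(1, "page 1"), (2, "page 2"), (3, "page 3"), (4, "page 4")]
TAGS = [(1, "tag 1"), (2, "tag 2"), (3, "tag 3"), (4, "tag 4")]
PAGETAGS = [(1, 2, 1), (2, 2, 3), (3, 3, 4), (4, 1, 1), (5, 1, 2), (6, 1, 3)]


def pages_with_tags():
    """Return (id, name, sorted tag names) for every page, tagged or not."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE pagetag (id INTEGER PRIMARY KEY, pageid INTEGER, tagid INTEGER)"
    )
    conn.executemany("INSERT INTO page VALUES (?, ?)", PAGES)
    conn.executemany("INSERT INTO tag VALUES (?, ?)", TAGS)
    conn.executemany("INSERT INTO pagetag VALUES (?, ?, ?)", PAGETAGS)
    query = """
        SELECT page.id, page.name, tagstable.tags
        FROM page
        LEFT OUTER JOIN (
            SELECT pagetag.pageid AS pageid,
                   group_concat(DISTINCT tag.name) AS tags
            FROM tag
            INNER JOIN pagetag ON pagetag.tagid = tag.id
            GROUP BY pagetag.pageid
        ) AS tagstable ON tagstable.pageid = page.id
        ORDER BY page.id
    """
    result = []
    for page_id, name, tags in conn.execute(query):
        # tags is NULL (None) for pages without any tag, e.g. page 4.
        tag_list = sorted(tags.split(",")) if tags else []
        result.append((page_id, name, tag_list))
    conn.close()
    return result


for row in pages_with_tags():
    print(row)
```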
pagetag.id and group_concat(tag.name) will be null for page 4 in the example you've posted above, but the page shall appear in the results.
You can use the COALESCE function to remove the Nulls if you need to:
select COALESCE(pagetag.id, '') AS id ...
It will return the first non-null value from its list of parameters.
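A tiny sketch of that behaviour, run through Python's sqlite3 (COALESCE behaves the same way in this respect in SQLite as in MySQL); the literal values are arbitrary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each COALESCE call returns its first argument that is not NULL.
row = conn.execute(
    "SELECT COALESCE(NULL, ''), COALESCE(NULL, NULL, 'x', 'y'), COALESCE(5, 0)"
).fetchone()
conn.close()

print(row)  # ('', 'x', 5)
```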
I think you may need to use multiple updates.
Something like (not tested):
select ID as 'PageId', Name as 'PageName', null as 'Tags'
into #temp
from [PageTable]
declare @lastOp int
set @lastOp = 1
while @lastOp > 0
begin
update p
set p.tags = isnull(tags + ', ', '' ) + t.[Tagid]
from #temp p
inner join [TagTable] t
on p.[PageId] = t.[PageId]
where p.tags not like '%' + t.[Tagid] + '%'
set @lastOp = @@rowcount
end
select * from #temp
Ugly though.
That example is T-SQL, but I think MySQL has equivalents to everything used.