Postgres: regex and nested queries, something like Unix pipes

The command should output 1 if the pattern "*#he.com" appears in the row (excluding the headings):
user_id | username | email | passhash_md5 | logged_in | has_been_sent_a_moderator_message | was_last_checked_by_moderator_at_time | a_moderator
---------+----------+-----------+----------------------------------+-----------+-----------------------------------+---------------------------------------+-------------
9 | he | he#he.com | 6f96cfdfe5ccc627cadf24b41725caa4 | 0 | 1 | 2009-08-23 19:16:46.316272 |
In short, I want to chain many SELECT commands with regex, rather like Unix pipes. The output above comes from a SELECT command. A new SELECT command that matches the pattern should give me 1.

Did you mean
SELECT regexp_matches( (SELECT whatevername FROM users WHERE username='masi'), 'masi');
You obviously cannot feed the whole record (*) to regexp_matches, but I assume that is not your problem, since you mention nesting SQL queries in the subject.
Maybe you meant something like
SELECT regexp_matches( wn, 'masi' ) FROM (SELECT whatevername AS wn FROM users WHERE username LIKE '%masi%') AS sq;
for the case when your subquery yields multiple results.

It looks like you could use a regular expression query to match on the email address:
select * from users where email ~ '.*#he.com';
To return 1 from this query if there is a match:
select distinct 1 from users where email ~ '.*#he.com';
This will return a single row containing a column with 1 if there is a match, otherwise no rows at all. There are many other possible ways to construct such a query.
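To try the "distinct 1" idea outside Postgres, here is a minimal sketch using Python's standard-library sqlite3 module. SQLite has no ~ operator, so the sketch registers a helper named regexp backed by Python's re module (SQLite evaluates `X REGEXP Y` as a call to regexp(Y, X)). The second row and the "#nope" pattern are made up for contrast, and the dot before "com" is escaped because an unescaped . matches any character.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "value REGEXP pattern" by calling regexp(pattern, value).
    return value is not None and re.search(pattern, value) is not None


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.create_function("REGEXP", 2, regexp)
    conn.execute("CREATE TABLE users (user_id INTEGER, username TEXT, email TEXT)")
    # Row 9 comes from the question; row 10 is invented for contrast.
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?)",
        [(9, "he", "he#he.com"), (10, "masi", "masi#example.org")],
    )
    return conn


def has_match(conn, pattern):
    # Same idea as "select distinct 1 ... where email ~ pattern":
    # a single row holding 1 on a match, otherwise no rows at all.
    return conn.execute(
        "SELECT DISTINCT 1 FROM users WHERE email REGEXP ?", (pattern,)
    ).fetchall()


conn = make_db()
print(has_match(conn, r".*#he\.com"))  # [(1,)]
print(has_match(conn, r"#nope\.com"))  # []
```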

Let's say that your original query is:
select * from users where is_active = true;
And suppose you really want to match against any column (which is a bad idea for a lot of reasons), and you just want to check whether "*#he.com" matches any row. (By the way, this is not a correct regexp! The correct form would be .*#he.com, but since there are no anchors (^ or $) you can just write #he.com.)
select 1 from (
select * from users where is_active = true
) as x
where textin(record_out( x )) ~ '#he.com'
limit 1;
Of course, you can also select all columns:
select * from (
select * from users where is_active = true
) as x
where textin(record_out( x )) ~ '#he.com'
limit 1;
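The "turn the whole row into text, then regex it" approach can be sketched in Python with sqlite3. SQLite has no record_out()/textin(), so this sketch builds the row text by hand with || and a '|' separator (an assumption standing in for Postgres' record text form), wraps the answer's is_active filter in a subquery, and reuses a regexp helper registered for the REGEXP operator. The column list is trimmed and row 10 is invented.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "value REGEXP pattern" by calling regexp(pattern, value).
    return value is not None and re.search(pattern, value) is not None


conn = sqlite3.connect(":memory:")
conn.create_function("REGEXP", 2, regexp)
conn.execute(
    "CREATE TABLE users (user_id INTEGER, username TEXT, email TEXT, "
    "is_active INTEGER, a_moderator TEXT)"
)
# Row 9 follows the question's sample; row 10 is made up and inactive.
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?, ?, ?)",
    [(9, "he", "he#he.com", 1, None), (10, "masi", "masi#example.org", 0, None)],
)

# Hand-built text form of a row; coalesce() keeps a NULL column from
# turning the whole concatenation into NULL.
ROW_AS_TEXT = (
    "x.user_id || '|' || x.username || '|' || x.email || '|' || "
    "x.is_active || '|' || coalesce(x.a_moderator, '')"
)


def any_row_matches(pattern):
    # Match against every column of the rows the inner query returns.
    sql = (
        "SELECT 1 FROM (SELECT * FROM users WHERE is_active = 1) AS x "
        f"WHERE ({ROW_AS_TEXT}) REGEXP ? LIMIT 1"
    )
    return conn.execute(sql, (pattern,)).fetchall()


print(any_row_matches(r"#he\.com"))  # [(1,)]
print(any_row_matches("masi"))       # [] because row 10 is filtered out
```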


SQL WHERE condition when one does not return true, then try other

I have to query a table on two fields such that if the first field matches, the second is not checked, but if the first field does not match, the second field is checked against a value.
Something like:
SELECT * FROM table
WHERE cart_id=389 OR (cart_id IS NULL AND user_id=26)
But if the first condition succeeds, the second condition must not be checked.
Example:
Suppose the following is my table
id | cart_id | user_id
1 | 389 | 26
2 | null | 26
3 | 878 | 26
When querying for cart_id = 389 and user_id = 26, I should get back only record 1 and NOT 2.
When querying for cart_id = 1 and user_id = 26, I should get back only record 2 and NOT 1 and 3.
The only way I can think of is to do this in two steps and check the result of the first step in the second:
with the_cart as (
SELECT *
FROM the_table
WHERE cart_id=389
)
select *
from the_cart
union all
select *
from the_table
where cart_id IS NULL
AND user_id=26
and not exists (select * from the_cart);
If the first query (using cart_id=389) returns something the second query from the union will not be run (or more precisely return no rows) due to the not exists() condition.
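A quick way to check the fallback behaviour against the example data is to run the same CTE in SQLite through Python's sqlite3 module. This is a sketch: the table contents come from the question, while the named parameters :cart_id and :user_id are added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (id INTEGER, cart_id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO the_table VALUES (?, ?, ?)",
    [(1, 389, 26), (2, None, 26), (3, 878, 26)],
)

# Same shape as the answer: try cart_id first, and only when that branch
# is empty fall back to the user's rows that have no cart.
FALLBACK_SQL = """
WITH the_cart AS (
    SELECT * FROM the_table WHERE cart_id = :cart_id
)
SELECT * FROM the_cart
UNION ALL
SELECT * FROM the_table
WHERE cart_id IS NULL
  AND user_id = :user_id
  AND NOT EXISTS (SELECT * FROM the_cart)
"""


def find_rows(cart_id, user_id):
    rows = conn.execute(FALLBACK_SQL, {"cart_id": cart_id, "user_id": user_id})
    return sorted(row[0] for row in rows)


print(find_rows(389, 26))  # [1]
print(find_rows(1, 26))    # [2]
```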
Based on your updated example data, your where clause would be:
WHERE cart_id = 389 and user_id = 26
but given how trivial that is, it’s difficult to believe that’s really what you’ve been asking all along.
===
Update, based on the latest example:
WHERE (cart_id = 389 and user_id = 26)
OR (cart_id is null and user_id = 26)

Filtering a column based on having some value in one of the rows in SQL or Presto Athena

In Athena, I am trying to output only users that have a specific value in some, but not all, of their rows.
Suppose I have the table below.
I want all users that have the value 100 in at least one of their rows but also have a value other than 100 in other rows.
user | value
A | 1
B | 2
A | 100
D | 3
A | 4
C | 3
C | 5
D | 100
So in this example I would want only users A and D, because only they have both a 100 and a non-100 value.
I tried grouping by user, creating an array of values per user, and then checking whether the array contains 100, but I couldn't manage to do it in Presto.
I also thought about converting rows to columns and then checking whether one of the columns equals 100.
Are those solutions too complex? Does anybody know how to implement them, or have a better, simpler solution?
The users that have at least one value of 100 can be found with this SQL:
SELECT DISTINCT user
FROM some_table
WHERE value = 100
But I assume you are after all tuples of user and value where the user has at least one value of 100. This can be accomplished by using the query above in a slightly more complex query:
WITH matching_users AS (
SELECT DISTINCT user
FROM some_table
WHERE value = 100
)
SELECT user, value
FROM matching_users
LEFT JOIN some_table USING (user)
You can use a subquery, as below, to achieve the required output:
SELECT * FROM your_table
WHERE User IN(
SELECT DISTINCT User
FROM your_table
WHERE Value = 100
)
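The CTE-plus-join query and the IN-subquery query should return the same tuples. This sketch runs both in SQLite via Python's sqlite3 on the question's sample data; SQLite stands in for Athena/Presto here, and "user" is double-quoted defensively in case it clashes with a keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE some_table ("user" TEXT, value INTEGER)')
conn.executemany(
    "INSERT INTO some_table VALUES (?, ?)",
    [("A", 1), ("B", 2), ("A", 100), ("D", 3),
     ("A", 4), ("C", 3), ("C", 5), ("D", 100)],
)

# First answer: collect matching users, then join back for their rows.
CTE_JOIN = """
WITH matching_users AS (
    SELECT DISTINCT "user" FROM some_table WHERE value = 100
)
SELECT "user", value
FROM matching_users
LEFT JOIN some_table USING ("user")
"""

# Second answer: filter the table with an IN subquery.
IN_SUBQUERY = """
SELECT "user", value FROM some_table
WHERE "user" IN (SELECT DISTINCT "user" FROM some_table WHERE value = 100)
"""

join_rows = sorted(conn.execute(CTE_JOIN).fetchall())
in_rows = sorted(conn.execute(IN_SUBQUERY).fetchall())
print(join_rows)  # [('A', 1), ('A', 4), ('A', 100), ('D', 3), ('D', 100)]
```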
If you just want the users, I would go for aggregation:
select user
from t
group by user
having sum(case when value = 100 then 1 else 0 end) > 0;
If 100 is the maximum possible value, this can be simplified to:
having max(value) = 100
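The HAVING clause above only requires at least one 100. The question also asks for at least one value other than 100, which is one more conditional sum. This sketch runs both variants in SQLite through Python's sqlite3; user "E" (only ever 100) is invented to show the difference, and "user" is quoted defensively.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE t ("user" TEXT, value INTEGER)')
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("A", 1), ("B", 2), ("A", 100), ("D", 3), ("A", 4),
     ("C", 3), ("C", 5), ("D", 100), ("E", 100)],  # "E" is made up
)

# The answer's condition: at least one row equal to 100.
AT_LEAST_ONE_100 = """
SELECT "user" FROM t GROUP BY "user"
HAVING sum(CASE WHEN value = 100 THEN 1 ELSE 0 END) > 0
ORDER BY "user"
"""

# Adds the question's other requirement: at least one row that is not 100.
BOTH_CONDITIONS = """
SELECT "user" FROM t GROUP BY "user"
HAVING sum(CASE WHEN value = 100 THEN 1 ELSE 0 END) > 0
   AND sum(CASE WHEN value <> 100 THEN 1 ELSE 0 END) > 0
ORDER BY "user"
"""

answer_users = [row[0] for row in conn.execute(AT_LEAST_ONE_100)]
both_users = [row[0] for row in conn.execute(BOTH_CONDITIONS)]
print(answer_users)  # ['A', 'D', 'E']
print(both_users)    # ['A', 'D']
```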

SQL find similar content

I have a table ITEMS with a column URL. All I need is to find similar rows in items.url.
Example of similar rows:
ITEM_ID | URL
1 | www.google.com/test1/test2/test3.php
2 | www.yahoo.com/test1/test2/test3.php
3 | www.google.com/test5.php
4 | www.facebook.com/test5.php
As you can see, the URLs are the same except for the domain.
My query should be something like:
SELECT * FROM ITEMS
WHERE URL LIKE `%google.com%`...
AND `here code probably` ???
My query should return ITEM_ID 2 and 4.
You could group by the substring starting from the first '/' character and take the max ID in each group. Using PostgreSQL syntax, it should look like this:
SELECT *
FROM ITEMS t
WHERE t.item_id IN (SELECT MAX(s.item_id)
FROM ITEMS s
GROUP BY SUBSTRING(s.url FROM POSITION('/' IN s.url)))
ORDER BY t.item_id;
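Here is a minimal sketch of the same grouping in SQLite via Python's sqlite3, on the question's four rows. SQLite has no POSITION(... IN ...) or SUBSTRING(... FROM ...), so substr(url, instr(url, '/')) is swapped in as the equivalent "path from the first slash" expression (both are 1-based and keep the slash).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (item_id INTEGER, url TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [
    (1, "www.google.com/test1/test2/test3.php"),
    (2, "www.yahoo.com/test1/test2/test3.php"),
    (3, "www.google.com/test5.php"),
    (4, "www.facebook.com/test5.php"),
])

# substr(url, instr(url, '/')) plays the role of
# SUBSTRING(url FROM POSITION('/' IN url)): everything from the first '/'.
SQL = """
SELECT t.item_id
FROM items t
WHERE t.item_id IN (SELECT MAX(s.item_id)
                    FROM items s
                    GROUP BY substr(s.url, instr(s.url, '/')))
ORDER BY t.item_id
"""
result = [row[0] for row in conn.execute(SQL)]
print(result)  # [2, 4]
```

Note that a URL whose path appears only once still forms a group of one and would be returned too; adding HAVING COUNT(*) > 1 to the subquery would leave such rows out.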
Update: if you want only the google.com URLs that have a similar row on a different domain, you could use an EXISTS filter:
SELECT *
FROM ITEMS t
WHERE t.url LIKE 'www.google.com%'
AND EXISTS (SELECT 1
FROM ITEMS s
WHERE s.url NOT LIKE 'www.google.com%'
AND SUBSTRING(t.url FROM POSITION('/' IN t.url)) =
SUBSTRING(s.url FROM POSITION('/' IN s.url)));

PostgreSQL differences in value 1 and 11

I am querying my DB, which has only one table:
id | value
----------
1 | 1|2|4
2 | 11|23
3 | 1|4|3|11
4 | 2|4|11
5 | 5|6|11
6 | 12|15|16
7 | 3|1|4
8 | 5|2|1
The query was: SELECT * FROM table_name WHERE value LIKE '%1%'
I want to select only rows with the value 1, but I also get rows with the value 11.
How can I tell them apart in SQL?
If you have to stick with this broken design, it's probably better to use Postgres' ability to parse a string into an array.
This is more robust than using a like condition:
select *
from the_table
where string_to_array(value,'|') @> array['1']
Or, perhaps a bit easier to read:
select *
from the_table
where '1' = any (string_to_array(value,'|'))
Using the contains operator @> you can also search for more than one value at a time:
select *
from the_table
where string_to_array(value,'|') @> array['1','2']
This will return all rows where value contains both 1 and 2.
SQLFiddle example: http://sqlfiddle.com/#!15/8793d/2
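The difference between LIKE and splitting into elements can be seen with plain Python on the question's data. This is a sketch of the semantics only, not Postgres: `needle in value` mimics LIKE '%1%', `needle in value.split("|")` mimics '1' = any(string_to_array(value, '|')), and a subset check mimics the @> contains operator.

```python
# Sample rows from the question (id -> value).
ROWS = {
    1: "1|2|4", 2: "11|23", 3: "1|4|3|11", 4: "2|4|11",
    5: "5|6|11", 6: "12|15|16", 7: "3|1|4", 8: "5|2|1",
}


def like_contains(needle):
    # Like value LIKE '%1%': a plain substring test, so "11" and "12" match "1".
    return [i for i, v in ROWS.items() if needle in v]


def any_equals(needle):
    # Like '1' = any(string_to_array(value, '|')): compares whole elements.
    return [i for i, v in ROWS.items() if needle in v.split("|")]


def contains_all(*needles):
    # Like string_to_array(value, '|') @> array[...]: every needle present.
    return [i for i, v in ROWS.items() if set(needles) <= set(v.split("|"))]


print(like_contains("1"))      # [1, 2, 3, 4, 5, 6, 7, 8]
print(any_equals("1"))         # [1, 3, 7, 8]
print(contains_all("1", "2"))  # [1, 8]
```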
I strongly recommend normalizing your schema so that every column stores only atomic values.
Without that, you are forced into some nasty tricks, e.g. with arrays:
select * from t
where '1' = any (string_to_array(value, '|'))
or, with pattern matching:
select * from t
where '1' similar to value
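The SIMILAR TO trick works because the column is used as the *pattern*: in SIMILAR TO, | means alternation and the pattern must match the entire string, so '1' similar to '1|2|4' is true while '1' similar to '11|23' is false. Python's re.fullmatch has the same two properties, which allows a quick check on the question's data. This assumes the values contain only digits and |, so no other metacharacters behave differently between the two syntaxes.

```python
import re

# Sample rows from the question (id -> value).
ROWS = {
    1: "1|2|4", 2: "11|23", 3: "1|4|3|11", 4: "2|4|11",
    5: "5|6|11", 6: "12|15|16", 7: "3|1|4", 8: "5|2|1",
}


def similar_to_trick(needle):
    # needle SIMILAR TO value: value is the pattern, '|' separates
    # alternatives, and the whole needle must match one of them.
    return [i for i, v in ROWS.items() if re.fullmatch(v, needle)]


print(similar_to_trick("1"))   # [1, 3, 7, 8]
print(similar_to_trick("11"))  # [2, 3, 4, 5]
```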

Count particular substring text within column

I have a Hive table, 'UK.Choices', with a column 'Fruit', whose rows look like this:
AppleBananaAppleOrangeOrangePears
BananaKiwiPlumAppleAppleOrange
KiwiKiwiOrangeGrapesAppleKiwi
etc.
etc.
There are 2.5M rows and the rows are much longer than the above.
I want to count the number of times the word 'Apple' appears.
For the example above, it is:
Number of 'Apple' = 5
My SQL so far is:
select Fruit from UK.Choices
Then I copy and paste the results into Excel in chunks of 300,000 rows, since I'm more proficient there and can do this with formulas. The problem is that it takes up to an hour and a half to generate each chunk.
Does anyone know a quicker way to do this that bypasses Excel? I can do simple things like counts using WHERE clauses, but something like the above is a little beyond me right now. Please help.
Thank you.
I think I am two years too late, but since I was looking for the same answer and finally managed to solve it, I thought it would be a good idea to post it here.
Here is how I do it.
Solution 1:
+-----------------------------------+---------------------------+-------------+-------------+
| Fruits | Transform 1 | Transform 2 | Final Count |
+-----------------------------------+---------------------------+-------------+-------------+
| AppleBananaAppleOrangeOrangePears | #Banana#OrangeOrangePears | ## | 2 |
| BananaKiwiPlumAppleAppleOrange | BananaKiwiPlum##Orange | ## | 2 |
| KiwiKiwiOrangeGrapesAppleKiwi | KiwiKiwiOrangeGrapes#Kiwi | # | 1 |
+-----------------------------------+---------------------------+-------------+-------------+
Here is the code for it:
SELECT length(regexp_replace(regexp_replace(fruits, "Apple", "#"), "[A-Za-z]", "")) as number_of_apples
FROM fruits;
If your fruits column also contains digits or other special characters, just extend the second regexp to remove them as well. Remember that in Hive you may need to escape a character with \\ instead of a single \.
Solution 2:
SELECT size(split(fruits,"Apple"))-1 as number_of_apples
FROM fruits;
This first splits the string using "Apple" as a separator, producing an array, and the size function returns the length of that array. Note that the array has one more element than there are separators.
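Both solutions can be mirrored in plain Python to check the arithmetic on the three sample rows. This is a sketch of the logic, not Hive: re.sub and re.split stand in for regexp_replace and split/size, and the "#" marker assumes the data never contains "#" itself.

```python
import re

FRUITS = [
    "AppleBananaAppleOrangeOrangePears",
    "BananaKiwiPlumAppleAppleOrange",
    "KiwiKiwiOrangeGrapesAppleKiwi",
]


def count_by_replace(s, word="Apple"):
    # Solution 1: replace each word with '#', strip the remaining letters,
    # and the length of what is left is the number of occurrences.
    return len(re.sub("[A-Za-z]", "", s.replace(word, "#")))


def count_by_split(s, word="Apple"):
    # Solution 2: n separators split a string into n + 1 pieces.
    # Python's re.split keeps leading/trailing empty pieces, so the
    # "minus one" holds even when the string starts or ends with the word.
    return len(re.split(re.escape(word), s)) - 1


print([count_by_replace(s) for s in FRUITS])  # [2, 2, 1]
print(sum(count_by_split(s) for s in FRUITS))  # 5
```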
This is straightforward if there is a delimiter (e.g. a comma) between the fruit names. The idea is to split the column into an array and then explode the array into multiple rows with the explode function.
SELECT fruit, count(1) as count FROM
( SELECT
explode(split(Fruit, ',')) as fruit
FROM UK.Choices ) X
GROUP BY fruit
From your example, it looks like fruits are delimited by capital letters. One idea is to split the column on capital letters, assuming no two fruits share the same suffix.
SELECT fruit_suffix, count(1) as count FROM
( SELECT
explode(split(Fruit, '[A-Z]')) as fruit_suffix
FROM UK.Choices ) X
WHERE fruit_suffix <> ''
GROUP BY fruit_suffix
The downside is that the output will not include the first letter of each fruit:
pple - 5
range - 4
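The split-on-capitals idea (explode, then GROUP BY) can be sketched in Python with re.split and collections.Counter, which reproduces the "pple - 5" and "range - 4" output above. The count_words function is a hypothetical variant not taken from the answer: it matches each capitalized word with re.findall instead of splitting, so the first letter is kept.

```python
import re
from collections import Counter

FRUITS = [
    "AppleBananaAppleOrangeOrangePears",
    "BananaKiwiPlumAppleAppleOrange",
    "KiwiKiwiOrangeGrapesAppleKiwi",
]


def count_suffixes(rows):
    # Like explode(split(Fruit, '[A-Z]')) ... WHERE fruit_suffix <> '':
    # the capitals are consumed by the split, leaving each name's tail.
    counts = Counter()
    for s in rows:
        counts.update(piece for piece in re.split("[A-Z]", s) if piece != "")
    return counts


def count_words(rows):
    # Hypothetical variant: match "capital + lowercase run" instead of
    # splitting, so every fruit keeps its first letter.
    counts = Counter()
    for s in rows:
        counts.update(re.findall("[A-Z][a-z]*", s))
    return counts


print(count_suffixes(FRUITS)["pple"], count_suffixes(FRUITS)["range"])  # 5 4
print(count_words(FRUITS)["Apple"], count_words(FRUITS)["Orange"])      # 5 4
```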
I think you want to do this in a single select and use Hive's if UDF to sum over the different cases. Something like the following:
select sum( if( fruit like '%Apple%' , 1, 0 ) ) as apple_count,
sum( if( fruit like '%Orange%', 1, 0 ) ) as orange_count
from UK.Choices
where ID > start and ID < end;
This avoids joining two separate subqueries, as the SQL Server/Oracle query below does.
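Note that sum(if(... like ..., 1, 0)) counts *rows* containing the word, not occurrences, so each row contributes at most 1. This sketch runs the equivalent query in SQLite via Python's sqlite3 on the three sample rows; CASE WHEN stands in for Hive's if(), and the table name choices, the id column, and the :start_id/:end_id bounds are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE choices (id INTEGER, fruit TEXT)")
conn.executemany("INSERT INTO choices VALUES (?, ?)", [
    (1, "AppleBananaAppleOrangeOrangePears"),
    (2, "BananaKiwiPlumAppleAppleOrange"),
    (3, "KiwiKiwiOrangeGrapesAppleKiwi"),
])

# CASE WHEN ... THEN 1 ELSE 0 END plays the role of Hive's if(cond, 1, 0).
SQL = """
SELECT sum(CASE WHEN fruit LIKE '%Apple%' THEN 1 ELSE 0 END) AS apple_count,
       sum(CASE WHEN fruit LIKE '%Orange%' THEN 1 ELSE 0 END) AS orange_count
FROM choices
WHERE id > :start_id AND id < :end_id
"""
counts = conn.execute(SQL, {"start_id": 0, "end_id": 4}).fetchone()
print(counts)  # (3, 3): rows containing each word, although 'Apple' occurs 5 times
```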
I have no experience with Hive, I'm afraid, so this may or may not work. But on SQL Server, Oracle, etc., I'd do something like this:
Assuming that you have an int PK called ID on the table, something along the lines of:
select AppleCount, OrangeCount, AppleCount - OrangeCount score
from
(
select count(*) as AppleCount
from UK.Choices
where ID > start and ID < end
and Fruit like '%Apple%'
) a,
(
select count(*) as OrangeCount
from UK.Choices
where ID > start and ID < end
and Fruit like '%Orange%'
) o
I'd leave the division by the total count to the end, when you have all the rows in the spreadsheet and can count them there.
However, I'd urgently ask my boss to let me change the Fruit field to be a table with an FK to Choices and one fruit name per row. Unless this is something you can't do in Hive, this design is something that makes kittens cry.
P.S. I'd missed that you wanted the count of occurrences of Apple, which this won't do. I'm leaving my answer up because I reckon my "However..." paragraph is actually a good answer. :(
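To illustrate the normalized design suggested in the "However..." paragraph (one fruit per row with an FK back to Choices), here is a sketch in SQLite via Python's sqlite3. The table and column names choices, choice_fruits, choice_id, and pos are hypothetical, and loading by splitting on capital letters is an assumption that only fits this sample data. Once each fruit is its own row, counting occurrences is a plain GROUP BY.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE choices (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE choice_fruits ("
    " choice_id INTEGER REFERENCES choices(id),"
    " pos INTEGER,"
    " fruit TEXT)"
)

RAW = {
    1: "AppleBananaAppleOrangeOrangePears",
    2: "BananaKiwiPlumAppleAppleOrange",
    3: "KiwiKiwiOrangeGrapesAppleKiwi",
}
for choice_id, packed in RAW.items():
    conn.execute("INSERT INTO choices (id) VALUES (?)", (choice_id,))
    # One row per fruit, keeping its position within the original string.
    for pos, fruit in enumerate(re.findall("[A-Z][a-z]*", packed)):
        conn.execute(
            "INSERT INTO choice_fruits VALUES (?, ?, ?)", (choice_id, pos, fruit)
        )

# With one fruit per row, counting occurrences needs no string tricks.
counts = dict(conn.execute(
    "SELECT fruit, count(*) FROM choice_fruits GROUP BY fruit"
).fetchall())
print(counts["Apple"])  # 5
```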