How to count total occurrences of "#" in an email address in Impala/Hive

Table A
Clientid Email-add
123 Abc#gmail.com
245 Ghi#gm#yahoo.com
I want to find email addresses where the count of "#" is greater than 1. That means I should get Clientid 245 as my answer.

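I did not find an Impala function that counts occurrences directly, but the query below should work. Removing every '#' shortens the string by one character per occurrence, so the difference in length is the number of '#' characters. If the column is really named Email-add, it has to be quoted with backticks because of the hyphen:
select *
from A
where (length(`Email-add`) - length(replace(`Email-add`, '#', ''))) > 1;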
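The length-difference trick can be checked quickly outside Impala. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in engine (an assumption, not Impala or Hive), with the two sample rows from the question and standard double-quote identifier quoting instead of backticks.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for Impala/Hive.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE A (Clientid INTEGER, "Email-add" TEXT)')
conn.executemany(
    "INSERT INTO A VALUES (?, ?)",
    [(123, "Abc#gmail.com"), (245, "Ghi#gm#yahoo.com")],
)

# Removing every '#' shortens the string by one character per occurrence,
# so the length difference equals the number of '#' characters.
query = """
    SELECT Clientid,
           length("Email-add") - length(replace("Email-add", '#', '')) AS hash_count
    FROM A
    WHERE length("Email-add") - length(replace("Email-add", '#', '')) > 1
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only Clientid 245 is returned, because its address is the only one with more than one "#".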

Related

max consecutive digits in a string

I am trying to find the maximum number of consecutive digits that appear in a string column. Here is an example to illustrate what I am trying to do. If I have a table called email:
email
lucas1234#gmail.com
fer12#gmail.com
lupal#gmail.com
carlos1perez222#gmail.com
carlos11perez222#gmail.com
lucila1#gmail.com
my expected output would be
email count_cons_digits
lucas1234#gmail.com 4
fer12#gmail.com 2
lupal#gmail.com 0
carlos1perez222#gmail.com 3
carlos11perez222#gmail.com 3
lucila1#gmail.com 1
Note that this question is very similar to:
Number of consecutive digits in a column string
The difference is that the function from those answers does not handle emails with only one digit (like lucila1#gmail.com): the expected result is 1, but the proposed function gives 0. It also fails when the email contains two separate runs of consecutive digits (carlos11perez222#gmail.com): the expected output is 3, but it gives 5.
Consider the approach below:
select *,
ifnull((select length(digits) len
from unnest(regexp_extract_all(email, r'\d+')) digits
order by len desc
limit 1
), 0) as count_cons_digits
from your_table
Applied to the sample data in your question, the output matches the expected result.
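The same idea (extract every run of digits, then take the longest one) can be sketched outside SQL. The Python version below is an illustration only; the function name `max_consecutive_digits` is hypothetical, and `[0-9]` is used so that only ASCII digits are counted.

```python
import re


def max_consecutive_digits(text: str) -> int:
    """Return the length of the longest run of consecutive ASCII digits."""
    runs = re.findall(r"[0-9]+", text)
    # default=0 covers strings that contain no digits at all.
    return max((len(run) for run in runs), default=0)


emails = [
    "lucas1234#gmail.com",
    "fer12#gmail.com",
    "lupal#gmail.com",
    "carlos1perez222#gmail.com",
    "carlos11perez222#gmail.com",
    "lucila1#gmail.com",
]
counts = {email: max_consecutive_digits(email) for email in emails}
for email, count in counts.items():
    print(email, count)
```

The `default=0` argument plays the same role as `ifnull(..., 0)` in the query: it handles addresses without any digits.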
You may also try this approach using regex:
WITH email AS
(SELECT 'lucas1234#gmail.com' mail,
UNION ALL SELECT 'fer12#gmail.com',
UNION ALL SELECT 'lupal#gmail.com',
UNION ALL SELECT 'carlos1perez222#gmail.com',
UNION ALL SELECT 'carlos11perez222#gmail.com',
UNION ALL SELECT 'lucila1#gmail.com')
SELECT email,
(LENGTH(REGEXP_REPLACE(REGEXP_REPLACE(email.mail, r'[A-Za-z]+\d+[A-Za-z]+', ''),r'[A-Za-z.#]+',''))) AS count_cons_digits,
FROM email;

How do I display multiple fields when using distinct count?

I am trying to get a count of total different first and last names with the same email address, and I'm not sure where to go from here. Field1 and Field2 are in the same table.
My output should have the concatenated field, field 1, and field 2.
SELECT COUNT(DISTINCT(CONCAT(first_name,last_name)))
FROM `datalake.core.profile_snapshot`
WHERE classic_country = 'US' and
email.personal = 'example#provider.net'
LIMIT 1000
Appreciate any help!
SELECT
first_name
,last_name
,email_address
,count(1) as number
FROM datalake.core.profile_snapshot
GROUP BY
first_name
,last_name
,email_address
If you want to reduce the result set to a particular email address then just add a where clause to do so.
I've used email_address instead of email.personal.
LIMIT in SQL generally limits the number of rows returned; it does not filter. You need to use HAVING to filter on your aggregate.
Emails with more than 1000 distinct names:
SELECT email
/*Put a pipe character "|" between first and last name so that different names cannot concatenate to the same value,
such as Jane Doe and Jan Edoe. Not a realistic example, but concatenation without a separator could produce the same value*/
,COUNT(DISTINCT CONCAT(first_name,'|',last_name)) AS DistinctNames
FROM datalake.core.profile_snapshot
WHERE classic_country = 'US'
AND email.personal = 'example#provider.net' /*Can comment this out if you want to see all email with 1000+ distinct names*/
GROUP BY email
/*HAVING clause = WHERE clause for aggregates*/
HAVING COUNT(DISTINCT CONCAT(first_name,'|',last_name)) > 1000 /*1000 distinct names for each email*/
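To see why the separator matters and how HAVING filters on the aggregate, here is a small sketch using Python's built-in sqlite3 module. The table name `profile`, its rows, and the threshold of 1 (instead of 1000) are assumptions made up for illustration; SQLite's `||` operator stands in for CONCAT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profile (first_name TEXT, last_name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO profile VALUES (?, ?, ?)",
    [
        # 'ab' + 'c' and 'a' + 'bc' both concatenate to 'abc' without a separator.
        ("ab", "c", "x#provider.net"),
        ("a", "bc", "x#provider.net"),
        ("ab", "c", "x#provider.net"),
        ("jo", "ann", "y#provider.net"),
    ],
)

query = """
    SELECT email,
           COUNT(DISTINCT first_name || last_name)        AS without_separator,
           COUNT(DISTINCT first_name || '|' || last_name) AS with_separator
    FROM profile
    GROUP BY email
    HAVING COUNT(DISTINCT first_name || '|' || last_name) > 1
    ORDER BY email
"""
result = conn.execute(query).fetchall()
print(result)
```

Without the separator the two different names collapse into one value, so the count is too low; the HAVING clause drops the email that has only one distinct name.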

Count the matched values in sqlite

I want to count the matched values in data like the following (table1):
name id subject
maria 01 Math computer english
faro 02 Computer stat english
hina 03 Chemistry physics bio
The query below
Select *
from table1
where subject like '%english%' or
subject like '%stat%'
returns the first two rows.
But I also need to count the matched values, as in the output below:
count
1
2
0
(Because in the first row only one value matches, in the second row two values match, and in the third row there are no matches.)
Can I get that desired output?
You may try summing CASE expressions which check each condition:
SELECT
subject,
CASE WHEN subject LIKE '%english%' THEN 1 ELSE 0 END +
CASE WHEN subject LIKE '%stat%' THEN 1 ELSE 0 END AS count
FROM
yourTable;
If you instead wanted a count of the words in each subject that did not match either of the two keywords, you could try:
SELECT
subject,
LENGTH(subject) - LENGTH(REPLACE(subject, ' ', '')) + 1 -
( subject LIKE '%English%' ) - ( subject LIKE '%stat%' ) AS count
FROM yourTable;
SQLite evaluates a boolean condition to 1 for true and 0 for false, so you can add the conditions directly:
select *,
(subject like '%english%')
+
(subject like '%stat%') as count
from table1
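The boolean-sum query can be tried directly from Python's built-in sqlite3 module. This is a minimal sketch using the three sample rows from the question; the alias `match_count` is an assumption chosen to avoid naming a column `count`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (name TEXT, id TEXT, subject TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("maria", "01", "Math computer english"),
        ("faro", "02", "Computer stat english"),
        ("hina", "03", "Chemistry physics bio"),
    ],
)

# Each LIKE comparison evaluates to 1 or 0, so the sum is the number of
# keywords that matched in that row.
query = """
    SELECT name,
           (subject LIKE '%english%') + (subject LIKE '%stat%') AS match_count
    FROM table1
    ORDER BY id
"""
counts = conn.execute(query).fetchall()
print(counts)
```

Note that `'%stat%'` matches any subject containing those letters, so a value such as "statistics" would also count as a match.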

How to count a number of substring in a row via SQL?

I am working on a database representing a simple address book using MS Studio 2015 (C#) and MS SQL Server 2008. I have successfully added 'insert row' and 'remove row' methods to my code. Now I want to write a query (a stored procedure) that counts the occurrences of a substring in every row.
For example, I have the database which includes a table called Contacts:
PersonID Name Surname City Phone
1 Alice Karlsson Gotheburg 69-58-12
2 Mark Morrow Stockholm 48-48-48
3 Katherine Karlsson Gotheburg 69-58-16
If I search for and count 'th' in the table, I want to get the following result:
PersonID Name Surname City Phone Count
3 Katherine Karlsson Gotheburg 69-58-16 2
1 Alice Karlsson Gotheburg 69-58-12 1
I don't know how to do that. I've been googling all day but haven't found a satisfactory result. Here on stackoverflow.com I found a solution that returns the following result:
ColumnName ColumnValue
Contacts.City Gotheburg
Contacts.Name Katherine
Contacts.City Gotheburg
Please, give me any idea to compose a query returning the expected result.
Full-text search; is the expected result
UPD: 'th' is a substring I'm looking for in a row, so 'Agathe', 'th' and 'youth' should all be counted the same way.
You could try the following:
Select
PersonId,
Name,
Surname,
City,
Phone,
sum(count) as count
From
(
select
*,
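-- Each 'th' removed shortens the string by LEN('th') = 2 characters,
-- so dividing the total length difference by 2 gives the number of occurrences.
(Len(name) - LEN(REPLACE(name, 'th', '')) +
Len(surname) - LEN(REPLACE(surname, 'th', '')) +
Len(city) - LEN(REPLACE(city, 'th', ''))) / 2 as count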
from Contacts
where name like '%th%' or surname like '%th%' or city like '%th%'
)T
Group by PersonId, Name, Surname, City, Phone
Order by 6 desc
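The length-difference idea behind this query can be checked on the sample Contacts rows. The sketch below uses Python's built-in sqlite3 module as a stand-in for SQL Server (an assumption), divides the total length difference by the length of the search string, and names the result `match_count` (a hypothetical alias).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Contacts (PersonID INTEGER, Name TEXT, Surname TEXT, City TEXT, Phone TEXT)"
)
conn.executemany(
    "INSERT INTO Contacts VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Alice", "Karlsson", "Gotheburg", "69-58-12"),
        (2, "Mark", "Morrow", "Stockholm", "48-48-48"),
        (3, "Katherine", "Karlsson", "Gotheburg", "69-58-16"),
    ],
)

# Removing the substring shortens each column by len(needle) characters per
# occurrence, so the summed difference divided by len(needle) is the count.
query = """
    SELECT *
    FROM (
        SELECT PersonID, Name, Surname, City, Phone,
               ( (length(Name)    - length(replace(Name,    :needle, '')))
               + (length(Surname) - length(replace(Surname, :needle, '')))
               + (length(City)    - length(replace(City,    :needle, '')))
               ) / length(:needle) AS match_count
        FROM Contacts
    )
    WHERE match_count > 0
    ORDER BY match_count DESC
"""
matches = conn.execute(query, {"needle": "th"}).fetchall()
for row in matches:
    print(row)
```

SQLite's `replace` is case-sensitive, so this sketch only counts lowercase 'th'; whether SQL Server's REPLACE distinguishes case depends on the collation in use.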
What you are trying to achieve here is full-text search.
Please follow this link:
http://blog.sqlauthority.com/2008/09/05/sql-server-creating-full-text-catalog-and-index/
Create a full-text index
and use this script:
select * from yourtable
where freetext (*,'your_search_item')
Try it this way:
select * from Contacts where Contacts.City like '%th%' or
Contacts.Name like '%th%'
You need to create a table-valued function that loops over all rows, column by column, and searches for the substring while keeping a counter.
Inside the loop you can use built-in text-search functions such as CHARINDEX('th',Name+Surname+City,0),
which returns the position of the first occurrence of the substring in the text (or 0 if it is not found) ...
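The loop described here (find the next occurrence, bump a counter, continue searching after the match) can be sketched in a few lines. This is a Python illustration of the counting logic only, not a T-SQL table-valued function; the name `count_substring` is hypothetical, and `str.find` plays the role of CHARINDEX.

```python
def count_substring(text: str, needle: str) -> int:
    """Count non-overlapping occurrences of needle in text."""
    if not needle:
        raise ValueError("needle must not be empty")
    count = 0
    start = 0
    while True:
        # str.find returns -1 when there is no further occurrence.
        pos = text.find(needle, start)
        if pos == -1:
            return count
        count += 1
        # Continue searching just past the match that was found.
        start = pos + len(needle)


row = ("Katherine", "Karlsson", "Gotheburg")
total = sum(count_substring(value, "th") for value in row)
print(total)
```

Counting each column separately, as done here, avoids false matches that span two columns when the values are concatenated, for example a name ending in "t" followed by a surname starting with "h".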

Trouble Finding IDs with Duplicate Fields

My data looks like this:
ID Email
1 someone#hotmail.com
2 someone1#hotmail.com
3 someone2#hotmail.com
4 someone3#hotmail.com
5 someone4#hotmail.com
6 someone5#hotmail.com
There should be exactly 1 email per ID, but there's not.
> dim(data)
[1] 5071 2
> length(unique(data$Person_Onyx_Id))
[1] 5071
> length((data$Email))
[1] 5071
> length(unique(data$Email))
[1] 4481
So, I need to find the IDs with duplicated email addresses.
Seems like this should be easy, but I'm striking out:
> sqldf("select ID, count(Email) from data group by ID having count(Email) > 1")
[1] ID count(Email)
<0 rows> (or 0-length row.names)
I've also tried taking off the having clause and sending the result to an object and sorting the object by the count(Email)... it appears that every ID has count(Email) of 1...
I would dput the actual data but I can't due to the sensitivity of email addresses.
Are you sure you don't have the opposite condition, multiple IDs with the same email?
select Email, count(*)
from data
group by Email
having count(*) > 1;
My guess is that you have NULL emails. You could find this by using count(*) rather than count(email):
select ID, count(*)
from data
group by ID
having count(*) > 1;
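If the goal is the list of IDs that share an email with another ID, the grouping by Email can be used as a subquery. The sketch below uses Python's built-in sqlite3 module and made-up rows (the real data is not available, so the duplicates and the NULL are assumptions for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (ID INTEGER, Email TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?)",
    [
        (1, "someone#hotmail.com"),
        (2, "someone1#hotmail.com"),
        (3, "someone#hotmail.com"),   # same email as ID 1
        (4, "someone3#hotmail.com"),
        (5, None),                    # missing email
        (6, "someone1#hotmail.com"),  # same email as ID 2
    ],
)

# The subquery finds emails used by more than one row; the outer query
# returns every ID that carries one of those emails.
query = """
    SELECT ID, Email
    FROM data
    WHERE Email IN (
        SELECT Email FROM data GROUP BY Email HAVING COUNT(*) > 1
    )
    ORDER BY Email, ID
"""
duplicates = conn.execute(query).fetchall()
for row in duplicates:
    print(row)
```

Grouping by ID cannot find these rows, because each ID appears only once; the repetition is in the Email column.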