How do I display multiple fields when using distinct count? - sql

I am trying to count the distinct first-name/last-name combinations that share the same email address, and I'm not sure where to go from here. first_name and last_name are in the same table.
My output should include the concatenated name, first_name, and last_name.
SELECT COUNT(DISTINCT(CONCAT(first_name,last_name)))
FROM `datalake.core.profile_snapshot`
WHERE classic_country = 'US' and
email.personal = 'example#provider.net'
LIMIT 1000
Appreciate any help!

SELECT
first_name
,last_name
,email_address
,count(1) as number
FROM datalake.core.profile_snapshot
GROUP BY
first_name
,last_name
,email_address
If you want to restrict the result set to a particular email address, just add a WHERE clause.
I've used email_address in place of email.personal.
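A minimal sketch of that GROUP BY answer with the WHERE clause added, run against an in-memory SQLite database through Python's sqlite3 module. SQLite stands in for BigQuery here, the rows are made up, and SQLite concatenates with || rather than CONCAT. The query also returns the concatenated name next to first_name and last_name, as the question asked.

```python
import sqlite3

# Hypothetical sample data; SQLite is only standing in for BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE profile_snapshot (first_name TEXT, last_name TEXT, email_address TEXT)"
)
conn.executemany(
    "INSERT INTO profile_snapshot VALUES (?, ?, ?)",
    [
        ("Jane", "Doe", "example#provider.net"),
        ("Jane", "Doe", "example#provider.net"),
        ("John", "Roe", "example#provider.net"),
        ("Ann", "Lee", "other#provider.net"),
    ],
)

# One row per (first_name, last_name, email_address) with its row count,
# restricted to a single address by the WHERE clause.
rows = conn.execute(
    """
    SELECT first_name || ' ' || last_name AS full_name,
           first_name,
           last_name,
           email_address,
           COUNT(1) AS number
    FROM profile_snapshot
    WHERE email_address = ?
    GROUP BY first_name, last_name, email_address
    ORDER BY first_name
    """,
    ("example#provider.net",),
).fetchall()

for row in rows:
    print(row)
```

The number of rows returned is the number of distinct names for that address; the last column shows how many records each name has.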

In SQL, LIMIT caps the number of rows returned; it does not filter them. To filter on an aggregate, you need HAVING.
Email with 1000+ Distinct Names
SELECT email.personal AS email
/*Put a separator character "|" between first and last name so that different names cannot concatenate to the same value,
such as Jane Doe and Jan Edoe. Not a realistic example, but without a separator concatenation could produce the same "value"*/
,COUNT(DISTINCT CONCAT(first_name,'|',last_name)) AS DistinctNames
FROM datalake.core.profile_snapshot
WHERE classic_country = 'US'
AND email.personal = 'example#provider.net' /*Comment this out to see every email with more than 1000 distinct names*/
GROUP BY email.personal
/*HAVING is the WHERE clause for aggregates*/
HAVING COUNT(DISTINCT CONCAT(first_name,'|',last_name)) > 1000 /*more than 1000 distinct names per email*/
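A small sketch of the HAVING plus separator idea, again using Python's sqlite3 as a stand-in for BigQuery with made-up rows and a tiny threshold instead of 1000. The rows "ab"/"c" and "a"/"bc" are hypothetical names chosen so that they collide when concatenated without a separator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE profile_snapshot (first_name TEXT, last_name TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO profile_snapshot VALUES (?, ?, ?)",
    [
        ("ab", "c", "shared#provider.net"),   # concatenates to "abc"
        ("a", "bc", "shared#provider.net"),   # also concatenates to "abc"
        ("x", "y", "single#provider.net"),
    ],
)


def distinct_names(sep: str, threshold: int):
    """Emails with more than `threshold` distinct first|last name pairs."""
    # WHERE runs before grouping, so the aggregate can only be filtered in HAVING.
    return conn.execute(
        """
        SELECT email, COUNT(DISTINCT first_name || ? || last_name) AS distinct_names
        FROM profile_snapshot
        GROUP BY email
        HAVING COUNT(DISTINCT first_name || ? || last_name) > ?
        ORDER BY email
        """,
        (sep, sep, threshold),
    ).fetchall()


print(distinct_names("", 0))   # no separator: the two names collapse into one
print(distinct_names("|", 1))  # with "|": they stay distinct
```

Without the separator the shared address reports a single distinct name; with "|" it correctly reports two.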

Related

BigQuery GROUP BY function still showing duplicates

I'm doing a query in BigQuery:
SELECT id FROM [table] WHERE city = 'New York City' GROUP BY id
The weird part is it shows duplicate ids, often right next to each other.
There is absolutely nothing different between the ids themselves. There are around 3 million rows in total for ~500k IDs, so there are a lot of duplicates, but that is by design. We figured GROUP BY would easily eliminate them, but we noticed discrepancies in the totals.
Is there a reason BigQuery's GROUP BY would work improperly?
Example of duplicate ID:
56abdb5b9a75d90003001df6
56abdb5b9a75d90003001df6
The most likely explanation is that your id is a STRING and those two ids really are different, because of spaces before or, most likely, after the characters that are visible to the eye.
I recommend adjusting your query like below:
SELECT REPLACE(id, ' ', '')
FROM [table]
WHERE city = 'New York City'
GROUP BY 1
Another option for troubleshooting is below:
SELECT id, LENGTH(id)
FROM [table]
WHERE city = 'New York City'
GROUP BY 1, 2
This lets you see whether those ids have the same length. My initial assumption was a space, but it can be any other character(s), including non-printable ones.
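To illustrate the diagnosis, here is a sketch with Python's sqlite3 standing in for BigQuery. The table name visits and the trailing-space row are hypothetical. Grouping on the raw id keeps two groups that print identically, LENGTH exposes the difference, and stripping spaces before grouping merges them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (id TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [
        ("56abdb5b9a75d90003001df6", "New York City"),
        ("56abdb5b9a75d90003001df6 ", "New York City"),  # invisible trailing space
    ],
)

# Grouping on the raw id (plus its length) shows two distinct groups.
raw = conn.execute(
    """
    SELECT id, LENGTH(id)
    FROM visits
    WHERE city = 'New York City'
    GROUP BY id, LENGTH(id)
    ORDER BY LENGTH(id)
    """
).fetchall()

# Removing spaces before grouping collapses them into one id.
cleaned = conn.execute(
    """
    SELECT REPLACE(id, ' ', '')
    FROM visits
    WHERE city = 'New York City'
    GROUP BY REPLACE(id, ' ', '')
    """
).fetchall()

print(raw)
print(cleaned)
```

REPLACE only handles spaces; if the length check points to some other hidden character, that character has to be stripped instead.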

How to count a number of substring in a row via SQL?

I am working on a database representing a simple address book through MS Studio 2015 (C#) and MS SQL Server 2008. I successfully added 'insert row' and 'remove row' methods in my code. Now I want to compose a query (a stored procedure) that counts the occurrences of a substring in every row.
For example, my database includes a table called Contacts:
PersonID Name Surname City Phone
1 Alice Karlsson Gotheburg 69-58-12
2 Mark Morrow Stockholm 48-48-48
3 Katherine Karlsson Gotheburg 69-58-16
If I search for and count 'th' in the table, I want to get the following result:
PersonID Name Surname City Phone Count
3 Katherine Karlsson Gotheburg 69-58-16 2
1 Alice Karlsson Gotheburg 69-58-12 1
I don't know how to do that. I've been googling all day but didn't find a satisfying result. Here on stackoverflow.com I found a solution that returns the following result:
ColumnName ColumnValue
Contacts.City Gotheburg
Contacts.Name Katherine
Contacts.City Gotheburg
Please give me any idea for composing a query that returns the expected result.
Full-text search; is the expected result
UPD: 'th' is the substring I'm looking for in a row, so it should count 'Agathe', 'th' and 'youth' the same way.
You could try the following:
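Select
PersonId,
Name,
Surname,
City,
Phone,
sum(count) as count
From
(
select
*,
/* Removing every 'th' shortens each value by LEN('th') characters per match.
   Replacing with '' rather than ' ' avoids miscounting values that end in 'th'
   (e.g. 'youth'), because LEN ignores trailing spaces. */
((Len(name) - LEN(REPLACE(name, 'th', ''))) +
(Len(surname) - LEN(REPLACE(surname, 'th', ''))) +
(Len(city) - LEN(REPLACE(city, 'th', '')))) / LEN('th') as count
from Contacts
where name like '%th%' or surname like '%th%' or city like '%th%'
)T
Group by PersonId, Name, Surname, City, Phone
Order by 6 desc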
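The length-difference trick can be checked against the question's sample data. This sketch uses Python's sqlite3 in place of SQL Server, so LENGTH replaces LEN (SQLite's LENGTH also counts trailing spaces) and || replaces +; the helper name count_substring is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Contacts (PersonID INTEGER, Name TEXT, Surname TEXT, City TEXT, Phone TEXT)"
)
conn.executemany(
    "INSERT INTO Contacts VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Alice", "Karlsson", "Gotheburg", "69-58-12"),
        (2, "Mark", "Morrow", "Stockholm", "48-48-48"),
        (3, "Katherine", "Karlsson", "Gotheburg", "69-58-16"),
    ],
)


def count_substring(sub: str):
    # Each removed occurrence of `sub` shortens the text by LENGTH(sub),
    # so (original length - shortened length) / LENGTH(sub) = number of matches.
    return conn.execute(
        """
        SELECT PersonID, Name, Surname, City, Phone,
               ((LENGTH(Name) - LENGTH(REPLACE(Name, :s, '')))
              + (LENGTH(Surname) - LENGTH(REPLACE(Surname, :s, '')))
              + (LENGTH(City) - LENGTH(REPLACE(City, :s, '')))) / LENGTH(:s) AS cnt
        FROM Contacts
        WHERE Name LIKE '%' || :s || '%'
           OR Surname LIKE '%' || :s || '%'
           OR City LIKE '%' || :s || '%'
        ORDER BY cnt DESC
        """,
        {"s": sub},
    ).fetchall()


for row in count_substring("th"):
    print(row)
```

Each row is already one contact, so the per-row expression is enough here; the outer GROUP BY in the T-SQL answer does not change the result. Note that SQLite's REPLACE is case-sensitive while its LIKE is not, so mixed-case matches can be filtered in but counted as zero.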
What you are trying to achieve here is full-text search.
Please follow this link:
http://blog.sqlauthority.com/2008/09/05/sql-server-creating-full-text-catalog-and-index/
Create a full-text index
and use this script:
select * from yourtable
where freetext (*,'your_search_item')
Try this:
select * from Contacts where Contacts.City like '%th%' or
Contacts.Name like '%th%'
You need to create a table-valued function that loops over all rows, column by column, seeking the substring with a counter.
Inside the loop you can use built-in text-search functions such as CHARINDEX('th',Name+Surname+City,0),
which gives the exact position of the substring inside the text.
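The loop idea can be sketched in plain Python, with str.find(sub, start) playing the role of CHARINDEX(sub, text, start). This is an illustration of the counting loop only, not a T-SQL table-valued function; the function names are hypothetical.

```python
def count_occurrences(text: str, sub: str) -> int:
    """Count non-overlapping occurrences of `sub`, CHARINDEX-style.

    str.find returns the position of the next match or -1
    (CHARINDEX returns a 1-based position, or 0 when not found).
    """
    if not sub:
        raise ValueError("substring must be non-empty")
    count = 0
    pos = text.find(sub, 0)
    while pos != -1:
        count += 1
        pos = text.find(sub, pos + len(sub))  # resume searching after this match
    return count


def count_per_row(rows, sub):
    # Search each column separately so a match cannot straddle two columns.
    return [sum(count_occurrences(col, sub) for col in row) for row in rows]


print(count_per_row([("Katherine", "Karlsson", "Gotheburg"),
                     ("Mark", "Morrow", "Stockholm")], "th"))
```

Searching each column separately matters: searching the concatenation Name+Surname+City can find a match that spans two columns, for example a name ending in "t" followed by a surname starting with "h".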

sql distinct on one column of a group

SELECT DISTINCT inta, name, PHN# FROM nydta.adres
WHERE inta <> ' '
I want DISTINCT to apply only to inta, because a lot of the time the phone is blank, so those rows are coming through. I do want all columns, but distinct on inta.
Secondly, inta is an internet (email) address column.
I would like to exclude one domain, say:
#excludethisdomain.com
The data looks like this:
ACCOUNT#ALLSTARS.COM GATES LOU 212-555-1212 ALLSTARREADING
PHERWESTBARN#MSN.COM BARN HEAT 212-555-1212
PHERWESTBARN#MSN.COM BARN RALP EARLS
So the second and third rows share the same email address, and I want only one of them returned.
Going by the comments under the question: if you want each distinct email with just one name, and it does not matter which name is picked for a given email, you can use subqueries to select the values:
select distinct
t.inta ,
(select top 1 a.name from nydta.adres a where a.inta=t.inta) name,
(select top 1 a.PHH from nydta.adres a where a.inta=t.inta) PHH
from
nydta.adres t
where
inta <> ' '
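An alternative sketch that keeps one whole row per address and also excludes a domain. It runs in Python's sqlite3 (window functions need SQLite 3.25 or later; ROW_NUMBER also exists in SQL Server and DB2), with a hypothetical table adres and phone column phh. Unlike two independent TOP 1 subqueries without ORDER BY, which may pick name and phone from different rows, ROW_NUMBER picks a single row per inta.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE adres (inta TEXT, name TEXT, phh TEXT)")
conn.executemany(
    "INSERT INTO adres VALUES (?, ?, ?)",
    [
        ("ACCOUNT#ALLSTARS.COM", "GATES LOU", "212-555-1212"),
        ("PHERWESTBARN#MSN.COM", "BARN HEAT", "212-555-1212"),
        ("PHERWESTBARN#MSN.COM", "BARN RALP", " "),
        ("SOMEONE#EXCLUDETHISDOMAIN.COM", "DOE JOHN", "212-555-0000"),
        (" ", "NO ADDRESS", "212-555-1111"),
    ],
)

rows = conn.execute(
    """
    SELECT inta, name, phh
    FROM (
        SELECT inta, name, phh,
               -- number the rows within each address; row 1 is the one kept
               ROW_NUMBER() OVER (PARTITION BY inta ORDER BY name) AS rn
        FROM adres
        WHERE inta <> ' '
          AND inta NOT LIKE '%#EXCLUDETHISDOMAIN.COM'
    ) AS t
    WHERE rn = 1
    ORDER BY inta
    """
).fetchall()

for row in rows:
    print(row)
```

The ORDER BY inside OVER decides which row wins for each address; ordering by name is an arbitrary choice here.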

How can I select the Nth row of a group of fields?

I have a very small database and need to return a field from a specific row.
My table looks like this (simplified):
Material_Reading Table
pointID Material_Name
123 WoodFloor
456 Carpet
789 Drywall
111 Drywall
222 Carpet
I need to group these together to see the different kinds (WoodFloor, Carpet, and Drywall), then select the one I want and have it returned. So my SELECT statement would put the different types in a list, and a variable would pick one of the rows, for example 1, 2 or 3.
I hope that makes sense. This is a somewhat non-standard implementation because it's unfortunately a FileMaker database, so instead of one big SQL statement doing everything I need, I will have several, each selecting the individual row that I indicate.
What I have tried so far:
SELECT DISTINCT Material_Name FROM MATERIAL_READING WHERE Room_KF = $roomVariable
This works and returns a list of all the material names in the room indicated by the room variable, but I can't get a specific one by supplying a row number.
I have tried LIMIT 1 OFFSET 1, but it gives an error; possibly FileMaker doesn't support it, or I am doing it wrong. I tried it like this:
SELECT DISTINCT Material_Name FROM MATERIAL_READING WHERE _Room_KF = $roomVariable ORDER BY Material_Name LIMIT 1 OFFSET 1
I am able to use ORDER BY like this:
SELECT DISTINCT Material_Name FROM MATERIAL_READING WHERE Room_KF = $roomVariable ORDER BY Material_Name
In MSSQL (SQL Server 2012 and later), use OFFSET ... FETCH:
SELECT DISTINCT Material_Name
FROM MATERIAL_READING
WHERE _Room_KF = 'roomVariable'
ORDER BY Material_Name
OFFSET N ROWS
FETCH NEXT X ROWS ONLY
where N is the number of rows to skip (retrieval starts at row N+1)
and X is the number of rows to retrieve. For example, OFFSET 2 ROWS FETCH NEXT 1 ROWS ONLY returns only the 3rd material.
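The same "skip n - 1 rows, take one" idea, sketched with Python's sqlite3, which uses LIMIT/OFFSET rather than OFFSET ... FETCH. The table contents mirror the question; the room key value 1 and the helper nth_material are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE material_reading (pointID INTEGER, Material_Name TEXT, Room_KF INTEGER)"
)
conn.executemany(
    "INSERT INTO material_reading VALUES (?, ?, ?)",
    [(123, "WoodFloor", 1), (456, "Carpet", 1), (789, "Drywall", 1),
     (111, "Drywall", 1), (222, "Carpet", 1)],
)


def nth_material(room: int, n: int):
    """Return the n-th (1-based) distinct material name in a room, or None."""
    if n < 1:
        raise ValueError("n is 1-based")
    row = conn.execute(
        """
        SELECT DISTINCT Material_Name
        FROM material_reading
        WHERE Room_KF = ?
        ORDER BY Material_Name
        LIMIT 1 OFFSET ?   -- skip n - 1 rows, then take one
        """,
        (room, n - 1),
    ).fetchone()
    return row[0] if row else None


print([nth_material(1, i) for i in (1, 2, 3, 4)])
```

The ORDER BY is what makes "row 2" meaningful; without it the database may return the distinct names in any order.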

How can I get a count of the unique lengths of a string in database rows?

I am using Oracle and have a table with 1000 rows. There is a last name field, and I want to know the lengths of that field, but not for every row: I want a count of the various lengths.
Example:
lastname:
smith
smith
Johnson
Johnson
Jackson
Baggins
There are two smiths, of length five, and four other names of length seven. I want my query to return:
7
5
If there were 1,000 names I expect to get all kinds of lengths.
I tried:
Select count(*) as total, lastname from myNames group by total
It didn't recognize total. Grouping by lastname just produces one group per distinct last name, which is expected but not what I need.
Can this be done in one SQL query?
SELECT Length(lastname)
FROM MyTable
GROUP BY Length(lastname)
select distinct(LENGTH(lastname)) from mynames;
Select count(*), Length(column_name) from table_name group by Length(column_name);
This will work for the different lengths in a single column.
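A quick check of the GROUP BY LENGTH answer against the question's sample names, using Python's sqlite3 in place of Oracle (both provide LENGTH). The attempt in the question failed because total is an aggregate; the fix is to group on the expression LENGTH(lastname) and count the rows in each group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mynames (lastname TEXT)")
conn.executemany(
    "INSERT INTO mynames VALUES (?)",
    [("smith",), ("smith",), ("Johnson",), ("Johnson",), ("Jackson",), ("Baggins",)],
)

# Group on the length expression rather than on lastname itself,
# then count how many rows fall into each length.
rows = conn.execute(
    """
    SELECT LENGTH(lastname) AS name_length, COUNT(*) AS total
    FROM mynames
    GROUP BY LENGTH(lastname)
    ORDER BY name_length DESC
    """
).fetchall()

print(rows)
```

Dropping COUNT(*) from the SELECT gives just the distinct lengths (7 and 5), which is what the question's expected output shows.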