SQL Window Function to get addresses with more than 1 unique last name present (Snowflake)

I have a Snowflake table which includes addresses, state, first names and last names.
I would like to get a query that shows me only the addresses where more than 1 individual with a different last name is present.
So for example, assume that I have
address | fname | lname |State
10 lake road| John | Smith |FL
10 lake road| Julie | Gallagher|FL
3 gator cove| Jack | Hoyt |FL
3 gator cove| Debra | Hoyt |FL
I would like the query to return only 1 row in that example: 10 lake road, because it's the only house where there is more than 1 unique last name present.
I am currently using
SELECT distinct a.address, a.fname, a.lname, a.state
FROM clients_addresses a
WHERE a.state = 'FL'
qualify count(1) over( partition by a.lname) > 1
order by a.address
However, this is just returning the addresses where there is more than 1 person, it doesn't care if the last name is repeated. That's what I'm trying to avoid.
I can't quite understand where the query is going wrong. Snowflake doesn't like using any distinct keyword after the initial select, and even if I use it, it only returns 1 occurrence of each address, but it's still just addresses with more than 1 person, even if there was only 1 last name in the address.
It doesn't need to involve the keyword "qualify", I know Snowflake also accepts other things such as subselects that might help with this problem.

I would like the query to return only 1 row in that example: 10 lake road.
This sounds like aggregation:
SELECT a.address, count(*)
FROM clients_addresses a
WHERE a.state = 'FL'
GROUP BY a.address
HAVING COUNT(DISTINCT a.lname) > 1;
If you want the original rows (which is not what your question asks for), you can use:
SELECT a.*
FROM clients_addresses a
WHERE a.state = 'FL'
QUALIFY COUNT(DISTINCT a.lname) OVER (PARTITION BY a.address) > 1;
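The aggregation approach can be checked against the sample rows from the question. The following is only a minimal sketch: it uses Python's built-in sqlite3 module as a stand-in for Snowflake (an assumption made so the example is runnable anywhere), and it covers the GROUP BY / HAVING form only, since SQLite has no QUALIFY clause.

```python
import sqlite3

# In-memory SQLite database standing in for the Snowflake table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clients_addresses (address TEXT, fname TEXT, lname TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO clients_addresses VALUES (?, ?, ?, ?)",
    [
        ("10 lake road", "John", "Smith", "FL"),
        ("10 lake road", "Julie", "Gallagher", "FL"),
        ("3 gator cove", "Jack", "Hoyt", "FL"),
        ("3 gator cove", "Debra", "Hoyt", "FL"),
    ],
)

# Keep only addresses that have more than one distinct last name.
mixed_addresses = conn.execute(
    """
    SELECT a.address, COUNT(*) AS residents
    FROM clients_addresses a
    WHERE a.state = 'FL'
    GROUP BY a.address
    HAVING COUNT(DISTINCT a.lname) > 1
    """
).fetchall()

print(mixed_addresses)  # [('10 lake road', 2)]
```

Only 10 lake road survives, because 3 gator cove has two residents but a single distinct last name.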

Related

How can I delete completely duplicate rows from a query, without having a unique value for it?

I'm having an issue getting information from an MS Access Database table. I need a count, but it must not take duplicate rows into account, which means that I first need to delete all duplicate rows.
Here's an example to illustrate what I need:
Code | Name
12 | George
20 | John
12 | George
33 | John
First I need to delete both rows that share the same code, and then I need a count per name over the rest of the table data. For example, this is the result that I'm expecting:
Name | Count
John | 2
I already have a query that does that for me, but it is taking around 1 hour to return around 5000 rows and I need something more efficient. My query:
select name, count(*) from Table
where name = '" + input_name + "'
and code in (select code from Table group by code
having count(code) = 1)
group by name
order by count(name) desc;
I would appreciate any suggestion.
Rather than using in, I might suggest filtering the original dataset in a subquery, e.g.:
select u.name, count(*)
from (select t.code, t.name from yourtable t group by t.code, t.name having count(*) = 1) u
group by u.name
Here, change yourtable to the name of your table.
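As a quick check of the subquery approach, here is a small sketch that runs it on the sample data from the question. It uses Python's sqlite3 module in place of MS Access (an assumption for the sake of a runnable example); the table name yourtable is the placeholder from the answer.

```python
import sqlite3

# In-memory SQLite database standing in for the Access table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (code INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [(12, "George"), (20, "John"), (12, "George"), (33, "John")],
)

# The inner query keeps only (code, name) pairs that occur exactly once;
# the outer query then counts the surviving rows per name.
name_counts = conn.execute(
    """
    SELECT u.name, COUNT(*)
    FROM (
        SELECT t.code, t.name
        FROM yourtable t
        GROUP BY t.code, t.name
        HAVING COUNT(*) = 1
    ) u
    GROUP BY u.name
    """
).fetchall()

print(name_counts)  # [('John', 2)]
```

Both George rows share code 12 and are dropped, leaving the two John rows.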

How to get dates based on months that appear more than once?

I'm trying to get the months of Employees' birthdays that are found in at least 2 rows.
I've tried to union the birthday information table with itself, supposing that I could iterate through the rows and get the months that appear multiple times.
So the question is: how do I get the birthdays whose months occur more than once?
SELECT DISTINCT e.EmployeeID, e.City, e.BirthDate
FROM Employees e
GROUP BY e.BirthDate, e.City, e.EmployeeID
HAVING COUNT(MONTH(b.BirthDate))=COUNT(MONTH(e.BirthDate))
UNION
SELECT DISTINCT b.EmployeeID, b.City, b.BirthDate
FROM Employees b
GROUP BY b.EmployeeID, b.BirthDate, b.City
HAVING ...
Given table:
| 1 | City1 | 1972-03-26|
| 2 | City2 | 1979-12-13|
| 3 | City3 | 1974-12-16|
| 4 | City3 | 1979-09-11|
Expected result :
| 2 | City2 |1979-12-13|
| 3 | City3 |1974-12-16|
Think of it in steps.
First, we'll find the months that have more than one birthday in them. That's the sub-query, below, which I'm aliasing as i for "inner query". (Substitute MONTH(i.Birthdate) into the SELECT list for the 1 if you want to see which months qualify.)
Then, in the outer query (o), you want all the fields, so I'm cheating and using SELECT *. Theoretically, a WHERE IN would work here, but IN can have unfortunate side effects if a NULL comes back, so I never use it. Instead, there's a correlated sub-query, which is to say we look for any results where the month from the outer query is equal to one of the months that make the cut in the inner (correlated sub-) query.
When using a correlated sub-query in the WHERE clause, the SELECT list doesn't matter. You could put 1/0 and it won't throw an error. But I always use SELECT 1 to show that the inner query isn't actually returning any results to the outer query. It's just there to look for, well, the correlation between the two data sets.
SELECT
    *
FROM
    #table AS o
WHERE
    EXISTS
    (
        SELECT
            1
        FROM
            #table AS i
        WHERE
            MONTH(i.Birthdate) = MONTH(o.Birthdate)
        GROUP BY
            MONTH(i.Birthdate)
        HAVING
            COUNT(*) > 1
    );
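To see the correlated EXISTS pattern run on the table from the question, here is a minimal sketch using Python's sqlite3 module. Two things are assumptions for the sake of a runnable example: the table is named employees, and SQLite's strftime('%m', ...) replaces MONTH(), which SQLite does not provide.

```python
import sqlite3

# In-memory SQLite database; table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (EmployeeID INTEGER, City TEXT, BirthDate TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "City1", "1972-03-26"),
        (2, "City2", "1979-12-13"),
        (3, "City3", "1974-12-16"),
        (4, "City3", "1979-09-11"),
    ],
)

# Outer row o is kept only if its birth month appears more than once overall.
shared_month_rows = conn.execute(
    """
    SELECT o.EmployeeID, o.City, o.BirthDate
    FROM employees AS o
    WHERE EXISTS (
        SELECT 1
        FROM employees AS i
        WHERE strftime('%m', i.BirthDate) = strftime('%m', o.BirthDate)
        GROUP BY strftime('%m', i.BirthDate)
        HAVING COUNT(*) > 1
    )
    ORDER BY o.EmployeeID
    """
).fetchall()

print(shared_month_rows)
# [(2, 'City2', '1979-12-13'), (3, 'City3', '1974-12-16')]
```

Only the two December birthdays are returned, matching the expected result in the question.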
Seems to be an odd requirement.
This might help with some tweaks. Works in Oracle.
SELECT DATE FROM TABLE WHERE EXTRACT(MONTH FROM DATE)=EXTRACT(MONTH FROM SOMEDATE);
Give this a try and you may be able to dispense with your UNION:
SELECT
EmployeeId
, City
, BirthDate
FROM Employees
GROUP BY
EmployeeId
, City
, BirthDate
HAVING COUNT(Month(BirthDate)) > 2
Here is another approach using GROUP_CONCAT. It's not exactly what you're looking for but it might do the job. Eric's approach is better though. (Note: This is for MySQL)
SELECT GROUP_CONCAT(EmployeeID) EmployeeID, BirthDate, COUNT(*) DupeCount
FROM Employees
GROUP BY MONTH(BirthDate)
HAVING DupeCount> 1;

SQL - how to transpose only some row values into column headers without pivot

I have a table similar to this:
stud_ID | first_name | last_name | email | col_num | user_value
1 | tom | smith | | 50 | Retail
1 | tom | smith | | 60 | Product
2 | Sam | wright | | 50 | Retail
2 | Sam | wright | | 60 | Sale
but need to convert it to: (basically transpose 'col_num' to column headers and change 50 to function, 60 to department)
stud_ID | first_name | last_name | email | Function | Department
1 | tom | smith | | Retail | Product
2 | Sam | wright | | Retail | Sale
Unfortunately Pivot doesn't work in my system, just wondering if there is any other way to do this please?
The code that I have so far (sorry for the long list):
SELECT c.person_id_external as stu_id,
c.lname,
c.fname,
c.mi,
a.cpnt_id,
a.cpnt_typ_id,
a.rev_dte,
a.rev_num,
cp.cpnt_title AS cpnt_desc,
a.compl_dte,
a.CMPL_STAT_ID,
b.cmpl_stat_desc,
b.PROVIDE_CRDT,
b.INITIATE_LEVEL1_SURVEY,
b.INITIATE_LEVEL3_SURVEY,
a.SCHD_ID,
a.TOTAL_HRS,
a.CREDIT_HRS,
a.CPE_HRS,
a.CONTACT_HRS,
a.TUITION,
a.INST_NAME,
--a.COMMENTS,
a.BASE_STUD_ID,
a.BASE_CPNT_TYP_ID,
a.BASE_CPNT_ID,
a.BASE_REV_DTE,
a.BASE_CMPL_STAT_ID,
a.BASE_COMPL_DTE,
a.ES_USER_NAME,
a.INTERNAL,
a.GRADE_OPT,
a.GRADE,
a.PMT_ORDER_TICKET_NO,
a.TICKET_SEQUENCE,
a.ORDER_ITEM_ID,
a.ESIG_MESSAGE,
a.ESIG_MEANING_CODE_ID,
a.ESIG_MEANING_CODE_DESC,
a.CPNT_KEY,
a.CURRENCY_CODE,
c.EMP_STAT_ID,
c.EMP_TYP_ID,
c.JL_ID,
c.JP_ID,
c.TARGET_JP_ID,
c.JOB_TITLE,
c.DMN_ID,
c.ORG_ID,
c.REGION_ID,
c.CO_ID,
c.NOTACTIVE,
c.ADDR,
c.CITY,
c.STATE,
c.POSTAL,
c.CNTRY,
c.SUPER,
c.COACH_STUD_ID,
c.HIRE_DTE,
c.TERM_DTE,
c.EMAIL_ADDR,
c.RESUME_LOCN,
c.COMMENTS,
c.SHIPPING_NAME,
c.SHIPPING_CONTACT_NAME,
c.SHIPPING_ADDR,
c.SHIPPING_ADDR1,
c.SHIPPING_CITY,
c.SHIPPING_STATE,
c.SHIPPING_POSTAL,
c.SHIPPING_CNTRY,
c.SHIPPING_PHON_NUM,
c.SHIPPING_FAX_NUM,
c.SHIPPING_EMAIL_ADDR,
c.STUD_PSWD,
c.PIN,
c.PIN_DATE,
c.ENCRYPTED,
c.HAS_ACCESS,
c.BILLING_NAME,
c.BILLING_CONTACT_NAME,
c.BILLING_ADDR,
c.BILLING_ADDR1,
c.BILLING_CITY,
c.BILLING_STATE,
c.BILLING_POSTAL,
c.BILLING_CNTRY,
c.BILLING_PHON_NUM,
c.BILLING_FAX_NUM,
c.BILLING_EMAIL_ADDR,
c.SELF_REGISTRATION,
c.SELF_REGISTRATION_DATE,
c.ACCESS_TO_ORG_FIN_ACT,
c.NOTIFY_DEV_PLAN_ITEM_ADD,
c.NOTIFY_DEV_PLAN_ITEM_MOD,
c.NOTIFY_DEV_PLAN_ITEM_REMOVE,
c.NOTIFY_WHEN_SUB_ITEM_COMPLETE,
c.NOTIFY_WHEN_SUB_ITEM_FAILURE,
c.LOCKED,
c.PASSWORD_EXP_DATE,
c.SECURITY_QUESTION,
c.SECURITY_ANSWER,
c.ROLE_ID,
c.IMAGE_ID,
c.GENDER,
c.PAST_SERVICE,
c.LST_UNLOCK_TSTMP,
c.MANAGE_SUB_SP,
c.MANAGE_OWN_SP,
d.col_num,
d.user_value
FROM pa_cpnt_evthst a,
pa_cmpl_stat b,
pa_student c,
pv_course cp,
pa_stud_user d
WHERE a.cmpl_stat_id = b.cmpl_stat_id
AND a.stud_id = c.stud_id
AND cp.cpnt_typ_id(+) = a.cpnt_typ_id
AND cp.cpnt_id(+) = a.cpnt_id
AND cp.rev_dte(+) = a.rev_dte
AND a.CPNT_TYP_ID != 'SYSTEM_PROGRAM_ENTITY'
AND c.stud_id = d.stud_id
AND d.col_num in ('10','30','50','60')
I would just use conditional aggregation:
select stud_ID, first_name, last_name, email,
max(case when col_num = 50 then user_value end) as function,
max(case when col_num = 60 then user_value end) as department
from t
group by stud_ID, first_name, last_name, email;
Your code seems to have nothing to do with the sample data. I do notice however that you are using implicit join syntax. You really need to learn how to use proper, explicit, standard JOIN syntax.
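The conditional aggregation can be tried on the sample data as follows. This is a sketch under stated assumptions: it uses Python's sqlite3 module rather than the asker's database, the table is named t as in the answer, and the empty email column is stored as NULL.

```python
import sqlite3

# In-memory SQLite database holding the sample rows (email left NULL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (stud_ID INTEGER, first_name TEXT, last_name TEXT, "
    "email TEXT, col_num INTEGER, user_value TEXT)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "tom", "smith", None, 50, "Retail"),
        (1, "tom", "smith", None, 60, "Product"),
        (2, "Sam", "wright", None, 50, "Retail"),
        (2, "Sam", "wright", None, 60, "Sale"),
    ],
)

# Each CASE picks user_value for one col_num; MAX collapses the group to one row.
pivoted = conn.execute(
    """
    SELECT stud_ID, first_name, last_name, email,
           MAX(CASE WHEN col_num = 50 THEN user_value END) AS "Function",
           MAX(CASE WHEN col_num = 60 THEN user_value END) AS "Department"
    FROM t
    GROUP BY stud_ID, first_name, last_name, email
    ORDER BY stud_ID
    """
).fetchall()

for row in pivoted:
    print(row)
```

Each student ends up on a single row, with the col_num 50 value under Function and the col_num 60 value under Department.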
I'm assuming you are on an old version such as SQL Server 2000, which predates the PIVOT operator. What you need to do in that case is create a script with one cursor.
This cursor will build a text with something like this:
string var = "CREATE TABLE #Report (Col1 VARCHAR(20), Col2 VARCHAR(20), " + ColumnName
That way you can create a temp table on the fly; at the end you will need to do a SELECT from your temp table to get your pivot table ready.
It's not that easy if you are not familiar with cursors.
OR
if there are only few values on your 'pivot' column and they are not going to grow you can also do something like this:
Pivot using SQL Server 2000
I'm unable to understand your code, so I'll just assume the table shown in the sample data is called stud (because of stud_ID).
So here is what I think can do the work of pivot.
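SELECT ISNULL(s1.stud_ID, s2.stud_ID) AS stud_ID,
ISNULL(s1.first_name, s2.first_name) AS first_name,
ISNULL(s1.last_name, s2.last_name) AS last_name,
ISNULL(s1.email, s2.email) AS email,
s1.user_value AS [Function], s2.user_value AS Department
-- Each derived table is restricted to one col_num value before joining,
-- so rows for other col_num values cannot leak in through the outer join
FROM (SELECT * FROM stud WHERE col_num = 50) s1
FULL OUTER JOIN (SELECT * FROM stud WHERE col_num = 60) s2
ON s1.stud_ID = s2.stud_ID -- Assuming stud_ID is primary key, else join on all primary keys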
Explanation: I'm just trying to simulate here what PIVOT does. For every column you want, you create a new table in the JOIN and constrain it to only one value in your col_num column. For example, if there are no values for 50 in s1, the OUTER JOIN will make it NULL and we need to pull the records from s2.
Note: If you need more than 2 new columns, then you can use COALESCE instead of ISNULL

Case Statement for multiple criteria

I would like to ignore some of the results of my query because, for all intents and purposes, they are duplicates. Because of the way the request was made, we need to use this hierarchy, and although we are seeing different Company_Name values, we need to ignore one of the results.
Query:
SELECT
COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM
some_table AS A12
GROUP BY
2
ORDER BY
3 ASC, 2 ASC
This code omits half a dozen joins and WHERE clauses that are not germane to this question.
Results:
Customer_Name_Count Company_Name Total_Sales
-------------------------------------------------------------
1 3 Blockbuster 1,000
2 6 Jimmy's Bar 1,500
3 6 Jimmy's Restaurant 1,500
4 9 Impala Hotel 2,000
5 12 Sports Drink 2,500
In the above set, we can see that numbers 2 & 3 have the same count and the same total_sales number and similar company names. Is there a way to create a case statement that takes these 3 factors into consideration and then drops one or the other for Jimmy's enterprises? The other issue is that this has to be variable as there are other instances where this happens. And I would only want this to happen if the count and sales number match each other with a similar name in the company name.
Desired result:
Customer_Name_Count Company_Name Total_Sales
--------------------------------------------------------------
1 3 Blockbuster 1,000
2 6 Jimmy's Bar 1,500
3 9 Impala Hotel 2,000
4 12 Sports Drink 2,500
It looks like the other answers are accurate if you assume that the Company_IDs are the same for both.
If the Company_IDs are different for Jimmy's Bar and Jimmy's Restaurant, then you can use something like this. I suggest you get functional users involved and do some data clean-up, or else you'll be maintaining this every time this issue arises:
SELECT
COUNT(DISTINCT CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END) AS Customer_Name_Count
,CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END AS Company_Name
,SUM(A12.Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY CASE
WHEN A12.Company_Name = 'Name2' THEN 'Name1'
ELSE A12.Company_Name
END
Your problem is that the joins you are using are multiplying the number of rows. Somewhere along the way, multiple names are associated with exactly the same entity (which is why the numbers are the same). You can fix this by aggregating by the right id:
SELECT COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
MAX(Company_Name) as Company_Name,
SUM(Total_Sales) AS Total_Sales
FROM some_table AS A12
GROUP BY Company_id -- I'm guessing the column is something like this
ORDER BY 3 ASC, 2 ASC;
This might actually overstate the sales (I don't know). Better would be fixing the join so it only returned one name. One possibility is that it is a type-2 dimension, meaning that there is a time component for values that change over time. You may need to restrict the join to a single time period.
You need to have a function that returns a common name for the companies and then use DISTINCT:
SELECT DISTINCT
Customer_Name_Count,
dbo.GetCommonName(Company_Name) as Company_Name,
Total_Sales
FROM dbo.theTable
You can use ROW_NUMBER as a window function to number the rows within each (Customer_Name_Count, Total_Sales) pair, then keep only the rows where rn = 1:
SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY Customer_Name_Count, Total_Sales ORDER BY Company_Name) rn
    FROM (
        SELECT
            COUNT(DISTINCT A12.Company_name) AS Customer_Name_Count,
            Company_Name,
            SUM(Total_Sales) AS Total_Sales
        FROM
            some_table AS A12
        GROUP BY
            Company_Name
    ) t1
) t2
WHERE rn = 1
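The ROW_NUMBER technique can be sketched on the result set shown in the question. The assumptions here: Python's sqlite3 module is used (window functions need SQLite 3.25 or later), and the already-aggregated rows are loaded into a hypothetical table named summary, because the joins that produce those counts are not shown. Note that this only compares the count and the sales total; it does not check that the company names are similar.

```python
import sqlite3

# In-memory SQLite database; "summary" is a hypothetical table holding the
# aggregated result from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE summary (Customer_Name_Count INTEGER, Company_Name TEXT, Total_Sales INTEGER)"
)
conn.executemany(
    "INSERT INTO summary VALUES (?, ?, ?)",
    [
        (3, "Blockbuster", 1000),
        (6, "Jimmy's Bar", 1500),
        (6, "Jimmy's Restaurant", 1500),
        (9, "Impala Hotel", 2000),
        (12, "Sports Drink", 2500),
    ],
)

# Number rows within each (count, sales) pair and keep the first by name.
deduplicated = conn.execute(
    """
    SELECT Customer_Name_Count, Company_Name, Total_Sales
    FROM (
        SELECT s.*,
               ROW_NUMBER() OVER (
                   PARTITION BY Customer_Name_Count, Total_Sales
                   ORDER BY Company_Name
               ) AS rn
        FROM summary s
    ) ranked
    WHERE rn = 1
    ORDER BY Customer_Name_Count
    """
).fetchall()

for row in deduplicated:
    print(row)
```

Jimmy's Bar sorts before Jimmy's Restaurant, so it is the row that is kept for the (6, 1500) pair.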

Display a blank row between every unique row?

I have a simple query like:
SELECT employee, ITEM_TYPE, COUNT(ITEM_TYPE)
FROM hr_database
So the output may look like
BOB MUGS 4
BOB PENCILS 10
CAT MUGS 2
CAT PAPERCLIPS 7
SAL MUGS 11
But for readability, I want to put a blank row between each employee in the output, like this:
BOB MUGS 4
BOB PENCILS 10

CAT MUGS 2
CAT PAPERCLIPS 7

SAL MUGS 11
Is there a way to do this in Oracle SQL? So far, I found this link but it doesn't match what I need. I'm thinking to use a WITH in the query?
You can do it in the database, but this type of processing should really be done at the application layer.
But, it is kind of an amusing trick to figure out how to do it in the database, and that is your specific question:
WITH e AS (
SELECT employee, ITEM_TYPE, COUNT(ITEM_TYPE) as cnt
FROM hr_database
GROUP BY employee, ITEM_TYPE
)
SELECT (case when cnt is not null then employee end) as employee,
item_type, cnt
FROM (select employee, item_type, cnt, 1 as x from e union all
select distinct employee, NULL, NULL, 2 as x from e
) e
ORDER BY e.employee, x;
I emphasize, though, that this is really for amusement and perhaps for understanding better how SQL works. In the real world, you do this type of work at the application layer.
A summary of how this works: the UNION ALL brings in one additional row for each employee. The x column is a sort priority, because you have to sort the result set to get the separator rows into the right place. The CASE expression is needed to keep the employee name out of the first column on those separator rows; cnt should never be NULL for the valid rows.
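The separator-row trick can be sketched with a small data set. Several things are assumptions here: Python's sqlite3 module replaces Oracle, the counts are smaller than in the question to keep the setup short, and the inner column is renamed emp_key (a name chosen for this example) so that the ORDER BY cannot be confused with the blanked-out employee column. Sorting by item_type as a third key is an addition for a deterministic order.

```python
import sqlite3

# In-memory SQLite database with one row per item handed out.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hr_database (employee TEXT, item_type TEXT)")
conn.executemany(
    "INSERT INTO hr_database VALUES (?, ?)",
    [("BOB", "MUGS"), ("BOB", "MUGS"), ("BOB", "PENCILS"), ("CAT", "MUGS")],
)

rows_with_separators = conn.execute(
    """
    WITH e AS (
        SELECT employee, item_type, COUNT(item_type) AS cnt
        FROM hr_database
        GROUP BY employee, item_type
    )
    SELECT CASE WHEN cnt IS NOT NULL THEN emp_key END AS employee,
           item_type, cnt
    FROM (
        -- x = 1 marks real rows, x = 2 marks one separator row per employee
        SELECT employee AS emp_key, item_type, cnt, 1 AS x FROM e
        UNION ALL
        SELECT DISTINCT employee, NULL, NULL, 2 FROM e
    ) s
    ORDER BY emp_key, x, item_type
    """
).fetchall()

for row in rows_with_separators:
    print(row)
```

Each employee's rows are followed by one all-NULL row, which a client would typically render as a blank line; this includes a trailing separator after the last employee.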
You can try like this with normal union & distinct
select emp,item_type,cnt from
(select distinct ' ' as emp,' ' as item_type ,' ' as cnt, employee
from hr_database
union
select employee as emp,item_type ,to_char(count(item_type)) as cnt, employee
from hr_database
group by employee,item_type)a
order by a.employee