Remove partial duplicates in SQL?

The resulting table (CSV) looks like this:
NAME ,TITLE ,YEAR ,QNTY ,CLUB ,PRICE ,LOWEST_CLUB ,LOWEST
Andy Aardverk ,Avarice is Good ,1998,1,Basic ,218.95, CARP ,215.95
Andy Aardverk ,Avarice is Good ,1998,1,Basic ,218.95, YRB Bronze ,215.95
Andy Aardverk ,Yon-juu Hachi ,1948,1,Basic ,44.95, CARP ,41.95
Boswell Biddles ,Not That London! ,2003,1,Basic ,12.5, CAA ,10
Boswell Biddles ,Not That London! ,2003,1,Basic ,12.5, Readers Digest ,10
Cary Cizek ,Ringo to Nashi ,1997,1,Basic ,32.95, YRB Gold ,29.95
Cary Cizek ,Ringo to Nashi ,1997,1,Basic ,32.95, York Club ,29.95
Cary Cizek ,Toronto Underground ,2001,1,YRB Gold ,14.45, York Club ,12.95
Egbert Engles ,Capricia's Conundrum ,1993,1,CARP ,13.45, Guelph Club ,12.95
Egbert Engles ,Tande mou nai ,2002,1,Basic ,112.95, Oprah ,104.95
Egbert Engles ,Tande mou nai ,2002,1,Basic ,112.95, YRB Silver ,104.95
Ekksdwl Qjksynn ,I don't think so ,2001,1,YRB Gold ,12.5, CAA ,11.5
George Wolf ,Math is fun! ,1995,1,YRB Silver ,13.5, CAA ,12
Jack Daniels ,Eigen Eigen ,1980,1,York Club ,57.95, Oprah ,56.95
Jack Daniels ,Okay Why Not? ,2001,1,York Club ,18.45, Oprah ,17.45
Jackie Johassen ,Getting into Snork U. ,2004,1,YRB Silver ,21.95, Waterloo Club ,20.45
Jackie Johassen ,Not That London! ,2003,1,Basic ,12.5, CAA ,10
Klive Kittlehart ,Will Snoopy find Lucy? ,1990,1,YRB Bronze ,14.95, YRB Gold ,12.95
Lux Luthor ,Is Math is fun? ,1996,1,Basic ,72.95, Oprah ,69.95
Lux Luthor ,Tropical Windsor ,2004,1,Basic ,18.95, Oprah ,17.95
Nigel Nerd ,Are my feet too big? ,1993,1,Basic ,13.95, CAA ,11.45
Nigel Nerd ,Dogs are not Cats ,1995,1,Basic ,35.95, UofT Club ,32.95
Phil Regis ,Databases made Real Hard ,2002,1,Basic ,39.95, Oprah ,35.95
Pretence Parker ,Tchuss ,2002,1,Basic ,24.95, Guelph Club ,21.95
Qfwfq ,The Earth is not Enough ,2003,1,YRB Gold ,37.37, Oprah ,36.37
Qfwfq ,Under Your Bed ,2004,1,Oprah ,14.85, CAA ,13.85
Suzy Sedwick ,Are my feet too big? ,1993,1,YRB Silver ,12.95, Oprah ,11.45
Tracy Turnip ,Will Snoopy find Lucy? ,1990,1,Basic ,15.95, Readers Digest ,13.95
Tracy Turnip ,Will Snoopy find Lucy? ,1990,1,Basic ,15.95, YRB Silver ,13.95
Tracy Turnip ,Yon-juu Hachi ,1948,1,Readers Digest ,41, York Club ,40.95
Valerie Vixen ,Base de Donne ,2003,1,YRB Bronze ,23.95, Readers Digest ,20.95
Xia Xu ,Where art thou Bertha? ,2003,1,Basic ,30.95, CAA ,26.95
Yves Yonge ,Radiator Barbecuing ,2002,2,Basic ,14.2, Waterloo Club ,12.2
Zebulon Zilio ,Transmorgifacation ,2004,1,Basic ,288.73, CAA ,278.73
34 record(s) selected.
I want to show only one of the lowest options. For example, 'Andy Aardverk' purchased 'Avarice is Good' and could have bought it from 'CARP' or 'YRB Bronze' for a lower price. I only want one of them to show, so it could be 'CARP' or 'YRB Bronze' but not both.
I tried to use GROUP BY on NAME, TITLE, YEAR, QNTY, CLUB, PRICE but got this error:
'SQL0119N An expression starting with "LOWEST_CLUB" specified in a SELECT
clause, HAVING clause, or ORDER BY clause is not specified in the GROUP BY
clause or it is in a SELECT clause, HAVING clause, or ORDER BY clause with a
column function and no GROUP BY clause is specified. SQLSTATE=42803'
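The error means that every column in the SELECT list must either appear in the GROUP BY list or be wrapped in an aggregate function, and LOWEST_CLUB is neither. As a plain-aggregation alternative (not the approach in the answer), here is a minimal sketch that runs the idea through Python's built-in sqlite3 module on a few rows copied from the table above. The table name `result` and the helper `collapse_with_group_by` are hypothetical. It only works cleanly because LOWEST is identical within each group, so MIN(LOWEST_CLUB) and MIN(LOWEST) describe the same purchase.

```python
import sqlite3

# A few rows from the question's result table (trailing padding trimmed).
ROWS = [
    ("Andy Aardverk", "Avarice is Good", 1998, 1, "Basic", 218.95, "CARP", 215.95),
    ("Andy Aardverk", "Avarice is Good", 1998, 1, "Basic", 218.95, "YRB Bronze", 215.95),
    ("Boswell Biddles", "Not That London!", 2003, 1, "Basic", 12.5, "CAA", 10.0),
    ("Boswell Biddles", "Not That London!", 2003, 1, "Basic", 12.5, "Readers Digest", 10.0),
]

QUERY = """
    SELECT name, title, year, qnty, club, price,
           MIN(lowest_club) AS lowest_club,  -- aggregated, so it may sit outside GROUP BY
           MIN(lowest) AS lowest             -- the same value for every row in a group here
    FROM result
    GROUP BY name, title, year, qnty, club, price
    ORDER BY name, title
"""


def collapse_with_group_by(rows):
    # Hypothetical in-memory copy of the query result.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE result (name TEXT, title TEXT, year INTEGER, qnty INTEGER,"
        " club TEXT, price REAL, lowest_club TEXT, lowest REAL)"
    )
    conn.executemany("INSERT INTO result VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    out = conn.execute(QUERY).fetchall()
    conn.close()
    return out


for row in collapse_with_group_by(ROWS):
    print(row)
```

MIN() on a text column picks the alphabetically first club, so 'CARP' wins over 'YRB Bronze'; MAX() would keep the other one.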

This would have been easier with your actual query, but I'll give it a go anyway.
You can solve this problem with a CTE and ROW_NUMBER(), like this:
WITH CTE AS (
SELECT
ROW_NUMBER() OVER(PARTITION BY NAME, TITLE ORDER BY LOWEST ASC, LOWEST_CLUB ASC) AS RowNumber, -- LOWEST_CLUB is the fallback if two prices are the same
NAME,
TITLE,
YEAR,
QNTY,
CLUB,
PRICE,
LOWEST_CLUB,
LOWEST
FROM your_table -- placeholder: the rest of your current query goes here
)
SELECT * --or list your columns if you want to avoid the RowNumber
FROM CTE
WHERE RowNumber = 1;
That SELECT inside the CTE should be your current query plus the ROW_NUMBER() line.
Since I don't have the query, I can't give you a final result; you'll have to adjust it until it works for you.
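To check the ROW_NUMBER() idea without a DB2 setup, here is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer) on a few rows from the question. The table name `result`, the trimmed column list, and the helper `first_lowest_per_title` are assumptions for illustration. Ordering by the price and then the club name inside OVER() is what makes the choice between equally cheap clubs deterministic.

```python
import sqlite3

# A few rows copied from the question's result table (padding trimmed).
ROWS = [
    ("Andy Aardverk", "Avarice is Good", "CARP", 215.95),
    ("Andy Aardverk", "Avarice is Good", "YRB Bronze", 215.95),
    ("Andy Aardverk", "Yon-juu Hachi", "CARP", 41.95),
    ("Cary Cizek", "Ringo to Nashi", "YRB Gold", 29.95),
    ("Cary Cizek", "Ringo to Nashi", "York Club", 29.95),
]

QUERY = """
    WITH cte AS (
        SELECT ROW_NUMBER() OVER (
                   PARTITION BY name, title          -- numbering restarts per purchase
                   ORDER BY lowest ASC, lowest_club ASC
               ) AS rn,
               name, title, lowest_club, lowest
        FROM result
    )
    SELECT name, title, lowest_club, lowest
    FROM cte
    WHERE rn = 1                                     -- keep one row per purchase
    ORDER BY name, title
"""


def first_lowest_per_title(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE result (name TEXT, title TEXT, lowest_club TEXT, lowest REAL)")
    conn.executemany("INSERT INTO result VALUES (?, ?, ?, ?)", rows)
    out = conn.execute(QUERY).fetchall()
    conn.close()
    return out


for row in first_lowest_per_title(ROWS):
    print(row)
```

The ORDER BY inside OVER() decides which row gets number 1: switching to `lowest_club DESC` would keep 'YRB Bronze' instead of 'CARP' for Andy. The SQL itself is standard window-function syntax, so it should carry over to the real query once the full column list is filled in.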

Related

SQL Result to multiple array

MySQL returns the following result:
id   staff   province
--   -----   ------------
1    Ben     Ontario
2    Ben     Quebec
3    John    Manitoba
4    John    Saskatchewan
6    Kitty   Alberta
7    Kitty   Nova Scotia
I would like the records displayed like this:
staff   province
-----   ----------------------
Ben     Ontario, Quebec
John    Manitoba, Saskatchewan
Kitty   Alberta, Nova Scotia
What approach should I use for this?
It would be better to post the tables as well, for clearer context.
You can do this with an aggregate function and grouping: GROUP BY the staff column, then use GROUP_CONCAT() to concatenate the province values into one string.
Here is a reference for what you want. I'm not sure which table you are using or whether there are other factors, but you can adapt it as needed.
SELECT staff, GROUP_CONCAT(province ORDER BY id SEPARATOR ', ') AS province
FROM table_name
GROUP BY staff;
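MySQL's SEPARATOR keyword is not portable, so this runnable sketch uses Python's built-in sqlite3 module, where GROUP_CONCAT takes the separator as a second argument. The table name `staff_provinces` and the helper `provinces_per_staff` are assumptions; the rows are the ones from the question. SQLite does not promise any particular order inside each concatenated string, so if the order matters, MySQL lets you write GROUP_CONCAT(province ORDER BY id SEPARATOR ', ').

```python
import sqlite3

# Rows from the question: (id, staff, province).
ROWS = [
    (1, "Ben", "Ontario"),
    (2, "Ben", "Quebec"),
    (3, "John", "Manitoba"),
    (4, "John", "Saskatchewan"),
    (6, "Kitty", "Alberta"),
    (7, "Kitty", "Nova Scotia"),
]

QUERY = """
    SELECT staff, GROUP_CONCAT(province, ', ') AS province  -- SQLite: separator is arg 2
    FROM staff_provinces
    GROUP BY staff                                           -- one output row per staff member
    ORDER BY staff
"""


def provinces_per_staff(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff_provinces (id INTEGER, staff TEXT, province TEXT)")
    conn.executemany("INSERT INTO staff_provinces VALUES (?, ?, ?)", rows)
    out = conn.execute(QUERY).fetchall()
    conn.close()
    return out


for staff, provinces in provinces_per_staff(ROWS):
    print(staff, "|", provinces)
```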

Is there a wildcard search solution that can allow me to search for a given string but allow 2 characters to be wrong/missing/blank in Snowflake?

I'm very new to the concept of regular expressions and am looking for a wildcard search solution in Snowflake that allows 2 or fewer characters of the string to be wrong/missing/blank.
For example, if one table has a column of basketball players' names such as 'lebron james', 'carmelo anthony', and 'kobe bryant', these are the values from another table (consumers' search queries) that I would like to match for 'lebron james':
'lebrn james' (missing 'o')
'lebronjames' (missing the space between first and last name)
'lebrn jme' (missing 'o' and 'a')
'lebron james' (exact match)
Would anyone be so kind as to provide some guidance?
EDITDISTANCE, which computes the Levenshtein distance between two strings, is what you are asking for:
with input(str) as (
select * from values
('lebrn james'), ('lebronjames'), ('lebrn jme')
), targets(str) as (
select * from values
('lebron james'), ('carmelo anthony'), ('kobe bryant')
)
select i.str, t.str, editdistance(i.str, t.str)
from input i
cross join targets t;
gives:
STR           STR_2             EDITDISTANCE(I.STR, T.STR)
-----------   ---------------   --------------------------
lebrn james   lebron james      1
lebrn james   carmelo anthony   14
lebrn james   kobe bryant       10
lebronjames   lebron james      1
lebronjames   carmelo anthony   13
lebronjames   kobe bryant       10
lebrn jme     lebron james      3
lebrn jme     carmelo anthony   13
lebrn jme     kobe bryant       9
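To see what a "2 characters or fewer" cutoff does with these numbers, here is a minimal pure-Python Levenshtein sketch; `edit_distance` and `matches` are hypothetical helper names, not Snowflake functions. It reproduces the three distances to 'lebron james' above. Note that 'lebrn jme' is 3 edits away (it also drops the 's'), so a cutoff of 2 excludes it; in Snowflake the equivalent filter would be a WHERE editdistance(i.str, t.str) <= 2 on the query above.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: each insertion, deletion or substitution costs 1."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


def matches(query, targets, max_dist=2):
    """Return the targets that are at most max_dist edits away from query."""
    return [t for t in targets if edit_distance(query, t) <= max_dist]


TARGETS = ["lebron james", "carmelo anthony", "kobe bryant"]
for q in ["lebrn james", "lebronjames", "lebrn jme"]:
    print(q, "->", matches(q, TARGETS))
```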

Compute the number of direct reports for each employee in the organization (aggregation)

FYI, I use Redshift SQL.
I have a table that looks roughly like the one below (it has more columns, which I'll leave out for simplicity).
This table is a representation of the hierarchical tree within my organization.
employee manager
-------- -------
daniel louis
matt martha
martha kim
laura matt
michael martha
...
As you can see, matt appears in two distinct records, once as an employee and once as laura's manager. Martha appears in three records, once as an employee and in two others as a manager.
I'd like to compute the number of direct reports each employee has. Perhaps a conditional count where the criterion is employee = manager?
I guess I could get this with a subquery and then join it back, but I was wondering whether there is a more "elegant" way to do it, maybe using window functions.
The expected output for the table above would be:
employee manager direct_reports
-------- ------- --------------
daniel louis 0
matt martha 1
martha kim 2
laura matt 0
michael martha 0
...
I would approach this with a correlated subquery:
select
t.employee,
t.manager,
(select count(*) from mytable t1 where t1.manager = t.employee) direct_reports
from mytable t
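This should be quite an efficient method, especially with an index on manager, the column the subquery filters on. (Redshift itself does not support conventional indexes, so there the sort key would be the closest equivalent.)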
Use a left join and aggregation:
select em.employee, em.manager, count(ew.employee) as direct_reports
from employees em left join
employees ew
on ew.manager = em.employee
group by em.employee, em.manager;
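Both answers can be sanity-checked on the sample rows from the question. This minimal sketch uses Python's built-in sqlite3 module rather than Redshift, so it says nothing about Redshift performance; the single table name `mytable` (used for both queries here) and the helper `direct_reports` are assumptions. It shows that the two queries return the same counts, including 0 for employees nobody reports to.

```python
import sqlite3

# The sample rows from the question: (employee, manager).
ROWS = [
    ("daniel", "louis"),
    ("matt", "martha"),
    ("martha", "kim"),
    ("laura", "matt"),
    ("michael", "martha"),
]

CORRELATED = """
    SELECT t.employee, t.manager,
           (SELECT COUNT(*) FROM mytable t1 WHERE t1.manager = t.employee) AS direct_reports
    FROM mytable t
    ORDER BY t.employee
"""

# COUNT(ew.employee) skips the NULLs an unmatched LEFT JOIN produces,
# so employees without reports get 0 rather than 1.
LEFT_JOIN = """
    SELECT em.employee, em.manager, COUNT(ew.employee) AS direct_reports
    FROM mytable em
    LEFT JOIN mytable ew ON ew.manager = em.employee
    GROUP BY em.employee, em.manager
    ORDER BY em.employee
"""


def direct_reports(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (employee TEXT, manager TEXT)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    result = (conn.execute(CORRELATED).fetchall(), conn.execute(LEFT_JOIN).fetchall())
    conn.close()
    return result


correlated_rows, left_join_rows = direct_reports(ROWS)
for row in correlated_rows:
    print(row)
```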

An SQL query across 3 tables: can't get the required results

-- List all titles that have been sold along with the artist, order date and ship date
SELECT title, artist, order_date, ship_date
FROM items,orders,orderline
WHERE orders.order_id = orderline.order_id
AND items.item_id = orderline.item_id;
I tried my own query, but I get the results below:
Under the Sun, Donald Arley, 11/15/2013, 11/20/2013
Under the Sun, Donald Arley, 12/20/2013, 12/22/2013
Under the Sun, Donald Arley, 1/18/2014, 1/23/2014
Dark Lady, Keith Morris, 1/31/2014, 2/4/2014
Dark Lady, Keith Morris, 3/10/2014, 3/15/2014
Dark Lady, Keith Morris, 3/14/2014, 3/19/2014
Dark Lady, Keith Morris, 11/15/2013, 11/20/2013
Happy Days, Andrea Reid, 2/27/2014, 3/2/2014
Happy Days, Andrea Reid, 10/30/2013, 11/3/2013
Happy Days, Andrea Reid, 12/18/2013, 12/22/2013
The Hunt, Walter Alford, 1/31/2014, 2/4/2014
The Hunt, Walter Alford, 3/10/2014, 3/15/2014
etc.
This looks like a generic homework question.
I suggest you familiarize yourself with this page and site http://use-the-index-luke.com/sql/join.
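Solution:
Change your statement to use explicit JOIN syntax. It returns the same rows as your comma-separated FROM list, but each join condition sits next to the table it joins: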
SELECT
items.title, items.artist, orders.order_date, orders.ship_date
FROM
items
JOIN
orderline ON orderline.item_id = items.item_id
JOIN
orders ON orders.order_id = orderline.order_id
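If the goal is only to confirm the rewrite, this sketch builds hypothetical miniature `items`, `orders` and `orderline` tables in SQLite (through Python's built-in sqlite3 module; the rows are illustrative, loosely following the question's output) and compares the comma-join query with the explicit-JOIN one. With inner joins the two forms are equivalent, so if the output is still not what you need, the difference has to come from a filter or ordering the question does not describe.

```python
import sqlite3

# Hypothetical miniature tables; titles, artists and dates echo the question.
SETUP = """
    CREATE TABLE items (item_id INTEGER, title TEXT, artist TEXT);
    CREATE TABLE orders (order_id INTEGER, order_date TEXT, ship_date TEXT);
    CREATE TABLE orderline (order_id INTEGER, item_id INTEGER);
    INSERT INTO items VALUES (1, 'Under the Sun', 'Donald Arley'), (2, 'Dark Lady', 'Keith Morris');
    INSERT INTO orders VALUES (10, '11/15/2013', '11/20/2013'), (11, '12/20/2013', '12/22/2013');
    INSERT INTO orderline VALUES (10, 1), (10, 2), (11, 1);
"""

COMMA_JOIN = """
    SELECT title, artist, order_date, ship_date
    FROM items, orders, orderline
    WHERE orders.order_id = orderline.order_id
      AND items.item_id = orderline.item_id
"""

EXPLICIT_JOIN = """
    SELECT items.title, items.artist, orders.order_date, orders.ship_date
    FROM items
    JOIN orderline ON orderline.item_id = items.item_id
    JOIN orders ON orders.order_id = orderline.order_id
"""


def compare_joins():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SETUP)
    # Sort both results so the comparison ignores row order.
    comma = sorted(conn.execute(COMMA_JOIN).fetchall())
    explicit = sorted(conn.execute(EXPLICIT_JOIN).fetchall())
    conn.close()
    return comma, explicit


comma_rows, explicit_rows = compare_joins()
print(comma_rows == explicit_rows)
```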

Interesting SQL Sorting Issue

It's crunch time: the deadline for my most recent contract is in two days, and almost everything is complete and working fine (knock on wood) except for one issue.
In one of my stored procedures, I need to return a result set as follows:
group_id name
-------- --------
A101 Craig
A102 Craig
Z101 Craig
Z102 Craig
A101 Jim
A102 Jim
Z101 Jim
Z102 Jim
B101 Andy
B102 Andy
Z101 Andy
Z102 Andy
The names need to be sorted by the first character of the group id and also include the Z101/Z102 entries. By sorting strictly by the group id, I get a result set as follows:
group_id name
-------- --------
A101 Craig
A102 Craig
A101 Jim
A102 Jim
B101 Andy
B102 Andy
Z101 Andy
Z102 Andy
Z101 Craig
Z102 Craig
Z101 Jim
Z102 Jim
I really can't think of a solution that doesn't involve me making a cursor and bloating the stored procedure up more than it already is. I'm sure a great mind out there has an elegant solution and I'm eager to see what the community can come up with.
Thanks a ton in advance.
Edit: Let me expand :) I'm sorry, it's late and I'm coffee-addled.
The above result set is a special case for a special type of data entry. To be transparent, we're building an election-based website, and these are candidates sorted by office, name, and then district.
Most offices have multiple districts in them except for district positions like magistrate/coroner, which will have only one. The Z comes in as the "district" for absentee machine and absentee paper votes.
The non-magistrate positions can be sorted by name first, as they are all grouped together. However, the existing system lists all magistrates in a huge clump of information, when they should be sorted by individual districts. This is where the issue lies.
To protect my pride, I want to add that I had no control over the normalization of the database. It was given to me by the client.
Here's the order clause of my stored procedure, if it helps:
ORDER BY candidate.party,
candidate.ballot_name,
CASE WHEN candidate.district_type = 'MAG' THEN LEFT(votecount.precinct_id, 1) END,
candidate.last_name,
candidate.first_name,
precinct.name
Edit 2: Here's where I currently stand (1:43 A.M.):
I'm using a suggestion below to create a conditional inner join as follows:
IF candidate.district_type = 'MAG'
BEGIN
(
SELECT candidate.id AS candidate_id, candidate.last_name, LEFT(votecount.precinct_id, 1) AS district, votecount.precinct_id
FROM candidate
INNER JOIN votecount
ON votecount.candidate_id = candidate.id
GROUP BY name
) mag_order
INNER JOIN mag_order
ON mag_order.candidate_id = candidate.id
END
and then I'll sort it by mag_order.district, candidate.precinct_id, candidate.last_name.
For some reason I'm getting an SQL error when aliasing the ( SELECT ) as mag_order. Does anyone see anything wrong with the code? I can't for the life of me. Sorry, this is a bit tangential.
SELECT g1.group_id, g1.name
FROM
groups g1
INNER JOIN
(
SELECT MIN(group_id) AS group_id, name
FROM groups
GROUP BY name
) g2 on g1.name = g2.name
ORDER BY g2.group_id, g1.name, g1.group_id
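A quick way to confirm this ordering on the question's data is the sketch below, which runs the query in SQLite through Python's built-in sqlite3 module. The table name `group_members` is an assumption (the answer's `groups` can collide with the GROUPS window-frame keyword in some engines), and MIN(group_id) is aliased so that g2.group_id exists for the ORDER BY.

```python
import sqlite3

# The question's rows as (group_id, name); the table name is hypothetical.
ROWS = [
    ("A101", "Craig"), ("A102", "Craig"), ("A101", "Jim"), ("A102", "Jim"),
    ("B101", "Andy"), ("B102", "Andy"), ("Z101", "Andy"), ("Z102", "Andy"),
    ("Z101", "Craig"), ("Z102", "Craig"), ("Z101", "Jim"), ("Z102", "Jim"),
]

QUERY = """
    SELECT g1.group_id, g1.name
    FROM group_members g1
    INNER JOIN (
        -- Each name's smallest group_id decides where its block of rows goes.
        SELECT MIN(group_id) AS group_id, name
        FROM group_members
        GROUP BY name
    ) g2 ON g1.name = g2.name
    ORDER BY g2.group_id, g1.name, g1.group_id
"""


def sort_groups(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE group_members (group_id TEXT, name TEXT)")
    conn.executemany("INSERT INTO group_members VALUES (?, ?)", rows)
    out = conn.execute(QUERY).fetchall()
    conn.close()
    return out


for group_id, name in sort_groups(ROWS):
    print(group_id, name)
```

Craig and Jim both start at A101, so the second sort key (g1.name) keeps their blocks apart, and Andy's block follows because his smallest id is B101.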
ORDER BY name DESC, SUBSTR(group_id, 1, 1), group_id
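SELECT groupId, name
FROM groupTable
ORDER BY dbo.getFirstGroupId(name), name, groupId
Then your getFirstGroupId() function would return the first groupId for that name. In SQL Server, which your IF ... BEGIN ... END syntax suggests, a scalar function must be called with its schema prefix (dbo. here); the parameter and return types below are assumptions:
CREATE FUNCTION dbo.getFirstGroupId (@name VARCHAR(50))
RETURNS VARCHAR(10)
AS
BEGIN
RETURN (SELECT MIN(groupId) FROM groupTable WHERE name = @name);
END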
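The same ordering also works without a user-defined function, by inlining getFirstGroupId() as a correlated scalar subquery in the ORDER BY. That is a swapped-in technique rather than this answer's; the sketch below runs it in SQLite through Python's built-in sqlite3 module, with the hypothetical table name `group_members` and the question's rows.

```python
import sqlite3

# The question's rows as (group_id, name); the table name is hypothetical.
ROWS = [
    ("A101", "Craig"), ("A102", "Craig"), ("A101", "Jim"), ("A102", "Jim"),
    ("B101", "Andy"), ("B102", "Andy"), ("Z101", "Andy"), ("Z102", "Andy"),
    ("Z101", "Craig"), ("Z102", "Craig"), ("Z101", "Jim"), ("Z102", "Jim"),
]

# The scalar subquery plays the role of getFirstGroupId(name).
QUERY = """
    SELECT g1.group_id, g1.name
    FROM group_members g1
    ORDER BY (SELECT MIN(g2.group_id) FROM group_members g2 WHERE g2.name = g1.name),
             g1.name,
             g1.group_id
"""


def sort_with_subquery(rows):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE group_members (group_id TEXT, name TEXT)")
    conn.executemany("INSERT INTO group_members VALUES (?, ?)", rows)
    out = conn.execute(QUERY).fetchall()
    conn.close()
    return out


for group_id, name in sort_with_subquery(ROWS):
    print(group_id, name)
```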