Remove partial duplicates in SQL? - sql
The resulting table (CSV) looks like this:
NAME ,TITLE ,YEAR ,QNTY ,CLUB ,PRICE ,LOWEST_CLUB ,LOWEST
Andy Aardverk ,Avarice is Good ,1998,1,Basic ,218.95, CARP ,215.95
Andy Aardverk ,Avarice is Good ,1998,1,Basic ,218.95, YRB Bronze ,215.95
Andy Aardverk ,Yon-juu Hachi ,1948,1,Basic ,44.95, CARP ,41.95
Boswell Biddles ,Not That London! ,2003,1,Basic ,12.5, CAA ,10
Boswell Biddles ,Not That London! ,2003,1,Basic ,12.5, Readers Digest ,10
Cary Cizek ,Ringo to Nashi ,1997,1,Basic ,32.95, YRB Gold ,29.95
Cary Cizek ,Ringo to Nashi ,1997,1,Basic ,32.95, York Club ,29.95
Cary Cizek ,Toronto Underground ,2001,1,YRB Gold ,14.45, York Club ,12.95
Egbert Engles ,Capricia's Conundrum ,1993,1,CARP ,13.45, Guelph Club ,12.95
Egbert Engles ,Tande mou nai ,2002,1,Basic ,112.95, Oprah ,104.95
Egbert Engles ,Tande mou nai ,2002,1,Basic ,112.95, YRB Silver ,104.95
Ekksdwl Qjksynn ,I don't think so ,2001,1,YRB Gold ,12.5, CAA ,11.5
George Wolf ,Math is fun! ,1995,1,YRB Silver ,13.5, CAA ,12
Jack Daniels ,Eigen Eigen ,1980,1,York Club ,57.95, Oprah ,56.95
Jack Daniels ,Okay Why Not? ,2001,1,York Club ,18.45, Oprah ,17.45
Jackie Johassen ,Getting into Snork U. ,2004,1,YRB Silver ,21.95, Waterloo Club ,20.45
Jackie Johassen ,Not That London! ,2003,1,Basic ,12.5, CAA ,10
Klive Kittlehart ,Will Snoopy find Lucy? ,1990,1,YRB Bronze ,14.95, YRB Gold ,12.95
Lux Luthor ,Is Math is fun? ,1996,1,Basic ,72.95, Oprah ,69.95
Lux Luthor ,Tropical Windsor ,2004,1,Basic ,18.95, Oprah ,17.95
Nigel Nerd ,Are my feet too big? ,1993,1,Basic ,13.95, CAA ,11.45
Nigel Nerd ,Dogs are not Cats ,1995,1,Basic ,35.95, UofT Club ,32.95
Phil Regis ,Databases made Real Hard ,2002,1,Basic ,39.95, Oprah ,35.95
Pretence Parker ,Tchuss ,2002,1,Basic ,24.95, Guelph Club ,21.95
Qfwfq ,The Earth is not Enough ,2003,1,YRB Gold ,37.37, Oprah ,36.37
Qfwfq ,Under Your Bed ,2004,1,Oprah ,14.85, CAA ,13.85
Suzy Sedwick ,Are my feet too big? ,1993,1,YRB Silver ,12.95, Oprah ,11.45
Tracy Turnip ,Will Snoopy find Lucy? ,1990,1,Basic ,15.95, Readers Digest ,13.95
Tracy Turnip ,Will Snoopy find Lucy? ,1990,1,Basic ,15.95, YRB Silver ,13.95
Tracy Turnip ,Yon-juu Hachi ,1948,1,Readers Digest ,41, York Club ,40.95
Valerie Vixen ,Base de Donne ,2003,1,YRB Bronze ,23.95, Readers Digest ,20.95
Xia Xu ,Where art thou Bertha? ,2003,1,Basic ,30.95, CAA ,26.95
Yves Yonge ,Radiator Barbecuing ,2002,2,Basic ,14.2, Waterloo Club ,12.2
Zebulon Zilio ,Transmorgifacation ,2004,1,Basic ,288.73, CAA ,278.73
34 record(s) selected.
I want to show only one of the lowest-price options. For example, 'Andy Aardverk' purchased 'Avarice is Good' and could have bought it more cheaply through either 'CARP' or 'YRB Bronze'. I only want one row to show, so it could be 'CARP' or 'YRB Bronze', but not both.
I tried using GROUP BY on name, title, year, qnty, club, price, but got this error:
'SQL0119N An expression starting with "LOWEST_CLUB" specified in a SELECT
clause, HAVING clause, or ORDER BY clause is not specified in the GROUP BY
clause or it is in a SELECT clause, HAVING clause, or ORDER BY clause with a
column function and no GROUP BY clause is specified. SQLSTATE=42803'
This would have been easier with your actual query, but I'll give it a go anyway.
You can solve this problem by using a CTE like this:
WITH CTE AS (
    SELECT
        -- number the candidate rows of each purchase: cheapest first, ties broken by club name
        ROW_NUMBER() OVER (PARTITION BY NAME, TITLE, YEAR, QNTY, CLUB, PRICE
                           ORDER BY LOWEST ASC, LOWEST_CLUB ASC) AS RowNumber,
        NAME,
        TITLE,
        YEAR,
        QNTY,
        CLUB,
        PRICE,
        LOWEST_CLUB,
        LOWEST
    FROM your_tables -- placeholder: the FROM/WHERE clauses of your current query
)
-- list the columns explicitly so that RowNumber is not returned
SELECT NAME, TITLE, YEAR, QNTY, CLUB, PRICE, LOWEST_CLUB, LOWEST
FROM CTE
WHERE RowNumber = 1;
The SELECT inside the CTE should be your current query plus the ROW_NUMBER() line.
Since I don't have your query, I can't give you a final version; you'll have to adjust it until it works for you.
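To see the ROW_NUMBER() idea end to end, here is a small sketch that runs it in SQLite through Python's standard sqlite3 module instead of DB2 (window functions need SQLite 3.25 or newer). The table name result and the five sample rows, copied from the question's output, are assumptions made for illustration; only the PARTITION BY / ORDER BY / WHERE rn = 1 pattern is the point.

```python
import sqlite3

# A few rows copied from the result table in the question (padding spaces dropped).
SAMPLE_ROWS = [
    ("Andy Aardverk", "Avarice is Good", 1998, 1, "Basic", 218.95, "CARP", 215.95),
    ("Andy Aardverk", "Avarice is Good", 1998, 1, "Basic", 218.95, "YRB Bronze", 215.95),
    ("Andy Aardverk", "Yon-juu Hachi", 1948, 1, "Basic", 44.95, "CARP", 41.95),
    ("Boswell Biddles", "Not That London!", 2003, 1, "Basic", 12.5, "CAA", 10),
    ("Boswell Biddles", "Not That London!", 2003, 1, "Basic", 12.5, "Readers Digest", 10),
]

def keep_one_lowest(rows):
    """Return one row per purchase, keeping the alphabetically first club among ties."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE result (name TEXT, title TEXT, year INTEGER, qnty INTEGER,"
        " club TEXT, price REAL, lowest_club TEXT, lowest REAL)"
    )
    conn.executemany("INSERT INTO result VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    query = """
        WITH ranked AS (
            SELECT r.*,
                   ROW_NUMBER() OVER (
                       PARTITION BY name, title, year, qnty, club, price
                       ORDER BY lowest, lowest_club
                   ) AS rn
            FROM result r
        )
        SELECT name, title, lowest_club, lowest
        FROM ranked
        WHERE rn = 1          -- first row of each partition only
        ORDER BY name, title
    """
    out = conn.execute(query).fetchall()
    conn.close()
    return out

for row in keep_one_lowest(SAMPLE_ROWS):
    print(row)
```

Each duplicated purchase collapses to a single row (CARP for 'Avarice is Good', CAA for 'Not That London!'), because those clubs sort first among rows with the same lowest price.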
Related
SQL result to multiple arrays
My SQL query returns the following result:

id  staff  province
1   Ben    Ontario
2   Ben    Quebec
3   John   Manitoba
4   John   Saskatchewan
6   Kitty  Alberta
7   Kitty  Nova Scotia

I would like the records displayed like this:

staff  province
Ben    Ontario, Quebec
John   Manitoba, Saskatchewan
Kitty  Alberta, Nova Scotia

What approach should I use?
It would be better to post the table definitions as well, for clearer context. You can use an aggregate function together with grouping: GROUP BY the staff column, then use GROUP_CONCAT() to concatenate the province values into one string. Here is a reference for the shape you want; I'm unsure which table you are using or whether other factors apply, so adapt as needed:

SELECT staff,
       GROUP_CONCAT(province ORDER BY id SEPARATOR ', ') AS province -- ORDER BY keeps the provinces in id order
FROM table_name
GROUP BY staff;
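A quick way to try the GROUP BY plus concatenation idea without a MySQL server is SQLite through Python's sqlite3 module. This is a sketch under that substitution: SQLite's group_concat takes the separator as a second argument rather than MySQL's SEPARATOR keyword, and here it does not guarantee the order of the concatenated values. The table name staff_province is hypothetical; the rows come from the question.

```python
import sqlite3

STAFF_ROWS = [
    (1, "Ben", "Ontario"),
    (2, "Ben", "Quebec"),
    (3, "John", "Manitoba"),
    (4, "John", "Saskatchewan"),
    (6, "Kitty", "Alberta"),
    (7, "Kitty", "Nova Scotia"),
]

def provinces_by_staff(rows):
    """Return {staff: 'province, province, ...'} using GROUP BY + group_concat."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff_province (id INTEGER, staff TEXT, province TEXT)")
    conn.executemany("INSERT INTO staff_province VALUES (?, ?, ?)", rows)
    # SQLite spelling: group_concat(value, separator); MySQL uses the SEPARATOR keyword.
    cur = conn.execute(
        "SELECT staff, group_concat(province, ', ') FROM staff_province "
        "GROUP BY staff ORDER BY staff"
    )
    result = dict(cur.fetchall())
    conn.close()
    return result

print(provinces_by_staff(STAFF_ROWS))
```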
Is there a wildcard search solution that can allow me to search for a given string but allow 2 characters to be wrong/missing/blank in Snowflake?
I'm very new to regular expressions and am looking for a wildcard search in Snowflake that allows 2 or fewer characters of the string to be wrong, missing, or blank. For example, if a table column holds basketball players' names such as 'lebron james', 'carmelo anthony', and 'kobe bryant', these are the values from another table (consumers' search queries) that I would like to match for 'lebron james':
'lebrn james' (missing 'o')
'lebronjames' (missing the space between first and last name)
'lebrn jme' (missing 'o', 'a', and 's')
'lebron james' (exact match)
Would anyone be so kind as to provide some guidance?
EDITDISTANCE, which computes the Levenshtein distance between two strings, is what you are asking for:

with input(str) as (
    select * from values ('lebrn james'), ('lebronjames'), ('lebrn jme')
), targets(str) as (
    select * from values ('lebron james'), ('carmelo anthony'), ('kobe bryant')
)
select i.str, t.str, editdistance(i.str, t.str)
from input i
cross join targets t;

gives:

STR          STR_2            EDITDISTANCE(I.STR, T.STR)
lebrn james  lebron james     1
lebrn james  carmelo anthony  14
lebrn james  kobe bryant      10
lebronjames  lebron james     1
lebronjames  carmelo anthony  13
lebronjames  kobe bryant      10
lebrn jme    lebron james     3
lebrn jme    carmelo anthony  13
lebrn jme    kobe bryant      9
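To make the metric concrete, here is a minimal Python sketch of the standard dynamic-programming Levenshtein distance, plus a filter that keeps targets within a threshold. It is an illustration of the metric, not Snowflake's implementation; the helper names levenshtein and close_matches are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where each insert, delete, or substitute costs 1."""
    prev = list(range(len(b) + 1))  # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]

def close_matches(query, targets, max_edits=2):
    """Keep targets within max_edits of query, like filtering on EDITDISTANCE <= 2."""
    return [t for t in targets if levenshtein(query, t) <= max_edits]

PLAYERS = ["lebron james", "carmelo anthony", "kobe bryant"]
for q in ["lebrn james", "lebronjames", "lebrn jme", "lebron james"]:
    print(q, "->", close_matches(q, PLAYERS))
```

With a threshold of 2, 'lebrn jme' (distance 3) is not matched; it needs max_edits=3.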
Compute the number of direct reports for each employee in the organization (aggregation)
FYI, I use Redshift SQL. I have a table that looks roughly like the one below (it has more columns, which I'll abstract away for simplicity). It represents the hierarchical tree of my organization.

employee  manager
--------  -------
daniel    louis
matt      martha
martha    kim
laura     matt
michael   martha
...

As you can see, matt appears in two distinct records, once as an employee and once as laura's manager. martha appears in three records, once as an employee and twice as a manager. I'd like to compute the number of direct reports each employee has. A conditional count where employee = manager, perhaps? I guess I could find this with a subquery and then join it back, but I was wondering whether there is a more "elegant" way, maybe using window functions. The expected output for the table above would be:

employee  manager  direct_reports
--------  -------  --------------
daniel    louis    0
matt      martha   1
martha    kim      2
laura     matt     0
michael   martha   0
...
I would approach this with a correlated subquery:

select t.employee,
       t.manager,
       (select count(*) from mytable t1 where t1.manager = t.employee) as direct_reports
from mytable t

This should be quite an efficient method, especially with an index whose leading column is manager, since that is the column the subquery filters on (on databases that support indexes; Redshift itself does not).
Use a left join and aggregation:

select em.employee, em.manager, count(ew.employee) as direct_reports
from employees em
left join employees ew on ew.manager = em.employee
group by em.employee, em.manager;
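Both answers can be checked against the sample hierarchy from the question. The sketch below runs them in SQLite via Python's sqlite3 module rather than Redshift, under one hypothetical table name (employees) for both; it confirms the correlated subquery and the left join plus COUNT(ew.employee) return the same counts, with 0 for employees nobody reports to, because COUNT of a column skips the NULLs the left join produces.

```python
import sqlite3

EMPLOYEES = [
    ("daniel", "louis"),
    ("matt", "martha"),
    ("martha", "kim"),
    ("laura", "matt"),
    ("michael", "martha"),
]

CORRELATED = """
    SELECT t.employee, t.manager,
           (SELECT COUNT(*) FROM employees t1 WHERE t1.manager = t.employee) AS direct_reports
    FROM employees t
    ORDER BY t.employee
"""

LEFT_JOIN = """
    SELECT em.employee, em.manager, COUNT(ew.employee) AS direct_reports
    FROM employees em
    LEFT JOIN employees ew ON ew.manager = em.employee
    GROUP BY em.employee, em.manager
    ORDER BY em.employee
"""

def run(query):
    """Load the sample hierarchy into a fresh in-memory database and run query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (employee TEXT, manager TEXT)")
    conn.executemany("INSERT INTO employees VALUES (?, ?)", EMPLOYEES)
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows

print(run(CORRELATED))
print(run(LEFT_JOIN))
```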
An SQL query over 3 tables: can't get the required results
-- List all titles that have been sold along with the artist, order date and ship date
SELECT title, artist, order_date, ship_date
FROM items, orders, orderline
WHERE orders.order_id = orderline.order_id
  AND items.item_id = orderline.item_id;

I tried my own query above and got the results below:

Under the Sun, Donald Arley, 11/15/2013, 11/20/2013
Under the Sun, Donald Arley, 12/20/2013, 12/22/2013
Under the Sun, Donald Arley, 1/18/2014, 1/23/2014
Dark Lady, Keith Morris, 1/31/2014, 2/4/2014
Dark Lady, Keith Morris, 3/10/2014, 3/15/2014
Dark Lady, Keith Morris, 3/14/2014, 3/19/2014
Dark Lady, Keith Morris, 11/15/2013, 11/20/2013
Happy Days, Andrea Reid, 2/27/2014, 3/2/2014
Happy Days, Andrea Reid, 10/30/2013, 11/3/2013
Happy Days, Andrea Reid, 12/18/2013, 12/22/2013
The Hunt, Walter Alford, 1/31/2014, 2/4/2014
The Hunt, Walter Alford, 3/10/2014, 3/15/2014
etc.
This looks like a generic homework question. I suggest you familiarize yourself with this page and site: http://use-the-index-luke.com/sql/join.

Solution: change your statement to:

SELECT items.title, items.artist, orders.order_date, orders.ship_date
FROM items
JOIN orderline ON orderline.item_id = items.item_id
JOIN orders ON orders.order_id = orderline.order_id;
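For comparison, here is a small SQLite sketch (via Python's sqlite3 module) that runs the question's comma-join and the explicit JOIN side by side. The schema and the three order lines are hypothetical, loosely based on the output shown in the question. With inner joins the two forms return the same rows; the explicit JOIN ... ON syntax mainly keeps each join condition next to the table it connects.

```python
import sqlite3

def both_forms():
    """Return (comma_join_rows, explicit_join_rows) for a tiny hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE items (item_id INTEGER PRIMARY KEY, title TEXT, artist TEXT);
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT, ship_date TEXT);
        CREATE TABLE orderline (order_id INTEGER, item_id INTEGER);
        -- Hypothetical sample rows, loosely based on the question's output.
        INSERT INTO items VALUES (1, 'Under the Sun', 'Donald Arley'), (2, 'Dark Lady', 'Keith Morris');
        INSERT INTO orders VALUES (10, '2013-11-15', '2013-11-20'), (11, '2014-01-31', '2014-02-04');
        INSERT INTO orderline VALUES (10, 1), (10, 2), (11, 2);
    """)
    comma_join = conn.execute("""
        SELECT title, artist, order_date, ship_date
        FROM items, orders, orderline
        WHERE orders.order_id = orderline.order_id
          AND items.item_id = orderline.item_id
        ORDER BY title, order_date
    """).fetchall()
    explicit_join = conn.execute("""
        SELECT items.title, items.artist, orders.order_date, orders.ship_date
        FROM items
        JOIN orderline ON orderline.item_id = items.item_id
        JOIN orders ON orders.order_id = orderline.order_id
        ORDER BY items.title, orders.order_date
    """).fetchall()
    conn.close()
    return comma_join, explicit_join

old, new = both_forms()
print(new)
```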
Interesting SQL Sorting Issue
It's crunch time: the deadline for my most recent contract is in two days, and almost everything is complete and working fine (knock on wood) except for one issue. In one of my stored procedures, I need to return a result set like this:

group_id  name
--------  -----
A101      Craig
A102      Craig
Z101      Craig
Z102      Craig
A101      Jim
A102      Jim
Z101      Jim
Z102      Jim
B101      Andy
B102      Andy
Z101      Andy
Z102      Andy

The names need to be sorted by the first character of the group id, and each name's Z101/Z102 entries must stay with that name. Sorting strictly by group id gives me this instead:

group_id  name
--------  -----
A101      Craig
A102      Craig
A101      Jim
A102      Jim
B101      Andy
B102      Andy
Z101      Andy
Z102      Andy
Z101      Craig
Z102      Craig
Z101      Jim
Z102      Jim

I really can't think of a solution that doesn't involve writing a cursor and bloating the stored procedure more than it already is. I'm sure a great mind out there has an elegant solution, and I'm eager to see what the community comes up with. Thanks a ton in advance.

Edit: Let me expand. I'm sorry, it's late and I'm coffee-addled. The result set above is a special case for a special type of data entry. To be transparent, we're building an election website, and these are candidates sorted by office, name, and then district. Most offices span multiple districts, except district positions like magistrate/coroner, which have only one. The Z comes in as the "district" for absentee machine and absentee paper votes. The non-magistrate positions can be sorted by name first, as they are all grouped together. However, the existing system lists all magistrates in one huge clump, when they should be sorted by individual district. This is where the issue lies. To protect my pride, I want to add that I had no control over the normalization of the database; it was given to me by the client.
Here's the ORDER BY clause of my stored procedure, if it helps:

ORDER BY candidate.party,
         candidate.ballot_name,
         CASE WHEN candidate.district_type = 'MAG' THEN LEFT(votecount.precinct_id, 1) END,
         candidate.last_name,
         candidate.first_name,
         precinct.name

Edit 2: Here's where I currently stand (1:43 A.M.). I'm using a suggestion below to create a conditional inner join as follows:

IF candidate.district_type = 'MAG'
BEGIN
    (
        SELECT candidate.id AS candidate_id,
               candidate.last_name,
               LEFT(votecount.precinct_id, 1) AS district,
               votecount.precinct_id
        FROM candidate
        INNER JOIN votecount ON votecount.candidate_id = candidate.id
        GROUP BY name
    ) mag_order
    INNER JOIN mag_order ON mag_order.candidate_id = candidate.id
END

and then I'll sort it by mag_order.district, candidate.precinct_id, candidate.last_name. For some reason I'm getting a SQL error when aliasing the ( SELECT ) as mag_order. Can anyone see what's wrong with the code? I can't for the life of me. Sorry, this is a bit tangential.
SELECT g1.group_id, g1.name
FROM groups g1
INNER JOIN (
    SELECT MIN(group_id) AS group_id, name -- each name's first group id
    FROM groups
    GROUP BY name
) g2 ON g1.name = g2.name
ORDER BY g2.group_id, g1.name, g1.group_id
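The MIN(group_id)-per-name join can be checked against the question's data. This sketch runs it in SQLite through Python's sqlite3 module, not SQL Server. The table name candidate_groups is hypothetical; it avoids "groups", which some newer dialects treat as a keyword (the GROUPS window-frame mode). Sorting first by each name's smallest group id keeps every name's Z rows attached to that name.

```python
import sqlite3

# (group_id, name) pairs from the question, inserted in a scrambled order.
GROUP_ROWS = [
    ("Z101", "Andy"), ("A101", "Craig"), ("B102", "Andy"), ("Z102", "Jim"),
    ("A102", "Jim"), ("Z101", "Craig"), ("B101", "Andy"), ("A101", "Jim"),
    ("Z102", "Craig"), ("Z102", "Andy"), ("A102", "Craig"), ("Z101", "Jim"),
]

def sort_by_first_group(rows):
    """Order rows by each name's smallest group_id, then name, then group_id."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table name; see the lead-in for why "groups" is avoided.
    conn.execute("CREATE TABLE candidate_groups (group_id TEXT, name TEXT)")
    conn.executemany("INSERT INTO candidate_groups VALUES (?, ?)", rows)
    out = conn.execute("""
        SELECT g1.group_id, g1.name
        FROM candidate_groups g1
        INNER JOIN (
            SELECT MIN(group_id) AS group_id, name
            FROM candidate_groups
            GROUP BY name
        ) g2 ON g1.name = g2.name
        ORDER BY g2.group_id, g1.name, g1.group_id
    """).fetchall()
    conn.close()
    return out

for group_id, name in sort_by_first_group(GROUP_ROWS):
    print(group_id, name)
```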
ORDER BY name DESC, SUBSTR(group_id, 1, 1), group_id
SELECT groupId, name
FROM table
ORDER BY getFirstGroupId(name), name, groupId

Your getFirstGroupId() function would then return the first groupId for that name:

SELECT MIN(groupId) FROM groupTable WHERE name = @name
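The same ordering can also be written without a separate function by inlining the function's body as a correlated subquery in ORDER BY. The sketch below does that in SQLite via Python's sqlite3 module. It swaps in a correlated subquery for the user-defined function, and the table name group_table is hypothetical; the sort key per row is still "the smallest group id for this name".

```python
import sqlite3

GROUP_ROWS = [
    ("Z101", "Andy"), ("A101", "Craig"), ("B102", "Andy"), ("Z102", "Jim"),
    ("A102", "Jim"), ("Z101", "Craig"), ("B101", "Andy"), ("A101", "Jim"),
    ("Z102", "Craig"), ("Z102", "Andy"), ("A102", "Craig"), ("Z101", "Jim"),
]

def sort_with_inline_first_group(rows):
    """ORDER BY a correlated MIN(group_id) lookup, the inlined getFirstGroupId."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE group_table (group_id TEXT, name TEXT)")
    conn.executemany("INSERT INTO group_table VALUES (?, ?)", rows)
    out = conn.execute("""
        SELECT g.group_id, g.name
        FROM group_table g
        ORDER BY (SELECT MIN(f.group_id) FROM group_table f WHERE f.name = g.name),
                 g.name,
                 g.group_id
    """).fetchall()
    conn.close()
    return out

for group_id, name in sort_with_inline_first_group(GROUP_ROWS):
    print(group_id, name)
```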