SQL how to display results only if two parts are unique - sql

I'm currently having an issue trying to write a query that displays rows only if two fields, taken together, are unique. For example, let's say the fields currently displayed are as follows:
SELECT
Name,
CompanyName,
JobStartDate,
Birthday,
Age,
Favorite Ice Cream,
Height
From 'sample_person_data'
How would I set this up so that it only displays rows where the combination of CompanyName and JobStartDate is distinct?
At first, I thought just adding DISTINCT would be enough, but realized that would not work. I then thought about treating CompanyName + JobStartDate together as the unique key, so that only rows where that combination is unique are shown, but I could not work out how to implement it.
Essentially, what I'm aiming to achieve is this: given a large dataset with some repeated values, how can I display only the unique rows? I use CompanyName and JobStartDate as examples here, but I understand that people can start at the same company on the same day, so this is a concept that could be expanded by adding more comparisons.
Thank you for your time.
EDIT: Based on comments I am trying to provide further detail by example
Say this is the sample data:
Name    CompanyName  JobStartDate  Birthday  Age  Favorite Ice Cream  Height
John    Google       04-17-00      01-01-78  50   Vanilla             5-7
John    Google       04-17-00      01-01-78  50   Chocolate           5-7
John    Microsoft    04-17-00      02-01-95  30   Chocolate           5-8
Nancy   Google       06-27-00      04-01-78  50   Vanilla             5-2
Joanna  Google       08-19-00      05-01-78  50   Vanilla             5-0
So here we see that the same John from Google filled in the form twice because, say, he decided to change his favorite ice cream. How do I edit the query so that it displays the following:
Name    CompanyName  JobStartDate  Birthday  Age  Favorite Ice Cream  Height
John    Google       04-17-00      01-01-78  50   Vanilla             5-7
John    Microsoft    04-17-00      02-01-95  30   Chocolate           5-8
Nancy   Google       06-27-00      04-01-78  50   Vanilla             5-2
Joanna  Google       08-19-00      05-01-78  50   Vanilla             5-0
I don't really care whether his favorite ice cream shows up as Chocolate or Vanilla, only that a single entry for John from Google shows up, using the current company + job start date as the identifying fields, for example.

Use the simple approach below:
select * from your_table
qualify 1 = row_number() over(partition by CompanyName, JobStartDate)
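If applied to the sample data in your question, the output keeps one row for each distinct CompanyName and JobStartDate pair. Because the window has no ORDER BY, which of the duplicate rows is kept is arbitrary.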
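A minimal sketch of the same idea, run through Python's built-in sqlite3 module so it can be tried without a data warehouse. SQLite has no QUALIFY clause, so the row number is computed in a subquery and filtered in the outer query instead. The column name FavoriteIceCream (no spaces) and the ORDER BY inside the window are assumptions added here to make the result repeatable; window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sample_person_data (
        Name TEXT, CompanyName TEXT, JobStartDate TEXT, Birthday TEXT,
        Age INTEGER, FavoriteIceCream TEXT, Height TEXT
    )
""")
rows = [
    ("John", "Google", "04-17-00", "01-01-78", 50, "Vanilla", "5-7"),
    ("John", "Google", "04-17-00", "01-01-78", 50, "Chocolate", "5-7"),
    ("John", "Microsoft", "04-17-00", "02-01-95", 30, "Chocolate", "5-8"),
    ("Nancy", "Google", "06-27-00", "04-01-78", 50, "Vanilla", "5-2"),
    ("Joanna", "Google", "08-19-00", "05-01-78", 50, "Vanilla", "5-0"),
]
conn.executemany(
    "INSERT INTO sample_person_data VALUES (?, ?, ?, ?, ?, ?, ?)", rows
)

# Number the rows inside each (CompanyName, JobStartDate) group,
# then keep only the first row of every group.
dedup_sql = """
    SELECT Name, CompanyName, JobStartDate, FavoriteIceCream
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY CompanyName, JobStartDate
                   ORDER BY FavoriteIceCream DESC  -- assumption: tie-breaker
               ) AS rn
        FROM sample_person_data
    )
    WHERE rn = 1
    ORDER BY CompanyName, JobStartDate
"""
deduped = conn.execute(dedup_sql).fetchall()
for row in deduped:
    print(row)
```

With this tie-breaker the Vanilla row for John at Google is the one kept; any other ORDER BY would be just as valid, since the question does not care which duplicate survives.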

Related

Cannot identify how to query SQL data in unusual format

I have been practicing data manipulation in SQL Server Express 2017, and I have been provided with a data source I can't seem to make sense of. I was hoping there might be someone more familiar here that might be able to point me in the right direction. I need to work on some SQL queries on the dataset, but I haven't the faintest idea on where to start.
The data looks like this for instance:
Company Code - Field - Value (3 fields)
1001 - Vendor Name - 7 Eleven
1001 - Vendor Name - Bob Jane
1001 - Vendor Name - Krispy Kreme
1001 - Vendor Address - 102 Reservoir Street
1001 - Vendor Address - 110 Pitt Road
1001 - Vendor Address - 23 Foxy Place
Usually, I would expect to see it in a somewhat relational type of table like
Company Code Vendor Name Vendor Address
1001 7 Eleven 102 Reservoir Street.
1001 Bob Jane 110 Pitt Road.
What you have appears to be a design called EAV - entity attribute value. That term you can google. Unfortunately either you left out important information or your design is fundamentally broken. Given what you posted, there is no way to know that "Bob Jane" goes with "110 Pitt Road". Rows in a table have no reliable or specific order. You need a column in your table to define "order" if you want to associate rows based on "order".
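To illustrate the point about needing an ordering column, here is a small sketch using Python's built-in sqlite3 module. The VendorSeq column is hypothetical: nothing like it exists in the posted data, and the pairing of names with addresses below is only assumed from the layout the asker expected. Once such a column exists, conditional aggregation can turn the EAV rows into one row per vendor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# VendorSeq is a hypothetical column that ties a name to its address.
conn.execute("""
    CREATE TABLE vendor_eav (
        CompanyCode INTEGER, VendorSeq INTEGER, Field TEXT, Value TEXT
    )
""")
conn.executemany(
    "INSERT INTO vendor_eav VALUES (?, ?, ?, ?)",
    [
        (1001, 1, "Vendor Name", "7 Eleven"),
        (1001, 2, "Vendor Name", "Bob Jane"),
        (1001, 3, "Vendor Name", "Krispy Kreme"),
        (1001, 1, "Vendor Address", "102 Reservoir Street"),
        (1001, 2, "Vendor Address", "110 Pitt Road"),
        (1001, 3, "Vendor Address", "23 Foxy Place"),
    ],
)

# One output row per (CompanyCode, VendorSeq); each CASE picks out the
# value of a single attribute and MAX ignores the NULLs from other rows.
pivot_sql = """
    SELECT CompanyCode,
           MAX(CASE WHEN Field = 'Vendor Name' THEN Value END) AS VendorName,
           MAX(CASE WHEN Field = 'Vendor Address' THEN Value END) AS VendorAddress
    FROM vendor_eav
    GROUP BY CompanyCode, VendorSeq
    ORDER BY CompanyCode, VendorSeq
"""
vendors = conn.execute(pivot_sql).fetchall()
for row in vendors:
    print(row)
```

Without a column like VendorSeq, the GROUP BY has nothing to group on and the association between a name and an address cannot be recovered.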

Combining almost identical rows into 1

I have a tricky problem that I wouldn't mind a bit of help on. I've made some progress using queries that I've found here and elsewhere, but am getting seriously stumped now.
I have a mailing list that has numerous near duplications that I'm trying to combine into one meaningful row, taking data such as this.
Title  Forename  Surname  Address1       Postcode  Phone   Age    Income  Ownership  Gas
Mrs    D         Andrews  122 Somewhere  BH10      123456  66-70          Homeowner
Ms     Diane     Andrews  122 Somewhere  BH10      123456         £25-40             EDF
and making one row along the lines of
Title  Forename  Surname  Address1       Postcode  Phone   Age    Income  Ownership  Gas
Mrs    Diane     Andrews  122 Somewhere  BH10      123456  66-70  £25-40  Homeowner  EDF
I have over 127 million records, most duplicated with a similar pattern, but no clear logic as was proven when I added an identity field. I also have over 90 columns to consider, so it's a bit of work!
There isn't a clear pattern to the data, so I'm thinking I may have a huge case statement to try to climb over.
Using the following code I can make a decent start on returning only the row with the full name, but given the pattern of the data I'm stuck on comparing the other fields across rows.
SELECT c1.*
FROM Mailing c1
JOIN Mailing c2
    ON c1.Telephone1 = c2.Telephone1
    AND c1.surname = c2.surname
WHERE len(c1.Forename) > len(c2.Forename)
    AND c2.over_18 <> ''
    AND c1.Telephone1 = '123456'
Has anyone got any pointers as to how I should progress please? I'm open to discussion and ideas...
I'm using SQL 2005 and apologies in advance if the tagging is all over the place!
Cheers,
Jon
Would it work by assuming that all persons with the same surname and phone number (Do all persons have a phone?) were the same person?
INSERT INTO newtable <fieldnames>
SELECT lastname,phone,max(field3),max(field4)....
FROM oldtable
GROUP BY lastname,phone
But that would collapse John Smith and Jack Smith living together into one person.
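A minimal sketch of that GROUP BY idea on the two sample rows from the question, run through Python's built-in sqlite3 module. The column list is trimmed to a handful of the 90+ fields, and storing blanks as empty strings is an assumption; NULLIF turns them into NULL so that MAX skips them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Mailing (
        Title TEXT, Forename TEXT, Surname TEXT, Telephone1 TEXT,
        Age TEXT, Income TEXT, Ownership TEXT, Gas TEXT
    )
""")
conn.executemany(
    "INSERT INTO Mailing VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("Mrs", "D", "Andrews", "123456", "66-70", "", "Homeowner", ""),
        ("Ms", "Diane", "Andrews", "123456", "", "£25-40", "", "EDF"),
    ],
)

# Collapse every (Surname, Telephone1) group into one row, taking the
# highest non-blank value of each remaining column.
merge_sql = """
    SELECT Surname, Telephone1,
           MAX(NULLIF(Title, ''))     AS Title,
           MAX(NULLIF(Forename, ''))  AS Forename,
           MAX(NULLIF(Age, ''))       AS Age,
           MAX(NULLIF(Income, ''))    AS Income,
           MAX(NULLIF(Ownership, '')) AS Ownership,
           MAX(NULLIF(Gas, ''))       AS Gas
    FROM Mailing
    GROUP BY Surname, Telephone1
"""
merged = conn.execute(merge_sql).fetchall()
for row in merged:
    print(row)
```

MAX fills the gaps, but where both rows carry a value it simply picks the one that sorts last: here "Diane" beats "D", which is wanted, while "Ms" beats "Mrs", which is not what the asker's target row shows. Columns like Title would need an explicit rule.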
Perhaps you should consider outsourcing it to a data-entry sweatshop somewhere, after you have preprocessed the data. :-)
And/or be prepared to take the flak for mistaken bundling.
Perhaps adding something like "To improve our green footprint, we have merged x listings at your address together. If you would like separate mailings, please contact us"

Crystal reports - missing fields

I'm using Crystal Reports 10 linked to an Excel document. I would like to pull the Dinner field, but also pull Country and CompanyName for rows that don't have them; the rows are linked via BookingRef. Example below. I've tried sub-reports and suppressing unwanted fields but can't get it right. Also, I can't make changes in the Excel document, as it has 1000+ records and is exported from an online system weekly.
Id  BookingRef  Country  CompanyName  Surname  Forname  Dinner
1   001         UK       Company1     John     Andrews
2   001                               Mary     Jane     1
3   001                               Tom      Andrews  1
4   002         Germany  Company2     Lee      Jones
5   003         Germany  Company3     Peter    Lee      1
6   003                               Sofie    Lee      1
OK I am not sure I understand the full extent of your problem but let's start with the Country and Company name and see if I can get you moving forward. Instead of putting the Country field directly on the report you could use a formula field and do something like this:
IF {#BookingRef} = "001" Then
"UK"
Else IF {#BookingRef} = "002" Then
"Germany"
Else
"Unnamed"
Now you just put the formula field where the country field used to be and it will put in the right country based on the BookingRef code. This, however, is only practical if you are working with a small number of Country / Company names, or possibly a big list that never changes, although I would caution against the latter.
The other thing you could do is create a table in any database that holds the BookingRef, Company and Country values, link the BookingRef fields from both "databases" and then just drop the fields on your report.
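The lookup-table idea can be sketched outside Crystal Reports with Python's built-in sqlite3 module. The table names bookings and booking_lookup are hypothetical, and in Crystal the link would be drawn in the Database Expert rather than written as SQL; the join below only shows what that link produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE bookings (
        Id INTEGER, BookingRef TEXT, Country TEXT, CompanyName TEXT,
        Surname TEXT, Forname TEXT, Dinner INTEGER
    )
""")
conn.executemany(
    "INSERT INTO bookings VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "001", "UK", "Company1", "John", "Andrews", None),
        (2, "001", None, None, "Mary", "Jane", 1),
        (3, "001", None, None, "Tom", "Andrews", 1),
        (4, "002", "Germany", "Company2", "Lee", "Jones", None),
        (5, "003", "Germany", "Company3", "Peter", "Lee", 1),
        (6, "003", None, None, "Sofie", "Lee", 1),
    ],
)

# Hypothetical lookup table: one row per BookingRef.
conn.execute("""
    CREATE TABLE booking_lookup (
        BookingRef TEXT PRIMARY KEY, Country TEXT, CompanyName TEXT
    )
""")
conn.executemany(
    "INSERT INTO booking_lookup VALUES (?, ?, ?)",
    [
        ("001", "UK", "Company1"),
        ("002", "Germany", "Company2"),
        ("003", "Germany", "Company3"),
    ],
)

# Every dinner row gets Country and CompanyName from the lookup table,
# whether or not the exported row itself carried them.
dinner_sql = """
    SELECT b.Id, b.BookingRef, l.Country, l.CompanyName, b.Surname, b.Forname
    FROM bookings b
    JOIN booking_lookup l ON l.BookingRef = b.BookingRef
    WHERE b.Dinner = 1
    ORDER BY b.Id
"""
dinners = conn.execute(dinner_sql).fetchall()
for row in dinners:
    print(row)
```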
If I am missing the point of your question please be real specific about what it is you are trying to accomplish and what is and is not working in your current solution.

Database design - alternatives for Entity Attribute Value (EAV)

See "How to design a product table for many kinds of product where each product has many parameters" for a similar topic.
My question: I want to design a database that will be used for a production facility making different types of products, where each product has its own set (and number) of parameters.
Because I want the serial numbers to be in one table for overview purposes, I have a problem with these different parameters.
One solution could be EAV, but it has its downsides, certainly because we have about 5 products, each with about 20,000 serial numbers (records). It looks like a bit of overkill to me...
I just don't know how one could design a database so that you have an attribute in a master table that says: "hey, you can find details of this record in THAT detail table"
(in a way that lets you easily query the results).
Currently I am using Visual Basic & Access 2007, but I'm moving to Visual Basic & MySQL.
Thanks for your help.
Bob
I would go with something like this:
product [productid, title, price, datecreated, datemodified, etc]
attribute [attributeid, title]
productattribute [productid, attributeid, value, unit]
Example:
[product]
productid  title   price  datecreated  datemodified
1          LCD TV  99.95  2010-01-01   2010-01-01
2          Car     12356  2010-01-01   2010-01-02
3          B/W TV  12.95  1960-01-01   1960-01-01
[attribute]
attributeid  title
10           Colors
11           Dimensions
12           Passengers
[productattribute]
productid  attributeid  value  unit
1          10           16     million
1          11           32     inch
2          12           4      adults
3          10           2      colors
3          11           6      inch
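As a rough sketch of how the three tables above would be queried together, here is the same sample data loaded through Python's built-in sqlite3 module. The date columns are left out for brevity, and storing value as TEXT is an assumption, since the answer does not give column types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (productid INTEGER PRIMARY KEY, title TEXT, price REAL);
    CREATE TABLE attribute (attributeid INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE productattribute (
        productid INTEGER REFERENCES product(productid),
        attributeid INTEGER REFERENCES attribute(attributeid),
        value TEXT,  -- assumption: values kept as text
        unit TEXT
    );
    INSERT INTO product VALUES (1, 'LCD TV', 99.95), (2, 'Car', 12356), (3, 'B/W TV', 12.95);
    INSERT INTO attribute VALUES (10, 'Colors'), (11, 'Dimensions'), (12, 'Passengers');
    INSERT INTO productattribute VALUES
        (1, 10, '16', 'million'), (1, 11, '32', 'inch'),
        (2, 12, '4', 'adults'),
        (3, 10, '2', 'colors'), (3, 11, '6', 'inch');
""")

# List every attribute of one product by joining through the link table.
attrs_sql = """
    SELECT p.title, a.title, pa.value, pa.unit
    FROM product p
    JOIN productattribute pa ON pa.productid = p.productid
    JOIN attribute a ON a.attributeid = pa.attributeid
    WHERE p.productid = ?
    ORDER BY a.attributeid
"""
lcd_attrs = conn.execute(attrs_sql, (1,)).fetchall()
for row in lcd_attrs:
    print(row)
```

Adding a new kind of parameter is then a row in attribute rather than a schema change, which is the trade-off this layout buys.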
It seems you probably need to learn more about the available design patterns when dealing with this sort of problem as there isn't a one-size-fits-all solution.
I recommend picking up a copy of Patterns of Enterprise Application Architecture to help you on your way. Sorry that I'm not able to answer your question directly (hopefully someone else here on SO can), but I think the answer given in the question you linked to is about as good as it gets.

Beginner SQL question: querying gold and silver tag badges in Stack Exchange Data Explorer

I'm using the Stack Exchange Data Explorer to learn SQL, but I think the fundamentals of the question are applicable to other databases.
I'm trying to query the Badges table, which according to Stexdex (that's what I'm going to call it from now on) has the following schema:
Badges
Id
UserId
Name
Date
This works well for badges like [Epic] and [Legendary], which have unique names, but the silver and gold tag-specific badges seem to be mixed in together by having the exact same name.
Here's an example query I wrote for [mysql] tag:
SELECT
UserId as [User Link],
Date
FROM
Badges
Where
Name = 'mysql'
Order By
Date ASC
The (slightly annotated) output, as seen on stexdex, is:
User Link Date
--------------- ------------------- // all for silver except where noted
Bill Karwin 2009-02-20 11:00:25
Quassnoi 2009-06-01 10:00:16
Greg 2009-10-22 10:00:25
Quassnoi 2009-10-31 10:00:24 // for gold
Bill Karwin 2009-11-23 11:00:30 // for gold
cletus 2010-01-01 11:00:23
OMG Ponies 2010-01-03 11:00:48
Pascal MARTIN 2010-02-17 11:00:29
Mark Byers 2010-04-07 10:00:35
Daniel Vassallo 2010-05-14 10:00:38
This is consistent with the current list of silver and gold earners at the moment of this writing, but to speak in more timeless terms, as of the end of May 2010 only 2 users have earned the gold [mysql] tag: Quassnoi and Bill Karwin, as evidenced in the above result by their names being the only ones that appear twice.
So this is the way I understand it:
The first time an Id appears (in chronological order) is for the silver badge
The second time is for the gold
Now, the above result mixes the silver and gold entries together. My questions are:
Is this a typical design, or are there much friendlier schema/normalization/whatever you call it?
In the current design, how would you query the silver and gold badges separately?
GROUP BY Id and picking the min/max or first/second by the Date somehow?
How can you write a query that lists all the silver badges first then all the gold badges next?
Imagine also that the "real" query may be more complicated, i.e. not just listing by date.
How would you write it so that it doesn't have too much repetition between the silver and gold subqueries?
Is it perhaps more typical to do two totally separate queries instead?
What is this idiom called? A row "partitioning" query to put them into "buckets" or something?
Requirement clarification
Originally I wanted the following output, essentially:
User Link Date
--------------- -------------------
Bill Karwin 2009-02-20 11:00:25 // result of query for silver
Quassnoi 2009-06-01 10:00:16 // :
Greg 2009-10-22 10:00:25 // :
cletus 2010-01-01 11:00:23 // :
OMG Ponies 2010-01-03 11:00:48 // :
Pascal MARTIN 2010-02-17 11:00:29 // :
Mark Byers 2010-04-07 10:00:35 // :
Daniel Vassallo 2010-05-14 10:00:38 // :
------- maybe some sort of row separator here? can SQL do this? -------
Quassnoi 2009-10-31 10:00:24 // result of query for gold
Bill Karwin 2009-11-23 11:00:30 // :
But the answers so far with separate columns for silver and gold are also great, so feel free to pursue that angle as well. I'm still curious how you'd do the above, though.
Is this a typical design, or are there much friendlier schema/normalization/whatever you call it?
Sure, you could add a type code to make it more explicit. But when you consider that one can not get a gold badge before a silver one, the date stamp makes a lot of sense to differentiate between them.
In the current design, how would you query the silver and gold badges separately? GROUP BY Id and picking the min/max or first/second by the Date somehow?
Yes - joining onto a derived table (AKA inline view) that is a list of users & the minimum date would return the silver badges. Using HAVING COUNT(*) >= 1 would work too. You'd have to use a combination of GROUP BY and HAVING COUNT(*) = 2 to get gold badges - the max date doesn't ensure that there is more than one record for a userid...
How can you write a query that lists all the silver badges first then all the gold badges next?
Sorry - by users, or all silvers first and then golds? The former might be done simply by using ORDER BY t.userid, t.date; the latter I'd likely use analytic functions (IE: ROW_NUMBER(), RANK())...
Is it perhaps more typical to do two totally separate queries instead?
See above about how vague your requirements are, to me anyways...
What is this idiom called? A row "partitioning" query to put them into "buckets" or something?
What you're asking about is referred to by the following synonyms: Analytic, Windowing, ranking...
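A minimal sketch of the windowing approach, run through Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). Two simplifications are assumptions made for readability: UserId holds the display name instead of a numeric id, and only a few of the rows from the question are loaded. It relies on the rule from the question that a user's first [mysql] row is silver and the second is gold.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: UserId stores the display name to keep the output readable.
conn.execute(
    "CREATE TABLE Badges (Id INTEGER PRIMARY KEY, UserId TEXT, Name TEXT, Date TEXT)"
)
conn.executemany(
    "INSERT INTO Badges VALUES (?, ?, ?, ?)",
    [
        (1, "Bill Karwin", "mysql", "2009-02-20 11:00:25"),
        (2, "Quassnoi", "mysql", "2009-06-01 10:00:16"),
        (3, "Greg", "mysql", "2009-10-22 10:00:25"),
        (4, "Quassnoi", "mysql", "2009-10-31 10:00:24"),
        (5, "Bill Karwin", "mysql", "2009-11-23 11:00:30"),
    ],
)

# Number each user's rows by date: 1 is the silver badge, 2 the gold.
# Ordering by that number lists all silvers first, then all golds.
badge_sql = """
    SELECT UserId, Date,
           CASE rn WHEN 1 THEN 'silver' ELSE 'gold' END AS BadgeClass
    FROM (
        SELECT UserId, Date,
               ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY Date) AS rn
        FROM Badges
        WHERE Name = 'mysql'
    )
    ORDER BY rn, Date
"""
badges = conn.execute(badge_sql).fetchall()
for row in badges:
    print(row)
```

The filter and any extra logic live in one place, the inner query, so nothing has to be repeated between a silver and a gold subquery.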
You'd do something like this and rely only on date or count in an aggregate.
Arguably, it also makes less sense to query silver followed by gold than to get the data side by side, as in the query below.
Unfortunately, you haven't really specified what you want, but a good starting point for aggregates is to express it in plain English
Example: "Give me dates of silver and gold badge awards per user for tag mysql". Which this does:
SELECT
    UserId AS [User Link],
    MIN(Date) AS [Silver Date],
    CASE WHEN COUNT(*) = 1 THEN NULL ELSE MAX(Date) END AS [Gold Date]
FROM
    Badges
WHERE
    Name = 'mysql'
GROUP BY
    UserId
ORDER BY
    CASE WHEN COUNT(*) = 1 THEN NULL ELSE MAX(Date) END DESC, MIN(Date)
Edit, after update:
Your desired output is not really SQL: it's 2 separate recordsets. The separator is a no-go. As this is a set-based operation, there is no "natural" order, so this introduces one:
SELECT
UserId as [User Link],
min(Date) as [Date],
0 as dummyorder
FROM
Badges
Where
Name = 'mysql'
group by
UserId
union all
select
UserId as [User Link],
max(Date) as [Date],
1 as dummyorder
FROM
Badges
Where
Name = 'mysql'
group by
UserId
having
count(*) = 2
Order By
dummyorder, Date