I have a table with the following sampled data:
Name Color
Alice Green
Bob Black
Chris Green
Chris Black
David Red
Peter Blue
Simon Blue
Simon Red
Simon Green
Ultimately, I want to reduce the table by consolidating the Color column like:
Name Color
Alice Green
Bob Black
Chris Green, Black
David Red
Peter Blue
Simon Blue, Red, Green
such that Name can become unique.
The table has no PRIMARY KEY. I got as far as creating a new column using ROW_NUMBER to distinguish duplicates, but I don't know what to do next:
rownumber Name Color
1 Alice Green
1 Bob Black
1 Chris Green
2 Chris Black
1 David Red
1 Peter Blue
1 Simon Blue
2 Simon Red
3 Simon Green
Don't do this. Instead, normalize your tables further to e.g. a Person, Preference and a Color table (where Preference, if that is the right name for the relation, has foreign keys to Person and Color). This way, you avoid the risks of inconsistencies (you can make Person names unique if you like, but you should make Color names unique).
EDITED: if you're getting this from a join query I'll assume the data is reasonably consistent, so normalization isn't an issue. Would it be possible to change the join query to GROUP on Name instead? Much cleaner than hacking around a result set, really!
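A minimal sketch of the GROUP-on-Name idea, run against an in-memory SQLite database through Python so it can be tried without a server. The table name people_colors is hypothetical, and SQLite's GROUP_CONCAT is only a stand-in for whatever string aggregation your engine offers:

```python
import sqlite3

# In-memory SQLite stands in for the real database.
# The table name "people_colors" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people_colors (Name TEXT, Color TEXT)")
rows = [
    ("Alice", "Green"), ("Bob", "Black"), ("Chris", "Green"),
    ("Chris", "Black"), ("David", "Red"), ("Peter", "Blue"),
    ("Simon", "Blue"), ("Simon", "Red"), ("Simon", "Green"),
]
conn.executemany("INSERT INTO people_colors VALUES (?, ?)", rows)

# One row per Name; GROUP_CONCAT joins that name's colors with ", ".
# The order of colors inside each list is not guaranteed by this query.
query = """
    SELECT Name, GROUP_CONCAT(Color, ', ') AS Color
    FROM people_colors
    GROUP BY Name
    ORDER BY Name
"""
consolidated = dict(conn.execute(query).fetchall())
for name, colors in consolidated.items():
    print(name, colors)
```

On SQL Server 2017 and later, STRING_AGG(Color, ', ') plays the same role; on older versions a FOR XML PATH subquery is the usual workaround.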
I have adopted the approach here, with a table variable to hold the temporary result set to work (hack) from.
Sorted!
Say I have a data table like this:
Name Color Food
John Black Pizza
John Blue Pasta
John Blue Fries
I ran this code to combine the second and third rows, so that the value for the "Food" field shows "Pasta/Fries". This is the code that I ran on Amazon Redshift:
SELECT Name, Color, listagg(Food, '/') WITHIN GROUP (ORDER BY Food) as Food FROM data GROUP BY Name, Color ORDER BY Name, Color
It successfully produced the table that I want:
Name Color Food
John Black Pizza
John Blue Pasta/Fries
--
Now let's say the original data table was revised and now has an additional column for Race:
Name Color Food Race
John Black Pizza Asian
John Blue Pasta White
John Blue Fries NULL
The problem I now face is when I run the code stated above, but now including Race:
SELECT Name, Color, Race, listagg(Food, '/') WITHIN GROUP (ORDER BY Food) as Food FROM data GROUP BY Name, Color, Race ORDER BY Name, Color, Race
It no longer aggregates the two rows (Pasta and Fries), because the value of "Race" differs between them. How can I modify my code so that, since the value of Race for Fries is NULL, the combined row just returns "White"? In other words, I want my final table to look like this:
Name Color Food Race
John Black Pizza Asian
John Blue Pasta/Fries White
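One possible approach, sketched here rather than tested on Redshift: leave Race out of the GROUP BY and take MAX(Race) instead. Aggregates such as MAX skip NULLs, so the non-NULL value wins. This assumes each (Name, Color) group has at most one distinct non-NULL Race. The sketch runs on an in-memory SQLite database through Python, with GROUP_CONCAT standing in for LISTAGG:

```python
import sqlite3

# In-memory SQLite stands in for Redshift; GROUP_CONCAT replaces LISTAGG here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (Name TEXT, Color TEXT, Food TEXT, Race TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        ("John", "Black", "Pizza", "Asian"),
        ("John", "Blue", "Pasta", "White"),
        ("John", "Blue", "Fries", None),  # NULL Race
    ],
)

# Race is aggregated instead of grouped, so the NULL row no longer
# splits the (John, Blue) group. MAX ignores NULL values.
query = """
    SELECT Name, Color, GROUP_CONCAT(Food, '/') AS Food, MAX(Race) AS Race
    FROM data
    GROUP BY Name, Color
    ORDER BY Name, Color
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

On Redshift the same change would mean keeping the LISTAGG ... WITHIN GROUP expression as it is, removing Race from GROUP BY and ORDER BY, and selecting MAX(Race) AS Race.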
This isn't as simple as the title. (What is an appropriate title?) I think the easiest way to describe my issue is with an example.
My goal is a list of which balloon colors each child is missing.
Let’s assume table 1 contains the following data in 2 columns:
Child BalloonColor
Sally Yellow
Sally White
Sally Blue
Bob Red
Bob Green
Bob White
This is table 2, also 2 columns.
ColorCode Color
Y Yellow
W White
R Red
B Blue
G Green
P Pink
I need to write a result set that states which balloon colors each child still needs in order to have all colors.
Sally, R, Red
Sally, G, Green
Sally, P, Pink
Bob, Y, Yellow
Bob, B, Blue
Bob, P, Pink
My example is small, but assume I have 1000 children in table 1 and 75 colors in table 2. My ultimate question is: how can I check each child, one at a time? A NOT IN query will only yield "P, Pink", but you can see I need it at the child level, not the table level.
I'm not a developer, but can write good SQL statements.
MS SQL Server 2008 R2.
Thanks in advance, Mike.
SELECT
    SQ.child_name,
    BC.balloon_color
FROM
    (
        SELECT DISTINCT
            child_name
        FROM
            Child_Balloons
    ) SQ
    CROSS JOIN Balloon_Colors BC
WHERE
    NOT EXISTS (
        SELECT *
        FROM Child_Balloons CB
        WHERE
            CB.child_name = SQ.child_name AND
            CB.balloon_color = BC.balloon_color
    )
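To check the query above against the sample data from the question, here is a small sketch using an in-memory SQLite database from Python. The table and column names follow the query; the color_code column is an assumption added so the code letter can be returned as well:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Child_Balloons (child_name TEXT, balloon_color TEXT);
    -- color_code is a hypothetical column name for the question's ColorCode
    CREATE TABLE Balloon_Colors (color_code TEXT, balloon_color TEXT);
    INSERT INTO Child_Balloons VALUES
        ('Sally', 'Yellow'), ('Sally', 'White'), ('Sally', 'Blue'),
        ('Bob', 'Red'), ('Bob', 'Green'), ('Bob', 'White');
    INSERT INTO Balloon_Colors VALUES
        ('Y', 'Yellow'), ('W', 'White'), ('R', 'Red'),
        ('B', 'Blue'), ('G', 'Green'), ('P', 'Pink');
""")

# Pair every child with every color, then keep only the pairs
# that do not already exist in Child_Balloons.
query = """
    SELECT SQ.child_name, BC.color_code, BC.balloon_color
    FROM (SELECT DISTINCT child_name FROM Child_Balloons) SQ
    CROSS JOIN Balloon_Colors BC
    WHERE NOT EXISTS (
        SELECT *
        FROM Child_Balloons CB
        WHERE CB.child_name = SQ.child_name
          AND CB.balloon_color = BC.balloon_color
    )
"""
missing = {}
for child, code, color in conn.execute(query):
    missing.setdefault(child, set()).add((code, color))

for child, colors in missing.items():
    print(child, sorted(colors))
```

With 1000 children and 75 colors the cross join produces 75,000 candidate pairs before filtering, which should be manageable, though an index on (child_name, balloon_color) would likely help the NOT EXISTS lookup.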
I am trying to query a table. Let's say there are 3 columns: color, car brand, and car model.
I want to exclude all blue and green cars, and also any Honda Civic. This is the WHERE clause I have:
color not in ("blue", "green")
and (carBrand not = "Honda" and carModel not = "Civic").
I tried using the above statement, and it actually excludes all Hondas, all Civics, and any car that is blue or green. I've decided to break the query into two pieces and run them in sequence:
color not in ("blue","green")
Then from that list run below:
carBrand not ="Honda" and carModel not = "Civic"
My question is: could I have done the above with one query rather than two?
FYI, I like Honda Civics, just thought it was a good example since a lot of people know that car model.
Thanks in advance.
Added new comments below for ease of reference.
I tried using the code below. It also excludes the Red, Honda, Accord row, which I want to keep. I have also added a sample table of the data for reference. Thanks for the help.
SELECT Color, CarBrand, CarModel
FROM ColorCarModel
WHERE (
(Color Not In ('green','blue'))
AND (CarBrand Not In 'Honda')
AND (CarModel Not In 'Civic')
);
Example data:
ID Color CarBrand CarModel
1 Green Honda Civic
2 Blue Honda Civic
3 Red Honda Civic
4 Green Ford Civic
5 Blue Ford Taurus
6 Red Ford Taurus
7 Red Honda Accord
8 Red Ford Explorer
SELECT Color, CarBrand, CarModel FROM YourTable
WHERE Color NOT IN ('green','blue')
AND CarBrand NOT IN ('Honda')
AND CarModel NOT IN ('Civic');
Try this and let me know if this helps you.
I didn't get what you wanted to exclude based on the 3 filters ... try this
where not (color in ('blue','green') and carBrand ='Honda' and carModel = 'Civic')
The problem with your original where clause is that you applied the not in operator separately to the populations of CarBrand and CarModel. You had:
(CarBrand Not In ("Honda")) and (CarModel Not In ("Civic"))
Instead, you want:
Not(CarBrand = "Honda" and CarModel = "Civic")
Here is a working example:
data ColorCarModel;
format id best. color $20. CarBrand $20. CarModel $20.;
input id color CarBrand CarModel;
datalines;
1 Green Honda Civic
2 Blue Honda Civic
3 Red Honda Civic
4 Green Ford Civic
5 Blue Ford Taurus
6 Red Ford Taurus
7 Red Honda Accord
8 Red Ford Explorer
9 Red Dummy Civic
run;
proc sql;
    select color, CarBrand, CarModel
    from ColorCarModel
    where (
        upcase(color) not in ('GREEN','BLUE') AND
        NOT (upcase(CarBrand) = 'HONDA' AND upcase(CarModel) = 'CIVIC')
    );
quit;
Notice also that I apply upcase() to the variables before comparing them to strings. This prevents problems caused by mixed case in the data set (i.e. where CarBrand = "Honda" would miss observations where CarBrand contains "honda").
If it's easier to think about the logic without any nesting, the following SQL provides an equivalent filter via concatenation of CarModel and CarBrand:
proc sql;
    select color, CarBrand, CarModel
    from ColorCarModel
    where (
        upcase(color) not in ('GREEN','BLUE') AND
        upcase(catx('-', CarBrand, CarModel)) <> 'HONDA-CIVIC'
    );
quit;
Because these filters are essentially propositional logic, a good tutorial or text on that subject could be a helpful reference. In particular, learning rules like De Morgan's law can help you think through how to transform a where clause to a different, equivalent form, which can be helpful in simplifying complex logic. If you're feeling ambitious, Discrete Mathematics by Rosen is a standard text used in CS curricula that includes an early chapter on propositional logic.
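To see De Morgan's law at work on the question's sample rows outside SAS, here is a minimal sketch using an in-memory SQLite database from Python. The ids helper is hypothetical and exists only for this illustration. It compares the two separate "not equal" tests, the negated pair, and the De Morgan rewrite of the negated pair:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ColorCarModel (id INTEGER, Color TEXT, CarBrand TEXT, CarModel TEXT)"
)
conn.executemany(
    "INSERT INTO ColorCarModel VALUES (?, ?, ?, ?)",
    [
        (1, "Green", "Honda", "Civic"), (2, "Blue", "Honda", "Civic"),
        (3, "Red", "Honda", "Civic"), (4, "Green", "Ford", "Civic"),
        (5, "Blue", "Ford", "Taurus"), (6, "Red", "Ford", "Taurus"),
        (7, "Red", "Honda", "Accord"), (8, "Red", "Ford", "Explorer"),
    ],
)

def ids(where_clause):
    """Hypothetical helper: return the ids of rows matching a WHERE clause."""
    sql = "SELECT id FROM ColorCarModel WHERE " + where_clause + " ORDER BY id"
    return [row[0] for row in conn.execute(sql)]

color_ok = "UPPER(Color) NOT IN ('GREEN', 'BLUE')"

# Separate tests: drops every Honda and every Civic.
separate = ids(color_ok + " AND CarBrand <> 'Honda' AND CarModel <> 'Civic'")
# Negated pair: drops only rows that are both Honda and Civic.
combined = ids(color_ok + " AND NOT (CarBrand = 'Honda' AND CarModel = 'Civic')")
# De Morgan: NOT (A AND B) is the same as (NOT A) OR (NOT B).
de_morgan = ids(color_ok + " AND (CarBrand <> 'Honda' OR CarModel <> 'Civic')")

print(separate)   # [6, 8]
print(combined)   # [6, 7, 8]
print(de_morgan)  # [6, 7, 8]
```

Row 7 (Red, Honda, Accord) is the one the separate tests lose, which matches the behavior described in the question.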
Say I have a Person table that stores information about that person (weird right?). I have select boxes for things like gender, hair color, and eye color. Instead of creating separate tables with a description field for each, is there a good way to use a single table? Maybe a Resources table with a Name and Description fields? Is it just that simple?
Resources
=========
ID Name Description
--------------------
1 Gender Male
2 Gender Female
3 Eye Color Blue
4 Eye Color Green
5 Eye Color Brown
6 Hair Color Black
7 Hair Color Brunette
8 Hair Color Blonde
9 Hair Color Red
Person
=========
ID Name Gender Eye_Color Hair_Color
-----------------------------------------------
1 Ryan 1 3 8
Is this the recommended way or is there something better for this?
Yes, it is that simple; IMO your approach is correct. But please note that your approach will not work if you need to select, for example, multiple hair colors for one person.
Still, I believe in keeping code simple until you get a requirement to change it; read about YAGNI when you have some time :)
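As a rough sketch of how the single lookup table from the question could be queried, here is an in-memory SQLite example run from Python. The table and column names come from the question; the foreign key declarations and the three-way join are assumptions about how it would be wired up:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Resources (
        ID INTEGER PRIMARY KEY,
        Name TEXT NOT NULL,         -- category, e.g. 'Eye Color'
        Description TEXT NOT NULL   -- value, e.g. 'Blue'
    );
    CREATE TABLE Person (
        ID INTEGER PRIMARY KEY,
        Name TEXT NOT NULL,
        Gender INTEGER REFERENCES Resources(ID),
        Eye_Color INTEGER REFERENCES Resources(ID),
        Hair_Color INTEGER REFERENCES Resources(ID)
    );
    INSERT INTO Resources VALUES
        (1, 'Gender', 'Male'), (2, 'Gender', 'Female'),
        (3, 'Eye Color', 'Blue'), (4, 'Eye Color', 'Green'),
        (5, 'Eye Color', 'Brown'), (6, 'Hair Color', 'Black'),
        (7, 'Hair Color', 'Brunette'), (8, 'Hair Color', 'Blonde'),
        (9, 'Hair Color', 'Red');
    INSERT INTO Person VALUES (1, 'Ryan', 1, 3, 8);
""")

# Join Resources once per attribute; the extra Name check guards
# against a column pointing at a row from the wrong category.
query = """
    SELECT p.Name, g.Description, e.Description, h.Description
    FROM Person p
    JOIN Resources g ON g.ID = p.Gender     AND g.Name = 'Gender'
    JOIN Resources e ON e.ID = p.Eye_Color  AND e.Name = 'Eye Color'
    JOIN Resources h ON h.ID = p.Hair_Color AND h.Name = 'Hair Color'
"""
person = conn.execute(query).fetchone()
print(person)  # ('Ryan', 'Male', 'Blue', 'Blonde')
```

One trade-off worth knowing: a foreign key to Resources(ID) alone cannot stop Eye_Color from referencing a Gender row, which is something separate lookup tables would enforce for free.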
You could do it that way and it would be a polymorphic association.
If you don't need to query this information but just be able to access it you can use serialize and just store all the values in one column.
So a person record would have a column, let's call it attributes, that would have "eye_color: blue, gender: male", etc...
I'd create a separate table called Physical_attributes and an associative one between Person and Physical_attributes, personal_physical_attributes, where I'd store the person's id, the Physical_attribute's id and the description for that Physical_attribute.
I have an additional SQL question; hopefully someone here can give me a hand.
I have the following mysql table:
ID Type Result
1 vinyl blue, red, green
1 leather purple, orange
2 leather yellow
and I am seeking the following output:
ID Row One Row Two
1 vinyl blue, red, green leather purple, orange
2 leather yellow
The thing is... Type is not static. There are many different types, and not every ID has the same ones. They need to follow in order.
Please post a show create table of your table. It's not clear what you mean in fact.
Maybe what you need is GROUP_CONCAT after all:
mysql> select ID, GROUP_CONCAT(type,' ',result) from test group by ID;
Let us know.
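For what it's worth, a rough sketch of the same idea on the sample rows, using an in-memory SQLite database from Python since no MySQL server is assumed here. SQLite's GROUP_CONCAT takes a separator as its second argument rather than a list of values to join, so Type and Result are joined with || first and ' | ' is an arbitrary separator:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (ID INTEGER, Type TEXT, Result TEXT)")
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [
        (1, "vinyl", "blue, red, green"),
        (1, "leather", "purple, orange"),
        (2, "leather", "yellow"),
    ],
)

# One output row per ID; each "Type Result" pair becomes one segment.
# The order of segments within a row is not guaranteed by this query.
query = """
    SELECT ID, GROUP_CONCAT(Type || ' ' || Result, ' | ') AS combined
    FROM test
    GROUP BY ID
    ORDER BY ID
"""
combined = dict(conn.execute(query).fetchall())
for row_id, text in combined.items():
    print(row_id, text)
```

This gives one concatenated column per ID rather than separate "Row One" and "Row Two" columns. In MySQL, GROUP_CONCAT also accepts ORDER BY and SEPARATOR clauses inside the parentheses, which would cover the "need to follow in order" requirement.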