Combining Rows (that contain NULL) in SQL (Amazon Redshift)

Say I have a data table like this:
Name  Color  Food
------------------
John  Black  Pizza
John  Blue   Pasta
John  Blue   Fries
I ran the following code on Amazon Redshift to combine rows 2 and 3, so that the "Food" field shows "Pasta/Fries":
SELECT Name, Color,
       listagg(Food, '/') WITHIN GROUP (ORDER BY Food) as Food
FROM data
GROUP BY Name, Color
ORDER BY Name, Color
It successfully produced the table that I want:
Name  Color  Food
------------------------
John  Black  Pizza
John  Blue   Pasta/Fries
--
Now let's say the original data table was revised and now has an additional column for Race:
Name  Color  Food   Race
-------------------------
John  Black  Pizza  Asian
John  Blue   Pasta  White
John  Blue   Fries  NULL
The problem I now face is when I run the code stated above, but now including Race:
SELECT Name, Color, Race,
       listagg(Food, '/') WITHIN GROUP (ORDER BY Food) as Food
FROM data
GROUP BY Name, Color, Race
ORDER BY Name, Color, Race
The two rows (Pasta and Fries) are no longer aggregated, because their "Race" values differ. How can I modify my code so that, since the Race value for Fries is NULL, the combined row just returns "White"? That is, I want my final table to look like:
Name  Color  Food         Race
-------------------------------
John  Black  Pizza        Asian
John  Blue   Pasta/Fries  White
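One possible approach, sketched under assumptions: stop grouping by Race and aggregate it instead, for example with MAX(Race), since aggregate functions skip NULL. The sketch below runs the idea through Python's sqlite3 module against an in-memory database so it can be tried anywhere; SQLite's group_concat stands in for Redshift's listagg (on Redshift the listagg ... WITHIN GROUP call would stay as it is and only the Race handling would change). It assumes each Name/Color group has at most one distinct non-NULL Race; with more than one, MAX simply picks the alphabetically last value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data (Name TEXT, Color TEXT, Food TEXT, Race TEXT)")
conn.executemany(
    "INSERT INTO data VALUES (?, ?, ?, ?)",
    [
        ("John", "Black", "Pizza", "Asian"),
        ("John", "Blue", "Pasta", "White"),
        ("John", "Blue", "Fries", None),  # None is stored as NULL
    ],
)

# Race is aggregated rather than grouped on; MAX ignores NULL values.
# group_concat is SQLite's stand-in for listagg, and the order of the
# concatenated items is not guaranteed in this form.
rows = conn.execute(
    """
    SELECT Name, Color, group_concat(Food, '/') AS Food, MAX(Race) AS Race
    FROM data
    GROUP BY Name, Color
    ORDER BY Name, Color
    """
).fetchall()

for row in rows:
    print(row)
```

Because Race is no longer in the GROUP BY, the Pasta and Fries rows fall into the same Name/Color group again.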

Related

SQL row names and row flags

I have trouble understanding row flags. The question below may clear it up for me:
Is it possible to store a name and its flag in the same cell in SQL?
Consider a table called cars with the columns number_plate, colour, and brand_name, where brand_name holds both a name and a flag.
How would one store that in a single column? If it is not possible or not advisable, explain why and how to do it instead.
How would you then get the number of cars from a given country (based on the unique number_plate, the primary key) along with the country flag?
I think you are trying to design a schema but haven't quite got the hang of foreign keys.
In your example, you'd have the following tables:
country:
country_id  name     continent
-----------------------------------
1           Germany  Europe
2           Japan    Asia
3           USA      N.America
brand:
brand_id  name      country_id (foreign key)
---------------------------------------------
1         Mercedes  1
2         Toyota    2
3         BMW       1
4         Chrysler  3
car:
number_plate  colour  brand_id
------------------------------------------
xxx-yy-zz     Green   1
aa-bb-cc      Red     1
kkk-l-mmm     Orange  2
....
To find the number of cars, based on the country where the brand is based, you'd do something like:
select country.name,
       count(*)
from car
inner join brand on car.brand_id = brand.brand_id
inner join country on brand.country_id = country.country_id
group by country.name
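As a rough, runnable sketch of the schema above, the following uses Python's sqlite3 module with an in-memory database. The column types and the REFERENCES clauses are assumptions added for illustration; the table names, sample rows, and the counting query follow the answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript(
    """
    CREATE TABLE country (
        country_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        continent  TEXT
    );
    CREATE TABLE brand (
        brand_id   INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        country_id INTEGER NOT NULL REFERENCES country (country_id)
    );
    CREATE TABLE car (
        number_plate TEXT PRIMARY KEY,
        colour       TEXT,
        brand_id     INTEGER NOT NULL REFERENCES brand (brand_id)
    );
    INSERT INTO country VALUES
        (1, 'Germany', 'Europe'), (2, 'Japan', 'Asia'), (3, 'USA', 'N.America');
    INSERT INTO brand VALUES
        (1, 'Mercedes', 1), (2, 'Toyota', 2), (3, 'BMW', 1), (4, 'Chrysler', 3);
    INSERT INTO car VALUES
        ('xxx-yy-zz', 'Green', 1), ('aa-bb-cc', 'Red', 1), ('kkk-l-mmm', 'Orange', 2);
    """
)

# Count cars per country by walking car -> brand -> country.
counts = conn.execute(
    """
    SELECT country.name, count(*)
    FROM car
    INNER JOIN brand ON car.brand_id = brand.brand_id
    INNER JOIN country ON brand.country_id = country.country_id
    GROUP BY country.name
    ORDER BY country.name
    """
).fetchall()

print(counts)
```

Countries with no cars (USA in the sample rows) do not appear, because the inner joins only keep matching rows.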
Let's say name and flag are two separate columns. Using the concat function, they can be combined into a single column named brand_name:
select number_plate, colour, concat(name,' ',flag) as brand_name from cars
To get the count of (unique) cars for a given flag:
select count(*) from
(select
  distinct number_plate,
  colour,
  concat(name,' ',flag) as brand_name from cars
) a
where brand_name like '%UK%'

SQL Server Data in one table, but missing from another

This isn't as simple as the title suggests. (What would be an appropriate title?) I think the easiest way to describe my issue is with an example.
My goal is a list of the balloon colors each child is missing.
Let’s assume table 1 contains the following data in 2 columns:
Child BalloonColor
Sally Yellow
Sally White
Sally Blue
Bob Red
Bob Green
Bob White
This is table 2, also 2 columns.
ColorCode Color
Y Yellow
W White
R Red
B Blue
G Green
P Pink
I need to write a query whose result set states which balloon colors each child still needs in order to have all colors.
Sally, R, Red
Sally, G, Green
Sally, P, Pink
Bob, Y, Yellow
Bob, B, Blue
Bob, P, Pink
My example is small, but assume I have 1,000 children in table 1 and 75 colors in table 2. My ultimate question is: how can I check each child, one at a time? A NOT IN query will only yield "P, Pink", but as you can see, I need the result at the child level, not the table level.
I'm not a developer, but I can write good SQL statements.
MS SQL Server 2008 R2.
Thanks in advance, Mike.
SELECT
    SQ.child_name,
    BC.balloon_color
FROM
    (
        -- one row per child
        SELECT DISTINCT
            child_name
        FROM
            Child_Balloons
    ) SQ
    -- pair every child with every possible color
    CROSS JOIN Balloon_Colors BC
WHERE
    -- keep only the pairs the child does not already have
    NOT EXISTS (
        SELECT *
        FROM Child_Balloons CB
        WHERE
            CB.child_name = SQ.child_name AND
            CB.balloon_color = BC.balloon_color
    )
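To see the query above at work on the sample data, here is a small sketch using Python's sqlite3 module and an in-memory database (the question targets SQL Server 2008 R2, but the CROSS JOIN and NOT EXISTS constructs are standard SQL). The table and column names follow the answer; the color_code column name and the ORDER BY are assumptions added so the output includes the code and comes back in a stable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Child_Balloons (child_name TEXT, balloon_color TEXT)")
conn.execute("CREATE TABLE Balloon_Colors (color_code TEXT, balloon_color TEXT)")
conn.executemany(
    "INSERT INTO Child_Balloons VALUES (?, ?)",
    [
        ("Sally", "Yellow"), ("Sally", "White"), ("Sally", "Blue"),
        ("Bob", "Red"), ("Bob", "Green"), ("Bob", "White"),
    ],
)
conn.executemany(
    "INSERT INTO Balloon_Colors VALUES (?, ?)",
    [
        ("Y", "Yellow"), ("W", "White"), ("R", "Red"),
        ("B", "Blue"), ("G", "Green"), ("P", "Pink"),
    ],
)

# Every (child, color) pair, minus the pairs that already exist.
missing = conn.execute(
    """
    SELECT SQ.child_name, BC.color_code, BC.balloon_color
    FROM (SELECT DISTINCT child_name FROM Child_Balloons) SQ
    CROSS JOIN Balloon_Colors BC
    WHERE NOT EXISTS (
        SELECT *
        FROM Child_Balloons CB
        WHERE CB.child_name = SQ.child_name
          AND CB.balloon_color = BC.balloon_color
    )
    ORDER BY SQ.child_name, BC.color_code
    """
).fetchall()

for row in missing:
    print(row)
```

The database evaluates the whole set at once, so there is no need to loop over the children one at a time.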

With Rails, can I use a single table to store simple values (gender, hair color, eye color)?

Say I have a Person table that stores information about that person (weird right?). I have select boxes for things like gender, hair color, and eye color. Instead of creating separate tables with a description field for each, is there a good way to use a single table? Maybe a Resources table with a Name and Description fields? Is it just that simple?
Resources
=========
ID  Name        Description
----------------------------
1   Gender      Male
2   Gender      Female
3   Eye Color   Blue
4   Eye Color   Green
5   Eye Color   Brown
6   Hair Color  Black
7   Hair Color  Brunette
8   Hair Color  Blonde
9   Hair Color  Red
Person
=========
ID  Name  Gender  Eye_Color  Hair_Color
-----------------------------------------------
1   Ryan  1       3          8
Is this the recommended way or is there something better for this?
Yes, it is that simple; IMO your approach is correct. But please note that your approach will not work if you need to select, for example, multiple hair colors for one person.
But I believe in keeping code simple until you get a requirement to change it. Read about YAGNI when you have some time :)
You could do it that way and it would be a polymorphic association.
If you don't need to query this information but just be able to access it you can use serialize and just store all the values in one column.
So a person record would have a column, let's call it attributes, that would have "eye_color: blue, gender: male", etc...
I'd create a separate table called Physical_attributes and an associative one between Person and Physical_attributes, personal_physical_attributes, where I'd store the person's id, the Physical_attribute's id, and the description for that Physical_attribute.
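For reference, the single-table layout from the question can be sketched in plain SQL, here run through Python's sqlite3 module rather than Rails itself. The lowercase table and column names and the REFERENCES clauses are assumptions made for this sketch; the rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE resources (
        id          INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,  -- category, e.g. 'Gender'
        description TEXT NOT NULL   -- value, e.g. 'Male'
    );
    CREATE TABLE person (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        gender     INTEGER REFERENCES resources (id),
        eye_color  INTEGER REFERENCES resources (id),
        hair_color INTEGER REFERENCES resources (id)
    );
    INSERT INTO resources VALUES
        (1, 'Gender', 'Male'), (2, 'Gender', 'Female'),
        (3, 'Eye Color', 'Blue'), (4, 'Eye Color', 'Green'), (5, 'Eye Color', 'Brown'),
        (6, 'Hair Color', 'Black'), (7, 'Hair Color', 'Brunette'),
        (8, 'Hair Color', 'Blonde'), (9, 'Hair Color', 'Red');
    INSERT INTO person VALUES (1, 'Ryan', 1, 3, 8);
    """
)

# The lookup table is joined once per attribute to resolve each id.
people = conn.execute(
    """
    SELECT p.name, g.description, e.description, h.description
    FROM person p
    JOIN resources g ON g.id = p.gender
    JOIN resources e ON e.id = p.eye_color
    JOIN resources h ON h.id = p.hair_color
    """
).fetchall()

print(people)
```

One trade-off of this layout: a plain foreign key to resources.id cannot stop hair_color from pointing at a 'Gender' row, so that check has to live in the application or in extra constraints.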

SELECT datafields with multiple groups and sums

I can't seem to group by multiple data fields and sum a particular grouped column.
I want to group Person to customer and then group customer to price and then sum price. The person with the highest combined sum(price) should be listed in ascending order.
Example:
table customer
-----------
customer | common_id
green      2
blue       2
orange     1
table invoice
----------
person | price | common_id
bob      2330    1
greg     360     2
greg     170     2
SELECT DISTINCT
min(person) As person,min(customer) AS customer, sum(price) as price
FROM invoice a LEFT JOIN customer b ON a.common_id = b.common_id
GROUP BY customer,price
ORDER BY person
The results I desire are:
**BOB:**
Orange, $2230
**GREG:**
green, $360
blue,$170
The colors are the customers that Greg and Bob handle. Each color has a price.
There are two issues that I can see. One is a bit picky, and one is quite fundamental.
Presentation of data in SQL
SQL returns tabular data sets. It can't return subsets with headings, looking something like a pivot table.
This means that this is not possible...
**BOB:**
Orange, $2230
**GREG:**
green, $360
blue, $170
But that this is possible...
Bob, Orange, $2230
Greg, Green, $360
Greg, Blue, $170
Relating data
I can visually see how you relate the data together...
table customer            table invoice
--------------            -------------
customer | common_id      person | price | common_id
green      2              greg     360     2
blue       2              greg     170     2
orange     1              bob      2330    1
But SQL doesn't have any implied ordering. Things can only be related if an expression can state that they are related. For example, the following is equally possible...
table customer            table invoice
--------------            -------------
customer | common_id      person | price | common_id
green      2              greg     170     2   \ These two have
blue       2              greg     360     2   / been swapped
orange     1              bob      2330    1
This means that you need rules (and likely additional fields) that explicitly state which customer record matches which invoice record, especially when there are multiples in both with the same common_id.
An example of a rule could be, the lowest price always matches with the first customer alphabetically. But then, what happens if you have three records in customer for common_id = 2, but only two records in invoice for common_id = 2? Or do the number of records always match, and do you enforce that?
Most likely you need an extra piece (or pieces) of information to know which records relate to each other.
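The ambiguity described above can be made concrete with a small sketch, here using Python's sqlite3 module and an in-memory database loaded with the sample rows from the question. Joining on common_id alone pairs every greg invoice with every customer that shares common_id = 2, so nothing in the data says which price belongs to which color.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer TEXT, common_id INTEGER)")
conn.execute("CREATE TABLE invoice (person TEXT, price INTEGER, common_id INTEGER)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("green", 2), ("blue", 2), ("orange", 1)],
)
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?)",
    [("bob", 2330, 1), ("greg", 360, 2), ("greg", 170, 2)],
)

# common_id is the only link between the tables, so each invoice row
# matches every customer row that carries the same common_id.
pairs = conn.execute(
    """
    SELECT a.person, b.customer, a.price
    FROM invoice a
    LEFT JOIN customer b ON a.common_id = b.common_id
    ORDER BY a.person, b.customer, a.price
    """
).fetchall()

for row in pairs:
    print(row)
```

Bob gets a single row, while greg gets four (2 invoices x 2 customers), which is why an extra key or rule is needed before a per-customer sum can be meaningful.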
You should group by all of your selected fields except the sum. The group_concat function (MySQL) may then help you concatenate the resulting rows of each group.
I'm not sure how you could possibly do this. Greg has 2 colors and 2 prices, so how do you determine which goes with which?
Greg Blue 170 or Greg Blue 360? Or should Green be attached to either price?
I think the colors need to have unique identifiers, separate from the person's unique identifiers.
Just a thought.

Reduce and Merge duplicate rows

I have a table with the following sampled data:
Name Color
Alice Green
Bob Black
Chris Green
Chris Black
David Red
Peter Blue
Simon Blue
Simon Red
Simon Green
Ultimately, I want to reduce the table by consolidating the Color column like:
Name Color
Alice Green
Bob Black
Chris Green, Black
David Red
Peter Blue
Simon Blue, Red, Green
such that Name can become unique.
The table has no PRIMARY KEY. I got as far as creating a new column using ROW_NUMBER to distinguish duplicates, but I don't know what to do next:
rownumber Name Color
1 Alice Green
1 Bob Black
1 Chris Green
2 Chris Black
1 David Red
1 Peter Blue
1 Simon Blue
2 Simon Red
3 Simon Green
Don't do this. Instead, normalize your tables further to e.g. a Person, Preference and a Color table (where Preference, if that is the right name for the relation, has foreign keys to Person and Color). This way, you avoid the risks of inconsistencies (you can make Person names unique if you like, but you should make Color names unique).
EDITED: if you're getting this from a join query I'll assume the data is reasonably consistent, so normalization isn't an issue. Would it be possible to change the join query to GROUP on Name instead? Much cleaner than hacking around a result set, really!
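A minimal sketch of the GROUP-on-Name idea, run through Python's sqlite3 module against an in-memory database. The table name name_colors is hypothetical, and group_concat is SQLite's string aggregate; other engines use different functions (for example STRING_AGG in SQL Server 2017 and later), so treat this as an illustration of the shape of the query rather than a drop-in statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE name_colors (Name TEXT, Color TEXT)")  # hypothetical table name
conn.executemany(
    "INSERT INTO name_colors VALUES (?, ?)",
    [
        ("Alice", "Green"), ("Bob", "Black"),
        ("Chris", "Green"), ("Chris", "Black"),
        ("David", "Red"), ("Peter", "Blue"),
        ("Simon", "Blue"), ("Simon", "Red"), ("Simon", "Green"),
    ],
)

# One output row per Name; the colors of each group are joined with ', '.
# The order of the colors inside each string is not guaranteed here.
merged = conn.execute(
    """
    SELECT Name, group_concat(Color, ', ') AS Color
    FROM name_colors
    GROUP BY Name
    ORDER BY Name
    """
).fetchall()

for row in merged:
    print(row)
```

Grouping in the query makes Name unique in the result without needing the ROW_NUMBER column at all.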
I have adopted the approach here with a table variable to hold the temporary result set to work (hack) from.
Sorted!