Working with sets of rows in (My)SQL and comparing values - sql

I am trying to figure out the SQL for doing some relatively simple operations on sets of records in a table but I am stuck. Consider a table with multiple rows per item, all identified by a common key.
For example:
serial model color
XX1 A blue
XX2 A blue
XX3 A green
XX5 B red
XX6 B blue
XX1 B blue
For example, I would like to:
1. Assuming that all model A rows must have the same color, find the rows that don't (for example, XX3 is green).
2. Assuming that a given serial number can only point to a single model, find the rows where that is not the case (for example, XX1 points to both A and B).
These are logically simple things to do. To abstract it: I want to know how to group rows by a single key (or a combination of keys) and then compare the values within each group.
Should I use a join on the same table? Should I use some sort of array or similar?
Thanks for your help.

For 1:
SELECT model, color, COUNT(*) AS num FROM yourTable GROUP BY model, color;
This will give you a list of each model and each color for that model along with the count. So the output from your dataset would be:
model color num
A blue 2
A green 1
B red 1
B blue 2
From this output you can easily see what's incorrect and fix it using an UPDATE statement or do a blanket operation where you assign the most popular color to each model.
For 2:
SELECT serial, COUNT(*) AS num FROM yourTable GROUP BY serial HAVING num > 1
The output for this would be:
serial num
XX1 2
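Both checks can also be phrased with COUNT(DISTINCT ...), which avoids flagging a serial that merely appears twice with the same model. The following is a minimal sketch, not tested against MySQL: it uses Python's built-in sqlite3 module, and the table name `items` and the helper function names are assumptions made for illustration.

```python
import sqlite3

# Sample rows copied from the question: (serial, model, color).
ROWS = [
    ("XX1", "A", "blue"),
    ("XX2", "A", "blue"),
    ("XX3", "A", "green"),
    ("XX5", "B", "red"),
    ("XX6", "B", "blue"),
    ("XX1", "B", "blue"),
]


def make_db():
    """Create an in-memory database holding the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (serial TEXT, model TEXT, color TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", ROWS)
    return conn


def models_with_mixed_colors(conn):
    """Models whose rows do not all share one color."""
    sql = """
        SELECT model
        FROM items
        GROUP BY model
        HAVING COUNT(DISTINCT color) > 1
        ORDER BY model
    """
    return [row[0] for row in conn.execute(sql)]


def serials_with_multiple_models(conn):
    """Serial numbers that point to more than one model."""
    sql = """
        SELECT serial
        FROM items
        GROUP BY serial
        HAVING COUNT(DISTINCT model) > 1
        ORDER BY serial
    """
    return [row[0] for row in conn.execute(sql)]


conn = make_db()
print(models_with_mixed_colors(conn))
print(serials_with_multiple_models(conn))
```

Note that with the sample data both models are reported by the first check, because model B also has rows in two colors.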

To address #1, I would use a self-join (a join on the same table, as you put it).
For example,
select t.*
from mytable t
join (select model, color
from mytable
group by model, color
having count(*) = 1) u
on u.model = t.model and u.color = t.color
would find all the rows whose model and color combination occurs only once. I did not test this, but I hope you see what it does. The inner select finds the (model, color) combinations that occur only once, then the join shows all the detail for those rows.
Of course, having said that, this is a poor table design. But I don't think that was your question. And I hope this was a made up example for a real situation. My concern would be that there is no reason to assume that the single occurrence is actually bad -- it could be that there are 10 records, all of which have a distinct color. This approach would tell you that all of them are wrong, and you would be unable to decide which was correct.

Related

Counting from different categories within the same query

I am trying to make a query from a table in Access that would give me totals for different types of product based off of 2 categories, all within one query. For example my Table looks as follows:
Type  Description 1  Description 2  Date
New   Shiny          Black          1/1/2022
New   Black          Dull           1/1/2022
Old   Shiny          Grey           1/1/2022
Old   Grey           Dull           1/1/2022
The query results that I want to receive are as follows:
Description  New  Old
Shiny        1    1
Black        2    0
Dull         1    1
Grey         0    2
The dataset that I am working with isn't as clean as my example shown here and is causing some of the issues. I never had an issue with the code running, but I just felt that there had to be an easier way that I was missing.
The way I was doing it originally turned into a bunch of separate queries and was messy to work with. I essentially wrote a query to separate the table into new and old types. From there I used a bunch of
SUM(IIF([Description 1] = "x" OR [Description 2] = "x", 1, 0)) AS X
SUM(IIF([Description 1] = "y" OR [Description 2] = "y", 1, 0)) AS Y
expressions to count my totals for each of the objects. This would give me a query where all the totals were displayed in columns. Then I created a separate query to join these data sets together into a presentable manner, but it was turning into too much for how many different "types" I had.
I was just looking for a way to combine all of this into 1 query that would make pulling reports much easier.
I strongly advise against using spaces or reserved words in names. Date is a reserved word.
Consider:
Query1
SELECT Type, Description1 AS D, [Date], 1 AS Category FROM Table1
UNION SELECT Type, Description2, [Date], 2 FROM Table1;
UNION removes duplicate rows. Use UNION ALL to include all records, even if there are duplicates. There is no query designer or wizard for UNION; it must be typed or pasted into the SQL View of the query builder.
Query2
TRANSFORM Nz(Count(Query1.Category),0) AS CountOfCategory
SELECT Query1.D
FROM Query1
GROUP BY Query1.D
PIVOT Query1.Type;
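TRANSFORM ... PIVOT is specific to Access, so as a rough illustration of the same two-step idea (unpivot the two description columns with UNION ALL, then count per type) here is a sketch using Python's built-in sqlite3 module. The table name `products`, the column names, and the function name are assumptions, the Date column is left out, and the two type columns are hard-coded, which PIVOT would not require.

```python
import sqlite3


def count_by_description(rows):
    """Count how often each description occurs per type.

    rows: iterable of (type, description1, description2) tuples.
    Returns a list of (description, new_count, old_count) tuples.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (type TEXT, description1 TEXT, description2 TEXT)"
    )
    conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)
    sql = """
        SELECT d,
               SUM(CASE WHEN type = 'New' THEN 1 ELSE 0 END) AS new_count,
               SUM(CASE WHEN type = 'Old' THEN 1 ELSE 0 END) AS old_count
        FROM (
            -- Step 1: stack both description columns into one column.
            SELECT type, description1 AS d FROM products
            UNION ALL
            SELECT type, description2 AS d FROM products
        ) AS unpivoted
        GROUP BY d
        ORDER BY d
    """
    return conn.execute(sql).fetchall()


sample = [
    ("New", "Shiny", "Black"),
    ("New", "Black", "Dull"),
    ("Old", "Shiny", "Grey"),
    ("Old", "Grey", "Dull"),
]
for row in count_by_description(sample):
    print(row)
```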

How do I SELECT minimum set of rows to cover all possible values of each columns in SQL?

I am running a SQL query to get data from a table to map all different possible values of all categories represented by each columns.
How do I run the SELECT query such that it returns the minimum number of rows just enough to include all possible values of all columns?
For example, if I have a table of 10 rows and 3 columns, each column containing 3 possible values:
TABLE sales
--------------------------------
brandID color size
--------------------------------
2 red big
3 blue big
2 blue big
2 red small
2 blue medium
3 green small
3 red big
1 green medium
2 red medium
2 blue big
Of course I could SELECT all rows from table without filter, but that would be an expensive query of 10 rows.
However, as you can see, if we filter the SELECT query to only return the following rows below, it is possible to cover all the possible values of all columns:
1,2,3 for brandID
red,blue,green for color
big,small,medium for size
--------------------------------
brandID color size
--------------------------------
3 blue big
2 red small
1 green medium
How do I do that in SQL query?
This one does what you expect:
select b.brandid, c.color, s.size
from (
select brandid, row_number() over (order by brandid) as rn
from sales
group by brandid
) b
full join (
select color, row_number() over (order by color) as rn
from sales
group by color
) c on b.rn = c.rn
full join (
select size, row_number() over (order by size) as rn
from sales
group by size
) s on b.rn = s.rn;
Online example: https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=e72e7d1dfed43825025c5703b5d3671a
But this only works properly if you have the same number of distinct brands, colors and sizes. If you have e.g. 5 brands, 6 colors and 7 sizes, the result is rather "strange":
https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=4417a4d97ecf7601364f09d65f6522fa
First, a query that returns ten rows is not "expensive".
Second, this is a very hard problem. It involves looking at all combinations of rows to see if the set has all combinations of columns. I suspect that any algorithm will need to basically search through all possible combinations -- although there may be some efficiencies, such as automatically including all rows with a unique value in any column.
As a hard problem involving comparing zillions of sets, SQL is not really an appropriate language for addressing the issue.
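To make the "search through all possible combinations" point concrete, here is a brute-force sketch in Python rather than SQL. It tries subsets of rows in order of increasing size and returns the first one that covers every value of every column. The function name is an assumption, and the approach is only practical for small tables because the number of subsets grows exponentially.

```python
from itertools import combinations


def minimal_cover(rows):
    """Return a smallest list of rows that contains every distinct
    value of every column at least once (exhaustive search)."""
    if not rows:
        return []
    n_cols = len(rows[0])
    # Tag each value with its column index so equal values in
    # different columns are not confused with each other.
    needed = {(i, row[i]) for row in rows for i in range(n_cols)}
    for k in range(1, len(rows) + 1):
        for combo in combinations(rows, k):
            covered = {(i, row[i]) for row in combo for i in range(n_cols)}
            if covered == needed:
                return list(combo)
    return []


sales = [
    (2, "red", "big"),
    (3, "blue", "big"),
    (2, "blue", "big"),
    (2, "red", "small"),
    (2, "blue", "medium"),
    (3, "green", "small"),
    (3, "red", "big"),
    (1, "green", "medium"),
    (2, "red", "medium"),
    (2, "blue", "big"),
]
for row in minimal_cover(sales):
    print(row)
```

For the sample table this finds a cover of three rows, which is the smallest possible since each column has three distinct values.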
This is a rather weird requirement... But you might try something along this:
DECLARE @sales TABLE(BrandID INT, color VARCHAR(10),size VARCHAR(10));
INSERT INTO @sales VALUES
(2,'red', 'big'),
(3,'blue', 'big'),
(2,'blue', 'big'),
(2,'red', 'small'),
(2,'blue', 'medium'),
(3,'green', 'small'),
(3,'red', 'big'),
(1,'green', 'medium'),
(2,'red', 'medium'),
(2,'blue', 'big');
WITH AllBrands AS (SELECT ROW_NUMBER() OVER(ORDER BY BrandID) AS RowInx, BrandID FROM @sales GROUP BY BrandID)
,AllColors AS (SELECT ROW_NUMBER() OVER(ORDER BY color) AS RowInx, color FROM @sales GROUP BY color)
,AllSizes AS (SELECT ROW_NUMBER() OVER(ORDER BY size) AS RowInx, size FROM @sales GROUP BY size)
SELECT COALESCE(b.RowInx,c.RowInx,s.RowInx) AS RowInx
,b.BrandID
,c.color
,s.size
FROM AllBrands b
FULL OUTER JOIN AllColors c ON COALESCE(b.RowInx,c.RowInx)=c.RowInx
FULL OUTER JOIN AllSizes s ON COALESCE(b.RowInx,c.RowInx,s.RowInx)=s.RowInx;
This solution is similar to @a_horse_with_no_name's, but avoids gaps in the result in case of unequal counts of values per column.
The idea in short:
We create a numbered set of all distinct values per column and join all sets on this number. As we don't know the count in advance I use COALESCE to pick the first value, which is not null.
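The same idea (number the distinct values of each column, then line them up by that number) can be sketched outside SQL as well. The following Python version is only an illustration with an assumed function name; `zip_longest` plays the role of the full outer joins and pads shorter columns with None. As with the SQL version, the output rows are not necessarily rows that exist in the table.

```python
from itertools import zip_longest


def distinct_values_side_by_side(rows):
    """List the sorted distinct values of each column next to each
    other, padding shorter columns with None."""
    columns = list(zip(*rows))  # transpose rows into columns
    distinct = [sorted(set(col)) for col in columns]
    return list(zip_longest(*distinct))


sales = [
    (2, "red", "big"),
    (3, "blue", "big"),
    (2, "blue", "big"),
    (2, "red", "small"),
    (2, "blue", "medium"),
    (3, "green", "small"),
    (3, "red", "big"),
    (1, "green", "medium"),
    (2, "red", "medium"),
    (2, "blue", "big"),
]
for row in distinct_values_side_by_side(sales):
    print(row)
```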
This is not a good problem if you demand ONE AND ONLY ONE query and ONE AND ONLY ONE of each result set, and ONE AND ONLY ONE instance of each result. As Gordon Linoff accurately put it: that is not a problem for SQL. I get that maybe you have a MUCH larger table, but he's absolutely right.
But add another layer, and you can have exactly what you want, with all the efficiency you want, and a readable output. Use a cursor and some basic SELECT from dynamic SQL with a SELECT columns.name from sys.tables JOIN sys.columns ON tables.object_id = columns.object_id, if you absolutely have to do this with TSQL alone.
And if you're willing to build a basic application with any framework with a SQL driver, you can just SELECT DISTINCT FROM < and put the various results into arrays.
Alternatively: reword your question, with the understanding that the results of any SQL query are gonna be x rows by x columns. Not an array for each column.
I think your example confuses things by having exactly 3 values for each field, which makes the requested result seem like a reasonable thing to expect. But what happens when two more brands are added, or a new colour? Then what would you expect to be returned?
Really you are asking three questions, so I feel this should be done as three queries:
"What are the different brands?"
"What are the different colours?"
"What are the different sizes?"
If they need to be displayed in a neat table, stitch them together afterwards in your application layer. You could maybe do it in the SQL with something like what a_horse_with_no_name suggests, but really it's the wrong place.

SQL JOIN with CASE statement result

Is there any way of joining the result of a case statement with a reference table without creating a CTE, etc.?
Result AFTER CASE statement:
ID Name Bonus Level (this is the result of a CASE statement)
01 John A
02 Jim B
01 John B
03 Jake C
Reference table
A 10%
B 20%
C 30%
I want to then get the % next to each employee, then the max %age using the MAX function and grouping by ID, then link it back again to the reference so that each employee has the single correct (highest) bonus level next to their name. (This is a totally fictitious scenario, but very similar to what I am looking for).
Just need help with joining the result of the CASE statement with the reference table.
Thanks in advance.
In place of a temporary value as the result of the case statement, you could use a select statement from the reference table.
So if your case statement looks like:
case when variable1=value then bonuslevel =A
Then, replacing it like this might help
case when variable1=value then (select percentage from ReferenceTable where variable2InReferenceTable=A)
I don't know if I am oversimplifying, but based on the results of your CASE query, why not just join that to the reference table and take a max grouped by ID/Name? Since the ID and the person's name won't change (they are the same person), you are just getting the max you want. To complete the bonus level, rejoin the reference table after the max percentage has been determined for each person.
select
lvl1.ID,
lvl1.Name,
lvl1.FinalBonus,
rt2.BonusLvl
from
( select
PQ.ID,
PQ.Name,
max( rt.PcntBonus ) as FinalBonus
from
(however you
got your
data query ) PQ
JOIN RefTbl rt
on PQ.BonusLvl = rt.BonusLvl
group by
PQ.ID,
PQ.Name
) lvl1
JOIN RefTbl rt2
on lvl1.FinalBonus = rt2.PcntBonus
Since the Bonus levels (A,B,C) do not guarantee corresponding % levels (10,20,30), I did it this way... OTHERWISE, you could have just used max() on both the bonus level and percent. But what if your bonus levels were listed as something like
Limited 10%
Aggressive 20%
Ace 30%
You could see that a max of the level above would have "Limited", but the max % = 30 is associated with an "Ace" sales rep... Get the 30% first, then see what the label that matched that is.
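Here is a runnable sketch of that approach using Python's built-in sqlite3 module. The table names `emp_levels` and `ref_tbl`, the column names, and the function name are assumptions, and `emp_levels` stands in for whatever query produces the CASE result. It also assumes each percentage appears only once in the reference table; otherwise the join back would return several labels per employee.

```python
import sqlite3


def highest_bonus_per_employee(levels, reference):
    """levels: (id, name, bonus_lvl) rows, i.e. the CASE result.
    reference: (bonus_lvl, pcnt_bonus) rows.
    Returns (id, name, highest percentage, matching level) per employee."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp_levels (id TEXT, name TEXT, bonus_lvl TEXT)")
    conn.execute("CREATE TABLE ref_tbl (bonus_lvl TEXT, pcnt_bonus INTEGER)")
    conn.executemany("INSERT INTO emp_levels VALUES (?, ?, ?)", levels)
    conn.executemany("INSERT INTO ref_tbl VALUES (?, ?)", reference)
    sql = """
        SELECT lvl1.id, lvl1.name, lvl1.final_bonus, rt2.bonus_lvl
        FROM (
            -- Highest percentage per employee.
            SELECT pq.id, pq.name, MAX(rt.pcnt_bonus) AS final_bonus
            FROM emp_levels AS pq
            JOIN ref_tbl AS rt ON pq.bonus_lvl = rt.bonus_lvl
            GROUP BY pq.id, pq.name
        ) AS lvl1
        -- Look up the label that belongs to that percentage.
        JOIN ref_tbl AS rt2 ON lvl1.final_bonus = rt2.pcnt_bonus
        ORDER BY lvl1.id
    """
    return conn.execute(sql).fetchall()


levels = [("01", "John", "A"), ("02", "Jim", "B"), ("01", "John", "B"), ("03", "Jake", "C")]
reference = [("A", 10), ("B", 20), ("C", 30)]
for row in highest_bonus_per_employee(levels, reference):
    print(row)
```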

How to select group of distinct rows from the table

Let's assume, a table has the following rows
ID Name Value
1 Apple Red
1 Taste Sour
2 Apple Yellow
2 Taste Sweet
3 Apple Red
3 Taste Sour
4 Apple Green
4 Taste Tart
5 Apple Yellow
5 Taste Sweet
I wonder, how can I select ID's corresponding to distinct combination of Apple and Taste? For example, ID=1 corresponds to red sour apple and ID=3 can be omitted in the query result. Similarly, ID=2 is for yellow sweet apple and ID=5 can be excluded from the query result, etc. A valid query result can be any of the following ID sets: (1,2,4), (1,4,5), (2,3,4) etc.
The query or the model could be improved with more understanding of the problem.
But assuming the model is correct and the problem is presented as this, this would be my quick approach.
SELECT MIN(a.ID) as ID
FROM Table a
INNER JOIN Table b ON a.ID = b.ID AND a.Name > b.Name
GROUP BY a.Value, b.Value
This query joins the table with itself on ID. Because that alone gives four rows for each ID (Apple-Apple, Taste-Taste, Apple-Taste and Taste-Apple), you need to state not only that the names are different (you would still have both Apple-Taste and Taste-Apple) but that one is greater than the other, so that Tastes always land on one side of the join and Apples on the other. That's why there is the a.Name > b.Name.
You then group by both values, so that each combination of Apple value and Taste value appears only once. This results in only three rows.
The exact syntax may depend on the RDBMS (I used SQL Server syntax). The select picks the lowest ID; since you don't care which one you get, you could use either Min or Max. Min returns 1, 2, 4. Max returns 3, 4, 5.
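For anyone who wants to try the query against the sample data, here is a small sketch using Python's built-in sqlite3 module instead of SQL Server. The table name `attrs` and the function name are assumptions (the placeholder name Table is a reserved word).

```python
import sqlite3


def one_id_per_combination(rows):
    """rows: (id, name, value) tuples. Returns the lowest ID for each
    distinct combination of Apple value and Taste value."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE attrs (id INTEGER, name TEXT, value TEXT)")
    conn.executemany("INSERT INTO attrs VALUES (?, ?, ?)", rows)
    sql = """
        SELECT MIN(a.id) AS id
        FROM attrs AS a
        -- 'Taste' > 'Apple', so a holds the Taste row and b the Apple row.
        JOIN attrs AS b ON a.id = b.id AND a.name > b.name
        GROUP BY a.value, b.value
        ORDER BY MIN(a.id)
    """
    return [row[0] for row in conn.execute(sql)]


sample = [
    (1, "Apple", "Red"), (1, "Taste", "Sour"),
    (2, "Apple", "Yellow"), (2, "Taste", "Sweet"),
    (3, "Apple", "Red"), (3, "Taste", "Sour"),
    (4, "Apple", "Green"), (4, "Taste", "Tart"),
    (5, "Apple", "Yellow"), (5, "Taste", "Sweet"),
]
print(one_id_per_combination(sample))
```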

SQL list only unique / distinct values

I have a table which contains geometry lines (ways). There are lines that have a unique geometry (not repeating) and lines which have the same geometry (2, 3, 4 and more). I want to list only the unique ones. If there are, for example, 2 lines with the same geometry, I want to drop them both. I tried DISTINCT but it also shows the first result from the duplicated lines. I only want to see the unique ones.
I tried a window function but I get a similar result (I get a counter on the first line of the duplicated ones). Sorry for a newbie question but I'm learning :) Thanks!
Example:
way|
1 |
1 |
2 |
3 |
3 |
4 |
Result should be:
way|
2 |
4 |
That actually worked. Thanks a lot. I also have other tags in this table for every way (name, ref and a few other tags). When I add them to the query I lose the segregation.
select count(way), way, name
from planet_osm_line
group by way, name
having count(way) = 1;
Without "name" in the query I get all unique values listed, but I want to keep "name" for every line. With this example I still get all the lines in the table listed.
To expound on @Nithila's answer:
select count(way), way
from your_table
group by way
having count(way) = 1;
You first calculate the rows you want, and then look up the rest of the fields. That way the aggregation doesn't cause you problems.
WITH singleRow as (
select count(way), way
from planet_osm_line
group by way
having count(way) = 1
)
SELECT P.*
FROM planet_osm_line P
JOIN singleRow S
ON P.way = S.way
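A small runnable sketch of this two-step pattern, using Python's built-in sqlite3 module. The table name `lines`, the sample names, and the function name are assumptions, and plain integers stand in for the geometry values of the real table.

```python
import sqlite3


def rows_with_unique_way(rows):
    """rows: (way, name) tuples. Returns the full rows whose way
    value occurs exactly once in the table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lines (way INTEGER, name TEXT)")
    conn.executemany("INSERT INTO lines VALUES (?, ?)", rows)
    sql = """
        WITH single_row AS (
            -- Step 1: way values that occur exactly once.
            SELECT way FROM lines GROUP BY way HAVING COUNT(*) = 1
        )
        -- Step 2: fetch every column for those ways.
        SELECT p.way, p.name
        FROM lines AS p
        JOIN single_row AS s ON p.way = s.way
        ORDER BY p.way
    """
    return conn.execute(sql).fetchall()


sample = [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "d"), (4, "e")]
print(rows_with_unique_way(sample))
```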
You can group by way and keep only the groups whose count is 1. That gives you the non-duplicated data.
@voyteck
As I understand your question, you need only the non-duplicated values of the way column, and for each such row you also want to show the name. Is that right?
If so, put the other columns in the select list, but do not add them to the group by. Each remaining group holds exactly one row, so an aggregate such as min() simply returns that row's value:
select count(way), way, min(name) as name
from planet_osm_line
group by way
having count(way) = 1;