How to tell in a Query I don't want duplicates? - sql

So I got this query and it's pulling from tables like this:
Plantation TABLE
PLANT ID, Color Description
1 Red
2 Green
3 Purple
Vegetable Table
VegetableID, PLANT ID, Feeldesc
199 1 Harsh
200 1 Sticky
201 2 Bitter
202 3 Bland
and now in my Query I join them using PLANT ID (I use a left join):
PLANT ID, Color Description, Feeldesc
1 Red Harsh
1 Red Sticky
2 Green Bitter
3 Purple Bland
So the problem is that in the Query you can see Red shows up twice! I can't have this, and
I'm not sure how to make the joins happen but stop reds from coming up twice.

It seems remotely possible that you're asking how to do group indication -- that is, showing a value which identifies or describes a group only on the first line of that group. In that case, you want to use the lag() window function.
Assuming setup of the schema and data is like this:
create table plant (plantId int not null primary key, color text not null);
create table vegetable (vegetableId int not null, plantId int not null,
Feeldesc text not null, primary key (vegetableId, plantId));
insert into plant values (1,'Red'),(2,'Green'),(3,'Purple');
insert into vegetable values (199,1,'Harsh'),(200,1,'Sticky'),
(201,2,'Bitter'),(202,3,'Bland');
The results you show (modulo column headings) could be obtained with this simple query:
select p.plantId, p.color, v.Feeldesc
from plant p left join vegetable v using (plantId)
order by plantId, vegetableId;
If you're looking to suppress display of the repeated information after the first line, this query will do it:
select
case when plantId = lag(plantId) over w then null
else plantId end as plantId,
case when p.color = lag(p.color) over w then null
else p.color end as color,
v.Feeldesc
from plant p left join vegetable v using (plantId)
window w as (partition by plantId order by vegetableId)
order by p.plantId, v.vegetableId;
The results look like this:
plantid | color | feeldesc
---------+--------+----------
1 | Red | Harsh
| | Sticky
2 | Green | Bitter
3 | Purple | Bland
(4 rows)
I had to do something like the above just this week to produce a listing directly out of psql which was easy for the end user to read; otherwise it never would have occurred to me that you might be asking about this functionality. Hopefully this answers your question, although I might be completely off base.
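For anyone who wants to try the lag() idea without a PostgreSQL server, here is a minimal sketch using Python's built-in sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later); the schema and data are the ones from this answer, and the explicit ON join and final ORDER BY are my own adjustments for SQLite.

```python
import sqlite3

# In-memory SQLite stand-in for the PostgreSQL tables used above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table plant (plantId int not null primary key, color text not null);
    create table vegetable (vegetableId int not null, plantId int not null,
                            Feeldesc text not null,
                            primary key (vegetableId, plantId));
    insert into plant values (1,'Red'),(2,'Green'),(3,'Purple');
    insert into vegetable values (199,1,'Harsh'),(200,1,'Sticky'),
                                 (201,2,'Bitter'),(202,3,'Bland');
""")

# lag() looks at the previous row of the same plant; when such a row exists,
# the repeated plantId and color are replaced by NULL.
suppressed_rows = conn.execute("""
    select case when p.plantId = lag(p.plantId) over w then null
                else p.plantId end as plantId,
           case when p.color = lag(p.color) over w then null
                else p.color end as color,
           v.Feeldesc
    from plant p left join vegetable v on v.plantId = p.plantId
    window w as (partition by p.plantId order by v.vegetableId)
    order by p.plantId, v.vegetableId
""").fetchall()

for row in suppressed_rows:
    print(row)
```

Only the first row of each plant keeps its plantId and color; the others come back as None in Python.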

Check the array_agg function in the documentation; it can be used something like this:
SELECT
p.plantId
,p.color
,array_to_string(array_agg(v.Feeldesc),', ')
FROM
vegetable v
INNER JOIN plant p USING (plantId)
GROUP BY
p.plantId
,p.color
or use
SELECT DISTINCT
p.plantId
,p.color
FROM
vegetable v
INNER JOIN plant p USING (plantId)
disclaimer: hand written, syntax errors expected :)
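A rough way to check the "one row per plant" idea locally is the sketch below, again with Python's sqlite3. Note the swap: SQLite has no array_agg, so group_concat stands in for array_to_string(array_agg(...), ', '). The table names and data are assumed to be the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table plant (plantId int not null primary key, color text not null);
    create table vegetable (vegetableId int not null, plantId int not null,
                            Feeldesc text not null);
    insert into plant values (1,'Red'),(2,'Green'),(3,'Purple');
    insert into vegetable values (199,1,'Harsh'),(200,1,'Sticky'),
                                 (201,2,'Bitter'),(202,3,'Bland');
""")

# One output row per plant; all Feeldesc values are folded into one string.
rows = conn.execute("""
    select p.plantId, p.color, group_concat(v.Feeldesc, ', ') as feels
    from vegetable v join plant p on p.plantId = v.plantId
    group by p.plantId, p.color
    order by p.plantId
""").fetchall()

# The order of items inside each concatenated string is not guaranteed,
# so the pieces are kept as sets here.
collapsed = {plant_id: (color, set(feels.split(", ")))
             for plant_id, color, feels in rows}
print(collapsed)
```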

Related

Lookup from a comma separated SQL column

I have a user table which has comma separated ids in one of the columns, like:
Id | Name  | PrimaryTeamId | SecondaryTeamIds
---+-------+---------------+-----------------
1  | John  | 123           | 456,789,669
2  | Ringo | 123           | 456,555
and a secondary table which contains the team names
Id | TeamId | TeamName
---+--------+------------
1  | 456    | Red Team
2  | 669    | Blue Team
3  | 789    | Purple Team
4  | 555    | Black Team
5  | 123    | Orange Team
I'm trying to create a view which gives the following format:
Name  | Primary Team | Secondary Teams
------+--------------+---------------------------------
John  | Orange Team  | Red Team, Purple Team, Blue Team
Ringo | Orange Team  | Red Team, Black Team
I have created
select
u.Name,
t.TeamName as 'Primary Team'
SELECT ... ?? as 'Secondary Teams'
from
users u
inner join teams t on u.PrimaryTeamId = t.TeamId
I've tried numerous things but can't seem to put it together. I can't seem to find the same use case here or elsewhere. I do control the data coming in so I could parse those values out relationally to begin with or do some kind of lookup on the ETL side, but would like to figure it out.
If the sequence of Secondary Teams is essential, you can parse the string via OpenJSON while preserving the sequence [key].
Then it becomes a small matter of string_agg()
Example or dbFiddle
Select A.ID
,A.Name
,PrimaryTeam = C.TeamName
,B.SecendaryTeams
from YourTable A
Cross Apply (
Select SecendaryTeams = string_agg(B2.TeamName,', ') within group (order by B1.[Key])
From OpenJSON( '["'+replace(string_escape([SecondaryTeamIds],'json'),',','","')+'"]' ) B1
Join YourTeams B2 on B1.Value=B2.TeamID
) B
Join YourTeams C on A.[PrimaryTeamId]=C.TeamId
Results
ID Name PrimaryTeam SecendaryTeams
1 John Orange Team Red Team, Purple Team, Blue Team
2 Ringo Orange Team Red Team, Black Team
I played around with this a little bit and I found you can do it using two functions, STRING_SPLIT and STRING_AGG.
STRING_SPLIT splits an NVARCHAR into a table where each row is one value, and STRING_AGG does the opposite: it joins the rows of a table into a single NVARCHAR. Then I just used a JOIN in between.
Maybe it's not the cleanest solution, but it does the job. It's also a bit inefficient, but using native functions instead of loops helps a lot.
I attach a working example. In this online editor I just had one table, so I joined it with itself, but it should work when joining with other tables.
SELECT
*,
(
SELECT STRING_AGG(CAST(Val AS NVARCHAR(MAX)), ', ') -- concatenates the rows together (a CAST to NVARCHAR without a length truncates at 30 characters)
FROM
(
SELECT demo.hint AS Val
FROM STRING_SPLIT((SELECT d.name FROM demo as d where id = demo.id), ',') -- splits the row by commas
JOIN demo ON value = demo.id -- joins so that the values are replaced with names
) Vals
) as JointValues -- name of the column with the joint values
FROM demo
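Since the question mentions that the values could also be parsed out relationally on the ETL side, here is a minimal sketch of that alternative using Python's sqlite3. The junction table user_secondary_teams and its Position column are hypothetical names introduced only for this illustration; the users and teams data come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table users (Id integer primary key, Name text,
                        PrimaryTeamId int, SecondaryTeamIds text);
    create table teams (Id integer primary key, TeamId int, TeamName text);
    -- Hypothetical junction table: one row per (user, secondary team).
    create table user_secondary_teams (UserId int, Position int, TeamId int);
    insert into users values (1,'John',123,'456,789,669'),
                             (2,'Ringo',123,'456,555');
    insert into teams values (1,456,'Red Team'),(2,669,'Blue Team'),
                             (3,789,'Purple Team'),(4,555,'Black Team'),
                             (5,123,'Orange Team');
""")

# Split each comma-separated list once, remembering the original position.
for user_id, id_list in conn.execute(
        "select Id, SecondaryTeamIds from users").fetchall():
    for position, team_id in enumerate(id_list.split(",")):
        conn.execute("insert into user_secondary_teams values (?, ?, ?)",
                     (user_id, position, int(team_id)))

# Plain joins now resolve the names; rows arrive in list order per user.
rows = conn.execute("""
    select u.Name, pt.TeamName, st.TeamName
    from users u
    join teams pt on pt.TeamId = u.PrimaryTeamId
    join user_secondary_teams ust on ust.UserId = u.Id
    join teams st on st.TeamId = ust.TeamId
    order by u.Id, ust.Position
""").fetchall()

# Fold the ordered rows back into one line per user.
grouped = {}
for name, primary, secondary in rows:
    grouped.setdefault(name, (primary, []))[1].append(secondary)
team_view = {name: (primary, ", ".join(names))
             for name, (primary, names) in grouped.items()}
print(team_view)
```

Once the ids live in their own rows, no string splitting is needed at query time, which is the main appeal of this design.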

In PostgreSQL, count how many observations match a list of criteria in another table

Suppose I have a table, cars, which looks like this:
id | model | car_color
----+--------+--------
01 | Camry | blue
02 | Elantra| red
03 | Sienna | blue
04 | Camry | fuschia
05 | LX450 | pink
06 | Tundra | lime
Also suppose I have this other table, a non-exhaustive list of colors, colors:
colors
-------
blue
red
fuschia
In Postgres (or perhaps in any SQL variant), how can I count how many entries in cars.car_color match any of the entries in colors.colors?
The answer here would be 4, as 'pink' and 'lime' don't appear in the colors table, but I can't get Postgres to spit this back for me. (In what I'm actually working on, the first table has dozens of millions of rows, and the second table I'm checking against has about 100k.) I'm trying things like this, to no avail:
select count(*) from cars
where "car_color" IN (colors.colors)
Here's the error:
[42P01] ERROR: missing FROM-clause entry for table "colors"
My intuition is that this is something about my WHERE statement, but I can't figure out what. Nor can I seem to phrase this in such a way as to get good search results in Google or SX search -- I know I'm not the first (or the 257th) to ask this.
Close. You need a subquery:
select count(*)
from cars c
where c.car_color IN (select co.colors from colors co);
Postgres has a good optimizer, but sometimes exists works better:
select count(*)
from cars c
where exists (select 1 from colors co where co.colors = c.car_color);
Here is a db<>fiddle.
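Both forms can be checked quickly with Python's sqlite3 as a stand-in for Postgres. This is only a sketch on the six sample rows from the question; it says nothing about how the two forms perform on tens of millions of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table cars (id text, model text, car_color text);
    create table colors (colors text);
    insert into cars values ('01','Camry','blue'),('02','Elantra','red'),
                            ('03','Sienna','blue'),('04','Camry','fuschia'),
                            ('05','LX450','pink'),('06','Tundra','lime');
    insert into colors values ('blue'),('red'),('fuschia');
""")

# Variant 1: IN with a subquery.
in_count = conn.execute("""
    select count(*) from cars c
    where c.car_color in (select co.colors from colors co)
""").fetchone()[0]

# Variant 2: correlated EXISTS.
exists_count = conn.execute("""
    select count(*) from cars c
    where exists (select 1 from colors co where co.colors = c.car_color)
""").fetchone()[0]

print(in_count, exists_count)
```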

SQL Spatial Subquery Issue

Greetings Benevolent Gods of Stackoverflow,
I am presently struggling to get a spatially enabled query to work for a SQL assignment I am working on. The wording is as follows:
SELECT PURCHASES.TotalPrice, STORES.GeoLocation, STORES.StoreName
FROM MuffinShop
join (SELECT SUM(PURCHASES.TotalPrice) AS StoreProfit, STORES.StoreName
FROM PURCHASES INNER JOIN STORES ON PURCHASES.StoreID = STORES.StoreID
GROUP BY STORES.StoreName
HAVING (SUM(PURCHASES.TotalPrice) > 600))
What I am trying to do with this query is perform a function query (like avg, sum etc) and get the spatial information back as well. Another example of this would be:
SELECT STORES.StoreName, AVG(REVIEWS.Rating),Stores.Shape
FROM REVIEWS CROSS JOIN
STORES
GROUP BY STORES.StoreName;
This returns a Column 'STORES.Shape' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. error message.
I know I require a sub query to perform this task, I am just having endless trouble getting it to work. Any help at all would be wildly appreciated.
There are two parts to this question. I would tackle the first problem with the following logic:
List all the store names and their respective geolocations
Get the profit for each store
With that in mind, you need to use the STORES table as your base, then bolt the profit onto it through a sub query or an apply:
SELECT s.StoreName
,s.GeoLocation
,p.StoreProfit
FROM STORES s
INNER JOIN (
SELECT pu.StoreId
,StoreProfit = SUM(pu.TotalPrice)
FROM PURCHASES pu
GROUP BY pu.StoreID
) p
ON p.StoreID = s.StoreID;
This variant with CROSS APPLY may be a little more efficient, depending on the execution plan:
SELECT s.StoreName
,s.GeoLocation
,profit.StoreProfit
FROM STORES s
CROSS APPLY (
SELECT StoreProfit = SUM(p.TotalPrice)
FROM PURCHASES p
WHERE p.StoreID = s.StoreID
GROUP BY p.StoreID
) profit;
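The derived-table form (the first of the two queries) can be tried with the sketch below, which uses Python's sqlite3. The store names, prices, and the text-typed GeoLocation column are made-up stand-ins, since the real table uses a spatial type, and SQLite has no CROSS APPLY, so only the join variant is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample data; GeoLocation is plain text here.
    create table STORES (StoreID int, StoreName text, GeoLocation text);
    create table PURCHASES (StoreID int, TotalPrice int);
    insert into STORES values (1,'North','POINT(0 0)'),(2,'South','POINT(1 1)');
    insert into PURCHASES values (1,400),(1,300),(2,100);
""")

# Aggregate purchases per store first, then bolt the result onto STORES.
# The spatial column never has to appear in a GROUP BY this way.
store_profit = conn.execute("""
    select s.StoreName, s.GeoLocation, p.StoreProfit
    from STORES s
    join (select pu.StoreID, sum(pu.TotalPrice) as StoreProfit
          from PURCHASES pu
          group by pu.StoreID) p
      on p.StoreID = s.StoreID
    order by s.StoreName
""").fetchall()

print(store_profit)
```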
Now for the second part, the error that you are receiving tells you that you need to GROUP BY all columns in your select statement with the exception of your aggregate function(s).
In your second example, you are asking SQL to take an average rating for each store based on an ID, but you are also trying to return another column without including that inside the grouping. I will try to show you what you are asking SQL to do and where the issue lies with the following examples:
-- Data
Id | Rating | Shape
1 | 1 | Triangle
1 | 4 | Triangle
1 | 1 | Square
2 | 1 | Triangle
2 | 5 | Triangle
2 | 3 | Square
SQL Server, please give me the average rating for each store:
SELECT Id, AVG(Rating)
FROM Store
GROUP BY Id;
-- Result
Id | Avg(Rating)
1 | 2
2 | 3
SQL Server, please give me the average rating for each store and show its shape in the result (but don't group by it):
SELECT Id, AVG(Rating), Shape
FROM Store
GROUP BY Id;
-- Result
Id | Avg(Rating) | Shape
1 | 2 | Do I show Triangle or Square ...... ERROR!!!!
2 | 3 |
It needs to be told to get the average for each store and shape:
SELECT Id, AVG(Rating), Shape
FROM Store
GROUP BY Id, Shape;
-- Result
Id | Avg(Rating) | Shape
1 | 2.5 | Triangle
1 | 1 | Square
2 | 3 | Triangle
2 | 3 | Square
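The two valid groupings can be reproduced with the sketch below, using Python's sqlite3 and the six sample rows. Two caveats: SQLite's avg() returns floating-point values, and SQLite does not raise the "not contained in either an aggregate function or the GROUP BY clause" error that SQL Server does, so the failing middle query is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Store (Id int, Rating int, Shape text);
    insert into Store values (1,1,'Triangle'),(1,4,'Triangle'),(1,1,'Square'),
                             (2,1,'Triangle'),(2,5,'Triangle'),(2,3,'Square');
""")

# One group per store.
per_store = conn.execute("""
    select Id, avg(Rating) from Store group by Id order by Id
""").fetchall()

# One group per (store, shape) combination.
per_store_shape = conn.execute("""
    select Id, Shape, avg(Rating) from Store
    group by Id, Shape order by Id, Shape
""").fetchall()

print(per_store)
print(per_store_shape)
```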
As in any spatial query, you need an idea of what your final geometry will be. It looks like you are attempting to group by individual stores but delivering an average rating from the subquery. So if I'm reading it right, you are just looking to get the store's shape info associated with the average ratings?
Query the stores table for the shape field and join it to the query you use to get the average rating:
select a.shape,
b.*
from stores a inner join (your Average rating query with group by here) b
on a.StoreID = b.Storeid

Combining two records via Group By

I'm still new to Access VBA and would appreciate your help with my current scenario.
I have written VBA code which pulls data from a table named Tblsrce:
sqlStr = "SELECT zYear, zMonth, Product, Sum(Dollar) AS totalAmt FROM Tblsrce " & _
"WHERE fruits IS NOT NULL AND fruits IN ('" & Replace(strFruits, ", ", "', '") & "') " & _
"GROUP BY zYear, zMonth, Product;"
The field fruits usually contains values such as Mango, Apples, Cherry, Banana, etc.
strFruits is a variable that comes from the user (separated by commas if they want to pull more than one fruit).
However, I have a problem when there are two related fruits with different names (e.g. Red Apple and Green Apple) which I need to combine. Is there any way I can Group By those records and tag them as Apples in my current query?
Thanks!
Yes, you could use conditionals like the switch function to calculate some fruit group field.
Switch(
Product='Red Apple', 'Apple',
Product='Green Apple', 'Apple',
Product='Orange', 'Citrus') As ProductGroup
You can then use that field in a higher level query:
Select zYear, zMonth, ProductGroup,
Count(*)
From
(Select f.*,
Switch( .... ) As ProductGroup
From Fruits f)
Group By zYear, zMonth, ProductGroup
Of course it would be easier if this data isn't calculated dynamically in the query like this, but instead is stored in a separate table, so you know a product group for each of the products. That's also way easier to maintain (just add data instead of modify a query), and probably performs better.
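To see the calculated-group idea end to end, here is a small sketch with Python's sqlite3. SQLite has no Switch function, so a CASE expression stands in for it, and Sum(Dollar) is used as in the question. The three sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample rows for Tblsrce.
    create table Tblsrce (zYear int, zMonth int, Product text, Dollar int);
    insert into Tblsrce values (2024,1,'Red Apple',10),
                               (2024,1,'Green Apple',5),
                               (2024,1,'Orange',7);
""")

# The inner query computes ProductGroup (CASE replaces Access's Switch);
# the outer query groups by that calculated field.
grouped_totals = conn.execute("""
    select zYear, zMonth, ProductGroup, sum(Dollar)
    from (select t.*,
                 case Product
                      when 'Red Apple' then 'Apple'
                      when 'Green Apple' then 'Apple'
                      when 'Orange' then 'Citrus'
                 end as ProductGroup
          from Tblsrce t)
    group by zYear, zMonth, ProductGroup
    order by ProductGroup
""").fetchall()

print(grouped_totals)
```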
You could, but you would have to have an additional table where you list all fruits, and their groups. Then you can join that in, and group by the groups.
Sample structure:
Fruit | FruitCategory
+-------------+---------------+
| Red apple | Apple |
+-------------+---------------+
| Green apple | Apple |
+-------------+---------------+
| Banana | Banana |
+-------------+---------------+
You can prepopulate the table with a quick SELECT DISTINCT Fruits from Tblsrce and insert that in both columns, and then adjust the categories where you want.
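A minimal sketch of the lookup-table approach, using Python's sqlite3 in place of Access. The table name FruitGroups and the sample rows are assumptions made for this example; the prepopulate-then-adjust step follows the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical sample rows for Tblsrce.
    create table Tblsrce (zYear int, zMonth int, fruits text, Dollar int);
    insert into Tblsrce values (2024,1,'Red apple',10),
                               (2024,1,'Green apple',5),
                               (2024,1,'Banana',3);

    -- Hypothetical lookup table mapping each fruit to its category.
    create table FruitGroups (Fruit text primary key, FruitCategory text);

    -- Prepopulate: every fruit starts out as its own category...
    insert into FruitGroups (Fruit, FruitCategory)
        select distinct fruits, fruits from Tblsrce;
    -- ...then adjust the categories that should be merged.
    update FruitGroups set FruitCategory = 'Apple'
        where Fruit in ('Red apple', 'Green apple');
""")

# Join the lookup table in and group by the category instead of the fruit.
category_totals = conn.execute("""
    select s.zYear, s.zMonth, g.FruitCategory, sum(s.Dollar)
    from Tblsrce s
    join FruitGroups g on g.Fruit = s.fruits
    group by s.zYear, s.zMonth, g.FruitCategory
    order by g.FruitCategory
""").fetchall()

print(category_totals)
```

New fruits then only need a new row in the lookup table, not a change to the query.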

Sql Ordering Hierarchy

I am working on a SQL Statement that I can't seem to figure out. I need to order the results alphabetically, however, I need "children" to come right after their "parent" in the order. Below is a simple example of the table and data I'm working with. All non relevant columns have been removed. I'm using SQL Server 2005. Is there an easy way to do this?
tblCats
=======
idCat | fldCatName | idParent
--------------------------------------
1 | Some Category | null
2 | A Category | null
3 | Top Category | null
4 | A Sub Cat | 1
5 | Sub Cat1 | 1
6 | Another Cat | 2
7 | Last Cat | 3
8 | Sub Sub Cat | 5
Results of Sql Statement:
A Category
  Another Cat
Some Category
  A Sub Cat
  Sub Cat1
    Sub Sub Cat
Top Category
  Last Cat
(The prefixed spaces in the result are just to add in understanding of the results, I don't want the prefixed spaces in my sql result. The result only needs to be in this order.)
You can do it with a hierarchical query, as below.
It looks a lot more complicated than it is, due to the lack of a PAD function in T-SQL. The seed of the hierarchy is the set of categories without parents. The fourth column we select is their alphabetical ranking (converted to a string and padded). Then we union this with their children. At each recursion, the children will all be at the same level, so we can get their alphabetical ranking without needing to partition. We can concatenate these rankings together down the tree, and order by that.
;WITH Hierarchy AS (
SELECT
idCat, fldCatName, idParent,
CAST(RIGHT('00000'+
CAST(ROW_NUMBER() OVER (ORDER BY fldCatName) AS varchar(8))
, 5)
AS varchar(256)) AS strPath
FROM Category
WHERE idParent IS NULL
UNION ALL
SELECT
c.idCat, c.fldCatName, c.idParent,
CAST(h.strPath +
CAST(RIGHT('00000'+
CAST(ROW_NUMBER() OVER (ORDER BY c.fldCatName) AS varchar(8))
, 5) AS varchar(16))
AS varchar(256))
FROM Hierarchy h
INNER JOIN Category c ON c.idParent = h.idCat
)
SELECT idCat, fldCatName, idParent, strPath
FROM Hierarchy
ORDER BY strPath
With your data:
idCat fldCatName idParent strPath
------------------------------------------------
2 A Category NULL 00001
6 Another Category 2 0000100001
1 Some Category NULL 00002
4 A Sub Category 1 0000200001
5 Sub Cat1 1 0000200002
8 Sub Sub Category 5 000020000200001
3 Top Category NULL 00003
7 Last Category 3 0000300001
It can be done in CTE... Is this what you're after ?
With MyCats (CatName, CatId, CatLevel, SortValue)
As
( Select fldCatName CatName, idCat CatId,
0 Level, Cast(fldCatName As varChar(200)) SortValue
From tblCats
Where idParent Is Null
Union All
Select c.fldCatName CatName, c.idCat CatID,
CatLevel + 1 CatLevel,
Cast(SortValue + '\' + fldCatName as varChar(200)) SortValue
From tblCats c Join MyCats p
On p.CatId = c.idParent)
Select CatName, CatId, CatLevel, SortValue
From MyCats
Order By SortValue
EDIT: (thx to Pauls' comment below)
If 200 characters is not enough to hold the longest concatenated string "path", then change the value to as high as is needed... you can make it as high as 8000
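The name-path idea can be tried outside SQL Server with the sketch below, which runs a recursive CTE through Python's sqlite3. One deliberate difference: the path separator here is char(1) rather than '\', an assumption on my part so that the separator sorts before spaces and letters under SQLite's default byte-wise text comparison. The data is the tblCats sample from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table tblCats (idCat int, fldCatName text, idParent int);
    insert into tblCats values
        (1,'Some Category',null),(2,'A Category',null),(3,'Top Category',null),
        (4,'A Sub Cat',1),(5,'Sub Cat1',1),(6,'Another Cat',2),
        (7,'Last Cat',3),(8,'Sub Sub Cat',5);
""")

# Each row carries the names of all its ancestors joined into SortValue,
# so sorting by SortValue puts children right after their parent.
ordered_cats = conn.execute("""
    with recursive MyCats (CatName, CatId, CatLevel, SortValue) as (
        select fldCatName, idCat, 0, fldCatName
        from tblCats
        where idParent is null
        union all
        select c.fldCatName, c.idCat, p.CatLevel + 1,
               p.SortValue || char(1) || c.fldCatName
        from tblCats c join MyCats p on p.CatId = c.idParent
    )
    select CatName, CatLevel from MyCats order by SortValue
""").fetchall()

# Indent by level purely for display; the query result itself is unpadded.
for name, level in ordered_cats:
    print("  " * level + name)
```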
I'm not aware of any SQL Server (or Ansi-SQL) inherent support for this.
I don't suppose you'd consider a temp table and recursive stored procedure an "easy" way? :)
Paul's answer is excellent, but I thought I would throw in another idea for you. Joe Celko has a solution for this in his SQL for Smarties book (chapter 29). It involves maintaining a separate table containing the hierarchy info. Inserts, updates, and deletes are a little complicated, but selects are very fast.
Sorry I don't have a link or any code to post, but if you have access to this book, you may find this helpful.