More efficient SQL statement to eliminate my n^2 algorithm?

Let's say the following are my SQL tables:
My first table is called [Customer].
CustomerID CustomerName CustomerAddress
---------- ------------ ---------------
1 Name1 1 Infinity Loop
2 Name2 2 Infinity Loop
3 Name3 3 Infinity Loop
My next table is called [Group].
GroupID GroupName
------- ---------
1 Group1
2 Group2
3 Group3
Then, to link the two, I have a table called [GroupCustomer].
GroupID CustomerID
------- ----------
1 2
1 3
2 1
3 1
So on the ASP.NET page, I have two tables I want to display. The first table is essentially all Customers that are in a particular group. So if I select Group1 in a drop-down list, it would display the following table:
CustomerID CustomerName CustomerAddress
---------- ------------ ---------------
2 Name2 2 Infinity Loop
3 Name3 3 Infinity Loop
The table above is for all customers that are "associated" with the selected group (which in this case is Group1). Then, in the other table, I want it to display this:
CustomerID CustomerName CustomerAddress
---------- ------------ ---------------
1 Name1 1 Infinity Loop
Essentially, for this table, I want it to display all customers that are NOT in the selected group.
To generate the table for all customers that are in the selected group, I wrote the following SQL:
SELECT Customer.CustomerID, Customer.CustomerName, Customer.CustomerAddress
FROM Customer
INNER JOIN GroupCustomer ON
Customer.CustomerID = GroupCustomer.CustomerID
INNER JOIN [Group] ON
GroupCustomer.GroupID = [Group].GroupID
WHERE [Group].GroupID = @selectedGroupParameter
So when I mentioned my n^2 algorithm, I essentially ran the SQL statement above and compared its rows against the result of a plain SELECT * from the Customer table. Wherever there was a match, I simply did not display that customer. This is incredibly inefficient, and something I'm not proud of.
This leads to my question: what's the most efficient SQL statement I can write that will eliminate my n^2 comparison?

You can use NOT EXISTS to get Customers not in a particular Group:
SELECT *
FROM Customer c
WHERE
NOT EXISTS(
SELECT 1
FROM GroupCustomer
WHERE
CustomerID = c.CustomerID
AND GroupID = @selectedGroupParameter
)
Read this article by Aaron Bertrand for different ways to solve this kind of problem and their performance comparisons, with NOT EXISTS being the fastest according to his test.

Select * from Customer
where CustomerID not in
(select CustomerID
from GroupCustomer
where GroupID = @selectedGroupParameter)
You can use not in for this check. That said, you can probably just get rid of the join to the Group table for some increased performance, since you don't appear to actually use the group name.
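To make the two queries easy to try against the sample data from the question, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite rather than SQL Server, so the parameter is written as `?` instead of `@selectedGroupParameter`, and the helper name `customer_ids` is made up for illustration. Both lists come straight from the database, so no row-by-row comparison is needed in the application.

```python
import sqlite3

# In-memory database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (
        CustomerID INTEGER PRIMARY KEY,
        CustomerName TEXT,
        CustomerAddress TEXT
    );
    CREATE TABLE GroupCustomer (GroupID INTEGER, CustomerID INTEGER);
    INSERT INTO Customer VALUES
        (1, 'Name1', '1 Infinity Loop'),
        (2, 'Name2', '2 Infinity Loop'),
        (3, 'Name3', '3 Infinity Loop');
    INSERT INTO GroupCustomer VALUES (1, 2), (1, 3), (2, 1), (3, 1);
""")

# Customers that are in the selected group (no join to [Group] needed).
IN_GROUP = """
    SELECT c.CustomerID
    FROM Customer c
    WHERE EXISTS (
        SELECT 1 FROM GroupCustomer gc
        WHERE gc.CustomerID = c.CustomerID AND gc.GroupID = ?
    )
    ORDER BY c.CustomerID
"""

# Customers that are NOT in the selected group.
NOT_IN_GROUP = """
    SELECT c.CustomerID
    FROM Customer c
    WHERE NOT EXISTS (
        SELECT 1 FROM GroupCustomer gc
        WHERE gc.CustomerID = c.CustomerID AND gc.GroupID = ?
    )
    ORDER BY c.CustomerID
"""


def customer_ids(sql, group_id):
    """Run one of the queries above and return the matching customer IDs."""
    return [row[0] for row in conn.execute(sql, (group_id,))]


in_group = customer_ids(IN_GROUP, 1)
not_in_group = customer_ids(NOT_IN_GROUP, 1)
print(in_group, not_in_group)
```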

Related

Combine multiple rows into single row in SQL View (without group by or with CTE?) [duplicate]

I'm trying to create a complex view. Where I have a one-to-many relation, I want to combine the related values into a single row.
Staff (main table)
ID Name
1 aaa
2 bbb
OtherStaff (table one to many)
ID StaffId Name Value
1 1 xxx 888
2 1 yyy 777
3 2 vvv 333
SomeTable (table one to one)
Id StaffId SomeVal
1 1 qwert
2 2 asd
Result:
ID Name OtherStaff SomeVal
1 aaa xxx, 888; yyy, 777 qwert
2 bbb vvv, 333 asd
View:
CREATE VIEW MyView
AS
SELECT DISTINCT
Staff.Id,
MagicCombine(OtherStaff.Name, OtherStaff.Value), -- pseudocode
SomeTable.SomeVal
FROM
dbo.[Staff] Staff
JOIN dbo.[OtherStaff] OtherStaff ON OtherStaff.StaffId = Staff.Id
JOIN dbo.[SomeTable] SomeTable ON SomeTable.StaffId= Staff.Id
I have read that I can use GROUP BY, but I will have a lot of joins and columns, and GROUP BY requires all of the unaggregated columns to be listed in its clause. Is there a more elegant solution? Can CTEs (Common Table Expressions) be useful somehow?
Your "magic function" would appear to be STRING_AGG().
The code would then look like this:
CREATE VIEW MyView AS
SELECT Staff.Id,
Staff.Name,
-- a view needs a name for every column, so the aggregate gets an alias
STRING_AGG(CONCAT(OtherStaff.Name, ', ', OtherStaff.Value), '; ') WITHIN GROUP (ORDER BY OtherStaff.Name) AS OtherStaff,
SomeTable.SomeVal
FROM dbo.[Staff] Staff JOIN
dbo.[OtherStaff] OtherStaff
ON OtherStaff.StaffId = Staff.Id JOIN
dbo.[SomeTable] SomeTable
ON SomeTable.StaffId = Staff.Id
GROUP BY Staff.Id, Staff.Name, SomeTable.SomeVal;
Listing unaggregated columns in both the GROUP BY and SELECT should not be too troublesome. After all, you can just cut-and-paste them.
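On the CTE part of the question: one way to keep the GROUP BY short is to aggregate the one-to-many table on its own in a CTE and then join that single-row-per-staff result to everything else. The sketch below illustrates the idea with Python's sqlite3 module, under the assumption that SQLite's `group_concat` can stand in for `STRING_AGG`; the CTE name `OtherStaffAgg` is made up. SQLite does not guarantee the order of concatenated items here, so the sketch makes no claim about it.

```python
import sqlite3

# Sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Staff (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE OtherStaff (
        Id INTEGER PRIMARY KEY, StaffId INTEGER, Name TEXT, Value INTEGER
    );
    CREATE TABLE SomeTable (Id INTEGER PRIMARY KEY, StaffId INTEGER, SomeVal TEXT);
    INSERT INTO Staff VALUES (1, 'aaa'), (2, 'bbb');
    INSERT INTO OtherStaff VALUES
        (1, 1, 'xxx', 888), (2, 1, 'yyy', 777), (3, 2, 'vvv', 333);
    INSERT INTO SomeTable VALUES (1, 1, 'qwert'), (2, 2, 'asd');
""")

# The CTE collapses OtherStaff to one row per StaffId, so the outer query
# needs no GROUP BY at all, however many columns and joins it has.
QUERY = """
    WITH OtherStaffAgg AS (
        SELECT StaffId,
               group_concat(Name || ', ' || Value, '; ') AS OtherStaff
        FROM OtherStaff
        GROUP BY StaffId
    )
    SELECT s.Id, s.Name, a.OtherStaff, t.SomeVal
    FROM Staff s
    JOIN OtherStaffAgg a ON a.StaffId = s.Id
    JOIN SomeTable t ON t.StaffId = s.Id
    ORDER BY s.Id
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The same shape should carry over to SQL Server by swapping `group_concat(...)` for the `STRING_AGG(...) WITHIN GROUP (...)` expression above.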

Where statement for exact match on Many to Many SQL tables

I am trying to construct a SQL statement to search in two tables that are in a many to many relation.
Problem : SQL statement to search for products with exact stones.
For example, in the below tables, I need a statement that will search for product with Ruby and Emerald stone ONLY. In all my attempts I get both Ring and Necklace because they both have Ruby and Emerald even though Necklace has one additional stone. It should only give Ring product.
I need a way to implement the AND operator on the stone table so that the result contains products that have the exact stones. Please help.
Table stone
s_id s_name
---- ------
1 Ruby
2 Emerald
3 Onyx
Table product
p_id p_name
---- ------
1 Ring
2 Necklace
3 Pendent
Relation table - product_stone
p_s_id p_id s_id
------ ---- ----
1 1 1
1 1 2
1 2 1
1 2 2
1 2 3
1 3 3
This is a relational division question. We need to find the cross join of the two tables "divided" by our list, with no remainder i.e. no other stone in product.
We will assume that each (p_id, s_id) pair appears at most once in product_stone:
;WITH StonesToFind AS ( -- we could also use a table variable etc here
SELECT *
FROM stone
WHERE s_name IN ('Ruby','Emerald')
)
SELECT p.p_name
FROM product AS p -- let's get all products...
JOIN product_stone AS ps ON ps.p_id = p.p_id -- ...joined to all their stones
LEFT JOIN StonesToFind AS s ON s.s_id = ps.s_id -- they may have stones in the list
GROUP BY p.p_id, p.p_name
HAVING COUNT(CASE WHEN s.s_id IS NULL THEN 1 END) = 0
-- the number of non matching stones in product must be zero
AND COUNT(*) = (SELECT COUNT(*) FROM StonesToFind);
-- the total number of stones must be the same as the list
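The query above can be checked against the sample data with a small sketch using Python's sqlite3 module. It assumes SQLite accepts the same CTE, LEFT JOIN and HAVING constructs (it does for this query); the function name `products_with_exactly` is made up, and the surrogate p_s_id column is left out because the query does not use it.

```python
import sqlite3

# Sample data from the question (p_s_id omitted, it is not used by the query).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stone (s_id INTEGER PRIMARY KEY, s_name TEXT);
    CREATE TABLE product (p_id INTEGER PRIMARY KEY, p_name TEXT);
    CREATE TABLE product_stone (p_id INTEGER, s_id INTEGER, UNIQUE (p_id, s_id));
    INSERT INTO stone VALUES (1, 'Ruby'), (2, 'Emerald'), (3, 'Onyx');
    INSERT INTO product VALUES (1, 'Ring'), (2, 'Necklace'), (3, 'Pendent');
    INSERT INTO product_stone VALUES
        (1, 1), (1, 2), (2, 1), (2, 2), (2, 3), (3, 3);
""")


def products_with_exactly(stone_names):
    """Return names of products whose stones are exactly the given set."""
    placeholders = ", ".join("?" for _ in stone_names)
    sql = f"""
        WITH StonesToFind AS (
            SELECT s_id FROM stone WHERE s_name IN ({placeholders})
        )
        SELECT p.p_name
        FROM product AS p
        JOIN product_stone AS ps ON ps.p_id = p.p_id
        LEFT JOIN StonesToFind AS s ON s.s_id = ps.s_id
        GROUP BY p.p_id, p.p_name
        -- no stone outside the list, and as many stones as the list has
        HAVING COUNT(CASE WHEN s.s_id IS NULL THEN 1 END) = 0
           AND COUNT(*) = (SELECT COUNT(*) FROM StonesToFind)
        ORDER BY p.p_name
    """
    return [row[0] for row in conn.execute(sql, list(stone_names))]


print(products_with_exactly(["Ruby", "Emerald"]))  # ['Ring']
```

Necklace is rejected by the first HAVING condition (Onyx is outside the list), and a product holding only Ruby would be rejected by the second.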

sql combining 2 queries with different order by group by

I have a query that counts the most frequent responses in a database and ranks them from highest to lowest, using group by and order by.
The following shows how to do it for one column:
select health, count(health) as count
from [Health].[Questionaire]
group by Health
order by count(Health) desc
which outputs the following:
Health Count
----------- -----
Very Good 6
Good 5
Poor 4
I would like to run a similar query on another column of the same table and combine the two queries into one SQL statement, producing output like the following:
Health Count Diet Count
----------- ----- ----- -----
Very Good 6 Very Good 6
Good 5 Good 4
Poor 4 Poor 3
UPDATE: this is what the table looks like at the moment:
ID Diet Health
----------- ----- -------
101 Very Good Very Good
102 Poor Good
103 Poor Poor
The output I would like from a single SQL statement is:
Health Count Diet Count
----------- ----- ----- -----
Very Good 2 Very Good 1
Poor 1 Good 1
Good 0 Poor 1
Can anyone please help me out with this one?
Can provide further clarification if needed!
Here are 2 different ways of doing it; notice I removed the redundant column:
Test data:
DECLARE @t table(Health varchar(20), Diet varchar(20))
INSERT @t values
('Very good', 'Very good'),
('Poor', 'Good'),
('Poor', 'Poor')
Query 1:
;WITH CTE1 as
(
SELECT Health, count(*) CountHealth
FROM @t --[Health].[Questionaire]
GROUP BY health
), CTE2 as
(
SELECT Diet, count(*) CountDiet
FROM @t --[Health].[Questionaire]
GROUP BY Diet
)
SELECT
coalesce(Health, Diet) Grade,
coalesce(CountHealth, 0) CountHealth,
coalesce(CountDiet, 0) CountDiet
FROM CTE1
FULL JOIN
CTE2
ON CTE1.Health = CTE2.Diet
ORDER BY CountHealth DESC
Result 1:
Grade CountHealth CountDiet
Poor 2 1
Very good 1 1
Good 0 1
Mixing the results like that is really not good practice, so here is a different solution.
Query 2:
SELECT Health, count(*) Count, 'Health' Grade
FROM @t --[Health].[Questionaire]
GROUP BY health
UNION ALL
SELECT Diet, count(*) CountDiet, 'Diet'
FROM @t --[Health].[Questionaire]
GROUP BY Diet
ORDER BY Grade, Count DESC
Result 2:
Health Count Grade
Good 1 Diet
Poor 1 Diet
Very good 1 Diet
Poor 2 Health
Very good 1 Health
You need to join the table to itself, and (as your sample data shows) you also need to deal with gaps in the actual data for specific values.
If you have a table that has the range of health/diet values:
select
v.value Status,
count(distinct a.id) healthCount, -- distinct: the two joins multiply matching rows
count(distinct b.id) DietCount
from health_diet_values v
left join Questionaire a on a.health = v.value
left join Questionaire b on b.diet = v.value
group by v.value
or if you don't have such a table, you need to generate the list of values manually and join from that:
select
v.value Status,
count(distinct a.id) healthCount, -- distinct: the two joins multiply matching rows
count(distinct b.id) DietCount
from (select 'Very Good' value union all
select 'Good' union all
select 'Poor') v
left join Questionaire a on a.health = v.value
left join Questionaire b on b.diet = v.value
group by v.value
Both of these queries produce zeroes if there is no matching data for the value.
Note that in your desired output you have a redundant column - you repeat the value column. The above queries produce output that looks like:
Status HealthCount DietCount
-------------------------------
Very Good 2 1
Good 1 1
Poor 0 1
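Because joining the same table twice pairs every health match with every diet match, the counts need care. As an alternative sketch, the table can be read once and counted conditionally. The example below uses Python's sqlite3 module with the three sample rows from the question's UPDATE; it assumes SQLite syntax (a `VALUES` CTE for the list of grades), and the CTE name `v` and column name `grade` are made up.

```python
import sqlite3

# The three sample rows from the question's UPDATE (ID, Diet, Health).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Questionaire (id INTEGER PRIMARY KEY, diet TEXT, health TEXT);
    INSERT INTO Questionaire VALUES
        (101, 'Very Good', 'Very Good'),
        (102, 'Poor', 'Good'),
        (103, 'Poor', 'Poor');
""")

# One join per grade; each matched row is counted for health, diet, or both.
# Grades with no matching rows still appear, with counts of zero.
QUERY = """
    WITH v(grade) AS (VALUES ('Very Good'), ('Good'), ('Poor'))
    SELECT v.grade,
           COUNT(CASE WHEN q.health = v.grade THEN 1 END) AS healthCount,
           COUNT(CASE WHEN q.diet = v.grade THEN 1 END) AS dietCount
    FROM v
    LEFT JOIN Questionaire q ON v.grade IN (q.health, q.diet)
    GROUP BY v.grade
"""

counts = {grade: (h, d) for grade, h, d in conn.execute(QUERY)}
print(counts["Very Good"])  # (1, 1)
print(counts["Good"])       # (1, 0)
print(counts["Poor"])       # (1, 2)
```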

Ordering by list of strings in Oracle SQL without LISTAGG

I'm working with two entities: Item and Attribute, which look something like the following:
Item
----
itemId
Attribute
---------
attributeId
name
An Item has Attributes, as specified in an association table:
ItemAttribute
--------------
itemId
attributeId
When this data gets to the client, it will be displayed with a row per Item, and each row will have a list of Attributes by name. For example:
Item Attributes
---- ----------
1 A, B, C
2 A, C
3 A, B
The user will have the option to sort on the Attributes column, so we need the ability to sort the data as follows:
Item Attributes
---- ----------
3 A, B
1 A, B, C
2 A, C
At present, we're getting one row of data per ItemAttribute row. Basically:
SELECT Item.itemId,
Attribute.name
FROM Item
JOIN ItemAttribute
ON ItemAttribute.itemId = Item.itemId
JOIN Attribute
ON Attribute.attributeId = ItemAttribute.attributeId
ORDER BY Item.itemId;
Which produces a result like:
itemId name
------ ----
1 A
1 B
1 C
2 A
2 C
3 A
3 B
The actual ORDER BY clause is based on user input. It's usually a single column, so the ordering is simple, and the app-side loop that processes the result set combines the Attribute names into a comma-separated list for presentation on the client. But when the user asks to sort on that list, it'd be nice to have Oracle sort the results so that -- using the example above -- we'd get:
itemId name
------ ----
3 A
3 B
1 A
1 B
1 C
2 A
2 C
Oracle's LISTAGG function can be used to generate the attribute lists prior to sorting; however Attribute.name can be a very long string, and it is possible that the combined list is greater than 4000 characters, which would cause the query to fail.
Is there a clean, efficient way to sort the data in this manner using Oracle SQL (11gR2)?
There are really two questions here:
1) How to aggregate more than 4000 characters of data
Is it even sensible to aggregate so much data and display it in a single column?
Anyway, you will need some sort of large structure to display more than 4000 characters, such as a CLOB. You could write your own aggregation method following the general guideline described in one of Tom Kyte's threads (you would need to modify it so that the final output is a CLOB).
I will demonstrate a simpler method with a nested table and a custom function (works on 10g):
SQL> CREATE TYPE tab_varchar2 AS TABLE OF VARCHAR2(4000);
     /
Type created.
SQL> CREATE OR REPLACE FUNCTION concat_array(p tab_varchar2) RETURN CLOB IS
       l_result CLOB;
     BEGIN
       FOR cc IN (SELECT column_value FROM TABLE(p) ORDER BY column_value) LOOP
         l_result := l_result ||' '|| cc.column_value;
       END LOOP;
       return l_result;
     END;
     /
Function created.
SQL> SELECT item,
       concat_array(CAST (collect(attribute) AS tab_varchar2)) attributes
     FROM data
     GROUP BY item;
ITEM ATTRIBUTES
---- ----------
1 a b c
2 a c
3 a b
2) How to sort large data
Unfortunately you can't sort by an arbitrarily large column in Oracle: there are known limitations relative to the type and the length of the sort key.
Trying to sort with a CLOB will result in an ORA-00932: inconsistent datatypes: expected - got CLOB.
Trying to sort with a key larger than the database block size (if you decide to split your large data into many VARCHAR2 for example) will yield an ORA-06502: PL/SQL: numeric or value error: character string buffer too small
I suggest you sort by the first 4000 bytes of the attributes column:
SQL> SELECT * FROM (
       SELECT item,
         concat_array(CAST (collect(attribute) AS tab_varchar2)) attributes
       FROM data
       GROUP BY item
     ) order by dbms_lob.substr(attributes, 4000, 1);
ITEM ATTRIBUTES
---- ----------
3 a b
1 a b c
2 a c
As Vincent already said, sort keys are limited (no CLOB, max block size).
I can offer a slightly different solution which works out of the box in 10g and newer, without the need for a custom function and type using XMLAgg:
with ItemAttribute as (
select 'name'||level name
,mod(level,3) itemid
from dual
connect by level < 2000
)
,ItemAttributeGrouped as (
select xmlagg(xmlparse(content name||' ' wellformed) order by name).getclobval() attributes
,itemid
from ItemAttribute
group by itemid
)
select itemid
,attributes
,dbms_lob.substr(attributes,4000,1) sortkey
from ItemAttributeGrouped
order by dbms_lob.substr(attributes,4000,1)
;
Clean is subjective, and efficiency would need to be checked (but it's still only hitting the tables once, so it probably shouldn't be any worse), but if you have a finite upper limit on the number of attributes any item might have, or at least on how many you have to consider for ordering, then you could use multiple lead calls to do this:
SELECT itemId, name
FROM (
SELECT itemId, name, min(dr) over (partition by itemId) as dr
FROM (
SELECT itemId, name,
dense_rank() over (order by name, name1, name2, name3, name4) as dr
FROM (
SELECT Item.itemId,
Attribute.name,
LEAD(Attribute.name, 1)
OVER (PARTITION BY Item.itemId
ORDER BY Attribute.name) AS name1,
LEAD(Attribute.name, 2)
OVER (PARTITION BY Item.itemId
ORDER BY Attribute.name) AS name2,
LEAD(Attribute.name, 3)
OVER (PARTITION BY Item.itemId
ORDER BY Attribute.name) AS name3,
LEAD(Attribute.name, 4)
OVER (PARTITION BY Item.itemId
ORDER BY Attribute.name) AS name4
FROM Item
JOIN ItemAttribute
ON ItemAttribute.itemId = Item.itemId
JOIN Attribute
ON Attribute.attributeId = ItemAttribute.attributeId
)
)
)
ORDER BY dr, name;
So, the inner query is getting the two values you care about, and using four lead calls (just as an example, so this can sort based on a maximum of the first five attribute names, but could of course be extended by adding more!) to get a picture of what else each item has. With your data this gives:
ITEMID NAME NAME1 NAME2 NAME3 NAME4
---------- ---------- ---------- ---------- ---------- ----------
1 A B C
1 B C
1 C
2 A C
2 C
3 A B
3 B
The next query out does a dense_rank over those five ordered attribute names, which assigns a rank to each itemID and name, giving:
ITEMID NAME DR
---------- ---------- ----------
1 A 1
1 B 4
1 C 6
2 A 3
2 C 6
3 A 2
3 B 5
The next query out finds the minimum of those calculated dr values for each itemId, using the analytic version of min, so itemId=1 gets min(dr) = 1, itemId=2 gets 3, and itemId=3 gets 2. (You could combine those two levels by selecting min(dense_rank(...)) but that's (even) less clear).
The final outer query uses that minimum rank for each item to do the actual ordering, giving:
ITEMID NAME
---------- ----------
1 A
1 B
1 C
3 A
3 B
2 A
2 C
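For comparison, the ordering the question asks for can be pinned down with an app-side sketch in Python, applied to the one-row-per-ItemAttribute result set. This is not an Oracle solution, only a reference for what "sort by attribute list" means; the function name `sort_by_attribute_list` is made up. Note that Python orders a shorter list before any longer list it is a prefix of, which gives the 3, 1, 2 order from the question, whereas an ascending ORDER BY in Oracle puts NULLs last by default.

```python
# Rows as returned by the question's query: (itemId, attribute name).
rows = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "A"), (2, "C"),
    (3, "A"), (3, "B"),
]


def sort_by_attribute_list(rows):
    """Group names per item, then order items by their sorted name lists."""
    grouped = {}
    for item_id, name in rows:
        grouped.setdefault(item_id, []).append(name)
    for names in grouped.values():
        names.sort()
    # Lists compare element by element; item id breaks ties between equal lists.
    return sorted(grouped.items(), key=lambda pair: (pair[1], pair[0]))


ordered = sort_by_attribute_list(rows)
for item_id, names in ordered:
    print(item_id, ", ".join(names))
```

Since the application already loops over the result set to build the comma-separated lists, sorting there avoids both the 4000-character LISTAGG limit and the CLOB sort-key restrictions, at the cost of not being able to page through sorted results in the database.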

Sql COALESCE entire rows?

I just learned about COALESCE and I'm wondering if it's possible to COALESCE an entire row of data between two tables? If not, what's the best approach to the following ramblings?
For instance, I have these two tables and assuming that all columns match:
tbl_Employees
Id Name Email Etc
-----------------------------------
1 Sue ... ...
2 Rick ... ...
tbl_Customers
Id Name Email Etc
-----------------------------------
1 Bob ... ...
2 Dan ... ...
3 Mary ... ...
And a table with id's:
tbl_PeopleInCompany
Id CompanyId
-----------------
1 1
2 1
3 1
And I want to query the data in a way that gets rows from the first table with matching id's, but gets from second table if no id is found.
So the resulting query would look like:
Id Name Email Etc
-----------------------------------
1 Sue ... ...
2 Rick ... ...
3 Mary ... ...
Where Sue and Rick were taken from the first table, and Mary from the second.
SELECT Id, Name, Email, Etc FROM tbl_Employees
WHERE Id IN (SELECT Id FROM tbl_PeopleInCompany)
UNION ALL
SELECT Id, Name, Email, Etc FROM tbl_Customers
WHERE Id IN (SELECT Id FROM tbl_PeopleInCompany) AND
Id NOT IN (SELECT Id FROM tbl_Employees)
Depending on the number of rows, there are several different ways to write these queries (with JOIN and EXISTS), but try this first.
This query first selects all the people from tbl_Employees that have an Id value in your target list (the table tbl_PeopleInCompany). It then adds to the "bottom" of this bunch of rows the results of the second query. The second query gets all tbl_Customers rows with Ids in your target list but excluding any with Ids that appear in tbl_Employees.
The total list contains the people you want: all Ids from tbl_PeopleInCompany, with preference given to Employees but missing records pulled from Customers.
You can also do this:
1) Full outer join the two tables on tbl_Employees.Id = tbl_Customers.Id. This will give you all the rows from both tables and leave the tbl_Employees columns null where a customer has no matching employee row.
2) Use CASE WHEN to select either the tbl_Employees column or the tbl_Customers column, based on whether tbl_Employees.Id IS NULL, like this:
CASE WHEN tbl_Employees.Id IS NULL THEN tbl_Customers.Name ELSE tbl_Employees.Name END AS Name
(My syntax might not be perfect there, but the technique is sound).
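A runnable sketch of this outer-join-plus-CASE idea, using Python's sqlite3 module and the sample names from the question. It is driven from tbl_PeopleInCompany with two LEFT JOINs rather than a full outer join (an assumption made so it runs on older SQLite versions too), and only the Name column is shown because the question elides the other values; further columns would follow the same CASE pattern.

```python
import sqlite3

# Sample data from the question (only Id and Name are populated there).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tbl_Employees (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE tbl_Customers (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE tbl_PeopleInCompany (Id INTEGER, CompanyId INTEGER);
    INSERT INTO tbl_Employees VALUES (1, 'Sue'), (2, 'Rick');
    INSERT INTO tbl_Customers VALUES (1, 'Bob'), (2, 'Dan'), (3, 'Mary');
    INSERT INTO tbl_PeopleInCompany VALUES (1, 1), (2, 1), (3, 1);
""")

# Take the whole row from tbl_Employees when one exists, else from
# tbl_Customers. Testing e.Id (not each column) keeps the choice per row,
# so a NULL column in an employee row is not filled in from the customer.
QUERY = """
    SELECT p.Id,
           CASE WHEN e.Id IS NOT NULL THEN e.Name ELSE c.Name END AS Name
    FROM tbl_PeopleInCompany p
    LEFT JOIN tbl_Employees e ON e.Id = p.Id
    LEFT JOIN tbl_Customers c ON c.Id = p.Id
    WHERE e.Id IS NOT NULL OR c.Id IS NOT NULL
    ORDER BY p.Id
"""

people = conn.execute(QUERY).fetchall()
print(people)  # [(1, 'Sue'), (2, 'Rick'), (3, 'Mary')]
```

Writing `COALESCE(e.Name, c.Name)` instead would pick per column rather than per row, which differs only when an employee row has a NULL in that column.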
This should be pretty performant. It uses a CTE to build a small table of Customers that have no matching Employee records, and then it simply UNIONs that result with the Employee records:
;WITH FilteredCustomers (Id, Name, Email, Etc)
AS
(
SELECT C.Id, C.Name, C.Email, C.Etc
FROM tbl_Customers C
INNER JOIN tbl_PeopleInCompany PIC
ON C.Id = PIC.Id
LEFT JOIN tbl_Employees E
ON C.Id = E.Id
WHERE E.Id IS NULL -- keep only customers with no employee record
)
SELECT E.Id, E.Name, E.Email, E.Etc
FROM tbl_Employees E
INNER JOIN tbl_PeopleInCompany PIC
ON E.Id = PIC.Id
UNION
SELECT Id, Name, Email, Etc
FROM FilteredCustomers
Using the IN Operator can be rather taxing on large queries as it might have to evaluate the subquery for each record being processed.
I don't think the COALESCE function can be used for what you're thinking. COALESCE is similar to ISNULL, except it allows you to pass in multiple columns, and will return the first non-null value:
SELECT Name, Class, Color, ProductNumber,
COALESCE(Class, Color, ProductNumber) AS FirstNotNull
FROM Production.Product
This article should explain its application:
http://msdn.microsoft.com/en-us/library/ms190349.aspx
It sounds like Larry Lustig's answer is more along the lines of what you need though.