SQL Server 2005 - Join based on criteria in a table column

Using SQL Server 2005, what is the most efficient way to join the two tables in the following scenario?
The number of records in each table could be fairly large (around 200,000, say).
The only way I can currently think of doing this is with cursors and some dynamic SQL for each item, which will clearly be very inefficient.
I have two tables - a PERSON table and a SEARCHITEMS table. The SEARCHITEMS table contains a column with some simple criteria which is to be used when matching records with the PERSON table. The criteria can reference any column in the PERSON table.
For example, given the following tables:
PERSON table
PERSONID  FIRSTNAME  LASTNAME  GENDER  AGE  ...VARIOUS OTHER COLUMNS
1         Fred       Bloggs    M       16
...
200000    Steve      Smith     M       18
SEARCHITEMS table
ITEMID  DESCRIPTION     SEARCHCRITERIA
1       Males           GENDER = 'M'
2       Aged 16         AGE = 16
3       Some Statistic  {OTHERCOLUMN >= SOMEVALUE AND OTHERCOLUMN < SOMEVALUE}
...
200000  Males Aged 16   GENDER = 'M' AND AGE = 16
The RESULTS table should contain something like this:
ITEMID  DESCRIPTION    PERSONID  LASTNAME
1       Males          1         Bloggs
1       Males          200000    Smith
2       Aged 16        1         Bloggs
...
200000  Males Aged 16  1         Bloggs
It would be nice to be able to just do something like
INSERT INTO RESULTSTABLE
SELECT *
FROM PERSON P
LEFT JOIN SEARCHITEMS SI ON (APPLY SI.SEARCHCRITERIA TO P)
But I can't see a way of making this work. Any help or ideas appreciated.

Seeing that the SEARCHITEMS table is non-relational by nature, the cursor-and-dynamic-SQL solution seems to be the only workable one. Of course this will be quite slow, so I would "pre-calculate" the results to make it somewhat bearable.
To do this, create the following table (note the comma before the table-level primary key constraint, which the PERSONID column definition must end with):
CREATE TABLE MATCHEDITEMS(
    ITEMID int NOT NULL
        CONSTRAINT fkMatchedSearchItem
        FOREIGN KEY REFERENCES SEARCHITEMS(ITEMID),
    PERSONID int NOT NULL
        CONSTRAINT fkMatchedPerson
        FOREIGN KEY REFERENCES PERSON(PERSONID),
    CONSTRAINT pkMatchedItems
        PRIMARY KEY (ITEMID, PERSONID)
)
The table will contain a lot of rows, but since it stores only two int columns its footprint on disk will be small.
To keep this table up to date, create the following triggers:
- a trigger on the SEARCHITEMS table, which repopulates the MATCHEDITEMS table whenever a rule is changed or added;
- a trigger on the PERSON table, which runs the rules against the updated or added PERSON records.
Results can then be presented simply by joining the three tables (DESCRIPTION lives in SEARCHITEMS, so it is selected from s, not m):
SELECT m.ITEMID, s.DESCRIPTION, m.PERSONID, p.LASTNAME
FROM MATCHEDITEMS m
JOIN PERSON p
ON m.PERSONID = p.PERSONID
JOIN SEARCHITEMS s
ON m.ITEMID = s.ITEMID

You could build your T-SQL dynamically and then execute it with sp_executesql.
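To make the "loop over SEARCHITEMS, build a statement, execute it" idea concrete, here is a minimal sketch using SQLite from Python; in SQL Server 2005 the same loop would be a cursor over SEARCHITEMS calling sp_executesql. Table and column names follow the question; the sample rows are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE PERSON (PERSONID INTEGER PRIMARY KEY, FIRSTNAME TEXT,
                     LASTNAME TEXT, GENDER TEXT, AGE INTEGER);
CREATE TABLE SEARCHITEMS (ITEMID INTEGER PRIMARY KEY, DESCRIPTION TEXT,
                          SEARCHCRITERIA TEXT);
CREATE TABLE MATCHEDITEMS (ITEMID INTEGER, PERSONID INTEGER,
                           PRIMARY KEY (ITEMID, PERSONID));
INSERT INTO PERSON VALUES (1, 'Fred', 'Bloggs', 'M', 16),
                          (2, 'Steve', 'Smith', 'M', 18);
INSERT INTO SEARCHITEMS VALUES (1, 'Males', 'GENDER = ''M'''),
                               (2, 'Aged 16', 'AGE = 16');
""")

# One dynamic statement per search item: the stored criteria string becomes
# the WHERE clause. This is only safe if you trust the contents of
# SEARCHCRITERIA, since it is interpolated directly into SQL.
items = con.execute("SELECT ITEMID, SEARCHCRITERIA FROM SEARCHITEMS").fetchall()
for itemid, criteria in items:
    con.execute(
        f"INSERT INTO MATCHEDITEMS SELECT {itemid}, PERSONID FROM PERSON WHERE {criteria}"
    )

rows = con.execute(
    "SELECT ITEMID, PERSONID FROM MATCHEDITEMS ORDER BY ITEMID, PERSONID"
).fetchall()
print(rows)  # → [(1, 1), (1, 2), (2, 1)]
```

Item 1 matches both males; item 2 matches only the 16-year-old, which is exactly the per-rule fan-out the pre-calculated MATCHEDITEMS table stores.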

Related

Postgres - How to find ids that are not used in multiple tables (inactive ids) - badly written query

I have a table towns, which is the main table. It contains a great many rows and has become quite 'dirty' (someone inserted 5 million rows), so I would like to get rid of unused towns.
There are three referencing tables that use town_id as a reference to towns.
I know there are many towns that are not used in these tables; only if a town_id is not found in any of them do I consider it inactive, and then I would like to remove that town (because it's not used).
As you can see, towns is used in these two different tables:
employees
offices
For the table vendors there is a vendor_id column in towns, since one vendor can have multiple towns.
So if vendor_id in towns is null and the town_id is not found in either of these two tables, it is safe to remove it :)
I created a query which might work, but it takes far too long to execute. It looks something like this:
select count(*)
from towns
where vendor_id is null
and id not in (select town_id from banks)
and id not in (select town_id from employees)
So basically I said: if vendor_id is null, this town is definitely not related to vendors, and if at the same time the town is not in banks or employees, then it is safe to remove it. But the query took too long and never executed successfully, since towns has 5 million rows (which is why it is so dirty).
In fact I'm not able to execute the given query at all, since the server terminated abnormally.
Here is full error message:
ERROR: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
Any kind of help would be awesome
Thanks!
You can join the tables using LEFT JOIN to identify, in the WHERE clause, the town_id values for which there is no matching row in the tables banks and employees:
WITH list AS
( SELECT t.town_id
FROM towns AS t
LEFT JOIN tbl.banks AS b ON b.town_id = t.town_id
LEFT JOIN tbl.employees AS e ON e.town_id = t.town_id
WHERE t.vendor_id IS NULL
AND b.town_id IS NULL
AND e.town_id IS NULL
LIMIT 1000
)
DELETE FROM tbl.towns AS t
USING list AS l
WHERE t.town_id = l.town_id ;
Before launching the DELETE, you can check the indexes on your tables.
Adding an index as follows can be useful:
CREATE INDEX town_id_nulls ON towns (town_id NULLS FIRST) ;
Last but not least, you can add a LIMIT clause in the CTE to limit the number of rows you delete each time you execute the DELETE and so avoid the unexpected termination. As a consequence, you will have to relaunch the DELETE several times until there are no more rows to delete.
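The batched anti-join delete can be sketched end to end in SQLite from Python (the original targets Postgres, whose DELETE ... USING with a CTE has no SQLite equivalent, so the batch is expressed as an IN subquery with LIMIT here). Table names follow the question; the data is invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE towns (town_id INTEGER PRIMARY KEY, vendor_id INTEGER);
CREATE TABLE banks (town_id INTEGER);
CREATE TABLE employees (town_id INTEGER);
INSERT INTO towns VALUES (1, NULL), (2, NULL), (3, 7), (4, NULL);
INSERT INTO banks VALUES (2);      -- town 2 is referenced by a bank
INSERT INTO employees VALUES (4);  -- town 4 is referenced by an employee
""")

# Anti-join: keep only towns with no vendor and no row in either table,
# limited to a batch of at most 1000 per round.
batch = """
DELETE FROM towns
WHERE town_id IN (
    SELECT t.town_id
    FROM towns AS t
    LEFT JOIN banks     AS b ON b.town_id = t.town_id
    LEFT JOIN employees AS e ON e.town_id = t.town_id
    WHERE t.vendor_id IS NULL
      AND b.town_id IS NULL
      AND e.town_id IS NULL
    LIMIT 1000)
"""
# Relaunch the batch until a round deletes nothing, as the answer suggests.
while con.execute(batch).rowcount > 0:
    pass

survivors = con.execute("SELECT town_id FROM towns ORDER BY town_id").fetchall()
print(survivors)  # → [(2,), (3,), (4,)]  (only the unreferenced town 1 is gone)
```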
You can try a JOIN on the big tables; it should be faster than the two IN subqueries.
You could also try UNION ALL and live with the duplicates, as it is faster than UNION.
Finally, you can use a combined index on id and vendor_id to speed up the query.
CREATE TABLE towns (id int, vendor_id int);
CREATE TABLE banks (town_id int);
CREATE TABLE employees (town_id int);

select count(*)
from towns t1
left join (select town_id from banks
           union
           select town_id from employees) t2
  on t1.id = t2.town_id
where t1.vendor_id is null
  and t2.town_id is null;
The trick is to first make a list of all the town_id's you want to keep and then start removing those that are not there.
By looking in two tables you're making life harder for the server, so let's first create one single list.
-- build empty temp-table
CREATE TEMPORARY TABLE TEMP_must_keep
AS
SELECT town_id
FROM tbl.towns
WHERE 1 = 2;
-- get id's from first table
INSERT TEMP_must_keep (town_id)
SELECT DISTINCT town_id
FROM tbl.banks;
-- add index to speed up the EXCEPT below
CREATE UNIQUE INDEX idx_uq_must_keep_town_id ON TEMP_must_keep (town_id);
-- add new ones from second table
INSERT TEMP_must_keep (town_id)
SELECT town_id
FROM tbl.employees
EXCEPT -- auto-distincts
SELECT town_id
FROM TEMP_must_keep;
-- rebuild index simply to ensure little fragmentation
REINDEX TABLE TEMP_must_keep;
-- optional, but might help: create a temporary index on the towns table to speed up the delete
CREATE INDEX idx_towns_town_id_where_vendor_null ON tbl.towns (town_id) WHERE vendor_id IS NULL;
-- Now do actual delete
-- You can do a `SELECT COUNT(*)` rather than a `DELETE` first if you feel like it, both will probably take some time depending on your hardware.
DELETE
FROM tbl.towns as del
WHERE vendor_id is null
AND NOT EXISTS ( SELECT *
FROM TEMP_must_keep mk
WHERE mk.town_id = del.town_id);
-- cleanup
DROP INDEX tbl.idx_towns_town_id_where_vendor_null;
DROP TABLE TEMP_must_keep;
The idx_towns_town_id_where_vendor_null index is optional, and I'm not sure it will actually lower the total time, but IMHO it will help with the DELETE operation, if only because the index should give the Query Optimizer a better view of what volumes to expect.
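The "build one must-keep list, then delete what's not in it" flow above translates almost directly to SQLite, which is handy for checking the logic (EXCEPT auto-distincts there too; the index and REINDEX steps are omitted since they don't change the result). Schema and data are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE towns (town_id INTEGER PRIMARY KEY, vendor_id INTEGER);
CREATE TABLE banks (town_id INTEGER);
CREATE TABLE employees (town_id INTEGER);
INSERT INTO towns VALUES (1, NULL), (2, NULL), (3, NULL);
INSERT INTO banks VALUES (2), (2);      -- duplicates collapse via DISTINCT
INSERT INTO employees VALUES (2), (3);  -- town 2 appears in both tables

-- empty temp table with the same column
CREATE TEMPORARY TABLE TEMP_must_keep AS
    SELECT town_id FROM towns WHERE 1 = 2;

-- ids from the first table
INSERT INTO TEMP_must_keep (town_id)
    SELECT DISTINCT town_id FROM banks;

-- new ones from the second table; EXCEPT drops ids already collected
INSERT INTO TEMP_must_keep (town_id)
    SELECT town_id FROM employees
    EXCEPT
    SELECT town_id FROM TEMP_must_keep;

-- delete everything not on the keep list
DELETE FROM towns
WHERE vendor_id IS NULL
  AND NOT EXISTS (SELECT 1 FROM TEMP_must_keep mk
                  WHERE mk.town_id = towns.town_id);
""")
kept = con.execute("SELECT town_id FROM towns ORDER BY town_id").fetchall()
print(kept)  # → [(2,), (3,)]  (town 1 is referenced nowhere, so it goes)
```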

How to improve an Update query in Oracle

I'm trying to update two columns in an archaic Oracle database, but the query simply doesn't finish and nothing is updated. Any ideas to improve the query, or something else that can be done? I don't have DBA skills/knowledge and am unsure whether indexing would help, so I would appreciate comments in that area too.
PERSON table: This table has 200 million distinct person_ids; there are no duplicates. person_id is numeric, and I am trying to update the favorite_color and color_confidence columns, which are varchar2 and currently NULLed out.
person table
person_id favorite_color color_confidence many_other_columns
222
333
444
TEMP_COLOR_CONFIDENCE table: I'm trying to get favorite_color and color_confidence from this table and write them to the PERSON table. This table has 150 million distinct persons, again with nothing duplicated.
temp_color_confidence
person_id favorite_color color_confidence
222 R H
333 Y L
444 G M
This is my update query, which I realize only updates people found in both tables. Eventually I'll need to update the remaining 50 million with "U" (unknown). Solving that in one shot would be ideal too, but for now I'm just concerned that I can't get this query to complete at all.
UPDATE person p
SET (favorite_color, color_confidence) =
(SELECT t.favorite_color, t.color_confidence
FROM temp_color_confidence t
WHERE p.person_id = t.person_id)
WHERE EXISTS (
SELECT 1
FROM temp_color_confidence t
WHERE p.person_id = t.person_id );
Here's where my ignorance shines: would indexing on person_id help, considering the values are all distinct anyway? Would indexing on favorite_color help? There are fewer than 10 colors and only 3 confidence values.
For every person, it has to find the corresponding row in temp_color_confidence. The way to do that with the least I/O is to scan each table once and crunch them together in a single hash join, ideally all in memory. Indexes are unlikely to help with that, unless maybe temp_color_confidence is very wide and verbose and has an index on (person_id, favorite_color, color_confidence) which the optimiser can treat as a skinny table.
Using merge might be more efficient as it can avoid the second scan of temp_color_confidence:
merge into person p
using temp_color_confidence t
on (p.person_id = t.person_id)
when matched then update
set p.favorite_color = t.favorite_color, p.color_confidence = t.color_confidence;
If you are going to update every row in the table, though, you might consider instead creating a new table containing all the values you need:
create table person2
( person_id, favorite_color, color_confidence )
pctfree 0 compress
as
select p.person_id, nvl(t.favorite_color,'U'), nvl(t.color_confidence,0)
from person p
left join temp_color_confidence t
on t.person_id = p.person_id;
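The create-a-new-table variant can be sketched in SQLite from Python, with COALESCE standing in for Oracle's NVL (the PCTFREE/COMPRESS storage clauses have no equivalent here, and both fallback values are shown as 'U' for simplicity). Sample rows are invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE person (person_id INTEGER PRIMARY KEY);
CREATE TABLE temp_color_confidence
    (person_id INTEGER PRIMARY KEY, favorite_color TEXT, color_confidence TEXT);
INSERT INTO person VALUES (222), (333), (444);
INSERT INTO temp_color_confidence VALUES (222, 'R', 'H'), (333, 'Y', 'L');

-- person 444 has no match, so both columns fall back to 'U' (unknown);
-- the LEFT JOIN keeps every person row regardless of a match
CREATE TABLE person2 AS
SELECT p.person_id,
       COALESCE(t.favorite_color, 'U')   AS favorite_color,
       COALESCE(t.color_confidence, 'U') AS color_confidence
FROM person p
LEFT JOIN temp_color_confidence t ON t.person_id = p.person_id;
""")
result = con.execute("SELECT * FROM person2 ORDER BY person_id").fetchall()
print(result)  # → [(222, 'R', 'H'), (333, 'Y', 'L'), (444, 'U', 'U')]
```

This also answers the "remaining 50 million" part of the question in the same pass: unmatched people get the unknown marker instead of staying NULL.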

SQL: querying Ids of a table inner joined with a 'list' table

In Sqlite I have the following tables:
CREATE TABLE "Person" (Id INTEGER PRIMARY KEY,
"Age" INTEGER, "Email" TEXT)
CREATE TABLE "_Person_Name" (Id INTEGER PRIMARY KEY,
Owner INTEGER NOT NULL, Name TEXT,
FOREIGN KEY(Owner) REFERENCES Person(Id))
where the second table represents a list of strings for the name of a Person.
I would like to query Person.Id by matching names such that several conditions may match either in the same row or in different rows of _Person_Name.Name. For example, suppose Person.Id 1 has two associated rows with _Person_Name.Name "John" and "Smith", both with _Person_Name.Owner = 1. Then I'd like a query that returns exactly this Person.Id 1 based on searching for "John" and for "Smith". "John Wilson" or "Jonas Smith" should not be returned, but "John Theodore Smith" should be.
I tried the following:
SELECT Person.Id FROM Person
INNER JOIN _Person_Name ON Person.Id = _Person_Name.Owner
WHERE (Name LIKE 'John') AND (Name LIKE 'Smith');
But this doesn't work. It finds the person with each of the conditions separately, but the conjunction of both seems to apply to the same row only, so nothing is returned.
How can I search for both conditions such that they must both apply to the same person Id, but may match in different rows of the list table?
Edit: Here is an example of the schema with data. It's just an example, this is for an automated tool that deals with arbitrary schemas and associated 'list' tables.
Table Person
Id Age Email
==================
1 30 john#test.com
2 28 lucie#gmail.com
3 47 bob#gmail.com
Table _Person_Names
Id Name Owner (Foreign Key references Person.Id)
1 John 1
2 C. 1
3 Smith 1
4 Lucie 2
5 Smith 2
6 Bob 3
7 Smith 3
The query should return only Id 1, because only Person.Id 1 has both "John" and "Smith" in the table _Person_Names.
The problem is finding whether there are two rows inside _Person_Names containing the values John and Smith in the column Name and having the same value in the column Owner.
This has nothing to do with the table Person.
If two such rows can be found, then the value in the column Owner is the Id in the table Person, right?
Check this code:
SELECT Owner FROM
    (SELECT pn.Owner AS Owner, pn.Name AS Name1, p.Name AS Name2
     FROM _Person_Names AS pn
     INNER JOIN _Person_Names AS p
         ON (pn.Owner = p.Owner) AND (pn.Name <> p.Name))
WHERE (Name1 = 'John') AND (Name2 = 'Smith')
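Here is a runnable SQLite check of the self-join idea above, against the question's sample data (note the ON condition must compare pn.Name <> p.Name directly, since the column aliases Name1/Name2 are not visible inside the ON clause):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE _Person_Names (Id INTEGER PRIMARY KEY,
                            Owner INTEGER NOT NULL, Name TEXT);
INSERT INTO _Person_Names (Owner, Name) VALUES
    (1, 'John'), (1, 'C.'), (1, 'Smith'),
    (2, 'Lucie'), (2, 'Smith'),
    (3, 'Bob'), (3, 'Smith');
""")

# Self-join pairs up distinct name rows belonging to the same Owner,
# then the outer WHERE keeps only owners having both target names.
matches = con.execute("""
    SELECT Owner FROM
        (SELECT pn.Owner AS Owner, pn.Name AS Name1, p.Name AS Name2
         FROM _Person_Names AS pn
         INNER JOIN _Person_Names AS p
             ON (pn.Owner = p.Owner) AND (pn.Name <> p.Name))
    WHERE (Name1 = 'John') AND (Name2 = 'Smith')
""").fetchall()
print(matches)  # → [(1,)]  only Owner 1 has both "John" and "Smith"
```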
If the first, middle, and last names were all concatenated in a single column, you could also try modifying your WHERE clause to:
WHERE Name LIKE 'John % Smith'
Same result as forpas's and Namandeep_Kaur's methods, just less bulky.
Using an AND operator in the WHERE clause does not retrieve the correct result; you should use an OR operator.
select Person.ID, Dusra_Table.Name
from Person, Dusra_Table
where Person.ID = Dusra_Table.Owner
  and (Dusra_Table.Name = 'John' OR Dusra_Table.Name = 'Smith')
  and Person.ID = ANY (select Dusra_Table.Owner from Dusra_Table where Name = 'John');
(Dusra_Table is the _Person_Names table.)

Oracle SQL statement for a one-to-many relationship, appending the multiple records (fields) into one string

I am writing a SQL statement for Oracle where there is a one to many relationship between two tables. The table Person has a foreign key to table Purchase which has a Purchase Description field.
I need to write a SELECT query that will take all the purchase records/rows and append them to one another, like so:
Person Table
PersonID PersonName
1 John
Purchases Table
PurchaseId (PK)  PersonID (FK)  PurchaseDescription
1                1              Book
2                1              Clothes
3                1              Bag
4                1              Dinner
So the output of the query would look like this
Output = 1, Book:Bag:Clothes:Dinner
The output will be one row from the one to many relationship where there are separate records for book, bag, clothes, and dinner.
Any help is appreciated. Thanks
To do this, use the LISTAGG function; note that Oracle requires its WITHIN GROUP (ORDER BY ...) clause:
SELECT 'Output = '||CAST(P.PersonID AS VARCHAR(100)),
       LISTAGG(Pur.PurchaseDescription, ':') WITHIN GROUP (ORDER BY Pur.PurchaseDescription)
FROM Person P
LEFT JOIN Purchase Pur ON P.PersonID = Pur.PersonID
GROUP BY P.PersonID
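For a runnable check of the aggregation idea, here is a rough SQLite equivalent using group_concat with ':' as the separator (unlike LISTAGG's WITHIN GROUP clause, group_concat does not guarantee element order). Data is taken from the question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Person (PersonID INTEGER PRIMARY KEY, PersonName TEXT);
CREATE TABLE Purchases (PurchaseId INTEGER PRIMARY KEY, PersonID INTEGER,
                        PurchaseDescription TEXT);
INSERT INTO Person VALUES (1, 'John');
INSERT INTO Purchases VALUES (1, 1, 'Book'), (2, 1, 'Clothes'),
                             (3, 1, 'Bag'), (4, 1, 'Dinner');
""")

# One output row per person, with all purchase descriptions glued together
row = con.execute("""
    SELECT 'Output = ' || p.PersonID,
           group_concat(pur.PurchaseDescription, ':')
    FROM Person p
    LEFT JOIN Purchases pur ON p.PersonID = pur.PersonID
    GROUP BY p.PersonID
""").fetchone()
print(row)
```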

UPDATE query that fixes orphaned records

I have an Access database with two tables related by PK/FK. Unfortunately, the tables have allowed duplicate/redundant records, which has made the database a bit screwy. I am trying to figure out a SQL statement that will fix the problem.
To better explain the problem and goal, I have created example tables to use as reference:
[screenshot: example Student and TestScore tables]
You'll notice there are two tables, a Student table and a TestScore table where StudentID is the PK/FK.
The Student table contains duplicate records for students John, Sally, Tommy, and Suzy. In other words, the Johns with StudentIDs 1 and 5 are the same person, the Sallys with 2 and 6 are the same person, and so on.
The TestScore table relates test scores with a student.
Ignoring how/why the Student table allowed duplicates, the goal I'm trying to accomplish is to update the TestScore table so that it replaces the StudentIDs that have been disabled with the corresponding enabled StudentID. So all StudentIDs of 1 (John) will be updated to 5, all StudentIDs of 2 (Sally) will be updated to 6, and so on. Here's the resulting TestScore table I'm shooting for (notice there is no longer any reference to the disabled StudentIDs 1-4):
[screenshot: desired TestScore table after the fix]
Can you think of a query (compatible with MS Access's JET Engine) that can accomplish this goal? Or, maybe, you can offer some tips/perspectives that will point me in the right direction.
Thanks.
The only way to do this is through a series of queries and temporary tables.
First, I would create the following Make Table query that you would use to create a mapping of the bad StudentID to correct StudentID.
Select S1.StudentId As NewStudentId, S2.StudentId As OldStudentId
Into zzStudentMap
From Student As S1
Inner Join Student As S2
On S2.Name = S1.Name
Where S1.Disabled = False
And S2.StudentId <> S1.StudentId
And S2.Disabled = True
Next, you would use that temporary table to update the TestScore table with the correct StudentID.
Update TestScore
Inner Join zzStudentMap
On zzStudentMap.OldStudentId = TestScore.StudentId
Set StudentId = zzStudentMap.NewStudentId
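The two-step fix above can be sketched in SQLite from Python for verification: build the old-to-new StudentID map, then rewrite TestScore. Access allows the UPDATE ... INNER JOIN shown above directly; SQLite needs a correlated subquery instead, and Disabled is modeled as 0/1 here. Sample data mirrors the question's description.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Student (StudentId INTEGER PRIMARY KEY, Name TEXT, Disabled INTEGER);
CREATE TABLE TestScore (StudentId INTEGER, Score INTEGER);
-- ids 1 and 2 are disabled twins of ids 5 and 6
INSERT INTO Student VALUES (1, 'John', 1), (5, 'John', 0),
                           (2, 'Sally', 1), (6, 'Sally', 0);
INSERT INTO TestScore VALUES (1, 90), (2, 85), (5, 70);

-- step 1: map each disabled id to its enabled twin by name
CREATE TABLE zzStudentMap AS
SELECT S1.StudentId AS NewStudentId, S2.StudentId AS OldStudentId
FROM Student AS S1
INNER JOIN Student AS S2 ON S2.Name = S1.Name
WHERE S1.Disabled = 0
  AND S2.Disabled = 1
  AND S2.StudentId <> S1.StudentId;

-- step 2: rewrite TestScore rows that point at a disabled id
UPDATE TestScore
SET StudentId = (SELECT NewStudentId FROM zzStudentMap
                 WHERE OldStudentId = TestScore.StudentId)
WHERE StudentId IN (SELECT OldStudentId FROM zzStudentMap);
""")
scores = con.execute(
    "SELECT StudentId, Score FROM TestScore ORDER BY Score"
).fetchall()
print(scores)  # → [(5, 70), (6, 85), (5, 90)]  no disabled ids remain
```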
The most common technique to identify duplicates in a table is to group by the fields that represent duplicate records:
ID FIRST_NAME LAST_NAME
1 Brian Smith
3 George Smith
25 Brian Smith
In this case we want to remove one of the Brian Smith records or, in your case, update the ID field so they both have the value 25 or 1 (it's completely arbitrary which one you use).
SELECT MIN(id), first_name, last_name
FROM example
GROUP BY first_name, last_name
Using min on ID will return:
ID FIRST_NAME LAST_NAME
1 Brian Smith
3 George Smith
If you use max you would get
ID FIRST_NAME LAST_NAME
25 Brian Smith
3 George Smith
I usually use this technique to delete the duplicates rather than update them:
DELETE FROM example
WHERE id NOT IN (SELECT MAX(id)
                 FROM example
                 GROUP BY first_name, last_name)
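The keep-the-max-id delete runs unchanged in SQLite, so it is easy to verify against the example data above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE example (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
INSERT INTO example VALUES (1, 'Brian', 'Smith'),
                           (3, 'George', 'Smith'),
                           (25, 'Brian', 'Smith');

-- keep only the row with the highest id per (first_name, last_name) group
DELETE FROM example
WHERE id NOT IN (SELECT MAX(id)
                 FROM example
                 GROUP BY first_name, last_name);
""")
remaining = con.execute(
    "SELECT id, first_name FROM example ORDER BY id"
).fetchall()
print(remaining)  # → [(3, 'George'), (25, 'Brian')]  duplicate id 1 is gone
```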