Query XML columns over multiple tables in SQL Server database - sql

I have a database which has two tables, namely:
Boats
user_id|boat_id|sails
1 1 <sails><id135986><sailinfo>A2</sailinfo><sailtype>Main</sailtype></id135986><id185764><sailinfo>#3</sailinfo><sailtype>Jib</sailtype></id185764></sails>
1 2 <sails><id10230><sailinfo>A2</sailinfo><sailinfo>Main</sailinfo></id10230><id20000><sailinfo>#5</sailinfo><sailinfo>Genoa</sailinfo></id20000></sails>
2 3 <sails><id43567><sailinfo>A2</sailinfo><sailinfo>Main</sailinfo></id43567><id24503><sailinfo>#5</sailinfo><sailinfo>Genoa</sailinfo></id24503></sails>
Records
user_id|boat_id|location |sails
1 1 San Francisco <sails><id135986>id135986</id135986><id185764>id185764</id185764></sails>
1 2 Chicago <sails><id10230>id10230</id10230></sails>
1 2 Chicago <sails><id20000>id20000</id20000></sails>
1 2 New York <sails><id10230>id10230</id10230><id20000>id20000</id20000></sails>
2 3 Bermuda <sails><id43567>id43567</id43567></sails>
The idea behind this structure is that if the records "sailinfo" and "sailtype" are updated in the table Boats, the Records table is unaffected because it is linked to the other one by the id tag.
Now, I have an application where the user can choose a sailinfo (e.g. A2), and based on this input a query should be generated to retrieve the locations where that particular sailinfo has been used. The results should be displayed in tabular format, where the first column ("Location") contains the relevant locations and the second column has the "sailtype" associated with "A2" as its header and A2 as its entry. For instance, suppose the user with user_id=2 inputs "A2" for boat_id=2, this is what should be returned:
location | Jib
Chicago A2
New York A2
I have tried using the following SQL statement but it didn't work:
DECLARE @boat_id VARCHAR(50);
SET @boat_id = (SELECT Boats.sails.value('local-name(/sails[1]/*[1])','nvarchar(50)')
FROM Boats
WHERE boat_id = 88
AND Boats.sails.value('(/sails/*/sailinfo)[1]', 'nvarchar(50)') = 'A2');
DECLARE @record_id varchar(50);
SET @record_id = (SELECT Records.sails.value('local-name(/sails[1]/*[1])', 'nvarchar(50)')
FROM Records
WHERE boat_id = 2
AND Records.sails.value('local-name(/sails[1]/*[1])', 'nvarchar(50)') = @boat_id );
SELECT
[location], Boats.sails.value('(/sails/id10230/sailinfo)[1]', 'nvarchar(50)') AS 'Jib'
FROM
Boats
FULL JOIN
Records ON Records.sails.value('local-name(/sails[1]/*[1])', 'nvarchar(50)') = @boat_id
WHERE
Boats.boat_id = 2;
Of course, since I am selecting from the Boats table, the above returns NULL for "location" and just a single row for "Jib".
I hope the above description is clear enough.
Your help would be greatly appreciated!

That statement does not work, first because your data has no row with 'boat_id = 88', and second because it has no 'id28108' element.
The code below returns what you want:
SELECT
[location], Boats.sails.value('(/sails/id10230/sailinfo)[1]', 'nvarchar(50)') AS 'Jib'
FROM
#tbl_Boats as Boats
FULL JOIN
#tbl_Records as Records
ON Records.sails.value('local-name(/sails[1]/*[1])', 'nvarchar(50)')
= Boats.sails.value('local-name(/sails[1]/*[1])','nvarchar(50)')
WHERE Records.sails.value('(/sails/id10230)[1]', 'nvarchar(50)') is not null;
P.S. The design itself looks questionable. Try to store some of these values in ordinary columns, outside of the XML. That will be much faster and easier to understand and troubleshoot.
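The two-step lookup the question describes (find the id tags whose sailinfo matches, then find the records that contain those tags) can be sketched outside the database. This is only an illustration in Python using the standard library's ElementTree, not a replacement for the T-SQL above. The rows are the boat_id = 2 sample data, with one assumption: the second child of each id element is taken to be <sailtype>, as in the boat_id = 1 row. The function name locations_for_sail is made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Sample rows for boat_id = 2, taken from the question.
# Assumption: the second child of each id element is <sailtype>,
# matching the structure of the boat_id = 1 row.
boats = [
    (1, 2, "<sails>"
           "<id10230><sailinfo>A2</sailinfo><sailtype>Main</sailtype></id10230>"
           "<id20000><sailinfo>#5</sailinfo><sailtype>Genoa</sailtype></id20000>"
           "</sails>"),
]
records = [
    (1, 2, "Chicago", "<sails><id10230>id10230</id10230></sails>"),
    (1, 2, "Chicago", "<sails><id20000>id20000</id20000></sails>"),
    (1, 2, "New York",
     "<sails><id10230>id10230</id10230><id20000>id20000</id20000></sails>"),
]


def locations_for_sail(boat_id, sailinfo):
    """Return (location, sailinfo, sailtype) for each record that used the sail."""
    # Step 1: collect the id tags (and their sail types) whose sailinfo matches.
    wanted = {}
    for _user_id, b_id, doc in boats:
        if b_id != boat_id:
            continue
        for sail in ET.fromstring(doc):
            if sail.findtext("sailinfo") == sailinfo:
                wanted[sail.tag] = sail.findtext("sailtype")

    # Step 2: keep the records whose <sails> element contains one of those tags.
    result = []
    for _user_id, b_id, location, doc in records:
        if b_id != boat_id:
            continue
        for sail in ET.fromstring(doc):
            if sail.tag in wanted:
                result.append((location, sailinfo, wanted[sail.tag]))
    return result


print(locations_for_sail(2, "A2"))
```

Because the match is made on the id tag rather than on the first child only, a record that lists several sails (the New York row) is still found.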

Related

Using TOP 1 (or CROSS APPLY) within multiple joins

I've reviewed multiple Q&A involving TOP 1 and CROSS APPLY (including the very informative 2043259), but I still can't figure out how to solve my issue. If I had a single join I'd be fine, but fitting TOP 1 into the middle of a chain of joins has stumped me.
I have four tables and one of the tables contains multiple matches when joining due to a previous bug (since fixed) that created new records in the table instead of updating existing records. In all cases, where there are multiple records, it is the top-most record that I want to use in one of my joins. I don't have access to the table to clean up the extraneous data, so I just have to deal with it.
The purpose of my query is to return a list of all "Buildings" managed by a particular person (the user chooses a person's name and gets back a list of all buildings managed by that person). My tables are:
Building (a list of all buildings):
BuildingId BuildingName
1 Oak Tree Lane
2 Lighthoue Court
3 Fairview Lane
4 Starview Heights
WebBuildingMapping (mapping of BuildingId from the Building table, which is part of an old system, to the corresponding WebBuildingId in another piece of software):
BuildingId WebBuildingId
1 201
2 202
3 203
4 204
WebBuildingContacts (list of ContactId for the building manager of each building). This is the table with duplicate values, where I want to choose the TOP 1. In the sample data below, there are two references to WebBuildingId = 203 (row 3 and row 5); I only want to use the row 3 data in my join.
Id WebBuildingId ContactId
1 201 1301
2 202 1301
3 203 1303
4 204 1302
5 203 1302
Contacts (list of ContactIds and corresponding property manager Names)
ContactId FullName
1301 John
1302 Mike
1303 Judy
As noted, in the example above, the table WebBuildingContact has two entries for the building with WebBuildingId = 203 (row 3 and row 5). In my query, I want to select the top one (row 3).
My original query for a list of buildings managed by 'Mike' is:
SELECT BuildingName
FROM Building bu
JOIN WebBuildingMapping wbm ON wbm.BuildingId = bu.BuildingId
JOIN WebBuildingContact wbc ON wbc.WebBuildingId = wbm.WebBuildingId
JOIN Contacts co ON co.ContactId = wbc.ContactId
WHERE co.FullName = 'Mike'
This returns 'Fairview Lane' and 'Starview Heights'; however, Judy manages 'Fairview Lane' (she's the top entry in the WebBuildingContacts table). To modify the query and eliminate row 5 in WebBuildingContacts from the join, I did the following:
SELECT BuildingName
FROM Building bu
JOIN WebBuildingMapping wbm ON wbm.BuildingId = bu.BuildingId
JOIN WebBuildingContact wbc ON wbc.WebBuildingId =
(
SELECT TOP 1 WebBuildingId
FROM WebBuildingContact
WHERE WebBuildingContact.WebBuildingId = wbm.WebBuildingId
)
JOIN Contacts co ON co.ContactId = wbc.ContactId
WHERE co.FullName = 'Mike'
When I try this, however, I get the same result set (i.e. it returns 'Mike' as manager for 2 buildings). I've also made various attempts to use CROSS APPLY, but I just end up with 'The multi-part identifier could not be bound', which is a whole other rabbit hole to go down.
You could try this:
SELECT bu2.BuildingName
FROM building bu2
WHERE bu2.BuildingId IN
(SELECT MAX(bu.BuildingId)
FROM Building bu
JOIN WebBuildingMapping wbm ON wbm.BuildingId = bu.BuildingId
JOIN WebBuildingContact wbc ON wbc.WebBuildingId = wbm.WebBuildingId
JOIN Contacts co ON co.ContactId = wbc.ContactId
WHERE co.FullName = 'Mike'
);
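Two observations may help here. The TOP 1 subquery in the question returns WebBuildingId, which always equals wbm.WebBuildingId, so it filters nothing. And a MAX(BuildingId) filter returns at most one building, which would under-report a manager such as John who has two. A minimal sketch of one alternative follows, assuming "top-most" means the lowest Id per WebBuildingId: join only the contact row whose Id is the minimum for its building. It is run through Python's sqlite3 module purely so it can be executed as-is; in T-SQL the same idea is usually written with ROW_NUMBER() or with CROSS APPLY (SELECT TOP 1 ... ORDER BY Id). The helper name buildings_managed_by is made up for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Building (BuildingId INTEGER, BuildingName TEXT);
CREATE TABLE WebBuildingMapping (BuildingId INTEGER, WebBuildingId INTEGER);
CREATE TABLE WebBuildingContact (Id INTEGER, WebBuildingId INTEGER, ContactId INTEGER);
CREATE TABLE Contacts (ContactId INTEGER, FullName TEXT);

-- Sample data copied from the question.
INSERT INTO Building VALUES (1, 'Oak Tree Lane'), (2, 'Lighthoue Court'),
                            (3, 'Fairview Lane'), (4, 'Starview Heights');
INSERT INTO WebBuildingMapping VALUES (1, 201), (2, 202), (3, 203), (4, 204);
INSERT INTO WebBuildingContact VALUES (1, 201, 1301), (2, 202, 1301), (3, 203, 1303),
                                      (4, 204, 1302), (5, 203, 1302);
INSERT INTO Contacts VALUES (1301, 'John'), (1302, 'Mike'), (1303, 'Judy');
""")

# Join each building to exactly one contact row: the one with the lowest Id
# for that WebBuildingId (assumption: lowest Id is the "top-most" record).
QUERY = """
SELECT bu.BuildingName
FROM Building bu
JOIN WebBuildingMapping wbm ON wbm.BuildingId = bu.BuildingId
JOIN WebBuildingContact wbc
  ON wbc.Id = (SELECT MIN(x.Id)
               FROM WebBuildingContact x
               WHERE x.WebBuildingId = wbm.WebBuildingId)
JOIN Contacts co ON co.ContactId = wbc.ContactId
WHERE co.FullName = ?
ORDER BY bu.BuildingId
"""


def buildings_managed_by(full_name):
    """Return the building names managed by the given contact."""
    return [row[0] for row in conn.execute(QUERY, (full_name,))]


for name in ("Mike", "Judy", "John"):
    print(name, buildings_managed_by(name))
```

The key difference from the attempt in the question is that the subquery selects the contact row's Id, not the WebBuildingId, so the duplicate row 5 never takes part in the join.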

SQL Join correct? Pulling more than I want

I have a system that stores people applying for jobs. They are in the warehouse. When they apply for a job, I create a record for them in the jobTracking table. That is simple. The issue is that when someone applies for more than one position, they get more than one record in jobTracking. My problem is writing a query to add people to a job internally based on a WHERE clause. Let's say I want to add people to jobID=56 where their degree = "MD" and they haven't already applied. The list of people returned by the query will contain MDs who have already applied for 56 IF they have another record for another job. How can I tell the query to ignore that applicantID in all records if one is a match? The tables and the query that returns the incorrect records are below. applicantID=10 will appear in my query below because he also has a record for job 46.
SELECT applicantWarehouse.first
, applicantWarehouse.last
, applicantWarehouse.title
, applicantWarehouse.ID
, jobTracking.applicantID
, jobTracking.jobID
FROM jobTracking INNER JOIN applicantWarehouse ON jobTracking.applicantID = applicantWarehouse.ID
WHERE Degree="MD"
AND jobTracking.jobID !=56
applicantWarehouse table
ID | First | Last | Degree
job table
jobID | jobTitle
jobTracking table
ID | applicantID | jobID
1 10 56
2 10 46
In your example you want to pull all people who have an MD degree AND have not yet applied to the jobID of 56. If that is what you need, you can do this using NOT EXISTS.
SELECT applicantWarehouse.first
, applicantWarehouse.last
, applicantWarehouse.title
, applicantWarehouse.ID
, jobTracking.applicantID
, jobTracking.jobID
FROM jobTracking INNER JOIN applicantWarehouse ON jobTracking.applicantID = applicantWarehouse.ID
WHERE Degree="MD"
AND NOT EXISTS (
SELECT 1
FROM jobTracking j
WHERE j.jobID = 56
AND j.applicantID = applicantWarehouse.ID )
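A small runnable sketch of the NOT EXISTS idea, using Python's sqlite3 module as a stand-in database. The two jobTracking rows for applicant 10 come from the question; applicants 11 and 12, all names, and the helper eligible_for are made-up illustration data. One design choice differs from the query above and is an assumption about the goal: the sketch selects from applicantWarehouse alone, so each candidate appears once and people with no jobTracking rows at all are still listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE applicantWarehouse (ID INTEGER PRIMARY KEY, "First" TEXT, "Last" TEXT, Degree TEXT);
CREATE TABLE jobTracking (ID INTEGER PRIMARY KEY, applicantID INTEGER, jobID INTEGER);

-- Applicant 10 and his two jobTracking rows follow the question;
-- applicants 11 and 12 and all names are hypothetical.
INSERT INTO applicantWarehouse VALUES (10, 'A', 'Ten', 'MD'),
                                      (11, 'B', 'Eleven', 'MD'),
                                      (12, 'C', 'Twelve', 'MD');
INSERT INTO jobTracking VALUES (1, 10, 56), (2, 10, 46), (3, 11, 46);
""")

# One row per applicant: MDs for whom no jobTracking row exists for the target job.
QUERY = """
SELECT a.ID
FROM applicantWarehouse a
WHERE a.Degree = 'MD'
  AND NOT EXISTS (SELECT 1
                  FROM jobTracking j
                  WHERE j.jobID = ?
                    AND j.applicantID = a.ID)
ORDER BY a.ID
"""


def eligible_for(job_id):
    """Return the IDs of MD applicants who have not applied to job_id."""
    return [row[0] for row in conn.execute(QUERY, (job_id,))]


print(eligible_for(56))
```

Applicant 10 is excluded even though he also has a row for job 46, because the subquery checks all of his jobTracking rows rather than filtering them one at a time.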

How to load grouped data with SSIS

I have a tricky flat file data source. The data is grouped, like this:
Country City
U.S. New York
Washington
Baltimore
Canada Toronto
Vancouver
But I want it to be this format when it's loaded in to the database:
Country City
U.S. New York
U.S. Washington
U.S. Baltimore
Canada Toronto
Canada Vancouver
Has anyone met such a problem before? Any idea how to deal with it?
The only idea I have now is to use a cursor, but it is just too slow.
Thank you!
The answer by cha will work, but here is another in case you need to do it in SSIS without temporary/staging tables:
You can run your dataflow through a Script Transformation that uses a DataFlow-level variable. As each row comes in the script checks the value of the Country column.
If it has a non-blank value, then populate the variable with that value, and pass it along in the dataflow.
If Country has a blank value, then overwrite it with the value of the variable, which will be last non-blank Country value you got.
EDIT: I looked up your error message and learned something new about Script Components (the Data Flow tool, as opposed to Script Tasks, the Control Flow tool):
The collection of ReadWriteVariables is only available in the
PostExecute method to maximize performance and minimize the risk of
locking conflicts. Therefore you cannot directly increment the value
of a package variable as you process each row of data. Increment the
value of a local variable instead, and set the value of the package
variable to the value of the local variable in the PostExecute method
after all data has been processed. You can also use the
VariableDispenser property to work around this limitation, as
described later in this topic. However, writing directly to a package
variable as each row is processed will negatively impact performance
and increase the risk of locking conflicts.
That comes from this MSDN article, which also has more information about the VariableDispenser workaround if you want to go that route. Apparently I misled you above when I said you can set the value of the package variable in the script. You have to use a variable that is local to the script, and then copy it to the package variable in the PostExecute method. I can't tell from the article whether that means you will not be able to read the variable in the script; if that's the case, the VariableDispenser would be the only option. Or I suppose you could create another variable that the script has read-only access to, and set its value to an expression so that it always has the value of the read-write variable. That might work.
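The row-by-row logic itself needs nothing more than a script-local variable. The following is a language-neutral illustration in Python, not SSIS Script Component code (which would be C# or VB.NET); the function name fill_down_country and the (country, city) tuple layout are assumptions for this sketch.

```python
def fill_down_country(rows):
    """Replace blank Country values with the last non-blank Country seen."""
    last_country = None  # plays the role of the script-local variable
    filled = []
    for country, city in rows:
        if country and country.strip():
            # Non-blank: remember it for the rows that follow.
            last_country = country.strip()
        # Blank or not, emit the most recent non-blank country.
        filled.append((last_country, city))
    return filled


grouped = [
    ("U.S.", "New York"),
    ("", "Washington"),
    ("", "Baltimore"),
    ("Canada", "Toronto"),
    ("   ", "Vancouver"),
]
for row in fill_down_country(grouped):
    print(row)
```

Because the variable never leaves the script, the restriction on package variables quoted above does not come into play.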
Yes, it is possible. First you need to load the data to a table with an IDENTITY column:
-- drop table #t
CREATE TABLE #t (id INTEGER IDENTITY PRIMARY KEY,
Country VARCHAR(20),
City VARCHAR(20))
INSERT INTO #t(Country, City)
SELECT a.Country, a.City
FROM OPENROWSET( BULK 'c:\import.txt',
FORMATFILE = 'c:\format.fmt',
FIRSTROW = 2) AS a;
select * from #t
The result will be:
id Country City
----------- -------------------- --------------------
1 U.S. New York
2 Washington
3 Baltimore
4 Canada Toronto
5 Vancouver
And now with a bit of recursive CTE magic you can populate the missing details:
;WITH a as(
SELECT Country
,City
,ID
FROM #t WHERE ID = 1
UNION ALL
SELECT COALESCE(NULLIF(LTrim(#t.Country), ''),a.Country)
,#t.City
,#t.ID
FROM a INNER JOIN #t ON a.ID+1 = #t.ID
)
SELECT * FROM a
OPTION (MAXRECURSION 0)
Result:
Country City ID
-------------------- -------------------- -----------
U.S. New York 1
U.S. Washington 2
U.S. Baltimore 3
Canada Toronto 4
Canada Vancouver 5
Update:
As Tab Alleman suggested below the same result can be achieved without the recursive query:
SELECT ID
, COALESCE(NULLIF(LTrim(a.Country), ''), (SELECT TOP 1 Country FROM #t t WHERE t.ID < a.ID AND LTrim(t.Country) <> '' ORDER BY t.ID DESC))
, City
FROM #t a
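To try the non-recursive version without a SQL Server instance, here is a sketch that runs the same idea through Python's sqlite3 module. It is a translation under stated assumptions, not the original T-SQL: the temp table #t becomes a plain table named staging, LTrim becomes TRIM, and SELECT TOP 1 ... ORDER BY ... DESC becomes ORDER BY ... DESC LIMIT 1. Blank countries are stored as empty or whitespace-only strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging (id INTEGER PRIMARY KEY AUTOINCREMENT,
                      Country TEXT,
                      City TEXT);
INSERT INTO staging (Country, City) VALUES
    ('U.S.',   'New York'),
    ('',       'Washington'),
    ('   ',    'Baltimore'),
    ('Canada', 'Toronto'),
    ('',       'Vancouver');
""")

# For each blank Country, look back to the nearest earlier row (by id)
# that has a non-blank Country and borrow its value.
QUERY = """
SELECT a.id,
       COALESCE(NULLIF(TRIM(a.Country), ''),
                (SELECT t.Country
                 FROM staging t
                 WHERE t.id < a.id AND TRIM(t.Country) <> ''
                 ORDER BY t.id DESC
                 LIMIT 1)) AS Country,
       a.City
FROM staging a
ORDER BY a.id
"""

filled_rows = conn.execute(QUERY).fetchall()
for row in filled_rows:
    print(row)
```

The approach relies on the id column reflecting the order of lines in the file, which is why the data is loaded into a table with an identity column first.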
BTW, the format file for your input data is this (if you want to try the scripts save the input data as c:\import.txt and the format file below as c:\format.fmt):
9.0
2
1 SQLCHAR 0 11 "" 1 Country SQL_Latin1_General_CP1_CI_AS
2 SQLCHAR 0 100 "\r\n" 2 City SQL_Latin1_General_CP1_CI_AS

Select multiple results from sub query into single row (as array datatype)

I'm trying to solve a small problem with a SQL query in an Oracle database. Let's assume I have these tables:
One table that holds information about cars:
tblCars
ID Model Color
--------------------
1 Volvo Red
2 BMW Blue
3 BMW Green
And another one containing information about drivers:
tblDrivers
ID fID_tblCars Name
---------------------------
1 1 George
2 1 Mike
3 2 Jason
4 2 Paul
5 2 William
6 3 Steve
Now, let's pretend that to find out the popularity of the cars, I want to create reports that contain the data about the cars and the people that are driving them (which seems a very reasonable thing one would accomplish with a database).
This "ReportObject" would have a string for the model, a string for the color and an array (or a list) of strings for the drivers.
Currently, I do this with two queries, in the first I select the cars
SELECT ID, Model, Color FROM tblCars
and create a report object for each result.
Then, I would take each result and get the drivers for each specific car
SELECT Name FROM tblDrivers WHERE fID_tblCars = ResultObject.ID
Basically, step one gives me a resulting data set that looks like this:
Result
------------------------------------------
ColumnID ColumnModel ColumnColor
Type Integer Type String Type String
and now, if I will have more cars in the future, I will have to make a lot of additional queries, one for each row in the resulting table.
When I try this:
SELECT Model, Color, (SELECT Name FROM tblDrivers WHERE tblDrivers.fID_tblCars = tblCars.ID) as Name FROM tblCars
I get some error message telling me that one result in the row contains multiple elements (which is what I want!).
I want the result to look like this:
Result
--------------------------------------------------------
ColumnID ColumnModel ColumnColor ColumnName
Type Integer Type String Type String Type Array
So when I build my report object, I could do something like this:
foreach (var Row in Results)
{
ReportObject.Model = Row.Model;
ReportObject.Color = Row.Color;
foreach (string Driver in Row.Name)
{
ReportObject.Drivers.Add(Driver);
}
}
Am I completely missing my basics here or do I have to split this up in multiple queries?
Thanks!
This works in Oracle. In the SQL Fiddle example I couldn't get the IDENTITY or the PRIMARY KEYS to work when creating the table (never used Oracle SQL before)
SELECT c.id,
c.model,
c.color,
LISTAGG(d.name, ',') WITHIN GROUP (ORDER BY d.name) AS "Drivers"
FROM tblCars c
JOIN tblDrivers d
ON c.id = d.fID_TblCars
GROUP BY c.id,
c.model,
c.color
ORDER BY c.Id
SQL Fiddle Example
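On the client side, the aggregated string still has to be split back into a list to fill the report object. A minimal end-to-end sketch follows, using Python's sqlite3 module as a stand-in for Oracle: GROUP_CONCAT here plays the role of LISTAGG (SQLite has no LISTAGG), and because SQLite does not guarantee the concatenation order, the names are sorted after splitting. The dictionary layout of the report objects is an assumption, and splitting on a comma only works if driver names never contain one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblCars (ID INTEGER PRIMARY KEY, Model TEXT, Color TEXT);
CREATE TABLE tblDrivers (ID INTEGER PRIMARY KEY, fID_tblCars INTEGER, Name TEXT);

-- Sample data copied from the question.
INSERT INTO tblCars VALUES (1, 'Volvo', 'Red'), (2, 'BMW', 'Blue'), (3, 'BMW', 'Green');
INSERT INTO tblDrivers VALUES (1, 1, 'George'), (2, 1, 'Mike'), (3, 2, 'Jason'),
                              (4, 2, 'Paul'), (5, 2, 'William'), (6, 3, 'Steve');
""")

# One row per car, with all driver names collapsed into a single string.
QUERY = """
SELECT c.ID, c.Model, c.Color, GROUP_CONCAT(d.Name, ',') AS Drivers
FROM tblCars c
JOIN tblDrivers d ON c.ID = d.fID_tblCars
GROUP BY c.ID, c.Model, c.Color
ORDER BY c.ID
"""

# Build one report object per row, turning the string back into a list.
reports = [
    {"Model": model, "Color": color, "Drivers": sorted(names.split(","))}
    for _car_id, model, color, names in conn.execute(QUERY)
]

for report in reports:
    print(report)
```

This keeps the work to a single query regardless of how many cars there are. Note that the inner JOIN drops cars that have no drivers; a LEFT JOIN would keep them, with a NULL driver list to handle.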

Query to JOIN / *overwrite* field

I'm not sure if I'm using the correct terminology.
SELECT movies.*, actors.`First Name`, actors.`Last Name`
From movies
Inner Join actors on movies.`actor1` Where movies.`actor1` = actors.`indexActors`;
#Inner Join actors on movies.`actor2` Where movies.`actor2` = actors.`indexActors`;
I have the second join commented out. Each one works individually, and I'm wondering how to combine them.
Second, when I execute the query, I get these results:
ID Title Runtime Rating Actor1 Actor2 First Name Last Name
1 Se7en 127 R 1 2 Morgan Freeman
2 Bruce Almighty 101 PG-13 1 3 Morgan Freeman
3 Mr. Popper's Penguins 94 PG 3 4 Jim Carrey
4 Superbad 113 R 4 5 Emma Stone
5 Crazy, Stupid, Love. 118 PG-13 4 Null Emma Stone
Is there a way to add the results from the 2nd join to the rightmost columns?
Also, is it possible to combine the strings/VARCHARs from First Name and Last Name, and then have that value show up under the corresponding Actor Field?
(aka the field under Actor 1 for row 1 would be "Morgan Freeman" instead of "1")
Thanks.
Your SQL is not valid, but you can achieve your goal by joining to the same table twice, with different aliases. This sort of thing:
select m.*, a1.`First Name`, a1.`Last Name`, a2.`First Name`, a2.`Last Name`
from movies m
join actors a1 on m.`actor1` = a1.`indexActors`
left join actors a2 on m.`actor2` = a2.`indexActors`
As far as joining first and last names in a single field, most databases have a way to concatenate strings, but they are not all the same. You'll have to specify your db engine.
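A runnable sketch of the double join with concatenated names, using Python's sqlite3 module as the engine. The rows are a subset of the question's output, limited to movies whose actors are named there. The || operator is SQLite's (and standard SQL's) string concatenation; MySQL would typically use CONCAT() instead, so treat the concatenation syntax as engine-specific. The LEFT JOIN on the second actor is an assumption made so that movies with a NULL actor2 still appear.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE actors (indexActors INTEGER PRIMARY KEY, "First Name" TEXT, "Last Name" TEXT);
CREATE TABLE movies (ID INTEGER PRIMARY KEY, Title TEXT, actor1 INTEGER, actor2 INTEGER);
""")
# Subset of the rows shown in the question.
conn.executemany(
    "INSERT INTO actors VALUES (?, ?, ?)",
    [(1, "Morgan", "Freeman"), (3, "Jim", "Carrey"), (4, "Emma", "Stone")],
)
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?, ?)",
    [
        (2, "Bruce Almighty", 1, 3),
        (3, "Mr. Popper's Penguins", 3, 4),
        (5, "Crazy, Stupid, Love.", 4, None),
    ],
)

# Join actors twice under different aliases, one per actor column.
QUERY = """
SELECT m.Title,
       a1."First Name" || ' ' || a1."Last Name" AS Actor1,
       a2."First Name" || ' ' || a2."Last Name" AS Actor2
FROM movies m
JOIN actors a1 ON m.actor1 = a1.indexActors
LEFT JOIN actors a2 ON m.actor2 = a2.indexActors
ORDER BY m.ID
"""

cast_rows = conn.execute(QUERY).fetchall()
for row in cast_rows:
    print(row)
```

Each alias acts as an independent copy of the actors table, so the first join resolves actor1 and the second resolves actor2 without interfering with each other.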