Why are query results across linked servers of different versions completely wrong? - sql

I have a SQL Server 2005 running a stored procedure which hits other servers running 2008.
A very straightforward query is returning utterly incorrect results.
SELECT
c.acctno, c.provcode, p.provcode, p.provname, c.poscode AS ChargePOS,
pos.poscode, pos.posdesc
FROM
Server2008.charge_t as c
inner join
Server2008.provcode_t as p on c.provcode = p.provcode
inner join
Server2008.poscode_t as pos on c.poscode = pos.poscode
inner join
Server2008.patdemo_t as pat on c.acctno = pat.acctno
left join
Server2008.billareacode_t as b on c.billingarea = b.billareacode
Where
c.proccode in ('G0438', 'G0439', '99420')
and c.correction = 'N'
and (c.priinscode in ('0001', '001A', '001B')
or c.secinscode in ('0001', '001A', '001B'))
and year(c.dateofservice) = year(getdate())
Note the INNER JOIN from poscode_t to charge_t table (second inner join) where c.poscode = pos.poscode. This is very simple, standard stuff here.
When this is executed on the 2005 server, the results are just wrong. I get the following:
acctno | patlname | patfname | ChargeProv | ProvProv | provname | ChargePOS | poscode | posdesc
---------------------------------------------------------------------------------------------------------------------------------------------------------
1 | person1 | Person1 | 28 | 28 | Doctor28 | 07 | 323 | Site323
2 | person2 | person2 | 24 | 24 | Doctor24 | 07 | 323 | Site323
In both examples, ChargePOS (07) and poscode (323) are clearly not the same, even though the join condition should guarantee that they match.
When I run this query on Server2008 itself, the results are correct. When I run it on a 2012 server, the results are correct. The results are wrong only when I run it on the 2005 server. It makes no difference what version of SSMS I use.
I've broken the query down to run piece by piece adding in the joins one at a time. If I specify an acctno in the WHERE, the results are correct.
Has anyone seen anything like this? It's like the link itself is bad or there's some sort of junk in a hung transaction out there that's messing with things only on this server. Any ideas where to look are helpful.
Thanks for your time.

This is not a solution to the overall issue, but I've discovered a couple of things that affect the results.
Changing the joins to LEFT corrects the problem when running it on the 2005 server.
Adding the hint OPTION ( MERGE JOIN ) corrects it as well.
None of this explains why it runs properly on all other servers but horribly wrong on this one server. Changing the joins to LEFT didn't alter the execution plan's structure but adding the hint did.
We're at the point where we need to bring in an expert, because working around the problem isn't acceptable in this case. I still welcome any ideas about what might be happening here.

Related

Union results either Blank or Unfiltered

We have data that separates Paid and Rejected claims. I need to see results of both and therefore have to do a union.
(Our data is a mess. I am also aliasing for confidentiality/HIPAA compliance. Please try not to get hung up on those parts because I can't change it.)
SELECT CustID, code, date, 'Paid' AS Srce
FROM Paid.Claims
INNER JOIN Paid.Medical
ON Paid.Claims.id = Paid.Medical.id
AND Paid.Claims.blind = Paid.Medical.blind
WHERE Paid.Claims.date BETWEEN '2022-01-01' AND '2022-06-07'
AND Paid.Medical.code IN ('88521','88522','88523','88524','88525')
AND Paid.Claims.custID IN ('N065468','N095843','N001086')
UNION
SELECT CustID, code, date, 'Filter' AS Srce
FROM Rejected.Claims
INNER JOIN Rejected.Medical
ON Rejected.Claims.id = Rejected.Medical.id
AND Rejected.Claims.blind = Rejected.Medical.blind
WHERE Rejected.Claims.date BETWEEN '2022-01-01' AND '2022-06-07'
AND Rejected.Medical.code IN ('88521','88522','88523','88524','88525')
AND Rejected.Claims.custID IN ('N065468','N095843','N001086')
It's based on a query that the person before me made. That one works, but it's also much simpler because it pulls fewer fields from fewer places. My outcomes so far have been:
Leave the WHERE clause out of the Paid data but keep it in the Rejected data, and I get EVERY RESULT. None of the filtering seems to be working.
Include the WHERE clause in both, and I get no results. The filtering still isn't working, but in the opposite direction.
I have also tried
SELECT *
FROM (
everything above with and without filters
) AS results
WHERE <filters same as above>
And results set is empty.
I have tried with and without aliasing with no changes in what's returned.
I'm expecting about 200 results that SHOULD look something like this:
| CustID | code | date | Srce |
| ------- | ----- | ---------- | ------ |
| N065468 | 88522 | 2022-04-04 | Paid |
| N095843 | 88521 | 2022-03-09 | Paid |
| N001086 | 88524 | 2022-05-20 | Filter |
Back to troubleshooting.
It's hard to say without sample data, as someone else has requested. Without any data, I'd recommend running each query individually and seeing what results you get, i.e., if you run just the top half for paid claims, does that give you the rows you're expecting? Then the bottom half? The filtering arguments can be case sensitive, so I'd recommend checking that your medical codes are lower case.
We have many duplicate fields as a result of our flattening process. My formatting may have been correct, but it turns out I was pulling from some of the wrong places. I solved the problem by creating a franken-query from several others made by my predecessor that had a similar union element. Depending on how I was attempting to alias everything, the filters were likely pulling from different fields than the ones I was selecting.
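The advice above (run each half on its own, and make sure every filter names the same table-qualified column that is being selected) can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module; the table names `paid_claims`, `paid_medical`, `rejected_claims`, and `rejected_medical`, and all of the rows, are made up for the example and are not the asker's real schema. The `blind` join column is left out to keep it short.

```python
import sqlite3

# Hypothetical stand-in tables; names and rows are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE paid_claims      (id INTEGER, custID TEXT, date TEXT);
    CREATE TABLE paid_medical     (id INTEGER, code TEXT);
    CREATE TABLE rejected_claims  (id INTEGER, custID TEXT, date TEXT);
    CREATE TABLE rejected_medical (id INTEGER, code TEXT);
    INSERT INTO paid_claims VALUES (1, 'N065468', '2022-04-04'), (2, 'N999999', '2022-04-05');
    INSERT INTO paid_medical VALUES (1, '88522'), (2, '88522');
    INSERT INTO rejected_claims VALUES (7, 'N001086', '2022-05-20'), (8, 'N001086', '2021-01-01');
    INSERT INTO rejected_medical VALUES (7, '88524'), (8, '88524');
""")

# One template for both halves, with every column qualified by a table alias
# so the filters apply to the same fields that are selected.
HALF = """
    SELECT c.custID, m.code, c.date, '{label}' AS Srce
    FROM {claims} AS c
    INNER JOIN {medical} AS m ON c.id = m.id
    WHERE c.date BETWEEN '2022-01-01' AND '2022-06-07'
      AND m.code IN ('88521', '88522', '88523', '88524', '88525')
      AND c.custID IN ('N065468', 'N095843', 'N001086')
"""
paid_sql = HALF.format(label="Paid", claims="paid_claims", medical="paid_medical")
rejected_sql = HALF.format(label="Filter", claims="rejected_claims", medical="rejected_medical")

# Check each half separately before combining them.
paid_rows = conn.execute(paid_sql).fetchall()
rejected_rows = conn.execute(rejected_sql).fetchall()

# Combine only once both halves look right; ORDER BY 3 sorts on the date column.
union_rows = conn.execute(paid_sql + " UNION " + rejected_sql + " ORDER BY 3").fetchall()

print(paid_rows)
print(rejected_rows)
print(union_rows)
```

If either half returns nothing on its own, the problem is in that half's joins or filters rather than in the UNION itself.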

MAX Function Fails in SQL

I'm trying to get the MOST recent date that comes before tom_temp.Begin_Time out of tbl_Trim_history.Comp. The SQL I'm using is:
SELECT
Tom_Temp2.feeder,
Tom_Temp.CauseType,
Tom_Temp.RootCause,
Tom_Temp.Storm_Name_Thunder,
Tom_Temp.DeviceGroup,
tbl_Trim_History.[COMP],
Tom_Temp.[Begin_Time]
FROM Tom_Temp2
LEFT JOIN (Tom_Temp
LEFT JOIN tbl_Trim_History
ON Tom_Temp.feeder = tbl_Trim_History.CIRCUIT_ID)
ON Tom_Temp2.feeder = Tom_Temp.feeder
WHERE (((tbl_Trim_History.[COMP]) < [Tom_Temp].[Begin_Time]));
I'm having a hard time figuring out where I need to put my max() function in this statement in order to make sure I don't get back every single tbl_Trim_history.[COMP] that occurs prior to the tom_temp.Begin_Time date. I only want the most recent date from tbl_Trim_history.[COMP] that occurs BEFORE the tom_temp.begin_Time .... NOT every historical date record.
Any help you guys could give me would be awesome because I keep getting back sets that I can tell are not what I'm looking for / expecting.
Thanks everyone. I appreciate the feedback.
Edit in regard to the responses below:
Due to the character limits, I just edited the master post for you guys.
I can't really post the data as it is somewhat confidential, so the best I can do is give you an example. Also, this is Access, but my background is MySQL. Sorry for the tags; I wasn't sure what was similar, since the Access tag just didn't seem to fit the question.
The Data being received are about 168 records. Someone pointed out that there is an inner join occurring here, but I wanted to indicate I'm actually using 3 different tables.
1 table contains my feeders,
Another contains a list of all outages that I am joining to using all my feeders contained in the first table
Then I have another table that contains all the trim history for each feeder. The outage table is joined to the trim table.
When I run the query above, I get data like this
feeder | comp | Begin_time
___________________________________________
123456 | 10/4/2012 | 3/3/2016 11:26:00AM
123456 | 10/17/2015 | 3/3/2016 11:26:00AM
456789 | 6/28/2008 | 9/20/2013 10:05AM
456789 | 12/1/2012 | 9/20/2013 10:05AM
456789 | 7/3/2013 | 9/20/2013 10:05AM
what I want is data like this:
feeder | comp | Begin_time
___________________________________________
123456 | 10/17/2015 | 3/3/2016 11:26:00AM
456789 | 7/3/2013 | 9/20/2013 10:05AM
where the comp date is the closest date/time occurring BEFORE the Begin_time date.
I tried this query:
SELECT Tom_Temp2.feeder, Tom_Temp.CauseType, Tom_Temp.RootCause, Tom_Temp.Storm_Name_Thunder, Tom_Temp.DeviceGroup, Max(tbl_Trim_History.COMP) AS MaxOfCOMP, Tom_Temp.Begin_Time
FROM Tom_Temp2
LEFT JOIN (Tom_Temp LEFT JOIN tbl_Trim_History ON Tom_Temp.feeder = tbl_Trim_History.CIRCUIT_ID) ON Tom_Temp2.feeder = Tom_Temp.feeder
GROUP BY Tom_Temp2.feeder, Tom_Temp.CauseType, Tom_Temp.RootCause, Tom_Temp.Storm_Name_Thunder, Tom_Temp.DeviceGroup, Tom_Temp.Begin_Time
HAVING (((Max(tbl_Trim_History.COMP))<[Tom_Temp].[Begin_Time]));
But of the 168 records I get back in my first query, I'm only getting back 20 records with this query.
The reason I know this is wrong is because some records are missing between the set of 168 and the set of 20. For example, I'd be missing any records for feeder 456789. However, I know this record should be returned because it's in my table of feeders that should be returned (Tom_Temp2).
After manually deleting unwanted rows of data, I know that I should get a record count of 85. So my most recent attempt to use the Max query is way off.
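One possible way to get the desired shape is sketched below, under stated assumptions. A `HAVING Max(COMP) < Begin_Time` clause takes the maximum over all trim dates first and then throws away the whole group if that maximum falls after the outage, which may be why feeders go missing. Moving the date comparison into the join condition lets `MAX` see only the earlier trims. The sketch uses Python's sqlite3 module rather than Access, ISO-formatted date strings, and invented rows (including one trim after the outage for feeder 456789), so the syntax would need adapting for Access.

```python
import sqlite3

# Table and column names follow the question; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tom_Temp2 (feeder TEXT);
    CREATE TABLE Tom_Temp (feeder TEXT, Begin_Time TEXT);
    CREATE TABLE tbl_Trim_History (CIRCUIT_ID TEXT, COMP TEXT);
    INSERT INTO Tom_Temp2 VALUES ('123456'), ('456789');
    INSERT INTO Tom_Temp VALUES
        ('123456', '2016-03-03 11:26:00'),
        ('456789', '2013-09-20 10:05:00');
    INSERT INTO tbl_Trim_History VALUES
        ('123456', '2012-10-04'), ('123456', '2015-10-17'),
        ('456789', '2008-06-28'), ('456789', '2012-12-01'),
        ('456789', '2013-07-03'),
        ('456789', '2014-01-15');  -- assumed trim AFTER the outage
""")

# The "before Begin_Time" test lives in the ON clause, so later trims never
# reach MAX() and a feeder without earlier trims still appears (with NULL).
rows = conn.execute("""
    SELECT f.feeder, MAX(h.COMP) AS MaxOfCOMP, o.Begin_Time
    FROM Tom_Temp2 AS f
    LEFT JOIN Tom_Temp AS o
           ON o.feeder = f.feeder
    LEFT JOIN tbl_Trim_History AS h
           ON h.CIRCUIT_ID = o.feeder
          AND h.COMP < o.Begin_Time
    GROUP BY f.feeder, o.Begin_Time
    ORDER BY f.feeder
""").fetchall()

for row in rows:
    print(row)
```

The other outage columns (CauseType, RootCause, and so on) would need to be added to both the SELECT list and the GROUP BY.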

SQL Query - Join the same column twice

I'm having trouble getting the result I want when joining a column from the same table twice.
My first table is "dbo.Sessions", which contains basic session info like the user ID, the project ID, login/logout date and times, etc.
I need to join to that the user names and project names. However, these are found in another table, but in the same column (dbo.tblObjects.Name).
Example:
+------+---------------+
| k_Id | Name |
+------+---------------+
| 1 | AgentName1 |
| 2 | ProjectNameX |
| 3 | ProjectNameY |
| 4 | AgentName2 |
| 5 | ProjectNameZ |
| 6 | AgentName3 |
+------+---------------+
To try to achieve my goal, I used two "LEFT JOIN" clauses. However, I get duplicate results in both: the two columns either both display the project names or both display the user names (depending on which "LEFT JOIN" comes first).
This is what I have at this point:
SELECT SysDB.dbo.Sessions.*, SysDB.dbo.tblObjects.Name AS AgentName, SysDB.dbo.tblObjects.Name AS ProjectName
FROM SysDB.dbo.Sessions
LEFT JOIN SysDB.dbo.tblObjects ON SysDB.dbo.Sessions.userId = SysDB.dbo.Objects.k_Id
LEFT JOIN SysDB.dbo.tblObjects ON SysDB.dbo.Sessions.projectId = SysDB.dbo.Objects.k_Id
WHERE (SysDB.dbo.Sessions.loginDate BETWEEN 'm/d/yyyy' AND 'm/d/yyyy')
Note: SysDB is the name of the database that I identify every time because this query is to be run externally. I also don't use "USE SysDB" before my selection because it doesn't work from the VBA macro this will run from.
Note 2: I have found a thread on this site that addresses this exact issue, but I can't understand what is being done, and it dates back to 2012. Something about aliases. The solution offers to add "ls." and "lt." before the table names, but that doesn't work for me. It says the table doesn't exist.
SQL Query Join Same Column Twice
Note 3: I have tried many different things, such as:
LEFT JOIN SysDB.dbo.tblObjects AS AgentName ON SysDB.dbo.Sessions.userId = SysDB.dbo.tblObjects.k_Id
LEFT JOIN SysDB.dbo.tblObjects AS ProjectName ON SysDB.dbo.Sessions.projectId = SysDB.dbo.tblObjects.k_Id
Any insights would be greatly appreciated. Thanks!
You may find it much easier to see what you are doing if you give each table an alias (session, agent, and project below):
SELECT session.*, agent.Name AS AgentName, project.Name AS ProjectName
FROM SysDB.dbo.Sessions session
LEFT JOIN SysDB.dbo.tblObjects agent
ON session.userId = agent.k_Id
LEFT JOIN SysDB.dbo.tblObjects project
ON session.projectId = project.k_Id
WHERE (session.loginDate BETWEEN 'm/d/yyyy' AND 'm/d/yyyy')
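To see the two-alias join in action, here is a small sketch using Python's sqlite3 module. The `tblObjects` rows come from the example in the question; the `sessionId` column and the `Sessions` rows are hypothetical, and the database prefix and date filter are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblObjects (k_Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Sessions (sessionId INTEGER, userId INTEGER, projectId INTEGER);
    INSERT INTO tblObjects VALUES
        (1, 'AgentName1'), (2, 'ProjectNameX'),
        (3, 'ProjectNameY'), (4, 'AgentName2');
    -- Hypothetical sessions; the last one has no project.
    INSERT INTO Sessions VALUES (10, 1, 2), (11, 4, 3), (12, 4, NULL);
""")

# The same table is joined twice under two different aliases, and each ON
# clause refers to its own alias rather than to the bare table name.
rows = conn.execute("""
    SELECT s.sessionId, agent.Name AS AgentName, project.Name AS ProjectName
    FROM Sessions AS s
    LEFT JOIN tblObjects AS agent   ON s.userId    = agent.k_Id
    LEFT JOIN tblObjects AS project ON s.projectId = project.k_Id
    ORDER BY s.sessionId
""").fetchall()

for row in rows:
    print(row)
```

Once a table has an alias, the rest of the query has to use that alias; referring to the original table name in the ON clause (as in the attempt in Note 3) no longer identifies which copy is meant.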

SQL: Select distinct based on regular expression

Basically, I'm dealing with a horribly set up table that I'd love to rebuild, but am not sure I can at this point.
So, the table is of addresses, and it has a ton of similar entries for the same address. But there are sometimes slight variations in the address (i.e., a room # is tacked on IN THE SAME COLUMN, ugh).
Like this:
id | place_name | place_street
1 | Place Name One | 1001 Mercury Blvd
2 | Place Name Two | 2388 Jupiter Street
3 | Place Name One | 1001 Mercury Blvd, Suite A
4 | Place Name, One | 1001 Mercury Boulevard
5 | Place Nam Two | 2388 Jupiter Street, Rm 101
What I would like to do in SQL (this is mssql), if possible, is run a query like:
SELECT DISTINCT place_name, place_street where [the first 4 letters of the place_name are the same] && [the first 4 characters of the place_street are the same].
to, I guess at this point, get:
Plac | 1001
Plac | 2388
Basically, then I can figure out what are the main addresses I have to break out into another table to normalize this, because the rest are just slight derivations.
I hope that makes sense.
I've done some research and I see people using regular expressions in SQL, but a lot of them seem to be using C scripts or something. Do I have to write regex functions and save them into the SQL Server before executing any regular expressions?
Any direction on whether I can just write them in SQL or if I have another step to go through would be great.
Or on how to approach this problem.
Thanks in advance!
Use the SQL function LEFT:
SELECT DISTINCT LEFT(place_name, 4), LEFT(place_street, 4)
FROM AddressTable
I don't think you need regular expressions to get the results you describe. You just want to trim the columns and group by the results, which will effectively give you distinct values.
SELECT left(place_name, 4), left(place_street, 4), count(*)
FROM AddressTable
GROUP BY left(place_name, 4), left(place_street, 4)
The count(*) column isn't necessary, but it gives you some idea of which values might have the most (possibly) duplicate address rows in common.
I would recommend you look into Fuzzy Search Operations in SQL Server. You can match the results much better than what you are trying to do. Just google sql server fuzzy search.
Assuming at least SQL Server 2005 for the CTE:
;with cteCommonAddresses as (
select left(place_name, 4) as LeftName, left(place_street,4) as LeftStreet
from Address
group by left(place_name, 4), left(place_street,4)
having count(*) > 1
)
select a.id, a.place_name, a.place_street
from cteCommonAddresses c
inner join Address a
on c.LeftName = left(a.place_name,4)
and c.LeftStreet = left(a.place_street,4)
order by a.place_name, a.place_street, a.id
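The prefix-grouping idea from these answers can be tried against the sample rows from the question. The sketch below uses Python's sqlite3 module; SQLite has no `LEFT()` function, so `substr(col, 1, 4)` stands in for `LEFT(col, 4)`. The table name `Address` is taken from the CTE answer, and the alias names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Address (id INTEGER PRIMARY KEY, place_name TEXT, place_street TEXT);
    INSERT INTO Address VALUES
        (1, 'Place Name One',  '1001 Mercury Blvd'),
        (2, 'Place Name Two',  '2388 Jupiter Street'),
        (3, 'Place Name One',  '1001 Mercury Blvd, Suite A'),
        (4, 'Place Name, One', '1001 Mercury Boulevard'),
        (5, 'Place Nam Two',   '2388 Jupiter Street, Rm 101');
""")

# Group on the first four characters of each column; the count shows how many
# rows collapse into each candidate "main" address.
rows = conn.execute("""
    SELECT substr(place_name, 1, 4)   AS name_prefix,
           substr(place_street, 1, 4) AS street_prefix,
           COUNT(*)                   AS n
    FROM Address
    GROUP BY name_prefix, street_prefix
    ORDER BY n DESC, name_prefix
""").fetchall()

for row in rows:
    print(row)
```

With this sample data every name starts with "Plac", so the street prefix does all of the separating; on real data a longer prefix may be needed to avoid merging unrelated places.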

How can I optimize this query...?

I have two tables, one for routes and one for airports.
Routes contains just over 9000 rows and I have indexed every column.
Airports only 2000 rows and I have also indexed every column.
When I run this query it can take up to 35 seconds to return 300 rows:
SELECT routes_build.*, a1.name as origin_name, a2.name as destination_name FROM routes_build
LEFT JOIN airports a1 ON a1.IATA = routes_build.origin
LEFT JOIN airports a2 ON a2.IATA = routes_build.destination
WHERE routes_build.carrier = "Carrier Name"
Running it with "DESCRIBE" I get the following info, but I'm not 100% sure what it's telling me.
id | Select Type | Table | Type | possible_keys | Key | Key_len | ref | rows | Extra
--------------------------------------------------------------------------------------------------------------------------------------
1 | SIMPLE | routes_build | ref | carrier,carrier_2 | carrier | 678 | const | 26 | Using where
--------------------------------------------------------------------------------------------------------------------------------------
1 | SIMPLE | a1 | ALL | NULL | NULL | NULL | NULL | 5389 |
--------------------------------------------------------------------------------------------------------------------------------------
1 | SIMPLE | a2 | ALL | NULL | NULL | NULL | NULL | 5389 |
--------------------------------------------------------------------------------------------------------------------------------------
The only alternative I can think of is to run two separate queries and join them up with PHP although, I can't believe something like this being something that could kill a mysql server. So as usual, I suspect I'm doing something stupid. SQL is my number 1 weakness.
Personally, I would start by removing the left joins and replacing them with inner joins as each route must have a start and end point.
It's telling you that it's not using an index for joining on the airports table. See how the "rows" column is so huge, 5000-odd? That's how many rows it has to read for each join to answer your query.
I don't know why, as you have claimed you have indexed every column. What is IATA? Is it Unique? I believe if mysql decides the index is inefficient it may ignore it.
EDIT: if IATA is a unique string, maybe try indexing half of it only? (You can select how many characters to index) That may give mysql an index it can use.
SELECT routes_build.*, a1.name as origin_name, a2.name as destination_name
FROM routes_build
LEFT JOIN
airports a1
ON a1.IATA = routes_build.origin
LEFT JOIN
airports a2
ON a2.IATA = routes_build.destination
WHERE routes_build.carrier = "Carrier Name"
From your EXPLAIN PLAN I can see that you don't have an index on airports.IATA.
You should create it for the query to work fast.
Name also suggests that it should be a UNIQUE index, since IATA codes are unique.
Update:
Please post your table definition. Issue this query to show it:
SHOW CREATE TABLE airports
Also I should note that your FULLTEXT index on IATA is useless unless you have set ft_min_word_len in the MySQL configuration to 3 or less.
By default, it's 4.
IATA codes are 3 characters long, and MySQL doesn't search for such short words using FULLTEXT with default settings.
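The effect of a plain (non-FULLTEXT) unique index on the join column can be sketched as follows. This uses Python's sqlite3 module rather than MySQL, so the plan output format differs from MySQL's EXPLAIN, and the index name `idx_airports_iata` and the sample rows are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE airports (IATA TEXT, name TEXT);
    CREATE TABLE routes_build (carrier TEXT, origin TEXT, destination TEXT);
    INSERT INTO airports VALUES
        ('LHR', 'London Heathrow'),
        ('JFK', 'New York JFK'),
        ('CDG', 'Paris Charles de Gaulle');
    INSERT INTO routes_build VALUES
        ('Carrier Name', 'LHR', 'JFK'),
        ('Carrier Name', 'CDG', 'LHR'),
        ('Other',        'JFK', 'CDG');
    -- An ordinary unique index on the join column (hypothetical name).
    CREATE UNIQUE INDEX idx_airports_iata ON airports (IATA);
""")

QUERY = """
    SELECT r.origin, a1.name AS origin_name,
           r.destination, a2.name AS destination_name
    FROM routes_build AS r
    LEFT JOIN airports AS a1 ON a1.IATA = r.origin
    LEFT JOIN airports AS a2 ON a2.IATA = r.destination
    WHERE r.carrier = 'Carrier Name'
    ORDER BY r.origin
"""

# The fourth column of each EXPLAIN QUERY PLAN row is the readable plan step.
plan = [step[3] for step in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
rows = conn.execute(QUERY).fetchall()

for step in plan:
    print(step)
for row in rows:
    print(row)
```

In MySQL the rough equivalent would be `CREATE UNIQUE INDEX ... ON airports (IATA)` followed by rerunning EXPLAIN and checking that the `key` column for a1 and a2 is no longer NULL.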
After you implement Martin Robins's excellent advice (i.e. remove every instance of the word LEFT from your query), try giving routes_build a compound index on carrier, origin, and destination.
It really depends on what information you're trying to get to. You probably don't need to join airports twice and you probably don't need to use left joins. Also, if you can search on a numeric field rather than a text field, that would speed things up as well.
So what are you trying to fetch?