How To Query IDs that have different Patient Names - sql

First, I understand that I should have a primary key on the Patient ID value. An ID conversion project did not go very well, so I now need to find all Patient IDs that have different Patient Names. The data lives in tables across 4 different databases. For now I have selected them into a temp DB, because I need all PIDs to be distinct across those databases. Our application has tools to keep them synchronized, but due to some bad SQL work I need to synchronize all the data again.
PID NAME
1234 Johnson
1234 Johnson
4567 Jones
4567 Alexander
I am trying to write a query that will return the results of PID 4567 + NAME Values of Jones and Alexander.

You need to use a self-join. Please try this:
create table #temp
(id int, name varchar(30))
insert into #temp values (1,'johnson')
insert into #temp values (1,'johnson')
insert into #temp values (2,'james')
insert into #temp values (2,'Alex')
SELECT * FROM #temp WHERE id IN (
SELECT a.id FROM #temp a
JOIN #temp b on b.id = a.id AND b.name <> a.name
)

SELECT Min(PID), Name FROM [Table]
GROUP BY Name
HAVING Count(PID) = 1

SELECT PID,NAME
FROM TABLE
GROUP BY PID,NAME
HAVING COUNT(*) =1

I think this will do it
select p.pid, max(name), min(name), count(*) as cnt
from p
group by pid
having max(name) <> min(name)
or
select p1.pid, p1.name, p2.name
from p p1
join p p2
on p1.pid = p2.pid
and p1.name < p2.name
order by p1.pid, p1.name, p2.name
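The two variants above can be tried against the sample rows from the question. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the real server, and the table name p is taken from the answer.

```python
import sqlite3

# In-memory SQLite database standing in for the temp table of patient rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE p (pid INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO p (pid, name) VALUES (?, ?)",
    [(1234, "Johnson"), (1234, "Johnson"), (4567, "Jones"), (4567, "Alexander")],
)

# Variant 1: one row per conflicting PID, comparing MIN and MAX of the name.
min_max_rows = conn.execute(
    """
    SELECT pid, MIN(name), MAX(name), COUNT(*) AS cnt
    FROM p
    GROUP BY pid
    HAVING MAX(name) <> MIN(name)
    """
).fetchall()

# Variant 2: one row per pair of differing names that share a PID.
pair_rows = conn.execute(
    """
    SELECT p1.pid, p1.name, p2.name
    FROM p AS p1
    JOIN p AS p2
      ON p1.pid = p2.pid
     AND p1.name < p2.name
    ORDER BY p1.pid, p1.name, p2.name
    """
).fetchall()

print(min_max_rows)
print(pair_rows)
```

The first form returns one summary row per PID, while the second returns every pair of distinct names, which grows quickly when a PID has many different names.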

There are a lot of ways, some more optimized than others depending on which RDBMS you are using. But typically this is a 2-step operation:
1) Find all of the PIDs that have more than 1 Name associated with them
2) Relate back to get the rest of the data you are seeking.
CREATE TABLE #T (
PID INT
,Name VARCHAR(25)
)
INSERT INTO #T (PID,Name) VALUES (1234,'Johnson'),(1234,'Johnson'),(4567,'Jones'),(4567,'Alexander')
SELECT
t2.*
FROM
(
SELECT
PID
FROM
#T t1
GROUP BY
PID
HAVING COUNT(DISTINCT Name) > 1
) dupes
INNER JOIN #T t2
ON dupes.PID = t2.PID
When using a method such as the join or IN above, it is important to use COUNT(DISTINCT Name): simply counting * or Name would also flag a PID whose identical PID/Name combination just occurs more than once, not only PIDs that really have different names.
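To make the difference concrete, here is a small sketch on the question's sample rows. It assumes SQLite (via Python's sqlite3 module) in place of SQL Server, and the helper pids_having is a name made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (pid INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t (pid, name) VALUES (?, ?)",
    [(1234, "Johnson"), (1234, "Johnson"), (4567, "Jones"), (4567, "Alexander")],
)


def pids_having(condition):
    """Return the PIDs whose group satisfies the given HAVING condition."""
    sql = f"SELECT pid FROM t GROUP BY pid HAVING {condition} ORDER BY pid"
    return [row[0] for row in conn.execute(sql)]


# Counting rows also flags 1234, which only repeats the same name twice.
by_row_count = pids_having("COUNT(*) > 1")

# Counting distinct names flags only the PID with conflicting names.
by_distinct_name = pids_having("COUNT(DISTINCT name) > 1")

print(by_row_count)
print(by_distinct_name)
```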
If you only want the duplicates and not all of the combinations, using ROW_NUMBER() or something similar can help you get to the answer a little more efficiently too. You can also look for the existence of a non-identical record, like so:
SELECT DISTINCT t1.PID, t1.Name
FROM
#T t1
WHERE
EXISTS (SELECT 1 FROM #T t2 WHERE t1.PID = t2.PID AND t1.Name <> t2.Name)
This way could perform faster for you depending on data sets etc. I would tend to stay away from solutions that use IN for cases like these.

Related

Update table with using NEWID() function

CREATE TABLE Products(Id INT, Name CHAR(100), DefaultImageId INT NULL);
INSERT INTO Products (Id, Name, DefaultImageId) VALUES(1, 'A', NULL);
INSERT INTO Products (Id, Name, DefaultImageId) VALUES(2, 'A', NULL);
INSERT INTO Products (Id, Name, DefaultImageId) VALUES(3, 'A', NULL);
INSERT INTO Products (Id, Name, DefaultImageId) VALUES(4, 'A', NULL);
INSERT INTO Products (Id, Name, DefaultImageId) VALUES(5, 'A', NULL);
INSERT INTO Products (Id, Name, DefaultImageId) VALUES(1, 'B', NULL);
INSERT INTO Products (Id, Name, DefaultImageId) VALUES(2, 'B', NULL);
INSERT INTO Products (Id, Name, DefaultImageId) VALUES(3, 'B', NULL);
Normally, I would update random rows of a table with a script like the following.
update a
set DefaultImageId=1
from Products as a
where name = 'A'
and id in (
select top 2 id
from Products as b
where a.name = b.name
order by newid()
)
However, I ran into an issue: it updates more or fewer than 2 rows. I executed the following script many times to debug it.
The result is not always exactly two records. If I remove the order by newid(), the number of rows returned is correct. The problem seems to be in newid(). How can I solve this problem? Thanks
select *
from Products as a
where name = 'A'
and id in (
select top 2 id
from Products as b
where a.name = b.name
order by newid()
)
You can try this:
UPDATE U SET DefaultImageId = 1
FROM(
SELECT TOP 2 * FROM dbo.Products WHERE Name = 'A' ORDER BY NEWID()
) AS U
However, it would be even better to filter out the rows that already have DefaultImageId = 1 in the inner query. Note that in this case the inner query might produce fewer than 2 records.
UPDATE U SET DefaultImageId = 1
FROM(
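-- NULL <> 1 is not true, so rows with a NULL DefaultImageId need an explicit IS NULL test
SELECT TOP 2 * FROM dbo.Products WHERE Name = 'A' AND (DefaultImageId IS NULL OR DefaultImageId <> 1) ORDER BY NEWID()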
) AS U
I think you're just looking for a simple JOIN as
UPDATE A
SET DefaultImageId = 1
FROM Products A
JOIN
(
SELECT TOP 2 Id, Name
FROM Products B
WHERE B.Name = 'A'
ORDER BY NEWID()
) B ON A.Id = B.Id AND A.Name = B.Name;
SQL SERVER BUG:
I think you just hit the following SQL server bug, that is still under review since 2018.
https://feedback.azure.com/forums/908035-sql-server/suggestions/33272404-cte-containing-top-and-newid-producing-incorrect
WORK AROUND:
The only thing that seems to work, based on my testing, is eliminating the outside reference from the subquery; then things work as expected. You can eliminate the outside reference by doing any of the following, as suggested by others here.
Option 1: Keeping the query as-is, except hard-coding the filter criteria inside the subquery instead of using an outside reference:
SELECT ...
FROM t1
WHERE
t1.id IN (SELECT TOP 2 id subquery with NO outside reference WHERE name = 'A' ORDER BY newid() )
AND t1.name = 'A'
....
Option 2: Using a JOIN with proper ON matching criteria instead of using an outside reference inside the subquery:
SELECT ...
FROM t1
INNER JOIN (SELECT TOP 2 id subquery with NO outside reference ORDER BY newid() ) t2 ON t1.id = t2.id and t1.name = t2.name
WHERE
t1.name = 'A'
....
OR
DETAILS:
Based on my testing, when all of the following are aligned, the SQL Engine returns an unpredictable result:
1) An outside table is referenced inside the subquery (or through a CTE) as a filtering criterion
2) NEWID() is used in the ORDER BY of a SELECT TOP statement (it does NOT happen with other non-deterministic functions such as RAND(); it seems to be something specific to NEWID())
3) The SELECT TOP statement uses a number smaller than the number of records that exist in that table
This seems to throw off the SQL engine. The result changes at every run: The number of records returned varies at every run between zero records and max number of records that would have been returned if SELECT TOP was not used.
The query plan does not make sense at all. TOP N returns more rows than it is supposed to, and the id IN () predicate returns a random number of records (zero in the run I looked at). How could that be possible? I have no clue. Also, note that the id IN () predicate is applied before the left table scan, which shouldn't happen. I think the unreliable result has something to do with that. We'll have to wait until the SQL Server team fixes the bug to be able to write the queries the way you wrote them.
Because you are using a correlated subquery, where the subquery refers to the outer query's name column, you are getting a different result set each time.
I would suggest specifying the name filter twice to avoid this problem.
select *
from Products as a
where name='A'
and id in
(select top 2 id
from Products as b
where b.name = 'A'
order by newid())
Similarly, for the update, change the query to:
update a
set DefaultImageId=1
from Products as a
where name = 'A'
and id in (
select top 2 id
from Products as b
where b.name = 'A'
order by newid()
)
Something like this
This query selects 2 rows at random.
with top_2_cte(id, [name]) as (
select top 2 id, [name]
from Products
where name='A'
order by newid())
select a.*
from Products a
join
top_2_cte ttc on a.Id=ttc.id
and a.[Name]=ttc.[name];
Update statement
;with top_2_cte(id, [name]) as (
select top 2 id, [name]
from Products
where name='A'
order by newid())
update a
set DefaultImageId=1
from Products a
join
top_2_cte ttc on a.Id=ttc.id
and a.[Name]=ttc.[name];
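The common idea in the answers above is to pick the random rows in a subquery that has no reference to the outer query. The sketch below shows only that query shape, under loud assumptions: it runs on SQLite through Python's sqlite3 module, with random() and LIMIT standing in for NEWID() and TOP, and rowid standing in for a unique key (Id alone is not unique in the sample data). It does not reproduce or test SQL Server's behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (Id INTEGER, Name TEXT, DefaultImageId INTEGER)")
rows = [(i, "A", None) for i in range(1, 6)] + [(i, "B", None) for i in range(1, 4)]
conn.executemany("INSERT INTO Products (Id, Name, DefaultImageId) VALUES (?, ?, ?)", rows)

# The subquery does not refer to the outer row, so the random pick is
# made once and the outer statement only checks membership in it.
conn.execute(
    """
    UPDATE Products
    SET DefaultImageId = 1
    WHERE rowid IN (
        SELECT rowid
        FROM Products
        WHERE Name = 'A'
        ORDER BY random()
        LIMIT 2
    )
    """
)

updated = conn.execute(
    "SELECT Id, Name FROM Products WHERE DefaultImageId = 1"
).fetchall()
print(updated)  # two rows named 'A'; which two varies from run to run
```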

Compare the results of a ROW COUNT

I have 2 databases on the same server and I need to compare the records in each one, since one of the databases is not importing all the information.
I was trying to do a row count, but it's not working.
Currently I am exporting batches of approximately 100,000 rows and looking them up in Excel.
Let's say I want a query that counts the rows for each ID in TABLE A and then compares that count with the count for the same ID in TABLE B. Since the IDs are the same, the counts should be the same, and I want the query to return the IDs for which the counts do not match.
--this table will contain the count of occurrences of each ID in TableA
declare @TableA_Results table(
ID bigint,
Total bigint
)
insert into @TableA_Results
--three-part name is database.schema.table; the dbo schema is an assumption here
select ID,count(*) from database1.dbo.TableA
group by ID
--this table will contain the count of occurrences of each ID in TableB
declare @TableB_Results table(
ID bigint,
Total bigint
);
insert into @TableB_Results
select ID,count(*) from database2.dbo.TableB
group by ID
--this table will contain the IDs that don't have the same count in both tables
declare @Discordances table(
ID bigint,
TotalA bigint,
TotalB bigint
)
insert into @Discordances
select TA.ID,TA.Total,TB.Total
from @TableA_Results TA
inner join @TableB_Results TB on TA.ID=TB.ID and TA.Total!=TB.Total
--the final output
select * from @Discordances
The question is vague, but maybe this SQL Code might help nudge you in the right direction.
It grabs the IDs and Counts of each ID from database one, the IDs and counts of IDs from database two, and compares them, listing out all the rows where the counts are DIFFERENT.
WITH DB1Counts AS (
SELECT ID, COUNT(ID) AS CountOfIDs
FROM DatabaseOne.dbo.TableOne
GROUP BY ID
), DB2Counts AS (
SELECT ID, COUNT(ID) AS CountOfIDs
FROM DatabaseTwo.dbo.TableTwo
GROUP BY ID
)
SELECT a.ID, a.CountOfIDs AS DBOneCount, b.CountOfIDs AS DBTwoCount
FROM DB1Counts a
INNER JOIN DB2Counts b ON a.ID = b.ID
WHERE a.CountOfIDs <> b.CountOfIDs
This SQL references the tables using the "Database.Schema.Table" notation, so replace "DatabaseOne" and "DatabaseTwo" with the names of your two databases. And of course replace TableOne and TableTwo with the names of your tables (I'm assuming they're the same). This sets up two selects, one for each database, each grouping by ID to get the count of each ID. It then joins these two selects on ID, and returns all rows where the counts are different.
You could full outer join two aggregate queries and pull out ids that are either missing in one table, or for which the record count is different:
select coalesce(ta.id, tb.id), ta.cnt, tb.cnt
from
(select id, count(*) cnt from tableA group by id) ta
full outer join (select id, count(*) cnt from tableB group by id) tb
on ta.id = tb.id
where
coalesce(ta.cnt, -1) <> coalesce(tb.cnt, -1)
You seem to want aggregation and a full join:
select coalesce(a.id, b.id) as id, a.cnt, b.cnt
from (select id, count(*) as cnt
from a
group by id
) a full join
(select id, count(*) as cnt
from b
group by id
) b
on a.id = b.id
where coalesce(a.cnt, 0) <> coalesce(b.cnt, 0);
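The difference between the inner-join and full-join answers is whether IDs that are missing from one side show up. A minimal sketch, assuming SQLite through Python's sqlite3 module and the made-up table names table_a and table_b: because older SQLite builds lack FULL OUTER JOIN, the full join is emulated here with two LEFT JOINs and UNION ALL, which is a swapped-in technique rather than what the answers wrote.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table_a (id INTEGER);
    CREATE TABLE table_b (id INTEGER);
    INSERT INTO table_a (id) VALUES (1), (1), (2), (3), (3);
    INSERT INTO table_b (id) VALUES (1), (1), (2), (2), (4);
    """
)

mismatches = conn.execute(
    """
    WITH a AS (SELECT id, COUNT(*) AS cnt FROM table_a GROUP BY id),
         b AS (SELECT id, COUNT(*) AS cnt FROM table_b GROUP BY id)
    -- IDs present in A whose count differs or that are missing from B
    SELECT a.id, a.cnt, b.cnt
    FROM a LEFT JOIN b ON a.id = b.id
    WHERE b.id IS NULL OR a.cnt <> b.cnt
    UNION ALL
    -- IDs present only in B
    SELECT b.id, NULL, b.cnt
    FROM b LEFT JOIN a ON a.id = b.id
    WHERE a.id IS NULL
    ORDER BY 1
    """
).fetchall()

print(mismatches)
```

With an inner join only id 2 would be reported; ids 3 and 4, which exist in just one table, appear only in the full-join form.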

Could this be done more efficiently

I have a table with two columns (p_id, id_type) where the p_id can have multiple types. I need to find the p_ids that do not have a specific type.
P_ID ID_TYPE
----------- -------------
12456 6
12456 7
56897 10
25686 9
25686 22
25686 7
56897 22
This is the query I used but wondering if there is a more efficient way to do this.
select p_id
into #temp1
from table2
where id_type = 6
SELECT
distinct table2.p_id
,table1.NAME
,table1.TYPE
FROM
table2 left join table1
on table2.p_id = table1.p_id
where
table2.p_id not in
(select p_id from #temp1)
and type = 'XYZ'
Expected outcome should be those P_IDs that DO NOT have an ID_TYPE = 6.
P_ID Name Type
56897 Smith Physician
25686 Jones Physician
Assuming I'm understanding your question correctly, you're trying to select all the p_id rows that don't have any corresponding p_id rows with a specific type.
If so, there are a couple of ways to do this. One is to use NOT IN:
select *
from yourtable
where p_id not in (
select p_id
from yourtable
where id_type = 6)
Using NOT EXISTS:
select *
from yourtable t
where not exists (
select 1
from yourtable t2
where t.p_id = t2.p_id and
t2.id_type = 6)
You could also use an OUTER JOIN to achieve the same result.
If you want just the distinct p_ids, then you need to add DISTINCT. It's not clear what your expected output should be.
A more SQLy way to do this is to use a single left join to find something called a Relative Complement. Essentially what we want to say is "Take all of the p_id, then take away all the ones that have an id_type of 6".
SELECT DISTINCT t.p_id
FROM table2 AS t
LEFT OUTER JOIN table2 AS t2 ON t.p_id = t2.p_id
AND t2.id_type = 6
WHERE t2.p_id IS NULL
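Both the NOT EXISTS form and the left-join form can be checked against the sample rows from the question. This sketch assumes SQLite via Python's sqlite3 module and leaves out the join to table1, since its contents are not shown in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table2 (p_id INTEGER, id_type INTEGER)")
conn.executemany(
    "INSERT INTO table2 (p_id, id_type) VALUES (?, ?)",
    [(12456, 6), (12456, 7), (56897, 10), (25686, 9),
     (25686, 22), (25686, 7), (56897, 22)],
)

# NOT EXISTS: keep a p_id only if it has no row with id_type = 6.
not_exists_ids = [row[0] for row in conn.execute(
    """
    SELECT DISTINCT t.p_id
    FROM table2 AS t
    WHERE NOT EXISTS (
        SELECT 1 FROM table2 AS t2
        WHERE t2.p_id = t.p_id AND t2.id_type = 6
    )
    ORDER BY t.p_id
    """
)]

# LEFT JOIN anti-join: unmatched rows have NULL on the right-hand side.
left_join_ids = [row[0] for row in conn.execute(
    """
    SELECT DISTINCT t.p_id
    FROM table2 AS t
    LEFT OUTER JOIN table2 AS t2
      ON t.p_id = t2.p_id AND t2.id_type = 6
    WHERE t2.p_id IS NULL
    ORDER BY t.p_id
    """
)]

print(not_exists_ids)
print(left_join_ids)
```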

SQL how to compare a table to itself

So let's say I have a table with this information in it
- Tom BLDG200
- Kevin BLDG200
- Mary BLDG340
I want to find everyone who shares a building with someone else, so the query should print Tom and Kevin. Because Mary is by herself, she shouldn't be printed. I have been using an INNER JOIN to join the table to itself on the building, but because I am comparing the table to itself, it matches even if there is only 1 person. So in my case it prints Mary even though I don't want it to. How can I make it print only when 2 or more people share the same building?
Here is an efficient way to solve this query:
select t.*
from table t
where exists (select 1
from table t2
where t2.name <> t.name and t2.building = t.building
) ;
This will optimally take advantage of an index on building, name.
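Here is a small sketch of the EXISTS form on the question's three rows. It assumes SQLite through Python's sqlite3 module, and the table name occupants and the index name are invented for the example (the answer's placeholder name, table, is a reserved word).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE occupants (name TEXT, building TEXT)")
conn.executemany(
    "INSERT INTO occupants (name, building) VALUES (?, ?)",
    [("Tom", "BLDG200"), ("Kevin", "BLDG200"), ("Mary", "BLDG340")],
)
# Composite index on (building, name), as suggested in the answer.
conn.execute("CREATE INDEX idx_occupants_building_name ON occupants (building, name)")

# Keep a person only if someone else is in the same building.
shared = conn.execute(
    """
    SELECT t.name, t.building
    FROM occupants AS t
    WHERE EXISTS (
        SELECT 1
        FROM occupants AS t2
        WHERE t2.name <> t.name AND t2.building = t.building
    )
    ORDER BY t.name
    """
).fetchall()

print(shared)
```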
Most databases offer window/analytic functions, which are another efficient approach:
select name, building
from (select t.*, count(*) over (partition by building) as cnt
from table t
) t
where cnt > 1;
Assuming your column names are person and building:
SELECT t1.person, t2.person
FROM `table` t1
JOIN `table` t2
ON ( t1.building = t2.building
AND t1.person > t2.person
);
This line AND t1.person > t2.person solves your problem with Mary.
There is a problem with more persons, because they would be divided into pairs. But if this doesn't bother you, that would work.
Also, the following would work (but results appear per person, so you'll have a list of every non-lonely person and the building they live in):
SELECT DISTINCT t1.person, t1.building
FROM `table` t1
JOIN `table` t2
ON ( t1.building = t2.building
AND t1.person <> t2.person
);
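The pairing behaviour is easy to see with three people in one building. A sketch under assumptions: SQLite via Python's sqlite3 module, a made-up table name residents, and one extra person (John) added to the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE residents (person TEXT, building TEXT)")
conn.executemany(
    "INSERT INTO residents (person, building) VALUES (?, ?)",
    [("Tom", "BLDG200"), ("Kevin", "BLDG200"),
     ("John", "BLDG200"), ("Mary", "BLDG340")],
)

# The '>' comparison yields each unordered pair exactly once.
pairs = conn.execute(
    """
    SELECT t1.person, t2.person
    FROM residents AS t1
    JOIN residents AS t2
      ON t1.building = t2.building
     AND t1.person > t2.person
    ORDER BY t1.person, t2.person
    """
).fetchall()

# '<>' with DISTINCT yields each non-lonely person exactly once.
people = conn.execute(
    """
    SELECT DISTINCT t1.person, t1.building
    FROM residents AS t1
    JOIN residents AS t2
      ON t1.building = t2.building
     AND t1.person <> t2.person
    ORDER BY t1.person
    """
).fetchall()

print(pairs)
print(people)
```

Three people in a building produce three pairs, and the number of pairs grows quadratically with the number of people, which is why the per-person listing is usually easier to read.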
Relational algebra 101. I added some more names so you can see that distinct is needed. In my sample data only Jane is alone and should not be in result.
with cte (name, building) as (
values
('Tom', 'BLDG200'),
('Kevin','BLDG200'),
('John', 'BLDG200'),
('Jack', 'BLDG200'),
('Mary', 'BLDG340'),
('Terry','BLDG340'),
('Jane', 'BLDG341')
)
select
distinct
a.name, a.building
from
cte a
join cte b on (a.name <> b.name and a.building = b.building)

Finding unique combinations of columns

I'm trying to write a select query but am having trouble, probably because I'm not familiar with SQL Server (usually use MySQL).
Basically what I need to do is find the number of unique combinations of 2 columns, one a Varchar and one a Double.
There are fewer rows in one than in the other, so I've been trying to figure out the right way to do this.
Essentially pretend Table.Varchar has in it:
Table.Varchar
--------------
apple
orange
and Table.Float has in it:
Table.Float
--------------
1
2
3.
How could I write a query which returns
QueryResult
-------------
apple1
apple2
apple3
orange1
orange2
orange3
It's been a long day at work and I think I'm just overthinking this. What I've tried so far is to concatenate the two columns and then count, but it's not working. Any ideas for a better way to go about this?
Select T1.VarcharField + CAST(T2.FloatField as Varchar(10)) as [Concat]
from Table.Varchar T1
CROSS JOIN Table.Float T2
This way, you generate the combined values.
Then group by them and use COUNT:
select T.Concat, count(*) from
(Select T1.VarcharField + CAST(T2.FloatField as Varchar(10)) as [Concat]
from Table.Varchar T1
CROSS JOIN Table.Float T2) T
group by T.Concat order by count(*) asc
If they are in the same table:
SELECT a.Field1, b.Field2
FROM [Table] a
CROSS JOIN [Table] b
or if they are in separate tables:
SELECT a.Field1, b.Field2
FROM [Table1] a
CROSS JOIN [Table2] b
Keep in mind that the above queries will match ALL records from the first table with ALL records from the second table, creating a cartesian product.
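As a quick illustration of the Cartesian product, the sketch below cross joins the two lists from the question. It assumes SQLite via Python's sqlite3 module, the made-up table names fruits and numbers, and an INTEGER column instead of a float so that the concatenated text comes out as apple1 rather than apple1.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fruits (name TEXT)")
conn.execute("CREATE TABLE numbers (n INTEGER)")
conn.executemany("INSERT INTO fruits (name) VALUES (?)", [("apple",), ("orange",)])
conn.executemany("INSERT INTO numbers (n) VALUES (?)", [(1,), (2,), (3,)])

# Every fruit is paired with every number: 2 x 3 = 6 rows.
combos = [row[0] for row in conn.execute(
    """
    SELECT f.name || CAST(x.n AS TEXT)
    FROM fruits AS f
    CROSS JOIN numbers AS x
    ORDER BY f.name, x.n
    """
)]

print(combos)
```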
This will eliminate duplicates:
DECLARE @Varchar TABLE(v VARCHAR(32));
DECLARE @Float TABLE(f FLOAT);
INSERT @Varchar SELECT 'apple'
UNION ALL SELECT 'orange'
UNION ALL SELECT 'apple';
INSERT @Float SELECT 1
UNION ALL SELECT 2
UNION ALL SELECT 3;
SELECT v.v + CONVERT(VARCHAR(12), f.f)
FROM @Varchar AS v
CROSS JOIN @Float AS f
GROUP BY v.v, f.f;
A cross join is a join where each record in one table is combined with each record of the other table. Select the distinct values from the table and join them.
select x.Varchar, y.Float
from (select distinct Varchar from theTable) x
cross join (select distinct Float from theTable) y
To find the number of combinations you don't have to actually return all combinations, just count them.
select
(select count(distinct Varchar) from theTable) *
(select count(distinct Float) from theTable)
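The shortcut can be checked against actually enumerating the combinations. A sketch assuming SQLite via Python's sqlite3 module, with a made-up table the_table whose columns v and f stand in for the Varchar and Float columns:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE the_table (v TEXT, f INTEGER)")
conn.executemany(
    "INSERT INTO the_table (v, f) VALUES (?, ?)",
    [("apple", 1), ("orange", 2), ("apple", 3)],
)

# Shortcut: multiply the two distinct counts without building the pairs.
product_count = conn.execute(
    """
    SELECT (SELECT COUNT(DISTINCT v) FROM the_table)
         * (SELECT COUNT(DISTINCT f) FROM the_table)
    """
).fetchone()[0]

# Cross-check: build every combination of distinct values and count them.
enumerated_count = conn.execute(
    """
    SELECT COUNT(*)
    FROM (SELECT DISTINCT v FROM the_table) AS x
    CROSS JOIN (SELECT DISTINCT f FROM the_table) AS y
    """
).fetchone()[0]

print(product_count, enumerated_count)
```

Both numbers agree (2 distinct names times 3 distinct numbers), but the first query never materializes the combined rows.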
Try this.
Possible combinations:
SELECT
DISTINCT T1.VarField+CONVERT(VARCHAR(12),T2.FtField) --Get Unique Combinations
FROM Table1 T1 CROSS JOIN Table2 T2 --From all possible combinations
WHERE T1.VarField IS NOT NULL AND T2.FtField IS NOT NULL --Making code NULL Proof
and to just get the count of possible combinations:
SELECT Count(DISTINCT T1.VarcharField + CONVERT(VARCHAR(12), T2.FloatField))
FROM Table1 T1
CROSS JOIN Table2 T2
WHERE T1.VarcharField IS NOT NULL AND T2.FloatField IS NOT NULL