Comparing multiple tables with one another - SQL

I do not necessarily spend my time creating SQL queries; I maintain and search for mistakes in databases. I constantly have to compare two types of tables, and if the database is small, I don't mind just writing a small query. But at times, some databases are huge and the number of tables is overwhelming.
I have one table type that has compressed data and another that has aggregates computed from the compressed data. At times, the aggregate tables are missing some IDs, i.e. some data was not calculated. If it is just one aggregate table, I just compare it to its corresponding compressed table and I can immediately see what needs to be recalculated (code for that is shown below).
select distinct taguid from TLG.TagValueCompressed_0_100000
where not exists
(select * from TLG.AggregateValue_0_100000 where
AggregateValue_0_100000.TagUID = TagValueCompressed_0_100000.TagUID)
I would like a query that compares all tables with one another and spits out a table with all non-existing tags. My SQL knowledge is in its infancy, and my job does not require me to be an SQL freak, but a query that solves said problem would help me a lot. Does anyone have any suggestions for a solution?
Relevant columns: TagUIDs, that's it.
Optimal table:
Existing Tags    Missing Tags
1
2
3
4
...
The numbers denote which table is meant: "_0_100000", "_0_100001", ...

So let's assume this query produced the output you want, i.e. the list of all possible tags in a given table set (0_100000, etc.) and columns denoting whether the given tag exists in AggregateValue and TagValueCompressed:
SELECT '0_100000' AS TableSet
, ISNULL(AggValue.TagUID, TagValue.TagUID) AS TagUID
, IIF(TagValue.TagUID IS NOT NULL, 1, 0) AS ExistsInTag
, IIF(AggValue.TagUID IS NOT NULL, 1, 0) AS ExistsInAgg
FROM (SELECT DISTINCT TagUID FROM tlg.AggregateValue_0_100000) AggValue
FULL OUTER
JOIN (SELECT DISTINCT TagUID FROM TLG.TagValueCompressed_0_100000) TagValue
ON AggValue.TagUID = TagValue.TagUID
So in order to execute it for multiple tables, we can make this query a template:
DECLARE @QueryTemplate NVARCHAR(MAX) = '
SELECT ''$SUFFIX$'' AS TableSet
, ISNULL(AggValue.TagUID, TagValue.TagUID) AS TagUID
, IIF(TagValue.TagUID IS NOT NULL, 1, 0) AS ExistsInTag
, IIF(AggValue.TagUID IS NOT NULL, 1, 0) AS ExistsInAgg
FROM (SELECT DISTINCT TagUID FROM tlg.AggregateValue_$SUFFIX$) AggValue
FULL OUTER
JOIN (SELECT DISTINCT TagUID FROM TLG.TagValueCompressed_$SUFFIX$) TagValue
ON AggValue.TagUID = TagValue.TagUID';
Here $SUFFIX$ denotes the 0_100000 part, etc. We can now execute this query dynamically for all tables matching a certain pattern; say you have 500 of these tables.
DECLARE @query NVARCHAR(MAX) = ''
-- Produce a query for a given suffix out of the template
-- suffix is the last 8 characters of the table's name
-- combine all the queries, for all tables, using UNION ALL
SELECT @query += CONCAT(REPLACE(@QueryTemplate, '$SUFFIX$', RIGHT(name, 8)), ' UNION ALL ')
FROM sys.tables
WHERE name LIKE 'TagValueCompressed%';
-- Get rid of the trailing UNION ALL
SET @query = LEFT(@query, LEN(@query) - LEN('UNION ALL '));
EXECUTE sp_executesql @query;
This will yield combined results for all matching tables.
Here is a working example on dbfiddle.
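If you only care about the missing side, the combined query can be wrapped once more before executing it; a minimal sketch (the wrapper is an assumption, not required by the approach above):
-- hypothetical wrapper: keep only tags that exist in the compressed
-- table but are missing from the aggregate table
DECLARE @wrapped NVARCHAR(MAX) =
    'SELECT TableSet, TagUID FROM (' + @query + ') AS combined
     WHERE ExistsInTag = 1 AND ExistsInAgg = 0';
EXECUTE sp_executesql @wrapped;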

Related

How do I use SQL Server to select into another table?

I am consolidating a web service. I am replacing multiple calls to the service with one call that contains the data.
I have created a table:
CREATE TABLE InvResults
(
Invoices nvarchar(max),
InvoiceDetails nvarchar(max),
Products nvarchar(max)
);
I used (max) because I don't know how complex the JSON will get at this time.
I need to do some sort of selects like this (this is pseudocode, not actual SQL):
SELECT
(SELECT *
INTO InvResults for Column Invoices
FROM MyInvoiceTable
WHERE SomeColumns = 'someStuffvariable'
FOR JSON PATH, ROOT('invoices')) AS invoices;
SELECT
(SELECT *
INTO InvResults for Column InvoiceDetails
FROM MyInvoiceDetailsTable
WHERE SomeColumns = 'someStuffvariable'
FOR JSON PATH, ROOT('invoicedetails')) AS invoicedetails;
I don't know how to format this, and my Google skills are failing me at this point. I understand that I probably want to use an UPDATE statement, but I'm not sure how to do this in combination with the rest of my requirements. I'm exploring "How do I UPDATE from a SELECT in SQL Server?" but I am still at a halt.
The end result should be a table InvResults with 3 columns, containing one row whose values are the results of the SELECT statements as JSON. The column names should be defined the same as the JSON root objects.
INSERT INTO InvResults (Invoices, InvoiceDetails)
SELECT
(SELECT *
FROM MyInvoiceTable
WHERE SomeColumns = 'someStuffvariable'
FOR JSON PATH, ROOT('invoices'))
,
(SELECT *
FROM MyInvoiceDetailsTable
WHERE SomeColumns = 'someStuffvariable'
FOR JSON PATH, ROOT('invoicedetails'))
;
Because each SELECT ... FOR JSON above returns only one row, this works.
The third field is easily added, but that is left for you to do yourself 😉
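For completeness, a sketch of the same INSERT with the third column filled in the same way (MyProductsTable is an assumed name, not from the question):
INSERT INTO InvResults (Invoices, InvoiceDetails, Products)
SELECT
(SELECT * FROM MyInvoiceTable
WHERE SomeColumns = 'someStuffvariable'
FOR JSON PATH, ROOT('invoices'))
,
(SELECT * FROM MyInvoiceDetailsTable
WHERE SomeColumns = 'someStuffvariable'
FOR JSON PATH, ROOT('invoicedetails'))
,
-- the assumed source table for the third column:
(SELECT * FROM MyProductsTable
WHERE SomeColumns = 'someStuffvariable'
FOR JSON PATH, ROOT('products'))
;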

How to use Join with like operator and then casting columns

I have 2 tables with these columns:
CREATE TABLE #temp
(
Phone_number varchar(100) -- example data: "2022033456"
)
CREATE TABLE orders
(
Addons ntext -- example data: "Enter phone:2022033456<br>Thephoneisvalid"
)
I have to join these two tables using LIKE, as the phone numbers are not in the same format. A little background: I am joining the #temp table on its phone number with the orders table on its Addons value, then in the WHERE condition I try to match them again. But the results I am getting are not accurate; the query returns no data at all. I don't know what I am doing wrong. I am using SQL Server. Here is my code:
select
*
from
order_no as n
join
orders as o on n.order_no = o.order_no
join
#temp as t on t.phone_number like '%'+ cast(o.Addons as varchar(max))+'%'
where
t.phone_number = '%' + cast(o.Addons as varchar(max)) + '%'
Note that a LIKE pattern built this way round will never match: the pattern is the entire Addons value, so no bare phone number can satisfy it. You have to convert the format of one of the phone fields to comply with the other phone field's format in order to join. Please provide more information on your tables.
I think your join condition is in the wrong order. Because your question explicitly mentions two tables, let's stick with those:
select *
from orders o JOIN
#temp t
on cast(o.Addons as varchar(max)) like '%' + t.phone_number + '%';
It has been so long since I dealt with the text data type in SQL Server that I don't remember whether the cast() is necessary or not.
Instead of trying to do everything in a single top-level query, you should apply a transformation projection to your orders table and use that as a subquery, which will make the query easier to understand.
Using the CHARINDEX function will make this a lot easier. However, it does not support ntext; you will need to change your schema to use nvarchar(max) instead, which you should be doing anyway as ntext is deprecated. Alternatively, you can use CONVERT(nvarchar(max), someNTextValue), though this will reduce performance as you won't be able to use any indexes on your ntext values. But this query will run slowly anyway.
SELECT
orders2.*,
CASE WHEN orders2.PhoneStart > 0 AND orders2.PhoneEnd > 0 THEN
SUBSTRING( orders2.Addons, orders2.PhoneStart, orders2.PhoneEnd - orders2.PhoneStart )
ELSE
NULL
END AS ExtractedPhoneNumber
FROM
(
SELECT
orders.*, -- never use `*` in production, so replace this with the actual columns in your orders table
CHARINDEX('Enter phone:', Addons) AS PhoneStart,
CHARINDEX('<br>Thephoneisvalid', AddOns, CHARINDEX('Enter phone:', Addons) ) AS PhoneEnd
FROM
orders
) AS orders2
I suggest converting the above into a VIEW or CTE so you can directly query it in your JOIN expression:
CREATE VIEW ordersWithPhoneNumbers AS
-- copy and paste the above query here, then execute the batch to create the view, you only need to do this once.
Then you can use it like so:
SELECT
* -- again, avoid the use of the star selector in production use
FROM
ordersWithPhoneNumbers AS o2 -- this is the above query as a VIEW
INNER JOIN order_no ON o2.order_no = order_no.order_no
INNER JOIN #temp AS t ON o2.ExtractedPhoneNumber = t.phone_number
Actually, I take back my previous remark about performance: if you persist the extracted phone number somewhere it can be indexed, you'll get good performance (note that an ordinary view like this one cannot be indexed directly; you would need to persist the value, for example in a computed column cast to a fixed-length type).

Performance issues with UNION of large tables

I have seven large tables that can store between 100 and 1 million rows at any time. I'll call them LargeTable1, LargeTable2, LargeTable3, LargeTable4 ... LargeTable7. These tables are mostly static: there are no updates and no new inserts. They change only once every two weeks or once a month, when they are truncated and a new batch of records is inserted into each.
All these tables have three fields in common: Headquarter, Country and File. Headquarter and Country are numbers in the format '000', though in two of these tables they are typed as int due to some other system necessities.
I have another, much smaller table called Headquarters with the information about each headquarter. This table has very few entries: at most 1000, actually.
Now, I need to create a stored procedure that returns all those headquarters that appear in the large tables but are either absent from the Headquarters table or have been deleted (deletion is logical: there is a DeletionDate field to check for it).
This is the query I've tried:
CREATE PROCEDURE deletedHeadquarters
AS
BEGIN
DECLARE @headquartersFiles TABLE
(
hq int,
countryFile varchar(MAX)
);
SET NOCOUNT ON
INSERT INTO @headquartersFiles
SELECT headquarter, CONCAT(country, ' (', file, ')')
FROM
(
SELECT DISTINCT CONVERT(int, headquarter) as headquarter,
CONVERT(int, country) as country,
file
FROM LargeTable1
UNION
SELECT DISTINCT headquarter,
country,
file
FROM LargeTable2
UNION
SELECT DISTINCT headquarter,
country,
file
FROM LargeTable3
UNION
SELECT DISTINCT headquarter,
country,
file
FROM LargeTable4
UNION
SELECT DISTINCT headquarter,
country,
file
FROM LargeTable5
UNION
SELECT DISTINCT headquarter,
country,
file
FROM LargeTable6
UNION
SELECT DISTINCT headquarter,
country,
file
FROM LargeTable7
) TC
SELECT RIGHT('000' + CAST(st.headquarter AS VARCHAR(3)), 3) as headquarter,
MAX(s.deletionDate) as deletionDate,
STUFF
(
(SELECT DISTINCT ', ' + st2.countryFile
FROM @headquartersFiles st2
WHERE st2.headquarter = st.headquarter
FOR XML PATH('')),
1,
1,
''
) countryFile
FROM @headquartersFiles as st
LEFT JOIN headquarters s ON CONVERT(int, s.headquarter) = st.headquarter
WHERE s.headquarter IS NULL
OR s.deletionDate IS NOT NULL
GROUP BY st.headquarter
END
This SP's performance isn't good enough for our application. It currently takes around 50 seconds to complete, with the following total row counts per table (just to give you an idea of the sizes):
LargeTable1: 1516666 rows
LargeTable2: 645740 rows
LargeTable3: 1950121 rows
LargeTable4: 779336 rows
LargeTable5: 1100999 rows
LargeTable6: 16499 rows
LargeTable7: 24454 rows
What can I do to improve performance? I've tried the following, with not much difference:
Inserting into the local table in batches, excluding those headquarters I've already inserted, and then updating the countryFile field for those that are repeated
Creating a view for that UNION query
Creating indexes for the LargeTables for the headquarter field
I've also thought about inserting these missing headquarters into a permanent table after the LargeTables change, but the Headquarters table can change more often, and I would prefer not to modify its module just to keep these things tidy and updated. But if that's the best possible alternative, I'd go for it.
Thanks
Take this filter
LEFT JOIN headquarters s ON CONVERT(int, s.headquarter) = st.headquarter
WHERE s.headquarter IS NULL
OR s.deletionDate IS NOT NULL
And add it to each individual query in the UNION before inserting into @headquartersFiles.
It might seem like this adds a lot more filters, but it will actually speed things up, because you are filtering before you start processing the UNION.
Also, take out all your DISTINCTs. It probably won't speed things up, but they are redundant anyway, since you are doing a UNION and not a UNION ALL.
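For example, the LargeTable2 branch might become (a sketch; the DISTINCT is dropped as suggested above, since UNION de-duplicates anyway):
SELECT lt.headquarter, lt.country, lt.file
FROM LargeTable2 lt
LEFT JOIN headquarters s ON CONVERT(int, s.headquarter) = lt.headquarter
WHERE s.headquarter IS NULL
OR s.deletionDate IS NOT NULL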
Do the filtering at each step. But first, modify the headquarters table so it has the right type for what you need . . . along with an index:
alter table headquarters add headquarter_int as (cast(headquarter as int));
create index idx_headquarters_int on headquarters(headquarter_int);
SELECT DISTINCT headquarter, country, file
FROM LargeTable5 lt5
WHERE NOT EXISTS (SELECT 1
FROM headquarters s
WHERE s.headquarter_int = lt5.headquarter and s.deletiondate is null
);
Then, you want an index on LargeTable5(headquarter, country, file).
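Spelled out, that index would look like this (the name is illustrative):
-- [file] is bracketed because FILE is a reserved keyword in T-SQL
CREATE INDEX idx_LargeTable5_hq_country_file
ON LargeTable5 (headquarter, country, [file]);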
This should take less than 5 seconds to run. If so, then construct the full query, being sure that the types in the correlated subquery match and that you have the right index on the full table. Use union to remove duplicates between the tables.
I'd try doing the filtering with each individual table first. You just need to account for the fact that a headquarter might appear in one table, but not another. You can do this like so:
SELECT
headquarter
FROM
(
SELECT DISTINCT
headquarter,
'table1' AS large_table
FROM
LargeTable1 LT
LEFT OUTER JOIN Headquarters HQ ON HQ.headquarter = LT.headquarter
WHERE
HQ.headquarter IS NULL OR
HQ.deletion_date IS NOT NULL
UNION ALL
SELECT DISTINCT
headquarter,
'table2' AS large_table
FROM
LargeTable2 LT
LEFT OUTER JOIN Headquarters HQ ON HQ.headquarter = LT.headquarter
WHERE
HQ.headquarter IS NULL OR
HQ.deletion_date IS NOT NULL
UNION ALL
...
) SQ
GROUP BY headquarter
HAVING COUNT(*) = 7
That would make sure that it's missing from all seven tables.
Table variables can have horrible performance because SQL Server does not generate statistics for them. Instead of a table variable, try using a temp table, and if headquarter + country + file is unique in the temp table, add a unique constraint (which will create a clustered index) in the temp table definition. You can set indexes on a temp table after creating it, but for various reasons SQL Server may ignore them.
Edit: as it turns out, you can in fact create indexes on table variables in 2014+, even non-unique ones.
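A sketch of the temp-table suggestion above (the varchar length is an assumption: a unique constraint cannot cover a varchar(max) column, so countryFile is narrowed here):
CREATE TABLE #headquartersFiles
(
hq int NOT NULL,
countryFile varchar(400) NOT NULL,
-- the unique constraint doubles as the clustered index
CONSTRAINT UQ_headquartersFiles UNIQUE CLUSTERED (hq, countryFile)
);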
Secondly, try not to use functions in your joins or where clauses - doing so often causes performance problems.
The real answer is to create separate INSERT statements for each table with the caveat that data to be inserted does not exist in the destination table.

compare some lists in where condition sql

I have a question about SQL Server 2012. I have a table containing a field that stores which systems use this information, separated by ','. I want to pass the system names in as a parameter and query the related rows:
declare @System nvarchar(50)
set @System = 'BPM,SEM'
SELECT *
FROM dbo.tblMeasureCatalog t1
where ( ( select Upper(value) from dbo.split(t1.System,','))
= any( select Upper(value) from dbo.split(@System,',')))
dbo.split is a function that returns the delimited items as separate rows.
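The question does not include dbo.split itself, so here is a minimal sketch of such a function for SQL Server 2012 (an assumption, not the asker's actual code; on SQL Server 2016+ the built-in STRING_SPLIT can be used instead):
CREATE FUNCTION dbo.split (@list nvarchar(max), @delimiter nchar(1))
RETURNS TABLE
AS
RETURN
(
    -- walk the string, slicing out one item per recursion step;
    -- the default MAXRECURSION setting caps this at ~100 items
    WITH pieces (start, stop) AS
    (
        SELECT CAST(1 AS bigint), CHARINDEX(@delimiter, @list)
        UNION ALL
        SELECT stop + 1, CHARINDEX(@delimiter, @list, stop + 1)
        FROM pieces
        WHERE stop > 0
    )
    SELECT SUBSTRING(@list, start,
                     CASE WHEN stop > 0 THEN stop - start
                          ELSE LEN(@list) - start + 1 END) AS value
    FROM pieces
);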
Forgetting for a second that storing delimited lists in a relational database is abhorrent, you can do it using a combination of INTERSECT and EXISTS, for example:
DECLARE @System NVARCHAR(50) = 'BPM,SEM';
DECLARE @tblMeasureCatalog TABLE (System VARCHAR(MAX));
INSERT @tblMeasureCatalog VALUES ('BPM,XXX'), ('BPM,SEM'), ('XXX,SEM'), ('XXX,YYY');
SELECT mc.System
FROM @tblMeasureCatalog AS mc
WHERE EXISTS
( SELECT Value
FROM dbo.Split(mc.System, ',')
INTERSECT
SELECT Value
FROM dbo.Split(@System, ',')
);
Returns
System
---------
BPM,XXX
BPM,SEM
XXX,SEM
EDIT
Based on your question saying "any", I assumed that you wanted rows where the terms matched any of those provided; based on your comment, I now assume you want records where the terms match all. This is a fairly similar approach, but you need to use NOT EXISTS and EXCEPT instead.
Now "all" is still quite ambiguous. For example, if you search for "BPM,SEM", should it return a record that is "BPM,SEM,YYY"? It does contain all of the searched terms, but it contains additional terms too. So the approach you need depends on your requirements:
DECLARE @System NVARCHAR(50) = 'BPM,SEM,XXX';
DECLARE @tblMeasureCatalog TABLE (System VARCHAR(MAX));
INSERT @tblMeasureCatalog
VALUES
('BPM,XXX'), ('BPM,SEM'), ('XXX,SEM'), ('XXX,YYY'),
('SEM,BPM'), ('SEM,BPM,XXX'), ('SEM,BPM,XXX,YYY');
-- METHOD 1 - CONTAINS ALL SEARCHED TERMS BUT CAN CONTAIN ADDITIONAL TERMS
SELECT mc.System
FROM @tblMeasureCatalog AS mc
WHERE NOT EXISTS
(
SELECT Value
FROM dbo.Split(@System, ',')
EXCEPT
SELECT Value
FROM dbo.Split(mc.System, ',')
);
-- METHOD 2 - ONLY CONTAINS ITEMS WITHIN THE SEARCHED TERMS, BUT NOT
-- NECESSARILY ALL OF THEM
SELECT mc.System
FROM @tblMeasureCatalog AS mc
WHERE NOT EXISTS
( SELECT Value
FROM dbo.Split(mc.System, ',')
EXCEPT
SELECT Value
FROM dbo.Split(@System, ',')
);
-- METHOD 3 - CONTAINS ALL ITEMS IN THE SEARCHED TERMS, AND NO ADDITIONAL ITEMS
SELECT mc.System
FROM @tblMeasureCatalog AS mc
WHERE NOT EXISTS
( SELECT Value
FROM dbo.Split(@System, ',')
EXCEPT
SELECT Value
FROM dbo.Split(mc.System, ',')
)
AND LEN(mc.System) = LEN(@System);
You have a problem with your data structure because you are storing lists of things in a comma-delimited list. SQL has a great data structure for storing lists. It goes by the name "table". You should have a junction table with one row per "measure catalog" and "system".
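A sketch of that junction table (names and the key column are assumptions about your schema):
CREATE TABLE tblMeasureCatalogSystem
(
MeasureCatalogId int NOT NULL,      -- assumed key of tblMeasureCatalog
SystemName nvarchar(50) NOT NULL,   -- one row per system, e.g. 'BPM'
PRIMARY KEY (MeasureCatalogId, SystemName)
);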
Sometimes, you are stuck with other people's really bad design decisions. One solution is to use split(). Here is one method:
select mc.*
from dbo.tblMeasureCatalog mc
where exists (select 1
from dbo.split(mc.System, ',') t1s join
dbo.split(@System, ',') ss
on upper(t1s.value) = upper(ss.value)
);
you can try this :
declare @System nvarchar(50)
set @System = 'BPM,SEM'
SELECT * from dbo.tblMeasureCatalog t1 inner join dbo.Split(@System, ',') B on t1.it = B.items

Check if a list of items already exists in a SQL database

I want to create a group of users only if the same group does not exist already in the database.
I have a GroupUser table with three columns: a primary key, a GroupId, and a UserId. A group of users is described as several lines in this table sharing a same GroupId.
Given a list of UserId, I would like to find a matching GroupId, if it exists.
What is the most efficient way to do that in SQL?
Let's say your UserId list is stored in a table called MyUserIDList. The following query will efficiently return the GroupIds whose members are all contained in your user list (SQL Server syntax). Note that this checks containment, not equality; an exact-match variant is sketched after the query.
Select GroupId
From (
Select GroupId
, count(*) as GroupMemberCount
, Sum(case when MyUserIDList.UserID is null then 0 else 1 End) as GroupMemberCountInMyList
from GroupUser
left outer join MyUserIDList on GroupUser.UserID=MyUserIDList.UserID
group by GroupId
) As MySubQuery
Where GroupMemberCount=GroupMemberCountInMyList
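A sketch of the exact-match variant (only the last predicate is new):
Select GroupId
From (
Select GroupId
, count(*) as GroupMemberCount
, Sum(case when MyUserIDList.UserID is null then 0 else 1 End) as GroupMemberCountInMyList
from GroupUser
left outer join MyUserIDList on GroupUser.UserID=MyUserIDList.UserID
group by GroupId
) As MySubQuery
Where GroupMemberCount=GroupMemberCountInMyList
-- new: the group must also be the same size as the input list,
-- so it cannot be a strict subset of it
And GroupMemberCount = (Select count(*) From MyUserIDList)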
There are a couple of ways of doing this. This answer is for SQL Server only (as you have not mentioned it in your tags):
Pass the list of user ids as a comma-separated string to a stored procedure, and inside the SP build a dynamic query with it and use the EXEC command to execute the query. This link will guide you in this regard.
Use a table-valued parameter in an SP. This is applicable to SQL Server 2008 and higher only; a sketch follows below.
The following link will help you get started.
http://www.codeproject.com/Articles/113458/TSQL-Passing-array-list-set-to-stored-procedure-MS
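A minimal sketch of the table-valued-parameter approach (type, procedure, and column names are illustrative, not from the question):
-- one-time setup: a table type that carries the list of user ids
CREATE TYPE dbo.UserIdList AS TABLE (UserId int NOT NULL PRIMARY KEY);
GO
CREATE PROCEDURE dbo.FindMatchingGroups
    @UserIds dbo.UserIdList READONLY
AS
BEGIN
    -- same counting idea as the first answer, with the TVP as the list
    SELECT GroupId
    FROM (
        SELECT GroupId,
               COUNT(*) AS GroupMemberCount,
               SUM(CASE WHEN u.UserId IS NULL THEN 0 ELSE 1 END) AS InListCount
        FROM GroupUser g
        LEFT JOIN @UserIds u ON g.UserId = u.UserId
        GROUP BY GroupId
    ) AS q
    WHERE GroupMemberCount = InListCount;
END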
Hope this helps.
One other solution is to convert the input list into a table. This can be done with various approaches: unions, temporary tables, and others. A neat solution combines user1461607's answer to another question here on SO, using a comma-separated string (note that the example below uses SQLite syntax, as the original answer did):
WITH split(word, csv) AS (
-- 'initial query' (see SQLite docs linked above)
SELECT
'', -- place holder for each word we are looking for
'Auto,A,1234444,' -- items you are looking for
-- make sure the list ends with a comma !!
UNION ALL SELECT
substr(csv, 0, instr(csv, ',')), -- each word contains text up to next ','
substr(csv, instr(csv, ',') + 1) -- next recursion parses csv after this ','
FROM split -- recurse
WHERE csv != '' -- break recursion once no more csv words exist
) SELECT word, existing_data
FROM split s
-- now join the key you want to check for existence!
-- for demonstration purpose, I use an outer join
LEFT OUTER JOIN (select 'A' as existing_data) as t on t.existing_data = s.word
WHERE s.word != '' -- make sure we clamp the empty strings from the split function
;
Results in:
Auto,null
A,A
1234444,null