SQL: get PK IDs that are in a CSV list (or field) but not in another CSV list

(Edited to add info on the context)
I have 2 fields in Table A containing CSV lists of IDs of records in 2 other tables. The "USERS" field contains a CSV list of records in USERS_TABLE; the "CONTACTS" field contains a CSV list of records in CONTACTS_TABLE:
USERS_FIELD: "1,2,3,4,5,6"
CONTACTS_FIELD: "2,4,6,8"
I want to find all records that are in USERS_FIELD list but not in CONTACTS_FIELD list. In this instance I want records 1,3,5. The lists can be anywhere from 1 ID to hundreds.
The solution has to run in the WHERE clause of a query. My environment is a VBScript-based scripting language inside a COTS product: MicroFocus/Serena SBM running on MS Windows Server and SQL Server 2012. The scripting language lets me specify the WHERE and ORDER BY clauses, and it then runs the query and returns the results. Storing multiple record IDs as CSV is built into the product. I can't do anything about that, and I can't create SQL temp tables or define SQL functions either. The host scripting language's implementation removed arrays and the "Split" function. I can parse each CSV into a Dictionary object, but iterating over a pair of those, each with several hundred elements, is not fast, and all of this happens while the end user waits for a web page to load. Again, that's how the product was designed.
Can I use a UNION-type set operator and do something like:
Select ID from USERS_TABLE Where ID in USERS_FIELD
MINUS
Select ID from CONTACTS_TABLE Where ID in CONTACTS_FIELD

I'm not sure I follow the requirement that the solution has to run in the WHERE clause. If you're using SQL Server 2017 or later, you can take advantage of the STRING_SPLIT (available since SQL Server 2016) and STRING_AGG functions.
DROP TABLE IF EXISTS #A; -- DROP TABLE IF EXISTS needs SQL Server 2016+
CREATE TABLE #A (id INT PRIMARY KEY IDENTITY, users VARCHAR(MAX), contacts VARCHAR(MAX));
INSERT INTO #A (users, contacts)
VALUES
('1,2,3,4,5,6', '2,4,6,8'),
('3,5,6', '4,6,9'),
('2,4,7,9', '2,4,9');
-- One output row per #A row, listing the users that are not contacts:
-- id 1 -> '1,3,5', id 2 -> '3,5', id 3 -> '7'
SELECT
A.id,
A.users,
A.contacts,
STRING_AGG(B.value, ',') WITHIN GROUP (ORDER BY CAST(B.value AS INT)) AS users_not_in_contacts
FROM #A A
CROSS APPLY STRING_SPLIT(A.users, ',') B
WHERE NOT EXISTS (SELECT * FROM STRING_SPLIT(A.contacts, ',') X1 WHERE B.value = X1.value) -- user is not in contacts
GROUP BY
A.id,
A.users,
A.contacts;
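The question mentions SQL Server 2012, which has neither STRING_SPLIT (2016) nor STRING_AGG (2017). A minimal sketch of a WHERE-only alternative that should run on 2012, assuming the host script can splice the two field values into the WHERE clause as string literals and that the lists hold only digits and commas (no spaces): wrap both the list and each ID in commas and compare with LIKE. `#USERS_TABLE`, `@users_field` and `@contacts_field` are hypothetical stand-ins for the real table and field values.

```sql
-- Hypothetical stand-in for USERS_TABLE; 10 is included to show that the
-- comma wrapping keeps "1" from matching inside "10".
CREATE TABLE #USERS_TABLE (ID int PRIMARY KEY);
INSERT INTO #USERS_TABLE (ID) VALUES (1), (2), (3), (4), (5), (6), (7), (8), (10);

-- Hypothetical variables standing in for the USERS_FIELD / CONTACTS_FIELD
-- values the host script would splice into the WHERE clause.
DECLARE @users_field varchar(max) = '1,2,3,4,5,6';
DECLARE @contacts_field varchar(max) = '2,4,6,8';

SELECT u.ID
FROM #USERS_TABLE AS u
-- ',1,2,3,4,5,6,' LIKE '%,3,%' is true, '%,10,%' is not: only whole IDs match.
WHERE ',' + @users_field + ',' LIKE '%,' + CAST(u.ID AS varchar(11)) + ',%'
  AND ',' + @contacts_field + ',' NOT LIKE '%,' + CAST(u.ID AS varchar(11)) + ',%'
ORDER BY u.ID;
-- returns 1, 3, 5
```

The LIKE tests can't use an index on ID, which is probably acceptable for lists of a few hundred IDs, but it is worth measuring in the real environment.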

Related

Trying to append a query onto a temporary table

Using SQL Server 2008 R2
I have a CSV of purchase IDs, and in my database there is a table with these purchase IDs and their corresponding user IDs in our system. I need these to run a more complicated query afterwards. I tried BULK INSERT and the Import Wizard, but I don't have permission. My new idea is to create a temp table using SELECT INTO and put the query inside it, like below.
SELECT *
INTO ##PurchaseIDs
FROM
(
SELECT PurchaseID, UserID, Added
FROM Users
WHERE PurchaseID IN (
/* These are the csv IDs just copied and pasted in */
'49397828',
'49397883',
etc.
What happens is that there are ~55,000 IDs so I get this error.
The query processor ran out of internal resources and could not
produce a query plan. This is a rare event and only expected for
extremely complex queries or queries that reference a very large
number of tables or partitions. Please simplify the query. If you
believe you have received this message in error, contact Customer
Support Services for more information.
It works if I upload about 30,000, so my new plan is to create the temp table and then append the remaining rows to it. I am also open to other ideas on how to accomplish this. Here is roughly what I have in mind:
INSERT INTO ##PurchaseIDs (PurchaseID, UserID, Added)
SELECT PurchaseID, UserID, Added
FROM Users
WHERE PurchaseID IN (
/* These are the OTHER csv IDS just copied and pasted in */
'57397828',
'57397883',
etc.
You need to create a temp table, insert the values from the IN clause into it, and join to the temp table to get the result.
Create table #PurchaseIDs (PurchaseID int)
insert into #PurchaseIDs (PurchaseID)
Select '57397828'
Union All
Select '57397883'
Union All
......
-- ...remaining values from the CSV
Now use EXISTS to check for each PurchaseID in the temp table instead of an IN clause:
SELECT PurchaseID,
UserID,
Added
FROM Users u
WHERE EXISTS (SELECT 1
FROM #PurchaseIDs p
WHERE u.PurchaseID = p.PurchaseID)
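SQL Server 2008 R2 also accepts multi-row VALUES lists, which are shorter than chains of SELECT ... UNION ALL. A single VALUES list is limited to 1000 rows, so the ~55,000 IDs would go in as a series of INSERTs. A sketch, assuming PurchaseID is an integer; `#Users` and its sample rows are hypothetical stand-ins for the real Users table, and the IDs are the ones from the question.

```sql
-- Hypothetical stand-in for the real Users table.
CREATE TABLE #Users (PurchaseID int, UserID int, Added datetime);
INSERT INTO #Users (PurchaseID, UserID, Added)
VALUES (49397828, 1, '20130101'), (57397883, 2, '20130102'), (11111111, 3, '20130103');

CREATE TABLE #PurchaseIDs (PurchaseID int);

-- One VALUES list accepts at most 1000 rows, so paste the CSV
-- as several INSERT statements of up to 1000 IDs each.
INSERT INTO #PurchaseIDs (PurchaseID)
VALUES (49397828), (49397883);   -- first chunk (up to 1000 IDs)
INSERT INTO #PurchaseIDs (PurchaseID)
VALUES (57397828), (57397883);   -- next chunk, and so on

-- Index the ID list after loading so the EXISTS check has an index to use.
CREATE CLUSTERED INDEX IX_PurchaseIDs ON #PurchaseIDs (PurchaseID);

SELECT u.PurchaseID, u.UserID, u.Added
FROM #Users AS u
WHERE EXISTS (SELECT 1 FROM #PurchaseIDs AS p WHERE p.PurchaseID = u.PurchaseID);
```

Because each statement only carries up to 1000 literals, this should stay well clear of the query-plan resource error that the single huge IN list triggered.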

Retrieving entries of linked list in relational database

For my project, I implemented a linked list in an RDBMS. The linked list uses the rowid column as a pointer, and each entry contains prior, next and owner pointers (the owner points to a row in a different table).
A simple example looks like this:
CREATE TABLE EMPLOYEE
(
EMP_ID NUMBER(4) NOT NULL,
OFFICE_CODE CHAR(2),
OFF_EMP_prior ROWID,
OFF_EMP_next ROWID,
OFF_EMP_owner ROWID
);
{EMP1,(NULL,EMP2,OFF1)} - {EMP2,(EMP1,EMP3,OFF1)} - {EMP3,(EMP2,NULL,OFF1)}
Now I have to implement a retrieval function like "Find the nth (integer) entry of the list that has 'OFF1' as its owner".
This can be simply done by using loop to traverse the linked list. But this requires too many SQL operations for one retrieval. (I know that using sequence number can be another option, but this is the decision made so far.)
Instead, I found SELECT ... CONNECT BY in Oracle SQL and tried:
select * from EMPLOYEE
where OFF_EMP_owner = [OFF_ROWID]
connect by nocycle OFF_EMP_prior = rowid;
This query works for retrieving entries of the list, but the order of the result is not as I expected (something like EMP3-EMP1-EMP2).
Is it possible to retrieve the entries of the linked list, sorted in list order, with SELECT ... CONNECT BY? Or is there a more suitable SQL construct?
select * from EMPLOYEE
where OFF_EMP_owner = [OWNER_ROWID]
start with OFF_EMP_prior is NULL
connect by OFF_EMP_prior = prior rowid;
Solved the problem with the query above: START WITH begins at the head of the list, and 'prior' should be used in the CONNECT BY condition instead of nocycle.
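The original goal was the nth entry rather than the whole list. A sketch of one way to get it in Oracle, using the LEVEL pseudocolumn (1 for the START WITH row, 2 for its child, and so on) and starting only at the head of the list owned by one office. The OFFICE table, the EMP_ID values and the office code 'O1' are hypothetical sample data; EMPLOYEE is recreated here so the script runs on its own.

```sql
-- Hypothetical sample data: one office owning the list EMP1 -> EMP2 -> EMP3.
CREATE TABLE OFFICE (OFFICE_CODE CHAR(2));
CREATE TABLE EMPLOYEE (
  EMP_ID NUMBER(4) NOT NULL,
  OFFICE_CODE CHAR(2),
  OFF_EMP_prior ROWID,
  OFF_EMP_next ROWID,
  OFF_EMP_owner ROWID
);
INSERT INTO OFFICE (OFFICE_CODE) VALUES ('O1');
INSERT INTO EMPLOYEE (EMP_ID, OFFICE_CODE) VALUES (1, 'O1');
INSERT INTO EMPLOYEE (EMP_ID, OFFICE_CODE) VALUES (2, 'O1');
INSERT INTO EMPLOYEE (EMP_ID, OFFICE_CODE) VALUES (3, 'O1');

-- Wire up the prior/next/owner ROWID pointers.
UPDATE EMPLOYEE e
SET OFF_EMP_owner = (SELECT o.ROWID FROM OFFICE o WHERE o.OFFICE_CODE = e.OFFICE_CODE),
    OFF_EMP_prior = (SELECT p.ROWID FROM EMPLOYEE p WHERE p.EMP_ID = e.EMP_ID - 1),
    OFF_EMP_next  = (SELECT x.ROWID FROM EMPLOYEE x WHERE x.EMP_ID = e.EMP_ID + 1);

-- WHERE is applied after the hierarchy is built, so LEVEL = n picks the
-- nth entry (here n = 2; a bind variable would be used in practice).
SELECT EMP_ID, LEVEL AS list_pos
FROM EMPLOYEE
WHERE LEVEL = 2
START WITH OFF_EMP_prior IS NULL
       AND OFF_EMP_owner = (SELECT ROWID FROM OFFICE WHERE OFFICE_CODE = 'O1')
CONNECT BY OFF_EMP_prior = PRIOR ROWID;
```

Putting the owner condition in START WITH, rather than in WHERE, means only the chosen list is walked instead of every list in the table.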

How do I stitch together tables using SQL?

Ok, I am learning SQL and just installed SQL Server. I've read about outer joins and inner joins but am not sure either is what I want. Basically, I want to reconstruct a text file that has been "chopped" up into 5 smaller text files. The columns are the same across all 5 text files, e.g. name, age, telephone #, etc. The only difference is that they have different numbers of rows of data.
What I'd like to do is "append" the data from each file into one "mega-file". Should I create a table containing all of the data, or just create a view? Then, how do I implement this? Do I use UNION? Any guidance would be appreciated, thanks.
Beyond your immediate goal of merging the five files it sounds like you want the data contained in your text files to be generally available for more flexible analysis.
An example of why you might require this is if you need to merge other data with the data in your text files. (If this is not the case then Oded is right on the money, and you should simply use logparser or Visual Log Parser.)
Since your text files all contain the same columns you can insert them into one table*.
1. Issue a CREATE statement defining your table
2. Insert data into your newly created table**
3. Create an index on field(s) which might often be used in query predicates
4. Write a query or create a view to provide the data you need
*Once you have your data in a table you can think about creating views on the table, but to start you might just run some ad hoc queries.
**Note that it is possible to accomplish Step 2 in other ways. Alternatively you can programmatically construct and issue your INSERT statements.
Examples of each of the above steps are included below, and a tested example can be found at: http://sqlfiddle.com/#!6/432f7/1
-- 1.
CREATE TABLE mytable
(
id int identity primary key,
person_name varchar(200),
age integer,
tel_num varchar(20)
);
-- 2. or look into BULK INSERT option https://stackoverflow.com/q/11016223/42346
INSERT INTO mytable
(person_name, age, tel_num)
VALUES
('Jane Doe', 31, '888-888-8888'),
('John Smith', 24, '888-555-1234');
-- 3.
CREATE INDEX mytable_age_idx ON mytable (age); -- not UNIQUE: several people can share an age
-- 4.
SELECT id, person_name, age, tel_num
FROM mytable
WHERE age < 30;
You need to look into using UNION (or UNION ALL, which keeps duplicate rows instead of removing them).
SELECT *
FROM TABLE1
UNION
SELECT *
FROM TABLE2
And I would just create a view; there's no need for a stored table, especially if the data ever changes.
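A sketch of the view approach for all five files, assuming each file has already been loaded into its own table. `part1` to `part5` and `all_parts` are hypothetical names, and the columns follow the mytable example. UNION ALL is used because the files are pieces of one original file, so identical rows should all be kept.

```sql
-- Hypothetical tables, one per imported text file, all with the same columns.
CREATE TABLE part1 (person_name varchar(200), age int, tel_num varchar(20));
CREATE TABLE part2 (person_name varchar(200), age int, tel_num varchar(20));
CREATE TABLE part3 (person_name varchar(200), age int, tel_num varchar(20));
CREATE TABLE part4 (person_name varchar(200), age int, tel_num varchar(20));
CREATE TABLE part5 (person_name varchar(200), age int, tel_num varchar(20));
INSERT INTO part1 (person_name, age, tel_num) VALUES ('Jane Doe', 31, '888-888-8888');
INSERT INTO part2 (person_name, age, tel_num) VALUES ('John Smith', 24, '888-555-1234');
-- CREATE VIEW must be the only statement in its batch, hence the GO lines.
-- UNION ALL keeps every row; plain UNION would also drop duplicate rows.
GO
CREATE VIEW all_parts AS
SELECT person_name, age, tel_num FROM part1
UNION ALL
SELECT person_name, age, tel_num FROM part2
UNION ALL
SELECT person_name, age, tel_num FROM part3
UNION ALL
SELECT person_name, age, tel_num FROM part4
UNION ALL
SELECT person_name, age, tel_num FROM part5;
GO
SELECT person_name, age, tel_num FROM all_parts WHERE age < 30;
```

Listing the columns explicitly, rather than SELECT *, keeps the view stable if one of the tables later gains an extra column.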

SQL: I need to take two fields I get as a result of a SELECT COUNT statement and populate a temp table with them

So I have a table with a lot of information and a lot of records. There is one field in particular I care about, in this case #BegAttField#, which only a subset of records have populated. Many of those records share the same value.
What I need to do is get a count (minus 1) of all duplicates, then populate the first record in the bunch with that count value in a new field. I have another field I call BegProd that will match #BegAttField# for each "first" record.
I'm just stuck on how to make this happen. I may have been on the right path, but who knows. The SELECT statement gets me two fields and as many records as there are unique #BegAttField# values, but once I have them, I haven't been able to work with them.
Here's my whole set of code, trying to use a temporary table and SELECT INTO to try and populate it. (Note: the fields with # around the names are variables for this 3rd party app)
CREATE TABLE #temp (AttCount int, BegProd varchar(255))
SELECT COUNT(d.[#BegAttField#])-1 AS AttCount, d.[#BegAttField#] AS BegProd
INTO [#temp] FROM [Document] d
WHERE d.[#BegAttField#] IS NOT NULL GROUP BY [#BegAttField#]
UPDATE [Document] d SET d.[#NumAttach#] =
SELECT t.[AttCount] FROM [#temp] t INNER JOIN [Document] d1
WHERE t.[BegProd] = d1.[#BegAttField#]
DROP TABLE #temp
Unfortunately I'm running this script through a 3rd party database application that uses SQL as its back-end. So the errors I get are simply: "There is already an object named '#temp' in the database. Incorrect syntax near the keyword 'WHERE'. "
Comment out the CREATE TABLE statement: SELECT ... INTO creates the #temp table itself.
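That fixes the first error. The second one most likely comes from the INNER JOIN with no ON clause, and the UPDATE also needs reshaping: T-SQL puts the alias in a FROM clause (UPDATE d ... FROM [Document] AS d), and a subquery used as a value needs parentheses. A corrected sketch using UPDATE ... FROM. To make it runnable outside the third-party app, `#Document`, `BegAttach` and `NumAttach` are hypothetical stand-ins for [Document], #BegAttField# and #NumAttach#, and it assumes Document has the BegProd field the question describes, equal to the BegAttField value on each "first" record.

```sql
-- Hypothetical stand-in for [Document]: doc 1 heads a family of three
-- (docs 2 and 3 are attachments), doc 4 has no attachments, doc 5 is unrelated.
CREATE TABLE #Document (DocID int, BegProd varchar(255), BegAttach varchar(255), NumAttach int);
INSERT INTO #Document (DocID, BegProd, BegAttach)
VALUES (1, 'A', 'A'), (2, 'A2', 'A'), (3, 'A3', 'A'), (4, 'B', 'B'), (5, 'C', NULL);

-- SELECT ... INTO creates #temp itself, so there is no CREATE TABLE first.
SELECT COUNT(d.BegAttach) - 1 AS AttCount, d.BegAttach AS BegProd
INTO #temp
FROM #Document AS d
WHERE d.BegAttach IS NOT NULL
GROUP BY d.BegAttach;

-- Alias after UPDATE, table and join in FROM, join condition in ON.
-- Matching on BegProd updates only the "first" record of each family.
UPDATE d
SET d.NumAttach = t.AttCount
FROM #Document AS d
INNER JOIN #temp AS t ON t.BegProd = d.BegProd;

DROP TABLE #temp;
```

After this runs, doc 1 holds 2, doc 4 holds 0, and the other rows are left NULL.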

SQL query select from table and group on other column

I'm phrasing the question title poorly as I'm not sure what to call what I'm trying to do but it really should be simple.
I've a link / join table with two ID columns. I want to run a check before saving new rows to the table.
The user can save attributes through a webpage, but I need to check that the same combination doesn't exist before saving it. With one record it's easy: you just check whether that attributeId is already in the table, and if it is, you don't allow them to save it again.
However, if the user chooses a combination of that attribute and another one then they should be allowed to save it.
Here's an image of what I mean:
So if a user now tried to save an attribute with ID 1, it would stop them, but I need it to also stop them if they tried IDs 1 and 10, as long as both 1 and 10 have the same productAttributeId.
I'm confusing this in my explanation but I'm hoping the image will clarify what I need to do.
This should be simple so I presume I'm missing something.
If I understand the question properly, you want to prevent the combination of AttributeId and ProductAttributeId from being reused. If that's the case, simply make them a combined primary key, which is by nature UNIQUE.
If that's not feasible, create a stored procedure that queries the join table for the AttributeId. If the query finds no rows, insert the row.
Here's some light code to present the idea (may need to be modified to work with your database):
-- @RequestedID: the attribute ID passed into the stored procedure
IF NOT EXISTS (SELECT 1 FROM MyJoinTable WHERE AttributeId = @RequestedID)
BEGIN
INSERT INTO MyJoinTable ...
END
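A sketch of the composite-key suggestion, using a hypothetical temp-table stand-in for MyJoinTable and the sample pairs from the other answer. On an existing table the key would be added with ALTER TABLE instead, once any duplicate pairs are removed.

```sql
-- Hypothetical stand-in for the join table; the composite primary key
-- makes each (AttributeId, ProductAttributeId) pair unique.
CREATE TABLE #MyJoinTable (
    AttributeId int NOT NULL,
    ProductAttributeId int NOT NULL,
    PRIMARY KEY (AttributeId, ProductAttributeId)
);
INSERT INTO #MyJoinTable (AttributeId, ProductAttributeId) VALUES (1, 1), (1, 2), (10, 2);

BEGIN TRY
    -- Same pair again: rejected with error 2627 (PRIMARY KEY violation).
    INSERT INTO #MyJoinTable (AttributeId, ProductAttributeId) VALUES (1, 2);
END TRY
BEGIN CATCH
    PRINT ERROR_MESSAGE();
END CATCH;
```

This only blocks repeated pairs; blocking a repeated combination of several attributes needs a check like the one in the next answer.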
You can control your inserts via a stored procedure. My understanding is that users can select a combination of attributes, such as just 1; 1 and 10 together; or 1, 4, 5, 10 (4 attributes).
These need to enter the table as a single "batch" against a (new?) productAttributeId
So if (1,10) was chosen, this needs to be blocked because 1-2 and 10-2 already exist.
What I suggest
The stored procedure should take the attributes as a single list, e.g. '1,2,3' (comma separated, no spaces, just integers)
You can then use a string splitting UDF or an inline XML trick (as shown below) to break it into rows of a derived table.
Test table
create table attrib (attributeid int, productattributeid int)
insert attrib select 1,1
insert attrib select 1,2
insert attrib select 10,2
Here I use a variable, but you can make it an input parameter of the stored procedure.
declare @t nvarchar(max) set @t = '1,2,10'
select top(1)
t.productattributeid,
count(t.productattributeid) count_attrib,
-- number of IDs in the input = number of commas + 1
len(@t) - len(replace(@t, ',', '')) + 1 count_input
from (select convert(xml,'<a>' + replace(@t,',','</a><a>') + '</a>') x) x
cross apply x.x.nodes('a') n(c)
cross apply (select n.c.value('.','int')) a(attributeid)
left join attrib t on t.attributeid = a.attributeid
group by t.productattributeid
order by count_attrib desc
Output
productattributeid count_attrib count_input
2 2 3
The 1st column gives you the productattributeid that has the most matches
The 2nd column gives you how many attributes were matched using the same productattributeid
The 3rd column is how many attributes exist in the input
Compare the last two columns:
if the counts match, you can use the productattributeid to attach to the product that has all these attributes;
if they don't match, you need to do an insert to create a new combination.