SQL - Stripping a string and using it in a condition - sql

I have been given a SQL query issue which I'm struggling to resolve.
The query currently brings back 6710445 rows, but I need to apply further conditions based on a particular string field.
SELECT
Table1.ExampleColumn1 -- (ID)
,Table1.ExampleColumn2
,Table2.ExampleColumn3
,Table2.ExampleColumn4
,Table3.ExampleColumn5
,Table3.ExampleColumn6
,Table1.StringField
FROM [Example Database].[dbo].[Table1] AS Table1
INNER JOIN [Example Database].[dbo].[Table2] AS Table2
ON Example = Example
INNER JOIN [Example Database].[dbo].[Table3] AS Table3
ON Example = Example
WHERE Month BETWEEN 201304 AND 201603
AND (Age < 19)
In the query above, 'Table1.StringField' holds type codes as a single delimited string in each row, for example: "||J183,Y752,J374,Y752."
I also have a reference table (call it 'Ref1') that lists 514 of these codes individually, one per row, and has no other fields.
What I need is to keep only the rows from the query above where 'Table1.StringField' contains at least one of the codes in 'Ref1'; rows that contain none of them should be excluded from the result set.
I tried stripping the commas and "||" out of the 'StringField' column, but it didn't work as well as I hoped and ended up bringing back over 30M rows.
Any ideas on how to do this? Preferably so it's efficient and doesn't make the user wait 10 minutes just to query it?

Maybe this will get you halfway there. I also agree with Sean Lange's comment about not storing delimited data to begin with, but I'm assuming the OP already knows this. You could also pivot/unpivot the data to achieve the same thing. The query below is probably the most brute-force way of doing roughly what you're looking for.
--DROP TABLE #Table
--DROP TABLE #Ref
CREATE TABLE #Table (Col VARCHAR(MAX))
CREATE TABLE #Ref (Code VARCHAR(10))
INSERT INTO #Table (Col) VALUES ('A123,B234,C345'),('A123'),('C345')
INSERT INTO #Ref (Code) VALUES ('A123'),('B234')
SELECT * FROM #Table
SELECT * FROM #Ref
SELECT DISTINCT t.Col
FROM #Table t
CROSS APPLY (
SELECT CASE WHEN CHARINDEX(r.Code, t.Col) > 0 THEN 1 ELSE 0 END AS [ItsHere] FROM #Ref r) oa
WHERE oa.ItsHere = 1
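One caveat with a plain substring test such as CHARINDEX is that a code like 'A123' is also found inside 'A1234'. A minimal sketch of a delimiter-aware variant follows, run through Python's sqlite3 module only so that it is self-contained. The table names t and ref, the sample rows, and the assumption that every value has the shape "||code,code." are illustrative and not taken from the real schema. On SQL Server the string concatenation would use + instead of ||.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for Table1.StringField and Ref1
    CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT);
    CREATE TABLE ref (code TEXT PRIMARY KEY);
    INSERT INTO t (col) VALUES ('||A123,B234,C345.'), ('||A1234.'), ('||C345.');
    INSERT INTO ref (code) VALUES ('A123'), ('B234');
""")

# Normalise the field to ',code,code,' and search for ',code,' so that
# 'A123' does not match inside 'A1234'. EXISTS is satisfied by the first
# matching code, so each row appears at most once and no DISTINCT is needed.
query = """
    SELECT t.id, t.col
    FROM t
    WHERE EXISTS (
        SELECT 1
        FROM ref r
        WHERE ',' || REPLACE(REPLACE(t.col, '||', ''), '.', '') || ','
              LIKE '%,' || r.code || ',%'
    )
    ORDER BY t.id
"""
matches = conn.execute(query).fetchall()
print(matches)  # [(1, '||A123,B234,C345.')]
```

This still scans every row against the reference codes, so it is unlikely to be fast on millions of rows; it only illustrates the matching logic.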

What you need to do is join your query to the Ref1 table on Table1.StringField = Ref1.Ref_1_value and then exclude the Table1 rows that don't match any Ref_1_value. Like this:
SELECT
Table1.ExampleColumn1 -- (ID)
,Table1.ExampleColumn2
,Table2.ExampleColumn3
,Table2.ExampleColumn4
,Table3.ExampleColumn5
,Table3.ExampleColumn6
,Table1.StringField
FROM [Example Database].[dbo].[Table1] AS Table1
INNER JOIN [Example Database].[dbo].[Table2] AS Table2
ON Example = Example
INNER JOIN [Example Database].[dbo].[Table3] AS Table3
ON Example = Example
INNER JOIN [Example Database].[dbo].[Ref1] as Ref1
ON Table1.StringField = Ref1.Ref_1_value
WHERE Month BETWEEN 201304 AND 201603
AND (Age < 19)
AND Ref1.Ref_1_value is not null

Related

SQL - Finding duplicates based on 3 columns with different data types

SQL noob here, let me know if I'm not wording anything right. I'm trying to find all entries where there is more than one instance of the same data in 3 columns. Below is some sample data from the 3 columns.
formatid  type_from       call_desc_code
20        002694W0:USAGE  V9
20        013030W0:USAGE  OM
20        013030W0:USAGE  NULL
From what I understand, checksum can be used for this, but the output from the query below doesn't seem right. The first part of the query, which populates the #temp table, returns 29824 results, which tells me there should be only 29824 unique combinations of the 3 columns. However, when I ran the full query and then removed duplicates in Excel based on only those 3 columns as a sanity check, I had a lot more than 29824 entries left.
The formatid column is a smallint, so when I tried simply concatenating the columns with + it returned a conversion failed error. I'm running SQL Server 2012, but I don't think the database is on the same version, as it doesn't recognise the concat function.
select checksum(formatid,type_from,call_desc_code) & checksum(reverse(formatid),reverse(type_from),reverse(call_desc_code)) as [checksum], count(*) as [Blah]
into #temp
from Table
group by checksum(formatid,type_from,call_desc_code) & checksum(reverse(formatid),reverse(type_from),reverse(call_desc_code))
having count(*) > 1
select * from
Table
where checksum(formatid,type_from,call_desc_code) & checksum(reverse(formatid),reverse(type_from),reverse(call_desc_code)) in (select [checksum] from #temp)
drop table #temp
This will get you every row from your source table that has duplicates:
select *
from table t
inner join
(select formatid,type_from,call_desc_code
from Table
group by formatid,type_from,call_desc_code
having count(*) > 1) dup
on dup.formatid = t.formatid
and dup.type_from = t.type_from
and dup.call_desc_code = t.call_desc_code
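Note that the sample data contains NULL in call_desc_code, and an equality join never matches NULL to NULL, so duplicated rows with a NULL code are silently left out of a join like the one above. A rough sketch of the difference, using Python's sqlite3 with a hypothetical table named calls and made-up rows. SQLite's IS operator is used here as a null-safe comparison; on SQL Server the usual equivalent is (a = b OR (a IS NULL AND b IS NULL)).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table and rows, modelled on the sample data in the question
    CREATE TABLE calls (formatid INTEGER, type_from TEXT, call_desc_code TEXT);
    INSERT INTO calls VALUES
        (20, '002694W0:USAGE', 'V9'),
        (20, '013030W0:USAGE', 'OM'),
        (20, '013030W0:USAGE', 'OM'),
        (20, '013030W0:USAGE', NULL),
        (20, '013030W0:USAGE', NULL);
""")

# GROUP BY puts NULLs in one group, so the NULL pair is found here.
dup_subquery = """
    SELECT formatid, type_from, call_desc_code
    FROM calls
    GROUP BY formatid, type_from, call_desc_code
    HAVING COUNT(*) > 1
"""

# Plain '=' never matches NULL to NULL, so the NULL duplicates drop out.
plain = conn.execute(f"""
    SELECT t.* FROM calls t
    JOIN ({dup_subquery}) dup
      ON dup.formatid = t.formatid
     AND dup.type_from = t.type_from
     AND dup.call_desc_code = t.call_desc_code
""").fetchall()

# IS treats NULL IS NULL as true, so the NULL duplicates are kept.
null_safe = conn.execute(f"""
    SELECT t.* FROM calls t
    JOIN ({dup_subquery}) dup
      ON dup.formatid = t.formatid
     AND dup.type_from = t.type_from
     AND dup.call_desc_code IS t.call_desc_code
""").fetchall()

print(len(plain), len(null_safe))  # 2 4
```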

SELECT query to return a row from a table with all values set to Null

I need to write a query that returns a row with every field empty. Gordon Linoff gave me the clue for this here:
SQL Empty query results
which is:
select t.*
from (select 1 as val
) v left outer join
table t
on 1 = 0;
This query works perfectly on PostgreSQL but gives an error when I try to execute it in Microsoft Access: it says that the expression 1 = 0 is not allowed. How can it be fixed to work in Microsoft Access?
If the table has a numeric primary key column whose values are non-negative then the following query will work in Access. The primary key field is [ID].
SELECT t2.*
FROM
myTable AS t2
RIGHT JOIN
(
SELECT TOP 1 (ID * -1) AS badID
FROM myTable AS t1
) AS rowStubs
ON t2.ID = rowStubs.badID
This was tested with Access 2010.
I am offering this answer here, even though you didn't think it worked in my edit to your original question. What is the problem?
select t.*
from (select max(col) as maxval from table as t
) as v left join
table as t
on v.maxval < t.col;
You can use the following query, but it would still need a little "manual coding".
EDITS:
Actually, you do not need the SWITCH function. Modified query below.
Removed the reference to Description column from one line. Still, you would need to use a Text column name (such as Description) in the last line of the query.
For example, the following query would work for the Months table:
select Months.*
from Months
RIGHT OUTER JOIN
(select "" as DummyColumn from Months) Blank_Data
ON Months.Description = Blank_Data.DummyColumn; --hardcoded Description column

SQL IN() operator with condition inside

I've got a table with a few numbers inside (it may even be empty): #states table (value int)
I need to SELECT from another table with a WHERE clause on a specific column.
This column's values must match one of the #states numbers or, if #states is empty, all values should be accepted (as if there were no WHERE condition on this column).
So I tried something like this:
select *
from dbo.tbl_docs docs
where
docs.doc_state in(iif(exists(select 1 from #states), (select value from #states), docs.doc_state))
Unfortunately, iif() can't return a subquery's result set. I tried different variations with iif() and CASE, but without success. How can I write this condition?
select *
from dbo.tbl_docs docs
where
(
(select count(*) from #states) > 0
AND
docs.doc_state in(select value from #states)
)
OR
(
(select count(*) from #states)=0
AND 1=1
)
Wouldn't a left join do?
declare @statesCount int;
select @statesCount = count(1) from #states;
select
docs.*
from dbo.tbl_docs docs
left join #states s on docs.doc_state = s.value
where s.value is not null or @statesCount = 0;
In general, whenever your query contains sub-queries, you should stop for five minutes, and think hard about whether you really need a sub-query at all.
And if you've got a server capable of doing that, in many cases it might be better to preprocess the input parameters first, or perhaps use constructs such as MS SQL's with.
select *
from dbo.tbl_docs docs
where exists (select 1 from #states where value = doc_state)
or not exists (select 1 from #states)
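The EXISTS / NOT EXISTS form can be checked quickly with a small sketch in Python's sqlite3. The tables docs and states and their rows are hypothetical stand-ins for dbo.tbl_docs and #states. The same query is run twice, first with the filter table empty and then with two values in it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for dbo.tbl_docs and #states
    CREATE TABLE docs (doc_id INTEGER PRIMARY KEY, doc_state INTEGER);
    CREATE TABLE states (value INTEGER);
    INSERT INTO docs (doc_id, doc_state) VALUES (1, 10), (2, 20), (3, 30);
""")

# A row passes if its state is listed, or if nothing is listed at all.
query = """
    SELECT doc_id
    FROM docs
    WHERE EXISTS (SELECT 1 FROM states s WHERE s.value = docs.doc_state)
       OR NOT EXISTS (SELECT 1 FROM states)
    ORDER BY doc_id
"""

# Empty filter table: every document is returned.
all_docs = [row[0] for row in conn.execute(query)]

# Non-empty filter table: only documents in the listed states are returned.
conn.execute("INSERT INTO states (value) VALUES (20), (30)")
filtered_docs = [row[0] for row in conn.execute(query)]

print(all_docs, filtered_docs)  # [1, 2, 3] [2, 3]
```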

Alternative to NOT IN()

I have a table with 14,028 rows from November 2012. I also have a table with 13,959 rows from March 2013. I am using a simple NOT IN() clause to see who has left:
select * from nov_2012 where id not in(select id from mar_2013)
This returned 396 rows and I never thought anything of it, until I went to analyze who left. When I pulled all the ids for the lost members and put them in a temp table (##lost), 32 of them were actually still in the mar_2013 table. I can pull them up when I search for their ids using the following:
select * from mar_2013 where id in(select id from ##lost)
I can't figure out what is going on. I will mention that the id field I created is an IDENTITY column. Could that have any effect on the matching using NOT IN? Is there a better way to check for missing rows between tables? I have also tried:
select a.* from nov_2012 a left join mar_2013 b on b.id = a.id where b.id is NULL
And received the same results.
This is how I created the identity field:
create table id_lookup( dateofcusttable date ,sin int ,sex varchar(12) ,scid int identity(777000,1))
insert into id_lookup (sin, sex) select distinct sin, sex from [Client Raw].dbo.cust20130331 where sin <> 0 order by sin, sex
This is how I added the scid into the march table:
select scid, rowno as custrowno
into scid_20130331
from [Client Raw].dbo.cust20130331 cust
left join id_lookup scid
on scid.sin = cust.sin
and scid.sex = cust.sex
update scid_20130331
set scid = custrowno where scid is NULL --for members who don't have more than one id or sin information is not available
drop table Account_Part2_Current
select a.*, scid
into Account_Part2_Current
from Account_Part1_Current a
left join scid_20130331 b
on b.custrowno = a.rowno_custdmd_cust
I then group all the information by the scid.
I would prefer this form (and here's why):
SELECT a.id --, other columns
FROM dbo.nov_2012 AS a
WHERE NOT EXISTS (SELECT 1 FROM dbo.mar_2013 WHERE id = a.id);
However this should still give the same results as what you've tried, so I suspect there is something about the data model that you're not telling us - for example, is mar_2013.id nullable?
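The nullability question matters because NOT IN and NOT EXISTS stop being interchangeable as soon as the subquery can return a NULL. This does not necessarily explain the 32 rows in the question; it is only a sketch of the behaviour, using Python's sqlite3 with made-up ids in tables named after the ones in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up ids; mar_2013 deliberately contains one NULL
    CREATE TABLE nov_2012 (id INTEGER);
    CREATE TABLE mar_2013 (id INTEGER);
    INSERT INTO nov_2012 (id) VALUES (1), (2), (3);
    INSERT INTO mar_2013 (id) VALUES (1), (NULL);
""")

# 2 NOT IN (1, NULL) evaluates to NULL rather than true,
# so no row can ever pass the filter.
not_in = conn.execute("""
    SELECT id FROM nov_2012
    WHERE id NOT IN (SELECT id FROM mar_2013)
    ORDER BY id
""").fetchall()

# NOT EXISTS only asks whether a matching row was found,
# so the NULL in mar_2013 has no effect.
not_exists = conn.execute("""
    SELECT a.id FROM nov_2012 AS a
    WHERE NOT EXISTS (SELECT 1 FROM mar_2013 AS b WHERE b.id = a.id)
    ORDER BY a.id
""").fetchall()

print(not_in, not_exists)  # [] [(2,), (3,)]
```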
This is logically equivalent to NOT IN and is faster than NOT IN:
where yourfield in
(select afield
from somewhere
minus
select
thesamefield
where you want to exclude the record
)
It probably isn't as fast as using WHERE NOT EXISTS, as per Aaron's answer, so you should only use it if NOT EXISTS does not provide the results you want.

SQL Query to return rows even if it is not present in the table

This is a specific problem.
I have an Excel sheet containing data. Similar data is present in a relational database table. Some rows may be absent, or some additional rows may be present. The goal is to verify the data in the Excel sheet against the data in the table.
I have the following query:
Select e_no, start_dt,end_dt
From MY_TABLE
Where e_no In
(20231, 457)
In this case, e_no 457 is not present in the database (and hence is not returned). But I want my query to return a row even if it is not present: (457, null, null). How do I do that?
For SQL Server: use a temporary table or a table variable, and left join MY_TABLE to it.
Declare @Temp Table (e_no int)
Insert into @Temp
Values (20231), (457)
Select t.e_no, m.start_dt, m.end_dt
From @Temp t left join MY_TABLE m on t.e_no = m.e_no
If the values you are passing arrive as a CSV list, use a split function to insert them into @Temp.
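The same idea as a small runnable sketch in Python's sqlite3: load the wanted keys into a temporary table and left join the real table to it. The temp table name wanted and the dates in my_table are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Made-up row standing in for MY_TABLE; 457 is deliberately absent
    CREATE TABLE my_table (e_no INTEGER PRIMARY KEY, start_dt TEXT, end_dt TEXT);
    INSERT INTO my_table VALUES (20231, '2013-01-01', '2013-06-30');
""")

# Keys to verify, e.g. read from the spreadsheet.
wanted = [20231, 457]
conn.execute("CREATE TEMP TABLE wanted (e_no INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO wanted (e_no) VALUES (?)", [(n,) for n in wanted])

# Driving the join from the key list keeps one row per requested key,
# with NULLs where my_table has no match.
rows = conn.execute("""
    SELECT w.e_no, m.start_dt, m.end_dt
    FROM wanted AS w
    LEFT JOIN my_table AS m ON m.e_no = w.e_no
    ORDER BY w.e_no
""").fetchall()

print(rows)  # [(457, None, None), (20231, '2013-01-01', '2013-06-30')]
```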
Why not simply populate a temporary table in the database from your spreadsheet and join against that? Any other solution is probably going to be both more work and more difficult to maintain.
You can also do it this way with a UNION
Select
e_no, start_dt ,end_dt
From MY_TABLE
Where e_no In (20231, 457)
UNION
Select 457, null, null