SQL Server connecting several Selects

I have the following statement:
Select No, Region = 'Ohio'
FROM table
where PostCode >='0001'
AND PostCode <= '4999'
which gives me the table with the correct state in the Region field. How can I expand that statement with several other WHERE conditions in the same statement?
e.g.
Region = 'NewYork'
Where PostCode >='5000'
AND PostCode <= '7999'
My solution would be to build a separate statement for each Region, but there must be a better way to have them all in one.

Two common ways to select or set different values based on multiple criteria in a single query are CASE statements and joining another table that holds those values. You can also take advantage of the BETWEEN operator in SQL Server for much of this.
CASE statements in a single query
A case statement might be useful if you have a small set of criteria, or if you just need to throw together an ad hoc query. Here is an example of using a case statement:
select
    No,
    Region = case
        when (PostCode >= '0001' and PostCode <= '4999') then 'Ohio'
        when (PostCode between '5000' and '7999') then 'NewYork'
        else 'Unknown'
    end
from [...]
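Since the question is about updating the table rather than just selecting from it, the same CASE can drive a single UPDATE. A sketch, assuming Region is an existing column and my_table stands in for the real table name:
-- Set Region for every row in one pass
update my_table
set Region = case
    when PostCode between '0001' and '4999' then 'Ohio'
    when PostCode between '5000' and '7999' then 'NewYork'
    else 'Unknown'
end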
JOIN a table with the values and criteria
This is definitely the better method for something like evaluating 50 states - especially since this data is likely static. The idea is to have a table that contains the criteria and the value, and then join it to your table.
Here is an example using a temp table - you would likely want to use a real table for something as common as states.
-- Setup a #states table
create table #states (state varchar(20), PostCodeMin char(4), PostCodeMax char(4))
insert into #states values ('Ohio', '0001', '4999')
insert into #states values ('NewYork', '5000', '7999')
-- Now query it
select
    t.No,
    State = isnull(s.state, 'Unknown')
from
    my_table t
    left outer join #states s
        on (t.PostCode between s.PostCodeMin and s.PostCodeMax)
Note that in the above query, I do a left outer join to #states, in case the state isn't set up. I also wrap s.state in isnull, in case the outer join doesn't return anything for that particular row in my_table.
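To persist the value instead of computing it on every read, the same join can drive an UPDATE. A sketch, assuming Region is a column on my_table and the lookup lives in a permanent States table with the same three columns as #states above:
-- Copy the matched state into the Region column; unmatched rows get 'Unknown'
update t
set t.Region = isnull(s.state, 'Unknown')
from my_table t
left outer join States s
    on t.PostCode between s.PostCodeMin and s.PostCodeMax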

You can create a calculated field using a case statement on region. If many "Unknown" records are going to be returned, you may want to tweak the WHERE clause to filter out nonessential records for better performance.
SELECT
    *
FROM
(
    Select
        No,
        Region =
            CASE
                WHEN PostCode >= '0001' AND PostCode <= '4999' THEN 'Ohio'
                WHEN PostCode >= '5000' AND PostCode <= '7999' THEN 'New York'
                ELSE 'Unknown'
            END
    FROM table
    where PostCode >= '0001' AND PostCode <= '7999'
) AS X
ORDER BY
    Region

Related

Optimised way of replacing strings with each other in SQL table

Problem Statement:
Looking for an optimised way of replacing strings with each other in a SQL table with huge data and multiple cases.
Consider that I have a table City.
I need to replace Bangalore with Delhi and Delhi with Bangalore, and similarly there might be 'n' number of other cases.
I know that we can use a CASE in an UPDATE to replace the data in the table. Is there a better way of doing it using the Replace() function, or anything else, in a single update?
An alternative would be to create a temporary table, something like
create table #tmpReplace(OriginalValue varchar(200), NewValue varchar(200))
and make an inner join with your table. The advantage is a much simpler update. The drawback: you still have to populate this table.
For example:
update yt
set yt.Name = tmp.NewValue
from dbo.YourTable yt inner join #tmpReplace tmp on tmp.OriginalValue = yt.Name
Honestly, the easiest method would seem to be a CASE expression:
SELECT ID,
       CASE Name WHEN 'Bangalore' THEN 'Delhi'
                 WHEN 'Delhi' THEN 'Bangalore'
                 ...
                 ELSE Name
       END AS Name
FROM dbo.YourTable;
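Since the question asks about a single UPDATE, the same expression can perform the swap in place. A sketch, restricted to the affected rows so unrelated names are never rewritten:
-- Swap the two city names in one statement; all rows are read before any write
UPDATE dbo.YourTable
SET Name = CASE Name WHEN 'Bangalore' THEN 'Delhi'
                     WHEN 'Delhi' THEN 'Bangalore'
           END
WHERE Name IN ('Bangalore', 'Delhi');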
I would suggest creating a derived table in the query with the "replacement" values:
update t
set t.name = v.newvalue
from t join
(values ('Delhi', 'Bangalore'),
('Bangalore', 'Delhi'),
. . .
) v(oldvalue, newvalue)
on v.oldvalue = t.name;

Is there a CASE statement to apply to several columns to have a like output value changed to another (but different) like value?

I have several columns that read "Not Specified". I would like them to be blank instead.
Is there a better way to apply a case statement to my ENTIRE query rather than each line, if I'm looking to change the same value in different columns? Currently my query looks like this (plus several columns):
SELECT
[user_name]
,[employee_number]
,CASE [veteran_status] WHEN 'not specified' THEN ''
ELSE veteran_status
END AS veteran_status
,CASE [ethnic_origin] WHEN 'not specified' THEN ''
ELSE ethnic_origin
END AS ethnic_origin
,CASE [gender] WHEN 'not specified' THEN ''
ELSE gender
END AS gender
FROM table.HR
I am getting the correct output with the case statements, but I'm looking to see if there's a more efficient way to apply CASE to a large number of columns.
Well you can combine the UNPIVOT and PIVOT operators like so:
with t1 as (
    select user_name
         , employee_number
         , col
         , nullif(value, 'not specified') value
    from HR unpivot (value for col in (veteran_status, ethnic_origin, gender)) as x
)
select *
from t1
pivot (max(value) for col in (veteran_status, ethnic_origin, gender)) x;
If your columns aren't of the same data type, you might need to precondition them by casting them all to the same data type:
with t0 as (
select user_name
, employee_number
, cast(veteran_status as varchar(30)) veteran_status
, cast(ethnic_origin as varchar(30)) ethnic_origin
, cast(gender as varchar(30)) gender
from HR
), t1 as (... from t0 ...)
...
However, this may be a LOT of work for something easily achievable by more conventional means.
There is no way to apply one piece of logic to multiple or all columns of a select statement without writing it out for each column.
What you can do, though, is write alternatives that are more concise than a CASE statement. From your query style, I guess you are using MS SQL Server; here is one alternative for it:
SELECT
[user_name]
,[employee_number]
,iif([veteran_status] = 'not specified', '', [veteran_status]) veteran_status
,iif([ethnic_origin] = 'not specified', '', [ethnic_origin]) ethnic_origin
,iif([gender] = 'not specified', '', [gender]) gender
FROM table.HR
For Oracle there is a similar function called DECODE; the MySQL equivalent is called IF.
That said, CASE is standard ANSI SQL, available in all databases. The choice is yours.
The following solution also works when you want to resolve NULLs to the same value. You will still need to add it to each referenced column, but it uses less code and reduces redundancy overall.
ISNULL(NULLIF([veteran_status], 'not specified'), '') AS [veteran_status]
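Applied to the original query, that pattern would look something like this sketch:
SELECT
    [user_name]
    ,[employee_number]
    ,ISNULL(NULLIF([veteran_status], 'not specified'), '') AS [veteran_status]
    ,ISNULL(NULLIF([ethnic_origin], 'not specified'), '') AS [ethnic_origin]
    ,ISNULL(NULLIF([gender], 'not specified'), '') AS [gender]
FROM table.HR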
You could join the table to itself, excluding the 'not specified' values. There isn't a huge gain; however, it's an alternative perspective. Note that the excluded values come back as NULL rather than '', so you may still want ISNULL around each joined column.
SELECT
    hr.user_name
    ,hr.employee_number
    ,hrvs.veteran_status
    ,hreo.ethnic_origin
    ,hrge.gender
FROM table.HR hr
LEFT JOIN table.HR hrvs ON hrvs.employee_number = hr.employee_number AND hrvs.veteran_status <> 'not specified'
LEFT JOIN table.HR hreo ON hreo.employee_number = hr.employee_number AND hreo.ethnic_origin <> 'not specified'
LEFT JOIN table.HR hrge ON hrge.employee_number = hr.employee_number AND hrge.gender <> 'not specified'

Performance issues with UNION of large tables

I have seven large tables, each storing between 100 and 1 million rows at any time. I'll call them LargeTable1, LargeTable2, LargeTable3, LargeTable4...LargeTable7. These tables are mostly static: there are no updates and no new inserts. They change only once every two weeks or once a month, when they are truncated and a new batch of records is inserted into each.
All these tables have three fields in common: Headquarter, Country and File. Headquarter and Country are numbers in the format '000', though in two of these tables they are parsed as int due to other system requirements.
I have another, much smaller table called Headquarters with the information of each headquarter. This table has very few entries. At most 1000, actually.
Now, I need to create a stored procedure that returns all those headquarters that appear in the large tables but are either absent from the Headquarters table or have been deleted (deletion is logical: there is a DeletionDate field to check it).
This is the query I've tried:
CREATE PROCEDURE deletedHeadquarters
AS
BEGIN
DECLARE @headquartersFiles TABLE
(
    headquarter int,
    countryFile varchar(MAX)
);

SET NOCOUNT ON

INSERT INTO @headquartersFiles
SELECT headquarter, CONCAT(country, ' (', [file], ')')
FROM
(
    SELECT DISTINCT CONVERT(int, headquarter) as headquarter,
                    CONVERT(int, country) as country,
                    [file]
    FROM LargeTable1
    UNION
    SELECT DISTINCT headquarter, country, [file]
    FROM LargeTable2
    UNION
    SELECT DISTINCT headquarter, country, [file]
    FROM LargeTable3
    UNION
    SELECT DISTINCT headquarter, country, [file]
    FROM LargeTable4
    UNION
    SELECT DISTINCT headquarter, country, [file]
    FROM LargeTable5
    UNION
    SELECT DISTINCT headquarter, country, [file]
    FROM LargeTable6
    UNION
    SELECT DISTINCT headquarter, country, [file]
    FROM LargeTable7
) TC

SELECT RIGHT('000' + CAST(st.headquarter AS VARCHAR(3)), 3) as headquarter,
       MAX(s.deletionDate) as deletionDate,
       STUFF
       (
           (SELECT DISTINCT ', ' + st2.countryFile
            FROM @headquartersFiles st2
            WHERE st2.headquarter = st.headquarter
            FOR XML PATH('')),
           1,
           2,
           ''
       ) countryFile
FROM @headquartersFiles as st
LEFT JOIN headquarters s ON CONVERT(int, s.headquarter) = st.headquarter
WHERE s.headquarter IS NULL
   OR s.deletionDate IS NOT NULL
GROUP BY st.headquarter
END
This sp's performance isn't good enough for our application. It currently takes around 50 seconds to complete, with the following total rows for each table (just to give you an idea about the sizes):
LargeTable1: 1516666 rows
LargeTable2: 645740 rows
LargeTable3: 1950121 rows
LargeTable4: 779336 rows
LargeTable5: 1100999 rows
LargeTable6: 16499 rows
LargeTable7: 24454 rows
What can I do to improve performance? I've tried the following, without much difference:
Inserting into the local table in batches, excluding those headquarters I've already inserted, and then updating the countryFile field for those that are repeated
Creating a view for that UNION query
Creating indexes for the LargeTables for the headquarter field
I've also thought about inserting these missing headquarters into a permanent table after the LargeTables change, but the Headquarters table can change more often, and I would rather not have to modify its module to keep these things tidy and up to date. But if it's the best possible alternative, I'd go for it.
Thanks
Take this filter
LEFT JOIN headquarters s ON CONVERT(int, s.headquarter) = st.headquarter
WHERE s.headquarter IS NULL
OR s.deletionDate IS NOT NULL
and add it to each individual query in the union before inserting into @headquartersFiles.
It might seem like this adds a lot more filters, but it will actually speed things up, because you are filtering before you start processing the union.
Also, take out all your DISTINCTs. It probably won't speed things up, but they are redundant, because you are doing a UNION and not a UNION ALL.
Do the filtering at each step. But first, modify the headquarters table so it has the right type for what you need, along with an index:
alter table headquarters add headquarter_int as (cast(headquarter as int));
create index idx_headquarters_int on headquarters(headquarter_int);
SELECT DISTINCT headquarter, country, [file]
FROM LargeTable5 lt5
WHERE NOT EXISTS (SELECT 1
                  FROM headquarters s
                  WHERE s.headquarter_int = lt5.headquarter and s.deletionDate is null
                 );
Then, you want an index on LargeTable5(headquarter, country, file).
This should take less than 5 seconds to run. If so, then construct the full query, being sure that the types in the correlated subquery match and that you have the right index on the full table. Use union to remove duplicates between the tables.
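A sketch of how that full query might be assembled, shown for just two of the tables (LargeTable1 keeps its conversions because it stores the codes as char; extend the same pattern to the remaining tables):
-- Keep only headquarters with no active (non-deleted) entry in headquarters
SELECT DISTINCT CONVERT(int, headquarter) as headquarter, CONVERT(int, country) as country, [file]
FROM LargeTable1 lt1
WHERE NOT EXISTS (SELECT 1
                  FROM headquarters s
                  WHERE s.headquarter_int = CONVERT(int, lt1.headquarter) and s.deletionDate is null)
UNION
SELECT DISTINCT headquarter, country, [file]
FROM LargeTable5 lt5
WHERE NOT EXISTS (SELECT 1
                  FROM headquarters s
                  WHERE s.headquarter_int = lt5.headquarter and s.deletionDate is null);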
I'd try doing the filtering against each individual table first. You just need to account for the fact that a headquarter might appear in one table but not another. You can do this like so:
SELECT
    headquarter
FROM
(
    SELECT DISTINCT
        headquarter,
        'table1' AS large_table
    FROM
        LargeTable1 LT
        LEFT OUTER JOIN Headquarters HQ ON HQ.headquarter = LT.headquarter
    WHERE
        HQ.headquarter IS NULL OR
        HQ.deletionDate IS NOT NULL
    UNION ALL
    SELECT DISTINCT
        headquarter,
        'table2' AS large_table
    FROM
        LargeTable2 LT
        LEFT OUTER JOIN Headquarters HQ ON HQ.headquarter = LT.headquarter
    WHERE
        HQ.headquarter IS NULL OR
        HQ.deletionDate IS NOT NULL
    UNION ALL
    ...
) SQ
GROUP BY headquarter
HAVING COUNT(*) = 7
That would make sure that it's flagged as missing or deleted across all seven tables.
Table variables have horrible performance because SQL Server does not generate statistics for them. Instead of a table variable, try using a temp table, and if headquarter + country + file is unique in the temp table, add a unique constraint (which will create a clustered index) in the temp table definition. You can set indexes on a temp table after creating it, but for various reasons SQL Server may ignore them.
Edit: as it turns out, you can in fact create indexes on table variables, even non-unique ones, in SQL Server 2014+.
Secondly, try not to use functions in your joins or where clauses - doing so often causes performance problems.
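A minimal sketch of that temp-table suggestion; countryFile is narrowed to an assumed varchar(200) here because varchar(MAX) columns cannot be index key columns:
-- The unique constraint doubles as the clustered index
CREATE TABLE #headquartersFiles
(
    headquarter int,
    countryFile varchar(200),
    CONSTRAINT UQ_headquartersFiles UNIQUE CLUSTERED (headquarter, countryFile)
);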
The real answer is to create separate INSERT statements for each table, with the caveat that the data to be inserted must not already exist in the destination table.
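A sketch of one such INSERT (repeat per table, adding the CONVERTs for the tables that store the codes as char):
-- Only insert combinations not already captured from an earlier table
INSERT INTO #headquartersFiles (headquarter, countryFile)
SELECT DISTINCT lt.headquarter, CONCAT(lt.country, ' (', lt.[file], ')')
FROM LargeTable2 lt
WHERE NOT EXISTS (SELECT 1
                  FROM #headquartersFiles hf
                  WHERE hf.headquarter = lt.headquarter
                    AND hf.countryFile = CONCAT(lt.country, ' (', lt.[file], ')'));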

'In' clause in SQL server with multiple columns

I have a component that retrieves data from the database based on the keys provided.
However, I want my Java application to get all the data for all keys in a single database hit to speed things up.
I can use an 'in' clause when I have only one key.
When working with more than one key, I can use the query below in Oracle:
SELECT * FROM <table_name>
where (value_type,CODE1) IN (('I','COMM'),('I','CORE'));
which is similar to writing
SELECT * FROM <table_name>
where value_type = 1 and CODE1 = 'COMM'
and
SELECT * FROM <table_name>
where value_type = 1 and CODE1 = 'CORE'
together
However, using the 'in' clause as above gives the error below in SQL Server:
ERROR: An expression of non-boolean type specified in a context where a condition is expected, near ','.
Please let me know if there is any way to achieve the same in SQL Server.
This syntax doesn't exist in SQL Server. Use a combination of AND and OR.
SELECT *
FROM <table_name>
WHERE
(value_type = 1 and CODE1 = 'COMM')
OR (value_type = 1 and CODE1 = 'CORE')
(In this case you could make it shorter, because value_type is compared to the same value in both combinations. I just wanted to show the pattern that works like IN in Oracle with multiple fields.)
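The shorter form mentioned above would be:
SELECT *
FROM <table_name>
WHERE value_type = 1
  AND CODE1 IN ('COMM', 'CORE')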
When using IN with a subquery, you need to rephrase it like this:
Oracle:
SELECT *
FROM foo
WHERE
(value_type, CODE1) IN (
SELECT type, code
FROM bar
WHERE <some conditions>)
SQL Server:
SELECT *
FROM foo
WHERE
EXISTS (
SELECT *
FROM bar
WHERE <some conditions>
AND foo.type_code = bar.type
AND foo.CODE1 = bar.code)
There are other ways to do it depending on the case, such as inner joins.
If you have fewer than 1000 tuples to check against and you're using SQL Server 2008+, you can use a table value constructor and perform a join against it. You can only specify up to 1000 rows in a table value constructor, hence the 1000-tuple limitation. Here's how it would look in your situation:
SELECT <table_name>.* FROM <table_name>
JOIN ( VALUES
('I', 'COMM'),
('I', 'CORE')
) AS MyTable(a, b) ON a = value_type AND b = CODE1;
This is only a good idea if your list of values is unique; otherwise you'll get duplicate rows. I'm not sure how the performance of this compares to using many ANDs and ORs, but the SQL query is at least much cleaner to look at, in my opinion.
You can also write this to use EXISTS instead of JOIN. That may have different performance characteristics, and it will avoid the problem of producing duplicate results if your values aren't unique. It may be worth trying both EXISTS and JOIN on your use case to see which is a better fit. Here's how EXISTS would look:
SELECT * FROM <table_name>
WHERE EXISTS (
SELECT 1
FROM (
VALUES
('I', 'COMM'),
('I', 'CORE')
) AS MyTable(a, b)
WHERE a = value_type AND b = CODE1
);
In conclusion, I think the best choice is to create a temporary table and query against that. But sometimes that's not possible, e.g. if your user lacks the permission to create temporary tables; in that case a table value constructor may be your best choice. Use EXISTS or JOIN depending on which gives you better performance on your database.
Normally you cannot do it, but you can use the following technique:
SELECT * FROM <table_name>
where (value_type + '/' + CODE1) IN (('I' + '/' + 'COMM'), ('I' + '/' + 'CORE'));
Be aware that concatenation like this can produce false matches if the values themselves can contain the separator character.
A better solution is to avoid hardcoding your values and put them in a temporary or persistent table:
CREATE TABLE #t (ValueType VARCHAR(16), Code VARCHAR(16))
INSERT INTO #t VALUES ('I','COMM'),('I','CORE')
SELECT DT.*
FROM <table_name> DT
JOIN #t T ON T.ValueType = DT.ValueType AND T.Code = DT.Code
This way you avoid storing data in your code (with the persistent table version) and can easily modify the filters without changing the code.
I think you can try this, combining AND and OR at the same time:
SELECT
*
FROM
<table_name>
WHERE
value_type = 1
AND (CODE1 = 'COMM' OR CODE1 = 'CORE')
What you can do is 'join' the columns as a string and pass your values combined as strings as well:
where (cast(column1 as text) ||','|| cast(column2 as text)) in (?1)
(Note that in SQL Server itself you would use + for concatenation and varchar rather than text.)
The other way is to do multiple ANDs and ORs.
I had a similar problem in MS SQL, but a little different. Maybe it will help somebody in the future; in my case I found this solution (not full code, just an example):
SELECT Table1.Campaign
,Table1.Coupon
FROM [CRM].[dbo].[Coupons] AS Table1
INNER JOIN [CRM].[dbo].[Coupons] AS Table2 ON Table1.Campaign = Table2.Campaign AND Table1.Coupon = Table2.Coupon
WHERE Table1.Coupon IN ('0000000001', '0000000002') AND Table2.Campaign IN ('XXX000000001', 'XYX000000001')
Of course, I have indexes on Coupon and Campaign in the table for fast searching.
Compute it in MS SQL:
SELECT * FROM <table_name>
where value_type + '|' + CODE1 IN ('I|COMM', 'I|CORE');

Building a SELECT clause with dynamic number of columns

Is it possible to create a SELECT clause with a varying number of columns to be returned depending on joined tables?
For instance.
If I join a table depending on a value in the WHERE clause, I want to return either tbl1.col1, tbl1.col2 if table tbl1 is joined, or tbl2.col4, tbl2.col5, tbl2.col8 if table tbl2 is joined.
Is this possible? How?
No, you can't write one query that sometimes returns n columns and other times m columns. What you can do is something like this: use UNION ALL on two queries, with conditions such that either query 1 or query 2 returns data. Make the columns match; where one query has no value, let it select null in that place.
select tbl1.col1 as firstname, tbl1.col2 as lastname, null as street, tbl1.col3 as job from ...
where #variable = 1
UNION ALL
select tbl2.col4 as firstname, tbl2.col5 as lastname, tbl2.col8 as street, null as job from ...
where #variable = 2;
Or you just build your SQL dynamically in whatever language and use completely different SQL, which is what one would normally do.
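A minimal sketch of that dynamic approach in T-SQL itself, using the placeholder names from the question (in an application language you would assemble the string the same way):
DECLARE @variable int = 1;
DECLARE @sql nvarchar(max);

-- Pick a completely different column list depending on the condition
IF @variable = 1
    SET @sql = N'SELECT tbl1.col1, tbl1.col2 FROM tbl1';
ELSE
    SET @sql = N'SELECT tbl2.col4, tbl2.col5, tbl2.col8 FROM tbl2';

EXEC sp_executesql @sql;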