Select columnValue if the column exists otherwise null - sql

I'm wondering if I can select the value of a column if the column exists and just select null otherwise. In other words I'd like to "lift" the select statement to handle the case when the column doesn't exist.
SELECT uniqueId
, columnTwo
, /*WHEN columnThree exists THEN columnThree ELSE NULL END*/ AS columnThree
FROM (subQuery) s
Note, I'm in the middle of solidifying my data model and design. I hope to exclude this logic in the coming weeks, but I'd really like to move beyond this problem right now, because the data model fix is a more time-consuming endeavor than I'd like to tackle at the moment.
Also note, I'd like to be able to do this in one query. So I'm not looking for an answer like
check what columns are on your sub query first. Then modify your
query to appropriately handle the columns on your sub query.

You cannot do this with a simple SQL statement. A SQL query will not compile unless all table and column references in the query exist.
You can do this with dynamic SQL if the "subquery" is a table reference or a view.
In dynamic SQL, you would do something like:
-- @TableName and @SourceName are assumed to be declared and set beforehand
declare @sql nvarchar(max) = '
SELECT uniqueId, columnTwo, '+
(case when exists (select *
from INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME = @TableName and
COLUMN_NAME = 'ColumnThree' -- and schema name too, if you like
)
then 'ColumnThree'
else 'NULL as ColumnThree'
end) + '
FROM (select * from '+@SourceName+') s
';
exec sp_executesql @sql;
For an actual subquery, you could approximate the same thing by checking to see if the subquery returned something with that column name. One method for this is to run the query: select top 0 * into #temp from (<subquery>) s and then check the columns in #temp.
EDIT:
I don't usually update such old questions, but I am doing so based on the comment below. If you have a unique identifier for each row in the "subquery", you can run the following:
select t. . . ., -- everything but column3
(select column3 -- not qualified!
from t t2
where t2.pk = t.pk
) as column3
from t cross join
(values (NULL)) v(column3);
The subquery will pick up column3 from the outer query (the NULL in v) if it doesn't exist in t. However, this depends critically on having a unique identifier for each row. The question is explicitly about a subquery, and there is no reason to expect that the rows are easily uniquely identified.
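To try the name-resolution trick without a SQL Server instance, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in. The tables with_col and without_col and the helper select_with_optional_column are made up for illustration, and the sketch assumes SQLite resolves an unqualified column name from the innermost scope outward, as described above for SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE with_col (uniqueId INTEGER PRIMARY KEY, columnTwo TEXT, columnThree TEXT);
    INSERT INTO with_col VALUES (1, 'a', 'x'), (2, 'b', 'y');
    CREATE TABLE without_col (uniqueId INTEGER PRIMARY KEY, columnTwo TEXT);
    INSERT INTO without_col VALUES (1, 'a'), (2, 'b');
""")


def select_with_optional_column(conn, table):
    """Return (uniqueId, columnTwo, columnThree); columnThree is NULL if the table lacks it."""
    # `table` is a trusted, hard-coded identifier in this sketch.
    sql = f"""
        SELECT t.uniqueId,
               t.columnTwo,
               (SELECT columnThree              -- deliberately not qualified
                  FROM {table} AS t2
                 WHERE t2.uniqueId = t.uniqueId) AS columnThree
          FROM {table} AS t
         CROSS JOIN (SELECT NULL AS columnThree) AS dummy
         ORDER BY t.uniqueId
    """
    return conn.execute(sql).fetchall()


rows_with = select_with_optional_column(conn, "with_col")
rows_without = select_with_optional_column(conn, "without_col")
print(rows_with)
print(rows_without)
```

When the table has columnThree, the inner reference binds to t2; when it does not, the same reference falls through to the dummy row in the outer query and yields NULL.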

As others already suggested, the sane approach is to have queries that meet your table design.
There is a rather exotic approach to achieve what you want in (pure, not dynamic) SQL though. A similar problem was posted at DBA.SE: How to select specific rows if a column exists or all rows if a column doesn't, but it was simpler, as only one row and one column were wanted as the result. Your problem is more complex, so the query is more convoluted, to say the least. Here is the insane approach:
; WITH s AS
(subquery) -- subquery
SELECT uniqueId
, columnTwo
, columnThree =
( SELECT ( SELECT columnThree
FROM s AS s2
WHERE s2.uniqueId = s.uniqueId
) AS columnThree
FROM (SELECT NULL AS columnThree) AS dummy
)
FROM s ;
It also assumes that the uniqueId is unique in the result set of the subquery.
Tested at SQL-Fiddle
And a simpler method, which has the additional advantage that it allows more than one column with a single subquery:
SELECT s.*
FROM
( SELECT NULL AS columnTwo,
NULL AS columnThree,
NULL AS columnFour
) AS dummy
CROSS APPLY
( SELECT
uniqueId,
columnTwo,
columnThree,
columnFour
FROM tableX
) AS s ;
The question has also been asked at DBA.SE and has been answered by @Andriy M (using CROSS APPLY too!) and Michael Ericsson (using XML):
Why can't I use a CASE statement to see if a column exists and not SELECT from it?

You can use dynamic SQL.
First you need to check whether the column exists, and then build the dynamic query.
DECLARE @query NVARCHAR(MAX) = '
SELECT FirstColumn, SecondColumn, '+
(CASE WHEN exists (SELECT 1 FROM syscolumns
WHERE name = 'ColumnName' AND id = OBJECT_ID('TableName'))
THEN 'ColumnName'
ELSE 'NULL as ColumnName'
END) + '
FROM TableName'
EXEC sp_executesql @query;
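The same check-then-build idea can be sketched outside SQL Server. The following is a minimal illustration using Python's built-in sqlite3 module, where PRAGMA table_info plays the role of the catalog lookup; the table names and the helper build_query are hypothetical, and the identifiers are assumed to be trusted (they are interpolated into the SQL text, not bound as parameters).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE with_col (uniqueId INTEGER PRIMARY KEY, columnTwo TEXT, columnThree TEXT);
    INSERT INTO with_col VALUES (1, 'a', 'x');
    CREATE TABLE without_col (uniqueId INTEGER PRIMARY KEY, columnTwo TEXT);
    INSERT INTO without_col VALUES (1, 'a');
""")


def build_query(conn, table, optional_column):
    """Build a SELECT that falls back to NULL when optional_column is missing."""
    # PRAGMA table_info returns one row per column; index 1 holds the column name.
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if optional_column in existing:
        column_expr = optional_column
    else:
        column_expr = f"NULL AS {optional_column}"
    return f"SELECT uniqueId, columnTwo, {column_expr} FROM {table} ORDER BY uniqueId"


query_with = build_query(conn, "with_col", "columnThree")
query_without = build_query(conn, "without_col", "columnThree")
rows_with = conn.execute(query_with).fetchall()
rows_without = conn.execute(query_without).fetchall()
print(query_without)
print(rows_with, rows_without)
```

Both generated queries return a column named columnThree, so the calling code does not need to know which variant ran.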

Related

Join using a LIKE clause is taking too long

Please see the TSQL below:
create table #IDs (id varchar(100))
insert into #IDs values ('123')
insert into #IDs values ('456')
insert into #IDs values ('789')
insert into #IDs values ('1010')
create table #Notes (Note varchar(500))
insert into #Notes values ('Here is a note for 123')
insert into #Notes values ('A note for 789 here')
insert into #Notes values ('456 has a note here')
I want to find all the IDs that are referenced in the #Notes table. This works:
select #IDs.id from #IDs inner join #Notes on #Notes.note like '%' + #IDs.id + '%'
However, there are hundreds of thousands of records in both tables and the query does not complete. I was thinking about FreeText searching, but I don't think it can be applied here. A cursor takes too long to run as well (I think it will take over one month). Is there anything else I can try? I am using SQL Server 2019.
The size of the input is only one aspect of the solution.
By splitting the text into tokens you do increase the number of records, but at the same time you enable an equality join, which can be implemented using a Hash Join.
You should get the query results in a few minutes at most: basically the time it takes your system to do a full scan of both tables, plus some processing time.
No need for temp tables.
No need for indexes.
Select id
from #IDS
where id in (select w.value
from #Notes as n
cross apply string_split(n.Note, ' ') as w
)
Fiddle
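To illustrate why tokenizing helps, here is a small Python sketch of the same idea (not the SQL Server implementation itself): every note is split on single spaces, the tokens go into a hash set, and each id is then checked with an equality lookup instead of a substring scan. The function name referenced_ids is made up for this example; the data is the sample from the question.

```python
def referenced_ids(ids, notes):
    """Return the ids that appear as a whole space-separated token in any note."""
    tokens = set()
    for note in notes:
        # Mirrors string_split(n.Note, ' '): split on single spaces only.
        tokens.update(note.split(" "))
    # Equality lookup against a hash set, analogous to a hash join.
    return [i for i in ids if i in tokens]


ids = ["123", "456", "789", "1010"]
notes = [
    "Here is a note for 123",
    "A note for 789 here",
    "456 has a note here",
]
found = referenced_ids(ids, notes)
print(found)
```

Note the behavioral difference from LIKE '%...%': an id embedded inside a longer token (for example "x123y") is not matched by the token approach.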
Per the OP request -
Here is code that handles a more complicated scenario, where an id could contain various characters (as defined by @token_char) and the separators are potentially all other characters:
declare @token_char varchar(100) = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
;
with cte_notes as
(
select Note
,replace(translate(Note,@token_char,space(len(@token_char))),' ','') as non_token_char
from #Notes
)
select id
from #IDS
where id in
(
select w.value
from cte_notes as n
cross apply string_split(translate(n.Note,n.non_token_char,space(len(n.non_token_char))),' ') as w
where w.value != ''
)
The Fiddle data sample was altered accordingly, to reflect the change
If you are going to do this search often, you may want to explore a wonderful (if underused) feature of SQL Server called 'Full-Text Search'. To quote Microsoft:
'A LIKE query against millions of rows of text data can take minutes to
return; whereas a full-text query can take only seconds or less
against the same data, depending on the number of rows that are
returned.'
I have seen searches go from minutes to seconds using this feature.
You would need to create a full-text catalog and then create full-text indexes on the tables you want to search. It's not hard and will take you a few minutes to learn how to do this.
This is a good starting point:
https://learn.microsoft.com/en-us/sql/relational-databases/search/get-started-with-full-text-search?view=sql-server-ver15
I would use a CTE with string_split to filter out all alphabetic components and then join the #IDs table with the result of the CTE on the id column. The query was tested on a sample of 1 million rows.
With CTE As (
Select T.value As id
From #Notes Cross Apply String_Split(Note,' ') As T
Where Try_Convert(Int, T.value) Is Not Null
)
Select I.id
From #IDs As I Inner Join CTE On (I.id=CTE.id)
If you just want to extract a numeric value from a string, in this case join is excessive.
Select T.value As id, #Notes.Note
From #Notes Cross Apply String_Split(Note,' ') As T
Where Try_Convert(Int, T.value) Is Not Null And T.value Like '%[0-9]%'
id Note
123 Here is a note for 123
789 A note for 789 here
456 456 has a note here
No matter what, under the given circumstances, I would use join to filter out those numbers that are not represented in #IDs table.
With CTE As (
Select distinct(id) As id
From #IDs
)
Select T.value As id, #Notes.Note
From #Notes Cross Apply String_Split(Note,' ') As T
Inner Join CTE On (T.value=CTE.id)
Where Try_Convert(Int, T.value) Is Not Null
And T.value Like '%[0-9]%'
If the string contains brackets or parentheses instead of spaces, like this:
"456(this is an id number) has a note here" or "456[01/01/2022]"
then as a last resort (since it degrades performance) you can use TRANSLATE to replace those brackets with spaces as follows:
With CTE As (
Select distinct(id) As id
From #IDs
)
Select T.value As id, #Notes.Note
From #Notes Cross Apply String_Split(TRANSLATE(Note,'[]()','    '),' ') As T -- TRANSLATE needs 4 spaces to match the 4 characters
Inner Join CTE On (T.value=CTE.id)
Where Try_Convert(Int, T.value) Is Not Null
And T.value Like '%[0-9]%'
db<>fiddle

Use a select query result in an other query

Could someone please explain how to use the result of a SELECT (1st select result)
and then use that result in the VALUES clause of a second query?
Here is an example (on Microsoft SQL Server):
--1st query, select all DB1.client.name
select Name from DB1.client
--the result of that query is : 1st select result VALUES : ana,boby, ..., micky
--2nd query, compare DB1.client.name (1st select result) with DB2.client.name
--and get back who doesn't exist on second table
select v.Name
from (values
**(There i want use the result of my first query)**
) as v(Name)
where not exists (select *
from DB2.client c
where c.Name = v.Name);
--the result is ana, ..., micky
"..." means some other results.
I want to compare the first and second databases to retrieve the values that aren't in both databases.
If you can, I recommend a sub-query:
select Name
from DB1.client
where Name not in (select name from DB2.client)
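One caveat worth checking before choosing between NOT IN and the NOT EXISTS form from the question: the two behave differently if the second table's Name column can contain NULL. The sketch below demonstrates this with Python's built-in sqlite3 module as a stand-in; the table names client1 and client2 are placeholders for DB1.client and DB2.client, and the NULL row is an assumption added for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client1 (Name TEXT);
    CREATE TABLE client2 (Name TEXT);
    INSERT INTO client1 VALUES ('ana'), ('boby'), ('micky');
    INSERT INTO client2 VALUES ('boby'), (NULL);
""")

# NOT EXISTS: a row qualifies when no matching name is found in client2.
not_exists = [row[0] for row in conn.execute("""
    SELECT c1.Name
      FROM client1 AS c1
     WHERE NOT EXISTS (SELECT 1 FROM client2 AS c2 WHERE c2.Name = c1.Name)
     ORDER BY c1.Name
""")]

# NOT IN: a NULL in the subquery makes the predicate unknown for every row.
not_in = [row[0] for row in conn.execute("""
    SELECT Name
      FROM client1
     WHERE Name NOT IN (SELECT Name FROM client2)
     ORDER BY Name
""")]

print(not_exists)
print(not_in)
```

With the NULL present, NOT EXISTS still returns the missing names while NOT IN returns nothing, so NOT EXISTS is the safer choice when the column is nullable.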
If you want to reuse a query (not the result), see @Thkas's answer.
It isn't possible to reuse the result in SQL Server, because SQL Server releases the memory once the result has been read. The trick is to insert the query's result into a temporary table, which you can then read as many times as necessary:
--Insert into temporary table (SELECT ... INTO creates #tmpResult)
select Name
into #tmpResult
from DB1.client
--First read
select Name from #tmpResult
--Second read from sub-query
select Name
from #tmpResult
where Name not in (select name from DB2.client)
According to your sample code I understood something like this. You can add two tables and do with WHERE the check you want.
select a.Name,b.Name
from DB1.client as a,
DB2.client as b
where a.Name != b.Name;
Also if you want to take as a result set from the first query:
Here I created a subquery which takes the results from the first query. You can correct me for columns.
with temp as (
select Name
from DB1.client
)
select a.Name,b.Name
from temp as a ,
DB2.client as b
where a.Name != b.Name;

I am trying to return a certain values in each row which depend on whether different values in that row are already in a different table

I'm still a n00b at SQL and am running into a snag. What I have is an initial selection of certain IDs into a temp table based upon certain conditions:
SELECT DISTINCT ID
INTO #TEMPTABLE
FROM ICC
WHERE ICC_Code = 1 AND ICC_State = 'CA'
Later in the query I SELECT a different and much longer listing of IDs along with other data from other tables. That SELECT is about 20 columns wide and is my result set. What I would like to be able to do is add an extra column to that result set with each value of that column either TRUE or FALSE. If the ID in the row is in #TEMPTABLE the value of the additional column should read TRUE. If not, FALSE. This way the added column will read TRUE or FALSE on each row, depending on whether the ID in each row is in #TEMPTABLE.
The second SELECT would be something like:
SELECT ID,
ColumnA,
ColumnB,
...
NEWCOLUMN
FROM ...
NEWCOLUMN's value for each row would depend on whether the ID in that row returned is in #TEMPTABLE.
Does anyone have any advice here?
Thank you,
Matt
If you left join to the #TEMPTABLE you'll get a NULL where the IDs don't exist
SELECT X.ID,
ColumnA,
ColumnB,
...
CASE WHEN T.ID IS NOT NULL THEN 1 ELSE 0 END AS NEWCOLUMN -- gives 1 or 0
FROM ... X
LEFT JOIN #TEMPTABLE T
ON T.ID = X.ID -- DEFINE how the two rows can be related uniquely
You need to LEFT JOIN your results query to #TEMPTABLE ON ID. This will give you the ID if there is one and NULL if there isn't. If you want 1 or 0, this would do it (for SQL Server): CASE WHEN #TEMPTABLE.ID IS NULL THEN 0 ELSE 1 END.
A few notes on coding for performance:
By definition an ID column is unique, so the DISTINCT is redundant and causes unnecessary processing (unless it is an ID from another table).
Why would you store this to a temporary table rather than just using it in the query directly?
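As a runnable illustration of the LEFT JOIN approach, here is a minimal sketch using Python's built-in sqlite3 module in place of SQL Server. The tables results and temptable and their contents are invented for the example; temptable stands in for #TEMPTABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE results (ID INTEGER, ColumnA TEXT);
    CREATE TABLE temptable (ID INTEGER);
    INSERT INTO results VALUES (1, 'x'), (2, 'y'), (3, 'z');
    INSERT INTO temptable VALUES (1), (3);
""")

# Unmatched rows get NULL for T.ID, which the CASE turns into 'FALSE'.
flagged = conn.execute("""
    SELECT X.ID,
           X.ColumnA,
           CASE WHEN T.ID IS NOT NULL THEN 'TRUE' ELSE 'FALSE' END AS NEWCOLUMN
      FROM results AS X
      LEFT JOIN temptable AS T
        ON T.ID = X.ID
     ORDER BY X.ID
""").fetchall()
print(flagged)
```

This relies on temptable holding each ID at most once (as the SELECT DISTINCT in the question guarantees); duplicates there would duplicate rows in the output.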
You could use a union and a subquery.
Select . . . . , 'TRUE'
From . . .
Where ID in
(Select id FROM #temptable)
UNION
SELECT . . . , 'FALSE'
FROM . . .
WHERE ID NOT in
(Select id FROM #temptable)
So the top part, SELECT ... FROM ... WHERE ID IN (Subquery), does a SELECT if the ID is in your temptable.
The bottom part does a SELECT if the ID is not in the temptable.
The UNION operator joins the two results nicely, since both SELECT statements will return the same number of columns.
To expand on what someone else was saying with Union, just do something like so
SELECT id, TRUE AS myColumn FROM `table1`
UNION
SELECT id, FALSE AS myColumn FROM `table2`

SQL select from either one or other table

Assume I have a table A with a lot of records (> 100'000) and a table B which has the same columns as A and about the same amount of data.
Is there a possibility with one clever select statement that I can either get all records of table A or all records of table B?
I am not so happy with the approach I currently use because of the performance:
select
column1
,column2
,column3
from (
select 'A' as tablename, a.* from table_a a
union
select 'B' as tablename, b.* from table_b b
) x
where
x.tablename = 'A'
Offhand, your approach seems like the only approach in standard SQL.
You will improve performance considerably by changing the UNION to UNION ALL. The UNION must read in the data from both tables and then eliminate duplicates, before returning any data.
The UNION ALL does not eliminate duplicates. How much better this performs depends on the database engine and possibly on tuning parameters.
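Besides performance, the two operators can also return different rows. The sketch below shows this with Python's built-in sqlite3 module as a stand-in; the sample data (including the deliberately duplicated row in table_a) and the helper rows_from_a are assumptions made for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (column1 INTEGER, column2 TEXT);
    CREATE TABLE table_b (column1 INTEGER, column2 TEXT);
    INSERT INTO table_a VALUES (1, 'x'), (1, 'x'), (2, 'y');
    INSERT INTO table_b VALUES (2, 'y'), (3, 'z');
""")


def rows_from_a(set_operator):
    """Run the question's query shape with either UNION or UNION ALL."""
    sql = f"""
        SELECT column1, column2
          FROM (SELECT 'A' AS tablename, a.* FROM table_a AS a
                {set_operator}
                SELECT 'B' AS tablename, b.* FROM table_b AS b) AS x
         WHERE x.tablename = 'A'
         ORDER BY column1, column2
    """
    return conn.execute(sql).fetchall()


union_rows = rows_from_a("UNION")
union_all_rows = rows_from_a("UNION ALL")
print(union_rows)
print(union_all_rows)
```

UNION collapses the duplicated row from table_a, whereas UNION ALL returns every record, which matches the stated goal of getting all records of one table.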
Actually, there is another possibility. I don't know how well it will work, but you can try it:
-- @tableName is a hypothetical parameter holding 'A' or 'B'
select *
from ((select const.tableName, a.*
from A a cross join
(select 'A' as tableName where @tableName = 'A') const
) union all
(select const.tableName, b.*
from B b cross join
(select 'B' as tableName where @tableName = 'B') const
)
) t
No promises. But the idea is to cross join to a table with either 1 or 0 rows. This will not work in MySQL, because it does not allow WHERE clauses without a FROM. In other databases, you might need a tablename such as dual. This gives the query engine an opportunity to optimize away the read of the table entirely, when the subquery contains no records. Of course, just because you give a SQL engine the opportunity to optimize does not mean that it will.
Also, the "*" is a bad idea, particularly in unions. But I've left it in because that is not the focus of the question.
You can try the next solution; it selects only from table tmp1 ('A' = 'A'):
select
*
from
tmp1
where
'A' = 'A'
union all
select
*
from
tmp2
where
'B' = 'A'
SQL Fiddle demo here
check execution plan
Hard to tell exactly what you want without a little more context, but perhaps something like this could work?
DECLARE @TableName nvarchar(15);
DECLARE @Query nvarchar(50);
SELECT @TableName = YourField
FROM YourTable
WHERE ...
SET @Query = 'SELECT * FROM ' + @TableName
EXEC (@Query) -- parentheses are required to execute a string
Syntax might differ a bit depending on what RDBMS you are using, and more specifically what you are trying to accomplish, but might be a push in the right direction.
The proper way to do this and maintain performance requires some modification to your physical table design.
If you can add a column to each table that holds your indicator column and add a check constraint on that column, you can achieve "partition" elimination on your query.
DDL:
create table table_a (
c1 ...
,c2 ...
,c3 ...
,table_ind char(1) not null generated always as ('A')
,constraint ck_table_ind check (table_ind = 'A')
);
create table table_b (
c1 ...
,c2 ...
,c3 ...
,table_ind char(1) not null generated always as ('B')
,constraint ck_table_ind check (table_ind = 'B')
);
create view v1 as (
select * from table_a
union all
select * from table_b
);
If you execute the query select c1,c2,c3 from v1 where table_ind = 'A' the DB2 optimizer will use the check constraint to recognize that no rows in table_b can match the table_ind = 'A' predicate, so it will completely eliminate the table from the access plan.
This was used (and still is in some cases) before DB2 for Linux/UNIX/Windows supported Range Partitioning. You can read more about this technique in this research paper [PDF] written by some of the IBM DB2 developers back in 2002.

Using the distinct function in SQL

I have a SQL query I am running. What I want to know is whether there is a way of selecting the rows in a table where the value in one of those columns is distinct. When I use the distinct function, it returns all of the distinct rows, so...
select distinct teacher from class etc.
This works fine, but I am selecting multiple columns, so...
select distinct teacher, student etc.
but I don't want to retrieve the distinct rows, I want the distinct rows where the teacher is distinct. So this query would probably return the same teacher's name multiple times because the student value is different but what I would like is to return rows where the teachers are distinct, even if it means returning the teacher and one student name (because I don't need all the students).
I hope what I am trying to ask is clear but is it possible to use the distinct function on a single column even when selecting multiple columns or is there any other solution to this problem? Thanks.
The above is just an example I am giving. I don't know if using 'distinct' is the solution to my problem. I am not using teacher etc.; that was just an example to get the idea across. I am selecting multiple columns (about 10) from different tables. I have a query to get the tabled result I want. Now I want to query that table to find the unique values in one particular column. So using the teacher example again, say I have written a query and I have all the teachers and all the pupils they teach. Now I want to go through each row in this table and email the teacher a message. But I don't want to email the teacher numerous times, just the once, so I want to return all the columns from the table I have, where only the teacher value is distinct.
Col A Col B Col C Col D
a b c d
a c d b
b a a c
b c c c
A query I have produces the above table. Now I want only those rows where Col A values are unique. How would I go about it?
You have misunderstood the DISTINCT keyword. It is not a function and it does not modify a column. You cannot SELECT a, DISTINCT(b), c, DISTINCT(d) FROM SomeTable. DISTINCT is a modifier for the query itself, i.e. you don't select a distinct column, you make a SELECT DISTINCT query.
In other words: DISTINCT tells the server to go through the whole result set and remove all duplicate rows after the query has been performed.
If you need a column to contain every value once, you need to GROUP BY that column. Once you do that, the server needs to know which student to select for each teacher if there are multiple, so you need to provide a so-called aggregate function like COUNT(). Example:
SELECT teacher, COUNT(student) AS amountStudents
FROM ...
GROUP BY teacher;
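One option is to use a GROUP BY on Col A (this relies on MySQL's permissive GROUP BY handling; most other databases reject SELECT * here). Example:
SELECT * FROM table_name
GROUP BY ColA
That should return you:
a b c d
b a a c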
Based on the limited details you provided in your question (you should explain how/why your data is in different tables, what DB server you are using, etc.) you can approach this from 2 different directions.
1. Reduce the number of columns in your query to only return the "teacher" and "email" columns, but using the existing WHERE criteria. The problem with your current attempt is that neither DISTINCT nor GROUP BY understands that you only want 1 row for each value of the column that you are trying to be distinct about. From what I understand, MySQL has support for what you are doing using GROUP BY, but MSSQL does not support result columns that are not included in the GROUP BY clause. If you don't need the "student" columns, don't put them in your result set.
2. Convert your existing query to use column-based sub-queries so that you only return a single result for non-grouped data.
Example:
SELECT t1.a
, (SELECT TOP 1 b FROM Table1 t2 WHERE t1.a = t2.a) AS b
, (SELECT TOP 1 c FROM Table1 t2 WHERE t1.a = t2.a) AS c
, (SELECT TOP 1 d FROM Table1 t2 WHERE t1.a = t2.a) AS d
FROM dbo.Table1 t1
WHERE (your criteria here)
GROUP BY t1.a
This query will not be fast if you have a lot of data, but it will return a single row per teacher with a somewhat random value for the remaining columns. You can also add an ORDER BY to each sub-query to further tweak the values returned for the additional columns.
I'm not sure if I am understanding this right but couldn't you do
SELECT * FROM class WHERE teacher IN (SELECT DISTINCT teacher FROM class)
This would return all of the data in each row where the teacher is distinct
distinct requires a unique result-set row. This means that whatever values you select from your table will need to be distinct together as a row from any other row in the result-set.
Using distinct can return the same value more than once from a given field as long as the other corresponding fields in the row are distinct as well.
As soulmerge and Shiraz have mentioned you'll need to use a GROUP BY and subselect. This worked for me.
DECLARE @table TABLE (
[Teacher] [NVarchar](256) NOT NULL ,
[Student] [NVarchar](256) NOT NULL
)
INSERT INTO @table VALUES ('Teacher 1', 'Student 1')
INSERT INTO @table VALUES ('Teacher 1', 'Student 2')
INSERT INTO @table VALUES ('Teacher 2', 'Student 3')
INSERT INTO @table VALUES ('Teacher 2', 'Student 4')
SELECT
T.[Teacher],
(
SELECT TOP 1 T2.[Student]
FROM @table AS T2
WHERE T2.[Teacher] = T.[Teacher]
) AS [Student]
FROM @table AS T
GROUP BY T.[Teacher]
Results
Teacher 1, Student 1
Teacher 2, Student 3
You need to do it with a sub select where you take TOP 1 of student where the teacher is the same.
You may try "GROUP BY teacher" to return what you need.
What is the question your query is trying to answer?
Do you need to know which classes have only one teacher?
select class_name, count(teacher)
from class group by class_name having count(teacher)=1
Or are you looking for teachers with only one student?
select teacher, count(student)
from class group by teacher having count(student)=1
Or is it something else? The question you've posed assumes that using DISTINCT is the correct approach to the query you're trying to construct. It seems likely this is not the case. Could you describe the question you're trying to answer with DISTINCT?
You will need to say how your data is stored in-memory for us to say how you can query it.
But you could do a separate query to just get the distinct teachers.
select distinct teacher from class
I am struggling to understand exactly what you wish to do.. but you can do something like this:
SELECT DISTINCT ColA FROM Table WHERE ...
If you only select a singular column, the distinct will only grab those.
If you could clarify a little more, I could try to help a bit more.
You could use GROUP BY to separate the return values based on a single column value.
All you have to do is select just the column you want and do a SELECT DISTINCT:
Select Distinct column1 -- where your criteria...
The following might help you get to your solution. The other poster did point to this, but his syntax for GROUP BY was incorrect.
Get all teachers that teach any classes.
Select teacher_table.teacher_id, count(*)
from teacher_table inner join classes_table
on teacher_table.teacher_id = classes_table.teacher_id
group by teacher_table.teacher_id
No one seems to understand what you want. I will take another guess.
Select * from tbl
Where ColA in (Select ColA from tbl Group by ColA Having Count(ColA) = 1)
This will return all data from rows where ColA is unique -i.e. there isn't another row with the same ColA value. Of course, that means zero rows from the sample data you provided.
select cola,colb,colc
from yourtable
where cola in
(
select cola from yourtable where your criteria group by cola having count(*) = 1
)
declare @temp as table (colA nchar, colB nchar, colC nchar, colD nchar, rownum int)
insert @temp (colA, colB, colC, colD, rownum)
select Test.ColA, Test.ColB, Test.ColC, Test.ColD, ROW_NUMBER() over (order by ColA) as rownum
from Test
select t1.ColA, ColB, ColC, ColD
from @temp as t1
join (
select ColA, MIN(rownum) [min]
from @temp
group by Cola)
as t2 on t1.Cola = t2.Cola and t1.rownum = t2.[min]
This will return a single row for each value of the colA.
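The "lowest row number per group" idea can be tried on the question's sample data with the following minimal sketch, which uses Python's built-in sqlite3 module as a stand-in for SQL Server. The explicit seq column is an assumption that plays the role of the ROW_NUMBER() value, and the table name test is a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE test (seq INTEGER PRIMARY KEY, ColA TEXT, ColB TEXT, ColC TEXT, ColD TEXT);
    INSERT INTO test (seq, ColA, ColB, ColC, ColD) VALUES
        (1, 'a', 'b', 'c', 'd'),
        (2, 'a', 'c', 'd', 'b'),
        (3, 'b', 'a', 'a', 'c'),
        (4, 'b', 'c', 'c', 'c');
""")

# Keep only the row whose seq is the smallest within its ColA group.
one_per_group = conn.execute("""
    SELECT t1.ColA, t1.ColB, t1.ColC, t1.ColD
      FROM test AS t1
      JOIN (SELECT ColA, MIN(seq) AS min_seq
              FROM test
             GROUP BY ColA) AS t2
        ON t1.ColA = t2.ColA
       AND t1.seq = t2.min_seq
     ORDER BY t1.ColA
""").fetchall()
print(one_per_group)
```

Which row survives per group is decided entirely by the ordering behind the sequence number, so choose that ordering deliberately if a particular row matters.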
CREATE FUNCTION dbo.DistinctList
(
@List VARCHAR(MAX),
@Delim CHAR
)
RETURNS
VARCHAR(MAX)
AS
BEGIN
DECLARE @ParsedList TABLE
(
Item VARCHAR(MAX)
)
DECLARE @list1 VARCHAR(MAX), @Pos INT, @rList VARCHAR(MAX)
SET @list = LTRIM(RTRIM(@list)) + @Delim
SET @pos = CHARINDEX(@delim, @list, 1)
WHILE @pos > 0
BEGIN
SET @list1 = LTRIM(RTRIM(LEFT(@list, @pos - 1)))
IF @list1 <> ''
INSERT INTO @ParsedList VALUES (CAST(@list1 AS VARCHAR(MAX)))
SET @list = SUBSTRING(@list, @pos+1, LEN(@list))
SET @pos = CHARINDEX(@delim, @list, 1)
END
SELECT @rlist = COALESCE(@rlist+',','') + item
FROM (SELECT DISTINCT Item FROM @ParsedList) t
RETURN @rlist
END
GO