Proc SQL CASE confusion

Within SAS
I have a PROC SQL step that I'm using to create macro variables for some list processing.
I have run into a confusing situation where using a CASE expression rather than a WHERE clause results in the first row of the resulting data set being a null string ('').
There are no null strings contained in either field in either table.
These are two sample SQL steps with all of the macro business removed for simplicity:
create table test as
select distinct
case
when brand in (select distinct core_brand from new_tv.core_noncore_brands) then brand
end as brand1
from new_tv.new_tv2
;
create table test2 as
select distinct brand
from new_tv.new_tv2
where brand in (select distinct core_brand from new_tv.core_noncore_brands)
;
Using the first piece of code, the result is a table with multiple rows, the first being an empty string.
The second piece of code works as expected.
Any reason for this?

So the difference is that without a WHERE clause you aren't limiting what you are selecting, i.e., every row is considered. The CASE expression can bucket items by criteria, but you don't lose rows just because your buckets don't catch everything, hence the NULL. WHERE limits the rows being returned.

Yes, the first has no then clause in the case statement. I'm surprised that it even parses. It wouldn't in many SQL dialects.
Presumably you mean:
create table test as
select distinct
case
when brand in (select distinct core_brand from new_tv.core_noncore_brands)
then brand
end as brand1
from new_tv.new_tv2
;
The reason you are getting the NULL is that the CASE expression returns NULL for the non-matching brands. You would need to add:
where brand1 is not NULL
to prevent this (using either a subquery or making brand1 a calculated field).
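Putting that together with the original query, a sketch of the filtered version; the CALCULATED keyword lets the WHERE clause refer to the brand1 alias defined in the SELECT (table names as in the question):
proc sql;
    create table test as
    select distinct
        case
            when brand in (select distinct core_brand from new_tv.core_noncore_brands)
                then brand
        end as brand1
    from new_tv.new_tv2
    where calculated brand1 is not null;  /* drops the non-matching (NULL) bucket */
quit;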

Your first query is not correct; there is no THEN clause in the CASE expression.
create table test as
select distinct
case
when brand in (select distinct core_brand from new_tv.core_noncore_brands)
then value   /* <-- the missing THEN clause */
end as brand1
from new_tv.new_tv2
;
Probably you have the NULL value because there is no default (ELSE) branch in the CASE expression, so for any row that does not meet the condition it returns NULL. There is a difference between a CASE expression and a WHERE ... IN filter: the first returns all the rows, but with no value for those that do not meet the condition, while the second query returns only the rows that meet the condition.
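For completeness, a sketch of the same query with a default (ELSE) branch, so non-matching rows get a label instead of NULL; the 'non-core' label is just an illustration, not something from the original question:
proc sql;
    create table test as
    select distinct
        case
            when brand in (select distinct core_brand from new_tv.core_noncore_brands)
                then brand
            else 'non-core'  /* default value; without ELSE these rows come back as NULL */
        end as brand1
    from new_tv.new_tv2;
quit;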

Related

Print the values of a column that are common to both tables

I want to print the values of a column that are common to both tables.
The issue is that one column's substring value matches the other column's full string.
Printing the subquery alone fetches the right values (proving the substring query is correct), but I think the part of the query after the WHERE clause needs changing.
Kindly suggest.
Code:
select distinct sd.sourceworkitemid
from u_prodstypetest pst, sdidata sd
where sd.keyid1 = 'S-20210719-00000003'
and sd.sourceworkitemid in (select substr(testmethodid,0,INSTR(testmethodid,'|',1)-1) from u_prodstypetest);
I want to take a substring of a column's value in one table and compare it with a column in another table. But since it is a substring, a simple column1 = column2 in the WHERE clause does not suffice, and hence I wrote the subquery to fetch the substring, which, when run, throws an error at the IN because the subquery returns more than one value.
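A hedged sketch, reusing the table and column names from the question: joining directly on the computed substring expresses the intent without an = comparison, and a join (like IN) is happy with many matching values.
select distinct sd.sourceworkitemid
from sdidata sd
join u_prodstypetest pst
  on sd.sourceworkitemid = substr(pst.testmethodid, 1, instr(pst.testmethodid, '|', 1) - 1)
where sd.keyid1 = 'S-20210719-00000003';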

Getting value of row that corresponds to distinct? (proc sql)

I have a dataset unique across 5 variables. The 5th is an identifier variable. Finally, I have a 6th variable, which is dependent on the identifier variable.
The identifier variable can appear in multiple places. The dependent variable will never change for a given value of the identifier variable.
I have code such as the following:
proc sql;
select
...
, count(distinct identifier) as n_ids
from
group by
Which selects the number of unique identifiers per group of 4 independent variables. I'm hoping to add on to this the sum of the 6th variable, which would be something like the following:
sum(case when distinct identifier then dependent_var else 0 end)
Which obviously does not work (and for good reason). Any clean way of finding this sum within the sql step?
The easiest solution is probably to summarize the dataset first by the identifier:
proc sql;
select biggerstuff, identifier, max(depvar)
from yourdataset
group by biggerstuff,identifier;
quit;
Then insert that into your larger query in place of the table in the FROM clause (select blah, count(identifier), sum(depvar)). Once you've pre-summarized it in the inner query, you know you only get one row per identifier, so DISTINCT isn't needed any longer.
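A sketch of the combined query, keeping the placeholder names from the answer (biggerstuff standing in for the grouping variables, plus identifier, depvar, yourdataset): the inner query collapses the data to one row per identifier within each group, so the outer SUM no longer double-counts the dependent variable.
proc sql;
    create table summary as
    select biggerstuff,
           count(identifier) as n_ids,
           sum(depvar)       as total_depvar
    from (select biggerstuff, identifier, max(depvar) as depvar
          from yourdataset
          group by biggerstuff, identifier)
    group by biggerstuff;
quit;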

SQL: What does NULL as ColumnName imply

I understand that AS is used to create an alias. Therefore, it makes sense to alias one long name as a shorter one. However, I am seeing NULL as ColumnName in a SQL query.
What does this imply?
SELECT *, NULL as aColumn
Aliasing can be used in a number of ways, not just to shorten a long column name.
In this case, your example means you're returning a column that always contains NULL, and its alias/column name is aColumn.
Aliasing can also be used when you're using computed values, such as Column1 + Column2 AS Column3.
When unioning or joining datasets, NULL AS [ColumnA] is a quick way to create a complete dataset that can then be updated later, without a new column having to be created in any of the source tables.
In the result of the statement we have a column that contains only NULL values, and we can refer to that column using the alias.
In your case the query selects all records from the table, and each result record has an additional column containing only NULL values. If we want to refer to this result set and to the additional column somewhere else later, we should use the alias.
It means that "aColumn" has only Null values. This column could be updated with actual values later but it's an empty one when selected.
---I'm not sure if you know about SSIS, but this mechanism is useful with SSIS to add variable value to the "empty" column.
When using SELECT you can pass a value to the column directly.
So something like :
SELECT ID, Name, 'None' AS Hobbies, 0 AS NumberOfPets, NULL AS Picture, '' AS Adress
Is valid.
It can be used to format a query's output nicely when using UNION/UNION ALL.
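For example, a small illustration with hypothetical employees and contractors tables: every branch of a UNION must return the same number of columns, so the branch that lacks a column pads it with NULL (or another placeholder value).
SELECT name, salary, NULL AS day_rate FROM employees
UNION ALL
SELECT name, NULL,   day_rate         FROM contractors;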
The query result can have a new column containing all NULL values. In SQL Server we can do it like this:
SELECT *, CAST(NULL AS <data-type>) AS aColumn
e.g.
SELECT *, CAST(NULL AS BIGINT) AS aColumn
How about without using the AS:
SELECT ID
, Name
, 'None' AS Hobbies
, 0 AS NumberOfPets
, NULL Picture
Usually, adding NULL AS [Column] at the end of a SELECT * is used when inserting into another table, to hold a calculated column based on the table you have just selected.
UPDATE #TempTable SET aColumn = Column1 + Column2 WHERE ...
Then exporting or saving the results to another table.
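A sketch of that pattern (SourceTable and #TempTable are hypothetical names; the CAST gives the placeholder column a definite type, as another answer above suggests):
SELECT *, CAST(NULL AS INT) AS aColumn
INTO #TempTable
FROM SourceTable;

UPDATE #TempTable
SET aColumn = Column1 + Column2
WHERE Column1 IS NOT NULL AND Column2 IS NOT NULL;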

Find Top 1 best matching string in SQL server

I have a table 'MyTable' which holds some business logic. This table has a column called Expression which contains a string built from other columns.
My query is
Select Value from MyTable where @Parameters_Built like Expression
The variable @Parameters_Built is built from the input parameters by concatenating them all together.
In my current scenario,
@Parameters_Built = '1|2|Computer IT/Game Design & Dev (BS)|0|1011A|1|0|'
Below are the Expression values:
%%|%%|%%|0|%%|%%|0|
1|2|%%|0|%%|%%|0|
1|%%|%%|0|%%|%%|0|
So my above query matches all three rows, but it should return only the second row (the maximum match).
I don't just need a fix for this particular scenario; it's only an example. I need a general solution for choosing the best match. Any idea?
Try:
Select top 1 * from MyTable
where @Parameters_Built like Expression
order by len(Expression) - len(replace(Expression, '%', ''))
- this orders the results by the number of '%' wildcard characters in Expression, so the row with the fewest wildcards (the most specific match) sorts first and TOP 1 returns it.
SQLFiddle here.
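To see why that ORDER BY picks the right row, here is the wildcard count for each of the three sample expressions; the query below is just an illustration of the metric, using the table and column names from the question.
-- '%%|%%|%%|0|%%|%%|0|'  -> 10 percent signs
-- '1|2|%%|0|%%|%%|0|'    ->  6 percent signs (fewest wildcards, returned first)
-- '1|%%|%%|0|%%|%%|0|'   ->  8 percent signs
SELECT Expression,
       LEN(Expression) - LEN(REPLACE(Expression, '%', '')) AS wildcard_count
FROM MyTable
ORDER BY wildcard_count;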

How to filter in a SQL script to exclude rows where any column is NULL

Imagine there are 50 columns. I don't want any row that includes a NULL value. Is there any tricky way to do this?
SQL Server 2005
Sorry, not really. All 50 columns have to be checked in one form or another.
Column1 IS NOT NULL AND ... AND Column50 IS NOT NULL
Of course, under these conditions, why not disallow NULLs in the first place by declaring the columns NOT NULL in the table definition?
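If typing out 50 predicates is the objection, here is a hedged sketch of generating them from the system catalog instead (SQL Server 2005+; dbo.MyTable is a hypothetical table name, not from the question):
DECLARE @sql NVARCHAR(MAX);

SELECT @sql = 'SELECT * FROM dbo.MyTable WHERE '
    + STUFF((SELECT ' AND ' + QUOTENAME(c.name) + ' IS NOT NULL'
             FROM sys.columns AS c
             WHERE c.object_id = OBJECT_ID('dbo.MyTable')
             FOR XML PATH('')), 1, 5, '');  -- strip the leading ' AND '

EXEC sp_executesql @sql;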
If it's SQL Server 2005+ you can do something like:
SELECT fields
FROM MyTable
WHERE stuff
EXCEPT -- This excludes the below results
SELECT fields
FROM MyTable
WHERE (Col1 + Col2 + Col3....) IS NULL
Adding a NULL to a value results in NULL, so if any column is NULL, the sum of all your columns will be NULL.
This may need to change based on your data types, but adding NULL to either a char/varchar or a number will result in another NULL.
If you are looking at the values not being null, you can do this in the select statement.
SELECT ISNULL(firstname,''), ISNULL(lastname,'') FROM TABLE WHERE SOMETHING=1
This will replace NULLs with empty strings. If you want another value, use ISNULL(firstname,'empty'), for example; you can put anything in place of the word 'empty'.
I prefer this query
select *
from table
where column1>''
and column2>''
and (column3>'' or column3<'')
This allows SQL Server to use an index seek if the proper index(es) exist. You would have to use the column3-style syntax for any numeric columns whose values could be negative.