Getting value of row that corresponds to distinct? (proc sql)

I have a dataset unique across 5 variables. The 5th is an identifier variable. Finally, I have a 6th variable, which is dependent on the identifier variable.
The identifier variable can appear in multiple places. The dependent variable will never change for a given value of the identifier variable.
I have code such as the following:
proc sql;
select
...
, count(distinct identifier) as n_ids
from
group by
Which selects the number of unique identifiers per group of 4 independent variables. I'm hoping to add on to this the sum of the 6th variable, which would be something like the following:
sum(case when distinct identifier then dependent_var else 0 end)
Which obviously does not work (and for good reason). Any clean way of finding this sum within the sql step?

The easiest solution is probably to summarize the dataset first by the identifier:
proc sql;
select biggerstuff, identifier, max(depvar)
from yourdataset
group by biggerstuff,identifier;
quit;
Then use that as the inner query in the FROM clause of your larger query (select blah, count(identifier), sum(depvar)). Once you've pre-summarized in the inner query, you know that you get only 1 row per identifier within each group, so distinct is no longer needed.
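As a rough illustration of the pre-summarize-then-aggregate idea, here is a minimal sketch run through Python's sqlite3 module rather than SAS. The table layout, the extra column "other", and the sample rows are assumptions made up for the demo; "biggerstuff" stands in for the grouping variables.

```python
import sqlite3

def summarize(rows):
    """rows: (biggerstuff, other, identifier, depvar) tuples (hypothetical layout).

    Returns one (biggerstuff, n_ids, dep_sum) tuple per group.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table yourdataset "
        "(biggerstuff text, other text, identifier integer, depvar integer)"
    )
    conn.executemany("insert into yourdataset values (?, ?, ?, ?)", rows)
    result = conn.execute(
        """
        select biggerstuff, count(identifier) as n_ids, sum(depvar) as dep_sum
        from (
            -- inner query: exactly one row per (group, identifier)
            select biggerstuff, identifier, max(depvar) as depvar
            from yourdataset
            group by biggerstuff, identifier
        )
        group by biggerstuff
        order by biggerstuff
        """
    ).fetchall()
    conn.close()
    return result

# Identifier 1 appears twice in group "a" but should be counted and summed once.
sample = [("a", "x", 1, 10), ("a", "y", 1, 10), ("a", "x", 2, 5), ("b", "x", 1, 10)]
print(summarize(sample))
```

Because depvar never changes for a given identifier, max(depvar) in the inner query just picks that single value; min(depvar) would work equally well.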

How to backreference a calculated column value in another column during an INSERT query on Postgres? (query-runtime temporary variable assignment)

In MySQL there's some helpful syntax for doing things like SELECT @calc:=3, @calc, but I can't find a way to do this in PostgreSQL.
The idea would be something like:
SELECT (SET) autogen := UUID_GENERATE_v4() AS id, :autogen AS duplicated_id;
returning a row with 2 columns with same value
EDIT: Not interested in conventional \set, I need to do this for hundreds of rows
You can use a subquery:
select id, id as duplicated_id
from (select UUID_GENERATE_v4() AS id
) x
Postgres does not confuse the select statement by allowing variable assignment. Even if it did, nothing guarantees the order of evaluation of expressions in a select, so you still would not be sure that it worked.

Using dot notation with sum() to query the same, but multiple columns, in multiple databases

SQLite
There are multiple databases, one database for each time period (i.e. quarter). The column headers in each table are the same. For some of the columns, the data is identical between databases (e.g. ID, Name, Address, State, Website, etc). For other columns, the column header is the same but the data in the column is different between databases.
The goal is to:
Select multiple columns from multiple databases, sum each column, and convert the output from 000000000 to $000,000,000,000, adding three zeros to the output (currently the data is represented in 000's).
Following is an iteration of queries that work, ending in the queries that fail.
Selecting one column from one database. This query works.
select dep
From AllReports19921231AssetsAndLiabilities;
output
"11005"
"34396"
"42244"
Adding a sum(columnName) method to this same query works.
select sum(dep)
From AllReports19921231AssetsAndLiabilities;
results: 3562807353
Attempting to sum(columnName) from multiple databases causes an error.
select sum(dep)
From AllReports19921231AssetsAndLiabilities,
AllReports19930331AssetsAndLiabilities;
error:
ambiguous column name: dep: select sum(dep)
From AllReports19921231AssetsAndLiabilities,
AllReports19930331AssetsAndLiabilities;
Using dot notation to attach a database to a column. Query works.
select AllReports19921231AssetsAndLiabilities.dep
From AllReports19921231AssetsAndLiabilities;
Output:
"11005"
"34396"
"42244"
However when I attempt to include dot notation and add sum(columnName) to the query, it fails.
select AllReports19921231AssetsAndLiabilities.sum(dep)
From AllReports19921231AssetsAndLiabilities;
I receive this error:
near "(": syntax error: select AllReports19921231AssetsAndLiabilities.sum(
What are correct ways to write this query?
The end goal is to select the same columns (e.g. col1, col2, col3, etc) from multiple databases (Q1, Q2, Q3, Q4).
Sum each column, add three zeros to the output, then convert from 000000000 to $000,000,000,000.
Note: There are 103 databases (i.e. one for each time period/quarter).
select AllReports19921231AssetsAndLiabilities.sum(dep),
AllReports19930331AssetsAndLiabilities.sum(dep),
AllReports19930630AssetsAndLiabilities.sum(dep)
From AllReports19921231AssetsAndLiabilities,
AllReports19930331AssetsAndLiabilities,
AllReports19930630AssetsAndLiabilities;
The above query outputs an error:
near "(": syntax error: select AllReports19921231AssetsAndLiabilities.sum(
Your syntax is wrong. The table qualifier goes on the column, inside the function call:
select sum(AllReports19921231AssetsAndLiabilities.dep)
From AllReports19921231AssetsAndLiabilities
Learn to use aliases!
select sum(aal.dep)
From AllReports19921231AssetsAndLiabilities aal;
The query is much easier to write and to read. The table alias (whether the full table name or an abbreviation) is attached to the column name. In SQL, this results in a qualified column reference. The qualification specifies what table it is coming from.
The table alias is not attached to a function, because SQL does not currently allow tables to contain functions.
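For the stated end goal (one total per quarterly table, scaled from thousands and shown as dollars), a minimal sketch using Python's sqlite3 module might look like the following. It queries each table separately, since listing several tables in one FROM clause with commas produces a cross join and would inflate the sums. The helper names, the in-memory database, and the second table's values are assumptions for illustration; the first table reuses the three sample values from the question.

```python
import sqlite3

TABLES = [
    "AllReports19921231AssetsAndLiabilities",
    "AllReports19930331AssetsAndLiabilities",
]

def quarterly_totals(conn, tables, column="dep"):
    """Return {table_name: sum(column)}, one query per table."""
    totals = {}
    for table in tables:
        # Table and column names cannot be bound as parameters, so they are
        # quoted and interpolated; only pass trusted names here.
        sql = f'select coalesce(sum("{column}"), 0) from "{table}"'
        (total,) = conn.execute(sql).fetchone()
        totals[table] = total
    return totals

def as_dollars(thousands):
    """Scale a figure stored in thousands and format it like $000,000,000."""
    return f"${thousands * 1000:,}"

# Demo data (hypothetical apart from the three values shown in the question).
conn = sqlite3.connect(":memory:")
for name in TABLES:
    conn.execute(f'create table "{name}" (dep integer)')
conn.executemany(f'insert into "{TABLES[0]}" values (?)', [(11005,), (34396,), (42244,)])
conn.executemany(f'insert into "{TABLES[1]}" values (?)', [(100,), (200,)])

totals = quarterly_totals(conn, TABLES)
for name, total in totals.items():
    print(name, as_dollars(total))
```

With 103 tables, looping over a list of names like this is usually easier to maintain than one very wide SELECT.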

Proc Sql case confusion

Within SAS
I have a proc-sql step that I'm using to create macro variables to do some list processing.
I have run into a confusing step where using a case statement rather than a where statement results in the first row of the resulting data set being a null string ('').
There are no null strings contained in either field in either table.
These are two sample SQL steps with all of the macro business removed for simplicity:
create table test as
select distinct
case
when brand in (select distinct core_brand from new_tv.core_noncore_brands) then brand
end as brand1
from new_tv.new_tv2
;
create table test2 as
select distinct brand
from new_tv.new_tv2
where brand in (select distinct core_brand from new_tv.core_noncore_brands)
;
Using the first piece of code, the result is a table with multiple rows, the first being an empty string.
The second piece of code works as expected.
Any reason for this?
So the difference is that without a WHERE clause you aren't limiting what you are selecting, i.e. every row is considered. The CASE expression can bucket items by criteria, but you don't lose rows just because your buckets don't catch everything, hence the NULL. WHERE limits the rows being returned.
Yes, the first has no then clause in the case statement. I'm surprised that it even parses. It wouldn't in many SQL dialects.
Presumably you mean:
create table test as
select distinct
case
when brand in (select distinct core_brand from new_tv.core_noncore_brands)
then brand
end as brand1
from new_tv.new_tv2
;
The reason you are getting the NULL is that the case statement returns NULL for the non-matching brands. You would need to add:
where brand1 is not NULL
to prevent this (using either a subquery or making brand1 a calculated field).
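The difference between the two forms can be reproduced in a small sketch. This one uses Python's sqlite3 module instead of SAS, so the missing value shows up as NULL (None in Python) rather than a blank string; the brand names are made-up sample data and the library prefix new_tv. is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table new_tv2 (brand text);
    create table core_noncore_brands (core_brand text);
    insert into new_tv2 values ('Acme'), ('Bolt'), ('Zed');
    insert into core_noncore_brands values ('Acme'), ('Bolt');
    """
)

# CASE with no ELSE: every row is kept, non-matching brands become NULL.
case_sql = """
    select distinct
        case when brand in (select core_brand from core_noncore_brands)
             then brand
        end as brand1
    from new_tv2
"""
case_rows = {row[0] for row in conn.execute(case_sql)}

# WHERE: non-matching rows are removed before anything is selected.
where_sql = """
    select distinct brand
    from new_tv2
    where brand in (select core_brand from core_noncore_brands)
"""
where_rows = {row[0] for row in conn.execute(where_sql)}

# Wrapping the CASE query and filtering on brand1 removes the NULL row.
filtered_sql = f"select brand1 from ({case_sql}) where brand1 is not null"
filtered_rows = {row[0] for row in conn.execute(filtered_sql)}

print(case_rows, where_rows, filtered_rows)
```

Here 'Zed' is the only non-core brand, so the CASE version carries one extra NULL entry that the other two queries do not.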
Your first query is not correct, there is no 'then' statement in the 'case' clause.
create table test as
select distinct
case
when brand in (select distinct core_brand from new_tv.core_noncore_brands)
then value
end as brand1
from new_tv.new_tv2
;
You probably get the NULL value because the 'case' clause has no default ('else') value, so it returns NULL for any value that doesn't meet the condition. There is a difference between the 'case' clause and the 'where ... in' filter: the first returns all the rows, with NULL in place of the values that do not meet the condition, while the second returns only the rows that meet the condition.

Convert select into stored procedure best approach

I use this SQL to get count for every group of type.
select
mytype, count(mytype)
from types1
group by 1
The result is 5 records with count for each type. I need to convert this to a stored procedure; should I write the above SQL using For...Select or should I return single value using Select...Where...Into 5 times for each type?
I will use the return counts to update a master table and types may increase in the future.
That depends on what you want out of the procedure:
If you want the same output as your select with five rows, use a FOR SELECT. You will get one row for each type and an associated count. This is probably the "standard" approach.
If however you want five output variables, one for each count of each type, you can use five queries of the form SELECT COUNT(1) FROM types1 WHERE mytype = 'type1' INTO :type1. Realize though that this will be five queries and you may be better off doing a single FOR SELECT query and looping through the returned rows in the procedure. Also note that if you at some point add a sixth type you will have to change this procedure to add the additional type.
If you want to query a single type, you can also do something like the following, which will return a single row with a single count for the type in the input parameter:
CREATE PROCEDURE GetTypeCount(
TypeName VARCHAR(256)
)
RETURNS (
TypeCount INTEGER
)
AS
BEGIN
SELECT COUNT(1)
FROM types1
WHERE mytype = :TypeName
INTO :TypeCount;
SUSPEND;
END
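The first option above (one grouped query, then a loop over the returned rows) can be sketched outside of a stored procedure as well. The following is only an analogue in Python's sqlite3 module, not Firebird PSQL; the master table, its columns, and the sample types are assumptions, and "insert or replace" is SQLite syntax.

```python
import sqlite3

def refresh_type_counts(conn):
    """Run the grouped count once and copy each row into the master table.

    Returns the number of types processed.
    """
    rows = conn.execute(
        "select mytype, count(mytype) from types1 group by mytype"
    ).fetchall()
    for mytype, type_count in rows:
        # Hypothetical master table keyed by mytype.
        conn.execute(
            "insert or replace into master (mytype, type_count) values (?, ?)",
            (mytype, type_count),
        )
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table types1 (mytype text);
    create table master (mytype text primary key, type_count integer);
    insert into types1 values ('a'), ('a'), ('b');
    """
)
print(refresh_type_counts(conn))
```

Because the loop is driven by whatever the grouped query returns, adding a new type later needs no code change, which is the main argument for this shape over one query per type.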

How to select a sql table's column value by giving column index?

I have table with 3 columns. One is Id, second column is Name and the third one Description. How can I select the value in the Description field by giving the column index, 3?
Thanks in advance
You can't, from plain SQL (other than in the ORDER BY clause, which won't give you the value but will allow you to sort the result set by it).
If you are using another programming language to construct a dynamic query, you could use that to identify the column being selected by its index number.
Alternatively, you could parameterise your query to return a specific column based on a case statement - like so:
select a, b, c, d, e, ...,
case ?
when 1 then a
when 2 then b
when 3 then c
when 4 then d
when 5 then e
...
end as parameterised_column
from ...
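A minimal runnable version of the parameterised CASE approach, using Python's sqlite3 module, could look like this. The table name tbl_name, the sample row, and the helper name value_by_index are assumptions for the demo; the index is 1-based to match the question.

```python
import sqlite3

def value_by_index(conn, row_id, col_index):
    """Return the value of the col_index-th column (1-based) for one row.

    An index outside 1..3 matches no WHEN branch, so the result is None.
    """
    row = conn.execute(
        """
        select case ?
                   when 1 then Id
                   when 2 then Name
                   when 3 then Description
               end
        from tbl_name
        where Id = ?
        """,
        (col_index, row_id),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("create table tbl_name (Id integer, Name text, Description text)")
conn.execute("insert into tbl_name values (1, 'alpha', 'first row')")

print(value_by_index(conn, 1, 3))
```

The mapping from index to column lives in the query text, so it has to be updated by hand if the table's columns change.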
The problem with referring to a column by an index number is that, one day, someone may add a column and break your application as the wrong value will be returned.
This principle is enforced in SQL because you can only select named columns, or all columns using the * syntax.
This principle is not enforced in programming languages, where you can usually access the column by ordinal in code, but you should consider the principle before deciding to use a statement such as (pseudocode)
value = results[0].column[2].value;
It should be possible. You'd have to query the system tables (which do vary from one version of SQL to another) to get the 3rd (or Nth) column name as a string to form a following query using that column name.
In SQL 2000 the tables you'll need to start with are syscolumns with a join to sysobjects for the table name. Then the rank() function on "Colid" will give you the Nth column and "name" (shockingly) the name of the column. Once you've got that in a variable the following command can return the value, compare to it, order by it or whatever you need.
This is how you can retrieve a column's name by passing its index.
Here the variable AcID is used as the index of the column.
Below is the code, e.g.
Dim gFld As String
vSqlText1 = "Select * from RecMast where ID = 1000"
vSql1 = New SqlClient.SqlCommand(vSqlText1, cnnRice)
vRs1 = vSql1.ExecuteReader()
If vRs1.Read() Then
    ' GetName takes a zero-based column ordinal
    gFld = vRs1.GetName(AcID)
    MsgBox(gFld)
End If
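The same look-up-the-name-then-query idea can be sketched with Python's sqlite3 module, where cursor.description plays the role of GetName. The helper names, the sample columns of RecMast, and the in-memory database are assumptions for illustration; the index is zero-based here.

```python
import sqlite3

def column_name(conn, table, index):
    """Return the name of the column at a zero-based position in the table."""
    # limit 0 fetches no rows; the cursor still describes the result columns.
    cur = conn.execute(f'select * from "{table}" limit 0')
    return cur.description[index][0]

def value_at(conn, table, index, row_id):
    """Look up the column name by position, then select that column by name."""
    name = column_name(conn, table, index)
    # Identifiers cannot be bound as parameters, so the name is quoted and
    # interpolated; the row id is still passed as a bound parameter.
    row = conn.execute(
        f'select "{name}" from "{table}" where ID = ?', (row_id,)
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.execute("create table RecMast (ID integer, Name text, Description text)")
conn.execute("insert into RecMast values (1000, 'rice', 'long grain')")

print(column_name(conn, "RecMast", 2), value_at(conn, "RecMast", 2, 1000))
```

As noted earlier in the thread, this still breaks silently if someone inserts a column ahead of the one you expect, so selecting by name is the safer default.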
declare @searchIndex int
set @searchIndex = 3
select Description from tbl_name t where t.Id = @searchIndex