ERROR: Subquery evaluated to more than one row. in SAS - sql

I wrote the following code in SAS to flag the records where the count of egrefid, grouped by subjid and cpevent, is not equal to 3, but was told "ERROR: Subquery evaluated to more than one row."
case when (select count(egrefid) from INFMM.EDAT_EG004
group by subjid, cpevent
having count(egrefid) ne 3)
and cpevent in ('DAY1', 'DAY29', 'DAY85') then 'triplicate'
else ' ' end as flag
I think the problem is in the count() function, but don't know how to fix it.
Does anybody know how to solve this problem?

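The CASE WHEN expression is evaluated for every row, and a subquery used as a condition there must return a single value. Your subquery has a GROUP BY, so it returns one count for each subjid and cpevent group in INFMM.EDAT_EG004 whose count is not 3, which is more than one row.
I think a join would be your best bet in this instance: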
create table egrefid_counts as
select subjid, cpevent, count(egrefid) as egrefid_count
from INFMM.EDAT_EG004
where cpevent in ('DAY1', 'DAY29', 'DAY85')
group by subjid, cpevent
;
Then you join that to your table on subjid and cpevent
select a.*, case when b.egrefid_count = 3 then 'triplicate'
else ' ' end as flag
from <whatever your table is> as a
left join
egrefid_counts as b
on a.subjid=b.subjid and a.cpevent = b.cpevent
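To try the join approach end to end, here is a minimal sketch. It runs against an in-memory SQLite database from Python rather than SAS, and the table name eg and the sample rows are made up for illustration. The counts are computed in an inline subquery instead of a separate table, but the idea is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for INFMM.EDAT_EG004 with made-up rows.
conn.executescript("""
CREATE TABLE eg (subjid TEXT, cpevent TEXT, egrefid INTEGER);
INSERT INTO eg VALUES
  ('S1', 'DAY1', 1), ('S1', 'DAY1', 2), ('S1', 'DAY1', 3),
  ('S1', 'DAY29', 1), ('S1', 'DAY29', 2),
  ('S2', 'SCREEN', 1), ('S2', 'SCREEN', 2), ('S2', 'SCREEN', 3);
""")

flag_sql = """
SELECT a.subjid, a.cpevent, a.egrefid,
       CASE WHEN b.egrefid_count = 3 THEN 'triplicate' ELSE ' ' END AS flag
FROM eg AS a
LEFT JOIN (
    -- one row per (subjid, cpevent), restricted to the visits of interest
    SELECT subjid, cpevent, COUNT(egrefid) AS egrefid_count
    FROM eg
    WHERE cpevent IN ('DAY1', 'DAY29', 'DAY85')
    GROUP BY subjid, cpevent
) AS b
  ON a.subjid = b.subjid AND a.cpevent = b.cpevent
ORDER BY a.subjid, a.cpevent, a.egrefid
"""
flag_rows = conn.execute(flag_sql).fetchall()
for row in flag_rows:
    print(row)
```

Only the three S1/DAY1 rows get the flag. S1/DAY29 has two rows, and S2/SCREEN finds no match in the counts, so its count is NULL and the CASE falls through to the ELSE.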

Related

Avoiding aggregation when selecting values from tables

I have the following code which selects value from table2 when 'some string' occurs more than once in 1990
SELECT a.value, COUNT(*) AS test
FROM table1 c
JOIN table2 a
ON c.value2 = a.value_2
JOIN table3 o
ON c.value3 = o.value_3
AND o.value4 = 1990
WHERE c.string = 'Some string'
GROUP BY a.value
HAVING COUNT(*) > 1
This works fine, but I am attempting to write a query that produces a similar result without using aggregation. I just need to select the values that have more than one matching c.string, rather than counting and selecting the count as well. I thought about searching for pairs of 'some string' occurring in 1990 for a value, but am unsure how to execute this. Pointing me in the right direction would be appreciated! I'm struggling to find any documentation on this. Thank you!
Use window function ROW_NUMBER() to assign a sequence number within the rows of each table2.value. And use window function FIRST_VALUE() to get the largest row number for each table2.value. Use DISTINCT to remove the duplicates:
select distinct value, first_value(rn) over (partition by value order by rn desc) as count
from
(
SELECT a.value , row_number() over (partition by a.value order by null) rn
FROM table1 c
JOIN table2 a
ON c.value2 = a.value_2
JOIN table3 o
ON c.value3 = o.value_3
AND o.value4 = 1990
WHERE c.string = 'Some string' ) t
where rn > 1;
To check for duplicates, you can use 'WHERE EXISTS', as a starting point. You could start by reading this:
https://www.w3schools.com/sql/sql_exists.asp
This will give you quite a long, cumbersome piece of code compared to using aggregation. But I expect that's the point of the task - to show how useful aggregation is.
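For what it's worth, here is a rough sketch of the EXISTS idea, run against in-memory SQLite from Python. The id column on table1 is my own assumption (you need some way to tell two matching rows apart), the sample rows are invented, and it assumes each table1 row matches at most one row in table2 and table3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented sample data; table1.id is an assumed unique row identifier.
conn.executescript("""
CREATE TABLE table1 (id INTEGER PRIMARY KEY, value2 INTEGER, value3 INTEGER, string TEXT);
CREATE TABLE table2 (value_2 INTEGER, value TEXT);
CREATE TABLE table3 (value_3 INTEGER, value4 INTEGER);
INSERT INTO table2 VALUES (10, 'A'), (20, 'B');
INSERT INTO table3 VALUES (100, 1990), (200, 1991);
INSERT INTO table1 VALUES
  (1, 10, 100, 'Some string'),
  (2, 10, 100, 'Some string'),
  (3, 20, 100, 'Some string'),
  (4, 20, 200, 'Some string'),
  (5, 20, 100, 'Other');
""")

exists_sql = """
SELECT DISTINCT a.value
FROM table1 c
JOIN table2 a ON c.value2 = a.value_2
JOIN table3 o ON c.value3 = o.value_3 AND o.value4 = 1990
WHERE c.string = 'Some string'
  AND EXISTS (
      -- a second, different table1 row that qualifies for the same value
      SELECT 1
      FROM table1 c2
      JOIN table2 a2 ON c2.value2 = a2.value_2
      JOIN table3 o2 ON c2.value3 = o2.value_3 AND o2.value4 = 1990
      WHERE c2.string = 'Some string'
        AND a2.value = a.value
        AND c2.id <> c.id
  )
"""
dup_values = conn.execute(exists_sql).fetchall()
print(dup_values)
```

Value A has two qualifying rows, so each one finds the other. Value B has only one qualifying row in 1990, so nothing comes back for it.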

Adding a new computed variable back to main dataset in SQL

I am trying to compute a variable (say last_week) and add it back to my main dataset (say new_j). I managed to join it to new_j. However, if I want to use that variable (last_week) now for further calculations, it does not recognise it. Here's my code:
SELECT [Weekkey] AS weekkey
,[article / colour] as prod_id
,[Current MP Department No/Desc] as prod_dept
,[Total Stock] as total_stock
INTO #new_j
FROM [J_20160831] --(that’s the db in server and I created a temp db #new_j)
SELECT prod_id, max(weekkey) as last_week
into #lastweeksales
FROM #new_j
group by prod_id
select *
from #new_j
left join #lastweeksales
on #lastweeksales.prod_id = #new_j.prod_id
So, I joined both successfully and if I run this code, I see column last_week. Now what I want to do is this:
select *
,case
when last_week = max(weekkey) then total_stock
else 0
end as last_stock_position
from #new_j
But it says last_week is not found in #new_j. I also tried #lastweeksales.last_week instead of just last_week in the last bit of code, but that didn't work either. What's the best way out here? Also, is there a better way to do it? The output I want at the end is a table with these variables: WeekKey, prod_dept, prod_id, total_stock, last_week, last_stock_position
Thanks for the help! Much appreciated.
This is the normal behaviour of joins.
When you run
select * from #new_j left join #lastweeksales
on #lastweeksales.prod_id = #new_j.prod_id
all the columns of #new_j and #lastweeksales are displayed in that order (first the #new_j columns, then the #lastweeksales columns), so last_week appears as the last column. The join only shapes that one result set; it does not add last_week to #new_j itself.
Secondly,
select *,
case when last_week = max(weekkey) then total_stock
else 0
end as last_stock_position
from #new_j
In the query above, you are referencing the last_week column, which belongs to #lastweeksales, while selecting only from #new_j.
Be careful about which table each column comes from.
I guess you are expecting this:
select a.WeekKey, a.prod_dept, a.prod_id, a.total_stock, b.last_week,
case
when b.last_week = max(a.weekkey) then total_stock
else 0
end as last_stock_position
from #new_j as a
left join #lastweeksales as b
on b.prod_id = a.prod_id
group by a.weekkey,a.prod_dept,a.prod_id,a.total_stock,b.last_week
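If you want to experiment with the idea outside SQL Server, here is a small sketch using in-memory SQLite from Python. The sample rows are made up, a plain table stands in for the #new_j temp table, and the per-product maximum is computed in an inline subquery instead of a second temp table. Comparing a.weekkey to b.last_week directly also avoids the GROUP BY on the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Plain table standing in for the #new_j temp table; rows are invented.
conn.executescript("""
CREATE TABLE new_j (weekkey INTEGER, prod_id TEXT, prod_dept TEXT, total_stock INTEGER);
INSERT INTO new_j VALUES
  (201601, 'P1', 'D1', 5),
  (201602, 'P1', 'D1', 7),
  (201601, 'P2', 'D2', 3);
""")

last_week_sql = """
SELECT a.weekkey, a.prod_dept, a.prod_id, a.total_stock, b.last_week,
       CASE WHEN a.weekkey = b.last_week THEN a.total_stock ELSE 0 END
           AS last_stock_position
FROM new_j AS a
LEFT JOIN (
    -- latest week per product
    SELECT prod_id, MAX(weekkey) AS last_week
    FROM new_j
    GROUP BY prod_id
) AS b
  ON b.prod_id = a.prod_id
ORDER BY a.prod_id, a.weekkey
"""
stock_rows = conn.execute(last_week_sql).fetchall()
for row in stock_rows:
    print(row)
```

Each row keeps its own total_stock only when it is the product's latest week; earlier weeks get 0.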

Select the last non-NULL value when current row is NULL

I know that there are a lot of solutions for this but unfortunately I cannot use partition or keyword TOP. Nothing I tried on earlier posts works.
My table looks like this:
The result I want is that when any completion percentage is NULL, it should get the value from the last non-NULL completion percentage, like this:
I tried this query but nothing works. Can you tell me where I am going wrong?
SELECT sequence,project_for_lookup,
CASE WHEN completion_percentage IS NOT NULL THEN completion_percentage
ELSE
(SELECT max(completion_percentage) FROM [project_completion_percentage] AS t2
WHERE t1.project_for_lookup=t2.project_for_lookup and
t1.sequence<t2.sequence and
t2.completion_percentage IS NOT null
END
FROM [project_completion_percentage] AS t1
SQL Server 2008 doesn't support cumulative window functions. So, I would suggest outer apply:
select cp.projectname, cp.sequence,
coalesce(cp.completion_percentage, cp2.completion_percentage) as completion_percentage
from completion_percentage cp outer apply
(select top 1 cp2.*
from completion_percentage cp2
where cp2.projectname = cp.projectname and
cp2.sequence < cp.sequence and
cp2.completion_percentage is not null
order by cp2.sequence desc
) cp2;
Does this work? It seems to for me. You were missing a parenthesis and had the sequence backwards.
http://sqlfiddle.com/#!3/465f2/4
SELECT sequence,project_for_lookup,
CASE WHEN completion_percentage IS NOT NULL THEN completion_percentage
ELSE
(
SELECT max(completion_percentage)
FROM [project_completion_percentage] AS t2
WHERE t1.project_for_lookup=t2.project_for_lookup
-- sequence was reversed. You're on the row t1, and want t2 that is from a prior sequence.
and t2.sequence<t1.sequence
and t2.completion_percentage IS NOT null
--missing a closing paren
)
END
FROM [project_completion_percentage] AS t1
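One caveat on the query above: MAX(completion_percentage) returns the largest earlier value, which is the same as the most recent one only if the percentages never go down. If they can decrease, a variant that looks up the latest earlier sequence first may be safer. Here is a sketch of that, run on in-memory SQLite from Python with invented rows, using neither TOP nor PARTITION. It assumes sequence is unique within a project.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented rows; P1 drops from 50 to 30 to show the difference from MAX(value).
conn.executescript("""
CREATE TABLE project_completion_percentage (
    sequence INTEGER, project_for_lookup TEXT, completion_percentage INTEGER);
INSERT INTO project_completion_percentage VALUES
  (1, 'P1', 50), (2, 'P1', NULL), (3, 'P1', 30), (4, 'P1', NULL), (5, 'P1', NULL),
  (1, 'P2', NULL), (2, 'P2', 10);
""")

fill_sql = """
SELECT t1.sequence, t1.project_for_lookup,
       COALESCE(
           t1.completion_percentage,
           (SELECT t2.completion_percentage
            FROM project_completion_percentage AS t2
            WHERE t2.project_for_lookup = t1.project_for_lookup
              AND t2.sequence = (
                  -- latest earlier sequence that has a non-NULL value
                  SELECT MAX(t3.sequence)
                  FROM project_completion_percentage AS t3
                  WHERE t3.project_for_lookup = t1.project_for_lookup
                    AND t3.sequence < t1.sequence
                    AND t3.completion_percentage IS NOT NULL))
       ) AS completion_percentage
FROM project_completion_percentage AS t1
ORDER BY t1.project_for_lookup, t1.sequence
"""
filled_rows = conn.execute(fill_sql).fetchall()
for row in filled_rows:
    print(row)
```

Sequences 4 and 5 of P1 pick up 30, the most recent value, rather than 50. The first row of P2 stays NULL because nothing precedes it.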

Group By & Having vs. SubQuery (Where Count is Greater Than 1)

I'm struggling here trying to write a script that finds where an order was returned multiple times by the same associate (count greater than 1). I'm guessing my syntax with the subquery is incorrect. When I run the script, I get a message back that the "SELECT failed.. [3669] More than one value was returned by the subquery."
I'm not tied to the subquery, and have tried using just the group by and having statements, but I get an error regarding a non-aggregate value. What's the best way to proceed here and how do I fix this?
Thank you in advance - code below:
SEL s.saletran
, s.saletran_dt SALE_DATE
, r.saletran_id RET_TRAN
, r.saletran_dt RET_DATE
, ra.user_id RET_ASSOC
FROM salestrans s
JOIN salestrans_refund r
ON r.orig_saletran_id = s.saletran_id
AND r.orig_saletran_dt = s.saletran_dt
AND r.orig_loc_id = s.loc_id
AND r.saletran_dt between s.saletran_dt and s.saletran_dt + 30
JOIN saletran rt
ON rt.saletran_id = r.saletran_id
AND rt.saletran_dt = r.saletran_dt
AND rt.loc_id = r.loc_id
JOIN assoc ra --Return Associate
ON ra.assoc_prty_id = rt.sls_assoc_prty_id
WHERE
(SELECT count(*)
FROM saletran_refund
GROUP BY ORIG_SLTRN_ID
) > 1
AND s.saletran_dt between '2015-01-01' and current_date - 1
Based on what you've got so far, I think you want to use this instead:
where r.ORIG_SLTRN_ID in
(select
ORIG_SLTRN_ID
from
saletran_refund
group by ORIG_SLTRN_ID
having count (*) > 1)
That will give you the ORIG_SLTRN_IDs that have more than one row.
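A quick way to see the IN plus GROUP BY/HAVING pattern work is a toy version like the one below. It runs on in-memory SQLite from Python instead of Teradata, and the table is cut down to two invented columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down, invented version of saletran_refund.
conn.executescript("""
CREATE TABLE saletran_refund (saletran_id TEXT, orig_sltrn_id TEXT);
INSERT INTO saletran_refund VALUES
  ('R1', 'O1'), ('R2', 'O1'), ('R3', 'O2');
""")

multi_sql = """
SELECT r.saletran_id, r.orig_sltrn_id
FROM saletran_refund AS r
WHERE r.orig_sltrn_id IN (
    -- original sales that were refunded more than once
    SELECT orig_sltrn_id
    FROM saletran_refund
    GROUP BY orig_sltrn_id
    HAVING COUNT(*) > 1
)
ORDER BY r.saletran_id
"""
multi_refunds = conn.execute(multi_sql).fetchall()
print(multi_refunds)
```

The subquery can return any number of ids because IN accepts a list, which is what the "> 1" comparison could not do.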
You don't give enough for a full answer, but this is a start:
group by s.saletran
, s.saletran_dt
, r.saletran_id
, r.saletran_dt
, ra.user_id
having count(distinct(ORIG_SLTRN_ID)) > 0
Your subquery does return more than one row. Run it on its own to see:
SELECT count(*)
FROM saletran_refund
GROUP BY ORIG_SLTRN_ID

SQL subquery in the AND statement

A couple of problems.
(Solved) valid_from_tsp <> max(valid_from_tsp): how can I get my query to filter on a row not being the max date? This idea doesn't work. The error returned is: "Improper use of an aggregate function in a WHERE clause"
My second issue: when I run it without the date, I get a syntax error: Syntax error, expected something like 'IN' keyword or 'CONTAINS' keyword between ')' and ')'
What do you see that I don't? Thanks in advance.
Edited Query
select
a.*,
b.coverage_typ_cde as stg_ctc
from P_FAR_BI_VW.V_CLAIM_SERVICE_TYP_DIM a
inner join (select distinct etl_partition_id, coverage_typ_cde from
P_FAR_STG_VW.V_CLAIM_60_POLICY_STG where row_Create_tsp > '2013-11-30 23:23:59')b
on (a.etl_partition_id = b.etl_partition_id)
where a.valid_from_tsp > '2013-11-30 23:23:59'
and a.coverage_typ_cde = ' '
and (select * from P_FAR_SBXD.T_CLAIM_SERVICE_TYP_DIM where service_type_id = 136548255
and CAST(valid_from_tsp AS DATE) <> '2014-03-14')
Trouble part: and (select * from P_FAR_SBXD.T_CLAIM_SERVICE_TYP_DIM where service_type_id = 136548255
and CAST(valid_from_tsp AS DATE) <> '2014-03-14')
I am trying to filter by the date on the service_type_id, and I am getting the error in question 2
As for sample data: this is kinda tricky. This query returns many thousands of rows of data. Currently, when I do the inner join, I get a secondary unique index violation error, so I am trying to filter out everything but the most recent row that could fall under that violation (service_type_id is the secondary index).
If I bring back three rows with the same service_type_id and three different valid_from_tsp timestamps, I only want to keep the newest one and not return the other two in the query.
I don't know about your second question, but your first error is due to using an aggregate function max in a where clause. I'm not really sure what you want to do here, but a quick fix is to replace max(valid_from_tsp) with a subquery that only returns the maximum value.
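As a sketch of what that could look like when the goal is to keep only the newest row per service_type_id: the example below runs on in-memory SQLite from Python, and the table name service_type_dim and its rows are invented stand-ins. The subquery is correlated on service_type_id so that each id gets its own maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Invented stand-in table and rows; timestamps stored as ISO text.
conn.executescript("""
CREATE TABLE service_type_dim (service_type_id INTEGER, valid_from_tsp TEXT);
INSERT INTO service_type_dim VALUES
  (1, '2013-12-01 09:00:00'),
  (1, '2014-02-10 09:00:00'),
  (1, '2014-03-14 10:00:00'),
  (2, '2014-01-05 08:00:00');
""")

newest_sql = """
SELECT d.service_type_id, d.valid_from_tsp
FROM service_type_dim AS d
WHERE d.valid_from_tsp = (
    -- the aggregate lives inside a subquery, not directly in WHERE
    SELECT MAX(d2.valid_from_tsp)
    FROM service_type_dim AS d2
    WHERE d2.service_type_id = d.service_type_id
)
ORDER BY d.service_type_id
"""
newest_rows = conn.execute(newest_sql).fetchall()
print(newest_rows)
```

Switching the = to <> would return the older rows instead, which is the filter the original max() attempt was going for.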
This is your query:
select a.*, b.coverage_typ_cde as stg_ctc
from P_FAR_BI_VW.V_CLAIM_SERVICE_TYP_DIM a inner join
(select distinct etl_partition_id, coverage_typ_cde
from P_FAR_STG_VW.V_CLAIM_60_POLICY_STG
where row_Create_tsp > '2013-11-30 23:23:59'
) b
on (a.etl_partition_id = b.etl_partition_id)
where a.valid_from_tsp > '2013-11-30 23:23:59' and
a.coverage_typ_cde = ' ' and
(select *
from P_FAR_SBXD.T_CLAIM_SERVICE_TYP_DIM
where service_type_id = 136548255 and
CAST(valid_from_tsp AS DATE) <> '2014-03-14'
);
In general, you cannot have a subquery just there in the where clause with no condition. Some databases might allow a scalar subquery in this context (one that returns one row and one column), but this isn't a scalar subquery. You can fix the syntax by using exists:
where a.valid_from_tsp > '2013-11-30 23:23:59' and
a.coverage_typ_cde = ' ' and
exists (select 1
from P_FAR_SBXD.T_CLAIM_SERVICE_TYP_DIM
where service_type_id = 136548255 and
CAST(valid_from_tsp AS DATE) <> '2014-03-14'
);