Is it possible to select a column from a database and not display it in the result set - if selected, causes an issue with Group By - sql

I have a query that's being use in my application. I had to make a small edit to it (select an additional column) and now that I do that, I don't get the same results, therefore I get a bad file. Just to give example this is what the query looks like ....
Select
'X' = tblA.VendorNumber,
'Y' = tblB.Label,
'Z' = tblC.InvoiceNo,
'W' = tblD.Checks,
From //Doing some joins here
Group By
tblA.VendorNumber, tblB.label, tblc.InvoiceNo, tblD.Checks
The result set gives me many records, but groups by the ones with identical X,Y,Z,W - so with no Group By it would look like this
X Y Z W
-----------------------------------------------------------
123 Anton 772 0
123 Anton 772 0
Obviously, with the group by they are rolled up into one...
The issue comes when I try to include an additional column in my Select query. I need this query in my, because I need to value in my code to be able to distinguish what type of record it is. With the new column, these two rows of data are not the same, therefore they do not get rolled up.
Is there a way for me to somehow add an additional column, but not display it, and exclude it from the Group By?
This is what I mean
Select
'X' = tblA.VendorNumber,
'Y' = tblB.Label,
'Z' = tblC.InvoiceNo,
'W' = tblD.Checks,
'P' = tblC.Proc -- New column
From //Doing some joins here
Group By
tblA.VendorNumber, tblB.label, tblc.InvoiceNo, tblD.Checks,
tblC.Proc -- New column
In this case the data looks like this
X Y Z W P
---------------------------------------------------------------------
123 Anton 772 0 FPN
123 Anton 772 0 PPN
So now that P is different for the 2 records that previously were rolled up into one, is there a way for me to somehow not display P, however, still be able to get it's value from my record set. I am unable to select the 'P' if it's not selected in this one query and because of the fact that the two records are not rolling up, I'm having some major issues.
Basically I need to select 'P' but not include it in my result set or group by.
Any help would be much appreciated.

There's a couple of options, I guess... There's not really a way to get values without having them in your results. How else would you read them?
One would be to STUFF the values of 'P' into a comma delimited list ie:
X Y Z W P
123 Anton 772 0 FPN,PPN
Then read them in your application separated by comma. Read here: https://stackoverflow.com/questions/31211506/how-stuff-and-for-xml-path-work-in-sql-server
Another would be to create boolean headers if there's not too many options for 'P' or you know all of the options. You can create them using CASE statements like:
,SUM(CASE WHEN tlbc.Proc = 'PPN' THEN 1 ELSE 0 END) AS "PPN"
EDIT: Left out an aggregate around it so it groups correctly. Can use MAX, for 1 or 0, as well... depends how many results there can be for tlbc.Proc. Some aggregate function around your cases will combine your rows to one.
For results like:
X Y Z W FPN PPN AnotherP
123 Anton 772 0 1 1 0
Third, if I misread your question and you don't need the values but just need them in a WHERE, then don't display them.
There's definitely more ways to go about this.
Does this help?

You need to do some sort of aggregate function on P so you don't need to group on it. If you only care about one of the values, you could use MIN() or MAX() to get one of them.
So, something like:
SELECT X, Y, Z, MAX(P) AS P etc....
Another way would be to use something like FOR XML PATH to pivot the values into a single value (maybe a comma separated list). There are lots of examples online of people doing this. Here is one example:
https://sqlperformance.com/2014/08/t-sql-queries/sql-server-grouped-concatenation

If you only want one record per first DISTINCT four column, which one of the P column you want to show?
If you have an answer for this question, just add to the query
Where tblC.Proc = -->YOUR_EXPECTED_VALUE_OR_CONDITION_HERE<--

Related

Values in one Column different but second column the same SQL

I have an initial query written below and need to find values in the quote_id column that different but the corresponding values in the benefit_plan_cd column are the same. The output should look like the below. I know the prospect_nbr for this issue which is why I am able to add it to my initial query to get the expected results but need to be able to find other ones going forward.
select prospect_nbr, qb.quote_id, quote_type, effective_date,
benefit_plan_cd, package_item_cd
from qo_benefit_data qb
inner join
qo_quote qq on qb.quote_id = qq.quote_id
where quote_type = 'R'
and effective_date >= to_date('06/01/2022','mm/dd/yyyy')
and package_item_cd = 'MED'
Output should look like something like this excluding the other columns.
quote_id benefit_plan_cd
514 1234
513 1234
Let's do this in two steps.
First take your existing query and add the following at the end of your select list:
select ... /* the columns you have already */
, count(distinct quote_id partition by benefit_plan_id) as ct
That is the only change - don't change anything else. You may want to run this first, to see what it produces. (Looking at a few rows should suffice, you don't need to look at all the rows.)
Then use this as a subquery, to filter on this count being > 1:
select ... /* only the ORIGINAL columns, without the one we added */
from (
/* write the query from above here, as a SUBquery */
)
where ct > 1
;

How do I do a sum per id?

SELECT distinct
A.PROPOLN, C.LIFCLNTNO, A.PROSASORG, sum (A.PROSASORG) as sum
FROM [FPRODUCTPF] A
join [FNBREQCPF] B on (B.IQCPLN=A.PROPOLN)
join [FLIFERATPF] C on (C.LIFPOLN=A.PROPOLN and C.LIFPRDCNT=A.PROPRDCNT and C.LIFBNFCNT=A.PROBNFCNT)
where C.LIFCLNTNO='2012042830507' and A.PROSASORG>0 and A.PROPRDSTS='10' and
A.PRORECSTS='1' and A.PROBNFLVL='M' and B.IQCODE='B10000' and B.IQAPDAT>20180101
group by C.LIFCLNTNO, A.PROPOLN, A.PROSASORG
This does not sum correctly, it returns two lines instead of one:
PROPOLN LIFCLNTNO PROSASORG sum
1 209814572 2012042830507 3881236 147486968
2 209814572 2012042830507 15461074 463832220
You are seeing two rows because A.PROSASORG has two different values for the "C.LIFCLNTNO, A.PROPOLN" grouping.
i.e.
C.LIFCLNTNO, A.PROPOLN, A.PROSASORG together give you two unique rows.
If you want a single row for C.LIFCLNTNO, A.PROPOLN, then you may want to use an aggregate on A.PROSASORG as well.
Your entire query is being filtered on your "C" table by the one LifClntNo,
so you can leave that out of your group by and just have it as a MAX() value
in your select since it will always be the same value.
As for you summing the PROSASORG column via comment from other answer, just sum it. Hour column names are not evidently clear for purpose, so I dont know if its just a number, a quantity, or whatever. You might want to just pull that column out of your query completely if you want based on a single product id.
For performance, I would suggest the following indexes on
Table Index
FPRODUCTPF ( PROPRDSTS, PRORECSTS, PROBNFLVL, PROPOLN )
FNBREQCPF ( IQCODE, IQCPLN, IQAPDAT )
FLIFERATPF ( LIFPOLN, LIFPRDCNT, LIFBNFCNT, LIFCLNTNO )
I have rewritten your query to put the corresponding JOIN components to the same as the table they are based on vs all in the where clause.
SELECT
P.PROPOLN,
max( L.LIFCLNTNO ) LIFCLNTNO,
sum (P.PROSASORG) as sum
FROM
[FPRODUCTPF] P
join [FNBREQCPF] N
on N.IQCODE = 'B10000'
and P.PROPOLN = N.IQCPLN
and N.IQAPDAT > 20180101
join [FLIFERATPF] L
on L.LIFCLNTNO='2012042830507'
and P.PROPOLN = L.LIFPOLN
and P.PROPRDCNT = L.LIFPRDCNT
and P.PROBNFCNT = L.LIFBNFCNT
where
P.PROPRDSTS = '10'
and P.PRORECSTS = '1'
and P.PROBNFLVL = 'M'
and P.PROSASORG > 0
group by
P.PROPOLN
Now, one additional issue you will PROBABLY be running into. You are doing a query with multiple joins, and it appears that there will be multiple records in EACH of your FNBREQCPF and FLIFERATPF tables for the same FPRODUCTPF entry. If you, you will be getting a Cartesian result as the PROSASORG value will be counted for each instance combination in the two other tables.
Ex: FProductPF has ID = X with a Prosasorg value of 3
FNBreQCPF has matching records of Y1 and Y2
FLIFERATPF has matching records of Z1, Z2 and Z3.
So now your total will be equal to 3 times 6 = 18.
If you look at the combinations, Y1:Z1, Y1:Z2, Y1:Z3 AND Y2:Z1, Y2:Z2, Y2:Z3 giving your 6 entries that qualify, times the original value of 3, thus bloating your numbers -- IF such multiple records may exist in each respective table. Now, imagine if your tables have 30 and 40 matching instances respectively, you have just bloated your totals by 1200 times.

How to select multiple values with the same ids and put them in one row, while maintaining the id to value connection?

I have a processknowledgeentry table that has the following data:
pke_id prc_id knw_id
1 1 2
2 1 4
3 2 4
The column knw_id references another table called knowledge, which also has its own id column. I want to be able to select all knw_id values with the same prc_id, and have them retain its nature as an id (so that it remains referenceable to the knowledge table).
Desired result:
prc_id knw_ids
1 [2, 4]
My code is shown below. (It also selects a Process Name from another table called process by inner joining the prc_ids. That part works correctly at least.)
SELECT * FROM (
SELECT
p.prc_name,
(SELECT knw_id
FROM processknowledgeentry
GROUP BY knw_id
HAVING COUNT(*) > 1)
FROM processknowledgeentry pke
INNER JOIN process p
ON pke.prc_id=p.prc_id
WHERE pke.prc_id = %s) as temp
I get the error: "CardinalityViolation: more than one row returned by a subquery used as an expression", and I understand why the error exists, so I want to know how to work around it. I'm also not sure if my logic is correct.
Would appreciate any assistance, thank you!
Seems you need a STRING_AGG() function instead of GROUP_CONCAT(), which some other DBMS has, containing a string type parameter as the first argument along with HAVING clause which filters multiple prc_id values such as
SELECT p.prc_id, STRING_AGG(knw_id::TEXT,',') AS knw_ids
FROM processknowledgeentry pke
JOIN process p
ON pke.prc_id = p.prc_id
-- WHERE pke.prc_id = %s
GROUP BY p.prc_id
HAVING COUNT(pke.prc_id) > 1
Indeed this case, a WHERE clause won't be needed.
Demo

What is "Select -1", and how is it different from "Select 1"?

I have the following query that is part of a common table expression. I don't understand the function of the "Select -1" statement. It is obviously different than the "Select 1" that is used in "EXISTS" statements. Any ideas?
select days_old,
count(express_cd),
count(*),
case
when round(count(express_cd)*100.0/count(*),2) < 1 then '0'
else ''
end ||
cast(decimal(round(count(express_cd)*100.0/count(*),2),5,2) as varchar(7)) ||
'%'
from foo.bar
group by days_old
union all
select -1, -- Selecting the -1 here
count(express_cd),
count(*),
case
when round(count(express_cd)*100.0/count(*),2) < 1 then '0'
else ''
end ||
cast(decimal(round(count(express_cd)*100.0/count(*),2),5,2) as varchar(7)) ||
'%'
from foo.bar
where days_old between 1 and 7
It's just selecting the number "minus one" for each row returned, just like "select 1" will select the number "one" for each row returned.
There is nothing special about the "select 1" syntax uses in EXISTS statements by the way; it's just selecting some random value because EXISTS requires a record to be returned and a record needs data; the number 1 is sufficient.
Why you would do this, I have no idea.
When you have a union statement, each part of the union must contain the same columns. From what I read when I look at this, the first statement is giving you one line for each days old value and then some stats for each day old. The second part of the union is giving you a summary of all the records that are only a week or so less. Since days old column is not relevant here, they put in a fake value as a placeholder in order to do the union. OF course this is just a guess based on reading thousands of queries through the years. To be sure, I would need to actually run teh code.
Since you say this is a CTE, to really understand why this is is happening, you may need to look at the data it generates and how that data is used in the next query that uses the CTE. That might answer your question.
What you have asked is basically about a business rule unique to your company. The true answer should lie in any requirements documents for the original creation of the code. You should go look for them and read them. We can make guesses based on our own experience but only people in your company can answer the why question here.
If you can't find the documentation, then you need to talk (Yes directly talk, preferably in person) to the Stakeholders who use the data and find out what their needs were. Only do this after running the code and analyzing the results to better understand the meaning of the data returned.
Based on your query, all the records with days_old between 1 and 7 will be output as '-1', that is what select -1 does, nothing special here and there is no difference between select -1 and select 1 in exists, both will output the records as either 1 or -1, they are doing the same thing to check whether if there has any data.
Back to your query, I noticed that you have a union all and compare each four columns you select connected by union all, I am guessing your task is to get a final result with days_old not between 1 and 7 and combine the result with day_old, which is one because you take all between 1 and 7.
It is just a grouping logic there.
Your query returns aggregated
data (counts and rounds) grouped by days_old column plus one more group for data where days_old between 1 and 7.
So, -1 is just another additional group there, it cannot be 1 because days_old=1 is an another valid group.
result will be like this:
row1: days_old=1 count(*)=2 ...
row2: days_old=3 count(*)=5 ...
row3: days_old=9 count(*)=6 ...
row4: days_old=-1 count(*)=7

How to use aggregate function to filter a dataset in ssrs 2008

I have a matrix in ssrs2008 like below:
GroupName Zone CompletedVolume
Cancer 1 7
Tunnel 1 10
Surgery 1 64
ComplatedVolume value is coming by a specific expression <<expr>>, which is equal to: [Max(CVolume)]
This matrix is filled by a stored procedure that I am not supposed to change if possible. What I need to do is that not to show the data whose CompletedVolume is <= 50. I tried to go to tablix properties and add a filter like [Max(Q9Volume)] >= 50, but when I try to run the report it says that aggregate functions cannot be used in dataset filters or data region filters. How can I fix this as easy as possible?
Note that adding a where clause in sql query would not solve this issue since there are many other tables use the same SP and they need the data where CompletedVolume <= 50. Any help would be appreciated.
EDIT: I am trying to have the max(Q9Volume) value on SP, but something happening I have never seen before. The query is like:
Select r.* from (select * from results1 union select * from results2) r
left outer join procedures p on r.pid = p.id
The interesting this is there are some columns I see that does not included by neither results1/results2 nor procedures tables when I run the query. For example, there is no column like Q9Volume in the tables (result1, result2 and procedures), however when I run the query I see the columns on the output! How is that possible?
You can set the Row hidden property to True when [Max(CVolume)] is less or equal than 50.
Select the row and go to Row Visibility
Select Show or Hide based on an expression option and use this expression:
=IIF(
Max(Fields!Q9Volume.Value)<=50,
True,False
)
It will show something like this:
Note maximum value for Cancer and Tunnel are 7 and 10 respectively, so
they will be hidden if you apply the above expression.
Let me know if this helps.