Merge two rows replacing nulls in pivot - sql

Here's my SQL:
SELECT a."incomeNumber"
, (CASE WHEN b."traitName" = 'sometrait1' THEN b."traitValue" END) AS "numberResult"
, (CASE WHEN b."traitName" = 'sometrait2' THEN b."traitValue" END) AS "dateResult"
FROM "request" a
JOIN "traits" b ON a.id=b."requestId"
WHERE b."traitName" = 'sometrait1'
OR b."traitName" = 'sometrait2'
GROUP BY a."incomeNumber"
, b."traitName"
, b."traitValue"
Result (two rows per request, each with a NULL where the trait doesn't match):
incomeNumber | numberResult | dateResult
99           | 1            | NULL
99           | NULL         | 01.03.2018
But I want to get one row, 99 1 01.03.2018, per request. I can't come up with a solution for how to deal with the traits table, since sometrait1 and sometrait2 are two different rows.
I'm using Postgres 9.6, and I want the solution to be plain SQL if possible.

OK, I solved my problem. I just needed to remove traitName and traitValue from the GROUP BY clause and wrap the CASE expressions in MAX(); the corrected query is sketched below.
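For reference, a sketch of that fix applied to the query above (same tables and columns; the MAX() collapses the two trait rows into one row per incomeNumber):
SELECT a."incomeNumber"
, MAX(CASE WHEN b."traitName" = 'sometrait1' THEN b."traitValue" END) AS "numberResult"
, MAX(CASE WHEN b."traitName" = 'sometrait2' THEN b."traitValue" END) AS "dateResult"
FROM "request" a
JOIN "traits" b ON a.id = b."requestId"
WHERE b."traitName" IN ('sometrait1', 'sometrait2')
GROUP BY a."incomeNumber";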

Related

sql group by satisfying multiple conditions within the group

I have a table whose rows carry an id plus columns such as RELB_CD and INFO_SRC_CD.
I want to select the group that has RELB_CD = 9093 and INFO_SRC_CD = 7784. Both conditions should be present within the group. In my sample data, the output should be the group with id = 139993690.
You can use aggregation with HAVING:
select id
from t
group by id
having sum(case when relb_cd = 9093 then 1 else 0 end) > 0 and
       sum(case when info_src_cd = 7784 then 1 else 0 end) > 0
You can use this code; hope it helps. You have to leave the date column out, because it prevents the grouping:
select id, fisc_ind, sum(sls_amt), relb_cd, info_scop, info_src_cd
from yourtable
group by id, fisc_ind, relb_cd, info_scop, info_src_cd
Another working answer. If your data set is large, you could compare both the aggregation answer above and this one and see which runs faster for you; I honestly don't know which is faster. This one was slightly faster on a very small data set.
select id
from table1
where relb_cd = 9093
intersect
select id
from table1
where info_src_cd = 7784

selecting details from the table based on the where condition on same column with different filtering option

I have a table where each row stores a product id, a feature name, and a value.
From the table, I just want to retrieve the product id that has the feature Ram with value 12 and the feature Color with value Blue. The expected result is 1.
I tried many queries, but none returned the expected result.
What would the solution be?
It's very difficult to manage a separate table for each feature, as we have an open-ended set of features.
You can use conditional aggregation:
select productid
from t
group by productid
having max(case when feature = 'Ram' then value end) = '12' and
       max(case when feature = 'Color' then value end) = 'Blue';
You can use a correlated subquery with NOT EXISTS:
select distinct product_id
from tablename a
where not exists
      (select 1 from tablename b
       where a.product_id = b.product_id and feature = 'Ram' and value <> '12')
  and not exists
      (select 1 from tablename c
       where a.product_id = c.product_id and feature = 'Color' and value <> 'Blue')
Note that this assumes every product actually has a Ram row and a Color row; a product with no row for a feature also passes the NOT EXISTS tests.
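If a product might lack a row for one of the features entirely, a sketch with positive EXISTS checks (same hypothetical table and column names as above) expresses the intent more directly:
select distinct product_id
from tablename a
where exists
      (select 1 from tablename b
       where a.product_id = b.product_id and feature = 'Ram' and value = '12')
  and exists
      (select 1 from tablename c
       where a.product_id = c.product_id and feature = 'Color' and value = 'Blue')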

SQL fill gaps with hold

I've encountered a problem I cannot solve with my knowledge, and I haven't found any solutions I understood well enough to apply to my problem.
So here is what I try to achieve.
I have a database with the following structure:
node_id, source_time, value
1 , 10:13:15 , 1
2 , 10:13:15 , 1
2 , 10:13:16 , 2
1 , 10:13:19 , 2
1 , 10:13:25 , 3
2 , 10:13:28 , 3
I want a SQL query that produces the following output:
time , value1, value2
10:13:15, 1 , 1
10:13:16, 1 , 2
10:13:19, 2 , 2
10:13:25, 3 , 2
10:13:28, 3 , 3
You see, the times are all the times that occur across both nodes.
But the gaps have to be filled with the last value held, since node 1 has no value for the times :16 and :28.
I got it to the point where I get the two columns from one table. That was not the hard part.
SELECT T1.[value], T2.[value]
FROM [db1].[t_value_history] T1, [db1].[t_value_history] T2
WHERE ( T1.node_id = 1 AND T2.node_id = 2)
But the result doesn't look the way I want it to.
I found something with COALESCE and another table which holds the previous value, but that looked quite complicated for such an easy thing.
I guess there is an easy SQL solution, but I haven't had much time to get into the material.
I would be happy to get any idea of which function to use.
Thanks so far.
Edit: Changed the sample data; I made a mistake on the last line.
Edit2: I am using SQL Server. Sorry for not clarifying this. Also, the values are not necessarily increasing; I just used increasing numbers in this example.
This works in SQL Server. If you are certain that there is a value for both nodes at the minimum time, then you could change the OUTER APPLY to a CROSS APPLY, which would perform better.
WITH times AS (
    SELECT DISTINCT source_time
    FROM dbo.t_value_history
)
SELECT t.source_time,
       n1.value,
       n2.value
FROM times AS t
OUTER APPLY (SELECT TOP 1 h.value
             FROM dbo.t_value_history AS h
             WHERE h.node_id = 1
               AND h.source_time <= t.source_time
             ORDER BY h.source_time DESC) AS n1
OUTER APPLY (SELECT TOP 1 h.value
             FROM dbo.t_value_history AS h
             WHERE h.node_id = 2
               AND h.source_time <= t.source_time
             ORDER BY h.source_time DESC) AS n2;
You could use conditional aggregation to get the right set of rows:
select vh.source_time,
max(case when vh.node_id = 1 then value end) as value_1,
max(case when vh.node_id = 2 then value end) as value_2
from db1.t_value_history vh
group by vh.source_time;
If you want to fill in the values, then the best solution is lag() with ignore nulls. Supported by ANSI, but not by SQL Server (which I'm guessing you are using). Your values appear to be increasing. If that is the case, you can use a cumulative max:
select vh.source_time,
       max(max(case when vh.node_id = 1 then value end)) over (order by vh.source_time) as value_1,
       max(max(case when vh.node_id = 2 then value end)) over (order by vh.source_time) as value_2
from db1.t_value_history vh
group by vh.source_time;
In your data, value is increasing, so this works for the data in your example. If that is not the case, a more complex query is needed to fill in the gaps; one is sketched below.
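A sketch of that more complex query, assuming SQL Server 2012 or later (for the windowed COUNT): a running count of the non-NULL values assigns every row to the "island" started by the last known value, and taking MAX per island carries that value forward whether or not values increase:
with pivoted as (
    select vh.source_time,
           max(case when vh.node_id = 1 then vh.value end) as value_1,
           max(case when vh.node_id = 2 then vh.value end) as value_2
    from db1.t_value_history vh
    group by vh.source_time
), grouped as (
    select source_time, value_1, value_2,
           -- count() skips NULLs, so grp_n only increments on rows where node n reported
           count(value_1) over (order by source_time) as grp_1,
           count(value_2) over (order by source_time) as grp_2
    from pivoted
)
select source_time,
       -- within each island only the first row is non-NULL, so max() returns that value for every row
       max(value_1) over (partition by grp_1) as value_1,
       max(value_2) over (partition by grp_2) as value_2
from grouped
order by source_time;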
This will do it in SQL Server. It is not 'nice' though:
SELECT DISTINCT
T1.source_time,
CASE WHEN T1.node_id = 1 THEN T1.[value] ELSE ISNULL(T2.[value], T3.[value]) END,
CASE WHEN T1.node_id = 1 THEN ISNULL(T2.[value], T3.[Value]) ELSE T1.[value] END
FROM
[db1].[t_value_history] T1
LEFT OUTER JOIN [db1].[t_value_history] T2 ON T2.source_time = T1.source_time
AND T2.node_id <> T1.node_id -- This join looks for a value for the other node at the same time.
LEFT OUTER JOIN [db1].[t_value_history] T3 ON T3.source_time < T1.source_time
AND T3.node_id <> T1.node_id -- If the previous join is empty, this looks for values for the other node at previous times
LEFT OUTER JOIN [db1].[t_value_history] T4 ON T4.source_time > T3.source_time
AND T4.source_time < T1.source_time
AND T4.node_id <> T1.node_id -- This join makes sure there aren't any more recent values
WHERE
T4.node_id IS NULL

SQL query and joins

Please see my query below:
select i.OID_CUSTOMER_DIM, i.segment as PISTACHIO_SEGMENT,
       max(case when s.SUBSCRIPTION_TYPE = '5' then 'Y' else 'N' end) as PB_SUBS,
       max(case when s.SUBSCRIPTION_TYPE = '12' then 'Y' else 'N' end) as DAILY_TASTE,
       max(case when s.SUBSCRIPTION_TYPE = '8' then 'Y' else 'N' end) as COOKING_FOR_TWO
from WITH_MAIL_ID i
join CUSTOMER_SUBSCRIPTION_FCT s on i.IDENTITY_ID = s.IDENTITY_ID
where s.SITE_CODE = 'PB' and s.SUBSCRIPTION_END_DATE is null
group by i.OID_CUSTOMER_DIM, i.segment
This returns 654,105 rows, which is fewer than one of the joined tables, WITH_MAIL_ID, which has 706,795 rows.
Now, for QC purposes, my manager is wondering why I don't have all the rows in my final table. I tried removing all the filters, but the counts still don't match. What am I doing wrong?
I am not very good at SQL yet, and this is really confusing me.
You're doing an inner join between the two tables, so only rows from WITH_MAIL_ID that match a row in CUSTOMER_SUBSCRIPTION_FCT will be returned. Additionally, you have a GROUP BY clause.
First the join. If you want to return all rows regardless of the join condition, you can use a left join, but then all the S. columns will be NULL for unmatched rows, and you'll have to deal with that; a sketch of that rewrite follows.
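For illustration, a hedged sketch of the original query rewritten with a left join (same tables and columns as the question). Note the subscription filters move from WHERE into the ON clause, because a WHERE condition on S columns would turn the left join back into an inner join:
select i.OID_CUSTOMER_DIM, i.segment as PISTACHIO_SEGMENT,
       max(case when s.SUBSCRIPTION_TYPE = '5' then 'Y' else 'N' end) as PB_SUBS,
       max(case when s.SUBSCRIPTION_TYPE = '12' then 'Y' else 'N' end) as DAILY_TASTE,
       max(case when s.SUBSCRIPTION_TYPE = '8' then 'Y' else 'N' end) as COOKING_FOR_TWO
from WITH_MAIL_ID i
left join CUSTOMER_SUBSCRIPTION_FCT s
       on i.IDENTITY_ID = s.IDENTITY_ID
      and s.SITE_CODE = 'PB'
      and s.SUBSCRIPTION_END_DATE is null
group by i.OID_CUSTOMER_DIM, i.segment
With this shape, unmatched customers still appear, with 'N' in every flag, since the CASE falls through to ELSE 'N' when SUBSCRIPTION_TYPE is NULL.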
To see how many rows fail the join condition, run this anti-join; its count should be the difference:
select count(*) from WITH_MAIL_ID i
left join CUSTOMER_SUBSCRIPTION_FCT S
on I.IDENTITY_ID = S.IDENTITY_ID
where s.IDENTITY_ID is NULL
The most likely thing, however, is that it's just the grouping. If you group on two columns and select the MAX of various other columns for each group, you would expect fewer rows than in the original table; otherwise, why bother grouping?
If I have data like this:
groupkey1   value
1           2
1           10
2           1
2           1
and I group by groupkey1 and select MAX(value), I get 2 rows, [1, 10] and [2, 1], not 4 rows.

Most optimized way to get column totals in SQL Server 2005+

I am creating some reports for an application to be used by various states. The database has the potential to be very large. I would like to know the best way to get column totals.
Currently I have SQL similar to the following:
SELECT count(case when prg.prefix_id = 1 then iss.id end) +
count(case when prg.prefix_id = 2 then iss.id end) as total,
count(case when prg.prefix_id = 1 then iss.id end) as c1,
count(case when prg.prefix_id = 2 then iss.id end) as c2
FROM dbo.TableName
WHERE ...
As you can see, the columns are in there twice: in one instance I'm adding them to show the total, and in the other I'm showing the individual values, which the report requires.
This is a very small sample of the SQL; there are 20+ columns, and within those columns 4 or more are being summed at times.
I was thinking of declaring some variables and setting each of the columns equal to a variable; then I could just add up whichever variables I needed to show the column totals, e.g. SET @Total = @c1 + @c2.
But does the SQL Server engine even care that the columns are in there multiple times like that? Is there a better way of doing this?
Any reason this isn't done as
select prg.prefix_id, count(1) from tablename where ... group by prg.prefix_id
It would leave you with a result set of the prefix_id and the count of rows for each prefix_id. That might be preferable to a series of COUNT(CASE ...) expressions, and I think it should be quicker, but I can't confirm for sure; a rollup variant is sketched below.
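If the report also needs the grand total alongside the per-prefix counts, WITH ROLLUP (supported in SQL Server 2005+) can append it in the same pass. A sketch, reusing the question's placeholder table name and elided filters:
select prg.prefix_id, count(1) as cnt
from dbo.TableName prg
-- where ...  (keep the report's existing filters here)
group by prg.prefix_id with rollup
-- ROLLUP appends one extra row with prefix_id = NULL whose cnt is the grand total.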
I would use a subquery before resorting to variables myself. Something like this:
select c1, c2, c1 + c2 as total
from (SELECT count(case when prg.prefix_id = 1 then iss.id end) as c1,
             count(case when prg.prefix_id = 2 then iss.id end) as c2
      FROM dbo.TableName
      WHERE ...) a
Use straight SQL if you can before resorting to T-SQL procedural logic. Rule of thumb: if you can do it in SQL, do it in SQL. If you want to emulate static values with straight SQL, try an inline view like this:
SELECT iv1.c1 + iv1.c2 as total,
iv1.c1,
iv1.c2
FROM
(
SELECT count(case when prg.prefix_id = 1 then iss.id end) as c1,
count(case when prg.prefix_id = 2 then iss.id end) as c2
FROM dbo.TableName
WHERE ...
) AS iv1
This way you are logically getting the counts once and can compute values based on those counts. However, I think SQL Server is smart enough not to scan for the counts n times, so I don't know that the plan for this SQL would differ from the SQL you have.