SQL: Merging rows with specific matching columns

I'm trying to handle this issue in my code that should be handled in SQL. The problem is I want to take this
ID COL 1 | ID COL 2 | CHARGE | PAYMENT
2 | 3 | 17 | 0
2 | 3 | 0 | 17
and turn it into this
ID COL 1 | ID COL 2 | CHARGE | PAYMENT
2 | 3 | 17 | 17
table1
id | whatever | whatever1
5 | null | null
table2
id | id col 1 | id col 2 | charge | payment
5 | 2 | 3 | 17 | 0
5 | 2 | 3 | 0 | 17
current result:
id | whatever | whatever1 | idcol1 | idcol2 | charge | payment
5 | null | null | 2 | 3 | 17 | 0
5 | null | null | 2 | 3 | 0 | 17
want:
id | whatever | whatever1 | idcol1 | idcol2 | charge | payment
5 | null | null | 2 | 3 | 17 | 17
The problem is during my sql call I'm doing an inner join, which does a cartesian product for some of the values rather than doing what I'd want above. Does anyone have an idea how this could be accomplished?

You can use group by for this:
select IDCOL1, IDCOL2, max(CHARGE) as charge, max(PAYMENT) as payment
from table t
group by idcol1, idcol2
This gets the maximum charge and maximum payment from all rows with the same id columns. If you have more than one row with charges or payment, you might prefer SUM() to MAX().
With the joins, this would look like:
select t1.id, t1.whatever, t1.whatever1, t2.IDCOL1, t2.IDCOL2,
max(t2.CHARGE) as charge, max(t2.PAYMENT) as payment
from table1 t1 join
table2 t2
on t1.id = t2.id
group by t1.id, t1.whatever, t1.whatever1, t2.IDCOL1, t2.IDCOL2
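A minimal sketch of the joined GROUP BY query, run against the sample rows from the question. It assumes SQLite (via Python's sqlite3 module) purely for illustration, and the column names idcol1, idcol2, whatever and whatever1 are taken from the question's result headers.

```python
import sqlite3

# In-memory database holding the two sample tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (id INTEGER, whatever TEXT, whatever1 TEXT);
    CREATE TABLE table2 (id INTEGER, idcol1 INTEGER, idcol2 INTEGER,
                         charge INTEGER, payment INTEGER);
    INSERT INTO table1 VALUES (5, NULL, NULL);
    INSERT INTO table2 VALUES (5, 2, 3, 17, 0), (5, 2, 3, 0, 17);
""")

# Join first, then collapse rows that share the same id columns.
merged = conn.execute("""
    SELECT t1.id, t1.whatever, t1.whatever1, t2.idcol1, t2.idcol2,
           MAX(t2.charge) AS charge, MAX(t2.payment) AS payment
    FROM table1 t1
    JOIN table2 t2 ON t1.id = t2.id
    GROUP BY t1.id, t1.whatever, t1.whatever1, t2.idcol1, t2.idcol2
""").fetchall()

print(merged)  # a single merged row for id 5
```

Swapping MAX() for SUM() in the same query would add up charges and payments when several rows carry non-zero values.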

Related

Is it faster to do WHERE IN or INNER JOIN in Redshift

I have 2 tables in redshift:
table1
| ids |
|------:|
| 1 |
| 2 |
| 6 |
| 9 |
| 12 |
table2
| id | value |
|-----:|---------:|
| 1 | 0.134435 |
| 2 | 0.767417 |
| 3 | 0.779567 |
| 4 | 0.726051 |
| 5 | 0.405138 |
| 6 | 0.775206 |
| 7 | 0.699945 |
| 8 | 0.499433 |
| 10 | 0.457386 |
| 9 | 0.227511 |
| 10 | 0.369292 |
| 11 | 0.653735 |
| 12 | 0.537251 |
| 2 | 0.953539 |
| 13 | 0.377625 |
| 14 | 0.973905 |
| 4 | 0.104643 |
| 1 | 0.450627 |
And I basically want to get the rows in table2 where id is in table1 and I have 2 possibilities:
SELECT *
FROM table2
WHERE id IN (SELECT ids FROM table1)
or
SELECT t2.id, t2.value
FROM table2 t2
INNER JOIN table1 t1
ON t2.id = t1.ids
I want to know if there is any performance difference between them.
(I know I could just test in this example to find out but I would like to know if there is one which is always faster)
Edit: table1.ids is a unique column
The two queries do different things.
The JOIN can multiply the number of rows if id is duplicated in table1.
The IN will never duplicate rows.
If id can be duplicated, you should use the version that does what you want. If id is guaranteed to be unique, then the two are functionally equivalent.
In my experience, JOIN is typically at least as fast as IN. Of course, you can test on your data, but that is a starting point.
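The functional difference can be seen in a small sketch. It uses SQLite through Python's sqlite3 module and hypothetical data in which id 1 appears twice in table1 (the question states that ids is unique, so this only illustrates the duplicated case). It says nothing about Redshift performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (ids INTEGER);
    CREATE TABLE table2 (id INTEGER, value REAL);
    -- Hypothetical data: id 1 is duplicated in table1.
    INSERT INTO table1 VALUES (1), (1), (2);
    INSERT INTO table2 VALUES (1, 0.134435), (2, 0.767417), (3, 0.779567);
""")

# IN only filters table2, so each table2 row appears at most once.
in_rows = conn.execute("""
    SELECT id, value FROM table2
    WHERE id IN (SELECT ids FROM table1)
    ORDER BY id
""").fetchall()

# JOIN emits one row per matching pair, so the duplicate multiplies rows.
join_rows = conn.execute("""
    SELECT t2.id, t2.value
    FROM table2 t2
    INNER JOIN table1 t1 ON t2.id = t1.ids
    ORDER BY t2.id
""").fetchall()

print(len(in_rows), len(join_rows))
```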

Sum with 3 tables to join

I have 3 tables. The link between the first and the second table is REQ_ID and the link between the second and the third table is ENC_ID. There is no direct link between the first and the third table.
INS_RCPT
+----+--------+------+----------+
| ID | REQ_ID | CURR | RCPT_AMT |
+----+--------+------+----------+
| 1 | 1 | USD | 100 |
| 2 | 2 | USD | 200 |
| 3 | 3 | USD | 300 |
+----+--------+------+----------+
ENC_LOG
+----+--------+--------+-------------+
| ID | REQ_ID | ENC_ID | ENC_LOG_AMT |
+----+--------+--------+-------------+
| 1 | 1 | 1 | 20 |
| 2 | 1 | 2 | 50 |
| 3 | 1 | 3 | 30 |
| 4 | 2 | 4 | 20 |
+----+--------+--------+-------------+
ENC_RCPT
+----+--------+--------------+
| ID | ENC_ID | ENC_RCPT_AMT |
+----+--------+--------------+
| 1 | 1 | 10 |
| 2 | 1 | 10 |
| 3 | 2 | 15 |
| 4 | 2 | 25 |
| 5 | 2 | 10 |
| 6 | 3 | 12 |
| 7 | 3 | 18 |
| 8 | 4 | 10 |
+----+--------+--------------+
I would like to have output as follows:
+----+--------+------+----------+-------------+--------------+
| ID | REQ_ID | CURR | RCPT_AMT | ENC_LOG_AMT | ENC_RCPT_AMT |
+----+--------+------+----------+-------------+--------------+
| 1 | 1 | USD | 100 | 100 | 100 |
| 2 | 2 | USD | 200 | 20 | 10 |
| 3 | 3 | USD | 300 | 0 | 0 |
+----+--------+------+----------+-------------+--------------+
I am using SQL Server to write this query. Any help is appreciated.
One approach would be to join the first table to two subqueries which compute the sums separately:
SELECT
ir.ID,
ir.REQ_ID,
ir.CURR,
ir.RCPT_AMT,
COALESCE(el.ENC_LOG_AMT, 0) AS ENC_LOG_AMT,
COALESCE(er.ENC_RCPT_AMT, 0) AS ENC_RCPT_AMT
FROM INS_RCPT ir
LEFT JOIN
(
SELECT REQ_ID, SUM(ENC_LOG_AMT) AS ENC_LOG_AMT
FROM ENC_LOG
GROUP BY REQ_ID
) el
ON ir.REQ_ID = el.REQ_ID
LEFT JOIN
(
SELECT t1.REQ_ID, SUM(t2.ENC_RCPT_AMT) AS ENC_RCPT_AMT
FROM ENC_LOG t1
INNER JOIN ENC_RCPT t2 ON t1.ENC_ID = t2.ENC_ID
GROUP BY t1.REQ_ID
) er
ON ir.REQ_ID = er.REQ_ID
Note that your question includes a curve ball. The second subquery needs to return aggregates of the receipt table by REQ_ID, even though this field does not appear in that table. As a result, we actually need to join ENC_LOG to ENC_RCPT in that subquery, and then aggregate by REQ_ID.
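As a check, here is a sketch that loads the question's sample rows into SQLite (through Python's sqlite3 module, an assumption made only so the example is runnable; the question targets SQL Server) and runs the two-subquery approach. COALESCE is used so that the request with no log rows shows 0, as in the desired output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE INS_RCPT (ID INTEGER, REQ_ID INTEGER, CURR TEXT, RCPT_AMT INTEGER);
    CREATE TABLE ENC_LOG  (ID INTEGER, REQ_ID INTEGER, ENC_ID INTEGER, ENC_LOG_AMT INTEGER);
    CREATE TABLE ENC_RCPT (ID INTEGER, ENC_ID INTEGER, ENC_RCPT_AMT INTEGER);
    INSERT INTO INS_RCPT VALUES (1,1,'USD',100), (2,2,'USD',200), (3,3,'USD',300);
    INSERT INTO ENC_LOG  VALUES (1,1,1,20), (2,1,2,50), (3,1,3,30), (4,2,4,20);
    INSERT INTO ENC_RCPT VALUES (1,1,10), (2,1,10), (3,2,15), (4,2,25),
                                (5,2,10), (6,3,12), (7,3,18), (8,4,10);
""")

# Each subquery aggregates to one row per REQ_ID before joining,
# so the sums are not inflated by the one-to-many relationships.
totals = conn.execute("""
    SELECT ir.ID, ir.REQ_ID, ir.CURR, ir.RCPT_AMT,
           COALESCE(el.ENC_LOG_AMT, 0)  AS ENC_LOG_AMT,
           COALESCE(er.ENC_RCPT_AMT, 0) AS ENC_RCPT_AMT
    FROM INS_RCPT ir
    LEFT JOIN (SELECT REQ_ID, SUM(ENC_LOG_AMT) AS ENC_LOG_AMT
               FROM ENC_LOG
               GROUP BY REQ_ID) el ON ir.REQ_ID = el.REQ_ID
    LEFT JOIN (SELECT t1.REQ_ID, SUM(t2.ENC_RCPT_AMT) AS ENC_RCPT_AMT
               FROM ENC_LOG t1
               INNER JOIN ENC_RCPT t2 ON t1.ENC_ID = t2.ENC_ID
               GROUP BY t1.REQ_ID) er ON ir.REQ_ID = er.REQ_ID
    ORDER BY ir.ID
""").fetchall()

for row in totals:
    print(row)
```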
You can try the query below. It sums the receipts per ENC_LOG row first, so the log amounts are not counted once per receipt. Change the joins from left to inner if that suits your requirement.
select a.id, a.req_id, a.curr, a.rcpt_amt, sum(a.enc_log_amt) enc_log_amt, sum(a.enc_rcpt_amt) enc_rcpt_amt
from
(
select a.id id, a.req_id req_id, a.curr curr, a.rcpt_amt rcpt_amt, b.enc_log_amt enc_log_amt, sum(c.enc_rcpt_amt) as enc_rcpt_amt
from ins_rcpt a
left join enc_log b
on a.req_id = b.req_id
left join enc_rcpt c
on b.enc_id = c.enc_id
group by a.id, a.req_id, a.curr, a.rcpt_amt, b.id, b.enc_log_amt
) a
group by a.id, a.req_id, a.curr, a.rcpt_amt;

How to sum rows before a condition is met in SQL

I have a table which has multiple records for the same id. Looks like this, and the rows are sorted by sequence number.
+----+--------+----------+----------+
| id | result | duration | sequence |
+----+--------+----------+----------+
| 1 | 12 | 7254 | 1 |
+----+--------+----------+----------+
| 1 | 12 | 2333 | 2 |
+----+--------+----------+----------+
| 1 | 11 | 1000 | 3 |
+----+--------+----------+----------+
| 1 | 6 | 5 | 4 |
+----+--------+----------+----------+
| 1 | 3 | 20 | 5 |
+----+--------+----------+----------+
| 2 | 1 | 230 | 1 |
+----+--------+----------+----------+
| 2 | 9 | 10 | 2 |
+----+--------+----------+----------+
| 2 | 6 | 0 | 3 |
+----+--------+----------+----------+
| 2 | 1 | 5 | 4 |
+----+--------+----------+----------+
| 2 | 12 | 3 | 5 |
+----+--------+----------+----------+
E.g. for id=1, I would like to sum the duration of all rows up to and including the row where result=6, which is 7254+2333+1000+5. Likewise for id=2, it would be 230+10+0. Anything after the row where result=6 is left out.
My expected output:
+----+----------+
| id | duration |
+----+----------+
| 1 | 10592 |
+----+----------+
| 2 | 240 |
+----+----------+
The sequence has to be in ascending order.
I'm not sure how I can do this in sql.
Thank you in advance!
I think you want:
select t.id, sum(t.duration)
from t
where t.sequence <= (select t2.sequence
from t t2
where t2.id = t.id and t2.result = 6
)
group by t.id;
In PrestoDB, I would recommend window functions:
select id, sum(duration)
from (select t.*,
min(case when result = 6 then sequence end) over (partition by id) as sequence_6
from t
) t
where sequence <= sequence_6
group by id;
You can use a simple aggregate query with a condition that uses a subquery to retrieve the sequence of the record whose result is 6:
SELECT t.id, SUM(t.duration) total_duration
FROM mytable t
WHERE t.sequence <= (
SELECT sequence
FROM mytable
WHERE id = t.id AND result = 6
)
GROUP BY t.id
This demo on DB Fiddle with your test data returns :
| id | total_duration |
| --- | -------------- |
| 1 | 10592 |
| 2 | 240 |
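The correlated-subquery approach can be reproduced with the question's sample data. This is a sketch using SQLite through Python's sqlite3 module (an assumption for runnability, not the asker's engine), with the table named t. MIN() is added inside the subquery as a guard in case an id has more than one row with result = 6, which the sample data does not contain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (id INTEGER, result INTEGER, duration INTEGER, sequence INTEGER);
    INSERT INTO t VALUES
        (1, 12, 7254, 1), (1, 12, 2333, 2), (1, 11, 1000, 3), (1, 6, 5, 4), (1, 3, 20, 5),
        (2, 1, 230, 1), (2, 9, 10, 2), (2, 6, 0, 3), (2, 1, 5, 4), (2, 12, 3, 5);
""")

# Keep only rows at or before the first result = 6 row of the same id,
# then sum their durations per id.
sums = conn.execute("""
    SELECT t.id, SUM(t.duration) AS total_duration
    FROM t
    WHERE t.sequence <= (SELECT MIN(t2.sequence)
                         FROM t t2
                         WHERE t2.id = t.id AND t2.result = 6)
    GROUP BY t.id
    ORDER BY t.id
""").fetchall()

print(sums)
```

An id with no result = 6 row drops out entirely, because the comparison against a NULL subquery result is never true.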
Basic group by query should solve your issue
select
id,
sum(duration) duration
from t
group by id
for the certain rows:
select
id,
sum(duration) duration
from t
where id = 1
group by id
if you want to include it in your result set
select id, duration, sequence from t
union all
select
id,
sum(duration) duration,
null sequence
from t
group by id

Create a combined list from two tables

I have a table with CostCenter_ID (int) and a second table with Process_ID (int).
I'd like to combine the results of both tables so that each cost center ID is assigned to all process IDs, like so:
|CostCenterID | ProcessID |
---------------------------
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 3 | 1 |
| 3 | 2 |
| 3 | 3 |
I've done it before but I'm drawing a blank. I've tried this:
SELECT CostCenter_ID,NULL FROM dbo.Cost_Centers
UNION ALL
SELECT NULL,Process_ID FROM dbo.Processes
which returns this:
|CostCenterID | ProcessID |
---------------------------
| 1 | NULL |
| NULL | 1 |
| NULL | 2 |
| NULL | 3 |
Try:
select a.CostCenter_ID, b.Process_ID
from dbo.Cost_Centers a
cross join dbo.Processes b
or:
select a.CostCenter_ID, b.Process_ID
from dbo.Cost_Centers a
,dbo.Processes b
NB: cross join is the better method as it makes it clearer to the reader what your intentions are.
More info (with pics) here: http://www.w3resource.com/sql/joins/cross-join.php
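A short sketch of the cross join, assuming three cost centers and three processes as in the desired output. It runs on SQLite through Python's sqlite3 module, so the dbo schema prefix from the question is dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Cost_Centers (CostCenter_ID INTEGER);
    CREATE TABLE Processes (Process_ID INTEGER);
    INSERT INTO Cost_Centers VALUES (1), (2), (3);
    INSERT INTO Processes VALUES (1), (2), (3);
""")

# CROSS JOIN pairs every cost center with every process (3 x 3 = 9 rows).
pairs = conn.execute("""
    SELECT c.CostCenter_ID, p.Process_ID
    FROM Cost_Centers c
    CROSS JOIN Processes p
    ORDER BY c.CostCenter_ID, p.Process_ID
""").fetchall()

for cost_center_id, process_id in pairs:
    print(cost_center_id, process_id)
```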

Query for data in two tables connected by a third. Data Sometimes only on one

I thought I could figure this out, but I am having a lot of issues. I have 3 tables: Table1, Table2, and Table3. These tables were designed by someone else and I have to work with them. They were not designed to be used the way they are used today.
The bottom line is that I need to be able to enter an Item_No, which will always exist in Table2. The Item_No may also appear in Table3, either multiple times or not at all, and it can appear 5 times in Table2 but only 3 times in Table3. If it is in Table3, it will also be in Table1.
So, using the Item_No I find in Table2, I want to return the Order_Qty values associated with those rows and then, if a match exists, get Table1.ID where Table1.ID = Table3.ID and Table3.Item_No = Table2.Item_No.
I came up with the following. It does not give me errors but simply stops code execution during a C# fill. I had it working for finding the Item_No in Table3 and returning what it finds; I have changed only this query since then, so I know it is the issue.
Here is what I came up with, which is not working:
SELECT Table1.ID,
Table2.Order_Qty As [Qty of Full Order], Table2.Item_No As [Set No]
FROM Table2
LEFT JOIN Table3
ON Table2.Item_No = Table3.Item_No
AND Table2.Order_No = Table3.Order_No
LEFT JOIN Table1
ON Table1.Order_No = Table2.Order_No
AND Table1.ID = Table3.ID
WHERE Table2.Item_No = @m_strUserEnteredSeachValue
ORDER BY Table2.Order_No DESC
Example data:
Table 1
+----------+--------------+-------------------+
| Order_No | Sub_Order_No | Sub_Order_Contact |
+==========+==============+===================+
| 1 | 1 | John Doe |
+----------+--------------+-------------------+
| 1 | 2 | Jane Doe |
+----------+--------------+-------------------+
| 1 | 3 | Foo |
+----------+--------------+-------------------+
| 1 | 4 | Bar |
+----------+--------------+-------------------+
| 1 | 5 | Foo2 |
+----------+--------------+-------------------+
Table 2
+----------+--------------+-------------------+
| Order_No | Item_No | Customer_Item_Name|
+==========+==============+===================+
| 1 | 1 | 1234567890 |
+----------+--------------+-------------------+
| 1 | 2 | 1234567891 |
+----------+--------------+-------------------+
| 1 | 3 | 1234567892 |
+----------+--------------+-------------------+
| 1 | 4 | 1234567893 |
+----------+--------------+-------------------+
| 1 | 5 | 1234567894 |
+----------+--------------+-------------------+
| 1 | 6 | 1234567895 |
+----------+--------------+-------------------+
| 2 | 1 | 0987654321 |
+----------+--------------+-------------------+
| 2 | 2 | 0987654322 |
+----------+--------------+-------------------+
| 2 | 3 | 0987654323 |
+----------+--------------+-------------------+
| 3 | 1 | 1234567893 |
+----------+--------------+-------------------+
And Table 3
+----------+--------------+-------------------+--------------+
| Order_No | Item_No | Customer_Item_Name| Sub_Order_No |
+==========+==============+===================+==============+
| 1 | 1 | 1234567890 | 1 |
+----------+--------------+-------------------+--------------+
| 1 | 2 | 1234567891 | 2 |
+----------+--------------+-------------------+--------------+
| 1 | 3 | 1234567892 | 2 |
+----------+--------------+-------------------+--------------+
| 1 | 4 | 1234567893 | 3 |
+----------+--------------+-------------------+--------------+
| 1 | 5 | 1234567894 | 4 |
+----------+--------------+-------------------+--------------+
| 1 | 6 | 1234567895 | 4 |
+----------+--------------+-------------------+--------------+
| 1 | 4 | 1234567893 | 4 |
+----------+--------------+-------------------+--------------+
The Result I am looking for: If I search for Item 1234567893
+----------+--------------+-------------------+--------------+-------------------+
| Order_No | Item_No | Customer_Item_Name| Sub_Order_No | Sub_Order_Contact |
+==========+==============+===================+==============+===================+
| 3 | 1 | 1234567893 | | |
+----------+--------------+-------------------+--------------+-------------------+
| 1 | 4 | 1234567893 | 3 | Foo |
+----------+--------------+-------------------+--------------+-------------------+
| 1 | 4 | 1234567893 | 4 | Bar |
+----------+--------------+-------------------+--------------+-------------------+
A pragmatic answer to a problem like this is to split it into a couple of queries. Query Table #2 first, and then based on that result set, run additional queries into #1 or #3.
Another angle is to query Table #2 and use subqueries to reach into Table #1 or Table #3 for the data you need.
Try this:
declare @m_strUserEnteredSeachValue varchar(10) = '1234567893';
with a as
(
select
Order_No, Item_No, Customer_Item_Name
from
Table2
UNION
select
Order_No, Item_No, Customer_Item_Name
from
Table3
)
select
a.Order_No,
a.Item_No,
a.Customer_Item_Name,
Table3.Sub_Order_No,
Table1.Sub_Order_Contact
from
a
left join
Table3
on
Table3.Order_No=a.Order_No
and Table3.Item_No=a.Item_No
and Table3.Customer_Item_Name=a.Customer_Item_Name
left join
Table1
on
Table1.Sub_Order_No = Table3.Sub_Order_No
where
@m_strUserEnteredSeachValue = a.Customer_Item_Name
order by
a.Item_No, Table3.Sub_Order_No
SqlFiddle demo: http://www.sqlfiddle.com/#!3/973d8/3
I have no idea whether this is what you are trying to arrive at, since it's difficult to understand from your question. All I know is that this query returns the dataset you gave in the original post.
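The UNION plus LEFT JOIN query can be exercised against the question's example data. This is a sketch on SQLite through Python's sqlite3 module rather than SQL Server, so the declared variable becomes a bound parameter (the name search_value is made up for this example). Customer_Item_Name is stored as text here, an assumption made to preserve the leading zeros.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Order_No INTEGER, Sub_Order_No INTEGER, Sub_Order_Contact TEXT);
    CREATE TABLE Table2 (Order_No INTEGER, Item_No INTEGER, Customer_Item_Name TEXT);
    CREATE TABLE Table3 (Order_No INTEGER, Item_No INTEGER, Customer_Item_Name TEXT,
                         Sub_Order_No INTEGER);
    INSERT INTO Table1 VALUES (1,1,'John Doe'), (1,2,'Jane Doe'), (1,3,'Foo'),
                              (1,4,'Bar'), (1,5,'Foo2');
    INSERT INTO Table2 VALUES
        (1,1,'1234567890'), (1,2,'1234567891'), (1,3,'1234567892'),
        (1,4,'1234567893'), (1,5,'1234567894'), (1,6,'1234567895'),
        (2,1,'0987654321'), (2,2,'0987654322'), (2,3,'0987654323'),
        (3,1,'1234567893');
    INSERT INTO Table3 VALUES
        (1,1,'1234567890',1), (1,2,'1234567891',2), (1,3,'1234567892',2),
        (1,4,'1234567893',3), (1,5,'1234567894',4), (1,6,'1234567895',4),
        (1,4,'1234567893',4);
""")

search_value = "1234567893"  # hypothetical stand-in for the user-entered value

rows = conn.execute("""
    WITH a AS (
        SELECT Order_No, Item_No, Customer_Item_Name FROM Table2
        UNION
        SELECT Order_No, Item_No, Customer_Item_Name FROM Table3
    )
    SELECT a.Order_No, a.Item_No, a.Customer_Item_Name,
           Table3.Sub_Order_No, Table1.Sub_Order_Contact
    FROM a
    LEFT JOIN Table3
           ON Table3.Order_No = a.Order_No
          AND Table3.Item_No = a.Item_No
          AND Table3.Customer_Item_Name = a.Customer_Item_Name
    LEFT JOIN Table1
           ON Table1.Sub_Order_No = Table3.Sub_Order_No
    WHERE a.Customer_Item_Name = ?
    ORDER BY a.Item_No, Table3.Sub_Order_No
""", (search_value,)).fetchall()

for row in rows:
    print(row)
```

The Table1 join matches on Sub_Order_No alone, as in the answer. If sub-order numbers repeat across orders in real data, adding Table1.Order_No = Table3.Order_No to that join would likely be needed.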