Oracle group by only ONE column - update - sql

I am stuck in a situation similar to this one.
I have multiple columns with different types of data, and I want to select all columns but group by only one column.
My Table:
+--------+----------+----------+-------+-----------------------+
| id | b_group | col2 | col3 | col4 |
+--------+----------+----------+-------+-----------------------+
| 1 | 1 | abcd | 100 | www.google.com |
| 2 | 1 | xyz | 200 | www.yahoo.com |
| 3 | 2 | dfs | 200 | www.stackoverflow.com |
| 4 | 3 | asda3 | 78 | www.imdb.com |
| 5 | 4 | zsdvf4 | 65 | www.youtube.com |
| 6 | 5 | sdf4 | 101 | www.ymail.com |
| 7 | 5 | ssdfsd | 200 | www.gmail.com |
| 8 | 1 | zxcgdf4 | 200 | www.club.com |
| 9 | 6 | yujhgj | 202 | www.thunderbird.com |
+--------+----------+----------+-------+-----------------------+
After reading the solution provided there, my understanding was that I should use an aggregate function, so my query looks like this:
select MIN(b_group),id,col2,col3,col4 from myTable where col3='200' group by id,col2,col3,col4;
But this doesn't work in my case; it returns all the records where col3=200.
My desired Output:
+--------+----------+----------+-------+-----------------------+
| id | b_group | col2 | col3 | col4 |
+--------+----------+----------+-------+-----------------------+
| 2 | 1 | xyz | 200 | www.yahoo.com |
| 3 | 2 | dfs | 200 | www.stackoverflow.com |
| 7 | 5 | ssdfsd | 200 | www.gmail.com |
+--------+----------+----------+-------+-----------------------+
I don't care which record is picked, and the order doesn't matter.
I just want to select all columns while grouping by only one.

By applying a group by clause, you get a result row per unique combination of all the columns in it (in this case, per unique combination of id, col2, col3, and col4). Instead, you could use the row_number window function to number rows per b_group, and then select just the (arbitrary) first of each group:
SELECT id, b_group, col2, col3, col4
FROM (SELECT id, b_group, col2, col3, col4,
ROW_NUMBER() OVER (PARTITION BY b_group ORDER BY 1) AS rn
FROM mytable
WHERE col3 = 200)
WHERE rn = 1
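Here is a minimal sketch that reproduces the same ROW_NUMBER pattern on the sample table. It assumes Python's built-in sqlite3 module backed by SQLite 3.25 or later, which is the first release with window functions. The table name mytable and the helpers build_connection and first_per_group are illustrative. The answer's ORDER BY 1 is swapped for ORDER BY id, so the lowest id in each group is kept on every run.

```python
import sqlite3

# Sample data from the question: (id, b_group, col2, col3, col4).
ROWS = [
    (1, 1, "abcd", 100, "www.google.com"),
    (2, 1, "xyz", 200, "www.yahoo.com"),
    (3, 2, "dfs", 200, "www.stackoverflow.com"),
    (4, 3, "asda3", 78, "www.imdb.com"),
    (5, 4, "zsdvf4", 65, "www.youtube.com"),
    (6, 5, "sdf4", 101, "www.ymail.com"),
    (7, 5, "ssdfsd", 200, "www.gmail.com"),
    (8, 1, "zxcgdf4", 200, "www.club.com"),
    (9, 6, "yujhgj", 202, "www.thunderbird.com"),
]


def build_connection():
    """Create an in-memory SQLite database holding the sample table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mytable (id INTEGER, b_group INTEGER, col2 TEXT, col3 INTEGER, col4 TEXT)"
    )
    conn.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def first_per_group(conn):
    """Keep one row per b_group among the rows where col3 = 200."""
    # ORDER BY id (instead of ORDER BY 1) makes the pick repeatable:
    # the lowest id in each b_group gets rn = 1.
    query = """
        SELECT id, b_group, col2, col3, col4
        FROM (SELECT id, b_group, col2, col3, col4,
                     ROW_NUMBER() OVER (PARTITION BY b_group ORDER BY id) AS rn
              FROM mytable
              WHERE col3 = 200) AS numbered
        WHERE rn = 1
        ORDER BY b_group
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for row in first_per_group(build_connection()):
        print(row)
```

On the sample rows, this keeps ids 2, 3 and 7 (b_group 1, 2 and 5). Rows whose col3 is not 200 never reach the window function, because the WHERE clause in the subquery filters them out first.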

Related

Is it faster to do WHERE IN or INNER JOIN in Redshift

I have 2 tables in Redshift:
table1
| ids |
|------:|
| 1 |
| 2 |
| 6 |
| 9 |
| 12 |
table2
| id | value |
|-----:|---------:|
| 1 | 0.134435 |
| 2 | 0.767417 |
| 3 | 0.779567 |
| 4 | 0.726051 |
| 5 | 0.405138 |
| 6 | 0.775206 |
| 7 | 0.699945 |
| 8 | 0.499433 |
| 10 | 0.457386 |
| 9 | 0.227511 |
| 10 | 0.369292 |
| 11 | 0.653735 |
| 12 | 0.537251 |
| 2 | 0.953539 |
| 13 | 0.377625 |
| 14 | 0.973905 |
| 4 | 0.104643 |
| 1 | 0.450627 |
I basically want to get the rows in table2 whose id is in table1, and I see 2 possibilities:
SELECT *
FROM table2
WHERE id IN (SELECT ids FROM table1)
or
SELECT t2.id, t2.value
FROM table2 t2
INNER JOIN table1 t1
ON t2.id = t1.ids
I want to know if there is any performance difference between them.
(I know I could just test this example to find out, but I would like to know whether one of them is always faster.)
Edit: table1.ids is a unique column
The two queries do different things.
The JOIN can multiply the number of rows if ids is duplicated in table1.
The IN will never duplicate rows of table2.
If ids can contain duplicates, you should use the version that does what you want. If ids is guaranteed to be unique, then the two are functionally equivalent.
In my experience, JOIN is typically at least as fast as IN. Of course, you can test on your data, but that is a starting point.
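The sketch below makes the duplication point concrete. It uses Python's sqlite3 module as a stand-in for Redshift, so it shows row counts and says nothing about Redshift performance. The function name count_in_vs_join is hypothetical. The table2 rows are the ones from the question.

```python
import sqlite3

# table2 rows from the question: (id, value). Ids 1, 2, 4 and 10 appear twice.
TABLE2 = [
    (1, 0.134435), (2, 0.767417), (3, 0.779567), (4, 0.726051),
    (5, 0.405138), (6, 0.775206), (7, 0.699945), (8, 0.499433),
    (10, 0.457386), (9, 0.227511), (10, 0.369292), (11, 0.653735),
    (12, 0.537251), (2, 0.953539), (13, 0.377625), (14, 0.973905),
    (4, 0.104643), (1, 0.450627),
]


def count_in_vs_join(table1_ids):
    """Return (rows from the IN query, rows from the JOIN query) for the given table1.ids."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (ids INTEGER)")
    conn.execute("CREATE TABLE table2 (id INTEGER, value REAL)")
    conn.executemany("INSERT INTO table1 VALUES (?)", [(i,) for i in table1_ids])
    conn.executemany("INSERT INTO table2 VALUES (?, ?)", TABLE2)
    # IN acts as a semi-join: each table2 row is returned at most once.
    in_rows = conn.execute(
        "SELECT * FROM table2 WHERE id IN (SELECT ids FROM table1)"
    ).fetchall()
    # INNER JOIN returns one row per matching (table2, table1) pair.
    join_rows = conn.execute(
        "SELECT t2.id, t2.value FROM table2 t2 INNER JOIN table1 t1 ON t2.id = t1.ids"
    ).fetchall()
    conn.close()
    return len(in_rows), len(join_rows)


if __name__ == "__main__":
    print(count_in_vs_join([1, 2, 6, 9, 12]))     # unique ids, as in the question
    print(count_in_vs_join([1, 2, 6, 9, 12, 2]))  # id 2 duplicated in table1
```

With the question's unique ids, both queries return 7 rows. Adding a second 2 to table1 leaves the IN query at 7 rows but raises the JOIN to 9, because each of the two table2 rows with id 2 now matches twice.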

Adding conditional statements to a SQL window function

I want to use a series of conditions to dictate how a window function I have works. Currently, what I have is this:
SELECT col1, col2,
1=Row_number() OVER (PARTITION BY col1 ORDER BY col2 ASC) OR
3=Row_number() OVER (PARTITION BY col1 ORDER BY col2 ASC)
AS col3
FROM myTable;
In effect, it takes two columns of input, partitions the rows by col1, orders each partition by col2, splits each partition into two halves, and flags the first row of each half as true/1.
So, taking this input:
+------+------+
| col1 | col2 |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
+------+------+
We get this result:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1 | 1 | 1 |
| 1 | 2 | 0 |
| 1 | 3 | 1 |
| 1 | 4 | 0 |
| 2 | 1 | 1 |
| 2 | 2 | 0 |
| 2 | 3 | 1 |
| 2 | 4 | 0 |
+------+------+------+
Now, obviously, this only works when there are exactly 4 rows for each value in col1. How do I introduce conditional statements to make this work when there aren't exactly 4 rows?
The constraints I have are these:
a) there will always be an even number of rows (2, 4, 6, ...) when grouping by values in `col1`
b) there will be a minimum of 2 rows when grouping by values in `col1`
EDIT:
I think I need to clarify that I do not simply want alternating rows of 1's and 0's. For example, if I used this table instead...
+------+------+
| col1 | col2 |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 1 | 6 |
| 1 | 7 |
| 1 | 8 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
| 2 | 5 |
| 2 | 6 |
| 2 | 7 |
| 2 | 8 |
+------+------+
...then I'd expect this result:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1 | 1 | 1 |
| 1 | 2 | 0 |
| 1 | 3 | 0 |
| 1 | 4 | 0 |
| 1 | 5 | 1 |
| 1 | 6 | 0 |
| 1 | 7 | 0 |
| 1 | 8 | 0 |
| 2 | 1 | 1 |
| 2 | 2 | 0 |
| 2 | 3 | 0 |
| 2 | 4 | 0 |
| 2 | 5 | 1 |
| 2 | 6 | 0 |
| 2 | 7 | 0 |
| 2 | 8 | 0 |
+------+------+------+
In the original example I gave, we grouped by col1 and saw that there were 4 rows for each partition. We take half of that, which is 2, and flag every 2nd row (every other row) as true/1.
In this second example, once we group by col1, we see that there are 8 rows for each partition. Splitting that in half gives us 4, so every 4th row should be flagged with a true/1.
Use modulo arithmetic.
Many dialects of SQL use % for modulus:
SELECT col1, col2,
ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) % 2 as col3
FROM mytable;
Some use the function MOD():
SELECT col1, col2,
MOD(ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2), 2) as col3
FROM mytable;
EDIT:
You don't want to alternate rows. You simply want two flagged rows per partition. For that, you can still use modulo arithmetic, but with somewhat different logic:
SELECT col1, col2,
(ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) %
FLOOR(COUNT(*) OVER (PARTITION BY col1) / 2)
) as col3
FROM mytable;
I am just extending Gordon's answer, since on its own it will not give you the correct result (subtracting 1 from the row number also covers partitions of exactly two rows, where the divisor is 1):
SELECT col1, col2,
(CASE WHEN (ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) - 1) %
FLOOR(COUNT(*) OVER (PARTITION BY col1) / 2) = 0 THEN 1 ELSE 0 END
) as col3
FROM mytable;
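Here is a sketch of the flagging logic, run through Python's sqlite3 and assuming SQLite 3.25+ for window functions. It computes the row number and the partition size in a subquery. It then flags the rows where (row number - 1) is a multiple of half the partition size. Comparing (row number - 1) with 0, rather than the row number with 1, matters when a partition has exactly two rows. In that case the divisor is 1 and every remainder is 0, so a test for = 1 would flag nothing. The function name flag_half_starts is hypothetical.

```python
import sqlite3


def flag_half_starts(rows):
    """Flag (col3 = 1) the first row of each half of every col1 partition.

    Assumes every partition has an even number of rows (at least 2),
    as the question guarantees.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 INTEGER)")
    conn.executemany("INSERT INTO mytable VALUES (?, ?)", rows)
    # rn numbers the rows inside each partition; cnt / 2 is the integer half size.
    # (rn - 1) % (cnt / 2) = 0 holds for rn = 1 and rn = cnt / 2 + 1.
    query = """
        SELECT col1, col2,
               CASE WHEN (rn - 1) % (cnt / 2) = 0 THEN 1 ELSE 0 END AS col3
        FROM (SELECT col1, col2,
                     ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) AS rn,
                     COUNT(*) OVER (PARTITION BY col1) AS cnt
              FROM mytable) AS numbered
        ORDER BY col1, col2
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # Partitions of 8, 4 and 2 rows.
    sample = [(1, n) for n in range(1, 9)] + [(2, n) for n in range(1, 5)] + [(3, 1), (3, 2)]
    for row in flag_half_starts(sample):
        print(row)
```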

Select one row inside a group according to a criteria in PostgreSQL

I have a table like this (tbl):
+----+-----+------+-----+
| pk | grp | attr | val |
+----+-----+------+-----+
| 0 | 0 | ohif | 4 |
| 1 | 0 | foha | 56 |
| 2 | 0 | slns | 2 |
| 3 | 1 | faso | 11 |
| 4 | 1 | tepj | 4 |
| 5 | 2 | bnda | 12 |
| 6 | 2 | ojdf | 9 |
| 7 | 2 | anaw | 1 |
+----+-----+------+-----+
I would like to select one row from each group, in particular that with the maximum val for each group.
I can easily select grp and val:
SELECT grp, MAX(val)
FROM tbl
GROUP BY grp
Yielding this table (tbl2):
+-----+-----+
| grp | val |
+-----+-----+
| 0 | 56 |
| 1 | 11 |
| 2 | 12 |
+-----+-----+
However, I want this table:
+----+-----+------+-----+
| pk | grp | attr | val |
+----+-----+------+-----+
| 1 | 0 | foha | 56 |
| 3 | 1 | faso | 11 |
| 5 | 2 | bnda | 12 |
+----+-----+------+-----+
Since (grp, val) constitutes a key, I could left-join tbl2 with tbl on the same grp and val.
However, I was wondering whether there is any other solution: in my real-world situation, tbl is a fairly complex and heavy derived table, and I have the design constraint of not being able to use temp tables. Is there any way to order the rows inside each group by val and then take the first record of each group?
I'm on PostgreSQL 10, but a standard SQL solution would be the best.
In Postgres, the best approach is distinct on:
SELECT DISTINCT ON (t.grp) t.*
FROM tbl t
ORDER BY t.grp, t.val DESC;
In particular, this can take advantage of an index on (grp, val desc).
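DISTINCT ON is a PostgreSQL extension. The question asks for standard SQL if possible, so here is a sketch of the portable ROW_NUMBER equivalent. It runs through Python's sqlite3 for illustration and assumes SQLite 3.25+. Adding pk to the window's ORDER BY is an added assumption that makes ties deterministic. The helper names build_connection and max_row_per_group are hypothetical.

```python
import sqlite3

# The question's sample table: (pk, grp, attr, val).
ROWS = [
    (0, 0, "ohif", 4), (1, 0, "foha", 56), (2, 0, "slns", 2),
    (3, 1, "faso", 11), (4, 1, "tepj", 4),
    (5, 2, "bnda", 12), (6, 2, "ojdf", 9), (7, 2, "anaw", 1),
]


def build_connection():
    """Create an in-memory SQLite database holding tbl."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (pk INTEGER, grp INTEGER, attr TEXT, val INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)", ROWS)
    return conn


def max_row_per_group(conn):
    """Return the full row holding the largest val in each grp."""
    # Rank rows inside each grp by val, highest first; pk breaks ties.
    query = """
        SELECT pk, grp, attr, val
        FROM (SELECT pk, grp, attr, val,
                     ROW_NUMBER() OVER (PARTITION BY grp ORDER BY val DESC, pk) AS rn
              FROM tbl) AS ranked
        WHERE rn = 1
        ORDER BY grp
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(max_row_per_group(build_connection()))
```

On the sample data, it returns pks 1, 3 and 5, which is the table the question asks for. Like DISTINCT ON, it needs neither a temp table nor a join back to tbl2.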

Get new Id + 1 for each group in SQL

Please help me figure out a way to get, from a data set, the first Number of each group, provided that Number is not already taken by a previous group. I don't even know how to explain it, so I will show it here:
Id | Col1 | Col2 | Value | Number
------+-------+------+----------+-------
17525 | A | B | 1086.00 | 1
17525 | A | B | 1086.00 | 2
17525 | A | B | 1086.00 | 3
17526 | A | B | 1378.00 | 1
17526 | A | B | 1378.00 | 2
17526 | A | B | 1378.00 | 3
17527 | A | B | 1498.00 | 1
17527 | A | B | 1498.00 | 2
17527 | A | B | 1498.00 | 3
And I want to get something like this:
For each Id or Value (it doesn't matter which; they correspond one to one), the first Number that has not already been taken by a previous group.
Something like this:
Id | Col1 | Col2 | Value | Number
------+-------+------+----------+-------
17525 | A | B | 1086.00 | 1
17526 | A | B | 1378.00 | 2
17527 | A | B | 1498.00 | 3
So for the first value, 1086.00, I'll take Number 1; for the second value, 1378.00, I'll take Number 2, because 1 is already taken by the first value.
I tried for 3 hours: ROW_NUMBER doesn't work, and a recursive CTE couldn't get past the maximum recursion limit (100) error.
Please HELP!
Thanks.
Have you considered using dense_rank()?:
select distinct Id, Col1, Col2, Value
, dr = dense_rank() over (order by Id)
from t
returns:
+-------+------+------+---------+----+
| Id | Col1 | Col2 | Value | dr |
+-------+------+------+---------+----+
| 17525 | A | B | 1086.00 | 1 |
| 17526 | A | B | 1378.00 | 2 |
| 17527 | A | B | 1498.00 | 3 |
+-------+------+------+---------+----+
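Here is a sketch of the dense_rank() answer in Python's sqlite3, assuming SQLite 3.25+. SQLite does not accept the T-SQL alias form dr = ..., so the alias is written as AS dr. The table name t matches the answer, and first_free_numbers is a hypothetical helper name.

```python
import sqlite3

# The question's sample data: (Id, Col1, Col2, Value, Number).
ROWS = [
    (id_, "A", "B", value, number)
    for id_, value in ((17525, 1086.00), (17526, 1378.00), (17527, 1498.00))
    for number in (1, 2, 3)
]


def first_free_numbers():
    """Return one row per Id, numbered 1, 2, 3, ... in Id order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (Id INTEGER, Col1 TEXT, Col2 TEXT, Value REAL, Number INTEGER)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", ROWS)
    # DENSE_RANK gives every row of the same Id the same rank;
    # DISTINCT then collapses each Id to a single row.
    query = """
        SELECT DISTINCT Id, Col1, Col2, Value,
               DENSE_RANK() OVER (ORDER BY Id) AS dr
        FROM t
        ORDER BY Id
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in first_free_numbers():
        print(row)
```

Note that dr is generated from the order of Id rather than read from the Number column. It therefore matches the desired output only when each group's Numbers run 1, 2, 3, ..., as they do in the sample.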

Query for data in two tables connected by a third. Data sometimes only in one

I thought I could figure this out, but I am having a lot of issues. I have 3 tables: Table1, Table2, and Table3. These tables were designed by someone else and I have to work with them. They were not designed to be used the way they are used today.
The bottom line is that I need to be able to enter an Item_No, which will always exist in Table2. The Item_No may also appear in Table3, multiple times or not at all, and sometimes I can find it 5 times in Table2 but only 3 times in Table3. If it is in Table3, it will also be in Table1.
So, using the Item_No, I want to find the matching rows in Table2 and return the Order_Qty values associated with them. Then, where matches exist, get Table1.ID where Table1.ID = Table3.ID and Table3.Item_No = Table2.Item_No.
I came up with the following. It does not give me errors, but it simply stops code execution during a C# fill. I had it working for finding the Item_No in Table3 and returning what it finds, and I have ONLY changed this query since then, so I KNOW this is the issue.
Here is what I could come up with that is not working:
SELECT Table1.ID,
Table2.Order_Qty As [Qty of Full Order], Table2.Item_No As [Set No]
FROM Table2
LEFT JOIN Table3
ON Table2.Item_No = Table3.Item_No
AND Table2.Order_No = Table3.Order_No
LEFT JOIN Table1
ON Table1.Order_No = Table2.Order_No
AND Table1.ID = Table3.ID
WHERE Table2.Item_No = @m_strUserEnteredSeachValue
ORDER BY Table2.Order_No DESC
Example data:
Table 1
+----------+--------------+-------------------+
| Order_No | Sub_Order_No | Sub_Order_Contact |
+==========+==============+===================+
| 1 | 1 | John Doe |
+----------+--------------+-------------------+
| 1 | 2 | Jane Doe |
+----------+--------------+-------------------+
| 1 | 3 | Foo |
+----------+--------------+-------------------+
| 1 | 4 | Bar |
+----------+--------------+-------------------+
| 1 | 5 | Foo2 |
+----------+--------------+-------------------+
Table 2
+----------+--------------+-------------------+
| Order_No | Item_No | Customer_Item_Name|
+==========+==============+===================+
| 1 | 1 | 1234567890 |
+----------+--------------+-------------------+
| 1 | 2 | 1234567891 |
+----------+--------------+-------------------+
| 1 | 3 | 1234567892 |
+----------+--------------+-------------------+
| 1 | 4 | 1234567893 |
+----------+--------------+-------------------+
| 1 | 5 | 1234567894 |
+----------+--------------+-------------------+
| 1 | 6 | 1234567895 |
+----------+--------------+-------------------+
| 2 | 1 | 0987654321 |
+----------+--------------+-------------------+
| 2 | 2 | 0987654322 |
+----------+--------------+-------------------+
| 2 | 3 | 0987654323 |
+----------+--------------+-------------------+
| 3 | 1 | 1234567893 |
+----------+--------------+-------------------+
And Table 3
+----------+--------------+-------------------+--------------+
| Order_No | Item_No | Customer_Item_Name| Sub_Order_No |
+==========+==============+===================+==============+
| 1 | 1 | 1234567890 | 1 |
+----------+--------------+-------------------+--------------+
| 1 | 2 | 1234567891 | 2 |
+----------+--------------+-------------------+--------------+
| 1 | 3 | 1234567892 | 2 |
+----------+--------------+-------------------+--------------+
| 1 | 4 | 1234567893 | 3 |
+----------+--------------+-------------------+--------------+
| 1 | 5 | 1234567894 | 4 |
+----------+--------------+-------------------+--------------+
| 1 | 6 | 1234567895 | 4 |
+----------+--------------+-------------------+--------------+
| 1 | 4 | 1234567893 | 4 |
+----------+--------------+-------------------+--------------+
The result I am looking for, if I search for item 1234567893:
+----------+--------------+-------------------+--------------+-------------------+
| Order_No | Item_No | Customer_Item_Name| Sub_Order_No | Sub_Order_Contact |
+==========+==============+===================+==============+===================+
| 3 | 1 | 1234567893 | | |
+----------+--------------+-------------------+--------------+-------------------+
| 1 | 4 | 1234567893 | 3 | Foo |
+----------+--------------+-------------------+--------------+-------------------+
| 1 | 4 | 1234567893 | 4 | Bar |
+----------+--------------+-------------------+--------------+-------------------+
A pragmatic answer to a problem like this is to split it into a couple of queries. Query Table #2 first, and then, based on that result set, run additional queries against Table #1 or Table #3.
Another angle is to query Table #2 and use subqueries to reach into Table #1 or Table #3 and fetch the data you need.
Try this:
declare @m_strUserEnteredSeachValue varchar(10) = '1234567893';
with a as
(
select
Order_No, Item_No, Customer_Item_Name
from
Table2
UNION
select
Order_No, Item_No, Customer_Item_Name
from
Table3
)
select
a.Order_No,
a.Item_No,
a.Customer_Item_Name,
Table3.Sub_Order_No,
Table1.Sub_Order_Contact
from
a
left join
Table3
on
Table3.Order_No=a.Order_No
and Table3.Item_No=a.Item_No
and Table3.Customer_Item_Name=a.Customer_Item_Name
left join
Table1
on
Table1.Sub_Order_No = Table3.Sub_Order_No
where
@m_strUserEnteredSeachValue = a.Customer_Item_Name
order by
a.Item_No, Table3.Sub_Order_No
SqlFiddle demo: http://www.sqlfiddle.com/#!3/973d8/3
I have no idea whether this is what you are trying to arrive at, since it's difficult to understand from your question. All I know is that this query produces the dataset you posted in the OP.
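Here is a sketch that replays the answer's query on the example data with Python's sqlite3. A bound ? parameter stands in for the T-SQL variable. The column types are assumptions: Customer_Item_Name is stored as text so that values such as 0987654321 keep their leading zero. The helper name search is hypothetical.

```python
import sqlite3

# Example data from the question.
TABLE1 = [(1, 1, "John Doe"), (1, 2, "Jane Doe"), (1, 3, "Foo"), (1, 4, "Bar"), (1, 5, "Foo2")]
TABLE2 = [
    (1, 1, "1234567890"), (1, 2, "1234567891"), (1, 3, "1234567892"),
    (1, 4, "1234567893"), (1, 5, "1234567894"), (1, 6, "1234567895"),
    (2, 1, "0987654321"), (2, 2, "0987654322"), (2, 3, "0987654323"),
    (3, 1, "1234567893"),
]
TABLE3 = [
    (1, 1, "1234567890", 1), (1, 2, "1234567891", 2), (1, 3, "1234567892", 2),
    (1, 4, "1234567893", 3), (1, 5, "1234567894", 4), (1, 6, "1234567895", 4),
    (1, 4, "1234567893", 4),
]

QUERY = """
    WITH a AS (
        SELECT Order_No, Item_No, Customer_Item_Name FROM Table2
        UNION
        SELECT Order_No, Item_No, Customer_Item_Name FROM Table3
    )
    SELECT a.Order_No, a.Item_No, a.Customer_Item_Name,
           Table3.Sub_Order_No, Table1.Sub_Order_Contact
    FROM a
    LEFT JOIN Table3
      ON Table3.Order_No = a.Order_No
     AND Table3.Item_No = a.Item_No
     AND Table3.Customer_Item_Name = a.Customer_Item_Name
    LEFT JOIN Table1
      ON Table1.Sub_Order_No = Table3.Sub_Order_No
    WHERE a.Customer_Item_Name = ?
    ORDER BY a.Item_No, Table3.Sub_Order_No
"""


def search(customer_item_name):
    """Run the answer's query for one Customer_Item_Name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (Order_No INTEGER, Sub_Order_No INTEGER, Sub_Order_Contact TEXT)")
    conn.execute("CREATE TABLE Table2 (Order_No INTEGER, Item_No INTEGER, Customer_Item_Name TEXT)")
    conn.execute(
        "CREATE TABLE Table3 (Order_No INTEGER, Item_No INTEGER, "
        "Customer_Item_Name TEXT, Sub_Order_No INTEGER)"
    )
    conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", TABLE1)
    conn.executemany("INSERT INTO Table2 VALUES (?, ?, ?)", TABLE2)
    conn.executemany("INSERT INTO Table3 VALUES (?, ?, ?, ?)", TABLE3)
    rows = conn.execute(QUERY, (customer_item_name,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in search("1234567893"):
        print(row)
```

Searching for 1234567893 returns the three rows from the desired result. Order 3 has NULL (None in Python) in Sub_Order_No and Sub_Order_Contact, because it has no match in Table3.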