I have a SQL table as follows:
+------+------+------+------+
| col1 | col2 | col3 | col4 |
+------+------+------+------+
| a    | 3    | d1   | 10   |
| a    | 6    | d2   | 15   |
| b    | 2    | d2   | 8    |
| b    | 30   | d1   | 50   |
+------+------+------+------+
I would like to transform the above table into the one below, where the transformation is:
col4 = col4 - (col4 % min(col2) group by col1)
+------+------+------+------+
| col1 | col2 | col3 | col4 |
+------+------+------+------+
| a    | 3    | d1   | 9    |
| a    | 6    | d2   | 15   |
| b    | 2    | d2   | 8    |
| b    | 30   | d1   | 50   |
+------+------+------+------+
I could read the above table in application code and do the transformation manually, but I was wondering whether it is possible to offload the transformation to SQL.
Just run a simple select query for this:
select col1, col2, col3,
       col4 - (col4 % min(col2) over (partition by col1)) as col4
from t;
There is no need to actually modify the table.
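To try the windowed SELECT without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory table and the choice of SQLite are assumptions made for illustration (window functions need SQLite 3.25 or newer); the query itself is the one from the answer, with an alias and an ORDER BY added for readable output.

```python
import sqlite3

# Hypothetical in-memory copy of the question's table, named t as in the query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 INTEGER, col3 TEXT, col4 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [("a", 3, "d1", 10), ("a", 6, "d2", 15), ("b", 2, "d2", 8), ("b", 30, "d1", 50)],
)

# MIN(col2) OVER (PARTITION BY col1) attaches each group's minimum to every row,
# so the modulo can be computed row by row without a GROUP BY.
rows = conn.execute(
    """
    SELECT col1, col2, col3,
           col4 - (col4 % MIN(col2) OVER (PARTITION BY col1)) AS col4
    FROM t
    ORDER BY col1, col2
    """
).fetchall()

for row in rows:
    print(row)
```

Only the first row changes (10 becomes 9, because 10 % 3 = 1); the other col4 values are already multiples of their group's minimum col2.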
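If you do want to modify the table, you can use a multi-table UPDATE, joining your table to a derived table of MIN(col2) values per col1. The UPDATE ... FROM syntax below is supported by PostgreSQL and SQL Server, among others, but not by every dialect: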
UPDATE table1
SET col4 = col4 - (col4 % t2.col2min)
FROM (SELECT col1, MIN(col2) AS col2min
FROM table1
GROUP BY col1) t2
WHERE table1.col1 = t2.col1
Output:
col1 col2 col3 col4
a 3 d1 9
a 6 d2 15
b 2 d2 8
b 30 d1 50
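For engines where UPDATE ... FROM is not available, a correlated subquery is one possible alternative. The sketch below is an assumption on my part, not the answer's method: it runs against a hypothetical in-memory SQLite copy of table1 via Python's sqlite3 module. Some engines (MySQL, for example) reject a subquery that reads the table being updated, so treat this as a starting point only.

```python
import sqlite3

# Hypothetical in-memory copy of table1 from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col1 TEXT, col2 INTEGER, col3 TEXT, col4 INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [("a", 3, "d1", 10), ("a", 6, "d2", 15), ("b", 2, "d2", 8), ("b", 30, "d1", 50)],
)

# Correlated subquery: for each row being updated, look up the minimum col2
# among the rows that share its col1 value.
conn.execute(
    """
    UPDATE table1
    SET col4 = col4 - (col4 % (SELECT MIN(t2.col2)
                               FROM table1 AS t2
                               WHERE t2.col1 = table1.col1))
    """
)

updated = conn.execute(
    "SELECT col1, col2, col3, col4 FROM table1 ORDER BY col1, col2"
).fetchall()
for row in updated:
    print(row)
```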
Is there a way to skip all rows where the divisor is zero when dividing one column by another? For example:
+------+------+
| Col1 | Col2 |
+------+------+
| 5 | 5 |
| 3 | 0 |
| 12 | 6 |
+------+------+
Then col3 = col1 / col2 should give:
+------+------+------+
| Col1 | Col2 | col3 |
+------+------+------+
| 5 | 5 | 1 |
| 12 | 6 | 2 |
+------+------+------+
You can try the following, which filters out the rows with a zero divisor before dividing:
select col1, col2, col1/col2 as col3
from tablename
where col2 <> 0
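The filter can be checked quickly with Python's sqlite3 module. This is a minimal sketch that assumes an in-memory SQLite table named tablename holding the three sample rows; the ORDER BY rowid is only there to keep the output in insertion order.

```python
import sqlite3

# Hypothetical in-memory copy of the sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (col1 INTEGER, col2 INTEGER)")
conn.executemany("INSERT INTO tablename VALUES (?, ?)", [(5, 5), (3, 0), (12, 6)])

# Rows with a zero divisor are removed by the WHERE clause,
# so the division is only evaluated for safe rows.
quotients = conn.execute(
    """
    SELECT col1, col2, col1 / col2 AS col3
    FROM tablename
    WHERE col2 <> 0
    ORDER BY rowid
    """
).fetchall()

for row in quotients:
    print(row)
```

Both columns are integers here, so SQLite performs integer division; with non-integer data the column types would need more care.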
I want to use a series of conditions to dictate how a window function I have works. Currently, what I have is this:
SELECT col1, col2,
1=Row_number() OVER (PARTITION BY col1 ORDER BY col2 ASC) OR
3=Row_number() OVER (PARTITION BY col1 ORDER BY col2 ASC)
AS col3
FROM myTable;
What it's essentially doing is taking two columns of input, grouping by the values in col1, ordering by values in col2, and then splitting the data for each partition into two halves, and flagging the first row of each half as a true/1.
So, taking this input:
+------+------+
| col1 | col2 |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
+------+------+
We get this result:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1 | 1 | 1 |
| 1 | 2 | 0 |
| 1 | 3 | 1 |
| 1 | 4 | 0 |
| 2 | 1 | 1 |
| 2 | 2 | 0 |
| 2 | 3 | 1 |
| 2 | 4 | 0 |
+------+------+------+
Now, obviously, this only works when there are exactly 4 rows of entries for each value in col1. How do I introduce conditional statements to make this work when there aren't exactly 4 rows?
The constraints I have are these:
a) there will always be an even number of rows (2,4,6..) when grouping by values in `col1`
b) there will be a minimum of 2 rows when grouping by values in `col1`
EDIT:
I think I need to clarify that I do not simply want alternating rows of 1's and 0's. For example, if I used this table instead...
+------+------+
| col1 | col2 |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 1 | 6 |
| 1 | 7 |
| 1 | 8 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
| 2 | 5 |
| 2 | 6 |
| 2 | 7 |
| 2 | 8 |
+------+------+
...then I'd expect this result:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1 | 1 | 1 |
| 1 | 2 | 0 |
| 1 | 3 | 0 |
| 1 | 4 | 0 |
| 1 | 5 | 1 |
| 1 | 6 | 0 |
| 1 | 7 | 0 |
| 1 | 8 | 0 |
| 2 | 1 | 1 |
| 2 | 2 | 0 |
| 2 | 3 | 0 |
| 2 | 4 | 0 |
| 2 | 5 | 1 |
| 2 | 6 | 0 |
| 2 | 7 | 0 |
| 2 | 8 | 0 |
+------+------+------+
In the original example I gave, we grouped by col1 and saw that there were 4 rows for each partition. We take half of that, which is 2, and flag every 2nd row (every other row) as true/1.
In this second example, once we group by col1, we see that there are 8 rows for each partition. Splitting that in half gives us 4, so every 4th row should be flagged with a true/1.
Use modulo arithmetic.
Many dialects of SQL use % for modulus:
SELECT col1, col2,
ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) % 2 as col3
FROM mytable;
Some use the function MOD():
SELECT col1, col2,
MOD(ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2), 2) as col3
FROM mytable;
EDIT:
You don't want to alternate rows. You simply want two flagged rows per partition, one at the start of each half. For that, you can still use modulo arithmetic, but with the divisor set to half the partition size:
SELECT col1, col2,
(ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) %
FLOOR(COUNT(*) OVER (PARTITION BY col1) / 2)
) as col3
FROM mytable;
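I am just extending Gordon's answer, since his query returns the raw remainder rather than a 1/0 flag. Using a zero-based row number and comparing the remainder to 0 also covers partitions with only 2 rows, where the half size is 1 and a remainder of 1 can never occur:
SELECT col1, col2,
       (CASE WHEN (ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) - 1) %
                  FLOOR(COUNT(*) OVER (PARTITION BY col1) / 2) = 0 THEN 1 ELSE 0 END
       ) as col3
FROM mytable;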
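To check the half-partition flagging against the stated constraints (an even number of rows, at least 2 per col1 value), here is a small sketch using Python's sqlite3 module. It assumes an in-memory SQLite table with partitions of 2, 4 and 8 rows, and it uses a zero-based row number compared against remainder 0, which is a variant chosen for this sketch so that two-row partitions (half size 1) are handled as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (col1 INTEGER, col2 INTEGER)")

# Hypothetical test data: partition 1 has 2 rows, partition 2 has 4, partition 3 has 8.
data = (
    [(1, n) for n in range(1, 3)]
    + [(2, n) for n in range(1, 5)]
    + [(3, n) for n in range(1, 9)]
)
conn.executemany("INSERT INTO mytable VALUES (?, ?)", data)

# (row_number - 1) % (partition size / 2) is 0 exactly at the first row of each half.
rows = conn.execute(
    """
    SELECT col1, col2,
           CASE WHEN (ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) - 1)
                     % (COUNT(*) OVER (PARTITION BY col1) / 2) = 0
                THEN 1 ELSE 0 END AS col3
    FROM mytable
    ORDER BY col1, col2
    """
).fetchall()

# Collect the flags per partition for easier inspection.
flags = {}
for col1, _col2, col3 in rows:
    flags.setdefault(col1, []).append(col3)

for col1, values in flags.items():
    print(col1, values)
```

SQLite divides integers as integers, so COUNT(*) / 2 needs no FLOOR here; other dialects may differ.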
I need something like this in MS Access SQL:
SELECT
ID,
col1,
col2,
random(col3)
FROM
table
GROUP BY
ID,
col1,
col2
NOTE:
I want to remove duplicates by choosing a random value of col3 for each (ID, col1, col2) group.
INPUT:
+----+------+------+------+
| Id | col1 | col2 | col3 |
+----+------+------+------+
| 1 | A | B | 7 |
+----+------+------+------+
| 1 | A | B | 10 |
+----+------+------+------+
RESULT:
+----+------+------+------+
| Id | col1 | col2 | col3 |
+----+------+------+------+
| 1 | A | B | 7 |
+----+------+------+------+
REQUERY:
+----+------+------+------+
| Id | col1 | col2 | col3 |
+----+------+------+------+
| 1 | A | B | 10 |
+----+------+------+------+
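The desired behaviour can be illustrated outside the database. The following is an application-side sketch in Python, not Access SQL: the function name dedupe_random and the tuple layout are hypothetical, and it only shows the intended logic of grouping on (Id, col1, col2) and picking one col3 value at random per group.

```python
import random
from collections import defaultdict

# Sample rows as (Id, col1, col2, col3), taken from the INPUT table.
rows = [(1, "A", "B", 7), (1, "A", "B", 10)]


def dedupe_random(rows, rng=random):
    """Keep one row per (Id, col1, col2), choosing its col3 value at random."""
    groups = defaultdict(list)
    for id_, col1, col2, col3 in rows:
        groups[(id_, col1, col2)].append(col3)
    # One output row per group; the col3 value may differ between calls.
    return [key + (rng.choice(values),) for key, values in groups.items()]


result = dedupe_random(rows)
print(result)
```

Each call corresponds to one requery: the single output row carries either 7 or 10 in col3.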
I need to count the col2 values which, across every set in col1 (ABC), have a Y in col3 100% of the time. In this case, B1 and D1 meet this criterion, so N=2. Solutions in pandas or SQL would be helpful (both would be ideal).
| col1 | col2 | col3 | col4 | col5 |
|------|-------|-------|-------|-------|
| A | A1 | N | 1 | 256 |
| A | B1 | Y | 2 | 3 |
| A | C1 | N | 3 | 323 |
| B | F1 | N | 1 | 89 |
| B | B1 | Y | 2 | 256 |
| C | D1 | Y | 1 | 3 |
| D | A1 | N | 1 | 32 |
| D | C1 | Y | 2 | 893 |
Something like this in Python with pandas:
df.groupby('col2').col3.apply(lambda x : sum(x=='Y')==x.count()).sum()
Out[568]: 2
More detail:
df.groupby('col2').col3.apply(lambda x : sum(x=='Y')==x.count())
Out[569]:
col2
A1 False
B1 True
C1 False
D1 True
F1 False
Name: col3, dtype: bool
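I don't see what col1 has to do with this. You can do this with a SQL query that groups by col2 and keeps only the groups whose col3 values are all 'Y' (aggregates must be compared in HAVING, not WHERE):
select count(*)
from (select col2
      from t
      group by col2
      having min(col3) = max(col3) and min(col3) = 'Y'
     ) x;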
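The grouped form of this query can be verified with Python's sqlite3 module. This is a minimal sketch that assumes an in-memory SQLite table t loaded with the question's sample rows; it uses GROUP BY col2 with HAVING, since aggregate comparisons have to be evaluated per group.

```python
import sqlite3

# Hypothetical in-memory copy of the question's table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (col1 TEXT, col2 TEXT, col3 TEXT, col4 INTEGER, col5 INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        ("A", "A1", "N", 1, 256),
        ("A", "B1", "Y", 2, 3),
        ("A", "C1", "N", 3, 323),
        ("B", "F1", "N", 1, 89),
        ("B", "B1", "Y", 2, 256),
        ("C", "D1", "Y", 1, 3),
        ("D", "A1", "N", 1, 32),
        ("D", "C1", "Y", 2, 893),
    ],
)

# A col2 group qualifies when its smallest and largest col3 values are both 'Y',
# i.e. every row in the group has col3 = 'Y'.
count = conn.execute(
    """
    SELECT COUNT(*)
    FROM (SELECT col2
          FROM t
          GROUP BY col2
          HAVING MIN(col3) = 'Y' AND MAX(col3) = 'Y') AS x
    """
).fetchone()[0]

print(count)
```

B1 and D1 are the only col2 values whose rows all carry Y, which matches the count of 2 from the pandas answer.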
I am stuck in a situation similar to this one.
I have multiple columns with different types of data, and I want to select all columns but group by only one of them.
My Table:
+--------+----------+----------+-------+-----------------------+
| id | b_group | col2 | col3 | col4 |
+--------+----------+----------+-------+-----------------------+
| 1 | 1 | abcd | 100 | www.google.com |
| 2 | 1 | xyz | 200 | www.yahoo.com |
| 3 | 2 | dfs | 200 | www.stackoverflow.com |
| 4 | 3 | asda3 | 78 | www.imdb.com |
| 5 | 4 | zsdvf4 | 65 | www.youtube.com |
| 6 | 5 | sdf4 | 101 | www.ymail.com |
| 7 | 5 | ssdfsd | 200 | www.gmail.com |
| 8 | 1 | zxcgdf4 | 200 | www.club.com |
| 9 | 6 | yujhgj | 202 | www.thunderbird.com |
+--------+----------+----------+-------+-----------------------+
After reading the solution provided there, I understood that I should use an aggregate function, so my query looks like this:
select MIN(b_group),id,col2,col3,col4 from myTable where col3='200' group by id,col2,col3,col4;
But this does not work in my case: it returns all the records where col3=200.
My desired Output:
+--------+----------+----------+-------+-----------------------+
| id | b_group | col2 | col3 | col4 |
+--------+----------+----------+-------+-----------------------+
| 2 | 1 | xyz | 200 | www.yahoo.com |
| 3 | 2 | dfs | 200 | www.stackoverflow.com |
| 6 | 5 | sdf4 | 200 | www.ymail.com |
+--------+----------+----------+-------+-----------------------+
I don't care which record is picked for each b_group, and the order doesn't matter.
I just want to select all columns while grouping by only one.
By applying a group by clause, you get a result row per unique combination of all the columns in it (in this case, per unique combination of id, col2, col3, and col4). Instead, you could use the row_number window function to number rows per b_group, and then select just the (arbitrary) first of each group:
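SELECT id, b_group, col2, col3, col4
FROM (SELECT id, b_group, col2, col3, col4,
             -- any ordering works here, since the pick is arbitrary
             ROW_NUMBER() OVER (PARTITION BY b_group ORDER BY id) AS rn
      FROM mytable
      WHERE col3 = 200) t  -- many dialects require an alias on a derived table
WHERE rn = 1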
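The ROW_NUMBER approach can be tried with Python's sqlite3 module. The sketch below assumes an in-memory SQLite table mytable loaded with the question's nine rows; it orders each partition by id (an arbitrary but deterministic choice made for this sketch) and gives the derived table an alias.

```python
import sqlite3

# Hypothetical in-memory copy of the question's table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, b_group INTEGER, col2 TEXT, col3 INTEGER, col4 TEXT)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, "abcd", 100, "www.google.com"),
        (2, 1, "xyz", 200, "www.yahoo.com"),
        (3, 2, "dfs", 200, "www.stackoverflow.com"),
        (4, 3, "asda3", 78, "www.imdb.com"),
        (5, 4, "zsdvf4", 65, "www.youtube.com"),
        (6, 5, "sdf4", 101, "www.ymail.com"),
        (7, 5, "ssdfsd", 200, "www.gmail.com"),
        (8, 1, "zxcgdf4", 200, "www.club.com"),
        (9, 6, "yujhgj", 202, "www.thunderbird.com"),
    ],
)

# Number the col3 = 200 rows within each b_group, then keep the first of each group.
picked = conn.execute(
    """
    SELECT id, b_group, col2, col3, col4
    FROM (SELECT id, b_group, col2, col3, col4,
                 ROW_NUMBER() OVER (PARTITION BY b_group ORDER BY id) AS rn
          FROM mytable
          WHERE col3 = 200) AS t
    WHERE rn = 1
    ORDER BY b_group
    """
).fetchall()

for row in picked:
    print(row)
```

With the sample data as listed, the only b_group 5 row with col3 = 200 is id 7, so that is the row returned for that group.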