add two columns with groupby - pandas

How can I add two columns together after grouping by a key from another column?
For example, I have the following table:
+------+------+------+
| Col1 | Val1 | Val2 |
+------+------+------+
| 1 | 3 | 3 |
| 1 | 4 | 2 |
| 1 | 2 | 1 |
| 2 | 2 | 0 |
| 2 | 3 | 0 |
| 3 | 2 | 9 |
| 3 | 2 | 8 |
| 4 | 2 | 1 |
| 5 | 1 | 1 |
+------+------+------+
What I want to achieve is:
+------+----------------------+
| Col1 | Sum of Val1 and Val2 |
+------+----------------------+
| 1 | 15 |
| 2 | 5 |
| 3 | 21 |
| 4 | 3 |
| 5 | 2 |
+------+----------------------+
I can get the sum of each value column grouped by Col1 and then add the results together, but I am creating multiple intermediate columns in the process.
import pandas as pd
data =[[1,3,3],[1,4,2],[1,2,1],[2,2,0],[2,3,0],[3,2,9],[3,2,8],
[4,2,1],[5,1,1]]
mydf = pd.DataFrame(data, columns = ['Col1','Val1','Val2'])
print(mydf)
mydf['total1'] = mydf.groupby('Col1')['Val1'].transform('sum')
mydf['total2'] = mydf.groupby('Col1')['Val2'].transform('sum')
mydf['Sum of Val1 and Val2'] = mydf['total1'] + mydf['total2']
mydf = mydf.drop_duplicates('Col1')
print(mydf[['Col1', 'Sum of Val1 and Val2' ]])
Is there a shorter way to do this?

mydf.groupby('Col1').sum().sum(axis=1)
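
The one-liner above returns an unnamed Series indexed by Col1. As a minimal sketch, assuming the same mydf as in the question, it can be turned into the two-column table the question asks for. The variable name totals is only illustrative.

```python
import pandas as pd

data = [[1, 3, 3], [1, 4, 2], [1, 2, 1], [2, 2, 0], [2, 3, 0],
        [3, 2, 9], [3, 2, 8], [4, 2, 1], [5, 1, 1]]
mydf = pd.DataFrame(data, columns=['Col1', 'Val1', 'Val2'])

totals = (
    mydf.groupby('Col1')[['Val1', 'Val2']].sum()  # per-group sum of each column
    .sum(axis=1)                                  # add the two sums row-wise
    .rename('Sum of Val1 and Val2')               # give the Series a name
    .reset_index()                                # Col1 back to a regular column
)
print(totals)
```

Selecting Val1 and Val2 explicitly keeps the result stable if other numeric columns are added to the frame later.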

Use the following:
mydf['Sum of Val1 and Val2'] = mydf['Val1'] + mydf['Val2']
df = mydf.groupby('Col1')['Sum of Val1 and Val2'].sum().reset_index()
print(df)
Col1 Sum of Val1 and Val2
0 1 15
1 2 5
2 3 21
3 4 3
4 5 2

Related

How to assign duplicate increment in SQL?

While going through the rows, whenever the text "NEW" appears in the Calc column, increment a counter (starting at 1) and write it to the Results column.
The output should look like this:

The following uses an id column to resolve the order issue. Replace that with your corresponding expression. This also addresses the requirement to start the display sequence with 1 and also show 0 for the 'NEW' rows.
The SQL (updated):
SELECT logs.*
, CASE WHEN text = 'NEW' THEN 0
ELSE
COALESCE(SUM(CASE WHEN text = 'NEW' THEN 1 END) OVER (PARTITION BY xrank ORDER BY id)+1, 1)
END AS display
FROM logs
ORDER BY id
The result:
+----+-------+------+---------+
| id | xrank | text | display |
+----+-------+------+---------+
| 1 | 1 | A | 1 |
| 2 | 1 | B | 1 |
| 3 | 1 | C | 1 |
| 4 | 1 | NEW | 0 |
| 5 | 1 | D | 2 |
| 6 | 1 | Q | 2 |
| 7 | 1 | B | 2 |
| 8 | 1 | NEW | 0 |
| 9 | 1 | D | 3 |
| 10 | 1 | Z | 3 |
| 11 | 2 | A | 1 |
| 12 | 2 | B | 1 |
| 13 | 2 | C | 1 |
| 14 | 2 | NEW | 0 |
| 15 | 2 | D | 2 |
| 16 | 2 | Q | 2 |
| 17 | 2 | B | 2 |
| 18 | 2 | NEW | 0 |
| 19 | 2 | D | 3 |
| 20 | 2 | Z | 3 |
+----+-------+------+---------+

You need a column that specifies the ordering for the table. With that, just use a cumulative sum:
select t.*,
1 + sum(case when Calc = 'NEW' then 1 else 0 end) over (partition by Rank_Id order by Seq) as display
from t;
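
For readers who want to try the cumulative-sum idea outside the database, here is a rough pandas sketch. It assumes a frame t with the columns Rank_Id, Seq and Calc named in the query above; the sample rows are made up for illustration. As in that query, a 'NEW' row is counted in its own running total, so it already shows the incremented value.

```python
import pandas as pd

# Hypothetical sample data; Seq is the ordering column the answer asks for.
t = pd.DataFrame({
    "Rank_Id": [1] * 10,
    "Seq": list(range(1, 11)),
    "Calc": ["A", "B", "C", "NEW", "D", "Q", "B", "NEW", "D", "Z"],
})

t = t.sort_values(["Rank_Id", "Seq"])

# 1 for each 'NEW' row, 0 otherwise, accumulated within each Rank_Id.
is_new = t["Calc"].eq("NEW").astype(int)
t["display"] = 1 + is_new.groupby(t["Rank_Id"]).cumsum()

print(t)
```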

Oracle SQL unpivot and keep rows with null values [duplicate]

I'm currently doing an unpivot on an Oracle data source (v12.2) like this:
SELECT *
FROM some_table
UNPIVOT (
(X,Y,Val)
FOR SITE
IN (
(SITE1_X, SITE1_Y, SITE1_VAL) AS '1',
(SITE2_X, SITE2_Y, SITE2_VAL) AS '2',
(SITE3_X, SITE3_Y, SITE3_VAL) AS '3'
))
This works fine so far, with one exception: I have another column, say extend_info. If this column has the value y, there is only one such row and all of its site columns are null, so the unpivot drops it. I would nevertheless like to keep this row.
I'm not really sure how to do this or what would be a nice way to do this. Any recommendations?
Example:
Original Table:
ID | SITE1_X | SITE1_Y |SITE1_VAL | SITE2_X | SITE2_Y | SITE2_VAL | ... | extend_info
-------
1 | 0 | 0 | 5 | 1 | 1 | 10 | ... | n
2 | 0 | 0 | 3 | null | null | null | ... | n
3 | null | null | null | null | null | null | ... | y
current output:
ID | SITE | X | Y | VAL | extend_info
-------
1 | 1 | 0 | 0 | 5 | n
2 | 1 | 1 | 1 | 10 | n
3 | 2 | 0 | 0 | 3 | n
desired output:
ID | SITE | X | Y | VAL | extend_info
-------
1 | 1 | 0 | 0 | 5 | n
2 | 1 | 1 | 1 | 10 | n
3 | 2 | 0 | 0 | 3 | n
4 | | | | | y
I don't really care what is in SITE|X|Y|VAL in that case, can be 0 for everything or null.
Bonus question:
If extend_info is y I would like to join another table with this ID. The other table looks like this:
ID | F_ID | X | Y | VAL
-----
1 | 4 | 1 | 1 | 8
2 | 4 | 2 | 2 | 9
and in that case my final output table should look like:
ID | SITE | X | Y | VAL | X_OTHER_TABLE | Y_OTHER_TABLE
-------
1 | 1 | 0 | 0 | 5 |
2 | 1 | 1 | 1 | 10 |
3 | 2 | 0 | 0 | 3 |
4 | | | | 8 | 1 | 1
5 | | | | 9 | 2 | 2
I know... the database structure is super ugly but that is what a vendor provides us and we are trying to create a View to make it easier to perform some data analysis tasks on it.
It doesn't have to match my final example 1:1, but I hope my intention is clear: I want one single table/view with all the information in a single format.
Thanks for any help!

I would recommend a lateral join:
SELECT s.id, u.*
FROM some_table s CROSS JOIN LATERAL
(SELECT s.SITE1_X as SITE_X, s.SITE1_Y as SITE_Y, s.SITE1_VAL as SITE_VAL FROM DUAL UNION ALL
SELECT s.SITE2_X, s.SITE2_Y, s.SITE2_VAL FROM DUAL UNION ALL
SELECT s.SITE3_X, s.SITE3_Y, s.SITE3_VAL FROM DUAL
) u;
You can just join additional tables to this as you like.
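
The same "one block per site, stacked with UNION ALL" shape can be prototyped in pandas. This is only a sketch: the frame below copies two site groups from the question's example, and both the SITE column and the final filter (keep non-null sites, plus one row for extend_info = 'y') are assumptions about the desired result, not part of the answer above.

```python
import pandas as pd

some_table = pd.DataFrame({
    "ID": [1, 2, 3],
    "SITE1_X": [0, 0, None], "SITE1_Y": [0, 0, None], "SITE1_VAL": [5, 3, None],
    "SITE2_X": [1, None, None], "SITE2_Y": [1, None, None], "SITE2_VAL": [10, None, None],
    "extend_info": ["n", "n", "y"],
})

parts = []
for site in (1, 2):
    # Map this site's columns onto the common X / Y / VAL names.
    cols = {f"SITE{site}_X": "X", f"SITE{site}_Y": "Y", f"SITE{site}_VAL": "VAL"}
    part = some_table[["ID", "extend_info", *cols]].rename(columns=cols)
    part.insert(1, "SITE", site)
    parts.append(part)

# Like UNION ALL: every source row appears once per site, nulls included.
long_df = pd.concat(parts, ignore_index=True)

# Assumed filter: drop empty sites, but keep one row for each 'y' record.
keep = long_df["VAL"].notna() | (
    (long_df["extend_info"] == "y") & (long_df["SITE"] == 1)
)
result = long_df[keep].reset_index(drop=True)
print(result)
```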

Adding conditional statements to a SQL window function

I want to use a series of conditions to dictate how a window function I have works. Currently, what I have is this:
SELECT col1, col2,
1=Row_number() OVER (PARTITION BY col1 ORDER BY col2 ASC) OR
3=Row_number() OVER (PARTITION BY col1 ORDER BY col2 ASC)
AS col3
FROM myTable;
What it's essentially doing is taking two columns of input, grouping by the values in col1, ordering by values in col2, and then splitting the data for each partition into two halves, and flagging the first row of each half as a true/1.
So, taking this input:
+------+------+
| col1 | col2 |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
+------+------+
We get this result:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1 | 1 | 1 |
| 1 | 2 | 0 |
| 1 | 3 | 1 |
| 1 | 4 | 0 |
| 2 | 1 | 1 |
| 2 | 2 | 0 |
| 2 | 3 | 1 |
| 2 | 4 | 0 |
+------+------+------+
Now, obviously, this only works when there are exactly 4 rows of entries for each value in col1. How do I introduce conditional statements to make this work when there aren't exactly 4 rows?
The constraints I have are these:
a) there will always be an even number of rows (2,4,6..) when grouping by values in `col1`
b) there will be a minimum of 2 rows when grouping by values in `col1`
EDIT:
I think I need to clarify that I do not simply want alternating rows of 1's and 0's. For example, if I used this table instead...
+------+------+
| col1 | col2 |
+------+------+
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 1 | 5 |
| 1 | 6 |
| 1 | 7 |
| 1 | 8 |
| 2 | 1 |
| 2 | 2 |
| 2 | 3 |
| 2 | 4 |
| 2 | 5 |
| 2 | 6 |
| 2 | 7 |
| 2 | 8 |
+------+------+
...then I'd expect this result:
+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 1 | 1 | 1 |
| 1 | 2 | 0 |
| 1 | 3 | 0 |
| 1 | 4 | 0 |
| 1 | 5 | 1 |
| 1 | 6 | 0 |
| 1 | 7 | 0 |
| 1 | 8 | 0 |
| 2 | 1 | 1 |
| 2 | 2 | 0 |
| 2 | 3 | 0 |
| 2 | 4 | 0 |
| 2 | 5 | 1 |
| 2 | 6 | 0 |
| 2 | 7 | 0 |
| 2 | 8 | 0 |
+------+------+------+
In the original example I gave, we grouped by col1 and saw that there were 4 rows for each partition. We take half of that, which is 2, and flag every 2nd row (every other row) as true/1.
In this second example, once we group by col1, we see that there are 8 rows for each partition. Splitting that in half gives us 4, so every 4th row should be flagged with a true/1.

Use modulo arithmetic.
Many dialects of SQL use % for modulus:
SELECT col1, col2,
ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) % 2 as col3
FROM mytable;
Some use the function MOD():
SELECT col1, col2,
MOD(ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2), 2) as col3
FROM mytable;
EDIT:
You don't want to alternate rows; you want exactly two flagged rows per partition, one at the start of each half. For that, you can still use modulo arithmetic but with somewhat different logic:
SELECT col1, col2,
(ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) %
FLOOR(COUNT(*) OVER (PARTITION BY col1) / 2)
) as col3
FROM mytable;

I am just extending Gordon's answer, as it returns the remainder itself rather than a 0/1 flag and so will not give you the correct result:
SELECT col1, col2,
(CASE WHEN ROW_NUMBER() OVER (PARTITION BY col1 ORDER BY col2) %
FLOOR(COUNT(*) OVER (PARTITION BY col1) / 2) = 1 THEN 1 ELSE 0 END
) as col3
FROM mytable;
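
To check the "first row of each half" logic on small inputs, here is a pandas sketch of the same idea. It uses a zero-based row number and tests for remainder 0 instead of remainder 1. That is a deliberate change, not the answer's exact expression: with a two-row partition the half size is 1, and any number modulo 1 is 0, so comparing against 1 would flag nothing. The sample data is made up.

```python
import pandas as pd

# Partitions of 8, 4 and 2 rows (always even, minimum 2, as in the question).
df = pd.DataFrame({
    "col1": [1] * 8 + [2] * 4 + [3] * 2,
    "col2": [1, 2, 3, 4, 5, 6, 7, 8, 1, 2, 3, 4, 1, 2],
})

df = df.sort_values(["col1", "col2"])
grp = df.groupby("col1")["col2"]

rn = grp.cumcount()                # zero-based row number within each partition
half = grp.transform("size") // 2  # half the partition size, repeated per row

# Flag the first row of each half: positions 0 and half.
df["col3"] = (rn % half == 0).astype(int)
print(df)
```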

Make a new column based from CASE and GROUP BY result

I have a table called test that looks like this:
+------------------------+--------+
| id_laporan_rekomendasi | status |
+------------------------+--------+
| 1 | 2 |
| 1 | 2 |
| 1 | 2 |
| 1 | 3 |
| 2 | 2 |
| 2 | 2 |
| 2 | 2 |
| 2 | 3 |
| 3 | 2 |
| 3 | 3 |
| 4 | 2 |
| 5 | 2 |
| 5 | 3 |
| 6 | 2 |
+------------------------+--------+
I want to group by id_laporan_rekomendasi and add a new column that indicates whether the group contains status 3: the value should be 1 if any row in the group has status 3, and 0 otherwise.
I expect the result to look like this:
+------------------------+------+
| id_laporan_rekomendasi | test |
+------------------------+------+
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 0 |
| 5 | 1 |
| 6 | 0 |
+------------------------+------+
I have tried this query:
SELECT t1.id_laporan_rekomendasi,
COUNT(distinct case when t1.status = 3 then 1 else 0 end) as test
FROM test t1
GROUP BY t1.id_laporan_rekomendasi
But I got the result below:
+------------------------+------+
| id_laporan_rekomendasi | test |
+------------------------+------+
| 1 | 2 |
| 2 | 2 |
| 3 | 2 |
| 4 | 1 |
| 5 | 2 |
| 6 | 1 |
+------------------------+------+
Could anyone help me with this?

You are close. In MariaDB, you can simplify this to:
SELECT t1.id_laporan_rekomendasi,
MAX( t1.status = 3 ) as test
FROM test t1
GROUP BY t1.id_laporan_rekomendasi;
MariaDB (and MySQL) treat boolean expressions as numbers, with "1" for true and "0" for false. So this does what you want.
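
The original query returned 2 because COUNT(DISTINCT ...) counts how many different values the CASE expression produces in a group: groups containing both status 3 and another status yield the two values 1 and 0. The "maximum of a boolean" idea can be checked quickly in pandas; the sketch below rebuilds the question's table and is only an illustration, not a replacement for the SQL.

```python
import pandas as pd

test = pd.DataFrame({
    "id_laporan_rekomendasi": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, 6],
    "status":                 [2, 2, 2, 3, 2, 2, 2, 3, 2, 3, 2, 2, 3, 2],
})

# 1 where status is 3, else 0; the per-group maximum is 1 if any row matched.
has_three = test["status"].eq(3).astype(int)
result = (
    has_three.groupby(test["id_laporan_rekomendasi"]).max()
    .rename("test")
    .reset_index()
)
print(result)
```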

Advanced SQL SUM COLUMN FROM A SAME TABLE

I have this table...
+-----+--------+------+-----+-----+
|categ| nAME | quan |IDUNQ| ID|
+-----+--------+------+-----+-----+
| 1 | Z | 3 | 1 | 15 |
| 1 | A | 3 | 2 | 16 |
| 1 | B | 3 | 3 | 17 |
| 2 | Z | 2 | 4 | 15 |
| 2 | A | 2 | 5 | 16 |
| 3 | Z | 1 | 6 | 15 |
| 3 | B | 1 | 7 | 17 |
| 2 | Z | 1 | 8 | 15 |
| 2 | C | 4 | 8 | 15 |
| 1 | D | 1 | 8 | 15 |
+-----+--------+------+-----+-----+
I need to get the Z of category 1 + Z of category 2 - Z of category 3
For example, (3+3-1) = 5 ==> 3 of cat 1, 3 of cat 2, 1 of cat 3
The final result should be...
Z ==> 5
A ==> 5
B ==> 2
C ==> 4

Note: I'm assuming the data for "C" from your example was mistakenly omitted.
SELECT nAME, SUM(CASE categ WHEN 3 THEN 0-quan ELSE quan END) AS quan
FROM theTable
GROUP BY nAME
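
The signed-sum idea in this query (category 3 counts negatively, everything else positively) can be verified against the question's data with a short pandas sketch. The frame name the_table mirrors the placeholder in the query; the IDUNQ and ID columns are left out because the calculation does not use them.

```python
import pandas as pd

the_table = pd.DataFrame({
    "categ": [1, 1, 1, 2, 2, 3, 3, 2, 2, 1],
    "nAME":  ["Z", "A", "B", "Z", "A", "Z", "B", "Z", "C", "D"],
    "quan":  [3, 3, 3, 2, 2, 1, 1, 1, 4, 1],
})

# Keep quan as-is except for category 3, where it is negated.
signed = the_table["quan"].where(the_table["categ"] != 3, -the_table["quan"])

totals = signed.groupby(the_table["nAME"]).sum()
print(totals)
```

Note that this also produces a row for D, which the question's expected result does not list.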

SELECT name, SUM(quan) AS sum
FROM tableName
GROUP BY name, categ
This should work.
