Pandas: DataFrame pivot result doesn't drill down

I have the below DataFrame:
      Item_code    Type  year-month        Qty
0     TH-32H400M   O     Jan-22-Q     0.000000
1     TH-32H400M   MPO   Jan-22-Q     0.000000
2     TH-32H400M   ADJ   Jan-22-Q     0.000000
3     TH-32H400M   BP_O  Jan-22-Q     0.000000
4     TH-32H400M   LY_O  Jan-22-Q     0.000000
...          ...   ...        ...          ...
1795  TH-75JX660M  P     Jun-23-Q     0.000000
1796  TH-75JX660M  S     Jun-23-Q    11.538462
1797  TH-75JX660M  BP_S  Jun-23-Q     0.000000
1798  TH-75JX660M  LY_S  Jun-23-Q     0.000000
1799  TH-75JX660M  I     Jun-23-Q     0.769231
When I run the code below I get the desired result, but with a few issues:
new_df = new_df.pivot(index=['Item_code','year-month'], columns='Type', values='Qty')
+--------------+------------+----------+------+------+---+-------------+------+-----+-----+-----+-----+
| Item_code | year-month | ADJ | BP_O | BP_S | I | LY_O | LY_S | MPO | O | P | S |
+--------------+------------+----------+------+------+---+-------------+------+-----+-----+-----+-----+
| TH-32GS655M | Apr-22-Q | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Apr-23-Q | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 350 | 350 | 350 |
| | Aug-22-Q | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Dec-22-Q | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 350 | 0 | 0 |
| | Feb-22-Q | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Feb-23-Q | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 350 | 350 | 350 |
| | Jan-22-Q | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+--------------+------------+----------+------+------+---+-------------+------+-----+-----+-----+-----+
| TH-75HX750 | Jan-23-Q | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 350 | 350 | 350 |
| | Jul-22-Q | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Jun-22-Q | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | Jun-23-Q | 0 | 0 | 0 | 13| 0 | 0 | 0 | 0 | 0 | 1.9 |
+--------------+------------+----------+------+------+---+-------------+------+-----+-----+-----+-----+
Why is "Item code" only not repeated on every row
How to get column name on the same row,
Basically "Type" should not be there and "Item_code" & "year-month" should be first row witht he rest of column names
Thank you for the help.

Maybe this solution will work. The pivot creates a MultiIndex on the rows; reset_index() turns Item_code and year-month back into ordinary columns, and fillna(0) fills the missing combinations with zeros:
new_df = new_df.pivot(index=['Item_code','year-month'], columns='Type', values='Qty')
new_df = new_df.reset_index().fillna(0)
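If the leftover "Type" label sitting above the column headers is also unwanted, clearing the column axis name should remove it. This is a small addition on top of the answer above, not something the original answer included:
new_df.columns.name = None  # drop the "Type" label that the pivot leaves on the column axis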


dense_rank over boolean column

Good day. I have a table in a Redshift database. It is an event log that I have split into session starts (bool = 1) and session continuations (bool = 0), like this:
=======================
| ID | BOOL |
=======================
| 1 | 0 |
| 2 | 1 |
| 3 | 0 |
| 4 | 0 |
| 5 | 0 |
| 6 | 0 |
| 7 | 0 |
| 8 | 0 |
| 9 | 0 |
| 10 | 0 |
| 11 | 1 |
| 12 | 0 |
| 13 | 0 |
| 14 | 1 |
| 15 | 0 |
| 16 | 0 |
=======================
I need to create a session_id column with something like dense_rank:
================================
| ID | BOOL | D_RANK |
================================
| 1 | 0 | 1 |
| 2 | 1 | 2 |
| 3 | 0 | 2 |
| 4 | 0 | 2 |
| 5 | 0 | 2 |
| 6 | 0 | 2 |
| 7 | 0 | 2 |
| 8 | 0 | 2 |
| 9 | 0 | 2 |
| 10 | 0 | 2 |
| 11 | 1 | 3 |
| 12 | 0 | 3 |
| 13 | 0 | 3 |
| 14 | 1 | 4 |
| 15 | 0 | 4 |
| 16 | 0 | 4 |
================================
Is there any option to do this? Would appreciate any help.
Use a cumulative sum. Assuming that bool is the start of a new session:
select t.*,
       sum(bool) over (order by id) as session_id
from t;
Note: This will start at 0. You can add 1 if you need.
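Redshift usually also insists on an explicit frame clause when an aggregate window function has an ORDER BY, so a hedged variant that also starts the numbering at 1, using the table and column names from the question, would be:
select t.*,
       sum(bool) over (order by id rows unbounded preceding) + 1 as session_id
from t;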

Cumulative sum of multiple window functions V3

I have this table:
id | date | player_id | score | all_games | all_wins | n_games | n_wins
============================================================================================
6747 | 2018-08-10 | 1 | 0 | 1 | | 1 |
6751 | 2018-08-10 | 1 | 0 | 2 | 0 | 2 |
6764 | 2018-08-10 | 1 | 0 | 3 | 0 | 3 |
6783 | 2018-08-10 | 1 | 0 | 4 | 0 | 4 |
6804 | 2018-08-10 | 1 | 0 | 5 | 0 | 5 |
6821 | 2018-08-10 | 1 | 0 | 6 | 0 | 6 |
6828 | 2018-08-10 | 1 | 0 | 7 | 0 | 7 |
17334 | 2018-08-23 | 1 | 0 | 8 | 0 | 8 | 0
17363 | 2018-08-23 | 1 | 0 | 9 | 0 | 9 | 0
17398 | 2018-08-23 | 1 | 0 | 10 | 0 | 10 | 0
17403 | 2018-08-23 | 1 | 0 | 11 | 0 | 11 | 0
17409 | 2018-08-23 | 1 | 0 | 12 | 0 | 12 | 0
33656 | 2018-09-13 | 1 | 0 | 13 | 0 | 13 | 0
33687 | 2018-09-13 | 1 | 0 | 14 | 0 | 14 | 0
45393 | 2018-09-27 | 1 | 0 | 15 | 0 | 15 | 0
45402 | 2018-09-27 | 1 | 0 | 16 | 0 | 16 | 0
45422 | 2018-09-27 | 1 | 1 | 17 | 0 | 17 | 0
45453 | 2018-09-27 | 1 | 0 | 18 | 1 | 18 | 0
45461 | 2018-09-27 | 1 | 0 | 19 | 1 | 19 | 0
45474 | 2018-09-27 | 1 | 0 | 20 | 1 | 20 | 0
57155 | 2018-10-11 | 1 | 0 | 21 | 1 | 21 | 1
57215 | 2018-10-11 | 1 | 0 | 22 | 1 | 22 | 1
57225 | 2018-10-11 | 1 | 0 | 23 | 1 | 23 | 1
69868 | 2018-10-25 | 1 | 0 | 24 | 1 | 24 | 1
The issue that I now need to solve is that I need n_games to be a rolling count of the last number of games per day, i.e. a user can play multiple games per day; at present it is just the same as row_number() OVER all_games.
The other issue is that the n_wins column only sums the wins of the rolling window up to the previous day, so if a user wins a couple of games early in a day, those wins will not be added to the n_wins column until the next day.
I have an example DEMO, and I have tried this query:
SELECT id,
       date,
       player_id,
       score,
       row_number() OVER all_races AS all_games,
       sum(score) OVER all_races AS all_wins,
       row_number() OVER last_n AS n_games,
       sum(score) OVER last_n AS n_wins
FROM scores
WINDOW
    all_races AS (PARTITION BY player_id ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING),
    last_n AS (PARTITION BY player_id ORDER BY date ASC RANGE BETWEEN interval '7 days' PRECEDING AND interval '1 day' PRECEDING);
Ideally I need a query that will output something like this table
id | date | player_id | score | all_games | all_wins | n_games | n_wins
============================================================================================
6747 | 2018-08-10 | 1 | 0 | 1 | | 1 |
6751 | 2018-08-10 | 1 | 0 | 2 | 0 | 2 |
6764 | 2018-08-10 | 1 | 0 | 3 | 0 | 3 |
6783 | 2018-08-10 | 1 | 0 | 4 | 0 | 4 |
6804 | 2018-08-10 | 1 | 0 | 5 | 0 | 5 |
6821 | 2018-08-10 | 1 | 0 | 6 | 0 | 6 |
6828 | 2018-08-10 | 1 | 0 | 7 | 0 | 7 |
17334 | 2018-08-23 | 1 | 0 | 8 | 0 | 1 | 0
17363 | 2018-08-23 | 1 | 0 | 9 | 0 | 2 | 0
17398 | 2018-08-23 | 1 | 0 | 10 | 0 | 3 | 0
17403 | 2018-08-23 | 1 | 0 | 11 | 0 | 4 | 0
17409 | 2018-08-23 | 1 | 0 | 12 | 0 | 5 | 0
33656 | 2018-09-13 | 1 | 1 | 13 | 1 | 6 | 0
33687 | 2018-09-13 | 1 | 0 | 14 | 1 | 7 | 1
45393 | 2018-09-27 | 1 | 0 | 15 | 1 | 1 | 1
45402 | 2018-09-27 | 1 | 0 | 16 | 1 | 2 | 1
45422 | 2018-09-27 | 1 | 1 | 17 | 1 | 3 | 1
45453 | 2018-09-27 | 1 | 0 | 18 | 2 | 4 | 2
45461 | 2018-09-27 | 1 | 0 | 19 | 2 | 5 | 2
45474 | 2018-09-27 | 1 | 0 | 20 | 2 | 6 | 1
57155 | 2018-10-11 | 1 | 0 | 21 | 2 | 7 | 1
57215 | 2018-10-11 | 1 | 0 | 22 | 2 | 1 | 1
57225 | 2018-10-11 | 1 | 0 | 23 | 2 | 2 | 1
69868 | 2018-10-25 | 1 | 0 | 24 | 2 | 3 | 1

sql server "bridging" data

I have the following MS SQL Server table:
+-------------------------+---+---+---+---+---+
| date | A | B | C | D | E |
+-------------------------+---+---+---+---+---+
| 2017-02-02 00:00:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:01:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:02:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:03:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:04:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:05:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:06:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:07:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:08:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:09:00.000 | 1 | 0 | 0 | 1 | 0 |
+-------------------------+---+---+---+---+---+
I need to write a query that changes the 0s to 1s in column D if the state of D goes to zero for less than 5 minutes. In other words, I need to "bridge" the two consecutive 1s at the extremities of the 0s if the 0 state lasts less than ten minutes.
Is it possible to perform this operation using T-SQL (SQL SERVER 2014)?
Thank you.
Example 1:
+-------------------------+---+---+---+---+---+
| date | A | B | C | D | E |
+-------------------------+---+---+---+---+---+
| 2017-02-02 00:00:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:01:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:02:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:03:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:04:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:05:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:06:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:07:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:08:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:09:00.000 | 1 | 0 | 0 | 1 | 0 |
+-------------------------+---+---+---+---+---+
The query should return
+-------------------------+---+---+---+---+---+
| date | A | B | C | D | E |
+-------------------------+---+---+---+---+---+
| 2017-02-02 00:00:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:01:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:02:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:03:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:04:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:05:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:06:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:07:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:08:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:09:00.000 | 1 | 0 | 0 | 1 | 0 |
+-------------------------+---+---+---+---+---+
Example 2:
+-------------------------+---+---+---+---+---+
| date | A | B | C | D | E |
+-------------------------+---+---+---+---+---+
| 2017-02-02 00:00:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:01:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:02:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:03:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:04:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:05:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:06:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:07:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:08:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:09:00.000 | 1 | 0 | 0 | 1 | 0 |
+-------------------------+---+---+---+---+---+
The query should return
+-------------------------+---+---+---+---+---+
| date | A | B | C | D | E |
+-------------------------+---+---+---+---+---+
| 2017-02-02 00:00:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:01:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:02:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:03:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:04:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:05:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:06:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:07:00.000 | 1 | 0 | 0 | 0 | 0 |
| 2017-02-02 00:08:00.000 | 1 | 0 | 0 | 1 | 0 |
| 2017-02-02 00:09:00.000 | 1 | 0 | 0 | 1 | 0 |
+-------------------------+---+---+---+---+---+
UPDATE - You probably got the idea from the original, but I used the wrong aggregate function some of the time; I think I have it untangled now.
So... If a row's value is 0, but the time between the most recent preceding row with a 1 and the earliest subsequent row with a 1 is less than 10 minutes, you want to change that row's value to a 1. And in all other cases you leave the value as is. Right?
The time of the most recent row with a 1 can be expressed as max(case when D = 1 then date end) over (order by date rows unbounded preceding).
Likewise, the time of the earliest subsequent row with a 1 can be expressed as min(case when D = 1 then date end) over (order by date rows between current row and unbounded following).
Find the interval between them; if the dates are all aligned to an even minute, then you can simply use datediff:
datediff(minute,
         max(case when D = 1 then date end) over (order by date rows unbounded preceding),
         min(case when D = 1 then date end) over (order by date rows between current row and unbounded following))
Then apply case logic:
case when <the above expression> < 10 then 1 else D end
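Put together, a sketch of the full query might look like the one below. It is untested; the table name t is an assumption, the column names come from the sample data, and the 10-minute threshold is the one used above:
select [date], A, B, C,
       case when D = 0
             and datediff(minute,
                          max(case when D = 1 then [date] end)
                              over (order by [date] rows unbounded preceding),
                          min(case when D = 1 then [date] end)
                              over (order by [date] rows between current row and unbounded following)) < 10
            then 1
            else D
       end as D,
       E
from t;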

What is the equivalent of aggregate functions FIRST and LAST from MySQL in Firebird

Does anyone know the Firebird equivalent of the aggregate functions FIRST and LAST from MySQL? I have this inventory master table that looks like this:
DATE |ITEM_CODE | BEG | + | - | - | - | + | + | + | + | - | - | END
2015-10-27 | 000000000MS016 |12.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12.5
2015-10-27 | 000000000PN044 | 0 |10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10
2015-10-27 | 000000000VI064 | 440 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 445
2015-10-27 | 000000000VI029 | 274 | 0 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 269
2015-10-28 | 000000000MS016 |12.5 |20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 32.5
2015-10-28 | 000000000PN044 | 10 |50 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 60
2015-10-28 | 000000000VI064 | 445 | 0 | 0 |10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 435
2015-10-28 | 000000000VI029 | 269 | 0 | 0 | 0 |20 | 0 | 0 | 0 | 0 | 0 | 0 | 249
2015-10-29 | 000000000MS016 |32.5 | 0 |10 | 0 | 0 | 0 | 0 | 0 |30 | 0 | 5 | 47.5
2015-10-29 | 000000000PN044 | 60 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 65
2015-10-29 | 000000000VI064 | 435 | 0 | 0 | 0 | 0 |10 | 0 | 0 | 0 | 8 | 0 | 437
2015-10-29 | 000000000VI029 | 249 |35 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 4 | 280
2015-10-30 | 000000000MS016 |47.5 | 0 |15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 32.5
2015-10-30 | 000000000PN044 | 65 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 65
2015-10-30 | 000000000VI064 | 437 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 437
2015-10-30 | 000000000VI029 | 280 | 0 | 5 | 0 | 5 | 0 | 0 | 6 | 0 | 3 | 0 | 273
and I have this SELECT statement:
SELECT
    INV.ITEM_CODE,
    FIRST(INV.BEG_QTY) AS BEG_QTY,
    SUM(INV.REC_QTY) AS REC_QTY,
    SUM(INV.RET_QTY) AS RET_QTY,
    SUM(INV.SOLD_QTY) AS SOLD_QTY,
    SUM(INV.BO_QTY) AS BO_QTY,
    SUM(INV.ADJ_QTY) AS ADJ_QTY,
    SUM(INV.COUNT_P) AS COUNT_P,
    SUM(INV.COUNT_C) AS COUNT_C,
    SUM(INV.TRANS_IN) AS TRANS_IN,
    SUM(INV.TRANS_OUT) AS TRANS_OUT,
    SUM(INV.DELIVERY) AS DELIVERY,
    LAST(INV.END_QTY) AS END_QTY
FROM INV_MASTER INV
WHERE (INV.INV_DATE BETWEEN '2015-10-27' AND '2015-10-31')
GROUP BY INV.ITEM_CODE
ORDER BY INV.ITEM_CODE
and the result SHOULD look like this:
ITEM_CODE | BEG | + | - | - | - | + | + | + | + | - | - | END
000000000MS016 |12.5 |20 |25 | 0 | 0 | 0 | 0 | 0 |30 | 0 | 5 | 32.5
000000000PN044 | 0 |70 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 65
000000000VI064 | 440 | 5 | 0 |10 | 0 |10 | 0 | 0 | 0 | 8 | 0 | 437
000000000VI029 | 274 |35 |10 | 0 |25 | 0 | 0 | 6 | 0 | 3 | 4 | 273
but I'm having a problem with the FIRST and LAST aggregate functions; I'm using Firebird v2.5. How can I do this?
You should be able to replace the use of LAST with
(SELECT END_QTY FROM INV_MASTER
 WHERE ITEM_CODE = INV.ITEM_CODE
   AND INV_DATE = MAX(INV.INV_DATE)) AS END_QTY
This selects the END_QTY of the current item, with the highest date for that item.
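By the same logic, FIRST(INV.BEG_QTY) could presumably be replaced with the symmetric MIN-based subquery. This is a sketch along the same lines, not part of the original answer, and it assumes there is at most one row per item and date:
(SELECT BEG_QTY FROM INV_MASTER
 WHERE ITEM_CODE = INV.ITEM_CODE
   AND INV_DATE = MIN(INV.INV_DATE)) AS BEG_QTY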

SSRS Remove Column from Report

I am including column 13 as a dummy column here:
+----+---+---+---+----+---+---+---+---+---+----+----+----+----+
| | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
+----+---+---+---+----+---+---+---+---+---+----+----+----+----+
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 3 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 1 | 0 | 0 | 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 6 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 7 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
+----+---+---+---+----+---+---+---+---+---+----+----+----+----+
The reason I am including a dummy column is that, even IF columns 1 through 12 are all zero, I would still like to include an entry for that row.
As you can see, row 1 would not have been included otherwise.
This report is generated by SSRS.
I am wondering if there is a way to HIDE column 13.
Is there some kind of conditional formatting I can do?
To clarify, here's my query:
select tat.*, tat.tat as tat2 from tat
It is organized in the report this way:
This data set [TAT] contains dummy data specifically for column 13.
Specific columns in a column group can be hidden based on values with the following steps.
Right-click the header of the column group you want to hide, then Column Group -> Group Properties.
Click on the Visibility pane and select the "Show or hide based on an expression" radio button. Use an expression to determine when the column is hidden.
True hides the column, False displays it. You will need to update the field name in the example below to match your month field name.
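As an illustration only, with Month as a placeholder for whatever field actually drives the column group, the visibility expression could look something like:
=IIF(Fields!Month.Value = 13, True, False)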
Don't include column 13 in your select? If you are doing a select *, change it to select col1, col2, ..., col12.