Unique records with any increase per day - sql

I am new to PostgreSQL. I have a table of car ids (id) and their battery levels (battery) at every ten-minute interval for 2 weeks.
My goal is to produce the total count of unique car ids whose batteries experienced any gain per day. That is, a car counts for a day if at any time over the course of that day its battery level was higher than at the previous timestamp; in other words, where the value of battery minus the previous value of battery is positive. Records with NULL (NA) values in battery should be skipped.
I have started on the query, but I am unsure how to select only the unique ids whose battery levels rose. Any recommendations would be appreciated!
SELECT count(DISTINCT id),
       TO_CHAR(date_trunc('day', (time::timestamp) AT TIME ZONE 'EST'), 'YYYY-MM-DD') AS day
FROM test_db ....
GROUP BY day
ORDER BY day
Here is a sample of the data:
id | time | battery
54 | 2017-12-12 09:50:04.402775+00 | 100
54 | 2017-12-12 09:40:04.618926+00 | 100
54 | 2017-12-12 09:30:04.11399+00 | 100
54 | 2017-12-12 09:20:03.906716+00 | 100
54 | 2017-12-12 09:10:03.955133+00 | 100
54 | 2017-12-12 09:00:04.678508+00 | 100
54 | 2017-12-12 08:50:03.733471+00 | 100
54 | 2017-12-12 08:40:03.65688+00 | 100
54 | 2017-12-12 08:30:04.260608+00 | 100
54 | 2017-12-12 08:20:03.98387+00 | 100
54 | 2017-12-12 08:10:04.164129+00 | 98
54 | 2017-12-12 08:00:04.597976+00 | 98
54 | 2017-12-12 07:50:04.501231+00 | 98
54 | 2017-12-12 07:40:04.441531+00 | 98
54 | 2017-12-12 07:30:04.310876+00 | 98
54 | 2017-12-12 07:20:04.317241+00 | 98
54 | 2017-12-12 07:10:03.856432+00 | 67
54 | 2017-12-12 07:00:03.628862+00 | 67
54 | 2017-12-12 06:50:03.868495+00 | 67
54 | 2017-12-12 06:40:04.490324+00 | 67
54 | 2017-12-12 06:30:03.83739+00 | 67
54 | 2017-12-12 06:20:03.817014+00 | 67
54 | 2017-12-12 06:10:04.081174+00 | 29
54 | 2017-12-12 06:00:04.178765+00 | 29
column  | data_type
--------+--------------------------
id      | integer
time    | timestamp with time zone
battery | integer
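One way to approach this (a sketch only; the table and column names are taken from the question, and the timestamps are simplified) is to compute the difference from the previous reading with lag() partitioned by car and day, then count each car once per day where any difference is positive. The same window syntax works in Python's built-in sqlite3, which makes the idea easy to try out:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test_db (id INTEGER, time TEXT, battery INTEGER);
INSERT INTO test_db VALUES
  (54, '2017-12-12 06:00:04', 29),
  (54, '2017-12-12 06:10:04', 29),
  (54, '2017-12-12 06:20:03', 67),  -- car 54 gains on the 12th
  (99, '2017-12-12 07:00:00', 80),
  (99, '2017-12-12 07:10:00', 75),  -- car 99 only drops that day
  (99, '2017-12-13 08:00:00', 70),
  (99, '2017-12-13 08:10:00', 90);  -- ...but gains on the 13th
""")

# For each row, compare battery to the previous reading of the same car
# on the same day; count a car once per day if any such diff is positive.
rows = conn.execute("""
WITH diffs AS (
  SELECT id,
         date(time) AS day,
         battery - lag(battery) OVER (
           PARTITION BY id, date(time) ORDER BY time
         ) AS diff
  FROM test_db
  WHERE battery IS NOT NULL
)
SELECT day, count(DISTINCT id) AS cars_with_gain
FROM diffs
WHERE diff > 0
GROUP BY day
ORDER BY day
""").fetchall()
print(rows)   # [('2017-12-12', 1), ('2017-12-13', 1)]
```

In PostgreSQL you would keep your `date_trunc('day', time AT TIME ZONE 'EST')` expression in place of sqlite's `date(time)`.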

Pandas: keep first row of duplicated indices of second level of multi index

I found lots of drop_duplicates examples for dropping rows when both levels of the MultiIndex are duplicated, but I would like to keep the first row when only the second level of the MultiIndex has duplicates. So here:
| | col_0 | col_1 | col_2 | col_3 | col_4 |
|:-------------------------------|--------:|--------:|--------:|--------:|--------:|
| date | ID
| ('2022-01-01', 'identifier_0') | 26 | 46 | 44 | 21 | 10 |
| ('2022-01-01', 'identifier_1') | 25 | 45 | 83 | 23 | 45 |
| ('2022-01-01', 'identifier_2') | 42 | 79 | 55 | 5 | 78 |
| ('2022-01-01', 'identifier_3') | 32 | 4 | 57 | 19 | 61 |
| ('2022-01-01', 'identifier_4') | 30 | 25 | 5 | 93 | 72 |
| ('2022-01-02', 'identifier_0') | 42 | 14 | 56 | 43 | 42 |
| ('2022-01-02', 'identifier_1') | 90 | 27 | 46 | 58 | 5 |
| ('2022-01-02', 'identifier_2') | 33 | 39 | 53 | 94 | 86 |
| ('2022-01-02', 'identifier_3') | 32 | 65 | 98 | 81 | 64 |
| ('2022-01-02', 'identifier_4') | 48 | 31 | 25 | 58 | 15 |
| ('2022-01-03', 'identifier_0') | 5 | 80 | 33 | 96 | 80 |
| ('2022-01-03', 'identifier_1') | 15 | 86 | 45 | 39 | 62 |
| ('2022-01-03', 'identifier_2') | 98 | 3 | 42 | 50 | 83 |
I'd like to keep first rows with unique ID.
If your index is a MultiIndex:
>>> df.loc[~df.index.get_level_values('ID').duplicated()]
col_0 col_1 col_2 col_3 col_4
date ID
2022-01-01 identifier_0 26 46 44 21 10
identifier_1 25 45 83 23 45
identifier_2 42 79 55 5 78
identifier_3 32 4 57 19 61
identifier_4 30 25 5 93 72
# Or
>>> df.groupby(level='ID').first()
col_0 col_1 col_2 col_3 col_4
ID
identifier_0 26 46 44 21 10
identifier_1 25 45 83 23 45
identifier_2 42 79 55 5 78
identifier_3 32 4 57 19 61
identifier_4 30 25 5 93 72
If your index is an Index:
>>> df.loc[~df.index.str[1].duplicated()]
col_0 col_1 col_2 col_3 col_4
(2022-01-01, identifier_0) 26 46 44 21 10
(2022-01-01, identifier_1) 25 45 83 23 45
(2022-01-01, identifier_2) 42 79 55 5 78
(2022-01-01, identifier_3) 32 4 57 19 61
(2022-01-01, identifier_4) 30 25 5 93 72
>>> df.groupby(df.index.str[1]).first()
col_0 col_1 col_2 col_3 col_4
identifier_0 26 46 44 21 10
identifier_1 25 45 83 23 45
identifier_2 42 79 55 5 78
identifier_3 32 4 57 19 61
identifier_4 30 25 5 93 72
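The MultiIndex case above can be condensed into a self-contained sketch (values abbreviated from the question's frame) to show what duplicated() keeps:

```python
import pandas as pd

# Tiny frame with the same shape as the question: a (date, ID) MultiIndex
# where the second level repeats across dates.
df = pd.DataFrame(
    {"col_0": [26, 25, 42, 90]},
    index=pd.MultiIndex.from_tuples(
        [("2022-01-01", "identifier_0"), ("2022-01-01", "identifier_1"),
         ("2022-01-02", "identifier_0"), ("2022-01-02", "identifier_1")],
        names=["date", "ID"],
    ),
)

# duplicated() marks every repeat of an ID after its first appearance;
# negating it keeps only the first row per ID.
first = df.loc[~df.index.get_level_values("ID").duplicated()]
print(first["col_0"].tolist())   # [26, 25]
```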

How to create a SQL query for the below scenario

I am using Snowflake SQL, but I guess this can be solved in any SQL dialect. So I have data like this:
RA_MEMBER_ID YEAR QUARTER MONTH Monthly_TOTAL_PURCHASE CATEGORY
1000 2020 1 1 105 CAT10
1000 2020 1 1 57 CAT13
1000 2020 1 2 107 CAT10
1000 2020 1 2 59 CAT13
1000 2020 1 3 109 CAT11
1000 2020 1 3 61 CAT14
1000 2020 2 4 111 CAT11
1000 2020 2 4 63 CAT14
1000 2020 2 5 113 CAT12
1000 2020 2 5 65 CAT15
1000 2020 2 6 115 CAT12
1000 2020 2 6 67 CAT15
And I need data like this:
RA_MEMBER_ID YEAR QUARTER MONTH Monthly_TOTAL_PURCHASE CATEGORY Monthly_rank Quarterly_Total_purchase Quarter_category Quarter_rank Yearly_Total_purchase Yearly_category Yearly_rank
1000 2020 1 1 105 CAT10 1 105 CAT10 1 105 CAT10 1
1000 2020 1 1 57 CAT13 2 57 CAT13 2 57 CAT13 2
1000 2020 1 2 107 CAT10 1 212 CAT10 1 212 CAT10 1
1000 2020 1 2 59 CAT13 2 116 CAT13 2 116 CAT13 2
1000 2020 1 3 109 CAT11 1 212 CAT10 1 212 CAT10 1
1000 2020 1 3 61 CAT14 2 116 CAT13 2 116 CAT13 2
1000 2020 2 4 111 CAT11 1 111 CAT11 1 212 CAT10 1
1000 2020 2 4 63 CAT14 2 63 CAT14 2 124 CAT14 2
1000 2020 2 5 113 CAT12 1 113 CAT12 1 212 CAT10 1
1000 2020 2 5 65 CAT15 2 65 CAT15 2 124 CAT14 2
1000 2020 2 6 115 CAT12 1 228 CAT12 1 228 CAT12 1
1000 2020 2 6 67 CAT15 2 132 CAT15 2 132 CAT15 2
So basically, I have the top two categories by purchase amount for each of the first 6 months. I need the same quarterly, based on which month of the quarter it is. Say it is February: then the top 2 categories and amounts should be calculated over both January and February. For March, the quarter figures take all three months. April starts a new quarter, so it matches the monthly rank, and for May it is calculated over April and May. The yearly figures work the same way.
I have tried a lot of things but nothing seems to give me what I want.
The solution should be generic enough because there can be many other months and years.
I really need help in this.
Not sure if below is what you are after. I assume that everything is category based:
create or replace table test (
    ra_member_id int,
    year int,
    quarter int,
    month int,
    monthly_purchase int,
    category varchar
);
insert into test values
(1000, 2020, 1,1, 105, 'cat10'),
(1000, 2020, 1,1, 57, 'cat13'),
(1000, 2020, 1,2, 107, 'cat10'),
(1000, 2020, 1,2, 59, 'cat13'),
(1000, 2020, 1,3, 109, 'cat11'),
(1000, 2020, 1,3, 61, 'cat14'),
(1000, 2020, 2,4, 111, 'cat11'),
(1000, 2020, 2,4, 63, 'cat14'),
(1000, 2020, 2,5, 113, 'cat12'),
(1000, 2020, 2,5, 65, 'cat15'),
(1000, 2020, 2,6, 115, 'cat12'),
(1000, 2020, 2,6, 67, 'cat15');
WITH BASE AS (
    SELECT
        RA_MEMBER_ID,
        YEAR,
        QUARTER,
        MONTH,
        CATEGORY,
        MONTHLY_PURCHASE,
        LAG(MONTHLY_PURCHASE) OVER (PARTITION BY QUARTER, CATEGORY ORDER BY MONTH) AS QUARTERLY_PURCHASE_LAG,
        IFNULL(QUARTERLY_PURCHASE_LAG, 0) + MONTHLY_PURCHASE AS QUARTERLY_PURCHASE,
        LAG(MONTHLY_PURCHASE) OVER (PARTITION BY YEAR, CATEGORY ORDER BY MONTH) AS YEARLY_PURCHASE_LAG,
        IFNULL(YEARLY_PURCHASE_LAG, 0) + MONTHLY_PURCHASE AS YEARLY_PURCHASE
    FROM TEST
),
BASE_RANK AS (
    SELECT
        RA_MEMBER_ID,
        YEAR,
        QUARTER,
        MONTH,
        CATEGORY,
        MONTHLY_PURCHASE,
        RANK() OVER (PARTITION BY MONTH ORDER BY MONTHLY_PURCHASE DESC) AS MONTHLY_RANK,
        QUARTERLY_PURCHASE,
        RANK() OVER (PARTITION BY QUARTER ORDER BY QUARTERLY_PURCHASE DESC) AS QUARTERLY_RANK,
        YEARLY_PURCHASE,
        RANK() OVER (PARTITION BY YEAR ORDER BY YEARLY_PURCHASE DESC) AS YEARLY_RANK
    FROM BASE
),
MAIN AS (
    SELECT
        RA_MEMBER_ID,
        YEAR,
        QUARTER,
        MONTH,
        CATEGORY,
        MONTHLY_PURCHASE,
        MONTHLY_RANK,
        QUARTERLY_PURCHASE,
        QUARTERLY_RANK,
        YEARLY_PURCHASE,
        YEARLY_RANK
    FROM BASE_RANK
)
SELECT * FROM MAIN
ORDER BY YEAR, QUARTER, MONTH;
Result:
+--------------+------+---------+-------+----------+------------------+--------------+--------------------+----------------+-----------------+-------------+
| RA_MEMBER_ID | YEAR | QUARTER | MONTH | CATEGORY | MONTHLY_PURCHASE | MONTHLY_RANK | QUARTERLY_PURCHASE | QUARTERLY_RANK | YEARLY_PURCHASE | YEARLY_RANK |
|--------------+------+---------+-------+----------+------------------+--------------+--------------------+----------------+-----------------+-------------|
| 1000 | 2020 | 1 | 1 | cat10 | 105 | 1 | 105 | 4 | 105 | 9 |
| 1000 | 2020 | 1 | 1 | cat13 | 57 | 2 | 57 | 6 | 57 | 12 |
| 1000 | 2020 | 1 | 2 | cat10 | 107 | 1 | 212 | 1 | 212 | 3 |
| 1000 | 2020 | 1 | 2 | cat13 | 59 | 2 | 116 | 2 | 116 | 6 |
| 1000 | 2020 | 1 | 3 | cat11 | 109 | 1 | 109 | 3 | 109 | 8 |
| 1000 | 2020 | 1 | 3 | cat14 | 61 | 2 | 61 | 5 | 61 | 11 |
| 1000 | 2020 | 2 | 4 | cat11 | 111 | 1 | 111 | 4 | 220 | 2 |
| 1000 | 2020 | 2 | 4 | cat14 | 63 | 2 | 63 | 6 | 124 | 5 |
| 1000 | 2020 | 2 | 5 | cat12 | 113 | 1 | 113 | 3 | 113 | 7 |
| 1000 | 2020 | 2 | 5 | cat15 | 65 | 2 | 65 | 5 | 65 | 10 |
| 1000 | 2020 | 2 | 6 | cat12 | 115 | 1 | 228 | 1 | 228 | 1 |
| 1000 | 2020 | 2 | 6 | cat15 | 67 | 2 | 132 | 2 | 132 | 4 |
+--------------+------+---------+-------+----------+------------------+--------------+--------------------+----------------+-----------------+-------------+
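Note that the LAG-based running totals in this answer only reach back one row per category, which is why month 3 shows QUARTERLY_PURCHASE = 109 rather than a full quarter-to-date figure. If a true quarter-to-date (or year-to-date) total is wanted, a windowed SUM with a running frame accumulates all earlier months. Here is a runnable sketch using Python's built-in sqlite3 (the window syntax is the same in Snowflake); ranking the quarter-to-date categories as of each month, as in the desired output, would still need an extra step on top of this:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE test (ra_member_id INT, year INT, quarter INT, month INT,
                   monthly_purchase INT, category TEXT);
INSERT INTO test VALUES
  (1000, 2020, 1, 1, 105, 'cat10'), (1000, 2020, 1, 1, 57, 'cat13'),
  (1000, 2020, 1, 2, 107, 'cat10'), (1000, 2020, 1, 2, 59, 'cat13'),
  (1000, 2020, 1, 3, 109, 'cat11'), (1000, 2020, 1, 3, 61, 'cat14');
""")

# A running SUM per category accumulates every earlier month in the
# partition, unlike LAG, which only reaches back one row.
rows = conn.execute("""
SELECT month, category,
       SUM(monthly_purchase) OVER (
         PARTITION BY year, quarter, category
         ORDER BY month
         ROWS UNBOUNDED PRECEDING
       ) AS quarterly_running
FROM test
ORDER BY month, category
""").fetchall()
print(rows)
# [(1, 'cat10', 105), (1, 'cat13', 57), (2, 'cat10', 212),
#  (2, 'cat13', 116), (3, 'cat11', 109), (3, 'cat14', 61)]
```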

PIVOT TABLE WITH MULTITAB [closed]

Closed 8 years ago.
I am working with a set of data that looks something like the following.
NAME SALETYPE UNITS
AMBREEN SALE 98
AMBREEN REFUND 4
ASIF SALE 80
ASIF REFUND 12
ASIF SALE 56
FARHAN REFUND 15
FARHAN SALE 124
FARIHA SALE 45
FARIHA REFUND 21
JABEEN SALE 120
JABEEN REFUND 72
JABEEN SALE 85
JUNAID SALE 69
JUNAID REFUND 8
MUNEEB SALE 25
MUNEEB REFUND 45
MUNEEB SALE 12
MUSHTAQ SALE 15
MUSHTAQ REFUND 25
NASIRA SALE 87
NASIRA REFUND 23
SADAF SALE 70
SADAF REFUND 14
SADAF SALE 45
RAO SALE 100
RAO REFUND 2
SOHAIL REFUND 20
SOHAIL SALE 123
I need to get results similar to the following.
NAME SALE REFUND TOTAL
AMBREEN 98 4 102
ASIF 80 12 92
FARHAN 45 21 66
FARIHA 205 72 277
JABEEN 69 8 77
JUNAID 37 45 82
MUNEEB 25 15 40
MUSHTAQ 87 23 110
NASIRA 115 14 129
SADAF 100 2 102
RAO 0 20 20
SOHAIL 123 20 143
This (conditional aggregation) should work with all major RDBMSes
SELECT name, sale, refund, sale + refund total
FROM
(
SELECT name,
SUM(CASE WHEN saletype = 'SALE' THEN units ELSE 0 END) sale,
SUM(CASE WHEN saletype = 'REFUND' THEN units ELSE 0 END) refund
FROM table1
GROUP BY name
) q
ORDER BY name
Output:
| NAME | SALE | REFUND | TOTAL |
|---------|------|--------|-------|
| AMBREEN | 98 | 4 | 102 |
| ASIF | 136 | 12 | 148 |
| FARHAN | 124 | 15 | 139 |
| FARIHA | 45 | 21 | 66 |
| JABEEN | 205 | 72 | 277 |
| JUNAID | 69 | 8 | 77 |
| MUNEEB | 37 | 45 | 82 |
| MUSHTAQ | 15 | 25 | 40 |
| NASIRA | 87 | 23 | 110 |
| RAO | 100 | 2 | 102 |
| SADAF | 115 | 14 | 129 |
| SOHAIL | 123 | 20 | 143 |
Here is an SQLFiddle demo.
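The pattern can also be sanity-checked locally with a few of the sample rows; here is a minimal check using Python's built-in sqlite3 (first two names only):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (name TEXT, saletype TEXT, units INT);
INSERT INTO table1 VALUES
  ('AMBREEN', 'SALE', 98), ('AMBREEN', 'REFUND', 4),
  ('ASIF', 'SALE', 80), ('ASIF', 'REFUND', 12), ('ASIF', 'SALE', 56);
""")

# Each CASE expression turns one saletype into its own column; SUM then
# collapses the rows per name, so repeated SALE rows are added together.
rows = conn.execute("""
SELECT name,
       SUM(CASE WHEN saletype = 'SALE'   THEN units ELSE 0 END) AS sale,
       SUM(CASE WHEN saletype = 'REFUND' THEN units ELSE 0 END) AS refund,
       SUM(units) AS total
FROM table1
GROUP BY name
ORDER BY name
""").fetchall()
print(rows)   # [('AMBREEN', 98, 4, 102), ('ASIF', 136, 12, 148)]
```

This variant computes the total as SUM(units) in the same pass, which is equivalent to sale + refund here since every row is one or the other.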

How can I obtain sum of each row in Postgresql?

I have some rows of results like below. If the sum of each row (row(i)) is the same, I can assume the results are correct. How can I write a SQL clause to calculate the sum of each row? Thanks.
27 | 29 | 27 | 36 | 33 | 29 | 16 | 17 | 35 | 28 | 34 | 15
27 | 29 | 27 | 29 | 33 | 29 | 16 | 17 | 35 | 28 | 34 | 15
27 | 29 | 27 | 14 | 33 | 29 | 16 | 17 | 35 | 28 | 34 | 15
27 | 29 | 16 | 37 | 33 | 29 | 16 | 17 | 35 | 28 | 34 | 15
27 | 29 | 16 | 36 | 33 | 29 | 16 | 17 | 35 | 28 | 34 | 15
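There is no answer in the thread, but the usual approach is simply to add the columns in the select list, wrapping each in COALESCE if NULLs are possible. A sketch using Python's built-in sqlite3 — the column names c1..c4 are invented here, since the question does not give them, and the real table would list all 12:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names c1..c4 are assumed; the question does not name them.
conn.executescript("""
CREATE TABLE results (c1 INT, c2 INT, c3 INT, c4 INT);
INSERT INTO results VALUES (27, 29, 27, 36), (27, 29, 27, 29);
""")

# Simply add the columns; COALESCE keeps a NULL from making the whole
# sum NULL.
rows = conn.execute("""
SELECT c1, c2, c3, c4,
       COALESCE(c1,0) + COALESCE(c2,0) + COALESCE(c3,0) + COALESCE(c4,0)
         AS row_sum
FROM results
""").fetchall()
print(rows)   # [(27, 29, 27, 36, 119), (27, 29, 27, 29, 112)]
```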

SQL Server partitioning when null

I have a SQL Server table like this:
Value RowID Diff
153 48 1
68 49 1
50 57 NULL
75 58 1
65 59 1
70 63 NULL
66 64 1
79 66 NULL
73 67 1
82 68 1
85 69 1
66 70 1
118 88 NULL
69 89 1
67 90 1
178 91 1
How can I make it like this (note that the partition number increments after each NULL in the third column):
Value RowID Diff
153 48 1
68 49 1
50 57 NULL
75 58 2
65 59 2
70 63 NULL
66 64 3
79 66 NULL
73 67 4
82 68 4
85 69 4
66 70 4
118 88 NULL
69 89 5
67 90 5
178 91 5
It looks like you are partitioning over sequential values of RowID. There is a trick to do this directly by grouping on RowID - Row_Number():
select
value,
rowID,
Diff,
RowID - row_number() over (order by RowID) Diff2
from
Table1
Notice how this gets you similar groupings, except with distinct Diff values (in Diff2):
| VALUE | ROWID | DIFF | DIFF2 |
|-------|-------|--------|-------|
| 153 | 48 | 1 | 47 |
| 68 | 49 | 1 | 47 |
| 50 | 57 | (null) | 54 |
| 75 | 58 | 1 | 54 |
| 65 | 59 | 1 | 54 |
| 70 | 63 | (null) | 57 |
| 66 | 64 | 1 | 57 |
| 79 | 66 | (null) | 58 |
| 73 | 67 | 1 | 58 |
| 82 | 68 | 1 | 58 |
| 85 | 69 | 1 | 58 |
| 66 | 70 | 1 | 58 |
| 118 | 88 | (null) | 75 |
| 69 | 89 | 1 | 75 |
| 67 | 90 | 1 | 75 |
| 178 | 91 | 1 | 75 |
Then to get ordered values for Diff, you can use Dense_Rank() to produce a numbering over each separate partition - except when a value is Null:
select
value,
rowID,
case when Diff = 1
then dense_rank() over (order by Diff2)
else Diff end as Diff
from (
select
value,
rowID,
Diff,
RowID - row_number() over (order by RowID) Diff2
from
Table1
) T
The result is the expected result, except keyed off of RowID directly rather than off of the existing Diff column.
| VALUE | ROWID | DIFF |
|-------|-------|--------|
| 153 | 48 | 1 |
| 68 | 49 | 1 |
| 50 | 57 | (null) |
| 75 | 58 | 2 |
| 65 | 59 | 2 |
| 70 | 63 | (null) |
| 66 | 64 | 3 |
| 79 | 66 | (null) |
| 73 | 67 | 4 |
| 82 | 68 | 4 |
| 85 | 69 | 4 |
| 66 | 70 | 4 |
| 118 | 88 | (null) |
| 69 | 89 | 5 |
| 67 | 90 | 5 |
| 178 | 91 | 5 |
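The final query runs unchanged on any engine with window functions. Here is a runnable check of the trick using Python's built-in sqlite3, on the first seven rows of the sample:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (Value INT, RowID INT, Diff INT);
INSERT INTO Table1 VALUES
  (153, 48, 1), (68, 49, 1),
  (50, 57, NULL), (75, 58, 1), (65, 59, 1),
  (70, 63, NULL), (66, 64, 1);
""")

# RowID - row_number() is constant within each run of consecutive RowIDs,
# so dense_rank() over that constant numbers the runs 1, 2, 3, ...
rows = conn.execute("""
SELECT Value, RowID,
       CASE WHEN Diff = 1
            THEN dense_rank() OVER (ORDER BY Diff2)
            ELSE Diff END AS Diff
FROM (
  SELECT Value, RowID, Diff,
         RowID - row_number() OVER (ORDER BY RowID) AS Diff2
  FROM Table1
) t
ORDER BY RowID
""").fetchall()
print(rows)
# [(153, 48, 1), (68, 49, 1), (50, 57, None), (75, 58, 2),
#  (65, 59, 2), (70, 63, None), (66, 64, 3)]
```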