Fill gap days with two tables in SQL

I have three different IDs; the IDs are dynamic.
For each ID, I need to complete a calendar with the last existing value.
Example:
ID VALUE date
1  30    1/1/2020
1  29    3/1/2020
2  65    1/1/2020
3  30    2/1/2020
1  11    6/1/2020
2  40    4/1/2020
3  23    5/1/2020
OUTPUT EXPECTED
ID VALUE date
1  30    1/1/2020
1  30    2/1/2020
1  29    3/1/2020
1  29    4/1/2020
1  29    5/1/2020
1  11    6/1/2020
2  65    1/1/2020
2  65    2/1/2020
2  65    3/1/2020
2  40    4/1/2020
2  40    5/1/2020
2  40    6/1/2020
3  30    2/1/2020
3  30    3/1/2020
3  30    4/1/2020
3  23    5/1/2020
3  23    6/1/2020
Complete the fields until today, for each ID.
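The question asks for SQL, where the usual shape is a calendar (spine) table joined per ID plus a "last known value" lookup. As an illustration of that logic only (a full calendar per ID, forward-filled with the last existing value), here is a minimal pandas sketch, assuming day-first dates and the sample rows above:
import pandas as pd

# Sample rows from the question; dates are day-first (1/1/2020 .. 6/1/2020).
df = pd.DataFrame({
    'ID':    [1, 1, 2, 3, 1, 2, 3],
    'VALUE': [30, 29, 65, 30, 11, 40, 23],
    'date':  ['1/1/2020', '3/1/2020', '1/1/2020', '2/1/2020',
              '6/1/2020', '4/1/2020', '5/1/2020'],
})
df['date'] = pd.to_datetime(df['date'], dayfirst=True)

# One row per (ID, day) over the full range; forward-fill the last known
# VALUE within each ID.  Swap df['date'].max() for
# pd.Timestamp.today().normalize() to extend the calendar until today.
days = pd.date_range(df['date'].min(), df['date'].max(), freq='D')
out = (df.set_index('date')
         .groupby('ID')['VALUE']
         .apply(lambda s: s.reindex(days).ffill())
         .dropna()          # IDs that start late have no value to fill before their first row
         .reset_index())
out.columns = ['ID', 'date', 'VALUE']
This reproduces the expected output above (VALUE comes back as float after the reindex).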

Related

Pandas drop_duplicates with multiple conditions

I have some measurement data that needs to be filtered. I read it into a dataframe, like this:
df
RequestTime RequestID ResponseTime ResponseID
0 150 14 103 101
1 150 15 110 102
2 25 16 121 103
3 25 16 97 104
4 22 16 44 105
5 19 17 44 106
6 26 18 29 106
7 30 18 29 106
and I need to apply two different conditions at the same time, that is, to filter on 'RequestTime'/'RequestID' and on 'ResponseTime'/'ResponseID' with drop_duplicates(subset=) simultaneously. I have used the following commands to get the filtered results for each of the two conditions separately:
>>>df[['RequestTime','RequestID','ResponseTime','ResponseID']].drop_duplicates(subset = ['ResponseTime','ResponseID'])
RequestTime RequestID ResponseTime ResponseID
0 150 14 103 101
1 150 15 110 102
2 25 16 121 103
4 22 16 44 105
5 19 17 44 106
6 26 18 29 106
7 30 18 29 106
>>>df[['RequestTime','RequestID','ResponseTime','ResponseID']].drop_duplicates(subset = ['RequestTime','RequestID'])
RequestTime RequestID ResponseTime ResponseID
0 150 14 103 101
1 150 15 110 102
2 25 16 121 103
3 25 16 97 104
4 22 16 44 105
5 19 17 44 106
6 26 18 29 106
but how do I combine the two conditions to drop the duplicate rows 3 and 7?
IIUC,
m = ~(df.duplicated(subset=['RequestTime','RequestID']) | df.duplicated(subset=['ResponseTime', 'ResponseID']))
df[m]
Output:
RequestTime RequestID ResponseTime ResponseID
0 150 14 103 101
1 150 15 110 102
2 25 16 121 103
4 22 16 44 105
5 19 17 44 106
6 26 18 29 106
Create a mask (boolean series) to boolean-index your dataframe.
Or chain the methods:
df.drop_duplicates(subset=['RequestTime', 'RequestID']).drop_duplicates(subset=['ResponseTime', 'ResponseID'])
Note that chaining deduplicates in two passes, so it can differ from the single mask if the first pass removes a row that was the first occurrence for the second subset; on this data both give the same result.
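For completeness, a self-contained version of the mask approach (the frame re-typed from the question):
import pandas as pd

df = pd.DataFrame({
    'RequestTime':  [150, 150, 25, 25, 22, 19, 26, 30],
    'RequestID':    [14, 15, 16, 16, 16, 17, 18, 18],
    'ResponseTime': [103, 110, 121, 97, 44, 44, 29, 29],
    'ResponseID':   [101, 102, 103, 104, 105, 106, 106, 106],
})

# Keep a row only if it is not a duplicate under either key pair.
m = ~(df.duplicated(subset=['RequestTime', 'RequestID'])
      | df.duplicated(subset=['ResponseTime', 'ResponseID']))
print(df[m])  # rows 0, 1, 2, 4, 5, 6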

Re-arrange and plot data in pandas

I have a data frame like the following:
days movements count
0 0 0 2777
1 0 1 51
2 0 2 2
3 1 0 6279
4 1 1 200
5 1 2 7
6 1 3 3
7 2 0 5609
8 2 1 110
9 2 2 32
10 2 3 4
11 3 0 4109
12 3 1 118
13 3 2 101
14 3 3 8
15 3 4 3
16 3 6 1
17 4 0 3034
18 4 1 129
19 4 2 109
20 4 3 6
21 4 4 2
22 4 5 2
23 5 0 2288
24 5 1 131
25 5 2 131
26 5 3 9
27 5 4 2
28 5 5 1
29 6 0 1918
30 6 1 139
31 6 2 109
32 6 3 13
33 6 4 1
34 6 5 1
35 7 0 1442
36 7 1 109
37 7 2 153
38 7 3 13
39 7 4 10
40 7 5 1
41 8 0 1085
42 8 1 76
43 8 2 111
44 8 3 13
45 8 4 7
46 8 7 1
47 9 0 845
48 9 1 81
49 9 2 86
50 9 3 8
51 9 4 8
52 10 0 646
53 10 1 70
54 10 2 83
55 10 3 1
56 10 4 2
57 10 5 1
58 10 6 1
This shows that, for example, on day 0 I have 2777 entries with 0 movements, 51 entries with 1 movement, and 2 entries with 2 movements. I want to plot it as a bar graph for every day and show the entry counts for all movements. To do that, I thought I would transform the data into something like the table below and then plot a bar graph.
days 0 1 2 3 4 5 6 7
0 2777 51 2
1 6279 200 7 3
2 5609 110 32 4
3 4109 118 101 8 3
4 3034 129 109 6 2 2
5 2288 131 131 9 2 1
6 1918 139 109 13 1 1
7 1442 109 153 13 10 1
8 1085 76 111 13 7 1
9 845 81 86 8 8
10 646 70 83 1 2 1 1
I can't figure out how to achieve this. I have thousands of lines of data, so doing it by hand does not make sense. Can someone show me how to rearrange the data? If there is a quick way to plot the bar graph with matplotlib straight from the actual dataframe, that would be even better. Thanks for the help.
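The reshape described above is exactly a pivot: one row per day, one column per movement value, counts as cells. A minimal sketch, assuming df holds the frame above (use pivot_table with an aggfunc instead if a (days, movements) pair can repeat):
import matplotlib.pyplot as plt

# One row per day, one column per movement value; this is the
# intermediate table sketched in the question (missing cells become NaN).
wide = df.pivot(index='days', columns='movements', values='count')

# Grouped bar chart: one cluster of bars per day, one bar per movement value.
wide.plot.bar(figsize=(12, 6))
plt.xlabel('days')
plt.ylabel('entries')
plt.show()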

Count records across columns in SQL

I have a SQL table with the values below:
dozen1 dozen2 dozen3 dozen4 dozen5 dozen6
----------------------------------------------
10 27 40 46 49 58
2 11 34 37 32 50
3 4 29 36 45 55
14 32 33 36 44 52
20 11 36 38 47 53
1 5 11 16 20 55
2 18 31 42 51 52
5 11 22 24 51 53
1 3 11 17 34 45
I need to count how many times each number appears. For example:
Number 10 appears 1 time
Number 2 appears 2 times
Result:
Dozen Times
--------------
10 1
2 2
....
How do I do this in a SQL query?
If you want to count the number of occurrences of each distinct number across all columns (PostgreSQL syntax):
select dozen, count(*) as times
from my_table
cross join unnest(array[dozen1, dozen2, dozen3, dozen4, dozen5, dozen6]) u(dozen)
group by 1
order by 1;
dozen | times
-------+-------
1 | 2
2 | 2
3 | 2
4 | 1
5 | 2
etc...
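The question is about SQL, but for comparison the same count can be sketched in pandas, assuming the table has been loaded into a dataframe df with the six dozen columns (illustration only):
import pandas as pd

# Assumes df holds the table above with columns dozen1..dozen6.
cols = ['dozen1', 'dozen2', 'dozen3', 'dozen4', 'dozen5', 'dozen6']
times = (df[cols].stack()          # one long series of all numbers
                 .value_counts()   # occurrences per number
                 .rename_axis('dozen')
                 .reset_index(name='times')
                 .sort_values('dozen'))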

Pandas element-wise conditional: return index

I want to find abnormal values and replace them with the value for the corresponding day of the next week.
year week day v1 v2
2001 1 1 46 9999
2001 1 2 60 9335
2001 1 3 9999 9318
2001 1 4 47 9999
2001 1 5 57 9373
2001 1 6 9999 9384
2001 1 7 72 9444
2001 2 1 75 73
2001 2 2 74 63
2001 2 3 79 377
2001 2 4 70 361
2001 2 5 75 73
2001 2 6 77 64
2001 2 7 76 57
I could do it column by column; code as follows:
index_row = df[df['v1'] == 9999].index
for i in index_row:
    df['v1'][i] = df['v1'][i + 7]  # i+7 is the index of the same day next week
How can I do this element-wise on the whole dataframe, for example with DataFrame.applymap? How do I get the column name and row number based on conditionally sieving the values?
The target df I want is as follows (* indicates a modified value, taken from the next week):
year week day v1 v2
2001 1 1 46 *73
2001 1 2 60 9335
2001 1 3 *79 9318
2001 1 4 47 *361
2001 1 5 57 9373
2001 1 6 *77 9384
2001 1 7 72 9444
2001 2 1 75 *73
2001 2 2 74 63
2001 2 3 *79 377
2001 2 4 70 *361
2001 2 5 75 73
2001 2 6 *77 64
2001 2 7 76 57
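The answers below assume df already exists; a minimal construction, re-typed from the sample above:
import pandas as pd

df = pd.DataFrame({
    'year': [2001] * 14,
    'week': [1] * 7 + [2] * 7,
    'day':  list(range(1, 8)) * 2,
    'v1':   [46, 60, 9999, 47, 57, 9999, 72, 75, 74, 79, 70, 75, 77, 76],
    'v2':   [9999, 9335, 9318, 9999, 9373, 9384, 9444,
             73, 63, 377, 361, 73, 64, 57],
})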
Create d1 with set_index on the columns ['year', 'week', 'day'].
Create d2 with the same index as d1, except subtract 1 from week.
Mask d1 where it equals 9999, passing d2 as the other (replacement) values:
cols = ['year', 'week', 'day']
d1 = df.set_index(cols)
d2 = df.assign(week=df.week - 1).set_index(cols)
d1.mask(d1.eq(9999), d2).reset_index()
year week day v1 v2
0 2001 1 1 46 73
1 2001 1 2 60 9335
2 2001 1 3 79 9318
3 2001 1 4 47 361
4 2001 1 5 57 9373
5 2001 1 6 77 9384
6 2001 1 7 72 9444
7 2001 2 1 75 73
8 2001 2 2 74 63
9 2001 2 3 79 377
10 2001 2 4 70 361
11 2001 2 5 75 73
12 2001 2 6 77 64
13 2001 2 7 76 57
Old answer
One approach is to set up d1 with an index of ['year', 'week', 'day'] and manipulate it to shift by one week. Then mask where equal to 9999 and fillna:
d1 = df.set_index(['year', 'week', 'day'])
s1 = d1.unstack(['year', 'day']).shift(-1).stack(['year', 'day']).swaplevel(0, 1)
d1.mask(d1==9999).fillna(s1).reset_index()
year week day v1 v2
0 2001 1 1 46.0 73.0
1 2001 1 2 60.0 9335.0
2 2001 1 3 79.0 9318.0
3 2001 1 4 47.0 361.0
4 2001 1 5 57.0 9373.0
5 2001 1 6 77.0 9384.0
6 2001 1 7 72.0 9444.0
7 2001 2 1 75.0 73.0
8 2001 2 2 74.0 63.0
9 2001 2 3 79.0 377.0
10 2001 2 4 70.0 361.0
11 2001 2 5 75.0 73.0
12 2001 2 6 77.0 64.0
13 2001 2 7 76.0 57.0
You can work with a DatetimeIndex and set values by mask from the shifted rows:
a = (df['year'].astype(str).add('-').add(df['week'].astype(str))
     .add('-').add(df['day'].sub(1).astype(str)))
#http://strftime.org/
df.index = pd.to_datetime(a, format='%Y-%U-%w')
df2 = df.shift(-1,freq='7D')
df = df.mask(df.eq(9999), df2).reset_index(drop=True)
print (df)
year week day v1 v2
0 2001 1 1 46 73
1 2001 1 2 60 9335
2 2001 1 3 79 9318
3 2001 1 4 47 361
4 2001 1 5 57 9373
5 2001 1 6 77 9384
6 2001 1 7 72 9444
7 2001 2 1 75 73
8 2001 2 2 74 63
9 2001 2 3 79 377
10 2001 2 4 70 361
11 2001 2 5 75 73
12 2001 2 6 77 64
13 2001 2 7 76 57

How can I give a numeric order to each line based on a unique ID

I'm working on a big data set. I need to assign and print a numeric order for each unique ID ($1), and I want to delete the lines beyond numeric order 335 for each unique ID.
The data looks like
101 24
101 13
101 15
102 25
102 21
102 23
103 20
103 12
103 18
The desired output looks like this:
101 24 1
101 13 2
101 15 3
102 25 1
102 21 2
102 23 3
103 20 1
103 12 2
103 18 3
Try the one below.
Input:
$ cat f
101 24
101 13
101 15
102 25
102 21
102 23
103 20
103 12
103 18
Output
$ awk '{print $0,++a[$1]}' f
101 24 1
101 13 2
101 15 3
102 25 1
102 21 2
102 23 3
103 20 1
103 12 2
103 18 3
If the data is sorted on column 1, use the one below instead; it is faster:
$ awk '$1!=p{n=0}{print $0,++n; p=$1}' f
101 24 1
101 13 2
101 15 3
102 25 1
102 21 2
102 23 3
103 20 1
103 12 2
103 18 3
To keep only the first 335 lines per unique ID (dropping everything beyond order 335):
$ awk '$1!=p{n=0; p=$1}++n<=335{print $0,n}' f
$ awk '++a[$1]<=335{print $0,a[$1]}' f
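For comparison, the same numbering and truncation can be sketched in pandas (illustration only; the question is about awk, and the file name f comes from the sample above):
import pandas as pd

# Two whitespace-separated columns, no header, as in the sample file f.
df = pd.read_csv('f', sep=r'\s+', header=None, names=['id', 'val'])

# Running order within each ID, starting at 1 (the awk ++a[$1] counter).
df['order'] = df.groupby('id').cumcount() + 1

# Keep only the first 335 lines per ID.
df = df[df['order'] <= 335]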