Date Formatting in SQL for weeks

SELECT
    [fa] AS 'CouponName',
    [fb] AS 'Store',
    [fc] AS 'DateTime',
    [fd] AS 'PLU',
    [fe] AS 'QTY'
FROM [database].[dbo].[table]
WHERE [fd] = '00milecard' AND [fc] >= DATEADD(dd, -70, GETDATE())
ORDER BY [fb]
This produces:
CouponName         Store  DateTime                 PLU         QTY
CPN: MILE CARD $5  747    2020-01-10 14:57:26.060  00MILECARD  1.0000
CPN: MILE CARD $5  747    2020-01-10 19:21:12.763  00MILECARD  1.0000
CPN: MILE CARD $5  747    2020-01-11 18:19:01.093  00MILECARD  1.0000
CPN: MILE CARD $5  747    2020-01-12 17:11:29.610  00MILECARD  1.0000
CPN: MILE CARD $5  747    2020-01-12 15:33:31.747  00MILECARD  1.0000
CPN: MILE CARD $5  747    2020-01-13 13:11:58.243  00MILECARD  1.0000
CPN: MILE CARD $5  747    2020-01-08 16:45:41.070  00MILECARD  1.0000
CPN: MILE CARD $5  747    2020-01-03 18:11:12.050  00MILECARD  1.0000
CPN: MILE CARD $5  748    2020-01-11 15:12:13.370  00MILECARD  1.0000
CPN: MILE CARD $5  748    2020-01-10 11:59:28.517  00MILECARD  1.0000
CPN: MILE CARD $5  748    2019-12-26 08:17:40.420  00MILECARD  1.0000
CPN: MILE CARD $5  748    2019-12-26 15:39:31.900  00MILECARD  1.0000
CPN: MILE CARD $5  748    2019-12-27 14:59:12.890  00MILECARD  1.0000
CPN: MILE CARD $5  750    2020-01-04 19:08:45.337  00MILECARD  1.0000
CPN: MILE CARD $5  750    2020-01-08 06:23:59.963  00MILECARD  1.0000
I need this to sum the QTY in one-week spans, per store number, over a period of 10 weeks (70 days).
Our week runs Monday through Sunday.
I think a "DATEDIFF" will do this, but I do not have any experience with this function.

I think something like this will do what you want:
SELECT MIN([fc]) AS WeekStart, [fb] AS Store, SUM([fe]) AS QTY
FROM [database].[dbo].[table]
WHERE [fd] = '00milecard' AND
      DATEDIFF(week, [fc], GETDATE()) <= 10
GROUP BY DATEDIFF(week, [fc], GETDATE()), [fb]
ORDER BY [fb]
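One caveat, since the requirement is Monday through Sunday: SQL Server's DATEDIFF(week, ...) always counts week boundaries at Sunday, regardless of the DATEFIRST setting, so the grouping above produces Sunday-to-Saturday buckets. Below is a minimal sketch of Monday-based buckets, assuming SQL Server and the column names from the question; it relies on 1900-01-01 having been a Monday, so integer-dividing the day count since that date by 7 yields Monday-to-Sunday weeks:
-- Monday-to-Sunday weekly buckets over the last 70 days (sketch)
SELECT
    DATEADD(day, (DATEDIFF(day, '19000101', [fc]) / 7) * 7, '19000101') AS WeekStart,
    [fb]      AS Store,
    SUM([fe]) AS QTY
FROM [database].[dbo].[table]
WHERE [fd] = '00milecard'
  AND [fc] >= DATEADD(day, -70, GETDATE())
GROUP BY DATEADD(day, (DATEDIFF(day, '19000101', [fc]) / 7) * 7, '19000101'), [fb]
ORDER BY [fb], WeekStart;
Each output row then carries the Monday that starts its week, which reads more naturally than a relative week number.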

How to use LAG function and get the numbers in percentages in SQL

I have a table with the columns below:
Employee(linked_lylty_card_nbr, prod_nbr, tot_amt_incld_gst, start_txn_date, main_total_size, total_size_uom)
Below is the respective table:
linked_lylty_card_nbr, prod_nbr, tot_amt_incld_gst, start_txn_date, main_total_size, total_size_uom
1100000000006296409 83563-EA 3.1600 2021-11-10 500.0000 ML
1100000000006296409 83563-EA 2.6800 2021-11-20 500.0000 ML
1100000000001959800 83563-EA 2.6900 2021-12-21 500.0000 ML
1100000000006296409 83563-EA 3.1600 2021-12-30 500.0000 ML
1100000000001959800 83563-EA 5.3700 2022-01-14 500.0000 ML
1100000000006296409 83563-EA 2.6800 2022-01-16 500.0000 ML
1100000000001959800 83563-EA 2.4900 2022-01-19 500.0000 ML
1100000000006296409 83563-EA 3.4600 2022-02-26 500.0000 ML
1100000000006296409 607577-EA 3.9800 2022-05-26 500.0000 ML
1100000000006296409 607577-EA 3.9800 2022-06-11 500.0000 ML
1100000000001959800 83563-EA 3.9800 2022-06-14 500.0000 ML
1100000000001959800 83563-EA 3.9800 2022-06-24 500.0000 ML
1100000000006296409 607577-EA 4.4600 2022-07-30 500.0000 ML
1100000000001959800 83563-EA 4.0100 2022-08-02 500.0000 ML
1100000000001959800 83563-EA 4.0100 2022-09-01 500.0000 ML
1100000000006296409 607577-EA 3.9800 2022-09-08 500.0000 ML
I'm trying to get the change in volume per visit as a percentage. For example, if linked_lylty_card_nbr 1100000000006296409 has main_total_size 500 on start_txn_date 2021-11-10 and 500 again on 2021-11-20, there's no difference. I also want the number of days it takes each linked_lylty_card_nbr to return and buy the product again. Below is the SQL query I've written:
SELECT
    linked_lylty_card_nbr,
    prod_nbr,
    start_txn_date,
    main_total_size,
    total_size_uom,
    (main_total_size - LAG(main_total_size, 1) OVER (
        PARTITION BY linked_lylty_card_nbr
        ORDER BY start_txn_date
    )) / main_total_size AS change_in_volume_per_visit,
    (start_txn_date - LAG(start_txn_date, 1) OVER (
        PARTITION BY linked_lylty_card_nbr
        ORDER BY start_txn_date
    )) / main_total_size AS change_in_days_per_visit
FROM
    Employee
ORDER BY
    linked_lylty_card_nbr,
    start_txn_date
The output is below
linked_lylty_card_nbr prod_nbr start_txn_date main_total_size total_size_uom change_in_volume_per_visit change_in_days_per_visit
1100000000001959800 83563-EA 2021-12-21 500.0 ML
1100000000001959800 83563-EA 2022-01-14 1000.0 ML 0.5 0.024
1100000000001959800 83563-EA 2022-01-19 500.0 ML -1.0 0.01
1100000000001959800 83563-EA 2022-06-14 500.0 ML 0.0 0.292
1100000000001959800 83563-EA 2022-06-24 500.0 ML 0.0 0.02
1100000000001959800 83563-EA 2022-08-02 500.0 ML 0.0 0.078
1100000000001959800 83563-EA 2022-09-01 500.0 ML 0.0 0.06
1100000000006296409 83563-EA 2021-11-10 500.0 ML
1100000000006296409 83563-EA 2021-11-20 500.0 ML 0.0 0.02
1100000000006296409 83563-EA 2021-12-30 500.0 ML 0.0 0.08
1100000000006296409 83563-EA 2022-01-16 500.0 ML 0.0 0.034
1100000000006296409 83563-EA 2022-02-26 500.0 ML 0.0 0.082
1100000000006296409 607577-EA 2022-05-26 500.0 ML 0.0 0.178
1100000000006296409 607577-EA 2022-06-11 500.0 ML 0.0 0.032
1100000000006296409 607577-EA 2022-07-30 500.0 ML 0.0 0.098
1100000000006296409 607577-EA 2022-09-08 500.0 ML 0.0 0.08
From the above output, the 2nd row of change_in_volume_per_visit is 0.5, but it should be 1 if main_total_size jumps from 500 (1st row) to 1000 (2nd row). Also, can anyone confirm whether the change_in_days_per_visit values are correct?
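For reference, a minimal sketch of how both columns could be computed, assuming a SQL Server style dialect and the Employee table above: divide the change by the LAG value rather than by the current row's value, and take the day gap with DATEDIFF instead of dividing a date difference by main_total_size.
SELECT
    linked_lylty_card_nbr,
    prod_nbr,
    start_txn_date,
    main_total_size,
    total_size_uom,
    -- percent change vs. the previous visit: (current - previous) / previous * 100
    100.0 * (main_total_size - LAG(main_total_size) OVER (
                 PARTITION BY linked_lylty_card_nbr ORDER BY start_txn_date))
          / LAG(main_total_size) OVER (
                 PARTITION BY linked_lylty_card_nbr ORDER BY start_txn_date)
        AS change_in_volume_per_visit,
    -- whole days since the previous visit for the same card
    DATEDIFF(day,
             LAG(start_txn_date) OVER (
                 PARTITION BY linked_lylty_card_nbr ORDER BY start_txn_date),
             start_txn_date)
        AS change_in_days_per_visit
FROM Employee
ORDER BY linked_lylty_card_nbr, start_txn_date;
With that, the 500-to-1000 step comes out as 100 (percent), and the day column holds whole days between consecutive visits instead of a value scaled by main_total_size.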

Unable to change refresh rate using xrandr (xrandr: Configure crtc 0 failed)

Hello! I'm using a laptop with Arch Linux.
I have been trying to change my laptop display's refresh rate (144 Hz to 60 Hz) using:
xrandr --output eDP-1 --mode 1920x1080 --rate 60
But when I run that, I get this:
xrandr: Configure crtc 0 failed
I have already checked that said refresh rate is supported by my monitor:
xrandr -q
Screen 0: minimum 320 x 200, current 3840 x 1080, maximum 16384 x 16384
eDP-1 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 344mm x 194mm
1920x1080 144.00*+ 60.01 59.97 59.96 59.93
1680x1050 59.95 59.88
1400x1050 74.76 59.98
1600x900 59.99 59.94 59.95 59.82
1280x1024 85.02 75.02 60.02
1400x900 59.96 59.88
1280x960 85.00 60.00
1440x810 60.00 59.97
1368x768 59.88 59.85
1280x800 59.99 59.97 59.81 59.91
1152x864 75.00
1280x720 60.00 59.99 59.86 59.74
1024x768 85.00 75.05 60.04 85.00 75.03 70.07 60.00
1024x768i 86.96
960x720 85.00 75.00 60.00
928x696 75.00 60.05
896x672 75.05 60.01
1024x576 59.95 59.96 59.90 59.82
960x600 59.93 60.00
832x624 74.55
960x540 59.96 59.99 59.63 59.82
800x600 85.00 75.00 70.00 65.00 60.00 85.14 72.19 75.00 60.32 56.25
840x525 60.01 59.88
864x486 59.92 59.57
700x525 74.76 59.98
800x450 59.95 59.82
640x512 85.02 75.02 60.02
700x450 59.96 59.88
640x480 85.09 60.00 85.01 72.81 75.00 59.94
720x405 59.51 58.99
720x400 85.04
684x384 59.88 59.85
640x400 59.88 59.98 85.08
576x432 75.00
640x360 59.86 59.83 59.84 59.32
640x350 85.08
512x384 85.00 75.03 70.07 60.00
512x384i 87.06
512x288 60.00 59.92
416x312 74.66
480x270 59.63 59.82
400x300 85.27 72.19 75.12 60.32 56.34
432x243 59.92 59.57
320x240 85.18 72.81 75.00 60.05
360x202 59.51 59.13
360x200 85.04
320x200 85.27
320x180 59.84 59.32
320x175 85.27
I even specified the crtc, but I keep getting the same (xrandr: Configure crtc 0 failed) error
xrandr --output eDP-1 --crtc 0 --mode 1920x1080 --rate 60.00
Even changing to a resolution other than 1920x1080 gives me the same error. Any ideas?
Try this:
xrandr --output eDP-1 --rate 60.00 --mode 1920x1080
This morning, I updated all of my packages:
pacman -Syu
Unexpectedly, that solved the error.
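To confirm the change took effect, the currently active mode in xrandr's output is the one marked with an asterisk, so a quick filter works (a shell one-liner on the same output shown above):
xrandr -q | grep '\*'
Once the switch succeeds, the rate printed with the * suffix should read 60.00 instead of 144.00.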

Joining data frames with datetime windows

Looking to match two data frames using times in one dataframe that fall into time windows of another dataframe.
Production Dataframe
Production Time      Product  Value  Worker_ID
2020-01-24 08:13:59  Prod4    5.9    402
2020-01-24 08:15:38  Prod5    5.7    402
2020-01-24 08:17:17  Prod4    5.1    402
2020-01-25 22:13:59  Prod4    5.9    402
2020-01-25 21:15:38  Prod7    5.7    402
2020-01-26 02:17:17  Prod2    5.1    402
2020-01-24 09:17:17  Prod4    5.1    403
2020-01-25 21:13:59  Prod5    5.9    403
Location Dataframe
Location  window_start         window_stop          Worker_ID
Loc16     2020-01-24 05:00:00  2020-01-24 21:00:00  402
Loc27     2020-01-25 21:00:00  2020-01-26 05:00:00  402
Loc61     2020-01-24 05:00:00  2020-01-24 21:00:00  403
Loc27     2020-01-25 21:00:00  2020-01-26 05:00:00  403
Results would look like this:
Location  window_start         window_stop          Worker_ID  Production Time      Product  Value
Loc16     2020-01-24 05:00:00  2020-01-24 21:00:00  402        2020-01-24 08:13:59  Prod4    5.9
Loc16     2020-01-24 05:00:00  2020-01-24 21:00:00  402        2020-01-24 08:15:38  Prod5    5.7
Loc16     2020-01-24 05:00:00  2020-01-24 21:00:00  402        2020-01-24 08:17:17  Prod4    5.1
Loc27     2020-01-25 21:00:00  2020-01-26 05:00:00  402        2020-01-25 22:13:59  Prod4    5.9
Loc27     2020-01-25 21:00:00  2020-01-26 05:00:00  402        2020-01-25 21:15:38  Prod7    5.7
Loc27     2020-01-25 21:00:00  2020-01-26 05:00:00  402        2020-01-26 02:17:17  Prod2    5.1
Loc61     2020-01-24 05:00:00  2020-01-24 21:00:00  403        2020-01-24 09:17:17  Prod4    5.1
Loc27     2020-01-25 21:00:00  2020-01-26 05:00:00  403        2020-01-25 21:13:59  Prod5    5.9
The match is made first on Worker_ID, and then on whether the production time falls within the location's datetime window.
This code works:
possible_matches = location_df.merge(production_df, on='Worker_ID', how='left')
build_df = possible_matches[
    (possible_matches['Production Time'] >= possible_matches['window_start'])
    & (possible_matches['Production Time'] <= possible_matches['window_stop'])
]
But it does not work when there are millions of rows in the production dataframe and thousands of rows in the location dataframe.
I'm looking for a more efficient way of doing this join, one that actually works with large datasets with more workers and locations.
To avoid the crash, you may have to filter on the datetimes before merging.
I generated two test dataframes: 10,000 records for location and 5,000,000 for production.
import numpy as np
import pandas as pd

# hourly timestamps spanning 2020; 10,000 windows of a random 4-16 h length
dti = pd.date_range('2020-01-01', '2021-01-01', freq='H', closed='left')
df2 = pd.DataFrame({'Worker_ID': np.random.randint(100, 500, 10000)})
df2['window_start'] = np.random.choice(dti, len(df2))
df2['window_stop'] = df2['window_start'] + pd.DateOffset(hours=np.random.randint(4, 17))
# 5,000,000 production rows with timestamps inside the overall window range
df1 = pd.DataFrame({'Worker_ID': np.random.randint(100, 500, 5000000)})
df1['Production Time'] = pd.to_datetime(
    1e9 * np.random.randint(df2['window_start'].min().timestamp(),
                            df2['window_stop'].max().timestamp(),
                            len(df1)))
>>> df1
Worker_ID Production Time
0 263 2020-12-31 11:28:31
1 194 2020-09-19 04:57:17
2 139 2020-06-14 00:27:07
3 105 2020-04-14 02:45:05
4 484 2020-12-07 22:36:56
... ... ...
4999995 338 2020-05-29 18:30:39
4999996 455 2020-03-03 20:51:27
4999997 228 2020-12-19 01:43:12
4999998 197 2020-04-07 07:32:13
4999999 304 2020-07-06 14:51:39
[5000000 rows x 2 columns]
>>> df2
Worker_ID window_start window_stop
0 309 2020-10-07 18:00:00 2020-10-08 08:00:00
1 486 2020-01-24 19:00:00 2020-01-25 09:00:00
2 120 2020-11-05 10:00:00 2020-11-06 00:00:00
3 224 2020-04-08 15:00:00 2020-04-09 05:00:00
4 208 2020-01-08 23:00:00 2020-01-09 13:00:00
... ... ... ...
9995 218 2020-01-10 00:00:00 2020-01-10 14:00:00
9996 358 2020-10-12 03:00:00 2020-10-12 17:00:00
9997 474 2020-12-25 03:00:00 2020-12-25 17:00:00
9998 416 2020-10-26 20:00:00 2020-10-27 10:00:00
9999 443 2020-03-31 09:00:00 2020-03-31 23:00:00
[10000 rows x 3 columns]
# from tqdm import tqdm
# Convert datetimes to int64 arrays (nanoseconds since epoch) for fast comparison
ptime = df1['Production Time'].astype(int).values
wtime = df2[['window_start', 'window_stop']].astype(int).values
data = []
# for wid in tqdm(df2['Worker_ID'].unique()):
for wid in df2['Worker_ID'].unique():
    # rows of each frame belonging to this worker (default RangeIndex,
    # so the labels double as positions into ptime/wtime)
    i = df1.loc[df1['Worker_ID'] == wid]
    j = df2.loc[df2['Worker_ID'] == wid]
    # for each production time p, mark the windows that contain it
    m = [np.where((wtime[j.index, 0] <= p) & (p <= wtime[j.index, 1]), x, -1)
         for x, p in enumerate(ptime[i.index])]
    m = np.where(np.array(m) >= 0)
    # stitch the matching window rows and production rows side by side
    df = pd.concat([j.iloc[m[1]].reset_index(drop=True),
                    i.iloc[m[0]].reset_index(drop=True)], axis='columns')
    data.append(df)
df = pd.concat(data)
Old answer
Create an interval index to bind each production time to the corresponding window, then merge on Worker_ID and the interval:
ii = pd.IntervalIndex.from_tuples(list(zip(dfl['window_start'], dfl['window_stop'])),
                                  closed='left')  # left means >= and <
dfp['interval'] = pd.cut(dfp['Production Time'], bins=ii)
dfl['interval'] = ii
>>> pd.merge(dfl, dfp, on=['Worker_ID', 'interval'], how='left') \
...     .drop(columns='interval')
Location window_start window_stop Worker_ID Production Time Product Value
0 Loc16 2020-01-24 05:00:00 2020-01-24 21:00:00 402 2020-01-24 08:13:59 Prod4 5.9
1 Loc16 2020-01-24 05:00:00 2020-01-24 21:00:00 402 2020-01-24 08:15:38 Prod5 5.7
2 Loc16 2020-01-24 05:00:00 2020-01-24 21:00:00 402 2020-01-24 08:17:17 Prod4 5.1
3 Loc27 2020-01-25 21:00:00 2020-01-26 05:00:00 402 2020-01-25 22:13:59 Prod4 5.9
4 Loc27 2020-01-25 21:00:00 2020-01-26 05:00:00 402 2020-01-25 21:15:38 Prod7 5.7
5 Loc27 2020-01-25 21:00:00 2020-01-26 05:00:00 402 2020-01-26 02:17:17 Prod2 5.1
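If the location windows for a given Worker_ID never overlap (an assumption on my part, not something stated in the question), a sketch using pd.merge_asof may also help at this scale: it attaches to each production row the latest window that starts at or before it, and a final filter keeps only the rows that also end inside the window.
import pandas as pd

# both frames must be globally sorted on the asof keys
production_df = production_df.sort_values('Production Time')
location_df = location_df.sort_values('window_start')

merged = pd.merge_asof(
    production_df,
    location_df,
    left_on='Production Time',
    right_on='window_start',
    by='Worker_ID',          # match within each worker
    direction='backward',    # latest window_start <= Production Time
)
# keep only rows whose production time also falls before the window's end
result = merged[merged['Production Time'] <= merged['window_stop']]
Since merge_asof is a single sorted pass rather than a cross join per worker, it avoids materializing the huge intermediate that crashes the naive merge.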

Calculate difference between time over midnight and condition

When I calculate the difference between two times, I must also pay attention to what happens after midnight.
SELECT *,
       DATEDIFF(second,
                CAST([schedule_deptime] AS time),
                CAST([prev_announce_time] AS time)) AS kpi1_delta
id         date_event               schedule_deptime  prev_announce_time       kpi1_delta
79643204   2021-02-11 19:55:52.000  19:15             2021-02-11 19:39:01.000        1441
79569510   2021-02-11 16:51:05.000  16:50             2021-02-11 16:48:17.000        -103
106160161  2021-01-21 20:43:44.000  20:28             2021-01-21 01:03:41.000      -69859
106216877  2021-01-21 23:50:10.000  23:45             2021-01-21 00:06:52.000      -85088
79703534   2021-02-11 23:58:01.000  00:04             2021-02-11 00:03:01.000         -59
For my third and fourth rows, I should get:
id         date_event               schedule_deptime  prev_announce_time       kpi1_delta
106160161  2021-01-21 20:43:44.000  20:28             2021-01-21 01:03:41.000       16541
106216877  2021-01-21 23:50:10.000  23:45             2021-01-21 00:06:52.000        1312
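A minimal sketch of one common fix, assuming the source table is called [events] (its real name isn't shown) and that announcements always land within twelve hours of the scheduled time: wrap any delta that crossed midnight back into the representative closest to zero.
SELECT *,
       CASE
           WHEN raw_delta < -43200 THEN raw_delta + 86400  -- announcement crossed midnight forward
           WHEN raw_delta >  43200 THEN raw_delta - 86400  -- schedule crossed midnight forward
           ELSE raw_delta
       END AS kpi1_delta
FROM (
    SELECT *,
           DATEDIFF(second,
                    CAST([schedule_deptime] AS time),
                    CAST([prev_announce_time] AS time)) AS raw_delta
    FROM [events]
) AS t;
Checked against the sample: -69859 + 86400 = 16541 and -85088 + 86400 = 1312, while the small negatives (-103, -59) and the positive 1441 pass through unchanged.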

Subtotals and grand totals across two axes

I have a dataframe like this:
  Industry        Firm   Project    Month     Cost
0     Auto   Company 1    NUKDJF  06-2020  1000.00
1     Auto   Company 1    NUKDJF  07-2020  5000.00
2     Auto   Company 1    NUKDJF  08-2020  5000.00
0     Auto   Company 1     Alpha  06-2020  3000.00
1     Auto   Company 1     Alpha  07-2020     0.00
2     Auto   Company 1     Alpha  08-2020     0.00
3    Lamps   ASDF Inc.  BigThing  06-2020  2000.00
4    Lamps   ASDF Inc.  BigThing  07-2020   500.00
5    Lamps   ASDF Inc.  BigThing  08-2020   500.00
7    Lamps  Super Corp   SupProj  06-2020  1500.00
8    Lamps  Super Corp   SupProj  07-2020  8000.00
9    Lamps  Super Corp   SupProj  08-2020  9000.00
and I want to turn it into an Excel-style Pivot table with Subtotals and Grand Total like this:
Industry  Firm        Project   06-2020   07-2020   08-2020     Total
Auto                            4000.00   5000.00   5000.00  14000.00
          Company 1             4000.00   5000.00   5000.00  14000.00
                      NUKDJF    1000.00   5000.00   5000.00  11000.00
                      Alpha     3000.00      0.00      0.00   3000.00
Lamps                           3500.00   8500.00   9500.00  21500.00
          ASDF Inc.             2000.00    500.00    500.00   3000.00
                      BigThing  2000.00    500.00    500.00   3000.00
          Super Corp            1500.00   8000.00   9000.00  18500.00
                      SupProj   1500.00   8000.00   9000.00  18500.00
Total                           7500.00  13500.00  14500.00  35500.00
I am currently at this stage:
pd.concat([
    df.assign(**{x: 'total' for x in ['Industry', 'Firm', 'Project', 'Month'][i:]})
      .groupby(['Industry', 'Firm', 'Project', 'Month'])
      .sum()
    for i in range(5)
]).sort_index()
but this does not provide totals per month.
Thanks for any help!
Admittedly not elegant, but it works...
import numpy as np
import pandas as pd

indices = ["Industry", "Firm", "Project"]
l = list()
# one pivot per aggregation level: Industry, Industry+Firm, all three, and the grand total
for index in [indices[0], indices[0:2], indices, None]:
    tmp = pd.pivot_table(df, values="Cost", index=index, columns=["Month"], aggfunc=np.sum)
    tmp["Total"] = tmp.sum(axis=1)
    tmp.reset_index(inplace=True)
    # pad the missing index levels with empty strings so every piece shares one MultiIndex
    for col in indices:
        if col not in tmp.columns:
            tmp[col] = ""
    tmp.set_index(indices, inplace=True)
    tmp.drop("index", axis=1, errors='ignore', inplace=True)
    l.append(tmp)
l[-1].index = [("Total", "", "")]
output = pd.concat(l[:-1]).sort_index()
output = pd.concat([output, l[-1]])
output
Month                            06-2020  07-2020  08-2020    Total
Industry  Firm        Project
Auto                              4000.0   5000.0   5000.0  14000.0
          Company 1               4000.0   5000.0   5000.0  14000.0
                      Alpha       3000.0      0.0      0.0   3000.0
                      NUKDJF      1000.0   5000.0   5000.0  11000.0
Lamps                             3500.0   8500.0   9500.0  21500.0
          ASDF Inc.               2000.0    500.0    500.0   3000.0
                      BigThing    2000.0    500.0    500.0   3000.0
          Super Corp              1500.0   8000.0   9000.0  18500.0
                      SupProj     1500.0   8000.0   9000.0  18500.0
Total                             7500.0  13500.0  14500.0  35500.0
Another way is to use groupby and pivot_table:
df_t = df.groupby(['Industry', 'Firm', 'Project']).agg({'Cost': 'sum'}).reset_index()
df_t['Month'] = 'Total'
df2 = pd.pivot_table(df.append(df_t), index=['Industry', 'Firm', 'Project'],
                     columns=['Month'], values='Cost', aggfunc='sum').reset_index()
df2 = df2.append(df2.sum(axis=0, numeric_only=True), ignore_index=True).fillna('')
df2.iloc[-1, df2.columns.get_loc('Industry')] = 'Total'
df2 = df2.set_index(['Industry', 'Firm', 'Project'])
print(df2.to_string())
It will give you the following output.
Month                            06-2020  07-2020  08-2020    Total
Industry  Firm        Project
Auto      Company 1   Alpha       3000.0      0.0      0.0   3000.0
                      NUKDJF      1000.0   5000.0   5000.0  11000.0
Lamps     ASDF Inc.   BigThing    2000.0    500.0    500.0   3000.0
          Super Corp  SupProj     1500.0   8000.0   9000.0  18500.0
Total                             7500.0  13500.0  14500.0  35500.0
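As a side note, pandas' own margins option covers the grand totals in one call (a sketch; it does not produce the per-Industry or per-Firm subtotal rows, which is why both answers above assemble those by concatenation):
import pandas as pd

# margins=True appends a row and a column of grand totals named 'Total'
pivot = pd.pivot_table(df, values='Cost',
                       index=['Industry', 'Firm', 'Project'],
                       columns='Month', aggfunc='sum',
                       margins=True, margins_name='Total')
print(pivot)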