I have this result
ZONE SITE BRAND VALUE
north a a_brand1 10
north a a_brand2 15
north a a_brand3 27
south b b_brand1 17
south b b_brand2 5
south b b_brand3 56
Is there any way to add a column with the sum grouped by zone and site? For example, the total for site a = 10+15+27 = 52 and the total for site b = 17+5+56 = 78:
ZONE SITE BRAND VALUE TOTAL_IN_SITE
north a a_brand1 10 52
north a a_brand2 15 52
north a a_brand3 27 52
south b b_brand1 17 78
south b b_brand2 5 78
south b b_brand3 56 78
Thanks.
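Use the SUM window function. Unlike GROUP BY, it keeps every row and attaches the partition total to each one (tbl stands for your table):
select t.*, sum(value) over (partition by zone, site) as total_in_site
from tbl t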
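To show what the window function computes, here is a minimal Python sketch over the sample rows from the question: a first pass totals VALUE per (zone, site), and a second pass attaches that total to every input row. The function name add_partition_total is illustrative, not part of any library.

```python
from collections import defaultdict

# Sample rows from the question: (zone, site, brand, value)
rows = [
    ("north", "a", "a_brand1", 10),
    ("north", "a", "a_brand2", 15),
    ("north", "a", "a_brand3", 27),
    ("south", "b", "b_brand1", 17),
    ("south", "b", "b_brand2", 5),
    ("south", "b", "b_brand3", 56),
]


def add_partition_total(rows):
    """Append the total value of each row's (zone, site) partition to that row,
    mirroring SUM(value) OVER (PARTITION BY zone, site)."""
    totals = defaultdict(int)
    # First pass: aggregate per partition key
    for zone, site, _brand, value in rows:
        totals[(zone, site)] += value
    # Second pass: keep every input row and attach its partition total
    return [row + (totals[(row[0], row[1])],) for row in rows]


for row in add_partition_total(rows):
    print(*row)
```

The two-pass shape is the point: the aggregate is computed per partition, but the row count of the output equals the input's.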
I have a query that looks like this:
with x as (
select *, date_format(SomeDate, 'MMM') as Month from SomeTable
)
select *, count(Package) over (partition by Company, Region order by SomeDate) as BoxCount
from x
Table SomeTable basically looks like this:
Package Company Region SomeDate
1 A East 20220101
2 A East 20220105
3 A East 20220310
4 A East 20220411
5 A East 20220502
6 A West 20220405
7 A West 20220505
8 A West 20220508
9 B East 20220106
10 B East 20220212
11 B East 20220311
12 B West 20220505
13 B North 20220908
The result I want is basically this:
Company Month BoxCount
A Jan 2
A Mar 3
A Apr 4
A May 8
B Jan 1
B Feb 2
B Mar 3
B May 4
B Sept 5
What I want is basically a CUSUM by Company and Region. However, in May I'd like Region West to be counted together with Region East, and in September all three regions for each respective company. Is there a way to do this in Spark SQL?
My query gives the running count, but I'm not sure how to proceed from here.
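The thread gives no answer here, so the following is only one possible reading of the rule, written in plain Python rather than Spark: East is always counted, West from May onward, North from September onward (the REGION_START_MONTH mapping is a hypothetical encoding of that rule), and a month is reported when a region already in scope shipped a package in it. Under those assumptions, and assuming all dates fall in one year, it reproduces the expected table.

```python
import calendar
from collections import Counter

# (Package, Company, Region, SomeDate) rows from the question
rows = [
    (1, "A", "East", 20220101), (2, "A", "East", 20220105),
    (3, "A", "East", 20220310), (4, "A", "East", 20220411),
    (5, "A", "East", 20220502), (6, "A", "West", 20220405),
    (7, "A", "West", 20220505), (8, "A", "West", 20220508),
    (9, "B", "East", 20220106), (10, "B", "East", 20220212),
    (11, "B", "East", 20220311), (12, "B", "West", 20220505),
    (13, "B", "North", 20220908),
]

# Hypothetical encoding of the rule: month from which each region is counted
REGION_START_MONTH = {"East": 1, "West": 5, "North": 9}


def box_counts(rows):
    """Return (company, month, running package count over the regions in scope)."""
    result = []
    for company in sorted({r[1] for r in rows}):
        # Packages per (region, month) for this company; assumes a single year
        per_region_month = Counter(
            (region, some_date // 100 % 100)
            for _pkg, comp, region, some_date in rows
            if comp == company
        )
        # Report only months in which an in-scope region shipped something
        months = sorted({m for (region, m) in per_region_month
                         if m >= REGION_START_MONTH[region]})
        for month in months:
            # Running count up to this month, over regions already in scope
            total = sum(n for (region, m), n in per_region_month.items()
                        if m <= month and month >= REGION_START_MONTH[region])
            result.append((company, calendar.month_abbr[month], total))
    return result


for row in box_counts(rows):
    print(*row)
```

Note that calendar.month_abbr spells September as "Sep", not "Sept" as in the expected table.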
I'm getting this error:
Error tokenizing data. C error: Expected 2 fields in line 11, saw 3
Code:
import webbrowser
import pandas as pd
website = 'https://en.wikipedia.org/wiki/Winning_percentage'
webbrowser.open(website)
league_frame = pd.read_clipboard()
The error above is raised by the read_clipboard call.
I believe you need read_html, which returns a list of all parsed tables; then select a DataFrame by position:
import pandas as pd
website = 'https://en.wikipedia.org/wiki/Winning_percentage'
#select first parsed table
df1 = pd.read_html(website)[0]
print (df1.head())
Win % Wins Losses Year Team Comment
0 0.798 67 17 1882 Chicago White Stockings best pre-modern season
1 0.763 116 36 1906 Chicago Cubs best 154-game NL season
2 0.721 111 43 1954 Cleveland Indians best 154-game AL season
3 0.716 116 46 2001 Seattle Mariners best 162-game AL season
4 0.667 108 54 1975 Cincinnati Reds best 162-game NL season
#select second parsed table
df2 = pd.read_html(website)[1]
print (df2)
Win % Wins Losses Season Team \
0 0.890 73 9 2015–16 Golden State Warriors
1 0.110 9 73 1972–73 Philadelphia 76ers
2 0.106 7 59 2011–12 Charlotte Bobcats
Comment
0 best 82 game season
1 worst 82-game season
2 worst season statistically
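To illustrate the "all parsed tables, select by position" idea without pandas or network access, here is a stdlib sketch using html.parser; it is a swapped-in technique, not how read_html works internally. It assumes well-formed markup with explicit closing tags and no nested tables, and the inline HTML is a small stand-in built from the printed rows above, not the actual Wikipedia page.

```python
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> in a page as a list of rows (lists of cell text)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table>, in document order
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


html = """
<table>
  <tr><th>Win %</th><th>Wins</th><th>Losses</th><th>Year</th><th>Team</th></tr>
  <tr><td>0.798</td><td>67</td><td>17</td><td>1882</td><td>Chicago White Stockings</td></tr>
</table>
<table>
  <tr><th>Win %</th><th>Wins</th><th>Losses</th><th>Season</th><th>Team</th></tr>
  <tr><td>0.890</td><td>73</td><td>9</td><td>2015–16</td><td>Golden State Warriors</td></tr>
</table>
"""

parser = TableCollector()
parser.feed(html)
second = parser.tables[1]  # select by position, as with pd.read_html(website)[1]
print(second)
```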
I have the DataFrame below in a messy layout. I need to combine rows 0 and 1 into the column names and keep the remaining rows as is:
Start Date 2005-01-01 Unnamed: 3 Unnamed: 4 Unnamed: 5
Dat an_1 an_2 an_3 an_4 an_5
mt mt s t inch km
23 45 67 78 89 9000
It should become:
Dat_mt an_1_mt an_2_s an_3_t an_4_inch an_5_km
23 45 67 78 89 9000
IIUC:
df.columns = df.loc[0] + '_' + df.loc[1]  # join the two header rows element-wise
df = df.loc[2:]                           # keep every row after the two header rows
df
Out[429]:
Dat_mt an_1_mt an_2_s an_3_t an_4_inch an_5_km
2 23 45 67 78 89 9000
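The same idea without pandas: zip the two header rows, join each pair with "_", and keep everything from row 2 on as data. This is a minimal sketch that assumes the rows have already been read as lists of strings; merge_header_rows is a hypothetical helper name.

```python
# Rows as in the question: two header rows followed by the data
raw = [
    ["Dat", "an_1", "an_2", "an_3", "an_4", "an_5"],
    ["mt", "mt", "s", "t", "inch", "km"],
    ["23", "45", "67", "78", "89", "9000"],
]


def merge_header_rows(rows):
    """Join rows 0 and 1 element-wise with '_' to form column names;
    return those names plus the remaining rows unchanged."""
    names = [f"{top}_{unit}" for top, unit in zip(rows[0], rows[1])]
    return names, rows[2:]


columns, data = merge_header_rows(raw)
print(columns)
print(data)
```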
I need some help creating a query in SAS PROC SQL.
Consider the following dataset, which has sales from different regions already bucketed into 3-hour chunks (it's only a subset; the actual data covers 24 hours):
Date ObsAtHour Region Sales
1/1/2018 2 Asia 76
1/1/2018 2 Africa 5
1/1/2018 5 Asia 14
1/1/2018 5 Africa 10
2/1/2018 2 Asia 40
2/1/2018 2 Africa 1
2/1/2018 5 Asia 15
2/1/2018 5 Africa 20
The data covers the last 45 days.
I am trying to do two things:
1) Within each Date and Region, get the cumulative sum of Sales in ObsAtHour order, so that I get something like:
Date ObsAtHour Region Sales CumSales
1/1/2018 2 Asia 76 76
1/1/2018 2 Africa 5 5
1/1/2018 5 Asia 14 90
1/1/2018 5 Africa 10 15
2/1/2018 2 Asia 40 40
2/1/2018 2 Africa 1 1
2/1/2018 5 Asia 15 55
2/1/2018 5 Africa 20 21
2) Get a percentage that indicates how much of each Region's daily sales has been achieved by each ObsAtHour. It would look like:
Date ObsAtHour Region Sales CumSales Pct
1/1/2018 2 Asia 76 76 84%
1/1/2018 2 Africa 5 5 33%
1/1/2018 5 Asia 14 90 100%
1/1/2018 5 Africa 10 15 100%
2/1/2018 2 Asia 40 40 72%
2/1/2018 2 Africa 1 1 4.76%
2/1/2018 5 Asia 15 55 100%
2/1/2018 5 Africa 20 21 100%
Any help would be appreciated.
Something like the following:
data have;
input Date:mmddyy10. ObsAtHour Region $ Sales;
format date mmddyy10.;
datalines;
1/1/2018 2 Asia 76
1/1/2018 2 Africa 5
1/1/2018 5 Asia 14
1/1/2018 5 Africa 10
2/1/2018 2 Asia 40
2/1/2018 2 Africa 1
2/1/2018 5 Asia 15
2/1/2018 5 Africa 20
;
proc sort data=have;
by date region obsathour;
run;
/* running (cumulative) sum of sales within each date and region, in hour order */
data have1;
format date mmddyy10.;
set have;
by date region;
if first.region then sumsales = sales;
else sumsales+sales;
run;
/* get the total sales from your initial table by group, join it back
and calculate the percent */
proc sql;
select a.*, sumsales/tot_sales as per format =percent10.2 from
(select * from have1)a
inner join
(select region , date, sum(sales) as tot_sales
from have
group by 1, 2)b
on a.region =b.region
and a.date =b.date;
quit;
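The logic of this answer (sort, restart the running sum at first.region, then divide by the daily total per date and region) can be sketched in Python for checking the numbers. This is an illustration, not SAS; dates are kept as the original strings, which is enough for grouping here but would need real date parsing for correct chronological order in general.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

# (Date, ObsAtHour, Region, Sales) from the question
have = [
    ("1/1/2018", 2, "Asia", 76), ("1/1/2018", 2, "Africa", 5),
    ("1/1/2018", 5, "Asia", 14), ("1/1/2018", 5, "Africa", 10),
    ("2/1/2018", 2, "Asia", 40), ("2/1/2018", 2, "Africa", 1),
    ("2/1/2018", 5, "Asia", 15), ("2/1/2018", 5, "Africa", 20),
]


def cumulative_with_pct(rows):
    # Daily total per (date, region), like the tot_sales subquery
    totals = defaultdict(int)
    for date, _hour, region, sales in rows:
        totals[(date, region)] += sales

    out = []
    # Sort by date, region, hour (like PROC SORT), then restart the sum
    # at the first row of each (date, region) group (like first.region)
    ordered = sorted(rows, key=itemgetter(0, 2, 1))
    for (date, region), group in groupby(ordered, key=itemgetter(0, 2)):
        cum = 0
        for _date, hour, _region, sales in group:
            cum += sales
            out.append((date, hour, region, sales, cum, cum / totals[(date, region)]))
    return out


for row in cumulative_with_pct(have):
    print(*row[:5], f"{row[5]:.2%}")
```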
The key to understanding the following query is that the cumulative levels are called tiers. The tiers are used in the self-join criteria to restrict which rows are summed for each tier.
Data
data have;
input Date :ddmmyy10. ObsAtHour Region $ Sales;
format Date yymmdd10.;
datalines;
1/1/2018 2 Asia 76
1/1/2018 2 Africa 5
1/1/2018 5 Asia 14
1/1/2018 5 Africa 10
2/1/2018 2 Asia 40
2/1/2018 2 Africa 1
2/1/2018 5 Asia 15
2/1/2018 5 Africa 20
;
run;
Sample query
The second query (percentage computation) is performed on the result of the first query (cumulative computation); alternatively, the first query could be embedded as a nested query within the second one.
proc sql;
create table want(label='Cumulative within day up to obsathour') as
select
tiers.Date
, tiers.ObsAtHour
, tiers.Region
, Sum(case when have.ObsAtHour = tiers.ObsAtHour then have.Sales else 0 end) as SalesAtTier
, Sum(have.Sales) as CumSales
, Count(*) as CumCount
from
have
join
(select distinct Date, ObsAtHour, Region from have) as tiers
on
have.Date = tiers.Date
and have.Region = tiers.Region
and have.ObsAtHour <= tiers.ObsAtHour
group by
tiers.Date, tiers.Region, tiers.ObsAtHour
order by
Date, ObsAtHour, Region
;
create table want2 as
select
cum.Date
, cum.ObsAtHour
, cum.Region
, cum.SalesAtTier
, cum.CumSales
, cum.CumSales / Sum(cum.SalesAtTier) as fraction format=Percent7.2
from
want as cum
group by
cum.Date, cum.Region
order by
cum.Date, cum.ObsAtHour, cum.Region
;
quit;
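The tier self-join can be mimicked in Python to see exactly which rows each tier picks up: one tier per distinct (Date, ObsAtHour, Region), joined to every row with the same Date and Region and an ObsAtHour less than or equal to the tier's. This is a sketch for illustration only; like the SQL self-join, it compares rows pairwise within each group, so its cost grows quadratically with the number of hours per day.

```python
from collections import defaultdict

# (Date, ObsAtHour, Region, Sales) from the question
have = [
    ("1/1/2018", 2, "Asia", 76), ("1/1/2018", 2, "Africa", 5),
    ("1/1/2018", 5, "Asia", 14), ("1/1/2018", 5, "Africa", 10),
    ("2/1/2018", 2, "Asia", 40), ("2/1/2018", 2, "Africa", 1),
    ("2/1/2018", 5, "Asia", 15), ("2/1/2018", 5, "Africa", 20),
]


def cumulative_by_tiers(rows):
    # SELECT DISTINCT Date, ObsAtHour, Region: one tier per output row
    tiers = sorted({(d, h, r) for d, h, r, _s in rows})
    want = []
    for d, h, r in tiers:
        # Join condition: same Date and Region, have.ObsAtHour <= tiers.ObsAtHour
        matched = [row for row in rows if row[0] == d and row[2] == r and row[1] <= h]
        sales_at_tier = sum(s for _d, hh, _r, s in matched if hh == h)
        cum_sales = sum(row[3] for row in matched)
        want.append((d, h, r, sales_at_tier, cum_sales, len(matched)))

    # Second query: CumSales / Sum(SalesAtTier) per Date and Region
    daily = defaultdict(int)
    for d, _h, r, at_tier, _cum, _n in want:
        daily[(d, r)] += at_tier
    return [row + (row[4] / daily[(row[0], row[2])],) for row in want]


for row in cumulative_by_tiers(have):
    print(*row[:6], f"{row[6]:.2%}")
```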
I have a set of data that lists each employee ever employed in a certain type of department at many cities, and it lists each employee's begin and end date.
For example:
name city_id start_date end_date
-----------------------------------------
Joe Public 54 3-19-1994 9-1-2002
Suzi Que 54 10-1-1995 9-1-2005
What I want is each city's employee count for each year in a particular period. For example, if this was all the data for city 54, then I'd show this as the query results if I wanted to show city 54's employee count for the years 1990-2005:
city_id year employee_count
-----------------------------
54 1990 0
54 1991 0
54 1992 0
54 1993 0
54 1994 1
54 1995 2
54 1996 2
54 1997 2
54 1998 2
54 1999 2
54 2000 2
54 2001 2
54 2002 2
54 2003 1
54 2004 1
54 2005 1
(Note that I will have many cities, so the primary key here would be city and year unless I want to have a separate id column.)
Is there an efficient SQL query to do this? All I can think of is a series of UNIONed queries, with one query for each year I wanted to get numbers for.
My dataset has a few hundred cities and 178,000 employee records. I need to find a few decades' worth of this yearly data for each city on my dataset.
Replace <city_id> with your parameter (for example, 54). Note that generate_series is PostgreSQL-specific:
select
<city_id>, c.y, count(t.city_id)
from generate_series(1990, 2005) as c(y)
left outer join Table1 as t on
c.y between extract(year from t.start_date) and extract(year from t.end_date) and
t.city_id = <city_id>
group by c.y
order by c.y
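Since the question asks for many cities at once, here is a Python sketch of the same counting rule applied to every city: an employee counts toward a year when the year lies between the start and end years, and years with nobody still appear with 0 (the effect of the LEFT JOIN against the year series). It assumes every employee has an end date; yearly_counts is a hypothetical helper name.

```python
from collections import defaultdict
from datetime import date

# (name, city_id, start_date, end_date) from the question
employees = [
    ("Joe Public", 54, date(1994, 3, 19), date(2002, 9, 1)),
    ("Suzi Que", 54, date(1995, 10, 1), date(2005, 9, 1)),
]


def yearly_counts(employees, first_year, last_year):
    """Return (city_id, year, employee_count) for every city and every year
    in [first_year, last_year], counting employees whose start..end years cover it."""
    cities = sorted({city for _name, city, _start, _end in employees})
    counts = defaultdict(int)
    for _name, city, start, end in employees:
        # Clamp the employment years to the requested range
        for year in range(max(start.year, first_year), min(end.year, last_year) + 1):
            counts[(city, year)] += 1
    # Emit every (city, year) pair, defaulting to 0 like the LEFT JOIN
    return [(city, year, counts[(city, year)])
            for city in cities
            for year in range(first_year, last_year + 1)]


for row in yearly_counts(employees, 1990, 2005):
    print(*row)
```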