retrieving data from field which has multiple values - google-bigquery

We have a table named person. Some of its fields hold multiple values per row, for example:
person
ID  name  tripNumber  startPlace  endPlace
1   xxx   20          Portland    Atlanta
          25          California  Atlanta
          40          America     Africa
2   EKVV  40          America     Africa
          37          Argentina   Carolina
We need to retrieve the entire row of data that matches a particular condition, such as tripNumber = 40 and endPlace = "Africa".
We need a result like this:
ID  name  tripNumber  startPlace  endPlace
1   xxx   40          America     Africa
2   EKVV  40          America     Africa

Below is for BigQuery Standard SQL
#standardSQL
WITH `project.dataset.person` AS (
SELECT 1 id, 'xxx' name, [20, 25, 40] tripNumber, ['Portland', 'California', 'America'] startPlace, ['Atlanta', 'Atlanta', 'Africa'] endPlace UNION ALL
SELECT 2, 'EKVV', [40, 37], ['America', 'Argentina'], ['Africa', 'Carolina']
)
SELECT id, name, tripNumber, startPlace[SAFE_OFFSET(off)] startPlace, endPlace[SAFE_OFFSET(off)] endPlace
FROM `project.dataset.person`,
UNNEST(tripNumber) tripNumber WITH OFFSET off
WHERE tripNumber = 40
AND endPlace[SAFE_OFFSET(off)] = 'Africa'
with result
Row  id  name  tripNumber  startPlace  endPlace
1    1   xxx   40          America     Africa
2    2   EKVV  40          America     Africa
The solution above assumes that the repeated fields are independent arrays and that values are matched by their position in the respective arrays.
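To make the positional matching concrete, here is a minimal Python sketch (plain Python, not BigQuery code) of what UNNEST ... WITH OFFSET combined with SAFE_OFFSET does for the sample rows. The `people` list and the `safe_offset` helper are names made up for this illustration.

```python
# Sample rows with three parallel "repeated" fields, mirroring the WITH clause above.
people = [
    {"id": 1, "name": "xxx", "tripNumber": [20, 25, 40],
     "startPlace": ["Portland", "California", "America"],
     "endPlace": ["Atlanta", "Atlanta", "Africa"]},
    {"id": 2, "name": "EKVV", "tripNumber": [40, 37],
     "startPlace": ["America", "Argentina"],
     "endPlace": ["Africa", "Carolina"]},
]


def safe_offset(values, off):
    """Hypothetical helper: values[off], or None when off is out of range."""
    return values[off] if 0 <= off < len(values) else None


matches = []
for person in people:
    # enumerate() plays the role of UNNEST(tripNumber) ... WITH OFFSET off
    for off, trip in enumerate(person["tripNumber"]):
        start_place = safe_offset(person["startPlace"], off)
        end_place = safe_offset(person["endPlace"], off)
        if trip == 40 and end_place == "Africa":
            matches.append((person["id"], person["name"], trip,
                            start_place, end_place))

for row in matches:
    print(row)
```

The point of the sketch is that the three arrays are only tied together by their index, so a row with arrays of different lengths would produce None for the missing positions rather than an error.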
The next solution is based on the more common pattern of a single repeated record.
If the person table instead looks like this:
Row  id  name  trips.tripNumber  trips.startPlace  trips.endPlace
1    1   xxx   20                Portland          Atlanta
               25                California        Atlanta
               40                America           Africa
2    2   EKVV  40                America           Africa
               37                Argentina         Carolina
then the solution would be:
#standardSQL
WITH `project.dataset.person` AS (
SELECT 1 id, 'xxx' name, [STRUCT<tripNumber INT64, startPlace STRING, endPlace STRING>(20, 'Portland', 'Atlanta'),(25, 'California', 'Atlanta'),(40, 'America', 'Africa')] trips UNION ALL
SELECT 2, 'EKVV', [STRUCT(40, 'America', 'Africa'),(37, 'Argentina', 'Carolina')]
)
SELECT id, name, tripNumber, startPlace, endPlace
FROM `project.dataset.person`,
UNNEST(trips) trip
WHERE tripNumber = 40
AND endPlace = 'Africa'
with result
Row  id  name  tripNumber  startPlace  endPlace
1    1   xxx   40          America     Africa
2    2   EKVV  40          America     Africa

Related

Can not use group by function

Data in the table:
wkt  Partners             Team          Opponent      Runs  Balls
1    S Hope & E Lewis     WEST INDIES   SOUTH AFRICA  43    66
2    S Hope & S Hetmyer   WEST INDIES   SOUTH AFRICA  70    79
3    D Bravo & S Hetmyer  WEST INDIES   SOUTH AFRICA  84    97
1    J Malan & Q Kock     SOUTH AFRICA  WEST INDIES   3     4
2    J Malan & F Plessis  SOUTH AFRICA  WEST INDIES   32    44
3    J Malan & R Dussen   SOUTH AFRICA  WEST INDIES   100   90
1    S Dhawan & R Sharma  INDIA         IRELAND       3     8
2    V Kohli & R Sharma   INDIA         IRELAND       102   70
I want to return the pair of partners, the team they belong to, and the opponent they played against, once for each wkt: the row where Runs is highest for that wkt.
For the table above, I'd like the following result:
wkt  Partners             Team          Opponent      Runs  Balls
1    S Hope & E Lewis     WEST INDIES   SOUTH AFRICA  43    66
2    V Kohli & R Sharma   INDIA         IRELAND       102   70
3    J Malan & R Dussen   SOUTH AFRICA  WEST INDIES   100   90
This is the code I've used:
SELECT wkt, Partners, Team, Opponent, max(Runs), Balls
FROM Partnerships
GROUP BY wkt
But I'm stuck with the following error:
Column 'Partnerships.Partners' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
How about row_number()?
select p.*
from (select p.*, row_number() over (partition by wkt order by runs desc) as seqnum
from Partnerships p
) p
where seqnum = 1;
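If you want to try the row_number() approach without a SQL Server instance, the following is a rough sketch using Python's built-in sqlite3 module. It assumes the bundled SQLite is version 3.25 or newer (window functions were added then); the in-memory database and the `ranked` alias are just for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Partnerships "
    "(wkt INTEGER, Partners TEXT, Team TEXT, Opponent TEXT, Runs INTEGER, Balls INTEGER)"
)
conn.executemany(
    "INSERT INTO Partnerships VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "S Hope & E Lewis", "WEST INDIES", "SOUTH AFRICA", 43, 66),
        (2, "S Hope & S Hetmyer", "WEST INDIES", "SOUTH AFRICA", 70, 79),
        (3, "D Bravo & S Hetmyer", "WEST INDIES", "SOUTH AFRICA", 84, 97),
        (1, "J Malan & Q Kock", "SOUTH AFRICA", "WEST INDIES", 3, 4),
        (2, "J Malan & F Plessis", "SOUTH AFRICA", "WEST INDIES", 32, 44),
        (3, "J Malan & R Dussen", "SOUTH AFRICA", "WEST INDIES", 100, 90),
        (1, "S Dhawan & R Sharma", "INDIA", "IRELAND", 3, 8),
        (2, "V Kohli & R Sharma", "INDIA", "IRELAND", 102, 70),
    ],
)

# Number the rows inside each wkt from highest Runs down, then keep row 1.
best = conn.execute(
    """
    SELECT wkt, Partners, Team, Opponent, Runs, Balls
    FROM (SELECT p.*,
                 ROW_NUMBER() OVER (PARTITION BY wkt ORDER BY Runs DESC) AS seqnum
          FROM Partnerships p) AS ranked
    WHERE seqnum = 1
    ORDER BY wkt
    """
).fetchall()

for row in best:
    print(row)
```

Note that row_number() picks exactly one row per wkt even when two partnerships tie on Runs; which of the tied rows wins is arbitrary unless you add a tie-breaker to the ORDER BY.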

Calculate Columns Cumulative Sum and Percentage in SAS

I need some help creating a query in SAS PROC SQL.
Consider the following dataset, which has sales from different regions already bucketed into 3-hour chunks (it's only a subset; the actual data covers 24 hours):
Date ObsAtHour Region Sales
1/1/2018 2 Asia 76
1/1/2018 2 Africa 5
1/1/2018 5 Asia 14
1/1/2018 5 Africa 10
2/1/2018 2 Asia 40
2/1/2018 2 Africa 1
2/1/2018 5 Asia 15
2/1/2018 5 Africa 20
I get data covering the last 45 days.
I am trying to do two things:
1) Group by date, ObsAtHour and Region and get cumulative sum of Sales such that I get something like
Date ObsAtHour Region Sales CumSales
1/1/2018 2 Asia 76 76
1/1/2018 2 Africa 5 5
1/1/2018 5 Asia 14 90
1/1/2018 5 Africa 10 15
2/1/2018 2 Asia 40 40
2/1/2018 2 Africa 1 1
2/1/2018 5 Asia 15 55
2/1/2018 5 Africa 20 21
2) Get the percentage of each Region's daily sales that has been achieved by each ObsAtHour. It would look like:
Date ObsAtHour Region Sales CumSales Pct
1/1/2018 2 Asia 76 76 84%
1/1/2018 2 Africa 5 5 33%
1/1/2018 5 Asia 14 90 100%
1/1/2018 5 Africa 10 15 100%
2/1/2018 2 Asia 40 40 72%
2/1/2018 2 Africa 1 1 4.76%
2/1/2018 5 Asia 15 55 100%
2/1/2018 5 Africa 20 21 100%
Your help will be very appreciated.
Something like the following:
data have;
input Date:mmddyy10. ObsAtHour Region $ Sales;
format date mmddyy10.;
datalines;
1/1/2018 2 Asia 76
1/1/2018 2 Africa 5
1/1/2018 5 Asia 14
1/1/2018 5 Africa 10
2/1/2018 2 Asia 40
2/1/2018 2 Africa 1
2/1/2018 5 Asia 15
2/1/2018 5 Africa 20
;
proc sort data=have;
by date region;
run;
/* this gives the running (cumulative) sum within each date and region */
data have1;
format date mmddyy10.;
set have;
by date region;
if first.region then sumsales = sales;
else sumsales+sales;
run;
/* get the total sales from your initial table by group, join it back,
and calculate the percent */
proc sql;
select a.*, sumsales/tot_sales as per format =percent10.2 from
(select * from have1)a
inner join
(select region , date, sum(sales) as tot_sales
from have
group by 1, 2)b
on a.region =b.region
and a.date =b.date;
quit;
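For anyone who wants to check the numbers outside SAS, here is a small Python sketch of the same two steps: a running sum that resets at the start of each date/region group (the first.region logic), followed by a division by the group total. It is only an illustration of the logic, not a translation of the SAS internals; the variable names are made up.

```python
from collections import defaultdict

# (Date, ObsAtHour, Region, Sales) rows from the question.
rows = [
    ("1/1/2018", 2, "Asia", 76), ("1/1/2018", 2, "Africa", 5),
    ("1/1/2018", 5, "Asia", 14), ("1/1/2018", 5, "Africa", 10),
    ("2/1/2018", 2, "Asia", 40), ("2/1/2018", 2, "Africa", 1),
    ("2/1/2018", 5, "Asia", 15), ("2/1/2018", 5, "Africa", 20),
]

# Like PROC SORT BY date region. Python's sort is stable, so the ObsAtHour
# order inside each group is kept. The date strings are only used as group
# keys here, not compared as real dates.
ordered = sorted(rows, key=lambda r: (r[0], r[2]))

# Total sales per (date, region), the equivalent of the GROUP BY subquery.
totals = defaultdict(int)
for date, hour, region, sales in rows:
    totals[(date, region)] += sales

result = []
prev_key = None
running = 0
for date, hour, region, sales in ordered:
    key = (date, region)
    # Reset at the first row of a group, otherwise keep accumulating.
    running = sales if key != prev_key else running + sales
    prev_key = key
    result.append((date, hour, region, sales, running, running / totals[key]))

for date, hour, region, sales, cum, pct in result:
    print(date, hour, region, sales, cum, f"{pct:.2%}")
```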
The key to understanding the following query is that the cumulative levels will be called tiers. The tiers are used as part of the self-join criteria to restrict the items that are grouped for being summed.
Data
data have;
input Date :ddmmyy10. ObsAtHour Region $ Sales;
format Date yymmdd10.;
datalines;
1/1/2018 2 Asia 76
1/1/2018 2 Africa 5
1/1/2018 5 Asia 14
1/1/2018 5 Africa 10
2/1/2018 2 Asia 40
2/1/2018 2 Africa 1
2/1/2018 5 Asia 15
2/1/2018 5 Africa 20
run;
Sample query
The second query (percentage computation) runs off the result of the first query (cumulative computation). However, the first query could be embedded as a nested query within the second one.
proc sql;
create table want(label='Cumulative within day up to obsathour') as
select
tiers.Date
, tiers.ObsAtHour
, tiers.Region
, Sum(case when have.ObsAtHour = tiers.ObsAtHour then have.Sales else 0 end) as SalesAtTier
, Sum(have.Sales) as CumSales
, Count(*) as CumCount
from
have
join
(select distinct Date, ObsAtHour, Region from have) as tiers
on
have.Date = tiers.Date
and have.Region = tiers.Region
and have.ObsAtHour <= tiers.ObsAtHour
group by
tiers.Date, tiers.Region, tiers.ObsAtHour
order
by Date, ObsAtHour, Region
;
create table want2 as
select
cum.Date
, cum.ObsAtHour
, cum.Region
, cum.SalesAtTier
, cum.CumSales
, cum.CumSales / Sum(cum.SalesAtTier) as fraction format=Percent7.2
from
want as cum
group by
cum.Date, cum.Region
order by
cum.Date, cum.ObsAtHour, cum.Region
;
quit;
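The tier self-join does not depend on anything SAS-specific, so it can be tried with Python's built-in sqlite3 module. The sketch below assumes the dates are stored as ISO text (with 2/1/2018 read as 2 January, matching the ddmmyy informat). SQLite does not remerge summary statistics the way PROC SQL does, so the percentage step is done in plain Python here rather than in a second query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (Date TEXT, ObsAtHour INTEGER, Region TEXT, Sales INTEGER)")
conn.executemany(
    "INSERT INTO have VALUES (?, ?, ?, ?)",
    [
        ("2018-01-01", 2, "Asia", 76), ("2018-01-01", 2, "Africa", 5),
        ("2018-01-01", 5, "Asia", 14), ("2018-01-01", 5, "Africa", 10),
        ("2018-01-02", 2, "Asia", 40), ("2018-01-02", 2, "Africa", 1),
        ("2018-01-02", 5, "Asia", 15), ("2018-01-02", 5, "Africa", 20),
    ],
)

# Each tier (Date, ObsAtHour, Region) is joined to every row of the same
# date and region observed at or before that hour, then summed.
cumulative = conn.execute(
    """
    SELECT tiers.Date, tiers.ObsAtHour, tiers.Region,
           SUM(CASE WHEN have.ObsAtHour = tiers.ObsAtHour
                    THEN have.Sales ELSE 0 END) AS SalesAtTier,
           SUM(have.Sales) AS CumSales
    FROM have
    JOIN (SELECT DISTINCT Date, ObsAtHour, Region FROM have) AS tiers
      ON have.Date = tiers.Date
     AND have.Region = tiers.Region
     AND have.ObsAtHour <= tiers.ObsAtHour
    GROUP BY tiers.Date, tiers.Region, tiers.ObsAtHour
    ORDER BY tiers.Date, tiers.ObsAtHour, tiers.Region
    """
).fetchall()

# Daily total per (Date, Region) is the sum of SalesAtTier over its tiers.
day_totals = {}
for date, hour, region, at_tier, cum in cumulative:
    day_totals[(date, region)] = day_totals.get((date, region), 0) + at_tier

fractions = [
    (date, hour, region, cum / day_totals[(date, region)])
    for date, hour, region, at_tier, cum in cumulative
]

for row in cumulative:
    print(row)
```

The self-join is quadratic within each date/region group, which is fine for a handful of 3-hour buckets per day but worth keeping in mind for finer-grained data.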

SQL Select Distinct returning duplicates

I am trying to return the country, golfer name, golfer age, and average drive for the golfers with the highest average drive from each country.
However, I am getting a result set with duplicates of the same country. What am I doing wrong? Here is my code:
select distinct country, name, age, avgdrive
from pga.golfers S1
inner join
(select max(avgdrive) as MaxDrive
from pga.golfers
group by country) S2
on S1.avgdrive = s2.MaxDrive
order by avgdrive;
These are the results I've been getting back. I should only be getting 15 rows, but instead I'm getting 20:
COUN NAME AGE AVGDRIVE
---- ------------------------------ ---------- ----------
Can Mike Weir 35 279.9
T&T Stephen Ames 41 285.8
USA Tim Petrovic 39 285.8
Ger Bernhard Langer 47 289.3
Swe Fredrik Jacobson 30 290
Jpn Ryuji Imada 28 290
Kor K.J. Choi 37 290.4
Eng Greg Owen 33 291.8
Ire Padraig Harrington 33 291.8
USA Scott McCarron 40 291.8
Eng Justin Rose 25 293.1
Ind Arjun Atwal 32 293.7
USA John Rollins 30 293.7
NIr Darren Clarke 37 294
Swe Daniel Chopra 31 297.2
Aus Adam Scott 25 300.6
Fij Vijay Singh 42 300.7
Spn Sergio Garcia 25 301.9
SAf Ernie Els 35 302.9
USA Tiger Woods 29 315.2
You are missing a join condition:
select s1.country, s1.name, s1.age, s1.avgdrive
from pga.golfers S1 inner join
(select country, max(avgdrive) as MaxDrive
from pga.golfers
group by country
) S2
on S1.avgdrive = s2.MaxDrive and s1.country = s2.country
order by s1.avgdrive;
Your problem is that some golfers in one country have the same average drive as the best golfer in another country, so they match that other country's maximum.
DISTINCT eliminates duplicate rows, not duplicate values in individual columns.
To get one row per country with that country's longest driver, the maximum has to be computed per country (GROUP BY country) and matched back on country as well as on avgdrive.
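Here is a tiny reproduction of the problem using Python's built-in sqlite3 module. The four golfers are invented for the illustration (they are not from the pga.golfers table): golfer B is not the best in the USA, but his average equals the best in Eng, so he slips through when the join is on avgdrive alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE golfers (country TEXT, name TEXT, age INTEGER, avgdrive REAL)")
conn.executemany(
    "INSERT INTO golfers VALUES (?, ?, ?, ?)",
    [
        ("USA", "A", 30, 300.0),  # best in USA
        ("USA", "B", 31, 290.0),  # not best in USA, but equals the best in Eng
        ("Eng", "C", 28, 290.0),  # best in Eng
        ("Eng", "D", 33, 280.0),
    ],
)

# Join on avgdrive only: any golfer matching ANY country's maximum is returned.
loose = conn.execute(
    """
    SELECT DISTINCT s1.country, s1.name, s1.age, s1.avgdrive
    FROM golfers s1
    JOIN (SELECT MAX(avgdrive) AS MaxDrive FROM golfers GROUP BY country) s2
      ON s1.avgdrive = s2.MaxDrive
    ORDER BY s1.name
    """
).fetchall()

# Join on avgdrive AND country: only each country's own best golfer is returned.
strict = conn.execute(
    """
    SELECT s1.country, s1.name, s1.age, s1.avgdrive
    FROM golfers s1
    JOIN (SELECT country, MAX(avgdrive) AS MaxDrive FROM golfers GROUP BY country) s2
      ON s1.avgdrive = s2.MaxDrive AND s1.country = s2.country
    ORDER BY s1.name
    """
).fetchall()

print("without country condition:", loose)
print("with country condition:   ", strict)
```

Even with the country condition, two golfers tied for the best average in the same country would both be returned, so "one row per country" still depends on there being no ties.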

Same-table Tree Table Query in SQL Server

I've searched but found nothing that could help.
I have the following table in a SQL Server 2005 database:
Parent Child Value
---- -------- ---------
America Mexico 8
America Canada 1
Asia Japan 5
Asia Korea 7
Europe Spain 0
Europe Italy 2
Africa Zimbabwe 1
Mexico Baja California 0
America USA 3
USA California 1
USA Texas 2
Parent and Child together form the primary key; Value is not important (IMO). I would like to create a view that results in something like this:
Parent Child Value
---- -------- ---------
America USA 3
USA California 1
USA Texas 2
I would search for America, and the result will give back every nested child there is, recursively, no matter how many it has, since I could include cities, localities, etc.
What I need is similar to what some call a BOM explosion.
Here is how you can do it:
with cte as (
select parent, child
from t
union all
select cte.parent, t.child
from cte join
t
on cte.child = t.parent
)
select cte.*
from cte
where parent = 'America';
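The recursion can be tried locally with Python's built-in sqlite3 module. This sketch is a slight variation on the query above, not the same query: it carries an extra `root` column (a name made up here) so that each output row keeps its immediate parent, as in the expected output of the question, while still filtering on the top-level ancestor. SQLite spells the construct WITH RECURSIVE, whereas SQL Server uses plain WITH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (parent TEXT, child TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("America", "Mexico", 8), ("America", "Canada", 1),
        ("Asia", "Japan", 5), ("Asia", "Korea", 7),
        ("Europe", "Spain", 0), ("Europe", "Italy", 2),
        ("Africa", "Zimbabwe", 1), ("Mexico", "Baja California", 0),
        ("America", "USA", 3), ("USA", "California", 1), ("USA", "Texas", 2),
    ],
)

descendants = conn.execute(
    """
    WITH RECURSIVE cte(root, parent, child, value) AS (
        -- anchor: every row starts a path rooted at its own parent
        SELECT parent, parent, child, value FROM t
        UNION ALL
        -- step: extend each path by the children of its last child
        SELECT cte.root, t.parent, t.child, t.value
        FROM cte JOIN t ON cte.child = t.parent
    )
    SELECT parent, child, value
    FROM cte
    WHERE root = 'America'
    ORDER BY parent, child
    """
).fetchall()

for row in descendants:
    print(row)
```

Because the anchor starts from every row, the CTE computes paths for all roots before the filter is applied; on a large table it may be cheaper to restrict the anchor to the root you are searching for.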

SQL Server: Merge Data Rows in single table in output

I have a SQL Server table with the following fields and sample data:
ID Name Address Age
23052-PF Peter Timbuktu 25
23052-D1 Jane Paris 22
23052-D2 David London 24
23050-PF Sam Beijing 22
23051-PF Nancy NYC 26
23051-D1 Carson Cali 22
23056-PF Grace LA 28
23056-D1 Smith Boston 23
23056-D2 Mark Adelaide 26
23056-D3 Hose Mexico 25
23056-D4 Mandy Victoria 24
Each ID with -PF is unique in the table.
Each ID with the -Dx is related to the same ID with the -PF.
Each ID with -PF may have 0 or more IDs with -Dx.
The maximum number of -Dx rows for a given -PF is 9.
i.e. an ID 11111-PF can have 11111-D1, 11111-D2, 11111-D3 up to 11111-D9.
Output expected for above sample data:
ID        ID (without suffix)  PF_Name  PF_Address  PF_Age  D_Name  D_Address  D_Age
23052-PF  23052                Peter    Timbuktu    25      Jane    Paris      22
23052-PF  23052                Peter    Timbuktu    25      David   London     24
23050-PF  23050                Sam      Beijing     22      NULL    NULL       NULL
23051-PF  23051                Nancy    NYC         26      Carson  Cali       22
23056-PF  23056                Grace    LA          28      Smith   Boston     23
23056-PF  23056                Grace    LA          28      Mark    Adelaide   26
23056-PF  23056                Grace    LA          28      Hose    Mexico     25
23056-PF  23056                Grace    LA          28      Mandy   Victoria   24
I need to be able to join the -PF and -Dx as above.
If a -PF has 0 Dx rows, then D_Name, D_Address and D_Age columns in the output should return NULL.
If a -PF has one or more Dx rows, then PF_Name, PF_Address and PF_Age should repeat for each row in the output and D_Name, D_Address and D_Age should contain the values from each related Dx row.
Need to use MSSQL.
Query should not use views or create additional tables.
Thanks for all your help!
select
pf.ID,
pf.IDNum,
pf.Name as PF_Name,
pf.Address as PF_Address,
pf.Age as PF_Age,
dx.Name as D_Name,
dx.Address as D_Address,
dx.Age as D_Age
from
(
select
ID, left(ID, 5) as IDNum, Name, Address, Age
from
mytable
where
right(ID, 3) = '-PF'
) pf
left outer join
(
select
ID, left(ID, 5) as IDNum, Name, Address, Age
from
mytable
where
right(ID, 3) != '-PF'
) dx
on pf.IDNum = dx.IDNum
SqlFiddle demo: http://sqlfiddle.com/#!6/dfdbb/1
SELECT t1.ID, LEFT(t1.ID,5) "ID (without Suffix)",
t1.Name "PF_Name", t1.Address "PF_Address", t1.Age "PF_Age",
t2.Name "D_Name", t2.Address "D_Address", t2.Age "D_Age"
FROM PFTable t1
LEFT JOIN PFTable t2 on LEFT(t1.ID,5) = LEFT(t2.ID,5)
AND RIGHT(t2.ID,2) <> 'PF'
WHERE RIGHT(t1.ID,2) = 'PF'
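To check the self-join against the sample data without SQL Server, here is a sketch using Python's built-in sqlite3 module. SQLite has no LEFT() or RIGHT(), so substr() stands in for them; that substitution, the `mytable` name, and the ORDER BY are choices made for this example. The important detail is that the "not a -PF row" test sits in the ON clause, so a -PF row with no -Dx rows still comes back with NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (ID TEXT, Name TEXT, Address TEXT, Age INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        ("23052-PF", "Peter", "Timbuktu", 25), ("23052-D1", "Jane", "Paris", 22),
        ("23052-D2", "David", "London", 24), ("23050-PF", "Sam", "Beijing", 22),
        ("23051-PF", "Nancy", "NYC", 26), ("23051-D1", "Carson", "Cali", 22),
        ("23056-PF", "Grace", "LA", 28), ("23056-D1", "Smith", "Boston", 23),
        ("23056-D2", "Mark", "Adelaide", 26), ("23056-D3", "Hose", "Mexico", 25),
        ("23056-D4", "Mandy", "Victoria", 24),
    ],
)

rows = conn.execute(
    """
    SELECT pf.ID, substr(pf.ID, 1, 5) AS IDNum,
           pf.Name, pf.Address, pf.Age,
           dx.Name, dx.Address, dx.Age
    FROM mytable pf
    LEFT JOIN mytable dx
      ON substr(dx.ID, 1, 5) = substr(pf.ID, 1, 5)  -- same numeric prefix
     AND substr(dx.ID, -2) <> 'PF'                  -- but only the -Dx rows
    WHERE substr(pf.ID, -2) = 'PF'                  -- left side: -PF rows only
    ORDER BY pf.ID, dx.ID
    """
).fetchall()

for row in rows:
    print(row)
```

Both this sketch and the queries above assume the numeric prefix is always five characters long; with variable-length prefixes you would need to split on the hyphen instead.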