How to combine tables in a (fairly complicated) BigQuery Query?


So I have a SQL Query that pulls the IP Addresses from the Google Workspace activity logs in BigQuery and then transforms them to Geolocation data.
This all works perfectly fine. However, I also want to pull in the email address of the users as well, and I can't figure out the best way to achieve this.
The email address is available in the table workspace-data.Logs.activity
Example to get the email address:
SELECT
DISTINCT(email)
FROM
`workspace-data.Logs.activity`
The IP Address is in the same table as email, accessed via ip_address
Here is my current SQL Query:
WITH
source_of_ip_addresses AS (
SELECT
REGEXP_REPLACE(SAFE_CONVERT_BYTES_TO_STRING(ip_address), 'xxx', '0') ip,
COUNT(1) c
FROM
`workspace-data.Logs.activity`
WHERE
SAFE_CONVERT_BYTES_TO_STRING(ip_address) IS NOT NULL
GROUP BY
1 )
SELECT
city_name,
country_name,
country_iso_code,
SUM(c) c,
ST_GEOGPOINT(AVG(longitude), AVG(latitude)) point
FROM (
SELECT
ip,
city_name,
country_name,
country_iso_code,
c,
latitude,
longitude,
geoname_id
FROM (
SELECT
*,
NET.SAFE_IP_FROM_STRING(ip) & NET.IP_NET_MASK(4,
mask) network_bin
FROM
source_of_ip_addresses,
UNNEST(GENERATE_ARRAY(9,32)) mask
WHERE
BYTE_LENGTH(NET.SAFE_IP_FROM_STRING(ip)) = 4 )
JOIN
`fh-bigquery.geocode.201806_geolite2_city_ipv4_locs`
USING
(network_bin,
mask) )
WHERE
city_name IS NOT NULL
GROUP BY
city_name,
country_name,
country_iso_code,
geoname_id
ORDER BY
c DESC
What would be the best way to incorporate the email addresses for the users alongside the transformed Geolocation data?
Currently, the query will output the following table:
Row city_name     country_name  country_iso_code c     point
1   Mountain View United States US               50639 POINT(-122.0574 37.4192)
2   Houston       United States US               24671 POINT(-95.3454 29.9668)
3   Jacksonville  United States US               6717  POINT(-81.6236 30.19675)
But I would like for it to output:
Row Email                 city_name     country_name  country_iso_code c   point
1   bob#someaddress.com   Mountain View United States US               165 POINT(-122.0574 37.4192)
2   frodo#someaddress.com Houston       United States US               134 POINT(-95.3454 29.9668)
3   darth#someaddress.com Jacksonville  United States US               292 POINT(-81.6236 30.19675)
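The page does not include an answer to this question. A common approach, offered here as a sketch and not a tested BigQuery solution, is to carry email through every stage. Select it in the source_of_ip_addresses CTE and change GROUP BY 1 to GROUP BY 1, 2. Add it to the column list of the middle subquery (the innermost SELECT * already passes it through). Then add it to the outer SELECT and GROUP BY. After this change, c counts events per (email, city) instead of per city. The Python/SQLite snippet below only illustrates that grouping. The activity and ip_locations tables, their rows, and the documentation-range IP addresses are hypothetical. The NET.* prefix-mask join is also replaced by an exact-match lookup, because SQLite has no NET functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical stand-in for the Workspace activity log: one row per event.
conn.execute("CREATE TABLE activity (email TEXT, ip_address TEXT)")
conn.executemany(
    "INSERT INTO activity VALUES (?, ?)",
    [
        ("bob@example.com", "192.0.2.1"),
        ("bob@example.com", "192.0.2.1"),
        ("frodo@example.com", "192.0.2.1"),
        ("frodo@example.com", "198.51.100.1"),
        ("darth@example.com", None),  # no IP address: filtered out below
    ],
)

# Hypothetical stand-in for the GeoLite2 table (exact IP match, no network masks).
conn.execute("CREATE TABLE ip_locations (ip TEXT, city_name TEXT, country_name TEXT)")
conn.executemany(
    "INSERT INTO ip_locations VALUES (?, ?, ?)",
    [
        ("192.0.2.1", "Mountain View", "United States"),
        ("198.51.100.1", "Houston", "United States"),
    ],
)

query = """
WITH source_of_ip_addresses AS (
    -- email is selected and grouped next to ip, so it survives the aggregation
    SELECT email, ip_address AS ip, COUNT(1) AS c
    FROM activity
    WHERE ip_address IS NOT NULL
    GROUP BY email, ip_address
)
SELECT s.email, l.city_name, l.country_name, SUM(s.c) AS events
FROM source_of_ip_addresses AS s
JOIN ip_locations AS l ON l.ip = s.ip
GROUP BY s.email, l.city_name, l.country_name   -- email added to the outer grouping
ORDER BY events DESC, s.email, l.city_name
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If you only need one row per city with all of its users, you could aggregate the emails instead of grouping by them (BigQuery offers STRING_AGG for that).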

Related

How to get country wise max populated city with population count

I am currently searching for a SQL query that does the following:
I have a table of all cities worldwide with their countries and population.
e.g. the table "city" (some columns)
name        country population
Berlin      Germany 3640000
New York    USA     8419000
Hamburg     Germany 1841000
Los Angeles USA     3967000
I now need to find the city with the highest population per country.
e.g. the desired result
name     population country
Berlin   3640000    Germany
New York 8419000    USA
The problem is that this query:
SELECT name, MAX(population) FROM city GROUP BY country
wouldn't return the appropriate name of the city. I know why that happens but am not sure how I could solve it in another way.
Any help is appreciated! Thanks!

An ANSI SQL solution using a correlated subquery, which works in almost any RDBMS:
create table city (name varchar(50),country varchar(50), population int);
insert into city values('Berlin' ,'Germany', 3640000);
insert into city values('New York' ,'USA', 8419000);
insert into city values('Hamburg' ,'Germany', 1841000);
insert into city values('Los Angeles' ,'USA', 3967000);
Query:
select name, population, country from city c
where population=(select max(population ) from city a where a.country=c.country)
Output:
name     population country
Berlin   3640000    Germany
New York 8419000    USA

You need to tag your DBMS, but this works in most databases, including MariaDB. Using a window function avoids hitting the city table twice:
select * from
(
select * , row_number() over (partition by country order by population desc) rn
from city
) t
where rn = 1

You can use NOT EXISTS:
SELECT * FROM city a
WHERE NOT EXISTS
(
SELECT 1 FROM city b
WHERE a.country = b.country
AND b.population > a.population
)

select * from city where (country, population) in
(
select country, max(population) from city group by country
);
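The four answers above can be compared on the question's sample data with Python's built-in sqlite3 module. SQLite is used only because it is easy to run locally; the row-value IN form needs SQLite 3.15+ and ROW_NUMBER needs 3.25+. The extra "Tie Town" row is hypothetical and is there to show how ties behave: the correlated subquery, NOT EXISTS, and IN versions return every city tied for the maximum, while ROW_NUMBER keeps exactly one row per country.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (name TEXT, country TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO city VALUES (?, ?, ?)",
    [
        ("Berlin", "Germany", 3640000),
        ("New York", "USA", 8419000),
        ("Hamburg", "Germany", 1841000),
        ("Los Angeles", "USA", 3967000),
    ],
)

QUERIES = {
    "correlated_subquery": """
        SELECT name, population, country FROM city c
        WHERE population = (SELECT MAX(population) FROM city a WHERE a.country = c.country)""",
    "row_number": """
        SELECT name, population, country FROM (
            SELECT *, ROW_NUMBER() OVER (PARTITION BY country ORDER BY population DESC) AS rn
            FROM city
        ) t
        WHERE rn = 1""",
    "not_exists": """
        SELECT name, population, country FROM city a
        WHERE NOT EXISTS (SELECT 1 FROM city b
                          WHERE a.country = b.country AND b.population > a.population)""",
    "row_value_in": """
        SELECT name, population, country FROM city
        WHERE (country, population) IN (SELECT country, MAX(population) FROM city GROUP BY country)""",
}


def run_all():
    # Sort each result because SQL does not guarantee row order without ORDER BY.
    return {label: sorted(conn.execute(sql).fetchall()) for label, sql in QUERIES.items()}


before_tie = run_all()

# Hypothetical row that ties Berlin's population, to show how each query treats ties.
conn.execute("INSERT INTO city VALUES ('Tie Town', 'Germany', 3640000)")
after_tie = run_all()

for label, rows in after_tie.items():
    print(label, rows)
```

If you need exactly one row per country even when there are ties, use ROW_NUMBER with an extra tiebreaker column in its ORDER BY. Use one of the other forms when every tied city should be listed.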

sql, select only such rows that share a predefined value

I am trying to select the rows that share one specific value in one of the columns. For example, the table below lists airlines and the cities they fly to. I need to select only the airlines that fly exclusively to the USA; in the table below, that would be only airline2. The city is not important for the moment.
airline country_destination city_destination
airline1 usa washington
airline1 eng london
airline1 fra paris
airline2 usa new york
airline2 usa chicago
airline2 usa washington
airline3 can montreal
airline3 usa new york
airline3 can toronto
My first guess returns all the airlines, because 'usa' appears at least once for each of them:
select distinct airline from table1 where country_destination = 'usa'
I assume I need a nested SELECT and probably a GROUP BY on airline? Something in the direction of what I have below? But I am stuck at this point. Any help is highly appreciated!
select airline, country_destination
from (select airline, country_destination from table1 where country_destination = 'usa' group by airline)

You do need to aggregate.
This is the simplest way I know of to do it:
select airline
from table1
group by airline
having min(country_destination) = max(country_destination)
and min(country_destination) = 'usa';

One method is to check if the airline's row count matches the conditional count (in this case, the amount of rows where the destination is 'usa').
The CTE aggregates the airline data. In the SELECT-statement you can apply the filter to only include airlines where the total row count equals the row count for destination 'usa'. If the counts between count_all and count_usa differ you know there were other country destinations.
with counts as ( select airline,
count(*) as count_all,
sum(case when country_destination = 'usa' then 1 end) as count_usa,
from table1
group by 1 )
select airline
from counts
where count_all = count_usa;

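You can use an inner query that keeps only the distinct (airline, country_destination) pairs, then keep the airlines that have exactly one destination country, provided that country is 'usa':
select airline from
(select distinct airline, country_destination from table1 ) t
group by airline
having count(*) = 1 and max(country_destination) = 'usa';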
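The three aggregate-based answers can be checked against the question's table with Python's built-in sqlite3 module. This is a sketch, with SQLite standing in for whatever database the asker uses. Each query should return only airline2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (airline TEXT, country_destination TEXT, city_destination TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        ("airline1", "usa", "washington"),
        ("airline1", "eng", "london"),
        ("airline1", "fra", "paris"),
        ("airline2", "usa", "new york"),
        ("airline2", "usa", "chicago"),
        ("airline2", "usa", "washington"),
        ("airline3", "can", "montreal"),
        ("airline3", "usa", "new york"),
        ("airline3", "can", "toronto"),
    ],
)

QUERIES = {
    # Only one distinct destination if MIN equals MAX, and it must be 'usa'.
    "min_equals_max": """
        SELECT airline FROM table1 GROUP BY airline
        HAVING MIN(country_destination) = MAX(country_destination)
           AND MIN(country_destination) = 'usa'""",
    # Every row of the airline must be a 'usa' row.
    "conditional_count": """
        WITH counts AS (
            SELECT airline,
                   COUNT(*) AS count_all,
                   SUM(CASE WHEN country_destination = 'usa' THEN 1 ELSE 0 END) AS count_usa
            FROM table1
            GROUP BY airline)
        SELECT airline FROM counts WHERE count_all = count_usa""",
    # Exactly one distinct destination country, and it is 'usa'.
    "distinct_count": """
        SELECT airline FROM (SELECT DISTINCT airline, country_destination FROM table1) t
        GROUP BY airline
        HAVING COUNT(*) = 1 AND MAX(country_destination) = 'usa'""",
}

results = {label: conn.execute(sql).fetchall() for label, sql in QUERIES.items()}
print(results)
```

One difference shows up only with NULLs. MIN and MAX ignore NULL values, so a row with a NULL destination would not disqualify an airline under the first query. The two count-based queries would exclude that airline.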

return max purchase from column and group by country?

I am looking to group by country and find the max purchase related to that country.
SELECT
country ,
customer_name,
total_purchased
FROM total
GROUP BY 1,2
ORDER BY 1
output:
country customer_name total_purchased
Australia Diego Gutiérrez 39.6
Australia Mark Taylor 81.18
Austria Astrid Gruber 69.3
Belgium Daan Peeters 60.3899999
Brazil Alexandre Rocha 69.3
Brazil Eduardo Martins 60.39
I am looking for a way to return the best customer of each country, where the best customer is the person in that country who spent the most money.
E.g., Australia has two customers; I want the result to contain one Australia row, with the customer who has the max purchase. How can I do it? I have tried, but I couldn't figure out a way so far.
Desired output:
country customer_name total_purchased
Australia Mark Taylor 81.18
Austria Astrid Gruber 69.3
Belgium Daan Peeters 60.3899999
Brazil Alexandre Rocha 69.3

I don't know which SQL dialect you are using, but analytic (window) functions can help, although they are not available in older MySQL versions (before 8.0).
If you CAN use analytic functions, something like this might work:
SELECT country,
customer_name,
total_purchased
FROM (
SELECT
country,
customer_name,
total_purchased,
RANK() OVER (PARTITION BY country ORDER BY total_purchased DESC) AS rnk
FROM total
) a
WHERE rnk = 1
If you can't use analytic functions, you can do something like this:
SELECT a.country,
a.customer_name,
a.total_purchased
FROM total a
JOIN (
SELECT
country,
MAX(total_purchased) AS max_purchased
FROM total
GROUP BY country
) b
ON a.country = b.country
AND a.total_purchased = b.max_purchased
This query should return the expected result. If two customers in the same country tie on the maximum total_purchased value, both customers are returned (the RANK() version behaves the same way).
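Both queries from this answer can be run on the question's sample rows with Python's built-in sqlite3 module. This is a sketch: SQLite 3.25+ is assumed for RANK(), and the window-function alias is rnk because RANK is a reserved word in MySQL 8. On this data both queries return Mark Taylor, Astrid Gruber, Daan Peeters and Alexandre Rocha. For Brazil, 69.3 is the higher total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE total (country TEXT, customer_name TEXT, total_purchased REAL)")
conn.executemany(
    "INSERT INTO total VALUES (?, ?, ?)",
    [
        ("Australia", "Diego Gutiérrez", 39.6),
        ("Australia", "Mark Taylor", 81.18),
        ("Austria", "Astrid Gruber", 69.3),
        ("Belgium", "Daan Peeters", 60.3899999),
        ("Brazil", "Alexandre Rocha", 69.3),
        ("Brazil", "Eduardo Martins", 60.39),
    ],
)

# Window-function version: rank customers inside each country, keep rank 1.
RANKED = """
SELECT country, customer_name, total_purchased
FROM (
    SELECT country, customer_name, total_purchased,
           RANK() OVER (PARTITION BY country ORDER BY total_purchased DESC) AS rnk
    FROM total
) AS a
WHERE rnk = 1
ORDER BY country
"""

# Portable version: join each row to its country's maximum.
JOINED = """
SELECT a.country, a.customer_name, a.total_purchased
FROM total AS a
JOIN (
    SELECT country, MAX(total_purchased) AS max_purchased
    FROM total
    GROUP BY country
) AS b
  ON a.country = b.country
 AND a.total_purchased = b.max_purchased
ORDER BY a.country
"""

ranked = conn.execute(RANKED).fetchall()
joined = conn.execute(JOINED).fetchall()
print(ranked)
```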

Getting sub data from a list of facilities

I am trying to write a query and would like some help if possible. Thanks in advance.
I have a table of facility data (~100k rows) that I get from a public source. That data contains several records for what I would consider the same place (same name, city and state); they just have different suite numbers. The other relevant detail is that the data has a selection counter that I increment whenever someone chooses one of the facilities. This way, I can use the selection count along with some other weight calculations to rank results higher in a list.
What I am trying to do is write a query that, when someone enters a search term, shows only one record per facility (the one with the highest selection count) and omits the rest.
Note: I do not want to do any preprocessing to the data as it is going to get re-loaded monthly.
Schema:
ID
Name
Address 1
Address 2
City
State
Zip
Phone
Selection Count
Example Search: "women"
ID Name City State Selection Count
1 Brigham & Women's Hospital Boston MA 22
2 Brigham & Women's Hospital Cambridge MA 0
3 Brigham & Women's Hospital Boston MA 5
4 Brigham & Women's Hospital Boston MA 1
5 Brigham & Women's Hospital Orlando FL 3
6 Woman's Hospital of Detroit Detroit MI 100
7 Brigham & Women's Hospital Boston MA 0
8 Woman's Hospital of Detroit Detroit MI 55
What I'd like is a result set that contains IDs 1, 2, 5 and 6.
IDs 1, 3, 4 and 7 are the same facility, so bring back the one with the top selection count. The same goes for 6 and 8.
I am sure that there is a having and a top clause in here somewhere, but I have not been able to get this to do what I want.
Thoughts?

How about
select id, name, city, state, selcount from t
where exists
(
select 1 from
(select name, city, state, max(selcount) selcount
from t
group by name, city, state) s
where s.name = t.name and s.city = t.city and s.state = t.state and s.selcount = t.selcount
)

WITH cteRowNum AS (
SELECT ID, Name, City, State, [Selection Count],
ROW_NUMBER() OVER(PARTITION BY Name, City, State ORDER BY [Selection Count] DESC) AS RowNum
FROM YourTable
)
SELECT ID, Name, City, State, [Selection Count]
FROM cteRowNum
WHERE RowNum = 1;
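Both answers can be checked on the question's example rows with Python's built-in sqlite3 module. This is a sketch under a few assumptions. The table name facility and the column name selection_count are hypothetical, since SQLite has no [bracketed] identifiers. The search filter is left out. ID is added as a tiebreaker in the ROW_NUMBER version, so ties on the count pick the lowest ID deterministically. Both queries should return IDs 1, 2, 5 and 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE facility (id INTEGER, name TEXT, city TEXT, state TEXT, selection_count INTEGER)"
)
conn.executemany(
    "INSERT INTO facility VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Brigham & Women's Hospital", "Boston", "MA", 22),
        (2, "Brigham & Women's Hospital", "Cambridge", "MA", 0),
        (3, "Brigham & Women's Hospital", "Boston", "MA", 5),
        (4, "Brigham & Women's Hospital", "Boston", "MA", 1),
        (5, "Brigham & Women's Hospital", "Orlando", "FL", 3),
        (6, "Woman's Hospital of Detroit", "Detroit", "MI", 100),
        (7, "Brigham & Women's Hospital", "Boston", "MA", 0),
        (8, "Woman's Hospital of Detroit", "Detroit", "MI", 55),
    ],
)

# ROW_NUMBER version: exactly one row per (name, city, state).
ROW_NUMBER_SQL = """
SELECT id FROM (
    SELECT id,
           ROW_NUMBER() OVER (PARTITION BY name, city, state
                              ORDER BY selection_count DESC, id) AS row_num
    FROM facility
) AS ranked
WHERE row_num = 1
ORDER BY id
"""

# EXISTS version: keeps every row that matches its group's maximum (ties included).
EXISTS_SQL = """
SELECT id FROM facility t
WHERE EXISTS (
    SELECT 1 FROM (
        SELECT name, city, state, MAX(selection_count) AS selection_count
        FROM facility
        GROUP BY name, city, state
    ) s
    WHERE s.name = t.name AND s.city = t.city AND s.state = t.state
      AND s.selection_count = t.selection_count
)
ORDER BY id
"""

row_number_ids = [r[0] for r in conn.execute(ROW_NUMBER_SQL)]
exists_ids = [r[0] for r in conn.execute(EXISTS_SQL)]
print(row_number_ids, exists_ids)
```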

How to join different columns of same table?

Suppose I have one table with two columns, Country and City.
Country
USA
Canada
UK
City
NY
London
I want to merge the values of both columns into a single column and expect output like this:
USA
Canada
UK
NY
London
So, what SQL query merges the values of different columns of the same table into one column?

SELECT Country FROM TABLE
UNION
SELECT City FROM Table
should do it.
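The UNION answer can be tried with Python's built-in sqlite3 module. The question shows three countries but only two cities. This sketch assumes, hypothetically, that the Canada row has a NULL City; the pairing of cities to countries is a guess. Note that UNION removes duplicates and returns that NULL as well, so each branch may need an IS NOT NULL filter. Use UNION ALL if duplicates should be kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations (Country TEXT, City TEXT)")
# Hypothetical rows: the question lists 3 countries but only 2 cities.
conn.executemany(
    "INSERT INTO locations VALUES (?, ?)",
    [("USA", "NY"), ("Canada", None), ("UK", "London")],
)

# Plain UNION: merges both columns and removes duplicates, but keeps the NULL.
plain = {row[0] for row in conn.execute(
    "SELECT Country FROM locations UNION SELECT City FROM locations")}

# Filtering each branch drops the NULL that comes from the missing city.
filtered = {row[0] for row in conn.execute(
    "SELECT Country FROM locations WHERE Country IS NOT NULL "
    "UNION SELECT City FROM locations WHERE City IS NOT NULL")}

print(sorted(filtered))
```

Sets are used here because UNION does not guarantee any output order. Add an ORDER BY if the countries must come before the cities.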

Responding to the comment "I am searching for any quick way. Because if I need to merge 10 columns then i have to write 10 Unions! Is there any other way?":
You can use UNPIVOT, which means you just need to add the column names to a list. The only thing to watch for is data types: all the columns in the UNPIVOT list must have the same type, e.g.:
--CTE for example only
;WITH CTE_Locations as (
select Country = convert(varchar(50),'USA'), City = convert(varchar(50),'NY')
union select Country = 'Canada', City = 'Vancouver'
union select Country = 'UK', City = 'Manchester'
)
--Select a list of values from all columns
select distinct
Place
from
CTE_Locations l
unpivot (Place for PlaceType in ([Country],[City])) u
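UNPIVOT is SQL Server syntax; SQLite and MySQL, for example, do not have it. In those engines a similar result can be obtained by cross-joining the table with a list of column names and choosing the matching column with CASE. This is a swapped-in technique, not the answer's own. The sketch below builds that query in Python from a list, so adding a column means adding one name to COLUMNS. The f-string interpolation is acceptable only because COLUMNS is a fixed, trusted list and never user input. Like UNPIVOT, the query drops NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE locations (Country TEXT, City TEXT)")
conn.executemany(
    "INSERT INTO locations VALUES (?, ?)",
    [("USA", "NY"), ("Canada", "Vancouver"), ("UK", "Manchester")],
)

# Trusted, hard-coded column names: add more names here instead of writing more UNIONs.
COLUMNS = ["Country", "City"]

# A one-column derived table with one row per column name.
col_list = " UNION ALL ".join(f"SELECT '{c}' AS col" for c in COLUMNS)
# CASE picks the value of whichever column this derived row names.
case_expr = " ".join(f"WHEN '{c}' THEN l.{c}" for c in COLUMNS)

query = f"""
SELECT DISTINCT place FROM (
    SELECT CASE k.col {case_expr} END AS place
    FROM locations AS l
    CROSS JOIN ({col_list}) AS k
) AS u
WHERE place IS NOT NULL
"""

places = {row[0] for row in conn.execute(query)}
print(sorted(places))
```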
