How to left join and divide value by columns matched in bigquery?

I have two tables: one contains UTMs and the other contains costs.
I would like to join them on medium and network, and split the joined cost among the locations that match on medium and network. For example, I want to turn
network     medium  location
----------  ------  ---------
facebook    CPC     tokyo
facebook    CPC     Singapore
facebook    CPC     tokyo
facebook    CPC     Malaysia
google ads  CPC     singapore
google ads  CPC     maldives

network     medium  cost
----------  ------  ----
facebook    cpc     4
google ads  cpc     4

into

network     medium  location   cost
----------  ------  ---------  ----
facebook    CPC     tokyo      1
facebook    CPC     Singapore  1
facebook    CPC     tokyo      1
facebook    CPC     Malaysia   1
google ads  CPC     singapore  2
google ads  CPC     maldives   2
So the cost is divided equally among the locations that match on network and medium.

Consider the approach below:
select u.*, cost / count(*) over(partition by u.network, u.medium) as cost
from utms u
left join costs c
on lower(u.network) = lower(c.network)
and lower(u.medium) = lower(c.medium)
Applied to the sample data in your question, each facebook row gets a cost of 1.0 and each google ads row gets a cost of 2.0.
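To sanity-check the arithmetic outside BigQuery, here is a minimal Python sketch of the same idea: match on lower-cased network and medium, then divide the matched cost by the number of utms rows in that group. The function name split_cost and the in-memory lists are assumptions for illustration only; they stand in for the utms and costs tables.

```python
from collections import Counter

# Sample rows copied from the question: (network, medium, location)
utms = [
    ("facebook", "CPC", "tokyo"),
    ("facebook", "CPC", "Singapore"),
    ("facebook", "CPC", "tokyo"),
    ("facebook", "CPC", "Malaysia"),
    ("google ads", "CPC", "singapore"),
    ("google ads", "CPC", "maldives"),
]
# (network, medium, cost)
costs = [("facebook", "cpc", 4), ("google ads", "cpc", 4)]


def split_cost(utms, costs):
    """Left join utms to costs on lower-cased (network, medium), splitting each cost evenly."""
    cost_by_key = {(n.lower(), m.lower()): c for n, m, c in costs}
    # Rows per group. Counting on the lower-cased key is an assumption; it equals
    # the query's partition when casing is consistent within utms.
    rows_per_key = Counter((n.lower(), m.lower()) for n, m, _ in utms)
    result = []
    for network, medium, location in utms:
        key = (network.lower(), medium.lower())
        total = cost_by_key.get(key)  # None when nothing matches, like NULL in a left join
        share = None if total is None else total / rows_per_key[key]
        result.append((network, medium, location, share))
    return result


rows = split_cost(utms, costs)
```

A utms row with no matching cost keeps a share of None, mirroring the NULL that the left join would produce.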

Related

How to assign equal revenue weight to every location of a company in a table? Google Big Query

I am working on a problem where I have the following table:
company_id | country | total revenue
-----------+---------+--------------
1          | Russia  | 1200
2          | Croatia | 1200
2          | Italy   | 1200
3          | USA     | 1200
3          | UK      | 1200
3          | Italy   | 1200
There are 3 companies in this table, but companies 2 and 3 have offices in 2 and 3 countries respectively. All companies pay 1200 per month, but because company 2 has 2 offices it shows as if it paid 1200 per month twice, and because company 3 has 3 offices it shows as if it paid 1200 per month three times. Instead, I would like revenue to be distributed equally based on how many times company_id appears in the table. A company_id appears once for each country in which the company is based.
Assuming each company always pays 1,200 per month, my desired output is:
company_id | country | total revenue
-----------+---------+--------------
1          | Russia  | 1200
2          | Croatia | 600
2          | Italy   | 600
3          | USA     | 400
3          | UK      | 400
3          | Italy   | 400
Being new to SQL, I was thinking this could maybe be done with a CASE WHEN statement, but I have only learned to use CASE WHEN to output a string depending on a condition. Here, I am trying to assign an equal revenue weight to each of a company's countries, depending on how many countries the company is based in.
Thank you in advance for your help!

Below is for BigQuery Standard SQL
#standardSQL
SELECT company_id, country,
total_revenue / (COUNT(1) OVER(PARTITION BY company_id)) AS total_revenue
FROM `project.dataset.table`
Applied to the sample data from your question, the output is
Row  company_id  country  total_revenue
1    1           Russia   1200.0
2    2           Croatia  600.0
3    2           Italy    600.0
4    3           USA      400.0
5    3           UK       400.0
6    3           Italy    400.0
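The window count in this query can be checked by hand with a short Python sketch. It counts how many rows each company_id has and divides that company's revenue by the count. The function name split_revenue is hypothetical, and the list below simply restates the sample table.

```python
from collections import Counter

# Sample rows from the question: (company_id, country, total_revenue)
table = [
    (1, "Russia", 1200),
    (2, "Croatia", 1200),
    (2, "Italy", 1200),
    (3, "USA", 1200),
    (3, "UK", 1200),
    (3, "Italy", 1200),
]


def split_revenue(rows):
    """Divide each row's revenue by the number of rows sharing its company_id."""
    # Plays the role of COUNT(1) OVER (PARTITION BY company_id)
    offices = Counter(company_id for company_id, _, _ in rows)
    return [
        (company_id, country, revenue / offices[company_id])
        for company_id, country, revenue in rows
    ]


result = split_revenue(table)
```

No CASE WHEN is needed here because the divisor is computed per row from the data itself rather than from a fixed list of conditions.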

turicreate visualizations (Google Colab environment) - SFrame.explore response "Materializing SFrame"

In a very simple notebook I load data into an SFrame named "Students".
When I execute "Students" I get the expected results (the values below are just cut and pasted, not my actual results):
First Name  Last Name  Country        age
Bob         Smith      United States  24
Alice       Williams   Canada         23
Malcolm     Jone       England        22
Felix       Brown      USA            23
Alex        Cooper     Poland         23
Tod         Campbell   United States  22
Derek       Ward       Switzerland    25
[7 rows x 4 columns]
But when I enter "Students.explore()" the only result I get is
"Materializing SFrame"
I expected a GUI with a rich display describing the data. That is what I get when I use graphlab.create in a notebook outside Google Colaboratory.
Below is the method description and a link to the turicreate API help.
SFrame.explore([title]): "Explore the SFrame in an interactive GUI." https://apple.github.io/turicreate/docs/api/turicreate.visualization.html

Google Colab runs in the cloud, so it can't open a new app window on your computer.
You may want to try a local runtime.

Aggregating against rest of the values

Hi, I need help analyzing the data below. The logic I need is that the sum for each provider should be divided by the sum for the rest of the providers. For example, based on the data below:
sum(east RISK) / (sum(west RISK) + sum(south RISK))
sum(west RISK) / (sum(east RISK) + sum(south RISK))
sum(south RISK) / (sum(east RISK) + sum(west RISK))
and so on.
Mbr  Provider  Group      Risk
1    east      Group      2.44
2    east      Group      0.05
3    east      Group      1.01
4    east      Group      0.14
5    west      Comp MRKT  0.32
6    west      Comp MRKT  2.12
7    south     Comp MRKT  5.78
8    south     Comp MRKT  1.11

I think you can use ANSI standard window functions for this purpose:
select provider,
       sum(risk) / (sum(sum(risk)) over () - sum(risk))
from t
group by provider;
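The query relies on the identity "rest of the providers = grand total minus this provider's total". A minimal Python sketch of that calculation on the sample data follows. The function name risk_vs_rest is hypothetical, and returning None when there is nothing left to divide by is an assumption of this sketch, not something the query does.

```python
from collections import defaultdict

# Sample rows from the question: (mbr, provider, group, risk)
members = [
    (1, "east", "Group", 2.44),
    (2, "east", "Group", 0.05),
    (3, "east", "Group", 1.01),
    (4, "east", "Group", 0.14),
    (5, "west", "Comp MRKT", 0.32),
    (6, "west", "Comp MRKT", 2.12),
    (7, "south", "Comp MRKT", 5.78),
    (8, "south", "Comp MRKT", 1.11),
]


def risk_vs_rest(rows):
    """Return each provider's risk total divided by the total of all other providers."""
    totals = defaultdict(float)
    for _mbr, provider, _group, risk in rows:
        totals[provider] += risk  # sum(risk) ... group by provider
    grand_total = sum(totals.values())  # sum(sum(risk)) over ()
    ratios = {}
    for provider, total in totals.items():
        rest = grand_total - total
        # With a single provider the rest is zero; this sketch returns None in that case.
        ratios[provider] = total / rest if rest else None
    return ratios


ratios = risk_vs_rest(members)
```

On the sample data the provider totals are roughly 3.64 (east), 2.44 (west) and 6.89 (south), so east is divided by about 9.33, west by about 10.53 and south by about 6.08.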

join over two tables

I have two tables (T1 and T2).
   shop   land
------------------
1  F24UK  UK
2  MDUK   UK
3  RDAUK  UK
4  EDOUK  UK
5  RDUK   UK
6  TIUK   UK

   shop   land  customertype
---------------------------------
1  RDUK   GB    B2C
2  RDUK   GB    B2C
3  MDUK   GB    B2C
4  MDUK   GB    B2C
I want to join on the column 'land', but the problem is that T2 has GB instead of UK. How can I solve this issue optimally?
Thanks

You can use a CASE expression:
t1.land=CASE WHEN t2.land='GB' THEN 'UK' ELSE t2.land END
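The effect of that join condition can be illustrated with a small Python sketch: normalize T2's land value first, then compare it with T1's. The helper names normalize_land and join_on_land are hypothetical, and the row-number column from the question is left out.

```python
# T1 rows from the question: (shop, land)
t1 = [
    ("F24UK", "UK"),
    ("MDUK", "UK"),
    ("RDAUK", "UK"),
    ("EDOUK", "UK"),
    ("RDUK", "UK"),
    ("TIUK", "UK"),
]
# T2 rows from the question: (shop, land, customertype)
t2 = [
    ("RDUK", "GB", "B2C"),
    ("RDUK", "GB", "B2C"),
    ("MDUK", "GB", "B2C"),
    ("MDUK", "GB", "B2C"),
]


def normalize_land(land):
    """Same mapping as CASE WHEN land = 'GB' THEN 'UK' ELSE land END."""
    return "UK" if land == "GB" else land


def join_on_land(left, right):
    """Inner join on land only, after mapping GB to UK on the right-hand side."""
    return [
        (l_shop, l_land, r_shop, r_type)
        for l_shop, l_land in left
        for r_shop, r_land, r_type in right
        if l_land == normalize_land(r_land)
    ]


joined = join_on_land(t1, t2)
```

Because the join uses land alone, every T1 row pairs with every T2 row in this sample; adding a condition on shop as well would narrow the matches.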

SOLR index implementation and configuration to support different pricing for millions of records

I have some knowledge of SOLR/Lucene, but have never really worked with it extensively.
We have six e-commerce web sites, where each B2B customer gets his/her own pricing for thousands of products sold on the sites. Some customers even get their own product descriptions and some don't, but all of them should be able to search either way.
We are planning to replace a 3rd party search with SOLR and would like to know how to set it up so that customer A won't get customer B's data (products, descriptions, and/or pricing), and so on. I am not sure if we need one index per customer or one large index with a unique token on each customer's records when indexing the data.
Current record count: close to 30 million combinations for thousands of customers. Simply assume that each customer has their own products, pricing, and descriptions. There are also pricing, manufacturer, and many other custom facets, so the customer can drill down to what they are looking for.
These are B2B sites where each customer gets their own pricing and custom names/descriptions of the products sold.
Example scenario:
item_number  name      price
-----------  --------  ------
123          Brush     $1.00   -- customer A might call this 'MyBrush' and customer B 'TomBrush'
234          shirt     $20.00
112          pencil    $1.50

Customer A pricing and descriptions.
item_number  name      price
-----------  --------  ------
123          MyBrush   $1.00
234          shirt     $20.00
112          pencil    $1.50

Customer B pricing and descriptions.
item_number  name      price
-----------  --------  ------
123          TomBrush  $1.10
234          shirt     $23.00
112          pencil    $1.70

Customer C pricing and descriptions.
item_number  name      price
-----------  --------  ------
123          CBrush    $1.11
234          shirt     $13.00
112          pencil    $2.70
and so on for thousands of customers, which results in 30 million pricing records, but customer A shouldn't see customer B's items and pricing.
