Regression with Big Query ML

I tried a linear regression with BigQuery ML.
For this I used the following test data:
nr1 nr2 x
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
10 10 10
11 11 11
12 12 12
I created a model with the following query:
CREATE MODEL `regression_model_9`
OPTIONS
  (model_type='linear_reg',
   input_label_cols=['x']) AS
SELECT
  nr1,
  nr2,
  x
FROM
  `reg_test`
After that I evaluated the model and now want to make a prediction, as described here:
https://cloud.google.com/bigquery/docs/bigqueryml-analyst-start
What do I have to do to predict a 13?
With the following query I get "Query returned zero records.":
SELECT
  x
FROM
  ML.PREDICT(MODEL `regression_model_9`,
    (
    SELECT
      x,
      nr1,
      nr2
    FROM
      `reg_test`
    WHERE nr1 = 13
    ))

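... what do I have to do to predict a 13?
Your query returns zero records because `reg_test` contains no row with nr1 = 13, so the subquery passes no rows to ML.PREDICT. Supply the feature values directly instead: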
#standardSQL
SELECT *
FROM ML.PREDICT(MODEL `yourproject.yourdataset.regression_model_9`,
(SELECT 13 nr1, 13 nr2))
The result looks something like this:
Row predicted_x nr1 nr2
1 12.999999982559942 13 13
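
To make the difference between the two queries concrete, here is a minimal pure-Python sketch. It is not how BigQuery ML fits or scores a model; the helper names fit_line and predict are made up for illustration, and since nr2 is identical to nr1 in the test data, the sketch fits on nr1 alone. It shows that filtering the training table for nr1 = 13 yields nothing to score, while a literal feature value can be scored directly.

```python
# Test data from the question: nr1 and nr2 are identical, the label is x.
rows = [(n, n, n) for n in range(1, 13)]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Simplification: nr2 duplicates nr1, so a single feature is enough here.
slope, intercept = fit_line([r[0] for r in rows], [r[2] for r in rows])


def predict(nr1):
    """Score one feature value with the fitted line (hypothetical helper)."""
    return slope * nr1 + intercept


# Filtering the training table for nr1 = 13 finds no rows to predict on ...
matching = [r for r in rows if r[0] == 13]
# ... whereas a literal feature value can always be scored.
predicted_x = predict(13)
print(len(matching), predicted_x)
```

The empty list corresponds to the "zero records" result in the question, and the literal value corresponds to `SELECT 13 nr1, 13 nr2` in the answer.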

Related

the 'combine' of a split-apply-combine in pd.groupby() works brilliantly, but I'm not sure why

I have a fragment of code similar to the one below. It works perfectly, but I'm not sure why I am so lucky.
groupby() is a split-apply-combine operation, so I understand why qf.groupby(qf.g).mean() returns a result with two rows: the mean() for each of a and b.
What's brilliant is that the combine step of qf.groupby(qf.g).cumsum() reassembles all the rows in their original order, as found in the starting df.
My question is, "Why am I able to count on this behavior?" I'm glad I can, but I cannot articulate why it's possible.
# split-apply-combine
import pandas as pd

# DataFrame with a value and an arbitrary category
qf = pd.DataFrame(data=list("aaabbaaaab"), columns=['g'])
qf['val'] = [1, 2, 3, 1, 2, 3, 4, 5, 6, 9]

print("applying mean() to members in each group of a,b ")
print(qf.groupby(qf.g)[['val']].mean())

print("\n\napplying cumsum() to members in each group of a,b ")
print(qf.groupby(qf.g)[['val']].cumsum())  # combined in the original index order, thankfully

# select the single 'val' column so the result is a Series aligned on qf's index
qf['running_totals'] = qf.groupby(qf.g)['val'].cumsum()
print(f"\n{qf}")
yields:
applying mean() to members in each group of a,b
        val
g
a  3.428571
b  4.000000


applying cumsum() to members in each group of a,b
   val
0    1
1    3
2    6
3    1
4    3
5    9
6   13
7   18
8   24
9   12

   g  val  running_totals
0  a    1               1
1  a    2               3
2  a    3               6
3  b    1               1
4  b    2               3
5  a    3               9
6  a    4              13
7  a    5              18
8  a    6              24
9  b    9              12
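
One way to see why the order can be relied on: pandas treats cumsum() as a transformation rather than an aggregation, and its group-by documentation describes transformations as returning a result indexed like the object being grouped. The pure-Python sketch below is not pandas' implementation; grouped_cumsum is a made-up name. It only illustrates the idea that each group remembers the original row positions, so the combine step can write every result back to where its row came from.

```python
def grouped_cumsum(keys, values):
    """Running total per group, returned in the original row order (illustrative only)."""
    # split: remember the original row positions belonging to each group
    positions = {}
    for i, key in enumerate(keys):
        positions.setdefault(key, []).append(i)

    # apply + combine: write each group's running total back to the
    # position its row came from, so the output lines up with the input
    result = [None] * len(values)
    for idxs in positions.values():
        total = 0
        for i in idxs:
            total += values[i]
            result[i] = total
    return result


keys = list("aaabbaaaab")
vals = [1, 2, 3, 1, 2, 3, 4, 5, 6, 9]
running_totals = grouped_cumsum(keys, vals)
print(running_totals)
```

An aggregation such as mean() has one value per group and therefore nothing to align with the original rows, which is why it comes back indexed by group key instead.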

SQLite: How to create a combination of unrelated elements of two queries?

I have one table that I need to get some metrics from.
For example I have the following table:
meas_count  skippings  links  extra
10          8          4.2    some
10          9          5.8    some
10          9          5.8    some_2
11          8          4.2    some
11          8          5.8    some
11          9          5.9    some
I need to get a view of an existing table in the following form for further work:
meas_count  skippings  links_min  links_max
10          8          0          4
10          8          4          5
10          8          5          6
10          9          0          4
10          9          4          5
10          9          5          6
11          8          0          4
11          8          4          5
11          8          5          6
11          9          0          4
11          9          4          5
11          9          5          6
At the moment I have two queries whose results I need to combine to get this table.
First query:
SELECT meas_count,skippings FROM current_stats GROUP BY meas_count,skippings
It produces the following:
meas_count  skippings
10          8
10          9
11          8
11          9
Second query:
SELECT
  LAG(rounded) OVER (ORDER BY rounded) AS links_min,
  rounded AS links_max
FROM
  (SELECT * FROM
    (SELECT ROUND(links, 1) AS rounded FROM current_stats)
   GROUP BY rounded ORDER BY rounded)
It produces the following:
links_min  links_max
NULL       4
4          5
5          6
I need something like the Cartesian product of the two result sets.
What query should I run to get the table in the form I need?
I also have an additional question: do the nested SELECTs slow down the second query?

You can do that with an INNER JOIN of the two subqueries without specifying a join condition; in SQLite this produces the Cartesian product, the same as a CROSS JOIN. That will give you every combination of the two sets of rows.
SELECT * FROM
  (
    SELECT meas_count, skippings
    FROM current_stats
    GROUP BY meas_count, skippings
  ) AS one
INNER JOIN
  (
    SELECT LAG(rounded) OVER (ORDER BY rounded) AS links_min,
           rounded AS links_max
    FROM
      (SELECT * FROM
        (SELECT ROUND(links, 1) AS rounded FROM current_stats)
       GROUP BY rounded
       ORDER BY rounded
      )
  ) AS two;
As for performance, that's really only an issue if there is a better way to do it. Nested SELECTs do take time, but the query optimizers in today's SQL engines are pretty good at determining what you MEANT from what you SAID.
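
The join-without-condition idea can be tried out with Python's built-in sqlite3 module. This is a minimal sketch, not the exact query above: to stay independent of window-function support it drops the LAG() step and simply pairs each distinct (meas_count, skippings) combination with each distinct links value. The table contents are copied from the question; everything else is an assumption made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE current_stats "
    "(meas_count INTEGER, skippings INTEGER, links REAL, extra TEXT)"
)
conn.executemany(
    "INSERT INTO current_stats VALUES (?, ?, ?, ?)",
    [
        (10, 8, 4.2, "some"),
        (10, 9, 5.8, "some"),
        (10, 9, 5.8, "some_2"),
        (11, 8, 4.2, "some"),
        (11, 8, 5.8, "some"),
        (11, 9, 5.9, "some"),
    ],
)

# Two independent subqueries joined without a join condition:
# every row of `one` is paired with every row of `two`.
rows = conn.execute(
    """
    SELECT one.meas_count, one.skippings, two.links
    FROM (SELECT DISTINCT meas_count, skippings FROM current_stats) AS one
    CROSS JOIN
         (SELECT DISTINCT links FROM current_stats) AS two
    ORDER BY one.meas_count, one.skippings, two.links
    """
).fetchall()

print(len(rows))
for row in rows:
    print(row)
conn.close()
```

With four distinct pairs and three distinct links values, the result has 4 x 3 rows, which is the "multiplication of sets" the question asks for.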

Convert column into the rows

This is my current result set of my query:
Question Sol25A Sol25B Sol25C Sol40A Sol40B
======================================================
A 1 4 2 6 0
B 2 3 2 1 9
C 6 7 1 0 8
======================================================
Total = 9 14 5 7 17
======================================================
And I want the result in this form:
Product Total
===============
Sol25A 9
Sol25B 14
Sol25C 5
Sol40A 7
Sol40B 17
Could you please provide the query for this? It would be a great help.

I would suggest that you unpivot using cross apply and then aggregate:
select v.product, sum(v.val) as total
from t cross apply
     (values ('Sol25A', t.Sol25A), ('Sol25B', t.Sol25B), ('Sol25C', t.Sol25C),
             ('Sol40A', t.Sol40A), ('Sol40B', t.Sol40B)
     ) v(product, val)
group by v.product;
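
The two steps of that query (unpivot, then aggregate) can be mirrored in a few lines of plain Python. This is only an illustrative sketch using the numbers from the question; the variable names are made up and it is not a replacement for doing the work in SQL.

```python
# Result set from the question, one dict per row (Questions A, B, C).
rows = [
    {"Question": "A", "Sol25A": 1, "Sol25B": 4, "Sol25C": 2, "Sol40A": 6, "Sol40B": 0},
    {"Question": "B", "Sol25A": 2, "Sol25B": 3, "Sol25C": 2, "Sol40A": 1, "Sol40B": 9},
    {"Question": "C", "Sol25A": 6, "Sol25B": 7, "Sol25C": 1, "Sol40A": 0, "Sol40B": 8},
]
products = ["Sol25A", "Sol25B", "Sol25C", "Sol40A", "Sol40B"]

# Unpivot: one (product, val) pair per input row and product column,
# which is what the VALUES list inside CROSS APPLY produces.
unpivoted = [(product, row[product]) for row in rows for product in products]

# Aggregate: sum val per product, like GROUP BY product.
totals = {}
for product, val in unpivoted:
    totals[product] = totals.get(product, 0) + val

for product in products:
    print(product, totals[product])
```

The totals match the "Total" line of the original result set.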

Using temporary extended table to make a sum

From a given table I want to be able to sum values having the same number (should be easy, right?)
Problem: A given value can be assigned to 2 to n consecutive numbers.
For some reason this information is stored in a single row describing the value, the starting number and the ending number, as below.
TABLE A
 id | starting_number | ending_number | value
----+-----------------+---------------+-------
  1 |               2 |             5 |     8
  2 |               0 |             3 |     5
  3 |               4 |             6 |     6
  4 |               7 |             8 |    10
For instance the first row means:
value '8' is assigned to numbers: 2, 3 and 4 (5 is excluded)
So, I would like the following intermediate result table:
TABLE B
 id | number | value
----+--------+-------
  1 |      2 |     8
  1 |      3 |     8
  1 |      4 |     8
  2 |      0 |     5
  2 |      1 |     5
  2 |      2 |     5
  3 |      4 |     6
  3 |      5 |     6
  4 |      7 |    10
So I can sum 'value' for elements having identical 'number'
SELECT number, sum(value)
FROM B
GROUP BY number
TABLE C
 number | sum(value)
--------+------------
      2 |         13
      3 |          8
      4 |         14
      0 |          5
      1 |          5
      5 |          6
      7 |         10
I don't know how to do this and couldn't find an answer on the web (maybe I wasn't searching with the right keywords...)
Any idea?

You can do what you want with generate_series(). So, TableB is basically:
select id, generate_series(starting_number, ending_number - 1, 1) as n, value
from tableA;
Your aggregation is then:
select n, sum(value)
from (select id, generate_series(starting_number, ending_number - 1, 1) as n, value
      from tableA
     ) a
group by n;
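
The same expand-then-sum logic can be checked quickly in plain Python, where range(start, end) excludes the end value just like generate_series(starting_number, ending_number - 1, 1) does here. This is a sketch for verifying the expected numbers from the question, with made-up variable names; it is not meant to replace the SQL.

```python
# TABLE A from the question: (id, starting_number, ending_number, value)
table_a = [
    (1, 2, 5, 8),
    (2, 0, 3, 5),
    (3, 4, 6, 6),
    (4, 7, 8, 10),
]

# Expand each row into one (id, number, value) row per covered number.
# The ending number is excluded, as stated in the question.
table_b = [
    (row_id, number, value)
    for row_id, start, end, value in table_a
    for number in range(start, end)
]

# Sum the values of rows that share the same number.
table_c = {}
for _, number, value in table_b:
    table_c[number] = table_c.get(number, 0) + value

for number in sorted(table_c):
    print(number, table_c[number])
```

The intermediate list corresponds to TABLE B and the final dictionary to TABLE C.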

Postgresql: Merge multiple geometries into single geometry using Join

Say I have two tables, geom_levels and taz_geoms where taz_geoms has the columns as follows:
taz_geoms
id(int) state(int) county(int) taz(int) geom(geometry(MultiPolygon,4326))
and geom_levels looks like this:
geom_levels
TAZ COUNTY STATE DISTRICT
1 1 29 1
2 1 29 1
3 1 29 1
4 2 29 2
5 2 29 2
6 2 29 2
7 2 29 3
8 3 29 3
9 3 29 3
10 3 29 4
11 3 29 4
12 3 29 4
13 4 29 5
14 4 29 5
15 4 29 5
16 4 29 6
17 4 29 6
How would I go about combining these taz geometries into county, state, and district geometries? I would like to have a county_geoms, state_geoms, and district_geoms table. I have seen that you can use ST_UNION with a geom array, but how would I generate such an array for counties or districts?
I was thinking something like this for counties:
SELECT ST_UNION(SELECT geom from taz_geoms GROUP BY county);
and for districts:
SELECT ST_UNION(SELECT geom from taz_geoms t LEFT JOIN geom_levels gl ON gl.taz = t.taz GROUP BY district);
But those options do not seem possible.
Ideas?

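Try:
SELECT ST_UNION( ARRAY( 'YOUR SELECT geoms QUERY' ) );
In your case, for a single district (here district 1):
SELECT ST_UNION(ARRAY(SELECT t.geom FROM taz_geoms t JOIN geom_levels gl ON gl.taz = t.taz WHERE gl.district = 1));
The subquery must return only the geom column; a GROUP BY district inside it is not valid because geom is not part of the grouping. ST_UNION can also be used as an aggregate function together with GROUP BY, which returns one merged geometry per county or district.
I had the same problem and got it to work in PostgreSQL using the ARRAY() constructor ;)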
