How to group rows overwriting column value in BigQuery SQL

I want to know how to group rows while keeping the value from one of the columns, as follows:
tabla_1
url | tags | pvs
www.helloworld.com | bigquery,sql | 200
www.helloworld.com | - | 100
www.byeworld.com | python,java | 250
www.byeworld.com | - | 150
and the desired result:
url | tags | pvs
www.helloworld.com | bigquery,sql | 300
www.byeworld.com | python,java | 400
I have tried creating two different tables (using filters) and then joining them. I suppose there's a much easier way, but I'm not able to find it:
SELECT url, tags, sum(pvs)
FROM
(
SELECT * FROM tabla_1 WHERE tags != '-'
LEFT JOIN
(SELECT * FROM tabla_1 WHERE tags = '-')
ON url
)

You can convert the '-' to NULL and pick the non-null value with the ARRAY_AGG function:
SELECT
url,
SUM(pvs) AS total_pvs,
ARRAY_AGG(IF(tags = '-', NULL, tags) IGNORE NULLS)[OFFSET(0)] AS tags
FROM tabla_1
GROUP BY url
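
For a quick local check outside BigQuery, the same grouping idea can be tried with Python's built-in sqlite3 module. This is only a sketch: SQLite has no ARRAY_AGG ... IGNORE NULLS, so it swaps in MAX(NULLIF(tags, '-')), which also skips NULLs, and it assumes each url has at most one distinct tag value other than '-'. The rows are the sample data from the question.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tabla_1 (url TEXT, tags TEXT, pvs INTEGER)")
conn.executemany(
    "INSERT INTO tabla_1 VALUES (?, ?, ?)",
    [
        ("www.helloworld.com", "bigquery,sql", 200),
        ("www.helloworld.com", "-", 100),
        ("www.byeworld.com", "python,java", 250),
        ("www.byeworld.com", "-", 150),
    ],
)

# NULLIF turns '-' into NULL; MAX ignores NULLs, so the real tag value survives.
grouped = conn.execute(
    """
    SELECT url,
           MAX(NULLIF(tags, '-')) AS tags,
           SUM(pvs) AS total_pvs
    FROM tabla_1
    GROUP BY url
    ORDER BY url
    """
).fetchall()

for row in grouped:
    print(row)
```

If a url could carry several different real tag values, MAX would silently keep only the alphabetically largest one, so the assumption above matters.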

Related

Remove duplicate reverse pairs

I am making a table of all product SKUs (fj1_sku column) paired with the product SKU of a second product which is a close substitute to that product (fj2_sku column) based on a range of criteria. To get the main matrix I have done a cross join of the final_join table with itself.
I cannot figure out how to remove the mirror duplicate pairs in the resulting table:
fj1_sku | fj2_sku
10 | 11
11 | 10
i.e. I only want one row with the pair 11 & 10.
I have tried a number of solutions found on this platform, but do not want to use a "select distinct" because I need many variables that I cannot use a distinct with.
Here is the code I have so far:
select
fj1.sku as fj1_sku,
fj2.sku as fj2_sku,
fj1.name as fj1_name,
fj2.name as fj2_name,
fj1.brand as fj1_brand,
fj2.brand as fj2_brand
from final_join as fj1
cross join final_join as fj2
where (fj1.sku <> fj2.sku) and (fj1.brand <> fj2.brand)
If you have duplicates for all pairs, then you can use:
select t.*
from t
where fj1_sku < fj2_sku;
If not, you can be more selective:
select t.*
from t
where t.fj1_sku < t.fj2_sku or
not exists (select 1
from t t2
where t2.fj1_sku = t.fj2_sku and
t2.fj2_sku = t.fj1_sku
);
You can also incorporate similar logic into a delete, if you actually want to change the table.
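
As a rough illustration of the first variant, the ordering condition can also go straight into the cross join from the question instead of filtering a finished table t. The sketch below runs it on SQLite through Python's sqlite3 module; the three sample products and their names and brands are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE final_join (sku INTEGER, name TEXT, brand TEXT)")
# Hypothetical sample products; only the column names come from the question.
conn.executemany(
    "INSERT INTO final_join VALUES (?, ?, ?)",
    [(10, "Widget", "Acme"), (11, "Gadget", "Globex"), (12, "Gizmo", "Acme")],
)

# Using < instead of <> keeps exactly one row of each mirrored pair:
# (10, 11) passes the filter, (11, 10) does not.
pairs = conn.execute(
    """
    SELECT fj1.sku AS fj1_sku, fj2.sku AS fj2_sku
    FROM final_join AS fj1
    CROSS JOIN final_join AS fj2
    WHERE fj1.sku < fj2.sku
      AND fj1.brand <> fj2.brand
    ORDER BY fj1_sku, fj2_sku
    """
).fetchall()

print(pairs)
```

Because every unordered pair appears in both orders in a full cross join, no DISTINCT is needed and the other columns can be selected freely.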

Extract values from repeated columns in an array with BigQuery

Each array contains information about which lists (internal_list_id) a certain contact (vid) belongs to.
I'm trying to include all internal_list_id (separated by comma) in one column grouped by vid.
The end data should look something like:
ContactID | ListMembership
3291601 | 1058,1060
I've tried with the below code but it returns information about the first object only:
SELECT list_memberships[offset(1)].vid ContactId, list_memberships[offset(1)].internal_list_id ListMembership FROM hs.contacts as c
The results below are obtained via:
SELECT list_memberships FROM hs.contacts as c
P.S. If you have any suggestions for a better title, please let me know. Thanks!
Use STRING_AGG(x) FROM UNNEST(array), as in:
WITH data AS (
SELECT visitStartTime, hits[OFFSET(0)].product
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
LIMIT 100
)
SELECT visitStartTime, (
SELECT STRING_AGG(FORMAT('$%i', localProductPrice), ', ')
FROM UNNEST(product)
) aggregated
FROM data

Querying Column Headers in GBQ

Is it possible to do a query to provide me an output with the column headers of a specific table? I'm uploading multiple files into our server via GBQ and while it auto-detects the headers, I would like to list out the headers either in rows or as a comma separated cell.
Thank you
I am assuming your files are in CSV format, so the table schema has no repeated fields. With this in mind, the query below is for BigQuery Standard SQL and requires only the fully qualified table name:
#standardSQL
SELECT
REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"(.+?)"') cols_as_array,
ARRAY_TO_STRING(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"(.+?)"'), ',') cols_as_string
FROM (SELECT 1) LEFT JOIN
(SELECT * FROM `project.dataset.table` WHERE FALSE) t
ON TRUE
Applied to a real table, as in the example below:
#standardSQL
SELECT
REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"(.+?)"') cols_as_array,
ARRAY_TO_STRING(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r'"(.+?)"'), ',') cols_as_string
FROM (SELECT 1) LEFT JOIN
(SELECT * FROM `bigquery-public-data.utility_us.us_states_area` WHERE FALSE) t
ON TRUE
the result will be:
Row | cols_as_array | cols_as_string
1 | [region_code, division_code, state_fips_code, state_gnis_code, state_geo_id, state_abbreviation, state_name, legal_area_code, feature_class_code, functional_status_code, area_land_meters, area_water_meters, internal_point_lat, internal_point_lon, state_geom] | region_code,division_code,state_fips_code,state_gnis_code,state_geo_id,state_abbreviation,state_name,legal_area_code,feature_class_code,functional_status_code,area_land_meters,area_water_meters,internal_point_lat,internal_point_lon,state_geom
You can choose which version to use: the list as an array or as a comma-separated string.
Also note that the above query does not incur any cost at all!

Completely Unique Rows and Columns in SQL

I want to randomly pick 4 rows which are distinct and do not have any entry that matches with any of the 4 chosen columns.
Here is what I coded:
SELECT DISTINCT en,dialect,fr FROM words ORDER BY RANDOM() LIMIT 4
Here is some data:
**en** **dialect** **fr**
number SFA numero
number TRI numero
hotel CAI hotel
hotel SFA hotel
I want:
**en** **dialect** **fr**
number SFA numero
hotel CAI hotel
Some retrieved rows have something in common with each other, such as the same en or the same fr. I would like to retrieve rows that do not share anything with each other. How do I do that?
I think I'd do this in the front-end code rather than the DB. Here's some pseudocode (I don't know what your code looks like):
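// Exclusion lists start with a dummy '' so a comma can always be appended.
var seenEn = "en not in (''";
var seenFr = "fr not in (''";
var rows = [];
while (rows.length < 4) {
  // sqlquery() stands in for whatever function runs a query and returns one row (or null).
  var newrow = sqlquery("SELECT * FROM table WHERE " + seenEn + ") and "
    + seenFr + ") ORDER BY random() LIMIT 1");
  if (!newrow)
    break; // no remaining row is disjoint from the ones already picked
  rows.push(newrow);
  // Remember the picked values so later queries exclude them.
  seenEn += ",'" + newrow.en + "'";
  seenFr += ",'" + newrow.fr + "'";
}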
The loop runs as many times as needed to retrieve 4 rows (or make it a for loop that runs 4 times) unless the query returns null. Each time the query returns a row, its values are added to a list of values we don't want the query to return again. That list has to start out with a value (the empty string) that is never in the data, to prevent a syntax error when concatenating a comma-value string onto the seenXX variable. Those syntax errors can be avoided in other ways, such as a Boolean for "if it's the first value, don't put the comma", but I chose to put ineffective dummy values into the SQL to make the JS simpler. Same goes for the
As noted, this looks like JS to ease your understanding, but it should be treated as pseudocode outlining a general algorithm. It has never been compiled, run, or tested, and may have syntax errors or not work at all as JS if pasted into your file; take the idea and work it into your solution.
Please note this was posted from an iPhone, which may have turned the apostrophes and quotes into the curly kind preferred by writers rather than the straight kind used by programmers.
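
The loop described above could look roughly like this in Python with the built-in sqlite3 module. It is a sketch, not a tested production routine: the function name pick_disjoint_rows is made up, the words table and its columns are taken from the question, and bound parameters replace string concatenation. It relies on SQLite accepting an empty NOT IN () list, which most other engines reject.

```python
import sqlite3


def pick_disjoint_rows(conn, limit=4):
    """Pick up to `limit` random rows that share no en and no fr value."""
    seen_en, seen_fr, rows = [], [], []
    while len(rows) < limit:
        # One "?" placeholder per value already seen (empty list on first pass).
        en_marks = ",".join("?" * len(seen_en))
        fr_marks = ",".join("?" * len(seen_fr))
        row = conn.execute(
            f"""
            SELECT en, dialect, fr FROM words
            WHERE en NOT IN ({en_marks}) AND fr NOT IN ({fr_marks})
            ORDER BY RANDOM() LIMIT 1
            """,
            seen_en + seen_fr,
        ).fetchone()
        if row is None:
            break  # nothing left that is disjoint from the picked rows
        rows.append(row)
        seen_en.append(row[0])
        seen_fr.append(row[2])
    return rows


# Sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (en TEXT, dialect TEXT, fr TEXT)")
conn.executemany(
    "INSERT INTO words VALUES (?, ?, ?)",
    [
        ("number", "SFA", "numero"),
        ("number", "TRI", "numero"),
        ("hotel", "CAI", "hotel"),
        ("hotel", "SFA", "hotel"),
    ],
)

picked = pick_disjoint_rows(conn)
print(picked)
```

With the four sample rows only two disjoint rows exist, so the loop stops early instead of returning four. The dialect column is not constrained here, matching the original pseudocode.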
You can use a rank, or find the first row for each group, to achieve your result.
See the code below; I hope it helps:
SELECT 'number' AS Col1, 'SFA' AS Col2, 'numero' AS Col3 INTO #tbl
UNION ALL
SELECT 'number','TRI','numero'
UNION ALL
SELECT 'hotel','CAI' ,'hotel'
UNION ALL
SELECT 'hotel','SFA','hotel'
UNION ALL
SELECT 'Location','LocationA' ,'Location data'
UNION ALL
SELECT 'Location','LocationB','Location data'
;
WITH summary AS (
SELECT Col1,Col2,Col3,
ROW_NUMBER() OVER(PARTITION BY p.Col1 ORDER BY p.Col2 DESC) AS rk
FROM #tbl p)
SELECT s.Col1,s.Col2,s.Col3
FROM summary s
WHERE s.rk = 1
DROP TABLE #tbl

Calculate number of fields

I have three tables:
Article(idArticle,NameArt)
Tag(idTag, NameTag)
ArtiTag(idArticle,idTag)
I want to have a result like this: NameTag,Count(Articles that belongs to that tag)
I tried the following:
SELECT Tag.NameTag , COUNT(DISTINCT(idArticle))
FROM ArtiTag, ArtiTag
but it always returns only one row, even though I have many tags and many related articles.
SELECT t.NameTag, COUNT(*)
FROM ArtiTag at
INNER JOIN Tag t
ON at.idTag = t.idTag
GROUP BY t.NameTag;
Select T.idTag, Max(nametag), count(artitag.idArticle) from Tag t
left join ArtiTag on t.idTag=ArtiTag.idTag
Group by t.idTag
This query outputs all tags, including tags with 0 articles.
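
To see the difference between the two answers, here is a small sketch using Python's sqlite3 module. The table and column names follow the question; the sample articles and tags, including a tag named 'unused' with no articles, are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Article (idArticle INTEGER, NameArt TEXT);
    CREATE TABLE Tag (idTag INTEGER, NameTag TEXT);
    CREATE TABLE ArtiTag (idArticle INTEGER, idTag INTEGER);
    -- Hypothetical sample data: tag 3 is attached to no article.
    INSERT INTO Article VALUES (1, 'Intro'), (2, 'Joins');
    INSERT INTO Tag VALUES (1, 'sql'), (2, 'python'), (3, 'unused');
    INSERT INTO ArtiTag VALUES (1, 1), (2, 1), (2, 2);
    """
)

# INNER JOIN: only tags that appear in ArtiTag are counted.
inner_counts = conn.execute(
    """
    SELECT t.NameTag, COUNT(*)
    FROM ArtiTag AS a
    INNER JOIN Tag AS t ON a.idTag = t.idTag
    GROUP BY t.NameTag
    ORDER BY t.NameTag
    """
).fetchall()

# LEFT JOIN from Tag: COUNT(a.idArticle) skips NULLs, so unused tags get 0.
left_counts = conn.execute(
    """
    SELECT t.NameTag, COUNT(a.idArticle)
    FROM Tag AS t
    LEFT JOIN ArtiTag AS a ON t.idTag = a.idTag
    GROUP BY t.idTag, t.NameTag
    ORDER BY t.NameTag
    """
).fetchall()

print(inner_counts)
print(left_counts)
```

Counting a.idArticle rather than * is what makes the LEFT JOIN version report 0 for a tag without articles; COUNT(*) would report 1 for it.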