What does Google BigQuery's string "t0" mean? - google-bigquery

I'm trying to understand Google BigQuery and I've seen this in a query: AS t0
I also see t0 attached to some metrics or dimensions, like this: t0.postId
Here is the full query I'm trying to understand:
SELECT t0.Author, COUNT(DISTINCT t0.postId, 50000) AS t0.calc_FPB538 FROM (SELECT
MAX(IF (hits.customDimensions.index = 10, hits.customDimensions.value, NULL)) WITHIN RECORD AS postId,
date(MAX(IF (hits.customDimensions.index = 4, hits.customDimensions.value, NULL))) WITHIN RECORD AS Datepublished,
MAX(IF (hits.customDimensions.index = 1, hits.customDimensions.value, NULL)) WITHIN RECORD AS Country,
MAX(IF (hits.customDimensions.index = 7, hits.customDimensions.value, NULL)) WITHIN RECORD AS Author,
FROM
[My_data.ga_sessions_20161104]) AS t0 WHERE (STRFTIME_UTC_USEC(TIMESTAMP_TO_USEC(TIMESTAMP(STRING(t0.Datepublished))), '%Y%m%d') >= '20161102' AND STRFTIME_UTC_USEC(TIMESTAMP_TO_USEC(TIMESTAMP(STRING(t0.Datepublished))), '%Y%m%d') <= '20161108') GROUP EACH BY t0.Author ORDER BY t0.calc_FPB538 DESC
What does it mean, and how should I use it?
Thanks.

I think you really need to find a tutorial on basic SQL query terms and methods, but in general (and I'm going to use general terms like "object", as this applies whether it's a table or not), when you see syntax like this:
[My_data.ga_sessions_20161104]) AS t0
you are saying: look at this object (here, the parenthesized subquery that reads from [My_data.ga_sessions_20161104]) and give it the label t0 so I can reference its columns/data points. Then when you later see things like t0.postId, you know it refers to a column of that object. This way, if you also reference another table that has a postId column, both you and the engine running the query know which one you are talking about.
You can also label columns/data points, as in your query with COUNT(DISTINCT t0.postId, 50000) AS t0.calc_FPB538. This says: count the distinct postId values and label the result t0.calc_FPB538, because I will want to reference it by that name later (or you just like your results to have specific names).
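A minimal sketch of the aliasing idea, run against Python's built-in sqlite3 module rather than BigQuery. The posts table, its columns and its rows are made up for illustration. The subquery in parentheses gets the label t0, and the outer query then refers to its columns as t0.author and t0.post_id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (post_id TEXT, author TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO posts VALUES (?, ?)",
    [("p1", "ann"), ("p2", "ann"), ("p2", "ann"), ("p3", "bob")],
)

# "AS t0" names the derived table; "AS post_count" names the computed column.
rows = conn.execute(
    """
    SELECT t0.author, COUNT(DISTINCT t0.post_id) AS post_count
    FROM (SELECT post_id, author FROM posts) AS t0
    GROUP BY t0.author
    ORDER BY post_count DESC
    """
).fetchall()
print(rows)
```

The name t0 itself carries no meaning; any identifier would do, and query-generating tools often number their aliases this way.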

Related

BigQuery Query working with multiple "likes" but not working with "in"

I would like to isolate some emails with specific titles. I can use multiple "like"s connected with ORs in the where clause. This gives me a number of results. However, if I try to do a ____ in ('____', '____', etc), the code suddenly returns nothing.
This does not work.
select DATE_TRUNC(DATE(send_time,"America/Los_Angeles"), week(monday)) as week,
status,
settings_title,
sum(emails_sent) as emails_sent,
sum(report_summary_opens) as report_summary_opens,
sum(report_summary_unique_opens) as report_summary_unique_opens,
sum(report_summary_subscriber_clicks) as report_summary_subscriber_clicks
from mailchimp.campaigns_view
where status = 'sent'
and settings_title in ('%_LL_%', '%_IC_%', '%_AC_%', '%_CC_%', '%_PC_%')
group by 1,2,3
order by 1 desc
However, this works.
select DATE_TRUNC(DATE(send_time,"America/Los_Angeles"), week(monday)) as week,
status,
settings_title,
sum(emails_sent) as emails_sent,
sum(report_summary_opens) as report_summary_opens,
sum(report_summary_unique_opens) as report_summary_unique_opens,
sum(report_summary_subscriber_clicks) as report_summary_subscriber_clicks
from mailchimp.campaigns_view
where status = 'sent'
and (settings_title like '%_LL_%'
or settings_title like '%_IC_%'
or settings_title like '%_AC_%'
or settings_title like '%_CC_%'
or settings_title like '%_PC_%')
group by 1,2,3
order by 1 desc
I have already tried to include a subquery in my "from" that eliminates all null settings_title. Any ideas why this is not working? Am I missing some small syntax error?
Thanks for the help!
The % symbol will only work with LIKE. For IN it's only equality. Try REGEXP_CONTAINS too.
As in:
SELECT REGEXP_CONTAINS("abcdefg", '(xxx|zzz|yyy|cd)')
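The difference between IN and LIKE can be reproduced on toy data. This is a sketch using Python's sqlite3 module as a stand-in for BigQuery; the campaigns table and its titles are invented for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE campaigns (settings_title TEXT)")  # hypothetical table
conn.executemany(
    "INSERT INTO campaigns VALUES (?)",
    [("2020_LL_spring",), ("2020_IC_summer",), ("newsletter",)],
)

# IN compares for equality, so '%' is just a literal character here.
in_rows = conn.execute(
    "SELECT settings_title FROM campaigns "
    "WHERE settings_title IN ('%_LL_%', '%_IC_%')"
).fetchall()

# LIKE treats '%' as "any run of characters" and '_' as "any single character".
like_rows = conn.execute(
    "SELECT settings_title FROM campaigns "
    "WHERE settings_title LIKE '%_LL_%' OR settings_title LIKE '%_IC_%' "
    "ORDER BY settings_title"
).fetchall()

print(in_rows, like_rows)
```

No title is literally equal to the string '%_LL_%', which is why the IN version comes back empty.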
Thanks Felipe, very useful! In my case I used REGEXP_CONTAINS to match against multiple patterns stored in a table. The first select aggregates the patterns into a single string, strPattern, and the second select uses it as the second argument of REGEXP_CONTAINS, so each product name is searched for every part of the pattern:
WITH CTE_PatternCovid as (
Select STRING_AGG(Pattern,'|') as strPattern from xxxxxxxxx.TEMP.Temp_patternsearch_covid 
)
--this converts the multiple patterns into a single string:
--.*MASK.FFP.|.*MASK.KN.|.*TEST.ANTIG.|.*MASK.QUI.|.*SP.*H.DRO.AL.
--then use it like this:
Select ProductName FROM xxxxxxxxx.TEMP.Table_ProductsName_covid 
where
regexp_contains (upper(ProductName),(SELECT strPattern FROM CTE_PatternCovid ))
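The same idea, joining several patterns with '|' and searching once, can be sketched in plain Python. Note that Python's re module is not the RE2 engine BigQuery uses, and the product names below are invented; only the three patterns are taken from the comment above:

```python
import re

# Patterns as they might be stored one per row in the pattern table.
patterns = [".*MASK.FFP.", ".*MASK.KN.", ".*TEST.ANTIG."]
combined = "|".join(patterns)  # plays the role of STRING_AGG(Pattern, '|')

product_names = ["Mask FFP2 box", "Hand cream", "Test antigen kit"]  # made-up data

# Upper-case the name first, as the query does with upper(ProductName).
matches = [name for name in product_names if re.search(combined, name.upper())]
print(matches)
```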

Access query with Count()

(Access 2010)
Can anyone tell the reason why column 2 and 3 retrieve exactly the same numbers? (they should not - different conditions and consequently different number of records)
Cannot see what's wrong...
SELECT tblAssssDB.[Division:], Count(([Mod New Outcome]="SELF EMPLOYED" And [ET Outcome]="GREY AREA")) AS [Undecided to Self-employed], Count(([Mod New Outcome]="EMPLOYED" And [ET Outcome]="GREY AREA")) AS [Undecided to Employed], Count(([Mod New Outcome]="SELF EMPLOYED" And [tblAssssDB].[ET Outcome]="EMPLOYED")) AS [Employed to Self-Employed], Count((IsNull([ET Outcome] And [ET Outcome]=[Mod New Outcome]))) AS [No change in Outcome]
FROM tblAssssDB
WHERE (((tblAssssDB.[ET Comment]) Is Not Null))
GROUP BY tblAssssDB.[Division:];
Help!!
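Count is not really intended to give you the number of rows matching a specific condition. It counts every row where the expression is not Null, and a comparison evaluates to True or False, both of which are non-Null, so columns with different conditions can end up with the same count. I think you would be better served by something like this:
SUM(IIF(criteria, 1, 0))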
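A small sketch of the difference, using Python's sqlite3 module instead of Access. The assessments table and its rows are hypothetical, and CASE WHEN is used here as the portable counterpart of IIF:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE assessments (division TEXT, new_outcome TEXT, et_outcome TEXT)"
)  # hypothetical stand-in for tblAssssDB
conn.executemany(
    "INSERT INTO assessments VALUES (?, ?, ?)",
    [
        ("North", "SELF EMPLOYED", "GREY AREA"),
        ("North", "EMPLOYED", "GREY AREA"),
        ("North", "EMPLOYED", "GREY AREA"),
    ],
)

row = conn.execute(
    """
    SELECT
        -- COUNT sees a non-NULL true/false value on every row, so it counts them all
        COUNT(new_outcome = 'SELF EMPLOYED' AND et_outcome = 'GREY AREA'),
        -- SUM adds 1 only for rows where the condition holds
        SUM(CASE WHEN new_outcome = 'SELF EMPLOYED' AND et_outcome = 'GREY AREA'
                 THEN 1 ELSE 0 END),
        SUM(CASE WHEN new_outcome = 'EMPLOYED' AND et_outcome = 'GREY AREA'
                 THEN 1 ELSE 0 END)
    FROM assessments
    GROUP BY division
    """
).fetchone()
print(row)
```

The first column reports all three rows regardless of the condition, while the two SUM columns split them into one and two.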

SQL "Count (Distinct...)" returns 1 less than actual data shows?

I have some data that doesn't appear to be counting correctly. When I look at the raw data I see 5 distinct values in a given column, but when I run a "Count (Distinct ColA)" it reports 4. This is true for all of the categories I am grouping by, too, not just one. E.g. a 2nd value in the column reports 2 when there are 3, a 3rd value reports 1 when there are 2, etc.
Table A: ID, Type
Table B: ID_FK, WorkID, Date
Here is my query that summarizes:
SELECT COUNT (DISTINCT B.ID_FK), A.Type
FROM A INNER JOIN B ON B.ID_FK = A.ID
WHERE Date > 5/1/2013 and Date < 5/2/2013
GROUP BY Type
ORDER BY Type
And a snippet of the results:
4|Business
2|Design
2|Developer
Here is a sample of my data, non-summarized. Pipe is the separator; I just removed the 'COUNT...' and 'GROUP BY...' parts of the query above to get this:
4507|Business
4515|Business
7882|Business
7889|Business
7889|Business
8004|Business
4761|Design
5594|Design
5594|Design
5594|Design
7736|Design
7736|Design
7736|Design
3132|Developer
3132|Developer
3132|Developer
4826|Developer
5403|Developer
As you can see from the data, Business should be 5, not 4, etc. At least that is what my eyes tell me. :)
I am running this inside a FileMaker 12 solution using its internal ExecuteSQL call. Don't be concerned by that too much, though: the code should be the same as nearly anything else. :)
Any help would be appreciated.
Thanks,
J
Try using a subquery:
SELECT COUNT(*), Type
FROM (SELECT DISTINCT B.ID_FK, A.Type Type
FROM A
INNER JOIN B ON B.ID_FK = A.ID
WHERE Date > 5/1/2013 and Date < 5/2/2013) x
GROUP BY Type
ORDER BY Type
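As a cross-check of what the counts should be, here is a sketch that loads the sample rows from the question into SQLite via Python's sqlite3 module and runs both forms. The date filter is left out and table B is reduced to its ID_FK column, so this only shows what a conforming engine returns, not what FileMaker does:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (ID INTEGER, Type TEXT)")
conn.execute("CREATE TABLE B (ID_FK INTEGER)")  # WorkID and Date omitted in this sketch

types = {
    "Business": [4507, 4515, 7882, 7889, 7889, 8004],
    "Design": [4761, 5594, 5594, 5594, 7736, 7736, 7736],
    "Developer": [3132, 3132, 3132, 4826, 5403],
}
for type_name, ids in types.items():
    conn.executemany("INSERT INTO A VALUES (?, ?)", [(i, type_name) for i in set(ids)])
    conn.executemany("INSERT INTO B VALUES (?)", [(i,) for i in ids])

direct = conn.execute(
    """
    SELECT COUNT(DISTINCT B.ID_FK), A.Type
    FROM A INNER JOIN B ON B.ID_FK = A.ID
    GROUP BY A.Type ORDER BY A.Type
    """
).fetchall()

via_subquery = conn.execute(
    """
    SELECT COUNT(*), Type
    FROM (SELECT DISTINCT B.ID_FK, A.Type AS Type
          FROM A INNER JOIN B ON B.ID_FK = A.ID) x
    GROUP BY Type ORDER BY Type
    """
).fetchall()
print(direct, via_subquery)
```

Both forms agree here, which supports the reading that the query logic is sound and the discrepancy comes from somewhere else.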
This could be a FileMaker issue, have you seen this post on the FileMaker forum? It describes the same issue (a count distinct smaller by 1) with 11V3 back in 03/2012 with a plug in, then updated with same issue with 12v3 in 11/2012 with ExecuteSQL. It didn't seem to be resolved in either case.
Other considerations might be whether there are any referential integrity constraints on the joined tables, or, if you can get a query execution plan, whether it is executing the query differently than expected. I'm not sure if FileMaker can do this.
I like Barmar's suggestion, it would sort twice.
If you are dealing with a bug, directing the COUNT DISTINCT, Join and/or Group By by structuring the query to make them happen at different times might work around it:
SELECT COUNT (DISTINCT x.ID), x.Type
FROM (SELECT A.ID ID, A.Type Type
FROM A
INNER JOIN B ON B.ID_FK = A.ID
WHERE B.Date > 5/1/2013 and B.Date < 5/2/2013) x
GROUP BY Type
ORDER BY Type
You might also try replacing B.ID_FK with A.ID (who knows what context it is applied in), such as:
SELECT COUNT (DISTINCT A.ID), A.Type

Select first or random row in group by

I have this query using PostgreSQL 9.1 (9.2 as soon as our hosting platform upgrades):
SELECT
media_files.album,
media_files.artist,
ARRAY_AGG (media_files.id) AS media_file_ids
FROM
media_files
INNER JOIN playlist_media_files ON media_files.id = playlist_media_files.media_file_id
WHERE
playlist_media_files.playlist_id = 1
GROUP BY
media_files.album,
media_files.artist
ORDER BY
media_files.album ASC
and it's working fine, the goal was to extract album/artist combinations and in the result set have an array of media files ids for that particular combo.
The problem is that I have another column in media files, which is artwork.
artwork is unique for each media file (even in the same album) but in the result set I need to return just the first of the set.
So, for an album that has 10 media files, I also have 10 corresponding artworks, but I would like just to return the first (or a random picked one for that collection).
Is that possible to do with only SQL/Window Functions (first_value over..)?
Yes, it's possible. First, let's tweak your query by adding aliases and explicit column qualifiers so it's clear what comes from where - assuming I've guessed correctly, since I can't be sure without table definitions:
SELECT
mf.album,
mf.artist,
ARRAY_AGG (mf.id) AS media_file_ids
FROM
"media_files" mf
INNER JOIN "playlist_media_files" pmf ON mf.id = pmf.media_file_id
WHERE
pmf.playlist_id = 1
GROUP BY
mf.album,
mf.artist
ORDER BY
mf.album ASC
Now you can either use a subquery in the SELECT list or maybe use DISTINCT ON, though it looks like any solution based on DISTINCT ON will be so convoluted as not to be worth it.
What you really want is something like a pick_arbitrary_value_agg aggregate that just picks the first value it sees and throws the rest away. There is no such aggregate and it isn't really worth implementing it for the job. You could use min(artwork) or max(artwork) and you may find that this actually performs better than the later solutions.
To use a subquery, leave the ORDER BY as it is and add the following as an extra column in your SELECT list:
(SELECT mf2.artwork
FROM media_files mf2
WHERE mf2.artist = mf.artist
AND mf2.album = mf.album
LIMIT 1) AS picked_artwork
You can at a performance cost randomize the selected artwork by adding ORDER BY random() before the LIMIT 1 above.
Alternately, here's a quick and dirty way to implement selection of a random row in-line:
(array_agg(artwork))[width_bucket(random(),0,1,count(artwork)::integer)]
Since there's no sample data I can't test these modifications. Let me know if there's an issue.
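For anyone who wants to try the correlated-subquery variant on toy data, here is a sketch using Python's sqlite3 module rather than PostgreSQL. The rows are invented and the playlist join is left out to keep it short:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE media_files (id INTEGER, album TEXT, artist TEXT, artwork TEXT)"
)
conn.executemany(
    "INSERT INTO media_files VALUES (?, ?, ?, ?)",
    [
        (1, "Blue", "Ann", "blue-1.png"),  # made-up sample rows
        (2, "Blue", "Ann", "blue-2.png"),
        (3, "Red", "Bob", "red-1.png"),
    ],
)

# One row per album/artist; the subquery picks one artwork at random per group.
picked = conn.execute(
    """
    SELECT mf.album, mf.artist,
           (SELECT mf2.artwork
            FROM media_files mf2
            WHERE mf2.artist = mf.artist AND mf2.album = mf.album
            ORDER BY random()
            LIMIT 1) AS picked_artwork
    FROM media_files mf
    GROUP BY mf.album, mf.artist
    ORDER BY mf.album
    """
).fetchall()
print(picked)
```

Dropping the ORDER BY random() line gives the cheaper "whatever comes first" behaviour described above.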
"First" pick
Wouldn't it be simpler / cheaper to just use min():
SELECT m.album
,m.artist
,array_agg(m.id) AS media_file_ids
,min(m.artwork) AS artwork
FROM playlist_media_files p
JOIN media_files m ON m.id = p.media_file_id
WHERE p.playlist_id = 1
GROUP BY m.album, m.artist
ORDER BY m.album, m.artist;
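A quick way to see the min() version in action, sketched with Python's sqlite3 module. SQLite has no array_agg, so group_concat stands in for it here, and all rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE media_files (id INTEGER, album TEXT, artist TEXT, artwork TEXT)"
)
conn.execute(
    "CREATE TABLE playlist_media_files (playlist_id INTEGER, media_file_id INTEGER)"
)
conn.executemany(
    "INSERT INTO media_files VALUES (?, ?, ?, ?)",
    [
        (1, "Blue", "Ann", "blue-1.png"),
        (2, "Blue", "Ann", "blue-2.png"),
        (3, "Red", "Bob", "red-1.png"),
        (4, "Green", "Cy", "green-1.png"),
    ],
)
conn.executemany(
    "INSERT INTO playlist_media_files VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3), (2, 4)],
)

rows = conn.execute(
    """
    SELECT m.album, m.artist,
           group_concat(m.id) AS media_file_ids,  -- stand-in for array_agg(m.id)
           min(m.artwork) AS artwork              -- one deterministic pick per group
    FROM playlist_media_files p
    JOIN media_files m ON m.id = p.media_file_id
    WHERE p.playlist_id = 1
    GROUP BY m.album, m.artist
    ORDER BY m.album, m.artist
    """
).fetchall()
print(rows)
```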
Arbitrary / random pick
If you are looking for a random selection, @Craig already provided a solution with truly random picks.
You could also use a CTE to avoid additional scans on the (possibly big) base table and then run two separate (cheap) subqueries on the small result set.
For arbitrary selection - not truly random, the result will depend on the physical order of rows in the table and implementation-specifics:
WITH x AS (
SELECT m.album, m.artist, m.id, m.artwork
FROM playlist_media_files p
JOIN media_files m ON m.id = p.media_file_id
WHERE p.playlist_id = 1
)
SELECT a.album, a.artist, a.media_file_ids, b.artwork
FROM (
SELECT album, artist, array_agg(id) AS media_file_ids
FROM x
GROUP BY album, artist
) a
JOIN (
SELECT DISTINCT ON (1,2) album, artist, artwork
FROM x
) b USING (album, artist);
For truly random results, you can add an ORDER BY .. random() like this to subquery b:
JOIN (
SELECT DISTINCT ON (1, 2) album, artist, artwork
FROM x
ORDER BY 1, 2, random()
) b USING (album, artist);

Parent-child sql query with order by and limit

I have two tables, DOCUMENT and ATTRIBUTE, like these:
DOCUMENT(id),
ATTRIBUTE(name, value, doc_fk).
I need to run a query that works like this "abstract query"
select top 100 documents
where $state='COMPLETED'
order by $creationDate
Where $state and $creationDate are two attributes.
Note that the limit is on documents, not attributes, and sort and filter are on two different attributes. The final query should return all document attributes, not only the filtered/sorted ones.
I was able to write this with a very complex query and I'm looking for better alternatives. I could post my solution if useful, but I do not want to point you in the, possibly, wrong direction.
It's ok to get a FEW extra documents, like 1000 instead of 100, and filter/sort in memory.
Could be ok for the limit not to be exact, like 74 instead of the required limit 100, but not too far from it.
Extra "soft" requirements:
the query should work with several databases (oracle, mysql and sqlserver), so weird analytic functions should be avoided unless available on all platforms
should work with JPA (eclipselink 2.4.0 implementation)
The expected output is something like this
DOC_ID ATTRIBUTE_NAME VALUE
123 state COMPLETED
123 creationDate 21/11/2012
123 userid someone
456 state COMPLETED
...
Ah, the flaws of an EAV design.
Try this.
select
top 100
document.*
from document
inner join attribute astate on document.id = astate.doc_fk
and astate.name='state'
and astate.value = 'COMPLETED'
inner join attribute acreation on document.id = acreation.doc_fk
and acreation.name='creationDate'
order by cast(acreation.value as date)
But it's only going to get more complicated if you persist with this EAV structure.
(PS. MySQL doesn't use TOP, but LIMIT instead)
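To also get back all attributes of the selected documents, one option is to use the double join only to pick the document ids and then read the attribute rows for those ids. The sketch below uses Python's sqlite3 module with invented rows; it assumes creation dates are stored as ISO strings so that text order matches date order, and it relies on LIMIT inside an IN subquery, which SQLite accepts but MySQL does not, so it is not a portable answer as written:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE attribute (name TEXT, value TEXT, doc_fk INTEGER)")
conn.executemany("INSERT INTO document VALUES (?)", [(123,), (456,), (789,)])
conn.executemany(
    "INSERT INTO attribute VALUES (?, ?, ?)",
    [
        ("state", "COMPLETED", 123),
        ("creationDate", "2012-11-21", 123),
        ("userid", "someone", 123),
        ("state", "COMPLETED", 456),
        ("creationDate", "2012-11-20", 456),
        ("state", "DRAFT", 789),
        ("creationDate", "2012-11-19", 789),
    ],
)

top_n = 1  # stands in for the "top 100" of the question
rows = conn.execute(
    """
    SELECT a.doc_fk, a.name, a.value
    FROM attribute a
    WHERE a.doc_fk IN (
        -- filter on one attribute, sort on another, limit on documents
        SELECT d.id
        FROM document d
        JOIN attribute astate ON astate.doc_fk = d.id
             AND astate.name = 'state' AND astate.value = 'COMPLETED'
        JOIN attribute acreation ON acreation.doc_fk = d.id
             AND acreation.name = 'creationDate'
        ORDER BY acreation.value
        LIMIT ?
    )
    ORDER BY a.doc_fk, a.name
    """,
    (top_n,),
).fetchall()
print(rows)
```

The outer ORDER BY here is only for readable output; re-sorting the documents by creation date can then be done in memory, which the question allows.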
SELECT doc_id, attr_name, attr_val, creationDate FROM
(
SELECT * FROM (
SELECT
doc.id as 'doc_id', attr.name as 'attr_name', null as 'attr_val', attr.value as 'creationDate'
FROM
ATTRIBUTE attr
LEFT JOIN
DOCUMENT doc ON attr.doc_fk = doc.id
WHERE
attr.name='creationDate'
ORDER BY creationDate desc
) AS dt1
UNION ALL
SELECT * FROM(
SELECT
doc.id as 'doc_id', attr.name as 'attr_name', attr.value as 'attr_val', null as 'creationDate'
FROM
ATTRIBUTE attr
LEFT JOIN
DOCUMENT doc ON attr.doc_fk = doc.id
) as dt2
) as dt0 GROUP BY doc_id ORDER by creationDate desc LIMIT 100;
Derived table 1 (dt1) gives you all the date attributes, which lets you order your results by each document's creation date.
Derived table 2 (dt2) gives you all the attributes. Putting the two together with "union all" lets you group by document and then order by the creation date.
Hope this is in the right direction.