Calculating unique events in Google BigQuery

How can one find the count of unique events in BigQuery? I am having a hard time calculating the count of unique events and matching it with the GA interface.

There are two ways this is commonly done:
1) One, as the original linked documentation says, is to combine the full visitor ID (fullVisitorId) with the session ID (visitId) and count the distinct combinations:
SELECT
EXACT_COUNT_DISTINCT(combinedVisitorId)
FROM (
SELECT
CONCAT(fullVisitorId,string(VisitId)) AS combinedVisitorId
FROM
[google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
WHERE
hits.type='PAGE' )
2) The other is to count only the distinct fullVisitorId values:
SELECT
EXACT_COUNT_DISTINCT(fullVisitorId)
FROM
[google.com:analytics-bigquery:LondonCycleHelmet.ga_sessions_20130910]
WHERE
hits.type='PAGE'
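The difference between the two counts can be tried out locally. The following is only a sketch using Python's built-in sqlite3 module, not BigQuery: the flattened hits table, its hit_type column, and the sample rows are assumptions made for illustration. A '-' separator is added between the two IDs so that different pairs cannot concatenate to the same string.

```python
import sqlite3

# Hypothetical flattened hit rows: (fullVisitorId, visitId, hit_type).
rows = [
    ("A", 1, "PAGE"),
    ("A", 1, "PAGE"),   # same visitor, same session
    ("A", 2, "PAGE"),   # same visitor, new session
    ("B", 1, "PAGE"),
    ("B", 1, "EVENT"),  # excluded by the WHERE clause
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (fullVisitorId TEXT, visitId INTEGER, hit_type TEXT)"
)
conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", rows)

# Approach 1: count distinct (visitor, session) combinations.
sessions = conn.execute(
    "SELECT COUNT(DISTINCT fullVisitorId || '-' || visitId) "
    "FROM hits WHERE hit_type = 'PAGE'"
).fetchone()[0]

# Approach 2: count distinct visitors only.
visitors = conn.execute(
    "SELECT COUNT(DISTINCT fullVisitorId) FROM hits WHERE hit_type = 'PAGE'"
).fetchone()[0]

print(sessions, visitors)  # 3 2
```

With this sample data the first approach counts three visitor/session pairs and the second counts two visitors, which is why the two queries generally do not agree.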

Related

Dynamic mapping with SQL

I have an SQL db which contains a table of license plate numbers (plates), a table of parking lots (places), and a table corresponding one to the other over time (parking), each row placing a specific vehicle plate in a specific place at a particular time.
create table parking (
plateid integer,
placeid integer,
time_period integer
);
This means each row as a whole is unique but the plate/place combinations are not.
I need to count how many times each plate appears in each place. There are an indeterminate number of both, so I cannot maintain a table per place and simply count the entries.
This is easy enough using a general purpose programming language applied to the list. Is there a way to do it with straight SQL?
Are you just looking for aggregation?
select plateid, placeid, count(*)
from parking
group by plateid, placeid;
You can group by both plate and place and count occurrences:
SELECT plateid, placeid, count(*)
FROM parking
GROUP BY plateid, placeid
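To see the grouping at work, here is a minimal sketch with Python's built-in sqlite3 module. It uses the column names from the CREATE TABLE statement in the question (plateid, placeid); the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parking (plateid INTEGER, placeid INTEGER, time_period INTEGER)"
)
# Hypothetical data: plate 1 parks twice in place 10 and once in place 20.
conn.executemany(
    "INSERT INTO parking VALUES (?, ?, ?)",
    [(1, 10, 1), (1, 10, 2), (1, 20, 3), (2, 10, 1)],
)

# One output row per (plate, place) pair, with the number of occurrences.
counts = conn.execute(
    "SELECT plateid, placeid, COUNT(*) FROM parking "
    "GROUP BY plateid, placeid ORDER BY plateid, placeid"
).fetchall()

print(counts)  # [(1, 10, 2), (1, 20, 1), (2, 10, 1)]
```

No per-place table is needed: GROUP BY produces a row for every combination that actually occurs in the data.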

Select latest and earliest times within a time group and a pivot statement

I have attendance data that contains a username, time, and status (IN or OUT). I want to show attendance data that contains a name, and the check in/out times. I expect a person to check in and out no more than twice a day. The data looks like this:
As you can see, my problem is that one person can have multiple data entries in different seconds for the same login attempt. This is because I get data from a fingerprint attendance scanner, and the machine in some cases makes multiple entries, sometimes just within 5-10 seconds. I want to select the data to be like this:
How can I identify the proper time for the login attempt, and then select the data with a pivot?
First, you need to normalize your data by removing the duplicate entries. In your situation, that's a challenge because the duplicated data isn't easily identified as a duplicate. You can make some assumptions though. Below, I assume that no one will make multiple login attempts in a two-minute window. You can do this by first using a Common Table Expression (CTE), which is introduced with the WITH clause.
Within the CTE, you can use the LAG function. Essentially what this code is saying is "for each partition of user and entry type, if the previous value was within 2 minutes of this value, then put a number, otherwise put null." I chose null as the flag that will keep the value because LAG of the first entry is going to be null. So, your CTE will just return a table of entry events (ID) that were distinct attempts.
Next, prepare a second CTE for the PIVOT to read from. It holds the columns you need from your table, restricted to the entry IDs kept by the first CTE. The PIVOT then takes the MIN and MAX of your IN and OUT times.
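WITH UNIQUE_LOGINS AS (
SELECT ID FROM (
-- Analytic functions cannot appear in WHERE, so compute the flag in an inline view.
-- The flag is 1 when the previous entry is within 2 minutes, otherwise NULL (keep the row).
SELECT ID,
CASE WHEN LAG(TIME) OVER (PARTITION BY USERNAME, STATUS ORDER BY TIME)
+ (2/60/24) >= TIME THEN 1 ELSE NULL END AS DUP_FLAG
FROM LOGIN_TABLE )
WHERE DUP_FLAG IS NULL ),
TEMP_FOR_PIVOT AS (
SELECT USERNAME, TIME, STATUS FROM LOGIN_TABLE WHERE ID IN (SELECT ID FROM UNIQUE_LOGINS)
)
SELECT * FROM TEMP_FOR_PIVOT
PIVOT (
MIN(TIME) AS FIRST_TIME, MAX(TIME) AS LAST_TIME FOR STATUS IN ('IN' AS CHECK_IN, 'OUT' AS CHECK_OUT)
)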
From there, if you need to rearrange or rename your columns, then you can just put that last SELECT into yet another CTE and then select your values from it. There is some more about PIVOT here: Rotate/pivot table with aggregation in Oracle
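As a rough, self-contained way to try the deduplicate-then-pivot idea, the sketch below uses Python's built-in sqlite3 module rather than Oracle. Everything in it is an assumption for illustration: the sample rows, the ts column holding seconds since midnight, and conditional aggregation standing in for PIVOT, which SQLite does not have. It needs SQLite 3.25 or later for LAG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE login_table "
    "(id INTEGER PRIMARY KEY, username TEXT, ts INTEGER, status TEXT)"
)
# Hypothetical scanner data; ts is seconds since midnight.
conn.executemany(
    "INSERT INTO login_table VALUES (?, ?, ?, ?)",
    [
        (1, "ann", 28800, "IN"),   # 08:00:00
        (2, "ann", 28805, "IN"),   # 08:00:05, duplicate scan
        (3, "ann", 43200, "OUT"),  # 12:00:00
        (4, "ann", 46800, "IN"),   # 13:00:00
        (5, "ann", 61200, "OUT"),  # 17:00:00
        (6, "ann", 61208, "OUT"),  # 17:00:08, duplicate scan
    ],
)

query = """
WITH flagged AS (
    -- Previous timestamp for the same user and status, NULL for the first one.
    SELECT username, ts, status,
           LAG(ts) OVER (PARTITION BY username, status ORDER BY ts) AS prev_ts
    FROM login_table
),
unique_logins AS (
    -- Keep a row only if nothing preceded it within 120 seconds.
    SELECT username, ts, status FROM flagged
    WHERE prev_ts IS NULL OR ts - prev_ts > 120
)
SELECT username,
       MIN(CASE WHEN status = 'IN'  THEN ts END) AS first_in,
       MIN(CASE WHEN status = 'OUT' THEN ts END) AS first_out,
       MAX(CASE WHEN status = 'IN'  THEN ts END) AS last_in,
       MAX(CASE WHEN status = 'OUT' THEN ts END) AS last_out
FROM unique_logins
GROUP BY username
"""
result = conn.execute(query).fetchall()
print(result)  # [('ann', 28800, 43200, 46800, 61200)]
```

Without the deduplication step, last_out would be 61208 (the repeated scan) instead of 61200. Note that each row is compared with the immediately preceding row, so a long run of scans spaced under two minutes apart collapses into its first entry.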

Need to find duplicates where two columns values added together create a duplicate

Problem: There is a conference. Each class in the conference runs three times (time1, time2, time3). An attendee can only go to one session per class. I'm looking for duplicate class registrations (for example, a user registered for class1 at both time1 and time2).
I need to write a query to find duplicate registrations with information in 3 different tables. The tables I'm joining are (class, user, registration). I need to find all duplicates that are created by adding the values for two columns. (class.title + user.id#) = duplicate.
I need the query to display only the duplicate rows found plus display additional column information such as id number, title, first, last, status.
For Example: The search would find these results
table
But display only...
results
I'm not sure where to start. The information in class.title + user.id# will vary per row, so I can't search by specific information. (ie: user.id#=45624).
If I understood you correctly, the DISTINCT keyword is what you are looking for: http://www.w3schools.com/sql/sql_distinct.asp
An example:
SELECT DISTINCT title, status, first, last FROM your_table
This will make sure that there will be no duplicate rows returned identified by title, status, first and last.
I may have misunderstood your question. If you are looking for duplicates only, consider the following code. It returns all duplicate rows, identified by title, status, first and last, and gives you id as additional information.
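SELECT t.id, t.title, t.status, t.first, t.last
FROM your_table t
JOIN (SELECT title, status, first, last
FROM your_table
GROUP BY title, status, first, last
HAVING COUNT(*) > 1) d
ON t.title = d.title AND t.status = d.status
AND t.first = d.first AND t.last = d.last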
Which one is the answer you were looking for?
Maybe this will work:
select * from(
SELECT id, title, status, first, last ,
row_number() over ( partition by id, title order by status) rn
FROM your_table
) x
where rn = 2
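For the three-table case described in the question, one possible shape is to find the (user, title) pairs that occur more than once and join them back to get the full rows. The sketch below runs against an in-memory SQLite database from Python. The table names (classes, users, registrations), their columns, and the sample rows are all hypothetical, since the real schema is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE classes (id INTEGER PRIMARY KEY, title TEXT, slot TEXT);
CREATE TABLE users (id INTEGER PRIMARY KEY, first TEXT, last TEXT);
CREATE TABLE registrations (user_id INTEGER, class_id INTEGER, status TEXT);
INSERT INTO classes VALUES
    (1, 'SQL Basics', 'time1'), (2, 'SQL Basics', 'time2'), (3, 'Python', 'time1');
INSERT INTO users VALUES (45624, 'Ann', 'Lee'), (45625, 'Bob', 'Ray');
INSERT INTO registrations VALUES
    (45624, 1, 'ok'), (45624, 2, 'ok'), (45624, 3, 'ok'), (45625, 1, 'ok');
""")

query = """
SELECT u.id, c.title, c.slot, u.first, u.last, r.status
FROM registrations r
JOIN classes c ON c.id = r.class_id
JOIN users u ON u.id = r.user_id
JOIN (
    -- (user, title) pairs registered more than once, whatever the time slot
    SELECT r2.user_id, c2.title
    FROM registrations r2
    JOIN classes c2 ON c2.id = r2.class_id
    GROUP BY r2.user_id, c2.title
    HAVING COUNT(*) > 1
) d ON d.user_id = r.user_id AND d.title = c.title
ORDER BY u.id, c.title, c.slot
"""
duplicates = conn.execute(query).fetchall()
for row in duplicates:
    print(row)
```

Here only user 45624's two 'SQL Basics' registrations are returned. The same user's 'Python' registration and the other user's single registration are left out, and no specific user id has to be named in the query.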

Can peewee nest SELECT queries such that the outer query selects on an aggregate of the inner query?

I'm using peewee 2.1 with Python 3.3 and an SQLite 3.7 database.
I want to perform certain SELECT queries in which:
1) First, I select some aggregate (count, sum), grouping by some id column; then
2) I select from the results of (1), aggregating over its aggregate. Specifically, I want to count the number of rows in (1) that have each aggregated value.
My database has an 'Event' table with 1 record per event, and a 'Ticket' table with 1..N tickets per event. Each ticket record contains the event's id as a foreign key. Each ticket also contains a 'seats' column that specifies the number of seats purchased. (A "ticket" is really best thought of as a purchase transaction for 1 or more seats at the event.)
Below are two examples of working SQLite queries of this sort that give me the desired results:
SELECT ev_tix, count(1) AS ev_tix_n FROM
(SELECT count(1) AS ev_tix FROM ticket GROUP BY event_id)
GROUP BY ev_tix
SELECT seat_tot, count(1) AS seat_tot_n FROM
(SELECT sum(seats) AS seat_tot FROM ticket GROUP BY event_id)
GROUP BY seat_tot
But using Peewee, I don't know how to select on the inner query's aggregate (count or sum) when specifying the outer query. I can of course specify an alias for that aggregate, but it seems I can't use that alias in the outer query.
I know that Peewee has a mechanism for executing "raw" SQL queries, and I've used that workaround successfully. But I'd like to understand if / how these queries can be done using Peewee directly.
I posted the same question on the peewee-orm Google group. Charles Leifer responded promptly with both an answer and new commits to the peewee master. So although I'm answering my own question, obviously all credit goes to him.
You can see that thread here: https://groups.google.com/forum/#!topic/peewee-orm/FSHhd9lZvUE
But here's the essential part, which I've copied from Charles' response to my post:
I've added a couple commits to master which should make your queries possible (https://github.com/coleifer/peewee/commit/22ce07c43cbf3c7cf871326fc22177cc1e5f8345).
Here is the syntax, roughly, for your first example:
SELECT ev_tix, count(1) AS ev_tix_n FROM
(SELECT count(1) AS ev_tix FROM ticket GROUP BY event_id)
GROUP BY ev_tix
ev_tix = SQL('ev_tix') # the name of the alias.
(Ticket
.select(ev_tix, fn.count(ev_tix).alias('ev_tix_n'))
.from_(
Ticket.select(fn.count(Ticket.id).alias('ev_tix')).group_by(Ticket.event))
.group_by(ev_tix))
This yields the following SQL:
SELECT ev_tix, count(ev_tix) AS ev_tix_n FROM (SELECT Count(t2."id")
AS ev_tix FROM "ticket" AS t2 GROUP BY t2."event_id")
GROUP BY ev_tix
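To check what the nested aggregation is expected to return, the two raw SQL queries from the question can be run directly through Python's built-in sqlite3 module. This sketch does not use peewee at all; the ticket table layout follows the question's description, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ticket (id INTEGER PRIMARY KEY, event_id INTEGER, seats INTEGER)"
)
# Hypothetical purchases: events 1 and 2 have two tickets each, event 3 has one.
conn.executemany(
    "INSERT INTO ticket (event_id, seats) VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 1), (2, 4), (3, 2)],
)

# Inner query: tickets per event. Outer query: how many events share each count.
ev_tix_hist = conn.execute(
    "SELECT ev_tix, count(1) AS ev_tix_n FROM "
    "(SELECT count(1) AS ev_tix FROM ticket GROUP BY event_id) "
    "GROUP BY ev_tix ORDER BY ev_tix"
).fetchall()

# Same pattern with the seat totals per event.
seat_hist = conn.execute(
    "SELECT seat_tot, count(1) AS seat_tot_n FROM "
    "(SELECT sum(seats) AS seat_tot FROM ticket GROUP BY event_id) "
    "GROUP BY seat_tot ORDER BY seat_tot"
).fetchall()

print(ev_tix_hist)  # [(1, 1), (2, 2)]
print(seat_hist)    # [(2, 1), (5, 2)]
```

Each result is a histogram over the inner aggregate: one event has a single ticket and two events have two tickets, while one event sold 2 seats and two events sold 5.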

SQL - Get answer from query as a single number

The following code returns a couple of numbers, identifying people who take part in more than three activities.
SELECT pnr
FROM Participates
GROUP BY pnr
HAVING count(activities)>3;
I want the answer to be the number of people who participate in more than three activities though, i.e. "4", instead of four unique numbers. What to do?
Access supports derived tables.
SELECT COUNT(*) AS NumberOfParticipants FROM
(
SELECT pnr
FROM Participates
GROUP BY pnr
HAVING count(activities)>3
) T
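The derived-table approach can be tried in a small sketch with Python's built-in sqlite3 module (SQLite rather than Access, so this only illustrates the query shape). The column types and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Participates (pnr INTEGER, activities TEXT)")
# Hypothetical data: persons 1 and 3 have more than three activities, person 2 does not.
rows = (
    [(1, a) for a in ("chess", "golf", "yoga", "swim")]
    + [(2, a) for a in ("chess", "golf")]
    + [(3, a) for a in ("chess", "golf", "yoga", "swim", "run")]
)
conn.executemany("INSERT INTO Participates VALUES (?, ?)", rows)

# The inner query yields one row per qualifying pnr; the outer query counts those rows.
number_of_participants = conn.execute(
    "SELECT COUNT(*) AS NumberOfParticipants FROM ("
    "  SELECT pnr FROM Participates GROUP BY pnr HAVING COUNT(activities) > 3"
    ") T"
).fetchone()[0]

print(number_of_participants)  # 2
```

The grouping and the final count happen at two different levels, which is why the count has to wrap the grouped query instead of sitting beside it.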
You will need a WHERE clause on the pnr field to uniquely identify one of your groupings. The WHERE clause must come before GROUP BY:
SELECT COUNT(pnr)
FROM Participates
WHERE pnr = 'whatever'
GROUP BY pnr
HAVING COUNT(activities)>3
Select Count(*)
From (Select pnr From Participates Group By pnr Having Count(activities) > 3) As T