Looking for identical sequences over all users - sql

I want to find identical paths (with counts) across all users
Hey everybody,
I want to keep the question short, and hopefully it's clear what I'm after.
I have a table in BigQuery with the following columns:
- UserID
- Timestamp
- Domain
- some other columns (but I guess they are unimportant)
I have no idea how to approach this!
I want to look for identical paths across all users and count how many users share the same sequence of domains.
Problem: we are talking about 129,000 users and around 5 TB of data. I guess I have to limit the path length or something similar.
I'm familiar with SQL, but I need some help/input to keep the costs low. Every query costs money, and my thought was to ask the community before I spend thousands of dollars.
Thanks for any input!
EDIT:
I tried the following to rank the visits of domains:
SELECT
  guid,
  domain AS channel,
  timestamp,
  RANK() OVER (PARTITION BY guid ORDER BY timestamp ASC) AS rank
FROM
  data.all
My problem now: once each "step" in this customer journey is ranked, how can I merge the steps into a path and match identical paths?

This may help or at least help you get started:
select domains, count(*)
from (select userid, string_agg(domain, ',' order by timestamp) as domains
      from t
      group by userid
     ) u
group by domains;
I would prefer to use arrays to store the path itself, but BigQuery does not (yet) support arrays as GROUP BY keys.
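To sanity-check the logic on a small extract before paying for a full scan, the same two steps (build each user's ordered path, then count identical paths) can be sketched in plain Python. This is only an illustration: `count_paths`, `max_steps`, and the sample rows are made up for the example, and `max_steps` reflects the asker's idea of capping the path length.

```python
from collections import Counter, defaultdict


def count_paths(rows, max_steps=None):
    """Count identical domain paths.

    rows: iterable of (user_id, timestamp, domain) tuples (hypothetical layout).
    max_steps: optional cap on the number of steps kept per user.
    """
    visits = defaultdict(list)
    for user_id, ts, domain in rows:
        visits[user_id].append((ts, domain))

    paths = Counter()
    for user_visits in visits.values():
        # Same idea as string_agg(... order by timestamp): sort, then join.
        # Ties on the timestamp are broken by domain name here.
        ordered = [domain for _, domain in sorted(user_visits)]
        if max_steps is not None:
            ordered = ordered[:max_steps]
        paths[",".join(ordered)] += 1
    return paths


# Hypothetical sample rows, deliberately out of timestamp order.
sample = [
    ("u1", 2, "b.com"), ("u1", 1, "a.com"),
    ("u2", 1, "a.com"), ("u2", 2, "b.com"),
    ("u3", 1, "a.com"), ("u3", 2, "c.com"),
]

full_paths = count_paths(sample)
first_step_only = count_paths(sample, max_steps=1)
print(full_paths.most_common())
print(first_step_only.most_common())
```

Capping the path length keeps the number of distinct paths manageable, at the price of treating users whose journeys diverge later as identical.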

Related

SQL schema Site Leader Board

I am trying to set up a site that has challenges, with a leaderboard for each challenge and an all-time leaderboard across all of them.
I have a challenges table that looks like this:
Challenge ID | Challenge Name | Challenge Date | Sport | Prize Pool
I then need each challenge to have its own leaderboard of, say, 50 people, linked by the challenge ID (Leaderboard ID = Challenge ID).
The leaderboard for a challenge would look something like this:
Challenge ID | User | Place | Prize Won
My question has two parts:
- How can I make a table get created automatically when a new challenge is added to the challenges table?
- How can I get a site-wide leaderboard across every challenge, ranked by how much money each user has won, that shows the following?
Rank | User | Prize Money Won (total over every challenge placed)
I know this is a lot of questions all wrapped in one, schema design and logic.
Any insights greatly appreciated
A better approach than one table per challenge is one table for all of them. That way you can compute grand totals and individual challenge rankings all with the same table. You'd also want to not record the place directly but compute it on the fly with the appropriate window function depending on how you want to handle ties (rank(), dense_rank(), and row_number() will have different results in those cases); that way you don't have to keep adjusting it as you add new records.
You didn't specify a SQL database, so I'm going to assume SQLite; adjust as needed. A table something like:
CREATE TABLE challenge_scores(user_id INTEGER REFERENCES users(id),
                              challenge_id INTEGER REFERENCES challenges(id),
                              prize_amount NUMERIC,
                              PRIMARY KEY(user_id, challenge_id));
will let you do things like
SELECT *
FROM (SELECT user_id,
             sum(prize_amount) AS total,
             rank() OVER (ORDER BY sum(prize_amount) DESC) AS place
      FROM challenge_scores
      GROUP BY user_id)
WHERE place <= 50
ORDER BY place;
for the global leaderboard, or the similar:
SELECT *
FROM (SELECT user_id,
             prize_amount,
             rank() OVER (ORDER BY prize_amount DESC) AS place
      FROM challenge_scores
      -- No GROUP BY needed: (user_id, challenge_id) is the primary key,
      -- so each user has at most one row per challenge.
      WHERE challenge_id = :some_challenge_id)
WHERE place <= 50
ORDER BY place;
for a specific challenge's.
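To see how the choice of window function changes the result when two users tie, here is a small sketch using Python's built-in `sqlite3` module. The sample rows are invented, the `user_id` tiebreak inside `row_number()` is my own assumption to keep the output deterministic, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE challenge_scores(user_id INTEGER,
                                  challenge_id INTEGER,
                                  prize_amount NUMERIC,
                                  PRIMARY KEY(user_id, challenge_id))
""")
# Hypothetical data: users 1 and 2 tie on 100, user 3 totals 75.
conn.executemany(
    "INSERT INTO challenge_scores VALUES (?, ?, ?)",
    [(1, 1, 100), (2, 1, 100), (3, 1, 50), (3, 2, 25)],
)

rows = conn.execute("""
    SELECT user_id,
           sum(prize_amount) AS total,
           rank()       OVER (ORDER BY sum(prize_amount) DESC) AS rnk,
           dense_rank() OVER (ORDER BY sum(prize_amount) DESC) AS dense_rnk,
           row_number() OVER (ORDER BY sum(prize_amount) DESC, user_id) AS row_num
    FROM challenge_scores
    GROUP BY user_id
    ORDER BY total DESC, user_id
""").fetchall()

for row in rows:
    print(row)
```

With this data, `rank()` leaves a gap after the tie (1, 1, 3), `dense_rank()` does not (1, 1, 2), and `row_number()` always hands out distinct places (1, 2, 3).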

SQL query to get all unique rows with uid

I want to see which user has received the most highfives using a SQL query. My table has the columns id | uid | ip. I want to count the number of rows each uid has, but each ip should count only once per uid, so that nobody can give multiple highfives to the same person.
I searched around online, and I couldn't find anything about this. If anyone could help me with this, I would be grateful.
You can try something like this, which counts the distinct ips per uid:
select uid, count(distinct ip) as highfives
from your_table t
group by uid
order by highfives desc
SELECT uid, COUNT(ip) AS NoOfVotes
FROM (SELECT uid, ip,
             Serial = ROW_NUMBER() OVER (PARTITION BY ip, uid ORDER BY uid)
      FROM dbo.tbl_user) A
WHERE Serial = 1
GROUP BY uid
I think this will give you an accurate vote count. ROW_NUMBER() numbers the rows within each (ip, uid) pair, and keeping only Serial = 1 drops duplicate votes from the same ip for the same uid. Voting for multiple uids from the same ip is still allowed by this query.
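As a quick check of the "one highfive per ip per uid" rule, here is a minimal sketch with Python's `sqlite3` module. The table name `highfives` and the sample rows are assumptions for illustration; only the id | uid | ip columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE highfives(id INTEGER PRIMARY KEY, uid INTEGER, ip TEXT)")
# Hypothetical votes: ip 10.0.0.1 highfives uid 1 twice, which should count once.
conn.executemany(
    "INSERT INTO highfives(uid, ip) VALUES (?, ?)",
    [
        (1, "10.0.0.1"), (1, "10.0.0.1"), (1, "10.0.0.2"),
        (2, "10.0.0.1"), (2, "10.0.0.3"), (2, "10.0.0.4"),
    ],
)

# Count each ip at most once per uid, most-highfived user first.
rows = conn.execute("""
    SELECT uid, COUNT(DISTINCT ip) AS votes
    FROM highfives
    GROUP BY uid
    ORDER BY votes DESC, uid
""").fetchall()

print(rows)
```

Here uid 2 comes first with three distinct ips, while uid 1 has three rows but only two distinct ips.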

Can a nested Group By be done in a single Select?

Using T-SQL (we're on 2008, but if it can be done in 2012 using some new function/extension, please note)
This is purely out of curiosity...I ended up just going with a GROUP BY within a GROUP BY. But I'm curious to see if there is a way to do this in a single query, maybe there's some fancy shmancy functions or extensions I haven't learned yet....It's more of a challenge than it is a need to get the job done, as it's already done.
I tried building an example table on here, but it's too large to build, so here's the concept. The table has three columns, UserID, UserGroupID and Minutes. In one hour increments, we log how much time a user spends within an application. So, for example, UserID 1 spent 10 min during the hour of 04/28/2014 10:00:00, and then 15 minutes during the hour of 04/28/2014 11:00:00...and so on. (for this example, please ignore any time constraints as far as per day or per month, etc)
I wanted to see the number of users per group that have used the application for at least 30 minutes. This is the logic that was used:
SELECT UserGroupID, COUNT(*)
FROM (
  SELECT UserGroupID, UserID
  FROM Example
  GROUP BY UserGroupID, UserID
  HAVING SUM([Minutes]) >= 30
) AS x
GROUP BY UserGroupID
The question is, can this be done in a single query? Not looking for efficiency here, I'm just curious.
I don't think so, but a negative is quite hard to prove.
The following query (without the having clause) can be simplified. So:
SELECT UserGroupID, COUNT(*)
FROM (
  SELECT UserGroupID, UserID
  FROM Example
  GROUP BY UserGroupID, UserID
) AS x
GROUP BY UserGroupID;
Is pretty much the same as:
SELECT UserGroupId, COUNT(DISTINCT UserId)
FROM Example
GROUP BY UserGroupId;
(These are not exactly equivalent if UserId can be NULL, but that case could also be handled.)
I don't think there is a way to do your full query, though. You need to aggregate by UserGroupId, UserId to get the sum() condition. Then you need to aggregate just by UserGroupId. Nothing comes to mind.
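The equivalence claimed above (nested GROUP BY without HAVING versus COUNT(DISTINCT)) is easy to check on toy data. The sketch below uses Python's `sqlite3` module rather than SQL Server, so treat it as an illustration of the logic only; the sample rows are invented and contain no NULL user IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Example(UserID INTEGER, UserGroupID INTEGER, Minutes INTEGER)")
conn.executemany(
    "INSERT INTO Example VALUES (?, ?, ?)",
    [
        (1, 10, 10), (1, 10, 15),  # user 1: 25 minutes in total
        (2, 10, 20), (2, 10, 20),  # user 2: 40 minutes
        (3, 20, 30),               # user 3: 30 minutes
        (4, 20, 5),                # user 4: 5 minutes
        (5, 20, 45),               # user 5: 45 minutes
    ],
)

# Nested aggregation without the HAVING clause.
nested = conn.execute("""
    SELECT UserGroupID, COUNT(*)
    FROM (SELECT UserGroupID, UserID
          FROM Example
          GROUP BY UserGroupID, UserID) AS x
    GROUP BY UserGroupID
    ORDER BY UserGroupID
""").fetchall()

# Single-level equivalent.
distinct = conn.execute("""
    SELECT UserGroupID, COUNT(DISTINCT UserID)
    FROM Example
    GROUP BY UserGroupID
    ORDER BY UserGroupID
""").fetchall()

# The full query keeps the inner aggregation because of the SUM() filter.
with_having = conn.execute("""
    SELECT UserGroupID, COUNT(*)
    FROM (SELECT UserGroupID, UserID
          FROM Example
          GROUP BY UserGroupID, UserID
          HAVING SUM(Minutes) >= 30) AS x
    GROUP BY UserGroupID
    ORDER BY UserGroupID
""").fetchall()

print(nested, distinct, with_having)
```

The first two queries agree, while the third differs because the 30-minute filter has to be evaluated per user before the per-group count.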

sql max of count 2 tables

I've been searching for literally hours (1pm-11pm) for a solution to this SQL query I have to write. Basically, I have 2 tables and I have to select the ID from one table which has the maximum results in another. The second issue is that there are 2 IDs. I can't quite explain what I mean because I'm that unsure, but I can post my instructions and a link to the tables.
Any help will be greatly appreciated. I've also looked at a million other posts on SO and other places but even if it seems remotely relevant, I've no idea what changes to make to suit my needs.
For example: SQL SELECT MAX COUNT
So my task is as follows:-
Display the name and the telephone number of private owners which have more properties than anybody else.
The top table in the image shows the "properties for rent" table and the lower shows the "private owners".
In reference to the question, I need to use the primary key of the private owners table to count the number of properties that each private owner has available to rent and then display the details of the private owner(s) who has the most properties available - which by studying the data, is 2 private owners (CO87 and CO93).
Again, I'd appreciate any help at all with this, I've been pulling my hair out for the best part of 12 hours :/
Thanks in advance guys,
Tim.
P.s - Just for the curious, this is one of an insane amount of SQL tasks for a university assignment =)
Edit:- The owner IDs are strings, not integers.
Your process should be:
1. Get a count of properties for each owner.
2. Find the owner IDs with the max number of properties.
3. Find the owners with those owner IDs.
Seems like this should work:
SELECT * FROM Owners
WHERE OwnerID IN
(
  SELECT OwnerID
  FROM Properties
  GROUP BY OwnerID
  HAVING COUNT(*) =
    (SELECT COUNT(*)
     FROM Properties
     GROUP BY OwnerID
     ORDER BY COUNT(*) DESC
     LIMIT 1)
)
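To check that this returns every owner tied for the maximum, here is a small sketch run through Python's `sqlite3` module (SQLite also accepts `LIMIT`). The column names `Name` and `Phone`, the property IDs, the owner `CO46`, and all values are invented for illustration; only the owner IDs CO87 and CO93 come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Owners(OwnerID TEXT PRIMARY KEY, Name TEXT, Phone TEXT);
    CREATE TABLE Properties(PropertyID TEXT PRIMARY KEY, OwnerID TEXT);

    -- Hypothetical rows: CO87 and CO93 own two properties each, CO46 owns one.
    INSERT INTO Owners VALUES
        ('CO46', 'Owner A', '000-0001'),
        ('CO87', 'Owner B', '000-0002'),
        ('CO93', 'Owner C', '000-0003');
    INSERT INTO Properties VALUES
        ('P1', 'CO46'),
        ('P2', 'CO87'), ('P3', 'CO87'),
        ('P4', 'CO93'), ('P5', 'CO93');
""")

rows = conn.execute("""
    SELECT Name, Phone
    FROM Owners
    WHERE OwnerID IN (
        SELECT OwnerID
        FROM Properties
        GROUP BY OwnerID
        HAVING COUNT(*) = (SELECT COUNT(*)
                           FROM Properties
                           GROUP BY OwnerID
                           ORDER BY COUNT(*) DESC
                           LIMIT 1)
    )
    ORDER BY OwnerID
""").fetchall()

print(rows)
```

Comparing against the maximum count in `HAVING`, rather than applying `LIMIT 1` to the owners themselves, is what keeps both tied owners in the result.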

SQL: Order By dictated by a different table

Hopefully very simple SQL question - I'm just blacking out :)
I have a table of vendors (id, name, description, url). It used to be the web service returned them all sorted by id. After a while, I was asked to return sorted by name. Now they want me to allow them to change the order manually - to showcase new vendors.
Suppose I create another table, VendorOrder with (vendorid, placement), what can I put in the Order By section of the original query, to return the vendors sorted by placement?
As always, thanks in advance.
Guy
select
  vendors.id,
  vendors.name,
  vendors.description,
  vendors.url
from
  vendors,
  vendorOrder
where
  vendors.id = vendorOrder.vendorId
order by
  vendorOrder.placement;
Make sure there is exactly one vendorId in vendorOrder for each id in vendors; otherwise, use a left join between the tables so that vendors without a placement are not dropped from the result.
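The left join variant could look roughly like the sketch below, run through Python's `sqlite3` module. The sample vendors are invented, and listing unplaced vendors last, sorted by name, is my own assumption about the desired fallback rather than something from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE vendors(id INTEGER PRIMARY KEY, name TEXT, description TEXT, url TEXT);
    CREATE TABLE vendorOrder(vendorId INTEGER PRIMARY KEY REFERENCES vendors(id),
                             placement INTEGER);

    -- Hypothetical data: vendor 2 has no manual placement.
    INSERT INTO vendors VALUES
        (1, 'Acme', '', ''),
        (2, 'Bolt', '', ''),
        (3, 'Crate', '', '');
    INSERT INTO vendorOrder VALUES (3, 1), (1, 2);
""")

rows = conn.execute("""
    SELECT v.id, v.name
    FROM vendors AS v
    LEFT JOIN vendorOrder AS o ON o.vendorId = v.id
    -- Placed vendors first (IS NULL evaluates to 0), then the rest by name.
    ORDER BY o.placement IS NULL, o.placement, v.name
""").fetchall()

print(rows)
```

The `o.placement IS NULL` sort key is there because SQLite sorts NULLs first in ascending order; other databases may offer `NULLS LAST` instead.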