Caching aggregate results of user-supplied queries - SQL

I have an application that lets users track visitors on their own website. For this I have a table visitors(id, email, sessions, first_seen, last_seen, ...)
Users of my application can save filters/groups of visitors matching certain conditions on attributes. For this I have a table like
groups(id, visitor_id, name, filters[{type, field, amount}...])
Example of a filters[] entry:
(type: "greater_than", field: "sessions", amount: 5)
Each group can have multiple filters, and for each group I'd like to display the number of visitors/results that match the filters.
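For example, a group with just the single filter above would roughly map to a count query like this (a sketch only, using the visitors columns shown above; each additional filter would add another predicate):
SELECT COUNT(*)
FROM visitors
WHERE sessions > 5;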
What is the best way to handle this in the database?
I am thinking of something like a materialized view, but I'd still want the data to be fresh and up to date, and I'm not sure this is the best approach.
Any suggestions?

Related

Bulk data filters in Tableau

Our organization is in e-commerce, and users are looking to change a filter every day with a different list of items. None of the users will have their own license, just read-only access. The data is connected through Google BigQuery; is there a way to have this bulk filter upload capability without the license owners having to touch the filter each time?
Example
Product ID is the filter
Monday: they have a list of 10,000 IDs they want to check sales for.
Tuesday: they have a new list of 4,000 different IDs they want to check sales for.
Without clicking each ID each time, is there a way to just upload a list (CSV, Google Sheet, etc.)?
We thought users could upload a list of product IDs to Google Sheets, which can map to a BigQuery table. We could then use it to join with the sales table and get the relevant data. However, this becomes unmanageable when we have more than one user, as users might step on each other's data.
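As a sketch of that join idea (the uploaded_product_ids table backed by the Google Sheet, the dataset names, and the product_id column are all assumptions):
SELECT s.*
FROM `project.dataset.sales` AS s
JOIN `project.dataset.uploaded_product_ids` AS p
  ON s.product_id = p.product_id;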
Any suggestions/recommendations are welcome. Our team is pretty new to Tableau. Let me know if any additional details are needed.
Have you tried changing the filter type to "Multi Values (custom list)" and then having the report user paste their list into the filter?

Return first 'unsorted' join in Oracle SQL

I have a table 'ACCOUNTS', with fields ACCTNO and ACPARENT. One account can be the parent of another. One account can have many children.
It's been discovered that certain external processes are using the 'first child' in certain reports and outputs - but there's no actual 'reason' for any particular child to be 'first', just an unintended bug in the code.
First step in untangling this - I need a query that can be re-run (but not often, so optimisation is not really a factor) and that will identify, for all accounts that are parents, what their 'first child' is.
Problem - the 'first child' isn't necessarily anything to do with record ID. If I run the following query, for example:
SELECT ACCTNO FROM ACCOUNTS WHERE ACPARENT = '80005217';
I get a result of:
ACCTNO
______
80007325
80007310
80007315
80007298
I can absolutely, 100% confirm that for this particular example, account 80007325 is the account ID being used as the 'first child'.
On the flipside, if I run a naive query of:
SELECT A1.ACCTNO, A2.ACCTNO AS CHILDACCOUNT FROM ACCOUNTS A1
INNER JOIN ACCOUNTS A2 ON A1.ACCTNO = A2.ACPARENT
WHERE A1.ACCTNO IN
(SELECT ACPARENT FROM ACCOUNTS);
then if I scroll down to where 80005217 is the parent account, I see the following list:
CHILDACCOUNT
______
80007298
80007310
80007315
80007325
It's sorted, which is exactly what I don't want.
Is there a query that will get me a list of what I want in a single query? A list of all parent accounts, and their 'first child' as returned by SQL unsorted?
To guarantee records coming in a fixed order we must provide the database with sort criteria in the ORDER BY clause. If there is no attribute which defines "first-ness" then no guarantee is possible. Without an ORDER BY clause the records are essentially in an uncontrolled order, although because of database internals they often fall into some kind of pattern.
So, what makes account 80007325 the first child WHERE ACPARENT = '80005217'? Clearly not numerical order. Is there some other criterion? Date created? A flag column? Seems like you need to talk to your users. Do they really care which records come first? All the time or just in some specific report?
If your users cannot specify the criteria there's not much you can do...
...although I might be tempted to sort CHILDACCOUNT numerically by ACCTNO whenever it is displayed. At least that would provide consistency, and the users will get used to it.
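A minimal sketch of that suggestion - pick the lowest ACCTNO per parent as the consistent 'first child' (note ACCTNO appears to be stored as text, so MIN here is a lexical minimum; that matches numeric order as long as the values are the same length):
SELECT ACPARENT, MIN(ACCTNO) AS FIRST_CHILD
FROM ACCOUNTS
WHERE ACPARENT IS NOT NULL
GROUP BY ACPARENT;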

SQL complicated query with joins

I have a problem with one query.
Let me explain what I want:
For the sake of brevity let's say that I have three tables:
-Offers
-Ratings
-Users
Now what I want to do is to create SQL query:
I want Offers to be listed with all their fields plus an additional temporary column, one that IS NOT stored anywhere, called AverageUserScore.
This AverageUserScore is the product of grabbing all offers belonging to a particular user, then grabbing all ratings belonging to those offers, and then computing the average of those ratings - this average score is AverageUserScore.
To explain it even further, I need this query for a Ruby on Rails application. In the browser inside the application you can see all offers of other users, with AverageUserScore at the very end, as the last column.
Associations:
Offer has many ratings
Offer belongs to user
Rating belongs to offer
User has many offers
Assumptions made:
You actually have a numeric column (of any type that SQL's AVG is fine with) in your Rating model. I'm using a column ratings.rating in my examples.
AverageUserScore is unconventional, so average_user_score is better.
You don't mind not getting users that have no offers: average rating is not clearly defined for them anyway.
You don't deviate from Rails' conventions far enough to have a primary key other than id.
Displaying offers for each user is a straightforward task: in a loop of @users.each do |user|, you can do user.offers.each do |offer| and be set. The only problem here is that it will execute a separate query for every user. Not good.
The "fetching offers" part is the standard N+1 countermeasure seen even in the guides:
@users = User.includes(:offers).all
The interesting part here is only getting the averages.
For that I'm going to use Arel. It's already part of Rails, ActiveRecord is built on top of it, so you don't need to install anything extra.
You should be able to do a join like this:
User.joins(offers: :ratings)
And this won't get you anything interesting (apart from filtering users that have no offers). Inside though, you'll get a huge set of every rating joined with its corresponding offer and that offer's user. Since we're taking averages per-user we need to group by users.id, effectively making one entry per one users.id value. That is, one per user. A list of users, yes!
Let's stop for a second and make some assignments to make Arel-related code prettier. In fact, we only need two:
users = User.arel_table
ratings = Rating.arel_table
Okay. So. We need to get a list of users (all fields), and for each user fetch an average value seen on his offers' ratings' rating field. So let's compose these SQL expressions:
# users.*
user_fields = users[Arel.star] # Arel.star is a portable SQL "wildcard"
# AVG(ratings.rating) AS average_user_score
average_user_score = ratings[:rating].average.as('average_user_score')
All set. Ready for the final query:
User.includes(:offers) # N+1 counteraction
.joins(offers: :ratings) # dat join
.select(user_fields, average_user_score) # fields we need
.group(users[:id]) # grouping to only get one row per user
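Roughly, that relation should produce SQL along these lines (a sketch assuming conventional Rails table and foreign key names; the exact statement can differ by adapter and because of the includes):
SELECT users.*, AVG(ratings.rating) AS average_user_score
FROM users
INNER JOIN offers ON offers.user_id = users.id
INNER JOIN ratings ON ratings.offer_id = offers.id
GROUP BY users.id;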

MySQL joins for friend feed

I'm currently logging all actions of users and want to display their actions for the people following them to see - kind of like Facebook does it for friends.
I'm logging all these actions in a table with the following structure:
id - PK
userid - id of the user whose action gets logged
actiondate - when the action happened
actiontypeid - id of the type of action (actiontypes stored in a different table - i.e. following other users, writing on people's profiles, creating new content, commenting on existing content, etc.)
objectid - id of the object they just created (i.e. comment id)
onobjectid - id of the object they did the action to (i.e. id of the content that they commented on)
Now the problem is there are several types of actions that get logged (actiontypeid).
What would be the best way of retrieving the data to display to the user?
The easiest way out would be grabbing the dataset of people the user follows and then just going from there, grabbing all other info from the other tables (i.e. the names of the users the people you're following just started following, the names of the user profiles they wrote on, etc.). This however would create a huge number of small queries and trips to the database in a while loop. Not a good idea.
I could use joins to retrieve everything in one massive data set, but how would I know where to grab the data from in just one query? - there's different types of actions that require me to look into several different tables to retrieve data, based on the actiontypeid...
i.e. To get "User X is now following User Y" I'd have to get my data (User Y's username) from the followers table, whereas "User X commented on content Y" would need me to look in the content table to get the content's title and URL.
Any tips are welcome, thanks!
Consider creating several views for different actiontypeids. Union them to have one full history.
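A minimal sketch of that idea - the table names (actions, users, content) and the actiontypeid values are assumptions; each view exposes the same columns so they can be unioned:
CREATE VIEW feed_follows AS
SELECT a.id, a.userid, a.actiondate, CONCAT('started following ', u.username) AS description
FROM actions a
JOIN users u ON u.id = a.onobjectid
WHERE a.actiontypeid = 1; -- hypothetical id for "followed another user"

CREATE VIEW feed_comments AS
SELECT a.id, a.userid, a.actiondate, CONCAT('commented on ', c.title) AS description
FROM actions a
JOIN content c ON c.id = a.onobjectid
WHERE a.actiontypeid = 2; -- hypothetical id for "commented on content"

CREATE VIEW feed AS
SELECT * FROM feed_follows
UNION ALL
SELECT * FROM feed_comments;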

How to design a database table structure for storing and retrieving search statistics?

I'm developing a website with a custom search function and I want to collect statistics on what the users search for.
It is not a full text search of the website content, but rather a search for companies with search modes like:
by company name
by area code
by provided services
...
How to design the database for storing statistics about the searches?
What information is most relevant and how should I query for them?
Well, it's dependent on how the different search modes work, but generally I would say that a table with 3 columns would work:
SearchType SearchValue Count
Whenever someone does a search, say they search for "Company Name: Initech", first query to see if there are any rows in the table with SearchType = "Company Name" (or whatever enum/id value you've given this search type) and SearchValue = "Initech". If there is already a row for this, UPDATE the row by incrementing the Count column. If there is not already a row for this search, insert a new one with a Count of 1.
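In SQL terms, that check-then-write step could look roughly like this (SearchStats is a hypothetical table name, and Count may need quoting in some dialects; a dialect-specific upsert such as MySQL's INSERT ... ON DUPLICATE KEY UPDATE would also do):
UPDATE SearchStats
SET Count = Count + 1
WHERE SearchType = 'Company Name' AND SearchValue = 'Initech';

-- If the UPDATE matched no rows, create the counter instead:
INSERT INTO SearchStats (SearchType, SearchValue, Count)
VALUES ('Company Name', 'Initech', 1);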
By doing this, you'll have a fair amount of flexibility for querying it later. You can figure out what the most popular searches for each type are:
... WHERE SearchType = 'Some Search Type' ORDER BY Count DESC
You can figure out the most popular search types:
... GROUP BY SearchType ORDER BY SUM(Count) DESC
Etc.
This is a pretty general question but here's what I would do:
Option 1
If you want to strictly separate all three search types, then create a table for each. For company name, you could simply store the CompanyID (assuming your website is maintaining a list of companies) and a search count. For area code, store the area code and a search count. If the area code doesn't exist, insert it. Provided services is most dependent on your setup. The most general way would be to store key words and a search count, again inserting if not already there.
Optionally, you could store search date information as well. As an example, you'd have a table with Provided Services Keyword and a unique ID. You'd have another table with an FK to that ID and a SearchDate. That way you could make sense of the data over time while minimizing storage.
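As a rough sketch of that two-table layout (all names are assumptions):
CREATE TABLE ProvidedServiceKeyword (
    KeywordID  INT PRIMARY KEY,
    Keyword    VARCHAR(100) NOT NULL
);

CREATE TABLE ProvidedServiceSearch (
    KeywordID  INT NOT NULL REFERENCES ProvidedServiceKeyword (KeywordID),
    SearchDate DATETIME NOT NULL
);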
Option 2
Treat all searches the same. One table with a Keyword column and a count column, incorporating SearchDate if needed.
You may want to check this:
http://www.microsoft.com/sqlserver/2005/en/us/express-starter-schemas.aspx