How to show the combinations of a column in SQL along with the aggregated results?

The question is about SQL:
Create a summary table to show how many customers use different GO-JEK services on a daily basis, along with the combination of services used (please see the screenshot below for more details).
Several conditions for this task:
Take only order_status = “Completed”
No repetition on details of order_type
You can combine the order types in any order, but every combination is only allowed once. E.g. listing the same combination as both RIDE, CAR, SEND and CAR, RIDE, SEND is unacceptable
Group each order_payment and its combination, i.e. aggregate by CASH, GOPAY, and CASH & GOPAY (ALL)
Use date in UTC timezone
Dataset sample: link
I'm not sure how to get the order_type combinations. Please suggest an approach.
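One possible way to get each combination exactly once is to collect the distinct order types per customer and day, sort them, and join them into a single label, so that RIDE, CAR and CAR, RIDE collapse to the same key. The sketch below uses Python's sqlite3 module with made-up rows. The table name orders and the columns customer_id and order_time are assumptions, not taken from the dataset, and order_time is assumed to already hold UTC timestamps.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        customer_id   INTEGER,  -- assumed column name
        order_time    TEXT,     -- assumed: UTC timestamp in ISO 8601 form
        order_type    TEXT,
        order_payment TEXT,
        order_status  TEXT
    )
""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", [
    (1, "2020-01-01 08:00:00", "RIDE", "CASH",  "Completed"),
    (1, "2020-01-01 09:00:00", "CAR",  "GOPAY", "Completed"),
    (2, "2020-01-01 10:00:00", "CAR",  "CASH",  "Completed"),
    (2, "2020-01-01 11:00:00", "RIDE", "CASH",  "Completed"),
    (2, "2020-01-01 12:00:00", "RIDE", "CASH",  "Completed"),
    (3, "2020-01-01 13:00:00", "SEND", "GOPAY", "Cancelled"),
])

# Keep only completed orders; DISTINCT drops repeated order types.
rows = conn.execute("""
    SELECT DISTINCT date(order_time) AS order_date,
           customer_id, order_type, order_payment
    FROM orders
    WHERE order_status = 'Completed'
""").fetchall()

# Collect the set of order types and payment methods per (day, customer).
types = defaultdict(set)
payments = defaultdict(set)
for order_date, customer_id, order_type, order_payment in rows:
    types[(order_date, customer_id)].add(order_type)
    payments[(order_date, customer_id)].add(order_payment)

# Sorting before joining gives every combination one canonical spelling.
summary = defaultdict(int)
for (order_date, customer_id), type_set in types.items():
    combo = ", ".join(sorted(type_set))
    payment = " & ".join(sorted(payments[(order_date, customer_id)]))
    summary[(order_date, combo, payment)] += 1

for key, customers in sorted(summary.items()):
    print(key, customers)
```

Many SQL dialects also offer an ordered string aggregate that can build the sorted label inside the query itself; the grouping idea stays the same.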

Related

How to create an aggregate table (data mart) that will improve chart performance?

I created a table named user_preferences where user preferences have been grouped by user_id and month.
Table:
Each month I collect all user_ids and assign all preferences:
city
district
number of rooms
the maximum price they can spend
The plan assumes displaying a graph showing users' shopping intentions like this:
The blue line is the number of interested users for the selected values in the filters.
The graph should enable filtering by parameters marked in red.
What you see above is a simplified form to clarify the subject. In fact, there are many more users. Every month, the table grows by several hundred thousand records. The SQL query that feeds the chart takes up to 50 seconds. That is far too long - I can't afford it.
So, I need to create a table (table/aggregation/data mart) into which I can insert the previously calculated number of interested users for all combinations. Thanks to this, the end user will not have to wait for the data to be counted.
Details below:
Now the question is - how to create such a table in PostgreSQL?
I know how to write a SQL query that will calculate a specific example.
SELECT
    month,
    count(DISTINCT user_id) interested_users
FROM
    user_preferences
WHERE
    month BETWEEN '2020-01' AND '2020-03'
    AND city = 'Madrid'
    AND district = 'Latina'
    AND rooms IN (1,2)
    AND price_max BETWEEN 400001 AND 500000
GROUP BY
    1
The question is - how to calculate all possible combinations? Can I write multiple nested loops in SQL?
The topic is extremely important to me, I think it will also be useful to others for the future.
I will be extremely grateful for any tips.
Well, based on your query, you have the following filters:
month
city
district
rooms
price_max
You can try creating a view with the following structure:
SELECT month
,city
,district
,rooms
,price_max
,count(DISTINCT user_id) AS interested_users
FROM user_preferences
GROUP BY month
,city
,district
,rooms
,price_max
You can make this view materialized. The query behind the view is then not executed each time the view is queried; the stored result behaves like a table.
When you add new records to the base table you will need to refresh the view (unfortunately, PostgreSQL does not support auto-refresh like some other systems do):
REFRESH MATERIALIZED VIEW my_view;
or you can schedule a task to run the refresh.
If you are using only exact search for each field, this will work. But in your example, you have criteria like:
month BETWEEN '2020-01' AND '2020-03'
AND rooms IN (1,2)
AND price_max BETWEEN 400001 AND 500000
In such cases, I usually write the same query but SUM the data from the materialized view. However, your query uses count(DISTINCT user_id), and summing per-combination distinct counts can count the same user more than once, for example a user who has rows for more than one of the selected room counts.
If this is an issue, you would need to precalculate a very large number of combinations, and I doubt that is the answer. Alternatively, you can try to normalize your data - this will improve the performance of the aggregations.
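The double-counting risk can be seen on a tiny example. The sketch below uses Python's sqlite3 module; because SQLite has no materialized views, a plain summary table (hypothetically named preferences_summary) stands in for one, and the three rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE user_preferences (
        month TEXT, user_id INTEGER, city TEXT,
        district TEXT, rooms INTEGER, price_max INTEGER
    )
""")
conn.executemany("INSERT INTO user_preferences VALUES (?, ?, ?, ?, ?, ?)", [
    ("2020-01", 1, "Madrid", "Latina", 1, 450000),
    ("2020-01", 1, "Madrid", "Latina", 2, 450000),  # same user, other room count
    ("2020-01", 2, "Madrid", "Latina", 1, 450000),
])

# Stand-in for the materialized view: one row per exact filter combination.
conn.execute("""
    CREATE TABLE preferences_summary AS
    SELECT month, city, district, rooms, price_max,
           count(DISTINCT user_id) AS interested_users
    FROM user_preferences
    GROUP BY month, city, district, rooms, price_max
""")

# Summing the pre-aggregated counts across rooms 1 and 2 counts user 1 twice.
summed = conn.execute("""
    SELECT sum(interested_users) FROM preferences_summary
    WHERE month = '2020-01' AND rooms IN (1, 2)
""").fetchone()[0]

# Counting distinct users on the base table gives the true figure.
exact = conn.execute("""
    SELECT count(DISTINCT user_id) FROM user_preferences
    WHERE month = '2020-01' AND rooms IN (1, 2)
""").fetchone()[0]

print(summed, exact)
```

With exact-match filters on every column the two numbers agree; they diverge only when a filter spans several pre-aggregated rows that share users.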

multiple Access queries using same criteria

New to this...
I have a table QAQC_Studies that includes titles, dates, and subject matter
I have another table QAQC_Publications that includes citation information for multiple publications resulting from a single study in the first table.
Every 3 months I need to create a report to QC studies added by coworkers so I run the following query (with some additional attributes removed for brevity). The where clause is a list of study IDs they provide me (often 15-20 different studies).
SELECT QAQC_Studies.StudiesID,
QAQC_Studies.NSL,
QAQC_Studies.StudyTitle,
QAQC_Studies.Abstract,
QAQC_Studies.StudyStatus
FROM QAQC_Studies
WHERE [QAQC_Studies].[StudiesID]=26806 or 26845
I'd like to add to that report a list of the publications associated with each study.
How do I write the Where clause in the second query to reference those studies indicated in the first query?
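You can use a subquery. In the query below, QAQC_Field is a placeholder for the publication columns you want, and StudiesID stands for whichever field the two tables share for the publication/study connection. Also note that =26806 or 26845 does not test against both IDs: the bare 26845 is evaluated as a condition of its own and, being non-zero, is treated as true, so every row matches. Use IN with the list of IDs instead:
SELECT [QAQC_Publications].[QAQC_Field]
FROM [QAQC_Publications]
WHERE [QAQC_Publications].[StudiesID]
IN (SELECT QAQC_Studies.StudiesID
FROM QAQC_Studies
WHERE [QAQC_Studies].[StudiesID] IN (26806, 26845))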
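A small runnable sketch of the subquery approach, using Python's sqlite3 module rather than Access. The columns PublicationID and Citation and all rows are invented for illustration; only the table names and StudiesID come from the question. It also shows why a condition written as = 26806 OR 26845 is too loose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE QAQC_Studies (StudiesID INTEGER PRIMARY KEY, StudyTitle TEXT);
    CREATE TABLE QAQC_Publications (
        PublicationID INTEGER PRIMARY KEY,  -- assumed column
        StudiesID     INTEGER,
        Citation      TEXT                  -- assumed column
    );
    INSERT INTO QAQC_Studies VALUES
        (26806, 'Study A'), (26845, 'Study B'), (30000, 'Study C');
    INSERT INTO QAQC_Publications VALUES
        (1, 26806, 'Pub A1'), (2, 26806, 'Pub A2'),
        (3, 26845, 'Pub B1'), (4, 30000, 'Pub C1');
""")

# The list of study IDs is supplied once and reused through the subquery.
study_ids = [26806, 26845]
placeholders = ", ".join("?" for _ in study_ids)

publications = conn.execute(f"""
    SELECT StudiesID, Citation
    FROM QAQC_Publications
    WHERE StudiesID IN (
        SELECT StudiesID FROM QAQC_Studies
        WHERE StudiesID IN ({placeholders})
    )
    ORDER BY PublicationID
""", study_ids).fetchall()

# "= 26806 OR 26845" treats the bare 26845 as a true condition,
# so in SQLite it matches every study, not just the two intended ones.
loose_count = conn.execute("""
    SELECT count(*) FROM QAQC_Studies
    WHERE StudiesID = 26806 OR 26845
""").fetchone()[0]

print(publications, loose_count)
```

Keeping the ID list in one place means the quarterly report only needs that list edited, whichever database runs the queries.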

Count Distinct Records and

I have a very large database and I need to extract information from 3 columns:
I am trying to determine how many unique customers the user is processing.
Username. This is unique name.
CustomerNumber. A customer number will appear on many lines, as they could have ordered many products and each product is a line.
Date Range. I need to be able to define a date range.
The code I have tried counts the customer numbers, but it counts every line rather than just the distinct customer numbers.
I have not tried the date range yet.
I have attached 2 images to show an example of the database and the end result. We used a pivot table to produce this result, but the data changes all the time and we don't want to recreate the pivot table every time.
Image of Sample Data in Excel:
Image of Required Final Result
SELECT `'All data$'`.Username, Count(`'All data$'`.CustomerNumber)
FROM `C:\Users\rhynto\Desktop\Darren Qwix\QWIX_PICKED.xlsx`.`'All data$'` `'All data$'`
GROUP BY `'All data$'`.Username
I will appreciate any advice on this please.
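One way to count each customer once per user is to reduce the data to distinct (Username, CustomerNumber) pairs inside the date range first, and then count the pairs per user. The sketch below runs that idea in Python's sqlite3 module; the table name all_data, the date column OrderDate, and the rows are assumptions for illustration, since the real sheet's date column is not named in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE all_data (
        Username       TEXT,
        CustomerNumber TEXT,
        OrderDate      TEXT   -- assumed column name, ISO 8601 dates
    )
""")
conn.executemany("INSERT INTO all_data VALUES (?, ?, ?)", [
    ("alice", "C1", "2021-01-05"),
    ("alice", "C1", "2021-01-05"),  # same customer, second product line
    ("alice", "C2", "2021-01-06"),
    ("bob",   "C1", "2021-01-07"),
    ("bob",   "C3", "2021-02-15"),  # outside the date range below
])

date_from, date_to = "2021-01-01", "2021-01-31"

# Inner query: one row per user/customer pair in the range.
# Outer query: count those pairs per user.
unique_customers = conn.execute("""
    SELECT Username, COUNT(*) AS UniqueCustomers
    FROM (
        SELECT DISTINCT Username, CustomerNumber
        FROM all_data
        WHERE OrderDate BETWEEN ? AND ?
    ) AS pairs
    GROUP BY Username
    ORDER BY Username
""", (date_from, date_to)).fetchall()

print(unique_customers)
```

The derived-table form is used here instead of COUNT(DISTINCT ...) because it should also work in engines that do not accept DISTINCT inside an aggregate.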

Troubleshooting an inner join that's not finding expected matches

Okay...I'm not a coder and my code has lots of steps that would make it difficult to post and get an outright answer. So I am looking for general steps you would follow if an inner join does not seem to be working correctly. Here is my general situation:
Problem with inner join:
I start with two tables that I am basically appending to each other - they share most fields, including "id". One of the tables contains households who received an email, and the other contains households who did not receive an email - "controls". So I append them into a single table. Keep in mind they come from different sources, with different processes creating them.
Then I match the id against another table that contains only customers and get a custnum for some of those households that are indeed customers.
Next is to use the custnum variable to join to a sales table. At least some controls, and likely a greater number of the mailed households should be customers and have sales - the point of the email was to obviously bring about sales.
My problem is that NO control households are showing up with any sales. That is impossible, given that there are hundreds of thousands of households. I'm getting a reasonable number of matches to the emailed households.
In trying to troubleshoot this all I can figure is that somehow there is a format issue of the id or the custnum fields between the mailed and control households - perhaps because they did come from different sources and I had to append them together at the start.
Is this possible? Should both the format and informat be the same for each key? What else could be the problem?
It is much easier to append data using a DATA STEP than using SQL statements.
data both ;
set email noemail;
run;
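If you want to do the same thing in SQL then use a set operator such as UNION instead of any type of join. Note that plain UNION removes duplicate rows and lines columns up by position; OUTER UNION CORR keeps every row and matches columns by name, which is closer to what the data step does.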
proc sql ;
create table both as
select * from email
union
select * from noemail
;
quit;
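For the troubleshooting part of the question, a general first step is to count, per source, how many keys find a match, and then to look at the unmatched keys for differences such as trailing blanks, leading zeros, or a different type. The sketch below illustrates this in Python's sqlite3 module rather than SAS; the table names households and customers and all rows are hypothetical, with the control ids deliberately carrying a trailing space.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE households (id TEXT, source TEXT);
    CREATE TABLE customers  (id TEXT, custnum TEXT);
    -- Control ids carry a trailing blank, as might happen with another source.
    INSERT INTO households VALUES
        ('H1', 'email'), ('H2', 'email'),
        ('H3 ', 'control'), ('H4 ', 'control');
    INSERT INTO customers VALUES ('H1', 'C1'), ('H3', 'C3');
""")

# Step 1: a LEFT JOIN keeps unmatched rows, so match rates per source show up.
match_counts = conn.execute("""
    SELECT h.source,
           count(*)         AS households,
           count(c.custnum) AS matched
    FROM households AS h
    LEFT JOIN customers AS c ON h.id = c.id
    GROUP BY h.source
    ORDER BY h.source
""").fetchall()

# Step 2: inspect unmatched keys; length() exposes invisible padding.
unmatched = conn.execute("""
    SELECT h.id, length(h.id)
    FROM households AS h
    LEFT JOIN customers AS c ON h.id = c.id
    WHERE c.id IS NULL AND h.source = 'control'
    ORDER BY h.id
""").fetchall()

# Step 3: normalise the key on both sides and re-check the match rates.
fixed_counts = conn.execute("""
    SELECT h.source, count(*) AS households, count(c.custnum) AS matched
    FROM households AS h
    LEFT JOIN customers AS c ON trim(h.id) = trim(c.id)
    GROUP BY h.source
    ORDER BY h.source
""").fetchall()

print(match_counts, unmatched, fixed_counts)
```

A source whose match count is exactly zero, as with the controls here, usually points to a systematic key difference rather than to missing customers.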

SQL - how to include average of a column in a report

Here's the scenario:
I have an Access project with tables consisting of the following data:
Activity table: ActivityName, ActivityPopularityRating
Volunteer table: VolunteerName, VolunteerRating
Each activity can have many volunteers and each volunteer has a rating.
I have to create a report which shows, for each activity, the names and ratings of the volunteers taking part in that activity, as well as the average VolunteerRating of those volunteers. An example is attached.
I have created the SQL query but I am not sure if I should generate the average value needed in the query, or if there is some function in Access that would allow me to do that in the report.
Here is my Query:
SELECT Activity.ActivityName,
Activity.ActivityPopularityRating,
StudentVolunteer.VolunteerName,
StudentVolunteer.VolunteerRating,
AVG(StudentVolunteer.VolunteerRating)
FROM Activity
INNER JOIN StudentVolunteer ON Activity.ActivityName = StudentVolunteer.ActivityName
GROUP BY Activity.ActivityName
All help is appreciated
Thanks
To add a total line (here showing the average rating) to your results, you would use UNION ALL in MS Access (or ROLLUP in another DBMS). However, it is not necessary to add such a line to your query. The data is already there (it is the average of the selected ratings, which can easily be calculated from them).
So remove GROUP BY and the AVG line from your query and add AVG(VolunteerRating) in your report layer instead. The report will then calculate the average from the ratings.
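If the average is wanted in the query after all, a different technique from the one suggested above is to join the detail rows to a derived table that holds one average per activity. The sketch below shows this in Python's sqlite3 module rather than Access; the sample rows are invented, and the alias avg_by_activity and the column AvgRating are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Activity (
        ActivityName TEXT PRIMARY KEY,
        ActivityPopularityRating INTEGER
    );
    CREATE TABLE StudentVolunteer (
        VolunteerName TEXT, VolunteerRating REAL, ActivityName TEXT
    );
    INSERT INTO Activity VALUES ('Cleanup', 4), ('Tutoring', 5);
    INSERT INTO StudentVolunteer VALUES
        ('Ann', 4.0, 'Cleanup'), ('Ben', 2.0, 'Cleanup'), ('Cy', 5.0, 'Tutoring');
""")

# Detail rows stay ungrouped; the per-activity average comes from the
# derived table and is repeated on every row of that activity.
report_rows = conn.execute("""
    SELECT a.ActivityName,
           a.ActivityPopularityRating,
           v.VolunteerName,
           v.VolunteerRating,
           avg_by_activity.AvgRating
    FROM Activity AS a
    INNER JOIN StudentVolunteer AS v
        ON a.ActivityName = v.ActivityName
    INNER JOIN (
        SELECT ActivityName, AVG(VolunteerRating) AS AvgRating
        FROM StudentVolunteer
        GROUP BY ActivityName
    ) AS avg_by_activity
        ON avg_by_activity.ActivityName = a.ActivityName
    ORDER BY a.ActivityName, v.VolunteerName
""").fetchall()

for row in report_rows:
    print(row)
```

This avoids the original problem of mixing ungrouped columns with AVG under a single GROUP BY, at the cost of repeating the average on each detail row.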