MariaDB joining tables on themselves - sql

OK, I've googled and I've tried, but mostly I've failed.
I've got a table with the following columns:
ID (just a primary key)
UserUUID
Category
Value
I can run a query that returns the rankings of a specific category for all users:
SELECT
RANK() OVER (PARTITION BY t1.cat ORDER BY value DESC) as rank,
t1.UUID, t1.cat, t1.value
FROM t1
WHERE t1.cat='Category1'
ORDER by t1.value DESC
So this outputs something like:
| 1 | sdc9c4-541 | cat1 | 16102 |
| 2 | sqdf5d-542 | cat1 | 7313 |
| 3 | sqsd5d-685 | cat1 | 7116 |
| 4 | s45sdf-213 | cat1 | 4158 |
.....
This works, but now I'm trying to get the reverse view on this.
So I'm trying to write a query that returns the rankings of a specific user for all categories.
The desired output should look something like:
| 1 | sdc9c4-541 | cat1 | 16102 |
| 37 | sdc9c4-541 | cat2 | 25 |
| 15 | sdc9c4-541 | cat3 | 2345 |
| 2 | sdc9c4-541 | cat4 | 912 |
This shows the rank, user, category and value, where the rank represents the user's ranking in that category in comparison with other users.
I've already messed around with subqueries, WITH clauses, variables and joins, but I can't get this result to come out.
Can anybody give me some pointers on which direction I need to look in to make this work?
Thanks in advance
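
One possible direction, sketched here under assumptions: rank every user inside each category in a derived table first, and only then filter on the user in the outer query. Filtering on the user in the same SELECT as the window function would rank that user against nobody. The sample rows, the `rnk` alias and the user ids are hypothetical, and the sketch runs on SQLite through Python so it can be tried without a server; the same SQL shape should carry over to a MariaDB version that supports window functions.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the query in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t1 (id INTEGER PRIMARY KEY, UUID TEXT, cat TEXT, value INTEGER)"
)
conn.executemany(
    "INSERT INTO t1 (UUID, cat, value) VALUES (?, ?, ?)",
    [
        ("u1", "cat1", 16102), ("u2", "cat1", 7313), ("u3", "cat1", 7116),
        ("u1", "cat2", 25), ("u2", "cat2", 300), ("u3", "cat2", 40),
    ],
)

# Inner query: rank all users per category.
# Outer query: keep only the rows of the requested user.
query = """
SELECT rnk, UUID, cat, value
FROM (
    SELECT RANK() OVER (PARTITION BY cat ORDER BY value DESC) AS rnk,
           UUID, cat, value
    FROM t1
) ranked
WHERE UUID = ?
ORDER BY cat
"""
rows = conn.execute(query, ("u1",)).fetchall()
for row in rows:
    print(row)
```

With this sample data, user `u1` is first in `cat1` and third in `cat2`.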

Related

Turn results of count distinct into something that can be aggregated

I have a table like this:
+----------+--------------+-------------+
| category | sub_category | customer_id |
+----------+--------------+-------------+
| A | AB2 | A876 |
| A | AB2 | A876 |
| A | AA1 | A876 |
| A | AA1 | A876 |
| A | AC3 | A756 |
| B | AB2 | A876 |
| B | AA1 | A756 |
| B | AB7 | A908 |
| C | AA1 | A756 |
| C | AB7 | A908 |
| C | AC3 | A908 |
+----------+--------------+-------------+
And I want to count distinct customers so I can easily do something like:
SELECT category, sub_category, COUNT(DISTINCT customer_id) as count_of_customers
FROM tbl
GROUP BY category, sub_category
And I get a report that gives me distinct customers for each sub_category and category. But these numbers can no longer be aggregated as there needs to be de-duplication if I just need distinct customers by category only.
For example, customer_id = 'A876' will be counted twice in category='A' (once in sub_category = 'AB2' and once in sub_category = 'AA1') if I just sum the count_of_customers from my query result.
So here is the question: I would like to make these query results "aggregatable". Looking at the problem, it looks like this just isn't possible, but I am wondering if there is some clever way of distributing these results across categories, so that in my reporting layer (like an Excel pivot table) I can get a result that counts 'A876' once in category='A' but counts it twice when I also include sub_category in the fields. Basically, converting the results into something summable.
I should mention that this is an overly simplified example. The solution will need to generalize across n different categories and sub_categories.
I am looking for an output that would easily allow me to get either of the following results in something similar to a pivot table (think tableau-like reporting tools):
+----------+--------------------+
| category | distinct_customers |
+----------+--------------------+
| A | 2 |
| B | 3 |
| C | 2 |
+----------+--------------------+
+--------------+--------------------+
| sub_category | distinct_customers |
+--------------+--------------------+
| AA1 | 2 |
| AB2 | 1 |
| AB7 | 1 |
| AC3 | 2 |
+--------------+--------------------+
My immediate thought is to assign weights to a customer_id depending on how many categories and sub_categories it occurs in but I don't know exactly how I'd go about doing this.
You can do exactly what you want -- assigning weights. But this still won't aggregate correctly. Assuming there are no duplicates:
select category, sub_category,
count(distinct customer_id),
sum(1.0 / num_cs) as weighted_customers
from (select t.*,
count(*) over (partition by customer_id) as num_cs
from t
) t
group by category, sub_category;
This weights by both category and sub_category. Obviously, you can adjust the partition by to weight by just one or the other.
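
A minimal sketch of what the weighting does and does not give you, assuming the table is called `t` as in the query above and holds the question's rows with the exact duplicates removed. It runs on SQLite through Python purely for convenience. The weights sum to the overall number of distinct customers, but a per-category sum of weights does not match the per-category distinct count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (category TEXT, sub_category TEXT, customer_id TEXT)")
# Rows from the question, with exact duplicates removed (the stated assumption).
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("A", "AB2", "A876"), ("A", "AA1", "A876"), ("A", "AC3", "A756"),
        ("B", "AB2", "A876"), ("B", "AA1", "A756"), ("B", "AB7", "A908"),
        ("C", "AA1", "A756"), ("C", "AB7", "A908"), ("C", "AC3", "A908"),
    ],
)

# Each row gets weight 1 / (number of rows for that customer).
weighted = conn.execute("""
    SELECT category, sub_category,
           COUNT(DISTINCT customer_id) AS customers,
           SUM(1.0 / num_cs) AS weighted_customers
    FROM (SELECT t.*,
                 COUNT(*) OVER (PARTITION BY customer_id) AS num_cs
          FROM t) AS w
    GROUP BY category, sub_category
""").fetchall()

# Summing every weight recovers the overall distinct customer count.
grand_total = sum(row[3] for row in weighted)

# Summing the weights of one category does not give its distinct count.
category_a = sum(row[3] for row in weighted if row[0] == "A")
distinct_a = conn.execute(
    "SELECT COUNT(DISTINCT customer_id) FROM t WHERE category = 'A'"
).fetchone()[0]

print(round(grand_total, 6), round(category_a, 6), distinct_a)
```

Here every customer appears in three rows, so each row weighs one third: the grand total comes to 3, while category A sums to 1 even though it contains 2 distinct customers.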

Make a query making groups on the same result row

I have two tables. Like this.
select * from extrafieldvalues;
+----------------------------+
| id | value | type | idItem |
+----------------------------+
| 1 | 100 | 1 | 10 |
| 2 | 150 | 2 | 10 |
| 3 | 101 | 1 | 11 |
| 4 | 90 | 2 | 11 |
+----------------------------+
select * from items
+------------+
| id | name |
+------------+
| 10 | foo |
| 11 | bar |
+------------+
I need to make a query and get something like this:
+--------------------------------------+
| idItem | valtype1 | valtype2 | name |
+--------------------------------------+
| 10 | 100 | 150 | foo |
| 11 | 101 | 90 | bar |
+--------------------------------------+
The quantity of types of extra field values is variable, but every item ALWAYS uses every extra field.
If you have only two fields, then left join is an option for this:
select i.*, efv1.value as value_1, efv2.value as value_2
from items i left join
extrafieldvalues efv1
on efv1.iditem = i.id and
efv1.type = 1 left join
extrafieldvalues efv2
on efv2.iditem = i.id and
efv2.type = 2 ;
In terms of performance, two joins are probably faster than an aggregation -- and it makes it easier to bring in more columns from items. On the other hand, conditional aggregation generalizes more easily, and its performance changes little as more columns from extrafieldvalues are added to the select.
Use conditional aggregation
select iditem,
max(case when type=1 then value end) as valtype1,
max(case when type=2 then value end) as valtype2,name
from extrafieldvalues a inner join items b on a.iditem=b.id
group by iditem,name
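
Since the question says the number of types is variable, one way to cope is to build the conditional-aggregation columns from the distinct types found in the table. The following is only a sketch under assumptions: it uses SQLite through Python so it is self-contained, the aliases `e` and `i` are illustrative, and the type values are assumed to be integers (they are cast with `int()` before being formatted into the SQL text).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE extrafieldvalues (
    id INTEGER PRIMARY KEY, value INTEGER, type INTEGER, idItem INTEGER
);
INSERT INTO items VALUES (10, 'foo'), (11, 'bar');
INSERT INTO extrafieldvalues VALUES
    (1, 100, 1, 10), (2, 150, 2, 10), (3, 101, 1, 11), (4, 90, 2, 11);
""")

# Discover which types exist, then emit one MAX(CASE ...) column per type.
types = [
    row[0]
    for row in conn.execute("SELECT DISTINCT type FROM extrafieldvalues ORDER BY type")
]
columns = ",\n       ".join(
    f"MAX(CASE WHEN e.type = {int(t)} THEN e.value END) AS valtype{int(t)}"
    for t in types
)

query = f"""
SELECT e.idItem,
       {columns},
       i.name
FROM extrafieldvalues e
JOIN items i ON i.id = e.idItem
GROUP BY e.idItem, i.name
ORDER BY e.idItem
"""
pivot = conn.execute(query).fetchall()
for row in pivot:
    print(row)
```

With the sample rows from the question this yields one row per item, with one value column per type followed by the item name.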

PostgreSQL: Using the LEAST() command after GROUP BY to achieve first transactions

I am working with a magento table like this:
+-----------+--------------+------------+--------------+----------+-------------+
| date | email | product_id | product_type | order_id | qty_ordered |
+-----------+--------------+------------+--------------+----------+-------------+
| 2017/2/15 | x#y.com | 18W1 | custom | 12 | 1 |
+-----------+--------------+------------+--------------+----------+-------------+
| 2017/2/15 | x#y.com | 18W2 | simple | 17 | 3 |
+-----------+--------------+------------+--------------+----------+-------------+
| 2017/2/20 | z#abc.com | 22Y34 | simple | 119 | 1 |
+-----------+--------------+------------+--------------+----------+-------------+
| 2017/2/20 | z#abc.com | 22Y35 | custom | 31 | 2 |
+-----------+--------------+------------+--------------+----------+-------------+
I want to make a new view by grouping by email, and then taking the row with the LEAST of order_id only.
So my final table after doing this operation from above should look like this:
+-----------+--------------+------------+--------------+----------+-------------+
| date | email | product_id | product_type | order_id | qty_ordered |
+-----------+--------------+------------+--------------+----------+-------------+
| 2017/2/15 | x#y.com | 18W1 | custom | 12 | 1 |
+-----------+--------------+------------+--------------+----------+-------------+
| 2017/2/20 | z#abc.com | 22Y35 | custom | 31 | 2 |
+-----------+--------------+------------+--------------+----------+-------------+
I'm trying to use the following query (but it's not working):
SELECT * , (SELECT DISTINCT table.email, table.order_id,
LEAST (order_id) AS first_transaction_id
FROM
table
GROUP BY
email)
FROM table;
Would really love any help with this, thank you!
I think you want distinct on:
select distinct on (email) t.*
from t
order by email, order_id;
distinct on is a Postgres extension. It returns one row for each combination of the keys in parentheses, chosen according to the order by clause. In this case, that is one row per email: the one with the smallest order_id (because of the order by). The keys in distinct on also need to be the first keys in the order by.
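
For readers who want to try the idea without a Postgres server, here is a sketch of a portable equivalent. It does not use distinct on (SQLite does not have it); instead it numbers the rows per email with ROW_NUMBER() and keeps the first one, which selects the same row when order_id is unique per email. The table name `orders` is an assumption, since the question does not give the real one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "orders" is a hypothetical table name; columns follow the question.
conn.execute("""
CREATE TABLE orders (
    date TEXT, email TEXT, product_id TEXT,
    product_type TEXT, order_id INTEGER, qty_ordered INTEGER
)
""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("2017/2/15", "x#y.com", "18W1", "custom", 12, 1),
        ("2017/2/15", "x#y.com", "18W2", "simple", 17, 3),
        ("2017/2/20", "z#abc.com", "22Y34", "simple", 119, 1),
        ("2017/2/20", "z#abc.com", "22Y35", "custom", 31, 2),
    ],
)

# Number rows per email by ascending order_id, then keep row number 1.
first_orders = conn.execute("""
SELECT date, email, product_id, product_type, order_id, qty_ordered
FROM (
    SELECT o.*,
           ROW_NUMBER() OVER (PARTITION BY email ORDER BY order_id) AS rn
    FROM orders o
) numbered
WHERE rn = 1
ORDER BY email
""").fetchall()
for row in first_orders:
    print(row)
```

Each email keeps the whole row that carries its smallest order_id, rather than a minimum computed column by column.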

Window functions limited by value in separate column

I have a "responses" table in my postgres database that looks like
| id | question_id |
| 1 | 1 |
| 2 | 2 |
| 3 | 1 |
| 4 | 2 |
| 5 | 2 |
I want to produce a table with the response and question id, as well as the id of the previous response with that same question id, as such
| id | question_id | lag_resp_id |
| 1 | 1 | |
| 2 | 2 | |
| 3 | 1 | 1 |
| 4 | 2 | 2 |
| 5 | 2 | 4 |
Obviously pulling "lag(responses.id) over (order by responses.id)" will pull the previous response id regardless of question_id. I attempted the below subquery, but I know it is wrong since I am basically making a table of all lag ids for each question id in the subquery.
select
responses.question_id,
responses.id as response_id,
(select
lag(r2.id, 1) over (order by r2.id)
from
responses as r2
where
r2.question_id = responses.question_id
)
from
responses
I don't know if I'm on the right track with the subquery, or if I need to do something more advanced (which may involve "partition by", which I do not know how to use).
Any help would be hugely appreciated.
Use partition by. There is no need for a correlated subquery here.
select id,question_id,
lag(id) over (partition by question_id order by id) lag_resp_id
from responses
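
A small runnable check of the partition by version against the sample data from the question. It uses SQLite through Python only so that it is self-contained; the window function syntax is the same as in the Postgres query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE responses (id INTEGER PRIMARY KEY, question_id INTEGER)")
conn.executemany(
    "INSERT INTO responses VALUES (?, ?)",
    [(1, 1), (2, 2), (3, 1), (4, 2), (5, 2)],
)

# LAG looks one row back inside each question_id partition, ordered by id.
rows = conn.execute("""
SELECT id, question_id,
       LAG(id) OVER (PARTITION BY question_id ORDER BY id) AS lag_resp_id
FROM responses
ORDER BY id
""").fetchall()
for row in rows:
    print(row)
```

The first response of each question has no predecessor, so its lag_resp_id is NULL (None in Python); the others point at the previous response to the same question.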

Create a pivot table from two tables based on dates

I have two MS Access tables sharing a one to many relationship. Their structures are like the following:
tbl_Persons
+----------+------------+-----------+
| PersonID | PersonName | OtherData |
+----------+------------+-----------+
| 1 | PersonA | etc. |
| 2 | PersonB | |
| 3 | PersonC | |
tbl_Visits
+----------+------------+------------+-----------------------
| VisitID | PersonID | VisitDate | dozens of other fields
+----------+------------+------------+-----------
| 1 | 1 | 09/01/13 |
| 2 | 1 | 09/02/13 |
| 3 | 2 | 09/03/13 |
| 4 | 2 | 09/04/13 | etc...
I wish to create a new table based on the VisitDate field, the column headings of which are Visit-n where n is 1 to the number of visits, Visit-n-Data1, Visit-n-Data2, Visit-n-Data3 etc.
MergedTable
+----------+----------+---------------+-----------------+----------+----------------+
| PersonID | Visit1 | Visit1Data1 | Visit1Data2... | Visit2 | Visit2Data1... |
+----------+----------+---------------+-----------
| 1 | 09/01/13 | | | 09/02/13 |
| 2 | 09/03/13 | | | 09/04/13 |
| 3 | etc. | |
I am really not sure how to do this. Whether SQL query or using DAO then looping through records and columns. It is essential that there is only 1 PersonID per row and all his data appears chronologically into columns.
Start off by ranking the visits with something like:
SELECT PersonID, VisitID,
(SELECT COUNT(VisitID) FROM tbl_Visits AS C
WHERE C.PersonID = tbl_Visits.PersonID
AND C.VisitDate < tbl_Visits.VisitDate) AS RankNumber
FROM tbl_Visits
Use this query as a base for the 'pivot'.
If a person can have several visits on the same day, the WHERE clause needs to be a bit more sophisticated (for example, by breaking ties on VisitID). But I hope you get the basic concept.
Pivoting can be done with multiple LEFT JOINs.
I have not tested this solution, so I cannot say whether it will perform well. This is easier to accomplish in SQL Server than in MS Access.
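
The two steps (rank the visits per person, then pivot with LEFT JOINs) can be sketched as follows. This is an illustration under several assumptions, not tested Access SQL: it runs on SQLite through Python, it uses a WITH clause where Access would more likely need a saved query, the dates are stored in ISO form assuming the question's dates are MM/DD/YY, ties on the same date are broken by VisitID, and only two visit columns are produced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_Persons (PersonID INTEGER PRIMARY KEY, PersonName TEXT);
CREATE TABLE tbl_Visits (VisitID INTEGER PRIMARY KEY, PersonID INTEGER, VisitDate TEXT);
INSERT INTO tbl_Persons VALUES (1, 'PersonA'), (2, 'PersonB'), (3, 'PersonC');
-- ISO dates, assuming the question's values are MM/DD/YY
INSERT INTO tbl_Visits VALUES
    (1, 1, '2013-09-01'), (2, 1, '2013-09-02'),
    (3, 2, '2013-09-03'), (4, 2, '2013-09-04');
""")

# Step 1: rank = 1 + number of earlier visits of the same person.
# Step 2: one LEFT JOIN per visit column to be produced.
merged = conn.execute("""
WITH ranked AS (
    SELECT v.PersonID, v.VisitID, v.VisitDate,
           (SELECT COUNT(*)
            FROM tbl_Visits c
            WHERE c.PersonID = v.PersonID
              AND (c.VisitDate < v.VisitDate
                   OR (c.VisitDate = v.VisitDate AND c.VisitID < v.VisitID))
           ) + 1 AS RankNumber
    FROM tbl_Visits v
)
SELECT p.PersonID, r1.VisitDate AS Visit1, r2.VisitDate AS Visit2
FROM tbl_Persons p
LEFT JOIN ranked r1 ON r1.PersonID = p.PersonID AND r1.RankNumber = 1
LEFT JOIN ranked r2 ON r2.PersonID = p.PersonID AND r2.RankNumber = 2
ORDER BY p.PersonID
""").fetchall()
for row in merged:
    print(row)
```

Persons without visits still get a row, with empty visit columns, because the pivot starts from tbl_Persons. Further fields of each visit could be selected from `r1`, `r2`, and so on in the same way.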