PostgreSQL - Best approach for summarizing data - sql

We have the following data in our system:
User data
Experience
Education
Job Application
This data will be used across the application, and some logic is attached to it.
To keep this data consistent across the application, I thought of creating a view that returns the counts of this data and then using that view in different places.
The question is: since the detail tables have no relation to each other, how should I create the view?
Create a separate view for each table and then use GROUP BY
Create one view and write subqueries to get the data
From a performance perspective, which is the best approach?
For example:
SELECT
UserId,
COUNT(*) AS ExperienceCount,
0 AS EducationCount
FROM User
INNER JOIN Experience ON user_id = User_Id
GROUP BY
UserId
UNION ALL
SELECT
UserId,
0,
COUNT(*)
FROM User
INNER JOIN Education ON user_id = user_id
GROUP BY
UserId
I would then group this result to get a summary of all this data in one row per user.

One way to write the query that you have specified would probably be:
SELECT UserId, SUM(ExperienceCount), SUM(EducationCount)
FROM ((SELECT UserId, COUNT(*) as ExperienceCount, 0 AS EducationCount
FROM Experience
GROUP BY UserId
) UNION ALL
(SELECT UserId, 0, COUNT(*)
FROM Education
GROUP BY UserId
)
) u
GROUP BY UserId;
This can also be written as a FULL JOIN, LEFT JOIN, and using correlated subqueries. Each of these can be appropriate in different circumstances, depending on your data.
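To make the comparison concrete, here is a minimal sketch that runs the UNION ALL form next to a correlated-subquery form. It uses Python's sqlite3 module rather than PostgreSQL, and the table and column names (users, experience, education, user_id) are assumptions for illustration, not the asker's real schema.

```python
import sqlite3

# Hypothetical schema and data; names are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER PRIMARY KEY);
    CREATE TABLE experience (id INTEGER PRIMARY KEY, user_id INTEGER);
    CREATE TABLE education (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO users (user_id) VALUES (1), (2), (3);
    INSERT INTO experience (user_id) VALUES (1), (1), (2);
    INSERT INTO education (user_id) VALUES (1);
""")

# Variant 1: count each detail table separately, stack with UNION ALL,
# then sum so that each user ends up on one row.
union_sql = """
    SELECT user_id, SUM(experience_count), SUM(education_count)
    FROM (SELECT user_id, COUNT(*) AS experience_count, 0 AS education_count
          FROM experience GROUP BY user_id
          UNION ALL
          SELECT user_id, 0, COUNT(*)
          FROM education GROUP BY user_id) u
    GROUP BY user_id
    ORDER BY user_id
"""

# Variant 2: one correlated subquery per detail table, driven by users.
subquery_sql = """
    SELECT u.user_id,
           (SELECT COUNT(*) FROM experience e WHERE e.user_id = u.user_id),
           (SELECT COUNT(*) FROM education d WHERE d.user_id = u.user_id)
    FROM users u
    ORDER BY u.user_id
"""

union_rows = conn.execute(union_sql).fetchall()
subquery_rows = conn.execute(subquery_sql).fetchall()
print(union_rows)
print(subquery_rows)
```

One difference worth noticing: the UNION ALL form only returns users that have at least one detail row, while the subquery form driven by the users table also returns users with zero counts. Which one performs better depends on the data and indexes, so it is worth checking the query plan on the real tables.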

Related

Order by date, while grouping matches by another column

I have this query
SELECT *, COUNT(app.id) AS totalApps FROM users JOIN app ON app.id = users.id
GROUP BY app.id ORDER BY app.time DESC LIMIT ?
which is supposed to get all results from "users" ordered by another column (time) in a related table (the id from the app tables references the id from the users table).
The issue I have is that the grouping is done before the ordering by date, so I get very old results. But I need the grouping in order to get distinct users, because each user can have multiple 'apps'... Is there a different way to achieve this?
Table users:
id TEXT PRIMARY KEY
Table app:
id TEXT
time DATETIME
FOREIGN KEY(id) REFERENCES users(id)
In my SELECT query I want to get a list of users, ordered by the app.time column. But because one user can have multiple app records associated, I could get duplicate users. That's why I used GROUP BY, but then the order is messed up.
The underlying issue is that the SELECT is an aggregate query as it contains a GROUP BY clause :-
There are two types of simple SELECT statement - aggregate and
non-aggregate queries. A simple SELECT statement is an aggregate query
if it contains either a GROUP BY clause or one or more aggregate
functions in the result-set.
SQL As Understood By SQLite - SELECT
And thus the column's value for that group will be an arbitrary value of that column within the group (the first according to the scan/search, I suspect, hence the lower values) :-
If the SELECT statement is an aggregate query without a GROUP BY
clause, then each aggregate expression in the result-set is evaluated
once across the entire dataset. Each non-aggregate expression in the
result-set is evaluated once for an arbitrarily selected row of the
dataset. The same arbitrarily selected row is used for each
non-aggregate expression. Or, if the dataset contains zero rows, then
each non-aggregate expression is evaluated against a row consisting
entirely of NULL values.
So in short, you cannot rely upon column values that aren't part of the group/aggregation when it's an aggregate query.
Therefore you have to retrieve the required values using an aggregate expression, such as max(app.time). However, in my test, ordering by this value directly did not sort as expected (I'm not sure exactly why; note that the time column is declared as TEXT in my example below, so the values compare as strings).
HOWEVER
What you can do is use the query to build a CTE and then sort without aggregates involved.
Consider the following, which I think mimics your problem:-
DROP TABLE IF EXISTS users;
DROP TABLE IF EXISTS app;
CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, username TEXT);
INSERT INTO users (username) VALUES ('a'),('b'),('c'),('d');
CREATE TABLE app (the_id INTEGER PRIMARY KEY, id INTEGER, appname TEXT, time TEXT);
INSERT INTO app (id,appname,time) VALUES
(4,'app9',721),(4,'app10',7654),(4,'app11',11),
(3,'app1',1000),(3,'app2',7),
(2,'app3',10),(2,'app4',101),(2,'app5',1),
(1,'app6',15),(1,'app7',7),(1,'app8',212),
(4,'app9',721),(4,'app10',7654),(4,'app11',11),
(3,'app1',1000),(3,'app2',7),
(2,'app3',10),(2,'app4',101),(2,'app5',1),
(1,'app6',15),(1,'app7',7),(1,'app8',212)
;
SELECT * FROM users;
SELECT * FROM app;
SELECT username
,count(app.id)
, max(app.time) AS latest_time
, min(app.time) AS earliest_time
FROM users JOIN app ON users.id = app.id
GROUP BY users.id
ORDER BY max(app.time)
;
This results in :-
Where although the latest time for each group has been extracted the final result hasn't been sorted as you would think.
Wrapping it into a CTE can fix that e.g. :-
WITH cte1 AS
(
SELECT username
,count(app.id)
, max(app.time) AS latest_time
, min(app.time) AS earliest_time
FROM users JOIN app ON users.id = app.id
GROUP BY users.id
)
SELECT * FROM cte1 ORDER BY cast(latest_time AS INTEGER) DESC;
and now :-
Note simple integers have been used instead of real times for my convenience.
Since you need the newest date in every group, you could just MAX them:
SELECT
*,
COUNT(app.id) AS totalApps,
MAX(app.time) AS latestDate
FROM users
JOIN app ON app.id = users.id
GROUP BY app.id
ORDER BY latestDate DESC
LIMIT ?
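As a rough check of the MAX approach, the sketch below runs it in SQLite through Python's sqlite3 module. The sample rows are made up, and plain integers stand in for real times (stored in an INTEGER column so they compare numerically).

```python
import sqlite3

# Made-up sample data; integers stand in for real timestamps.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE app (id INTEGER REFERENCES users(id), time INTEGER);
    INSERT INTO users (id, username) VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO app (id, time) VALUES
        (1, 15), (1, 212),
        (2, 101), (2, 1),
        (3, 1000), (3, 7);
""")

# One row per user, ordered by that user's newest app time.
rows = conn.execute("""
    SELECT users.username,
           COUNT(app.id) AS totalApps,
           MAX(app.time) AS latestTime
    FROM users
    JOIN app ON app.id = users.id
    GROUP BY users.id
    ORDER BY latestTime DESC
    LIMIT ?
""", (2,)).fetchall()
print(rows)
```

Selecting the aggregate explicitly (instead of relying on the bare app.time column) means the sort key is well defined for every group.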
You could use windowed COUNT:
SELECT *, COUNT(app.id) OVER(PARTITION BY app.id) AS totalApps
FROM users
JOIN app
ON app.id = users.id
ORDER BY app.time DESC
LIMIT ?
Maybe you could use?
SELECT DISTINCT
Read more here: https://www.w3schools.com/sql/sql_distinct.asp
Try grouping by id and time, and then order by time.
select ...
group by app.id desc, app.time
I assume that id is unique in the app table.
Also, how do you assign the id? Ordering by id desc may be enough.

How to select records from a database table which has two user ids (created_by_user, given_to_user) and replace the user ids with usernames?

This is task table:
This is user table:
I want to select a user's tasks.
I would pass the given_to_user id from the backend.
But I want the selected data to have usernames instead of the ids in created_by_user and given_to_user.
The selected table would look like this.
Example:
How can I achieve this?
Or maybe I designed my tables poorly, so that it is difficult to select the data I need? :)
The task table has two id values that are foreign keys to the user table.
I tried many things but couldn't get the desired result.
You did not design the tables poorly.
In fact, it is common practice to store ids that reference columns in other tables. You just need to learn to use joins:
SELECT
task.id, task.title, task.information, user.username AS created_by, user2.username AS given_to
FROM
(task INNER JOIN user ON task.created_by_user = user.id)
INNER JOIN user AS user2 ON task.given_to_user = user2.id;
Do you just want two joins?
select t.*, uc.username as created_by_username,
ug.username as given_to_username
from task t left join
users uc
on t.created_by_user = uc.id left join
users ug
on t.given_to_user = ug.id;
This uses left join in case one of the user ids is missing.
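Here is a small sketch of the two-join idea, run in SQLite through Python's sqlite3 module. The sample rows and the helper name tasks_given_to are made up for illustration; the column names follow the question.

```python
import sqlite3

# Made-up sample data; column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE task (
        id INTEGER PRIMARY KEY,
        title TEXT,
        created_by_user INTEGER REFERENCES users(id),
        given_to_user INTEGER REFERENCES users(id)
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO task VALUES
        (10, 'Write report', 1, 2),
        (11, 'Review report', 2, 1);
""")

def tasks_given_to(user_id):
    """Return tasks assigned to user_id with both user ids shown as usernames."""
    # The users table is joined twice under different aliases:
    # uc resolves the creator, ug resolves the assignee.
    return conn.execute("""
        SELECT t.id, t.title,
               uc.username AS created_by_username,
               ug.username AS given_to_username
        FROM task t
        LEFT JOIN users uc ON t.created_by_user = uc.id
        LEFT JOIN users ug ON t.given_to_user = ug.id
        WHERE t.given_to_user = ?
        ORDER BY t.id
    """, (user_id,)).fetchall()

print(tasks_given_to(2))
```

The WHERE clause is where the given_to_user id from the backend would be bound as a parameter.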

Using multiple columns with counts in access database designer

I am trying to display several columns with different counts in a Microsoft Access query. It doesn't let me do certain things a normal query can because it has the SQL design view.
I'd like to display
multiple, single, etc. columns with their counts.
Note: the table names and attributes have been changed.
select (select count(*)as multiple from (select userId from dbo.Purchases
where userId is not null GRoup by userId having count(*)>1) x), (
select count(*)as single from (select userId from dbo.Purchases where
userId is not null GRoup by userId having count(*)=1) x );
If I do these separately I can display them, but I'd like to combine them into one query and one row. Is this possible?
select count(*)as multiple from (select userId from dbo.Purchases
where userId is not null GRoup by userId having count(*)>1) x
It's very easy with 2 queries:
First one, saved as "Purchases Summary"
Select UserID, count(UserID) as Count from Purchases Group By UserID
With a 2nd built on it:
SELECT Sum(IIf([count]=1,1)) AS [Single], Sum(IIf([count]>1,1)) AS Multiple FROM [Purchases Summary]
I cannot find a clever way to combine this into a single query.
I don't know what my problem last night was, but the single query is
SELECT Sum(IIf([count]=1,1)) AS [Single], Sum(IIf([count]>1,1)) AS Multiple
FROM (Select UserID, count(UserID) as Count from Purchases Group By UserID)
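The same single-query idea can be tried outside Access. The sketch below uses SQLite through Python's sqlite3 module, with CASE expressions standing in for IIf; the sample rows are made up, and this only illustrates the shape of the query, not Access syntax.

```python
import sqlite3

# Made-up purchases: user 1 bought twice, user 2 once, user 3 three times,
# plus one row with no user id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchases (id INTEGER PRIMARY KEY, user_id INTEGER);
    INSERT INTO purchases (user_id) VALUES (1), (1), (2), (3), (3), (3), (NULL);
""")

# Inner query: purchases per user. Outer query: count how many users
# fall into each bucket, returned as one row with two columns.
single, multiple = conn.execute("""
    SELECT SUM(CASE WHEN n = 1 THEN 1 ELSE 0 END) AS single,
           SUM(CASE WHEN n > 1 THEN 1 ELSE 0 END) AS multiple
    FROM (SELECT user_id, COUNT(*) AS n
          FROM purchases
          WHERE user_id IS NOT NULL
          GROUP BY user_id) AS per_user
""").fetchone()
print(single, multiple)
```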
Don George's solution would also work.
I ended up using a form and VBA for each column. An issue I had was that I needed to use a distinct call for unique IDs and when there is a sql design view that's not really supported. distinctrow is supported, but it would not work for my query. I ended up writing it as before so that it did not need distinct.
This is the VBA I used to override each input inside of an access form. The currentDb needs to be connected to the database properly before it will also work.
selectStatement = "SELECT Count(* ) FROM (SELECT userID FROM dbo_Purchases WHERE userID is not null GROUP BY userID HAVING count(*)>1) AS x;"
rs = CurrentDb.OpenRecordset(selectStatement).Fields(0).Value
[Text30].Value = rs

Constructing a query, for selecting a table with limit of associations

I have spent far too many hours trying to construct this SQL query, and I just can't wrap my head around it.
I have three tables with the following relations. I have removed the rest of the columns for simplicity.
- Jobs
id
- Company
id
- Offer
job_id
company_id
offer_type (either 'single' or 'voucher')
- Reservation
job_id
company_id
Context.
A user creates a job. Companies can make one or two offers (one of each type) on a job. A job is closed when it gets offers from 3 different companies. A reservation can also take one of the spots.
So I am trying to fetch all open jobs for a listing shown to the company. That is, all jobs which have received offers from at most 2 different companies.
As mentioned, I have tried to come up with a query for this. So far I have:
;WITH company_offers AS
(
SELECT
DISTINCT ON(offers.company_id) offers.company_id,
count(offers.company_id) as total,
offers.job_id
FROM offers
GROUP BY offers.company_id, offers.job_id
),
counts AS
(
SELECT jobs.*,
(SELECT count(*) FROM company_offers) as offer_count,
(SELECT count(*) FROM reservations WHERE reservations.job_id = jobs.id) as reservation_count
FROM jobs
JOIN company_offers ON company_offers.job_id = jobs.id
GROUP BY jobs.id
)
SELECT offer_count+reservation_count as total
FROM counts
I have tried to fetch the offers by unique company id in the first CTE, then use the second CTE to count the results of the first and also find the reservations. Then I add them together, and finally I should add a condition that the total is less than 3.
But this doesn't return the expected result; in fact, it is far from it.
I would appreciate it if someone could help me out and explain as well.
Let me know if you have any questions.
Some generic SQL could look like this:
select Jobs.id
from Jobs
left outer join Offer on Offer.job_id = Jobs.id
left outer join Reservation on Reservation.job_id = Jobs.id
group by Jobs.id
having count(distinct Offer.company_id) + count(distinct Reservation.company_id) < 3
If PostgreSQL does not like that count(distinct ...), you may have to include an equivalent sub-query.
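For what it's worth, the query above can be tried quickly in SQLite through Python's sqlite3 module. This is only a sketch with made-up rows, and it assumes a company that holds a reservation on a job has not also made an offer on it (otherwise that company would be counted twice).

```python
import sqlite3

# Made-up sample data; table names follow the generic SQL in the answer.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE jobs (id INTEGER PRIMARY KEY);
    CREATE TABLE offer (job_id INTEGER, company_id INTEGER, offer_type TEXT);
    CREATE TABLE reservation (job_id INTEGER, company_id INTEGER);
    INSERT INTO jobs VALUES (1), (2), (3);
    -- Job 1: offers from three different companies, so it is closed.
    INSERT INTO offer VALUES
        (1, 10, 'single'), (1, 10, 'voucher'), (1, 11, 'single'), (1, 12, 'single');
    -- Job 2: one offering company plus one reservation, so it is still open.
    INSERT INTO offer VALUES (2, 10, 'single');
    INSERT INTO reservation VALUES (2, 11);
    -- Job 3: no offers or reservations yet.
""")

# COUNT(DISTINCT ...) ignores NULLs from the outer joins and collapses
# the row multiplication caused by joining two detail tables.
open_jobs = [row[0] for row in conn.execute("""
    SELECT jobs.id
    FROM jobs
    LEFT JOIN offer ON offer.job_id = jobs.id
    LEFT JOIN reservation ON reservation.job_id = jobs.id
    GROUP BY jobs.id
    HAVING COUNT(DISTINCT offer.company_id)
         + COUNT(DISTINCT reservation.company_id) < 3
    ORDER BY jobs.id
""")]
print(open_jobs)
```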
By the way:
SELECT DISTINCT ... GROUP BY ..., i.e. DISTINCT and GROUP BY, usually does not work out.

Issues with subqueries for stored procedure

The query I am trying to perform is
With getusers As
(Select userID from userprofspecinst_v where institutionID IN
(select institutionID, professionID from userprofspecinst_v where userID=#UserID)
and professionID IN
(select institutionID, professionID from userprofspecinst_v where userID=#UserID))
select username from user where userID IN (select userID from getusers)
Here's what I'm trying to do. Given a userID and a view which contains the userID and the ID of their institution and profession, I want to get the list of other userIDs who also have the same institutionID and professionID. Then with that list of userIDs I want to get the usernames that correspond to each userID from another table (user). The error I am getting when I try to create the procedure is, "Only one expression can be specified in the select list when the subquery is not introduced with EXISTS.". Am I taking the correct approach to how I should build this query?
The following query should do what you want to do:
SELECT u.username
FROM user AS u
INNER JOIN userprofspecinst_v AS up ON u.userID = up.userID
INNER JOIN (SELECT institutionID, professionID FROM userprofspecinst_v
WHERE userID = #userID) AS ProInsts
ON (up.institutionID = ProInsts.institutionID
AND up.professionID = ProInsts.professionID)
Effectively, the crucial part is the last INNER JOIN: it creates a derived table containing the institution ids and profession ids that the given user id belongs to. We then get all matching rows in the view with the same institution id and profession id (the ON condition) and link these back to the user table on the corresponding user ids (the first JOIN).
You can either run this for each user id you are interested in, or JOIN onto the result of a query (your getusers) (it depends on what database engine you are running).
If you aren't familiar with JOINs, Jeff Atwood's introductory post is a good starting place.
The JOIN statement effectively allows you to exploit the logical links between your tables. The userID, institutionID and professionID are all examples of candidates for foreign keys, so rather than having to constantly subquery each table and piece the results together, you can link all the tables together and filter down to the rows you want. It's usually a cleaner, more maintainable approach (although that is opinion).
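A quick way to see the join in action is the sketch below, which runs it in SQLite through Python's sqlite3 module. The sample rows and the helper name users_sharing_profile are made up; a plain table stands in for the userprofspecinst_v view, the user table is called users here, and a ? placeholder replaces the procedure parameter.

```python
import sqlite3

# Made-up sample data; a plain table stands in for the userprofspecinst_v view.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userID INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE userprofspecinst_v (
        userID INTEGER, institutionID INTEGER, professionID INTEGER
    );
    INSERT INTO users VALUES (1, 'ann'), (2, 'ben'), (3, 'cat'), (4, 'dan');
    INSERT INTO userprofspecinst_v VALUES
        (1, 100, 5),   -- ann: institution 100, profession 5
        (2, 100, 5),   -- ben: same pair as ann
        (3, 100, 6),   -- cat: same institution, other profession
        (4, 200, 5);   -- dan: other institution, same profession
""")

def users_sharing_profile(user_id):
    """Usernames with the same (institutionID, professionID) pair as user_id."""
    rows = conn.execute("""
        SELECT DISTINCT u.username
        FROM users AS u
        INNER JOIN userprofspecinst_v AS up ON u.userID = up.userID
        INNER JOIN (SELECT institutionID, professionID
                    FROM userprofspecinst_v
                    WHERE userID = ?) AS ProInsts
            ON up.institutionID = ProInsts.institutionID
           AND up.professionID = ProInsts.professionID
        ORDER BY u.username
    """, (user_id,)).fetchall()
    return [name for (name,) in rows]

print(users_sharing_profile(1))
```

Note that the result includes the given user as well. Adding a condition such as u.userID <> ? would leave them out if only the other users are wanted.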