PostgreSQL Return Row if Value Exists in One of Several Columns - sql

Ok, I am stuck on this one.
I have a PostgreSQL table customers that looks like this:
id  firm1  firm2  firm3  firm4  firm5  lastname  firstname
1   13     8      2      0      0      Smith     John
2   3      2      0      0      0      Doe       Jane
Each row corresponds to a client/customer. Each client/customer can be associated with one or multiple firms; the numeric value in each firm# column is the firm id in a different table.
So I am looking for a way of returning all rows of customers that are associated with a specific firm.
For example, SELECT id, lastname, firstname where 8 exists in firm1, firm2, firm3, firm4, firm5 would just return the John Smith row as he is associated with firm 8 under the firm2 column.
Any ideas on how to accomplish that?

You can use the IN operator for that:
SELECT *
FROM customers
WHERE 8 IN (firm1, firm2, firm3, firm4, firm5);
But it would be much better in the long run if you normalized your data model.
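To try the IN approach without a PostgreSQL server, here is a minimal sketch that uses Python's sqlite3 module as a stand-in (an assumption; the WHERE clause itself is the same as above). The helper name customers_for_firm is made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER, firm1 INTEGER, firm2 INTEGER, "
    "firm3 INTEGER, firm4 INTEGER, firm5 INTEGER, lastname TEXT, firstname TEXT)"
)
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [(1, 13, 8, 2, 0, 0, "Smith", "John"), (2, 3, 2, 0, 0, 0, "Doe", "Jane")],
)


def customers_for_firm(firm_id):
    """Hypothetical helper: customers that list firm_id in any firm column."""
    sql = (
        "SELECT id, lastname, firstname FROM customers "
        "WHERE ? IN (firm1, firm2, firm3, firm4, firm5) ORDER BY id"
    )
    return conn.execute(sql, (firm_id,)).fetchall()


print(customers_for_firm(8))  # only the John Smith row
```

Passing the firm id as a bound parameter keeps the query reusable for any firm.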

You should consider normalizing your tables. With the current schema, you have to join the firms table once for each firm column in your customers table:
select *
from customers c
left join firms f1
on f1.firm_id = c.firm1
left join firms f2
on f2.firm_id = c.firm2
left join firms f3
on f3.firm_id = c.firm3
left join firms f4
on f4.firm_id = c.firm4
left join firms f5
on f5.firm_id = c.firm5

You can "unpivot" using a combination of array and unnest, as specified in this answer: unpivot and PostgreSQL.
In your case, I think this should work:
select lastname,
       firstname,
       unnest(array[firm1, firm2, firm3, firm4, firm5]) as firm_id
from customers
Now you can select from this result (using either a with statement or an inner query) where firm_id is the value you care about.

Related

How to write a SQL query to calculate percentages based on values across different tables?

Suppose I have a database containing two tables, similar to below:
Table 1:
tweet_id  tweet
1         Scrap the election results
2         The election was great!
3         Great stuff
Table 2:
politician  tweet_id
TRUE        1
FALSE       2
FALSE       3
I'm trying to write a SQL query which returns the percentage of tweets that contain the word 'election' broken down by whether they were a politician or not.
So for instance here, the first 2 tweets in Table 1 contain the word election. By looking at Table 2, you can see that tweet_id 1 was written by a politician, whereas tweet_id 2 was written by a non-politician.
Hence, the result of the SQL query should return 50% for politicians and 50% for non-politicians (i.e. two tweets contained the word 'election', one by a politician and one by a non-politician).
Any ideas how to write this in SQL?
You could do this by creating one subquery to return all election tweets, and one subquery to return all election tweets by politicians, then join.
Here is a sample. Note that you may need to cast the totals to decimals before dividing (depending on which SQL provider you are working in).
select
    politician_tweets.total / election_tweets.total
from
(
    select
        count(tweet) as total
    from
        table_1
        join table_2 on table_1.tweet_id = table_2.tweet_id
    where
        tweet like '%election%'
) election_tweets
join
(
    select
        count(tweet) as total
    from
        table_1
        join table_2 on table_1.tweet_id = table_2.tweet_id
    where
        tweet like '%election%' and
        politician = 1
) politician_tweets
on 1 = 1
You can use aggregation like this:
select t2.politician,
       avg(case when t.tweet like '%election%' then 1.0 else 0 end) as election_ratio
from table_1 t join
     table_2 t2
     on t.tweet_id = t2.tweet_id
group by t2.politician;
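For comparison, here is a minimal sketch of the result the question describes, namely each group's share of the tweets that contain 'election' (50% / 50% for the sample data). It runs against an in-memory SQLite database through Python's sqlite3 module, which is an assumption made only so the example is runnable; politician is stored as 1/0 rather than TRUE/FALSE.

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_1 (tweet_id INTEGER, tweet TEXT)")
conn.execute("CREATE TABLE table_2 (politician INTEGER, tweet_id INTEGER)")
# Sample rows copied from the question.
conn.executemany(
    "INSERT INTO table_1 VALUES (?, ?)",
    [
        (1, "Scrap the election results"),
        (2, "The election was great!"),
        (3, "Great stuff"),
    ],
)
conn.executemany("INSERT INTO table_2 VALUES (?, ?)", [(1, 1), (0, 2), (0, 3)])

# Count election tweets per group and divide by the total number of
# election tweets; multiplying by 1.0 avoids integer division.
sql = """
SELECT t2.politician,
       COUNT(*) * 1.0 /
           (SELECT COUNT(*) FROM table_1 WHERE tweet LIKE '%election%') AS share
FROM table_1 AS t1
JOIN table_2 AS t2 ON t1.tweet_id = t2.tweet_id
WHERE t1.tweet LIKE '%election%'
GROUP BY t2.politician
ORDER BY t2.politician
"""
election_share = dict(conn.execute(sql).fetchall())
print(election_share)
```

Note that this differs from an avg(case ...) per group, which gives the fraction of each group's own tweets that mention the word.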

Get union and intersection of jsonb array in Postgresql

I have a DB of people with a jsonb column interests. In my application, a user can search for people by providing hobbies chosen from a set of predefined values. I want to offer the best match, and to do so I would like to score each match as intersection/union of interests. This way the top results won't simply be the people who have the most hobbies in my DB.
Example:
DB records:
name interests::jsonb
Mary ["swimming","reading","jogging"]
John ["climbing","reading"]
Ann ["swimming","watching TV","programming"]
Carl ["knitting"]
user input in app:
["reading", "swimming", "knitting", "cars"]
my script should output this:
Mary 0.4
John 0.2
Ann 0.16667
Carl 0.25
Now I'm using
SELECT name
FROM people
WHERE interests @>
ANY (ARRAY ['"reading"', '"swimming"', '"knitting"', '"cars"']::jsonb[])
but this also returns records with many interests and gives me no way to order the results.
Is there any way I can achieve it in a reasonable time - let's say up to 5 seconds in DB with around 400K records?
EDIT:
I added another example to clarify my calculations. My calculation needs to filter people with many hobbies. Therefore match should be calculated as Intersection(input, db_record)/Union(input, db_record).
Example:
input = ["reading"]
DB records:
name interests::jsonb
Mary ["swimming","reading","jogging"]
John ["climbing","reading"]
Ann ["swimming","watching TV","programming"]
Carl ["reading"]
Match for Mary would be calculated as LENGTH(["reading"]) / LENGTH(["swimming","reading","jogging"]), which is 0.3333,
and for Carl it would be LENGTH(["reading"]) / LENGTH(["reading"]), which is 1.
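For reference, the intersection/union score described here can be sketched in plain Python with sets. This only checks the arithmetic of the two examples; it is not a database solution, and the function name jaccard is chosen for illustration.

```python
def jaccard(search, interests):
    """Size of the intersection divided by size of the union of two lists."""
    a, b = set(search), set(interests)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Records and search input copied from the first example.
people = {
    "Mary": ["swimming", "reading", "jogging"],
    "John": ["climbing", "reading"],
    "Ann": ["swimming", "watching TV", "programming"],
    "Carl": ["knitting"],
}
search = ["reading", "swimming", "knitting", "cars"]

scores = {name: round(jaccard(search, hobbies), 5) for name, hobbies in people.items()}
for name, score in scores.items():
    print(name, score)
```

With the single-element input ["reading"], the same function gives 1/3 for Mary and 1 for a person whose only interest is reading.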
UPDATE: I managed to do it with
SELECT result.id, result.name, result.overlap_count/(jsonb_array_length(persons.interests) + 4 - result.overlap_count)::decimal as score
FROM (SELECT t1.name as name, t1.id, COUNT(t1.name) as overlap_count
FROM (SELECT name, id, jsonb_array_elements(interests)
FROM persons) as t1
JOIN (SELECT unnest(ARRAY ['"reading"', '"swimming"', '"knitting"', '"cars"'])::jsonb as elements) as t2 ON t1.jsonb_array_elements = t2.elements
GROUP BY t1.name, t1.id) as result
JOIN persons ON result.id = persons.id ORDER BY score desc
Here's my fiddle https://dbfiddle.uk/?rdbms=postgres_12&fiddle=b4b1760854b2d77a1c7e6011d074a1a3
However it's not fast enough and I would appreciate any improvements.
One option is to unnest the parameter and use the ? operator to check each element against the jsonb array:
select
t.name,
x.match_ratio
from mytable t
cross join lateral (
select avg( (t.interests ? a.val)::int ) match_ratio
from unnest(array['reading', 'swimming', 'knitting', 'cars']) a(val)
) x
It is not very clear what rules are behind the result that you are showing. This gives you a ratio that represents the percentage of values in the parameter array that can be found in the interests of each person (so Mary gets 0.5 since she has two interests in common with the search parameter, and all other names get 0.25).
Another option would be to use jsonb_array_elements() to unnest the jsonb column:
SELECT name, count / SUM(count) over () AS ratio
FROM(
SELECT name, COUNT(name) AS count
FROM people
JOIN jsonb_array_elements(interests) AS j(elm) ON TRUE
WHERE interests @>
ANY (ARRAY ['"reading"', '"swimming"', '"knitting"', '"cars"']::jsonb[])
GROUP BY name ) q

Access SQL: How to retrieve sums of multiple values where user IDs are assigned to multiple positions

I'm working on an Access database for assigning tasks to personnel and tracking task status and workload. A single user ID can be assigned to any of several fields associated with a particular task. In this case, the Task table has the fields "TechReviewerID", "DesignerID", "TechReviewerWorkload", and "DesignerWorkload".
I want one query to return one row for each person, with two summary columns totaling all of the workload assigned to them. So if I'm ID1, I want column 3 to return the sum of "TechReviewerWorkload" in all tasks where "TechReviewerID = 1" and column 4 to return the sum of "DesignerWorkload" in all tasks where "DesignerID = 1."
I have successfully written two separate queries that accomplish this:
SELECT MESPersonnel.MESID, MESPersonnel.PersonnelName,
IIF(SUM(DesignerTask.DesignerWorkload) IS NULL, 0, SUM(DesignerTask.DesignerWorkload)) AS TotalDesignerWorkload
FROM (MESPersonnel LEFT OUTER JOIN Task AS DesignerTask
ON (MESPersonnel.MESID = DesignerTask.DesignerID
AND DesignerTask.DueDate < CDATE('2020-07-30') AND DesignerTask.DueDate > CDATE('2020-05-01')))
WHERE MESPersonnel.PositionID = 1
GROUP BY MESPersonnel.MESID, MESPersonnel.PersonnelName;
This query gives the following table:
MESID  PersonnelName  TotalDesignerWorkload
1      John Doe       40
2      Dohn Joe       20
I can create a near-identical query by replacing all instances of "designer" terms with "tech reviewer" terms.
What I'm looking for is a table like:
MESID  PersonnelName  TotalDesignerWorkload  TotalReviewerWorkload
1      John Doe       40                     10
2      Dohn Joe       20                     20
My attempts to combine these two via multiple outer joins resulted in wildly inaccurate sums. I know how to solve that for items on different tables, but I'm not sure how to resolve it when I'm using two items from the same table. Is there some kind of conditional sum I can use in my query that Access supports?
EDIT: Sample Raw Data
Task Table
TaskID  DesignerID  TechReviewerID  DesignerWorkload  TechReviewerWorkload  DueDate
1       1           2               40                20                    06-20-2020
2       2           1               20                10                    06-20-2020
MESPersonnel Table
MESID  PersonnelName
1      John Doe
2      Dohn Joe
Consider:
Query1: TaskUNION (rearranges the data into a normalized structure)
SELECT TaskID, DesignerID AS UID, PersonnelName, DesignerWorkload AS Data, DueDate, "Design" AS Cat FROM MESPersonnel
INNER JOIN Task ON MESPersonnel.MESID = Task.DesignerID
UNION SELECT TaskID, TechReviewerID, PersonnelName, TechReviewerWorkload, DueDate, "Tech" FROM MESPersonnel
INNER JOIN Task ON MESPersonnel.MESID = Task.TechReviewerID;
Query2:
TRANSFORM Sum(Data) AS SumData
SELECT UID, PersonnelName
FROM TaskUNION
WHERE DueDate BETWEEN #5/1/2020# AND #7/31/2020#
GROUP BY UID, PersonnelName
PIVOT Cat;
An alternative would involve 2 simple, filtered aggregate queries on the Task table, then joining those 2 queries to MESPersonnel. Here it is as an all-in-one statement:
SELECT MESID, PersonnelName, SumOfDesignerWorkload, SumOfTechReviewerWorkload
FROM (
SELECT DesignerID, Sum(DesignerWorkload) AS SumOfDesignerWorkload
FROM Task WHERE DueDate BETWEEN #5/1/2020# AND #7/31/2020# GROUP BY DesignerID) AS SumDesi
RIGHT JOIN ((
SELECT TechReviewerID, Sum(TechReviewerWorkload) AS SumOfTechReviewerWorkload
FROM Task WHERE DueDate BETWEEN #5/1/2020# AND #7/31/2020# GROUP BY TechReviewerID) AS SumTech
RIGHT JOIN MESPersonnel ON SumTech.TechReviewerID = MESPersonnel.MESID)
ON SumDesi.DesignerID = MESPersonnel.MESID;
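The two-aggregates-then-join idea can be checked against the sample data with a minimal sketch. It uses Python's sqlite3 module rather than Access (an assumption, so the syntax differs: LEFT JOINs from MESPersonnel, COALESCE instead of IIF, and ISO date strings instead of #...# literals).

```python
import sqlite3

# In-memory SQLite database standing in for the Access tables (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MESPersonnel (MESID INTEGER, PersonnelName TEXT)")
conn.execute(
    "CREATE TABLE Task (TaskID INTEGER, DesignerID INTEGER, TechReviewerID INTEGER, "
    "DesignerWorkload INTEGER, TechReviewerWorkload INTEGER, DueDate TEXT)"
)
# Sample rows copied from the question, with dates rewritten as ISO strings.
conn.executemany(
    "INSERT INTO MESPersonnel VALUES (?, ?)", [(1, "John Doe"), (2, "Dohn Joe")]
)
conn.executemany(
    "INSERT INTO Task VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, 2, 40, 20, "2020-06-20"), (2, 2, 1, 20, 10, "2020-06-20")],
)

# Aggregate each role separately, then join both totals to the personnel list.
# Aggregating before joining avoids the row multiplication that inflates sums.
sql = """
SELECT p.MESID, p.PersonnelName,
       COALESCE(d.total, 0) AS TotalDesignerWorkload,
       COALESCE(r.total, 0) AS TotalReviewerWorkload
FROM MESPersonnel AS p
LEFT JOIN (SELECT DesignerID, SUM(DesignerWorkload) AS total
           FROM Task
           WHERE DueDate BETWEEN '2020-05-01' AND '2020-07-31'
           GROUP BY DesignerID) AS d ON d.DesignerID = p.MESID
LEFT JOIN (SELECT TechReviewerID, SUM(TechReviewerWorkload) AS total
           FROM Task
           WHERE DueDate BETWEEN '2020-05-01' AND '2020-07-31'
           GROUP BY TechReviewerID) AS r ON r.TechReviewerID = p.MESID
ORDER BY p.MESID
"""
workload = conn.execute(sql).fetchall()
for row in workload:
    print(row)
```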

Postgresql: Values of multiple rows in one row

I have the following database:
Car: {[CarID, HorsePower, Brand, HeadDesigner]}
DesignsCar:{[CarID, DesID]}
Designer:{[DesID, Name]}
You should note that while every Car has only 1 HeadDesigner, multiple people can design cars (as in work on them).
Say I have 10 cars in my database. For CarID (1..9) only one DesID per CarID in DesignsCar.
However, for carID 10 we have 3 people working on it (carID has 3 entries in DesignsCar because 3 people worked on it).
Say I do this:
select *
from car c
left outer join designscar ds on c.carid = ds.carid
left outer join designer d on ds.desid = d.desid
This gives me 12 rows, when I only want 10. The reason why this gives me 12 rows should be clear: for carID 10 we have 3 people working on it (carID has 3 entries in DesignsCar because 3 people worked on it).
I hope I've done a good job explaining this problem, so here comes my question:
How do I modify the query above so I get 10 rows? For CarID 10 I'd like the 3 designers to be written in one column (comma separated, for example, but anything works as long as it's in one column).
Is that possible?
You need to aggregate the values. Here is one possibility:
select c.*,
       array_agg(d.name) as designer_names
from car c left outer join
     designscar ds
     on c.carid = ds.carid left outer join
     designer d
     on ds.desid = d.desid
group by c.carid ; -- allowed assuming `carid` is the primary key
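A minimal sketch of the same aggregation, assuming SQLite (via Python's sqlite3) as a stand-in: group_concat plays the role that array_agg, or string_agg(d.name, ', ') for a comma-separated string, plays in PostgreSQL. The car and designer rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database; all rows below are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE car (carid INTEGER PRIMARY KEY, brand TEXT)")
conn.execute("CREATE TABLE designscar (carid INTEGER, desid INTEGER)")
conn.execute("CREATE TABLE designer (desid INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO car VALUES (?, ?)", [(1, "A"), (2, "B"), (3, "C")])
conn.executemany(
    "INSERT INTO designer VALUES (?, ?)", [(1, "Ana"), (2, "Ben"), (3, "Cy")]
)
# Car 1 has one designer, car 2 has three, car 3 has none.
conn.executemany(
    "INSERT INTO designscar VALUES (?, ?)", [(1, 1), (2, 1), (2, 2), (2, 3)]
)

# One row per car; designer names are collapsed into a single column.
sql = """
SELECT c.carid, group_concat(d.name, ', ') AS designer_names
FROM car AS c
LEFT JOIN designscar AS ds ON c.carid = ds.carid
LEFT JOIN designer AS d ON ds.desid = d.desid
GROUP BY c.carid
ORDER BY c.carid
"""
rows = conn.execute(sql).fetchall()
for row in rows:
    print(row)
```

The order of names inside the aggregated column is not guaranteed here; a car without designers gets NULL in that column.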

SQL server - How to find the highest number in '<> ' in a text column?

Let's say I have the following data in the Employee table (nothing more):
ID  FirstName  LastName   x
--------------------------------------------------------------------
20  John       Mackenzie  <A>te</A><b>wq</b><a>342</a><d>rt21</d>
21  Ted        Green      <A>re</A><b>es</b><1>t34w</1><4>65z</4>
22  Marcy      Nate       <A>ds</A><b>tf</b><3>fv 34</3><6>65aa</6>
I need to search the x column and get the highest number that appears inside <> brackets.
What sort of SELECT statement can get me, for example, the number 6, as in <6>, from the x column?
This type of query relies on a fixed pattern: I assume that the digit of the last opening tag, such as the 6 in <6>, sits at position LEN(X)-9 of the string.
Please note that if the pattern changes, the query below will not work.
SELECT A.*
FROM YOURTABLE A
INNER JOIN
(SELECT TOP 1 ID, Firstname, Lastname, SUBSTRING(X, LEN(X)-9, 1) AS [ORDER]
FROM YOURTABLE
WHERE ISNUMERIC(SUBSTRING(X, LEN(X)-9, 1)) = 1
ORDER BY SUBSTRING(X, LEN(X)-9, 1) DESC) B -- DESC so that TOP 1 picks the highest digit
ON
A.ID = B.ID AND
A.FIRSTNAME = B.FIRSTNAME AND
A.LASTNAME = B.LASTNAME
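To illustrate what a position-independent version of this logic looks like, here is a minimal sketch in Python using a regular expression. It is not T-SQL and is not the query above; it only shows the extraction step on the sample data, and the function name highest_tag_number is made up.

```python
import re

# Values of the x column, copied from the question and keyed by ID.
rows = {
    20: "<A>te</A><b>wq</b><a>342</a><d>rt21</d>",
    21: "<A>re</A><b>es</b><1>t34w</1><4>65z</4>",
    22: "<A>ds</A><b>tf</b><3>fv 34</3><6>65aa</6>",
}

# Matches opening tags whose name is purely numeric, e.g. <6>.
# Closing tags such as </6> do not match because of the slash.
TAG_NUMBER = re.compile(r"<(\d+)>")


def highest_tag_number(text):
    """Return the largest numeric tag name in text, or None if there is none."""
    numbers = [int(n) for n in TAG_NUMBER.findall(text)]
    return max(numbers) if numbers else None


per_row = {row_id: highest_tag_number(x) for row_id, x in rows.items()}
overall = max(n for n in per_row.values() if n is not None)
print(per_row, overall)
```

Unlike the fixed-offset SUBSTRING, this handles multi-digit tag names and tags at any position in the string.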