Divide two columns that return a count - sql

I am new to Postgres and I was wondering how to divide two columns and place that value into a new column or if that is possible with what I am currently working with. The two columns I am trying to divide into each other are already created by Count functions.
This is my current query:
select w.publisher_id, w.sub_id_2, COUNT(w.contact_id), COUNT(e.edocs_signed_date)
from leads_last_90_days w
left join enrollments e on w.contact_id = e.contact_id
where w.sub_id is not null
group by w.publisher_id, w.sub_id_2
order by publisher_id desc
And this is what my results currently look like:
publisher_id sub_id count count
"1481" "11" 148 4
"1481" "7" 8 0
"1481" "695" 209 6
"1481" "266" 5 1
"1481" "54" 95 2
How do I divide the last column into the third column to get a closing percentage in a fifth column?

You can use mathematical operators (such as /) in SQL. Note, however, that dividing two integers in PostgreSQL will result in integer division, so you'll need to cast (at least one of) them to a real using the cast operator (::):
SELECT w.publisher_id,
w.sub_id_2,
COUNT(w.contact_id),
COUNT(e.edocs_signed_date),
100 * COUNT(e.edocs_signed_date)::real / COUNT(w.contact_id) AS percentage
FROM leads_last_90_days w
LEFT JOIN enrollments e ON w.contact_id = e.contact_id
WHERE w.sub_id IS NOT NULL
GROUP BY w.publisher_id, w.sub_id_2
ORDER BY publisher_id DESC
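To see the integer-division pitfall in isolation, here is a minimal sketch that runs against an in-memory SQLite database instead of PostgreSQL. The table names, column layout and rows are made up for illustration, and SQLite spells the cast as CAST(... AS REAL) rather than ::real, but the truncation behaves the same way.

```python
import sqlite3

# Hypothetical miniature of the leads/enrollments tables (illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leads (publisher_id TEXT, sub_id TEXT, contact_id INTEGER);
    CREATE TABLE enrollments (contact_id INTEGER, edocs_signed_date TEXT);
    INSERT INTO leads VALUES ('1481', '11', 1), ('1481', '11', 2),
                             ('1481', '11', 3), ('1481', '11', 4),
                             ('1481', '7', 5);
    INSERT INTO enrollments VALUES (1, '2020-01-15');
""")

rows = conn.execute("""
    SELECT w.sub_id,
           COUNT(w.contact_id)        AS leads,
           COUNT(e.edocs_signed_date) AS signed,
           -- integer / integer truncates toward zero
           COUNT(e.edocs_signed_date) / COUNT(w.contact_id) AS int_ratio,
           -- casting one operand forces floating-point division
           CAST(COUNT(e.edocs_signed_date) AS REAL) / COUNT(w.contact_id) AS real_ratio
    FROM leads w
    LEFT JOIN enrollments e ON w.contact_id = e.contact_id
    GROUP BY w.sub_id
    ORDER BY w.sub_id
""").fetchall()

for row in rows:
    print(row)
```

Sub id '11' has 4 leads and 1 signed enrollment, so the integer ratio is 0 while the cast ratio is 0.25.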

Related

SQL: Find rows that match closely but not exactly

I have a table inside a PostgreSQL database with columns c1,c2...cn. I want to run a query that compares each row against a tuple of values v1,v2...vn. The query should not return an exact match but should return a list of rows ordered in descending similarity to the value vector v.
Example:
The table contains sports records:
1,USA,basketball,1956
2,Sweden,basketball,1998
3,Sweden,skating,1998
4,Switzerland,golf,2001
Now when I run a query against this table with v=(Sweden,basketball,1998), I want to get all records that have a similarity with this vector, sorted by number of matching columns in descending order:
2,Sweden,basketball,1998 --> 3 columns match
3,Sweden,skating,1998 --> 2 columns match
1,USA,basketball,1956 --> 1 column matches
Row 4 is not returned because it does not match at all.
Edit: All columns are equally important. Although, when I really think of it... it would be a nice add-on if I could give each column a different weight factor as well.
Is there any possible SQL query that would return the rows in a reasonable amount of time, even when I run it against a million rows?
What would such a query look like?
SELECT * FROM countries
WHERE country = 'Sweden'
OR sport = 'basketball'
OR year = 1998
ORDER BY
cast(country = 'Sweden' AS integer) +
cast(sport = 'basketball' as integer) +
cast(year = 1998 as integer) DESC
It's not beautiful, but it works. You can cast the boolean expressions to integers and sum them.
You can easily change the weight of a column by adding a multiplier:
cast(sport = 'basketball' as integer) * 5 +
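A minimal sketch of the same idea, run against SQLite rather than PostgreSQL so that it is self-contained. The table name records, the helper rank and the parameter names are assumptions for illustration. In SQLite a comparison already evaluates to 0 or 1, so no explicit cast is needed there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE records (id INTEGER, country TEXT, sport TEXT, year INTEGER);
    INSERT INTO records VALUES
        (1, 'USA', 'basketball', 1956),
        (2, 'Sweden', 'basketball', 1998),
        (3, 'Sweden', 'skating', 1998),
        (4, 'Switzerland', 'golf', 2001);
""")

# Each comparison yields 0 or 1; multiplying by a weight gives a weighted score.
QUERY = """
    SELECT id,
           (country = :country) * :w_country
         + (sport   = :sport)   * :w_sport
         + (year    = :year)    * :w_year AS score
    FROM records
    WHERE country = :country OR sport = :sport OR year = :year
    ORDER BY score DESC, id
"""

def rank(target, weights):
    """Return (id, score) pairs for rows matching at least one target value."""
    params = {
        "country": target[0], "sport": target[1], "year": target[2],
        "w_country": weights[0], "w_sport": weights[1], "w_year": weights[2],
    }
    return conn.execute(QUERY, params).fetchall()

equal_weights = rank(("Sweden", "basketball", 1998), (1, 1, 1))
sport_heavy = rank(("Sweden", "basketball", 1998), (1, 5, 1))
print(equal_weights)
print(sport_heavy)
```

With equal weights the order is rows 2, 3, 1. With the sport weighted 5, row 1 moves ahead of row 3.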
This is how I would do it. The multiplication factors in the CASE expressions set the importance (weight) of each match. Because each weight is larger than the sum of all lower weights, a record that matches a higher-weighted column always ranks above records that match only lower-weighted columns.
/*
-- Initial setup
-- drop table sport
create table sport (id int, Country varchar(20), sport varchar(20), yr int);
insert into sport values
(1, 'USA', 'basketball', 1956),
(2, 'Sweden', 'basketball', 1998),
(3, 'Sweden', 'skating', 1998),
(4, 'Switzerland', 'golf', 2001);
select * from sport;
*/
-- Weighted match score: country counts 100, sport 10, year 1
select *,
CASE WHEN Country = 'Sweden' then 1 else 0 end * 100 +
CASE WHEN sport = 'basketball' then 1 else 0 end * 10 +
CASE WHEN yr = 1998 then 1 else 0 end * 1 as Match
from sport
WHERE
Country = 'Sweden'
OR sport = 'basketball'
OR yr = 1998
ORDER BY Match Desc
It might help if you wrote a stored procedure that calculates a "similarity metric" between two rows. Then your query could refer to the return value of that procedure directly rather than having umpteen conditions in the where-expression and the order-by-expression.
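As a rough illustration of that suggestion, the sketch below registers a Python function as a SQL function in SQLite, which stands in here for a stored procedure. The function name similarity, its argument order and the table name records are assumptions. A per-row function call like this most likely scans every row, so it would still need measuring on a million-row table.

```python
import sqlite3

def similarity(country, sport, year, v_country, v_sport, v_year):
    """Hypothetical metric: number of columns equal to the target values."""
    return int(country == v_country) + int(sport == v_sport) + int(year == v_year)

conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL as similarity(...) with 6 arguments.
conn.create_function("similarity", 6, similarity)
conn.executescript("""
    CREATE TABLE records (id INTEGER, country TEXT, sport TEXT, year INTEGER);
    INSERT INTO records VALUES
        (1, 'USA', 'basketball', 1956),
        (2, 'Sweden', 'basketball', 1998),
        (3, 'Sweden', 'skating', 1998),
        (4, 'Switzerland', 'golf', 2001);
""")

rows = conn.execute("""
    SELECT id, score
    FROM (SELECT id,
                 similarity(country, sport, year, ?, ?, ?) AS score
          FROM records)
    WHERE score > 0
    ORDER BY score DESC, id
""", ("Sweden", "basketball", 1998)).fetchall()
print(rows)
```

The query itself now contains the comparison logic only once, inside the function, instead of repeating it in the WHERE and ORDER BY clauses.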

Sybase SQL CASE with CAST

I have a Sybase table (which I can't alter) that I am trying to get into a specific table format. The table contains three columns all which are string values, with an id (which is not unique), a "position" which is a number that represents a field name, and a field column that is the value. The table looks like:
id position field
100 0 John
100 1 Jane
100 2 25
100 3 50
101 0 Dave
101 3 30
Position 0 means "SalesRep1", Position 1 means "SR1Commission", Position 2 means "SalesRep2", and Position 3 means "SR2Commission".
I am trying to get a table that looks like following, with the Commission columns being decimals instead of strings:
id SalesRep1 SR1Commission SalesRep2 SR2Commission
100 John 25 Jane 50
101 Dave 30 NULL NULL
I've gotten close using CASE, but I end up with only one value per row and not sure there's a way to do what I want. I also have problems with trying to get CAST included to change the commission values from strings to decimals. Here's what I have so far:
SELECT id,
CASE "position" WHEN '0' THEN field END AS SalesRep1,
CASE "position" WHEN '1' THEN field END AS SalesRep2,
CASE "position" WHEN '2' THEN field END AS SR1Commission,
CASE "position" WHEN '3' THEN field END AS SR2Commission
FROM v_custom_field WHERE id = ?
This gives me the following result when querying for id 100:
id SalesRep1 SR1Commission SalesRep2 SR2Commission
100 John NULL NULL NULL
100 NULL 25 NULL NULL
100 NULL NULL Jane NULL
100 NULL NULL NULL 50
This is close, but I want to 'collapse' the rows down into one row per id and also cast the commission values to numbers. I tried adding a CAST(field AS DECIMAL), but I'm not sure this is even the right direction. I also looked into PIVOT, but Sybase doesn't seem to support it.
This is known as an entity-attribute-value table. They're a pain to work with because they're one step removed from being relational data, but they're very common for user-defined fields in applications.
If you can't use PIVOT, you'll need to do something like this:
SELECT DISTINCT s.id,
f0.field AS SalesRep1,
CAST(f1.field AS DECIMAL(20,5)) AS SR1Commission,
f2.field AS SalesRep2,
CAST(f3.field AS DECIMAL(20,5)) AS SR2Commission
FROM v_custom_field s
LEFT JOIN v_custom_field f0
ON f0.id = s.id AND f0.position = '0'
LEFT JOIN v_custom_field f1
ON f1.id = s.id AND f1.position = '1'
LEFT JOIN v_custom_field f2
ON f2.id = s.id AND f2.position = '2'
LEFT JOIN v_custom_field f3
ON f3.id = s.id AND f3.position = '3'
It's not very fast because it's a ton of self-joins followed by a DISTINCT, but it does work.
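A different technique that avoids the self-joins is conditional aggregation: wrap each CASE in MAX() and GROUP BY id, which collapses the rows into one per id. The sketch below is untested on Sybase and runs on SQLite only to show the shape of the query. The table name custom_field and the sample rows are assumptions that follow the position mapping stated in the question, and REAL stands in for DECIMAL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- All three columns are strings, as described in the question.
    CREATE TABLE custom_field (id TEXT, position TEXT, field TEXT);
    -- Illustrative rows: 0 = SalesRep1, 1 = SR1Commission,
    --                    2 = SalesRep2, 3 = SR2Commission
    INSERT INTO custom_field VALUES
        ('100', '0', 'John'), ('100', '1', '25'),
        ('100', '2', 'Jane'), ('100', '3', '50'),
        ('101', '0', 'Dave'), ('101', '1', '30');
""")

rows = conn.execute("""
    SELECT id,
           MAX(CASE WHEN position = '0' THEN field END) AS SalesRep1,
           CAST(MAX(CASE WHEN position = '1' THEN field END) AS REAL) AS SR1Commission,
           MAX(CASE WHEN position = '2' THEN field END) AS SalesRep2,
           CAST(MAX(CASE WHEN position = '3' THEN field END) AS REAL) AS SR2Commission
    FROM custom_field
    GROUP BY id
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

MAX ignores the NULLs produced by the non-matching CASE branches, so each group keeps the single non-NULL value per position, and a missing position stays NULL.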

Get MAX() on repeating IDs

This is what my query results currently look like. How can I get the MAX() value for each unique id?
For example, for 5267139 it is 8, and for 5267145 it is 4.
5267136 5
5267137 8
5267137 2
5267139 8
5267139 5
5267139 3
5267141 4
5267141 3
5267145 4
5267145 3
5267146 1
5267147 2
5267152 3
5267153 3
5267155 8
SELECT DISTINCT st.ScoreID, st.ScoreTrackingTypeID
FROM ScoreTrackingType stt
LEFT JOIN ScoreTracking st
ON stt.ScoreTrackingTypeID = st.ScoreTrackingTypeID
ORDER BY st.ScoreID, st.ScoreTrackingTypeID DESC
GROUP BY partitions your table into separate groups based on the column(s) you specify. You can then apply an aggregate function (MAX in this case) to each group, using syntax like this:
SELECT First_column, MAX(Second_column) AS Max_second_column
FROM Table
GROUP BY First_column
EDIT: Based on the query above, it looks like you don't really need the ScoreTrackingType table at all, but leaving it in place, you could use:
SELECT st.ScoreID, MAX(st.ScoreTrackingTypeID) AS ScoreTrackingTypeID
FROM ScoreTrackingType stt
LEFT JOIN ScoreTracking st ON stt.ScoreTrackingTypeID = st.ScoreTrackingTypeID
GROUP BY st.ScoreID
ORDER BY st.ScoreID
The GROUP BY will obviate the need for DISTINCT, MAX will give you the value you are looking for, and the ORDER BY will still apply, but since there will only be a single ScoreTrackingTypeID value for each ScoreID you can pull it out of the ordering.
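Here is a small runnable sketch of the GROUP BY / MAX pattern, using SQLite and a subset of the rows from the question. It queries ScoreTracking directly and leaves out the join, which is a simplification for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ScoreTracking (ScoreID INTEGER, ScoreTrackingTypeID INTEGER);
    INSERT INTO ScoreTracking VALUES
        (5267137, 8), (5267137, 2),
        (5267139, 8), (5267139, 5), (5267139, 3),
        (5267145, 4), (5267145, 3);
""")

# One output row per ScoreID, carrying the largest type id in that group.
rows = conn.execute("""
    SELECT ScoreID, MAX(ScoreTrackingTypeID) AS ScoreTrackingTypeID
    FROM ScoreTracking
    GROUP BY ScoreID
    ORDER BY ScoreID
""").fetchall()
print(rows)
```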

SQL ROUND A AVG

I have these two tables:
Evaluation
id_student teste
----------- -----
1 16
1 10
1 20
1 13
Student
id name
----------- ------
1 Jonh
I want to compute the average of the column "teste" for the student with id 1.
I used this query:
select ROUND(AVG(e.teste),0) from Student s, Evaluation e
where s.id=e.id_student and s.id=1 group by s.name
That query returns 14, but if I compute (16+10+20+13) / 4 on a calculator, I get 14.75. I already tried using ROUND to round the number; the query should return 15 instead of 14.
Does anybody know how I can solve this? Thanks, and sorry for my English.
The problem is probably that the average is calculated using integer arithmetic. You don't specify your database, but some do use integer arithmetic. This has nothing to do with the round().
Try this out:
select AVG(e.teste)
from Student s join
Evaluation e
on s.id = e.id_student
where s.id = 1;
Notice the changes:
Fixed the join to use proper, explicit join syntax.
Removed the round().
Removed the group by because you only seem to want to return one row.
This will return 14 and not 15, because the calculation is integer 59 divided by integer 4 to return an integer -- so it is truncated not rounded. You can fix this by converting to some sort of decimal/float representation. Often the easiest way is just by multiplying by 1.0:
select AVG(e.teste * 1.0)
from Student s join
Evaluation e
on s.id = e.id_student
where s.id = 1;
Once you have the average calculating correctly, you can apply round() if you like.
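The truncation can be reproduced with a short sketch. One caveat: it runs on SQLite, whose AVG() always returns a floating-point value, so the integer average is mimicked with SUM(...) / COUNT(*). That substitution is an assumption made to show the effect, not what the asker's database does internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Evaluation (id_student INTEGER, teste INTEGER);
    INSERT INTO Evaluation VALUES (1, 16), (1, 10), (1, 20), (1, 13);
""")

int_avg, real_avg, rounded, builtin_avg = conn.execute("""
    SELECT SUM(teste) / COUNT(*),                  -- 59 / 4 in integer arithmetic
           SUM(teste * 1.0) / COUNT(*),            -- 59.0 / 4 in floating point
           ROUND(SUM(teste * 1.0) / COUNT(*), 0),  -- round only after the real division
           AVG(teste)                              -- SQLite's AVG is already floating point
    FROM Evaluation
    WHERE id_student = 1
""").fetchone()
print(int_avg, real_avg, rounded, builtin_avg)
```

The integer version gives 14, the floating-point version gives 14.75, and rounding that gives 15.0.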

Why percentage is not working properly in SQLite3?

I have the following code for the following question; however, the percentage always comes out as zero:
SELECT p.state, (p.popestimate2011/sum(p.popestimate2011)) * 100
FROM pop_estimate_state_age_sex_race_origin p
WHERE p.age >= 21
GROUP BY p.state;
Also here's the table schema:
sqlite> .schema pop_estimate_state_age_sex_race_origin
CREATE TABLE pop_estimate_state_age_sex_race_origin (
sumlev NUMBER,
region NUMBER,
division NUMBER,
state NUMBER,
sex NUMBER,
origin NUMBER,
race NUMBER,
age NUMBER,
census2010pop NUMBER,
estimatesbase2010 NUMBER,
popestimate2010 NUMBER,
popestimate2011 NUMBER,
PRIMARY KEY(state, age, sex, race, origin),
FOREIGN KEY(sumlev) REFERENCES SUMLEV(sumlev_cd),
FOREIGN KEY(region) REFERENCES REGION(region_cd),
FOREIGN KEY(division) REFERENCES DIVISION(division_cd),
FOREIGN KEY(sex) REFERENCES SEX(sex_cd),
FOREIGN KEY(race) REFERENCES RACE(race_cd),
FOREIGN KEY(origin) REFERENCES ORIGIN(origin_cd));
So when I run the query it just shows 0 for the percentage:
stat p.popestimate
---- -------------
1 0
2 0
4 0
5 0
6 0
8 0
9 0
10 0
11 0
12 0
13 0
15 0
16 0
17 0
18 0
19 0
20 0
21 0
22 0
23 0
I also tried to write it using nested queries, but didn't get anywhere either:
SELECT p.state, 100.0 * sum(p.popestimate2011) / total_pop AS percentage
FROM pop_estimate_state_age_sex_race_origin p
JOIN (SELECT state, sum(p2.popestimate2011) AS total_pop
FROM pop_estimate_state_age_sex_race_origin p2) s ON (s.state = p.state)
WHERE age >= 21
GROUP BY p.state, total_pop
ORDER BY p.state;
The current problem I am having is that it just shows one row as result and just shows the result for the last state number (state ID=56):
56 0.131294163192301
Here's an approach (not tested) that does not require an inner query. It makes a single pass over the table, aggregating by state, and using CASE to calculate the numerator of population aged over 20 and denominator of total state population.
SELECT
state,
100.0 * SUM(CASE WHEN age >= 21 THEN popestimate2011 ELSE 0 END) / SUM(popestimate2011) AS percentage
FROM pop_estimate_state_age_sex_race_origin
GROUP BY state
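A minimal sketch of this single-pass approach on SQLite, with a made-up two-state table (the table name pop and all numbers are hypothetical). It also keeps the all-integer form next to the 100.0 form to show why the order of operations matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pop (state INTEGER, age INTEGER, popestimate2011 INTEGER);
    INSERT INTO pop VALUES
        (1, 10, 100), (1, 30, 300),   -- state 1: 300 of 400 are aged 21+
        (2, 20, 150), (2, 21, 50);    -- state 2:  50 of 200 are aged 21+
""")

rows = conn.execute("""
    SELECT state,
           -- all-integer arithmetic: the division truncates to 0 before the * 100
           (SUM(CASE WHEN age >= 21 THEN popestimate2011 ELSE 0 END)
                / SUM(popestimate2011)) * 100 AS int_pct,
           -- multiplying by 100.0 first makes the whole expression floating point
           100.0 * SUM(CASE WHEN age >= 21 THEN popestimate2011 ELSE 0 END)
                / SUM(popestimate2011) AS pct
    FROM pop
    GROUP BY state
    ORDER BY state
""").fetchall()

for row in rows:
    print(row)
```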
I'm not sure why your SQL statement is executing at all. You are including the non-aggregated column value popestimate2011 in a GROUP BY select and that should generate an error.
A closer reading of the SQLite documentation indicates that it does, in fact, allow non-aggregated columns in the result list of a GROUP BY query and takes their value from an arbitrary row of the group (a feature also offered by MySQL). This explains:
Why your SELECT statement is able to execute (the value of an arbitrary row is used for the non-aggregated popestimate2011 reference).
Why you are seeing a result of 0: whichever row is used, a single row's popestimate2011 is smaller than the sum over the whole group, so with integer-valued data the division truncates to 0.
As to the meat of your calculation it's not clear from your table definition whether the data in your base table is already aggregated or not and, if so, what the age column represents (an average? the grouping factor for that row?)
Finally, SQLite does not have a NUMBER data type. These columns will get the default affinity of NUMERIC which is probably what you want but might not be.
You need something along these lines (not tested):
SELECT state,
100.0 * SUM(popestimate2011) /
(SELECT SUM(popestimate2011)
FROM pop_estimate_state_age_sex_race_origin
WHERE age >= 21) AS percentage
FROM pop_estimate_state_age_sex_race_origin
WHERE age >= 21
GROUP BY state
;
SQLite has no NUMBER type. The declared type gives the column NUMERIC affinity, so whole-number values are stored as INTEGER, and an integer division loses the decimals. That is why
(p.popestimate2011 / SUM(p.popestimate2011))
comes out as 0 whenever the numerator is smaller than the sum.
Either change the type of the column popestimate2011 to REAL, or use CAST(...):
(CAST(p.popestimate2011 AS REAL) / SUM(p.popestimate2011))
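To check how SQLite actually stores values in a column declared as NUMBER, typeof() can be used. The sketch below uses a hypothetical one-column table t and shows the stored type next to the integer and the cast division.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (pop NUMBER);   -- NUMBER is not a SQLite type name
    INSERT INTO t VALUES (5), (15);
""")

rows = conn.execute("""
    SELECT typeof(pop),                                      -- storage class in use
           pop / (SELECT SUM(pop) FROM t),                   -- integer division
           CAST(pop AS REAL) / (SELECT SUM(pop) FROM t)      -- floating-point division
    FROM t
    ORDER BY pop
""").fetchall()

for row in rows:
    print(row)
```

Both values are stored as 'integer', the plain division yields 0 for each row, and the cast version yields 0.25 and 0.75.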