Compare data from query result to different table data in PostgreSQL - sql

STEP 1
Select data1,name,phone,address from dummyTable limit 4;
From above query, I will get the following result for example:
data1 | name | phone | address
fgh | hjk | 567...| CA
ghjkk | jkjii| 555...| NY
Now, after having the above result I am suppose to match data1 records that I got from above query to existing another table in a database called existingTable which has a same column called data1 in it. If the result above gives data1 value as 'fgh' so I take that 'fgh' and compare with that existingtable column called data1.
STEP 2
Next, after I am finished comparing, I need to apply some condition as follows:
if((results.data1.value).equals(existingTable.data1.value))
then count --
else
count++
So by above condition I am trying to explain, that if the value I got from the result is matched then I do count decrement by 1 and if not then count is incremented by 1.
Summary
I basically wanted to achieve this in one single query, is it possible using PostgreSQL?

I think you can translate that to a simple query:
SELECT d.data1, d.name, d.phone, d.address
, count(*) - 2 * count(e.data1)
FROM (
SELECT data1, name, phone, address
FROM dummytable
-- ORDER BY ???
LIMIT 4
) d
LEFT JOIN existingtable e USING (data1)
GROUP BY d.data1, d.name, d.phone, d.address;
The major ingredient is the LEFT [OUTER] JOIN. Follow the link to the manual.
count(*) counts all rows from dummytable.
count(e.data1) only counts rows from existingtable where a matching data1 exists (count() does not count NULL values). I subtract that twice to match your formula.
About ORDER BY: There is no natural order in a database table. You need to order by something to get predictable results.
If there can be duplicates in existingtable but you want to count every distinct data1 only once, eliminate dupes before you join or use an EXISTS semi-join:
SELECT data1, name, phone, address
, count(*) - 2 * count(EXISTS (
SELECT 1 FROM existingtable e
WHERE e.data1 = d.data1) OR NULL)
FROM (
SELECT data1, name, phone, address
FROM dummytable
-- ORDER BY ???
LIMIT 4
) d
GROUP BY data1, name, phone, address;
The last count works because (TRUE OR NULL) IS TRUE, but (FALSE OR NULL) IS NULL.

Related

Compare every field in table to every other field in same table

Imagine a table with only one column.
+------+
| v |
+------+
|0.1234|
|0.8923|
|0.5221|
+------+
I want to do the following for row K:
Take row K=1 value: 0.1234
Count how many values in the rest of the table are less than or equal to value in row 1.
Iterate through all rows
Output should be:
+------+-------+
| v |output |
+------+-------+
|0.1234| 0 |
|0.8923| 2 |
|0.5221| 1 |
+------+-------+
Quick Update I was using this approach to compute a statistic at every value of v in the above table. The cross join approach was way too slow for the size of data I was dealing with. So, instead I computed my stat for a grid of v values and then matched them to the vs in the original data. v_table is the data table from before and stat_comp is the statistics table.
AS SELECT t1.*
,CASE WHEN v<=1.000000 THEN pr_1
WHEN v<=2.000000 AND v>1.000000 THEN pr_2
FROM v_table AS t1
LEFT OUTER JOIN stat_comp AS t2
Windows functions were added to ANSI/ISO SQL in 1999 and to to Hive in version 0.11, which was released on 15 May, 2013.
What you are looking for is a variation on rank with ties high which in ANSI/ISO SQL:2011 would look like this-
rank () over (order by v with ties high) - 1
Hive currently does not support with ties ... but the logic can be implemented using count(*) over (...)
select v
,count(*) over (order by v) - 1 as rank_with_ties_high_implicit
from mytable
;
or
select v
,count(*) over
(
order by v
range between unbounded preceding and current row
) - 1 as rank_with_ties_high_explicit
from mytable
;
Generate sample data
select 0.1234 as v into #t
union all
select 0.8923
union all
select 0.5221
This is the query
;with ct as (
select ROW_NUMBER() over (order by v) rn
, v
from #t ot
)
select distinct v, a.cnt
from ct ot
outer apply (select count(*) cnt from ct where ct.rn <> ot.rn and v <= ot.v) a
After seeing your edits, it really does look look like you could use a Cartesian product, i.e. CROSS JOIN here. I called your table foo, and crossed joined it to itself as bar:
SELECT foo.v, COUNT(foo.v) - 1 AS output
FROM foo
CROSS JOIN foo bar
WHERE foo.v >= bar.v
GROUP BY foo.v;
Here's a fiddle.
This query cross joins the column such that every permutation of the column's elements is returned (you can see this yourself by removing the SUM and GROUP BY clauses, and adding bar.v to the SELECT). It then adds one count when foo.v >= bar.v, yielding the final result.
You can take the full Cartesian product of the table with itself and sum a case statement:
select a.x
, sum(case when b.x < a.x then 1 else 0 end) as count_less_than_x
from (select distinct x from T) a
, T b
group by a.x
This will give you one row per unique value in the table with the count of non-unique rows whose value is less than this value.
Notice that there is neither a join nor a where clause. In this case, we actually want that. For each row of a we get a full copy aliased as b. We can then check each one to see whether or not it's less than a.x. If it is, we add 1 to the count. If not, we just add 0.

SQL - Count Results of 2 Columns

I have the following table which contains ID's and UserId's.
ID UserID
1111 11
1111 300
1111 51
1122 11
1122 22
1122 3333
1122 45
I'm trying to count the distinct number of 'IDs' so that I get a total, but I also need to get a total of ID's that have also seen the that particular ID as well... To get the ID's, I've had to perform a subquery within another table to get ID's, I then pass this into the main query... Now I just want the results to be displayed as follows.
So I get a Total No for ID and a Total Number for Users ID - Also would like to add another column to get average as well for each ID
TotalID Total_UserID Average
2 7 3.5
If Possible I would also like to get an average as well, but not sure how to calculate that. So I would need to count all the 'UserID's for an ID add them altogether and then find the AVG. (Any Advice on that caluclation would be appreciated.)
Current Query.
SELECT DISTINCT(a.ID)
,COUNT(b.UserID)
FROM a
INNER JOIN b ON someID = someID
WHERE a.ID IN ( SELECT ID FROM c WHERE GROUPID = 9999)
GROUP BY a.ID
Which then Lists all the IDs and COUNT's all the USERID.. I would like a total of both columns. I've tried warpping the query in a
SELECT COUNT(*) FROM (
but this only counts the ID's which is great, but how do I count the USERID column as well
You seem to want this:
SELECT COUNT(DISTINCT a.ID), COUNT(b.UserID),
COUNT(b.UserID) * 1.0 / COUNT(DISTINCT a.ID)
FROM a INNER JOIN
b
ON someID = someID
WHERE a.ID IN ( SELECT ID FROM c WHERE GROUPID = 9999);
Note: DISTINCT is not a function. It applies to the whole row, so it is misleading to put an expression in parentheses after it.
Also, the GROUP BY is unnecessary.
The 1.0 is because SQL Server does integer arithmetic and this is a simple way to convert a number to a decimal form.
You can use
SELECT COUNT(DISTINCT a.ID) ...
to count all distinct values
Read details here
I believe you want this:
select TotalID,
Total_UserID,
sum(Total_UserID+TotalID) as Total,
Total_UserID/TotalID as Average
from (
SELECT (DISTINCT a.ID) as TotalID
,COUNT(b.UserID) as Total_UserID
FROM a
INNER JOIN b ON someID = someID
WHERE a.ID IN ( SELECT ID FROM c WHERE GROUPID = 9999)
) x

sql query - difference between the row values of same column

Can anybody tell me how to calculate the difference between the rows of the same column?
ID DeviceID Reading Date Flag
1 2 10 12/02/2015 1
2 3 08 12/02/2015 1
3 2 12 12/02/2015 1
4 2 20 12/02/2015 0
5 4 10 12/02/2015 0
6 2 19 12/02/2015 0
In ABOVE table I want to calculate the difference between the Readings for DeviceID 2 for some date say 12/02/2015 for example,
(12-10=2)
(20-12=8)
(19-2 =-1) and want to sum up this difference
i.e. 2+8+(-1)=9
If you use MS Access, I was try this code for your question:
I was made 4 query in MS Access:
Query1 to get data deviceId=2 and date=12/2/2015:
select id, reading from table1 where deviceid=2 and date=#12/2/2015#;
Then I make Query2 to get row number from query1:
select
(select count(*) from query1 where a.id>=id) as rowno,
a.reading from query1 a;
Then I make Query3 to get difference value field reading from query2:
select
(tbl2.reading-tbl1.reading) as diff
from query2 tbl1
left join query2 tbl2 on tbl1.rowno=tbl2.rowno-1
And then final query to get sum from result difference in query3:
SELECT sum(diff) as Total_Diff
FROM Query3;
But, if you use SQL Server, you can use this query (look for example sqlfiddle):
;with tbl as(
select row_number()over(order by id) as rowno,
reading
from table1
where deviceid=2 and date='20150212'
)
select sum(diff) as sum_diff
from (
select
(b.reading-a.reading) as diff
from tbl a
left join tbl b on a.rowno=b.rowno-1
) tbl_diff
You can try this (replace Table1 with your table name):
SELECT Sum([Diffs].[Difference]) AS FinalReading
FROM (
SELECT IDs.DeviceID, [Table1].Reading AS NextReading, Table1_1.Reading AS PrevReading, [Table1].Reading-Table1_1.Reading AS Difference
FROM (
(
SELECT [Table1].DeviceID,
[Table1].ID,
CLng(Nz(DMax("ID","Table1","[DeviceID] = " & [DeviceID] & " And [ID] < " & [ID]),0)) AS PrevID
FROM Table1
WHERE DeviceID = 2
) AS IDs
INNER JOIN Table1
ON IDs.ID=[Table1].ID)
INNER JOIN Table1 AS Table1_1
ON IDs.PrevID=Table1_1.ID
) AS Diffs;
The IDs table expression calculates the prev ID for the DeviceID in question. (I put the WHERE clause in this table expression, but you can move it to the outer one if you want to calc the FinalReadings for ALL devices at once, the filter it at the end. Less efficient but more flexible.) We join back to the original tables on the ID and PrevIDs from the inner table expressions, get their Reading values, and perform the difference operation in the Diffs table expression. The final outer query just sums the Difference values from each row value.

Select data from a table where only the first two columns are distinct

Background
I have a table which has six columns. The first three columns create the pk. I'm tasked with removing one of the pk columns.
I selected (using distinct) the data into a temp table (excluding the third column), and tried inserting all of that data back into the original table with the third column being '11' for every row as this is what I was instructed to do. (this column is going to be removed by a DBA after I do this)
However, when I went to insert this data back into the original table I get a pk constraint error. (shocking, I know)
The other three columns are just date columns, so the distinct select didn't create a unique pk for each record. What I'm trying to achieve is just calling a distinct on the first two columns, and then just arbitrarily selecting the three other columns as it doesn't matter which dates I choose (at least not on dev).
What I've tried
I found the following post which seems to achieve what I want:
How do I (or can I) SELECT DISTINCT on multiple columns?
I tried the answers from both Joel,and Erwin.
Attempt 1:
However, with Joels answer the set returned is too large - the inner join isn't doing what I thought it would do. Selecting distinct col1 and col2 there are 400 columns returned, however when I use his solution 600 rows are returned. I checked the data and in fact there were duplicate pk's. Here is my attempt at duplicating Joels answer:
select a.emp_no,
a.eec_planning_unit_cde,
'11' as area, create_dte,
create_by_emp_no, modify_dte,
modify_by_emp_no
from tempdb.guest.temp_part_time_evaluator b
inner join
(
select emp_no, eec_planning_unit_cde
from tempdb.guest.temp_part_time_evaluator
group by emp_no, eec_planning_unit_cde
) a
ON b.emp_no = a.emp_no AND b.eec_planning_unit_cde = a.eec_planning_unit_cde
Now, if I execute just the inner select statement 400 rows are returned. If I select the whole query 600 rows are returned? Isn't inner join supposed to only show the intersection of the two sets?
Attempt 2:
I also tried the answer from Erwin. This one has a syntax error and I'm having trouble googling the spec on the where clause (specifically, the trick he is using with (emp_no, eec_planning_unit_cde))
Here is the attempt:
select emp_no,
eec_planning_unit_cde,
'11' as area, create_dte,
create_by_emp_no,
modify_dte,
modify_by_emp_no
where (emp_no, eec_planning_unit_cde) IN
(
select emp_no, eec_planning_unit_cde
from tempdb.guest.temp_part_time_evaluator
group by emp_no, eec_planning_unit_cde
)
Now, I realize that the post I referenced is for postgresql. Doesn't T-SQL have something similar? Trying to google parenthesis isn't working too well.
Overview of Questions:
Why doesn't inner join return an intersection of two sets? From googling this is what I thought it was supposed to do
Is there another way to achieve the same method that I was trying in attempt 2 in t-sql?
It doesn't matter to me which one of these I use, or if I use another solution... how should I go about this?
A select distinct will be based on all columns so it does not guarantee the first two to be distinct
select pk1, pk2, '11', max(c1), max(c2), max(c3)
from table
group by pk1, pk2
You could TRY this:
SELECT a.emp_no,
a.eec_planning_unit_cde,
b.'11' as area,
b.create_dte,
b.create_by_emp_no,
b.modify_dte,
b.modify_by_emp_no
FROM
(
SELECT emp_no, eec_planning_unit_cde
FROM tempdb.guest.temp_part_time_evaluator
GROUP BY emp_no, eec_planning_unit_cde
) a
JOIN tempdb.guest.temp_part_time_evaluator b
ON a.emp_no = b.emp_no AND a.eec_planning_unit_cde = b.eec_planning_unit_cde
That would give you a distinct on those fields but if there is differences in the data between columns you might have to try a more brute force approch.
SELECT a.emp_no,
a.eec_planning_unit_cde,
a.'11' as area,
a.create_dte,
a.create_by_emp_no,
a.modify_dte,
a.modify_by_emp_no
FROM
(
SELECT ROW_NUMBER() OVER(ORDER BY emp_no, eec_planning_unit_cde) rownumber,
a.emp_no,
a.eec_planning_unit_cde,
a.'11' as area,
a.create_dte,
a.create_by_emp_no,
a.modify_dte,
a.modify_by_emp_no
FROM tempdb.guest.temp_part_time_evaluator
) a
WHERE rownumber = 1
I'll reply one by one:
Why doesn't inner join return an intersection of two sets? From googling this is what I thought it was supposed to do
Inner join don't do an intersection. Le'ts supose this tables:
T1 T2
n s n s
1 A 2 X
2 B 2 Y
2 C
3 D
If you join both tables by numeric column you don't get the intersection (2 rows). You get:
select *
from t1 inner join t2
on t1.n = t2.n;
| N | S |
---------
| 2 | B |
| 2 | B |
| 2 | C |
| 2 | C |
And, your second query approach:
select *
from t1
where t1.n in (select n from t2);
| N | S |
---------
| 2 | B |
| 2 | C |
Is there another way to achieve the same method that I was trying in attempt 2 in t-sql?
Yes, this subquery:
select *
from t1
where not exists (
select 1
from t2
where t2.n = t1.n
);
It doesn't matter to me which one of these I use, or if I use another solution... how should I go about this?
yes, using #JTC second query.

Select values in SQL that do not have other corresponding values except those that i search for

I have a table in my database:
Name | Element
1 2
1 3
4 2
4 3
4 5
I need to make a query that for a number of arguments will select the value of Name that has on the right side these and only these values.
E.g.:
arguments are 2 and 3, the query should return only 1 and not 4 (because 4 also has 5). For arguments 2,3,5 it should return 4.
My query looks like this:
SELECT name FROM aggregations WHERE (element=2 and name in (select name from aggregations where element=3))
What do i have to add to this query to make it not return 4?
A simple way to do it:
SELECT name
FROM aggregations
WHERE element IN (2,3)
GROUP BY name
HAVING COUNT(element) = 2
If you want to add more, you'll need to change both the IN (2,3) part and the HAVING part:
SELECT name
FROM aggregations
WHERE element IN (2,3,5)
GROUP BY name
HAVING COUNT(element) = 3
A more robust way would be to check for everything that isn't not in your set:
SELECT name
FROM aggregations
WHERE NOT EXISTS (
SELECT DISTINCT a.element
FROM aggregations a
WHERE a.element NOT IN (2,3,5)
AND a.name = aggregations.name
)
GROUP BY name
HAVING COUNT(element) = 3
It's not very efficient, though.
Create a temporary table, fill it with your values and query like this:
SELECT name
FROM (
SELECT DISTINCT name
FROM aggregations
) n
WHERE NOT EXISTS
(
SELECT 1
FROM (
SELECT element
FROM aggregations aii
WHERE aii.name = n.name
) ai
FULL OUTER JOIN
temptable tt
ON tt.element = ai.element
WHERE ai.element IS NULL OR tt.element IS NULL
)
This is more efficient than using COUNT(*), since it will stop checking a name as soon as it finds the first row that doesn't have a match (either in aggregations or in temptable)
This isn't tested, but usually I would do this with a query in my where clause for a small amount of data. Note that this is not efficient for large record counts.
SELECT ag1.Name FROM aggregations ag1
WHERE ag1.Element IN (2,3)
AND 0 = (select COUNT(ag2.Name)
FROM aggregatsions ag2
WHERE ag1.Name = ag2.Name
AND ag2.Element NOT IN (2,3)
)
GROUP BY ag1.name;
This says "Give me all of the names that have the elements I want, but have no records with elements I don't want"