Calculating average metric across two tables in Postgres SQL

I currently have two tables, A and B, where
Table A:-
col1 col2
a 1,2,3
b 1,4,5
c 4
Table B:-
ID metric1
1 231.0
2 1123.1
3 110
4 1231
5 116
I have to find the mean value of metric1 for each col1 value in Table A. The resulting table should list col1 in descending order of the avg(metric1) value from Table B, using SQL.
Result: -
col1 avg(metric1) count
c 1231 1
b 526 3
a 488 3
Any ideas on how I can write such a query in Postgres SQL? I've tried the following query, but it does not work:
combined_stats AS(
select avg(metric1), count(*)
from table_b
where ID in (select col2 from table_a)
group by (select col1 from table_a)

Fix your data model! Do not store numbers in strings! Do not store multiple values in a string!
Let me assume that you are stuck with someone else's really bad data model. If so, you can split the results and join:
select a.col1, avg(b.metric1), count(b.id)
from a left join
b
on b.id = any (regexp_split_to_array(a.col2, ','))
group by a.col1
order by avg(b.metric1) desc;
Note: If b.id is a number, then you need to deal with type conversions, something like:
on b.id::text = any (regexp_split_to_array(col2, ','))
Here is a db<>fiddle.
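The split-and-join logic can also be traced outside the database. The following is only a sketch in Python, assuming hypothetical in-memory copies of tables A and B (the dictionaries and the function name average_per_key are made up for illustration); it mirrors what the query does rather than replacing it.

```python
# Hypothetical in-memory copies of tables A and B from the question.
table_a = {"a": "1,2,3", "b": "1,4,5", "c": "4"}
table_b = {"1": 231.0, "2": 1123.1, "3": 110.0, "4": 1231.0, "5": 116.0}


def average_per_key(table_a, table_b):
    """Split each comma-separated ID list, look up the metrics, average them."""
    rows = []
    for col1, col2 in table_a.items():
        ids = col2.split(",")  # plays the role of regexp_split_to_array(col2, ',')
        # Keep only IDs that exist in table B (the join condition).
        metrics = [table_b[i] for i in ids if i in table_b]
        avg = sum(metrics) / len(metrics) if metrics else None
        rows.append((col1, avg, len(metrics)))
    # Highest average first; keys without any match go last.
    rows.sort(key=lambda r: (r[1] is None, -(r[1] or 0)))
    return rows


result = average_per_key(table_a, table_b)
for col1, avg, count in result:
    print(col1, avg, count)
```

The IDs are kept as strings on both sides here, which sidesteps the type conversion mentioned in the note above.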

Related

How to count the amount of entry in a column separated by a comma

Currently I have a table in my database, and I want to create a bar chart out of the reasons column. This is an example of my table:
Table Name: Survey
id reasons
1 a,b,c
2 a,d,e
3 b,c,d
How can I count the total for each reason, as in the table below?
reasons total
a 2
b 2
c 2
d 2
e 1
You would use string_split():
select s.value as reason, count(*)
from t cross apply
string_split(reasons, ',') s
group by s.value
order by s.value;
That said, you should fix your data model. You should have a separate table with one row per reason.
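For readers who want to check the counting logic without a SQL Server instance, here is a small Python sketch. It assumes the three sample rows from the question held in a hypothetical list named survey; it illustrates split-then-count and is not a substitute for the query.

```python
from collections import Counter

# Hypothetical copy of the Survey table: (id, reasons) pairs.
survey = [(1, "a,b,c"), (2, "a,d,e"), (3, "b,c,d")]

# Split every comma-separated list and count each individual reason.
reason_counts = Counter(
    reason.strip()
    for _id, reasons in survey
    for reason in reasons.split(",")
)

for reason, total in sorted(reason_counts.items()):
    print(reason, total)
```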

Aggregate column text where dates in table a are between dates in table b

Sample data
CREATE TEMP TABLE a AS
SELECT id, adate::date, name
FROM ( VALUES
(1,'1/1/1900','test'),
(1,'3/1/1900','testing'),
(1,'4/1/1900','testinganother'),
(1,'6/1/1900','superbtest'),
(2,'1/1/1900','thebesttest'),
(2,'3/1/1900','suchtest'),
(2,'4/1/1900','test2'),
(2,'6/1/1900','test3'),
(2,'7/1/1900','test4')
) AS t(id,adate,name);
CREATE TEMP TABLE b AS
SELECT id, bdate::date, score
FROM ( VALUES
(1,'12/31/1899', 7 ),
(1,'4/1/1900' , 45),
(2,'12/31/1899', 19),
(2,'5/1/1900' , 29),
(2,'8/1/1900' , 14)
) AS t(id,bdate,score);
What I want
What I need to do is aggregate column text from table a where the id matches table b and the date from table a is between the two closest dates from table b. Desired output:
id date score textagg
1 12/31/1899 7 test, testing
1 4/1/1900 45 testinganother, superbtest
2 12/31/1899 19 thebesttest, suchtest, test2
2 5/1/1900 29 test3, test4
2 8/1/1900 14
My thoughts are to do something like this:
create table date_join
select a.id, string_agg(a.text, ','), b.*
from tablea a
left join tableb b
on a.id = b.id
*having a.date between b.date and b.date*;
but I am really struggling with the last line, figuring out how to aggregate only where the date in table a is between the closest two dates in table b. Any guidance is much appreciated.
I can't promise it's the best way to do it, but this is a way to do it.
with b_values as (
select
id, bdate as from_date, score,
lead (bdate, 1, '3000-01-01')
over (partition by id order by bdate) - 1 as thru_date
from b
)
select
bv.id, bv.from_date, bv.score,
string_agg (a.name, ',')
from
b_values as bv
left join a on
a.id = bv.id and
a.adate between bv.from_date and bv.thru_date
group by
bv.id, bv.from_date, bv.score
order by
bv.id, bv.from_date
I'm presupposing you will never have a date in your table greater than 12/31/2999, so if you're still running this query after that date, please accept my apologies.
Here is the output I got when I ran this:
id from_date score string_agg
1 1899-12-31 7 test,testing
1 1900-04-01 45 testinganother,superbtest
2 1899-12-31 19 thebesttest,suchtest,test2
2 1900-05-01 29 test3,test4
2 1900-08-01 14
I might also note that BETWEEN in a join condition can be a performance killer. If you have large data volumes, there might be better ways to approach this, but that depends largely on what your actual data looks like.
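The window-function idea can be tried without a Postgres server. The sketch below uses Python's sqlite3 module as a stand-in, so several things are assumptions rather than the original setup: dates are stored as ISO text, group_concat replaces string_agg, and the range is written as a half-open interval (adate >= from_date and adate < next_date) instead of subtracting one day. It needs SQLite 3.25 or later for LEAD.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, adate TEXT, name TEXT);
CREATE TABLE b (id INTEGER, bdate TEXT, score INTEGER);
INSERT INTO a VALUES
  (1,'1900-01-01','test'), (1,'1900-03-01','testing'),
  (1,'1900-04-01','testinganother'), (1,'1900-06-01','superbtest'),
  (2,'1900-01-01','thebesttest'), (2,'1900-03-01','suchtest'),
  (2,'1900-04-01','test2'), (2,'1900-06-01','test3'),
  (2,'1900-07-01','test4');
INSERT INTO b VALUES
  (1,'1899-12-31',7), (1,'1900-04-01',45),
  (2,'1899-12-31',19), (2,'1900-05-01',29), (2,'1900-08-01',14);
""")

query = """
WITH b_values AS (
  SELECT id, bdate AS from_date, score,
         -- next b date for the same id; far-future sentinel for the last row
         LEAD(bdate, 1, '3000-01-01')
           OVER (PARTITION BY id ORDER BY bdate) AS next_date
  FROM b
)
SELECT bv.id, bv.from_date, bv.score, group_concat(a.name, ',') AS textagg
FROM b_values AS bv
LEFT JOIN a
  ON a.id = bv.id
 AND a.adate >= bv.from_date
 AND a.adate <  bv.next_date
GROUP BY bv.id, bv.from_date, bv.score
ORDER BY bv.id, bv.from_date
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

SQLite does not guarantee the order of names inside group_concat, so compare the aggregated names as a set rather than as an exact string.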

Count number of repeats in SQL

I tried to solve a problem, but without success.
I have two lists of numbers:
{1,2,3,4}
{5,6,7,8,9}
And I have a table:
ID Number
1 1
1 2
1 7
1 2
1 6
2 8
2 7
2 3
2 9
Now I need to count how many times a number from the second list comes after a number from the first list, counting at most one match per ID.
In the example table above the result should be 2:
there are three matching pairs, but because there are only two distinct IDs the result is 2 instead of 3.
Pairs:
1 2
1 7
1 2
1 6
2 3
2 9
Note: I work with MSSQL.
Edit: There is one more column, Date, which determines the order.
Edit 2 - Solution
I wrote this query:
SELECT * FROM table t
left JOIN table tt ON tt.ID = t.ID
AND tt.Date > t.Date
AND t.Number IN (1,2,3,4)
AND tt.Number IN (6,7,8,9)
After this I planned to group by ID and use only one match per ID, but execution takes a long time.
Here is a query that would do it:
select a.id, min(a.number) as a, min(b.number) as b
from mytable a
inner join mytable b
on a.id = b.id
and a.date < b.date
and b.number in (5,6,7,8,9)
where a.number in (1,2,3,4)
group by a.id
Output is:
id a b
1 1 6
2 3 9
So the two pairs are output each on one line, with the value a belonging to the first group of numbers, and the value of column b to the second group.
Here is a fiddle
Comments on attempt (edit 2 to question)
Later you added a query attempt to your question. Some comments about that attempt:
You don't need a left join because you really want to have a match for both values. inner join has in general better performance, so use that.
The condition t.Number IN (1,2,3,4) does not belong in the on clause. In combination with a left join the result will include t records that violate this condition. It should be put in the where clause.
Your concern about performance may be warranted, but can be resolved by adding a useful index on your table, i.e. on (id, number, date) or (id, date, number)
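To see the self-join and the suggested index working together, here is a sketch that runs the same query through Python's sqlite3 module instead of MSSQL. The dates are hypothetical (the question only says a Date column defines the order), chosen so that they follow the row order of the example table; the index name idx_mytable is also made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER, number INTEGER, date TEXT)")

# Example rows from the question, with assumed dates that preserve row order.
rows = [
    (1, 1, "2020-01-01"), (1, 2, "2020-01-02"), (1, 7, "2020-01-03"),
    (1, 2, "2020-01-04"), (1, 6, "2020-01-05"),
    (2, 8, "2020-01-01"), (2, 7, "2020-01-02"), (2, 3, "2020-01-03"),
    (2, 9, "2020-01-04"),
]
conn.executemany("INSERT INTO mytable VALUES (?, ?, ?)", rows)

# One of the two index layouts suggested in the answer.
conn.execute("CREATE INDEX idx_mytable ON mytable (id, number, date)")

pairs = conn.execute("""
    SELECT a.id, MIN(a.number) AS a, MIN(b.number) AS b
    FROM mytable a
    INNER JOIN mytable b
      ON a.id = b.id
     AND a.date < b.date
     AND b.number IN (5, 6, 7, 8, 9)
    WHERE a.number IN (1, 2, 3, 4)
    GROUP BY a.id
    ORDER BY a.id
""").fetchall()

# One row per ID that has at least one match, so the row count is the answer.
matching_ids = len(pairs)
print(pairs, matching_ids)
```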

sql server getting first value when grouping

I have a table with column a having not necessarily distinct values and column b having for each value of a a number of distinct values. I want to get a result having each value of a appearing only once and getting the first found value of b for that value of a. How do I do this in sql server 2000?
example table:
a b
1 aa
1 bb
2 zz
3 aa
3 zz
3 bb
4 bb
4 aa
Wanted result:
a b
1 aa
2 zz
3 aa
4 bb
In addition, I must add that the values in column b are all text values. I updated the example to reflect this.
Thanks
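-- Note: CTEs and row_number() require SQL Server 2005 or later.
-- Ordering by the partition column leaves the "first" row arbitrary;
-- order by a real sequencing column if one exists.
;with cte as
(
select *,
row_number() over(partition by a order by a) as rn
from yourtablename
)
select
a,b
from cte
where rn = 1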
SQL does not know about ordering by table rows. You need to introduce order in the table structure (usually using an id column). That said, once you have an id column, it's rather easy:
SELECT a, b FROM test WHERE id in (SELECT MIN(id) FROM test GROUP BY a)
There might be a way to do this, using internal SQL Server functions. But this solution is portable and more easily understood by anyone who knows SQL.
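The MIN(id) approach can be checked quickly with Python's sqlite3 module. This is only a sketch: it assumes an id column that is filled in insertion order (here an INTEGER PRIMARY KEY, which SQLite assigns automatically), which is the extra structure the answer says the table needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is assigned automatically in insertion order and defines "first".
conn.execute("CREATE TABLE test (id INTEGER PRIMARY KEY, a INTEGER, b TEXT)")

data = [
    (1, "aa"), (1, "bb"), (2, "zz"), (3, "aa"),
    (3, "zz"), (3, "bb"), (4, "bb"), (4, "aa"),
]
conn.executemany("INSERT INTO test (a, b) VALUES (?, ?)", data)

# For each value of a, keep only the row with the smallest id.
first_rows = conn.execute("""
    SELECT a, b
    FROM test
    WHERE id IN (SELECT MIN(id) FROM test GROUP BY a)
    ORDER BY a
""").fetchall()

for row in first_rows:
    print(row)
```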

Add Column values in sql server query

I have result of two queries like:
Result of query 1
ID Value
1 4
2 0
3 6
4 9
Result of query 2
ID Value
1 6
2 4
3 0
4 1
I want to add the values in the "Value" column and show the final result:
Result of Both queries
ID Value
1 10
2 4
3 6
4 10
Please guide me.
select id, sum(value) as value
from (
select id, value from query1
union all
select id, value from query2
) x
group by id
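A quick way to try the UNION ALL approach is Python's sqlite3 module. In this sketch query1 and query2 are hypothetical tables standing in for the two result sets, and the extra row with ID 5 is an assumption added to show that an ID present in only one result is still kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE query1 (id INTEGER, value INTEGER);
CREATE TABLE query2 (id INTEGER, value INTEGER);
INSERT INTO query1 VALUES (1, 4), (2, 0), (3, 6), (4, 9);
-- ID 5 exists only in the second result (not part of the original data).
INSERT INTO query2 VALUES (1, 6), (2, 4), (3, 0), (4, 1), (5, 7);
""")

# Stack both result sets, then sum the values per ID.
totals = conn.execute("""
    SELECT id, SUM(value) AS value
    FROM (
        SELECT id, value FROM query1
        UNION ALL
        SELECT id, value FROM query2
    ) x
    GROUP BY id
    ORDER BY id
""").fetchall()

for row in totals:
    print(row)
```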
Try using a JOIN:
SELECT
T1.ID,
T1.Value + T2.Value AS Value
FROM (...query1...) AS T1
JOIN (...query2...) AS T2
ON T1.Id = T2.Id
You may also need to consider what should happen if there is an Id present in one result but not in the other. The current query will omit it from the results. You may want to investigate OUTER JOIN as an alternative.
A not particularly nice but fairly easy to comprehend way would be:
SELECT COALESCE(t1.ID, t2.ID) AS ID,
COALESCE(t1.Value, 0) + COALESCE(t2.Value, 0) AS Value
FROM
(SELECT IDColumn AS ID,ValueColumn AS Value FROM TableA) t1
FULL OUTER JOIN
(SELECT IDColumn AS ID,ValueColumn AS Value FROM TableB) t2
ON t1.ID = t2.ID
It has the benefits of
a) I don't know your actual table structure, but you should be able to substitute your original queries for the two inner SELECTs
b) If an ID appears in only one of the two tables, that's fine