Join between two tables - sql

table1 - doctors
+---------+--------+------+
| country | state | doc |
+---------+--------+------+
| india | AP | 20 |
+---------+--------+------+
| india | TN | 30 |
+---------+--------+------+
| india | KA | 10 |
+---------+--------+------+
| US | LA | 30 |
+---------+--------+------+
| US | CA | 10 |
+---------+--------+------+
| US | NY | 50 |
+---------+--------+------+
table2 - engineers
+---------+--------+-------+
| country | state | engg |
+---------+--------+-------+
| india | AP | 100 |
+---------+--------+-------+
| india | TN | 400 |
+---------+--------+-------+
| india | KA | 250 |
+---------+--------+-------+
| US | LA | 140 |
+---------+--------+-------+
| US | CA | 120 |
+---------+--------+-------+
| US | NY | 150 |
+---------+--------+-------+
Desired output:
+---------+------+-------+
| country | doc | engg |
+---------+------+-------+
| india | 60 | 750 |
+---------+------+-------+
| US | 90 | 410 |
+---------+------+-------+
I tried the query below, but the doc and engg totals come out too high. Could someone please correct it?
select country, sum(a.doc), sum(b.engg)
from table1 a join table2 b on (a.country = b.country)

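I think the problem is that the join condition uses only country, so every doctors row is paired with every engineers row of the same country (a partial cross product) and the sums are inflated.
Try joining on state as well, for example with:
tableA NATURAL JOIN tableB
and add GROUP BY country.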
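To make the inflation concrete, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's sqlite3 module and runs both joins. The table names doctors and engineers come from the question; using SQLite is only an assumption made so the example is runnable, and the explicit GROUP BY is added because the aggregate needs it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doctors   (country TEXT, state TEXT, doc  INTEGER);
CREATE TABLE engineers (country TEXT, state TEXT, engg INTEGER);
INSERT INTO doctors VALUES
  ('india','AP',20), ('india','TN',30), ('india','KA',10),
  ('US','LA',30), ('US','CA',10), ('US','NY',50);
INSERT INTO engineers VALUES
  ('india','AP',100), ('india','TN',400), ('india','KA',250),
  ('US','LA',140), ('US','CA',120), ('US','NY',150);
""")

# Joining on country alone pairs each doctors row with all three
# engineers rows of the same country, so every value is summed three times.
inflated = conn.execute("""
    SELECT a.country, SUM(a.doc), SUM(b.engg)
    FROM doctors a JOIN engineers b ON a.country = b.country
    GROUP BY a.country
    ORDER BY a.country
""").fetchall()

# NATURAL JOIN matches on every shared column (country and state),
# which gives exactly one joined row per state.
correct = conn.execute("""
    SELECT country, SUM(doc), SUM(engg)
    FROM doctors NATURAL JOIN engineers
    GROUP BY country
    ORDER BY country
""").fetchall()

print(inflated)  # [('US', 270, 1230), ('india', 180, 2250)]
print(correct)   # [('US', 90, 410), ('india', 60, 750)]
```

Note that NATURAL JOIN only gives the right totals while both tables contain the same (country, state) pairs; a state missing from one table would silently drop out of the sums.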

You can use UNION ALL
SELECT
country,
SUM(doc) AS doc,
SUM(engg) AS engg
FROM
(SELECT
country,
doc,
0 AS engg
FROM
doctors
UNION ALL
SELECT
country,
0,
engg
FROM
engineers
) a
GROUP BY
country

You need to group by country in each subquery, and the second subquery needs an alias (b) so that the join condition can refer to it.
select a.country, a.docSum, b.enggSum from
(select country, sum(doc) docSum from doctors group by country) a
inner join
(select country, sum(engg) enggSum from engineers group by country) b
on a.country = b.country
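As a rough check of the aggregate-first idea, the sketch below sums each table per country before joining, again using an in-memory SQLite database as an assumed stand-in for the real one. The extra ('US','TX',5) doctors row is hypothetical and not part of the question; it is there only to show that pre-aggregation still counts a state that has no matching engineers row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doctors   (country TEXT, state TEXT, doc  INTEGER);
CREATE TABLE engineers (country TEXT, state TEXT, engg INTEGER);
INSERT INTO doctors VALUES
  ('india','AP',20), ('india','TN',30), ('india','KA',10),
  ('US','LA',30), ('US','CA',10), ('US','NY',50);
INSERT INTO engineers VALUES
  ('india','AP',100), ('india','TN',400), ('india','KA',250),
  ('US','LA',140), ('US','CA',120), ('US','NY',150);
-- Hypothetical extra row: a state that appears only in doctors.
INSERT INTO doctors VALUES ('US','TX',5);
""")

# Each derived table has one row per country, so the join is 1:1
# and nothing is double counted.
totals = conn.execute("""
    SELECT d.country, d.doc_sum, e.engg_sum
    FROM (SELECT country, SUM(doc)  AS doc_sum  FROM doctors   GROUP BY country) d
    JOIN (SELECT country, SUM(engg) AS engg_sum FROM engineers GROUP BY country) e
      ON d.country = e.country
    ORDER BY d.country
""").fetchall()

print(totals)  # [('US', 95, 410), ('india', 60, 750)]
```

A join on (country, state) would have dropped the TX row and reported 90 doctors for the US, so which variant is right depends on whether unmatched states should count.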

Related

Match TableA.Id with TableB.Id, then return TableB.Country

I have two tables in my SQLite database like this:
Table A:
| Id | Date | Rate | Person |
|:-- |:-------:| ----:| ------:|
| 1 | 2022-02 | 6.3 | Alex |
| 1 | 2022-05 | 4.2 | John |
| 2 | 2022-09 | 2.5 | Alex |
| 3 | 2022-01 | 7.8 | David |
| 2 | 2022-21 | 9 | William|
Table B:
| Id | City | Country |
|:-- |:---------:| -------:|
| 1 | London | England |
| 2 | Paris | France |
| 3 | Washington| USA |
| 4 | Berlin | Germany |
I need a query that gets the Id and Rate of each row in Table A, and then looks up the Country for that Id in Table B.
The result should be something like this
Table C:
| Ids | Countries | Rates |
|:--- |:---------:| -----:|
| 1 | England | 6.3 |
| 1 | England | 4.2 |
| 2 | France | 2.5 |
| 3 | USA | 7.8 |
| 2 | France | 9 |
You need to join A and B with Id, and select some columns:
select B.Id as Ids,
B.Country as Countries,
A.Rate as Rates
From A inner join B on(A.Id = B.Id)
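The join above can be tried directly against SQLite from Python. This is a small sketch with the sample rows from the question; the column types and the ORDER BY A.rowid (used only to reproduce Table A's row order) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (Id INTEGER, Date TEXT, Rate REAL, Person TEXT);
CREATE TABLE B (Id INTEGER, City TEXT, Country TEXT);
INSERT INTO A VALUES
  (1,'2022-02',6.3,'Alex'), (1,'2022-05',4.2,'John'), (2,'2022-09',2.5,'Alex'),
  (3,'2022-01',7.8,'David'), (2,'2022-21',9,'William');
INSERT INTO B VALUES
  (1,'London','England'), (2,'Paris','France'),
  (3,'Washington','USA'), (4,'Berlin','Germany');
""")

# Every row of A is matched with the single row of B that has the same Id.
# Germany (Id 4) has no rows in A, so it does not appear in the result.
joined = conn.execute("""
    SELECT A.Id AS Ids, B.Country AS Countries, A.Rate AS Rates
    FROM A INNER JOIN B ON A.Id = B.Id
    ORDER BY A.rowid
""").fetchall()

for row in joined:
    print(row)
```

If some Ids in A might be missing from B and those rows should still be returned, a LEFT JOIN from A to B would keep them with a NULL country.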

How do I join an ARRAY to a COLUMN?

I have two large tables - Table_A and Table_B - that I want to join on the ID field. "ID" in Table_A is a column and "IDs" in Table_B is an array
Table_A:
ID | City |
----+------------+
101 | London |
102 | Paris |
103 | Rome |
104 | Copenhagen |
105 | Amsterdam |
106 | Berlin |
107 | Cardiff |
108 | Lisbon |
Table_B:
Date | Sessions | IDs
------+----------+--------------
06-02 | 1 | [107,102]
06-03 | 1 | [103]
11-12 | 1 | [105,107,103]
27-06 | 1 | [104,108]
31-01 | 1 | [105]
22-04 | 1 | [106,102]
08-07 | 1 | [101,105,108]
02-10 | 1 | [105]
Desirable Output:
Date | Sessions | ID | City
------+----------+-------------+-------------
06-02 | 1 | 107 | Cardiff
| | 102 | Paris
06-03 | 1 | 103 | Rome
11-12 | 1 | 105 | Amsterdam
| | 107 | Cardiff
| | 103 | Rome
27-06 | 1 | 104 | Copenhagen
| | 108 | Lisbon
...
I have tried using inner joins with unnest and union all but nothing is working. Any help would be appreciated.
Something along these lines should yield the result you are looking for (note that the array column is called ids, so that is what has to be unnested):
select
date,
sessions,
array_agg(id_un) as id,
array_agg(city) as city
from table_b b, unnest(b.ids) as id_un
left join table_a a on id_un = a.id
group by 1, 2
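For readers unsure what unnesting followed by a left join does, here is a plain Python sketch of the same row logic on the sample data. It is not the database's implementation, and the function name unnest_and_join is made up for illustration; it only shows that each array element becomes its own row and is then looked up in Table_A.

```python
# Table_A as an ID -> City lookup, Table_B as (date, sessions, ids) rows.
table_a = {
    101: "London", 102: "Paris", 103: "Rome", 104: "Copenhagen",
    105: "Amsterdam", 106: "Berlin", 107: "Cardiff", 108: "Lisbon",
}
table_b = [
    ("06-02", 1, [107, 102]),
    ("06-03", 1, [103]),
    ("11-12", 1, [105, 107, 103]),
    ("27-06", 1, [104, 108]),
    ("31-01", 1, [105]),
    ("22-04", 1, [106, 102]),
    ("08-07", 1, [101, 105, 108]),
    ("02-10", 1, [105]),
]


def unnest_and_join(session_rows, cities):
    """Emit one output row per array element, with the city looked up by ID.

    dict.get returns None for an unknown ID, which mirrors the NULL
    a LEFT JOIN would produce for an unmatched element.
    """
    flat = []
    for date, sessions, ids in session_rows:
        for id_ in ids:
            flat.append((date, sessions, id_, cities.get(id_)))
    return flat


flat_rows = unnest_and_join(table_b, table_a)
for row in flat_rows[:3]:
    print(row)
```

The flat rows correspond to the desired output in the question; the array_agg calls in the query above would then fold them back into one row per date and session.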
Also consider the approach below:
select date, sessions, ids as id,
array(
select city
from b.ids id
left join Table_A
using(id)
) city
from Table_B b

Select all values from last date that is shared between rows grouped by a value

I have a Postgresql table with a list of values for countries over time, and their continents. Values can be NULL. I’d like to get the sum for each continent over time, up to the latest date each continent has data for.
This is my table:
| continent | country | date | value | id |
| --------- | ------- | ---------- | ----- | --- |
| Europe | Germany | 2020-05-25 | 10 | 1 |
| Europe | Germany | 2020-05-26 | 11 | 2 |
| Europe | Germany | 2020-05-27 | 12 | 3 |
| Europe | Germany | 2020-05-28 | 13 | 4 |
| Europe | Italy | 2020-05-25 | 20 | 5 |
| Europe | Italy | 2020-05-26 | 21 | 6 |
| Europe | Italy | 2020-05-27 | 22 | 7 |
| Europe | Italy | 2020-05-28 | 23 | 8 |
| Europe | France | 2020-05-25 | 30 | 9 |
| Europe | France | 2020-05-26 | 31 | 10 |
| Europe | France | 2020-05-27 | 32 | 11 |
| Europe | France | 2020-05-28 | NULL | 12 |
| Africa | Congo | 2020-05-25 | 40 | 13 |
| Africa | Congo | 2020-05-26 | 41 | 14 |
| Africa | Congo | 2020-05-27 | NULL | 15 |
And this is what I’d like to get back. Note that the Europe includes data up to the 27th, because France has no data for the 28th, and Africa up to the 26th, because that’s the last date its countries have data for.
| continent | date | value |
| --------- | ---------- | ----- |
| Europe | 2020-05-27 | 66 |
| Africa | 2020-05-26 | 41 |
| Europe | 2020-05-26 | 63 |
| Africa | 2020-05-25 | 40 |
| Europe | 2020-05-25 | 60 |
I managed to almost get there by including the number of countries per continent that have data on each date.
SELECT
countries.continent,
countries.date,
SUM(countries.value) AS value,
COUNT(countries.country) AS countries_count
FROM
countries
WHERE
countries.value IS NOT NULL
GROUP BY
countries.continent,
countries.date
ORDER BY
countries.date DESC,
countries.continent;
| continent | date | value | countries_count |
| --------- | ---------- | ----- | --------------- |
| Europe | 2020-05-28 | 36 | 2 |
| Europe | 2020-05-27 | 66 | 3 |
| Africa | 2020-05-26 | 41 | 1 |
| Europe | 2020-05-26 | 63 | 3 |
| Africa | 2020-05-25 | 40 | 1 |
| Europe | 2020-05-25 | 60 | 3 |
I also managed to get the number of countries per continent.
SELECT
countries.continent,
COUNT(DISTINCT countries.country) as number_of_countries
FROM
countries
GROUP BY
countries.continent;
| continent | number_of_countries |
| --------- | ------------------- |
| Africa | 1 |
| Europe | 3 |
I’m stuck on how to combine the two queries to filter out rows that haven’t got the full number of countries for the continent (e.g. select only rows where countries_count is 3 for Europe and 1 for Africa).
This is the end result I’d like to get back:
| continent | date | value |
| --------- | ---------- | ----- |
| Europe | 2020-05-27 | 66 |
| Africa | 2020-05-26 | 41 |
| Europe | 2020-05-26 | 63 |
| Africa | 2020-05-25 | 40 |
| Europe | 2020-05-25 | 60 |
Or maybe there’s a completely different way to go about this?
You can compare the number of countries on the continent to the number available on each date -- and then just use dates where the two match ("complete data").
Unfortunately, Postgres does not support count(distinct) as a window function. But you can do:
SELECT c.continent, c.date,
SUM(c.value) AS value,
COUNT(c.country) AS countries_count
FROM (SELECT c.*,
COUNT(*) OVER (PARTITION BY continent, date) as num_on_date
FROM countries c
WHERE value IS NOT NULL
) c JOIN
(SELECT continent, COUNT(DISTINCT country) as num_countries
FROM countries
GROUP BY continent
) cc
ON cc.continent = c.continent
WHERE num_on_date = num_countries
GROUP BY c.continent, c.date
ORDER BY c.date DESC, c.continent;
You can also do this with a filter in the HAVING clause:
SELECT c.continent, c.date,
SUM(c.value) AS value,
COUNT(c.country) AS countries_count
FROM countries c
WHERE value IS NOT NULL
GROUP BY c.continent, c.date
HAVING COUNT(*) = (SELECT COUNT(DISTINCT c2.country)
FROM countries c2
WHERE c2.continent = c.continent
)
ORDER BY c.date DESC, c.continent;
This does the aggregation and then only keeps the rows where the number of rows matches the number of countries.
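The HAVING variant can be checked against the sample data with a short script. This sketch runs it on an in-memory SQLite database from Python rather than on Postgres, which is an assumption made for convenience; the query text itself is the one from the answer minus the countries_count column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (continent TEXT, country TEXT, date TEXT, value INTEGER);
INSERT INTO countries VALUES
  ('Europe','Germany','2020-05-25',10), ('Europe','Germany','2020-05-26',11),
  ('Europe','Germany','2020-05-27',12), ('Europe','Germany','2020-05-28',13),
  ('Europe','Italy','2020-05-25',20),   ('Europe','Italy','2020-05-26',21),
  ('Europe','Italy','2020-05-27',22),   ('Europe','Italy','2020-05-28',23),
  ('Europe','France','2020-05-25',30),  ('Europe','France','2020-05-26',31),
  ('Europe','France','2020-05-27',32),  ('Europe','France','2020-05-28',NULL),
  ('Africa','Congo','2020-05-25',40),   ('Africa','Congo','2020-05-26',41),
  ('Africa','Congo','2020-05-27',NULL);
""")

# Keep a (continent, date) group only when the number of non-NULL rows
# equals the number of distinct countries on that continent.
complete = conn.execute("""
    SELECT c.continent, c.date, SUM(c.value) AS value
    FROM countries c
    WHERE c.value IS NOT NULL
    GROUP BY c.continent, c.date
    HAVING COUNT(*) = (SELECT COUNT(DISTINCT c2.country)
                       FROM countries c2
                       WHERE c2.continent = c.continent)
    ORDER BY c.date DESC, c.continent
""").fetchall()

for row in complete:
    print(row)
```

Because the comparison counts countries, this variant also rejects a date on which a country has no row at all, not just a row with a NULL value.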
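You can use NOT IN within your WHERE clause. Compare the continent together with the date, so that a NULL in one continent does not also remove that date for the other continents:
SELECT
c.continent,
c.date,
SUM(c.value) AS value,
COUNT(DISTINCT c.country) AS countries_count
FROM countries c
WHERE (c.continent, c.date) NOT IN
( SELECT continent, date
FROM countries
WHERE value IS NULL )
GROUP BY c.continent, c.date
ORDER BY c.date DESC, c.continent;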
You can filter with a HAVING clause to exclude groups in which any country's value is NULL:
SELECT
continent,
date,
SUM(value) AS value
FROM countries
GROUP BY continent, date
HAVING BOOL_AND(value is not null)
ORDER BY date DESC, continent
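BOOL_AND is specific to Postgres and a few other engines. A portable stand-in, offered here as a swapped-in technique rather than the answerer's own, relies on COUNT(value) skipping NULLs: a group has no NULL exactly when COUNT(value) equals COUNT(*). The sketch below tries it on the sample data in an in-memory SQLite database from Python.

```python
import sqlite3

sample = [
    ("Europe", "Germany", "2020-05-25", 10), ("Europe", "Germany", "2020-05-26", 11),
    ("Europe", "Germany", "2020-05-27", 12), ("Europe", "Germany", "2020-05-28", 13),
    ("Europe", "Italy", "2020-05-25", 20), ("Europe", "Italy", "2020-05-26", 21),
    ("Europe", "Italy", "2020-05-27", 22), ("Europe", "Italy", "2020-05-28", 23),
    ("Europe", "France", "2020-05-25", 30), ("Europe", "France", "2020-05-26", 31),
    ("Europe", "France", "2020-05-27", 32), ("Europe", "France", "2020-05-28", None),
    ("Africa", "Congo", "2020-05-25", 40), ("Africa", "Congo", "2020-05-26", 41),
    ("Africa", "Congo", "2020-05-27", None),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE countries (continent TEXT, country TEXT, date TEXT, value INTEGER)"
)
conn.executemany("INSERT INTO countries VALUES (?, ?, ?, ?)", sample)

# COUNT(value) ignores NULLs while COUNT(*) counts every row, so the two
# are equal only for groups that contain no NULL value.
no_nulls = conn.execute("""
    SELECT continent, date, SUM(value) AS value
    FROM countries
    GROUP BY continent, date
    HAVING COUNT(value) = COUNT(*)
    ORDER BY date DESC, continent
""").fetchall()

for row in no_nulls:
    print(row)
```

Like BOOL_AND, this only looks at rows that exist: a date for which a country has no row at all would still pass the filter.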
With SUM() window function:
select distinct c.continent, c.date,
sum(c.value) over (partition by c.continent, c.date) "value"
from countries c
where not exists (
select 1 from countries
where continent = c.continent and date = c.date and value is null
)
order by c.date desc, c.continent;
Results:
| continent | date | value |
| --------- | ------------------------ | ----- |
| Europe | 2020-05-27T00:00:00.000Z | 66 |
| Africa | 2020-05-26T00:00:00.000Z | 41 |
| Europe | 2020-05-26T00:00:00.000Z | 63 |
| Africa | 2020-05-25T00:00:00.000Z | 40 |
| Europe | 2020-05-25T00:00:00.000Z | 60 |

Choosing or filtering columns based on ranks of multiple fields in PostgreSQL

In one of the tables, I have multiple fields with a rank field against them. All these fields have a common grouping attribute against which I need to find the best ranked column value which can exist in any of the records of the group. For example, let's consider the data below:
+---------+---------------+-----------+-----------------+-------------+----------------------+------------+
| Country | City | City_Rank | Artist | Artist_Rank | Movie | Movie_Rank |
+---------+---------------+-----------+-----------------+-------------+----------------------+------------+
| USA | Las Vegas | 2 | Louis C.K | 2 | Justice League | 3 |
| USA | New York City | 3 | Michael Flynn | 3 | IT | 1 |
| USA | Los Angeles | 1 | Matt Lauer | 1 | Get Out | 2 |
| UK | Leeds | 2 | Jack Maynard | 3 | Beauty and the Beast | 2 |
| UK | Manchester | 3 | Charlie Gard | 1 | Wonder Woman | 1 |
| UK | London | 1 | Shannon Mathews | 2 | Logan | 3 |
+---------+---------------+-----------+-----------------+-------------+----------------------+------------+
Now I need the Rank 1 of City, Artist and Movie Grouped by the Country in the single record. So the expected output is:
+---------+------------------+--------------------+-------------------+
| Country | Best_Ranked_City | Best_Ranked_Artist | Best_Ranked_Movie |
+---------+------------------+--------------------+-------------------+
| USA | Los Angeles | Matt Lauer | IT |
| UK | London | Charlie Gard | Wonder Woman |
+---------+------------------+--------------------+-------------------+
I have many more attributes against which I have the rank field. I can arrive at the desired output by forming multiple datasets of the above with a filtering condition for each ranked field (where rank=1) and then joining these datasets by the group field.
However, this is quite a costly affair due to millions of records in the table, and filtering and joining this dataset multiple times doesn't seem to be the best way to solve this. I have arrived at the ranks for each field using a Rank() windows function by applying some business logic over it.
If possible, I would like to solve this using window functions only.
"I have arrived at the ranks for each field using a Rank() windows function by applying some business logic over it."
I guess that there is some query which calculates ranks and then does a pivot operation in order to generate a summary table shown in the question.
It would be good to eliminate the pivot operation, so that the input data generated by this query looks something like this:
| country | category | cat_value | rank_value |
|---------|----------|----------------------|------------|
| UK | Artist | Jack Maynard | 3 |
| UK | Artist | Shannon Mathews | 2 |
| UK | Artist | Charlie Gard | 1 |
| UK | City | Leeds | 2 |
| UK | City | Manchester | 3 |
| UK | City | London | 1 |
| UK | Movie | Logan | 3 |
| UK | Movie | Beauty and the Beast | 2 |
| UK | Movie | Wonder Woman | 1 |
| USA | Artist | Louis C.K | 2 |
| USA | Artist | Michael Flynn | 3 |
| USA | Artist | Matt Lauer | 1 |
| USA | City | Las Vegas | 2 |
| USA | City | Los Angeles | 1 |
| USA | City | New York City | 3 |
| USA | Movie | Justice League | 3 |
| USA | Movie | IT | 1 |
| USA | Movie | Get Out | 2 |
If this is not possible, then this resultset can be unpivoted using:
SELECT Country, 'City' as category, City as cat_value, City_Rank as rank_value
FROM Table1
UNION ALL
SELECT Country, 'Artist' as category, Artist as cat_value, Artist_Rank as rank_value
FROM Table1
UNION ALL
SELECT Country, 'Movie' as category, Movie as cat_value, Movie_Rank as rank_value
FROM Table1
If you unpivot this table, then picking items with rank=1 is very easy, just do:
SELECT * FROM unpivot_table WHERE rank_value = 1
and then another pivot can be done on its results.
The final query may look like this (live demo: http://sqlfiddle.com/#!17/05e53/5)
With unpivot_me As (
SELECT Country, 'City' as category, City as cat_value, City_Rank as rank_value
FROM Table1
UNION ALL
SELECT Country, 'Artist' as category, Artist as cat_value, Artist_Rank as rank_value
FROM Table1
UNION ALL
SELECT Country, 'Movie' as category, Movie as cat_value, Movie_Rank as rank_value
FROM Table1
)
SELECT Country,
Max( case when category = 'City' Then cat_value End) As Best_Ranked_City,
Max( case when category = 'Artist' Then cat_value End) As Best_Ranked_Artist,
Max( case when category = 'Movie' Then cat_value End) As Best_Ranked_Movie
FROM unpivot_me
WHERE rank_value = 1
GROUP BY Country
| country | best_ranked_city | best_ranked_artist | best_ranked_movie |
|---------|------------------|--------------------|-------------------|
| UK | London | Charlie Gard | Wonder Woman |
| USA | Los Angeles | Matt Lauer | IT |
I used the window function max() with a CASE expression that keeps a value only where its rank is 1, partitioned by country. This fetches the first-ranked value of each column on every row of the country. I then filtered on one of the rank fields being 1 (any of the available rank fields would do) to keep a single row per country. Here is the SQL: http://sqlfiddle.com/#!17/05e53/18
WITH T1 AS (
  SELECT Country,
         City_Rank,
         MAX(CASE WHEN City_Rank = 1 THEN City ELSE '' END)
           OVER (PARTITION BY Country) AS Best_Ranked_City,
         MAX(CASE WHEN Artist_Rank = 1 THEN Artist ELSE '' END)
           OVER (PARTITION BY Country) AS Best_Ranked_Artist,
         MAX(CASE WHEN Movie_Rank = 1 THEN Movie ELSE '' END)
           OVER (PARTITION BY Country) AS Best_Ranked_Movie
  FROM Table1
)
SELECT Country, Best_Ranked_City, Best_Ranked_Artist, Best_Ranked_Movie
FROM T1
WHERE City_Rank = 1;
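For comparison, the same CASE expressions also work as ordinary aggregates with GROUP BY, which reads the table once and needs neither the unpivot nor the final rank filter. This is a sketch of that variation, not either answerer's query, run on an in-memory SQLite database from Python; it assumes each rank column has exactly one rank 1 per country (with ties, MAX would return the alphabetically greatest value).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (
  Country TEXT, City TEXT, City_Rank INTEGER,
  Artist TEXT, Artist_Rank INTEGER,
  Movie TEXT, Movie_Rank INTEGER
);
INSERT INTO Table1 VALUES
  ('USA','Las Vegas',2,'Louis C.K',2,'Justice League',3),
  ('USA','New York City',3,'Michael Flynn',3,'IT',1),
  ('USA','Los Angeles',1,'Matt Lauer',1,'Get Out',2),
  ('UK','Leeds',2,'Jack Maynard',3,'Beauty and the Beast',2),
  ('UK','Manchester',3,'Charlie Gard',1,'Wonder Woman',1),
  ('UK','London',1,'Shannon Mathews',2,'Logan',3);
""")

# Each CASE yields the value only on the row ranked 1 and NULL elsewhere;
# MAX ignores the NULLs, leaving the rank-1 value for the country.
best = conn.execute("""
    SELECT Country,
           MAX(CASE WHEN City_Rank   = 1 THEN City   END) AS Best_Ranked_City,
           MAX(CASE WHEN Artist_Rank = 1 THEN Artist END) AS Best_Ranked_Artist,
           MAX(CASE WHEN Movie_Rank  = 1 THEN Movie  END) AS Best_Ranked_Movie
    FROM Table1
    GROUP BY Country
    ORDER BY Country
""").fetchall()

for row in best:
    print(row)
```

Adding another ranked attribute means adding one more MAX(CASE ...) line, with no extra pass over the table.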

How do I add a "total_num" field by aggregating equal values with SQL?

Well, I apologize for the horrible question title. I am not a SQL or database guy so I find I am somewhat lacking the vocabulary to succinctly describe what I am trying to do. So, I will just pose the question as an anecdote.
I have two tables:
+-------+--------+------------+
| STATE | REGION | CAPITAL |
+-------+--------+------------+
| WA | X | Olympia |
| CA | IX | Sacramento |
| TX | VI | Austin |
+-------+--------+------------+
And:
+-------+--------+-------+
| NAME | NUMBER | STATE |
+-------+--------+-------+
| Tom | 1 | WA |
| Dick | 5 | WA |
| Larry | 45 | WA |
| Joe | 65 | TX |
| John | 3 | CA |
+-------+--------+-------+
How can I then query the second table so that I can "append" a fourth field to the first table that stores a total count for the number of people in that state, such that the first table would then look like this:
+-------+--------+------------+-------+
| STATE | REGION | CAPITAL | COUNT |
+-------+--------+------------+-------+
| WA | X | Olympia | 3 |
| CA | IX | Sacramento | 1 |
| TX | VI | Austin | 1 |
+-------+--------+------------+-------+
Thanks in advance.
SELECT f.STATE, f.REGION, f.CAPITAL, COUNT(*) AS "COUNT"
FROM secondtable s
JOIN firsttable f ON s.STATE = f.STATE
GROUP BY f.STATE, f.REGION, f.CAPITAL
ORDER BY COUNT(*) DESC
Try this:
select sc.state, sc.region, sc.capital, count(*) as tot
from State_City_Table sc
join people_table pt on pt.state = sc.state
group by sc.state, sc.region, sc.capital
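Both answers use an inner join, which leaves out any state that has no people. If such states should appear with a count of 0, a LEFT JOIN with COUNT on a column of the people table is one option. The sketch below tries this on an in-memory SQLite database from Python; the table names states and people, the alias total_num, and the extra Oregon row are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states (state TEXT, region TEXT, capital TEXT);
CREATE TABLE people (name TEXT, number INTEGER, state TEXT);
INSERT INTO states VALUES
  ('WA','X','Olympia'), ('CA','IX','Sacramento'), ('TX','VI','Austin');
-- Hypothetical state with no matching people rows.
INSERT INTO states VALUES ('OR','X','Salem');
INSERT INTO people VALUES
  ('Tom',1,'WA'), ('Dick',5,'WA'), ('Larry',45,'WA'),
  ('Joe',65,'TX'), ('John',3,'CA');
""")

# COUNT(p.name) counts only matched people rows, so a state without
# any people gets 0 instead of disappearing from the result.
counts = conn.execute("""
    SELECT s.state, s.region, s.capital, COUNT(p.name) AS total_num
    FROM states s
    LEFT JOIN people p ON p.state = s.state
    GROUP BY s.state, s.region, s.capital
    ORDER BY total_num DESC, s.state
""").fetchall()

for row in counts:
    print(row)
```

With COUNT(*) instead of COUNT(p.name), the unmatched state would be reported as 1, because the left join still produces one row for it.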