Suppose I have addresses and country_codes tables.
The addresses table has incorrect country_code values, and I want to replace them with the country_code from the country_codes table, based on the address.
How can I do that in Snowflake?
Note that the address column might contain either a country name or a country code.
Tables:
+--------------+-----------------------------------+
| country_code | address                           |
+--------------+-----------------------------------+
| US           | 1145 Oakmound Drive, US           |
| W            | 4733 Pallet Street, United States |
| F            | Rua Wanda Carnio 190, Brazil      |
| 22           | Via delle Viole 137, Italy        |
| 50           | 7 Essex Rd, GB                    |
+--------------+-----------------------------------+
+--------------+----------------+
| country_code | country        |
+--------------+----------------+
| GB           | United Kingdom |
| BR           | Brazil         |
| IT           | Italy          |
| US           | United States  |
+--------------+----------------+
Desired output:
+--------------+-----------------------------------+
| country_code | address                           |
+--------------+-----------------------------------+
| US           | 1145 Oakmound Drive, US           |
| US           | 4733 Pallet Street, United States |
| BR           | Rua Wanda Carnio 190, Brazil      |
| IT           | Via delle Viole 137, Italy        |
| GB           | 7 Essex Rd, GB                    |
+--------------+-----------------------------------+
Something like this can give you the expected result:
select c.country_code, a.address
from addresses a
join country_codes c
  on c.country_code = a.country_code
  or position(c.country_code, a.address) > 0
  or position(c.country, a.address) > 0;
+--------------+-----------------------------------+
| COUNTRY_CODE | ADDRESS                           |
+--------------+-----------------------------------+
| US           | 1145 Oakmound Drive, US           |
| US           | 4733 Pallet Street, United States |
| BR           | Rua Wanda Carnio 190, Brazil      |
| IT           | Via delle Viole 137, Italy        |
| GB           | 7 Essex Rd, GB                    |
+--------------+-----------------------------------+
You could join on the condition that the last part of address includes either the country or country_code.
select c.country_code, a.address
from addresses a
join country_codes c on trim(split_part(a.address,',',-1)) ilike any (c.country_code, c.country)
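The queries above only select the corrected code. To actually overwrite addresses.country_code, the same matching rule can go into an UPDATE. The sketch below is not Snowflake: it runs the idea in Python's sqlite3 so it can be tried anywhere, and last_part is a hypothetical helper standing in for trim(split_part(address, ',', -1)).

```python
import sqlite3


def last_part(address):
    # Text after the final comma, trimmed. Hypothetical stand-in for
    # Snowflake's trim(split_part(address, ',', -1)).
    return address.rsplit(",", 1)[-1].strip()


conn = sqlite3.connect(":memory:")
conn.create_function("last_part", 1, last_part)
conn.executescript("""
    CREATE TABLE addresses (country_code TEXT, address TEXT);
    INSERT INTO addresses VALUES
        ('US', '1145 Oakmound Drive, US'),
        ('W',  '4733 Pallet Street, United States'),
        ('F',  'Rua Wanda Carnio 190, Brazil'),
        ('22', 'Via delle Viole 137, Italy'),
        ('50', '7 Essex Rd, GB');
    CREATE TABLE country_codes (country_code TEXT, country TEXT);
    INSERT INTO country_codes VALUES
        ('GB', 'United Kingdom'), ('BR', 'Brazil'),
        ('IT', 'Italy'), ('US', 'United States');
""")

# Overwrite the code only where the last part of the address matches
# a known country code or country name (case-insensitive).
conn.execute("""
    UPDATE addresses
    SET country_code = (
        SELECT c.country_code
        FROM country_codes c
        WHERE lower(last_part(addresses.address))
              IN (lower(c.country_code), lower(c.country))
    )
    WHERE EXISTS (
        SELECT 1
        FROM country_codes c
        WHERE lower(last_part(addresses.address))
              IN (lower(c.country_code), lower(c.country))
    )
""")

fixed_codes = [row[0] for row in conn.execute(
    "SELECT country_code FROM addresses ORDER BY rowid")]
print(fixed_codes)
```

The WHERE EXISTS guard leaves rows untouched when no country matches, instead of setting their code to NULL.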
I have a Postgresql table with a list of values for countries over time, and their continents. Values can be NULL. I’d like to get the sum for each continent over time, up to the latest date each continent has data for.
This is my table (view on DB Fiddle):
| continent | country | date | value | id |
| --------- | ------- | ---------- | ----- | --- |
| Europe | Germany | 2020-05-25 | 10 | 1 |
| Europe | Germany | 2020-05-26 | 11 | 2 |
| Europe | Germany | 2020-05-27 | 12 | 3 |
| Europe | Germany | 2020-05-28 | 13 | 4 |
| Europe | Italy | 2020-05-25 | 20 | 5 |
| Europe | Italy | 2020-05-26 | 21 | 6 |
| Europe | Italy | 2020-05-27 | 22 | 7 |
| Europe | Italy | 2020-05-28 | 23 | 8 |
| Europe | France | 2020-05-25 | 30 | 9 |
| Europe | France | 2020-05-26 | 31 | 10 |
| Europe | France | 2020-05-27 | 32 | 11 |
| Europe | France | 2020-05-28 | NULL | 12 |
| Africa | Congo | 2020-05-25 | 40 | 13 |
| Africa | Congo | 2020-05-26 | 41 | 14 |
| Africa | Congo | 2020-05-27 | NULL | 15 |
And this is what I’d like to get back. Note that the Europe includes data up to the 27th, because France has no data for the 28th, and Africa up to the 26th, because that’s the last date its countries have data for.
| continent | date | value |
| --------- | ---------- | ----- |
| Europe | 2020-05-27 | 66 |
| Africa | 2020-05-26 | 41 |
| Europe | 2020-05-26 | 63 |
| Africa | 2020-05-25 | 40 |
| Europe | 2020-05-25 | 60 |
I managed to almost get there by including the number of countries per continent that have data on each date.
SELECT
countries.continent,
countries.date,
SUM(countries.value) AS value,
COUNT(countries.country) AS countries_count
FROM
countries
WHERE
countries.value IS NOT NULL
GROUP BY
countries.continent,
countries.date
ORDER BY
countries.date DESC,
countries.continent;
| continent | date | value | countries_count |
| --------- | ---------- | ----- | --------------- |
| Europe | 2020-05-28 | 36 | 2 |
| Europe | 2020-05-27 | 66 | 3 |
| Africa | 2020-05-26 | 41 | 1 |
| Europe | 2020-05-26 | 63 | 3 |
| Africa | 2020-05-25 | 40 | 1 |
| Europe | 2020-05-25 | 60 | 3 |
I also managed to get the number of countries per continent.
SELECT
countries.continent,
COUNT(DISTINCT countries.country) as number_of_countries
FROM
countries
GROUP BY
countries.continent;
| continent | number_of_countries |
| --------- | ------------------- |
| Africa | 1 |
| Europe | 3 |
I’m stuck on how to combine the two queries to filter out rows that haven’t got the full number of countries for the continent (e.g. select rows where countries_count is 3 for Europe and 1 for Africa).
This is the end result I’d like to get back:
| continent | date | value |
| --------- | ---------- | ----- |
| Europe | 2020-05-27 | 66 |
| Africa | 2020-05-26 | 41 |
| Europe | 2020-05-26 | 63 |
| Africa | 2020-05-25 | 40 |
| Europe | 2020-05-25 | 60 |
Or maybe there’s a completely different way to go about this?
You can compare the number of countries on the continent to the number available on each date -- and then just use dates where the two match ("complete data").
Unfortunately, Postgres does not support count(distinct) as a window function. But you can do:
SELECT c.continent, c.date,
SUM(c.value) AS value,
COUNT(c.country) AS countries_count
FROM (SELECT c.*,
COUNT(*) OVER (PARTITION BY continent, date) as num_on_date
FROM countries c
WHERE value IS NOT NULL
) c JOIN
(SELECT continent, COUNT(DISTINCT country) as num_countries
FROM countries
GROUP BY continent
) cc
ON cc.continent = c.continent
WHERE num_on_date = num_countries
GROUP BY c.continent, c.date
ORDER BY c.date DESC, c.continent;
You can also do this with a filter in the HAVING clause:
SELECT c.continent, c.date,
SUM(c.value) AS value,
COUNT(c.country) AS countries_count
FROM countries c
WHERE value IS NOT NULL
GROUP BY c.continent, c.date
HAVING COUNT(*) = (SELECT COUNT(DISTINCT c2.country)
FROM countries c2
WHERE c2.continent = c.continent
)
ORDER BY c.date DESC, c.continent;
This does the aggregation and then only keeps the rows where the number of rows matches the number of countries.
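The same comparison can also be written as a plain join of the two queries from the question, which is closest to what was asked ("how to combine the two queries"). A minimal sketch, run through Python's sqlite3 rather than Postgres so it is self-contained; the aliases d, t, n and total are names chosen here for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE countries (continent TEXT, country TEXT, date TEXT, value INTEGER)")
conn.executemany("INSERT INTO countries VALUES (?, ?, ?, ?)", [
    ("Europe", "Germany", "2020-05-25", 10), ("Europe", "Germany", "2020-05-26", 11),
    ("Europe", "Germany", "2020-05-27", 12), ("Europe", "Germany", "2020-05-28", 13),
    ("Europe", "Italy", "2020-05-25", 20), ("Europe", "Italy", "2020-05-26", 21),
    ("Europe", "Italy", "2020-05-27", 22), ("Europe", "Italy", "2020-05-28", 23),
    ("Europe", "France", "2020-05-25", 30), ("Europe", "France", "2020-05-26", 31),
    ("Europe", "France", "2020-05-27", 32), ("Europe", "France", "2020-05-28", None),
    ("Africa", "Congo", "2020-05-25", 40), ("Africa", "Congo", "2020-05-26", 41),
    ("Africa", "Congo", "2020-05-27", None),
])

complete = conn.execute("""
    SELECT d.continent, d.date, d.value
    FROM (
        -- first query from the question: sum and count per continent and date
        SELECT continent, date, SUM(value) AS value, COUNT(*) AS n
        FROM countries
        WHERE value IS NOT NULL
        GROUP BY continent, date
    ) d
    JOIN (
        -- second query from the question: countries per continent
        SELECT continent, COUNT(DISTINCT country) AS total
        FROM countries
        GROUP BY continent
    ) t ON t.continent = d.continent
    WHERE d.n = t.total          -- keep only dates with complete data
    ORDER BY d.date DESC, d.continent
""").fetchall()

for row in complete:
    print(row)
```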
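You can use NOT IN within your WHERE clause. Compare (continent, date) pairs rather than the date alone, so that a NULL in one continent does not remove that date for the other continents:
SELECT
    c.continent,
    c.date,
    SUM(c.value) AS value,
    COUNT(DISTINCT c.country) AS countries_count
FROM countries c
WHERE (c.continent, c.date) NOT IN
    ( SELECT continent, date
      FROM countries
      WHERE value IS NULL )
GROUP BY c.continent, c.date
ORDER BY c.date DESC, c.continent;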
You can filter with a HAVING clause to exclude groups where any country's value is NULL:
SELECT
continent,
date,
SUM(value) AS value
FROM countries
GROUP BY continent, date
HAVING BOOL_AND(value is not null)
ORDER BY date DESC, continent
With SUM() window function:
select distinct c.continent, c.date,
sum(c.value) over (partition by c.continent, c.date) "value"
from countries c
where not exists (
select 1 from countries
where continent = c.continent and date = c.date and value is null
)
order by c.date desc, c.continent;
See the demo.
Results:
| continent | date | value |
| --------- | ------------------------ | ----- |
| Europe | 2020-05-27T00:00:00.000Z | 66 |
| Africa | 2020-05-26T00:00:00.000Z | 41 |
| Europe | 2020-05-26T00:00:00.000Z | 63 |
| Africa | 2020-05-25T00:00:00.000Z | 40 |
| Europe | 2020-05-25T00:00:00.000Z | 60 |
I have the below scenario:
Input data:
Table t1:
+-------------+
| Teams       |
+-------------+
| India       |
| Australia   |
| England     |
| Italy       |
+-------------+
Required output:
+-------------+------------+
| Team1       | Team2      |
+-------------+------------+
| India       | Australia  |
| India       | England    |
| India       | Italy      |
| Australia   | England    |
| Australia   | Italy      |
| England     | Italy      |
+-------------+------------+
i.e. each country (column Team1) paired with every country it plays against (column Team2), with each pairing listed only once.
I tried using full outer join but wasn't able to get distinct values. Can we achieve this through a single sql query?
Do a "half" join on teams not being equal:
select a.team, b.team
from teams a
join teams b on a.team < b.team
See live demo on SQLFiddle.
The use of a.team < b.team rather than a.team != b.team returns only combinations rather than permutations - you get only one side of each join, giving you only distinct combinations.
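To make the difference between the two join conditions concrete, here is a small sketch using Python's sqlite3 with the four teams from the question. The table name teams and column name team follow the answer's query, not the question's t1/Teams.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teams (team TEXT)")
conn.executemany("INSERT INTO teams VALUES (?)",
                 [("India",), ("Australia",), ("England",), ("Italy",)])

# a.team < b.team keeps exactly one ordering of every pair (combinations).
pairs = conn.execute("""
    SELECT a.team, b.team
    FROM teams a
    JOIN teams b ON a.team < b.team
    ORDER BY a.team, b.team
""").fetchall()

# a.team <> b.team keeps both orderings of every pair (permutations).
permutation_count = conn.execute("""
    SELECT COUNT(*)
    FROM teams a
    JOIN teams b ON a.team <> b.team
""").fetchone()[0]

print(len(pairs), permutation_count)
for pair in pairs:
    print(pair)
```

With four teams this gives 6 combinations against 12 permutations. Note that the comparison is alphabetical, so the pair appears as (Australia, India) rather than (India, Australia).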
In one of the tables, I have multiple fields with a rank field against them. All these fields have a common grouping attribute against which I need to find the best ranked column value which can exist in any of the records of the group. For example, let's consider the data below:
+---------+---------------+-----------+-----------------+-------------+----------------------+------------+
| Country | City          | City_Rank | Artist          | Artist_Rank | Movie                | Movie_Rank |
+---------+---------------+-----------+-----------------+-------------+----------------------+------------+
| USA     | Las Vegas     | 2         | Louis C.K       | 2           | Justice League       | 3          |
| USA     | New York City | 3         | Michael Flynn   | 3           | IT                   | 1          |
| USA     | Los Angeles   | 1         | Matt Lauer      | 1           | Get Out              | 2          |
| UK      | Leeds         | 2         | Jack Maynard    | 3           | Beauty and the Beast | 2          |
| UK      | Manchester    | 3         | Charlie Gard    | 1           | Wonder Woman         | 1          |
| UK      | London        | 1         | Shannon Mathews | 2           | Logan                | 3          |
+---------+---------------+-----------+-----------------+-------------+----------------------+------------+
Now I need the Rank 1 of City, Artist and Movie Grouped by the Country in the single record. So the expected output is:
+---------+------------------+--------------------+-------------------+
| Country | Best_Ranked_City | Best_Ranked_Artist | Best_Ranked_Movie |
+---------+------------------+--------------------+-------------------+
| USA     | Los Angeles      | Matt Lauer         | IT                |
| UK      | London           | Charlie Gard       | Wonder Woman      |
+---------+------------------+--------------------+-------------------+
I have many more attributes against which I have the rank field. I can arrive at the desired output by forming multiple datasets of the above with a filtering condition for each ranked field (where rank=1) and then joining these datasets by the group field.
However, this is quite costly because the table has millions of records, and filtering and joining this dataset multiple times doesn't seem to be the best way to solve this. I arrived at the ranks for each field using a Rank() window function, applying some business logic over it.
If possible, I would like to solve this problem using window functions only.
"I have arrived at the ranks for each field using a Rank() windows function by applying some business logic over it."
I guess there is some query that calculates the ranks and then does a pivot operation to generate the summary table shown in the question.
It would be good to eliminate the pivot operation, so that the input data generated by this query looks something like this:
| country | category | cat_value | rank_value |
|---------|----------|----------------------|------------|
| UK | Artist | Jack Maynard | 3 |
| UK | Artist | Shannon Mathews | 2 |
| UK | Artist | Charlie Gard | 1 |
| UK | City | Leeds | 2 |
| UK | City | Manchester | 3 |
| UK | City | London | 1 |
| UK | Movie | Logan | 3 |
| UK | Movie | Beauty and the Beast | 2 |
| UK | Movie | Wonder Woman | 1 |
| USA | Artist | Louis C.K | 2 |
| USA | Artist | Michael Flynn | 3 |
| USA | Artist | Matt Lauer | 1 |
| USA | City | Las Vegas | 2 |
| USA | City | Los Angeles | 1 |
| USA | City | New York City | 3 |
| USA | Movie | Justice League | 3 |
| USA | Movie | IT | 1 |
| USA | Movie | Get Out | 2 |
If this is not possible, then this resultset can be unpivoted using:
SELECT Country, 'City' as category, City as cat_value, City_Rank as rank_value
FROM Table1
UNION ALL
SELECT Country, 'Artist' as category, Artist as cat_value, Artist_Rank as rank_value
FROM Table1
UNION ALL
SELECT Country, 'Movie' as category, Movie as cat_value, Movie_Rank as rank_value
FROM Table1
If you unpivot this table, then picking items with rank=1 is very easy, just do:
SELECT * FROM unpivot_table WHERE rank_value = 1
and then another pivot can be done on its results.
The final query may look like this (live demo: http://sqlfiddle.com/#!17/05e53/5)
With unpivot_me As (
SELECT Country, 'City' as category, City as cat_value, City_Rank as rank_value
FROM Table1
UNION ALL
SELECT Country, 'Artist' as category, Artist as cat_value, Artist_Rank as rank_value
FROM Table1
UNION ALL
SELECT Country, 'Movie' as category, Movie as cat_value, Movie_Rank as rank_value
FROM Table1
)
SELECT Country,
Max( case when category = 'City' Then cat_value End) As Best_Ranked_City,
Max( case when category = 'Artist' Then cat_value End) As Best_Ranked_Artist,
Max( case when category = 'Movie' Then cat_value End) As Best_Ranked_Movie
FROM unpivot_me
WHERE rank_value = 1
GROUP BY Country
| country | best_ranked_city | best_ranked_artist | best_ranked_movie |
|---------|------------------|--------------------|-------------------|
| UK | London | Charlie Gard | Wonder Woman |
| USA | Los Angeles | Matt Lauer | IT |
You can use max() as a window function with a CASE expression that keeps only the rank-1 value, partitioned by Country. This fetches the first-ranked value of each column for every row of the country. Then filter on one of the rank fields being 1 to keep a single row per country (any of the available rank fields would do). Here is the SQL: http://sqlfiddle.com/#!17/05e53/18
With T1 as (
    select Country,
           max(case when City_Rank = 1 then City else '' end)
               over (partition by Country) as Best_Ranked_City,
           City_Rank,
           max(case when Artist_Rank = 1 then Artist else '' end)
               over (partition by Country) as Best_Ranked_Artist,
           max(case when Movie_Rank = 1 then Movie else '' end)
               over (partition by Country) as Best_Ranked_Movie
    from Table1
)
select Country, Best_Ranked_City, Best_Ranked_Artist, Best_Ranked_Movie
from T1 where City_Rank = 1;
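If a window function is not a hard requirement, the same CASE expressions can be aggregated directly with GROUP BY, which reads the table once and needs neither the unpivot step nor the final rank filter. This is a variant of the answers above, not the asker's window-only approach. A minimal sketch using Python's sqlite3 with the sample data from the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Table1 (
        Country TEXT, City TEXT, City_Rank INTEGER,
        Artist TEXT, Artist_Rank INTEGER,
        Movie TEXT, Movie_Rank INTEGER)
""")
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?, ?, ?, ?, ?)", [
    ("USA", "Las Vegas", 2, "Louis C.K", 2, "Justice League", 3),
    ("USA", "New York City", 3, "Michael Flynn", 3, "IT", 1),
    ("USA", "Los Angeles", 1, "Matt Lauer", 1, "Get Out", 2),
    ("UK", "Leeds", 2, "Jack Maynard", 3, "Beauty and the Beast", 2),
    ("UK", "Manchester", 3, "Charlie Gard", 1, "Wonder Woman", 1),
    ("UK", "London", 1, "Shannon Mathews", 2, "Logan", 3),
])

# Each CASE yields the value only on its rank-1 row and NULL elsewhere;
# MAX ignores the NULLs, leaving the rank-1 value per country.
best = conn.execute("""
    SELECT Country,
           MAX(CASE WHEN City_Rank = 1 THEN City END)     AS Best_Ranked_City,
           MAX(CASE WHEN Artist_Rank = 1 THEN Artist END) AS Best_Ranked_Artist,
           MAX(CASE WHEN Movie_Rank = 1 THEN Movie END)   AS Best_Ranked_Movie
    FROM Table1
    GROUP BY Country
    ORDER BY Country
""").fetchall()

for row in best:
    print(row)
```

If several rows of a country tie at rank 1 for a column, MAX picks the alphabetically last value, so ties would need their own rule.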
table1 - doctors
+---------+--------+------+
| country | state  | doc  |
+---------+--------+------+
| india   | AP     | 20   |
+---------+--------+------+
| india   | TN     | 30   |
+---------+--------+------+
| india   | KA     | 10   |
+---------+--------+------+
| US      | LA     | 30   |
+---------+--------+------+
| US      | CA     | 10   |
+---------+--------+------+
| US      | NY     | 50   |
+---------+--------+------+
table2 - engineers
+---------+--------+-------+
| country | state  | engg  |
+---------+--------+-------+
| india   | AP     | 100   |
+---------+--------+-------+
| india   | TN     | 400   |
+---------+--------+-------+
| india   | KA     | 250   |
+---------+--------+-------+
| US      | LA     | 140   |
+---------+--------+-------+
| US      | CA     | 120   |
+---------+--------+-------+
| US      | NY     | 150   |
+---------+--------+-------+
Desired output:
+---------+------+-------+
| country | doc  | engg  |
+---------+------+-------+
| india   | 60   | 750   |
+---------+------+-------+
| US      | 90   | 410   |
+---------+------+-------+
I tried the query below, but I am getting inflated totals for doc and engg. Could someone please correct me?
select country, sum(a.doc), sum(b.engg)
from table1 a join table2 b on (a.country = b.country)
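I think your problem is that the join matches on country only, so every row of one table is paired with every row of the other table for the same country (a cross product within each country), which inflates the sums.
Try joining on all the shared columns (country and state), for example with:
tableA NATURAL JOIN tableB
and then group by country.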
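The inflation is easy to reproduce. The sketch below, using Python's sqlite3 with the data from the question, runs the country-only join next to a NATURAL JOIN (which matches on both shared columns, country and state). A GROUP BY country is added to both queries here so that each country gets its own row; the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE doctors (country TEXT, state TEXT, doc INTEGER);
    INSERT INTO doctors VALUES
        ('india', 'AP', 20), ('india', 'TN', 30), ('india', 'KA', 10),
        ('US', 'LA', 30), ('US', 'CA', 10), ('US', 'NY', 50);
    CREATE TABLE engineers (country TEXT, state TEXT, engg INTEGER);
    INSERT INTO engineers VALUES
        ('india', 'AP', 100), ('india', 'TN', 400), ('india', 'KA', 250),
        ('US', 'LA', 140), ('US', 'CA', 120), ('US', 'NY', 150);
""")

# Join on country only: 3 x 3 = 9 joined rows per country, so each
# value is counted three times.
inflated = {country: (doc, engg) for country, doc, engg in conn.execute("""
    SELECT a.country, SUM(a.doc), SUM(b.engg)
    FROM doctors a
    JOIN engineers b ON a.country = b.country
    GROUP BY a.country
""")}

# NATURAL JOIN matches on country AND state: one joined row per state.
correct = {country: (doc, engg) for country, doc, engg in conn.execute("""
    SELECT country, SUM(doc), SUM(engg)
    FROM doctors NATURAL JOIN engineers
    GROUP BY country
""")}

print(inflated)
print(correct)
```

This relies on every (country, state) pair appearing exactly once in each table, as it does in the sample data. If a state could be missing from one table, the inner join would drop it.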
You can use UNION ALL
SELECT
country,
SUM(doc) AS doc,
SUM(engg) AS engg
FROM
(SELECT
country,
doc,
0 AS engg
FROM
doctors
UNION ALL
SELECT
country,
0,
engg
FROM
engineers
) a
GROUP BY
country
You need to group by country inside each subquery, so that each side has one row per country before the join:
select a.country, a.docSum, b.enggSum from
(select country, sum(doc) docSum from doctors group by country) a
inner join
(select country, sum(engg) enggSum from engineers group by country) b
on a.country = b.country
Well, I apologize for the horrible question title. I am not a SQL or database guy so I find I am somewhat lacking the vocabulary to succinctly describe what I am trying to do. So, I will just pose the question as an anecdote.
I have two tables:
+-------+--------+------------+
| STATE | REGION | CAPITAL    |
+-------+--------+------------+
| WA    | X      | Olympia    |
| CA    | IX     | Sacramento |
| TX    | VI     | Austin     |
+-------+--------+------------+
And:
+-------+--------+-------+
| NAME  | NUMBER | STATE |
+-------+--------+-------+
| Tom   | 1      | WA    |
| Dick  | 5      | WA    |
| Larry | 45     | WA    |
| Joe   | 65     | TX    |
| John  | 3      | CA    |
+-------+--------+-------+
How can I then query the second table so that I can "append" a fourth field to the first table that stores a total count for the number of people in that state, such that the first table would then look like this:
+-------+--------+------------+-------+
| STATE | REGION | CAPITAL    | COUNT |
+-------+--------+------------+-------+
| WA    | X      | Olympia    | 3     |
| CA    | IX     | Sacramento | 1     |
| TX    | VI     | Austin     | 1     |
+-------+--------+------------+-------+
Thanks in advance.
SELECT f.STATE, f.REGION, f.CAPITAL, COUNT(*) AS "COUNT"
FROM firsttable f
JOIN secondtable s ON s.STATE = f.STATE
GROUP BY f.STATE, f.REGION, f.CAPITAL
ORDER BY COUNT(*) DESC
Try this:
select sc.state, sc.region, sc.capital, count(*) as tot
from State_City_Table sc
join people_table pt on pt.state = sc.state
group by sc.state, sc.region, sc.capital
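Both answers use an inner join, so a state with no people would disappear from the result instead of showing a count of 0. A LEFT JOIN that counts a column from the people table keeps such states. A minimal sketch using Python's sqlite3; the table names states and people, the alias total, and the extra OR row (a state with no people) are assumptions added for illustration and are not in the question's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE states (state TEXT, region TEXT, capital TEXT);
    INSERT INTO states VALUES
        ('WA', 'X', 'Olympia'), ('CA', 'IX', 'Sacramento'),
        ('TX', 'VI', 'Austin'),
        ('OR', 'X', 'Salem');   -- hypothetical row: a state with no people
    CREATE TABLE people (name TEXT, number INTEGER, state TEXT);
    INSERT INTO people VALUES
        ('Tom', 1, 'WA'), ('Dick', 5, 'WA'), ('Larry', 45, 'WA'),
        ('Joe', 65, 'TX'), ('John', 3, 'CA');
""")

# COUNT(p.name) counts only matched people rows, so a state without
# any match gets 0 rather than 1 (which COUNT(*) would give).
counts = conn.execute("""
    SELECT s.state, s.region, s.capital, COUNT(p.name) AS total
    FROM states s
    LEFT JOIN people p ON p.state = s.state
    GROUP BY s.state, s.region, s.capital
    ORDER BY total DESC, s.state
""").fetchall()

for row in counts:
    print(row)
```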