Hadoop Hive query

I wrote a query in Hive, but it is not working.
Query:
hive> select country, max(total_count) from (select country, count(airlineid) from airport group by country) t2;
It reports that the group by expression 'country' is missing.

There are multiple problems. If you format the query, you should be able to see them:
select
country, -- you need to add this in group by after t2.
max(total_count) -- you need to create/alias a column called total_count
from
(
select
country,
count(airlineid)
from
airport
group by
country
) t2;
Fixed SQL:
select
country,
max(total_count) max_total_count
from
(
select
country,
count(airlineid) total_count
from
airport
group by
country
) t2
group by country
;
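To check the fixed query on a small data set, here is a minimal sketch that runs it with Python's built-in sqlite3 module rather than Hive. Only the table and column names come from the question; the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airport (country TEXT, airlineid INTEGER)")
# Hypothetical sample rows, not taken from the original question.
conn.executemany(
    "INSERT INTO airport VALUES (?, ?)",
    [("India", 1), ("India", 2), ("India", 3), ("France", 4), ("France", 5)],
)

query = """
select country, max(total_count) as max_total_count
from (
    select country, count(airlineid) as total_count  -- alias the count
    from airport
    group by country
) t2
group by country                                      -- outer group by
order by country
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Because the inner query already returns one row per country, the outer max() simply returns that country's count.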

Related

Get count summed across two columns in SQL?

I am working in Postgres and have the following accounts table:
account_number | integer
country1 | character varying(1000)
country2 | character varying(1000)
I want to get a count of accounts in each country, regardless of whether the country is country1 or country2.
So if the content of the table was:
account_number,country1,country2
123,France,Germany
124,Switzerland,France
125,Germany
Then the desired output from the query would be:
France,2
Germany,2
Switzerland,1
I know how to do this for one country at a time (select country1, count(*) from accounts group by country1) but not for both countries simultaneously.
You can try the following:
with cte as
(
select account_number, country1 as country
from accounts
union all
select account_number, country2
from accounts
)
select country, count(*) as cnt
from cte
where country is not null -- skip accounts that have no country2
group by country
I recommend unpivoting the data using a lateral join and then aggregating:
select country, count(*)
from t cross join lateral
(values (country1), (country2)
) v(country)
where v.country is not null
group by country;
In addition to being a more concise way to write the query, it should be faster because the table is scanned only once. This could be a very big win if the "table" is really a view or subquery.
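As a quick check against the sample data from the question, the sketch below runs the UNION ALL form with Python's built-in sqlite3 module (an assumption made for portability; the question itself is about Postgres). It also assumes that the missing country2 of account 125 is stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (account_number INTEGER, country1 TEXT, country2 TEXT)"
)
# Rows from the question; the empty country2 of account 125 is assumed to be NULL.
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        (123, "France", "Germany"),
        (124, "Switzerland", "France"),
        (125, "Germany", None),
    ],
)

query = """
with cte as (
    select country1 as country from accounts
    union all
    select country2 from accounts
)
select country, count(*) as cnt
from cte
where country is not null      -- ignore accounts without a second country
group by country
order by country
"""
country_counts = conn.execute(query).fetchall()
print(country_counts)
```

Without the NULL filter, the result would contain an extra group for the accounts that have no second country.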

hive get percentages of count column not working

I have the following query in Hive to get the count for each combination of cluster, country and airline as a percentage of the total. But my percentage column contains only 0s. What am I doing wrong?
select
count(*)/ t.cnt * 100 AS percentage,
cluster,
country,
airline
from
table1
CROSS JOIN (SELECT COUNT(*) AS cnt FROM table1 ) t
GROUP BY
cluster,
country,
airline
First, you should use window functions.
Second, beware of integer division.
I would phrase this as:
select count(*) * 100.0 / sum(count(*)) over () AS percentage,
cluster, country, airline
from table1
group by cluster, country, airline;
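The effect of integer division can be reproduced in SQLite, where / on two integers truncates. The sketch below uses Python's built-in sqlite3 module with made-up rows. It is an illustration under those assumptions, not a statement about Hive's own arithmetic, and the window version is written over a derived table of counts rather than as sum(count(*)) over (). Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (cluster TEXT, country TEXT, airline TEXT)")
# Hypothetical rows: three in one group, one in another.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [("c1", "US", "AA"), ("c1", "US", "AA"), ("c1", "US", "AA"), ("c2", "FR", "AF")],
)

# Integer arithmetic: 3 / 4 truncates to 0 before the multiplication by 100.
truncated = conn.execute("""
select count(*) / t.cnt * 100 as percentage, cluster, country, airline
from table1 cross join (select count(*) as cnt from table1) t
group by cluster, country, airline
order by cluster
""").fetchall()

# Multiply by 100.0 first, and take the total with a window function.
percentages = conn.execute("""
select cnt * 100.0 / sum(cnt) over () as percentage, cluster, country, airline
from (
    select cluster, country, airline, count(*) as cnt
    from table1
    group by cluster, country, airline
) g
order by cluster
""").fetchall()

print(truncated)
print(percentages)
```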

postgis postgres count and group by column for ST_Distance function

This SQL produces the following:
SELECT city FROM travel_logs ORDER BY ST_Distance(travel_logs.start_point, ST_GeographyFromText('SRID=4326;POINT(101.652506 3.167610)'))
"Tshopo"
"Tshopo"
"Mongala"
"Haut-Komo"
This SQL produces the following:
SELECT city, count(*) AS count FROM travel_logs GROUP BY travel_logs.start_point, city ORDER BY ST_Distance(travel_logs.start_point, ST_GeographyFromText('SRID=4326;POINT(101.652506 3.167610)'))
"Tshopo";1
"Tshopo";1
"Mongala";1
"Haut-Komo";1
Basically, I want a result that groups by city and shows the number of times each city occurs, something like this:
"Tshopo";2 <--- summed up correctly
"Mongala";1
"Haut-Komo";1
I'm not an expert on joins or subqueries. Would they help? Thanks in advance.
This worked for me:
select city, count(*) as count
from
(SELECT city FROM travel_logs ORDER BY ST_Distance(travel_logs.start_point, ST_GeographyFromText('SRID=4326;POINT(101.652506 3.167610)'))
) as subquery_travel_logs_nearest group by city
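Simple, plain SQL without a subquery. Because start_point is not in the GROUP BY, the distance has to be aggregated, here with min(), so each city is ordered by its nearest log entry:
SELECT city, count(*)
FROM travel_logs
GROUP BY city
ORDER BY min(ST_Distance(start_point,
ST_GeographyFromText('SRID=4326;POINT(101.652506 3.167610)')));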
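The idea of counting per city while ordering by nearness can be tried without PostGIS. The sketch below is a stand-in only: it uses Python's built-in sqlite3 module, hypothetical x and y columns in place of start_point, and a squared planar distance in place of ST_Distance. The coordinates are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# x and y are hypothetical stand-ins for the PostGIS start_point column.
conn.execute("CREATE TABLE travel_logs (city TEXT, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO travel_logs VALUES (?, ?, ?)",
    [
        ("Tshopo", 1.0, 0.0),
        ("Tshopo", 1.0, 1.0),
        ("Mongala", 3.0, 0.0),
        ("Haut-Komo", 5.0, 0.0),
    ],
)

# Group by city only, and order each city by its nearest log entry.
query = """
select city, count(*) as count
from travel_logs
group by city
order by min((x - :px) * (x - :px) + (y - :py) * (y - :py))
"""
nearest_first = conn.execute(query, {"px": 0.0, "py": 0.0}).fetchall()
print(nearest_first)
```

Grouping by city alone is what merges the two "Tshopo" rows. Grouping by the point as well keeps them apart whenever the points differ.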

How to get a count via a subselect

I'm trying to write an SQL query which gets the number of rows in a table that have the same value in a specific column.
In this case, the table has a column called 'title'. For each row I return, along with the value of other columns for the row, I want to get the number of rows in the table which have the same value as that row's 'title' value.
For now, the best I have is:
select firstname, lastname, city, state, title, (select count(*) from myTable where title = title) from myTable;
Obviously, all this gives me is the number of rows in the table in my subselect.
How do I get the right side of the title = to refer to the value of the column's title?
Thanks for any help, anyone.
This is called a "correlated subquery". It goes like this:
select firstname, lastname, title,
(select count(1) from myTable where title = a.title) title_count
from myTable a;
This is not tested, but I think it should work for what you are trying to do:
WITH titles AS (
SELECT
Title
, COUNT(*) AS Occurrences
FROM myTable
GROUP BY Title
)
SELECT
t1.FirstName
, t1.LastName
, t1.City
, t1.state
, t1.title
, titles.Occurrences
FROM myTable AS t1
INNER JOIN titles ON t1.Title = titles.Title
Most databases support window/analytic functions. If you are using one of them (SQL Server, Oracle, Postgres, for example), you can do this as:
select t.firstname, t.lastname, t.title, t.city, t.state,
count(*) over (partition by title) as numwithtitle
from t
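The correlated subquery and the window function should give the same counts. Here is a minimal sketch that compares them with Python's built-in sqlite3 module. The rows are invented and only some of the question's columns are used. Window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (firstname TEXT, lastname TEXT, title TEXT)")
# Hypothetical rows for illustration.
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [("Ann", "Lee", "Engineer"), ("Bob", "Ray", "Engineer"), ("Cy", "Fox", "Manager")],
)

# Correlated subquery: the inner title is compared with the outer row's title.
correlated = conn.execute("""
select a.firstname, a.title,
       (select count(*) from myTable b where b.title = a.title) as title_count
from myTable a
order by a.firstname
""").fetchall()

# Window function: count the rows in each title partition.
windowed = conn.execute("""
select firstname, title,
       count(*) over (partition by title) as title_count
from myTable
order by firstname
""").fetchall()

print(correlated)
print(windowed)
```

The table aliases a and b are what let the inner query refer to the outer row. With a bare title = title, both sides resolve to the inner table.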

select max, min values from two tables

I have two tables. They differ in that one is an archive and the other holds the current records. Both record sales in the company, and both have, among other fields: id, name, price of sale. I need to select the highest and lowest price for a given name across both tables. I tried this query:
select name, max (price_of_sale), min (price_of_sale)
from wapzby
union
select name, max (price_of_sale), min (price_of_sale)
from wpzby
order by name
but this query returns two records, one from the current table and one from the archive table. I want to get the lowest and the highest price for each name across both tables at once. How do I write this query?
Here are two options (MS SQL compliant).
Note: UNION ALL will combine the sets without eliminating duplicates. That's a much simpler behavior than UNION.
SELECT Name, MAX(Price_Of_Sale) as MaxPrice, MIN(Price_Of_Sale) as MinPrice
FROM
(
SELECT Name, Price_Of_Sale
FROM wapzby
UNION ALL
SELECT Name, Price_Of_Sale
FROM wpzby
) as subQuery
GROUP BY Name
ORDER BY Name
This one figures out the max and min from each table before combining the set - it may be more performant to do it this way.
SELECT Name, MAX(MaxPrice) as MaxPrice, MIN(MinPrice) as MinPrice
FROM
(
SELECT Name, MAX(Price_Of_Sale) as MaxPrice, MIN(Price_Of_Sale) as MinPrice
FROM wapzby
GROUP BY Name
UNION ALL
SELECT Name, MAX(Price_Of_Sale) as MaxPrice, MIN(Price_Of_Sale) as MinPrice
FROM wpzby
GROUP BY Name
) as subQuery
GROUP BY Name
ORDER BY Name
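The first option can be tried on toy data. The sketch below runs it with Python's built-in sqlite3 module instead of SQL Server. The table and column names are from the question and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wapzby (name TEXT, price_of_sale INTEGER)")
conn.execute("CREATE TABLE wpzby (name TEXT, price_of_sale INTEGER)")
# Hypothetical sales rows for illustration.
conn.executemany(
    "INSERT INTO wapzby VALUES (?, ?)",
    [("bike", 100), ("bike", 250), ("lamp", 20)],
)
conn.executemany(
    "INSERT INTO wpzby VALUES (?, ?)",
    [("bike", 80), ("lamp", 35)],
)

# Stack both tables first, then aggregate once per name.
query = """
SELECT Name, MAX(Price_Of_Sale) AS MaxPrice, MIN(Price_Of_Sale) AS MinPrice
FROM (
    SELECT Name, Price_Of_Sale FROM wapzby
    UNION ALL
    SELECT Name, Price_Of_Sale FROM wpzby
) AS subQuery
GROUP BY Name
ORDER BY Name
"""
price_ranges = conn.execute(query).fetchall()
print(price_ranges)
```

Each name appears once, with the maximum and the minimum taken over the rows of both tables.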
In SQL Server you could use a subquery:
SELECT [name],
MAX([price_of_sale]) AS [MAX price_of_sale],
MIN([price_of_sale]) AS [MIN price_of_sale]
FROM (
SELECT [name],
[price_of_sale]
FROM [dbo].[wapzby]
UNION
SELECT [name],
[price_of_sale]
FROM [dbo].[wpzby]
) u
GROUP BY [name]
ORDER BY [name]
Is this more like what you want?
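SELECT
a.name,
MAX (a.price_of_sale),
MIN (a.price_of_sale) ,
b.name,
MAX (b.price_of_sale),
MIN (b.price_of_sale)
FROM
wapzby a
JOIN wpzby b ON b.name = a.name
GROUP BY
a.name,
b.name
ORDER BY
a.name
It's untested, but it should return the values from both tables for each name on one row without the need for a union. Note that it only returns names that appear in both tables, and the per-table values are not combined into a single maximum and minimum.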
SELECT MAX(value) FROM tabl1 UNION SELECT MAX(value) FROM tabl2;
SELECT MIN(value) FROM tabl1 UNION SELECT MIN(value) FROM tabl2;
SELECT (SELECT MAX(value) FROM table1 WHERE trn_type='CSL' and till='TILL01') as summ, (SELECT MAX(value) FROM table2 WHERE trn_type='CSL' and till='TILL01') as summ_hist