HADOOP HIVE QUERY - SQL

I wrote a query in Hive, but it is not working.
Query:
hive>> select country ,max(total_count) from (select country, count(airlineid) from airport group by country) t2;
Hive reports that the expression 'country' is missing from the GROUP BY.

There are multiple problems. If you format the query, you should be able to see them:
select
    country, -- this column must also appear in a GROUP BY after t2
    max(total_count) -- total_count does not exist yet; alias count(airlineid) in the subquery
from
(
    select
        country,
        count(airlineid)
    from
        airport
    group by
        country
) t2;
Fixed SQL:
select
    country,
    max(total_count) as max_total_count
from
(
    select
        country,
        count(airlineid) as total_count
    from
        airport
    group by
        country
) t2
group by country
;
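To try the corrected shape without a Hive cluster, here is a minimal sketch that runs the same statement against an in-memory SQLite database from Python. The sample rows are hypothetical; only the table and column names come from the question, and SQLite stands in for Hive purely for illustration.

```python
import sqlite3

# Hypothetical sample data; only the names airport/country/airlineid are from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airport (country TEXT, airlineid INTEGER)")
conn.executemany(
    "INSERT INTO airport VALUES (?, ?)",
    [("France", 1), ("France", 2), ("Germany", 3)],
)

# Inner query aliases the count; outer query groups by the selected column.
fixed_sql = """
SELECT country, MAX(total_count) AS max_total_count
FROM (
    SELECT country, COUNT(airlineid) AS total_count
    FROM airport
    GROUP BY country
) t2
GROUP BY country
ORDER BY country
"""
rows = conn.execute(fixed_sql).fetchall()
print(rows)
```

Since the inner query already returns one row per country, the outer MAX leaves each count unchanged. If the goal was the single country with the highest count, ordering the inner result by total_count descending and taking the first row may be closer to the intent.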

Related

Get count summed across two columns in SQL?

I am working in Postgres and have the following accounts table:
account_number | integer
country1 | character varying(1000)
country2 | character varying(1000)
I want to get a count of accounts in each country, regardless of whether the country is country1 or country2.
So if the content of the table was:
account_number,country1,country2
123,France,Germany
124,Switzerland,France
125,Germany
Then the desired output from the query would be:
France,2
Germany,2
Switzerland,1
I know how to do this for one country at a time (select country1, count(*) from accounts group by country1) but not for both countries simultaneously.

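You can try the following (the NULL filter keeps accounts without a country2, such as 125, from being counted as a separate group):
with cte as
(
    select account_number, country1 as country
    from accounts
    union all
    select account_number, country2
    from accounts
)
select country, count(*) as cnt
from cte
where country is not null
group by country;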
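As a quick check of the UNION ALL approach, the sketch below loads the three sample rows from the question into an in-memory SQLite database (used here as a stand-in for Postgres) and runs the unpivot-then-count query. The `WHERE country IS NOT NULL` filter keeps the empty country2 of account 125 from forming its own group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (account_number INTEGER, country1 TEXT, country2 TEXT)"
)
# The three rows from the question; account 125 has no second country.
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [
        (123, "France", "Germany"),
        (124, "Switzerland", "France"),
        (125, "Germany", None),
    ],
)

# Stack country1 and country2 into one column, then count per country.
unpivot_sql = """
WITH cte AS (
    SELECT account_number, country1 AS country FROM accounts
    UNION ALL
    SELECT account_number, country2 FROM accounts
)
SELECT country, COUNT(*) AS cnt
FROM cte
WHERE country IS NOT NULL
GROUP BY country
ORDER BY country
"""
counts = conn.execute(unpivot_sql).fetchall()
print(counts)
```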

I recommend unpivoting the data using a lateral join and then aggregating:
select v.country, count(*)
from accounts cross join lateral
     (values (country1), (country2)
     ) v(country)
where v.country is not null
group by v.country;
In addition to being a more concise way to write the query, it should be faster because the table is scanned only once. This could be a very big win if the "table" is really a view or subquery.

hive get percentages of count column not working

I have the following query in Hive to get the counts for each combination of cluster, country and airline as a percentage of the total. But my percentage column contains only 0's. What am I doing wrong?
select
    count(*) / t.cnt * 100 AS percentage,
    cluster,
    country,
    airline
from
    table1
    CROSS JOIN (SELECT COUNT(*) AS cnt FROM table1) t
GROUP BY
    cluster,
    country,
    airline

First, you should use window functions.
Second, beware of integer division.
I would phrase this as:
select count(*) * 100.0 / sum(count(*)) over () AS percentage,
cluster, country, airline
from table1
group by cluster, country, airline;
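The effect of integer division can be seen in a small sketch. It uses an in-memory SQLite database with hypothetical rows, and a scalar subquery for the total instead of the window function, simply to stay within syntax that older SQLite builds accept. Whether `/` truncates depends on the engine and operand types, so treat this as an illustration of the pitfall rather than a statement about Hive's behavior.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (cluster TEXT, country TEXT, airline TEXT)")
# Hypothetical rows: three in one group, one in another.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [("c1", "FR", "AF"), ("c1", "FR", "AF"), ("c1", "FR", "AF"), ("c2", "DE", "LH")],
)

# In SQLite, integer / integer truncates: 3 / 4 is 0, and 0 * 100 stays 0.
truncated = conn.execute("SELECT 3 / 4 * 100").fetchone()[0]

# Multiplying by 100.0 first makes the whole expression floating point.
pct_sql = """
SELECT cluster, country, airline,
       COUNT(*) * 100.0 / (SELECT COUNT(*) FROM table1) AS percentage
FROM table1
GROUP BY cluster, country, airline
ORDER BY cluster
"""
percentages = conn.execute(pct_sql).fetchall()
print(truncated, percentages)
```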

postgis postgres count and group by column for ST_Distance function

This SQL produces the following:
SELECT city FROM travel_logs ORDER BY ST_Distance(travel_logs.start_point, ST_GeographyFromText('SRID=4326;POINT(101.652506 3.167610)'))
"Tshopo"
"Tshopo"
"Mongala"
"Haut-Komo"
This SQL produces the following:
SELECT city, count(*) AS count FROM travel_logs GROUP BY travel_logs.start_point, city ORDER BY ST_Distance(travel_logs.start_point, ST_GeographyFromText('SRID=4326;POINT(101.652506 3.167610)'))
"Tshopo";1
"Tshopo";1
"Mongala";1
"Haut-Komo";1
Basically, I want a result that groups by city and shows the number of times each city occurs, something like this:
"Tshopo";2 <--- summed up correctly
"Mongala";1
"Haut-Komo";1
I'm not an expert on joins or subqueries. Would they help? Thanks in advance.

This worked for me:
select city, count(*) as count
from
(SELECT city FROM travel_logs ORDER BY ST_Distance(travel_logs.start_point, ST_GeographyFromText('SRID=4326;POINT(101.652506 3.167610)'))
) as subquery_travel_logs_nearest group by city

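Simple, plain SQL without a sub-query. Because start_point is not in the GROUP BY, the distance has to be aggregated; min() orders the cities by their nearest log:
SELECT city, count(*)
FROM travel_logs
GROUP BY city
ORDER BY min(ST_Distance(start_point,
    ST_GeographyFromText('SRID=4326;POINT(101.652506 3.167610)')));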
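The group-then-order-by-nearest idea can be sketched without PostGIS. In the example below, a plain numeric `distance` column is a hypothetical stand-in for the ST_Distance result, the distance values are made up, and SQLite replaces Postgres. Ordering by `MIN(distance)` sorts each city by its closest log while the count collapses duplicate cities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `distance` is a made-up precomputed value standing in for ST_Distance(...).
conn.execute("CREATE TABLE travel_logs (city TEXT, distance REAL)")
conn.executemany(
    "INSERT INTO travel_logs VALUES (?, ?)",
    [("Tshopo", 10.0), ("Tshopo", 12.0), ("Mongala", 20.0), ("Haut-Komo", 30.0)],
)

# Group by city only, and order each group by its nearest log.
nearest_sql = """
SELECT city, COUNT(*) AS count
FROM travel_logs
GROUP BY city
ORDER BY MIN(distance)
"""
city_counts = conn.execute(nearest_sql).fetchall()
print(city_counts)
```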

How to get a count via a subselect

I'm trying to write an SQL query that gets the number of rows in a table that have the same value in a specific column.
In this case, the table has a column called 'title'. For each row I return, along with the value of other columns for the row, I want to get the number of rows in the table which have the same value as that row's 'title' value.
For now, the best I have is:
select firstname, lastname, city, state, title, (select count from myTable where title = title);
Obviously, all this gives me is the number of rows in the table in my subselect.
How do I get the right side of the title = to refer to the value of the column's title?
Thanks for any help, anyone.

This is called a correlated subquery. Give the outer table an alias so the subquery can refer to the outer row's title:
select firstname, lastname, title,
       (select count(1) from myTable where title = a.title) as title_count
from myTable a;
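A small sketch makes the difference between the unqualified and the correlated comparison visible. It uses an in-memory SQLite database with hypothetical names and titles; only the table name myTable and the column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (firstname TEXT, lastname TEXT, title TEXT)")
# Hypothetical people: two share a title, one does not.
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?)",
    [("Ann", "Lee", "Engineer"), ("Bob", "Ray", "Engineer"), ("Cy", "Poe", "Manager")],
)

# Unqualified: title = title compares each row with itself, so every row matches.
uncorrelated = conn.execute(
    "SELECT COUNT(*) FROM myTable WHERE title = title"
).fetchone()[0]

# Correlated: the alias a ties the inner comparison to the current outer row.
correlated_sql = """
SELECT a.firstname, a.title,
       (SELECT COUNT(*) FROM myTable b WHERE b.title = a.title) AS title_count
FROM myTable a
ORDER BY a.firstname
"""
title_counts = conn.execute(correlated_sql).fetchall()
print(uncorrelated, title_counts)
```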

This is not tested, but I think it should work for what you are trying to do:
WITH titles AS (
    SELECT
        Title
        , COUNT(*) AS Occurrences
    FROM myTable
    GROUP BY Title
)
SELECT
    t1.FirstName
    , t1.LastName
    , t1.City
    , t1.state
    , t1.title
    , titles.Occurrences
FROM myTable AS t1
INNER JOIN titles ON t1.Title = titles.Title

Most databases support the window/analytic functions. If you are using one of these (SQL Server, Oracle, Postgres, for example), you can do this as:
select t.firstname, t.lastname, t.title, t.city, t.state,
count(*) over (partition by title) as numwithtitle
from t
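The window-function form can be tried in SQLite as well, assuming a build of version 3.25 or later (the first release with window functions). The rows below are hypothetical, and the table is trimmed to two columns for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (firstname TEXT, title TEXT)")
# Hypothetical rows: two engineers, one manager.
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("Ann", "Engineer"), ("Bob", "Engineer"), ("Cy", "Manager")],
)

# COUNT(*) OVER (PARTITION BY title) attaches the per-title count to every row
# without collapsing the rows the way GROUP BY would.
window_sql = """
SELECT firstname, title,
       COUNT(*) OVER (PARTITION BY title) AS numwithtitle
FROM t
ORDER BY firstname
"""
window_rows = conn.execute(window_sql).fetchall()
print(window_rows)
```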

select max, min values from two tables

I have two tables that differ only in that one is an archive and the other holds the current records. Both record sales in the company, and both have, among other fields: id, name, price of sale. I need to select the highest and lowest price for a given name across both tables. I tried this query:
select name, max (price_of_sale), min (price_of_sale)
from wapzby
union
select name, max (price_of_sale), min (price_of_sale)
from wpzby
order by name
but this query returns two records, one from the current table and one from the archive table. I want the lowest and highest price for each name across both tables at once. How do I write this query?

Here are two options (MS SQL Server compliant).
Note: UNION ALL combines the sets without eliminating duplicates. That's a much simpler behavior than UNION.
SELECT Name, MAX(Price_Of_Sale) as MaxPrice, MIN(Price_Of_Sale) as MinPrice
FROM
(
SELECT Name, Price_Of_Sale
FROM wapzby
UNION ALL
SELECT Name, Price_Of_Sale
FROM wpzby
) as subQuery
GROUP BY Name
ORDER BY Name
This one figures out the max and min from each table before combining the set - it may be more performant to do it this way.
SELECT Name, MAX(MaxPrice) as MaxPrice, MIN(MinPrice) as MinPrice
FROM
(
SELECT Name, MAX(Price_Of_Sale) as MaxPrice, MIN(Price_Of_Sale) as MinPrice
FROM wapzby
GROUP BY Name
UNION ALL
SELECT Name, MAX(Price_Of_Sale) as MaxPrice, MIN(Price_Of_Sale) as MinPrice
FROM wpzby
GROUP BY Name
) as subQuery
GROUP BY Name
ORDER BY Name
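The first option can be checked with a short sketch against an in-memory SQLite database. The prices and product names are hypothetical; only the table names wapzby and wpzby and the columns id, name and price_of_sale come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("wapzby", "wpzby"):
    conn.execute(f"CREATE TABLE {table} (id INTEGER, name TEXT, price_of_sale REAL)")
# Hypothetical sales: "bolt" appears in both tables, "nut" in only one.
conn.executemany(
    "INSERT INTO wapzby VALUES (?, ?, ?)", [(1, "bolt", 5.0), (2, "nut", 2.0)]
)
conn.executemany(
    "INSERT INTO wpzby VALUES (?, ?, ?)", [(1, "bolt", 9.0), (2, "bolt", 3.0)]
)

# Stack both tables first, then aggregate once per name.
combined_sql = """
SELECT name, MAX(price_of_sale) AS max_price, MIN(price_of_sale) AS min_price
FROM (
    SELECT name, price_of_sale FROM wapzby
    UNION ALL
    SELECT name, price_of_sale FROM wpzby
) AS sub
GROUP BY name
ORDER BY name
"""
price_ranges = conn.execute(combined_sql).fetchall()
print(price_ranges)
```

Duplicate rows cannot change a maximum or minimum, so UNION and UNION ALL give the same answer here; UNION ALL just skips the duplicate-elimination step.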

In SQL Server you could use a subquery:
SELECT [name],
MAX([price_of_sale]) AS [MAX price_of_sale],
MIN([price_of_sale]) AS [MIN price_of_sale]
FROM (
SELECT [name],
[price_of_sale]
FROM [dbo].[wapzby]
UNION
SELECT [name],
[price_of_sale]
FROM [dbo].[wpzby]
) u
GROUP BY [name]
ORDER BY [name]

Is this more like what you want?
SELECT
a.name,
MAX (a.price_of_sale),
MIN (a.price_of_sale) ,
b.name,
MAX (b.price_of_sale),
MIN (b.price_of_sale)
FROM
wapzby a,
wpzby b
ORDER BY
a.name
It's untested, but it should return all your records on one row without the need for a union.

SELECT MAX(value) FROM tabl1 UNION SELECT MAX(value) FROM tabl2;
SELECT MIN(value) FROM tabl1 UNION SELECT MIN(value) FROM tabl2;

SELECT (SELECT MAX(value) FROM table1 WHERE trn_type='CSL' and till='TILL01') as summ, (SELECT MAX(value) FROM table2 WHERE trn_type='CSL' and till='TILL01') as summ_hist
