Can a HIVE SELECT combine GROUP BY and ORDER BY?

I'm doing some relatively simple queries in Hive and cannot seem to combine GROUP BY and ORDER BY in a single statement. I have no problem doing a select into a temporary table of the GROUP BY query and then doing a select on that table with an ORDER BY, but I can't combine them together.
For example, I have a table a and can execute this query:
SELECT place,count(*),sum(weight) from a group by place;
And I can execute this query:
create temporary table result (place string,count int,sumweight int);
insert overwrite table result
select place,count(*),sum(weight) from a group by place;
select * from result order by place;
But if I try this query:
SELECT place,count(*),sum(weight) from a group by place order by place;
I get this error:
Error: Error while compiling statement: FAILED: ParseException line 1:45 mismatched input '' expecting \' near '_c0' in character string literal (state=42000,code=40000)

Try using GROUP BY in a subquery and ORDER BY in the outer query, as shown below:
SELECT
place,
cnt,
sum_
FROM (
SELECT
place,
count(*) as cnt,
sum(weight) as sum_
FROM a
GROUP BY place
) a
ORDER BY place;
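The subquery pattern above can be tried out locally. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Hive (an assumption: SQLite accepts the one-statement form as well, so this only illustrates the shape of the query and its result, not Hive's parser behaviour). The sample rows and the alias t are made up for illustration.

```python
import sqlite3

# SQLite stands in for Hive here; table `a` holds made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (place TEXT, weight INTEGER)")
conn.executemany(
    "INSERT INTO a VALUES (?, ?)",
    [("paris", 3), ("berlin", 5), ("paris", 4), ("athens", 1)],
)

# Aggregate in the inner query with explicit aliases,
# then sort the already-aggregated rows in the outer query.
rows = conn.execute(
    """
    SELECT place, cnt, sum_
    FROM (
        SELECT place, COUNT(*) AS cnt, SUM(weight) AS sum_
        FROM a
        GROUP BY place
    ) t
    ORDER BY place
    """
).fetchall()

for row in rows:
    print(row)
```

Giving the aggregates explicit aliases (cnt, sum_) means the outer query never has to refer to auto-generated column names such as _c0.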

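Use SORT BY like this. Note that SORT BY only orders rows within each reducer, so the overall output is totally ordered only when a single reducer is used: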
SELECT place,count(*),sum(weight) from a group by place sort by place;

Related

How to run a subquery in hive

I have this query that I am trying to run in HIVE:
select transaction_date, count(total_distinct) from (
SELECT transaction_date, concat(subid,'**', itemid) as total_distinct
FROM TBL_1
group by transaction_date, subid,itemid
) group by transaction_date
What I am trying to do is get the distinct combinations of subid and itemid, but I need the total count per day. When I run the query above, I get this error:
Error while compiling statement: FAILED: ParseException line 6:2 cannot recognize input near 'group' 'by' 'TRANSACTION_DATE' in subquery source
The query looks correct to me though. Has anyone encountered this error?
Hive requires subqueries to be aliased, so you need to specify a name for it:
select transaction_date, count(total_distinct) from (
SELECT transaction_date, concat(subid,'**', itemid) as total_distinct
FROM TBL_1
group by transaction_date, subid,itemid
) dummy -- << note here
group by transaction_date
True, the error message is far from helpful.
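To check what the aliased query returns, here is a small sketch run against SQLite through Python's sqlite3 module. This is an assumption-laden stand-in: SQLite does not require the alias, so it cannot reproduce the Hive error, and the string concatenation uses || instead of concat(). The sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TBL_1 (transaction_date TEXT, subid TEXT, itemid TEXT)"
)
conn.executemany(
    "INSERT INTO TBL_1 VALUES (?, ?, ?)",
    [
        ("2020-01-01", "s1", "i1"),
        ("2020-01-01", "s1", "i1"),  # repeated (subid, itemid) pair on the same day
        ("2020-01-01", "s1", "i2"),
        ("2020-01-02", "s2", "i1"),
    ],
)

# The inner GROUP BY collapses repeated (date, subid, itemid) rows;
# the outer GROUP BY counts the surviving combinations per day.
per_day = conn.execute(
    """
    SELECT transaction_date, COUNT(total_distinct)
    FROM (
        SELECT transaction_date, subid || '**' || itemid AS total_distinct
        FROM TBL_1
        GROUP BY transaction_date, subid, itemid
    ) dummy  -- the alias Hive insists on
    GROUP BY transaction_date
    ORDER BY transaction_date
    """
).fetchall()

print(per_day)
```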

I can't figure out how to do this DISTINCT

Good morning
I have tried repeatedly to understand why this query gives the usual error on GROUP BY. I would like to find the duplicate rows and delete them. I found this query on Microsoft's MSDN, but it still gives me this error on GROUP BY.
The main table has 3 fields, "Id, Item, Description" (Item is the [Articolo] column in the query), and the table name is "tlbDescription" ([tlbDescrizione] in the query). In theory, this query should create a table named "duplicate_table", insert the duplicate values into it, delete those values from "tlbDescription", and finally drop "duplicate_table".
I would be grateful if someone could give me a hand.
Thank you
Fabrizio
This is the query:
SELECT DISTINCT *
INTO duplicate_table
FROM [tlbDescrizione]
GROUP BY [Articolo]
HAVING COUNT([Articolo]) > 1
DELETE [tlbDescrizione]
WHERE [Articolo] IN (SELECT [Articolo] FROM duplicate_table)
INSERT [tlbDescrizione]
SELECT * FROM duplicate_table
DROP TABLE duplicate_table
This query doesn't make sense:
SELECT DISTINCT *
INTO duplicate_table
FROM [tlbDescrizione]
GROUP BY [Articolo]
HAVING COUNT([Articolo]) > 1;
It is selecting all columns but is an aggregation query because of the GROUP BY. Hence, the SELECT columns are inconsistent with the GROUP BY columns and you get an error.
If you want all the columns then you can use window functions:
SELECT DISTINCT *
INTO duplicate_table
FROM (SELECT d.*, COUNT(*) OVER (PARTITION BY d.Articolo) as cnt
FROM tlbDescrizione d
) d
WHERE cnt > 1;
Or, if you want only the duplicated Articolo values:
SELECT Articolo
INTO duplicate_table
FROM tlbDescrizione
GROUP BY [Articolo]
HAVING COUNT(*) > 1;
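Both variants can be compared side by side. The sketch below runs them on SQLite via Python's sqlite3 module rather than SQL Server, so SELECT ... INTO is left out and only the SELECT part is shown. It assumes SQLite 3.25 or later for window functions; the Descrizione column name and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column name `Descrizione` is a guess for illustration only.
conn.execute(
    "CREATE TABLE tlbDescrizione (Id INTEGER, Articolo TEXT, Descrizione TEXT)"
)
conn.executemany(
    "INSERT INTO tlbDescrizione VALUES (?, ?, ?)",
    [(1, "A", "x"), (2, "A", "x"), (3, "B", "y"), (4, "C", "z"), (5, "C", "w")],
)

# Variant 1: only the Articolo values that occur more than once.
dup_keys = conn.execute(
    """
    SELECT Articolo
    FROM tlbDescrizione
    GROUP BY Articolo
    HAVING COUNT(*) > 1
    ORDER BY Articolo
    """
).fetchall()

# Variant 2: every full row whose Articolo occurs more than once,
# using a window count instead of GROUP BY.
dup_rows = conn.execute(
    """
    SELECT Id, Articolo
    FROM (
        SELECT d.*, COUNT(*) OVER (PARTITION BY d.Articolo) AS cnt
        FROM tlbDescrizione d
    )
    WHERE cnt > 1
    ORDER BY Id
    """
).fetchall()

print(dup_keys)
print(dup_rows)
```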

SQL query - How to get max value of a column of each group by column value

I have a table that contains 10 million rows, like this:
I want to group by [CoinNameId] (this column is a foreign key) and get max value of [CreatedAt] for each [CoinNameId] group, but my query returns an error:
How can I solve this?
When you use aggregates in the select clause, every field that is not aggregated needs to be in the group by. That's why you are getting an error. I'm not sure why you had select * in your query.
You'd have to have a query like this:
SELECT CoinNameID, max([CreatedAt])
FROM [dbo].[CoinData]
GROUP BY [CoinNameID]
If you just want CoinNameID and MAX(CreatedAt), you can do the following.
SELECT CoinNameID, MAX([CreatedAt])
FROM [dbo].[CoinData]
GROUP BY [CoinNameID]
If you want all columns along with MAX([CreatedAt]), you can get them as follows.
SELECT *,
(SELECT MAX([CreatedAt])
FROM [dbo].[CoinData] CDI WHERE CDI.CoinNameID=CD.CoinNameID) AS MAX_CreatedAt
FROM [dbo].[CoinData] CD
You have select * in your query. Use this instead:
SELECT
CoinNameId,MAX(CreatedAt) AS MaxCreatedAt
FROM [dbo].[CoinData]
GROUP BY CoinNameId
This will return MAX(CreatedAt) with other columns
SELECT
*, MAX([CreatedAt]) OVER (PARTITION BY [CoinNameId])
FROM [dbo].[CoinData]
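As a rough illustration of the difference between the GROUP BY form and the window form, here is a sketch on SQLite through Python's sqlite3 module (assuming SQLite 3.25 or later for window functions). Only CoinNameId and CreatedAt come from the question; the Id and Price columns and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# `Id` and `Price` are hypothetical extra columns; dates are ISO strings,
# so MAX() on text gives the latest date.
conn.execute(
    "CREATE TABLE CoinData (Id INTEGER, CoinNameId INTEGER, CreatedAt TEXT, Price REAL)"
)
conn.executemany(
    "INSERT INTO CoinData VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2021-01-01", 10.0),
        (2, 1, "2021-01-03", 12.0),
        (3, 2, "2021-01-02", 7.0),
        (4, 2, "2021-01-01", 6.0),
    ],
)

# GROUP BY: one row per CoinNameId, only grouped and aggregated columns.
max_per_coin = conn.execute(
    """
    SELECT CoinNameId, MAX(CreatedAt) AS MaxCreatedAt
    FROM CoinData
    GROUP BY CoinNameId
    ORDER BY CoinNameId
    """
).fetchall()

# Window function: every original row is kept, with the group maximum attached.
rows_with_max = conn.execute(
    """
    SELECT Id, CoinNameId, CreatedAt,
           MAX(CreatedAt) OVER (PARTITION BY CoinNameId) AS MaxCreatedAt
    FROM CoinData
    ORDER BY Id
    """
).fetchall()

print(max_per_coin)
print(rows_with_max)
```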

count all the distinct records in a table

I need to count all the distinct records in a table name with a single query and also without using any sub-query.
My code is
select count ( distinct *) from table_name
It gives an error:
Incorrect syntax near '*'.
I am using Microsoft SQL Server
Try this -
SELECT COUNT(*)
FROM
(SELECT DISTINCT * FROM [table_name]) A
I'm afraid that if you don't want to use a subquery, the only way to achieve that is replacing * with a concatenation of the columns in your table
select count(distinct concat(column1, column2, ..., columnN))
from table_name
To avoid undesired behaviours (like the concatenation of 1 and 31 becoming equal to the concatenation of 13 and 1) you could add a reasonable separator
select count(distinct concat(column1, '$%&£', column2, '$%&£', ..., '$%&£', columnN))
from table_name
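The collision described above is easy to reproduce. This is a minimal sketch on SQLite via Python's sqlite3 module, not SQL Server: it uses the || operator in place of concat(), and the two-column table t and the '|' separator are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (column1 TEXT, column2 TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("1", "31"), ("13", "1"), ("1", "31")],  # two distinct rows, one repeated
)

# Reference answer: count the rows of a DISTINCT subquery.
via_subquery = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT * FROM t)"
).fetchone()[0]

# No separator: '1' || '31' and '13' || '1' both become '131'.
no_separator = conn.execute(
    "SELECT COUNT(DISTINCT column1 || column2) FROM t"
).fetchone()[0]

# With a separator the two rows stay distinguishable.
with_separator = conn.execute(
    "SELECT COUNT(DISTINCT column1 || '|' || column2) FROM t"
).fetchone()[0]

print(via_subquery, no_separator, with_separator)
```

A separator only helps if it cannot occur inside the data itself. NULL handling also differs between engines: in SQLite, || with a NULL operand yields NULL, which COUNT(DISTINCT ...) ignores, so rows containing NULLs would need separate treatment.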
You can use CTE.
;WITH CTE AS
(
SELECT DISTINCT * FROM TableName
)
SELECT COUNT(*)
FROM CTE
Hope this query gives you what you required.
As others mentioned, you cannot use COUNT(DISTINCT *). It is also good practice to use a column name instead of the *, such as a unique key / primary key of the table.
SELECT COUNT( DISTINCT id )
FROM table_name
select distinct Name , count(Name) from TableName
group by Name
having count(Name)=1
select @@rowcount
I had the same issue involving a query that had multiple joins to tables, and I could not simply do count(distinct *) or count(distinct alias.*).
My solution was to create a string made up of the key columns I cared about and count them.
SELECT Count(DISTINCT person.first || '~' || person.last)
from person;
If you want to use the DISTINCT keyword, you need to specify the column name on which you want to get distinct records.
Example:
SELECT count(DISTINCT column_name) FROM table_name

SELECT *, COUNT(*) in SQLite

If I perform a standard query in SQLite:
SELECT * FROM my_table
I get all records in my table as expected. If I perform the following query:
SELECT *, 1 FROM my_table
I get all records as expected, with the rightmost column holding '1' in all records. But if I perform the query:
SELECT *, COUNT(*) FROM my_table
I get only ONE row (with the rightmost column holding the correct count).
Why do I get this result? I'm not very good at SQL, so maybe this behavior is expected? It seems very strange and illogical to me :(.
SELECT *, COUNT(*) FROM my_table is not what you want, and it's not really valid standard SQL: you have to group by all the columns that are not aggregates. SQLite accepts the query anyway and collapses the whole table into a single row.
You'd want something like
SELECT somecolumn,someothercolumn, COUNT(*)
FROM my_table
GROUP BY somecolumn,someothercolumn
If you want to count the number of records in your table, simply run:
SELECT COUNT(*) FROM your_table;
count(*) is an aggregate function. Aggregate functions need a GROUP BY to give meaningful results alongside non-aggregated columns.
If what you want is the total number of records in the table appended to each row you can do something like
SELECT *
FROM my_table
CROSS JOIN (SELECT COUNT(*) AS COUNT_OF_RECS_IN_MY_TABLE
FROM MY_TABLE)
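The behaviour from the question and the CROSS JOIN fix can be reproduced with Python's sqlite3 module. This is a small sketch with a made-up three-row my_table; the window-function variant at the end is an additional option not mentioned above and assumes SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")]
)

# An aggregate without GROUP BY treats the whole table as one group,
# so SQLite returns a single row (which row supplies id/name is arbitrary).
collapsed = conn.execute("SELECT *, COUNT(*) FROM my_table").fetchall()

# CROSS JOIN against a one-row subquery appends the total to every row.
cross_joined = conn.execute(
    """
    SELECT *
    FROM my_table
    CROSS JOIN (SELECT COUNT(*) AS total FROM my_table)
    ORDER BY id
    """
).fetchall()

# Alternative: an empty OVER () window spans all rows without collapsing them.
windowed = conn.execute(
    "SELECT *, COUNT(*) OVER () AS total FROM my_table ORDER BY id"
).fetchall()

print(collapsed)
print(cross_joined)
print(windowed)
```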