Hive table EOF error near 'Customer' 'ID' ')' - hive

I'm trying to query data for a class project in Hive. I have built an external table with the following columns: Zip code, state, customer id, sales, and date. My next step is to query the data to bring up the top 5 states with the most customers. I keep getting the following error when using this command:
SELECT State, COUNT(Customer ID) AS NumCustomers
FROM salesrecords
GROUP BY State
ORDER BY NumCustomers DESC
LIMIT 5;
Any help would be greatly appreciated.

A column name can't contain a space, as in Customer ID, unless it is quoted. Please use the actual column name from your table definition. If the name really does contain a space, wrap it in backticks (`) like below.
SELECT State, COUNT(`Customer ID`) AS NumCustomers
FROM salesrecords GROUP BY State
ORDER BY NumCustomers DESC LIMIT 5;
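To try the backtick quoting without a Hive cluster, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in (SQLite also accepts backtick-quoted identifiers). The sample rows are made up for illustration, and the table has only the two columns the query needs.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive table (assumption:
# only the two columns used by the query are modelled here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salesrecords (State TEXT, `Customer ID` INTEGER)")

# Hypothetical sample data, not taken from the original question.
conn.executemany(
    "INSERT INTO salesrecords VALUES (?, ?)",
    [("TX", 1), ("TX", 2), ("CA", 3), ("TX", 4), ("CA", 5), ("NY", 6)],
)

# The backticks make the parser treat `Customer ID` as one identifier.
top_states = conn.execute(
    """
    SELECT State, COUNT(`Customer ID`) AS NumCustomers
    FROM salesrecords
    GROUP BY State
    ORDER BY NumCustomers DESC
    LIMIT 5
    """
).fetchall()

print(top_states)
```

Without the backticks, the parser reads Customer and ID as two separate tokens, which is what produces the error near 'Customer' 'ID' ')'.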

Related

How to get the number of time a particular number appeared

I want to count the number of times a single value occurs in a column. How can I achieve that using mysqli? For instance, I want to know how many times Victor appears in the name column.
If you're using SQL server:
SELECT name, count(1)
from Tablename
where name like 'Victor'
group by name
This query returns one row per matching name with its count. For example, if Victor appeared 22 times:
Victor 22
Is this what you're looking for?
Please provide more information so it's easier to assist.
Try window functions
SELECT ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Name DESC)
AS Total, Name from table
This one gives you the count of records with name = 'Victor':
select count(name) as cnt from t where name='Victor'
This one gives you all names with their counts:
select name, count(1) as cnt
from t
group by name
order by name
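The counting approaches above can be compared side by side. This is a minimal sketch using Python's sqlite3 module in place of MySQL or SQL Server; the table t and its rows are hypothetical, and the window-function query assumes SQLite 3.25 or newer. Note that ROW_NUMBER() only numbers the rows inside each partition, whereas COUNT(*) OVER (PARTITION BY ...) puts the group total on every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT)")

# Hypothetical sample data: Victor appears three times, Anna twice.
conn.executemany(
    "INSERT INTO t VALUES (?)",
    [("Victor",), ("Anna",), ("Victor",), ("Victor",), ("Anna",)],
)

# 1. Count for a single value, passed as a bound parameter.
victor_count = conn.execute(
    "SELECT COUNT(*) FROM t WHERE name = ?", ("Victor",)
).fetchone()[0]

# 2. Count for every distinct value.
all_counts = conn.execute(
    "SELECT name, COUNT(*) AS cnt FROM t GROUP BY name ORDER BY name"
).fetchall()

# 3. Window-function variant: the total is attached to each row,
#    so DISTINCT collapses the repeats.
windowed = conn.execute(
    """
    SELECT DISTINCT name, COUNT(*) OVER (PARTITION BY name) AS total
    FROM t
    ORDER BY name
    """
).fetchall()

print(victor_count, all_counts, windowed)
```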

The alias name RANK() function is not recognized in the where clause with DISTINCT columns

I have 2 tables with the columns (customer, position, product, sales_cycle, call_count, cntry_cd, owner_cd, cr8), and I am facing the challenges described below. Please help me fix them.
My Requirement
I have 2 tables test.table1 and test.table2
I need to insert values into "test.table2" by selecting from "test.table1", but I am getting some duplicates while loading data into "test.table2".
Both tables have 8 columns in total. While loading, I need to take the row with the highest "call_count" for each unique combination of the columns (customer, position, product, sales_cycle).
Query what I tried
select
distinct (customer, position, product ,sales_cycle),
rank () over (order by call_count desc) rnk,
cntry_cd,
owner_cd,
cr8
from test.table1
where rnk=1
I am facing a few challenges with the above query (the database I am using is Redshift):
1. I can't apply distinct to only a few columns.
2. The alias "rnk" is not recognized in the where clause.
Please help me fix this. Thanks.
You can't reference a column alias at the same query level where it is introduced. You need to wrap the query in a derived table. The distinct as shown is also unnecessary once you filter on rank():
select customer, position, product, sales_cycle,
cntry_cd, owner_cd, cr8
from (
select customer, position, product, sales_cycle,
cntry_cd, owner_cd, cr8,
rank () over (order by call_count desc) rnk
from test.table1
) t
where rnk=1;
The derived table adds no overhead to the processing time. In this case it is merely syntactic sugar to allow you to reference the column alias.
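If the goal is one row per (customer, position, product, sales_cycle) combination rather than one row overall, a PARTITION BY clause inside the window is one way to get there. The following is a sketch under that assumption, run against SQLite through Python's sqlite3 module instead of Redshift, with invented sample rows. It uses ROW_NUMBER() rather than RANK() so that ties on call_count still yield a single row per group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE table1 (
        customer TEXT, position TEXT, product TEXT, sales_cycle TEXT,
        call_count INTEGER, cntry_cd TEXT, owner_cd TEXT, cr8 TEXT
    )
    """
)

# Hypothetical rows: the first two share the same grouping key.
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("c1", "p1", "prodA", "2024Q1", 3, "US", "o1", "x"),
        ("c1", "p1", "prodA", "2024Q1", 7, "US", "o2", "y"),
        ("c2", "p1", "prodA", "2024Q1", 5, "DE", "o3", "z"),
    ],
)

# The alias rnk is defined in the derived table and filtered one level up.
best = conn.execute(
    """
    SELECT customer, position, product, sales_cycle,
           call_count, cntry_cd, owner_cd, cr8
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (
                   PARTITION BY customer, position, product, sales_cycle
                   ORDER BY call_count DESC
               ) AS rnk
        FROM table1 AS t
    ) AS ranked
    WHERE rnk = 1
    ORDER BY customer
    """
).fetchall()

for row in best:
    print(row)
```

The outer SELECT could feed an INSERT INTO test.table2 statement in the same way.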

Select SQL statement group by and sum two columns

I have an SQL statement whose structure I can't get right; when I run what I currently have, it reports a syntax error. I would like the result to look like this:
(source: churchcom.co.uk)
This is my query so far, but I don't think I am on the right track at all.
SELECT Name, ValueofTaught, Amount
FROM Activities
WHERE (Department = @Department)
ORDER BY Name
GROUP BY BurnhamGrade
(SUM ValueofTaught AND Amount
WHERE Departmetn = @Department)
The structure of the Activities table is like this:
(source: churchcom.co.uk)
It sort of sounds like you're looking to group by two columns. I'm assuming the Value of Taught column is what gets rolled up for hours:
SELECT Department, Name, SUM([Value of Taught]) AS Hours, SUM(Amount) AS Pay
FROM Activities
GROUP BY Department, Name WITH ROLLUP
ORDER BY Department, Name
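The two-column grouping can be checked on a small data set. This is a minimal sketch using Python's sqlite3 module; the column name ValueOfTaught (without spaces) and the sample rows are assumptions made for illustration, and the ROLLUP subtotals are left out because SQLite does not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Activities "
    "(Department TEXT, Name TEXT, ValueOfTaught REAL, Amount REAL)"
)

# Hypothetical rows: Ann has two activities that should be summed together.
conn.executemany(
    "INSERT INTO Activities VALUES (?, ?, ?, ?)",
    [
        ("Maths", "Ann", 2.0, 50.0),
        ("Maths", "Ann", 1.5, 30.0),
        ("Maths", "Bob", 3.0, 60.0),
        ("Science", "Cid", 4.0, 80.0),
    ],
)

# One output row per (Department, Name) pair, with both columns summed.
totals = conn.execute(
    """
    SELECT Department, Name, SUM(ValueOfTaught) AS Hours, SUM(Amount) AS Pay
    FROM Activities
    GROUP BY Department, Name
    ORDER BY Department, Name
    """
).fetchall()

for row in totals:
    print(row)
```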

Oracle SQL Current and previous status in the same output record

I have a weird requirement: display the current state of an application and its previous state in the same output record. My requirements were shown in a picture.
I have tried to get the top value of each application and the remaining set separately using SQL but I'm not sure of the best way to combine them. But I am sure there are easier ways to do this.
Pasting my query here.
Query 1 gives me the latest status of each application:
select application_id, last_updated, application_state
from BELL_APPLICATION_EVENTS where (application_id, last_updated) in (
select application_id, max(last_updated) as last_updated
from BELL_APPLICATION_EVENTS
group by application_id
) order by last_updated desc ;
The below query provides the data set for rest of the statuses, such as "Application finalized" and "User Email Sent" as shown in the picture separately.
select *
from BELL_APPLICATION_EVENTS U1
where last_updated < (
select max(last_updated)
from BELL_APPLICATION_EVENTS where application_id = U1.application_id)
order by U1.LAST_UPDATED desc ;
Could you please help to provide an easier option to get the current state and previous state in a single record per application id?
The LAG analytic function is perfect for this. Please use SQL Fiddle instead of data in pictures to provide a test case. From the documentation:
SELECT last_name, hire_date, salary,
LAG(salary, 1, 0) OVER (ORDER BY hire_date) AS prev_sal
FROM employees
WHERE job_id = 'PU_CLERK';
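Applied to the events table from the question, LAG can be partitioned by application_id and combined with ROW_NUMBER to keep only the latest event per application. This is a sketch run against SQLite (3.25 or newer) through Python's sqlite3 module rather than Oracle; the dates and the states "Approved" and "Submitted" are invented sample values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE BELL_APPLICATION_EVENTS "
    "(application_id INTEGER, last_updated TEXT, application_state TEXT)"
)

# Hypothetical events; ISO date strings sort chronologically as text.
conn.executemany(
    "INSERT INTO BELL_APPLICATION_EVENTS VALUES (?, ?, ?)",
    [
        (1, "2024-01-01", "User Email Sent"),
        (1, "2024-01-02", "Application finalized"),
        (1, "2024-01-03", "Approved"),
        (2, "2024-01-05", "Submitted"),
    ],
)

# LAG fetches the state of the preceding event within the same application;
# rn = 1 marks the most recent event of each application.
result = conn.execute(
    """
    SELECT application_id, last_updated, application_state, previous_state
    FROM (
        SELECT application_id, last_updated, application_state,
               LAG(application_state) OVER (
                   PARTITION BY application_id ORDER BY last_updated
               ) AS previous_state,
               ROW_NUMBER() OVER (
                   PARTITION BY application_id ORDER BY last_updated DESC
               ) AS rn
        FROM BELL_APPLICATION_EVENTS
    ) AS e
    WHERE rn = 1
    ORDER BY application_id
    """
).fetchall()

for row in result:
    print(row)
```

An application with a single event gets NULL (None in Python) as its previous state.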

T-SQL Randomize order of results using RAND(seed)

I'm using the following statement (this is a shortened version as an example) to get results from my Microsoft SQL Express 2012 database:
SELECT id, name, city
FROM tblContact
ORDER BY RAND(xxx)
and injecting a seed stored in the session for the xxx part, so that the results are consistently random for a given session (so when paging through results, the user doesn't see duplicates).
PROBLEM: No matter what the seed is, the results are returned in the same order.
I have also tried this:
SELECT id, name, city, RAND(xxx) AS OrderValue
FROM tblContact
ORDER BY OrderValue
Both give the same (unexpected result) - am I using this incorrectly?
The value of rand(seed) will be the same for the entire query. You may want to use the ID column to generate a random value on a per-row basis:
SELECT id, name, city, RAND(xxx + id) AS OrderValue
FROM tblContact ORDER BY OrderValue
However, I once developed some functionality where I needed a different random order for each session but the same order within a session. At that time I used HASHBYTES() and it worked very well:
SELECT id, name, city, HASHBYTES('md5',cast(xxx+id as varchar)) AS OrderValue
FROM tblContact ORDER BY OrderValue
In SQL Server, Rand() is calculated once for the query. To get a random order, use ORDER BY NEWID().
Often, the newid() function is used for this purpose:
SELECT id, name, city
FROM tblContact
ORDER BY newid();
I have heard that rand(checksum(newid())) actually has better properties as a random number generator:
SELECT id, name, city
FROM tblContact
ORDER BY rand(checksum(newid()));
If you want consistent results from one query to the next, then use @dimt's solution based on id or a function of id.
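The idea behind the HASHBYTES approach can be illustrated outside the database. The sketch below is a Python analogue, not T-SQL: the helper name session_order is hypothetical, and it sorts ids by the MD5 digest of seed + id, mirroring HASHBYTES('md5', cast(xxx+id as varchar)).

```python
import hashlib


def session_order(ids, seed):
    """Sort ids by the MD5 digest of (seed + id).

    The same seed always yields the same order, so paging stays stable
    within a session; a different seed will usually yield another order.
    """
    def sort_key(row_id):
        # Hash the decimal string of seed + id, as the SQL version does.
        return hashlib.md5(str(seed + row_id).encode("ascii")).digest()

    return sorted(ids, key=sort_key)


if __name__ == "__main__":
    ids = list(range(1, 11))
    print(session_order(ids, 42))
    print(session_order(ids, 42))    # identical to the line above
    print(session_order(ids, 1234))  # a different session seed
```

Because the sort key depends only on the seed and the id, every page request in a session sees the rows in the same shuffled order.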