Getting an error when running a simple Hive SELECT query with an ORDER BY clause - hive

Hey folks, I am pretty new to Hive and am trying out some simple queries.
My table contains three columns, id, name, and address, with the following data in it:
+---------------+-----------------+--------------------+--+
| customers.id  | customers.name  | customers.address  |
+---------------+-----------------+--------------------+--+
| 111           | john            | wa                 |
| 222           | emily           | wa                 |
| 333           | rick            | wa                 |
| 444           | jane            | ca                 |
| 555           | amit            | nj                 |
| 666           | nina            | ny                 |
+---------------+-----------------+--------------------+--+
I'm getting an error while running the query below:
select id from customers order by address;
Error while compiling statement: FAILED: SemanticException [Error 10004]: Line 1:34 Invalid table alias or column reference 'address': (possible column names are: id)
Thanks in advance for the help.

Related

Spark SQL issue with columns specified

We are trying to replicate an Oracle database into Hive. We get the queries from Oracle and run them in Hive, so we receive them in this format:
INSERT INTO schema.table(col1,col2) VALUES ('val','val');
While this query works in Hive directly, when I run it through spark.sql I get the following error:
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'emp_id' expecting {'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 1, pos 20)
== SQL ==
insert into ss.tab(emp_id,firstname,lastname) values ('1','demo','demo')
--------------------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:217)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:114)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:68)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:623)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:691)
at com.datastream.SparkReplicator.insertIntoHive(SparkReplicator.java:20)
at com.datastream.App.main(App.java:67)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
This error occurs because Spark SQL does not support column lists in the INSERT statement, so exclude the column list from the INSERT statement.
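Since the replicated Oracle statements all arrive with a column list, one way to apply this advice automatically is to strip the list before passing each statement to spark.sql. A minimal Python sketch (the helper name and regex are my own, not part of Spark):

```python
import re

def strip_column_list(stmt):
    """Remove the (col1,col2,...) column list from an
    INSERT INTO table(...) VALUES (...) statement.
    Illustrative only; assumes one statement per string."""
    # Match "INSERT INTO schema.table (cols...)" followed by VALUES
    # and drop only the parenthesised column list.
    return re.sub(
        r"(?i)(insert\s+into\s+[\w.]+)\s*\([^)]*\)(\s*values)",
        r"\1\2",
        stmt,
    )

print(strip_column_list(
    "insert into ss.tab(emp_id,firstname,lastname) values ('1','demo','demo')"
))
# -> insert into ss.tab values ('1','demo','demo')
```

Statements that already lack a column list pass through unchanged, so the helper can be applied to every incoming query.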
Below is my Hive table:
select * from UDB.emp_details_table;
+---------+-----------+-----------+-------------------+--+
| emp_id  | emp_name  | emp_dept  | emp_joining_date  |
+---------+-----------+-----------+-------------------+--+
| 1       | AAA       | HR        | 2018-12-06        |
| 1       | BBB       | HR        | 2017-10-26        |
| 2       | XXX       | ADMIN     | 2018-10-22        |
| 2       | YYY       | ADMIN     | 2015-10-19        |
| 2       | ZZZ       | IT        | 2018-05-14        |
| 3       | GGG       | HR        | 2018-06-30        |
+---------+-----------+-----------+-------------------+--+
Here I am inserting a record using Spark SQL through PySpark:
df = spark.sql("""insert into UDB.emp_details_table values ('6','VVV','IT','2018-12-18')""");
You can see below that the given record has been inserted into my existing Hive table:
+---------+-----------+-----------+-------------------+--+
| emp_id  | emp_name  | emp_dept  | emp_joining_date  |
+---------+-----------+-----------+-------------------+--+
| 1       | AAA       | HR        | 2018-12-06        |
| 1       | BBB       | HR        | 2017-10-26        |
| 2       | XXX       | ADMIN     | 2018-10-22        |
| 2       | YYY       | ADMIN     | 2015-10-19        |
| 2       | ZZZ       | IT        | 2018-05-14        |
| 3       | GGG       | HR        | 2018-06-30        |
| 6       | VVV       | IT        | 2018-12-18        |
+---------+-----------+-----------+-------------------+--+
Change your Spark SQL query to:
spark.sql("""insert into ss.tab values ('1','demo','demo')""");
Note: I am using Spark 2.3; you need to use a HiveContext in case you are using Spark 1.6.
Let me know if it works.

SQL query for reading all the rows in a table with the columns sorted by column name

I have a table with around 5 columns. I want to arrange the columns in alphabetical order and display all the records present in the table.
Sample table below:
+----+----------+-----+-----------+----------+
| ID | NAME | AGE | ADDRESS | SALARY |
+----+----------+-----+-----------+----------+
| 1 | Ramesh | 32 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | MP | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |
+----+----------+-----+-----------+----------+
Expected output is:
| ADDRESS   | AGE | ID | NAME     | SALARY   |
| Ahmedabad | 32  | 1  | Ramesh   | 2000.00  |
| Delhi     | 25  | 2  | Khilan   | 1500.00  |
| Kota      | 23  | 3  | Kaushik  | 2000.00  |
| Mumbai    | 25  | 4  | Chaitali | 6500.00  |
As per my understanding, you need ordering based on column names, not on column values.
Please follow the steps below:
Find the list of columns in your table. information_schema.columns is the ANSI-standard view; for Oracle it is ALL_TAB_COLUMNS. Use a cursor to fetch the column names into a comma-separated list, ordered by column_name ASC (in your case ADDRESS, AGE, ID, NAME, SALARY).
Create a dynamic query from that cursor output to fetch the result.
P.S. I have never tried this, since there is rarely a need to sort output based on column names; if it is a fixed table, you can simply hard-code the column names.
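The two steps above (read the column names, then build a dynamic query) can be sketched end to end. The example below uses Python with SQLite purely for illustration: SQLite exposes column metadata through PRAGMA table_info rather than information_schema, and the table and data mirror the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE CUSTOMERS
                (ID INTEGER, NAME TEXT, AGE INTEGER,
                 ADDRESS TEXT, SALARY REAL)""")
conn.execute(
    "INSERT INTO CUSTOMERS VALUES (1, 'Ramesh', 32, 'Ahmedabad', 2000.00)")

# Step 1: fetch the column names and sort them alphabetically.
# PRAGMA table_info rows are (cid, name, type, notnull, default, pk).
cols = sorted(row[1] for row in conn.execute("PRAGMA table_info(CUSTOMERS)"))

# Step 2: build and run the dynamic query from that list.
query = "SELECT {} FROM CUSTOMERS".format(", ".join(cols))
cur = conn.execute(query)
print([d[0] for d in cur.description])
# -> ['ADDRESS', 'AGE', 'ID', 'NAME', 'SALARY']
```

On a server database you would read the names from information_schema.columns (or ALL_TAB_COLUMNS on Oracle) instead of the PRAGMA, but the string-building step is the same.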
Something like this:
SELECT * FROM Customers ORDER BY CustomerName
Use ORDER BY column_that_you_want_to_order_by ASC.
For example, with your table:
Select * From Customers
ORDER BY CustomerName ASC
select * from table order by column_name;
You can simply use this query:
SELECT [ADDRESS], [AGE], [ID], [NAME], [SALARY] FROM ...
The result should be the output you expect.

I want to combine an unknown number of rows into one

Here is an example of data:
TABLE: DRIVERS
ID    | FNAME | LNAME
------+-------+------
WR558 | WILL  | RIKER
WW123 | WALT  | WHITE
TABLE: ACCIDENTS
DRIVER | ACCIDENT_NBR | ACCI_DATE  | ACCI_CITY | ACCI_ST
-------+--------------+------------+-----------+--------
WW123  | 4-7777       | 2014-01-01 | Chicago   | IL
WW123  | 4-7782       | 2014-01-03 | Houston   | TX
WW123  | 4-7988       | 2014-01-15 | El Paso   | NM
There could be any number of accidents listed for a driver in this table, or there could be none.
What I need to see is this:
ID | FNAME | LNAME | ACCIDENT1_NBR | ACCI1_DATE | ACCI1_CITY | ACCI1_ST | ACCIDENT2_NBR | ACCI2_DATE | ACCI2_CITY | ACCI2_ST | ACCIDENT3_NBR | ACCI3_DATE | ACCI3_CITY | ACCI3_ST | ... | ACCIDENT10_NBR | ACCI10_DATE | ACCI10_CITY | ACCI10_ST
------+-------+--------+---------------+------------+------------+----------+---------------+
WR558 | WILL | RIKER | | | | | | ...
WW123 | WALT | WHITE | 4-7777 | 2014-01-01 | Chicago | IL | 4-7782 | ...
I need to pull the driver info and the 10 most recent accidents (if any) onto this one line. I'm unsure whether I need to use a PIVOT or a FETCH; either way, I'm not sure how to iterate into the columns as needed.
Anyone's assistance is greatly appreciated!
I'm using SQL Server 2012, and this will ultimately be pulled into an .XLS SSRS file. The pipes I included are only for display and will not be part of the final result.
You should use PIVOT for this; as ugly as it will be, it is the correct choice if coding a C# solution is not viable.
Here is an example: Simple Pivot sample
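For what it's worth, this kind of pivot boils down to ranking each driver's accidents with ROW_NUMBER() and then spreading the ranks into columns with conditional aggregation. The sketch below is only an illustration, written in Python with SQLite so it is self-contained, and with two accident slots instead of ten; on SQL Server 2012 the same shape works with T-SQL's PIVOT or CASE expressions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DRIVERS (ID TEXT, FNAME TEXT, LNAME TEXT);
CREATE TABLE ACCIDENTS (DRIVER TEXT, ACCIDENT_NBR TEXT, ACCI_DATE TEXT);
INSERT INTO DRIVERS VALUES ('WR558','WILL','RIKER'), ('WW123','WALT','WHITE');
INSERT INTO ACCIDENTS VALUES
  ('WW123','4-7777','2014-01-01'),
  ('WW123','4-7782','2014-01-03'),
  ('WW123','4-7988','2014-01-15');
""")

rows = conn.execute("""
SELECT d.ID, d.FNAME, d.LNAME,
       -- slot 1 = most recent accident, slot 2 = next, and so on
       MAX(CASE WHEN a.rn = 1 THEN a.ACCIDENT_NBR END) AS ACCIDENT1_NBR,
       MAX(CASE WHEN a.rn = 2 THEN a.ACCIDENT_NBR END) AS ACCIDENT2_NBR
FROM DRIVERS d
LEFT JOIN (SELECT *,
                  ROW_NUMBER() OVER (PARTITION BY DRIVER
                                     ORDER BY ACCI_DATE DESC) AS rn
           FROM ACCIDENTS) a ON a.DRIVER = d.ID
GROUP BY d.ID, d.FNAME, d.LNAME
""").fetchall()
print(rows)
```

Drivers with no accidents survive the LEFT JOIN with NULLs in every accident slot, which matches the WR558 row in the expected output; extending to 10 slots means repeating the MAX(CASE ...) expression per column (accident number, date, city, state) up to rn = 10.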

Access(SQL) - Count distinct fields and group by field name

I am in the middle of making an Access database for a client and am stuck on how to do this.
I have a table called Observations with something like this:
Error Identified | Error Cat | ... | So on
No               |           |     |
Yes              | Dave3     |     |
Yes              | Dave      |     |
Yes              | Dave3     |     |
Yes              | Dave5     |     |
Yes              | Dave      |     |
Yes              | Dave6     |     |
Yes              | Dave6     |     |
Yes              | Dave      |     |
I want to count the number of occurrences of each [Error Cat] where [Error Identified] is Yes, so it would be:
Error Cat | Count |
Dave      | 3     |
Dave3     | 2     |
Dave5     | 1     |
Dave6     | 2     |
What is the Access SQL to make this happen?
I have tried hard but it just won't run.
Please help.
SELECT [Error Cat], COUNT(*) AS totalCount
FROM Observations
WHERE [Error Identified] = 'Yes'
GROUP BY [Error Cat]
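As a runnable illustration of the grouped count, here it is in Python with SQLite rather than Access (Access's bracketed [Error Cat] names become double-quoted identifiers, and the data mirrors the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Observations ("Error Identified" TEXT, "Error Cat" TEXT)')
conn.executemany(
    "INSERT INTO Observations VALUES (?, ?)",
    [("No", None), ("Yes", "Dave3"), ("Yes", "Dave"), ("Yes", "Dave3"),
     ("Yes", "Dave5"), ("Yes", "Dave"), ("Yes", "Dave6"),
     ("Yes", "Dave6"), ("Yes", "Dave")],
)
counts = conn.execute("""
    SELECT "Error Cat", COUNT(*) AS totalCount
    FROM Observations
    WHERE "Error Identified" = 'Yes'
    GROUP BY "Error Cat"
    ORDER BY "Error Cat"
""").fetchall()
print(counts)  # [('Dave', 3), ('Dave3', 2), ('Dave5', 1), ('Dave6', 2)]
```

The WHERE clause filters out the "No" row before grouping, so only categories with at least one Yes appear in the result.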

How do I add a "total_num" field by aggregating equal values with SQL?

Well, I apologize for the horrible question title. I am not a SQL or database guy, so I find I am somewhat lacking the vocabulary to describe succinctly what I am trying to do. So I will just pose the question as an anecdote.
I have two tables:
+-------+--------+------------+
| STATE | REGION | CAPITAL    |
+-------+--------+------------+
| WA    | X      | Olympia    |
| CA    | IX     | Sacramento |
| TX    | VI     | Austin     |
+-------+--------+------------+
And:
+-------+--------+-------+
| NAME  | NUMBER | STATE |
+-------+--------+-------+
| Tom   | 1      | WA    |
| Dick  | 5      | WA    |
| Larry | 45     | WA    |
| Joe   | 65     | TX    |
| John  | 3      | CA    |
+-------+--------+-------+
How can I query the second table so that I can "append" a fourth field to the first table that stores a total count of the people in each state, such that the first table would then look like this:
+-------+--------+------------+-------+
| STATE | REGION | CAPITAL    | COUNT |
+-------+--------+------------+-------+
| WA    | X      | Olympia    | 3     |
| CA    | IX     | Sacramento | 1     |
| TX    | VI     | Austin     | 1     |
+-------+--------+------------+-------+
Thanks in advance.
SELECT f.STATE, f.REGION, f.CAPITAL, COUNT(*) AS "COUNT"
FROM firsttable f
JOIN secondtable s ON s.STATE = f.STATE
GROUP BY f.STATE, f.REGION, f.CAPITAL
ORDER BY COUNT(*) DESC
Try this:
select sc.state, sc.region, sc.capital, count(*) as tot
from State_City_Table sc
join people_table pt on pt.state = sc.state
group by sc.state, sc.region, sc.capital
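Both answers are the same join-plus-GROUP BY pattern. Here it is run end to end in Python with SQLite against the question's sample data (the table names are placeholders of my own):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE states (STATE TEXT, REGION TEXT, CAPITAL TEXT);
CREATE TABLE people (NAME TEXT, NUMBER INTEGER, STATE TEXT);
INSERT INTO states VALUES ('WA','X','Olympia'), ('CA','IX','Sacramento'),
                          ('TX','VI','Austin');
INSERT INTO people VALUES ('Tom',1,'WA'), ('Dick',5,'WA'), ('Larry',45,'WA'),
                          ('Joe',65,'TX'), ('John',3,'CA');
""")
rows = conn.execute("""
    SELECT s.STATE, s.REGION, s.CAPITAL, COUNT(*) AS cnt
    FROM states s
    JOIN people p ON p.STATE = s.STATE
    GROUP BY s.STATE, s.REGION, s.CAPITAL
    ORDER BY cnt DESC, s.STATE
""").fetchall()
print(rows)
# [('WA', 'X', 'Olympia', 3), ('CA', 'IX', 'Sacramento', 1),
#  ('TX', 'VI', 'Austin', 1)]
```

Note that an inner join drops any state with zero people; if such states must appear with a count of 0, use LEFT JOIN and COUNT(p.STATE) instead of COUNT(*).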