Is lateral view + json_tuple better than multiple get_json_object calls? - sql

In Hive, I want to generate three columns from a single column that contains JSON data.
There are two options:
1- Lateral view + json_tuple:
select * from t1
lateral view json_tuple(t1.info,'user_id','gender', 'age') t2 as uid, gender, age
2- Multiple get_json_object:
select
get_json_object(info, '$.user_id') AS uid,
get_json_object(info, '$.gender') AS gender,
get_json_object(info, '$.age') AS age
from
t1
In my tests, the first option runs faster. Could anyone explain why? Is it because json_tuple processes the JSON once, while the get_json_object version reads it three times?
Thanks for your help in advance.
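The hypothesis in the question (one pass over the JSON versus one pass per extracted field) can be modelled outside Hive. The following is only a sketch of that idea in Python, not Hive's actual implementation; the helper names `get_json_object_style` and `json_tuple_style` are hypothetical stand-ins, and the sample `info` value is made up.

```python
import json

parse_calls = 0

def parse(raw):
    """Parse a JSON string and count how often parsing happens."""
    global parse_calls
    parse_calls += 1
    return json.loads(raw)

def get_json_object_style(raw, key):
    # Assumed model: every call parses the whole string again.
    return parse(raw).get(key)

def json_tuple_style(raw, *keys):
    # Assumed model: parse once, then read every requested key
    # from the same parsed object.
    doc = parse(raw)
    return tuple(doc.get(k) for k in keys)

info = '{"user_id": 7, "gender": "F", "age": 31}'  # made-up sample row
keys = ("user_id", "gender", "age")

parse_calls = 0
per_field = tuple(get_json_object_style(info, k) for k in keys)
per_field_parses = parse_calls

parse_calls = 0
one_pass = json_tuple_style(info, *keys)
one_pass_parses = parse_calls

print(per_field_parses, one_pass_parses)
```

Under this model both approaches return the same values, but the per-field version parses the string once per column, which would grow with the number of extracted fields.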

Related

divide one row record into 2 rows in sql server

In my SQL table, I want to split one row into two rows.
A sample data set is given below.
I need to split it into two rows like this.
This is the query I used:
SELECT FRCS2.[CID],FRCS2.[DATE],FRCS2.[Status],
FRCS1.[ID],FRCS1.[DATE],FRCS1.[Status]
FROM #temp FRCS1
INNER JOIN #temp FRCS2
ON FRCS1.[ID] = FRCS2.[ID]
Please help me to solve this.
Thanks.
In SQL Server, I would recommend apply:
select v.*
from t cross apply
(values (id1, date1, status1), (id2, date2, status2)
) v(id, date, status);
Note that column names cannot be duplicated in a table. So, this assumes that the columns actually have different names.
This approach is preferred over union all because it only scans the table once. For a small table, the performance difference is negligible. It is noticeable for larger tables and can be quite significant if the "table" is really a subquery, CTE, or view.
You want UNION ALL here:
SELECT ID1, DATE1, STATUS1 FROM yourTable
UNION ALL
SELECT ID2, DATE2, STATUS2 FROM yourTable;
Note that I assume that your actual columns don't have the same names, because they can't.
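To try the UNION ALL approach without a SQL Server instance, here is a small sketch that runs the same query shape against an in-memory SQLite database from Python. The column names (ID1, DATE1, STATUS1, ID2, DATE2, STATUS2) follow the assumption made in the answers; the sample row is invented. SQLite has no CROSS APPLY, so only the UNION ALL variant is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical wide table: one row carries two (id, date, status) groups.
conn.execute(
    "CREATE TABLE yourTable (ID1, DATE1, STATUS1, ID2, DATE2, STATUS2)"
)
conn.execute(
    "INSERT INTO yourTable VALUES "
    "(1, '2020-01-01', 'open', 2, '2020-01-05', 'closed')"
)

# Each SELECT picks one group of columns; UNION ALL stacks them as rows.
rows = conn.execute(
    """
    SELECT ID1 AS id, DATE1 AS date, STATUS1 AS status FROM yourTable
    UNION ALL
    SELECT ID2, DATE2, STATUS2 FROM yourTable
    """
).fetchall()

for row in rows:
    print(row)
```

The result column names come from the first SELECT, which is why only that branch needs aliases.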

Compare two tables with different column numbers in sql

I am using Sybase for my SQL coding.
I was comparing tables which have the same columns such as follows:
SELECT name, date, time, location
FROM
(SELECT * FROM table1
UNION ALL
SELECT * FROM table2) data
GROUP BY name, date, time, location
HAVING count(*)!=2
Now I want to compare table1 and table2 again, but table2 has an additional column called origin, and I am not sure how to extend my current logic to handle it.
---Intention: to be able to compare two tables with different numbers of columns
---How can I modify this code to do it?
I want the query to show the differences between the two tables.
Could someone guide me? I don't want to use joins or MINUS; I would prefer the UNION approach.
If you want to union two different tables, you must make up missing columns. E.g.:
SELECT name, date, time, location, origin FROM table2
UNION ALL
SELECT name, date, time, location, null as origin FROM table1
I think the problem is that you actually want to ignore the extra column. The problem with using select * is that things can change.
SELECT name, date, time, location
FROM (
SELECT name, date, time, location FROM table1
UNION ALL
SELECT name, date, time, location FROM table2
) data
GROUP BY name, date, time, location
HAVING count(*) != 2
If you're not going to use select * in one half of the union, there isn't any reason to include origin in the first place.
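The explicit-column version of the comparison can be checked on a small data set. This is a minimal sketch using Python's sqlite3 module rather than Sybase; the rows are invented, and it assumes neither table contains duplicate rows (otherwise a count of 2 no longer means "present in both").

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE table1 (name, date, time, location);
    CREATE TABLE table2 (name, date, time, location, origin);
    INSERT INTO table1 VALUES ('ann', '2020-01-01', '09:00', 'NY');
    INSERT INTO table1 VALUES ('bob', '2020-01-02', '10:00', 'LA');
    INSERT INTO table2 VALUES ('ann', '2020-01-01', '09:00', 'NY', 'web');
    INSERT INTO table2 VALUES ('cat', '2020-01-03', '11:00', 'SF', 'app');
    """
)

# Rows present in both tables appear twice after UNION ALL;
# anything with a different count exists in only one of them.
diff = conn.execute(
    """
    SELECT name, date, time, location
    FROM (
        SELECT name, date, time, location FROM table1
        UNION ALL
        SELECT name, date, time, location FROM table2
    ) data
    GROUP BY name, date, time, location
    HAVING count(*) != 2
    ORDER BY name
    """
).fetchall()

for row in diff:
    print(row)
```

Here 'ann' matches in both tables and is filtered out, while 'bob' (only in table1) and 'cat' (only in table2) are reported as differences, regardless of the extra origin column.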

Any easier way to group by individual columns in Hive/Impala?

I need to output a report of users by age, gender, education, income, etc. from our database. However, there are about 40 variables. It seems silly to group by each variable one by one, but I'm not aware of other ways, and I don't know how to write a UDF to solve it yet. I'd appreciate your help.
It's not that complicated, but it does come up a lot in daily work. My work environment is Hive/Impala.
A GROUP BY over input rows cannot be implemented inside a UDF, UDAF, or UDTF:
A UDF takes a single input row and produces a single output row.
A UDAF aggregates values across rows, but does not group the rows itself.
A UDTF transforms a single input row into multiple output rows.
One workable solution is to write one query per column, combine them with UNION ALL, and display the result or insert it into a table.
Sample Query:
SELECT *
FROM
(
SELECT COUNT(column1),column1 FROM table GROUP BY column1
UNION ALL
SELECT COUNT(column2),column2 FROM table GROUP BY column2
UNION ALL
SELECT COUNT(column3),column3 FROM table GROUP BY column3
) s
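With about 40 variables, the UNION ALL query is tedious to write by hand, so it may be easier to generate it. The sketch below builds the statement from a list of column names and runs it on an in-memory SQLite database for illustration. The function name `per_column_counts_sql`, the `users` table, and its data are hypothetical; the extra `variable` label column and the cast to text are additions so the stacked rows share one type and stay distinguishable (in Hive the cast type would be STRING rather than TEXT).

```python
import sqlite3

def per_column_counts_sql(table, columns):
    """Build one GROUP BY query per column and stack them with UNION ALL.

    Table and column names are interpolated directly, so they must come
    from a trusted list, not from user input.
    """
    parts = [
        f"SELECT '{col}' AS variable, CAST({col} AS TEXT) AS value, "
        f"COUNT(*) AS n FROM {table} GROUP BY {col}"
        for col in columns
    ]
    return "\nUNION ALL\n".join(parts)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (age, gender)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(30, "F"), (30, "M"), (41, "F")],
)

sql = per_column_counts_sql("users", ["age", "gender"])
counts = sorted(conn.execute(sql).fetchall())

for row in counts:
    print(row)
```

The generated SQL text can be printed and pasted into Hive or Impala after adjusting the cast type.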

Join two SQL Server tables [duplicate]

I have two tables, and I need a SELECT or JOIN command in SQL that produces a third table like the one in the image below.
My two tables are like this:
I only know simple things about the JOIN command in SQL. Should I use a join or something else?
I do not want to store the third table in my database; I only need it temporarily (something like a virtual table). Please help!
You are actually looking for UNION or UNION ALL.
First of all, there is no condition on which to JOIN tables (review your documentation on JOIN) and JOIN is used for retrieving information about one logical element, let's say Event in your case, which has details stored in more tables.
Secondly, JOIN will make one result set with all of the columns of your two tables, when actually you are not trying to get all columns, but all rows.
For this you will have to use UNION or UNION ALL like this:
SELECT
EventID,
ID,
EventName,
Date,
Pic,
Privacy
FROM Table1
UNION ALL
SELECT
PLID AS EventID,
ID AS ID,
PlaceName AS EventName,
Date AS Date,
NULL AS Pic,
NULL AS Privacy
FROM Table2
In order to filter the combined result set returned by the queries above, you will need to wrap the SELECT statements in another SELECT and use a WHERE clause at that level, like below:
SELECT *
FROM (SELECT
EventID,
ID,
EventName,
Date,
Pic,
Privacy
FROM Table1
UNION ALL
SELECT
PLID AS EventID,
ID AS ID,
PlaceName AS EventName,
Date AS Date,
NULL AS Pic,
NULL AS Privacy
FROM Table2) AS Result
WHERE Date > '2014-05-26'
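The NULL-padding and the outer filter can be tried on a toy data set. This is a sketch using Python's sqlite3 module; the column lists follow the answer above, while the two sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Table1 (EventID, ID, EventName, Date, Pic, Privacy);
    CREATE TABLE Table2 (PLID, ID, PlaceName, Date);
    INSERT INTO Table1 VALUES (1, 10, 'Concert', '2014-05-20', 'c.jpg', 'public');
    INSERT INTO Table2 VALUES (7, 10, 'Museum', '2014-06-01');
    """
)

union_sql = """
    SELECT EventID, ID, EventName, Date, Pic, Privacy FROM Table1
    UNION ALL
    SELECT PLID AS EventID, ID, PlaceName AS EventName, Date,
           NULL AS Pic, NULL AS Privacy
    FROM Table2
"""

# All rows from both tables; Table2 rows carry NULL for the missing columns.
all_rows = conn.execute(union_sql + " ORDER BY Date").fetchall()

# Filtering happens on the wrapped result, not inside each branch.
recent = conn.execute(
    f"SELECT * FROM ({union_sql}) AS Result WHERE Date > '2014-05-26'"
).fetchall()

print(all_rows)
print(recent)
```

Nothing is stored permanently here: the combined result exists only for the duration of the query, which matches the "virtual table" the question asks for.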
What you're looking to do is a UNION or UNION ALL, not a join. See: http://www.w3schools.com/sql/sql_union.asp
UNION combines two tables without connecting their content. Your example shows all 4 records from the original tables unmodified.
A JOIN solution links the two tables. It's very common and you will probably use it if you're building a relational database, but it won't give you the example result.
Since the two tables don't have the same number of columns, you have to pad the shorter one with NULLs:
SELECT EventID, EventName, Date, Pic, privacy FROM [table 1]
UNION ALL
SELECT PLID, PlaceName, Date, null, null FROM [table 2]
You want one result from two different tables, so you need a unified result set from each, which you get by renaming columns in the SELECT statement:
SELECT `EventID` AS `ObjectID`, `EventName` AS `ObjectName`, .... FROM table_1 ...
and similarly for table_2.
Then combine to one result set:
SELECT `ID` AS `ObjectID`, `EventName` AS `ObjectName`, .... FROM table_1 ...
UNION
SELECT `PlaceID` AS `ObjectID`, `PlaceName` AS `ObjectName`, .... FROM table_2 ...
My mistake, I didn't take the time to examine the pictures fully. You would have to use UNION, since you want to return what is in both tables.

Oracle Query on 24 tables with same columns

I have 24 tables (table1, table2, table3, ...) with the same columns, which keep track of customer data on an hourly basis, and one rate table that records which rate applies to a specific hour. rateId is a foreign key in all 24 of the other tables. I need a dynamic query to fetch data from those tables by date and time. Can anyone provide an example or guide me toward that query?
You should not store the same data in 24 different tables. Partitioning (mentioned in a comment) is a very good solution when you have lots and lots of data and want to split it for performance reasons.
In any case, one way to structure your query is:
select t.*
from ((select * from table1) union all
(select * from table2) union all
. . .
(select * from table24)
) t
where <whatever you want>
You can then join this to whatever other tables you like (using rateId, for instance), filter on the fields, or whatever.
If you need to know the table where something came from, then you can get this as well:
select t.*
from ((select t.*, 1 as which from table1 t) union all
(select t.*, 2 as which from table2 t) union all
. . .
(select t.*, 24 as which from table24 t)
) t
where <whatever you want>
Note: I am using * here because the OP explicitly states that the tables have the same format. Even so, it is probably a good idea to list all the columns in each subquery.
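Writing 24 nearly identical subqueries by hand is error-prone, so the statement can be generated instead. The sketch below builds the UNION ALL with the `which` column and runs it on SQLite with three small tables as a stand-in for the 24 Oracle tables. The helper name `union_all_tables_sql`, the `customer` column, and the sample rows are hypothetical, and the parentheses around each subquery are dropped because SQLite does not accept them.

```python
import sqlite3

def union_all_tables_sql(prefix, count):
    """Build 'SELECT t.*, i AS which FROM <prefix>i t' for i = 1..count.

    The prefix is interpolated directly, so it must be a trusted name.
    """
    parts = [
        f"SELECT t.*, {i} AS which FROM {prefix}{i} t"
        for i in range(1, count + 1)
    ]
    return "\nUNION ALL\n".join(parts)

conn = sqlite3.connect(":memory:")
for i in range(1, 4):
    conn.execute(f"CREATE TABLE table{i} (customer, rateId)")
    conn.execute(
        f"INSERT INTO table{i} VALUES (?, ?)", (f"cust{i}", i * 10)
    )

inner = union_all_tables_sql("table", 3)
sql = f"SELECT * FROM ({inner}) u WHERE rateId >= 20 ORDER BY which"
rows = conn.execute(sql).fetchall()

for row in rows:
    print(row)
```

Calling `union_all_tables_sql("table", 24)` would produce the text for all 24 tables, which could then be used as the body of a view.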
EDIT:
As Bill suggests in the comment, you might want to turn this into a view. That way, you can write lots of queries on the tables, without worrying about the detailed tables. (And, better yet, you can fix the data structure by combining the tables, then change the view, and existing queries will work).