Hive - getting the column count of a table

How can I get the column count of a Hive table using HQL? I know we can use describe tablename to get the names of the columns. How do we get their count?

create table mytable(i int,str string,dt date, ai array<int>,strct struct<k:int,j:int>);
select count(*)
from (select transform ('')
using 'hive -e "desc mytable"'
as col_name,data_type,comment
) t
;
5
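The trick above shells out to hive and counts the rows of the describe output. The same counting logic can be sketched outside Hive in Python; the describe text below is a hypothetical example of what hive -e "desc mytable" might print for this table:

```python
def column_count(describe_text):
    # Each non-empty line of "desc" output describes one column:
    # col_name <tab> data_type <tab> comment
    return sum(1 for line in describe_text.splitlines() if line.strip())

# Hypothetical output of: hive -e "desc mytable"
desc_output = (
    "i\tint\t\n"
    "str\tstring\t\n"
    "dt\tdate\t\n"
    "ai\tarray<int>\t\n"
    "strct\tstruct<k:int,j:int>\t\n"
)
print(column_count(desc_output))  # -> 5
```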
Some additional playing around:
create table mytable (id int,first_name string,last_name string);
insert into mytable values (1,'Dudu',null);
select size(array(*)) from mytable limit 1;
This is not bulletproof, since not all combinations of column types can be combined into an array.
It also requires that the table contain at least one row.
Here is a more complex but more robust solution (type-wise), though it also requires that the table contain at least one row:
select size(str_to_map(val)) from (select transform (struct(*)) using 'sed -r "s/.(.*)./\1/"' as val from mytable) t;
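To see why this counts columns, here is the same logic sketched in Python: the sed step strips the outer braces from the serialized struct, and str_to_map (with its default ',' and ':' delimiters) then yields one entry per column. The serialized row text below is an assumed example; like the SQL version, this breaks if a string value itself contains a comma:

```python
def column_count_from_struct(row_text):
    # Mimic the sed step: drop the first and last character (the braces),
    # then count the comma-separated key:value entries, as str_to_map does.
    inner = row_text[1:-1]
    return len(inner.split(','))

# Assumed serialized form of struct(*) for the row (1, 'Dudu', NULL):
print(column_count_from_struct('{"id":1,"first_name":"Dudu","last_name":null}'))  # -> 3
```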

Related

Redshift - Extract value matching a condition in Array

I have a Redshift table with the following column
How can I extract the value starting by cat_ from this column please (there is only one for each row and at different position in the array)?
I want to get those results:
cat_incident
cat_feature_missing
cat_duplicated_request
Thanks!
There is no easy way to extract multiple values from within one column in SQL (or at least not in the SQL used by Redshift).
You could write a User-Defined Function (UDF) that returns a string containing those values, separated by newlines. Whether this is acceptable depends on what you wish to do with the output (e.g. JOIN against it).
Another option is to pre-process the data before it is loaded into Redshift, to put this information in a separate one-to-many table, with each value in its own row. It would then be trivial to return this information.
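Since Redshift scalar UDFs are written in Python, the UDF option could be sketched like this. The function name and the newline separator are assumptions, and the CREATE FUNCTION wrapper is only indicated in the comment:

```python
# Hypothetical body of a Redshift scalar UDF, registered roughly as:
#   CREATE FUNCTION f_extract_cat(tags varchar) RETURNS varchar
#   IMMUTABLE AS $$ ... $$ LANGUAGE plpythonu;
def f_extract_cat(tags):
    if tags is None:
        return None
    # Keep every comma-separated value that starts with 'cat_',
    # joined by newlines so multiple matches survive in one string.
    matches = [t.strip() for t in tags.split(',') if t.strip().startswith('cat_')]
    return '\n'.join(matches) if matches else None

print(f_extract_cat('blah,cat_incident,mcr_close_ticket'))  # -> cat_incident
```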
You can do this using a tally table (a table of numbers). Check this link for information on how to create one: http://www.sqlservercentral.com/articles/T-SQL/62867/
Here is an example of how you would use it. In real life you should replace the temporary #tally table with a permanent one.
--create sample table with data
create table #a (tags varchar(500));
insert into #a
select 'blah,cat_incident,mcr_close_ticket'
union
select 'blah-blah,cat_feature_missing,cat_duplicated_request';
--create tally table
create table #tally(n int);
insert into #tally
select 1
union select 2
union select 3
union select 4
union select 5
;
--get tags
select * from
(
select TRIM(SPLIT_PART(a.tags, ',', t.n)) AS single_tag
from #tally t
inner join #a a ON t.n <= REGEXP_COUNT(a.tags, ',') + 1 and n<1000
)
where single_tag like 'cat%'
;
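The tally-table query can be mirrored in Python to check the logic: split_part below behaves like Redshift's SPLIT_PART (1-based index, empty string when out of range), and the nested loop plays the role of the join against the number table:

```python
def split_part(s, delim, n):
    # Mimics Redshift SPLIT_PART: 1-based index, '' when n is out of range.
    parts = s.split(delim)
    return parts[n - 1] if 1 <= n <= len(parts) else ''

rows = [
    'blah,cat_incident,mcr_close_ticket',
    'blah-blah,cat_feature_missing,cat_duplicated_request',
]

# range(1, comma_count + 2) is the tally join condition
# t.n <= REGEXP_COUNT(a.tags, ',') + 1 from the query above.
tags = [split_part(r, ',', n).strip()
        for r in rows
        for n in range(1, r.count(',') + 2)]
print([t for t in tags if t.startswith('cat')])
# -> ['cat_incident', 'cat_feature_missing', 'cat_duplicated_request']
```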
Thanks!
In the end I managed to do it with the following query:
SELECT SUBSTRING(SUBSTRING(tags, charindex('cat_', tags), len(tags)), 0, charindex(',', SUBSTRING(tags, charindex('cat_', tags), len(tags)))) tags
FROM table

How to create temp table in postgresql with values and empty column

I am very new to postgresql. I want to create a temp table containing some values and empty columns. Here is my query, but it does not execute; it gives an error at the comma.
CREATE TEMP TABLE temp1
AS (
SELECT distinct region_name, country_name
from opens
where track_id=42, count int)
What did I do wrong?
How to create a temp table with some columns that has values using select query and other columns as empty?
Just select a NULL value:
CREATE TEMP TABLE temp1
AS
SELECT distinct region_name, country_name, null::integer as "count"
from opens
where track_id=42;
The cast to an integer (null::integer) is necessary, otherwise Postgres wouldn't know what data type to use for the additional column. If you want to supply a different value you can of course use e.g. 42 as "count" instead.
Note that count is a reserved keyword, so you have to use double quotes if you want to use it as an identifier. It would however be better to find a different name.
There is also no need to put the SELECT statement of a CREATE TABLE AS SELECT between parentheses.
Your error comes from the part of your statement near the WHERE clause.
This should work :
CREATE TEMP TABLE temp1 AS
(SELECT distinct region_name,
country_name,
0 as count
FROM opens
WHERE track_id=42)
Try This.
CREATE TEMP TABLE temp1 AS
(SELECT distinct region_name,
country_name,
cast( '0' as integer) as count
FROM opens
WHERE track_id=42);

Calculate average from JSON column

I have a table with a column of JSON data that I want to extract information from. Specifically I just want to get the average value.
Example of what I have:
id speed_data
391982 [{"speed":1.3,"speed":1.3,"speed":1.4,"speed":1.5...
391983 [{"speed":0.9,"speed":0.8,"speed":0.8,"speed":1.0...
Example of what I want:
id avg_speed
391982 1.375
391983 0.875
Any suggestions on how to get this query to work?
select t.*, avg(x.speed)
from tbl t,
json_array_elements(a->'speed') x
order by random()
limit 1
Your json array is messed up, as @posz commented. It would have to be:
CREATE TABLE tbl (id int, speed_data json);
INSERT INTO tbl VALUES
(391982, '{"speed":[1.3,1.3,1.4,1.5]}')
, (391983, '{"speed":[0.9,0.8,0.8,1.0]}');
Your query is twisted in multiple ways, too. It would work like this in pg 9.3:
SELECT t.id, avg(x::text::numeric) AS avg_speed
FROM tbl t
, json_array_elements(speed_data->'speed') x
GROUP BY t.id;
SQL Fiddle.
In the upcoming pg 9.4 we can simplify with the new json_array_elements_text() (also less error-prone in the cast):
SELECT t.id, avg(x::numeric) AS avg_speed
FROM tbl t
, json_array_elements_text(speed_data->'speed') x
GROUP BY t.id;
More Details:
How to turn json array into postgres array?
Aside: It would be much more efficient to store this as plain array (numeric[], not json) or in a normalized schema to begin with.
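As a sanity check outside the database, the same per-id average over the corrected JSON layout can be computed in Python:

```python
import json

rows = [
    (391982, '{"speed":[1.3,1.3,1.4,1.5]}'),
    (391983, '{"speed":[0.9,0.8,0.8,1.0]}'),
]

for rid, speed_data in rows:
    speeds = json.loads(speed_data)['speed']
    # round() just keeps floating-point noise out of the printed output
    print(rid, round(sum(speeds) / len(speeds), 3))
# -> 391982 1.375
#    391983 0.875
```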

Create temporary table with fixed values

How do I create a temporary table in PostgreSQL that has one column "AC" and consists of these 4-character values:
Zoom
Inci
Fend
In essence the table has more values, this should just serve as an example.
If you only need the temp table for one SQL query, then you can hard-code the data into a Common Table Expression as follows:
WITH temp_table AS
(
SELECT 'Zoom' AS AC UNION
SELECT 'Inci' UNION
SELECT 'Fend'
)
SELECT * FROM temp_table
see it work at http://sqlfiddle.com/#!15/f88ac/2
(that CTE syntax also works with MS SQL)
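Indeed, that CTE syntax is portable enough to demonstrate from Python with the built-in sqlite3 module (ORDER BY is added only to make the output deterministic, since UNION does not guarantee row order):

```python
import sqlite3

con = sqlite3.connect(':memory:')
# Same hard-coded CTE as above, run against an in-memory SQLite database.
rows = con.execute("""
    WITH temp_table(AC) AS (
        SELECT 'Zoom' UNION SELECT 'Inci' UNION SELECT 'Fend'
    )
    SELECT AC FROM temp_table ORDER BY AC
""").fetchall()
print(rows)  # -> [('Fend',), ('Inci',), ('Zoom',)]
```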
HTH

Using IN with multiple columns

Just a quick question. I'm taking single values manually entered by the user and doing an SQL query comparing them to two columns, like:
SELECT col3,col1,col4 FROM table WHERE
col1='SomeReallyLongText' OR col2='SomeReallyLongText'
The repetition of SomeReallyLongText is tolerable, but my program also supports looping through an Excel document with several hundred rows - which means I'll be doing:
SELECT col3,col1,col4 FROM table WHERE
col1 IN('item1','item2',...'itemN') OR col2 IN('item1','item2',...'itemN')
And the query would get extremely long, which I can't imagine is efficient.
Is there a way to shorten this so two columns can be compared to the same IN(xxx) set?
If not, are there other (more efficient) ways of giving the set of values in the query?
(I'm using C# with .NET 4.0 Client Profile, using Excel Interop to access the file)
I'm not too sure about the performance you'd get with this:
SELECT col3,col1,col4 FROM table
WHERE EXISTS (
SELECT 1
FROM (VALUES
('item1')
, ('item2')
, ...
, ('itemN')
) AS It(m)
WHERE It.m IN (col1, col2, ...)
)
You can create a temp table to store all the values used inside the IN clause
IF OBJECT_ID('tempdb..#Sample') IS NOT NULL DROP TABLE #Sample
Create table #Sample
(name varchar(20))
Insert into #Sample
values
('item1'),('Item2'),....
SELECT col3,col1,col4 FROM table WHERE
col1 IN ( Select name from #Sample) OR col2 IN(Select name from #Sample)
or if you are using Linq to SQL then you can store the excel data in collection and use Contains method to query the DB
var excelVal = new string[] { "item1", "item2", ... };
var result = from x in Table
where excelVal.Contains(x.Col1) || excelVal.Contains(x.Col2)
select x;
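The temp-table approach can be demonstrated end to end from Python with the built-in sqlite3 module; the table name, column names, and sample data here are made up to match the question:

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute("CREATE TABLE t (col1 TEXT, col2 TEXT, col3 TEXT, col4 TEXT)")
con.execute("""INSERT INTO t VALUES
    ('item1', 'x', 'a', 'b'),
    ('y', 'item2', 'c', 'd'),
    ('y', 'z', 'e', 'f')""")

# Load the Excel values once into a temp table, then reuse it for both columns
# instead of repeating a long IN (...) literal list twice.
excel_values = [('item1',), ('item2',)]
con.execute("CREATE TEMP TABLE sample (name TEXT)")
con.executemany("INSERT INTO sample VALUES (?)", excel_values)

rows = con.execute("""
    SELECT col3, col1, col4 FROM t
    WHERE col1 IN (SELECT name FROM sample)
       OR col2 IN (SELECT name FROM sample)
    ORDER BY col3
""").fetchall()
print(rows)  # -> [('a', 'item1', 'b'), ('c', 'y', 'd')]
```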