select only specific number of columns from Table - Hive - hive

How to select only specific number of columns from a table in hive. For Example, If I have Table with 50 Columns, then how Can I just select first 25 columns ? Is there any easy way to do it rather than hard coading the column names.

I guess that you're asking about using the order in which you defined your columns in your CREATE TABLE statement. No, that's not possible in Hive for the moment.
You could do the trick by adding a new column COLUMN_NUMBER and use that in your WHERE statements, but in that case I would really think twice of the trade off between spending some more time typing your queries and messing your whole table design by adding unnecessary columns. Apart from the fact that if you need to change your table schema in the future (for instance, by adding a new column), adapting your previous code with different column numbers would be painful.

Related

How to add column to an existing table and calculate the value

Table info:
I want to add new column and calculated the different of the alarmTime column with this code:
ALTER TABLE [DIALinkDataCenter].[dbo].[DIAL_deviceHistoryAlarm]
ADD dif AS (DATEDIFF(HOUR, LAG((alarmTime)) OVER (ORDER BY (alarmTime)), (alarmTime)));
How to add the calculation on the table? Because always there's error like this:
Windowed functions can only appear in the SELECT or ORDER BY clauses.
You are using the syntax for a generated virtual column that shows a calculated value (ADD columnname AS expression).
This, however, only works on values found in the same row. You cannot have a generated column that looks at other rows.
If you consider now to create a normal column and fill it with calculated values, this is something you shouldn't do. Don't store values redundantly. You can always get the difference in an ad-hoc query. If you store this redundantly instead, you will have to consider this in every insert, update, and delete. And if at some time you find rows where the difference doesn't match the time values, which column holds the correct value then and which the incorrect one? alarmtime or dif? You won't be able to tell.
What you can do instead is create a view for convenience:
create view v_dial_devicehistoryalarm as
select
dha.*,
datediff(hour, lag(alarmtime) over (order by alarmtime), alarmtime) as dif
from dial_devicehistoryalarm dha;
Demo: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=b7f9b5eef33e72955c7f135952ef55b5
Remember though, that your view will probably read and sort the whole table everytime you access it. If you query only a certain time range, it will be faster hence to calculate the differences in your query instead.

Removing rows with duplicated column values based on another column's value

Hey guys, maybe this is a basic SQL qn. Say I have this very simple table, I need to run a simple sql statement to return a result like this:
Basically, the its to dedup Name based on it's row's Value column, whichever is larger should stay.
Thanks!
Framing the problem correctly would help you figure it out.
"Deduplication" suggests altering the table - starting with a state with duplicates, ending with a state without them. Usually done in three steps (getting the rows without duplicates into temp table, removing original table, renaming temp table).
"Removing rows with duplicated column values" also suggests alteration of data and derails train of thought.
What you do want is to get the entire table, and in cases where the columns you care about have multiple values attached get the highest one. One could say... group by columns you care about? And attach them to the highest value, a maximum value?
select id,name,max(value) from table group by id,name

SQL to identify duplicate columns from table having hundreds of column

I've 250+ columns in customer table. As per my process, there should be only one row per customer however I've found few customers who are having more than one entry in the table
After running distinct on entire table for that customer it still returns two rows for me. I suspect one of column may be suffixed with space / junk from source tables resulting two rows of same information.
select distinct * from ( select * from customer_table where custoemr = '123' ) a;
Above query returns two rows. If you see with naked eye to results there is not difference in any of column.
I can identify which column is causing duplicates if I run query every time for each column with distinct but thinking that would be very manual task for 250+ columns.
This sounds like very dumb question but kind of stuck here. Please suggest if you have any better way to identify this, thank you.
Solving this one-time issue with sql is too much effort. Simply copy-paste to excel, transpose data into columns and use some simple function like "if a==b then 1 else 0".

How to repeat records in access table number of times

I need your assistance to figure out how to achieve the following in MS access database.
I have a table with a lot of columns but one of them has a numeric value that will be used as how many times will the record will be repeated.
I need to make another table with repeated records based on Count column.
Build a numbers (aka tally) table (you can google it). I'll call it tblNumbers. Then all you need to do is create a query SELECT <yourTable>.* FROM <yourTable>, tblNumbers WHERE tblNumbers.Number <= <yourTable>.<numberField>

SQL field as sum of other fields

This is not query related, what I would like to know is if it's possible to have a field in a column being displayed as a sum of other fields. A bit like Excel does.
As an example, I have two tables:
Recipes
nrecepie integer
name varchar(255)
time integer
and the other
Instructions
nintrucion integer
nrecepie integer
time integer
So, basically as a recipe has n instructions I would like that
recipes.time = sum(intructions.time)
Is this possible to be done in create table script?? if so, how?
You can use a view:
CREATE VIEW recipes_with_time AS
SELECT nrecepie, name, SUM(Instructions.time) AS total_time
FROM Recepies
JOIN Instructions USING (nrecepie)
GROUP BY Recepies.nrecepie
If you really want to have that data in the real table, you must use a trigger.
This could be done with an INSERT/UPDATE/DELETE trigger. Every time data is changed in table Instructions, the trigger would run and update the time value in Recepies.
You can use a trigger to update the time column everytime the instructions table is changed, but a more "normal" (less redundant) way would be to compute the time column via a group by clause on a join between the instructions and recepies [sic] table.
In general, you want to avoid situations like that because you're storing derived information (there are exceptions for performance reasons). Therefore, the best solution is to create a view as suggested by AndreKR. This provides an always-correct total that is as easy to SELECT from the database as if it were in an actual, stored column.
Depends pn the database vendor... In SQL Server for example, you can create a column that calculates it's value based on the values of other columns in the same row. they are called calculated columns, and you do it like this:
Create Table MyTable
(
colA Integer,
colB Integer,
colC Intgeer,
SumABC As colA + colB + colC
)
In general just put the column name you want, the word 'as' and the formula or equation to ghenerate the value. This approach uses no aditonal storage, it calculates thevalue each time someone executes a select aganist it, so the table profile remains narrower, and you get better performance. The only downsode is you cannot put an index on a calculated column. (although there is a flag in SQL server that allows you to specify to the database that it should persist the value whenever it is created or updated... In which case it can be indexed)
In your example, however, you are accessing data from multiple rows in another table. To do this, you need a trigger as suggested by other respondants.