Convert string into very large integer in Hive

I have numeric values stored in a string column named "hst" in a Hive table with 1,636,626 rows. To perform arithmetic operations (comparison, difference), I need to convert the hst column into very large integers to preserve all the digits. Here's a sample of my data:
id hst
A1 155836976724851034470045871285935636480
A2 55836976724791053359504802768816491263
B1 55836977111335639658316742086388875264
A3 55836977111354662261430576153184174079
C2 55836926053814078414548020414090575872
C4 55836926053833373226361854480885874687
B2 55836926013959368986746057541906857984
B4 55836926013959368635392801615616409599
C3 55836976724870256360155492454040600576
I tried the DECIMAL type:
SELECT cast('55836976724791053359504802768816491263' as DECIMAL(38, 0))
but since the maximum length of the field is 39 digits and the DECIMAL type allows at most 38, it doesn't work for the first value of the sample, 155836976724851034470045871285935636480.
Does anyone have an idea how to achieve that?
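One workaround worth noting (a sketch, not from the original thread): since these values are non-negative integers without leading zeros, numeric order coincides with ordering by (length, then lexicographic text), so comparisons can be done on the strings themselves without any cast — in Hive, something like `length(a) < length(b) OR (length(a) = length(b) AND a < b)`. A quick Python cross-check of the idea against arbitrary-precision integers:

```python
# For non-negative integer strings with no leading zeros, numeric order
# equals ordering by (length, text). This lets 39-digit values be compared
# without casting past DECIMAL(38).
def num_str_less(a: str, b: str) -> bool:
    """Numeric a < b for non-negative integer strings without leading zeros."""
    return (len(a), a) < (len(b), b)

values = [
    "155836976724851034470045871285935636480",
    "55836976724791053359504802768816491263",
    "55836926053814078414548020414090575872",
]

# Cross-check against Python's arbitrary-precision ints.
for a in values:
    for b in values:
        assert num_str_less(a, b) == (int(a) < int(b))
```

This covers comparison; computing differences would still require splitting the value into two smaller DECIMAL chunks or handling it outside Hive.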

Related

Postgresql performing partitioning to find time difference

I am trying to fill column D and column E.
Column A: varchar(64) - unique for each trip
Column B: smallint
Column C: timestamp without time zone (Excel mangled it in the image below, but you can assume this is a timestamp column)
Column D: numeric - need to find out time from origin in minutes
column E: numeric - time to destination in minutes.
Each trip has got different intermediate stations and I am trying to figure out the time it has been since origin and time to destination
Cell D2 = C2 - C2 = 0
Cell D3 = C3 - C2
Cell D4 = C4 - C2
Cell E2 = C6 - C2
Cell E3 = C6 - C3
Cell E6 = C6 - C6 = 0
The main issue is that each trip_id has a different number of stations. I can think about using PARTITION BY on a column but can't figure out how to implement it.
Another sub-question: I am dealing with a very large table (100 million rows). What is the best way PostgreSQL experts implement data modifications? Do you create a sample table from the original data and test everything on the sample before applying the modifications to the original table, or do you use something like BEGIN TRANSACTION on the original data so that you can roll back in case of any error?
PS: Help with question title appreciated.
You don't need to know the number of stops:
with a as (
  select *,
         -- epoch/60 gives total minutes; extract(minute from ...) would
         -- return only the minutes component of the interval
         extract(epoch from c - min(c) over (partition by a)) / 60 as dd,
         extract(epoch from max(c) over (partition by a) - c) / 60 as ee
  from td
)
update td
set d = dd, e = ee
from a
where a.a = td.a and a.b = td.b;
http://sqlfiddle.com/#!17/c9112/1
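To make the window-function logic easier to reason about, here is the same computation in plain Python (a sketch with made-up rows of (trip_id, stop_no, timestamp); the names are illustrative, not from the actual schema):

```python
# Mirror of min(c)/max(c) over (partition by trip): one pass to find the
# first and last timestamp per trip, then one pass to compute minutes
# from origin (dd) and minutes to destination (ee) for each row.
from datetime import datetime

rows = [
    ("T1", 1, datetime(2020, 1, 1, 8, 0)),
    ("T1", 2, datetime(2020, 1, 1, 8, 12)),
    ("T1", 3, datetime(2020, 1, 1, 8, 30)),
]

def trip_minutes(rows):
    bounds = {}
    for trip, _, ts in rows:
        lo, hi = bounds.get(trip, (ts, ts))
        bounds[trip] = (min(lo, ts), max(hi, ts))
    out = []
    for trip, stop, ts in rows:
        lo, hi = bounds[trip]
        out.append((trip, stop,
                    (ts - lo).total_seconds() / 60,   # dd: from origin
                    (hi - ts).total_seconds() / 60))  # ee: to destination
    return out

print(trip_minutes(rows))
```

The origin row gets dd = 0 and the final stop gets ee = 0, matching the D2 and E6 cells in the question.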

Oracle SQL - Combine results from two columns

I am seeking to combine the results of two columns and view them in a single column:
select description1, description2 from daclog where description2 is not null;
The result returns two rows:
1st row:
DESCRIPTION1
Initialization scans sent to RTU 1, 32 bit mask: 0x00000048. Initialization mask bits are as follows: B0 - status dump, B1 - analog dump B2 - accumulator dump, B3 - Group Data Dump, B4 - accumulat
(here begin DESCRIPTION2)
,or freeze, B5 - power fail reset, B6 - time sync.
2nd row:
DESCRIPTION1
Initialization scans sent to RTU 1, 32 bit mask: 0x00000048. Initialization mask bits are as follows: B0 - status dump, B1 - analog dump B2 - accumulator dump, B3 - Group Data Dump, B4 - accumulat
(here begin DESCRIPTION2)
,or freeze, B5 - power fail reset, B6 - time sync.
I need the values of description1 and description2 in the same column.
Is that possible?
Thank you!
You can combine two columns into one by using the || operator:
select description1 || description2 as description from daclog where description2 is not null;
If you would like to use only substrings of each description, you can apply string functions and then combine the results: FNC(description1) || FNC(description2), where FNC is a function returning the desired substring of your column. (Note that in Oracle, concatenating a NULL behaves like concatenating an empty string, so rows with a NULL description2 would simply show description1 on its own.)

Search a column with decimal values for '.'

I have a table with float columns a1 and a2.
The values in a2 are calculated from a1, as a2 = 3*a1
The condition is:
If the value of a1 * 3 is, say, 9.5, I need to store the ceiling value in a2;
i.e., if the digit after the decimal point is greater than or equal to 5 I need the ceiling value, else I need the floor value.
I have written below query
UPDATE table
SET a2 = (CASE WHEN SUBSTRING(CAST((a1 * 3) AS varchar(6)), CHARINDEX('.', (a1 * 3)), 1) >= 5
               THEN CEILING(a1 * 3)
               ELSE FLOOR(a1 * 3) END)
but it obviously returns the below error:
Conversion failed when converting the varchar value '.' to data type int.
since it can't take a varchar into CEILING or FLOOR.
Is there any way by which I can achieve this?
Your help will be greatly appreciated.
The value of a2 keeps changing based on a1: if a1 is 4.5, a2 should be the ceiling of that; if a1 is 4.9, a2 should again be the ceiling value; but if a1 is anything below 4.5, such as 4.3, 4.2, or 4.1, then it should be the floor value.
Any other approach for this would also do except ceiling and floor.
How about using round()? It implements this logic directly in the database:
set a2 = round(a1, 0);
An alternative method is to subtract 0.5:
set a2 = floor(a1 + 0.5)
If you want a2 as a string value (you say you want a float but the code returns a string), then use str():
set a2 = str(a1)
str() rounds by default.
That's because you are trying to compare a varchar with >= 5; cast it to INT. Note also that the digit after the decimal point sits at CHARINDEX(...) + 1 — at CHARINDEX itself the substring is '.' and the cast would still fail:
UPDATE table
SET a2 = (CASE WHEN CAST(SUBSTRING(CAST(a1 * 3 AS varchar(6)),
                                   CHARINDEX('.', CAST(a1 * 3 AS varchar(6))) + 1,
                                   1) AS INT) >= 5
               THEN CEILING(a1 * 3)
               ELSE FLOOR(a1 * 3) END)
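For reference, both answers implement round-half-up (values with a fractional part of .5 or more go up, the rest go down). A quick Python sketch of the equivalent logic, illustrative rather than T-SQL:

```python
import math

def round_half_up(x: float) -> int:
    """floor(x + 0.5): rounds .5 and above up, everything below .5 down."""
    return math.floor(x + 0.5)

# Matches the asker's examples: 4.5 and 4.9 round up, 4.3 rounds down.
print(round_half_up(4.5), round_half_up(4.9), round_half_up(4.3))  # 5 5 4
```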

How to read Date from VB binary-file in another language?

I get a binary file from a VB application consisting of about 1400 records, each a timestamp in Date format followed by 19 Long values.
I am able to read the data in VBA with this function:
Dim myDate As Date
Dim myLong As Long
iFileNum = FreeFile
Open "C:\test.bin" For Binary Access Read As #iFileNum
Do While Not EOF(iFileNum)
Get iFileNum, , myDate
MsgBox(myDate)
For i = 1 To 19
Get iFileNum, , myLong
Next i
Loop
Now I want to read the Date timestamps from the file in Java (I can already read the Long values), but I cannot find any information on how to interpret the 8 bytes of the Date type.
As an example, the first timestamp in binary form is c5 b3 a2 11 80 7b e4 40.
VB output for this is 2014-11-05 0:03:06 AM.
To clarify, I am not looking for a Java implementation, but for information about the binary representation of the VB data type 'Date'
(for which I wasn't able to find any more information than this: http://msdn.microsoft.com/en-us/library/3eaydw6e.aspx , which doesn't help much).
As #RBarryYoung mentions in his comment to the question, VBA stores Date/Time values internally as floating-point Double values. The integer part is the number of days since 1899-12-30 and the fractional part is the time (e.g., 0.25 = 6 AM, 0.5 = Noon, 0.75 = 6 PM).
In the example you gave, the Date/Time (Double) value is stored in bytes as c5 b3 a2 11 80 7b e4 40. Windows has always been a little-endian environment, so by reversing the order of the bytes we know that it corresponds to the 64-bit binary value 0x40e47b8011a2b3c5.
A binary-to-IEEE-754-double converter like this one tells us that the decimal Double value is 4.19480021527777789742685854435E4, and if we ask VBA what the corresponding Date/Time value is, we get:
?CDate(41948.0021527777789742685854435)
2014-11-05 00:03:06
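The decoding above is straightforward in any language; here it is as a Python sketch (standing in for Java): unpack the 8 bytes as a little-endian IEEE-754 double, then add that many days to the 1899-12-30 epoch. (Dates before the epoch use a negative day count with a positive time-of-day fraction in OLE semantics, which this simple offset does not handle.)

```python
# Decode a VB/OLE Automation Date: a little-endian 64-bit float counting
# days (with a fractional time part) since 1899-12-30.
import struct
from datetime import datetime, timedelta

OLE_EPOCH = datetime(1899, 12, 30)

def decode_vb_date(raw: bytes) -> datetime:
    days = struct.unpack("<d", raw)[0]  # '<d' = little-endian double
    return OLE_EPOCH + timedelta(days=days)

# The example bytes from the question, in file order.
ts = decode_vb_date(bytes.fromhex("c5b3a211807be440"))
print(ts.strftime("%Y-%m-%d %H:%M:%S"))  # 2014-11-05 00:03:06
```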

Redis and linked hashes

Hi everyone,
I would like to ask the community for help finding a way to cache our huge flat table by splitting it into multiple hashes, or otherwise.
The sample of table, as an example for structure:
A1 B1 C1 D1 E1 X1
A1 B1 C1 D1 E1 X2
A7 B5 C2 D1 E2 X3
A8 B1 C1 D1 E2 X4
A1 B6 C3 D2 E2 X5
A1 B1 C1 D2 E1 X6
This is our denormalized data; we don't have any ability to normalize it.
So currently we must perform a 'group by' to get the required items; for instance, to get all D* we perform data.GroupBy(A1).GroupBy(B1).GroupBy(C1), and it takes a lot of time.
As a temporary workaround, we create composite string keys:
A1 -> 'list of lines begin A1'
A1:B1 -> 'list of lines begin A1:B1'
A1:B1:C1 -> 'list of lines begin A1:B1:C1'
...
as a cache of results of grouping operations.
The question is: how can this be stored efficiently?
The estimated number of lines in the denormalized data is around 10M records, and since my example has 6 columns, that makes up to 60M entries in the hash. So I'm looking for an approach to look up values in O(1), if possible.
Thanks.
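The composite-key scheme described above is essentially a prefix index. Here is a minimal Python sketch of the idea (the column values and prefix depth are illustrative): each row is indexed under every prefix of its leading columns, so any A, A:B, or A:B:C lookup is a single dictionary access rather than a group-by scan.

```python
# Build a prefix index: every row is appended to the list for each of its
# leading-column prefixes ("A1", "A1:B1", "A1:B1:C1"), trading memory for
# O(1) lookups per composite key.
from collections import defaultdict

rows = [
    ("A1", "B1", "C1", "D1", "E1", "X1"),
    ("A1", "B1", "C1", "D1", "E1", "X2"),
    ("A7", "B5", "C2", "D1", "E2", "X3"),
]

index = defaultdict(list)
for row in rows:
    key = ""
    for col in row[:3]:  # index prefixes over the first three columns
        key = col if not key else key + ":" + col
        index[key].append(row)

# All rows starting A1:B1:C1 in one lookup:
print(len(index["A1:B1:C1"]))  # 2
```

In Redis the same idea maps naturally to one list per prefix key — RPUSH cache:A1:B1:C1 <row> while loading, LRANGE to read — so lookups stay constant-time and memory is the trade-off, roughly the 60M entries you estimated.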