Convert decimals into whole numbers and find the difference between two columns - hive

I have two columns like below:
A | B
0.33 | 0.55
0.44 | 0.65
10 | 20
10.1 | 10.234
11.236 | 12.8963
12 | 30
30.5698| 35.6893
In the columns above, values with decimals should be multiplied by 100 to convert them into whole numbers, while values that are already whole numbers should be left unchanged, since they are already in the correct format.
The difference between the two columns is then calculated from the converted whole numbers.
I tried Hive's mathematical MOD function for this.
With it, the difference for the whole numbers is correct, but the difference for the decimals is wrong.
I don't know where I'm going wrong.
I tried the following code:
select mod(B,100)-mod(A,100) from sample
The actual result is:
A | B | C
0.33 | 0.55 | 22
0.44 | 0.65 | 21
10 | 20 | 10
10.1 | 10.234 | 13
11.236 | 12.8963| 166
12 | 30 | 18
30.5698| 35.6893| 512

What data type are A and B?
If you define them as decimals, all the values will have the same precision:
create table temp.table_name (
A decimal(10,5)
,B decimal(10,5)
)
stored as parquet location '../temp.db/table_name'
;
INSERT INTO TABLE temp.table_name
VALUES (0.33 ,0.55)
,(0.44 ,0.65)
,(10 ,20)
,(10.1 ,10.234)
,(11.236 ,12.8963)
,(12 ,30)
,(30.5698,35.6893);
Result of the select (All the data with the same precision):
+---------------+---------------+--+
| table_name.a | table_name.b |
+---------------+---------------+--+
| 0.33000 | 0.55000 |
| 0.44000 | 0.65000 |
| 10.00000 | 20.00000 |
| 10.10000 | 10.23400 |
| 11.23600 | 12.89630 |
| 12.00000 | 30.00000 |
| 30.56980 | 35.68930 |
+---------------+---------------+--+
Select to get the difference in decimals:
select a ,b ,( cast(round((b*100),0) as int) -
cast(round((a*100),0) as int)) as res
from temp.table_name;
Result - difference of the decimals:
+-----------+-----------+-------+--+
| a | b | res |
+-----------+-----------+-------+--+
| 0.33000 | 0.55000 | 22 |
| 0.44000 | 0.65000 | 21 |
| 10.00000 | 20.00000 | 1000 |
| 10.10000 | 10.23400 | 13 |
| 11.23600 | 12.89630 | 166 |
| 12.00000 | 30.00000 | 1800 |
| 30.56980 | 35.68930 | 512 |
+-----------+-----------+-------+--+
Hope that can help you.
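If the whole numbers are meant to stay unscaled (so that 10 and 20 give 10 rather than 1000), one option is to scale only the values that have a fractional part. The following is a minimal sketch of that idea, run through Python's sqlite3 module rather than Hive so that it is easy to try. The table name sample comes from the question; the CASE expression is an assumption about the intended rule and has not been tested on Hive.

```python
import sqlite3

# Sample rows copied from the question.
rows = [
    (0.33, 0.55), (0.44, 0.65), (10, 20), (10.1, 10.234),
    (11.236, 12.8963), (12, 30), (30.5698, 35.6893),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (a REAL, b REAL)")
conn.executemany("INSERT INTO sample VALUES (?, ?)", rows)

# Assumed rule: a value with no fractional part is kept as is; any other
# value is multiplied by 100 and rounded to the nearest whole number.
query = """
SELECT CASE WHEN b = CAST(b AS INTEGER) THEN CAST(b AS INTEGER)
            ELSE CAST(ROUND(b * 100) AS INTEGER) END
     - CASE WHEN a = CAST(a AS INTEGER) THEN CAST(a AS INTEGER)
            ELSE CAST(ROUND(a * 100) AS INTEGER) END AS diff
FROM sample
ORDER BY rowid
"""
diffs = [row[0] for row in conn.execute(query)]
print(diffs)
```

With the question's data this gives the differences 22, 21, 10, 13, 166, 18 and 512, so the whole-number rows yield 10 and 18 instead of 1000 and 1800.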

Related

Select first `n` rows of a grouped query

I am using PostgreSQL with SQLAlchemy
I have a table of GPS metrics in the form:
SELECT * FROM user_gps_location;
My Output:
| id | user_id | entry_time | lat | lng | accuracy | altitude | speed |
| 1 | 54 | 2020-07-24 14:08:30.000000 | 54.42184220 | -110.21029370 | 41.42 | 512.40 | 0.07 |
| 2 | 54 | 2020-07-24 22:20:12.000000 | 54.42189750 | -110.21038070 | 13.00 | 512.60 | 0.00 |
| 3 | 26 | 2020-07-27 13:51:11.000000 | 54.41453910 | -110.20775990 | 1300.00 | 0.00 | 0.00 |
| 4 | 26 | 2020-07-27 22:59:00.000000 | 54.42122590 | -110.20959960 | 257.52 | 509.10 | 0.00 |
| 5 | 26 | 2020-07-28 13:54:12.000000 | 54.42185280 | -110.21025010 | 81.45 | 510.20 | 0.00 |
...
I need to be able to answer the question "What are the latest 5 entries for each user since a given date?", sorted by entry_time.
Right now I only have a basic query:
select *
from user_gps_location
where user_id in (select distinct user_id
from user_gps_location
where entry_time > '2020-09-01')
and entry_time > '2020-09-01';
Applying a limit will not do what I want. I assume I need to use a grouping and window functions (?), but I do not understand them.
The row_number function is exactly what you're looking for:
SELECT *
FROM (SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY entry_time DESC) AS rn
FROM user_gps_location
WHERE entry_time > '2020-09-01') t
WHERE rn <= 5
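You can use FETCH FIRST N ROWS ONLY, although note that this limits the whole result to the 5 latest rows overall rather than 5 per user: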
select * from user_gps_location
where entry_time > '2020-09-01'
order by entry_time desc
fetch first 5 rows only
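To see how the ROW_NUMBER approach keeps the latest rows per user, here is a small sketch that runs the same query shape through Python's sqlite3 module instead of PostgreSQL. The rows are made up for illustration, the table is reduced to two columns, and n is set to 2 to keep the output short; it assumes an SQLite build with window function support (3.25 or later).

```python
import sqlite3

# Made-up rows (user_id, entry_time) for illustration only.
rows = [
    (54, "2020-09-02 10:00:00"), (54, "2020-09-03 10:00:00"),
    (54, "2020-09-04 10:00:00"), (26, "2020-08-30 10:00:00"),
    (26, "2020-09-05 10:00:00"), (26, "2020-09-06 10:00:00"),
    (26, "2020-09-07 10:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_gps_location (user_id INTEGER, entry_time TEXT)")
conn.executemany("INSERT INTO user_gps_location VALUES (?, ?)", rows)

n = 2  # rows to keep per user (the question uses 5)
query = """
SELECT user_id, entry_time
FROM (SELECT user_id, entry_time,
             ROW_NUMBER() OVER (PARTITION BY user_id
                                ORDER BY entry_time DESC) AS rn
      FROM user_gps_location
      WHERE entry_time > '2020-09-01') t
WHERE rn <= ?
ORDER BY user_id, entry_time DESC
"""
latest = conn.execute(query, (n,)).fetchall()
for row in latest:
    print(row)
```

Each user contributes at most n rows, because the numbering restarts at 1 for every user_id partition. A plain LIMIT or FETCH FIRST would instead cap the result as a whole.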

SQL - divide one column by another

I have the following code:
select c.category
,sum(b.is_open) as open
,count(b.name) as total
from business b inner join category c on b.id=c.business_id
group by c.category
order by sum(b.is_open) desc
limit 10
which gives me following dataset:
+------------------+------+-------+
| category | open | total |
+------------------+------+-------+
| Restaurants | 53 | 71 |
| Shopping | 25 | 30 |
| Food | 20 | 23 |
| Health & Medical | 16 | 17 |
| Home Services | 15 | 16 |
| Beauty & Spas | 12 | 13 |
| Nightlife | 12 | 20 |
| Bars | 11 | 17 |
| Active Life | 10 | 10 |
| Local Services | 10 | 12 |
+------------------+------+-------+
However, if I replace lines 2 and 3 with:
sum(b.is_open) / count(b.name) as '%'
instead of the expected ratios, I get zeroes for every row. I tried casting both columns to a decimal type (although it looks like they already were), but that did not work. Why can't I get the right results? I am writing my queries in SQLite.
Try using floating-point arithmetic instead of integer arithmetic:
1.0 * sum(b.is_open) / count(b.name) as '%'
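The effect is easy to reproduce in isolation. This is a minimal sketch using Python's sqlite3 module with the Restaurants figures from the table above (53 open out of 71), assuming that both the sum and the count come back as integers:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Both operands are integers, so SQLite performs integer division.
int_ratio = conn.execute("SELECT 53 / 71").fetchone()[0]

# Multiplying by 1.0 first turns the left operand into a float.
float_ratio = conn.execute("SELECT 1.0 * 53 / 71").fetchone()[0]

print(int_ratio, float_ratio)
```

The first expression yields 0 because the fractional part is discarded, while the second yields roughly 0.746.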

How to get this kind of data?

I have data some thing like this:
+---------+---------+---------+-------+
| MAXIMUM | MINIMUM | SENSORS | TIME |
+---------+---------+---------+-------+
| 10 | 12 | 14 | 13:12 |
| 80 | 70 | 100 | 14:54 |
+---------+---------+---------+-------+
But I need something like this:
+---------+-------+
| X | Y |
+---------+-------+
| MAXIMUM | 10 |
| MINIMUM | 12 |
| SENSORS | 14 |
| TIME | 13:12 |
| MAXIMUM | 80 |
| MINIMUM | 70 |
| SENSORS | 100 |
| TIME | 14:54 |
+---------+-------+
Is there any way to get the data in this form?
Just another option
Example
Select B.*
From YourTable
Cross Apply (values ('MAXIMUM',convert(nvarchar(50),MAXIMUM))
,('MINIMUM',convert(nvarchar(50),MINIMUM))
,('SENSORS',convert(nvarchar(50),SENSORS))
,('TIME' ,convert(nvarchar(50),[TIME],108))
) B(x,y)
Returns
x y
MAXIMUM 10
MINIMUM 12
SENSORS 14
TIME 13:12:00
MAXIMUM 80
MINIMUM 70
SENSORS 100
TIME 14:54:00
You can use UNPIVOT:
declare @tmp table (MAXIMUM nvarchar(10), MINIMUM nvarchar(10), SENSORS nvarchar(10), [TIME] nvarchar(10))
insert into @tmp select 10,12,14 ,'13:12'
insert into @tmp select 80,70,100,'14:54'
select u.x,u.y
from @tmp s
unpivot
(
[y]
for [x] in ([MAXIMUM],[MINIMUM],[SENSORS],[TIME])
) u;

Multiply with Previous Value in Oracle SQL

It's easy to multiply (or sum, divide, etc.) with the previous row in an Excel spreadsheet; however, I have not managed to do it in Oracle SQL so far.
A B C
199901 3.81 51905
199902 -6.09 48743.9855
199903 4.75 51059.32481
199904 6.39 54322.01567
199905 -2.35 53045.4483
199906 2.65 54451.15268
199907 1.1 55050.11536
199908 -1.45 54251.88869
199909 0 54251.88869
199910 4.37 56622.69622
Above, column B is static and column C is computed with the formula:
((B2/100)+1)*C1
((B3/100)+1)*C2
((B4/100)+1)*C3
Example: 51905 from row 1 multiplied with -6.09 from row 2:
((-6.09/100)+1)*51905
I have been trying analytic and window functions, but have not succeeded yet. The LAG function can return the previous row's value in the current row, but it cannot return the previous row's calculated value.
This can be done with the help of the MODEL clause:
select *
FROM (
SELECT t.*,
row_number() over (order by a) as rn
from table1 t
)
MODEL
DIMENSION BY (rn)
MEASURES ( A, B, 0 c )
RULES (
c[rn=1] = 51905, -- value in a first row
c[rn>1] = round( c[cv()-1] * (b[cv()]/100 +1), 6 )
)
;
Demo: http://sqlfiddle.com/#!4/9756ed/11
| RN | A | B | C |
|----|--------|-------|--------------|
| 1 | 199901 | 3.81 | 51905 |
| 2 | 199902 | -6.09 | 48743.9855 |
| 3 | 199903 | 4.75 | 51059.324811 |
| 4 | 199904 | 6.39 | 54322.015666 |
| 5 | 199905 | -2.35 | 53045.448298 |
| 6 | 199906 | 2.65 | 54451.152678 |
| 7 | 199907 | 1.1 | 55050.115357 |
| 8 | 199908 | -1.45 | 54251.888684 |
| 9 | 199909 | 0 | 54251.888684 |
| 10 | 199910 | 4.37 | 56622.696219 |
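The same recurrence, C(n) = C(n-1) * (B(n)/100 + 1), can also be written as a recursive CTE. The sketch below is an alternative to the MODEL clause, not the answer's method, and it runs on SQLite through Python's sqlite3 module rather than on Oracle. The CTE names ordered and running are made up for this example, and unlike the MODEL query it does not round at each step.

```python
import sqlite3

# Columns A and B copied from the question.
rows = [
    (199901, 3.81), (199902, -6.09), (199903, 4.75), (199904, 6.39),
    (199905, -2.35), (199906, 2.65), (199907, 1.1), (199908, -1.45),
    (199909, 0), (199910, 4.37),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (a INTEGER, b REAL)")
conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)

query = """
WITH RECURSIVE
ordered AS (
    SELECT a, b, ROW_NUMBER() OVER (ORDER BY a) AS rn
    FROM table1
),
running (rn, a, b, c) AS (
    -- Seed: the first row starts from the given value 51905.
    SELECT rn, a, b, 51905.0 FROM ordered WHERE rn = 1
    UNION ALL
    -- Step: each later row multiplies the previous row's computed c.
    SELECT o.rn, o.a, o.b, r.c * (o.b / 100.0 + 1)
    FROM running r
    JOIN ordered o ON o.rn = r.rn + 1
)
SELECT a, b, c FROM running ORDER BY rn
"""
result = conn.execute(query).fetchall()
for a, b, c in result:
    print(a, b, round(c, 6))
```

The recursive member reads the previous row's computed c, which is the step that LAG cannot provide, since LAG only sees stored column values.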

Simple algebra with recursive SQL

The following schema is used to create simple algebraic formulas. variables is used to create formulas such as x=3+4y. variables_has_sub_variables is used to combine the previously mentioned formulas and uses the sign column (+1 or -1 only) to determine whether a formula is added to or subtracted from the combination.
For instance, the variables table might have the following data, where the Implied Formula column is not actually in the table and is shown for illustrative purposes only.
variables table
+-----------+-----------+-------+------------------+
| variables | intercept | slope | Implied Formula |
+-----------+-----------+-------+------------------+
| 1 | 2.86 | -0.82 | Y1=+2.86-0.82*X1 |
| 2 | 2.96 | -3.49 | Y2=+2.96-3.49*X2 |
| 3 | 2.56 | 2.81 | Y3=+2.56+2.81*X3 |
| 4 | 3.04 | -3.43 | Y4=+3.04-3.43*X4 |
| 5 | -1.94 | 4.11 | Y5=-1.94+4.11*X5 |
| 6 | -1.21 | -0.62 | Y6=-1.21-0.62*X6 |
| 7 | 0.88 | -0.61 | Y7=+0.88-0.61*X7 |
| 8 | -2.77 | -0.34 | Y8=-2.77-0.34*X8 |
| 9 | 1.81 | 1.65 | Y9=+1.81+1.65*X9 |
+-----------+-----------+-------+------------------+
Then, given the variables_has_sub_variables data below, the variables are combined, resulting in X7=+Y1-Y2+Y3, X8=+Y4+Y5-Y7, and X9=+Y6-Y7+Y8. Next, Y7, Y8, and Y9 can be derived using the variables table, resulting in Y7=+0.88-0.61*X7, etc. Note that the application will prevent an endless loop, such as inserting a record where variables equals 7 and sub_variables equals 9, since variable 9 is based on variable 7.
variables_has_sub_variables table
+-----------+---------------+------+
| variables | sub_variables | sign |
+-----------+---------------+------+
| 7 | 1 | 1 |
| 7 | 2 | -1 |
| 7 | 3 | 1 |
| 8 | 4 | 1 |
| 8 | 5 | 1 |
| 8 | 7 | -1 |
| 9 | 6 | 1 |
| 9 | 7 | -1 |
| 9 | 8 | 1 |
+-----------+---------------+------+
My objective is, given any variable (i.e. 1 to 9), to determine the constants and root variables, where a root variable is defined as one that is not in variables_has_sub_variables.variables (I can also easily add a root column to variables if needed). In my example data above, the root variables are 1 through 6.
Doing so for a root variable is easy, as there are no sub_variables and the result is simply Y1=+2.86-0.82*X1.
Doing so for variable 7 is a little trickier:
Y7=+0.88-0.61*X7
=+0.88-0.61*(+Y1-Y2+Y3)
=+0.88-0.61*(+(+2.86-0.82*X1)-(+2.96-3.49*X2)+( +2.56+2.81*X3))
= -0.62 + 0.50*X1 - 2.13*X2 - 1.71*X3
Now the SQL. Below is how I created the tables:
CREATE DATABASE algebra;
USE algebra;
CREATE TABLE `variables` (
`variables` INT NOT NULL,
`slope` DECIMAL(6,2) NOT NULL DEFAULT 1,
`intercept` DECIMAL(6,2) NOT NULL DEFAULT 0,
PRIMARY KEY (`variables`))
ENGINE = InnoDB;
CREATE TABLE `variables_has_sub_variables` (
`variables` INT NOT NULL,
`sub_variables` INT NOT NULL,
`sign` TINYINT NOT NULL,
PRIMARY KEY (`variables`, `sub_variables`),
INDEX `fk_variables_has_variables_variables1_idx` (`sub_variables` ASC),
INDEX `fk_variables_has_variables_variables_idx` (`variables` ASC),
CONSTRAINT `fk_variables_has_variables_variables`
FOREIGN KEY (`variables`)
REFERENCES `variables` (`variables`)
ON DELETE NO ACTION
ON UPDATE NO ACTION,
CONSTRAINT `fk_variables_has_variables_variables1`
FOREIGN KEY (`sub_variables`)
REFERENCES `variables` (`variables`)
ON DELETE NO ACTION
ON UPDATE NO ACTION)
ENGINE = InnoDB;
INSERT INTO variables(variables,intercept,slope) VALUES (1,2.86,-0.82),(2,2.96,-3.49),(3,2.56,2.81),(4,3.04,-3.43),(5,-1.94,4.11),(6,-1.21,-0.62),(7,0.88,-0.61),(8,-2.77,-0.34),(9,1.81,1.65);
INSERT INTO variables_has_sub_variables(variables,sub_variables,sign) VALUES (7,1,1),(7,2,-1),(7,3,1),(8,4,1),(8,5,1),(8,7,-1),(9,6,1),(9,7,-1),(9,8,1);
And now the queries. XXXX is 7, 8, and 9 for the following results. Before each query, I show my expected results.
WITH RECURSIVE t AS (
SELECT v.variables, v.slope, v.intercept
FROM variables v
WHERE v.variables=XXXX
UNION ALL
SELECT v.variables, vhsv.sign*t.slope*v.slope slope, vhsv.sign*t.slope*v.intercept intercept
FROM t
INNER JOIN variables_has_sub_variables vhsv ON vhsv.variables=t.variables
INNER JOIN variables v ON v.variables=vhsv.sub_variables
)
SELECT variables, SUM(slope) constant FROM t GROUP BY variables
UNION SELECT 'intercept' variables, SUM(intercept) intercept FROM t;
Variable 7 Desired
+-----------+----------+
| variables | constant |
+-----------+----------+
| 1 | 0.50 |
| 2 | -2.13 |
| 3 | -1.71 |
| intercept | -0.6206 |
+-----------+----------+
Variable 7 Actual
+-----------+----------+
| variables | constant |
+-----------+----------+
| 1 | 0.50 |
| 2 | -2.13 |
| 3 | -1.71 |
| 7 | -0.61 |
| intercept | -0.61 |
+-----------+----------+
5 rows in set (0.00 sec)
Variable 8 Desired
+-----------+-----------+
| variables | constant |
+-----------+-----------+
| 1 | 0.17 |
| 2 | -0.72 |
| 3 | -0.58 |
| 4 | 1.17 |
| 5 | -1.40 |
| intercept | -3.355004 |
+-----------+-----------+
Variable 8 Actual
+-----------+----------+
| variables | constant |
+-----------+----------+
| 1 | 0.17 |
| 2 | -0.73 |
| 3 | -0.59 |
| 4 | 1.17 |
| 5 | -1.40 |
| 7 | -0.21 |
| 8 | -0.34 |
| intercept | -3.36 |
+-----------+----------+
8 rows in set (0.00 sec)
Variable 9 Desired
+-----------+------------+
| variables | constant |
+-----------+------------+
| 1 | -0.54 |
| 2 | 2.32 |
| 3 | 1.87 |
| 4 | 1.92 |
| 5 | -2.31 |
| 6 | -1.02 |
| intercept | -4.6982666 |
+-----------+------------+
Variable 9 Actual
+-----------+----------+
| variables | constant |
+-----------+----------+
| 1 | -0.55 |
| 2 | 2.33 |
| 3 | 1.88 |
| 4 | 1.92 |
| 5 | -2.30 |
| 6 | -1.02 |
| 7 | 0.67 |
| 8 | -0.56 |
| 9 | 1.65 |
| intercept | -4.67 |
+-----------+----------+
10 rows in set (0.00 sec)
All I need to do is detect which variables are not the root variables and filter them out. How should this be accomplished?
In response to JNevill's answer:
For v.variables of 9
+-----------+-------+-------+----------+
| variables | depth | path | constant |
+-----------+-------+-------+----------+
| 1 | 3 | 9>7>1 | -0.55 |
| 2 | 3 | 9>7>2 | 2.33 |
| 3 | 3 | 9>7>3 | 1.88 |
| 4 | 3 | 9>8>4 | 1.92 |
| 5 | 3 | 9>8>5 | -2.30 |
| 6 | 2 | 9>6 | -1.02 |
| 7 | 2 | 9>7 | 0.67 |
| 8 | 2 | 9>8 | -0.56 |
| 9 | 1 | 9 | 1.65 |
| intercept | 1 | 9 | -4.67 |
+-----------+-------+-------+----------+
10 rows in set (0.00 sec)
I'm not going to attempt to fully wrap my head around what you are doing, and I would agree with @RickJames up in the comments that this feels like maybe not the best use-case for a database. I too am a little obsessive though. I get it.
There are a couple of things that I almost always track in a recursive CTE:
1. The "Path". If I'm going to let a query head down a rabbit hole, I want to know how it got to the end point. So I track a path so I know which primary key was selected through each iteration. In the recursive seed (top portion) I use something like SELECT CAST(id as varchar(500)) as path... and in the recursive member (bottom portion) I do something like recursiveCTE.path + '>' + id as path...
2. The "Depth". I want to know how deep the iterations went to get to the resulting record. This is tracked by adding SELECT 1 as depth to the recursive seed and recursiveCTE.depth + 1 as depth to the recursive member. Now I know how deep each record is.
I believe number 2 will solve your issue:
WITH RECURSIVE t
AS (
SELECT v.variables,
v.slope,
v.intercept,
1 as depth
FROM variables v
WHERE v.variables = XXXX
UNION ALL
SELECT v.variables,
vhsv.sign * t.slope * v.slope slope,
vhsv.sign * t.slope * v.intercept intercept,
t.depth + 1
FROM t
INNER JOIN variables_has_sub_variables vhsv ON vhsv.variables = t.variables
INNER JOIN variables v ON v.variables = vhsv.sub_variables
)
SELECT variables,
SUM(slope) constant
FROM t
WHERE depth > 1
GROUP BY variables
UNION
SELECT 'intercept' variables,
SUM(intercept) intercept
FROM t;
The WHERE clause here will exclude records in your recursive result set that have a depth of 1, meaning they were brought in by the recursive seed portion of the recursive CTE (that is, they are the root of the recursion).
It wasn't clear whether you also need the root removed from the second half of the UNION over your t CTE. If so, the same logic applies; just add the same WHERE clause to exclude records with a depth of 1.
While it may not be helpful here, an example of your recursive CTE with the path tracked would be:
WITH RECURSIVE t
AS (
SELECT v.variables,
v.slope,
v.intercept,
1 as depth,
CAST(v.variables as CHAR(30)) as path
FROM variables v
WHERE v.variables = XXXX
UNION ALL
SELECT v.variables,
vhsv.sign * t.slope * v.slope slope,
vhsv.sign * t.slope * v.intercept intercept,
t.depth + 1,
CONCAT(t.path,'>', v.variables)
FROM t
INNER JOIN variables_has_sub_variables vhsv ON vhsv.variables = t.variables
INNER JOIN variables v ON v.variables = vhsv.sub_variables
)
SELECT variables,
SUM(slope) constant
FROM t
WHERE depth > 1
GROUP BY variables
UNION
SELECT 'intercept' variables,
SUM(intercept) intercept
FROM t;
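As the follow-up table in the question shows, filtering on depth > 1 removes only the starting variable; intermediate variables such as 7 and 8 still appear at depth 2 when starting from 9. A different filter, sketched below, applies the question's own definition of a root variable: one that never appears in variables_has_sub_variables.variables. This is an assumption-laden sketch, not a tested MySQL solution: it runs on SQLite through Python's sqlite3 module, omits the keys and constraints, and uses REAL columns instead of DECIMAL(6,2), so the values are not rounded to two decimals.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE variables (variables INTEGER PRIMARY KEY, intercept REAL, slope REAL);
CREATE TABLE variables_has_sub_variables (
    variables INTEGER, sub_variables INTEGER, sign INTEGER);
INSERT INTO variables (variables, intercept, slope) VALUES
    (1,2.86,-0.82),(2,2.96,-3.49),(3,2.56,2.81),(4,3.04,-3.43),(5,-1.94,4.11),
    (6,-1.21,-0.62),(7,0.88,-0.61),(8,-2.77,-0.34),(9,1.81,1.65);
INSERT INTO variables_has_sub_variables VALUES
    (7,1,1),(7,2,-1),(7,3,1),(8,4,1),(8,5,1),(8,7,-1),(9,6,1),(9,7,-1),(9,8,1);
""")

# Same recursive CTE as in the question, with the start variable as a parameter.
cte = """
WITH RECURSIVE t (variables, slope, intercept) AS (
    SELECT v.variables, v.slope, v.intercept
    FROM variables v
    WHERE v.variables = ?
    UNION ALL
    SELECT v.variables, vhsv.sign * t.slope * v.slope,
           vhsv.sign * t.slope * v.intercept
    FROM t
    JOIN variables_has_sub_variables vhsv ON vhsv.variables = t.variables
    JOIN variables v ON v.variables = vhsv.sub_variables
)
"""

# Keep only root variables: those that never appear as a parent.
roots_sql = cte + """
SELECT variables, SUM(slope) FROM t
WHERE NOT EXISTS (SELECT 1 FROM variables_has_sub_variables p
                  WHERE p.variables = t.variables)
GROUP BY variables ORDER BY variables
"""
# The intercept still sums over every row, including intermediate ones.
intercept_sql = cte + "SELECT SUM(intercept) FROM t"

constants = dict(conn.execute(roots_sql, (9,)).fetchall())
intercept = conn.execute(intercept_sql, (9,)).fetchone()[0]
print(constants, intercept)
```

For variable 9 this leaves only variables 1 through 6, and the intercept comes out close to the desired -4.6982666. Because the filter does not depend on depth, starting from a root variable such as 1 still returns that variable's own row.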