SQL Query to return all of the same numbers ignoring if the number is positive or not - sql

I am looking for an SQL query to return all records based on a number. The trick is that the number can be positive or negative, but as long as the number is the same it should return.
So I have a table like this:
1 amt1 100.00
2 amt2 100.00
3 amt3 -100.00
4 amt4 120.00
A query such as this:
Select * from table where amount = '100.00'
Only returns the first 2 rows.
I want to return the first 3 rows, thus ignoring the minus sign but still matching the amount.
Any ideas?

Use in:
where amount in (100.00, -100.00)
Note the single quotes are not necessary.
You can also use abs():
where abs(amount) = 100.00
if you don't care much about indexes or query optimization.

If amount is a number:
select * from table where abs(amount) = 100.00;
If amount is text:
select * from table where amount in ('100.00', '-100.00');

Related

Cast decimal as int when a specific value is found

I have a query that compares data between weeks using CTEs. One of things it stores is percentage data. I am trying to convert or cast any value that is equal to 100.00.
For example, here is what the result would look like
Name Percentage Other
App1 99.56 5.5
App2 100.00 6
I am hoping to remove the zero's from the 100. I have tried some case statements but I never get the results I am looking for. Here is what I have now.
SELECT
f.Application,
f.Percentage,
f.Other,
CASE
WHEN f.Percentage = 100.00
THEN 100
END AS Percentage
FROM Table1 f
Currently, the values that are 100.00 are changed to 100. However, everything not equal to 100.00 is nulled out. I can get rid of the NULLs by adding
ELSE f.Percentage
But then 100.00 is not changed.
Any help would be appreciated.
As I mentioned in the comments, you can't have a single SQL column that contains multiple data types.
That being said, whether or not it's useful in your case remains to be seen, but a string-type like nvarchar can handle both, though your code would need to be able to handle them as strings instead of numbers.
An example would be something like this:
SELECT
f.Application,
f.Percentage,
f.Other,
CASE
WHEN f.Percentage = 100.00
THEN '100'
ELSE CAST(f.Percentage as nvarchar(10))
END AS Percentage
FROM Table1 f
The Percentage column would now be of type nvarchar, but it would return values in this fashion:
Name Percentage Other
App1 99.56 5.5
App2 100 6
App3 91.23 3.5
App4 94.41 4.8
App5 100 6

Hive: Create rows with summed data, by date (unknown number of dates)

I am currently working with a Hive Table which contains transactions data and I need to do some basic statistics on these data, and put the results in a new table.
EDIT: I'm using Hive 0.13 on Hadoop 2.4.1.
CONTEXT
First, let me try to present the input table: here's a table with 3 columns, an ID, a date (month/year), and an amount:
<ID> <Date> <Amount>
1 11.2014 5.00
2 11.2014 10.00
3 12.2014 15.00
1 12.2014 7.00
1 12.2014 15.00
2 01.2015 20.00
3 01.2015 30.00
3 01.2015 45.00
... ... ...
And the desired output consist of a table grouped by IDs, where in each line I sum the the amounts, for each corresponding months:
<ID> <11.2014> <12.2014> <01.2015> <...>
1 5.00 22.00 0.00 ...
2 10.00 0.00 20.00 ...
3 15.00 0.00 75.00 ...
... ... ... ... ...
Considering that the original table has >4 million IDs and > 500 million lines, on more then 2 years. It seems pretty hard to hardcode the table by hand since I don't know how many columns I should create.
(I know how many different dates I have, but if the original table grows over 5, 10, 15 years, there is going to be a lot to do by hand and that's risky.)
THE CHALLENGE
I know how to do some basic manipulations and GROUP BYs, I can even do some CASE WHEN, but the tricky part in my problem is that I can not create columns like this (as mentionned above)...
SUM (CASE WHEN Date = 11.2014 THEN Amount ELSE 0 END) AS 11.2014
SUM (CASE WHEN Date = 12.2014 THEN Amount ELSE 0 END) AS 12.2014
SUM (CASE WHEN Date = 01.2015 THEN Amount ELSE 0 END) AS 01.2015
SUM (CASE WHEN Date = ??? THEN Amount ELSE 0 END) AS ???
... because I don't know how many different dates I'll eventually have, so I would need something like this:
SUM (CASE WHEN Date = [loop over each dates] THEN Amount ELSE 0 END)
AS [the date selected in the loop]
THE QUESTION
Do you have something to propose in order to :
How can I loop over all the dates ?
And be able to create a colum for every dates I have without specifying myself the name of the soon to be created column ?
Is it doable in a single HiveQL script ? (not obligated but could be really nice)
I would like to avoid UDF but at this point I'm not sure it's preventable since I haven't find any case that ressemble mine.
Thanks in advance and don't hesitate to ask for more info.
This is too long for a comment.
You cannot do exactly what you want in Hive, because a SQL query has to have a fixed number of columns when it is defined.
What can you do?
The easiest thing is simply to change what you want. Product multiple rows instead of multiple columns:
select id, date, sum(amount)
from table t
group by id, date;
You can then load the data into your favorite spreadsheet and pivot it there.
Other alternatives. You can write a query that will write the appropriate query. This would go through the table, identify the possible dates, and construct a SQL statement. You can then run the SQL statement.
Or, you could use some other data types, such as a list or JSON to store the aggregated values in one row.

Counting Results in SQL

I'm having trouble using COUNT in SQL...The following query returns two rows, but then returns the raps column as 137. So I believe it's counting the total number of operation_id columns in the dataset instead of from the results.
Is there any way to make it count only the columns from the results, so that raps returns as 1 in each of the columns? I would then use PHP to add them together.
//Query
SELECT DISTINCT hrap_id,
operation_id,
COUNT (operation_id) AS raps,
operation_type
FROM view_rappels
WHERE year = '2013' AND crew_id = '4'
GROUP BY hrap_id, operation_type, operation_id
//Results
10.00 702020000.00 137.00 operational
1.00 702020000.00 137.00 operational
You need to put DISTINCT inside of the count function like so
COUNT(DISTINCT operation_id) AS raps

SQL Rounding Percentages to make the sum 100% - 1/3 as 0.34, 0.33, 0.33

I am currently trying to split one value with percentage column. But as most of percentages values are 1/3, I am not able to get aboslute 100% with two decimal points in the value. For example:
Product Supplier percentage totalvalue customer_split
decimal(15,14) (decimal(18,2) decimal(18,2)
-------- -------- ------------ --------------- ---------------
Product1 Supplier1 0.33 10.00 3.33
Product1 Supplier2 0.33 10.00 3.33
Product1 Supplier3 0.33 10.00 3.33
So, here we are missing 0.01 in the value column and suppliers would like to put this missing 0.01 value against any one of the supplier randomly. I have been trying to get this done in a two sets of SQLs with temporary tables, but is there any simple way of doing this. If possible how can I get 0.34 in the percentage column itself for one of the above rows? 0.01 is negligible value, but when the value column is 1000000000 it is significant.
It sounds like you're doing some type of "allocation" here. This is a common problem any time you are trying to allocate something from a higher granulartiy to a lower granularity, and you need to be able to re-aggregate to the total value correctly.
This becomes a much bigger problem when dealing with larger fractions.
For example, if I try to divide a total value of, say $55.30 by eight, I get a decimal value of $6.9125 for each of the eight buckets. Should I round one to $6.92 and the rest to $6.91? If I do, I will lose a cent. I would have to round one to $6.93 and the others to $6.91. This gets worse as you add more buckets to divide by.
In addition, when you start to round, you introduce problems like "Should 33.339 be rounded to 33.34 or 33.33?"
If your business logic is such that you just want to take whatever remainder beyond 2 significant digits may exist and add it to one of the dollar values "randomly" so you don't lose any cents, #Diego is on the right track with this.
Doing it in pure SQL is a bit more difficult. For starters, your percentage isn't 1/3, it's .33, which will yield a total value of 9.9, not 10. I would either store this as a ratio or as a high-precision decimal field (.33333333333333).
P S PCT Total
-- -- ------------ ------
P1 S1 .33333333333 10.00
P2 S2 .33333333333 10.00
P3 S3 .33333333333 10.00
SELECT
BaseTable.P, BaseTable.S,
CASE WHEN BaseTable.S = TotalTable.MinS
THEN BaseTable.BaseAllocatedValue + TotalTable.Remainder
ELSE BaseTable.BaseAllocatedValue
END As AllocatedValue
FROM
(SELECT
P, S, FLOOR((PCT * Total * 100)) / 100 as BaseAllocatedValue,
FROM dataTable) BaseTable
INNER JOIN
(SELECT
P, MIN(S) AS MinS,
SUM((PCT * Total) - FLOOR((PCT * Total * 100)) / 100) as Remainder,
FROM dataTable
GROUP BY P) as TotalTable
ON (BaseTable.P = TotalTable.P)
It appears your calculation is an equal distribution based on the total number of products per supplier. If it is, it may be advantageous to remove the percentage and instead just store the count of items per supplier in the table.
If it is also possible to store a flag indicating the row that should get the remainder value applied to it, you could assign based on that flag instead of randomly.
run this, it will give an idea on how you can solve your problem.
I created a table called orders just with an ID to be easy to understand:
create table orders(
customerID int)
insert into orders values(1)
go 3
insert into orders values(2)
go 3
insert into orders values(3)
go 3
these values represent the 33% you have
1 33.33
2 33.33
3 33.33
now:
create table #tempOrders(
customerID int,
percentage numeric(10,2))
declare #maxOrder int
declare #maxOrderID int
select #maxOrderID = max(customerID) from orders
declare #total numeric(10,2)
select #total =count(*) from orders
insert into #tempOrders
select customerID, cast(100*count(*)/#total as numeric(10,2)) as Percentage
from orders
group by customerID
update #tempOrders set percentage = percentage + (select 100-sum(Percentage) from #tempOrders)
where customerID =#maxOrderID
this code will basically calculate the percentage and the order with the max ID, then it gets the diference from 100 to the percentage sum and add it to the order with the maxID (your random order)
select * from #tempOrders
1 33.33
2 33.33
3 33.34
This should be an easy task using Windowed Aggregate Functions. You probably use them already for the calculation of customer_split:
totalvalue / COUNT(*) OVER (PARTITION BY Product) as customer_split
Now sum up the customer_splits and if there's a difference to total value add (or substract) it to one random row.
SELECT
Product
,Supplier
,totalvalue
,customer_split
+ CASE
WHEN COUNT(*)
OVER (PARTITION BY Product
ROWS UNBOUNDED PRECEDING) = 1 -- get a random row, using row_number/order you might define a specific row
THEN totalvalue - SUM(customer_split)
OVER (PARTITION BY Product)
ELSE 0
END
FROM
(
SELECT
Product
,Supplier
,totalvalue
,totalvalue / COUNT(*) OVER (PARTITION BY Product) AS customer_split
FROM dropme
) AS dt
After more than one trial and test i think i found better solution
Idea
Get Count of all(Count(*)) based on your conditions
Get Row_Number()
Check if (Row_Number() value < Count(*))
Then select round(curr_percentage,2)
Else
Get sum of all other percentage(with round) and subtract it from 100
This steps will select current percentage every time EXCEPT Last one will be
100 - the sum of all other percentages
this is part of my code
Select your_cols
,(Select count(*) from [tbl_Partner_Entity] pa_et where [E_ID] =#E_ID)
AS cnt_all
,(ROW_NUMBER() over ( order by pe.p_id)) as row_num
,Case when (
(ROW_NUMBER() over ( order by pe.p_id)) <
(Select count(*) from [tbl_Partner_Entity] pa_et where [E_ID] =#E_ID))
then round(([partnership_partners_perc]*100),2)
else
100-
((select sum(round(([partnership_partners_perc]*100),2)) FROM [dbo].
[tbl_Partner_Entity] PEE where [E_ID] =#E_ID and pee.P_ID != pe.P_ID))
end AS [partnership_partners_perc_Last]
FROM [dbo].[tbl_Partner_Entity] PE
where [E_ID] =#E_ID

My aggregate is not affected by ROLLUP

I have a query similar to the following:
SELECT CASE WHEN (GROUPING(Name) = 1) THEN 'All' ELSE Name END AS Name,
CASE WHEN (GROUPING(Type) = 1) THEN 'All' ELSE Type END AS Type,
sum(quantity) AS [Quantity],
CAST(sum(quantity) * (SELECT QuantityMultiplier FROM QuantityMultipliers WHERE a = t.b) AS DECIMAL(18,2)) AS Multiplied Quantity
FROM #Table t
GROUP BY Name, Type WITH ROLLUP
I'm trying to return a list of Names, Types, a summed Quantity and a summed quantity multiplied by an arbitrary number. All fine so far. I also need to return a sub-total row per Name and per Type, such as the following
Name Type Quantity Multiplied Quantity
------- --------- ----------- -------------------
a 1 2 4
a 2 3 3
a ALL 5 7
b 1 6 12
b 2 1 1
b ALL 7 13
ALL ALL 24 40
The first 3 columns are fine. I'm getting null values in the rollup rows for the multiplied quantity though. The only reason I can think this is happening is because SQL doesn't recognize the last column as an aggregate now that I've multiplied it by something.
Can I somehow work around this without things getting too convoluted?
I will be falling back onto temporary tables if this can't be done.
In your sub-query to acquire the multiplier, you have WHERE a=b. Are either a or b from the tables in your main query?
If these values are static (nothing to do with the main query), it looks like it should be fine...
If the a or b values are the name or type field, they can be NULL for the rollup records. If so, you can change to something similiar to...
CAST(sum(quantity * (<multiplie_query>)) AS DECIMAL(18,2)).
If a or b are other field from your main query, you'd be getting multiple records back, not just a single multiplier. You could change to something like...
CAST(sum(quantity) * (SELECT MAX(multiplier) FROM ...)) AS DECIMAL(18,2))