SQL - count the number of occurrences of items in different price ranges (diapasons)

I have a question about SQL, and I honestly tried searching for a method before asking. I will give an abstract (but precise) description below, and would greatly appreciate an example solution (an SQL query).
What I have:
Table A with the category id and price (in USD) of each item. category id is an int; price is a string that looks like "USD 200000000" (the real value multiplied by 10^7). The table also has a kind column of type int.
Table B with relation of category id and name.
What I need:
Get a table with price diapasons (like 0-100 | 100-200 | ...) as column names and the number of items for each category id (as row names) in each of the price diapasons. All results must be filtered by the kind column (from table A) with value 3.
Problems that I encountered (and which made me ask for an example SQL query):
1. Cut "USD" from the price string value, divide it by 10^7 and convert it to float.
2. Build the diapasons of price values (0-100 | 100-200 | ...) with a given step in a given interval (the max price is considered unknown at the start). Example: step 100 on the 0-500 interval, and step 200 for values >500.
3. Put the diapasons of price values into the column names of the result table.
4. For each diapason, count the number of items in each category (category_id). The left limit of a diapason shall not be included (e.g. in the 1000-1200 diapason, items with price 1000 shall not be counted).
5. Using table B, display the name instead of the category id.
Response is appreciated, ignorance will be understood.

If you only need category ids, then you do not need B. What you are looking for is conditional aggregation, something like:
select category_id,
       sum(case when cast(substring(price, 5, 100) as int) / 10000000.0 <= 100 then 1 else 0 end) as price_000_100,
       sum(case when cast(substring(price, 5, 100) as int) / 10000000.0 > 100 and
                     cast(substring(price, 5, 100) as int) / 10000000.0 <= 200
                then 1 else 0
           end) as price_100_200,
       . . .
from a
where kind = 3
group by category_id
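To make the idea concrete, here is a small runnable sketch of the same conditional aggregation, wrapped in Python's built-in sqlite3 module so it can be tried without a database server. The sample rows and the two bucket columns are made up for illustration, SQLite's SUBSTR stands in for substring, and the buckets exclude their left limit as the question asks. Treat it as a sketch under those assumptions, not as the exact query for your system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (category_id INTEGER, price TEXT, kind INTEGER);
    INSERT INTO a VALUES
        (1, 'USD 500000000',  3),   -- 50 USD
        (1, 'USD 1000000000', 3),   -- 100 USD (right limit of the first bucket)
        (1, 'USD 1500000000', 3),   -- 150 USD
        (2, 'USD 2000000000', 3),   -- 200 USD
        (2, 'USD 500000000',  1);   -- filtered out: kind is not 3
""")

query = """
    WITH priced AS (
        -- strip the 4-character 'USD ' prefix and undo the 10^7 scaling
        SELECT category_id,
               CAST(SUBSTR(price, 5) AS INTEGER) / 10000000.0 AS usd
        FROM a
        WHERE kind = 3
    )
    SELECT category_id,
           SUM(CASE WHEN usd > 0   AND usd <= 100 THEN 1 ELSE 0 END) AS price_000_100,
           SUM(CASE WHEN usd > 100 AND usd <= 200 THEN 1 ELSE 0 END) AS price_100_200
    FROM priced
    GROUP BY category_id
    ORDER BY category_id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Parsing the price once in a CTE keeps the CASE expressions short; each extra bucket is one more SUM(CASE ...) column.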

There is no standard way to do what you describe.
That is because to do (3) you need a pivot, aka crosstab, and this is not in ANSI SQL. Each DBMS has its own implementation. Plus, dynamic columns in a pivot table are an additional complication.
For example, Postgres calls it a "crosstab" and requires the tablefunc module to be installed. See this SO question and the documentation. Compare to SQL Server, which uses the PIVOT command.
You can get close using reasonably standard SQL.
Here is an example based on SQLite. A little bit of conversion would provide a solution for other systems, e.g. SUBSTR would be substring(string [from int] [for int]) in PostgreSQL.
Assuming a data table of format:
and a category name table of:
then the following code will produce:
WITH dataCTE AS
(SELECT product_id AS 'ID', CAST(SUBSTR(price, 5) AS INT)/10000000 AS 'USD',
CASE WHEN (CAST(SUBSTR(price, 5) AS INT)/10000000) <= 500 THEN
100 ELSE 200
END AS 'Interval'
FROM data
WHERE kind = 3),
groupCTE AS
(SELECT dataCTE.ID AS 'ID', dataCTE.USD AS 'USD', dataCTE.Interval AS 'Interval',
CASE WHEN dataCTE.Interval = 100 THEN
CAST(dataCTE.USD AS INT)/100
ELSE
(CAST(dataCTE.USD-500 AS INT)/200)+5
END AS 'GroupID'
FROM dataCTE),
cleanCTE AS
(SELECT *, CASE WHEN groupCTE.Interval = 100 THEN
CAST(groupCTE.GroupID *100 AS VARCHAR)
|| '-' ||
CAST((groupCTE.GroupID *100)+99 AS VARCHAR)
ELSE
CAST(((groupCTE.GroupID-5)*200)+500 AS VARCHAR)
|| '-' ||
CAST(((groupCTE.GroupID-5)*200)+500+199 AS VARCHAR)
END AS 'diapason'
FROM groupCTE
INNER JOIN cat_name AS cn ON groupCTE.ID = cn.cat_id)
SELECT *
FROM cleanCTE;
If you modify the last SELECT to:
SELECT name, diapason, COUNT(diapason)
FROM cleanCTE
GROUP BY name, diapason;
then you get a grouped output:
This is as close as you will get without specifying the exact system; even then you will have a problem with dynamically creating the column names.

Related

Get row if a number is inside a string range from a column

My table (with simplified columns) has this structure:
brand  period     fuel
Audi   2008-2016  G
BWM    2018-      D
The user will give me the matriculation year of the car and I want to look inside all the periods and find if the given year is inside the range.
For example 2010 would return me the Audi and 2020 should return the BWM.
My current query looks like the following:
SELECT *
FROM cars
WHERE brand='Audi' AND fuel='G' AND
<userIntroducedYear> BETWEEN <year1> AND <year2>;
My guess is that I should be able to get this with some subquery instead of the BETWEEN but I'm a bit lost doing it.
Thanks in advance.
You should fix your data model so you are:
Storing number values as numbers (years are numbers, not strings).
Storing only one value in a scalar column.
Postgres offers a range data type as well, which does exactly what you want.
For this, though, I'll just parse period into the period start and end years using split_part() and a lateral join to do the work in the FROM clause:
select c.*
from cars c cross join lateral
(values (split_part(c.period, '-', 1)::int, nullif(split_part(c.period, '-', 2), '')::int)
) v(period_start, period_end)
where c.brand = 'Audi' and c.fuel = 'G' and
v.period_start <= :userIntroducedYear and
(v.period_end >= :userIntroducedYear or v.period_end is null);
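If you want to try the parsing logic without a Postgres instance, the sketch below does the same thing in SQLite through Python's sqlite3 module. Note that this is a swapped-in technique: SQLite has neither split_part() nor lateral joins, so SUBSTR and INSTR do the splitting inside a CTE. The helper name brands_for is made up for the example, and the brand and fuel filters are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cars (brand TEXT, period TEXT, fuel TEXT);
    INSERT INTO cars VALUES ('Audi', '2008-2016', 'G'), ('BWM', '2018-', 'D');
""")

query = """
    WITH parsed AS (
        SELECT brand, fuel,
               -- text before the dash is the start year
               CAST(SUBSTR(period, 1, INSTR(period, '-') - 1) AS INTEGER) AS period_start,
               -- text after the dash is the end year; empty means open-ended (NULL)
               CAST(NULLIF(SUBSTR(period, INSTR(period, '-') + 1), '') AS INTEGER) AS period_end
        FROM cars
    )
    SELECT brand
    FROM parsed
    WHERE period_start <= :year
      AND (period_end >= :year OR period_end IS NULL)
"""

def brands_for(year):
    # hypothetical helper: returns the brands whose period contains the given year
    return [row[0] for row in conn.execute(query, {"year": year})]

print(brands_for(2010), brands_for(2020), brands_for(2017))
```

With the two sample rows from the question, 2010 matches only the Audi, 2020 matches only the open-ended 2018- row, and 2017 falls in the gap between them.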

How to find the next sequence number in oracle string field

I have a database table with document names stored as a VARCHAR and I need a way to figure out what the lowest available sequence number is. There are many gaps.
name partial seq
A-B-C-0001 A-B-C- 0001
A-B-C-0017 A-B-C- 0017
In the above example, it would be 0002.
The distinct name values total 227,705. The number of "partial" combinations is quite large A=150, B=218, C=52 so 1,700,400 potential combinations.
I found a way to iterate through from min to max per distinct value and list all the "missing" (aka available) values, but this seems inefficient given we are not using anywhere close to the max potential partial combinations (10,536 out of 1,700,400).
I'd rather have a table based on existing data with a partial value and its next available sequence value, where a non-existent partial means 0001.
Thanks
Hmmmm, you can try this:
select coalesce(min(to_number(seq)), 0) + 1
from t
where partial = 'A-B-C-' and
not exists (select 1
from t t2
where t2.partial = t.partial and
to_number(T2.seq) = to_number(t.seq) + 1
);
EDIT:
For all partials you need a group by (you can use to_char() to convert the result back to a character string, if necessary):
select partial, coalesce(min(to_number(seq)), 0) + 1
from t
where not exists (select 1
from t t2
where t2.partial = t.partial and
to_number(T2.seq) = to_number(t.seq) + 1
)
group by partial;
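Here is a small runnable check of the grouped query, using SQLite through Python's sqlite3 module rather than Oracle. As assumptions for the sketch, CAST(... AS INTEGER) replaces to_number(), printf('%04d', ...) replaces to_char() for the zero padding, and the A-B-D- rows are invented sample data. One thing worth keeping in mind: the query finds the first gap after an existing number, so if a partial exists but its lowest number is, say, 0005, it reports 0006 rather than 0001.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (name TEXT, partial TEXT, seq TEXT);
    INSERT INTO t VALUES
        ('A-B-C-0001', 'A-B-C-', '0001'),
        ('A-B-C-0017', 'A-B-C-', '0017'),
        ('A-B-D-0001', 'A-B-D-', '0001'),
        ('A-B-D-0002', 'A-B-D-', '0002');
""")

query = """
    SELECT partial,
           -- smallest existing number with no successor, plus one, zero-padded
           printf('%04d', MIN(CAST(seq AS INTEGER)) + 1) AS next_seq
    FROM t
    WHERE NOT EXISTS (SELECT 1
                      FROM t AS t2
                      WHERE t2.partial = t.partial
                        AND CAST(t2.seq AS INTEGER) = CAST(t.seq AS INTEGER) + 1)
    GROUP BY partial
    ORDER BY partial
"""
next_seq = dict(conn.execute(query).fetchall())
print(next_seq)
```

Partials that do not appear in the table at all produce no row here, so the caller would treat a missing key as 0001.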

SQL: Calculate Percentage in new column using another column

I found it hard to describe what I wanted to do in the title, but I will be more specific here.
I have a reasonably long query:
SELECT
/*Amount earned with validation to remove outlying figures*/
Case When SUM(t2.[ActualSalesValue])>=0.01 OR SUM(t2.[ActualSalesValue])<0 Then SUM(t2.[ActualSalesValue]) ELSE 0 END AS 'Amount',
/*Profit earned (is already calculated then input into db, this just pulls that figure)*/
SUM(t2.[Profit]) AS 'Profit',
/*Product Type - pulls the product type so that we can sort by product*/
t1.[ucIIProductType] AS 'Product Type',
/*Profit Percentage - This is to calculate the percentage of profit based on the sales price which uses 2 different columns - Case ensures that there are no wild values appearing in the reports as previously experienced*/
Case When SUM(t2.[ActualSalesValue])>=0.01 OR SUM(t2.[ActualSalesValue])<0 THEN (SUM(t2.[Profit])/SUM(t2.[ActualSalesValue])) ELSE 0 END AS 'Profit Percentage',
/*Percentage of Turnover*/
*SUM(t2.[ActualSalesValue])/(Select SUM(t2.[ActualSalesValue]) OVER() FROM [_bvSTTransactionsFull]) AS 'PoT'
/*The join connects the product type with the profit and the amount*/
FROM [dbo].[StkItem] AS t1
INNER JOIN [dbo].[_bvSTTransactionsFull] AS t2
/*These attributes are the links between the tables*/
ON t1.[StockLink]=t2.[AccountLink]
WHERE t2.[TxDate] BETWEEN '1/Aug/2014' AND '31/Aug/2014' AND ISNUMERIC(t2.[Account]) = 1
Group By t1.[ucIIProductType]
The 'Percentage of Turnover' part is what I am having trouble with. I am trying to calculate the percentage of the Amount based on the total amount, using the same column. For example, I want to take the Amount value in row 1, divide it by the total amount of the entire column, and have that value listed in a new column. But I keep getting errors, or I keep getting 1 (because it wants to divide the value by the same value). Can anyone please advise me on the proper syntax for solving this:
/*Percentage of Turnover*/
*SUM(t2.[ActualSalesValue])/(Select SUM(t2.[ActualSalesValue]) OVER() FROM [_bvSTTransactionsFull]) AS 'PoT'
I think you want one of the following:
SUM(t2.[ActualSalesValue])/(Select SUM(t.[ActualSalesValue]) FROM [_bvSTTransactionsFull] t) AS PoT
or:
SUM(t2.[ActualSalesValue])/(SUM(SUM(t2.[ActualSalesValue])) OVER() ) AS PoT
Note: you should use single quotes only for string and date constants, not for column and table names. If you need to escape names, use square brackets.
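The second form is easy to try in isolation. The sketch below runs it in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer); the sales table and its rows are invented for the example and are much simpler than the real query. The inner SUM is the per-group aggregate, and the outer SUM(...) OVER () adds those group totals up across all result rows. Because the window runs after the WHERE and GROUP BY, its total only covers the filtered rows, whereas a separate scalar subquery over the whole table would ignore the outer filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (product_type TEXT, amount REAL);
    INSERT INTO sales VALUES ('A', 30), ('A', 20), ('B', 50), ('C', 100);
""")

query = """
    SELECT product_type,
           SUM(amount) AS amount,
           -- group total divided by the grand total of all group totals
           SUM(amount) / SUM(SUM(amount)) OVER () AS pot
    FROM sales
    GROUP BY product_type
    ORDER BY product_type
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Multiply pot by 100 if you want a percentage rather than a fraction.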

SQL query to add or subtract values based on another field

I need to calculate the net total of a column-- sounds simple. The problem is that some of the values should be negative, as are marked in a separate column. For example, the table below would yield a result of (4+3-5+2-2 = 2). I've tried doing this with subqueries in the select clause, but it seems unnecessarily complex and difficult to expand when I start adding in analysis for other parts of my table. Any help is much appreciated!
Sign Value
Pos 4
Pos 3
Neg 5
Pos 2
Neg 2
Using a CASE expression should work in most versions of SQL:
SELECT SUM( CASE
WHEN t.Sign = 'Pos' THEN t.Value
ELSE t.Value * -1
END
) AS Total
FROM YourTable AS t
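As a quick check against the sample data in the question, the same query can be run in SQLite through Python's sqlite3 module. The table and column names are taken from the query above; only the in-memory setup is added for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (Sign TEXT, Value INTEGER);
    INSERT INTO YourTable VALUES
        ('Pos', 4), ('Pos', 3), ('Neg', 5), ('Pos', 2), ('Neg', 2);
""")

query = """
    SELECT SUM(CASE
                 WHEN t.Sign = 'Pos' THEN t.Value
                 ELSE t.Value * -1       -- every non-'Pos' row is subtracted
               END) AS Total
    FROM YourTable AS t
"""
total = conn.execute(query).fetchone()[0]
print(total)  # 4 + 3 - 5 + 2 - 2 = 2
```

Because the sign handling lives inside the aggregate, adding a GROUP BY later gives a net total per group without restructuring the query.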
Try this (IF() is MySQL syntax):
SELECT SUM(IF(sign = 'Pos', Value, Value * (-1))) as total FROM YourTable
I am adding rows from a single field in a table based on values from another field in the same table using oracle 11g as database and sql developer as user interface.
This works:
SELECT COUNTRY_ID, SUM(
CASE
WHEN ACCOUNT IN 'PTBI' THEN AMOUNT
WHEN ACCOUNT IN 'MLS_ENT' THEN AMOUNT
WHEN ACCOUNT IN 'VAL_ALLOW' THEN AMOUNT
WHEN ACCOUNT IN 'RSC_DEV' THEN AMOUNT * -1
END) AS TI
FROM SAMP_TAX_F4
GROUP BY COUNTRY_ID;
select a = sum(Value) where Sign like 'Pos'
select b = sum(Value) where Sign like 'Neg'
select total = a - b
This is a bit SQL-agnostic, since you didn't say which DB you are using, but it should be easy to adapt it to any DB out there.

Simplify query in H2 database - alternative to TOP X PERCENT

I'm having performance issues with a query and was wondering how to simplify it.
I have a table "Evaluations" (Sample, Category, Jury, Value)
And created some custom functions to get some average values for each sample, so I have this view:
CREATE VIEW Results AS
SELECT Sample,
Category,
IFNULL(COUNT_VALID(Value),0) || ' / ' || COUNT(Value) AS Valid,
CUSTOM_MEAN(Value) AS Mean,
CUSTOM_MEDIAN(Value) AS Median
FROM Evaluations GROUP BY Sample, Category;
Then I want to have another field telling me if each sample is within the 30% of best valued samples of its category. It would be perfect to use TOP(X) PERCENT but it seems H2 doesn't support it so I made a second view that calculates the position in category multiplied by 100, divided by the total count in category and compared to 30:
CREATE VIEW Res AS
SELECT R1.*,
CASE
WHEN (
((SELECT COUNT(*) FROM Results R2
WHERE R2.Category = R1.Category
AND (R2.Mean > R1.Mean OR (R2.Mean = R1.Mean AND R2.Median > R1.Median))) + 1) * 100
/
(SELECT COUNT(*) FROM Results R2 WHERE R2.Category = R1.Category) )
> 30
THEN 'over 30%'
ELSE 'within 30%'
END as 30PERCENT
FROM Results R1 ORDER BY Mean DESC, Median DESC;
This works properly but with just 500 records it takes some time to retrieve the results.
Could someone tell me a more efficient way of constructing this query?
Thanks and regards!