I have an Oracle 10g database that has 2 tables: a REBATES table, and an ORDERS table.
The REBATES table looks sort of like this:
| rebate_percentage | min_purchase |
------------------------------------
| 1.0 | 5000 |
| 1.5 | 7000 |
| 2.0 | 11000 |
| 5.0 | 20000 |
I'm trying to determine the rebate percentage to apply, based on total orders. I know how to find the sum of all orders for a particular customer, for a particular time range, but how do I also grab the rebate percentage, all in one query?
For example, if the order total is 16,000 then how can I construct a query that takes this value, compares it against the REBATES table, and returns 2.0?
In my opinion, the easiest way is if you have a min and max purchase amounts:
select rebate_percentage, min_purchase,
(lead(min_purchase, 1) over (order by min_purchase) - 1) as max_purchase
from rebates
Then you can do a simple between join, where the join condition looks like:
on totalorders between rebates.min_purchase and rebates.max_purchase
You can handle the final case (with NULLs) with a modified join condition:
on totalorders >= rebates.min_purchase and
(totalorders <= rebates.max_purchase or rebates.max_purchase is null)
Or, alternatively, by changing the original logic to have a coalesce() on the lead function with some very large value.
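Here is a minimal sketch of the coalesce() variant, run against an in-memory SQLite database from Python rather than Oracle (an assumption made only so the example is runnable; window functions need SQLite 3.25 or newer). The cap value 999999999999 and the helper name rebate_for are made up for illustration.

```python
import sqlite3

# In-memory SQLite stands in for Oracle here; LEAD() needs SQLite 3.25+.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rebates (rebate_percentage REAL, min_purchase INTEGER);
    INSERT INTO rebates VALUES (1.0, 5000), (1.5, 7000), (2.0, 11000), (5.0, 20000);
""")

# The top tier has no next row, so LEAD() yields NULL there and
# COALESCE() replaces it with an arbitrary large cap (an assumption).
RANGE_LOOKUP = """
    WITH ranges AS (
        SELECT rebate_percentage,
               min_purchase,
               COALESCE(LEAD(min_purchase) OVER (ORDER BY min_purchase) - 1,
                        999999999999) AS max_purchase
        FROM rebates
    )
    SELECT rebate_percentage
    FROM ranges
    WHERE ? BETWEEN min_purchase AND max_purchase
"""

def rebate_for(total_orders):
    """Return the rebate percentage for an order total, or None below the lowest tier."""
    row = conn.execute(RANGE_LOOKUP, (total_orders,)).fetchone()
    return row[0] if row else None

print(rebate_for(16000))  # 2.0
```

In the real query the `?` placeholder would be the summed order total from your ORDERS subquery.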
Use a function.
Example:
CREATE OR REPLACE FUNCTION RebatePercentage(purchase NUMBER) RETURN NUMBER IS
rebateVal NUMBER;
minPurchase NUMBER;
BEGIN
SELECT MAX(min_purchase)
INTO minPurchase
FROM REBATES
WHERE min_purchase <= purchase;
SELECT rebate_percentage
INTO rebateVal
FROM REBATES WHERE min_purchase = minPurchase;
RETURN ( rebateVal );
END;
Now you can call this function in your query
SELECT RebatePercentage(purchase_amt) from orders;
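The same two-step lookup (largest min_purchase not above the amount, then its percentage) can probably also be written inline as a scalar subquery and applied to each customer's order total rather than to single orders. This is only a sketch on SQLite via Python; the orders columns customer_id and purchase_amt and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The orders table layout and its rows are hypothetical.
conn.executescript("""
    CREATE TABLE rebates (rebate_percentage REAL, min_purchase INTEGER);
    INSERT INTO rebates VALUES (1.0, 5000), (1.5, 7000), (2.0, 11000), (5.0, 20000);
    CREATE TABLE orders (customer_id INTEGER, purchase_amt INTEGER);
    INSERT INTO orders VALUES (1, 9000), (1, 7000), (2, 3000);
""")

# Sum the orders per customer first, then look up the tier for that total.
TIER_PER_CUSTOMER = """
    SELECT t.customer_id,
           t.total,
           (SELECT r.rebate_percentage
            FROM rebates r
            WHERE r.min_purchase = (SELECT MAX(min_purchase)
                                    FROM rebates
                                    WHERE min_purchase <= t.total)) AS rebate
    FROM (SELECT customer_id, SUM(purchase_amt) AS total
          FROM orders
          GROUP BY customer_id) t
    ORDER BY t.customer_id
"""

rows = conn.execute(TIER_PER_CUSTOMER).fetchall()
print(rows)  # [(1, 16000, 2.0), (2, 3000, None)]
```

A total below the lowest tier comes back as NULL (None in Python), so you may want to wrap the lookup in coalesce(..., 0).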
I have a table which requires filtering based on the dates.
| Group | Account | Values | Date_ingested |
| -------- | -------- | -------- | -------- |
| X | 3000 | 0 | 2023-01-07 |
| Y | 3000 | null | 2021-02-22 |
The goal is to select the row with the latest date when there are multiple data points for the same account, as in the example above.
The account 3000 in the dataframe occurs under two Groups but the up-to-date and correct result should only reflect the group X because it was ingested into Databricks very recently.
Now, if I use the code below with grouping, it executes, but the max function is effectively ignored: I get two rows for account 3000, one for group X and one for group Y.
Select Group, Account, Values, max(Date_ingested) from datatableX group by Group, Account, Values
If I choose to use the code without grouping, I get the following error
Error in SQL statement: AnalysisException: grouping expressions sequence is empty, and 'datatableX.Account' is not an aggregate function. Wrap '(max(spark_catalog.datatableX.Date_ingested) AS `max(Date_ingested)`)' in windowing function(s) or wrap 'spark_catalog.datatableX.Account' in first() (or first_value) if you don't care which value you get.
I can't, however, figure out a way to do the above. I tried reading about aggregate functions but I can't grasp the concept.
Select Group, Account, Values, max(Date_ingested) from datatableX
or
Select Group, Account, Values, max(Date_ingested) from datatableX
group by Group, Account, Values
You want the entire latest record per account, which suggests filtering rather than aggregation.
A typical approach uses row_number() to enumerate records having the same account by descending date of ingestion, then filters on the top record per account in the outer query:
select *
from (
select t.*,
row_number() over(partition by account order by date_ingested desc) rn
from datatableX t
) d
where rn = 1
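To see the row_number() filter in action, here is a small sketch using SQLite from Python instead of Databricks (an assumption for runnability; SQLite 3.25+). The columns Group and Values are renamed to grp and val because those words are SQL keywords, and the third row is made-up test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# grp/val stand in for the Group/Values columns; the 'Z' row is invented.
conn.executescript("""
    CREATE TABLE datatableX (grp TEXT, account INTEGER, val INTEGER, date_ingested TEXT);
    INSERT INTO datatableX VALUES
        ('X', 3000, 0,    '2023-01-07'),
        ('Y', 3000, NULL, '2021-02-22'),
        ('Z', 4000, 5,    '2022-06-01');
""")

# Number the rows of each account from newest to oldest, keep number 1.
LATEST_PER_ACCOUNT = """
    SELECT grp, account, val, date_ingested
    FROM (
        SELECT t.*,
               ROW_NUMBER() OVER (PARTITION BY account
                                  ORDER BY date_ingested DESC) AS rn
        FROM datatableX t
    ) d
    WHERE rn = 1
    ORDER BY account
"""

latest = conn.execute(LATEST_PER_ACCOUNT).fetchall()
print(latest)  # [('X', 3000, 0, '2023-01-07'), ('Z', 4000, 5, '2022-06-01')]
```

Only group X survives for account 3000, because its row has the newest ingestion date.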
I have a table: maps_query like below:
CREATE TABLE maps_query
(
id int,
day varchar,
search_query varchar,
country varchar,
query_score int
)
The question is to output the relative percentage of queries for maps_query for each country.
Desired output is like below:
country pct
----------------
CA 0.13
FR 0.45
GB 0.21
I don't quite understand what relative percentage is here but I assumed it's asking to output (a country's search_query counts/ all search_query for all countries)?
Would something like the following work?
Select
country,
(sum(search_query) / sum(search_query) over () * 100) pct
From
map_search
Group by
country
You almost have it. Here's your SQL adjusted slightly:
SELECT country
, SUM(query_score) / (SUM(SUM(query_score)) OVER ()) AS pct
, SUM(query_score)
, SUM(SUM(query_score)) OVER ()
FROM map_search
GROUP BY country
;
The result, using some test data:
+---------+--------+------------------+-------------------------------+
| country | pct | SUM(query_score) | SUM(SUM(query_score)) OVER () |
+---------+--------+------------------+-------------------------------+
| C1 | 0.5323 | 3300 | 6200 |
| C2 | 0.4677 | 2900 | 6200 |
+---------+--------+------------------+-------------------------------+
search_query wasn't a numeric type. I think you meant query_score.
No need to multiply by 100, if your expected result is not a percent, but just the fraction between 0 and 1.
Your use of a window function wasn't quite valid, since you tried to SUM OVER a non-aggregate (expression not functionally dependent on the GROUP BY terms).
I resolved that by using SUM(query_score) as the expression to use in the window function argument.
First compute the total number of search queries, then use that total to get the relative percentage for each country.
with total as
( select count(*) as total
from maps_query)
select country,
count(*) * 1.0 / total.total as pct
from maps_query cross join total
group by country, total.total
The search_query seems to be a query string. You cannot sum it, it's not a number. What you probably want to do is count queries per country.
Apart from this your query looks quite fine, but if search_query really were a number to add up then you'd have to calculate the sum and the sum of the sums: sum(search_query) / sum(sum(search_query)) over ().
Here is your query corrected:
Select
country,
(count(*) / sum(count(*)) over () * 100) as pct
From
map_search
Group by
country
Order by
country;
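One thing to watch with count(*) / sum(count(*)) over (): in engines that do integer division on integers, the ratio comes out as 0. Below is a hedged sketch on SQLite via Python (SQLite 3.25+ assumed, sample rows invented) that multiplies by 1.0 first and returns a fraction between 0 and 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The four sample rows are made up for illustration.
conn.executescript("""
    CREATE TABLE maps_query (id INTEGER, day TEXT, search_query TEXT,
                             country TEXT, query_score INTEGER);
    INSERT INTO maps_query VALUES
        (1, '2023-01-01', 'coffee', 'CA', 3),
        (2, '2023-01-01', 'museum', 'FR', 5),
        (3, '2023-01-02', 'hotel',  'FR', 2),
        (4, '2023-01-02', 'pizza',  'GB', 4);
""")

# 1.0 * forces real division; without it SQLite divides two integers
# and every share would be 0.
SHARE_PER_COUNTRY = """
    SELECT country,
           ROUND(1.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 2) AS pct
    FROM maps_query
    GROUP BY country
    ORDER BY country
"""

shares = conn.execute(SHARE_PER_COUNTRY).fetchall()
print(shares)  # [('CA', 0.25), ('FR', 0.5), ('GB', 0.25)]
```

Multiply by 100.0 instead of 1.0 if you want the result as a percentage.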
I'm trying to add a column which calculates percentages of different products in MS Access Query. Basically, this is the structure of the query that I'm trying to reach:
Product | Total | Percentage
Prod1 | 15 | 21.13%
Prod2 | 23 | 32.39%
Prod3 | 33 | 46.48%
Product | 71 | 100%
The formula I use for the percentage is ([Total Q of a Product]/[Totals of all Products])*100, but when I try to build it with the expression builder in MS Access (my SQL skills are basic)..
= [CountOfProducts] / Sum([CountOfProducts])
..I receive the error message "Cannot have aggregate function in GROUP BY clause.. (and the expression goes here)". I also tried using two queries: one that calculates only the totals and another that uses the first one to calculate the percentages, but the result was the same.
I'll be grateful if someone can help me with this.
You can get all but the last row of your desired output with this query.
SELECT
y.Product,
y.Total,
Format((y.Total/sub.SumOfTotal),'#.##%') AS Percentage
FROM
YourTable AS y,
(
SELECT Sum(Total) AS SumOfTotal
FROM YourTable
) AS sub;
Since that query does not include a JOIN or WHERE condition, it returns a cross join between the table and the single row of the subquery.
If you need the last row from your question example, you can UNION the query with another which returns the fabricated row you want. In this example, I used a custom Dual table which is designed to always contain one and only one row. But you could substitute another table or query which returns a single row.
SELECT
y.Product,
y.Total,
Format((y.Total/sub.SumOfTotal),'#.##%') AS Percentage
FROM
YourTable AS y,
(
SELECT Sum(Total) AS SumOfTotal
FROM YourTable
) AS sub
UNION ALL
SELECT
'Product',
DSum('Total', 'YourTable'),
'100%'
FROM Dual;
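The cross join against a one-row total subquery is not Access-specific. Here is a rough sketch of the same idea on SQLite via Python (an assumption for runnability), with round() standing in for Access's Format() and a plain aggregate query standing in for DSum and the Dual table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE YourTable (Product TEXT, Total INTEGER);
    INSERT INTO YourTable VALUES ('Prod1', 15), ('Prod2', 23), ('Prod3', 33);
""")

# Every product row is paired with the single row holding the grand total;
# the second SELECT fabricates the summary row.
WITH_TOTAL_ROW = """
    SELECT y.Product,
           y.Total,
           ROUND(100.0 * y.Total / sub.SumOfTotal, 2) AS Percentage
    FROM YourTable AS y
    CROSS JOIN (SELECT SUM(Total) AS SumOfTotal FROM YourTable) AS sub
    UNION ALL
    SELECT 'Product', SUM(Total), 100.0
    FROM YourTable
"""

rows = conn.execute(WITH_TOTAL_ROW).fetchall()
for row in rows:
    print(row)
```

The percentages come back as numbers here; formatting them with a % sign is left to the presentation layer.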
In my table, I have data that looks like this:
CODE DATE PRICE
100 1/1/13 $500
100 2/1/13 $521
100 3/3/13 $530
100 5/9/13 $542
222 3/3/13 $20
350 1/1/13 $200
350 3/1/13 $225
Is it possible to create a query to pull out the TWO most recent records by DATE, and only if there are 2+ dates for a specific code? So the result would be:
CODE DATE PRICE
100 5/9/13 $542
100 3/3/13 $530
350 3/1/13 $225
350 1/1/13 $200
Bonus points if you can put both prices/dates on the same line, like this:
CODE OLD_DATE OLD_PRICE NEW_DATE NEW_PRICE
100 3/3/13 $530 5/9/13 $542
350 1/1/13 $200 3/1/13 $225
Thank you!!!
I managed to solve it with 5 sub-queries and 1 rollup query.
First we have a subquery that gives us the MAX date for each code.
Next, we do the same subquery, except we exclude our previous results.
We assume that your data is already rolled up and you won't have duplicate dates for the same code.
Next we bring in the appropriate Code / Price for the latest and 2nd latest date. If a code doesn't exist in the 2nd Max query - then we don't include it at all.
In the union query we're combining the results of both. In the Rollup Query, we're sorting and removing null values generated in the union.
Results:
CODE MaxOfOLDDATE MaxOfOLDPRICE MaxOfNEWDATE MaxOfNEWPRICE
100 2013-03-03 $530.00 2013-05-09 542
350 2013-01-01 $200.00 2013-03-01 225
Using your Data in a table called "Table", create the following queries:
SUB_2ndMaxDatesPerCode:
SELECT Table.CODE, Max(Table.Date) AS MaxOfDATE1
FROM SUB_MaxDatesPerCode RIGHT JOIN [Table] ON (SUB_MaxDatesPerCode.MaxOfDATE = Table.DATE) AND (SUB_MaxDatesPerCode.CODE = Table.CODE)
GROUP BY Table.CODE, SUB_MaxDatesPerCode.CODE
HAVING (((SUB_MaxDatesPerCode.CODE) Is Null));
SUB_MaxDatesPerCode:
SELECT Table.CODE, Max(Table.Date) AS MaxOfDATE
FROM [Table]
GROUP BY Table.CODE;
SUB_2ndMaxData:
SELECT Table.CODE, Table.Date, Table.PRICE
FROM [Table] INNER JOIN SUB_2ndMaxDatesPerCode ON (Table.DATE = SUB_2ndMaxDatesPerCode.MaxOfDATE1) AND (Table.CODE = SUB_2ndMaxDatesPerCode.Table.CODE);
SUB_MaxData:
SELECT Table.CODE, Table.Date, Table.PRICE
FROM ([Table] INNER JOIN SUB_MaxDatesPerCode ON (Table.DATE = SUB_MaxDatesPerCode.MaxOfDATE) AND (Table.CODE = SUB_MaxDatesPerCode.CODE)) INNER JOIN SUB_2ndMaxDatesPerCode ON Table.CODE = SUB_2ndMaxDatesPerCode.Table.CODE;
SUB_Data:
SELECT CODE, DATE AS OLDDATE, PRICE AS OLDPRICE, NULL AS NEWDATE, NULL AS NEWPRICE FROM SUB_2ndMaxData
UNION ALL SELECT CODE, NULL AS OLDDATE, NULL AS OLDPRICE, DATE AS NEWDATE, PRICE AS NEWPRICE FROM SUB_MaxData;
Data (Rollup):
SELECT SUB_Data.CODE, Max(SUB_Data.OLDDATE) AS MaxOfOLDDATE, Max(SUB_Data.OLDPRICE) AS MaxOfOLDPRICE, Max(SUB_Data.NEWDATE) AS MaxOfNEWDATE, Max(SUB_Data.NEWPRICE) AS MaxOfNEWPRICE
FROM SUB_Data
GROUP BY SUB_Data.CODE
ORDER BY SUB_Data.CODE;
There you go - thanks for the challenge.
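For comparison, on a database that supports window functions (Access does not, so this is a different technique from the sub-queries above), the same result can probably be had in one statement. A sketch on SQLite 3.25+ via Python; the table name prices and the ISO date strings are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dates are stored as ISO strings so that text ordering matches date ordering.
conn.executescript("""
    CREATE TABLE prices (code INTEGER, date TEXT, price INTEGER);
    INSERT INTO prices VALUES
        (100, '2013-01-01', 500), (100, '2013-02-01', 521),
        (100, '2013-03-03', 530), (100, '2013-05-09', 542),
        (222, '2013-03-03', 20),
        (350, '2013-01-01', 200), (350, '2013-03-01', 225);
""")

# rn numbers each code's rows from newest to oldest; n counts rows per code.
# Codes with a single row (n = 1) are dropped, then rows 1 and 2 are pivoted.
OLD_AND_NEW = """
    WITH ranked AS (
        SELECT code, date, price,
               ROW_NUMBER() OVER (PARTITION BY code ORDER BY date DESC) AS rn,
               COUNT(*)     OVER (PARTITION BY code) AS n
        FROM prices
    )
    SELECT code,
           MAX(CASE WHEN rn = 2 THEN date  END) AS old_date,
           MAX(CASE WHEN rn = 2 THEN price END) AS old_price,
           MAX(CASE WHEN rn = 1 THEN date  END) AS new_date,
           MAX(CASE WHEN rn = 1 THEN price END) AS new_price
    FROM ranked
    WHERE n >= 2 AND rn <= 2
    GROUP BY code
    ORDER BY code
"""

pairs = conn.execute(OLD_AND_NEW).fetchall()
for pair in pairs:
    print(pair)
```

Like the sub-query version, this assumes there are no duplicate dates for the same code.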
Accessing the recent data
To get the most recent data, use TOP 2 with a descending sort: order the rows from newest to oldest, then take the first two. It is like reading the alphabet backwards from Z and taking the first two letters, Z and Y.
SELECT TOP 2 * FROM table_name ORDER BY column_time DESC;
This way, you reverse the table, and then select the most recent two from the top.
Joining the Tables
To combine columns from two tables into the result you want, you can use JOIN (I prefer INNER JOIN), such as:
SELECT TOP 2 * FROM table_name1 INNER JOIN table_name2 ON
table_name1.column_name = table_name2.column_name2
This joins the two tables on the rows where the value in one table's column matches the value in the other table's column.
You can then read the values in a for loop or a foreach loop.
My suggestion
My preferred method would be to first select just the data, ordered by date.
Then, inside the foreach() loop where you write out that data, select the remaining data for that time and write it inside the loop.
Code (column_name) won't bother you
When you run the query with ORDER BY Time DESC, you no longer filter on the code with something like WHERE Code = value, and you get the codes of the most recent rows. If you really need to filter on the code column, you can do it with an if/else block.
Reference:
http://technet.microsoft.com/en-us/library/ms190014(v=sql.105).aspx (Inner join)
http://www.w3schools.com/sql/sql_func_first.asp (top; check the Sql Server query)
I'm quite new to SQL and I'd like to make a SELECT statement that retrieves only the first row of each set, based on a column value. I'll try to make it clearer with a table example.
Here is my table data :
chip_id | sample_id
-------------------
1 | 45
1 | 55
1 | 5986
2 | 453
2 | 12
3 | 4567
3 | 9
I'd like to have a SELECT statement that fetches the first line for each of chip_id = 1, 2, 3.
Like this :
chip_id | sample_id
-------------------
1 | 45 or 55 or whatever
2 | 12 or 453 ...
3 | 9 or ...
How can I do this?
Thanks
I'd probably:
set a variable = 0
order your table by chip_id
read the table in row by row
if table[row].chip_id > variable, store table[row] in a result array and set variable to that chip_id
loop till done
return your result array
Though depending on your DB, query and versions, you'll probably get unpredictable/unreliable results.
You can get one value using row_number():
select chip_id, sample_id
from (select chip_id, sample_id,
row_number() over (partition by chip_id order by rand()) as seqnum
from your_table
) t
where seqnum = 1
This returns a random value. In SQL, tables are inherently unordered, so there is no concept of "first". You need an auto incrementing id or creation date or some way of defining "first" to get the "first".
If you have such a column, then replace rand() with the column.
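As a sketch of that last point, suppose the table had an auto-incrementing id column (hypothetical, it is not in the question). Ordering row_number() by it makes "first" well defined. Run here on SQLite 3.25+ from Python, with the table name your_table assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# id is a hypothetical auto-incrementing key that records insertion order.
conn.executescript("""
    CREATE TABLE your_table (id INTEGER PRIMARY KEY, chip_id INTEGER, sample_id INTEGER);
    INSERT INTO your_table (chip_id, sample_id) VALUES
        (1, 45), (1, 55), (1, 5986), (2, 453), (2, 12), (3, 4567), (3, 9);
""")

# Within each chip_id, the row with the smallest id gets seqnum = 1.
FIRST_PER_CHIP = """
    SELECT chip_id, sample_id
    FROM (
        SELECT chip_id, sample_id,
               ROW_NUMBER() OVER (PARTITION BY chip_id ORDER BY id) AS seqnum
        FROM your_table
    ) t
    WHERE seqnum = 1
    ORDER BY chip_id
"""

first_rows = conn.execute(FIRST_PER_CHIP).fetchall()
print(first_rows)  # [(1, 45), (2, 453), (3, 4567)]
```

With an explicit ordering column the result is the same on every run, unlike the rand() version.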
Provided I understood your output, if you are using PostgreSQL 9, you can use this:
SELECT chip_id ,
string_agg(sample_id::text, ' or ')
FROM your_table
GROUP BY chip_id
You need to group your data with a GROUP BY query.
When you group, generally you want the max, the min, or some other values to represent your group. You can do sums, count, all kind of group operations.
For your example, you don't seem to want a specific group operation, so the query could be as simple as this one :
SELECT chip_id, MAX(sample_id)
FROM table
GROUP BY chip_id
This way you are retrieving the maximum sample_id for each of the chip_id.