SQL Server: duplicate key from creating unique index - sql-server-2012

I have a table of stock quotes, and dates:
StockID  QuoteID  QuoteDay    QuoteClose
-----------------------------------------
5        95       2018-01-03  1.080
5        96       2018-01-04  1.110
5        97       2018-01-05  1.000
5        98       2018-01-06  1.030
5        99       2018-01-07  1.010
5        100      2018-01-08  0.899
5        101      2018-01-09  0.815
I create a clustered index to manipulate the data, but am running into duplicate key errors with the index:
CREATE UNIQUE CLUSTERED INDEX MACD_IDX ON #TBL_MACD_LOOP (StockId, QuoteId)
Different combinations of StockID and QuoteID will result in the same output:
For example (StockID, QuoteID) of (5, 11) and (51, 1) both produce an index of 511.
My solution is to add "-" between StockId and QuoteId.
Now (5, 11) produces 5-11 and (51, 1) produces 51-1.
How do I combine strings with values?

No, you are definitely mistaken.
The combinations for (StockId, QuoteId) of (5, 11) and (51, 1) are two DISTINCTLY different pairs of values.
They are NOT combined into a single value (of 511 as you assume) when creating the index entry. Those are two different values and therefore can co-exist in that table - no problem.
To prove this - just run this INSERT statement:
INSERT INTO #TBL_MACD_LOOP(StockId, QuoteId, QuoteDay, QuoteClose)
VALUES (5, 11, '20180505', 42.76), (51, 1, '20180505', 128.07)
Even with your unique index in place, this INSERT works without any trouble at all (assuming you don't already have one of these two pairs of values in your table, of course).
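As an aside, regarding the literal "how do I combine strings with values" part: you do not need it for the index, but if you ever want a single display string, a sketch like the following would work (CONCAT handles the numeric-to-string conversion, and the separator avoids the (5, 11) vs. (51, 1) ambiguity):
SELECT StockId,
       QuoteId,
       CONCAT(StockId, '-', QuoteId) AS CombinedKey  -- e.g. '5-11' vs. '51-1'
FROM #TBL_MACD_LOOP;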

Related

INSERT rows into SQL Server by looping through a column with numbers

Let's say I have a very basic table:
DAY_ID  Value  Inserts
------  -----  -------
5       8      2
4       3      0
3       3      0
2       4      1
1       8      0
I want to be able to "loop" through the Inserts column and add that many rows.
For each added row, I want DAY_ID to be decreased by 1 (cumulatively), Value to remain the same, and the Inserts column set to 0 (it is irrelevant for the new rows).
So 2 new rows should be added from DAY_ID = 5 and Value = 8, and 1 new row with DAY_ID = 2 and Value = 4. The final output of the new rows would be:
DAY_ID  Value  Inserts
------  -----  -------
(5-1)   8      0
(5-2)   8      0
(2-1)   4      0
I haven't tried much in SQL Server. I was able to create a solution in R and Python using arrays, but I'm really hoping I can make something work in SQL Server for this project.
I think this can be done using a loop in SQL.
Looping is generally not the way you solve any problems in SQL - SQL is designed and optimized to work with sets, not one row at a time.
Consider this source table:
CREATE TABLE dbo.src(DAY_ID int, Value int, Inserts int);
INSERT dbo.src VALUES
(5, 8, 2),
(4, 3, 0),
(3, 3, 0),
(2, 4, 1),
(1, 8, 0);
There are many ways to "explode" a set based on a single value. One is to split a string of commas: the separator is replicated Inserts - 1 times, so STRING_SPLIT returns exactly Inserts rows per source row.
-- INSERT dbo.src(DAY_ID, Value, Inserts)
SELECT
  DAY_ID  = src.DAY_ID - ROW_NUMBER() OVER (PARTITION BY src.DAY_ID ORDER BY @@SPID),
  src.Value,
  Inserts = 0
FROM dbo.src
CROSS APPLY STRING_SPLIT(REPLICATE(',', src.Inserts - 1), ',') AS v
WHERE src.Inserts > 0;
Output:
DAY_ID  Value  Inserts
------  -----  -------
1       4      0
4       8      0
3       8      0
Working example in this fiddle.
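Another option (a sketch, not from the original answer, assuming the same dbo.src table): join each row to a small "tally" of numbers 1..Inserts instead of splitting a string. This also works on versions that predate STRING_SPLIT (pre-2016):
SELECT
  DAY_ID  = src.DAY_ID - n.n,
  src.Value,
  Inserts = 0
FROM dbo.src
CROSS APPLY (
  SELECT TOP (src.Inserts) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
  FROM sys.all_objects  -- any object/table with enough rows will do
) AS n
WHERE src.Inserts > 0;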

How to replace a column value by its previous value with condition

Row  Category
---  ------------
1    New business
2    Adjustment
3    Adjustment
4    Renewal
5    Adjustment
6    Cancellation
The goal is to replace every 'Adjustment' Category with the nearest non-Adjustment value from the rows above it.
Output:
Row  Category
---  ------------
1    New business
2    New business
3    New business
4    Renewal
5    Renewal
6    Cancellation
I was going to tell you it couldn't be done, but it can. I invented a "row number" column here, but you can substitute your timestamp. This is sqlite3:
CREATE TABLE data (
  row integer,
  category text
);
INSERT INTO data VALUES
  (1, 'New business'),
  (2, 'Adjustment'),
  (3, 'Adjustment'),
  (4, 'Renewal'),
  (5, 'Adjustment'),
  (6, 'Cancellation');
UPDATE data SET category = (
  SELECT category FROM data d2
  WHERE d2.category != 'Adjustment'
    AND d2.row < data.row
  ORDER BY d2.row DESC LIMIT 1
)
WHERE category = 'Adjustment';
Basically, for each 'Adjustment' row, the subquery selects the nearest preceding non-Adjustment row: the one with the largest row number that is still smaller than the current row's.
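Since the surrounding questions are about SQL Server: a minimal T-SQL sketch of the same idea (assuming the same data table), with TOP (1) taking the place of SQLite's LIMIT 1:
UPDATE d
SET category = (
  SELECT TOP (1) d2.category
  FROM data AS d2
  WHERE d2.category <> 'Adjustment'
    AND d2.[row] < d.[row]   -- [row] bracketed just to be safe
  ORDER BY d2.[row] DESC
)
FROM data AS d
WHERE d.category = 'Adjustment';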

Write SQL query with division, total and %

I'm writing test queries in MS SQL Server to test reports.
Can't figure out how to calculate the following:
Ingredient_Cost_Paid / Total Ingredient_Cost_Paid * 100 as 'Ingredient Cost Allow as % of Total'
This is Ingredient cost allowable as a percentage of the total ingredient cost allowable.
P.S. I'm new to SQL, so would appreciate explanations as well, so I learn for the future. Thanks
Also, I'm not sure I correctly understand the difference between Total and SUM.
Thanks everyone
The single quote (') is used as a delimiter for textual values. If you use the AS keyword to specify a (column) alias, you need to use square brackets ([]) if it includes spaces and/or special characters:
Ingredient_Cost_Paid / Total_Ingredient_Cost_Paid * 100 as [Ingredient Cost Allow as % of Total]
Is that what you are looking for?
Edit: I noticed that it also works with single quotes! I didn't know that! But honestly, I would not use it. I'm not sure if it's officially considered to be valid.
Regarding the difference between "Total" and SUM, I would need to understand what you mean with "Total", since that is not something that SQL understands. You could probably use the SUM aggregate function to calculate a total. An aggregate function calculates a value based on a certain column/expression in groups of rows (or in the entire table as a whole single group). So you probably need to provide (much) more information in your question to get effective help with that.
Edit:
I would like to elaborate a little on this SQL issue for you. My apologies in advance for this rather lengthy post. ;)
For example, assume that all query logic described here applies to a table called Recipe_Ingredients, which contains rows with information about ingredients for various recipes (identified by the column Recipe_ID) and the price of the recipe ingredient (in a column called Ingredient_Cost_Paid).
The (simplified) table definition would look something like this:
CREATE TABLE Recipe_Ingredients (
  Recipe_ID INT NOT NULL,
  Ingredient_Cost_Paid NUMERIC NOT NULL
);
For testing purposes, I created this table in a test database and populated it with the following query:
INSERT INTO Recipe_Ingredients
VALUES
(12, 4.65),
(12, 0.40),
(12, 9.98),
(27, 5.35),
(27, 12.50),
(27, 1.09),
(27, 3.00),
(65, 2.35),
(65, 0.99);
You could select all rows from the table to view all data in the table:
SELECT
  Recipe_ID,
  Ingredient_Cost_Paid
FROM
  Recipe_Ingredients;
This would yield the following results:
Recipe_ID Ingredient_Cost_Paid
--------- --------------------
12 4.65
12 0.40
12 9.98
27 5.35
27 12.50
27 1.09
27 3.00
65 2.35
65 0.99
You could group the rows based on corresponding Recipe_ID values. Like this:
SELECT
  Recipe_ID
FROM
  Recipe_Ingredients
GROUP BY
  Recipe_ID;
This will yield the following result:
Recipe_ID
---------
12
27
65
Not very spectacular, I agree. But you could ask the query to calculate values based on those groups as well. That's where aggregate functions like COUNT and SUM come into play:
SELECT
  Recipe_ID,
  COUNT(Recipe_ID) AS Number_Of_Ingredients,
  SUM(Ingredient_Cost_Paid) AS Total_Ingredient_Cost_Paid
FROM
  Recipe_Ingredients
GROUP BY
  Recipe_ID;
This will yield the following result:
Recipe_ID Number_Of_Ingredients Total_Ingredient_Cost_Paid
--------- --------------------- --------------------------
12 3 15.03
27 4 21.94
65 2 3.34
Introducing your percentage column is somewhat tricky: each row needs both its own Ingredient_Cost_Paid and the total for its group, so the percentage cannot be expressed directly as a single SUM in the grouped query.
You could specify the previous query as a subquery in the FROM clause of another query (a derived table) and join it with the table Recipe_Ingredients. That way you combine the group data back with the detail data.
I will drop the Number_Of_Ingredients column from now on. It was just an example for the COUNT function, but you do not need it for your issue at hand.
SELECT
  Recipe_Ingredients.Recipe_ID,
  Recipe_Ingredients.Ingredient_Cost_Paid,
  Subquery.Total_Ingredient_Cost_Paid
FROM
  Recipe_Ingredients
  INNER JOIN (
    SELECT
      Recipe_ID,
      SUM(Ingredient_Cost_Paid) AS Total_Ingredient_Cost_Paid
    FROM
      Recipe_Ingredients
    GROUP BY
      Recipe_ID
  ) AS Subquery ON Subquery.Recipe_ID = Recipe_Ingredients.Recipe_ID;
This will yield the following results:
Recipe_ID Ingredient_Cost_Paid Total_Ingredient_Cost_Paid
--------- -------------------- --------------------------
12 4.65 15.03
12 0.40 15.03
12 9.98 15.03
27 5.35 21.94
27 12.50 21.94
27 1.09 21.94
27 3.00 21.94
65 2.35 3.34
65 0.99 3.34
With this, it is pretty easy to add your calculation for the percentage:
SELECT
  Recipe_Ingredients.Recipe_ID,
  Recipe_Ingredients.Ingredient_Cost_Paid,
  Subquery.Total_Ingredient_Cost_Paid,
  CAST(Recipe_Ingredients.Ingredient_Cost_Paid / Subquery.Total_Ingredient_Cost_Paid * 100 AS DECIMAL(8,1)) AS [Ingredient Cost Allow as % of Total]
FROM
  Recipe_Ingredients
  INNER JOIN (
    SELECT
      Recipe_ID,
      SUM(Ingredient_Cost_Paid) AS Total_Ingredient_Cost_Paid
    FROM
      Recipe_Ingredients
    GROUP BY
      Recipe_ID
  ) AS Subquery ON Subquery.Recipe_ID = Recipe_Ingredients.Recipe_ID;
Note that I also cast the percentage column values to type DECIMAL(8,1) so that you do not get values with large fractions. The above query yields the following results:
Recipe_ID Ingredient_Cost_Paid Total_Ingredient_Cost_Paid Ingredient Cost Allow as % of Total
--------- -------------------- -------------------------- -----------------------------------
12 4.65 15.03 30.9
12 0.40 15.03 2.7
12 9.98 15.03 66.4
27 5.35 21.94 24.4
27 12.50 21.94 57.0
27 1.09 21.94 5.0
27 3.00 21.94 13.7
65 2.35 3.34 70.4
65 0.99 3.34 29.6
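As an aside (not part of the question, just another way to get there): SQL Server can also compute the group total with a windowed SUM, which avoids the join to a grouped subquery. A sketch against the same Recipe_Ingredients table:
SELECT
  Recipe_ID,
  Ingredient_Cost_Paid,
  SUM(Ingredient_Cost_Paid) OVER (PARTITION BY Recipe_ID) AS Total_Ingredient_Cost_Paid,
  CAST(Ingredient_Cost_Paid
       / SUM(Ingredient_Cost_Paid) OVER (PARTITION BY Recipe_ID)
       * 100 AS DECIMAL(8,1)) AS [Ingredient Cost Allow as % of Total]
FROM
  Recipe_Ingredients;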
As I said earlier, you will need to supply more information in your question if you need more specific help with your own situation. These queries and their results are just examples to show you what can be possible. Perhaps (and hopefully) this contains enough information to help you find a solution yourself. But you may always ask more specific questions, of course.

with pandas, find a max(col) row in each group

With pandas, given data equivalent to this table:
CREATE TABLE ForgeRock
(`id` int, `productName` varchar(7), `score` int)
;
INSERT INTO ForgeRock
(`id`, `productName`, `score`)
VALUES
(1, 'OpenIDM', '8'),
(2, 'OpenAM', '3'),
(3, 'OpenDJ', '7'),
(4, 'OpenDJ', '4'),
(5, 'OpenAM', '9')
;
The wanted result is:
1 OpenIDM 8
3 OpenDJ 7
5 OpenAM 9
To get the max score on each group,
df.groupby('productName')['score'].max()
Result is:
OpenAM 9
OpenDJ 7
OpenIDM 8
The result is right, but I need the full columns (the id as well).
How could I get score(max) with id and productName?
You want to use idxmax instead of max. This way, you get the index values at which the maximums occurred. You can then use these to access the entire rows of the dataframe.
max_idx = df.groupby('productName')['score'].idxmax()
print(df.loc[max_idx])
id productName score
4 5 OpenAM 9
2 3 OpenDJ 7
0 1 OpenIDM 8

How do aggregates (group by) work on SQL Server?

How does SQL Server implement group by clauses (aggregates)?
As inspiration, take the execution plan of this question's query:
SELECT p_id, DATEDIFF(D, MIN(TreatmentDate), MAX(TreatmentDate))
FROM patientsTable
GROUP BY p_id
Before the grouping, a simple SELECT statement and its execution plan look like this: (execution plan screenshot)
After retrieving the data with the GROUP BY query, the execution plan looks like this: (execution plan screenshot)
Usually it's a Stream Aggregate or a Hash Aggregate.
Stream aggregate sorts the resultset, scans it, and emits a result whenever the group value changes (i.e. differs from the previous row in the scan). This way it only needs to keep a single set of aggregate state variables at a time.
Hash aggregate builds a hash table from the resultset. Each entry keeps the aggregate state variables which are initialized on hash miss and updated on hash hit.
Let's see how AVG works. It needs two state variables: sum and count.
grouper value
1 4
1 3
2 8
1 7
2 1
1 2
2 6
2 3
Stream Aggregate
First, it needs to sort the values:
grouper value
1 4
1 3
1 7
1 2
2 8
2 1
2 6
2 3
Then, it keeps one set of state variables, initialized to 0, and scans the sorted resultset:
grouper value sum count
-- Entered
-- Variables: 0 0
1 4 4 1
1 3 7 2
1 7 14 3
1 2 16 4
-- Group change. Return the result and reinitialize the variables
-- Returning 1, 4
-- Variables: 0 0
2 8 8 1
2 1 9 2
2 6 15 3
2 3 18 4
-- Group change. Return the result and reinitialize the variables
-- Returning 2, 4.5
-- End
Hash Aggregate
Just scanning the values and keeping the state variables in the hash table:
grouper value
-- Hash miss. Adding new entry to the hash table
-- [1] (0, 0)
-- ... and updating it:
1 4 [1] (4, 1)
-- Hash hit. Updating the entry:
1 3 [1] (7, 2)
-- Hash miss. Adding new entry to the hash table
-- [1] (7, 2) [2] (0, 0)
-- ... and updating it:
2 8 [1] (7, 2) [2] (8, 1)
1 7 [1] (14, 3) [2] (8, 1)
2 1 [1] (14, 3) [2] (9, 2)
1 2 [1] (16, 4) [2] (9, 2)
2 6 [1] (16, 4) [2] (15, 3)
2 3 [1] (16, 4) [2] (18, 4)
-- Scanning the hash table and returning the aggregated values
-- 1 4
-- 2 4.5
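For reference (not in the original answer), a small sketch that reproduces those two results directly: AVG is just SUM divided by COUNT, which is exactly what the two state variables above track.
SELECT
  grouper,
  AVG(CAST([value] AS decimal(10,2)))            AS avg_builtin,
  SUM(CAST([value] AS decimal(10,2))) / COUNT(*) AS sum_over_count
FROM (VALUES (1,4),(1,3),(2,8),(1,7),(2,1),(1,2),(2,6),(2,3)) AS t(grouper, [value])
GROUP BY grouper;
-- grouper 1 -> 4.0, grouper 2 -> 4.5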
Usually, the stream (sort-based) aggregate is faster if the resultset is already ordered (e.g. the values come out of an index, or from a resultset sorted by a previous operation).
Hash is faster if the resultset is not sorted (hashing is cheaper than sorting).
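If you want to see both operators against your own table, SQL Server accepts query hints that request one strategy or the other (a sketch, assuming the patientsTable from the question; the optimizer normally picks for you, so use these mainly for experimenting):
-- Ask for a Hash Match (Aggregate):
SELECT p_id, MIN(TreatmentDate), MAX(TreatmentDate)
FROM patientsTable
GROUP BY p_id
OPTION (HASH GROUP);

-- Ask for a Stream Aggregate (a Sort may be added if no suitable index exists):
SELECT p_id, MIN(TreatmentDate), MAX(TreatmentDate)
FROM patientsTable
GROUP BY p_id
OPTION (ORDER GROUP);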
MIN and MAX are special cases, since they don't require scanning the whole group: only the first and the last value of the aggregated column within the group.
Unfortunately, SQL Server, unlike most other systems, cannot utilize this efficiently, since it is not good at doing an INDEX SKIP SCAN (jumping over distinct index keys).
While a simple MAX or MIN (without a GROUP BY clause) uses a TOP method if an index on the aggregated column is present, MIN and MAX with GROUP BY use the same methods as other aggregate functions do.
As I don't have the table available, I tried with my custom table PRICE.
I defined the primary key as ID_PRICE.
SELECT PRICE.ID_PRICE, MAX(PRICE.COSTKVARH) - MIN(PRICE.COSTKVARH)
FROM PRICE
GROUP BY PRICE.ID_PRICE
Plan:
PLAN (PRICE ORDER PK_PRICE)
Adapted plan:
PLAN (PRICE ORDER PK_PRICE)
In your case p_id is a primary key, so the adapted plan will first order patientsTable on p_id, and then the grouping and difference calculation will happen.