Oracle - Converting a column to rows and getting a sum - sql

I'm trying to convert a column to rows and get a sum of the items ordered. Below is how the data currently sits in the database. Notice how the DollarDays store shows that not all stores have ordered all items.
Store        ItemNumber  Description  Ordered
-----------  ----------  -----------  -------
WallyMart    10021       Corn         10
J-Mart       10021       Corn         4
Big-H Foods  10021       Corn         32
WallyMart    20055       Beans        11
J-Mart       20055       Beans        3
Big-H Foods  20055       Beans        21
DollarDays   50277       Onions       48
Below is my goal. It looks to be pretty much a grouping by ItemNumber:
ItemNumber  Description  WallyMart  J-Mart  Big-H Foods  DollarDays  TotalOrdered
----------  -----------  ---------  ------  -----------  ----------  ------------
10021       Corn         10         4       32           0           46
20055       Beans        11         3       21           0           35
50277       Onions       0          0       0            48          48
When I try PIVOT, this is a shortened example of what I get. I'm totally lost.
ItemNumber  Description  WallyMart  J-Mart  Big-H Foods  DollarDays
----------  -----------  ---------  ------  -----------  ----------
10021       Corn         10         Null    Null         Null
10021       Corn         Null       4       Null         Null
10021       Corn         Null       Null    32           Null
10021       Corn         Null       Null    Null         0
FYI, I am of course a beginner and a student, so please forgive me if I haven't posted this exactly right.
Any help is appreciated.

Try this:
select
    itemnumber, description,
    coalesce("WallyMart", 0)   as "WallyMart",
    coalesce("J-Mart", 0)      as "J-Mart",
    coalesce("Big-H Foods", 0) as "Big-H Foods",
    coalesce("DollarDays", 0)  as "DollarDays",
    coalesce("WallyMart", 0) + coalesce("J-Mart", 0)
      + coalesce("Big-H Foods", 0) + coalesce("DollarDays", 0) as "Total"
from
    (select * from stores) s
    pivot
    (max(ordered)
     for store in
        ('WallyMart'   as "WallyMart",
         'J-Mart'      as "J-Mart",
         'Big-H Foods' as "Big-H Foods",
         'DollarDays'  as "DollarDays")) p
To explain a bit, the PIVOT clause consists of 2 parts - the aggregation and the list of values used for this aggregation. For your case, the aggregation used is MAX, since it looks like one store can have only one record for a particular product. If that is not so, SUM would be the right function to use. Likewise, since we want store-wise details, and not product-wise, we specify the distinct values of the store column in the list.
COALESCE is used to default null values in the ordered column to 0. Finally, we add the 4 derived columns (after coalescing to 0) to get the total value.
SQLFiddle
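If you would rather avoid PIVOT entirely, the same result can also be produced with plain conditional aggregation (GROUP BY plus CASE expressions). This is only a sketch, assuming the same stores table and column names as in the query above:
-- Sketch only: conditional aggregation instead of PIVOT,
-- assuming stores(store, itemnumber, description, ordered).
select itemnumber,
       description,
       sum(case when store = 'WallyMart'   then ordered else 0 end) as "WallyMart",
       sum(case when store = 'J-Mart'      then ordered else 0 end) as "J-Mart",
       sum(case when store = 'Big-H Foods' then ordered else 0 end) as "Big-H Foods",
       sum(case when store = 'DollarDays'  then ordered else 0 end) as "DollarDays",
       sum(ordered)                                                 as "TotalOrdered"
from stores
group by itemnumber, description
order by itemnumber;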

Related

Check for interchangeable column values in SQL

I have 2 tables. The first one contains IDs of certain airports, the second contains flights from one airport to another.
ID Airport
---- ----
12 NYC
23 LOS
21 AMS
54 SFR
33 LSA
from to cost
---- ---- ----
12 23 500
12 23 250
23 12 200
21 23 100
54 12 400
33 21 700
I'd like to return a table that contains ONLY airports that are interchangeable (NYC-LOS in this case), along with a total cost.
Please note that there are identical (from, to) rows with different costs, and the desired output needs to aggregate all costs for each unique combination.
Desired Output :
airport_1 airport_2 total_cost
---- ---- ----
NYC LOS 950
You can get the result without the need for a subquery by using the LEAST() and GREATEST() functions along with a HAVING clause, such as:
SELECT MIN(airport) AS airport_1, MAX(airport) AS airport_2, SUM(cost)/2 AS total_cost
FROM flights
JOIN airports
ON id IN ("from" , "to")
GROUP BY LEAST("from","to"), GREATEST("from","to")
HAVING COUNT(DISTINCT "from")*COUNT(DISTINCT "to")=4
where, for an interchangeable pair, there are two distinct "from" values and two distinct "to" values, so 2 * 2 = 4.
Demo
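For comparison, the same pairs can also be found with an explicit check that the reverse flight exists. This is only a sketch, assuming the same airports (id, airport) and flights ("from", "to", cost) tables as above:
-- Sketch only: keep flights whose reverse direction also exists,
-- then group both directions of a pair together via LEAST/GREATEST on the ids.
SELECT a1.airport  AS airport_1,
       a2.airport  AS airport_2,
       SUM(f.cost) AS total_cost
FROM flights f
JOIN airports a1 ON a1.id = LEAST(f."from", f."to")
JOIN airports a2 ON a2.id = GREATEST(f."from", f."to")
WHERE EXISTS (SELECT 1 FROM flights r
              WHERE r."from" = f."to" AND r."to" = f."from")
GROUP BY a1.airport, a2.airport;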

Increment a counter in a sql select query and sum by value?

I'm trying to do something a little different in a Pervasive DB.
I have a query result that comes out a little like this
part# Qty
----------------
A123 10
A123 3
B234 13
B234 43
A123 24
I can sort them with no problem. What I need is to iterate over the unique values and get a total count of them so I get something like this:
part# Qty Num OfNum
-----------------------------
A123 10 1 3
A123 3 2 3
B234 13 1 2
B234 43 2 2
A123 24 3 3
I am stuck on how to do this in Pervasive.
Would something like this work for you:
SELECT t1.PartNumber, t1.Qty, (SELECT COUNT(PartNumber) FROM "myTable" WHERE PartNumber = t1.PartNumber) AS "OfNum"
FROM "myTable" AS t1
...replacing names as appropriate, of course.
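If you also need the running Num column, one possible approach (just a sketch) is a second correlated subquery. This assumes the table has some unique, ordered column to count against; OrderID below is a hypothetical placeholder for such a column:
-- Sketch only: "OrderID" is a hypothetical unique column used to number rows per part.
SELECT t1.PartNumber,
       t1.Qty,
       (SELECT COUNT(*) FROM "myTable" t2
        WHERE t2.PartNumber = t1.PartNumber AND t2.OrderID <= t1.OrderID) AS "Num",
       (SELECT COUNT(*) FROM "myTable" t3
        WHERE t3.PartNumber = t1.PartNumber) AS "OfNum"
FROM "myTable" AS t1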

How to print SQLITE_MAX_COMPOUND_SELECT when using sqlite?

As defined here, the maximum number of compound SELECT terms is SQLITE_MAX_COMPOUND_SELECT.
How can we get this value when we query an SQLite database using an SQL command?
e.g.
select SQLITE_MAX_COMPOUND_SELECT from SomeTableSomeWhere
or
select someFunctionForThisValue()
Well, I cannot answer your question directly, but as we know, SQLITE_MAX_COMPOUND_SELECT is defined as 500 by default. We can work around this by using SELECT with LIMIT and OFFSET to fetch all the data in a large table (I hope this is what you want).
for example:
ID NAME AGE ADDRESS SALARY
---------- ---------- ---------- ---------- ----------
1 Paul 32 California 20000.0
2 Allen 25 Texas 15000.0
3 Teddy 23 Norway 20000.0
4 Mark 25 Rich-Mond 65000.0
5 David 27 Texas 85000.0
6 Kim 22 South-Hall 45000.0
With the help of a row count:
sqlite> SELECT count(*) FROM COMPANY
We can then select a limited number of rows with an offset to page through the whole table:
sqlite> SELECT * FROM COMPANY LIMIT 3 OFFSET 2;
ID NAME AGE ADDRESS SALARY
---------- ---------- ---------- ---------- ----------
3 Teddy 23 Norway 20000.0
4 Mark 25 Rich-Mond 65000.0
5 David 27 Texas 85000.0
reference here: http://www.tutorialspoint.com/sqlite/sqlite_limit_clause.htm
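Putting the idea together, paging through the whole table might look like this (just a sketch, using 500, the default SQLITE_MAX_COMPOUND_SELECT, as the page size):
-- Repeat, adding 500 to OFFSET each pass, until a pass returns fewer than 500 rows.
SELECT * FROM COMPANY LIMIT 500 OFFSET 0;
SELECT * FROM COMPANY LIMIT 500 OFFSET 500;
SELECT * FROM COMPANY LIMIT 500 OFFSET 1000;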

Running Totals Results Incorrect

I have a cursor that collects information from a different table and then updates the summary. Here is the code snippet that is giving me a headache.
I am trying to update the summary table by filling in the correct Males and Females values so that they add up to the total number of patients.
BEGIN
    Set #MaleId = 154
    set #Femaleid = 655

    if #Category = #MaleId
    begin
        set #Males = #Value
    end

    if #Category = #Femaleid
    begin
        set #CummFemalesOnHaart = #Value
    end

    Update Stats
    set Males = #Malest, Females = #Females
    where
        Category = #Category and
        periodType = #PeriodType AND StartDate = #Startdate and EndDate = #enddate
Problem:
The results are inaccurate
organisation PeriodType StartDate EndDate Deaths Patients Males Females
------------ ---------- ---------- ---------- ------ -------- ----- -------
34 Monthly 2010-04-01 2010-04-30 NULL 6843 896 463
34 Monthly 2010-04-01 2010-04-30 NULL 10041 896 463
34 Monthly 2010-05-01 2010-05-31 NULL 10255 898 463
34 Monthly 2010-05-01 2010-05-31 NULL 7086 898 461
34 Monthly 2010-06-01 2010-06-30 NULL 10344 922 461
36 Monthly 2010-03-01 2010-03-31 NULL 4317 1054 470
36 Monthly 2010-03-01 2010-03-31 NULL 5756 896 470
36 Monthly 2010-04-01 2010-04-30 NULL 4308 896 463
36 Monthly 2010-04-01 2010-04-30 NULL 5783 896 463
The Males and Females values should only update the single rows whose totals add up to the total number of patients, e.g.
Patients Males Females
-------- ----- -------
41 21 20
Any ideas?
Just to clarify a bit more
/* Query Table */
organisation PeriodType StartDate EndDate Males Females
------------ ---------- ---------- ---------- ----- -------
34 Monthly 01-02-2012 01-02-2012 220 205
34 Monthly 01-02-2012 01-02-2012 30 105
/* Update statement */
Update Stats
set Males =#Malest,Females =#Females
where
--Category =#Category and
periodType =#PeriodType AND StartDate =#Startdate and EndDate =#enddate
Stats Table /* Before Input */
organisation PeriodType StartDate EndDate Deaths Patients Males Females
------------ ---------- ---------- ---------- ------ -------- ----- -------
34 Monthly 01-02-2012 01-02-2012 0 425 null null
34 Monthly 01-02-2012 01-02-2012 25 null null null
34 Monthly 01-02-2012 01-02-2012 5 null null null
34 Monthly 01-02-2012 01-02-2012 5 135 null null
/* If you look closely at the rows, the period type, startdate, and enddate are the same, so I don't want the updated values to affect any other row besides the ones with the Patients totals. */
Stats Table /* Output */
organisation PeriodType StartDate EndDate Deaths Patients Males Females
------------ ---------- ---------- ---------- ------ -------- ----- -------
34 Monthly 01-02-2012 01-02-2012 0 425 220 205
34 Monthly 01-02-2012 01-02-2012 25 null null null
34 Monthly 01-02-2012 01-02-2012 5 null null null
34 Monthly 01-02-2012 01-02-2012 5 135 30 105
I hope this gives more information
What you are proposing, from what I can see, will always run the risk of having incorrect data assigned. What happens when you have two records, one with patients 500 (males 250 and females 250) and another with patients 500 (but males 300 and females 200)?
There is no clear distinction which record totals to then use, and you'll end up with possible multiple updates with incorrect data.
That being said, I think what you should do is the following.
Create a CTE which selects the patients from the stats table, and the males/females from the query table, where patients = (males + females).
Run the update statement on the stats table, using values from the CTE joined on the patients total (see the sketch below).
This should, as far as I can understand, give a solution to what you're after.
You can read more about CTE's here: MSDN Common Table Expressions
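A rough sketch of that CTE approach, with table and column names assumed from the question's examples (the query table is called QueryTable here; adjust to your real schema):
-- Sketch only: "QueryTable" is a placeholder name; join on the period columns
-- and only touch rows whose Patients total matches Males + Females.
;WITH Matched AS (
    SELECT q.organisation, q.PeriodType, q.StartDate, q.EndDate,
           q.Males, q.Females
    FROM QueryTable q
)
UPDATE s
SET s.Males   = m.Males,
    s.Females = m.Females
FROM Stats s
JOIN Matched m
    ON  m.organisation = s.organisation
    AND m.PeriodType   = s.PeriodType
    AND m.StartDate    = s.StartDate
    AND m.EndDate      = s.EndDate
    AND s.Patients     = m.Males + m.Females;   -- only rows whose totals line up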
I'm not entirely sure I understand your question... but I think this is what you are after:
UPDATE Stats
SET Males   = (SELECT COUNT(*) FROM <queryTable> WHERE Category = #MaleId AND periodType = #PeriodType AND StartDate = #Startdate AND EndDate = #enddate),
    Females = (SELECT COUNT(*) FROM <queryTable> WHERE Category = #FemaleId AND periodType = #PeriodType AND StartDate = #Startdate AND EndDate = #enddate)
Hope it helps!
I am not sure I understand the question either, but it sounds like you are saying there can be multiple records having the same periodType, startDate and endDate, but a different number of Patients. You wish to ensure the query only updates records where the total number of Patients matches the total you calculated, i.e. total = #Males + #Females.
If that is correct, add a filter on the Patients value:
UPDATE Stats
SET Males = #Males,
Females = #Females
WHERE Category = #Category
AND periodType = #PeriodType
AND StartDate = #Startdate
AND EndDate = #enddate
AND Patients = (#Males + #Females)
Though as an aside, cursors generally tend to be less efficient than set-based alternatives...

Statistics and Cardinality Estimation - Why am I seeing this result?

I came across this little issue when trying to solve a more complex problem and have gotten to the end of my rope trying to figure the optimizer out. So, let's say I have a table called MyTable that can be defined like this:
CREATE TABLE MyTable (
GroupClosuresID int identity(1,1) not null,
SiteID int not null,
DeleteDateTime datetime null
, CONSTRAINT PK_MyTable PRIMARY KEY (GroupClosuresID, SiteID))
This table has 286,685 rows in it and running DBCC SHOW_STATISTICS('MyTable','PK_MyTable') will yield:
Name        Updated             Rows    Rows Sampled  Steps  Density   Average key length  String Index  Filter Expression  Unfiltered Rows
----------  ------------------  ------  ------------  -----  --------  ------------------  ------------  -----------------  ---------------
PK_MyTable  Aug 10 2011 1:00PM  286685  286685        18     0.931986  8                   NO            NULL               286685

(1 row(s) affected)

All density   Average Length  Columns
------------  --------------  -----------------------
3.743145E-06  4               GroupClosuresID
3.488149E-06  8               GroupClosuresID, SiteID

(2 row(s) affected)
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
------------ ------------- ------------- -------------------- --------------
1 0 8 0 1
129 1002 7 127 7.889764
242 826 6 112 7.375
531 2010 6 288 6.979167
717 1108 5 185 5.989189
889 822 4 171 4.807017
1401 2044 4 511 4
1763 1101 3 361 3.049861
14207 24780 1 12443 1.991481
81759 67071 1 67071 1
114457 31743 1 31743 1
117209 2047 1 2047 1
179109 61439 1 61439 1
181169 1535 1 1535 1
229410 47615 1 47615 1
235846 2047 1 2047 1
275456 39442 1 39442 1
275457 0 1 0 1
Now I run a query on this table with no additional indexes or statistics having been created.
SELECT GroupClosuresID FROM MyTable WHERE SiteID = 1397 AND DeleteDateTime IS NULL
Two new statistics objects now appear, one for the SiteID column and the other for DeleteDateTime column. Here they are respectively (Note: Some non-relevant information has been excluded):
Name                       Updated             Rows    Rows Sampled  Steps  Density     Average key length  String Index  Filter Expression  Unfiltered Rows
-------------------------  ------------------  ------  ------------  -----  ----------  ------------------  ------------  -----------------  ---------------
_WA_Sys_00000002_7B0C223C  Aug 10 2011 1:15PM  286685  216605        200    0.03384706  4                   NO            NULL               286685

(1 row(s) affected)

All density   Average Length  Columns
------------  --------------  -------
0.0007380074  4               SiteID

(1 row(s) affected)
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
------------ ------------- ------------- -------------------- --------------
.
.
.
1397 59.42782 16005.02 5 11.83174
.
.
.
Name                       Updated             Rows    Rows Sampled  Steps  Density    Average key length  String Index  Filter Expression  Unfiltered Rows
-------------------------  ------------------  ------  ------------  -----  ---------  ------------------  ------------  -----------------  ---------------
_WA_Sys_00000006_7B0C223C  Aug 10 2011 1:15PM  286685  216605        201    0.7447883  0.8335911           NO            NULL               286685

(1 row(s) affected)

All density   Average Length  Columns
------------  --------------  --------------
0.0001065871  0.8335911       DeleteDateTime

(1 row(s) affected)
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
----------------------- ------------- ------------- -------------------- --------------
NULL 0 255827 0 1
.
.
.
The execution plan generated for the query I ran above gives me no surprises. It consists of a simple Clustered Index Scan with 14282.3 estimated rows and 15676 actual rows. From what I've learned about statistics and cost estimation, using the two histograms above we can multiply the selectivity of SiteID (16005.02 / 286685) times the selectivity of DeleteDateTime (255827 / 286685) to get a composite selectivity of 0.0498187307480119. Multiplying that times the total number of rows (286685) gives us the exact same thing the optimizer did: 14282.3.
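As a quick sanity check, that arithmetic can be reproduced directly (the numbers are taken from the histograms quoted above):
-- Reproducing the estimate from the two selectivities described above:
SELECT (16005.02 / 286685.0) * (255827.0 / 286685.0) * 286685 AS estimated_rows;
-- returns roughly 14282.3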
But here is where I get confused. I create an index with CREATE INDEX IX_MyTable ON Mytable (SiteID, DeleteDateTime) which creates its own statistics object:
Name        Updated             Rows    Rows Sampled  Steps  Density     Average key length  String Index  Filter Expression  Unfiltered Rows
----------  ------------------  ------  ------------  -----  ----------  ------------------  ------------  -----------------  ---------------
IX_MyTable  Aug 10 2011 1:41PM  286685  286685        200    0.02749305  8.822645            NO            NULL               286685

(1 row(s) affected)

All density   Average Length  Columns
------------  --------------  ----------------------------------------
0.0007107321  4               SiteID
7.42611E-05   4.822645        SiteID, DeleteDateTime
3.488149E-06  8.822645        SiteID, DeleteDateTime, GroupClosuresID

(3 row(s) affected)
RANGE_HI_KEY RANGE_ROWS EQ_ROWS DISTINCT_RANGE_ROWS AVG_RANGE_ROWS
------------ ------------- ------------- -------------------- --------------
.
.
.
1397 504 15686 12 42
.
.
.
When I run the same query as before (SELECT GroupClosuresID FROM MyTable WHERE SiteID = 1397 AND DeleteDateTime IS NULL) I still get 15676 rows returned, but my estimated row count is now 181.82.
I've tried manipulating numbers to try and figure out where this estimation is coming from but I just can't get it. I have to assume it is related to the density values for IX_MyTable.
Any help would be greatly appreciated. Thanks!!
EDIT: Here is the execution plan for that last query execution.
This one took some digging!
It's a product of:
the NULL density in your date field (from your first set of stats: 255827 / 286685 = 0.892363)
...times the density of the first field (SiteID) in your new index: 0.0007107321
The formula is:
0.0007107321 * 286685 = 203.7562
-- est. rows with your value in SiteID based on an even distribution of values
255827 / 286685 = 0.892363
-- probability of a NULL across all rows
203.7562 * 0.892363 = 181.8245
I'm guessing that since the row count in this instance doesn't actually affect anything, the optimizer took the easiest route and just multiplied the probabilities together.
Just wanted to write about it, but JNK was first. Basically, the hash function now calculates results for two columns, and the hash-function result for SiteID = 1397 AND DeleteDateTime IS NULL matches approximately 181 rows.
http://en.wikipedia.org/wiki/Hash_table#Hash_function