Clubbing multiple "WITH" clauses in SQL

I am using Oracle Database 10g and trying to compute the upper control limit (UCL) and lower control limit (LCL) for a data set. Though this is fairly useless for phone number values, I am just using it as a learning exercise. The output should have one row per entry with:
salutation, zip, LCL and UCL value
which would allow a better understanding of the data.
with q as (
    select student_id, salutation, zip, first_name, last_name from tempTable)
with r as (
    select avg(phone) as average, stddev(phone) as sd from tempTable)
select salutation, zip, average - 3*sd as "lcl", average + 3*sd as "UCL"
from q, r
The error given is "select statement missing". Please tell me what is wrong; I am a SQL newbie and can't figure it out myself.

When using stacked CTEs, you don't need the WITH keyword for any CTE except the first one; instead, use a comma before each subsequent CTE name. Try this syntax:
WITH q
     AS (SELECT student_id,
                salutation,
                zip,
                first_name,
                last_name
         FROM temptable),
     r
     AS (SELECT Avg(phone) AS average,
                STDDEV(phone) AS sd
         FROM temptable)
SELECT salutation,
       zip,
       average - 3 * sd AS "lcl",
       average + 3 * sd AS "UCL"
FROM q CROSS JOIN r;
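As a quick runnable check of the chained-CTE syntax, here is a sketch using Python's sqlite3 with made-up data (SQLite has no STDDEV aggregate, so the population standard deviation is derived from AVG, with sqrt registered as a user function; the table and column names follow the question):

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.create_function("sqrt", 1, math.sqrt)  # SQLite may lack a built-in sqrt
conn.executescript("""
CREATE TABLE tempTable (student_id INT, salutation TEXT, zip TEXT,
                        first_name TEXT, last_name TEXT, phone INT);
INSERT INTO tempTable VALUES
  (1, 'Mr', '10001', 'A', 'B', 5551000),
  (2, 'Ms', '10002', 'C', 'D', 5552000),
  (3, 'Dr', '10003', 'E', 'F', 5553000);
""")

# One WITH keyword, comma-separated CTEs; population stddev derived as
# sqrt(avg(x*x) - avg(x)*avg(x)) since SQLite has no STDDEV aggregate.
rows = conn.execute("""
WITH q AS (
  SELECT student_id, salutation, zip, first_name, last_name FROM tempTable
),
r AS (
  SELECT AVG(phone) AS average,
         sqrt(AVG(phone*phone) - AVG(phone)*AVG(phone)) AS sd
  FROM tempTable
)
SELECT salutation, zip, average - 3*sd AS lcl, average + 3*sd AS ucl
FROM q CROSS JOIN r
""").fetchall()
for row in rows:
    print(row)
```

Each of the three students comes back with the same lcl/ucl pair, since the limits are computed over the whole table.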

I don't think you need a WITH clause at all to run such a query. It might be better to use the AVG() and STDDEV() functions as window functions (analytic functions in Oracle lingo):
SELECT temp1.*, average - 3 * sd AS lcl, average + 3 * sd AS ucl
FROM (
SELECT student_id, salutation, zip, first_name, last_name
, AVG(phone) OVER ( ) AS average, STDDEV(phone) OVER ( ) AS sd
FROM tempTable
) temp1
You don't even need the subquery but it helps save some keystrokes. See this SQL Fiddle demo with dummy data from DUAL.
P.S. You do need the alias (in this case, temp1) for the subquery if you want to use * to get all the columns selected in the subquery - it won't work otherwise. Alternately you could name the columns explicitly, which is a good practice anyway.


Simpler way to do a SUM with a fanout on a join

Note: the SQL backend does not matter; any mainstream relational DB is fine (Postgres, MySQL, Oracle, SQL Server).
There is an interesting article from Looker describing the technique they use to produce correct totals when a JOIN results in a fanout, along the lines of:
# In other words, using a hash to remove any potential duplicates (assuming a Primary Key).
SUM(DISTINCT big_unique_number + total) - SUM(DISTINCT big_unique_number)
A good way to simulate the fanout is just doing something like this:
WITH Orders AS (
SELECT 10293 AS id, 2.5 AS rate UNION ALL
SELECT 210293 AS id, 3.5
),
Other AS (
SELECT 1 UNION ALL SELECT 2
)
SELECT SUM(rate) FROM Orders CROSS JOIN Other
-- Returns 12.0 instead of 6.0
Their example does something like the following, which I think is just a long-form way of grabbing md5(PK), with all the fancy footwork to get around the 8-byte limitation (hence the LEFT(...) and then the RIGHT(...)):
(COALESCE(CAST( ( SUM(DISTINCT (CAST(FLOOR(COALESCE(users.age ,0)
*(1000000*1.0)) AS DECIMAL(38,0))) +
CAST(STRTOL(LEFT(MD5(CONVERT(VARCHAR,users.id )),15),16) AS DECIMAL(38,0))
* 1.0e8 + CAST(STRTOL(RIGHT(MD5(CONVERT(VARCHAR,users.id )),15),16) AS DECIMAL(38,0)) )
- SUM(DISTINCT CAST(STRTOL(LEFT(MD5(CONVERT(VARCHAR,users.id )),15),16) AS DECIMAL(38,0))
* 1.0e8 + CAST(STRTOL(RIGHT(MD5(CONVERT(VARCHAR,users.id )),15),16) AS DECIMAL(38,0))) )
AS DOUBLE PRECISION)
/ CAST((1000000*1.0) AS DOUBLE PRECISION), 0)
Is there another general-purpose way to do this? Perhaps using a correlated subquery or something else? Or is the above way the best known way to do this?
Two related answers:
https://stackoverflow.com/a/14140884/651174
https://stackoverflow.com/a/3333574/651174
Without worrying about a general-purpose hashing function (for example, that may take strings), the following works:
WITH Orders AS (
SELECT 10293 AS id, 2.5 AS rate UNION ALL
SELECT 210293 AS id, 3.5
),
Other AS (
SELECT 1 UNION ALL SELECT 2
)
SELECT SUM(DISTINCT id + rate) - SUM(DISTINCT id) FROM Orders CROSS JOIN Other
-- 6.0
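The SUM(DISTINCT ...) trick can be verified end to end with a small sqlite3 sketch (same CTEs as above; the naive sum and the corrected sum are computed side by side):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
naive, corrected = conn.execute("""
WITH Orders AS (SELECT 10293 AS id, 2.5 AS rate UNION ALL SELECT 210293, 3.5),
     Other  AS (SELECT 1 UNION ALL SELECT 2)
SELECT SUM(rate),                                  -- inflated by the fanout
       SUM(DISTINCT id + rate) - SUM(DISTINCT id)  -- de-duplicated total
FROM Orders CROSS JOIN Other
""").fetchone()
print(naive, corrected)
```

The cross join doubles every Orders row, so the naive sum is 12.0 while the DISTINCT trick recovers 6.0. Note the trick relies on id + rate being unique per row, which holds here because the ids are far apart relative to the rates.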
But this still raises the question: is there another / better way to do this in a very general-purpose manner?
A typical example of joins mutilating the aggregation is this:
select
posts.id,
count(likes.id) as likes_total,
count(dislikes.id) as dislikes_total
from posts
left join likes on likes.post_id = posts.post_id
left join dislikes on dislikes.post_id = posts.post_id
group by posts.id;
where both counts result in the same number, because each gets multiplied by the other. With 2 likes and 3 dislikes, both counts are 6.
The simple solution is: Aggregate before joining. If you want to know the likes and dislikes counts per post, join the likes and dislikes counts to the posts.
select posts.id, l.likes_total, d.dislikes_total
from posts
left join
(
select post_id, count(*) as likes_total
from likes
group by post_id
) l on l.post_id = posts.post_id
left join
(
select post_id, count(*) as dislikes_total
from dislikes
group by post_id
) d on d.post_id = posts.post_id;
Use COALESCE, if you want to see zeros instead of nulls.
Don't try to muddle through with tricks. Just aggregate, then join. You can of course replace the joins with lateral joins (which are correlated subqueries), if the DBMS supports them. Or for single aggregates as in the example even move the correlated subqueries to the select clause. That's mainly personal preference, but depending on the DBMS's optimizer one solution may be faster than the other. (Ideally the optimizer would come up with the same execution plan for all those queries of course.)
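The aggregate-then-join approach can be demonstrated with a sqlite3 sketch (schema invented for the example, assuming posts' key column is id; the naive double join and the pre-aggregated version are run against the same data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE posts (id INTEGER PRIMARY KEY);
CREATE TABLE likes (id INTEGER PRIMARY KEY, post_id INT);
CREATE TABLE dislikes (id INTEGER PRIMARY KEY, post_id INT);
INSERT INTO posts (id) VALUES (1);
INSERT INTO likes (post_id) VALUES (1), (1);          -- 2 likes
INSERT INTO dislikes (post_id) VALUES (1), (1), (1);  -- 3 dislikes
""")

# Naive double join: the 2x3 fanout inflates both counts to 6
bad = conn.execute("""
SELECT count(l.id), count(d.id)
FROM posts p
LEFT JOIN likes l ON l.post_id = p.id
LEFT JOIN dislikes d ON d.post_id = p.id
GROUP BY p.id
""").fetchone()

# Aggregate before joining: correct counts 2 and 3
good = conn.execute("""
SELECT l.c, d.c
FROM posts p
LEFT JOIN (SELECT post_id, count(*) AS c FROM likes GROUP BY post_id) l
       ON l.post_id = p.id
LEFT JOIN (SELECT post_id, count(*) AS c FROM dislikes GROUP BY post_id) d
       ON d.post_id = p.id
""").fetchone()
print(bad, good)
```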
Use a larger datatype to shift the values out of the way. This is similar to the first example, without the potential for collisions. It probably also has minor performance benefits, in not having to execute two different distinct sums.
sum(distinct id * 1000000000 + value) % 1000000000
The principle is to pack the values into a single unit. For the most flexibility you'd want to convert to something like a wide decimal type in order to accommodate the full range. With strings, it's easy to generate a new surrogate id via dense_rank(); that would also let you collapse the key width according to the number of expected key values.
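As a quick check of the shift trick, here is a sqlite3 sketch with integer measures (SQLite's % operator truncates reals, so an integer qty column stands in for rate; the 1000000000 multiplier assumes every value stays below that bound):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each id contributes a multiple of 1e9, so taking the sum modulo 1e9
# leaves exactly the sum of the distinct per-row values: 2 + 3 = 5.
total = conn.execute("""
WITH Orders AS (SELECT 10293 AS id, 2 AS qty UNION ALL SELECT 210293, 3),
     Other  AS (SELECT 1 UNION ALL SELECT 2)
SELECT SUM(DISTINCT id * 1000000000 + qty) % 1000000000
FROM Orders CROSS JOIN Other
""").fetchone()[0]
print(total)
```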
Ultimately, though, I think the answer is no. There is no one-size-fits-all approach, especially across the spectrum of aggregate functions and the variations in mixed data types.
I think the best option is always to SUM the data before joining, via a subquery or a CTE:
WITH
Orders AS (
SELECT 10293 AS id, 2.5 AS rate
UNION ALL
SELECT 210293 AS id, 3.5
),
Other AS (
SELECT 1 other
UNION ALL
SELECT 2
)
select *
from (
SELECT SUM(rate) rate
FROM Orders
) OrdersSummed
CROSS JOIN Other
or
WITH
Orders AS (
SELECT 10293 AS id, 2.5 AS rate
UNION ALL
SELECT 210293 AS id, 3.5
),
Other AS (
SELECT 1 other
UNION ALL
SELECT 2
),
OrdersSummed AS (
SELECT SUM(rate) rate
FROM Orders
)
select *
from OrdersSummed
CROSS JOIN Other
-- Approaching the solution by treating the fanout as a natural consequence of the cross join: divide the inflated sum by the fan depth.
;WITH Orders AS (
SELECT 10293 AS id, 2.5 AS rate UNION ALL
SELECT 210293 AS id, 3.5
), Other AS (
SELECT 1 as oth_id UNION ALL SELECT 2 as oth_id
)
, FanDepth AS (
SELECT count(*) as depth from Other
)
SELECT SUM(rate) / depth
FROM Orders CROSS JOIN Other CROSS JOIN FanDepth
GROUP BY depth

How to join in SQL Server

I am trying to learn SQL Server and I have created the query below:
WITH T AS
(
SELECT ROW_NUMBER() OVER(ORDER BY d.DIALOG_ID) as row_num, *
FROM test.db as d
INNER JOIN test.dbs as ds
ON d.DIALOG_ID = ds.DIALOG_ID
)
SELECT *
FROM T
WHERE row_num <=10;
I found that the only way to limit the number of rows is with ROW_NUMBER().
However, when I try to run the join I get this error:
org.jkiss.dbeaver.model.sql.DBSQLException: SQL Error [8156] [S0001]: The column 'DIALOG_ID' was specified multiple times for 'T'.
The problem: in the WITH, you do SELECT *, which gets all columns from both tables db and dbs. Both have a column DIALOG_ID, so a column by that name ends up twice in the result set of the WITH.
Although everything up to that point is allowed, it is not good practice: why have the same data twice?
Things go wrong when SQL Server has to determine what SELECT * FROM T means: it expands SELECT * to the actual columns of T, finds a duplicate column name, and refuses to continue.
The fix (and also highly recommended in general): be specific about the columns that you want to output. If T has no duplicate columns, then SELECT * FROM T will succeed.
Note that the even-more-pure variant is to also be specific about what columns you select from T. By doing that it becomes clear at a glance what the SELECT produces, instead of having to guess or investigate when you look at the query later on (or when someone else does).
The updated code would look like this (fill in your column names as we don't know them):
WITH T AS
(
SELECT
ROW_NUMBER() OVER(ORDER BY d.DIALOG_ID) as row_num,
d.DIALOG_ID, d.SOME_OTHER_COL,
ds.DS_ID, ds.SOME_OTHER_COL_2
FROM test.db AS d
INNER JOIN test.dbs AS ds ON d.DIALOG_ID = ds.DIALOG_ID
)
SELECT row_num, DIALOG_ID, SOME_OTHER_COL, DS_ID, SOME_OTHER_COL_2
FROM T
WHERE row_num <= 10;
Alternatively, select columns from just one of the tables (d.* here), so that no column name is duplicated:
WITH T AS
(
SELECT ROW_NUMBER() OVER(ORDER BY d.DIALOG_ID) as row_num, d.*
FROM test.db as d
INNER JOIN test.dbs as ds
ON d.DIALOG_ID = ds.DIALOG_ID
)
SELECT *
FROM T
WHERE row_num <=10;
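The fixed pattern can be exercised with a sqlite3 sketch (table and column contents invented; SQLite also supports ROW_NUMBER, so the same CTE shape applies):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE db  (DIALOG_ID INT, SOME_OTHER_COL TEXT);
CREATE TABLE dbs (DIALOG_ID INT, DS_ID INT);
""")
conn.executemany("INSERT INTO db VALUES (?, ?)",
                 [(i, "row%d" % i) for i in range(1, 21)])
conn.executemany("INSERT INTO dbs VALUES (?, ?)",
                 [(i, 100 + i) for i in range(1, 21)])

rows = conn.execute("""
WITH T AS (
  SELECT ROW_NUMBER() OVER (ORDER BY d.DIALOG_ID) AS row_num,
         d.DIALOG_ID, d.SOME_OTHER_COL, ds.DS_ID  -- each column listed once
  FROM db AS d
  INNER JOIN dbs AS ds ON d.DIALOG_ID = ds.DIALOG_ID
)
SELECT * FROM T WHERE row_num <= 10 ORDER BY row_num
""").fetchall()
print(len(rows))
```

Because every column is listed exactly once inside the CTE, the outer SELECT * expands without a duplicate-name conflict.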

Selecting a 1% sample in Aginity Workbench SQL

I need to scoop up a random sample of 1% of the records in a table (with the number of rows growing every second).
My idea is to
SELECT DISTINCT
random(),
name,
age,
registrationNumber
FROM everGrowingTable
ORDER BY random desc
LIMIT (
(select count(*) from everGrowingTable) * 0.01
) -- this is attempting to get 1%
The compiler complains about the * operator; it is fine when I hard-code the table size, however.
I've tried the IBM documentation, but it talks about calculations using known values, not values that grow, as is the case with my table.
There doesn't seem to be an Aginity SQL function that does this. I've noticed the MINUS function in the Aginity Workbench IntelliSense, but alas, no multiplication equivalent.
You could use window functions in a subquery to assign a random number to each record and compute the total record count, then do the filtering in the outer query:
SELECT name, age, registrationNumber
FROM (
SELECT
name,
age,
registrationNumber,
ROW_NUMBER() OVER(ORDER BY random()) rn,
COUNT(*) OVER() cnt
FROM everGrowingTable
) x
WHERE rn <= cnt / 100
ORDER BY rn
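This window-function approach runs unchanged on SQLite, so it can be sketched with sqlite3 (1,000 invented rows, so 1% is exactly 10):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE everGrowingTable (name TEXT, age INT, registrationNumber INT)")
conn.executemany("INSERT INTO everGrowingTable VALUES (?, ?, ?)",
                 [("n%d" % i, 20 + i % 50, i) for i in range(1000)])

rows = conn.execute("""
SELECT name, age, registrationNumber
FROM (
  SELECT name, age, registrationNumber,
         ROW_NUMBER() OVER (ORDER BY random()) AS rn,  -- shuffle the rows
         COUNT(*) OVER () AS cnt                       -- total row count
  FROM everGrowingTable
) x
WHERE rn <= cnt / 100
ORDER BY rn
""").fetchall()
print(len(rows))
```

Because cnt is computed in the same pass, the 1% threshold tracks the table as it grows, with no hard-coded size.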

SQL max trophy count

I created a database in SQL about basketball. My teacher gave me a task: I need to print out the basketball players from my database with the max trophy count. So I wrote this little bit of code:
select surname ,count(player_id) as trophy_count
from dbo.Players p
left join Trophies t on player_id=p.id
group by p.surname
This returned the trophy count for every player, but I only want the row(s) with the maximum count.
I read up on selects inside selects, but I don't know how it works; I tried and couldn't get it to work.
Use TOP:
SELECT TOP 1 surname, COUNT(player_id) AS trophy_count -- or TOP 1 WITH TIES
FROM dbo.Players p
LEFT JOIN Trophies t
ON t.player_id = p.id
GROUP BY p.surname
ORDER BY COUNT(player_id) DESC;
If you want to get all ties for the highest count, then use SELECT TOP 1 WITH TIES.
;WITH CTE AS
(
select surname, count(player_id) as trophy_count
from dbo.Players p
left join Trophies t on t.player_id = p.id
group by p.surname
)
select *
from CTE
where trophy_count = (select max(trophy_count) from CTE)
While SELECT TOP ... WITH TIES works (and is probably more efficient), I would say this code is probably more useful in the real world, as it could be used to find the max, min, or a specific trophy count with a very simple modification.
This is basically getting your group by first, then allowing you to specify what results you want back. In this instance you can use
max(trophy_count) - get the maximum
min(trophy_count) - get the minimum
# i.e. - where trophy_count = 3 - to get a specific trophy count
avg(trophy_count) - get the average trophy_count
There are many others. Google "SQL Aggregate functions"
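The CTE-with-max approach above can be checked with a sqlite3 sketch (player and trophy data invented for the example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Players (id INTEGER PRIMARY KEY, surname TEXT);
CREATE TABLE Trophies (player_id INT);
INSERT INTO Players VALUES (1, 'Smith'), (2, 'Jones'), (3, 'Brown');
INSERT INTO Trophies VALUES (1), (1), (1), (2);  -- Smith 3, Jones 1, Brown 0
""")

# The CTE computes per-player counts once; the outer query filters to the max.
rows = conn.execute("""
WITH CTE AS (
  SELECT surname, count(player_id) AS trophy_count
  FROM Players p
  LEFT JOIN Trophies t ON t.player_id = p.id
  GROUP BY p.surname
)
SELECT * FROM CTE
WHERE trophy_count = (SELECT max(trophy_count) FROM CTE)
""").fetchall()
print(rows)
```

Swapping max(trophy_count) for min(trophy_count), or a literal such as 3, gives the other variations described above.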
You will eventually go down the rabbit hole of needing to subsection this (for example, by week or by league). Then you are going to want to use window functions with a CTE or subquery.
For your example:
;with cte_base as
(
    -- Set your detail here (this step is only needed if you are looking at aggregates)
    select surname, count(t.player_id) Ct
    from dbo.Players p
    left join Trophies t on t.player_id = p.id
    group by p.surname
)
, cte_ranked as
(
    -- Dense_rank is chosen because of ties
    -- Add a partition to break out your detail, e.g. by league
    select *
    , dr = DENSE_RANK() over (order by Ct desc)
    from cte_base
)
select *
from cte_ranked
where dr = 1 -- Bring back only the #1 rank
This is by far overkill, but it helps lay the foundation for handling much more complicated queries. Tim Biegeleisen's answer is more than adequate to answer your question.

Avoiding Correlated Subquery in Oracle

In Oracle 9.2.0.8, I need to return a record set where a particular field (LAB_SEQ) is at its maximum (it is a sequential VARCHAR array: '0001', '0002', etc.) for each value of another field (WO_NUM). To select the maximum, I am attempting to order in descending order and select the first row. Everything I can find on Stack Overflow suggests that the only way to do this is with a correlated subquery. I then use this maximum in the WHERE clause of the outer query to get the row I want for each WO_NUM:
SELECT lt.WO_NUM, lt.EMP_NUM, lt.LAB_END_DATE, lt.LAB_END_TIME
FROM LAB_TIM lt WHERE lt.LAB_SEQ = (
SELECT LAB_SEQ FROM (
SELECT lab.LAB_SEQ FROM LAB_TIM lab WHERE lab.CCN='1' AND MAS_LOC='1'
AND lt.WO_NUM = lab.WO_NUM ORDER BY ROWNUM DESC
) WHERE ROWNUM=1
)
However, this returns an "invalid identifier" error for lt.WO_NUM. Research suggests that Oracle 8 only allows correlated subqueries one level deep, and recommends rewriting to avoid the subquery - something which the discussion of selecting maximums suggests can't be done. Any help getting this statement to execute would be greatly appreciated.
Your correlated subquery would need to be something like
SELECT lt.WO_NUM, lt.EMP_NUM, lt.LAB_END_DATE, lt.LAB_END_TIME
FROM LAB_TIM lt WHERE lt.LAB_SEQ = (
SELECT max(lab.LAB_SEQ)
FROM LAB_TIM lab
WHERE lab.CCN='1' AND MAS_LOC='1'
AND lt.WO_NUM = lab.WO_NUM
)
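A sqlite3 sketch of this correlated MAX subquery (invented sample rows; two WO_NUMs, each with a highest LAB_SEQ):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LAB_TIM (WO_NUM INT, LAB_SEQ TEXT, EMP_NUM INT, CCN TEXT, MAS_LOC TEXT);
INSERT INTO LAB_TIM VALUES
  (1, '0001', 10, '1', '1'),
  (1, '0002', 11, '1', '1'),
  (2, '0001', 12, '1', '1'),
  (2, '0003', 13, '1', '1');
""")

# For each WO_NUM, keep only the row whose LAB_SEQ equals the per-WO_NUM maximum
rows = conn.execute("""
SELECT lt.WO_NUM, lt.EMP_NUM, lt.LAB_SEQ
FROM LAB_TIM lt
WHERE lt.LAB_SEQ = (SELECT max(lab.LAB_SEQ)
                    FROM LAB_TIM lab
                    WHERE lab.CCN = '1' AND lab.MAS_LOC = '1'
                      AND lt.WO_NUM = lab.WO_NUM)
ORDER BY lt.WO_NUM
""").fetchall()
print(rows)
```

Since LAB_SEQ is a zero-padded string, max() compares it correctly in lexicographic order.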
Since you are on Oracle 9.2, it will probably be more efficient to use a correlated subquery. I'm not sure what the predicates lab.CCN='1' AND MAS_LOC='1' are doing in your current query so I'm not quite sure how to translate them into the analytic function approach. Is the combination of LAB_SEQ and WO_NUM not unique in LAB_TIM? Do you need to add in the predicates on CCN and MAS_LOC in order to get a single unique row for every WO_NUM? Or are you using those predicates to decrease the number of rows in your output? The basic approach will be something like
SELECT *
FROM (SELECT lt.WO_NUM,
lt.EMP_NUM,
lt.LAB_END_DATE,
lt.LAB_END_TIME,
rank() over (partition by wo_num
order by lab_seq desc) rnk
FROM LAB_TIM lt)
WHERE rnk = 1
but it's not clear to me whether CCN and MAS_LOC need to be added to the ORDER BY clause in the analytic function or whether they need to be added to the WHERE clause.
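The analytic rank() approach can likewise be sketched in sqlite3 (which also supports rank(); same invented sample data, without the CCN/MAS_LOC predicates whose role is unclear):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE LAB_TIM (WO_NUM INT, LAB_SEQ TEXT, EMP_NUM INT);
INSERT INTO LAB_TIM VALUES
  (1, '0001', 10), (1, '0002', 11),
  (2, '0001', 12), (2, '0003', 13);
""")

# rank() restarts per WO_NUM; rnk = 1 picks the highest LAB_SEQ in each group
rows = conn.execute("""
SELECT WO_NUM, LAB_SEQ, EMP_NUM
FROM (SELECT lt.*,
             rank() OVER (PARTITION BY WO_NUM ORDER BY LAB_SEQ DESC) AS rnk
      FROM LAB_TIM lt)
WHERE rnk = 1
ORDER BY WO_NUM
""").fetchall()
print(rows)
```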
This is one case where a correlated subquery is better, particularly if you have indexes on the table. However, it should be possible to rewrite correlated subqueries as joins.
I think the following is equivalent, without the correlated subquery:
SELECT lt.WO_NUM, lt.EMP_NUM, lt.LAB_END_DATE, lt.LAB_END_TIME
FROM (select lt.*, rownum as r
from LAB_TIM lt
) lt join
(select wo_num, max(r) as maxrownum
from (select LAB_SEQ, wo_num, rownum as r
from LAB_TIM lt
where lt.CCN = '1' AND lt.MAS_LOC = '1'
)
) ltsum
on lt.wo_num = ltsum.wo_num and
lt.r = ltsum.maxrownum
I'm a little unsure about how Oracle works with rownums in things like ORDER BY.