Using Analytical Clauses with DISTINCT - sql

The purpose is to query multiple tables using DISTINCT (otherwise I get millions of rows), while also using SAMPLE to gather a 10% sample of the results, which should all be unique. I am getting the following error:
ORA-01446: cannot select ROWID from, or sample, a view with DISTINCT, GROUP BY, etc.
Here is the code I have written:
WITH V AS (SELECT DISTINCT AL1."NO", AL3."IR", AL1."ACCT", AL3."CUST_DA", AL1."NA",
AL3."1_LINE", AL3."2_LINE", AL3."3_LINE", AL1."DA",
AL1."CD", AL1."TITLE_NA", AL1."ENT_NA", AL3."ACCT",
AL3."ACCTLNK_ENRL_CNT"
FROM "DOC"."DOCUMENT" AL1, "DOC"."VNDR" AL2, "DOC"."CUST_ACCT" AL3
WHERE (AL1."ACCT"=AL2."VNDR"
AND AL2."ACCT"=AL3."ACCT")
AND ((AL1."IMG_DA" >= Trunc(sysdate-1)
AND AL1."PROC"='A'
AND AL3."ACCT"<>'03')))
SELECT * FROM V SAMPLE(10.0)

You can't sample a join view like this.
Simpler test case (MCVE):
with v as
( select d1.dummy from dual d1
join dual d2 on d2.dummy = d1.dummy
)
select * from v sample(10);
Fails with:
ORA-01445: cannot select ROWID from, or sample, a join view without a key-preserved table
The simplest fix would be to move the sample clause to the driving table:
with v as
( select d1.dummy from dual sample(10) d1
join dual d2 on d2.dummy = d1.dummy
)
select * from v;
I would therefore rewrite your view as:
with v as
( select distinct
d.no
, a.ir
, d.acct
, a.cust_da
, d.na
, a."1_LINE", a."2_LINE", a."3_LINE"
, d.da, d.cd, d.title_na, d.ent_na
, a.acct
, a.acctlnk_enrl_cnt
from doc.document sample(10) d
join doc.vndr v
on v.vndr = d.acct
join doc.cust_acct a
on a.acct = v.acct
and d.img_da >= trunc(sysdate - 1)
and d.proc = 'A'
and a.acct <> '03'
)
select * from v;
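One caveat: moving SAMPLE to the driving table samples rows before the join and the DISTINCT, which is not quite the same as taking 10% of the final unique rows. If that difference matters, one option is to produce the distinct rows first and sample afterwards. The sketch below illustrates the idea in Python, with an in-memory SQLite database standing in for Oracle; the `document` table, its two columns and the data are hypothetical, not the original schema.

```python
import random
import sqlite3

# Hypothetical stand-in data: 200 rows, but only 50 distinct (acct, no) pairs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (acct TEXT, no INTEGER)")
conn.executemany(
    "INSERT INTO document VALUES (?, ?)",
    [(f"A{i % 50}", i % 50) for i in range(200)],
)

# Step 1: let the database remove the duplicates.
distinct_rows = conn.execute(
    "SELECT DISTINCT acct, no FROM document ORDER BY acct, no"
).fetchall()

# Step 2: sample 10% of the distinct rows on the client side.
rng = random.Random(42)  # fixed seed only so the sketch is repeatable
sample_size = max(1, len(distinct_rows) // 10)
sample = rng.sample(distinct_rows, sample_size)

print(len(distinct_rows), len(sample))
```

Sampling on the client means every distinct row is fetched first, so this only suits result sets that fit comfortably in memory.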

Related

Apache Phoenix SQL Join Limitation when using sub-queries

I have this query in Apache Phoenix SQL:
select WO.* from (
select "nr_id", "txt_commrcial_label"
from "e_application" APP
where "txt_commrcial_label" in ('a','b')
and "nr_id" not in (select "nr_ap_id"
from "e_workorder"
where "nr_id" in ('888'))
and "epochtimestampchanged" = (select max("epochtimestampchanged")
from "e_application"
where "nr_id" = APP."nr_id") ) as APP2,
--
(select Y.ID as WO_ID, Y."nr_id" as WO_nr_id, Y."nr_ap_id" as WO_nr_ap_id
from ( select "nr_id", max("epochtimestampchanged") as max_epochtimestampchanged
from "e_workorder"
where CAST(TO_NUMBER("epochtimestampchanged") AS TIMESTAMP) < TO_TIMESTAMP('2020-10-21 19:22:20.0')
group by "nr_id" ) as X, "e_workorder" as Y
where Y."nr_id" = X."nr_id"
and Y."epochtimestampchanged" < X.max_epochtimestampchanged ) as WO
--
where APP2."nr_id" = WO.WO_nr_ap_id;
I get a java language illegal ... error for this not overly complex statement, and I cannot see the reason here or in the manuals.
The individual queries work (imagine the ( and , are not there), but they fail when the two sub-queries are merged into a JOIN.
Do I need to persist the results to tables and then JOIN? Or is there a way around this? I have the impression this is too complex in terms of sub-queries.
For others to note: this is a bug, and a different SQL approach is needed, as shown below. This is a workaround, with a note from Cloudera:
The best workaround is to explicitly define a join in the APP2 query.
See the APP_MAX_TIMESTAMP table joined with the APP table, defining
basically the same condition as in the original query (but using a
table join instead of an inner select):
The query that should work and should do the same as the original
query:
select
WO.*
from
(
select
"nr_id",
"txt_commrcial_label"
from
"e_application" APP
LEFT JOIN (
select
max("epochtimestampchanged") as max_app_timestamp,
"nr_id" as max_app_timestamp_nr_id
from
"e_application"
group by "nr_id"
) APP_MAX_TIMESTAMP
ON APP_MAX_TIMESTAMP.max_app_timestamp_nr_id = APP."nr_id"
where
"txt_commrcial_label" in
( list
)
and "nr_id" not in
(
select
"nr_ap_id"
from
"e_workorder"
where
"nr_id" in
(
'888'
)
)
and "epochtimestampchanged" = max_app_timestamp
)
as APP2,
(
select
Y.ID as WO_ID,
Y."nr_id" as WO_nr_id,
Y."nr_ap_id" as WO_nr_ap_id
from
(
select
"nr_id",
max("epochtimestampchanged") as max_epochtimestampchanged
from
"e_workorder"
where
CAST(TO_NUMBER("epochtimestampchanged") AS TIMESTAMP) < TO_TIMESTAMP('2022-10-10 19:22:20.0')
group by
"nr_id"
)
as X,
"e_workorder" as Y
where
Y."nr_id" = X."nr_id"
and Y."epochtimestampchanged" < X.max_epochtimestampchanged
)
as WO
where
APP2."nr_id" = WO.WO_nr_ap_id;
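To see why the rewrite should return the same rows, here is a minimal sketch in Python that runs both shapes against an in-memory SQLite database. SQLite is only a stand-in for Phoenix, and the simplified column names `label` and `changed` and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE e_application (nr_id TEXT, label TEXT, changed INTEGER)"
)
conn.executemany(
    "INSERT INTO e_application VALUES (?, ?, ?)",
    [
        ("1", "a", 10), ("1", "a", 20),  # latest version of 1 has changed=20
        ("2", "b", 5),
        ("3", "c", 7), ("3", "a", 9),    # latest version of 3 has changed=9
    ],
)

# Original shape: a correlated scalar subquery picks the latest row per nr_id.
correlated = conn.execute(
    """
    SELECT app.nr_id, app.label, app.changed
    FROM e_application app
    WHERE app.changed = (SELECT max(changed)
                         FROM e_application
                         WHERE nr_id = app.nr_id)
    ORDER BY app.nr_id
    """
).fetchall()

# Workaround shape: join against a grouped derived table instead.
# The WHERE filter on max_changed makes the LEFT JOIN behave like an inner join.
joined = conn.execute(
    """
    SELECT app.nr_id, app.label, app.changed
    FROM e_application app
    LEFT JOIN (SELECT nr_id AS max_nr_id, max(changed) AS max_changed
               FROM e_application
               GROUP BY nr_id) m
           ON m.max_nr_id = app.nr_id
    WHERE app.changed = m.max_changed
    ORDER BY app.nr_id
    """
).fetchall()

print(correlated == joined)
```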

How to create a view from existing table records, but also adding new records that do not exist

I am trying to create a view from an existing view's data, but if certain lines do not exist for a part/date combo, I also want those lines to be created. The query below shows what I currently have for particular s_date/part_no combos:
SELECT
s_date,
part_no,
issue_group,
s_level,
qty_filled
FROM
current_view
WHERE
part_no = 'xxxxx'
AND s_date IN (
'201802',
'201803'
)
ORDER BY
s_date,
part_no,
issue_group,
DECODE(s_level, '80', 1, '100', 2, 'Late', 3)
Which produces the below:
I know how to create a view with that data, that's the easy part. But what I'm needing is a line for each issue_group and s_level combo, and if it's a created line, to put 0 as the qty_filled.
Every part_no / s_date combo should have 6 rows that go with it
- issue_group = '1' / s_level = '80'
- issue_group = '1' / s_level = '100'
- issue_group = '1' / s_level = 'Late'
- issue_group = '2/3 ' / s_level = '80'
- issue_group = '2/3 ' / s_level = '100'
- issue_group = '2/3 ' / s_level = 'Late'
So if one of the above combinations already exists for the current s_date/part_no, then it obviously takes the qty_filled info from the current view. If not, a new line is created, and qty_filled = 0. So I'm trying to get it to look like this:
I've only shown 1 part, with a couple dates, just to get the point across. There are 10k+ parts within the table and there will never be more than 1 part/date combo for each of the 6 issue_group/s_level combos.
The idea is to generate the rows using CROSS JOIN and then bring in the extra information with a LEFT JOIN. In Oracle syntax, this looks like:
WITH v as (
SELECT v.*
FROM current_view v
WHERE part_no = 'xxxxx' AND
s_date IN ('201802', '201803')
)
SELECT d.s_date, ig.part_no, ig.issue_group, l.s_level,
COALESCE(v.qty_filled, 0) as qty_filled
FROM (SELECT DISTINCT s_date FROM v) d CROSS JOIN
(SELECT DISTINCT part_no, ISSUE_GROUP FROM v) ig CROSS JOIN
(SELECT '80' as s_level FROM DUAL UNION ALL
SELECT '100' FROM DUAL UNION ALL
SELECT 'Late' FROM DUAL
) l LEFT JOIN
v
ON v.s_date = d.s_date AND v.part_no = ig.part_no AND
v.issue_group = ig.issue_group AND v.s_level = l.s_level
ORDER BY s_date, part_no, issue_group,
(CASE s_level WHEN '80' THEN 1 WHEN '100' THEN 2 WHEN 'Late' THEN 3 END)
One solution could be to generate all expected rows as a cartesian product of the (fixed) lists of values, and then LEFT JOIN it with current_view.
The following query guarantees that you will get a record for each given s_date/part_no/issue_group/s_level tuple. If no record matches in current_view, the query will display a 0 quantity.
SELECT
sd.s_date,
pn.part_no,
ig.issue_group,
sl.s_level,
COALESCE(cv.qty_filled, 0) qty_filled
FROM
(SELECT '201802' AS s_date UNION SELECT '201803') AS sd
CROSS JOIN (SELECT 'xxxxx' AS part_no) AS pn
CROSS JOIN (SELECT '1' AS issue_group UNION SELECT '2') AS ig
CROSS JOIN (SELECT '80' AS s_level UNION SELECT '100' UNION SELECT 'Late') AS sl
LEFT JOIN current_view cv
ON cv.s_date = sd.s_date
AND cv.part_no = pn.part_no
AND cv.issue_group = ig.issue_group
AND cv.s_level = sl.s_level
ORDER BY
sd.s_date,
pn.part_no,
ig.issue_group,
CASE sl.s_level WHEN '80' THEN 1 WHEN '100' THEN 2 WHEN 'Late' THEN 3 END
NB: you did not tag your RDBMS. This should work on most of them, except Oracle, where you need to add FROM DUAL to each SELECT in the subqueries that list the allowed values, and drop AS before the table aliases, like:
(SELECT '201802' AS s_date FROM DUAL UNION SELECT '201803' FROM DUAL) sd
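The CROSS JOIN plus LEFT JOIN pattern can be tried end to end with a small sketch. It is written in Python against an in-memory SQLite database as a stand-in engine; the three sample rows are invented, and the `sort_key` column is a hypothetical helper that replaces the DECODE/CASE ordering.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE current_view "
    "(s_date TEXT, part_no TEXT, issue_group TEXT, s_level TEXT, qty_filled INTEGER)"
)
conn.executemany(
    "INSERT INTO current_view VALUES (?, ?, ?, ?, ?)",
    [
        ("201802", "xxxxx", "1", "80", 4),
        ("201802", "xxxxx", "2/3", "Late", 2),
        ("201803", "xxxxx", "1", "100", 7),
    ],
)

rows = conn.execute(
    """
    WITH ig(issue_group) AS (VALUES ('1'), ('2/3')),
         sl(s_level, sort_key) AS (VALUES ('80', 1), ('100', 2), ('Late', 3)),
         pd AS (SELECT DISTINCT s_date, part_no FROM current_view)
    SELECT pd.s_date, pd.part_no, ig.issue_group, sl.s_level,
           COALESCE(cv.qty_filled, 0) AS qty_filled
    FROM pd
    CROSS JOIN ig          -- every part/date combo gets both issue groups
    CROSS JOIN sl          -- ... and all three levels, 6 rows in total
    LEFT JOIN current_view cv
           ON cv.s_date = pd.s_date
          AND cv.part_no = pd.part_no
          AND cv.issue_group = ig.issue_group
          AND cv.s_level = sl.s_level
    ORDER BY pd.s_date, pd.part_no, ig.issue_group, sl.sort_key
    """
).fetchall()

for row in rows:
    print(row)
```

Each existing part/date combo comes back with six rows, and the combinations missing from current_view carry a qty_filled of 0.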

SQL join where value in second table is first lower value w.r.t the first table

Let's say I have 2 tables and both of them have a column that contains timestamp for various events. The timestamp values in both the tables are different as they are for different events.
I want to join the two tables such that every record in table1 is joined with first lower timestamp on table2.
For e.g.
Table1    Table2
142.13    141.16
157.34    145.45
168.45    155.85
170.23    166.76
          168.44
Joined Table should be:
142.13,141.16
157.34,155.85
168.45,166.76
170.23,168.44
I am using Apache Spark SQL.
I am a noob in SQL and this doesn't look like job for a noob :). Thanks.
Try this:
with t1 as (
select 142.13 v from dual union all
select 157.34 v from dual union all
select 168.45 v from dual union all
select 170.23 v from dual
),
t2 as (
select 141.16 v from dual union all
select 145.45 v from dual union all
select 155.85 v from dual union all
select 166.76 v from dual union all
select 168.44 v from dual
)
select v, ( select max(v) from t2 where t2.v <= t1.v )
from t1;
V (SELECTMAX(V)FROMT2WHERET2.V<=T1.V)
---------- -----------------------------------
142.13 141.16
157.34 155.85
168.45 168.44
170.23 168.44
4 rows selected.
the WITH clause is just me faking the data ...
the simplified query is just:
select t1.v, ( select max(t2.v) from table2 t2 where t2.v <= t1.v ) from table1 t1
[edit]
admittedly, I'm not familiar with Spark .. but this is simple enough SQL .. I'm assuming it works :)
[/edit]
Ditto has shown the straightforward way to solve this. If Apache Spark really has problems with this very basic query, then join first (which can lead to a big intermediate result) and aggregate afterwards:
select t1.v, max(t2.v)
from table1 t1
join table2 t2 on t2.v <= t1.v
group by t1.v
order by t1.v;
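Both variants (correlated subquery and join plus GROUP BY) can be checked against the sample values from the question. The sketch below does this in Python with an in-memory SQLite database, which is only a stand-in for Spark SQL; the alias `prev_v` is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (v REAL)")
conn.execute("CREATE TABLE table2 (v REAL)")
conn.executemany(
    "INSERT INTO table1 VALUES (?)",
    [(142.13,), (157.34,), (168.45,), (170.23,)],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?)",
    [(141.16,), (145.45,), (155.85,), (166.76,), (168.44,)],
)

# Variant 1: correlated scalar subquery.
subquery_rows = conn.execute(
    """
    SELECT t1.v,
           (SELECT max(t2.v) FROM table2 t2 WHERE t2.v <= t1.v) AS prev_v
    FROM table1 t1
    ORDER BY t1.v
    """
).fetchall()

# Variant 2: non-equi join first, then aggregate.
join_rows = conn.execute(
    """
    SELECT t1.v, max(t2.v) AS prev_v
    FROM table1 t1
    JOIN table2 t2 ON t2.v <= t1.v
    GROUP BY t1.v
    ORDER BY t1.v
    """
).fetchall()

print(subquery_rows == join_rows)
```

Note that 168.45 pairs with 168.44 rather than the 166.76 listed in the question, because 168.44 is the largest table2 value that is not above 168.45. The join variant also drops any table1 row that has no lower value in table2, whereas the subquery variant returns it with NULL.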
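If you are using Apache Spark SQL, you can join the two tables as DataFrames by adding a column with monotonically_increasing_id(). Note that this pairs rows by position rather than by comparing values, and the generated ids are guaranteed to be unique and increasing but not consecutive across partitions, so verify the result against your data: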
val t1 = spark.sparkContext.parallelize(Seq(142.13, 157.34, 168.45, 170.23)).toDF("c1")
val t2 = spark.sparkContext.parallelize(Seq(141.16,145.45,155.85,166.76,168.44)).toDF("c2")
val t11 = t1.withColumn("id", monotonically_increasing_id())
val t22 = t2.withColumn("id", monotonically_increasing_id())
val res = t11.join(t22, t11("id") + 1 === t22("id") ).drop("id")
Output:
+------+------+
| c1| c2|
+------+------+
|142.13|145.45|
|168.45|166.76|
|157.34|155.85|
|170.23|168.44|
+------+------+
Hope this helps

select statement to list numbers in range

In DB2, I have this query to list numbers 1-x:
select level from SYSIBM.SYSDUMMY1 connect by level <= "some number"
But this maxes out due to SQL20450N Recursion limit exceeded within a hierarchical query.
How can I generate a list of numbers between 1 and x using a select statement when x is not known until runtime?
I found an answer based on this post:
WITH d AS
(SELECT LEVEL - 1 AS dig FROM SYSIBM.SYSDUMMY1 CONNECT BY LEVEL <= 10)
SELECT t1.n
FROM (SELECT (d7.dig * 1000000) +
(d6.dig * 100000) +
(d5.dig * 10000) +
(d4.dig * 1000) +
(d3.dig * 100) +
(d2.dig * 10) +
d1.dig AS n
FROM d d1
CROSS JOIN d d2
CROSS JOIN d d3
CROSS JOIN d d4
CROSS JOIN d d5
CROSS JOIN d d6
CROSS JOIN d d7) t1
JOIN ("subselect that returns desired value as i") t2
ON t1.n <= t2.i
ORDER BY t1.n
That's how I usually create lists:
For your example
with numberlist (num) as
(
select min(1) from anytable
union all
select num + 1 from numberlist
where num < x
)
select num from numberlist
I did something like this when I wanted a list of values to correspond with months:
with t1 (mon) as (
values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)
)
select * from t1
It seems a bit kludgy, but for a small list like 1-12, or even 1-50, it did what I needed it to.
It's nice to see someone else tagging their questions with DB2.
If you have any table known to have more than x rows, you can always do:
select * from (
select row_number() over () num
from my_big_table
) where num <= x
or, per bhamby's suggestion:
select row_number() over () num
from my_big_table
fetch first X rows only
For DB2 you can use recursive common table expressions (cf. IBM documentation on recursive CTE):
with max(num) as (
select 1 from sysibm.sysdummy1
)
,result (num) as (
select num from max
union ALL
select result.num+1
from result
where result.num<=100
)
select * from result;
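The recursive CTE approach can be sketched with the upper bound passed in at runtime. The example is Python against an in-memory SQLite database as a stand-in for DB2 (SQLite spells it WITH RECURSIVE); the function name `numbers_up_to` is hypothetical. The condition `num < x` stops exactly at x, whereas `num <= x` would emit x + 1 as the last row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def numbers_up_to(x):
    """Return [1, 2, ..., x] generated by a recursive CTE (assumes x >= 1)."""
    rows = conn.execute(
        """
        WITH RECURSIVE result(num) AS (
            SELECT 1                      -- anchor row
            UNION ALL
            SELECT num + 1 FROM result    -- recursive step
            WHERE num < ?                 -- stop once x has been produced
        )
        SELECT num FROM result
        """,
        (x,),
    ).fetchall()
    return [r[0] for r in rows]


first_five = numbers_up_to(5)
print(first_five)
```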

Can you create nested WITH clauses for Common Table Expressions?

WITH y AS (
WITH x AS (
SELECT * FROM MyTable
)
SELECT * FROM x
)
SELECT * FROM y
Does something like this work? I tried it earlier but I couldn't get it to work.
While not strictly nested, you can use common table expressions to reuse previous queries in subsequent ones.
To do this, the form of the statement you are looking for would be
WITH x AS
(
SELECT * FROM MyTable
),
y AS
(
SELECT * FROM x
)
SELECT * FROM y
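A minimal runnable sketch of the chained form, in Python with an in-memory SQLite database as a stand-in engine. The columns `id` and `name` and the three rows are assumptions, since the question does not show the contents of MyTable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [(1, "a"), (2, "b"), (3, "c")],
)

rows = conn.execute(
    """
    WITH x AS (
        SELECT id, name FROM MyTable WHERE id >= 2
    ),
    y AS (
        -- y can refer to x because x is defined earlier in the same WITH
        SELECT id, upper(name) AS name FROM x
    )
    SELECT id, name FROM y ORDER BY id
    """
).fetchall()

print(rows)
```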
You can do the following, which is referred to as a recursive query:
WITH y
AS
(
SELECT x, y, z
FROM MyTable
WHERE [base_condition]
UNION ALL
SELECT x, y, z
FROM MyTable M
INNER JOIN y ON M.[some_other_condition] = y.[some_other_condition]
)
SELECT *
FROM y
You may not need this functionality. I've done the following just to organize my queries better:
WITH y
AS
(
SELECT *
FROM MyTable
WHERE [base_condition]
),
x
AS
(
SELECT *
FROM y
WHERE [something_else]
)
SELECT *
FROM x
WITH cannot be nested, but consecutive CTEs do work:
;WITH A AS(
...
),
B AS(
...
)
SELECT *
FROM A
UNION ALL
SELECT *
FROM B
These answers are pretty good, but as far as getting the items to order properly, you'd be better off looking at this article
http://dataeducation.com/dr-output-or-how-i-learned-to-stop-worrying-and-love-the-merge
Here's an example of his query.
WITH paths AS (
SELECT
EmployeeID,
CONVERT(VARCHAR(900), CONCAT('.', EmployeeID, '.')) AS FullPath
FROM EmployeeHierarchyWide
WHERE ManagerID IS NULL
UNION ALL
SELECT
ehw.EmployeeID,
CONVERT(VARCHAR(900), CONCAT(p.FullPath, ehw.EmployeeID, '.')) AS FullPath
FROM paths AS p
JOIN EmployeeHierarchyWide AS ehw ON ehw.ManagerID = p.EmployeeID
)
SELECT * FROM paths order by FullPath
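The path-building query can be tried on a tiny hierarchy. The sketch below is Python with an in-memory SQLite database standing in for SQL Server, so it uses WITH RECURSIVE and the `||` operator instead of CONVERT and CONCAT; the four employees are made-up data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EmployeeHierarchyWide (EmployeeID INTEGER, ManagerID INTEGER)"
)
# Employee 1 is the root; 2 and 3 report to 1; 4 reports to 2.
conn.executemany(
    "INSERT INTO EmployeeHierarchyWide VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, 2)],
)

paths = conn.execute(
    """
    WITH RECURSIVE paths(EmployeeID, FullPath) AS (
        SELECT EmployeeID, '.' || EmployeeID || '.'
        FROM EmployeeHierarchyWide
        WHERE ManagerID IS NULL
        UNION ALL
        SELECT ehw.EmployeeID, p.FullPath || ehw.EmployeeID || '.'
        FROM paths AS p
        JOIN EmployeeHierarchyWide AS ehw ON ehw.ManagerID = p.EmployeeID
    )
    SELECT EmployeeID, FullPath FROM paths ORDER BY FullPath
    """
).fetchall()

for employee_id, full_path in paths:
    print(employee_id, full_path)
```

Ordering by the path string lists each manager directly before their reports. Siblings are compared as text, so an id of 10 sorts before 2 unless the ids are zero-padded.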
We can define several CTEs in one statement (chained rather than truly nested). See the example below:
;with cte_data as
(
Select * from [HumanResources].[Department]
),cte_data1 as
(
Select * from [HumanResources].[Department]
)
select * from cte_data,cte_data1
I was trying to measure the time between events, except for one entry that has multiple processes between its start and end. I needed this alongside other single-line processes.
I used a select with an inner join as the select statement within the Nth CTE. In the second CTE I needed to extract the start date from X and the end date from Y, so I used 1 as an id value in both and left-joined them to put the dates on a single line.
Works for me, hope this helps.
with cte_extract
as
(
select ps.Process as ProcessEvent
, ps.ProcessStartDate
, ps.ProcessEndDate
-- select strt.*
from dbo.tbl_some_table ps
inner join (select max(ProcessStatusId) ProcessStatusId
from dbo.tbl_some_table
where Process = 'some_extract_tbl'
and convert(varchar(10), ProcessStartDate, 112) < '29991231'
) strt on strt.ProcessStatusId = ps.ProcessStatusID
),
cte_rls
as
(
select 'Sample' as ProcessEvent,
x.ProcessStartDate, y.ProcessEndDate from (
select 1 as Id, ps.Process as ProcessEvent
, ps.ProcessStartDate
, ps.ProcessEndDate
-- select strt.*
from dbo.tbl_some_table ps
inner join (select max(ProcessStatusId) ProcessStatusId
from dbo.tbl_some_table
where Process = 'XX Prcss'
and convert(varchar(10), ProcessStartDate, 112) < '29991231'
) strt on strt.ProcessStatusId = ps.ProcessStatusID
) x
left join (
select 1 as Id, ps.Process as ProcessEvent
, ps.ProcessStartDate
, ps.ProcessEndDate
-- select strt.*
from dbo.tbl_some_table ps
inner join (select max(ProcessStatusId) ProcessStatusId
from dbo.tbl_some_table
where Process = 'YY Prcss Cmpltd'
and convert(varchar(10), ProcessEndDate, 112) < '29991231'
) enddt on enddt.ProcessStatusId = ps.ProcessStatusID
) y on y.Id = x.Id
),
.... other ctes
Nested 'With' is not supported, but you can always use the second With as a subquery, for example:
WITH A AS (
--WITH B AS ( SELECT COUNT(1) AS _CT FROM C ) SELECT CASE _CT WHEN 1 THEN 1 ELSE 0 END FROM B --doesn't work
SELECT CASE WHEN count = 1 THEN 1 ELSE 0 END AS CT FROM (SELECT COUNT(1) AS count FROM dual)
union all
select 100 AS CT from dual
)
select CT FROM A