We have millions of rows (1698393 in total), and exporting them to a text file takes 4 hours. Is there a way to reduce the export time for this many records from an Oracle database using SQL Developer?
with cte as (
select *
from (
select distinct
system_serial_number,
( select s.system_status
from eim_pr_system s
where s.system_serial_number=a.system_serial_number
) system_status,
( select SN.cmat_customer_id
from EIM.eim_pr_ib_latest SN
where SN.role_id=19
and SN.system_serial_number=a.system_serial_number
) SN_cmat_customer_id,
( select EC.cmat_customer_id
from EIM.eim_pr_ib_latest EC
where EC.role_id=1
and a.system_serial_number=EC.system_serial_number
) EC_cmat_customer_id
from EIM.eim_pr_ib_latest a
where a.role_id in (1,19)
)
where nvl(SN_cmat_customer_id,0)!=nvl(EC_cmat_customer_id,0)
)
select system_serial_number,
system_status,
SN_CMAT_Customer_ID,
EC_CMAT_Customer_ID,
C.Customer_Name SN_Customer_Name,
D.Customer_Name EC_Customer_Name
from cte,
eim.eim_party c,
eim.eim_party D
where c.CMAT_Customer_ID=SN_cmat_customer_id
and D.CMAT_Customer_ID=EC_cmat_customer_id
offset 5001 rows fetch next 200000 rows only;
You can get rid of a lot of the joins and correlated sub-queries (which will speed things up by reducing the number of table scans) by doing something like:
SELECT a.system_serial_number,
s.system_status,
a.SN_cmat_customer_id,
a.EC_cmat_customer_id,
a.SN_customer_name,
a.EC_customer_name
FROM (
SELECT l.system_serial_number,
MAX( CASE l.role_id WHEN 19 THEN l.cmat_customer_id END ) AS SN_cmat_customer_id,
MAX( CASE l.role_id WHEN 1 THEN l.cmat_customer_id END ) AS EC_cmat_customer_id,
MAX( CASE l.role_id WHEN 19 THEN p.customer_name END ) AS SN_customer_name,
MAX( CASE l.role_id WHEN 1 THEN p.customer_name END ) AS EC_customer_name
FROM EIM.eim_pr_ib_latest l
INNER JOIN
EIM.eim_party p
ON ( p.CMAT_Customer_ID= l.cmat_customer_id )
WHERE l.role_id IN ( 1, 19 )
GROUP BY system_serial_number
HAVING NVL( MAX( CASE l.role_id WHEN 19 THEN l.cmat_customer_id END ), 0 )
<> NVL( MAX( CASE l.role_id WHEN 1 THEN l.cmat_customer_id END ), 0 )
) a
LEFT OUTER JOIN
eim_pr_system s
ON ( s.system_serial_number=a.system_serial_number )
Since your original query is not throwing a TOO_MANY_ROWS exception on the correlated sub-queries, I am assuming that your data is such that there is only a single row being returned for each correlated query and the above query will reflect your output (although without some sample data it is difficult to test).
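To illustrate why the conditional aggregation can stand in for the correlated sub-queries, here is a minimal sketch that runs both shapes against a toy table and compares the results. It uses Python's built-in sqlite3 module as a stand-in for Oracle, so COALESCE replaces NVL, and the table name ib_latest and its rows are made up for the example. It leaves out the joins to eim_party and eim_pr_system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical toy data: one row per (serial number, role).
    CREATE TABLE ib_latest (
        system_serial_number TEXT,
        role_id              INTEGER,
        cmat_customer_id     INTEGER
    );
    INSERT INTO ib_latest VALUES
        ('S1', 19, 100), ('S1', 1, 100),  -- same customer in both roles
        ('S2', 19, 200), ('S2', 1, 300),  -- different customers
        ('S3', 19, 400);                  -- role 1 row missing
""")

# Shape of the original query: one correlated sub-query per output column.
correlated_rows = conn.execute("""
    SELECT *
    FROM (
        SELECT DISTINCT a.system_serial_number,
               (SELECT sn.cmat_customer_id FROM ib_latest sn
                 WHERE sn.role_id = 19
                   AND sn.system_serial_number = a.system_serial_number) AS sn_id,
               (SELECT ec.cmat_customer_id FROM ib_latest ec
                 WHERE ec.role_id = 1
                   AND ec.system_serial_number = a.system_serial_number) AS ec_id
        FROM ib_latest a
        WHERE a.role_id IN (1, 19)
    )
    WHERE COALESCE(sn_id, 0) <> COALESCE(ec_id, 0)
    ORDER BY system_serial_number
""").fetchall()

# Shape of the rewritten query: a single pass with conditional aggregation.
aggregated_rows = conn.execute("""
    SELECT system_serial_number,
           MAX(CASE role_id WHEN 19 THEN cmat_customer_id END) AS sn_id,
           MAX(CASE role_id WHEN 1  THEN cmat_customer_id END) AS ec_id
    FROM ib_latest
    WHERE role_id IN (1, 19)
    GROUP BY system_serial_number
    HAVING COALESCE(MAX(CASE role_id WHEN 19 THEN cmat_customer_id END), 0)
        <> COALESCE(MAX(CASE role_id WHEN 1  THEN cmat_customer_id END), 0)
    ORDER BY system_serial_number
""").fetchall()

print(correlated_rows)
print(aggregated_rows)
```

Both queries keep S2 and S3 and drop S1. The equivalence relies on the same assumption as above: at most one row per serial number and role.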
Apart from 'making the query faster' - there is a way to achieve a faster export using SQL Developer.
When you use the data grid's export feature, the query is executed again. The only time this won't happen is if you have already fetched ALL the rows into the grid. Doing that for very large data sets is 'expensive' on the client side, but you can avoid it.
For a faster export, add a /*csv*/ comment in your select, and wrap the statement with a spool c:\my_file.csv - then collapse the script output panel, and run that with F5. As we fetch the data, we'll write it to that file in a CSV format.
/*csv*/
/*xml*/
/*json*/
/*html*/
/*insert*/
I talk about this feature in detail here.
SELECT DISTINCT
Member_ID,
CASE
WHEN a.ASTHMA_MBR = 1 THEN 'ASTHMA'
WHEN a.COPD_MBR = 1 THEN 'COPD'
WHEN a.HYPERTENSION_MBR = 1 THEN 'HYPERTENSION'
END AS DX_FLAG
So a member may have more than one, but my statement is only returning one of them.
I'm using Teradata and trying to convert multiple columns of boolean data into one column. The statement is only returning one condition when members may have 2 or more. I tried using Select instead of Select Distinct and it made no difference.
This is a kind of UNPIVOT:
with base_data as
( -- select the columns you want to unpivot
select
member_id
,date_col
-- the aliases will be the final column value
,ASTHMA_MBR AS ASTHMA
,COPD_MBR AS COPD
,HYPERTENSION_MBR AS HYPERTENSION
from your_table
)
,unpvt as
(
select member_id, date_col, x, DX_FLAG
from base_data
-- now unpivot those columns into rows
UNPIVOT(x FOR DX_FLAG IN (ASTHMA, COPD, HYPERTENSION)
) dt
)
select member_id, DX_FLAG, date_col
from unpvt
-- only show rows where the condition is true
where x = 1
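The reason the original CASE returns only one condition per member is that a CASE expression stops at the first branch that matches. A minimal sketch of the difference follows. It uses Python's built-in sqlite3 module rather than Teradata, and since SQLite has no UNPIVOT, a UNION ALL plays that role. The members table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flag table: member 1 has two conditions, member 3 has none.
    CREATE TABLE members (
        member_id        INTEGER,
        asthma_mbr       INTEGER,
        copd_mbr         INTEGER,
        hypertension_mbr INTEGER
    );
    INSERT INTO members VALUES (1, 1, 1, 0), (2, 0, 0, 1), (3, 0, 0, 0);
""")

# CASE yields exactly one value per row: the first branch that is true.
case_rows = conn.execute("""
    SELECT member_id,
           CASE
               WHEN asthma_mbr = 1 THEN 'ASTHMA'
               WHEN copd_mbr = 1 THEN 'COPD'
               WHEN hypertension_mbr = 1 THEN 'HYPERTENSION'
           END AS dx_flag
    FROM members
    ORDER BY member_id
""").fetchall()

# Turning the columns into rows yields one row per true flag.
unpivot_rows = conn.execute("""
    SELECT member_id, 'ASTHMA' AS dx_flag FROM members WHERE asthma_mbr = 1
    UNION ALL
    SELECT member_id, 'COPD' FROM members WHERE copd_mbr = 1
    UNION ALL
    SELECT member_id, 'HYPERTENSION' FROM members WHERE hypertension_mbr = 1
    ORDER BY member_id, dx_flag
""").fetchall()

print(case_rows)
print(unpivot_rows)
```

Member 1 shows up once in the CASE output but twice in the unpivoted output.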
I have an INSERT ... SELECT query that brings back around 1 million records, each with around 30 columns. Two of those columns are performance total and mechanical total, and at least one of them will have a value. Performance total could be null, mechanical total could be null, or both could have values for a given record.
When a record has a value in both columns (performance total and mechanical total), I want the SQL query to create two records, so that two records are inserted into the table rather than one: one performance record and one mechanical record. The performance total or mechanical total will be inserted into a table that has a single total field.
How can this be done in an SQL query without using a UNION statement, as that causes performance issues?
I don't suppose this is any more efficient than using a UNION, but you could do this:
insert into target (a, b, c, rec_type, rec_total)
select mt.a, mt.b, mt.c,
case when r.rec = 1 then 'PERFORMANCE' else 'MECHANICAL' end,
case when r.rec = 1 then mt.perf_total else mt.mech_total end
from mytable mt
cross join (select rownum rec from dual connect by level <= 2) r
where (mt.perf_total is not null and r.rec=1)
or (mt.mech_total is not null and r.rec = 2);
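As a rough check of what the cross join produces, here is a small sketch using Python's built-in sqlite3 module in place of Oracle. The two-row generator is written as a UNION ALL of constants because SQLite has no CONNECT BY, and the sample rows in mytable are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical source rows: only perf, only mech, and both totals set.
    CREATE TABLE mytable (a TEXT, perf_total INTEGER, mech_total INTEGER);
    INSERT INTO mytable VALUES ('x', 10, NULL), ('y', NULL, 20), ('z', 30, 40);

    CREATE TABLE target (a TEXT, rec_type TEXT, rec_total INTEGER);

    -- Each source row is paired with rec = 1 and rec = 2; the WHERE clause
    -- keeps only the pairings whose matching total is not null.
    INSERT INTO target (a, rec_type, rec_total)
    SELECT mt.a,
           CASE WHEN r.rec = 1 THEN 'PERFORMANCE' ELSE 'MECHANICAL' END,
           CASE WHEN r.rec = 1 THEN mt.perf_total ELSE mt.mech_total END
    FROM mytable mt
    CROSS JOIN (SELECT 1 AS rec UNION ALL SELECT 2) r
    WHERE (mt.perf_total IS NOT NULL AND r.rec = 1)
       OR (mt.mech_total IS NOT NULL AND r.rec = 2);
""")

inserted = conn.execute(
    "SELECT a, rec_type, rec_total FROM target ORDER BY a, rec_type"
).fetchall()
print(inserted)
```

Row 'z', which has both totals, ends up as two rows in target, while 'x' and 'y' produce one row each.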
You are describing union all:
select . . . , 'performance' as which, performance_total
from t
where performance_total is not null
union all
select . . . , 'mechanical' as which, mechanical_total
from t
where mechanical_total is not null;
This does require scanning the table twice. I'm not sure if that is such a big hit on a base table with a million rows, which should fit into memory.
If it were (and this would be particularly true if the table were really a view), then I would phrase an unpivot as:
select . . . , pm.which,
(case when which = 'performance' then performance_total
else mechanical_total
end)
from t cross join
(select 'performance' as which from dual union all
select 'mechanical' as which from dual
) pm
where (case when which = 'performance' then performance_total
else mechanical_total
end) is not null;
Or, in the most recent versions of Oracle, use a lateral join:
select . . . , pm.which, pm.total
from t cross join lateral
(select 'performance' as which, performance_total as total from dual union all
select 'mechanical' as which, mechanical_total from dual
) pm
where total is not null;
Just to let you know, I used UNPIVOT on my records in the end, and it worked a treat.
SELECT TYPE_OF_RECORD, RECORD_POINTS, 28 columns
FROM ( SELECT PERF_TOTAL, MECH_TOTAL, 28 columns
FROM TABLE UNPIVOT (RECORD_POINTS FOR TYPE_OF_RECORD IN
(PERF_TOTAL AS 'PERF',
MECH_TOTAL AS 'MECH'))
WHERE RECORD_POINTS > 0
) X;
I have noticed strange behaviour in some SQL code used for address matching at the company I work for & have created some test SQL to illustrate the issue.
; WITH Temp (Id, Diff) AS (
SELECT 9218, 0
UNION
SELECT 9219, 0
UNION
SELECT 9220, 0
)
SELECT TOP 1 * FROM Temp ORDER BY Diff DESC
Returns 9218 but
; WITH Temp (Id, Name) AS (
SELECT 9218, 'Sonnedal'
UNION
SELECT 9219, 'Lammermoor'
UNION
SELECT 9220, 'Honeydew'
)
SELECT TOP 1 *, DIFFERENCE(Name, '') FROM Temp ORDER BY DIFFERENCE(Name, '') DESC
returns 9219 even though the Difference() is 0 for all records as you can see here:
; WITH Temp (Id, Name) AS (
SELECT 9218, 'Sonnedal'
UNION
SELECT 9219, 'Lammermoor'
UNION
SELECT 9220, 'Honeydew'
)
SELECT *, DIFFERENCE(Name, '') FROM Temp ORDER BY DIFFERENCE(Name, '') DESC
which returns
9218 Sonnedal 0
9219 Lammermoor 0
9220 Honeydew 0
Does anyone know why this happens? I am writing C# to replace existing SQL and need to return the same results, so that I can test that my code matches. But I can't see why the actual SQL returns 9219 rather than 9218, and it doesn't seem to make sense. It seems to be down to the DIFFERENCE() function, but it returns 0 for all the records in question.
When you call:
SELECT TOP 1 *, DIFFERENCE(Name, '')
FROM Temp l
ORDER BY DIFFERENCE(Name, '') DESC
All three records have a DIFFERENCE value of zero, and hence SQL Server is free to choose from any of the three records for ordering. That is to say, there is no guarantee which order you will get. The same is true for your second query. Actually, it is possible that the ordering for the same query could even change over time. In practice, if you expect a certain ordering, you should provide exact logic for it, e.g.
SELECT TOP 1 *
FROM Temp
ORDER BY Id;
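A small sketch of the tie-breaker idea, using Python's built-in sqlite3 module as a stand-in for SQL Server (so LIMIT 1 takes the place of TOP 1). The table name temp_rows is made up. Once a unique column is added to the ORDER BY, the winner among tied rows is fully determined by the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- All three rows tie on diff, as in the question.
    CREATE TABLE temp_rows (id INTEGER, diff INTEGER);
    INSERT INTO temp_rows VALUES (9218, 0), (9219, 0), (9220, 0);
""")

# Tie broken by ascending id: the lowest id wins.
lowest_id = conn.execute(
    "SELECT id FROM temp_rows ORDER BY diff DESC, id ASC LIMIT 1"
).fetchone()[0]

# Tie broken by descending id: the highest id wins.
highest_id = conn.execute(
    "SELECT id FROM temp_rows ORDER BY diff DESC, id DESC LIMIT 1"
).fetchone()[0]

print(lowest_id, highest_id)
```

With only diff in the ORDER BY, either of these answers (or 9219) would be a legitimate result.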
I'm using the RAND function in bigquery to provide me with a random sample of data, and unioning it with another sample of the same dataset.
This is for a machine learning problem where I'm interested in one class more than the other.
I've recreated the logic using a public dataset.
SELECT
COUNT(1),
bigarticle
FROM
(
SELECT
1 as [bigarticle]
FROM [bigquery-public-data:samples.wikipedia]
WHERE num_characters > 50000
),
(
SELECT
0 as [bigarticle]
FROM [bigquery-public-data:samples.wikipedia]
WHERE (is_redirect is null) AND (RAND() < 0.01)
)
GROUP BY bigarticle
Most of the time this behaves as expected,
giving one row with the count of rows where num_characters is more than 50k,
and giving another row with a count of a 1% sample of rows where is_redirect is null.
(This is an approximation of the logic I use in my internal dataset).
If you run this query repeatedly, occasionally it gives unexpected results.
In this result set (bquijob_124ad56f_15da8af982e) I only get a single row, containing the count of bigarticle = 1.
RAND does not use a deterministic seed. If you want deterministic results, you need to hash/fingerprint a column in the table and use a modulus to select a subset of values instead. Using legacy SQL:
#legacySQL
SELECT
COUNT(1),
bigarticle
FROM (
SELECT
1 as [bigarticle]
FROM [bigquery-public-data:samples.wikipedia]
WHERE num_characters > 50000
), (
SELECT
0 as [bigarticle]
FROM [bigquery-public-data:samples.wikipedia]
WHERE (is_redirect is null) AND HASH(title) % 100 = 0
)
GROUP BY bigarticle;
Using standard SQL in BigQuery, which is recommended since legacy SQL is not under active development:
#standardSQL
SELECT
COUNT(*),
bigarticle
FROM (
SELECT
1 as bigarticle
FROM `bigquery-public-data.samples.wikipedia`
WHERE num_characters > 50000
UNION ALL
SELECT
0 as bigarticle
FROM `bigquery-public-data.samples.wikipedia`
WHERE (is_redirect is null) AND MOD(FARM_FINGERPRINT(title), 100) = 0
)
GROUP BY bigarticle;
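The idea behind hashing a column and taking a modulus can be sketched outside BigQuery as well. The following Python example is only an illustration: it uses SHA-256 from hashlib as a stand-in for FARM_FINGERPRINT (the two produce different values), and the in_sample helper and the generated titles are hypothetical.

```python
import hashlib


def in_sample(key: str, buckets: int = 100) -> bool:
    """Return True for roughly 1 in `buckets` keys, always the same ones."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an unsigned integer.
    value = int.from_bytes(digest[:8], "big")
    return value % buckets == 0


# Made-up keys standing in for the title column.
titles = [f"article-{i}" for i in range(10_000)]

# Selecting twice gives the same subset, unlike a RAND()-based filter.
sample_a = [t for t in titles if in_sample(t)]
sample_b = [t for t in titles if in_sample(t)]

print(len(sample_a), sample_a == sample_b)
```

Because membership depends only on the key, the same rows are selected on every run, at the cost of the sample no longer changing between runs.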
I am using Microsoft SQL Server 2008.
I would like to save the result of a subquery so that I can reuse it in a following subquery.
Is this possible?
What is the best practice for doing this? (I am very new to SQL.)
My query looks like:
INSERT INTO [dbo].[TestTable]
(
[a]
,[b]
)
SELECT
(
SELECT TOP 1 MAT_WS_ID
FROM #TempTableX AS X_ALIAS
WHERE OUTERBASETABLE.LT_ALL_MATERIAL = X_ALIAS.MAT_RM_NAME
)
,(
SELECT TOP 1 MAT_WS_NAME
FROM #TempTableY AS Y_ALIAS
WHERE Y_ALIAS.MAT_WS_ID = MAT_WS_ID
--(
--SELECT TOP 1 MAT_WS_ID
--FROM #TempTableX AS X_ALIAS
--WHERE OUTERBASETABLE.LT_ALL_MATERIAL = X_ALIAS.MAT_RM_NAME
--)
)
FROM [dbo].[LASERTECHNO] AS OUTERBASETABLE
My question is:
Is what I did correct?
I replaced the second SELECT statement in the WHERE clause for [b] (which is commented out and exactly the same as the one for [a]) with the result of the first SELECT statement of [a] (= MAT_WS_ID).
It seems to give the right results.
But I don't understand why!
I mean, MAT_WS_ID is part of both temporary tables, X_ALIAS and Y_ALIAS.
So in the SELECT statement for [b], in the scope of the [b] select query, MAT_WS_ID could only be known from the Y_ALIAS table. (Or am I wrong? I am more of a C++ person; maybe scoping in SQL and C++ is totally different.)
I just want to know the best way in SQL Server to reuse a scalar select result.
Or should I not care, copy the select for every column, and let SQL Server optimize it on its own?
One approach would be outer apply:
SELECT mat.MAT_WS_ID
, (
SELECT TOP 1 MAT_WS_NAME
FROM #TempTableY AS Y_ALIAS
WHERE Y_ALIAS.MAT_WS_ID = mat.MAT_WS_ID
)
FROM [dbo].[LASERTECHNO] AS OUTERBASETABLE
OUTER APPLY
(
SELECT TOP 1 MAT_WS_ID
FROM #TempTableX AS X_ALIAS
WHERE OUTERBASETABLE.LT_ALL_MATERIAL = X_ALIAS.MAT_RM_NAME
) as mat
You could rank rows in #TempTableX and #TempTableY partitioning them by MAT_RM_NAME in the former and by MAT_WS_ID in the latter, then use normal joins with filtering by rownum = 1 in both tables (rownum being the column containing the ranking numbers in each of the two tables):
WITH x_ranked AS (
SELECT
*,
rownum = ROW_NUMBER() OVER (PARTITION BY MAT_RM_NAME ORDER BY (SELECT 1))
FROM #TempTableX
),
y_ranked AS (
SELECT
*,
rownum = ROW_NUMBER() OVER (PARTITION BY MAT_WS_ID ORDER BY (SELECT 1))
FROM #TempTableY
)
INSERT INTO dbo.TestTable (a, b)
SELECT
x.MAT_WS_ID,
y.MAT_WS_NAME
FROM dbo.LASERTECHNO t
LEFT JOIN x_ranked x ON t.LT_ALL_MATERIAL = x.MAT_RM_NAME AND x.rownum = 1
LEFT JOIN y_ranked y ON x.MAT_WS_ID = y.MAT_WS_ID AND y.rownum = 1
;
The ORDER BY (SELECT 1) bit is a trick to specify an indeterminate ordering, which, accordingly, would result in indeterminate rownum = 1 rows being picked by the query. That more or less duplicates your TOP 1 without an explicit order, but I would recommend specifying a more sensible ORDER BY clause to make the results more predictable.
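Here is a minimal sketch of the ranked-join approach with an explicit ordering. It runs on Python's built-in sqlite3 module instead of SQL Server and assumes an SQLite build with window function support (3.25 or later). The lowercase table names, the sample rows and the choice of ordering columns are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical stand-ins for LASERTECHNO, #TempTableX and #TempTableY.
    CREATE TABLE laser (lt_all_material TEXT);
    CREATE TABLE x (mat_rm_name TEXT, mat_ws_id INTEGER);
    CREATE TABLE y (mat_ws_id INTEGER, mat_ws_name TEXT);
    INSERT INTO laser VALUES ('steel'), ('copper'), ('glass');
    INSERT INTO x VALUES ('steel', 2), ('steel', 1), ('copper', 3);
    INSERT INTO y VALUES (1, 'WS-A'), (1, 'WS-B'), (2, 'WS-C'), (3, 'WS-D');
""")

rows = conn.execute("""
    WITH x_ranked AS (
        -- Explicit ORDER BY: the lowest mat_ws_id per material gets rn = 1.
        SELECT *, ROW_NUMBER() OVER (
                      PARTITION BY mat_rm_name ORDER BY mat_ws_id) AS rn
        FROM x
    ),
    y_ranked AS (
        -- Explicit ORDER BY: the first name alphabetically gets rn = 1.
        SELECT *, ROW_NUMBER() OVER (
                      PARTITION BY mat_ws_id ORDER BY mat_ws_name) AS rn
        FROM y
    )
    SELECT t.lt_all_material, x.mat_ws_id, y.mat_ws_name
    FROM laser t
    LEFT JOIN x_ranked x ON t.lt_all_material = x.mat_rm_name AND x.rn = 1
    LEFT JOIN y_ranked y ON x.mat_ws_id = y.mat_ws_id AND y.rn = 1
    ORDER BY t.lt_all_material
""").fetchall()

print(rows)
```

Materials with no match ('glass' here) are kept with NULLs because of the LEFT JOINs, which mirrors what the scalar sub-queries return when they find no row.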