Netezza SQL For Loop

I have the following query, which I will call query1:
with a as (
select customer_key as cust,
sum(sales)*1.0/4 as avg_sales,
sum(returns)*1.0/4 as avg_return,
count(distinct order_key)*1.0/4 as avg_num_orders
from orders_table
where purch_year between 2011 and 2014
group by customer_key
),
b as (
select *
from a
where avg_num_orders > .25
-- shuffle here, in the same query block as LIMIT, so LIMIT takes a random sample
order by random()
limit 100000
)
select case
when avg_num_orders <= 1 then 'Low'
when avg_num_orders between 1 and 4 then 'Medium'
when avg_num_orders > 4 then 'High'
end as estimated_frequency,
count(cust) as num_purchasers_year,
sum(avg_num_orders) as num_orders_year,
avg(avg_num_orders) as avg_num_order_year,
sum(avg_sales) as avg_sales_year,
sum(avg_return) as avg_return_year,
-- written out from the aggregates instead of reusing the aliases above
sum(avg_sales)/sum(avg_num_orders) as AOV,
sum(avg_sales)/count(cust) as ACS,
stddev(avg_sales) as sales_stddev
from b
where avg_num_orders > .25
group by estimated_frequency
order by estimated_frequency;
I want to write code that does the following. It is pseudocode and does not work as written. I do not have permission to create a stored procedure.
Create table temp1
for i in 1..100 loop
insert into temp1 the result of QUERY1
end loop
then
select estimated_frequency,
avg(acs),
avg(sales_stddev)
from temp1
group by estimated_frequency
In short, I want to run query1 100 times, store the results in a table called temp1, and then compute some averages over temp1 once all the runs are done.

The only option I can see is to do this outside of Netezza and write your loop in a batch file, shell script, Python script, or similar.
I tried the following, but note that it does NOT work: the random number is generated only once and then reused, so you get 100 identical samples.
-- Test view which gives some random data from an existing table.
create view my_view as
select
t.*
from my_table t
join (
select (floor(random()*10)+1)::integer rand_id -- assuming I have ids from 1 to 10
) x on x.rand_id = t.id;
create table results (id integer, data double precision);
insert into results
select v.*
from my_view v
cross join table(generate_series(1,100));
Generate_series is a user-defined table function that you can get from the Enzee Community website.
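The loop outside Netezza could look roughly like the following minimal Python sketch. It assumes a DB-API 2.0 connection; against Netezza you would open one with whatever driver you have available (for example an ODBC driver through pyodbc), while sqlite3 is used here only so the sketch runs on its own. `SAMPLE_SQL`, the `customers` table and its columns are hypothetical, simplified stand-ins for query1, and SQLite syntax is not identical to Netezza's.

```python
import sqlite3

# Hypothetical, simplified stand-in for query1 written as INSERT ... SELECT:
# shuffle the customers with random(), keep 5, summarise per frequency bucket.
SAMPLE_SQL = """
INSERT INTO temp1 (estimated_frequency, acs)
SELECT estimated_frequency, AVG(avg_sales)
FROM (SELECT CASE WHEN avg_num_orders <= 1 THEN 'Low' ELSE 'Medium' END
               AS estimated_frequency,
             avg_sales
      FROM customers
      ORDER BY random()
      LIMIT 5)
GROUP BY estimated_frequency
"""


def run_query_n_times(conn, insert_sql, n):
    """Execute insert_sql n times as n separate statements, so random() is
    re-evaluated on every pass and each pass inserts a fresh sample."""
    cur = conn.cursor()
    for _ in range(n):
        cur.execute(insert_sql)
    conn.commit()


def summarise(conn):
    """The final averaging step over temp1."""
    return conn.execute(
        "SELECT estimated_frequency, AVG(acs) FROM temp1 "
        "GROUP BY estimated_frequency ORDER BY estimated_frequency"
    ).fetchall()


def demo(n=100):
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (cust INTEGER, avg_num_orders REAL, avg_sales REAL)"
    )
    # Toy data: customers 1-5 fall in 'Low', customers 6-10 in 'Medium'.
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [(i, 0.5 if i <= 5 else 3.0, float(i * 10)) for i in range(1, 11)],
    )
    conn.execute("CREATE TABLE temp1 (estimated_frequency TEXT, acs REAL)")
    run_query_n_times(conn, SAMPLE_SQL, n)
    total, distinct = conn.execute(
        "SELECT COUNT(*), COUNT(DISTINCT acs) FROM temp1"
    ).fetchone()
    return total, distinct, summarise(conn)


if __name__ == "__main__":
    print(demo())
```

Because each pass is a separate statement, the database evaluates random() again every time, which is what a single INSERT ... SELECT cross-joined with a number series does not give you.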

Related

Distance between geography points return each TOP 1

I have two SQL Server 2016 tables:
Customers, which has about 200k rows
Shops, which has 100 rows
Both tables contain a geography field.
For each customer I need to find their closest shop.
I have written the code below, which gives the correct results but returns the distance between every customer and every shop within 10 km, whereas I only need the TOP 1 (closest shop) for each customer. I could add a ROW_NUMBER and filter on = 1, but I'm worried about how much data this would create in the CTE and how long it would take to run.
select t1.[CustNo_],t1.Latitude, t1.Longitude,t1.GeoLoc
,t2.[ShopCode]
,t1.GeoLoc.STDistance(t2.GeoLoc)/1000 as DistanceApartKM
from #Customers t1
join #Shops t2
on (t1.GeoLoc.STDistance(t2.GeoLoc) <= 10000)
order by t1.[CustNo_], DistanceApartKM
I am now going to try the suggestion to use CROSS APPLY; amended code below:
select t1.[CustNo_],t1.Latitude, t1.Longitude,t1.GeoLoc
,x.[ShopCode]
,x.DistanceApartKM
from #Customers t1
cross apply (select top 1 t2.[ShopCode]
,t1.GeoLoc.STDistance(t2.GeoLoc)/1000 as DistanceApartKM
from #Shops t2
where (t1.GeoLoc.STDistance(t2.GeoLoc) <= 10000)
order by DistanceApartKM
) x
order by t1.[CustNo_]

Find duplicates in MS SQL table

I know that this question has been asked several times, but I still cannot figure out why my query returns values that are not duplicates. I want my query to return only the records that have an identical value in the Credit column. The query executes without any errors, but values that are not duplicated are also being returned. This is my query:
Select
_bvGLTransactionsFull.AccountDesc,
_bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate,
_bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit,
_bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName
From
_bvGLAccountsFinancial Inner Join
_bvGLTransactionsFull On _bvGLAccountsFinancial.AccountLink =
_bvGLTransactionsFull.AccountLink
Where
_bvGLTransactionsFull.Credit
IN
(SELECT Credit AS NumOccurrences
FROM _bvGLTransactionsFull
GROUP BY Credit
HAVING (COUNT(Credit) > 1 ) )
Group By
_bvGLTransactionsFull.AccountDesc, _bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate, _bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit, _bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName, _bvGLAccountsFinancial.Master_Sub_Account,
IsNumeric(_bvGLTransactionsFull.Reference), _bvGLTransactionsFull.TrCode
Having
_bvGLTransactionsFull.TxDate > '20141101' And -- assuming 1 Nov 2014; unquoted 01 / 11 / 2014 is integer division (= 0)
_bvGLTransactionsFull.Reference Like '5_____' And
_bvGLTransactionsFull.Credit > 0.01 And
_bvGLAccountsFinancial.Master_Sub_Account = '90210'
That's because you're matching on the credit field back to your table, which contains duplicates. You need to isolate the rows that are duplicated with ROW_NUMBER:
;WITH CTE AS (
SELECT *, ROW_NUMBER() OVER(PARTITION BY CREDIT ORDER BY (SELECT NULL)) AS RN
FROM _bvGLTransactionsFull)
Select
CTE.AccountDesc,
_bvGLAccountsFinancial.Description,
CTE.TxDate,
CTE.Description,
CTE.Credit,
CTE.Reference,
CTE.UserName
From
_bvGLAccountsFinancial Inner Join
CTE On _bvGLAccountsFinancial.AccountLink = CTE.AccountLink
WHERE CTE.RN > 1
Group By
CTE.AccountDesc, _bvGLAccountsFinancial.Description,
CTE.TxDate, CTE.Description,
CTE.Credit, CTE.Reference,
CTE.UserName, _bvGLAccountsFinancial.Master_Sub_Account,
IsNumeric(CTE.Reference), CTE.TrCode
Having
CTE.TxDate > '20141101' And -- assuming 1 Nov 2014; unquoted 01 / 11 / 2014 is integer division (= 0)
CTE.Reference Like '5_____' And
CTE.Credit > 0.01 And
_bvGLAccountsFinancial.Master_Sub_Account = '90210'
Just as a side note, I would consider using aliases to shorten your queries and make them more readable. Prefixing the table name before each column in a join is very difficult to read.
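The difference between filtering on ROW_NUMBER and filtering on a per-Credit count shows up in a small sketch using Python's sqlite3 (window functions need SQLite 3.25 or later; the `tx` table and its rows are hypothetical). RN > 1 returns only the second and later rows of each duplicated credit, while COUNT(*) OVER (PARTITION BY credit) > 1 returns every row whose credit occurs more than once. The sketch orders ROW_NUMBER by id instead of (SELECT NULL) so the output is deterministic.

```python
import sqlite3

# Hypothetical transactions: credit 10 appears twice, 20 once, 30 three times.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tx (id INTEGER, credit REAL)")
conn.executemany(
    "INSERT INTO tx VALUES (?, ?)",
    [(1, 10.0), (2, 10.0), (3, 20.0), (4, 30.0), (5, 30.0), (6, 30.0)],
)


def duplicate_ids(conn, method):
    """Return the ids selected by either duplicate filter."""
    if method == "row_number":
        # RN > 1 keeps only the second and later rows of each credit value.
        sql = """SELECT id FROM (
                     SELECT id,
                            ROW_NUMBER() OVER (PARTITION BY credit ORDER BY id) AS rn
                     FROM tx) AS numbered
                 WHERE rn > 1 ORDER BY id"""
    else:
        # COUNT(*) OVER keeps every row whose credit value occurs more than once.
        sql = """SELECT id FROM (
                     SELECT id, COUNT(*) OVER (PARTITION BY credit) AS n
                     FROM tx) AS counted
                 WHERE n > 1 ORDER BY id"""
    return [row[0] for row in conn.execute(sql)]


print(duplicate_ids(conn, "row_number"))  # [2, 5, 6]
print(duplicate_ids(conn, "count"))       # [1, 2, 4, 5, 6]
```

Window functions are evaluated after the WHERE clause of their own SELECT, so putting the date, reference and account filters in the same SELECT as the window function counts duplicates only among the filtered rows.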
I'll trust that your code extracts all the data matching your criteria, so let me take a different approach that uses your script as-is. First, keep all the records in a temp table:
Select
_bvGLTransactionsFull.AccountDesc,
_bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate,
_bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit,
_bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName
-- temp table
INTO #tmpTable
From
_bvGLAccountsFinancial Inner Join
_bvGLTransactionsFull On _bvGLAccountsFinancial.AccountLink =
_bvGLTransactionsFull.AccountLink
Where
_bvGLTransactionsFull.Credit
IN
(SELECT Credit AS NumOccurrences
FROM _bvGLTransactionsFull
GROUP BY Credit
HAVING (COUNT(Credit) > 1 ) )
Group By
_bvGLTransactionsFull.AccountDesc, _bvGLAccountsFinancial.Description,
_bvGLTransactionsFull.TxDate, _bvGLTransactionsFull.Description,
_bvGLTransactionsFull.Credit, _bvGLTransactionsFull.Reference,
_bvGLTransactionsFull.UserName, _bvGLAccountsFinancial.Master_Sub_Account,
IsNumeric(_bvGLTransactionsFull.Reference), _bvGLTransactionsFull.TrCode
Having
_bvGLTransactionsFull.TxDate > '20141101' And -- assuming 1 Nov 2014; unquoted 01 / 11 / 2014 is integer division (= 0)
_bvGLTransactionsFull.Reference Like '5_____' And
_bvGLTransactionsFull.Credit > 0.01 And
_bvGLAccountsFinancial.Master_Sub_Account = '90210'
Then remove the "single occurrence" data by numbering the rows within each Credit value and discarding every row numbered 1:
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY Credit ORDER BY Credit) AS rowIdx
, *
FROM #tmpTable) AS innerTmp
WHERE
rowIdx != 1
You can change which column defines a duplicate via PARTITION BY <column name>.
If you have any concerns, please raise them first; this is how I have understood your case so far.
EDIT: To include all rows whose credits have duplicates:
SELECT
tmp1.*
FROM #tmpTable tmp1
RIGHT JOIN (
SELECT DISTINCT -- one row per duplicated credit; without DISTINCT the join multiplies rows
Credit
FROM (
SELECT
ROW_NUMBER() OVER (PARTITION BY Credit ORDER BY Credit) AS rowIdx
, *
FROM #tmpTable) AS innerTmp
WHERE
rowIdx != 1
) AS tmp2
ON tmp1.Credit = tmp2.Credit
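A small SQLite sketch of the EDIT query, run through Python's sqlite3 (SQLite 3.25 or later for ROW_NUMBER). `tmp_table`, holding only hypothetical `id` and `credit` columns, stands in for #tmpTable. It uses an inner join, which returns the same rows as the RIGHT JOIN here because every credit in tmp2 comes from the same table, and it shows why tmp2 should list each duplicated credit only once: a credit that appears k times yields k - 1 rows numbered above 1, and without DISTINCT each of them joins to all k rows.

```python
import sqlite3

# Hypothetical stand-in for #tmpTable after the first step.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tmp_table (id INTEGER, credit REAL)")
conn.executemany(
    "INSERT INTO tmp_table VALUES (?, ?)",
    [(1, 10.0), (2, 10.0), (3, 20.0), (4, 30.0), (5, 30.0), (6, 30.0)],
)


def rows_with_duplicate_credit(conn, distinct=True):
    """Rows of tmp_table whose credit occurs more than once."""
    keyword = "DISTINCT" if distinct else ""
    # tmp2: credits that have a row numbered 2 or higher, i.e. duplicated credits.
    sql = f"""
        SELECT tmp1.id, tmp1.credit
        FROM tmp_table AS tmp1
        JOIN (SELECT {keyword} credit
              FROM (SELECT credit,
                           ROW_NUMBER() OVER (PARTITION BY credit ORDER BY credit)
                             AS row_idx
                    FROM tmp_table) AS inner_tmp
              WHERE row_idx != 1) AS tmp2
          ON tmp1.credit = tmp2.credit
        ORDER BY tmp1.id
    """
    return conn.execute(sql).fetchall()


print(rows_with_duplicate_credit(conn))
print(len(rows_with_duplicate_credit(conn, distinct=False)))  # 8
```

With DISTINCT the result is the five rows with credits 10 and 30; without it, credit 10 contributes 2 x 1 rows and credit 30 contributes 3 x 2 rows.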

Return records repeated according to field value

I have a table like this:
Name Qty
----- -----
BD 2
SD 1
XE 3
I need to return this table's records, each repeated according to the value of its Qty field. For example, the first row would be repeated twice with all the same values.
I was thinking of using a nested FOR SELECT inside a stored procedure with return parameters:
For Select name, qty from mytable into :name, :qty do
begin
if (qty > 1 ) then begin
i = 1; -- should I start from 2 ?
while (i <= :qty) do
begin
for select name, qty from mytable into :name1, :qty1 do ...
SUSPEND;
i = i + 1;
end
end
SUSPEND;
end
Can this stored procedure return the correct result, or should I use another approach?
I use Firebird 2.5. Please disregard any typos in the SQL above; I am only interested in the main idea.
You can do it using a recursive CTE (supported since Firebird 2.1), something like:
WITH RECURSIVE tQty AS(
SELECT ta.name, ta.qty
FROM T ta
WHERE(ta.qty > 0)
UNION ALL
SELECT tb.name, tb.qty-1
FROM tQty tb
WHERE(tb.qty-1 > 0)
)
SELECT name,qty FROM tQty
ORDER BY name
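The recursive CTE can be tried out with a small sketch using Python's sqlite3 (SQLite also supports WITH RECURSIVE; Firebird syntax is close but not identical). In the answer's query the qty column counts down on each copy; since the question asks for identical copies, this variant carries a separate counter `n` (an addition not in the answer) so every copy keeps the original Qty. The rows are the ones from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, qty INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [("BD", 2), ("SD", 1), ("XE", 3)])

# Anchor: one row per name with n = qty. Recursive step: emit the same
# name and qty again with n - 1 until n reaches 1, so each name appears qty times.
REPEAT_SQL = """
WITH RECURSIVE tqty(name, qty, n) AS (
    SELECT name, qty, qty FROM t WHERE qty > 0
    UNION ALL
    SELECT name, qty, n - 1 FROM tqty WHERE n - 1 > 0
)
SELECT name, qty FROM tqty ORDER BY name
"""
rows = conn.execute(REPEAT_SQL).fetchall()
print(rows)
```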
If you have a Numbers table, this is trivial:
SELECT
OT.Name
FROM
OrigTable AS OT
INNER JOIN Numbers AS N ON N.Number BETWEEN 1 AND OT.Qty
I'm not familiar with Firebird, but I trust this very simple syntax applies.
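A runnable sketch of the Numbers-table join using Python's sqlite3. `numbers` is a hypothetical helper table filled with 1..1000; it only needs to reach the largest Qty. Each row of the original table matches exactly Qty rows of `numbers`, so it comes out Qty times.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orig_table (name TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO orig_table VALUES (?, ?)", [("BD", 2), ("SD", 1), ("XE", 3)]
)
# Hypothetical numbers table 1..1000 (must cover the largest qty).
conn.execute("CREATE TABLE numbers (number INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO numbers VALUES (?)", [(i,) for i in range(1, 1001)])

rows = conn.execute(
    """SELECT ot.name, ot.qty
       FROM orig_table AS ot
       JOIN numbers AS n ON n.number BETWEEN 1 AND ot.qty
       ORDER BY ot.name"""
).fetchall()
print(rows)
```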

Sql query for adding all the rows' data except

I am looking for a query that sums over all the rows except one. The example below should make this clearer.
Suppose I have a company table like this:
Company_name  Rev  order
c1            100  1000
c2            200  2000
c3            300  1500
Now the query should insert into a table as shown below:
c1(rev)  c1(order)  sum of other(rev)        other(order)
100      1000       500 (sum of c2 and c3)   3500 (sum of c2's and c3's order)
What would be the query for this kind of scenario?
I was thinking of a query:
insert into table_name (c1_rev,c1_order,sum_rev,sum_order)
select rev, order, sum(rev), sum(order) where Company_name=c1 ....
but I got stuck because I cannot find the sum of the other two this way.
In SQL Server, you could do something like this to fetch the data:
WITH totals AS
(SELECT SUM(rev) revSum, SUM(ORDER_) orderSum
FROM T)
SELECT company_name,
rev,
order_,
totals.revsum - rev AS otherRev,
totals.orderSum - order_ AS otherOrder
FROM t, totals
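A quick check of the totals approach with the question's data, using Python's sqlite3 rather than SQL Server. The column is named `order_` as in the answer, since ORDER is a reserved word. The totals are computed once and each company's own values are subtracted from them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (company_name TEXT, rev INTEGER, order_ INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("c1", 100, 1000), ("c2", 200, 2000), ("c3", 300, 1500)],
)

# Grand totals in a one-row CTE, cross-joined to every company row.
rows = conn.execute(
    """WITH totals AS (
           SELECT SUM(rev) AS rev_sum, SUM(order_) AS order_sum FROM t
       )
       SELECT company_name, rev, order_,
              totals.rev_sum - rev AS other_rev,
              totals.order_sum - order_ AS other_order
       FROM t CROSS JOIN totals
       ORDER BY company_name"""
).fetchall()
print(rows)
```

Adding `WHERE company_name = 'c1'` leaves the single row the question wants to insert: 100, 1000, 500, 3500.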
Try this query:
SELECT Company_name ,
Rev ,
ORDER,
(SELECT SUM(Rev ) FROM table_name k WHERE k.Company_name<>t.Company_name
)"sum of other(rev)",
(SELECT SUM(order ) FROM table_name k WHERE k.Company_name<>t.Company_name
)"other(order)"
FROM table_name t
In Hive, a query similar to:
select sq.cnn,sum(rev),sum(orders) from
(select if(cn=='c1','c1','other') as cnn, rev,orders from test_cn) sq
group by sq.cnn;
will generate 2 rows:
c1 100 1000
other 500 3500
This doesn't exactly match your output, but you can extract the values in the form you need from it.
To test, create a text file with the following:
% cat test_cn
c1|100|1000
c2|200|2000
c3|300|1500
and in Hive:
hive> drop table if exists test_cn;
hive> create table test_cn (cn string, rev int, orders int) row format delimited fields terminated by '|' stored as textfile;
hive> select sq.cnn,sum(rev),sum(orders) from (select if(cn=='c1','c1','other') as cnn, rev,orders from test_cn)sq group by sq.cnn;
HAVING without GROUP BY would be the best solution, but Hive doesn't support it for now; try this:
insert into table table_name
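select rev, `order`, sumRev, sumOrd
from (
-- sum(...) over () is the total over all rows; `order` is backquoted because ORDER is a reserved word
select Company_name, rev, `order`, sum(rev) over () - rev as sumRev, sum(`order`) over () - `order` as sumOrd from base_table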
) a
where Company_name='c1'
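The idea behind this query, a total over all rows minus the current row, can be sketched with a window function in Python's sqlite3 (SQLite 3.25 or later). The `order_` column and the target `table_name` layout are hypothetical. The c1 filter has to stay in the outer query: a WHERE inside the subquery is applied before the window function, which would shrink the total to c1's own row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical base_table; the order column is called order_ because
# ORDER is a reserved word.
conn.execute(
    "CREATE TABLE base_table (company_name TEXT, rev INTEGER, order_ INTEGER)"
)
conn.executemany(
    "INSERT INTO base_table VALUES (?, ?, ?)",
    [("c1", 100, 1000), ("c2", 200, 2000), ("c3", 300, 1500)],
)
conn.execute(
    "CREATE TABLE table_name "
    "(c1_rev INTEGER, c1_order INTEGER, sum_rev INTEGER, sum_order INTEGER)"
)
# SUM(...) OVER () totals all rows of the subquery; subtracting the row's
# own value gives "sum of the others". The c1 filter runs afterwards.
conn.execute(
    """INSERT INTO table_name (c1_rev, c1_order, sum_rev, sum_order)
       SELECT rev, order_, sum_rev, sum_ord
       FROM (SELECT company_name, rev, order_,
                    SUM(rev) OVER () - rev AS sum_rev,
                    SUM(order_) OVER () - order_ AS sum_ord
             FROM base_table) AS a
       WHERE company_name = 'c1'"""
)
rows = conn.execute("SELECT * FROM table_name").fetchall()
print(rows)  # [(100, 1000, 500, 3500)]
```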

How do I get the top 10 results of a query?

I have a PostgreSQL query like this:
with r as (
select
1 as reason_type_id,
rarreason as reason_id,
count(*) over() count_all
from
workorderlines
where
rarreason != 0
and finalinsdate >= '2012-12-01'
)
select
r.reason_id,
rt.desc,
count(r.reason_id) as num,
round((count(r.reason_id)::float / (select count(*) as total from r) * 100.0)::numeric, 2) as pct
from r
left outer join
rtreasons as rt
on
r.reason_id = rt.rtreason
and r.reason_type_id = rt.rtreasontype
group by
r.reason_id,
rt.desc
order by r.reason_id asc
This returns a table of results with 4 columns: the reason id, the description associated with that reason id, the number of entries having that reason id, and the percent of the total that number represents.
This table looks like this: (screenshot not included)
What I would like to do is display only the top 10 results, ranked by the number of entries for each reason id. Whatever is left over, I would like to combine into one extra row with the description "Other". How would I do this?
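with r as (
...your existing CTE...
),
r2 as (
select
...your existing select list (reason_id, desc, num, pct)...,
-- rank by the count itself, highest first (the pct alias cannot be used here)
dense_rank() over (order by count(r.reason_id) desc) as cause_rank
...the rest of your query (from, join, group by), without the final order by...
)
select * from r2 where cause_rank <= 10
union all
select
NULL as reason_id,
'Other' as desc,
sum(r2.num) as num,
sum(r2.pct) as pct,
11 as cause_rank
from r2
where cause_rank > 10
order by cause_rank, num desc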
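A small sketch of the top-N-plus-"Other" pattern using Python's sqlite3 (SQLite 3.25 or later for DENSE_RANK), with hypothetical reason data and N = 2 so the result is easy to check. It ranks by the count in descending order, folds the remaining counts into a single "Other" row with a plain SUM, and uses UNION ALL with an explicit sort key so "Other" always comes last.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE r (reason_id INTEGER)")
# Hypothetical data: reason 1 x5, reason 2 x4, reason 3 x2, reason 4 x1.
conn.executemany(
    "INSERT INTO r VALUES (?)", [(1,)] * 5 + [(2,)] * 4 + [(3,)] * 2 + [(4,)]
)

TOP_N = 2
rows = conn.execute(
    """WITH counts AS (
           SELECT reason_id, COUNT(*) AS num,
                  DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) AS cause_rank
           FROM r
           GROUP BY reason_id
       )
       SELECT CAST(reason_id AS TEXT) AS label, num, cause_rank
       FROM counts WHERE cause_rank <= :n
       UNION ALL
       SELECT 'Other', SUM(num), :n + 1
       FROM counts WHERE cause_rank > :n
       ORDER BY 3, 1""",
    {"n": TOP_N},
).fetchall()
print(rows)  # [('1', 5, 1), ('2', 4, 2), ('Other', 3, 3)]
```

With DENSE_RANK, ties share a rank, so more than N named rows can appear; if nothing falls outside the top N, the "Other" row has a NULL total.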
As mentioned above, use LIMIT to get the top rows, and OFFSET to skip them and get the rest.
In SQL Server, SELECT TOP 10 ... does the trick if you sort correctly; PostgreSQL does not support TOP, so there you use ORDER BY ... LIMIT 10 instead.
As for the second part, you could use a RIGHT JOIN: join the top 10 result to the whole table's data and keep only the records that do not appear on the left side. Summing those gives you the "sum of the rest".
I assume that vw_my_top_10 is a view showing the top 10 records and that vw_all_records shows all records (including the top 10).
Like this:
SELECT SUM(a_field)
FROM vw_my_top_10
RIGHT JOIN vw_all_records
ON (vw_my_top_10.Key = vw_all_records.Key)
WHERE vw_my_top_10.Key IS NULL
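The "rest of the records" query is an anti-join. A runnable sketch with Python's sqlite3, using hypothetical `all_records` data and a `top_10` view as stand-ins for vw_all_records and vw_my_top_10 (the key column is called `rec_id` here). It writes the RIGHT JOIN as a LEFT JOIN with the tables swapped, which is equivalent and also works on SQLite versions before 3.39 that lack RIGHT JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: 15 records with a_field = 10, 20, ..., 150.
conn.execute("CREATE TABLE all_records (rec_id INTEGER PRIMARY KEY, a_field INTEGER)")
conn.executemany(
    "INSERT INTO all_records VALUES (?, ?)", [(k, k * 10) for k in range(1, 16)]
)
# Stand-in for vw_my_top_10: the 10 records with the largest a_field.
conn.execute(
    "CREATE VIEW top_10 AS "
    "SELECT rec_id, a_field FROM all_records ORDER BY a_field DESC LIMIT 10"
)
# all_records LEFT JOIN top_10 is the same as top_10 RIGHT JOIN all_records.
# Rows with no match in top_10 (t.rec_id IS NULL) are exactly "the rest".
rest_sum = conn.execute(
    """SELECT SUM(a.a_field)
       FROM all_records AS a
       LEFT JOIN top_10 AS t ON t.rec_id = a.rec_id
       WHERE t.rec_id IS NULL"""
).fetchone()[0]
print(rest_sum)  # 150
```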