I am looking for a query that sums all the rows except one. The example below should make this clearer.
Suppose I have a company table like this:
Company_name Rev order
c1 100 1000
c2 200 2000
c3 300 1500
Now the query should insert a row into another table, laid out as follows:
c1(rev) c1(order) sum of other(rev) other(order)
100 1000 500(sum of c2 and c3) 3500(sum of c2 and c3's order)
What would be the query for this kind of scenario?
I was thinking of a query:
insert into table_name (c1_rev,c1_order,sum_rev,sum_order)
select rev, order, sum(rev), sum(order) where Company_name=c1 ....
but I got stuck, as I cannot get the sum of the other two rows this way.
In SQL Server, you could do something like this to fetch the data:
WITH totals AS
(SELECT SUM(rev) revSum, SUM(ORDER_) orderSum
FROM T)
SELECT company_name,
rev,
order_,
totals.revsum - rev AS otherRev,
totals.orderSum - order_ AS otherOrder
FROM t, totals
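The total-minus-own-row idea above can be tried without a SQL Server instance. Below is a minimal sketch using Python's built-in sqlite3 module; the table name `company` and the column name `orders` (used instead of the reserved word `order`) are assumptions made for illustration.

```python
import sqlite3

# In-memory database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (company_name TEXT, rev INTEGER, orders INTEGER)")
conn.executemany(
    "INSERT INTO company VALUES (?, ?, ?)",
    [("c1", 100, 1000), ("c2", 200, 2000), ("c3", 300, 1500)],
)

# Compute the grand totals once, then subtract each row's own values.
query = """
WITH totals AS (
    SELECT SUM(rev) AS rev_sum, SUM(orders) AS order_sum FROM company
)
SELECT c.company_name,
       c.rev,
       c.orders,
       t.rev_sum - c.rev AS other_rev,
       t.order_sum - c.orders AS other_orders
FROM company AS c
CROSS JOIN totals AS t
WHERE c.company_name = ?
"""
c1_row = conn.execute(query, ("c1",)).fetchone()
print(c1_row)
```

Dropping the WHERE clause returns one such row per company, so the same query serves any company you want to single out.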
Try this query:
SELECT Company_name ,
Rev ,
ORDER,
(SELECT SUM(Rev ) FROM table_name k WHERE k.Company_name<>t.Company_name
)"sum of other(rev)",
(SELECT SUM(order ) FROM table_name k WHERE k.Company_name<>t.Company_name
)"other(order)"
FROM table_name t
In Hive, a query similar to:
select sq.cnn,sum(rev),sum(orders) from
(select if(cn=='c1','c1','other') as cnn, rev,orders from test_cn) sq
group by sq.cnn;
will generate 2 rows :
c1 100 1000
other 500 3500
This doesn't exactly match your output, but you can extract the values in the form you need.
To test, create a text file with the following:
% cat test_cn
c1|100|1000
c2|200|2000
c3|300|1500
and in hive:
hive> drop table if exists test_cn;
hive> create table test_cn (cn string, rev int, orders int) row format delimited fields terminated by '|' stored as textfile;
hive> load data local inpath 'test_cn' into table test_cn;
hive> select sq.cnn,sum(rev),sum(orders) from (select if(cn=='c1','c1','other') as cnn, rev,orders from test_cn)sq group by sq.cnn;
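If no Hive installation is at hand, the same label-then-group idea can be checked with Python's built-in sqlite3 module. This is only a sketch: SQLite has no `if()` function here, so a `CASE` expression is swapped in, and the rows use the figures from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_cn (cn TEXT, rev INTEGER, orders INTEGER)")
conn.executemany(
    "INSERT INTO test_cn VALUES (?, ?, ?)",
    [("c1", 100, 1000), ("c2", 200, 2000), ("c3", 300, 1500)],
)

# Label every row as either 'c1' or 'other', then aggregate per label.
grouped = conn.execute(
    """
    SELECT CASE WHEN cn = 'c1' THEN 'c1' ELSE 'other' END AS cnn,
           SUM(rev),
           SUM(orders)
    FROM test_cn
    GROUP BY cnn
    ORDER BY cnn
    """
).fetchall()
print(grouped)
```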
HAVING without GROUP BY would be the best solution, but Hive doesn't support it for now. Try this:
insert into table table_name
select rev, order, sumRev, sumOrd
from (
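-- window aggregates (over ()) are used here because a plain sum() cannot be mixed with non-aggregated columns without a group by
select Company_name,rev, order, sum(rev) over () - rev as sumRev, sum(order) over () - order as sumOrd from base_table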
) a
where Company_name='c1'
SELECT Stock.*
FROM Stock
WHERE (
(
(Stock.ComputerPartNumber) In (SELECT [ComputerPartNumber] FROM [Stock] As Tmp GROUP BY [ComputerPartNumber] HAVING Count(*)=2)
)
AND
(
(Stock.EquipmentName)="EquipmentA" Or (Stock.EquipmentName)="EquipmentB")
)
OR (
(
(Stock.ComputerPartNumber) In (SELECT [ComputerPartNumber] FROM [Stock] As Tmp GROUP BY [ComputerPartNumber] HAVING Count(*)=1)
)
AND (
(Stock.EquipmentName)="EquipmentA" Or (Stock.EquipmentName)="EquipmentB"
)
);
I am using the SQL above to achieve the following three items:
1. Find all ComputerPartNumber values that are used only by EquipmentA and/or EquipmentB.
2. Filter out a ComputerPartNumber if it is used by equipment other than EquipmentA and EquipmentB.
3. If a ComputerPartNumber is used by both EquipmentA and EquipmentC, filter it out as well.
However, item 3 is not filtered out successfully. What should I do to achieve item 3?
Table and Query snapshots are attached. Thanks in advance!
What you need to do is to check if the total number of times a part is used in all pieces of Equipment is equal to the total number of times a part is used by either Equipment A or B:
SELECT S.StorageID, S.ComputerPartNumber, S.EquipmentName, S.Result
FROM Stock AS S
WHERE
(SELECT COUNT(*) FROM Stock AS S1 WHERE S1.ComputerPartNumber=S.ComputerPartNumber)
=(SELECT COUNT(*) FROM Stock AS S2 WHERE S2.ComputerPartNumber=S.ComputerPartNumber AND S2.EquipmentName IN("EquipmentA","EquipmentB"))
Regards,
You can use not exists:
select s.*
from stock as s
where not exists (select 1
from stock as s2
where s2.ComputerPartNumber = s.ComputerPartNumber and
s2.EquipmentName not in ("EquipmentA", "EquipmentB")
);
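To see the NOT EXISTS filter in action, here is a small sketch using Python's built-in sqlite3 module. The original snapshots are not available, so the part numbers P1 to P4 and their equipment assignments are hypothetical sample data, and only two of the table's columns are modelled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (ComputerPartNumber TEXT, EquipmentName TEXT)")
conn.executemany(
    "INSERT INTO stock VALUES (?, ?)",
    [
        ("P1", "EquipmentA"), ("P1", "EquipmentB"),  # A and B only: keep
        ("P2", "EquipmentA"),                        # A only: keep
        ("P3", "EquipmentA"), ("P3", "EquipmentC"),  # shared with C: drop
        ("P4", "EquipmentC"),                        # C only: drop
    ],
)

# Keep a row only if its part has no row for any equipment other than A or B.
kept = conn.execute(
    """
    SELECT s.ComputerPartNumber, s.EquipmentName
    FROM stock AS s
    WHERE NOT EXISTS (
        SELECT 1
        FROM stock AS s2
        WHERE s2.ComputerPartNumber = s.ComputerPartNumber
          AND s2.EquipmentName NOT IN ('EquipmentA', 'EquipmentB')
    )
    ORDER BY s.ComputerPartNumber, s.EquipmentName
    """
).fetchall()
print(kept)
```

P3 disappears entirely, including its EquipmentA row, which is the behaviour item 3 of the question asks for.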
I have a data set similar to this:
Customer_id PART_N PART_C TXN_ID
B123 268888 7902/7900 159
B123 12839 82900/8900 1278
B869 12839 8203/890025/7902 17890
B290 268888 62820/12839 179018
I am not sure how to combine PART_N and PART_C and find count(distinct customer_id) for each part. The same part can appear in either PART_N or PART_C, like part number 12839.
I would like to get the following table using Teradata:
Part COUNT(Distinct Customer id)
268888 2
12839 3
7902 2
7900 1
82900 1
8900 1
8203 1
890025 1
62820 1
If it were just PART_N it would be straightforward, as only one part number is present per row. I am unsure how to combine every part number and find how many distinct customer IDs each one has. If it helps, I have the list of all distinct part numbers in one table, say table2.
I cannot try this code, so treat it as pseudocode and a sketch of an idea.
SELECT numbers, COUNT(numbers)
FROM
(SELECT
REGEXP_SPLIT_TO_TABLE( -- B
CONCAT(PART_N, '/', PART_C), -- A
'/'
) as numbers
FROM table) s
GROUP BY numbers -- C
A: Concatenation of both columns into one string divided by the delimiter '/'
B: Split string by delimiter
C: Group string parts and count them
http://www.teradatawiki.net/2014/05/regular-expression-functions.html
This is pretty ugly.
First let's split those delimited strings up, using strtok_split_to_table.
create volatile table vt_split as (
select
txn_id,
token as part
from table
(strtok_split_to_table(your_table.txn_id,your_table.part_c,'/')
returns (txn_id integer,tokennum integer,token varchar(10))) t
)
with data
primary index (txn_id)
on commit preserve rows;
That will give you all those split apart, with the appropriate txn_id.
Then we can union that with the part_n values.
create volatile table vt_merged as (
select * from vt_split
UNION ALL
select
txn_id,
cast(part_n as varchar(10)) as part
from
your_table)
with data
primary index (txn_id)
on commit preserve rows;
Finally, we can join that back to your original table to get the counts of customer by part.
select
vt_merged.part,
count (distinct your_table.customer_id)
from
vt_merged
inner join your_table
on vt_merged.txn_id = your_table.txn_id
group by 1
This could probably be done a little more cleanly, but it should get you what you're looking for.
This is @S-Man's pseudocode as a working query:
WITH cte AS
(
SELECT Customer_id,
Trim(PART_N) ||'/' || PART_C AS all_parts
FROM tab
)
SELECT
part, -- if part should be numeric: Cast(part AS INT)
Count(DISTINCT Customer_id)
FROM TABLE (StrTok_Split_To_Table(cte.Customer_id, cte.all_parts, '/')
RETURNS (Customer_id VARCHAR(10), tokennum INTEGER, part VARCHAR(30))) AS t
GROUP BY 1
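The concatenate, split, and count-distinct logic can be checked against the sample data outside Teradata. The sketch below is plain Python rather than SQL, with the four rows from the question hard-coded as tuples (TXN_ID is left out because it plays no part in the count).

```python
from collections import defaultdict

# (Customer_id, PART_N, PART_C) rows from the question.
rows = [
    ("B123", "268888", "7902/7900"),
    ("B123", "12839", "82900/8900"),
    ("B869", "12839", "8203/890025/7902"),
    ("B290", "268888", "62820/12839"),
]

customers_by_part = defaultdict(set)
for customer_id, part_n, part_c in rows:
    # Join both columns into one '/'-delimited string, then split it.
    for part in f"{part_n.strip()}/{part_c}".split("/"):
        # A set keeps each customer at most once per part (COUNT DISTINCT).
        customers_by_part[part].add(customer_id)

distinct_counts = {part: len(ids) for part, ids in customers_by_part.items()}
print(distinct_counts)
```

Part 12839 ends up with three customers because it occurs in PART_N for two of them and in PART_C for the third.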
Situation:
We have monthly files that get loaded into our data warehouse. However, instead of replacing the old loads, the new files are simply stacked on top of them. The files are loaded over a period of days.
So when running a SQL script we would get duplicate records. To counteract this, we run a union over 10-20 'customers', selecting MAX(LOADID) for each, e.g.:
SELECT
Customer,
column2,
column3
FROM
MyTable
WHERE
LOADID = (SELECT MAX (LOADID) FROM MyTable WHERE Customer= 'ASDA')
UNION
SELECT
Customer,
column2,
column3
FROM
MyTable
WHERE
LOADID = (SELECT MAX (LOADID) FROM MyTable WHERE Customer= 'TESCO')
The above union would have to be done for multiple customers, so I was thinking there surely has to be a more efficient way.
We can't use MAX(LoadID) in the SELECT statement, as a possible scenario could be the following:
Monday: Asda,Tesco,Waitrose loaded into DW (with LoadID as 124)
Tuesday: Sainsburys loaded in DW (with LoadID as 125)
Wednesday: New Tesco loaded in DW (with LoadID as 126)
So I would want LoadID 124 for Asda and Waitrose, 125 for Sainsburys, and 126 for Tesco.
Use window functions:
SELECT t.*
FROM (SELECT t.*, MAX(LOADID) OVER (PARTITION BY Customer) as maxLOADID
FROM MyTable t
) t
WHERE LOADID = maxLOADID;
Would a subquery to a derived table meet your needs?
select yourfields
from yourtables join
(select customer, max(loadID) maxLoadId
from yourtables
group by customer) derivedTable on derivedTable.customer = realTable.customer
and loadId = maxLoadId
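The derived-table join can be checked against the Monday/Tuesday/Wednesday scenario from the question. This is a minimal sketch using Python's built-in sqlite3 module; the table and column names (`my_table`, `customer`, `load_id`) are assumptions, and each load is reduced to a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (customer TEXT, load_id INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [
        ("Asda", 124), ("Tesco", 124), ("Waitrose", 124),  # Monday
        ("Sainsburys", 125),                               # Tuesday
        ("Tesco", 126),                                    # Wednesday reload
    ],
)

# Find each customer's latest load once, then join back to keep only those rows.
latest = conn.execute(
    """
    SELECT m.customer, m.load_id
    FROM my_table AS m
    JOIN (
        SELECT customer, MAX(load_id) AS max_load_id
        FROM my_table
        GROUP BY customer
    ) AS d
      ON d.customer = m.customer
     AND m.load_id = d.max_load_id
    ORDER BY m.customer
    """
).fetchall()
print(latest)
```

The older Tesco load (124) is dropped, and no per-customer UNION branch is needed when new customers appear.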
I have the following query, which I will call query1:
with a as (
select customer_key as cust,
sum(sales)*1.0/4 as avg_sales,
sum(returns)*1.0/4 as avg_return,
count(distinct order_key)*1.0/4 as avg_num_orders
from orders_table
where purch_year between 2011 and 2014
group by cust
order by random()
),
b as (
select *
from a
where avg_num_orders > .25
limit 100000
)
select case
when avg_num_orders <= 1 then 'Low'
when avg_num_orders between 1 and 4 then 'Medium'
when avg_num_orders > 4 then 'High'
end as estimated_frequency,
count(cust) as num_purchasers_year,
sum(avg_num_orders) as num_orders_year,
avg(avg_num_orders) as avg_num_order_year,
sum(avg_sales) as avg_sales_year,
sum(avg_return) as avg_return_year,
avg_sales_year/num_orders_year as AOV,
avg_sales_year/num_purchasers_year as ACS,
stddev(avg_sales) as sales_stddev
from b
where avg_num_orders > .25
group by estimated_frequency
order by estimated_frequency;
I want to write code that does the following (this is what does not work, I have provided pseudocode). I do not have permission to create a procedure.
Create table temp1
for i in 1..100 loop
insert into temp1 the result of QUERY1
end loop
then
select estimated_frequency,
avg(acs),
avg(sales_stddev)
from temp1
group by estimated_frequency
Essentially, I want to run query1 100 times, store the results in a table called temp1, and then compute some averages on temp1 once all the runs are done.
Thank you for your help
I would have added this as a comment but don't have enough rep.
The only option I can see is to do this outside of Netezza and write your loop in a batch file/shell script/Python script/...
I tried the following but note that this does NOT work because the random number is generated only once and then reused, so you get 100 identical samples.
-- Test view which gives some random data from an existing table.
create view my_view as
select
t.*
from my_table t
join (
select (floor(random()*10)+1)::integer rand_id -- assuming I have ids from 1 to 10
) x on x.rand_id = t.id;
create table results (id integer, data double precision);
insert into results
select v.*
from my_view v
cross join table(generate_series(1,100));
Generate_series is a user-defined table function that you can get from the Enzee Community website.
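As a rough illustration of the "write the loop outside the database" suggestion, here is a sketch in Python. It uses the built-in sqlite3 module purely as a stand-in for the warehouse connection (with Netezza you would use whatever driver your site provides), and `sample_query` is a deliberately simplified, hypothetical placeholder for query1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders_table (customer_key INTEGER, sales REAL)")
conn.executemany(
    "INSERT INTO orders_table VALUES (?, ?)",
    [(k, float(k)) for k in range(1, 51)],
)
conn.execute("CREATE TABLE temp1 (run INTEGER, avg_sales REAL)")

# Stand-in for QUERY1: average sales over a fresh random sample of 10 customers.
sample_query = """
SELECT AVG(sales) FROM (
    SELECT sales FROM orders_table ORDER BY random() LIMIT 10
)
"""

# The client-side loop replaces the stored procedure: each iteration issues
# a new statement, so random() is re-evaluated for every sample.
for run in range(1, 101):
    (avg_sales,) = conn.execute(sample_query).fetchone()
    conn.execute("INSERT INTO temp1 VALUES (?, ?)", (run, avg_sales))

# Final aggregation over the 100 stored results.
run_count, mean_of_runs = conn.execute(
    "SELECT COUNT(*), AVG(avg_sales) FROM temp1"
).fetchone()
print(run_count, mean_of_runs)
```

Because every iteration is a separate statement, the samples differ from run to run, which is exactly what the single-statement cross join could not achieve.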
I would like to retrieve all rows matching a set of conditions on the same column. But I would like the rows only if ALL the conditions are good, and no row if only one condition fails.
For example, taking this table:
|id|name|
---------
|1 |toto|
|2 |tata|
I would like to be able to ask whether both "tata" && "toto" are in this table and get those rows back. But I would like an empty response if one of the arguments is not in the table, for example when asking whether "toto" && "tutu" are in the table.
How can I do that?
Currently, I am doing one query per argument, which is not very efficient. I tried several solutions, including a subselect and a group + having, but none of them works the way I want.
thanks for your support !
cheers
This isn't the most efficient way, but this query would work.
SELECT * FROM table_name
WHERE (name = 'toto' OR name = 'tata')
AND ( SELECT COUNT(*) FROM table_name WHERE name = 'toto') > 0
AND ( SELECT COUNT(*) FROM table_name WHERE name = 'tata') > 0
This is a little vague. If the names are unique, you could count the matching rows that match a where clause:
where name='toto' or name='tata'
If the count is 2, then you know both matched. If name is not unique you could potentially select the first ID (select top 1 id ...) that matches each in a union and count those with an outer select.
Even if you had an arbitrary number of names to match, you could create a stored procedure or code in whatever top-level language you are using to build the select statement.
SELECT 1 AS found FROM hehe
WHERE 1 IN (SELECT 1 FROM hehe WHERE name='tata')
AND 1 IN (SELECT 1 FROM hehe WHERE name='toto')
If name is unique you can simplify to:
SELECT *
FROM tbl
WHERE name IN ('toto', 'tata')
AND (SELECT count(*) FROM tbl WHERE name IN ('toto', 'tata')) > 1;
If it isn't:
SELECT *
FROM tbl
WHERE name IN ('toto', 'tata')
AND EXISTS (SELECT * FROM tbl WHERE name = 'toto')
AND EXISTS (SELECT * FROM tbl WHERE name = 'tata');
Or, in PostgreSQL, MySQL and possibly others:
SELECT *
FROM tbl
WHERE name IN ('toto', 'tata')
AND (SELECT count(DISTINCT name) FROM tbl WHERE name IN ('toto', 'tata')) > 1;
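For an arbitrary number of names, the count(DISTINCT name) variant can be generated from a list. The sketch below uses Python's built-in sqlite3 module; the helper name `rows_if_all_present` is hypothetical, and it compares the distinct count with the exact number of requested names rather than with `> 1`.

```python
import sqlite3

def rows_if_all_present(conn, names):
    """Return the matching rows only if every requested name exists in tbl."""
    wanted = sorted(set(names))
    placeholders = ", ".join("?" for _ in wanted)
    query = f"""
        SELECT id, name
        FROM tbl
        WHERE name IN ({placeholders})
          AND (SELECT COUNT(DISTINCT name)
               FROM tbl
               WHERE name IN ({placeholders})) = ?
        ORDER BY id
    """
    # The name list is bound twice (outer filter and subquery), then the count.
    return conn.execute(query, (*wanted, *wanted, len(wanted))).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", [(1, "toto"), (2, "tata")])

both = rows_if_all_present(conn, ["toto", "tata"])
missing_one = rows_if_all_present(conn, ["toto", "tutu"])
print(both, missing_one)
```

Only the placeholders are interpolated into the SQL text; the names themselves are passed as bound parameters.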