How to use row_number and partition function in sqldf

Update
I can run the SQL query below in a Netezza database, but it fails in the sqldf package in R:
> sqldf("SELECT TEXT,
+ VEH_MAKE_NM,
+ NEW_USED_CD,
+ PRODUCT,
+ OVERALL_SUBV_IND,
+ AS_OF_DATE,
+ CATEGORY,
+ ROW_NUMBER() OVER(PARTITION BY TEXT, VEH_MAKE_NM, NEW_USED_CD, PRODUCT, OVERALL_SUBV_IND, AS_OF_DATE ORDER BY CATEGORY DESC) RN_CATEGORY,
+ SUBCATEGORY,
+ ROW_NUMBER() OVER(PARTITION BY TEXT, VEH_MAKE_NM, NEW_USED_CD, PRODUCT, OVERALL_SUBV_IND, AS_OF_DATE ORDER BY SUBCATEGORY DESC) RN_SUBCATEGORY
+ FROM output
+ --GROUP BY 1,2,3,4,5,6")
Error in sqliteSendQuery(con, statement, bind.data) :
error in statement: near "(": syntax error
I think it might be because the sqldf package doesn't support Netezza SQL. Is there a Netezza SQL package for R?
Thanks

Step 1. Add the row number columns to the output dataframe (the code below uses pandas syntax):
output['RN_CATEGORY'] = (
    output.sort_values(['CATEGORY'], ascending=False)
          .groupby(['TEXT', 'VEH_MAKE_NM', 'NEW_USED_CD', 'PRODUCT',
                    'OVERALL_SUBV_IND', 'AS_OF_DATE'])
          .cumcount() + 1
)
output['RN_SUBCATEGORY'] = (
    output.sort_values(['SUBCATEGORY'], ascending=False)
          .groupby(['TEXT', 'VEH_MAKE_NM', 'NEW_USED_CD', 'PRODUCT',
                    'OVERALL_SUBV_IND', 'AS_OF_DATE'])
          .cumcount() + 1
)
Step 2. Select the precomputed row number columns with sqldf:
sqldf("SELECT TEXT,
VEH_MAKE_NM,
NEW_USED_CD,
PRODUCT,
OVERALL_SUBV_IND,
AS_OF_DATE,
CATEGORY,
RN_CATEGORY,
SUBCATEGORY,
RN_SUBCATEGORY
FROM output
--GROUP BY 1,2,3,4,5,6")
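The error message above comes from SQLite, which sqldf uses as its default backend, and SQLite releases before 3.25 do not support window functions. As a minimal sketch of a pure-SQL alternative, ROW_NUMBER() can be emulated with a correlated subquery that counts the rows sorting ahead of the current one. The example runs through Python's sqlite3 module. The two-column table (grp, category) is a hypothetical stand-in for the six partition columns in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the "output" table in the question.
conn.execute("CREATE TABLE output (grp TEXT, category TEXT)")
conn.executemany(
    "INSERT INTO output VALUES (?, ?)",
    [("a", "x"), ("a", "z"), ("a", "y"), ("b", "m"), ("b", "n")],
)

# Emulates ROW_NUMBER() OVER (PARTITION BY grp ORDER BY category DESC):
# count the rows in the same group that sort ahead of the current row,
# using rowid as a tie-breaker so equal categories still get distinct numbers.
query = """
SELECT o.grp,
       o.category,
       (SELECT COUNT(*)
          FROM output AS p
         WHERE p.grp = o.grp
           AND (p.category > o.category
                OR (p.category = o.category AND p.rowid <= o.rowid))
       ) AS rn_category
FROM output AS o
ORDER BY o.grp, rn_category
"""
row_numbers = conn.execute(query).fetchall()
for row in row_numbers:
    print(row)
```

The correlated subquery is quadratic within each group, so this is mainly useful for small tables or when upgrading the SQLite backend is not an option.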

Related

Pervasive SQL order by with if

In Pervasive SQL 11 I could use an IF statement in the ORDER BY:
SELECT *
FROM (
SELECT
D1001 as 'part_number',
'' as 'required_date',
'' as 'confirmed_date'
FROM PULAGER
WHERE
D1001 LIKE '1121%'
UNION
SELECT
D5410 as 'part_number',
D5511 as 'required_date',
D5513 as 'confirmed_date'
FROM PUIKOKRO
WHERE
D5410 LIKE '1121%'
) as t1
ORDER BY part_number, IF (confirmed_date = '', required_date, confirmed_date)
But after an upgrade to version 15.10.031, I get the error "Reference to column name not allowed in ORDER BY with UNION". There is no error if I remove the IF statement. Any suggestions?
The goal is to order by part_number first, and then by confirmed_date, falling back to required_date when confirmed_date is empty.
I solved it by moving the IF statement into the SELECT list to create a new column 'sort_date' and wrapping it all in another SELECT (t1 below stands for the UNION subquery above). It doesn't feel like the most beautiful solution, but it works.
SELECT * FROM (
SELECT t1.*, IF (confirmed_date = '', required_date, confirmed_date) as 'sort_date' FROM t1
) ORDER BY part_number, sort_date
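The same idea, computing the sort key as a column inside a wrapper query and ordering by that column, can be sketched with Python's sqlite3 module. This is not Pervasive SQL: the table names (stock, orders) and the data are hypothetical, and CASE is used in place of IF because it is more widely portable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the two source tables in the question.
conn.execute("CREATE TABLE stock (part_number TEXT)")
conn.execute(
    "CREATE TABLE orders (part_number TEXT, required_date TEXT, confirmed_date TEXT)"
)
conn.execute("INSERT INTO stock VALUES ('1121-A')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("1121-A", "2024-03-01", ""),
        ("1121-A", "2024-05-01", "2024-02-01"),
        ("1121-B", "2024-01-15", ""),
    ],
)

# The UNION lives in the innermost subquery; the middle layer adds sort_date,
# so the outer ORDER BY only refers to plain column names.
query = """
SELECT part_number, required_date, confirmed_date
FROM (
    SELECT t1.*,
           CASE WHEN confirmed_date = '' THEN required_date
                ELSE confirmed_date END AS sort_date
    FROM (
        SELECT part_number, '' AS required_date, '' AS confirmed_date FROM stock
        UNION
        SELECT part_number, required_date, confirmed_date FROM orders
    ) AS t1
)
ORDER BY part_number, sort_date
"""
sorted_parts = conn.execute(query).fetchall()
for row in sorted_parts:
    print(row)
```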

Query gives me a syntax error in From clause error

This SQL query runs perfectly in an older MS Access file but after recreating it for a newer table it gives me a Syntax error in the from clause.
select * from ( with dso as (select month, product_line, third_cust_indic as sales_type,
sum(sales) + sum(beg_snb_ra) as beg_sales,
What could be the problem?
If this is SQL Server, then the with statement needs to precede the select:
with dso as (
select . . .
. . .
)
select . . .
with dso as (select year, month, product_line_id, third_party_cust_indic as sales_type,
sum(beg_snb_sales) + sum(beg_snb_ra) as beg_snb_sales,
sum(snb_sales) + sum(snb_ra) as snb_sales,
sum(beg_snb_cost) as beg_snb_cost,
sum(snb_cost) as snb_cost,
sum(spa) as spa_amount,
sum(invoiced_sales) + coalesce(sum(ic_extended_price),0) as invoiced_sales,
sum(gross_invoiced_sales) as gross_invoiced_sales,
sum(invoiced_cost)+ coalesce(sum(ic_extended_cost),0) as cost,
sum(beg_bl_future) + sum(beg_bl_dated) + sum(beg_bl_alloc) + sum(beg_bl_not_produced) as beginning_backlog,
sum(bl_future) + sum(bl_dated) + sum(bl_alloc) + sum(bl_not_produced) as current_backlog,
sum(bl_alloc) as bl_alloc,
sum(bl_dated) as bl_dated,
sum(bl_future) as bl_future,
sum(bl_not_produced) as bl_not_produced,
sum(promos) + sum(edi_disc_allowance) + sum(new_store_allowance) + sum(ord_level_disc_allowance) + sum (rdc_allowance) + sum(cert_rec_allowance)
+ sum(defective_allowance) + sum(freight_allowance) + sum(adv_coop_allowance) + sum(store_svc_allowance) + sum(promo_amt) + sum(below_min_order_fee) as order_level_discounts,
manufacturing_location, inv_legal_entity || ' ' || inv_legal_entity_desc as shipping_location
from prodsales.monthly_sales_orders
where year = 2018
and month = 10
and (CUST_LEGAL_ENTITY = '001' )
-- or ar_legal_entity = '001')
and PRODUCT_LINE_ID <> '000'
and BUSINESS_UNIT_ID Not in ('OI','IC','BI','RF')
group by year, month, product_line_id, third_party_cust_indic, manufacturing_location, inv_legal_entity || ' ' || inv_legal_entity_desc
)
, Pl as (select distinct product_line_id as pl from dso)
, YN as (select distinct sales_type as slstyp, year, month, day from dso)
, pl_indic as (select * from pl, yn)
select * from DSO
... though you don't seem to be using PL, YN or PL_INDIC.
Your query looks like:
select *
from (
with dso as (
select
...
from prodsales.monthly_sales_orders
where
...
)
and ...
group by ...
),
Pl as (select ... from dso),
YN as (select ... from dso),
pl_indic as (select ... from YN)
There are a lot of problems here:
as answered by Gordon Linoff, the with statement must precede the select
the conditions after the end of the dso declaration make no sense; they cannot relate to anything
Pl and YN are outside of the with statement, whereas their syntax suggests that they should belong to it: Pl as (...)
That's far too many issues to make it possible to provide suggestions on how to rewrite the query. It also makes it extremely unlikely that this ever ran on any database. You would need to entirely revise the logic of this query.
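For reference, the overall shape both answers point to (one WITH clause containing all the comma-separated CTEs, followed by the final SELECT) can be sketched as below. It runs against SQLite through Python's sqlite3 module rather than the asker's database, and the table and column names are simplified, hypothetical versions of those in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, cut-down version of prodsales.monthly_sales_orders.
conn.execute(
    "CREATE TABLE monthly_sales (product_line_id TEXT, sales REAL, ra REAL)"
)
conn.executemany(
    "INSERT INTO monthly_sales VALUES (?, ?, ?)",
    [("100", 10.0, 1.0), ("100", 5.0, 0.5), ("200", 7.0, 0.0)],
)

# All CTEs sit inside a single WITH, separated by commas,
# and the main SELECT comes after the last closing parenthesis.
query = """
WITH dso AS (
    SELECT product_line_id,
           SUM(sales) + SUM(ra) AS total_sales
    FROM monthly_sales
    GROUP BY product_line_id
),
pl AS (
    SELECT DISTINCT product_line_id AS pl FROM dso
)
SELECT dso.product_line_id, dso.total_sales
FROM dso
JOIN pl ON pl.pl = dso.product_line_id
ORDER BY dso.product_line_id
"""
cte_rows = conn.execute(query).fetchall()
for row in cte_rows:
    print(row)
```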

How to get data from 2 rows which has same data in all columns except one in MSSQL

As the title says, I want to combine data from two rows that are identical except for one column, where the second row has a different value from the first.
I want to return all the common data along with the differing values as a single row.
Here you can see that each row has a counterpart with the same values, except for the last column of the second row.
Thanks.
Edits Result :
I suspect you have some kind of ordering column that specifies the actual order of your data. If so, you can use the row_number() function:
select * from (
select *,
row_number() over (partition by <common data cols> order by ? desc) Seq
from table t
) t
where seq = 1;
EDIT: I would not rely on your inventort_item_id column, but yes, you could use creation_date for ordering purposes:
SELECT
EPI.ITEM_CODE, LMP.PROD_DESC, LLPC.COLOC_PROD_PRICE,
BASE_PATH + '' + EPI.IMAGE_FOLDER_NAME + '/' + EPI.IMAGE_DESCRIPTION AS POPULAR_PRODUCTS_IMAGE_PATHS
FROM (SELECT *,
ROW_NUMBER() OVER (PARTITION BY ITEM_CODE ORDER BY creation_date DESC) as Seq
FROM ECOM_PRODUCT_IMAGES EPI
) EPI
INNER JOIN ECOM_POPULAR_PRODUCTS_MAPPING EPPIM ON EPPIM.ITEM_CODE = EPI.ITEM_CODE
INNER JOIN LOM_MST_PRODUCT LMP ON LMP.PROD_CODE = EPI.ITEM_CODE
INNER JOIN LOM_LNK_PROD_COMP LLPC ON LLPC.COLOC_PROD_CODE = LMP.PROD_CODE
WHERE EPI.Seq = 1 AND
EPPIM.ITEM_STATUS = 'ACTIVE';
EDIT 2: In that case, you need a GROUP BY clause with conditional aggregation:
SELECT
EPI.ITEM_CODE, LMP.PROD_DESC, LLPC.COLOC_PROD_PRICE,
MAX(CASE WHEN EPI.Seq = 2
THEN (BASE_PATH + '' + EPI.IMAGE_FOLDER_NAME + '/' + EPI.IMAGE_DESCRIPTION)
END) AS POPULAR_PRODUCTS_IMAGE_PATHS,
MAX(CASE WHEN EPI.Seq = 1
THEN (BASE_PATH + '' + EPI.IMAGE_FOLDER_NAME + '/' + EPI.IMAGE_DESCRIPTION)
END) AS PATH_NEW
FROM (SELECT *,
ROW_NUMBER() OVER (PARTITION BY ITEM_CODE ORDER BY creation_date DESC) as Seq
FROM ECOM_PRODUCT_IMAGES EPI
) EPI
INNER JOIN ECOM_POPULAR_PRODUCTS_MAPPING EPPIM ON EPPIM.ITEM_CODE = EPI.ITEM_CODE
INNER JOIN LOM_MST_PRODUCT LMP ON LMP.PROD_CODE = EPI.ITEM_CODE
INNER JOIN LOM_LNK_PROD_COMP LLPC ON LLPC.COLOC_PROD_CODE = LMP.PROD_CODE
WHERE EPPIM.ITEM_STATUS = 'ACTIVE'
GROUP BY EPI.ITEM_CODE, LMP.PROD_DESC, LLPC.COLOC_PROD_PRICE;
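The conditional-aggregation pattern from EDIT 2 can be tried on a small scale as follows. This is a sketch run against SQLite through Python's sqlite3 module, not SQL Server. It assumes an SQLite build of 3.25 or later (needed for window functions), and the images table and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified version of ECOM_PRODUCT_IMAGES.
conn.execute(
    "CREATE TABLE images (item_code TEXT, image_path TEXT, creation_date TEXT)"
)
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?)",
    [
        ("P0001", "img1.jpg", "2024-01-01"),
        ("P0001", "img2.jpg", "2024-01-02"),
        ("P0002", "img1.jpg", "2024-01-01"),
        ("P0002", "img2.jpg", "2024-01-03"),
    ],
)

# Number the rows per item (newest first), then pivot rows 1 and 2 into
# two columns: each CASE yields NULL for non-matching rows, and MAX picks
# the single non-NULL value per group.
query = """
SELECT item_code,
       MAX(CASE WHEN seq = 2 THEN image_path END) AS older_path,
       MAX(CASE WHEN seq = 1 THEN image_path END) AS newest_path
FROM (
    SELECT item_code,
           image_path,
           ROW_NUMBER() OVER (PARTITION BY item_code
                              ORDER BY creation_date DESC) AS seq
    FROM images
) AS ranked
GROUP BY item_code
ORDER BY item_code
"""
pivoted = conn.execute(query).fetchall()
for row in pivoted:
    print(row)
```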
Here is my approach, also using a window function.
sample data
if object_id('tempdb..#x') is not null drop table #x
CREATE TABLE #x (ITEM_CODE VARCHAR(10), PROD_DESC VARCHAR(20),
COLOR_PROD_PRICE DECIMAL, POPULAR_PRODUCTS_IMAGE_PATHS VARCHAR(200))
INSERT INTO #X(ITEM_CODE,PROD_DESC,COLOR_PROD_PRICE,POPULAR_PRODUCTS_IMAGE_PATHS) VALUES
('P0001', 'Axe Brand', 88.000, 'some_path_to_img1.jpg'),
('P0001', 'Axe Brand', 88.000, 'some_path_to_img2.jpg'),
('P0002', 'Almond Nuts', 499.000, 'some_path_to_img1.jpg'),
('P0002', 'Almond Nuts', 499.000, 'some_path_to_img2.jpg')
query - just change #x to your table and it should work
;WITH my_cte as
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY ITEM_CODE ORDER BY POPULAR_PRODUCTS_IMAGE_PATHS) AS 'track_row'
FROM #x
)
SELECT a.ITEM_CODE, a.PROD_DESC, a.COLOR_PROD_PRICE,
a.POPULAR_PRODUCTS_IMAGE_PATHS + ' ' + b.POPULAR_PRODUCTS_IMAGE_PATHS AS 'POPULAR_PRODUCTS_IMAGE_PATHS'
FROM my_cte AS a
INNER JOIN
my_cte AS b ON a.ITEM_CODE=b.ITEM_CODE
WHERE a.track_row=1 AND b.track_row=2
output
ITEM_CODE  PROD_DESC    COLOR_PROD_PRICE  POPULAR_PRODUCTS_IMAGE_PATHS
P0001      Axe Brand    88                some_path_to_img1.jpg some_path_to_img2.jpg
P0002      Almond Nuts  499               some_path_to_img1.jpg some_path_to_img2.jpg

'LAG' function not working in Amazon Redshift

I'm trying to find out the retention rate by using the following query on Amazon Redshift:
WITH t AS (
SELECT ga.ownerid,
DATE_PART('month',ga.creationtime) AS month,
COUNT(*) AS item_transactions,
LAG(DATE_PART('month',ga.creationtime)) OVER (PARTITION BY ownerid ORDER BY DATE_PART('month',ga.creationtime)) = DATE_PART('month',ga.creationtime) -interval '1 month' OR NULL AS repeat_transaction
FROM flx2.groupactivities ga
JOIN auth.members m ON ga.ownerid = m.id
WHERE ga.activitytype = 'assign'
AND ga.groupid NOT IN (SELECT groupid
FROM (SELECT groupid,
COUNT(DISTINCT memberid)
FROM flx2.grouphasmembers
GROUP BY groupid
HAVING COUNT(DISTINCT memberid) = 1))
AND ga.ownerid IN (SELECT memberid FROM auth.memberhasroles WHERE roleid = 5)
AND ga.ownerid NOT IN (SELECT memberid FROM auth.memberhasroles WHERE roleid = 25)
GROUP BY ga.ownerid,
DATE_TRUNC('month',ga.creationtime)
ORDER BY ga.ownerid,
DATE_TRUNC('month',ga.creationtime)
)
SELECT month
,sum(item_transactions) AS num_trans
,count(*) AS num_buyers
,count(repeat_transaction) AS repeat_buyers
,round(
CASE WHEN sum(item_transactions) > 0
THEN count(repeat_transaction) / sum(item_transactions) * 100
ELSE 0
END, 2) AS buyer_retention
FROM t
GROUP BY 1
ORDER BY 1;
but it gives me the following error:
An error occurred when executing the SQL command:
WITH t AS (
SELECT ga.ownerid,
DATE_TRUNC('month',ga.creationtime) AS month,
COUNT(*) AS item_transactions,
LAG(DATE_TRUNC('month...
[Amazon](500310) Invalid operation: ORDER/GROUP BY expression not found in targetlist;
Execution time: 0.29s
1 statement failed.
I believe there's something wrong with the LAG function here, but I'm not quite sure. I got this query from the post here and I modified it according to my requirements.
Would someone please be able to help me out with what's going wrong here?
I appreciate the help in advance.
From a quick look: LAG by itself is not an aggregate function, so repeat_transaction would need to be included in the GROUP BY.
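One way to avoid mixing GROUP BY and LAG in the same SELECT, offered here as a sketch rather than a verified fix for the Redshift query, is to aggregate per owner and month in a CTE first and apply LAG to the aggregated rows afterwards. The example runs against SQLite (3.25 or later assumed) through Python's sqlite3 module. The activity table and its integer month column are hypothetical simplifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified activity log: one row per transaction.
conn.execute("CREATE TABLE activity (ownerid INTEGER, month INTEGER)")
conn.executemany(
    "INSERT INTO activity VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 2), (1, 4), (2, 3), (2, 4)],
)

# Step 1 (CTE): aggregate to one row per owner and month.
# Step 2 (outer query): LAG runs over the aggregated rows, so every
# expression it uses is an ordinary column of the CTE.
query = """
WITH monthly AS (
    SELECT ownerid, month, COUNT(*) AS item_transactions
    FROM activity
    GROUP BY ownerid, month
)
SELECT ownerid,
       month,
       item_transactions,
       (LAG(month) OVER (PARTITION BY ownerid ORDER BY month) = month - 1)
           AS repeat_transaction
FROM monthly
ORDER BY ownerid, month
"""
retention_rows = conn.execute(query).fetchall()
for row in retention_rows:
    print(row)
```

In this sketch repeat_transaction is NULL for an owner's first month, 1 when the previous active month is the one immediately before, and 0 otherwise.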

How to use an ALIAS in a PostgreSQL ORDER BY clause?

I have the following query:
SELECT
title,
(stock_one + stock_two) AS global_stock
FROM
product
ORDER BY
global_stock = 0,
title;
Running it in PostgreSQL 8.1.23, I get this error:
Query failed: ERROR: column "global_stock" does not exist
Can anybody help me get it to work? I need the available items first, followed by the unavailable items. Many thanks!
You can always ORDER BY this way:
select
title,
( stock_one + stock_two ) as global_stock
from product
order by 2, 1
or wrap it in another SELECT:
SELECT *
from
(
select
title,
( stock_one + stock_two ) as global_stock
from product
) x
order by (case when global_stock = 0 then 1 else 0 end), title
One solution is to use the position:
select title,
( stock_one + stock_two ) as global_stock
from product
order by 2, 1
However, the alias should work, but not necessarily the expression. What do you mean by "global_stock = 0"? Do you mean the following:
select title,
( stock_one + stock_two ) as global_stock
from product
order by (case when global_stock = 0 then 1 else 0 end) desc, title
In case anyone finds this when googling for whether you can just ORDER BY my_alias: Yes, you can. This cost me a couple hours.
As the postgres docs state:
The ordinal number refers to the ordinal (left-to-right) position of the output column. This feature makes it possible to define an ordering on the basis of a column that does not have a unique name. This is never absolutely necessary because it is always possible to assign a name to an output column using the AS clause.
So either this has been fixed since, or this question is specifically about the ORDER BY my_alias = 0, other_column syntax which I didn't actually need.
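The wrap-in-a-subquery approach suggested earlier in this thread can be sketched as follows, with available items first as the question asks. It runs against SQLite through Python's sqlite3 module rather than PostgreSQL. SQLite is more permissive about aliases inside ORDER BY expressions, so the example only illustrates the portable form, and the sample data is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product (title TEXT, stock_one INTEGER, stock_two INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [("Widget", 0, 0), ("Anvil", 2, 1), ("Gadget", 0, 5), ("Bolt", 0, 0)],
)

# Inside the subquery global_stock is only an alias; in the outer query it
# is a real column, so it can be used freely in an ORDER BY expression.
query = """
SELECT title, global_stock
FROM (
    SELECT title, stock_one + stock_two AS global_stock
    FROM product
) AS x
ORDER BY CASE WHEN global_stock = 0 THEN 1 ELSE 0 END, title
"""
ordered_products = conn.execute(query).fetchall()
for row in ordered_products:
    print(row)
```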