I have code that counts spaces before a string. See the sample table called TEST:
id  value
1   AB FEB EB
In this case I want to count the spaces in front of "EB", which should be 2, but my query returns 1 since it treats the "EB" inside "FEB" as a match. How do I make the query count only the spaces preceding the standalone "EB"?
Thanks!
select id
     , REGEXP_COUNT(
           SPLIT(
               TRIM(
                   REGEXP_REPLACE(value, '[^[:digit:]]', ' ')
               ), 'EB'
           )[0], ' '
       ) count_of_spaces
from TEST
If you split by space and then ask for the ARRAY_POSITION of 'EB', you will find the exact location of the first match:
select column1
,SPLIT(column1, ' ') as s
,ARRAY_POSITION('EB'::variant, s) as p
from values
('EB'),
('FEB EB'),
('AB FEB EB'),
('AB FEB EBX EB'),
('AB FEB EB EBX'),
('AB FEB FEB EB AB FEB FEB EB')
;
COLUMN1                      S                                                       P
EB                           [ "EB" ]                                                0
FEB EB                       [ "FEB", "EB" ]                                         1
AB FEB EB                    [ "AB", "FEB", "EB" ]                                   2
AB FEB EBX EB                [ "AB", "FEB", "EBX", "EB" ]                            3
AB FEB EB EBX                [ "AB", "FEB", "EB", "EBX" ]                            2
AB FEB FEB EB AB FEB FEB EB  [ "AB", "FEB", "FEB", "EB", "AB", "FEB", "FEB", "EB" ]  3
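Because SPLIT on a single space makes each array element correspond to one word, that 0-based position is exactly the number of spaces in front of the first standalone 'EB'. Applied to the TEST table from the question, a minimal sketch (untested, and assuming the words in value are separated by single spaces with no leading spaces) would be:

select id
     , ARRAY_POSITION('EB'::variant, SPLIT(value, ' ')) as count_of_spaces
from TEST;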
I need to create a condition which separates the data by decade. The first column is the year value (going back to year 0). How do I change the condition within the awk query?
0 Jan 10 2:04:40 Tot D
0 Jul 05 11:33:06 Tot A
3 May 04 22:22:05 Tot A
3 Oct 29 1:32:40 Tot D
7 Feb 20 23:03:27 Tot A
7 Aug 17 5:58:18 Tot D
10 Dec 10 6:28:52 Tot A
11 Jun 04 15:36:12 Tot D
14 Apr 04 4:41:23 Tot D
14 Sep 27 7:18:39 Tot A
18 Jan 20 10:38:27 Tot D
18 Jul 16 18:04:17 Tot A
21 May 15 5:47:44 Tot A
21 Nov 08 9:27:47 Tot D
22 May 04 23:00:32 Tot A
25 Mar 03 6:19:48 Tot A
25 Aug 27 13:47:51 Tot D
28 Dec 20 15:07:37 Tot A
29 Jun 14 22:37:10 Tot D
32 Apr 14 11:56:36 Tot D
32 Oct 07 15:38:15 Tot A
36 Jan 31 19:07:10 Tot D
36 Jul 27 0:39:47 Tot A
39 May 26 13:13:25 Tot A
39 Nov 19 17:26:37 Tot D
40 May 15 6:26:43 Tot A
I need to present the data as follows:
awk '{if ($1 >= 0 && $1 < 10) print }' All_Lunar_Eclipse.txt
0 Jan 10 2:04:40 Tot D
0 Jul 05 11:33:06 Tot A
3 May 04 22:22:05 Tot A
3 Oct 29 1:32:40 Tot D
7 Feb 20 23:03:27 Tot A
7 Aug 17 5:58:18 Tot D
But that way I would have to do it manually for every 10-year range.
awk '{if ($1 >= 10 && $1 < 20) print }' All_Lunar_Eclipse.txt
10 Dec 10 6:28:52 Tot A
11 Jun 04 15:36:12 Tot D
14 Apr 04 4:41:23 Tot D
14 Sep 27 7:18:39 Tot A
18 Jan 20 10:38:27 Tot D
18 Jul 16 18:04:17 Tot A
I have tried something similar to the following with no joy.
awk 'BEGIN { for (i = 0; i <= 2019; i += 10) print i }'
$ awk '
int(p/10)!=int($1/10) {
print "New decade begins:"
}
{ p=$1 }
1' file
0 Jan 10 2:04:40 Tot D
0 Jul 05 11:33:06 Tot A
3 May 04 22:22:05 Tot A
3 Oct 29 1:32:40 Tot D
7 Feb 20 23:03:27 Tot A
7 Aug 17 5:58:18 Tot D
New decade begins:
10 Dec 10 6:28:52 Tot A
11 Jun 04 15:36:12 Tot D
...
This depends on your definition of a decade (if ($1 >= 10 && $1 < 20)). I would have assumed that years 1-10 are the first decade, 11-20 the second, etc. I did not check, though. It would have made the arithmetic one step harder, too.
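For that alternative grouping (years 1-10, 11-20, ...), the extra step would presumably just be shifting the year by one before dividing, e.g. (a sketch, untested, not from the answer above):

$ awk '
int((p-1)/10) != int(($1-1)/10) {
    print "New decade begins:"
}
{ p = $1 }
1' file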
It depends on what you want, but you can take the decade from the first field by dividing by 10 and keeping the integer part:
awk '
    # decade of the current line: integer part of (year / 10)
    { Decade = int( $1 / 10 ) }
    # collect each line under its decade (unsorted, just stored per decade)
    { Data[ Decade] = Data[Decade] "\n" $0 }
    # print one block per decade found
    END { for ( Dec in Data ) printf "--- Decade: %d ----\n%s\n", Dec, Data[ Dec] }
' YourFile
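For the sample file above, this prints one block per decade, along these lines (a sketch; note that for (Dec in Data) iterates in no guaranteed order in awk, so the blocks may not come out sorted, and each block starts with a blank line because of the leading \n):

--- Decade: 0 ----

0 Jan 10 2:04:40 Tot D
0 Jul 05 11:33:06 Tot A
3 May 04 22:22:05 Tot A
3 Oct 29 1:32:40 Tot D
7 Feb 20 23:03:27 Tot A
7 Aug 17 5:58:18 Tot D
--- Decade: 1 ----

10 Dec 10 6:28:52 Tot A
11 Jun 04 15:36:12 Tot D
...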
I am working on a HANA table and I am trying to delete rows from it if they contain a value from a list.
A B
22 01
22 01
22 02
22 06
23 01
23 01
23 06
I would like to drop some values from this table and end up with this:
A B
22 01
22 01
22 06
23 01
23 01
23 06
Basically, I would like to do a count and check whether, for a given value of A, column B contains both 01 and 02; if it does, drop the 02 rows, and if it contains only 01, leave it as it is.
I have not managed to do this with any SQL script I have tried.
SELECT BP, COUNT(*) AS SO FROM "EH"."BP_CUST" GROUP BY BP;
This script gets the count for each group and puts it in the SO column.
After that, maybe do an IF statement on the SO column and delete if the B field contains both 01 and 02?
I tried doing an IF statement and then a SELECT, and I could not get that to work either.
A B
22 01
22 01
22 02
22 06
23 01
23 01
23 06
24 02
Becomes
A B
22 01
22 01
22 06
23 01
23 01
23 06
24 02
If I understand correctly, you want:
select c.*
from "EH"."BP_CUST" c
where c.b <> '02' or
      not exists (select 1
                  from "EH"."BP_CUST" c2
                  where c2.a = c.a and c2.b = '01'
                 );
Your question says "delete". But I think the intention is to select "02" rows only when there is no "01" row for the same a (and all other rows).
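If an actual delete is what you need, a sketch of the equivalent statement (untested) would remove the '02' rows only when the same A also has a '01' row:

DELETE FROM "EH"."BP_CUST"
WHERE B = '02'
  AND A IN (SELECT A FROM "EH"."BP_CUST" WHERE B = '01');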
If I understood correctly, this might be the solution:
DELETE FROM BP_CUST
WHERE A IN
(
    SELECT
        BP_CUST.A
    FROM
    (
        SELECT
            A
            , COUNT(CASE WHEN B != '02' THEN 1 ELSE NULL END) AS NOT_02
            , COUNT(CASE WHEN B = '02' THEN 1 ELSE NULL END) AS IS_02
        FROM BP_CUST
        GROUP BY A
    ) AS t_delete
    JOIN BP_CUST ON BP_CUST.A = t_delete.A
    WHERE B = '02' AND NOT_02 > 0 AND IS_02 > 0
)
AND B = '02'
How can I make static columns/rows in a crosstab? See the example below; can I have fixed jan, feb, mar, ... columns instead of generating them dynamically?
location jan feb mar apr may jun jul aug sep oct nov dec
london 500 62 200 50 0 60 100 46 89 200 150 210
paris 50 26 20 500 50 70 40 200 0 40 250 50
I want the columns (jan, feb, mar, apr, ...) to always show up regardless of whether their measures are zero or have values, as if they were fixed.
Here is the query I'm using:
select sum("AMOUNT"), "REQUESTDATE","description"
from(
SELECT SUM(e.AMOUNT)"AMOUNT",TO_CHAR(REQUESTDATE,'MM')"REQUESTDATE", CA.DESCR "description"
FROM PC_PAYMENTTRXNLOG PC,GLB_TYPE ca, PC_ESERVICEINQUIRY e
where PC.ESERVICE_ID = E.ID
AND trunc(REQUESTDATE) between trunc(to_date('2012-01-01','yyyy-mm-dd')) and trunc(to_date('2012-06-30','yyyy-mm-dd'))
GROUP BY TO_CHAR(REQUESTDATE,'MM'),CA.DESCR
)
group by "REQUESTDATE","description"
and the output is:
SUM("amount") Requestdate Description
2550405 04 A
2550405 04 B
23893281 05 C
614977 06 A
614977 06 E
2550405 04 C
Now, after updating the query to:
select sum("AMOUNT"), month,"description"
from(
SELECT SUM(e.AMOUNT)"AMOUNT",TO_CHAR(REQUESTDATE,'MM')"REQUESTDATE", CA.DESCR "description"
FROM PC_PAYMENTTRXNLOG PC,GLB_TYPE ca, PC_ESERVICEINQUIRY e
where PC.ESERVICE_ID = E.ID
AND trunc(REQUESTDATE) between trunc(to_date('2012-01-01','yyyy-mm-dd')) and trunc(to_date('2012-06-30','yyyy-mm-dd'))
GROUP BY TO_CHAR(REQUESTDATE,'MM'),CA.DESCR
)
full outer join (select to_char(date '1970-01-01'
+ numtoyminterval(level - 1, 'month'), 'mm') as month
from dual
connect by level <= 12) on month="REQUESTDATE"
group by month,"description"
When I run the query, it displays all the months regardless of whether their measures are zero or have values.
But now the output is like this:
location jan feb mar apr may jun jul aug sep oct nov dec
london 500 62 200 50 0 60 100 46 89 200 150 210
paris 50 26 20 500 50 70 40 200 0 40 250 50
null 0 0 0 0 0 0 0 0 0 0 0 0
How can I restrict/hide the last null row?
I have not tested it, but try something like this:
select sum("AMOUNT"), month,"description"
from(SELECT SUM(e.AMOUNT)"AMOUNT",TO_CHAR(REQUESTDATE,'MM')"REQUESTDATE", CA.DESCR "description"
FROM PC_PAYMENTTRXNLOG PC,GLB_TYPE ca, PC_ESERVICEINQUIRY e
where PC.ESERVICE_ID = E.ID
AND trunc(REQUESTDATE) between trunc(to_date('2012-01-01','yyyy-mm-dd')) and trunc(to_date('2012-06-30','yyyy-mm-dd'))
GROUP BY TO_CHAR(REQUESTDATE,'MM'),CA.DESCR
)
full outer join (select to_char(date '1970-01-01'
+ numtoyminterval(level - 1, 'month'), 'mm') as month
from dual
connect by level <= 12) on month="REQUESTDATE"
group by month,"description"
click here for SQL Fiddle demo to generate 1 to 12 in Oracle
Once you have generated this, full outer join your main query with this series query and take month from the series query, as I did in the main query.
Using this query, you will get data for all months, with null values in the measure.
For the Description column, set the iReport properties isRemoveLineWhenBlank and isBlankWhenNull to true. This will stop the null value from being printed in iReport.
For the measure, use a Print When Expression that returns false whenever the description is null. This will prevent the value 0 from being printed in iReport.
I have data that looks like:
df = pd.DataFrame( np.random.randn(140,13),columns=['Year', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
df['Year']=np.arange(1876,2016)
df.head()
Out[54]:
Year Jan Feb Mar Apr May Jun Jul \
0 1876 -1.375433 0.115271 0.160305 0.962201 -1.170467 -0.312078 -1.046972
1 1877 -0.341183 -2.369659 -0.301529 1.268756 0.291787 -0.433796 1.846660
2 1878 0.015547 -1.248171 -0.961130 -2.473062 -1.227789 -0.291215 -0.552831
3 1879 -1.643790 0.238561 1.120954 0.273184 -2.255050 0.189526 -0.528215
4 1880 1.800950 0.900657 -1.785493 -0.505400 -0.909594 0.829114 0.310907
Aug Sep Oct Nov Dec
0 -0.540807 1.041048 -0.392727 0.526774 0.482579
1 0.087704 1.520229 0.008850 -0.052644 1.255057
2 0.475701 -0.402313 0.860482 -1.331818 1.248075
3 1.746745 -0.362812 -0.357801 -1.649273 -0.884970
4 1.064974 -2.636122 0.300357 0.523165 1.047123
I want to transform it into a single column of data with the index being year-month. I tried to stack my original data, but it becomes a series in which the year is mixed in with my values:
df=df.stack()
df
Out[60]:
0 Year 1876.000000
Jan -1.375433
Feb 0.115271
Mar 0.160305
Apr 0.962201
May -1.170467
Jun -0.312078
Jul -1.046972
Aug -0.540807
Sep 1.041048
Oct -0.392727
Nov 0.526774
Dec 0.482579
1 Year 1877.000000
Jan -0.341183
...
What I really want looks like:
result=pd.DataFrame(data=np.random.randn(10,1),columns=['values'],index=pd.date_range('1876/1/1',periods=10,freq='BM'))
result.head()
Out[58]:
values
1876-01-31 0.593254
1876-02-29 0.777550
1876-03-31 -1.777443
1876-04-28 -0.880476
1876-05-31 -1.698800
set_index to Year first, and then stack.
# data
# =====================
Year Jan Feb Mar Apr ... Aug Sep Oct Nov Dec
0 1876 1.8309 0.6724 0.6230 0.3548 ... 0.6316 0.7837 -0.0132 -0.3274 -0.0795
1 1877 1.1363 -2.5042 1.8929 -0.2806 ... 2.0662 0.5430 -0.2887 1.2593 0.6788
2 1878 -0.4730 -1.3182 1.2255 1.1420 ... -0.3064 -1.0505 0.8774 -0.7551 1.0743
3 1879 -0.6651 -0.1462 0.5634 1.7074 ... 0.1588 0.8856 -2.9899 -0.2085 0.3358
4 1880 -0.1305 1.2971 -0.6043 -1.1446 ... 0.7274 -0.8798 0.0978 -0.7801 -1.7695
5 1881 0.0165 -0.6090 -0.2994 -0.5597 ... -1.3628 0.6206 1.4357 1.1800 -1.8132
6 1882 -0.3365 -0.0699 -1.2027 -0.4825 ... -0.3016 1.7806 0.9992 -1.4172 0.4250
7 1883 0.7963 -1.1474 0.8532 -0.9619 ... -0.8057 -1.0750 -0.5305 0.3533 -0.0818
.. ... ... ... ... ... ... ... ... ... ... ...
132 2008 -0.0440 -2.2967 -1.0145 0.1504 ... -0.4940 0.2150 0.2712 0.5997 0.2958
133 2009 -0.2410 -0.6169 1.1429 0.1749 ... 0.8128 0.9391 1.1312 -0.0915 1.1761
134 2010 0.8155 0.3567 1.1648 0.7068 ... -0.8204 -0.3549 1.5648 -0.2102 1.6549
135 2011 0.4847 -0.4535 0.5300 -0.8678 ... -0.2837 0.8821 1.1700 0.0899 -0.5830
136 2012 0.1835 0.9730 -0.7666 -1.0301 ... 0.3203 -0.2747 -1.8450 0.0942 0.2149
137 2013 0.2517 0.8293 1.9907 -1.0461 ... -0.3113 0.7177 0.8896 0.2329 2.0546
138 2014 -1.6106 -1.3285 -0.1870 0.2511 ... -0.3264 1.3578 1.5639 -1.3799 -1.1196
139 2015 -2.0050 0.3680 -0.5553 -0.6471 ... 0.6217 -0.0965 1.3019 -1.0420 -1.3107
[140 rows x 13 columns]
# processing
# =================================
df.set_index('Year').stack()
Year
1876 Jan 1.8309
Feb 0.6724
Mar 0.6230
Apr 0.3548
May 1.4329
Jun -0.3263
Jul 1.7276
Aug 0.6316
...
2015 May -0.5075
Jun -1.4982
Jul -1.9434
Aug 0.6217
Sep -0.0965
Oct 1.3019
Nov -1.0420
Dec -1.3107
dtype: float64
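If you also want the year and month fused into a single date-like index, as in the desired output above, the two index levels can be combined after stacking. A minimal sketch (untested; the column name 'values' and the month-start dates are my own choices, not from the answer):

import numpy as np
import pandas as pd

# Sample frame rebuilt from the question.
df = pd.DataFrame(np.random.randn(140, 13),
                  columns=['Year', 'Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                           'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'])
df['Year'] = np.arange(1876, 2016)

# Stack as in the answer: the index becomes (Year, month-abbreviation) pairs.
s = df.set_index('Year').stack()

# Fuse the two index levels into one datetime index (first day of each month).
s.index = pd.to_datetime([f'{year}-{month}' for year, month in s.index],
                         format='%Y-%b')
s.name = 'values'

print(s.head())
# s.to_period('M') would give a monthly PeriodIndex instead, if preferred.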
I have this query
With
NoOfOrder as
(
SELECT Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,Jan,Feb,Mar
FROM
(
select LEFT(datename(month,InvoiceDate),3) mon,InvoiceNo as InvoiceNo
from tbl_InvoiceMain ,tbl_OrderMain,tbl_CompanyMaster
where tbl_InvoiceMain.OrderID = tbl_OrderMain.OrderID
and (CAST(tbl_InvoiceMain.InvoiceDate AS date) BETWEEN tbl_CompanyMaster.YearStart AND tbl_CompanyMaster.YearEnd)
) P
PIVOT (count(InvoiceNo)for mon in (Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec)) PV
),
OnTime as
(
SELECT Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec,Jan,Feb,Mar
FROM
(
select LEFT(datename(month,InvoiceDate),3) mon,InvoiceNo as InvoiceNo
from tbl_InvoiceMain ,tbl_OrderMain,tbl_CompanyMaster
where tbl_InvoiceMain.OrderID = tbl_OrderMain.OrderID
and (CAST(tbl_InvoiceMain.InvoiceDate AS date) BETWEEN tbl_CompanyMaster.YearStart AND tbl_CompanyMaster.YearEnd)
and CAST(tbl_InvoiceMain.InvoiceDate AS date) <= CAST(tbl_OrderMain.ScheduledDispatchDate AS date)
) P
PIVOT (count(InvoiceNo)for mon in (Jan,Feb,Mar,Apr,May,Jun,Jul,Aug,Sep,Oct,Nov,Dec)) PV
)
Select * From NoOfOrder
union all
Select * From OnTime
It gives this result:
Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb Mar
18 35 39 52 32 47 47 22 14 0 0 0
9 10 16 22 6 11 19 10 5 0 0 0
Here is my expected result
Apr May Jun Jul Aug Sep Oct Nov Dec Jan Feb Mar
NoOfOrder 18 35 39 52 32 47 47 22 14 0 0 0
OnTimeDelivered 9 10 16 22 6 11 19 10 5 0 0 0
DeliverPerformance% 50.00 28.57 41.03 42.31 18.75 23.40 40.43 45.45 35.71 0.00 0.00 0.00
The formula for DeliverPerformance is:
DeliverPerformance% = (OnTimeDelivered/NoOfOrder) X 100
How do I achieve this result, with the DeliverPerformance% values on the next row?
My immediate suggestion is to combine everything first prior to pivoting the results.
Your first query might look like this:
SELECT
LEFT(datename(month, InvoiceDate), 3) InvMon,
SUM(1) AS NoOfOrder,
SUM(CASE WHEN CAST(tbl_InvoiceMain.InvoiceDate AS date) <= CAST(tbl_OrderMain.ScheduledDispatchDate AS date) THEN 1 ELSE 0 END) OnTimeDelivered
FROM tbl_InvoiceMain, tbl_OrderMain, tbl_CompanyMaster
WHERE tbl_InvoiceMain.OrderID = tbl_OrderMain.OrderID
AND (CAST(tbl_InvoiceMain.InvoiceDate AS date) BETWEEN tbl_CompanyMaster.YearStart AND tbl_CompanyMaster.YearEnd)
GROUP BY LEFT(datename(month, InvoiceDate), 3)
Note that I'm always counting every invoice record and optionally counting the "on time" invoices with that CASE statement and the respective SUM functions.
My next thought is to put that query in a CTE and then the statement that uses that CTE will do the additional calculation like so:
SELECT InvMon, NoOfOrder, OnTimeDelivered, ((OnTimeDelivered / NoOfOrder) * 100) DeliverPerformance ...
And finally, that's what I pivot.
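One caveat worth adding (my note, not part of the answer): with integer counts, OnTimeDelivered / NoOfOrder is integer division in SQL Server and would mostly come out as 0. A sketch of that calculation step (untested), assuming the grouped query above is wrapped in a CTE named MonthlyCounts:

WITH MonthlyCounts AS (
    SELECT
        LEFT(datename(month, InvoiceDate), 3) AS InvMon,
        SUM(1) AS NoOfOrder,
        SUM(CASE WHEN CAST(tbl_InvoiceMain.InvoiceDate AS date) <= CAST(tbl_OrderMain.ScheduledDispatchDate AS date)
                 THEN 1 ELSE 0 END) AS OnTimeDelivered
    FROM tbl_InvoiceMain, tbl_OrderMain, tbl_CompanyMaster
    WHERE tbl_InvoiceMain.OrderID = tbl_OrderMain.OrderID
      AND CAST(tbl_InvoiceMain.InvoiceDate AS date) BETWEEN tbl_CompanyMaster.YearStart AND tbl_CompanyMaster.YearEnd
    GROUP BY LEFT(datename(month, InvoiceDate), 3)
)
SELECT
    InvMon,
    NoOfOrder,
    OnTimeDelivered,
    -- 100.0 forces decimal arithmetic; NULLIF avoids divide-by-zero for empty months
    CAST(OnTimeDelivered * 100.0 / NULLIF(NoOfOrder, 0) AS decimal(5, 2)) AS DeliverPerformance
FROM MonthlyCounts;

The pivoting into the three-row layout would then be done on top of this, as the answer describes.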