ABL Progress 4GL: FOR EACH with COUNT in an OUTPUT STREAM

Progress-Procedure-Editor:
DEFINE STREAM myStream.
OUTPUT STREAM myStream TO 'C:\Temp\BelegAusgangSchnittstelle.txt'.
FOR EACH E_BelegAusgang
WHERE E_BelegAusgang.Firma = '000'
AND E_BelegAusgang.Schnittstelle = '$Standard'
NO-LOCK:
PUT STREAM myStream UNFORMATTED
STRING(E_BelegAusgang.Firma)
'|'
STRING(E_BelegAusgang.BelegNummer)
'|'
STRING(E_BelegAusgang.Schnittstelle)
'|'
SKIP
.
END.
I get this (excerpt):
Firma | BelegNr | Schnittstelle
000 | 3 | $Standard
000 | 3 | $Standard
000 | 3 | $Standard
000 | 3 | $Standard
000 | 3 | $Standard
000 | 8 | $Standard
000 | 8 | $Standard
What I need is to COUNT the occurrences of each BelegNr, so I import the data from the TXT file into SQL Server.
On Server my query is:
SELECT [BelegNr]
,COUNT(*) AS [Anzahl]
FROM [TestDB].[dbo].[Beleg_Ausgang]
GROUP BY [BelegNr]
ORDER BY [Anzahl]
With that query I get (excerpt):
BelegNr | Anzahl
3 | 5
8 | 2
Is there a way to put the COUNT directly into the Progress code? In other words, I want the result directly from the Progress Procedure Editor.

In ABL you use BREAK BY instead of GROUP BY. One limitation is that BREAK BY both groups AND sorts.
You could, for instance, do the counting in a separate FOR EACH:
DEFINE VARIABLE iCount AS INTEGER NO-UNDO.

FOR EACH E_BelegAusgang NO-LOCK
    WHERE E_BelegAusgang.Firma = '000'
      AND E_BelegAusgang.Schnittstelle = '$Standard'
    BREAK BY E_BelegAusgang.BelegNummer:

    iCount = iCount + 1.

    /* last record of this BelegNummer group: show the count and reset */
    IF LAST-OF(E_BelegAusgang.BelegNummer) THEN DO:
        DISPLAY E_BelegAusgang.BelegNummer iCount.
        iCount = 0.
    END.
END.
You could also incorporate that logic into the export itself, but note that the BREAK BY will change the order of the file rows. Maybe that's a problem for you, maybe not! A sketch of that variant follows.
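For illustration only, a rough sketch based on the code above (assuming the same table and field names, not tested against your database) of what merging the count into the export could look like; as noted, the rows will come out ordered by BelegNummer:

DEFINE STREAM myStream.
DEFINE VARIABLE iCount AS INTEGER NO-UNDO.

OUTPUT STREAM myStream TO 'C:\Temp\BelegAusgangSchnittstelle.txt'.

FOR EACH E_BelegAusgang NO-LOCK
    WHERE E_BelegAusgang.Firma = '000'
      AND E_BelegAusgang.Schnittstelle = '$Standard'
    BREAK BY E_BelegAusgang.BelegNummer:

    iCount = iCount + 1.

    /* write one line per BelegNummer, with its count as the last column */
    IF LAST-OF(E_BelegAusgang.BelegNummer) THEN DO:
        PUT STREAM myStream UNFORMATTED
            STRING(E_BelegAusgang.Firma) '|'
            STRING(E_BelegAusgang.BelegNummer) '|'
            STRING(E_BelegAusgang.Schnittstelle) '|'
            STRING(iCount)
            SKIP.
        iCount = 0.
    END.
END.

OUTPUT STREAM myStream CLOSE.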

Related

Confusing matching behaviour of pandas extract(all)

I have a strange problem. First, some context: I want to match hierarchy-based strings against the values of a column in a pandas DataFrame and count the occurrences of each node including all of its children.
| index | hierarchystr          |
| ----- | --------------------- |
| 0     | level0level00level000 |
| 1     | level0level01         |
| 2     | level0level02level021 |
| 3     | level0level02level021 |
| 4     | level0level02level020 |
| 5     | level0level02level021 |
| 6     | level1level02level021 |
| 7     | level1level02level021 |
| 8     | level1level02level021 |
| 9     | level2level02level021 |
Assume there are 300k rows. Each node can have multiple children, which again have multiple children, and so on (represented here by the level0-2 strings). I have a separate hierarchy from which I extract the hierarchy strings. Now to the problem:
hstrs = ["level0", "level1", "level0level01", "level0level02", "level0level02level021"]  # node strings from the separate hierarchy
pat = "|".join(hstrs)
# each row gets the first alternative of pat that matches (or NaN)
s = df.hierarchystr.str.extract('(' + pat + ')', expand=True)[0]
df1 = df.groupby(s).size().reset_index(name='Count')
df1 = df1[df1['Count'] > 200]
size = len(df1)
The number of matched substrings with more than 200 occurrences differs on every RUN! "level0" should match every row whose hierarchy string contains level0 and should build a group with all of its sub-children, and that group size needs to be greater than 200.
Edit:// levelX is just an example; I have thousands of nodes with different names, each again with thousands of different sub-children. The hstrs strings do not include each other, apart from the parent nodes. (E.g. "parent1" is included in "parent1subchild1" and "parent1subchild2".)
I traced it back to a different order of the hierarchy strings in the array hstrs. So I changed the code to compare each substring individually:
matches = []
for hstr in hstrs:
    s = df.hierarchystr.str.extract('(' + hstr + ')', expand=True)
    # number of rows that matched this single node string
    n_matches = s.count().values[0]
    if n_matches > 200:
        matches.append(hstr)
This is slow as hell, but the result stays the same no matter which order hstrs is in. For efficiency: is it possible to do the same with only one regex matching group, all at once for all hstrs?
Edit://
expected output would be:
|index| 0 | Count |
|-----|---------------------|-------|
|0 |level0 | 5 |
|1 |level1 | 3 |
|2 |level0level01 | 1 |
|3 |level0level02 | 4 |
|4 |level0level02level021| 3 |
Edit2://
It has something to do with the ordering of hstrs, I think in combination with the extract method matching and stopping after the first alternative. If the ordering is different, the hierarchy strings in pat are matched differently, which results in different sizes for each group. A high hierarchy level (short string) listed first will be matched first, and the lower hierarchy levels in the same pat won't be matched anymore. But I don't know what to do against this behavior.
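(A sketch of one possible workaround, assuming the df and hstrs from above: escape the node strings and sort them longest-first before joining, so extract always prefers the most specific node. This makes the assignment deterministic, but each row still falls into exactly one group, so it does not by itself produce the overlapping parent counts from the expected output.)

import re
# sort longest-first so more specific hierarchy strings win in the alternation
pat = "|".join(sorted(map(re.escape, hstrs), key=len, reverse=True))
s = df.hierarchystr.str.extract('(' + pat + ')', expand=False)
df1 = df.groupby(s).size().reset_index(name='Count')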
Edit3://
An alternative would be the following, but it is also slow as hell:
beforeset = []
for hstr in hstrs:
    # rows whose hierarchy string contains this node string
    s = df[df.hierarchystr.str.contains(hstr)]
    if len(s) > 200:
        beforeset.append(hstr)
Edit4://
I think what I am searching for is a way to do a "group_by" with "contains" or "is in" over the hstrs. I am glad for every idea. :)
Edit5://
Found a simple but not entirely satisfying alternative (though faster than the previous tries):
from collections import Counter

# for every row, collect all node strings contained in its hierarchy string
containing = [item for hierarchystr in df.hierarchystr for item in hstrs if item in hierarchystr]
containing = Counter(containing)
df1 = pd.DataFrame([containing]).T
nodeNamesWithOver200 = df1[df1 > 200].dropna().index.values
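(Another sketch, staying close to the Edit5 idea but in pandas: count, for every node string, the rows whose hierarchystr contains it, which also gives the overlapping parent counts; the names counts and nodes_over_200 are made up here.)

import pandas as pd

# count, per node string, how many hierarchy strings contain it
counts = pd.Series(
    {h: df.hierarchystr.str.contains(h, regex=False).sum() for h in hstrs},
    name="Count",
)
nodes_over_200 = counts[counts > 200].index.to_list()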

Find referenced value of multiple columns

I have a table setpoints which contains three columns, base, effective and actual, each of which contains an id that refers to a row in the io table.
I would like to make a query that returns the io_value found in the io table for each id found in setpoints.
Currently my query returns the ids, and then I query the io table to find the io_value for each id.
Example: query returning the ids in each row
row # | base | effective | actual
1 | 24 | 30 | 40
2 | 25 | 31 | 41
3 | 26 | 32 | 42
But I want it to return the value instead of the id.
Example: returning the values for the ids instead
row # | base | effective | actual
1 | 2.3 | 4.5 | 3.44
2 | 4.2 | 7.7 | 4.41
3 | 3.9 | 8.12 | 5.42
Here are the table fields
IO
io_value
io_id
Setpoints
stpt_base
stpt_effective
stpt_actual
Using Postgres 9.5.
What I'm using now:
SELECT * FROM setpoints;

-- then, for each returned row:
SELECT io_id, io_value
FROM io
WHERE io_id IN (stpt_effective, stpt_actual, stpt_base);  -- ids from the previous query
You can solve this by joining the io table three times to the setpoints table, using one of the three columns in each individual JOIN:
SELECT a.io_value AS base,
b.io_value AS effective,
c.io_value AS actual
FROM setpoints s
JOIN io a ON a.io_id = s.stpt_base
JOIN io b ON b.io_id = s.stpt_effective
JOIN io c ON c.io_id = s.stpt_actual;
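A possible variation, not part of the original answer: if any of the three setpoint columns can be NULL or reference a missing io row, LEFT JOINs keep the setpoints row and return NULL for the missing value.

SELECT a.io_value AS base,
       b.io_value AS effective,
       c.io_value AS actual
FROM setpoints s
LEFT JOIN io a ON a.io_id = s.stpt_base
LEFT JOIN io b ON b.io_id = s.stpt_effective
LEFT JOIN io c ON c.io_id = s.stpt_actual;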

FormulaArray not averaging out all the specified entries

Table 1:
Columns G:K, header in row 1, data in rows 2-7:
| Row | G (Lane) | H (Bowler) | I (Score) | J (Score) | K (Score) |
|-----|----------|------------|-----------|-----------|-----------|
| 2   | Lane 1   | Thomas     | 100       | 100       | 100       |
| 3   | Lane 2   | column     | 200       | 200       | 100       |
| 4   | Lane 3   | Mary       | 300       | 300       | 100       |
| 5   | Lane 1   | Cool       | 150       | 400       | 100       |
| 6   | Lane 2   | right      | 160       | 500       | 100       |
| 7   | Lane 9   | Susan      | 170       | 600       | 100       |
Say I want to find the average for each lane that appears in Table 2 and put it in column O:
Table 2:
Columns N:O, header in row 1:
| Row | N (Lane) | O (Average) |
|-----|----------|-------------|
| 2   | Lane 1   |             |
| 3   | Lane 2   |             |
| 4   | Lane 3   |             |
I would put
=AVERAGE(IF(N2=$G$2:$G$7, $I$2:$K$7)) for Lane 1 (put this formula in cell "O2")
=AVERAGE(IF(N3=$G$2:$G$7, $I$2:$K$7)) for Lane 2 ("O3")
=AVERAGE(IF(N4=$G$2:$G$7, $I$2:$K$7)) for Lane 3 ("O4")
My first question is:
What if I want to find the average of ALL the lanes that appear in Table 2 together, i.e. the average of Lane 1, Lane 2 and Lane 3 combined (but not other lanes, such as Lane 9)?
My attempt:
= AVERAGE(IF(G2:G7 = N2:N4, I2:K7)), why doesn't this work?
My second question is:
I have done the "average of each individual lane" using VBA:
Dim i As Integer
For i = 2 To 4
Cells(i, 15).FormulaArray = "=AVERAGE(IF(RC[-1]=R2C7:R7C7,R2C9:R7C12))"
Next i
What if I had done it using VBA without the built-in formula method?
For Lane 1 only, pseudo code:
Loop x from 2 to 7                 ' rows G2 to G7
    If cell N2 = Gx Then           ' the lane in Gx matches Lane 1
        Sum = Sum + Ix + Jx + Kx
    End If
Average = Sum / totalEntries
Would this be slower than if I were to use the built-in .FormulaArray? Is there an advantage to doing it this way instead?
The answer to the first question, why this FormulaArray
= AVERAGE(IF(G2:G7 = N2:N4, I2:K7)) doesn't work,
is implicit in how this other FormulaArray works:
= AVERAGE( IF( $G$7:$G$12 = $N7, $I$7:$K$12 ) )
Let’s see how each part of this “single-cell formula array” works:
1st part: $G$7:$G$12 = $N7
The first part of the formula generates an array with the records from range $G$7:$G$12 complying with the condition = $N7. Fig. 1 shows the first part of the FormulaArray as a “multi-cell formula array”.
2nd Part: $I$7:$K$12
The result of the first part is applied to the second part to obtain the range of scores complying with the condition = $N7 (see Fig. 2)
3rd part: AVERAGE
Finally the last part of the formula calculates the average of the scores complying with the condition = $N7
Now let’s try to apply the same analysis to the formula:
= AVERAGE( IF( G2:G7 = N2:N4, I2:K7 ) )
Unfortunately, we cannot go beyond the first part, G2:G7 = N2:N4, as it fails trying to compare two arrays of different dimensions, thus resulting in #N/A (see Fig. 3).
However, even if the arrays had the same dimensions, the result would not show the duplicated values, as the members are compared one to one (see Fig. 4).
To obtain the average for Lanes 1 to 3 use this FormulaArray
=AVERAGE( IF(
( $G$7:$G$12 = $N7 ) + ( $G$7:$G$12 = $N8 ) + ( $G$7:$G$12 = $N9 ),
$I$7:$K$12 ) )
It generates an array with the records complying with the conditions = $N7 + = $N8 + = $N9 (the + is equivalent to the OR operator).
As regards the second question:
Performance is intrinsically associated with maintenance and efficiency.
The sample procedure just enters a formula which is hard-coded and only works for this particular case. For example:
If the formulas need to be changed to expand the ranges, the macro has to be updated; with plain worksheet formulas you may still have to change the formula, but there is no need to open the VBA editor.
If any of the columns before column G is deleted because it has become obsolete, the macro needs to be updated, while the formulas require no maintenance as they are adjusted automatically.
In reference to the macro without the .Formula method:
I find this redundant, as it is like writing an algorithm to do something that can already be done efficiently and accurately with an existing function; such a macro brings nothing that is not already there.
I would consider writing such a procedure only in a situation where the workbook is very large and uses so many resources that its performance suffers significantly. However, the advantage delivered by the procedure would not lie in just writing the formulas: it should calculate the results and enter the resulting values instead of the formulas, thus making the workbook light, fast and smooth for the end user. A rough sketch of that idea follows.
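As an illustration of that last point (a rough sketch, not part of the original answer), a procedure along these lines would compute each lane's average in VBA and write the plain values into column O, using the ranges from the question (N2:N4, G2:G7, I2:K7):

Sub WriteLaneAverages()
    ' Sketch: compute the average of all scores for each lane listed in N2:N4
    ' and write the resulting value (not a formula) into column O.
    Dim i As Long, r As Long
    Dim total As Double, n As Long
    For i = 2 To 4                              ' lanes in N2:N4
        total = 0: n = 0
        For r = 2 To 7                          ' data rows 2 to 7
            If Cells(r, "G").Value = Cells(i, "N").Value Then
                total = total + Application.Sum(Range(Cells(r, "I"), Cells(r, "K")))
                n = n + 3                       ' three score columns per matching row
            End If
        Next r
        If n > 0 Then Cells(i, "O").Value = total / n
    Next i
End Sub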
To get the average of them all, just use
=AVERAGE(I2:K7)
As to the VBA, since it is all done on the same rows, you could just use:
For i = 2 To 7
    ' average of the three scores on row i
    Cells(i, "O").Value = Application.Average(Range(Cells(i, "I"), Cells(i, "K")))
Next i

Multiples of a number oracle SQL

How do you check whether the content of a field is a multiple of 2 in Oracle SQL?
n_id | n_content
-------------------------
1 | Balloon
2 | Drill
3 | Cup
4 | Bottle
5 | Pencil
6 | Ball
I have tried:
select * from num_w where (n_id % 2 > 0);
and also
select * from num_w where n_id % 2 = 0;
Neither of these worked.
Wasn't this multiple of 5 a few seconds ago? :)
Anyway, in Oracle I believe the query would be:
select * from num_w where MOD(n_id,2) = 0;
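As a side note (not part of the original answer), the complementary check from the first attempt, i.e. rows that are not multiples of 2, would be:

select * from num_w where MOD(n_id, 2) <> 0;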

Convert multi-dimensional array to records

Given: {{1,"a"},{2,"b"},{3,"c"}}
Desired:
foo | bar
-----+------
1 | a
2 | b
3 | c
You can get the intended result with the following query; however, it'd be better to have something that scales with the size of the array.
SELECT arr[subscript][1] as foo, arr[subscript][2] as bar
FROM ( select generate_subscripts(arr,1) as subscript, arr
from (select '{{1,"a"},{2,"b"},{3,"c"}}'::text[][] as arr) input
) sub;
This works:
select key as foo, value as bar
from json_each_text(
json_object('{{1,"a"},{2,"b"},{3,"c"}}')
);
Result:
foo | bar
-----+------
1 | a
2 | b
3 | c
Docs
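A small variation, not in the original answer: if foo should come back as an integer rather than text, the key can simply be cast:

select key::int as foo, value as bar
from json_each_text(
  json_object('{{1,"a"},{2,"b"},{3,"c"}}')
)
order by foo;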
Not sure what exactly you mean by "it'd be better to have something that scales with the size of the array". Of course you cannot have extra columns added to the result set as the inner array size grows, because PostgreSQL must know the exact columns of a query before executing it (i.e. before it begins to read the string).
But I would like to propose converting the string into a normal relational representation of the matrix:
select i, j, arr[i][j] a_i_j from (
select i, generate_subscripts(arr,2) as j, arr from (
select generate_subscripts(arr,1) as i, arr
from (select ('{{1,"a",11},{2,"b",22},{3,"c",33},{4,"d",44}}'::text[][]) arr) input
) sub_i
) sub_j
Which gives:
i | j | a_i_j
--+---+------
1 | 1 | 1
1 | 2 | a
1 | 3 | 11
2 | 1 | 2
2 | 2 | b
2 | 3 | 22
3 | 1 | 3
3 | 2 | c
3 | 3 | 33
4 | 1 | 4
4 | 2 | d
4 | 3 | 44
Such a result may be quite usable for further data processing, I think.
Of course, such a query can only handle arrays with a predefined number of dimensions, but the array sizes in all of its dimensions can change without rewriting the query, so this is a somewhat more flexible approach.
ADDITION: Yes, using WITH RECURSIVE one can build a similar query capable of handling arrays with an arbitrary number of dimensions. Nonetheless, there is no way to overcome the limitation imposed by the relational data model: the exact set of columns must be defined at query parse time and cannot be deferred to execution time. So we are forced to store all indices in one column, using another array.
Here is the query that extracts all elements from an arbitrary multi-dimensional array along with their zero-based indices (stored in another, one-dimensional array):
with recursive extract_index(k, idx, elem, arr, n) as (
  -- anchor: every element with its zero-based flat position k, in unnest (row-major) order
  select (row_number() over()) - 1 k, idx, elem, arr, n from (
    select array[]::bigint[] idx, unnest(arr) elem, arr, array_ndims(arr) n
    from ( select '{{{1,"a"},{11,111}},{{2,"b"},{22,222}},{{3,"c"},{33,333}},{{4,"d"},{44,444}}}'::text[] arr ) input
  ) plain_indexed
  union all
  -- recursive step: peel off one dimension per iteration;
  -- the index within dimension n is k modulo that dimension's length
  select k / array_length(arr, n)::bigint k,
         array_prepend(k % array_length(arr, n), idx) idx,
         elem, arr, n - 1 n
  from extract_index
  where n != 1
)
select array_prepend(k, idx) idx, elem from extract_index where n = 1
Which gives:
idx | elem
--------+-----
{0,0,0} | 1
{0,0,1} | a
{0,1,0} | 11
{0,1,1} | 111
{1,0,0} | 2
{1,0,1} | b
{1,1,0} | 22
{1,1,1} | 222
{2,0,0} | 3
{2,0,1} | c
{2,1,0} | 33
{2,1,1} | 333
{3,0,0} | 4
{3,0,1} | d
{3,1,0} | 44
{3,1,1} | 444
Formally this seems to prove the concept, but I wonder what real practical use one could make of it :)