Delimiter to separate long string - sql

In AX Dynamics 365, GL financial dimensions are stored in the following manner.
41110-GTC-R-West-JD-WJED014-R0101-1410-WJED014-SAL--
11410------R0102-----
I want to use a delimiter function with '-' to separate the dimensions. Can we create a function to separate all of the dimensions?
I need the output as follows:
11411 GTC R West JD WJED014 R0101 1410 WJED014 SAL
11410 - - - - - R0102 - - -
Thanks in advance
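Purely to illustrate the split being asked for (this is not the requested SQL function), here is a rough pandas sketch; the column name DisplayValue is an assumption:

import pandas as pd

# Hypothetical sample of the stored dimension strings (the column name is made up).
df = pd.DataFrame({"DisplayValue": [
    "41110-GTC-R-West-JD-WJED014-R0101-1410-WJED014-SAL--",
    "11410------R0102-----",
]})

# Split on '-' into one column per segment; show empty segments as '-'
# to mirror the output format described in the question.
dims = df["DisplayValue"].str.split("-", expand=True).replace("", "-")
print(dims.to_string(index=False, header=False))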

Related

Extract dimensions from a pandas column excluding year

I have an SKU master description and I want to extract just the dimensional numbers (volume / weight related) and exclude all other character & figure combinations that are not relevant, such as "year", "N5", "number of products bought".
This is my code so far
import re

def find_number(text):
    num = re.findall(r'[^N\d+](\d+)(g|ml|mm|m|cm|dm)?', text)
    return num

df['number'] = df['SKU Master Description'].apply(lambda x: find_number(x))
SKU Master Description                                                    numbers
LA MOUSSE 150ml                                                           150ml
BLEU DE CHANEL PARFUM SPRAY 100ml / PARFUM POUR HOMME                     100ml
N5 EAU DE PARFUM SPRAY 100ml                                              100ml
FOLD.MEDIUM S.BLACK GIFT BOX 2016 / FOLDABLE/SIZE 222.2x222.2x100.3mm     222.2x222.2x100.3mm
BLACK RIBBON 15mm ROLL 100m                                               15mm, 100m
12 PAPER BAGS SMALL SIZE / 140x50x120mm                                   140x50x120mm
The following regex matches your targeted parts and units.
\b\d[\dx.]*(?:ml|[cdm]?m)\b
See this demo at regex101
\b matches a word boundary
\d is shorthand for a digit
(?: ) is a non-capturing group, used here with alternation
[ ] is a character class and matches one character from those listed
* repeats any number of times, ? means zero or one (optional)
The pattern above is not highly accurate but should get the job done.
More specific: \b\d+(?:\.\d+)?(?:x\d+(?:\.\d+)?)*(?:ml|[cdm]?m)\b
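Applied to the column from the question, a minimal sketch (str.findall returns every match per row; since the pattern uses only non-capturing groups, each element is the full number-plus-unit match):

import pandas as pd

pattern = r'\b\d+(?:\.\d+)?(?:x\d+(?:\.\d+)?)*(?:ml|[cdm]?m)\b'

df = pd.DataFrame({"SKU Master Description": [
    "LA MOUSSE 150ml",
    "FOLD.MEDIUM S.BLACK GIFT BOX 2016 / FOLDABLE/SIZE 222.2x222.2x100.3mm",
    "BLACK RIBBON 15mm ROLL 100m",
]})

# Each row gets the list of all "number + unit" matches found in the description.
df["numbers"] = df["SKU Master Description"].str.findall(pattern)
print(df["numbers"].tolist())
# [['150ml'], ['222.2x222.2x100.3mm'], ['15mm', '100m']]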

Get the prefix out of a size range with different size formats

I have a column in a df with a size range in different size formats.
artikelkleurnummer size
6725 0161810ZWA B080
6726 0161810ZWA B085
6727 0161810ZWA B090
6728 0161810ZWA B095
6729 0161810ZWA B100
The size range also contains other size formats like XS - XXL, 36-50, 36/38 - 52/54, ONE, XS/S - XL/XXL, 363-545.
I have tried to get the prefix '0' out of all sizes that start with a letter in the range (A:K). For example: I want to change B080 into B80; B100 stays B100.
Steps:
1 look for items in column ['size'] whose first letter is in the range (A:K),
2 if True, change the second position in the string into ''
for range I use:
from string import ascii_letters

def range_alpha(start_letter, end_letter):
    return ascii_letters[ascii_letters.index(start_letter):ascii_letters.index(end_letter) + 1]
then I've tried a for loop
for items in df['size']:
    if df.loc[df['size'].str[0] in range_alpha('A','K'):
        df.loc[df['size'].str[1] == ''
I get this message:
SyntaxError: unexpected EOF while parsing
What's wrong?
You can do it with a regex and pd.Series.str.replace:
import pandas as pd

df = pd.DataFrame([['0161810ZWA']*5, ['B080', 'B085', 'B090', 'B095', 'B100']]).T
df.columns = "artikelkleurnummer size".split()

# Keep every matched group except group 2 (the run of leading zeros).
replacement = lambda mpat: ''.join(g for g in mpat.groups() if mpat.groups().index(g) != 1)
df['size_cleaned'] = df['size'].str.replace(r'([a-kA-K])(0*)(\d+)', replacement, regex=True)
Output
artikelkleurnummer size size_cleaned
0 0161810ZWA B080 B80
1 0161810ZWA B085 B85
2 0161810ZWA B090 B90
3 0161810ZWA B095 B95
4 0161810ZWA B100 B100
TL;DR
Find a pattern "LetterZeroDigits" and change it to "LetterDigits" using a regular expression.
Slightly longer explanation
Regexes are very handy but also hard. In the solution above, we are trying to find the pattern of interest and then replace it. In our case, the pattern of interest is made of 3 parts -
A letter from A-K
Zero or more 0's
Some more digits
In regex terms, this can be written as r'([a-kA-K])(0*)(\d+)'. Note that the 3 sets of parentheses make up the 3 parts; they are called groups. This may make little or no sense depending on how much exposure you have had to regexes in the past, but you can pick it up from any introduction to regexes online.
Once we have the parts, what we want to do is retain everything else except part-2, which is the 0s.
The pd.Series.str.replace documentation has the details on the replacement portion. In essence replacement is a function that takes all the matching groups as the input and produces an output.
In the first part we identified three groups or parts. These groups are accessed with the mpat.groups() function, which returns a tuple containing the match for each group. We want to reconstruct a string with the middle part excluded, which is what the replacement function does.
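The same replacement logic, sketched with the standard re module on a few plain strings to make the group handling explicit:

import re

def drop_leading_zeros(m):
    # Keep the letter (group 1) and the digits (group 3); drop the zeros (group 2).
    return m.group(1) + m.group(3)

for size in ["B080", "B085", "B100"]:
    print(re.sub(r'([a-kA-K])(0*)(\d+)', drop_leading_zeros, size))
# B80, B85, B100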
sizes = [{"size": "B080"}, {"size": "B085"}, {"size": "B090"}, {"size": "B095"}, {"size": "B100"}]

def range_char(start, stop):
    return (chr(n) for n in range(ord(start), ord(stop) + 1))

for s in sizes:
    if s['size'][0].upper() in range_char("A", "K"):
        s['size'] = s['size'][0] + s['size'][1:].lstrip('0')

print(sizes)
Using a list of dicts here as an example.
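For reference, the snippet above should print the sizes with the leading zeros stripped:
[{'size': 'B80'}, {'size': 'B85'}, {'size': 'B90'}, {'size': 'B95'}, {'size': 'B100'}]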

Compare Values in Excel in all sheets that have the same title/name

Hello fellow netizens of the programming community!
I need some help with a couple of Excel worksheets that I'll be working with day in and day out.
Here's some info about what I have on hand and what I want to do.
I have several worksheets (10 of them) with similar (not 100% identical) tables.
one worksheet may look like this (x, y and z are other unimportant variables)
name - score - x - y - z
jon - 50 - x - y - z
sam - 50 - x - y - z
pad - 50 - x - y - z
fed - 50 - x - y - z
mum - 50 - x - y - z
and another worksheet may look like this
name - score - x - y - z
pad - 50 - x - y - z
mum - 50 - x - y - z
fed - 50 - x - y - z
jon - 50 - x - y - z
sam - 50 - x - y - z
Simply put, there are names such as 'jon', with their relevant scores, that can occur across all the worksheets, and names such as 'ped' that appear only once, in a single worksheet.
I would like to compare all the sheets at the same time, find the highest score for jon, sam, pad, fed, mum across all the sheets, and have this information presented in a new sheet.
For example, the new sheet should present the data as:
name - highest score
jon - 39
sam - 22
pad - 42
mum - 22
I hope whatever I'm trying to say is not confusing anyone! If anyone could help, I'll greatly appreciate it!
The solution is available via a pivot table formed across multiple sheets.
Steps:
Assuming your file is open and you are in one of your sheets.
Alt+D+P (Opens a dialog box)
Select "Multiple consolidation ranges" and "Pivottable" >>Next
Select "Create a single page field for me" >>Next
Here, click in the Range box and Add all the table areas one by one, one addition per sheet. So you should have 10 ranges inserted here. >> Finish
You will get a pivot table where the Values field will show "Count of score" (it is usually this, but it could be Sum of score, etc.). Here's how you change it to Max: click on whatever is in the Values field >> Value Field Settings >> Max >> OK.
(If you want to choose min/average/count/product/stdDev, this is the place to make that change)
As a caveat, do check that the fields in Filters/Columns/Rows are as you want. I have run a sample pivot on the two tables you provided; a screenshot is attached showing how the table should look.
Sample Pivot screenshot on data provided
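If the consolidation ever needs to be scripted instead of built in Excel, a rough pandas sketch of the same idea (the file name scores.xlsx and the column names name/score are assumptions; this is an alternative route, not part of the pivot-table steps above):

import pandas as pd

# Read every worksheet of the workbook into a dict of DataFrames.
sheets = pd.read_excel("scores.xlsx", sheet_name=None)

# Stack all sheets, then keep the highest score per name.
combined = pd.concat(sheets.values(), ignore_index=True)
highest = combined.groupby("name", as_index=False)["score"].max()
print(highest)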

Combining SUMIF and SUBTOTAL with tables in Excel

I have the following table in Excel
A - B - C
amount - type - tag
4 - Debit - nice
5 - Credit - nice
32 - Debit - bad
31 - Credit - bad
For the calculation of the total I used the following formula:
=sumif([type],"Credit",[amount])-sumif([type],"Debit",[amount])
I got 0, which is right.
But then I filtered the table to show "nice" tags only, and the result didn't change to 1; it remained 0.
How can I solve this problem so that subtotal is calculated when values are filtered according to tag?
Finally, I found a formula that works!
It doesn't work in Google Sheets though...
=SUMPRODUCT(SUBTOTAL(109,OFFSET(D$2:D$160,ROW(D$2:$D$160)-ROW(D$2),,1)),--(E$2:E$160="CREDIT"))-SUMPRODUCT(SUBTOTAL(109,OFFSET(D$2:D$160,ROW(D$2:$D$160)-ROW(D$2),,1)),--(E$2:E$160="DEBIT"))
The SUBTOTAL(109, OFFSET(...)) part builds an array holding each individual row's amount, where rows hidden by the filter contribute 0; SUMPRODUCT then multiplies that array by the credit/debit flags and sums it, so only visible rows are counted. Adjust the D and E references to wherever your amount and type columns live.

Counting the number of datapoints within a Euclidean distance MS SQL

I have 2 data sets:
a list of 300 geocoordinates
a list of over 2 million geocoordinates
For each entry in list 1, I am trying to count the number of entries from list 2 that lie within a 5 mile radius.
I've decided to use the Euclidean distance as I am only dealing with relatively small distances.
Here is my code. It takes forever to run. Any suggestions on how I can improve the code?
-- Inner query: rough bounding-box pre-filter (lat/lng differences under 0.1 degrees)
-- Outer query: squared Euclidean distance in miles (25 = 5 miles squared)
Select
    DistFilter.storenumber,
    count(companynumber) as sohoCount
from
    (Select
        UKStoreCoord.storenumber,
        UKStoreCoord.latitude as SLat,
        UKStoreCoord.longitude as SLng,
        SohoCoordinates.companynumber,
        SohoCoordinates.latitude,
        SohoCoordinates.longitude
    from UKStoreCoord, SohoCoordinates
    where abs(UKStoreCoord.latitude - SohoCoordinates.latitude) < 0.1
      and abs(SohoCoordinates.longitude - UKStoreCoord.longitude) < 0.1
    group by
        UKStoreCoord.storenumber,
        UKStoreCoord.latitude,
        UKStoreCoord.longitude,
        SohoCoordinates.companynumber,
        SohoCoordinates.latitude,
        SohoCoordinates.longitude) as DistFilter
where (((DistFilter.latitude - DistFilter.SLat) * 69) ^2 + ((DistFilter.longitude - DistFilter.SLng) * 46) ^2) < 25
group by
    DistFilter.storenumber
cheers