I'm trying to pull product search colors from urls in my bigquery data, but the urls format changed at one point so I'm trying to pull them from two different formats.
The first one is like /someproduct/color/blue/color/red and the second one is like /someproduct/?colors=blue,red.
In both these examples, the person filtered on multiple colors, and I would like it the output of the extraction to be like 'blue,red' and all in one column.
I can pull the first one with:
REGEXP_EXTRACT_all(url,r'color/([A-Za-z]+)')
and the second one with:
REGEXP_EXTRACT(url,r'colors=(\w*,\w*)')
The first one produces an array and I don't know how to get all the outputs to show as one line item in one row. A person can filter up to 20 colors, so there can be a lot of repeating within both of the URL types.
The second one also can use some improving as I have to add in an additional \w*, for each additional color filtered on, and I have a case statement that handles that, but I don't think it's the most efficient way to handle it.
Consider below approach
select url,
coalesce(regexp_extract(url, r'colors=(\w*,\w*)'), array_to_string(regexp_extract_all(url,r'color/([A-Za-z]+)'), ',')) colors
from your_table
if applied to sample data in your question - output is
Related
So I'm working with an old joomla site and trying to export only useful data, and I found a pretty hacky way of accomplishing what I wanted to do.
So for the column I want to get to, images, even though there's been no useful information input into it, joomla automatically populates it with the following:
{"image_intro":"","float_intro":"","image_intro_alt":"","image_intro_caption":"","image_fulltext":"","float_fulltext":"","image_fulltext_alt":"","image_fulltext_caption":""}
I Ideally only want to search for fields where the value for image_fulltext is not empty, and only get the value of the field.
My original idea was this hacky snippet just to find the relevant rows at least.
SELECT id, images FROM `JOOMLAPREFIX_content` WHERE images !='{"image_intro":"","float_intro":"","image_intro_alt":"","image_intro_caption":"","image_fulltext":"","float_fulltext":"","image_fulltext_alt":"","image_fulltext_caption":""}'
This gets the me the relevant rows, but it also gets me more information than I want. (The first suggested solution, while a much less "hackey" query, gets me the same result, as you can see in the SQL snippet.)
Id Images
7 {"image_intro":"","float_intro":"","image_intro_alt":"","image_intro_caption":"","image_fulltext":"images\/featuredimage1832014.jpg","float_fulltext":"","image_fulltext_alt":"","image_fulltext_caption":""}
Is there a way to target only the value of image_fulltext, in the case of this row: "images/featuredimage1832014.jpg"?
SQL Fiddle
Try this:
SELECT id, images FROM `JOOMLAPREFIX_content` WHERE images NOT LIKE '%"image_fulltext":""%'
So, I'm constantly being given data in new and different formats. I'm on a crusade to get my work to standardize data for easy use, and if I managed to convince the powers that be to standardize data, this problem becomes entirely moot. Until then, I have the following problem:
I get data in a variety of ways. Sometimes my gross sales are called total sales. Sometimes gross sales before discounts, total sales before discounts, Gross_Sales, etc. Discounts, deductions, exempt amounts, etc. form another column. So on and so forth. I'd like to be able to do the following:
1) Figure out what columns I want,
2) Turn those columns into a pivot table.
For part 1, I have two options, and I'm wondering if there's anymore: The 1st is to use Microsoft's fuzzy-matching add-in to help me match. I'd have a separate tab dedicated to fuzzy matching each column I need. The second is to just generate a long list of all the variants, and to test each one until I find a hit, assign it, and move onto testing the next one.
The second part is turning all of this into a pivot table - the resouces I have so far are https://www.thespreadsheetguru.com/blog/2014/9/27/vba-guide-excel-pivot-tables and How to Create a Pivot Table in VBA
Is there a better method? Is there another way?
Edit: Slightly better method - Grab the data columns, place them into a table, and pivot everything off of that table - it removes the need to re-create pivot tables, just need to move the data over.
Having the same problem, I use a mix of your two methods.
My data consists of a bunch of logs for rejected x-ray images, and the reject reason is a free text field. My solution was to create a table where the first column contains my desired output categories, and then each subsequent column contains a different variation of it.
For example, a row might have (column one/ouput first entry):
Positioning, POS, Positioning Error, Patient Positioning
Note that these are all fairly different from each other. Where the fuzzy matching comes in - it is used to capture all the smaller differences and mispellings around those other columns. When the fuzzy matching section decides a given reason matches a column's entry, it is then replaced with the appropriate desired output reason from column 1 of the table. In my example, a reason of 'Possitioning Err' [sic] would match to column 3 (Positioning Error) and then get converted to Positioning.
Then wash rinse repeat over the rest of your data as needed. This approach was super useful and fairly flexible in helping standardize my data. It was also computationally more expensive, but you'd only need to run the matching portion once I guess.
As for the actual mechanics of going about doing this - I use 2010, so no inbuilt functionality. I run the fuzzy matching code on a temporary worksheet until best percentage matches are found, and then overwrite the actual source data afterwards.
I'm looking to compare two datasets with each other. In an ideal world, I'd like to have it to show a green item if the data matches between the two. I have created two different GDocs files to get the code out there, to prevent SO from dinging me on formatting.
The first dataset is from our program itself, it pulls everything from our application, and displays the information, based on company code. The second dataset is from an external source requiring validation. The main fields I am matching are "NPI Number (Type 1)" from DS1 vs. "NPI" from DS2. If there is a match to highlight in green the row from both sides of data.
Dataset 1
Dataset 2
You may need to use LookUp function and set that as a expression to fill the background color of a text box or row of a table
Sample Expression: =iif(Len(Lookup(Fields!NPI.Value, Fields!NPI.Value, Fields!ProviderName.Value, "DS1"))>0,"Green","Red")
I have created a sample here. Download entire content and run it.
The best way I can explain my problem is by showing a few screenshots.
I need to turn data like this:
[
Into something that displays like this:
After Data
There are multiple part numbers in the file, and I need the macro to take all the data from a matching part number and transform the data into what is displayed in the second image. All the part numbers are grouped with their data together, so it wouldn't need to run the loop through the top every single time, but adding to the entries with each new piece of data. Something also needs to be done for the years as well, because the way the data is presented, is in a range of years, and I need an entry for each year in that range.
Additional Information:
I am using this data for prep for category data for a BigCommerce site, that is working with a year/make/model plugin on the site, to create a vehicle lookup system. Thus in order for the user to look up their vehicle accurately the categories need to be listed the way they are in the second picture, which needs to be the result of the macro.
I thank anyone who takes the time to look into this, it will cut down the time I spend doing this manually by a huge amount.
You can do this with a formula (without actual VBA):
In cell F2 write: ="YMM/"&C2&"/"&D2&"/"&E2&";"
In cell F3 write: =F2&"YMM/"&C3&"/"&D3&"/"&E3&";"
drag down the formula in F3 until the last row.
The last row will contain the entire string of all vehicles.
I just noticed you may have duplicate values. You can use the built in Remove Duplicates feature to remove those before using the above technique.
I have sales data that I'd like to plot on my chart. However, at a specific point in time, we had a change taking place I'd like to ensure is clearly visible in the chart, preferably by dividing the sales data (which is stored in a single SQL Server column) into two different chunks, which would allow me to then treat them as different data series.
I used to solve this in Excel by storing the post-event data in a different column (by simply dragging them to a different column), and thus I was able to treat them as a different series (the blue and green line in the chart below. The red and orange line are pre-event and post-event averages):
I'd like to reproduce this effect in SSRS, but am not sure how to tackle it. I've tried using an approach where I added two category groups, both pointing to the date-time column, and applying filters to them (one <= the cutoff date, the other >=).
I then added my sales data twice, with the idea I could somehow connect them to the individual category groups, but that does not seem possible.
Has anyone tried anything like this before, or would have a different approach to achieve what I'm trying to get?
Thanks!
I managed to get this to work, and figured I'd share how to do it.
My dataset contains a field called DATEKEY, which stores the date in the format YYYYMMDD. It's possible to use this in an expression and evaluate the date for a specific row. In case the expression evaluates to true, we display the value. If not, we display a blank string.
In case we want to show the values prior to the date, the expression would be:
=IIF(Fields!DATEKEY.Value <= 20130601, Avg(Fields!My_NUMBER.Value), "")
The second series can then be made by reversing the symbol:
=IIF(Fields!DATEKEY.Value >= 20130601, Avg(Fields!My_NUMBER.Value), "")
The graph then looks like this: