Separating columns (array of arrays) - Advanced SQL looping

I tried to use a title that more accurately describes my question, but a message said titles are limited to 150 characters.
I am looking for help from someone with advanced SQL skills. Ideally I want to do this in SQL and let the computer do the work, because too much manual manipulation is rife with the possibility of mistakes.
I've already searched for user groups within Google, but all my emails are being returned with a message saying the address no longer exists.
What I am using appears to be a proprietary version of Dremel SQL / Google SQL. However, someone experienced in Dremel SQL will probably be able to point me in the right direction.
BACKGROUND INFO:
I am pulling an array column that holds another array (a notes column). I think it may be an array of arrays?
I have not figured a way to do what I am trying to do with Google or Dremel SQL yet.
So for now, I am doing it the hard way.
As originally pulled, the data looks like this: [{array of arrays}, {array of arrays}, {array of arrays}, etc.]
More specifically: [{4 or more text fields, which could also hold numbers, separated by commas}, {another set of fields}, {another set of fields}, ...]
For example (this is all in just one column of data, across hundreds of rows):
[
{"created":"1540236216969","notes":"blah... blah... blah", "original_text_length":534, "User_email":"someone#emailaddress.com","user_shortname":"someone"},
{"created":"1540236216969","notes":"blah... blah... blah", "original_text_length":1224, "User_email":"someone#emailaddress.com","user_shortname":"someone"},
{"created":"1540236216969","notes":"blah... blah... blah", "original_text_length":1664, "User_email":"someone#emailaddress.com","user_shortname":"someone"}
...
]
The number of these entries differs from row to row, and each row has its own ID #.
A typical row of data is:
ID #, start_date, end_date, some other fields, notes (the array field)
WHAT I AM DOING NOW:
Pull the data with SQL,
export it to Google Sheets,
make separate tabs for the different array columns,
copy the notes column (the array column holding arrays) to a separate tab, then
use Split Text To Columns with the first curly brace "{" as the separator.
Here is my dilemma.
Once that is done, I need to split all of those columns again to separate the individual elements in each array. I can't run Split Text to Columns again with all of them highlighted. I can run it one column at a time, but that will be a real pain if I have to do it individually for every column in every row (hundreds of rows). I need a way to automate this.
I will also need to convert each Unix date within each array to a calendar date, PLUS add rows to the spreadsheet depending on the number of columns produced by the first split. The number of columns differs for each row depending on how many notes have been added.
OR... do it with SQL (which appears to be a proprietary dialect, similar to NoSQL but not the same). I have tried syntaxes for IBM SQL, Oracle SQL, SQL Server, and others found online, but none of them work.
OR... do it with a looping function within Google Sheets.
Possibly re-add the result to the database as a new table once both sets of arrays are completely split up.
END RESULT
ID#, date1, date2, first created date (currently a Unix date), first note, first other field, etc.
Then a new row with:
the same ID# as above, date1 and date2 from the row above, the next (2nd) created date (currently a Unix date), the 2nd note, the 2nd other field, etc.
Then another new row...
for the 3rd set of notes, and so on.
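A minimal Python sketch of the END RESULT layout, assuming the notes column arrives as a valid JSON array of objects and that `created` is a Unix timestamp in milliseconds (which matches the 13-digit values in the sample). The row keys (`id`, `start_date`, `end_date`, `notes`), the ID value, and the function name `explode_notes` are hypothetical. In SQL dialects that support it, "one output row per array element" is usually written with `UNNEST`; whether the proprietary dialect here supports that is not known.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def explode_notes(rows):
    """Turn each source row into one output row per note in its notes array.

    Assumes each row is a dict with hypothetical keys id, start_date, end_date,
    and notes, where notes holds a JSON array of objects.
    """
    out = []
    for row in rows:
        for note in json.loads(row["notes"]):
            # created is a string of milliseconds since 1970-01-01 UTC.
            created = EPOCH + timedelta(milliseconds=int(note["created"]))
            out.append({
                "id": row["id"],                  # repeated on every note row
                "start_date": row["start_date"],  # copied from the parent row
                "end_date": row["end_date"],
                "created": created.isoformat(),
                "notes": note.get("notes"),
                "original_text_length": note.get("original_text_length"),
                "user_email": note.get("User_email"),
                "user_shortname": note.get("user_shortname"),
            })
    return out

# Hypothetical parent row built from the sample values in the question.
sample = [{
    "id": 101,
    "start_date": "2018-10-01",
    "end_date": "2018-12-31",
    "notes": json.dumps([
        {"created": "1540236216969", "notes": "first note", "original_text_length": 534,
         "User_email": "someone#emailaddress.com", "user_shortname": "someone"},
        {"created": "1540236216969", "notes": "second note", "original_text_length": 1224,
         "User_email": "someone#emailaddress.com", "user_shortname": "someone"},
    ]),
}]

for r in explode_notes(sample):
    print(r["id"], r["created"], r["notes"])
```

Written back to a sheet or a new table, this gives one row per note with the parent ID and dates repeated on each row. Using integer milliseconds with `timedelta` avoids floating-point rounding in the date conversion.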

Related

Compare Two Rows and Update Start and End Dates

I need some help. I know I am not the only one dealing with this issue, but I wonder if you might have ideas on how to compare two rows of data and fill out start and end dates.
To give you some context, we have a large hierarchy (approximately 8,000 rows and about 12 columns wide) that is updated each year. Sometimes the values change and sometimes they don't. When the values don't change, I don't need to adjust the dates. When the values do change and a new row is added, I need to update the dates.
I have attached some fake data to illustrate. I am building this in MS Access, so I think this is more of a DBA-type question that will be handled with a recordset-type method.
In my example I have two tables – Old Table and New Table. In each table there is a routing code field that represents my join field and primary key for this table.
The Old table represents existing data - tblMain. The New Table represents the data to be appended - tblTemp.
To append the data, I have an append query set up in Access. I perform a left join between the Old and New tables, joining on every field and append the rows that are null in the Old table. That’s fine and that is not where my issue is.
What is causing me trouble is how to fill out the start and end dates.
So, as you can see from my tables, we are running a zoo. Let's just say, for the sake of argument, that our zoo started off pretty simple and has become more sophisticated. We now want our hierarchy to expand and become more detailed, as we are now capturing the type of animal (Level 4) and the native location (Level 5).
As you can see when comparing one table to the other, the routing codes are the same, so the append query has to join on every field. When you do this, you get the Result Table, which is essentially the Old and New tables stacked on top of each other. You might think of a Union query, but that will give me duplicates, and I don't want that.
If you look at the Result Table, there is a Start Date and an End Date. Let's just say I get the start and end dates from a message box that pops up when the data is imported, and they are held in a variable. I think there are dates in my real data, but I am still trying to verify this.
So how do I do the comparison? Pseudocode for the logic I need:
• For each routing code:
    Compare Levels 1-5
    If the routing code is the same but Levels 1-5 are not the same:
        fill out the end date of the old record
        fill out the start date of the new record
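The pseudocode above can be sketched in Python to make the comparison concrete. This is not Access/VBA recordset code; it assumes each record is reduced to a routing code plus a tuple of Levels 1-5, that an "open" record is one with no end date, and that the two dates come from the message-box variables mentioned above. `Record` and `apply_changes` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

@dataclass
class Record:
    routing_code: str
    levels: Tuple[Optional[str], ...]  # Level 1 through Level 5
    start_date: Optional[date] = None
    end_date: Optional[date] = None

def apply_changes(old: List[Record], new: List[Record],
                  old_end: date, new_start: date) -> List[Record]:
    """Close old records whose levels changed and open their replacements."""
    # Only records that are still open (no end date) are compared.
    current = {r.routing_code: r for r in old if r.end_date is None}
    appended = []
    for rec in new:
        prior = current.get(rec.routing_code)
        if prior is not None and prior.levels == rec.levels:
            continue  # nothing changed: keep the old row, append nothing
        if prior is not None:
            prior.end_date = old_end   # fill out the end date of the old record
        rec.start_date = new_start     # fill out the start date of the new record
        appended.append(rec)
    return old + appended
```

Because unchanged routing codes are skipped before anything is appended, this also avoids the duplicate rows that a plain Union would produce.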
This idea of comparing two records and filling out a date is quite prevalent in my organization, but I haven't found a way of creating logic that consistently works, so any help or suggestions would be appreciated.
Old Table
New Table
Result Table

Dynamically creating a pivot table using fuzzy matching

So, I'm constantly being given data in new and different formats. I'm on a crusade to get my workplace to standardize data for easy use, and if I manage to convince the powers that be to standardize it, this problem becomes entirely moot. Until then, I have the following problem:
I get data in a variety of ways. Sometimes my gross sales are called total sales. Sometimes gross sales before discounts, total sales before discounts, Gross_Sales, etc. Discounts, deductions, exempt amounts, etc. form another column. So on and so forth. I'd like to be able to do the following:
1) Figure out what columns I want,
2) Turn those columns into a pivot table.
For part 1, I have two options, and I'm wondering if there are any more. The first is to use Microsoft's fuzzy-matching add-in to help me match; I'd have a separate tab dedicated to fuzzy matching each column I need. The second is to generate a long list of all the variants, test each one until I find a hit, assign it, and move on to testing the next one.
The second part is turning all of this into a pivot table. The resources I have so far are https://www.thespreadsheetguru.com/blog/2014/9/27/vba-guide-excel-pivot-tables and How to Create a Pivot Table in VBA.
Is there a better method? Is there another way?
Edit: A slightly better method is to grab the data columns, place them into a table, and pivot everything off that table. That removes the need to re-create pivot tables; I just need to move the data over.
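A rough Python sketch of the "long list of variants" option combined with the Edit's idea of standardizing the columns first and pivoting from one table. The variant spellings, the `region` column, and the function names are hypothetical, and this is a plain dictionary lookup, not Excel's pivot-table machinery.

```python
from collections import defaultdict

# Hypothetical canonical column names and the spellings seen for each.
HEADER_VARIANTS = {
    "region": ["region", "territory"],
    "gross_sales": ["gross sales", "total sales", "gross sales before discounts",
                    "total sales before discounts"],
    "discounts": ["discounts", "deductions", "exempt amounts"],
}

def normalize(header):
    """Lower-case, treat underscores as spaces, and collapse repeated whitespace."""
    return " ".join(header.replace("_", " ").lower().split())

# Flatten the variant list into one lookup: normalized spelling -> canonical name.
LOOKUP = {normalize(v): canon
          for canon, variants in HEADER_VARIANTS.items() for v in variants}

def standardize(headers):
    """Return the canonical name for each header (None when no variant matches)."""
    return [LOOKUP.get(normalize(h)) for h in headers]

def pivot_sum(headers, rows, row_field, value_field):
    """Sum value_field for each distinct row_field value, like a one-field pivot."""
    names = standardize(headers)
    key_i, val_i = names.index(row_field), names.index(value_field)
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_i]] += row[val_i]
    return dict(totals)
```

For example, headers `["Territory", "Gross_Sales", "Deductions"]` standardize to `["region", "gross_sales", "discounts"]`, so the same pivot call works no matter which spelling a given file used. Headers that map to `None` are the ones that still need a new variant (or a fuzzy match) added.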
Having the same problem, I use a mix of your two methods.
My data consists of a bunch of logs for rejected x-ray images, and the reject reason is a free text field. My solution was to create a table where the first column contains my desired output categories, and then each subsequent column contains a different variation of it.
For example, a row might have (the first entry is the desired output):
Positioning, POS, Positioning Error, Patient Positioning
Note that these are all fairly different from each other. This is where the fuzzy matching comes in: it captures the smaller differences and misspellings around those entries. When the fuzzy-matching step decides a given reason matches an entry, that reason is replaced with the desired output reason from column 1 of the table. In my example, a reason of 'Possitioning Err' [sic] would match column 3 (Positioning Error) and then get converted to Positioning.
Then wash, rinse, repeat over the rest of your data as needed. This approach was very useful and fairly flexible for standardizing my data. It was also computationally more expensive, but you should only need to run the matching portion once.
As for the mechanics: I use Excel 2010, so there is no built-in functionality. I run the fuzzy-matching code on a temporary worksheet until the best-percentage matches are found, and then overwrite the actual source data afterwards.
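A minimal sketch of the mapping-table approach described in this answer, swapping in Python's standard-library `difflib` for Microsoft's fuzzy-matching add-in (the scoring differs, so cutoffs will not carry over between the two). The "Exposure" category, its variations, and the `cutoff=0.8` threshold are assumptions for illustration.

```python
import difflib

# Hypothetical mapping table: desired output category -> known variations.
CATEGORIES = {
    "Positioning": ["Positioning", "POS", "Positioning Error", "Patient Positioning"],
    "Exposure": ["Exposure", "Overexposed", "Underexposed", "Exposure Error"],
}

# Flatten the table so each lower-cased variation points back to its category.
VARIANT_TO_CATEGORY = {v.lower(): cat for cat, vs in CATEGORIES.items() for v in vs}

def standardize_reason(reason, cutoff=0.8):
    """Return the category of the closest known variation, or None if nothing is close."""
    matches = difflib.get_close_matches(
        reason.lower(), list(VARIANT_TO_CATEGORY), n=1, cutoff=cutoff)
    return VARIANT_TO_CATEGORY[matches[0]] if matches else None

for raw in ["Possitioning Err", "POS", "Artifact"]:
    print(raw, "->", standardize_reason(raw))
```

'Possitioning Err' scores closest to "positioning error" and is converted to Positioning, mirroring the example above. Reasons that return `None` are candidates for a new row or variation in the mapping table.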

I'm trying to populate a SharePoint list with the soonest upcoming dates from certain columns of another SharePoint list

I have a list named Employee Dates, this list contains the columns:
Employee | CPR Completed | CPR Required | ETC
These columns keep going on for all of the training courses required for our employees with alternating columns for completed and required dates. I am using a workflow to calculate all of the required dates of training from the completed dates.
What I want to do is make another list that looks at ALL of the required-date columns, finds the soonest dates, and populates the new list with those dates and the column each one was pulled from.
Any help as to how to approach this? I have been trying to use queries in Access and also some of the custom view settings in SharePoint Designer but no luck so far.
You could try an Excel table (Access has similar functions if I recall, but I avoid Access like the plague). To connect Excel to SharePoint, follow the steps in this article:
support.office.com
OK, once connected you should see all of the columns and values in Excel. Next we need to find the min date (easy) and then get the associated column name (a little harder).
Min Date: The formula should be something along the lines of =MIN(C2,E2,G2). Just type =MIN( and then Ctrl-click the cells in the row that you want to consider. When you're done, close it with ). Afterwards, double-click the square in the bottom-right corner of the cell and Excel will fill the same logic down for all of the rows.
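Column Name: A little more difficult. Use the min value from the prior column as the lookup value to get the column name. Note that VLOOKUP searches down the first column of a range, so for a value found across a row, an INDEX/MATCH combination such as =INDEX($A$1:$G$1, MATCH(H2, A2:G2, 0)) is the usual way to return the header. Afterwards, double-click the square in the bottom-right corner and it will apply the same logic to all of the rows. I'd explain the lookup functions in detail, but I'd run out of characters and attention span long before I got to the relevant parts, and Excel's function help does a fine job of covering the basics.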
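Outside Excel, the same "earliest date plus the column it came from" logic looks like this in Python. It assumes the list has been exported to rows of column-name/value pairs and that every required-date column name ends in " Required", following the `CPR Required` pattern above; the "First Aid" columns and the employee data are hypothetical.

```python
from datetime import date

def soonest_required(row):
    """Return (column_name, date) for the earliest non-empty '... Required' column."""
    candidates = [(col, value) for col, value in row.items()
                  if col.endswith(" Required") and value is not None]
    if not candidates:
        return None  # no required dates filled in for this employee
    # Compare on the date only; the column name rides along with it.
    return min(candidates, key=lambda pair: pair[1])

employees = [
    {"Employee": "A. Smith",
     "CPR Completed": date(2017, 3, 1), "CPR Required": date(2019, 3, 1),
     "First Aid Completed": date(2017, 6, 15), "First Aid Required": date(2018, 6, 15)},
]

for emp in employees:
    print(emp["Employee"], soonest_required(emp))
```

Filtering on the column name first means a completed date can never be picked by mistake, which is the main pitfall when a row-wide lookup is used over alternating completed/required columns.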
Anyway, hope that helps.

Summing different parts of a column in SQL

I have a database extract in Excel and want to create a custom value in Tableau using its calculated-field feature, which I believe is SQL-based.
Basically, I have a large number of feeds, each of which appears a different number of times in a column. For example:
feed 1
feed 1
feed 2
feed 3
feed 4
feed 4
feed 4
And I want to have a sum for feed 1, feed 2, and feed 4. But in my actual DB there are about 100 feeds, each with a different number of appearances. I'm having trouble finding a good way to do this, if there even is one. Any help or direction would be appreciated!
I'm assuming that your list is a single column and you need a count of the number of occurrences of each feed. Since no column or table names were supplied, let's call them colname and tablename for the sake of example.
select colname, count(*) as Ct from tablename group by colname
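To see what that query returns on the sample feeds from the question, here is a small Python run against an in-memory SQLite table. `tablename` and `colname` are the placeholder names from the answer, SQLite stands in for whatever database sits behind the extract, and `ORDER BY` is added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (colname TEXT)")
conn.executemany("INSERT INTO tablename (colname) VALUES (?)",
                 [("feed 1",), ("feed 1",), ("feed 2",), ("feed 3",),
                  ("feed 4",), ("feed 4",), ("feed 4",)])

# One output row per distinct feed, with the number of times it appears.
counts = conn.execute(
    "SELECT colname, COUNT(*) AS Ct FROM tablename GROUP BY colname ORDER BY colname"
).fetchall()

for feed, ct in counts:
    print(feed, ct)
```

The same `GROUP BY` works unchanged whether there are 4 feeds or 100, since every distinct value in the column gets its own output row.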
It would be easier to give an exact answer if you posted a small, simplified subset of your spreadsheet. But assuming you have a column called "feed_name" that takes values like "feed 1", "feed 2", etc., depending on the row, the feed_name column should be a discrete dimension in Tableau.
Then just put the feed_name pill on a shelf, say the Rows shelf, and put the "Number of Records" field on another shelf, say the Columns shelf.
You don't need to write SQL to do this (or most tasks) in Tableau. It helps to understand SQL concepts, and it's very helpful to drop down to the SQL level when needed to solve tricky issues. But in most situations, you can just explore the data interactively by moving fields around and writing some simple calculations, and let Tableau generate the SQL necessary to retrieve the data needed for the visualization you requested.
Tableau supports SQL and some NoSQL data sources, along with some cubes too, and it does so quite well and in multiple ways. You can just work more quickly and efficiently using Tableau's visual manipulations in most cases, and then drop to the lower level of detail when needed. It just takes getting used to how Tableau operates.

Find or Strip Invalid characters from Database

We are using a database whose front-end software has allowed invalid characters to be entered. (I have no control over the software and cannot rewrite it.)
The problem characters include carriage returns, line breaks, �, and ¶; basically anything that is not 0-9, a-z, or standard punctuation causes issues with the database and with how we use the data.
I'm looking for a way to scan the entire database to identify these invalid characters and either display them as results or strip them out.
I had been looking at this site, wondering if there was a way of searching for a certain range, but I might be barking up the wrong tree.
I'm fairly new to SQL so be gentle with me, thanks.
The only way I can think of to do this is to write a stored procedure that uses the system tables to get a list of all fields in the database/schema in question. Have it exclude system tables (or include only user-defined ones), then dynamically write out SQL UPDATE statements based on the tables and columns found, using regular expressions or character removal like in this article.
The system view in question is:
SELECT
table_name,column_name
FROM
information_schema.columns
Pseudocode would be:
Get list of tables we want to do this for
For each table in list
    Get list of columns for table that have string data
    For each column in table
        Generate update statement to strip unwanted characters
        --Consider writing out table, column, key, and before/after values to a history table in case this has to be undone.
        --Consider a counter so I have an idea of what was updated
        Execute update statement
    Next column
Next table
Write out counter
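A runnable sketch of that pseudocode, using SQLite from Python so it can be tried end to end. SQLite has no `information_schema`, so the table and column lists come from `sqlite_master` and `PRAGMA table_info` instead, and the character filter is a Python function registered with `create_function`; on another engine you would use its own catalog views and string functions. The allowed set (ASCII letters, digits, punctuation, and space) is an assumption based on the question, and the history table from the pseudocode is left out.

```python
import sqlite3
import string

# Assumed definition of "valid": ASCII letters, digits, punctuation and the space character.
ALLOWED = set(string.ascii_letters + string.digits + string.punctuation + " ")

def strip_invalid(value):
    """Drop every character outside ALLOWED; non-text values pass through unchanged."""
    if not isinstance(value, str):
        return value
    return "".join(ch for ch in value if ch in ALLOWED)

def clean_database(conn):
    """Strip invalid characters from every text column and return per-column update counts."""
    conn.create_function("strip_invalid", 1, strip_invalid)
    counts = {}
    # Get list of user tables (SQLite's internal tables start with "sqlite_").
    tables = [name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'")]
    for table in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        columns = list(conn.execute(f'PRAGMA table_info("{table}")'))
        for _cid, column, col_type, *_rest in columns:
            if not any(t in (col_type or "").upper() for t in ("CHAR", "TEXT", "CLOB")):
                continue  # only columns that hold string data
            # Only touch rows whose value actually changes.
            cur = conn.execute(
                f'UPDATE "{table}" SET "{column}" = strip_invalid("{column}") '
                f'WHERE "{column}" <> strip_invalid("{column}")')
            counts[(table, column)] = cur.rowcount  # the counter from the pseudocode
    conn.commit()
    return counts
```

Note that stripping a line break outright joins the words on either side of it; replacing such characters with a space instead is a one-line change in `strip_invalid`. Running it against a copy of the database first, and saving before/after values as the pseudocode suggests, keeps the change reversible.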
Since you say "the data then moves to a second program that cannot handle these characters and this causes the process to fail," I'm wondering if you can leave the unreadable data where it is and create a new column for cleaned data that is populated only if/when the second process fails. You'd still have to test every character of the data in the failed cell, but you wouldn't have to test every character of every row. After you determine the updated text, you can call the second process again with the updated value.