I'm trying to build a query that looks through a string column and compares it to a list of strings I have in a text file to see if any of the strings in the list are contained within the text of the string column. I then want to grab the first occurrence of a match and return it.
For further context, I have a list of app names in a text file that look like ('app 1', 'app 2', etc). These all belong to one device (let's call that 'device_1').
Separately, I have a database table called "reports" with 3 columns:

report_id | device   | report_title
----------+----------+---------------
1         | device_1 | title string 1
2         | device_1 | title string 2
3         | device_1 | title string 3
I'm filtering the reports table for only device = 'device_1'. The "report_title" column will hold a long string of text that may or may not contain an app name. Using a sql query, I want to check each report title string to see if it contains one of the app names in my text file, and if so, return that app name for the first match (there SHOULD only ever be one match per title string if there is one).
The final output that I'm trying to get would be something like the below:
report_id | device   | app_name
----------+----------+---------
1         | device_1 | app 1
2         | device_1 | app 2
3         | device_1 | app 1
4         | device_1 | app 3
I was originally trying to do this somehow by creating a temporary local table to hold the text file strings, but I'm getting error messages when trying to create a table due to not having the appropriate permissions (unless I'm doing it wrong).
Would this be better done by converting the text file into an array somehow?
So far all I have is the device filter, without the matching logic:
SELECT TOP (1) report_title
FROM reports
WHERE device = 'device_1';
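In SQL Server, one way to sketch the matching without creating a temp table (the VALUES list here stands in for the app names from the text file, and the alias names are illustrative) is CROSS APPLY with LIKE:

```sql
SELECT r.report_id,
       r.device,
       a.app_name
FROM reports AS r
CROSS APPLY (
    -- First app name (if any) contained in this report title
    SELECT TOP (1) app_name
    FROM (VALUES ('app 1'), ('app 2'), ('app 3')) AS apps(app_name)
    WHERE r.report_title LIKE '%' + apps.app_name + '%'
) AS a
WHERE r.device = 'device_1';
```

Note that CROSS APPLY drops titles with no match; switch to OUTER APPLY if you want those rows kept with a NULL app_name.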
I want to extract a string from a string...and use it under a field named source.
I tried writing it like this, but no good:

index=cba_nemis Status: J source=*AAP_ENC_UX_B.*
| eval plan=upper(substr(source,57,2))
| regex source="AAP_ENC_UX_B.\w+\d+rp"
| stats count by plan, source
For example:
source=/p4products/nemis2/filehandlerU/encpr1/log/AAP_ENC_UX_B.az_in_aza_277U_ rp-20190722-054802.log
source=/p4products/nemis2/filehandlerU/encpr2/log/AAP_ENC_UX_B.oh_in_ohf_ed_ph_ld-20190723-034121.log
I want to extract the string
AAP_ENC_UX_B.az_in_aza_277U_ rp from the 1st
and
AAP_ENC_UX_B.oh_in_ohf_ed_ph_ld from the 2nd,
and put them under the column source along with the counts.
I want results like...
source counts
AAP_ENC_UX_B.az_in_aza_277U_ rp 1
AAP_ENC_UX_B.oh_in_ohf_ed_ph_ld 1
You can use the rex command, which extracts a new field from an existing field by applying a regular expression.
...search...
| rex field=source ".+\/(?<source_v2>[\.\w\s]+)-.+"
| stats count by plan, source_v2
Be careful, though: I named the new field source_v2, because what you asked for would overwrite the existing source field without you explicitly requesting it. Just change source_v2 to source in my code if that is what you want.
The stats command then counts by this new source_v2 field. Try it and see if this is what you need; you can tweak it easily to get your expected results.
One result, under a single field of data, contains the below; it's extremely long. I need to be able to pull out certain substrings into separate columns.
Desired Result:
1) email addresses that it's being sent to, identified by "TO": gregory.dettorre#cardinalhealth.com; scott.ballard#cardinalhealth.com
2) email addresses that it's being CC'd to, identified by "CC":
GMB-OptiFreight-CCBABR#cardinalhealth.com
3) the email address replies go to, identified by "ReplyTo":
OptiFreightcustomercare#cardinalhealth.com
4) Include report: True
5) Render Format: Excel
6) Subject: 13 Week Volume File - LifePoint Health - Brentwood, TN
Result:
"<ParameterValues><ParameterValue><Name>TO</Name>
<Value>gregory.dettorre#cardinalhealth.com;
scott.ballard#cardinalhealth.com</Value></ParameterValue><ParameterValue>
<Name>CC</Name><Value>GMB-OptiFreight-CCBABR#cardinalhealth.com</Value>
</ParameterValue><ParameterValue><Name>ReplyTo</Name>
<Value>OptiFreightcustomercare#cardinalhealth.com</Value></ParameterValue>
<ParameterValue><Name>IncludeReport</Name><Value>True</Value>
</ParameterValue><ParameterValue><Name>RenderFormat</Name>
<Value>EXCEL</Value></ParameterValue><ParameterValue><Name>Subject</Name>
<Value>13 Week Volume File - LifePoint Health - Brentwood, TN</Value>
</ParameterValue><ParameterValue><Name>Comment</Name><Value>Please see the
attached 13 week volume file and let us know if you have any questions.
OptiFreightcustomercare#cardinalhealth.com</Value></ParameterValue><ParameterValue><Name>IncludeLink</Name><Value>False</Value></ParameterValue><ParameterValue><Name>Priority</Name><Value>NORMAL</Value></ParameterValue></ParameterValues>"
There is an answered question on splitting strings using SUBSTRING and CHARINDEX in SSRS. You get the indexes of 2 delimiters (e.g. "TO" and "CC"), and by applying SUBSTRING between these 2 delimiters you get the value that you wanted.
Also, the best practice would probably be to split the data in the dataset (e.g. the SQL query) itself, instead of doing so in the report.
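As a rough T-SQL sketch of that delimiter approach (the variable, table, and column names here are made up; @s stands in for your stored field), pulling the TO value out of the string:

```sql
DECLARE @s nvarchar(max) =
    '<ParameterValues><ParameterValue><Name>TO</Name><Value>a#x.com;b#x.com</Value></ParameterValue></ParameterValues>';

DECLARE @name  nvarchar(50) = '<Name>TO</Name>';
DECLARE @open  nvarchar(20) = '<Value>';
DECLARE @close nvarchar(20) = '</Value>';

-- Position just after the <Value> that follows <Name>TO</Name>
DECLARE @start int = CHARINDEX(@open, @s, CHARINDEX(@name, @s)) + LEN(@open);
-- Position of the matching </Value>
DECLARE @end int = CHARINDEX(@close, @s, @start);

SELECT SUBSTRING(@s, @start, @end - @start) AS to_addresses;
```

Since the stored value is well-formed XML, an arguably cleaner alternative is to CAST the column to the xml type and read each parameter with the .value() XQuery method instead of string arithmetic.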
I'm trying to load data from a fixed width flat file in SSIS (2008 R2), but the first row contains data that:
Needs to be parsed with different fixed widths than the data below it and
The parsed data from that first row needs to be appended to each
item in the data below it, after that data has been separately parsed.
What is the best way to approach this? I'm relatively new to SSIS, so I've tried using a Row Count and a conditional split to separate out the first row, but I'm not sure how to parse the data outside of the Flat file importer. I've read that using a Script Transform could work, but I don't know what the code should be...
By way of example, if I had flat data that looks like:
Hamilton Beach 20150410 Sunny
Bob Male Blue Black
Bill Male BrownBrown
GeorgeMale GreenBlonde
JackieFemaleGreenBlack
Jill FemaleBlue Black
It should be in the output table as:
Hamilton Beach, 20150410, Sunny, Bob, Male, Blue, Black
Hamilton Beach, 20150410, Sunny, Bill, Male, Brown, Brown
Hamilton Beach, 20150410, Sunny, George, Male, Green, Blonde
Hamilton Beach, 20150410, Sunny, Jackie, Female, Green, Black
Hamilton Beach, 20150410, Sunny, Jill, Female, Blue, Black
You are in luck. SSIS does not support mixed record types but you can get away with it because you only have 1 header row.
My implementation would look like a Script Task that reads the first line of my file and a Data Flow task that reads the rest of the data.
Read the first line
This one is simple. Create an SSIS Variable, call it FirstLine of type String. Pass that value as a read/write value into a script task.
Use the code from this answer
Read only the first few lines of text from a file
Now you simply need to push the value of line1 into our SSIS level Variable. That looks like
Dts.Variables["User::FirstLine"].Value = line1;
This assumes you want the whole line stored into FirstLine. If you need to portion it out into individual fields, then you'll need to implement that logic. You don't provide guidance on how to delimit "Hamilton Beach 20150410 Sunny" into individual pieces but the above logic holds true. Parse and assign into different SSIS level variables.
My specific implementation created 3 SSIS Variables, all of type string
User::HeaderIHaveNoIdeaWhatThisIs
User::HeaderObservationDate
User::HeaderWeather
The following code represents what's already been linked
using System;
using System.Data;
using System.IO;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;

namespace ST_7edd5e6df63a4837afac15b86c21d639.csproj
{
    [System.AddIn.AddIn("ScriptMain", Version = "1.0", Publisher = "", Description = "")]
    public partial class ScriptMain : Microsoft.SqlServer.Dts.Tasks.ScriptTask.VSTARTScriptObjectModelBase
    {
        #region VSTA generated code
        enum ScriptResults
        {
            Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
            Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
        };
        #endregion

        public void Main()
        {
            // User::HeaderIHaveNoIdeaWhatThisIs, User::HeaderObservationDate, User::HeaderWeather
            // https://stackoverflow.com/questions/9439733/read-only-the-first-few-lines-of-text-from-a-file
            string line1 = string.Empty;
            using (StreamReader reader = new StreamReader(@"C:\ssisdata\so_29811494.txt"))
            {
                line1 = reader.ReadLine();
            }

            // Magic here to understand how to split this out. Assuming this is also fixed width,
            // a horrible, hard-coded, brittle approach is taken:
            // Hamilton Beach      20150410    Sunny
            string h1, h2, h3;
            h1 = line1.Substring(0, 20).TrimEnd();
            h2 = line1.Substring(20, 12).TrimEnd();
            h3 = line1.Substring(32, line1.Length - 32);

            Dts.Variables["User::HeaderIHaveNoIdeaWhatThisIs"].Value = h1;
            Dts.Variables["User::HeaderObservationDate"].Value = h2;
            Dts.Variables["User::HeaderWeather"].Value = h3;

            Dts.TaskResult = (int)ScriptResults.Success;
        }
    }
}
Read the rest of the data
In your flat file connection manager, you want to change the value of "Header rows to skip" from 0 to 1. This indicates that validation and parsing of data should not begin until the first N rows have been skipped. Define your connection manager as usual otherwise.
Connect a Data Flow Task to the above Script Task. Within the Data Flow Task, use a Flat File Source and connect a Derived Column Component. The Derived Column Component is how we're going to get the value from our SSIS Variable into the data flow. Add a new column called HeaderColumn and use an expression like @[User::FirstLine].
If you notice that the column on the right indicates a data type of DT_NTEXT, that's probably not going to match the target column definition. You might need to do something like substring the variable: SUBSTRING(@[User::FirstLine], 1, 20). That results in a data type of DT_WSTR and a length of 20. Your goal is to make this match the target definition.
You may also need to make that a DT_STR data type instead of DT_WSTR. In that case, add an explicit cast in front of your substring operation: (DT_STR, 20, 1252)SUBSTRING(@[User::FirstLine], 1, 20)
Source data
I defined my file based on supplied data (click edit on question to get definitions without the stripping of white space)
Hamilton Beach 20150410 Sunny
Bob Male Blue Black
Bill Male BrownBrown
GeorgeMale GreenBlonde
JackieFemaleGreenBlack
Jill FemaleBlue Black
I have a CSV file with following columns
ID State Email
1020202034566949 LA r1#abc.com
1020202034543245 CA r2#abc.com
1020202034521234 TX r3#abc.com
1020202034521345 TN r4#abc.com
1020202034589789 NY r5#abc.com
But when I import them into a SQL table I get the following result:
ID State Email
1.020202034566949E+15 LA r1#abc.com
1.020202034543245E+15 CA r2#abc.com
1.020202034521234E+15 TX r3#abc.com
1.020202034521345E+15 TN r4#abc.com
1.020202034589789E+15 NY r5#abc.com
As the column is an ID, I need it to be imported exactly as it appears in the file.
I tried changing the format in Excel to various formats (number without decimals, the special category, and so on) but none of them helped.
You need to pre-format the cell in Excel as "number" and set decimal places to 0. Currently it will be set to "general", which struggles with larger numbers such as yours.
Had the same issue myself earlier.
EDIT
As commented below, you will obviously need to account for such a large number in SQL Server; bigint will be enough.
EDIT 2
I think the only way for you to go would be to set this as a text field; I've just tested and it doesn't actually alter the number this way.
The issue is that Excel will only hold numbers accurately up to 15 characters.
The workaround here is to keep it as a text field in Excel, then cast it to a bigint during the import.
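A sketch of that cast step (the staging and target table names here are made up), assuming the CSV lands in a staging table with the ID as text first:

```sql
-- Staging table: the CSV is loaded here with the ID kept as text,
-- so Excel's 15-digit precision limit never comes into play
CREATE TABLE staging_import (
    id_text varchar(20),
    state   char(2),
    email   varchar(100)
);

-- Cast the text ID to bigint on the way into the final table
INSERT INTO final_table (id, state, email)
SELECT CAST(id_text AS bigint), state, email
FROM staging_import;
```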