Add text to the start of a SOLR database value - sql

I am using this code for my SOLR DIH:
<dataConfig>
<dataSource name="app" driver="org.postgresql.Driver" url="jdbc:postgresql://localhost:5432/wikipedia" user="wikipedia" password="secret" />
<dataSource name="data" driver="org.postgresql.Driver" url="jdbc:postgresql://localhost:5432/wikipedia" user="wikipedia" password="secret" />
<document>
<entity dataSource="app-ds" name="item" query="SELECT id, title, description, date, link, location_id, source_id, company_id from item">
<field column="id" name="id" />
<field column="title" name="title" />
<field column="description" name="description" />
<field column="date" name="date" />
<field column="link" name="link" />
<entity dataSource="app-ds" name="location" query="SELECT name, coordinate from location where location_id=${item.location_id}">
<field column="name" name="location_name" />
<field column="coordinate" name="location_coordinates" />
</entity>
<entity dataSource="app-ds" name="source" query="SELECT name from source where source_id=${item.source_id}">
<field column="name" name="source_name" />
</entity>
<entity dataSource="app-ds" name="company" query="SELECT name from company where company_id=${item.company_id}">
<field column="name" name="company_name" />
</entity>
</entity>
</document>
</dataConfig>
Since I am merging two databases, I want a unique ID for each entry within Solr. In my case the best way to do this is to use app*ID* for IDs from the first database and data*ID* for IDs from the second.
Using my code above, how do I add the word "app" to the front of the ID stored in the Solr ID field, so that database ID = 123 becomes Solr ID = app123?
EDIT: I guess it might be something like this (but I am not good with SQL):
query="SELECT app_(id)

You can modify the SQL query like this:
SELECT 'TABLE1' || ID AS PRIMARY_ID ........
PRIMARY_ID can now be configured as the unique ID in Solr.
<field column="primary_id" name="primary_id" />
Even with incremental updates through delta imports, the query above generates the same Solr ID for an updated SQL ID, so the existing document is updated. Be sure to use the same expression in the delta queries as well.
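A minimal sketch of the concatenation idea, using Python's built-in sqlite3 module as a stand-in for PostgreSQL. That substitution is an assumption made only so the snippet runs without a database server; the table contents and the 'app' prefix are illustrative.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL "item" table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO item (id, title) VALUES (?, ?)",
    [(123, "first"), (124, "second")],
)

# Prefix every id with a literal so ids coming from different databases cannot collide.
rows = conn.execute(
    "SELECT 'app' || id AS primary_id, title FROM item ORDER BY id"
).fetchall()
print(rows)
```

In the DIH entity, the same expression would sit in the query attribute, with primary_id mapped as the field.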

Related

How to insert bulk csv file into SQL Server

I have a CSV file that has columns like this:
"Hold","EmpID","Source","Shard-Exists","Type"
but my DB table looks like this:
//Note: My Id is auto increment
"Id","Hold","EmpID","Source","Type","CurrentDate"
I'm wondering how I can bulk insert my CSV file into the database table without the Shard-Exists column while also populating CurrentDate automatically.
Any help or suggestion would be really appreciated.
TRUNCATE TABLE dbo.Actors;
GO
-- import the file
BULK INSERT dbo.Actors
FROM 'C:\Documents\Skyvia\csv-to-mssql\actor.csv'
WITH
(
FORMAT='CSV',
FIRSTROW=2
)
GO
You should be able to use a format file to accomplish the 'skip columns in table' task. I'll modify the example from the MS docs.
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="25" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="25" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="25" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR="," MAX_LENGTH="25" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="5" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="30" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="Hold" xsi:type="SQLINT"/>
<COLUMN SOURCE="2" NAME="EmpID" xsi:type="SQLINT"/>
<COLUMN SOURCE="3" NAME="Source" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="5" NAME="Type" xsi:type="SQLVARYCHAR"/>
</ROW>
</BCPFORMAT>
Note that in the <ROW> portion I'm not specifying anything for the Id or CurrentDate columns. Also note that there's no SOURCE="4"; that's how the Shard-Exists field (the fourth field in the source data) is skipped.
As to auto-generating a value for CurrentDate, my recommendation would be to add a default constraint to your table. That can be done like so:
ALTER TABLE dbo.Actors
ADD CONSTRAINT DF_Actors__CurrentDate
DEFAULT (getdate()) FOR CurrentDate;
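To illustrate why a default helps here, the sketch below inserts a row without naming the Id or CurrentDate columns and lets the database fill them in. It uses Python's sqlite3 module with CURRENT_TIMESTAMP as a stand-in for SQL Server's getdate(); the column types and sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Id auto-increments and CurrentDate has a default, so neither appears in the INSERT.
conn.execute(
    """
    CREATE TABLE Actors (
        Id INTEGER PRIMARY KEY AUTOINCREMENT,
        Hold INTEGER,
        EmpID INTEGER,
        Source TEXT,
        Type TEXT,
        CurrentDate TEXT DEFAULT CURRENT_TIMESTAMP
    )
    """
)
conn.execute(
    "INSERT INTO Actors (Hold, EmpID, Source, Type) VALUES (?, ?, ?, ?)",
    (0, 42, "hr", "full"),  # illustrative values only
)
row = conn.execute("SELECT Id, CurrentDate FROM Actors").fetchone()
print(row)
```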

Bulk insert csv file with semicolon as delimiter

I'm trying to import data from a semicolon-separated CSV file into a SQL Server database. Here is the table structure:
CREATE TABLE [dbo].[waste_facility]
(
[Id] INT IDENTITY (1, 1) NOT NULL,
[postcode] VARCHAR (50) NULL,
[name] VARCHAR (50) NULL,
[type] VARCHAR (255) NULL,
[street] VARCHAR (255) NULL,
[suburb] VARCHAR (255) NULL,
[municipality] VARCHAR (255) NULL,
[telephone] VARCHAR (255) NULL,
[website] VARCHAR (255) NULL,
[longtitude] DECIMAL (18, 8) NULL,
[latitude] DECIMAL (18, 8) NULL,
PRIMARY KEY CLUSTERED ([Id] ASC)
);
The csv file is shown below:
Location Coordinate;Feature Extent;Projection;Postcode;Name Of Facility;Type Of Facility;Street;Suburb;Municipality;Telephone Number;Website;Easting Coordinate;Northing Coordinate;Longitude Coordinate;Latitude Coordinate;Google Maps Direction
-37.9421182892,145.3193857967;"{""coordinates"": [145.3193857967, -37.9421182892], ""type"": ""Point""}";MGA zone 55;3156;Cleanaway Lysterfield Resource Recovery Centre;Recovery Centre;840 Wellington Road;LYSTERFIELD;Yarra Ranges;9753 5411;https://www.cleanaway.com.au/location/lysterfield/;352325;5799275;145.31938579674124;-37.94211828921733;https://www.google.com.au/maps/dir//-37.94211828921733,145.31938579674124/#your+location,17z/data=!4m2!4m1!3e0
-38.0529529215,145.2433557709;"{""coordinates"": [145.2433557709, -38.0529529215], ""type"": ""Point""}";MGA zone 55;3175;Smart Recycling (South Eastern Depot);Recycling Centre;185 Dandenong-Hastings Rd;LYNDHURST;Greater Dandenong;8787 3300;https://smartrecycling.com.au/;345876;5786853;145.24335577090602;-38.05295292152536;https://www.google.com.au/maps/dir//-38.05295292152536,145.24335577090602/#your+location,17z/data=!4m2!4m1!3e0
-38.0533129717,145.267610135;"{""coordinates"": [145.267610135, -38.0533129717], ""type"": ""Point""}";MGA zone 55;3976;Hampton Park Transfer Station (Outlook Environmental);Transfer Station;274 Hallam Road;HAMPTON PARK;Casey;9554 4502;https://www.suez.com.au/en-au/who-we-are/suez-in-australia-and-new-zealand/our-locations/waste-management-hampton-park-transfer-station;348005;5786853;145.2676101350274;-38.053312971691255;https://www.google.com.au/maps/dir//-38.053312971691255,145.2676101350274/#your+location,17z/data=!4m2!4m1!3e0
-38.1243050577,145.2183465487;"{""coordinates"": [145.2183465487, -38.1243050577], ""type"": ""Point""}";MGA zone 55;3977;Frankston Regional Recycling and Recovery Centre;Recycling Centre;20 Harold Road;SKYE;Frankston;1300 322 322;https://www.frankston.vic.gov.au/Environment-and-Waste/Waste-and-Recycling/Frankston-Regional-Recycling-and-Recovery-Centre-FRRRC/Accepted-Items-at-FRRRC;343833;5778893;145.21834654873447;-38.12430505770815;https://www.google.com.au/maps/dir//-38.12430505770815,145.21834654873447/#your+location,17z/data=!4m2!4m1!3e0
-38.0973208774,145.4920399066;"{""coordinates"": [145.4920399066, -38.0973208774], ""type"": ""Point""}";MGA zone 55;3810;Pakenham Waste Transfer Station (Future Recycling);Transfer Station;30-32 Exchange Drive;PAKENHAM;Cardinia;13Recycling;https://www.futurerecycling.com.au/;367776;5782313;145.4920399066473;-38.09732087738631;https://www.google.com.au/maps/dir//-38.09732087738631,145.4920399066473/#your+location,17z/data=!4m2!4m1!3e0
There are some columns that I don't need, so I created a format file to import the data. The format file is shown below:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharFixed" LENGTH="50"/>
<FIELD ID="12" xsi:type="CharFixed" LENGTH="50"/>
<FIELD ID="13" xsi:type="CharFixed" LENGTH="50"/>
<FIELD ID="2" xsi:type="CharFixed" LENGTH="50" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="3" xsi:type="CharFixed" LENGTH="50" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="4" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="5" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="6" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="7" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="8" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="9" xsi:type="CharFixed" LENGTH="255" COLLATION="SQL_Latin1_General_CP1_CI_AS"/>
<FIELD ID="14" xsi:type="CharFixed" LENGTH="50"/>
<FIELD ID="15" xsi:type="CharFixed" LENGTH="50"/>
<FIELD ID="10" xsi:type="CharFixed" LENGTH="41"/>
<FIELD ID="11" xsi:type="CharTerm" TERMINATOR="\r\n" MAX_LENGTH="41"/>
<FIELD ID="16" xsi:type="CharFixed" LENGTH="50"/>
</RECORD>
<ROW>
<COLUMN SOURCE="2" NAME="postcode" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="3" NAME="name" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="4" NAME="type" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="5" NAME="street" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="6" NAME="suburb" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="7" NAME="municipality" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="8" NAME="telephone" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="9" NAME="website" xsi:type="SQLVARYCHAR"/>
<COLUMN SOURCE="10" NAME="longtitude" xsi:type="SQLDECIMAL" PRECISION="18" SCALE="8"/>
<COLUMN SOURCE="11" NAME="latitude" xsi:type="SQLDECIMAL" PRECISION="18" SCALE="8"/>
</ROW>
</BCPFORMAT>
Then I tried both BULK INSERT and bcp in; neither of them worked.
Here is the BULK INSERT command:
USE [waste-facility-locations];
BULK INSERT [dbo].[waste_facility]
FROM 'E:\onboardingIteration\waste-facility-locations.csv'
WITH (FORMATFILE = 'E:\onboardingIteration\waste_facility_formatter.xml',
FIRSTROW = 2,
LASTROW = 6,
FIELDTERMINATOR = ';',
ROWTERMINATOR = '\n',
ERRORFILE = 'E:\onboardingIteration\myRubbishData.log');
Unfortunately, error files were generated. Here is what the myRubbishData.log error file says:
Row 2 File Offset 1993 ErrorFile Offset 0 - HRESULT 0x80004005
And the actual row stored in myRubbishData.txt:
;Pakenham Waste Transfer Station (Future Recycling);Transfer Station;30-32 Exchange Drive;PAKENHAM;Cardinia;13Recycling;https://www.futurerecycling.com.au/;367776;5782313;145.4920399066473;-38.09732087738631;https://www.google.com.au/maps/dir//-38.09732087738631,145.4920399066473/#your+location,17z/data=!4m2!4m1!3e0;Pakenham Waste Transfer Station (Future Recycling);Transfer Station;30-32 Exchange Drive;PAKENHAM;Cardinia;13Recycling;https://www.futurerecycling.com.au/;367776;5782313;145.4920399066473;-38.09732087738631;https://www.google.com.au/maps/dir//-38.09
As you can see, it seems the rows are not being separated correctly. So I tried changing the row terminator to "\n", "\r", "\n\r", and "\r\n"; none of them worked.
I also tried bcp. It did not work either.
Here is the bcp command I used:
bcp [waste-facility-locations].[dbo].[waste_facility] in "E:\onboardingIteration\waste-facility-locations.csv" -f "E:\onboardingIteration\waste_facility_formatter.xml" -T -S "(LocalDB)\MSSQLLocalDB" -F 2 -t ";" -r "\n"
Then I got an error that says more or less the same thing:
SQLState = S1000, NativeError = 0
Error = [Microsoft][ODBC Driver 17 for SQL Server]Unexpected EOF encountered in BCP data-file
0 rows copied.
Network packet size (bytes): 4096
Clock Time (ms.) Total : 1
One interesting thing is that if I create a new Excel workbook and use the "Get data" option to import the CSV file, the file is parsed correctly.
Basically, I can't find what I did wrong here. Can someone help me with this?
The SQL Server import facilities are very intolerant of bad data and even just formatting variations or options. In my career, I have literally spent thousands of work-hours trying to develop and debug import procedures for customers. I can tell you right now, that trying to fix this with SQL alone is both difficult and time-consuming.
When you have this problem (bad data and/or inconsistent formatting), it is almost always easier to find or develop a more flexible tool to pre-process the data into the rigid standard that SQL expects. So I would say that if Excel can parse it, then just use Excel automation to pre-process the files and then use SQL to import the Excel output. If that's not practical for you, then I'd advise writing your own tool in some client language (C#, VB, Java, Python, etc.) to pre-process the files.
You can do it in SQL (and I have done it many times), but I promise you that it is a long complicated trek.
SSIS has more flexible error-handling for problems like this, but if you are not already familiar and using it, it has a very steep learning curve and your first SSIS project is likely to be very time-consuming also.
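As a rough sketch of the pre-processing approach described above, the following Python snippet reads the semicolon-separated file with the standard csv module (which handles the quoted JSON field) and writes out only the columns the table needs. The function name preprocess and the COLUMN_MAP dictionary are hypothetical; the header names and the sample row are copied from the question.

```python
import csv
import io

# Source header -> target table column, taken from the question's CSV header and table.
COLUMN_MAP = {
    "Postcode": "postcode",
    "Name Of Facility": "name",
    "Type Of Facility": "type",
    "Street": "street",
    "Suburb": "suburb",
    "Municipality": "municipality",
    "Telephone Number": "telephone",
    "Website": "website",
    "Longitude Coordinate": "longtitude",
    "Latitude Coordinate": "latitude",
}


def preprocess(src, dst):
    """Copy only the mapped columns from a semicolon-separated source to dst."""
    reader = csv.DictReader(src, delimiter=";")
    writer = csv.writer(dst, delimiter=";", lineterminator="\r\n")
    writer.writerow(list(COLUMN_MAP.values()))
    for record in reader:
        writer.writerow([record[name] for name in COLUMN_MAP])


# Header plus the first data row from the question.
sample = (
    "Location Coordinate;Feature Extent;Projection;Postcode;Name Of Facility;"
    "Type Of Facility;Street;Suburb;Municipality;Telephone Number;Website;"
    "Easting Coordinate;Northing Coordinate;Longitude Coordinate;"
    "Latitude Coordinate;Google Maps Direction\n"
    '-37.9421182892,145.3193857967;"{""coordinates"": [145.3193857967, -37.9421182892], '
    '""type"": ""Point""}";MGA zone 55;3156;Cleanaway Lysterfield Resource Recovery Centre;'
    "Recovery Centre;840 Wellington Road;LYSTERFIELD;Yarra Ranges;9753 5411;"
    "https://www.cleanaway.com.au/location/lysterfield/;352325;5799275;"
    "145.31938579674124;-37.94211828921733;"
    "https://www.google.com.au/maps/dir//-37.94211828921733,145.31938579674124/"
    "#your+location,17z/data=!4m2!4m1!3e0\n"
)

out = io.StringIO(newline="")
preprocess(io.StringIO(sample), out)
print(out.getvalue())
```

With real files, the two StringIO objects would be replaced by files opened with newline="". The cleaned output has a fixed column order and a consistent \r\n row terminator, so it needs no skipped fields when imported.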

Update Single XML Node Value of XML Column using SQL Server

I want to update a single XML node value in SQL Server.
Below is the table structure
XML Structure
<PayDetails>
<Column Name="FG" DataType="float" Value="7241" />
<Column Name="SKILL" DataType="float" Value="3" />
<Column Name="PI" DataType="float" Value="87" />
<Column Name="MD" DataType="float" Value="30" />
<Column Name="LD" DataType="float" Value="4" />
<Column Name="WEEKOFF_DAYS" DataType="float" Value="4" />
<Column Name="NETPAY" DataType="float" Value="5389" />
</PayDetails>
I want to update the value of FG from 7241 to 8000.
You want to use the replace value of ... with keywords.
Try something like the following:
update tablename
set TransactionFieldDetails.modify(
'replace value of
(/PayDetails/Column[@Name="FG"]/@Value)[1]
with "8000"');
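The path in that statement selects the Value attribute of the Column element whose Name attribute is "FG". As a way to try the same selection outside SQL Server, here is a small Python sketch using xml.etree.ElementTree. This is a different tool from the XML DML modify() method, shown only to illustrate the attribute predicate; the XML is a shortened copy of the sample.

```python
import xml.etree.ElementTree as ET

xml_text = """<PayDetails>
  <Column Name="FG" DataType="float" Value="7241" />
  <Column Name="NETPAY" DataType="float" Value="5389" />
</PayDetails>"""

root = ET.fromstring(xml_text)

# Same predicate as in the XQuery path: the Column element whose Name attribute is "FG".
column = root.find("Column[@Name='FG']")
column.set("Value", "8000")

print(ET.tostring(root, encoding="unicode"))
```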

Extract data from XML Clob using SQL from db2

I want to extract the value of Decision using SQL in DB2 from the table TRAPTABCLOB, which has a column testclob with XML stored as a CLOB.
Sample XML is below:
<?xml version="1.0" encoding="UTF-8"?>
<DCResponse>
<Status>Success</Status>
<Authentication>
<Status>Success</Status>
</Authentication>
<ResponseInfo>
<ApplicationId>5701200</ApplicationId>
<SolutionSetInstanceId>
63a5c214-b5b5-4c45-9f1e-b839a0409c24
</SolutionSetInstanceId>
<CurrentQueue />
</ResponseInfo>
<ContextData>
<!--Decision Details Start-->
<Field key="SoftDecision">A</Field>
<Field key="**Decision**">1</Field>
<Field key="NodeNo">402</Field>
<Field key="NodeDescription" />
<!--Decision Details End-->
<!--Error Details Start-->
<Field key="ErrorResponse">
<Response>
<Status>[STATUS]</Status>
<ErrorCode>[ERRORCODE]</ErrorCode>
<ErrorDescription>[ERRORDESCRIPTION]</ErrorDescription>
<Segment>[SEGMENT]</Segment>
</Response>
</Field>
<Field key="ErrorCode">0</Field>
<Field key="ErrorDescription" />
</ContextData>
</DCResponse>
One of the nice things about using XMLTABLE() is that it produces an expression that can be used as a subquery or joined to a table or another SQL expression.
SELECT x.decision
FROM traptabclob, XMLTABLE(
'$d/DCResponse/ContextData[1]' PASSING XMLPARSE(DOCUMENT testclob) AS "d"
COLUMNS
DECISION CHAR(1) PATH 'Field[@key="**Decision**"][1]'
) AS x
;

Bulk Insert using format file for varying number of columns between datafile and actual database table

This is the Schema for a table
CREATE TABLE dbo.Project
(
ProjectID int NOT NULL,
ManagerID int NOT NULL,
CompanyID int NOT NULL,
Title nvarchar(50) NOT NULL,
StartDate datetime NOT NULL,
EndDate datetime NULL,
ProjDescription nvarchar(max)
)
I created a datafile called bob.dat from this table which has around 15 rows with the following bcp command
bcp "Select ProjectID,ManagerID,CompanyID,Title,StartDate from CATS.dbo.Project" queryout "C:\Documents\bob.dat" -Sbob-pc -T -n
Also a format/mapping file called bob.fmt was created using the following bcp command
bcp CATS.dbo.Project format nul -f C:\Documents\bob.fmt -x -Sbob-pc -T -n
Then I created a copy of the table Project:
CREATE TABLE dbo.ProjectCopy
(
ProjectID int NOT NULL,
ManagerID int NOT NULL,
CompanyID int NOT NULL,
Title nvarchar(50) NOT NULL,
StartDate datetime NOT NULL,
EndDate datetime NULL,
ProjDescription nvarchar(max)
)
What I want to do now is use bob.dat and the bob.fmt format file to populate the ProjectCopy table with the following BULK INSERT statement:
BULK INSERT CATS.dbo.ProjectCopy
FROM 'C:\Documents\bob.dat'
WITH (FORMATFILE = 'C:\Documents\bob.fmt',
LASTROW=5,
KEEPNULLS,
DATAFILETYPE='native');
GO
SELECT * FROM CATS.dbo.ProjectCopy
GO
So basically the data file does not contain any data for the columns EndDate and ProjDescription. I want these two columns to remain NULL. Unfortunately, I get the following error when I run the BULK INSERT statement:
Msg 4863, Level 16, State 4, Line 2
Bulk load data conversion error (truncation) for row 1, column 6 (EndDate).
Msg 7399, Level 16, State 1, Line 2
The OLE DB provider "BULK" for linked server "(null)" reported an error.
The provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 2
Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".
(0 row(s) affected)
Does anyone have a clue how this could be fixed?
For the record, I have already looked at the following questions, and the solutions provided there didn't work for me:
BULK INSERT with inconsistent number of columns,
Can't identify reason for BULK INSERT errors,
BULK INSERT with inconsistent number of columns
First of all, when creating the data file bob.dat you need to add the two columns EndDate and ProjDescription. In addition, a bulk copy operation that uses Unicode characters needs the -w argument.
Example:
bcp "Select ProjectID,ManagerID,CompanyID,Title,StartDate, NULL AS EndDate, NULL AS ProjDescription from CATS.dbo.Project" queryout "C:\Users\Pawan\Documents\bob.dat" -Sbob-pc -T -n -w
The original format file:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="NCharTerm" TERMINATOR="\t\0" MAX_LENGTH="24"/>
<FIELD ID="2" xsi:type="NCharTerm" TERMINATOR="\t\0" MAX_LENGTH="24"/>
<FIELD ID="3" xsi:type="NCharTerm" TERMINATOR="\t\0" MAX_LENGTH="24"/>
<FIELD ID="4" xsi:type="NCharTerm" TERMINATOR="\t\0" MAX_LENGTH="100" COLLATION="Cyrillic_General_CI_AS"/>
<FIELD ID="5" xsi:type="NCharTerm" TERMINATOR="\t\0" MAX_LENGTH="48"/>
<FIELD ID="6" xsi:type="NCharTerm" TERMINATOR="\t\0" MAX_LENGTH="48"/>
<FIELD ID="7" xsi:type="NCharTerm" TERMINATOR="\r\0\n\0" COLLATION="Cyrillic_General_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="ProjectID" xsi:type="SQLINT"/>
<COLUMN SOURCE="2" NAME="ManagerID" xsi:type="SQLINT"/>
<COLUMN SOURCE="3" NAME="CompanyID" xsi:type="SQLINT"/>
<COLUMN SOURCE="4" NAME="Title" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="5" NAME="StartDate" xsi:type="SQLDATETIME"/>
<COLUMN SOURCE="6" NAME="EndDate" xsi:type="SQLDATETIME"/>
<COLUMN SOURCE="7" NAME="ProjDescription" xsi:type="SQLNVARCHAR"/>
</ROW>
</BCPFORMAT>
But you want to fill the data only up to StartDate, so this file needs to be changed:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="NCharTerm" TERMINATOR="\t\0" MAX_LENGTH="24"/>
<FIELD ID="2" xsi:type="NCharTerm" TERMINATOR="\t\0" MAX_LENGTH="24"/>
<FIELD ID="3" xsi:type="NCharTerm" TERMINATOR="\t\0" MAX_LENGTH="24"/>
<FIELD ID="4" xsi:type="NCharTerm" TERMINATOR="\t\0" MAX_LENGTH="100" COLLATION="Cyrillic_General_CI_AS"/>
<FIELD ID="5" xsi:type="NCharTerm" TERMINATOR="\r\0\n\0" MAX_LENGTH="48"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="ProjectID" xsi:type="SQLINT"/>
<COLUMN SOURCE="2" NAME="ManagerID" xsi:type="SQLINT"/>
<COLUMN SOURCE="3" NAME="CompanyID" xsi:type="SQLINT"/>
<COLUMN SOURCE="4" NAME="Title" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="5" NAME="StartDate" xsi:type="SQLDATETIME"/>
</ROW>
</BCPFORMAT>
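When trimming a format file by hand like this, it is easy to leave a COLUMN pointing at a FIELD that no longer exists. The sketch below is a hypothetical helper (unmapped_sources is not part of bcp or SQL Server) that parses a format file with Python's xml.etree.ElementTree and lists any SOURCE values without a matching FIELD ID. The embedded XML is a shortened two-field example, not the full file.

```python
import xml.etree.ElementTree as ET

# Namespace used by XML bulk-load format files.
NS = {"b": "http://schemas.microsoft.com/sqlserver/2004/bulkload/format"}


def unmapped_sources(format_xml):
    """Return COLUMN SOURCE values that have no FIELD with the same ID."""
    root = ET.fromstring(format_xml)
    field_ids = {f.get("ID") for f in root.findall("b:RECORD/b:FIELD", NS)}
    sources = [c.get("SOURCE") for c in root.findall("b:ROW/b:COLUMN", NS)]
    return [s for s in sources if s not in field_ids]


format_xml = """<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="NCharTerm" TERMINATOR="\\t\\0" MAX_LENGTH="24"/>
<FIELD ID="2" xsi:type="NCharTerm" TERMINATOR="\\r\\0\\n\\0" MAX_LENGTH="24"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="ProjectID" xsi:type="SQLINT"/>
<COLUMN SOURCE="2" NAME="ManagerID" xsi:type="SQLINT"/>
</ROW>
</BCPFORMAT>"""

consistent = unmapped_sources(format_xml)
# Simulate a leftover COLUMN that still refers to a removed FIELD.
broken = unmapped_sources(format_xml.replace('SOURCE="2"', 'SOURCE="7"'))
print(consistent, broken)
```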