UPDATE query produces no change - sql

I wrote the following UPDATE statement:
UPDATE print_archive
SET pages = pages*2
WHERE printer ILIKE '%plot%' AND paper_size = 'Arch D';
It will look in a table named print_archive for any printers with "plot" in their name and the paper size 'Arch D'.
If it finds any it is suppose to multiply the page count by 2.
Below is a sample of the print_archive table data -
(column names)
"name_id","printer","pages","source","date","file_name","duplex","paper_size"
Sample Data:
jane, \\PRINTSRV\plot9, 1, \\COMP-01, 01/21/2017 14:30:39, hello_world.pdf, No, Arch D,
billy, \\PRINTSRV\Plot13, 1, \\COMP-02, 02/20/2016 10:37:23, bye_world.doc, No, Arch D,
But no matter what I change or how many times I run the UPDATE statement it always returns 0.
What am I doing wrong?

So initially I had added the latin1 encoding when importing the CSV file as the import process kept producing encoding error messages when using the default encode. I switched the import of the CSV file to 'latin2' and that worked better. There does seem to be a leading space as well in some of the fields so adding '%' helped as well (though this was not working in latin1). pgAdmin4 (which is how my users will interact) handles latin2 pretty good. Though from psql cmd I need to set the client encoding to latin2 before it can work (SET client_encoding to 'latin2';)
Thanks guys. All your recommendations and leads helped me.

Related

BigQuery: Convert a text column to UTF-8

I want to start a Vertex AI AutoML Text Entity Extraction Batch Prediction Job, but in my own experience, texts ("content" field in the JSONL structure), must also accomplish the following two features:
Every text's size in bytes, must be between 10 and 10000 bytes: DONE
Every text encoding must be UTF-8: UNKNOWN
My original data is stored in BigQuery, so I'll have to export it to Google Cloud Storage for later batch prediction. To take advantage of BigQuery optimization, I want to accomplish the 2 previous tasks in the BigQuery data source table itself. I have checked Google's official documentation, and the closest I have got to some related information, is this; however not accurate VS what I want. BTW, the query looks as follows:
WITH mydata AS (
SELECT
CASE
WHEN BYTE_LENGTH(posting)>10000 THEN LEFT(posting, 9950)
WHEN BYTE_LENGTH(posting)<10 THEN CONCAT(posting, " is possibly an skill")
ELSE posting
END AS posting
FROM `my-project.Machine_Learning_Datasets.sample-data-source` -- Modified for data protection
)
SELECT
posting as content, -- Something needs to be done here
"text" as mimeType
FROM mydata
And my-project.Machine_Learning_Datasets.sample-data-source schema looks as follows:
Field name
Type
Mode
Records
posting
STRING
NULLABLE
100M
Any ideas?
The following answer did the job, FYI:
WITH
mydata AS (
SELECT
CASE
WHEN BYTE_LENGTH(posting)>10000 THEN LEFT(posting, 9950)
WHEN BYTE_LENGTH(posting)<10 THEN CONCAT(posting, " is possibly an skill")
ELSE
posting
END
AS posting
FROM
`my-project.Machine_Learning_Datasets.sample-data-source` )
SELECT
REGEXP_REPLACE(posting, r'[^\x00-\x7F]+', '') AS content,
"text/plain" AS mimeType
FROM
mydata
UPDATE: This case has been considered, for an improved workaround.
Thanks!

Understanding the "Not found: Dataset ### was not found in location US" error

I know this topic has come up many times but still here I am. Data processing location seems consistent (dataset, US; query: US) and I am using backticks & long format in the FROM clause
Below are two sequences of code. The first one works perfectly:
SELECT station_id
FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
Whereas the following returns an error message:
SELECT bikeshare_stations.station_id
FROM `bigquery-public-data.austin_bikeshare`
Not found: Dataset glassy-droplet-347618:bigquery-public-data was not found in location US
My question, thus, is why do the first lines of text work while the second doesn't?
You need to understand the different parts of the backticks:
bigquery-public-data is the name of the project;
austin_bikeshare is the name of the schema (aka dataset in BQ); and
bikeshare_stations is the name of the table/view.
Therefore, the shorter format you are looking for is: austin_bikeshare.bikeshare_stations (instead of bigquery-public-data.austin_bikeshare).
Using bigquery-public-data.austin_bikeshare means that you have a schema called bigquery-public-data that contains a table called austin_bikeshare , when this is not true.

SQL Full Text Contains not returning expected rows

I have a Body column that is full text indexed and is nvarchar(max)
One row has this in the Body column
You want slighty mad this sat the 60th runing of the 3peaks race! Peny-ghent whernside and inglbauher! Only in yorkshire!
If I run: select body from messages where CONTAINS(Body,'you') it doesn't return any data.
If I run the below adding wildcards select messageid,body from messages where CONTAINS(body,'"*you*"') it still doesnt return the data.
Can you help me understand what's going on please?
Thanks
UPDATE : It makes no difference if its you or You, either way no results
It can be case sensitivity issue. Try with select messageid,body from messages where CONTAINS(body,'"*You*"') and see if you are getting the result or not
A full text catalog has a set of words in a “stoplist” that it won’t search on as SQL Server considers them “unimportant for search purposes”
To get this you can run
select ssw.*
from sys.fulltext_system_stopwords ssw
where ssw.language_id = 1033;
Below are the words it won’t search on and you’ll see it contains “you” hence why it didn’t find my data.

Export Data from SQL to CSV

I'm using EntityFramework to access a sql server to return data. The data needs to be formatted into a tab delimited file. I then want to compress the data to return to the user.
I can do the select, and then iterate over the EF objects and format all the data into one big string- but this takes forever (I'm returning abouit 800k rows). The query itself is quite fast, but its just the creating of the csv file in memory that is killing it.
I found this post that describes how to use sqlcmd to do this directly as an export (but with csv) with sql which seems very promising, but I'm unclear how to pass the -E and other parameters to ExecuteSqlCommand()... or if it is even meant for this.
I tried to do something like this:
var test = context.Database.ExecuteSqlCommand("select Chromosome c,
StartLocation sl, Endlocation el, GeneName gn from Gencode where c = chr1",
"-E", "-Q", new SqlParameter("-s", "\t"));
But of course that didn't work...
Any suggestions as to how to go about this? I'm using EF 6.1 if that matters.
Alternate option using simple method.
F5-->store result--> keep file name

Apache solr - more like this score

I have a small index with ~1000 documents with only two fields:
- id (string)
- content (text_general)
I noticed that when I do MLT search by id for similar content, the original document(which id is the searched id) have a score 5.241327.
There is 1:1 duplicated document and for the duplicated content it is returning score = 1.5258181. Why? Why it is not 5.241327 when it is 100% duplicate.
Another question is can I in any way to get similarity documents by content by passing some text in the query.
Example:
/mlt/?q=content:Some encoded long text&mlt.fl=content
I am trying to check if there is similar content uploaded and the check must be performed at new content upload time.
It might be worth to try some different parameters. I also use MLT on only one field, I use the following parameters:
'mlt.boost': 'true',
'mlt.fl': 'my_field_name',
'mlt.maxqt': 1000,
'mlt.mindf': '0',
'mlt.mintf': '0',
'qt': 'mlt',
'rows': '10'
See http://wiki.apache.org/solr/MoreLikeThis for an explanation of the parameters. I think with a small index mindf might be important and I see the default mintf (term frequency) is 2, so I assume an ID is only one term, so this is probably ignored!
First, how does Solr More-Like-This works?
A regular Solr query is conducted (e.g. "?q=content:Some encoded long text&.....".
For each document returned by the above query, More-Like-This conduct More like this query...
So, the first result set "response", is just like any Solr query results set.
The More-Like-This appears below and start with something like that (Json format):
"moreLikeThis":{
"57375":{"numFound":18155,"start":0,"docs":["
For an explanation about More Like This algorithm, please read that:
http://blog.brattland.no/node/18
and: http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/
If you didn't solved the problem yet, please let me know and I will guide you through.