Query using string in PyTables 3

I have a table:
from tables import *

h5file = open_file("ex.h5", "w")

class ex(IsDescription):
    A = StringCol(5, pos=0)
    B = StringCol(5, pos=1)
    C = StringCol(5, pos=2)

table = h5file.create_table('/', 'table', ex, "Passing string as column name")
table = h5file.root.table
rows = [
    ('abc', 'bcd', 'dse'),
    ('der', 'fre', 'swr'),
    ('xsd', 'weq', 'rty')
]
table.append(rows)
table.flush()
I am trying to query as per below:
find = 'swr'
creteria = 'B'
if creteria == 'B':
    condition = 'B'
else:
    condition = 'C'
value = [x['A'] for x in table.where("""condition==find""")]
print(value)
It returns:
ValueError: there are no columns taking part in condition condition==find
Is there a way to use condition as a column name in the above query?
Thanks in advance.

Yes, you can use the PyTables .where() method to search based on a condition. The problem is how you constructed the query string passed to table.where(condition). See the note about strings under Table.where() in the PyTables User's Guide:
A special care should be taken when the query condition includes string literals. ... Python 3 strings are unicode objects.
In Python 3, condition should be defined like this:
condition = 'col1 == b"AAAA"'
The reason is that in Python 3 this condition implies a comparison between a string of bytes (the col1 contents) and a unicode literal ("AAAA").
The simplest form of your query is shown below. It returns a subset of rows that match the condition. Note the use of single and double quotes for the string and unicode parts:
query_table = table.where('C == "swr"')  # search in column C
I rewrote your example as best I could; see below. It shows several ways to enter the condition. I'm not smart enough to figure out how to combine your creteria and find variables into a single condition variable with string and unicode characters, but a sketch of one possible approach follows the output below.
from tables import *

class ex(IsDescription):
    A = StringCol(5, pos=0)
    B = StringCol(5, pos=1)
    C = StringCol(5, pos=2)

h5file = open_file("ex.h5", "w")
table = h5file.create_table('/', 'table', ex, "Passing string as column name")
## table = h5file.root.table
rows = [
    ('abc', 'bcd', 'dse'),
    ('der', 'fre', 'swr'),
    ('xsd', 'weq', 'rty')
]
table.append(rows)
table.flush()

find = 'swr'
query_table = table.where('C == find')
for row in query_table:
    print(row)
    print(row['A'], row['B'], row['C'])

value = [x['A'] for x in table.where('C == "swr"')]
print(value)

value = [x['A'] for x in table.where('C == find')]
print(value)

h5file.close()
Output shown below:
/table.row (Row), pointing to row #1
b'der' b'fre' b'swr'
[b'der']
[b'der']
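As a follow-up on combining the two variables: the condition is just a string, so you can build it at runtime and let PyTables resolve find from the calling scope, or pass it explicitly through the documented condvars argument of Table.where(). A minimal sketch, assuming (as the output above suggests) that your PyTables version accepts a plain str compared against a StringCol:

find = 'swr'
creteria = 'C'                        # column name chosen at runtime
condition = '%s == find' % creteria  # builds the string 'C == find'

# PyTables resolves `find` from the calling namespace...
value = [x['A'] for x in table.where(condition)]
print(value)  # [b'der']

# ...or you can pass it explicitly through condvars:
value = [x['A'] for x in table.where(condition, condvars={'find': find})]
print(value)  # [b'der']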

Related

ProgrammingError: ('Expected 0 parameters, supplied 391', 'HY000') with 391 columns using dynamic approach

I have a dataframe that contains 391 columns and a number of rows. I am trying to push this to a database via pyodbc using the following command:
cursor = conn.cursor()
cursor.fast_executemany = True
cursor.executemany(
    f"INSERT INTO db.tble({', '.join(df.columns.tolist())}) VALUES ({('?,' * len(df.columns))[:-1]})",
    list(df.itertuples(index=False, name=None))
)
cursor.commit()
I would have thought this method would be dynamic for a dataframe of any size yet I get the following error:
ProgrammingError: ('Expected 0 parameters, supplied 391', 'HY000')
I am struggling to understand this, as the syntax looks correct and ? has been used instead of %s as in other answers. Can someone please help?
Thanks
I once wrote a piece of code where I wanted to create the insert statement dynamically based on the number of columns in the data frame.
Here is how the insert query would be passed to the database:
INSERT INTO dbo.Table (column1,column2,column3) VALUES (?,?,?)
Again, the column list and the '?' placeholders need to be created dynamically at runtime, based on the number of columns the data frame has.
I wrote the piece below to build a string of placeholders (?,?,?) and concatenate it with the insert query. Here,
df is the dataframe,
symbol_counter holds the number of columns in the dataframe minus one (the symbol list starts with one '?'), and
sym_string is the final string, i.e. ?,?,?,...,? based on the number of columns:
symbol = ['?']                          # start with one placeholder
symbol_counter = int(df.shape[1]) - 1   # remaining placeholders needed
for word in range(symbol_counter):
    symbol.insert(word, "?")            # add one '?' per remaining column
sym_string = ','.join(symbol)           # e.g. '?,?,?'

# then use this variable and concatenate it with the rest of the query as shown below
query = Variable_holding_first_partofthequery + " VALUES (" + sym_string + ")"
I know it's the long way, but that's how I got it to work. Good luck!
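For what it's worth, the same placeholder string can also be built in one line. A minimal equivalent sketch, reusing the df and db.tble names from the question:

# build '?,?,...,?' with one placeholder per dataframe column
placeholders = ','.join('?' * df.shape[1])
query = f"INSERT INTO db.tble({', '.join(df.columns)}) VALUES ({placeholders})"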

Writing dataframe via SQL query (pyodbc): pyodbc.Error: ('HY004', '[HY004]')

I'd like to write a dataframe to two pre-defined columns in an SQL table. The schema in SQL is:
abc(varchar(255))
def(varchar(255))
With a dataframe like so:
df = pd.DataFrame(
    [
        [False, False],
        [True, True],
    ],
    columns=["ABC", "DEF"],
)
And the SQL query is like so:
with conn.cursor() as cursor:
    string = "INSERT INTO {0}.{1}(abc, def) VALUES (?,?)".format(db, table)
    cursor.execute(string, (df["ABC"]), (df["DEF"]))
    cursor.commit()
So that the query (string) looks like so:
'INSERT INTO my_table(abc, def) VALUES (?,?)'
This creates the following error message:
pyodbc.Error: ('HY004', '[HY004] [Cloudera][ODBC] (11320) SQL type not supported. (11320) (SQLBindParameter)')
So I try a direct query (not via Python) in the Impala editor:
INSERT INTO my_table(abc, def) VALUES ('Hey','Hi');
And it produces this error message:
AnalysisException: Possible loss of precision for target table 'my_table'. Expression ''hey'' (type: STRING) would need to be cast to VARCHAR(255) for column 'abc'
How come I cannot insert even simple strings, like "Hi", into my table? Is my schema set up incorrectly, or is it perhaps something else?
STRING type in Impala has a size limit of 2GB.
VARCHAR's length is whatever you define it to be, but not more than 64KB.
Thus there is a potential for data loss if you implicitly convert one into the other.
By default, literals are treated as type STRING. So, in order to insert a literal into a VARCHAR field, you need to CAST it appropriately:
INSERT INTO my_table(abc, def) VALUES (CAST('Hey' AS VARCHAR(255)),CAST('Hi' AS VARCHAR(255)));
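Back in Python, the same idea can be applied to the parameterized insert. A hedged sketch, assuming the Cloudera ODBC driver accepts parameter markers inside CAST(...) (this is driver-dependent) and converting the boolean dataframe values to strings since the target columns are VARCHAR:

sql = ("INSERT INTO my_table(abc, def) "
       "VALUES (CAST(? AS VARCHAR(255)), CAST(? AS VARCHAR(255)))")
# executemany expects one parameter tuple per row; str() because the
# target columns are VARCHAR, not BOOLEAN
rows = [(str(a), str(d)) for a, d in zip(df["ABC"], df["DEF"])]
with conn.cursor() as cursor:
    cursor.executemany(sql, rows)
    cursor.commit()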

PLUCK one row WHERE Postgres column name includes underscore

I am trying to get a row/rows in a Postgres table where a column whose name includes an underscore ("Cas_NO") matches a specific string. I am having trouble formulating a working query:
#array_a = ["a", "b", "c"]
#array_b
#array_a.each do |a|
#array_b << Database.where("\"BAS_No\" = ?", a).pluck(:INBC_name)
end
Running the above code gives me all the rows from the Database table, not just those where "BAS_No" matches "a", "b", or "c" from #array_a.
That's not your real code. Saying this:
Database.where("\"BAS_No\" = ?", %w[a b c])
will produce an SQL WHERE clause like this:
where ("BAS_No" = 'a','b','c')
and that's not valid SQL so you should get an ActiveRecord::StatementInvalid exception rather than a bunch of data.
In any case, you shouldn't use = with a list of values; you want to use IN, and the easiest way to get that is to let ActiveRecord build the SQL:
Database.where(:BAS_No => a).pluck(:INBC_Name)
#--------------^^^^^^^^^^^^
or, if you're allergic to hashrockets:
Database.where(BAS_No: a).pluck(:INBC_Name)
#--------------^^^^^^^^^
When a is an array, those will produce IN (...) expressions in the WHERE clause and you should get back what you're expecting.

Regex capturing inside a group

I am working on a method to get all values from a SQL query and then escape them in PHP.
The idea is to help the programmer who is careless about security when writing a SQL query.
So when I try to execute this:
INSERT INTO tabla (a, b,c,d) VALUES ('a','b','c',a,b)
the regex needs to capture 'a' 'b' 'c' a and b.
I have been working on this for a couple of days.
This is as far as I could get with two regexes, but I want to know if there is a better way to do it:
VALUES ?\((([\w'"]+).+?)\)
Based on the previous SQL this will match:
VALUES ('a','b','c',a,b)
The second regex
['"]?(\w)['"]?
will match
a b c a b
after first removing VALUES, of course.
This approach matches most of the values I am going to insert.
But it doesn't work with JSON, for example:
{a:b, "asd":"ads", ...}
Any help with this?
First, I think you should know that SQL supports many forms of single/double-quoted strings:
'Northwind\'s category name'
'Northwind''s category name'
"Northwind \"category\" name"
"Northwind ""category"" name"
"Northwind category's name"
'Northwind "category" name'
'Northwind \\ category name'
'Northwind \ncategory \nname'
To match them, try these patterns:
"[^\\"]*(?:(?:\\.|"")[^\\"]*)*"
'[^\\']*(?:(?:\\.|'')[^\\']*)*'
Combine the patterns together:
VALUES\s*\(\s*(?:"[^\\"]*(?:(?:\\.|"")[^\\"]*)*"|'[^\\']*(?:(?:\\.|'')[^\\']*)*'|\w+)(?:\s*,\s*(?:"[^\\"]*(?:(?:\\.|"")[^\\"]*)*"|'[^\\']*(?:(?:\\.|'')[^\\']*)*'|\w+))*\)
PHP 5.4.5 sample code:
<?php
$pat = '/\bVALUES\s*\((\s*(?:"[^\\"]*(?:(?:\\.|"")[^\\"]*)*"|\'[^\\\']*(?:(?:\\.|\'\')[^\\\']*)*\'|\w+)(?:\s*,\s*(?:"[^\\"]*(?:(?:\\.|"")[^\\"]*)*"|\'[^\\\']*(?:(?:\\.|\'\')[^\\\']*)*\'|\w+))*)\)/';

$sql_sample1 = "INSERT INTO tabla (a, b,c,d) VALUES ('a','b','c',a,b)";
if (preg_match($pat, $sql_sample1, $matches) > 0) {
    printf("%s\n", $matches[0]);
    printf("%s\n\n", $matches[1]);
}

$sql_sample2 = 'INSERT INTO tabla (a, b,c,d) VALUES (\'a\',\'{a:b, "asd":"ads"}\',\'c\',a,b)';
if (preg_match($pat, $sql_sample2, $matches) > 0) {
    printf("%s\n", $matches[0]);
    printf("%s\n", $matches[1]);
}
?>
output:
VALUES ('a','b','c',a,b)
'a','b','c',a,b
VALUES ('a','{a:b, "asd":"ads"}','c',a,b)
'a','{a:b, "asd":"ads"}','c',a,b
If you need to extract each value from the result, split it on , (as when parsing CSV).
I hope this will help you :)

Using sqldf to select specific values from a column

SQLDF newbie here.
I have a data frame which has about 15,000 rows and 1 column.
The data looks like:
cars
autocar
carsinfo
whatisthat
donnadrive
car
telephone
...
I wanted to use the sqldf package to loop through the column and pick all values which contain "car" anywhere in them.
However, the following code generates an error.
> sqldf("SELECT Keyword FROM dat WHERE Keyword="car")
Error: unexpected symbol in "sqldf("SELECT Keyword FROM dat WHERE Keyword="car"
There is no unexpected symbol, so I'm not sure what's wrong.
So first, I want to know all the values which contain 'car'.
Then I want to know only those values which contain just 'car' by itself.
Can anyone help?
EDIT:
Alright, there was an unexpected symbol, but the fixed query only gives me just car and not every row which contains 'car'.
> sqldf("SELECT Keyword FROM dat WHERE Keyword='car'")
Keyword
1 car
Using = will only return exact matches.
You should probably use the like operator combined with the wildcards % or _. The % wildcard matches any sequence of zero or more characters, while _ matches exactly one character.
Something like the following will find all instances of car, e.g. "cars", "motorcar", etc:
sqldf("SELECT Keyword FROM dat WHERE Keyword like '%car%'")
And the following will match "car" followed by exactly one extra character, e.g. "cars" (note that _ requires exactly one character, so 'car_' will not match "car" by itself):
sqldf("SELECT Keyword FROM dat WHERE Keyword like 'car_'")
This has nothing to do with sqldf; your SQL statement is the problem. You need:
dat <- data.frame(Keyword=c("cars","autocar","carsinfo",
                            "whatisthat","donnadrive","car","telephone"))
sqldf("SELECT Keyword FROM dat WHERE Keyword like '%car%'")
#    Keyword
# 1     cars
# 2  autocar
# 3 carsinfo
# 4      car
You can also use regular expressions to do this sort of filtering. grepl returns a logical vector (TRUE / FALSE) stating whether or not there was a match. You can get very sophisticated to match specific items, but a basic query will work in this case:
# Using @Joshua's dat data.frame
subset(dat, grepl("car", Keyword, ignore.case = TRUE))
   Keyword
1     cars
2  autocar
3 carsinfo
6      car
Very similar to the solution provided by @Chase. Because we do not use subset, we do not need a logical vector and can use either grep or grepl:
df <- data.frame(keyword = c("cars", "autocar", "carsinfo",
                             "whatisthat", "donnadrive", "car", "telephone"))
df[grep("car", df$keyword), , drop = FALSE]  # or
df[grepl("car", df$keyword), , drop = FALSE]
   keyword
1     cars
2  autocar
3 carsinfo
6      car
I took the idea from Selecting rows where a column has a string like 'hsa..' (partial string match)