SQL IN operator, separate input values at commas

User input from text_field_tag, e.g. a,b,c, is seen as one value in a SQL IN operator's list (value1, value2, value3, ...), i.e. value1 = 'a,b,c' instead of value1='a', value2='b' and value3='c'.
I'm using Sequel's db.fetch to write the SQL. Split and join, with their associated regexp formats, don't seem to give the form 'a','b','c', i.e. separate values for a SQL IN operator.
Any thoughts?

Assuming you have some user input as a string:
user_input = 'a,b,c'
and a posts table with a value1 column, you can get the posts whose value is a, b or c with the following query:
values = user_input.split(',')
#=> ["a", "b", "c"]
DB = Sequel.sqlite
#=> #<Sequel::SQLite::Database: {:adapter=>:sqlite}>
dataset = DB[:posts]
#=> #<Sequel::SQLite::Dataset: "SELECT * FROM `posts`">
dataset.where(:value1 => values).sql
#=> "SELECT * FROM `posts` WHERE (`value1` IN ('a', 'b', 'c'))"

Related

Pandas read_sql_query with parameters for a string with no quotes

I want to insert a string of identifiers into a piece of SQL code using
df = pd.read_sql_query(query, self.connection, params=sql_parameter)
my parameter dictionary looks like this
sql_parameter = {'itemids':itemids_str}
where itemids_str is a string like
282940499, 276686324, 2665846, 46875436, 530272885, 2590230, 557021480, 282937154, 46259344
The SQL code looks like
SELECT
xxx,
yyy,
zzz
FROM tablexyz
where some_column_name in ( %(itemids)s )
My current code gets the parameter inserted with its quotes:
where some_column_name in ( '282940499, 276686324, 2665846, 46875436, 530272885, 2590230, 557021480, 282937154, 46259344' )
How can I prevent the quotes from being inserted? They are not part of my string; I assume they come from the parameter being of type string when using %s.
I don't think params provides a way to send a list of numeric values for a single condition. I always add such a condition directly to the query:
item_ids = [str(item_id) for item_id in item_ids]
where_str = ','.join(item_ids)
query = f"""SELECT
xxx,
yyy,
zzz
FROM tablexyz
where some_column_name in ({where_str})"""
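If you want to keep the query parameterized (safer against SQL injection), a common workaround is to generate one placeholder per value and pass the values as a sequence. A minimal sketch, assuming a DB-API driver that uses %s-style placeholders (e.g. psycopg2 or PyMySQL); connection stands for your existing connection object (self.connection in the question):
import pandas as pd

item_ids = [282940499, 276686324, 2665846]

# build "%s, %s, %s" -- one placeholder per value
placeholders = ", ".join(["%s"] * len(item_ids))

query = f"""SELECT
xxx,
yyy,
zzz
FROM tablexyz
where some_column_name in ({placeholders})"""

# the driver binds each value separately, so numbers arrive unquoted
df = pd.read_sql_query(query, connection, params=tuple(item_ids))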

I can't query repeated fields in a Google BigQuery table

I've got a table in Google BigQuery which has repeated records in it. I followed the guide at https://cloud.google.com/bigquery/docs/nested-repeated to create the table successfully, and I've populated it with some test data using
INSERT INTO `<project>.<dataset>.<table>` (<list of fields, ending with repeated record name>)
VALUES
(
"string1", false, 200.0, "string2", 0.2, 2.345, false, "2020-01-02 12:34:56",
[
("repeated field str1", CAST(2.01 AS FLOAT64), CAST(201 as NUMERIC), false),
("repeated field str2", CAST(4.01 AS FLOAT64), CAST(702 as NUMERIC), true)
]
);
(etc)
The table is successfully populated, and I can query the data with
select * from <dataset>.<table>
and all fields, repeated and non-repeated, are returned.
I can also successfully query the non-repeated fields from the table, as long as no repeated fields are specified in the query.
However, when I want to include specific repeated fields in the query (following the guide at https://cloud.google.com/bigquery/docs/legacy-nested-repeated), for example
SELECT normalfield1, normalfield2, normalfield3,
repeatedData.field1, repeatedData.field2, repeatedData.field3
FROM `profile_dataset.profile_betdatamultiples`;
I get the error
Cannot access field <field name> on a value with type ARRAY<STRUCT<fieldname1 STRING, fieldname2 FLOAT64, fieldname3 NUMERIC, ...>> at [8:14]
(annoyingly GCP truncates the error message so I can't see all of it)
Are there any suggestions for how to proceed here?
Thanks!
Below is for BigQuery Standard SQL
#standardSQL
SELECT normalfield1, normalfield2, normalfield3,
data.field1, data.field2, data.field3
FROM `project.profile_dataset.profile_betdatamultiples`,
UNNEST(repeatedData) data
If applied to the sample data in your question, each element of repeatedData comes back as its own row.
I recreated the table with this code:
CREATE TABLE `temp.experiment` AS
SELECT "string1" s1, false b, 200.0 i1, "string2" s2, 0.2 f1, 2.345 f2, false b2, TIMESTAMP("2020-01-02 12:34:56") t1,
[
STRUCT ("repeated field str1" AS s1, CAST(2.01 AS FLOAT64) AS f2, CAST(201 as NUMERIC) AS n1, false AS b),
STRUCT ("repeated field str2", CAST(4.01 AS FLOAT64), CAST(702 as NUMERIC), true)
] AS b1
Now I can query particular nested rows like this:
SELECT s1, b, s2
, b1[OFFSET(0)].s1 AS arr_s1, b1[OFFSET(0)].f2, b1[OFFSET(0)].n1
FROM `temp.experiment`
You might want to UNNEST instead of [OFFSET(0)], but the question doesn't say what results you are expecting.
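For completeness, here is a sketch of the UNNEST variant run through the google-cloud-bigquery Python client; the table and field names come from the recreated temp.experiment table above, and the client setup assumes default application credentials:
from google.cloud import bigquery

client = bigquery.Client()

# UNNEST turns each element of the b1 array into its own row,
# so every repeated entry is returned, not just OFFSET(0)
sql = """
SELECT s1, b, s2, item.s1 AS arr_s1, item.f2, item.n1
FROM `temp.experiment`, UNNEST(b1) AS item
"""

for row in client.query(sql).result():
    print(dict(row.items()))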

BigQuery query from a nested field

I have a table with a nested field externalIds.value
I want to run a query where 2 entries match values in this field. Here is the query I have, which doesn't return any data.
SELECT keyField, field2, example.field1, example.field2,
externalIds.type, externalIds.value FROM (FLATTEN([dataset1.table1], externalIds))
WHERE
externalIds.value = '157' AND
externalIds.value = 'Some test data'
;
If I run this query with only one WHERE condition (externalIds.value = '157'), and then run a query WHERE keyField = "the value returned from the previous query", I get two rows: one where externalIds.value is '157' and another where it is 'Some test data'.
I'm not interested in displaying both values in the result. My priority is to get the keyField WHERE the .value is '157' AND 'Some test data'
Maybe something like this:
SELECT keyField, field2, example.field1, example.field2,
externalIds.type, externalIds.value FROM [dataset1.table1]
OMIT RECORD IF NOT(
SOME(externalIds.value = '157') AND
SOME(externalIds.value = 'Some test data'))
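Note that OMIT RECORD IF exists only in legacy SQL. If you can use standard SQL, the same "both values appear somewhere in the repeated field" condition can be written with EXISTS over UNNEST. A hedged sketch for running it from Python; the dataset, table and field names are taken from the question:
from google.cloud import bigquery

# each EXISTS checks that at least one element of the repeated
# externalIds field has the required value
sql = """
SELECT keyField, field2, example.field1, example.field2
FROM `dataset1.table1` t
WHERE EXISTS (SELECT 1 FROM UNNEST(t.externalIds) e WHERE e.value = '157')
  AND EXISTS (SELECT 1 FROM UNNEST(t.externalIds) e WHERE e.value = 'Some test data')
"""
rows = bigquery.Client().query(sql).result()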

Find similar entries using pig script

I have data as below
1,ref1,1200,USD,CR
2,ref1,1200,USD,DR
3,ref2,2100,USD,DR
4,ref2,800,USD,CR
5,ref2,700,USD,CR
6,ref2,600,USD,CR
I want to group these records where field2 matches, sum(field3) matches, and field5 is opposite (meaning if the lhs is "CR" then the rhs should be "DR" and vice versa).
How can I achieve this using pig script?
You can also do it this way:
data = LOAD 'myData' USING PigStorage(',') AS
(field1: int, field2: chararray,
field3: int, field4: chararray,
field5: chararray) ;
B = FOREACH (GROUP data BY (field2, field5)) GENERATE group.field2, data ;
-- Since in B there will always be two sets of field2 (one for CR and one for DR)
-- grouping by field2 again will get all pairs of CR and DR
-- (where the sums are equal of course)
C = GROUP B BY (field2, SUM(data.field3)) ;
The schema and output at the last step:
C: {group: (field2: chararray,long),B: {(field2: chararray,data: {(field1: int,field2: chararray,field3: int,field4: chararray,field5: chararray)})}}
((ref1,1200),{(ref1,{(1,ref1,1200,USD,CR)}),(ref1,{(2,ref1,1200,USD,DR)})})
((ref2,2100),{(ref2,{(4,ref2,800,USD,CR),(5,ref2,700,USD,CR),(6,ref2,600,USD,CR)}),(ref2,{(3,ref2,2100,USD,DR)})})
The output is a little unwieldy right now, but this will clear it up:
-- Make sure to look at the schema for C above
D = FOREACH C {
-- B is a bag containing tuples in the form: B: {(field2, data)}
-- What we want is to just extract out the data field and nothing else
-- so we can go over each tuple in the bag and pull out
-- the second element (the data we want).
justbag = FOREACH B GENERATE FLATTEN($1) ;
-- Without FLATTEN the schema for justbag would be:
-- justbag: {(data: (field1, ...))}
-- FLATTEN makes it easier to access the fields by removing data:
-- justbag: {(field1, ...)}
GENERATE justbag ;
}
Into this:
D: {justbag: {(data::field1: int,data::field2: chararray,data::field3: int,data::field4: chararray,data::field5: chararray)}}
({(1,ref1,1200,USD,CR),(2,ref1,1200,USD,DR)})
({(4,ref2,800,USD,CR),(5,ref2,700,USD,CR),(6,ref2,600,USD,CR),(3,ref2,2100,USD,DR)})
I'm not sure that I understand your requirements, but you could load the data, split it into two sets (filter/split), and then cogroup, like this:
data = load ... as (field1: int, field2: chararray, field3: int, field4: chararray, field5: chararray);
crs = filter data by field5 == 'CR';
-- group by field2 (the reference), since that is the field that must match
crs_grp = group crs by field2;
crs_agg = foreach crs_grp generate group as field2, SUM(crs.field3) as total;
drs = filter data by field5 == 'DR';
drs_grp = group drs by field2;
drs_agg = foreach drs_grp generate group as field2, SUM(drs.field3) as total;
g = COGROUP crs_agg BY (field2, total), drs_agg BY (field2, total);
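To make the matching logic concrete outside Pig, here is a hedged Python sketch of the same idea: group rows by (field2, field5), then pair each CR group with the DR group that shares field2 and has an equal sum of field3. The rows are the sample data from the question:
from collections import defaultdict

rows = [
    (1, "ref1", 1200, "USD", "CR"),
    (2, "ref1", 1200, "USD", "DR"),
    (3, "ref2", 2100, "USD", "DR"),
    (4, "ref2", 800, "USD", "CR"),
    (5, "ref2", 700, "USD", "CR"),
    (6, "ref2", 600, "USD", "CR"),
]

# (field2, field5) -> list of matching rows
groups = defaultdict(list)
for row in rows:
    groups[(row[1], row[4])].append(row)

for (ref, side), cr_rows in groups.items():
    if side != "CR":
        continue
    dr_rows = groups.get((ref, "DR"), [])
    # keep the pair only when the CR and DR sums of field3 match
    if sum(r[2] for r in cr_rows) == sum(r[2] for r in dr_rows):
        print(ref, cr_rows + dr_rows)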

Selecting from a table where field = this and value = that

I have a mysql table that looks something like this:
Row 1:
'visitor_input_id' => int 1
'name' => string 'country'
'value' => string 'Canada'
Row 2:
'visitor_input_id' => int 1
'name' => string 'province'
'value' => string 'Alberta'
Row 3:
'visitor_input_id' => int 1
'name' => string 'first_name'
'value' => string 'Jim'
The problem is that I need to be able to filter it so that a user can generate reports using this:
filter 1:
'field_name' => string 'country'
'field_operator' => string '='
'field_value' => string 'Canada'
filter 2:
'field_name' => string 'province'
'field_operator' => string '!='
'field_value' => string 'Alberta'
filter 3:
'field_name' => string 'first_name'
'field_operator' => string '%LIKE%'
'field_value' => string 'Jim'
I am not really sure what the query would look like to be able to select from this using the filters. Any suggestions? (Unfortunately, creating a new table to store the data more sanely is not really feasible at this time because it is already full of user data)
I think it would look something like this:
if(field_name = 'province' THEN ADD WHERE field_value != 'Alberta')
if(field_name = 'country' THEN ADD WHERE field_value = 'Canada')
if(field_name = 'first_name' THEN ADD WHERE field_value LIKE '%jim%')
but I am not sure how that would work...
Turns out that this seems to work:
SELECT * FROM visitor_fields
INNER JOIN visitor_inputs ON (visitor_inputs.input_id = visitor_fields.input_id)
INNER JOIN visitor_fields as filter_0
ON (filter_0.input_id=visitor_inputs.input_id
AND filter_0.field_name = 'province'
AND filter_0.field_value != 'Alberta')
INNER JOIN visitor_fields as filter_1
ON (filter_1.input_id=visitor_inputs.input_id
AND filter_1.field_name = 'country'
AND filter_1.field_value = 'Canada')
INNER JOIN visitor_fields as filter_2
ON (filter_2.input_id=visitor_inputs.input_id
AND filter_2.field_name = 'first_name'
AND filter_2.field_value LIKE '%jim%')
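Since the filters come from user input, you will probably want to build those INNER JOIN clauses dynamically and parameterize the values. A hedged Python sketch, assuming a DB-API connection (e.g. mysql.connector or PyMySQL) and the table/column names from the query above; the operator cannot be bound as a parameter, so it must come from a whitelist:
ALLOWED_OPS = {"=", "!=", "LIKE"}  # operators can't be parameterized

filters = [
    ("province", "!=", "Alberta"),
    ("country", "=", "Canada"),
    ("first_name", "LIKE", "%jim%"),
]

sql = "SELECT * FROM visitor_inputs"
params = []
for i, (name, op, value) in enumerate(filters):
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op}")
    sql += (
        f" INNER JOIN visitor_fields AS f{i}"
        f" ON f{i}.input_id = visitor_inputs.input_id"
        f" AND f{i}.field_name = %s"
        f" AND f{i}.field_value {op} %s"
    )
    params.extend([name, value])

# cursor.execute(sql, params)  # run with your connection's cursor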
I know you say creating a new table with a better schema isn't feasible, but restructuring the data would make it more efficient to query and easier to work with. Just create a new table (called visitor in my example). Then select from the old table to populate the new visitor table.
visitor
----------------
visitor_id
firstname
province
country
You could loop through the statement below with any scripting language (PHP, TSQL, whatever you're most comfortable with). Just get a list of all visitor_ids and loop through them with the SQL below, replacing x with each visitor_id.
INSERT INTO visitor (visitor_id, firstname, province, country) VALUES (x,
(SELECT value FROM old_table WHERE name='first_name' AND visitor_input_id = x),
(SELECT value FROM old_table WHERE name='province' AND visitor_input_id = x),
(SELECT value FROM old_table WHERE name='country' AND visitor_input_id = x));
This will produce a table where all a visitor's data is on a single row.
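A hedged sketch of that loop in Python (mysql.connector and the connection details are assumptions; any DB-API driver works the same way), using the corrected INSERT above:
import mysql.connector

# hypothetical credentials -- replace with your own
conn = mysql.connector.connect(user="...", password="...", database="...")
cur = conn.cursor()

cur.execute("SELECT DISTINCT visitor_input_id FROM old_table")
visitor_ids = [row[0] for row in cur.fetchall()]

insert_sql = """
INSERT INTO visitor (visitor_id, firstname, province, country) VALUES (%s,
    (SELECT value FROM old_table WHERE name='first_name' AND visitor_input_id = %s),
    (SELECT value FROM old_table WHERE name='province' AND visitor_input_id = %s),
    (SELECT value FROM old_table WHERE name='country' AND visitor_input_id = %s))
"""
for visitor_id in visitor_ids:
    cur.execute(insert_sql, (visitor_id,) * 4)
conn.commit()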
Are you able to create an SQL string and then execute it? The string would look like this:
SELECT * FROM yourtable
WHERE (name='country' AND value='Canada') AND
(name='province' AND value!='Alberta') AND
(name='first_name' AND value LIKE '%jim%')
EDIT:
I see. Multiple records. So try joining them. Something along these lines (names may need adjusting):
SELECT * FROM
(SELECT * FROM yourtable WHERE name='country' AND value='Canada') a
JOIN
(SELECT * FROM yourtable WHERE name='province' AND value!='Alberta') b
ON a.visitor_input_id = b.visitor_input_id
JOIN
(SELECT * FROM yourtable WHERE name='first_name' AND value LIKE '%jim%') c
ON a.visitor_input_id = c.visitor_input_id;