I'm wondering if someone can help me convert this SQL query to the SQLAlchemy ORM. I'm having trouble grouping the AND conditions in brackets and joining those groups with OR. Here is a simplified version of the query I am trying to create:
SELECT * FROM [dbo].[UpcomingThings] WHERE ([Id] = 'ES1234' AND [Date] = '2021-08-18' AND [Period] = 27) OR ([Id] = 'ES0197' AND [Date] = '2021-08-18' AND [Period] = 29)
Note that Id is not unique in this case, so I have to rely on several other columns to identify a row. I have tried .filter and .filter(or_()) in various combinations, but I can't get the query into the form WHERE (bracketed AND conditions) OR (bracketed AND conditions).
EDIT:
If SQLAlchemy were as simple as this, here is what I'd do, assuming or_ would give me an OR:
session.query(models.UpcomingThings).filter(UpcomingThings.Id == 'ES1234').filter(UpcomingThings.Period == 27).filter(UpcomingThings.SettlementDate == 2021-08-18).filter(or_(UpcomingThings.Id=='ES0197')).filter(UpcomingThings.Date == 2021-08-18)).filter(UpcomingThings.Period == 29))
Is there also no way I could do
session.query(models.UpcomingThings).filter(or_((AND STATEMENT), (AND STATEMENT))
I've even tried putting a .filter inside the or_, but obviously that's a syntax error!
Can someone please help me convert this to the SQLAlchemy ORM? Thank you.
It will probably seem obvious after the fact, but you need to combine and_ and or_ to achieve this. Chained calls to filter() are joined with an implicit AND, so on their own they cannot express the outer SQL OR. The following should do what you want; I formatted it to make the nesting clearer. or_ joins its conditions with SQL OR, and and_ joins its conditions with AND.
In one place you use SettlementDate and in another just Date; should those be the same column? I used SettlementDate throughout to try out the schema.
Python Example
#...
from sqlalchemy.sql import or_, and_
from datetime import date
class UpcomingThings(Base):
    __tablename__ = 'upcoming_things'
    Id = Column(String, primary_key=True, index=True)
    Period = Column(Integer)
    SettlementDate = Column(Date)

Base.metadata.create_all(engine)

with Session(engine) as session:
    q = session.query(
        UpcomingThings
    ).filter(
        or_(
            and_(
                UpcomingThings.Id == 'ES1234',
                UpcomingThings.Period == 27,
                UpcomingThings.SettlementDate == date(year=2021, month=8, day=18)),
            and_(
                UpcomingThings.Id == 'ES0197',
                UpcomingThings.SettlementDate == date(year=2021, month=8, day=18),
                UpcomingThings.Period == 29)))
    print(q)
Printed Output
SELECT upcoming_things."Id" AS "upcoming_things_Id", upcoming_things."Period" AS "upcoming_things_Period", upcoming_things."SettlementDate" AS "upcoming_things_SettlementDate"
FROM upcoming_things
WHERE upcoming_things."Id" = ? AND upcoming_things."Period" = ? AND upcoming_things."SettlementDate" = ? OR upcoming_things."Id" = ? AND upcoming_things."SettlementDate" = ? AND upcoming_things."Period" = ?
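The printed WHERE clause has no brackets, which can look wrong at first glance. In SQL, AND binds more tightly than OR, so the unbracketed form groups the same way as the bracketed one. Here is a minimal sketch that checks this with Python's built-in sqlite3 module; the in-memory table and the sample rows are invented for illustration and are not the asker's data.

```python
import sqlite3

# In-memory table with made-up rows, mirroring the columns used in the answer.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE upcoming_things (Id TEXT, Period INTEGER, SettlementDate TEXT)"
)
conn.executemany(
    "INSERT INTO upcoming_things VALUES (?, ?, ?)",
    [
        ("ES1234", 27, "2021-08-18"),
        ("ES1234", 28, "2021-08-18"),  # wrong Period, should not match
        ("ES0197", 29, "2021-08-18"),
        ("ES0197", 29, "2021-08-19"),  # wrong date, should not match
    ],
)

# Same parameter order as the SQL printed by SQLAlchemy.
params = ("ES1234", 27, "2021-08-18", "ES0197", "2021-08-18", 29)

# WHERE clause as printed: no brackets.
without_brackets = conn.execute(
    "SELECT Id, Period, SettlementDate FROM upcoming_things "
    "WHERE Id = ? AND Period = ? AND SettlementDate = ? "
    "OR Id = ? AND SettlementDate = ? AND Period = ? "
    "ORDER BY Id, Period",
    params,
).fetchall()

# WHERE clause as written in the question: explicit brackets.
with_brackets = conn.execute(
    "SELECT Id, Period, SettlementDate FROM upcoming_things "
    "WHERE (Id = ? AND Period = ? AND SettlementDate = ?) "
    "OR (Id = ? AND SettlementDate = ? AND Period = ?) "
    "ORDER BY Id, Period",
    params,
).fetchall()

print(without_brackets == with_brackets)
```

Both queries select the same two rows here, so the missing brackets in the generated SQL do not change the result.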
Related
In the R programming language, I am interested in performing a "fuzzy join" and passing this through a SQL Connection:
library(fuzzyjoin)
library(dplyr)
library(RODBC)
library(sqldf)
con = odbcConnect("some name", uid = "some id", pwd = "abc")
sample_query = sqlQuery( stringdist_inner_join(table_1, table_2, by = "id2", max_dist = 2) %>%
filter(date_1 >= date_2, date_1 <= date_3) )
view(sample_query)
However, I do not think this is possible, because the function being used for the "fuzzy join" (stringdist_inner_join) is not supported.
I tried to find the source code for this "fuzzy join" function, and found it over here: https://rdrr.io/cran/fuzzyjoin/src/R/stringdist_join.R
My Question: Does anyone know if it is possible to (manually) convert this "fuzzy join" function into an SQL format that will be recognized? Are there any quick ways to re-write this function (stringdist_inner_join) such that it can be recognized by Netezza? Are there any pre-existing ways to do this?
Right now I can only execute "sample_query" locally. Rewriting this function (stringdist_inner_join) so that it runs on the server would let me perform "sample_query" much faster.
Does anyone know if this is possible?
Note:
My data looks like this:
table_1 = data.frame(id1 = c("123 A", "123BB", "12 5", "12--5"), id2 = c("11", "12", "14", "13"),
date_1 = c("2010-01-31","2010-01-31", "2015-01-31", "2018-01-31" ))
table_1$id1 = as.factor(table_1$id1)
table_1$id2 = as.factor(table_1$id2)
table_1$date_1 = as.factor(table_1$date_1)
table_2 = data.frame(id1 = c("0123", "1233", "125 .", "125_"), id2 = c("111", "112", "14", "113"),
date_2 = c("2009-01-31","2010-01-31", "2010-01-31", "2010-01-31" ),
date_3 = c("2011-01-31","2010-01-31", "2020-01-31", "2020-01-31" ))
table_2$id1 = as.factor(table_2$id1)
table_2$id2 = as.factor(table_2$id2)
table_2$date_2 = as.factor(table_2$date_2)
table_2$date_3 = as.factor(table_2$date_3)
Based on your other post about this issue, the question of how to structure the SQL query has already been solved:
SAS: Fuzzy Joins
select a.*, b.*
from table_a a
inner join table_b b
on (a.date_1 between b.date_2 and b.date_3)
and (le_dst(a.id1, b.id1) = 1 or a.id2 = b.id2)
To run this from an R script, I would recommend using dbplyr and wrapping the query in tbl(). You can then keep doing basic manipulation on it as if it were a data.frame; dbplyr translates that (at least the basic commands) into SQL, combines everything into one query, and you eventually pull the data with the collect() function.
Edit: Just a note, tbl() starts building a SQL statement and fetches the column names, but it does not pull any data until you call collect(). At that point R sends the query to the server, the server runs it and sends the data back.
Keep this in mind, because if dbplyr can't translate something to SQL, it assumes it is a SQL command and passes it through, so you won't know there is an error until you try to collect. For example, a function from the stringr package, str_detect, isn't implemented in dbplyr, so dbplyr would send that call to the database, which would throw an error because it doesn't know what that is, but only after running collect(). Check out the dbplyr page linked above for details.
library(DBI)
library(odbc)
library(dplyr)
library(dbplyr)
new_con <- dbConnect(
  odbc(),
  Driver = "ODBC Driver 17 for SQL Server (as an example)",
  Server = "Server name here",
  uid = "some_id",
  pwd = "abc"
)
# tbl() only builds the query; nothing is pulled from the server yet
sample_query <- dplyr::tbl(
  new_con,
  dbplyr::sql(
    "select a.*, b.*
     from table_a a
     inner join table_b b
     on (a.date_1 between b.date_2 and b.date_3)"
  )
)
# collect() runs the query on the server and pulls the result into R
sample_data <- sample_query %>%
  filter(silly_example == TRUE) %>%
  collect()
I agree with @Roger-123's approach. But here is a variation that might assist:
Assuming you are using remote connections to access the Netezza database, you could do this using dbplyr as follows:
remote_1 = tbl(con, "table_1_name")
remote_2 = tbl(con, "table_2_name")
# create dummy column
remote_1 = mutate(remote_1, ones = 1)
remote_2 = mutate(remote_2, ones = 1)
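output = remote_1 %>%
  # cross join via the dummy column; shared column names get the suffixes
  inner_join(remote_2, by = "ones", suffix = c("_1", "_2")) %>%
  # calculate Levenshtein distance (assuming both tables have an id2 column, as in the question)
  mutate(distance = le_dst(id2_1, id2_2)) %>%
  # filter to close matches
  filter(distance <= 2)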
Notes:
dbplyr does not allow for complex conditions in its joins. Hence we do the most general join possible and then filter.
If you also want joins by date, then you can put them into the inner_join if the conditions are simple, or create another filter condition if they are complex.
le_dst is not an R function and there is no dbplyr translation for it, so dbplyr will pass it to the server as-is.
Netezza accepts two distance functions for text: le_dst and dle_dst. You can use whichever you please here.
output is a query: it acts like a table, but it is generated/calculated on the fly. It has not been written to disk or loaded into R memory. Depending on your application, you will want to store/save it.
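To sanity-check what the cross-join-then-filter pattern returns before running it on the server, the same logic can be tried locally on a handful of values. The sketch below is in Python and assumes that le_dst computes the standard Levenshtein edit distance; the levenshtein helper is written by hand for illustration and is not Netezza's implementation. The id2 values are the ones from the question's sample data.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Standard Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution (or match)
                )
            )
        previous = current
    return previous[-1]


# id2 columns from the question's table_1 and table_2.
table_1_id2 = ["11", "12", "14", "13"]
table_2_id2 = ["111", "112", "14", "113"]

# Cross join, compute the distance once per pair, then keep close matches.
scored = (
    (left, right, levenshtein(left, right))
    for left, right in product(table_1_id2, table_2_id2)
)
matches = [row for row in scored if row[2] <= 2]

for left, right, distance in matches:
    print(left, right, distance)
```

With ids this short, a threshold of 2 is very permissive, so it is worth inspecting the matched pairs on a sample before settling on a cut-off.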
I've created a quiz, and I record in the DB how many questions each user answered correctly and the time they took to finish the quiz.
I'm trying to build a QueryBuilder query that retrieves the user who answered the most questions correctly in the least time.
My table looks like this :
So, the request (which works) I did in SQL in the DB is :
SELECT
id
FROM
public.user_quizz
WHERE
quizz_id = 4
AND
number_correct_answers IN (SELECT max(number_correct_answers) FROM user_quizz WHERE quizz_id = 4)
AND
answered_in IN (SELECT min(answered_in) FROM user_quizz WHERE quizz_id = 4);
Of course, I don't know whether it's the best (or the most efficient) query for this case, but it works.
Now, I'm trying to translate this query into a QueryBuilder query.
I'm stuck on the IN expression: I don't know how to express the sub-SELECT here.
$qb = $this->createQueryBuilder('u');
$query = $qb->select('u')
->andWhere(
$qb->expr()->eq('u.quizz', ':quizzId'),
$qb->expr()->in(
'u.numberCorrectAnswers',
)
)
->setParameter('quizzId', $quizz->getId())
->getQuery()
;
Thanks for your help.
$qbSelectMax = $this->createQueryBuilder('uc'); // user copy, to prevent alias collisions
$qbSelectMax
    ->select($qbSelectMax->expr()->max('uc.numberCorrectAnswers'))
    ->where($qbSelectMax->expr()->eq('uc.quizz', ':quizzId'));
$qb = $this->createQueryBuilder('u');
$query = $qb->select('u')
    ->andWhere(
        $qb->expr()->eq('u.quizz', ':quizzId'),
        $qb->expr()->in(
            'u.numberCorrectAnswers',
            $qbSelectMax->getDQL()
        )
    )
    // :quizzId is bound once here and is also used by the embedded sub-DQL
    ->setParameter('quizzId', $quizz->getId())
    ->getQuery();
You can create a sub-DQL query that selects the max numberCorrectAnswers first and then pass its DQL straight into the in() expression.
Alternatively, ORDER BY can be used to sort by the most correct answers first and the shortest time second:
$qb = $this->createQueryBuilder('u')
->orderBy('u.numberCorrectAnswers', 'DESC')
->addOrderBy('u.answeredIn', 'ASC');
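One caveat about the original SQL is worth spelling out: the row with the maximum number of correct answers is not necessarily the row with the minimum time, so combining the two IN conditions can return nothing. Sorting and taking the first row avoids that. Here is a minimal sketch in Python with the built-in sqlite3 module; the sample rows are invented, and the column names follow the SQL in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_quizz ("
    "id INTEGER PRIMARY KEY, quizz_id INTEGER, "
    "number_correct_answers INTEGER, answered_in INTEGER)"
)
conn.executemany(
    "INSERT INTO user_quizz VALUES (?, ?, ?, ?)",
    [
        (1, 4, 10, 95),
        (2, 4, 10, 80),  # most correct answers, fastest among those
        (3, 4, 7, 40),   # fastest overall, but fewer correct answers
        (4, 5, 10, 10),  # belongs to a different quiz
    ],
)

# Original approach: both IN conditions must hold for the same row.
in_query = conn.execute(
    "SELECT id FROM user_quizz WHERE quizz_id = 4 "
    "AND number_correct_answers IN "
    "(SELECT max(number_correct_answers) FROM user_quizz WHERE quizz_id = 4) "
    "AND answered_in IN "
    "(SELECT min(answered_in) FROM user_quizz WHERE quizz_id = 4)"
).fetchall()

# Sorting approach: best score first, ties broken by shortest time.
best = conn.execute(
    "SELECT id FROM user_quizz WHERE quizz_id = 4 "
    "ORDER BY number_correct_answers DESC, answered_in ASC LIMIT 1"
).fetchone()

print(in_query, best)
```

With this data the IN-based query finds no row, while the sorted query returns the user with id 2. In the QueryBuilder, the LIMIT 1 part would correspond to calling setMaxResults(1).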
I'm trying to select the following data with the limited information. The problem is that when I have added the .select distinct section it has killed my query.
@activities = Availability.select.("DISTINCT user_id").where("team_id = ? and schedule_id = ?", current_user[:team_id], @next_game).last(5)
There's one dot too many in there: 'DISTINCT user_id' is the argument to the select method call.
So:
Availability.select("DISTINCT user_id").where("team_id = ? and schedule_id = ?", current_user[:team_id], @next_game).last(5)
Also be aware that you're now selecting only one attribute, so you'll get a partial representation of the objects back. To work around this, also select the attributes you need later in the code:
Availability.select("DISTINCT(`user_id`), `team_id`").where("team_id = ? and schedule_id = ?", current_user[:team_id], @next_game).last(5)
etc.
Hope this helps.
I am trying to work out how to convert the script below from SQL into LINQ. Any help would be welcome.
SELECT *
FROM
[tableName]
WHERE
[MyDate] IN
(SELECT
MAX([MyDate])
FROM
[tableName]
GROUP BY
[MyID])
I can't find an equivalent for the "IN" clause section. There are existing questions on this forum but none that cover selecting a DateTime.
Thanks in advance.
You can use the ".Contains(..)" function:
e.g.
var itemQuery = from cartItems in db.SalesOrderDetails
where cartItems.SalesOrderID == 75144
select cartItems.ProductID;
var myProducts = from p in db.Products
where itemQuery.Contains(p.ProductID)
select p;
Although this looks like two round trips, LINQ only builds and runs the query when the result is enumerated (deferred execution), so you should get reasonable performance.
I think Any() is what you are looking for:
var result = tableName.Where(x =>
    (from t in tableName
     group t by t.MyID into g
     where g.Max(y => y.MyDate) == x.MyDate
     select 1).Any());
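One point about the semantics, in case it matters for your data: the SQL in the question, and both LINQ translations of it, keep a row when its date equals the maximum date of any group, not necessarily of its own MyID. The small Python sketch below, with invented sample rows, shows the difference between that and a "latest row per MyID" filter. Whether the latter is what was intended is an assumption on my part.

```python
from datetime import date

# Invented sample rows for illustration.
rows = [
    {"MyID": 1, "MyDate": date(2021, 1, 10)},
    {"MyID": 1, "MyDate": date(2021, 3, 5)},
    {"MyID": 2, "MyDate": date(2021, 1, 10)},
    {"MyID": 3, "MyDate": date(2021, 3, 5)},
    {"MyID": 3, "MyDate": date(2021, 2, 1)},
]

# Equivalent of: SELECT MAX(MyDate) ... GROUP BY MyID
max_by_id = {}
for row in rows:
    current = max_by_id.get(row["MyID"])
    if current is None or row["MyDate"] > current:
        max_by_id[row["MyID"]] = row["MyDate"]

# Equivalent of: WHERE MyDate IN (<max dates of all groups>)
max_dates = set(max_by_id.values())
in_style = [row for row in rows if row["MyDate"] in max_dates]

# Stricter variant: the date must be the maximum of the row's own MyID.
latest_per_id = [row for row in rows if row["MyDate"] == max_by_id[row["MyID"]]]

print(len(in_style), len(latest_per_id))
```

Here the IN-style filter also keeps the older row for MyID 1, because its date happens to be the latest date of MyID 2.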
I've got myself in a bit of a pickle!
I've done a snazzy LINQ statement that does the job in my web app, but now I'd like to use this in a stored procedure:
var r = (from p in getautocompleteweightsproducts.tblWeights
where p.MemberId == memberid &&
p.LocationId == locationid
select p);
if (level != "0")
r = r.Where(p => p.MaterialLevel == level);
if (column == "UnitUserField1")
r = r.Where(p => p.UnitUserField1 == acitem);
if (column == "UnitUserField2")
r = r.Where(p => p.UnitUserField2 == acitem);
return r.OrderBy(p => p.LevelNo).ToList();
However, I can't for the life of me get the conditional where clause to work!!
If someone can point me in the right direction, I'd be most grateful.
Kind regards
Maybe something like this?
SELECT *
FROM dbo.weights
WHERE member_id = @memberid
AND location_id = @locationid
AND material_level = CASE WHEN @level = '0' THEN material_level
ELSE @level END
AND @acitem = CASE @column WHEN 'UnitUserField1' THEN unit_user_field_1
WHEN 'UnitUserField2' THEN unit_user_field_2
ELSE @acitem END
ORDER BY level_no
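To see how the CASE expressions switch each optional filter on and off, the same pattern can be tried against a throwaway table. This is a sketch in Python with the built-in sqlite3 module, not the stored procedure itself: SQLite named parameters (:level and so on) stand in for the T-SQL variables, the search helper is hypothetical, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weights ("
    "id INTEGER PRIMARY KEY, member_id INTEGER, location_id INTEGER, "
    "material_level TEXT, unit_user_field_1 TEXT, unit_user_field_2 TEXT, "
    "level_no INTEGER)"
)
conn.executemany(
    "INSERT INTO weights VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 7, 3, "1", "steel", "red", 2),
        (2, 7, 3, "2", "steel", "blue", 1),
        (3, 7, 3, "1", "glass", "red", 3),
        (4, 8, 3, "1", "steel", "red", 1),  # different member
    ],
)

# Same shape as the query above; each CASE falls back to a condition that is
# always true when its filter is not requested.
QUERY = """
SELECT id
FROM weights
WHERE member_id = :memberid
  AND location_id = :locationid
  AND material_level = CASE WHEN :level = '0' THEN material_level
                            ELSE :level END
  AND :acitem = CASE :column WHEN 'UnitUserField1' THEN unit_user_field_1
                             WHEN 'UnitUserField2' THEN unit_user_field_2
                             ELSE :acitem END
ORDER BY level_no
"""


def search(level, column, acitem, memberid=7, locationid=3):
    """Hypothetical helper: run the query and return the matching ids."""
    params = {
        "memberid": memberid,
        "locationid": locationid,
        "level": level,
        "column": column,
        "acitem": acitem,
    }
    return [row[0] for row in conn.execute(QUERY, params)]


no_filters = search("0", "", "")
level_and_field_1 = search("1", "UnitUserField1", "steel")
field_2_only = search("0", "UnitUserField2", "red")
print(no_filters, level_and_field_1, field_2_only)
```

Passing '0' for the level, or a column name that matches neither WHEN branch, leaves that filter disabled, which mirrors the conditional Where calls in the LINQ version.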
Have you tried LINQPad? I'm fairly sure that, last time I played with it, you could enter LINQ to SQL code and see the SQL it produced. Failing that, run a SQL trace/profiler while your LINQ to SQL code executes and find the query in the trace.
LukeH's answer will give you the correct rows, but there is something lost when you try to replace a query-generating-machine with a single query. There are parts of that query that are opaque to the optimizer.
If you need the original queries as-would-have-been-generated-by-linq, there are two options.
Generate every possible query and control which one runs by IF ELSE.
Use Dynamic sql to construct each query (although this trades away many of the benefits of using a stored procedure).
If you do decide to use dynamic sql, you should be aware of the curse and blessings of it.