Grouping items into first occurrences - sql

I totally can't get my head around writing a correct query for my problem this morning so here's hoping that someone out there can help me out.
I have a database table called Sessions which basically looks like this
Sessions:
SessionID
SessionStarted
IPAddress
..other meta data..
I have a requirement where I am to show how many new Sessions (where new is defined as from a previously unseen IPAddress) arrive each day over a given period. Basically, each IPAddress should count only once in the results, namely for the day of the first session from the IPAddress. So I'm looking for a result like:
[Date] [New]
2009-10-01 : 11
2009-10-02 : 6
2009-10-03 : 19
..and so on
...which I can plot on some nice chart and show to important people. I would very much prefer a Linq2SQL query as that is what we are currently using for data access, but if I'm out of luck I may be able to go with some raw SQL (accessed via stored procedure, but I would really, really, really prefer Linq2SQL).
(As a bonus my next step will very likely be qualifying which sessions should be included by filtering on some of the other meta data)
Hoping that someone clever will help me out here...

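I would use something like this:
var result = data
    .GroupBy(x => x.IPAddress)
    // Earliest session per IP address. Min is used because an OrderBy placed
    // before GroupBy is not guaranteed to survive translation to SQL.
    .Select(g => g.Min(x => x.SessionStarted))
    .GroupBy(started => started.Date)
    .Select(g => new { Date = g.Key, New = g.Count() })
    .OrderBy(r => r.Date);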
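To make the intended logic concrete, here is a minimal in-memory sketch in Ruby (not Linq2SQL, and not tied to any database). The SessionRow struct, the sample data, and the new_sessions_per_day helper are all hypothetical. The point is only the two-step grouping: earliest session per IP address first, then a count per calendar day.

```ruby
require "date"

# Hypothetical stand-in for a row of the Sessions table.
SessionRow = Struct.new(:id, :started, :ip)

SESSIONS = [
  SessionRow.new(1, Time.utc(2009, 10, 1, 9, 0),  "10.0.0.1"),
  SessionRow.new(2, Time.utc(2009, 10, 1, 9, 30), "10.0.0.2"),
  SessionRow.new(3, Time.utc(2009, 10, 2, 8, 0),  "10.0.0.1"), # returning IP
  SessionRow.new(4, Time.utc(2009, 10, 2, 8, 5),  "10.0.0.3"),
  SessionRow.new(5, Time.utc(2009, 10, 3, 7, 0),  "10.0.0.2")  # returning IP
].freeze

# Returns a hash of Date => number of IP addresses first seen on that date.
def new_sessions_per_day(sessions)
  # Step 1: earliest session time for each IP address.
  first_seen = sessions.group_by(&:ip).map { |_ip, rows| rows.map(&:started).min }
  # Step 2: count those first sightings per calendar day.
  first_seen.group_by(&:to_date)
            .map { |date, times| [date, times.size] }
            .sort
            .to_h
end

new_sessions_per_day(SESSIONS).each { |date, count| puts "#{date} : #{count}" }
# prints:
# 2009-10-01 : 2
# 2009-10-02 : 1
```

Days on which no new IP address appears (2009-10-03 in the sample) are simply absent from the result, which is also how the grouped query behaves.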


.NET Core - EF - trying to match/replace strings with digits, causes System.InvalidOperationException

I have to match records in SQL on a zip code that falls within a min/max range.
The challenge is that the data quality is bad: some zip codes are not purely numeric,
so I try to match "good zip codes" either by discarding the bad ones or by keeping only the digits.
I don't know how to use Regex.Replace(..., @"[^\d]", "") instead of Regex.Match(..., @"\d") in the query below.
I get an error at runtime with the code below.
I tried:
Regex.IsMatch
SqlFunctions.IsNumeric
They all cause errors at runtime. Here is the code:
var data = context.Leads.AsQueryable();
data = data.Include(p => p.Company).Include(p => p.Contact);
data = data.Where(p => Regex.IsMatch(p.Company.ZipCode, @"\d"));
data = data.Where(p => Convert.ToInt32(p.Company.ZipCode) >= range.Min);
data = data.Where(p => Convert.ToInt32(p.Company.ZipCode) <= range.Max);
Here is the error:
System.InvalidOperationException: The LINQ expression 'DbSet<Lead>
.Join(
outer: DbSet<Company>,
inner: l => EF.Property<Nullable<int>>(l, "CompanyId"),
outerKeySelector: c => EF.Property<Nullable<int>>(c, "Id"),
innerKeySelector: (o, i) => new TransparentIdentifier<Lead, Company>(
Outer = o,
Inner = i
))
.Where(l => !(Regex.IsMatch(
input: l.Inner.ZipCode,
pattern: "\d")))' could not be translated. Either rewrite the query in a form that can be translated, or switch to client evaluation explicitly by inserting a call to either AsEnumerable(), AsAsyncEnumerable(), ToList(), or ToListAsync().
I am not sure how to solve this. I really don't see how AsEnumerable(), AsAsyncEnumerable(), ToList(), or ToListAsync() could help here.
What am I doing wrong?
Thanks for your help.
When you work with an IQueryable, EF Core 5 always tries to translate the query to SQL, so you have to use code that SQL Server can understand. If you want to use C# functions, you first have to load the data into application memory using ToList() or ToArray(); after that you can use any C# function on the loaded data.
You can try something like this:
var data = context.Leads
    .Include(p => p.Company)
    .Include(p => p.Contact)
    .AsEnumerable() // switch to client evaluation: the filter below runs in memory, not in SQL
    .Where(p =>
        p.Company.ZipCode.All(char.IsDigit)
        && Convert.ToInt32(p.Company.ZipCode) >= range.Min // or >= 1
        && Convert.ToInt32(p.Company.ZipCode) <= range.Max) // or <= 99999
    .ToArray();
I tried everything imaginable:
all sorts of LINQ/EF trickery; I even tried to define a DbFunction, which was never found.
Once I had a working stored procedure written directly in SQL, I ended up with a list, not an IQueryable, so I was back to #1.
Finally, I just created a new field in my table:
ZipCodeNum
which holds a filtered, converted version of the zip code string.

How to Convert for loop to NHibernate Futures for performance

NHibernate Version: 3.4.0.4000
I'm currently working on optimizing our code so that we can reduce the number of round trips to the database and am looking at a for loop that is one of the culprits. I'm having a hard time figuring out how to batch all of these iterations into a future that gets executed once when sent to SQL Server. Essentially each iteration of the loop causes 2 queries to hit the database!
foreach (var choice in lineItem.LineItemChoices)
{
    choice.OptionVersion = _session.Query<OptionVersion>().Where(x => x.Option.Id == choice.OptionId).OrderByDescending(x => x.OptionVersionNumber).FirstOrDefault();
    choice.ChoiceVersion = _session.Query<ChoiceVersion>().OrderByDescending(x => x.ChoiceVersionIdentity.ChoiceVersionNumber).Where(x => x.Choice.Id == choice.ChoiceId).FirstOrDefault();
}
One option is to extract OptionId and ChoiceId from all the LineItemChoices into two lists in local memory. Then issue just two queries, one for options and one for choices, passing these lists to .Where(x => optionIds.Contains(x.Option.Id)). This corresponds to the SQL IN operator. It requires some postprocessing: you will get two result lists (transform them to a dictionary or lookup if you expect many results) that you need to process to populate the choice objects. This postprocessing is local and tends to be very cheap compared to database round trips. This option can be a bit tricky if the existing FirstOrDefault part is absolutely necessary. Do you expect there to be more than one result for a single optionId? If not, this code could instead have used SingleOrDefault, which could just be dropped when converting to IN queries.
The other option is to use futures (https://nhibernate.info/doc/nhibernate-reference/performance.html#performance-future). For Linq it means to use ToFuture or ToFutureValue at the end, which also conflicts with FirstOrDefault I believe. The important thing is that you need to loop over all line item choices to initialize ALL queries BEFORE you access the value of any of them. So this is likely to also result in some postprocessing, where you would first store the future values in some list, and then in a second loop access the real value from each query to populate the line item choice.
If you do expect that the queries can yield more than one result (before applying FirstOrDefault), I think you can just use Take(1) instead, as that will still return an IQueryable to which you can apply the future method.
The first option is probably the most efficient, since it will just be two queries and allow the database engine to make just one pass over the tables.
Keep the limit on the maximum number of parameters that can be given in an SQL query in mind. If there can be thousands of line item choices, you may need to split them in batches and query for at most 2000 identifiers per round trip.
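The batching idea is independent of NHibernate, so here is a small language-neutral sketch of it, written in Ruby. The fetch_in_batches helper and the 2000 limit constant are hypothetical names for illustration; the block passed in stands in for whatever code runs one IN query per batch.

```ruby
# Assumed upper bound on identifiers per query, taken from the advice above.
MAX_IDS_PER_QUERY = 2000

# Splits ids into batches, yields each batch to the caller (one query per
# batch), and concatenates the rows returned for every batch.
def fetch_in_batches(ids, batch_size = MAX_IDS_PER_QUERY)
  ids.uniq.each_slice(batch_size).flat_map { |batch| yield batch }
end

batch_sizes = []
rows = fetch_in_batches((1..4500).to_a) do |batch|
  batch_sizes << batch.size
  batch.map { |id| { id: id } } # stand-in for the rows a real query would return
end

puts batch_sizes.inspect # prints [2000, 2000, 500]
puts rows.size           # prints 4500
```

Deduplicating the identifiers before slicing keeps the number of round trips down when many line item choices share the same option or choice.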
Adding to Oskar's answer: NHibernate Futures were implemented in NHibernate 2.1. They are available through the Future method for collections and FutureValue for single values (ToFuture and ToFutureValue when using the LINQ provider).
In your case, you could separate the IDs of the list in memory ...
var optionIds = lineItem.LineItemChoices.Select(x => x.OptionId);
var choiceIds = lineItem.LineItemChoices.Select(x => x.ChoiceId);
... and execute two queries using futures to get two lists in a single round trip to the database.
var optionVersions = _session.Query<OptionVersion>()
    .Where(x => optionIds.Contains(x.Option.Id))
    .OrderByDescending(x => x.OptionVersionNumber)
    .ToFuture(); // LINQ queries use ToFuture(); Future<T>() belongs to the Criteria/HQL APIs
var choiceVersions = _session.Query<ChoiceVersion>()
    .Where(x => choiceIds.Contains(x.Choice.Id))
    .OrderByDescending(x => x.ChoiceVersionIdentity.ChoiceVersionNumber)
    .ToFuture();
Once everything you need is in memory, you can loop over the original collection and look up, in memory, the data needed to populate each choice object.
foreach (var choice in lineItem.LineItemChoices)
{
choice.OptionVersion = optionVersions.OrderByDescending(x => x.OptionVersionNumber).FirstOrDefault(x => x.Option.Id == choice.OptionId);
choice.ChoiceVersion = choiceVersions.OrderByDescending(x => x.ChoiceVersionIdentity.ChoiceVersionNumber).FirstOrDefault(x => x.Choice.Id == choice.ChoiceId);
}

Rails joins query killed or just too slow. Please recommend the proper way to create queries

I am having trouble properly creating a query with joins. It starts talking to the server, but it ends up clogged and the Rails console says "killed".
I have two models.
One is 'User', the other is 'Availability'.
Some users will have open availabilities within the next 2 weeks, and I'd like to fetch 50 users matching this condition, using a page variable (there will be many of them, so I'd like to fetch 50 on every call).
Availability has two columns: user_id and start_time (datetime).
The association is that a user has many availabilities.
The query looks like this:
people = User
.where(role: SOMETHING)
.includes(:availabilities)
.joins(:availabilities)
.where('availabilities.start_time > ?', Time.now)
.where('availabilities.start_time < ?', Time.now + 2.weeks)
.limit(5)
.offset(50 * (n-1))
where n is an integer starting from 1.
However, this query never returns a result in production (in the console it gets killed). Before the console kills the process, it keeps printing normal-looking query statements (SQL 30ms, for instance) forever. Locally, where the data set is small, it works. Is there anything missing here?
Please give me any advice!
The weird thing is that with
people = User
.where(role: SOMETHING)
.includes(:availabilities)
.joins(:availabilities)
.limit(5)
.offset(50 * (n-1))
I then get
people.map(&:id) => [18,18,18,18,18]
which means people is not fetched correctly. I am just confused here!
includes(:availabilities) instantiates Availability model objects for the rows that the query returns.
If there are many availability rows, this takes a lot of time.
If you don't use the availabilities after querying, try:
people = User
.where(role: ROLES)
.joins(:availabilities)
.where(availabilities: {start_time: (Time.now)..(2.weeks.from_now)})
.offset(50 * (n-1))
.limit(5)
I found a way to make it work:
people = User
.where(role: ROLES)
.joins(:availabilities)
.where(availabilities: { start_time: (Time.now)..(2.weeks.from_now) })
.distinct
.offset(50 * (n-1))
.limit(5)
Then:
people.map(&:id) => [x,y,z,l,m]
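Why the join repeated user 18 and why distinct fixes it can be shown without a database. The sketch below is plain Ruby that imitates an inner join in memory. UserRow, AvailabilityRow, joined_user_ids and page are hypothetical names, not Active Record API.

```ruby
# Hypothetical in-memory stand-ins for the two tables.
UserRow = Struct.new(:id, :role)
AvailabilityRow = Struct.new(:user_id, :start_time)

USERS = [UserRow.new(18, "tutor"), UserRow.new(19, "tutor"), UserRow.new(20, "tutor")].freeze
AVAILABILITIES = [
  AvailabilityRow.new(18, 1), AvailabilityRow.new(18, 2), AvailabilityRow.new(18, 3),
  AvailabilityRow.new(19, 1),
  AvailabilityRow.new(20, 2), AvailabilityRow.new(20, 5)
].freeze

# An inner join yields one row per (user, availability) pair, so a user with
# three availabilities appears three times.
def joined_user_ids(users, availabilities)
  users.flat_map do |user|
    availabilities.select { |a| a.user_id == user.id }.map { user.id }
  end
end

# Equivalent of offset(per_page * (number - 1)).limit(per_page).
def page(rows, per_page:, number:)
  rows.drop(per_page * (number - 1)).first(per_page)
end

joined = joined_user_ids(USERS, AVAILABILITIES)
puts page(joined, per_page: 2, number: 1).inspect      # prints [18, 18]
puts page(joined.uniq, per_page: 2, number: 1).inspect # prints [18, 19]
```

Because limit and offset are applied to the joined rows, pagination without distinct counts availabilities rather than users.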

More efficient Active Record query for large number of columns

I'm trying to work out a more efficient way to add a note count, with a couple of simple where conditions applied to the query. This can take forever, though, as there are as many as 20K records to iterate over. Would welcome any thinking on this.
def reblog_array(notes)
  data = []
  notes.select('note_type, count(*) as count').where(:note_type => 'reblog', :created_at => Date.today.years_ago(1)..Date.today).group('DATE(created_at)').each do |n|
    data << n.count
  end
  return data
end
This is what's passed to reblog_array(notes) from my controller.
@tumblr = Tumblr.find(params[:id])
@notes = Note.where("tumblr_id = '#{@tumblr.id}'")
From what I can tell, you are trying to calculate how many reblogs/day this Tumblr account/blog had? If so,
notes.where(:note_type => 'reblog', :created_at => Date.today.years_ago(1)..Date.today).group('DATE(created_at)').count.values
should give you the right result, without having to iterate over the result list again. One thing to note, your call right now won't indicate when there are days with 0 reblogs. If you drop the call to #values, you'll get a hash of date => count.
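If the days with 0 reblogs matter (for a chart, say), the gaps can be filled in after the query. This is a minimal sketch in plain Ruby. fill_missing_days is a hypothetical helper, and it assumes the hash keys are Date objects; depending on the database adapter the grouped keys may come back as strings, in which case they would need to be parsed first.

```ruby
require "date"

# Returns a hash with one entry per day in from..to, using 0 where the
# grouped count had no row for that day.
def fill_missing_days(counts_by_date, from, to)
  (from..to).map { |day| [day, counts_by_date.fetch(day, 0)] }.to_h
end

# Sample data standing in for the date => count hash from the grouped query.
counts = { Date.new(2013, 1, 1) => 4, Date.new(2013, 1, 3) => 2 }
filled = fill_missing_days(counts, Date.new(2013, 1, 1), Date.new(2013, 1, 4))

puts filled.values.inspect # prints [4, 0, 2, 0]
```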
As an aside and in case you didn't know, I'd also suggest making more use of the ActiveRecord relations:
class Tumblr < ActiveRecord::Base
  has_many :notes
end
@tumblr = Tumblr.find(params[:id])
@notes = @tumblr.notes
This way you avoid writing code like Note.where("tumblr_id = '#{@tumblr.id}'"). It's best to avoid string-interpolated parameters in favour of code like Note.where(:tumblr_id => @tumblr.id) or Note.where("tumblr_id = ?", @tumblr.id), which leaves less chance that you'll write code vulnerable to SQL injection.
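To see why interpolation is the risky form, here is a small plain-Ruby sketch with no database involved. interpolated_condition and bound_condition are hypothetical helpers that only build the condition. The second returns the array form, which keeps the SQL text and the value separate so the database layer can quote the value.

```ruby
# Builds the condition by pasting the value into the SQL text.
def interpolated_condition(id)
  "tumblr_id = '#{id}'"
end

# Keeps the SQL text fixed and passes the value alongside it.
def bound_condition(id)
  ["tumblr_id = ?", id]
end

malicious = "1' OR '1'='1"

puts interpolated_condition(malicious)  # prints tumblr_id = '1' OR '1'='1'
puts bound_condition(malicious).inspect # prints ["tumblr_id = ?", "1' OR '1'='1"]
```

In the first case the input has changed the structure of the condition so that it matches every row. In the second the SQL text is still just tumblr_id = ?, and the input remains a single value.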

Rails 3 selecting only values

In rails 3, I would like to do the following:
SomeModel.where(:some_connection_id => anArrayOfIds).select("some_other_connection_id")
This works, but I get the following from the DB:
[{"some_other_connection_id":254},{"some_other_connection_id":315}]
Now, those ids are the ones I need, but I am unable to write a query that gives me only the ids. I do not want to have to iterate over the results just to get those numbers out. Is there any way for me to do this with something like:
SomeModel.where(:some_connection_id => anArrayOfIds).select("some_other_connection_id").values()
Or something of that nature?
I have been trying the ".select_values()" found on GitHub, but it only returns "some_other_connection_id".
I am not an expert in Rails, so this info might also be helpful:
"SomeModel" is a connecting table for a many-to-many relation in one of my other models. So what I am actually trying to do is, starting from the array of IDs, get all the entries on the other side of the connection. Basically, I have the source ids, and I want to get the data from the models with all the target ids. If there is a magic way of getting these without having to write all the SQL myself (with some help from Active Record), that would be really nice!
Thanks :)
Try the pluck method (available since Rails 3.2):
SomeModel.where(:some => condition).pluck("some_field")
It works like:
SomeModel.where(:some => condition).select("some_field").map(&:some_field)
SomeModel.where(:some_connection_id => anArrayOfIds).select("some_other_connection_id").map &:some_other_connection_id
This is essentially a shorthand for:
results = SomeModel.where(:some_connection_id => anArrayOfIds).select("some_other_connection_id")
results.map {|row| row.some_other_connection_id}
Look at Array#map for details on the map method.
Beware that there is no lazy loading here, since map iterates over the results immediately. That shouldn't be a problem unless you want to add more clauses to your query or retrieve associated objects (which should not be the case, as you haven't selected the ids needed to load the associated objects).