SQL string lookup. This worked, but seems like it isn't best practice - sql

Using SSMS 2014
I have data in the db in a particular column. The average line of data looks something like this:
(5:30) 3-J.WINSTON PASS INCOMPLETE DEEP LEFT TO 13-M.EVANS (23-R.ALFORD).
My task is to retrieve the first player featured in the line, in this case I am retrieving J.Winston.
I used this to retrieve and update a column with the name. It worked exactly as needed. But something about it tells me this is poorly constructed. Any tips on improving?
UPDATE nfl.dbo.Temp_NFL2015
SET Player_Name = SUBSTRING(
        description,
        LEN(LEFT(description, PATINDEX('%-%', description))) + 1,
        CHARINDEX(' ', description, LEN(LEFT(description, PATINDEX('%-%', description))))
            - 1
            - LEN(LEFT(description, PATINDEX('%-%', description)))
    )
FROM nfl.dbo.Temp_NFL2015
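Outside of SQL, the same first-player extraction can be sanity-checked with a short regular expression. A sketch in Python; the pattern "jersey number, dash, initial, dot, surname" is an assumption based on the sample line, not a documented format:

```python
import re

def first_player(description):
    # Find the first "3-J.WINSTON"-style token: a jersey number,
    # a dash, then an initial, a dot, and the surname.
    match = re.search(r"\d+-([A-Z]\.[A-Z]+)", description)
    return match.group(1) if match else None

play = "(5:30) 3-J.WINSTON PASS INCOMPLETE DEEP LEFT TO 13-M.EVANS (23-R.ALFORD)."
print(first_player(play))  # J.WINSTON
```

Having the rule written as one pattern also makes it easy to spot lines that don't match (kickoffs, penalties, etc.) before they silently produce garbage substrings.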

Normalization of the data would help simplify things when you're extracting data or doing research.
Considering your example:
(5:30) 3-J.WINSTON PASS INCOMPLETE DEEP LEFT TO 13-M.EVANS (23-R.ALFORD).
The syntax (roughly) works out to:
(Game Time) (Source Player) (Action) (Target Player) ((Other Involved Players))
This suggests a base table Actions with the following structure:
ActionID (identity to mark discrete actions)
GameTime
SourcePlayer
Action
TargetPlayer
To include other involved players, we split that off into a separate table, Action_InvolvedPlayers:
InvolvedActionID (references Actions.ActionID)
InvolvedPlayer
Points for consideration:
Your dataset likely contains multiple games, so consider an explicit way to say 'this game had this action take place'
It might make sense to store player data in another table and just reference those values in the Actions table.
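A minimal sketch of that structure (SQLite syntax here as a stand-in for SQL Server; the column types and the sample row are illustrative assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Actions (
    ActionID     INTEGER PRIMARY KEY,   -- identity marking a discrete action
    GameTime     TEXT,
    SourcePlayer TEXT,
    Action       TEXT,
    TargetPlayer TEXT
);
CREATE TABLE Action_InvolvedPlayers (
    InvolvedActionID INTEGER REFERENCES Actions(ActionID),
    InvolvedPlayer   TEXT
);
""")
conn.execute("INSERT INTO Actions VALUES "
             "(1, '5:30', 'J.WINSTON', 'PASS INCOMPLETE DEEP LEFT', 'M.EVANS')")
conn.execute("INSERT INTO Action_InvolvedPlayers VALUES (1, 'R.ALFORD')")
row = conn.execute("""
    SELECT a.SourcePlayer, a.TargetPlayer, p.InvolvedPlayer
    FROM Actions a
    JOIN Action_InvolvedPlayers p ON p.InvolvedActionID = a.ActionID
""").fetchone()
print(row)  # ('J.WINSTON', 'M.EVANS', 'R.ALFORD')
```

Once the play-by-play text has been parsed into these columns, questions like "every player involved in a given action" become a plain join instead of string surgery.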

Related

Set a filter with distinct values

Is there a way to set up something like a <SelectInput> filter on a column of the list to get only the distinct values of this column?
Something like the <ReferenceInput>, but on the same table and with unique values...
No, but for good reason. Say you have data with billions of distinct records. You don't want your frontend determining what is unique. Instead you want an API that can support that data specifically, and hopefully quickly.
So long story short, you'll need an API for that.
Along the lines of what Shawn K says, perhaps create a View on your backend that represents the state of what is currently 'distinct', acknowledging that it might be stale/non-realtime. Then you could use the contents of that View to represent the choices available to the user. If generating the distinct set of values is non-performant, then if you're in a DB like Postgres (et al), create a Materialized View, refreshed on a timer.
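A rough sketch of the view idea (SQLite shown as a stand-in for brevity; in Postgres you would use CREATE MATERIALIZED VIEW plus REFRESH MATERIALIZED VIEW on a timer, and the table and column names here are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (category TEXT)")
conn.executemany("INSERT INTO records VALUES (?)",
                 [("red",), ("blue",), ("red",), ("green",), ("blue",)])
# The view precomputes the distinct set, so the frontend only ever
# reads it; it never scans the base table itself.
conn.execute("CREATE VIEW distinct_categories AS "
             "SELECT DISTINCT category FROM records")
choices = sorted(r[0] for r in conn.execute("SELECT category FROM distinct_categories"))
print(choices)  # ['blue', 'green', 'red']
```

An API endpoint backed by this view is then what the frontend filter queries for its list of choices.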
Binding the view data to the filter input becomes the trick at that point, but there are probably clues to doing that here on SO, and you could piece the two together.
BTW, I use Views regularly to handle certain edge cases like this. Beats caching data in a middle tier for sure.

Reference an arbitrary row and field in another table

Is there any way (data type, inheritance, ...) to implement something like this in PostgreSQL:
CREATE TABLE log (
    datareferenced table_row_column_reference,
    logged boolean
);
The referenced data may be any row field from the database. My objective is to implement something like this without using a procedural language or implementing it in a higher layer, using only a relational approach and without modifying the rest of the tables. Referential integrity would be another desired feature, for example:
-- Table foo (id, field1, field2, fieldn)
-- ('bar', '2014-01-01', 4.33, Null)
-- Table log (datareferenced, logged)
-- ({table foo -> id:'bar' -> field2 } <=> 4.33, True)
DELETE FROM foo where id='bar';
-- as result, on cascade, deleted both rows.
I have an application built on an MVC pattern. The logic is written in Python. The application is a management tool and very data intensive. My goal is to implement a module that can store additional information for every piece of data present in the database. For example, a client has a series of attributes (name, address, phone, email, ...) across multiple tables, and I want the app to be able to store metadata-like information for every record in the whole database. Such metadata could be the last modification time, a user flag, etc.
I have implemented the metadata model (in Postgres), its mapping to objects, and a partial API. But the part that is left is the most important: the glue. My plan B is to create that glue in the data-mapping layer as a module. Something like this:
address = person.addresses[0]
address.saveMetadata('foo', 'bar')

# in the superclass of Address
def saveMetadata(self, code, value):
    self.mapper.metadata_adapter.save(self, code, value)

# in the metadata adapter class:
def save(self, entity, code, value):
    sql = """update metadata_values set value = %s
             where code = %s and idmetadata =
               (select id from metadata_rels mr
                where mr.schema = %s and mr.table = %s and
                      mr.field = %s and mr.recordpk = %s)"""
    params = (value, code,
              self.class2data[entity.__class__]["schema"],
              self.class2data[entity.__class__]["table"],
              self.class2data[entity.__class__]["field"],
              entity.id)
    # pass the values as query parameters instead of interpolating them
    # into the SQL string, to avoid injection and quoting bugs
    self.mapper.execute(sql, params)

def read(self, entity, code):
    sql = """select mv.value
             from metadata_values mv
             join metadata_rels mr on mv.idmetadata = mr.id
             where mv.code = %s and mr.schema = %s and mr.table = %s and
                   mr.field = %s and mr.recordpk = %s"""
    params = (code,
              self.class2data[entity.__class__]["schema"],
              self.class2data[entity.__class__]["table"],
              self.class2data[entity.__class__]["field"],
              entity.id)
    return self.mapper.execute(sql, params)
But it would add overhead between Python and PostgreSQL, complicate the Python logic, and using PL and triggers may be very laborious and bug-prone. That is why I'm looking at doing the same at the database level.
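As a sanity check of the glue itself, here is the metadata join running against an in-memory SQLite database (the table layout is inferred from the snippet above; schema and table are renamed to schema_name and table_name because table is a reserved word, and the sample row is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE metadata_rels (
    id INTEGER PRIMARY KEY,
    schema_name TEXT, table_name TEXT, field TEXT, recordpk TEXT
);
CREATE TABLE metadata_values (idmetadata INTEGER, code TEXT, value TEXT);
INSERT INTO metadata_rels VALUES (1, 'public', 'foo', 'field2', 'bar');
INSERT INTO metadata_values VALUES (1, 'user_flag', 'True');
""")
# Parameterized read, mirroring the adapter's query: placeholders rather
# than string interpolation, so values cannot inject SQL.
row = conn.execute("""
    SELECT mv.value
    FROM metadata_values mv
    JOIN metadata_rels mr ON mv.idmetadata = mr.id
    WHERE mv.code = ? AND mr.schema_name = ? AND mr.table_name = ?
      AND mr.field = ? AND mr.recordpk = ?
""", ("user_flag", "public", "foo", "field2", "bar")).fetchone()
print(row[0])  # True
```

This confirms the (schema, table, field, pk) tuple works as a soft "pointer" to any cell; what it cannot give you, without triggers, is the on-delete cascade behavior asked about above.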
No, there's nothing like that in PostgreSQL.
You could build triggers yourself to do it, probably using a composite type. But you've said (for some reason) you don't want to use PL/PgSQL, so you've ruled that out. Getting RI triggers right is quite hard, though, and you must apply a trigger to the referencing and referenced ends.
Frankly, this seems like a square peg, round hole kind of problem. Are you sure PostgreSQL is the right choice for this application?
Describe your needs and goal in context. Why do you want this? What problem are you trying to solve? Maybe there's a better way to approach the same problem one step back...

nhibernate and DDD suggestion

I am fairly new to nHibernate and DDD, so please bear with me.
I have a requirement to create a new report from my SQL table. The report is read-only and will be bound to a GridView control in an ASP.NET application.
The report contains the following fields: Style, Color, Size, LAQty, MTLQty, Status.
I have the entities for Style, Color and Size, which I use in other ASP.NET pages via repositories. I am not sure if I should use the same entities for my report or not. If I use them, where am I supposed to map the Qty and Status fields?
If I should not use the same entities, should I create a new class for the report?
As said I am new to this and just trying to learn and code properly.
Thank you
For reports it's usually easier to use plain values or special DTOs. Of course you can query for the entity that references all the information, but to put it into the list (e.g. using data binding) it's handier to have a single class that contains all the values in plain form.
To get more specific solutions than the few below, you need to tell us a little about your domain model. What does the class model look like?
Generally, you have at least three options for getting "plain" values from the database using NHibernate.
Write HQL that returns an array of values
For instance:
select e1.Style, e1.Color, e1.Size, e2.LAQty, e2.MTLQty
from entity1 inner join entity2
where (some condition)
The result will be a list of object[]. Every item in the list is a row; every item in the object[] is a column. This is quite like SQL, but on a higher level (you describe the query at the entity level) and database independent.
Or you create a DTO (data transfer object) only to hold one row of the result:
select new ReportDto(e1.Style, e1.Color, e1.Size, e2.LAQty, e2.MTLQty)
from entity1 inner join entity2
where (some condition)
ReportDto needs a constructor that takes all these arguments. The result is a list of ReportDto.
Or you use the Criteria API (recommended). Note that the projections go into a ProjectionList passed to SetProjection, and the result transformer comes from the Transformers helper class:
session.CreateCriteria(typeof(Entity1), "e1")
    .CreateCriteria("e1.Entity2", "e2")  // association path to Entity2
    .Add( /* some condition */ )
    .SetProjection(Projections.ProjectionList()
        .Add(Projections.Property("e1.Style"), "Style")
        .Add(Projections.Property("e1.Color"), "Color")
        .Add(Projections.Property("e1.Size"), "Size")
        .Add(Projections.Property("e2.LAQty"), "LAQty")
        .Add(Projections.Property("e2.MTLQty"), "MTLQty"))
    .SetResultTransformer(Transformers.AliasToBean(typeof(ReportDto)))
    .List<ReportDto>();
The ReportDto needs a property matching each alias: "Style", "Color", etc. The output is a list of ReportDto.
I'm not schooled in DDD exactly, but I've always modeled my nouns as types, and I'm surprised the report itself is an entity. DDD or not, I wouldn't do that; rather, I'd have my reports reflect the results of a query, in which quantity is presumably count(*) or sum(lineItem.quantity) and status is also calculated (perhaps in the page).
You haven't described your domain, but there is a clue in those column headings that you may be doing a pivot over the data to create LAQty and MTLQty, which you'll find hard to do in NHibernate, as it's designed for OLTP and did not even do UNION last I checked. That said, there is nothing wrong with abusing HQL (Hibernate Query Language) for lightweight reporting, as long as you understand you are abusing it.
I see Stefan has done a grand job of describing the syntax for that, so I'll stop there :-)

Database : best way to model a spreadsheet

I am trying to figure out the best way to model a spreadsheet (from the database point of view), taking into account :
The spreadsheet can contain a variable number of rows.
The spreadsheet can contain a variable number of columns.
Each column can contain one single value, but its type is unknown (integer, date, string).
It has to be easy (and performant) to generate a CSV file containing the data.
I am thinking about something like :
# models defined in dependency order, so each ForeignKey target already exists
class Spreadsheet(models.Model):
    name = models.CharField(max_length=100)
    creation_date = models.DateField()

class Column(models.Model):
    spreadsheet = models.ForeignKey(Spreadsheet, on_delete=models.CASCADE)
    name = models.CharField(max_length=100)
    type = models.CharField(max_length=100)

class Cell(models.Model):
    column = models.ForeignKey(Column, on_delete=models.CASCADE)
    row_number = models.IntegerField()
    value = models.CharField(max_length=100)
Can you think of a better way to model a spreadsheet? My approach stores every value as a string. I am worried that it will be too slow to generate the CSV file.
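On the CSV worry: the expensive part is usually the pivot from (row, column, value) triples back into grid rows, and done in a single pass it is cheap for reasonable sizes. A sketch; the function and the tuple shape are my own, assuming the cells were fetched as tuples via something like Cell.objects.values_list:

```python
import csv
import io
from collections import defaultdict

def cells_to_csv(column_names, cells):
    """column_names: ordered list of column names.
    cells: iterable of (row_number, column_name, value) tuples."""
    # One pass to pivot the sparse triples into per-row dicts.
    grid = defaultdict(dict)
    for row_number, column_name, value in cells:
        grid[row_number][column_name] = value
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(column_names)
    for row_number in sorted(grid):
        # Missing cells become empty fields.
        writer.writerow([grid[row_number].get(name, "") for name in column_names])
    return buf.getvalue()

cells = [(1, "name", "Ada"), (1, "age", "36"), (2, "name", "Alan")]
print(cells_to_csv(["name", "age"], cells))
```

Since everything is already stored as strings, no per-cell type conversion is needed on export, which is the one place this schema actually helps.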
from a relational viewpoint:
Spreadsheet <-->> Cell : RowId, ColumnId, ValueType, Contents
there is no requirement for row and column to be entities, but you can if you like
Databases aren't designed for this. But you can try a couple of different ways.
The naive way to do it is a version of One Table To Rule Them All. That is, create a giant generic table, all types being (n)varchars, with enough columns to cover any foreseeable spreadsheet. Then you'll need a second table to store metadata about the first, such as what Column1's spreadsheet column name is, what type it stores (so you can cast in and out), etc. Then you'll need triggers to run against inserts that check the incoming data against the metadata to make sure it isn't corrupt, etc., etc. As you can see, this way is a complete and utter cluster. I'd run screaming from it.
The second option is to store your data as XML. Most modern databases have XML data types and some support for XPath within queries. You can also use XSDs to provide some kind of data validation, and XSLTs to transform that data into CSVs. I'm currently doing something similar with configuration files, and it's working out okay so far. No word on performance issues yet, but I'm trusting Knuth on that one.
The first option is probably easier to search and faster to retrieve data from, but the second is probably more stable and definitely easier to program against.
It's times like this I wish Celko had a SO account.
You may want to study EAV (Entity-attribute-value) data models, as they are trying to solve a similar problem.
Entity-Attribute-Value - Wikipedia
The best solution greatly depends on the way the database will be used. Try to find a couple of the top use cases you expect, and then decide on the design. For example, if there is no use case for getting the value of a single cell from the database (the data is always loaded at row level, or even in groups of rows), then there is no need to store a 'cell' as such.
That is a good question that calls for many answers depending on how you approach it, and I'd love to share an opinion with you.
This topic is one of the various ones we researched at Zenkit; we even wrote an article about it, and we'd love your opinion on it: https://zenkit.com/en/blog/spreadsheets-vs-databases/

Need Pattern for dynamic search of multiple sql tables

I'm looking for a pattern for performing a dynamic search on multiple tables.
I have no control over the legacy (and poorly designed) database table structure.
Consider a scenario similar to a resume search where a user may want to perform a search against any of the data in the resume and get back a list of resumes that match their search criteria. Any field can be searched at anytime and in combination with one or more other fields.
The actual sql query gets created dynamically depending on which fields are searched. Most solutions I've found involve complicated if blocks, but I can't help but think there must be a more elegant solution since this must be a solved problem by now.
Yeah, so I've started down the path of dynamically building the sql in code. Seems godawful. If I really try to support the requested ability to query any combination of any field in any table this is going to be one MASSIVE set of if statements. shiver
I believe I read that COALESCE only works if your data does not contain NULLs. Is that correct? If so, no go, since I have NULL values all over the place.
As far as I understand (and I'm also someone who has written against a horrible legacy database), there is no such thing as a dynamic WHERE clause. It has NOT been solved.
Personally, I prefer to generate my dynamic searches in code. It makes testing convenient. Note: when you create your SQL queries in code, don't concatenate in user input. Use your @variables!
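A minimal sketch of that code-generated approach (the table and column names are illustrative): only whitelisted column names ever reach the SQL text, while user-supplied values travel separately as parameters.

```python
# Only these columns may appear in the WHERE clause; user-supplied
# column names never get concatenated into the SQL text.
SEARCHABLE_COLUMNS = {"Name", "City", "Skill"}

def build_search(filters):
    """filters: {column_name: value}; returns (sql, params)."""
    clauses, params = [], []
    for column, value in filters.items():
        if column not in SEARCHABLE_COLUMNS:
            raise ValueError(f"not searchable: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT * FROM Resumes WHERE {where}", params

sql, params = build_search({"Name": "brian", "City": "Austin"})
print(sql)     # SELECT * FROM Resumes WHERE Name = ? AND City = ?
print(params)  # ['brian', 'Austin']
```

One loop over a whitelist replaces the "MASSIVE set of if statements" mentioned above, and each (filters → sql, params) pair is trivially unit-testable.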
The only alternative is to use the COALESCE operator. Let's say you have the following table:
Users
-----------
Name nvarchar(20)
Nickname nvarchar(10)
and you want to search optionally by name or nickname. The following query will do this:
SELECT Name, Nickname
FROM Users
WHERE
    Name = COALESCE(@name, Name) AND
    Nickname = COALESCE(@nick, Nickname)
If you don't want to search for something, just pass in a null. For example, passing in "brian" for @name and null for @nick results in the following query being evaluated:
SELECT Name, Nickname
FROM Users
WHERE
Name = 'brian' AND
Nickname = Nickname
The COALESCE operator turns the null into an identity comparison (Nickname = Nickname), which is true whenever the column is non-NULL, so it doesn't restrict the WHERE clause.
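The NULL caveat raised earlier is real and easy to demonstrate (SQLite shown as a stand-in for brevity): a row whose Nickname is NULL disappears even when you pass NULL for both parameters, because NULL = NULL evaluates to NULL, not true.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Name TEXT, Nickname TEXT)")
conn.executemany("INSERT INTO Users VALUES (?, ?)",
                 [("brian", "bri"), ("ann", None)])
query = """SELECT Name FROM Users
           WHERE Name = COALESCE(?, Name)
             AND Nickname = COALESCE(?, Nickname)"""
# "No filters at all": pass NULL for both parameters.
names = [r[0] for r in conn.execute(query, (None, None))]
print(names)  # ['brian'] -- 'ann' is lost because her Nickname is NULL
```

The usual workaround is ISNULL/COALESCE on the column side too, or the explicit (@p IS NULL OR Col = @p) form, at some cost to the query plan.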
Search and normalization can be at odds with each other. So probably the first thing would be to get some kind of "view" that shows all the searchable fields as a single row, with a single key getting you the resume. Then you can throw something like Lucene in front of that to give you a full-text index of those rows. The way that works is: you ask it for "x" in this view, and it returns the key. It's a great solution and came recommended by Joel himself on the podcast, within the first 2 months IIRC.
What you need is something like SphinxSearch (for MySQL) or Apache Lucene.
As you said in your example, let's imagine a resume composed of several fields:
Name,
Address,
Education (this could be a table of its own), or
Work experience (this could grow into its own table where each row represents a previous job).
So searching for a word in all those fields with WHERE rapidly becomes a very long query with several JOINs.
Instead, you could change your frame of reference and think of the whole resume as what it is: a single document, and you just want to search that document.
This is what tools like Sphinx Search do. They create a FULL TEXT index of your "document", and then you can query Sphinx and it will tell you where in the database that record was found.
Really good search results.
Don't worry about these tools not being part of your RDBMS; it will save you a lot of headaches to use the appropriate model ("documents") instead of the incorrect one ("tables") for this application.