In SQL, what is the memory-efficient way of "mapping" 1 ID to multiple IDs?

I'll describe my scenario so you guys understand what type of design pattern I'm looking for.
I'm making an application where I give someone a link that is associated with one or more files. For example, someone needs somePowerpoint.ppx, main.cpp and somevid.mp4, and I have a tool that associates kj13h1djdsja213j1hhadad9933932 with those 3 files so that I can give someone
mysite.com/getfiles?fid=kj13h1djdsja213j1hhadad9933932
and they'll get a list of those files that they can download individually or all at once.
Since I'm new to SQL, the only way I know of doing that is having my tool use a table like
fid                            | filename
------------------------------ | ---------------------------
kj13h1djdsja213j1hhadad9933932 | somePowerpoint.ppx
kj13h1djdsja213j1hhadad9933932 | main.cpp
kj13h1djdsja213j1hhadad9933932 | somevid.mp4
jj133823u22h248884h4h24h01h232 | someotherfile.someextension
to go along with the above example. It would be nice if I could do some equivalent of
fid                            | filename(s)
------------------------------ | -----------------------------------------
kj13h1djdsja213j1hhadad9933932 | somePowerpoint.ppx, main.cpp, somevid.mp4
jj133823u22h248884h4h24h01h232 | someotherfile.someextension
but I'm not sure if that's possible or if I should be using some other design pattern altogether.
Any advice?

I believe the question "Concatenate many rows into a single text string?" can give you a query that generates your condensed format. You'd still want to store the data in SQL as the full list (one row per file), but you could create a view that shows the condensed version using the query from that link.
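As a minimal sketch of that advice, assuming SQLite through Python's built-in sqlite3 module (the table name file_links, the view name file_links_condensed and the helper functions are hypothetical): the table keeps one row per (fid, filename) pair, and a view builds the condensed list with SQLite's group_concat. Other databases spell the aggregate differently, for example STRING_AGG in SQL Server 2017+ and PostgreSQL, or GROUP_CONCAT in MySQL.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """In-memory demo DB: one row per (fid, filename) plus a condensed view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE file_links (
            fid      TEXT NOT NULL,
            filename TEXT NOT NULL,
            PRIMARY KEY (fid, filename)  -- no duplicate pairs; also indexes lookups by fid
        );
        -- Condensed, read-only presentation of the same rows
        CREATE VIEW file_links_condensed AS
            SELECT fid, group_concat(filename, ', ') AS filenames
            FROM file_links
            GROUP BY fid;
        """
    )
    rows = [
        ("kj13h1djdsja213j1hhadad9933932", "somePowerpoint.ppx"),
        ("kj13h1djdsja213j1hhadad9933932", "main.cpp"),
        ("kj13h1djdsja213j1hhadad9933932", "somevid.mp4"),
        ("jj133823u22h248884h4h24h01h232", "someotherfile.someextension"),
    ]
    conn.executemany("INSERT INTO file_links (fid, filename) VALUES (?, ?)", rows)
    return conn


def files_for(conn: sqlite3.Connection, fid: str) -> list:
    """What a getfiles?fid=... handler would need: every filename linked to one fid."""
    cur = conn.execute(
        "SELECT filename FROM file_links WHERE fid = ? ORDER BY filename", (fid,)
    )
    return [name for (name,) in cur]


def condensed(conn: sqlite3.Connection) -> dict:
    """fid -> 'a, b, c' as produced by the view (element order is not guaranteed)."""
    return dict(conn.execute("SELECT fid, filenames FROM file_links_condensed"))


if __name__ == "__main__":
    conn = build_demo_db()
    print(files_for(conn, "kj13h1djdsja213j1hhadad9933932"))
    print(condensed(conn))
```

The composite primary key rejects duplicate fid/filename pairs and also serves as the index for the WHERE fid = ? lookup. SQLite documents the order of the names inside group_concat as arbitrary, so sort in the application if the order matters.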

Related

SSIS Fuzzy Grouping Always Returns the Same Result with Different Similarity Thresholds

Can anyone tell me why my similarity is always 1?
My goal is, for example, for AAB and AAC to be put in the same group.
Thanks
After I tried different source data, I got the result I needed.
I think it is better to use real-world values as sample data.
Instead of AAA and AAC, use something like a Name column with Sara vs Saraa; then SSIS does put them in the same group. However, I found that Don vs Done are not grouped together. So maybe it is not a good idea to rely on it to filter records whose typos involve different letters?
Also, try using more than one column as your comparison columns.
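SSIS Fuzzy Grouping computes its own similarity internally, so the sketch below does not reproduce its scores. As a generic illustration only, it uses Python's difflib.SequenceMatcher as a stand-in character-level measure to show how a single similarity threshold decides which values share a group; the helper names similarity and group_by_threshold are hypothetical.

```python
import difflib


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib, NOT the SSIS algorithm)."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def group_by_threshold(values, threshold):
    """Greedy grouping: each value joins the first group whose first member
    is at least `threshold` similar to it; otherwise it starts a new group."""
    groups = []
    for value in values:
        for group in groups:
            if similarity(group[0], value) >= threshold:
                group.append(value)
                break
        else:
            groups.append([value])
    return groups


if __name__ == "__main__":
    values = ["AAB", "AAC", "Sara", "Saraa", "Don", "Done"]
    for threshold in (0.9, 0.8, 0.6):
        print(threshold, group_by_threshold(values, threshold))
```

With this particular measure, short strings that differ in one character (AAB vs AAC, about 0.67) score lower than longer near-duplicates (Sara vs Saraa, about 0.89), so they only end up together at a lower threshold.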

Clean unstructured place name to a structured format

I have around 300k records of unstructured data, as shown in the screenshot below. I'm trying to use Google Refine/OpenRefine to clean them up, but I can't find a proper way to do it. I'm new to this tool, so any help would be greatly appreciated. Also, the tool is quite slow with 300k records: whenever I try something out, it takes a long time to process and give an output.
Alternatively, can you suggest any other open-source tools or techniques to do this?
As Owen said in the comments, your question is probably too broad to receive a definitive answer. We can only give you a general procedure to follow.
In Open Refine, you'll need to create a column based on the messy column and apply transformations to delete unwanted characters. You'll have to use regular expressions, and for that you first need to identify patterns. It's not clear to me why the "ST" of "Nat.secu ST." is important but not the "US" in "Massy Intertech US", nor the "36" in "Plowk 36" (Google doesn't know this word, so I'm not sure it is an organisation name).
On the basis of your fifteen lines, however, we can distinguish some clear patterns. For example, it looks like you'll have to remove the tokens (sequences of characters without spaces) at the end of the string that contain a #. For that, the GREL formula in Open Refine could look like this:
value.trim().replace(/\b\w+#\w+\b$/, '').trim()
Here is a screencast if it's not clear to you.
But sometimes a company name may contain a #, in which case you will need to create more complex rules. For example, remove the token only if the string contains more than two words.
if(value.split(' ').length() > 2, value.replace(/\b\w+#\w+\b$/, ''), value)
And so on for the other patterns you find (for example, a number sequence at the end that contains more than 4 digits with one hyphen between them).
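Since the question also asks about other open-source tools, here is a minimal Python sketch of the same two rules using the re module, for example to run as a batch script over an export. The sample strings are made up, and the second rule is only one hypothetical reading of the "more than 4 digits with one hyphen" pattern described above.

```python
import re

# Mirrors the GREL regex /\b\w+#\w+\b$/ : a trailing token with a '#' inside it.
TRAILING_HASH_TOKEN = re.compile(r"\b\w+#\w+\b$")
# Hypothetical rule: a trailing digits-hyphen-digits sequence.
TRAILING_NUMBER = re.compile(r"\b\d+-\d+$")


def clean_name(value: str) -> str:
    value = value.strip()
    # Rule 1, like the GREL if(...): only drop the '#' token when the
    # string has more than two words, so short names containing '#' survive.
    if len(value.split(" ")) > 2:
        value = TRAILING_HASH_TOKEN.sub("", value).strip()
    # Rule 2: drop the trailing number sequence only if it holds more than 4 digits.
    match = TRAILING_NUMBER.search(value)
    if match and sum(ch.isdigit() for ch in match.group(0)) > 4:
        value = value[: match.start()].strip()
    return value


if __name__ == "__main__":
    for raw in ["Acme Corp X#12", "Acme X#12", "Acme Corp 123-4567", "Acme Corp 1-2"]:
        print(repr(raw), "->", repr(clean_name(raw)))
```

Each new pattern you identify becomes one more small, testable rule in the same function.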
Feel free to check out the Open Refine documentation in case of doubt.

How to execute a SQL query in JScript/JScript.NET

First of all, sorry for my English; it is not my native language.
I want to execute a SQL query in a script to get some data. I don't know whether it's possible and, if so, how to do it. To summarize:
The script adds a button in M3 Smart Office (an ERP). I have already done that.
When I select a row in an M3 function (like an article or a client), I want to take its ID (and some other data) and send it to a website.
There are a lot of functions in M3. Each function has fields that contain data, and one of them contains the ID of the object (an article, a client, ...). What I want to do is get this ID. The problem is that the field containing the ID doesn't have the same name in every function. So I have two solutions:
Write a lot of if/else if branches, like "if it's this function, take this field". But if I (or somebody else) want to add a function/field combination later, I (or somebody else ;) ) need to change the script. That's not practical.
Create a SQL table that contains all the function/field combinations. Then, in the script, I run a SQL query and get all the data the script needs.
So that's the situation. If you have ideas for doing this another way (without SQL), I'll take them!
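The second option amounts to a small lookup table. Smart Office scripts are JScript.NET, but as a language-neutral sketch of the idea, here it is in Python with an in-memory SQLite table. The table name id_fields, the function names ARTICLES and CLIENTS and the field names are all hypothetical placeholders, not real M3 identifiers.

```python
import sqlite3


def build_mapping_db() -> sqlite3.Connection:
    """Hypothetical table: which field holds the object ID in each function."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE id_fields (function_name TEXT PRIMARY KEY, id_field TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO id_fields (function_name, id_field) VALUES (?, ?)",
        [("ARTICLES", "ARTICLE_NO"), ("CLIENTS", "CLIENT_NO")],  # placeholder names
    )
    return conn


def id_field_for(conn: sqlite3.Connection, function_name: str):
    """Return the configured ID field name, or None if the function is unknown."""
    row = conn.execute(
        "SELECT id_field FROM id_fields WHERE function_name = ?", (function_name,)
    ).fetchone()
    return row[0] if row else None


def selected_id(conn: sqlite3.Connection, function_name: str, selected_row: dict):
    """Replace the if/else chain: look up the field name, then read it from the row."""
    field = id_field_for(conn, function_name)
    if field is None:
        raise KeyError(f"no ID field configured for {function_name!r}")
    return selected_row[field]


if __name__ == "__main__":
    conn = build_mapping_db()
    print(selected_id(conn, "ARTICLES", {"ARTICLE_NO": "A100", "NAME": "Widget"}))
```

Adding a new function/field combination is then one INSERT into the table instead of a change to the script.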
Please see this in-depth tutorial from the 4guysfromrolla site:
Server-Side JScript Objects

How to get ids from Freebase given part of a name (from Freebase Offline Dumps)?

I have asked this question here before. At that time, I was concerned with getting the output through the Google API, which works just fine.
The problem with that approach is running into timeouts and, more importantly, having to query a web-based API. I would like to do it offline using the Freebase data dumps. Is there any easy way to go about it?
Thanks
zegrep $'\tns:type\.object\.name\t.*Bush.*' freebase-rdf-<date>.gz | cut -f 1
will give you a list of all MIDs for topics which contain the string "Bush" (from your previous example) in their name.
Extend the regex as needed to include things like aliases, fancier name matching, etc.
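If zegrep is not available (for example on Windows), the same filter can be sketched in Python by streaming the compressed dump line by line. This assumes, as the command above does, that each line is tab-separated with the MID in the first field and the predicate written as ns:type.object.name; mids_with_name is a hypothetical helper.

```python
import gzip
import re


def mids_with_name(dump_path: str, needle: str):
    """Yield the first field (the MID) of every name triple containing `needle`."""
    # Same test as the zegrep pattern: tab, name predicate, tab, then the needle anywhere after it.
    pattern = re.compile(r"\tns:type\.object\.name\t.*" + re.escape(needle))
    with gzip.open(dump_path, "rt", encoding="utf-8") as dump:
        for line in dump:  # streamed, so the whole dump is never held in memory
            if pattern.search(line):
                # Equivalent of `cut -f 1`: keep the first tab-separated field.
                yield line.split("\t", 1)[0]
```

Usage would look like for mid in mids_with_name("freebase-rdf-<date>.gz", "Bush"): print(mid). Expect it to be slower than zegrep on the full dump.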

How to update field names automatically after updating SQL

I am changing the command text for a dataset inside the .rdl file.
I would like to know how I can update the resulting fields that are returned by the SELECT statement.
I know that these fields must be generated automatically, so I was wondering whether it's possible to update them right after editing the SQL code inline?
Usually, when someone wants to look at the query in the command text, it is for reference for an end user (from what I have seen). You may want to amend it, but with reporting your first question should ultimately be: "What am I doing this for?" If your goal is dynamic creation at runtime, I would avoid this and offer a few other suggestions:
Make it a stored procedure. If you have the know-how in SQL Server, a stored procedure is a convenient and fast way to get what you want, and you can optimize it with your SQL-fu to get good results. The downside is that if you work with multiple environments, you have to deploy the T-SQL as well as the RDL file.
Use an expression to build the dataset at runtime. Other developers have suggested this in cases where the query itself was not well optimized. I myself do not always see the advantage of doing this over simply writing the predicate so that it works well with good indexing on the source engine. Regardless, you can build your dataset at runtime: click 'fx' next to the query text and enter something like this (assuming you have a report parameter named Start):
="SELECT thing FROM table WHERE thing >= " & Parameters!Start.Value
Again, I have not really seen evidence that this is much faster than the plain parameterized query:
SELECT thing
FROM table
WHERE thing >= @Start
But it is there if you just want to build it dynamically.
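To make the difference between the two styles concrete, here is a minimal Python/SQLite sketch (not SSRS itself; the things table and the helper names are hypothetical). The first function pastes the value into the SQL text, the way the expression above builds a string; the second binds it as a parameter, the way WHERE thing >= @Start does (Python's sqlite3 driver spells the placeholder :Start).

```python
import sqlite3


def demo_conn() -> sqlite3.Connection:
    """Hypothetical table standing in for the report's source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE things (thing INTEGER)")
    conn.executemany("INSERT INTO things (thing) VALUES (?)", [(1,), (5,), (10,)])
    return conn


def dynamic_query(conn: sqlite3.Connection, start) -> list:
    # Like the 'fx' expression: the value becomes part of the SQL text itself.
    sql = "SELECT thing FROM things WHERE thing >= " + str(start) + " ORDER BY thing"
    return [row[0] for row in conn.execute(sql)]


def parameterized_query(conn: sqlite3.Connection, start) -> list:
    # Like WHERE thing >= @Start: the SQL text is fixed and the value is bound.
    sql = "SELECT thing FROM things WHERE thing >= :Start ORDER BY thing"
    return [row[0] for row in conn.execute(sql, {"Start": start})]


if __name__ == "__main__":
    conn = demo_conn()
    print(dynamic_query(conn, 5), parameterized_query(conn, 5))
```

Both return the same rows for a normal value, but only the concatenated version lets the value change the statement: passing the string 0 OR 1=1 to dynamic_query returns every row. That, rather than raw speed, is arguably the stronger reason to prefer the parameterized form.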
You can also build your expression dynamically, with parameters forming PART of the SELECT statement. SSRS is all about 'expressions' and what you can do with them. Once you jump in and learn how they apply to everything, you can go nuts, so to speak, using them. As a general rule, though, the more of them you use and rely on, the slower your reports will become.
I hope some of this helps. I would first ask whether the need for something dynamic is driven by events or by performance.