Measure occurrences in MS Access SQL - sql

I was wondering whether it is possible (and if it is, the way) to perform the following task in Microsoft Access: there are two tables; first one consists of three columns 1) ID 2) tweet text and 3) date. Second table is a single vector of words, like a lexicon. For every row of the first table, I want to measure occurrences of the words of the second table (lexicon) in the tweet text column (2). Following, I want to add a new column in the first table, in which I will keep a record of these occurrences. My ultimate purpose is to perform some sort of sentiment analysis.
In case this helps, this is what I have done so far:
SELECT *
FROM Tweet_data
WHERE Tweet_text LIKE "*" & Positive_sentiment & "*";
However, I most probably have to make some changes in the part following the LIKE
If you think there is a more practical way to perform such task (sentiment analysis) I am open to suggestions.

You can use like in a join:
SELECT *
FROM Tweet_data td LEFT JOIN
Lexicon l
ON td.Tweet_text LIKE "*" & l.Positive_sentiment & "*";
The part of your question on how to do sentiment analysis is too broad for Stack Overflow. I will say, though, that MS Access is not the first tool that comes to mind.
EDIT:
The question is your comment is quite different from your actual question, but it is really a simple variation:
SELECT l.Positive_sentiment, count(td.tweet_text) as cnt
FROM Lexicon l LEFT JOIN
Tweet_data td
ON td.Tweet_text LIKE "*" & l.Positive_sentiment & "*";

Related

How to process a table serially for count

I need to process through a list of technical skills one by one and get a count of the number of developers we have in 3 locations who have that skill. For example, car type = "Java". How many persons have this skill listed in their resume.
I have 2 tables:
Skills: contains a single column listing skills - "Java" for example
Resources: contains 4 columns, Resource-ID, Name, Location, and a long text field called "Resume" containing text of their resume.
If this were a single skill I would process the SQL something like below (SQL syntax not exact)
SELECT count FROM [Resources] WHERE ([Resources].[Resume] Like "SKILL-ID*");
I want to process the Skills table serially printing the "Skill" and the count in each location.
Help appreciated
I've only used Access as a DB for single record retrieval, never using table values as input to loop through a process. I suspect that this is a simple execution in MS Access.
Ok, so we have to process that table.
I would simple "send out" a row with Location and skill for each match. We could write some "messy" code to then group by, but that is what SQL is for!!!
We could quite easy keep/have/use/enjoy the results in code, but its better to send the results out to a working table. Then we can use what sql does best - group and count that data.
So, then, the code could be this:
Sub CountSkills()
' empty out our working report table
CurrentDb.Execute "DELETE * FROM ReportResult"
Dim rstSkills As DAO.Recordset
Dim rstResources As DAO.Recordset
Dim rstReportResult As DAO.Recordset
Dim strSQL As String
Set rstSkills = CurrentDb.OpenRecordset("Skills")
strSQL = "SELECT Location, Resume FROM Resources " & _
"ORDER BY Location"
Set rstResources = CurrentDb.OpenRecordset(strSQL)
Set rstReportResult = CurrentDb.OpenRecordset("ReportResult")
Do While rstResources.EOF = False
' now for each resource row, process skill set
rstSkills.MoveFirst
Do While rstSkills.EOF = False
If InStr(rstResources!Resume, rstSkills!Skill) > 0 Then
rstReportResult.AddNew
rstReportResult!Location = rstResources!Location
rstReportResult!Skill = rstSkills!Skill
rstReportResult.Update
End If
rstSkills.MoveNext
Loop
rstResources.MoveNext
Loop
End Sub
Now, the above will wind up with a table looking like this:
So, now we can query (and count) against above data.
So, this query would do the trick:
SELECT Location, Skill, Count(1) AS SkillCount
FROM ReportResult
GROUP BY Location, Skill
And now we get this:
And you can flip the above query to group by skil, then location if you wish.
so, at the most simple level?
We write out ONE row + location for every match, and then use SQL on that to group by and count.
We COULD write code to actually count up by location, but that VBA code would as noted be a bit messy, and just spitting out rows of location and skill means we can then group by skill count, skill location count, or location, skill counts just by using "simple" sql against that list of location and skill record list.
So, now use the report wizard on that query above, and we get something like this:
Of course it is simple to change around the above report, but you get the idea for such a simple task as you noted.
Summarizing count of developers by skill and location can be accomplished with SQL. It requires a dataset of all possible skill/location pairs. Consider this simple example:
Resources
ID
Name
Location
Resume
1
a
x
Java,Excel
2
b
x
Excel
3
c
y
Excel
4
d
z
VBA,Java
SELECT Skills.SkillName, Resources.Location,
Sum(Abs([Resume] Like '*' & [SkillName] & "*")) AS DevCt
FROM Skills, Resources
GROUP BY Skills.SkillName, Resources.Location;
SkillName
Location
DevCt
Excel
x
2
Excel
y
1
Excel
z
0
Java
x
1
Java
y
0
Java
z
1
VBA
x
0
VBA
y
0
VBA
z
1
This approach utilizes a Cartesian product of Skills and Resources tables to generate the data pairs. This type of query can perform slowly with large dataset. If it is too slow, saving the pairs to a table could improve performance. Otherwise, a VBA solution will be only recourse and would likely involve looping recordset object.
Regardless of approach, be aware results will be skewed if Resume has content like "excellence". Bad data output is pitfall of poor database design.
Solution:
Resource and Skill table were added into MS_Access
Step 1: Create a query that executes the below SQL to get a counts (here named "Step1Query"):
SELECT Skills.SkillName, Resources.Location,
Sum(Abs([Resume] Like '*' & [SkillName] & "*")) AS DevCt
FROM Skills, Resources
GROUP BY Skills.SkillName, Resources.Location;
Step 2: Create a second query that uses the Step 1 query as input. (you can do this via the wizard):
TRANSFORM Sum(Step1Query.DevCt) AS SumOfDevCt
SELECT Skills.SkillName, Resources.Location,
Sum(Abs([Resume] Like '*' & [SkillName] & "*")) AS DevCt
FROM Skills, Resources
GROUP BY Skills.SkillName, Resources.Location
PIVOT Step1Query_qry.[Location];
Result lists out a matrix form. Thanks all for your help.

How to select data with a complex condition?

Using Microsoft Access, I normally use condition (mostly where) to obtain the data I want to display.
So far, it went well. However now I have a complex filtering and I'm not sure of the best way to do it. I will explain how I do it with many queries, and I'd like to know if there is something simpler, since I feel like it's doing too much for what I accomplish.
I have Building and Energy tables. Between them, I have a link table since a Building has a list of possible energies.
My goal is to display ALL energy not already associated with the building.
I first have a simple query to display all the IDs of energy that are in the link table where building is the one of interest.
Once I do that, I have another query using this one, which display an energy if it is an energy absent from previous list.
This takes 2 queries and I feel like I could have a better way to do this. I'm fairly new to MS Access, so any suggestion is welcome.
Here is the first request to obtain the list of energies:
SELECT
Batiments.ID, Energies.ID, Energies.Type
FROM
Energies
INNER JOIN
(Batiments
INNER JOIN
Batiment_Energie ON Batiments.ID = Batiment_Energie.Batiment_ID) ON Energies.ID = Batiment_Energie.Energie_ID
WHERE
(((Batiments.ID) = " & cbxBatiments.Column(0) & "));"
You can query the non-associated energy types with
SELECT
ID, Type
FROM
Energies
WHERE
ID NOT IN (SELECT Energie_ID
FROM Batiment_Energie
WHERE Batiment_ID = 123)
where 123 is to be replaced by the Id comming from cbxBatiments.Column(0).
You can use not exists:
select e.*
from energie as e
where not exists (select 1
from Batiment_Energie as be
where be.energie_id = e.id and be.batiment_id = <your id>
);

MS Access Update SQL Query Extremely Slow and Multiplying the Amount of Records Updated

I am stumped on how to make this query run more efficiently/correctly. Here is the query first and then I can describe the tables that are involved:
UPDATE agg_pivot_test AS p
LEFT JOIN jd_cleaning AS c
ON c.Formerly = IIF(c.Formerly LIKE '*or*', '*' & p.LyFinalCode & '*', CStr(p.LyFinalCode))
SET p.CyFinalCode = c.FinalCode
WHERE p.CyFinalCode IS NULL AND c.Formerly IS NOT NULL;
agg_pivot_test has 200 rows of data and only 99 fit the criteria of WHERE p.CyFinalCode IS NULL. The JOIN needs some explaining. It is an IIF because some genius decided to link last year's data to this year's data using Formerly. It is a string because sometimes multiple items have been consolidated down to one so they use "or" (e.g., 632 or 631 or 630). So if I want to match this year's data I have to use Formerly to match last year's LyFinalCode. So this year the code might be 629, but I have to use the Formerly to map the items that were 632, 631, or 630 to the new code. Make sense? That is why the ON has an IIF. Also, Formerly is a string and LyFinalCode is an integer... fun.
Anyway, when you run the query it says it is updating 1807 records when again, there are only 200 records and only 99 that fit the criteria.
Any suggestions about what this is happening or how to fix it?
An interesting problem. I don't think I've ever come across something quite like this before.
I'm guessing what's happening is that rows where CyFinalCode is null, are being matched multiple times by the join statement, and thus the join expression is calculating a cartesian product of row-matches, and this is the basis of the rows updated message. It seems odd, as I would have expected access to complain about multiple row matches, when row matches should only be 1:1 in an update statement.
I would suggest rewriting the query (with this join) as a select statement, and seeing what the query gives you in the way of output; Something like:
SELECT p.*, c.*
FROM agg_pivot_test p LEFT JOIN jd_cleaning c
ON c.Formerly = IIF(c.Formerly LIKE '*or*', '*' & p.LyFinalCode & '*', CStr(p.LyFinalCode))
WHERE p.CyFinalCode IS NULL AND c.Formerly IS NOT NULL
I'm also inclined to suggest changing "... & p.LyFinalCode & ..." to "... & CStr(p.LyFinalCode) & ..." - though I can't really see why it should make a difference.
The only other thing I can think to suggest is change the join a bit: (this isnt guaranteed to be better necessarily - though it might be)
UPDATE agg_pivot_test AS p LEFT JOIN jd_cleaning AS c
ON (c.Formerly = CStr(p.LyFinalCode) OR InStr(c.Formerly, CStr(p.LyFinalCode)) > 0)
(Given the syntax of your statement, I assume this sql is running within access via ODBC; in which case this should be fine. If I'm wrong the sql is running server side, you'll need to change InStr to SubString.)

Concatenation and proportion in access

A little background:
I have two tables imported from excel. One is 300k + rows so when I do updates to it in excel it just runs too slow, and often doesn't process on my comp. Anyways, I used a 'outer' left join to bring the two together.
Now when I run the query, I get the result which works fine but I need to add some fields to these results.
I am hoping to mimic what Ive done in excel, so I can create my summary pivots in the same manner.
First, I need a field that just concatenates two others after the join.
Then I need to add a field the equivalent of:
1/Countif($T$2:$T$3330,T2) from excel to access. However, the range does not need to be fixed. I will get it so that all the text entries are at the top of the field, so in theory, i need the equivalent of Sheets("").Range("T2").End(xldown). This proportion is used to eliminate double counting when i do pivot tables.
I am probably making this much more complicated than it has to be but I am new to Access as well, so please try to explain some things in explanations.
Thanks
Edit: I currently have:
Select [Table1].*, [Table2].PlaySk, [Table2].Service
From [Table1] Left Join [Table2] On [Table1].Play + [Table1].Skill
= [Table2].PlaySk
And in a general case, what I am trying to solve is something to get ColAB and ColProportion.
ColA ColB ColAB ColProportion
a 1 a1 .5
b 1 b1 1
a 1 a1 .5
b 2 b2 .3333333
b 2 b2 .3333333
b 2 b2 .3333333
Sounds to me like you'll need to make a couple queries in sequence to do everything you need.
The first part (concatenate) is relatively easy though -- just take the two field names you wish to concatenate together, say [Play] and [Skill], and, in design view, make a new field like "PlaySk: [Play] & [Skill]".
If you want to put a character between them (I often do when I concatenate, just to keep things straight), like a semicolon for example, you can do "PlaySk: [Play] & ';' & [Skill]".
As for the second part, I think you'll want to build a "Group By" query on top of the other one. In your original query, make another field in design view like this: "T2_Counter: Iif([The field you're checking, i.e. whatever column T is] = 'whatever value you're checking for, i.e. whatever T2 is',1,0)". This will result in a column that's a 1 when the check is true, and a zero otherwise.
Then bring this query into a new one, click "Totals" at the top in the Design tab, then bring the fields you want to group by down. Then create a field in design view like this: "MagicField: 1/Sum(T2_Counter)".
Hopefully this helps get you started at least.

How to improve efficiency of this query & VBA?

I have this query in Access:
SELECT TOP 10 title,
ConcatRelated("DOCTEXT","DocumFrag", "title='" & title & "'" ) AS result
FROM DocumFrag
GROUP BY title;
DocumFrag contains about 9000 records, ConcatRelated is VBA code found here: http://allenbrowne.com/func-concat.html
When I run this query for only TOP 10 and it completes, it continually lags to the point of 20 second response time (clicking, typing, etc).
Is there a way I can improve this to be more stable? I'm doing TOP 10 as an example to test if it lags; in the end I'd need to select all.
My goal of this query is the same as Concatenating record values in database (MS Access) or in server side code (ASP.NET) (except in Access, not ASP.NET)
Or is there a way I can accomplish this using a query, instead of VBA?
My best guess is that ConcatRelated evaluates for every 'title' in 'DocumFrag'. Select the top 10 in an inner query before you apply the function:
SELECT q.title, ConcatRelated("DOCTEXT","DocumFrag", "title='" & q.title & "'" ) AS result
FROM
(SELECT TOP 10 title FROM DocumFrag) AS q
GROUP BY q.title;
yes, 1st make sure your data table has a clustered index (this determines the order the data is stored on disk), otherwise you have a heap and the sql engine needs to query the entire table as the data can be anywhere in the table. 2nd put a covering index on the querying parameters and the data you want to return. 3rd you are trying to group text? It would be better to Find the top 10 items and then conconate the text associate with them rather than conconate every group item as your code is doing and then select the top 10.