I am trying to match up a table based on two 'unique' identifiers. The first one is fine: it is a text string that doesn't change. There are multiple lines with this first variable, which is why I need a second variable to match on. The issue I have is that variable B, which is a decimal number, can change very slightly. So 90% of them will match exactly, but there are instances where I am trying to match 1.97 to 1.96, for example, which leaves me with missing values. Any ideas for a workaround?
I need some ideas.
For an approximate join on numeric values you can use something like the following query:
select *
from a
join b on (a.val/b.val) between 0.99 and 1.01;
See a live test at https://sqlize.online/sql/psql15/db4d0e6bcc5b44e8bfc3b2bc252d567d/
The above query joins numbers with ±1% tolerance :)
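Since the question matches on an exact text identifier plus a decimal that only drifts slightly (1.97 vs 1.96), an absolute tolerance may be a better fit than a ratio. A minimal sketch, with illustrative table and column names (a, b, key_text, val):
select *
from a
join b
  on a.key_text = b.key_text            -- exact match on the text identifier
 and abs(a.val - b.val) <= 0.01;        -- allow the decimal to drift by up to 0.01
Pick the tolerance to match how far the decimal can actually move; too wide a window will start producing duplicate matches.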
I'm new to this.
I have a column (chocolate_weight) on the table (Chocolate) which has g at the end of every number, e.g. 30g, 2x5g, 10g, etc.
I want to remove the letter at the end and then query it to show any that weigh more than 35.
So far I have done
Select *
From Chocolate
Where chocolate_weight IN
(SELECT
REPLACE(chocolote_weight,'x','') From Chocolate) > 35
It is coming back with 0 rows, even though there are many that weigh more than 35.
Any help is appreciated
Thanks
If 'g' is always the suffix then your current query is along the right lines, but you don't need the IN; you can do the REPLACE in the WHERE clause:
SELECT *
FROM Chocolate
WHERE CAST(REPLACE(chocolate_weight,'g','') AS DECIMAL(10, 2)) > 35;
N.B. This works in both of the tagged DBMSs, SQL Server and MySQL.
This will fail (although only silently in MySQL) if anything contains units other than grams, though. So what I would strongly suggest, if it is not too late, is that you fix your design: store the weight as a numeric type and lose the 'g' completely if you only ever store grams. If you use multiple different units, you may wish to standardise them so everything is stored as grams, or alternatively store the two things in separate columns: a decimal/int column for the numeric value and a separate column for the unit, e.g.
Weight | Unit
10     | g
150    | g
1000   | lb
The issue you will have here, though, is that you will have to start doing conversions in your queries to ensure you get all results. It is easier to do the conversion once when the data is saved and use a standard measure for all records.
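As a sketch of what such a conversion might look like with the two-column design above (Weight numeric, Unit text; the factors are the usual gram equivalents, and 'kg' is just an extra illustrative unit):
SELECT *
FROM Chocolate
WHERE CASE Unit
          WHEN 'g'  THEN Weight
          WHEN 'kg' THEN Weight * 1000
          WHEN 'lb' THEN Weight * 453.592
      END > 35;    -- compare everything in grams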
So I'm trying to use INNER JOIN in my SQL command because I want to replace the foreign key ID numbers with the text value for each column. However, when I use INNER JOIN, the column for "Standards" always gives me the same value. The following is what I started with:
SELECT Grade_Id, Cluster_Eng_Id, Domain_Math_Eng_Id, Standard
FROM `math_standards_eng`
WHERE 1
and it returns this (which is good). Notice that the Standard values are different:
Grade_Id | Cluster_Eng_Id | Domain_Math_Eng_Id | Standard
103      | 131            | 107                | Explain equivalence of fractions in special cases...
104      | 143            | 105                | Know relative sizes of measurement units within o...
When I try to use INNER JOIN, the values for Grade_Id, Cluster_Eng_Id, and Domain_Math_Eng_Id are changed from numbers to the actual text. The Standard column, however, seems to return the same value. Here is my code:
SELECT
grades_eng.Grade, domain_math_eng.Domain, cluster_eng.Cluster,
math_standards_eng.Standard
FROM
math_standards_eng
INNER JOIN
grades_eng ON math_standards_eng.Grade_Id = grades_eng.Id
INNER JOIN
domain_math_eng ON math_standards_eng.Domain_Math_Eng_Id
INNER JOIN
cluster_eng ON math_standards_eng.Cluster_Eng_Id
This is what I get when I run the query:
Grade | Domain                    | Cluster                                   | Standard
3rd   | Counting and cardinality  | Know number names and the count sequence | Explain equivalence of fractions in special cases...
3rd   | Expressions and Equations | Know number names and the count sequence | Explain equivalence of fractions in special cases...
3rd   | Functions                 | Know number names and the count sequence | Explain equivalence of fractions in special cases...
4th   | Counting and cardinality  | Know number names and the count sequence | Know relative sizes of measurement units within o...
4th   | Expressions and Equations | Know number names and the count sequence | Know relative sizes of measurement units within o...
The text value for Standard keeps showing the same value per grade and I do not know why: 3rd will keep showing the same thing, then the next grade will change to a new value, and so on. Lastly, each of the lookup tables has a 1:M relationship with the standards table, as their Ids each appear multiple times in it. Any advice would be greatly appreciated.
You are missing the = part of your INNER JOIN on domain_math_eng and cluster_eng, so MySQL evaluates each of those ON clauses as a bare boolean expression: any non-zero Id counts as true, every row matches, and you effectively get a cross join, which is why the same values repeat. I would expect something like:
SELECT grades_eng.Grade, domain_math_eng.Domain, cluster_eng.Cluster, math_standards_eng.Standard FROM math_standards_eng
INNER JOIN grades_eng ON math_standards_eng.Grade_Id = grades_eng.Id
INNER JOIN domain_math_eng ON math_standards_eng.Domain_Math_Eng_Id = domain_math_eng.Id
INNER JOIN cluster_eng ON math_standards_eng.Cluster_Eng_Id = cluster_eng.Id
I have two files which I would like to match by name, and I would like to account for spelling errors by using the COMPGED function. The names have been thoroughly cleaned and I have no other useful match variables that could be used to reduce the search space.
The files name1 and name2 have over 500k rows each, and after 11 hours this code has still not finished.
Is there some way I can code this more efficiently, or is my issue purely down to computing power?
proc sql;
create table name1_name2_Fuzzy as
select a.*, b.*
from name1 as a
inner join name2 as b
on COMPGED(a.match_name, b.match_name) < 200;
quit;
There is a parameter of the COMPGED function that you didn't use, and it can improve the performance (maybe 6 or 7 hours instead of 11).
This parameter is the cutoff. If you choose 300 as a cutoff, then as soon as the distance between the words reaches 300, SAS stops the calculation and outputs 300.
So here, in your case, you should choose a cutoff greater than 200 (and NOT >= 200).
The COMPLEV function is faster than COMPGED. If you don't need an exact cost for each operation (set with the CALL COMPCOST routine), you can use it instead of COMPGED and save minutes or maybe hours of computation. COMPLEV also has the cutoff option.
Hope this helps!
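As a sketch of that suggestion, the cutoff is simply a third argument to COMPGED (250 here is just an arbitrary value above 200):
proc sql;
create table name1_name2_Fuzzy as
select a.*, b.*
from name1 as a
inner join name2 as b
on COMPGED(a.match_name, b.match_name, 250) < 200; /* stop scoring a pair once its distance exceeds 250 */
quit;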
Working off memory here, but if the first char of each match_name is different, the COMPGED will be over 200, true? So, you wouldn't consider them a match?
If so, make an indexed column with the first character of match_name in each table, and join on that before the COMPGED. That should eliminate most of the non-matches so far fewer COMPGED calculations will be needed.
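A minimal sketch of that idea, layered on top of the cutoff above; it assumes a genuine match never differs in its first character:
proc sql;
create table name1_name2_Fuzzy as
select a.*, b.*
from name1 as a
inner join name2 as b
on substr(a.match_name, 1, 1) = substr(b.match_name, 1, 1) /* blocking key: only compare names that share a first letter */
and COMPGED(a.match_name, b.match_name, 250) < 200;
quit;
If you add the first letter as its own (indexed) column in each table, as suggested, join on that column instead of the SUBSTR calls.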
A little background:
I have two tables imported from Excel. One is 300k+ rows, so when I do updates to it in Excel it just runs too slowly and often doesn't finish on my computer. Anyway, I used an 'outer' left join to bring the two together.
Now when I run the query I get the result, which works fine, but I need to add some fields to these results.
I am hoping to mimic what I've done in Excel, so I can create my summary pivots in the same manner.
First, I need a field that just concatenates two others after the join.
Then I need to add a field that is the equivalent of
1/Countif($T$2:$T$3330,T2) in Excel, translated to Access. However, the range does not need to be fixed. I will arrange it so that all the text entries are at the top of the field, so in theory I need the equivalent of Sheets("").Range("T2").End(xlDown). This proportion is used to eliminate double counting when I do pivot tables.
I am probably making this much more complicated than it has to be, but I am new to Access as well, so please spell things out in your explanations.
Thanks
Edit: I currently have:
Select [Table1].*, [Table2].PlaySk, [Table2].Service
From [Table1] Left Join [Table2] On [Table1].Play + [Table1].Skill
= [Table2].PlaySk
And in the general case, what I am trying to solve is how to get ColAB and ColProportion, as below:
ColA | ColB | ColAB | ColProportion
a    | 1    | a1    | .5
b    | 1    | b1    | 1
a    | 1    | a1    | .5
b    | 2    | b2    | .3333333
b    | 2    | b2    | .3333333
b    | 2    | b2    | .3333333
Sounds to me like you'll need to make a couple queries in sequence to do everything you need.
The first part (concatenate) is relatively easy though -- just take the two field names you wish to concatenate together, say [Play] and [Skill], and, in design view, make a new field like "PlaySk: [Play] & [Skill]".
If you want to put a character between them (I often do when I concatenate, just to keep things straight), like a semicolon for example, you can do "PlaySk: [Play] & ';' & [Skill]".
As for the second part, I think you'll want to build a "Group By" query on top of the other one. In your original query, make another field in design view like this: "T2_Counter: Iif([The field you're checking, i.e. whatever column T is] = 'whatever value you're checking for, i.e. whatever T2 is',1,0)". This will result in a column that's a 1 when the check is true, and a zero otherwise.
Then bring this query into a new one, click "Totals" at the top in the Design tab, then bring the fields you want to group by down. Then create a field in design view like this: "MagicField: 1/Sum(T2_Counter)".
Hopefully this helps get you started at least.
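If you would rather do the whole thing in SQL view, here is a sketch that produces the ColAB and ColProportion columns from the example; it assumes the joined result has been saved as a query named JoinedData with columns ColA and ColB (all names are illustrative), and it uses a GROUP BY subquery rather than the IIf/Sum field described above:
SELECT j.ColA, j.ColB,
       j.ColA & j.ColB AS ColAB,
       1 / c.GroupCount AS ColProportion
FROM JoinedData AS j
INNER JOIN
    (SELECT ColA, ColB, COUNT(*) AS GroupCount
     FROM JoinedData
     GROUP BY ColA, ColB) AS c
    ON (j.ColA = c.ColA) AND (j.ColB = c.ColB);
In Access, / already does floating-point division (integer division is the \ operator), so 1 / GroupCount gives the .5 and .3333333 proportions shown above.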
I have a query from Access where I calculated a percentage score from three separate numbers, e.g.:
AFPercentageMajor: [AFNumberOfMajors]/([AFTotalMajor]-[AFMajorNA])
which could have values of 20/(23-2) = 95%
I have imported this table into my SQL database and tried to write an expression in the view (I changed the names of the columns a bit):
AF_Major / (AF_Major_Totals - AF_Major_NA)
I tried adding *100 to the end of the statement, but it only works if the calculation is at 100%. If it is anything less than that it returns 0.
I have a feeling it just doesn't like the combination of the three separate column names. But like I said, I'm still learning, so I could be going about this completely wrong!
SQL Server does integer division. You need to change one of the values to a floating point representation. The following will work:
cast([AFNumberOfMajors] as float)/([AFTotalMajor]-[AFMajorNA])
You can multiply this by 100 to get the percentage value.
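For instance, with the renamed view columns from the question (assumed to be AF_Major, AF_Major_Totals, and AF_Major_NA), the full expression might look like this:
SELECT CAST(AF_Major AS float) / (AF_Major_Totals - AF_Major_NA) * 100 AS AFPercentageMajor
FROM YourView;   -- the view name is illustrative
Multiplying after the CAST keeps the division in floating point, so 20/(23-2) comes out as roughly 95 rather than 0.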