How to use the cross function with numeric values in OpenRefine

I need to join two datasets using a common column that contains numbers (5, 345, 22, etc.).
Is it possible to use the cross function? It appears to work only with text values, not numbers.
For example, I would like cross(45, "project2", "column1") to work if the number 45 is somewhere in column1 of project2.
Of course, I could convert the column to text in both projects, but that is cumbersome.

Related

SQL Join on Not Exact Numbers

I am trying to match up a table based on two 'unique' identifiers. The first one is fine: it is a text string that doesn't change. There are multiple rows with the same value of this first variable, which is why I need a second variable to match on. The issue I have is that variable B, a decimal number, can change very slightly. About 90% of the rows match exactly, but there are instances where I am trying to match 1.97 to 1.96, for example, which leaves me with missing values. Any ideas for a workaround?
To join approximately on numeric values, you can use a query like the following:
select *
from a
join b on (a.val/b.val) between 0.99 and 1.01;
See the live test at https://sqlize.online/sql/psql15/db4d0e6bcc5b44e8bfc3b2bc252d567d/
The query above joins numbers that agree to within roughly +/- 1% :)
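A minimal, runnable sketch of the same ratio trick, using Python's built-in sqlite3 module as a stand-in for the asker's database. SQLite, the table names a and b, and the columns k and val are assumptions for illustration. Because the question has two identifiers, the sketch also requires the text key to match exactly, so the tolerance only applies among rows that share that key.

```python
import sqlite3


def approximate_join(rows_a, rows_b, rel_tol=0.01):
    """Join on an exact text key plus a value within a relative tolerance.

    rows_a, rows_b: lists of (k, val) tuples. Tables a/b and columns k/val
    are hypothetical names chosen for this illustration.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE a (k TEXT, val REAL)")
    con.execute("CREATE TABLE b (k TEXT, val REAL)")
    con.executemany("INSERT INTO a VALUES (?, ?)", rows_a)
    con.executemany("INSERT INTO b VALUES (?, ?)", rows_b)
    sql = """
        SELECT a.k, a.val, b.val
        FROM a
        JOIN b
          ON a.k = b.k                          -- first identifier: exact match
         AND a.val / b.val BETWEEN ? AND ?      -- second identifier: ratio check
        ORDER BY a.k, a.val
    """
    # SQLite returns NULL for x / 0, so rows with b.val = 0 simply do not match.
    result = con.execute(sql, (1 - rel_tol, 1 + rel_tol)).fetchall()
    con.close()
    return result


rows_a = [("X", 1.97), ("X", 5.0), ("Y", 3.1)]
rows_b = [("X", 1.96), ("X", 7.0), ("Y", 3.5)]
print(approximate_join(rows_a, rows_b))  # 1.97 pairs with 1.96 (ratio about 1.005)
```

A relative tolerance scales with the size of the values. If the drift is a fixed amount instead, an absolute test such as abs(a.val - b.val) <= tolerance may fit better, keeping in mind that decimals stored as floating point are not exact.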

Add a number value to column in SQL query using SELECT method

I am working on a query that calculates tuition costs. It should do this using the Tuition table, which only includes FullTimeCost (a fixed amount for the student fees) and PerUnitCost (the cost per credit hour).
I am trying to use a SELECT to return three more columns: one constant value of 12 called Units, and two more that are calculated with simple arithmetic.
The problem I am having is that I cannot seem to make the Units column have a default value of 12.
This is my code. The issue is that with this approach, the later formulas do not recognize the columns created in the previous lines.
All I need is for the third line to recognize Units so it can multiply by 12 as intended. Also, this is for school, so a comment saying "just change it to 12" is not useful.
SELECT
FullTimeCost, PerUnitCost,
12 AS Units,
PerUnitCost * Units AS TotalPerUnitCost,
FullTimeCost + TotalPerUnitCost AS TotalTuition
FROM
Tuition
You cannot reuse a column alias elsewhere in the same SELECT list. However, SQL Server gives you a convenient way to define the alias in the FROM clause with CROSS APPLY, so you can use it:
SELECT t.FullTimeCost, t.PerUnitCost, v.Units,
v2.TotalPerUnitCost,
(t.FullTimeCost + v2.TotalPerUnitCost) AS TotalTuition
FROM Tuition t CROSS APPLY
(VALUES (12)) v(Units) CROSS APPLY
(VALUES (t.PerUnitCost * v.Units)) v2(TotalPerUnitCost);
Use a CTE to "add" your constant as a column and then apply the calculation. Without more context, a variable would be just as simple and useful.
with cte as (select FullTimeCost, PerUnitCost, 12 as Units
from dbo.Tuition
)
SELECT
FullTimeCost, PerUnitCost,
Units,
PerUnitCost * Units AS TotalPerUnitCost,
FullTimeCost + PerUnitCost * Units AS TotalTuition
FROM cte
order by ...;
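If you also want TotalPerUnitCost to be reusable by name, as in the original question, each calculated column can be defined one level down and consumed one level up, for example by chaining CTEs. Below is a runnable sketch that uses Python's sqlite3 as a stand-in for SQL Server. SQLite supports WITH but not CROSS APPLY, and the sample rows are made up for illustration.

```python
import sqlite3


def tuition_totals(rows, units=12):
    """Compute tuition with chained CTEs so each alias is used one level up.

    rows: (FullTimeCost, PerUnitCost) tuples; sample data is hypothetical.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Tuition (FullTimeCost INTEGER, PerUnitCost INTEGER)")
    con.executemany("INSERT INTO Tuition VALUES (?, ?)", rows)
    sql = """
        WITH base AS (          -- level 1: add the constant Units column
            SELECT FullTimeCost, PerUnitCost, ? AS Units
            FROM Tuition
        ),
        per_unit AS (           -- level 2: Units is now an ordinary column
            SELECT FullTimeCost, PerUnitCost, Units,
                   PerUnitCost * Units AS TotalPerUnitCost
            FROM base
        )
        SELECT FullTimeCost, PerUnitCost, Units, TotalPerUnitCost,
               FullTimeCost + TotalPerUnitCost AS TotalTuition  -- level 3
        FROM per_unit
        ORDER BY PerUnitCost
    """
    result = con.execute(sql, (units,)).fetchall()
    con.close()
    return result


print(tuition_totals([(200, 50), (100, 25)]))
```

Each WITH step turns the previous step's aliases into real columns, which is the same visibility rule that the derived-table and APPLY approaches rely on.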
There are, of course, other ways to accomplish this. I'm not certain what your coursework has covered, but I assume recent topics have provided techniques for this.
Using APPLY, as shown in Gordon's answer, is the most elegant solution; another way, also noted in the comments, is to use a derived table.
As you have no doubt gathered, the problem is that during query compilation, a column alias defined in the SELECT list is not visible to the other expressions in that same list; they can (generally) reference only columns available from the tables in the FROM clause, or, as Gordon shows, from an APPLY.
What you can also do is use a derived table, by first selecting the columns you need from your table and also adding your additional columns.
You then wrap this in parentheses. It is now a derived table, i.e., the result of the parenthesized query is itself a table available to an outer SELECT.
You then use this as the source for an outer SELECT, which has visibility of any additional columns you have added.
A complication with your query is that you want to add a constant value Units and then reference it, and also reference a second calculated column that makes use of Units.
I would simply use a single derived table to calculate TotalPerUnitCost; you don't need Units, since it's used only once.
select
FullTimeCost, PerUnitCost, TotalPerUnitCost,
FullTimeCost + TotalPerUnitCost as TotalTuition
from (
select FullTimeCost, PerUnitCost, PerUnitCost * 12 as TotalPerUnitCost
from Tuition
) t;

How can I assign pre-determined codes (1, 2, 3, etc.) to a JSON-type column in PostgreSQL?

I'm extracting a table of 2000+ rows which are park details. One of the columns is JSON type. Image of the table
We have about 15 attributes like this, and we also have documentation of the pre-determined codes assigned to each attribute.
Each row in the extracted table has a different set of attributes that you can see in the image. Right now, I use cast(parks.services AS text) AS "details" to get all the attributes for a particular park, or extract just one of them using the code below:
CASE
WHEN cast(parks.services AS text) LIKE '%uncovered%' THEN '2'
WHEN cast(parks.services AS text) LIKE '%{covered%' THEN '1' END AS "details"
This time around, I need to extract these attributes by assigning them their codes. For example:
Park 1 - {covered, handicap_access, elevator} to be {1,3,7}
Park 2 - {uncovered, always_open, handicap_access} to be {2,5,3}
I have thought of using a subquery to pre-assign the codes, but I cannot wrap my head around JSON operators; in fact, I don't know how to extract them for 2000+ rows.
It would be helpful if someone could guide me in this topic. Thanks a lot!
You should really think about normalizing your tables. Don't store arrays. You should add a mapping table to map the parks and the attribute codes. This makes everything much easier and more performant.
SELECT
t.name,
array_agg(c.code ORDER BY elems.index) as codes -- 3
FROM mytable t,
unnest(attributes) WITH ORDINALITY as elems(value, index) -- 1
JOIN codes c ON c.name = elems.value -- 2
GROUP BY t.name
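1. Extract the array elements into one record per element. Adding WITH ORDINALITY saves each element's original position. (This assumes attributes is a PostgreSQL array column; if it holds a JSON array instead, json_array_elements_text(...) WITH ORDINALITY plays the same role as unnest.)
2. Join your codes table on the elements.
3. Aggregate the codes back into arrays. To ensure the correct order, use the index values created by the WITH ORDINALITY clause.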
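The three steps can be mirrored in plain Python to check the expected output before writing the SQL. This is only an illustration: a dict stands in for the codes table, and it holds just the example codes from the question (covered=1, uncovered=2, handicap_access=3, always_open=5, elevator=7).

```python
# Example codes from the question; the full list lives in the asker's documentation.
CODES = {"covered": 1, "uncovered": 2, "handicap_access": 3,
         "always_open": 5, "elevator": 7}


def attribute_codes(parks, codes=CODES):
    """Map each park's attributes to codes, keeping the original order.

    parks: list of (name, [attribute, ...]) pairs.
    """
    result = {}
    for name, attributes in parks:
        # Step 1: one record per element, with its position (like WITH ORDINALITY)
        elems = list(enumerate(attributes, start=1))
        # Step 2: inner join on the codes table; unknown attributes drop out
        joined = [(index, codes[value]) for index, value in elems if value in codes]
        # Step 3: aggregate in position order, like array_agg(... ORDER BY index).
        # A park with no matching codes yields no group, as with the inner join.
        if joined:
            result[name] = [code for _, code in sorted(joined)]
    return result


parks = [("Park 1", ["covered", "handicap_access", "elevator"]),
         ("Park 2", ["uncovered", "always_open", "handicap_access"])]
print(attribute_codes(parks))
```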

Join on phone numbers in different formats

I need an Oracle SQL join on two tables, using fields that hold phone numbers in different formats. One table uses the format 555-555-5555 and the other (555) 555-5555.
What is the syntax that could make this work? The tables are small enough that I could probably get by with dropping the area codes and just focusing on the last 4 digits.
Is it possible? If I can't do a join, I'm curious about the syntax for a simple comparison such as: Where last4(phonenumber) = '4567'
If you want to compare the whole number, you can use regexp_replace to keep only the digits and then do the comparison:
where regexp_replace(phone_number,'\D','') = '5555551234';
\D matches any non-digit character, so every non-digit is removed.
If last 4 digits will do, you can use substr:
where substr(phone_number,-4) = '1234';
Basically, you can use any string function in your join: the ON clause doesn't have to compare plain columns; it can compare calculated values.
For example, following what you suggested, you can use SUBSTR to get the last four digits and join on that:
SELECT * from tableA INNER JOIN tableB on SUBSTR(tableA.num,-4,4) = SUBSTR(tableB.num,-4,4)
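The same normalize-then-join idea can be sketched in Python to see which rows each key would pair up. The (id, phone) rows below are hypothetical, and the dictionary lookup is a simple hash join, not what Oracle necessarily does internally.

```python
import re


def digits_only(phone):
    # Same idea as Oracle's regexp_replace(phone, '\D', ''): drop every non-digit.
    return re.sub(r"\D", "", phone)


def join_on_phone(table_a, table_b, last4_only=False):
    """Join two lists of (id, phone) rows on a normalized phone key."""
    def key(phone):
        digits = digits_only(phone)
        return digits[-4:] if last4_only else digits  # like SUBSTR(..., -4)

    # Build a lookup on table_b's keys, then probe it with table_a's rows.
    index = {}
    for row in table_b:
        index.setdefault(key(row[1]), []).append(row)
    return [(a, b) for a in table_a for b in index.get(key(a[1]), [])]


a = [(1, "555-555-1234"), (2, "555-555-9999")]
b = [("x", "(555) 555-1234"), ("y", "(666) 777-9999")]
print(join_on_phone(a, b))                   # full digits: only 1 <-> x
print(join_on_phone(a, b, last4_only=True))  # last 4 only: 2 <-> y matches too
```

The second call shows the trade-off of joining on the last four digits alone: rows 2 and y match even though the rest of their numbers differ.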

Splitting text in SQL Server stored procedure

I'm working with a database, where one of the fields I extract is something like:
1-117 3-134 3-133
Each of these number sets represents a different set of data in another table. Taking 1-117 as an example, 1 = equipment ID, and 117 = equipment settings.
I have another table from which I need to extract data based on the previous field. It has two columns that split equipment ID and settings. Essentially, I need a way to go from the queried column 1-117 and run a query to extract data from another table where 1 and 117 are two separate corresponding columns.
So, is there any way to split this value so I can run that query?
Also, how would I split those three values (1-117 3-134 3-133) into three different query sets?
The tricky part is that this column can contain any number of sets (such as 1-117 3-133 or 1-117 3-134 3-133 2-131).
I'm creating these queries in a stored procedure as part of a larger document to display the extracted data.
Thanks for any help.
Since you didn't provide the DB vendor, here are two posts that answer this question for SQL Server and Oracle, respectively:
T-SQL: Opposite to string concatenation - how to split string into multiple records
Splitting comma separated string in a PL/SQL stored proc
And if you're using some other DBMS, search for "splitting text" together with your DBMS name. I can almost guarantee you're not the first one to ask, and there are answers for every DBMS flavor out there.
As you said the format is constant, though, you could also do something simpler using the SUBSTRING function.
EDIT in response to OP comment...
Since you're using SQL Server and you said these values are always in a consistent format, you can do something as simple as using SUBSTRING to get each part of the value and assign the parts to T-SQL variables, which you can then use however you want, for example in the predicate of a query.
Assuming that what you said about the format is true, that it is always #-### (exactly 1 digit, a dash, and 3 digits), this is fairly easy.
WITH EquipmentSettings AS (
SELECT
S.*,
Convert(int, Substring(S.AwfulMultivalue, V.Value * 6 - 5, 1)) EquipmentID,
Convert(int, Substring(S.AwfulMultivalue, V.Value * 6 - 3, 3)) Settings
FROM
SourceTable S
INNER JOIN master.dbo.spt_values V
ON V.Value BETWEEN 1 AND (Len(S.AwfulMultivalue) + 1) / 6
WHERE
V.type = 'P'
)
SELECT
E.Whatever,
D.Whatever
FROM
EquipmentSettings E
INNER JOIN DestinationTable D
ON E.EquipmentID = D.EquipmentID
AND E.Settings = D.Settings
In SQL Server 2005+ this query will support 1365 values in the string.
If the length of the digits can vary, then it's a little harder. Let me know.
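The position arithmetic in the query above can be checked outside the database. Here is a Python sketch that assumes the same strict #-### format with single-space separators. DESTINATION is a hypothetical stand-in for DestinationTable, keyed on (EquipmentID, Settings).

```python
def parse_fixed_width(multivalue):
    """Split '1-117 3-134 3-133' into [(1, 117), (3, 134), (3, 133)].

    Each '#-###' group plus its separating space is 6 characters, so group v
    (1-based) starts at 1-based position v * 6 - 5, and its settings start
    two characters later, at v * 6 - 3, matching the T-SQL SUBSTRING calls.
    """
    # The last group has no trailing space, hence the + 1 before dividing.
    count = (len(multivalue) + 1) // 6
    pairs = []
    for v in range(1, count + 1):
        start = v * 6 - 6  # 0-based index of 1-based position v * 6 - 5
        equipment_id = int(multivalue[start])
        settings = int(multivalue[start + 2:start + 5])
        pairs.append((equipment_id, settings))
    return pairs


# Hypothetical destination rows keyed on (EquipmentID, Settings).
DESTINATION = {(1, 117): "row A", (3, 134): "row B", (2, 131): "row C"}


def lookup(multivalue, destination=DESTINATION):
    """Inner-join the parsed pairs against the destination rows."""
    return [destination[p] for p in parse_fixed_width(multivalue) if p in destination]


print(parse_fixed_width("1-117 3-134 3-133"))
print(lookup("1-117 3-134 3-133"))
```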
If there are never more than four sets, you can use PARSENAME to retrieve them:
Declare @Num varchar(50)
Set @Num='1-117 3-134 3-133'
select parsename(replace(@Num,' ','.'),3)
Result: 1-117
Now use PARSENAME again on that result:
Select parsename(replace(parsename(replace(@Num,' ','.'),3),'-','.'),1)
Result: 117
If there can be more than four values, use a split function instead.
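PARSENAME counts parts from the right (1 is the rightmost part) and returns NULL beyond four parts, which is why this trick is limited to four sets. The function below is a rough, hypothetical Python emulation for experimenting with that behaviour. It is not the real function and ignores PARSENAME's handling of bracketed or quoted names.

```python
def parsename(name, part):
    """Rough emulation of T-SQL PARSENAME, for illustration only.

    Splits on '.' and counts parts from the right (1 = rightmost).
    Returns None (like NULL) when there are more than four parts or the
    requested part does not exist.
    """
    parts = name.split(".")
    if len(parts) > 4 or not 1 <= part <= len(parts):
        return None
    return parts[-part]


num = "1-117 3-134 3-133"
first = parsename(num.replace(" ", "."), 3)  # leftmost of three parts
print(first, parsename(first.replace("-", "."), 1))
```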