Formatting a table in Prolog - formatting

I am trying to print out a table of values in Prolog. Currently I have the following:
format('+~`-t~78|+ ~n', []),
format('|~tTable Title~t~78||~n', []),
format('+~`-t~78|+ ~n', []).
This produces my header:
+-----------------------------------------------------------------------------+
|                                 Table Title                                 |
+-----------------------------------------------------------------------------+
Now I want to pad some columns to produce the following format beneath
+-----------------------------------------------------------------------------+
| Name | Age | Eye Colour | Phone Number |
+-----------------------------------------------------------------------------+
| Joe Bloggs | 21 | Blue | 01234567890 |
+-----------------------------------------------------------------------------+
| John Smith | 32 | Brown | (+44) 012345678 |
+-----------------------------------------------------------------------------+
I have tried multiple methods of spreading the columns evenly. However, this code:
format('| ~s~t~28|| ~s~t~8|| ~s~t~20|| ~s~t~24||~n',
['Name', 'Age', 'Eye Colour', 'Phone Number']).
Gives me uneven columns which aren't nicely spaced.
+-----------------------------------------------------------------------------+
| Name                      | Age| Eye Colour| Phone Number|
+-----------------------------------------------------------------------------+
The documentation on this was slightly confusing and I don't seem to be able to get my head round it so any help would be appreciated!

You can do relative columns with ~+:
?- format('| ~s~t~28|| ~s~t~8+| ~s~t~20+| ~s~t~24+|~n',
['Name', 'Age', 'Eye Colour', 'Phone Number']).
| Name                      | Age   | Eye Colour        | Phone Number          |
true.

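The tab spacing refers to absolute columns, so each stop has to be the running total of the widths. Try:
format('| ~s~t~28|| ~s~t~36|| ~s~t~56|| ~s~t~80||~n',
['Name', 'Age', 'Eye Colour', 'Phone Number']).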
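The relationship between the two answers is just a running sum: the absolute stops 28, 36, 56, 80 are the cumulative totals of the relative widths 28, 8, 20, 24. A minimal Python sketch of that arithmetic follows; the helper pad_row is hypothetical and only imitates the padding behaviour, it is not how format/2 is implemented.

```python
from itertools import accumulate

# Column widths as used in the relative form: ~28| then ~8+, ~20+, ~24+
widths = [28, 8, 20, 24]

# An absolute column stop is the running total of the widths so far.
absolute_stops = list(accumulate(widths))
print(absolute_stops)  # [28, 36, 56, 80]


def pad_row(cells, stops):
    """Hypothetical helper: pad each '| cell' with spaces up to its absolute stop."""
    line = ""
    for cell, stop in zip(cells, stops):
        # ljust leaves the text unchanged if it is already past the stop,
        # which is why reusing small absolute values produced no padding.
        line = (line + "| " + cell).ljust(stop)
    return line + "|"


row = pad_row(["Name", "Age", "Eye Colour", "Phone Number"], absolute_stops)
print(row)
```

Passing the raw widths [28, 8, 20, 24] as stops instead reproduces the squashed row from the question, because every stop after the first has already been passed.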


Fuzzy match a substring within a larger string in Postgres

Is it possible to fuzzy match a substring within a larger string in Postgres?
Example:
For a search of colour (ou), return all records where the string includes color, colors or colour.
select
*
from things
where fuzzy(color) in description;
id | description
----------------
1 | A red coloured car
2 | The garden
3 | Painting colors
=> return records 1 and 3
I was wondering if it's possible to combine both fuzzystrmatch and tsvector so that the fuzzy matching could be applied to each vectorized term?
Or if there is another approach?
You can do it of course, but I doubt it will be very useful:
select *,levenshtein(lexeme,'color') from things, unnest(to_tsvector('english',description))
order by levenshtein;
 id |    description     | lexeme | positions | weights | levenshtein
----+--------------------+--------+-----------+---------+-------------
  3 | Painting colors    | color  | {2}       | {D}     |           0
  1 | A red coloured car | colour | {3}       | {D}     |           1
  1 | A red coloured car | car    | {4}       | {D}     |           3
  1 | A red coloured car | red    | {2}       | {D}     |           5
  3 | Painting colors    | paint  | {1}       | {D}     |           5
  2 | The garden         | garden | {2}       | {D}     |           6
Presumably you would want to embellish the query to apply some cutoff, probably where the cutoff depends on the lengths, and return only the best result for each description assuming it met that cutoff. Doing that should be just routine SQL manipulations.
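To make the cutoff idea concrete, here is a rough Python sketch of the same post-processing done outside the database. The lexemes are copied from the query output above; the cutoff rule (one edit allowed per three characters of the search term) is purely an assumption for illustration, and levenshtein here is a plain reimplementation, not the fuzzystrmatch function.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insert, delete and substitute."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


# Lexemes per record id, taken from the query output above.
lexemes = {1: ["colour", "car", "red"], 2: ["garden"], 3: ["color", "paint"]}


def best_matches(term, lexemes_by_id):
    """Keep the closest lexeme per record, if it is within the cutoff."""
    cutoff = len(term) // 3  # assumed rule: one edit per three characters
    results = {}
    for record_id, words in lexemes_by_id.items():
        distance, word = min((levenshtein(w, term), w) for w in words)
        if distance <= cutoff:
            results[record_id] = (word, distance)
    return results


print(best_matches("color", lexemes))  # {1: ('colour', 1), 3: ('color', 0)}
```

With this cutoff only records 1 and 3 survive, which is the result the question asks for.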
Perhaps better would be the word similarity operators recently added to pg_trgm.
select *, description <->> 'color' as distance from things order by description <->> 'color';
 id |    description     | distance
----+--------------------+----------
  3 | Painting colors    | 0.166667
  1 | A red coloured car | 0.333333
  2 | The garden         |        1
Another option would be to find a stemmer or thesaurus which standardizes British/American spellings (I am not aware of one readily available), and then not use fuzzy matching at all. I think this would be best, if you can do it.

Pandas - match a column of string with a column of regular expressions

The problem: I have two dataframes - one with a bunch of product titles that are not normalized, and one with a bunch of regular expressions that are tied to normalized product titles. I need to match the non-normalized titles to some regular expressions which are tied to normalized titles.
It should make more sense with the sample data below.
First dataframe (raw_titles):
| | Title | Release Date |
|---|------------------------------------------------|--------------|
| 1 | Apple iPad Air (3rd generation) - 64GB | 01/01/20 |
| 2 | Philips Hue White Ambiance A19 LED Smart Bulbs | 08/12/20 |
| 3 | Powerbeats Pro Totally Wireless Earphones | 06/20/19 |
Second dataframe (regex_titles):
| | Regex | Manufacturer | Model |
|---|-------------------------------------------------------|--------------|-------------------------|
| 1 | /ipad\s?air(?=.*(\b3\b|3rd\s?gen|2019))|\bair\s?3\b/i | Apple | iPad Air (2019) |
| 2 | /hue(?=.*cher)/i | Philips | Hue White Ambiance Cher |
| 3 | /powerbeats\s?pro/i | Beats | Powerbeats Pro |
The idea is to take each title in raw_titles, and run it through all the values in regex_titles to see if there's a match. Once that's done, raw_titles should then have two additional columns, Manufacturer and Model, which correspond to the regex_titles series they matched to (if there was no match, it would just stay empty).
Then the final table would look like this:
| | Title | Release Date | Manufacturer | Model |
|---|------------------------------------------------|--------------|--------------|-----------------|
| 1 | Apple iPad Air (3rd generation) - 64GB | 01/01/20 | Apple | iPad Air (2019) |
| 2 | Philips Hue White Ambiance A19 LED Smart Bulbs | 08/12/20 | | |
| 3 | Powerbeats Pro Totally Wireless Earphones | 06/20/19 | Beats | Powerbeats Pro |
There are many ways to do this, but the simplest is to test each of the regexes on each of the titles and return the first match you find. First, we'll define a function that will return two values: the manufacturer and model of the regex row, if we match, two Nones otherwise:
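import re

def find_match(title_row):
    # Try every regex in turn and stop at the first one that matches the title.
    for _, regex_row in regex_titles.iterrows():
        if re.search(regex_row['Regex'], title_row['Title']):
            return [regex_row['Manufacturer'], regex_row['Model']]
    # No regex matched: leave both new columns empty.
    return [None, None]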
Then we'll apply our function to our titles dataframe and save the output to two new columns, Manufacturer and Model:
raw_titles[['Manufacturer', 'Model']] = raw_titles.apply(find_match, axis=1, result_type='broadcast')
Title Release Date Manufacturer Model
0 Apple iPad Air (3rd generation) - 64GB 01/01/20 Apple iPad Air (2019)
1 Philips Hue White Ambiance A19 LED Smart Bulbs 08/12/20 None None
2 Powerbeats Pro Totally Wireless Earphones 06/20/19 Beats Powerbeats Pro
One complication is that you'll have to translate your Perl regexes into Python's regex format:
perl: /powerbeats\s?pro/i -> python: (?i)powerbeats\s?pro
They're mostly the same, with a few small differences, which the documentation for Python's re module covers.
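If there are many patterns, the translation could be automated. The sketch below assumes every stored regex has the plain /pattern/flags shape shown in the question and only handles the i, m, s and x modifiers; compile_perl_regex is a hypothetical helper name, and any Perl-only syntax inside the pattern would still need manual attention.

```python
import re

# Perl's trailing modifiers mapped to Python's re flags (common ones only).
PERL_FLAGS = {"i": re.IGNORECASE, "m": re.MULTILINE, "s": re.DOTALL, "x": re.VERBOSE}


def compile_perl_regex(literal: str) -> re.Pattern:
    """Turn a '/pattern/flags' literal into a compiled Python pattern."""
    if not literal.startswith("/"):
        raise ValueError(f"expected a /pattern/flags literal, got {literal!r}")
    # Everything between the first and the last slash is the pattern itself.
    pattern, _, modifiers = literal[1:].rpartition("/")
    flags = 0
    for modifier in modifiers:
        flags |= PERL_FLAGS[modifier]  # KeyError for modifiers not handled here
    return re.compile(pattern, flags)


pattern = compile_perl_regex(r"/powerbeats\s?pro/i")
print(bool(pattern.search("Powerbeats Pro Totally Wireless Earphones")))  # True
```

Compiling each regex once up front also avoids re-parsing the pattern string for every title inside find_match.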

Sybase, show all rows but don't display column data when duplicate

Product: Sybase ASE 11/12/15/16
I am looking to update a Stored Procedure that gets called by different applications, so changing the application(s) isn't an option. What is needed is best explained in examples:
Current results:
type | breed | name
------------------------------------
dog | german shepherd | Bernie
dog | german shepherd | James
dog | husky | Laura
cat | british blue | Mr Fluffles
cat | other | Laserchild
cat | british blue | Sleepy head
fish | goldfish | Goldie
What I need is for the First column's data to be cleared on duplicates. For example, the above data should look like:
type | breed | name
------------------------------------
dog | german shepherd | Bernie
| german shepherd | James
| husky | Laura
cat | british blue | Mr Fluffles
| other | Laserchild
| british blue | Sleepy head
fish | goldfish | Goldie
I know I could use a cursor, but there are around 10,000 records and that doesn't seem efficient. I'm looking for a select statement; I don't want to change the data in the database.
After mulling over this, I found a solution that would work and not use a cursor.
select Type,breed,name
into #DontDisplay
from #MyDataList as a1
group by breed
Having breed= (select max(name)
from #MyDataList a2
where a1.breed= a2.breed)
order by breed, name
select n.Type,d.Breed,d.Name
from #MyDataList as d
left join #DontDisplay as n
on d.Breed= n.Breed and d.Name= n.Name
order by Breed
This works well. The solution was based on the answer to another question, Sybase SQL Select Distinct Based on Multiple Columns with an ID.
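For anyone checking their result set against the requirement, the target transformation itself is small: show the type only on the first row of each run of identical types. The Python sketch below illustrates just that rule on the sample rows from the question; it is not a replacement for the stored procedure, and blank_repeated_type is a hypothetical name.

```python
rows = [
    ("dog", "german shepherd", "Bernie"),
    ("dog", "german shepherd", "James"),
    ("dog", "husky", "Laura"),
    ("cat", "british blue", "Mr Fluffles"),
    ("cat", "other", "Laserchild"),
    ("cat", "british blue", "Sleepy head"),
    ("fish", "goldfish", "Goldie"),
]


def blank_repeated_type(rows):
    """Keep the type on the first row of each run; blank it on the rest."""
    result = []
    previous_type = None
    for animal_type, breed, name in rows:
        shown = "" if animal_type == previous_type else animal_type
        result.append((shown, breed, name))
        previous_type = animal_type
    return result


for row in blank_repeated_type(rows):
    print("%-5s| %-16s| %s" % row)
```

Note that the rule depends on the row order, so whatever SQL produces the rows needs an ordering that keeps each type together.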

Select from two columns based on priority

I need to combine two columns in SQL Server. However, I need to give one column priority.
Make | Model | Segment make | Segment model
--------+------------+--------------+--------------
Ferarri | California | Sport | Null
Ferarri | F40 | Sport | Null
Porsche | 911 | Sport | Null
Porsche | Cayenne | Sport | SUV
BMW | M5 | Null | Sport
I need a table with all the models and the segment of each car. Every model has its segment in one of the two segment columns. If there is a segment in both columns, I would like the segment for the model to override the segment for the make, as in the example with the Porsche Cayenne. This is the result I need:
Make | Model | Segment
--------+------------+--------
Ferarri | California | Sport
Ferarri | F40 | Sport
Porsche | 911 | Sport
Porsche | Cayenne | SUV
BMW | M5 | Sport
I have searched and found Rank(), but it does not seem to do what I want. Any suggestions?
Thanks in advance
Use coalesce(). It returns its first non-NULL parameter:
select Make, Model, coalesce([Segment model], [Segment make]) as segment
from tablename
I'd recommend something like
select isnull([Segment model], [Segment make]) as segment
Because then you will always select the model's segment unless it is NULL, in which case the make's segment is selected.
Try :
Select Make,Model,Coalesce([Segment model], [Segment make]) Segment From YourTable
select coalesce([Segment model], [Segment make]) as Segment
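The argument order is what encodes the priority: the column listed first wins whenever it is not NULL. A small Python sketch of the same rule, using the rows from the question (the coalesce function here is a stand-in written for illustration, with None playing the role of NULL):

```python
def coalesce(*values):
    """Return the first argument that is not None, like SQL COALESCE."""
    return next((value for value in values if value is not None), None)


# (make, model, segment_make, segment_model), copied from the question.
cars = [
    ("Ferarri", "California", "Sport", None),
    ("Ferarri", "F40", "Sport", None),
    ("Porsche", "911", "Sport", None),
    ("Porsche", "Cayenne", "Sport", "SUV"),
    ("BMW", "M5", None, "Sport"),
]

# Model segment first, so it overrides the make segment when both exist.
segments = [coalesce(segment_model, segment_make)
            for _, _, segment_make, segment_model in cars]
print(segments)  # ['Sport', 'Sport', 'Sport', 'SUV', 'Sport']
```

Swapping the two arguments would give the Cayenne the make's segment, Sport, instead of SUV.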

How to add column with the value of another dimension?

I apologize if the title does not make sense. I am trying to do something that is probably simple, but I have not been able to figure it out, and I'm not sure how to search for the answer. I have the following MDX query:
SELECT
event_count ON 0,
TOPCOUNT(name.children, 10, event_count) ON 1
FROM
events
which returns something like this:
| | event_count |
+---------------+-------------+
| P Davis | 123 |
| J Davis | 123 |
| A Brown | 120 |
| K Thompson | 119 |
| R White | 119 |
| M Wilson | 118 |
| D Harris | 118 |
| R Thompson | 116 |
| Z Williams | 115 |
| X Smith | 114 |
I need to include an additional column (gender). Gender is not a metric. It's just another dimension on the data. For instance, consider this query:
SELECT
gender.children ON 0,
TOPCOUNT(name.children, 10, event_count) ON 1
FROM
events
But this is not what I want! :(
| | female | male | unknown |
+--------------+--------+------+---------+
| P Davis | | | 123 |
| J Davis | | 123 | |
| A Brown | | 120 | |
| K Thompson | | 119 | |
| R White | 119 | | |
| M Wilson | | | 118 |
| D Harris | | | 118 |
| R Thompson | | | 116 |
| Z Williams | | | 115 |
| X Smith | | | 114 |
Nice try, but I just want three columns: name, event_count, and gender. How hard can it be?
Obviously this reflects lack of understanding about MDX on my part. Any pointers to quality introductory material would be appreciated.
It's important to understand that in MDX you are building sets of members on each axis, and not specifying column names like a tabular rowset. You are describing a 2-dimensional grid of results, not a linear rowset. If you imagine each dimension as a table, the member set is the set of unique values from a single column in that table.
When you choose a Measure as the member (as in your first example), it looks as if you're selecting from a table, so it's easy to misunderstand. When you choose a Dimension, you get many members, and a cross-join between the rows and columns (which is sparse in this case because the names and genders are 1-to-1).
So, you could crossjoin these two dimensions on a single axis, and then filter out the null cells:
SELECT
event_count ON 0,
TOPCOUNT(
NonEmptyCrossJoin(name.children, gender.children),
10,
event_count) ON 1
FROM
events
Which should give you results that have a single column (event_count) and 10 rows, where each row is composed of the tuple (name, gender).
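As a rough mental model only, the query can be read as three steps: pair every name with every gender, drop the pairs that have no event_count, then keep the top rows by that measure. The Python sketch below mimics those steps on a few cells from the grid in the question; top_count is a hypothetical helper, and a real MDX engine evaluates this very differently.

```python
from itertools import product

names = ["P Davis", "J Davis", "A Brown", "K Thompson", "R White"]
genders = ["female", "male", "unknown"]

# Sparse cells of the name x gender grid, copied from the question's output.
event_count = {
    ("P Davis", "unknown"): 123,
    ("J Davis", "male"): 123,
    ("A Brown", "male"): 120,
    ("K Thompson", "male"): 119,
    ("R White", "female"): 119,
}


def top_count(tuples, n, measure):
    """Return the n tuples with the highest measure (ties keep input order)."""
    return sorted(tuples, key=measure, reverse=True)[:n]


# Cross join the two member sets, keeping only tuples that have a value.
non_empty = [pair for pair in product(names, genders) if pair in event_count]

for name, gender in top_count(non_empty, 3, event_count.get):
    print(name, gender, event_count[(name, gender)])
```

Each printed row is a (name, gender) tuple followed by its single event_count value, which is the three-column shape the question asks for.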
I hope that sets you on the right path, and please feel free to ask if you want me to clarify anything.
For general introductory material, I think the book "MDX Solutions" is a good place to start:
http://www.amazon.ca/MDX-Solutions-Microsoft-Analysis-Services/dp/0471748080/
For online introductory material, have a look at this gentle introduction, which presents the main MDX concepts.