Venn diagram in python, one subset containing the other two - matplotlib-venn

I'm using matplotlib-venn to create a Venn diagram that contains three subsets, one of which contains the other two (and these other two intersect each other).
venn3(subsets=(17, 29, 40, 154, 17, 29, 40), set_labels = ('A', 'B','C'), ax=axes)
C contains both A and B, and I'd like to selectively show only some of the values.

It seems like it's not possible to do what is asked in the question using the matplotlib-venn package.
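If the containment layout is the sticking point, one workaround is to bypass matplotlib-venn entirely and draw the circles directly with plain matplotlib patches, labelling only the values you want to show. A minimal sketch, assuming hand-tuned positions and radii (the set sizes are taken from the question; everything else is illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from matplotlib.patches import Circle

fig, ax = plt.subplots(figsize=(6, 6))

# Outer set C contains both A and B; A and B overlap each other.
# Centers and radii are hand-tuned for this illustration.
c = Circle((0.5, 0.5), 0.45, facecolor="lightyellow", edgecolor="black")
a = Circle((0.38, 0.5), 0.18, alpha=0.5, facecolor="skyblue", edgecolor="black")
b = Circle((0.62, 0.5), 0.18, alpha=0.5, facecolor="salmon", edgecolor="black")
for patch in (c, a, b):
    ax.add_patch(patch)

# Label only the values you want to show; omit the rest.
ax.text(0.5, 0.88, "C = 154", ha="center")
ax.text(0.30, 0.5, "A = 17", ha="center")
ax.text(0.70, 0.5, "B = 29", ha="center")
ax.text(0.5, 0.5, "40", ha="center")  # the A-B intersection

ax.set_aspect("equal")
ax.axis("off")
fig.savefig("venn_containment.png")
```

The region areas will not be proportional to the subset sizes this way, but you get full control over which circles contain which.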


How can I merge my insert scripts with IntelliJ?

I have a (spring boot) project in IntelliJ Ultimate. There are two tables Main and Extension where every entry in Main has one corresponding entry in Extension, e.g.
Main

main_id | main_col_a
--------+-----------
0       | lorem
1       | ipsum

Extension

main_id | extension_col_a | extension_col_b
--------+-----------------+----------------
0       | b               | irrelevant
1       | c               | data
Now I have merged the tables, so that Main consists of main_id, main_col_a and extension_col_a (and Extension is dropped). But for my many tests I have ~100 SQL files with INSERT statements that need to be merged as well, so I need to turn
INSERT INTO MAIN(MAIN_ID, MAIN_COL_A) VALUES
(0, 'lorem'),
(1, 'ipsum');
INSERT INTO EXTENSION(MAIN_ID, EXTENSION_COL_A, EXTENSION_COL_B) VALUES
(0, 'b', 'irrelevant'),
(1, 'c', 'data');
into
INSERT INTO MAIN(MAIN_ID, MAIN_COL_A, EXTENSION_COL_A) VALUES
(0, 'lorem', 'b'),
(1, 'ipsum', 'c');
in an automated way.
There is some variation, such as alignment, but the inserts for Extension always follow those for Main, and the ids are always in the same order.
I'm not worried about dropping the Extension table, but about moving the column from Extension to Main. I'm currently considering writing a Python script, but I'm wondering whether it can be done easily with IntelliJ features. I know about multiple cursors, but there are too many files for that, and I don't think macros can easily handle the varying number of lines in the insert statements.
You can use "Replace in Files" with regular expressions.
Open Edit -> Find -> Replace in Files...
Create expressions based on this guide: https://www.jetbrains.com/help/idea/regular-expression-syntax-reference.html#regex-syntax-reference
Here is something to get you started:
Find: INSERT INTO MAIN\(MAIN_ID\, MAIN_COL_A\) VALUES[\s]*(\([\w'"]*, [\w'"]*, [\w'"]*\),)*[\s]*(\([\w'"]*\, [\w'"]*)\)([\w\W]*INSERT INTO EXTENSION\(main_id, extension_col_a, extension_col_b\) values)[\s]*\([\w'"]*\, ([\w'"]*)\, [\w'"]*\)[,;]
Replace:
INSERT INTO MAIN\(MAIN_ID\, MAIN_COL_A\) VALUES\n$1\n$2, $4\)$3
Run find & replace as many times as needed to merge the rows, then use it once more to fix the column list in the INSERT INTO MAIN line and to delete the INSERT INTO EXTENSION statement.
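Since the question already considers a Python script, here is a rough sketch of that route. It hardcodes the table and column names from the question, assumes exactly one MAIN and one EXTENSION statement per file with ids in the same order, and naively splits rows on commas (so it would break on string values containing commas):

```python
import re

def merge_inserts(sql: str) -> str:
    """Merge the EXTENSION rows into the MAIN insert, keeping only
    extension_col_a, as a single INSERT INTO MAIN statement."""
    # Capture the value list of each statement (up to the first ';').
    main_m = re.search(
        r"INSERT INTO MAIN\(MAIN_ID, MAIN_COL_A\) VALUES\s*(.*?);",
        sql, re.IGNORECASE | re.DOTALL)
    ext_m = re.search(
        r"INSERT INTO EXTENSION\(MAIN_ID, EXTENSION_COL_A, EXTENSION_COL_B\)"
        r" VALUES\s*(.*?);",
        sql, re.IGNORECASE | re.DOTALL)
    row_re = re.compile(r"\(([^)]*)\)")
    # Naive comma split: assumes no commas inside quoted string values.
    main_rows = [r.split(",") for r in row_re.findall(main_m.group(1))]
    ext_rows = [r.split(",") for r in row_re.findall(ext_m.group(1))]
    merged = []
    # The ids are in the same order in both statements, so zip pairwise.
    for (mid, col_a), (_, ext_a, _ignored) in zip(main_rows, ext_rows):
        merged.append(f"({mid.strip()}, {col_a.strip()}, {ext_a.strip()})")
    return ("INSERT INTO MAIN(MAIN_ID, MAIN_COL_A, EXTENSION_COL_A) VALUES\n"
            + ",\n".join(merged) + ";")

sql = """INSERT INTO MAIN(MAIN_ID, MAIN_COL_A) VALUES
(0, 'lorem'),
(1, 'ipsum');
INSERT INTO EXTENSION(MAIN_ID, EXTENSION_COL_A, EXTENSION_COL_B) VALUES
(0, 'b', 'irrelevant'),
(1, 'c', 'data');"""
print(merge_inserts(sql))
```

Looped over the ~100 files with pathlib, this handles the varying number of rows per statement that makes macros awkward.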

Transforming a sheet into a table with column names as values in SQL Server

I've been given the task of turning the following Excel table into a database table in SQL Server (I have shortened the row count, of course).
A car has to go to service every 10.000 kilometers, but for some models there is a fast service that applies only to certain mileages (I don't know what mileage is called in kilometers lol).
The table shows car brands and models, and each following column represents the next maintenance service (i.e. column [10] represents the first service, performed at 10.000km, column [20] represents car service performed at 20.000km, etc.).
The values inside the mileage column will indicate if quick service "applies" to the corresponding model and mileage. (i.e. Quick service applies to [Changan A500] at 10.000km and 20.000km, but not at 30.000 or 40.000)
As mentioned before, I need to transform this table into a database table in SQL Server, with the following format.
In this format, there will be a row for every model and the mileage at which quick service corresponds. I hope this clarifies the requirement:
I can load the source table into a new SQL table, and then extract the data and insert it into the required table after transforming it (I assume there is no easy way of putting the information into the new format straight from the source Excel file).
Right now I'm thinking of using pointers in order to turn this data into a table, but I'm not very good at using pointers and I wanted to know if there might be an easier way before trying the hard way.
The point is to make this scalable, so they can keep adding models and their respective mileages.
How would you do it? Am I complicating myself too much by using pointers or is it a good idea?
Thanks, and sorry I used so many pictures, just thought it might clarify better, and the reason I haven't uploaded any SQL is because I just can't figure out yet how I plan to transform the data.
In SQL Server you can only have columns named 10, 20, 30, 40, etc. if you quote them (e.g. [10]), so I've used ten, twenty, thirty, fourty below, but this is how I would solve this kind of problem.
SELECT *
INTO unpvt
FROM (VALUES
    ('chengan', 'a500', 'applies', 'applies', '', ''),
    ('renault', 'kwid', 'applies', 'applies', 'applies', 'applies')
) v (brand, model, ten, twenty, thirty, fourty);

SELECT *
FROM unpvt;

SELECT YT.brand,
       YT.model,
       V.number
FROM dbo.unpvt YT
CROSS APPLY (VALUES (ten),
                    (twenty),
                    (thirty),
                    (fourty)) V (number);
Result:
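If it helps to sanity-check the transformation outside the database first, the same unpivot can be sketched in a few lines of Python, using the sample rows from the answer above. The (brand, model, mileage) output shape follows the target format described in the question:

```python
# Wide rows as they come from the Excel sheet: one column per service mileage.
wide_rows = [
    {"brand": "chengan", "model": "a500",
     10: "applies", 20: "applies", 30: "", 40: ""},
    {"brand": "renault", "model": "kwid",
     10: "applies", 20: "applies", 30: "applies", 40: "applies"},
]

mileage_columns = [10, 20, 30, 40]  # thousands of kilometers

# Unpivot: one output row per (model, mileage) where quick service applies.
long_rows = [
    (row["brand"], row["model"], km * 1000)
    for row in wide_rows
    for km in mileage_columns
    if row[km] == "applies"
]

for r in long_rows:
    print(r)
```

Note the filter on "applies": in the SQL version you would likewise add a WHERE clause after the CROSS APPLY so models only get rows for the mileages where quick service actually applies.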

Wildcard list of numbers after comma and join on single value [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 9 months ago.
I have TABLE A with a column containing a single value and TABLE B with a column containing a list of possible matching values.
My code seems to match only the first item in the list; it does not search deeper within the list for the matching number.
Can you please help me to improve the following code:
select Logs.SingleValue, Instances.list
from Logs, Instances
where Logs.Column1 = Instances.DeviceNumber
  and (',' + RTRIM(Instances.list) + ',') LIKE Logs.SingleValue
The data in the list looks like
106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120
or
3346, 3347, 3348, 3349, 3350, 3351, 3352, 3353, 3354, 3355, 3356, 3357, 3358, 3359, 3360
I use SQL from within the R programming environment; I'm not sure what version it is, or whether the DBMS is MS SQL Server or Oracle. All I know is that I found a similar case, the command did not work, and it needs to be handwritten in pure SQL.
The syntax looks like T-SQL, meaning it's MS SQL Server.
The best advice I can give you is to normalize your database - get rid of that comma-delimited column and move it to a table.
Read Is storing a delimited list in a database column really that bad?, where you will see a lot of reasons why the answer to this question is Absolutely yes!
If you can't do that, you should probably change your current SQL code to something like this:
select Logs.SingleValue, Instances.list
from Logs
inner join Instances on Logs.Column1 = Instances.DeviceNumber
  and (', ' + RTRIM(Instances.slotlist2) + ',') LIKE '%, ' + Logs.Column2 + ',%'
This way you should be able to get all the records where slotlist2 has the number in Column2 somewhere in the list.
Note the space after the first comma in both sides of the like operator.
Please also note that I've changed your implicit join to an explicit join.
Since explicit joins have been a part of ANSI SQL for over 25 years now, and every self-respecting RDBMS supports them, there really is no need to use implicit joins anymore.
Edit: I've tested my query, and it seems to be working fine.
You can look at it yourself on rextester.
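If rextester is unavailable, the delimiter-padding trick is also easy to reproduce locally with Python's built-in sqlite3. Note that SQLite concatenates strings with || rather than +; the table and column names below just mirror the query above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Logs (Column1 INTEGER, Column2 TEXT);
CREATE TABLE Instances (DeviceNumber INTEGER, slotlist2 TEXT);
INSERT INTO Logs VALUES (1, '107'), (1, '999');
INSERT INTO Instances VALUES (1, '106, 107, 108, 109, 110');
""")

# Pad both sides with the ', ' delimiter so every item, including the
# first and the last, is matched as a whole token, never as a substring
# of a longer number (e.g. '10' must not match inside '110').
rows = conn.execute("""
SELECT Logs.Column2, Instances.slotlist2
FROM Logs
INNER JOIN Instances ON Logs.Column1 = Instances.DeviceNumber
  AND (', ' || RTRIM(Instances.slotlist2) || ',')
      LIKE '%, ' || Logs.Column2 || ',%'
""").fetchall()
print(rows)
```

Only the '107' log row survives the join; '999' is not in the list and is dropped, which is the behaviour the question asks for.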
Result of the most recent query:
As shown in the screenshot, the problem seems to lie with either the query or the package within RStudio that lets you run SQL on CSV files without a database.
Kind regards,
Dominik
P.S. The original post contains renamed column names, just to simplify the case I am trying to solve.

openrefine, cluster and edit two datasets

I have two datasets. In dataset one, column A has the ids and column B has the data I need to cluster and edit using the various available algorithms. Dataset two again has the ids in the first column and the data in the next column. I need to reconcile data only from dataset one against data from the second dataset. What I have done so far is merge the two into one project, but then OpenRefine gives me mixed results, i.e. messy data that exists only in dataset two, which is not what I want in the current phase.
I have also investigated Reconcile-csv, but without success in achieving the desired result. Any ideas?
An alternative to the reconciliation approach described by Ettore is to use algorithms similar to the 'key collision' clustering algorithms to create shared keys between the two data sets, and then use those keys to look up between the data sets with the 'cross' function.
As an example, for Column B in each data set you could 'Add column based on this column' using the GREL:
value.fingerprint()
This creates the same key as is used by the "Fingerprint" clustering method. Let's call the new column 'Column C'.
You can then look up between the two projects using the following GREL in Dataset 2:
cells["Column C"].cross("Dataset 1","Column C")
If the values in Dataset 1 and Dataset 2 would have clustered together based on the fingerprint method, then the lookup between the projects will work.
You can also use the phonetic keying algorithms to create match keys in Column C if that works better. What you can't do using this method (as far as I know) is the equivalent of the Nearest Neighbour matching - you'd have to have a reconciliation service with fuzzy matching of some kind, or merge the two data sets, to achieve this.
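For reference, the keying that value.fingerprint() performs can be approximated in Python roughly as follows. This is a sketch; OpenRefine's actual implementation differs in details such as the exact character classes it strips:

```python
import re
import unicodedata

def fingerprint(value: str) -> str:
    """Approximation of OpenRefine's 'fingerprint' keying function."""
    s = value.strip().lower()
    # Fold accented characters to their ASCII base where possible.
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode()
    # Drop punctuation, keeping word characters and whitespace.
    s = re.sub(r"[^\w\s]", "", s)
    # Split into tokens, dedupe, sort, and rejoin with single spaces.
    tokens = sorted(set(s.split()))
    return " ".join(tokens)

# Rows from the two projects match when their keys are equal,
# regardless of word order, case, accents, or punctuation.
print(fingerprint("Café  du Monde"))
print(fingerprint("monde, cafe du"))
```

Because the key is order-, case-, and punctuation-insensitive, two differently formatted names end up with the same Column C value, which is exactly what makes the cross() lookup succeed.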
Owen
Reconcile-CSV is a very good tool, but not very user friendly. As an alternative you can use the free Fuzzy Lookup Add-In for Excel. It's very easy to use, as evidenced by this screencast. One constraint: the two tables to be reconciled must be in Excel table format (select the range and press Ctrl + L).
And here is the same procedure with reconcile-csv (the GREL formula used is cell.recon.best.name and comes from here)

Assistance with MDX and comparing columns

I have 5 columns: Temp, height, weight, pulse, and site.
I want to figure out which sites are reporting the same information. For example, sites 1 and 2 have the same values for pulse. I'm looking for an expression that would represent this; any suggestions?
I've tried
If([PULSE Beats/Min]=[PULSE Beats/Min],"Duplicate",[PULSE Beats/Min])
You can use tuples, which are notated with parentheses and contain members from different hierarchies (e.g. Site and Measures) separated by commas; see the documentation.
IIf(([Measures].[PULSE Beats/Min], [Site].[Site 1]) = ([Measures].[PULSE Beats/Min], [Site].[Site 2]),
    "Duplicate",
    [Measures].[PULSE Beats/Min]
)