combining the same nodes in different columns neo4j - cypher

I am new to Neo4j and I have a question. Suppose I have columns with these values:
Orig Term
f    a
b    g
c    c
f    d
a    e
a    f
+------------------+------------------+
| Orig             | Term             |
+------------------+------------------+
| Node[1]{num:"f"} | Node[0]{num:"a"} |
| Node[3]{num:"b"} | Node[2]{num:"g"} |
| Node[5]{num:"c"} | Node[4]{num:"h"} |
| Node[1]{num:"f"} | Node[6]{num:"d"} |
| Node[8]{num:"a"} | Node[7]{num:"e"} |
| Node[8]{num:"a"} | Node[9]{num:"d"} |
+------------------+------------------+
If we look, Node 8 and Node 0 have the same value. The idea is to implement something like mutual friends. Is there any way to avoid duplicating nodes, no matter which column they appear in? Or, in other words, to create multiple relationships reusing the same node, e.g. node a is related to f, e, and node d. My CSV data has thousands of the same values, which I don't want to remove, because I want to see the relationships. Thank you in advance.

When you load the data, use MERGE to create the nodes by their num attribute. MERGE in Cypher creates something only if it doesn't already exist, so the node it returns will be the existing one if there is a match, which avoids duplicates.
MERGE each node individually, and then afterwards separately MERGE the relationship you want.
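For what it's worth, here is a minimal sketch of that load pattern. The file name entries.csv, the header names Orig and Term, the label Entity, and the relationship type RELATED_TO are illustrative assumptions, not from the original post:

// Each MERGE matches the node by num if it already exists, otherwise creates it,
// so "a" becomes a single node no matter how many rows or columns it appears in.
LOAD CSV WITH HEADERS FROM 'file:///entries.csv' AS row
MERGE (o:Entity {num: row.Orig})
MERGE (t:Entity {num: row.Term})
// With the nodes merged individually, merging the relationship separately
// just adds another edge between the existing nodes.
MERGE (o)-[:RELATED_TO]->(t);

This way node a ends up as a single node carrying relationships to f, e, and d.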

Related

Dropping duplicates in Apache Spark DataFrame and keep row with value that has not been dropped already?

Let's say I have a DataFrame as the following:
+-------+-------+
|column1|column2|
+-------+-------+
| 1 | A |
| 1 | B |
| 2 | A |
| 2 | B |
| 3 | B |
+-------+-------+
I want to find the pairs where each unique element from column1 and column2 appears in exactly one pair. Therefore, I would hope the outcome would be:
+-------+-------+
|column1|column2|
+-------+-------+
| 1 | A |
| 2 | B |
+-------+-------+
Notice that the pair (2, A) was removed because A was already paired up with 1. Also 3 was removed because B was already paired up with 2.
Is there a way to do this with Spark?
So far the only solution I came up with is running a .collect() and then mapping over each row, adding each value of column1 and column2 to a set. When I meet a row where an element from either column is already in its set, I drop that row.
Thanks for reading.
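For reference, here is a minimal sketch of that collect-based workaround in PySpark. It assumes the data is small enough to collect to the driver and that row order decides which pair wins; both are assumptions:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(1, "A"), (1, "B"), (2, "A"), (2, "B"), (3, "B")],
    ["column1", "column2"],
)

seen1, seen2 = set(), set()  # values already claimed in each column
pairs = []
for row in df.collect():  # pulls every row to the driver
    if row.column1 not in seen1 and row.column2 not in seen2:
        seen1.add(row.column1)
        seen2.add(row.column2)
        pairs.append((row.column1, row.column2))

result = spark.createDataFrame(pairs, ["column1", "column2"])
result.show()  # keeps (1, A) and (2, B); drops (1, B), (2, A) and (3, B)

This reproduces the desired output above, but it does not scale, since the greedy set lookups are inherently sequential.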

Last accessed timestamp of a Netezza table?

Does anyone know of a query that gives me details on the last time a Netezza table was accessed by any of the operations (SELECT, INSERT, or UPDATE)?
Depending on your setup you may want to try the following query:
select *
from _v_qryhist
where lower(qh_sql) like '%tablename %'
There are a collection of history views in Netezza that should provide the information you require.
Netezza does not track this information in the catalog, so you will typically have to mine that from the query history database, if one is configured.
Modern Netezza query history information is typically stored in a dedicated database. Depending on permissions, you may be able to see if history collection is enabled, and which database it is using with the following command. Apologies in advance for the screen-breaking wrap to come.
SYSTEM.ADMIN(ADMIN)=> show history configuration;
CONFIG_NAME | CONFIG_DBNAME | CONFIG_DBTYPE | CONFIG_TARGETTYPE | CONFIG_LEVEL | CONFIG_HOSTNAME | CONFIG_USER | CONFIG_PASSWORD | CONFIG_LOADINTERVAL | CONFIG_LOADMINTHRESHOLD | CONFIG_LOADMAXTHRESHOLD | CONFIG_DISKFULLTHRESHOLD | CONFIG_STORAGELIMIT | CONFIG_LOADRETRY | CONFIG_ENABLEHIST | CONFIG_ENABLESYSTEM | CONFIG_NEXT | CONFIG_CURRENT | CONFIG_VERSION | CONFIG_COLLECTFILTER | CONFIG_KEYSTORE_ID | CONFIG_KEY_ID | KEYSTORE_NAME | KEY_ALIAS | CONFIG_SCHEMANAME | CONFIG_NAME_DELIMITED | CONFIG_DBNAME_DELIMITED | CONFIG_USER_DELIMITED | CONFIG_SCHEMANAME_DELIMITED
-------------+---------------+---------------+-------------------+--------------+-----------------+-------------+---------------------------------------+---------------------+-------------------------+-------------------------+--------------------------+---------------------+------------------+-------------------+---------------------+-------------+----------------+----------------+----------------------+--------------------+---------------+---------------+-----------+-------------------+-----------------------+-------------------------+-----------------------+-----------------------------
ALL_HIST_V3 | NEWHISTDB | 1 | 1 | 20 | localhost | HISTUSER | aFkqABhjApzE$flT/vZ7hU0vAflmU2MmPNQ== | 5 | 4 | 20 | 0 | 250 | 1 | f | f | f | t | 3 | 1 | 0 | 0 | | | HISTUSER | f | f | f | f
(1 row)
Also make note of the CONFIG_VERSION, as it will come into play when crafting the following query example. In my case, I happen to be using the version 3 format of the query history database.
Assuming history collection is configured, and that you have access to the history database, you can get the information you're looking for from the tables and views in that database. These are documented here. The following is an example, which reports when the given table was the target of a successful insert, update, or delete by referencing the "usage" column. Here I use one of the history table helper functions to unpack that column.
SELECT FORMAT_TABLE_ACCESS(usage),
       hq.submittime
FROM "$v_hist_queries" hq
INNER JOIN "$hist_table_access_3" hta
    USING (NPSID, NPSINSTANCEID, OPID, SESSIONID)
WHERE hq.dbname = 'PROD'
  AND hta.schemaname = 'ADMIN'
  AND hta.tablename = 'TEST_1'
  AND hq.SUBMITTIME > '01-01-2015'
  AND hq.SUBMITTIME <= '08-06-2015'
  AND (
        instr(FORMAT_TABLE_ACCESS(usage), 'ins') > 0
     OR instr(FORMAT_TABLE_ACCESS(usage), 'upd') > 0
     OR instr(FORMAT_TABLE_ACCESS(usage), 'del') > 0
      )
  AND status = 0;
FORMAT_TABLE_ACCESS | SUBMITTIME
---------------------+----------------------------
ins | 2015-06-16 18:32:25.728042
ins | 2015-06-16 17:46:14.337105
ins | 2015-06-16 17:47:14.430995
(3 rows)
You will need to change the digit at the end of the "$hist_table_access_3" view to match your query history version.

SQL Merge multiple tables with different columns into one

I need to create one master table from 5 tables; the difficulty is that the same column across the tables may have a different name.
For simplicity I'm just going to give an example for 2 tables:
+---------+---------+
| Table 1 | Table 2 |
+---------+---------+
| PO      | P       |
| VE      | V       |
| TE      | TE      |
| LO      | LO      |
| IN      |         |
| D       |         |
| X       |         |
| Y       |         |
|         | A       |
|         | B       |
|         | C       |
+---------+---------+
As you can see, PO doesn't have the same column name as the corresponding column in table 2, yet they hold the same record. I need to aggregate these 2 tables into one master.
I began with the table that has the most repeated columns and am trying to merge the other tables into it. When a column is only found in one table, I want the other fields to display NULL. Also, I don't want any duplicates. Hope someone can help me out!
Cheers
yet they are the same record.
No, they are not.
They could, however, represent different views of the same business entities. To "merge" them you must first specify what the JOIN criterion between them shall be.
Given it is
one.PO = two.P.
Then you must write a SQL statement like
SELECT one.PO AS ID,
       one.VE,
       /* same for TE, LO, IN, D, X, Y */
       two.A,
       two.B,
       two.C
INTO   t_what_the_frak_the_new_table_shall_be_called
FROM   t_what_the_frak_table_1_is_called AS one
JOIN   t_what_the_frak_table_2_is_called AS two
       ON one.PO = two.P;
GO
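If rows that exist in only one of the tables should also survive, with NULL in the fields the other table lacks (which is what the question asks for), a hedged variant of the same statement using a FULL OUTER JOIN could look like the following; the COALESCE'd key and the table name t_master are illustrative assumptions:

SELECT COALESCE(one.PO, two.P) AS ID, -- take the key from whichever side is present
       one.VE,
       /* same for TE, LO, IN, D, X, Y */
       two.A,
       two.B,
       two.C
INTO   t_master
FROM   t_what_the_frak_table_1_is_called AS one
FULL OUTER JOIN t_what_the_frak_table_2_is_called AS two
       ON one.PO = two.P;
GO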

Qlikview - join tables and update values

I have a QV report with a table that looks like this:
+---------+--------+---------------+------+-------+
| HOST | OBJECT | SPECIFICATION | COPY | LAST |
+---------+--------+---------------+------+-------+
| host001 | obj01 | spec01 | c1 | 15:55 |
| host002 | obj02 | spec02 | c2 | 14:30 |
| host003 | - | - | - | - |
| host004 | - | - | - | - |
+---------+--------+---------------+------+-------+
Now I have another small table:
spec1
host1
host4
All I need is to connect these tables in the loading script in this way:
The first row is the specification and all the others are hosts. If there is a host with the name from the second row of the second table (host1) and with the specification from the first row, then I need to copy all the other values from that host's row (host1) to the rows of the other hosts from the second table (host4), e.g.:
+---------+--------+---------------+------+-------+
| HOST | OBJECT | SPECIFICATION | COPY | LAST |
+---------+--------+---------------+------+-------+
| host001 | obj01 | spec01 | c1 | 15:55 |
| host002 | obj02 | spec02 | c2 | 14:30 |
| host003 | - | - | - | - |
| host004 | obj01 | spec01 | c1 | 15:55 |
+---------+--------+---------------+------+-------+
I have several tables like the second one and I need to connect all of them. Sure, there can be multiple rows with the same host, same specification, etc. in the first table. The "-" sign is a null() value, and the layout of the second table can be changed.
I tried all the JOINs and now I'm trying to iterate over the whole table and compare, but I'm new to QV and I'm missing some SQL features like UPDATE.
I appreciate all your help.
Here's a script; it's not perfect and there is probably a neater solution(!), but it works for your scenario.
I rearranged your "Copy Table" so that it has three columns:
HOST SPECIFICATION TARGET_HOST
You could then repeat rows for the additional hosts that you wish to copy to as follows:
HOST SPECIFICATION TARGET_HOST
host001 spec01 host004
host001 spec01 host003
The script (I included some dummy data so you can try it out):
Source_Data:
LOAD * INLINE [
    HOST, OBJECT, SPECIFICATION, COPY, LAST
    host001, obj01, spec01, c1, 15:55
    host002, obj02, spec02, c2, 14:30
    host003
    host004
];

Copy_Table:
LOAD * INLINE [
    HOST, SPECIFICATION, TARGET_HOST
    host001, spec01, host004
];

// Build a link table keyed on HOST & SPECIFICATION, then attach the source
// row's fields to each TARGET_HOST via the preceding load below.
Link_Table:
NOCONCATENATE
LOAD
    HOST & SPECIFICATION as %key,
    TARGET_HOST
RESIDENT Copy_Table;

DROP TABLE Copy_Table;

LEFT JOIN (Link_Table)
LOAD
    HOST & SPECIFICATION as %key,
    HOST, OBJECT, SPECIFICATION, COPY, LAST
;
LOAD
    *
RESIDENT Source_Data;

// The copied rows, re-labelled with the target host...
Complete_Data:
NOCONCATENATE LOAD
    TARGET_HOST as HOST,
    OBJECT, SPECIFICATION, COPY, LAST
RESIDENT Link_Table;

// ...plus the original rows whose host was not a copy target.
CONCATENATE (Complete_Data)
LOAD
    *
RESIDENT Source_Data
WHERE NOT Exists(TARGET_HOST, HOST & SPECIFICATION); // old condition: WHERE NOT Exists(TARGET_HOST,HOST);

DROP TABLES Source_Data, Link_Table;

Check via Hector if secondary index already exists for a dynamic column in Cassandra

After the data import to my Cassandra Test-Cluster I found out that I need secondary indexes for some of the columns. Since the data is already inside the cluster, I want to achieve this by updating the ColumnFamilyDefinitions.
Now, the problem is: those columns are dynamic columns, so they are invisible to the getColumnMetaData() call.
How can I check via Hector if a secondary index has already been created and create one if this is not the case?
(I think the part about how to create one can be found at http://comments.gmane.org/gmane.comp.db.hector.user/3151 )
If this is not possible, do I have to copy all data from this dynamic column family into a static one?
There is no need to copy all the data from the dynamic column family into a static one.
How, then? Let me explain with an example. Suppose you have the CF schema below:
CREATE TABLE sample (
    KEY text PRIMARY KEY,
    flag boolean,
    name text
)
NOTE: I have created indexes on flag and name.
Now here is some data in the CF:
KEY,1 | address,Kolkata | flag,True | id,1 | name,Abhijit
KEY,2 | address,Kolkata | flag,True | id,2 | name,abc
KEY,3 | address,Delhi | flag,True | id,3 | name,xyz
KEY,4 | address,Delhi | flag,True | id,4 | name,pqr
KEY,5 | address,Delhi | col1,Hi | flag,True | id,4 | name,pqr
From the data you can see that address, id & col1 are all dynamically created.
Now if I query something like this:
SELECT * FROM sample WHERE flag = TRUE AND col1 = 'Hi';
Note: col1 is not indexed, but I can still filter on that field.
Output:
KEY | address | col1 | flag | id | name
-----+---------+------+------+----+------
5 | Delhi | Hi | True | 4 | pqr
Another query:
SELECT * FROM sample WHERE flag = TRUE AND id >= 1 AND id < 5 AND address = 'Delhi';
Note: here neither id nor address is indexed, yet I still get the output.
Output:
KEY,3 | address,Delhi | flag,True | id,3 | name,xyz
KEY,4 | address,Delhi | flag,True | id,4 | name,pqr
KEY,5 | address,Delhi | col1,Hi | flag,True | id,4 | name,pqr
So basically, if you have an indexed column whose value is always something you know, you can easily filter on the rest of the dynamic columns by combining them with that always-matching indexed column.