大數據

Oracle MySQL相關工具replace操作實現方式區別

數據導入工具中的replace

1,oracle impdp 導入數據的表已存在指定TABLE_EXISTS_ACTION參數,如果目標庫存在相同名稱的表,執行下面的操作

1) skip:默認跳過這張表,繼續下一個對象。如果CONTENT設置了DATA_ONLY參數,則默認值為append。
2) replace:先drop表,然後創建表,最後插入數據
3) append:在原來數據的基礎上增加數據
4) truncate:先truncate,然後再插入數據

2,oracle sqlloader 加載數據模式,遇到有數據時

1) APPEND //原先的表有數據 就加在後面
2) INSERT // 裝載空表 如果原先的表有數據 sqlloader會停止 默認值
3) REPLACE // 原先的表有數據 原先的數據會全部刪除
4) TRUNCATE // 指定的內容和replace的相同 會用truncate語句刪除現存數據

3,mysql load data 加載數據 對有唯一鍵記錄重複時

1)如果指定replace,新行將代替有相同的唯一鍵值的現有行。具體邏輯同mysql數據庫內replace into語句
2)如果指定ignore,跳過有唯一鍵的現有行的重複行的輸入。
3)如果不指定任何一個選項,當找到重複鍵時,出現一個錯誤,並且文本文件的餘下部分被忽略。

4,DataX工具

導入數據碰到主鍵或唯一鍵衝突:選擇導入模式,可以支持 insert/replace/insert ignore 方式

insert 指當主鍵/唯一性索引衝突,數據集成視為髒數據進行處理。
replace 指沒有遇到主鍵/唯一性索引衝突時,與 insert 行為一致,當主鍵/唯一性索引衝突時會用新行替換原有行所有字段。
insert ignore 指當主鍵/唯一性索引衝突,數據集成將直接忽略更新丟棄,並且不記錄!

注意

MySQL writer支持replace into,參考MySQL數據庫的replace into語句。

Oracle writer不支持replace into,支持insert into...(當主鍵/唯一性索引衝突時會寫不進去衝突的行),因為Oracle不支持replace、replace into語句。OracleWriter和MysqlWriter不同,不支持配置writeMode參數。原因可見文末

數據庫中的replace

1,MySQL數據庫中的replace、replace into語句

replace在沒有碰到主鍵、唯一索引重複值時,和insert完全相同。因此replace僅當表具有PRIMARY KEY或 UNIQUE索引時才有意義。
如果表中的舊行與a PRIMARY KEY或UNIQUE 索引的新行具有相同的值, 則在插入新行之前刪除該舊行。

要使用REPLACE,您必須同時擁有表的INSERT和 DELETE特權。

MySQL對REPLACE(和 LOAD DATA ... REPLACE)使用以下算法 :

  1. 嘗試將新行插入表中
  2. 雖然插入失敗是因為主鍵或唯一索引發生重複鍵錯誤:
    • 從表中刪除具有重複鍵值的衝突行
    • 再試一次將新行插入表中
mysql> select * from replace1;
+----+-------+
| id | name  |
+----+-------+
|  1 | aaa   |
|  2 | wang1 |
|  3 | ccc   |
+----+-------+
3 rows in set (0.00 sec)

mysql> replace into replace1 values (3,'ccc');
Query OK, 1 row affected (0.02 sec)

mysql> replace into replace1 values (3,'ccc');
Query OK, 1 row affected (0.01 sec)

mysql> replace into replace1 values (3,'ddd');
Query OK, 2 rows affected (0.00 sec)

mysql> replace into replace1 values (3,'ddd');
Query OK, 2 rows affected (0.00 sec)

mysql> replace replace1 values (3,'ddd');
Query OK, 1 row affected (0.00 sec)

mysql> replace replace1 values (3,'ccc');
Query OK, 2 rows affected (0.01 sec)

從執行結果來看,replace和replace into 具有相同的作用。

也可以使用MySQL help命令查看幫助:

mysql> help replace;

2,MySQL數據庫的INSERT ... ON DUPLICATE KEY UPDATE語句

insert into UNE_CBILL_PRINT_CLOUD(fid,feinvoicecode,ftype)VALUES('1','1','1')
        ON DUPLICATE KEY UPDATE foperator = '1',ftype = '3';

replace into 效率比較低,因為在更新數據的時候,要先刪除舊的,然後插入新的,在這個過程中,還要重新維護索引;

insert on duplicate 的更新操作雖然也會更新數據,但其對主鍵的索引卻不會有改變,也就是說,insert on duplicate 更新對主鍵索引沒有影響,因此對索引的維護成本就低了一些。

相關測試:

mysql> 
mysql> create table replace1(id int primary key,name varchar(8));
Query OK, 0 rows affected (0.21 sec)

mysql> 
mysql> 
mysql> 
mysql> create unique index unique_name on replace1(name);
Query OK, 0 rows affected (0.37 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> 
mysql> 
mysql> 
mysql> insert into replace1 values(1,'aaa');
Query OK, 1 row affected (0.00 sec)

mysql> 
mysql> 
mysql> 
mysql> replace into replace1 values(2,'bbb');
Query OK, 1 row affected (0.01 sec)

mysql> 
mysql> 
mysql> 
mysql> replace into replace1 values(3,'ccc');
Query OK, 1 row affected (0.00 sec)

mysql> 
mysql> 
mysql> replace into replace1 values(3,'ccc');
Query OK, 2 rows affected (0.01 sec)

mysql> replace into replace1 values(3,'ccc');
Query OK, 2 rows affected (0.00 sec)

mysql> 
mysql> select * from replace1;
+----+------+
| id | name |
+----+------+
|  1 | aaa  |
|  2 | bbb  |
|  3 | ccc  |
+----+------+
3 rows in set (0.00 sec)

mysql> 
mysql> 
mysql> replace into replace1 values(4,'ccc');
Query OK, 2 rows affected (0.01 sec)

mysql> 
mysql> 
mysql> 
mysql> select * from replace1;
+----+------+
| id | name |
+----+------+
|  1 | aaa  |
|  2 | bbb  |
|  4 | ccc  |
+----+------+
3 rows in set (0.00 sec)

mysql> 
mysql> replace into replace1 values(3,'ccc');
Query OK, 2 rows affected (0.01 sec)

mysql> 
mysql> 
mysql> select * from replace1;
+----+------+
| id | name |
+----+------+
|  1 | aaa  |
|  2 | bbb  |
|  3 | ccc  |
+----+------+
3 rows in set (0.00 sec)

mysql> 
mysql> desc replace1;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| id    | int(11)    | NO   | PRI | NULL    |       |
| name  | varchar(8) | YES  | UNI | NULL    |       |
+-------+------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

mysql> 
mysql> 
mysql> 
mysql> replace into replace1 values(4,'ddd');
Query OK, 1 row affected (0.00 sec)

mysql> 
mysql> 
mysql> select * from replace1;
+----+------+
| id | name |
+----+------+
|  1 | aaa  |
|  2 | bbb  |
|  3 | ccc  |
|  4 | ddd  |
+----+------+
4 rows in set (0.00 sec)

mysql> 
mysql> replace into replace1 values(4,'ccc');
Query OK, 3 rows affected (0.01 sec)

mysql> 
mysql> 
mysql> select * from replace1;
+----+------+
| id | name |
+----+------+
|  1 | aaa  |
|  2 | bbb  |
|  4 | ccc  |
+----+------+
3 rows in set (0.00 sec)

mysql> 
mysql> 
mysql> 
mysql> 
mysql> replace into replace1 values(3,'ccc');
Query OK, 2 rows affected (0.00 sec)

mysql> 
mysql> 
mysql> 
mysql> select * from replace1;
+----+------+
| id | name |
+----+------+
|  1 | aaa  |
|  2 | bbb  |
|  3 | ccc  |
+----+------+
3 rows in set (0.01 sec)

mysql> 
mysql> 
mysql> 
mysql> replace into replace1 values(3,'ccc');
Query OK, 2 rows affected (0.01 sec)

mysql> 
mysql> 
mysql> 
mysql> drop index unique_name;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1
mysql> 
mysql> 
mysql> 
mysql> help drop index 
Name: 'DROP INDEX'
Description:
Syntax:
DROP INDEX index_name ON tbl_name
    [algorithm_option | lock_option] ...

algorithm_option:
    ALGORITHM [=] {DEFAULT|INPLACE|COPY}

lock_option:
    LOCK [=] {DEFAULT|NONE|SHARED|EXCLUSIVE}

DROP INDEX drops the index named index_name from the table tbl_name.
This statement is mapped to an ALTER TABLE statement to drop the index.
See [HELP ALTER TABLE].

To drop a primary key, the index name is always PRIMARY, which must be
specified as a quoted identifier because PRIMARY is a reserved word:

DROP INDEX `PRIMARY` ON t;

URL: http://dev.mysql.com/doc/refman/8.0/en/drop-index.html


mysql> 
mysql> 
mysql> 
mysql> drop index unique_name on replace1;
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> 
mysql> 
mysql> 
mysql> replace into replace1 values(3,'ccc');
Query OK, 1 row affected (0.01 sec)

mysql> 
mysql> 
mysql> replace into replace1 values(3,'ccc');
Query OK, 1 row affected (0.01 sec)

mysql> 
mysql> 
mysql> insert into replace1 values(4,'ddd');
Query OK, 1 row affected (0.01 sec)

mysql> create unique index unique_name on replace1(name);
Query OK, 0 rows affected (0.02 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> insert into replace1 values(5,'eee');
Query OK, 1 row affected (0.01 sec)

mysql> 
mysql> desc replace1;
+-------+------------+------+-----+---------+-------+
| Field | Type       | Null | Key | Default | Extra |
+-------+------------+------+-----+---------+-------+
| id    | int(11)    | NO   | PRI | NULL    |       |
| name  | varchar(8) | YES  | UNI | NULL    |       |
+-------+------------+------+-----+---------+-------+
2 rows in set (0.00 sec)

mysql> 
mysql> 
mysql> update replace1 set name='ccc' where id=3;
Query OK, 0 rows affected (0.00 sec)
Rows matched: 1  Changed: 0  Warnings: 0

mysql> update replace1 set name='ccc' where id=3;
Query OK, 0 rows affected (0.00 sec)
Rows matched: 1  Changed: 0  Warnings: 0

mysql> 
mysql> 
mysql> update replace1 set name='ddd' where id=3;
ERROR 1062 (23000): Duplicate entry 'ddd' for key 'unique_name'
mysql> 
mysql> 
mysql> update replace1 set name='www' where id=3;
Query OK, 1 row affected (0.03 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> 
mysql> 
mysql> update replace1 set name='ccc' where id=3;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> 
mysql> update replace1 set name='ccc' where id=3;
Query OK, 0 rows affected (0.00 sec)
Rows matched: 1  Changed: 0  Warnings: 0

mysql> 
mysql> update replace1 set name='ccc' where id=3;
Query OK, 0 rows affected (0.00 sec)
Rows matched: 1  Changed: 0  Warnings: 0

注意:

1,在有多個唯一索引的時候執行replace操作會出現一些奇怪的現象,如除主鍵外,還有個唯一索引。replace操作影響了2行,所以先刪掉衝突的2行,並進行插入一行,返回提示影響了3行數據,replace一條數據,表的總行數少了一條。

2,除主鍵外有其他唯一索引時,replace語句values如果完全相同時,影響行數為2,是先delete再insert,當僅有主鍵沒有其他唯一索引時,影響行數是1,但實際沒有做delete操作(類似於update完全相同值的場景),上述推測根據binlog分析的,再深入可能得根據MySQL源碼分析。

由此得出以下2個結論:

當有多個唯一索引時會多刪除數據,慎用replace,同時也儘量避免這種情形,滿足數據庫設計的第二範式。
只有一個主鍵or唯一索引的情況,可以使用insert on duplicate 操作,比replace操作性能好一些,減少對索引維護帶來的性能開銷。

3,Oracle數據庫中對應的實現是merge into語句

MERGE INTO --要插入的表 別名
USING (

  --查詢的SQL
  )別名 ON 
     --(連接條件)

WHEN MATCHED THEN --如果符合條件

 UPDATE SET 

WHEN NOT MATCHED THEN

 INSERT ()VALUES();--不符合就插
/*語法:
MERGE [INTO [schema .] table [t_alias]
USING [schema .] { table | view | subquery } [t_alias]
ON ( condition )
WHEN MATCHED THEN merge_update_clause
WHEN NOT MATCHED THEN merge_insert_clause;
*/

Oracle數據庫沒有和MySQL一樣的replace into語句,
但是有merge into 語句,可以同時實現update和insert的功能。
update子句後面可以跟delete子句來去掉一些不需要的行
delete只能和update配合,從而達到刪除滿足where條件的子句的記錄

Oracle 的 merge into 語句的insert、update、delete互相配合能達到跟MySQL replace語句一樣的效果。
由於沒有根據唯一約束重複值進行操作的replace,因此datax oracle writer不支持配置writeMode參數。

Leave a Reply

Your email address will not be published. Required fields are marked *