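Contract Information Extraction

Extracts key information from structurally complex contract documents. Only contract documents in PDF format are currently supported.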

Request URL

Send a POST request to:

https://open.langboat.com

Communication Protocol

All API endpoints of the Langboat contract information extraction service communicate over HTTPS, which provides a secure channel.

Usage Metering

Usage is counted by the number of successful calls.

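Request Parameters

Request Headers

| Header | Description |
|---|---|
| Accept | Fixed value: application/json |
| Authorization | Authentication information used to verify that the request is legitimate, in the format AccessKey:signature. |
| Content-Type | Fixed value: application/json |
| Content-MD5 | The 128-bit MD5 digest of the HTTP message body, Base64-encoded. |
| Date | Request time in GMT format, e.g. Wed, 20 Apr 2022 17:01:00 GMT. |
| x-langboat-signature-nonce | A unique random number that protects against replay attacks. Use a different value for every request. |
| x-langboat-signature-method | Signature method. Only HMAC-SHA256 is currently supported. |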
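
As an illustration of the Date and x-langboat-signature-nonce headers, the following minimal sketch produces both values. The helper names `make_date_header` and `make_nonce` are hypothetical, and it is an assumption that the service accepts any decimal string as the nonce.

```python
import secrets
from email.utils import formatdate


def make_date_header() -> str:
    """Return the current time as an HTTP date, e.g. 'Wed, 20 Jul 2022 13:04:02 GMT'."""
    # formatdate always emits English day and month names, independent of locale.
    return formatdate(timeval=None, localtime=False, usegmt=True)


def make_nonce() -> str:
    """Return a random decimal string; a fresh value is drawn on every call."""
    # Assumption: any decimal string is an acceptable nonce for the service.
    return str(secrets.randbelow(10**16))
```

Drawing the nonce from a large range makes accidental reuse across requests very unlikely, which is the property the replay protection relies on.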

Signature Calculation

  1. Compute the MD5 digest of the body, Base64-encode the raw digest, and set the result as the Content-MD5 header.

Example:

  • body = {"pdfBase64": "<pdf base64>"}
  • MD5(body) = byte(d41d8cd98f00b204e9800998ecf8427e)
  • Base64(MD5(body)) = 1B2M2Y8AsgTpgAmY7PhCfg==

The digest values shown are those of an empty body and only illustrate the format; a real request body produces different values.

Content-MD5 header: Content-MD5: 1B2M2Y8AsgTpgAmY7PhCfg==
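
The Content-MD5 computation can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `content_md5` is hypothetical. The key point is that the raw 16-byte digest is Base64-encoded, not its hexadecimal text.

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Base64-encode the raw 16-byte MD5 digest of the request body."""
    digest = hashlib.md5(body).digest()  # raw bytes, not the hex string
    return base64.b64encode(digest).decode("ascii")


# An empty body reproduces the header value quoted in the example above.
print(content_md5(b""))  # 1B2M2Y8AsgTpgAmY7PhCfg==
```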

  2. Build the header string to sign, headerStringToSign, from the request headers:
headerStringToSign =
HTTP-Verb + "\n" + // only POST is supported
Accept + "\n" + // application/json
Content-MD5 + "\n" + // the value computed in step 1
Content-Type + "\n" + // application/json
Date + "\n" + // time in GMT format
x-langboat-signature-method + "\n" + // only HMAC-SHA256 is supported
x-langboat-signature-nonce + "\n";

headerStringToSign example:

POST
application/json
mZFLkyvTelC5g8XnyQrpOw==
application/json
Wed, 20 Jul 2022 13:04:02 GMT
HMAC-SHA256
10191
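
The header string above can be assembled as follows. This is a minimal sketch; the function name `build_header_string` is hypothetical, and the fixed values are the ones the documentation lists as the only supported ones.

```python
def build_header_string(content_md5: str, date: str, nonce: str) -> str:
    """Join the signed header values, each terminated by a newline."""
    fields = [
        "POST",              # HTTP verb
        "application/json",  # Accept
        content_md5,         # Content-MD5
        "application/json",  # Content-Type
        date,                # Date
        "HMAC-SHA256",       # x-langboat-signature-method
        nonce,               # x-langboat-signature-nonce
    ]
    return "".join(field + "\n" for field in fields)


example = build_header_string(
    "mZFLkyvTelC5g8XnyQrpOw==", "Wed, 20 Jul 2022 13:04:02 GMT", "10191"
)
print(example, end="")
```

Note that the string ends with a newline after the nonce, so the query string appended in the next step starts on its own line.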
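  3. Build the query string to sign, queryToSign, from the request query: sort the parameters of the query string (everything after the ?) in ascending lexicographic order and join them with &. Example: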
action=contractExtraction
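
A minimal sketch of this normalization follows. The function name `build_query_string` is hypothetical, and it is an assumption (consistent with the sample clients on this page) that values are signed without URL encoding.

```python
def build_query_string(params: dict) -> str:
    """Sort the key=value pairs in ascending order and join them with '&'."""
    # Assumption: values enter the signature as-is, without URL encoding.
    pairs = [f"{key}={value}" for key, value in params.items()]
    return "&".join(sorted(pairs))


print(build_query_string({"action": "contractExtraction"}))
```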
  4. Concatenate the normalized strings built in the previous two steps to form the string to sign:
stringToSign = headerStringToSign + queryToSign;
  5. Compute the signature. Compute the HMAC of stringToSign as defined in RFC 2104, using HMAC-SHA256 keyed with the AccessSecret, and Base64-encode the HMAC value to obtain the signature. Prefix it with the AccessKey and a colon to obtain the Authorization value:
Signature = Base64(HMAC-SHA256( AccessSecret, UTF-8-Encoding-Of(stringToSign)))
Authorization = AccessKey + ":" + Signature

Signature example: po/vsPI0RcvY/eu4bohhxxADHyPj4/rcglLTQEBtHQM=

Authorization example: Authorization: 7Bo9ByyiTWRC1Y8KJJQ9cWtNpZLmrgyb:po/vsPI0RcvY/eu4bohhxxADHyPj4/rcglLTQEBtHQM=
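
The final signing step can be sketched with Python's standard `hmac` module. The function names `sign` and `authorization` are hypothetical; the example signature above cannot be reproduced here because its AccessSecret is not published.

```python
import base64
import hashlib
import hmac


def sign(string_to_sign: str, access_secret: str) -> str:
    """Return Base64(HMAC-SHA256(access_secret, UTF-8 bytes of string_to_sign))."""
    mac = hmac.new(
        access_secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha256,
    )
    return base64.b64encode(mac.digest()).decode("ascii")


def authorization(access_key: str, string_to_sign: str, access_secret: str) -> str:
    """Build the Authorization header value: AccessKey + ':' + Signature."""
    return f"{access_key}:{sign(string_to_sign, access_secret)}"
```

A SHA-256 HMAC is 32 bytes long, so the Base64 signature is always 44 characters, matching the length of the example signature.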

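Query Parameters

| Query parameter | Description |
|---|---|
| action | Fixed value: contractExtraction |

Body Parameters

| Body parameter | Description |
|---|---|
| pdfBase64 | Base64 encoding of the PDF file |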
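
As an illustration, the request body could be built from the raw PDF bytes as follows. The function name `build_body` is hypothetical, and the `%PDF-` header check is an optional client-side guard added for this sketch, not something the service documentation requires.

```python
import base64
import json


def build_body(pdf_bytes: bytes) -> bytes:
    """Return the UTF-8 JSON request body carrying the Base64-encoded PDF."""
    # Client-side sanity check (an assumption of this sketch): PDF files
    # start with the marker "%PDF-".
    if not pdf_bytes.startswith(b"%PDF-"):
        raise ValueError("input does not look like a PDF file")
    payload = {"pdfBase64": base64.b64encode(pdf_bytes).decode("ascii")}
    return json.dumps(payload).encode("utf-8")
```

Serializing the body to bytes once and reusing those exact bytes for both the Content-MD5 digest and the HTTP payload avoids signature mismatches caused by differing JSON serialization.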

Request example:

POST https://open.langboat.com/?action=contractExtraction
Content-Type: application/json
Content-MD5: 6XC6zMGHuc/2BI4Bx0lKRQ==
Date: Sun, 17 Jul 2022 08:07:51 GMT
Accept: application/json
x-langboat-signature-method: HMAC-SHA256
x-langboat-signature-nonce: 3588
Authorization: aAM9NHSkWY40wkd7EUg6HFuFzJpJPG6E:9bu0dGv/1xGnUzin2HOe2TUa0Frf+5WE4FnNHJxeT6Q=

{"pdfBase64": "<PDF base64>"}

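Response

Response Body

The response body is a JSON object. Its data field contains a results list and a status value. Each results entry has a key and a values list, and each values entry describes one extracted span:

  • key: the extraction keyword;
  • start: the start position of the extracted text within the text of the current page;
  • end: the end position of the extracted text within the text of the current page;
  • text: the extracted entity;
  • pred: the category of the extracted entity;
  • page: the page on which the extracted entity appears;
  • status: the extraction status; 1 means success, 0 means failure.

Example structure: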

{
  "code": 0,
  "message": "success",
  "requestId": "004700cf3e118d4a274c5564b4044860",
  "data": {
    "results": [
      {
        "key": "合同名称",
        "values": [
          {
            "start": 0,
            "end": 15,
            "text": "海关2021-2022年出入境预防接种疫苗供货合同",
            "pred": "合同名称",
            "page": 0
          }
        ]
      },
      {
        "key": "合同编号",
        "values": [
          {
            "start": 21,
            "end": 31,
            "text": "BJZX-HPV9-2022008",
            "pred": "合同编号",
            "page": 0
          }
        ]
      },
      {
        "key": "采购人名称",
        "values": [
          {
            "start": 36,
            "end": 50,
            "text": "重庆国际旅行卫生保健中心(重庆海关口岸门诊部)",
            "pred": "采购人名称",
            "page": 0
          }
        ]
      },
      {
        "key": "供应商名称",
        "values": [
          {
            "start": 55,
            "end": 61,
            "text": "重庆智飞生物制品股份有限公司",
            "pred": "供应商名称",
            "page": 0
          }
        ]
      },
      {
        "key": "主要标的名称",
        "values": [
          {
            "start": 192,
            "end": 197,
            "text": "九价人乳头",
            "pred": "主要标的名称",
            "page": 0
          },
          {
            "start": 213,
            "end": 216,
            "text": "瘤病毒疫苗",
            "pred": "主要标的名称",
            "page": 0
          }
        ]
      },
      {
        "key": "主要标的单价",
        "values": [
          {
            "start": 211,
            "end": 212,
            "text": "1298",
            "pred": "主要标的单价",
            "page": 0
          }
        ]
      },
      {
        "key": "主要标的数量",
        "values": [
          {
            "start": 219,
            "end": 220,
            "text": "192支",
            "pred": "主要标的数量",
            "page": 0
          }
        ]
      },
      {
        "key": "合同金额",
        "values": [
          {
            "start": 221,
            "end": 224,
            "text": "249216元",
            "pred": "合同金额",
            "page": 0
          },
          {
            "start": 231,
            "end": 249,
            "text": "人民币贰拾肆万玖仟贰佰壹拾陆元整249216元",
            "pred": "合同金额",
            "page": 0
          }
        ]
      }
    ],
    "status": 1
  }
}
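
To show how the nested structure might be consumed, here is a minimal sketch that flattens a parsed response into a mapping from each key to its extracted text fragments. The function name `collect_entities` is hypothetical, and the sample dictionary is a shortened copy of the response above.

```python
from typing import Dict, List


def collect_entities(response: dict) -> Dict[str, List[str]]:
    """Map each extraction key to the list of its extracted text fragments."""
    data = response.get("data") or {}
    found: Dict[str, List[str]] = {}
    for item in data.get("results", []):
        found[item["key"]] = [value["text"] for value in item.get("values", [])]
    return found


sample = {
    "code": 0,
    "data": {
        "results": [
            {
                "key": "合同编号",
                "values": [
                    {"start": 21, "end": 31, "text": "BJZX-HPV9-2022008",
                     "pred": "合同编号", "page": 0}
                ],
            }
        ],
        "status": 1,
    },
}
print(collect_entities(sample))
```

A key can carry several fragments, as with 主要标的名称 in the example, where one name is split across two spans, so the values are kept as a list rather than a single string.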

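Response Status Codes and Business Codes

The possible HTTP response status codes and business (error) codes are:

| HTTP status code | Business code | Description |
|---|---|---|
| 200 | 0 | Success |
| 400 | 10400 | Bad request |
| 401 | 10401 | Authentication failed; check that the AccessKey and AccessSecret are correct |
| 403 | 10403 | Insufficient permissions; check that the service is activated, or whether the QPS, character, or call-count limit was exceeded |
| 422 | 10422 | Invalid parameters; check the request parameters |
| 429 | 10429 | Request limit exceeded (QPS, character, or call-count limit) |
| 500 | 10500 | Service error |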

Error response examples

  • Authentication failed
{
  "code": 10401,
  "message": "鉴权失败,核对AccessKey和AccessSecret 是否正确",
  "requestId": "33f0057e-f421-fb94-766f-608d837969ca"
}
  • Invalid parameters: unsupported action
{
  "code": 10422,
  "message": "参数错误,核对请求参数, 不支持的action : generateTemplate",
  "requestId": "6cc3f2a9-5fd1-4872-9c59-d6f10ceb1e62"
}
  • Insufficient permissions
{
  "code": 10403,
  "message": "权限不足,查看是否开通服务",
  "requestId": "6aa60868-d3f1-a7e5-8fbb-e2f96d65e7b5"
}
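
One way a client might act on these business codes is sketched below. The names `LangboatAPIError` and `check_response` are hypothetical, and treating 10429 and 10500 as retryable is an assumption of this sketch, not a documented recommendation.

```python
class LangboatAPIError(Exception):
    """Raised when the response carries a non-zero business code."""

    def __init__(self, code, message, request_id):
        super().__init__(f"[{code}] {message} (requestId={request_id})")
        self.code = code
        self.request_id = request_id

    @property
    def retryable(self) -> bool:
        # Assumption: rate-limit and service errors may succeed on retry.
        return self.code in (10429, 10500)


def check_response(body: dict) -> dict:
    """Return the data field on success; raise LangboatAPIError otherwise."""
    if body.get("code") != 0:
        raise LangboatAPIError(
            body.get("code"), body.get("message"), body.get("requestId")
        )
    return body.get("data", {})
```

Keeping the requestId on the exception makes it easy to quote when reporting a failed call.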

Sample Code

Java (JDK-11)

import cn.hutool.core.util.URLUtil;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.io.*;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;
import java.security.MessageDigest;
import java.text.SimpleDateFormat;
import java.util.*;
public class LangboatOpenClient {
    private final String accessKey;
    private final String accessSecret;
    private final String url;
    private static final Base64.Encoder ENCODER_64 = Base64.getEncoder();

    public LangboatOpenClient(String accessKey, String accessSecret) {
        this(accessKey, accessSecret, "https://open.langboat.com");
    }

    public LangboatOpenClient(String accessKey, String accessSecret, String url) {
        this.accessKey = accessKey;
        this.accessSecret = accessSecret;
        this.url = url;
    }

    /*
     * Compute the MD5 digest and Base64-encode it.
     */
    private String MD5Base64(String s) {
        if (s == null)
            return null;
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            // Use UTF-8 explicitly instead of the platform default charset.
            byte[] md5Bytes = md.digest(s.getBytes(java.nio.charset.StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(md5Bytes);
        } catch (Exception e) {
            throw new Error("Failed to generate MD5 : " + e.getMessage());
        }
    }

    /*
     * Compute HMAC-SHA256 and Base64-encode the result.
     */
    private String HMACSha256Base64(String data, String key) {
        try {
            // Use UTF-8 explicitly instead of the platform default charset.
            java.nio.charset.Charset utf8 = java.nio.charset.StandardCharsets.UTF_8;
            SecretKeySpec signingKey = new SecretKeySpec(key.getBytes(utf8), "HmacSHA256");
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(signingKey);
            byte[] rawHmac = mac.doFinal(data.getBytes(utf8));
            return Base64.getEncoder().encodeToString(rawHmac);
        } catch (Exception e) {
            throw new Error("Failed to generate HMAC : " + e.getMessage());
        }
    }

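    /*
     * Format a date as an HTTP date in GMT, e.g. "Wed, 20 Jul 2022 13:04:02 GMT".
     */
    private String toGMTString(Date date) {
        // Locale.US yields English day and month names; Locale.CHINA would emit Chinese ones.
        SimpleDateFormat df = new SimpleDateFormat("E, dd MMM yyyy HH:mm:ss z", Locale.US);
        df.setTimeZone(new java.util.SimpleTimeZone(0, "GMT"));
        return df.format(date);
    }
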
    public Object inference(Map<String, String> queries, Map<String, Object> data) {
        PrintWriter out = null;
        BufferedReader in = null;
        StringBuilder result = new StringBuilder();
        java.nio.charset.Charset utf8 = java.nio.charset.StandardCharsets.UTF_8;
        try {
            StringBuilder queriesStr = new StringBuilder();
            queries.forEach((k, v) -> queriesStr.append("&").append(k).append("=").append(URLUtil.encode(v)));
            if (queriesStr.length() > 0) {
                queriesStr.setCharAt(0, '?');
            }
            URL openUrl = new URL(this.url + queriesStr);
            String body = new ObjectMapper().writeValueAsString(data);
            String method = "POST";
            String accept = "application/json";
            String contentType = "application/json";
            String date = toGMTString(new Date());
            // 1. Compute the Base64-encoded MD5 digest of the body
            String bodyMd5 = MD5Base64(body);
            String nonce = "" + (int) (Math.random() * 65535);
            String headerToSign = method + "\n" + accept + "\n" + bodyMd5 + "\n"
                    + contentType + "\n" + date + "\n"
                    + "HMAC-SHA256\n"
                    + nonce + "\n";
            // 2. Build queryToSign
            List<String> queriesList = new ArrayList<>();
            queries.forEach((k, v) -> queriesList.add(k + "=" + v));
            Collections.sort(queriesList);
            String queryToSign = String.join("&", queriesList);
            // 3. Build stringToSign
            String stringToSign = headerToSign + queryToSign;
            // 4. Compute HMAC-SHA256 and Base64-encode it
            String signature = HMACSha256Base64(stringToSign, this.accessSecret);
            // 5. Build the Authorization header value
            String authorization = this.accessKey + ":" + signature;
            URLConnection conn = openUrl.openConnection();
            conn.setRequestProperty("Accept", accept);
            conn.setRequestProperty("Content-Type", contentType);
            conn.setRequestProperty("Content-MD5", bodyMd5);
            conn.setRequestProperty("Date", date);
            conn.setRequestProperty("Authorization", authorization);
            conn.setRequestProperty("x-langboat-signature-nonce", nonce);
            conn.setRequestProperty("x-langboat-signature-method", "HMAC-SHA256");
            // Enabling output turns the request into a POST
            conn.setDoOutput(true);
            conn.setDoInput(true);
            // Write the body as UTF-8 rather than in the platform default charset
            out = new PrintWriter(new OutputStreamWriter(conn.getOutputStream(), utf8));
            // Send the request body
            out.print(body);
            // Flush the output stream buffer
            out.flush();
            // Read the response; error responses arrive on the error stream
            InputStream is;
            HttpURLConnection httpConn = (HttpURLConnection) conn;
            if (httpConn.getResponseCode() == 200) {
                is = httpConn.getInputStream();
            } else {
                is = httpConn.getErrorStream();
            }
            in = new BufferedReader(new InputStreamReader(is, utf8));
            String line;
            while ((line = in.readLine()) != null) {
                result.append(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (out != null) {
                    out.close();
                }
                if (in != null) {
                    in.close();
                }
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
        return result.toString();
    }

    /**
     * Convert a PDF file to a Base64 string.
     * @param file the file to encode
     * @return the Base64 string
     */
    public String fileToBase64Str(File file) throws IOException {
        // Read the whole file; I/O errors propagate to the caller instead of being swallowed.
        byte[] bytes = java.nio.file.Files.readAllBytes(file.toPath());
        return ENCODER_64.encodeToString(bytes);
    }

    public static void main(String[] args) throws JsonProcessingException {
        LangboatOpenClient client = new LangboatOpenClient("<Your Access Key>", "<Your Access Secret>");
        // Contract information extraction
        File filePath = new File("/Users/admin/Downloads/test.pdf");
        String pdfBase64;
        try {
            pdfBase64 = client.fileToBase64Str(filePath);
        } catch (IOException e) {
            // Report the failure instead of sending an empty document
            System.err.println("Failed to read PDF: " + e.getMessage());
            return;
        }
        Map<String, String> queries = Map.of(
                "action", "contractExtraction"
        );
        Map<String, Object> data = Map.of(
                "pdfBase64", pdfBase64
        );
        Object o = client.inference(queries, data);
        System.out.println(o);
    }
}

Python (>=3.6)

# -*- coding: utf-8 -*-
import base64
import datetime
import hashlib
import hmac
import json
import random
from typing import Tuple

import requests


class LangboatOpenClient:
    """Langboat open platform client."""

    def __init__(self,
                 access_key: str,
                 access_secret: str,
                 url: str = "https://open.langboat.com"):
        self.access_key = access_key
        self.access_secret = access_secret
        self.url = url

    def _build_header(self, query: dict, body: bytes) -> dict:
        accept = "application/json"
        # 1. Base64-encoded MD5 digest of the body
        content_md5 = base64.b64encode(hashlib.md5(body).digest()).decode()
        content_type = "application/json"
        gmt_format = '%a, %d %b %Y %H:%M:%S GMT'
        date = datetime.datetime.now(datetime.timezone.utc).strftime(gmt_format)
        signature_method = "HMAC-SHA256"
        signature_nonce = str(random.randint(0, 65535))
        header_string = f"POST\n{accept}\n{content_md5}\n{content_type}\n" \
                        f"{date}\n{signature_method}\n{signature_nonce}\n"
        # 2. Build queryToSign: parameters sorted by key
        queries_str = []
        for k, v in sorted(query.items(), key=lambda item: item[0]):
            if isinstance(v, list):
                for i in v:
                    queries_str.append(f"{k}={i}")
            else:
                queries_str.append(f"{k}={v}")
        queries_string = '&'.join(queries_str)
        # 3. Build stringToSign
        sign_string = header_string + queries_string
        # 4. Compute HMAC-SHA256 and Base64-encode it to get the signature
        secret_bytes = self.access_secret.encode("utf-8")
        signature = base64.b64encode(
            hmac.new(secret_bytes, sign_string.encode("utf-8"), hashlib.sha256).digest()
        ).decode()
        # 5. Assemble the request headers
        return {
            "Content-Type": content_type,
            "Content-MD5": content_md5,
            "Date": date,
            "Accept": accept,
            "X-Langboat-Signature-Method": signature_method,
            "X-Langboat-Signature-Nonce": signature_nonce,
            "Authorization": f"{self.access_key}:{signature}"
        }

    def inference(self, queries: dict, data: dict) -> Tuple[int, dict]:
        """
        Call the service.
        :param queries: query parameters
        :param data: request body data
        :return: response status, response body parsed as JSON
        """
        # Serialize once so the signed bytes are exactly the bytes that are sent.
        body = json.dumps(data).encode("utf-8")
        headers = self._build_header(queries, body)
        response = requests.post(url=self.url, headers=headers, params=queries, data=body)
        return response.status_code, response.json()


if __name__ == '__main__':
    _access_key = '<Your access_key>'
    _access_secret = '<Your access_secret>'
    client = LangboatOpenClient(
        access_key=_access_key,
        access_secret=_access_secret
    )
    # Contract information extraction
    _queries = {
        "action": "contractExtraction",
    }
    path = "/Users/admin/Downloads/test.pdf"
    with open(path, "rb") as pdf_file:
        pdf_base64 = base64.b64encode(pdf_file.read()).decode("ascii")
    _data = {
        "pdfBase64": pdf_base64,
    }
    status_code, result = client.inference(_queries, _data)
    print("response status:", status_code)
    print("response json:", json.dumps(result, ensure_ascii=False, indent=2))

Go (>=1.14)

package main
import (
"crypto/hmac"
"crypto/md5"
"crypto/sha256"
"encoding/base64"
"encoding/json"
"fmt"
"io/ioutil"
"log"
"math/rand"
"net/http"
"net/url"
"os"
"sort"
"strings"
"time"
)
func main() {
	client := OpenClient{
		baseURL:      "https://open.langboat.com",
		accessKey:    "Your_Access_Key",
		accessSecret: "Your_Access_Secret",
	}
	file, err := os.Open("/Users/admin/Downloads/test.pdf")
	if err != nil {
		log.Fatal("fail to open file: ", err)
	}
	fileContent, err := ioutil.ReadAll(file)
	file.Close()
	if err != nil {
		log.Fatal("fail to read file: ", err)
	}
	pdfBase64 := base64.StdEncoding.EncodeToString(fileContent)
	// Contract information extraction
	queries := map[string]string{
		"action": "contractExtraction",
	}
	data := map[string]interface{}{
		"pdfBase64": pdfBase64,
	}
	resp, err := client.Inference(queries, data)
	if err != nil {
		log.Fatal("request failed: ", err)
	}
	body, err := ioutil.ReadAll(resp.Body)
	resp.Body.Close()
	if err != nil {
		log.Fatal("fail to read response body: ", err)
	}
	log.Println(string(body))
}

type OpenClient struct {
	baseURL      string
	accessKey    string
	accessSecret string
}

// Inference calls the service. queries: query parameters; data: request body data.
// It returns an error instead of a nil response when the request fails.
func (c *OpenClient) Inference(queries map[string]string, data map[string]interface{}) (*http.Response, error) {
	queriesStr := ""
	for k, v := range queries {
		sep := "&"
		if queriesStr == "" {
			sep = "?"
		}
		queriesStr += sep + k + "=" + url.QueryEscape(v)
	}
	dataJson, err := json.Marshal(data)
	if err != nil {
		return nil, err
	}
	targetURL := c.baseURL + queriesStr
	client := &http.Client{
		Timeout: 15 * time.Second,
	}
	// Build the header values
	payload := strings.NewReader(string(dataJson))
	date := time.Now().UTC().Format(http.TimeFormat)
	// Seed from the clock: before Go 1.20 the global source repeats the same sequence on every run.
	rng := rand.New(rand.NewSource(time.Now().UnixNano()))
	nonce := fmt.Sprint(10000 + rng.Intn(89999))
	// Sign
	signParam := SignParam{
		Body:    string(dataJson),
		Query:   strings.TrimPrefix(queriesStr, "?"),
		DateGMT: date,
		Nonce:   nonce,
	}
	contentMD5, signature := GenSignature(signParam, c.accessSecret)
	// Set the headers
	headers := map[string]string{
		"Authorization":               c.accessKey + ":" + signature,
		"Content-Type":                "application/json",
		"Accept":                      "application/json",
		"Date":                        date,
		"Content-MD5":                 contentMD5,
		"x-langboat-signature-method": "HMAC-SHA256",
		"x-langboat-signature-nonce":  nonce,
	}
	req, err := http.NewRequest("POST", targetURL, payload)
	if err != nil {
		return nil, err
	}
	for k, v := range headers {
		req.Header.Add(k, v)
	}
	return client.Do(req)
}

// SignParam holds the inputs needed to generate a signature.
type SignParam struct {
	Body    string // request body
	Query   string // raw query string
	DateGMT string // time in GMT format
	Nonce   string // random number
}

func getMD5(str string) []byte {
	h := md5.New()
	h.Write([]byte(str))
	return h.Sum(nil)
}

func hmacSha256(data string, secret string) []byte {
	h := hmac.New(sha256.New, []byte(secret))
	h.Write([]byte(data))
	return h.Sum(nil)
}

// resortQuery re-encodes the query with its keys in lexicographic order.
func resortQuery(src string) string {
	queries, _ := url.ParseQuery(src)
	keys := make([]string, 0)
	for k := range queries {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	newQuery := url.Values{}
	for _, k := range keys {
		for _, value := range queries[k] {
			newQuery.Add(k, value)
		}
	}
	return newQuery.Encode()
}

// GenSignature generates the signature; it returns Content-MD5 and the signature.
func GenSignature(src SignParam, accessSecret string) (string, string) {
	// MD5 digest of the body
	md5Bytes := getMD5(src.Body)
	// Base64-encode it to obtain Content-MD5
	contentMD5 := base64.StdEncoding.EncodeToString(md5Bytes)
	// Parse the query and re-sort it in lexicographic order
	query := resortQuery(src.Query)
	query, _ = url.QueryUnescape(query)
	// String to sign: one header value per line, followed by the sorted query
	stringToSign := strings.Join([]string{
		"POST",
		"application/json",
		contentMD5,
		"application/json",
		src.DateGMT,
		"HMAC-SHA256",
		src.Nonce,
		query,
	}, "\n")
	hmac256 := hmacSha256(stringToSign, accessSecret)
	signature := base64.StdEncoding.EncodeToString(hmac256)
	return contentMD5, signature
}
