java 库,用于从文件内容中查找哑剧类型
我正在搜索一个java库,它通过查看文件内容(字节数组)来告诉您哑剧类型。我发现这个项目使用jmimemagic,它不再支持较新的文件类型(例如。MS word docx格式),因为它现在处于非活动状态(从2006年开始)。
我正在搜索一个java库,它通过查看文件内容(字节数组)来告诉您哑剧类型。我发现这个项目使用jmimemagic,它不再支持较新的文件类型(例如。MS word docx格式),因为它现在处于非活动状态(从2006年开始)。
也许对需要最常用的办公格式的人有用(并且不使用Apache Tika):
public class MimeTypeUtils {
private static final Map<String, String> fileExtensionMap;
static {
fileExtensionMap = new HashMap<String, String>();
// MS Office
fileExtensionMap.put("doc", "application/msword");
fileExtensionMap.put("dot", "application/msword");
fileExtensionMap.put("docx", "application/vnd.openxmlformats-officedocument.wordprocessingml.document");
fileExtensionMap.put("dotx", "application/vnd.openxmlformats-officedocument.wordprocessingml.template");
fileExtensionMap.put("docm", "application/vnd.ms-word.document.macroEnabled.12");
fileExtensionMap.put("dotm", "application/vnd.ms-word.template.macroEnabled.12");
fileExtensionMap.put("xls", "application/vnd.ms-excel");
fileExtensionMap.put("xlt", "application/vnd.ms-excel");
fileExtensionMap.put("xla", "application/vnd.ms-excel");
fileExtensionMap.put("xlsx", "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet");
fileExtensionMap.put("xltx", "application/vnd.openxmlformats-officedocument.spreadsheetml.template");
fileExtensionMap.put("xlsm", "application/vnd.ms-excel.sheet.macroEnabled.12");
fileExtensionMap.put("xltm", "application/vnd.ms-excel.template.macroEnabled.12");
fileExtensionMap.put("xlam", "application/vnd.ms-excel.addin.macroEnabled.12");
fileExtensionMap.put("xlsb", "application/vnd.ms-excel.sheet.binary.macroEnabled.12");
fileExtensionMap.put("ppt", "application/vnd.ms-powerpoint");
fileExtensionMap.put("pot", "application/vnd.ms-powerpoint");
fileExtensionMap.put("pps", "application/vnd.ms-powerpoint");
fileExtensionMap.put("ppa", "application/vnd.ms-powerpoint");
fileExtensionMap.put("pptx", "application/vnd.openxmlformats-officedocument.presentationml.presentation");
fileExtensionMap.put("potx", "application/vnd.openxmlformats-officedocument.presentationml.template");
fileExtensionMap.put("ppsx", "application/vnd.openxmlformats-officedocument.presentationml.slideshow");
fileExtensionMap.put("ppam", "application/vnd.ms-powerpoint.addin.macroEnabled.12");
fileExtensionMap.put("pptm", "application/vnd.ms-powerpoint.presentation.macroEnabled.12");
fileExtensionMap.put("potm", "application/vnd.ms-powerpoint.presentation.macroEnabled.12");
fileExtensionMap.put("ppsm", "application/vnd.ms-powerpoint.slideshow.macroEnabled.12");
// Open Office
fileExtensionMap.put("odt", "application/vnd.oasis.opendocument.text");
fileExtensionMap.put("ott", "application/vnd.oasis.opendocument.text-template");
fileExtensionMap.put("oth", "application/vnd.oasis.opendocument.text-web");
fileExtensionMap.put("odm", "application/vnd.oasis.opendocument.text-master");
fileExtensionMap.put("odg", "application/vnd.oasis.opendocument.graphics");
fileExtensionMap.put("otg", "application/vnd.oasis.opendocument.graphics-template");
fileExtensionMap.put("odp", "application/vnd.oasis.opendocument.presentation");
fileExtensionMap.put("otp", "application/vnd.oasis.opendocument.presentation-template");
fileExtensionMap.put("ods", "application/vnd.oasis.opendocument.spreadsheet");
fileExtensionMap.put("ots", "application/vnd.oasis.opendocument.spreadsheet-template");
fileExtensionMap.put("odc", "application/vnd.oasis.opendocument.chart");
fileExtensionMap.put("odf", "application/vnd.oasis.opendocument.formula");
fileExtensionMap.put("odb", "application/vnd.oasis.opendocument.database");
fileExtensionMap.put("odi", "application/vnd.oasis.opendocument.image");
fileExtensionMap.put("oxt", "application/vnd.openofficeorg.extension");
}
public static String getContentTypeByFileName(String fileName) {
// 1. first use java's buildin utils
FileNameMap mimeTypes = URLConnection.getFileNameMap();
String contentType = mimeTypes.getContentTypeFor(fileName);
// 2. nothing found -> lookup our in extension map to find types like ".doc" or ".docx"
if (!StringUtils.hasText(contentType)) {
String extension = FilenameUtils.getExtension(fileName);
contentType = fileExtensionMap.get(extension);
}
return contentType;
}
}
使用 Apache tika 进行内容检测。请找到下面的链接。http://tika.apache.org/0.8/detection.html。我们有很多jar依赖项,当您使用maven构建tika时,您可以找到它们
ByteArrayInputStream bai = new ByteArrayInputStream(pByte);
ContentHandler contenthandler = new BodyContentHandler();
Metadata metadata = new Metadata();
Parser parser = new AutoDetectParser();
try {
parser.parse(bai, contenthandler, metadata);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (SAXException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (TikaException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
System.out.println("Mime: " + metadata.get(Metadata.CONTENT_TYPE));
return metadata.get(Metadata.CONTENT_TYPE);