- Status Closed
- Percent Complete
- Task Type Feature Request
- Category QCAD/CAM
- Assigned To Andrew
- Operating System All
- Severity Low
- Priority Very Low
- Reported Version 3.4.5
- Due in Version Undecided
- Due Date Undecided
- Votes
- Private
Opened by Tamas TEVESZ - 31.12.2013
Last edited by Andrew - 21.01.2014
FS#1003 - QCAD Community Edition: add unicode support for layer names, block names
Community version (still on FBSD only), so dxflib. I suspect the culprit is dxflib.
I have created layers and blocks and whatnot with (Hungarian) accented characters in their names.
Apparently (according to the Internet, as evidenced probably most glaringly by usa.autodesk.com/adsk/servlet/ps/dl/item?siteID=123112&id=7586582&linkID=9240617) the R15 DXF version assumes that single-byte character sets are used. A quick grep of the DXF2000 reference turns up “String (255-character maximum; less for Unicode strings)” (Group Code Value types), so this may be a false track...
Anyway, the DXF file written does have strings converted to single-byte encoding, but it seems it’s always ANSI-1252. When the output encoder encounters a character that is not representable in this one, it will use a literal question mark.
Actual case, I have a block with the name
106 egypólusú váltókapcsoló jelzőfénnyel
Of this, “ő” (U+0151) is not representable in ANSI-1252, so what gets written to the dxf is (non-ASCII shown in hex)
106 egyp<f3>lus<fa> v<e1>lt<f3>kapcsol<f3> jelz?f<e9>nnyel
Note the literal question mark.
The problem is that this is an irreversible operation, yet the result is perfectly valid ANSI-1252, so upon opening the file again I get a block named
106 egypólusú váltókapcsoló jelz?fénnyel
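The lossy round trip described above can be reproduced outside QCAD. The following is a minimal sketch using Python's built-in `cp1252` codec as a stand-in for the DXF writer's encoder; it is an illustration of the behaviour, not the code QCAD or dxflib actually runs.

```python
# Block name from the report; "ő" is U+0151 and has no slot in Windows-1252.
name = "106 egypólusú váltókapcsoló jelzőfénnyel"

# errors="replace" substitutes "?" for anything the code page cannot represent,
# which mirrors what ends up in the DXF file.
encoded = name.encode("cp1252", errors="replace")

# The bytes are valid Windows-1252, so decoding succeeds without complaint.
decoded = encoded.decode("cp1252")

print(encoded)          # b'106 egyp\xf3lus\xfa v\xe1lt\xf3kapcsol\xf3 jelz?f\xe9nnyel'
print(decoded)          # 106 egypólusú váltókapcsoló jelz?fénnyel
print(decoded == name)  # False
```

Nothing in the decoded string distinguishes the substituted question mark from one the user typed, which is why the damage cannot be undone on load.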
IMHO the ideal resolution is to
- Have a preference for the export code page (and use it, too, circumstances permitting)
- Iff this is not set (or set to a default “Use system locale to determine” or something), use a look-up table to take a good guess (like old QCAD2 qcadlib/src/engine/rs_system.cpp:QCString RS_System::localeToISO())
- If the output encoder encounters a character that is not representable in the target code page, throw an error with options to ignore it (and keep using question marks, but then the user has acknowledged this, so it is no longer a silent problem), pick a new output code page, or whatever else
This all assumes that R15 doesn’t actually depend strictly on ANSI-1252 and ANSI-1252 only. If it does, option #3 would still be nice.
Most Western European languages (and English) are not affected by this as ANSI-1252 has most of them covered, but a little to the east, a little to the south, a little to the north, and it does make a bit of a difference :)
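The three-step resolution proposed above could look roughly like the sketch below. Everything named here is hypothetical: `LOCALE_TO_CODEPAGE` is a tiny made-up table (not the contents of `RS_System::localeToISO()`), and `pick_codepage` and `encode_name` are illustrative helpers, not QCAD or dxflib functions.

```python
# Hypothetical lookup table: language prefix of the locale -> Python codec name.
LOCALE_TO_CODEPAGE = {
    "hu": "cp1250",  # Central European
    "pl": "cp1250",
    "cs": "cp1250",
    "ru": "cp1251",  # Cyrillic
    "el": "cp1253",  # Greek
    "tr": "cp1254",  # Turkish
}
DEFAULT_CODEPAGE = "cp1252"  # Western European fallback


def pick_codepage(preference=None, locale_name=None):
    """Step 1: honour an explicit preference. Step 2: guess from the locale."""
    if preference:
        return preference
    lang = (locale_name or "").split("_")[0].lower()
    return LOCALE_TO_CODEPAGE.get(lang, DEFAULT_CODEPAGE)


def encode_name(name, codepage, allow_lossy=False):
    """Step 3: fail loudly on unrepresentable characters unless the caller
    has explicitly accepted question-mark substitution."""
    try:
        return name.encode(codepage)
    except UnicodeEncodeError:
        if not allow_lossy:
            raise
        return name.encode(codepage, errors="replace")


name = "106 egypólusú váltókapcsoló jelzőfénnyel"
codepage = pick_codepage(locale_name="hu_HU")
roundtrip = encode_name(name, codepage).decode(codepage)
print(codepage, roundtrip == name)
```

With a Hungarian locale the guess lands on `cp1250`, which does contain "ő", so the name survives the round trip. With `cp1252` the same call raises `UnicodeEncodeError` unless `allow_lossy=True` is passed, which is the "acknowledged by the user" case.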
21.01.2014 16:44
Reason for closing: Implemented
Additional comments about closing:
https://github.com/qcad/qcad/commit/c4573ae3ab2bc33620ce133f48dc9d321e7593cc
Hm, upon closer look, the DXF reference (both for R14 and for R19, have not looked at others) says this about $DWGCODEPAGE:
I couldn't quickly find any more elaborate reference to strings. To me, this suggests that AutoCAD (and, perhaps a too quick conclusion, consequently other CAD software) doesn't care about strings too much, apart from displaying them.
(This also suggests that in the end this isn't going to be a dxflib problem but a QCAD problem, as for dxflib strings are just an arbitrary stream of arbitrary bits, and it's the application that needs to make some sort of sense of these bits.)
I see three possible courses of action, from which the user should be able to choose (per-drawing or application-wide I am not sure):
Phew.
The QCAD Community Edition does not support use of non-ASCII characters for layer and block names at this point. I've changed this into a feature request since this is a known limitation of the community edition.
While these things are typically not documented by Autodesk, it seems that code page 'ANSI_1252' means 'Latin1' for DXF R15. As newer DXF versions were released, it changed its meaning at one point to 'UTF-8'.
Since the QCAD Community Edition has no support for newer DXF format versions, this means that all non-ASCII text has to be escaped (\U+xxxx).
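As an illustration of the `\U+xxxx` escaping mentioned above, here is a minimal sketch. The helper names `escape_dxf` and `unescape_dxf` are hypothetical, this is not the code from the linked commit, and it assumes four uppercase hex digits per escape, so it only covers characters in the Basic Multilingual Plane.

```python
import re


def escape_dxf(text):
    """Replace every non-ASCII character with a \\U+XXXX escape (BMP only)."""
    return "".join(
        ch if ord(ch) < 128 else "\\U+%04X" % ord(ch)
        for ch in text
    )


def unescape_dxf(text):
    """Turn \\U+XXXX escapes back into the characters they stand for."""
    return re.sub(
        r"\\U\+([0-9A-Fa-f]{4})",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )


name = "jelzőfénnyel"
escaped = escape_dxf(name)
print(escaped)                        # jelz\U+0151f\U+00E9nnyel
print(unescape_dxf(escaped) == name)  # True
```

Because the escaped string is pure ASCII, it no longer depends on which code page the writer or reader assumes.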