by
Quan
• New Contributor III
- 10195 Views
- 9 replies
- 6 kudos
Hello all, I'm trying to pull table data from Databricks tables that contain foreign-language characters in UTF-8 into an ETL tool over a JDBC connection. I'm using the latest Simba Spark JDBC driver available from the Databricks website. The issue i...
Latest Reply
Can you try setting UseUnicodeSqlCharacterTypes=1 in the driver, and also make sure file.encoding is set to UTF-8 in the JVM, and see if the issue still persists?
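A minimal sketch of how those two settings fit together, assuming a Simba Spark JDBC connection string. The host, HTTP path, and the helper name `build_jdbc_url` are placeholders for illustration, not real endpoints; the JVM-side `file.encoding` flag is noted in the comments.

```python
# Hypothetical sketch: assembling a Simba Spark JDBC URL that includes the
# Unicode-related driver option suggested above. Host and httpPath values
# are placeholders. The JVM side is a separate flag, e.g. start the JVM
# with -Dfile.encoding=UTF-8 so string decoding also uses UTF-8.

def build_jdbc_url(host, http_path, extra=None):
    """Build a Databricks JDBC URL with UTF-8-friendly driver options."""
    props = {
        "transportMode": "http",
        "ssl": "1",
        "httpPath": http_path,
        "AuthMech": "3",
        # Ask the driver to report string columns as Unicode SQL types
        # (SQL_WVARCHAR etc.) so the ETL tool preserves non-ASCII text.
        "UseUnicodeSqlCharacterTypes": "1",
    }
    if extra:
        props.update(extra)
    opts = ";".join(f"{k}={v}" for k, v in sorted(props.items()))
    return f"jdbc:spark://{host}:443/default;{opts}"

url = build_jdbc_url("example.cloud.databricks.com",
                     "sql/protocolv1/o/0/0000-000000-abc123")
```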
8 More Replies
- 857 Views
- 3 replies
- 0 kudos
Hi Team, I was wondering if there is a document or step-by-step process for promoting code in CI/CD across the various environments of a code repository (Git/GitHub/Bitbucket/GitLab) with Databricks support? [Without involving the code repository's merging capability of the ...
Latest Reply
Please refer to this related thread on CI/CD in Databricks: https://community.databricks.com/s/question/0D53f00001GHVhMCAX/what-are-some-best-practices-for-cicd
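One merge-free promotion pattern worth sketching: each workspace (dev/stage/prod) checks out the same Git repo as a Databricks Repo, and a CI job points each environment's repo at a release branch or tag via the Repos API (`PATCH /api/2.0/repos/{repo_id}`). The repo IDs and release name below are placeholders; this only builds the request payloads, it does not call any workspace.

```python
# Hypothetical sketch of merge-free promotion: promote one release branch
# through each environment by updating that environment's Databricks Repo.
# Repo IDs (101/202/303) and the branch name are placeholders.

def promotion_request(repo_id, release_branch):
    """Build the Repos API call that points a workspace repo at a release branch."""
    return {
        "method": "PATCH",
        "path": f"/api/2.0/repos/{repo_id}",
        "json": {"branch": release_branch},
    }

# Promote the same release through dev, stage, and prod in order.
plan = [promotion_request(rid, "release-2024.06") for rid in (101, 202, 303)]
```

Each environment then runs the exact same commit, so promotion is "move the pointer" rather than "merge between branches".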
2 More Replies
- 2241 Views
- 6 replies
- 4 kudos
Hello,
As you can see from the link below, Auto Loader supports 7 file formats. I am dealing with geospatial shapefiles, and I want to know whether Auto Loader can support shapefiles. Any help on this is greatly appreciated.
Thanks.
https://docs.microsoft.com/...
Latest Reply
You could try to use the binary file type. But the disadvantage of this is that the content of the shapefiles will be put into a single column, which might not be what you want. If you absolutely want to use Auto Loader, maybe some thinking outside the b...
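A minimal sketch of the suggestion above, assuming standard Auto Loader options: ingest shapefiles as opaque binary blobs via `cloudFiles.format = binaryFile`. The landing path and glob are placeholders; only the options dict runs here, the commented Spark call would run on a cluster.

```python
# Hedged sketch: read shapefiles with Auto Loader as binary blobs.
# Each file lands as one row with its bytes in a 'content' column;
# parsing the shapefile itself is then up to downstream code.

def autoloader_binary_options(path_glob="*.shp"):
    """Auto Loader options for ingesting arbitrary files as binary."""
    return {
        "cloudFiles.format": "binaryFile",  # whole file -> 'content' column
        "pathGlobFilter": path_glob,        # only pick up .shp files
    }

# On a cluster, roughly (not executed here; path is a placeholder):
# df = (spark.readStream.format("cloudFiles")
#         .options(**autoloader_binary_options())
#         .load("/mnt/landing/shapes/"))

opts = autoloader_binary_options()
```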
5 More Replies
- 667 Views
- 2 replies
- 0 kudos
Hello - I've been using the Databricks notebook (for PySpark or Scala/Spark development), and recently have had issues where cluster creation takes a long time, often timing out. Any ideas on how to resolve this?
Latest Reply
Hi Karankaran.alang, what error message are you getting? Did you get this error while creating/starting a cluster on Community Edition (CE)? Sometimes these errors are intermittent and go away after a few retries. Thank you.
1 More Replies
- 1992 Views
- 3 replies
- 1 kudos
Hi, I have set up a streaming process that consumes files from an HDFS staging directory and writes them to a target location. The input directory continuously receives files from another process. Let's say the file producer produces 5 million records and sends them to the HDFS sta...
Latest Reply
If it helps, you can try running a left-anti join on the source and sink to identify missing records, and then check whether each record matches the provided schema.
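A toy illustration of that left-anti check: rows present in the source but absent from the sink are the ones that went missing. With real data this would be a Spark `left_anti` join (e.g. `source_df.join(sink_df, "id", "left_anti")`); here the same logic is shown on plain Python dicts keyed by a record ID, with sample data invented for the example.

```python
# Toy left-anti join: keep source rows whose key never reached the sink.

def missing_records(source, sink, key="id"):
    """Return source rows whose key is absent from the sink (left anti)."""
    sink_keys = {row[key] for row in sink}
    return [row for row in source if row[key] not in sink_keys]

source = [{"id": 1}, {"id": 2}, {"id": 3}]
sink = [{"id": 1}, {"id": 3}]
lost = missing_records(source, sink)   # the row with id=2 never arrived
```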
2 More Replies
- 1439 Views
- 2 replies
- 1 kudos
(since Spark 3.0) Dataset.queryExecution.debug.toFile will dump the full plan to a file, without concatenating the output as a fully materialized Java string in memory.
Latest Reply
Notebooks really aren't the best method of viewing large files. Two methods you could employ are: save the file to DBFS and then use the Databricks CLI to download it, or use the web terminal. With the web terminal option you can do something like "cat my_lar...
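In the same spirit as piping the file through `head` in the web terminal, a small stand-alone helper can stream just the first few lines of a large plan dump instead of loading the whole file into a notebook cell. The demo file written below is invented for the example.

```python
# Preview a large file by streaming only its first n lines.
import tempfile

def head(path, n=10):
    """Return the first n lines of a file without reading it fully."""
    lines = []
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        for i, line in enumerate(f):
            if i >= n:
                break
            lines.append(line.rstrip("\n"))
    return lines

# Demo on a throwaway file standing in for a large plan dump.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as f:
    f.write("\n".join(f"line {i}" for i in range(1000)))
    demo_path = f.name

first = head(demo_path, 3)
```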
1 More Replies
- 556 Views
- 1 replies
- 0 kudos
Hello,
As you can see from the link below, Auto Loader supports 7 file formats. I am dealing with geospatial shapefiles, and I want to know whether Auto Loader can support shapefiles. Any help on this is greatly appreciated.
avro: Avro file; binaryFile: Binary f...
Latest Reply
Hi @Yuv Saha​, currently, shapefiles are not a supported file type when using Auto Loader. Would you be willing to share more about your use case? I am the Product Manager responsible for geospatial at Databricks, and I need help from customers like ...
- 4109 Views
- 5 replies
- 1 kudos
On a regular cluster, you can use: ```spark.sparkContext._jsc.hadoopConfiguration().set(key, value)``` These values are then available on the executors through the Hadoop configuration. However, on a high-concurrency cluster, attempting to do so results ...
Latest Reply
I am not sure why you are getting that error on a high-concurrency cluster, as I am able to set the configuration as you show above. Can you try the following code instead? sc._jsc.hadoopConfiguration().set(key, value)
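If runtime mutation stays blocked, a common alternative is to set the values in the cluster's Spark config using the documented `spark.hadoop.` prefix, which Spark copies into the Hadoop Configuration on every node at startup. A minimal sketch of that mapping, with a placeholder key:

```python
# Hedged sketch: express Hadoop settings as spark.hadoop.-prefixed Spark
# conf entries, suitable for pasting into the cluster's Spark config UI
# or passing to SparkConf. The fs.s3a.endpoint value is a placeholder.

def as_spark_conf(hadoop_settings):
    """Map Hadoop config keys to their spark.hadoop.-prefixed form."""
    return {f"spark.hadoop.{k}": v for k, v in hadoop_settings.items()}

conf = as_spark_conf({"fs.s3a.endpoint": "s3.example.com"})
```

Because these entries are applied at cluster start, they reach executors without any runtime `hadoopConfiguration().set(...)` call.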
4 More Replies