Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
I have a PySpark DataFrame that I'm writing to an on-prem MSSQL server; it's a stopgap while we convert data warehousing jobs over to Databricks. The processes that use those tables in the on-prem server rely on the tables maintaining the identical s...
I am trying to read data into a DataFrame from Azure SQL DB using JDBC. Here is the code I am using:
driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
database_host = "server.database.windows.net"
database_port = "1433"
database_name = "dat...
Hi everyone, I've been stuck for the past two days on this issue with my Databricks JDBC driver and I'm hoping someone can give me more insight into how to troubleshoot. I am using the Databricks JDBC driver in RStudio and the connection was working ...
@Debbie Ng From your message, I see that this failure started after a Windows update. Based on the conversation, you tried the latest version of the driver and still face the problem. I believe this is related to Java version compatib...
I used code like the following to connect over JDBC to a Databricks default cluster and read a table into a PySpark DataFrame:
url = 'jdbc:databricks://[workspace domain]:443/default;transportMode=http;ssl=1;AuthMech=3;httpPath=[path];AuthMech=3;UID=token;PWD=[your_ac...
@yu zhang: It looks like the issue with the first code snippet you provided is that it does not specify the correct query to retrieve the data from your database. When using the load() method with the JDBC data source, you need to provide a SQL quer...
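As a rough illustration of the point above: Spark's JDBC data source is told what to read through either the `dbtable` option or the `query` option, and the two cannot be combined. The sketch below only builds the option dictionary in plain Python so it runs without a cluster. The helper name `jdbc_read_options`, the placeholder URL, and the table name `some_schema.some_table` are assumptions for illustration and do not come from the original thread.

```python
def jdbc_read_options(url, *, table=None, query=None):
    """Build options for Spark's JDBC reader (hypothetical helper).

    Exactly one of `table` or `query` must be given, mirroring the rule
    that `dbtable` and `query` cannot be set together.
    """
    if (table is None) == (query is None):
        raise ValueError("pass exactly one of table= or query=")
    opts = {"url": url}
    if table is not None:
        opts["dbtable"] = table   # read a whole table (or a subquery with an alias)
    else:
        opts["query"] = query     # read the result of a SELECT statement
    return opts


# Placeholder URL; credentials would normally be added as further options.
url = "jdbc:databricks://<workspace-domain>:443/default"

table_opts = jdbc_read_options(url, table="some_schema.some_table")
query_opts = jdbc_read_options(
    url, query="SELECT * FROM some_schema.some_table WHERE id > 100"
)
```

With a live SparkSession, either dictionary could then be passed along the lines of `spark.read.format("jdbc").options(**table_opts).load()`.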
Hi! I am inserting a PySpark DataFrame into Azure SQL Server and it takes a very long time. The database is an S4, but my DataFrame of 17 million rows and 30 columns takes up to 50 minutes to insert. Is there a way to significantly speed this up? I a...
@Hjalmar Friden: There are several ways to improve the performance of inserting data into Azure SQL Server using the JDBC connector:
Increase the batch size: By default, the JDBC connector sends data in batches of 1000 rows at a time. You can increase th...
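A minimal sketch of the batch-size suggestion, assuming the standard Spark JDBC writer, whose `batchsize` option controls how many rows are sent per round trip (default 1000). The helper name `jdbc_write_options`, the placeholder URL, the table name `dbo.target_table`, and the value 10000 are assumptions for illustration; a suitable batch size depends on the workload and is not stated in the thread.

```python
def jdbc_write_options(url, table, user, password, batch_size=10_000):
    """Build options for Spark's JDBC writer (hypothetical helper)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        # Rows sent per JDBC batch insert; Spark's default is 1000.
        "batchsize": str(batch_size),
    }


write_opts = jdbc_write_options(
    url="jdbc:sqlserver://<server>.database.windows.net:1433;databaseName=<database>",
    table="dbo.target_table",   # hypothetical table name
    user="<user>",
    password="<password>",
)
```

With a live SparkSession and a DataFrame `df`, the options could be applied along the lines of `df.write.format("jdbc").options(**write_opts).mode("append").save()`.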
Hi @Venkata Krishna Jonnalagadda, hope you are well. Just checking in. If @John Lourdu's answer helped, would you let us know and mark the answer as best? If not, would you be happy to give us more information? Thanks!
Using ODBC or JDBC to read from a table fails when I attempt to use an ORDER BY clause. In one sample case, I have a fairly small table (just 1946 rows).
select *
from some_table
order by some_field
Result:
java.lang.IllegalArgumentException: requiremen...
Hi @petter@hightouch.com Petter, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it doe...
We are trying to read a column with an enum array datatype from Postgres as a string datatype in the target. We were able to achieve this by explicitly using the concat function while extracting, as below:
val jdbcDF3 = spark.read .format("jdbc") .option(...
Hi, first of all, thanks for your work on Databricks SQL. Unfortunately, I am having a problem running insert-select statements programmatically using the JDBC driver. They all have the form: `insert into `mytable` select 1, 'foo', moreLiterals` The statem...
I am doing a batch load from a database table using the JDBC driver. I am noticing in the Spark UI that there is both memory and disk spill, but only on one executor. I am also noticing that when trying to use the JDBC parallel read, it seems to run sl...
What are the reasons behind Databricks going for their own driver? What differences are made when switching between the previous Spark driver and the new Databricks driver? Is there any specific document I can look at or just the release notes? Also, w...
Hey @Sriramkumar Thamizharasan, hope all is well! Just wanted to check in: if you were able to resolve your issue, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help. We'd love to hear from...
I'm using the Databricks JDBC driver recently made available via Maven: https://mvnrepository.com/artifact/com.databricks/databricks-jdbc/2.6.25
While trying to create a table with `GENERATED` columns, I receive the following exception:
Caused by: java.s...
I was under the impression that this has been recognised as a bug and is being handled by Databricks. What do I need to do to report the issue officially as a bug?
Hi there! I am using SparkJDBC42.jar in my Java application to access my Delta Lake tables. The connection is made through a Databricks SQL endpoint, where I created a database and stored my Delta tables. I have simple code to open connection...