Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
ModuleNotFoundError: No module named 'com.databricks.spark.xml'
I'm using Azure Databricks, and I've added what I think is the correct library. Status: Installed. Coordinate: com.databricks:spark-xml_2.12:0.13.0
Hi, I have Databricks running on AWS, and I'm looking for a way to know when it is a good time to run OPTIMIZE on partitioned tables. Taking into account that it's an expensive process, especially on big tables, how could I know if it's a good time to run it ...
@Alejandro Martinez​ - If Jose's answer resolved your question, would you be happy to mark his answer as best? That helps other members find the answer more quickly.
Hello there, I currently have the problem of deleted files still being in the transaction log when trying to call a Delta table. What I found was this statement:
%sql
FSCK REPAIR TABLE table_name [DRY RUN]
But using it returned the following error: Error in ...
I am using Databricks Runtime 9.1 LTS ML and I got an error when I tried to import the scikit-learn package. The error message was:
TypeError Traceback (most recent call last)
<command-181041> in <module>
...
@Atanu Sarkar I am using Databricks Runtime 9.1 LTS ML and the Python version is 3.8.10. I am only running import statements:
from sklearn.metrics import *
from sklearn.preprocessing import LabelEncoder
I have a notebook that writes a Delta table with a statement similar to the following:
match = "current.country = updates.country and current.process_date = updates.process_date"
deltaTable = DeltaTable.forPath(spark, silver_path)
deltaTable.alias("cu...
Initially, the affected table only had a date field as its partition, so I repartitioned it by the country and date fields. This new partitioning created the country and date directories; however, the old directories of the date partition remained and were not de...
Hi all, does anyone know how to write a simple SQL query to get all table and column names? In Oracle we do select * from all_tab_columns. Similarly, in SQL Server we do select * from information_schema.columns. Do we have something like this in dat...
To view columns in a table, use SHOW COLUMNS.
%sql
show columns in <schema_name>.<table_name>
To show the columns of all tables in a schema, use the following PySpark code:
%python
schema_name = "default"
tbl_columns = {}
# Get all tables in a schema
tables = spar...
Hi Team, I'm trying to build a real-time solution using Databricks and Event Hubs. Something weird happens some time after the process starts. At the beginning, the messages flow through the process as expected with this rate: please, note that the last ...
Thanks for your answer @Hubert Dudek. It is already specified. What do you mean by this? This is the weird part, because the data is flowing well, but at some point it is as if the job stops reading or something like that, and if I restart the ...
Using the Workspace API you can list all the notebooks for a given user. The API response tells you whether each object under the path is a folder or a notebook. If it's a folder, you can add it to the path and get the notebooks within that folder. Put a...
I was trying to start a Databricks cluster from a Docker image. I followed the setup instructions, excluding the additional setup for the IAM role and instance profile, as I was facing issues. The image is stored on AWS ECR in a public repo...
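The folder-by-folder walk described in the Workspace API answer could look roughly like the sketch below. It assumes the `GET /api/2.0/workspace/list` endpoint and the `object_type` values `NOTEBOOK` and `DIRECTORY` in its response. The helper names `fetch_objects` and `list_notebooks`, and the `host` and `token` parameters, are hypothetical and not taken from the original answer.

```python
import json
import urllib.parse
import urllib.request


def fetch_objects(host, token, path):
    """Call GET /api/2.0/workspace/list for one path and return its objects.

    `host` (workspace URL) and `token` (personal access token) are
    placeholders supplied by the caller.
    """
    query = urllib.parse.urlencode({"path": path})
    request = urllib.request.Request(
        f"{host}/api/2.0/workspace/list?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        # An empty folder comes back without an "objects" key.
        return json.load(response).get("objects", [])


def list_notebooks(path, list_objects):
    """Walk the tree under `path` and return the paths of all notebooks.

    `list_objects` is any callable that maps a path to a list of objects,
    which keeps the recursion independent of the HTTP call.
    """
    notebooks = []
    for obj in list_objects(path):
        if obj["object_type"] == "NOTEBOOK":
            notebooks.append(obj["path"])
        elif obj["object_type"] == "DIRECTORY":
            # Descend into the folder and collect its notebooks too.
            notebooks.extend(list_notebooks(obj["path"], list_objects))
    return notebooks
```

Against a real workspace this might be called as `list_notebooks("/Users/<user>", lambda p: fetch_objects(host, token, p))`. Passing the lookup in as a callable also makes it easy to try the traversal against an in-memory dictionary first.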
org.apache.spark.SparkException: Job aborted due to stage failure: Task 19 in stage 26.0 failed 4 times, most recent failure: Lost task 19.3 in stage 26.0 (TID 4205, 10.66.225.154, executor 0): com.databricks.sql.io.FileReadException: Error while rea...
Hello, @Lili Ehrlich. Welcome! My name is Piper, and I'm a moderator for Databricks. Thank you for bringing your question to us. Let's give it a while for the community to respond first. Thanks in advance for your patience.
I have installed the ipopt solver, version 3.11.1, in Azure Databricks, but while running the code it throws an error:
WARNING: Could not locate the 'ipopt' executable, which is required for solver ipopt
ApplicationError: No executable found for solv...
Hello @omran shaik. In the past, we have recommended that customers use Docker containers with Databricks, as some of these solvers required native compilation and did not work well on the runtimes. With DCS you have full control of what you want to in...
The Next Databricks Office Hours
Our next Office Hours session is scheduled for February 23, 2022 - 8:00 am PDT. Do you have questions about how to set up or use Databricks? Do you want to get best practices for deploying your use case or tips on data a...
I'm reading a huge CSV file containing 39,795,158 records and writing it into MSSQL Server on Azure Databricks. The Databricks notebook is running on a cluster node with 56 GB memory, 16 cores, and 12 workers. This is my code in Python and PySpark:
from ...
Hi, if you are using Azure SQL DB Managed Instance, could you please file a support request with the Azure team? This is to review any timeouts or performance issues on the backend. Also, it seems like the timeout is coming from SQL Server, which is closing the conn...
Please have a look at the attached screenshot. Three strings converted to float, each resulting in the same number:
22015683.000000000000000000 => 22015684
22015684.000000000000000000 => 22015684
22015685.000000000000000000 => 22015684
Hi @Maciej G, I guess this has something to do with the data type FLOAT and its precision. Floats are only an approximation with a given precision. Either you should consider using the data type DOUBLE (double precision compared to FLOAT) - or, if you ...
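The precision effect behind the three identical results can be reproduced outside Spark. The sketch below is plain Python rather than Spark SQL: it uses the standard `struct` module to round-trip each value through a 4-byte IEEE 754 float, on the assumption that this matches how a single-precision FLOAT column stores it. The helper name `to_float32` is hypothetical. Above 2^24 = 16,777,216 a single-precision float can only represent every second integer, so the odd neighbours collapse onto 22015684.

```python
import struct
from decimal import Decimal


def to_float32(text):
    """Parse a decimal string, then round-trip it through a 4-byte IEEE 754 float."""
    return struct.unpack("f", struct.pack("f", float(text)))[0]


values = [
    "22015683.000000000000000000",
    "22015684.000000000000000000",
    "22015685.000000000000000000",
]

for text in values:
    as_float = to_float32(text)   # single precision: 24-bit significand
    as_double = float(text)       # double precision: 53-bit significand
    as_decimal = Decimal(text)    # exact decimal representation
    print(f"{text} -> float32 {as_float:.0f}, double {as_double:.0f}, decimal {as_decimal}")
```

In this sketch the double and decimal columns keep the three inputs distinct, while the float32 column does not, which is consistent with the suggestion to move away from FLOAT for values of this size.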