Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

pradeepvatsvk
by New Contributor III
  • 2191 Views
  • 2 replies
  • 0 kudos

Polars to natively read and write through ADLS

Hi everyone, is there a way Polars can directly read files from ADLS through the abfss protocol?

Latest Reply
jennifer986bloc
New Contributor II
  • 0 kudos

@pradeepvatsvk wrote: Hi everyone, is there a way Polars can directly read files from ADLS through the abfss protocol? Hello @pradeepvatsvk, yes, Polars can directly read files from Azure Data Lake Storage (ADLS) using the ABFS (Azure Blob Filesystem) prot...
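The reply preview is cut off, so the following is only a minimal sketch of one way this can look, not necessarily what the reply goes on to recommend. It assumes a recent Polars version with cloud storage support and account-key authentication. The helper names, the container, the account, and the path are all hypothetical.

```python
def abfss_uri(container: str, account: str, path: str) -> str:
    """Build an abfss:// URI for an object in an ADLS Gen2 container."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


def read_parquet_from_adls(container: str, account: str, path: str, account_key: str):
    """Read a Parquet file from ADLS into a Polars DataFrame.

    Assumption: account-key auth. Other credential types use different
    storage_options keys.
    """
    # Imported lazily so the URI helper can be used without Polars installed.
    import polars as pl

    storage_options = {"account_name": account, "account_key": account_key}
    return pl.read_parquet(
        abfss_uri(container, account, path),
        storage_options=storage_options,
    )


# Hypothetical names, for illustration only.
uri = abfss_uri("raw", "mystorageacct", "/sales/2025/data.parquet")
print(uri)
```

Calling read_parquet_from_adls needs network access and valid credentials, so only the URI helper is exercised above.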

1 More Replies
Rafael-Sousa
by Contributor II
  • 1730 Views
  • 3 replies
  • 0 kudos

Managed Delta Table corrupted

Hey guys, recently we added some properties to our Delta table, and after that the table shows an error and we cannot do anything with it. The error is: (java.util.NoSuchElementException) key not found: spark.sql.statistics.totalSize. I think maybe this i...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @Rafael-Sousa, could you please raise a support case so this can be investigated further? help@databricks.com

2 More Replies
samtech
by New Contributor
  • 1126 Views
  • 1 reply
  • 1 kudos

DAB multiple workspaces

Hi, we have 3 regional workspaces. Assume that we keep a separate folder for notebooks, say amer/xx, apac/xx, emea/xx, and separate job/pipeline configurations for each region in Git. How do we make sure that during deploy the appropriate jobs/pipelines are deployed in r...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @samtech, Define separate bundle configuration files for each region. These configuration files will specify the resources (notebooks, jobs, pipelines) and their respective paths. For example, you can have amer_bundle.yml, apac_bundle.yml, and eme...

BriGuy
by New Contributor II
  • 3231 Views
  • 2 replies
  • 0 kudos

Create a one-off job run using the Databricks SDK

I'm trying to build the job spec using objects. When I try to execute the job I get the following error. I'm somewhat new to Python and not sure what I'm doing wrong here. Is anyone able to help? Traceback (most recent call last): File "y:\My ...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @BriGuy, Can you try importing this module first? from databricks.sdk.service.jobs import PermissionLevel

1 More Replies
Dnirmania
by Contributor
  • 1737 Views
  • 2 replies
  • 0 kudos

Foreign Catalog refresh

Hi everyone, I have recently created a foreign catalog from AWS Redshift in Databricks and I can see some tables too, but when I ran the REFRESH FOREIGN SCHEMA command, it failed with the following error. I tried to search about it online but didn't get any...

Latest Reply
Dnirmania
Contributor
  • 0 kudos

REFRESH FOREIGN SCHEMA is a Databricks command to refresh a foreign catalog, and I don't have visibility into the queries it runs internally.

1 More Replies
allinux
by New Contributor II
  • 1501 Views
  • 2 replies
  • 0 kudos

When Try Returns Success for Invalid S3 Path in Spark: Is This a Bug?

Try(spark.read.format("parquet").load("s3://abcd/abcd/")) should result in Failure, but when executed in the notebook, it returns Success as shown below. Isn't this a bug? Try[DataFrame] = Success(...)

Latest Reply
MuthuLakshmi
Databricks Employee
  • 0 kudos

@allinux The read is a valid way to load data. Why are you expecting a failure? Can you please explain?

1 More Replies
loic
by Contributor
  • 1542 Views
  • 1 reply
  • 1 kudos

Resolved! Several executions of a single notebook lead to java.lang.OutOfMemoryError

Hello, I am facing an issue that I do not understand. I have a simple Scala notebook with a "read function" that reads a JSON file from external storage and makes a few changes to the resulting DataFrame. I run my test on "all purpose" compute, DS3v2 (14gig/4cor...

Latest Reply
loic
Contributor
  • 1 kudos

Finally, we understood the issue ourselves. By default, Databricks creates a new session for each new job. It is possible to change this behavior with a Spark configuration (set in the Spark config section of the compute settings): spark.databricks.sess...

FarBo
by New Contributor III
  • 13709 Views
  • 5 replies
  • 5 kudos

Spark issue handling data from json when the schema DataType mismatch occurs

Hi, I have encountered a problem using Spark when creating a DataFrame from a raw JSON source. I have defined a schema for my data, and the problem is that when there is a mismatch between one of the column values and its defined schema, Spark not onl...

Latest Reply
Anonymous
Not applicable
  • 5 kudos

@Farzad Bonabi: Thank you for reporting this issue. It seems to be a known bug in Spark when dealing with malformed decimal values. When a decimal value in the input JSON data is not parseable by Spark, it sets not only that column to null but also ...

4 More Replies
Puent3
by New Contributor II
  • 1588 Views
  • 4 replies
  • 0 kudos

Error: from databricks import lakehouse_monitoring

We are using the following import: "from databricks import lakehouse_monitoring". We are receiving this error: ImportError: cannot import name 'lakehouse_monitoring' from 'databricks.sdk' (/databricks/python/lib/python3.11/site-packages/databricks/sdk...

Latest Reply
MadhuB
Valued Contributor
  • 0 kudos

I wasn't able to find that module. However, there are options under the SDK. Refer to the Lakehouse monitoring SDK reference.
%python
import databricks
print(dir(databricks.sdk))

3 More Replies
Fikrat
by Databricks Partner
  • 1277 Views
  • 1 reply
  • 1 kudos

Lakeflow access

Hi, can someone please advise how to sign up for Lakeflow access? I believe it's in public preview now, but it's not listed in my workspace's preview features list. Thanks!

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @Fikrat, LakeFlow is currently in a gated Public Preview. To participate in the preview, you need to contact your Databricks account team. It is not listed in the workspace's preview features list because it requires specific access permissions th...

vidya_kothavale
by Contributor
  • 1743 Views
  • 2 replies
  • 0 kudos

How to Get the Size of Filtered Rows in Databricks SQL

I have a query that filters rows from a table based on a timestamp range. The query is as follows: SELECT COUNT(*) FROM table_name WHERE ts >= '2025-02-04 00:00:00' AND ts < '2025-02-05 00:00:00'; This query returns 10 rows. I need to calculate the tot...

Latest Reply
MadhuB
Valued Contributor
  • 0 kudos

@vidya_kothavale Try this code block, and keep in mind to handle the null values. SELECT SUM(OCTET_LENGTH(CAST(column1 AS STRING)) + OCTET_LENGTH(CAST(column2 AS STRING)) + OCTET_LENGTH(CAST(COALESCE(column3, '0') AS STRING))) as bytes, SUM(OCTET...
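To make the arithmetic behind the reply concrete, here is a small Python sketch of the same idea: cast each value to a string, take its UTF-8 byte length, and sum across columns and rows. It is an illustration only. The sample rows are made up, Python's str() does not format every type the way CAST(... AS STRING) does, and the result estimates the text size of the values, not the storage size of the table's files.

```python
from typing import Iterable, Optional, Sequence


def value_bytes(value: Optional[object]) -> int:
    """UTF-8 byte length of a value's string form; NULL (None) counts as 0 bytes."""
    if value is None:
        return 0
    return len(str(value).encode("utf-8"))


def rows_bytes(rows: Iterable[Sequence[Optional[object]]]) -> int:
    """Sum the per-value byte lengths over every column of every row."""
    return sum(value_bytes(value) for row in rows for value in row)


# Made-up sample rows standing in for the filtered query result.
rows = [
    ("a", 12, None),    # 1 + 2 + 0 bytes
    ("héllo", 3, "x"),  # 6 + 1 + 1 bytes ("é" takes 2 bytes in UTF-8)
]
total = rows_bytes(rows)
print(total)  # 11
```

Note the design choice on nulls: this sketch counts a NULL as 0 bytes, whereas COALESCE(column3, '0') in the reply counts it as 1 byte. Pick whichever matches what "size" should mean for your case.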

1 More Replies
ronaldgeneblazo
by New Contributor II
  • 1802 Views
  • 2 replies
  • 1 kudos

Urgent: Iceberg REST catalog - load a table has new JSON format

Hello, we are using Databricks Unity Catalog to load an Iceberg table (i.e., a Delta Lake table with the UniForm feature). We are using this guide: https://docs.databricks.com/en/external-access/iceberg.html. This has been working for us since last year bu...

Latest Reply
ronaldgeneblazo
New Contributor II
  • 1 kudos

Satyadeepak - it looks like this has been fixed on your end and we are no longer seeing this issue. Thanks for checking.

1 More Replies
Juju
by New Contributor II
  • 16930 Views
  • 5 replies
  • 1 kudos

DeltaFileNotFoundException: No file found in the directory (sudden task failure)

Hi all, I am currently running a job that will upsert a table by reading from the Delta change data feed of my silver table. Here is the relevant snippet of code: rds_changes = spark.read.format("delta") \ .option("readChangeFeed", "true") \ .optio...

Latest Reply
c-data
New Contributor II
  • 1 kudos

What was the fix?

4 More Replies
deng_dev
by New Contributor III
  • 1234 Views
  • 1 reply
  • 0 kudos

Autoloader: Cross-account bucket Assume role access denied

Hi everyone! I have a Databricks instance profile role that has permission to assume a role in another AWS account to access an S3 bucket in that account. When I try to assume the role using boto3, it correctly reads the Databricks AWS credentials, as...

Latest Reply
Ayushi_Suthar
Databricks Employee
  • 0 kudos

Hi @deng_dev, greetings! The error message above contains a request ID. Can you please share that request ID with the AWS team to check why this request is being denied? This looks like a permission issue. Please let me know if ...

carlos_tasayco
by Contributor
  • 1739 Views
  • 3 replies
  • 0 kudos

How to pull a parameter from a .sql file with dbutils.notebook.run

Hi, I want to use this: result = dbutils.notebook.run('/Workspace/Usersxxxxt', 600, {"environment": inputEnvironment}) This pulls from this .sql file in that path: DROP TEMPORARY VARIABLE IF EXISTS strEnv; DECLARE VARIABLE strEnv STRING; SET VARIABLE strE...

Latest Reply
MadhuB
Valued Contributor
  • 0 kudos

@carlos_tasayco There are two ways to pass a variable to another notebook as input: using widgets, or using the collect method as shown below. # In notebook1 result = spark.sql("SELECT value FROM table").collect()[0][0] dbutils.notebook.exit(resu...
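Since dbutils.notebook.exit hands back a single string, one common pattern for returning more than one value is to serialize them as JSON in the called notebook and parse them in the caller. The sketch below is one possible way to do that, not the reply's own code. The helper names and the child notebook path are hypothetical, and the dbutils calls are shown only as comments because dbutils exists only inside Databricks.

```python
import json


def pack_result(**values) -> str:
    """Serialize named values into the single string that notebook.exit accepts."""
    return json.dumps(values)


def unpack_result(raw: str) -> dict:
    """Parse the string returned by notebook.run back into a dict."""
    return json.loads(raw)


# In the called notebook (Databricks only):
#   dbutils.notebook.exit(pack_result(environment="dev", row_count=10))
#
# In the calling notebook (Databricks only; the path is hypothetical):
#   raw = dbutils.notebook.run("/path/to/child", 600, {"environment": "dev"})
#   result = unpack_result(raw)

# Local round trip showing what the caller would receive.
raw = pack_result(environment="dev", row_count=10)
result = unpack_result(raw)
print(result["environment"], result["row_count"])
```

Values must be JSON-serializable, so convert timestamps or decimals to strings before packing them.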

2 More Replies