Data Engineering

Forum Posts

rt-slowth
by Contributor
  • 1066 Views
  • 5 replies
  • 2 kudos

AutoLoader File notification mode Configuration with AWS

    from pyspark.sql import functions as F
    from pyspark.sql import types as T
    from pyspark.sql import DataFrame, Column
    from pyspark.sql.types import Row
    import dlt

    S3_PATH = 's3://datalake-lab/XXXXX/'
    S3_SCHEMA = 's3://datalake-lab/XXXXX/schemas/'
    ...
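For context, a minimal sketch of what such a pipeline can look like with file notification mode enabled; the source format, region, and table name below are assumptions, not the poster's actual code:

    import dlt

    S3_PATH = 's3://datalake-lab/XXXXX/'  # placeholder path from the post

    @dlt.table(name="raw_events")  # hypothetical table name
    def raw_events():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")            # assumed source format
            .option("cloudFiles.useNotifications", "true")  # enables file notification mode
            .option("cloudFiles.region", "us-east-1")       # assumed AWS region
            .load(S3_PATH)
        )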

Latest Reply
djhs
New Contributor III
  • 2 kudos

Was this resolved? I ran into the same issue.

4 More Replies
brianbraunstein
by New Contributor II
  • 201 Views
  • 1 reply
  • 0 kudos

spark.sql not supporting kwargs as documented

This documentation https://api-docs.databricks.com/python/pyspark/latest/pyspark.sql/api/pyspark.sql.SparkSession.sql.html#pyspark.sql.SparkSession.sql claims that spark.sql() should be able to take kwargs, such that the following should work: display...
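For reference, the documented kwargs usage the post is testing looks roughly like this (a sketch; requires a runtime where parameterized spark.sql() is supported, i.e. PySpark 3.4+):

    # Keyword arguments are substituted into the query via {name} placeholders
    threshold = 5
    display(spark.sql("SELECT * FROM range(10) WHERE id > {threshold}", threshold=threshold))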

Latest Reply
brianbraunstein
New Contributor II
  • 0 kudos

Ok, it looks like Databricks might have broken this functionality shortly after it came out: https://community.databricks.com/t5/data-engineering/parameterized-spark-sql-not-working/m-p/57969/highlight/true#M30972

Phani1
by Valued Contributor
  • 332 Views
  • 4 replies
  • 0 kudos

Parallel execution of SQL cell in Databricks Notebooks

Hi Team, please provide guidance on enabling parallel execution of SQL cells in a notebook that contains multiple SQL cells. Currently, when we execute the notebook, all the SQL cells run sequentially. I would appreciate assistance on how to execute th...

Latest Reply
Ajay-Pandey
Esteemed Contributor III
  • 0 kudos

Hi @Phani1, yes, you can achieve this with Databricks Workflow jobs, where you can create tasks and define dependencies between them.
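Besides Workflow tasks, a common in-notebook alternative is to drive independent SQL statements from Python threads; a hedged sketch with illustrative table names:

    from concurrent.futures import ThreadPoolExecutor

    queries = [
        "OPTIMIZE catalog.schema.table_a",  # illustrative statements
        "OPTIMIZE catalog.schema.table_b",
    ]

    # spark.sql is thread-safe, so independent statements can run concurrently
    with ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(spark.sql, queries))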

3 More Replies
subha2
by New Contributor II
  • 197 Views
  • 2 replies
  • 0 kudos

metadata driven DQ validation for multiple tables dynamically

There are multiple tables in the config/metadata table. These tables need to be validated for DQ rules:
1. Natural Key / Business Key / Primary Key cannot be null or blank.
2. Natural Key / Primary Key cannot be duplicated.
3. Join columns missing values.
4. Busine...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @subha2, To dynamically validate the data quality (DQ) rules for tables configured in a metadata-driven system using PySpark, you can follow these steps: Define Metadata for Tables: First, create a metadata configuration that describes the rules ...
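As a rough illustration of that approach (the config entry, table, and key names are hypothetical; in practice they would be read from the metadata table):

    from pyspark.sql import functions as F

    dq_config = [
        {"table": "silver.orders", "keys": ["order_id"]},  # hypothetical entry
    ]

    for rule in dq_config:
        df = spark.table(rule["table"])
        # Rule 1: key columns must not be null or blank
        null_cond = " OR ".join(f"({k} IS NULL OR {k} = '')" for k in rule["keys"])
        null_count = df.filter(null_cond).count()
        # Rule 2: key columns must not contain duplicates
        dup_count = df.groupBy(rule["keys"]).count().filter(F.col("count") > 1).count()
        print(rule["table"], "null/blank keys:", null_count, "duplicate keys:", dup_count)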

1 More Replies
rt-slowth
by Contributor
  • 651 Views
  • 6 replies
  • 0 kudos

Why is the userIdentity anonymous?

Do you know why the userIdentity is anonymous in AWS CloudTrail's logs even though I have specified an instance profile?

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

5 More Replies
GeorgHeiler
by New Contributor III
  • 4216 Views
  • 9 replies
  • 2 kudos

EDU discount for university research project

I need databricks for a university research project. Is there any possibility of EDU discounts on DBU? So far I was unable to reach out to Databricks sales. Can you connect me with someone from DB?

Latest Reply
FeliciaWilliam
New Contributor III
  • 2 kudos

Thanks for the good advice.

8 More Replies
Tom_Greenwood
by New Contributor III
  • 2423 Views
  • 9 replies
  • 2 kudos

UDF importing from other modules

Hi community, I am using a PySpark UDF. The function is imported from a repo (in the Repos section) and registered as a UDF in the notebook. I am getting a PythonException error when the transformation is run. This is coming from the databric...

[Attachment: Tom_Greenwood_0-1706798998837.png]
Latest Reply
DennisB
New Contributor III
  • 2 kudos

I was getting a similar error (full traceback below), and determined that it's related to this issue. Setting the env variables DATABRICKS_HOST and DATABRICKS_TOKEN as suggested in that Github issue resolved the problem for me (albeit it's not a grea...
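For anyone hitting the same thing, the workaround amounts to something like this (a sketch; the host URL and secret scope/key are placeholders):

    import os

    # Make the workspace credentials visible to the UDF's execution environment
    os.environ["DATABRICKS_HOST"] = "https://<workspace-url>"
    os.environ["DATABRICKS_TOKEN"] = dbutils.secrets.get("my-scope", "my-token")  # hypothetical scope/key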

8 More Replies
shanebo425
by New Contributor II
  • 185 Views
  • 1 reply
  • 0 kudos

Databricks OutOfMemory error on code that previously worked without issue

I have a notebook in Azure Databricks that does some transformations on a bronze tier table and inserts the transformed data into a silver tier table. This notebook is used to do an initial load of the data from our existing system into our new datal...

Latest Reply
jose_gonzalez
Moderator
  • 0 kudos

Please review your Spark UI from the old job execution versus the new job execution. You might need to check whether the data volume has increased; that could be the reason for the OOM.

Sagas
by New Contributor II
  • 154 Views
  • 2 replies
  • 0 kudos

SparkR or sparklyr not showing history

Hi, for some reason Azure Databricks doesn't show History if the data is saved with SparkR (2 in the figure below) or sparklyr (3), but it does show it with Data Ingestion (0) or with PySpark (1). Is this a known bug or am I doing something wrong? Is ...

[Attachments: Databricks_history.PNG, SparkR.PNG, Sparklyr.PNG]
Data Engineering
sparklyr
SparkR
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Sagas, Let’s address your questions regarding Azure Databricks, SparkR, and Sparklyr. History in Azure Databricks: Each operation that modifies a Delta Lake table creates a new table version. You can use history information to audit operation...
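Whatever API wrote the table, the Delta history itself can also be queried directly; a quick sketch with a hypothetical table name:

    # DESCRIBE HISTORY works regardless of whether the table was written from SparkR, sparklyr, or PySpark
    spark.sql("DESCRIBE HISTORY my_catalog.my_schema.my_table").show(truncate=False)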

1 More Replies
Phani1
by Valued Contributor
  • 117 Views
  • 1 reply
  • 1 kudos

Job cluster configuration for 24/7

Hi Team, we intend to run the job cluster around the clock. We are considering the following parameters regarding cost:
- Data volumes
- Client SLA for job completion
- Starting with a small cluster configuration
Please advise on any other options we s...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @Phani1, When configuring a job cluster for 24/7 operation, it’s essential to consider cost, performance, and scalability. Here are some recommendations based on your specified parameters: Data Volumes: Analyze your data volumes carefully. If...
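As one illustration of the "start small and autoscale" advice, a job cluster spec passed to the Jobs API might look like this (all values are assumptions to adapt):

    # A minimal autoscaling job cluster sketch for the Jobs API
    new_cluster = {
        "spark_version": "14.3.x-scala2.12",  # assumed LTS runtime
        "node_type_id": "Standard_DS3_v2",    # assumed small node type
        "autoscale": {"min_workers": 2, "max_workers": 8},
    }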

NOOR_BASHASHAIK
by Contributor
  • 483 Views
  • 3 replies
  • 0 kudos

Machine Type for VACUUM operation

Dear all, I have a workflow with 2 tasks: one that does OPTIMIZE, followed by one that does VACUUM. I used a cluster with an F32s driver and 8 F64s workers (auto-scaling enabled). All 8 workers are launched by Databricks as soon as OPTIMIZE starts. As ...

[Attachment: NOOR_BASHASHAIK_0-1710268182562.png]
Data Engineering
best practice
F series
optimize
vacuum
Latest Reply
ArturOA
New Contributor II
  • 0 kudos

Hi, were you able to get any useful help on this?

2 More Replies
Paul92S
by New Contributor III
  • 1397 Views
  • 2 replies
  • 1 kudos

Resolved! DELTA_EXCEED_CHAR_VARCHAR_LIMIT

Hi, I am having an issue loading source data into a Delta table / Unity Catalog. The error we are receiving is the following: grpc_message:"[DELTA_EXCEED_CHAR_VARCHAR_LIMIT] Exceeds char/varchar type length limitation. Failed check: (isnull(\'metric_...

Latest Reply
Palash01
Contributor III
  • 1 kudos

Hey @Paul92S, looking at the error message it looks like column "metric_name" is the culprit here. Understanding the error: Character Limit Violation: the error indicates that values in the metric_name column are exceeding the maximum length allowed fo...
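If widening the column type is not an option, one hedged pre-write guard is to truncate the offending column to its declared length (the limit below is an assumption; take it from the table definition):

    from pyspark.sql import functions as F

    MAX_LEN = 255  # assumed VARCHAR length from the table definition
    df_ok = df.withColumn("metric_name", F.substring("metric_name", 1, MAX_LEN))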

1 More Replies
Olaoye_Somide
by New Contributor II
  • 338 Views
  • 1 reply
  • 0 kudos

How to Implement Custom Logging in Databricks without Using _jvm Attribute with Spark Connect?

Hello Databricks Community, I am currently working in a Databricks environment and trying to set up custom logging using Log4j in a Python notebook. However, I've run into a problem due to the use of Spark Connect, which does not support the _jvm attr...

Data Engineering
Apache Spark
data engineering
Latest Reply
arpit
Contributor III
  • 0 kudos

    import logging

    # Use Python's standard logging module instead of the JVM-backed Log4j logger
    logging.getLogger().setLevel(logging.WARN)
    log = logging.getLogger("DATABRICKS-LOGGER")
    log.warning("Hello")

Phani1
by Valued Contributor
  • 197 Views
  • 1 reply
  • 0 kudos

Boomi integrating with Databricks

Hi Team, is there any impact when integrating Databricks with Boomi as opposed to Azure Event Hub? Could you offer some insights on the integration of Boomi with Databricks? https://boomi.com/blog/introducing-boomi-event-streams/ Regards, Janga

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Phani1, Let’s explore the integration of Databricks with Boomi and compare it to Azure Event Hub. Databricks Integration with Boomi: Databricks is a powerful data analytics platform that allows you to process large-scale data and build machin...

CarstenWeber
by New Contributor II
  • 356 Views
  • 4 replies
  • 1 kudos

Resolved! Invalid configuration fs.azure.account.key trying to load ML Model with OAuth

Hi Community, I was trying to load an ML model from an Azure storage account (abfss://....) with: model = PipelineModel.load(path). I set the Spark config: spark.conf.set("fs.azure.account.auth.type", "OAuth") spark.conf.set("fs.azure.account.oauth.provi...
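For reference, the full set of OAuth options usually looks like this (a sketch; the service principal values and secret scope/key are placeholders):

    spark.conf.set("fs.azure.account.auth.type", "OAuth")
    spark.conf.set("fs.azure.account.oauth.provider.type",
                   "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
    spark.conf.set("fs.azure.account.oauth2.client.id", "<application-id>")
    spark.conf.set("fs.azure.account.oauth2.client.secret",
                   dbutils.secrets.get("my-scope", "sp-secret"))  # hypothetical scope/key
    spark.conf.set("fs.azure.account.oauth2.client.endpoint",
                   "https://login.microsoftonline.com/<tenant-id>/oauth2/token")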

Latest Reply
CarstenWeber
New Contributor II
  • 1 kudos

@daniel_sahal using the settings above did indeed work. 

3 More Replies