Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

NTRT
by New Contributor III
  • 1062 Views
  • 2 replies
  • 0 kudos

Can't read a JSON file of just 1.75 MiB?

Hi, I am relatively new to Databricks, although I am familiar with lazy evaluation, transformations and actions, and persistence. I have a complex, nested JSON file of about 1.73 MiB. When df = spark.read.option("multiLine", "false").json('dbfs:/mnt...

Latest Reply
koushiknpvs
New Contributor III
  • 0 kudos

This can be resolved by defining the schema explicitly and using that schema to read the file (see the sketch below this thread). from pyspark.sql.types import StructType, StructField, StringType, IntegerType, ArrayType # Define the schema according to the JSON structure sch...

1 More Replies
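
A minimal sketch of the explicit-schema approach suggested in the reply above; the field names and the file path are hypothetical placeholders. Supplying a schema lets Spark skip the costly inference pass over deeply nested JSON:

    # Hypothetical schema; adjust to the actual JSON structure
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType, ArrayType

    schema = StructType([
        StructField("id", IntegerType(), True),
        StructField("name", StringType(), True),
        # nested collections are declared explicitly instead of being inferred
        StructField("tags", ArrayType(StringType()), True),
    ])

    # with an explicit schema, Spark does not scan the file to infer types
    df = spark.read.schema(schema).option("multiLine", "false").json("dbfs:/mnt/<path>/file.json")
    df.printSchema()
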
NTRT
by New Contributor III
  • 2175 Views
  • 4 replies
  • 0 kudos

Resolved! Performance issues when reading json-stat2

Hi, I am relatively new to Databricks, although I am familiar with lazy evaluation, transformations and actions, and persistence. I have a complex, nested JSON file of about 1.73 MiB. When df = spark.read.option("multiLine", "false").json('dbfs:/mnt...

Latest Reply
koushiknpvs
New Contributor III
  • 0 kudos

Please give me a kudos if this works. Efficiency in data collection: using .collect() on large datasets can lead to out-of-memory errors, as it collects all rows onto the driver node. If the dataset is large, consider alternatives such as extracting only... (see the sketch below)

3 More Replies
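
A short sketch of driver-safe alternatives to .collect(), following the advice in the reply above; `df`, the column names, and the output path are hypothetical:

    # take() fetches only a handful of rows to the driver
    preview = df.take(10)

    # or trim columns and rows before collecting
    small = df.select("id", "name").limit(100).collect()

    # for full results, write them out on the cluster instead of collecting
    df.write.mode("overwrite").parquet("dbfs:/mnt/<output-path>/result")
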
Mathias_Peters
by Contributor
  • 1210 Views
  • 2 replies
  • 0 kudos

Asset Bundles: Adding project_directory in DBT task breaks previous python task

Hi, I have a job consisting of three tasks:  tasks: - task_key: Kinesis_to_S3_new spark_python_task: python_file: ../src/kinesis.py parameters: ["${var.stream_region}", "${var.s3_base_path}"] j...

Latest Reply
Mathias_Peters
Contributor
  • 0 kudos

Hi @Ajay-Pandey, thank you for the hints. I will try to recreate the job via the UI. I ran the tasks in a GitHub workflow. The file locations are mixed: the first two tasks (Python and DLT) are located in the databricks/src folder. The dbt files come fro...

1 More Replies
chandan_a_v
by Valued Contributor
  • 2439 Views
  • 2 replies
  • 1 kudos

Can't import local files under repo

I have a YAML file inside one of the subdirectories of a repo in Databricks. I have appended the repo path to sys.path, but I still can't access this file (see the sketch after this thread). https://docs.databricks.com/_static/notebooks/files-in-repos.html

Latest Reply
Abhishek10745
New Contributor III
  • 1 kudos

Hello @chandan_a_v, were you able to solve this issue? I am also experiencing the same thing, where I cannot move a file with the .yml extension from a repo folder to a shared workspace folder. As per the documentation, this is a limitation or functionality of data...

1 More Replies
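
A minimal sketch of the pattern discussed in this thread, with a hypothetical repo path; note that sys.path only affects Python imports, so a data file such as a YAML config still has to be opened via its full workspace path:

    import sys, os
    import yaml  # assumes PyYAML is available on the cluster

    repo_root = "/Workspace/Repos/<user>/<repo>"  # hypothetical location
    sys.path.append(repo_root)  # enables `import mymodule`, nothing more

    # reading a data file needs the absolute path; sys.path does not help here
    with open(os.path.join(repo_root, "conf/settings.yml")) as f:
        settings = yaml.safe_load(f)
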
zero234
by New Contributor III
  • 836 Views
  • 1 reply
  • 0 kudos

Delta Live Table is inserting data multiple times

So I have created a Delta Live Table which uses spark.sql() to execute a query and df.write.mode("append").insertInto() to insert data into the respective table, and at the end I return a dummy table, since this was the requirement (see the sketch after this thread). So now I have also ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

What's your source? Your sink is a Delta table, correct? How do you verify that there are no duplicate inserts happening?

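
A hedged sketch of the usual fix for the pattern described in the question, with hypothetical table and query names: an imperative insertInto() inside a pipeline function may execute more than once as DLT evaluates the graph, so the conventional shape is to return the DataFrame and let DLT manage the write:

    import dlt

    @dlt.table(name="target_table")  # hypothetical name
    def target_table():
        # no manual insertInto(); DLT owns the write and keeps it idempotent
        return spark.sql("SELECT * FROM source_view")  # hypothetical query
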
Meshynix
by New Contributor III
  • 6233 Views
  • 6 replies
  • 0 kudos

Resolved! Not able to create an external table in a schema under a catalog.

Problem statement: Cluster 1 (shared cluster) is not able to read the file location at "dbfs:/mnt/landingzone/landingzonecontainer/Inbound/", and hence we are not able to create an external table in a schema inside the Enterprise catalog. Cluster 2 (No Isola...

Latest Reply
Avi_Bricks
New Contributor II
  • 0 kudos

External table creation is failing with the error UnityCatalogServiceException: [RequestId=**** ErrorClass=INVALID_PARAMETER_VALUE] Unsupported path operation PATH_CREATE_TABLE on volume. I am able to access and create files on the external location (see the sketch below).

5 More Replies
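
A minimal sketch of creating a Unity Catalog external table, with hypothetical catalog, schema, and storage names; per the error in the reply above, LOCATION must point at a path governed by a UC external location, not at a volume path:

    spark.sql("""
        CREATE TABLE enterprise_catalog.landing.inbound_data
        USING DELTA
        LOCATION 'abfss://landingzone@<storage-account>.dfs.core.windows.net/Inbound/'
    """)
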
pshuk
by New Contributor III
  • 1386 Views
  • 1 reply
  • 0 kudos

Run md5 using the CLI

Hi, I want to run an md5 checksum on a file uploaded to Databricks. I can generate the md5 of the local file, but how do I generate one for the uploaded file on Databricks using the CLI (command-line interface)? Any help would be appreciated (see the sketch below this thread). I tried running databr...

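
A notebook-side alternative to the CLI route, streaming the uploaded file through the /dbfs FUSE mount; the path is a hypothetical placeholder. From a local shell, piping `databricks fs cat dbfs:/<path>` into `md5sum` is another option.

    import hashlib

    md5 = hashlib.md5()
    with open("/dbfs/mnt/uploads/<file>", "rb") as f:  # hypothetical upload path
        for chunk in iter(lambda: f.read(1024 * 1024), b""):  # 1 MiB chunks
            md5.update(chunk)
    print(md5.hexdigest())
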
danial
by New Contributor II
  • 6674 Views
  • 3 replies
  • 1 kudos

Connect Databricks hosted on Azure with RDS on AWS.

We have Databricks set up and running on Azure. Now we want to connect it with RDS (AWS) to transfer data from RDS to Azure Data Lake using Databricks (see the sketch after this thread). I could find documentation on how to do it within the same cloud (either AWS or Azure), but n...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Danial Malik, hope everything is going great. Just wanted to check in to see if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so ...

2 More Replies
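
A hedged sketch of the cross-cloud read, assuming a PostgreSQL-flavored RDS instance; the endpoint, secret scope, and table names are hypothetical. The main requirement is that the RDS endpoint is network-reachable from the Azure Databricks workers (public endpoint, VPN, or peering):

    rds_df = (spark.read.format("jdbc")
        .option("url", "jdbc:postgresql://<rds-endpoint>:5432/mydb")
        .option("dbtable", "public.orders")
        .option("user", dbutils.secrets.get("rds-scope", "user"))      # secrets, not literals
        .option("password", dbutils.secrets.get("rds-scope", "password"))
        .load())

    # land the data in Azure Data Lake as Delta
    rds_df.write.format("delta").mode("append").save("abfss://lake@<account>.dfs.core.windows.net/orders")
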
cszczotka
by New Contributor III
  • 2090 Views
  • 4 replies
  • 0 kudos

Shallow clone and issue with MODIFY permission to source table

Hi, I'm running shallow clone for external Delta tables. The shallow clone is failing for source tables where I don't have MODIFY permission; I'm getting the exception below. I don't understand why MODIFY permission on the source table is required. Is there a...

Latest Reply
Amit_Dass_Chmp
New Contributor III
  • 0 kudos

Also check this documentation on access modes: Shallow clone for Unity Catalog tables | Databricks on AWS. When working with Unity Catalog shallow clones in single-user access mode, you must have permissions on the resources for the cloned table source as w... (see the sketch below)

3 More Replies
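
For context, a minimal sketch of the operation under discussion, with hypothetical table names; as the reply above notes, in single-user access mode the principal running the clone also needs permissions on the source table's underlying resources:

    spark.sql("""
        CREATE OR REPLACE TABLE dev_catalog.sales.orders_clone
        SHALLOW CLONE prod_catalog.sales.orders
    """)
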
surband
by New Contributor III
  • 6290 Views
  • 9 replies
  • 0 kudos

Pulsar Streaming (Read) - Benchmarking Information

We are doing a first-time implementation of streaming reads from partitioned Pulsar topics into a Delta table managed by UC (see the sketch after this thread). We are unable to scale the job beyond about ~40k msgs/sec; beyond 40k msgs/sec, the job fails. I'd imagine Databric...

Latest Reply
surband
New Contributor III
  • 0 kudos

Attached Grafana screenshots

8 More Replies
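
A hedged sketch of the pipeline shape described in this thread, assuming the StreamNative pulsar-spark connector; the service URL, topic, and paths are hypothetical placeholders, not the poster's configuration:

    stream = (spark.readStream.format("pulsar")
        .option("service.url", "pulsar://<broker>:6650")
        .option("topics", "persistent://<tenant>/<namespace>/events")
        .load())

    (stream.writeStream.format("delta")
        .option("checkpointLocation", "/Volumes/main/default/checkpoints/events")  # hypothetical
        .toTable("main.default.events"))
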
bradleyjamrozik
by New Contributor III
  • 674 Views
  • 0 replies
  • 0 kudos

Autoloader Failure Creating EventSubscription

Posting this here too in case anyone else has run into this issue... Trying to set up Autoloader file notifications but keep getting an "Internal Server Error" message (see the sketch below this thread). Failure on Write EventSubscription - Internal error - Microsoft Q&A

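
For reference, a minimal sketch of an Auto Loader stream in file-notification mode, the setup that creates the EventSubscription failing in this post; the container, checkpoint, and table names are hypothetical:

    df = (spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.useNotifications", "true")  # file-notification mode
        .load("abfss://inbound@<account>.dfs.core.windows.net/events/"))

    (df.writeStream
        .option("checkpointLocation", "abfss://inbound@<account>.dfs.core.windows.net/_chk/events")
        .toTable("main.default.events"))
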
JacobKesinger
by New Contributor II
  • 3769 Views
  • 3 replies
  • 0 kudos

Resolved! Iterating over a pyspark.pandas.groupby.DataFrameGroupBy

I have a pyspark.pandas.frame.DataFrame object (created by calling `pandas_api` on a pyspark.sql.dataframe.DataFrame object). I have a complicated transformation that I would like to apply to this data, and in particular I would like to apply it in ...

Latest Reply
MichTalebzadeh
Valued Contributor
  • 0 kudos

Hi, the error indicates that Unity Catalog does not support Spark higher-order functions, such as those used in pandas_udf. This limitation likely comes from architectural or compatibility constraints. To resolve the issue, consider alternative ap... (see the sketch below)

2 More Replies
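
A hedged sketch of one alternative the reply above points toward: dropping back to the plain Spark DataFrame and using groupBy().applyInPandas(), which runs a pandas function once per group in parallel across the cluster; `df`, the column names, and the transformation are hypothetical:

    import pandas as pd

    def center(group: pd.DataFrame) -> pd.DataFrame:
        # hypothetical per-group transformation
        group["value"] = group["value"] - group["value"].mean()
        return group

    result = (df.groupBy("key")
                .applyInPandas(center, schema="key string, value double"))
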

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group