Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826994223
by Databricks Employee
  • 1739 Views
  • 1 replies
  • 0 kudos

Efficient data retrieval process between Azure Blob storage and Azure databricks

I am trying to design a stream data analytics project using functions --> event hub --> storage --> Azure factory --> databricks --> SQL server. What I am struggling with at the moment is the idea about how to optimize "data retrieval" to feed m...

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 0 kudos

Check out our Auto Loader capabilities, which can automatically track and process files that need to be processed. There are two options: directory listing, which is essentially completing the same steps that you have listed above but in a sl...
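
As a rough illustration of the directory listing option mentioned in this reply, an Auto Loader stream might be set up along these lines. This is a minimal sketch, not the setup from the thread: the function name, the JSON file format, and both paths are assumptions, and `spark` is the session that a Databricks notebook provides.

```python
def build_autoloader_stream(spark, source_path, schema_location):
    """Return a streaming DataFrame that picks up new files under source_path.

    Assumptions (not from the original post): the files are JSON, and
    schema_location is a writable path where the inferred schema is stored.
    """
    return (
        spark.readStream
        .format("cloudFiles")                                  # Auto Loader source
        .option("cloudFiles.format", "json")                   # assumed file format
        .option("cloudFiles.useNotifications", "false")        # directory listing mode
        .option("cloudFiles.schemaLocation", schema_location)  # schema tracking path
        .load(source_path)
    )
```

In a notebook this would be called as `build_autoloader_stream(spark, source_path, schema_location)` with your own storage paths, and the result passed on to a `writeStream` sink.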

User16826992666
by Databricks Employee
  • 3752 Views
  • 1 replies
  • 0 kudos

Resolved! Can you implement fine grained access controls on Delta tables?

I would like to provide row and column level security on my tables I have created in my workspace. Is there any way to do this?

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 0 kudos

Databricks includes two user functions that allow you to express column- and row-level permissions dynamically in the body of a view definition.
current_user(): return the current user name.
is_member(): determine if the current user is a member of a s...
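
To make the idea concrete, here is a small sketch that builds such a view definition as a SQL string. The view, table, column, and group names are all hypothetical placeholders, and the masking rule is only one possible policy; the two functions used are the ones named in the reply.

```python
def dynamic_view_sql(view_name, table_name, group_name):
    """Build a CREATE VIEW statement with column- and row-level rules.

    All table, column, and group names are hypothetical placeholders.
    """
    return f"""
CREATE OR REPLACE VIEW {view_name} AS
SELECT
  order_id,
  -- Column-level rule: only members of the group see the real email
  CASE WHEN is_member('{group_name}') THEN email ELSE 'REDACTED' END AS email,
  owner
FROM {table_name}
-- Row-level rule: group members see every row, others only their own rows
WHERE is_member('{group_name}') OR owner = current_user()
""".strip()


print(dynamic_view_sql("sales_secure", "sales_raw", "auditors"))
```

The resulting statement could be run with `spark.sql(...)`, and users would then be granted access to the view rather than to the underlying table.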

User16826987838
by Databricks Employee
  • 1772 Views
  • 1 replies
  • 1 kudos
Latest Reply
Mooune_DBU
Databricks Employee
  • 1 kudos

With Koalas, which is a pandas API on top of Spark DataFrames, there should be minimal code changes required. Please refer to this blog for more info.

User16765131552
by Databricks Employee
  • 8970 Views
  • 1 replies
  • 0 kudos

Read excel files and append to make one data frame in Databricks from azure data lake without specific file names

I am storing Excel files in Azure data lake (gen 1). The filenames follow the same pattern "2021-06-18T09_00_07ONR_Usage_Dataset", "2021-06-18T09_00_07DSS_Usage_Dataset", etc. depending on the date and time. I want to read all the files in th...

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 0 kudos

If you are attempting to read all the files in a directory you should be able to use a wild card and filter using the extension. For example: df = (spark .read .format("com.crealytics.spark.excel") .option("header", "True") .option("inferSchema", "tr...
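
The wildcard-plus-extension idea can be tried out on plain file names before pointing a reader at the directory. The sketch below uses only the Python standard library; the `.xlsx` extension and the `notes.txt` entry are assumptions added for illustration, and the pattern is a guess based on the file names quoted in the question.

```python
from fnmatch import fnmatchcase


def select_excel_files(file_names, pattern="*_Usage_Dataset.xlsx"):
    """Keep only the names that match the wildcard pattern."""
    return [name for name in file_names if fnmatchcase(name, pattern)]


# Hypothetical directory listing; the extension is assumed.
names = [
    "2021-06-18T09_00_07ONR_Usage_Dataset.xlsx",
    "2021-06-18T09_00_07DSS_Usage_Dataset.xlsx",
    "notes.txt",
]
print(select_excel_files(names))
```

The same pattern string could then be appended to the directory path passed to the reader, provided the reader in use accepts glob patterns.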

HowardWong
by Databricks Employee
  • 2269 Views
  • 1 replies
  • 0 kudos

How many users can the JDBC endpoint support in the All Purpose HC?

What is the maximum number of users the JDBC endpoint can support in the All Purpose high concurrency cluster? To support more SQL workloads, is it better to go with Databricks SQL Endpoints?

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 0 kudos

There is an execution context limit of 145. This means you can have at most 145 notebooks attached to a cluster. https://kb.databricks.com/execution/maximum-execution-context.html If you are primarily using SQL then Databricks SQL Endpoints wo...

Srikanth_Gupta_
by Databricks Employee
  • 2302 Views
  • 1 replies
  • 0 kudos

Resolved! Does the size of optimized files after running OPTIMIZE vary between cloud providers (S3, Blob and GCS)?

Are there any other parameters to consider when running OPTIMIZE, depending on the cloud vendor?

Latest Reply
Ryan_Chynoweth
Databricks Employee
  • 0 kudos

OPTIMIZE is not dependent on the cloud provider whatsoever. It will produce the same results regardless of the underlying storage. It is idempotent, meaning that if it is run twice on the same dataset, the second execution has no effect.

Anonymous
by Not applicable
  • 3623 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

If you expect a column to be commonly used in query predicates and if that column has high cardinality (that is, a large number of distinct values), then use ZORDER BY. You can specify multiple columns for ZORDER BY as a comma-separated list. However,...
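
A small sketch of how the comma-separated column list fits into the statement is shown below. The helper name, the table name, and the column names are hypothetical; the helper only builds the SQL text, which would then be run with `spark.sql(...)`.

```python
def optimize_zorder_sql(table_name, zorder_columns=()):
    """Build an OPTIMIZE statement, optionally with a ZORDER BY clause."""
    if not zorder_columns:
        return f"OPTIMIZE {table_name}"
    # Multiple columns are passed as a comma-separated list
    return f"OPTIMIZE {table_name} ZORDER BY ({', '.join(zorder_columns)})"


# Hypothetical table and high-cardinality predicate columns
print(optimize_zorder_sql("events", ["user_id", "event_time"]))
```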

User16137833804
by Databricks Employee
  • 2162 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

We have a limit of 100 secret scopes, and each scope can hold up to 1000 key-value secrets. https://docs.microsoft.com/en-us/azure/databricks/security/secrets/secret-scopes

Anonymous
by Not applicable
  • 1622 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

Hi, we have best practices around streaming with Kinesis, and you can find them here: https://docs.databricks.com/spark/latest/structured-streaming/kinesis-best-practices.html

User16826987838
by Databricks Employee
  • 1852 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

No, that is not possible. You can set the name that shows up in the Azure portal, though (but only when you're first creating the workspace, I believe).

User16826987838
by Databricks Employee
  • 1651 Views
  • 1 replies
  • 0 kudos
Latest Reply
User16826994223
Databricks Employee
  • 0 kudos

A job configured to run on an existing all-purpose cluster is considered a jobs client. If you set workload_type.clients.jobs = false on an all-purpose cluster, it cannot be used to run jobs.
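
For orientation, the setting sits in the cluster specification roughly as sketched below. This is only a fragment: the cluster name is hypothetical, and the other fields a real cluster definition needs (runtime version, node type, size) are left out.

```python
import json

# Fragment of a cluster specification; the name is a hypothetical placeholder
cluster_spec = {
    "cluster_name": "interactive-only",
    "workload_type": {
        "clients": {
            "notebooks": True,  # notebooks can still attach
            "jobs": False,      # jobs cannot use this all-purpose cluster
        }
    },
}

print(json.dumps(cluster_spec, indent=2))
```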

aladda
by Databricks Employee
  • 11624 Views
  • 1 replies
  • 1 kudos
Latest Reply
aladda
Databricks Employee
  • 1 kudos

In Databricks, each cluster creates an initial Spark session, and each notebook creates a Spark subsession within it. A temporary view created in one notebook isn't accessible to others. If you need to share a view across notebooks, you use Globa...
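
Assuming the truncated reply goes on to recommend global temporary views, sharing a view between two notebooks on the same cluster could look like the sketch below. The function names and the view name are hypothetical; `spark` is the session a notebook provides and `df` is any DataFrame.

```python
def publish_global_view(df, view_name):
    """Notebook A: register df as a global temporary view."""
    df.createOrReplaceGlobalTempView(view_name)


def read_global_view(spark, view_name):
    """Notebook B: read the view through the global_temp database."""
    return spark.table(f"global_temp.{view_name}")
```

Global temporary views live in the `global_temp` database, so the reading notebook has to qualify the name as shown.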

aladda
by Databricks Employee
  • 5181 Views
  • 1 replies
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

Coalesce essentially groups multiple partitions into larger partitions. So use coalesce when you want to reduce the number of partitions (and also tasks) without impacting sort order. For example, when you want to write out a single CSV file output instea...
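
The single-CSV case from this reply can be sketched as follows. The function name, the overwrite mode, and the header option are assumptions made for the example; `df` is any Spark DataFrame.

```python
def write_single_csv(df, output_path):
    """Write df as CSV using one partition, so one data file is produced."""
    (
        df.coalesce(1)             # merge partitions without a full shuffle
        .write
        .mode("overwrite")         # assumed: replace any previous output
        .option("header", "true")  # assumed: include a header row
        .csv(output_path)
    )
```

Spark still writes a directory at `output_path`; with one partition it contains a single part file alongside the usual marker files.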

aladda
by Databricks Employee
  • 2927 Views
  • 1 replies
  • 0 kudos
Latest Reply
aladda
Databricks Employee
  • 0 kudos

You can disable magic commands for a workspace but not on a per-user basis. Additional details about Workspace ACLs can be found here - https://docs.databricks.com/security/access-control/workspace-acl.html
